CN111126268A - Key point detection model training method and device, electronic equipment and storage medium - Google Patents


Publication number
CN111126268A
Authority: CN (China)
Prior art keywords: key point, preset, point, predicted, real
Legal status: Granted; Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN201911346309.5A
Other languages: Chinese (zh)
Other versions: CN111126268B (en)
Inventor: 钟韬
Current assignee: Beijing QIYI Century Science and Technology Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original assignee: Beijing QIYI Century Science and Technology Co Ltd
Events: application filed by Beijing QIYI Century Science and Technology Co Ltd; priority to CN201911346309.5A; publication of CN111126268A; application granted; publication of CN111126268B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161: Detection; Localisation; Normalisation
    • G06V40/168: Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The embodiments of the present application provide a key point detection model training method and device, an electronic device, and a storage medium, relating to the field of computer technology.

Description

Key point detection model training method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for training a keypoint detection model, an electronic device, and a storage medium.
Background
With the development of computer vision technology, target detection is more and more widely applied, with broad prospects in numerous fields such as face recognition, security monitoring, and dynamic tracking. Key points are parts of a target that carry stable and important semantic information; target key point localization means that, after a specific target is detected, its key points are located and their position information is output. Target key point localization has very important practical application value in many fields such as target attribute analysis, posture recognition, and posture correction.
For example, most face applications need to detect the key points of a face accurately. The main principle is to locate the positions of all key points of the face in an input face picture; common configurations use 21, 68, 106, or 240 points. The loss function used when training the key point detection model is the main quantity for measuring whether the predicted key points are accurate; commonly used loss functions include the Mean Squared Error (MSE), the Mean Absolute Error (MAE), and the Wing loss.
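For reference, the three commonly used loss functions named above can be sketched as follows. This is an illustrative sketch, not code from the patent; the Wing loss parameters `w=10.0` and `eps=2.0` are typical defaults from the literature, not values specified here.

```python
import numpy as np

def mse(pred, gt):
    """Mean Squared Error over all coordinates."""
    return np.mean((pred - gt) ** 2)

def mae(pred, gt):
    """Mean Absolute Error over all coordinates."""
    return np.mean(np.abs(pred - gt))

def wing_loss(pred, gt, w=10.0, eps=2.0):
    """Wing loss: logarithmic near zero error, L1-like for large errors."""
    x = np.abs(pred - gt)
    c = w - w * np.log(1.0 + w / eps)  # constant that joins the two branches at |x| = w
    return np.mean(np.where(x < w, w * np.log(1.0 + x / eps), x - c))
```

In this sketch `pred` and `gt` are numpy arrays of predicted and ground-truth key point coordinates of the same shape.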
The inventor found in research that these loss functions do not take into account the error of key points at positions with semantic ambiguity, so the key point positions predicted by the trained key point detection model contain errors and are of poor precision.
Disclosure of Invention
An object of the embodiments of the present application is to provide a method and an apparatus for training a keypoint detection model, an electronic device, and a storage medium, so as to solve the problem of poor accuracy of keypoint detection and improve the accuracy of keypoint detection.
The specific technical scheme is as follows:
in a first aspect, an embodiment of the present application provides a method for training a keypoint detection model, where the method includes:
acquiring a sample picture, wherein the sample picture corresponds to sample picture marks, the sample picture marks comprise a preset number of real key points, and each real key point corresponds to a position coordinate;
inputting the sample picture into a preset key point detection model for processing to obtain a preset number of prediction key points, wherein each prediction key point corresponds to a position coordinate;
calculating the point-line distance of each predicted key point according to the position coordinates of each predicted key point and the position coordinates of each real key point, wherein for any predicted key point, the point-line distance of the predicted key point is the shortest distance in the distances from the predicted key point to a preset virtual straight line, the preset virtual straight line is a straight line passing through the real key point with semantic relation with the predicted key point, the real key point with semantic relation with the predicted key point is a target real key point, and the preset virtual straight line also passes through the real key point which has the same semantic type as the target real key point and is adjacent to the target real key point;
calculating a loss value according to the point-line distance of each predicted key point;
and training the preset key point detection model according to the loss value to obtain a trained key point detection model.
Optionally, the sample picture is a picture with facial features, a picture with human body features, or a picture with gesture features.
Optionally, before the step of calculating the loss value according to each dot-line distance, the method further includes:
calculating the Euclidean distance between each predicted key point and a real key point with the same semantic meaning according to the position coordinate of each predicted key point and the position coordinate of each real key point to obtain the Euclidean distance of each predicted key point;
the calculating a loss value according to each point-line distance includes:
and calculating a loss value according to the point-line distances, the Euclidean distances, the preset point-line distance weight and the preset Euclidean distance weight.
Optionally, the calculating a loss value according to each of the point-line distances, each of the euclidean distances, the preset point-line distance weight, and the preset euclidean distance weight includes:
calculating the average value of all point line distances according to all the point line distances to obtain a point line distance average value;
calculating the average value of all Euclidean distances according to all the Euclidean distances to obtain the Euclidean distance average value;
and calculating a loss value according to the point-line distance average value, the Euclidean distance average value, a preset point-line distance weight and a preset Euclidean distance weight.
Optionally, the calculating a loss value according to the point-line distance average value, the euclidean distance average value, the preset point-line distance weight, and the preset euclidean distance weight includes:
the loss value is calculated according to the following formulas:

l = α × l_sa + β × l_mse

l_sa = (1/n) Σ_{i=1..n} F(P_i, G_i)

l_mse = (1/n) Σ_{i=1..n} [(x_{P_i} − x_{G_i})² + (y_{P_i} − y_{G_i})²]

wherein α is the preset point-line distance weight, β is the preset Euclidean distance weight, l_sa is the point-line distance average value, l_mse is the Euclidean distance average value, and l is the loss value; n denotes the number of predicted key points, i denotes the i-th predicted key point, P_i denotes the i-th predicted key point, G_i denotes the i-th real key point, and F(P_i, G_i) denotes the point-line distance of P_i, where P_i and G_i have a semantic relationship; x_{P_i} and y_{P_i} denote the x-coordinate and y-coordinate values of the i-th predicted key point, x_{G_i} and y_{G_i} denote the x-coordinate and y-coordinate values of the i-th real key point, and (x_{P_i} − x_{G_i})² + (y_{P_i} − y_{G_i})² is the Euclidean distance of the i-th predicted key point.
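Assuming the point-line distances F(P_i, G_i) have already been computed, the combined loss l = α × l_sa + β × l_mse can be sketched as follows. The function name and the default weights alpha/beta are illustrative assumptions, not values prescribed by the patent.

```python
import numpy as np

def keypoint_loss(pred, gt, pointline_dists, alpha=0.5, beta=0.5):
    """l = alpha * l_sa + beta * l_mse.

    pred, gt: (n, 2) arrays of predicted / real key point coordinates.
    pointline_dists: length-n array of point-line distances F(P_i, G_i).
    alpha, beta: preset point-line and Euclidean distance weights
    (the 0.5/0.5 defaults here are an arbitrary illustrative choice).
    """
    l_sa = np.mean(pointline_dists)                    # point-line distance average
    l_mse = np.mean(np.sum((pred - gt) ** 2, axis=1))  # Euclidean distance average
    return alpha * l_sa + beta * l_mse
```

Raising alpha relative to beta puts more emphasis on the semantic-ambiguity-tolerant point-line term; beta keeps the prediction anchored to the exact annotated coordinates.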
In a second aspect, an embodiment of the present application provides a method for detecting a keypoint, where the method includes:
acquiring a picture to be detected;
inputting the picture to be detected into a preset key point detection model for analysis, and obtaining a plurality of preset key points of the picture to be detected, wherein the preset key point detection model is obtained by training by using any one of the key point detection model training methods in the first aspect.
In a third aspect, an embodiment of the present application provides a keypoint detection model training device, where the device includes:
the system comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring a sample picture, the sample picture corresponds to sample picture marks, the sample picture marks comprise a preset number of real key points, and each real key point corresponds to a position coordinate;
the processing module is used for inputting the sample picture into a preset key point detection model for processing to obtain a preset number of prediction key points, and each prediction key point corresponds to a position coordinate;
the first calculation module is used for calculating the point-line distance of each predicted key point according to the position coordinates of each predicted key point and the position coordinates of each real key point, wherein for any predicted key point, the point-line distance of the predicted key point is the shortest distance in the distances from the predicted key point to a preset virtual straight line, the preset virtual straight line is a straight line passing through the real key points with semantic relation with the predicted key point, the real key points with semantic relation with the predicted key point are target real key points, and the preset virtual straight line also passes through the real key points with the same semantic type as the target real key points and adjacent to the target real key points;
the second calculation module is used for calculating a loss value according to the point-line distance of each predicted key point;
and the training module is used for training the preset key point detection model according to the loss value to obtain a trained key point detection model.
Optionally, the sample picture is a picture with facial features, a picture with human body features, or a picture with gesture features.
Optionally, the apparatus further comprises:
the third calculation module is used for calculating the Euclidean distance between each predicted key point and the real key point with the same semantic meaning according to the position coordinates of each predicted key point and the position coordinates of each real key point to obtain the Euclidean distance of each predicted key point;
the second calculation module is specifically configured to:
and calculating a loss value according to the point-line distances, the Euclidean distances, the preset point-line distance weight and the preset Euclidean distance weight.
Optionally, the second calculating module is specifically configured to:
calculating the average value of all point line distances according to all the point line distances to obtain a point line distance average value;
calculating the average value of all Euclidean distances according to all the Euclidean distances to obtain the Euclidean distance average value;
and calculating a loss value according to the point-line distance average value, the Euclidean distance average value, a preset point-line distance weight and a preset Euclidean distance weight.
Optionally, the second calculating module is specifically configured to:
the loss value is calculated according to the following formulas:

l = α × l_sa + β × l_mse

l_sa = (1/n) Σ_{i=1..n} F(P_i, G_i)

l_mse = (1/n) Σ_{i=1..n} [(x_{P_i} − x_{G_i})² + (y_{P_i} − y_{G_i})²]

wherein α is the preset point-line distance weight, β is the preset Euclidean distance weight, l_sa is the point-line distance average value, l_mse is the Euclidean distance average value, and l is the loss value; n denotes the number of predicted key points, i denotes the i-th predicted key point, P_i denotes the i-th predicted key point, G_i denotes the i-th real key point, and F(P_i, G_i) denotes the point-line distance of P_i, where P_i and G_i have a semantic relationship; x_{P_i} and y_{P_i} denote the x-coordinate and y-coordinate values of the i-th predicted key point, x_{G_i} and y_{G_i} denote the x-coordinate and y-coordinate values of the i-th real key point, and (x_{P_i} − x_{G_i})² + (y_{P_i} − y_{G_i})² is the Euclidean distance of the i-th predicted key point.
In a fourth aspect, an embodiment of the present application provides a keypoint detection apparatus, including:
the acquisition module is used for acquiring a picture to be detected;
the prediction module is configured to input the picture to be detected into a preset key point detection model for analysis, so as to obtain a plurality of preset key points of the picture to be detected, where the preset key point detection model is obtained by training using any one of the key point detection model training methods described in the first aspect.
In a fifth aspect, an embodiment of the present application provides an electronic device, including: a processor, a communication interface, a memory, and a communication bus, wherein:
the processor, the communication interface and the memory complete mutual communication through a communication bus;
the memory is used for storing a computer program;
the processor is configured to implement the method for training the keypoint detection model according to any of the first aspect when executing the program stored in the memory.
In a sixth aspect, an embodiment of the present application provides an electronic device, including: a processor, a communication interface, a memory, and a communication bus, wherein:
the processor, the communication interface and the memory complete mutual communication through a communication bus;
the memory is used for storing a computer program;
the processor is configured to implement the key point detection method according to any one of the second aspects when executing the program stored in the memory.
In a seventh aspect, an embodiment of the present application provides a storage medium, where instructions are stored in the storage medium, and when the instructions are executed on a computer, the instructions cause the computer to perform the method for training a keypoint detection model according to any one of the above first aspects.
In an eighth aspect, an embodiment of the present application provides a storage medium, where instructions are stored, and when the storage medium is run on a computer, the instructions cause the computer to execute the keypoint detection method according to any one of the second aspects.
In a ninth aspect, embodiments of the present application provide a computer program product containing instructions, which when run on a computer, cause the computer to perform the method for training a keypoint detection model according to any of the above first aspects.
In a tenth aspect, embodiments of the present application provide a computer program product containing instructions, which when run on a computer, cause the computer to perform the keypoint detection method according to any one of the second aspects described above.
According to the method, the device, the electronic device, the storage medium, and the computer program product containing instructions provided by the embodiments of the present application, the distances from a predicted key point to the preset virtual straight lines passing through the real key points having a semantic relationship with the predicted key point are calculated, the shortest such distance is taken as the point-line distance of the predicted key point, and a loss value is calculated according to the point-line distances, so that key point errors caused by semantic ambiguity are fully considered. The preset key point detection model is trained according to this loss value, so that these errors are fully considered in the training process, which solves the problem of poor key point detection precision and improves the accuracy of key point detection. Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the present application.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1a is a first schematic diagram of a keypoint detection model training method according to an embodiment of the present application;
FIG. 1b is a second schematic diagram of a keypoint detection model training method according to an embodiment of the present application;
FIG. 1c is a third schematic diagram of a keypoint detection model training method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a keypoint detection method according to an embodiment of the present application;
FIG. 3a is a first schematic diagram of a keypoint detection model training apparatus according to an embodiment of the present application;
FIG. 3b is a second schematic diagram of a keypoint detection model training device according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a keypoint detection apparatus according to an embodiment of the present application;
fig. 5 is a schematic diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In order to solve the problem of poor detection precision of key points and improve the accuracy of key point detection, the application discloses a method for training a key point detection model, which comprises the following steps:
obtaining a sample picture, wherein the sample picture corresponds to sample picture marks, the sample picture marks comprise a preset number of real key points, and each real key point corresponds to a position coordinate;
inputting the sample picture into a preset key point detection model for processing to obtain a preset number of prediction key points, wherein each prediction key point corresponds to a position coordinate;
calculating a point-line distance of each of the predicted key points according to the position coordinates of each of the predicted key points and the position coordinates of each of the real key points, wherein for any one of the predicted key points, the point-line distance of the predicted key point is the shortest distance among the distances from the predicted key point to a preset virtual straight line, the preset virtual straight line is a straight line passing through the real key point having a semantic relationship with the predicted key point, the real key point having a semantic relationship with the predicted key point is a target real key point, and the preset virtual straight line also passes through the real key point having the same semantic type as the target real key point and adjacent to the target real key point;
calculating a loss value according to the point-line distance of each predicted key point;
and training the preset key point detection model according to the loss value to obtain a trained key point detection model.
The method calculates the distances from a predicted key point to the preset virtual straight lines passing through the real key points having a semantic relationship with the predicted key point, takes the shortest distance as the point-line distance of the predicted key point, and calculates a loss value according to the point-line distances, so that key point errors caused by semantic ambiguity are fully considered. The preset key point detection model is trained according to the loss value, so that these errors are fully considered in the training process, which solves the problem of poor key point detection precision and improves the accuracy of key point detection.
An embodiment of the present application provides a method for training a keypoint detection model, and referring to fig. 1a, fig. 1a is a first schematic diagram of the method for training the keypoint detection model in the embodiment of the present application, and includes the following steps:
step 110, obtaining a sample picture, where the sample picture corresponds to sample picture marks, the sample picture marks include a preset number of real key points, and each real key point corresponds to a position coordinate.
The key point detection model training method can be realized through electronic equipment, and specifically, the electronic equipment can be a server and the like.
In the process of training the preset key point detection model, a sample picture needs to be acquired so that the model can be trained on it. The sample picture can be a picture with facial features, a picture with human body features, or a picture with gesture features. The sample picture corresponds to a sample picture mark; the mark includes a preset number of real key points, and each real key point corresponds to a position coordinate. For example, in face recognition, key points of a face need to be detected; the preset sample picture set is then a set of pictures with face features, each sample picture corresponds to a sample picture mark, the mark includes 21 real key points, and each real key point corresponds to a position coordinate. The real key points of a face sample picture are generated from the five sense organs and the facial contour, and the number and positions of the real key points are set according to actual needs; for example, key positions of the eyes, the upper lip, the lower lip, and the cheek are labelled to generate the sample picture mark. The key points of each part are key points with the same type of semantic relationship. Here, the semantics of a key point can simply be regarded as the meaning of the concept it represents; the relationships among those meanings are the interpretation and logical representation of the key points, and the semantic relationship type refers to the category into which a semantic relationship falls under a given classification criterion.
Classifying semantic relationships helps to understand their meaning and characteristics so that they can play their role in information organization. In a face, for example, the key points belonging to one part are called key points of the same semantic relationship type: the key points of the cheek form one type, and the key points of the left eye form another.
And 120, inputting the sample picture into a preset key point detection model for processing to obtain a preset number of prediction key points, wherein each prediction key point corresponds to a position coordinate.
The sample picture is input into the preset key point detection model for processing to obtain a preset number of predicted key points, each corresponding to a position coordinate. For example, if the sample picture is a picture with face features whose mark includes 68 real key points, inputting it into the preset key point detection model yields 68 predicted key points, each with a position coordinate; each predicted key point corresponds one-to-one to one of the 68 real key points and has a semantic relationship with it. When the predicted key points and the real key points are generated, they can be numbered, and the semantic relationship between them is represented by the numbers. For example, if the left eye has 6 real key points, labelled left eye true 1 to left eye true 6, and 6 predicted key points, labelled left eye prediction 1 to left eye prediction 6, then the real key point left eye true 1 and the predicted key point left eye prediction 1 have a semantic relationship, left eye true 2 and left eye prediction 2 have a semantic relationship, and so on.
Step 130, calculating a point-line distance of each of the predicted key points according to the position coordinates of each of the predicted key points and the position coordinates of each of the real key points, wherein for any one of the predicted key points, the point-line distance of the predicted key point is the shortest distance among the distances from the predicted key point to a preset virtual straight line, the preset virtual straight line is a straight line passing through the real key point having a semantic relationship with the predicted key point, the real key point having a semantic relationship with the predicted key point is a target real key point, and the preset virtual straight line also passes through the real key point having the same semantic type as the target real key point and adjacent to the target real key point.
For any one predicted key point, the point-line distance of the predicted key point is the shortest of the distances from the predicted key point to the preset virtual straight lines. A preset virtual straight line represents the spatial relationship between two points: it passes through the real key point having a semantic relationship with the predicted key point, and it also passes through a real key point that has the same semantic relationship type as that real key point and is adjacent to it. The key points of each part are key points with the same semantic relationship type; for example, when the real key points of the cheek are generated, each real key point can be given a serial number in order, and according to these serial numbers, adjacent real key points with the same semantic relationship type are connected to obtain the preset virtual straight lines. For example, a sample picture with face features may have 68 real key points located on the left eye, the right eye, the nose, the upper lip, the lower lip, the left eyebrow, the right eyebrow, and the cheek, with the left eye having 6 key points, the right eye 6 key points, and the cheek 16 key points.
According to the position coordinates and the semantic relationship types of the real key points, adjacent real key points with the same semantic relationship type are connected to obtain the preset virtual straight lines. Having the same semantic relationship type means being real key points of the same part: for example, the real key points on the outer contour of the cheek are of one semantic relationship type, the real key points on the outer contour line of the upper lip are of another, and those on the outer contour line of the lower lip of yet another. Thus, connecting adjacent pairs among the 16 cheek key points gives the preset virtual straight lines of the cheek part, and connecting adjacent pairs among the 6 left-eye key points gives the preset virtual straight lines of the left-eye part.
And then, according to the position coordinates of each predicted key point, the shortest distance from each predicted key point to the preset virtual straight line can be calculated.
For example, the sample picture is a picture with human face features in which the cheek part comprises n key points. Let G_i denote the ith real key point, i ∈ {1, …, n}, n > 3. For 1 < i < n, the real key points adjacent to G_i are G_{i-1} and G_{i+1}; connecting G_i with G_{i-1} gives the straight line X_{i-1,i}, and connecting G_i with G_{i+1} gives the straight line X_{i,i+1}. The point-line distance of each predicted key point is then calculated from its position coordinates. Let P_i denote the ith predicted key point. Since the serial numbers indicate the semantic relationship between predicted and real key points, P_i is the predicted key point corresponding to G_i, that is, P_i and G_i have a semantic relationship. The distances from P_i to the straight lines X_{i-1,i} and X_{i,i+1} are calculated separately, and the shortest of them is the point-line distance of P_i, denoted F(P_i, G_i). When i = 1, the real key point adjacent to G_1 is G_2, so connecting G_1 with G_2 gives the straight line X_{1,2}; from the position coordinates of the predicted key point P_1, which has the same semantics as G_1, the distance from P_1 to X_{1,2} is calculated, and this distance is the point-line distance F(P_1, G_1).
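As an illustration only, the point-line distance F(P_i, G_i) described above can be sketched as follows. This is a minimal NumPy sketch that assumes each part's real key points are stored in serial-number order along an open contour such as the cheek; the function names are illustrative, not from the patent:

```python
import numpy as np

def point_line_distance(p, a, b):
    # Perpendicular distance from point p to the infinite line through a and b:
    # |cross(b - a, p - a)| / |b - a| in 2-D.
    ab = b - a
    return abs(ab[0] * (p[1] - a[1]) - ab[1] * (p[0] - a[0])) / np.linalg.norm(ab)

def semantic_alignment_distance(pred, real):
    # F(P_i, G_i): shortest distance from each predicted key point P_i to the
    # preset virtual straight lines X_{i-1,i} and X_{i,i+1} through its real
    # key point G_i and the adjacent real key points of the same semantic type.
    n = len(real)
    dists = []
    for i in range(n):
        candidates = []
        if i > 0:        # line X_{i-1,i} through G_{i-1} and G_i
            candidates.append(point_line_distance(pred[i], real[i - 1], real[i]))
        if i < n - 1:    # line X_{i,i+1} through G_i and G_{i+1}
            candidates.append(point_line_distance(pred[i], real[i], real[i + 1]))
        dists.append(min(candidates))
    return np.array(dists)
```

For the endpoints (i = 1 and i = n) only one adjacent line exists, which matches the G_1/X_{1,2} case in the text.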
The loss value between the real key points and the predicted key points can then be calculated from the point-line distances. Because the point-line distance is the shortest distance from a predicted key point to a preset virtual straight line passing through the real key point having a semantic relationship with it, the key point error caused by semantic ambiguity is fully taken into account. Training the preset key point detection model according to this loss value therefore incorporates that error into the training process, alleviating the problem of poor key point detection precision and improving the accuracy of key point detection.
And step 140, calculating a loss value according to the point-line distance.
The sum of the point-line distances is calculated, the average point-line distance is obtained from that sum, and this average is used as the loss value, so that the key point error caused by semantic ambiguity is fully considered. Training the preset key point detection model according to this loss value incorporates that error into the training process, alleviating the problem of poor key point detection precision and improving the accuracy of key point detection.
And step 150, training the preset key point detection model according to the loss value to obtain a trained key point detection model.
The parameters of the preset key point detection model are adjusted according to the loss value, and training continues with further sample pictures until a preset training end condition is met (for example, that 500 sample pictures have been trained, or that the loss value does not exceed a preset threshold), yielding the trained key point detection model. Training the preset key point detection model according to this loss value ensures that the key point error caused by semantic ambiguity is fully considered during training, alleviating the problem of poor key point detection precision and improving the accuracy of key point detection. For the specific training procedure of the preset key point detection model, reference may be made to model training methods in the related art.
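As a framework-agnostic sketch of step 150, the loop below illustrates the two example end conditions (a fixed number of trained sample pictures, or the loss value not exceeding a preset threshold). `model.predict` and `model.update` are hypothetical placeholders for the forward pass and the parameter adjustment of whatever training framework is actually used:

```python
def train(model, samples, loss_fn, lr=1e-3, max_samples=500, loss_threshold=0.01):
    # Adjust model parameters from the loss and stop when a preset training
    # end condition is met: max_samples sample pictures trained, or the loss
    # value not exceeding loss_threshold.
    trained = 0
    for picture, real_key_points in samples:
        predicted = model.predict(picture)          # forward pass (placeholder)
        loss = loss_fn(predicted, real_key_points)  # e.g. point-line-based loss
        model.update(loss, lr)                      # parameter adjustment (placeholder)
        trained += 1
        if trained >= max_samples or loss <= loss_threshold:
            break
    return model
```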
In this method, the distances from a predicted key point to the straight lines passing through the real key point having a semantic relationship with it are calculated, the shortest such distance is taken as the point-line distance of the predicted key point, and the loss value is calculated from the point-line distances, so that the key point error caused by semantic ambiguity is fully considered. Training the preset key point detection model according to this loss value incorporates that error into the training process, alleviating the problem of poor key point detection precision and improving the accuracy of key point detection.
In a possible implementation manner, the sample picture is a picture with facial features, a picture with human body features, or a picture with gesture features.
The picture with facial features can be a picture with human face features, a picture with animal face features, and the like. A sample picture with facial features, human body features, or gesture features is used so that the trained model can detect targets with facial features, human body features, or gesture features, respectively.
Referring to fig. 1b, fig. 1b is a second schematic diagram of a method for training a keypoint detection model according to an embodiment of the present application, and in a possible implementation manner, before the step of calculating a loss value according to each of the above-mentioned line distances of points, the method further includes:
step 160, calculating the Euclidean distance between each predicted key point and the real key point with the same semantic meaning according to the position coordinates of each predicted key point and the position coordinates of each real key point to obtain the Euclidean distance of each predicted key point;
the calculating a loss value according to each of the point-line distances includes:
and step 141, calculating a loss value according to the point-line distances, the Euclidean distances, the preset point-line distance weight and the preset Euclidean distance weight.
Calculate the Euclidean distance between each predicted key point and the real key point with the same semantics according to their position coordinates. For example, let x_i^P denote the x coordinate of the ith predicted key point, y_i^P its y coordinate, x_i^G the x coordinate of the ith real key point, and y_i^G its y coordinate. Then the Euclidean distance of the ith predicted key point is

sqrt((x_i^P − x_i^G)^2 + (y_i^P − y_i^G)^2)
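The per-key-point Euclidean distance above can be computed in vectorized form; a minimal NumPy sketch (the function name is illustrative):

```python
import numpy as np

def euclidean_distances(pred, real):
    # sqrt((x_i^P - x_i^G)^2 + (y_i^P - y_i^G)^2) for each key point i,
    # i.e. the straight-line distance between each predicted key point and
    # the real key point with the same semantics.
    pred = np.asarray(pred, dtype=float)
    real = np.asarray(real, dtype=float)
    return np.linalg.norm(pred - real, axis=1)
```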
A loss value is calculated according to the point-line distances, the Euclidean distances, the preset point-line distance weight, and the preset Euclidean distance weight: the preset point-line distance weight is applied as a coefficient to the point-line distance term to obtain a weighted point-line term, the preset Euclidean distance weight is likewise applied to the Euclidean distance term to obtain a weighted Euclidean term, and the loss value is the sum of the two weighted terms. In this way, the loss value used to adjust the parameters of the preset key point detection model accounts both for the point-line distance from each predicted key point to the preset virtual straight line passing through the real key point having a semantic relationship with it, and for the point-to-point distance between the predicted key point and that real key point, so that the key point error caused by semantic ambiguity is fully considered.
Alternatively, the loss value is calculated according to the following formula:

l = α × l_sa + β × l_mse

where

l_sa = (1/n) × Σ_{i=1}^{n} F(P_i, G_i)

l_mse = (1/n) × Σ_{i=1}^{n} sqrt((x_i^P − x_i^G)^2 + (y_i^P − y_i^G)^2)

Here α is the preset point-line distance weight, β is the preset Euclidean distance weight, l_sa is the point-line distance average, l_mse is the Euclidean distance average, and l is the loss value; n denotes the number of predicted key points, i denotes the ith predicted key point, P_i denotes the ith predicted key point, G_i denotes the ith real key point, and F(P_i, G_i) denotes the point-line distance of P_i, where P_i and G_i have a semantic relationship; x_i^P and y_i^P denote the x and y coordinates of the ith predicted key point, x_i^G and y_i^G denote the x and y coordinates of the ith real key point, and sqrt((x_i^P − x_i^G)^2 + (y_i^P − y_i^G)^2) is the Euclidean distance of the ith predicted key point.
An embodiment of the present application provides a method for training a key point detection model. Referring to fig. 1c, which is a third schematic diagram of the key point detection model training method according to an embodiment of the present application, in a possible implementation, the calculating a loss value according to each point-line distance, each Euclidean distance, a preset point-line distance weight, and a preset Euclidean distance weight includes:
step 1411, calculating an average value of all point line distances according to the point line distances to obtain a point line distance average value;
step 1412, calculating the average value of all the Euclidean distances according to the Euclidean distances to obtain the Euclidean distance average value;
and 1413, calculating a loss value according to the point-line distance average value, the Euclidean distance average value, the preset point-line distance weight and the preset Euclidean distance weight.
For example, if the cheek part comprises n key points, the average of all point-line distances is calculated to obtain the point-line distance average, denoted l_sa:

l_sa = (1/n) × Σ_{i=1}^{n} F(P_i, G_i)

The average of all Euclidean distances is calculated to obtain the Euclidean distance average, denoted l_mse:

l_mse = (1/n) × Σ_{i=1}^{n} sqrt((x_i^P − x_i^G)^2 + (y_i^P − y_i^G)^2)

With the preset point-line distance weight α and the preset Euclidean distance weight β, the loss value l is calculated from the point-line distance average, the Euclidean distance average, and the two weights as l = α × l_sa + β × l_mse.
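Putting the pieces together, the weighted loss l = α × l_sa + β × l_mse can be sketched as follows (NumPy; the default weight values shown are illustrative, not values from the patent):

```python
import numpy as np

def combined_loss(point_line_dists, euclidean_dists, alpha=0.5, beta=0.5):
    # l = alpha * l_sa + beta * l_mse, where l_sa is the point-line distance
    # average and l_mse the Euclidean distance average over the n key points.
    l_sa = float(np.mean(point_line_dists))
    l_mse = float(np.mean(euclidean_dists))
    return alpha * l_sa + beta * l_mse
```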
An embodiment of the present application provides a method for detecting a keypoint, referring to fig. 2, where fig. 2 is a schematic diagram of the method for detecting a keypoint, and the method includes the following steps:
step 210, obtaining a picture to be detected.
The key point detection method of the embodiment of the application can be implemented by electronic equipment, and specifically, the electronic equipment can be a server and the like.
And acquiring a picture to be detected so as to input the picture to be detected into a preset key point detection model to obtain a plurality of preset key points of the picture to be detected.
Step 220, inputting the picture to be detected into a preset key point detection model to obtain a plurality of preset key points of the picture to be detected, wherein the preset key point detection model is obtained by training according to any one of the key point detection model training methods in the embodiments.
After the preset key point detection model has been trained by any one of the key point detection model training methods in the embodiments above, the picture to be detected is input into the preset key point detection model for analysis, and a plurality of preset key points of the picture to be detected are obtained. For example, if the sample pictures are face pictures, then after training, the preset key point detection model can predict key points on face pictures; inputting the picture to be detected into the model for analysis yields a plurality of preset key points of the picture to be detected, realizing key point prediction for the picture to be detected.
An embodiment of the present application further provides a device, referring to fig. 3a, where fig. 3a is a first schematic diagram of a keypoint detection model training device according to an embodiment of the present application, where the device includes:
an obtaining module 310, configured to obtain a sample picture, where the sample picture corresponds to a sample picture mark, the sample picture mark includes a preset number of real key points, and each of the real key points corresponds to a position coordinate;
a processing module 320, configured to input the sample picture into a preset key point detection model for processing, so as to obtain a preset number of prediction key points, where each prediction key point corresponds to a position coordinate;
a first calculating module 330, configured to calculate a point-to-line distance of each of the predicted key points according to a position coordinate of each of the predicted key points and a position coordinate of each of the real key points, where, for any one of the predicted key points, the point-to-line distance of the predicted key point is a shortest distance among distances from the predicted key point to a preset virtual straight line, the preset virtual straight line is a straight line passing through the real key points having a semantic relationship with the predicted key point, the real key points having a semantic relationship with the predicted key point are target real key points, and the preset virtual straight line also passes through the real key points having a semantic type same as that of the target real key points and adjacent to the target real key points;
a second calculating module 340, configured to calculate a loss value according to each of the dot-line distances;
and the training module 350 is configured to train the preset keypoint detection model according to the loss value to obtain a trained keypoint detection model.
Referring to fig. 3b, fig. 3b is a second schematic diagram of the keypoint detection model training apparatus according to the embodiment of the present application, and in a possible implementation manner, the sample picture is a picture with facial features, a picture with human body features, or a picture with gesture features.
In a possible embodiment, the above apparatus further comprises:
a third calculating module 360, configured to calculate a euclidean distance between each predicted key point and each real key point with the same semantic meaning according to the position coordinates of each predicted key point and the position coordinates of each real key point, so as to obtain a euclidean distance between each predicted key point;
the second calculating module 340 is specifically configured to:
and calculating a loss value according to the point-line distances, the Euclidean distances, the preset point-line distance weight and the preset Euclidean distance weight.
In a possible implementation manner, the second calculating module 340 is specifically configured to:
calculating the average value of all the point line distances according to the point line distances to obtain a point line distance average value;
calculating the average value of all Euclidean distances according to all the Euclidean distances to obtain the Euclidean distance average value;
and calculating a loss value according to the point-line distance average value, the Euclidean distance average value, the preset point-line distance weight and the preset Euclidean distance weight.
In a possible implementation manner, the second calculating module 340 is specifically configured to:
the loss value is calculated according to the following formula:

l = α × l_sa + β × l_mse

where

l_sa = (1/n) × Σ_{i=1}^{n} F(P_i, G_i)

l_mse = (1/n) × Σ_{i=1}^{n} sqrt((x_i^P − x_i^G)^2 + (y_i^P − y_i^G)^2)

and where α is the preset point-line distance weight, β is the preset Euclidean distance weight, l_sa is the point-line distance average, l_mse is the Euclidean distance average, and l is the loss value; n denotes the number of predicted key points, i denotes the ith predicted key point, P_i denotes the ith predicted key point, G_i denotes the ith real key point, and F(P_i, G_i) denotes the point-line distance of P_i, where P_i and G_i have a semantic relationship; x_i^P and y_i^P denote the x and y coordinates of the ith predicted key point, x_i^G and y_i^G denote the x and y coordinates of the ith real key point, and sqrt((x_i^P − x_i^G)^2 + (y_i^P − y_i^G)^2) denotes the Euclidean distance of the ith predicted key point.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
An embodiment of the present application further provides an apparatus, referring to fig. 4, where fig. 4 is a schematic diagram of a keypoint detection apparatus according to an embodiment of the present application, where the apparatus includes:
the acquisition module 410 is used for acquiring a picture to be detected;
the prediction module 420 is configured to input the picture to be detected into a preset keypoint detection model for analysis, so as to obtain a plurality of preset keypoints of the picture to be detected, where the preset keypoint detection model is obtained by training using any one of the keypoint detection model training methods in the embodiments.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
An embodiment of the present application further provides an electronic device, referring to fig. 5, where fig. 5 is a schematic diagram of the electronic device according to the embodiment of the present application, and the electronic device includes: a processor 510, a communication interface 520, a memory 530, and a communication bus 540, wherein the processor 510, the communication interface 520, and the memory 530 communicate with each other via the communication bus 540,
the memory 530 for storing a computer program;
the processor 510 is configured to implement the following steps when executing the computer program stored in the memory 530:
obtaining a sample picture, wherein the sample picture corresponds to sample picture marks, the sample picture marks comprise a preset number of real key points, and each real key point corresponds to a position coordinate;
inputting the sample picture into a preset key point detection model for processing to obtain a preset number of prediction key points, wherein each prediction key point corresponds to a position coordinate;
calculating a point-line distance of each of the predicted key points according to the position coordinates of each of the predicted key points and the position coordinates of each of the real key points, wherein for any one of the predicted key points, the point-line distance of the predicted key point is the shortest distance among the distances from the predicted key point to a preset virtual straight line, the preset virtual straight line is a straight line passing through the real key point having a semantic relationship with the predicted key point, the real key point having a semantic relationship with the predicted key point is a target real key point, and the preset virtual straight line also passes through the real key point having the same semantic type as the target real key point and adjacent to the target real key point;
calculating a loss value according to the line distance of each point;
and training the preset key point detection model according to the loss value to obtain a trained key point detection model.
Optionally, the processor 510, when configured to execute the program stored in the memory 530, may further implement any of the above-described methods for training the keypoint detection model.
An embodiment of the present application further provides an electronic device, including: a processor, a communication interface, a memory and a communication bus, wherein, the processor, the communication interface and the memory complete the mutual communication through the communication bus,
the memory is used for storing computer programs;
the processor is configured to implement the following steps when executing the computer program stored in the memory:
acquiring a picture to be detected;
inputting the picture to be detected into a preset key point detection model to obtain a plurality of preset key points of the picture to be detected, wherein the preset key point detection model is obtained by training by using the key point detection model training method according to any one of claims 1 to 5.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; but may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component.
In an embodiment of the present application, there is further provided a storage medium having instructions stored therein, which when executed on a computer, cause the computer to perform any one of the above-mentioned methods for training a keypoint detection model.
In an embodiment of the present application, there is also provided a storage medium having instructions stored therein, which when run on a computer, cause the computer to perform any of the above-described keypoint detection methods in the above-described embodiments.
In an embodiment of the present application, there is further provided a computer program product containing instructions, which when run on a computer, cause the computer to perform any one of the above-mentioned methods for training a keypoint detection model.
In an embodiment of the present application, there is also provided a computer program product containing instructions, which when run on a computer, cause the computer to perform any of the above-mentioned keypoint detection methods in the above-mentioned embodiments.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions described above in accordance with the embodiments of the invention may be generated, in whole or in part, when the computer program instructions described above are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website site, computer, server, or data center to another website site, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that includes one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It should be noted that, in this document, the technical features in the various alternatives can be combined to form the scheme as long as the technical features are not contradictory, and the scheme is within the scope of the disclosure of the present application. Relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the same element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the embodiments of the apparatus, the electronic device, and the storage medium, since they are substantially similar to the method embodiments, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiments.
The above description is only a preferred embodiment of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application are included in the protection scope of the present application.

Claims (16)

1. A method for training a keypoint detection model, the method comprising:
acquiring a sample picture, wherein the sample picture corresponds to sample picture marks, the sample picture marks comprise a preset number of real key points, and each real key point corresponds to a position coordinate;
inputting the sample picture into a preset key point detection model for processing to obtain a preset number of prediction key points, wherein each prediction key point corresponds to a position coordinate;
calculating the point-line distance of each predicted key point according to the position coordinates of each predicted key point and the position coordinates of each real key point, wherein for any predicted key point, the point-line distance of the predicted key point is the shortest distance in the distances from the predicted key point to a preset virtual straight line, the preset virtual straight line is a straight line passing through the real key point with semantic relation with the predicted key point, the real key point with semantic relation with the predicted key point is a target real key point, and the preset virtual straight line also passes through the real key point which has the same semantic type as the target real key point and is adjacent to the target real key point;
calculating a loss value according to the line distance of each point;
and training the preset key point detection model according to the loss value to obtain a trained key point detection model.
2. The method according to claim 1, wherein the sample picture is a picture with facial features, a picture with human features or a picture with gesture features.
3. The method of claim 1 or 2, wherein the step of calculating a loss value from each of the dot-line distances is preceded by the method further comprising:
calculating the Euclidean distance between each predicted key point and a real key point with the same semantic meaning according to the position coordinate of each predicted key point and the position coordinate of each real key point to obtain the Euclidean distance of each predicted key point;
the calculating a loss value according to each point-line distance includes:
and calculating a loss value according to the point-line distances, the Euclidean distances, the preset point-line distance weight and the preset Euclidean distance weight.
4. The method of claim 3, wherein calculating a loss value based on each of the dot-to-line distances, each of the Euclidean distances, a preset dot-to-line distance weight, and a preset Euclidean distance weight comprises:
calculating the average value of all point line distances according to all the point line distances to obtain a point line distance average value;
calculating the average value of all Euclidean distances according to all the Euclidean distances to obtain the Euclidean distance average value;
and calculating a loss value according to the point-line distance average value, the Euclidean distance average value, a preset point-line distance weight and a preset Euclidean distance weight.
5. The method of claim 4, wherein calculating a loss value based on the point-line distance average, the Euclidean distance average, a preset point-line distance weight, and a preset Euclidean distance weight comprises:
the loss value is calculated according to the following formula:

l = α × l_sa + β × l_mse

wherein,

l_sa = (1/n) × Σ_{i=1}^{n} F(P_i, G_i)

wherein α is the preset point-line distance weight, β is the preset Euclidean distance weight, l_sa is the point-line distance average, l_mse is the Euclidean distance average, and l is the loss value; n denotes the number of predicted key points, i denotes the i-th predicted key point, P_i denotes the i-th predicted key point, G_i denotes the i-th real key point, and F(P_i, G_i) denotes the point-line distance of P_i, wherein P_i and G_i have a semantic relationship; x_i^P denotes the x-coordinate value of the i-th predicted key point, y_i^P denotes the y-coordinate value of the i-th predicted key point, x_i^G denotes the x-coordinate value of the i-th real key point, y_i^G denotes the y-coordinate value of the i-th real key point, and

d_i = √( (x_i^P − x_i^G)² + (y_i^P − y_i^G)² )

denotes the Euclidean distance of the i-th predicted key point.
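The weighted loss of claim 5 can be sketched as follows (an illustrative reconstruction, not code from the patent; NumPy, the function names, the `adjacency` mapping from each keypoint index to its adjacent same-type keypoint, and the default weights are assumptions):

```python
import numpy as np

def point_line_distance(p, g, g_adj):
    """Shortest distance from predicted point p to the preset virtual
    straight line through real keypoint g and its adjacent same-type
    real keypoint g_adj (F(P_i, G_i) in the claims)."""
    p, g, g_adj = np.asarray(p, float), np.asarray(g, float), np.asarray(g_adj, float)
    d = g_adj - g
    norm = np.linalg.norm(d)
    if norm == 0:  # degenerate line: fall back to point-to-point distance
        return np.linalg.norm(p - g)
    # perpendicular distance via the 2-D cross product
    return abs(d[0] * (p[1] - g[1]) - d[1] * (p[0] - g[0])) / norm

def keypoint_loss(pred, real, adjacency, alpha=0.5, beta=0.5):
    """l = alpha * l_sa + beta * l_mse, where l_sa is the mean point-line
    distance and l_mse the mean Euclidean distance over all keypoints."""
    pred, real = np.asarray(pred, float), np.asarray(real, float)
    l_sa = np.mean([point_line_distance(pred[i], real[i], real[adjacency[i]])
                    for i in range(len(pred))])
    l_mse = np.mean(np.linalg.norm(pred - real, axis=1))
    return alpha * l_sa + beta * l_mse
```

Note that despite the name, l_mse here is the mean Euclidean distance as the claims define it, not a squared error; a training implementation would compute the same quantities in an autodiff framework so the loss is backpropagated through the model.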
6. A method of keypoint detection, the method comprising:
acquiring a picture to be detected;
inputting the picture to be detected into a preset key point detection model to obtain a plurality of preset key points of the picture to be detected, wherein the preset key point detection model is obtained by training by using the key point detection model training method of any one of claims 1 to 5.
7. A keypoint detection model training device, characterized in that it comprises:
the acquisition module is used for acquiring a sample picture, wherein the sample picture corresponds to a sample picture annotation, the sample picture annotation comprises a preset number of real key points, and each real key point corresponds to a position coordinate;
the processing module is used for inputting the sample picture into a preset key point detection model for processing to obtain a preset number of prediction key points, and each prediction key point corresponds to a position coordinate;
the first calculation module is used for calculating the point-line distance of each predicted key point according to the position coordinates of each predicted key point and the position coordinates of each real key point, wherein for any predicted key point, the point-line distance of the predicted key point is the shortest distance from the predicted key point to a preset virtual straight line; the preset virtual straight line is a straight line passing through the real key point having a semantic relationship with the predicted key point, that real key point being the target real key point, and the preset virtual straight line also passes through a real key point that is of the same semantic type as the target real key point and adjacent to the target real key point;
the second calculation module is used for calculating a loss value according to the point-line distance of each predicted key point;
and the training module is used for training the preset key point detection model according to the loss value to obtain a trained key point detection model.
8. The apparatus of claim 7, wherein the sample picture is a picture with facial features, a picture with human features, or a picture with gesture features.
9. The apparatus of claim 7 or 8, further comprising:
the third calculation module is used for calculating the Euclidean distance between each predicted key point and the real key point with the same semantic meaning according to the position coordinates of each predicted key point and the position coordinates of each real key point to obtain the Euclidean distance of each predicted key point;
the second calculation module is specifically configured to:
and calculating a loss value according to the point-line distances, the Euclidean distances, the preset point-line distance weight and the preset Euclidean distance weight.
10. The apparatus of claim 9, wherein the second computing module is specifically configured to:
calculating the average of all the point-line distances to obtain a point-line distance average;
calculating the average of all the Euclidean distances to obtain a Euclidean distance average;
and calculating the loss value according to the point-line distance average, the Euclidean distance average, the preset point-line distance weight and the preset Euclidean distance weight.
11. The apparatus of claim 10, wherein the second computing module is specifically configured to:
the loss value is calculated according to the following formula:

l = α × l_sa + β × l_mse

wherein,

l_sa = (1/n) × Σ_{i=1}^{n} F(P_i, G_i)

wherein α is the preset point-line distance weight, β is the preset Euclidean distance weight, l_sa is the point-line distance average, l_mse is the Euclidean distance average, and l is the loss value; n denotes the number of predicted key points, i denotes the i-th predicted key point, P_i denotes the i-th predicted key point, G_i denotes the i-th real key point, and F(P_i, G_i) denotes the point-line distance of P_i, wherein P_i and G_i have a semantic relationship; x_i^P denotes the x-coordinate value of the i-th predicted key point, y_i^P denotes the y-coordinate value of the i-th predicted key point, x_i^G denotes the x-coordinate value of the i-th real key point, y_i^G denotes the y-coordinate value of the i-th real key point, and

d_i = √( (x_i^P − x_i^G)² + (y_i^P − y_i^G)² )

denotes the Euclidean distance of the i-th predicted key point.
12. A keypoint detection device, the device comprising:
the acquisition module is used for acquiring a picture to be detected;
a prediction module, configured to input the picture to be detected into a preset keypoint detection model to obtain a plurality of preset keypoints of the picture to be detected, where the preset keypoint detection model is obtained by training using the keypoint detection model training method according to any one of claims 1 to 5.
13. An electronic device, comprising: the system comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory are communicated with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the method of training a keypoint detection model according to any one of claims 1 to 5 when executing a program stored in a memory.
14. An electronic device, comprising: the system comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory are communicated with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the keypoint detection method of claim 6 when executing a program stored on a memory.
15. A storage medium having stored therein a computer program which, when executed by a processor, implements the keypoint detection model training method of any one of claims 1 to 5.
16. A storage medium having stored therein a computer program which, when executed by a processor, implements the keypoint detection method of claim 6.
CN201911346309.5A 2019-12-24 2019-12-24 Key point detection model training method and device, electronic equipment and storage medium Active CN111126268B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911346309.5A CN111126268B (en) 2019-12-24 2019-12-24 Key point detection model training method and device, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN111126268A true CN111126268A (en) 2020-05-08
CN111126268B CN111126268B (en) 2023-04-25

Family

ID=70501951

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911346309.5A Active CN111126268B (en) 2019-12-24 2019-12-24 Key point detection model training method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111126268B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107622252A (en) * 2017-09-29 2018-01-23 百度在线网络技术(北京)有限公司 information generating method and device
CN108305283A (en) * 2018-01-22 2018-07-20 清华大学 Human bodys' response method and device based on depth camera and basic form
WO2019228040A1 (en) * 2018-05-30 2019-12-05 杭州海康威视数字技术股份有限公司 Facial image scoring method and camera
CN109614867A (en) * 2018-11-09 2019-04-12 北京市商汤科技开发有限公司 Human body critical point detection method and apparatus, electronic equipment, computer storage medium
CN109948590A (en) * 2019-04-01 2019-06-28 启霖世纪(北京)教育科技有限公司 Pose problem detection method and device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HOEL KERVADEC ET AL.: "Boundary loss for highly unbalanced segmentation" *
ISMAIL ELEZI ET AL.: "The Group Loss for Deep Metric Learning" *
JING Chenkai; SONG Tao; ZHUANG Lei; LIU Gang; WANG Le; LIU Kailun: "A survey of face recognition technology based on deep convolutional neural networks" *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113743157A (en) * 2020-05-28 2021-12-03 北京沃东天骏信息技术有限公司 Key point detection model training method and device and key point detection method and device
CN112115894A (en) * 2020-09-24 2020-12-22 北京达佳互联信息技术有限公司 Training method and device for hand key point detection model and electronic equipment
CN112115894B (en) * 2020-09-24 2023-08-25 北京达佳互联信息技术有限公司 Training method and device of hand key point detection model and electronic equipment
CN114550207A (en) * 2022-01-17 2022-05-27 北京新氧科技有限公司 Method and device for detecting key points of neck and method and device for training detection model
CN114550207B (en) * 2022-01-17 2023-01-17 北京新氧科技有限公司 Method and device for detecting key points of neck and method and device for training detection model

Also Published As

Publication number Publication date
CN111126268B (en) 2023-04-25

Similar Documents

Publication Publication Date Title
CN108898086B (en) Video image processing method and device, computer readable medium and electronic equipment
WO2022027912A1 (en) Face pose recognition method and apparatus, terminal device, and storage medium.
WO2020207190A1 (en) Three-dimensional information determination method, three-dimensional information determination device, and terminal apparatus
CN111126268B (en) Key point detection model training method and device, electronic equipment and storage medium
WO2022166243A1 (en) Method, apparatus and system for detecting and identifying pinching gesture
WO2020244075A1 (en) Sign language recognition method and apparatus, and computer device and storage medium
CN112633084A (en) Face frame determination method and device, terminal equipment and storage medium
CN112085701A (en) Face ambiguity detection method and device, terminal equipment and storage medium
WO2021217937A1 (en) Posture recognition model training method and device, and posture recognition method and device
JP2022521540A (en) Methods and systems for object tracking using online learning
US20220327740A1 (en) Registration method and registration apparatus for autonomous vehicle
TW202201275A (en) Device and method for scoring hand work motion and storage medium
CN111274852B (en) Target object key point detection method and device
CN110956131A (en) Single-target tracking method, device and system
CN111507244B (en) BMI detection method and device and electronic equipment
CN116884045A (en) Identity recognition method, identity recognition device, computer equipment and storage medium
WO2022162844A1 (en) Work estimation device, work estimation method, and work estimation program
CN110934565B (en) Method and device for measuring pupil diameter and computer readable storage medium
CN115620254A (en) Method, device, equipment and storage medium for evaluating lane line detection
CN115527083A (en) Image annotation method and device and electronic equipment
CN111368792A (en) Characteristic point mark injection molding type training method and device, electronic equipment and storage medium
CN112633143A (en) Image processing system, method, head-mounted device, processing device, and storage medium
CN110765918A (en) MFANet-based vSLAM rapid loop detection method and device
CN112630736A (en) Method, device and equipment for determining parameters of roadside radar and storage medium
CN111368624A (en) Loop detection method and device based on generation of countermeasure network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant