CN111126268A - Key point detection model training method and device, electronic equipment and storage medium - Google Patents


Publication number
CN111126268A
Authority: CN (China)
Prior art keywords: key point, preset, point, predicted, real
Legal status: Granted; Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN201911346309.5A
Other languages: Chinese (zh)
Other versions: CN111126268B (en)
Inventor: 钟韬
Current assignee: Beijing QIYI Century Science and Technology Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original assignee: Beijing QIYI Century Science and Technology Co Ltd
Events: application filed by Beijing QIYI Century Science and Technology Co Ltd; priority to CN201911346309.5A; publication of CN111126268A; application granted; publication of CN111126268B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161: Detection; Localisation; Normalisation
    • G06V40/168: Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The embodiments of the present application provide a key point detection model training method and device, an electronic device, and a storage medium, relating to the field of computer technology.

Description

Key point detection model training method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for training a keypoint detection model, an electronic device, and a storage medium.
Background
With the development of computer vision technology, target detection is more and more widely applied, with broad prospects in numerous fields such as face recognition, security monitoring, and dynamic tracking. Key points are parts of a target that carry stable and important semantic information; target key point localization means that, after a specific target is detected, its key points are located and their position information is output. Target key point localization has very important practical application value in many fields such as target attribute analysis, posture recognition, and posture correction.
For example, most face applications need to detect the key points of a face accurately. The main principle is to locate the positions of all key points of the face in an input face picture; common configurations use 21, 68, 106, or 240 points. The loss function used when training the key point detection model is the main quantity for measuring whether the predicted key points are accurate; commonly used loss functions include the Mean Squared Error (MSE), the Mean Absolute Error (MAE), and the Wing loss.
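For reference, the three commonly used loss functions named above can be sketched as follows. This is an illustrative sketch, not code from the patent; the Wing loss parameters `w=10.0` and `eps=2.0` are typical defaults from the literature, not values specified here.

```python
import numpy as np

def mse(pred, gt):
    """Mean Squared Error over all coordinates."""
    return np.mean((pred - gt) ** 2)

def mae(pred, gt):
    """Mean Absolute Error over all coordinates."""
    return np.mean(np.abs(pred - gt))

def wing_loss(pred, gt, w=10.0, eps=2.0):
    """Wing loss: logarithmic near zero error, L1-like for large errors."""
    x = np.abs(pred - gt)
    c = w - w * np.log(1.0 + w / eps)  # constant that joins the two branches at |x| = w
    return np.mean(np.where(x < w, w * np.log(1.0 + x / eps), x - c))
```

In this sketch `pred` and `gt` are numpy arrays of predicted and ground-truth key point coordinates of the same shape.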
The inventor found in research that these loss functions do not take into account the error of key points at positions with semantic ambiguity, so the key point positions predicted by the trained key point detection model contain errors and are of poor precision.
Disclosure of Invention
An object of the embodiments of the present application is to provide a method and an apparatus for training a keypoint detection model, an electronic device, and a storage medium, so as to solve the problem of poor accuracy of keypoint detection and improve the accuracy of keypoint detection.
The specific technical scheme is as follows:
in a first aspect, an embodiment of the present application provides a method for training a keypoint detection model, where the method includes:
acquiring a sample picture, wherein the sample picture corresponds to sample picture marks, the sample picture marks comprise a preset number of real key points, and each real key point corresponds to a position coordinate;
inputting the sample picture into a preset key point detection model for processing to obtain a preset number of prediction key points, wherein each prediction key point corresponds to a position coordinate;
calculating the point-line distance of each predicted key point according to the position coordinates of each predicted key point and the position coordinates of each real key point, wherein for any predicted key point, the point-line distance of the predicted key point is the shortest distance in the distances from the predicted key point to a preset virtual straight line, the preset virtual straight line is a straight line passing through the real key point with semantic relation with the predicted key point, the real key point with semantic relation with the predicted key point is a target real key point, and the preset virtual straight line also passes through the real key point which has the same semantic type as the target real key point and is adjacent to the target real key point;
calculating a loss value according to the point-line distance of each predicted key point;
and training the preset key point detection model according to the loss value to obtain a trained key point detection model.
Optionally, the sample picture is a picture with facial features, a picture with human body features, or a picture with gesture features.
Optionally, before the step of calculating the loss value according to each dot-line distance, the method further includes:
calculating the Euclidean distance between each predicted key point and a real key point with the same semantic meaning according to the position coordinate of each predicted key point and the position coordinate of each real key point to obtain the Euclidean distance of each predicted key point;
the calculating a loss value according to each point-line distance includes:
and calculating a loss value according to the point-line distances, the Euclidean distances, the preset point-line distance weight and the preset Euclidean distance weight.
Optionally, the calculating a loss value according to each of the point-line distances, each of the euclidean distances, the preset point-line distance weight, and the preset euclidean distance weight includes:
calculating the average value of all point line distances according to all the point line distances to obtain a point line distance average value;
calculating the average value of all Euclidean distances according to all the Euclidean distances to obtain the Euclidean distance average value;
and calculating a loss value according to the point-line distance average value, the Euclidean distance average value, a preset point-line distance weight and a preset Euclidean distance weight.
Optionally, the calculating a loss value according to the point-line distance average value, the euclidean distance average value, the preset point-line distance weight, and the preset euclidean distance weight includes:
the loss value is calculated according to the following formulas:

l = α × l_sa + β × l_mse

l_sa = (1/n) Σ_{i=1..n} F(P_i, G_i)

l_mse = (1/n) Σ_{i=1..n} [(x_{P_i} − x_{G_i})² + (y_{P_i} − y_{G_i})²]

wherein α is the preset point-line distance weight, β is the preset Euclidean distance weight, l_sa is the point-line distance average value, l_mse is the Euclidean distance average value, and l is the loss value; n denotes the number of predicted key points, i denotes the i-th predicted key point, P_i denotes the i-th predicted key point, G_i denotes the i-th real key point, and F(P_i, G_i) denotes the point-line distance of P_i, where P_i and G_i have a semantic relationship; x_{P_i} and y_{P_i} denote the x-coordinate and y-coordinate values of the i-th predicted key point, x_{G_i} and y_{G_i} denote the x-coordinate and y-coordinate values of the i-th real key point, and (x_{P_i} − x_{G_i})² + (y_{P_i} − y_{G_i})² is the Euclidean distance of the i-th predicted key point.
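Assuming the point-line distances F(P_i, G_i) have already been computed, the combined loss l = α × l_sa + β × l_mse can be sketched as follows. The function name and the default weights alpha/beta are illustrative assumptions, not values prescribed by the patent.

```python
import numpy as np

def keypoint_loss(pred, gt, pointline_dists, alpha=0.5, beta=0.5):
    """l = alpha * l_sa + beta * l_mse.

    pred, gt: (n, 2) arrays of predicted / real key point coordinates.
    pointline_dists: length-n array of point-line distances F(P_i, G_i).
    alpha, beta: preset point-line and Euclidean distance weights
    (the 0.5/0.5 defaults here are an arbitrary illustrative choice).
    """
    l_sa = np.mean(pointline_dists)                    # point-line distance average
    l_mse = np.mean(np.sum((pred - gt) ** 2, axis=1))  # Euclidean distance average
    return alpha * l_sa + beta * l_mse
```

Raising alpha relative to beta puts more emphasis on the semantic-ambiguity-tolerant point-line term; beta keeps the prediction anchored to the exact annotated coordinates.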
In a second aspect, an embodiment of the present application provides a method for detecting a keypoint, where the method includes:
acquiring a picture to be detected;
inputting the picture to be detected into a preset key point detection model for analysis, and obtaining a plurality of preset key points of the picture to be detected, wherein the preset key point detection model is obtained by training by using any one of the key point detection model training methods in the first aspect.
In a third aspect, an embodiment of the present application provides a keypoint detection model training device, where the device includes:
the system comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring a sample picture, the sample picture corresponds to sample picture marks, the sample picture marks comprise a preset number of real key points, and each real key point corresponds to a position coordinate;
the processing module is used for inputting the sample picture into a preset key point detection model for processing to obtain a preset number of prediction key points, and each prediction key point corresponds to a position coordinate;
the first calculation module is used for calculating the point-line distance of each predicted key point according to the position coordinates of each predicted key point and the position coordinates of each real key point, wherein for any predicted key point, the point-line distance of the predicted key point is the shortest distance in the distances from the predicted key point to a preset virtual straight line, the preset virtual straight line is a straight line passing through the real key points with semantic relation with the predicted key point, the real key points with semantic relation with the predicted key point are target real key points, and the preset virtual straight line also passes through the real key points with the same semantic type as the target real key points and adjacent to the target real key points;
the second calculation module is used for calculating a loss value according to the point-line distance of each predicted key point;
and the training module is used for training the preset key point detection model according to the loss value to obtain a trained key point detection model.
Optionally, the sample picture is a picture with facial features, a picture with human body features, or a picture with gesture features.
Optionally, the apparatus further comprises:
the third calculation module is used for calculating the Euclidean distance between each predicted key point and the real key point with the same semantic meaning according to the position coordinates of each predicted key point and the position coordinates of each real key point to obtain the Euclidean distance of each predicted key point;
the second calculation module is specifically configured to:
and calculating a loss value according to the point-line distances, the Euclidean distances, the preset point-line distance weight and the preset Euclidean distance weight.
Optionally, the second calculating module is specifically configured to:
calculating the average value of all point line distances according to all the point line distances to obtain a point line distance average value;
calculating the average value of all Euclidean distances according to all the Euclidean distances to obtain the Euclidean distance average value;
and calculating a loss value according to the point-line distance average value, the Euclidean distance average value, a preset point-line distance weight and a preset Euclidean distance weight.
Optionally, the second calculating module is specifically configured to:
the loss value is calculated according to the following formulas:

l = α × l_sa + β × l_mse

l_sa = (1/n) Σ_{i=1..n} F(P_i, G_i)

l_mse = (1/n) Σ_{i=1..n} [(x_{P_i} − x_{G_i})² + (y_{P_i} − y_{G_i})²]

wherein α is the preset point-line distance weight, β is the preset Euclidean distance weight, l_sa is the point-line distance average value, l_mse is the Euclidean distance average value, and l is the loss value; n denotes the number of predicted key points, i denotes the i-th predicted key point, P_i denotes the i-th predicted key point, G_i denotes the i-th real key point, and F(P_i, G_i) denotes the point-line distance of P_i, where P_i and G_i have a semantic relationship; x_{P_i} and y_{P_i} denote the x-coordinate and y-coordinate values of the i-th predicted key point, x_{G_i} and y_{G_i} denote the x-coordinate and y-coordinate values of the i-th real key point, and (x_{P_i} − x_{G_i})² + (y_{P_i} − y_{G_i})² is the Euclidean distance of the i-th predicted key point.
In a fourth aspect, an embodiment of the present application provides a keypoint detection apparatus, including:
the acquisition module is used for acquiring a picture to be detected;
the prediction module is configured to input the picture to be detected into a preset key point detection model for analysis, so as to obtain a plurality of preset key points of the picture to be detected, where the preset key point detection model is obtained by training using any one of the key point detection model training methods described in the first aspect.
In a fifth aspect, an embodiment of the present application provides an electronic device, including: a processor, a communication interface, a memory, and a communication bus, wherein:
the processor, the communication interface and the memory complete mutual communication through a communication bus;
the memory is used for storing a computer program;
the processor is configured to implement the method for training the keypoint detection model according to any of the first aspect when executing the program stored in the memory.
In a sixth aspect, an embodiment of the present application provides an electronic device, including: a processor, a communication interface, a memory, and a communication bus, wherein:
the processor, the communication interface and the memory complete mutual communication through a communication bus;
the memory is used for storing a computer program;
the processor is configured to implement the key point detection method according to any one of the second aspects when executing the program stored in the memory.
In a seventh aspect, an embodiment of the present application provides a storage medium, where instructions are stored in the storage medium, and when the instructions are executed on a computer, the instructions cause the computer to perform the method for training a keypoint detection model according to any one of the above first aspects.
In an eighth aspect, an embodiment of the present application provides a storage medium, where instructions are stored, and when the storage medium is run on a computer, the instructions cause the computer to execute the keypoint detection method according to any one of the second aspects.
In a ninth aspect, embodiments of the present application provide a computer program product containing instructions, which when run on a computer, cause the computer to perform the method for training a keypoint detection model according to any of the above first aspects.
In a tenth aspect, embodiments of the present application provide a computer program product containing instructions, which when run on a computer, cause the computer to perform the keypoint detection method according to any one of the second aspects described above.
According to the method, the device, the electronic device, the storage medium, and the computer program product containing instructions provided by the embodiments of the present application, the distances from a predicted key point to the preset virtual straight lines passing through the real key points having a semantic relationship with the predicted key point are calculated, the shortest such distance is taken as the point-line distance of the predicted key point, and a loss value is calculated according to the point-line distances, so that key point errors caused by semantic ambiguity are fully considered. The preset key point detection model is trained according to this loss value, so that these errors are fully considered in the training process, which solves the problem of poor key point detection precision and improves the accuracy of key point detection. Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the present application.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1a is a first schematic diagram of a keypoint detection model training method according to an embodiment of the present application;
FIG. 1b is a second schematic diagram of a keypoint detection model training method according to an embodiment of the present application;
FIG. 1c is a third schematic diagram of a keypoint detection model training method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a keypoint detection method according to an embodiment of the present application;
FIG. 3a is a first schematic diagram of a keypoint detection model training apparatus according to an embodiment of the present application;
FIG. 3b is a second schematic diagram of a keypoint detection model training device according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a keypoint detection apparatus according to an embodiment of the present application;
fig. 5 is a schematic diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In order to solve the problem of poor detection precision of key points and improve the accuracy of key point detection, the application discloses a method for training a key point detection model, which comprises the following steps:
obtaining a sample picture, wherein the sample picture corresponds to sample picture marks, the sample picture marks comprise a preset number of real key points, and each real key point corresponds to a position coordinate;
inputting the sample picture into a preset key point detection model for processing to obtain a preset number of prediction key points, wherein each prediction key point corresponds to a position coordinate;
calculating a point-line distance of each of the predicted key points according to the position coordinates of each of the predicted key points and the position coordinates of each of the real key points, wherein for any one of the predicted key points, the point-line distance of the predicted key point is the shortest distance among the distances from the predicted key point to a preset virtual straight line, the preset virtual straight line is a straight line passing through the real key point having a semantic relationship with the predicted key point, the real key point having a semantic relationship with the predicted key point is a target real key point, and the preset virtual straight line also passes through the real key point having the same semantic type as the target real key point and adjacent to the target real key point;
calculating a loss value according to the point-line distance of each predicted key point;
and training the preset key point detection model according to the loss value to obtain a trained key point detection model.
The method calculates the distances from a predicted key point to the preset virtual straight lines passing through the real key points having a semantic relationship with the predicted key point, takes the shortest distance as the point-line distance of the predicted key point, and calculates a loss value according to the point-line distances, so that key point errors caused by semantic ambiguity are fully considered. The preset key point detection model is trained according to the loss value, so that these errors are fully considered in the training process, which solves the problem of poor key point detection precision and improves the accuracy of key point detection.
An embodiment of the present application provides a method for training a keypoint detection model, and referring to fig. 1a, fig. 1a is a first schematic diagram of the method for training the keypoint detection model in the embodiment of the present application, and includes the following steps:
step 110, obtaining a sample picture, where the sample picture corresponds to sample picture marks, the sample picture marks include a preset number of real key points, and each real key point corresponds to a position coordinate.
The key point detection model training method can be realized through electronic equipment, and specifically, the electronic equipment can be a server and the like.
In the process of training the preset key point detection model, a sample picture needs to be acquired so that the model can be trained on it. The sample picture can be a picture with facial features, a picture with human body features, or a picture with gesture features. The sample picture corresponds to a sample picture mark; the mark includes a preset number of real key points, and each real key point corresponds to a position coordinate. For example, in face recognition, key points of a face need to be detected; the preset sample picture set is then a set of pictures with face features, each sample picture corresponds to a sample picture mark, the mark includes 21 real key points, and each real key point corresponds to a position coordinate. The real key points of a face sample picture are generated from the five sense organs and the facial contour, and the number and positions of the real key points are set according to actual needs; for example, key positions of the eyes, the upper lip, the lower lip, and the cheek are labelled to generate the sample picture mark. The key points of each part are key points with the same type of semantic relationship. Here, the semantics of a key point can simply be regarded as the meaning of the concept it represents; the relationships among those meanings are the interpretation and logical representation of the key points, and the semantic relationship type refers to the category into which a semantic relationship falls under a given classification criterion.
Classifying semantic relationships helps to understand their meaning and characteristics so that they can play their role in information organization. In a face, for example, the key points belonging to one part are called key points of the same semantic relationship type: the key points of the cheek form one type, and the key points of the left eye form another.
And 120, inputting the sample picture into a preset key point detection model for processing to obtain a preset number of prediction key points, wherein each prediction key point corresponds to a position coordinate.
The sample picture is input into the preset key point detection model for processing to obtain a preset number of predicted key points, each corresponding to a position coordinate. For example, if the sample picture is a picture with face features whose mark includes 68 real key points, inputting it into the preset key point detection model yields 68 predicted key points, each with a position coordinate; each predicted key point corresponds one-to-one to one of the 68 real key points and has a semantic relationship with it. When the predicted key points and the real key points are generated, they can be numbered, and the semantic relationship between them is represented by the numbers. For example, if the left eye has 6 real key points, labelled left eye true 1 to left eye true 6, and 6 predicted key points, labelled left eye prediction 1 to left eye prediction 6, then the real key point left eye true 1 and the predicted key point left eye prediction 1 have a semantic relationship, left eye true 2 and left eye prediction 2 have a semantic relationship, and so on.
Step 130, calculating a point-line distance of each of the predicted key points according to the position coordinates of each of the predicted key points and the position coordinates of each of the real key points, wherein for any one of the predicted key points, the point-line distance of the predicted key point is the shortest distance among the distances from the predicted key point to a preset virtual straight line, the preset virtual straight line is a straight line passing through the real key point having a semantic relationship with the predicted key point, the real key point having a semantic relationship with the predicted key point is a target real key point, and the preset virtual straight line also passes through the real key point having the same semantic type as the target real key point and adjacent to the target real key point.
For any one predicted key point, the point-line distance of the predicted key point is the shortest of the distances from the predicted key point to the preset virtual straight lines. A preset virtual straight line represents the spatial relationship between two points: it passes through the real key point having a semantic relationship with the predicted key point, and it also passes through a real key point that has the same semantic relationship type as that real key point and is adjacent to it. The key points of each part are key points with the same semantic relationship type; for example, when the real key points of the cheek are generated, each real key point can be given a serial number in order, and according to these serial numbers, adjacent real key points with the same semantic relationship type are connected to obtain the preset virtual straight lines. For example, a sample picture with face features may have 68 real key points located on the left eye, the right eye, the nose, the upper lip, the lower lip, the left eyebrow, the right eyebrow, and the cheek, with the left eye having 6 key points, the right eye 6 key points, and the cheek 16 key points.
According to the position coordinates and the semantic relationship types of the real key points, adjacent real key points with the same semantic relationship type are connected to obtain the preset virtual straight lines. Having the same semantic relationship type means being real key points of the same part: for example, the real key points on the outer contour of the cheek are of one semantic relationship type, the real key points on the outer contour line of the upper lip are of another, and those on the outer contour line of the lower lip of yet another. Thus, connecting adjacent pairs among the 16 cheek key points gives the preset virtual straight lines of the cheek part, and connecting adjacent pairs among the 6 left-eye key points gives the preset virtual straight lines of the left-eye part.
And then, according to the position coordinates of each predicted key point, the shortest distance from each predicted key point to the preset virtual straight line can be calculated.
For example, the sample picture is a picture with human face features in which the cheek part comprises n key points. Let G_i denote the ith real key point, i ∈ {1, …, n}, n > 3. For 1 < i < n, the real key points adjacent to G_i are G_{i-1} and G_{i+1}; connecting G_i with G_{i-1} gives the straight line X_{i-1,i}, and connecting G_i with G_{i+1} gives the straight line X_{i,i+1}. The point-line distance of each predicted key point is then calculated from its position coordinates. Let P_i denote the ith predicted key point. Since the serial numbers indicate the semantic relationship between predicted and real key points, P_i is the predicted key point corresponding to G_i, that is, P_i and G_i have a semantic relationship. The distances from P_i to the straight lines X_{i-1,i} and X_{i,i+1} are calculated separately, and the shortest of them is the point-line distance of P_i, denoted F(P_i, G_i). When i = 1, the real key point adjacent to G_1 is G_2, so connecting G_1 with G_2 gives the straight line X_{1,2}; from the position coordinates of the predicted key point P_1, which has the same semantics as G_1, the distance from P_1 to X_{1,2} is calculated, and this distance is the point-line distance F(P_1, G_1).
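As an illustration only, the point-line distance F(P_i, G_i) described above can be sketched as follows. This is a minimal NumPy sketch that assumes each part's real key points are stored in serial-number order along an open contour such as the cheek; the function names are illustrative, not from the patent:

```python
import numpy as np

def point_line_distance(p, a, b):
    # Perpendicular distance from point p to the infinite line through a and b:
    # |cross(b - a, p - a)| / |b - a| in 2-D.
    ab = b - a
    return abs(ab[0] * (p[1] - a[1]) - ab[1] * (p[0] - a[0])) / np.linalg.norm(ab)

def semantic_alignment_distance(pred, real):
    # F(P_i, G_i): shortest distance from each predicted key point P_i to the
    # preset virtual straight lines X_{i-1,i} and X_{i,i+1} through its real
    # key point G_i and the adjacent real key points of the same semantic type.
    n = len(real)
    dists = []
    for i in range(n):
        candidates = []
        if i > 0:        # line X_{i-1,i} through G_{i-1} and G_i
            candidates.append(point_line_distance(pred[i], real[i - 1], real[i]))
        if i < n - 1:    # line X_{i,i+1} through G_i and G_{i+1}
            candidates.append(point_line_distance(pred[i], real[i], real[i + 1]))
        dists.append(min(candidates))
    return np.array(dists)
```

For the endpoints (i = 1 and i = n) only one adjacent line exists, which matches the G_1/X_{1,2} case in the text.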
The loss value between the real key points and the predicted key points can then be calculated from the point-line distances. Because the point-line distance is the shortest distance from a predicted key point to a preset virtual straight line passing through the real key point having a semantic relationship with it, the key point error caused by semantic ambiguity is fully taken into account. Training the preset key point detection model according to this loss value therefore incorporates that error into the training process, alleviating the problem of poor key point detection precision and improving the accuracy of key point detection.
And step 140, calculating a loss value according to the point-line distance.
The sum of the point-line distances is calculated, the average point-line distance is obtained from that sum, and this average is used as the loss value, so that the key point error caused by semantic ambiguity is fully considered. Training the preset key point detection model according to this loss value incorporates that error into the training process, alleviating the problem of poor key point detection precision and improving the accuracy of key point detection.
And step 150, training the preset key point detection model according to the loss value to obtain a trained key point detection model.
The parameters of the preset key point detection model are adjusted according to the loss value, and training continues with further sample pictures until a preset training end condition is met (for example, that 500 sample pictures have been trained, or that the loss value does not exceed a preset threshold), yielding the trained key point detection model. Training the preset key point detection model according to this loss value ensures that the key point error caused by semantic ambiguity is fully considered during training, alleviating the problem of poor key point detection precision and improving the accuracy of key point detection. For the specific training procedure of the preset key point detection model, reference may be made to model training methods in the related art.
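As a framework-agnostic sketch of step 150, the loop below illustrates the two example end conditions (a fixed number of trained sample pictures, or the loss value not exceeding a preset threshold). `model.predict` and `model.update` are hypothetical placeholders for the forward pass and the parameter adjustment of whatever training framework is actually used:

```python
def train(model, samples, loss_fn, lr=1e-3, max_samples=500, loss_threshold=0.01):
    # Adjust model parameters from the loss and stop when a preset training
    # end condition is met: max_samples sample pictures trained, or the loss
    # value not exceeding loss_threshold.
    trained = 0
    for picture, real_key_points in samples:
        predicted = model.predict(picture)          # forward pass (placeholder)
        loss = loss_fn(predicted, real_key_points)  # e.g. point-line-based loss
        model.update(loss, lr)                      # parameter adjustment (placeholder)
        trained += 1
        if trained >= max_samples or loss <= loss_threshold:
            break
    return model
```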
In this method, the distances from a predicted key point to the straight lines passing through the real key point having a semantic relationship with it are calculated, the shortest such distance is taken as the point-line distance of the predicted key point, and the loss value is calculated from the point-line distances, so that the key point error caused by semantic ambiguity is fully considered. Training the preset key point detection model according to this loss value incorporates that error into the training process, alleviating the problem of poor key point detection precision and improving the accuracy of key point detection.
In a possible implementation manner, the sample picture is a picture with facial features, a picture with human body features, or a picture with gesture features.
The picture with facial features can be a picture with human face features, a picture with animal face features, and the like. A sample picture with facial features, human body features, or gesture features is used so that the trained model can detect targets with facial features, human body features, or gesture features, respectively.
Referring to fig. 1b, fig. 1b is a second schematic diagram of a method for training a keypoint detection model according to an embodiment of the present application, and in a possible implementation manner, before the step of calculating a loss value according to each of the above-mentioned line distances of points, the method further includes:
step 160, calculating the Euclidean distance between each predicted key point and the real key point with the same semantic meaning according to the position coordinates of each predicted key point and the position coordinates of each real key point to obtain the Euclidean distance of each predicted key point;
the calculating a loss value according to each of the point-line distances includes:
and step 141, calculating a loss value according to the point-line distances, the Euclidean distances, the preset point-line distance weight and the preset Euclidean distance weight.
Calculate the Euclidean distance between each predicted key point and the real key point with the same semantics according to their position coordinates. For example, let x_i^P denote the x coordinate of the ith predicted key point, y_i^P its y coordinate, x_i^G the x coordinate of the ith real key point, and y_i^G its y coordinate. Then the Euclidean distance of the ith predicted key point is

sqrt((x_i^P − x_i^G)^2 + (y_i^P − y_i^G)^2)
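The per-key-point Euclidean distance above can be computed in vectorized form; a minimal NumPy sketch (the function name is illustrative):

```python
import numpy as np

def euclidean_distances(pred, real):
    # sqrt((x_i^P - x_i^G)^2 + (y_i^P - y_i^G)^2) for each key point i,
    # i.e. the straight-line distance between each predicted key point and
    # the real key point with the same semantics.
    pred = np.asarray(pred, dtype=float)
    real = np.asarray(real, dtype=float)
    return np.linalg.norm(pred - real, axis=1)
```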
A loss value is calculated according to the point-line distances, the Euclidean distances, the preset point-line distance weight, and the preset Euclidean distance weight: the preset point-line distance weight is applied as a coefficient to the point-line distance term to obtain a weighted point-line term, the preset Euclidean distance weight is likewise applied to the Euclidean distance term to obtain a weighted Euclidean term, and the loss value is the sum of the two weighted terms. In this way, the loss value used to adjust the parameters of the preset key point detection model accounts both for the point-line distance from each predicted key point to the preset virtual straight line passing through the real key point having a semantic relationship with it, and for the point-to-point distance between the predicted key point and that real key point, so that the key point error caused by semantic ambiguity is fully considered.
Alternatively, the loss value is calculated according to the following formula:

l = α × l_sa + β × l_mse

where

l_sa = (1/n) × Σ_{i=1}^{n} F(P_i, G_i)

l_mse = (1/n) × Σ_{i=1}^{n} sqrt((x_i^P − x_i^G)^2 + (y_i^P − y_i^G)^2)

Here α is the preset point-line distance weight, β is the preset Euclidean distance weight, l_sa is the point-line distance average, l_mse is the Euclidean distance average, and l is the loss value; n denotes the number of predicted key points, i denotes the ith predicted key point, P_i denotes the ith predicted key point, G_i denotes the ith real key point, and F(P_i, G_i) denotes the point-line distance of P_i, where P_i and G_i have a semantic relationship; x_i^P and y_i^P denote the x and y coordinates of the ith predicted key point, x_i^G and y_i^G denote the x and y coordinates of the ith real key point, and sqrt((x_i^P − x_i^G)^2 + (y_i^P − y_i^G)^2) is the Euclidean distance of the ith predicted key point.
An embodiment of the present application provides a method for training a key point detection model. Referring to fig. 1c, which is a third schematic diagram of the key point detection model training method according to an embodiment of the present application, in a possible implementation, the calculating a loss value according to each point-line distance, each Euclidean distance, a preset point-line distance weight, and a preset Euclidean distance weight includes:
step 1411, calculating an average value of all point line distances according to the point line distances to obtain a point line distance average value;
step 1412, calculating the average value of all the Euclidean distances according to the Euclidean distances to obtain the Euclidean distance average value;
and 1413, calculating a loss value according to the point-line distance average value, the Euclidean distance average value, the preset point-line distance weight and the preset Euclidean distance weight.
For example, if the cheek part comprises n key points, the average of all point-line distances is calculated to obtain the point-line distance average, denoted l_sa:

l_sa = (1/n) × Σ_{i=1}^{n} F(P_i, G_i)

The average of all Euclidean distances is calculated to obtain the Euclidean distance average, denoted l_mse:

l_mse = (1/n) × Σ_{i=1}^{n} sqrt((x_i^P − x_i^G)^2 + (y_i^P − y_i^G)^2)

With the preset point-line distance weight α and the preset Euclidean distance weight β, the loss value l is calculated from the point-line distance average, the Euclidean distance average, and the two weights as l = α × l_sa + β × l_mse.
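Putting the pieces together, the weighted loss l = α × l_sa + β × l_mse can be sketched as follows (NumPy; the default weight values shown are illustrative, not values from the patent):

```python
import numpy as np

def combined_loss(point_line_dists, euclidean_dists, alpha=0.5, beta=0.5):
    # l = alpha * l_sa + beta * l_mse, where l_sa is the point-line distance
    # average and l_mse the Euclidean distance average over the n key points.
    l_sa = float(np.mean(point_line_dists))
    l_mse = float(np.mean(euclidean_dists))
    return alpha * l_sa + beta * l_mse
```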
An embodiment of the present application provides a method for detecting a keypoint, referring to fig. 2, where fig. 2 is a schematic diagram of the method for detecting a keypoint, and the method includes the following steps:
step 210, obtaining a picture to be detected.
The key point detection method of the embodiment of the application can be implemented by electronic equipment, and specifically, the electronic equipment can be a server and the like.
And acquiring a picture to be detected so as to input the picture to be detected into a preset key point detection model to obtain a plurality of preset key points of the picture to be detected.
Step 220, inputting the picture to be detected into a preset key point detection model to obtain a plurality of preset key points of the picture to be detected, wherein the preset key point detection model is obtained by training according to any one of the key point detection model training methods in the embodiments.
After the preset key point detection model has been trained by any one of the key point detection model training methods in the embodiments above, the picture to be detected is input into the preset key point detection model for analysis, and a plurality of preset key points of the picture to be detected are obtained. For example, if the sample pictures are face pictures, then after training, the preset key point detection model can predict key points on face pictures; inputting the picture to be detected into the model for analysis yields a plurality of preset key points of the picture to be detected, realizing key point prediction for the picture to be detected.
An embodiment of the present application further provides a device, referring to fig. 3a, where fig. 3a is a first schematic diagram of a keypoint detection model training device according to an embodiment of the present application, where the device includes:
an obtaining module 310, configured to obtain a sample picture, where the sample picture corresponds to a sample picture mark, the sample picture mark includes a preset number of real key points, and each of the real key points corresponds to a position coordinate;
a processing module 320, configured to input the sample picture into a preset key point detection model for processing, so as to obtain a preset number of prediction key points, where each prediction key point corresponds to a position coordinate;
a first calculating module 330, configured to calculate a point-to-line distance of each of the predicted key points according to a position coordinate of each of the predicted key points and a position coordinate of each of the real key points, where, for any one of the predicted key points, the point-to-line distance of the predicted key point is a shortest distance among distances from the predicted key point to a preset virtual straight line, the preset virtual straight line is a straight line passing through the real key points having a semantic relationship with the predicted key point, the real key points having a semantic relationship with the predicted key point are target real key points, and the preset virtual straight line also passes through the real key points having a semantic type same as that of the target real key points and adjacent to the target real key points;
a second calculating module 340, configured to calculate a loss value according to each of the dot-line distances;
and the training module 350 is configured to train the preset keypoint detection model according to the loss value to obtain a trained keypoint detection model.
Referring to fig. 3b, fig. 3b is a second schematic diagram of the keypoint detection model training apparatus according to the embodiment of the present application, and in a possible implementation manner, the sample picture is a picture with facial features, a picture with human body features, or a picture with gesture features.
In a possible embodiment, the above apparatus further comprises:
a third calculating module 360, configured to calculate a euclidean distance between each predicted key point and each real key point with the same semantic meaning according to the position coordinates of each predicted key point and the position coordinates of each real key point, so as to obtain a euclidean distance between each predicted key point;
the second calculating module 340 is specifically configured to:
and calculating a loss value according to the point-line distances, the Euclidean distances, the preset point-line distance weight and the preset Euclidean distance weight.
In a possible implementation manner, the second calculating module 340 is specifically configured to:
calculating the average value of all the point line distances according to the point line distances to obtain a point line distance average value;
calculating the average value of all Euclidean distances according to all the Euclidean distances to obtain the Euclidean distance average value;
and calculating a loss value according to the point-line distance average value, the Euclidean distance average value, the preset point-line distance weight and the preset Euclidean distance weight.
In a possible implementation manner, the second calculating module 340 is specifically configured to:
the loss value is calculated according to the following formula:

l = α × l_sa + β × l_mse

where

l_sa = (1/n) × Σ_{i=1}^{n} F(P_i, G_i)

l_mse = (1/n) × Σ_{i=1}^{n} sqrt((x_i^P − x_i^G)^2 + (y_i^P − y_i^G)^2)

and where α is the preset point-line distance weight, β is the preset Euclidean distance weight, l_sa is the point-line distance average, l_mse is the Euclidean distance average, and l is the loss value; n denotes the number of predicted key points, i denotes the ith predicted key point, P_i denotes the ith predicted key point, G_i denotes the ith real key point, and F(P_i, G_i) denotes the point-line distance of P_i, where P_i and G_i have a semantic relationship; x_i^P and y_i^P denote the x and y coordinates of the ith predicted key point, x_i^G and y_i^G denote the x and y coordinates of the ith real key point, and sqrt((x_i^P − x_i^G)^2 + (y_i^P − y_i^G)^2) denotes the Euclidean distance of the ith predicted key point.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
An embodiment of the present application further provides an apparatus, referring to fig. 4, where fig. 4 is a schematic diagram of a keypoint detection apparatus according to an embodiment of the present application, where the apparatus includes:
the acquisition module 410 is used for acquiring a picture to be detected;
the prediction module 420 is configured to input the picture to be detected into a preset keypoint detection model for analysis, so as to obtain a plurality of preset keypoints of the picture to be detected, where the preset keypoint detection model is obtained by training using any one of the keypoint detection model training methods in the embodiments.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
An embodiment of the present application further provides an electronic device, referring to fig. 5, where fig. 5 is a schematic diagram of the electronic device according to the embodiment of the present application, and the electronic device includes: a processor 510, a communication interface 520, a memory 530, and a communication bus 540, wherein the processor 510, the communication interface 520, and the memory 530 communicate with each other via the communication bus 540,
the memory 530 for storing a computer program;
the processor 510 is configured to implement the following steps when executing the computer program stored in the memory 530:
obtaining a sample picture, wherein the sample picture corresponds to sample picture marks, the sample picture marks comprise a preset number of real key points, and each real key point corresponds to a position coordinate;
inputting the sample picture into a preset key point detection model for processing to obtain a preset number of prediction key points, wherein each prediction key point corresponds to a position coordinate;
calculating a point-line distance of each of the predicted key points according to the position coordinates of each of the predicted key points and the position coordinates of each of the real key points, wherein for any one of the predicted key points, the point-line distance of the predicted key point is the shortest distance among the distances from the predicted key point to a preset virtual straight line, the preset virtual straight line is a straight line passing through the real key point having a semantic relationship with the predicted key point, the real key point having a semantic relationship with the predicted key point is a target real key point, and the preset virtual straight line also passes through the real key point having the same semantic type as the target real key point and adjacent to the target real key point;
calculating a loss value according to the line distance of each point;
and training the preset key point detection model according to the loss value to obtain a trained key point detection model.
Optionally, the processor 510, when configured to execute the program stored in the memory 530, may further implement any of the above-described methods for training the keypoint detection model.
An embodiment of the present application further provides an electronic device, including: a processor, a communication interface, a memory and a communication bus, wherein, the processor, the communication interface and the memory complete the mutual communication through the communication bus,
the memory is used for storing computer programs;
the processor is configured to implement the following steps when executing the computer program stored in the memory:
acquiring a picture to be detected;
inputting the picture to be detected into a preset key point detection model to obtain a plurality of preset key points of the picture to be detected, wherein the preset key point detection model is obtained by training by using the key point detection model training method according to any one of claims 1 to 5.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; but may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component.
In an embodiment of the present application, there is further provided a storage medium having instructions stored therein, which when executed on a computer, cause the computer to perform any one of the above-mentioned methods for training a keypoint detection model.
In an embodiment of the present application, there is also provided a storage medium having instructions stored therein, which when run on a computer, cause the computer to perform any of the above-described keypoint detection methods in the above-described embodiments.
In an embodiment of the present application, there is further provided a computer program product containing instructions, which when run on a computer, cause the computer to perform any one of the above-mentioned methods for training a keypoint detection model.
In an embodiment of the present application, there is also provided a computer program product containing instructions, which when run on a computer, cause the computer to perform any of the above-mentioned keypoint detection methods in the above-mentioned embodiments.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions described above in accordance with the embodiments of the invention may be generated, in whole or in part, when the computer program instructions described above are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website site, computer, server, or data center to another website site, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that includes one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It should be noted that, in this document, the technical features in the various alternatives can be combined to form the scheme as long as the technical features are not contradictory, and the scheme is within the scope of the disclosure of the present application. Relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the same element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the embodiments of the apparatus, the electronic device, and the storage medium, since they are substantially similar to the method embodiments, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiments.
The above description is only a preferred embodiment of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application are included in the protection scope of the present application.

Claims (16)

1. A method for training a keypoint detection model, the method comprising:
acquiring a sample picture, wherein the sample picture corresponds to sample picture marks, the sample picture marks comprise a preset number of real key points, and each real key point corresponds to a position coordinate;
inputting the sample picture into a preset key point detection model for processing to obtain a preset number of prediction key points, wherein each prediction key point corresponds to a position coordinate;
calculating the point-line distance of each predicted key point according to the position coordinates of each predicted key point and the position coordinates of each real key point, wherein for any predicted key point, the point-line distance of the predicted key point is the shortest distance in the distances from the predicted key point to a preset virtual straight line, the preset virtual straight line is a straight line passing through the real key point with semantic relation with the predicted key point, the real key point with semantic relation with the predicted key point is a target real key point, and the preset virtual straight line also passes through the real key point which has the same semantic type as the target real key point and is adjacent to the target real key point;
calculating a loss value according to the line distance of each point;
and training the preset key point detection model according to the loss value to obtain a trained key point detection model.
2. The method according to claim 1, wherein the sample picture is a picture with facial features, a picture with human features or a picture with gesture features.
3. The method of claim 1 or 2, wherein the step of calculating a loss value from each of the dot-line distances is preceded by the method further comprising:
calculating the Euclidean distance between each predicted key point and a real key point with the same semantic meaning according to the position coordinate of each predicted key point and the position coordinate of each real key point to obtain the Euclidean distance of each predicted key point;
the calculating a loss value according to each point-line distance includes:
and calculating a loss value according to the point-line distances, the Euclidean distances, the preset point-line distance weight and the preset Euclidean distance weight.
4. The method of claim 3, wherein calculating a loss value based on each of the dot-to-line distances, each of the Euclidean distances, a preset dot-to-line distance weight, and a preset Euclidean distance weight comprises:
calculating the average value of all point line distances according to all the point line distances to obtain a point line distance average value;
calculating the average value of all Euclidean distances according to all the Euclidean distances to obtain the Euclidean distance average value;
and calculating a loss value according to the point-line distance average value, the Euclidean distance average value, a preset point-line distance weight and a preset Euclidean distance weight.
5. The method of claim 4, wherein calculating a loss value based on the point-line distance average, the Euclidean distance average, a preset point-line distance weight, and a preset Euclidean distance weight comprises:
the loss value is calculated according to the following formula:

l = α × l_sa + β × l_mse

wherein,

l_sa = (1/n) × Σ_{i=1}^{n} F(P_i, G_i)

wherein α is the preset point-line distance weight, β is the preset Euclidean distance weight, l_sa is the point-line distance average, l_mse is the Euclidean distance average, and l is the loss value; n denotes the number of predicted key points, i denotes the i-th predicted key point, P_i denotes the i-th predicted key point, G_i denotes the i-th real key point, and F(P_i, G_i) denotes the point-line distance of P_i, wherein P_i and G_i have a semantic relationship; x_i^P denotes the x-coordinate value of the i-th predicted key point, y_i^P denotes the y-coordinate value of the i-th predicted key point, x_i^G denotes the x-coordinate value of the i-th real key point, y_i^G denotes the y-coordinate value of the i-th real key point, and

d_i = √( (x_i^P − x_i^G)² + (y_i^P − y_i^G)² )

denotes the Euclidean distance of the i-th predicted key point.
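The weighted loss of claim 5 can be sketched as follows (an illustrative reconstruction, not code from the patent; NumPy, the function names, the `adjacency` mapping from each keypoint index to its adjacent same-type keypoint, and the default weights are assumptions):

```python
import numpy as np

def point_line_distance(p, g, g_adj):
    """Shortest distance from predicted point p to the preset virtual
    straight line through real keypoint g and its adjacent same-type
    real keypoint g_adj (F(P_i, G_i) in the claims)."""
    p, g, g_adj = np.asarray(p, float), np.asarray(g, float), np.asarray(g_adj, float)
    d = g_adj - g
    norm = np.linalg.norm(d)
    if norm == 0:  # degenerate line: fall back to point-to-point distance
        return np.linalg.norm(p - g)
    # perpendicular distance via the 2-D cross product
    return abs(d[0] * (p[1] - g[1]) - d[1] * (p[0] - g[0])) / norm

def keypoint_loss(pred, real, adjacency, alpha=0.5, beta=0.5):
    """l = alpha * l_sa + beta * l_mse, where l_sa is the mean point-line
    distance and l_mse the mean Euclidean distance over all keypoints."""
    pred, real = np.asarray(pred, float), np.asarray(real, float)
    l_sa = np.mean([point_line_distance(pred[i], real[i], real[adjacency[i]])
                    for i in range(len(pred))])
    l_mse = np.mean(np.linalg.norm(pred - real, axis=1))
    return alpha * l_sa + beta * l_mse
```

Note that despite the name, l_mse here is the mean Euclidean distance as the claims define it, not a squared error; a training implementation would compute the same quantities in an autodiff framework so the loss is backpropagated through the model.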
6. A method of keypoint detection, the method comprising:
acquiring a picture to be detected;
inputting the picture to be detected into a preset key point detection model to obtain a plurality of preset key points of the picture to be detected, wherein the preset key point detection model is obtained by training by using the key point detection model training method of any one of claims 1 to 5.
7. A keypoint detection model training device, characterized in that it comprises:
the acquisition module is used for acquiring a sample picture, wherein the sample picture corresponds to a sample picture annotation, the sample picture annotation comprises a preset number of real key points, and each real key point corresponds to a position coordinate;
the processing module is used for inputting the sample picture into a preset key point detection model for processing to obtain a preset number of prediction key points, and each prediction key point corresponds to a position coordinate;
the first calculation module is used for calculating the point-line distance of each predicted key point according to the position coordinates of each predicted key point and the position coordinates of each real key point, wherein for any predicted key point, the point-line distance of the predicted key point is the shortest distance from the predicted key point to a preset virtual straight line; the preset virtual straight line is a straight line passing through the real key point having a semantic relationship with the predicted key point, that real key point being the target real key point, and the preset virtual straight line also passes through a real key point that is of the same semantic type as the target real key point and adjacent to the target real key point;
the second calculation module is used for calculating a loss value according to the point-line distance of each predicted key point;
and the training module is used for training the preset key point detection model according to the loss value to obtain a trained key point detection model.
8. The apparatus of claim 7, wherein the sample picture is a picture with facial features, a picture with human features, or a picture with gesture features.
9. The apparatus of claim 7 or 8, further comprising:
the third calculation module is used for calculating the Euclidean distance between each predicted key point and the real key point with the same semantic meaning according to the position coordinates of each predicted key point and the position coordinates of each real key point to obtain the Euclidean distance of each predicted key point;
the second calculation module is specifically configured to:
and calculating a loss value according to the point-line distances, the Euclidean distances, the preset point-line distance weight and the preset Euclidean distance weight.
10. The apparatus of claim 9, wherein the second computing module is specifically configured to:
calculating the average of all the point-line distances to obtain a point-line distance average;
calculating the average of all the Euclidean distances to obtain a Euclidean distance average;
and calculating the loss value according to the point-line distance average, the Euclidean distance average, the preset point-line distance weight and the preset Euclidean distance weight.
11. The apparatus of claim 10, wherein the second computing module is specifically configured to:
the loss value is calculated according to the following formula:

l = α × l_sa + β × l_mse

wherein,

l_sa = (1/n) × Σ_{i=1}^{n} F(P_i, G_i)

wherein α is the preset point-line distance weight, β is the preset Euclidean distance weight, l_sa is the point-line distance average, l_mse is the Euclidean distance average, and l is the loss value; n denotes the number of predicted key points, i denotes the i-th predicted key point, P_i denotes the i-th predicted key point, G_i denotes the i-th real key point, and F(P_i, G_i) denotes the point-line distance of P_i, wherein P_i and G_i have a semantic relationship; x_i^P denotes the x-coordinate value of the i-th predicted key point, y_i^P denotes the y-coordinate value of the i-th predicted key point, x_i^G denotes the x-coordinate value of the i-th real key point, y_i^G denotes the y-coordinate value of the i-th real key point, and

d_i = √( (x_i^P − x_i^G)² + (y_i^P − y_i^G)² )

denotes the Euclidean distance of the i-th predicted key point.
12. A keypoint detection device, the device comprising:
the acquisition module is used for acquiring a picture to be detected;
a prediction module, configured to input the picture to be detected into a preset keypoint detection model to obtain a plurality of preset keypoints of the picture to be detected, where the preset keypoint detection model is obtained by training using the keypoint detection model training method according to any one of claims 1 to 5.
13. An electronic device, comprising: the system comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory are communicated with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the method of training a keypoint detection model according to any one of claims 1 to 5 when executing a program stored in a memory.
14. An electronic device, comprising: the system comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory are communicated with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the keypoint detection method of claim 6 when executing a program stored on a memory.
15. A storage medium having stored therein a computer program which, when executed by a processor, implements the keypoint detection model training method of any one of claims 1 to 5.
16. A storage medium having stored therein a computer program which, when executed by a processor, implements the keypoint detection method of claim 6.
CN201911346309.5A 2019-12-24 2019-12-24 Key point detection model training method and device, electronic equipment and storage medium Active CN111126268B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911346309.5A CN111126268B (en) 2019-12-24 2019-12-24 Key point detection model training method and device, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN111126268A true CN111126268A (en) 2020-05-08
CN111126268B CN111126268B (en) 2023-04-25

Family

ID=70501951

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911346309.5A Active CN111126268B (en) 2019-12-24 2019-12-24 Key point detection model training method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111126268B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107622252A (en) * 2017-09-29 2018-01-23 百度在线网络技术(北京)有限公司 information generating method and device
CN108305283A (en) * 2018-01-22 2018-07-20 清华大学 Human bodys' response method and device based on depth camera and basic form
WO2019228040A1 (en) * 2018-05-30 2019-12-05 杭州海康威视数字技术股份有限公司 Facial image scoring method and camera
CN109614867A (en) * 2018-11-09 2019-04-12 北京市商汤科技开发有限公司 Human body critical point detection method and apparatus, electronic equipment, computer storage medium
CN109948590A (en) * 2019-04-01 2019-06-28 启霖世纪(北京)教育科技有限公司 Pose problem detection method and device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HOEL KERVADEC ET AL.: "Boundary loss for highly unbalanced segmentation" *
ISMAIL ELEZI ET AL.: "The Group Loss for Deep Metric Learning" *
JING Chenkai; SONG Tao; ZHUANG Lei; LIU Gang; WANG Le; LIU Kailun: "A survey of face recognition technology based on deep convolutional neural networks" *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113743157A (en) * 2020-05-28 2021-12-03 北京沃东天骏信息技术有限公司 Key point detection model training method and device and key point detection method and device
CN112115894A (en) * 2020-09-24 2020-12-22 北京达佳互联信息技术有限公司 Training method and device for hand key point detection model and electronic equipment
CN112115894B (en) * 2020-09-24 2023-08-25 北京达佳互联信息技术有限公司 Training method and device of hand key point detection model and electronic equipment
CN114550207A (en) * 2022-01-17 2022-05-27 北京新氧科技有限公司 Method and device for detecting key points of neck and method and device for training detection model
CN114550207B (en) * 2022-01-17 2023-01-17 北京新氧科技有限公司 Method and device for detecting key points of neck and method and device for training detection model

Also Published As

Publication number Publication date
CN111126268B (en) 2023-04-25

Similar Documents

Publication Publication Date Title
CN108898086B (en) Video image processing method and device, computer readable medium and electronic equipment
WO2022027912A1 (en) Face pose recognition method and apparatus, terminal device, and storage medium.
WO2020207190A1 (en) Three-dimensional information determination method, three-dimensional information determination device, and terminal apparatus
CN111126268B (en) Key point detection model training method and device, electronic equipment and storage medium
WO2022166243A1 (en) Method, apparatus and system for detecting and identifying pinching gesture
WO2020244075A1 (en) Sign language recognition method and apparatus, and computer device and storage medium
CN112633084A (en) Face frame determination method and device, terminal equipment and storage medium
CN112085701A (en) Face ambiguity detection method and device, terminal equipment and storage medium
WO2021217937A1 (en) Posture recognition model training method and device, and posture recognition method and device
JP2022521540A (en) Methods and systems for object tracking using online learning
US20220327740A1 (en) Registration method and registration apparatus for autonomous vehicle
TW202201275A (en) Device and method for scoring hand work motion and storage medium
CN111274852B (en) Target object key point detection method and device
CN110956131A (en) Single-target tracking method, device and system
CN111507244B (en) BMI detection method and device and electronic equipment
CN116884045A (en) Identity recognition method, identity recognition device, computer equipment and storage medium
WO2022162844A1 (en) Work estimation device, work estimation method, and work estimation program
CN110934565B (en) Method and device for measuring pupil diameter and computer readable storage medium
CN115620254A (en) Method, device, equipment and storage medium for evaluating lane line detection
CN115527083A (en) Image annotation method and device and electronic equipment
CN111368792A (en) Characteristic point mark injection molding type training method and device, electronic equipment and storage medium
CN112633143A (en) Image processing system, method, head-mounted device, processing device, and storage medium
CN110765918A (en) MFANet-based vSLAM rapid loop detection method and device
CN112630736A (en) Method, device and equipment for determining parameters of roadside radar and storage medium
CN111368624A (en) Loop detection method and device based on generation of countermeasure network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant