CN113095336B - Method for training key point detection model and method for detecting key points of target object

Info

Publication number: CN113095336B
Application number: CN202110439103.8A
Authority: CN
Inventors: 宫延河
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2021-04-22
Filing date: 2021-04-22
Publication date: 2022-03-11
Anticipated expiration: 2041-04-22
Also published as: CN113095336A

Abstract

The invention discloses a method for training a key point detection model and a method for detecting key points of a target object, which relate to the field of electronic technologies, and in particular to the fields of augmented reality and deep learning. A specific implementation of the training method is as follows: obtaining training samples including a target object, wherein the training samples include at least a first type of training sample without a label; obtaining, based on the training samples, predicted position information of a plurality of key points of the target object by using a key point detection model; and training the key point detection model based on the predicted position information and a predetermined loss function for the training samples, wherein the predetermined loss function for the first type of training samples is constructed based on the predicted position information of adjacent key points among the plurality of key points.

Description

Method for training key point detection model and method for detecting key points of target object

Technical Field

The present disclosure relates to the field of electronic technologies, in particular to the fields of augmented reality and deep learning, and more particularly to a method for training a keypoint detection model, a method and an apparatus for detecting keypoints of a target object, an electronic device, and a storage medium.

Background

Current keypoint detection methods use labeled supervisory data for training. In order to improve the detection accuracy, a large amount of supervision data is usually required, which undoubtedly brings a large labeling cost to the training.

Disclosure of Invention

The present disclosure provides a method for training a key point detection model, a method and an apparatus for detecting key points of a target object, an electronic device, and a storage medium, which reduce training cost while ensuring detection accuracy.

According to an aspect of the present disclosure, there is provided a method for training a keypoint detection model, the method including: obtaining training samples including a target object, wherein the training samples include at least a first type of training sample without a label; obtaining, based on the training samples, predicted position information of a plurality of key points of the target object by using a key point detection model; and training the key point detection model based on the predicted position information and a predetermined loss function for the training samples, wherein the predetermined loss function for the first type of training samples is constructed based on the predicted position information of adjacent key points among the plurality of key points.

According to another aspect of the present disclosure, there is provided a method of detecting a target object keypoint, comprising: acquiring an image to be processed including a target object; and obtaining the position information of the key points of the target object in the image to be processed by adopting a key point detection model. The key point detection model is obtained by training by adopting the training method of the key point detection model.

According to another aspect of the present disclosure, there is provided a training apparatus for a keypoint detection model, the apparatus including: a sample acquisition module for acquiring training samples including a target object, wherein the training samples include at least a first type of training sample without a label; a prediction information obtaining module for obtaining, based on the training samples, predicted position information of a plurality of key points of the target object by using a key point detection model; and a model training module for training the key point detection model based on the predicted position information and a predetermined loss function for the training samples, wherein the predetermined loss function for the first type of training samples is constructed based on the predicted position information of adjacent key points among the plurality of key points.

According to another aspect of the present disclosure, there is provided an apparatus for detecting a key point of a target object, the apparatus including: the image acquisition module is used for acquiring an image to be processed comprising a target object; and the position information determining module is used for acquiring the position information of the key points of the target object in the image to be processed by adopting the key point detection model. The key point detection model is obtained by training by adopting the training device of the key point detection model.

According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method of training a keypoint detection model and/or a method of detecting keypoints of a target object provided by the present disclosure.

According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform a method of training a keypoint detection model and/or a method of detecting keypoints of a target object provided by the present disclosure.

According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method of training a keypoint detection model and/or the method of detecting keypoints of a target object provided by the present disclosure.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. In the drawings:

FIG. 1 is a schematic diagram of an application scenario of a method for training a keypoint detection model and a method for detecting keypoints of a target object according to an embodiment of the present disclosure;

FIG. 2 is a flow chart of a method of training a keypoint detection model according to an embodiment of the disclosure;

FIG. 3 is a flow chart of a method of training a keypoint detection model according to another embodiment of the present disclosure;

FIG. 4 is a schematic diagram illustrating a principle of determining a value of a predetermined loss function for a first class of training samples according to an embodiment of the present disclosure;

FIG. 5 is a flow chart of a method of detecting key points of a target object according to an embodiment of the present disclosure;

FIG. 6 is a block diagram of a training apparatus for a keypoint detection model according to an embodiment of the present disclosure;

FIG. 7 is a block diagram of an apparatus for detecting key points of a target object according to an embodiment of the present disclosure; and

FIG. 8 is a block diagram of an electronic device for implementing a method for training a keypoint detection model and/or a method for detecting keypoints of a target object according to an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

The disclosure provides a method for training a key point detection model, which comprises a sample acquisition stage, a prediction information acquisition stage and a model training stage. In the sample acquisition stage, training samples including the target object are acquired, wherein the training samples include at least a first type of training sample without a label. In the prediction information acquisition stage, predicted position information of a plurality of key points of the target object is obtained by using a key point detection model based on the training samples. In the model training stage, the keypoint detection model is trained based on the predicted position information and a predetermined loss function for the training samples. The predetermined loss function for the first type of training samples is constructed based on the predicted position information of adjacent key points among the plurality of key points.

An application scenario of the method and apparatus provided by the present disclosure will be described below with reference to fig. 1.

Fig. 1 is a schematic view of an application scenario of a method for training a keypoint detection model and a method for detecting keypoints of a target object according to an embodiment of the present disclosure.

As shown in fig. 1, the application scenario 100 includes a terminal device 110, which may be any electronic device with processing functionality, including but not limited to a smartphone, a tablet, a laptop, a desktop computer, a server, and so on.

The terminal device 110 may, for example, detect a target object from the input image 120, and label key points of the detected target object, thereby implementing identification of the target object. For example, the terminal device 110 may label the key points of the target object by using a key point detection model, and obtain the position information 130 of the key points of the target object.

Illustratively, the keypoint detection model may be, for example, a model constructed based on a regression method or a Gaussian heatmap method. The regression method directly outputs the coordinates of the key points of the target object and is suitable for detecting points with distinct texture features. The Gaussian heatmap method encodes the position of a point as the peak of a Gaussian-smoothed map and is suitable for fitting points of a non-rigid body.
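The Gaussian heatmap idea can be illustrated with a minimal sketch. The function names, the grid size and the value of sigma are assumptions made for illustration only; the disclosure does not specify them.

```python
import math

def encode_heatmap(x, y, width, height, sigma=1.5):
    """Encode one keypoint (x, y) as a height x width grid with a Gaussian peak at the point."""
    return [
        [math.exp(-((col - x) ** 2 + (row - y) ** 2) / (2 * sigma ** 2))
         for col in range(width)]
        for row in range(height)
    ]

def decode_heatmap(heatmap):
    """Recover the keypoint as the (x, y) position of the largest heatmap value."""
    best_value, best_xy = -1.0, (0, 0)
    for row, values in enumerate(heatmap):
        for col, value in enumerate(values):
            if value > best_value:
                best_value, best_xy = value, (col, row)
    return best_xy

# Hypothetical keypoint at column 5, row 3 of a 12 x 8 grid.
heatmap = encode_heatmap(x=5, y=3, width=12, height=8)
peak = decode_heatmap(heatmap)
```

A regression-style model would instead output the pair (x, y) directly, without the intermediate grid.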

According to an embodiment of the present disclosure, as shown in fig. 1, the application scenario 100 may further include a server 140, for example. Terminal device 110 may be communicatively coupled to server 140 via a network, which may include wired or wireless communication links.

For example, the server 140 may be configured to train the key point detection model and, in response to a model acquisition request sent by the terminal device 110, send the trained key point detection model 150 to the terminal device 110, so that the terminal device 110 can detect the target object in the input image.

Illustratively, the server may be, for example, a server that provides various services, such as a background management server that provides support for applications running on the terminal device 110. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.

According to an embodiment of the present disclosure, as shown in fig. 1, the application scenario 100 may further include a database 160, and the database 160 may maintain, for example, a large number of images, including images with labels and images without labels. The server 140 may access the database 160, for example, randomly extract some of the images in the database, and train the keypoint detection model using the extracted images as training samples.

In an embodiment, the terminal device 110 and the server 140 may be, for example, the same device, and the same device includes a first processing module for performing target object detection on an image and a second processing module for training a keypoint detection model. The first processing module and the second processing module can communicate with each other through a network protocol.

It should be noted that the method for training the keypoint detection model and the method for detecting the keypoints of the target object provided by the present disclosure may be executed by different ones of the server 140 and the terminal device 110, or by the same one of them. Accordingly, the training apparatus for the keypoint detection model and the apparatus for detecting the keypoints of the target object provided by the present disclosure may be disposed in different ones of the server 140 and the terminal device 110, or in the same one of them.

It should be understood that the number and types of terminal devices, servers, keypoint detection models and databases in fig. 1 are merely illustrative. There may be any number and type of terminal devices, servers, keypoint detection models, and databases, as desired for implementation.

FIG. 2 is a flow chart of a method of training a keypoint detection model according to an embodiment of the disclosure.

As shown in fig. 2, the method 200 for training the keypoint detection model of the embodiment may include operations S210 to S230.

In operation S210, training samples including a target object are obtained, the training samples including at least a first type of training sample without a label.

According to an embodiment of the present disclosure, each training sample may be, for example, an image including a target object. The target object may be any entity, such as an animal, a plant, an article of daily use, a garment, a shoe, a hat, etc.

Illustratively, the training samples may be obtained, for example, by randomly drawing images from a database, where the images stored in the database may include images with labels and images without labels. In one embodiment, the images with labels and the images without labels may be stored in different storage partitions of the database, for example. This embodiment may obtain a predetermined number of images from each storage partition to form the training samples. The label of an image indicates position information of a plurality of key points of the target object in the image, and the position information may be obtained by pre-calibration.
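Drawing a predetermined number of images from each storage partition might look like the sketch below. The partition contents, the function name and the fixed random seed are hypothetical and serve only to make the example reproducible.

```python
import random

def draw_training_samples(labeled_partition, unlabeled_partition,
                          n_labeled, n_unlabeled, seed=0):
    """Randomly draw a predetermined number of images from each partition.

    Returns (first_type, second_type): unlabeled samples and labeled samples.
    """
    rng = random.Random(seed)
    second_type = rng.sample(labeled_partition, n_labeled)    # images with labels
    first_type = rng.sample(unlabeled_partition, n_unlabeled)  # images without labels
    return first_type, second_type

# Hypothetical partitions: labeled entries pair an image id with its keypoint label.
labeled_partition = [("labeled_%d.png" % i, [(0.0, 0.0), (1.0, 1.0)]) for i in range(10)]
unlabeled_partition = ["unlabeled_%d.png" % i for i in range(50)]

first_type, second_type = draw_training_samples(
    labeled_partition, unlabeled_partition, n_labeled=3, n_unlabeled=8)
```

Setting `n_labeled=0` yields training samples of the first type only.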

For example, when the keypoint detection model is a pre-trained model with accuracy not meeting the predetermined requirement, the image may be acquired from a storage partition storing only images without labels, and the acquired images without labels may be used as a first class of training samples to perform unsupervised training on the keypoint detection model.

In operation S220, predicted position information of a plurality of key points of the target object is obtained using the key point detection model based on the training samples.

According to an embodiment of the disclosure, the keypoint detection model may be constructed based on a lightweight front-end MVC (Model View Controller) framework, for example, based on a Backbone framework. It is to be understood that the architecture of the keypoint detection model is merely an example to facilitate understanding of the present disclosure, and the present disclosure is not limited thereto.

For example, the keypoint detection model may be a neural network model constructed based on a regression method or a Gaussian heatmap method, so that the position coordinates of the keypoints in an input image are output by processing the input image. Operation S220 may use the training sample as the input of the key point detection model, and the key point detection model outputs the predicted coordinate values of the key points of the target object in the training sample. It is to be understood that the type of keypoint detection model and the algorithm employed are merely examples to facilitate understanding of the present disclosure, and the present disclosure is not limited thereto.

According to an embodiment of the present disclosure, when there are a plurality of training samples, the training samples may be input to the keypoint detection model in batches, so that the model is trained batch by batch.
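Feeding the samples in batches can be sketched as follows. The helper name and the batch size are assumptions; the disclosure does not fix either.

```python
def iter_batches(samples, batch_size):
    """Yield consecutive slices of `samples`, each holding at most `batch_size` items."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

# Hypothetical sample identifiers; each yielded batch would be one model input.
batches = list(iter_batches(["img0", "img1", "img2", "img3", "img4"], batch_size=2))
```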

It will be appreciated that the number of the plurality of keypoints depends on the type of target object. For the same target object, the number of the plurality of key points can be set according to actual requirements. The number of the plurality of key points may be set by an initial configuration parameter of the key point detection model, which is not limited in this disclosure.

In operation S230, the keypoint detection model is trained based on the predicted position information and a predetermined loss function for the training samples.

According to an embodiment of the disclosure, for the first type of training samples, the predetermined loss function may be constructed based on the predicted position information of adjacent key points among the plurality of key points. When the training sample is of the first type, operation S230 may determine the value of the predetermined loss function according to the predicted position information of the plurality of key points. The keypoint detection model is then trained based on the value of the predetermined loss function.

For example, the predicted position information of the plurality of key points may be used as the values of the variables in the predetermined loss function, so as to obtain the value of the predetermined loss function. A gradient descent algorithm or the like may then be used to determine the values of the parameters of the key point detection model at which the predetermined loss function takes its minimum value. Assigning these parameter values to the key point detection model optimizes the model. It is to be understood that the algorithm for minimizing the predetermined loss function is only an example to facilitate understanding of the present disclosure, and the present disclosure is not limited thereto. For example, a back propagation algorithm may also be employed to minimize the predetermined loss function, and an automatic differentiation system such as PyTorch may be used for this purpose.
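The following toy sketch shows gradient descent reducing a loss built only from adjacent key points. It makes several simplifying assumptions that are not part of the disclosure: the "parameters" being updated are the predicted coordinates themselves rather than the weights of a model, the loss uses squared length differences on a closed loop of key points so that it is smooth, and the gradient is estimated by finite differences instead of back propagation.

```python
import math

def smoothness_loss(flat):
    """Sum of squared length differences of neighbouring segments on a closed loop."""
    pts = [(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]
    n = len(pts)
    lengths = [math.hypot(pts[(i + 1) % n][0] - pts[i][0],
                          pts[(i + 1) % n][1] - pts[i][1]) for i in range(n)]
    return sum((lengths[i - 1] - lengths[i]) ** 2 for i in range(n))

def numerical_gradient(loss_fn, params, eps=1e-6):
    """Central finite-difference estimate of the gradient of loss_fn at params."""
    grad = []
    for k in range(len(params)):
        up = params[:k] + [params[k] + eps] + params[k + 1:]
        down = params[:k] + [params[k] - eps] + params[k + 1:]
        grad.append((loss_fn(up) - loss_fn(down)) / (2 * eps))
    return grad

def gradient_descent(loss_fn, params, lr=0.05, steps=50):
    """Repeatedly step against the gradient with a fixed learning rate."""
    params = list(params)
    for _ in range(steps):
        grad = numerical_gradient(loss_fn, params)
        params = [p - lr * g for p, g in zip(params, grad)]
    return params

# Four predicted key points forming a 2 x 1 rectangle, flattened as x0, y0, x1, y1, ...
start = [0.0, 0.0, 2.0, 0.0, 2.0, 1.0, 0.0, 1.0]
initial_loss = smoothness_loss(start)
final_loss = smoothness_loss(gradient_descent(smoothness_loss, start))
```

In a real training setup the gradient would be propagated through the model to its weights, for instance by an automatic differentiation system.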

For example, when determining the value of the predetermined loss function for the first type of training samples, the distance between the predicted positions of two adjacent key points may be calculated first. If the distance is greater than a predetermined distance, the predicted positions of the two key points are used as values of the variables in the predetermined loss function. Alternatively, the included angle between the connecting line between the two adjacent key points and a straight line parallel to a predetermined axis may be calculated first, and if the included angle value is greater than a predetermined included angle value, the predicted positions of the two key points are used as values of the variables in the predetermined loss function.
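The two selection rules can be sketched as below. The threshold values, the choice of the X axis as the predetermined axis, and the function names are assumptions for illustration.

```python
import math

def pairs_exceeding_distance(points, max_distance):
    """Return adjacent key point pairs whose predicted distance exceeds max_distance."""
    selected = []
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        if math.hypot(x2 - x1, y2 - y1) > max_distance:
            selected.append(((x1, y1), (x2, y2)))
    return selected

def pairs_exceeding_angle(points, max_angle_deg):
    """Return adjacent pairs whose connecting line makes an angle with the X axis
    (assumed here to be the predetermined axis) larger than max_angle_deg."""
    selected = []
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        angle = math.degrees(math.atan2(abs(y2 - y1), abs(x2 - x1)))  # in [0, 90]
        if angle > max_angle_deg:
            selected.append(((x1, y1), (x2, y2)))
    return selected

# Hypothetical predicted positions of four consecutive key points.
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 5.0), (2.0, 5.0)]
far_pairs = pairs_exceeding_distance(points, max_distance=2.0)
steep_pairs = pairs_exceeding_angle(points, max_angle_deg=45.0)
```

Only the selected pairs would then contribute to the value of the predetermined loss function.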

By constructing the predetermined loss function from the position information of adjacent key points, without relying on position information indicated by labels, the embodiment of the disclosure can use unlabeled training samples when training the key point detection model, thereby realizing unsupervised training of the model. Compared with related-art schemes that require a large number of labeled training samples for supervised training, the method can at least partially reduce the labeling cost while the trained model still meets the accuracy requirement. The method of the embodiment is particularly suitable for scenes in which many training samples are needed because the target object has a high degree of freedom and diverse feature textures.

FIG. 3 is a flow chart of a method of training a keypoint detection model according to another embodiment of the present disclosure.

As shown in fig. 3, the method 300 for training the keypoint detection model of this embodiment may include operations S310 to S340.

In operation S310, training samples including a target object are obtained, the training samples including a first type of training sample without a label and a second type of training sample with a label.

According to an embodiment of the present disclosure, the label of the second type of training sample indicates position information of a keypoint of the target object comprised by the second type of training sample. The second class of training samples may be obtained, for example, by labeling key points of the target object in the image. The implementation method of operation S310 is similar to the method for obtaining training samples described above, and is not described herein again.

In operation S320, a predetermined loss function for the training samples is determined according to the type of the training samples.

According to an embodiment of the present disclosure, the types of the training samples may include a first type and a second type: training samples of the first type are the first type of training samples, and training samples of the second type are the second type of training samples. The embodiment may determine the type of a training sample based on whether the training sample has a label. If the label is absent, the type is the first type; if the label is present, the type is the second type.

For example, in the case that the training samples are training samples of the first type, the predetermined loss function for the training samples may be a function constructed based on the predicted location information of the neighboring keypoints in the plurality of keypoints as described above. The predetermined loss function for the first type of training samples will be described in detail later and will not be described in detail here.

Illustratively, in the case where the training samples are of the second type, the predetermined loss function for the training samples includes any one of: a mean absolute error loss, a mean squared error loss, and a smoothed mean absolute error loss. It is to be understood that the above-mentioned types of predetermined loss functions for the second type of training samples are only examples to facilitate understanding of the present disclosure, and the present disclosure is not limited thereto.
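The three kinds of supervised loss can be sketched as below. The smoothed variant is written here as the common smooth L1 form with a threshold `beta`; this is an assumption, since the disclosure does not give the exact definition it intends.

```python
def _coordinate_diffs(pred, target):
    """Flatten per-keypoint (x, y) differences into one list."""
    return [p - t for pp, tp in zip(pred, target) for p, t in zip(pp, tp)]

def mean_absolute_error(pred, target):
    diffs = _coordinate_diffs(pred, target)
    return sum(abs(d) for d in diffs) / len(diffs)

def mean_squared_error(pred, target):
    diffs = _coordinate_diffs(pred, target)
    return sum(d * d for d in diffs) / len(diffs)

def smooth_l1(pred, target, beta=1.0):
    """Quadratic for small errors, linear for large ones (assumed smooth L1 form)."""
    diffs = _coordinate_diffs(pred, target)
    total = 0.0
    for d in diffs:
        a = abs(d)
        total += 0.5 * a * a / beta if a < beta else a - 0.5 * beta
    return total / len(diffs)

# Hypothetical predicted and labeled positions of two key points.
pred = [(0.0, 0.0), (2.0, 2.0)]
label = [(0.5, 0.0), (0.0, 2.0)]
mae = mean_absolute_error(pred, label)
mse = mean_squared_error(pred, label)
sl1 = smooth_l1(pred, label)
```

The squared error penalizes the large error on the second key point most heavily, while the smoothed variant grows only linearly for it.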

In operation S330, predicted position information of a plurality of key points of the target object is obtained using the key point detection model based on the training samples. According to an embodiment of the present disclosure, the implementation of operation S330 is similar to the method for obtaining the predicted location information described above, and is not described herein again.

In operation S340, the keypoint detection model is trained based on the predicted location information and a predetermined loss function for the training samples.

According to the embodiment of the present disclosure, when the training sample is the first type of training sample, the implementation method of operation S340 is similar to the method for training the keypoint detection model described above, and is not described herein again.

According to an embodiment of the present disclosure, when the training sample is of the second type, operation S340 may train the keypoint detection model based on the predetermined loss function for the training sample, the predicted position information of the plurality of keypoints, and the position information indicated by the label. For example, the value of the predetermined loss function may be determined according to the difference between the predicted position information of the plurality of key points and the position information indicated by the label. A back propagation algorithm or the like is then used to obtain the values of the parameters of the key point detection model at which the predetermined loss function takes its minimum value. Replacing the current parameter values of the key point detection model with these values yields the optimized key point detection model, thereby realizing the training of the model.
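Choosing the loss according to whether a sample carries a label can be sketched as follows. The two concrete losses used here (absolute length differences on a closed loop for unlabeled samples, mean squared error for labeled ones) are illustrative assumptions, as are the function names.

```python
import math

def unlabeled_loss(pred):
    """No label: sum of absolute length differences of neighbouring segments (closed loop)."""
    n = len(pred)
    lengths = [math.hypot(pred[(i + 1) % n][0] - pred[i][0],
                          pred[(i + 1) % n][1] - pred[i][1]) for i in range(n)]
    return sum(abs(lengths[i - 1] - lengths[i]) for i in range(n))

def labeled_loss(pred, label):
    """Has label: mean squared coordinate error against the labeled positions."""
    squared = [(p - t) ** 2 for pp, tp in zip(pred, label) for p, t in zip(pp, tp)]
    return sum(squared) / len(squared)

def loss_for_sample(pred, label=None):
    """Dispatch on the presence of a label."""
    if label is None:
        return unlabeled_loss(pred)
    return labeled_loss(pred, label)

# Hypothetical predictions and a hypothetical label for the same four key points.
pred = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]
label = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
loss_without_label = loss_for_sample(pred)
loss_with_label = loss_for_sample(pred, label)
```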

According to the embodiment of the disclosure, the key point detection model is trained by integrating the labeled training sample and the unlabeled training sample, so that the precision of the trained key point detection model can be further improved.

According to the embodiment of the disclosure, after the first type of training samples and the second type of training samples are obtained, the second type of training samples may, for example, first be used to pre-train the keypoint detection model, so that the values of the parameters of the keypoint detection model fall within a reasonable range. After the pre-training is finished, the key point detection model is trained either with a mixture of the first type and the second type of training samples or with the first type of training samples alone. That is, pre-training is performed in a supervised manner, followed by fine training in a mixed supervised and unsupervised manner or in a purely unsupervised manner. In this way, the training efficiency of the model can be improved.
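The two-stage schedule can be sketched as a list of step types. The step counts and the strict alternation used in the mixed stage are assumptions; the disclosure does not state how labeled and unlabeled samples are interleaved.

```python
def build_schedule(pretrain_steps, finetune_steps, mixed=True):
    """Return the sample type used at each training step.

    Pre-training uses labeled samples only; the second stage uses either a mix
    (alternating here, as an assumption) or unlabeled samples only.
    """
    schedule = ["labeled"] * pretrain_steps
    for step in range(finetune_steps):
        if mixed and step % 2 == 1:
            schedule.append("labeled")
        else:
            schedule.append("unlabeled")
    return schedule

mixed_schedule = build_schedule(pretrain_steps=2, finetune_steps=4)
unsupervised_schedule = build_schedule(pretrain_steps=1, finetune_steps=2, mixed=False)
```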

Fig. 4 is a schematic diagram of a principle of determining a value of a predetermined loss function for a first class of training samples according to an embodiment of the present disclosure.

According to the embodiment of the disclosure, in the case that the training sample is the first type of training sample, the operation of determining the value of the predetermined loss function may, for example, first determine a connection line between any two adjacent key points in the plurality of key points, as a target connection line. After the target connecting line is obtained, the difference of two target connecting lines between any three adjacent key points is determined based on the predicted position information of any three adjacent key points in the plurality of key points, and the value of the predetermined loss function is determined according to the difference.

For example, the sum of all differences determined from the plurality of key points may be taken as the value of the predetermined loss function, or the average of all differences determined from the plurality of key points may be taken as the value of the predetermined loss function.

For example, as shown in fig. 4, when the target object is a shoe 400, the number of the plurality of key points may be 18, for example, and the data output by the key point detection model may be the coordinate values of the 18 key points, which may be expressed, for example, relative to the coordinate system shown in fig. 4. The coordinate system takes the toe position of the shoe 400 as the origin O of coordinates, the shoe width direction as the X axis, and the shoe length direction as the Y axis. After the coordinate values of the 18 key points are obtained, a target connecting line can be determined from each pair of adjacent coordinate values. In this way, 18 target connecting lines (e.g., the dotted lines between adjacent keypoints in fig. 4) can be obtained.

For example, after the target connecting lines are obtained, the length of each target connecting line may be determined from the coordinate values of its two end points. The difference between the lengths of two adjacent target connecting lines may then be determined and used as the difference of the two target connecting lines between any three adjacent key points. For example, as shown in fig. 4, for three adjacent key points 401, 402 and 403, two target connecting lines are obtained by connecting each pair of adjacent key points. The lengths of the two target connecting lines are L₁ and L₂ respectively, and the difference between the lengths is L₁ - L₂. In an embodiment, the value of the difference may be |L₁ - L₂|, to avoid the value of the predetermined loss function becoming inaccurate because positive and negative differences cancel each other. This embodiment may use the sum of the determined difference values as the value of the predetermined loss function.
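The length-based variant can be sketched as follows, using absolute differences and either the sum or the average as the loss value. Treating the key points as a closed loop (the last point connects back to the first) and the `reduction` parameter name are assumptions for illustration.

```python
import math

def segment_lengths(points):
    """Length of the connecting line from each key point to the next (closed loop)."""
    n = len(points)
    return [math.hypot(points[(i + 1) % n][0] - points[i][0],
                       points[(i + 1) % n][1] - points[i][1]) for i in range(n)]

def length_difference_loss(points, reduction="sum"):
    """Sum or mean of absolute length differences between the two lines meeting at each point."""
    lengths = segment_lengths(points)
    n = len(lengths)
    # lengths[i - 1] ends at point i, lengths[i] starts at point i.
    diffs = [abs(lengths[i - 1] - lengths[i]) for i in range(n)]
    total = sum(diffs)
    return total / n if reduction == "mean" else total

# Hypothetical predictions: a unit square and a 2 x 1 rectangle.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
rectangle = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]
square_loss = length_difference_loss(square)
rectangle_sum = length_difference_loss(rectangle)
rectangle_mean = length_difference_loss(rectangle, reduction="mean")
```

Evenly spaced predictions give a loss of zero, while uneven spacing is penalized.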

For example, after the target connecting lines are obtained, the included angle between the two target connecting lines between any three adjacent key points may be determined from the coordinate values of the end points of the target connecting lines, and the included angle value may be used as the difference of the two target connecting lines. In an embodiment, the angle between each target connecting line and the X axis may instead be determined from the slope k of the target connecting line with respect to the X axis in fig. 4, the angle being, for example, tan⁻¹ k. The difference between the two angles that the two target connecting lines make with the X axis is then used as the difference of the two target connecting lines. For example, for three adjacent key points 401, 402 and 403 in fig. 4, the included angle between the two target connecting lines formed by connecting them is λ. This embodiment may take the sum of all determined included angle values as the value of the predetermined loss function.
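The angle-based variant can be sketched as below. As assumptions, the sketch measures each line's direction with `math.atan2` rather than the arctangent of the slope (so that vertical lines are handled), wraps each angle difference into the range from 0 to pi, and treats the key points as a closed loop.

```python
import math

def direction_angles(points):
    """Angle of the line from each key point to the next, relative to the X axis."""
    n = len(points)
    return [math.atan2(points[(i + 1) % n][1] - points[i][1],
                       points[(i + 1) % n][0] - points[i][0]) for i in range(n)]

def angle_difference_loss(points):
    """Sum of the angle differences between the two lines meeting at each key point."""
    angles = direction_angles(points)
    total = 0.0
    for i in range(len(angles)):
        gap = abs(angles[i - 1] - angles[i]) % (2 * math.pi)
        total += min(gap, 2 * math.pi - gap)  # wrap into [0, pi]
    return total

# Hypothetical predictions forming a unit square: four right-angle turns.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
square_angle_loss = angle_difference_loss(square)
```

For the square, each of the four turns contributes a quarter turn, so the total equals one full turn.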

For example, after the target connecting lines are obtained, both the lengths of the two target connecting lines between any three adjacent key points and the included angle value of the two target connecting lines, determined by the methods described above, may be used as the difference between the two target connecting lines. The lengths and the included angle value are substituted into the formula of the predetermined loss function to calculate the value of the predetermined loss function.

In the formula of the predetermined loss function, n is the number of the plurality of key points; p_i is the i-th key point of the plurality of key points, p_{i-1} is the (i-1)-th key point, and p_{i+1} is the (i+1)-th key point; D(p_{i-1}, p_i) is the length of the target connecting line between the (i-1)-th and i-th key points; D(p_i, p_{i+1}) is the length of the target connecting line between the i-th and (i+1)-th key points; V(p_{i-1}, p_i) is the rotation angle of the target connecting line between the (i-1)-th and i-th key points relative to a predetermined axis; V(p_i, p_{i+1}) is the rotation angle of the target connecting line between the i-th and (i+1)-th key points relative to the predetermined axis; and V(p_{i-1}, p_i) - V(p_i, p_{i+1}) is the included angle value between the two target connecting lines. The index i-1 takes the value n when i = 1, and the index i+1 takes the value 1 when i = n. The predetermined axis may be any coordinate axis of a coordinate system constructed based on the target object, or may be any axis set in advance.
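The quantities D and V and the wrap-around indexing can be put together in a sketch such as the one below. The way the two terms are combined here (an unweighted sum of the absolute length difference and the wrapped angle difference at each key point) is only an assumption for illustration and should not be read as the formula of the disclosure; the use of the X axis as the predetermined axis and the helper `angle_gap` are likewise assumptions.

```python
import math

def D(p, q):
    """Length of the target connecting line between key points p and q."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def V(p, q):
    """Rotation angle of the line from p to q relative to the X axis (assumed axis)."""
    return math.atan2(q[1] - p[1], q[0] - p[0])

def angle_gap(a, b):
    """Absolute difference of two angles, wrapped into [0, pi]."""
    gap = abs(a - b) % (2 * math.pi)
    return min(gap, 2 * math.pi - gap)

def combined_loss(points):
    """Assumed combination: sum over i of |D difference| plus wrapped V difference."""
    n = len(points)
    total = 0.0
    for i in range(1, n + 1):                 # i is 1-based, as in the text
        prev_i = n if i == 1 else i - 1       # i-1 takes the value n when i = 1
        next_i = 1 if i == n else i + 1       # i+1 takes the value 1 when i = n
        p_prev, p_i, p_next = points[prev_i - 1], points[i - 1], points[next_i - 1]
        total += abs(D(p_prev, p_i) - D(p_i, p_next))
        total += angle_gap(V(p_prev, p_i), V(p_i, p_next))
    return total

# Hypothetical predictions: a unit square and a 2 x 1 rectangle.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
rectangle = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]
square_value = combined_loss(square)
rectangle_value = combined_loss(rectangle)
```

Both shapes have the same turning angles, so the two values differ only by the length term.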

In summary, by setting the predetermined loss function for the first class of training samples, this embodiment enables unsupervised training of the key point detection model, and the cost of labeling can be at least partially reduced while the trained model still meets the precision requirement. The method of this embodiment is particularly suitable for scenes in which more training samples are needed because the target object (such as a shoe) has a high degree of freedom and diversified feature textures.

Furthermore, because the value of the loss function is determined according to the difference between the two target connecting lines formed by connecting three adjacent key points, the relative position information between the key points is fully considered. This allows the model to learn the characteristics of the target object more accurately and thus helps improve the accuracy of the trained key point detection model.

Based on the above training method of the key point detection model, the present disclosure also provides a method for detecting key points of a target object. This method will be described in detail below with reference to fig. 5.

Fig. 5 is a flowchart of a method of detecting a target object keypoint according to an embodiment of the present disclosure.

As shown in fig. 5, the method 500 of detecting a target object keypoint of the embodiment may include operations S510 to S520.

In operation S510, a to-be-processed image including a target object is acquired.

According to the embodiments of the present disclosure, the target object is similar to the target object described above, and is not described herein again. The image to be processed may be, for example, an image photographed in real time, or may be an image that is previously photographed and then cached. In the virtual fitting scene, the image to be processed may be a garment image shot in advance, or a shoe image, or the like.

In operation S520, position information of a keypoint of a target object in an image to be processed is obtained using a keypoint detection model. The keypoint detection model may be obtained by training using the aforementioned training method for the keypoint detection model, for example.

It is understood that the operation S520 is similar to the method for obtaining the predicted position information of the plurality of key points of the target object in the training sample by using the key point detection model described above, except that the key point detection model in this embodiment is a model that is trained in advance and has a precision meeting a condition.

According to the method for detecting the key points of the target object, the key point detection model obtained by the training method described above can accurately detect the key points of a target object with a high degree of freedom and diversified texture features, which helps improve user experience.

Based on the training method of the key point detection model, the disclosure also provides a training device of the key point detection model. The apparatus will be described in detail below with reference to fig. 6.

Fig. 6 is a block diagram of a structure of a training apparatus for a keypoint detection model according to an embodiment of the present disclosure.

As shown in fig. 6, the training apparatus 600 for a keypoint detection model of this embodiment may include a sample acquisition module 610, a prediction information obtaining module 620, and a model training module 630.

The sample acquisition module 610 is configured to acquire training samples including target objects, where the training samples include at least a first type of training sample without a label. In an embodiment, the sample acquisition module 610 may be configured to perform the operation S210 described above, for example, and is not described herein again.

The prediction information obtaining module 620 is configured to obtain predicted position information of a plurality of key points of the target object by using a key point detection model based on the training samples. In an embodiment, the prediction information obtaining module 620 may be configured to perform the operation S220 described above, for example, and is not described herein again.

The model training module 630 is configured to train the keypoint detection model based on the predicted position information and the predetermined loss function for the training samples. The predetermined loss function for the first class of training samples is constructed based on the predicted position information of adjacent key points in the plurality of key points. In an embodiment, the model training module 630 may be configured to perform the operation S230 described above, for example, and is not described herein again.

According to an embodiment of the present disclosure, the model training module 630 may include, for example, a value determination submodule and a training submodule. The value determination submodule is configured to determine the value of the predetermined loss function based on the predicted position information of the plurality of key points. The training submodule is configured to train the key point detection model according to the value of the predetermined loss function.

According to an embodiment of the present disclosure, the value determination submodule may include, for example, a connection line determination unit, a difference determination unit, and a value determination unit. The connecting line determining unit is used for determining a connecting line between any two adjacent key points in the plurality of key points as a target connecting line under the condition that the training sample is the first type of training sample. The difference determining unit is used for determining the difference of two target connecting lines between any three adjacent key points based on the predicted position information of any three adjacent key points in the plurality of key points. The value determining unit is used for determining the value of the predetermined loss function according to the difference.

According to an embodiment of the present disclosure, the difference determined by the difference determination unit comprises at least one of: the difference of the lengths of the two target connecting lines and the included angle value between the two target connecting lines.

According to an embodiment of the present disclosure, the predetermined loss function for the first class of training samples is expressed by the following formula:

wherein n is the number of the plurality of key points, p_i is the i-th key point of the plurality of key points, p_{i-1} is the (i-1)-th key point of the plurality of key points, p_{i+1} is the (i+1)-th key point of the plurality of key points, D(p_{i-1}, p_i) is the length of the target connecting line between the (i-1)-th key point and the i-th key point, D(p_i, p_{i+1}) is the length of the target connecting line between the i-th key point and the (i+1)-th key point, V(p_{i-1}, p_i) is the rotation angle of the target connecting line between the (i-1)-th key point and the i-th key point relative to a predetermined axis, and V(p_i, p_{i+1}) is the rotation angle of the target connecting line between the i-th key point and the (i+1)-th key point relative to the predetermined axis, wherein when i = 1, i-1 is assigned the value n, and when i = n, i+1 is assigned the value 1.

According to an embodiment of the present disclosure, the training samples further comprise a second class of training samples having labels indicating location information of a plurality of key points in the target object. The training apparatus 600 for the keypoint detection model may further include a loss function determining module, configured to determine a predetermined loss function for the training samples according to the types of the training samples.

According to an embodiment of the present disclosure, the model training module is specifically configured to: in the case where the training samples are second-class training samples, the keypoint detection model is trained based on a predetermined loss function for the training samples, predicted position information of a plurality of keypoints, and position information indicated by the labels.

According to an embodiment of the present disclosure, in a case where the training samples are training samples of a second class, the predetermined loss function for the training samples includes any one of: mean absolute error, mean square error loss, smoothed squared absolute error.
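The choice of loss function by sample type can be illustrated with a short sketch. It is only an assumption-laden example: the names `mean_absolute_error`, `mean_squared_error` and `loss_for_sample` are hypothetical, key points are assumed to be (x, y) tuples, a missing label is taken to mark a first-class (unlabeled) sample, and only the mean absolute error and mean square error options are shown.

```python
def _flatten(points):
    """Turn [(x1, y1), (x2, y2), ...] into [x1, y1, x2, y2, ...]."""
    return [coord for point in points for coord in point]


def mean_absolute_error(pred, label):
    """Mean of absolute coordinate errors between predicted and labeled key points."""
    a, b = _flatten(pred), _flatten(label)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def mean_squared_error(pred, label):
    """Mean of squared coordinate errors between predicted and labeled key points."""
    a, b = _flatten(pred), _flatten(label)
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def loss_for_sample(pred, label=None, supervised=mean_squared_error, unsupervised=None):
    """Pick the loss by sample type.

    ASSUMPTION: label is None for first-class (unlabeled) samples and holds the
    labeled key point positions for second-class samples.
    """
    if label is not None:
        # Second-class sample: compare predictions with the labeled positions.
        return supervised(pred, label)
    if unsupervised is None:
        raise ValueError("an unsupervised loss is required for unlabeled samples")
    # First-class sample: the loss depends on the predictions alone.
    return unsupervised(pred)


pred = [(0.0, 0.0), (2.0, 2.0)]
label = [(1.0, 0.0), (2.0, 4.0)]
labeled_loss = loss_for_sample(pred, label, supervised=mean_absolute_error)
```

Passing the loss functions in as arguments keeps the dispatch independent of which supervised or unsupervised loss is actually chosen.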

Based on the method for detecting the key points of the target object, the disclosure also provides a device for detecting the key points of the target object. The apparatus will be described in detail below with reference to fig. 7.

Fig. 7 is a block diagram of a structure of an apparatus for detecting a key point of a target object according to an embodiment of the present disclosure.

As shown in fig. 7, the apparatus 700 for detecting key points of a target object of this embodiment may include an image acquisition module 710 and a position information determining module 720.

The image acquisition module 710 is configured to acquire an image to be processed including a target object. In an embodiment, the image acquisition module 710 may be configured to perform the operation S510 described above, which is not described herein again.

The position information determining module 720 is configured to obtain position information of a key point of a target object in the image to be processed by using the key point detection model. The key point detection model may be obtained by training with the training device of the key point detection model described above. In an embodiment, the position information determining module 720 may be configured to perform the operation S520 described above, which is not described herein again.

It should be noted that, in the technical solution of the present disclosure, the acquisition, storage, application, and other handling of the personal information of the users involved all comply with the relevant laws and regulations, and do not violate public order and good customs.

The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.

FIG. 8 illustrates a schematic block diagram of an example electronic device 800 that may be used to implement the method of training the keypoint detection model and/or the method of detecting keypoints for a target object of embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 8, the device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.

A number of components in the device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, or the like; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, or the like; and a communication unit 809 such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.

The computing unit 801 may be any of a variety of general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The computing unit 801 performs the various methods and processes described above, such as a method of training a keypoint detection model and/or a method of detecting keypoints of a target object. For example, in some embodiments, the method of training the keypoint detection model and/or the method of detecting keypoints for the target object may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program can be loaded and/or installed onto the device 800 via the ROM 802 and/or the communication unit 809. When loaded into the RAM 803 and executed by the computing unit 801, the computer program may perform one or more steps of the above described method of training the keypoint detection model and/or method of detecting keypoints of a target object. Alternatively, in other embodiments, the computing unit 801 may be configured by any other suitable means (e.g., by means of firmware) to perform a method of training a keypoint detection model and/or a method of detecting keypoints of a target object.

Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and addresses the high management difficulty and weak service scalability of traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server incorporating a blockchain.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.

The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims

1. A method for training a keypoint detection model, comprising:

obtaining training samples including a target object, wherein the training samples at least include a first type of training sample without a label;

based on the training sample, obtaining predicted position information of a plurality of key points of the target object by adopting a key point detection model; and

training the keypoint detection model based on a predetermined loss function for the training samples and the predicted position information,

wherein the predetermined loss function for the first class of training samples is constructed based on the predicted position information of adjacent key points in the plurality of key points;

wherein training the keypoint detection model comprises:

determining values of the predetermined loss function based on the predicted position information of the plurality of key points; and

training the key point detection model according to the value of the predetermined loss function;

wherein, in the case that the training sample is a first type of training sample, determining the value of the predetermined loss function includes:

determining a connecting line between any two adjacent key points in the plurality of key points as a target connecting line;

determining the difference of two target connecting lines between any three adjacent key points based on the predicted position information of the any three adjacent key points in the plurality of key points; and

determining the value of the predetermined loss function according to the difference.

2. The method of claim 1, wherein determining a difference of two target links between the any three keypoints comprises at least one of:

determining the difference value of the lengths of the two target connecting lines;

and determining the included angle value between the two target connecting lines.

3. The method of claim 2, wherein the predetermined loss function for the first class of training samples is expressed by the following equation:

wherein n is the number of the plurality of key points, p_i is the i-th key point of said plurality of key points, p_{i-1} is the (i-1)-th key point of said plurality of key points, p_{i+1} is the (i+1)-th key point of said plurality of key points, D(p_{i-1}, p_i) is the length of the target connecting line between the (i-1)-th key point and the i-th key point, D(p_i, p_{i+1}) is the length of the target connecting line between the i-th key point and the (i+1)-th key point, V(p_{i-1}, p_i) is the rotation angle of the target connecting line between the (i-1)-th key point and the i-th key point relative to a predetermined axis, and V(p_i, p_{i+1}) is the rotation angle of the target connecting line between the i-th key point and the (i+1)-th key point relative to the predetermined axis, wherein when i = 1, i-1 is assigned the value n, and when i = n, i+1 is assigned the value 1.

4. The method according to any one of claims 1-3, wherein the training samples further comprise a second class of training samples having labels indicating location information for a plurality of key points in the target object; the method further comprises the following steps:

determining a predetermined loss function for the training samples according to the type of the training samples.

5. The method of claim 4, wherein training the keypoint detection model if the training samples are of the second class comprises:

training the keypoint detection model based on a predetermined loss function for the training samples, predicted position information for the plurality of keypoints, and position information indicated by the labels.

6. The method of claim 4, wherein, in the case that the training samples are of the second class, the predetermined loss function for the training samples comprises any one of: mean absolute error, mean square error loss, smoothed squared absolute error.

7. A method of detecting key points of a target object, comprising:

acquiring an image to be processed including a target object; and

obtaining the position information of the key points of the target object in the image to be processed by adopting a key point detection model,

wherein the key point detection model is obtained by training using the method according to any one of claims 1-6.

8. A training apparatus for a keypoint detection model, comprising:

a sample acquisition module, configured to acquire training samples including a target object, wherein the training samples at least include a first type of training sample without a label;

a prediction information obtaining module, configured to obtain predicted position information of a plurality of key points of the target object by adopting a key point detection model based on the training sample; and

a model training module to train the keypoint detection model based on a predetermined loss function for the training samples and the predicted location information,

wherein the predetermined loss function for the first class of training samples is constructed based on predicted position information of adjacent key points in the plurality of key points;

wherein the model training module comprises:

a value determination submodule, configured to determine a value of the predetermined loss function based on predicted location information of the plurality of key points; and

a training submodule, configured to train the key point detection model according to the value of the predetermined loss function;

wherein, the value determination submodule comprises:

a connecting line determining unit, configured to determine, when the training sample is a first-class training sample, a connecting line between any two adjacent key points in the plurality of key points, as a target connecting line;

a difference determining unit, configured to determine a difference between two target connection lines between any three adjacent key points in the plurality of key points based on predicted position information of the any three adjacent key points; and

a value determination unit, configured to determine the value of the predetermined loss function according to the difference.

9. The apparatus of claim 8, wherein the difference determined by the difference determination unit comprises at least one of:

the difference of the lengths of the two target connecting lines;

and the included angle between the two target connecting lines.

10. The apparatus of claim 9, wherein the predetermined loss function for the first class of training samples is expressed by the following equation:

11. The apparatus according to any one of claims 8-10, wherein the training samples further comprise a second class of training samples having labels indicating location information of a plurality of key points in the target object; the device further comprises:

a loss function determination module for determining a predetermined loss function for the training samples according to the type of the training samples.

12. The apparatus of claim 11, wherein the model training module is specifically configured to:

training the keypoint detection model based on a predetermined loss function for the training samples, the predicted position information of the plurality of keypoints, and the position information indicated by the label, if the training samples are the second class of training samples.

13. The apparatus of claim 11, wherein, in the case that the training samples are the second class of training samples, the predetermined loss function for the training samples comprises any one of: mean absolute error, mean square error loss, smoothed squared absolute error.

14. An apparatus for detecting key points of a target object, comprising:

an image acquisition module, configured to acquire an image to be processed including a target object; and

a position information determining module, configured to obtain position information of the key points of the target object by taking the image to be processed as the input of a key point detection model,

wherein the key point detection model is obtained by training using the apparatus of any one of claims 8-13.

15. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.

16. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any of claims 1-7.