CN113128436B - Method and device for detecting key points - Google Patents

Method and device for detecting key points

Info

Publication number
CN113128436B
CN113128436B (application CN202110459116.1A)
Authority
CN
China
Prior art keywords
heatmap
value
point
loss value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110459116.1A
Other languages
Chinese (zh)
Other versions
CN113128436A (en)
Inventor
孟庆月
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110459116.1A priority Critical patent/CN113128436B/en
Publication of CN113128436A publication Critical patent/CN113128436A/en
Application granted granted Critical
Publication of CN113128436B publication Critical patent/CN113128436B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20: Movements or behaviour, e.g. gesture recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Biology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a method and a device for detecting key points, and relates to artificial intelligence technologies such as deep learning and video processing. A specific implementation comprises: acquiring a multipoint heatmap of a first video frame in a target video, wherein the multipoint heatmap contains each key point of a target in the first video frame; inputting the multipoint heatmap and a second video frame of the target video into a lightweight key point detection model to obtain single-point heatmaps containing the key points of the target in the second video frame, wherein the number of parameters of the lightweight key point detection model is smaller than a specified value, and the number of frames separating the first video frame from the second video frame is not greater than a preset difference; and determining the key point coordinate values of the target in the second video frame according to the single-point heatmaps. By feeding the multipoint heatmap of the first video frame into a lightweight model together with a nearby second video frame, the method and the device realize a fast and accurate detection process.

Description

Method and device for detecting key points
Technical Field
The disclosure relates to the field of computer technology, in particular to artificial intelligence technologies such as deep learning and video processing, and specifically to a method and a device for detecting key points.
Background
Pose is typically estimated from the geometric relationships of objects or of their key points. Specifically, the key points in an image, or in each frame of a video, can be detected, and pose estimation is performed using the detection result.
In the related art, when estimating the pose of a human body, for example, the detected key points are the coordinate positions of the body's joints. Specifically, a detector may first locate a target box indicating the position of the person, and the joint coordinates may then be detected within that box.
Disclosure of Invention
Provided are a method and device for detecting key points, an electronic device and a storage medium.
According to a first aspect, there is provided a method for detecting key points, comprising: acquiring a multipoint heatmap of a first video frame in a target video, wherein the multipoint heatmap contains each key point of a target in the first video frame; inputting the multipoint heatmap and a second video frame of the target video into a lightweight key point detection model to obtain single-point heatmaps containing the key points of the target in the second video frame, wherein the number of parameters of the lightweight key point detection model is smaller than a specified value, and the number of frames separating the first video frame from the second video frame is not greater than a preset difference; and determining the key point coordinate values of the target in the second video frame according to the single-point heatmaps.
According to a second aspect, there is provided an apparatus for detecting key points, comprising units configured to: acquire a multipoint heatmap of a first video frame in a target video, wherein the multipoint heatmap contains each key point of a target in the first video frame; input the multipoint heatmap and a second video frame of the target video into a lightweight key point detection model to obtain single-point heatmaps containing the key points of the target in the second video frame, wherein the number of parameters of the lightweight key point detection model is smaller than a specified value, and the number of frames separating the first video frame from the second video frame is not greater than a preset difference; and determine the key point coordinate values of the target in the second video frame according to the single-point heatmaps.
According to a third aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any of the embodiments of the method of keypoint detection.
According to a fourth aspect, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method according to any one of the embodiments of the method of detecting a keypoint.
According to a fifth aspect, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method according to any one of the embodiments of the method for detecting key points.
According to the disclosed scheme, the multipoint heatmap of a first video frame in a video can be input into a lightweight model together with a nearby second video frame to detect the second video frame, thereby realizing a fast and accurate detection process.
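The disclosed scheme can be sketched as a simple detection loop. This is a minimal illustration only: `accurate_model` and `light_model` are hypothetical callables standing in for the two neural networks, and the element-wise maximum used in `merge` is one plausible merging rule, not one fixed by the disclosure.

```python
def merge(single_point_heatmaps):
    # Combine per-keypoint heatmaps (flat lists here, for simplicity)
    # into one multipoint heatmap via element-wise max.
    return [max(vals) for vals in zip(*single_point_heatmaps)]

def detect_video(frames, accurate_model, light_model):
    """Return per-frame single-point heatmaps for a list of video frames.

    The accurate (large) model runs only on the starting frame; every
    later frame reuses the previous frame's multipoint heatmap as an
    extra input to the lightweight model.
    """
    results = []
    single = accurate_model(frames[0])  # one heatmap per key point
    results.append(single)
    multi = merge(single)
    for frame in frames[1:]:
        single = light_model(multi, frame)
        results.append(single)
        multi = merge(single)
    return results
```

The point of the loop is that the expensive model is invoked exactly once per video, while temporal continuity lets the small model stay accurate on subsequent frames.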
Drawings
Other features, objects and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which some embodiments of the present disclosure may be applied;
FIG. 2a is a flow chart of one embodiment of a method of keypoint detection according to the present disclosure;
FIG. 2b is a flow chart of yet another embodiment of a method of keypoint detection according to the present disclosure;
FIG. 3 is a schematic diagram of one application scenario of a method of keypoint detection according to the present disclosure;
FIG. 4 is a flow chart of yet another embodiment of a method of keypoint detection according to the present disclosure;
FIG. 5 is a schematic block diagram of one embodiment of a keypoint detection apparatus according to the present disclosure;
fig. 6 is a block diagram of an electronic device for implementing the method of detecting keypoints of the embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the technical scheme of the disclosure, the acquisition, storage and application of the personal information of the users involved all comply with the relevant laws and regulations, necessary security measures are taken, and public order and good customs are not violated.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the method of detecting keypoints or the apparatus for detecting keypoints of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various communication client applications, such as video applications, live applications, instant messaging tools, mailbox clients, social platform software, and the like, may be installed on the terminal devices 101, 102, and 103.
Here, the terminal apparatuses 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices having a display screen, including but not limited to smart phones, tablet computers, e-book readers, laptop portable computers, desktop computers, and the like. When the terminal apparatuses 101, 102, 103 are software, they can be installed in the electronic apparatuses listed above. It may be implemented as multiple pieces of software or software modules (e.g., multiple pieces of software or software modules to provide distributed services) or as a single piece of software or software module. And is not particularly limited herein.
The server 105 may be a server providing various services, such as a background server providing support for the terminal devices 101, 102, 103. The background server may analyze and perform other processing on the received data such as the target video, and feed back a processing result (for example, a key point coordinate value of the video frame) to the terminal device.
It should be noted that the method for detecting a keypoint provided by the embodiment of the present disclosure may be executed by the server 105 or the terminal devices 101, 102, and 103, and accordingly, the device for detecting a keypoint may be disposed in the server 105 or the terminal devices 101, 102, and 103.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2a, a flow 200 of one embodiment of a method of keypoint detection according to the present disclosure is shown. The detection method of the key points comprises the following steps:
step 201, a multipoint thermodynamic diagram of a first video frame in a target video is obtained, wherein the multipoint thermodynamic diagram includes each key point of a target in the first video frame.
In this embodiment, an execution subject (for example, the server or the terminal device shown in fig. 1) on which the method for detecting the keypoint is executed may obtain the multipoint thermodynamic diagram of the first video frame in the target video. The multipoint thermodynamic diagram contains the respective keypoints of the object in the first video frame. For example, the target is a human body, and the multipoint thermodynamic diagram contains various key points of the human body.
The multipoint thermodynamic diagram may be obtained by detecting the first video frame by a deep neural network for predicting the multipoint thermodynamic diagram, or may be obtained by combining the respective thermodynamic diagrams after detecting the first video frame and outputting the thermodynamic diagrams by the deep neural network for predicting the thermodynamic diagram.
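The merging step described above can be sketched as follows. This assumes each single-point heatmap is an (H, W) array with one peak; the disclosure does not fix the merge rule, so an element-wise maximum is used here as a common concrete choice.

```python
import numpy as np

def merge_heatmaps(single_point_heatmaps):
    """Merge K single-point (H, W) heatmaps into one multipoint heatmap."""
    stacked = np.stack(single_point_heatmaps, axis=0)  # (K, H, W)
    return stacked.max(axis=0)                         # (H, W)
```

The maximum preserves every key point's peak value, whereas a sum could let nearby peaks bleed into each other.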
It should be noted that the present application does not limit the order of the first video frame and the second video frame in the target video; that is, the playing time of the first video frame may be either before or after that of the second video frame.
Step 202, the multipoint heatmap and a second video frame of the target video are input into the lightweight key point detection model to obtain single-point heatmaps containing the key points of the target in the second video frame, wherein the number of parameters of the lightweight key point detection model is smaller than a specified value, and the number of frames separating the first video frame from the second video frame is not greater than a preset difference.
In this embodiment, the executing entity may input the multipoint heatmap and the second video frame of the target video into the lightweight key point detection model, and obtain the single-point heatmaps output by the model. The key points contained in these single-point heatmaps are the key points of the target in the second video frame.
In practice, a single point refers to a single key point. That is, the key point contained in one single-point heatmap is one key point of the target, such as the fingertip of the index finger of a person's right hand.
The lightweight key point detection model is a deep neural network; being a small, lightweight model with few parameters, it can detect the second video frame at high speed.
The number of frames separating the first video frame from the second video frame is small; for example, the difference may be 1 or 2, meaning the two video frames are adjacent or separated by one frame.
Step 203, the key point coordinate values of the target in the second video frame are determined according to the single-point heatmaps.
In this embodiment, the executing entity may determine the key point coordinate values of the second video frame according to the single-point heatmaps, thereby achieving key point detection for the second video frame.
In practice, the executing entity may determine the key point coordinate values of the target in the second video frame from the single-point heatmaps in various ways. For example, it may directly read off the key point coordinate value indicated by each single-point heatmap, thereby obtaining the key point coordinate values of the target in the second video frame. Alternatively, it may merge the single-point heatmaps into a multipoint heatmap and use the key point coordinate values indicated by the multipoint heatmap as those of the target in the second video frame.
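Reading a coordinate off a single-point heatmap can be sketched as taking its peak location. The disclosure does not specify the decoder; argmax followed by rescaling to image resolution is one standard assumption.

```python
import numpy as np

def decode_keypoint(heatmap, image_size):
    """Return (x, y) in image coordinates for the heatmap's peak.

    heatmap: (H, W) array with one peak; image_size: (img_h, img_w).
    """
    h, w = heatmap.shape
    idx = int(np.argmax(heatmap))
    row, col = divmod(idx, w)          # peak position in heatmap grid
    img_h, img_w = image_size
    return (col * img_w / w, row * img_h / h)  # rescale to image pixels
```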
Specifically, the executing entity may detect the key points in the video frames of the target video in order from first to last, or from last to first. The executing entity may also use other orders; for example, detection may start from the video frame at the temporal midpoint and proceed toward both ends of the video.
The target detected in the present disclosure may be a landscape, a human face, an object trajectory, and the like.
In the case where the detected target is a human face, it should be noted that:
the models (the lightweight key point detection model and the accurate key point detection model) in this embodiment are not models for a specific user, and cannot reflect personal information of a specific user.
In this embodiment, the executing entity of the method for detecting the key point may obtain the video in various public and legal compliance manners, for example, may obtain the video from a public data set, or obtain the video from the user after authorization of the user.
It should be noted that the model obtained through this step may include the face information of the user shown in the video, but the model is constructed only after the user's authorization, and the construction process complies with relevant laws and regulations.
The method provided by the above embodiment of the present disclosure inputs the multipoint heatmap of a first video frame of a video into a lightweight model together with a nearby second video frame to detect the second video frame, thereby realizing a fast and accurate detection process.
In some optional implementations of this embodiment, the first video frame is the starting video frame detected in the video, and step 201 may include: inputting the starting video frame into an accurate key point detection model to obtain single-point heatmaps containing the key points of the target in the starting video frame, wherein the number of parameters of the accurate key point detection model is greater than that of the lightweight key point detection model; and merging the single-point heatmaps and generating a multipoint heatmap from the merged result.
In these optional implementations, the executing entity may input the detected starting video frame into the accurate key point detection model and obtain the single-point heatmaps output by the model, which contain the key points of the target in the starting video frame. The accurate key point detection model is also a deep neural network. It is a large model with many parameters and high detection accuracy. Specifically, the detected starting video frame is the first frame detected in the video.
The executing entity may merge the single-point heatmaps output by the accurate key point detection model and generate a multipoint heatmap from the merged result in various ways. For example, it may directly use the merged single-point heatmaps as the multipoint heatmap, or it may apply preset processing, such as Gaussian blurring, to the merged result and use the processed result as the multipoint heatmap.
As shown in fig. 2b, the starting video frame (of size 192 × 144, with 3 channels) is input into the accurate key point detection model to obtain single-point heatmaps (each of size 24 × 18; 17 key points correspond to 17 single-point heatmaps), from which a multipoint heatmap (of size 24 × 18, containing 17 key points) is obtained. The multipoint heatmap and the second video frame after the starting video frame are then input into the lightweight key point detection model (whose input is of size 192 × 144, with 4 channels), yielding in turn the multipoint heatmap of the second frame. In the same way, the multipoint heatmap of the second video frame and the third video frame are input into the lightweight key point detection model.
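The tensor shapes in the fig. 2b example can be checked with a short script. Model internals are stubbed; only the shapes follow the text, and the 8× nearest-neighbour upsampling used to bring the 24 × 18 heatmap to the 192 × 144 input resolution is an assumption the disclosure does not spell out.

```python
import numpy as np

H, W, K = 192, 144, 17
frame = np.zeros((H, W, 3))            # 3-channel starting frame
single = np.random.rand(K, 24, 18)     # 17 single-point heatmaps (stubbed)
multi = single.max(axis=0)             # 24 x 18 multipoint heatmap
# Upsample the multipoint heatmap to frame resolution (assumed 8x
# nearest-neighbour) and append it to the next frame as a 4th channel,
# forming the lightweight model's 192 x 144 x 4 input.
multi_up = np.kron(multi, np.ones((8, 8)))   # 24*8 = 192, 18*8 = 144
next_frame = np.zeros((H, W, 3))
light_input = np.concatenate([next_frame, multi_up[..., None]], axis=-1)
assert light_input.shape == (192, 144, 4)
```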
In these implementations, the detection result of the detected starting video frame can be determined with an accurate model, so that the second video frame can exploit the accurate detection result of the starting video frame; on top of the efficiency gained by detecting the second video frame with a lightweight model, the detection accuracy is thus improved.
In some optional implementations of this embodiment, inputting the multipoint heatmap and the second video frame of the target video into the lightweight key point detection model in step 202 may include: concatenating the multipoint heatmap and the second video frame to obtain a concatenation result; and inputting the concatenation result into the lightweight key point detection model.
In these optional implementations, the executing entity may first concatenate the multipoint heatmap and the second video frame, and input the concatenation result into the lightweight key point detection model, thereby completing the process of inputting the multipoint heatmap and the second video frame into the model. In practice, the concatenation result is a 4-channel image, so the concatenation makes the input to the lightweight key point detection model a 4-channel image.
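The concatenation step itself is a channel-wise stack. This sketch assumes the frame and heatmap have already been brought to the same spatial size:

```python
import numpy as np

def stitch(frame_rgb, multipoint_heatmap):
    """Concatenate an (H, W, 3) frame and an (H, W) heatmap into (H, W, 4)."""
    if frame_rgb.shape[:2] != multipoint_heatmap.shape:
        raise ValueError("frame and heatmap must share spatial dimensions")
    # Add a trailing channel axis to the heatmap, then stack on channels.
    return np.concatenate([frame_rgb, multipoint_heatmap[..., None]], axis=-1)
```

Treating the heatmap as a fourth input channel lets a standard convolutional backbone consume the temporal prior without any architectural change beyond widening its first layer.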
These implementations realize the joint input of the multipoint heatmap and the second video frame through concatenation.
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for detecting key points according to this embodiment. In the application scenario of fig. 3, the executing entity 301 obtains a multipoint heatmap 302 of a first video frame in the target video, wherein the multipoint heatmap 302 contains the key points of a target in the first video frame. The executing entity 301 inputs the multipoint heatmap 302 and a second video frame 303 of the target video into the lightweight key point detection model 304 and obtains single-point heatmaps 305 containing the key points of the target in the second video frame, wherein the number of parameters of the lightweight key point detection model is smaller than a specified value, and the number of frames separating the first video frame from the second video frame is not greater than a preset difference. The executing entity 301 determines the key point coordinate values 306 of the target in the second video frame 303 from the single-point heatmaps 305.
With further reference to FIG. 4, a flow 400 of one embodiment of training of a lightweight keypoint detection model is illustrated. The process 400 includes the following steps:
step 401, obtaining a training sample, where the training sample includes a sample image and a sample multipoint thermodynamic diagram obtained based on the sample image.
In this embodiment, an executing subject (e.g., a server or a terminal device shown in fig. 1) on which the training method of the lightweight keypoint detection model operates may obtain a training sample. The training sample can include not only the sample image but also the sample multi-point thermodynamic diagram. It should be noted that the execution subject of the training step of the lightweight keypoint detection model may be different from the execution subject of the detection of keypoints. The following execution subjects in this embodiment are all execution subjects of the training procedure of the lightweight key point detection model.
The sample multi-point thermodynamic diagram is obtained based on the sample image, and in practice, the sample multi-point thermodynamic diagram may be obtained based on the sample image in various ways, for example, a model (i.e., a deep neural network) for predicting a single-point thermodynamic diagram may be used to predict the sample image, obtain the single-point thermodynamic diagram, and generate the multi-point thermodynamic diagram from the obtained single-point thermodynamic diagrams, i.e., the sample multi-point thermodynamic diagram.
Step 402, the sample multipoint heatmap and the sample image are input into the lightweight key point detection model to be trained to obtain the single-point heatmaps of the key points of the target in the sample image.
In this embodiment, the executing entity may input the sample multipoint heatmap and the sample image into the lightweight key point detection model to be trained and obtain the single-point heatmaps of the key points contained in the sample image. In practice, the executing entity may concatenate the sample multipoint heatmap and the sample image and input the concatenation result into the model to be trained.
Step 403, a loss value of the lightweight key point detection model to be trained is determined based on the key point coordinate values indicated by the single-point heatmaps, and the model is trained using the loss value.
In this embodiment, the executing entity may determine a loss value of the lightweight key point detection model based on the key point coordinate values indicated by the single-point heatmaps output by the model to be trained, and train the model with that loss value.
In practice, the executing entity may determine the loss value in various ways based on the key point coordinate values indicated by the single-point heatmaps. For example, the key point coordinate values and the ground-truth key point coordinate values may be input into a preset loss function to obtain the loss value.
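The disclosure only requires "a preset loss function" on predicted versus ground-truth values; as a sketch, a mean-squared-error computed directly on the heatmaps is one common concrete choice (not one the text mandates):

```python
import numpy as np

def heatmap_mse_loss(pred_heatmaps, true_heatmaps):
    """Mean squared error over K predicted (H, W) heatmaps.

    Penalizing the full heatmaps (rather than only the decoded peak
    coordinates) keeps the loss differentiable end-to-end.
    """
    pred = np.asarray(pred_heatmaps, dtype=float)
    true = np.asarray(true_heatmaps, dtype=float)
    return float(((pred - true) ** 2).mean())
```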
In this embodiment, the multipoint heatmap and the sample image are input together into the model to be trained, so that more features are available during training, effectively improving the accuracy and precision of training.
In some optional implementations of this embodiment, the generation of the sample multipoint heatmap may include: inputting the sample image into an accurate key point detection model to obtain single-point heatmaps containing the key points of the target in the sample image; and generating the sample multipoint heatmap from the result of merging the single-point heatmaps.
In these optional implementations, the sample multipoint heatmap may be obtained by inputting a sample image into the accurate key point detection model. Specifically, the executing entity inputs the sample image into the accurate key point detection model and obtains the single-point heatmaps it outputs, which contain the key points of the target in the sample image. The executing entity then merges the single-point heatmaps and may generate the sample multipoint heatmap from the merged result in various ways, for example by using the merged result directly as the sample multipoint heatmap, or by applying preset processing, such as Gaussian blurring, to the merged result and using the processed result.
These implementations can generate accurate sample multipoint heatmaps through the accurate key point detection model.
In some optional implementations of this embodiment, the sample image is a video frame of the sample video; the generation step of the sample multipoint thermodynamic diagram comprises the following steps: acquiring a thermodynamic diagram truth value of each video frame in a sample video, wherein the thermodynamic diagram truth value is a single-point thermodynamic diagram truth value or a multi-point thermodynamic diagram truth value; and randomly disturbing the thermodynamic diagram truth value of each video frame, determining the multipoint thermodynamic diagram corresponding to each video frame according to the disturbed key point thermodynamic diagram, and taking the multipoint thermodynamic diagram as a sample multipoint thermodynamic diagram.
In these alternative implementations, the executing entity may randomly perturb the single-point thermodynamic diagram truth values (or the multi-point thermodynamic diagram truth values) of the video frames in the sample video, determine a multi-point thermodynamic diagram according to the perturbation result, and use each multi-point thermodynamic diagram as a sample multi-point thermodynamic diagram.
Specifically, the executing body may determine, in various manners, the multipoint thermodynamic diagrams corresponding to the respective video frames according to the post-disturbance key point thermodynamic diagrams. In the case that the thermodynamic diagram truth value is a single-point thermodynamic diagram truth value, the executing entity may combine the random perturbation results (i.e., the post-disturbance key point thermodynamic diagrams) of the respective thermodynamic diagram truth values, and generate a sample multipoint thermodynamic diagram according to the combined result. For example, the execution body may directly use the combined result as the sample multipoint thermodynamic diagram, or perform a preset operation on the combined result and use the result of that operation as the sample multipoint thermodynamic diagram. The preset operation here may be Gaussian blur. In the case that the thermodynamic diagram truth value is a multipoint thermodynamic diagram truth value, the execution body may generate the sample multipoint thermodynamic diagram according to the thermodynamic diagram truth value. For example, the executing entity may directly use the thermodynamic diagram truth value as the sample multipoint thermodynamic diagram, or may perform the preset processing on it and use the result of the preset processing as the sample multipoint thermodynamic diagram.
These implementations can randomly perturb the thermodynamic diagram truth value of each video frame in the sample video, so that multiple sample multipoint thermodynamic diagrams are obtained quickly, which improves the efficiency of determining sample multipoint thermodynamic diagrams.
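As a hedged illustration of the perturbation step, the following sketch shifts each single-point truth-value heatmap by a small random pixel offset and then merges the shifted heatmaps into a sample multipoint thermodynamic diagram. The offset range, the circular shift, and the max-merge rule are all assumptions, since the patent does not specify the form of the random disturbance:

```python
import numpy as np

def perturb_heatmap(heatmap, max_shift=2, rng=None):
    """Randomly shift a key point heatmap by up to max_shift pixels in
    each direction -- one simple form of the random disturbance."""
    rng = np.random.default_rng() if rng is None else rng
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    return np.roll(np.roll(heatmap, dy, axis=0), dx, axis=1)

def make_sample_multipoint_heatmap(single_point_truths, rng=None):
    """Perturb each single-point truth value, then merge the perturbed
    heatmaps (element-wise maximum assumed) into one sample multipoint
    thermodynamic diagram."""
    perturbed = [perturb_heatmap(hm, rng=rng) for hm in single_point_truths]
    return np.stack(perturbed, axis=0).max(axis=0)
```

Running this once per video frame yields one sample multipoint thermodynamic diagram per frame, matching the procedure described above.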
In some optional implementations of this embodiment, step 403 may include: determining a coordinate loss value according to the coordinate value and the true value of the key point coordinate, and determining a thermodynamic diagram loss value according to the single-point thermodynamic diagram and the true value of the single-point thermodynamic diagram; and determining the loss value of the lightweight key point detection model to be trained according to the coordinate loss value and the thermodynamic diagram loss value.
In these alternative implementations, the executing entity may calculate not only the loss value of the key point coordinates, but also the loss value of the thermodynamic diagram, and use both loss values to obtain the loss value of the lightweight key point detection model to be trained. The execution body may determine the coordinate loss value and the thermodynamic diagram loss value in various ways. For example, the execution body may determine the coordinate loss value and the thermodynamic diagram loss value through a one-norm loss function (L1 loss function) or a two-norm loss function (L2 loss function).
These implementations may calculate a total loss value for training by both the thermodynamic loss value and the coordinate loss value, thereby improving the accuracy of determining the loss value.
Optionally, the determining a coordinate loss value according to the key point coordinate value and the key point coordinate true value, and determining a thermodynamic diagram loss value according to the single-point thermodynamic diagram and the single-point thermodynamic diagram true value may include: determining a norm of a difference between a key point coordinate true value and a key point coordinate value, and determining the square of the norm as a coordinate loss value; and determining the norm of the difference between the true value of the single-point thermodynamic diagram and the single-point thermodynamic diagram, and determining the square of the norm as the thermodynamic diagram loss value.
In these alternative implementations, the execution body may determine a difference between the key point coordinate true value and the key point coordinate value, and determine a square of a norm of the difference as the coordinate loss value. The execution agent may determine a difference between the single-point thermodynamic diagram true value and the single-point thermodynamic diagram, and determine a square of a norm of the difference (the difference between the single-point thermodynamic diagram true value and the single-point thermodynamic diagram) as the thermodynamic loss value.
In practice, the thermodynamic diagram loss value Loss_hm may be expressed as Loss_hm = ||HM_GT - HM_pre||^2, where HM_GT and HM_pre are the single-point thermodynamic diagram true value and the single-point thermodynamic diagram (namely the thermodynamic diagram predicted value), respectively. The coordinate loss value Loss_coord may be expressed as Loss_coord = ||Coord_GT - Coord_pre||^2, where Coord_GT and Coord_pre are the key point coordinate true value and the key point coordinate value (namely the key point coordinate predicted value), respectively.
The optional implementation modes can highlight the difference between the predicted value and the true value by determining the square of the norm, so that the fitting of the model is accelerated, and the training efficiency is improved.
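The two squared-norm losses can be written directly from the formulas above. This minimal NumPy sketch (function names are illustrative, not from the source) computes Loss_hm and Loss_coord as the squared two-norm of the difference between truth and prediction:

```python
import numpy as np

def heatmap_loss(hm_gt, hm_pre):
    """Loss_hm = ||HM_GT - HM_pre||^2 : squared two-norm of the
    difference between the heatmap truth value and the prediction."""
    diff = np.asarray(hm_gt) - np.asarray(hm_pre)
    return float(np.sum(diff ** 2))

def coord_loss(coord_gt, coord_pre):
    """Loss_coord = ||Coord_GT - Coord_pre||^2 for the key point
    coordinate truth value and predicted coordinate value."""
    diff = np.asarray(coord_gt) - np.asarray(coord_pre)
    return float(np.sum(diff ** 2))
```

Squaring the norm, as the text notes, enlarges the gap between prediction and truth and so speeds up fitting.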
Optionally, determining a loss value of the lightweight key point detection model to be trained according to the coordinate loss value and the thermodynamic diagram loss value includes: weighting the coordinate loss value and the thermodynamic loss value by using the weights of the coordinate loss value and the thermodynamic loss value; and determining the loss value of the lightweight key point detection model to be trained according to the weighting result.
In these alternative implementations, the executive body may weight the coordinate loss value and the thermodynamic loss value by using the weight of the coordinate loss value and the weight of the thermodynamic loss value, so as to obtain a weighted result. And determining the loss value of the lightweight key point detection model to be trained, namely the total loss value for training according to the weighting result. In practice, the execution subject may determine the loss value of the lightweight keypoint detection model to be trained according to the weighting result in various ways. For example, the execution subject may directly determine the weighting result as a loss value of the lightweight key point detection model to be trained. Alternatively, the execution subject may perform a designation process on the weighting result, and use the result of the designation process as a loss value of the lightweight keypoint detection model to be trained. The specified processing here may be various, such as multiplication with a preset coefficient, or input to a preset model.
For example, the loss value Loss of the lightweight keypoint detection model to be trained may be expressed as Loss = α·Loss_hm + β·Loss_coord, where α and β are the weight of the thermodynamic diagram loss value and the weight of the coordinate loss value, respectively. The weights of the thermodynamic diagram loss value and the coordinate loss value may be the same, i.e. 0.5 each, or may be different. The weights may be preset or generated in real time.
These implementations may improve the fitness of the determined total loss value to the actual situation by weighting, thereby improving training accuracy.
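The weighted total loss described above reduces to a one-line combination. The default weights of 0.5 each follow the text's example, and the function name is illustrative:

```python
def total_loss(loss_hm, loss_coord, alpha=0.5, beta=0.5):
    """Loss = alpha * Loss_hm + beta * Loss_coord, with the weights
    either preset (0.5 each, as in the text's example) or supplied
    at run time."""
    return alpha * loss_hm + beta * loss_coord
```

Adjusting alpha and beta trades off how strongly the heatmap error and the coordinate error each drive training.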
With further reference to fig. 5, as an implementation of the method shown in the above figures, the present disclosure provides an embodiment of a device for detecting a keypoint, the embodiment of the device corresponds to the embodiment of the method shown in fig. 2, and the embodiment of the device may further include the same or corresponding features or effects as the embodiment of the method shown in fig. 2, in addition to the features described below. The device can be applied to various electronic equipment.
As shown in fig. 5, the key point detecting device 500 of the present embodiment includes: an acquisition unit 501, a prediction unit 502, and a determination unit 503. The acquiring unit 501 is configured to acquire a multipoint thermodynamic diagram of a first video frame in a target video, where the multipoint thermodynamic diagram includes key points of a target in the first video frame; the prediction unit 502 is configured to input the multipoint thermodynamic diagram and a second video frame in the target video into a lightweight key point detection model to obtain a single-point thermodynamic diagram containing key points of a target in the second video frame, wherein the number of parameters of the lightweight key point detection model is smaller than a specified value, and the number of video frames by which the first video frame and the second video frame differ is not larger than a preset difference value; the determining unit 503 is configured to determine the key point coordinate values of the targets in the second video frame according to the respective single point thermodynamic diagrams.
In this embodiment, specific processes of the obtaining unit 501, the predicting unit 502, and the determining unit 503 of the key point detecting device 500 and technical effects thereof can refer to related descriptions of step 201, step 202, and step 203 in the corresponding embodiment of fig. 2, which are not described herein again.
In some optional implementations of this embodiment, the first video frame is a starting video frame detected in the video; an obtaining unit, further configured to perform obtaining a multipoint thermodynamic diagram of a first video frame in a target video as follows: inputting the initial video frame into an accurate key point detection model to obtain a single-point thermodynamic diagram containing key points of targets in the initial video frame, wherein the parameter number of the accurate key point detection model is greater than that of the lightweight key point detection model; the single-point thermodynamic diagrams are combined, and a multi-point thermodynamic diagram is generated according to the combination result.
In some optional implementations of this embodiment, the prediction unit is further configured to input the multipoint thermodynamic diagram and the second video frame in the target video into the lightweight keypoint detection model as follows: splicing the multipoint thermodynamic diagram and the second video frame to obtain a splicing result; and inputting the splicing result into the lightweight key point detection model.
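The patent does not specify how the splicing is performed; one plausible reading, sketched below, is channel-wise concatenation of the multipoint thermodynamic diagram with the RGB video frame. The channel-last layout and the single-channel heatmap are assumptions for illustration:

```python
import numpy as np

def splice(frame, multipoint_heatmap):
    """Concatenate an RGB frame of shape (H, W, 3) with a multipoint
    heatmap of shape (H, W) along the channel axis, producing an
    (H, W, 4) input for the lightweight keypoint detection model."""
    hm = multipoint_heatmap[..., np.newaxis]        # (H, W, 1)
    return np.concatenate([frame, hm], axis=-1)     # (H, W, 4)
```

The spliced array is then fed to the lightweight model as a single input tensor, which is how the heatmap prior from the first frame reaches the prediction for the second frame.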
In some optional implementations of this embodiment, the training step of the lightweight keypoint detection model includes: acquiring a training sample, wherein the training sample comprises a sample image and a sample multipoint thermodynamic diagram obtained based on the sample image; inputting the sample multi-point thermodynamic diagrams and the sample images into a lightweight key point detection model to be trained to obtain single-point thermodynamic diagrams of key points of targets in the sample images; and determining a loss value of the lightweight key point detection model to be trained based on the key point coordinate values indicated by the single-point thermodynamic diagram, and training the lightweight key point detection model to be trained through the loss value.
In some optional implementations of this embodiment, the generating step of the sample multipoint thermodynamic diagram includes: inputting the sample image into an accurate key point detection model to obtain a single-point thermodynamic diagram containing key points of a target in the sample image; and generating a sample multipoint thermodynamic diagram according to a combination result obtained by combining the single-point thermodynamic diagrams.
In some optional implementations of this embodiment, the sample image is a video frame of the sample video; the generation step of the sample multipoint thermodynamic diagram comprises the following steps: acquiring a thermodynamic diagram truth value of each video frame in a sample video, wherein the thermodynamic diagram truth value is a single-point thermodynamic diagram truth value or a multi-point thermodynamic diagram truth value; and randomly disturbing the thermodynamic diagram truth value of each video frame, determining the multipoint thermodynamic diagram corresponding to each video frame according to the disturbed key point thermodynamic diagram, and taking the multipoint thermodynamic diagram as a sample multipoint thermodynamic diagram.
In some optional implementations of this embodiment, determining a loss value of the lightweight keypoint detection model to be trained based on the keypoint coordinate values indicated by the single-point thermodynamic diagram includes: determining a coordinate loss value according to the coordinate value and the true value of the key point coordinate, and determining a thermodynamic diagram loss value according to the single-point thermodynamic diagram and the true value of the single-point thermodynamic diagram; and determining the loss value of the lightweight key point detection model to be trained according to the coordinate loss value and the thermodynamic diagram loss value.
In some optional implementations of this embodiment, determining a coordinate loss value according to a key point coordinate value and a key point coordinate true value, and determining a thermodynamic diagram loss value according to the single-point thermodynamic diagram and the single-point thermodynamic diagram true value includes: determining a norm of a difference between a key point coordinate true value and a key point coordinate value, and determining the square of the norm as a coordinate loss value; and determining the norm of the difference between the true value of the single-point thermodynamic diagram and the single-point thermodynamic diagram, and determining the square of the norm as the thermodynamic diagram loss value.
In some optional implementations of this embodiment, determining a loss value of the lightweight keypoint detection model to be trained according to the coordinate loss value and the thermodynamic loss value includes: weighting the coordinate loss value and the thermodynamic loss value by using the weights of the coordinate loss value and the thermodynamic loss value; and determining the loss value of the lightweight key point detection model to be trained according to the weighting result.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
Fig. 6 is a block diagram of an electronic device for the method of detecting a keypoint according to an embodiment of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the electronic apparatus includes: one or more processors 601, memory 602, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 6, one processor 601 is taken as an example.
The memory 602 is a non-transitory computer readable storage medium provided by the present disclosure. Wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method for detecting keypoints provided by the present disclosure. The non-transitory computer-readable storage medium of the present disclosure stores computer instructions for causing a computer to perform the method for detecting a keypoint provided by the present disclosure.
The memory 602, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the detection method of the keypoint in the embodiment of the present disclosure (for example, the acquisition unit 501, the prediction unit 502, and the determination unit 503 shown in fig. 5). The processor 601 executes various functional applications of the server and data processing, i.e., implementing the detection method of the key points in the above-described method embodiments, by running non-transitory software programs, instructions, and modules stored in the memory 602.
The memory 602 may include a storage program area and a storage data area, wherein the storage program area may store an operating system and an application program required for at least one function; the storage data area may store data created according to the use of the electronic device for detecting key points, and the like. Further, the memory 602 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 602 optionally includes memory located remotely from the processor 601, and these remote memories may be connected to the electronic device for detecting key points through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the method for detecting a keypoint may further include: an input device 603 and an output device 604. The processor 601, the memory 602, the input device 603 and the output device 604 may be connected by a bus or other means, and fig. 6 illustrates the connection by a bus as an example.
The input device 603 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for detection of key points, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, or other input devices. The output devices 604 may include a display device, auxiliary lighting devices (e.g., LEDs), and tactile feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system that overcomes the defects of high management difficulty and weak service expansibility in traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server incorporating a blockchain.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes an acquisition unit, a prediction unit, and a determination unit. The names of the units do not form a limitation on the units themselves in some cases, and for example, the acquiring unit may also be described as a unit for acquiring a multipoint thermodynamic diagram of a first video frame in a target video.
As another aspect, the present disclosure also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments; or may be present separately and not assembled into the device. The computer readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: acquire a multipoint thermodynamic diagram of a first video frame in a target video, wherein the multipoint thermodynamic diagram comprises each key point of a target in the first video frame; input the multipoint thermodynamic diagram and a second video frame in the target video into a lightweight key point detection model to obtain a single-point thermodynamic diagram containing key points of a target in the second video frame, wherein the number of parameters of the lightweight key point detection model is smaller than a specified value, and the number of video frames by which the first video frame and the second video frame differ is not larger than a preset difference value; and determine the key point coordinate values of the targets in the second video frame according to the single-point thermodynamic diagrams.
The foregoing description is only a preferred embodiment of the present disclosure and an illustration of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the invention in the present disclosure is not limited to technical solutions formed by the specific combination of the above features, and also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) features with similar functions disclosed in the present disclosure.

Claims (30)

1. A method of detecting keypoints, the method comprising:
acquiring a multipoint thermodynamic diagram of a first video frame in a target video, wherein the multipoint thermodynamic diagram comprises each key point of a target in the first video frame;
inputting the multipoint thermodynamic diagram and a second video frame in the target video into a lightweight key point detection model to obtain a single-point thermodynamic diagram containing key points of a target in the second video frame, wherein the number of parameters of the lightweight key point detection model is smaller than a specified value, and the number of video frames by which the first video frame and the second video frame differ is not larger than a preset difference value;
determining key point coordinate values of the targets in the second video frame according to each single-point thermodynamic diagram, wherein the determining comprises the following steps: combining the single-point thermodynamic diagrams to obtain corresponding multipoint thermodynamic diagrams; and taking the key point coordinate value indicated by the multipoint thermodynamic diagram as the key point coordinate value of the target in the second video frame.
2. The method of claim 1, wherein the first video frame is a starting video frame detected in the video;
the acquiring a multipoint thermodynamic diagram of a first video frame in a target video comprises the following steps:
inputting the starting video frame into an accurate key point detection model to obtain a single-point thermodynamic diagram containing key points of a target in the starting video frame, wherein the number of parameters of the accurate key point detection model is larger than that of the lightweight key point detection model;
and combining the single-point thermodynamic diagrams, and generating the multipoint thermodynamic diagram according to a combination result.
3. The method of claim 1, wherein the inputting the multipoint thermodynamic diagram and a second video frame in the target video into a lightweight keypoint detection model comprises:
splicing the multipoint thermodynamic diagram and the second video frame to obtain a splicing result;
and inputting the splicing result into a lightweight key point detection model.
4. The method of claim 1 or 3, wherein the step of training the lightweight keypoint detection model comprises:
obtaining a training sample, wherein the training sample comprises a sample image and a sample multipoint thermodynamic diagram obtained based on the sample image;
inputting the sample multi-point thermodynamic diagrams and the sample images into a lightweight key point detection model to be trained to obtain single-point thermodynamic diagrams of key points of targets in the sample images;
and determining a loss value of the lightweight key point detection model to be trained based on the key point coordinate values indicated by the single-point thermodynamic diagram, and training the lightweight key point detection model to be trained through the loss value.
5. The method of claim 2, wherein the step of training the lightweight keypoint detection model comprises:
obtaining a training sample, wherein the training sample comprises a sample image and a sample multipoint thermodynamic diagram obtained based on the sample image;
inputting the sample multi-point thermodynamic diagrams and the sample images into a lightweight key point detection model to be trained to obtain single-point thermodynamic diagrams of key points of targets in the sample images;
and determining a loss value of the lightweight key point detection model to be trained based on the key point coordinate values indicated by the single-point thermodynamic diagram, and training the lightweight key point detection model to be trained through the loss value.
6. The method of claim 5, wherein the step of generating the sample multi-point thermodynamic diagram comprises:
inputting the sample image into the accurate key point detection model to obtain a single-point thermodynamic diagram containing key points of targets in the sample image;
and generating the sample multipoint thermodynamic diagrams according to a combination result obtained by combining the single-point thermodynamic diagrams.
7. The method of claim 4, wherein the sample image is a video frame of a sample video;
the generation step of the sample multipoint thermodynamic diagram comprises the following steps:
acquiring a thermodynamic diagram truth value of each video frame in the sample video, wherein the thermodynamic diagram truth value is a single-point thermodynamic diagram truth value or a multi-point thermodynamic diagram truth value;
and randomly perturbing the thermodynamic diagram truth value of each video frame, determining the multipoint thermodynamic diagram corresponding to each video frame according to the perturbed key point thermodynamic diagram, and taking the multipoint thermodynamic diagram as the sample multipoint thermodynamic diagram.
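One way to realize the random perturbation of claim 7 is to jitter each ground-truth key point by a few pixels and re-render a Gaussian multipoint diagram. The Gaussian rendering, the `sigma`, and the shift range are illustrative assumptions, not values fixed by the claim:

```python
import numpy as np

def perturbed_multipoint_heatmap(keypoints, shape, sigma=1.5, max_shift=2, rng=None):
    """Randomly shift each ground-truth key point by up to max_shift
    pixels, then render the shifted points as a Gaussian multipoint
    thermodynamic diagram of the given (H, W) shape."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    heat = np.zeros(shape)
    for x, y in keypoints:
        dx, dy = rng.integers(-max_shift, max_shift + 1, size=2)
        px = np.clip(x + dx, 0, w - 1)   # keep the shifted point in bounds
        py = np.clip(y + dy, 0, h - 1)
        gauss = np.exp(-((xs - px) ** 2 + (ys - py) ** 2) / (2.0 * sigma ** 2))
        heat = np.maximum(heat, gauss)
    return heat
```

The perturbation makes the training samples resemble inference-time inputs, where the multipoint diagram comes from the previous frame and is therefore slightly misaligned with the current one.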
8. The method of claim 5, wherein the sample image is a video frame of a sample video;
the generation step of the sample multipoint thermodynamic diagram comprises the following steps:
acquiring a thermodynamic diagram truth value of each video frame in the sample video, wherein the thermodynamic diagram truth value is a single-point thermodynamic diagram truth value or a multi-point thermodynamic diagram truth value;
and randomly perturbing the thermodynamic diagram truth value of each video frame, determining the multipoint thermodynamic diagram corresponding to each video frame according to the perturbed key point thermodynamic diagram, and taking the multipoint thermodynamic diagram as the sample multipoint thermodynamic diagram.
9. The method of claim 4, wherein determining the loss value of the lightweight keypoint detection model to be trained based on the keypoint coordinate values indicated by the single-point thermodynamic diagram comprises:
determining a coordinate loss value according to the key point coordinate value and the key point coordinate true value, and determining a thermodynamic diagram loss value according to the single-point thermodynamic diagram and the single-point thermodynamic diagram true value;
and determining a loss value of the lightweight key point detection model to be trained according to the coordinate loss value and the thermodynamic diagram loss value.
10. The method of claim 5, wherein determining the loss value of the lightweight keypoint detection model to be trained based on the keypoint coordinate values indicated by the single-point thermodynamic diagram comprises:
determining a coordinate loss value according to the key point coordinate value and the key point coordinate true value, and determining a thermodynamic diagram loss value according to the single-point thermodynamic diagram and the single-point thermodynamic diagram true value;
and determining a loss value of the lightweight key point detection model to be trained according to the coordinate loss value and the thermodynamic diagram loss value.
11. The method of claim 9, wherein the determining a coordinate loss value according to the key point coordinate value and the key point coordinate true value, and determining a thermodynamic diagram loss value according to the single-point thermodynamic diagram and the single-point thermodynamic diagram true value, comprises:
determining a norm of a difference between the key point coordinate true value and the key point coordinate value, and determining a square of the norm as the coordinate loss value;
and determining the norm of the difference between the true value of the single-point thermodynamic diagram and the single-point thermodynamic diagram, and determining the square of the norm as the thermodynamic diagram loss value.
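The two squared-norm losses of claims 11 and 12 can be sketched as below. The NumPy function names and the use of the (Frobenius) 2-norm are assumptions of this illustration; the claims only recite "a norm of a difference" and its square:

```python
import numpy as np

def coordinate_loss(pred_coords, true_coords):
    """Square of the norm of (true coordinates - predicted coordinates)."""
    diff = np.asarray(true_coords, dtype=float) - np.asarray(pred_coords, dtype=float)
    return float(np.linalg.norm(diff) ** 2)

def heatmap_loss(pred_map, true_map):
    """Square of the norm of (true single-point diagram - predicted diagram)."""
    diff = np.asarray(true_map, dtype=float) - np.asarray(pred_map, dtype=float)
    return float(np.linalg.norm(diff) ** 2)
```

With the 2-norm, each term reduces to a sum of squared element-wise errors, i.e. an (unaveraged) mean-squared-error loss.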
12. The method of claim 10, wherein the determining a coordinate loss value according to the key point coordinate value and the key point coordinate true value, and determining a thermodynamic diagram loss value according to the single-point thermodynamic diagram and the single-point thermodynamic diagram true value, comprises:
determining a norm of a difference between the key point coordinate true value and the key point coordinate value, and determining a square of the norm as the coordinate loss value;
and determining the norm of the difference between the true value of the single-point thermodynamic diagram and the single-point thermodynamic diagram, and determining the square of the norm as the thermodynamic diagram loss value.
13. The method of claim 9, wherein the determining a loss value for the lightweight key point detection model to be trained from the coordinate loss value and the thermodynamic diagram loss value comprises:
weighting the coordinate loss value and the thermodynamic diagram loss value using the respective weights of the two loss values;
and determining the loss value of the lightweight key point detection model to be trained according to the weighting result.
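The weighted combination of claims 13 and 14 reduces to a weighted sum; the default weights of 1.0 here are an assumption, as the claims leave the weight values open:

```python
def total_loss(coord_loss, heat_loss, w_coord=1.0, w_heat=1.0):
    """Weighted sum of the coordinate loss and the thermodynamic diagram
    loss; the weights trade localization accuracy against heatmap
    fidelity in the lightweight model's training objective."""
    return w_coord * coord_loss + w_heat * heat_loss
```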
14. The method of claim 10, wherein the determining a loss value for the lightweight key point detection model to be trained from the coordinate loss value and the thermodynamic diagram loss value comprises:
weighting the coordinate loss value and the thermodynamic diagram loss value using the respective weights of the two loss values;
and determining the loss value of the lightweight key point detection model to be trained according to the weighting result.
15. An apparatus for detecting keypoints, the apparatus comprising:
the acquisition unit is configured to acquire a multipoint thermodynamic diagram of a first video frame in a target video, wherein the multipoint thermodynamic diagram comprises key points of a target in the first video frame;
a prediction unit configured to input the multipoint thermodynamic diagram and a second video frame in the target video into a lightweight key point detection model to obtain a single-point thermodynamic diagram containing key points of a target in the second video frame, wherein the number of parameters of the lightweight key point detection model is less than a specified value, and the difference in frame number between the second video frame and the first video frame is not more than a preset difference value;
a determining unit configured to determine, according to each of the single-point thermodynamic diagrams, a key point coordinate value of an object in the second video frame, including: combining the single-point thermodynamic diagrams to obtain corresponding multipoint thermodynamic diagrams; and taking the key point coordinate value indicated by the multipoint thermodynamic diagram as the key point coordinate value of the target in the second video frame.
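The determining unit's coordinate readout can be sketched as taking, for each key point, the location of the strongest response in its single-point thermodynamic diagram. Using an argmax as the readout rule is an assumption of this illustration:

```python
import numpy as np

def keypoint_coordinates(single_maps):
    """Return the (x, y) coordinate indicated by each single-point
    thermodynamic diagram, taken as the location of its maximum
    response."""
    coords = []
    for m in np.asarray(single_maps):
        y, x = np.unravel_index(np.argmax(m), m.shape)  # row, column
        coords.append((int(x), int(y)))
    return coords
```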
16. The apparatus of claim 15, wherein the first video frame is a starting video frame detected in the video;
the acquiring unit is further configured to acquire the multipoint thermodynamic diagram of the first video frame in the target video as follows:
inputting the starting video frame into an accurate key point detection model to obtain a single-point thermodynamic diagram containing key points of a target in the starting video frame, wherein the number of parameters of the accurate key point detection model is larger than that of the lightweight key point detection model;
and combining the single-point thermodynamic diagrams, and generating the multipoint thermodynamic diagram according to a combination result.
17. The apparatus of claim 15, wherein the prediction unit is configured to input the multipoint thermodynamic diagram and a second video frame in the target video into the lightweight key point detection model as follows:
splicing the multipoint thermodynamic diagram and the second video frame to obtain a splicing result;
and inputting the splicing result into a lightweight key point detection model.
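The splicing of claim 17 can be sketched as a channel-wise concatenation, treating the multipoint thermodynamic diagram as a fourth input channel alongside the RGB frame. The concatenation axis and channel layout are assumptions of this sketch:

```python
import numpy as np

def splice_input(multipoint_map, frame):
    """Splice an (H, W) multipoint thermodynamic diagram onto an
    (H, W, 3) RGB frame along the channel axis, yielding the
    (H, W, 4) tensor fed to the lightweight model."""
    return np.concatenate([frame, multipoint_map[..., None]], axis=-1)
```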
18. The apparatus of claim 15 or 17, wherein the training step of the lightweight keypoint detection model comprises:
obtaining a training sample, wherein the training sample comprises a sample image and a sample multipoint thermodynamic diagram obtained based on the sample image;
inputting the sample multipoint thermodynamic diagram and the sample image into the lightweight key point detection model to be trained to obtain single-point thermodynamic diagrams of the key points of the target in the sample image;
and determining a loss value of the lightweight key point detection model to be trained based on the key point coordinate values indicated by the single-point thermodynamic diagrams, and training the lightweight key point detection model to be trained through the loss value.
19. The apparatus of claim 16, wherein the training step of the lightweight keypoint detection model comprises:
obtaining a training sample, wherein the training sample comprises a sample image and a sample multipoint thermodynamic diagram obtained based on the sample image;
inputting the sample multipoint thermodynamic diagram and the sample image into the lightweight key point detection model to be trained to obtain single-point thermodynamic diagrams of the key points of the target in the sample image;
and determining a loss value of the lightweight key point detection model to be trained based on the key point coordinate values indicated by the single-point thermodynamic diagrams, and training the lightweight key point detection model to be trained through the loss value.
20. The apparatus of claim 19, wherein the generating of the sample multi-point thermodynamic diagram comprises:
inputting the sample image into the accurate key point detection model to obtain a single-point thermodynamic diagram containing key points of targets in the sample image;
and generating the sample multipoint thermodynamic diagrams according to a combination result obtained by combining the single-point thermodynamic diagrams.
21. The apparatus of claim 18, wherein the sample image is a video frame of a sample video;
the generation step of the sample multipoint thermodynamic diagram comprises the following steps:
acquiring a thermodynamic diagram truth value of each video frame in the sample video, wherein the thermodynamic diagram truth value is a single-point thermodynamic diagram truth value or a multi-point thermodynamic diagram truth value;
and randomly perturbing the thermodynamic diagram truth value of each video frame, determining the multipoint thermodynamic diagram corresponding to each video frame according to the perturbed key point thermodynamic diagram, and taking the multipoint thermodynamic diagram as the sample multipoint thermodynamic diagram.
22. The apparatus of claim 19, wherein the sample image is a video frame of a sample video;
the generation step of the sample multipoint thermodynamic diagram comprises the following steps:
acquiring a thermodynamic diagram truth value of each video frame in the sample video, wherein the thermodynamic diagram truth value is a single-point thermodynamic diagram truth value or a multi-point thermodynamic diagram truth value;
and randomly perturbing the thermodynamic diagram truth value of each video frame, determining the multipoint thermodynamic diagram corresponding to each video frame according to the perturbed key point thermodynamic diagram, and taking the multipoint thermodynamic diagram as the sample multipoint thermodynamic diagram.
23. The apparatus of claim 18, wherein the determining a loss value of the lightweight keypoint detection model to be trained based on the keypoint coordinate values indicated by the single point thermodynamic diagram comprises:
determining a coordinate loss value according to the key point coordinate value and the key point coordinate true value, and determining a thermodynamic diagram loss value according to the single-point thermodynamic diagram and the single-point thermodynamic diagram true value;
and determining a loss value of the lightweight key point detection model to be trained according to the coordinate loss value and the thermodynamic diagram loss value.
24. The apparatus of claim 19, wherein the determining a loss value of the lightweight keypoint detection model to be trained based on the keypoint coordinate values indicated by the single point thermodynamic diagram comprises:
determining a coordinate loss value according to the key point coordinate value and the key point coordinate true value, and determining a thermodynamic diagram loss value according to the single-point thermodynamic diagram and the single-point thermodynamic diagram true value;
and determining a loss value of the lightweight key point detection model to be trained according to the coordinate loss value and the thermodynamic diagram loss value.
25. The apparatus of claim 23, wherein the determining a coordinate loss value according to the key point coordinate value and the key point coordinate true value, and determining a thermodynamic diagram loss value according to the single-point thermodynamic diagram and the single-point thermodynamic diagram true value, comprises:
determining a norm of a difference between the key point coordinate true value and the key point coordinate value, and determining a square of the norm as the coordinate loss value;
and determining the norm of the difference between the true value of the single-point thermodynamic diagram and the single-point thermodynamic diagram, and determining the square of the norm as the thermodynamic diagram loss value.
26. The apparatus of claim 24, wherein the determining a coordinate loss value according to the key point coordinate value and the key point coordinate true value, and determining a thermodynamic diagram loss value according to the single-point thermodynamic diagram and the single-point thermodynamic diagram true value, comprises:
determining a norm of a difference between the key point coordinate true value and the key point coordinate value, and determining a square of the norm as the coordinate loss value;
and determining the norm of the difference between the true value of the single-point thermodynamic diagram and the single-point thermodynamic diagram, and determining the square of the norm as the thermodynamic diagram loss value.
27. The apparatus of claim 23, wherein the determining a loss value for the lightweight key point detection model to be trained from the coordinate loss value and the thermodynamic diagram loss value comprises:
weighting the coordinate loss value and the thermodynamic diagram loss value using the respective weights of the two loss values;
and determining the loss value of the lightweight key point detection model to be trained according to the weighting result.
28. The apparatus of claim 24, wherein the determining a loss value for the lightweight key point detection model to be trained from the coordinate loss value and the thermodynamic diagram loss value comprises:
weighting the coordinate loss value and the thermodynamic diagram loss value using the respective weights of the two loss values;
and determining the loss value of the lightweight key point detection model to be trained according to the weighting result.
29. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-14.
30. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-14.
CN202110459116.1A 2021-04-27 2021-04-27 Method and device for detecting key points Active CN113128436B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110459116.1A CN113128436B (en) 2021-04-27 2021-04-27 Method and device for detecting key points


Publications (2)

Publication Number Publication Date
CN113128436A CN113128436A (en) 2021-07-16
CN113128436B true CN113128436B (en) 2022-04-01

Family

ID=76780184

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110459116.1A Active CN113128436B (en) 2021-04-27 2021-04-27 Method and device for detecting key points

Country Status (1)

Country Link
CN (1) CN113128436B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114186632B (en) * 2021-12-10 2023-04-18 北京百度网讯科技有限公司 Method, device, equipment and storage medium for training key point detection model

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109829432A (en) * 2019-01-31 2019-05-31 北京字节跳动网络技术有限公司 Method and apparatus for generating information
CN110807410A (en) * 2019-10-30 2020-02-18 北京百度网讯科技有限公司 Key point positioning method and device, electronic equipment and storage medium
CN111191622A (en) * 2020-01-03 2020-05-22 华南师范大学 Posture recognition method and system based on thermodynamic diagram and offset vector and storage medium
CN111223143A (en) * 2019-12-31 2020-06-02 广州市百果园信息技术有限公司 Key point detection method and device and computer readable storage medium
CN111402228A (en) * 2020-03-13 2020-07-10 腾讯科技(深圳)有限公司 Image detection method, device and computer readable storage medium
CN111523422A (en) * 2020-04-15 2020-08-11 北京华捷艾米科技有限公司 Key point detection model training method, key point detection method and device
CN112163516A (en) * 2020-09-27 2021-01-01 深圳市悦动天下科技有限公司 Rope skipping counting method and device and computer storage medium
CN112329513A (en) * 2020-08-24 2021-02-05 苏州荷露斯科技有限公司 High frame rate 3D (three-dimensional) posture recognition method based on convolutional neural network

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106909888B (en) * 2017-01-22 2021-02-05 南京开为网络科技有限公司 Face key point tracking system and method applied to mobile equipment terminal
CN108229355B (en) * 2017-12-22 2021-03-23 北京市商汤科技开发有限公司 Behavior recognition method and apparatus, electronic device, computer storage medium
CN109241910B (en) * 2018-09-07 2021-01-01 高新兴科技集团股份有限公司 Face key point positioning method based on deep multi-feature fusion cascade regression
CN110060513A (en) * 2019-01-24 2019-07-26 中国民用航空飞行学院 Workload for air traffic controllers appraisal procedure based on historical trajectory data
WO2020232069A1 (en) * 2019-05-15 2020-11-19 Northeastern University Video 2d multi-person pose estimation using multi-frame refinement and optimization
CN110516642A (en) * 2019-08-30 2019-11-29 电子科技大学 A kind of lightweight face 3D critical point detection method and system
CN110532984B (en) * 2019-09-02 2022-10-11 北京旷视科技有限公司 Key point detection method, gesture recognition method, device and system
CN111126272B (en) * 2019-12-24 2020-11-10 腾讯科技(深圳)有限公司 Posture acquisition method, and training method and device of key point coordinate positioning model
CN112163480B (en) * 2020-09-16 2022-09-13 北京邮电大学 Behavior identification method and device


Also Published As

Publication number Publication date
CN113128436A (en) 2021-07-16

Similar Documents

Publication Publication Date Title
CN111539514A (en) Method and apparatus for generating structure of neural network
CN112270711B (en) Model training and posture prediction method, device, equipment and storage medium
CN111862277A (en) Method, apparatus, device and storage medium for generating animation
CN111860362A (en) Method and device for generating human face image correction model and correcting human face image
CN112529073A (en) Model training method, attitude estimation method and apparatus, and electronic device
CN110659600B (en) Object detection method, device and equipment
CN111931591A (en) Method and device for constructing key point learning model, electronic equipment and readable storage medium
CN111783948A (en) Model training method and device, electronic equipment and storage medium
CN112241716B (en) Training sample generation method and device
CN111968203A (en) Animation driving method, animation driving device, electronic device, and storage medium
CN111582477A (en) Training method and device of neural network model
CN112507833A (en) Face recognition and model training method, device, equipment and storage medium
CN111753964A (en) Neural network training method and device
CN113128436B (en) Method and device for detecting key points
CN113055593B (en) Image processing method and device
CN112561059B (en) Method and apparatus for model distillation
CN112529180A (en) Method and apparatus for model distillation
CN110889392B (en) Method and device for processing face image
CN112328088A (en) Image presenting method and device
CN111767990A (en) Neural network processing method and device
CN112488126A (en) Feature map processing method, device, equipment and storage medium
CN113033485A (en) Method and device for detecting key points
CN112733879A (en) Model distillation method and device for different scenes
CN111524123B (en) Method and apparatus for processing image
CN111510376B (en) Image processing method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant