CN115100691B - Method, device and equipment for acquiring key point detection model and detecting key point - Google Patents


Info

Publication number
CN115100691B
CN115100691B (application CN202211021088.6A)
Authority
CN
China
Prior art keywords
target
sample
position information
key point
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211021088.6A
Other languages
Chinese (zh)
Other versions
CN115100691A (en)
Inventor
付灿苗
孙冲
李琛
吕静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202211021088.6A priority Critical patent/CN115100691B/en
Publication of CN115100691A publication Critical patent/CN115100691A/en
Application granted granted Critical
Publication of CN115100691B publication Critical patent/CN115100691B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107 Static hand or arm
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V10/235 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition based on user input or interaction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a method, an apparatus and a device for acquiring a key point detection model and for detecting key points, and belongs to the field of computer technology. The method comprises the following steps: acquiring a training data set and an initial key point detection model, wherein the training data set comprises a sample image and standard position information corresponding to sample key points, and the sample key points are key points of a target part included in the sample image; invoking the initial key point detection model to process the sample image to obtain a sample heatmap corresponding to the sample key points; determining sample position information corresponding to the sample key points according to the sample heatmap; determining a reference loss value according to the standard position information, the sample position information and the sample heatmap, wherein the reference loss value is used for indicating the detection accuracy of the initial key point detection model; and updating the initial key point detection model based on the reference loss value being larger than a loss threshold, so as to obtain the target key point detection model. The method improves the accuracy and precision of key point detection.

Description

Method, device and equipment for acquiring key point detection model and detecting key point
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a method, a device and equipment for acquiring a key point detection model and detecting a key point.
Background
With the continuous development of computer technology, more and more application scenarios support human-computer interaction; gesture interaction is one common mode. Gesture interaction requires detecting hand key points, which refer to the joints of the hand.
In the related art, a hand image and first position information of the hand key points included in the hand image are acquired, the first position information being obtained by manually labeling the hand image; an initial hand key point detection model is invoked to obtain a heatmap corresponding to the hand key points included in the hand image, and the coordinates of the maximum value in the heatmap are taken as second position information of the hand key points. The initial hand key point detection model is then updated according to the first position information and the second position information to obtain a target hand key point detection model, which is used to acquire the position information of the hand key points included in a hand image.
However, in the above method, the coordinates of the maximum value in the heatmap are used directly as the second position information of the hand key points, so the determined second position information is not accurate enough; updating the initial hand key point detection model with such low-accuracy second position information yields a target hand key point detection model with lower detection accuracy and poorer stability, and therefore lower accuracy when the target model is used to acquire the position information of the hand key points included in a hand image.
Disclosure of Invention
The embodiment of the application provides a method, a device and equipment for acquiring a key point detection model and detecting key points, which can be used to solve the problem of low key point detection accuracy in the related art.
In a first aspect, an embodiment of the present application provides a method for acquiring a keypoint detection model, where the method includes:
acquiring a training data set and an initial key point detection model, wherein the training data set comprises a sample image and standard position information corresponding to a sample key point, and the sample key point is a key point of a target part included in the sample image;
invoking the initial key point detection model to process the sample image to obtain a sample heatmap corresponding to the sample key point;
determining sample position information corresponding to the sample key point according to the sample heatmap;
determining a reference loss value according to the standard position information, the sample position information and the sample heatmap, wherein the reference loss value is used for indicating the detection accuracy of the initial key point detection model;
and updating the initial key point detection model based on the reference loss value being larger than a loss threshold to obtain a target key point detection model, wherein the target key point detection model is used for detecting a target image so as to determine target position information corresponding to a key point of a target part included in the target image.
In a second aspect, an embodiment of the present application provides a method for detecting a keypoint, where the method includes:
acquiring a target image and a target key point detection model, wherein the target image comprises a target part, and the target key point detection model is acquired by the method for acquiring a key point detection model according to any one of the first aspect;
invoking the target key point detection model to process the target image to obtain a target heatmap corresponding to a target key point of the target part;
and determining target position information corresponding to the target key point according to the target heatmap.
In a third aspect, an embodiment of the present application provides an apparatus for acquiring a keypoint detection model, where the apparatus includes:
the acquisition module is used for acquiring a training data set and an initial key point detection model, wherein the training data set comprises a sample image and standard position information corresponding to a sample key point, and the sample key point is a key point of a target part included in the sample image;
the processing module is used for invoking the initial key point detection model to process the sample image to obtain a sample heatmap corresponding to the sample key point;
the determining module is used for determining sample position information corresponding to the sample key point according to the sample heatmap;
the determining module is further configured to determine a reference loss value according to the standard position information, the sample position information and the sample heatmap, where the reference loss value is used to indicate the detection accuracy of the initial key point detection model;
and the updating module is used for updating the initial key point detection model based on the reference loss value being larger than a loss threshold to obtain a target key point detection model, where the target key point detection model is used for detecting a target image so as to determine target position information corresponding to a key point of a target part included in the target image.
In a possible implementation, the determining module is configured to obtain a first heatmap and a second heatmap, where the first heatmap corresponds to a first dimension and the second heatmap corresponds to a second dimension;
and determine the sample position information corresponding to the sample key point according to the sample heatmap, the first heatmap and the second heatmap.
In a possible implementation, the determining module is configured to determine a first value according to the sample heatmap and the first heatmap, where the first value is the value of the sample key point in the first dimension;
determine a second value according to the sample heatmap and the second heatmap, where the second value is the value of the sample key point in the second dimension;
and determine the sample position information corresponding to the sample key point according to the first value and the second value.
In a possible implementation, the sample heatmap, the first heatmap and the second heatmap each comprise a plurality of values, and the three heatmaps comprise the same number of values;
the determining module is configured to multiply the values located at the same position in the sample heatmap and the first heatmap to obtain a third value for each position; determine the first value according to the third values of the positions; multiply the values located at the same position in the sample heatmap and the second heatmap to obtain a fourth value for each position; and determine the second value according to the fourth values of the positions.
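Read literally, the first and second heatmaps act as coordinate grids: multiplying each of them element-wise with the (normalized) sample heatmap and summing the products yields the expected coordinate in each dimension, i.e. a soft-argmax. A minimal numpy sketch of this reading (all names are illustrative, not from the patent):

```python
import numpy as np

def decode_keypoint(sample_hm):
    """Soft-argmax decoding of a heatmap via coordinate grids.

    first_hm holds, at every position, that position's coordinate in the
    first dimension; second_hm holds the second-dimension coordinate.
    Multiplying element-wise with the normalized heatmap and summing the
    products gives the probability-weighted mean coordinate.
    """
    h, w = sample_hm.shape
    p = sample_hm / sample_hm.sum()           # normalize to a distribution
    second_hm, first_hm = np.mgrid[0:h, 0:w]  # row (y) grid, column (x) grid
    first_value = (p * first_hm).sum()        # sum of the per-position "third values"
    second_value = (p * second_hm).sum()      # sum of the per-position "fourth values"
    return first_value, second_value

# a heatmap with all mass at column 12, row 7 decodes to exactly (12, 7)
hm = np.zeros((32, 32))
hm[7, 12] = 1.0
x, y = decode_keypoint(hm)
```

Unlike taking the coordinates of the maximum value, this decoding is differentiable and yields sub-pixel positions, which is consistent with the stated goal of improving precision over the related-art approach.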
In a possible implementation, the determining module is configured to determine a first loss value between the standard position information and the sample position information;
determine a second loss value according to the standard position information and the sample heatmap;
and determine the reference loss value according to the first loss value and the second loss value.
In a possible implementation, the determining module is configured to determine a third heatmap corresponding to the standard position information;
and determine the second loss value according to the sample heatmap and the third heatmap.
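One plausible construction of the combined loss, assuming an L1 position term, a Gaussian third heatmap rendered at the standard position, and an MSE heatmap term (the patent does not fix these choices; they are assumptions for illustration):

```python
import numpy as np

def render_heatmap(xy, size=32, sigma=2.0):
    """Third heatmap: a Gaussian placed at the standard position (assumed kernel)."""
    ys, xs = np.mgrid[0:size, 0:size]
    g = np.exp(-((xs - xy[0]) ** 2 + (ys - xy[1]) ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()

def reference_loss(standard_xy, sample_xy, sample_hm, sigma=2.0):
    """First loss (position, L1 assumed) plus second loss (heatmap, MSE assumed)."""
    first_loss = float(np.abs(np.asarray(standard_xy) - np.asarray(sample_xy)).sum())
    third_hm = render_heatmap(standard_xy, sample_hm.shape[0], sigma)
    second_loss = float(((sample_hm - third_hm) ** 2).mean())
    return first_loss + second_loss

# a prediction that matches the standard position exactly has zero loss
standard = (10.0, 20.0)
hm = render_heatmap(standard)
loss = reference_loss(standard, standard, hm)
```

Supervising both the decoded position and the heatmap shape is what lets the reference loss value reflect detection accuracy more faithfully than a position-only loss.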
In a possible implementation, the acquiring module is configured to acquire the sample image;
identify the sample image to obtain candidate position information corresponding to the sample key point;
adjust the candidate position information to obtain the standard position information corresponding to the sample key point;
and acquire the training data set according to the sample image and the standard position information corresponding to the sample key point.
In a possible implementation, the acquiring module is configured to acquire a shape parameter and a pose parameter corresponding to the target part;
generate a target part model according to the shape parameter and the pose parameter, where the target part model comprises the standard position information corresponding to the sample key point;
attach a texture map to the target part model to obtain a textured target part model;
project the textured target part model into a background image to obtain the sample image;
and acquire the training data set according to the sample image and the standard position information corresponding to the sample key point.
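The value of this synthetic pipeline is that the standard position information comes for free: the 3D key points of the generated part model are known, so projecting them with the same camera used for rendering yields labels without manual annotation. A toy pinhole-projection sketch (intrinsics and keypoints are made-up values, not from the patent):

```python
import numpy as np

def project_keypoints(points_3d, f=500.0, cx=128.0, cy=128.0):
    """Pinhole projection of known 3D model keypoints to pixel coordinates."""
    pts = np.asarray(points_3d, dtype=float)
    u = f * pts[:, 0] / pts[:, 2] + cx  # horizontal pixel coordinate
    v = f * pts[:, 1] / pts[:, 2] + cy  # vertical pixel coordinate
    return np.stack([u, v], axis=1)     # standard position information, no manual labels

# two illustrative keypoints one unit in front of the camera
kp3d = [[0.0, 0.0, 1.0], [0.1, -0.05, 1.0]]
kp2d = project_keypoints(kp3d)
```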
In a possible implementation, the updating module is configured to update the initial key point detection model to obtain an intermediate key point detection model based on the reference loss value being greater than the loss threshold;
invoke the intermediate key point detection model to process the sample image to obtain an intermediate heatmap corresponding to the sample key point;
determine intermediate position information corresponding to the sample key point according to the intermediate heatmap;
determine a candidate loss value according to the standard position information, the intermediate position information and the intermediate heatmap;
and take the intermediate key point detection model as the target key point detection model based on the candidate loss value being not larger than the loss threshold.
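The update rule above amounts to iterating until the candidate loss value no longer exceeds the loss threshold. A schematic loop with a stub loss function (the gradient-descent update and the toy loss are assumptions; the patent only specifies the stopping condition):

```python
import numpy as np

def train_until_threshold(loss_fn, params, lr=0.1, loss_threshold=1e-3, max_iters=1000):
    """Keep updating while the (candidate) loss value exceeds the loss threshold."""
    for _ in range(max_iters):
        loss, grad = loss_fn(params)
        if loss <= loss_threshold:   # candidate loss not larger than the threshold:
            break                    # the intermediate model becomes the target model
        params = params - lr * grad  # update step (gradient descent assumed)
    return params

# toy "detector": loss is the squared distance of params from a target vector
target = np.array([1.0, -2.0])

def toy_loss(p):
    return float(((p - target) ** 2).sum()), 2.0 * (p - target)

final = train_until_threshold(toy_loss, np.zeros(2))
```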
In a fourth aspect, embodiments of the present application provide a keypoint detection device, including:
the acquisition module is used for acquiring a target image and a target key point detection model, wherein the target image comprises a target part, and the target key point detection model is acquired by the apparatus for acquiring a key point detection model according to any one of the third aspect;
the processing module is used for invoking the target key point detection model to process the target image to obtain a target heatmap corresponding to a target key point of the target part;
and the determining module is used for determining target position information corresponding to the target key point according to the target heatmap.
In a possible implementation, the determining module is configured to determine, according to the target heatmap, first position information corresponding to the target key point;
and take the first position information as the target position information corresponding to the target key point; or determine the target position information corresponding to the target key point according to the first position information and reference position information, where the reference position information is the position information of the target key point of the target part included in a reference image, and the acquisition time of the reference image immediately precedes that of the target image.
In a possible implementation, the acquiring module is further configured to acquire the reference image, where the reference image includes the target part;
the processing module is further configured to invoke the target key point detection model to process the reference image to obtain a reference heatmap corresponding to the target key point of the target part;
and the determining module is further configured to determine, according to the reference heatmap, the reference position information corresponding to the target key point.
In a possible implementation, the determining module is configured to obtain an optical flow compensation value between the reference image and the target image, where the optical flow compensation value is used to indicate the motion from the reference image to the target image;
determine the distance of the target key point between the reference image and the target image according to the reference position information and the first position information;
determine a distance weight parameter according to the distance, where the distance weight parameter is proportional to the distance;
and determine the target position information corresponding to the target key point according to the reference position information, the first position information, the optical flow compensation value, the distance and the distance weight parameter.
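One way to read this fusion rule: propagate the previous frame's position by the optical flow compensation value, then blend it with the current detection using a weight that grows with the inter-frame distance, so that large motions trust the fresh detection while small jitter is smoothed away. The specific blend below is an assumption, not the patent's formula:

```python
import numpy as np

def fuse_position(ref_xy, first_xy, flow_xy, scale=10.0):
    """Blend the flow-compensated previous position with the current detection."""
    ref_xy, first_xy, flow_xy = (np.asarray(a, dtype=float)
                                 for a in (ref_xy, first_xy, flow_xy))
    propagated = ref_xy + flow_xy                    # reference position moved by optical flow
    dist = float(np.linalg.norm(first_xy - ref_xy))  # keypoint distance between the two frames
    weight = min(1.0, dist / scale)                  # distance weight, proportional to distance
    return weight * first_xy + (1.0 - weight) * propagated

# a nearly stationary keypoint with zero flow stays close to the previous position
out = fuse_position([50.0, 50.0], [50.5, 50.0], [0.0, 0.0])
```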
In a fifth aspect, an embodiment of the present application provides a computer device, where the computer device includes a processor and a memory, where at least one program code is stored in the memory, where the at least one program code is loaded and executed by the processor, to enable the computer device to implement a method for obtaining a keypoint detection model according to any one of the foregoing first aspect or any one of the foregoing possible implementation manners of the first aspect, or to enable the computer device to implement a keypoint detection method according to any one of the foregoing second aspect or any one of the foregoing possible implementation manners of the second aspect.
In a sixth aspect, there is further provided a computer readable storage medium, in which at least one program code is stored, where the at least one program code is loaded and executed by a processor, to cause a computer to implement a method for obtaining a keypoint detection model according to the first aspect or any one of the possible implementation manners of the first aspect, or to cause a computer to implement a keypoint detection method according to the second aspect or any one of the possible implementation manners of the second aspect.
In a seventh aspect, there is further provided a computer program or a computer program product, in which at least one computer instruction is stored, the at least one computer instruction being loaded and executed by a processor, to cause a computer to implement the method for obtaining the keypoint detection model according to any one of the possible implementations of the first aspect or the first aspect, or to cause a computer to implement the method for detecting keypoints according to any one of the possible implementations of the second aspect or the second aspect.
The technical scheme provided by the embodiment of the application at least brings the following beneficial effects.
According to the technical scheme provided by the embodiment of the application, the sample position information corresponding to the sample key points of the target part included in the sample image is obtained through the initial key point detection model. When the loss value is determined, not only the standard position information and the sample position information corresponding to the sample key points are considered, but also the sample heatmap corresponding to the sample key points, so the determined loss value is more accurate. Updating the initial key point detection model with this more accurate loss value yields a target key point detection model with higher detection accuracy, higher precision and better stability, improving the accuracy and precision of key point detection.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic illustration of an implementation environment provided by an embodiment of the present application;
FIG. 2 is a flowchart of a method for obtaining a keypoint detection model according to an embodiment of the present application;
FIG. 3 is a schematic diagram of key points of a hand according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a sample image acquisition process provided in an embodiment of the present application;
FIG. 5 is a schematic diagram of an initial keypoint detection model provided in an embodiment of the present application;
FIG. 6 is a schematic diagram of a sample heatmap, a first heatmap and a second heatmap provided in an embodiment of the present application;
FIG. 7 is a schematic diagram of a third heatmap provided by an embodiment of the present application;
FIG. 8 is a flowchart of a method for detecting key points according to an embodiment of the present application;
FIG. 9 is a schematic diagram of determining target position information corresponding to a target key point according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a reference image and a target image provided in an embodiment of the present application;
FIG. 11 is a schematic diagram of detecting key points of a reference image and a target image according to an embodiment of the present application;
FIG. 12 is a schematic structural diagram of an apparatus for acquiring a keypoint detection model according to an embodiment of the present application;
FIG. 13 is a schematic structural diagram of a keypoint detection apparatus according to an embodiment of the present application;
FIG. 14 is a schematic structural diagram of a terminal device provided in an embodiment of the present application;
FIG. 15 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like herein are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be implemented in sequences other than those illustrated or otherwise described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application.
In an exemplary embodiment, the method for acquiring the keypoint detection model and the method for detecting the keypoint provided in the embodiments of the present application may be applied to various scenes, including, but not limited to, cloud technology, artificial intelligence, intelligent traffic, assisted driving, games, and the like.
Artificial intelligence (AI) is a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, machine learning/deep learning, autonomous driving, intelligent transportation and other directions.
The scheme provided by the embodiment of the present application relates to the machine learning technology within artificial intelligence. Machine learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specifically studies how a computer simulates or implements human learning behavior to acquire new knowledge or skills, and reorganizes existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied throughout all areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and learning from instruction.
With the research and progress of artificial intelligence technology, it has been researched and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, robots, smart medical care, smart customer service, the Internet of Vehicles and intelligent transportation. It is believed that, with the development of technology, artificial intelligence will be applied in more fields and play an increasingly important role.
FIG. 1 is a schematic diagram of an implementation environment provided in an embodiment of the present application, as shown in FIG. 1, where the implementation environment includes: a terminal device 101 and a server 102.
The method for acquiring the key point detection model provided in the embodiment of the present application may be performed by the terminal device 101, by the server 102, or jointly by the terminal device 101 and the server 102, which is not limited in the embodiment of the present application. When the method is performed jointly, the server 102 may undertake the primary computing work and the terminal device 101 the secondary computing work; alternatively, the server 102 undertakes the secondary computing work and the terminal device 101 the primary computing work; alternatively, the server 102 and the terminal device 101 perform collaborative computing using a distributed computing architecture.
The key point detection method provided in the embodiment of the present application may likewise be performed by the terminal device 101, by the server 102, or jointly by both, which is not limited in the embodiment of the present application, with the same division of primary and secondary computing work, or a distributed computing architecture, as described above.
It should be noted that, the execution device of the method for acquiring the keypoint detection model and the execution device of the keypoint detection method may be the same or different, which is not limited in the embodiment of the present application. Illustratively, the executing device of the acquiring method of the key point detection model is a terminal device 101, and the executing device of the key point detection method is a server 102; alternatively, the execution device of the acquisition method of the keypoint detection model and the execution device of the keypoint detection method are both the terminal device 101.
Alternatively, the terminal device 101 may be any electronic product that can perform man-machine interaction with a user through one or more of a keyboard, a touchpad, a touch screen, a remote controller, voice interaction, or a handwriting device. Terminal devices 101 include, but are not limited to, cell phones, computers, intelligent voice interaction devices, intelligent appliances, vehicle-mounted terminals, aircraft, and the like. The server 102 is a single server, a server cluster formed by a plurality of servers, a cloud computing platform, or a virtualization center, which is not limited in this embodiment of the present application. The server 102 is in communication connection with the terminal device 101 via a wired network or a wireless network. The server 102 has a data receiving function, a data processing function, and a data transmitting function. Of course, the server 102 may also have other functions, which are not limited in this embodiment of the present application.
It will be appreciated by those skilled in the art that the above-described terminal device 101 and server 102 are merely illustrative; other terminal devices or servers, whether existing now or developed hereafter, are also within the scope of the present application where applicable and are incorporated herein by reference.
The embodiment of the present application provides a method for acquiring a key point detection model, where the method is performed by a computer device and may be applied to the implementation environment shown in fig. 1; the computer device may be the terminal device 101 in fig. 1 or the server 102 in fig. 1, which is not limited in this embodiment of the present application. Taking the flowchart of the method for acquiring a keypoint detection model shown in fig. 2 as an example, the method includes the following steps 201 to 205.
In step 201, a training data set and an initial keypoint detection model are acquired, wherein the training data set includes a sample image and standard position information corresponding to a sample keypoint, and the sample keypoint is a keypoint of a target part included in the sample image.
In the exemplary embodiment of the present application, the target portion may be a hand, a foot, or other body parts, which is not limited in the embodiment of the present application. The sample key points are key points included in the target portion, the number of the sample key points is one or more, and the embodiment of the present application is not limited thereto. Illustratively, if the target site is a hand, the hand is included in the sample image, and the sample keypoints are the keypoints of the hand.
In one possible implementation, there are two implementations to obtain the training data set.
In the first implementation mode, a sample image is acquired; the sample image is identified to obtain candidate position information corresponding to the sample key points; the candidate position information is adjusted to obtain standard position information corresponding to the sample key points; and a training data set is acquired according to the sample image and the standard position information corresponding to the sample key points.
The sample image may be an image stored in a storage space of the computer device, an image uploaded by a user, or an image downloaded from a browser, and the source of the sample image is not limited in the embodiment of the present application. The number of sample images may be one or a plurality of sample images.
Optionally, the process of identifying the sample image to obtain the candidate position information corresponding to the sample key point includes: inputting the sample image into an oversized model, and taking the output result of the oversized model as the candidate position information corresponding to the sample key points included in the sample image. For example, the oversized model may be an HRNet (High-Resolution Network) model or an Hourglass (a convolutional neural network) model.
Before the sample image is input into the oversized model, the oversized model is trained. The training process of the oversized model includes: acquiring a sample image set, where the sample image set includes a first image and position information of key points included in the first image, and the first image includes a target part; and training the oversized model according to the first image and the position information of the key points included in the first image to obtain a trained oversized model. Optionally, the position information of the key points included in the first image is manually annotated.
Optionally, the process of adjusting the candidate position information corresponding to the sample key point to obtain the standard position information corresponding to the sample key point includes: manually adjusting candidate position information corresponding to the sample key points to obtain standard position information corresponding to the sample key points; or the computer equipment uniformly adjusts the candidate position information corresponding to the sample key points to obtain the standard position information corresponding to the sample key points.
For the same sample image, because each annotator applies a different standard, the position information of the key points marked by each person differs. If the initial key point detection model were updated according to manually annotated position information alone, the resulting target key point detection model would exhibit larger jitter, and the accuracy of the determined key point position information would be lower. Therefore, candidate position information of the key points included in the sample image is first acquired from the oversized model, and fine adjustment is then performed manually or by the computer device on the basis of this candidate position information to obtain the standard position information. In this way the standard position information is produced under a consistent standard, improving the consistency of the standard position information corresponding to the sample key points. When the initial key point detection model is updated with a sample image and standard position information acquired in this manner, the obtained target key point detection model exhibits smaller jitter, and the accuracy of the determined key point position information is improved.
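The pseudo-labeling pipeline described above can be sketched as follows; `big_model` and `adjust_fn` are hypothetical stand-ins for the trained oversized model (e.g. HRNet or Hourglass) and for the manual or automatic fine adjustment, since the patent does not fix their interfaces.

```python
def build_training_set(sample_images, big_model, adjust_fn):
    """Hypothetical sketch of the first acquisition mode: an oversized,
    pre-trained model produces candidate position information, which is
    then fine-adjusted under a unified standard into standard positions."""
    dataset = []
    for image in sample_images:
        candidates = big_model(image)                    # candidate position info
        standards = [adjust_fn(p) for p in candidates]   # unified fine adjustment
        dataset.append((image, standards))               # (sample image, standard positions)
    return dataset
```

For example, with a stub model that emits one slightly noisy keypoint and an adjustment step that snaps it to a grid, the pair stored in the dataset carries the adjusted (standard) position rather than the raw candidate.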
When the target portion included in the sample image is a hand, the hand includes 21 key points, and the process of acquiring the standard position information corresponding to each key point is similar. Taking the schematic diagram of the key points of the hand as shown in fig. 3 as an example, the black points in fig. 3 are key points.
In the second implementation mode, the shape parameter and the gesture parameter corresponding to the target part are obtained, and a target part model is generated according to the shape parameter and the gesture parameter, where the target part model includes standard position information corresponding to the sample key points; a texture map is attached to the target part model to obtain a target part model with the attached texture map; the target part model with the texture map is projected into a background map to obtain a sample image; and a training data set is obtained from the sample image and the standard position information corresponding to the sample key points.
Wherein, the shape (shape) parameter is used for controlling the shape of the target site, and the pose (pose) parameter is used for controlling the pose of the target site. The shape parameters and the gesture parameters can be input by a user or can be automatically generated by computer equipment, and the acquisition modes of the shape parameters and the gesture parameters are not limited in the embodiment of the application.
Optionally, the process of generating the target site model according to the shape parameter and the posture parameter includes: the shape parameters and the posture parameters are input into a mano model (a hand model with joints and non-rigid deformation), and a target part model is obtained according to the output result of the mano model.
In one possible implementation, a plurality of candidate texture maps are stored in a computer device, and the texture map is attached to a target site model, and the process of obtaining the target site model attached with the texture map includes: and determining one texture map from the plurality of candidate texture maps, and attaching the determined texture map to the target part model to obtain the target part model attached with the texture map. A process for determining a texture map from a plurality of candidate texture maps, including but not limited to: the computer equipment randomly selects one candidate texture map from the candidate texture maps, or takes the selected candidate texture map as a determined texture map according to an operation instruction of a user.
Optionally, attaching the texture map to the target site model to obtain the target site model with the attached texture map includes: inputting the target part model and the texture map into a parametric hand texture model (Parametric Hand Texture Model), and obtaining the target part model with the attached texture map according to the output result of the parametric hand texture model.
In one possible implementation, a plurality of candidate background images are stored in a computer device, and the process of projecting the target site model with the texture image into the background image to obtain a sample image includes: and determining a background image in the candidate background images, and projecting the target part model attached with the texture image to the determined background image to obtain a sample image. A process for determining a background map from a plurality of candidate background maps, including but not limited to: the computer equipment randomly selects one candidate background image from the candidate background images, or takes the selected candidate background image as a determined background image according to an operation instruction of a user.
Because the shape parameters and the gesture parameters are automatically generated by the computer device or manually input by a user, the generated sample images are more diverse. When the initial key point detection model is updated with sample images obtained in this manner, the obtained target key point detection model can detect the position information of key points of a target part of any shape and any gesture included in an image, so the application range of the target key point detection model is wider.
Fig. 4 is a schematic diagram of a sample image acquisition process according to an embodiment of the present application. In fig. 4, the shape parameters and gesture parameters are input into a MANO model to obtain a target part model; the target part model and a texture map are input into a parametric hand texture model to obtain a target part model with the attached texture map; and a sample image is obtained according to the target part model with the attached texture map and the background map.
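The pipeline of fig. 4 can be sketched end to end; `mano_fn`, `texture_fn`, and `project_fn` are hypothetical callables standing in for the MANO model, the parametric hand texture model, and the projection step, and the random selection mirrors the optional random choice of texture and background maps described above.

```python
import random

def make_sample(shape_params, pose_params, textures, backgrounds,
                mano_fn, texture_fn, project_fn):
    """Hypothetical sketch of the second acquisition mode (fig. 4)."""
    # Target part model; assumed to expose its keypoints, which serve
    # directly as the standard position information.
    part_model = mano_fn(shape_params, pose_params)
    # Attach a randomly chosen candidate texture map.
    textured = texture_fn(part_model, random.choice(textures))
    # Project onto a randomly chosen candidate background map.
    image = project_fn(textured, random.choice(backgrounds))
    return image, part_model["keypoints"]
```

Because the keypoints come from the generated part model, no manual annotation is needed for samples produced this way.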
It should be noted that, any implementation manner may be selected to obtain the training data set, or the training data sets obtained by the two implementation manners may be used as the training data set of the initial key point detection model, which is not limited in this embodiment of the present application.
Fig. 5 is a schematic diagram of an initial keypoint detection model provided in an embodiment of the present application. In fig. 5, the initial keypoint detection model includes 4 convolution layers and 2 upsampling layers. The sample image is input into the initial key point detection model, and a sample heat point diagram corresponding to the sample key points included in the sample image is obtained through the 4 convolution layers and the 2 up-sampling layers. Of course, the initial keypoint detection model may also include a greater or lesser number of convolution layers and upsampling layers. The feature maps (featmaps) of the initial key point detection model provided by the embodiment of the application are smaller and the number of up-sampling layers (upsamples) is smaller, so the complexity of the model is reduced and the processing speed of the model is improved. The initial key point detection model provided by the embodiment of the application can be deployed on devices including, but not limited to, mobile terminals such as smart phones.
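Since fig. 5 gives only the layer counts (4 convolution layers, 2 upsampling layers) and not their strides, the following sketch simply traces the feature-map side length through an assumed configuration; the strides, the x2 upsampling factors, and the 128-pixel input size are illustrative, not taken from the patent.

```python
def trace_resolution(input_size, conv_strides, upsample_scales):
    """Trace the feature-map side length through a stack of convolution
    layers (downsampling by their strides) and upsampling layers."""
    size = input_size
    for s in conv_strides:
        size //= s            # each strided convolution shrinks the map
    for s in upsample_scales:
        size *= s             # each upsampling layer enlarges it again
    return size

# 4 convolution layers and 2 upsampling layers as in fig. 5, with assumed
# strides (2, 2, 2, 1) and x2 upsampling: 128 -> 16 -> 64.
heatmap_side = trace_resolution(128, [2, 2, 2, 1], [2, 2])
```

Keeping the intermediate feature maps small in this way is what limits the model's complexity, as the text notes.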
In step 202, an initial keypoint detection model is called to process a sample image, and a sample hotspot graph corresponding to the sample keypoint is obtained.
In one possible implementation, the sample image is input into an initial keypoint detection model to obtain a sample hotspot graph corresponding to the sample keypoint.
Wherein each keypoint corresponds to a sample hotspot graph. Taking the target part included in the sample image as the hand, and taking the hand including 21 key points as an example, after the sample image is input into the initial key point detection model, a sample heat point diagram corresponding to each key point is obtained, namely 21 sample heat point diagrams are obtained.
In step 203, sample position information corresponding to the sample keypoints is determined according to the sample hotspot graph.
In one possible implementation manner, according to the sample hotspot graph, the process of determining sample position information corresponding to the sample keypoints includes: and (3) performing soft-argmax (a processing mode) processing on the sample hot spot diagram to obtain sample position information corresponding to the sample key points. The processing mode can avoid quantization errors, so that the accuracy and the stability of the determined sample position information are higher.
The process of performing soft-argmax processing on the sample hotspot graph to obtain sample position information corresponding to the sample key points comprises the following steps: acquiring a first heat point map and a second heat point map, wherein the first heat point map is a heat point map corresponding to a first dimension, and the second heat point map is a heat point map corresponding to a second dimension. And determining sample position information corresponding to the sample key point according to the sample heat point diagram, the first heat point diagram and the second heat point diagram.
The first hotspot graph and the second hotspot graph are set based on experience, or are adjusted according to the implementation environment, which is not limited in the embodiment of the present application. Fig. 6 is a schematic diagram of a sample heat map, a first heat map, and a second heat map provided in an embodiment of the present application. Fig. 6 (1) is a sample thermal map, fig. 6 (2) is a first thermal map, and fig. 6 (3) is a second thermal map.
The embodiment of the application does not limit the process of determining the sample position information corresponding to the sample key point according to the sample heat point diagram, the first heat point diagram and the second heat point diagram. Optionally, determining a first value according to the sample heat point diagram and the first heat point diagram, wherein the first value is a value of the sample key point in the first dimension; determining a second numerical value according to the sample heat point diagram and the second heat point diagram, wherein the second numerical value is the numerical value of the sample key point in a second dimension; and acquiring sample position information corresponding to the sample key points according to the first numerical value and the second numerical value.
The sample heat point diagram, the first heat point diagram and the second heat point diagram respectively comprise a plurality of numerical values, and the number of the numerical values included in the sample heat point diagram, the first heat point diagram and the second heat point diagram is the same. From the sample heat map and the first heat map, the process of determining the first value includes, but is not limited to: multiplying the numerical values at the same position in the sample hotspot graph and the first hotspot graph to obtain a third numerical value corresponding to each position, and determining the first numerical value according to the third numerical value corresponding to each position. Illustratively, the third values corresponding to the respective positions are added to obtain the first value. The process of determining the second value from the sample hotspot graph and the second hotspot graph comprises: multiplying the numerical values at the same position in the sample hotspot graph and the second hotspot graph to obtain a fourth numerical value corresponding to each position, and determining a second numerical value according to the fourth numerical value corresponding to each position. Illustratively, the fourth values corresponding to the respective positions are added to obtain the second value.
Taking the sample heat map, the first heat map and the second heat map shown in fig. 6 as examples, the first value x = 0.1×0 + 0.1×0.4 + 0.6×0.4 + 0.1×0.8 + 0.1×0.4 = 0.4; the second value y = 0.1×0 + 0.1×(−0.4) + 0.6×0 + 0.1×0 + 0.1×0.4 = 0. The sample position information corresponding to the sample key point is therefore (0.4, 0).
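The soft-argmax computation above can be reproduced with a small NumPy sketch; the 3x3 grids below are a hypothetical arrangement chosen to match the five non-zero values of the fig. 6 example.

```python
import numpy as np

# Sample heat point diagram H (values sum to 1), arranged on a hypothetical
# 3x3 grid reproducing the five non-zero values of the worked example.
H = np.array([[0.0, 0.1, 0.0],
              [0.1, 0.6, 0.1],
              [0.0, 0.1, 0.0]])
# First heat point diagram: the coordinate of each cell in the first dimension
X = np.array([[0.0, 0.4, 0.8]] * 3)
# Second heat point diagram: the coordinate of each cell in the second dimension
Y = np.array([[-0.4] * 3, [0.0] * 3, [0.4] * 3])

# soft-argmax: multiply values at the same position and sum the products
x = float(np.sum(H * X))  # first value
y = float(np.sum(H * Y))  # second value
# (x, y) is approximately (0.4, 0), matching the sample position in the text
```

Because this is a weighted sum rather than a hard index lookup, the result is differentiable and free of quantization error, which is why the text favors it.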
It should be noted that, the determination process of each key point in the target portion included in the sample image is similar, and will not be described in detail here.
In step 204, a reference loss value is determined according to the standard position information, the sample position information and the sample hotspot graph, wherein the reference loss value is used for indicating the detection accuracy of the initial keypoint detection model.
Optionally, the process of determining the reference loss value according to the standard position information, the sample position information and the sample heat map includes: determining a first loss value between the standard position information and the sample position information; and determining a second loss value according to the standard position information and the sample heat map, and determining a reference loss value according to the first loss value and the second loss value.
Wherein determining the first loss value between the standard position information and the sample position information includes: calling a target loss function according to the standard position information and the sample position information to determine the first loss value. Optionally, the target loss function is an L2 loss function, but other loss functions may also be used. Alternatively, the Euclidean distance between the standard position information and the sample position information may be taken as the first loss value.
In one possible implementation, the determining the second loss value according to the standard location information and the sample hotspot graph includes: determining a third heat point diagram corresponding to the standard position information; and determining a second loss value according to the sample heat point diagram and the third heat point diagram. Optionally, a Mean Square Error (MSE) between the sample hotspot graph and the third hotspot graph is determined, with the mean square error being the second loss value.
Fig. 7 is a schematic diagram of a third heat map provided in an embodiment of the present application. The second loss value is determined as the mean square error between the third heat point diagram and the sample heat point diagram.
Optionally, determining the reference loss value based on the first loss value and the second loss value includes: taking the sum of the first loss value and the second loss value as a reference loss value. The reference loss value may also be determined based on the first loss value, the second loss value, and the super parameter. The value of the super parameter may be set based on experience, or may be adjusted according to the implementation environment, which is not limited in the embodiment of the present application. Illustratively, the value of the super parameter is 0.1. For another example, the value of the super parameter is 0.2.
In one possible implementation, the reference loss value is determined according to the following formula (1) based on the standard position information, the sample position information, and the sample hotspot graph.

L = ||J − J*||₂ + λ · MSE(H, H*)    Formula (1)

In the above-mentioned formula (1), L is the reference loss value, J is the sample position information, J* is the standard position information, ||J − J*||₂ is the first loss value, λ · MSE(H, H*) is a regular term, H is the sample hotspot graph, H* is the third hotspot graph corresponding to the standard position information, MSE(H, H*) is the second loss value, and λ is a super parameter.
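A minimal sketch of formula (1), assuming the Euclidean-distance option for the first loss value and mean square error for the second; the function name and the default λ = 0.1 (the illustrative hyperparameter value from the text) are assumptions.

```python
import numpy as np

def reference_loss(sample_pos, standard_pos, sample_heatmap, third_heatmap, lam=0.1):
    """Hypothetical implementation of formula (1): first loss value plus a
    hyperparameter-weighted second loss value acting as a regular term."""
    # First loss: Euclidean distance between sample and standard positions
    first = np.linalg.norm(sample_pos - standard_pos)
    # Second loss: MSE between the sample hotspot graph and the third hotspot graph
    second = np.mean((sample_heatmap - third_heatmap) ** 2)
    return first + lam * second       # reference loss value
```

When the two heat maps coincide, the regular term vanishes and the reference loss reduces to the positional error alone.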
It should be noted that the reference loss value is used to indicate the detection accuracy of the initial key point detection model, and the reference loss value is inversely related to the detection accuracy: the larger the reference loss value, the lower the detection accuracy of the initial key point detection model; conversely, the smaller the reference loss value, the higher the detection accuracy.
In step 205, based on the reference loss value being greater than the loss threshold, the initial keypoint detection model is updated to obtain a target keypoint detection model, which is used to detect the target image, so as to determine target position information corresponding to the keypoints of the target part included in the target image.
When the reference loss value is not greater than the loss threshold, the detection accuracy of the initial key point detection model is already high, and the initial key point detection model is therefore taken as the target key point detection model. When the reference loss value is greater than the loss threshold, the detection accuracy of the initial key point detection model is low, and the initial key point detection model needs to be updated to obtain a target key point detection model with higher detection accuracy.
The loss threshold may be set based on experience, or may be adjusted according to the implementation environment, and the loss threshold is not limited in the embodiment of the present application.
Optionally, the process of updating the initial keypoint detection model to obtain the target keypoint detection model includes: updating the initial key point detection model to obtain an intermediate key point detection model; calling the intermediate key point detection model to process the sample image to obtain an intermediate heat point diagram corresponding to the sample key point; determining intermediate position information corresponding to the sample key points according to the intermediate heat point diagram; determining a candidate loss value according to the standard position information, the intermediate position information, and the intermediate heat point diagram; and, based on the candidate loss value not being greater than the loss threshold, taking the intermediate key point detection model as the target key point detection model. If the candidate loss value is still greater than the loss threshold, the intermediate key point detection model continues to be updated: the updated model is called to obtain a hotspot graph corresponding to the sample image, position information corresponding to the key points is obtained according to the hotspot graph, and a loss value is determined according to that position information, the standard position information corresponding to the key points, and the hotspot graph. This is repeated until the determined loss value is not greater than the loss threshold, at which point the updated key point detection model is taken as the target key point detection model.
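The iterative update loop described above can be sketched generically; `update_fn` and `eval_loss_fn` are hypothetical placeholders for one round of parameter adjustment and for computing the candidate loss value, and `max_iters` is an added safety assumption not stated in the patent.

```python
def train_until_threshold(model, update_fn, eval_loss_fn, loss_threshold, max_iters=1000):
    """Generic sketch: keep updating the model while its loss value stays
    above the loss threshold, then return it as the target model."""
    loss = eval_loss_fn(model)
    iters = 0
    while loss > loss_threshold and iters < max_iters:
        model = update_fn(model)       # e.g. adjust conv/upsampling parameters
        loss = eval_loss_fn(model)     # recompute the candidate loss value
        iters += 1
    return model                       # the target key point detection model
```

If the initial model already satisfies the threshold, the loop body never runs and the initial model itself is returned, matching the first case described above.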
Optionally, the process of updating the initial keypoint detection model to obtain the intermediate keypoint detection model includes: adjusting the parameters of each convolution layer and each up-sampling layer included in the initial key point detection model to obtain the intermediate key point detection model. The target key point detection model provided by the embodiment of the application uses only about 30M FLOPS (million floating-point operations per second) of computing power, is compatible with key point detection in various scenes, supports 3D key points well, produces very stable points during video processing, and can be applied well to a wide range of key point detection scenarios.
According to the method, the sample position information corresponding to the sample key points of the target part included in the sample image is obtained through the initial key point detection model, and when the loss value is determined, the standard position information corresponding to the sample key points and the sample position information corresponding to the sample key points are considered, and the sample heat point diagram corresponding to the sample key points is also considered, so that the accuracy of the determined loss value is higher. And the initial key point detection model is updated by adopting a loss value with higher accuracy, so that the obtained target key point detection model has higher detection accuracy, higher precision and better stability, and the accuracy and the precision of key point detection are higher.
The embodiment of the application provides a key point detection method, which is executed by a computer device and may be applied to the implementation environment shown in fig. 1; the computer device may be the terminal device 101 in fig. 1 or the server 102 in fig. 1. Taking the flowchart of the key point detection method provided in the embodiment of the present application as an example, the method includes the following steps 801 to 803.
In step 801, a target image and a target keypoint detection model are acquired.
In the exemplary embodiment of the present application, the target keypoint detection model is obtained by the above-described method for obtaining a keypoint detection model shown in fig. 2. The target image includes a target portion, where the target portion is any portion, for example, a hand, a foot, or another portion of the body, which is not limited in the embodiment of the present application. The target key point detection model is used for detecting the target image so as to determine the position information of the key points of the target part included in the target image.
The method for acquiring the target image is not limited in the embodiment of the present application. By way of example, the target image may be acquired in any one of the following four ways.
In the first mode, a plurality of candidate images to be subjected to key point detection are stored in a storage space of the computer device, and one candidate image is determined as a target image from the plurality of candidate images.
For example, the computer device randomly determines one candidate image among the plurality of candidate images as the target image. For another example, the computer device displays a plurality of candidate images, and in response to receiving a selection instruction for any one of the candidate images, the selected candidate image is taken as the target image.
In the second mode, the image uploaded by the user is taken as the target image.
Optionally, the computer device displays an image upload control, the image upload control being used to upload the image. In response to an operation instruction for the image uploading control, the computer equipment receives an image uploaded by a user, and takes the image uploaded by the user as a target image.
In the third mode, an image including the target part is downloaded from the browser as the target image.
In the fourth mode, an image acquired by the image acquisition device of the computer equipment is used as the target image.
Optionally, the computer device includes an image capturing device, which may be a camera or other component capable of capturing an image. The computer equipment collects images through the image collection device, identifies the collected images, and responds to the fact that the collected images comprise target parts, the collected images are used as target images.
In step 802, a target keypoint detection model is invoked to process a target image, and a target hotspot graph corresponding to a target keypoint of a target part is obtained.
In one possible implementation manner, a target image is input into a target key point detection model, and a target heat point diagram corresponding to a target key point of a target part included in the target image is obtained according to an output result of the target key point detection model.
It should be noted that, the number of key points included in the target portion in the target image is consistent with the number of target heat point diagrams, that is, each key point corresponds to one target heat point diagram.
In step 803, target position information corresponding to the target keypoints is determined according to the target hotspot graph.
In one possible implementation manner, according to the target hotspot graph, the target position information corresponding to the target key point may be determined by adopting any one of the following two implementation manners.
According to the first implementation mode, according to the target heat point diagram, first position information corresponding to the target key point is determined, and the first position information is used as target position information corresponding to the target key point.
The process of determining the first position information corresponding to the target key point according to the target heat point map is consistent with the process of determining the sample position information corresponding to the sample key point according to the sample heat point map in step 203, and will not be described herein.
Fig. 9 is a schematic diagram for determining target position information corresponding to a target key point according to an embodiment of the present application. In fig. 9, the target image is input into the target key point detection model to obtain a target heat point map corresponding to a target key point included in the target portion in the target image, and soft-argmax processing is then performed on the target hot spot diagram to obtain the target position information (coordinates) corresponding to the target key point.
According to the second implementation mode, according to the target heat point diagram, first position information corresponding to the target key point is determined, and according to the first position information and the reference position information, target position information corresponding to the target key point is determined.
The reference position information is position information of a target key point of a target part included in the reference image, and the acquisition time of the reference image is adjacent to and before the acquisition time of the target image. Illustratively, the acquisition time of the target image is t, and the acquisition time of the reference image is t-1. The target key point of the target part included in the reference image is the tip of the index finger, and the target key point of the target part included in the target image is also the tip of the index finger.
Optionally, the determining process of the reference position information corresponding to the target key point of the target part included in the reference image includes: acquiring a reference image, wherein the reference image comprises a target part; calling a target key point detection model to process the reference image to obtain a reference heat point diagram corresponding to the target key point of the target part; and determining the reference position information corresponding to the target key point according to the reference heat point diagram.
The process of calling the target key point detection model to process the reference image to obtain the reference heat point diagram corresponding to the target key point of the target part is similar to the process of calling the initial key point detection model to process the sample image to obtain the sample heat point diagram corresponding to the sample key point in the step 202; the process of determining the reference position information corresponding to the target key point according to the reference heat point map is similar to the process of determining the sample position information corresponding to the sample key point according to the sample heat point map in step 203, and will not be described herein.
In one possible implementation manner, the process of determining the target position information corresponding to the target key point according to the first position information and the reference position information includes: acquiring an optical flow compensation value between the reference image and the target image, wherein the optical flow compensation value is used for indicating the motion speed from the reference image to the target image; determining the distance moved by the target key point between the reference image and the target image according to the reference position information and the first position information; determining a distance weight parameter according to the distance, wherein the distance weight parameter is in direct proportion to the distance; and determining the target position information corresponding to the target key point according to the reference position information, the first position information, the optical flow compensation value, the distance, and the distance weight parameter.
Optionally, the optical flow compensation value between the reference image and the target image is acquired according to the LK (Lucas-Kanade) optical flow method. The Euclidean distance between the reference position information and the first position information may be taken as the distance moved by the target key point between the reference image and the target image.
Illustratively, the target location information corresponding to the target keypoint is determined according to the following formula (2) according to the reference location information, the first location information, the optical flow compensation value, the distance, and the distance weight parameter.
P_t = (1 − λ(d)) · (P_{t−1} + F) + λ(d) · P̂_t      Formula (2)

In the above formula (2), P_t is the target position information, P_{t−1} is the reference position information, F is the optical flow compensation value, λ(d) is the distance weight parameter determined by, and in direct proportion to, the distance d, and P̂_t is the first position information.
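As a hedged illustration of how these quantities can interact, the sketch below fuses the flow-compensated reference position with the currently detected position using a weight that grows with the distance moved, so that small jitters are suppressed while large genuine motions follow the new detection. The blending rule, the clamped linear weight, and the gain `k` are illustrative assumptions, not the exact patented formula:

```python
import math

def fuse_position(p_ref, p_first, flow, k=0.1):
    """Blend the reference position and the current detection.

    p_ref   : (x, y) position of the key point in the reference frame (t-1)
    p_first : (x, y) first position detected in the target frame (t)
    flow    : (dx, dy) optical flow compensation from frame t-1 to frame t
    k       : assumed gain converting distance (pixels) into a weight
    """
    # Distance moved by the key point between the two frames.
    d = math.dist(p_ref, p_first)
    # Distance weight parameter, in direct proportion to the distance,
    # clamped to [0, 1].
    lam = min(1.0, k * d)
    # Small distance -> follow the flow-compensated reference position
    # (suppresses jitter); large distance -> trust the new detection.
    x = (1 - lam) * (p_ref[0] + flow[0]) + lam * p_first[0]
    y = (1 - lam) * (p_ref[1] + flow[1]) + lam * p_first[1]
    return x, y

# A 1-pixel jitter with zero flow stays close to the reference position.
stable = fuse_position((100.0, 50.0), (101.0, 50.0), (0.0, 0.0))
# A large 40-pixel movement mostly follows the new detection.
moved = fuse_position((100.0, 50.0), (140.0, 50.0), (0.0, 0.0))
```

This matches the qualitative behavior reported for fig. 11: stabilized points stay basically motionless under jitter, while real movement is still tracked.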
Fig. 10 is a schematic diagram of a reference image and a target image according to an embodiment of the present application. Fig. 10 (1) is a reference image, and fig. 10 (2) is a target image. The reference image is the image of the (t-1)-th frame, and the target image is the image of the t-th frame.
Fig. 11 is a schematic diagram of key point detection on the reference image and the target image according to an embodiment of the present application. The black dots in (1) and (2) in fig. 11 are the same key point, and the gray dots in (1) and (2) in fig. 11 are the same key point. The target position information of the black dot in (2) in fig. 11 is determined according to the method of the second implementation manner, and the target position information of the gray dot in (2) in fig. 11 is determined according to the method of the first implementation manner. As can be seen from fig. 11, the black dot is basically motionless, while the gray dot jumps within a small range, which indicates that the target position information determined by the method of the second implementation manner is more accurate and the effect is better.
The target key point detection model obtained by the above method has higher detection precision and a better detection effect. Therefore, when the target image is detected by this model, the target position information determined for the target key points of the target part included in the target image is more accurate.
Fig. 12 is a schematic structural diagram of an acquiring device for a keypoint detection model according to an embodiment of the present application, where, as shown in fig. 12, the device includes:
the acquiring module 1201 is configured to acquire a training data set and an initial keypoint detection model, where the training data set includes a sample image and standard position information corresponding to a sample keypoint, and the sample keypoint is a keypoint of a target part included in the sample image;
the processing module 1202 is configured to invoke an initial keypoint detection model to process a sample image, so as to obtain a sample hotspot graph corresponding to a sample keypoint;
a determining module 1203, configured to determine sample position information corresponding to the sample keypoints according to the sample hotspot graph;
the determining module 1203 is further configured to determine a reference loss value according to the standard position information, the sample position information, and the sample hotspot graph, where the reference loss value is used to indicate a detection accuracy of the initial key point detection model;
The updating module 1204 is configured to update the initial keypoint detection model based on the reference loss value being greater than the loss threshold value, to obtain a target keypoint detection model, where the target keypoint detection model is configured to detect a target image, so as to determine target position information corresponding to a keypoint of a target portion included in the target image.
In a possible implementation manner, the determining module 1203 is configured to obtain a first heat point map and a second heat point map, where the first heat point map is a heat point map corresponding to a first dimension, and the second heat point map is a heat point map corresponding to a second dimension; and determining sample position information corresponding to the sample key point according to the sample heat point diagram, the first heat point diagram and the second heat point diagram.
In one possible implementation manner, the determining module 1203 is configured to determine a first value according to the sample hotspot graph and the first hotspot graph, where the first value is a value of the sample key point in the first dimension; determining a second numerical value according to the sample heat point diagram and the second heat point diagram, wherein the second numerical value is the numerical value of the sample key point in a second dimension; and determining sample position information corresponding to the sample key points according to the first numerical value and the second numerical value.
In one possible implementation, the sample hotspot graph, the first hotspot graph and the second hotspot graph each include a plurality of values, and the sample hotspot graph, the first hotspot graph and the second hotspot graph include the same number of values;
The determining module 1203 is configured to multiply the values located at the same position in the sample hotspot graph and the first hotspot graph to obtain third values corresponding to each position; determining a first value according to the third value corresponding to each position; multiplying the numerical values at the same position in the sample hotspot graph and the second hotspot graph to obtain a fourth numerical value corresponding to each position; and determining the second numerical value according to the fourth numerical value corresponding to each position.
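The first and second heat point maps described above can be understood as coordinate grids of the same size as the sample heat point map: multiplying values at the same position and then adding them up yields the expected coordinate in each dimension. A small numpy sketch under that interpretation (the shapes, the toy peak location, and the normalization are assumptions for illustration):

```python
import numpy as np

h, w = 4, 6
# Normalized sample heat point map, peaked at row 1, column 4.
sample = np.full((h, w), 1e-6)
sample[1, 4] = 1.0
sample /= sample.sum()

# First heat point map: each element holds its first-dimension (x) value.
# Second heat point map: each element holds its second-dimension (y) value.
second_map, first_map = np.mgrid[0:h, 0:w]

# Multiply values at the same position to get the third/fourth values,
# then add them up per dimension to get the first/second values.
first_value = (sample * first_map).sum()    # x coordinate of the key point
second_value = (sample * second_map).sum()  # y coordinate of the key point
position = (first_value, second_value)
```

Under this reading, the per-position products are the third and fourth values of the description, and their sums recover sub-pixel sample position information.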
In one possible implementation, the determining module 1203 is configured to determine a first loss value between the standard position information and the sample position information; determining a second loss value according to the standard position information and the sample heat map; a reference loss value is determined based on the first loss value and the second loss value.
In one possible implementation manner, the determining module 1203 is configured to determine a third heat map corresponding to the standard location information; and determining a second loss value according to the sample heat point diagram and the third heat point diagram.
In one possible implementation, the acquiring module 1201 is configured to acquire a sample image; identifying the sample image to obtain candidate position information corresponding to the sample key points; the candidate position information is adjusted to obtain standard position information corresponding to the sample key points; and acquiring a training data set according to the sample image and the standard position information corresponding to the sample key points.
In one possible implementation, the obtaining module 1201 is configured to obtain a shape parameter and an attitude parameter corresponding to the target location; generating a target part model according to the shape parameters and the posture parameters, wherein the target part model comprises standard position information corresponding to the sample key points; attaching a texture map to the target part model to obtain a target part model attached with the texture map; projecting the target part model with the texture map into a background map to obtain a sample image; and acquiring a training data set according to the sample image and the standard position information corresponding to the sample key points.
In one possible implementation, the updating module 1204 is configured to update the initial keypoint detection model to obtain an intermediate keypoint detection model based on the reference loss value being greater than the loss threshold; calling an intermediate key point detection model to process the sample image to obtain an intermediate heat point diagram corresponding to the sample key point; determining intermediate position information corresponding to the sample key points according to the intermediate heat point diagram; determining candidate loss values according to the standard position information, the intermediate position information and the intermediate heat point diagram; and taking the intermediate key point detection model as a target key point detection model based on the candidate loss value not being larger than the loss threshold value.
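The update loop described by the updating module can be sketched as repeated updates and loss checks until the candidate loss is no longer greater than the threshold. The sketch below uses a trivial one-parameter stand-in for the model and a squared-error stand-in for the reference loss purely to show the control flow; all names, the target value, and the gradient step are illustrative assumptions:

```python
def train_until_threshold(initial_param, loss_threshold, lr=0.1, max_iters=1000):
    """Iteratively update a stand-in model parameter until the loss is
    not greater than the threshold, mirroring the update module's loop."""
    target = 3.0           # stand-in for the standard position information
    param = initial_param  # stand-in for the model being updated

    def loss(p):
        # Stand-in reference loss: squared error against the target.
        return (p - target) ** 2

    for _ in range(max_iters):
        if loss(param) <= loss_threshold:
            # Candidate loss not greater than the threshold: take the
            # intermediate model as the target key point detection model.
            return param
        # Otherwise update the intermediate model and re-evaluate.
        param -= lr * 2 * (param - target)  # gradient step on squared error
    return param

final = train_until_threshold(initial_param=0.0, loss_threshold=1e-4)
```

The structure — evaluate, compare against the loss threshold, update, and re-evaluate — is what distinguishes the intermediate key point detection model from the final target model.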
According to the device, through the initial key point detection model, the sample position information corresponding to the sample key points of the target part included in the sample image is obtained, when the loss value is determined, the standard position information corresponding to the sample key points and the sample position information corresponding to the sample key points are considered, and the sample heat point diagram corresponding to the sample key points is also considered, so that the accuracy of the determined loss value is higher. The initial key point detection model is updated by adopting a loss value with higher accuracy, so that the obtained target key point detection model has higher detection accuracy, higher precision, better stability and better key point detection effect.
Fig. 13 is a schematic structural diagram of a hand key point detection device according to an embodiment of the present application, where, as shown in fig. 13, the device includes:
an acquiring module 1301, configured to acquire a target image and a target keypoint detection model, where the target image includes a target location, and the target keypoint detection model is acquired by an acquiring device of the keypoint detection model shown in fig. 12;
the processing module 1302 is configured to invoke a target key point detection model to process the target image, so as to obtain a target heat point map corresponding to the target key point of the target part;
The determining module 1303 is configured to determine target position information corresponding to the target key point according to the target heat point diagram.
In one possible implementation manner, the determining module 1303 is configured to determine, according to the target hotspot graph, first location information corresponding to the target key point; taking the first position information as target position information corresponding to the target key point; or determining target position information corresponding to the target key point according to the first position information and the reference position information, wherein the reference position information is the position information of the target key point of the target part included in the reference image, and the acquisition time of the reference image is adjacent to and before the acquisition time of the target image.
In one possible implementation, the obtaining module 1301 is further configured to obtain a reference image, where the reference image includes the target location;
the processing module 1302 is further configured to invoke the target key point detection model to process the reference image, so as to obtain a reference heat point map corresponding to the target key point of the target part;
the determining module 1303 is further configured to determine reference position information corresponding to the target key point according to the reference heat point map.
In one possible implementation, the determining module 1303 is configured to obtain an optical flow compensation value between the reference image and the target image, where the optical flow compensation value is used to indicate a speed of the reference image to the target image; determining the distance between the reference image and the target image of the target key point according to the reference position information and the first position information;
Determining a distance weight parameter according to the distance, wherein the distance weight parameter is in direct proportion to the distance; and determining target position information corresponding to the target key point according to the reference position information, the first position information, the optical flow compensation value, the distance and the distance weight parameter.
The target key point detection model obtained by the above device has higher detection precision and a better detection effect. Therefore, when the target image is detected by this model, the target position information determined for the target key points of the target part included in the target image is more accurate.
It should be understood that, in implementing the functions of the apparatus provided above, only the division of the above functional modules is illustrated, and in practical application, the above functional allocation may be implemented by different functional modules, that is, the internal structure of the device is divided into different functional modules, so as to implement all or part of the functions described above. In addition, the apparatus and the method embodiments provided in the foregoing embodiments belong to the same concept, and specific implementation processes of the apparatus and the method embodiments are detailed in the method embodiments and are not repeated herein.
Fig. 14 shows a block diagram of a terminal device 1400 according to an exemplary embodiment of the present application. The terminal device 1400 may be a portable mobile terminal such as a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. Terminal device 1400 may also be referred to by other names such as user device, portable terminal, laptop terminal, or desktop terminal.
In general, the terminal apparatus 1400 includes: a processor 1401 and a memory 1402.
Processor 1401 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on. The processor 1401 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor 1401 may also include a main processor and a coprocessor. The main processor is a processor for processing data in an awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 1401 may be integrated with a GPU (Graphics Processing Unit) for rendering the content that the display screen is required to display. In some embodiments, the processor 1401 may also include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 1402 may include one or more computer-readable storage media, which may be non-transitory. Memory 1402 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 1402 is used to store at least one instruction for execution by processor 1401 to implement the method of obtaining a keypoint detection model provided by the method embodiment shown in fig. 2 and/or the method of keypoint detection provided by the method embodiment shown in fig. 8 in the present application.
In some embodiments, the terminal device 1400 may further optionally include: a peripheral interface 1403 and at least one peripheral. The processor 1401, memory 1402, and peripheral interface 1403 may be connected by a bus or signal lines. The individual peripheral devices may be connected to the peripheral device interface 1403 via buses, signal lines or a circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 1404, a display screen 1405, a camera assembly 1406, audio circuitry 1407, and a power source 1408.
Peripheral interface 1403 may be used to connect at least one Input/Output (I/O) related peripheral to processor 1401 and memory 1402. In some embodiments, processor 1401, memory 1402, and peripheral interface 1403 are integrated on the same chip or circuit board; in some other embodiments, either or both of processor 1401, memory 1402, and peripheral interface 1403 may be implemented on separate chips or circuit boards, which is not limited in this embodiment.
The radio frequency circuit 1404 is configured to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. The radio frequency circuit 1404 communicates with a communication network and other communication devices via electromagnetic signals. The radio frequency circuit 1404 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 1404 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so on. The radio frequency circuit 1404 may communicate with other terminal devices via at least one wireless communication protocol. The wireless communication protocol includes, but is not limited to: the World Wide Web, metropolitan area networks, intranets, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1404 may also include NFC (Near Field Communication) related circuits, which are not limited in this application.
The display screen 1405 is used to display UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 1405 is a touch display screen, the display screen 1405 also has the ability to collect touch signals at or above the surface of the display screen 1405. The touch signal may be input to the processor 1401 as a control signal for processing. At this time, the display 1405 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, the display 1405 may be one, disposed on the front panel of the terminal device 1400; in other embodiments, the display 1405 may be at least two, respectively disposed on different surfaces of the terminal device 1400 or in a folded design; in other embodiments, the display 1405 may be a flexible display disposed on a curved surface or a folded surface of the terminal device 1400. Even more, the display 1405 may be arranged in a non-rectangular irregular pattern, i.e. a shaped screen. The display 1405 may be made of LCD (Liquid Crystal Display ), OLED (Organic Light-Emitting Diode) or other materials.
The camera component 1406 is used to capture images or video. Optionally, camera assembly 1406 includes a front camera and a rear camera. Typically, the front camera is provided on the front panel of the terminal device 1400 and the rear camera is provided on the rear of the terminal device 1400. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so as to fuse the main camera and the depth-of-field camera to realize a background blurring function, or fuse the main camera and the wide-angle camera to realize panoramic shooting, VR (Virtual Reality) shooting, or other fusion shooting functions. In some embodiments, camera assembly 1406 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash refers to a combination of a warm-light flash and a cold-light flash, and can be used for light compensation at different color temperatures.
The audio circuitry 1407 may include a microphone and a speaker. The microphone is used for collecting sound waves of users and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 1401 for processing, or inputting the electric signals to the radio frequency circuit 1404 for voice communication. For purposes of stereo acquisition or noise reduction, a plurality of microphones may be respectively disposed at different portions of the terminal device 1400. The microphone may also be an array microphone or an omni-directional pickup microphone. The speaker is used to convert electrical signals from the processor 1401 or the radio frequency circuit 1404 into sound waves. The speaker may be a conventional thin film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, not only the electric signal can be converted into a sound wave audible to humans, but also the electric signal can be converted into a sound wave inaudible to humans for ranging and other purposes. In some embodiments, audio circuitry 1407 may also include a headphone jack.
A power supply 1408 is used to power the various components in terminal device 1400. The power supply 1408 may be alternating current, direct current, disposable battery, or rechargeable battery. When the power supply 1408 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, terminal device 1400 also includes one or more sensors 1409. The one or more sensors 1409 include, but are not limited to: acceleration sensor 1410, gyroscope sensor 1411, pressure sensor 1412, optical sensor 1413, and proximity sensor 1414.
The acceleration sensor 1410 may detect the magnitudes of accelerations on three coordinate axes of the coordinate system established with the terminal apparatus 1400. For example, the acceleration sensor 1410 may be used to detect components of gravitational acceleration in three coordinate axes. The processor 1401 may control the display screen 1405 to display a user interface in a landscape view or a portrait view according to the gravitational acceleration signal acquired by the acceleration sensor 1410. Acceleration sensor 1410 may also be used for the acquisition of motion data of a game or user.
The gyro sensor 1411 may detect a body direction and a rotation angle of the terminal device 1400, and the gyro sensor 1411 may collect a 3D motion of the user to the terminal device 1400 in cooperation with the acceleration sensor 1410. The processor 1401 can realize the following functions according to the data collected by the gyro sensor 1411: motion sensing (e.g., changing UI according to a tilting operation by a user), image stabilization at shooting, game control, and inertial navigation.
The pressure sensor 1412 may be disposed at a side frame of the terminal device 1400 and/or at an underlying layer of the display 1405. When the pressure sensor 1412 is provided at a side frame of the terminal device 1400, a grip signal of the user to the terminal device 1400 may be detected, and the processor 1401 performs a left-right hand recognition or a quick operation according to the grip signal collected by the pressure sensor 1412. When the pressure sensor 1412 is disposed at the lower layer of the display screen 1405, the processor 1401 realizes control of the operability control on the UI interface according to the pressure operation of the user on the display screen 1405. The operability controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The optical sensor 1413 is used to collect the ambient light intensity. In one embodiment, processor 1401 may control the display brightness of display screen 1405 based on the intensity of ambient light collected by optical sensor 1413. Specifically, when the intensity of the ambient light is high, the display luminance of the display screen 1405 is turned high; when the ambient light intensity is low, the display luminance of the display screen 1405 is turned down. In another embodiment, the processor 1401 may also dynamically adjust the shooting parameters of the camera assembly 1406 based on the ambient light intensity collected by the optical sensor 1413.
A proximity sensor 1414, also referred to as a distance sensor, is typically provided on the front panel of the terminal device 1400. The proximity sensor 1414 is used to collect the distance between the user and the front of the terminal device 1400. In one embodiment, when the proximity sensor 1414 detects that the distance between the user and the front face of the terminal device 1400 gradually decreases, the processor 1401 controls the display 1405 to switch from the bright screen state to the off screen state; when the proximity sensor 1414 detects that the distance between the user and the front surface of the terminal apparatus 1400 gradually increases, the display screen 1405 is controlled by the processor 1401 to switch from the off-screen state to the on-screen state.
It will be appreciated by those skilled in the art that the structure shown in fig. 14 is not limiting and that terminal device 1400 may include more or less components than those shown, or may combine certain components, or employ a different arrangement of components.
Fig. 15 is a schematic structural diagram of a server according to an embodiment of the present application, where the server 1500 may include one or more processors (Central Processing Units, CPU) 1501 and one or more memories 1502, where the one or more memories 1502 store at least one program code, and the at least one program code is loaded and executed by the one or more processors 1501 to implement the method for obtaining the keypoint detection model provided in the method embodiment shown in fig. 2 and/or the method for detecting keypoints provided in the method embodiment shown in fig. 8. Of course, the server 1500 may also have a wired or wireless network interface, a keyboard, an input/output interface, etc. for performing input/output, and the server 1500 may also include other components for implementing device functions, which are not described herein.
In an exemplary embodiment, there is also provided a computer-readable storage medium having at least one program code stored therein, the at least one program code being loaded and executed by a processor to cause a computer to implement the method for acquiring a keypoint detection model provided by the method embodiment shown in fig. 2 and/or the method for detecting a keypoint provided by the method embodiment shown in fig. 8.
Alternatively, the above-mentioned computer readable storage medium may be a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a Read-Only optical disk (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program or a computer program product is also provided, where at least one computer instruction is stored, where the at least one computer instruction is loaded and executed by a processor, to cause the computer to implement the method for obtaining the keypoint detection model provided by the method embodiment shown in fig. 2 and/or the method for detecting keypoints provided by the method embodiment shown in fig. 8.
It should be noted that, information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data for analysis, stored data, presented data, etc.), and signals referred to in this application are all authorized by the user or are fully authorized by the parties, and the collection, use, and processing of relevant data is required to comply with relevant laws and regulations and standards of relevant countries and regions. For example, reference herein to a sample image, a target image, etc. is taken with sufficient authorization.
It should be understood that references herein to "a plurality" are to two or more. "and/or", describes an association relationship of an association object, and indicates that there may be three relationships, for example, a and/or B, and may indicate: a exists alone, A and B exist together, and B exists alone. The character "/" generally indicates that the context-dependent object is an "or" relationship.
The foregoing merely describes exemplary embodiments of the present application and is not intended to limit the present application. Any modification, equivalent replacement, or improvement made within the principles of the present application shall fall within the protection scope of the present application.

Claims (10)

1. A method for obtaining a keypoint detection model, the method comprising:
acquiring a training data set and an initial keypoint detection model, wherein the training data set comprises a sample image and standard position information corresponding to sample keypoints, the sample keypoints are keypoints of a hand included in the sample image, and the initial keypoint detection model comprises a convolution layer and an upsampling layer; the acquiring of the training data set comprises: acquiring the sample image; inputting the sample image into a large-scale model, and taking the result output by the large-scale model as candidate position information corresponding to the sample keypoints; manually adjusting the candidate position information to obtain the standard position information corresponding to the sample keypoints; and acquiring the training data set according to the sample image and the standard position information corresponding to the sample keypoints; wherein the large-scale model is a High-Resolution Network (HRNet) model or an hourglass model;
invoking the initial keypoint detection model to process the sample image to obtain a sample heatmap corresponding to the sample keypoints;
acquiring a first heatmap and a second heatmap, wherein the first heatmap is a heatmap corresponding to a first dimension, the second heatmap is a heatmap corresponding to a second dimension, and the sample heatmap, the first heatmap, and the second heatmap contain the same number of values;
multiplying the values at the same position in the sample heatmap and the first heatmap to obtain third values for all positions, and adding the third values for all positions to obtain a first value, wherein the first value is the value of the sample keypoint in the first dimension; multiplying the values at the same position in the sample heatmap and the second heatmap to obtain fourth values for all positions, and adding the fourth values for all positions to obtain a second value, wherein the second value is the value of the sample keypoint in the second dimension; and taking the position information formed by the first value and the second value as sample position information corresponding to the sample keypoint;
taking the Euclidean distance between the standard position information and the sample position information as a first loss value; taking the mean square error between the sample heatmap and a third heatmap as a second loss value; and taking the sum of the first loss value and the product of the second loss value and a hyperparameter as a reference loss value; wherein the third heatmap is a heatmap corresponding to the standard position information, and the reference loss value is used to indicate the detection accuracy of the initial keypoint detection model; the reference loss value is determined by the following formula:
L = L_euc(softargmax(Z′), gt) + λL_reg(Z′);
wherein L is the reference loss value, softargmax(Z′) is the sample position information, gt is the standard position information, L_euc(softargmax(Z′), gt) is the first loss value, λL_reg(Z′) is a regularization term, Z′ is the sample heatmap, L_reg(Z′) is the second loss value, and λ is the hyperparameter;
and updating the initial keypoint detection model based on the reference loss value being greater than a loss threshold, to obtain a target keypoint detection model, wherein the target keypoint detection model is used to detect a target image so as to determine target position information corresponding to keypoints of a hand included in the target image.
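The soft-argmax decoding and compound loss recited in claim 1 can be sketched in NumPy as follows. This is a minimal illustration, not the patented implementation: the softmax normalization of the heatmap and the use of integer coordinate grids as the "first" and "second" heatmaps are assumptions, since the claim only specifies element-wise multiplication followed by summation.

```python
import numpy as np

def soft_argmax_2d(heatmap):
    """Decode sub-pixel keypoint coordinates from a heatmap via soft-argmax.

    The heatmap is normalized with softmax (an assumption), then multiplied
    element-wise by fixed x- and y-coordinate grids (standing in for the
    claim's first and second heatmaps) and summed, giving an expected (x, y).
    """
    h, w = heatmap.shape
    # Softmax normalization so the heatmap values sum to 1.
    z = np.exp(heatmap - heatmap.max())
    z /= z.sum()
    # One coordinate grid per output dimension.
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    x = (z * xs).sum()  # first value: expected column index
    y = (z * ys).sum()  # second value: expected row index
    return np.array([x, y])

def reference_loss(pred_heatmap, gt_heatmap, gt_xy, lam=0.1):
    """L = L_euc(softargmax(Z'), gt) + lambda * L_reg(Z')."""
    pred_xy = soft_argmax_2d(pred_heatmap)
    l_euc = np.linalg.norm(pred_xy - gt_xy)            # first loss value
    l_reg = np.mean((pred_heatmap - gt_heatmap) ** 2)  # second loss value (MSE)
    return l_euc + lam * l_reg
```

With a sharply peaked heatmap the decoded position collapses to the peak location, and the loss vanishes when prediction and ground truth coincide; the value of `lam` here is an arbitrary placeholder for the claim's hyperparameter.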
2. The method of claim 1, wherein the acquiring of the training data set comprises:
acquiring the sample image;
recognizing the sample image to obtain candidate position information corresponding to the sample keypoints;
adjusting the candidate position information to obtain the standard position information corresponding to the sample keypoints;
and acquiring the training data set according to the sample image and the standard position information corresponding to the sample keypoints.
3. The method of claim 1, wherein the acquiring of the training data set comprises:
acquiring shape parameters and pose parameters corresponding to the hand;
generating a target part model according to the shape parameters and the pose parameters, wherein the target part model comprises the standard position information corresponding to the sample keypoints;
attaching a texture map to the target part model to obtain a textured target part model;
projecting the textured target part model onto a background image to obtain the sample image;
and acquiring the training data set according to the sample image and the standard position information corresponding to the sample keypoints.
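The projection step of claim 3 amounts to compositing a rendered, textured part model into a background image while carrying the known keypoint positions along. The sketch below abstracts the rendering away: `hand_patch`, `composite_sample`, and the (row, column) paste convention are illustrative assumptions, not elements of the claimed method.

```python
import numpy as np

def composite_sample(hand_patch, hand_keypoints, background, top_left):
    """Paste a rendered part patch onto a background to form a sample image.

    `hand_patch` stands in for the textured target part model rendering;
    `hand_keypoints` are standard positions in patch (x, y) coordinates.
    Pasting at `top_left` = (row, col) yields the sample image plus the
    keypoints shifted into image coordinates, so labels stay consistent.
    """
    sample = background.copy()
    y0, x0 = top_left
    h, w = hand_patch.shape[:2]
    sample[y0:y0 + h, x0:x0 + w] = hand_patch
    # Shift keypoints by the paste offset; keypoints use (x, y) ordering.
    shifted = hand_keypoints + np.array([x0, y0], dtype=float)
    return sample, shifted
```

A real pipeline would render the parametric model (e.g., from shape and pose coefficients) and blend with anti-aliasing; the hard paste here only shows how ground-truth positions survive the projection.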
4. The method according to any one of claims 1 to 3, wherein the updating of the initial keypoint detection model based on the reference loss value being greater than a loss threshold, to obtain a target keypoint detection model, comprises:
updating the initial keypoint detection model based on the reference loss value being greater than the loss threshold, to obtain an intermediate keypoint detection model;
invoking the intermediate keypoint detection model to process the sample image to obtain an intermediate heatmap corresponding to the sample keypoints;
determining intermediate position information corresponding to the sample keypoints according to the intermediate heatmap;
determining a candidate loss value according to the standard position information, the intermediate position information, and the intermediate heatmap;
and taking the intermediate keypoint detection model as the target keypoint detection model based on the candidate loss value not being greater than the loss threshold.
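The update loop of claim 4 is a threshold-gated iteration: keep updating intermediate models until the candidate loss no longer exceeds the threshold. A generic sketch, with `loss_fn`, `update_fn`, and the `max_steps` safety cap as assumptions supplied by the caller rather than elements of the claim:

```python
def train_until_threshold(model, sample, gt, loss_fn, update_fn,
                          threshold, max_steps=1000):
    """Return a model whose loss does not exceed `threshold`.

    Mirrors claim 4: while the (candidate) loss value is greater than the
    loss threshold, update the model; once it is not greater, the current
    intermediate model becomes the target keypoint detection model.
    """
    for _ in range(max_steps):
        loss = loss_fn(model, sample, gt)
        if loss <= threshold:
            return model  # target keypoint detection model
        model = update_fn(model, sample, gt)
    return model  # cap reached; best-effort model (cap is an assumption)
```

Any gradient step can play the role of `update_fn`; the test below uses a toy scalar model that halves its error each step.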
5. A keypoint detection method, the method comprising:
acquiring a target image and a target keypoint detection model, wherein the target image includes a hand, the target keypoint detection model is obtained by the method for obtaining a keypoint detection model according to any one of claims 1 to 4, and the target keypoint detection model comprises a convolution layer and an upsampling layer;
invoking the target keypoint detection model to process the target image to obtain a target heatmap corresponding to a target keypoint of the hand;
acquiring a first heatmap and a second heatmap, wherein the first heatmap is a heatmap corresponding to a first dimension, the second heatmap is a heatmap corresponding to a second dimension, and the target heatmap, the first heatmap, and the second heatmap contain the same number of values;
multiplying the values at the same position in the target heatmap and the first heatmap and adding the products to obtain the value of the target keypoint in the first dimension; multiplying the values at the same position in the target heatmap and the second heatmap and adding the products to obtain the value of the target keypoint in the second dimension; and taking the position information formed by the value of the target keypoint in the first dimension and the value of the target keypoint in the second dimension as first position information corresponding to the target keypoint;
taking the first position information as target position information corresponding to the target keypoint; or invoking the target keypoint detection model to process a reference image to obtain reference position information of the target keypoint of the hand included in the reference image; determining an optical flow compensation value between the reference image and the target image; determining, according to the reference position information and the first position information, the distance of the target keypoint between the reference image and the target image and a distance weight parameter corresponding to the distance; and determining the target position information corresponding to the target keypoint according to the reference position information, the first position information, the optical flow compensation value, the distance, and the distance weight parameter;
wherein the acquisition time of the reference image is adjacent to the acquisition time of the target image, the optical flow compensation value is used to indicate the speed of motion from the reference image to the target image, and the distance weight parameter is directly proportional to the distance.
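One plausible instantiation of the temporal fusion in claim 5: the flow-compensated reference position predicts where the keypoint should be, and a weight proportional to the inter-frame distance decides how much to trust the fresh detection versus that prediction. The linear blend and the `scale` parameter are assumptions; the claim fixes only the inputs (reference position, first position, flow compensation, distance, distance weight).

```python
import numpy as np

def fuse_with_flow(ref_xy, cur_xy, flow_xy, scale=0.1):
    """Blend the current detection with the flow-compensated reference.

    The distance weight is proportional to the distance (as in the claim),
    so large motions favor the current detection and small motions favor
    the temporally smoothed prediction.
    """
    predicted = ref_xy + flow_xy            # reference position + optical flow
    dist = np.linalg.norm(cur_xy - ref_xy)  # keypoint distance between frames
    w = min(1.0, scale * dist)              # weight proportional to distance
    return w * cur_xy + (1.0 - w) * predicted
```

With a small motion the output hugs the flow prediction (reducing jitter), while a large motion passes the current detection through almost unchanged.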
6. The method of claim 5, wherein the method further comprises:
acquiring the reference image, wherein the reference image includes the hand;
and the invoking of the target keypoint detection model to process the reference image to obtain the reference position information of the target keypoint of the hand included in the reference image comprises:
invoking the target keypoint detection model to process the reference image to obtain a reference heatmap corresponding to the target keypoint of the hand;
and determining the reference position information corresponding to the target keypoint according to the reference heatmap.
7. An apparatus for obtaining a keypoint detection model, the apparatus comprising:
an acquisition module, configured to acquire a training data set and an initial keypoint detection model, wherein the training data set comprises a sample image and standard position information corresponding to sample keypoints, the sample keypoints are keypoints of a hand included in the sample image, and the initial keypoint detection model comprises a convolution layer and an upsampling layer; the acquiring of the training data set comprises: acquiring the sample image; inputting the sample image into a large-scale model, and taking the result output by the large-scale model as candidate position information corresponding to the sample keypoints; manually adjusting the candidate position information to obtain the standard position information corresponding to the sample keypoints; and acquiring the training data set according to the sample image and the standard position information corresponding to the sample keypoints; wherein the large-scale model is a High-Resolution Network (HRNet) model or an hourglass model;
a processing module, configured to invoke the initial keypoint detection model to process the sample image to obtain a sample heatmap corresponding to the sample keypoints;
a determining module, configured to acquire a first heatmap and a second heatmap, wherein the first heatmap is a heatmap corresponding to a first dimension, the second heatmap is a heatmap corresponding to a second dimension, and the sample heatmap, the first heatmap, and the second heatmap contain the same number of values; multiply the values at the same position in the sample heatmap and the first heatmap to obtain third values for all positions, and add the third values for all positions to obtain a first value, the first value being the value of the sample keypoint in the first dimension; multiply the values at the same position in the sample heatmap and the second heatmap to obtain fourth values for all positions, and add the fourth values for all positions to obtain a second value, the second value being the value of the sample keypoint in the second dimension; and take the position information formed by the first value and the second value as sample position information corresponding to the sample keypoint;
wherein the determining module is further configured to take the Euclidean distance between the standard position information and the sample position information as a first loss value; take the mean square error between the sample heatmap and a third heatmap as a second loss value; and take the sum of the first loss value and the product of the second loss value and a hyperparameter as a reference loss value; wherein the third heatmap is a heatmap corresponding to the standard position information, and the reference loss value is used to indicate the detection accuracy of the initial keypoint detection model; the reference loss value is determined by the following formula:
L = L_euc(softargmax(Z′), gt) + λL_reg(Z′);
wherein L is the reference loss value, softargmax(Z′) is the sample position information, gt is the standard position information, L_euc(softargmax(Z′), gt) is the first loss value, λL_reg(Z′) is a regularization term, Z′ is the sample heatmap, L_reg(Z′) is the second loss value, and λ is the hyperparameter;
and an updating module, configured to update the initial keypoint detection model based on the reference loss value being greater than a loss threshold, to obtain a target keypoint detection model, wherein the target keypoint detection model is used to detect a target image so as to determine target position information corresponding to keypoints of a hand included in the target image.
8. A keypoint detection apparatus, the apparatus comprising:
an acquisition module, configured to acquire a target image and a target keypoint detection model, wherein the target image includes a hand, the target keypoint detection model is obtained by the apparatus for obtaining a keypoint detection model according to claim 7, and the target keypoint detection model comprises a convolution layer and an upsampling layer;
a processing module, configured to invoke the target keypoint detection model to process the target image to obtain a target heatmap corresponding to a target keypoint of the hand;
a determining module, configured to acquire a first heatmap and a second heatmap, wherein the first heatmap is a heatmap corresponding to a first dimension, the second heatmap is a heatmap corresponding to a second dimension, and the target heatmap, the first heatmap, and the second heatmap contain the same number of values; multiply the values at the same position in the target heatmap and the first heatmap and add the products to obtain the value of the target keypoint in the first dimension; multiply the values at the same position in the target heatmap and the second heatmap and add the products to obtain the value of the target keypoint in the second dimension; and take the position information formed by the value of the target keypoint in the first dimension and the value of the target keypoint in the second dimension as first position information corresponding to the target keypoint;
wherein the determining module is further configured to take the first position information as target position information corresponding to the target keypoint; or invoke the target keypoint detection model to process a reference image to obtain reference position information of the target keypoint of the hand included in the reference image; determine an optical flow compensation value between the reference image and the target image; determine, according to the reference position information and the first position information, the distance of the target keypoint between the reference image and the target image and a distance weight parameter corresponding to the distance; and determine the target position information corresponding to the target keypoint according to the reference position information, the first position information, the optical flow compensation value, the distance, and the distance weight parameter;
wherein the acquisition time of the reference image is adjacent to the acquisition time of the target image, the optical flow compensation value is used to indicate the speed of motion from the reference image to the target image, and the distance weight parameter is directly proportional to the distance.
9. A computer device, comprising a processor and a memory, wherein at least one program code is stored in the memory, and the at least one program code is loaded and executed by the processor to cause the computer device to implement the method for obtaining a keypoint detection model according to any one of claims 1 to 4, or to cause the computer device to implement the keypoint detection method according to claim 5 or 6.
10. A computer-readable storage medium, wherein at least one program code is stored in the computer-readable storage medium, and the at least one program code is loaded and executed by a processor to cause a computer to implement the method for obtaining a keypoint detection model according to any one of claims 1 to 4, or to cause the computer to implement the keypoint detection method according to claim 5 or 6.
CN202211021088.6A 2022-08-24 2022-08-24 Method, device and equipment for acquiring key point detection model and detecting key point Active CN115100691B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211021088.6A CN115100691B (en) 2022-08-24 2022-08-24 Method, device and equipment for acquiring key point detection model and detecting key point

Publications (2)

Publication Number Publication Date
CN115100691A CN115100691A (en) 2022-09-23
CN115100691B true CN115100691B (en) 2023-08-08

Family

ID=83301549

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211021088.6A Active CN115100691B (en) 2022-08-24 2022-08-24 Method, device and equipment for acquiring key point detection model and detecting key point

Country Status (1)

Country Link
CN (1) CN115100691B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110532984A (en) * 2019-09-02 2019-12-03 北京旷视科技有限公司 Critical point detection method, gesture identification method, apparatus and system
CN113436226A (en) * 2020-03-23 2021-09-24 北京沃东天骏信息技术有限公司 Method and device for detecting key points
CN113705297A (en) * 2021-03-11 2021-11-26 腾讯科技(深圳)有限公司 Training method and device for detection model, computer equipment and storage medium
CN114519729A (en) * 2020-11-20 2022-05-20 腾讯科技(深圳)有限公司 Image registration quality evaluation model training method and device and computer equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210350620A1 (en) * 2020-05-07 2021-11-11 Imperial College Innovations Limited Generative geometric neural networks for 3d shape modelling
US11514658B2 (en) * 2021-01-20 2022-11-29 Qualcomm Incorporated Enhancing three-dimensional models using multi-view refinement

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张广翩; 计忠平. Three-dimensional human body modeling method based on two-dimensional point cloud images. Computer Engineering and Applications, 2019, (19): 210-220. *

Similar Documents

Publication Publication Date Title
CN109712224B (en) Virtual scene rendering method and device and intelligent device
CN110222551B (en) Method and device for identifying action type, electronic equipment and storage medium
WO2020224479A1 (en) Method and apparatus for acquiring positions of target, and computer device and storage medium
CN110544272B (en) Face tracking method, device, computer equipment and storage medium
CN110147533B (en) Encoding method, apparatus, device and storage medium
CN112581358B (en) Training method of image processing model, image processing method and device
CN110796248A (en) Data enhancement method, device, equipment and storage medium
CN110837858B (en) Network model training method, device, computer equipment and storage medium
CN110920631B (en) Method and device for controlling vehicle, electronic equipment and readable storage medium
CN113706678A (en) Method, device and equipment for acquiring virtual image and computer readable storage medium
CN110705614A (en) Model training method and device, electronic equipment and storage medium
CN111768507B (en) Image fusion method, device, computer equipment and storage medium
CN111639639B (en) Method, device, equipment and storage medium for detecting text area
CN110990549A (en) Method and device for obtaining answers, electronic equipment and storage medium
CN115100691B (en) Method, device and equipment for acquiring key point detection model and detecting key point
CN110853704B (en) Protein data acquisition method, protein data acquisition device, computer equipment and storage medium
CN111179628B (en) Positioning method and device for automatic driving vehicle, electronic equipment and storage medium
CN113298040A (en) Key point detection method and device, electronic equipment and computer-readable storage medium
CN116681755B (en) Pose prediction method and device
CN113160031B (en) Image processing method, device, electronic equipment and storage medium
CN111369566B (en) Method, device, equipment and storage medium for determining position of pavement blanking point
CN113409235B (en) Vanishing point estimation method and apparatus
CN116863063A (en) Error determination method and device for three-dimensional model, electronic equipment and storage medium
CN111381765B (en) Text box display method and device, computer equipment and storage medium
CN111723348B (en) Man-machine identification method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (ref country code: HK; ref legal event code: DE; ref document number: 40073930)
GR01 Patent grant