CN117710450A - Method, device, equipment and storage medium for positioning key point

Publication number: CN117710450A
Authority: CN (China)
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Application number: CN202211037136.0A
Other languages: Chinese (zh)
Inventor: 郑凯
Current and original assignee: Beijing Zitiao Network Technology Co Ltd (application filed by Beijing Zitiao Network Technology Co Ltd)
Priority: CN202211037136.0A
Classification: Image Analysis (AREA)
Abstract

The present application provides a method, device, equipment and storage medium for positioning key points. The method comprises: determining the predicted key point coordinates of a current frame and the corrected actual key point coordinates of the frames preceding the current frame; and inputting the predicted key point coordinates and the actual key point coordinates into a trained time sequence correction model to obtain the corrected actual key point coordinates of the current frame. According to the technical solution provided by the embodiments of the present application, after the predicted key point coordinates of the current frame are obtained, the corrected actual key point coordinates of the frames preceding the current frame are obtained, and the two are input together into a trained time sequence correction model to obtain the corrected actual key point coordinates of the current frame. This realizes positioning correction of the key point coordinates in the current frame, ensures the positioning accuracy of the key points, avoids jitter of the same key point across consecutive frames, and improves the stability of the key points in consecutive frames.

Description

Method, device, equipment and storage medium for positioning key point
Technical Field
Embodiments of the present application relate to the technical field of data processing, and in particular to a method, device, equipment and storage medium for positioning key points.
Background
With the rapid development of image processing technology, for a sequence of consecutive frame images, the coordinates of the key points of a target object are usually predicted frame by frame in order to analyze the actual motion of the target object.
At present, a key point prediction model is usually trained in advance, and each frame image is input into the trained model in sequence to obtain the key point coordinates in that frame. However, because the target object is in motion, small pixel-value differences exist for the target object across consecutive frames, so the key point coordinates predicted by the model exhibit a certain jitter over those frames, and the stability of the key points across consecutive frames cannot be guaranteed.
Disclosure of Invention
Embodiments of the present application provide a key point positioning method, device, equipment and storage medium, which ensure the accuracy of key point positioning, avoid jitter of the same key point across consecutive frames, and improve the stability of the key points in consecutive frames.
In a first aspect, an embodiment of the present application provides a method for positioning a key point, including:
determining the predicted key point coordinates of a current frame and the corrected actual key point coordinates of a previous frame of the current frame;
and inputting the predicted key point coordinates and the actual key point coordinates into a trained time sequence correction model to obtain the corrected actual key point coordinates of the current frame.
In a second aspect, embodiments of the present application provide a key point positioning device, including:
the key point coordinate determining module is used for determining the predicted key point coordinate of the current frame and the actual key point coordinate corrected by the previous frame of the current frame;
and the key point correction module is used for inputting the predicted key point coordinates and the actual key point coordinates into a trained time sequence correction model to obtain the corrected actual key point coordinates of the current frame.
In a third aspect, an embodiment of the present application provides an electronic device, including:
the system comprises a processor and a memory, wherein the memory is used for storing a computer program, and the processor is used for calling and running the computer program stored in the memory to execute the key point positioning method provided in the first aspect of the application.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program that causes a computer to perform the method of locating a key point as provided in the first aspect of the present application.
In a fifth aspect, embodiments of the present application provide a computer program product comprising a computer program/instruction, characterized in that the computer program/instruction, when executed by a processor, implements the method of positioning a key point as provided in the first aspect of the present application.
According to the key point positioning method, device, equipment and storage medium described above, after the predicted key point coordinates of the current frame are obtained, the corrected actual key point coordinates of the frames preceding the current frame are obtained. The predicted key point coordinates and the actual key point coordinates are then input together into a trained time sequence correction model to obtain the corrected actual key point coordinates of the current frame. This realizes positioning correction of the key point coordinates in the current frame, ensures the positioning accuracy of the key points, avoids jitter of the same key point across consecutive frames, and improves the stability of the key points in consecutive frames.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a method for locating a key point according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a key point location process according to an embodiment of the present application;
FIG. 3 is a flow chart of another method of keypoint locating shown in an embodiment of the present application;
FIG. 4 is a functional block diagram of a key point positioning device according to an embodiment of the present application;
fig. 5 is a schematic block diagram of an electronic device shown in an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application is made clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are only some, but not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art without inventive effort, based on the embodiments herein, fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of the present application and in the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate, such that the embodiments of the present application described herein may be implemented in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
When the key point coordinates in each frame are predicted in turn by a trained key point prediction model, small pixel-value differences caused by the motion of the target object across consecutive frames introduce a certain jitter into the predicted key point coordinates. Therefore, the embodiments of the present application design a scheme for further correcting the key point coordinates that the key point prediction model outputs for each frame: after the predicted key point coordinates of the current frame are obtained, the corrected actual key point coordinates of the frames preceding the current frame and the predicted key point coordinates of the current frame are input together into a trained time sequence correction model, yielding the corrected actual key point coordinates of the current frame. This realizes positioning correction of the key point coordinates in the current frame, ensures the positioning accuracy of the key points, avoids jitter of the same key point across consecutive frames, and improves the stability of the key points in consecutive frames.
Fig. 1 is a flowchart of a method for positioning a key point according to an embodiment of the present application. The method may be performed by the key point positioning device provided herein, where the device may be implemented in any software and/or hardware manner. The key point positioning device may be applied to any electronic device, including but not limited to tablet computers, mobile phones (e.g., folding-screen phones, large-screen phones), wearable devices, vehicle-mounted devices, augmented reality (AR)/virtual reality (VR) devices, notebook computers, ultra-mobile personal computers (UMPC), netbooks, personal digital assistants (PDA), smart televisions, smart screens, high-definition televisions, 4K televisions, smart speakers, smart projectors, etc.; the specific type of electronic device is not limited in this application.
Specifically, as shown in fig. 1, the method may include the following steps:
s110, determining the predicted key point coordinates of the current frame and the actual key point coordinates corrected by the previous frame of the current frame.
When key point prediction is performed sequentially on consecutive frames, the same key point may exhibit a certain jitter across those frames because of the small pixel-value differences introduced as the target in the frames moves. Therefore, after obtaining the predicted key point coordinates of each frame in the video to be processed, the present application further corrects them to ensure the accuracy of the key point coordinates in each frame.
Accordingly, after the key point coordinates of each frame in the video to be processed are predicted in sequence, each frame can in turn be treated as the current frame in this application, and the predicted key point coordinates of the current frame are obtained according to the actual prediction result for that frame.
As an optional implementation, for the predicted key point coordinates of the current frame, a large number of sample images may be acquired, and each key point position in each sample image marked as the corresponding sample label. A key point prediction model is then trained in advance using the sample images and their labels; this model predicts the key point coordinates in any image frame.
As shown in Fig. 2, after the key point prediction model is trained, the current frame may be input into it to obtain the corresponding predicted key point coordinates. That is, feature analysis is performed on the current frame by the trained network parameters of the key point prediction model, yielding the predicted key point coordinates of the current frame.
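As a concrete illustration of this inference step, the following Python sketch uses a hypothetical stand-in for the prediction network (the patent does not disclose its architecture); only the input/output shapes reflect the later hand key point example:

```python
import numpy as np

NUM_KEYPOINTS = 21  # e.g. hand key points, as in the later example
COORD_DIMS = 3      # (x, y, z) in a world coordinate system

def predict_keypoints(frame: np.ndarray) -> np.ndarray:
    """Stand-in for the trained key point prediction model: maps one
    image frame to a (NUM_KEYPOINTS, COORD_DIMS) coordinate matrix.
    The real model's architecture is not specified by the patent."""
    rng = np.random.default_rng(0)  # placeholder output, not a real prediction
    return rng.standard_normal((NUM_KEYPOINTS, COORD_DIMS))

frame = np.zeros((128, 128, 3), dtype=np.uint8)  # dummy current frame
pred = predict_keypoints(frame)
assert pred.shape == (21, 3)
```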
To avoid jitter of the same key point across consecutive frames, the present application judges the overall motion trend of the same target object over consecutive frames with the help of the key point information in each history frame before the current frame, so as to derive the actual key point coordinates in the current frame that conform to this overall motion trend.
In addition, since the predicted key point coordinates of each frame in the video to be processed are also corrected in sequence, by the time the predicted key point coordinates of the current frame are corrected, the correction of the predicted key point coordinates of every history frame before the current frame has already been completed.
Therefore, each history frame before the current frame in the video to be processed can be used as a preceding frame of the current frame, and the corrected actual key point coordinates of each preceding frame are determined according to the actual correction result for that frame.
It should be noted that, to avoid jitter of the same key point across consecutive frames, the present application needs to analyze the motion of the target object over consecutive frames. Thus, the preceding frames may be a preset number of consecutive history frames adjacent to the current frame. For example, if the current frame is the 45th frame in the video to be processed, the preceding frames may be all frames from the 23rd frame to the 44th frame.
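The selection of a preset number of adjacent history frames can be sketched as follows (a hypothetical helper, not from the patent; the window length is the tunable hyperparameter discussed later, and indices here are 0-based):

```python
def preceding_frame_indices(current_idx: int, window: int) -> list:
    """Return the indices of up to `window` consecutive history frames
    immediately before frame `current_idx` (0-based), clipped at the
    start of the video so early frames simply get a shorter history."""
    start = max(0, current_idx - window)
    return list(range(start, current_idx))

# 32 frames immediately before frame 44
assert preceding_frame_indices(44, 32) == list(range(12, 44))
# near the start of the video the window is clipped
assert preceding_frame_indices(5, 32) == [0, 1, 2, 3, 4]
```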
S120, inputting the predicted key point coordinates and the actual key point coordinates into a trained time sequence correction model to obtain the corrected actual key point coordinates of the current frame.
To accurately correct the predicted key point coordinates of the current frame, a time sequence correction model may be trained in advance. The time sequence correction model may be a lightweight network formed by stacking a number of convolution layers (conv), batch normalization layers (batch norm) and activation layers (ReLU), and is used to analyze the motion trend of each key point across frames.
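The patent does not give the layer shapes of this lightweight network. The following NumPy sketch illustrates a single conv + batch norm + ReLU block operating along the time axis of a (frames × key points × dims) coordinate matrix, with one temporal kernel shared across all key points as a deliberate simplification:

```python
import numpy as np

def conv_bn_relu(x: np.ndarray, w: np.ndarray, b: float,
                 eps: float = 1e-5) -> np.ndarray:
    """One conv + batch-norm + ReLU block along the time axis.

    x: (T, K, D) -- T frames, K key points, D coordinate dims.
    w: (k,)      -- a single temporal convolution kernel, shared across
                    key points and dims (a simplification; the patent
                    does not specify the real layer shapes).
    """
    T, k = x.shape[0], w.shape[0]
    # temporal convolution, valid padding: contract kernel with k frames
    y = np.stack([np.tensordot(w, x[t:t + k], axes=(0, 0)) + b
                  for t in range(T - k + 1)])
    # normalization over the temporal dimension (batch-norm stand-in)
    y = (y - y.mean(axis=0)) / np.sqrt(y.var(axis=0) + eps)
    return np.maximum(y, 0.0)  # ReLU

x = np.random.default_rng(0).standard_normal((33, 21, 3))
y = conv_bn_relu(x, np.array([0.25, 0.5, 0.25]), 0.0)
assert y.shape == (31, 21, 3)  # 33 - 3 + 1 valid temporal positions
```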
To train the time sequence correction model, the predicted key point coordinates of each frame in a large number of sample videos may be acquired, and the real key point coordinates in each frame labelled as the corresponding label coordinates. The model is then iteratively trained using, for each frame in turn, its predicted key point coordinates and label coordinates together with the real key point coordinates of the history frames before it, until the actual key point coordinates that the model outputs for consecutive frames are consistent with the label coordinates of those frames; at that point the trained time sequence correction model is obtained.
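The supervision signal described above (predicted history in, labelled current coordinates out) can be illustrated with a toy linear stand-in fitted by least squares. This is not the conv/batch-norm/ReLU model itself, only the shape of the training objective, and all data here is synthetic:

```python
import numpy as np

rng = np.random.default_rng(1)
W, K, D, N = 4, 21, 3, 200           # window, key points, dims, samples

# hidden temporal smoothing weights that the toy "training" must recover
true_w = np.array([0.1, 0.2, 0.3, 0.4])
windows = rng.standard_normal((N, W, K * D))      # predicted-history windows
labels = np.einsum('w,nwf->nf', true_w, windows)  # synthetic label coords

# least-squares fit of per-offset temporal weights: each (sample, key
# point, dim) triple contributes one linear equation in the W weights
A = windows.transpose(0, 2, 1).reshape(-1, W)     # (N*K*D, W)
y = labels.reshape(-1)
w_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
assert np.allclose(w_hat, true_w, atol=1e-6)      # exact, noise-free fit
```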
It should be understood that the predicted key point coordinates of each frame in each sample video may be produced by the trained key point prediction model, so as to ensure the preliminary accuracy of those coordinates and facilitate effective training of the subsequent time sequence correction model.
Therefore, after the key point prediction model has been trained, the training samples of the time sequence correction model, i.e. the predicted key point coordinates of each frame in a large number of sample videos, can be obtained from the trained key point prediction model, which ensures normal training of the time sequence correction model. In other words, the time sequence correction model is trained, after the key point prediction model has finished training, on the predicted key point coordinates that the key point prediction model outputs for each sample frame in the sample videos. The two models are trained independently in stages rather than jointly, which greatly simplifies the training of both models while ensuring the accuracy of model training.
The time sequence correction model mainly performs operations on the key point coordinates of consecutive frames and does not additionally need to extract image features from the frames, which reduces the computation required for model training and improves training efficiency.
After the time sequence correction model has been trained, it corrects the predicted key point coordinates of each frame in the video to be processed in sequence; thus the corrected actual key point coordinates of each preceding frame in this application may be the key point coordinates that the time sequence correction model output for that frame.
Specifically, when correcting the predicted key point coordinates of the current frame, as shown in Fig. 2, the predicted key point coordinates of the current frame and the corrected actual key point coordinates of each preceding frame may be input together into the trained time sequence correction model. The trained convolution kernels of the model then analyze the motion trend of the same key point over the consecutive frames composed of the preceding frames and the current frame, so as to determine the corrected actual key point coordinates of the current frame.
By analyzing the motion trend of each key point over consecutive frames through the time sequence correction model, each key point in the consecutive frames composed of the preceding frames and the current frame is kept temporally smooth, avoiding jitter of any key point.
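The overall sequential correction loop can be sketched as follows, with stand-in callables in place of the two trained models (all names here are hypothetical, not from the patent):

```python
import numpy as np

def correct_video(frames, predict, correct, window=32):
    """Sequentially correct each frame's predicted key points using the
    already-corrected coordinates of up to `window` preceding frames."""
    corrected = []
    for i, frame in enumerate(frames):
        pred = predict(frame)                        # (K, D) predicted coords
        history = corrected[max(0, i - window):i]    # corrected preceding frames
        corrected.append(correct(history, pred))
    return corrected

# Stand-in models for illustration only: a constant predictor and a
# correction that averages the prediction with the corrected history.
predict = lambda frame: np.ones((21, 3))
correct = lambda hist, pred: np.mean(hist + [pred], axis=0) if hist else pred

out = correct_video([None] * 5, predict, correct, window=3)
assert len(out) == 5 and out[0].shape == (21, 3)
```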
According to the technical solution provided by the embodiments of the present application, after the predicted key point coordinates of the current frame are obtained, the corrected actual key point coordinates of the frames preceding the current frame are obtained. The predicted key point coordinates and the actual key point coordinates are then input together into a trained time sequence correction model to obtain the corrected actual key point coordinates of the current frame. This realizes positioning correction of the key point coordinates in the current frame, ensures the positioning accuracy of the key points, avoids jitter of the same key point across consecutive frames, and improves the stability of the key points in consecutive frames.
As an optional implementation, to ensure accurate correction of the key point coordinates in the current frame, a specific process of correcting them with the time sequence correction model is described below.
FIG. 3 is a flow chart of another method of locating a key point according to an embodiment of the present application. As shown in fig. 3, the method specifically may include the following steps:
s310, determining the predicted key point coordinates of the current frame and the actual key point coordinates corrected by the previous frame of the current frame.
And S320, combining the predicted key point coordinates and the actual key point coordinates in the time sequence dimension to obtain a corresponding key point coordinate matrix.
After the predicted key point coordinates of the current frame and the corrected actual key point coordinates of each preceding frame are determined, the coordinates of the same key point as they change over time must be analyzed in order to accurately capture the motion trend of the key points in the frames. Therefore, in the time dimension, the corrected actual key point coordinates of each preceding frame and the predicted key point coordinates of the current frame can be combined to describe the temporal change of each key point's coordinates, yielding the corresponding key point coordinate matrix.
S330, inputting the key point coordinate matrix into the trained time sequence correction model to obtain the actual key point coordinates corrected by the current frame.
After the key point coordinate matrix is obtained, it can be input into the trained time sequence correction model to analyze the motion trend of the same key point over the consecutive frames composed of the preceding frames and the current frame, so as to determine the corrected actual key point coordinates of the current frame.
The following takes hand key points as an example to illustrate a specific scheme for correcting the key point coordinates in the current frame with the time sequence correction model:
Typically, 21 key points are set for the palm in each frame, and each key point is represented by three-dimensional coordinates in the world coordinate system.
The current frame is input into the pre-trained key point prediction model to obtain its predicted key point coordinates, which can be expressed as a 1×21×3 coordinate matrix. If the number of preceding frames in this application is 32, their key point coordinate correction has already been completed by the time the current frame is corrected, so the corrected actual key point coordinates of the preceding frames can be obtained and expressed as a 32×21×3 coordinate matrix.
Then, in the time dimension, the 32×21×3 coordinate matrix of the preceding frames and the 1×21×3 coordinate matrix of the current frame may be combined into a 33×21×3 coordinate matrix, which represents the coordinate change of each key point over the 33 consecutive frames composed of the preceding frames and the current frame. This 33×21×3 matrix is input into the trained time sequence correction model, convolution operations are performed by its trained conv/batch norm/ReLU layers, and a 1×21×3 coordinate matrix is output as the corrected actual key point coordinates of the current frame.
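The shape bookkeeping in this example can be checked directly; the model below is a placeholder that only preserves the documented input/output shapes, not the trained network:

```python
import numpy as np

# Hand example: 21 key points, 3D coordinates, 32 corrected preceding
# frames plus 1 predicted current frame.
pred_current = np.zeros((1, 21, 3))   # predicted key points of current frame
actual_prev = np.zeros((32, 21, 3))   # corrected key points of preceding frames

# merge along the time axis: 32 + 1 = 33 consecutive frames
merged = np.concatenate([actual_prev, pred_current], axis=0)
assert merged.shape == (33, 21, 3)

def timing_correction_model(x: np.ndarray) -> np.ndarray:
    """Placeholder for the trained model: consumes the merged (33, 21, 3)
    matrix and returns a (1, 21, 3) corrected coordinate matrix."""
    return x.mean(axis=0, keepdims=True)  # not the real conv/bn/relu stack

corrected = timing_correction_model(merged)
assert corrected.shape == (1, 21, 3)
```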
Note that during training the time sequence correction model also learns, at its input layer, the rule for merging its multiple inputs. The combination of the 32×21×3 matrix of the preceding frames and the 1×21×3 matrix of the current frame can therefore be completed in advance at the input layer, with no need to hand-design a merging rule inside the convolution kernels. This keeps the time sequence correction model lightweight and simple while improving the accuracy with which it merges the coordinate matrices of the current frame and the preceding frames. Furthermore, when the trained convolution kernels perform convolution operations on the merged coordinate matrix, the coordinate change of each key point over the consecutive frames can be judged more accurately, ensuring the accuracy of the corrected actual key point coordinates of the current frame.
It should be understood that the number of preceding frames is simply a hyperparameter and can be adjusted freely; 16, 32 and 64 are typical values.
Fig. 4 is a schematic block diagram of a key point positioning device according to an embodiment of the present application. As shown in fig. 4, the apparatus 400 may include:
a key point coordinate determining module 410, configured to determine a predicted key point coordinate of a current frame and an actual key point coordinate corrected by a previous frame of the current frame;
and the key point correction module 420 is configured to input the predicted key point coordinate and the actual key point coordinate into a trained time sequence correction model, so as to obtain the corrected actual key point coordinate of the current frame.
In some implementations, the timing correction model is a lightweight network stacked of multiple convolutional layers, bulk normalization layers, and active layers.
In some implementations, the critical point correction module 420 may be specifically configured to:
combining the predicted key point coordinates and the actual key point coordinates in the time sequence dimension to obtain a corresponding key point coordinate matrix;
inputting the key point coordinate matrix into a trained time sequence correction model to obtain the corrected actual key point coordinates of the current frame.
In some implementations, the keypoint coordinate determination module 410 may be specifically configured to:
and inputting the current frame into a trained key point prediction model to obtain corresponding predicted key point coordinates.
In some implementations, the timing correction model is trained using the keypoint prediction model for predicted keypoint coordinates output by each sample frame within a sample video after the keypoint prediction model is trained.
In some implementations, the preceding frames are a preset number of consecutive history frames adjacent to the current frame, and the corrected actual key point coordinates of each preceding frame are the key point coordinates output by the time sequence correction model for that frame.
In the embodiment of the application, after the predicted key point coordinates of the current frame are obtained, the corrected actual key point coordinates of the previous frame of the current frame are obtained. And then, the predicted key point coordinates and the actual key point coordinates are input into a trained time sequence correction model together to obtain the corrected actual key point coordinates of the current frame, so that the positioning correction of the key point coordinates in the current frame is realized, the positioning accuracy of the key points is ensured, the condition that the same key point in the continuous frame shakes is avoided, and the stability of the key points in the continuous frame is improved.
It should be understood that the apparatus embodiments and method embodiments correspond to each other, and similar descriptions may refer to the method embodiments; to avoid repetition, no further description is provided here. Specifically, the apparatus 400 shown in Fig. 4 may perform any of the key point positioning method embodiments provided herein, and the foregoing and other operations and/or functions of each module in the apparatus 400 respectively implement the corresponding flows in the methods of the embodiments herein; for brevity, they are not described again here.
The apparatus 400 of the embodiments of the present application is described above in terms of functional modules in connection with the accompanying drawings. It should be understood that a functional module may be implemented in hardware, by instructions in software, or by a combination of hardware and software modules. Specifically, each step of the method embodiments herein may be implemented by an integrated logic circuit of hardware in a processor and/or by instructions in software form, and the steps of the methods disclosed in connection with the embodiments may be directly executed by a hardware decoding processor or by a combination of hardware and software modules in the decoding processor. Alternatively, the software modules may be located in a storage medium that is mature in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in a memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method embodiments.
Fig. 5 is a schematic block diagram of an electronic device shown in an embodiment of the present application.
As shown in fig. 5, the electronic device 500 may include:
a memory 510 and a processor 520, the memory 510 being for storing a computer program and for transmitting the program code to the processor 520. In other words, the processor 520 may call and run a computer program from the memory 510 to implement the methods in embodiments of the present application.
For example, the processor 520 may be configured to perform the above-described method embodiments according to instructions in the computer program.
In some embodiments of the present application, the processor 520 may include, but is not limited to:
a general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like.
In some embodiments of the present application, the memory 510 includes, but is not limited to:
volatile memory and/or nonvolatile memory. The nonvolatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable ROM (Programmable ROM, PROM), an erasable PROM (Erasable PROM, EPROM), an electrically erasable PROM (Electrically Erasable PROM, EEPROM), or a flash memory. The volatile memory may be a random access memory (Random Access Memory, RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), and direct Rambus RAM (DR RAM).
In some embodiments of the present application, the computer program may be partitioned into one or more modules that are stored in the memory 510 and executed by the processor 520 to perform the methods provided herein. The one or more modules may be a series of computer program instruction segments capable of performing the specified functions, which are used to describe the execution of the computer program in the electronic device.
As shown in fig. 5, the electronic device may further include:
a transceiver 530, which may be connected to the processor 520 or the memory 510.
The processor 520 may control the transceiver 530 to communicate with other devices; specifically, it may send information or data to other devices, or receive information or data sent by other devices. The transceiver 530 may include a transmitter and a receiver, and may further include one or more antennas.
It will be appreciated that the various components in the electronic device are connected by a bus system that includes, in addition to a data bus, a power bus, a control bus, and a status signal bus.
The present application also provides a computer storage medium having stored thereon a computer program which, when executed by a computer, causes the computer to perform the methods of the above method embodiments. Correspondingly, embodiments of the present application also provide a computer program product comprising instructions which, when executed by a computer, cause the computer to perform the methods of the above method embodiments.
When implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (digital subscriber line, DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a digital video disc (digital video disc, DVD)), or a semiconductor medium (e.g., a solid state disk (Solid State Disk, SSD)), or the like.
Those of ordinary skill in the art will appreciate that the various illustrative modules and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be additional divisions when actually implemented, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or modules, which may be in electrical, mechanical, or other forms.
The modules illustrated as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. For example, functional modules in the embodiments of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.
The foregoing is merely a specific implementation of the present application, and the protection scope of the present application is not limited thereto. Any person skilled in the art can readily conceive of variations or substitutions within the technical scope disclosed in the present application, and such variations or substitutions shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method of locating a key point, comprising:
determining predicted key point coordinates of a current frame and corrected actual key point coordinates of a preceding frame of the current frame;
and inputting the predicted key point coordinates and the actual key point coordinates into a trained time sequence correction model to obtain the corrected actual key point coordinates of the current frame.
2. The method of claim 1, wherein the time sequence correction model is a lightweight network formed by stacking a plurality of convolutional layers, batch normalization layers, and activation layers.
3. The method according to claim 1, wherein the inputting the predicted keypoint coordinates and the actual keypoint coordinates into the trained time-series correction model to obtain the corrected actual keypoint coordinates of the current frame comprises:
combining the predicted key point coordinates and the actual key point coordinates in the time sequence dimension to obtain a corresponding key point coordinate matrix;
inputting the key point coordinate matrix into a trained time sequence correction model to obtain the corrected actual key point coordinates of the current frame.
4. The method of claim 1, wherein determining the predicted keypoint coordinates of the current frame comprises:
and inputting the current frame into a trained key point prediction model to obtain corresponding predicted key point coordinates.
5. The method of claim 4, wherein the time sequence correction model is trained, after the key point prediction model has been trained, using the predicted key point coordinates output by the key point prediction model for each sample frame in a sample video.
6. The method of claim 1, wherein the preceding frame comprises a predetermined number of consecutive historical frames adjacent to the current frame, and the corrected actual key point coordinates of the preceding frame are key point coordinates output by the time sequence correction model.
7. A key point positioning device, comprising:
a key point coordinate determining module, configured to determine predicted key point coordinates of a current frame and corrected actual key point coordinates of a preceding frame of the current frame;
and a key point correction module, configured to input the predicted key point coordinates and the actual key point coordinates into a trained time sequence correction model to obtain corrected actual key point coordinates of the current frame.
8. An electronic device, comprising:
a processor and a memory, the memory being configured to store a computer program, and the processor being configured to call and run the computer program stored in the memory to perform the key point positioning method of any one of claims 1-6.
9. A computer readable storage medium, storing a computer program that causes a computer to perform the key point positioning method of any one of claims 1-6.
10. A computer program product, comprising a computer program/instructions that, when executed by a processor, implement the key point positioning method of any one of claims 1-6.
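To illustrate the data flow recited in claims 1, 3, and 6 — combining the corrected key point coordinates of the preceding frames with the predicted key point coordinates of the current frame along the time sequence dimension, then producing corrected coordinates for the current frame — the following sketch substitutes a simple temporal weighted average for the trained time sequence correction model. The function names, the averaging weights, and the use of NumPy are illustrative assumptions only; the patent's actual model is a stack of convolutional, batch normalization, and activation layers (claim 2).

```python
import numpy as np

def build_coordinate_matrix(pred_current, corrected_history):
    """Combine key point coordinates in the time sequence dimension (cf. claim 3).

    pred_current:      (K, 2) predicted key points of the current frame
    corrected_history: list of (K, 2) corrected key points of the
                       preceding frames (cf. claim 6)
    Returns a (T+1, K, 2) key point coordinate matrix.
    """
    return np.stack(list(corrected_history) + [pred_current], axis=0)

def correct_current_frame(coord_matrix):
    """Stand-in for the trained time sequence correction model.

    Only mirrors the model's input/output shapes: a temporal weighted
    average that damps frame-to-frame jitter of the same key point.
    """
    t = coord_matrix.shape[0]
    weights = np.arange(1, t + 1, dtype=float)  # favour recent frames
    weights /= weights.sum()
    # Contract the time axis: corrected (K, 2) coordinates of the current frame.
    return np.tensordot(weights, coord_matrix, axes=1)
```

For example, with two preceding frames of three key points each, the coordinate matrix has shape (3, 3, 2) and the corrected output for the current frame has shape (3, 2).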
CN202211037136.0A 2022-08-26 2022-08-26 Method, device, equipment and storage medium for positioning key point Pending CN117710450A (en)


Publications (1)

Publication Number Publication Date
CN117710450A (en) 2024-03-15

Family

ID=90159333



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination