CN113436226A - Method and device for detecting key points

Method and device for detecting key points

Info

Publication number
CN113436226A
Authority
CN
China
Prior art keywords
position information
frame
key point
optical flow
model
Prior art date
Legal status
Pending
Application number
CN202010209977.XA
Other languages
Chinese (zh)
Inventor
朱兆琪
董玉新
安山
Current Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN202010209977.XA
Publication of CN113436226A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/251 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments, involving models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/75 Determining position or orientation of objects or cameras using feature-based methods, involving models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G06T2207/30201 Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method and a device for detecting key points, and relates to the field of computer technology. One embodiment of the method comprises: acquiring any frame in a video sequence and a historical frame preceding that frame; determining model position information of each key point of a target in the frame based on a pre-trained key point model; determining optical flow position information of each key point of the target in the frame based on optical flow tracking, according to the frame and the historical frame; and dynamically weighting the model position information and the optical flow position information to obtain the position information of each key point of the target in the frame. By dynamically weighting the two sources of position information, the method avoids both the key point lag that arises when key points are tracked by an optical flow method and the key point jitter that arises when key point positions are predicted by the model alone, improving the accuracy and stability of key point detection.

Description

Method and device for detecting key points
Technical Field
The invention relates to the field of computer technology, and in particular to a method and a device for detecting key points.
Background
In the prior art, key point detection usually either uses a model to predict the position information of the key points, or tracks the predicted key point positions with methods such as optical flow or filtering.
In the process of implementing the invention, the inventors found at least the following problems in the prior art:
when a model is used to predict the position information of the key points, model error means that even two frames with little deviation between them yield predicted key point positions with a pixel-level offset visible to the human eye, which easily causes key point jitter when predicting on video;
when the predicted key point positions are tracked by methods such as optical flow or filtering, the accuracy of the prediction is low whenever the difference between frames is large, and key point lag easily occurs.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for detecting key points which, by dynamically weighting model position information and optical flow position information, can avoid both the key point lag that occurs when an optical flow method tracks key points and the key point jitter caused by predicting key point positions with a model alone, improving the accuracy and stability of key point detection.
To achieve the above object, according to a first aspect of the embodiments of the present invention, there is provided a method of key point detection, comprising:
acquiring any frame in a video sequence and a historical frame preceding that frame;
determining model position information of each key point of a target in the frame based on a pre-trained key point model;
determining optical flow position information of each key point of the target in the frame based on optical flow tracking, according to the frame and the historical frame;
and dynamically weighting the model position information and the optical flow position information to obtain the position information of each key point of the target in the frame.
Optionally, dynamically weighting the model position information and the optical flow position information comprises:
determining weights for the model position information and the optical flow position information as follows: for any key point, when the difference between the optical flow position information of the key point and the position information of the key point in the historical frame is 0, setting the weight of the model position information of the key point to 0; as that difference increases, increasing the weight of the model position information of the key point; and when that difference equals a preset threshold, setting the weight of the model position information of the key point to 1;
and dynamically weighting the model position information and the optical flow position information based on the weights.
Optionally, the preset threshold is 4.
Optionally, determining the model position information of each key point of the target in the frame based on a pre-trained key point model comprises:
determining a target detection box in the frame; extracting a target image from the frame according to the target detection box; and inputting the target image into the key point model to obtain the model position information of each key point of the target in the frame.
Optionally, an SSD algorithm is used to determine the target detection box in the frame.
Optionally, determining the optical flow position information of each key point of the target in the frame based on optical flow tracking, according to the frame and the historical frame, comprises:
determining a motion vector of the background image according to the position information of background pixel points in the historical frame and their initial position information in the frame; and determining the optical flow position information of each key point in the frame according to the position information of each key point in the historical frame and the motion vector of the background image.
Optionally, the target is a human face, and the number of key points is 106.
According to a second aspect of the embodiments of the present invention, there is provided an apparatus for keypoint detection, including:
an image acquisition module, configured to acquire any frame in a video sequence and a historical frame preceding that frame;
a model detection module, configured to determine model position information of each key point of a target in the frame based on a pre-trained key point model;
an optical flow detection module, configured to determine optical flow position information of each key point of the target in the frame based on optical flow tracking, according to the frame and the historical frame;
and a dynamic weighting module, configured to dynamically weight the model position information and the optical flow position information to obtain the position information of each key point of the target in the frame.
Optionally, the dynamic weighting module dynamically weights the model position information and the optical flow position information by:
determining weights for the model position information and the optical flow position information as follows: for any key point, when the difference between the optical flow position information of the key point and the position information of the key point in the historical frame is 0, setting the weight of the model position information of the key point to 0; as that difference increases, increasing the weight of the model position information of the key point; and when that difference equals a preset threshold, setting the weight of the model position information of the key point to 1;
and dynamically weighting the model position information and the optical flow position information based on the weights.
Optionally, the preset threshold is 4.
Optionally, the model detection module determining the model position information of each key point of the target in the frame based on a pre-trained key point model comprises:
determining a target detection box in the frame; extracting a target image from the frame according to the target detection box; and inputting the target image into the key point model to obtain the model position information of each key point of the target in the frame.
Optionally, the model detection module determines the target detection box in the frame using an SSD algorithm.
Optionally, the optical flow detection module determining the optical flow position information of each key point of the target in the frame based on optical flow tracking, according to the frame and the historical frame, comprises:
determining a motion vector of the background image according to the position information of background pixel points in the historical frame and their initial position information in the frame; and determining the optical flow position information of each key point in the frame according to the position information of each key point in the historical frame and the motion vector of the background image.
Optionally, the target is a human face, and the number of key points is 106.
According to a third aspect of the embodiments of the present invention, there is provided an electronic device for keypoint detection, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method provided by the first aspect of the embodiments of the present invention.
According to a fourth aspect of embodiments of the present invention, there is provided a computer readable medium, on which a computer program is stored, which when executed by a processor, implements the method provided by the first aspect of embodiments of the present invention.
One embodiment of the above invention has the following advantage or beneficial effect: by dynamically weighting the model position information and the optical flow position information, both the key point lag caused by tracking key points with an optical flow method and the key point jitter caused by predicting key point positions with a model alone can be avoided, improving the accuracy and stability of key point detection.
Further effects of the above optional implementations are described below in connection with specific embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic diagram of the main flow of a method of keypoint detection according to an embodiment of the invention;
FIG. 2 is a schematic diagram of the main modules of an apparatus for keypoint detection in an embodiment of the present invention;
FIG. 3 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
FIG. 4 is a schematic block diagram of a computer system suitable for implementing a terminal device or server of an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
According to an aspect of an embodiment of the present invention, a method of keypoint detection is provided.
Fig. 1 is a schematic diagram of the main flow of a method of key point detection according to an embodiment of the present invention. As shown in Fig. 1, the method of key point detection comprises step S101, step S102, step S103, and step S104.
Step S101: acquire any frame in a video sequence and a historical frame preceding that frame.
Both the frame and the historical frame are video frames in the video sequence. Typically, the frame in question is not the first video frame of the sequence. A historical frame is a video frame that precedes it in the video sequence; it may be the immediately preceding frame, or the frame two or three positions earlier, and so on.
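As a concrete illustration of step S101, the following minimal Python sketch (using OpenCV, which is an assumption; the patent does not prescribe any library, and `frame_pairs` is a hypothetical helper) yields each frame together with a historical frame a fixed number of positions earlier:

```python
import cv2

def frame_pairs(video_path, gap=1):
    """Yield (historical_frame, current_frame) pairs from a video.

    `gap` is how many frames back the historical frame lies
    (1 = the immediately preceding frame).
    """
    cap = cv2.VideoCapture(video_path)
    history = []  # sliding buffer of the last `gap` frames
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if len(history) >= gap:
            yield history[-gap], frame
        history.append(frame)
        history = history[-gap:]
    cap.release()
```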
Step S102: determine the model position information of each key point of the target in the frame based on a pre-trained key point model.
The key point model is a model obtained by pre-training on a sample set. The training method can be chosen according to the actual situation, for example machine learning or deep learning methods, and is not detailed here. The model position information is the position information of each key point as predicted by the key point model. Typically, position information is expressed as coordinates, comprising an abscissa and an ordinate.
Key points are pixel points in designated regions of the image. Taking a human face as an example, key points may include pixel points in regions such as the eyebrows, eyes, nose, mouth, and face contour. The number of key points can be set according to the actual situation; taking a human face as an example, it may be 106.
Optionally, determining the model position information of each key point of the target in the frame based on a pre-trained key point model comprises: determining a target detection box in the frame; extracting a target image from the frame according to the target detection box; and inputting the target image into the key point model to obtain the model position information of each key point of the target in the frame.
The method for determining the target detection box can be chosen according to the actual situation: for example, the SSD algorithm (Single Shot MultiBox Detector, an end-to-end detection algorithm) may be used to determine the target detection box in the frame, or the YOLO algorithm (another end-to-end detection algorithm) may be used.
As an example, take a human face as the target. Determining the model position information of each key point of the target in the frame based on a pre-trained key point model comprises:
determining a face detection box in the frame using an SSD algorithm, denoting the position of the box as (x, y, w, h), where (x, y) is the coordinate of the top-left corner of the face detection box and w and h are its width and height, respectively; extracting the face region of the frame according to the face detection box to obtain a face image; and inputting the face image into a key point model for detecting face key points to obtain the model position information of each key point.
Denote the face image input at the k-th frame by $I_k$ and the mapping model from face image to key points by $f$. The model position information can then be expressed as formula (1):

$(X_k^m, Y_k^m) = f(I_k) \quad (1)$

where $X_k^m = (x_k^{m,1}, \dots, x_k^{m,106})$ and $Y_k^m = (y_k^{m,1}, \dots, y_k^{m,106})$ are the abscissas and ordinates of the 106 key points output by the model.
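As an illustrative sketch of this step only (the patent does not specify an implementation; `detector` and `kpt_model` below are hypothetical stand-ins for the SSD face detector and the 106-point key point network):

```python
import numpy as np

def model_keypoints(frame, detector, kpt_model):
    """Predict the 106 face key points of one frame (step S102).

    `detector` returns a face box (x, y, w, h); `kpt_model` maps a
    cropped face image to 106 (x, y) coordinates in crop space.
    """
    x, y, w, h = detector(frame)                         # face detection box
    face = frame[y:y + h, x:x + w]                       # crop the face region
    pts = np.asarray(kpt_model(face), dtype=np.float32)  # shape (106, 2)
    pts += np.array([x, y], dtype=np.float32)            # back to frame coordinates
    return pts
```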
When key point detection is performed on consecutive video frames, the change from one frame to the next means the target detection box is obtained with a pixel-level offset, and the key point model additionally carries model error, so the key points shift by pixel-level offsets between frames. When the video plays continuously, the human eye perceives this as key point jitter. The embodiment of the invention therefore further processes, in the subsequent steps, the model position information of each key point produced by the key point model.
Step S103: determine the optical flow position information of each key point of the target in the frame based on optical flow tracking, according to the frame and the historical frame.
Optical flow tracking is a method that computes the motion of an object between adjacent frames by exploiting the temporal change of pixels in the video sequence and the correlation between adjacent frames, so as to find the correspondence between the previous frame and the current frame. In general, optical flow arises from the movement of foreground objects in the scene, the motion of the camera, or both.
When the human eye observes a moving object, the object forms a series of continuously changing images on the retina, and this continuously changing information constantly "flows" across the retina (the image plane) like a stream of light, hence the name optical flow. Optical flow expresses the change of the image and, because it carries information about the object's motion, can be used to determine how the object moves.
In space, motion can be described by a motion field; in the image plane, the motion of an object is represented by the differing gray-level distributions of successive images in the sequence, so the motion field in space is transferred to the image as an optical flow field. The optical flow field is a two-dimensional vector field that reflects the trend of gray-level change at each pixel of the image; it can be regarded as the instantaneous velocity field generated by the motion of gray-valued pixels on the image plane. The information it contains is the instantaneous motion velocity vector of each image point.
Optionally, determining the optical flow position information of each key point of the target in the frame based on optical flow tracking, according to the frame and the historical frame, comprises:
determining a motion vector of the background image according to the position information of background pixel points in the historical frame and their initial position information in the frame; and determining the optical flow position information of each key point in the frame according to the position information of each key point in the historical frame and the motion vector of the background image.
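A sketch of this optional variant, under the assumption that the background motion is summarized by the median displacement of tracked background pixel points (the patent does not fix the aggregation; `flow_by_background` is a hypothetical helper):

```python
import numpy as np

def flow_by_background(prev_bg_pts, cur_bg_pts, prev_kpts):
    """Estimate optical-flow key point positions from background motion.

    `prev_bg_pts` / `cur_bg_pts`: positions of the same background pixel
    points in the historical frame and the current frame, shape (M, 2).
    `prev_kpts`: key point positions in the historical frame, shape (N, 2).
    """
    # Motion vector of the background image (median is robust to outliers).
    motion = np.median(cur_bg_pts - prev_bg_pts, axis=0)
    # Shift every key point by the background motion vector.
    return prev_kpts + motion
```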
The embodiment of the invention may determine the optical flow position information of each key point of the target in the frame by tracking with the Lucas-Kanade optical flow method, which rests on several basic assumptions: (1) constant brightness: the brightness of a given pixel point does not change over time; (2) small motion: positions do not change drastically from one time step to the next; (3) spatial coherence: neighboring points in a scene project to neighboring points in the image, and those neighbors have similar velocities.
Based on these assumptions, suppose a pixel point I(x, y, t) in the first frame moves to (x + dx, y + dy) in the second frame after a time dt. By the first assumption (constant brightness):

$I(x, y, t) = I(x + dx, y + dy, t + dt) \quad (2)$

Expanding the right-hand side as a Taylor series, cancelling the common terms, and dividing both sides by dt yields:

$f_x u + f_y v + f_t = 0 \quad (3)$
where $u = dx/dt$ and $v = dy/dt$. Equation (3) is called the optical flow equation, in which $f_x$ and $f_y$ are the directional (spatial) gradients of the image and $f_t$ is the temporal gradient.
Given the (k-1)-th frame, the k-th frame, and the key point coordinates $(x_{k-1}^i, y_{k-1}^i)$ of the (k-1)-th frame, optical flow then yields the optical flow key point coordinates $(x_k^{o,i}, y_k^{o,i})$ of the k-th frame.
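As an illustrative sketch of step S103 (assuming OpenCV's pyramidal Lucas-Kanade implementation, which the patent does not mandate; `flow_keypoints` is a hypothetical helper):

```python
import cv2
import numpy as np

def flow_keypoints(prev_frame, cur_frame, prev_pts):
    """Track key points from the historical frame to the current frame
    with pyramidal Lucas-Kanade optical flow.

    `prev_pts` has shape (N, 2): key point coordinates in the historical
    frame. Returns the tracked positions and a per-point status flag
    (1 = tracked successfully).
    """
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    cur_gray = cv2.cvtColor(cur_frame, cv2.COLOR_BGR2GRAY)
    p0 = prev_pts.reshape(-1, 1, 2).astype(np.float32)
    p1, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, cur_gray, p0, None, winSize=(21, 21), maxLevel=3)
    return p1.reshape(-1, 2), status.reshape(-1)
```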
Step S104: dynamically weight the model position information and the optical flow position information to obtain the position information of each key point of the target in the frame.
With optical flow tracking, when the difference between frames is small, the predicted key point positions are stable and differ little from the points of the previous frame; when the deviation between frames is large, the accuracy of the optical flow prediction degrades and the key point positions cannot be predicted accurately. The prediction of the key point model, by contrast, does not depend on the target's position in the previous frame, so it can predict the position of every key point in every frame; but every prediction carries an error, so even two frames with little deviation between them yield key point positions with a pixel-level offset visible to the human eye, and the points appear to jitter when the video is predicted. If the model position information and the optical flow position information were weighted with fixed weights, the final key point positions would be output incorrectly, and the key point positions could drift farther and farther without any effective way to correct them. The embodiment of the invention instead uses dynamic weighting, which takes full account of the characteristics of both optical flow tracking and the key point model: it keeps the key points stable when the target is still, responds promptly to target motion and similar situations, and makes the fused key points more accurate and stable.
As an example, take a human face as the target in a scene where the face is moving:
With fixed-weight weighting, as the face moves, the inter-frame displacement of the face grows (the faster the motion, the larger the displacement in the image), so the optical flow position information predicted by optical flow tracking becomes inaccurate or lags while the face moves, and the weighted key points lag or become inaccurate as well. With dynamic weighting, the displacement approaches the preset threshold during motion, so the proportion contributed by the model result increases; when the motion is fast enough the threshold is reached and the model points are used entirely, so the final points neither lag nor become inaccurate.
As another example, take a human face as the target in a scene where the face is still:
When the face is still, the deviation between image frames is small or zero, and the optical flow position information obtained by optical flow tracking is more stable than the model position information obtained from the key point model, so increasing the proportion of optical flow tracking improves the stability of the key points. With dynamic weighting, the weight of the optical flow position information increases, making the key points more stable.
Dynamically weighting the model position information and the optical flow position information comprises: determining weights for the model position information and the optical flow position information as follows: for any key point, when the difference between the optical flow position information of the key point and the position information of the key point in the historical frame is 0, setting the weight of the model position information of the key point to 0; as that difference increases, increasing the weight of the model position information of the key point; and when that difference equals a preset threshold, setting the weight of the model position information of the key point to 1; then dynamically weighting the model position information and the optical flow position information based on the weights.
In this embodiment, when the difference between the optical flow position information of a key point and its position information in the historical frame equals (or exceeds) the preset threshold, the weight of the model position information of that key point is 1, and the sum of the weights of the model position information and the optical flow position information is always 1. The preset threshold can be set according to the actual situation, for example to 1, 4, or 8.
Optionally, the model position information and the optical flow position information are dynamically weighted using the following formulas:

$x_k^i = w_x^i \, x_k^{m,i} + (1 - w_x^i) \, x_k^{o,i} \quad (4)$

$y_k^i = w_y^i \, y_k^{m,i} + (1 - w_y^i) \, y_k^{o,i} \quad (5)$

with the weights

$w_x^i = \min\!\left( \frac{|x_k^{o,i} - x_{k-1}^i|}{threshold_x},\ 1 \right), \qquad w_y^i = \min\!\left( \frac{|y_k^{o,i} - y_{k-1}^i|}{threshold_y},\ 1 \right)$

In the formulas, $x_k^i$ and $y_k^i$ are the abscissa and ordinate in the position information of the i-th key point in the k-th frame; $x_k^{o,i}$ and $y_k^{o,i}$ are the abscissa and ordinate in the optical flow position information of the i-th key point in the k-th frame; $x_k^{m,i}$ and $y_k^{m,i}$ are the abscissa and ordinate in the model position information of the i-th key point in the k-th frame; $x_{k-1}^i$ and $y_{k-1}^i$ are the abscissa and ordinate in the position information of the i-th key point in the (k-1)-th frame; and $threshold_x$ and $threshold_y$ are the preset thresholds on the abscissa and ordinate, respectively.
In this embodiment, for any key point, the smaller the deviation between its optical flow position in the current frame and its position in the previous frame, the greater the weight of the optical flow position information; when the target moves quickly, that deviation becomes large, so the weight of the model position information increases; and once the deviation exceeds the preset threshold, the final result uses the model position information entirely. In this way, the key point jitter caused by model prediction error is reduced and the key points remain stable.
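A minimal sketch of this fusion step, assuming the linear, saturating weights of formulas (4) and (5); the name `fuse_keypoints` and the default thresholds are illustrative, not prescribed by the patent:

```python
import numpy as np

def fuse_keypoints(model_pts, flow_pts, prev_pts,
                   threshold_x=4.0, threshold_y=4.0):
    """Dynamically weight model and optical-flow key points (step S104).

    All inputs have shape (N, 2). The model weight of each point grows
    linearly with how far its optical-flow position moved since the
    previous frame and saturates at 1 once the motion reaches the
    preset threshold (4 in the optional embodiment).
    """
    diff = np.abs(flow_pts - prev_pts)              # per-axis displacement
    thresh = np.array([threshold_x, threshold_y], dtype=np.float32)
    w_model = np.minimum(diff / thresh, 1.0)        # 0 when still, 1 when fast
    return w_model * model_pts + (1.0 - w_model) * flow_pts
```

In a per-frame loop, the fused output would also serve as `prev_pts` for the next frame's weighting.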
By dynamically weighting the model position information and the optical flow position information, the embodiment of the invention avoids both the key point lag that occurs when an optical flow method tracks key points and the key point jitter caused by predicting key point positions with a model alone, improving the accuracy and stability of key point detection. The embodiment can be combined with applications such as makeup, image beautification, and AR try-on to provide a better user experience.
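Putting the four steps together, a hypothetical per-frame driver loop (reusing the sketches above, with `detector` and `kpt_model` still assumed) might look like:

```python
prev_pts = None
for hist_frame, cur_frame in frame_pairs("input.mp4"):
    m_pts = model_keypoints(cur_frame, detector, kpt_model)          # step S102
    if prev_pts is None:
        prev_pts = m_pts          # bootstrap the first frame from the model
        continue
    f_pts, _status = flow_keypoints(hist_frame, cur_frame, prev_pts)  # step S103
    prev_pts = fuse_keypoints(m_pts, f_pts, prev_pts)                 # step S104
```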
According to a second aspect of the embodiments of the present invention, there is provided an apparatus for implementing the above method.
Fig. 2 is a schematic diagram of the main modules of an apparatus for key point detection according to an embodiment of the present invention. As shown in Fig. 2, the apparatus 200 for key point detection includes:
an image acquisition module 201, which acquires any frame in a video sequence and a historical frame preceding that frame;
a model detection module 202, which determines model position information of each key point of a target in the frame based on a pre-trained key point model;
an optical flow detection module 203, which determines optical flow position information of each key point of the target in the frame based on optical flow tracking, according to the frame and the historical frame;
and a dynamic weighting module 204, configured to dynamically weight the model position information and the optical flow position information to obtain the position information of each key point of the target in the frame.
Optionally, the dynamic weighting module dynamically weights the model position information and the optical flow position information by:
determining weights for the model position information and the optical flow position information as follows: for any key point, when the difference between the optical flow position information of the key point and the position information of the key point in the historical frame is 0, setting the weight of the model position information of the key point to 0; as that difference increases, increasing the weight of the model position information of the key point; and when that difference equals a preset threshold, setting the weight of the model position information of the key point to 1;
and dynamically weighting the model position information and the optical flow position information based on the weights.
Optionally, the preset threshold is 4.
Optionally, the model detection module determining the model position information of each key point of the target in the frame based on a pre-trained key point model comprises:
determining a target detection box in the frame; extracting a target image from the frame according to the target detection box; and inputting the target image into the key point model to obtain the model position information of each key point of the target in the frame.
Optionally, the model detection module determines the target detection box in the frame using an SSD algorithm.
Optionally, the optical flow detection module determining the optical flow position information of each key point of the target in the frame based on optical flow tracking, according to the frame and the historical frame, comprises:
determining a motion vector of the background image according to the position information of background pixel points in the historical frame and their initial position information in the frame; and determining the optical flow position information of each key point in the frame according to the position information of each key point in the historical frame and the motion vector of the background image.
Optionally, the target is a human face, and the number of key points is 106.
According to a third aspect of the embodiments of the present invention, there is provided an electronic device for keypoint detection, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method provided by the first aspect of the embodiments of the present invention.
According to a fourth aspect of embodiments of the present invention, there is provided a computer readable medium, on which a computer program is stored, which when executed by a processor, implements the method provided by the first aspect of embodiments of the present invention.
Fig. 3 shows an exemplary system architecture 300 to which the method of keypoint detection or the apparatus of keypoint detection of an embodiment of the invention may be applied.
As shown in fig. 3, the system architecture 300 may include terminal devices 301, 302, 303, a network 304, and a server 305. The network 304 serves as a medium for providing communication links between the terminal devices 301, 302, 303 and the server 305. Network 304 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal device 301, 302, 303 to interact with the server 305 via the network 304 to receive or send messages or the like. The terminal devices 301, 302, 303 may have installed thereon various communication client applications, such as shopping-like applications, web browser applications, search-like applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 301, 302, 303 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 305 may be a server providing various services, such as a background management server (for example only) providing support for shopping-like websites browsed by users using the terminal devices 301, 302, 303. The background management server may analyze and perform other processing on the received data such as the target tracking request, and feed back a processing result (for example, target position information in each video frame of a segment of video, which is merely an example) to the terminal device.
It should be noted that the method for detecting a keypoint is generally executed by the server 305, and accordingly, the apparatus for detecting a keypoint is generally disposed in the server 305.
It should be understood that the number of terminal devices, networks, and servers in fig. 3 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 4, a block diagram of a computer system 400 suitable for use with a terminal device implementing an embodiment of the invention is shown. The terminal device shown in fig. 4 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 4, the computer system 400 includes a Central Processing Unit (CPU)401 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)402 or a program loaded from a storage section 408 into a Random Access Memory (RAM) 403. In the RAM 403, various programs and data necessary for the operation of the system 400 are also stored. The CPU 401, ROM 402, and RAM 403 are connected to each other via a bus 404. An input/output (I/O) interface 405 is also connected to bus 404.
The following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse, and the like; an output section 407 including a display device such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 408 including a hard disk and the like; and a communication section 409 including a network interface card such as a LAN card, a modem, or the like. The communication section 409 performs communication processing via a network such as the internet. A driver 410 is also connected to the I/O interface 405 as needed. A removable medium 411 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 410 as necessary, so that a computer program read out therefrom is mounted into the storage section 408 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 409, and/or installed from the removable medium 411. The computer program performs the above-described functions defined in the system of the present invention when executed by a Central Processing Unit (CPU) 401.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or by hardware. The described modules may also be provided in a processor, which may be described as: a processor comprising: an image acquisition module, configured to acquire any frame in a video sequence and a historical frame preceding that frame; a model detection module, configured to determine model position information of each key point of a target in the frame based on a pre-trained key point model; an optical flow detection module, configured to determine optical flow position information of each key point of the target in the frame based on optical flow tracking, according to the frame and the historical frame; and a dynamic weighting module, configured to dynamically weight the model position information and the optical flow position information to obtain the position information of each key point of the target in the frame. The names of these modules do not in all cases limit the modules themselves; for example, the optical flow detection module may also be described as "a module that determines the optical flow position information of each key point of the target in the frame based on optical flow tracking".
As another aspect, the present invention also provides a computer readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer readable medium carries one or more programs which, when executed by a device, cause the device to: acquire any frame in a video sequence and a historical frame preceding that frame; determine model position information of each key point of a target in the frame based on a pre-trained key point model; determine optical flow position information of each key point of the target in the frame based on optical flow tracking, according to the frame and the historical frame; and dynamically weight the model position information and the optical flow position information to obtain the position information of each key point of the target in the frame.
According to the technical solution of the embodiments of the present invention, dynamically weighting the model position information and the optical flow position information avoids both the key point lag caused by tracking key points with an optical flow method and the key point jitter caused by predicting key point positions with a model alone, improving the accuracy and stability of key point detection.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method of key point detection, comprising:
acquiring any frame in a video sequence and a historical frame preceding that frame;
determining model position information of each key point of a target in the frame based on a pre-trained key point model;
determining optical flow position information of each key point of the target in the frame based on optical flow tracking, according to the frame and the historical frame;
and dynamically weighting the model position information and the optical flow position information to obtain the position information of each key point of the target in the frame.
2. The method of claim 1, wherein dynamically weighting the model position information and the optical flow position information comprises:
determining weights for the model position information and the optical flow position information as follows: for any key point, when the difference between the optical flow position information of the key point and the position information of the key point in the historical frame is 0, setting the weight of the model position information of the key point to 0; as that difference increases, increasing the weight of the model position information of the key point; and when that difference equals a preset threshold, setting the weight of the model position information of the key point to 1;
and dynamically weighting the model position information and the optical flow position information based on the weights.
3. The method of claim 2, wherein the preset threshold is 4.
4. The method of claim 1, wherein determining the model position information of each key point of the target in the frame based on a pre-trained key point model comprises:
determining a target detection box in the frame; extracting a target image from the frame according to the target detection box; and inputting the target image into the key point model to obtain the model position information of each key point of the target in the frame.
5. The method of claim 4, wherein the target detection box in the frame is determined using an SSD algorithm.
6. The method of claim 1, wherein determining the optical flow position information of each key point of the target in the frame based on optical flow tracking, according to the frame and the historical frame, comprises:
determining a motion vector of the background image according to the position information of background pixel points in the historical frame and their initial position information in the frame; and determining the optical flow position information of each key point in the frame according to the position information of each key point in the historical frame and the motion vector of the background image.
7. The method of any one of claims 1-6, wherein the target is a human face and the number of key points is 106.
8. An apparatus for keypoint detection, comprising:
an image acquisition module, configured to acquire any frame in a video sequence and a historical frame preceding that frame;
a model detection module, configured to determine model position information of each key point of a target in the frame based on a pre-trained key point model;
an optical flow detection module, configured to determine optical flow position information of each key point of the target in the frame based on optical flow tracking, according to the frame and the historical frame;
and a dynamic weighting module, configured to dynamically weight the model position information and the optical flow position information to obtain the position information of each key point of the target in the frame.
9. An electronic device for keypoint detection, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
10. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-7.
CN202010209977.XA 2020-03-23 2020-03-23 Method and device for detecting key points Pending CN113436226A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010209977.XA CN113436226A (en) 2020-03-23 2020-03-23 Method and device for detecting key points


Publications (1)

Publication Number Publication Date
CN113436226A 2021-09-24

Family

ID=77752691

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010209977.XA Pending CN113436226A (en) 2020-03-23 2020-03-23 Method and device for detecting key points

Country Status (1)

Country Link
CN (1) CN113436226A (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100315505A1 (en) * 2009-05-29 2010-12-16 Honda Research Institute Europe Gmbh Object motion detection system based on combining 3d warping techniques and a proper object motion detection
CN109583391A (en) * 2018-12-04 2019-04-05 北京字节跳动网络技术有限公司 Critical point detection method, apparatus, equipment and readable medium
CN110363124A (en) * 2019-07-03 2019-10-22 广州多益网络股份有限公司 Rapid expression recognition and application method based on face key points and geometric deformation
CN110443828A (en) * 2019-07-31 2019-11-12 腾讯科技(深圳)有限公司 Method for tracing object and device, storage medium and electronic device
CN110544272A (en) * 2019-09-06 2019-12-06 腾讯科技(深圳)有限公司 face tracking method and device, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
杨阳; 周海英; 王守义: "Saliency tracking algorithm based on multi-feature fusion" (基于多特征融合的显著性跟踪算法), 科学技术与工程 (Science Technology and Engineering), no. 26, 18 September 2017 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113887547A (en) * 2021-12-08 2022-01-04 北京世纪好未来教育科技有限公司 Key point detection method and device and electronic equipment
CN113887547B (en) * 2021-12-08 2022-03-08 北京世纪好未来教育科技有限公司 Key point detection method and device and electronic equipment
WO2023151348A1 (en) * 2022-02-10 2023-08-17 腾讯科技(深圳)有限公司 Method for processing key points in image, and related apparatus
CN115100691A (en) * 2022-08-24 2022-09-23 腾讯科技(深圳)有限公司 Method, device and equipment for acquiring key point detection model and detecting key points
CN115100691B (en) * 2022-08-24 2023-08-08 腾讯科技(深圳)有限公司 Method, device and equipment for acquiring key point detection model and detecting key point

Similar Documents

Publication Publication Date Title
US11423695B2 (en) Face location tracking method, apparatus, and electronic device
CN109308469B (en) Method and apparatus for generating information
CN113436226A (en) Method and device for detecting key points
CN113286194A (en) Video processing method and device, electronic equipment and readable storage medium
US11900676B2 (en) Method and apparatus for detecting target in video, computing device, and storage medium
CN110659600B (en) Object detection method, device and equipment
EP3483835B1 (en) Information processing apparatus, background image update method, and non-transitory computer-readable storage medium
CN113920307A (en) Model training method, device, equipment, storage medium and image detection method
CN111553362B (en) Video processing method, electronic device and computer readable storage medium
US20210312650A1 (en) Method and apparatus of training depth estimation network, and method and apparatus of estimating depth of image
CN110427899B (en) Video prediction method and device based on face segmentation, medium and electronic equipment
CN110874853B (en) Method, device, equipment and storage medium for determining target movement
CN110706262B (en) Image processing method, device, equipment and storage medium
CN110032914B (en) Picture labeling method and device
CN110858316A (en) Classifying time series image data
CN113014936B (en) Video frame insertion method, device, equipment and storage medium
CN112584076B (en) Video frame interpolation method and device and electronic equipment
JP7163372B2 (en) Target tracking method and device, electronic device and storage medium
CN115205925A (en) Expression coefficient determining method and device, electronic equipment and storage medium
CN111601013B (en) Method and apparatus for processing video frames
CN111314626A (en) Method and apparatus for processing video
CN117581275A (en) Eye gaze classification
CN108734718B (en) Processing method, device, storage medium and equipment for image segmentation
CN113766117B (en) Video de-jitter method and device
Nunes et al. Adaptive global decay process for event cameras

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination