CN109583391B - Key point detection method, device, equipment and readable medium - Google Patents

Key point detection method, device, equipment and readable medium

Info

Publication number: CN109583391B (granted); CN109583391A (application publication)
Application number: CN201811473824.5A
Authority: CN (China)
Inventor: 胡耀全
Assignee (current and original): Beijing ByteDance Network Technology Co Ltd
Other languages: Chinese (zh)
Legal status: Active (the legal status is an assumption and is not a legal conclusion; no legal analysis has been performed)
Prior art keywords: video frame, position information, key point, current video frame

Classifications

    • G — Physics › G06 — Computing; calculating or counting › G06V — Image or video recognition or understanding › G06V 20/00 — Scenes; scene-specific elements › G06V 20/40 — Scenes; scene-specific elements in video content
        • G06V 20/41 — Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
        • G06V 20/42 — Higher-level, semantic clustering, classification or understanding of sport video content
        • G06V 20/46 — Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Abstract

The embodiments of the disclosure disclose a key point detection method, device, equipment, and readable medium. The method comprises the following steps: selecting, from a video frame sequence displaying user images, a current video frame and a first historical video frame preceding the current video frame; detecting initial position information of each key point in the current video frame; mapping the position information of each key point in the first historical video frame to the current video frame via optical flow to obtain reference position information of each key point in the current video frame; and obtaining the position information of each key point in the current video frame from the initial position information and the reference position information of each key point. The embodiments of the disclosure can improve the accuracy of key point detection.

Description

Key point detection method, device, equipment and readable medium
Technical Field
The disclosed embodiments relate to computer vision technologies, and in particular, to a method, an apparatus, a device, and a readable medium for detecting a keypoint.
Background
With the development of computer vision, some electronic devices can detect various key points of a user, such as joints, limbs, and facial features, from an image of the user.
Currently, the detected key points often need to be displayed in the image in real time. For example, while the user makes various postures or motions in front of the lens, the corresponding key points are displayed in real time in the captured image so that further operations such as body-shape correction can be performed, increasing the fun and interactivity. This places higher requirements on the accuracy and efficiency of key point detection; however, existing key point detection methods cannot meet the requirements of both high accuracy and high efficiency and are difficult to apply in real-time scenarios.
Disclosure of Invention
The embodiment of the disclosure provides a method, a device, equipment and a readable medium for detecting key points, so as to improve the accuracy and efficiency of key point detection and adapt to a real-time application scene.
In a first aspect, an embodiment of the present disclosure provides a method for detecting a key point, including:
selecting a current video frame and a first historical video frame before the current video frame from a video frame sequence on which user images are displayed;
detecting initial position information of each key point from a current video frame;
mapping the position information of each key point in the first historical video frame to the current video frame through the optical flow to obtain the reference position information of each key point in the current video frame;
and respectively obtaining the position information of each key point in the current video frame according to the initial position information and the reference position information of each key point.
In a second aspect, an embodiment of the present disclosure further provides a key point detecting device, including:
the selection module is used for selecting a current video frame and a first historical video frame before the current video frame from a video frame sequence displaying user images;
the detection module is used for detecting the initial position information of each key point from the current video frame;
the mapping module is used for mapping the position information of each key point in the first historical video frame to the current video frame through the optical flow to obtain the reference position information of each key point in the current video frame;
and the obtaining module is used for respectively obtaining the position information of each key point in the current video frame according to the initial position information and the reference position information of each key point.
In a third aspect, an embodiment of the present disclosure further provides an electronic device, where the electronic device includes:
one or more processing devices;
a storage device for storing one or more programs,
when the one or more programs are executed by the one or more processing devices, the one or more processing devices are caused to implement the key point detection method of any of the embodiments.
In a fourth aspect, the disclosed embodiments also provide a computer-readable medium, on which a computer program is stored, where the computer program, when executed by a processing device, implements the keypoint detection method according to any embodiment.
In the embodiments of the disclosure, initial position information of each key point is detected in the current video frame, the position information of each key point in the first historical video frame is mapped to the current video frame via optical flow to obtain reference position information of each key point in the current video frame, and the position information of each key point in the current video frame is obtained from the initial position information and the reference position information. The position information in the current video frame is thus obtained, based on an optical flow algorithm, with the historical positions of the key points as a reference, which effectively improves the accuracy of key point detection; moreover, the position information of the key points can be detected accurately even when key points in the current video frame are occluded or motion-blurred.
Drawings
Fig. 1 is a flowchart of a method for detecting a key point according to an embodiment of the present disclosure;
fig. 2 is a flowchart of a key point detection method provided in the second embodiment of the present disclosure;
fig. 3 is a flowchart of a key point detection method provided in the third embodiment of the present disclosure;
fig. 4 is a flowchart of a method for detecting a key point according to a fourth embodiment of the disclosure;
fig. 5 is a schematic structural diagram of a keypoint detection apparatus provided in the fifth embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the disclosure and are not limiting of the disclosure. It should be further noted that, for the convenience of description, only some of the structures relevant to the present disclosure are shown in the drawings, not all of them. In the following embodiments, optional features and examples are provided in each embodiment, and various features described in the embodiments may be combined to form a plurality of alternatives, and each numbered embodiment should not be regarded as only one technical solution.
Example one
Fig. 1 is a flowchart of a method for detecting a key point according to an embodiment of the present disclosure, where the embodiment is applicable to a case of performing key point detection on a sequence of video frames displaying user images, and the method may be executed by a key point detecting apparatus, which may be formed by hardware and/or software and integrated in an electronic device, which may be a server or a terminal. With reference to fig. 1, the method provided by the embodiment of the present disclosure specifically includes the following operations:
s110, selecting a current video frame and a first historical video frame before the current video frame from a video frame sequence on which user images are displayed.
A video frame sequence refers to consecutive video frames within a period of time in a video stream; the sequence comprises a plurality of video frames, for example 20. In this embodiment, the duration of the video frame sequence should be short, e.g. within a preset range such as 3 seconds, so that the display position and posture of the user image change little between frames, which improves the accuracy of key point detection.
Optionally, a user image is displayed in each video frame of the sequence of video frames, and at least one key point, such as the top of the head, the left shoulder, the right knee, etc., of the user is displayed on the user image.
The present embodiment sequentially selects video frames from the video frame sequence in time order as current video frames, and performs S110-S140 on each current video frame until the video frames in the video frame sequence are processed completely. The method provided by the embodiment aims to detect the position information of each key point in the current video frame, and in the detection process, the historical video frame before the current video frame is taken as a reference. Alternatively, for convenience of description and distinction, the historical video frame used for the keypoint location reference is referred to as a first historical video frame, and in the subsequent embodiments, the historical video frame used for the regression frame location reference is referred to as a second historical video frame. The first historical video frame and the second historical video frame can be the same video frame or different video frames, and the number of the video frames can be at least one. Preferably, in order to sufficiently refer to the position information of each key point in the historical video frames, the first historical video frame includes N video frames before the current video frame, where N is a natural number, such as 8, 9, 10, and the like.
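As a sketch of this selection step: each video frame is taken in time order as the current video frame, with up to N preceding frames kept as the first historical video frames. A minimal illustration in Python (the function name and the `n_history` parameter are illustrative, not from the patent):

```python
from collections import deque

def frame_windows(frames, n_history=10):
    """Yield (current_frame, first_historical_frames) pairs in time order.

    Each frame becomes the current video frame once; the deque keeps at
    most the N most recent preceding frames as the historical reference.
    """
    history = deque(maxlen=n_history)
    for frame in frames:
        yield frame, list(history)
        history.append(frame)

# e.g. a 20-frame sequence with N = 3:
pairs = list(frame_windows(range(20), n_history=3))
# the first current frame has no history; later ones carry the 3 most recent
```

Early frames simply have fewer reference frames available, which matches the per-frame processing described above.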
And S120, detecting initial position information of each key point from the current video frame.
A keypoint detection model is trained in advance, and the keypoint detection model is used for outputting the position information of each keypoint according to an input video frame, wherein the position information of each keypoint comprises the identification (such as an ID number) of each keypoint and the coordinate of each keypoint. In this embodiment, the current video frame is input to the key point detection model to obtain the position information of each key point. For convenience of description and distinction, the position information directly detected from the current video frame is referred to as initial position information.
S130, mapping the position information of each key point in the first historical video frame to the current video frame through the optical flow to obtain the reference position information of each key point in the current video frame.
In this embodiment, an optical flow field algorithm is adopted to calculate motion vectors of a plurality of pixel points in the first historical video frame and the current video frame, that is, an optical flow field is established. Since the time interval between video frames is short, the motion vectors of the background image should be the same, and the motion vectors of the key points may be slightly different. Based on the property, the position information of each key point in the first historical video frame is mapped into a current video frame according to the motion vector of the background image.
Optionally, S130 includes the following two steps:
the first step is as follows: and determining the motion vector of the background image according to the position information of the background pixel point in the first historical video frame and the initial position information in the current video frame.
In an example, if the display position of a background pixel point in the first historical video frame is coordinate point C, and its display position in the current video frame is coordinate point D, then the motion vector of the background image is v = D − C (the vector from C to D).
The second step is that: and respectively determining the reference position information of each key point in the current video frame according to the position information of each key point in the first historical video frame and the motion vector of the background image.
In this embodiment, it is assumed that the positions of the key points in the real environment do not change. If the position information of a key point in the first historical video frame is coordinate point A, its position in the current video frame is A′ = A + v, where v is the background motion vector determined above.
For convenience of description and distinction, the location information mapped to each key point of the current video frame is referred to as reference location information.
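The two steps above can be sketched as follows. In practice the background motion would come from an optical flow field computed over many pixels (e.g. a Lucas–Kanade flow) rather than a single correspondence, so this is a simplified sketch with illustrative names:

```python
def background_motion(c, d):
    """Motion vector of the background image: v = D - C, where C and D are a
    background pixel's positions in the historical and current frames."""
    return (d[0] - c[0], d[1] - c[1])

def map_keypoints(keypoints_hist, v):
    """Map each key point from the first historical video frame into the
    current frame by translating it with the background motion: A' = A + v."""
    vx, vy = v
    return {kp: (x + vx, y + vy) for kp, (x, y) in keypoints_hist.items()}

v = background_motion((100, 50), (103, 52))          # C -> D
ref = map_keypoints({"left_shoulder": (40, 20)}, v)  # reference position info
# ref["left_shoulder"] == (43, 22)
```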
And S140, respectively obtaining the position information of each key point in the current video frame according to the initial position information and the reference position information of each key point.
In one example, if the number of the first historical video frames is 10 and the number of the key points is 5, 10 pieces of reference position information and 1 piece of initial position information of each key point are obtained. And for each key point, comprehensively analyzing the initial position information and the reference position information to obtain the optimal position information.
In this embodiment, initial position information of each key point is detected in the current video frame, the position information of each key point in the first historical video frame is mapped to the current video frame via optical flow to obtain reference position information of each key point in the current video frame, and the position information of each key point in the current video frame is obtained from the initial position information and the reference position information. The position information in the current video frame is thus obtained, based on an optical flow algorithm, with the historical positions of the key points as a reference, which effectively improves the accuracy of key point detection; moreover, the position information of the key points can be detected accurately even when key points in the current video frame are occluded or motion-blurred.
Example two
Fig. 2 is a flowchart of a key point detection method provided in the second embodiment of the present disclosure. In this embodiment, each optional implementation manner of the foregoing embodiment is further optimized, and the regression frame in the current video frame is obtained by using the regression frame in the second historical video frame as a reference, so as to improve the accuracy of the keypoint detection. With reference to fig. 2, the method provided in this embodiment specifically includes the following operations:
s210, selecting a current video frame and a first historical video frame before the current video frame from the video frame sequence displayed with the user image.
S220, selecting a second historical video frame before the current video frame.
For convenience of description and distinction, the historical video frame used for reference by the regression block is referred to as a second historical video frame.
It should be noted that S220 only needs to be executed before S240; this embodiment does not otherwise limit the execution order of S220. Preferably, S220 is performed in synchronization with S210.
And S230, determining a candidate frame comprising each key point in the current video frame.
Typically, multiple candidate frames can be determined in the current video frame, and these typically overlap or are redundant. For convenience of description and distinction, a frame directly determined from the current video frame is referred to as a candidate frame.
S240, mapping the regression frame including the key points in the second historical video frame to the current video frame through the optical flow to obtain a reference regression frame of the current video frame.
Similar to the mapping method of the key points, in this embodiment, an optical flow field algorithm is used to calculate motion vectors of a plurality of pixel points in the second historical video frame and the current video frame, that is, an optical flow field is established. And mapping the position information of the regression frame in the second historical video frame into a current video frame according to the motion vector of the background image, wherein the position information of the regression frame comprises the length, the width and the center coordinate of the regression frame.
Optionally, S240 includes the following two steps:
the method comprises the following steps that firstly, a motion vector of a background image is determined according to the position information of a background pixel point in a second historical video frame and the initial position information of the background pixel point in a current video frame.
The second step is that: and determining a reference regression frame comprising the key points in the current video frame according to the position information of the regression frame comprising the key points in the second historical video frame and the motion vector of the background image.
And S250, obtaining a regression frame in the current video frame according to the candidate frame and the reference regression frame.
Optionally, non-maximum suppression is performed on the candidate frames and the reference regression frame to obtain the regression frame in the current video frame. For example, the current video frame, together with the candidate frames and the reference regression frame, is input to a Non-Maximum Suppression (NMS) network, and redundant frames are removed to obtain the optimal regression frame in the current video frame. The NMS network belongs to the prior art and is not described in detail here.
It should be noted that, since multiple candidate frames are obtained in the current video frame and this operation keeps only a limited number of optimal regression frames, the number of reference regression frames should not be too large. Optionally, the number of reference regression frames is 1; accordingly, the second historical video frame is the video frame immediately preceding the current video frame.
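The patent performs this step with a learned NMS network; as a simplified stand-in, plain greedy non-maximum suppression over the candidate frames plus the reference regression frame can be sketched as:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Keep the highest-scoring boxes, suppressing any box that overlaps
    an already-kept box by more than iou_thresh."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
# kept == [0, 2]: the near-duplicate of the best box is suppressed
```

A learned NMS network would score and filter the boxes jointly from image features rather than by a fixed IoU threshold; the greedy version only illustrates the redundancy-removal behaviour.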
And S260, detecting initial position information of each key point from the regression frame in the current video frame.
Optionally, according to the position information of the regression frame, an image of the regression frame is captured from the current video frame, and then the image of the regression frame is input into the key point detection model, so as to obtain the position information of each key point.
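A minimal sketch of cropping the regression frame region before it is fed to the key point detection model (pure-Python row-major image representation; names illustrative):

```python
def crop_regression_box(frame, box):
    """Crop the regression box (x1, y1, x2, y2) out of a row-major image;
    the cropped patch is what the key point detection model would receive."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in frame[y1:y2]]

image = [[10 * r + c for c in range(5)] for r in range(4)]  # 4x5 test image
patch = crop_regression_box(image, (1, 1, 4, 3))
# patch spans rows 1-2 and columns 1-3 of the image
```

Restricting detection to this patch is what removes the interference from pixels outside the regression frame.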
In this embodiment, the regression frame including the key points in the second historical video frame is mapped to the current video frame via optical flow to obtain a reference regression frame of the current video frame, and the regression frame in the current video frame is obtained from the candidate frames and the reference regression frame. The regression frame in the current video frame is thus obtained, based on an optical flow algorithm, with the historical position of the regression frame as a reference, which effectively improves the accuracy of key point detection; the position information of the key points can also be detected accurately when key points in the current video frame are occluded or motion-blurred. Moreover, by detecting the initial position information of each key point within the regression frame of the current video frame, interference outside the regression frame is removed, further improving the accuracy of key point detection.
EXAMPLE III
Fig. 3 is a flowchart of a key point detection method provided in the third embodiment of the present disclosure. In this embodiment, each optional implementation manner of each embodiment is further optimized, and optionally, "the position information of each key point in the current video frame is respectively obtained according to the initial position information and the reference position information of each key point" is optimized to "the position information of each key point in the current video frame, of which the confidence coefficient meets a first preset requirement, is respectively obtained according to the confidence coefficient of the initial position information and the confidence coefficient of the reference position information of each key point", so that the position information of each key point in the current video frame is selected according to the confidence coefficient. With reference to fig. 3, the method provided in this embodiment specifically includes the following operations:
s310, selecting a current video frame and a first historical video frame before the current video frame from the video frame sequence displayed with the user image.
And S320, detecting initial position information of each key point from the current video frame.
S330, mapping the position information of each key point in the first historical video frame to the current video frame through the optical flow to obtain the reference position information of each key point in the current video frame.
And S340, respectively obtaining the position information of each key point, of which the confidence coefficient meets a first preset requirement, in the current video frame according to the confidence coefficient of the initial position information of each key point and the confidence coefficient of the reference position information.
In this embodiment, the key point detection model outputs, in addition to the position information of each key point, the confidence of each key point, that is, its classification probability. When mapping each key point in the first historical video frame to the current video frame, the confidence of that key point in the first historical video frame is also carried over. Each key point in the first historical video frame was detected by the key point detection model and therefore corresponds to a confidence, such as 0.9 or 0.8. Likewise, the initial position information of each key point in the current video frame is detected by the key point detection model and corresponds to its own confidence.
The key point detection model obtains the confidence of each key point by selecting the maximum value on the feature map corresponding to each key point or performing feature value integration.
For a keypoint, assuming that there are 1 initial position information and 10 reference position information, among the 11 position information, the position information of the keypoint with the confidence level meeting the first preset requirement is selected.
Optionally, from the confidences of the initial position information and the reference position information of each key point, the position information corresponding to the maximum confidence is selected as the position information of that key point in the current video frame. Alternatively, position information whose confidence is greater than or equal to a confidence threshold is selected as the position information of the key point in the current video frame. If, for a key point, two or more pieces of position information have confidences at or above the threshold, one of them may be selected, or the position information corresponding to the maximum confidence may be selected.
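The first option above (keeping the position with the maximum confidence) can be sketched as follows (names illustrative):

```python
def select_position(initial, references):
    """Pick the position whose confidence is highest among the initial
    detection and all reference positions mapped from historical frames.

    Each argument element is a ((x, y), confidence) pair.
    """
    return max([initial] + list(references), key=lambda pc: pc[1])[0]

pos = select_position(((50, 60), 0.7), [((52, 61), 0.9), ((49, 58), 0.4)])
# pos == (52, 61): the mapped reference with confidence 0.9 wins
```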
In the foregoing embodiment, the current video frame refers to each of the first history video frames to the same extent, that is, different reference position information and initial position information of the same key point are integrated to the same extent. In order to effectively distinguish each first historical video frame from the current video frame, differential reference of each video frame is realized, and the accuracy of key point detection is further improved.
Optionally, first, a first weight of the current video frame and a second weight of the first historical video frame are determined. If the number of the first historical video frames is one, determining a second weight of the first historical video frames; and if the number of the first historical video frames is more than two, respectively determining each second weight of each first historical video frame. Then, weighting the confidence coefficient of the initial position information of each key point by adopting a first weight value to obtain a first weighted confidence coefficient of each key point; and weighting the confidence coefficient of the reference position information of each key point by adopting a second weight value to obtain a second weighted confidence coefficient of each key point. And if the number of the first historical video frames is more than two, weighting the confidence coefficient of the reference position information in each first historical video frame by adopting each second weight value. And then, respectively selecting the position information corresponding to the weighted confidence coefficient meeting the second preset requirement from the first weighted confidence coefficient and the second weighted confidence coefficient of each key point as the position information of each key point in the current video frame. The second preset requirement comprises that the weighted confidence coefficient is maximum or is more than or equal to the weighted confidence coefficient threshold value. And if more than two weighted confidences which are more than or equal to the weighted confidence threshold value exist, selecting one weighted confidence from the weighted confidences, or selecting the maximum weighted confidence.
Alternatively, the first weight and the second weights may be obtained by a machine learning algorithm. First, a training set and a validation set are constructed, and the weights of the first historical video frame(s) and the current video frame are trained on the training set so that the position information corresponding to the maximum weighted confidence approaches the real position information of the key points in the current video frame. The trained weights are then cross-validated on the validation set to finally obtain the weights of the first historical video frame(s) and the current video frame. Assuming the first historical video frames comprise video frame A and video frame B with second weights 0.5 and 0.4 respectively, and the first weight is 0.8, then the confidence of the reference position information mapped from video frame A is multiplied by 0.5, the confidence of the reference position information mapped from video frame B is multiplied by 0.4, and the confidence of the initial position information of each key point in the current video frame is multiplied by 0.8. Then, for each key point, the position information corresponding to the highest weighted confidence is selected.
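Using the numbers from the example above (first weight 0.8 for the current frame, second weights 0.5 and 0.4 for historical frames A and B), the weighted selection can be sketched as:

```python
def select_weighted(initial, references, w_current, w_history):
    """Weight each confidence by its frame's weight, then keep the position
    with the highest weighted confidence.

    `initial` is a ((x, y), confidence) pair for the current frame;
    `references` pairs for the historical frames, ordered like `w_history`.
    """
    pos0, conf0 = initial
    scored = [(pos0, conf0 * w_current)]
    scored += [(pos, conf * w) for (pos, conf), w in zip(references, w_history)]
    return max(scored, key=lambda pc: pc[1])[0]

pos = select_weighted(((50, 60), 0.9),                      # current frame
                      [((52, 61), 0.95), ((49, 58), 0.9)],  # frames A and B
                      0.8, [0.5, 0.4])
# weighted confidences: 0.72 vs 0.475 and 0.36, so the current frame wins
```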
In this embodiment, the position information of each keypoint in the current video frame, for which the confidence coefficient meets the first preset requirement, is obtained according to the confidence coefficient of the initial position information of each keypoint and the confidence coefficient of the reference position information, and the confidence coefficient indirectly reflects the accuracy of the keypoint, so that the position information of the keypoint is obtained according to the confidence coefficient, and the accuracy of the keypoint detection can be improved.
Example four
Fig. 4 is a flowchart of a key point detection method according to a fourth embodiment of the present disclosure. In this embodiment, the optional implementations of the foregoing embodiments are further optimized. Optionally, "obtaining the position information of each key point in the current video frame according to the initial position information and the reference position information of each key point" is refined into: determining a third weight for the reference position information of each key point according to the reciprocal of the distance between the initial position information and the reference position information; and performing a weighted average of the reference position information and the initial position information using the third weight and a default weight to obtain the position information of each key point in the current video frame. The distances between positions are thus taken into account when obtaining the position information, improving the accuracy of key point detection. With reference to fig. 4, the method provided in this embodiment specifically includes the following operations:
s410, selecting a current video frame and a first historical video frame before the current video frame from a video frame sequence on which user images are displayed.
And S420, detecting initial position information of each key point from the current video frame.
And S430, mapping the position information of each key point in the first historical video frame to the current video frame through the optical flow to obtain the reference position information of each key point in the current video frame.
S440, respectively determining a third weight of the reference position information of each key point according to the reciprocal of the distance between the initial position information of each key point and the reference position information.
If a keypoint has one piece of reference position information, the reciprocal of the distance between that reference position information and the initial position information is determined as the third weight of the reference position information. If a keypoint has two or more pieces of reference position information, the reciprocal of the distance between each piece of reference position information and the initial position information is determined as the third weight of that piece of reference position information.
Assuming that the initial position information of a keypoint is A (X1, Y1), its reference position information is B (X2, Y2) and C (X3, Y3), the distance between A and B is L1, and the distance between A and C is L2, the third weight of reference position information B is determined as 1/L1 and the third weight of reference position information C is determined as 1/L2.
S450, weighted average is carried out on the reference position information and the initial position information respectively by adopting a third weight and a default weight, and the position information of each key point in the current video frame is obtained.
In this embodiment, the weight of the initial position information is set to a default weight, for example 1. Following the above example, with the weight of the initial position information A set to 1, the formulas

XC = (1 · X1 + (1/L1) · X2 + (1/L2) · X3) / (1 + 1/L1 + 1/L2)

YC = (1 · Y1 + (1/L1) · Y2 + (1/L2) · Y3) / (1 + 1/L1 + 1/L2)

yield the position information (XC, YC) of the keypoint in the current video frame.
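The inverse-distance weighted average above can be sketched as follows. The function name and the handling of a zero-distance reference (returning the coinciding reference directly) are assumptions for illustration; the patent text does not specify the degenerate case.

```python
import math

def fuse_positions(initial, references, default_weight=1.0):
    """Weighted average of the initial position and the mapped reference
    positions; each reference is weighted by 1/d, where d is its distance
    to the initial detection, and the initial detection gets the default
    weight (1 in the text's example)."""
    weights = [default_weight]
    points = [initial]
    for ref in references:
        d = math.dist(initial, ref)
        if d == 0:
            # Reference coincides with the detection: no averaging needed.
            return ref
        weights.append(1.0 / d)
        points.append(ref)
    total = sum(weights)
    x = sum(w * p[0] for w, p in zip(weights, points)) / total
    y = sum(w * p[1] for w, p in zip(weights, points)) / total
    return (x, y)

# Example: A = (0, 0), one reference B = (2, 0), so L1 = 2 and the
# third weight is 1/2; XC = (1*0 + 0.5*2) / (1 + 0.5) = 2/3.
xc, yc = fuse_positions((0.0, 0.0), [(2.0, 0.0)])
```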
In some embodiments, the distance and the confidence may be combined to obtain the position information of each keypoint in the current video frame, further improving the accuracy of keypoint detection. Optionally, a third weight is determined, for each piece of reference position information whose confidence satisfies the first preset requirement, from the reciprocal of its distance to the initial position information, and a weighted average of the reference position information and the initial position information is computed using the third weight and a default weight, respectively, to obtain the position information of each keypoint in the current video frame. Optionally, the third weight is instead determined, for each piece of reference position information whose weighted confidence satisfies the second preset requirement, from the reciprocal of its distance to the initial position information, and the weighted average is computed in the same way.
EXAMPLE five
Fig. 5 is a schematic structural diagram of a keypoint detection apparatus provided in the fifth embodiment of the present disclosure, including: a selecting module 51, a detecting module 52, a mapping module 53 and an obtaining module 54.
A selecting module 51, configured to select a current video frame and a first historical video frame before the current video frame from a sequence of video frames displaying user images;
a detecting module 52, configured to detect initial position information of each key point from a current video frame;
the mapping module 53 is configured to map, through an optical flow, the position information of each key point in the first historical video frame to the current video frame to obtain reference position information of each key point in the current video frame;
the obtaining module 54 is configured to obtain the position information of each key point in the current video frame according to the initial position information and the reference position information of each key point.
In this embodiment, the initial position information of each keypoint is detected in the current video frame, the position information of each keypoint in the first historical video frame is mapped to the current video frame through optical flow to obtain reference position information of each keypoint in the current video frame, and the position information of each keypoint in the current video frame is obtained from the initial position information and the reference position information of that keypoint. The historical positions of the keypoints thus serve, via an optical-flow algorithm, as references for locating the keypoints in the current video frame, which effectively improves the accuracy of keypoint detection; moreover, the position information of the keypoints can still be detected accurately when keypoints in the current video frame are occluded or motion-blurred.
Optionally, when the mapping module 53 maps the position information of each key point in the first historical video frame to the current video frame through the optical flow to obtain the reference position information of each key point in the current video frame, the mapping module is specifically configured to: determining a motion vector of a background image according to the position information of the background pixel point in the first historical video frame and the initial position information in the current video frame; and respectively determining the reference position information of each key point in the current video frame according to the position information of each key point in the first historical video frame and the motion vector of the background image.
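The background-motion mapping performed by the mapping module can be sketched as below. This is a simplified illustration: it assumes the matched background pixel positions in the two frames are already available (e.g. from an optical-flow tracker), and it summarizes the background motion as the mean displacement of those pixels; the function and variable names are hypothetical.

```python
def map_keypoints_by_background_motion(bg_prev, bg_curr, keypoints_prev):
    """bg_prev / bg_curr: matched background pixel positions in the
    historical frame and the current frame; keypoints_prev: keypoint
    positions in the historical frame. Returns the reference positions
    of the keypoints in the current frame."""
    # Motion vector of the background image: mean displacement of the
    # matched background pixels between the two frames.
    n = len(bg_prev)
    dx = sum(c[0] - p[0] for p, c in zip(bg_prev, bg_curr)) / n
    dy = sum(c[1] - p[1] for p, c in zip(bg_prev, bg_curr)) / n
    # Shift each historical keypoint by the background motion vector.
    return [(x + dx, y + dy) for x, y in keypoints_prev]

# Two background pixels each moved by (+1, +2), so a keypoint at (5, 5)
# in the historical frame maps to (6, 7) in the current frame.
refs = map_keypoints_by_background_motion(
    bg_prev=[(0, 0), (10, 0)],
    bg_curr=[(1, 2), (11, 2)],
    keypoints_prev=[(5, 5)],
)
```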
Optionally, the selecting module 51 is further configured to select a second historical video frame before the current video frame. When the detecting module 52 detects the initial position information of each key point from the current video frame, it is specifically configured to: determining a candidate frame comprising each key point in a current video frame; mapping a regression frame including each key point in the second historical video frame to the current video frame through the optical flow to obtain a reference regression frame of the current video frame; obtaining a regression frame in the current video frame according to the candidate frame and the reference regression frame; and detecting initial position information of each key point from a regression frame in the current video frame.
Optionally, when obtaining the regression frame in the current video frame according to the candidate frame and the reference regression frame, the detecting module 52 is specifically configured to: perform non-maximum suppression on the candidate frame and the reference regression frame to obtain the regression frame in the current video frame.
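A minimal greedy non-maximum suppression over the pooled candidate boxes and reference regression boxes might look like the sketch below. The (x1, y1, x2, y2) box format, the IoU threshold, and the scores are illustrative assumptions, not values from the patent.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes in descending score order and keep a box
    only if it does not overlap an already-kept box above the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_threshold for j in keep):
            keep.append(i)
    return [boxes[i] for i in keep]

# The first two boxes overlap heavily (IoU ≈ 0.68), so the lower-scored
# one is suppressed; the distant third box survives.
kept = nms([(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)],
           [0.9, 0.8, 0.7])
```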
Optionally, the detecting module 52 is specifically configured to, when mapping the regression frame including each key point in the second historical video frame to the current video frame through the optical flow to obtain a reference regression frame of the current video frame: determining a motion vector of a background image according to the position information of the background pixel point in the second historical video frame and the initial position information in the current video frame; and determining a reference regression frame comprising the key points in the current video frame according to the position information of the regression frame comprising the key points in the second historical video frame and the motion vector of the background image.
Optionally, when the obtaining module 54 obtains the position information of each key point in the current video frame according to the initial position information and the reference position information of each key point, specifically: and respectively obtaining the position information of each key point, the confidence of which meets a first preset requirement, in the current video frame according to the confidence of the initial position information of each key point and the confidence of the reference position information. Optionally, the first preset requirement includes that the confidence is maximum, or the confidence is greater than or equal to a confidence threshold.
Optionally, when the obtaining module 54 obtains the position information of each keypoint, whose confidence level meets the first preset requirement, in the current video frame according to the confidence level of the initial position information of each keypoint and the confidence level of the reference position information, specifically: determining a first weight of a current video frame and a second weight of a first historical video frame; weighting the confidence coefficient of the initial position information of each key point by adopting a first weight value to obtain a first weighted confidence coefficient of each key point; weighting the confidence coefficient of the reference position information of each key point by adopting a second weight value to obtain a second weighted confidence coefficient of each key point; and respectively selecting the position information corresponding to the weighted confidence coefficient meeting the second preset requirement from the first weighted confidence coefficient and the second weighted confidence coefficient of each key point as the position information of each key point in the current video frame.
Optionally, when the obtaining module 54 obtains the position information of each key point in the current video frame according to the initial position information and the reference position information of each key point, specifically: respectively determining a third weight of the reference position information of each key point according to the reciprocal of the distance between the initial position information of each key point and the reference position information; and respectively carrying out weighted average on the reference position information and the initial position information by adopting a third weight and a default weight to obtain the position information of each key point in the current video frame.
The key point detection device provided by the embodiment of the disclosure can execute the key point detection method provided by any embodiment of the disclosure, and has corresponding functional modules and beneficial effects of the execution method.
EXAMPLE six
Referring now to FIG. 6, a block diagram of an electronic device 600 suitable for use in implementing embodiments of the present disclosure is shown. The electronic device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a fixed terminal such as a digital TV, a desktop computer, and the like, or various forms of servers such as a stand-alone server or a server cluster. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 6, the electronic device 600 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the electronic device 600 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program containing program code for performing the methods illustrated by the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 609, or installed from the storage device 608, or installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory device (RAM), a read-only memory device (ROM), an erasable programmable read-only memory device (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory device (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the processing device, cause the electronic device to: selecting a current video frame and a first historical video frame before the current video frame from a video frame sequence on which user images are displayed; detecting initial position information of each key point from a current video frame; mapping the position information of each key point in the first historical video frame to the current video frame through the optical flow to obtain the reference position information of each key point in the current video frame; and respectively obtaining the position information of each key point in the current video frame according to the initial position information and the reference position information of each key point.
Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented by software or hardware. The name of a module does not constitute a limitation to the module itself in some cases, for example, a selection module may also be described as a "module that selects a current video frame as well as a historical video frame".
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other embodiments in which any combination of the features described above or their equivalents does not depart from the spirit of the disclosure. For example, the above features and (but not limited to) the features disclosed in this disclosure having similar functions are replaced with each other to form the technical solution.

Claims (11)

1. A method for detecting a keypoint, comprising:
selecting a current video frame and a first historical video frame before the current video frame from a video frame sequence on which user images are displayed;
detecting initial position information of each key point from a current video frame;
determining a motion vector of a background image according to the position information of the background pixel point in the first historical video frame and the initial position information in the current video frame;
respectively determining reference position information of each key point in the current video frame according to the position information of each key point in the first historical video frame and the motion vector of the background image;
and respectively obtaining the position information of each key point in the current video frame according to the initial position information and the reference position information of each key point.
2. The method of claim 1, further comprising, before detecting initial position information of each keypoint from a current video frame:
selecting a second historical video frame before the current video frame;
the detecting initial position information of each key point from the current video frame includes:
determining a candidate frame comprising each key point in a current video frame;
mapping a regression frame including each key point in the second historical video frame to the current video frame through the optical flow to obtain a reference regression frame of the current video frame;
obtaining a regression frame in the current video frame according to the candidate frame and the reference regression frame;
and detecting initial position information of each key point from a regression frame in the current video frame.
3. The method of claim 2, wherein obtaining the regression frame in the current video frame according to the candidate frame and the reference regression frame comprises:
and carrying out non-maximum value inhibition on the candidate frame and the reference regression frame to obtain the regression frame in the current video frame.
4. The method of claim 2, wherein the mapping the regression frame including the key points in the second historical video frame to the current video frame by optical flow to obtain a reference regression frame of the current video frame comprises:
determining a motion vector of a background image according to the position information of the background pixel point in the second historical video frame and the initial position information in the current video frame;
and determining a reference regression frame comprising each key point in the current video frame according to the position information of the regression frame comprising each key point in the second historical video frame and the motion vector of the background image.
5. The method according to claim 1, wherein the obtaining the position information of each key point in the current video frame according to the initial position information and the reference position information of each key point respectively comprises:
and respectively obtaining the position information of each key point, the confidence of which meets a first preset requirement, in the current video frame according to the confidence of the initial position information of each key point and the confidence of the reference position information.
6. The method according to claim 5, wherein the first preset requirement comprises that the confidence is maximum, or that the confidence is greater than or equal to a confidence threshold.
7. The method according to claim 5, wherein the obtaining, according to the confidence degrees of the initial position information and the reference position information of each keypoint, the position information of each keypoint in the current video frame whose confidence degree satisfies a first preset requirement, respectively, comprises:
determining a first weight of a current video frame and a second weight of a first historical video frame;
weighting the confidence coefficient of the initial position information of each key point by adopting a first weight value to obtain a first weighted confidence coefficient of each key point;
weighting the confidence coefficient of the reference position information of each key point by adopting a second weight value to obtain a second weighted confidence coefficient of each key point;
and respectively selecting the position information corresponding to the weighted confidence coefficient meeting the second preset requirement from the first weighted confidence coefficient and the second weighted confidence coefficient of each key point as the position information of each key point in the current video frame.
8. The method according to claim 1, wherein the obtaining the position information of each key point in the current video frame according to the initial position information and the reference position information of each key point respectively comprises:
respectively determining a third weight of the reference position information of each key point according to the reciprocal of the distance between the initial position information of each key point and the reference position information;
and respectively carrying out weighted average on the reference position information and the initial position information by adopting a third weight and a default weight to obtain the position information of each key point in the current video frame.
9. A keypoint detection device, comprising:
the selection module is used for selecting a current video frame and a first historical video frame before the current video frame from a video frame sequence displaying user images;
the detection module is used for detecting the initial position information of each key point from the current video frame;
the mapping module is used for mapping the position information of each key point in the first historical video frame to the current video frame through the optical flow to obtain the reference position information of each key point in the current video frame;
the acquisition module is used for respectively acquiring the position information of each key point in the current video frame according to the initial position information and the reference position information of each key point;
the mapping module is specifically configured to, when mapping the position information of each key point in the first history video frame to the current video frame through the optical flow to obtain the reference position information of each key point in the current video frame: determining a motion vector of a background image according to the position information of the background pixel point in the first historical video frame and the initial position information in the current video frame; and respectively determining the reference position information of each key point in the current video frame according to the position information of each key point in the first historical video frame and the motion vector of the background image.
10. An electronic device, characterized in that the electronic device comprises:
one or more processing devices;
a storage device for storing one or more programs,
when executed by the one or more processing devices, cause the one or more processing devices to implement the keypoint detection method of any of claims 1-8.
11. A computer-readable medium, on which a computer program is stored, which, when being executed by a processing means, carries out the keypoint detection method according to any one of claims 1 to 8.
CN201811473824.5A 2018-12-04 2018-12-04 Key point detection method, device, equipment and readable medium Active CN109583391B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811473824.5A CN109583391B (en) 2018-12-04 2018-12-04 Key point detection method, device, equipment and readable medium

Publications (2)

Publication Number Publication Date
CN109583391A CN109583391A (en) 2019-04-05
CN109583391B true CN109583391B (en) 2021-07-16

Family

ID=65926914

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811473824.5A Active CN109583391B (en) 2018-12-04 2018-12-04 Key point detection method, device, equipment and readable medium

Country Status (1)

Country Link
CN (1) CN109583391B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111027412B (en) * 2019-11-20 2024-03-08 北京奇艺世纪科技有限公司 Human body key point identification method and device and electronic equipment
CN113255411A (en) * 2020-02-13 2021-08-13 北京百度网讯科技有限公司 Target detection method and device, electronic equipment and storage medium
CN111401228B (en) * 2020-03-13 2023-12-19 中科创达软件股份有限公司 Video target labeling method and device and electronic equipment
CN112066988B (en) * 2020-08-17 2022-07-26 联想(北京)有限公司 Positioning method and positioning equipment
CN114170632A (en) * 2021-12-03 2022-03-11 北京字节跳动网络技术有限公司 Image processing method and device, electronic equipment and storage medium
CN113887547B (en) * 2021-12-08 2022-03-08 北京世纪好未来教育科技有限公司 Key point detection method and device and electronic equipment
CN116630375A (en) * 2022-02-10 2023-08-22 腾讯科技(深圳)有限公司 Processing method and related device for key points in image
CN115511818B (en) * 2022-09-21 2023-06-13 北京医准智能科技有限公司 Optimization method, device, equipment and storage medium of lung nodule detection model

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104376576A (en) * 2014-09-04 2015-02-25 华为技术有限公司 Target tracking method and device
CN105447432A (en) * 2014-08-27 2016-03-30 北京千搜科技有限公司 Face anti-fake method based on local motion pattern

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006122009A2 (en) * 2005-05-09 2006-11-16 Lockheed Martin Corporation Continuous extended range image processing
WO2015146813A1 (en) * 2014-03-28 2015-10-01 株式会社ソニー・コンピュータエンタテインメント Object manipulation method, object manipulation program, and information processing device
CN104408743A (en) * 2014-11-05 2015-03-11 百度在线网络技术(北京)有限公司 Image segmentation method and device
US9367897B1 (en) * 2014-12-11 2016-06-14 Sharp Laboratories Of America, Inc. System for video super resolution using semantic components
US9613273B2 (en) * 2015-05-19 2017-04-04 Toyota Motor Engineering & Manufacturing North America, Inc. Apparatus and method for object tracking
CN104933735A (en) * 2015-06-30 2015-09-23 中国电子科技集团公司第二十九研究所 A real time human face tracking method and a system based on spatio-temporal context learning
US9794588B2 (en) * 2015-09-30 2017-10-17 Sony Corporation Image processing system with optical flow recovery mechanism and method of operation thereof
CN106780557B (en) * 2016-12-23 2020-06-09 南京邮电大学 Moving object tracking method based on optical flow method and key point features
US10628675B2 (en) * 2017-02-07 2020-04-21 Fyusion, Inc. Skeleton detection and tracking via client-server communication
CN108229282A (en) * 2017-05-05 2018-06-29 SenseTime Group Limited Key point detection method, apparatus, storage medium and electronic equipment
CN108205655B (en) * 2017-11-07 2020-08-11 Beijing SenseTime Technology Development Co., Ltd. Key point prediction method and device, electronic equipment and storage medium
CN108280444B (en) * 2018-02-26 2021-11-16 Jiangsu Yulan Information Technology Co., Ltd. Method for detecting a fast-moving object based on a vehicle surround view
CN108898118B (en) * 2018-07-04 2023-04-18 Tencent Technology (Shenzhen) Co., Ltd. Video data processing method, device and storage medium

Also Published As

Publication number Publication date
CN109583391A (en) 2019-04-05

Similar Documents

Publication Publication Date Title
CN109583391B (en) Key point detection method, device, equipment and readable medium
CN109584276B (en) Key point detection method, device, equipment and readable medium
US10311579B2 (en) Apparatus and method for detecting foreground in image
CN111641835B (en) Video processing method, video processing device and electronic equipment
CN110070063B (en) Target object motion recognition method and device and electronic equipment
CN110059623B (en) Method and apparatus for generating information
CN111107278B (en) Image processing method and device, electronic equipment and readable storage medium
CN111589138B (en) Action prediction method, device, equipment and storage medium
CN110796664A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN111402122A (en) Image mapping processing method and device, readable medium and electronic equipment
CN111104827A (en) Image processing method and device, electronic equipment and readable storage medium
CN111310595B (en) Method and device for generating information
CN110189364B (en) Method and device for generating information, and target tracking method and device
CN112380929A (en) Highlight segment obtaining method and device, electronic equipment and storage medium
CN114422698B (en) Video generation method, device, equipment and storage medium
CN115393423A (en) Target detection method and device
CN115187510A (en) Loop detection method, device, electronic equipment and medium
CN112085733B (en) Image processing method, image processing device, electronic equipment and computer readable medium
CN111680754B (en) Image classification method, device, electronic equipment and computer readable storage medium
CN112115740B (en) Method and apparatus for processing image
CN110807728B (en) Object display method and device, electronic equipment and computer-readable storage medium
CN109842738B (en) Method and apparatus for photographing image
CN114120423A (en) Face image detection method and device, electronic equipment and computer readable medium
CN110717467A (en) Head pose estimation method, device, equipment and storage medium
CN111860209B (en) Hand recognition method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant