CN112926356A - Target tracking method and device - Google Patents
- Publication number: CN112926356A (application CN201911236052.8A)
- Authority: CN (China)
- Prior art keywords: frame image, target, frame, image, target detection
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V40/161: Human faces; detection, localisation, normalisation
- G06T7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06V10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
- G06V10/464: Salient features, e.g. scale invariant feature transforms [SIFT], using a plurality of salient features, e.g. bag-of-words [BoW] representations
- G06V40/168: Human faces; feature extraction, face representation
- G06T2207/10004: Still image; photographic image
- G06T2207/30201: Human being; face
- G06V2201/07: Target detection
Abstract
The invention discloses a target tracking method and device in the field of computer technology. One embodiment of the method comprises: performing target detection on the k-th frame image and determining a target detection box in the k-th frame image, where k is an integer greater than or equal to 1; determining a plurality of target keypoints in the k-th frame image and a plurality of target keypoints in the (k+1)-th frame image, respectively, based on the target detection box of the k-th frame image; determining the average displacement between the target keypoints in the k-th frame image and those in the (k+1)-th frame image; and, when the average displacement is less than or equal to a threshold, correcting the target detection box of the k-th frame image by the average displacement and using the corrected box as the target detection box for the (k+2)-th frame image, thereby realizing target tracking. The method and device avoid running target detection on every frame, which is time-consuming and cannot meet real-time requirements; detection efficiency is thereby improved, and the method suits application scenarios with high real-time demands.
Description
Technical Field
The invention relates to the technical field of computers, in particular to a target tracking method and device.
Background
Target tracking is an important part of automatic identification systems, and the technique is widely applied. It generally means that, for any given image, a search strategy is applied to determine whether the image contains a target (such as a human face); if it does, the position, size and other attributes of the target can be returned.
In the process of implementing the invention, the inventors found at least the following problems in the prior art. Existing target tracking algorithms fall mainly into traditional algorithms and deep-learning algorithms. Traditional algorithms, such as KCF (Kernelized Correlation Filter) and related correlation-filtering methods, take a given target to be tracked and locate the maximum filter response in the image, thereby realizing target tracking. Deep-learning algorithms regress the position of the target in the image from extracted target features. However, both approaches are computationally expensive and demand high performance; given the limited performance of mobile terminals, they are difficult to deploy and run in real time on mobile devices.
Disclosure of Invention
In view of this, embodiments of the present invention provide a target tracking method and apparatus, which can solve the problems that each frame of image needs target detection, is long in time consumption, and cannot meet a real-time requirement, thereby improving detection efficiency, and being suitable for an application scenario with a high real-time requirement.
To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a target tracking method including:
carrying out target detection on the kth frame image, and determining a target detection frame in the kth frame image; wherein k is an integer greater than or equal to 1;
respectively determining a plurality of target key points in the kth frame image and a plurality of target key points in the (k + 1) th frame image based on the target detection frame in the kth frame image;
determining the average displacement of the target key point in the kth frame image and the target key point in the (k + 1) th frame image;
and when the average displacement is smaller than or equal to a threshold value, correcting a target detection frame in the k frame image through the average displacement, and using the corrected target detection frame as a target detection frame of the k +2 frame image to realize target tracking.
Optionally, the method further comprises: and when the average displacement is larger than a threshold value, carrying out target detection on the (k + 2) th frame image, and determining a target detection frame in the (k + 2) th frame image so as to realize target tracking.
Optionally, determining an average displacement of the target keypoint in the k frame image and the target keypoint in the k +1 frame image comprises:
respectively determining the average positions of a plurality of target key points in the k frame image and the average positions of a plurality of target key points in the k +1 frame image;
and calculating the displacement difference between the average position of the plurality of target key points in the k +1 frame image and the average position of the plurality of target key points in the k frame image, and taking the displacement difference as the average displacement of the target key points in the k frame image and the target key points in the k +1 frame image.
Optionally, the correcting the target detection frame in the k frame image by the average displacement includes:
and translating the target detection frame in the k frame image according to the average displacement.
To achieve the above object, according to another aspect of embodiments of the present invention, there is provided an object tracking apparatus including:
the detection frame determining module is used for carrying out target detection on the kth frame image and determining a target detection frame in the kth frame image; wherein k is an integer greater than or equal to 1;
a key point determining module, configured to determine, based on the target detection frame in the kth frame image, a plurality of target key points in the kth frame image and a plurality of target key points in the (k + 1) th frame image, respectively;
a displacement determining module, configured to determine an average displacement between a target key point in the k frame image and a target key point in the (k + 1) th frame image;
and the tracking module is used for correcting the target detection frame in the k frame image through the average displacement when the average displacement is less than or equal to a threshold value, and using the corrected target detection frame as the target detection frame of the k +2 frame image so as to realize target tracking.
Optionally, the tracking module is further configured to: and when the average displacement is larger than a threshold value, carrying out target detection on the (k + 2) th frame image, and determining a target detection frame in the (k + 2) th frame image so as to realize target tracking.
Optionally, the displacement determining module is further configured to:
respectively determining the average positions of a plurality of target key points in the k frame image and the average positions of a plurality of target key points in the k +1 frame image;
and calculating the displacement difference between the average position of the plurality of target key points in the k +1 frame image and the average position of the plurality of target key points in the k frame image, and taking the displacement difference as the average displacement of the target key points in the k frame image and the target key points in the k +1 frame image.
Optionally, the tracking module is further configured to: and translating the target detection frame in the k frame image according to the average displacement.
To achieve the above object, according to still another aspect of an embodiment of the present invention, there is provided an electronic apparatus including: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the object tracking method of an embodiment of the present invention.
To achieve the above object, according to still another aspect of an embodiment of the present invention, there is provided a computer-readable medium on which a computer program is stored, the program implementing the object tracking method of an embodiment of the present invention when executed by a processor.
One embodiment of the above invention has the following advantages or beneficial effects. Because the target keypoints in the k-th frame image and in the (k+1)-th frame image are both determined from the target detection box of the k-th frame image, that is, the detection box of the k-th frame image serves as the detection box of the (k+1)-th frame image, no target detection is performed on the (k+1)-th frame image; this saves a detection pass, speeds up the whole tracking process, saves time, and improves efficiency. Likewise, when the average displacement between the target keypoints in the k-th and (k+1)-th frame images is less than or equal to the threshold, the detection box of the k-th frame image is corrected by the average displacement and used as the detection box of the (k+2)-th frame image, so the (k+2)-th frame image also skips target detection, further accelerating the whole process. The target tracking method of the embodiment of the invention therefore avoids performing target detection on every frame, overcoming the prior-art problems that per-frame detection is time-consuming and cannot meet real-time requirements, improving detection efficiency, and suiting application scenarios with high real-time demands.
Further effects of the above optional features are described below in connection with the specific embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic diagram of a main flow of a target tracking method of an embodiment of the present invention;
FIG. 2 is a schematic diagram of the major modules of a target tracking device of an embodiment of the present invention;
FIG. 3 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
FIG. 4 is a schematic block diagram of a computer system suitable for implementing a terminal device or server of an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic diagram of a main flow of a target tracking method according to an embodiment of the present invention, as shown in fig. 1, the method includes:
step S101: carrying out target detection on the kth frame image, and determining a target detection frame in the kth frame image; wherein k is an integer greater than or equal to 1.
In this embodiment, the target may be a human face in the image, or may also be a vehicle or other objects in the image, and the present invention is not limited herein.
In this step, target detection obtains the position at which the target appears in the image, yielding the target detection box. As an example, the SSD detection algorithm can be used to obtain the position of the target detection box as (x, y, w, h)_box, where (x, y)_box are the coordinates of the upper-left corner of the target detection box and w_box, h_box are its width and height, respectively. The SSD (Single Shot MultiBox Detector) algorithm is a regression-based deep convolutional neural network object detection algorithm.
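As an illustration, this detection step and the box convention it produces can be sketched as follows; the `detector` callable is an assumption standing in for a trained SSD-style model, not the authors' actual implementation.

```python
import numpy as np

def detect_target(image, detector):
    """Run a detector on a frame and return the target detection box
    (x, y, w, h): (x, y) is the upper-left corner, (w, h) the width
    and height. `detector` is a placeholder for e.g. an SSD model."""
    x, y, w, h = detector(image)
    return np.array([x, y, w, h], dtype=float)
```

The remaining steps only need the box in this (x, y, w, h) form; the detector itself can be swapped for any model with the same output.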
Step S102: and respectively determining a plurality of target key points in the k frame image and a plurality of target key points in the (k + 1) th frame image based on the target detection frame in the k frame image.
The purpose of this step is to obtain point coordinates of a specific location of an object in an image, for example to obtain point coordinates of a specific location of a human face in an image. In the present embodiment, a plurality of target key points in the k-th frame image and a plurality of target key points in the k + 1-th frame image correspond to each other.
Specifically, the target keypoints can be obtained as follows. The target position in the image is obtained from the target detection box produced by the target detection algorithm; the detection box is used to crop the target region out of the image, yielding a target image; finally, the target image is input into a target keypoint detection model to obtain the target keypoints. The target keypoint detection model is a deep learning model obtained through training: from target images and their corresponding target keypoints, a mapping from images to points is learned, denoted f. When the model is used to detect target keypoints, inputting the image data into the model directly yields the keypoint positions. In a specific embodiment, the number of target keypoints may be set to 106.
Let I_k denote the image input at the k-th frame and f the image-to-keypoints mapping model; target keypoint detection can then be expressed by formula (1):

f(I_k) = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}_k    (1)

where (x_i, y_i), i = 1..n, are the keypoints output by the target keypoint detection model.
In this step, the target detection box of the k-th frame image is used as the target detection box of the (k+1)-th frame image; that is, no target detection is performed on the (k+1)-th frame image, which saves a detection pass, speeds up the whole tracking process, saves time, and improves efficiency. Although the position of the detection box in the k-th frame image may deviate from the actual target position in the (k+1)-th frame image, so that cropping the (k+1)-th frame image with the k-th frame's box introduces some offset, the target keypoint model has a certain generalization ability: even if the target image fed into it is offset by some pixels, the model can still output the keypoint positions correctly.
Step S103: and determining the average displacement of the target key point in the k frame image and the target key point in the k +1 frame image.
Specifically, the method comprises the following steps:
respectively determining the average positions of a plurality of target key points in the k frame image and the average positions of a plurality of target key points in the k +1 frame image;
and calculating the displacement difference between the average position of the plurality of target key points in the k +1 frame image and the average position of the plurality of target key points in the k frame image, and taking the displacement difference as the average displacement of the target key points in the k frame image and the target key points in the k +1 frame image.
The average position of the target keypoints in the k-th frame image is calculated according to formula (2):

c_k = (1/n) * Σ_{i=1..n} p_{k,i}    (2)

where c_k denotes the average position of the n target keypoints in the k-th frame image and p_{k,i} denotes the position of the i-th target keypoint in the k-th frame image.

The average position of the target keypoints in the (k+1)-th frame image is calculated according to formula (3):

c_{k+1} = (1/n) * Σ_{i=1..n} p_{k+1,i}    (3)

where c_{k+1} denotes the average position of the n target keypoints in the (k+1)-th frame image and p_{k+1,i} denotes the position of the i-th target keypoint in the (k+1)-th frame image.

The displacement difference between the average position of the target keypoints in the (k+1)-th frame image and that in the k-th frame image is calculated according to formula (4):

d = c_{k+1} - c_k    (4)

where d denotes this displacement difference, i.e. the average displacement.
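Formulas (2) to (4) amount to averaging the keypoint coordinates of each frame and subtracting the two means; a short NumPy sketch (the (n, 2) array shape is an assumption):

```python
import numpy as np

def average_displacement(points_k, points_k1):
    """Formulas (2)-(4): mean keypoint position in frame k and in
    frame k+1, and the displacement between the two means."""
    c_k = np.mean(np.asarray(points_k, dtype=float), axis=0)    # formula (2)
    c_k1 = np.mean(np.asarray(points_k1, dtype=float), axis=0)  # formula (3)
    return c_k1 - c_k                                           # formula (4)
```

Averaging before subtracting makes the result robust to noise in any single keypoint.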
Step S104: comparing the average displacement with the threshold. In this step, the threshold can be set flexibly according to the application scenario; the invention is not limited here.
Step S105: and when the average displacement is smaller than or equal to a threshold value, correcting a target detection frame in the k frame image through the average displacement, and using the corrected target detection frame as a target detection frame of the k +2 frame image to realize target tracking.
In this embodiment, if the average displacement is less than or equal to the threshold, meaning that the target has moved little between the k-th and (k+1)-th frame images, the target detection box of the k-th frame image is translated, as shown in formula (5):

(x, y)'_box = (x, y)_box + d    (5)

where (x, y)_box denotes the position of the target detection box in the k-th frame image and (x, y)'_box denotes the position of the corrected target detection box.
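The correction of formula (5) is a pure translation of the box corner; width and height stay unchanged. A sketch, assuming the (x, y, w, h) box convention from the detection step:

```python
def correct_box(box, displacement):
    """Formula (5): translate the frame-k detection box by the average
    displacement d; the corrected box is reused for frame k+2."""
    x, y, w, h = box
    dx, dy = displacement
    return (x + dx, y + dy, w, h)
```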
Step S106: and when the average displacement is larger than a threshold value, carrying out target detection on the (k + 2) th frame image, and determining a target detection frame in the (k + 2) th frame image so as to realize target tracking.
In this embodiment, if the average displacement is greater than the threshold, meaning that the target has moved substantially between the k-th and (k+1)-th frame images, target detection must be performed anew on the (k+2)-th frame image to obtain its target detection box.
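Putting steps S101 to S106 together, one possible loop looks like the following; the detector and keypoint-model interfaces are assumptions, and the Euclidean norm of the displacement is used for the threshold comparison, which the patent does not specify.

```python
import numpy as np

def keypoints_in_box(frame, box, keypoint_model):
    """Crop the frame with box = (x, y, w, h) and return the model's
    keypoints mapped back to full-frame coordinates."""
    x, y, w, h = (int(round(v)) for v in box)
    pts = np.asarray(keypoint_model(frame[y:y + h, x:x + w]), dtype=float)
    return pts + np.array([x, y], dtype=float)

def track(frames, detector, keypoint_model, threshold):
    """Sketch of steps S101-S106: returns the detection box used for each
    frame. Detection runs on frame k (S101); frames k and k+1 share that
    box (S102); the mean-keypoint displacement (S103) decides whether
    frame k+2 reuses a translated box (S105) or is re-detected (S106)."""
    box = detector(frames[0])                                   # S101
    boxes = []
    k = 0
    while k + 1 < len(frames):
        pts_k = keypoints_in_box(frames[k], box, keypoint_model)        # S102
        pts_k1 = keypoints_in_box(frames[k + 1], box, keypoint_model)
        d = pts_k1.mean(axis=0) - pts_k.mean(axis=0)                    # S103
        boxes.extend([box, box])
        if np.linalg.norm(d) <= threshold:                      # S104/S105
            box = (box[0] + d[0], box[1] + d[1], box[2], box[3])
        elif k + 2 < len(frames):                               # S106
            box = detector(frames[k + 2])
        k += 2
    return boxes
```

In this sketch the full detector runs only on frame 0 and on frames whose predecessors moved more than the threshold, which is the efficiency gain the embodiment claims.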
According to the target tracking method described above, the target keypoints in the k-th frame image and in the (k+1)-th frame image are both determined from the target detection box of the k-th frame image; in other words, the detection box of the k-th frame image serves as the detection box of the (k+1)-th frame image, so no target detection is performed on the (k+1)-th frame image. This saves a detection pass, speeds up the whole tracking process, saves time, and improves efficiency. When the average displacement between the target keypoints of the k-th and (k+1)-th frame images is less than or equal to the threshold, the detection box of the k-th frame image is corrected by the average displacement and used as the detection box of the (k+2)-th frame image, so the (k+2)-th frame image also skips target detection, further accelerating the whole process. The target tracking method of the embodiment of the invention thus avoids performing target detection on every frame, overcoming the prior-art problems that per-frame detection is time-consuming and cannot meet real-time requirements, improving detection efficiency, and suiting application scenarios with high real-time demands.
To make the target tracking method of the embodiment of the invention clearer, the processing procedure is described again, taking the tracking of a face in images as an example:
(1) The processing procedure for the k-th frame image is as follows: obtain a face box through the SSD (Single Shot MultiBox Detector) detection algorithm, crop the face image using the face box, input the face image into a face keypoint detection model to obtain a plurality of face keypoints, and calculate the average position of the face keypoints.
(2) The processing procedure for the (k+1)-th frame image is as follows: use the face box of the k-th frame image to crop the face image from the (k+1)-th frame image, input it into the face keypoint detection model to obtain the face keypoints, and calculate their average position. Note that the face box used for the (k+1)-th frame image is the face box of the k-th frame image, not a box detected in the (k+1)-th frame image itself; the (k+1)-th frame image is not passed through the SSD algorithm, which saves the SSD detection step for this frame. Although the position of the k-th frame's face box may deviate slightly from the actual face position in the (k+1)-th frame image, so that the face cropped from the (k+1)-th frame image is somewhat offset, the face keypoint model has a certain generalization ability: even if the face in the input image is offset by some pixels, the model can still output the keypoint positions correctly.
(3) Calculate the average displacement of the face keypoints between the k-th and (k+1)-th frame images. If the average displacement is less than or equal to the threshold, the face has moved little between the two frames; the face box of the k-th frame image is then corrected by this average displacement and used as the face box of the (k+2)-th frame image. If the average displacement is greater than the threshold, the face has moved too much between the k-th and (k+1)-th frame images; simply translating the box by the average displacement could yield a face box that no longer matches the face in the (k+2)-th frame image, and cropping the (k+2)-th frame image with it would cause problems. Therefore, when the average displacement is greater than the threshold, face detection is performed anew on the (k+2)-th frame image. When the (k+2)-th frame image is then cropped and keypoint detection is performed on it, the face box and the image correspond exactly even if the face moves fast, and the face image cropped from the (k+2)-th frame image is correct, so tracking remains reliable.
It is worth noting that the corrected face box corresponds to the face position in the (k+1)-th frame image; relative to the (k+2)-th frame image, it may not be centered exactly on the face. However, owing to the generalization ability of the face keypoint model, even if the face in the input image is offset by some pixels, the model can still output the keypoint positions correctly. In this way, the face tracking method of the embodiment of the invention reduces the number of frames on which face detection must be run.
(4) The processing flow for the k +2 frame image is:
a. If the average displacement of the face keypoints between the (k+1)-th and k-th frame images is less than or equal to the threshold, the face has moved little; the face box of the k-th frame image is directly translated by the average displacement to obtain the face box for the (k+2)-th frame image, the translated box is used to crop the (k+2)-th frame image, and the cropped face image is input into the face keypoint model.
b. If the average displacement of the face keypoints between the (k+1)-th and k-th frame images is greater than the threshold, a directly translated box may be inaccurate for the (k+2)-th frame image; cropping the (k+2)-th frame image with it could produce a large offset, and the face keypoint model might not obtain a correct result. Therefore, when the average displacement is greater than the threshold, SSD detection is performed directly on the (k+2)-th frame image, and the resulting face box gives the position of the face in the (k+2)-th frame image.
According to the face tracking flow above, the face keypoints in the k-th frame image and in the (k+1)-th frame image are both determined from the face box of the k-th frame image, so the (k+1)-th frame image undergoes no face detection; this saves a detection pass, speeds up the whole face tracking process, saves time, and improves efficiency. When the average displacement between the face keypoints of the k-th and (k+1)-th frame images is less than or equal to the threshold, the face box of the k-th frame image is corrected by the average displacement and used as the face box of the (k+2)-th frame image, so the (k+2)-th frame image also skips face detection, further accelerating the process. The face tracking method of the embodiment of the invention thus avoids face detection on every frame, overcoming the prior-art problems that per-frame detection is time-consuming and cannot meet real-time requirements, improving detection efficiency, and suiting application scenarios with high real-time demands.
Fig. 2 is a schematic diagram of the main modules of an object tracking apparatus 200 according to an embodiment of the present invention. As shown in Fig. 2, the apparatus 200 includes:
a detection frame determining module 201, configured to perform target detection on a k-th frame image, and determine a target detection frame in the k-th frame image; wherein k is an integer greater than or equal to 1;
a keypoint determination module 202, configured to determine, based on the target detection frame in the kth frame image, a plurality of target keypoints in the kth frame image and a plurality of target keypoints in the (k + 1) th frame image, respectively;
a displacement determining module 203, configured to determine the average displacement between the target key points in the kth frame image and the target key points in the (k + 1)th frame image;
and a tracking module 204, configured to, when the average displacement is smaller than or equal to a threshold, correct the target detection frame in the kth frame image by the average displacement and use the corrected target detection frame as the target detection frame of the (k + 2)th frame image, so as to implement target tracking.
In an alternative embodiment, the tracking module 204 is further configured to: when the average displacement is larger than the threshold, perform target detection on the (k + 2)th frame image and determine the target detection frame in the (k + 2)th frame image, so as to implement target tracking.
In an alternative embodiment, the displacement determining module 203 is further configured to:
respectively determining the average position of the plurality of target key points in the kth frame image and the average position of the plurality of target key points in the (k + 1)th frame image;
and calculating the displacement difference between the average position of the plurality of target key points in the (k + 1)th frame image and the average position of the plurality of target key points in the kth frame image, and taking the displacement difference as the average displacement between the target key points in the kth frame image and the target key points in the (k + 1)th frame image.
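This computation can be sketched as follows (the helper name is illustrative, not from the patent): the average displacement is the difference between the mean key-point positions of the two frames which, when the points correspond one-to-one, equals the mean of the per-point displacements.

```python
def average_displacement(kp_k, kp_k1):
    """Difference between the mean positions of two lists of (x, y)
    key points: mean(kp_k1) - mean(kp_k)."""
    n = len(kp_k)
    mean_k = (sum(p[0] for p in kp_k) / n, sum(p[1] for p in kp_k) / n)
    m = len(kp_k1)
    mean_k1 = (sum(p[0] for p in kp_k1) / m, sum(p[1] for p in kp_k1) / m)
    return (mean_k1[0] - mean_k[0], mean_k1[1] - mean_k[1])
```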
In an alternative embodiment, the tracking module 204 is further configured to: translate the target detection frame in the kth frame image according to the average displacement.
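Translating the frame amounts to shifting the box coordinates by the displacement while keeping its size. A minimal sketch, assuming an (x, y, w, h) box layout (the layout is an assumption, not specified by the patent):

```python
def translate_box(box, disp):
    """Shift an (x, y, w, h) detection box by (dx, dy), keeping the
    width and height unchanged."""
    x, y, w, h = box
    dx, dy = disp
    return (x + dx, y + dy, w, h)
```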
According to the target tracking device, the target key points in the kth frame image and in the (k + 1)th frame image are both determined from the target detection frame of the kth frame image; that is, the target detection frame of the kth frame image is reused as the target detection frame of the (k + 1)th frame image, so that no target detection is performed on the (k + 1)th frame image. This skips one detection pass, speeds up the whole target tracking process, saves time, and improves efficiency. When the average displacement between the target key points in the kth frame image and those in the (k + 1)th frame image is smaller than or equal to the threshold, the target detection frame of the kth frame image is corrected by the average displacement and the corrected frame is used as the target detection frame of the (k + 2)th frame image, thereby realizing target tracking; in this case the (k + 2)th frame image likewise undergoes no target detection, which again skips a detection pass, accelerates the whole process, and improves efficiency. Therefore, the target tracking apparatus of the embodiment of the invention avoids performing target detection on every frame, overcoming the problems of the prior art that each frame requires target detection, which is time-consuming and cannot meet real-time requirements; it thus improves detection efficiency and suits application scenarios with high real-time requirements.
The apparatus can execute the method provided by the embodiment of the invention and has the functional modules and advantages corresponding to that method. For technical details not described in this embodiment, reference may be made to the method provided by the embodiment of the present invention.
Fig. 3 illustrates an exemplary system architecture 300 to which a target tracking method or a target tracking apparatus of an embodiment of the present invention may be applied.
As shown in fig. 3, the system architecture 300 may include terminal devices 301, 302, 303, a network 304, and a server 305. The network 304 serves as a medium for providing communication links between the terminal devices 301, 302, 303 and the server 305. Network 304 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal device 301, 302, 303 to interact with the server 305 via the network 304 to receive or send messages or the like. The terminal devices 301, 302, 303 may have various communication client applications installed thereon, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, social platform software, and the like.
The terminal devices 301, 302, 303 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, laptop computers, desktop computers, and the like.
The server 305 may be a server providing various services, such as a background management server providing support for shopping websites browsed by the user using the terminal devices 301, 302, 303. The background management server may analyze and perform other processing on the received data such as the product information query request, and feed back a processing result (e.g., target push information and product information) to the terminal device.
It should be noted that the target tracking method provided by the embodiment of the present invention is generally executed by the server 305, and accordingly, the target tracking apparatus is generally disposed in the server 305.
It should be understood that the number of terminal devices, networks, and servers in fig. 3 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 4, a block diagram of a computer system 400 suitable for use with a terminal device implementing an embodiment of the invention is shown. The terminal device shown in fig. 4 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in Fig. 4, the computer system 400 includes a central processing unit (CPU) 401 that can perform various appropriate actions and processes in accordance with a program stored in a read-only memory (ROM) 402 or a program loaded from a storage section 408 into a random access memory (RAM) 403. The RAM 403 also stores the various programs and data necessary for the operation of the system 400. The CPU 401, the ROM 402, and the RAM 403 are connected to each other via a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.
The following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse, and the like; an output section 407 including a display device such as a cathode ray tube (CRT) or liquid crystal display (LCD), and a speaker; a storage section 408 including a hard disk and the like; and a communication section 409 including a network interface card such as a LAN card or a modem. The communication section 409 performs communication processing via a network such as the Internet. A drive 410 is also connected to the I/O interface 405 as needed. A removable medium 411, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 410 as necessary, so that a computer program read therefrom is installed into the storage section 408 as needed.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 409, and/or installed from the removable medium 411. The computer program performs the above-described functions defined in the system of the present invention when executed by a Central Processing Unit (CPU) 401.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor including a detection frame determining module, a key point determining module, a displacement determining module, and a tracking module. The names of these modules do not, in some cases, constitute a limitation on the modules themselves; for example, the detection frame determining module may also be described as "a module that performs target detection on the kth frame image and determines the target detection frame therein".
As another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to:
carrying out target detection on the kth frame image, and determining a target detection frame in the kth frame image; wherein k is an integer greater than or equal to 1;
respectively determining a plurality of target key points in the kth frame image and a plurality of target key points in the (k + 1) th frame image based on the target detection frame in the kth frame image;
determining the average displacement of the target key point in the kth frame image and the target key point in the (k + 1) th frame image;
and when the average displacement is smaller than or equal to a threshold value, correcting the target detection frame in the kth frame image by the average displacement, and using the corrected target detection frame as the target detection frame of the (k + 2)th frame image, so as to realize target tracking.
According to the technical scheme of the embodiment of the invention, the plurality of target key points in the kth frame image and in the (k + 1)th frame image are both determined from the target detection frame of the kth frame image; that is, the target detection frame of the kth frame image is reused as the target detection frame of the (k + 1)th frame image, so that no target detection is performed on the (k + 1)th frame image. This skips one detection pass, speeds up the whole target tracking process, saves time, and improves efficiency. When the average displacement between the target key points in the kth frame image and those in the (k + 1)th frame image is smaller than or equal to the threshold, the target detection frame of the kth frame image is corrected by the average displacement and the corrected frame is used as the target detection frame of the (k + 2)th frame image, thereby realizing target tracking; in this case the (k + 2)th frame image likewise undergoes no target detection, which again skips a detection pass, accelerates the whole process, and improves efficiency. Therefore, the target tracking method of the embodiment of the invention avoids performing target detection on every frame, overcoming the problems of the prior art that each frame requires target detection, which is time-consuming and cannot meet real-time requirements; it thus improves detection efficiency and suits application scenarios with high real-time requirements.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (7)
1. A target tracking method, comprising:
carrying out target detection on the kth frame image, and determining a target detection frame in the kth frame image; wherein k is an integer greater than or equal to 1;
respectively determining a plurality of target key points in the kth frame image and a plurality of target key points in the (k + 1) th frame image based on the target detection frame in the kth frame image;
determining the average displacement of the target key point in the kth frame image and the target key point in the (k + 1) th frame image;
and when the average displacement is smaller than or equal to a threshold value, correcting the target detection frame in the kth frame image by the average displacement, and using the corrected target detection frame as the target detection frame of the (k + 2)th frame image, so as to realize target tracking.
2. The method of claim 1, further comprising:
and when the average displacement is larger than the threshold value, carrying out target detection on the (k + 2)th frame image, and determining the target detection frame in the (k + 2)th frame image, so as to realize target tracking.
3. The method of claim 1, wherein determining the average displacement between the target key points in the kth frame image and the target key points in the (k + 1)th frame image comprises:
respectively determining the average position of the plurality of target key points in the kth frame image and the average position of the plurality of target key points in the (k + 1)th frame image;
and calculating the displacement difference between the average position of the plurality of target key points in the (k + 1)th frame image and the average position of the plurality of target key points in the kth frame image, and taking the displacement difference as the average displacement between the target key points in the kth frame image and the target key points in the (k + 1)th frame image.
4. The method of claim 1, wherein correcting the target detection frame in the kth frame image by the average displacement comprises:
translating the target detection frame in the kth frame image according to the average displacement.
5. An object tracking device, comprising:
the detection frame determining module is used for carrying out target detection on the kth frame image and determining a target detection frame in the kth frame image; wherein k is an integer greater than or equal to 1;
a key point determining module, configured to determine, based on the target detection frame in the kth frame image, a plurality of target key points in the kth frame image and a plurality of target key points in the (k + 1) th frame image, respectively;
a displacement determining module, configured to determine the average displacement between the target key points in the kth frame image and the target key points in the (k + 1)th frame image;
and a tracking module, configured to, when the average displacement is smaller than or equal to a threshold value, correct the target detection frame in the kth frame image by the average displacement and use the corrected target detection frame as the target detection frame of the (k + 2)th frame image, so as to implement target tracking.
6. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-4.
7. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911236052.8A CN112926356B (en) | 2019-12-05 | 2019-12-05 | Target tracking method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112926356A true CN112926356A (en) | 2021-06-08 |
CN112926356B CN112926356B (en) | 2024-06-18 |
Family
ID=76161900
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911236052.8A Active CN112926356B (en) | 2019-12-05 | 2019-12-05 | Target tracking method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112926356B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2007088759A1 (en) * | 2006-02-01 | 2007-08-09 | National University Corporation The University Of Electro-Communications | Displacement detection method, displacement detection device, displacement detection program, characteristic point matching method, and characteristic point matching program |
CN103077532A (en) * | 2012-12-24 | 2013-05-01 | 天津市亚安科技股份有限公司 | Real-time video object quick tracking method |
CN103455797A (en) * | 2013-09-07 | 2013-12-18 | 西安电子科技大学 | Detection and tracking method of moving small target in aerial shot video |
CN106846362A (en) * | 2016-12-26 | 2017-06-13 | 歌尔科技有限公司 | A kind of target detection tracking method and device |
KR101837407B1 (en) * | 2017-11-03 | 2018-03-12 | 국방과학연구소 | Apparatus and method for image-based target tracking |
CN109003245A (en) * | 2018-08-21 | 2018-12-14 | 厦门美图之家科技有限公司 | Coordinate processing method, device and electronic equipment |
CN109214245A (en) * | 2017-07-03 | 2019-01-15 | 株式会社理光 | A kind of method for tracking target, device, equipment and computer readable storage medium |
CN110349190A (en) * | 2019-06-10 | 2019-10-18 | 广州视源电子科技股份有限公司 | Target tracking method, device and equipment for adaptive learning and readable storage medium |
CN110378264A (en) * | 2019-07-08 | 2019-10-25 | Oppo广东移动通信有限公司 | Method for tracking target and device |
CN110400332A (en) * | 2018-04-25 | 2019-11-01 | 杭州海康威视数字技术股份有限公司 | A kind of target detection tracking method, device and computer equipment |
2019-12-05: CN application CN201911236052.8A filed; granted as CN112926356B (active)
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2007088759A1 (en) * | 2006-02-01 | 2007-08-09 | National University Corporation The University Of Electro-Communications | Displacement detection method, displacement detection device, displacement detection program, characteristic point matching method, and characteristic point matching program |
CN103077532A (en) * | 2012-12-24 | 2013-05-01 | 天津市亚安科技股份有限公司 | Real-time video object quick tracking method |
CN103455797A (en) * | 2013-09-07 | 2013-12-18 | 西安电子科技大学 | Detection and tracking method of moving small target in aerial shot video |
CN106846362A (en) * | 2016-12-26 | 2017-06-13 | 歌尔科技有限公司 | A kind of target detection tracking method and device |
CN109214245A (en) * | 2017-07-03 | 2019-01-15 | 株式会社理光 | A kind of method for tracking target, device, equipment and computer readable storage medium |
KR101837407B1 (en) * | 2017-11-03 | 2018-03-12 | 국방과학연구소 | Apparatus and method for image-based target tracking |
CN110400332A (en) * | 2018-04-25 | 2019-11-01 | 杭州海康威视数字技术股份有限公司 | A kind of target detection tracking method, device and computer equipment |
CN109003245A (en) * | 2018-08-21 | 2018-12-14 | 厦门美图之家科技有限公司 | Coordinate processing method, device and electronic equipment |
CN110349190A (en) * | 2019-06-10 | 2019-10-18 | 广州视源电子科技股份有限公司 | Target tracking method, device and equipment for adaptive learning and readable storage medium |
CN110378264A (en) * | 2019-07-08 | 2019-10-25 | Oppo广东移动通信有限公司 | Method for tracking target and device |
Non-Patent Citations (3)
Title |
---|
尹彦; 耿兆丰: "Moving object detection and tracking based on a background model", Microcomputer Information (微计算机信息), no. 16, 5 June 2008 (2008-06-05) *
栾庆磊; 陈正伟; 何勇: "A method for detecting moving objects against a moving background", Computer and Digital Engineering (计算机与数字工程), no. 10, 20 October 2008 (2008-10-20) *
谢永亮; 洪留荣; 葛方振; 郑颖; 孙雯; 贾平平: "Research on tracking arbitrary targets against a moving background", Journal of Suzhou University of Science and Technology (Natural Science Edition) (苏州科技学院学报(自然科学版)), no. 03, 15 September 2016 (2016-09-15) *
Also Published As
Publication number | Publication date |
---|---|
CN112926356B (en) | 2024-06-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10796438B2 (en) | Method and apparatus for tracking target profile in video | |
CN110225366B (en) | Video data processing and advertisement space determining method, device, medium and electronic equipment | |
CN109255337B (en) | Face key point detection method and device | |
US20210200971A1 (en) | Image processing method and apparatus | |
CN110069961B (en) | Object detection method and device | |
CN111192312B (en) | Depth image acquisition method, device, equipment and medium based on deep learning | |
CN111815738B (en) | Method and device for constructing map | |
CN108182457B (en) | Method and apparatus for generating information | |
CN113158773B (en) | Training method and training device for living body detection model | |
CN110349158A (en) | A kind of method and apparatus handling point cloud data | |
CN110941978A (en) | Face clustering method and device for unidentified personnel and storage medium | |
KR20240140057A (en) | Facial recognition method and device | |
CN113033377A (en) | Character position correction method, character position correction device, electronic equipment and storage medium | |
CN110288625B (en) | Method and apparatus for processing image | |
CN114119990A (en) | Method, apparatus and computer program product for image feature point matching | |
CN112651399A (en) | Method for detecting same-line characters in oblique image and related equipment thereof | |
CN113362090A (en) | User behavior data processing method and device | |
CN114724144B (en) | Text recognition method, training device, training equipment and training medium for model | |
CN112926356B (en) | Target tracking method and device | |
CN113808134B (en) | Oil tank layout information generation method, oil tank layout information generation device, electronic apparatus, and medium | |
CN110634155A (en) | Target detection method and device based on deep learning | |
CN112487943B (en) | Key frame de-duplication method and device and electronic equipment | |
CN115376026A (en) | Key area positioning method, device, equipment and storage medium | |
CN114581711A (en) | Target object detection method, apparatus, device, storage medium, and program product | |
CN112000218B (en) | Object display method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||