CN112926356A - Target tracking method and device - Google Patents

Target tracking method and device

Info

Publication number
CN112926356A
Authority
CN
China
Prior art keywords
frame image
target
frame
image
target detection
Prior art date
Legal status
Granted
Application number
CN201911236052.8A
Other languages
Chinese (zh)
Other versions
CN112926356B (en)
Inventor
朱兆琪
董玉新
安山
Current Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN201911236052.8A
Publication of CN112926356A
Application granted
Publication of CN112926356B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V10/464 Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G06T2207/30201 Face
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target tracking method and device, relating to the field of computer technology. One embodiment of the method comprises: performing target detection on the k-th frame image and determining a target detection frame in the k-th frame image, where k is an integer greater than or equal to 1; determining a plurality of target key points in the k-th frame image and a plurality of target key points in the (k+1)-th frame image, respectively, based on the target detection frame of the k-th frame image; determining the average displacement between the target key points of the k-th frame image and those of the (k+1)-th frame image; and, when the average displacement is less than or equal to a threshold, correcting the target detection frame of the k-th frame image by the average displacement and using the corrected detection frame as the target detection frame of the (k+2)-th frame image, thereby realizing target tracking. The method and device address the problems that running target detection on every frame image is time-consuming and cannot meet real-time requirements, thereby improving detection efficiency and suiting application scenarios with high real-time demands.

Description

Target tracking method and device
Technical Field
The invention relates to the technical field of computers, in particular to a target tracking method and device.
Background
Target tracking is an important part of automatic identification systems, and the technology is widely applied. It generally means that, for any given image, some strategy is used to search the image and determine whether it contains a target (such as a human face); if so, the position, size, and other attributes of the target are returned.
In the process of implementing the invention, the inventors found at least the following problems in the prior art. Existing target tracking algorithms fall mainly into traditional algorithms and deep-learning algorithms. Traditional algorithms such as KCF (Kernelized Correlation Filter) and other correlation-filtering methods take a given target to be tracked and find the maximum response position in the image through a filter, thereby realizing target tracking. Deep-learning algorithms regress the position of the target in the image by extracting target features. However, both approaches are computationally heavy and demand high performance; given the limited performance of mobile terminals, they are difficult to deploy and run in real time on mobile devices.
Disclosure of Invention
In view of this, embodiments of the present invention provide a target tracking method and apparatus that address the problems that running target detection on every frame image is time-consuming and cannot meet real-time requirements, thereby improving detection efficiency and suiting application scenarios with high real-time demands.
To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a target tracking method including:
carrying out target detection on the kth frame image, and determining a target detection frame in the kth frame image; wherein k is an integer greater than or equal to 1;
respectively determining a plurality of target key points in the kth frame image and a plurality of target key points in the (k + 1) th frame image based on the target detection frame in the kth frame image;
determining the average displacement of the target key point in the kth frame image and the target key point in the (k + 1) th frame image;
and when the average displacement is smaller than or equal to a threshold value, correcting a target detection frame in the k frame image through the average displacement, and using the corrected target detection frame as a target detection frame of the k +2 frame image to realize target tracking.
Optionally, the method further comprises: and when the average displacement is larger than a threshold value, carrying out target detection on the (k + 2) th frame image, and determining a target detection frame in the (k + 2) th frame image so as to realize target tracking.
Optionally, determining an average displacement of the target keypoint in the k frame image and the target keypoint in the k +1 frame image comprises:
respectively determining the average positions of a plurality of target key points in the k frame image and the average positions of a plurality of target key points in the k +1 frame image;
and calculating the displacement difference between the average position of the plurality of target key points in the k +1 frame image and the average position of the plurality of target key points in the k frame image, and taking the displacement difference as the average displacement of the target key points in the k frame image and the target key points in the k +1 frame image.
Optionally, the correcting the target detection frame in the k frame image by the average displacement includes:
and translating the target detection frame in the k frame image according to the average displacement.
To achieve the above object, according to another aspect of embodiments of the present invention, there is provided an object tracking apparatus including:
the detection frame determining module is used for carrying out target detection on the kth frame image and determining a target detection frame in the kth frame image; wherein k is an integer greater than or equal to 1;
a key point determining module, configured to determine, based on the target detection frame in the kth frame image, a plurality of target key points in the kth frame image and a plurality of target key points in the (k + 1) th frame image, respectively;
a displacement determining module, configured to determine an average displacement between a target key point in the k frame image and a target key point in the (k + 1) th frame image;
and the tracking module is used for correcting the target detection frame in the k frame image through the average displacement when the average displacement is less than or equal to a threshold value, and using the corrected target detection frame as the target detection frame of the k +2 frame image so as to realize target tracking.
Optionally, the tracking module is further configured to: and when the average displacement is larger than a threshold value, carrying out target detection on the (k + 2) th frame image, and determining a target detection frame in the (k + 2) th frame image so as to realize target tracking.
Optionally, the displacement determining module is further configured to:
respectively determining the average positions of a plurality of target key points in the k frame image and the average positions of a plurality of target key points in the k +1 frame image;
and calculating the displacement difference between the average position of the plurality of target key points in the k +1 frame image and the average position of the plurality of target key points in the k frame image, and taking the displacement difference as the average displacement of the target key points in the k frame image and the target key points in the k +1 frame image.
Optionally, the tracking module is further configured to: and translating the target detection frame in the k frame image according to the average displacement.
To achieve the above object, according to still another aspect of an embodiment of the present invention, there is provided an electronic apparatus including: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the object tracking method of an embodiment of the present invention.
To achieve the above object, according to still another aspect of an embodiment of the present invention, there is provided a computer-readable medium on which a computer program is stored, the program implementing the object tracking method of an embodiment of the present invention when executed by a processor.
One embodiment of the above invention has the following advantages or beneficial effects. Because the target detection frame of the k-th frame image is used to determine the target key points in both the k-th frame image and the (k+1)-th frame image, that is, the detection frame of the k-th frame serves as the detection frame of the (k+1)-th frame, no target detection is performed on the (k+1)-th frame. This saves a detection pass, speeds up the whole target tracking process, saves time, and improves efficiency. Furthermore, when the average displacement between the target key points of the k-th frame image and those of the (k+1)-th frame image is less than or equal to a threshold, the detection frame of the k-th frame image is corrected by the average displacement and the corrected frame is used as the detection frame of the (k+2)-th frame image; in this case the (k+2)-th frame also skips target detection, saving another detection pass and further improving speed and efficiency. The target tracking method of the embodiment thus avoids running target detection on every frame image, overcoming the prior-art problems that per-frame detection is time-consuming and cannot meet real-time requirements, improving detection efficiency, and suiting application scenarios with high real-time requirements.
Further effects of the above-mentioned non-conventional alternatives will be described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic diagram of a main flow of a target tracking method of an embodiment of the present invention;
FIG. 2 is a schematic diagram of the major modules of a target tracking device of an embodiment of the present invention;
FIG. 3 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
fig. 4 is a schematic block diagram of a computer system suitable for use in implementing a terminal device or server of an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic diagram of a main flow of a target tracking method according to an embodiment of the present invention, as shown in fig. 1, the method includes:
step S101: carrying out target detection on the kth frame image, and determining a target detection frame in the kth frame image; wherein k is an integer greater than or equal to 1.
In this embodiment, the target may be a human face in the image, or may also be a vehicle or other objects in the image, and the present invention is not limited herein.
In this step, target detection obtains the position at which the target appears in the image and returns the target detection frame. As an example, the position of the target detection frame can be obtained with the SSD detection algorithm as (x, y, w, h)_box, where (x, y)_box are the coordinates of the top-left corner of the target detection frame and w_box, h_box are its width and height, respectively. The SSD (Single Shot MultiBox Detector) algorithm is a regression-based deep convolutional neural network object detection algorithm.
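As an illustrative sketch (not part of the patent text), a detection frame in the (x, y, w, h) form described above can be used to crop the target region from a frame image; the detector itself is omitted here and a fixed example frame is assumed:

```python
import numpy as np

def crop_with_box(frame, box):
    """Crop the region described by a detection frame.

    box = (x, y, w, h): (x, y) is the top-left corner of the
    detection frame, w and h are its width and height, as in the text.
    """
    x, y, w, h = box
    return frame[y:y + h, x:x + w]

# A stand-in 480x640 grayscale frame and a hypothetical SSD-style box.
frame = np.zeros((480, 640), dtype=np.uint8)
box = (100, 50, 128, 128)  # x, y, w, h
patch = crop_with_box(frame, box)
print(patch.shape)  # (128, 128)
```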
Step S102: and respectively determining a plurality of target key points in the k frame image and a plurality of target key points in the (k + 1) th frame image based on the target detection frame in the k frame image.
The purpose of this step is to obtain point coordinates of a specific location of an object in an image, for example to obtain point coordinates of a specific location of a human face in an image. In the present embodiment, a plurality of target key points in the k-th frame image and a plurality of target key points in the k + 1-th frame image correspond to each other.
Specifically, the target key point can be obtained through the following process:
and obtaining a target position in the image through a target detection frame obtained by a target detection algorithm, extracting the image of a target part in the image by using the target detection frame to obtain an image of the target part, and finally inputting the image of the target part into a target key point detection model to obtain a target key point. The target key point detection model is a deep learning model and is obtained through training of training data, namely, a mapping relation from an image to a point is obtained through training of a target image and a corresponding target key point, and is set as f. When the model is used for detecting the target key points, the positions of the target key points can be obtained only by inputting image data into the model. In a specific embodiment, the number of the target key points may be set to 106.
Let I_k denote the image input at the k-th frame and f the image-to-key-point mapping model; target key point detection can then be expressed by formula (1):

f(I_k) = {(x_1, y_1), (x_2, y_2), …, (x_n, y_n)}_k    (1)

where (x_i, y_i) are the key points output by the target key point detection model.
In this step, the target detection frame of the k-th frame image is used as the detection frame of the (k+1)-th frame image; that is, no target detection is performed on the (k+1)-th frame, which saves a detection pass, speeds up the whole target tracking process, saves time, and improves efficiency. Although the position of the detection frame from the k-th frame may deviate from the actual target position in the (k+1)-th frame, causing some deviation when it is used to extract the target from the (k+1)-th frame image, the target key point model has a certain generalization ability: even if the target image fed into it is off by some pixels, the model can still output the key point positions correctly.
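A minimal sketch of this step (not part of the patent text), with the learned mapping f replaced by a stub; `stub_model` is a hypothetical stand-in, and a real key point model (e.g. one trained on 106-point annotations) would take its place:

```python
import numpy as np

def detect_keypoints(frame, box, keypoint_model):
    """Apply formula (1): crop the target with the detection frame,
    run the key point model f on the crop, and map the key points
    back to full-frame coordinates."""
    x, y, w, h = box
    crop = frame[y:y + h, x:x + w]
    points = keypoint_model(crop)        # shape (n, 2), crop coordinates
    return points + np.array([x, y])     # full-frame coordinates

# Stub model: returns two fixed "key points" inside the crop.
stub_model = lambda crop: np.array([[10.0, 20.0], [30.0, 40.0]])
frame = np.zeros((480, 640), dtype=np.uint8)
pts = detect_keypoints(frame, (100, 50, 64, 64), stub_model)
# pts is in full-frame coordinates: [[110, 70], [130, 90]]
```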
Step S103: and determining the average displacement of the target key point in the k frame image and the target key point in the k +1 frame image.
Specifically, the method comprises the following steps:
respectively determining the average positions of a plurality of target key points in the k frame image and the average positions of a plurality of target key points in the k +1 frame image;
and calculating the displacement difference between the average position of the plurality of target key points in the k +1 frame image and the average position of the plurality of target key points in the k frame image, and taking the displacement difference as the average displacement of the target key points in the k frame image and the target key points in the k +1 frame image.
The average position of the n target key points in the k-th frame image is calculated according to formula (2):

p̄_k = (1/n) · Σ_{i=1}^{n} p_i^k    (2)

where p̄_k denotes the average position of the target key points in the k-th frame image and p_i^k = (x_i, y_i)_k denotes the position of the i-th target key point in the k-th frame image.

The average position of the target key points in the (k+1)-th frame image is calculated according to formula (3):

p̄_{k+1} = (1/n) · Σ_{i=1}^{n} p_i^{k+1}    (3)

where p̄_{k+1} denotes the average position of the target key points in the (k+1)-th frame image and p_i^{k+1} denotes the position of the i-th target key point in the (k+1)-th frame image.

The displacement difference between the two average positions is calculated according to formula (4):

Δp = p̄_{k+1} − p̄_k    (4)

where Δp denotes the displacement difference between the average position of the target key points in the (k+1)-th frame image and the average position of the target key points in the k-th frame image.
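The computations in formulas (2) to (4) reduce to a few array operations. A minimal sketch (not part of the patent text), assuming the key points of each frame are given as an (n, 2) numpy array:

```python
import numpy as np

def average_displacement(points_k, points_k1):
    """Formulas (2)-(4): mean key point position in each frame,
    then the difference of the two means."""
    mean_k = points_k.mean(axis=0)    # formula (2)
    mean_k1 = points_k1.mean(axis=0)  # formula (3)
    return mean_k1 - mean_k           # formula (4)

pts_k = np.array([[0.0, 0.0], [2.0, 2.0]])
pts_k1 = np.array([[1.0, 3.0], [3.0, 5.0]])
print(average_displacement(pts_k, pts_k1))  # [1. 3.]
```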
Step S104: comparing the average displacement with the threshold. In this step, the threshold may be set flexibly according to the application scenario; the invention is not limited here.
Step S105: and when the average displacement is smaller than or equal to a threshold value, correcting a target detection frame in the k frame image through the average displacement, and using the corrected target detection frame as a target detection frame of the k +2 frame image to realize target tracking.
In this embodiment, if the average displacement is less than or equal to the threshold, the target has moved only slightly between the k-th and (k+1)-th frame images, so the target detection frame of the k-th frame image is translated, as shown in formula (5):

box_{k+2} = box_k + Δp    (5)

where box_k denotes the position of the target detection frame in the k-th frame image and box_{k+2} denotes the position of the corrected target detection frame.
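Correcting the detection frame per formula (5) is a pure translation of the (x, y) corner, with width and height unchanged. A short sketch (the function name `translate_box` is illustrative, not from the source):

```python
def translate_box(box, displacement):
    """Formula (5): shift the (x, y, w, h) detection frame by the
    average key point displacement; width and height are unchanged."""
    x, y, w, h = box
    dx, dy = displacement
    return (x + dx, y + dy, w, h)

print(translate_box((100, 50, 64, 64), (3, -2)))  # (103, 48, 64, 64)
```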
Step S106: and when the average displacement is larger than a threshold value, carrying out target detection on the (k + 2) th frame image, and determining a target detection frame in the (k + 2) th frame image so as to realize target tracking.
In this embodiment, if the average displacement is greater than the threshold, the target has moved too much between the k-th and (k+1)-th frame images, so target detection must be performed again on the (k+2)-th frame image to obtain its target detection frame.
According to the target tracking method described above, the target key points in the k-th frame image and in the (k+1)-th frame image are both determined from the target detection frame of the k-th frame image; that is, the detection frame of the k-th frame serves as the detection frame of the (k+1)-th frame, so no target detection is performed on the (k+1)-th frame. This saves a detection pass, speeds up the whole tracking process, saves time, and improves efficiency. When the average displacement between the target key points of the k-th and (k+1)-th frame images is less than or equal to the threshold, the detection frame of the k-th frame is corrected by the average displacement and used as the detection frame of the (k+2)-th frame image, so the (k+2)-th frame also skips target detection, saving another detection pass and further improving speed and efficiency. The method thus avoids running target detection on every frame image, overcoming the prior-art problems that per-frame detection is time-consuming and cannot meet real-time requirements, improving detection efficiency, and suiting application scenarios with high real-time requirements.
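The per-frame control flow described above can be sketched as follows (not part of the patent text). `detect` (an SSD-style detector) and `keypoints` (the key point model applied through a detection frame) are assumed stand-ins, and the scalar comparison uses the Euclidean norm of the average displacement, which is one reasonable reading of the text:

```python
import numpy as np

def next_box(frame_k, frame_k1, frame_k2, box_k, detect, keypoints, threshold):
    """Decide the detection frame for frame k+2: the frame-k box is
    reused for key points in frames k and k+1; if the average key point
    displacement is small, translate box_k (formula (5)); otherwise
    run detection again on frame k+2."""
    pts_k = keypoints(frame_k, box_k)
    pts_k1 = keypoints(frame_k1, box_k)   # no detection on frame k+1
    disp = pts_k1.mean(axis=0) - pts_k.mean(axis=0)
    if np.linalg.norm(disp) <= threshold:
        x, y, w, h = box_k
        return (float(x + disp[0]), float(y + disp[1]), w, h)  # formula (5)
    return detect(frame_k2)               # motion too large: re-detect

# Toy stand-ins: "frames" are just integers, key points drift with the index.
detect = lambda frame: (0.0, 0.0, 10.0, 10.0)
keypoints = lambda frame, box: np.array([[float(frame), float(frame)]])
box0 = detect(0)
box2 = next_box(0, 1, 2, box0, detect, keypoints, threshold=2.0)
print(box2)  # (1.0, 1.0, 10.0, 10.0)
```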
In order to make the target tracking method of the embodiment of the invention clearer, the processing procedure is described again taking face tracking in images as an example:
(1) The processing procedure for the k-th frame image is as follows: a face frame is obtained through the SSD (Single Shot MultiBox Detector) detection algorithm; a face image is extracted using the face frame; the face image is input into a face key point detection model to obtain a plurality of face key points; and the average position of the face key points is calculated.
(2) The processing procedure for the (k+1)-th frame image is as follows: the face frame from the k-th frame image is used to extract the face image from the (k+1)-th frame image; the face image is input into the face key point detection model to obtain face key points; and the average position of the face key points is calculated. Note that the face frame used for the (k+1)-th frame is the face frame of the k-th frame image, not one detected in the (k+1)-th frame: the (k+1)-th frame is not passed through the SSD algorithm, which saves the SSD detection step for this frame. Although the position of the k-th frame's face frame may deviate slightly from the actual face position in the (k+1)-th frame, so the face cropped from the (k+1)-th frame may be somewhat off, the face key point model has a certain generalization ability, and even with pixel-level deviations in the input face image it can still output the key point positions correctly.
(3) The average displacement of the face key points between the k-th and (k+1)-th frame images is calculated. If the average displacement is less than or equal to the threshold, the face position has not moved much between the two frames; the average displacement is then used to correct the position of the k-th frame's face frame, and the corrected frame serves as the face frame of the (k+2)-th frame image. If the average displacement is greater than the threshold, the face has moved too much between the k-th and (k+1)-th frames; directly translating the face frame by the average displacement and using the result to crop the (k+2)-th frame image could then cause problems. Therefore, when the average displacement is greater than the threshold, face detection is run again on the (k+2)-th frame image before cropping and key point detection. In this way, even if the face moves quickly, the face frame fully corresponds to the face position in the (k+2)-th frame image, the cropped face image is correct, and tracking remains reliable.
It is worth noting that the corrected face frame is based on the face position in the (k+1)-th frame image; relative to the (k+2)-th frame image it may not be centered on the face. However, thanks to the generalization ability of the face key point model, even if the face position in the input face image deviates by some pixels, the model can still output the key point positions correctly. By this means the face tracking method of the embodiment reduces the number of frames on which face detection must be run.
(4) The processing flow for the (k+2)-th frame image is as follows:
a. If the average displacement of the face key points between the (k+1)-th and k-th frame images is less than or equal to the threshold, the face has moved little between those frames; the k-th frame's face frame is translated by the average displacement to obtain the face frame for the (k+2)-th frame image, the translated face frame is used to crop the (k+2)-th frame image, and the cropped face image is input into the face key point model;
b. If the average displacement of the face key points between the (k+1)-th and k-th frames is greater than the threshold, a directly translated face frame may be inaccurate; using it to crop the (k+2)-th frame image could introduce a large deviation, and the face key point model might not produce a correct result. Therefore, when the average displacement is greater than the threshold, SSD detection is performed directly on the (k+2)-th frame image, and the resulting face frame gives the position of the face in the (k+2)-th frame image.
According to the face tracking example above, the face key points in the k-th frame image and in the (k+1)-th frame image are both determined from the face frame of the k-th frame image; that is, the face frame of the k-th frame serves as the face frame of the (k+1)-th frame, so no face detection is performed on the (k+1)-th frame, saving a detection pass, speeding up the whole face tracking process, saving time, and improving efficiency. When the average displacement between the face key points of the k-th and (k+1)-th frame images is less than or equal to the threshold, the face frame of the k-th frame is corrected by the average displacement and used as the face frame of the (k+2)-th frame image, so the (k+2)-th frame also skips face detection, further improving speed and efficiency. The face tracking method of this embodiment thus avoids running face detection on every frame image, overcoming the prior-art problems that per-frame detection is time-consuming and cannot meet real-time requirements, improving detection efficiency, and suiting application scenarios with high real-time requirements.
Fig. 2 is a schematic diagram of the main modules of a target tracking apparatus 200 according to an embodiment of the present invention. As shown in Fig. 2, the apparatus 200 includes:
a detection frame determining module 201, configured to perform target detection on a k-th frame image, and determine a target detection frame in the k-th frame image; wherein k is an integer greater than or equal to 1;
a keypoint determination module 202, configured to determine, based on the target detection frame in the kth frame image, a plurality of target keypoints in the kth frame image and a plurality of target keypoints in the (k + 1) th frame image, respectively;
a displacement determining module 203, configured to determine the average displacement between the target key points in the kth frame image and the target key points in the (k+1)th frame image;
and a tracking module 204, configured to, when the average displacement is smaller than or equal to a threshold, correct the target detection frame in the kth frame image by the average displacement, and use the corrected target detection frame as the target detection frame of the (k+2)th frame image, so as to implement target tracking.
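A minimal sketch of how these modules could fit together. The `detect_target` and `locate_keypoints` stubs below are hypothetical stand-ins for the detector (e.g. SSD) and the key point model of the embodiment; the (x, y, w, h) box format and the magnitude-based threshold test are assumptions, not fixed by the text:

```python
import numpy as np

def detect_target(frame):
    """Hypothetical detector stub (stands in for SSD): returns (x, y, w, h)."""
    return np.array([10.0, 10.0, 40.0, 40.0])

def locate_keypoints(frame, box):
    """Hypothetical key point model stub: returns an (N, 2) array of points."""
    x, y, w, h = box
    return np.array([[x + w / 2.0, y + h / 2.0]])

def track(frames, threshold=5.0):
    """Return one detection box per frame, detecting only when necessary."""
    boxes = {0: detect_target(frames[0])}        # module 201: detect in frame k
    k = 0
    while k + 2 < len(frames):
        box = boxes[k]
        boxes[k + 1] = box                       # frame k+1 reuses frame k's box
        mean_k = locate_keypoints(frames[k], box).mean(axis=0)       # module 202
        mean_k1 = locate_keypoints(frames[k + 1], box).mean(axis=0)
        dx, dy = mean_k1 - mean_k                # module 203: average displacement
        if np.hypot(dx, dy) <= threshold:        # module 204: correct and reuse
            boxes[k + 2] = box + np.array([dx, dy, 0.0, 0.0])
        else:                                    # fall back to full detection
            boxes[k + 2] = detect_target(frames[k + 2])
        k += 2                                   # frame k+2 becomes the new frame k
    return boxes
```

With the constant stubs above, the displacement is always zero, so every frame after the first is boxed without another detection pass; only when the keypoints drift beyond the threshold does the else branch re-detect.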
In an alternative embodiment, the tracking module 204 is further configured to: and when the average displacement is larger than a threshold value, carrying out target detection on the (k + 2) th frame image, and determining a target detection frame in the (k + 2) th frame image so as to realize target tracking.
In an alternative embodiment, the displacement determining module 203 is further configured to:
respectively determining the average position of the plurality of target key points in the kth frame image and the average position of the plurality of target key points in the (k+1)th frame image;
and calculating the displacement difference between the average position of the plurality of target key points in the (k+1)th frame image and the average position of the plurality of target key points in the kth frame image, and taking the displacement difference as the average displacement of the target key points in the kth frame image and the target key points in the (k+1)th frame image.
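The computation above reduces to a difference of centroids. A minimal numerical sketch, with made-up keypoint coordinates standing in for the key point model's output:

```python
import numpy as np

# Hypothetical key point coordinates for frames k and k+1, shaped (N, 2);
# real values would come from the key point model.
kps_k = np.array([[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]])
kps_k1 = np.array([[12.0, 21.0], [32.0, 41.0], [52.0, 61.0]])

mean_k = kps_k.mean(axis=0)          # average position in frame k: (30, 40)
mean_k1 = kps_k1.mean(axis=0)        # average position in frame k+1: (32, 41)
avg_displacement = mean_k1 - mean_k  # displacement difference: (2, 1)
```

Averaging before differencing makes the estimate robust to noise in any single key point, which is presumably why the embodiment uses the centroid rather than individual point displacements.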
In an alternative embodiment, the tracking module 204 is further configured to: translate the target detection frame in the kth frame image according to the average displacement.
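Correction by translation only shifts the box's position; its size is unchanged. A sketch with made-up numbers, assuming an (x, y, w, h) box representation (the text does not fix the format):

```python
import numpy as np

box_k = np.array([100.0, 80.0, 60.0, 60.0])  # (x, y, w, h), made-up values
avg_displacement = np.array([2.0, 1.0])      # (dx, dy) determined by module 203

# Translation shifts the box position only; width and height stay the same.
box_k2 = box_k + np.array([avg_displacement[0], avg_displacement[1], 0.0, 0.0])
```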
According to the above target tracking apparatus, the plurality of target key points in the kth frame image and in the (k+1)th frame image are both determined from the target detection frame of the kth frame image; that is, the target detection frame of the kth frame image is reused as the target detection frame of the (k+1)th frame image, so no target detection is performed on the (k+1)th frame image. This skips one detection pass, speeds up the whole target tracking process, saves time, and improves efficiency. When the average displacement between the target key points of the kth frame image and those of the (k+1)th frame image is smaller than or equal to the threshold, the target detection frame of the kth frame image is corrected by the average displacement and the corrected frame is used as the target detection frame of the (k+2)th frame image, so that target tracking is achieved without performing target detection on the (k+2)th frame image either; the detection pass is again skipped and the whole process is accelerated. The target tracking apparatus of the embodiment of the present invention thus avoids performing target detection on every frame, overcoming the prior-art problems that per-frame target detection is time-consuming and cannot meet real-time requirements, thereby improving detection efficiency and making it suitable for application scenarios with high real-time requirements.
The apparatus can execute the method provided by the embodiments of the present invention, and has the functional modules and beneficial effects corresponding to that method. For technical details not described in detail in this embodiment, reference may be made to the method provided by the embodiments of the present invention.
Fig. 3 illustrates an exemplary system architecture 300 to which a target tracking method or a target tracking apparatus of an embodiment of the present invention may be applied.
As shown in fig. 3, the system architecture 300 may include terminal devices 301, 302, 303, a network 304, and a server 305. The network 304 serves as a medium for providing communication links between the terminal devices 301, 302, 303 and the server 305. Network 304 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 301, 302, 303 to interact with the server 305 via the network 304, so as to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 301, 302, 303, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, and social platform software.
The terminal devices 301, 302, 303 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 305 may be a server providing various services, such as a background management server providing support for shopping websites browsed by the user using the terminal devices 301, 302, 303. The background management server may analyze and perform other processing on the received data such as the product information query request, and feed back a processing result (e.g., target push information and product information) to the terminal device.
It should be noted that the target tracking method provided by the embodiment of the present invention is generally executed by the server 305, and accordingly, the target tracking apparatus is generally disposed in the server 305.
It should be understood that the number of terminal devices, networks, and servers in fig. 3 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 4, shown is a block diagram of a computer system 400 suitable for implementing a terminal device of an embodiment of the present invention. The terminal device shown in fig. 4 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present invention.
As shown in fig. 4, the computer system 400 includes a Central Processing Unit (CPU)401 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)402 or a program loaded from a storage section 408 into a Random Access Memory (RAM) 403. In the RAM 403, various programs and data necessary for the operation of the system 400 are also stored. The CPU 401, ROM 402, and RAM 403 are connected to each other via a bus 404. An input/output (I/O) interface 405 is also connected to bus 404.
The following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse, and the like; an output section 407 including a display device such as a cathode ray tube (CRT) or a liquid crystal display (LCD), and a speaker; a storage section 408 including a hard disk and the like; and a communication section 409 including a network interface card such as a LAN card or a modem. The communication section 409 performs communication processing via a network such as the Internet. A drive 410 is also connected to the I/O interface 405 as needed. A removable medium 411 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is mounted on the drive 410 as necessary, so that a computer program read therefrom is installed into the storage section 408 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 409, and/or installed from the removable medium 411. The computer program performs the above-described functions defined in the system of the present invention when executed by a Central Processing Unit (CPU) 401.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor including a sending module, an obtaining module, a determining module, and a first processing module. The names of these modules do not, in some cases, limit the modules themselves; for example, the sending module may also be described as a "module that sends a picture acquisition request to a connected server".
As another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to:
carrying out target detection on the kth frame image, and determining a target detection frame in the kth frame image; wherein k is an integer greater than or equal to 1;
respectively determining a plurality of target key points in the kth frame image and a plurality of target key points in the (k + 1) th frame image based on the target detection frame in the kth frame image;
determining the average displacement of the target key point in the kth frame image and the target key point in the (k + 1) th frame image;
and when the average displacement is smaller than or equal to a threshold, correcting the target detection frame in the kth frame image by the average displacement, and using the corrected target detection frame as the target detection frame of the (k+2)th frame image, so as to implement target tracking.
According to the technical solution of the embodiment of the present invention, the plurality of target key points in the kth frame image and in the (k+1)th frame image are both determined from the target detection frame of the kth frame image; that is, the target detection frame of the kth frame image is reused as the target detection frame of the (k+1)th frame image, so no target detection is performed on the (k+1)th frame image. This skips one detection pass, speeds up the whole target tracking process, saves time, and improves efficiency. When the average displacement between the target key points of the kth frame image and those of the (k+1)th frame image is smaller than or equal to the threshold, the target detection frame of the kth frame image is corrected by the average displacement and the corrected frame is used as the target detection frame of the (k+2)th frame image, so that target tracking is achieved without performing target detection on the (k+2)th frame image either; the detection pass is again skipped and the whole process is accelerated. The target tracking method of the embodiment of the present invention thus avoids performing target detection on every frame, overcoming the prior-art problems that per-frame target detection is time-consuming and cannot meet real-time requirements, thereby improving detection efficiency and making the method suitable for application scenarios with high real-time requirements.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (7)

1. A target tracking method, comprising:
carrying out target detection on the kth frame image, and determining a target detection frame in the kth frame image; wherein k is an integer greater than or equal to 1;
respectively determining a plurality of target key points in the kth frame image and a plurality of target key points in the (k + 1) th frame image based on the target detection frame in the kth frame image;
determining the average displacement of the target key point in the kth frame image and the target key point in the (k + 1) th frame image;
and when the average displacement is smaller than or equal to a threshold, correcting the target detection frame in the kth frame image by the average displacement, and using the corrected target detection frame as the target detection frame of the (k+2)th frame image, so as to implement target tracking.
2. The method of claim 1, further comprising:
and when the average displacement is larger than a threshold value, carrying out target detection on the (k + 2) th frame image, and determining a target detection frame in the (k + 2) th frame image so as to realize target tracking.
3. The method of claim 1, wherein determining the average displacement of the target key points in the kth frame image and the target key points in the (k+1)th frame image comprises:
respectively determining the average position of the plurality of target key points in the kth frame image and the average position of the plurality of target key points in the (k+1)th frame image;
and calculating the displacement difference between the average position of the plurality of target key points in the (k+1)th frame image and the average position of the plurality of target key points in the kth frame image, and taking the displacement difference as the average displacement of the target key points in the kth frame image and the target key points in the (k+1)th frame image.
4. The method of claim 1, wherein correcting the target detection frame in the kth frame image by the average displacement comprises:
translating the target detection frame in the kth frame image according to the average displacement.
5. An object tracking device, comprising:
the detection frame determining module is used for carrying out target detection on the kth frame image and determining a target detection frame in the kth frame image; wherein k is an integer greater than or equal to 1;
a key point determining module, configured to determine, based on the target detection frame in the kth frame image, a plurality of target key points in the kth frame image and a plurality of target key points in the (k + 1) th frame image, respectively;
a displacement determining module, configured to determine the average displacement between the target key points in the kth frame image and the target key points in the (k+1)th frame image;
and a tracking module, configured to, when the average displacement is smaller than or equal to a threshold, correct the target detection frame in the kth frame image by the average displacement, and use the corrected target detection frame as the target detection frame of the (k+2)th frame image, so as to implement target tracking.
6. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-4.
7. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-4.
CN201911236052.8A 2019-12-05 2019-12-05 Target tracking method and device Active CN112926356B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911236052.8A CN112926356B (en) 2019-12-05 2019-12-05 Target tracking method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911236052.8A CN112926356B (en) 2019-12-05 2019-12-05 Target tracking method and device

Publications (2)

Publication Number Publication Date
CN112926356A true CN112926356A (en) 2021-06-08
CN112926356B CN112926356B (en) 2024-06-18

Family

ID=76161900

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911236052.8A Active CN112926356B (en) 2019-12-05 2019-12-05 Target tracking method and device

Country Status (1)

Country Link
CN (1) CN112926356B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007088759A1 (en) * 2006-02-01 2007-08-09 National University Corporation The University Of Electro-Communications Displacement detection method, displacement detection device, displacement detection program, characteristic point matching method, and characteristic point matching program
CN103077532A (en) * 2012-12-24 2013-05-01 天津市亚安科技股份有限公司 Real-time video object quick tracking method
CN103455797A (en) * 2013-09-07 2013-12-18 西安电子科技大学 Detection and tracking method of moving small target in aerial shot video
CN106846362A (en) * 2016-12-26 2017-06-13 歌尔科技有限公司 A kind of target detection tracking method and device
KR101837407B1 (en) * 2017-11-03 2018-03-12 국방과학연구소 Apparatus and method for image-based target tracking
CN109003245A (en) * 2018-08-21 2018-12-14 厦门美图之家科技有限公司 Coordinate processing method, device and electronic equipment
CN109214245A (en) * 2017-07-03 2019-01-15 株式会社理光 A kind of method for tracking target, device, equipment and computer readable storage medium
CN110349190A (en) * 2019-06-10 2019-10-18 广州视源电子科技股份有限公司 Target tracking method, device and equipment for adaptive learning and readable storage medium
CN110378264A (en) * 2019-07-08 2019-10-25 Oppo广东移动通信有限公司 Method for tracking target and device
CN110400332A (en) * 2018-04-25 2019-11-01 杭州海康威视数字技术股份有限公司 A kind of target detection tracking method, device and computer equipment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
尹彦; 耿兆丰: "Moving Object Detection and Tracking Based on a Background Model", Microcomputer Information, no. 16, 5 June 2008 (2008-06-05) *
栾庆磊; 陈正伟; 何勇: "A Detection Method for Moving Objects against a Moving Background", Computer & Digital Engineering, no. 10, 20 October 2008 (2008-10-20) *
谢永亮; 洪留荣; 葛方振; 郑颖; 孙雯; 贾平平: "Research on Arbitrary Object Tracking Methods against a Moving Background", Journal of Suzhou University of Science and Technology (Natural Science Edition), no. 03, 15 September 2016 (2016-09-15) *

Also Published As

Publication number Publication date
CN112926356B (en) 2024-06-18

Similar Documents

Publication Publication Date Title
US10796438B2 (en) Method and apparatus for tracking target profile in video
CN110225366B (en) Video data processing and advertisement space determining method, device, medium and electronic equipment
CN109255337B (en) Face key point detection method and device
US20210200971A1 (en) Image processing method and apparatus
CN110069961B (en) Object detection method and device
CN111192312B (en) Depth image acquisition method, device, equipment and medium based on deep learning
CN111815738B (en) Method and device for constructing map
CN108182457B (en) Method and apparatus for generating information
CN113158773B (en) Training method and training device for living body detection model
CN110349158A (en) A kind of method and apparatus handling point cloud data
CN110941978A (en) Face clustering method and device for unidentified personnel and storage medium
KR20240140057A (en) Facial recognition method and device
CN113033377A (en) Character position correction method, character position correction device, electronic equipment and storage medium
CN110288625B (en) Method and apparatus for processing image
CN114119990A (en) Method, apparatus and computer program product for image feature point matching
CN112651399A (en) Method for detecting same-line characters in oblique image and related equipment thereof
CN113362090A (en) User behavior data processing method and device
CN114724144B (en) Text recognition method, training device, training equipment and training medium for model
CN112926356B (en) Target tracking method and device
CN113808134B (en) Oil tank layout information generation method, oil tank layout information generation device, electronic apparatus, and medium
CN110634155A (en) Target detection method and device based on deep learning
CN112487943B (en) Key frame de-duplication method and device and electronic equipment
CN115376026A (en) Key area positioning method, device, equipment and storage medium
CN114581711A (en) Target object detection method, apparatus, device, storage medium, and program product
CN112000218B (en) Object display method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant