CN110059521B - Target tracking method and device

Publication number: CN110059521B
Authority: CN (China)
Prior art keywords: detected, target object, similarity, video frame, frame image
Legal status: Active (granted)
Application number: CN201810049002.8A
Other languages: Chinese (zh)
Other versions: CN110059521A
Inventor: 黄元捷
Current Assignee: Zhejiang Uniview Technologies Co., Ltd.
Original Assignee: Zhejiang Uniview Technologies Co., Ltd.
Events:
    • Application filed by Zhejiang Uniview Technologies Co., Ltd.
    • Priority to CN201810049002.8A
    • Publication of application CN110059521A
    • Application granted
    • Publication of grant CN110059521B

Classifications

    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments (G Physics; G06 Computing; calculating or counting; G06T Image data processing or generation, in general; G06T7/00 Image analysis; G06T7/20 Analysis of motion)
    • G06T7/251 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments, involving models
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames (G06V Image or video recognition or understanding; G06V20/00 Scenes; scene-specific elements; G06V20/40 Scenes in video content)
    • G06T2207/10016 Video; image sequence (G06T2207/00 Indexing scheme for image analysis or image enhancement; G06T2207/10 Image acquisition modality)

Abstract

The invention provides a target tracking method and device, applied to a server that stores a feature model for each target object. The method comprises the following steps: performing target detection on the current video frame image, and extracting the corresponding CNN features according to the detected position information of each object to be detected; calculating a corresponding similarity matrix according to the position information and CNN features of each object to be detected and the position information and feature model of each target object in the previous video frame image; performing data association between each object to be detected and each target object based on the similarity matrix to obtain an optimal matching result; and, if the optimal matching result contains an object to be detected that is successfully matched with a corresponding target object, updating the corresponding feature model according to the CNN feature of that object and obtaining a corresponding tracking result based on it. The method has strong anti-interference capability and a high tracking success rate, and can track target objects continuously.

Description

Target tracking method and device
Technical Field
The invention relates to the technical field of multi-target tracking of video images, in particular to a target tracking method and device.
Background
With the continuous development of monitoring technology, multi-target tracking, i.e., tracking multiple target objects in a surveillance video, is being applied ever more widely. In existing multi-target tracking schemes, a target object is tracked by comparing its CNN (Convolutional Neural Network) feature in the current video image with the CNN feature extracted the last time the target was successfully tracked. Such schemes have weak anti-interference capability and a low target tracking success rate: the most recently extracted CNN feature often carries features of a partial occlusion, so the CNN feature of the target object in the current video image cannot be correctly matched against it, causing the tracking to fail.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention aims to provide a target tracking method and a target tracking device.
In terms of the method, a preferred embodiment of the present invention provides a target tracking method applied to a server, where the server stores a feature model corresponding to each target object, and each feature model includes historical CNN features of the corresponding target object. The method includes:
performing target detection on a current video frame image, and extracting the CNN feature corresponding to each object to be detected from the current video frame image according to the detected position information of each object to be detected in the current video frame image;
calculating to obtain a similarity matrix between each object to be detected in the current video frame image and each target object in the previous video frame image according to the position information and the corresponding CNN characteristic of each object to be detected in the current video frame image, and the position information and the corresponding characteristic model of each target object in the previous video frame image;
performing data association on each object to be detected and each target object based on the similarity matrix to obtain an optimal matching result between the current video frame image and the previous video frame image;
and if the optimal matching result contains an object to be detected that is successfully matched with the corresponding target object, updating the feature model corresponding to that target object according to the CNN feature of the successfully matched object to be detected, and obtaining a corresponding tracking result based on the successfully matched object to be detected.

According to the method, the similarity matrix between each object to be detected in the current video frame image and each target object in the previous video frame image is calculated from the CNN features of each object to be detected in the current video frame image and the historical CNN features included in the feature model of each target object in the previous video frame image; an optimal matching result between the two images is obtained based on the similarity matrix; and a corresponding tracking result is obtained for each object to be detected that is successfully matched with a corresponding target object. This reduces the influence of interfering objects on target tracking, improves the target tracking success rate, and realizes continuous tracking of the target object.
In terms of the apparatus, a preferred embodiment of the present invention provides a target tracking apparatus applied to a server, where the server stores a feature model corresponding to each target object, and each feature model includes historical CNN features of the corresponding target object. The apparatus includes:
the detection extraction module, used for performing target detection on the current video frame image and extracting the CNN feature corresponding to each object to be detected from the current video frame image according to the detected position information of each object to be detected in the current video frame image;
the matrix calculation module is used for calculating to obtain a similarity matrix between each object to be detected in the current video frame image and each target object in the previous video frame image according to the position information and the corresponding CNN characteristic of each object to be detected in the current video frame image and the position information and the corresponding characteristic model of each target object in the previous video frame image;
the image matching module is used for performing data association on each object to be detected and each target object based on the similarity matrix to obtain an optimal matching result between the current video frame image and the previous video frame image;
and the updating and tracking module is used for updating the characteristic model corresponding to the target object according to the CNN characteristics of the object to be detected successfully matched with the corresponding target object and obtaining a corresponding tracking result based on the successfully matched object to be detected if the object to be detected successfully matched with the corresponding target object exists in the optimal matching result.
Compared with the prior art, the target tracking method and device provided by the preferred embodiments of the present invention have the following beneficial effects: the target tracking method has strong anti-interference capability and a high target tracking success rate, and can track the target object continuously. The target tracking method is applied to a server, and the server stores a feature model corresponding to each target object, wherein each feature model includes historical CNN features of the corresponding target object. First, the method performs target detection on the current video frame image to obtain each object to be detected, and extracts the CNN feature corresponding to each object to be detected from the current video frame image according to the detected position information of each object to be detected. Next, according to the position information and corresponding CNN feature of each object to be detected in the current video frame image, and the position information and corresponding feature model of each target object in the previous video frame image, the method calculates the similarity matrix between each object to be detected in the current video frame image and each target object in the previous video frame image. The method then performs data association between each object to be detected and each target object based on the similarity matrix to obtain an optimal matching result between the current video frame image and the previous video frame image. Finally, when an object to be detected successfully matched with a corresponding target object exists in the optimal matching result, the method updates the feature model corresponding to that target object according to the CNN feature of the successfully matched object to be detected, and obtains a corresponding tracking result based on it, thereby reducing the influence of interfering objects on target tracking, improving the target tracking success rate, and realizing continuous tracking of the target object.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments are briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention, and therefore should not be considered as limiting the scope of the claims of the present invention, and it is obvious for those skilled in the art that other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a block diagram of a server according to a preferred embodiment of the present invention.
Fig. 2 is a flowchart illustrating a target tracking method according to a preferred embodiment of the invention.
Fig. 3 is a flowchart illustrating the sub-steps included in step S220 shown in fig. 2.
Fig. 4 is a flowchart illustrating the sub-steps included in step S240 shown in fig. 2.
FIG. 5 is a block diagram of the target tracking device shown in FIG. 1 according to a preferred embodiment of the present invention.
FIG. 6 is a block diagram of the matrix calculation module shown in FIG. 5.
Reference numerals: 10 - server; 11 - memory; 12 - processor; 13 - communication unit; 100 - target tracking device; 110 - detection extraction module; 120 - matrix calculation module; 130 - image matching module; 140 - update tracking module; 121 - similarity calculation submodule; 122 - matrix generation submodule.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
Some embodiments of the invention are described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
Fig. 1 is a block diagram of a server 10 according to a preferred embodiment of the present invention. In the embodiment of the present invention, the server 10 is configured to perform target tracking on each monitored object in the acquired monitoring video, where the target tracking has strong anti-interference capability and high tracking success rate, and the server 10 may be, but is not limited to, a cloud server, a distributed server, a centralized server, and the like.
In this embodiment, the server 10 includes a target tracking device 100, a memory 11, a processor 12, and a communication unit 13. The memory 11, the processor 12 and the communication unit 13 are electrically connected to each other directly or indirectly to realize data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines.
The memory 11 may be configured to store feature models corresponding to target objects in a surveillance video, where each feature model includes a historical CNN feature extracted when a corresponding target object is tracked by the server 10, the target object is an object to be tracked in the surveillance video, and the target object may be a person, a vehicle, an animal, and/or a plant. The Memory 11 may be, but is not limited to, a Random Access Memory (RAM), a Read Only Memory (ROM), a Programmable Read Only Memory (PROM), an Erasable Programmable Read Only Memory (EPROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), and the like. The memory 11 may store a software program, and the processor 12 may execute the software program after receiving an execution instruction.
The processor 12 may be an integrated circuit chip having signal processing capabilities. The Processor 12 may be a general-purpose Processor including a Central Processing Unit (CPU), a Network Processor (NP), and the like; but also digital signal processors, application specific integrated circuits, off-the-shelf programmable gate arrays or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components. The processor 12 may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the present invention. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The communication unit 13 is configured to establish a communication connection between the server 10 and another external device via a network, and to transmit and receive data via the network. The server 10 obtains a surveillance video that needs to be subjected to target tracking from the surveillance device through the communication unit 13, and after the target tracking of the surveillance video is completed, the surveillance video subjected to target tracking can be displayed on the display device through the communication unit 13.
The target tracking device 100 includes at least one software functional module that can be stored in the memory 11 in the form of software or firmware. The processor 12 may be used to execute executable modules stored in the memory 11 corresponding to the target tracking device 100, such as software functional modules and computer programs included in the target tracking device 100. In this embodiment, the target tracking apparatus 100 has a strong anti-interference capability, and can perform target tracking with a high success rate of continuously tracking the target objects in the monitoring video in a manner of comparing the CNN features of each object to be detected in the current video frame image with the historical CNN features included in the feature models of each target object one by one.
It is to be understood that the block diagram shown in fig. 1 is merely a schematic diagram of one structural component of the server 10, and that the server 10 may include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1. The components shown in fig. 1 may be implemented in hardware, software, or a combination thereof.
Fig. 2 is a flowchart of a target tracking method according to a preferred embodiment of the invention. In the embodiment of the present invention, the target tracking method is applied to the server 10 and is used to continuously track each target object in a surveillance video, with strong anti-interference capability and a high tracking success rate. Feature models corresponding to the target objects in the surveillance video are stored in the server 10, and each feature model includes historical CNN features of the corresponding target object. The specific flow and steps of the target tracking method shown in fig. 2 are described in detail below.
In an embodiment of the present invention, the target tracking method includes the following steps:
step S210, performing target detection on the current video frame image, and extracting CNN characteristics corresponding to each object to be detected from the current video frame image according to the position information of each object to be detected in the detected current video frame image.
In this embodiment, the surveillance video acquired by the server 10 may be formed by a sequence of continuously displayed video frame images, and the server 10 may complete target tracking of the objects to be tracked in the surveillance video by comparing the CNN features of each target object that may appear across those video frame images. The server 10 may obtain the position information of each object to be detected by performing target detection on the current video frame image, crop the feature region corresponding to each object to be detected from the corresponding position in the current video frame image according to that position information, and then extract the CNN feature of each object to be detected from its feature region. An object to be detected is an object detected in the current video frame image; it may be an object that has been continuously tracked in at least one earlier video frame image of the surveillance video (arranged in time sequence), or an object that newly appears in the current video frame image and needs to be tracked for the first time.
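As an illustration of this step, the sketch below shows one way the detection and feature-extraction stage could be organised in Python; `detector` (returning `(x, y, w, h)` boxes) and `embedder` (mapping an image patch to a feature vector) are hypothetical stand-ins, not names from the patent.

```python
import numpy as np

def extract_detections(frame, detector, embedder):
    """Detect each object to be detected in the current frame and extract
    its CNN feature. `detector` and `embedder` are placeholders for
    whatever detection and feature networks a deployment uses."""
    detections = []
    for (x, y, w, h) in detector(frame):   # position information per object
        patch = frame[y:y + h, x:x + w]    # crop the object's feature region
        feature = np.asarray(embedder(patch), dtype=np.float64)
        detections.append({"box": (x, y, w, h), "feature": feature})
    return detections
```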
Step S220, calculating a similarity matrix between each object to be detected in the current video frame image and each target object in the previous video frame image according to the position information and the corresponding CNN feature of each object to be detected in the current video frame image, and the position information and the corresponding feature model of each target object in the previous video frame image.
In this embodiment, the previous video frame image is the video frame image immediately preceding the current video frame image in the time sequence of the surveillance video. The target objects in the previous video frame image are all the target objects the server 10 has acquired from the surveillance video before performing target tracking on the current video frame image; they include both the target objects directly visible in the previous video frame image and the target objects that are not directly visible in it but had been tracked before target detection is performed on the current video frame image.
In this embodiment, the server 10 obtains the similarity matrix between each object to be detected in the current video frame image and each target object in the previous video frame image by comparing the CNN feature of each object to be detected with the historical CNN features included in the feature model of each target object, and by comparing the position information of each object to be detected in the current video frame image with the corresponding position information of each target object in the previous video frame image. The historical CNN features included in each feature model are the CNN features extracted from the corresponding video frame images whenever the corresponding target object was successfully tracked. For example, suppose a first video frame image, a third video frame image, a fifth video frame image, and a seventh video frame image are arranged in time sequence, and a target object is successfully tracked in the first, fifth, and seventh video frame images. If the number of historical CNN features in the feature model is not limited, the feature model of that target object includes the CNN features of the target object in the first, fifth, and seventh video frame images.
Optionally, please refer to fig. 3, which is a flowchart illustrating the sub-steps included in step S220 shown in fig. 2. In this embodiment, the step of calculating the similarity matrix between each object to be detected in the current video frame image and each target object in the previous video frame image in step S220 may include sub-step S221, sub-step S222, and sub-step S223:
and a substep S221, calculating and obtaining the feature similarity between each object to be detected and each target object based on the historical CNN features in the feature model corresponding to each target object.
In this embodiment, the server 10 may obtain the optimal feature similarity between each object to be detected and each target object by comparing and calculating the CNN features corresponding to each object to be detected in the current video frame image with all the historical CNN features included in the feature model of each target object in the previous video frame image.
Optionally, the step of calculating, based on the historical CNN features in the feature model corresponding to each target object, to obtain the feature similarity between each object to be detected and each target object includes:
calculating the cosine distance between the CNN characteristic of each object to be detected and each historical CNN characteristic in the characteristic model corresponding to each target object to obtain each cosine distance between the object to be detected and the corresponding target object;
and selecting the cosine distance with the minimum value from the cosine distances as the characteristic similarity between the object to be detected and the corresponding target object.
The feature similarity between each object to be detected and the corresponding target object obtained by the above steps can be calculated by the following formula:

$$M_i = \{F_i^0, F_i^1, \ldots, F_i^n\}, \qquad \mathrm{aff}_{app} = \min_{F_i \in M_i} \operatorname{cosine}(F_i, \hat{F})$$

wherein $M_i$ denotes the feature model of the target object whose target serial number is $i$, $F_i^0$ denotes the initial CNN feature of that target object, $F_i^n$ denotes the historical CNN feature of that target object in the $n$-th video frame image, $\mathrm{aff}_{app}$ denotes the feature similarity between the object to be detected and the corresponding target object, $F_i$ denotes a historical CNN feature of the corresponding target object, $\hat{F}$ denotes the CNN feature of the object to be detected, and $\operatorname{cosine}(F_i, \hat{F})$ denotes the cosine distance between the historical CNN feature $F_i$ of the target object and the CNN feature $\hat{F}$ of the object to be detected.
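As a concrete reading of this sub-step, the sketch below computes aff_app literally as described: every cosine distance between the detection's CNN feature and the stored historical CNN features is evaluated, and the minimum is kept. Interpreting "cosine distance" as 1 minus cosine similarity is an assumption; the patent only names the metric.

```python
import numpy as np

def cosine_distance(a, b):
    """Cosine distance between two CNN feature vectors, taken here as
    1 - cosine similarity (an assumed convention)."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def feature_similarity(det_feature, feature_model):
    """aff_app: cosine distance to every historical CNN feature in the
    target's feature model, keeping the minimum, as the text describes."""
    return min(cosine_distance(f, det_feature) for f in feature_model)
```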
The substep S222 is to calculate and obtain the spatial similarity and the shape similarity between each object to be detected and each target object based on the position information and the target size information of each target object in the previous video frame image and the position information and the target size information of each object to be detected in the current video frame image;
in this embodiment, the position information of each target object in the previous video frame image includes X coordinate information and Y coordinate information of a coordinate point at the upper left corner of a corresponding feature region of the corresponding target object in the previous video frame image, the target size information of each target object in the previous video frame image includes a region width and a region height of a corresponding feature region of the corresponding target object in the previous video frame image, the position information of each object to be detected in the current video frame image includes X coordinate information and Y coordinate information of a coordinate point at the upper left corner of a corresponding feature region of the object to be detected in the current video frame image, and the target size information of each object to be detected in the current video frame image includes a region width and a region height of a corresponding feature region of the object to be detected in the current video frame image. The server 10 calculates and obtains the spatial similarity between each object to be detected and each target object according to the area width, the area height, the X coordinate information and the Y coordinate information of each object to be detected, and the X coordinate information and the Y coordinate information of each target object; the server 10 calculates and obtains the shape similarity between each object to be detected and each target object according to the area width, the area height, the X coordinate information, the Y coordinate information of each object to be detected, and the area width and the area height of each target object. In this embodiment, the spatial similarity and the shape similarity both conform to the matching criteria of the hungarian algorithm and the extended algorithm thereof. Wherein the spatial similarity and the shape similarity can be calculated by the following formula:
$$\mathrm{aff}_{mot}(trk_i, det_j) = \exp\left\{-\left[\left(\frac{X^{trk_i}-X^{det_j}}{W^{det_j}}\right)^2 + \left(\frac{Y^{trk_i}-Y^{det_j}}{H^{det_j}}\right)^2\right]\right\}$$

$$\mathrm{aff}_{shp}(trk_i, det_j) = \exp\left\{-\left[\frac{\left|H^{trk_i}-H^{det_j}\right|}{H^{trk_i}+H^{det_j}} + \frac{\left|W^{trk_i}-W^{det_j}\right|}{W^{trk_i}+W^{det_j}}\right]\right\}$$

where $trk_i$ denotes the $i$-th target object, $det_j$ denotes the $j$-th object to be detected, $X$, $Y$, $W$ and $H$ respectively denote the x coordinate value and y coordinate value of the upper-left corner point, the region width and the region height of the object's corresponding feature region in the corresponding video frame image, $\mathrm{aff}_{mot}(trk_i, det_j)$ denotes the spatial similarity between the $i$-th target object and the $j$-th object to be detected, and $\mathrm{aff}_{shp}(trk_i, det_j)$ denotes the shape similarity between the $i$-th target object and the $j$-th object to be detected.
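A minimal sketch of the two affinities, assuming the exponential forms reconstructed above and `(x, y, w, h)` box tuples with the upper-left-corner convention used in the text:

```python
import math

def spatial_similarity(trk_box, det_box):
    """aff_mot: displacement of the upper-left corner, normalised by the
    detection's region width/height."""
    xt, yt, _, _ = trk_box
    xd, yd, wd, hd = det_box
    return math.exp(-(((xt - xd) / wd) ** 2 + ((yt - yd) / hd) ** 2))

def shape_similarity(trk_box, det_box):
    """aff_shp: relative difference in region width and height."""
    _, _, wt, ht = trk_box
    _, _, wd, hd = det_box
    return math.exp(-(abs(ht - hd) / (ht + hd) + abs(wt - wd) / (wt + wd)))
```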
And a substep S223 of calculating the association similarity between each object to be detected and each target object according to the feature similarity, the spatial similarity and the shape similarity between each object to be detected and each target object, and correspondingly obtaining the similarity matrix.
In this embodiment, the server 10 obtains the optimal association similarity between each object to be detected and each target object by multiplying the feature similarity, the spatial similarity, and the shape similarity between each object to be detected and each target object, and arranges the optimal association similarity between each object to be detected and each target object in a matrix form to generate the optimal similarity matrix between the current video frame image and the previous video frame image.
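Assembling the similarity matrix is then a pair of nested loops over (target object, object to be detected) pairs. The sketch reuses the helper functions above; the `trk["model"]`/`trk["box"]` dictionary layout is an assumed convenience, not the patent's.

```python
import numpy as np

def build_similarity_matrix(tracks, detections):
    """Association similarity for every (target object, object to be
    detected) pair: the product of feature, spatial and shape similarity,
    arranged as a matrix."""
    S = np.zeros((len(tracks), len(detections)))
    for i, trk in enumerate(tracks):
        for j, det in enumerate(detections):
            S[i, j] = (feature_similarity(det["feature"], trk["model"])
                       * spatial_similarity(trk["box"], det["box"])
                       * shape_similarity(trk["box"], det["box"]))
    return S
```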
Referring to fig. 2 again, in step S230, data association is performed between each object to be detected and each target object based on the similarity matrix, so as to obtain an optimal matching result between the current video frame image and the previous video frame image.
In this embodiment, the server 10 performs data association between each object to be detected and each target object based on the similarity matrix by using the hungarian algorithm or the extended algorithm thereof, so as to obtain an optimal matching result between the current video frame image and the previous video frame image.
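This step maps directly onto an off-the-shelf Hungarian solver. The sketch below uses scipy's `linear_sum_assignment` in maximisation mode; the gating value `min_similarity` is an assumed parameter, since the patent gives no numeric threshold.

```python
from scipy.optimize import linear_sum_assignment

def associate(similarity, min_similarity=0.3):
    """Hungarian data association over the similarity matrix.

    Rows index target objects, columns index objects to be detected.
    Pairs whose similarity falls below `min_similarity` (an assumed
    threshold) are treated as unmatched."""
    rows, cols = linear_sum_assignment(similarity, maximize=True)
    matches = [(i, j) for i, j in zip(rows, cols)
               if similarity[i, j] >= min_similarity]
    matched_t = {i for i, _ in matches}
    matched_d = {j for _, j in matches}
    unmatched_tracks = [i for i in range(similarity.shape[0]) if i not in matched_t]
    unmatched_dets = [j for j in range(similarity.shape[1]) if j not in matched_d]
    return matches, unmatched_tracks, unmatched_dets
```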
Step S240, if there is an object to be detected successfully matched with the corresponding target object in the optimal matching result, updating the feature model corresponding to the target object according to the CNN feature of the object to be detected successfully matched with the corresponding target object, and obtaining a corresponding tracking result based on the successfully matched object to be detected.
In this embodiment, when the optimal matching result between the current video frame image and the previous video frame image is obtained, the server 10 divides the objects to be detected in the current video frame image into three classes: objects to be detected that are successfully matched with a corresponding target object in the previous video frame image, objects to be detected with a low matching degree with the corresponding target object in the previous video frame image, and objects to be detected that match no target object in the previous video frame image, i.e., objects that newly appear in the current video frame image and need to be tracked.
In this embodiment, for an object to be detected that needs to be tracked newly appearing in the current video frame image, the server 10 may use the CNN feature of the object to be detected in the current video frame image as the initial CNN feature of the object to be detected, create a feature model of the object to be detected based on the initial CNN feature, and perform parameter correction on the object to be detected by using a Kalman filter to obtain a tracking result of the object to be detected. When the server 10 performs target tracking on a video frame image subsequent to the current video frame image, the created object to be detected is used as a target object of the monitoring video, and the target tracking is performed by using the feature model of the object to be detected.
In this embodiment, for an object to be detected in the current video frame image with a low matching degree with the corresponding target object in the previous video frame image, the server 10 predicts the position information of that object over the previous video frame images based on a Kalman filter, and determines whether to remove the object's tracker according to the prediction result. If the prediction result indicates that the position information of the object has remained unchanged for a long time and the total predicted duration is greater than a preset duration threshold, the server 10 removes the tracker of the object; if the position information has remained unchanged for a long time but the total predicted duration is smaller than the preset duration threshold, the server 10 performs parameter correction on the object by using a Kalman filter to obtain the tracking result of the object.
In this embodiment, for an object to be detected in the current video frame image that is successfully matched with the corresponding target object in the previous video frame image, the server 10 updates the feature model of the matched target object according to the CNN feature of the object to be detected in the current video frame image, and performs parameter correction on the object by using a Kalman filter to obtain its tracking result; the successfully matched object to be detected is then tracked as that target object.
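Taken together, the three cases could be dispatched as in the sketch below. The Kalman filter interface (`kalman_factory`, `.predict()`, `.correct()`) and `max_missed_time` (standing in for the preset duration threshold) are assumed placeholders, and `update_feature_model` is sketched in the model-update subsection that follows.

```python
def step_tracker(tracks, detections, kalman_factory, dt, max_missed_time=2.0):
    """One update cycle after detection: associate, correct matched tracks,
    age out long-missed ones, and start tracks for new objects."""
    S = build_similarity_matrix(tracks, detections)
    matches, lost, new = associate(S)
    for i, j in matches:                      # successfully matched objects
        det = detections[j]
        update_feature_model(tracks[i]["model"], det["feature"])
        tracks[i]["kalman"].correct(det["box"])   # parameter correction
        tracks[i]["box"] = det["box"]
        tracks[i]["missed"] = 0.0
    for i in sorted(lost, reverse=True):      # low-matching-degree targets
        tracks[i]["missed"] = tracks[i].get("missed", 0.0) + dt
        tracks[i]["box"] = tracks[i]["kalman"].predict()  # predicted position
        if tracks[i]["missed"] > max_missed_time:
            del tracks[i]                     # remove this object's tracker
    for j in new:                             # newly appearing objects
        det = detections[j]
        tracks.append({"box": det["box"],
                       "model": [det["feature"]],     # initial CNN feature
                       "kalman": kalman_factory(det["box"]),
                       "missed": 0.0})
```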
Optionally, please refer to fig. 4, which is a flowchart illustrating the sub-steps included in step S240 shown in fig. 2. In this embodiment, the step of updating the feature model corresponding to the target object according to the CNN feature of the object to be detected successfully matched with the corresponding target object in step S240 includes substeps S241 and substep S242:
and a substep S241 of counting the feature number of the historical CNN features in the feature model corresponding to the target object to obtain the corresponding feature total number.
In this embodiment, when the feature model of the corresponding target object is updated, the server 10 obtains the total number of the corresponding features by counting the number of features of the history CNN features in the feature model of the target object.
And a substep S242, comparing the total number of the features with a preset feature storage number, and adding the CNN features of the object to be detected, which are successfully matched with the target object, into a feature model corresponding to the target object according to a comparison result.
In this embodiment, the step of adding, by the server 10, the CNN feature of the object to be detected, which is successfully matched with the target object, to the feature model corresponding to the target object according to the comparison result includes:
if the comparison result is that the total number of the features is smaller than the preset feature storage number, directly adding the CNN features of the object to be detected, which are successfully matched with the target object, into a corresponding feature model for storage;
and if the comparison result is that the total number of the features is not less than the preset feature storage number, replacing any one of the historical CNN features except the initial CNN feature in the corresponding feature model with the CNN feature of the object to be detected so as to add the CNN feature of the object to be detected into the feature model corresponding to the target object.
The preset feature storage number may be, for example, 10, 15, or 25, and may be configured according to actual requirements.
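A minimal sketch of this update rule, assuming the preset feature storage number defaults to 10 and reading "any one of the historical CNN features except the initial CNN feature" as a uniformly random pick (the patent does not fix the choice):

```python
import random

def update_feature_model(model, new_feature, max_stored=10):
    """Add a matched detection's CNN feature to the feature model.

    `max_stored` is the preset feature storage number (e.g. 10, 15 or 25).
    When the model is full, one historical feature other than the initial
    feature model[0] is replaced."""
    if len(model) < max_stored:
        model.append(new_feature)              # total below preset number
    else:
        idx = random.randrange(1, len(model))  # never replace model[0]
        model[idx] = new_feature
```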
Fig. 5 is a block diagram of the target tracking device 100 shown in fig. 1 according to a preferred embodiment of the present invention. In the embodiment of the present invention, the target tracking device 100 includes a detection extraction module 110, a matrix calculation module 120, an image matching module 130, and an update tracking module 140.
The detection extraction module 110 is configured to perform target detection on a current video frame image, and extract, from the current video frame image, CNN features corresponding to each object to be detected according to position information of each object to be detected in the current video frame image.
In this embodiment, the detection extraction module 110 may execute step S210 shown in fig. 2, and the specific execution process may refer to the above detailed description of step S210.
The matrix calculation module 120 is configured to calculate, according to the position information and the corresponding CNN characteristic of each object to be detected in the current video frame image, and the position information and the corresponding characteristic model of each target object in the previous video frame image, to obtain a similarity matrix between each object to be detected in the current video frame image and each target object in the previous video frame image.
In this embodiment, the matrix calculation module 120 may perform step S220 shown in fig. 2, and the specific implementation process may refer to the detailed description of step S220 above.
Fig. 6 is a block diagram of the matrix calculation module 120 shown in fig. 5. In this embodiment, the matrix calculation module 120 includes a similarity calculation submodule 121 and a matrix generation submodule 122.
The similarity calculation submodule 121 is configured to calculate, based on the historical CNN features in the feature model corresponding to each target object, the feature similarity between each object to be detected and each target object.
In this embodiment, the way the similarity calculation submodule 121 calculates the feature similarity between each object to be detected and each target object based on the historical CNN features in the feature model corresponding to each target object includes:
calculating the cosine distance between the CNN characteristic of each object to be detected and each historical CNN characteristic in the characteristic model corresponding to each target object to obtain each cosine distance between the object to be detected and the corresponding target object;
and selecting the cosine distance with the minimum value from the cosine distances as the characteristic similarity between the object to be detected and the corresponding target object.
The similarity calculation submodule 121 may perform sub-step S221 shown in fig. 3, and the detailed implementation process may refer to the detailed description of sub-step S221 above.
The similarity calculation submodule 121 is further configured to calculate, based on the position information and the target size information of each target object in the previous video frame image, and the position information and the target size information of each object to be detected in the current video frame image, the spatial similarity and the shape similarity between each object to be detected and each target object.
In this embodiment, the similarity calculation submodule 121 may further perform sub-step S222 shown in fig. 3, and the specific implementation process may refer to the detailed description of sub-step S222 above.
The matrix generation submodule 122 is configured to calculate association similarities between each object to be detected and each target object according to the feature similarity, the spatial similarity, and the shape similarity between each object to be detected and each target object, and accordingly obtain the similarity matrix.
In this embodiment, the matrix generation sub-module 122 may perform the sub-step S223 shown in fig. 3, and the detailed implementation process may refer to the detailed description of the sub-step S223 above.
Referring to fig. 5 again, the image matching module 130 is configured to perform data association between each object to be detected and each target object based on the similarity matrix, so as to obtain an optimal matching result between the current video frame image and the previous video frame image.
In this embodiment, the image matching module 130 may execute step S230 shown in fig. 2, and the specific execution process may refer to the above detailed description of step S230.
The update tracking module 140 is configured to, if an object to be detected successfully matched with the corresponding target object exists in the optimal matching result, update the feature model corresponding to the target object according to the CNN feature of the object to be detected successfully matched with the corresponding target object, and obtain a corresponding tracking result based on the object to be detected successfully matched.
In this embodiment, the manner of updating the feature model corresponding to the target object by the update tracking module 140 according to the CNN feature of the object to be detected successfully matched with the corresponding target object includes:
counting the feature number of the historical CNN features in the feature model corresponding to the target object to obtain the corresponding feature total number;
and comparing the total number of the features with the preset stored number of the features, and adding the CNN features of the object to be detected, which are successfully matched with the target object, into the feature model corresponding to the target object according to the comparison result.
The manner in which the update tracking module 140 adds the CNN feature of the object to be detected, which is successfully matched with the target object, to the feature model corresponding to the target object according to the comparison result includes:
if the comparison result is that the total number of the features is smaller than the preset feature storage number, directly adding the CNN features of the object to be detected, which are successfully matched with the target object, into a corresponding feature model for storage;
and if the comparison result is that the total number of the features is not less than the preset feature storage number, replacing any one of the historical CNN features except the initial CNN feature in the corresponding feature model with the CNN feature of the object to be detected so as to add the CNN feature of the object to be detected into the feature model corresponding to the target object.
In this embodiment, the update tracking module 140 may execute step S240 shown in fig. 2, and sub-step S241 and sub-step S242 shown in fig. 4, and the specific execution process may refer to the above detailed description of step S240, sub-step S241, and sub-step S242.
In summary, in the target tracking method and apparatus provided by the preferred embodiments of the present invention, the target tracking method has strong anti-interference capability and a high target tracking success rate, and can track the target object continuously. The target tracking method is applied to a server, and the server stores a feature model corresponding to each target object, wherein each feature model includes historical CNN features of the corresponding target object. First, the method performs target detection on the current video frame image to obtain each object to be detected, and extracts the CNN feature corresponding to each object to be detected from the current video frame image according to the detected position information of each object to be detected. Next, according to the position information and corresponding CNN feature of each object to be detected in the current video frame image, and the position information and corresponding feature model of each target object in the previous video frame image, the method calculates the similarity matrix between each object to be detected in the current video frame image and each target object in the previous video frame image. The method then performs data association between each object to be detected and each target object based on the similarity matrix to obtain an optimal matching result between the current video frame image and the previous video frame image. Finally, when an object to be detected successfully matched with a corresponding target object exists in the optimal matching result, the method updates the feature model corresponding to that target object according to the CNN feature of the successfully matched object to be detected, and obtains a corresponding tracking result based on it, thereby reducing the influence of interfering objects on target tracking, improving the target tracking success rate, and realizing continuous tracking of the target object.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1. A target tracking method is applied to a server, the server stores feature models corresponding to target objects, wherein each feature model comprises historical CNN features of the corresponding target object, and the method comprises the following steps:
performing target detection on a current video frame image, and extracting the CNN feature corresponding to each object to be detected from the current video frame image according to the detected position information of each object to be detected in the current video frame image;
calculating to obtain a similarity matrix between each object to be detected in the current video frame image and each target object in the previous video frame image according to the position information and the corresponding CNN characteristic of each object to be detected in the current video frame image, and the position information and the corresponding characteristic model of each target object in the previous video frame image;
performing data association on each object to be detected and each target object based on the similarity matrix to obtain an optimal matching result between the current video frame image and the previous video frame image;
if the optimal matching result contains an object to be detected which is successfully matched with the corresponding target object, updating the feature model corresponding to the target object according to the CNN feature of the object to be detected which is successfully matched with the corresponding target object, and obtaining a corresponding tracking result based on the object to be detected which is successfully matched;
the step of calculating the similarity matrix between each object to be detected in the current video frame image and each target object in the previous video frame image comprises the following steps:
calculating to obtain the feature similarity between each object to be detected and each target object based on the historical CNN features in the corresponding feature model of each target object;
calculating to obtain the spatial similarity and the shape similarity between each object to be detected and each target object based on the position information and the target size information of each target object in the last video frame image and the position information and the target size information of each object to be detected in the current video frame image;
multiplying and calculating according to the feature similarity, the space similarity and the shape similarity between each object to be detected and each target object to obtain the association similarity between each object to be detected and each target object, and correspondingly obtaining a similarity matrix;
the step of calculating the spatial similarity and the shape similarity between each object to be detected and each target object based on the position information and the target size information of each target object in the previous video frame image and the position information and the target size information of each object to be detected in the current video frame image comprises the following steps:
calculating and solving the spatial similarity between each object to be detected and each target object according to the area width, the area height, the X coordinate information and the Y coordinate information of each object to be detected, and the X coordinate information and the Y coordinate information of each target object;
calculating and solving the shape similarity between each object to be detected and each target object according to the area width, the area height, the X coordinate information and the Y coordinate information of each object to be detected, and the area width and the area height of each target object;
the space similarity and the shape similarity both accord with the matching criterion of the Hungarian algorithm and the expansion algorithm thereof; wherein the spatial similarity and the shape similarity can be calculated by the following formula:
$$\mathrm{aff}_{mot}(trk_i, det_j) = \exp\left\{-\left[\left(\frac{X^{trk_i}-X^{det_j}}{W^{det_j}}\right)^2 + \left(\frac{Y^{trk_i}-Y^{det_j}}{H^{det_j}}\right)^2\right]\right\}$$

$$\mathrm{aff}_{shp}(trk_i, det_j) = \exp\left\{-\left[\frac{\left|H^{trk_i}-H^{det_j}\right|}{H^{trk_i}+H^{det_j}} + \frac{\left|W^{trk_i}-W^{det_j}\right|}{W^{trk_i}+W^{det_j}}\right]\right\}$$

wherein $trk_i$ represents the $i$-th target object, $det_j$ represents the $j$-th object to be detected, $X$, $Y$, $W$ and $H$ respectively represent the x coordinate value and y coordinate value of the upper-left corner point, the region width and the region height of the object's corresponding feature region in the corresponding video frame image, $\mathrm{aff}_{mot}(trk_i, det_j)$ represents the spatial similarity between the $i$-th target object and the $j$-th object to be detected, and $\mathrm{aff}_{shp}(trk_i, det_j)$ represents the shape similarity between the $i$-th target object and the $j$-th object to be detected.
2. The method according to claim 1, wherein the step of calculating the feature similarity between each object to be detected and each target object based on the historical CNN features in the feature model corresponding to each target object comprises:
calculating the cosine distance between the CNN characteristic of each object to be detected and each historical CNN characteristic in the characteristic model corresponding to each target object to obtain each cosine distance between the object to be detected and the corresponding target object;
and selecting the cosine distance with the minimum value from the cosine distances as the characteristic similarity between the object to be detected and the corresponding target object.
3. The method according to any one of claims 1-2, wherein the step of updating the feature model corresponding to the target object according to the CNN feature of the object to be detected that is successfully matched with the corresponding target object comprises:
counting the feature number of the historical CNN features in the feature model corresponding to the target object to obtain the corresponding feature total number;
and comparing the total number of the features with the preset stored number of the features, and adding the CNN features of the object to be detected, which are successfully matched with the target object, into the feature model corresponding to the target object according to the comparison result.
4. The method according to claim 3, wherein the step of adding the CNN feature of the object to be detected, which is successfully matched with the target object, into the feature model corresponding to the target object according to the comparison result comprises:
if the comparison result is that the total number of the features is smaller than the preset feature storage number, directly adding the CNN features of the object to be detected, which are successfully matched with the target object, into a corresponding feature model for storage;
and if the comparison result is that the total number of the features is not less than the preset feature storage number, replacing any one of the historical CNN features except the initial CNN feature in the corresponding feature model with the CNN feature of the object to be detected so as to add the CNN feature of the object to be detected into the feature model corresponding to the target object.
5. A target tracking apparatus applied to a server storing feature models corresponding to respective target objects, wherein each feature model includes historical CNN features of the corresponding target object, the apparatus comprising:
the detection extraction module, used for performing target detection on the current video frame image and extracting the CNN feature corresponding to each object to be detected from the current video frame image according to the detected position information of each object to be detected in the current video frame image;
the matrix calculation module is used for calculating to obtain a similarity matrix between each object to be detected in the current video frame image and each target object in the previous video frame image according to the position information and the corresponding CNN characteristic of each object to be detected in the current video frame image and the position information and the corresponding characteristic model of each target object in the previous video frame image;
the image matching module is used for performing data association on each object to be detected and each target object based on the similarity matrix to obtain an optimal matching result between the current video frame image and the previous video frame image;
the updating and tracking module is used for updating the characteristic model corresponding to the target object according to the CNN characteristics of the object to be detected successfully matched with the corresponding target object if the object to be detected successfully matched with the corresponding target object exists in the optimal matching result, and obtaining a corresponding tracking result based on the object to be detected successfully matched;
wherein the matrix calculation module comprises a similarity calculation submodule and a matrix generation submodule;
the similarity calculation submodule is configured to calculate the feature similarity between each object to be detected and each target object based on the historical CNN features in the feature model corresponding to each target object;
the similarity calculation submodule is further configured to calculate the spatial similarity and the shape similarity between each object to be detected and each target object based on the position information and target size information of each target object in the previous video frame image and the position information and target size information of each object to be detected in the current video frame image;
the matrix generation submodule is configured to multiply the feature similarity, the spatial similarity and the shape similarity between each object to be detected and each target object to obtain the association similarity between them, and to generate the similarity matrix accordingly;
when calculating the spatial similarity and the shape similarity between each object to be detected and each target object based on the position information and target size information of each target object in the previous video frame image and the position information and target size information of each object to be detected in the current video frame image, the similarity calculation submodule is specifically configured to:
calculate the spatial similarity between each object to be detected and each target object according to the region width, region height, X coordinate information and Y coordinate information of each object to be detected, and the X coordinate information and Y coordinate information of each target object;
calculate the shape similarity between each object to be detected and each target object according to the region width, region height, X coordinate information and Y coordinate information of each object to be detected, and the region width and region height of each target object;
wherein both the spatial similarity and the shape similarity conform to the matching criteria of the Hungarian algorithm and its extensions, and the spatial similarity and the shape similarity can be calculated by the following formulas:
[The spatial-similarity and shape-similarity formulas are given only as images (FDA0003305752390000051 and FDA0003305752390000052) in the original publication.]
wherein trk_i represents the i-th target object, det_j represents the j-th object to be detected, X, Y, W and H respectively represent the x coordinate value and y coordinate value of the upper-left corner, the region width and the region height of the object's feature region in the corresponding video frame image, aff_mot(trk_i, det_j) represents the spatial similarity between the i-th target object and the j-th object to be detected, and aff_shp(trk_i, det_j) represents the shape similarity between the i-th target object and the j-th object to be detected.
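Since the claimed formulas survive only as image references, the following LaTeX sketch shows the standard motion (spatial) and shape affinities from the multi-object-tracking literature that match the inputs recited in the claim; the exponential form and the weights w_1 and w_2 are assumptions, not the patent's confirmed formulas:

```latex
\mathrm{aff}_{\mathrm{mot}}(trk_i,\,det_j) =
  \exp\!\left\{-w_1\!\left[\left(\frac{X_i - X_j}{W_j}\right)^{2}
  + \left(\frac{Y_i - Y_j}{H_j}\right)^{2}\right]\right\}

\mathrm{aff}_{\mathrm{shp}}(trk_i,\,det_j) =
  \exp\!\left\{-w_2\!\left[\frac{\lvert W_i - W_j\rvert}{W_i + W_j}
  + \frac{\lvert H_i - H_j\rvert}{H_i + H_j}\right]\right\}
```

Here the subscript i indexes the target object and j the object to be detected; normalizing the center offsets by the detection's width and height keeps both affinities scale-invariant, consistent with the claim's choice of inputs.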
6. The apparatus according to claim 5, wherein the similarity calculation submodule calculates the feature similarity between each object to be detected and each target object based on the historical CNN features in the feature model corresponding to each target object by:
calculating the cosine distance between the CNN feature of each object to be detected and each historical CNN feature in the feature model corresponding to each target object, so as to obtain the cosine distances between the object to be detected and the corresponding target object;
and selecting the cosine distance with the minimum value from these cosine distances as the feature similarity between the object to be detected and the corresponding target object.
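Claim 6 pins down only the selection rule: compute the cosine distance from the detection's CNN feature to every stored historical feature and take the minimum (a smaller cosine distance means a closer appearance match). A hedged NumPy sketch, with hypothetical function names:

```python
from typing import List
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine distance = 1 - cosine similarity of two feature vectors."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def feature_match_distance(det_feature: np.ndarray,
                           historical_features: List[np.ndarray]) -> float:
    """Minimum cosine distance between a detection's CNN feature and the
    historical CNN features in a target's feature model (claim 6)."""
    return min(cosine_distance(det_feature, f) for f in historical_features)
```

Taking the minimum over the whole history, rather than comparing against only the latest feature, lets a target re-match after a pose change back toward any appearance it has shown before.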
7. The apparatus according to any one of claims 5 to 6, wherein the updating and tracking module updates the feature model corresponding to the target object according to the CNN feature of the object to be detected that is successfully matched with the corresponding target object by:
counting the number of historical CNN features in the feature model corresponding to the target object to obtain the corresponding total number of features;
and comparing the total number of features with the preset feature storage quantity, and adding the CNN feature of the object to be detected that is successfully matched with the target object into the feature model corresponding to the target object according to the comparison result.
8. The apparatus according to claim 7, wherein the updating and tracking module adds the CNN feature of the object to be detected that is successfully matched with the target object into the feature model corresponding to the target object according to the comparison result by:
if the comparison result indicates that the total number of features is smaller than the preset feature storage quantity, directly adding the CNN feature of the successfully matched object to be detected into the corresponding feature model for storage;
and if the comparison result indicates that the total number of features is not smaller than the preset feature storage quantity, replacing any one of the historical CNN features in the corresponding feature model, except the initial CNN feature, with the CNN feature of the object to be detected, so as to add that CNN feature into the feature model corresponding to the target object.
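Taken together, the image matching module of claims 5 to 8 performs one-to-one data association on the similarity matrix. The sketch below uses the Hungarian algorithm as implemented in SciPy's linear_sum_assignment; the gating threshold and the rejection of low-similarity pairs are assumptions layered on top of the claimed optimal matching, not details fixed by the patent:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(similarity: np.ndarray, min_similarity: float = 0.3):
    """One-to-one matching of targets (rows) to detections (columns).

    Returns (matches, unmatched_target_rows, unmatched_detection_cols)."""
    # linear_sum_assignment minimizes total cost, so negate the
    # similarities to obtain a maximum-similarity assignment.
    rows, cols = linear_sum_assignment(-similarity)
    matches = [(r, c) for r, c in zip(rows, cols)
               if similarity[r, c] >= min_similarity]
    matched_rows = {r for r, _ in matches}
    matched_cols = {c for _, c in matches}
    unmatched_targets = [r for r in range(similarity.shape[0])
                         if r not in matched_rows]
    unmatched_dets = [c for c in range(similarity.shape[1])
                      if c not in matched_cols]
    return matches, unmatched_targets, unmatched_dets
```

In a full tracker, matched detections would then trigger the feature-model update of claims 7 and 8, while unmatched targets and detections would plausibly feed track termination and track initialization respectively.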
CN201810049002.8A 2018-01-18 2018-01-18 Target tracking method and device Active CN110059521B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810049002.8A CN110059521B (en) 2018-01-18 2018-01-18 Target tracking method and device


Publications (2)

Publication Number Publication Date
CN110059521A CN110059521A (en) 2019-07-26
CN110059521B (en) 2022-05-13

Family

ID=67315187

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810049002.8A Active CN110059521B (en) 2018-01-18 2018-01-18 Target tracking method and device

Country Status (1)

Country Link
CN (1) CN110059521B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110414443A (en) * 2019-07-31 2019-11-05 苏州市科远软件技术开发有限公司 Target tracking method and apparatus, and bullet-dome camera linkage tracking method
CN110660078B (en) * 2019-08-20 2024-04-05 平安科技(深圳)有限公司 Object tracking method, device, computer equipment and storage medium
CN110517293A (en) * 2019-08-29 2019-11-29 京东方科技集团股份有限公司 Method for tracking target, device, system and computer readable storage medium
KR20220098311A (en) * 2020-12-31 2022-07-12 센스타임 인터내셔널 피티이. 리미티드. Manipulation event recognition method and device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102201060A (en) * 2011-05-31 2011-09-28 温州大学 Nonparametric contour tracking and evaluation method based on shape semantics
CN104615740A (en) * 2015-02-11 2015-05-13 中南大学 Volunteered geographic information credibility calculation method
CN105141903A (en) * 2015-08-13 2015-12-09 中国科学院自动化研究所 Method for retrieving objects in video based on color information
CN105224912A (en) * 2015-08-31 2016-01-06 电子科技大学 Video pedestrian detection and tracking method based on motion information and track association
CN106373145A (en) * 2016-08-30 2017-02-01 上海交通大学 Multi-target tracking method based on tracklet confidence and discriminative appearance learning
CN107292911A (en) * 2017-05-23 2017-10-24 南京邮电大学 Multi-object tracking method based on multi-model fusion and data association
CN107316322A (en) * 2017-06-27 2017-11-03 上海智臻智能网络科技股份有限公司 Video tracking method and apparatus, and object recognition method and apparatus

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101141633B (en) * 2007-08-28 2011-01-05 湖南大学 Moving object detecting and tracing method in complex scene
WO2009098894A1 (en) * 2008-02-06 2009-08-13 Panasonic Corporation Electronic camera and image processing method
US20100302138A1 (en) * 2009-05-29 2010-12-02 Microsoft Corporation Methods and systems for defining or modifying a visual representation
CN101673403B (en) * 2009-10-10 2012-05-23 安防制造(中国)有限公司 Target tracking method in complex interference scenes
CN104866616B (en) * 2015-06-07 2019-01-22 中科院成都信息技术股份有限公司 Surveillance video target search method
CN105023008B (en) * 2015-08-10 2018-12-18 河海大学常州校区 Pedestrian re-identification method based on visual saliency and multiple features
CN105357425B (en) * 2015-11-20 2019-03-15 小米科技有限责任公司 Image capturing method and device
CN105931269A (en) * 2016-04-22 2016-09-07 海信集团有限公司 Method and device for tracking a target in video
CN106203491B (en) * 2016-07-01 2019-03-05 交通运输部路网监测与应急处置中心 Fusion update method for highway vector data
CN106296729A (en) * 2016-07-27 2017-01-04 南京华图信息技术有限公司 Robust real-time infrared thermal imaging tracking method and system for ground moving objects


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zulfiqar Hasan Khan et al.; "A Robust Particle Filter-Based Method for Tracking Single Visual Object Through Complex Scenes Using Dynamical Object Shape and Appearance Similarity"; Journal of Signal Processing Systems; 2010-10-09; Vol. 65; pp. 63-79 *
Luo Zhaocai; "Research on Video-Based Pedestrian Detection and Tracking Algorithms"; China Master's Theses Full-text Database, Information Science and Technology Series; 2017-02-15 (No. 2); pp. I138-2908 *


Similar Documents

Publication Publication Date Title
CN110059521B (en) Target tracking method and device
CN109035299B (en) Target tracking method and device, computer equipment and storage medium
CN108073864B (en) Target object detection method, device and system and neural network structure
CN111696132B (en) Target tracking method, device, computer readable storage medium and robot
CN108182695B (en) Target tracking model training method and device, electronic equipment and storage medium
CN110335313B (en) Audio acquisition equipment positioning method and device and speaker identification method and system
CN108647587B (en) People counting method, device, terminal and storage medium
CN113239719B (en) Trajectory prediction method and device based on abnormal information identification and computer equipment
CN112991389B (en) Target tracking method and device and mobile robot
CN111553234A (en) Pedestrian tracking method and device integrating human face features and Re-ID feature sorting
CN113112542A (en) Visual positioning method and device, electronic equipment and storage medium
EP4170561A1 (en) Method and device for improving performance of data processing model, storage medium and electronic device
CN114445453A (en) Real-time multi-target tracking method and system in automatic driving
CN111476814A (en) Target tracking method, device, equipment and storage medium
CN111507999B (en) Target tracking method and device based on FDSST algorithm
CN114187009A (en) Feature interpretation method, device, equipment and medium of transaction risk prediction model
CN112819889A (en) Method and device for determining position information, storage medium and electronic device
CN116977671A (en) Target tracking method, device, equipment and storage medium based on image space positioning
CN112950687B (en) Method and device for determining tracking state, storage medium and electronic equipment
CN111199179B (en) Target object tracking method, terminal equipment and medium
CN112230801A (en) Kalman smoothing processing method, memory and equipment applied to touch trajectory
CN105651284B (en) The method and device of raising experience navigation interior joint efficiency of selection
CN112288003B (en) Neural network training and target detection method and device
CN117368879B (en) Radar diagram generation method and device, terminal equipment and readable storage medium
CN117542017A (en) Filtering method, device, equipment and storage medium for target class confidence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant