CN112381132A - Target object tracking method and system based on fusion of multiple cameras - Google Patents


Info

Publication number
CN112381132A
Authority
CN
China
Prior art keywords
target object
target
detection frame
image
object detection
Legal status
Pending
Application number
CN202011253000.4A
Other languages
Chinese (zh)
Inventor
赖哲渊
姚明江
Current Assignee
SAIC Volkswagen Automotive Co Ltd
Original Assignee
SAIC Volkswagen Automotive Co Ltd
Application filed by SAIC Volkswagen Automotive Co Ltd
Priority to CN202011253000.4A
Publication of CN112381132A
Current legal status: Pending


Classifications

    • G06K9/6288 Fusion techniques, i.e. combining data from various sources, e.g. sensor fusion
    • G06K9/6256 Obtaining sets of training patterns; Bootstrap methods, e.g. bagging, boosting
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/277 Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G06V20/584 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of vehicle lights or traffic lights
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06T2207/10016 Video; Image sequence
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/30196 Human being; Person
    • G06V2201/07 Target detection
    • G06V2201/08 Detecting or categorising vehicles

Abstract

The invention discloses a target object tracking method based on the fusion of a plurality of cameras, comprising the following steps: 100: extracting, in real time, information on target objects to be tracked from images captured by a plurality of cameras; 200: using a trained deep residual encoder to encode the image within each target object detection box; 300: storing the position of the detection box in the image, the target object type, the target object ID, the target object appearance feature code, and the corresponding timestamp as historical data; 400: predicting the current position of the detection box in the image from its historical positions; 500: selecting as candidate target objects those whose Euclidean distance is smaller than a set first threshold; 600: selecting as candidate matching target objects those whose cosine distance is smaller than a set second threshold; 700: assigning the currently detected target objects to the candidate matching target objects using the Hungarian algorithm, thereby achieving tracking.

Description

Target object tracking method and system based on fusion of multiple cameras
Technical Field
The present invention relates to a target object tracking method and system, and more particularly, to a camera-based target object tracking method and system.
Background
In recent years, with the rapid development of autonomous driving technology, autonomous vehicles are increasingly likely to enter daily life. Detecting and tracking objects with vehicle-mounted cameras is an important part of the perception stage of an autonomous vehicle.
At present, existing multi-object tracking methods generally follow the tracking-by-detection approach and are almost all based on a single vehicle-mounted front-view camera. Current mainstream methods predict object positions using optical flow tracking or a constant-velocity (linear motion) assumption, and then match detections by intersection over union (IoU).
However, these methods have several problems. When a detected object is occluded for a long time, the position estimate deviates considerably; an object that has already appeared may be given a new tag (ID); and the IDs of different objects are likely to be swapped. In addition, a single camera has a limited field of view: in multi-camera setups such as surround view, tracked objects are easily lost and these algorithms are difficult to apply, which in turn degrades downstream prediction and planning.
Against this background, and given that autonomous vehicles carry many cameras, it is desirable to provide a target object tracking method based on the fusion of multiple cameras for vehicle autonomous driving scenarios.
Disclosure of Invention
One objective of the present invention is to provide a target object tracking method based on multi-camera fusion that trains re-identification models for target objects encountered during autonomous driving, such as vehicles and pedestrians, extracts the appearance features of these target objects, and uses appearance similarity as a matching criterion to improve tracking accuracy.
To achieve the above objective, the present invention provides a target object tracking method based on the fusion of multiple cameras, comprising the following steps:
100: extracting, in real time, information on target objects to be tracked from images captured by a plurality of cameras, the target object information comprising at least: the position of the target object detection box in the image, the image within the detection box, the target object type, and the target object ID;
200: inputting the image within the target object detection box into a trained deep residual encoder, which outputs the corresponding target object appearance feature code, wherein one deep residual encoder is provided for each target object type;
300: storing, as historical data, the position of the target object detection box in the image, the target object type, the target object ID, the appearance feature code, and the corresponding timestamp;
400: predicting the current position of each target object detection box in the image from its historical positions, to obtain the predicted position of the detection box;
500: calculating the Euclidean distance between the position of each currently detected target object detection box and the predicted position of each stored detection box, and selecting as candidate target objects those whose distance is smaller than a set first threshold;
600: calculating the cosine distance between the appearance feature code of the currently detected target object and the appearance feature codes of the candidate target objects, and selecting as candidate matching target objects those whose distance is smaller than a set second threshold;
700: assigning each currently detected target object to a candidate matching target object using the Hungarian algorithm, thereby achieving tracking.
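Steps 500 to 700 can be read as building a gated cost matrix and solving an assignment problem on it. The formulation below is a sketch under stated assumptions: c_i is the center of current detection i, ĉ_j is the predicted center of stored target j, f_i is the appearance code of detection i, f_j^(k) are the codes stored for target j (the later embodiment takes the minimum over these), and τ1, τ2 are the first and second thresholds. Using the gated cosine distance itself as the Hungarian cost is an assumption; the text does not state which cost the Hungarian algorithm minimizes.

```latex
d^{(1)}_{ij} = \lVert \mathbf{c}_i - \hat{\mathbf{c}}_j \rVert_2,
\qquad
d^{(2)}_{ij} = \min_{k} \left( 1 - \frac{\mathbf{f}_i^{\top} \mathbf{f}_j^{(k)}}
                                     {\lVert \mathbf{f}_i \rVert_2 \, \lVert \mathbf{f}_j^{(k)} \rVert_2} \right)

C_{ij} =
\begin{cases}
  d^{(2)}_{ij}, & d^{(1)}_{ij} < \tau_1 \ \text{and} \ d^{(2)}_{ij} < \tau_2, \\
  \infty,       & \text{otherwise,}
\end{cases}
\qquad
\pi^{*} = \arg\min_{\pi} \sum_{(i,j) \in \pi} C_{ij}
```

Here π ranges over one-to-one pairings of detections and stored targets; a detection whose only available pairings have infinite cost is left unmatched and handled by step 800.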
The method differs from conventional tracking techniques as follows. Conventional techniques mostly predict and decide based on target object position alone, which makes it difficult to determine whether detections in different cameras belong to the same target object, so target IDs are easily lost. The present method trains separate re-identification models for vehicles and pedestrians, extracts the appearance features of target objects, and uses appearance similarity as a matching criterion, which effectively improves tracking accuracy.
Further, in the target object tracking method based on the fusion of multiple cameras of the present invention, the target objects include at least pedestrians and vehicles.
Further, in the target object tracking method based on the fusion of multiple cameras of the present invention, in step 400, a Kalman filter is used to predict the current position of the target object detection box in the image.
Further, the target object tracking method based on the fusion of multiple cameras of the present invention further comprises, between step 100 and step 200, a preprocessing step of scaling the image within the target object detection box to the input size of the deep residual encoder.
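The preprocessing step can be sketched as cropping the detection box out of the frame and resampling it to a fixed encoder input. This is a minimal sketch assuming an H x W x C NumPy image and a center/size box; the 128 x 64 input size and the function name `crop_and_scale` are hypothetical, since the text does not give the encoder input size. Nearest-neighbour sampling keeps it dependency-free, whereas a production pipeline would more likely use bilinear interpolation.

```python
import numpy as np

# Hypothetical encoder input size (height, width); the patent does not state it.
ENCODER_INPUT_HW = (128, 64)


def crop_and_scale(image, cx, cy, w, h, out_hw=ENCODER_INPUT_HW):
    """Cut the detection box (center cx, cy; size w x h, in pixels) out of an
    H x W x C image and rescale it to the encoder input size using
    nearest-neighbour sampling."""
    img_h, img_w = image.shape[:2]
    # Clip the box to the image so partially visible targets still give a patch.
    x1 = max(0, int(round(cx - w / 2)))
    x2 = min(img_w, int(round(cx + w / 2)))
    y1 = max(0, int(round(cy - h / 2)))
    y2 = min(img_h, int(round(cy + h / 2)))
    patch = image[y1:y2, x1:x2]
    if patch.size == 0:
        raise ValueError("detection box lies outside the image")
    out_h, out_w = out_hw
    # Map every output row/column to the nearest source row/column of the patch.
    rows = np.arange(out_h) * patch.shape[0] // out_h
    cols = np.arange(out_w) * patch.shape[1] // out_w
    return patch[rows[:, None], cols[None, :]]
```

For example, a 80 x 200 pixel pedestrian box centered at (640, 360) in a 1280 x 720 frame becomes a 128 x 64 x 3 patch that can be fed to the encoder for that target type.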
Further, in the target object tracking method based on the fusion of multiple cameras of the present invention, the position of the target object detection box in the image comprises the pixel coordinates of the center point of the detection box and the length and width of the detection box.
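One historical-data entry from step 300, with the position stored as center plus size, could be represented as below. This is an illustrative sketch: the class and field names are hypothetical, and the `from_corners` helper assumes a detector that reports corner-format boxes (x1, y1, x2, y2), which the text does not specify.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class TargetRecord:
    """One historical-data entry as listed in step 300 (field names are illustrative)."""
    cx: float            # pixel x of the detection box center
    cy: float            # pixel y of the detection box center
    w: float             # box width in pixels
    h: float             # box height in pixels
    obj_class: str       # target object type, e.g. "pedestrian" or "vehicle"
    obj_id: int          # target object ID
    feature: np.ndarray  # appearance feature code from the deep residual encoder
    timestamp: float     # capture time of the frame

    @classmethod
    def from_corners(cls, x1, y1, x2, y2, **fields):
        """Build a record from a corner-format box (x1, y1, x2, y2)."""
        return cls(cx=(x1 + x2) / 2, cy=(y1 + y2) / 2,
                   w=x2 - x1, h=y2 - y1, **fields)
```

Keeping the center explicit is convenient because the Kalman prediction and the Euclidean gate in steps 400 and 500 both operate on box centers.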
Further, the target object tracking method based on the fusion of multiple cameras of the present invention further comprises step 800: when no corresponding target object is matched to the currently detected target object from among the candidate matching target objects, assigning a new ID to the currently detected target object and storing it as historical data.
Further, in the target object tracking method based on the fusion of multiple cameras of the present invention, the deep residual encoder is trained using the MOT pedestrian re-identification data set and a data set collected with vehicle-mounted cameras.
Correspondingly, another objective of the present invention is to provide a target object tracking system based on the fusion of multiple cameras, which can be used to implement the above target object tracking method of the present invention.
To achieve the above objective, the present invention provides a target object tracking system based on the fusion of multiple cameras, comprising:
a target object detection module, which extracts, in real time, information on target objects to be tracked from images captured by a plurality of cameras, the target object information comprising at least: the position of the target object detection box in the image, the image within the detection box, the target object type, and the target object ID;
a target object re-identification encoding module, comprising one deep residual encoder per target object type, each deep residual encoder outputting the corresponding target object appearance feature code from the input image within the target object detection box;
a database, which receives and stores, as historical data, the position of the target object detection box in the image, the target object type, the target object ID, the appearance feature code, and the corresponding timestamp;
an online matching and tracking module, comprising a position prediction submodule, a distance calculation submodule, and a Hungarian matching submodule, wherein:
the position prediction submodule predicts the current position of the target object detection box in the image from the historical positions stored in the database, to obtain the predicted position of the detection box;
the distance calculation submodule calculates the Euclidean distance between the position of the currently detected target object detection box and the predicted position of each stored detection box, and selects from the database, as candidate target objects, those whose distance is smaller than a set first threshold; it then calculates the cosine distance between the appearance feature code of the currently detected target object and that of each candidate target object, and selects from the database, as candidate matching target objects, those whose distance is smaller than a set second threshold;
and the Hungarian matching submodule assigns each currently detected target object to a candidate matching target object using the Hungarian algorithm, thereby achieving tracking.
Further, in the target object tracking system based on the fusion of multiple cameras of the present invention, the database comprises a matching database and a cloud database; the cloud database stores all historical data of the target objects, and the matching database stores the historical data of the target objects for a set number of frames.
Further, in the target object tracking system based on the fusion of multiple cameras of the present invention, the target object re-identification encoding module further comprises a preprocessing submodule, which scales the image within the target object detection box to the input size of the deep residual encoder.
Compared with the prior art, the target object tracking method and system based on the fusion of multiple cameras of the present invention have the following advantages and beneficial effects:
(1) The invention provides a cross-camera multi-object tracking method suited to vehicle autonomous driving scenarios, which exploits the spatial arrangement of multiple cameras to track targets over a wider field of view;
(2) The deep residual encoders in the target object re-identification encoding module can be trained separately on pedestrian and vehicle re-identification data sets, so that each encoder outputs the appearance feature code of a target object from the image within its detection box, improving tracking accuracy.
Description of Drawings
Fig. 1 schematically shows the overall module diagram of the tracking algorithm of the target object tracking system based on multi-camera fusion according to an embodiment of the present invention.
Fig. 2 schematically shows the neural network training of the target object re-identification encoding module of the target object tracking system based on multi-camera fusion according to an embodiment of the present invention.
Fig. 3 schematically shows the database module of the target object tracking system based on multi-camera fusion according to an embodiment of the present invention.
Fig. 4 schematically shows a flowchart of the steps of the target object tracking method based on multi-camera fusion according to an embodiment of the present invention.
Detailed Description
The target object tracking method and system based on multi-camera fusion according to the present invention are further explained below with reference to the drawings and specific embodiments; however, this explanation does not unduly limit the technical solution of the present invention.
Fig. 1 schematically shows the overall module diagram of the tracking algorithm of the target object tracking system based on multi-camera fusion according to an embodiment of the present invention.
As shown in Fig. 1, in this embodiment, the target object tracking system according to the present invention may comprise a target object detection module, a target object re-identification encoding module, a database, and an online matching and tracking module.
In the target object tracking system of the present invention, the target object detection module extracts, in real time, information on target objects to be tracked from images captured by a plurality of cameras, the target object information comprising at least: the position of the target object detection box in the image, the image within the detection box, the target object type, and the target object ID. The detection module passes this information to the target object re-identification encoding module, which may comprise one deep residual encoder per target object type; each deep residual encoder outputs the corresponding target object appearance feature code from the input image within the detection box.
Correspondingly, the database in the target object tracking system stores, as historical data, the position of the target object detection box in the image, the target object type, the target object ID, the appearance feature code, and the corresponding timestamp.
The online matching and tracking module receives the current output of the target object re-identification encoding module and matches these features against the database to complete re-identification and tracking of the target object.
It should be noted that, in the present invention, the position prediction submodule of the online matching and tracking module predicts the current position of each target object detection box in the image from the historical positions stored in the database, obtaining the predicted position of the detection box. The distance calculation submodule calculates the Euclidean distance between the position of the currently detected target object detection box and the predicted position of each stored detection box, and selects from the database, as candidate target objects, those whose distance is smaller than the set first threshold; it then calculates the cosine distance between the appearance feature code of the currently detected target object and that of each candidate target object, and selects from the database, as candidate matching target objects, those whose distance is smaller than the set second threshold. Finally, the Hungarian matching submodule assigns the currently detected target object to one of the candidate matching target objects using the Hungarian algorithm, thereby achieving tracking.
In addition, in this embodiment, the target object re-identification encoding module further comprises a preprocessing submodule, which scales the image within the target object detection box to the input size of the deep residual encoder.
It should also be noted that, in this embodiment, the target objects handled by the target object tracking system based on multi-camera fusion of the present invention may include at least pedestrians and vehicles.
Fig. 2 schematically shows the neural network training of the target object re-identification encoding module of the target object tracking system based on multi-camera fusion according to an embodiment of the present invention.
As shown in Fig. 2, in this embodiment, the target object re-identification encoding module of the target object tracking system according to the present invention uses a deep residual encoder (also called a deep residual network) to extract an encoding of the appearance features of the target object. The deep residual encoder consists of 2 convolutional layers, 1 pooling layer, 6 residual modules, and 1 fully connected layer. Its training data comes partly from the public MOT pedestrian re-identification data set and partly from a data set collected with vehicle-mounted cameras.
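The layer counts above (2 convolutional layers, 1 pooling layer, 6 residual modules, 1 fully connected layer, 128-dimensional output) can be sketched in PyTorch as follows. Only those counts and the output size come from the text; the 3 x 3 kernels, channel widths 32/64/128, strides, 128 x 64 input, batch normalization, ReLU, and final L2 normalization are assumptions. They follow the configuration commonly used for the appearance network in Deep SORT-style trackers, whose layer counts appear to match, but this is not the patent's confirmed architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with an identity (or 1x1 projection) shortcut."""

    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.shortcut = nn.Identity()
        if stride != 1 or in_ch != out_ch:
            # Project the input when the block changes resolution or width.
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x):
        y = F.relu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))
        return F.relu(y + self.shortcut(x))


class AppearanceEncoder(nn.Module):
    """2 conv layers, 1 pooling layer, 6 residual modules, 1 FC layer -> 128-d code.
    Expects 3 x 128 x 64 input crops (assumed size)."""

    def __init__(self, feat_dim=128):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1, bias=False), nn.BatchNorm2d(32), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1, bias=False), nn.BatchNorm2d(32), nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2, padding=1),  # 128x64 -> 64x32
        )
        self.blocks = nn.Sequential(
            ResidualBlock(32, 32), ResidualBlock(32, 32),
            ResidualBlock(32, 64, stride=2), ResidualBlock(64, 64),     # -> 32x16
            ResidualBlock(64, 128, stride=2), ResidualBlock(128, 128),  # -> 16x8
        )
        self.fc = nn.Linear(128 * 16 * 8, feat_dim)

    def forward(self, x):
        x = self.blocks(self.stem(x))
        x = self.fc(torch.flatten(x, 1))
        # Unit-length codes make cosine distance equal to 1 - dot product.
        return F.normalize(x, dim=1)


# One encoder, with its own weights, per target object type.
encoders = {"pedestrian": AppearanceEncoder(), "vehicle": AppearanceEncoder()}
```

Keeping a separate encoder per class mirrors the text's "one deep residual encoder per target object type"; at inference time the detector's class label selects which encoder processes each crop.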
In this embodiment, because the pose and completeness of the same target object differ greatly between cameras, the training data set of the deep residual encoder is extended by data augmentation in the preprocessing module. Specifically, one third of the pedestrians and vehicles in the data set are randomly selected; pixels at the top and bottom of each selected image are randomly cropped, and the image is then scaled back to its original size. The augmented data set better simulates the misalignment that occurs in multi-camera scenes. The target object re-identification encoding module trains two separate sets of weights, one for pedestrians and one for vehicles, so that features of targets of the same type can be distinguished more reliably. The output of the deep residual network is a 128-dimensional feature vector, which serves as the appearance feature code of the target object.
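The augmentation described above can be sketched with NumPy as follows. The one-third selection, the top/bottom crop, and the rescale to the original size come from the text; the maximum crop share per edge (`max_frac=0.15`), nearest-neighbour resampling, and the helper names are assumptions. Augmented copies are appended rather than substituted, since the text says the data set is extended.

```python
import numpy as np


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of an H x W (x C) array."""
    rows = np.arange(out_h) * img.shape[0] // out_h
    cols = np.arange(out_w) * img.shape[1] // out_w
    return img[rows[:, None], cols[None, :]]


def crop_top_bottom(img, rng, max_frac=0.15):
    """Randomly drop rows from the top and bottom, then scale back to the
    original size. max_frac (largest share of the height removed at each
    edge) is an assumed value."""
    h, w = img.shape[:2]
    limit = int(h * max_frac)
    top = int(rng.integers(0, limit + 1))
    bottom = int(rng.integers(0, limit + 1))
    return resize_nearest(img[top:h - bottom], h, w)


def extend_dataset(images, labels, rng, fraction=1 / 3):
    """Return the data set extended with augmented copies of a random
    `fraction` of its samples; each copy keeps its identity label."""
    n_aug = int(round(len(images) * fraction))
    chosen = rng.choice(len(images), size=n_aug, replace=False)
    new_images = list(images) + [crop_top_bottom(images[i], rng) for i in chosen]
    new_labels = list(labels) + [labels[i] for i in chosen]
    return new_images, new_labels
```

Cropping only the top and bottom imitates the partial views typical of side and surround cameras, where a pedestrian's head or feet, or a vehicle's roof or wheels, fall outside the box.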
Fig. 3 schematically shows a database module of the target tracking system based on multi-camera fusion according to an embodiment of the present invention.
As shown in Fig. 3, in this embodiment, the database in the target object tracking system based on multi-camera fusion according to the present invention may comprise a matching database and a cloud database.
In the invention, the position of the target object detection box in the image, the target object type, the target object ID, the appearance feature code, and the corresponding timestamp are uploaded to both the cloud database and the matching database. The cloud database records the feature codes of all target objects. On the one hand, it can be used to adjust the number of frames kept in the matching database and to improve the robustness of the algorithm; on the other hand, in scenarios with monitoring requirements, it allows the features of a given target object and the time sequence of its appearances to be found quickly.
Correspondingly, the matching database stores only the historical data of each target object (also called a tracker) for a set number of frames, and has a mechanism for updating and deleting data. In this embodiment, the matching database stores only the records of the past 100 frames of each target object. When a target object (tracker) is not matched with any target object from any camera in a new frame, its lost time is accumulated; when the lost time exceeds a threshold, the tracker is deleted from the matching database.
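The two-tier storage can be sketched as an append-only log standing in for the cloud database plus a pruned, bounded per-tracker history for the matching database. The 100-frame history comes from the embodiment; counting lost time in frames and the threshold `MAX_LOST = 30` are hypothetical, since the text gives neither the unit nor the value. Class and method names are illustrative.

```python
from collections import deque
from dataclasses import dataclass, field

HISTORY_FRAMES = 100  # from the embodiment: keep the past 100 frames per tracker
MAX_LOST = 30         # hypothetical lost-time threshold, counted in frames


@dataclass
class Tracker:
    track_id: int
    obj_class: str
    # Bounded history: appending beyond 100 entries drops the oldest one.
    features: deque = field(default_factory=lambda: deque(maxlen=HISTORY_FRAMES))
    lost: int = 0  # consecutive frames in which no camera matched this tracker


class TrackDatabase:
    def __init__(self):
        self.matching = {}  # track_id -> Tracker; pruned over time
        self.cloud = []     # stand-in for the cloud database; append-only

    def record(self, track_id, obj_class, box, feature, timestamp):
        """Upload one observation to both stores (step 300)."""
        if track_id not in self.matching:
            self.matching[track_id] = Tracker(track_id, obj_class)
        tracker = self.matching[track_id]
        tracker.features.append(feature)
        tracker.lost = 0
        self.cloud.append((timestamp, track_id, obj_class, box, feature))

    def end_frame(self, matched_ids):
        """Age trackers that no camera matched this frame; delete those lost too long.
        matched_ids should collect the IDs matched across all cameras."""
        for track_id in list(self.matching):
            if track_id in matched_ids:
                continue
            tracker = self.matching[track_id]
            tracker.lost += 1
            if tracker.lost > MAX_LOST:
                del self.matching[track_id]

    def history(self, track_id):
        """Time-ordered cloud records of one target (the monitoring use case)."""
        return sorted((r for r in self.cloud if r[1] == track_id), key=lambda r: r[0])
```

A deque with `maxlen` gives the fixed-window behaviour without explicit deletion code, while the cloud list keeps the full history that the monitoring use case needs.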
Fig. 4 schematically shows a flowchart of steps of a target tracking method based on multi-camera fusion according to an embodiment of the present invention.
It should be noted that the present invention also discloses a target object tracking method based on the fusion of multiple cameras. As shown in Fig. 4, and with reference to Figs. 1 to 3, the target object tracking method based on multi-camera fusion according to the present invention may comprise the following steps:
100: extracting, in real time, information on target objects to be tracked from images captured by a plurality of cameras, the target object information comprising at least: the position of the target object detection box in the image, the image within the detection box, the target object type, and the target object ID;
200: inputting the image within the target object detection box into a trained deep residual encoder, which outputs the corresponding target object appearance feature code, wherein one deep residual encoder is provided for each target object type;
300: storing, as historical data, the position of the target object detection box in the image, the target object type, the target object ID, the appearance feature code, and the corresponding timestamp;
400: predicting the current position of each target object detection box in the image from its historical positions, to obtain the predicted position of the detection box;
500: calculating the Euclidean distance between the position of each currently detected target object detection box and the predicted position of each stored detection box, and selecting as candidate target objects those whose distance is smaller than a set first threshold;
600: calculating the cosine distance between the appearance feature code of the currently detected target object and the appearance feature codes of the candidate target objects, and selecting as candidate matching target objects those whose distance is smaller than a set second threshold;
700: assigning each currently detected target object to a candidate matching target object using the Hungarian algorithm, thereby achieving tracking.
In the target object tracking method according to the present invention, in step 400, a Kalman filter may be used to predict the current position of the target object detection box in the image.
In addition, in this embodiment, a preprocessing step may be included between step 100 and step 200: scaling the image within the target object detection box to the input size of the deep residual encoder.
In addition, in some other embodiments, the position of the target object detection box in the image may comprise the pixel coordinates of the center point of the detection box and the length and width of the detection box.
In the target object tracking method based on multi-camera fusion according to the present invention, in step 800, when no corresponding target object is matched to the currently detected target object from among the candidate matching target objects, a new ID is assigned to the currently detected target object, and it is stored as historical data.
With reference to Fig. 4 in conjunction with steps 100 to 700 of the target object tracking method, the target object tracking method of the present invention is implemented on the basis of the target object tracking system of the present invention.
In the embodiment shown in Fig. 4, a target object tracked by the method is referred to as a tracker. In the process shown in Fig. 4, the target object detection module performs detection on the input image sequence, obtains detection boxes from the image of the current frame, and inputs the image within each detection box into the trained deep residual encoder, which outputs the corresponding appearance feature code.
When processing the previous frame, the online matching and tracking module uses a Kalman filter predictor, based on the historical center positions of each tracker's detection box, to predict where the tracker is likely to appear in the current frame, giving the predicted position of the tracker's detection box.
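A Kalman predictor over the box center can be sketched as a constant-velocity filter with state [cx, cy, vx, vy]. The text states only that a Kalman filter is applied to the history of center positions; the constant-velocity motion model, the one-frame time step, and the noise settings `q` and `r` are assumptions chosen for illustration, and box size is not tracked here.

```python
import numpy as np


class CenterKalman:
    """Constant-velocity Kalman filter on a detection-box center.
    State x = [cx, cy, vx, vy]; measurement z = [cx, cy]."""

    def __init__(self, cx, cy, dt=1.0, q=1.0, r=4.0):
        self.x = np.array([cx, cy, 0.0, 0.0])          # velocity unknown at start
        self.P = np.diag([10.0, 10.0, 100.0, 100.0])   # large initial velocity uncertainty
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)  # only the center is observed
        self.Q = q * np.eye(4)   # process noise (assumed)
        self.R = r * np.eye(2)   # measurement noise in pixels^2 (assumed)

    def predict(self):
        """Propagate one frame ahead and return the predicted center (step 400)."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2].copy()

    def update(self, cx, cy):
        """Correct the state with the center of the matched detection box."""
        innovation = np.array([cx, cy]) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
```

In use, `predict()` is called once per tracker per frame to obtain the center used by the Euclidean gate, and `update()` is called only for trackers that were matched in that frame.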
Correspondingly, in this embodiment, the distance calculation submodule of the online matching and tracking module performs screening in two steps:
Step one: calculate the Euclidean distance between each detection box in the current frame and the predicted position of each tracker detection box.
Step two: for each tracker whose Euclidean distance in step one is smaller than the threshold, calculate the minimum cosine distance between the appearance feature code of the detected target object and the appearance feature codes stored for that tracker. This minimum cosine distance is used as the dissimilarity between the tracker and the detected target object. Trackers whose distance to the detected target object is smaller than the threshold are retained as potential candidate matching trackers.
The first screening step constrains the tracker positions. This reduces the number of candidate trackers and therefore the computational burden of the second screening step.
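The two screening steps can be sketched as follows. The sketch assumes that each tracker is represented by its predicted centre and a list of stored appearance feature codes. Both thresholds are left as parameters because the patent gives no values. The function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Code = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def cosine_distance(u: Code, v: Code) -> float:
    """1 - cos(angle): 0 for identical directions, up to 2 for opposite ones."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 1.0  # treat an all-zero code as uninformative
    return 1.0 - sum(x * y for x, y in zip(u, v)) / (nu * nv)


def screen_candidates(det_center: Point, det_code: Code,
                      trackers: Dict[int, Tuple[Point, List[Code]]],
                      pos_threshold: float, app_threshold: float) -> Dict[int, float]:
    """Return {tracker_id: min cosine distance} for trackers passing both steps.

    `trackers` maps a tracker ID to (predicted centre, stored appearance codes).
    """
    candidates: Dict[int, float] = {}
    for tid, (pred_center, gallery) in trackers.items():
        # Step one: cheap positional gate against the predicted centre.
        if euclidean(det_center, pred_center) >= pos_threshold or not gallery:
            continue
        # Step two: minimum cosine distance over the tracker's stored codes.
        dist = min(cosine_distance(det_code, g) for g in gallery)
        if dist < app_threshold:
            candidates[tid] = dist
    return candidates
```

Because step one runs first, the comparatively expensive appearance comparison in step two only runs for trackers that lie near the detection.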
In this embodiment, the detected target objects are assigned to the potential matching trackers using the Hungarian algorithm. If a detected target object is matched with a tracker, the ID of that tracker is assigned to the target object and the matching database is updated, which completes the tracking of the current target object. If the currently detected target object is not matched with any tracker, it is treated as a new target object: it is given a new ID and uploaded to the matching database as a new tracker.
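Below is a self-contained sketch of this assignment step. It uses the classic potentials-based Hungarian algorithm on a detections x trackers cost matrix. Pairs rejected by the screening steps are assumed to carry a large finite cost (`GATED`), and any assignment at or above `max_cost` is discarded. Both of these values are placeholders. In practice, a library routine such as SciPy's `linear_sum_assignment` solves the same problem.

```python
from typing import List, Sequence, Tuple

GATED = 1e5  # assumed large finite cost for pairs removed by the screening steps


def hungarian(cost: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """Minimum-total-cost assignment; returns sorted (row, col) pairs."""
    n = len(cost)
    if n == 0:
        return []
    m = len(cost[0])
    if n > m:  # the core loop needs rows <= columns, so solve the transpose
        transposed = [[cost[i][j] for i in range(n)] for j in range(m)]
        return sorted((r, c) for c, r in hungarian(transposed))
    inf = float("inf")
    u = [0.0] * (n + 1)  # row potentials
    v = [0.0] * (m + 1)  # column potentials
    p = [0] * (m + 1)    # p[j]: row (1-based) assigned to column j
    way = [0] * (m + 1)  # previous column on the alternating path
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while True:  # augment along the alternating path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break
    return sorted((p[j] - 1, j - 1) for j in range(1, m + 1) if p[j] != 0)


def associate(cost: Sequence[Sequence[float]], max_cost: float):
    """Split a frame into matches, unmatched detections and unmatched trackers."""
    matches = [(d, t) for d, t in hungarian(cost) if cost[d][t] < max_cost]
    matched_d = {d for d, _ in matches}
    matched_t = {t for _, t in matches}
    n_trackers = len(cost[0]) if cost else 0
    unmatched_d = [d for d in range(len(cost)) if d not in matched_d]
    unmatched_t = [t for t in range(n_trackers) if t not in matched_t]
    return matches, unmatched_d, unmatched_t
```

Consider the cost matrix `[[0.1, 0.2], [0.15, 0.9]]`. A greedy pairing would take the cheapest entry first and end with a total of 1.0. The Hungarian algorithm instead returns `[(0, 1), (1, 0)]`, with a total of 0.35.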
In the above technical solution, the remaining trackers that are not matched with any target object have disappeared in the current frame. For each such tracker, the lost time is updated. If the lost time exceeds a threshold, the tracker is deleted from the matching database, which indicates that the tracker is no longer observed by the cameras.
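The tracker life cycle can be sketched as follows: matched trackers are refreshed, unmatched detections become new trackers, and unmatched trackers age until they are deleted. The sketch assumes that "lost time" is counted in frames, that new IDs come from a simple counter, and that the matching database is an in-memory dictionary. The names `Track` and `TrackStore` and the threshold value are hypothetical.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Tuple

Center = Tuple[float, float]


@dataclass
class Track:
    track_id: int
    center: Center
    lost_frames: int = 0  # "lost time", counted in frames (assumption)


class TrackStore:
    """Hypothetical in-memory stand-in for the matching database."""

    def __init__(self, max_lost_frames: int = 30) -> None:
        self.max_lost_frames = max_lost_frames
        self.tracks: Dict[int, Track] = {}
        self._next_id = itertools.count(1)

    def step(self, matched: Dict[int, Center], unmatched_detections: List[Center]) -> List[int]:
        """Apply one frame's association result; return the IDs given to new targets."""
        # Matched trackers: refresh their position and reset the lost time.
        for tid, center in matched.items():
            track = self.tracks[tid]
            track.center = center
            track.lost_frames = 0
        # Trackers not seen this frame: increase lost time, delete when over the threshold.
        for tid in list(self.tracks):
            if tid not in matched:
                self.tracks[tid].lost_frames += 1
                if self.tracks[tid].lost_frames > self.max_lost_frames:
                    del self.tracks[tid]
        # Unmatched detections are new targets and receive fresh IDs.
        new_ids = []
        for center in unmatched_detections:
            tid = next(self._next_id)
            self.tracks[tid] = Track(tid, center)
            new_ids.append(tid)
        return new_ids
```

New trackers are added after the ageing pass, so a target first seen in this frame does not start with a nonzero lost time.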
The scope of the present invention is not limited to the examples given herein. All prior art that does not contradict the inventive concept, including but not limited to prior patent documents and prior publications, may be included within the scope of the present invention.
In addition, the combination of features in the present application is not limited to the combinations described in the claims or in the embodiments. All features described in the present application may be freely combined in any manner unless they contradict each other.
It should also be noted that the embodiments listed above are only specific embodiments of the present invention. The present invention is obviously not limited to these embodiments. Similar changes or modifications that those skilled in the art can readily make from the disclosure of the present invention shall fall within the scope of the present invention.

Claims (10)

1. A target object tracking method based on fusion of a plurality of cameras, characterized by comprising the following steps:
100: extracting, in real time, information on the target objects to be tracked from images captured by a plurality of cameras, wherein the target object information comprises at least: the position information of the target object detection box in the image, the image within the target object detection box, the target object class and the target object ID;
200: using a trained deep residual encoder to recognize the image within the target object detection box, so as to output a corresponding target object appearance feature code, wherein the number of deep residual encoders corresponds to the number of target object classes;
300: storing the position information of the target object detection box in the image, the target object class, the target object ID, the target object appearance feature code and the corresponding timestamp as historical data;
400: predicting the current position of the target object detection box in the image from the historical position information of the target object detection box in the image, to obtain the predicted position of the target object detection box;
500: based on the position of the current target object detection box, calculating the Euclidean distance between the current target object detection box and the predicted position of each corresponding target object detection box, and screening out, as candidate target objects, those whose Euclidean distance is smaller than a set first threshold;
600: based on the appearance feature code of the currently detected target object, calculating the cosine distance between it and the appearance feature code of each candidate target object, and screening out, as candidate matching target objects, those whose cosine distance is smaller than a set second threshold;
700: performing matching assignment between the currently detected target object and the candidate matching target objects using the Hungarian algorithm, so as to realize tracking.
2. The target object tracking method based on fusion of a plurality of cameras according to claim 1, wherein the target object classes comprise at least pedestrians and vehicles.
3. The target object tracking method based on fusion of a plurality of cameras according to claim 1, wherein in step 400 a Kalman filter is used to predict the current position of the target object detection box in the image.
4. The target object tracking method based on fusion of a plurality of cameras according to claim 1, further comprising, between step 100 and step 200, a preprocessing step of scaling the image within the target object detection box to the input size of the deep residual encoder.
5. The target object tracking method based on fusion of a plurality of cameras according to claim 1, wherein the position information of the target object detection box in the image comprises the pixel position of the center point of the detection box and the width and height of the detection box.
6. The target object tracking method based on fusion of a plurality of cameras according to claim 1, further comprising step 800: when the currently detected target object is not matched with any of the candidate matching target objects, assigning a new ID to the currently detected target object and storing it as historical data.
7. The target object tracking method based on fusion of a plurality of cameras according to claim 1, wherein the deep residual encoder is trained using an MOT pedestrian re-identification dataset and a dataset collected by vehicle-mounted cameras.
8. A target object tracking system based on fusion of a plurality of cameras, characterized by comprising:
a target object detection module, which extracts, in real time, information on the target objects to be tracked from images captured by a plurality of cameras, the target object information comprising at least: the position information of the target object detection box in the image, the image within the target object detection box, the target object class and the target object ID;
a target object re-identification encoding module, comprising deep residual encoders equal in number to the target object classes, each deep residual encoder outputting a corresponding target object appearance feature code based on the input image within the target object detection box;
a database, which receives the position information of the target object detection box in the image, the target object class, the target object ID, the target object appearance feature code and the corresponding timestamp, and stores them as historical data;
an online matching and tracking module, comprising a position prediction sub-module, a distance calculation sub-module and a Hungarian matching sub-module, wherein:
the position prediction sub-module predicts the current position of the target object detection box in the image based on the historical position information of the target object detection box stored in the database, to obtain the predicted position of the target object detection box;
the distance calculation sub-module calculates the Euclidean distance between the position of the currently detected target object detection box and the predicted position of the corresponding target object detection box, and screens out from the database, as candidate target objects, those whose Euclidean distance is smaller than a set first threshold; it then calculates the cosine distance between the appearance feature code of the currently detected target object and the appearance feature code of each candidate target object, and screens out from the database, as candidate matching target objects, those whose cosine distance is smaller than a set second threshold;
the Hungarian matching sub-module performs matching assignment between the currently detected target object and the candidate matching target objects using the Hungarian algorithm, so as to realize tracking.
9. The target object tracking system based on fusion of a plurality of cameras according to claim 8, wherein the database comprises a matching database and a cloud database, the cloud database storing all historical data of the target objects and the matching database storing the historical data of the target objects for a set number of frames.
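The two-tier storage of claim 9 can be sketched with a bounded per-target history for matching and an unbounded log standing in for the cloud database. The sketch assumes that the "set number of frames" means the most recent observations of each target. The class names and the default history length are hypothetical. Each record carries the fields listed in step 300.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Tuple


@dataclass(frozen=True)
class Record:
    """One historical observation with the fields listed in step 300."""
    box: Tuple[float, float, float, float]  # (cx, cy, w, h) in pixels
    obj_class: str
    target_id: int
    feature_code: Tuple[float, ...]
    timestamp: float


class TwoTierDatabase:
    """Hypothetical sketch: bounded matching history plus a full 'cloud' log."""

    def __init__(self, frames_kept: int = 100) -> None:
        # Matching database: only the newest `frames_kept` records per target survive.
        self.matching: Dict[int, Deque[Record]] = defaultdict(
            lambda: deque(maxlen=frames_kept))
        # Cloud database stand-in: every record is kept.
        self.cloud: List[Record] = []

    def store(self, rec: Record) -> None:
        self.matching[rec.target_id].append(rec)
        self.cloud.append(rec)

    def gallery(self, target_id: int) -> List[Tuple[float, ...]]:
        """Stored appearance codes used for the minimum-cosine-distance comparison."""
        return [r.feature_code for r in self.matching.get(target_id, ())]
```

Bounding the matching history keeps the per-frame appearance comparison cheap, while the full history remains available elsewhere.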
10. The target object tracking system based on fusion of a plurality of cameras according to claim 8, wherein the target object re-identification encoding module further comprises a preprocessing sub-module that scales the image within the target object detection box to the input size of the deep residual encoder.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011253000.4A CN112381132A (en) 2020-11-11 2020-11-11 Target object tracking method and system based on fusion of multiple cameras

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011253000.4A CN112381132A (en) 2020-11-11 2020-11-11 Target object tracking method and system based on fusion of multiple cameras

Publications (1)

Publication Number Publication Date
CN112381132A (en) 2021-02-19

Family

ID=74582097

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011253000.4A Pending CN112381132A (en) 2020-11-11 2020-11-11 Target object tracking method and system based on fusion of multiple cameras

Country Status (1)

Country Link
CN (1) CN112381132A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112598715A (en) * 2021-03-04 2021-04-02 奥特酷智能科技(南京)有限公司 Multi-sensor-based multi-target tracking method, system and computer readable medium
CN113012223A (en) * 2021-02-26 2021-06-22 清华大学 Target flow monitoring method and device, computer equipment and storage medium
CN113052876A (en) * 2021-04-25 2021-06-29 合肥中科类脑智能技术有限公司 Video relay tracking method and system based on deep learning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200074665A1 (en) * 2018-09-03 2020-03-05 Baidu Online Network Technology (Beijing) Co., Ltd. Object detection method, device, apparatus and computer-readable storage medium
CN111145213A (en) * 2019-12-10 2020-05-12 中国银联股份有限公司 Target tracking method, device and system and computer readable storage medium
CN111192297A (en) * 2019-12-31 2020-05-22 山东广域科技有限责任公司 Multi-camera target association tracking method based on metric learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Nicolai Wojke et al.: "Simple Online and Realtime Tracking with a Deep Association Metric", arXiv:1703.07402v1 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination