CN111383252A - Multi-camera target tracking method, system, device and storage medium - Google Patents


Info

Publication number
CN111383252A
CN111383252A (application CN201811637626.8A)
Authority
CN
China
Prior art keywords
camera
image
tracking
target
frame
Prior art date
Legal status
Granted
Application number
CN201811637626.8A
Other languages
Chinese (zh)
Other versions
CN111383252B (en)
Inventor
吴旻烨
毕凝
Current Assignee
Yaoke Intelligent Technology Shanghai Co ltd
Original Assignee
Yaoke Intelligent Technology Shanghai Co ltd
Priority date
Filing date
Publication date
Application filed by Yaoke Intelligent Technology Shanghai Co ltd filed Critical Yaoke Intelligent Technology Shanghai Co ltd
Priority to CN201811637626.8A
Publication of CN111383252A
Application granted
Publication of CN111383252B
Legal status: Active


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/292Multi-camera tracking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

For each set of image frames captured synchronously by multiple cameras, the method selects a plurality of original image regions in every image frame, generates response maps by passing the extracted feature maps through a filter, and extracts the tracking result in the corresponding image frame using a tracking-result bounding box determined from the response map that contains the highest-scoring target point. When the tracked target is judged to be occluded under some cameras, the tracking result of the occluded image frame is obtained by an inter-camera constraint method, which effectively removes tracking failures caused by occlusion. The multiple cameras also provide information about the tracked target from several viewing angles simultaneously, and this information can be used as input so that the correlation filter learns multi-angle features, making the tracker more robust to viewpoint changes.

Description

Multi-camera target tracking method, system, device and storage medium
Technical Field
The present application relates to the field of target tracking technologies, and in particular, to a multi-camera target tracking method, system, device, and storage medium.
Background
General visual tracking (generic visual tracking) is a fundamental task in the field of computer vision. The task gives a bounding box of the tracked object for the first frame in the video sequence, and the tracker predicts the position and size of the tracked object for each frame thereafter. With the rapid development of visual tracking, the technology is increasingly applied to specific tracking in places such as crowded areas and security checkpoints. On the other hand, visual tracking is also a key technology in automatic driving. The visual tracking task can be divided into four categories, i.e., single-camera single-target tracking, single-camera multi-target tracking, multi-camera single-target tracking, and multi-camera multi-target tracking, according to the number of cameras and the number of objects to be tracked. The key to the tracking task is accurate target location and the efficiency of the algorithm.
At present, the main problems faced by the tracking task include occlusion, illumination change, deformation of the tracked object, and motion blur. Tracking methods based on a single camera are limited by the physical properties of a single viewpoint and are poorly robust when the tracked object is occluded.
Existing multi-camera tracking algorithms are essentially multi-target trackers. They first detect people in each frame and then use a ReID (re-identification) network to extract features and match detections across frames, thereby achieving tracking. However, because of the limitations of the detection algorithms they rely on, such methods can only track people and cannot track arbitrary objects.
Disclosure of Invention
In view of the above-mentioned shortcomings of the prior art, it is an object of the present application to provide a multi-camera object tracking method, system, device and storage medium, which solve various problems of the object tracking technology in the prior art.
In order to achieve the above and other related objects, the present application provides a multi-camera target tracking method applied to an electronic device associated with a camera array, wherein the cameras in the camera array shoot synchronously and the image frames captured by the cameras at each moment form one image sequence; the method comprises the following steps: when an image sequence is correspondingly configured with an initial bounding box for framing the tracking target in its image frames, extracting a plurality of original image regions from each image frame in the image sequence using the initial bounding box and a plurality of candidate bounding boxes obtained by keeping its center unchanged and varying its scaling scale; inputting the plurality of original image regions of each image frame into a feature extractor to obtain the corresponding plurality of feature maps of that image frame; filtering the plurality of feature maps of each image frame with a filter to obtain the corresponding plurality of response maps; among the plurality of response maps obtained for the image frame corresponding to each camera in each image sequence, taking the response map containing the highest-scoring target point as the basis for acquiring the tracking result, and taking the highest score as the basis for generating the score of the corresponding camera on that image frame; taking the position of the pixel corresponding to the target point in the image frame as the bounding-box reference point and the scale of the candidate bounding box used to obtain the tracking result as the bounding-box scale, and combining the two to construct a tracking-result bounding box for extracting the tracking result from the image frame; comparing the score of each camera on its image frame in the image sequence with a preset threshold, so as to judge whether the tracking target is occluded under that camera; for image frames of the first type, belonging to cameras judged to have the tracking target occluded, correcting the corresponding tracking-result bounding box by an inter-camera constraint method and extracting the tracking result from the first-type image frames; and for image frames of the second type, belonging to cameras judged to have the tracking target not occluded, extracting the tracking result with the corresponding tracking-result bounding box.
In one embodiment, the filter and the feature extractor are pre-trained, the pre-training comprising one or more iterations, each iteration comprising: in a randomly selected video of a tracked target, randomly selecting a predetermined number of image frames for generating a plurality of training sample pairs, wherein each training sample pair comprises an original image region extracted from a randomly selected image frame according to the reference standard and a response map obtained from that original image region, or, alternatively, an image region obtained by offsetting the original image region and a response map generated from the offset image region; and training the filter with one portion of the plurality of training sample pairs and the feature extractor with the other portion.
In one embodiment, when the filter is trained, the parameters of the feature extractor are fixed and a first objective function is minimized with respect to the filter; and/or, when the feature extractor is trained, the parameters of the filter are fixed and a second objective function is minimized with respect to the feature extractor.
In one embodiment, the video comes from a target tracking data set comprising: one or more combinations of an OTB Dataset, a VOT Dataset, a Temple Color 128 Dataset, a VIVID Tracking Dataset, and a UAV123 Dataset.
In one embodiment, the filter is trained online with a target training set so that it can be updated.
In one embodiment, the online training of the filter comprises: generating training samples to be added to the target training set from the image frames collected by each camera, each training sample containing an original image region extracted from an image frame; and inputting the training samples into a third objective function of the filter, which is a weighted correlation-filtering loss of the form

E(f) = Σ_i Σ_j s_i^j · || S_f(x_i^j) − y_i^j ||²

wherein S_f(x_i^j) denotes the response map obtained by filtering the j-th training sample of the i-th camera, y_i^j denotes the response map corresponding to the reference standard of the j-th training sample of the i-th camera, and s_i^j denotes the score obtained when the tracking result was acquired from the j-th training sample of the i-th camera. The objective function is transformed into the frequency domain, in which conj() denotes the complex conjugate and the hat symbol ^ denotes the Fourier transform; the gradient ∇E(f) is computed iteratively and the objective function E(f) is optimized with a conjugate gradient method to train the filter.
In an embodiment, the multi-camera target tracking method further comprises performing an update action on the target training set, comprising: whenever each camera has collected a predetermined number of image frames, adding the tracking results of the cameras currently judged to have the tracking target not occluded to the target training set as training samples for the update.
In one embodiment, the response map is expressed as a gaussian distribution centered on a reference point in the original image region extracted by the reference standard.
In an embodiment, the method for multi-camera constraint includes: selecting a first camera from a first camera set containing cameras judged that the tracking target is occluded, and acquiring a second camera from a second camera set containing cameras judged that the tracking target is not occluded in the image sequence, wherein each camera in the first camera set and the second camera set corresponds to each image frame in the same image sequence; calculating a homography matrix for estimating a transformation relation of a motion plane of a tracking target under a first camera and a second camera according to a plurality of pairs of tracking results obtained by the first camera and the second camera in a plurality of image sequences; mapping a predetermined point of a tracking target in each second type image frame in each image sequence by using the homography matrix to obtain a mapping point of a first type image frame in the same image sequence; and carrying out constraint correction on the obtained tracking result bounding box according to the obtained mapping points of each first type image frame so as to obtain a corrected tracking result.
In an embodiment, the homography matrix is obtained by calculating a position relation between each pair of matching points in a plurality of pairs of tracking results obtained by a first tracking track of a first camera and a second tracking track of a second camera at a plurality of same time; the tracking trajectory refers to a time-sequentially ordered set of tracking results of the image frames acquired by each camera along the time sequence.
In an embodiment, before each feature map is filtered, the method further comprises smoothing the feature map.
To achieve the above and other related objects, the present application provides an electronic device, related to a camera array, including: at least one transceiver coupled to the camera array; at least one memory storing a computer program; at least one processor, coupled to the transceiver and the memory, for executing the computer program to perform the multi-camera target tracking method.
To achieve the above and other related objects, the present application provides a computer storage medium storing a computer program which, when executed, performs the multi-camera object tracking method.
In order to achieve the above and other related objects, the present application provides a multi-camera target tracking system applied to an electronic device associated with a camera array, wherein the cameras in the camera array shoot synchronously and the image frames captured by the cameras at each moment form one image sequence; the system comprises: an image processing module, configured to, when an image sequence is correspondingly configured with an initial bounding box for framing the tracking target in its image frames, extract a plurality of original image regions from each image frame in the image sequence using the initial bounding box and a plurality of candidate bounding boxes obtained by keeping its center unchanged and varying its scaling scale; a feature extractor, configured to extract features from the plurality of original image regions of each image frame to obtain the corresponding plurality of feature maps of that image frame; a filter, configured to filter the plurality of feature maps of each image frame to obtain the corresponding plurality of response maps; and a tracking calculation module, configured to take, among the plurality of response maps obtained for the image frame corresponding to each camera in each image sequence, the response map containing the highest-scoring target point as the basis for acquiring the tracking result, and the highest score as the basis for generating the score of the corresponding camera on that image frame; take the position of the pixel corresponding to the target point in the image frame as the bounding-box reference point and the scale of the candidate bounding box used to obtain the tracking result as the bounding-box scale, and combine the two to construct a tracking-result bounding box for extracting the tracking result from the image frame; compare the score of each camera on its image frame in the image sequence with a preset threshold, so as to judge whether the tracking target is occluded under that camera; for image frames of the first type, belonging to cameras judged to have the tracking target occluded, correct the corresponding tracking-result bounding box by an inter-camera constraint method and extract the tracking result from the first-type image frames; and for image frames of the second type, belonging to cameras judged to have the tracking target not occluded, extract the tracking result with the corresponding tracking-result bounding box.
As described above, the multi-camera target tracking method, system, device and storage medium of the present application select a plurality of original image regions from each of the image frames captured synchronously by the multiple cameras at each moment, generate response maps by applying a filter to the extracted feature maps, and extract the tracking result in the corresponding image frame through a tracking-result bounding box determined from the response map containing the highest-scoring target point; when the tracked target is judged to be occluded under some cameras, the tracking result of the occluded image frame is obtained by an inter-camera constraint method, which effectively removes tracking failures caused by occlusion; moreover, the multiple cameras simultaneously provide information about the tracked target from several viewing angles, which can be used as input so that the correlation filter learns multi-angle features, making the tracker more robust to viewpoint changes.
Drawings
Fig. 1 is a schematic flowchart of a multi-camera target tracking method according to an embodiment of the present application.
Fig. 2 is a schematic flow chart of a multi-camera constraint method in the embodiment of the present application.
Fig. 3 is a schematic structural diagram of an electronic device in an embodiment of the present application.
Fig. 4 is a block diagram of a multi-camera object tracking system according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present application is provided by way of specific examples, and other advantages and effects of the present application will be readily apparent to those skilled in the art from the disclosure herein. The present application is capable of other and different embodiments and of being practiced or being carried out in various ways, and it is capable of other various modifications and changes without departing from the spirit of the present application. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
Aiming at the shortcomings of existing target tracking, the present application provides a scheme that realizes target tracking by analyzing the multi-angle image frames captured by a camera array; for the tracking of a single target in particular, a more precise result can be achieved.
The camera array refers to a shooting device that combines a plurality of cameras to shoot the same scene or the same object, and the structure of the camera array may be, for example, a row of cameras, a column of cameras, or M rows by N columns of cameras, however, the camera array is not necessarily in the form of a square matrix, and may also be in various shapes such as a circle, a triangle, or other shapes.
The plurality of cameras in the camera array shoot images of the tracked target from different angles, the tracked target is positioned through image analysis, information of the tracked target at each angle can be well presented, and the problem that the tracked target is lost due to the fact that the tracked target is shielded under certain camera view angles can be avoided.
In some embodiments, the tracking target may be, for example, a person, an animal, or other moving object such as a car.
Fig. 1 shows a schematic flow chart of a multi-camera target tracking method provided in the embodiment of the present application.
The method is applied to an electronic device related to a camera array. In some embodiments, the electronic device may be a processing terminal, such as a desktop computer, laptop computer, smart phone, tablet computer, or other terminal with processing capabilities, that is independent of the camera array and is coupled to the camera array; in some embodiments, the camera array and electronics may also be integrated as components together as a single product, such as a light field camera, and the electronics may be implemented as circuitry in the light field camera that is attached to one or more circuit boards in the light field camera; in some embodiments, the cameras may be coupled to each other, and the electronic device may be implemented by a circuit in each of the cameras.
Each camera in the camera array shoots synchronously, and each image frame shot by each camera at each time is used as an image sequence.
For example, the N cameras in the camera array shoot synchronously at time t to obtain I_1 to I_N, the image frames acquired by the N cameras respectively; the corresponding image sequence is denoted I_i, i = 1, …, N.
The method specifically comprises the following steps:
Step S101: when an image sequence is correspondingly configured with an initial bounding box for framing the tracking target in its image frames, a plurality of original image regions are extracted from each image frame in the image sequence using the initial bounding box and a plurality of candidate bounding boxes obtained by keeping its center unchanged and varying its scaling scale.
In one embodiment, let the tracking target correspond to an initial bounding box B_i under each camera i in the image sequence I, i.e. one initial bounding box is required per camera for each image sequence. A bounding box is a geometric box that frames, by manual annotation or machine recognition, the image region where the tracking target is located in the image; the region framed by the initial bounding box can be used as the reference standard, i.e. the ground truth.
In this embodiment, the initial bounding box is a rectangular bounding box, and the framed image area is represented as:
B_i = (x, y, w, h);
where (x, y) represents the coordinates of a reference point of the bounding box, such as a center point, but also other feature points.
In this embodiment, the reference point (x, y) is the coordinate of the top-left corner of the bounding box, and (w, h) represents the width and height of the bounding box, in pixels.
Taking the initial bounding box B_i used to select the original image region as the center, original image regions P_i^d are extracted from the image frame corresponding to camera i in the image sequence I through candidate bounding boxes at different scaling scales d, where d = 1, …, N_d indexes the different scaling scales.
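As a minimal sketch of this multi-scale region extraction (the geometric scale step, the helper names, and the default values below are illustrative assumptions, not taken from the patent):

```python
import numpy as np

def candidate_boxes(box, num_scales=5, scale_step=1.05):
    """Generate candidate bounding boxes that share the center of `box`
    (x, y, w, h, with (x, y) the top-left corner) but vary in scale."""
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0
    boxes = []
    for d in range(num_scales):
        s = scale_step ** (d - num_scales // 2)   # scales distributed around 1.0
        bw, bh = w * s, h * s
        boxes.append((cx - bw / 2.0, cy - bh / 2.0, bw, bh))
    return boxes

def crop_region(frame, box):
    """Cut the original image region framed by `box` out of `frame`,
    clipping to the image borders."""
    x, y, w, h = [int(round(v)) for v in box]
    H, W = frame.shape[:2]
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + w, W), min(y + h, H)
    return frame[y0:y1, x0:x1]

# one image frame per camera -> a list of original image regions per frame:
# regions = [crop_region(frame_i, b) for b in candidate_boxes(initial_box_i)]
```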
Step S102: and inputting a plurality of original image areas of each image frame into a feature extractor to obtain a plurality of corresponding feature maps of each image frame.
In an embodiment, for the image feature extraction, the feature extractor may preferably be implemented with a commonly used CNN network model, for example VGG, ResNet, AlexNet, Inception, and the like.
In an embodiment, following the above example, the original image regions P_i, i = 1, …, N, at the multiple scales are used as the input of the pre-trained feature extractor g to obtain the feature maps under the current image frame, x_i^d = g(P_i^d), where d indexes the different scaling scales and i = 1, …, N; the set of feature maps is denoted {x_i^d}. Optionally, after the feature map set is obtained, it may be smoothed with a smoothing function: the feature map of the image region at each scaling scale is multiplied element-wise by a two-dimensional cosine window function w, which performs a weighted smoothing of the region and yields the smoothed feature map w ⊙ x_i^d, giving better continuity in the frequency domain.
it should be noted that the smoothing process is only optional, and not necessary; the smoothing method is not limited to the two-dimensional cosine window function, and a similar smoothing method, such as gaussian filtering, may be used.
Step S103: and filtering the plurality of characteristic maps of each image frame by using a filter to obtain a plurality of corresponding response maps.
In an embodiment, in accordance with the foregoing example, it is preferable to filter the feature map after the smoothing process; of course, in other embodiments, the feature map that is not smoothed may be filtered.
The filter f performs correlation filtering on the feature map of each camera i at each scale d, giving the response map of camera i at scale d under the current frame, denoted S_f(x_i^d) = f ⋆ x_i^d, where ⋆ denotes the correlation-filtering operation.
In one embodiment, the response map is expressed as a gaussian distribution centered on a reference point in an original image region extracted by a reference standard, each point in the response map corresponds to a score, and the highest score is 1; each response map is used for describing the correlation degree of each pixel in the original image area and the tracking target.
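A minimal sketch of the correlation-filtering step in the Fourier domain is given below; a single multi-channel filter of the same spatial size as the feature map, filtered per channel and summed over channels, is an assumption for illustration, and a practical correlation filter would also add regularization, interpolation, and explicit scale handling:

```python
import numpy as np

def correlation_response(feat, filt):
    """Correlate a smoothed feature map `feat` (H, W, C) with a filter `filt` of the same
    shape and return the (H, W) response map, computed per channel in the Fourier domain
    and summed over channels."""
    F = np.fft.fft2(feat, axes=(0, 1))
    Hf = np.fft.fft2(filt, axes=(0, 1))
    # cross-correlation corresponds to conjugation of the filter spectrum
    resp = np.fft.ifft2(np.conj(Hf) * F, axes=(0, 1)).real
    return resp.sum(axis=2)
```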
Step S104: among the plurality of response maps obtained for the image frame corresponding to each camera in each image sequence, the response map containing the highest-scoring target point is taken as the basis for acquiring the tracking result, and the highest score as the basis for generating the score of the corresponding camera on that image frame; the position of the pixel corresponding to the target point in the image frame is taken as the bounding-box reference point, the scale of the candidate bounding box used to obtain the tracking result as the bounding-box scale, and the two are combined to construct a tracking-result bounding box for extracting the tracking result from the image frame.
In an embodiment, following the above example, each image sequence contains several image frames, each corresponding to one camera in the camera array; among the response maps corresponding to the original image regions of the various scales extracted from an image frame, the response map containing the highest-scoring target point is used as the basis for acquiring the tracking result, and from it a tracking-result bounding box is constructed to obtain the tracking result in that image frame.
Preferably, the scale d* at which camera i attains its highest score s_i is taken as the scale of the bounding box for the current image frame, and the center of the moved target box is determined by the position (x_max, y_max) of the highest-scoring pixel in the corresponding response map; the highest score s_i is the maximum value over all scales and pixel positions of the response maps of camera i. The tracking result is then the region cut out of the current image frame by the tracking-result bounding box constructed from (x_max, y_max) and the scale d*.
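A sketch of how the tracking-result bounding box of one camera could be assembled from its response maps; the mapping from the response-map peak back to image coordinates is simplified here and is an assumption:

```python
import numpy as np

def pick_result(responses, boxes):
    """responses: list of (H, W) response maps, one per candidate box/scale;
    boxes: the matching candidate boxes (x, y, w, h).
    Returns the camera score and the tracking-result bounding box."""
    best_scale = int(np.argmax([r.max() for r in responses]))
    r = responses[best_scale]
    score = float(r.max())
    y_max, x_max = np.unravel_index(np.argmax(r), r.shape)
    x, y, w, h = boxes[best_scale]
    # re-center the chosen candidate box on the highest-scoring pixel
    result_box = (x + x_max - w / 2.0, y + y_max - h / 2.0, w, h)
    return score, result_box
```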
Step S105: the score of each camera on its corresponding image frame in the image sequence is compared with a preset threshold, so as to judge whether the tracking target is occluded under that camera.
The threshold value may be set empirically.
Step S106: for image frames of the first type, belonging to cameras judged to have the tracking target occluded, the corresponding tracking-result bounding box is corrected by an inter-camera constraint method and the tracking result is extracted from the first-type image frames.
In one embodiment, following the above example, if the score s_i of camera i is less than a threshold th_occ, the tracking target is considered to be occluded under camera i, and the tracking result of this camera in the current frame is determined by the inter-camera constraint method.
As shown in fig. 2, a flow chart of the multi-camera constraint method is shown.
The multi-camera constraint method comprises the following steps:
step S201: selecting a first camera from a first camera set containing cameras judged that the tracking target is occluded, and acquiring a second camera from a second camera set containing cameras judged that the tracking target is not occluded in the image sequence, wherein each camera in the first camera set and the second camera set corresponds to each image frame in the same image sequence.
In an embodiment, following the above example, for an uncalibrated array of synchronized cameras, the tracking trajectory T_i of camera i obtained by the tracking algorithm is formed by the center points of the tracking results of the image frames acquired by that camera in temporal order. When it is judged that, in the current image sequence, the cameras of a set O (o ∈ O) are occluded and the cameras of a set K (k ∈ K) are not occluded, cameras are taken from the occluded camera set and from the non-occluded camera set, and the highest scores of all cameras in the current image frame are acquired.
Step S202: and calculating a homography matrix for estimating a transformation relation of a motion plane of the tracking target under the first camera and the second camera according to a plurality of pairs of tracking results obtained by the first camera and the second camera in a plurality of image sequences.
In an embodiment, the homography matrix is obtained by calculating a position relation between each pair of matching points in a plurality of pairs of tracking results obtained by a first tracking track of a first camera and a second tracking track of a second camera at a plurality of same time; the tracking trajectory refers to a time-sequentially ordered set of tracking results of the image frames acquired by each camera along the time sequence.
Following the foregoing example, the homography matrix H_ji is calculated from the center points of the last n tracking results (n ≥ 4) of the trajectories T_j and T_i, and is used to estimate the transformation relation of the motion plane of the tracking target under the two cameras; H_ji represents the coordinate transformation from camera j to camera i, and is obtained by setting up linear equations from the corresponding points of T_j and T_i and solving them with SVD.
For the center points (x_1, y_1) and (x_2, y_2) of the target boxes of the two trajectories at the same time t, which form a pair of matched points, and the homography matrix H, we have:
λ (x_2, y_2, 1)^T = H (x_1, y_1, 1)^T
Unfolding this gives:
x_2 = (H_11 x_1 + H_12 y_1 + H_13) / (H_31 x_1 + H_32 y_1 + H_33)
y_2 = (H_21 x_1 + H_22 y_1 + H_23) / (H_31 x_1 + H_32 y_1 + H_33)
For ease of solution, the above equations can be rearranged into the form Ah = 0:
x_2 (H_31 x_1 + H_32 y_1 + H_33) − (H_11 x_1 + H_12 y_1 + H_13) = 0
y_2 (H_31 x_1 + H_32 y_1 + H_33) − (H_21 x_1 + H_22 y_1 + H_23) = 0
Rewriting these as vector products and normalizing the last element of H to 1, let h = (H_11, H_12, H_13, H_21, H_22, H_23, H_31, H_32, 1)^T; then the two equations become:
a_x^T h = 0
a_y^T h = 0
where a_x = (−x_1, −y_1, −1, 0, 0, 0, x_2 x_1, x_2 y_1, x_2)^T and a_y = (0, 0, 0, −x_1, −y_1, −1, y_2 x_1, y_2 y_1, y_2)^T.
Each pair of matched points yields such a pair of equations, and H has 8 unknowns; using the center points of the last n tracking results (n ≥ 4) of the two trajectories, the following system is obtained:
A h = 0
where A is the matrix formed by stacking the row vectors a_x^T and a_y^T of all matched point pairs.
For such an over-determined system, a least-squares solution can be obtained by singular value decomposition of the coefficient matrix A:
U Σ V^T = SVD(A^T A)
The column of V corresponding to the smallest singular value in Σ is the least-squares solution of A h = 0 (subject to ||h|| = 1), from which H is obtained.
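This direct-linear-transform solution can be sketched as follows (the helper names are illustrative; the solution is taken as the right singular vector associated with the smallest singular value):

```python
import numpy as np

def estimate_homography(pts_src, pts_dst):
    """Estimate H such that [x2, y2, 1]^T ~ H [x1, y1, 1]^T from n >= 4 matches.
    pts_src, pts_dst: arrays of shape (n, 2) with corresponding center points."""
    rows = []
    for (x1, y1), (x2, y2) in zip(pts_src, pts_dst):
        rows.append([-x1, -y1, -1, 0, 0, 0, x2 * x1, x2 * y1, x2])
        rows.append([0, 0, 0, -x1, -y1, -1, y2 * x1, y2 * y1, y2])
    A = np.asarray(rows, dtype=float)
    _, _, Vt = np.linalg.svd(A)
    h = Vt[-1]                 # right singular vector of the smallest singular value
    H = h.reshape(3, 3)
    return H / H[2, 2]         # normalize so that H33 = 1

def map_point(H, pt):
    """Apply the homography to a 2-D point (lambda * x' = H x)."""
    x, y = pt
    v = H @ np.array([x, y, 1.0])
    return v[:2] / v[2]
```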
Step S203: and mapping preset points of the tracking target in each second type image frame in each image sequence by using the homography matrix to obtain mapping points of the first type image frame in the same image sequence.
Following the above example, the center x_k of the tracking target in every camera k that is not occluded in the current image frame is transformed into the coordinates of the occluded camera o:
λ x′_ko = H_ko x_k
step S204: and carrying out constraint correction on the obtained tracking result bounding box according to the obtained mapping points of each first type image frame so as to obtain a corrected tracking result.
Following the above example, for the occluded camera o, the coordinates x′_ko of the occluded target under camera o are calculated from each of the N_k non-occluded cameras and used to apply a constraint correction to the position x_o of the tracked target predicted in the camera where the occlusion occurs; here N_k denotes the number of cameras in the set K, x_o is the position information (x, y, W, H) of the tracking-result bounding box in the current image frame, and x′_o = (x′, y′, W′, H′) is the position information of the reference point (for example the center point) of the tracking-result bounding box obtained after the inter-camera constraint has been applied, giving the corrected tracking result.
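A sketch of this correction step follows. The exact combination formulas appear only as formula images in the original publication, so the simple averaging of the mapped centers used below is an assumption, as are the helper names:

```python
import numpy as np

def correct_occluded_box(box_o, centers_k, homographies_ko):
    """box_o: predicted (x, y, w, h) in the occluded camera o;
    centers_k: target centers (x, y) in the non-occluded cameras k;
    homographies_ko: 3x3 matrices H_ko mapping camera k to camera o."""
    mapped = []
    for c, H in zip(centers_k, homographies_ko):
        v = H @ np.array([c[0], c[1], 1.0])
        mapped.append(v[:2] / v[2])            # x'_ko: the center mapped into camera o
    cx, cy = np.mean(mapped, axis=0)           # assumed: average over the N_k cameras
    x, y, w, h = box_o
    return (cx - w / 2.0, cy - h / 2.0, w, h)  # keep the predicted size, move the center
```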
Step S107: for image frames of the second type, belonging to cameras judged to have the tracking target not occluded, the tracking result is extracted with the corresponding tracking-result bounding box.
According to the above description, in the embodiment of the present application, in the case that the score certainty is low, the final tracking result is obtained by using the inter-camera constraint method rather than the filter result.
In one embodiment, the filter and feature extractor are pre-trained, the pre-training comprising: one or more iterative calculations.
Each iterative calculation includes:
in a randomly selected video of a tracked target, randomly selecting a predetermined number of image frames for generating a plurality of training sample pairs, wherein each training sample pair comprises an original image region extracted from a randomly selected image frame according to the reference standard and a response map obtained from that original image region, or, alternatively, an image region obtained by offsetting the original image region and a response map generated from the offset image region; and training the filter with one portion of the plurality of training sample pairs and the feature extractor with the other portion.
In one embodiment, when the filter is trained, the parameters of the feature extractor are fixed and a first objective function is minimized with respect to the filter; and/or, when the feature extractor is trained, the parameters of the filter are fixed and a second objective function is minimized with respect to the feature extractor.
Specifically, the video comes from a target tracking data set, which includes one or more combinations of an OTB Dataset, a VOT Dataset, a Temple Color 128 Dataset, a VIVID Tracking Dataset, and a UAV123 Dataset.
For example, in each iteration a tracking target may be randomly selected and 16 pictures randomly extracted from the corresponding video for generating training sample pairs (p_i, y_i), where p_i is a raw image region (patch) cut out of a raw image frame according to the reference standard (ground truth). The tracking target is not necessarily at the center of the cropped patch; it is randomly shifted by an offset δ to increase data diversity, and accordingly the center of the response map y_i, generated with a Gaussian kernel w_G and centered on the target, is also shifted by δ:
p_i = (x, y, W, H)
p′_i = (x ± δ, y ± δ, W, H)
y_i = w_G, a two-dimensional Gaussian whose means μ_1, μ_2 characterize the centers in the horizontal and vertical directions, whose variances σ_1, σ_2 characterize the spread of the Gaussian in the two directions, and whose correlation coefficient ρ couples the two directions; w_G(x, y) denotes a two-dimensional Gaussian distribution centered on (x, y).
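A sketch of generating such a training label; an axis-aligned Gaussian (ρ = 0) with a peak value of 1 is assumed for simplicity, and the sigma values and patch size are illustrative:

```python
import numpy as np

def gaussian_response(height, width, center, sigma=(2.0, 2.0)):
    """Response map y_i: a 2-D Gaussian of shape (height, width) whose peak (score 1)
    sits at `center` = (cx, cy); when a random offset delta is applied to the crop,
    the same delta is applied to `center`."""
    ys, xs = np.mgrid[0:height, 0:width]
    cx, cy = center
    g = np.exp(-0.5 * (((xs - cx) / sigma[0]) ** 2 + ((ys - cy) / sigma[1]) ** 2))
    return g  # maximum value 1 at the center

# example: shift both the patch and the label center by the same random delta
# delta = np.random.randint(-8, 9, size=2)
# label = gaussian_response(64, 64, (32 + delta[0], 32 + delta[1]))
```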
Each iteration extracts 16 sample pairs (p, y) from the dataset, representing the image regions and the corresponding response maps, and the training is divided into two phases:
The filter f is trained first, using for example the first 10 samples of each batch of inputs, to learn the correlation filtering. In this phase the parameters of the feature-extraction network are fixed and the loss of the filtering result is minimized:
f* = argmin_f || S_f(F(p; ω)) − y ||²
wherein y is the response map corresponding to the reference standard, F is the deep-learning feature-extraction operation, p is the input image patch, ω is the parameter of the neural network (the same below), and f* is the filter obtained by optimizing this loss function. The optimization uses a conjugate gradient algorithm.
The feature extractor F is then trained, using the last 6 samples. In this phase the filter parameter f* is fixed and the loss function is minimized with respect to the feature-extraction network:
ω* = argmin_ω Σ_i || s_i − y_i ||²
wherein y_i is the response map corresponding to the reference standard, s_i is the result of correlation filtering with x_i as input, and ∇ denotes the gradient; ω* is updated using a gradient descent method.
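The alternation between the two phases can be sketched as follows. This is not the patent's implementation: PyTorch, the Adam optimizer, and the toy network and filter sizes are assumptions made purely for illustration, whereas the patent uses a conjugate-gradient solver for the filter and gradient descent for the feature extractor.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

feature_net = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                            nn.Conv2d(32, 32, 3, padding=1))
filt = nn.Parameter(torch.randn(1, 32, 15, 15) * 0.01)   # correlation filter as a conv kernel

opt_filter = torch.optim.Adam([filt], lr=1e-3)
opt_features = torch.optim.Adam(feature_net.parameters(), lr=1e-4)

def response(patches):
    """Feature extraction followed by correlation filtering (implemented as convolution)."""
    feats = feature_net(patches)
    return F.conv2d(feats, filt, padding=7)

def train_iteration(patches, labels):
    """patches: (16, 3, H, W) image crops; labels: (16, 1, H, W) Gaussian response maps."""
    # phase 1: train the filter on the first 10 samples, feature extractor frozen
    for p in feature_net.parameters():
        p.requires_grad_(False)
    loss_f = F.mse_loss(response(patches[:10]), labels[:10])
    opt_filter.zero_grad()
    loss_f.backward()
    opt_filter.step()

    # phase 2: train the feature extractor on the last 6 samples, filter frozen
    for p in feature_net.parameters():
        p.requires_grad_(True)
    filt.requires_grad_(False)
    loss_g = F.mse_loss(response(patches[10:16]), labels[10:16])
    opt_features.zero_grad()
    loss_g.backward()
    opt_features.step()
    filt.requires_grad_(True)
```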
In one embodiment, the on-line training of the filter comprises:
generating training samples to be added to the target training set from the image frames collected by each camera, each training sample containing an original image region extracted from an image frame; and inputting the training samples into a third objective function of the filter, which is a weighted correlation-filtering loss of the form
E(f) = Σ_i Σ_j s_i^j · || S_f(x_i^j) − y_i^j ||²
wherein S_f(x_i^j) denotes the response map obtained by filtering the j-th training sample of the i-th camera, y_i^j denotes the response map corresponding to the reference standard of the j-th training sample of the i-th camera, and s_i^j denotes the score obtained when the tracking result was acquired from the j-th training sample of the i-th camera.
The objective function is transformed into the frequency domain, in which conj() denotes the complex conjugate and the hat symbol ^ denotes the Fourier transform; the gradient ∇E(f) is computed iteratively and the objective function E(f) is optimized with a conjugate gradient method to train the filter.
In an embodiment, the multi-camera target tracking method further comprises performing an update action on the target training set, comprising: whenever each camera has collected a predetermined number of image frames, adding the tracking results of the cameras currently judged to have the tracking target not occluded to the target training set as training samples for the update.
For example, every 7 frames, the cameras whose score computed from the current image frame is greater than a threshold th_sp are identified; their tracking results are regarded as new training data and added as training samples to the target training set D, and a new filter f′ is trained using the updated target training set D.
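A small sketch of this periodic update of the target training set; the frame interval of 7 and the threshold th_sp follow the example above, while the function and variable names are illustrative:

```python
def maybe_update_training_set(frame_idx, camera_scores, tracking_results,
                              training_set, th_sp, interval=7):
    """Every `interval` frames, add the tracking results of cameras whose score
    exceeds th_sp (i.e. cameras judged not occluded) as new training samples."""
    if frame_idx % interval != 0:
        return training_set
    for cam_id, score in camera_scores.items():
        if score > th_sp:
            training_set.append(tracking_results[cam_id])
    return training_set
```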
Fig. 3 is a schematic structural diagram of an electronic device 300 according to an embodiment of the present disclosure.
In some embodiments, the electronic device 300 may be a processing terminal, such as a desktop computer, a laptop computer, a smart phone, a tablet computer, or other terminal with processing capabilities, coupled to the camera array 304 independently of the camera array 304; in some embodiments, the camera array 304 and the electronic device 300 may also be integrated together as a component as a single product, such as a light field camera, and the electronic device 300 may be implemented as circuitry in the light field camera that is attached to one or more circuit boards in the light field camera; in some embodiments, the cameras may be coupled to each other, and the electronic device 300 may be implemented by the cooperation of circuits in each of the cameras.
The electronic device 300 includes:
at least one transceiver 301 coupled to the camera array.
In one embodiment, the transceiver 301 comprises, for example, one or more of CVBS, VGA, DVI, HDMI, SDI, GigE, USB 3.0, Camera Link, HSLink, or CoaXPress interfaces.
At least one memory 302 storing computer programs;
at least one processor 303, coupled to the transceiver 301 and the memory 302, is configured to run the computer program to perform the multi-camera object tracking method.
In some embodiments, the memory 302 may include, but is not limited to, high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state storage devices.
The processor 303 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
In addition, the various computer programs referred to in the foregoing multi-camera object tracking method embodiments (e.g., the embodiments of fig. 1 and 2) may be loaded onto a computer-readable storage medium, which may include, but is not limited to, floppy disks, optical discs, CD-ROMs (compact disc read-only memories), magneto-optical disks, ROMs (read-only memories), RAMs (random access memories), EPROMs (erasable programmable read-only memories), EEPROMs (electrically erasable programmable read-only memories), magnetic or optical cards, flash memory, or other types of media/machine-readable media suitable for storing machine-executable instructions. The computer-readable storage medium may be a stand-alone product not yet connected to a computer device, or a component already installed in a computer device for use.
In particular implementations, the computer programs are routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types.
As shown in fig. 4, the multi-camera object tracking system in the embodiment of the present application is applied to an electronic device associated with a camera array, wherein each camera in the camera array takes pictures synchronously and each image frame taken by each camera at a time is taken as an image sequence. In this embodiment, technical features of specific implementation of the system are substantially the same as those of the multi-camera target tracking method in the foregoing embodiment, and technical contents that can be commonly used between embodiments are not repeated.
The system comprises:
an image processing module 401, configured to, in a case that an image sequence is configured with an initial bounding box for framing a tracking target from image frames therein, extract a plurality of original image regions in each image frame in the image sequence respectively by using the initial bounding box and a plurality of candidate bounding boxes obtained when the center of the initial bounding box is unchanged and the scaling scale of the initial bounding box is changed;
a feature extractor 402, configured to perform feature extraction on a plurality of original image regions of each image frame to obtain a plurality of corresponding feature maps of each image frame;
a filter 403, configured to filter the plurality of feature maps of each image frame to obtain a corresponding plurality of response maps;
a tracking calculation module 404, configured to take, among the plurality of response maps obtained for the image frame corresponding to each camera in each image sequence, the response map containing the highest-scoring target point as the basis for acquiring the tracking result, and the highest score as the basis for generating the score of the corresponding camera on that image frame; take the position of the pixel corresponding to the target point in the image frame as the bounding-box reference point and the scale of the candidate bounding box used to obtain the tracking result as the bounding-box scale, and combine the two to construct a tracking-result bounding box for extracting the tracking result from the image frame; compare the score of each camera on its corresponding image frame in the image sequence with a preset threshold, so as to judge whether the tracking target is occluded under that camera; for image frames of the first type, belonging to cameras judged to have the tracking target occluded, correct the corresponding tracking-result bounding box by an inter-camera constraint method and extract the tracking result from the first-type image frames; and for image frames of the second type, belonging to cameras judged to have the tracking target not occluded, extract the tracking result with the corresponding tracking-result bounding box.
In one embodiment, the filter and the feature extractor are pre-trained, the pre-training comprising one or more iterations, each iteration comprising: in a randomly selected video of a tracked target, randomly selecting a predetermined number of image frames for generating a plurality of training sample pairs, wherein each training sample pair comprises an original image region extracted from a randomly selected image frame according to the reference standard and a response map obtained from that original image region, or, alternatively, an image region obtained by offsetting the original image region and a response map generated from the offset image region; and training the filter with one portion of the plurality of training sample pairs and the feature extractor with the other portion.
In one embodiment, when the filter is trained, the parameters of the feature extractor are fixed and a first objective function is minimized with respect to the filter; and/or, when the feature extractor is trained, the parameters of the filter are fixed and a second objective function is minimized with respect to the feature extractor.
In one embodiment, the video comes from a target tracking data set comprising: one or more combinations of an OTB Dataset, a VOT Dataset, a Temple Color 128 Dataset, a VIVID Tracking Dataset, and a UAV123 Dataset.
In one embodiment, the filter is trained online with a target training set to obtain updates.
In one embodiment, the online training of the filter comprises: generating training samples to be added to the target training set from the image frames collected by each camera, each training sample containing an original image region extracted from an image frame; and inputting the training samples into a third objective function of the filter, which is a weighted correlation-filtering loss of the form
E(f) = Σ_i Σ_j s_i^j · || S_f(x_i^j) − y_i^j ||²
wherein S_f(x_i^j) denotes the response map obtained by filtering the j-th training sample of the i-th camera, y_i^j denotes the response map corresponding to the reference standard of the j-th training sample of the i-th camera, and s_i^j denotes the score obtained when the tracking result was acquired from the j-th training sample of the i-th camera. The objective function is transformed into the frequency domain, in which conj() denotes the complex conjugate and the hat symbol ^ denotes the Fourier transform; the gradient ∇E(f) is computed iteratively and the objective function E(f) is optimized with a conjugate gradient method to train the filter.
In one embodiment, the system further comprises a training set updating module, configured to perform an update action on the target training set, comprising: whenever each camera has collected a predetermined number of image frames, adding the tracking results of the cameras currently judged to have the tracking target not occluded to the target training set as training samples for the update.
In one embodiment, the response map is expressed as a gaussian distribution centered on a reference point in the original image region extracted by the reference standard.
In an embodiment, the method for multi-camera constraint includes: selecting a first camera from a first camera set containing cameras judged that the tracking target is occluded, and acquiring a second camera from a second camera set containing cameras judged that the tracking target is not occluded in the image sequence, wherein each camera in the first camera set and the second camera set corresponds to each image frame in the same image sequence; calculating a homography matrix for estimating a transformation relation of a motion plane of a tracking target under a first camera and a second camera according to a plurality of pairs of tracking results obtained by the first camera and the second camera in a plurality of image sequences; mapping a predetermined point of a tracking target in each second type image frame in each image sequence by using the homography matrix to obtain a mapping point of a first type image frame in the same image sequence; and carrying out constraint correction on the obtained tracking result bounding box according to the obtained mapping points of each first type image frame so as to obtain a corrected tracking result.
In an embodiment, the homography matrix is obtained by calculating a position relation between each pair of matching points in a plurality of pairs of tracking results obtained by a first tracking track of a first camera and a second tracking track of a second camera at a plurality of same time; the tracking trajectory refers to a time-sequentially ordered set of tracking results of the image frames acquired by each camera along the time sequence.
In one embodiment, the system further comprises: and the smoothing module is arranged between the feature extractor and the filter and used for smoothing the feature image and outputting the smoothed feature image to the filter.
It should be noted that the division of the modules of the above apparatus is only a logical division, and the actual implementation may be wholly or partially integrated into one physical entity, or may be physically separated. And these modules may all be implemented in the form of software invoked by a processing element, for example, the feature extractor may be implemented by a CNN network model; or may be implemented entirely in hardware; and part of the modules can be realized in the form of calling software by the processing element, and part of the modules can be realized in the form of hardware. For example, the tracking calculation module may be a processing element separately set up, or may be implemented by being integrated into a chip of the apparatus, or may be stored in a memory of the apparatus in the form of program code, and the function of the tracking calculation module may be called and executed by a processing element of the apparatus. Other modules are implemented similarly. In addition, all or part of the modules can be integrated together or can be independently realized. The processing element described herein may be an integrated circuit having signal processing capabilities. In implementation, the steps of the method or the modules may be implemented by hardware integrated logic circuits in a processor element or instructions in software.
For example, the above modules may be one or more integrated circuits configured to implement the above methods, such as one or more application-specific integrated circuits (ASICs), one or more digital signal processors (DSPs), or one or more field-programmable gate arrays (FPGAs), among others. For another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor capable of calling program code. For another example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SoC).
To sum up, the multi-camera target tracking method, system, device and storage medium of the present application select a plurality of original image regions from each of the image frames captured synchronously by the multiple cameras at each moment, generate response maps by applying a filter to the extracted feature maps, and extract the tracking result in the corresponding image frame through a tracking-result bounding box determined from the response map containing the highest-scoring target point; when the tracked target is judged to be occluded under some cameras, the tracking result of the occluded image frame is obtained by an inter-camera constraint method, which effectively removes tracking failures caused by occlusion; moreover, the multiple cameras simultaneously provide information about the tracked target from several viewing angles, which can be used as input so that the correlation filter learns multi-angle features, making the tracker more robust to viewpoint changes.
The above embodiments are merely illustrative of the principles and utilities of the present application and are not intended to limit the application. Any person skilled in the art can modify or change the above-mentioned embodiments without departing from the spirit and scope of the present application. Accordingly, it is intended that all equivalent modifications or changes which may be accomplished by those skilled in the art without departing from the spirit and scope of the present disclosure be covered by the claims which follow.

Claims (14)

1. A multi-camera target tracking method is applied to an electronic device related to a camera array, each camera in the camera array synchronously shoots, and each image frame shot by each camera at each time is taken as an image sequence; the method comprises the following steps:
when an image sequence is correspondingly configured with an initial bounding box for framing the tracking target in its image frames, extracting a plurality of original image regions from each image frame in the image sequence using the initial bounding box and a plurality of candidate bounding boxes obtained by keeping its center unchanged and varying its scaling scale;
inputting the plurality of original image regions of each image frame into a feature extractor to obtain the corresponding plurality of feature maps of that image frame;
filtering the plurality of feature maps of each image frame with a filter to obtain the corresponding plurality of response maps;
among the plurality of response maps obtained for the image frame corresponding to each camera in each image sequence, taking the response map containing the highest-scoring target point as the basis for acquiring the tracking result, and taking the highest score as the basis for generating the score of the corresponding camera on that image frame; taking the position of the pixel corresponding to the target point in the image frame as the bounding-box reference point and the scale of the candidate bounding box used to obtain the tracking result as the bounding-box scale, and combining the two to construct a tracking-result bounding box for extracting the tracking result from the image frame;
comparing the score of each camera on its corresponding image frame in the image sequence with a preset threshold, so as to judge whether the tracking target is occluded under that camera;
for image frames of the first type, belonging to cameras judged to have the tracking target occluded, correcting the corresponding tracking-result bounding box by an inter-camera constraint method and extracting the tracking result from the first-type image frames; and for image frames of the second type, belonging to cameras judged to have the tracking target not occluded, extracting the tracking result with the corresponding tracking-result bounding box.
2. The multi-camera target tracking method of claim 1, wherein the filter and the feature extractor are pre-trained, the pre-training comprising:
one or more iterative computations, each iterative computation comprising:
randomly selecting, in a randomly selected video of a tracked target, a predetermined number of image frames for generating a plurality of training sample pairs, wherein each training sample pair comprises: an original image area extracted from a randomly selected image frame according to the reference standard (ground truth), and a response map obtained from that original image area; or, each training sample pair comprises: an image area obtained by offsetting the original image area, and a response map generated according to the offset image area;
training the filter and the feature extractor by using one portion and the other portion of the plurality of training sample pairs, respectively.
3. The multi-camera target tracking method of claim 2, wherein, when the filter is trained, parameters of the feature extractor are fixed so as to minimize a first objective function for the filter; and/or, when the feature extractor is trained, parameters of the filter are fixed so as to minimize a second objective function for the feature extractor.
4. The multi-camera target tracking method of claim 2, wherein the video is from a target tracking data set, the target tracking data set comprising one or a combination of more of: the OTB Dataset, the VOT Dataset, the Temple Color 128 Dataset, the VIVID Tracking Dataset, and the UAV123 Dataset.
5. The multi-camera target tracking method of claim 1, wherein the filter is trained online on a target training set that is updated.
6. The multi-camera target tracking method of claim 5, wherein the on-line training of the filter comprises:
generating training samples to be added into the target training set according to the image frames acquired by each camera, wherein each training sample comprises an original image area extracted from the image frame;
inputting the training samples into a third objective function of the filter, the third objective function E(f) being of the form shown in formula FDA0001930412910000021, wherein the quantity shown in FDA0001930412910000022 denotes the response map obtained by filtering the jth training sample of the ith camera, the quantity shown in FDA0001930412910000023 denotes the response map corresponding to the reference standard of the jth training sample of the ith camera, and the quantity shown in FDA0001930412910000024 denotes the score obtained when the tracking result is acquired from the jth training sample of the ith camera;
transforming the objective function into the frequency domain, as shown in formulas FDA0001930412910000025 and FDA0001930412910000026, wherein conj() denotes the complex conjugate and the superscript ^ denotes the Fourier transform;
and iteratively solving the gradient shown in FDA0001930412910000027 and optimizing the objective function E(f) by a conjugate gradient method so as to train the filter.
7. The multi-camera target tracking method according to claim 5 or 6, further comprising performing an update on the target training set, the update comprising: each time the cameras have acquired a preset number of image frames, adding the current tracking results of the cameras under which the tracking target is judged not to be occluded into the target training set as training samples for updating.
8. The multi-camera target tracking method according to claim 1, 2 or 6, wherein the response map is expressed as a Gaussian distribution centered on a reference point of the original image area extracted according to the reference standard.
9. The multi-camera target tracking method of claim 1, wherein the inter-camera constraint method comprises:
selecting a first camera from a first camera set containing the cameras under which the tracking target is judged to be occluded, and selecting a second camera from a second camera set containing the cameras under which the tracking target is judged not to be occluded, wherein the cameras in the first camera set and the second camera set correspond to image frames in the same image sequence;
calculating, according to a plurality of pairs of tracking results obtained by the first camera and the second camera over a plurality of image sequences, a homography matrix for estimating the transformation relation of the motion plane of the tracking target between the first camera and the second camera;
mapping, by using the homography matrix, a predetermined point of the tracking target in each second-type image frame of each image sequence to obtain a mapping point in the first-type image frame of the same image sequence;
and performing constraint correction on the obtained tracking-result bounding box according to the mapping point obtained for each first-type image frame, so as to obtain the corrected tracking result.
10. The multi-camera target tracking method of claim 9, wherein the homography matrix is calculated from the positional relationship between each pair of matching points in a plurality of pairs of tracking results obtained at a plurality of common time instants from a first tracking trajectory of the first camera and a second tracking trajectory of the second camera; a tracking trajectory refers to the set of tracking results of the image frames acquired by a camera, ordered along the time sequence.
11. The multi-camera target tracking method of claim 1, further comprising, before filtering each feature map: smoothing the feature map.
12. An electronic device, associated with a camera array, comprising:
at least one transceiver coupled to the camera array;
at least one memory storing a computer program;
at least one processor, coupled to the transceiver and the memory, and configured to execute the computer program so as to perform the multi-camera target tracking method according to any one of claims 1 to 11.
13. A computer storage medium, characterized in that a computer program is stored thereon which, when run, performs the multi-camera target tracking method according to any one of claims 1 to 11.
14. A multi-camera target tracking system, applied to an electronic device associated with a camera array, wherein the cameras in the camera array shoot synchronously and the image frames shot by all of the cameras at each time instant are taken as one image sequence; the system comprises:
an image processing module, configured to, when an image sequence is correspondingly configured with an initial bounding box for selecting a tracking target from the image frames of the image sequence, extract a plurality of original image areas from each image frame in the image sequence by using the initial bounding box and a plurality of candidate bounding boxes obtained by keeping the center of the initial bounding box unchanged and varying its scaling scale;
a feature extractor, configured to perform feature extraction on the plurality of original image areas of each image frame to obtain a plurality of corresponding feature maps of each image frame;
a filter, configured to filter the plurality of feature maps of each image frame to obtain a plurality of corresponding response maps;
a tracking calculation module, configured to: obtain, from the plurality of response maps obtained from the image frame corresponding to each camera in each image sequence, the response map containing the target point with the highest score as the basis for acquiring the tracking result, and use the highest score as the basis for generating the score of the corresponding camera in that image frame; take the position, in the image frame, of the pixel point corresponding to the target point as a bounding-box reference point, take the scale of the candidate bounding box used for obtaining the tracking result as the bounding-box scale, and combine the bounding-box reference point and the bounding-box scale to construct a tracking-result bounding box for extracting the tracking result from the image frame; compare the score of each camera in the corresponding image frame of the image sequence with a preset threshold value, so as to judge whether the tracking target is occluded under that camera; for a first-type image frame of a camera under which the tracking target is judged to be occluded, correct the corresponding tracking-result bounding box by an inter-camera constraint method and extract the tracking result from the first-type image frame; and for a second-type image frame of a camera under which the tracking target is judged not to be occluded, extract the tracking result by using the corresponding tracking-result bounding box.
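As an editorial illustration of the online training recited in claim 6 (the third objective function itself appears only as formula images FDA0001930412910000021 to FDA0001930412910000027), the following sketch substitutes a simplified, score-weighted, single-channel least-squares correlation filter solved in closed form in the Fourier domain in place of the conjugate-gradient iteration; function and variable names are assumptions.

```python
# Hypothetical stand-in for the score-weighted online filter update of claim 6.
import numpy as np

def train_filter_online(samples, target_responses, scores, reg=1e-2):
    """samples[j]: feature map of the j-th training sample (2-D array);
    target_responses[j]: its reference-standard (Gaussian) response map;
    scores[j]: weight derived from the tracking score of that sample."""
    num = np.zeros(samples[0].shape, dtype=np.complex128)
    den = np.zeros(samples[0].shape, dtype=np.complex128)
    for x, y, s in zip(samples, target_responses, scores):
        X, Y = np.fft.fft2(x), np.fft.fft2(y)
        num += s * np.conj(X) * Y      # weighted cross-power spectrum
        den += s * np.conj(X) * X      # weighted auto-power spectrum
    return num / (den + reg)           # correlation filter in the Fourier domain
```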
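The reference-standard response map of claim 8, a Gaussian distribution centered on a reference point of the extracted image area, might be generated as in the short sketch below; the sigma value and the function name are assumptions.

```python
# Hypothetical sketch for claim 8: a 2-D Gaussian response map centred on the reference point.
import numpy as np

def gaussian_response(height, width, center, sigma=2.0):
    cy, cx = center
    ys, xs = np.mgrid[0:height, 0:width]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
```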
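The inter-camera constraint of claims 9 and 10 could, for instance, be realized with OpenCV's homography estimation, as in the hedged sketch below; the matched points are taken at common time instants from the two tracking trajectories (for example, bounding-box bottom centres on the motion plane), and all names are assumptions.

```python
# Hypothetical sketch of the inter-camera constraint of claims 9 and 10 using OpenCV.
import numpy as np
import cv2

def correct_occluded_box(history_pts_unoccluded, history_pts_occluded,
                         current_pt_unoccluded, occluded_box):
    """Estimate the homography between the two views from matched trajectory points,
    map the predetermined point seen by the unoccluded camera into the occluded
    camera's image, and recentre that camera's tracking-result bounding box on it."""
    src = np.asarray(history_pts_unoccluded, dtype=np.float32).reshape(-1, 1, 2)
    dst = np.asarray(history_pts_occluded, dtype=np.float32).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)

    pt = np.asarray(current_pt_unoccluded, dtype=np.float32).reshape(1, 1, 2)
    mapped = cv2.perspectiveTransform(pt, H).reshape(2)

    _, _, w, h = occluded_box
    return (float(mapped[0]), float(mapped[1]), w, h)   # constraint-corrected box
```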
CN201811637626.8A 2018-12-29 2018-12-29 Multi-camera target tracking method, system, device and storage medium Active CN111383252B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811637626.8A CN111383252B (en) 2018-12-29 2018-12-29 Multi-camera target tracking method, system, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811637626.8A CN111383252B (en) 2018-12-29 2018-12-29 Multi-camera target tracking method, system, device and storage medium

Publications (2)

Publication Number Publication Date
CN111383252A true CN111383252A (en) 2020-07-07
CN111383252B CN111383252B (en) 2023-03-24

Family

ID=71218038

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811637626.8A Active CN111383252B (en) 2018-12-29 2018-12-29 Multi-camera target tracking method, system, device and storage medium

Country Status (1)

Country Link
CN (1) CN111383252B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015163830A1 (en) * 2014-04-22 2015-10-29 Aselsan Elektronik Sanayi Ve Ticaret Anonim Sirketi Target localization and size estimation via multiple model learning in visual tracking
WO2016026370A1 (en) * 2014-08-22 2016-02-25 Zhejiang Shenghui Lighting Co., Ltd. High-speed automatic multi-object tracking method and system with kernelized correlation filters
US20170286774A1 (en) * 2016-04-04 2017-10-05 Xerox Corporation Deep data association for online multi-class multi-object tracking
CN106981071A (en) * 2017-03-21 2017-07-25 广东华中科技大学工业技术研究院 A kind of method for tracking target applied based on unmanned boat
CN107730536A (en) * 2017-09-15 2018-02-23 北京飞搜科技有限公司 A kind of high speed correlation filtering object tracking method based on depth characteristic
CN108776975A (en) * 2018-05-29 2018-11-09 安徽大学 A kind of visual tracking method based on semi-supervised feature and filter combination learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIU Wanjun et al.: "Multi-scale correlation filter tracking algorithm with occlusion discrimination", Journal of Image and Graphics *
BAO Xiao'an et al.: "Anti-occlusion target tracking algorithm based on KCF and SIFT features", Computer Measurement & Control *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111833379A (en) * 2020-07-16 2020-10-27 西安电子科技大学 Method for tracking target position in moving object by monocular camera
CN111833379B (en) * 2020-07-16 2023-07-28 西安电子科技大学 Method for tracking target position in moving object by monocular camera
CN111815682A (en) * 2020-09-07 2020-10-23 长沙鹏阳信息技术有限公司 Multi-target tracking method based on multi-track fusion
CN113012194A (en) * 2020-12-25 2021-06-22 深圳市铂岩科技有限公司 Target tracking method, device, medium and equipment
CN113012194B (en) * 2020-12-25 2024-04-09 深圳市铂岩科技有限公司 Target tracking method, device, medium and equipment
CN115424187A (en) * 2022-11-07 2022-12-02 松立控股集团股份有限公司 Auxiliary driving method for multi-angle camera collaborative importance ranking constraint
CN115619832A (en) * 2022-12-20 2023-01-17 浙江莲荷科技有限公司 Multi-camera collaborative multi-target track confirmation method, system and related device

Also Published As

Publication number Publication date
CN111383252B (en) 2023-03-24

Similar Documents

Publication Publication Date Title
CN111383252B (en) Multi-camera target tracking method, system, device and storage medium
KR101333871B1 (en) Method and arrangement for multi-camera calibration
JP6095018B2 (en) Detection and tracking of moving objects
US8290212B2 (en) Super-resolving moving vehicles in an unregistered set of video frames
CN107358623B (en) Relevant filtering tracking method based on significance detection and robustness scale estimation
US10311595B2 (en) Image processing device and its control method, imaging apparatus, and storage medium
US8446468B1 (en) Moving object detection using a mobile infrared camera
CN109685045B (en) Moving target video tracking method and system
Lee et al. Simultaneous localization, mapping and deblurring
US20150205997A1 (en) Method, apparatus and computer program product for human-face features extraction
CN108229475B (en) Vehicle tracking method, system, computer device and readable storage medium
EP3798975B1 (en) Method and apparatus for detecting subject, electronic device, and computer readable storage medium
EP3093822B1 (en) Displaying a target object imaged in a moving picture
CN111340749B (en) Image quality detection method, device, equipment and storage medium
CN108875500B (en) Pedestrian re-identification method, device and system and storage medium
Seo Image denoising and refinement based on an iteratively reweighted least squares filter
CN111507340B (en) Target point cloud data extraction method based on three-dimensional point cloud data
Hua et al. Removing atmospheric turbulence effects via geometric distortion and blur representation
CN116883897A (en) Low-resolution target identification method
CN114757984A (en) Scene depth estimation method and device of light field camera
Carbajal et al. Single image non-uniform blur kernel estimation via adaptive basis decomposition.
CN108062741B (en) Binocular image processing method, imaging device and electronic equipment
Halperin et al. Clear Skies Ahead: Towards Real‐Time Automatic Sky Replacement in Video
JP2018010359A (en) Information processor, information processing method, and program
CN111144441A (en) DSO luminosity parameter estimation method and device based on feature matching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant