WO2022217840A1 - High-precision multi-target tracking method under complex background - Google Patents
High-precision multi-target tracking method under complex background
- Publication number
- WO2022217840A1 (application PCT/CN2021/119796)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- target
- matching
- tracking
- result
- trajectory
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/40—Image enhancement or restoration using histogram techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/48—Matching video sequences
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Definitions
- The invention relates to the technical field of target tracking, and in particular to a high-precision multi-target tracking method under complex background.
- With the development of computer vision, large volumes of visual information are acquired, transmitted and analysed, so getting computers to process this video data has become a research focus; visual target tracking is an important means of processing it.
- Visual target tracking is a basic research problem in computer vision, with broad application prospects in video surveillance, unmanned driving, human-computer interaction, planetary exploration, military applications and other areas.
- The problem to be solved by visual target tracking can be stated as follows: given the position and size of the target in the first frame of a video sequence (usually as a rectangular bounding box), predict the position and size of the target in subsequent frames.
- Traditional target tracking algorithms can be divided into those based on generative models and those based on discriminative models. Methods based on generative models use the results of historical frames to build a statistical model describing the target's features, which handles target loss during tracking effectively; however, they usually ignore the background information around the target and tend to lose the target when the background is cluttered. Most traditional tracking methods based on correlation filtering extract features only with hand-designed feature descriptors, so their ability to represent the target is limited, the target position determined from the response map is not accurate enough, and they usually fail to achieve satisfactory performance under occlusion and background clutter. Before 2010, target tracking generally relied on classical algorithms such as mean shift, particle filtering, Kalman filtering, subspace learning, sparse representation and kernel density estimation.
- Target tracking algorithms based on deep learning can be divided into those based on deep features, Siamese networks, recurrent neural networks, generative adversarial networks, and other specific networks.
- The present invention proposes a high-precision multi-target tracking method under complex background, which remedies the poor tracking performance of traditional target tracking algorithms in complex scenes and includes the following steps:
- Step 1: input the acquired video data into a residual network, extract target resolution features, and output the extraction result, which includes target resolution features of different dimensions.
- The residual network may be ResNet.
- The target resolution features of different dimensions in the extraction result have different characteristics, which can be used to enhance feature expression.
- This step addresses the scale change that often occurs during target tracking.
- Step 2: compute the correlation filter response map of the target resolution features;
- Step 3: use a target detection network to obtain the detection result of the target; the motion state of the target is defined as an 8-dimensional space representing the state of the trajectory at a given moment;
- Step 4: match the detection result of the target with the predicted trajectory to obtain a matching result, which includes the value obtained by fusing two metrics, motion information and appearance information;
- Step 5: compare the fused value of the two metrics with a preset matching threshold to obtain the target tracking result.
- Step 2 includes:
- Step 2-1: interpolate the target resolution features of different dimensions to convert the features of different resolutions into a continuous spatial domain; the interpolation operator $J_d$ is expressed as:
- each sample contains $D$-dimensional feature channels
- $N_d$ is the number of spatial sampling points in feature channel $d$
- $d \in \{0, 1, 2, \dots\}$; the features of different resolutions are converted to the continuous spatial domain $[0, T) \subset \mathbb{R}$, where $T$ is the size of the support region, $t \in [0, T)$ is the position of the tracked target in the image, and $n \in \{0, \dots, N_d - 1\}$ is the discrete spatial variable;
- Step 2-2: obtain the correlation filter by minimizing the loss function;
- the corresponding loss function in the Fourier domain can be derived as:
- $f$ is the filter
- $P$ is the feature matrix
- $z$ is the interpolated feature map
- the penalty function $w \in L^2(T)$ is a spatial regularization term
- $C$ denotes the $C$-dimensional feature map
- $\lambda$ is the weight parameter
- $F$ is the Fourier transform of the filter $f$;
- Step 2-3: perform the factorized convolution operation to obtain the response of the correlation filter.
- Correlation describes the relationship between two signals and is divided into cross-correlation and positive correlation.
- Here, the correlation referred to is positive correlation;
- the new filter response $R_c$ is expressed as the matrix-vector product $Pf$, and the factorized convolution operator for the filter response $R_c$ is expressed as:
- the feature vector $J\{x\}(t)$ at each position $t$ is first multiplied by the matrix $P^T$, and the resulting feature map is then convolved with the filter;
- $P_{dc}$ denotes the learned coefficients, which can be written compactly as the $D \times C$ matrix $P = (P_{dc})$; in the formula, the feature vector $J\{x\}(t)$ at each position $t$ is written $J\{x\}$;
- Step 2-4: apply visual saliency detection to the tracked target; doing so allows the tracked target to be located quickly and improves positioning accuracy;
- Step 2-5: multiply the filter response $R_c$ by the saliency $R_S$ of the current frame to obtain the final response map $R_f = R_c \cdot R_S$; the position at which $R_f$ reaches its maximum is mapped back to the original image to give the position of the target in the subsequent frame, i.e. the predicted trajectory.
- Step 2-4 includes:
- Step 2-4-1: let the input image be $I$; given the target region of a tracked target (the rectangular box region) and its surrounding region, the probability that a pixel in the image belongs to the target is:
- $m$ denotes a separated target pixel
- $O$ denotes the target region
- $S$ denotes the surrounding region
- $b_m$ denotes the colour component assigned to the input image $I$;
- Step 2-4-2: the maximum-entropy value assigned to background pixel values is 0.5; during tracking, given the target position in the first frame, each subsequent frame searches a rectangular region around the position in the previous frame, and the saliency of the current frame is calculated as $R_S = s_v(O_t)\, s_d(O_t)$, where:
- $s_v(O_t)$ is the probability score based on the object model
- $s_d(O_t)$ is the distance score based on the Euclidean distance between the target and the target centre $c_{t-1}$ of the previous frame
- $P_{1:t-1}$ is the probability score from the first frame to the previous frame, and $\sigma$ is the standard deviation of the normal distribution.
- Step 3 includes: use the target detection network to obtain the detection result of the target, and define the motion state of the target as the 8-dimensional space $(x_t, y_t, r_t, h_t, x^*, y^*, r^*, h^*)$ representing the state of the trajectory at a given moment, where $x_t, y_t$ are the coordinates of the centre of the detection box in the image coordinate system, $r_t$ is the aspect ratio of the detection box, $h_t$ is its height, and $x^*, y^*, r^*, h^*$ are the corresponding velocities in image coordinates.
- The target detection network may be YOLOv4.
- Step 4 includes:
- Step 4-1: use the distance between the detection result of the target and the predicted trajectory to express the degree of motion matching:
- $d_{jk}$ is the $k$-th state of the $j$-th target
- $y_{ik}$ is the $k$-th state of the $i$-th trajectory
- the degree of motion matching is the degree of matching between the detection result of the $j$-th target and the $i$-th trajectory
- $S_i$ is the covariance matrix of the observation space at the current moment obtained by trajectory prediction
- $y_i$ is the predicted observation of the trajectory at the current moment
- $d_j$ is the state of the $j$-th target.
- Step 4-2: use the minimum cosine distance between the feature vector of the detection result and the feature vectors of the targets contained in the trajectory as the appearance matching degree between the target and the trajectory, where cosine distance = 1 - cosine similarity;
- In the prior art, using motion information alone as the matching metric causes excessive ID changes of the tracked target; by additionally using the appearance matching degree, the present invention effectively reduces these ID changes.
- Step 4-3: fuse the two metrics, the motion distance matching degree and the appearance information, by weighted averaging to obtain the fused value $\omega_{i,j}$:
- $\mu$ is a hyperparameter that adjusts the weights of the terms.
- The motion distance metric works well for short-term prediction and matching, while the appearance information is more effective for trajectories that have been lost for a long time.
- The choice of the hyperparameter depends on the data set; to give the two terms equal importance, $\mu$ should be about 0.1.
- Step 5 includes:
- Step 5-1: if the fused value $\omega_{i,j}$ is greater than or equal to the preset matching threshold $T_{hres}$, the target tracking result is a successful match;
- if $\omega_{i,j}$ is less than $T_{hres}$, the target tracking result is a matching failure;
- Step 5-2: the initial state of a trajectory is $T_{ini}$; if matching succeeds for $n$ consecutive frames during processing, the trajectory moves from $T_{ini}$ to the confirmed state $T_{cofr}$, which counts as successful tracking;
- if the number of consecutively matched frames is less than $n$, the frame counter is incremented, $z = z + 1$, and the method returns to step 1 to match again;
- if matching fails for $n$ consecutive frames, the trajectory moves from $T_{ini}$ to the deleted state $T_{dele}$, which counts as a tracking failure, and the current trajectory is deleted from the video.
- The invention proposes a high-precision multi-target tracking method under complex background that improves on traditional tracking algorithms.
- When traditional methods match detected targets to trajectories, the lack of sufficient feature information easily causes ID switches, that is, the ID of a detection box keeps changing, so accuracy and robustness suffer.
- A residual network is added to extract multi-resolution features of the target, and the matching process combines motion information with appearance information, improving the accuracy of matching.
- FIG. 1 is a schematic diagram of the basic flow of a high-precision multi-target tracking method under complex background provided by an embodiment of the present invention;
- FIG. 2 is a schematic diagram of the target region and the surrounding region in a high-precision multi-target tracking method under complex background provided by an embodiment of the present invention.
- An embodiment of the present invention discloses a high-precision multi-target tracking method under complex background, applied to tracking multiple targets against a complex background, which includes the following steps:
- Before step 1, video data is first obtained; in this embodiment, a camera can capture video in real time and send it to a computer, or the computer can read a local video directly.
- The camera and computer can be of any model.
- Step 1: input the acquired video data into the residual network, extract target resolution features, and output the extraction result, which includes target resolution features of different dimensions.
- In this embodiment, the residual network may be ResNet.
- The target resolution features of different dimensions in the extraction result have different characteristics, which can be used to enhance feature expression.
- This step addresses the scale change that often occurs during target tracking.
- Step 2: compute the correlation filter response map of the target resolution features;
- Step 3: use a target detection network to obtain the detection result of the target; the motion state of the target is defined as an 8-dimensional space representing the state of the trajectory at a given moment;
- Step 4: match the detection result of the target with the predicted trajectory to obtain a matching result, which includes the value obtained by fusing two metrics, motion information and appearance information;
- Step 5: compare the fused value of the two metrics with a preset matching threshold to obtain the target tracking result.
- Step 2 includes:
- Step 2-1: interpolate the target resolution features of different dimensions to convert the features of different resolutions into a continuous spatial domain; the interpolation operator $J_d$ is expressed as:
- each sample contains $D$-dimensional feature channels
- $N_d$ is the number of spatial sampling points in feature channel $d$
- $d \in \{0, 1, 2, \dots\}$; the features of different resolutions are converted to the continuous spatial domain $[0, T) \subset \mathbb{R}$, where $T$ is the size of the support region, $t \in [0, T)$ is the position of the tracked target in the image, and $n \in \{0, \dots, N_d - 1\}$ is the discrete spatial variable;
- Step 2-2: obtain the correlation filter by minimizing the loss function;
- the corresponding loss function in the Fourier domain can be derived as:
- $f$ is the filter
- $P$ is the feature matrix
- $z$ is the interpolated feature map
- the penalty function $w \in L^2(T)$ is a spatial regularization term
- $C$ denotes the $C$-dimensional feature map
- $\lambda$ is the weight parameter
- $F$ is the Fourier transform of the filter $f$;
- Step 2-3: perform the factorized convolution operation to obtain the response of the correlation filter.
- Correlation describes the relationship between two signals and is divided into cross-correlation and positive correlation.
- In this embodiment, the correlation referred to is positive correlation;
- the new filter response $R_c$ is expressed as the matrix-vector product $Pf$, and the factorized convolution operator for the filter response $R_c$ is expressed as:
- the feature vector $J\{x\}(t)$ at each position $t$ is first multiplied by the matrix $P^T$, and the resulting feature map is then convolved with the filter;
- $P_{dc}$ denotes the learned coefficients, which can be written compactly as the $D \times C$ matrix $P = (P_{dc})$; in the formula, the feature vector $J\{x\}(t)$ at each position $t$ is written $J\{x\}$;
- Step 2-4: apply visual saliency detection to the tracked target; in this embodiment, doing so allows the tracked target to be located quickly and improves positioning accuracy;
- Step 2-4 includes:
- Step 2-4-1: as shown in FIG. 2, let the input image be $I$; given the target region of a tracked target (the rectangular box region) and its surrounding region, the probability that a pixel in the image belongs to the target is:
- $m$ denotes a separated target pixel
- $O$ denotes the target region
- $S$ denotes the surrounding region
- $b_m$ denotes the colour component assigned to the input image $I$;
- Step 2-4-2: during tracking, given the target position in the first frame, each subsequent frame searches a rectangular region around the position in the previous frame, and the saliency of the current frame is calculated as $R_S = s_v(O_t)\, s_d(O_t)$, where:
- $s_v(O_t)$ is the probability score based on the object model
- $s_d(O_t)$ is the distance score based on the Euclidean distance between the target and the target centre $c_{t-1}$ of the previous frame
- $P_{1:t-1}$ is the probability score from the first frame to the previous frame, and $\sigma$ is the standard deviation of the normal distribution.
- Step 3 includes: use a target detection network to obtain the detection result of the target, and define the motion state of the target as the 8-dimensional space $(x_t, y_t, r_t, h_t, x^*, y^*, r^*, h^*)$ representing the state of the trajectory at a given moment, where $x_t, y_t$ are the coordinates of the centre of the detection box in the image coordinate system, $r_t$ is the aspect ratio of the detection box, $h_t$ is its height, and $x^*, y^*, r^*, h^*$ are the corresponding velocities in image coordinates.
- The target detection network may be YOLOv4.
- Step 4 includes:
- Step 4-1: use the distance between the detection result of the target and the predicted trajectory to express the degree of motion matching:
- the degree of motion matching is the degree of matching between the detection result of the $j$-th target and the $i$-th trajectory
- $S_i$ is the covariance matrix of the observation space at the current moment obtained by trajectory prediction
- $y_i$ is the predicted observation of the trajectory at the current moment
- $d_j$ is the state of the $j$-th target.
- Step 4-2: use the minimum cosine distance between the feature vector of the detection result and the feature vectors of the targets contained in the trajectory as the appearance matching degree between the target and the trajectory, where cosine distance = 1 - cosine similarity;
- $d_{jk}$ is the $k$-th state of the $j$-th target
- $y_{ik}$ is the $k$-th state of the $i$-th trajectory
- In the prior art, using motion information alone as the matching metric causes excessive ID changes of the tracked target; by additionally using the appearance matching degree, the present invention effectively reduces these ID changes.
- Step 4-3: fuse the two metrics, the motion distance matching degree and the appearance information, by weighted averaging to obtain the fused value $\omega_{i,j}$:
- $\mu$ is a hyperparameter that adjusts the weights of the terms.
- The motion distance metric works well for short-term prediction and matching, while the appearance information is more effective for trajectories that have been lost for a long time.
- The choice of the hyperparameter depends on the data set; to give the two terms equal importance, $\mu$ should be about 0.1.
- Step 5 includes:
- Step 5-1: if the fused value $\omega_{i,j}$ is greater than or equal to the preset matching threshold $T_{hres}$, the target tracking result is a successful match;
- if $\omega_{i,j}$ is less than $T_{hres}$, the target tracking result is a matching failure;
- Step 5-2: the initial state of a trajectory is $T_{ini}$; if matching succeeds for $n$ consecutive frames during processing, the trajectory moves from $T_{ini}$ to the confirmed state $T_{cofr}$, which counts as successful tracking;
- if the number of consecutively matched frames is less than $n$, the frame counter is incremented, $z = z + 1$, and the method returns to step 1 to match again;
- if matching fails for $n$ consecutive frames, the trajectory moves from $T_{ini}$ to the deleted state $T_{dele}$, which counts as a tracking failure, and the current trajectory is deleted from the video.
- The invention proposes a high-precision multi-target tracking method under complex background that improves on traditional tracking algorithms.
- When traditional methods match detected targets to trajectories, the lack of sufficient feature information easily causes ID switches, that is, the ID of a detection box keeps changing, so accuracy and robustness suffer.
- A residual network is added to extract multi-resolution features of the target, and the matching process combines motion information with appearance information, improving the accuracy of matching.
- The present invention also provides a computer storage medium that can store a program; when executed, the program can carry out some or all of the steps of the embodiments of the high-precision multi-target tracking method under complex background provided by the present invention.
- The storage medium can be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
- The technology in the embodiments of the present invention can be implemented by software plus a necessary general-purpose hardware platform.
- The technical solutions in the embodiments of the present invention, in essence or in the part that contributes over the prior art, may be embodied as a software product stored in a storage medium such as ROM/RAM, a magnetic disk or an optical disk, and including instructions that cause a computer device (a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in parts of the embodiments.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
Abstract
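The present invention provides a high-precision multi-target tracking method under complex background, comprising: inputting the acquired video data into a residual network, extracting target resolution features, and outputting the extraction result, which includes target resolution features of different dimensions; computing the correlation filter response map of the target resolution features; obtaining the detection result of the target with a target detection network; matching the detection result with the predicted trajectory to obtain a matching result that includes the value obtained by fusing two metrics, motion information and appearance information; and comparing the fused value with a preset matching threshold to obtain the target tracking result. Compared with the prior art, adding a residual network to extract multi-resolution features of the target, and combining motion information with appearance information in the matching process, improves the accuracy of matching.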
Description
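The present invention relates to the technical field of target tracking, and in particular to a high-precision multi-target tracking method under complex background.
With the development of computer vision, large volumes of visual information are acquired, transmitted and analysed, so getting computers to process this video data has become a research focus. Visual target tracking is an important means of processing such data. It is a basic research problem in computer vision, with broad application prospects in video surveillance, unmanned driving, human-computer interaction, planetary exploration, military applications and other areas. The problem can be stated as follows: given the position and size of the target in the first frame of a video sequence (usually as a rectangular bounding box), predict the position and size of the target in subsequent frames.
Traditional target tracking algorithms can be divided into those based on generative models and those based on discriminative models. Methods based on generative models use the results of historical frames to build a statistical model describing the target's features, which handles target loss during tracking effectively; however, they usually ignore the background information around the target and tend to lose the target when the background is cluttered. Most traditional tracking methods based on correlation filtering extract features only with hand-designed feature descriptors, so their ability to represent the target is limited, the target position determined from the response map is not accurate enough, and they usually fail to achieve satisfactory performance under occlusion and background clutter. Before 2010, target tracking generally relied on classical algorithms such as mean shift, particle filtering, Kalman filtering, subspace learning, sparse representation and kernel density estimation.
Target tracking algorithms based on deep learning can be divided into those based on deep features, Siamese networks, recurrent neural networks, generative adversarial networks, and other specific networks.
Although target tracking has been studied for many years and has made progress, it still struggles to meet practical needs under complex backgrounds. When ambient brightness falls or many similar targets are present, the tracker's ability to distinguish the target region from the background weakens and tracking degrades; when the target is occluded, its feature information is lost, and more is lost as the occluded proportion grows. How to design a real-time, robust tracking algorithm is therefore the current research focus in the field.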
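Summary of the invention
To address the problems in target tracking, the present invention proposes a high-precision multi-target tracking method under complex background, which remedies the poor tracking performance of traditional target tracking algorithms in complex scenes and includes the following steps:
Step 1: input the acquired video data into a residual network, extract target resolution features, and output the extraction result, which includes target resolution features of different dimensions. Specifically, in the present invention the residual network may be ResNet.
In the present invention, the target resolution features of different dimensions in the extraction result have different characteristics, which can be used to enhance feature expression. This step addresses the scale change that often occurs during target tracking.
Step 2: compute the correlation filter response map of the target resolution features;
Step 3: use a target detection network to obtain the detection result of the target; the motion state of the target is defined as an 8-dimensional space representing the state of the trajectory at a given moment;
Step 4: match the detection result of the target with the predicted trajectory to obtain a matching result, which includes the value obtained by fusing two metrics, motion information and appearance information;
Step 5: compare the fused value of the two metrics with a preset matching threshold to obtain the target tracking result.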
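Further, in one implementation, step 2 includes:
Step 2-1: interpolate the target resolution features of different dimensions to convert the features of different resolutions into a continuous spatial domain; the interpolation operator $J_d$ is expressed as:
where $b_d \in L^2(T)$ is the interpolation function, each sample contains $D$-dimensional feature channels, $N_d$ is the number of spatial sampling points in feature channel $d$, $d \in \{0, 1, 2, \dots\}$, the features of different resolutions are converted to the continuous spatial domain $[0, T) \subset \mathbb{R}$, $T$ is the size of the support region, $t \in [0, T)$ is the position of the tracked target in the image, and $n \in \{0, \dots, N_d - 1\}$ is the discrete spatial variable;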
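The formula for $J_d$ is not reproduced in this text. A minimal sketch, assuming the form commonly used in continuous-convolution correlation filter trackers, $J_d\{x^d\}(t) = \sum_{n=0}^{N_d-1} x^d[n]\, b_d\!\left(t - \tfrac{T}{N_d} n\right)$, illustrates how one discrete feature channel might be carried into the continuous domain $[0, T)$. The function name `interpolate_channel` and the triangular kernel standing in for $b_d$ are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def interpolate_channel(x, t, T=1.0, kernel=None):
    """Evaluate sum_n x[n] * b(t - T*n/N) at the positions t in [0, T)."""
    x = np.asarray(x, dtype=float)
    t = np.atleast_1d(np.asarray(t, dtype=float))
    N = x.shape[0]
    if kernel is None:
        # Hypothetical stand-in for b_d: T-periodic triangular (linear) kernel.
        def kernel(u):
            u = (u + T / 2) % T - T / 2           # wrap offsets to [-T/2, T/2)
            return np.maximum(0.0, 1.0 - np.abs(u) * N / T)
    # Offset of every query position from every sample position T*n/N.
    shifts = t[:, None] - T * np.arange(N)[None, :] / N
    return (kernel(shifts) * x[None, :]).sum(axis=1)
```

With this kernel the interpolated function passes through the samples at the grid positions $T n / N_d$ and varies linearly between them; channels with different $N_d$ can then be evaluated on a common set of positions $t$.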
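Step 2-2: obtain the correlation filter by minimizing the loss function;
The corresponding loss function in the Fourier domain can be derived as:
where $f$ is the filter, $P$ is the feature matrix, $z$ is the interpolated feature map, the penalty function $w \in L^2(T)$ is a spatial regularization term, $C$ denotes the $C$-dimensional feature map, $\lambda$ is the weight parameter, and $F$ is the Fourier transform of the filter $f$;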
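Step 2-3: perform the factorized convolution operation to obtain the response of the correlation filter. Correlation describes the relationship between two signals and is divided into cross-correlation and positive correlation; in this embodiment, the correlation referred to is positive correlation;
The new filter response $R_c$ is expressed as the matrix-vector product $Pf$, and the factorized convolution operator for the filter response $R_c$ is expressed as:
where the feature vector $J\{x\}(t)$ at each position $t$ is first multiplied by the matrix $P^T$, and the resulting feature map is then convolved with the filter; $P_{dc}$ denotes the learned coefficients, which can be written compactly as the $D \times C$ matrix $P = (P_{dc})$; in the formula, the feature vector $J\{x\}(t)$ at each position $t$ is written $J\{x\}$;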
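Step 2-4: apply visual saliency detection to the tracked target; in the present invention, doing so allows the tracked target to be located quickly and improves positioning accuracy;
Step 2-5: multiply the filter response $R_c$ by the saliency $R_S$ of the current frame to obtain the final response map $R_f = R_c \cdot R_S$; the position at which $R_f$ reaches its maximum is mapped back to the original image to give the position of the target in the subsequent frame, i.e. the predicted trajectory.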
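Step 2-5 can be sketched in a few lines. This is an illustration only: it assumes $R_c$ and $R_S$ are arrays of the same shape, that the product is element-wise, and that the response map covers the whole image so the peak can be mapped back by simple scaling to cell centres. The name `locate_target` and that mapping are assumptions, not details given in the patent.

```python
import numpy as np

def locate_target(r_c, r_s, image_shape):
    """Fuse the two maps and map the peak back to image coordinates (x, y)."""
    r_f = np.asarray(r_c, dtype=float) * np.asarray(r_s, dtype=float)  # R_f = R_c * R_S
    row, col = np.unravel_index(np.argmax(r_f), r_f.shape)
    # Assumed mapping: each response cell covers an equal share of the image.
    scale_y = image_shape[0] / r_f.shape[0]
    scale_x = image_shape[1] / r_f.shape[1]
    return (col + 0.5) * scale_x, (row + 0.5) * scale_y, r_f
```

In the small test case used to check this sketch, the saliency map suppresses a strong filter response that lies far from the expected target location, so the fused peak moves to the next-best candidate.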
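Further, in one implementation, step 2-4 includes:
Step 2-4-1: let the input image be $I$; given the target region of a tracked target (the rectangular box region) and its surrounding region, the probability that a pixel in the image belongs to the target is:
where $m$ denotes a separated target pixel, $O$ denotes the target region, $S$ denotes the surrounding region, and $b_m$ denotes the colour component assigned to the input image $I$;
The probabilities that the colour component $b_m$ assigned to the input image $I$ belongs to the target region $O$ and to the surrounding region $S$ are expressed respectively as:
Step 2-4-2: the maximum-entropy value assigned to background pixel values is 0.5; during tracking, given the target position in the first frame, each subsequent frame searches a rectangular region around the position in the previous frame, and the saliency $R_S$ of the current frame is calculated as:
$R_S = s_v(O_t)\, s_d(O_t)$,
where $s_v(O_t)$ is the probability score based on the object model, $s_d(O_t)$ is the distance score based on the Euclidean distance between the target and the target centre $c_{t-1}$ of the previous frame, $P_{1:t-1}$ is the probability score from the first frame to the previous frame, and $\sigma$ is the standard deviation of the normal distribution.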
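The probability formulas of step 2-4 are not reproduced in this text. The sketch below is one plausible reading, under two stated assumptions: the pixel-wise object probability is the ratio of colour-histogram counts $H_O(b_m) / (H_O(b_m) + H_S(b_m))$ with 0.5 for colours seen in neither region, and the distance score $s_d$ is a Gaussian of the Euclidean distance to $c_{t-1}$ with standard deviation $\sigma$. All function names are hypothetical.

```python
import math
import numpy as np

def object_probability(bins, hist_obj, hist_sur):
    """Per-pixel P(pixel in O | colour bin); 0.5 where the bin was never seen."""
    h_o = np.asarray(hist_obj)[bins].astype(float)   # counts in target region O
    h_s = np.asarray(hist_sur)[bins].astype(float)   # counts in surrounding region S
    total = h_o + h_s
    prob = np.full(total.shape, 0.5)                 # maximum-entropy default
    np.divide(h_o, total, out=prob, where=total > 0)
    return prob

def distance_score(center, prev_center, sigma):
    """Assumed Gaussian score of the distance to the previous centre c_{t-1}."""
    dx, dy = center[0] - prev_center[0], center[1] - prev_center[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma ** 2))

def saliency(s_v, center, prev_center, sigma):
    """R_S = s_v(O_t) * s_d(O_t), with s_v supplied by the caller."""
    return s_v * distance_score(center, prev_center, sigma)
```

Here `bins` is an integer array giving the colour-bin index of each pixel; how $s_v(O_t)$ is aggregated from the per-pixel probabilities is left to the caller because the text does not specify it.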
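Further, in one implementation, step 3 includes: use the target detection network to obtain the detection result of the target, and define the motion state of the target as the 8-dimensional space $(x_t, y_t, r_t, h_t, x^*, y^*, r^*, h^*)$ representing the state of the trajectory at a given moment, where $x_t, y_t$ are the coordinates of the centre of the detection box in the image coordinate system, $r_t$ is the aspect ratio of the detection box, $h_t$ is its height, and $x^*, y^*, r^*, h^*$ are the corresponding velocities in image coordinates. Specifically, in this embodiment the target detection network may be YOLOv4.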
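As an illustration of the 8-dimensional state, the sketch below builds a state vector from a detection box and advances it by one frame. The constant-velocity motion model and the names `make_state` and `predict` are assumptions made for the example; the patent defines the state but does not spell out the prediction model here.

```python
import numpy as np

def make_state(x, y, r, h):
    """State (x, y, r, h, x*, y*, r*, h*) with velocities initialised to zero."""
    return np.array([x, y, r, h, 0.0, 0.0, 0.0, 0.0])

def predict(state, dt=1.0):
    """Advance the state by dt frames under an assumed constant-velocity model."""
    F = np.eye(8)
    F[:4, 4:] = dt * np.eye(4)   # position components += dt * velocity components
    return F @ np.asarray(state, dtype=float)
```

A full tracker would also propagate a covariance alongside the state (for example with a Kalman filter), which is where the covariance $S_i$ used in step 4-1 would come from.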
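Further, in one implementation, step 4 includes:
Step 4-1: use the distance between the detection result of the target and the predicted trajectory to express the degree of motion matching:
where $d_{jk}$ is the $k$-th state of the $j$-th target and $y_{ik}$ is the $k$-th state of the $i$-th trajectory;
The degree of motion matching is the degree of matching between the detection result of the $j$-th target and the $i$-th trajectory;
where $S_i$ is the covariance matrix of the observation space at the current moment obtained by trajectory prediction, $y_i$ is the predicted observation of the trajectory at the current moment, and $d_j$ is the state of the $j$-th target.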
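The motion-matching formula is missing from this text, but the definitions of $S_i$, $y_i$ and $d_j$ are consistent with a squared Mahalanobis distance, $(d_j - y_i)^T S_i^{-1} (d_j - y_i)$. The sketch below computes that quantity; treating it as the patent's formula is an assumption, and `motion_distance` is a hypothetical name.

```python
import numpy as np

def motion_distance(d_j, y_i, S_i):
    """Squared Mahalanobis distance between detection d_j and prediction y_i."""
    diff = np.asarray(d_j, dtype=float) - np.asarray(y_i, dtype=float)
    # Solve S_i * v = diff instead of forming the explicit inverse of S_i.
    return float(diff @ np.linalg.solve(np.asarray(S_i, dtype=float), diff))
```

Weighting by $S_i^{-1}$ means that a deviation along a direction in which the prediction is uncertain counts for less than the same deviation along a well-determined direction.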
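Step 4-2: use the minimum cosine distance between the feature vector of the detection result and the feature vectors of the targets contained in the trajectory as the appearance matching degree between the target and the trajectory;
The cosine similarity between the detection result of the $j$-th target and the $i$-th trajectory is:
Cosine distance = 1 - cosine similarity, and the appearance matching degree between the target and the trajectory is:
In the prior art, using motion information alone as the matching metric causes excessive ID changes of the tracked target; by additionally using the appearance matching degree, the present invention effectively reduces these ID changes compared with the prior art.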
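A minimal sketch of the appearance term, assuming each trajectory keeps a gallery of feature vectors from its past detections and that the matching degree is the smallest value of 1 - cosine similarity over that gallery. The name `appearance_distance` and the gallery layout (one feature vector per row) are illustrative assumptions.

```python
import numpy as np

def appearance_distance(det_feature, track_features):
    """Minimum cosine distance between one detection and a track's gallery."""
    det = np.asarray(det_feature, dtype=float)
    gallery = np.atleast_2d(np.asarray(track_features, dtype=float))
    det = det / np.linalg.norm(det)                              # unit length
    gallery = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    cosine_similarity = gallery @ det                            # one value per row
    return float(np.min(1.0 - cosine_similarity))
```

Taking the minimum over the gallery lets a detection match a trajectory as long as it resembles any one of the stored appearances, which helps when the target has changed pose since the last frame.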
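Step 4-3: fuse the two metrics, the motion distance matching degree and the appearance information, by weighted averaging to obtain the fused value $\omega_{i,j}$:
where $\mu$ is a hyperparameter that adjusts the weights of the terms.
Specifically, in this embodiment, the motion distance metric works well for short-term prediction and matching, while the appearance information is more effective for trajectories that have been lost for a long time. The choice of the hyperparameter depends on the data set; to give the two terms equal importance, $\mu$ should be about 0.1.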
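The fusion formula is not reproduced in this text. A sketch under the assumption that the weighted average takes the form $\omega_{i,j} = \mu\, m_{i,j} + (1 - \mu)\, a_{i,j}$, with $m$ the motion term and $a$ the appearance term; which term carries $\mu$ is a guess, and `fuse` is a hypothetical name.

```python
import numpy as np

def fuse(motion, appearance, mu=0.1):
    """Weighted average of the motion and appearance terms (scalars or matrices)."""
    motion = np.asarray(motion, dtype=float)
    appearance = np.asarray(appearance, dtype=float)
    return mu * motion + (1.0 - mu) * appearance
```

The function accepts either a single pair of values or full track-by-detection matrices, so the same call can produce every $\omega_{i,j}$ at once.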
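Further, in one implementation, step 5 includes:
Step 5-1: if the fused value $\omega_{i,j}$ is greater than or equal to the preset matching threshold $T_{hres}$, the target tracking result is a successful match;
if $\omega_{i,j}$ is less than the preset matching threshold $T_{hres}$, the target tracking result is a matching failure;
Step 5-2: the initial state of a trajectory is $T_{ini}$; if matching succeeds for $n$ consecutive frames during processing, the trajectory moves from the initial state $T_{ini}$ to the confirmed state $T_{cofr}$, which counts as successful tracking;
if the number of consecutively matched frames is less than $n$, the current frame number is denoted $z$ and incremented, $z = z + 1$, and the method returns to step 1 to match again;
if matching fails for $n$ consecutive frames, the trajectory moves from the initial state $T_{ini}$ to the deleted state $T_{dele}$, which counts as a tracking failure, and the current trajectory is deleted from the video.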
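The track lifecycle of step 5 can be sketched as a small state machine. The comparison $\omega_{i,j} \ge T_{hres}$ follows the text literally, and the default $n = 3$ is the value given in the embodiment; the class and function names are hypothetical, and only the transitions out of $T_{ini}$ that the text describes are modelled.

```python
from dataclasses import dataclass

T_INI, T_COFR, T_DELE = "T_ini", "T_cofr", "T_dele"

@dataclass
class Track:
    state: str = T_INI
    hits: int = 0     # consecutive frames with a successful match
    misses: int = 0   # consecutive frames with a failed match

def update(track, omega, t_hres, n=3):
    """Apply one frame's matching result and return the track's new state."""
    if omega >= t_hres:            # step 5-1: success when omega >= T_hres
        track.hits += 1
        track.misses = 0
    else:
        track.misses += 1
        track.hits = 0
    if track.state == T_INI:       # step 5-2: transitions from the initial state
        if track.hits >= n:
            track.state = T_COFR
        elif track.misses >= n:
            track.state = T_DELE
    return track.state
```

If the two fused terms are distances rather than similarity scores, the direction of the threshold comparison would need to be reversed; the sketch keeps the direction stated in the text.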
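The present invention proposes a high-precision multi-target tracking method under complex background that improves on traditional tracking algorithms. When traditional methods match detected targets to trajectories, the lack of sufficient feature information easily causes ID switches, that is, the ID of a detection box keeps changing, so accuracy and robustness suffer. Here, a residual network is added to extract multi-resolution features of the target, and the matching process combines motion information with appearance information, improving the accuracy of matching.
To explain the technical solution of the present invention more clearly, the drawings needed in the embodiments are briefly introduced below; those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a schematic diagram of the basic flow of a high-precision multi-target tracking method under complex background provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of the target region and the surrounding region in a high-precision multi-target tracking method under complex background provided by an embodiment of the present invention.
To make the above objects, features and advantages of the present invention easier to understand, the present invention is described in further detail below with reference to the drawings and specific embodiments.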
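As shown in FIG. 1, an embodiment of the present invention discloses a high-precision multi-target tracking method under complex background, applied to tracking multiple targets against a complex background, which includes the following steps:
Before step 1, video data is first obtained; in this embodiment, a camera can capture video in real time and send it to a computer, or the computer can read a local video directly. Specifically, the camera and computer can be of any model.
Step 1: input the acquired video data into a residual network, extract target resolution features, and output the extraction result, which includes target resolution features of different dimensions. Specifically, in this embodiment the residual network may be ResNet.
In this embodiment, the target resolution features of different dimensions in the extraction result have different characteristics, which can be used to enhance feature expression. This step addresses the scale change that often occurs during target tracking.
Step 2: compute the correlation filter response map of the target resolution features;
Step 3: use a target detection network to obtain the detection result of the target; the motion state of the target is defined as an 8-dimensional space representing the state of the trajectory at a given moment;
Step 4: match the detection result of the target with the predicted trajectory to obtain a matching result, which includes the value obtained by fusing two metrics, motion information and appearance information;
Step 5: compare the fused value of the two metrics with a preset matching threshold to obtain the target tracking result.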
本发明实施例所述的一种复杂背景下高精度多目标跟踪方法中,所述步骤2,包括:
步骤2-1,对所述不同维度的所述目标分辨率特征进行插值操作,将所述不同分辨率的特征转换到连续空间域,插值算子J
d表示为:
其中,b
d∈L
2(T),属于差值函数,每个样本都包含D维的特征通道,N
d表示特征通道中空间采样点的数目,d∈{0,1,2,…},不同分辨率的特征被转换到连续的空间域[0,T)∈R,T表示支持区域的大小,t表示跟踪目标在图像中的位置,t∈[0,T),n表示离散空间变量n∈{0,…N
d-1};
步骤2-2,通过最小化损失函数,求出相关滤波器;
傅里叶域中相应的损失函数可推导为:
其中,f为滤波器,P是特征矩阵;z表示插值特征图,惩罚函数w∈L
2(T)是一个空间正则化项,C表示为C维特征图,λ表示为权重参数,F表示滤波器f经过傅里叶变化后的结果;
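Step 2-3: perform the factorized convolution operation to obtain the response of the correlation filter. Correlation describes the relationship between two signals and is divided into cross-correlation and positive correlation; in this embodiment, the correlation referred to is positive correlation;
the new filter response R_c is expressed as the matrix-vector product Pf, and the factorized convolution operator for the filter response R_c is expressed as:
where the feature vector J{x}(t) at each position t is first multiplied by the matrix P^T, and the resulting feature map is then convolved with the filter; P_dc denotes the learned coefficients, which can be written compactly as the D×C matrix P = (P_dc); in the expression, the feature vector J{x}(t) at each position t is written as J{x};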
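The operator is not reproduced in the text above. A form consistent with the description (multiply J{x} by P^T, then convolve with the filter), assuming an ECO-style factorization and writing f^c for the c-th filter channel (notation introduced here), would be:

```latex
R_c = Pf * J\{x\} = \sum_{c,d} P_{dc}\, f^{c} * J_d\{x^d\} = f * P^{\mathsf{T}} J\{x\}
```

Read right to left, the last expression first projects the D-channel interpolated feature map onto C channels with P^T and then convolves the result with the C filters, which matches the order of operations described in the text.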
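Step 2-4: apply visual saliency detection to the tracked target. In this embodiment, applying visual saliency detection to the tracked target allows it to be located quickly and improves localization accuracy;
In the high-precision multi-target tracking method for complex backgrounds described in the embodiments of the present invention, step 2-4 comprises:
Step 2-4-1: as shown in Fig. 2, let the input image be I. Given the target region of a tracked target, that is, the rectangular box region, and the surrounding region, the probability that the pixel at a position in the image belongs to the target is:
where m denotes a separated target pixel, O denotes the target region, S denotes the surrounding region, and b_m denotes the color component assigned to the input image I;
the probabilities that the color component b_m assigned to the input image I belongs to the target region O and to the surrounding region S are respectively expressed as: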
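The probability expressions are not reproduced in the text above. The following is a minimal sketch, assuming a color-histogram object model in which H_O and H_S count how often each color bin occurs in the target region O and the surrounding region S, and the target probability of a bin is H_O / (H_O + H_S). The function names and the 0.5 fallback for unseen bins are illustrative assumptions.

```python
from collections import Counter


def color_histograms(pixels_obj, pixels_sur):
    """Count color-bin occurrences in the target region O and the surrounding region S."""
    return Counter(pixels_obj), Counter(pixels_sur)


def target_probability(bin_id, hist_obj, hist_sur):
    """Probability that a pixel whose color falls in `bin_id` belongs to the target.

    Assumed model: H_O(b) / (H_O(b) + H_S(b)). Bins seen in neither region
    return 0.5, i.e. no evidence either way (an assumption of this sketch).
    """
    h_o = hist_obj.get(bin_id, 0)
    h_s = hist_sur.get(bin_id, 0)
    if h_o + h_s == 0:
        return 0.5
    return h_o / (h_o + h_s)


# Toy example: color bins are small integers.
hist_o, hist_s = color_histograms([1, 1, 1, 2], [2, 2, 3, 3])
print(target_probability(1, hist_o, hist_s))  # bin seen only inside the target box
print(target_probability(3, hist_o, hist_s))  # bin seen only in the surrounding region
```

Colors that occur mostly inside the box score close to 1 and colors that occur mostly in the surrounding region score close to 0, which is what lets the saliency map suppress background clutter.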
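Step 2-4-2: during target tracking, the target position is given in the first frame; in each subsequent frame, a rectangular region around the position in the previous frame is searched. The saliency R_S of the current frame is computed as:
R_S = s_v(O_t) s_d(O_t),
where s_v(O_t) is the probability score based on the object model, s_d(O_t) is the distance score based on the Euclidean distance from the target to the target center c_{t-1} of the previous frame, P_{1:t-1} is the probability score from the first frame to the previous frame, and σ is the standard deviation of the normal distribution.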
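The text gives the product R_S = s_v(O_t) s_d(O_t) but not the two factors. The sketch below assumes that s_d is a Gaussian of the Euclidean distance to c_{t-1} with standard deviation σ, and that s_v is the mean per-pixel target probability inside the candidate box; both choices, and all function names, are assumptions made for illustration.

```python
import math


def distance_score(center, prev_center, sigma):
    """Assumed Gaussian score of the distance to the previous target center c_{t-1}."""
    dx = center[0] - prev_center[0]
    dy = center[1] - prev_center[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))


def vote_score(pixel_probs):
    """Assumed object-model score: mean target probability over the candidate box."""
    return sum(pixel_probs) / len(pixel_probs)


def saliency(pixel_probs, center, prev_center, sigma):
    """R_S = s_v(O_t) * s_d(O_t), following the product given in the text."""
    return vote_score(pixel_probs) * distance_score(center, prev_center, sigma)


# A candidate 5 pixels away from the previous center, with sigma = 5.
r_s = saliency([0.5, 1.0], center=(3.0, 4.0), prev_center=(0.0, 0.0), sigma=5.0)
```

With these assumptions a candidate that stays at the previous center keeps its full object-model score, and candidates further away are penalized smoothly according to σ.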
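Step 2-5: multiply the obtained filter response R_c by the saliency R_S of the current frame to obtain the final response map R_f = R_c · R_S. The position where the final response map R_f reaches its maximum is mapped back to the original image to obtain the position of the target in subsequent frames, that is, the predicted trajectory.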
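As an illustration of step 2-5, the sketch below multiplies two response maps element-wise and returns the peak location. The `scale` parameter, which maps response-map indices back to image coordinates, is a hypothetical stand-in for whatever mapping the actual feature stride requires.

```python
def fuse_and_locate(r_c, r_s, scale=1.0):
    """Compute R_f = R_c * R_S element-wise and return the peak position and value.

    r_c and r_s are 2D lists of equal shape; `scale` (an assumption of this
    sketch) converts map indices to image coordinates.
    """
    best_value = float("-inf")
    best_pos = (0, 0)
    for row, (rc_row, rs_row) in enumerate(zip(r_c, r_s)):
        for col, (rc, rs) in enumerate(zip(rc_row, rs_row)):
            value = rc * rs
            if value > best_value:
                best_value, best_pos = value, (row, col)
    return (best_pos[0] * scale, best_pos[1] * scale), best_value


r_c = [[0.1, 0.8], [0.6, 0.4]]  # correlation filter response
r_s = [[0.9, 0.2], [0.5, 0.9]]  # saliency of the current frame
position, peak = fuse_and_locate(r_c, r_s, scale=8.0)
```

In this toy input the filter response alone peaks at index (0, 1), but after weighting by saliency the fused peak moves to index (1, 1), which shows how the saliency term can override a distractor in the filter response.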
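In the high-precision multi-target tracking method for complex backgrounds described in the embodiments of the present invention, step 3 comprises: obtaining the detection results of targets using a target detection network and defining the motion state of a target as an 8-dimensional space (x_t, y_t, r_t, h_t, x*, y*, r*, h*) representing the state of the trajectory at a given moment, where x_t, y_t are the coordinates of the center of the detection box in the image coordinate system, r_t is the aspect ratio of the detection box, h_t is the height of the detection box, and x*, y*, r*, h* are the corresponding velocities in image coordinates. Specifically, in this embodiment, the target detection network may be yolov4.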
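The 8-dimensional state can be sketched as a small data structure. The one-step prediction below assumes a constant-velocity motion model, which the text does not state explicitly; the class and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class TrackState:
    """8-dimensional state: box center, aspect ratio, height, and their velocities."""
    x: float          # center x in image coordinates
    y: float          # center y in image coordinates
    r: float          # aspect ratio of the detection box
    h: float          # height of the detection box
    vx: float = 0.0   # velocity of x (x* in the text)
    vy: float = 0.0   # velocity of y (y*)
    vr: float = 0.0   # velocity of r (r*)
    vh: float = 0.0   # velocity of h (h*)

    def predict(self, dt: float = 1.0) -> "TrackState":
        """Advance the state by dt frames under an assumed constant-velocity model."""
        return TrackState(
            self.x + self.vx * dt,
            self.y + self.vy * dt,
            self.r + self.vr * dt,
            self.h + self.vh * dt,
            self.vx, self.vy, self.vr, self.vh,
        )


state = TrackState(x=100.0, y=50.0, r=0.5, h=80.0, vx=2.0, vy=-1.0)
next_state = state.predict()
```

A full tracker would normally also propagate an uncertainty (covariance) alongside the state, which is where the matrix S_i used in step 4-1 would come from.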
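In the high-precision multi-target tracking method for complex backgrounds described in the embodiments of the present invention, step 4 comprises:
Step 4-1: use the distance between the detection result of a target and the predicted trajectory to express the motion matching degree:
the motion matching degree expresses how well the detection result of the j-th target matches the i-th trajectory;
where S_i is the covariance matrix in observation space at the current moment obtained from the trajectory prediction, y_i is the predicted observation of the trajectory at the current moment, and d_j is the state of the j-th target.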
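The distance expression is not reproduced in the text above. Given the roles of S_i, y_i, and d_j, a natural candidate is the squared Mahalanobis distance; this is an assumption, and the symbol d^{(1)} is introduced here for illustration:

```latex
d^{(1)}(i, j) = (d_j - y_i)^{\mathsf{T}}\, S_i^{-1}\, (d_j - y_i)
```

Weighting the difference d_j − y_i by the inverse covariance S_i^{-1} means that a deviation counts for less along directions in which the trajectory prediction is uncertain.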
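Step 4-2: use the minimum cosine distance between the detection result of a target and the feature vectors of the targets contained in the trajectory as the appearance matching degree between the target and the trajectory;
the cosine similarity between the detection result of the j-th target and the i-th trajectory is:
where d_jk denotes the k-th state of the j-th target and y_ik denotes the k-th state of the i-th trajectory;
cosine distance = 1 − cosine similarity, and the appearance matching degree between the target and the trajectory is: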
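The two expressions are not reproduced in the text above. The sketch below uses the standard definition of cosine similarity and takes the minimum cosine distance over the feature vectors stored for a trajectory, as step 4-2 describes. The function names are illustrative, and the code assumes non-zero feature vectors.

```python
import math


def cosine_similarity(a, b):
    """Standard cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def appearance_distance(detection_feature, track_features):
    """Minimum cosine distance (1 - similarity) to any feature stored for the track."""
    return min(1.0 - cosine_similarity(detection_feature, f) for f in track_features)


# A track that has stored two appearance features so far.
gallery = [[1.0, 0.0], [0.0, 1.0]]
d2 = appearance_distance([1.0, 1.0], gallery)
```

Taking the minimum over the stored features means a detection only needs to resemble one past appearance of the trajectory, which helps when the target's pose or lighting has changed over time.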
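In the prior art, using motion information alone as the matching metric causes the IDs of tracked targets to change too frequently. The present invention therefore tracks with the appearance matching degree combined, which effectively reduces ID changes of tracked targets compared with the prior art.
Step 4-3: fuse the two metrics, namely the motion-distance matching degree and the appearance information, by weighted averaging to obtain the fused value ω_{i,j}:
where μ is a hyperparameter used to adjust the weights of the different terms.
Specifically, in this embodiment, the motion-distance matching metric works well for short-term prediction and matching, whereas the appearance information is the more effective matching metric for trajectories that have been lost for a long time. The choice of the hyperparameter depends on the specific dataset; to give the two terms the same importance, μ should be about 0.1.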
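The fusion formula is not reproduced in the text above. A minimal sketch, assuming the weighted average takes the convex form ω_{i,j} = μ·d1 + (1 − μ)·d2, where d1 and d2 (names introduced here) are the motion and appearance terms for trajectory i and detection j:

```python
def fuse_metrics(motion, appearance, mu=0.1):
    """Assumed weighted average: mu * motion + (1 - mu) * appearance."""
    return mu * motion + (1.0 - mu) * appearance


def fuse_matrix(motion_rows, appearance_rows, mu=0.1):
    """Fuse two (tracks x detections) matrices entry by entry."""
    return [
        [fuse_metrics(m, a, mu) for m, a in zip(m_row, a_row)]
        for m_row, a_row in zip(motion_rows, appearance_rows)
    ]


# One track, two detections: motion terms and appearance terms.
omega = fuse_matrix([[2.0, 9.0]], [[0.1, 0.8]], mu=0.1)
```

The default `mu=0.1` follows the value suggested in the text. Because the two terms can live on different numeric scales, a small μ does not necessarily mean the motion term is unimportant.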
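In the high-precision multi-target tracking method for complex backgrounds described in the embodiments of the present invention, step 5 comprises:
Step 5-1: if the fused value ω_{i,j} of the two metrics is greater than or equal to the preset matching threshold T_hres, the target tracking result is a successful match;
if the fused value ω_{i,j} of the two metrics is less than the preset matching threshold T_hres, the target tracking result is a failed match;
Step 5-2: the initial state of a trajectory is T_ini. If matching succeeds for n consecutive frames during video processing, the trajectory is changed from the initial state T_ini to the confirmed state T_cofr, and tracking is regarded as successful;
if the number of consecutive successfully matched frames is less than n, the current frame count is denoted z and z = z + 1; return to step 1 and perform matching again;
if matching fails for n consecutive frames, the trajectory is changed from the initial state T_ini to the deleted state T_dele, tracking is regarded as failed, and the current trajectory is deleted from the video.
Specifically, in this embodiment, n = 3; after matching for the current frame ends, z = z + 1; return to step 1, and the video proceeds to target matching and tracking for the next frame.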
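The trajectory life cycle in step 5 can be sketched as a small state machine. The class and attribute names are illustrative; the comparison `omega >= threshold` follows step 5-1 as written, and only the transitions out of the initial state are modeled because those are the only ones the text describes.

```python
from enum import Enum


class TrackStatus(Enum):
    INITIAL = "T_ini"
    CONFIRMED = "T_cofr"
    DELETED = "T_dele"


class Track:
    def __init__(self, n=3):
        self.n = n                       # consecutive frames required (n = 3 in the embodiment)
        self.status = TrackStatus.INITIAL
        self.hits = 0                    # consecutive successful matches
        self.misses = 0                  # consecutive failed matches

    def update(self, omega, threshold):
        """Apply steps 5-1 and 5-2 for one frame and return the resulting status."""
        if omega >= threshold:           # step 5-1: successful match
            self.hits += 1
            self.misses = 0
        else:                            # step 5-1: failed match
            self.misses += 1
            self.hits = 0
        if self.status is TrackStatus.INITIAL:
            if self.hits >= self.n:
                self.status = TrackStatus.CONFIRMED
            elif self.misses >= self.n:
                self.status = TrackStatus.DELETED
        return self.status


track = Track(n=3)
for fused_value in (0.9, 0.8, 0.7):
    status = track.update(fused_value, threshold=0.5)
```

After three consecutive successful matches the track in this example is confirmed; a track that alternates between matches and misses stays in the initial state because both counters keep resetting.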
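The present invention proposes a high-precision multi-target tracking method for complex backgrounds that improves on conventional tracking algorithms. When matching detected targets to trajectories, conventional methods lack sufficient feature information and are therefore prone to ID switches, that is, the ID of a detection box keeps changing, so they lack accuracy and robustness. By adding a residual network for feature extraction, this document extracts multi-resolution features of the target and combines motion information with appearance information in the matching process, which improves the accuracy of matching to a greater extent.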
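In a specific implementation, the present invention further provides a computer storage medium that may store a program which, when executed, may carry out some or all of the steps of the embodiments of the high-precision multi-target tracking method for complex backgrounds provided by the present invention. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
A person skilled in the art will clearly understand that the techniques in the embodiments of the present invention may be implemented by software plus the necessary general-purpose hardware platform. On this understanding, the technical solutions in the embodiments, in essence or in the part that contributes over the prior art, may be embodied as a software product. The computer software product may be stored in a storage medium such as ROM/RAM, a magnetic disk, or an optical disc, and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in parts of the embodiments of the present invention.
For parts that are the same or similar among the embodiments in this specification, the embodiments may be referred to one another. The embodiments of the present invention described above do not limit the scope of protection of the present invention.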
Claims (6)
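- A high-precision multi-target tracking method for complex backgrounds, characterized by comprising the following steps: step 1, inputting acquired video data into a residual network to extract target resolution features, and outputting the extraction result at the output end, the extraction result comprising target resolution features of different dimensions; step 2, computing the correlation filter response map of the target resolution features; step 3, obtaining the detection results of targets using a target detection network, wherein the motion state of a target is defined as an 8-dimensional space representing the state of the trajectory at a given moment; step 4, matching the detection results of the targets against the predicted trajectories to obtain a matching result, the matching result comprising a value that fuses two metrics, motion information and appearance information; step 5, comparing the fused value of the two metrics with a preset matching threshold to obtain the target tracking result.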
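- The high-precision multi-target tracking method for complex backgrounds according to claim 1, characterized in that step 2 comprises: step 2-1, applying an interpolation operation to the target resolution features of different dimensions to transfer the features of different resolutions to a continuous spatial domain, the interpolation operator J_d being expressed as: where b_d ∈ L^2(T) is the interpolation function, each sample contains D feature channels, N_d is the number of spatial sample points in feature channel d, d ∈ {0, 1, 2, …}, the features of different resolutions are transferred to the continuous spatial domain [0, T) ⊂ R, T is the size of the support region, t is the position of the tracked target in the image, t ∈ [0, T), and n is the discrete spatial variable, n ∈ {0, …, N_d − 1}; step 2-2, obtaining the correlation filter by minimizing a loss function, the corresponding loss function in the Fourier domain being derivable as: where f is the filter, P is the feature matrix, z is the interpolated feature map, the penalty function w ∈ L^2(T) is a spatial regularization term, C denotes the C-dimensional feature map, λ is the weight parameter, and F denotes the result of applying the Fourier transform to the filter f; step 2-3, performing the factorized convolution operation to obtain the response of the correlation filter, the new filter response R_c being expressed as the matrix-vector product Pf, and the factorized convolution operator for the filter response R_c being expressed as: where the feature vector J{x}(t) at each position t is first multiplied by the matrix P^T, and the resulting feature map is then convolved with the filter, P_dc denotes the learned coefficients, which can be written compactly as the D×C matrix P = (P_dc), and in the expression the feature vector J{x}(t) at each position t is written as J{x}; step 2-4, applying visual saliency detection to the tracked target; step 2-5, multiplying the obtained filter response R_c by the saliency R_S of the current frame to obtain the final response map R_f = R_c · R_S, and, where the final response map R_f reaches its maximum, mapping the position of the maximum response back to the original image to obtain the position of the target in subsequent frames, that is, the predicted trajectory.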
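- The high-precision multi-target tracking method for complex backgrounds according to claim 2, characterized in that step 2-4 comprises: step 2-4-1, letting the input image be I, and, given the target region of a tracked target, that is, the rectangular box region, and the surrounding region, the probability that the pixel at a position in the image belongs to the target being: where m denotes a separated target pixel, O denotes the target region, S denotes the surrounding region, and b_m denotes the color component assigned to the input image I; the probabilities that the color component b_m assigned to the input image I belongs to the target region O and to the surrounding region S being respectively expressed as: step 2-4-2, during target tracking, the target position being given in the first frame and, in each subsequent frame, a rectangular region around the position in the previous frame being searched, the saliency R_S of the current frame being computed as: R_S = s_v(O_t) s_d(O_t), where s_v(O_t) is the probability score based on the object model, s_d(O_t) is the distance score based on the Euclidean distance from the target to the target center c_{t-1} of the previous frame, P_{1:t-1} is the probability score from the first frame to the previous frame, and σ is the standard deviation of the normal distribution.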
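- The high-precision multi-target tracking method for complex backgrounds according to claim 1, characterized in that step 3 comprises: obtaining the detection results of targets using a target detection network and defining the motion state of a target as an 8-dimensional space (x_t, y_t, r_t, h_t, x*, y*, r*, h*) representing the state of the trajectory at a given moment, where x_t, y_t are the coordinates of the center of the detection box in the image coordinate system, r_t is the aspect ratio of the detection box, h_t is the height of the detection box, and x*, y*, r*, h* are the corresponding velocities in image coordinates.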
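- The high-precision multi-target tracking method for complex backgrounds according to claim 1, characterized in that step 4 comprises: step 4-1, using the distance between the detection result of a target and the predicted trajectory to express the motion matching degree: where d_jk denotes the k-th state of the j-th target and y_ik denotes the k-th state of the i-th trajectory; the motion matching degree expresses how well the detection result of the j-th target matches the i-th trajectory; where S_i is the covariance matrix in observation space at the current moment obtained from the trajectory prediction, y_i is the predicted observation of the trajectory at the current moment, and d_j is the state of the j-th target; step 4-2, using the minimum cosine distance between the detection result of a target and the feature vectors of the targets contained in the trajectory as the appearance matching degree between the target and the trajectory; the cosine similarity between the detection result of the j-th target and the i-th trajectory being: cosine distance = 1 − cosine similarity, and the appearance matching degree between the target and the trajectory being: step 4-3, fusing the two metrics, namely the motion-distance matching degree and the appearance information, by weighted averaging to obtain the fused value ω_{i,j}: where μ is a hyperparameter used to adjust the weights of the different terms.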
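- The high-precision multi-target tracking method for complex backgrounds according to claim 1, characterized in that step 5 comprises: step 5-1, if the fused value ω_{i,j} of the two metrics is greater than or equal to the preset matching threshold T_hres, the target tracking result is a successful match; if the fused value ω_{i,j} of the two metrics is less than the preset matching threshold T_hres, the target tracking result is a failed match; step 5-2, the initial state of a trajectory being T_ini, if matching succeeds for n consecutive frames during video processing, the trajectory is changed from the initial state T_ini to the confirmed state T_cofr, and tracking is regarded as successful; if the number of consecutive successfully matched frames is less than n, the current frame count is denoted z and z = z + 1, and the method returns to step 1 and performs matching again; if matching fails for n consecutive frames, the trajectory is changed from the initial state T_ini to the deleted state T_dele, tracking is regarded as failed, and the current trajectory is deleted from the video.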
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110404599.5 | 2021-04-15 | ||
CN202110404599.5A CN113012203B (zh) | 2021-04-15 | 2021-04-15 | 一种复杂背景下高精度多目标跟踪方法 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022217840A1 true WO2022217840A1 (zh) | 2022-10-20 |
Family
ID=76389386
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/119796 WO2022217840A1 (zh) | 2021-04-15 | 2021-09-23 | 一种复杂背景下高精度多目标跟踪方法 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113012203B (zh) |
WO (1) | WO2022217840A1 (zh) |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115272420A (zh) * | 2022-09-28 | 2022-11-01 | 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) | 一种长时目标跟踪方法、系统及存储介质 |
CN115984846A (zh) * | 2023-02-06 | 2023-04-18 | 山东省人工智能研究院 | 一种基于深度学习的高分辨率图像中小目标的智能识别方法 |
CN116129332A (zh) * | 2023-04-12 | 2023-05-16 | 武汉理工大学 | 多船舶目标的跟踪识别方法、装置、电子设备及存储介质 |
CN116343125A (zh) * | 2023-03-30 | 2023-06-27 | 北京国泰星云科技有限公司 | 一种基于计算机视觉的集装箱箱底锁头检测方法 |
CN116452791A (zh) * | 2023-03-27 | 2023-07-18 | 广州市斯睿特智能科技有限公司 | 多相机点位的缺陷区域定位方法、系统、装置及存储介质 |
CN116563348A (zh) * | 2023-07-06 | 2023-08-08 | 中国科学院国家空间科学中心 | 基于双特征模板的红外弱小目标多模态跟踪方法及系统 |
CN116596958A (zh) * | 2023-07-18 | 2023-08-15 | 四川迪晟新达类脑智能技术有限公司 | 一种基于在线样本增广的目标跟踪方法及装置 |
CN116681660A (zh) * | 2023-05-18 | 2023-09-01 | 中国长江三峡集团有限公司 | 一种目标对象缺陷检测方法、装置、电子设备及存储介质 |
CN116721132A (zh) * | 2023-06-20 | 2023-09-08 | 中国农业大学 | 一种工厂化养殖的鱼类多目标跟踪方法、系统及设备 |
CN116758119A (zh) * | 2023-06-27 | 2023-09-15 | 重庆比特数图科技有限公司 | 基于运动补偿、联动的多目标循环检测跟踪方法及系统 |
CN116758110A (zh) * | 2023-08-15 | 2023-09-15 | 中国科学技术大学 | 复杂运动场景下的鲁棒多目标跟踪方法 |
CN116993779A (zh) * | 2023-08-03 | 2023-11-03 | 重庆大学 | 一种适于监控视频下的车辆目标跟踪方法 |
CN117214881A (zh) * | 2023-07-21 | 2023-12-12 | 哈尔滨工程大学 | 一种复杂场景下基于Transformer网络的多目标跟踪方法 |
CN117455955A (zh) * | 2023-12-14 | 2024-01-26 | 武汉纺织大学 | 一种基于无人机视角下的行人多目标跟踪方法 |
CN117746304A (zh) * | 2024-02-21 | 2024-03-22 | 浪潮软件科技有限公司 | 基于计算机视觉的冰箱食材识别定位方法及系统 |
CN117809054A (zh) * | 2024-02-29 | 2024-04-02 | 南京邮电大学 | 一种基于特征解耦融合网络的多目标跟踪方法 |
CN117853759A (zh) * | 2024-03-08 | 2024-04-09 | 山东海润数聚科技有限公司 | 一种多目标跟踪方法、系统、设备和存储介质 |
CN117437261B (zh) * | 2023-10-08 | 2024-05-31 | 南京威翔科技有限公司 | 一种适用于边缘端远距离目标的跟踪方法 |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113012203B (zh) * | 2021-04-15 | 2023-10-20 | 南京莱斯电子设备有限公司 | 一种复杂背景下高精度多目标跟踪方法 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110197502A (zh) * | 2019-06-06 | 2019-09-03 | 山东工商学院 | 一种基于身份再识别的多目标跟踪方法及系统 |
CN110490901A (zh) * | 2019-07-15 | 2019-11-22 | 武汉大学 | 抗姿态变化的行人检测跟踪方法 |
CN111476826A (zh) * | 2020-04-10 | 2020-07-31 | 电子科技大学 | 一种基于ssd目标检测的多目标车辆跟踪方法 |
CN111652909A (zh) * | 2020-04-21 | 2020-09-11 | 南京理工大学 | 一种基于深度哈希特征的行人多目标追踪方法 |
CN113012203A (zh) * | 2021-04-15 | 2021-06-22 | 南京莱斯电子设备有限公司 | 一种复杂背景下高精度多目标跟踪方法 |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019006632A1 (zh) * | 2017-07-04 | 2019-01-10 | 深圳大学 | 一种视频多目标跟踪方法及装置 |
CN107818571B (zh) * | 2017-12-11 | 2018-07-20 | 珠海大横琴科技发展有限公司 | 基于深度学习网络和均值漂移的船只自动跟踪方法及系统 |
CN108198209B (zh) * | 2017-12-22 | 2020-05-01 | 天津理工大学 | 在发生遮挡和尺度变化情况下行人跟踪方法 |
CN108875588B (zh) * | 2018-05-25 | 2022-04-15 | 武汉大学 | 基于深度学习的跨摄像头行人检测跟踪方法 |
US10970856B2 (en) * | 2018-12-27 | 2021-04-06 | Baidu Usa Llc | Joint learning of geometry and motion with three-dimensional holistic understanding |
CN110321937B (zh) * | 2019-06-18 | 2022-05-17 | 哈尔滨工程大学 | 一种Faster-RCNN结合卡尔曼滤波的运动人体跟踪方法 |
CN110660083B (zh) * | 2019-09-27 | 2022-12-23 | 国网江苏省电力工程咨询有限公司 | 一种结合视频场景特征感知的多目标跟踪方法 |
CN111008997B (zh) * | 2019-12-18 | 2023-04-07 | 南京莱斯电子设备有限公司 | 一种车辆检测与跟踪一体化方法 |
CN111508002B (zh) * | 2020-04-20 | 2020-12-25 | 北京理工大学 | 一种小型低飞目标视觉检测跟踪系统及其方法 |
CN112084914B (zh) * | 2020-08-31 | 2024-04-26 | 的卢技术有限公司 | 一种融合空间运动和表观特征学习的多目标跟踪方法 |
CN112308881B (zh) * | 2020-11-02 | 2023-08-15 | 西安电子科技大学 | 一种基于遥感图像的舰船多目标跟踪方法 |
-
2021
- 2021-04-15 CN CN202110404599.5A patent/CN113012203B/zh active Active
- 2021-09-23 WO PCT/CN2021/119796 patent/WO2022217840A1/zh active Application Filing
Cited By (33)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115272420A (zh) * | 2022-09-28 | 2022-11-01 | 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) | 一种长时目标跟踪方法、系统及存储介质 |
CN115984846B (zh) * | 2023-02-06 | 2023-10-10 | 山东省人工智能研究院 | 一种基于深度学习的高分辨率图像中小目标的智能识别方法 |
CN115984846A (zh) * | 2023-02-06 | 2023-04-18 | 山东省人工智能研究院 | 一种基于深度学习的高分辨率图像中小目标的智能识别方法 |
CN116452791A (zh) * | 2023-03-27 | 2023-07-18 | 广州市斯睿特智能科技有限公司 | 多相机点位的缺陷区域定位方法、系统、装置及存储介质 |
CN116452791B (zh) * | 2023-03-27 | 2024-03-22 | 广州市斯睿特智能科技有限公司 | 多相机点位的缺陷区域定位方法、系统、装置及存储介质 |
CN116343125A (zh) * | 2023-03-30 | 2023-06-27 | 北京国泰星云科技有限公司 | 一种基于计算机视觉的集装箱箱底锁头检测方法 |
CN116343125B (zh) * | 2023-03-30 | 2024-04-02 | 北京国泰星云科技有限公司 | 一种基于计算机视觉的集装箱箱底锁头检测方法 |
CN116129332A (zh) * | 2023-04-12 | 2023-05-16 | 武汉理工大学 | 多船舶目标的跟踪识别方法、装置、电子设备及存储介质 |
US12008801B1 (en) | 2023-04-12 | 2024-06-11 | Wuhan University Of Technology | Tracking and identification method, device, electronic device, and storage medium for multiple vessel targets |
CN116681660B (zh) * | 2023-05-18 | 2024-04-19 | 中国长江三峡集团有限公司 | 一种目标对象缺陷检测方法、装置、电子设备及存储介质 |
CN116681660A (zh) * | 2023-05-18 | 2023-09-01 | 中国长江三峡集团有限公司 | 一种目标对象缺陷检测方法、装置、电子设备及存储介质 |
CN116721132B (zh) * | 2023-06-20 | 2023-11-24 | 中国农业大学 | 一种工厂化养殖的鱼类多目标跟踪方法、系统及设备 |
CN116721132A (zh) * | 2023-06-20 | 2023-09-08 | 中国农业大学 | 一种工厂化养殖的鱼类多目标跟踪方法、系统及设备 |
CN116758119A (zh) * | 2023-06-27 | 2023-09-15 | 重庆比特数图科技有限公司 | 基于运动补偿、联动的多目标循环检测跟踪方法及系统 |
CN116758119B (zh) * | 2023-06-27 | 2024-04-19 | 重庆比特数图科技有限公司 | 基于运动补偿、联动的多目标循环检测跟踪方法及系统 |
CN116563348A (zh) * | 2023-07-06 | 2023-08-08 | 中国科学院国家空间科学中心 | 基于双特征模板的红外弱小目标多模态跟踪方法及系统 |
CN116563348B (zh) * | 2023-07-06 | 2023-11-14 | 中国科学院国家空间科学中心 | 基于双特征模板的红外弱小目标多模态跟踪方法及系统 |
CN116596958B (zh) * | 2023-07-18 | 2023-10-10 | 四川迪晟新达类脑智能技术有限公司 | 一种基于在线样本增广的目标跟踪方法及装置 |
CN116596958A (zh) * | 2023-07-18 | 2023-08-15 | 四川迪晟新达类脑智能技术有限公司 | 一种基于在线样本增广的目标跟踪方法及装置 |
CN117214881A (zh) * | 2023-07-21 | 2023-12-12 | 哈尔滨工程大学 | 一种复杂场景下基于Transformer网络的多目标跟踪方法 |
CN116993779A (zh) * | 2023-08-03 | 2023-11-03 | 重庆大学 | 一种适于监控视频下的车辆目标跟踪方法 |
CN116993779B (zh) * | 2023-08-03 | 2024-05-14 | 重庆大学 | 一种适于监控视频下的车辆目标跟踪方法 |
CN116758110B (zh) * | 2023-08-15 | 2023-11-17 | 中国科学技术大学 | 复杂运动场景下的鲁棒多目标跟踪方法 |
CN116758110A (zh) * | 2023-08-15 | 2023-09-15 | 中国科学技术大学 | 复杂运动场景下的鲁棒多目标跟踪方法 |
CN117437261B (zh) * | 2023-10-08 | 2024-05-31 | 南京威翔科技有限公司 | 一种适用于边缘端远距离目标的跟踪方法 |
CN117455955B (zh) * | 2023-12-14 | 2024-03-08 | 武汉纺织大学 | 一种基于无人机视角下的行人多目标跟踪方法 |
CN117455955A (zh) * | 2023-12-14 | 2024-01-26 | 武汉纺织大学 | 一种基于无人机视角下的行人多目标跟踪方法 |
CN117746304A (zh) * | 2024-02-21 | 2024-03-22 | 浪潮软件科技有限公司 | 基于计算机视觉的冰箱食材识别定位方法及系统 |
CN117746304B (zh) * | 2024-02-21 | 2024-05-14 | 浪潮软件科技有限公司 | 基于计算机视觉的冰箱食材识别定位方法及系统 |
CN117809054A (zh) * | 2024-02-29 | 2024-04-02 | 南京邮电大学 | 一种基于特征解耦融合网络的多目标跟踪方法 |
CN117809054B (zh) * | 2024-02-29 | 2024-05-10 | 南京邮电大学 | 一种基于特征解耦融合网络的多目标跟踪方法 |
CN117853759B (zh) * | 2024-03-08 | 2024-05-10 | 山东海润数聚科技有限公司 | 一种多目标跟踪方法、系统、设备和存储介质 |
CN117853759A (zh) * | 2024-03-08 | 2024-04-09 | 山东海润数聚科技有限公司 | 一种多目标跟踪方法、系统、设备和存储介质 |
Also Published As
Publication number | Publication date |
---|---|
CN113012203B (zh) | 2023-10-20 |
CN113012203A (zh) | 2021-06-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21936709 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 21936709 Country of ref document: EP Kind code of ref document: A1 |