CN108986143B - Target detection tracking method in video - Google Patents

Target detection tracking method in video

Info

Publication number
CN108986143B
CN108986143B · CN108986143A · Application CN201810940035.1A
Authority
CN
China
Prior art keywords
video
target
tracking
video image
image frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810940035.1A
Other languages
Chinese (zh)
Other versions
CN108986143A (en)
Inventor
尚凌辉
张兆生
王弘玥
郑永宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Jiehuixin Digital Technology Co.,Ltd.
Original Assignee
Zhejiang Icare Vision Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Icare Vision Technology Co ltd filed Critical Zhejiang Icare Vision Technology Co ltd
Priority to CN201810940035.1A priority Critical patent/CN108986143B/en
Publication of CN108986143A publication Critical patent/CN108986143A/en
Application granted granted Critical
Publication of CN108986143B publication Critical patent/CN108986143B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence

Abstract

The invention discloses a target detection and tracking method in a video. The method first performs segmented sampling on the video to obtain several video image frame sequences. A neural network model M1 then carries out target detection and feature extraction on each video image frame sequence. Next, the correlation matrix of the target feature vectors corresponding to all detection results output for a video sequence is calculated, from which the tracking results of all detected targets within that video sequence are obtained. Finally, the segmented-sampled video image frame sequences are ordered along the time axis, and the target detection tracking tracks and feature matrices of the video image frame sequences are input into a neural network model M2 to obtain the tracking feature of each target in each video image frame sequence; this feature is used to calculate the correlation of all targets between two adjacent video image frame sequences, thereby completing the tracking of the targets across the whole video segment. The method can effectively reduce the amount of computation required to complete the target detection and tracking task in a video.

Description

Target detection tracking method in video
Technical Field
The invention belongs to the technical field of computer vision, and relates to a method for detecting and tracking a target in a video.
Background
Monitoring equipment such as checkpoint (bayonet) cameras, public-security cameras and various network cameras is installed and used in large numbers, and the video data it acquires play a great role in handling traffic violations, public-security management and similar tasks. However, as the number of installed devices keeps growing, the volume of produced data increases day by day, and the storage and utilization of these data face huge challenges; video structuring has therefore become a research hotspot in both scientific research and industry.
A fundamental problem that cannot be avoided in any video structuring scheme is the accurate and efficient detection and tracking of key targets in the video. Patents such as "Target tracking optimization method based on tracking-learning-detection" (CN107967692A), "Real-time unmanned aerial vehicle video target detection and tracking method" (CN108108697A) and "Multi-target pedestrian detection and tracking method based on deep learning" (CN107563313A) complete target detection on single-frame images, compute features of the regions associated with the detection results, and rely on these features to match and track targets between adjacent frames. In these methods, target detection depends only on the information of a single image frame and cannot exploit the correlated information between nearby image frames in the time sequence, so the accuracy of the detection results is limited. Likewise, the features used for matching and tracking are extracted from a single frame; although such features can distinguish many different target individuals, similar targets of the same class travelling together are very easily mismatched, causing tracking failure. Finally, to guarantee detection and tracking accuracy, the frame-skipping sampling interval must be kept small, which leads to a large amount of computation and low efficiency.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a method for detecting and tracking a target in a video.
The technical scheme adopted by the invention for solving the technical problem is as follows:
step 1, performing segmented sampling on a video to obtain a plurality of segments of video image frame sequences.
Step 2, a neural network model M1 is adopted to perform target detection and feature extraction on each video image frame sequence, and the output information comprises: the number of the image in which the target is located within the sequence, the rectangular box of the target in the image, and the feature vector of the target.
Step 3, the correlation matrix of the target feature vectors corresponding to all detection results output for a video sequence is calculated, from which the tracking results of all detected targets within that video sequence are obtained.
Step 4, the target detection tracking tracks and feature matrices of the video image frame sequences are input, in time-axis order, into a neural network model M2 to obtain the tracking feature of each target in each video image frame sequence; this feature is used to calculate the correlation of all targets between two adjacent video image frame sequences, thereby completing the tracking of the targets across the whole video segment.
The invention has the beneficial effects that:
1. The accuracy of the detector is improved by using the inter-frame information of the time-series images.
2. The space-time information of the time sequence images is fully utilized to improve the tracking effect of the target.
3. The calculation amount of detection tracking can be effectively reduced, and the operation efficiency is improved.
4. The detection and the tracking are effectively integrated, and the overall detection and tracking effect is improved.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, the present invention comprises the steps of:
step 1, performing segmented sampling on a video to obtain a plurality of segments of video image frame sequences.
Step 2, performing target detection and feature extraction on each video image frame sequence, wherein the output information comprises: the number of the image where the target is located in the sequence, the rectangular frame of the target in the image and the feature vector of the target.
And 3, calculating correlation matrixes of target feature vectors corresponding to all detection results output in the video sequence, and further obtaining tracking results of all detected targets in the video sequence.
Step 4, according to the time axis, the targets in preceding and following adjacent video sequences are matched and tracked by using the target detection tracking tracks within the video image frame sequences (the number of the image in which each target is located within the sequence and the rectangular box of the target in the image) and the feature matrices (the row-wise concatenation of the feature vectors).
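For orientation only, the data flow of steps 1 to 4 can be sketched in Python as follows. This is a minimal illustration under assumed interfaces rather than the patented implementation: the callables m1, m2, link_within and link_across and the segment length seg_len are hypothetical placeholders standing for the operations described above.

```python
from typing import Callable, List, Sequence

def detect_and_track(
    frames: Sequence,        # decoded video frames, in time order
    m1: Callable,            # step 2: per-sequence target detection + feature extraction
    m2: Callable,            # step 4: per-sequence tracking-feature extraction
    link_within: Callable,   # step 3: association inside a single frame sequence
    link_across: Callable,   # step 4: association between two adjacent sequences
    seg_len: int = 16,       # assumed segment length; not specified in the patent
) -> List:
    # Step 1: segmented sampling of the video into several frame sequences.
    sequences = [frames[i:i + seg_len] for i in range(0, len(frames), seg_len)]

    # Step 2: for each sequence, M1 yields per target the image number in the
    # sequence, the bounding box in the image and a feature vector.
    detections = [m1(seq) for seq in sequences]

    # Step 3: within-sequence tracks from the correlation matrix of the feature vectors.
    tracks = [link_within(det) for det in detections]
    if not tracks:
        return []

    # Step 4: sequences are already in time-axis order; M2 produces a tracking feature
    # per target per sequence, used to associate targets across adjacent sequences.
    track_feats = [m2(trk) for trk in tracks]
    full_tracks = tracks[0]
    for k in range(1, len(tracks)):
        full_tracks = link_across(full_tracks, tracks[k], track_feats[k - 1], track_feats[k])
    return full_tracks
```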
The target detection and feature extraction for each video image frame sequence are computed as follows: the trained neural network model M1 is executed, and its inference directly yields the number of the image in which each target is located within the sequence, the rectangular box of the target in the image, and the feature vector of the target.
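For concreteness, the information returned for each detected target by one inference pass of M1 can be held in a record such as the following; the field names and the (x1, y1, x2, y2) box convention are illustrative assumptions rather than details fixed by the patent.

```python
from dataclasses import dataclass
from typing import Tuple
import numpy as np

@dataclass
class Detection:
    frame_index: int                          # number of the image in which the target is located within the sequence
    box: Tuple[float, float, float, float]    # rectangular box of the target in the image, assumed as (x1, y1, x2, y2)
    feature: np.ndarray                       # feature vector of the target, used for matching and tracking
```

The feature matrix of a track is then simply the row-wise concatenation of the feature vectors of its detections, e.g. np.stack([d.feature for d in track]).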
Wherein the neural network model M1 is trained by the following steps:
collecting annotated video data;
cutting the sampled video segments to obtain the video image frame sequences together with, for each target, the number of the annotated image in which it appears within the sequence, its rectangular box in the image, and its identity number;
training and optimizing the network model through target detection and classification on the video image sequences.
The following is an implementation scheme of the target detection and tracking method in a video; the specific steps are as follows:
Training the neural network model M1 for target detection and matching-feature calculation within a video image frame sequence comprises the following specific steps:
1. A number of video segments V are collected; the target locations in the video image sequences and the ID of each target from its appearance to its disappearance are manually annotated, giving the original annotated sample set A = {V1, V2, …, VL}.
2. Using deep learning theory and methods, each video segment Vi in the original annotated sample set A is segmented and sampled to generate several video image frame sequences Pi, Pi+1, …, Pi+k ∈ Vi, yielding the training/test sample set B = {P1, P2, …, Pi, Pi+1, …, Pi+k, …, Pn-1, Pn}.
3. Using deep learning theory and methods in combination with the training/test sample set B, a neural network model M1 that can detect targets and compute target features is obtained through multi-task training.
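A minimal sketch of how the training/test sample set B could be derived from the annotated set A by segmented sampling is given below; the segment length and stride are assumed parameters, and in practice the per-frame annotations (target IDs and boxes) would be sliced alongside the frames.

```python
from typing import List, Sequence

def segment_video(frames: Sequence, seg_len: int = 16, stride: int = 16) -> List[Sequence]:
    """Cut one annotated video segment V_i into frame sequences P_i, P_i+1, ..., P_i+k."""
    last_start = max(len(frames) - seg_len, 0)
    return [frames[s:s + seg_len] for s in range(0, last_start + 1, stride)]

def build_sample_set_b(sample_set_a: List[Sequence], seg_len: int = 16, stride: int = 16) -> List[Sequence]:
    """Pool the frame sequences of all annotated video segments into B = {P_1, ..., P_n}."""
    b: List[Sequence] = []
    for video in sample_set_a:
        b.extend(segment_video(video, seg_len, stride))
    return b
```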
Training the neural network model M2 for computing target matching and tracking features between video image frame sequences comprises the following specific steps:
1. The neural network model M1 is used to obtain, for each video image sequence Pi in the training/test sample set B, the tracking trajectory of each target (the number of the image in which the target is located within the sequence and the rectangular box of the target in the image) and its feature matrix (the row-wise concatenation of the feature vectors).
2. Using the annotated target information of each video segment Vi together with the tracking trajectories and feature matrices obtained by passing each video image frame sequence Pi+j through the neural network model M1, a feature sample set of each target across the different video image frame sequences of video segment Vi is obtained: O = {q1, q2, …, qk}, where qi is produced by M1 on Pi. This yields the training data set C = {O1, O2, …, Os} of target matching and tracking features between video image sequences.
3. Using deep learning theory and methods in combination with the training data set C, the neural network model M2 for computing target matching and tracking features between video image sequences is obtained by training.
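The grouping of M1 outputs into the training data set C can be sketched as follows. It assumes each detection is already associated with its annotated target ID (possible because the videos in A are labeled); this bookkeeping detail is an assumption, not something the patent prescribes.

```python
from collections import defaultdict
from typing import Dict, List
import numpy as np

def build_target_feature_sets(per_sequence_detections: List[List[dict]]) -> Dict[int, List[np.ndarray]]:
    """For one video segment V_i, collect per annotated target ID its feature matrix
    in every frame sequence P_i+j, i.e. the set O = {q_1, ..., q_k} for that target."""
    per_target: Dict[int, List[np.ndarray]] = defaultdict(list)
    for seq_dets in per_sequence_detections:            # one entry per frame sequence
        by_id: Dict[int, List[np.ndarray]] = defaultdict(list)
        for det in seq_dets:                             # det: {"target_id": int, "feature": np.ndarray, ...}
            by_id[det["target_id"]].append(det["feature"])
        for tid, feats in by_id.items():
            per_target[tid].append(np.stack(feats))      # feature matrix of this target in this sequence
    return dict(per_target)
```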
Detecting and tracking targets in a video with the neural network models M1 and M2 comprises the following specific steps:
1. The video segment to be analyzed is sampled in segments to generate several video image frame sequences.
2. For each video image frame sequence, the inference process of the neural network model M1 is executed to obtain the number of the image in which each target is located within the sequence, the rectangular box of the target in the image, and the feature vector of the target.
3. The correlation matrix of the target feature vectors corresponding to all detection results output for the video image frame sequence is calculated (the correlation may be computed with the Euclidean distance, the Mahalanobis distance, etc.; see the sketch after step 4), from which the tracking results of all detected targets within the video image frame sequence are obtained.
4. The segmented-sampled video image frame sequences are ordered according to the time-axis information, and the neural network model M2 is executed on the tracking trajectories and feature matrices to obtain the tracking feature of each target in each video image frame sequence. This feature is used to calculate the correlation of all targets between two adjacent video image frame sequences (again with the Euclidean distance, the Mahalanobis distance, etc.), thereby completing the tracking of the targets across the whole video segment.
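As one possible realization of the distance-based matching used in steps 3 and 4, the sketch below builds a Euclidean distance matrix between two sets of feature vectors (the correlation matrix of the method) and solves the assignment with the Hungarian algorithm from SciPy; the Mahalanobis distance mentioned above could be substituted, and the gating threshold max_dist is an assumed parameter.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_by_features(feats_a: np.ndarray, feats_b: np.ndarray, max_dist: float = 0.7):
    """Associate two groups of targets from their feature vectors.

    feats_a has shape (N, D) and feats_b has shape (M, D); returns a list of (i, j)
    index pairs.  The same routine serves both for linking detections inside one video
    image frame sequence (step 3) and for linking the tracks of adjacent sequences via
    the tracking features produced by M2 (step 4)."""
    if len(feats_a) == 0 or len(feats_b) == 0:
        return []
    # Pairwise Euclidean distances; a Mahalanobis distance could be used instead.
    dist = np.linalg.norm(feats_a[:, None, :] - feats_b[None, :, :], axis=-1)
    # Optimal one-to-one assignment, then discard pairs whose distance is too large.
    rows, cols = linear_sum_assignment(dist)
    return [(i, j) for i, j in zip(rows, cols) if dist[i, j] <= max_dist]
```

Within a sequence the matrix would typically be built frame by frame over the per-detection feature vectors from M1; between adjacent sequences the rows and columns are the per-track tracking features from M2.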
In conclusion, the method for detecting and tracking targets in a video operates on video image frame sequence data, combining the information of single image frames with the inter-frame correlation within the video image frame sequences. Compared with target detection methods based on single-frame images, the present method exploits the correlated information across the image sequence, which improves detection performance. When features are computed from a single frame image by conventional machine-learning methods and used for tracking and matching, they must distinguish similar targets of the same class; either the computation needed to obtain such features is very large, or their discriminative power is poor, so matching errors occur easily and tracking fails. The tracking and matching of the invention is therefore divided into two stages: matching and tracking of targets within a video image frame sequence, and matching and tracking of targets between different frame sequences over a short time span. The matching features used inside a video image frame sequence rely on the correlation and the multi-frame image information within that sequence, and their discriminative power only needs to cover the targets inside the sequence; the matching between video image frame sequences mainly uses the within-sequence matching and tracking results together with the features of the targets in each sequence, which effectively improves tracking accuracy. Compared with other methods, the present method can effectively reduce the amount of computation required to complete the target detection and tracking task in a video.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit its scope; the invention is not limited to the embodiments described herein, which are provided to assist those skilled in the art in practicing the invention.

Claims (3)

1. A method for detecting and tracking a target in a video is characterized by comprising the following steps:
step 1, performing segmented sampling on a video to obtain a plurality of segments of video image frame sequences;
step 2, adopting a neural network model M1 to carry out target detection and feature extraction on each video image frame sequence, wherein the output information comprises: the number of the image in which the target is located within the sequence, the rectangular box of the target in the image, and the feature vector of the target;
step 3, calculating the correlation matrix of the target feature vectors corresponding to all detection results output in the video sequence, and further obtaining the tracking results of all detected targets within the video sequence;
step 4, ordering the segmented-sampled video image frame sequences according to the time axis, inputting the target detection tracking tracks and the feature matrices of the video image frame sequences into a neural network model M2 to obtain the tracking feature of each target in each video image frame sequence, and calculating the correlation of all targets between two adjacent video image frame sequences by using the tracking feature, thereby completing the tracking of the targets in the whole video segment.
2. The method according to claim 1, characterized in that the neural network model M1 is established in the following way:
collecting a large number of video segments, manually marking the target positions in the video image sequence and the ID information of each target from appearance to disappearance to obtain an original marked sample set;
by utilizing a deep learning method, for each video segment in the original annotated sample set, performing segmented sampling to generate a plurality of video image frame sequences, obtaining a training/test sample set;
obtaining the neural network model M1 by utilizing a deep learning method, in combination with the training/test sample set, through multi-task training.
3. The method according to claim 2, characterized in that the neural network model M2 is established in the following way:
using the neural network model M1 to obtain the tracking trajectory and the feature matrix of each target in each video image sequence of the training/test sample set;
using the target information annotated in each video and the tracking trajectories and feature matrices obtained by passing each video image frame sequence through the neural network model M1, obtaining a feature sample set of each target in the different video image frame sequences of each video segment, thereby generating a training data set of target matching and tracking features among the video image sequences;
training with a deep learning method in combination with the training data set to obtain the neural network model M2.
CN201810940035.1A 2018-08-17 2018-08-17 Target detection tracking method in video Active CN108986143B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810940035.1A CN108986143B (en) 2018-08-17 2018-08-17 Target detection tracking method in video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810940035.1A CN108986143B (en) 2018-08-17 2018-08-17 Target detection tracking method in video

Publications (2)

Publication Number Publication Date
CN108986143A CN108986143A (en) 2018-12-11
CN108986143B true CN108986143B (en) 2022-05-03

Family

ID=64553984

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810940035.1A Active CN108986143B (en) 2018-08-17 2018-08-17 Target detection tracking method in video

Country Status (1)

Country Link
CN (1) CN108986143B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109711332B (en) * 2018-12-26 2021-03-26 浙江捷尚视觉科技股份有限公司 Regression algorithm-based face tracking method and application
CN109934096B (en) * 2019-01-22 2020-12-11 浙江零跑科技有限公司 Automatic driving visual perception optimization method based on characteristic time sequence correlation
CN111862145B (en) * 2019-04-24 2022-05-17 四川大学 Target tracking method based on multi-scale pedestrian detection
CN110503663B (en) * 2019-07-22 2022-10-14 电子科技大学 Random multi-target automatic detection tracking method based on frame extraction detection
CN113033582B (en) * 2019-12-09 2023-09-26 杭州海康威视数字技术股份有限公司 Model training method, feature extraction method and device


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10270642B2 (en) * 2012-12-05 2019-04-23 Origin Wireless, Inc. Method, apparatus, and system for object tracking and navigation

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
BRPI0806019A2 (en) * 2008-07-15 2010-08-31 Invisys Sist S De Visao Comput Ltda counting and tracking of people on the move based on computer vision
CN102004920A (en) * 2010-11-12 2011-04-06 浙江工商大学 Method for splitting and indexing surveillance videos
CN102750527A (en) * 2012-06-26 2012-10-24 浙江捷尚视觉科技有限公司 Long-time stable human face detection and tracking method in bank scene and long-time stable human face detection and tracking device in bank scene
CN104094279A (en) * 2014-04-30 2014-10-08 中国科学院自动化研究所 Large-range-first cross-camera visual target re-identification method
CN104954743A (en) * 2015-06-12 2015-09-30 西安理工大学 Multi-camera semantic association target tracking method
CN105574505A (en) * 2015-12-16 2016-05-11 深圳大学 Human body target re-identification method and system among multiple cameras
CN106920248A (en) * 2017-01-19 2017-07-04 博康智能信息技术有限公司上海分公司 A kind of method for tracking target and device

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Matching tracking sequences across widely separated cameras; Yinghao Cai et al.; 2008 15th IEEE International Conference on Image Processing; 2008-12-12; 765-768 *
Infrared point target tracking algorithm with dual matching between sequence frames; Wang Ledong et al.; Journal of Optoelectronics·Laser; 2010-03-31; Vol. 21, No. 3; 465-469 *
Research on multi-target detection and tracking in surveillance video; Wu Erjie; China Master's Theses Full-text Database (Information Science and Technology); 2016-05-15 (No. 05); I136-590 *
Research on motion tracking and simulation of small targets in correlated image sequences; Yang Qiuying; Journal of System Simulation; 2008-03-31; Vol. 20, No. 6; 1645-1647, 1653 *

Also Published As

Publication number Publication date
CN108986143A (en) 2018-12-11

Similar Documents

Publication Publication Date Title
CN108986143B (en) Target detection tracking method in video
CN109657575B (en) Intelligent video tracking algorithm for outdoor constructors
CN111161315B (en) Multi-target tracking method and system based on graph neural network
Li et al. Robust people counting in video surveillance: Dataset and system
Lee et al. Learning discriminative appearance models for online multi-object tracking with appearance discriminability measures
CN110210335B (en) Training method, system and device for pedestrian re-recognition learning model
CN105654139A (en) Real-time online multi-target tracking method adopting temporal dynamic appearance model
Zhang et al. V-LPDR: Towards a unified framework for license plate detection, tracking, and recognition in real-world traffic videos
CN102254394A (en) Antitheft monitoring method for poles and towers in power transmission line based on video difference analysis
CN109376736A (en) A kind of small video target detection method based on depth convolutional neural networks
CN112861673A (en) False alarm removal early warning method and system for multi-target detection of surveillance video
CN112131929A (en) Cross-camera pedestrian tracking system and method based on block chain
Shirsat et al. Proposed system for criminal detection and recognition on CCTV data using cloud and machine learning
CN106572387A (en) Video sequence alignment method and video sequence alignment system
Yu et al. The multi-level classification and regression network for visual tracking via residual channel attention
CN102314591A (en) Method and equipment for detecting static foreground object
Mao et al. Aic2018 report: Traffic surveillance research
Yang et al. A method of pedestrians counting based on deep learning
CN104268902A (en) Multi-target video tracking method for industrial site
CN103996207A (en) Object tracking method
Vora et al. Bringing generalization to deep multi-view pedestrian detection
CN112348011B (en) Vehicle damage assessment method and device and storage medium
CN113724293A (en) Vision-based intelligent internet public transport scene target tracking method and system
Wang et al. Thermal infrared object tracking based on adaptive feature fusion
Sun et al. Deep learning-based vehicle tracking and traffic event detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20231030

Address after: Room 319-2, 3rd Floor, Building 2, No. 262 Wantang Road, Xihu District, Hangzhou City, Zhejiang Province, 310012

Patentee after: Zhejiang Jiehuixin Digital Technology Co.,Ltd.

Address before: 311121 East Building, building 7, No. 998, Wenyi West Road, Wuchang Street, Yuhang District, Hangzhou City, Zhejiang Province

Patentee before: ZHEJIANG ICARE VISION TECHNOLOGY Co.,Ltd.
