CN111508006A - Moving target synchronous detection, identification and tracking method based on deep learning - Google Patents

Moving target synchronous detection, identification and tracking method based on deep learning

Info

Publication number
CN111508006A
Authority
CN
China
Prior art keywords
target
tracking
identification
detection
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010325315.9A
Other languages
Chinese (zh)
Inventor
王鸿鹏
代婉
宋玉琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nankai University
Original Assignee
Nankai University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nankai University filed Critical Nankai University
Priority to CN202010325315.9A
Publication of CN111508006A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion
    • G06T7/277 - Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/11 - Region-based segmentation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/136 - Segmentation; Edge detection involving thresholding
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10016 - Video; Image sequence
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20084 - Artificial neural networks [ANN]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 - Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a deep-learning-based method for synchronous detection, identification and tracking of moving targets, comprising the following steps: acquiring video data to be detected from a camera and inputting it as a video sequence; performing video capture on the video sequence; detecting the region of interest of a target with an SSD target detection model; segmenting the target region with a saliency detection method; performing target identification with a weighted cross-entropy loss function; predicting the position of each tracked object in the next key frame with a Kalman equation; matching features between the position predicted by the Kalman equation and the position computed by the target detection algorithm; and computing the ratio of the segmentation-frame side length to the identification-frame side length as a scale-update judgment. The method preserves the real-time performance of the algorithm, is applicable to many fields rather than a single one, and solves real-time detection, identification and tracking in a variety of dynamic, complex environments; the final design aim is a practical, generalizable technical framework.

Description

Moving target synchronous detection, identification and tracking method based on deep learning
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a moving target synchronous detection, identification and tracking method based on deep learning.
Background
With the rapid development of computer technology, synchronous target detection, identification and tracking has become a research hotspot in computer vision owing to its high application value in video surveillance, scene understanding, human-computer interaction and the like, attracting more and more scholars and research institutions at home and abroad. In brief, target detection, identification and tracking means finding and locating an object of interest in a given image or video sequence frame using techniques such as machine learning, natural language processing and image processing, and identifying, segmenting and extracting the moving target from the background of the image sequence so as to infer the target's most probable position, thereby providing a reliable basis for further recognition and understanding of the target's behaviour.
In recent years a great number of improved algorithms have been proposed in the fields of target tracking and of target detection and recognition, yet there is still some distance to go before a computer can perceive the real world and make reasonable decisions in complex, dynamic environments; many scientific problems remain to be overcome. Against this background, the invention provides a deep-learning-based method for synchronous detection, identification and tracking of moving targets, aiming to solve this problem in complex, dynamic environments (for example occlusion between the target and the background, deformation and scale change of the target, illumination change and the like).
Disclosure of Invention
In order to solve the above technical problem, the invention provides a deep-learning-based method for synchronous detection, identification and tracking of moving targets, comprising the following steps:
s1: acquiring video data to be detected from a camera, the video data comprising multiple frames of images to be detected, and inputting it as a video sequence;
s2: performing video capture on the video sequence;
s3: target detection: first acquiring first sample data, the first sample data being image data containing the target; manually annotating the first sample data; dividing the annotated first sample data into a first training set and a first test set; training a neural network with the first training set and testing the trained network with the first test set, thereby establishing an SSD-based target detection model; the SSD target detection model is then used to detect the region of interest of the target;
s4: target segmentation: after video capture, extracting target features and target edge features with a saliency detection method and segmenting the target on the basis of these features, the segmented salient region being the target region;
s5: target identification: performing target identification on the basis of target detection; first the ID of the target is determined, and if the training samples are unbalanced in proportion a weighted cross-entropy loss function can be adopted to improve identification accuracy;
s6: target tracking: for key frame I_k, performing target detection on the camera video with the SSD target detection model, obtaining tracked objects from the detection results, maintaining one Kalman equation per tracked object, and predicting each tracked object's position in key frame I_(k+1) with its Kalman equation;
s7: feature matching judgment: matching, by calculation and comparison, the position predicted by the Kalman equation against the position computed by the target detection algorithm; if the feature matching succeeds, the Kalman equation is updated; if it fails, no processing is done and detection continues with the next frame;
s8: scale update judgment: computing the ratio of the segmentation-frame side length to the identification-frame side length; when the ratio exceeds the threshold, λ is set equal to the ratio, i.e. the identification frame is enlarged proportionally to the size of the segmentation frame; if the ratio is below the threshold, no processing is done and segmentation continues with the next frame.
Preferably, in the target segmentation, target features and target edge features are extracted with the saliency detection method; the higher a region's saliency, the more likely it belongs to the target, and regions of higher and lower saliency are represented by 0 and 1 respectively in a binary map, which completes the target segmentation.
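As an illustration of this preferred segmentation step, the following is a minimal Python sketch, assuming OpenCV's spectral-residual detector (from opencv-contrib) as the saliency method (the patent does not name a specific detector), and following the patent's convention of marking higher-saliency pixels 0 and lower-saliency pixels 1; the threshold value is likewise an illustrative assumption:

```python
import cv2
import numpy as np

def segment_by_saliency(frame_bgr, thresh=0.5):
    """Segment the salient (target) region of a frame.

    Assumes OpenCV's spectral-residual saliency detector; the patent
    itself does not fix a particular detector or threshold.
    """
    saliency = cv2.saliency.StaticSaliencySpectralResidual_create()
    ok, sal_map = saliency.computeSaliency(frame_bgr)  # float map in [0, 1]
    if not ok:
        raise RuntimeError("saliency computation failed")
    # Patent convention: higher-saliency (target) pixels -> 0,
    # lower-saliency (background) pixels -> 1.
    return np.where(sal_map >= thresh, 0, 1).astype(np.uint8)
```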
Preferably, in the feature matching judgment, the subject position computed by the target detection algorithm is feature-matched against the tracked subject position, and when the features cannot be matched a feature matching mechanism taking target detection as the reference is started.
Preferably, in the scale update judgment, a scale update is performed between the subject size of the segmentation frame after target segmentation and the subject size of the identification frame after target identification; when the ratio of the two exceeds a preset threshold, a scale coordination mechanism taking the target segmentation as the reference is started and iterative tracking continues.
Compared with the prior art, the invention has the beneficial effects that:
1. The method makes full use of image segmentation: for the differing scales arising during target segmentation and target identification, the segmentation result serves as the target's reference size, the size of the region to be identified is adjusted in good time, and the identified target position is updated whenever the scale update judgment is satisfied; the inference process does not wait on the judgment result, so the real-time performance of the algorithm is guaranteed;
2. The design is intended for application in many fields rather than a single one; it can solve real-time detection, identification and tracking in a variety of dynamic, complex environments, such as occlusion between the target and the background, deformation and scale change of the target, and illumination change, and its final design goal is a practical, generalizable technical framework.
Drawings
FIG. 1 is an overall framework flow diagram of the present invention;
FIG. 2 is a flow diagram of the object detection and tracking framework of the present invention;
FIG. 3 is a flow diagram of an object segmentation and recognition framework of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention is further described below:
Embodiment:
As shown in FIG. 1, the deep-learning-based method for synchronous detection, identification and tracking of moving targets comprises the following steps:
(1) acquiring video data to be detected from a camera, the video data comprising multiple frames of images to be detected, and inputting it as a video sequence;
(2) performing video capture on the video sequence, the capture consisting mainly of video frame capture;
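A minimal sketch of this frame-capture step, assuming OpenCV for video input (the patent does not specify a library or camera interface):

```python
import cv2

def capture_frames(source=0):
    """Yield successive frames from a camera index or a video file path."""
    cap = cv2.VideoCapture(source)
    if not cap.isOpened():
        raise IOError(f"cannot open video source {source!r}")
    try:
        while True:
            ok, frame = cap.read()
            if not ok:          # end of stream or read failure
                break
            yield frame
    finally:
        cap.release()
```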
(3) target detection and tracking: traditional target detection algorithms usually occupy a large amount of computing resources and are difficult to run efficiently when resources are limited, so the target is detected with an SSD target detection model. As shown in FIG. 2, first sample data is acquired, the first sample data being image data containing the target; it is manually annotated and divided into a first training set and a first test set; a neural network is trained with the first training set and the trained network is tested with the first test set, thereby establishing an SSD-based target detection model. The SSD model is used to detect the target, obtaining the four position coordinates of a rectangular detection frame; regions are divided according to these coordinates and the position of the region of interest in the first frame is determined. For key frame I_k, target detection is performed on the camera video with the SSD model; a Kalman filtering mechanism is introduced for the tracking region obtained from the correlation matrix of the target feature vectors of all detection results output over the video sequence, and the observations of the target in the current frame's image to be detected are filtered to obtain filtered target observations in the current observation image, i.e. the position of each tracked object in key frame I_(k+1) is predicted with its Kalman equation;
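A minimal sketch of the per-object Kalman predictor described above, in Python with NumPy. The constant-velocity state model over the detection-frame centre and all noise magnitudes are illustrative assumptions: the patent prescribes one Kalman equation per tracked object but not a concrete state model, and the SSD detector is treated here as a black box.

```python
import numpy as np

class BoxKalman:
    """Constant-velocity Kalman filter over a detection-frame centre.

    State x = [cx, cy, vx, vy]; one filter is maintained per tracked
    object, as in step (3)/S6. Noise magnitudes are illustrative.
    """

    def __init__(self, cx, cy, dt=1.0):
        self.x = np.array([cx, cy, 0.0, 0.0])
        self.P = np.eye(4) * 10.0                 # state covariance
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], float)  # constant-velocity motion
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], float)  # we observe (cx, cy) only
        self.Q = np.eye(4) * 0.01                 # process noise
        self.R = np.eye(2) * 1.0                  # measurement noise

    def predict(self):
        """Predict the centre position in key frame I_(k+1)."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, cx, cy):
        """Correct the filter with a matched detection (step (5)/S7)."""
        z = np.array([cx, cy])
        y = z - self.H @ self.x                   # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)  # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```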
(4) target identification and segmentation: as shown in FIG. 3, after video capture a saliency detection method is used to extract target features and target edge features; the higher a region's saliency, the more likely it belongs to the target, and regions of higher and lower saliency are represented by 0 and 1 respectively in a binary map, which completes the target segmentation, the segmented salient region being the target region. To reduce the use of computing resources, target identification is performed on the basis of target detection: first the ID of the target is determined, and if the training samples are unbalanced in proportion a weighted cross-entropy loss function can be adopted to improve identification accuracy;
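For the weighted cross-entropy loss mentioned above, the following is a minimal PyTorch sketch. Weighting classes by inverse frequency is one common choice and an assumption here; the patent names only the loss, not the weighting scheme.

```python
import torch
import torch.nn as nn

def make_weighted_ce(class_counts):
    """Build a cross-entropy loss weighted by inverse class frequency.

    class_counts: per-class sample counts from the training set. The
    inverse-frequency scheme is an assumption; the patent does not fix one.
    """
    counts = torch.tensor(class_counts, dtype=torch.float32)
    weights = counts.sum() / (len(counts) * counts)   # rarer class, larger weight
    return nn.CrossEntropyLoss(weight=weights)

# Usage: three target IDs with imbalanced training sets (counts are made up).
criterion = make_weighted_ce([5000, 500, 50])
logits = torch.randn(8, 3)             # batch of 8, 3 target IDs
labels = torch.randint(0, 3, (8,))
loss = criterion(logits, labels)
```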
(5) feature matching judgment: the subject position computed by the target detection algorithm is feature-matched against the tracked subject position; if the feature matching succeeds the Kalman equation is updated, and if it fails no processing is done and detection continues with the next frame; when the features cannot be matched, a feature matching mechanism taking target detection as the reference is started;
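A minimal sketch of this matching step, using intersection-over-union (IoU) between the Kalman-predicted frame and each detected frame as the comparison criterion. The IoU metric, the 0.3 threshold, and the `tracker.update` interface (from the `BoxKalman` sketch above) are assumptions; the patent says only "calculation and comparison".

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def match_and_update(tracker, predicted_box, detections, min_iou=0.3):
    """Update the Kalman filter only when a detection matches the prediction."""
    best = max(detections, key=lambda d: iou(predicted_box, d), default=None)
    if best is not None and iou(predicted_box, best) >= min_iou:
        cx, cy = (best[0] + best[2]) / 2, (best[1] + best[3]) / 2
        tracker.update(cx, cy)       # match succeeded: correct the filter
        return best
    return None                      # no match: leave the filter untouched
```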
(6) scale update judgment: a scale update is performed between the subject size of the segmentation frame after target segmentation and the subject size of the identification frame after target identification, computing the ratio of the segmentation-frame side length to the identification-frame side length; when the ratio exceeds the threshold, a scale coordination mechanism taking the target segmentation as the reference is started and iterative tracking continues, with λ set equal to the ratio, i.e. the identification frame is enlarged proportionally to the size of the segmentation frame; if the ratio is below the threshold, no processing is done and segmentation continues with the next frame.
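A minimal sketch of the scale update judgment: λ is the ratio of segmentation-frame side length to identification-frame side length, and when it exceeds the threshold the identification frame is enlarged by λ about its centre. Using the frame width as the side length and the concrete threshold value are illustrative assumptions; the patent leaves both unspecified.

```python
def update_scale(id_box, seg_box, thresh=1.2):
    """Grow the identification frame to the segmentation frame's scale.

    Boxes are (x1, y1, x2, y2); side length is taken as the box width.
    The threshold 1.2 is illustrative; the patent does not fix a value.
    """
    lam = (seg_box[2] - seg_box[0]) / (id_box[2] - id_box[0])
    if lam <= thresh:
        return id_box                  # below threshold: no processing
    cx, cy = (id_box[0] + id_box[2]) / 2, (id_box[1] + id_box[3]) / 2
    half_w = (id_box[2] - id_box[0]) * lam / 2
    half_h = (id_box[3] - id_box[1]) * lam / 2
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)
```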
Specifically, the method can be applied in many fields rather than a single one and can solve real-time detection, identification and tracking in a variety of dynamic, complex environments, such as occlusion between the target and the background, deformation and scale change of the target, and illumination change; the final design goal is a practical, generalizable technical framework.
It should be noted that, herein, the terms "comprises", "comprising" or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus comprising a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article or apparatus.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (4)

1. A deep-learning-based method for synchronous detection, identification and tracking of moving targets, characterized in that the method comprises the following steps:
s1: acquiring video data to be detected from a camera, the video data comprising multiple frames of images to be detected, and inputting it as a video sequence;
s2: performing video capture on the video sequence;
s3: target detection: first acquiring first sample data, the first sample data being image data containing the target; manually annotating the first sample data; dividing the annotated first sample data into a first training set and a first test set; training a neural network with the first training set and testing the trained network with the first test set, thereby establishing an SSD-based target detection model; the SSD target detection model is then used to detect the region of interest of the target;
s4: target segmentation: after video capture, extracting target features and target edge features with a saliency detection method and segmenting the target on the basis of these features, the segmented salient region being the target region;
s5: target identification: performing target identification on the basis of target detection; first the ID of the target is determined, and if the training samples are unbalanced in proportion a weighted cross-entropy loss function can be adopted to improve identification accuracy;
s6: target tracking: for key frame I_k, performing target detection on the camera video with the SSD target detection model, obtaining tracked objects from the detection results, maintaining one Kalman equation per tracked object, and predicting each tracked object's position in key frame I_(k+1) with its Kalman equation;
s7: feature matching judgment: matching, by calculation and comparison, the position predicted by the Kalman equation against the position computed by the target detection algorithm; if the feature matching succeeds, the Kalman equation is updated; if it fails, no processing is done and detection continues with the next frame;
s8: scale update judgment: computing the ratio of the segmentation-frame side length to the identification-frame side length; when the ratio exceeds the threshold, λ is set equal to the ratio, i.e. the identification frame is enlarged proportionally to the size of the segmentation frame; if the ratio is below the threshold, no processing is done and segmentation continues with the next frame.
2. The deep-learning-based method for synchronous detection, identification and tracking of moving targets as claimed in claim 1, characterized in that the saliency detection method in the target segmentation extracts target features and target edge features; the higher a region's saliency, the more likely it belongs to the target, and regions of higher and lower saliency are represented by 0 and 1 respectively in a binary map, which completes the target segmentation.
3. The deep-learning-based method for synchronous detection, identification and tracking of moving targets as claimed in claim 1, characterized in that, in the feature matching judgment, the subject position computed by the target detection algorithm is feature-matched against the tracked subject position, and when the features cannot be matched a feature matching mechanism taking target detection as the reference is started.
4. The deep-learning-based method for synchronous detection, identification and tracking of moving targets as claimed in claim 1, characterized in that, in the scale update judgment, a scale update is performed between the subject size of the segmentation frame after target segmentation and the subject size of the identification frame after target identification, and when the ratio of the two exceeds a preset threshold a scale coordination mechanism taking the target segmentation as the reference is started and iterative tracking continues.
CN202010325315.9A 2020-04-23 2020-04-23 Moving target synchronous detection, identification and tracking method based on deep learning Pending CN111508006A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010325315.9A CN111508006A (en) 2020-04-23 2020-04-23 Moving target synchronous detection, identification and tracking method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010325315.9A CN111508006A (en) 2020-04-23 2020-04-23 Moving target synchronous detection, identification and tracking method based on deep learning

Publications (1)

Publication Number Publication Date
CN111508006A true CN111508006A (en) 2020-08-07

Family

ID=71877898

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010325315.9A Pending CN111508006A (en) 2020-04-23 2020-04-23 Moving target synchronous detection, identification and tracking method based on deep learning

Country Status (1)

Country Link
CN (1) CN111508006A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106875425A (en) * 2017-01-22 2017-06-20 北京飞搜科技有限公司 A kind of multi-target tracking system and implementation method based on deep learning
CN110188596A (en) * 2019-01-04 2019-08-30 北京大学 Monitor video pedestrian real-time detection, Attribute Recognition and tracking and system based on deep learning
CN110211150A (en) * 2019-04-25 2019-09-06 南开大学 A kind of real-time vision target identification method with scale coordination mechanism
CN110399808A (en) * 2019-07-05 2019-11-01 桂林安维科技有限公司 A kind of Human bodys' response method and system based on multiple target tracking

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Alex Bewley et al.: "Simple Online and Realtime Tracking", arXiv:1602.00763v2 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112132136A (en) * 2020-09-11 2020-12-25 华为技术有限公司 Target tracking method and device
CN113256637A (en) * 2021-07-15 2021-08-13 北京小蝇科技有限责任公司 Urine visible component detection method based on deep learning and context correlation
CN113256637B (en) * 2021-07-15 2021-11-05 北京小蝇科技有限责任公司 Urine visible component detection method based on deep learning and context correlation
CN113810146A (en) * 2021-08-05 2021-12-17 西安理工大学 Data synchronous transmission method without external reference clock
CN113810146B (en) * 2021-08-05 2023-09-22 西安理工大学 Data synchronous transmission method without external reference clock
CN114066936A (en) * 2021-11-06 2022-02-18 中国电子科技集团公司第五十四研究所 Target reliability tracking method in small target capturing process
CN114066936B (en) * 2021-11-06 2023-09-12 中国电子科技集团公司第五十四研究所 Target reliability tracking method in small target capturing process
CN115542362A (en) * 2022-12-01 2022-12-30 成都信息工程大学 High-precision space positioning method, system, equipment and medium for electric power operation site

Similar Documents

Publication Publication Date Title
CN111259786B (en) Pedestrian re-identification method based on synchronous enhancement of appearance and motion information of video
CN111508006A (en) Moving target synchronous detection, identification and tracking method based on deep learning
CN107123131B (en) Moving target detection method based on deep learning
CN111783576B (en) Pedestrian re-identification method based on improved YOLOv3 network and feature fusion
CN109919977B (en) Video motion person tracking and identity recognition method based on time characteristics
CN108520226B (en) Pedestrian re-identification method based on body decomposition and significance detection
CN107133569B (en) Monitoring video multi-granularity labeling method based on generalized multi-label learning
Lin et al. Face gender recognition based on face recognition feature vectors
CN111275010A (en) Pedestrian re-identification method based on computer vision
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
CN115527269B (en) Intelligent human body posture image recognition method and system
CN113706481A (en) Sperm quality detection method, sperm quality detection device, computer equipment and storage medium
CN113780145A (en) Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium
Xie et al. Feature consistency-based prototype network for open-set hyperspectral image classification
Zhu et al. Scene text relocation with guidance
CN108985216B (en) Pedestrian head detection method based on multivariate logistic regression feature fusion
Wang et al. Self-supervised learning for high-resolution remote sensing images change detection with variational information bottleneck
CN116912670A (en) Deep sea fish identification method based on improved YOLO model
CN115100681A (en) Clothes identification method, system, medium and equipment
CN110909678B (en) Face recognition method and system based on width learning network feature extraction
CN111191575B (en) Naked flame detection method and system based on flame jumping modeling
CN114743257A (en) Method for detecting and identifying image target behaviors
CN113761987A (en) Pedestrian re-identification method and device, computer equipment and storage medium
CN112651996A (en) Target detection tracking method and device, electronic equipment and storage medium
Zhang et al. Robust visual tracking using discriminative stable regions and K-means clustering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200807)