CN116091554B - Moving target tracking method for open set

Moving target tracking method for open set

Info

Publication number: CN116091554B (application CN202310375414.1A; also published as CN116091554A)
Authority: CN (China)
Other languages: Chinese (zh)
Legal status: Active (granted)
Prior art keywords: target, moving, moving object, rectangular frame, tracking
Inventors: 杨菲, 杨栋栋, 齐洁爽, 薛凡福, 葛玉慧
Original and current assignee: Zhiyang Innovation Technology Co Ltd
Priority and filing date: 2023-04-11
Application filed by Zhiyang Innovation Technology Co Ltd

Links

Images

Classifications

    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/215 Motion-based segmentation
    • G06T7/269 Analysis of motion using gradient-based methods
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a moving object tracking method for an open set, belonging to the technical field of intelligent security. An optical flow method is used to determine a moving object; a first rectangular frame frames a first moving target area, a second rectangular frame is obtained by shrinking the first rectangular frame by a set proportion, and the second rectangular frame frames a second moving target area inside the first moving target area, so that the two rectangular frames delimit the global and local features of the same moving target. The single-target tracking network SiamRPN++ is improved: the added secondary Target branch and the original Target branch fuse features of the same moving target at two different scales to enrich the feature diversity, which improves the accuracy of target tracking and allows any moving object in any scene to be tracked.

Description

Moving target tracking method for open set
Technical Field
The invention belongs to the technical field of intelligent security, and particularly relates to a moving target tracking method for an open set.
Background
With the popularization of security monitoring systems, massive amounts of video data are generated every day, and manual screening of these data has become a bottleneck in video utilization; how to automatically acquire knowledge from video data has therefore become a research hotspot in the field of computer vision. Automatic tracking and trajectory analysis of moving targets in video have become key application technologies for military, security and community monitoring. Video target tracking also has important applications in the field of automatic driving.
Video target tracking is classified into single-target tracking and multi-target tracking according to the number of tracked targets. Single-target tracking follows one target in a video, while multi-target tracking follows multiple targets of one or more categories. With the development of deep learning, single-target tracking is mainly based on the twin (Siamese) network paradigm: the target to be tracked is specified in the initial frame and then searched and matched in subsequent frames. Multi-target tracking is mainly based on the object detection paradigm: targets are first detected in each video frame and the detections are then matched into tracks.
Although deep-learning-based video object tracking has made tremendous progress, object tracking in an open set still faces significant challenges. Single-target tracking requires the tracking target to be specified in advance, and the target detection algorithm used in multi-target tracking must first be trained on a training set containing specific targets and can only detect the moving-object categories contained in that training set; neither technique can therefore track moving targets in an open set in which the moving targets are unknown.
Disclosure of Invention
The invention provides a moving object tracking method for an open set. For any moving object in the open set, a motion area is estimated by an optical flow method that is insensitive to the type of motion, and the estimated motion area is then searched and matched in subsequent frames by an improved single-target tracking network, thereby realizing the tracking of any moving object.
The invention is realized by adopting the following technical scheme:
a moving object tracking method for an open set is proposed, which is characterized by comprising:
S1: constructing a single-target tracking data set;
S2: detecting any moving object based on an optical flow method;
S3: framing a first moving target area with a first rectangular frame;
S4: shrinking the first rectangular frame by a set proportion to obtain a second rectangular frame, and framing a second moving target area inside the first moving target area with the second rectangular frame;
S5: constructing an improved single-target tracking network SiamRPN++, which comprises:
a Target branch for extracting features of the first moving target area framed by the first rectangular frame;
a secondary Target branch, parallel to the Target branch, for extracting features of the second moving target area framed by the second rectangular frame;
a Search branch for extracting features of the detected image;
the feature maps of designated layers of the secondary Target branch and the Target branch undergo a channel-level Concat operation, and the feature map output by the Concat operation and the feature map of the corresponding layer of the Search branch are then sent to a Siamese RPN module for the correlation operation;
S6: training the improved single-target tracking network SiamRPN++ constructed in S5 with the single-target tracking data set constructed in S1.
In some embodiments of the invention, detecting any moving object based on an optical flow method includes:
if a moving object is split within N consecutive frames, the split parts are divided into independent moving objects;
if a moving object is split at first and becomes continuous within the N frames, the parts in the split state are divided into independent moving objects.
In some embodiments of the invention, the secondary Target branch is implemented using ResNet-18.
In some embodiments of the present invention, the training of step S5 includes: shrinking the first rectangular frame by a random proportion, obtaining the shrunken moving target, scaling it to the Target input size, and inputting it into the secondary Target branch.
Compared with the prior art, the invention has the following advantages and positive effects. In the moving object tracking method for an open set, an optical flow method determines the moving object; a first rectangular frame frames a first moving target area, a second rectangular frame is obtained by shrinking the first rectangular frame by a set proportion, and the second rectangular frame frames a second moving target area inside the first moving target area, so that the two rectangular frames jointly represent the moving object by its global and local appearance. This improves the robustness of the moving-object features and makes the method insensitive to the category of the moving object. The single-target tracking network SiamRPN++ is improved: the added secondary Target branch and the original Target branch fuse local and global features of the moving object, which improves tracking accuracy, removes the dependence of detection-paradigm multi-target tracking algorithms on prior knowledge of target categories, and allows any moving object in any scene to be tracked.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of the moving object tracking method for an open set proposed by the present invention;
FIG. 2 is a schematic diagram of the first moving target area and the second moving target area framed after a moving object is detected by the optical flow method in the present invention;
FIG. 3 is a schematic diagram of the network architecture of the single-target tracking network SiamRPN++;
FIG. 4 is a schematic diagram of the network architecture of the improved single-target tracking network SiamRPN++ of the present invention;
FIG. 5 shows the tracking result of the original SiamRPN++ network on a conventional moving object;
FIG. 6 shows the tracking result of the improved SiamRPN++ network of the present invention on a conventional moving object;
FIG. 7 shows the tracking result of the original SiamRPN++ network when a conventional moving object moves with large amplitude;
FIG. 8 shows the tracking result of the improved SiamRPN++ network of the present invention when a conventional moving object moves with large amplitude;
FIG. 9 shows the tracking result of the original SiamRPN++ network on an unconventional moving object;
FIG. 10 shows the tracking result of the improved SiamRPN++ network of the present invention on an unconventional moving object;
FIG. 11 shows the tracking result of the original SiamRPN++ network when the deformation of an unconventional moving object is small;
FIG. 12 shows the tracking result of the improved SiamRPN++ network of the present invention when the deformation of an unconventional moving object is small;
FIG. 13 shows the tracking result of the original SiamRPN++ network when the deformation of an unconventional moving object is large;
FIG. 14 shows the tracking result of the improved SiamRPN++ network of the present invention when the deformation of an unconventional moving object is large.
Detailed Description
The invention provides a moving object tracking method for an open set, which can track moving objects in an open set where the moving targets are unknown. As shown in FIG. 1, the method specifically comprises the following steps:
S1: constructing a single-target tracking data set.
The data set includes conventional moving objects such as humans, vehicles, airplanes and animals, as well as unconventional, passively moving objects such as wind-blown plastic bags, plastic films and kites; each category contains several videos of unfixed duration.
The data set is divided into a training set and a test set, where some of the moving-object categories in the test set also appear in the training set and some do not.
In one embodiment of the invention, taking 20 classes of conventional moving objects and 10 classes of unconventional moving objects, with 5 videos per class, as an example, the training set contains conventional classes 1-15 and unconventional classes 1-6, while the test set contains at least conventional classes 16-20 and unconventional classes 7-10.
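As a small illustration of this split, the class indices below follow the embodiment above; the dictionary layout itself is only an assumption for illustration.

```python
# Conventional moving-object classes are numbered 1-20, unconventional classes 1-10.
train_classes = {
    "conventional": list(range(1, 16)),    # classes 1-15
    "unconventional": list(range(1, 7)),   # classes 1-6
}
test_classes = {
    "conventional": list(range(16, 21)),   # classes 16-20, unseen during training
    "unconventional": list(range(7, 11)),  # classes 7-10, unseen during training
}
```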
S2: any moving object is detected based on an optical flow method.
An arbitrary moving object is detected using the optical flow method. If a moving object appears split (i.e., is not completely continuous) within N frames, the split parts are divided into individual moving objects; if a moving object is split at first and becomes continuous within the N frames, the parts observed in the split state are divided into individual moving objects.
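A minimal sketch of this detection step is given below, assuming OpenCV's dense Farneback optical flow stands in for the optical flow method (which the patent does not name). The thresholds MAG_THRESH and MIN_AREA and the value of N are illustrative assumptions, and the per-blob bookkeeping for the N-frame split/merge rule is only indicated in comments.

```python
import cv2
import numpy as np

MAG_THRESH = 1.0   # minimum flow magnitude (pixels/frame) counted as motion; illustrative
MIN_AREA = 200     # ignore tiny motion blobs; illustrative
N = 5              # consecutive-frame window used by the split/merge rule; illustrative

def motion_mask(prev_gray, gray):
    """Binary mask of moving pixels between two consecutive grayscale frames."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=2)
    mask = (mag > MAG_THRESH).astype(np.uint8) * 255
    # morphological closing merges nearby fragments of the same object
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    return cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

def moving_object_boxes(mask):
    """Connected motion blobs -> list of (x, y, w, h) candidate moving objects."""
    num, _, stats, _ = cv2.connectedComponentsWithStats(mask)
    return [tuple(int(v) for v in stats[i, :4])
            for i in range(1, num) if stats[i, cv2.CC_STAT_AREA] >= MIN_AREA]

# The N-frame split/merge rule described above would be applied on top of these blobs:
# blobs that stay separate over N consecutive frames are promoted to independent moving
# objects; the per-blob bookkeeping across frames is omitted from this sketch.
```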
S3: framing a first moving target area with a first rectangular frame.
The moving targets detected in step S2 are framed with a first rectangular frame, which is the maximum circumscribed rectangle enclosing the complete moving object, shown as the outermost dark frame around each moving target in FIG. 2.
S4: shrinking the first rectangular frame by a set proportion to obtain a second rectangular frame, and framing a second moving target area inside the first moving target area with the second rectangular frame.
That is, the second rectangular frame frames a local image of the moving object, while the first rectangular frame frames the global image of the moving object, as shown by the light-colored frame inside each moving object in FIG. 2.
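A minimal sketch of steps S3 and S4, assuming the detected moving object is given as a tight bounding box (x, y, w, h); the shrink ratio of 0.5 is an illustrative value, since the patent only speaks of "a set proportion".

```python
def first_and_second_frame(x, y, w, h, shrink_ratio=0.5):
    """Return the first (global) and second (local) rectangular frames.

    The first frame is the maximum circumscribed rectangle of the moving object;
    the second frame is the same rectangle shrunk about its centre by shrink_ratio,
    so it always lies inside the first moving target area.
    """
    cx, cy = x + w / 2.0, y + h / 2.0
    w2, h2 = max(1, int(w * shrink_ratio)), max(1, int(h * shrink_ratio))
    x2, y2 = int(cx - w2 / 2), int(cy - h2 / 2)
    return (x, y, w, h), (x2, y2, w2, h2)
```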
S5: an improved single target tracking network SiamRPN++ is constructed.
As shown in FIG. 3, the single-target tracking network SiamRPN++ has two inputs, a Target branch and a Search branch; the Target branch receives the image of the first moving target area framed after step S2, resized to 127×127×3.
During training, the Search branch receives the whole picture of one video frame, resized to 255×255×3.
The Target branch network and the Search branch network form a twin (Siamese) network, and the outputs of three designated layers each undergo a Siamese RPN operation.
The tracking network outputs the classification CLS of the moving object and its coordinates Bbox in the original image.
According to the invention, a secondary Target branch is added to the existing single-target tracking network SiamRPN++ in parallel with the original Target branch, as shown in FIG. 4. The feature maps of designated layers of the two branches undergo a channel-level Concat operation, i.e., the features of the global and local moving target are fused; fusing features of the same moving target at two different scales enriches the feature diversity. The fused feature maps and the feature maps of the corresponding layers of the Search branch are then sent to the Siamese RPN modules for the correlation operation, which improves the accuracy of target tracking.
In an embodiment of the invention, in the improved single-target tracking network SiamRPN++, the conv3-2, conv4-2 and conv5-2 layers of the secondary Target branch undergo channel-level Concat operations with the conv3-5, conv4-6 and conv5-3 layers of the Target branch, respectively; the feature maps output by the Concat operations and the feature maps of the corresponding layers of the Search branch are then sent to the Siamese RPN modules for the correlation operation.
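A minimal PyTorch sketch of this channel-level fusion follows. Standard torchvision backbones are used as stand-ins (ResNet-50 stages for the Target branch, as in SiamRPN++, and ResNet-18 for the secondary Target branch), their layer2/3/4 outputs take the place of the exact conv3-x/conv4-x/conv5-x layers named in the patent, and the 1×1 convolutions that bring each concatenated template back to the channel count expected by the Siamese RPN heads are an assumption, since the patent does not say how the channel mismatch is handled.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18, resnet50

def stage_features(backbone, x):
    """Run a torchvision ResNet and return its layer2/3/4 (conv3_x/4_x/5_x) outputs."""
    x = backbone.maxpool(backbone.relu(backbone.bn1(backbone.conv1(x))))
    x = backbone.layer1(x)
    outs = []
    for layer in (backbone.layer2, backbone.layer3, backbone.layer4):
        x = layer(x)
        outs.append(x)
    return outs

class FusedTemplateBranch(nn.Module):
    """Target branch + secondary Target branch with a channel-level Concat per stage."""

    def __init__(self):
        super().__init__()
        self.target = resnet50(weights=None)      # global template (first rectangular frame)
        self.sub_target = resnet18(weights=None)  # local template (second rectangular frame)
        # ResNet-50 stages output 512/1024/2048 channels, ResNet-18 outputs 128/256/512;
        # 1x1 convs reduce each concatenation back to the Search-branch channel count.
        self.adjust = nn.ModuleList([
            nn.Conv2d(512 + 128, 512, kernel_size=1),
            nn.Conv2d(1024 + 256, 1024, kernel_size=1),
            nn.Conv2d(2048 + 512, 2048, kernel_size=1),
        ])

    def forward(self, z_global, z_local):
        """z_global, z_local: (B, 3, 127, 127) crops of the first / second rectangular frames."""
        fused = []
        feats_g = stage_features(self.target, z_global)
        feats_l = stage_features(self.sub_target, z_local)
        for f_g, f_l, adj in zip(feats_g, feats_l, self.adjust):
            # both templates share the 127x127 input, so the stage maps align spatially
            fused.append(adj(torch.cat([f_g, f_l], dim=1)))
        return fused  # one fused template per Siamese RPN head
```

In this sketch the fused maps would take the place of the plain Target-branch template features wherever SiamRPN++ correlates templates with the Search-branch features.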
S6: the S1 constructed single-target tracking data set is adopted to train the S5 improved single-target tracking network SiamRPN++.
In an embodiment of the invention, in order to better fuse the features produced by the secondary Target branch and the Target branch while keeping the feature extraction time as low as possible, the secondary Target branch is implemented with a lightweight ResNet-18.
During training, the first rectangular frame is shrunk by a random proportion, the shrunken moving target is cropped from inside the first moving target area, scaled to 127×127×3, and then input to the secondary Target branch.
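A minimal sketch of this training-time step; the (0.3, 0.7) range for the random shrink proportion is an illustrative assumption, as the patent only requires a random proportion.

```python
import random
import cv2

def random_local_template(frame, box, ratio_range=(0.3, 0.7), out_size=127):
    """Crop a randomly shrunken version of the first rectangular frame and resize it
    to out_size x out_size for the secondary Target branch.

    frame: HxWx3 image; box: (x, y, w, h) first rectangular frame.
    """
    x, y, w, h = box
    r = random.uniform(*ratio_range)
    cx, cy = x + w / 2.0, y + h / 2.0
    w2, h2 = max(1, int(w * r)), max(1, int(h * r))
    x2, y2 = max(0, int(cx - w2 / 2)), max(0, int(cy - h2 / 2))
    crop = frame[y2:y2 + h2, x2:x2 + w2]
    return cv2.resize(crop, (out_size, out_size))  # 127 x 127 x 3 local template
```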
Prediction after training: steps S1 to S4 are applied to the test videos to obtain the first and second motion areas; the picture of the first motion area is resized to 127×127×3 and input to the Target branch, and the picture of the second motion area is resized to 127×127×3 and input to the secondary Target branch. The whole image of the frame in which the moving object was determined is input to the Search branch; the network outputs the corresponding moving-object classification and coordinate position as the tracking result for that frame, i.e., the position of the moving object in that frame, and the computation is repeated until the whole video has been traversed.
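A minimal sketch of this prediction loop. The names detect_first_and_second_areas and tracker (with init/track methods) are hypothetical wrappers for the optical-flow step and the improved SiamRPN++ network; the patent does not define such an interface.

```python
import cv2

def crop(img, box):
    x, y, w, h = box
    return img[y:y + h, x:x + w]

def track_video(frames, detect_first_and_second_areas, tracker, template_size=127):
    """frames: list of HxWx3 images belonging to one test video."""
    first = frames[0]
    box1, box2 = detect_first_and_second_areas(first)   # steps S2-S4 on the first frame
    z_global = cv2.resize(crop(first, box1), (template_size, template_size))
    z_local = cv2.resize(crop(first, box2), (template_size, template_size))
    tracker.init(z_global, z_local)                      # Target + secondary Target templates
    results = []
    for frame in frames[1:]:
        cls, bbox = tracker.track(frame)                 # Search branch takes the whole frame
        results.append((cls, bbox))                      # classification and position per frame
    return results
```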
In the embodiments shown in FIG. 5 to FIG. 8, the tracking of a conventional moving object (a human body in this embodiment) by the existing SiamRPN++ network is compared with that of the improved SiamRPN++ network of the present invention. At the start of tracking, the result of the original SiamRPN++ network is shown in FIG. 5 and that of the improved SiamRPN++ network in FIG. 6; when the object moves with large amplitude, the result of the original SiamRPN++ network is shown in FIG. 7 and that of the improved SiamRPN++ network in FIG. 8. It can be seen that when the motion amplitude of the moving object changes little, the tracking ability of the original SiamRPN++ network is similar to that of the improved network, but when the motion amplitude changes greatly, the original SiamRPN++ network can no longer track accurately, while the improved SiamRPN++ network, which combines local and global features, still obtains a good tracking result.
In the embodiments shown in FIG. 9 to FIG. 14, the tracking of an unconventional moving object (a plastic bag in this embodiment) by the existing SiamRPN++ network is compared with that of the improved SiamRPN++ network of the present invention. At the start of tracking, the result of the original SiamRPN++ network is shown in FIG. 9 and that of the improved SiamRPN++ network in FIG. 10; when the deformation of the moving object is small, the result of the original SiamRPN++ network is shown in FIG. 11 and that of the improved SiamRPN++ network in FIG. 12; when the deformation is large, the result of the original SiamRPN++ network is shown in FIG. 13 and that of the improved SiamRPN++ network in FIG. 14. The tracking effect of the two networks on the unconventional moving object is similar when its deformation is small, but when the deformation is large, the tracking effect of the improved SiamRPN++ network is clearly better than that of the original SiamRPN++ network.
It should be noted that, in a specific implementation, the control portion may be implemented by a processor executing computer-executable instructions that are stored in a memory in software form, which is not described in detail herein; the program corresponding to the actions executed by the control circuit may be stored in software form in a computer-readable storage medium of the system, so that the processor can invoke and execute the operations corresponding to each module.
The computer readable storage medium above may include volatile memory, such as random access memory; but may also include non-volatile memory such as read-only memory, flash memory, hard disk, or solid state disk; combinations of the above types of memories may also be included.
The processor referred to above may be a general term for a plurality of processing elements. For example, the processor may be a central processing unit, or may be other general purpose processors, digital signal processors, application specific integrated circuits, field programmable gate arrays or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or may be any conventional processor or the like, but may also be a special purpose processor.
It should be noted that the above description is not intended to limit the invention to the above examples; variations, modifications, additions or substitutions made within the spirit and scope of the invention also fall within the protection scope of the invention.

Claims (4)

1. A moving object tracking method for an open set, comprising:
S1: constructing a single-target tracking data set;
S2: detecting any moving object based on an optical flow method;
S3: framing a first moving target area with a first rectangular frame;
S4: shrinking the first rectangular frame by a set proportion to obtain a second rectangular frame, and framing a second moving target area inside the first moving target area with the second rectangular frame;
S5: constructing an improved single-target tracking network SiamRPN++, which comprises:
a Target branch for extracting features of the first moving target area framed by the first rectangular frame;
a secondary Target branch, parallel to the Target branch, for extracting features of the second moving target area framed by the second rectangular frame;
a Search branch for extracting features of the detected image;
the feature maps of designated layers of the secondary Target branch and the Target branch undergo a channel-level Concat operation, and the feature map output by the Concat operation and the feature map of the corresponding layer of the Search branch are then sent to a Siamese RPN module for the correlation operation;
S6: training the improved single-target tracking network SiamRPN++ constructed in S5 with the single-target tracking data set constructed in S1.
2. The moving object tracking method for an open set according to claim 1, wherein detecting any moving object based on an optical flow method comprises:
if a moving object is split within N consecutive frames, the split parts are divided into independent moving objects;
if a moving object is split at first and becomes continuous within the N frames, the parts in the split state are divided into independent moving objects.
3. The moving object tracking method for an open set according to claim 1, wherein the secondary Target branch is implemented using ResNet-18.
4. The moving object tracking method for an open set according to claim 1, wherein the training of step S5 comprises:
shrinking the first rectangular frame by a random proportion, obtaining the shrunken moving target, scaling it to the Target input size, and inputting it into the secondary Target branch.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310375414.1A  2023-04-11  2023-04-11  Moving target tracking method for open set


Publications (2)

Publication Number  Publication Date
CN116091554A  2023-05-09
CN116091554B  2023-06-16

Family

ID=86208662


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113158904A (en) * 2021-04-23 2021-07-23 天津大学 Twin network target tracking method and device based on double-mask template updating
CN113628244A (en) * 2021-07-05 2021-11-09 上海交通大学 Target tracking method, system, terminal and medium based on label-free video training
CN115115667A (en) * 2021-03-18 2022-09-27 南京大学 Accurate target tracking method based on target transformation regression network
CN115205730A (en) * 2022-06-10 2022-10-18 西安工业大学 Target tracking method combining feature enhancement and template updating

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11410309B2 (en) * 2020-12-03 2022-08-09 Ping An Technology (Shenzhen) Co., Ltd. Method, device, and computer program product for deep lesion tracker for monitoring lesions in four-dimensional longitudinal imaging
CN113129341B (en) * 2021-04-20 2021-12-14 广东工业大学 Landing tracking control method and system based on light-weight twin network and unmanned aerial vehicle


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Wu LG et al., "A Multi-Target Tracking and Positioning Technology for UAV Based on Siamrpn Algorithm", IEEE, full text. *
张子烁 et al., "基于动态特征注意模型的三分支网络目标跟踪" (Three-branch network target tracking based on a dynamic feature attention model), 光学学报 (Acta Optica Sinica), Vol. 42, No. 15, full text. *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant