CN110517285A - Large-scene tiny-target tracking based on a motion-estimation ME-CNN network - Google Patents
Large-scene tiny-target tracking based on a motion-estimation ME-CNN network
- Publication number
- CN110517285A (application CN201910718847.6A)
- Authority
- CN
- China
- Prior art keywords
- target
- network
- cnn
- training
- estimation
- Prior art date: 2019-08-05
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/207—Analysis of motion for motion estimation over a hierarchy of resolutions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10032—Satellite or aerial image; Remote sensing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Biophysics (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
The invention proposes a large-scene tiny-target tracking method based on a motion-estimation ME-CNN network. It solves the problem of tracking tiny targets from motion parameters without image registration. The steps are: obtain the initial training set D for the target motion-estimation network ME-CNN; construct the network ME-CNN that estimates the target's motion; compute the ME-CNN loss function from the target's motion parameters; judge whether the current training set is the initial one; update the training labels of the loss function; obtain an initial model that predicts the target's position; correct the predicted position; update the training set with the corrected target position, completing tracking for one frame; and obtain the tracking result for the whole remote-sensing video. The invention predicts the target's position with the deep-learning network ME-CNN, avoiding large-scene image registration during tracking and the difficulty of extracting features from extremely blurred targets; it reduces the dependence on target appearance features and improves tracking accuracy in extremely blurred video.
Description
Technical field
The invention belongs to the field of remote-sensing video processing and relates to remote-sensing video tracking of tiny targets in large scenes, specifically a large-scene tiny-target remote-sensing video tracking method based on a motion-estimation ME-CNN network. It can be used for security monitoring, smart-city construction, vehicle monitoring, and the like.
Background technique
Remote-sensing target tracking is an important research direction in computer vision, and tracking tiny targets in large-scene, low-resolution remote-sensing video shot by a moving satellite is an extremely challenging problem. Such video records the daily activity of a region over a period of time. Because the satellite's shooting altitude is very high, a single frame can cover more than half a city, so the resolution of the video is low and the vehicles, ships and aircraft in it appear tiny; a vehicle may occupy only about 3*3 pixels, its contrast with the surroundings is extremely low, and the human eye can see only a small bright spot. Tracking at such ultra-low resolution therefore belongs to the problem of large-scene tiny-target tracking, which is especially difficult. Moreover, because the satellite moves continuously while shooting, the whole video drifts noticeably in one direction while differences in terrain elevation cause some regions to scale; this makes it hard to first register the images and then apply frame differencing to obtain the target's position, posing a great challenge for tiny-target tracking in large-scene remote-sensing video.
Video target tracking means predicting the position and size of a target in subsequent video frames, given its position and size in the initial frame. Most current tracking algorithms are based on neural networks (Neural Network) or correlation filtering (Correlation Filter). Among neural-network-based algorithms, the main idea of the CNN-SVM method, for example, is to feed the target into a multi-layer neural network to learn target features and then track with a traditional SVM; features learned from large amounts of training data are more discriminative than hand-crafted ones. Among correlation-filtering-based algorithms, the basic idea of the KCF method, for example, is to find a filter template and convolve the next frame with it: the search region with the maximum response is taken as the predicted target position. This method is fast and fairly accurate.
Algorithms designed for natural optical video are hard to apply to the remote-sensing video of large-scene tiny targets, because the targets are tiny and blurred and a neural network cannot learn effective target features. Traditional remote-sensing tracking methods are likewise unsuitable for video whose background constantly drifts and whose regions scale locally: image registration followed by frame differencing cannot be carried out, and because the contrast between target and surroundings is extremely low, the target is easily lost.
Summary of the invention
The object of the invention is to overcome the above shortcomings of the prior art and to propose a large-scene small-target remote-sensing video tracking method based on motion estimation, with low computational complexity and higher accuracy.
The invention is a large-scene tiny-target remote-sensing video tracking method based on a motion-estimation ME-CNN network, characterized by comprising the following steps:
(1) Obtain the initial training set D for the tiny-target motion-estimation network ME-CNN: take the first F frames of the original remote-sensing video A, continuously label a bounding box around the same target in each frame, and arrange the top-left-corner coordinates of the bounding boxes in frame order to form the training set D.
(2) Construct the network ME-CNN that estimates the tiny target's motion: it comprises three parallel convolution modules that extract different features from the training data, followed in sequence by a concatenation layer, a fully connected layer and an output layer.
(3) Compute the loss function of the network ME-CNN from the tiny target's motion parameters: derive the target's motion trend from its law of motion and use it as the target's training label; then compute the Euclidean distance between the training label and the ME-CNN prediction, which serves as the loss function for the optimization training of ME-CNN.
(4) Judge whether the current training set is the initial one: if it is not the initial training set, go to step (5) and update the training labels of the loss function; if it is the initial training set, go to step (6) and enter the network's training loop.
(5) Update the training labels of the loss function: when the current training set is not the initial one, recompute the training labels of the loss function from the current training-set data, using the same motion-parameter computation as in step (3); the recomputed labels participate in training the motion-estimation network ME-CNN; go to step (6).
(6) Obtain the initial model M1 that predicts the target's position: input the training set D into the target motion-estimation network ME-CNN and train it with the current loss function to obtain the initial model M1.
(7) Correct the position predicted by the model: compute an auxiliary position offset for the target and use it to correct the position predicted by the motion-estimation network ME-CNN.
(7a) Obtain the target grayscale image block: use the initial model M1 to predict the target position (Px,Py) in the next frame, extract the grayscale image block of the target around (Px,Py) from that frame, and normalize it to obtain the normalized target grayscale image block.
(7b) Obtain the target position offset: classify the normalized grayscale image block by brightness and locate the target within the block by vertical projection; the distance between the computed target center and the center of the image block is the target position offset.
(7c) Obtain the corrected target position: use the obtained offset to correct the position predicted by the motion-estimation network ME-CNN, obtaining the fully corrected target position.
(8) Update the training set with the corrected target position, completing tracking for one frame: append the corrected top-left-corner position as the last row of training set D and remove its first row in a single operation, obtaining a corrected and updated training set D; training for one frame is complete and the target position of that frame has been obtained.
(9) Judge whether the current frame number is less than the total number of video frames: if so, repeat steps (4)-(9) in a loop, continuing the tracking optimization training until all frames have been traversed; otherwise, if it equals the total number of frames, stop training and go to step (10).
(10) Obtain the remote-sensing video target tracking result: the accumulated outputs constitute the tracking result.
The invention solves the problems of high computational complexity and low tracking accuracy in existing video tracking algorithms.
Compared with the prior art, the invention has the following advantages:
(1) The motion-estimation network ME-CNN used by the invention obtains the target's trajectory without the image registration followed by frame differencing, or the complex background modeling, required by conventional methods. By letting the neural network analyze the training set formed from the target positions of the first F frames, the network predicts the target's motion trend; no manual position labels are needed for subsequent frames, so the network trains in a self-loop. This greatly reduces the complexity of the tracking algorithm and improves its practicality.
(2) The algorithm corrects the remote-sensing target position by itself, combining the ME-CNN network with the auxiliary position-offset method, and modifies the loss function of the motion-estimation network according to the target's law of motion. This reduces the network's computation and improves the robustness of tracking.
Brief description of the drawings
Fig. 1 is the implementation flowchart of the invention;
Fig. 2 is the structural diagram of the ME-CNN network proposed by the invention;
Fig. 3 compares the trajectory predicted by the invention for a tiny target in a large scene with the ground-truth target trajectory: the prediction of the invention is the green curve and the ground-truth trajectory is the red curve.
Specific embodiments
The invention is described in detail below with reference to the drawings and specific embodiments.
Embodiment 1
Large-scene tiny-target remote-sensing video tracking plays a significant role in security monitoring, smart-city construction and vehicle monitoring. The remote-sensing video studied by the invention is low-resolution video of large-scene tiny targets shot by a moving satellite. The targets are extremely blurred and tiny, their contrast with the surroundings is very low, and when a target is not moving the human eye can hardly tell that it is a vehicle; moreover, owing to the satellite's motion and the elevation changes of the imaged area, the video exhibits both translation and local scaling, so the difficulty of tracking is far greater than in clear video and constitutes a challenge for remote-sensing video tracking. Existing methods fall mainly into two kinds. One uses a neural network to learn and extract target features, extracts multiple search boxes in the next frame, and takes the box with the highest feature score as the target position; because the targets here are extremely blurred and tiny, this method cannot extract effective features and cannot be applied to the video of the invention. The other first performs image registration and then frame differencing to obtain the target trajectory, then finds a filter template and convolves the next frame with it, the region of maximum response being the predicted target; because the video of the invention not only translates but also scales locally, registration becomes much more complex and computation much harder, and an effective trajectory is difficult to extract. Aiming at this situation, the invention proposes, after study, a large-scene tiny-target remote-sensing video tracking method based on a motion-estimation ME-CNN network which, referring to Fig. 1, includes the following steps:
(1) Obtain the initial training set D for the tiny-target motion-estimation network ME-CNN:
Take the first F frames of the original remote-sensing video A and select a single target; continuously label a bounding box around the same target in each frame (in the invention the tiny target is simply called the target), and arrange the top-left-corner coordinates of the bounding boxes in frame order to form the training set D. The image coordinate system is used; D is a matrix of F rows and 2 columns, each row being the target's coordinate position in one frame. The position of the target may be represented by the top-left-corner coordinates or by the center coordinates; either choice does not affect the analysis of the target's motion.
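To make step (1) concrete, the following Python sketch (Keras and Python being the software platform named in Embodiment 5) assembles D as an F x 2 array. Here `video_frames` and `label_bounding_box` are hypothetical stand-ins for the decoded video and the manual annotation; they are not part of the patent or of any library:

```python
import numpy as np

F = 10  # number of initial frames; Embodiment 5 uses the first 10 frames
rows = []
for f in range(F):
    # label_bounding_box is a hypothetical manual-annotation helper returning
    # the labelled box (top-left x, top-left y, width, height) for frame f
    x_tl, y_tl, w, h = label_bounding_box(video_frames[f])
    rows.append([x_tl, y_tl])            # keep only the top-left corner
D = np.asarray(rows, dtype=np.float32)   # training set D: F rows, 2 columns
```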
(2) Construct the network ME-CNN that estimates the tiny target's motion: the ME-CNN network of the invention contains three parallel convolution modules that extract different features from the training data, yielding the target's different motion features; a concatenation layer then fuses the extracted motion features, and a fully connected layer and an output layer produce the result; together these form the ME-CNN network. The invention uses three convolution modules to obtain different motion features of the target because a single convolution module can hardly capture the characteristics of the whole training set, while a deeper network would suffer from vanishing gradients; the invention therefore widens the network, extracting features of the training set at multiple scales, which reduces network complexity and speeds the network up. Since the video of the invention drifts continuously and, owing to terrain elevation differences, scales in some regions, methods such as image registration plus frame differencing and background modeling cannot be used on it; the ME-CNN network can instead be used to obtain the target trajectory, with lower model complexity and less computation than existing methods.
(3) Compute the loss function of the network ME-CNN from the tiny target's motion parameters: derive the target's motion trend from its law of motion and use it as the target's training label; then compute the Euclidean distance between the training label and the ME-CNN prediction, which serves as the loss function of the ME-CNN optimization. In the invention, this training loss function strengthens the analysis of the training data and helps the network quickly extract effective features, thereby optimizing the motion-estimation network ME-CNN.
(4) Judge whether the current training set is the initial one: if it is not the initial training set, go to step (5) and update the training labels of the loss function, which then participate in network training; conversely, if it is the initial training set, go to step (6) and enter the network's training loop.
(5) Update the training labels of the loss function: since training set D is continually updated in the later step (8), the training labels of the loss function must be continually adjusted to the updated D during training. When the current training set is not the initial one, the training labels of the loss function are recomputed from the current training-set data, using the same motion-parameter computation as in step (3); the recomputed labels participate in training the motion-estimation network ME-CNN; go to step (6).
(6) Obtain the initial model M1 that predicts the target's position: input the training set D into the target motion-estimation network ME-CNN and train it with the current loss function to obtain the initial model M1.
(7) Correct the position predicted by the model: compute an auxiliary position offset for the target and use it to correct the position predicted by the motion-estimation network ME-CNN.
(7a) Obtain the target grayscale image block: use the initial model M1 to predict the target position (Px,Py) in the next frame, extract the grayscale image block of the target around (Px,Py) from that frame, and normalize it to obtain the normalized target grayscale image block. Because the target is tiny and its contrast with the surroundings is extremely low, judging the offset directly with a neural network works poorly; it is better to first extract a small target patch and then judge the offset inside the patch.
(7b) Obtain the target position offset: classify the normalized grayscale image block by brightness so that target and road appear at different brightness levels (the contrast between the road's surroundings and the target being extremely low), then locate the target within the block by vertical projection; the distance between the computed target center and the center of the image block is the target position offset.
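The patent prescribes brightness classification followed by vertical projection but gives no formulas; the following Python sketch is one plausible reading, under the assumptions that the target falls into the brightest class of the quantized patch and that the projection peaks mark its center:

```python
import numpy as np

def position_offset(patch, n_levels=3):
    # Quantize the normalized grayscale patch into n_levels brightness classes
    edges = np.linspace(patch.min(), patch.max(), n_levels + 1)[1:-1]
    levels = np.digitize(patch, edges)
    mask = levels == levels.max()       # assumption: brightest class is the target
    col_profile = mask.sum(axis=0)      # vertical projection (counts per column)
    row_profile = mask.sum(axis=1)      # horizontal projection (counts per row)
    cx, cy = np.argmax(col_profile), np.argmax(row_profile)
    h, w = patch.shape
    return cx - w // 2, cy - h // 2     # offset of target center from patch center
```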
(7c) Obtain the corrected target position: use the obtained offset to correct the position predicted by the motion-estimation network ME-CNN, obtaining the fully corrected target position, including the position of the target's top-left corner.
(8) Update the training set with the corrected target position, completing tracking for one frame: append the corrected top-left-corner position as the last row of training set D and remove its first row in a single operation, obtaining a corrected and updated training set D; training for one frame is complete and the target position of that frame has been obtained. This loop-wise revision of the training set updates the network parameters, reduces the inter-frame target discrepancy, and adapts to the target's motion.
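In code, step (8) is a one-row sliding window over the position history; a minimal sketch, with `corrected_xy` the corrected top-left corner from step (7c):

```python
import numpy as np

# Drop the oldest row and append the newest corrected position,
# so D always holds the positions of the F most recent frames.
D = np.vstack([D[1:], np.asarray(corrected_xy, dtype=np.float32)])
```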
(9) Judge whether the current frame number is less than the total number of video frames: if so, repeat steps (4)-(9) in a loop, updating the model parameters again to improve the model's adaptability, and continue the target's tracking optimization training until all frames have been traversed; conversely, if the current frame number equals the total number of frames, stop training and go to step (10).
(10) Obtain the remote-sensing video target tracking result: after training ends, the accumulated target-position outputs constitute the remote-sensing video tracking result.
The motion-estimation network ME-CNN used by the invention obtains the target trajectory without the image registration followed by frame differencing, or the complex background modeling, of conventional methods. The proposed algorithm lets the neural network analyze the training set formed from the target positions of the first F frames and can effectively extract the target's motion characteristics. Because too deep a network would suffer from vanishing gradients, the multi-scale ME-CNN is used to predict the target's motion trend; no manual position labels are needed for subsequent frames, and the network trains in a self-loop, which greatly reduces the complexity of the tracking algorithm and improves its practicality: through the target's motion-estimation network the target position is found quickly and accurately without image registration. Combining the ME-CNN network with the auxiliary position-offset method, the method judges the remote-sensing target position by itself, obtains the target's velocity from its motion, analyzes its probable motion trend, and also modifies the loss function of the motion-estimation network, improving the robustness of tracking.
The invention performs motion analysis on an extremely blurred target with a deep-learning method, predicts its next movement, and then corrects the motion-estimation network with a position offset; no subsequent labels are needed for tracking. It thus avoids large-scene image registration during tracking and the difficulty of extracting features from extremely blurred targets, significantly improves tracking accuracy in extremely blurred video, and is also applicable to tracking in various other remote-sensing videos.
Embodiment 2
The large-scene tiny-target remote-sensing video tracking method based on the motion-estimation ME-CNN network is as in Embodiment 1. The construction in step (2) of the network ME-CNN that estimates the tiny target's motion, see Fig. 2, comprises the following steps:
(2a) Overall structure of the motion-estimation network: the motion-estimation network ME-CNN contains three parallel convolution modules that extract different motion features, followed in sequence by a concatenation layer, a fully connected layer and an output layer. In the ME-CNN constructed by the invention, the concatenation layer fuses the extracted motion features, the fully connected layer refines and analyzes them, and the output layer produces the result.
(2b) Structure of the three parallel convolution modules: the three parallel modules are convolution module I, convolution module II and convolution module III, where
convolution module I contains a locally connected LocallyConnected1D convolutional layer with stride 2, which extracts the target's coordinate-position information;
convolution module II contains a dilated convolution with stride 1;
convolution module III contains a one-dimensional convolution with stride 2.
Modules I, II and III obtain position features of the target at different scales, yielding three outputs, which are concatenated into a fused convolution result; this is then fed to the fully connected layer and the output layer to obtain the final prediction. The invention uses three convolution modules to obtain different motion features of the target because a single module can hardly capture the characteristics of the whole training set, while a deeper network would suffer from vanishing gradients; the invention therefore widens the network, extracting training-set features at multiple scales, which reduces network complexity and speeds the network up. Since the video drifts continuously and, owing to terrain elevation differences, scales in some regions, image registration plus frame differencing and background modeling cannot be used; the ME-CNN network can instead be used to obtain the target trajectory, with lower model complexity and less computation than existing methods. A Keras sketch of this architecture is given after this paragraph.
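The following Keras sketch renders this structure (LocallyConnected1D is available in tf.keras 2.x); the filter counts, kernel sizes and dilation rate are not specified by the patent and are illustrative assumptions only:

```python
from tensorflow import keras
from tensorflow.keras import layers

F = 10                                    # length of the training window
inp = layers.Input(shape=(F, 2))          # F rows of (x, y) target positions

# Convolution module I: locally connected 1-D convolution, stride 2
m1 = layers.LocallyConnected1D(16, 3, strides=2, activation='relu')(inp)
# Convolution module II: dilated 1-D convolution, stride 1
m2 = layers.Conv1D(16, 3, strides=1, dilation_rate=2, padding='same',
                   activation='relu')(inp)
# Convolution module III: ordinary 1-D convolution, stride 2
m3 = layers.Conv1D(16, 3, strides=2, padding='same', activation='relu')(inp)

# Concatenation layer: fuse the three modules' flattened outputs
fused = layers.Concatenate()([layers.Flatten()(m) for m in (m1, m2, m3)])
hidden = layers.Dense(64, activation='relu')(fused)  # fully connected layer
out = layers.Dense(2)(hidden)                        # output layer: (Px, Py)

me_cnn = keras.Model(inp, out)
```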
Embodiment 3
The large-scene tiny-target remote-sensing video tracking method based on the motion-estimation ME-CNN network is as in Embodiments 1-2. The computation in step (3) of the ME-CNN loss function from the tiny target's motion parameters performs a rough analysis of the target's motion by processing the data of training set D, which guides the optimization direction of the motion-estimation network ME-CNN; it comprises the following steps:
(3a) Obtain the target displacements of training set D: take the data of rows F, F-2 and F-4 of D and subtract the first row of D from each, obtaining the target displacements of frames F, F-2 and F-4 relative to the first frame, denoted S1, S2, S3 in turn: S1 is the target displacement between frame F and the first frame, S2 between frame F-2 and the first frame, and S3 between frame F-4 and the first frame. If the training set is not the initial one but has been updated i times, the frame number corresponding to each row changes accordingly, the rows becoming frames 1+i, 2+i, ..., F+i; taking the data of rows F, F-2 and F-4 of D and subtracting the first row of D then yields the target displacements of frames F+i, F+i-2 and F+i-4 relative to the first retained frame, again denoted S1, S2, S3 in turn.
(3b) Obtain the target's motion trend:
According to the target's law of motion, the motion trend (Gx,Gy) of the target is computed from the obtained displacements, separately in the x and y directions of the image coordinate system, by the following formulas:
V1=(S1-S2)/2
V2=(S2-S3)/2
a=(V1-V2)/2
G=V1+a/2
The image coordinate system is used throughout the invention: its origin is the top-left corner of the image, the x direction points horizontally to the right, and the y direction points vertically downward. In the formulas, V1 is the target velocity between displacements S1 and S2, V2 is the target velocity between displacements S2 and S3, a is the motion acceleration, and G is the motion trend of the target.
(3c) Construct the loss function of the motion-estimation network ME-CNN:
The motion trend of the target is computed from its law of motion and used as the target's training label; the Euclidean distance between the computed motion trend (Gx,Gy) and the predicted position (Px,Py) output by the motion-estimation network ME-CNN constitutes the loss function of ME-CNN:
Loss=sqrt((Gx-Px)^2+(Gy-Py)^2)
where Gx is the target's motion trend in the x direction of the image coordinate system, Gy is the motion trend in the y direction, Px is the network's prediction in the x direction, and Py is the network's prediction in the y direction.
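Steps (3a)-(3c) translate directly into a few lines of Python; a minimal sketch, assuming D is the F x 2 array of step (1) (negative indices address rows F, F-2 and F-4, so the same code serves the updated training set):

```python
import numpy as np
import tensorflow as tf

def motion_trend(D):
    S1 = D[-1] - D[0]        # displacement: row F   minus first row
    S2 = D[-3] - D[0]        # displacement: row F-2 minus first row
    S3 = D[-5] - D[0]        # displacement: row F-4 minus first row
    V1 = (S1 - S2) / 2       # velocity between displacements S1 and S2
    V2 = (S2 - S3) / 2       # velocity between displacements S2 and S3
    a = (V1 - V2) / 2        # motion acceleration
    return V1 + a / 2        # motion trend G = (Gx, Gy), the training label

def me_cnn_loss(G, P):
    # Euclidean distance between trend label (Gx, Gy) and prediction (Px, Py)
    return tf.sqrt(tf.reduce_sum(tf.square(G - P), axis=-1))
```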
A comprehensive example is given below to further describe the invention.
Embodiment 4
The large-scene tiny-target remote-sensing video tracking method based on the motion-estimation ME-CNN network is as in Embodiments 1-3. Referring to Fig. 1, the method includes the following steps:
(1) Obtain the initial training set D for the tiny-target motion-estimation network ME-CNN:
Take the first F frames of the original remote-sensing video A, continuously label a bounding box around one target in each frame, and stack the top-left-corner coordinates of the bounding boxes into the training set D, a matrix of F rows and 2 columns whose every row is the target's coordinates in one frame of the video. The position of the target may be represented by the top-left-corner coordinates or by the center coordinates; either choice does not affect the analysis of the target's motion. In the invention the tiny target is simply called the target.
(2) Construct the network ME-CNN that estimates the tiny target's motion: it comprises three parallel convolution modules extracting different features from the training data, so as to obtain the target's different motion features; a single convolutional layer can hardly capture the characteristics of the whole training set, while a deeper network would suffer from vanishing gradients, so the network is widened instead, extracting features of the training set at multiple scales, which reduces network complexity and speeds the network up. These modules are followed in sequence by a concatenation layer that fuses the extracted motion features, a fully connected layer that analyzes them, and an output layer that produces the result.
(2a) Overall structure of the motion-estimation network: the motion-estimation network ME-CNN contains three parallel convolution modules, followed in sequence by a concatenation layer, a fully connected layer and an output layer;
(2b) Structure of the three parallel convolution modules: the three parallel modules are convolution module I, convolution module II and convolution module III, where
convolution module I contains a locally connected LocallyConnected1D convolutional layer with stride 2, which extracts the target's coordinate-position information;
convolution module II contains a dilated convolution with stride 1;
convolution module III contains a one-dimensional convolution with stride 2.
Modules I, II and III obtain position features of the target at different scales, yielding three outputs, which are concatenated into a fused convolution result; this is then fed to the fully connected layer and the output layer to obtain the final prediction.
(3) Construct the loss function of the tiny-target motion-estimation network ME-CNN: derive the target's motion trend from its law of motion and use it as the target's training label; then compute the Euclidean distance between the label and the prediction of the ME-CNN network as the loss function of ME-CNN;
(3a) Obtain the target displacements of training set D: if the training set is the initial one, take the data of rows F, F-2 and F-4 of D and subtract the first row of D from each, obtaining the target displacements of frames F, F-2 and F-4 relative to the first frame, denoted S1, S2, S3 in turn: S1 is the target displacement between frame F and the first frame, S2 between frame F-2 and the first frame, and S3 between frame F-4 and the first frame. If the training set is not the initial one but has been updated i times, the frame number corresponding to each row changes accordingly, the rows becoming frames 1+i, 2+i, ..., F+i; taking the data of rows F, F-2 and F-4 of D and subtracting the first row of D then yields the target displacements of frames F+i, F+i-2 and F+i-4 relative to the first retained frame, again denoted S1, S2, S3 in turn.
(3b) Obtain the target's motion trend:
According to the target's law of motion, the motion trend (Gx,Gy) of the target is computed from the obtained displacements, separately in the x and y directions of the image coordinate system, by the following formulas:
V1=(S1-S2)/2
V2=(S2-S3)/2
a=(V1-V2)/2
G=V1+a/2
(3c) Construct the loss function of the motion-estimation network ME-CNN:
The Euclidean distance between the computed motion trend (Gx,Gy) and the predicted position (Px,Py) output by the motion-estimation network, Loss=sqrt((Gx-Px)^2+(Gy-Py)^2), constitutes the loss function of ME-CNN.
(4) Update the training labels of the loss function: since training set D is continually updated in the later step (7), the training labels of the loss function must be continually adjusted to the updated D during training before they participate in training the motion-estimation network ME-CNN.
(5) Obtain the initial model M1 that predicts the target's position: input the training set D into the target motion-estimation network ME-CNN and train it with the loss function to obtain the initial model M1.
(6) Correct the position predicted by the model: compute an auxiliary position offset for the target and use it to correct the position predicted by the motion-estimation network ME-CNN.
(6a) Obtain the target grayscale image block: use the initial model M1 to predict the target position (Px,Py) in the next frame, extract the grayscale image block of the target around (Px,Py) from that frame, and normalize it to obtain the normalized target grayscale image block. Because the target is tiny and its contrast with the surroundings is extremely low, judging the offset directly with a neural network works poorly; it is better to first extract a small target patch and then judge the offset inside the patch.
(6b) Obtain the target position offset: classify the normalized grayscale image block by brightness and locate the target within the block by vertical projection; the distance between the computed target center and the center of the image block is the target position offset.
(6c) Obtain the corrected target position: use the obtained offset to correct the position predicted by the motion-estimation network ME-CNN, obtaining the fully corrected target position, including the position of the target's top-left corner.
(7) Update the training set with the corrected target position, completing tracking for one frame: append the corrected top-left-corner position as the last row of training set D and remove its first row in a single operation, obtaining a corrected and updated training set; training for one frame is complete and the target position of that frame has been obtained.
(8) Obtain the remote-sensing video target tracking result: repeat steps (4)-(7) in a loop, recomputing the training labels from the updated training set by the method of step (3) and updating the network model; iterating in this way carries out the target's tracking training until all frames have been traversed, and the accumulated outputs constitute the remote-sensing video tracking result.
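Putting the pieces together, steps (4)-(8) form the self-loop sketched below. Here `me_cnn`, `motion_trend`, `me_cnn_loss` and `position_offset` are the sketches given earlier; `extract_patch` is a hypothetical helper that cuts out and normalizes the grayscale block around a position, and the single optimizer step per frame is an illustrative choice, not a detail fixed by the patent:

```python
import numpy as np
import tensorflow as tf

opt = tf.keras.optimizers.Adam()
track = []
for frame in video_frames[F:]:                   # remaining frames of video A
    G = tf.constant(motion_trend(D))             # training label from current D
    with tf.GradientTape() as tape:
        P = me_cnn(D[np.newaxis])[0]             # predicted next position (Px, Py)
        loss = me_cnn_loss(G, P)
    grads = tape.gradient(loss, me_cnn.trainable_variables)
    opt.apply_gradients(zip(grads, me_cnn.trainable_variables))

    patch = extract_patch(frame, P.numpy())      # step (6a): grayscale block
    dx, dy = position_offset(patch)              # step (6b): auxiliary offset
    corrected = P.numpy() + np.array([dx, dy], dtype=np.float32)  # step (6c)
    D = np.vstack([D[1:], corrected])            # step (7): slide the window
    track.append(corrected)                      # accumulated tracking result
```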
In this example, the motion-estimation model of the target could also be built from the target's motion over the preceding frames by extracting the information of the road the target is on, finding on a map the corresponding city at the same longitude and latitude, and matching the trajectory to the actual road conditions to predict the target's motion; making full use of the road's three-dimensional information, the target can be tracked accurately even where road elevation changes sharply and the video scales locally. The auxiliary position offset of the target could also be obtained by training a neural network, but this requires preprocessing the target and its surroundings to obtain image blocks of higher contrast before the network can be trained.
The technical effect of the invention is further described below in conjunction with a simulation test:
Embodiment 5
The large-scene tiny-target remote-sensing video tracking method based on the motion-estimation ME-CNN network is as in Embodiments 1-4.
Simulation conditions and content:
The simulation platform of the invention: an Intel Xeon E5-2630 v3 CPU at 2.40 GHz, 64 GB of running memory, the Ubuntu 16.04 operating system, Keras and Python as the software platform, and two GeForce GTX TITAN X/PCIe/SSE2 graphics cards.
The invention uses remote-sensing video of the Derna area of Libya shot by the Jilin-1 video satellite; a vehicle in the first 10 frames of the video serves as the target, the target in each image is labelled with a box, and the positions of the boxes' top-left vertices form the training set. Tracking simulations are carried out on the target video with the invention and with the existing KCF-based tracking method.
Simulation content and results:
The comparison method is the existing KCF-based tracking method. The method of the invention and the comparison method were tested under the above simulation conditions, i.e., both were used to track the vehicle target in the remote-sensing video of the Derna area of Libya; the comparison of the trajectory predicted by the ME-CNN network (green curve) with the ground-truth target trajectory (red curve) is shown in Fig. 3, and the results of Table 1 are as follows.

Table 1. Tracking results on the remote-sensing video of the Derna area of Libya

Method | Precision | IOU
---|---|---
KCF | 63.21% | 58.72%
ME-CNN | 85.63% | 76.51%
Analysis of simulation results:
In Table 1, Precision denotes the region overlap ratio between the target position predicted by the ME-CNN network and the labelled position, and IOU denotes the percentage of frames in which the average Euclidean distance between the bounding-box center and the labelled center is below a given threshold; in this example the threshold is set to 5. KCF denotes the comparison method and ME-CNN denotes the method of the invention.
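As a reference for the threshold metric just described, the sketch below computes the percentage of frames whose center error is below the threshold (a sketch only; the region-overlap Precision would be computed from box intersections in the usual way):

```python
import numpy as np

def center_within_threshold(pred_centers, gt_centers, thresh=5):
    # Percentage of frames whose predicted box center lies within `thresh`
    # pixels (Euclidean distance) of the labelled center
    d = np.linalg.norm(np.asarray(pred_centers) - np.asarray(gt_centers), axis=1)
    return 100.0 * np.mean(d < thresh)
```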
Referring to Table 1, the data comparison shows that the invention greatly improves tracking precision: Precision rises from 63.21% to 85.63%, and IOU, the percentage of frames in which the bounding-box center lies within the given threshold of the labelled center, rises from the 58.72% of the KCF-based comparison method to 76.51%.
Referring to Fig. 3, the red curve is the ground-truth target trajectory and the green curve is the tracking prediction produced by the invention for the same target; the tiny target in the large scene is shown in the green box. The two curves are highly consistent and essentially coincide, demonstrating the high tracking accuracy of the invention.
In short, the large-scene tiny-target remote-sensing video tracking method based on the motion-estimation ME-CNN network proposed by the invention improves tracking accuracy when the shooting satellite moves continuously, the video exhibits overall translation and local scaling, the video resolution is extremely low, and the target is tiny; it solves the problem of tracking tiny targets from motion parameters without registration. The steps are: obtain the initial training set D for the tiny-target motion-estimation network ME-CNN; construct the network ME-CNN that estimates the tiny target's motion; compute the ME-CNN loss function from the tiny target's motion parameters; judge whether the current training set is the initial one; update the training labels of the loss function; obtain the initial model M1 that predicts the target's position; correct the position predicted by the model; update the training set with the corrected target position, completing tracking for one frame; judge whether the current frame number is less than the total number of frames; obtain the remote-sensing video tracking result. The invention predicts the target's position with the deep-learning network ME-CNN, avoiding the large-scene image registration of existing methods and the difficulty of extracting features from extremely blurred targets; it reduces the dependence on target features, significantly improves tracking accuracy in extremely blurred video, and is also applicable to tracking in various other remote-sensing videos.
Claims (3)
1. A large-scene tiny-target tracking method based on a motion-estimation ME-CNN network, characterized by comprising the following steps:
(1) obtaining the initial training set D for the tiny-target motion-estimation network ME-CNN: taking the first F frames of the original remote-sensing video A, continuously labelling a bounding box around the same target in each frame, and arranging the top-left-corner coordinates of the bounding boxes in frame order to form the training set D;
(2) constructing the network ME-CNN that estimates the tiny target's motion: the network comprising three parallel convolution modules that extract different features from the training data, followed in sequence by a concatenation layer, a fully connected layer and an output layer;
(3) computing the loss function of the network ME-CNN from the tiny target's motion parameters: deriving the target's motion trend from its law of motion and using it as the target's training label, then computing the Euclidean distance between the training label and the prediction of the ME-CNN network as the loss function for the optimization training of ME-CNN;
(4) judging whether the current training set is the initial training set: if it is not the initial training set, executing step (5) to update the training labels of the loss function; if it is the initial training set, executing step (6) to enter the training loop of the network;
(5) updating the training labels of the loss function: when the current training set is not the initial training set, recomputing the training labels of the loss function from the data of the current training set, the computation being the same motion-parameter method as in step (3); the recomputed training labels participate in training the motion-estimation network ME-CNN; entering step (6);
(6) obtaining the initial model M1 for predicting the target's position: inputting the training set D into the target motion-estimation network ME-CNN and training the network with the current loss function to obtain the initial model M1;
(7) correcting the position result of the prediction model: computing an auxiliary position offset of the target and correcting the position predicted by the motion-estimation network ME-CNN with the offset;
(7a) obtaining the target grayscale image block: predicting the target position (Px,Py) in the next frame with the initial model M1, extracting the grayscale image block of the target around (Px,Py) from the next frame, and normalizing it to obtain the normalized target grayscale image block;
(7b) obtaining the target position offset: classifying the normalized target grayscale image block by brightness and determining the position of the target within the block by vertical projection; the distance between the computed target center and the center of the image block is the target position offset;
(7c) obtaining the corrected target position: correcting the position predicted by the motion-estimation network ME-CNN with the obtained offset to obtain the fully corrected target position;
(8) updating the training set with the corrected target position, completing tracking for one frame: appending the corrected top-left-corner position of the target as the last row of training set D and removing the first row of D in a single operation to obtain a corrected and updated training set D; the training of one frame is complete and the target position result of that frame has been obtained;
(9) judging whether the current frame number is less than the total number of video frames: if so, repeating steps (4) to (9) in a loop and carrying out the tracking optimization training of the target until all video frames have been traversed; if it equals the total number of video frames, ending the training and executing step (10);
(10) obtaining the remote-sensing video target tracking result: the accumulated outputs constitute the remote-sensing video target tracking result.
2. The large-scene tiny-target remote-sensing video tracking method based on the motion-estimation ME-CNN network according to claim 1, characterized in that constructing the network ME-CNN that estimates the tiny target's motion in step (2) comprises the following steps:
(2a) overall structure of the motion-estimation network: the motion-estimation network ME-CNN comprises three parallel convolution modules, followed in sequence by a concatenation layer, a fully connected layer and an output layer;
(2b) structure of the three parallel convolution modules: the three parallel modules are convolution module I, convolution module II and convolution module III, wherein
convolution module I comprises a locally connected LocallyConnected1D convolutional layer with stride 2, which extracts the coordinate-position information of the target;
convolution module II comprises a dilated convolution with stride 1;
convolution module III comprises a one-dimensional convolution with stride 2;
modules I, II and III obtain position features of the target at different scales, yielding three output data, which are concatenated into a fused convolution result; this is then fed to the fully connected layer and the output layer to obtain the final prediction result.
3. The large-scene tiny-target remote-sensing video tracking method based on the motion-estimation ME-CNN network according to claim 1, characterized in that computing the loss function of the network ME-CNN from the tiny target's motion parameters in step (3) comprises the following steps:
(3a) obtaining the target displacements of training set D: taking the data of rows F, F-2 and F-4 of D and subtracting the first row of D from each, the target displacements of frames F, F-2 and F-4 relative to the first frame are obtained and denoted S1, S2, S3 in turn;
(3b) obtaining the motion trend of the target:
according to the target's law of motion, the motion trend (Gx,Gy) of the target is computed from the obtained displacements, separately in the x and y directions of the image coordinate system, by the following formulas:
V1=(S1-S2)/2
V2=(S2-S3)/2
a=(V1-V2)/2
G=V1+a/2
where V1 is the target velocity between displacements S1 and S2, V2 is the target velocity between displacements S2 and S3, a is the motion acceleration, and G is the motion trend of the target;
(3c) constructing the loss function of the motion-estimation network ME-CNN:
the motion trend of the target is computed from its law of motion and used as the target's training label; the Euclidean distance between the computed motion trend (Gx,Gy) and the predicted position (Px,Py) output by the motion-estimation network ME-CNN, Loss=sqrt((Gx-Px)^2+(Gy-Py)^2), constitutes the loss function of ME-CNN, where Gx is the target's motion trend in the x direction of the image coordinate system, Gy is the motion trend in the y direction, Px is the network's prediction in the x direction, and Py is the network's prediction in the y direction.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910718847.6A CN110517285B (en) | 2019-08-05 | 2019-08-05 | Large-scene minimum target tracking based on motion estimation ME-CNN network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110517285A true CN110517285A (en) | 2019-11-29 |
CN110517285B CN110517285B (en) | 2021-09-10 |
Family
ID=68624473
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910718847.6A Active CN110517285B (en) | 2019-08-05 | 2019-08-05 | Large-scene minimum target tracking based on motion estimation ME-CNN network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110517285B (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10176388B1 (en) * | 2016-11-14 | 2019-01-08 | Zoox, Inc. | Spatial and temporal information for semantic segmentation |
CN108154522A (en) * | 2016-12-05 | 2018-06-12 | 北京深鉴科技有限公司 | Target tracking system |
CN107886120A (en) * | 2017-11-03 | 2018-04-06 | 北京清瑞维航技术发展有限公司 | Method and apparatus for target detection tracking |
CN109242884A (en) * | 2018-08-14 | 2019-01-18 | 西安电子科技大学 | Remote sensing video target tracking method based on JCFNet network |
CN109376736A (en) * | 2018-09-03 | 2019-02-22 | 浙江工商大学 | A kind of small video target detection method based on depth convolutional neural networks |
CN109636829A (en) * | 2018-11-24 | 2019-04-16 | 华中科技大学 | A kind of multi-object tracking method based on semantic information and scene information |
Non-Patent Citations (1)
Title |
---|
YIN Henan, TONG Guoxiang: "A target tracking method based on CNN-AE feature extraction", Software Guide (软件导刊) *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111986233A (en) * | 2020-08-20 | 2020-11-24 | 西安电子科技大学 | Large-scene minimum target remote sensing video tracking method based on feature self-learning |
CN114066937A (en) * | 2021-11-06 | 2022-02-18 | 中国电子科技集团公司第五十四研究所 | Multi-target tracking method for large-scale remote sensing image |
CN114066937B (en) * | 2021-11-06 | 2022-09-02 | 中国电子科技集团公司第五十四研究所 | Multi-target tracking method for large-scale remote sensing image |
CN115086718A (en) * | 2022-07-19 | 2022-09-20 | 广州万协通信息技术有限公司 | Video stream encryption method and device |
Also Published As
Publication number | Publication date |
---|---|
CN110517285B (en) | 2021-09-10 |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |