CN110517285A - Large-scene minimum target tracking based on the motion estimation ME-CNN network - Google Patents

Large-scene minimum target tracking based on the motion estimation ME-CNN network

Info

Publication number
CN110517285A
CN110517285A (application CN201910718847.6A)
Authority
CN
China
Prior art keywords
target
network
cnn
training
estimation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910718847.6A
Other languages
Chinese (zh)
Other versions
CN110517285B (en)
Inventor
焦李成
杨晓岩
李阳阳
唐旭
程曦娜
刘旭
杨淑媛
冯志玺
侯彪
张丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian University of Electronic Science and Technology
Original Assignee
Xian University of Electronic Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian University of Electronic Science and Technology
Priority to CN201910718847.6A
Publication of CN110517285A
Application granted
Publication of CN110517285B
Legal status: Active
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/207Analysis of motion for motion estimation over a hierarchy of resolutions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10032Satellite or aerial image; Remote sensing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention proposes a large-scene minimum target tracking method based on the motion estimation ME-CNN network, which solves the problem of tracking extremely small targets from their motion parameters, without image registration. The implementation steps are: obtain the initial training set D for the target motion estimation network ME-CNN; construct the network ME-CNN that estimates the target's motion; compute the loss function of ME-CNN from the target's motion parameters; judge whether the current training set is the initial training set; update the training labels of the loss function; obtain the initial model for predicting the target's position; correct the predicted position; update the training set with the corrected target position, completing the tracking of one frame; and obtain the tracking result for the whole remote sensing video. The invention predicts the target's position with the deep learning network ME-CNN, avoiding large-scene image registration during tracking and the difficulty of extracting features from extremely blurred targets; it reduces the dependence on target features and improves the accuracy of target tracking in extremely blurred video.

Description

Large-scene minimum target tracking based on the motion estimation ME-CNN network
Technical field
The invention belongs to the technical field of remote sensing video processing and relates to remote sensing video target tracking of minimum targets in large scenes, specifically a large-scene minimum target remote sensing video tracking method based on the motion estimation ME-CNN network, usable for security monitoring, smart city construction, traffic monitoring and the like.
Background art
Remote sensing target tracking is an important research direction in computer vision, and tracking extremely small targets in large-scene, low-resolution remote sensing video captured by a moving satellite is a highly challenging research problem. Such remote sensing video records the daily activity of a region over a period of time. Because the satellite's shooting altitude is very high, a single video covers more than half a city, so the resolution is low and the vehicles, ships and aircraft in the video are extremely small: a vehicle may occupy only about 3x3 pixels, its contrast with the surrounding environment is extremely low, and the human eye can observe only a small bright spot. Tracking such ultra-low-pixel, extremely small targets therefore belongs to the large-scene minimum target tracking problem and is especially difficult. Moreover, because the satellite shooting the video keeps moving, the whole video drifts noticeably in one direction while some regions scale due to terrain elevation, which makes it hard to obtain the target's moving position by first performing image registration and then applying the frame difference method. All of this poses a great challenge to remote sensing video tracking of minimum targets in large scenes.
Video target tracking requires predicting the target's position and size in subsequent video frames, given its position and size in the initial frame. Most current video tracking algorithms are based on neural networks or correlation filters. Among neural network algorithms, the main idea of, for example, the CNN-SVM method is to feed the target into a multilayer neural network to learn target features and then track with the traditional SVM method; the features learned from a large amount of training data are more discriminative than hand-crafted ones. Among correlation filtering algorithms, the basic idea of, for example, the KCF method is to find a filter template, convolve the next frame's image with it, and take the search region with the maximum response as the predicted target position; this method is fast and fairly accurate.
Algorithms for tracking in natural optical video are hard to apply to remote sensing video of extremely small targets in large scenes, because the targets are tiny and blurred, and a neural network cannot learn effective target features from them. Traditional remote sensing video tracking is likewise unsuitable for video whose background constantly drifts and whose regions partially scale: the technique of image registration plus the frame difference method cannot be carried out, and since the contrast between target and surroundings is extremely low, the target is easily lost.
Summary of the invention
The object of the invention is to overcome the above shortcomings of the prior art and to propose a large-scene small target remote sensing video tracking method based on motion estimation, with low computational complexity and higher accuracy.
The invention is a large-scene minimum target remote sensing video tracking method based on the motion estimation ME-CNN network, characterized by comprising the following steps:
(1) Obtain the initial training set D for the minimum target motion estimation network ME-CNN:
Take the first F frames of the original remote sensing video A and continuously label a bounding box around the same target in each frame; arrange the top-left corner coordinates of the bounding boxes in video frame order to form the training set D;
(2) Construct the network ME-CNN for estimating the motion of the minimum target: it comprises three parallel convolution modules that extract different features from the training data, followed in sequence by a concatenation layer, a fully connected layer and an output layer;
(3) Compute the loss function of the network ME-CNN from the minimum target's motion parameters: compute the target's movement trend from its law of motion and use it as the target's training label, then compute the Euclidean distance between the training label and the prediction of the ME-CNN network as the loss function for optimizing and training ME-CNN;
(4) Judge whether the current training set is the initial training set: if it is not, go to step (5) and update the training labels in the loss function; if it is, go to step (6) and enter the loop training of the network;
(5) Update the training labels in the loss function: when the current training set is not the initial training set, recompute the training labels of the loss function from the data of the current training set, using the same method of computing training labels from the minimum target's motion parameters as in step (3); the recomputed training labels take part in training the motion estimation network ME-CNN; go to step (6);
(6) Obtain the initial model M1 for predicting the target's position: input the training set D into the target motion estimation network ME-CNN and train the network with the current loss function to obtain the initial model M1 for predicting the target's position;
(7) Correct the position predicted by the model: compute an auxiliary position offset for the target and use it to correct the position predicted by the motion estimation network ME-CNN;
(7a) Obtain the target grayscale image patch: obtain the target position (Px, Py) in the next frame from the initial model M1, extract the grayscale image patch of the target from the next frame according to (Px, Py), and normalize it to obtain the normalized target grayscale image patch;
(7b) Obtain the target position offset: grade the normalized target grayscale image patch by brightness and determine the target's position within the patch by the vertical projection method; the distance between the computed target center and the patch center is the target position offset;
(7c) Obtain the corrected target position: use the obtained target position offset to correct the position predicted by the motion estimation network ME-CNN, obtaining all corrected position coordinates of the target;
(8) Update the training set with the corrected target position and complete the tracking of one frame: append the obtained top-left corner position of the target as the last row of training set D and remove the first row of D, in a single operation, obtaining a corrected and updated training set D; this completes the training for one frame and yields the target position result for that frame;
(9) Judge whether the current frame number is less than the total number of video frames: if it is, repeat steps (4) to (9) in a loop, continuing the tracking optimization training of the target until all video frames have been traversed; otherwise, if it equals the total number of frames, end training and go to step (10);
(10) Obtain the remote sensing video target tracking result: the accumulated outputs constitute the remote sensing video target tracking result.
The invention solves the problems of high computational complexity and low tracking accuracy in existing video tracking algorithms.
Compared with the prior art, the invention has the following advantages:
(1) The motion estimation network ME-CNN used by the invention obtains the target's trajectory without first performing image registration followed by the frame difference method, and without complex image background modeling, as in conventional methods. By analyzing, with a neural network, the training set formed from the target positions of the first F frames, the network predicts the target's movement trend; no manual labeling of target positions in subsequent video frames is needed, and the network can train itself in a self-loop, which greatly reduces the complexity of the tracking algorithm and improves its practicality.
(2) The algorithm of the invention corrects the remote sensing video target position autonomously by combining the ME-CNN network with the auxiliary position offset method, and modifies the loss function of the motion estimation network according to the target's law of motion, reducing the network's computational load and improving the robustness of target tracking.
Brief description of the drawings
Fig. 1 is the implementation flowchart of the invention;
Fig. 2 is a schematic diagram of the structure of the ME-CNN network proposed by the invention;
Fig. 3 compares the predicted trajectory obtained with the invention for a minimum target in a large scene against the standard target trajectory; the prediction of the invention is the green curve, and the red curve is the accurate target trajectory.
Specific embodiments
The invention is described in detail below with reference to the drawings and specific embodiments.
Embodiment 1
Large-scene minimum target remote sensing video tracking plays an important role in security monitoring, smart city construction and traffic monitoring. The remote sensing video studied by the invention is low-resolution video of minimum targets in large scenes, captured by a moving satellite. The tracked targets are extremely blurred and tiny, with very low contrast against the surroundings; when a target is not moving, even the human eye can hardly tell that it is a vehicle. Because of the satellite's motion and the elevation changes of the imaged area, the video exhibits image translation and partial scaling, so tracking is far more difficult than in clear video and constitutes a major challenge of remote sensing video tracking. Existing methods fall into two main categories. One uses a neural network to learn and extract target features, extracts multiple search boxes in the next frame, and takes the box whose target features score highest as the target's location; because the targets here are both extremely blurred and tiny, no effective features can be extracted, so this method cannot be applied to the video of the invention. The other first performs image registration and then the frame difference method to obtain the target trajectory, then finds a filter template and convolves the next frame's image with it, taking the maximum-response region as the predicted target; because the video of the invention exhibits not only image translation but also partial scaling, registration becomes much more complex and computationally harder, and an effective motion trajectory can hardly be extracted. In view of this situation, the invention proposes, after study, a large-scene minimum target remote sensing video tracking method based on the motion estimation ME-CNN network, which, referring to Fig. 1, comprises the following steps:
(1) Obtain the initial training set D for the minimum target motion estimation network ME-CNN:
Take the first F frames of the original remote sensing video A, select a single target, and continuously label a bounding box around that same target in each frame; in the invention, the minimum target is referred to simply as the target. Arrange the top-left corner coordinates of the bounding boxes in video frame order to form the training set D. Using the image coordinate system, D is a matrix of F rows and 2 columns, each row corresponding to the target's coordinate position in one frame; the target position may be represented by the top-left corner coordinates or by the center coordinates without affecting the analysis of the target's motion.
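As a minimal sketch of this step (the corner coordinates and the function name build_training_set are illustrative, not the patent's own code):

```python
import numpy as np

def build_training_set(corners):
    """Stack the top-left corner (x, y) of the target's bounding box
    in the first F frames into an F x 2 training matrix D."""
    D = np.asarray(corners, dtype=np.float32)  # shape (F, 2), one row per frame
    assert D.ndim == 2 and D.shape[1] == 2
    return D

# e.g. F = 10 manually labeled frames
corners = [(412, 233), (414, 233), (416, 234), (418, 234), (420, 235),
           (422, 235), (424, 236), (426, 236), (428, 237), (430, 237)]
D = build_training_set(corners)  # D.shape == (10, 2)
```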
(2) Construct the network ME-CNN for estimating the motion of the minimum target: the ME-CNN network of the invention comprises three parallel convolution modules that extract different features of the training data, so as to obtain different motion features of the target, followed in sequence by a concatenation layer that fuses the extracted motion features, a fully connected layer, and an output layer that outputs the result; together these constitute the ME-CNN network. The invention uses three convolution modules to obtain different motion features of the target: a single convolution module can hardly capture the features of the whole training set, and a deeper network would suffer from vanishing gradients, so the invention widens the network instead, extracting features of the training set at multiple scales, which reduces network complexity and speeds up the network. Because the video drifts continuously and some regions scale due to elevation differences, methods such as image registration plus the frame difference method or background modeling cannot be used on this video; the ME-CNN network can nevertheless obtain the target trajectory, with lower model complexity and less computation than existing methods.
(3) Compute the loss function of the network ME-CNN from the minimum target's motion parameters: compute the target's movement trend from its law of motion and use it as the target's training label, then compute the Euclidean distance between the training label and the prediction of the ME-CNN network as the loss function for optimizing ME-CNN. In the invention, this training loss strengthens the analysis of the training data and helps the network quickly extract effective features, thereby optimizing the motion estimation network ME-CNN.
(4) Judge whether the current training set is the initial training set: if it is not, go to step (5) and update the training labels in the loss function, which then take part in network training; if it is, go to step (6) and enter the loop training of the network.
(5) Update the training labels in the loss function: since the training set D is continually updated in the subsequent step (8), the training labels in the loss function must be continually adjusted during training according to the updated D. When the current training set is not the initial one, the training labels of the loss function are recomputed from the data of the current training set; the method of computing training labels from the minimum target's motion parameters is the same as in step (3). The recomputed training labels take part in training the motion estimation network ME-CNN; go to step (6).
(6) Obtain the initial model M1 for predicting the target's position: input the training set D into the target motion estimation network ME-CNN and train the network with the current loss function to obtain the initial model M1 for predicting the target's position.
(7) Correct the position predicted by the model: compute an auxiliary position offset for the target and use it to correct the position predicted by the motion estimation network ME-CNN.
(7a) Obtain the target grayscale image patch: obtain the target position (Px, Py) in the next frame from the initial model M1, extract the grayscale image patch of the target from the next frame according to (Px, Py), and normalize it, obtaining the normalized target grayscale image patch. Because the target is extremely small and its contrast with the surroundings extremely low, judging the offset with a neural network works poorly; it is better to first extract a small target patch and then judge the offset inside it.
(7b) Obtain the target position offset: grade the normalized target grayscale image patch by brightness so that target and road appear at different brightness levels; since the contrast between the road surroundings and the target is extremely low, the target's position within the patch is determined by the vertical projection method, and the distance between the computed target center and the patch center is the target position offset (see the sketch after step (7c)).
(7c) Obtain the corrected target position: use the obtained target position offset to correct the position predicted by the motion estimation network ME-CNN, obtaining all corrected position coordinates of the target, including the position of the target's top-left corner.
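A hedged sketch of one plausible reading of steps (7a)-(7b), assuming the normalized patch is a 2-D numpy array in which the target is brighter than the road after brightness grading; the threshold value and the name position_offset are illustrative assumptions:

```python
import numpy as np

def position_offset(patch, bright_thresh=0.7):
    """Estimate how far the target center is from the patch center.

    patch: normalized grayscale patch (values in [0, 1]) centered on the
    position (Px, Py) predicted by ME-CNN.
    """
    h, w = patch.shape
    # Brightness grading: keep only pixels bright enough to be the target.
    mask = (patch >= bright_thresh).astype(np.float32)
    if mask.sum() == 0:                  # nothing above threshold
        return 0.0, 0.0

    # Projection: column sums locate the target along x,
    # row sums locate it along y.
    cx = int(np.argmax(mask.sum(axis=0)))   # target column
    cy = int(np.argmax(mask.sum(axis=1)))   # target row

    # Offset of the target center from the patch center.
    return cx - (w - 1) / 2.0, cy - (h - 1) / 2.0
```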
(8) Update the training set with the corrected target position and complete the tracking of one frame: append the obtained top-left corner position of the target as the last row of training set D and remove the first row of D, in a single operation, obtaining a corrected and updated training set D. This completes the training for one frame and yields the target position result for that frame; this loop-wise revision of the training set updates the network parameters, reduces the inter-frame error of the target box, and adapts to the target's motion.
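A minimal sketch of this sliding-window update, under the same assumptions as the earlier sketches (D is an F x 2 numpy array; update_training_set is an illustrative name):

```python
import numpy as np

def update_training_set(D, corrected_pos):
    """Append the corrected top-left corner for the new frame and drop
    the oldest row, keeping D a fixed-length F x 2 sliding window."""
    new_row = np.asarray(corrected_pos, dtype=D.dtype).reshape(1, 2)
    return np.vstack([D[1:], new_row])  # one operation: drop first row, append last
```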
(9) Judge whether the current frame number is less than the total number of video frames: if it is, repeat steps (4) to (9) in a loop, updating the model parameters again to improve the model's adaptability, and continue the tracking optimization training of the target until all video frames have been traversed; otherwise, if the current frame number equals the total, end training and go to step (10).
(10) Obtain the remote sensing video target tracking result: after training ends, the accumulated target position outputs constitute the remote sensing video target tracking result.
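Putting steps (4) through (10) together, the following hedged sketch of the self-loop tracking procedure reuses the illustrative helpers position_offset and update_training_set from above; build_me_cnn and motion_trend_labels are sketched under Embodiments 2 and 3 below, and extract_gray_patch is an assumed helper that crops the grayscale patch around a position:

```python
def track(video_frames, D, model):
    """Self-loop tracking over the frames after the first F (steps 4-10)."""
    F = D.shape[0]
    results = []
    for t in range(F, len(video_frames)):
        G = motion_trend_labels(D)                  # steps (3)/(5): label from D
        model.fit(D[None, ...], G[None, ...],       # step (6): one training pass
                  epochs=1, verbose=0)
        px, py = model.predict(D[None, ...], verbose=0)[0]   # predicted position
        patch = extract_gray_patch(video_frames[t], px, py)  # step (7a)
        dx, dy = position_offset(patch)             # step (7b): offset correction
        px, py = px + dx, py + dy                   # step (7c): corrected position
        D = update_training_set(D, (px, py))        # step (8): slide the window
        results.append((px, py))
    return results                                  # step (10): accumulated track
```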
The motion estimation network ME-CNN used by the invention obtains the target trajectory without first performing image registration followed by the frame difference method and without complex image background modeling, as in conventional methods. The proposed algorithm analyzes, with a neural network, the training set formed from the target positions of the first F frames, and can effectively extract the target's motion features. Because a network that is too deep suffers from problems such as vanishing gradients, the multi-scale ME-CNN network is used to predict the target's movement trend; no manual labeling of target positions in subsequent video frames is needed, and the network can train itself in a self-loop, which greatly reduces the complexity of the tracking algorithm, improves its practicality, and allows the target position to be found quickly and accurately, without image registration, through the target's motion estimation network. By combining the ME-CNN network with the auxiliary position offset method, the remote sensing video target position is judged autonomously; the target's velocity is obtained from its motion and its likely movement trend analyzed, and the loss function of the motion estimation network is modified accordingly, improving the robustness of target tracking.
The invention performs motion analysis on extremely blurred targets by means of deep learning, predicts the direction of the next step, and then corrects the motion estimation network with a position offset; no subsequent labels are needed to track the target. It thereby avoids large-scene image registration during tracking and the difficulty of extracting features from extremely blurred targets, significantly improves tracking accuracy in extremely blurred video, and is also applicable to tracking in various other remote sensing videos.
Embodiment 2
The large-scene minimum target remote sensing video tracking method based on the motion estimation ME-CNN network is the same as in Embodiment 1. The construction, described in step (2), of the network ME-CNN that estimates the minimum target's motion comprises the following steps (see Fig. 2):
(2a) Overall structure of the motion estimation network: the motion estimation network ME-CNN comprises three parallel convolution modules that extract different motion features, followed in sequence by a concatenation layer, a fully connected layer and an output layer. In constructing the network ME-CNN that estimates the minimum target's motion, the invention uses the concatenation layer to fuse the extracted motion features, the fully connected layer to refine and analyze them, and the output layer to produce the result.
(2b) Structure of the three parallel convolution modules: the three parallel convolution modules are convolution module I, convolution module II and convolution module III, where
Convolution module I comprises a locally connected LocallyConnected1D convolutional layer with stride 2, which extracts the target's coordinate position information;
Convolution module II comprises a dilated convolution with stride 1;
Convolution module III comprises a one-dimensional convolution with stride 2;
Convolution modules I, II and III obtain position features of the target at different scales, yielding three outputs, which are concatenated into a fused convolution result; this is then fed into the fully connected layer and the output layer to obtain the final prediction. The invention uses three convolution modules to obtain different motion features of the target: a single convolution module can hardly capture the features of the whole training set, and a deeper network would suffer from vanishing gradients, so the invention widens the network instead, extracting features of the training set at multiple scales, which reduces network complexity and speeds up the network. Because the video drifts continuously and some regions scale due to elevation differences, methods such as image registration plus the frame difference method or background modeling cannot be used on this video; the ME-CNN network can nevertheless obtain the target trajectory, with lower model complexity and less computation than existing methods.
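Since Embodiment 5 names Keras and Python as the software platform, a minimal Keras sketch consistent with this description is possible; the filter count, kernel size, dilation rate and dense-layer width are not specified in the patent and are illustrative assumptions, and LocallyConnected1D assumes a TensorFlow/Keras version that still ships that layer:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def euclidean_loss(y_true, y_pred):
    # Loss of step (3): Euclidean distance between trend label and prediction.
    return tf.sqrt(tf.reduce_sum(tf.square(y_true - y_pred), axis=-1))

def build_me_cnn(F=10, filters=16, kernel=3):
    """Three parallel 1-D convolution branches over the F x 2 coordinate
    training set, concatenated and refined into a 2-D position output."""
    inp = layers.Input(shape=(F, 2))                      # F rows of (x, y)

    # Module I: locally connected 1-D convolution, stride 2.
    b1 = layers.LocallyConnected1D(filters, kernel, strides=2,
                                   activation='relu')(inp)
    # Module II: dilated (atrous) convolution, stride 1.
    b2 = layers.Conv1D(filters, kernel, strides=1, dilation_rate=2,
                       padding='same', activation='relu')(inp)
    # Module III: ordinary one-dimensional convolution, stride 2.
    b3 = layers.Conv1D(filters, kernel, strides=2,
                       padding='same', activation='relu')(inp)

    # Concatenation layer: fuse the three multi-scale motion features.
    merged = layers.concatenate([layers.Flatten()(b) for b in (b1, b2, b3)])

    # Fully connected layer and output layer: predicted (Px, Py).
    out = layers.Dense(2)(layers.Dense(64, activation='relu')(merged))

    model = models.Model(inp, out)
    model.compile(optimizer='adam', loss=euclidean_loss)
    return model
```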
Embodiment 3
The large-scene minimum target remote sensing video tracking method based on the motion estimation ME-CNN network is the same as in Embodiments 1-2. Computing the loss function of the network ME-CNN from the minimum target's motion parameters, described in step (3), roughly analyzes the target's motion by processing the data of training set D and thereby guides the optimization direction of the motion estimation network ME-CNN. It comprises the following steps:
(3a) Obtain the target displacements of training set D: take the data of rows F, F-2 and F-4 of training set D and subtract the first row of D from each, obtaining the target displacements of frames F, F-2 and F-4 relative to the first frame, denoted S1, S2 and S3 in turn: S1 is the target displacement between frame F and the first frame, S2 between frame F-2 and the first frame, and S3 between frame F-4 and the first frame. If the training set is not the initial training set but has been updated i times, the frame number corresponding to each row changes accordingly, becoming frames 1+i, 2+i, ..., F+i; taking the data of rows F, F-2 and F-4 of D and subtracting the first row of D then yields the displacements of frames F+i, F+i-2 and F+i-4 relative to the first row's frame, again denoted S1, S2 and S3 in turn.
(3b) Obtain the movement trend of the target:
According to the target's law of motion, the obtained target displacements are used to compute the movement trend (Gx, Gy) of the target from the following formulas, separately in the x and y directions of the image coordinate system:
V1 = (S1 - S2)/2
V2 = (S2 - S3)/2
a = (V1 - V2)/2
G = V1 + a/2
The image coordinate system is used in the invention: its origin is the top-left corner of the image, with the x direction horizontal to the right and the y direction vertical downward. In the above formulas, V1 is the target's velocity between displacements S1 and S2, V2 the target's velocity between displacements S2 and S3, a the acceleration of the motion, and G the movement trend of the target.
(3c) Construct the loss function of the motion estimation network ME-CNN:
The movement trend of the target is computed from its law of motion and used as the target's training label; the Euclidean distance between the computed target movement trend (Gx, Gy) and the predicted position (Px, Py) output by the motion estimation network ME-CNN is constructed as the loss function of ME-CNN:

Loss = sqrt((Gx - Px)^2 + (Gy - Py)^2)

In the formula, Gx is the target's movement trend in the x direction of the image coordinate system, Gy the movement trend in the y direction, Px the prediction of the motion estimation network in the x direction of the image coordinate system, and Py its prediction in the y direction.
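A minimal numpy sketch of steps (3a)-(3c), assuming D is the F x 2 training matrix from the earlier sketch; the function names are illustrative:

```python
import numpy as np

def motion_trend_labels(D):
    """Compute the movement-trend label (Gx, Gy) from training set D."""
    F = D.shape[0]
    S1 = D[F - 1] - D[0]        # displacement: row F vs. first row
    S2 = D[F - 3] - D[0]        # row F-2 vs. first row
    S3 = D[F - 5] - D[0]        # row F-4 vs. first row
    V1 = (S1 - S2) / 2.0        # velocity between displacements S1 and S2
    V2 = (S2 - S3) / 2.0        # velocity between displacements S2 and S3
    a = (V1 - V2) / 2.0         # acceleration of the motion
    return V1 + a / 2.0         # movement trend G = (Gx, Gy)

def me_cnn_loss(G, P):
    """Euclidean-distance loss between trend label G and prediction P."""
    return float(np.sqrt(np.sum((np.asarray(G) - np.asarray(P)) ** 2)))
```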
A comprehensive example is given below to further describe the invention.
Embodiment 4
The large-scene minimum target remote sensing video tracking method based on the motion estimation ME-CNN network is the same as in Embodiments 1-3.
Referring to Fig. 1, a large-scene minimum target remote sensing video tracking method based on the motion estimation ME-CNN network comprises the following steps:
(1) Obtain the initial training set D for the minimum target motion estimation network ME-CNN:
Take the first F frames of the original remote sensing video A and continuously label a bounding box around one target in each frame; stack the top-left corner coordinates of the bounding boxes to form the training set D, which is a matrix of F rows and 2 columns, each row corresponding to the target's coordinates in one frame of the video. The target position may be represented by the top-left corner coordinates or by the center coordinates without affecting the analysis of the target's motion; in the invention, the minimum target is referred to simply as the target.
(2) Construct the network ME-CNN for estimating the motion of the minimum target: it comprises three parallel convolution modules that extract different features of the training data, so as to obtain different motion features of the target. A single convolutional layer can hardly capture the features of the whole training set, and a deeper network would suffer from vanishing gradients, so the network is widened: extracting features of the training set at multiple scales reduces network complexity and speeds up the network. The modules are followed in sequence by a concatenation layer that fuses the extracted motion features, a fully connected layer for analysis, and an output layer that produces the result.
(2a) Overall structure of the motion estimation network: the motion estimation network ME-CNN comprises three parallel convolution modules, followed in sequence by a concatenation layer, a fully connected layer and an output layer;
(2b) Structure of the three parallel convolution modules: the three parallel convolution modules are convolution module I, convolution module II and convolution module III, where
Convolution module I comprises a locally connected LocallyConnected1D convolutional layer with stride 2, which extracts the target's coordinate position information;
Convolution module II comprises a dilated convolution with stride 1;
Convolution module III comprises a one-dimensional convolution with stride 2;
Convolution modules I, II and III obtain position features of the target at different scales, yielding three outputs, which are concatenated into a fused convolution result; this is then fed into the fully connected layer and the output layer to obtain the final prediction.
(3) Construct the loss function of the minimum target motion estimation network ME-CNN: compute the target's movement trend from its law of motion and use it as the target's training label, then compute the Euclidean distance between it and the prediction of the ME-CNN network as the loss function of ME-CNN;
(3a) Obtain the target displacements of training set D: if the training set is the initial training set, take the data of rows F, F-2 and F-4 of D and subtract the first row of D from each, obtaining the target displacements of frames F, F-2 and F-4 relative to the first frame, denoted S1, S2 and S3 in turn: S1 is the target displacement between frame F and the first frame, S2 between frame F-2 and the first frame, and S3 between frame F-4 and the first frame. If the training set is not the initial one but has been updated i times, the frame number corresponding to each row changes accordingly, becoming frames 1+i, 2+i, ..., F+i; taking the data of rows F, F-2 and F-4 of D and subtracting the first row of D then yields the displacements of frames F+i, F+i-2 and F+i-4 relative to the first row's frame, again denoted S1, S2 and S3 in turn.
(3b) Obtain the movement trend of the target:
According to the target's law of motion, the obtained target displacements of the training data are used to compute the movement trend (Gx, Gy) of the target from the following formulas, separately in the x and y directions of the image coordinate system:
V1 = (S1 - S2)/2
V2 = (S2 - S3)/2
a = (V1 - V2)/2
G = V1 + a/2
(3c) Construct the loss function of the motion estimation network ME-CNN:
The Euclidean distance between the computed target movement trend (Gx, Gy) and the predicted position (Px, Py) output by the motion estimation network is constructed as the loss function of the motion estimation network ME-CNN.
(4) Update the training labels in the loss function: since the training set D is continually updated in the subsequent step (7), the training labels in the loss function must be continually adjusted during training according to the updated D before they take part in training the motion estimation network ME-CNN.
(5) Obtain the initial model M1 for predicting the target's position: input the training set D into the target motion estimation network ME-CNN and train the network with the loss function to obtain the initial model M1 for predicting the target's position.
(6) Correct the position predicted by the model: compute an auxiliary position offset for the target and use it to correct the position predicted by the motion estimation network ME-CNN.
(6a) Obtain the target grayscale image patch: obtain the target position (Px, Py) in the next frame from the initial model M1, extract the grayscale image patch of the target from the next frame according to (Px, Py), and normalize it, obtaining the normalized target grayscale image patch. Because the target is extremely small and its contrast with the surroundings extremely low, judging the offset with a neural network works poorly; it is better to first extract a small target patch and then judge the offset inside it.
(6b) Obtain the target position offset: grade the normalized target grayscale image patch by brightness and determine the target's position within the patch by the vertical projection method; the distance between the computed target center and the patch center is the target position offset.
(6c) Obtain the corrected target position: use the obtained target position offset to correct the position predicted by the motion estimation network ME-CNN, obtaining all corrected position coordinates of the target, including the position of the target's top-left corner.
(7) Update the training set with the corrected target position and complete the tracking of one frame: append the obtained top-left corner position of the target as the last row of training set D and remove the first row of D, in a single operation, obtaining a corrected and updated training set; this completes the training for one frame and yields the target position result for that frame.
(8) Obtain the remote sensing video target tracking result: repeat steps (4) to (7) in a loop, continually recomputing the training labels from the updated training set by the method of step (3) and updating the network model; iterating in this way, the tracking approximation training of the target is carried out until all video frames have been traversed, and the accumulated outputs constitute the remote sensing video target tracking result.
In this example, the target motion estimation model can also extract, from the target's motion in the preceding frames, the information of the road where the target lies, find the corresponding city at the same longitude and latitude on a map, match the trajectory to the corresponding road conditions, and predict the target's motion; by making full use of the three-dimensional road information, the target can be tracked accurately even where the road elevation changes sharply and the video partially scales. The auxiliary position offset of the target could also be obtained by training a neural network, but this requires preprocessing the target and its surroundings to obtain image patches of higher contrast before such a network can be trained.
The technical effect of the invention is further described below in conjunction with a simulation test:
Embodiment 5
The large-scene minimum target remote sensing video tracking method based on the motion estimation ME-CNN network is the same as in Embodiments 1-4.
Simulation conditions and content:
The simulation platform of the invention: an Intel Xeon E5-2630 v3 CPU with a base frequency of 2.40 GHz, 64 GB of RAM, and the Ubuntu 16.04 operating system, with Keras and Python as the software platform. Graphics cards: GeForce GTX TITAN X/PCIe/SSE2 x2.
The invention uses remote sensing video of the Derna area of Libya captured by the Jilin-1 video satellite. A vehicle in the first 10 frames of the video serves as the target; the target is labeled with a box in each image, and the positions of the boxes' top-left vertices form the training set. The target video is then tracked in simulation with the invention and with the existing KCF-based target tracking method, respectively.
Simulation content and results:
The comparison method is the existing KCF-based target tracking method. The method of the invention and the comparison method were tested under the above simulation conditions, i.e., both were used to track a vehicle target in the remote sensing video of the Derna area of Libya. The comparison between the target trajectory predicted by the ME-CNN network (green curve) and the accurate target trajectory (red curve) is shown in Fig. 3, and the results of Table 1 are as follows.
Table 1. Remote sensing video target tracking results for the Derna area of Libya

Method    Precision    IOU
KCF       63.21%       58.72%
ME-CNN    85.63%       76.51%
Analysis of simulation results:
In Table 1, Precision denotes the region overlap ratio between the target position predicted by the ME-CNN network and the labeled position; IOU denotes the percentage of frames in which the average Euclidean distance between the bounding-box center and the label center is less than a given threshold. In this example, the given threshold is 5; KCF denotes the comparison method and ME-CNN the method of the invention.
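As a hedged sketch of the distance-threshold metric just described (function name illustrative, threshold 5 as in this example):

```python
import numpy as np

def distance_percentage(pred_centers, label_centers, thresh=5.0):
    """Fraction of frames whose predicted bounding-box center lies within
    `thresh` pixels (Euclidean distance) of the labeled center."""
    d = np.linalg.norm(np.asarray(pred_centers, dtype=float)
                       - np.asarray(label_centers, dtype=float), axis=1)
    return float((d < thresh).mean())
```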
Referring to Table 1, the data comparison shows that the invention greatly improves tracking precision: the invention raises Precision from 63.21% to 85.63%.
As seen from Table 1, the percentage IOU, for which the average Euclidean distance between the bounding-box center and the label center is below the given threshold, rises from 58.72% with the KCF-based comparison method to 76.51% with the target tracking method of the invention.
Referring to Fig. 3, the red curve is the standard target trajectory and the green curve the tracking prediction produced by the invention for the same target; the minimum target in the large scene is shown in the green box. Comparing the two curves, they are highly consistent and essentially coincide, demonstrating the high tracking accuracy of the invention.
In short, the large-scene minimum target remote sensing video tracking method based on the motion estimation ME-CNN network proposed by the invention improves tracking accuracy in the case where the shooting satellite keeps moving, the video exhibits overall translation and partial scaling, the video resolution is extremely low and the target is extremely small; it solves the problem of tracking extremely small targets from motion parameters, without registration. The implementation steps are: obtain the initial training set D for the minimum target motion estimation network ME-CNN; construct the network ME-CNN that estimates the minimum target's motion; compute the loss function of ME-CNN from the minimum target's motion parameters; judge whether the current training set is the initial training set; update the training labels in the loss function; obtain the initial model M1 for predicting the target's position; correct the position predicted by the model; update the training set with the corrected target position, completing the tracking of one frame; judge whether the current frame number is less than the total number of video frames; obtain the remote sensing video target tracking result. The invention predicts the target's position with the deep learning network ME-CNN, avoiding the large-scene image registration of existing tracking methods and the difficulty of extracting features from extremely blurred targets; it reduces the dependence on target features, significantly improves tracking accuracy in extremely blurred video, and is also applicable to tracking in various other remote sensing videos.

Claims (3)

1. A large-scene minimum target tracking method based on the motion estimation ME-CNN network, characterized by comprising the following steps:
(1) Obtain the initial training set D for the minimum target motion estimation network ME-CNN:
Take the first F frames of the original remote sensing video A, continuously label a bounding box around the same target in each frame, and arrange the top-left corner coordinates of the bounding boxes in video frame order to form the training set D;
(2) Construct the network ME-CNN for estimating the motion of the minimum target: it comprises three parallel convolution modules that extract different features from the training data, followed in sequence by a concatenation layer, a fully connected layer and an output layer;
(3) Compute the loss function of the network ME-CNN from the minimum target's motion parameters: compute the target's movement trend from its law of motion and use it as the target's training label, then compute the Euclidean distance between the training label and the prediction of the ME-CNN network as the loss function for optimizing and training ME-CNN;
(4) Judge whether the current training set is the initial training set: if it is not, go to step (5) and update the training labels in the loss function; if it is, go to step (6) and enter the loop training of the network;
(5) Update the training labels in the loss function: when the current training set is not the initial training set, recompute the training labels of the loss function from the data of the current training set, using the same method of computing training labels from the minimum target's motion parameters as in step (3); the recomputed training labels take part in training the motion estimation network ME-CNN; go to step (6);
(6) Obtain the initial model M1 for predicting the target's position: input the training set D into the target motion estimation network ME-CNN and train the network with the current loss function to obtain the initial model M1 for predicting the target's position;
(7) Correct the position predicted by the model: compute an auxiliary position offset for the target and use it to correct the position predicted by the motion estimation network ME-CNN;
(7a) Obtain the target grayscale image patch: obtain the target position (Px, Py) in the next frame from the initial model M1, extract the grayscale image patch of the target from the next frame according to (Px, Py), and normalize it to obtain the normalized target grayscale image patch;
(7b) Obtain the target position offset: grade the normalized target grayscale image patch by brightness and determine the target's position within the patch by the vertical projection method; the distance between the computed target center and the patch center is the target position offset;
(7c) Obtain the corrected target position: use the obtained target position offset to correct the position predicted by the motion estimation network ME-CNN, obtaining all corrected position coordinates of the target;
(8) Update the training set with the corrected target position and complete the tracking of one frame: append the obtained top-left corner position of the target as the last row of training set D and remove the first row of D, in a single operation, obtaining a corrected and updated training set D; this completes the training for one frame and yields the target position result for that frame;
(9) Judge whether the current frame number is less than the total number of video frames: if it is, repeat steps (4) to (9) in a loop, carrying out the tracking optimization training of the target until all video frames have been traversed; if it equals the total number of frames, end training and go to step (10);
(10) Obtain the remote sensing video target tracking result: the accumulated outputs constitute the remote sensing video target tracking result.
2. The large-scene minimum target remote sensing video tracking method based on the motion estimation ME-CNN network according to claim 1, characterized in that the construction, described in step (2), of the network ME-CNN estimating the minimum target's motion comprises the following steps:
(2a) Overall structure of the motion estimation network: the motion estimation network ME-CNN comprises three parallel convolution modules, followed in sequence by a concatenation layer, a fully connected layer and an output layer;
(2b) Structure of the three parallel convolution modules: the three parallel convolution modules are convolution module I, convolution module II and convolution module III, where
Convolution module I comprises a locally connected LocallyConnected1D convolutional layer with stride 2, which extracts the target's coordinate position information;
Convolution module II comprises a dilated convolution with stride 1;
Convolution module III comprises a one-dimensional convolution with stride 2;
Convolution modules I, II and III obtain position features of the target at different scales, yielding three outputs, which are concatenated into a fused convolution result; this is then fed into the fully connected layer and the output layer to obtain the final prediction.
3. The large-scene minimum target remote sensing video tracking method based on the motion estimation ME-CNN network according to claim 1, characterized in that computing the loss function of the network ME-CNN from the minimum target's motion parameters, described in step (3), comprises the following steps:
(3a) Obtain the target displacements of training set D: take the data of rows F, F-2 and F-4 of training set D and subtract the first row of D from each, obtaining the target displacements of frames F, F-2 and F-4 relative to the first frame, denoted S1, S2 and S3 in turn;
(3b) Obtain the movement trend of the target:
According to the target's law of motion, the obtained displacements are used to compute the movement trend (Gx, Gy) of the target from the following formulas, separately in the x and y directions of the image coordinate system:
V1 = (S1 - S2)/2
V2 = (S2 - S3)/2
a = (V1 - V2)/2
G = V1 + a/2
In the formulas, V1 is the target's velocity between displacements S1 and S2, V2 the target's velocity between displacements S2 and S3, a the acceleration of the motion, and G the movement trend of the target.
(3c) Construct the loss function of the motion estimation network ME-CNN:
The movement trend of the target is computed from its law of motion and used as the target's training label; the Euclidean distance between the computed target movement trend (Gx, Gy) and the predicted position (Px, Py) output by the motion estimation network ME-CNN is constructed as the loss function of ME-CNN:

Loss = sqrt((Gx - Px)^2 + (Gy - Py)^2)
In the formula, Gx is the target's movement trend in the x direction of the image coordinate system, Gy the movement trend in the y direction, Px the prediction of the motion estimation network in the x direction of the image coordinate system, and Py its prediction in the y direction.
CN201910718847.6A 2019-08-05 2019-08-05 Large-scene minimum target tracking based on motion estimation ME-CNN network Active CN110517285B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910718847.6A CN110517285B (en) 2019-08-05 2019-08-05 Large-scene minimum target tracking based on motion estimation ME-CNN network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910718847.6A CN110517285B (en) 2019-08-05 2019-08-05 Large-scene minimum target tracking based on motion estimation ME-CNN network

Publications (2)

Publication Number Publication Date
CN110517285A true CN110517285A (en) 2019-11-29
CN110517285B CN110517285B (en) 2021-09-10

Family

ID=68624473

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910718847.6A Active CN110517285B (en) 2019-08-05 2019-08-05 Large-scene minimum target tracking based on motion estimation ME-CNN network

Country Status (1)

Country Link
CN (1) CN110517285B (en)



Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10176388B1 (en) * 2016-11-14 2019-01-08 Zoox, Inc. Spatial and temporal information for semantic segmentation
CN108154522A (en) * 2016-12-05 2018-06-12 北京深鉴科技有限公司 Target tracking system
CN107886120A (en) * 2017-11-03 2018-04-06 北京清瑞维航技术发展有限公司 Method and apparatus for target detection tracking
CN109242884A (en) * 2018-08-14 2019-01-18 西安电子科技大学 Remote sensing video target tracking method based on JCFNet network
CN109376736A (en) * 2018-09-03 2019-02-22 浙江工商大学 A kind of small video target detection method based on depth convolutional neural networks
CN109636829A (en) * 2018-11-24 2019-04-16 华中科技大学 A kind of multi-object tracking method based on semantic information and scene information

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
殷鹤楠, 佟国香: "A Target Tracking Method Based on CNN-AE Feature Extraction", 《软件导刊》 (Software Guide) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111986233A (en) * 2020-08-20 2020-11-24 西安电子科技大学 Large-scene minimum target remote sensing video tracking method based on feature self-learning
CN114066937A (en) * 2021-11-06 2022-02-18 中国电子科技集团公司第五十四研究所 Multi-target tracking method for large-scale remote sensing image
CN114066937B (en) * 2021-11-06 2022-09-02 中国电子科技集团公司第五十四研究所 Multi-target tracking method for large-scale remote sensing image
CN115086718A (en) * 2022-07-19 2022-09-20 广州万协通信息技术有限公司 Video stream encryption method and device

Also Published As

Publication number Publication date
CN110517285B (en) 2021-09-10

Similar Documents

Publication Publication Date Title
US11314973B2 (en) Lane line-based intelligent driving control method and apparatus, and electronic device
CN109902677B (en) Vehicle detection method based on deep learning
CN105405154B (en) Target object tracking based on color-structure feature
WO2020151166A1 (en) Multi-target tracking method and device, computer device and readable storage medium
CN112215128B (en) FCOS-fused R-CNN urban road environment recognition method and device
CN110032949A Target detection and localization method based on a lightweight convolutional neural network
CN112084869B (en) Compact quadrilateral representation-based building target detection method
CN110533695A Trajectory prediction device and method based on DS evidence theory
CN112487862B (en) Garage pedestrian detection method based on improved EfficientDet model
CN108710913A Automatic switch-state identification method for switchgear images based on deep learning
CN110517285A Large-scene minimum target tracking based on the motion estimation ME-CNN network
CN113516664A (en) Visual SLAM method based on semantic segmentation dynamic points
CN106778835A Airport target recognition method for remote sensing images fusing scene information and deep features
CN106373143A (en) Adaptive method and system
CN103227888B Video stabilization method based on empirical mode decomposition and multiple evaluation criteria
CN110427797B (en) Three-dimensional vehicle detection method based on geometric condition limitation
CN109492596B Pedestrian detection method and system based on K-means clustering and a region proposal network
CN109087337B (en) Long-time target tracking method and system based on hierarchical convolution characteristics
CN113643329B Siamese attention network-based online-update target tracking method and system
CN111126278A (en) Target detection model optimization and acceleration method for few-category scene
CN115797736A (en) Method, device, equipment and medium for training target detection model and target detection
CN114022837A (en) Station left article detection method and device, electronic equipment and storage medium
CN113033482A (en) Traffic sign detection method based on regional attention
CN117949942A (en) Target tracking method and system based on fusion of radar data and video data
CN117853955A (en) Unmanned aerial vehicle small target detection method based on improved YOLOv5

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant