CN110517285A - Large-scene minimum target tracking based on the motion estimation ME-CNN network - Google Patents

Large-scene minimum target tracking based on the motion estimation ME-CNN network

Info

Publication number
CN110517285A
CN110517285A (application CN201910718847.6A)
Authority
CN
China
Prior art keywords
target
network
cnn
training
estimation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910718847.6A
Other languages
Chinese (zh)
Other versions
CN110517285B (en)
Inventor
焦李成
杨晓岩
李阳阳
唐旭
程曦娜
刘旭
杨淑媛
冯志玺
侯彪
张丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian University of Electronic Science and Technology
Original Assignee
Xian University of Electronic Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian University of Electronic Science and Technology
Priority to CN201910718847.6A
Publication of CN110517285A
Application granted
Publication of CN110517285B
Legal status: Active
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/207Analysis of motion for motion estimation over a hierarchy of resolutions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10032Satellite or aerial image; Remote sensing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention proposes a large-scene minimum target tracking method based on the motion estimation ME-CNN network, which solves the problem of tracking extremely small targets from their motion parameters, without image registration. The implementation steps are: obtain the initial training set D for the target motion estimation network ME-CNN; construct the network ME-CNN that estimates the target's motion; compute the loss function of ME-CNN from the target's motion parameters; judge whether the current training set is the initial training set; update the training labels of the loss function; obtain the initial model for predicting the target's position; correct the predicted position; update the training set with the corrected target position, completing the tracking of one frame; and obtain the tracking result for the whole remote sensing video. The invention predicts the target's position with the deep learning network ME-CNN, avoiding large-scene image registration during tracking and the difficulty of extracting features from extremely blurred targets; it reduces the dependence on target features and improves the accuracy of target tracking in extremely blurred video.

Description

Large-scene minimum target tracking based on the motion estimation ME-CNN network
Technical field
The invention belongs to the technical field of remote sensing video processing and relates to remote sensing video target tracking of minimum targets in large scenes, specifically a large-scene minimum target remote sensing video tracking method based on the motion estimation ME-CNN network, usable for security monitoring, smart city construction, traffic monitoring and the like.
Background art
Remote sensing target tracking is an important research direction in computer vision, and tracking extremely small targets in large-scene, low-resolution remote sensing video captured by a moving satellite is a highly challenging research problem. Such remote sensing video records the daily activity of a region over a period of time. Because the satellite's shooting altitude is very high, a single video covers more than half a city, so the resolution is low and the vehicles, ships and aircraft in the video are extremely small: a vehicle may occupy only about 3x3 pixels, its contrast with the surrounding environment is extremely low, and the human eye can observe only a small bright spot. Tracking such ultra-low-pixel, extremely small targets therefore belongs to the large-scene minimum target tracking problem and is especially difficult. Moreover, because the satellite shooting the video keeps moving, the whole video drifts noticeably in one direction while some regions scale due to terrain elevation, which makes it hard to obtain the target's moving position by first performing image registration and then applying the frame difference method. All of this poses a great challenge to remote sensing video tracking of minimum targets in large scenes.
Video target tracking requires predicting the target's position and size in subsequent video frames, given its position and size in the initial frame. Most current video tracking algorithms are based on neural networks or correlation filters. Among neural network algorithms, the main idea of, for example, the CNN-SVM method is to feed the target into a multilayer neural network to learn target features and then track with the traditional SVM method; the features learned from a large amount of training data are more discriminative than hand-crafted ones. Among correlation filtering algorithms, the basic idea of, for example, the KCF method is to find a filter template, convolve the next frame's image with it, and take the search region with the maximum response as the predicted target position; this method is fast and fairly accurate.
Algorithms for tracking in natural optical video are hard to apply to remote sensing video of extremely small targets in large scenes, because the targets are tiny and blurred, and a neural network cannot learn effective target features from them. Traditional remote sensing video tracking is likewise unsuitable for video whose background constantly drifts and whose regions partially scale: the technique of image registration plus the frame difference method cannot be carried out, and since the contrast between target and surroundings is extremely low, the target is easily lost.
Summary of the invention
The object of the invention is to overcome the above shortcomings of the prior art and to propose a large-scene small target remote sensing video tracking method based on motion estimation, with low computational complexity and higher accuracy.
The invention is a large-scene minimum target remote sensing video tracking method based on the motion estimation ME-CNN network, characterized by comprising the following steps:
(1) Obtain the initial training set D for the minimum target motion estimation network ME-CNN:
Take the first F frames of the original remote sensing video A and continuously label a bounding box around the same target in each frame; arrange the top-left corner coordinates of the bounding boxes in video frame order to form the training set D;
(2) Construct the network ME-CNN for estimating the motion of the minimum target: it comprises three parallel convolution modules that extract different features from the training data, followed in sequence by a concatenation layer, a fully connected layer and an output layer;
(3) Compute the loss function of the network ME-CNN from the minimum target's motion parameters: compute the target's movement trend from its law of motion and use it as the target's training label, then compute the Euclidean distance between the training label and the prediction of the ME-CNN network as the loss function for optimizing and training ME-CNN;
(4) Judge whether the current training set is the initial training set: if it is not, go to step (5) and update the training labels in the loss function; if it is, go to step (6) and enter the loop training of the network;
(5) Update the training labels in the loss function: when the current training set is not the initial training set, recompute the training labels of the loss function from the data of the current training set, using the same method of computing training labels from the minimum target's motion parameters as in step (3); the recomputed training labels take part in training the motion estimation network ME-CNN; go to step (6);
(6) Obtain the initial model M1 for predicting the target's position: input the training set D into the target motion estimation network ME-CNN and train the network with the current loss function to obtain the initial model M1 for predicting the target's position;
(7) Correct the position predicted by the model: compute an auxiliary position offset for the target and use it to correct the position predicted by the motion estimation network ME-CNN;
(7a) Obtain the target grayscale image patch: obtain the target position (Px, Py) in the next frame from the initial model M1, extract the grayscale image patch of the target from the next frame according to (Px, Py), and normalize it to obtain the normalized target grayscale image patch;
(7b) Obtain the target position offset: grade the normalized target grayscale image patch by brightness and determine the target's position within the patch by the vertical projection method; the distance between the computed target center and the patch center is the target position offset;
(7c) Obtain the corrected target position: use the obtained target position offset to correct the position predicted by the motion estimation network ME-CNN, obtaining all corrected position coordinates of the target;
(8) Update the training set with the corrected target position and complete the tracking of one frame: append the obtained top-left corner position of the target as the last row of training set D and remove the first row of D, in a single operation, obtaining a corrected and updated training set D; this completes the training for one frame and yields the target position result for that frame;
(9) Judge whether the current frame number is less than the total number of video frames: if it is, repeat steps (4) to (9) in a loop, continuing the tracking optimization training of the target until all video frames have been traversed; otherwise, if it equals the total number of frames, end training and go to step (10);
(10) Obtain the remote sensing video target tracking result: the accumulated outputs constitute the remote sensing video target tracking result.
The invention solves the problems of high computational complexity and low tracking accuracy in existing video tracking algorithms.
Compared with the prior art, the invention has the following advantages:
(1) The motion estimation network ME-CNN used by the invention obtains the target's trajectory without first performing image registration followed by the frame difference method, and without complex image background modeling, as in conventional methods. By analyzing, with a neural network, the training set formed from the target positions of the first F frames, the network predicts the target's movement trend; no manual labeling of target positions in subsequent video frames is needed, and the network can train itself in a self-loop, which greatly reduces the complexity of the tracking algorithm and improves its practicality.
(2) The algorithm of the invention corrects the remote sensing video target position autonomously by combining the ME-CNN network with the auxiliary position offset method, and modifies the loss function of the motion estimation network according to the target's law of motion, reducing the network's computational load and improving the robustness of target tracking.
Brief description of the drawings
Fig. 1 is the implementation flowchart of the invention;
Fig. 2 is a schematic diagram of the structure of the ME-CNN network proposed by the invention;
Fig. 3 compares the predicted trajectory obtained with the invention for a minimum target in a large scene against the standard target trajectory; the prediction of the invention is the green curve, and the red curve is the accurate target trajectory.
Specific embodiments
The invention is described in detail below with reference to the drawings and specific embodiments.
Embodiment 1
Large-scene minimum target remote sensing video tracking plays an important role in security monitoring, smart city construction and traffic monitoring. The remote sensing video studied by the invention is low-resolution video of minimum targets in large scenes, captured by a moving satellite. The tracked targets are extremely blurred and tiny, with very low contrast against the surroundings; when a target is not moving, even the human eye can hardly tell that it is a vehicle. Because of the satellite's motion and the elevation changes of the imaged area, the video exhibits image translation and partial scaling, so tracking is far more difficult than in clear video and constitutes a major challenge of remote sensing video tracking. Existing methods fall into two main categories. One uses a neural network to learn and extract target features, extracts multiple search boxes in the next frame, and takes the box whose target features score highest as the target's location; because the targets here are both extremely blurred and tiny, no effective features can be extracted, so this method cannot be applied to the video of the invention. The other first performs image registration and then the frame difference method to obtain the target trajectory, then finds a filter template and convolves the next frame's image with it, taking the maximum-response region as the predicted target; because the video of the invention exhibits not only image translation but also partial scaling, registration becomes much more complex and computationally harder, and an effective motion trajectory can hardly be extracted. In view of this situation, the invention proposes, after study, a large-scene minimum target remote sensing video tracking method based on the motion estimation ME-CNN network, which, referring to Fig. 1, comprises the following steps:
(1) Obtain the initial training set D for the minimum target motion estimation network ME-CNN:
Take the first F frames of the original remote sensing video A, select a single target, and continuously label a bounding box around that same target in each frame; in the invention, the minimum target is referred to simply as the target. Arrange the top-left corner coordinates of the bounding boxes in video frame order to form the training set D. Using the image coordinate system, D is a matrix of F rows and 2 columns, each row corresponding to the target's coordinate position in one frame; the target position may be represented by the top-left corner coordinates or by the center coordinates without affecting the analysis of the target's motion.
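As a minimal sketch of this step (the corner coordinates and the function name build_training_set are illustrative, not the patent's own code):

```python
import numpy as np

def build_training_set(corners):
    """Stack the top-left corner (x, y) of the target's bounding box
    in the first F frames into an F x 2 training matrix D."""
    D = np.asarray(corners, dtype=np.float32)  # shape (F, 2), one row per frame
    assert D.ndim == 2 and D.shape[1] == 2
    return D

# e.g. F = 10 manually labeled frames
corners = [(412, 233), (414, 233), (416, 234), (418, 234), (420, 235),
           (422, 235), (424, 236), (426, 236), (428, 237), (430, 237)]
D = build_training_set(corners)  # D.shape == (10, 2)
```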
(2) Construct the network ME-CNN for estimating the motion of the minimum target: the ME-CNN network of the invention comprises three parallel convolution modules that extract different features of the training data, so as to obtain different motion features of the target, followed in sequence by a concatenation layer that fuses the extracted motion features, a fully connected layer, and an output layer that outputs the result; together these constitute the ME-CNN network. The invention uses three convolution modules to obtain different motion features of the target: a single convolution module can hardly capture the features of the whole training set, and a deeper network would suffer from vanishing gradients, so the invention widens the network instead, extracting features of the training set at multiple scales, which reduces network complexity and speeds up the network. Because the video drifts continuously and some regions scale due to elevation differences, methods such as image registration plus the frame difference method or background modeling cannot be used on this video; the ME-CNN network can nevertheless obtain the target trajectory, with lower model complexity and less computation than existing methods.
(3) Compute the loss function of the network ME-CNN from the minimum target's motion parameters: compute the target's movement trend from its law of motion and use it as the target's training label, then compute the Euclidean distance between the training label and the prediction of the ME-CNN network as the loss function for optimizing ME-CNN. In the invention, this training loss strengthens the analysis of the training data and helps the network quickly extract effective features, thereby optimizing the motion estimation network ME-CNN.
(4) Judge whether the current training set is the initial training set: if it is not, go to step (5) and update the training labels in the loss function, which then take part in network training; if it is, go to step (6) and enter the loop training of the network.
(5) Update the training labels in the loss function: since the training set D is continually updated in the subsequent step (8), the training labels in the loss function must be continually adjusted during training according to the updated D. When the current training set is not the initial one, the training labels of the loss function are recomputed from the data of the current training set; the method of computing training labels from the minimum target's motion parameters is the same as in step (3). The recomputed training labels take part in training the motion estimation network ME-CNN; go to step (6).
(6) Obtain the initial model M1 for predicting the target's position: input the training set D into the target motion estimation network ME-CNN and train the network with the current loss function to obtain the initial model M1 for predicting the target's position.
(7) Correct the position predicted by the model: compute an auxiliary position offset for the target and use it to correct the position predicted by the motion estimation network ME-CNN.
(7a) Obtain the target grayscale image patch: obtain the target position (Px, Py) in the next frame from the initial model M1, extract the grayscale image patch of the target from the next frame according to (Px, Py), and normalize it, obtaining the normalized target grayscale image patch. Because the target is extremely small and its contrast with the surroundings extremely low, judging the offset with a neural network works poorly; it is better to first extract a small target patch and then judge the offset inside it.
(7b) Obtain the target position offset: grade the normalized target grayscale image patch by brightness so that target and road appear at different brightness levels; since the contrast between the road surroundings and the target is extremely low, the target's position within the patch is determined by the vertical projection method, and the distance between the computed target center and the patch center is the target position offset (see the sketch after step (7c)).
(7c) Obtain the corrected target position: use the obtained target position offset to correct the position predicted by the motion estimation network ME-CNN, obtaining all corrected position coordinates of the target, including the position of the target's top-left corner.
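A hedged sketch of one plausible reading of steps (7a)-(7b), assuming the normalized patch is a 2-D numpy array in which the target is brighter than the road after brightness grading; the threshold value and the name position_offset are illustrative assumptions:

```python
import numpy as np

def position_offset(patch, bright_thresh=0.7):
    """Estimate how far the target center is from the patch center.

    patch: normalized grayscale patch (values in [0, 1]) centered on the
    position (Px, Py) predicted by ME-CNN.
    """
    h, w = patch.shape
    # Brightness grading: keep only pixels bright enough to be the target.
    mask = (patch >= bright_thresh).astype(np.float32)
    if mask.sum() == 0:                  # nothing above threshold
        return 0.0, 0.0

    # Projection: column sums locate the target along x,
    # row sums locate it along y.
    cx = int(np.argmax(mask.sum(axis=0)))   # target column
    cy = int(np.argmax(mask.sum(axis=1)))   # target row

    # Offset of the target center from the patch center.
    return cx - (w - 1) / 2.0, cy - (h - 1) / 2.0
```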
(8) Update the training set with the corrected target position and complete the tracking of one frame: append the obtained top-left corner position of the target as the last row of training set D and remove the first row of D, in a single operation, obtaining a corrected and updated training set D. This completes the training for one frame and yields the target position result for that frame; this loop-wise revision of the training set updates the network parameters, reduces the inter-frame error of the target box, and adapts to the target's motion.
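A minimal sketch of this sliding-window update, under the same assumptions as the earlier sketches (D is an F x 2 numpy array; update_training_set is an illustrative name):

```python
import numpy as np

def update_training_set(D, corrected_pos):
    """Append the corrected top-left corner for the new frame and drop
    the oldest row, keeping D a fixed-length F x 2 sliding window."""
    new_row = np.asarray(corrected_pos, dtype=D.dtype).reshape(1, 2)
    return np.vstack([D[1:], new_row])  # one operation: drop first row, append last
```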
(9) Judge whether the current frame number is less than the total number of video frames: if it is, repeat steps (4) to (9) in a loop, updating the model parameters again to improve the model's adaptability, and continue the tracking optimization training of the target until all video frames have been traversed; otherwise, if the current frame number equals the total, end training and go to step (10).
(10) Obtain the remote sensing video target tracking result: after training ends, the accumulated target position outputs constitute the remote sensing video target tracking result.
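Putting steps (4) through (10) together, the following hedged sketch of the self-loop tracking procedure reuses the illustrative helpers position_offset and update_training_set from above; build_me_cnn and motion_trend_labels are sketched under Embodiments 2 and 3 below, and extract_gray_patch is an assumed helper that crops the grayscale patch around a position:

```python
def track(video_frames, D, model):
    """Self-loop tracking over the frames after the first F (steps 4-10)."""
    F = D.shape[0]
    results = []
    for t in range(F, len(video_frames)):
        G = motion_trend_labels(D)                  # steps (3)/(5): label from D
        model.fit(D[None, ...], G[None, ...],       # step (6): one training pass
                  epochs=1, verbose=0)
        px, py = model.predict(D[None, ...], verbose=0)[0]   # predicted position
        patch = extract_gray_patch(video_frames[t], px, py)  # step (7a)
        dx, dy = position_offset(patch)             # step (7b): offset correction
        px, py = px + dx, py + dy                   # step (7c): corrected position
        D = update_training_set(D, (px, py))        # step (8): slide the window
        results.append((px, py))
    return results                                  # step (10): accumulated track
```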
The motion estimation network ME-CNN used by the invention obtains the target trajectory without first performing image registration followed by the frame difference method and without complex image background modeling, as in conventional methods. The proposed algorithm analyzes, with a neural network, the training set formed from the target positions of the first F frames, and can effectively extract the target's motion features. Because a network that is too deep suffers from problems such as vanishing gradients, the multi-scale ME-CNN network is used to predict the target's movement trend; no manual labeling of target positions in subsequent video frames is needed, and the network can train itself in a self-loop, which greatly reduces the complexity of the tracking algorithm, improves its practicality, and allows the target position to be found quickly and accurately, without image registration, through the target's motion estimation network. By combining the ME-CNN network with the auxiliary position offset method, the remote sensing video target position is judged autonomously; the target's velocity is obtained from its motion and its likely movement trend analyzed, and the loss function of the motion estimation network is modified accordingly, improving the robustness of target tracking.
The invention performs motion analysis on extremely blurred targets by means of deep learning, predicts the direction of the next step, and then corrects the motion estimation network with a position offset; no subsequent labels are needed to track the target. It thereby avoids large-scene image registration during tracking and the difficulty of extracting features from extremely blurred targets, significantly improves tracking accuracy in extremely blurred video, and is also applicable to tracking in various other remote sensing videos.
Embodiment 2
The large-scene minimum target remote sensing video tracking method based on the motion estimation ME-CNN network is the same as in Embodiment 1. The construction, described in step (2), of the network ME-CNN that estimates the minimum target's motion comprises the following steps (see Fig. 2):
(2a) Overall structure of the motion estimation network: the motion estimation network ME-CNN comprises three parallel convolution modules that extract different motion features, followed in sequence by a concatenation layer, a fully connected layer and an output layer. In constructing the network ME-CNN that estimates the minimum target's motion, the invention uses the concatenation layer to fuse the extracted motion features, the fully connected layer to refine and analyze them, and the output layer to produce the result.
(2b) Structure of the three parallel convolution modules: the three parallel convolution modules are convolution module I, convolution module II and convolution module III, where
Convolution module I comprises a locally connected LocallyConnected1D convolutional layer with stride 2, which extracts the target's coordinate position information;
Convolution module II comprises a dilated convolution with stride 1;
Convolution module III comprises a one-dimensional convolution with stride 2;
Convolution modules I, II and III obtain position features of the target at different scales, yielding three outputs, which are concatenated into a fused convolution result; this is then fed into the fully connected layer and the output layer to obtain the final prediction. The invention uses three convolution modules to obtain different motion features of the target: a single convolution module can hardly capture the features of the whole training set, and a deeper network would suffer from vanishing gradients, so the invention widens the network instead, extracting features of the training set at multiple scales, which reduces network complexity and speeds up the network. Because the video drifts continuously and some regions scale due to elevation differences, methods such as image registration plus the frame difference method or background modeling cannot be used on this video; the ME-CNN network can nevertheless obtain the target trajectory, with lower model complexity and less computation than existing methods.
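Since Embodiment 5 names Keras and Python as the software platform, a minimal Keras sketch consistent with this description is possible; the filter count, kernel size, dilation rate and dense-layer width are not specified in the patent and are illustrative assumptions, and LocallyConnected1D assumes a TensorFlow/Keras version that still ships that layer:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def euclidean_loss(y_true, y_pred):
    # Loss of step (3): Euclidean distance between trend label and prediction.
    return tf.sqrt(tf.reduce_sum(tf.square(y_true - y_pred), axis=-1))

def build_me_cnn(F=10, filters=16, kernel=3):
    """Three parallel 1-D convolution branches over the F x 2 coordinate
    training set, concatenated and refined into a 2-D position output."""
    inp = layers.Input(shape=(F, 2))                      # F rows of (x, y)

    # Module I: locally connected 1-D convolution, stride 2.
    b1 = layers.LocallyConnected1D(filters, kernel, strides=2,
                                   activation='relu')(inp)
    # Module II: dilated (atrous) convolution, stride 1.
    b2 = layers.Conv1D(filters, kernel, strides=1, dilation_rate=2,
                       padding='same', activation='relu')(inp)
    # Module III: ordinary one-dimensional convolution, stride 2.
    b3 = layers.Conv1D(filters, kernel, strides=2,
                       padding='same', activation='relu')(inp)

    # Concatenation layer: fuse the three multi-scale motion features.
    merged = layers.concatenate([layers.Flatten()(b) for b in (b1, b2, b3)])

    # Fully connected layer and output layer: predicted (Px, Py).
    out = layers.Dense(2)(layers.Dense(64, activation='relu')(merged))

    model = models.Model(inp, out)
    model.compile(optimizer='adam', loss=euclidean_loss)
    return model
```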
Embodiment 3
The large-scene minimum target remote sensing video tracking method based on the motion estimation ME-CNN network is the same as in Embodiments 1-2. Computing the loss function of the network ME-CNN from the minimum target's motion parameters, described in step (3), roughly analyzes the target's motion by processing the data of training set D and thereby guides the optimization direction of the motion estimation network ME-CNN. It comprises the following steps:
(3a) Obtain the target displacements of training set D: take the data of rows F, F-2 and F-4 of training set D and subtract the first row of D from each, obtaining the target displacements of frames F, F-2 and F-4 relative to the first frame, denoted S1, S2 and S3 in turn: S1 is the target displacement between frame F and the first frame, S2 between frame F-2 and the first frame, and S3 between frame F-4 and the first frame. If the training set is not the initial training set but has been updated i times, the frame number corresponding to each row changes accordingly, becoming frames 1+i, 2+i, ..., F+i; taking the data of rows F, F-2 and F-4 of D and subtracting the first row of D then yields the displacements of frames F+i, F+i-2 and F+i-4 relative to the first row's frame, again denoted S1, S2 and S3 in turn.
(3b) Obtain the movement trend of the target:
According to the target's law of motion, the obtained target displacements are used to compute the movement trend (Gx, Gy) of the target from the following formulas, separately in the x and y directions of the image coordinate system:
V1 = (S1 - S2)/2
V2 = (S2 - S3)/2
a = (V1 - V2)/2
G = V1 + a/2
The image coordinate system is used in the invention: its origin is the top-left corner of the image, with the x direction horizontal to the right and the y direction vertical downward. In the above formulas, V1 is the target's velocity between displacements S1 and S2, V2 the target's velocity between displacements S2 and S3, a the acceleration of the motion, and G the movement trend of the target.
(3c) Construct the loss function of the motion estimation network ME-CNN:
The movement trend of the target is computed from its law of motion and used as the target's training label; the Euclidean distance between the computed target movement trend (Gx, Gy) and the predicted position (Px, Py) output by the motion estimation network ME-CNN is constructed as the loss function of ME-CNN:

Loss = sqrt((Gx - Px)^2 + (Gy - Py)^2)

In the formula, Gx is the target's movement trend in the x direction of the image coordinate system, Gy the movement trend in the y direction, Px the prediction of the motion estimation network in the x direction of the image coordinate system, and Py its prediction in the y direction.
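A minimal numpy sketch of steps (3a)-(3c), assuming D is the F x 2 training matrix from the earlier sketch; the function names are illustrative:

```python
import numpy as np

def motion_trend_labels(D):
    """Compute the movement-trend label (Gx, Gy) from training set D."""
    F = D.shape[0]
    S1 = D[F - 1] - D[0]        # displacement: row F vs. first row
    S2 = D[F - 3] - D[0]        # row F-2 vs. first row
    S3 = D[F - 5] - D[0]        # row F-4 vs. first row
    V1 = (S1 - S2) / 2.0        # velocity between displacements S1 and S2
    V2 = (S2 - S3) / 2.0        # velocity between displacements S2 and S3
    a = (V1 - V2) / 2.0         # acceleration of the motion
    return V1 + a / 2.0         # movement trend G = (Gx, Gy)

def me_cnn_loss(G, P):
    """Euclidean-distance loss between trend label G and prediction P."""
    return float(np.sqrt(np.sum((np.asarray(G) - np.asarray(P)) ** 2)))
```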
A comprehensive example is given below to further describe the invention.
Embodiment 4
The large-scene minimum target remote sensing video tracking method based on the motion estimation ME-CNN network is the same as in Embodiments 1-3.
Referring to Fig. 1, a large-scene minimum target remote sensing video tracking method based on the motion estimation ME-CNN network comprises the following steps:
(1) Obtain the initial training set D for the minimum target motion estimation network ME-CNN:
Take the first F frames of the original remote sensing video A and continuously label a bounding box around one target in each frame; stack the top-left corner coordinates of the bounding boxes to form the training set D, which is a matrix of F rows and 2 columns, each row corresponding to the target's coordinates in one frame of the video. The target position may be represented by the top-left corner coordinates or by the center coordinates without affecting the analysis of the target's motion; in the invention, the minimum target is referred to simply as the target.
(2) Construct the network ME-CNN for estimating the motion of the minimum target: it comprises three parallel convolution modules that extract different features of the training data, so as to obtain different motion features of the target. A single convolutional layer can hardly capture the features of the whole training set, and a deeper network would suffer from vanishing gradients, so the network is widened: extracting features of the training set at multiple scales reduces network complexity and speeds up the network. The modules are followed in sequence by a concatenation layer that fuses the extracted motion features, a fully connected layer for analysis, and an output layer that produces the result.
(2a) Overall structure of the motion estimation network: the motion estimation network ME-CNN comprises three parallel convolution modules, followed in sequence by a concatenation layer, a fully connected layer and an output layer;
(2b) Structure of the three parallel convolution modules: the three parallel convolution modules are convolution module I, convolution module II and convolution module III, where
Convolution module I comprises a locally connected LocallyConnected1D convolutional layer with stride 2, which extracts the target's coordinate position information;
Convolution module II comprises a dilated convolution with stride 1;
Convolution module III comprises a one-dimensional convolution with stride 2;
Convolution modules I, II and III obtain position features of the target at different scales, yielding three outputs, which are concatenated into a fused convolution result; this is then fed into the fully connected layer and the output layer to obtain the final prediction.
(3) Construct the loss function of the minimum target motion estimation network ME-CNN: compute the target's movement trend from its law of motion and use it as the target's training label, then compute the Euclidean distance between it and the prediction of the ME-CNN network as the loss function of ME-CNN;
(3a) Obtain the target displacements of training set D: if the training set is the initial training set, take the data of rows F, F-2 and F-4 of D and subtract the first row of D from each, obtaining the target displacements of frames F, F-2 and F-4 relative to the first frame, denoted S1, S2 and S3 in turn: S1 is the target displacement between frame F and the first frame, S2 between frame F-2 and the first frame, and S3 between frame F-4 and the first frame. If the training set is not the initial one but has been updated i times, the frame number corresponding to each row changes accordingly, becoming frames 1+i, 2+i, ..., F+i; taking the data of rows F, F-2 and F-4 of D and subtracting the first row of D then yields the displacements of frames F+i, F+i-2 and F+i-4 relative to the first row's frame, again denoted S1, S2 and S3 in turn.
(3b) Obtain the movement trend of the target:
According to the target's law of motion, the obtained target displacements of the training data are used to compute the movement trend (Gx, Gy) of the target from the following formulas, separately in the x and y directions of the image coordinate system:
V1 = (S1 - S2)/2
V2 = (S2 - S3)/2
a = (V1 - V2)/2
G = V1 + a/2
(3c) Construct the loss function of the motion estimation network ME-CNN:
The Euclidean distance between the computed target movement trend (Gx, Gy) and the predicted position (Px, Py) output by the motion estimation network is constructed as the loss function of the motion estimation network ME-CNN.
(4) Update the training labels in the loss function: since the training set D is continually updated in the subsequent step (7), the training labels in the loss function must be continually adjusted during training according to the updated D before they take part in training the motion estimation network ME-CNN.
(5) Obtain the initial model M1 for predicting the target's position: input the training set D into the target motion estimation network ME-CNN and train the network with the loss function to obtain the initial model M1 for predicting the target's position.
(6) Correct the position predicted by the model: compute an auxiliary position offset for the target and use it to correct the position predicted by the motion estimation network ME-CNN.
(6a) Obtain the target grayscale image patch: obtain the target position (Px, Py) in the next frame from the initial model M1, extract the grayscale image patch of the target from the next frame according to (Px, Py), and normalize it, obtaining the normalized target grayscale image patch. Because the target is extremely small and its contrast with the surroundings extremely low, judging the offset with a neural network works poorly; it is better to first extract a small target patch and then judge the offset inside it.
(6b) Obtain the target position offset: grade the normalized target grayscale image patch by brightness and determine the target's position within the patch by the vertical projection method; the distance between the computed target center and the patch center is the target position offset.
(6c) Obtain the corrected target position: use the obtained target position offset to correct the position predicted by the motion estimation network ME-CNN, obtaining all corrected position coordinates of the target, including the position of the target's top-left corner.
(7) Update the training set with the corrected target position and complete the tracking of one frame: append the obtained top-left corner position of the target as the last row of training set D and remove the first row of D, in a single operation, obtaining a corrected and updated training set; this completes the training for one frame and yields the target position result for that frame.
(8) Obtain the remote sensing video target tracking result: repeat steps (4) to (7) in a loop, continually recomputing the training labels from the updated training set by the method of step (3) and updating the network model; iterating in this way, the tracking approximation training of the target is carried out until all video frames have been traversed, and the accumulated outputs constitute the remote sensing video target tracking result.
In this example, the target motion estimation model can also extract, from the target's motion in the preceding frames, the information of the road where the target lies, find the corresponding city at the same longitude and latitude on a map, match the trajectory to the corresponding road conditions, and predict the target's motion; by making full use of the three-dimensional road information, the target can be tracked accurately even where the road elevation changes sharply and the video partially scales. The auxiliary position offset of the target could also be obtained by training a neural network, but this requires preprocessing the target and its surroundings to obtain image patches of higher contrast before such a network can be trained.
The technical effect of the invention is further described below in conjunction with a simulation test:
Embodiment 5
The large-scene minimum target remote sensing video tracking method based on the motion estimation ME-CNN network is the same as in Embodiments 1-4.
Simulation conditions and content:
The simulation platform of the invention: an Intel Xeon E5-2630 v3 CPU with a base frequency of 2.40 GHz, 64 GB of RAM, and the Ubuntu 16.04 operating system, with Keras and Python as the software platform. Graphics cards: GeForce GTX TITAN X/PCIe/SSE2 x2.
The invention uses remote sensing video of the Derna area of Libya captured by the Jilin-1 video satellite. A vehicle in the first 10 frames of the video serves as the target; the target is labeled with a box in each image, and the positions of the boxes' top-left vertices form the training set. The target video is then tracked in simulation with the invention and with the existing KCF-based target tracking method, respectively.
Simulation content and results:
The comparison method is the existing KCF-based target tracking method. The method of the invention and the comparison method were tested under the above simulation conditions, i.e., both were used to track a vehicle target in the remote sensing video of the Derna area of Libya. The comparison between the target trajectory predicted by the ME-CNN network (green curve) and the accurate target trajectory (red curve) is shown in Fig. 3, and the results of Table 1 are as follows.
Table 1. Remote sensing video target tracking results for the Derna area of Libya

Method    Precision    IOU
KCF       63.21%       58.72%
ME-CNN    85.63%       76.51%
Analysis of simulation results:
In Table 1, Precision denotes the region overlap ratio between the target position predicted by the ME-CNN network and the labeled position; IOU denotes the percentage of frames in which the average Euclidean distance between the bounding-box center and the label center is less than a given threshold. In this example, the given threshold is 5; KCF denotes the comparison method and ME-CNN the method of the invention.
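As a hedged sketch of the distance-threshold metric just described (function name illustrative, threshold 5 as in this example):

```python
import numpy as np

def distance_percentage(pred_centers, label_centers, thresh=5.0):
    """Fraction of frames whose predicted bounding-box center lies within
    `thresh` pixels (Euclidean distance) of the labeled center."""
    d = np.linalg.norm(np.asarray(pred_centers, dtype=float)
                       - np.asarray(label_centers, dtype=float), axis=1)
    return float((d < thresh).mean())
```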
Referring to Table 1, the data comparison shows that the invention greatly improves tracking precision: the invention raises Precision from 63.21% to 85.63%.
As seen from Table 1, the percentage IOU, for which the average Euclidean distance between the bounding-box center and the label center is below the given threshold, rises from 58.72% with the KCF-based comparison method to 76.51% with the target tracking method of the invention.
Referring to Fig. 3, the red curve is the standard target trajectory and the green curve the tracking prediction produced by the invention for the same target; the minimum target in the large scene is shown in the green box. Comparing the two curves, they are highly consistent and essentially coincide, demonstrating the high tracking accuracy of the invention.
In short, the large-scene minimum target remote sensing video tracking method based on the motion estimation ME-CNN network proposed by the invention improves tracking accuracy in the case where the shooting satellite keeps moving, the video exhibits overall translation and partial scaling, the video resolution is extremely low and the target is extremely small; it solves the problem of tracking extremely small targets from motion parameters, without registration. The implementation steps are: obtain the initial training set D for the minimum target motion estimation network ME-CNN; construct the network ME-CNN that estimates the minimum target's motion; compute the loss function of ME-CNN from the minimum target's motion parameters; judge whether the current training set is the initial training set; update the training labels in the loss function; obtain the initial model M1 for predicting the target's position; correct the position predicted by the model; update the training set with the corrected target position, completing the tracking of one frame; judge whether the current frame number is less than the total number of video frames; obtain the remote sensing video target tracking result. The invention predicts the target's position with the deep learning network ME-CNN, avoiding the large-scene image registration of existing tracking methods and the difficulty of extracting features from extremely blurred targets; it reduces the dependence on target features, significantly improves tracking accuracy in extremely blurred video, and is also applicable to tracking in various other remote sensing videos.

Claims (3)

1. A large-scene minimum target tracking method based on the motion estimation ME-CNN network, characterized by comprising the following steps:
(1) Obtain the initial training set D for the minimum target motion estimation network ME-CNN:
Take the first F frames of the original remote sensing video A, continuously label a bounding box around the same target in each frame, and arrange the top-left corner coordinates of the bounding boxes in video frame order to form the training set D;
(2) Construct the network ME-CNN for estimating the motion of the minimum target: it comprises three parallel convolution modules that extract different features from the training data, followed in sequence by a concatenation layer, a fully connected layer and an output layer;
(3) Compute the loss function of the network ME-CNN from the minimum target's motion parameters: compute the target's movement trend from its law of motion and use it as the target's training label, then compute the Euclidean distance between the training label and the prediction of the ME-CNN network as the loss function for optimizing and training ME-CNN;
(4) Judge whether the current training set is the initial training set: if it is not, go to step (5) and update the training labels in the loss function; if it is, go to step (6) and enter the loop training of the network;
(5) Update the training labels in the loss function: when the current training set is not the initial training set, recompute the training labels of the loss function from the data of the current training set, using the same method of computing training labels from the minimum target's motion parameters as in step (3); the recomputed training labels take part in training the motion estimation network ME-CNN; go to step (6);
(6) Obtain the initial model M1 for predicting the target's position: input the training set D into the target motion estimation network ME-CNN and train the network with the current loss function to obtain the initial model M1 for predicting the target's position;
(7) Correct the position predicted by the model: compute an auxiliary position offset for the target and use it to correct the position predicted by the motion estimation network ME-CNN;
(7a) Obtain the target grayscale image patch: obtain the target position (Px, Py) in the next frame from the initial model M1, extract the grayscale image patch of the target from the next frame according to (Px, Py), and normalize it to obtain the normalized target grayscale image patch;
(7b) Obtain the target position offset: grade the normalized target grayscale image patch by brightness and determine the target's position within the patch by the vertical projection method; the distance between the computed target center and the patch center is the target position offset;
(7c) Obtain the corrected target position: use the obtained target position offset to correct the position predicted by the motion estimation network ME-CNN, obtaining all corrected position coordinates of the target;
(8) Update the training set with the corrected target position and complete the tracking of one frame: append the obtained top-left corner position of the target as the last row of training set D and remove the first row of D, in a single operation, obtaining a corrected and updated training set D; this completes the training for one frame and yields the target position result for that frame;
(9) Judge whether the current frame number is less than the total number of video frames: if it is, repeat steps (4) to (9) in a loop, carrying out the tracking optimization training of the target until all video frames have been traversed; if it equals the total number of frames, end training and go to step (10);
(10) Obtain the remote sensing video target tracking result: the accumulated outputs constitute the remote sensing video target tracking result.
2. The large-scene minimum target remote sensing video tracking method based on the motion estimation ME-CNN network according to claim 1, characterized in that the construction, described in step (2), of the network ME-CNN estimating the minimum target's motion comprises the following steps:
(2a) Overall structure of the motion estimation network: the motion estimation network ME-CNN comprises three parallel convolution modules, followed in sequence by a concatenation layer, a fully connected layer and an output layer;
(2b) Structure of the three parallel convolution modules: the three parallel convolution modules are convolution module I, convolution module II and convolution module III, where
Convolution module I comprises a locally connected LocallyConnected1D convolutional layer with stride 2, which extracts the target's coordinate position information;
Convolution module II comprises a dilated convolution with stride 1;
Convolution module III comprises a one-dimensional convolution with stride 2;
Convolution modules I, II and III obtain position features of the target at different scales, yielding three outputs, which are concatenated into a fused convolution result; this is then fed into the fully connected layer and the output layer to obtain the final prediction.
3. The large-scene minimum target remote sensing video tracking method based on the motion estimation ME-CNN network according to claim 1, characterized in that computing the loss function of the network ME-CNN from the minimum target's motion parameters, described in step (3), comprises the following steps:
(3a) Obtain the target displacements of training set D: take the data of rows F, F-2 and F-4 of training set D and subtract the first row of D from each, obtaining the target displacements of frames F, F-2 and F-4 relative to the first frame, denoted S1, S2 and S3 in turn;
(3b) Obtain the movement trend of the target:
According to the target's law of motion, the obtained displacements are used to compute the movement trend (Gx, Gy) of the target from the following formulas, separately in the x and y directions of the image coordinate system:
V1 = (S1 - S2)/2
V2 = (S2 - S3)/2
a = (V1 - V2)/2
G = V1 + a/2
In the formulas, V1 is the target's velocity between displacements S1 and S2, V2 the target's velocity between displacements S2 and S3, a the acceleration of the motion, and G the movement trend of the target.
(3c) Construct the loss function of the motion estimation network ME-CNN:
The movement trend of the target is computed from its law of motion and used as the target's training label; the Euclidean distance between the computed target movement trend (Gx, Gy) and the predicted position (Px, Py) output by the motion estimation network ME-CNN is constructed as the loss function of ME-CNN:

Loss = sqrt((Gx - Px)^2 + (Gy - Py)^2)
In the formula, Gx is the target's movement trend in the x direction of the image coordinate system, Gy the movement trend in the y direction, Px the prediction of the motion estimation network in the x direction of the image coordinate system, and Py its prediction in the y direction.
CN201910718847.6A 2019-08-05 2019-08-05 Large-scene minimum target tracking based on motion estimation ME-CNN network Active CN110517285B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910718847.6A CN110517285B (en) 2019-08-05 2019-08-05 Large-scene minimum target tracking based on motion estimation ME-CNN network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910718847.6A CN110517285B (en) 2019-08-05 2019-08-05 Large-scene minimum target tracking based on motion estimation ME-CNN network

Publications (2)

Publication Number Publication Date
CN110517285A true CN110517285A (en) 2019-11-29
CN110517285B CN110517285B (en) 2021-09-10

Family

ID=68624473

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910718847.6A Active CN110517285B (en) 2019-08-05 2019-08-05 Large-scene minimum target tracking based on motion estimation ME-CNN network

Country Status (1)

Country Link
CN (1) CN110517285B (en)



Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10176388B1 (en) * 2016-11-14 2019-01-08 Zoox, Inc. Spatial and temporal information for semantic segmentation
CN108154522A (en) * 2016-12-05 2018-06-12 北京深鉴科技有限公司 Target tracking system
CN107886120A (en) * 2017-11-03 2018-04-06 北京清瑞维航技术发展有限公司 Method and apparatus for target detection tracking
CN109242884A (en) * 2018-08-14 2019-01-18 西安电子科技大学 Remote sensing video target tracking method based on JCFNet network
CN109376736A (en) * 2018-09-03 2019-02-22 浙江工商大学 A kind of small video target detection method based on depth convolutional neural networks
CN109636829A (en) * 2018-11-24 2019-04-16 华中科技大学 A kind of multi-object tracking method based on semantic information and scene information

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
殷鹤楠, 佟国香: "A Target Tracking Method Based on CNN-AE Feature Extraction", 《软件导刊》 (Software Guide) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111986233A (en) * 2020-08-20 2020-11-24 西安电子科技大学 Large-scene minimum target remote sensing video tracking method based on feature self-learning
CN114066937A (en) * 2021-11-06 2022-02-18 中国电子科技集团公司第五十四研究所 Multi-target tracking method for large-scale remote sensing image
CN114066937B (en) * 2021-11-06 2022-09-02 中国电子科技集团公司第五十四研究所 Multi-target tracking method for large-scale remote sensing image
CN115086718A (en) * 2022-07-19 2022-09-20 广州万协通信息技术有限公司 Video stream encryption method and device

Also Published As

Publication number Publication date
CN110517285B (en) 2021-09-10

Similar Documents

Publication Publication Date Title
US11314973B2 (en) Lane line-based intelligent driving control method and apparatus, and electronic device
CN109902677B (en) Vehicle detection method based on deep learning
CN105405154B (en) Target object tracking based on color-structure feature
WO2020151166A1 (en) Multi-target tracking method and device, computer device and readable storage medium
CN112215128B (en) FCOS-fused R-CNN urban road environment recognition method and device
CN110032949A Target detection and localization method based on a lightweight convolutional neural network
CN112084869B (en) Compact quadrilateral representation-based building target detection method
CN110533695A Trajectory prediction device and method based on DS evidence theory
CN112487862B (en) Garage pedestrian detection method based on improved EfficientDet model
CN108710913A Automatic switch-state identification method for switchgear images based on deep learning
CN110517285A Large-scene minimum target tracking based on the motion estimation ME-CNN network
CN113516664A (en) Visual SLAM method based on semantic segmentation dynamic points
CN106778835A Airport target recognition method for remote sensing images fusing scene information and deep features
CN106373143A (en) Adaptive method and system
CN103227888B Video stabilization method based on empirical mode decomposition and multiple evaluation criteria
CN110427797B (en) Three-dimensional vehicle detection method based on geometric condition limitation
CN109492596B Pedestrian detection method and system based on K-means clustering and a region proposal network
CN109087337B (en) Long-time target tracking method and system based on hierarchical convolution characteristics
CN113643329B Siamese attention network-based online-update target tracking method and system
CN111126278A (en) Target detection model optimization and acceleration method for few-category scene
CN115797736A (en) Method, device, equipment and medium for training target detection model and target detection
CN114022837A (en) Station left article detection method and device, electronic equipment and storage medium
CN113033482A (en) Traffic sign detection method based on regional attention
CN117949942A (en) Target tracking method and system based on fusion of radar data and video data
CN117853955A (en) Unmanned aerial vehicle small target detection method based on improved YOLOv5

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant