CN106203350A - Moving target cross-scale tracking method and device - Google Patents

Moving target cross-scale tracking method and device

Info

Publication number
CN106203350A
CN106203350A (application CN201610548086.0A)
Authority
CN
China
Prior art keywords
neural network
video frame
weight
layer
formula
Prior art date
Legal status
Granted
Application number
CN201610548086.0A
Other languages
Chinese (zh)
Other versions
CN106203350B (en)
Inventor
杜军平
朱素果
任楠
Current Assignee
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN201610548086.0A priority Critical patent/CN106203350B/en
Publication of CN106203350A publication Critical patent/CN106203350A/en
Application granted granted Critical
Publication of CN106203350B publication Critical patent/CN106203350B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a moving target cross-scale tracking method and device. A neural network is built and trained layer by layer to obtain initialized weights; the true value of the first video frame is input into the established neural network to update the initial weights; starting from the second video frame, a bias term is calculated for each input video frame; according to the obtained bias term, the output value of the video frame after it is input into the neural network is calculated; it is judged whether the confidence output by the neural network for the video frame is smaller than a preset threshold, and if it is smaller, the weights of the neural network are updated and the moving target of the video frame is estimated according to the neural network with the updated weights; if it is larger, the moving target of the video frame is estimated directly. The moving target cross-scale tracking method and device can therefore achieve accurate feature extraction of the moving target, predict the position and size of the moving target, and obtain an optimal moving target tracking result.

Description

Moving target cross-scale tracking method and device
Technical Field
The invention relates to the technical field of image processing, in particular to a cross-scale tracking method and a cross-scale tracking device for a moving target.
Background
Traditional tracking methods combine different complex features with a deep neural network but do not consider the scale differences between different sampling slices, which increases the complexity of the method to a certain extent. The Deep Learning Tracking (DLT) approach learns the appearance of the target during visual tracking with a stacked denoising autoencoder (SDAE) and tracks the moving target by combining the deep network with particle filtering; its frame structure is simple, considering only a four-layer neural network framework, and it tracks simple moving targets well, but it is not suitable for moving target tracking in complex environments.
A tracking method that combines self-learning with a suspension constraint principle initializes the network through offline learning and updates the network weights online, with motion estimation based on logistic-regression probability estimation. Although the accuracy of this method is improved to some extent, it is still not robust to more challenging problems such as occlusion or partial occlusion. Hash methods can distinguish different image blocks using simple information; they are sensitive to the specific contents of an image and insensitive to its integrity. A moving target can be tracked with three basic hash methods (perceptual hashing, average hashing and difference hashing); this approach considers the characteristics of all three hash methods but also increases the complexity of the method. To reduce the complexity of the method and improve its efficiency, a two-dimensional combined hash method has been proposed as a feature-extraction method, combined with Bayesian motion estimation to complete the tracking of a moving target, but it performs well on only part of the video sequences.
Disclosure of Invention
In view of this, the present invention provides a method and an apparatus for tracking a moving target across scales, which can implement accurate feature extraction of the moving target, predict the position and size of the moving target, and obtain an optimal tracking result of the moving target.
Based on the above purpose, the invention provides a cross-scale tracking method for a moving target, which comprises the following steps:
building a neural network, and carrying out weight training layer by layer to obtain initialized weights;
inputting the true value of the first video frame into the established neural network, and updating the initial weight;
calculating a bias term for the input video frame starting from the second video frame;
calculating an output value of the video frame after the video frame is input into the neural network according to the obtained bias term of the video frame;
judging whether the confidence output by the neural network for the video frame is smaller than a preset threshold value, if so, updating the weights of the neural network, and estimating the moving target of the video frame according to the neural network with the updated weights; and if it is larger than the preset threshold, directly estimating the moving target of the video frame.
In some embodiments of the present invention, a stacked denoising autoencoder is used to train the weights layer by layer, so as to obtain the initialized weights.
In some embodiments of the present invention, the constructing the neural network and performing weight training layer by layer to obtain the initialized weight includes:
let k denote the number of training samples, i = 1, 2, …, k, and let the training sample set be {x_1, …, x_i, …, x_k}; W' and W respectively denote the weights of the hidden layer and the weights of the output layer, and b' and b denote the bias terms of the different hidden layers. From an input sample x_i, a hidden-layer representation h_i and a reconstruction x̂_i of the input can be derived, as shown in formula (1) and formula (2):

h_i = f(W'x_i + b')    (1)

x̂_i = sigm(Wh_i + b)    (2)

where f(·) denotes a nonlinear excitation function and sigm(·) denotes the excitation function of the neural network, as shown in formula (3):

sigm(y) = 1 / (1 + exp(-y))    (3)

The denoising autoencoder is obtained through learning, as shown in formula (4):

min_{W, W', b, b'} Σ_{i=1}^{k} ||x_i - x̂_i||_2^2 + γ(||W||_F^2 + ||W'||_F^2)    (4)

where Σ_{i=1}^{k} ||x_i - x̂_i||_2^2 is the reconstruction loss of the neural network and the second term is a weight penalty term; the parameter γ balances the reconstruction error against the weight penalty term, so that the relationship between the two is fully considered. Denoting the objective of formula (4) by E(w), where w collects the weights, and taking the partial derivative of E with respect to w_{ji} gives formula (5):

∂E(w)/∂w_{ji} = ∂E_0(w)/∂w_{ji} + 2γw_{ji} = -δ_j x_{ji} + 2γw_{ji}    (5)

Letting w̃_{ji} = w_{ji} - η·∂E(w)/∂w_{ji}, formula (6) and formula (7) are obtained:

w̃_{ji} = w_{ji} + η(δ_j x_{ji} - 2γw_{ji})    (6)

w̃_{ji} = (1 - 2ηγ)w_{ji} + ηδ_j x_{ji}    (7)

where η is the learning rate, δ_j is the error, and x_{ji} and w_{ji} denote the data and the weights from the i-th layer to the j-th layer, respectively.
In some embodiments of the invention, said calculating the bias term for the input video frame comprises:
sampling the video frame to obtain an initialization sample set S = {s_1, s_2, …, s_N}, i = 1, 2, …, N;

calculating the hash value l of the tracking result of the previous frame I_{t-1};

for each sample s_i, calculating the average hash value l_{s_i} of s_i;

calculating the Hamming distance between l and each sample s_i using formula (8), where N denotes the number of samples, S = {s_1, s_2, …, s_N}, i = 1, 2, …, N, and dis(l, s_i) denotes the Hamming distance between the hash value l of the tracking result of video frame I_{t-1} and the hash value l_{s_i} of the current i-th sample s_i, as shown in formula (8):

dis(l, s_i) = Σ_{i=1}^{N} l ⊗ l_{s_i}    (8)

obtaining the bias term corresponding to sample s_i from the obtained Hamming distance according to formula (9):

b_i = 1 - dis(l, s_i) / Σ_{t=1}^{N} dis(l, s_t)    (9)

where b_i denotes the bias term of the i-th sample.
In some embodiments of the present invention, said estimating the moving object of the video frame according to the neural network with the updated weight value includes:
calculating the updated confidence of each particle according to the neural network with the updated weight;
and selecting, among the calculated updated confidences, the particle with the maximum updated confidence as the moving object of the video frame.
In another aspect, the present invention further provides a moving object cross-scale tracking apparatus, including:
the neural network construction unit is used for constructing a neural network and carrying out weight training layer by layer to obtain initialized weights;
the weight updating unit is used for inputting the true value of the first video frame into the established neural network and updating the initial weights; calculating a bias term for the input video frame starting from the second video frame; calculating an output value of the video frame after the video frame is input into the neural network according to the obtained bias term of the video frame; and judging whether the confidence output by the neural network for the video frame is smaller than a preset threshold value, updating the weights of the neural network if so, and performing no processing if not;
and the moving object estimation unit is used for estimating the moving object of the video frame.
In some embodiments of the present invention, the neural network constructing unit performs the layer-by-layer weight training by using a stacked denoising autoencoder to obtain the initialized weights.
In some embodiments of the present invention, the obtaining the initialized weight value by the neural network constructing unit includes:
let k denote the number of training samples, i = 1, 2, …, k, and let the training sample set be {x_1, …, x_i, …, x_k}; W' and W respectively denote the weights of the hidden layer and the weights of the output layer, and b' and b denote the bias terms of the different hidden layers. From an input sample x_i, a hidden-layer representation h_i and a reconstruction x̂_i of the input can be derived, as shown in formula (1) and formula (2):

h_i = f(W'x_i + b')    (1)

x̂_i = sigm(Wh_i + b)    (2)

where f(·) denotes a nonlinear excitation function and sigm(·) denotes the excitation function of the neural network, as shown in formula (3):

sigm(y) = 1 / (1 + exp(-y))    (3)

The denoising autoencoder is obtained through learning, as shown in formula (4):

min_{W, W', b, b'} Σ_{i=1}^{k} ||x_i - x̂_i||_2^2 + γ(||W||_F^2 + ||W'||_F^2)    (4)

where Σ_{i=1}^{k} ||x_i - x̂_i||_2^2 is the reconstruction loss of the neural network and the second term is a weight penalty term; the parameter γ balances the reconstruction error against the weight penalty term, so that the relationship between the two is fully considered. Denoting the objective of formula (4) by E(w), where w collects the weights, and taking the partial derivative of E with respect to w_{ji} gives formula (5):

∂E(w)/∂w_{ji} = ∂E_0(w)/∂w_{ji} + 2γw_{ji} = -δ_j x_{ji} + 2γw_{ji}    (5)

Letting w̃_{ji} = w_{ji} - η·∂E(w)/∂w_{ji}, formula (6) and formula (7) are obtained:

w̃_{ji} = w_{ji} + η(δ_j x_{ji} - 2γw_{ji})    (6)

w̃_{ji} = (1 - 2ηγ)w_{ji} + ηδ_j x_{ji}    (7)

where η is the learning rate, δ_j is the error, and x_{ji} and w_{ji} denote the data and the weights from the i-th layer to the j-th layer, respectively.
In some embodiments of the present invention, the calculating, by the weight updating unit, of the bias term for the input video frame includes:

sampling the video frame to obtain an initialization sample set S = {s_1, s_2, …, s_N}, i = 1, 2, …, N;

calculating the hash value l of the tracking result of the previous frame I_{t-1};

for each sample s_i, calculating the average hash value l_{s_i} of s_i;

calculating the Hamming distance between l and each sample s_i using formula (8), where N denotes the number of samples, S = {s_1, s_2, …, s_N}, i = 1, 2, …, N, and dis(l, s_i) denotes the Hamming distance between the hash value l of the tracking result of video frame I_{t-1} and the hash value l_{s_i} of the current i-th sample s_i, as shown in formula (8):

dis(l, s_i) = Σ_{i=1}^{N} l ⊗ l_{s_i}    (8)

obtaining the bias term corresponding to sample s_i from the obtained Hamming distance according to formula (9):

b_i = 1 - dis(l, s_i) / Σ_{t=1}^{N} dis(l, s_t)    (9)

where b_i denotes the bias term of the i-th sample.
In some embodiments of the present invention, the estimating, by the moving object estimating unit, the moving object of the video frame according to the neural network with the updated weight value includes:
calculating the updated confidence of each particle according to the neural network with the updated weight;
and selecting, among the calculated updated confidences, the particle with the maximum updated confidence as the moving object of the video frame.
From the above, it can be seen that the moving target cross-scale tracking method and device provided by the invention correct the bias terms of the neural network by calculating the hash feature values of different sampling slices and taking the similarity between these hash feature values and the template as the scale feature, and estimate the probability of each particle with a particle-filter moving target tracking method, thereby completing the tracking of the moving target.
Drawings
FIG. 1 is a schematic flow chart of a cross-scale tracking method for a moving object according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a moving object cross-scale tracking device in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
It should be noted that all expressions using "first" and "second" in the embodiments of the present invention are used to distinguish two entities or parameters that share the same name but are not identical; "first" and "second" are merely for convenience of description and should not be construed as limiting the embodiments of the present invention, and this is not repeated in the following embodiments.
As an embodiment, referring to fig. 1, the method for tracking a moving object across scales may adopt the following steps:
step 101, building a neural network, and performing weight training layer by layer to obtain initialized weights and bias items.
Preferably, a stacked denoising autoencoder is used to train the weights layer by layer to obtain the initialized weights.
In this embodiment, let k denote the number of training samples, i = 1, 2, …, k, and let the training sample set be {x_1, …, x_i, …, x_k}; W' and W respectively denote the weights of the hidden layer and the weights of the output layer, and b' and b denote the bias terms of the different hidden layers. From an input sample x_i, a hidden-layer representation h_i and a reconstruction x̂_i of the input can be derived, as shown in formula (1) and formula (2):

h_i = f(W'x_i + b')    (1)

x̂_i = sigm(Wh_i + b)    (2)

where f(·) denotes a nonlinear excitation function and sigm(·) denotes the excitation function of the neural network, as shown in formula (3):

sigm(y) = 1 / (1 + exp(-y))    (3)

The denoising autoencoder is obtained through learning, as shown in formula (4):

min_{W, W', b, b'} Σ_{i=1}^{k} ||x_i - x̂_i||_2^2 + γ(||W||_F^2 + ||W'||_F^2)    (4)

where Σ_{i=1}^{k} ||x_i - x̂_i||_2^2 is the reconstruction loss of the neural network and the second term is a weight penalty term; the search for smaller weights is carried out by gradient descent, which reduces the possibility of overfitting. The parameter γ balances the reconstruction error against the weight penalty term, so that the relationship between the two is fully considered. Denoting the objective of formula (4) by E(w), where w collects the weights, and taking the partial derivative of E with respect to w_{ji} gives formula (5):

∂E(w)/∂w_{ji} = ∂E_0(w)/∂w_{ji} + 2γw_{ji} = -δ_j x_{ji} + 2γw_{ji}    (5)

Letting w̃_{ji} = w_{ji} - η·∂E(w)/∂w_{ji}, formula (6) and formula (7) are obtained:

w̃_{ji} = w_{ji} + η(δ_j x_{ji} - 2γw_{ji})    (6)

w̃_{ji} = (1 - 2ηγ)w_{ji} + ηδ_j x_{ji}    (7)

where η is the learning rate, δ_j is the error, and x_{ji} and w_{ji} denote the data and the weights from the i-th layer to the j-th layer, respectively.
Preferably, an eight-layer neural network structure may be established in step 101.
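As a purely illustrative, non-limiting sketch of how one such layer could be pretrained (assuming Python with NumPy; all function and variable names below are illustrative and are not part of the claimed method), the denoising autoencoder of formulas (1) to (4) with the update rule of formula (7) might be written as:

```python
import numpy as np

def sigm(y):
    """Excitation function of formula (3)."""
    return 1.0 / (1.0 + np.exp(-y))

def pretrain_layer(X, n_hidden, eta=0.01, gamma=1e-4, noise=0.1, epochs=50, seed=0):
    """Illustrative layer-wise pretraining of one denoising autoencoder layer.

    X is a k x d matrix of training samples. The reconstruction objective of
    formula (4) is minimized with the gradient-descent update of formula (7).
    Returns the hidden representation (input of the next layer) and the
    trained encoder parameters.
    """
    rng = np.random.default_rng(seed)
    k, d = X.shape
    W_enc = rng.normal(0.0, 0.01, (n_hidden, d))   # W' of formula (1)
    W_dec = rng.normal(0.0, 0.01, (d, n_hidden))   # W  of formula (2)
    b_enc = np.zeros(n_hidden)                     # b' of formula (1)
    b_dec = np.zeros(d)                            # b  of formula (2)

    for _ in range(epochs):
        for x in X:
            x_noisy = x + noise * rng.standard_normal(d)   # corrupted ("noised") input
            h = sigm(W_enc @ x_noisy + b_enc)              # formula (1), with f = sigm
            x_hat = sigm(W_dec @ h + b_dec)                # formula (2)

            # Error terms delta_j of formula (5): output layer, then hidden layer
            delta_dec = (x - x_hat) * x_hat * (1.0 - x_hat)
            delta_enc = (W_dec.T @ delta_dec) * h * (1.0 - h)

            # Update of formula (7): w_new = (1 - 2*eta*gamma) * w + eta * delta * x
            W_dec = (1.0 - 2.0 * eta * gamma) * W_dec + eta * np.outer(delta_dec, h)
            W_enc = (1.0 - 2.0 * eta * gamma) * W_enc + eta * np.outer(delta_enc, x_noisy)
            b_dec += eta * delta_dec
            b_enc += eta * delta_enc

    return sigm(X @ W_enc.T + b_enc), (W_enc, b_enc)
```

Stacking several such layers, feeding the returned hidden representation into the next call, yields the layer-wise initialized weights of step 101; the learning rate, penalty weight and noise level chosen above are illustrative only.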
Step 102, inputting the true value of the first video frame into the established neural network, and updating the initial weights.

Specifically, the true value of the first video frame can be substituted into formula (7) to obtain the updated weights.
Step 103, starting from the second video frame, calculating the bias term for the input video frame, i.e. updating the initialized bias term. The specific implementation process comprises the following steps:
the method comprises the following steps: sampling a video frame to obtain an initialization sample S ═ S1,s2,…,sN},i=1,2,…,N
Step two: calculating the previous frame It-1The hash value l of the result is tracked.
Step three: for each sample siCalculating siAverage hash value of
Step four: calculate l and each sample s using equation (8)iHamming distance between.
In one embodiment, assume that the current video frame is ItThe previous video frame It-1As a comparison object, calculating It-1And tracking the average hash value of the result, and calculating the Hamming distance between the average hash value of the result and the average hash values of all current samples. The greater the hamming distance between each two hash values, the less similarity between the two, and vice versa.
Let N denote the number of samples, S ═ S1,s2,…,sN},i=1,2,…,N,dis(l,si) Representing video frames It-1Tracking the hash value l of the result and the current ith sample siHash ofValue ofThe hamming distance therebetween can be expressed as shown in equation (8):
d i s ( l , s i ) = Σ i = 1 N l ⊗ l s i - - - ( 8 )
wherein,indicating an exclusive or operation.
Step five: obtaining a sample s according to the obtained Hamming distanceiThe corresponding bias term.
Wherein, the sample siThe corresponding new bias term is shown as equation (9):
b i = 1 - d i s ( l , s i ) Σ t = 1 N d i s ( l , s t ) - - - ( 9 )
wherein, biThe bias term for the ith sample is indicated. Easy to find, new bias term biTo some extent, the importance of the current sample in all samples.
It is also worth noting that, in neural networks, the bias term is usually initialized to a fixed value, such as 1. Here the bias term is used as an input term of the deep network and represents, to a certain degree, the relative importance of the different samples. The inherent low-frequency information of the image is used to capture the structural features of the target, and the bias term is corrected so that different samples correspond to different bias values. These structural features reflect, to a certain extent, the different scales of the sampling slices and are a representation of the image scale features.
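As an illustrative, non-limiting sketch (assuming Python with NumPy and grayscale image patches; the function names are hypothetical), the average hash of a sampling slice and the bias terms of formulas (8) and (9) could be computed as follows, with the Hamming distance taken over the 64 bits of the hash:

```python
import numpy as np

def average_hash(patch):
    """Average hash of a grayscale patch: down-sample to 8 x 8 (64 pixels),
    then set each bit to 1 where the pixel is >= the mean gray level."""
    h, w = patch.shape
    cropped = patch[:h - h % 8, :w - w % 8]
    small = cropped.reshape(8, cropped.shape[0] // 8, 8, cropped.shape[1] // 8).mean(axis=(1, 3))
    return (small >= small.mean()).astype(np.uint8).ravel()   # 64-bit hash

def bias_terms(prev_result_patch, sample_patches):
    """Bias term b_i of formula (9) for every sampling slice s_i.

    The distance dis(l, s_i) is taken as the Hamming distance (bitwise XOR
    count) between the hash l of the previous tracking result and the hash
    of slice s_i, in the spirit of formula (8)."""
    l = average_hash(prev_result_patch)
    dist = np.array([np.count_nonzero(l ^ average_hash(s)) for s in sample_patches],
                    dtype=float)
    return 1.0 - dist / (dist.sum() + 1e-12)    # formula (9), guarded against all-zero distances
```

The fixed 8 x 8 down-sampling above matches the 64-pixel average-hash description given later in this text; the exact sampling size is an implementation choice.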
Step 104, calculating the output confidence of the video frame after it is input into the neural network, according to the obtained bias term of the video frame.

As a preferred embodiment, for the deep network the computation starts from the second layer, i.e. runs over layers 2 to 8: the output of each layer of the neural network is calculated layer by layer using formula (1) and formula (2), and the computation ends when the last layer of the neural network has been calculated.
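A minimal forward-pass sketch of this layer-by-layer computation follows (illustrative only; it assumes per-layer weights such as those produced by the pretraining sketch above, a single-unit output layer, and that the per-slice bias term of formula (9) enters at the first hidden layer, which is an assumption of this sketch rather than a statement of the claimed method):

```python
import numpy as np

def sigm(y):
    return 1.0 / (1.0 + np.exp(-y))

def confidence(layers, sample_features, sample_bias):
    """Propagate one sampling slice through the network, layer by layer.

    `layers` is a list of (W, b) pairs for layers 2..8; `sample_bias` is the
    slice-specific bias term b_i of formula (9), applied here at the first
    hidden layer so that different slices carry different scale information
    (an illustrative choice)."""
    a = np.asarray(sample_features, dtype=float)
    for depth, (W, b) in enumerate(layers):
        bias = b + sample_bias if depth == 0 else b   # corrected bias at the first hidden layer
        a = sigm(W @ a + bias)                        # formulas (1)-(2) applied layer by layer
    return a.item()                                   # scalar confidence of the output layer
```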
Step 105, judging whether the confidence output by the neural network for the video frame is smaller than a preset threshold value; if it is smaller, updating the weights of the neural network and then executing step 106; if it is not smaller, proceeding directly to step 106.

Preferably, the preset threshold may be 0.8. In an embodiment, the confidence is calculated by the neural network, that is, the sampling slice is input into the neural network and the corresponding confidence is output. When the confidence is smaller than the preset threshold (for example, smaller than 0.8), this indicates that the current neural network is no longer suited to the dynamically changing moving target, and the weights of the neural network need to be updated using the positive and negative samples of the last several frames. If the confidence is larger than the preset threshold, the current neural network can still distinguish the moving target from the background well, so the weights of the sampled particles can continue to be calculated with the current neural network without any other processing.
Further, the weights of the neural network may be updated using formula (7) described above. For example, for each sample, the input x_{ji} of the output layer is obtained by forward propagation through the last hidden layer, and the estimated value a is then obtained through the output layer; with the label y (the label of a positive sample being 1 and that of a negative sample being 0), the error term used in formula (7) is δ_j = (a - y) × sigm'(x_{ji}). The weights of each layer are then updated by back-propagation from the output layer to the input layer; for the l-th layer, the error is obtained from that of the (l+1)-th layer in the same way and substituted into formula (7).
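As an illustrative, non-limiting sketch of this on-line update (assuming Python with NumPy, sigmoid layers and a single-unit output layer; the sign convention follows ordinary gradient descent on the loss, which matches formulas (6) and (7) up to the sign of δ_j):

```python
import numpy as np

def sigm(y):
    return 1.0 / (1.0 + np.exp(-y))

def online_update(layers, samples, labels, eta=0.01, gamma=1e-4):
    """One pass of the update of formula (7) over recent positive (label 1)
    and negative (label 0) samples, back-propagating from the output layer
    to the input layer. `layers` is a list of (W, b) pairs."""
    for x, y in zip(samples, labels):
        # Forward pass, keeping every layer's activation
        activations = [np.asarray(x, dtype=float)]
        for W, b in layers:
            activations.append(sigm(W @ activations[-1] + b))

        # Output-layer error term (delta_j of formula (5), up to sign convention)
        a = activations[-1]
        delta = (y - a) * a * (1.0 - a)

        # Back-propagate and apply formula (7): w_new = (1 - 2*eta*gamma)*w + eta*delta*x
        for idx in range(len(layers) - 1, -1, -1):
            W, b = layers[idx]
            h_prev = activations[idx]
            W_new = (1.0 - 2.0 * eta * gamma) * W + eta * np.outer(delta, h_prev)
            b_new = b + eta * delta
            delta = (W.T @ delta) * h_prev * (1.0 - h_prev)   # error for the layer below
            layers[idx] = (W_new, b_new)
    return layers
```

With such a helper, step 105 amounts to: if the largest confidence among the sampling slices falls below 0.8, run the update with the positive and negative samples of the last few frames and recompute the confidences before estimating the target.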
step 106, estimating the moving object of the video frame.
In an embodiment, the updated confidence of each particle may be calculated according to the neural network with the updated weights, and, among the calculated updated confidences, the particle with the maximum updated confidence is selected as the moving object of the video frame. Of course, if the confidence in step 105 is greater than the preset threshold, the particle with the highest confidence may be selected directly as the moving object of the video frame. If the position of the target changes little between adjacent frames, it can be considered that the target in the current frame is more likely to lie around the particles that had high confidence in the previous frame.
In a preferred embodiment, before each particle is substituted into the neural network with the updated weights, the particles need to be filtered. The particle filtering process is an iterative process. Let X_t denote the state variable of the moving object at time t. In the initial step, N particles are first collected from the preceding particle set X_{t-1} in proportion to their distribution, so as to obtain a new state. However, the newly generated particles are usually affected by the probabilities, which leads to particle degeneracy, so that most of the particles concentrate around the particles with larger weights; the posterior probability p(x_t | z_{1:t-1}) is then approximated by a finite set of N particles with importance weights, as shown in formula (10):

p(x_t | z_{1:t-1}) = ∫ p(x_t | x_{t-1}) p(x_{t-1} | z_{1:t-1}) dx_{t-1}    (10)

If a particle x_t^i is the predicted tracking result, the background information contained in its corresponding rectangular box (the tracking result is represented by the position of the particle and the size of the target, which together form a rectangular box; conversely, the rectangular box visually represents the position and size of the target on the image) is less than the background information in the rectangular boxes corresponding to the other particles, and the weight of that particle is larger. The relationship between the weights and the posterior probability is then given by formula (11):

w_{t,j}^i = w_{t-1}^i · p_j(z_t | x_t^i) p_j(x_t^i | x_{t-1}^i) / q_j(x_t | x_{1:t-1}, z_{1:t}),  i = 1, 2, …, N, j = 1, 2    (11)

where x_{1:t-1} denotes the random samples forming the posterior probability distribution, N denotes the number of samples, z_{1:t} denotes the observations from the start time to time t, x_t^i denotes the i-th sample at time t, w_{t-1}^i denotes the i-th weight at time t-1 (that is, in the previous frame), and q(x_t | x_{1:t-1}, z_{1:t}) = p(x_t | x_{t-1}) is the importance distribution.
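A compact particle-filter sketch of the prediction and weight update of formulas (10) and (11) follows (illustrative only; a Gaussian random-walk motion model over the rectangle (x, y, w, h) is assumed here, which the text does not prescribe, and the observation likelihood is taken as the network confidence):

```python
import numpy as np

def particle_filter_step(particles, weights, confidence_fn, motion_std=4.0, rng=None):
    """One iteration: resample in proportion to the previous weights, propagate
    each particle with a random-walk motion model (approximating the integral
    of formula (10)), and re-weight by the observation likelihood as in
    formula (11). `particles` is an (N, 4) array of (x, y, w, h) rectangles."""
    rng = rng or np.random.default_rng()
    n = len(particles)

    # Resample N particles in proportion to their previous weights
    idx = rng.choice(n, size=n, p=weights / weights.sum())
    particles = particles[idx]

    # Predict: p(x_t | x_{t-1}) modelled as a Gaussian random walk
    particles = particles + rng.normal(0.0, motion_std, particles.shape)

    # Update: with q = p(x_t | x_{t-1}), the weight of formula (11) reduces to
    # the observation likelihood, taken here as the confidence of the network
    weights = np.array([confidence_fn(p) for p in particles])
    weights = weights / weights.sum()

    # The tracking result of step 106 is the particle with the largest weight
    return particles, weights, particles[int(np.argmax(weights))]
```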
In another aspect of the present invention, a moving object cross-scale tracking apparatus is further provided. As shown in fig. 2, the moving object cross-scale tracking apparatus includes a neural network constructing unit 201, a weight updating unit 202 and a moving object estimation unit 203, which are connected in sequence. The neural network constructing unit 201 constructs a neural network and performs weight training layer by layer to obtain initialized weights. Preferably, the neural network constructing unit 201 performs the layer-by-layer weight training with a stacked denoising autoencoder to obtain the initialized weights. The weight updating unit 202 inputs the true value of the first video frame into the established neural network and updates the initial weights; calculates a bias term for the input video frame, starting from the second video frame; calculates the output value of the video frame after it is input into the neural network, according to the obtained bias term of the video frame; and judges whether the confidence output by the neural network for the video frame is smaller than a preset threshold. If it is smaller, the weight updating unit 202 updates the weights of the neural network, and the moving object estimation unit 203 estimates the moving object of the video frame according to the neural network with the updated weights; if the confidence is larger than the threshold, the moving object estimation unit 203 directly estimates the moving object of the video frame.
As an embodiment, in the process of initializing the weights, the neural network constructing unit 201 may let k denote the number of training samples, i = 1, 2, …, k, and let the training sample set be {x_1, …, x_i, …, x_k}; W' and W respectively represent the weights of the hidden layer and the weights of the output layer, and b' and b represent the bias terms of the different hidden layers. From an input sample x_i, a hidden-layer representation h_i and a reconstruction x̂_i of the input can be derived, as shown in formula (1) and formula (2):

h_i = f(W'x_i + b')    (1)

x̂_i = sigm(Wh_i + b)    (2)

where f(·) denotes a nonlinear excitation function and sigm(·) denotes the excitation function of the neural network, as shown in formula (3):

sigm(y) = 1 / (1 + exp(-y))    (3)

The denoising autoencoder is obtained through learning, as shown in formula (4):

min_{W, W', b, b'} Σ_{i=1}^{k} ||x_i - x̂_i||_2^2 + γ(||W||_F^2 + ||W'||_F^2)    (4)

where Σ_{i=1}^{k} ||x_i - x̂_i||_2^2 is the reconstruction loss of the neural network and the second term is a weight penalty term; the parameter γ balances the reconstruction error against the weight penalty term, so that the relationship between the two is fully considered. Denoting the objective of formula (4) by E(w), where w collects the weights, and taking the partial derivative of E with respect to w_{ji} gives formula (5):

∂E(w)/∂w_{ji} = ∂E_0(w)/∂w_{ji} + 2γw_{ji} = -δ_j x_{ji} + 2γw_{ji}    (5)

Letting w̃_{ji} = w_{ji} - η·∂E(w)/∂w_{ji}, formula (6) and formula (7) are obtained:

w̃_{ji} = w_{ji} + η(δ_j x_{ji} - 2γw_{ji})    (6)

w̃_{ji} = (1 - 2ηγ)w_{ji} + ηδ_j x_{ji}    (7)

where η is the learning rate, δ_j is the error, and x_{ji} and w_{ji} denote the data and the weights from the i-th layer to the j-th layer, respectively.
Further, when the weight updating unit 202 calculates the bias term for the input video frame, it can be assumed that the current video frame is I_t and that the previous video frame I_{t-1} is taken as the comparison object: the average hash value of the tracking result of I_{t-1} is calculated, and the Hamming distance between it and the average hash values of all current samples is then calculated. The greater the Hamming distance between two hash values, the smaller the similarity between them, and vice versa. Specifically, the video frame is sampled to obtain an initialization sample set S = {s_1, s_2, …, s_N}, i = 1, 2, …, N. Then the hash value l of the tracking result of the previous frame I_{t-1} is calculated, and for each sample s_i the average hash value l_{s_i} of s_i is calculated. The Hamming distance between l and each sample s_i is calculated using formula (8) below:

dis(l, s_i) = Σ_{i=1}^{N} l ⊗ l_{s_i}    (8)

Finally, the bias term corresponding to sample s_i is obtained from the obtained Hamming distance according to formula (9):

b_i = 1 - dis(l, s_i) / Σ_{t=1}^{N} dis(l, s_t)    (9)

where b_i denotes the bias term of the i-th sample.
In another embodiment, the moving object estimation unit 203 may calculate the updated confidence of each particle according to the neural network with the updated weights, and select, among the calculated updated confidences, the particle with the maximum updated confidence as the moving object of the video frame. Of course, if the confidence is greater than the preset threshold, the particle with the highest confidence may be selected directly as the moving object of the video frame. In a preferred embodiment, each particle is filtered before it is substituted into the neural network with the updated weights.
It should be noted that, in the implementation of the moving object cross-scale tracking apparatus according to the present invention, the details of the moving object cross-scale tracking method described above have been described in detail, and therefore, the repeated contents are not described again.
In summary, before tracking, the moving target cross-scale tracking method and device provided by the invention creatively train and learn the parameters of the network with the stacked denoising autoencoder; during tracking, the hash value of each sampling slice is calculated with the average hash method, the Hamming distance between each sampling slice and the tracking result of the previous frame is obtained through similarity calculation, and this distance is used to correct the bias terms of the network; through the feature-extraction process of the network, the scale information and detail information of the moving target can be obtained, and particle-filter motion estimation is used to predict the position and size of the moving target, so that the tracking result of the moving target is finally obtained.
Moreover, the neural network is constructed from the denoising autoencoder obtained through learning, smaller weights are sought by gradient descent, and the possibility of overfitting is reduced. Meanwhile, the invention captures the structural features of the target by using the inherent low-frequency information of the image and corrects the bias term so that different samples correspond to different bias values: each sample is down-sampled into a new sample of a different size, the high-frequency part of the image is removed, an image containing 64 pixels is obtained, and the gray-level average of this new image is calculated; each pixel value of the new image is then compared with the average, being set to 1 if it is greater than or equal to the average and to 0 otherwise, which finally yields the hash value of the sample. The Hamming distance between each sample of the current frame and the hash value of the template is calculated, and the smaller the distance, the higher the similarity to the template. Bias terms of the neural network are established for the different sampling slices through the average hash value, so that different sampling slices correspond to different scales. The invention can therefore have wide and important popularization significance. Finally, the whole moving target cross-scale tracking method and device are compact and easy to control.
Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to intimate that the scope of the disclosure, including the claims, is limited to these examples; within the idea of the invention, also features in the above embodiments or in different embodiments may be combined, steps may be implemented in any order, and there are many other variations of the different aspects of the invention as described above, which are not provided in detail for the sake of brevity.
In addition, well known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures for simplicity of illustration and discussion, and so as not to obscure the invention. Furthermore, devices may be shown in block diagram form in order to avoid obscuring the invention, and also in view of the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the present invention is to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the invention, it should be apparent to one skilled in the art that the invention can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.
While the present invention has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the discussed embodiments.
The embodiments of the invention are intended to embrace all such alternatives, modifications and variances that fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements and the like that may be made without departing from the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (10)

1. A cross-scale tracking method for a moving target is characterized by comprising the following steps:
building a neural network, and carrying out weight training layer by layer to obtain initialized weights;
inputting the true value of the first video frame into the established neural network, and updating the initial weight;
calculating a bias term for the input video frame starting from the second video frame;
calculating an output value of the video frame after the video frame is input into the neural network according to the obtained bias term of the video frame;
judging whether the confidence output by the neural network for the video frame is smaller than a preset threshold value, if so, updating the weights of the neural network, and estimating the moving target of the video frame according to the neural network with the updated weights; and if it is larger than the preset threshold, directly estimating the moving target of the video frame.
2. The method of claim 1, wherein the weights are trained layer by layer using a stacked denoising autoencoder to obtain the initialized weights.
3. The method of claim 2, wherein the constructing the neural network and performing weight training layer by layer to obtain initialized weights comprises:
let k denote the number of training samples, i = 1, 2, …, k, and let the training sample set be {x_1, …, x_i, …, x_k}; W' and W respectively denote the weights of the hidden layer and the weights of the output layer, and b' and b denote the bias terms of the different hidden layers; from an input sample x_i, a hidden-layer representation h_i and a reconstruction x̂_i of the input can be derived, as shown in formula (1) and formula (2):

h_i = f(W'x_i + b')    (1)

x̂_i = sigm(Wh_i + b)    (2)

where f(·) denotes a nonlinear excitation function and sigm(·) denotes the excitation function of the neural network, as shown in formula (3):

sigm(y) = 1 / (1 + exp(-y))    (3)

the denoising autoencoder is obtained through learning, as shown in formula (4):

min_{W, W', b, b'} Σ_{i=1}^{k} ||x_i - x̂_i||_2^2 + γ(||W||_F^2 + ||W'||_F^2)    (4)

where Σ_{i=1}^{k} ||x_i - x̂_i||_2^2 is the reconstruction loss of the neural network and the second term is a weight penalty term; the parameter γ balances the reconstruction error against the weight penalty term, so that the relationship between the two is fully considered; denoting the objective of formula (4) by E(w), where w collects the weights, and taking the partial derivative of E with respect to w_{ji} gives formula (5):

∂E(w)/∂w_{ji} = ∂E_0(w)/∂w_{ji} + 2γw_{ji} = -δ_j x_{ji} + 2γw_{ji}    (5)

letting w̃_{ji} = w_{ji} - η·∂E(w)/∂w_{ji}, formula (6) and formula (7) are obtained:

w̃_{ji} = w_{ji} + η(δ_j x_{ji} - 2γw_{ji})    (6)

w̃_{ji} = (1 - 2ηγ)w_{ji} + ηδ_j x_{ji}    (7)

where η is the learning rate, δ_j is the error, and x_{ji} and w_{ji} denote the data and the weights from the i-th layer to the j-th layer, respectively.
4. The method of claim 3, wherein calculating the bias term for the input video frame comprises:
sampling the video frame to obtain an initialization sample set S = {s_1, s_2, …, s_N}, i = 1, 2, …, N;

calculating the hash value l of the tracking result of the previous frame I_{t-1};

for each sample s_i, calculating the average hash value l_{s_i} of s_i;

calculating the Hamming distance between l and each sample s_i using formula (8), where N denotes the number of samples, S = {s_1, s_2, …, s_N}, i = 1, 2, …, N, and dis(l, s_i) denotes the Hamming distance between the hash value l of the tracking result of video frame I_{t-1} and the hash value l_{s_i} of the current i-th sample s_i, as shown in formula (8):

dis(l, s_i) = Σ_{i=1}^{N} l ⊗ l_{s_i}    (8)

obtaining the bias term corresponding to sample s_i from the obtained Hamming distance according to formula (9):

b_i = 1 - dis(l, s_i) / Σ_{t=1}^{N} dis(l, s_t)    (9)

where b_i denotes the bias term of the i-th sample.
5. The method according to any one of claims 1 to 4, wherein the estimating the moving object of the video frame according to the neural network with the updated weight value comprises:
calculating the updated confidence of each particle according to the neural network with the updated weight;
and selecting, among the calculated updated confidences, the particle with the maximum updated confidence as the moving object of the video frame.
6. A moving object cross-scale tracking device is characterized by comprising:
the neural network construction unit is used for constructing a neural network and carrying out weight training layer by layer to obtain initialized weights;
the weight updating unit is used for inputting the true value of the first video frame into the established neural network and updating the initial weights; calculating a bias term for the input video frame starting from the second video frame; calculating an output value of the video frame after the video frame is input into the neural network according to the obtained bias term of the video frame; and judging whether the confidence output by the neural network for the video frame is smaller than a preset threshold value, updating the weights of the neural network if so, and performing no processing if not;
and the moving object estimation unit is used for estimating the moving object of the video frame.
7. The apparatus of claim 6, wherein the neural network constructing unit performs the layer-by-layer weight training by using a stacked denoising autoencoder to obtain the initialized weights.
8. The apparatus of claim 7, wherein the neural network building unit obtaining initialized weights comprises:
let k denote the number of training samples, i = 1, 2, …, k, and let the training sample set be {x_1, …, x_i, …, x_k}; W' and W respectively denote the weights of the hidden layer and the weights of the output layer, and b' and b denote the bias terms of the different hidden layers; from an input sample x_i, a hidden-layer representation h_i and a reconstruction x̂_i of the input can be derived, as shown in formula (1) and formula (2):

h_i = f(W'x_i + b')    (1)

x̂_i = sigm(Wh_i + b)    (2)

where f(·) denotes a nonlinear excitation function and sigm(·) denotes the excitation function of the neural network, as shown in formula (3):

sigm(y) = 1 / (1 + exp(-y))    (3)

the denoising autoencoder is obtained through learning, as shown in formula (4):

min_{W, W', b, b'} Σ_{i=1}^{k} ||x_i - x̂_i||_2^2 + γ(||W||_F^2 + ||W'||_F^2)    (4)

where Σ_{i=1}^{k} ||x_i - x̂_i||_2^2 is the reconstruction loss of the neural network and the second term is a weight penalty term; the parameter γ balances the reconstruction error against the weight penalty term, so that the relationship between the two is fully considered; denoting the objective of formula (4) by E(w), where w collects the weights, and taking the partial derivative of E with respect to w_{ji} gives formula (5):

∂E(w)/∂w_{ji} = ∂E_0(w)/∂w_{ji} + 2γw_{ji} = -δ_j x_{ji} + 2γw_{ji}    (5)

letting w̃_{ji} = w_{ji} - η·∂E(w)/∂w_{ji}, formula (6) and formula (7) are obtained:

w̃_{ji} = w_{ji} + η(δ_j x_{ji} - 2γw_{ji})    (6)

w̃_{ji} = (1 - 2ηγ)w_{ji} + ηδ_j x_{ji}    (7)

where η is the learning rate, δ_j is the error, and x_{ji} and w_{ji} denote the data and the weights from the i-th layer to the j-th layer, respectively.
9. The apparatus of claim 8, wherein the weight update unit calculates the bias term for the input video frame comprises:
sampling the video frame to obtain an initialization sample set S = {s_1, s_2, …, s_N}, i = 1, 2, …, N;

calculating the hash value l of the tracking result of the previous frame I_{t-1};

for each sample s_i, calculating the average hash value l_{s_i} of s_i;

calculating the Hamming distance between l and each sample s_i using formula (8), where N denotes the number of samples, S = {s_1, s_2, …, s_N}, i = 1, 2, …, N, and dis(l, s_i) denotes the Hamming distance between the hash value l of the tracking result of video frame I_{t-1} and the hash value l_{s_i} of the current i-th sample s_i, as shown in formula (8):

dis(l, s_i) = Σ_{i=1}^{N} l ⊗ l_{s_i}    (8)

obtaining the bias term corresponding to sample s_i from the obtained Hamming distance according to formula (9):

b_i = 1 - dis(l, s_i) / Σ_{t=1}^{N} dis(l, s_t)    (9)

where b_i denotes the bias term of the i-th sample.
10. The apparatus according to any one of claims 6 to 9, wherein the estimating the moving object of the video frame according to the neural network with updated weights comprises:
calculating the updated confidence of each particle according to the neural network with the updated weight;
and selecting, among the calculated updated confidences, the particle with the maximum updated confidence as the moving object of the video frame.
CN201610548086.0A 2016-07-12 2016-07-12 Moving target cross-scale tracking method and device Active CN106203350B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610548086.0A CN106203350B (en) 2016-07-12 2016-07-12 Moving target cross-scale tracking method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610548086.0A CN106203350B (en) 2016-07-12 2016-07-12 Moving target cross-scale tracking method and device

Publications (2)

Publication Number Publication Date
CN106203350A true CN106203350A (en) 2016-12-07
CN106203350B CN106203350B (en) 2019-10-11

Family

ID=57477769

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610548086.0A Active CN106203350B (en) Moving target cross-scale tracking method and device

Country Status (1)

Country Link
CN (1) CN106203350B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107169117A (en) * 2017-05-25 2017-09-15 西安工业大学 A kind of manual draw human motion search method based on autocoder and DTW
CN107292914A (en) * 2017-06-15 2017-10-24 国家新闻出版广电总局广播科学研究院 Visual target tracking method based on small-sized single branch convolutional neural networks
CN109559329A (en) * 2018-11-28 2019-04-02 陕西师范大学 A kind of particle filter tracking method based on depth denoising autocoder
CN110136173A (en) * 2019-05-21 2019-08-16 浙江大华技术股份有限公司 A kind of target location processing method and device
CN110298404A (en) * 2019-07-02 2019-10-01 西南交通大学 A kind of method for tracking target based on triple twin Hash e-learnings
CN112634188A (en) * 2021-02-02 2021-04-09 深圳市爱培科技术股份有限公司 Vehicle far and near scene combined imaging method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5548662A (en) * 1993-02-08 1996-08-20 Lg Electronics Inc. Edge extracting method and apparatus using diffusion neural network
CN101089917A (en) * 2007-06-01 2007-12-19 清华大学 Quick identification method for object vehicle lane changing
CN104036323A (en) * 2014-06-26 2014-09-10 叶茂 Vehicle detection method based on convolutional neural network
CN105654509A (en) * 2015-12-25 2016-06-08 燕山大学 Motion tracking method based on composite deep neural network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5548662A (en) * 1993-02-08 1996-08-20 Lg Electronics Inc. Edge extracting method and apparatus using diffusion neural network
CN101089917A (en) * 2007-06-01 2007-12-19 清华大学 Quick identification method for object vehicle lane changing
CN104036323A (en) * 2014-06-26 2014-09-10 叶茂 Vehicle detection method based on convolutional neural network
CN105654509A (en) * 2015-12-25 2016-06-08 燕山大学 Motion tracking method based on composite deep neural network

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107169117A (en) * 2017-05-25 2017-09-15 西安工业大学 A kind of manual draw human motion search method based on autocoder and DTW
CN107169117B (en) * 2017-05-25 2020-11-10 西安工业大学 Hand-drawn human motion retrieval method based on automatic encoder and DTW
CN107292914A (en) * 2017-06-15 2017-10-24 国家新闻出版广电总局广播科学研究院 Visual target tracking method based on small-sized single branch convolutional neural networks
CN109559329A (en) * 2018-11-28 2019-04-02 陕西师范大学 A kind of particle filter tracking method based on depth denoising autocoder
CN110136173A (en) * 2019-05-21 2019-08-16 浙江大华技术股份有限公司 A kind of target location processing method and device
CN110298404A (en) * 2019-07-02 2019-10-01 西南交通大学 A kind of method for tracking target based on triple twin Hash e-learnings
CN112634188A (en) * 2021-02-02 2021-04-09 深圳市爱培科技术股份有限公司 Vehicle far and near scene combined imaging method and device

Also Published As

Publication number Publication date
CN106203350B (en) 2019-10-11

Similar Documents

Publication Publication Date Title
CN106203350B (en) Moving target cross-scale tracking method and device
CN110232394B (en) Multi-scale image semantic segmentation method
CN107358293B (en) Neural network training method and device
CN107369166B (en) Target tracking method and system based on multi-resolution neural network
CN105938559B (en) Use the Digital Image Processing of convolutional neural networks
CN108062562B (en) Object re-recognition method and device
Saputra et al. Learning monocular visual odometry through geometry-aware curriculum learning
CN110120020A (en) A kind of SAR image denoising method based on multiple dimensioned empty residual error attention network
CN107330357A (en) Vision SLAM closed loop detection methods based on deep neural network
CN109754078A (en) Method for optimization neural network
CN113657560B (en) Weak supervision image semantic segmentation method and system based on node classification
CN107154024A (en) Dimension self-adaption method for tracking target based on depth characteristic core correlation filter
CN110349185B (en) RGBT target tracking model training method and device
CN107885322A (en) Artificial neural network for mankind's activity identification
CN104881029B (en) Mobile Robotics Navigation method based on a point RANSAC and FAST algorithms
CN109948741A (en) A kind of transfer learning method and device
CN105678284A (en) Fixed-position human behavior analysis method
CN111126278B (en) Method for optimizing and accelerating target detection model for few-class scene
CN105981050A (en) Method and system for exacting face features from data of face images
CN104156919B (en) A kind of based on wavelet transformation with the method for restoring motion blurred image of Hopfield neutral net
CN114186672A (en) Efficient high-precision training algorithm for impulse neural network
CN112446888A (en) Processing method and processing device for image segmentation model
CN114897728A (en) Image enhancement method and device, terminal equipment and storage medium
CN114283320A (en) Target detection method based on full convolution and without branch structure
CN110827319B (en) Improved Staple target tracking method based on local sensitive histogram

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant