CN106203350A - Cross-scale tracking method and device for a moving target - Google Patents
Cross-scale tracking method and device for a moving target
- Publication number
- CN106203350A CN106203350A CN201610548086.0A CN201610548086A CN106203350A CN 106203350 A CN106203350 A CN 106203350A CN 201610548086 A CN201610548086 A CN 201610548086A CN 106203350 A CN106203350 A CN 106203350A
- Authority
- CN
- China
- Prior art keywords
- neural network
- video frame
- weight
- layer
- formula
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
- G06V20/42—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
Abstract
The invention discloses a cross-scale tracking method and device for a moving target. A neural network is built and weight training is performed layer by layer to obtain initialized weights. The true value of the first video frame is input into the established neural network to update the initial weights. Starting from the second video frame, a bias term is calculated for each input video frame; according to the obtained bias term, the output value of the neural network for that frame is calculated. Whether the confidence output by the neural network for the frame is smaller than a preset threshold is then judged: if it is smaller, the weights of the neural network are updated and the moving target of the frame is estimated according to the neural network with the updated weights; if it is not smaller, the moving target of the frame is estimated directly. The method and device thereby achieve accurate feature extraction of the moving target, predict the position and size of the moving target, and obtain an optimal tracking result.
Description
Technical Field
The invention relates to the technical field of image processing, in particular to a cross-scale tracking method and a cross-scale tracking device for a moving target.
Background
Traditional tracking methods combine different complex features with a deep neural network, do not consider the scale difference between different sampling patches, and increase the complexity of the method to a certain extent. The appearance of the target during visual tracking can be learned with a stacked denoising autoencoder (SDAE), and the moving target tracked by combining the deep network with particle filtering; this framework (DLT, Deep Learning Tracking) is simple, considering only a 4-layer neural network, and tracks simple moving targets well, but it is not suitable for tracking moving targets in complex environments.
A tracking method combining self-learning and a suspension constraint principle initializes the network through offline learning and updates the network weights online, with motion estimation performed by logistic-regression probability estimation. Although the accuracy of this method is improved to some extent, it is still not robust to more challenging problems such as occlusion or partial occlusion. Hash methods can distinguish different image blocks using simple information; they are sensitive to the specific content of an image and insensitive to its integrity. A moving target can be tracked with three basic hash methods (the perceptual hash, the average hash and the differential hash); such a method takes the characteristics of all three into account, but this also increases its complexity. To reduce complexity and improve efficiency, a two-dimensional combined hash method has been proposed as a feature extraction method, combined with Bayesian motion estimation to complete the tracking of a moving target, but this method performs well only on part of the video sequences.
Disclosure of Invention
In view of this, the present invention provides a method and an apparatus for tracking a moving target across scales, which can implement accurate feature extraction of the moving target, predict the position and size of the moving target, and obtain an optimal tracking result of the moving target.
Based on the above purpose, the invention provides a cross-scale tracking method for a moving target, which comprises the following steps:
building a neural network, and carrying out weight training layer by layer to obtain initialized weights;
inputting the true value of the first video frame into the established neural network, and updating the initial weight;
calculating a bias term for the input video frame, starting from the second video frame;
calculating the output value of the neural network for the video frame according to the obtained bias term of the video frame;
judging whether the confidence output by the neural network for the video frame is smaller than a preset threshold; if so, updating the weights of the neural network and estimating the moving target of the video frame according to the neural network with the updated weights; and if not, directly estimating the moving target of the video frame.
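The claimed control flow can be summarized in the following minimal Python sketch. The helper callables (sample, bias_terms, forward, update_weights, recent_examples) are hypothetical stand-ins for the sub-steps detailed in the embodiments below, not names used by the disclosure.

```python
import numpy as np

def track(frames, first_truth, net, helpers, threshold=0.8):
    """Minimal sketch of the claimed flow; `helpers` bundles the unspecified
    sub-steps (sampling, hash-based bias terms, forward pass, weight update)."""
    helpers.update_weights(net, [first_truth], [1.0])      # first-frame truth update
    results = [first_truth]
    for frame in frames[1:]:                               # from the second frame on
        samples = helpers.sample(frame, results[-1])
        biases = helpers.bias_terms(frame, results[-1], samples)
        conf = helpers.forward(net, frame, samples, biases)
        if conf.max() < threshold:                         # network no longer fits
            xs, ys = helpers.recent_examples(results)      # recent pos/neg samples
            helpers.update_weights(net, xs, ys)
            conf = helpers.forward(net, frame, samples, biases)
        results.append(samples[int(np.argmax(conf))])      # most confident sample
    return results
```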
In some embodiments of the present invention, a stacked denoising autoencoder is used to train the weights layer by layer to obtain the initialized weights.
In some embodiments of the present invention, the constructing the neural network and performing weight training layer by layer to obtain the initialized weight includes:
let k denote the number of training samples, i = 1, 2, …, k, with the training set {x_1, …, x_i, …, x_k}; W' and W respectively denote the weight of the hidden layer and the weight of the output layer, and b' and b denote the bias terms of the corresponding layers. From an input sample x_i, a hidden layer representation h_i and a reconstruction x̂_i of the input are derived, as shown in formula (1) and formula (2):
h_i = f(W'x_i + b')  (1)
x̂_i = f(Wh_i + b)  (2)
wherein f(·) denotes a nonlinear excitation function, and sigm(·) denotes the excitation function of the neural network, as shown in formula (3):
sigm(x) = 1 / (1 + e^(-x))  (3)
The denoising autoencoder is obtained through learning of formula (4):
E = (1/k) Σ_{i=1}^{k} ||x_i − x̂_i||² + γ(||W||² + ||W'||²)  (4)
wherein the first term of E represents the reconstruction loss function of the neural network and the second term is a weight penalty term; the parameter γ balances the reconstruction error and the weight penalty term so as to fully consider the relationship between the two. Assuming sigm(·) is used in formula (4), taking the partial derivative of E with respect to the weight w_ji yields formula (5):
∂E/∂w_ji = δ_j x_ji  (5)
Letting Δw_ji = −η(∂E/∂w_ji), formula (6) and formula (7) are obtained:
Δw_ji = −η δ_j x_ji  (6)
w_ji ← w_ji + Δw_ji = w_ji − η δ_j x_ji  (7)
wherein η is the learning rate, δ_j is the error, and x_ji and w_ji respectively represent the data and the weights from the i-th layer to the j-th layer.
In some embodiments of the invention, said calculating the bias term for the input video frame comprises:
sampling the video frame to obtain initialized samples S = {s_1, s_2, …, s_N}, i = 1, 2, …, N;
calculating the hash value l of the tracking result of the previous frame I_{t-1};
for each sample s_i, calculating the average hash value l_{s_i} of s_i;
calculating the Hamming distance between l and each sample s_i using formula (8), wherein N denotes the number of samples, S = {s_1, s_2, …, s_N}, i = 1, 2, …, N, and dis(l, s_i) denotes the Hamming distance between the hash value l of the tracking result of video frame I_{t-1} and the hash value l_{s_i} of the current i-th sample s_i:
dis(l, s_i) = Σ_j (l_j ⊕ l_{s_i,j})  (8)
obtaining the bias term corresponding to sample s_i from the obtained Hamming distance according to formula (9), wherein b_i denotes the bias term of the i-th sample.
In some embodiments of the present invention, said estimating the moving object of the video frame according to the neural network with the updated weight value includes:
calculating the updated confidence of each particle according to the neural network with the updated weight;
and selecting, among the calculated updated confidences, the particle with the maximum updated confidence as the moving target of the video frame.
In another aspect, the present invention further provides a moving object cross-scale tracking apparatus, including:
the neural network construction unit is used for constructing a neural network and carrying out weight training layer by layer to obtain initialized weights;
the weight updating unit is used for inputting the true value of the first video frame into the established neural network and updating the initial weights; calculating a bias term for the input video frame starting from the second video frame; calculating the output value of the neural network for the video frame according to the obtained bias term of the video frame; and judging whether the confidence output by the neural network for the video frame is smaller than a preset threshold: if so, updating the weights of the neural network; if not, performing no processing;
and the moving object estimation unit is used for estimating the moving object of the video frame.
In some embodiments of the present invention, the neural network construction unit performs the layer-by-layer weight training using a stacked denoising autoencoder to obtain the initialized weights.
In some embodiments of the present invention, the obtaining the initialized weight value by the neural network constructing unit includes:
let k denote the number of training samples, i = 1, 2, …, k, with the training set {x_1, …, x_i, …, x_k}; W' and W respectively denote the weight of the hidden layer and the weight of the output layer, and b' and b denote the bias terms of the corresponding layers. From an input sample x_i, a hidden layer representation h_i and a reconstruction x̂_i of the input are derived, as shown in formula (1) and formula (2):
h_i = f(W'x_i + b')  (1)
x̂_i = f(Wh_i + b)  (2)
wherein f(·) denotes a nonlinear excitation function, and sigm(·) denotes the excitation function of the neural network, as shown in formula (3):
sigm(x) = 1 / (1 + e^(-x))  (3)
The denoising autoencoder is obtained through learning of formula (4):
E = (1/k) Σ_{i=1}^{k} ||x_i − x̂_i||² + γ(||W||² + ||W'||²)  (4)
wherein the first term of E represents the reconstruction loss function of the neural network and the second term is a weight penalty term; the parameter γ balances the reconstruction error and the weight penalty term so as to fully consider the relationship between the two. Assuming sigm(·) is used in formula (4), taking the partial derivative of E with respect to the weight w_ji yields formula (5):
∂E/∂w_ji = δ_j x_ji  (5)
Letting Δw_ji = −η(∂E/∂w_ji), formula (6) and formula (7) are obtained:
Δw_ji = −η δ_j x_ji  (6)
w_ji ← w_ji + Δw_ji = w_ji − η δ_j x_ji  (7)
wherein η is the learning rate, δ_j is the error, and x_ji and w_ji respectively represent the data and the weights from the i-th layer to the j-th layer.
In some embodiments of the present invention, the calculating the bias term for the input video frame by the weight updating unit includes:
sampling the video frame to obtain initialized samples S = {s_1, s_2, …, s_N}, i = 1, 2, …, N;
calculating the hash value l of the tracking result of the previous frame I_{t-1};
for each sample s_i, calculating the average hash value l_{s_i} of s_i;
calculating the Hamming distance between l and each sample s_i using formula (8), wherein N denotes the number of samples, S = {s_1, s_2, …, s_N}, i = 1, 2, …, N, and dis(l, s_i) denotes the Hamming distance between the hash value l of the tracking result of video frame I_{t-1} and the hash value l_{s_i} of the current i-th sample s_i:
dis(l, s_i) = Σ_j (l_j ⊕ l_{s_i,j})  (8)
obtaining the bias term corresponding to sample s_i from the obtained Hamming distance according to formula (9), wherein b_i denotes the bias term of the i-th sample.
In some embodiments of the present invention, the estimating, by the moving object estimating unit, the moving object of the video frame according to the neural network with the updated weight value includes:
calculating the updated confidence of each particle according to the neural network with the updated weight;
and selecting, among the calculated updated confidences, the particle with the maximum updated confidence as the moving target of the video frame.
From the above, the cross-scale tracking method and device for a moving target provided by the invention correct the bias term of the neural network by calculating the hash feature values of different sampling patches and taking their similarity to the template as the scale feature, and estimate the probability of each particle with a particle-filter tracking method, thereby completing the tracking of the moving target.
Drawings
FIG. 1 is a schematic flow chart of a cross-scale tracking method for a moving object according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a moving object cross-scale tracking device in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
It should be noted that the expressions "first" and "second" in the embodiments of the present invention are used to distinguish two entities or parameters that share the same name; "first" and "second" are merely for convenience of description and should not be construed as limiting the embodiments of the invention, and this is not repeated in the following embodiments.
As an embodiment, referring to fig. 1, the method for tracking a moving object across scales may adopt the following steps:
step 101, building a neural network, and performing weight training layer by layer to obtain initialized weights and bias items.
Preferably, a stacked denoising autoencoder is used to train the weights layer by layer to obtain the initialized weights.
In the embodiment, let k denote the number of training samples, i = 1, 2, …, k, with the training set {x_1, …, x_i, …, x_k}; W' and W respectively denote the weight of the hidden layer and the weight of the output layer, and b' and b denote the bias terms of the corresponding layers. From an input sample x_i, a hidden layer representation h_i and a reconstruction x̂_i of the input are derived, as shown in formula (1) and formula (2):
h_i = f(W'x_i + b')  (1)
x̂_i = f(Wh_i + b)  (2)
where f(·) denotes a nonlinear excitation function, and sigm(·) denotes the excitation function of the neural network, as shown in formula (3):
sigm(x) = 1 / (1 + e^(-x))  (3)
The denoising autoencoder is obtained through learning of formula (4):
E = (1/k) Σ_{i=1}^{k} ||x_i − x̂_i||² + γ(||W||² + ||W'||²)  (4)
wherein the first term of E represents the reconstruction loss function of the neural network and the second term is a weight penalty term; the search for smaller weights is performed with the gradient descent method, reducing the possibility of overfitting. The parameter γ balances the reconstruction error and the weight penalty term so as to fully consider the relationship between the two. Assuming sigm(·) is used in formula (4), taking the partial derivative of E with respect to the weight w_ji yields formula (5):
∂E/∂w_ji = δ_j x_ji  (5)
Letting Δw_ji = −η(∂E/∂w_ji), formula (6) and formula (7) are obtained:
Δw_ji = −η δ_j x_ji  (6)
w_ji ← w_ji + Δw_ji = w_ji − η δ_j x_ji  (7)
where η is the learning rate, δ_j is the error, and x_ji and w_ji respectively represent the data and the weights from the i-th layer to the j-th layer.
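As an illustration of formulas (1) to (7), the following is a minimal NumPy sketch of training one denoising-autoencoder layer by gradient descent and stacking such layers; the hyperparameters (gamma, eta, epochs, noise level) and the additive-Gaussian corruption are illustrative assumptions, not values given by the embodiment.

```python
import numpy as np

def sigm(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_dae_layer(X, n_hidden, gamma=1e-4, eta=0.1, epochs=50, noise=0.1, seed=0):
    """One denoising-autoencoder layer: gradient descent on the reconstruction
    error plus the gamma-weighted penalty of formula (4)."""
    rng = np.random.default_rng(seed)
    k, n_in = X.shape
    Wp = rng.normal(scale=0.01, size=(n_in, n_hidden))   # W' (encoder weights)
    W = rng.normal(scale=0.01, size=(n_hidden, n_in))    # W  (decoder weights)
    bp, b = np.zeros(n_hidden), np.zeros(n_in)
    for _ in range(epochs):
        Xn = X + noise * rng.standard_normal(X.shape)    # corrupt the input
        H = sigm(Xn @ Wp + bp)                           # formula (1)
        Xr = sigm(H @ W + b)                             # formula (2)
        d_out = (Xr - X) * Xr * (1 - Xr) * (2.0 / k)     # output-layer error signal
        d_hid = (d_out @ W.T) * H * (1 - H)              # back-propagated error
        W -= eta * (H.T @ d_out + 2 * gamma * W)         # updates per formulas (5)-(7)
        Wp -= eta * (Xn.T @ d_hid + 2 * gamma * Wp)
        b -= eta * d_out.sum(axis=0)
        bp -= eta * d_hid.sum(axis=0)
    return Wp, bp

def pretrain_stack(X, layer_sizes):
    """Layer-by-layer pretraining: each layer trains on the codes of the one below."""
    params, H = [], X
    for n_hidden in layer_sizes:
        Wp, bp = train_dae_layer(H, n_hidden)
        params.append((Wp, bp))
        H = sigm(H @ Wp + bp)
    return params
```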
Preferably, an eight-layer neural network structure may be established in step 101.
Step 102, inputting the true value of the first video frame into the established neural network, and updating the initial weights.
Specifically, the true value of the first video frame can be substituted into formula (7) to obtain the updated weights.
Step 103, starting from the second video frame, calculating the bias term for the input video frame, i.e. updating the initialized bias term. The specific implementation process comprises the following steps:
the method comprises the following steps: sampling a video frame to obtain an initialization sample S ═ S1,s2,…,sN},i=1,2,…,N
Step two: calculating the previous frame It-1The hash value l of the result is tracked.
Step three: for each sample siCalculating siAverage hash value of
Step four: calculate l and each sample s using equation (8)iHamming distance between.
In one embodiment, assume the current video frame is I_t and take the previous video frame I_{t-1} as the comparison object: the average hash value of the tracking result of I_{t-1} is calculated, and then the Hamming distance between it and the average hash values of all current samples. The greater the Hamming distance between two hash values, the lower the similarity between them, and vice versa.
Let N denote the number of samples, S = {s_1, s_2, …, s_N}, i = 1, 2, …, N, and let dis(l, s_i) denote the Hamming distance between the hash value l of the tracking result of video frame I_{t-1} and the hash value l_{s_i} of the current i-th sample s_i; this distance can be expressed as shown in formula (8):
dis(l, s_i) = Σ_j (l_j ⊕ l_{s_i,j})  (8)
wherein ⊕ denotes the exclusive-or operation.
Step five: obtaining the bias term corresponding to sample s_i from the obtained Hamming distance, where the new bias term corresponding to sample s_i is given by formula (9) and b_i denotes the bias term of the i-th sample. It is easy to see that the new bias term b_i reflects, to a certain extent, the importance of the current sample among all samples.
It is also worth noting that in neural networks the bias term is usually initialized to a fixed value, such as 1. Here the bias term serves as an input term of the deep network and, to a certain degree, represents the importance of different samples. The inherent low-frequency information of the image is used to capture the structural features of the target, and the bias term is revised so that different samples correspond to different bias values. The structural features reflect, to some extent, the different scales of the sampling patches and are a representation of the image scale features.
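A minimal NumPy sketch of the average-hash and Hamming-distance computation described above follows. The final mapping of distances to bias terms is an assumption (a simple normalization over all samples), since formula (9) itself is not reproduced in this text.

```python
import numpy as np

def average_hash(patch):
    """64-bit average hash: downsample the gray patch to 8x8 block means
    (discarding the high-frequency part) and threshold at the gray mean."""
    h, w = patch.shape
    small = patch[:h - h % 8, :w - w % 8].reshape(8, h // 8, 8, w // 8).mean(axis=(1, 3))
    return (small >= small.mean()).astype(np.uint8).ravel()   # 64 bits of 0/1

def hamming(a, b):
    return int(np.sum(a != b))    # XOR count over the hash bits, formula (8)

def bias_terms(prev_result_patch, sample_patches):
    """Bias term per sample from its Hamming distance to the previous tracking
    result; the normalization below is an assumed stand-in for formula (9)."""
    l = average_hash(prev_result_patch)
    d = np.array([hamming(l, average_hash(s)) for s in sample_patches], dtype=float)
    return d / max(d.sum(), 1.0)
```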
Step 104, calculating the output confidence of the video frame after the video frame is input into the neural network, according to the obtained bias term of the video frame.
As a preferred embodiment, for the deep network the computation proceeds from the second layer through the eighth layer (layers 2 to 8): the output of each layer of the neural network is calculated layer by layer using formula (1) and formula (2), ending when the last layer of the neural network has been computed.
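A minimal sketch of this forward pass; representing the network as a list of (W, b) pairs and feeding the hash-derived bias term as an extra input unit are assumptions of the sketch (the embodiment states only that the bias term is used as an input term of the deep network).

```python
import numpy as np

def sigm(z):
    return 1.0 / (1.0 + np.exp(-z))

def confidence(net, x, bias_term):
    """Layer-by-layer forward pass per formulas (1)-(2); `net` is a list of
    (W, b) pairs and `x` the flattened sampling patch."""
    a = np.append(x, bias_term)     # assumed: hash-derived bias joins the input
    for W, b in net:                # stops after the last layer of the network
        a = sigm(a @ W + b)
    return float(a)                 # scalar confidence from the output layer
```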
Step 105, judging whether the confidence output by the neural network for the video frame is smaller than a preset threshold; if so, updating the weights of the neural network and then executing step 106; if not, proceeding directly to step 106.
Preferably, the preset threshold may be 0.8. In an embodiment, the confidence is calculated by the neural network: a sampling patch is input into the neural network, which outputs the corresponding confidence. When the confidence is smaller than the preset threshold (for example, smaller than 0.8), the current neural network is no longer suited to the dynamically changing moving target, and the weights of the neural network need to be updated using the positive and negative samples of the last several frames. If the confidence is not smaller than the preset threshold, the current neural network can still distinguish the moving target from the background well, so the weights of the sampled particles can continue to be calculated with the current neural network without further processing.
Further, the weights of the neural network may be updated using formula (7) above. For example, for each sample, the output x_ji of the last hidden layer is obtained by forward propagation and the estimated value a is then obtained through the output layer; with the label y (1 for a positive sample, 0 for a negative sample), the error in formula (7) is δ_j = (a − y) × sigmoid′(x_ji). The weights of each layer are then updated by back-propagation from the output layer to the input layer; for example, for the l-th layer, w_ji^(l) ← w_ji^(l) − η δ_j x_ji.
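In NumPy, this update could look like the following sketch; the (W, b) list representation of the network is an assumption carried over from the sketches above.

```python
import numpy as np

def sigmoid_prime(a):
    return a * (1.0 - a)    # derivative expressed through the activation a

def online_update(net, samples, labels, eta=0.1):
    """One backprop pass per sample: output error (a - y) * sigmoid'(.),
    then layer-by-layer updates from the output layer back to the input."""
    for x, y in zip(samples, labels):          # y = 1 positive, 0 negative
        acts = [x]
        for W, b in net:                       # forward pass, storing activations
            acts.append(1.0 / (1.0 + np.exp(-(acts[-1] @ W + b))))
        delta = (acts[-1] - y) * sigmoid_prime(acts[-1])   # error of formula (7)
        for l in range(len(net) - 1, -1, -1):  # back-propagation
            W, b = net[l]
            new_delta = (W @ delta) * sigmoid_prime(acts[l])
            net[l] = (W - eta * np.outer(acts[l], delta), b - eta * delta)
            delta = new_delta
    return net
```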
step 106, estimating the moving object of the video frame.
In an embodiment, the updated confidence of each particle may be calculated from the neural network with updated weights, and among the calculated updated confidences the particle with the maximum confidence is selected as the moving target of the video frame. Of course, if the confidence in step 105 is not smaller than the preset threshold, the particle with the highest confidence may be selected directly as the moving target of the video frame. Since the position of the target changes little between adjacent frames, the target in the current frame is more likely to lie near the particles that had high confidence in the previous frame.
In a preferred embodiment, the particles need to be filtered before each particle is substituted into the neural network with updated weights. The particle filtering process is iterative. Let X_t denote the state variable of the moving target at time t; in the initial step, N particles are drawn from the previous particle set X_{t-1} in proportion to their distribution to obtain a new state. The newly generated particles, however, are usually influenced by probability, resulting in particle degeneracy: most particles concentrate around the particles with larger weights. The posterior probability p(x_t | z_{1:t-1}) is then approximated by a finite set of N particles carrying importance weights w_t^i, as shown in formula (10):
p(x_t | z_{1:t-1}) = ∫ p(x_t | x_{t-1}) p(x_{t-1} | z_{1:t-1}) dx_{t-1}  (10)
If a particle x_t^i is the predicted tracking result, the background information contained in its corresponding rectangular frame is less than the background information in the rectangular frames corresponding to other particles, and the weight of that particle is larger (the tracking result is represented by the position of the particle and the size of the target, which form a rectangular frame that in turn visually marks the position and size of the target on the image). The relationship between the weight and the posterior probability can then be obtained from formula (11):
w_t^i ∝ w_{t-1}^i · p(z_t | x_t^i) p(x_t^i | x_{t-1}^i) / q(x_t | x_{1:t-1}, z_{1:t})  (11)
wherein x_{1:t-1} represents the random samples forming the posterior probability distribution, N represents the number of samples, z_{1:t} represents the observed values from the start time to time t, x_t^i represents the i-th sample at time t, w_{t-1}^i represents the i-th weight at time t−1 (that is, in the previous frame), and q(x_t | x_{1:t-1}, z_{1:t}) = p(x_t | x_{t-1}) is the importance distribution.
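One particle-filter iteration under the simplification q(x_t | x_{1:t-1}, z_{1:t}) = p(x_t | x_{t-1}) could look as follows; the Gaussian diffusion scale and the use of the network confidence as the likelihood are assumptions of this sketch.

```python
import numpy as np

def particle_step(particles, weights, confidence_fn, rng, scale=2.0):
    """Resample N particles in proportion to their weights (countering the
    degeneracy described above), diffuse them, then reweight: with
    q = p(x_t | x_{t-1}), formula (11) reduces to weights proportional to
    the likelihood, for which the network confidence stands in."""
    n = len(particles)
    idx = rng.choice(n, size=n, p=weights)     # proportional resampling
    moved = particles[idx] + rng.normal(scale=scale, size=particles[idx].shape)
    conf = np.asarray([confidence_fn(p) for p in moved])
    return moved, conf / conf.sum()            # normalized importance weights
```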
In another aspect of the present invention, a cross-scale tracking device for a moving target is further provided. As shown in fig. 2, the device includes a neural network construction unit 201, a weight updating unit 202 and a moving target estimation unit 203, which are connected in sequence. The neural network construction unit 201 builds a neural network and performs weight training layer by layer to obtain initialized weights; preferably, the unit performs the layer-by-layer weight training with a stacked denoising autoencoder. The weight updating unit 202 inputs the true value of the first video frame into the established neural network and updates the initial weights; calculates a bias term for each input video frame starting from the second video frame; calculates the output value of the neural network for the video frame according to the obtained bias term; and judges whether the confidence output by the neural network for the video frame is smaller than a preset threshold. If it is smaller, the weights of the neural network are updated, and the moving target estimation unit 203 estimates the moving target of the video frame according to the neural network with the updated weights; if it is not smaller, the moving target estimation unit 203 estimates the moving target of the video frame directly.
As an embodiment, in the process of initializing the weights, the neural network construction unit 201 may let k denote the number of training samples, i = 1, 2, …, k, with the training set {x_1, …, x_i, …, x_k}; W' and W respectively denote the weight of the hidden layer and the weight of the output layer, and b' and b denote the bias terms of the corresponding layers. From an input sample x_i, a hidden layer representation h_i and a reconstruction x̂_i of the input are derived, as shown in formula (1) and formula (2):
h_i = f(W'x_i + b')  (1)
x̂_i = f(Wh_i + b)  (2)
wherein f(·) denotes a nonlinear excitation function, and sigm(·) denotes the excitation function of the neural network, as shown in formula (3):
sigm(x) = 1 / (1 + e^(-x))  (3)
The denoising autoencoder is obtained through learning of formula (4):
E = (1/k) Σ_{i=1}^{k} ||x_i − x̂_i||² + γ(||W||² + ||W'||²)  (4)
wherein the first term of E represents the reconstruction loss function of the neural network and the second term is a weight penalty term; the parameter γ balances the reconstruction error and the weight penalty term so as to fully consider the relationship between the two. Assuming sigm(·) is used in formula (4), taking the partial derivative of E with respect to the weight w_ji yields formula (5):
∂E/∂w_ji = δ_j x_ji  (5)
Letting Δw_ji = −η(∂E/∂w_ji), formula (6) and formula (7) are obtained:
Δw_ji = −η δ_j x_ji  (6)
w_ji ← w_ji + Δw_ji = w_ji − η δ_j x_ji  (7)
wherein η is the learning rate, δ_j is the error, and x_ji and w_ji respectively represent the data and the weights from the i-th layer to the j-th layer.
Further, when the weight updating unit 202 calculates the bias term for the input video frame, it may be assumed that the current video frame is I_t and that the previous video frame I_{t-1} serves as the comparison object: the average hash value of the tracking result of I_{t-1} is calculated, and then the Hamming distance between it and the average hash values of all current samples. The greater the Hamming distance between two hash values, the lower the similarity between them, and vice versa. Specifically, the video frame is sampled to obtain initialized samples S = {s_1, s_2, …, s_N}, i = 1, 2, …, N. Then the hash value l of the tracking result of the previous frame I_{t-1} is calculated, and for each sample s_i the average hash value l_{s_i} of s_i is calculated; the Hamming distance between l and each sample s_i is calculated using formula (8):
dis(l, s_i) = Σ_j (l_j ⊕ l_{s_i,j})  (8)
Finally, the bias term corresponding to sample s_i is obtained from the obtained Hamming distance according to formula (9), wherein b_i denotes the bias term of the i-th sample.
In another embodiment, the moving target estimation unit 203 may calculate the updated confidence of each particle according to the neural network with updated weights, and select, among the calculated updated confidences, the particle with the maximum confidence as the moving target of the video frame. Of course, if the confidence is not smaller than the preset threshold, the particle with the highest confidence may be selected directly as the moving target of the video frame. In a preferred embodiment, the filtering process is performed on each particle before it is substituted into the neural network with updated weights.
It should be noted that, in the implementation of the moving object cross-scale tracking apparatus according to the present invention, the details of the moving object cross-scale tracking method described above have been described in detail, and therefore, the repeated contents are not described again.
In summary, the cross-scale tracking method and device for a moving target provided by the invention train the network parameters before tracking using the stacked denoising autoencoder; during tracking, the hash value of each sampling patch is calculated with the average hash method, the Hamming distance between each patch and the tracking result of the previous frame is obtained through similarity calculation, and this distance is used to correct the bias term of the network; through the feature extraction process of the network, the scale information and detail information of the moving target are obtained, and particle-filter motion estimation is used to predict the position and size of the moving target, finally yielding the tracking result.
Moreover, a neural network is constructed from the denoising autoencoder obtained through learning, and smaller weights are sought by the gradient descent method, reducing the possibility of overfitting. Meanwhile, the invention captures the structural features of the target using the inherent low-frequency information of the image and revises the bias term so that different samples correspond to different bias values. Each sample is down-sampled into a new, smaller sample, removing the high-frequency part of the image to obtain an image containing 64 pixels, and the gray-level mean of the new image is calculated; each pixel of the new image is compared with the mean (1 if greater than or equal to the mean, 0 otherwise), finally yielding the hash value of the sample. The Hamming distance between the hash value of each sample of the current frame and that of the template is then calculated; the smaller the distance, the higher the similarity to the template. Through the average hash value, bias terms of the neural network are established for the different sampling patches, so that different sampling patches correspond to different scales. The invention therefore has wide and important significance for popularization, and the whole cross-scale tracking method and device are compact and easy to control.
Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to intimate that the scope of the disclosure, including the claims, is limited to these examples; within the idea of the invention, also features in the above embodiments or in different embodiments may be combined, steps may be implemented in any order, and there are many other variations of the different aspects of the invention as described above, which are not provided in detail for the sake of brevity.
In addition, well known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures for simplicity of illustration and discussion, and so as not to obscure the invention. Furthermore, devices may be shown in block diagram form in order to avoid obscuring the invention, and also in view of the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the present invention is to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the invention, it should be apparent to one skilled in the art that the invention can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.
While the present invention has been described in conjunction with specific embodiments thereof, many alternatives, modifications and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the discussed embodiments.
The embodiments of the invention are intended to embrace all such alternatives, modifications and variations as fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements and the like made without departing from the spirit and principles of the invention are intended to be included within the scope of the invention.
Claims (10)
1. A cross-scale tracking method for a moving target is characterized by comprising the following steps:
building a neural network, and carrying out weight training layer by layer to obtain initialized weights;
inputting the true value of the first video frame into the established neural network, and updating the initial weight;
calculating a bias term for the input video frame, starting from the second video frame;
calculating the output value of the neural network for the video frame according to the obtained bias term of the video frame;
judging whether the confidence output by the neural network for the video frame is smaller than a preset threshold; if so, updating the weights of the neural network and estimating the moving target of the video frame according to the neural network with the updated weights; and if not, directly estimating the moving target of the video frame.
2. The method of claim 1, wherein the weights are trained layer by layer using a stacked denoising autoencoder to obtain the initialized weights.
3. The method of claim 2, wherein the constructing the neural network and performing weight training layer by layer to obtain initialized weights comprises:
let k denote the number of training samples, i = 1, 2, …, k, with the training set {x_1, …, x_i, …, x_k}; W' and W respectively denote the weight of the hidden layer and the weight of the output layer, and b' and b denote the bias terms of the corresponding layers; from an input sample x_i, a hidden layer representation h_i and a reconstruction x̂_i of the input are derived, as shown in formula (1) and formula (2):
h_i = f(W'x_i + b')  (1)
x̂_i = f(Wh_i + b)  (2)
wherein f(·) denotes a nonlinear excitation function, and sigm(·) denotes the excitation function of the neural network, as shown in formula (3):
sigm(x) = 1 / (1 + e^(-x))  (3)
the denoising autoencoder is obtained through learning of formula (4):
E = (1/k) Σ_{i=1}^{k} ||x_i − x̂_i||² + γ(||W||² + ||W'||²)  (4)
wherein the first term of E represents the reconstruction loss function of the neural network and the second term is a weight penalty term; the parameter γ balances the reconstruction error and the weight penalty term so as to fully consider the relationship between the two; assuming sigm(·) is used in formula (4), taking the partial derivative of E with respect to the weight w_ji yields formula (5):
∂E/∂w_ji = δ_j x_ji  (5)
letting Δw_ji = −η(∂E/∂w_ji), formula (6) and formula (7) are obtained:
Δw_ji = −η δ_j x_ji  (6)
w_ji ← w_ji + Δw_ji = w_ji − η δ_j x_ji  (7)
wherein η is the learning rate, δ_j is the error, and x_ji and w_ji respectively represent the data and the weights from the i-th layer to the j-th layer.
4. The method of claim 3, wherein calculating the bias term for the input video frame comprises:
sampling the video frame to obtain initialized samples S = {s_1, s_2, …, s_N}, i = 1, 2, …, N;
calculating the hash value l of the tracking result of the previous frame I_{t-1};
for each sample s_i, calculating the average hash value l_{s_i} of s_i;
calculating the Hamming distance between l and each sample s_i using formula (8), wherein N denotes the number of samples, S = {s_1, s_2, …, s_N}, i = 1, 2, …, N, and dis(l, s_i) denotes the Hamming distance between the hash value l of the tracking result of video frame I_{t-1} and the hash value l_{s_i} of the current i-th sample s_i:
dis(l, s_i) = Σ_j (l_j ⊕ l_{s_i,j})  (8)
obtaining the bias term corresponding to sample s_i from the obtained Hamming distance according to formula (9), wherein b_i denotes the bias term of the i-th sample.
5. The method according to any one of claims 1 to 4, wherein the estimating the moving object of the video frame according to the neural network with the updated weight value comprises:
calculating the updated confidence of each particle according to the neural network with the updated weight;
and selecting, among the calculated updated confidences, the particle with the maximum updated confidence as the moving target of the video frame.
6. A moving object cross-scale tracking device is characterized by comprising:
the neural network construction unit is used for constructing a neural network and carrying out weight training layer by layer to obtain initialized weights;
the weight updating unit is used for inputting the true value of the first video frame into the established neural network and updating the initial weights; calculating a bias term for the input video frame starting from the second video frame; calculating the output value of the neural network for the video frame according to the obtained bias term of the video frame; and judging whether the confidence output by the neural network for the video frame is smaller than a preset threshold: if so, updating the weights of the neural network; if not, performing no processing;
and the moving object estimation unit is used for estimating the moving object of the video frame.
7. The apparatus of claim 6, wherein the neural network construction unit performs the layer-by-layer weight training using a stacked denoising autoencoder to obtain the initialized weights.
8. The apparatus of claim 7, wherein the neural network building unit obtaining initialized weights comprises:
let k denote the number of training samples, i = 1, 2, …, k, with the training set {x_1, …, x_i, …, x_k}; W' and W respectively denote the weight of the hidden layer and the weight of the output layer, and b' and b denote the bias terms of the corresponding layers; from an input sample x_i, a hidden layer representation h_i and a reconstruction x̂_i of the input are derived, as shown in formula (1) and formula (2):
h_i = f(W'x_i + b')  (1)
x̂_i = f(Wh_i + b)  (2)
wherein f(·) denotes a nonlinear excitation function, and sigm(·) denotes the excitation function of the neural network, as shown in formula (3):
sigm(x) = 1 / (1 + e^(-x))  (3)
the denoising autoencoder is obtained through learning of formula (4):
E = (1/k) Σ_{i=1}^{k} ||x_i − x̂_i||² + γ(||W||² + ||W'||²)  (4)
wherein the first term of E represents the reconstruction loss function of the neural network and the second term is a weight penalty term; the parameter γ balances the reconstruction error and the weight penalty term so as to fully consider the relationship between the two; assuming sigm(·) is used in formula (4), taking the partial derivative of E with respect to the weight w_ji yields formula (5):
∂E/∂w_ji = δ_j x_ji  (5)
letting Δw_ji = −η(∂E/∂w_ji), formula (6) and formula (7) are obtained:
Δw_ji = −η δ_j x_ji  (6)
w_ji ← w_ji + Δw_ji = w_ji − η δ_j x_ji  (7)
wherein η is the learning rate, δ_j is the error, and x_ji and w_ji respectively represent the data and the weights from the i-th layer to the j-th layer.
9. The apparatus of claim 8, wherein the weight update unit calculates the bias term for the input video frame comprises:
sampling the video frame to obtain initialized samples S = {s_1, s_2, …, s_N}, i = 1, 2, …, N;
calculating the hash value l of the tracking result of the previous frame I_{t-1};
for each sample s_i, calculating the average hash value l_{s_i} of s_i;
calculating the Hamming distance between l and each sample s_i using formula (8), wherein N denotes the number of samples, S = {s_1, s_2, …, s_N}, i = 1, 2, …, N, and dis(l, s_i) denotes the Hamming distance between the hash value l of the tracking result of video frame I_{t-1} and the hash value l_{s_i} of the current i-th sample s_i:
dis(l, s_i) = Σ_j (l_j ⊕ l_{s_i,j})  (8)
obtaining the bias term corresponding to sample s_i from the obtained Hamming distance according to formula (9), wherein b_i denotes the bias term of the i-th sample.
10. The apparatus according to any one of claims 6 to 9, wherein the estimating the moving object of the video frame according to the neural network with updated weights comprises:
calculating the updated confidence of each particle according to the neural network with the updated weight;
and selecting, among the calculated updated confidences, the particle with the maximum updated confidence as the moving target of the video frame.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610548086.0A CN106203350B (en) | 2016-07-12 | 2016-07-12 | Cross-scale tracking method and device for a moving target |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610548086.0A CN106203350B (en) | 2016-07-12 | 2016-07-12 | Cross-scale tracking method and device for a moving target |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106203350A true CN106203350A (en) | 2016-12-07 |
CN106203350B CN106203350B (en) | 2019-10-11 |
Family
ID=57477769
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610548086.0A Active CN106203350B (en) | 2016-07-12 | 2016-07-12 | A kind of across the scale tracking of moving target and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106203350B (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5548662A (en) * | 1993-02-08 | 1996-08-20 | Lg Electronics Inc. | Edge extracting method and apparatus using diffusion neural network |
CN101089917A (en) * | 2007-06-01 | 2007-12-19 | 清华大学 | Quick identification method for object vehicle lane changing |
CN104036323A (en) * | 2014-06-26 | 2014-09-10 | 叶茂 | Vehicle detection method based on convolutional neural network |
CN105654509A (en) * | 2015-12-25 | 2016-06-08 | 燕山大学 | Motion tracking method based on composite deep neural network |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107169117A (en) * | 2017-05-25 | 2017-09-15 | 西安工业大学 | A kind of manual draw human motion search method based on autocoder and DTW |
CN107169117B (en) * | 2017-05-25 | 2020-11-10 | 西安工业大学 | Hand-drawn human motion retrieval method based on automatic encoder and DTW |
CN107292914A (en) * | 2017-06-15 | 2017-10-24 | 国家新闻出版广电总局广播科学研究院 | Visual target tracking method based on small-sized single branch convolutional neural networks |
CN109559329A (en) * | 2018-11-28 | 2019-04-02 | 陕西师范大学 | A kind of particle filter tracking method based on depth denoising autocoder |
CN110136173A (en) * | 2019-05-21 | 2019-08-16 | 浙江大华技术股份有限公司 | A kind of target location processing method and device |
CN110298404A (en) * | 2019-07-02 | 2019-10-01 | 西南交通大学 | A kind of method for tracking target based on triple twin Hash e-learnings |
CN112634188A (en) * | 2021-02-02 | 2021-04-09 | 深圳市爱培科技术股份有限公司 | Vehicle far and near scene combined imaging method and device |
Also Published As
Publication number | Publication date |
---|---|
CN106203350B (en) | 2019-10-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106203350B (en) | Cross-scale tracking method and device for a moving target | |
CN110232394B (en) | Multi-scale image semantic segmentation method | |
CN107358293B (en) | Neural network training method and device | |
CN107369166B (en) | Target tracking method and system based on multi-resolution neural network | |
CN105938559B (en) | Use the Digital Image Processing of convolutional neural networks | |
CN108062562B (en) | Object re-recognition method and device | |
Saputra et al. | Learning monocular visual odometry through geometry-aware curriculum learning | |
CN110120020A (en) | A kind of SAR image denoising method based on multiple dimensioned empty residual error attention network | |
CN107330357A (en) | Vision SLAM closed loop detection methods based on deep neural network | |
CN109754078A (en) | Method for optimization neural network | |
CN113657560B (en) | Weak supervision image semantic segmentation method and system based on node classification | |
CN107154024A (en) | Dimension self-adaption method for tracking target based on depth characteristic core correlation filter | |
CN110349185B (en) | RGBT target tracking model training method and device | |
CN107885322A (en) | Artificial neural network for mankind's activity identification | |
CN104881029B (en) | Mobile Robotics Navigation method based on a point RANSAC and FAST algorithms | |
CN109948741A (en) | A kind of transfer learning method and device | |
CN105678284A (en) | Fixed-position human behavior analysis method | |
CN111126278B (en) | Method for optimizing and accelerating target detection model for few-class scene | |
CN105981050A (en) | Method and system for exacting face features from data of face images | |
CN104156919B (en) | A kind of based on wavelet transformation with the method for restoring motion blurred image of Hopfield neutral net | |
CN114186672A (en) | Efficient high-precision training algorithm for impulse neural network | |
CN112446888A (en) | Processing method and processing device for image segmentation model | |
CN114897728A (en) | Image enhancement method and device, terminal equipment and storage medium | |
CN114283320A (en) | Target detection method based on full convolution and without branch structure | |
CN110827319B (en) | Improved Staple target tracking method based on local sensitive histogram |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |