CN107808122A - Target tracking method and device - Google Patents
Target tracking method and device
- Publication number
- CN107808122A (application CN201710920018.7A)
- Authority
- CN
- China
- Prior art keywords
- target
- bounding box
- neural networks
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
- G06V20/42—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The embodiments of the present application disclose a target tracking method and device that combine two convolutional neural networks with a time recurrent neural network model, solving the problem of a low detection rate for small targets. Moreover, information in the background associated with the targets is extracted for target detection, improving the speed and accuracy of the target tracking model in video object detection.
Description
Technical field
The present application relates to the field of target detection technology, and more specifically to a target tracking method and device.
Background technology
Target tracking has long been a hot topic in computer vision and pattern recognition, with wide application in video surveillance, human-computer interaction, vehicle navigation, and other fields. The inventors found, in the course of developing the present application, that current target tracking methods perform poorly when monitoring very small targets.
Therefore, how to improve the accuracy of target detection results has become an urgent problem to be solved.
Summary of the invention
The purpose of the present application is to provide a target tracking method and device, so as to improve the accuracy of target detection results.
To achieve the above object, the present application provides the following technical scheme:
A target tracking method, in which target detection is performed on each frame image in a video stream by a pre-trained target tracking model, including:
performing, by a first convolutional neural network in the target tracking model, target detection on the image, to obtain the position of each detected target in the image and the category of each detected target;
performing, by a second convolutional neural network in the target tracking model, background-based target detection on the image, to obtain information in the background associated with targets of different categories;
associating, by a time recurrent neural network in the target tracking model and based on the information in the background associated with targets of different categories, the detected targets with different backgrounds at different moments, to obtain the target detection results.
In the above method, preferably, the process by which the first convolutional neural network performs target detection on an image includes:
dividing the image into n*n grid cells;
predicting several bounding boxes in each grid cell, and recording the position and size of each bounding box, together with the trust value and class label corresponding to each bounding box;
calculating, based on the trust value and class label corresponding to each bounding box, the trust value score of each bounding box for its category;
deleting the bounding boxes in the grid whose trust value score for their category is below a preset threshold, and performing non-maximum suppression separately on the retained bounding boxes of each category, to obtain the positions and category information of the targets.
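The per-box scoring and thresholding steps above can be sketched as follows; this is a minimal illustration, assuming boxes given as `(x, y, w, h)` arrays, per-box trust values, and per-box class probability vectors (the 0.6 threshold is the value named later in the description):

```python
import numpy as np

def class_trust_scores(confidences, class_probs):
    """Trust value score of each bounding box for its own category:
    trust value multiplied by the class probability, as described above."""
    scores = confidences[:, None] * class_probs           # (boxes, classes)
    categories = scores.argmax(axis=1)                    # category per box
    best = scores[np.arange(len(scores)), categories]     # score for that category
    return categories, best

def filter_boxes(boxes, confidences, class_probs, threshold=0.6):
    """Drop boxes whose trust value score falls below the preset threshold."""
    categories, scores = class_trust_scores(confidences, class_probs)
    keep = scores >= threshold
    return boxes[keep], categories[keep], scores[keep]
```

The boxes that survive this filter would then go through per-category non-maximum suppression, as described in the embodiments below.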
In the above method, preferably, the process by which the first convolutional neural network performs target detection on an image includes:
dividing the image into m*m grid cells according to L different division granularities, where m takes L different values;
for each division granularity, predicting several bounding boxes in each grid cell, and recording the position and size of each bounding box, together with the trust value and class label corresponding to each bounding box;
calculating, based on the trust value and class label corresponding to each bounding box in a grid cell, the trust value score of each bounding box for its category;
deleting the bounding boxes in the grid whose trust value score for their category is below a preset threshold, and performing non-maximum suppression separately on the retained bounding boxes of each category under each division granularity, to obtain the positions and category information of the targets.
In the above method, preferably, the time recurrent neural network associating the detected targets with different backgrounds at different moments, based on the information in the background associated with targets of different categories, to obtain the target detection results, includes:
associating, by the time recurrent neural network and through the previously learned association relations between targets of the same type at different moments and different backgrounds, the detected targets with different backgrounds at different moments, to obtain the target detection results.
In the above method, preferably, the training process of the target tracking model includes:
assigning the weights of the convolutional-layer parameters of a YOLO convolutional neural network to the first convolutional neural network, and initializing the weights of the remaining parameters of the first convolutional neural network from a Gaussian random distribution; training the first convolutional neural network end to end on a target detection and classification task, to obtain a first convolutional neural network model;
assigning the weights of the convolutional-layer parameters of the first convolutional neural network to the second convolutional neural network, and initializing the weights of the remaining parameters of the second convolutional neural network from a Gaussian random distribution; training the second convolutional neural network end to end on a background-based target type detection task, to obtain a second convolutional neural network model;
assigning the weights of the convolutional layers of the second convolutional neural network model to the convolutional layers of the first convolutional neural network model, and training again by the above steps; after two such cycles, obtaining the final first convolutional neural network model and second convolutional neural network model;
training the time recurrent neural network, on a pre-selected video training set, on the task of associating targets of the same type at different moments with different backgrounds, to obtain a time recurrent neural network model; the video training set includes equal numbers of first-class videos and second-class videos of identical duration, where the variation amplitude of targets in the first-class videos is greater than that of targets in the second-class videos;
constructing an initial target tracking model: connecting all convolutional layers of the first convolutional neural network model to the time recurrent neural network model through a first fully connected layer; connecting at least part of the convolutional layers of the second convolutional neural network model (for example, all of the convolutional layers, or the first 12 layers) to the time recurrent neural network model through a second fully connected layer; and connecting the output of the time recurrent neural network model, together with the outputs of the first fully connected layer and the second fully connected layer, to the input of a third fully connected layer;
training the initial target tracking model on a preset target detection task, to obtain the target tracking model.
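The wiring just described can be illustrated with a bare numpy sketch. All sizes here (64-dimensional flattened conv features from each network, hidden size 32, 10 output units) are hypothetical choices for illustration, not values from the text, and the time recurrent network is reduced to a single hand-written LSTM cell:

```python
import numpy as np

rng = np.random.default_rng(0)
F1, F2, H = 64, 64, 32   # assumed feature and hidden sizes, illustration only

# first / second fully connected layers projecting each CNN's conv features
W_fc1 = rng.normal(0, 0.01, (F1, H))
W_fc2 = rng.normal(0, 0.01, (F2, H))

# minimal LSTM cell over the concatenated projections
Wx = rng.normal(0, 0.01, (2 * H, 4 * H))
Wh = rng.normal(0, 0.01, (H, 4 * H))

# third fully connected layer fuses the LSTM output with the fc1/fc2 outputs
W_fc3 = rng.normal(0, 0.01, (3 * H, 10))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(conv1_feats, conv2_feats):
    """conv*_feats: (T, F*) per-frame flattened conv features of the two CNNs."""
    h = np.zeros(H)
    c = np.zeros(H)
    outs = []
    for t in range(conv1_feats.shape[0]):
        a = conv1_feats[t] @ W_fc1            # first fully connected layer
        b = conv2_feats[t] @ W_fc2            # second fully connected layer
        x = np.concatenate([a, b])
        g = x @ Wx + h @ Wh                   # gate pre-activations
        i, f, o = sigmoid(g[:H]), sigmoid(g[H:2*H]), sigmoid(g[2*H:3*H])
        c = f * c + i * np.tanh(g[3*H:])      # cell state update
        h = o * np.tanh(c)                    # LSTM output
        outs.append(np.concatenate([h, a, b]) @ W_fc3)
    return np.stack(outs)
```

For example, feeding five frames of features from each branch yields five fused output vectors, one per frame.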
In the above method, preferably, training the first convolutional neural network end to end on the target detection and classification task includes: the first convolutional neural network performs target detection and classification in the following way:
dividing the image into n*n grid cells;
predicting several bounding boxes in each grid cell, and recording the position and size of each bounding box, together with the trust value and class label corresponding to each bounding box;
calculating, based on the trust value and class label corresponding to each bounding box, the trust value score of each bounding box for its category;
deleting the bounding boxes in the grid whose trust value score for their category is below a preset threshold, and performing non-maximum suppression separately on the retained bounding boxes of each category in all grid cells, to obtain the target detection results;
calculating the error degree of the target detection results of the first convolutional neural network by a preset loss function; the loss function is:

$$\begin{aligned}
Loss = {} & \lambda_1 \sum_{i=1}^{S^2}\sum_{j=1}^{B} \mathbb{1}_{ij}^{obj}\left[(x_{ij}-\hat{x}_{ij})^2+(y_{ij}-\hat{y}_{ij})^2+\left(\sqrt{w_{ij}}-\sqrt{\hat{w}_{ij}}\right)^2+\left(\sqrt{h_{ij}}-\sqrt{\hat{h}_{ij}}\right)^2\right]\\
& + \lambda_3 \sum_{i=1}^{S^2}\sum_{j=1}^{B} \mathbb{1}_{ij}^{obj}\left(C_{ij}-\hat{C}_{ij}\right)^2 + \lambda_2 \sum_{i=1}^{S^2}\sum_{j=1}^{B} \mathbb{1}_{ij}^{noobj}\left(C_{ij}-\hat{C}_{ij}\right)^2\\
& + \lambda_3 \sum_{i=1}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c}\left(p_i(c)-\hat{p}_i(c)\right)^2
\end{aligned}$$

where Loss is the error degree of the target detection results of the first convolutional neural network; λ₁ is the loss weight of the coordinate prediction loss, and may take the value 5; λ₂ is the loss weight of the trust value loss of bounding boxes containing no target, and may take the value 0.5; λ₃ is the loss weight of the trust value loss and classification loss of bounding boxes containing a target, and may take the value 1; i distinguishes different grid cells, and j distinguishes different bounding boxes; x_ij, y_ij, w_ij, h_ij and C_ij denote predicted values, and x̂_ij, ŷ_ij, ŵ_ij, ĥ_ij and Ĉ_ij denote calibrated values; S² is the number of grid cells into which the image is divided, and B is the number of bounding boxes in a grid cell; C_ij denotes the trust value score of the j-th bounding box in the i-th grid cell, and p_i(c) denotes the probability that a target of category c exists in the i-th grid cell; 1_ij^obj takes 1 if the object category detected by the j-th bounding box in the i-th grid cell is the same as in the pre-calibrated bounding box, and 0 otherwise; 1_ij^noobj takes 0 if the object category detected by the j-th bounding box in the i-th grid cell is the same as in the pre-calibrated bounding box, and 1 otherwise;
if the error degree is greater than or equal to a preset threshold, updating the weights using the back-propagation algorithm and the Adam update method, and inputting unused data from the training library for the next round of training, until the difference between the error degree and the minimum value of the loss function is less than a preset threshold.
A target detection device, including:
a first detection module, configured to perform target detection on each frame image in a video stream by a first convolutional neural network, obtaining the position of each detected target in the image and the category of each detected target;
a second detection module, configured to perform background-based target detection on the image by a second convolutional neural network, obtaining information in the background associated with targets of different categories;
an association module, configured to associate, based on the information in the background associated with targets of different categories, the detected targets with different backgrounds at different moments, obtaining target detection results.
In the above device, preferably, the first detection module is specifically configured to: divide the image into n*n grid cells by the first convolutional neural network; predict several bounding boxes in each grid cell, and record the position and size of each bounding box, together with the trust value and class label corresponding to each bounding box; calculate, based on the trust value and class label corresponding to each bounding box, the trust value score of each bounding box for its category; delete the bounding boxes in the grid whose trust value score for their category is below a preset threshold; and perform non-maximum suppression separately on the retained bounding boxes of each category, to obtain the positions and category information of the targets.
In the above device, preferably, the first detection module is specifically configured to: divide the image, by the first convolutional neural network, into m*m grid cells according to L different division granularities, where m takes L different values; for each division granularity, predict several bounding boxes in each grid cell, and record the position and size of each bounding box, together with the trust value and class label corresponding to each bounding box; calculate, based on the trust value and class label corresponding to each bounding box in a grid cell, the trust value score of each bounding box for its category; delete the bounding boxes in the grid whose trust value score for their category is below a preset threshold; and perform non-maximum suppression separately on the retained bounding boxes of each category under each division granularity, to obtain the positions and category information of the targets.
In the above device, preferably, the association module is specifically configured to associate, through the previously learned association relations between targets of the same type at different moments and different backgrounds, the detected targets with different backgrounds at different moments, obtaining target detection results.
Through the above scheme, the target tracking method and device provided by the present application combine two convolutional neural networks with a time recurrent neural network model, solving the problem of a low detection rate for small targets. Moreover, information in the background associated with the targets is extracted for target detection, improving the speed and accuracy of the target tracking model in video object detection.
Brief description of the drawings
In order to illustrate the embodiments of the present invention or the technical schemes of the prior art more clearly, the accompanying drawings required in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from these drawings without creative work.
Fig. 1 is an exemplary diagram of the target tracking model provided by an embodiment of the present application;
Fig. 2 is an implementation flowchart of a target tracking method provided by an embodiment of the present application;
Fig. 3 is an implementation flowchart of a target detection device provided by an embodiment of the present application.
The terms "first", "second", "third", "fourth" and the like (if present) in the specification, claims, and the above drawings are used to distinguish similar parts, and are not used to describe a specific order or sequence. It should be understood that data so used may be interchanged where appropriate, so that the embodiments described herein can be implemented in an order other than that illustrated herein.
Embodiments
The technical schemes in the embodiments of the present invention are described clearly and completely below in conjunction with the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only part of the embodiments of the present invention, rather than all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the scope of protection of the present invention.
As shown in Fig. 1, which is an exemplary diagram of the target tracking model provided by an embodiment of the present application, the target tracking model provided by the present application includes two convolutional neural networks (CNN) and one time recurrent neural network, an LSTM (Long Short-Term Memory) network. Convolutional network 1 is the convolutional layers of one convolutional neural network (for ease of distinction, hereinafter referred to as the first convolutional neural network), and convolutional network 2 is the convolutional layers of the other convolutional neural network (for ease of distinction, hereinafter referred to as the second convolutional neural network).
The training process of the target tracking model is described first.
In the embodiment of the present application, the two convolutional neural networks and the time recurrent neural network are first trained independently; then the initial target tracking model of the present application is constructed from the results of this separate training, and the initial target tracking model is trained to obtain the final target tracking model.
In the embodiment of the present application, the first convolutional neural network is mainly responsible for extracting targets and labelling the category and position of each target. The first convolutional neural network includes 24 convolutional layers and 2 fully connected layers, and can be trained on the basis of the YOLO (You Only Look Once) convolutional neural network. Specifically, the weights of the convolutional-layer parameters of the YOLO convolutional neural network are assigned to the convolutional layers of the first convolutional neural network, and the weights of the fully connected layers of the first convolutional neural network are initialized from a Gaussian random distribution (for example, a Gaussian distribution with mean zero and variance 0.01); the first convolutional neural network is then trained end to end on the target detection and classification task, giving the initial first convolutional neural network model.
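The initialization just described (transferred convolutional weights, Gaussian fully connected weights) can be sketched as follows; the layer shapes are hypothetical, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

def init_first_cnn(yolo_conv_weights, fc_shapes, std=0.01):
    """Copy the YOLO convolutional-layer weights, and draw the fully
    connected weights from a zero-mean Gaussian, as described above."""
    conv = [w.copy() for w in yolo_conv_weights]          # transferred weights
    fc = [rng.normal(0.0, std, shape) for shape in fc_shapes]
    return conv, fc

# hypothetical shapes, for illustration only
yolo = [rng.normal(size=(3, 3, 16, 32))]
conv, fc = init_first_cnn(yolo, fc_shapes=[(4096, 1024), (1024, 490)])
```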
During training, one way for the first convolutional neural network to perform the target detection and classification task is as follows:
each frame image in the training video is divided into n*n grid cells, where n is a positive integer. In an optional embodiment, n may take the value 7. The position and class label of every target is calibrated in each frame image of the training video.
Several bounding boxes (generally rectangular boxes, used to mark detected targets) are predicted in each grid cell, and the position and size of each predicted bounding box, together with the trust value and class label corresponding to each bounding box, are recorded. The class label characterizes the category of the target in the bounding box, and the trust value expresses two important pieces of information about the predicted bounding box: the confidence that it contains a target, and the accuracy of the bounding box prediction. The trust value is calculated as:

$$Confidence = Pr(Object) \times IOU_{pred}^{truth}$$

In the formula, the value of Pr(Object) depends on whether a target falls within the bounding box: when a target falls within the bounding box, Pr(Object) is 1; otherwise, Pr(Object) is 0. IOU (Intersection-over-Union, the ratio of intersection to union) is computed between the predicted bounding box and the calibrated target bounding box. Whether a target falls within a bounding box can be judged from the calibration values; a target falling within a bounding box includes both the target falling entirely within the bounding box and the target falling partly within it.
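The IOU term above can be computed as follows; this sketch assumes boxes given as `(x, y, w, h)` with `(x, y)` the upper-left corner, matching the position and size convention described below:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two (x, y, w, h) boxes,
    where (x, y) is the upper-left corner."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))   # overlap width
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))   # overlap height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```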
Generally, the position of a bounding box is the coordinate of its upper-left corner, and the size of a bounding box is its length and width.
Based on the trust value and class label corresponding to each bounding box, the trust value score of each bounding box for its category is calculated: the trust value corresponding to each bounding box is multiplied by its class label, giving the class-specific trust value score of each bounding box, i.e. the trust value score of each bounding box for the category to which it belongs.
The bounding boxes in the grid whose trust value score for their category is below a preset score threshold are deleted, and non-maximum suppression is performed on the bounding boxes of the same category among those retained in the grid, giving the target detection results of each grid cell.
Each grid cell is processed in the same way, which is not repeated here one by one.
In an optional embodiment, the preset score threshold may be 0.6.
After the target detection results of each grid cell are obtained, non-maximum suppression is performed on the bounding boxes of the same category across the whole image, giving the final target detection results.
The process of performing non-maximum suppression on the bounding boxes of the same category among those retained in a grid may be:
determining the bounding box with the highest trust value score among the bounding boxes of the same category (denoted the first bounding box for ease of narration);
calculating the overlap rate between each other bounding box of the same category (denoted a second bounding box for ease of narration) and the first bounding box; if the overlap rate is higher than a set value, the second bounding box is deleted; otherwise, the second bounding box is retained.
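The first-box / second-box procedure above can be sketched as a greedy loop; this is a minimal illustration in which the overlap rate is taken as Intersection-over-Union and the "set value" is assumed to be 0.5 (the patent does not fix either choice):

```python
def iou(box_a, box_b):
    """Overlap rate of two (x, y, w, h) boxes as Intersection-over-Union."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, overlap_thresh=0.5):
    """Greedy non-maximum suppression over bounding boxes of one category,
    following the first-box / second-box procedure described above."""
    order = sorted(range(len(boxes)), key=lambda k: scores[k], reverse=True)
    keep = []
    while order:
        first = order.pop(0)                  # highest trust value score
        keep.append(first)                    # retain the first bounding box
        # delete second boxes whose overlap rate with it is too high
        order = [k for k in order
                 if iou(boxes[first], boxes[k]) <= overlap_thresh]
    return keep
```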
The error degree of the target detection results of the first convolutional neural network is calculated by a preset loss function; the error degree characterizes the error between the predicted values (i.e. the detection results) and the calibrated values. The loss function is:

$$\begin{aligned}
Loss = {} & \lambda_1 \sum_{i=1}^{S^2}\sum_{j=1}^{B} \mathbb{1}_{ij}^{obj}\left[(x_{ij}-\hat{x}_{ij})^2+(y_{ij}-\hat{y}_{ij})^2+\left(\sqrt{w_{ij}}-\sqrt{\hat{w}_{ij}}\right)^2+\left(\sqrt{h_{ij}}-\sqrt{\hat{h}_{ij}}\right)^2\right]\\
& + \lambda_3 \sum_{i=1}^{S^2}\sum_{j=1}^{B} \mathbb{1}_{ij}^{obj}\left(C_{ij}-\hat{C}_{ij}\right)^2 + \lambda_2 \sum_{i=1}^{S^2}\sum_{j=1}^{B} \mathbb{1}_{ij}^{noobj}\left(C_{ij}-\hat{C}_{ij}\right)^2\\
& + \lambda_3 \sum_{i=1}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c}\left(p_i(c)-\hat{p}_i(c)\right)^2
\end{aligned}$$

where Loss is the error degree of the target detection results of the first convolutional neural network; λ₁ is the loss weight of the coordinate prediction loss, and may take the value 5; λ₂ is the loss weight of the trust value loss of bounding boxes containing no target, and may take the value 0.5; λ₃ is the loss weight of the trust value loss and classification loss of bounding boxes containing a target, and may take the value 1; i distinguishes different grid cells, and j distinguishes different bounding boxes. x_ij, y_ij, w_ij, h_ij and C_ij denote predicted values: x_ij and y_ij are the coordinates of the j-th predicted bounding box in the i-th grid cell, w_ij is its width, and h_ij is its height. x̂_ij, ŷ_ij, ŵ_ij, ĥ_ij and Ĉ_ij denote calibrated values: x̂_ij and ŷ_ij are the coordinates of the j-th calibrated bounding box in the i-th grid cell, ŵ_ij is its width, and ĥ_ij is its height. S² is the number of grid cells into which the image is divided, and B is the number of bounding boxes in a grid cell. C_ij denotes the predicted trust value score of the j-th bounding box in the i-th grid cell, and Ĉ_ij denotes the calibrated trust value score of the j-th bounding box in the i-th grid cell. p_i(c) denotes the predicted probability of a bounding box of category c in the i-th grid cell, and p̂_i(c) denotes the calibrated probability of a bounding box of category c in the i-th grid cell; the probability of a bounding box of category c appearing in the i-th grid cell is the quotient of the number of bounding boxes of category c in the i-th grid cell and the total number of bounding boxes in the i-th grid cell.
The value of 1_ij^obj depends on whether the j-th bounding box in the i-th grid cell contains the set detection target: if the object category detected by the j-th bounding box in the i-th grid cell is the same as in the pre-calibrated bounding box, 1_ij^obj takes 1; otherwise it takes 0.
The term weighted by λ₃ over 1_ij^obj is the product of the trust value prediction loss of the bounding boxes containing a target and its loss weight; the term weighted by λ₂ over 1_ij^noobj is the product of the trust value prediction loss of the bounding boxes containing no target and its loss weight. The value of 1_ij^noobj likewise depends on whether the j-th bounding box in the i-th grid cell contains the set detection target: if the object category detected by the j-th bounding box in the i-th grid cell is the same as in the pre-calibrated bounding box, 1_ij^noobj takes 0; otherwise it takes 1.
The final term is the product of the class prediction loss, gated by whether a target centre falls in grid cell i, and its loss weight: if a target centre falls in grid cell i, 1_i^obj takes 1; otherwise, 1_i^obj takes 0. c denotes the category.
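The loss described above can be sketched directly in numpy. This is an illustrative implementation under the stated weight values (λ₁ = 5, λ₂ = 0.5, λ₃ = 1); the array layout is an assumption:

```python
import numpy as np

def yolo_style_loss(pred, truth, obj, obj_cell, pred_p, truth_p,
                    lam1=5.0, lam2=0.5, lam3=1.0):
    """Sketch of the loss described above.
    pred, truth : (S2, B, 5) arrays of (x, y, w, h, C) per bounding box;
    obj         : (S2, B) indicator 1_ij^obj (1_ij^noobj is its complement);
    obj_cell    : (S2,) indicator 1_i^obj (target centre in cell i);
    pred_p, truth_p : (S2, classes) class probabilities."""
    noobj = 1.0 - obj
    dx = pred[..., 0] - truth[..., 0]
    dy = pred[..., 1] - truth[..., 1]
    dw = np.sqrt(pred[..., 2]) - np.sqrt(truth[..., 2])
    dh = np.sqrt(pred[..., 3]) - np.sqrt(truth[..., 3])
    dC = pred[..., 4] - truth[..., 4]
    coord = lam1 * np.sum(obj * (dx**2 + dy**2 + dw**2 + dh**2))
    conf = lam3 * np.sum(obj * dC**2) + lam2 * np.sum(noobj * dC**2)
    cls = lam3 * np.sum(obj_cell[:, None] * (pred_p - truth_p)**2)
    return coord + conf + cls
```

A perfect prediction (predicted values equal to calibrated values) gives a loss of zero, which is a quick sanity check for the term-by-term reconstruction.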
In order to detect small targets as well as large targets, and to keep the individual losses in the loss function balanced, in the embodiment of the present application the coordinate prediction loss is characterized by the Euclidean distance, so that when optimizing the first convolutional neural network only the coordinates are fine-tuned; this addresses the problems of target false detection, target missed detection, and repeated detection.
If the error degree is greater than or equal to a preset threshold, the weights are updated using the BP back-propagation algorithm and the Adam update method, and other data from the database is input for the next round of training, until the error degree is less than the preset threshold.
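A single Adam weight update, as applied after back-propagation, can be sketched as follows. The hyperparameter values are the commonly used defaults, an assumption on our part; the patent does not specify them:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update of weights w given gradient grad at step t >= 1."""
    m = b1 * m + (1 - b1) * grad           # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad**2        # second-moment estimate
    m_hat = m / (1 - b1**t)                # bias correction
    v_hat = v / (1 - b2**t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```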
During training, another way for the first convolutional neural network to perform the target detection and classification task is as follows:
the image is divided into m*m grid cells according to L different division granularities, where m takes L different values. In an optional embodiment, L may be 4, and the 4 values of m may be 7, 5, 3 and 1, respectively. For each division granularity, several bounding boxes are predicted in each grid cell, and the position and size of each predicted bounding box, together with the trust value and class label corresponding to each bounding box, are recorded;
based on the trust value and class label corresponding to each bounding box, the trust value score of each bounding box for its category is calculated;
the bounding boxes in the grid whose trust value score for their category is below a preset threshold are deleted, and non-maximum suppression is performed separately on the bounding boxes of each category retained in the grid, i.e. on the bounding boxes of the same category among those retained in the grid, giving the target detection results of each grid cell.
Each grid cell is processed in the same way, which is not repeated here one by one.
After the target detection results of each grid cell are obtained, non-maximum suppression is performed separately on the bounding boxes of each category across the whole image, i.e. on the bounding boxes of the same category across the whole image, giving the final target detection results.
The error degree of the target detection results of the first convolutional neural network is calculated by the preset loss function.
If the error degree is greater than or equal to the preset threshold, the weights are updated using the BP back-propagation algorithm and the Adam update method, and other data from the database is input for the next round of training, until the error degree is less than the preset threshold.
The target detection and classification process under each of the above division granularities follows the process described earlier; that is, when the image is divided into 7*7 grid cells, the above target detection process is performed once; when the image is divided into 5*5 grid cells, the above target detection process is performed once more, and so on, until the target detection above has been carried out under every division granularity. The target detection process under each granularity is not repeated here one by one.
In each round of training, the union of the detection results under all granularities is the final target detection result of that round.
In the embodiment of the present application, performing target detection and classification with multiple division granularities makes the target detection accuracy higher.
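The multi-granularity pass and the union of its results can be sketched as a simple loop. `detect_at(image, m)` is a placeholder for the per-granularity detection process described earlier; the granularity values 7, 5, 3, 1 come from the optional embodiment above:

```python
def detect_multi_granularity(image, detect_at, granularities=(7, 5, 3, 1)):
    """Run the detection process once per division granularity and
    take the union of the results, as described above."""
    results = []
    for m in granularities:                  # one pass per m*m grid division
        results.extend(detect_at(image, m))
    return results

# usage sketch with a stub detector returning (box, category, score) tuples
def stub_detect(image, m):
    return [((0, 0, 10, 10), "person", 0.9)] if m == 7 else []

detections = detect_multi_granularity(None, stub_detect)
```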
The second convolutional neural network is mainly responsible for extracting information in the background associated with targets of different categories. The second convolutional neural network has the same structure as the first convolutional neural network, but the task it performs and the output it produces differ: the task performed by the second convolutional neural network is background-based target type detection, and its output is the information in the background associated with targets of different categories. The second convolutional neural network is optimized using the Softmax function as the loss function, and its parameter update process is the same as that of the first convolutional network.
When training the second convolutional neural network, the weights of the convolutional-layer parameters of the trained first convolutional neural network are assigned to the second convolutional neural network, and the weights of the fully connected layer parameters of the second convolutional neural network are initialized from a Gaussian random distribution; the second convolutional neural network is trained end to end on the background-based target type detection task, giving the second convolutional neural network model. The background-based target type detection can use conventional detection methods.
The weights of the convolutional layers of the second convolutional neural network model are then assigned to the convolutional-layer parameters of the first convolutional neural network model, and the first convolutional neural network model and the second convolutional neural network model are trained again by the foregoing method. This cycle is repeated twice (three rounds of training in total), giving the final first convolutional neural network model and second convolutional neural network model.
In the embodiment of the present application, the joint training of the first convolutional neural network and the second convolutional neural network improves the calculation speed during training.
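The alternating schedule above can be sketched as a short orchestration loop. The `cnn*` dicts and the `train_*` callables are placeholders for the networks and the end-to-end training steps described earlier, not real training code:

```python
def alternating_training(cnn1, cnn2, train_detection, train_background,
                         rounds=3):
    """Sketch of the alternating training described above: train the first
    network, hand its conv weights to the second, train the second, hand
    them back, and repeat for three rounds in total."""
    for _ in range(rounds):
        train_detection(cnn1)                 # detection + classification task
        cnn2["conv"] = list(cnn1["conv"])     # share conv weights forward
        train_background(cnn2)                # background-based type detection
        cnn1["conv"] = list(cnn2["conv"])     # hand conv weights back
    return cnn1, cnn2
```

After the loop, both networks end up with identical convolutional-layer weights, consistent with the sharing noted below.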
As can be seen from the training processes of the two convolutional neural networks above, the convolutional-layer parameters of the first convolutional neural network and the second convolutional neural network are identical. To reduce calculation time, the first convolutional neural network and the second convolutional neural network may therefore share convolutional-layer parameters, which also reduces the storage space occupied.
The time recurrent neural network is mainly used to associate the detected targets with different backgrounds at different moments, improving the accuracy of target detection in video.
In the embodiment of the present application, the time recurrent neural network is trained on a training set containing two classes of videos. The two classes contain equal numbers of videos of identical duration, and the variation amplitude of the targets in the first class of videos is larger than that in the second class. A large variation amplitude may mean that targets appear suddenly, disappear suddenly, or undergo large changes in posture or appearance; a small variation amplitude may mean that targets change slowly, do not suddenly appear or disappear, and show only small changes in posture.
The time recurrent neural network analyses, in each video, the association between the same target and different backgrounds at different moments, and obtains by machine learning the association between targets of the same type and different backgrounds at different moments.
During training, the weights are updated according to the backpropagation-through-time algorithm and the Adam update method.
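The Adam update method referred to here is a standard optimizer; the text does not give its hyperparameters, so the defaults below (learning rate, beta values) are assumptions. A minimal sketch of one Adam parameter step, driven here by a toy quadratic objective rather than backpropagation through time:

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    # One Adam update: exponential moving averages of the gradient and
    # squared gradient, bias correction, then the parameter step.
    m = [b1 * mi + (1 - b1) * g for mi, g in zip(m, grad)]
    v = [b2 * vi + (1 - b2) * g * g for vi, g in zip(v, grad)]
    m_hat = [mi / (1 - b1 ** t) for mi in m]
    v_hat = [vi / (1 - b2 ** t) for vi in v]
    theta = [p - lr * mh / (math.sqrt(vh) + eps)
             for p, mh, vh in zip(theta, m_hat, v_hat)]
    return theta, m, v

# Toy objective f(theta) = sum(theta_i^2); its gradient is 2 * theta.
theta = [1.0, -2.0]
m = [0.0, 0.0]
v = [0.0, 0.0]
for t in range(1, 5001):
    grad = [2 * p for p in theta]
    theta, m, v = adam_step(theta, grad, m, v, t)

print(sum(p * p for p in theta) < 1e-2)  # whether we ended near the minimum
```

In the embodiment, the `grad` vector would come from backpropagation (through time, for the recurrent network) rather than an analytic toy gradient.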
The respective training processes of the convolutional neural networks and the time recurrent neural network have been described above. The following describes the process of training the target tracking model composed of the trained convolutional neural networks and time recurrent neural network.
An initial target tracking model is constructed from the two trained convolutional neural network models and the time recurrent neural network model: all convolutional layers of the first convolutional neural network model are connected to the time recurrent neural network model through a first fully connected layer; at least part of the convolutional layers of the second convolutional neural network model are connected to the time recurrent neural network model through a second fully connected layer; and the output of the time recurrent neural network model is connected to the inputs of the two fully connected layers above as well as to the input of a third fully connected layer.
The initial target tracking model is trained on a preset target detection task to obtain the target tracking model.
The preset target detection task may be as follows:
the first convolutional neural network performs target detection on the image, obtaining the positions of the detected targets in the image and the classes of the detected targets;
the second convolutional neural network performs background-based target detection on the image, obtaining the information in the background associated with targets of different classes;
the time recurrent neural network, based on the information in the background associated with targets of different classes, associates the detected targets with different backgrounds at different moments to obtain the target detection results, which are output through the third fully connected layer.
In a preferred embodiment, after obtaining the target detection results, the time recurrent neural network does not output them immediately but feeds them back to the convolutional neural networks, specifically to their fully connected layers. The preceding fully connected layer randomly selects among the data output by the convolutional network and the data fed back by the LSTM; the randomly selected values are processed again by the time recurrent neural network to obtain the final target detection results, which are output through the last fully connected layer. In the embodiment of the present application, this feedback mechanism improves target detection precision.
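The random-selection step of this feedback mechanism can be sketched as below. The feature values, pool size and selection count are hypothetical; the text only specifies that the fully connected layer selects randomly among the convolutional outputs and the LSTM's fed-back detections.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def feedback_merge(conv_features, lstm_feedback, k):
    # Randomly select k values from the pooled conv-layer outputs and
    # fed-back LSTM detections; in the embodiment the selection would
    # then be re-processed by the recurrent network before the final
    # fully connected output layer.
    pool = conv_features + lstm_feedback
    return random.sample(pool, k)

conv_features = [0.1, 0.4, 0.7, 0.9]   # hypothetical conv-layer activations
lstm_feedback = [0.2, 0.8]             # hypothetical fed-back detections
merged = feedback_merge(conv_features, lstm_feedback, k=3)
print(len(merged))  # 3 values drawn from the combined pool
```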
During training of the target tracking model, the weights of the parameters in the convolutional neural networks are updated using the BP backpropagation algorithm and the Adam update method, and the weights of the parameters in the time recurrent neural network are updated using the backpropagation-through-time algorithm and the Adam update method.
In an optional embodiment, the process by which the first convolutional neural network performs target detection on an image may include:
dividing the image into n*n grid cells;
predicting several bounding boxes in each grid cell, and recording the position and size of each bounding box as well as the trust value and class label corresponding to each bounding box;
calculating, based on the trust value and class label corresponding to each bounding box, the trust value score of each bounding box for the class it belongs to;
deleting the bounding boxes in a grid cell whose trust value score for their class is below a preset threshold, and performing non-maximum suppression on the retained bounding boxes of the same class in the grid cell to obtain the positions and class information of the targets in the cell.
After the target detection results of each grid cell are obtained, non-maximum suppression is performed on bounding boxes of the same class across the whole image to obtain the final target detection results.
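The per-class non-maximum suppression step above can be sketched as follows. Boxes are represented as (cx, cy, w, h) centre-format tuples and the IoU threshold of 0.5 is an assumed parameter, not one specified in the text.

```python
def iou(a, b):
    # Intersection-over-union of two boxes given as (cx, cy, w, h).
    ax1, ay1 = a[0] - a[2] / 2, a[1] - a[3] / 2
    ax2, ay2 = a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1 = b[0] - b[2] / 2, b[1] - b[3] / 2
    bx2, by2 = b[0] + b[2] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, thresh=0.5):
    # Keep the highest-scoring box, drop boxes of the same class that
    # overlap it above thresh, then repeat on the remainder.
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < thresh]
    return keep

# Two overlapping detections of the same class plus one distant one.
boxes = [(10, 10, 8, 8), (11, 10, 8, 8), (40, 40, 8, 8)]
scores = [0.9, 0.6, 0.8]
print(nms(boxes, scores))  # [0, 2]: the weaker overlapping box is suppressed
```

In the embodiment this would run once per class within a grid cell, and once more per class over the whole image.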
In another optional embodiment, the process by which the first convolutional neural network performs target detection on an image may include:
dividing the image into m*m grid cells according to L different division granularities, m taking L different values;
for each division granularity, predicting several bounding boxes in each grid cell, and recording the position and size of each bounding box as well as the trust value and class label corresponding to each bounding box;
calculating, based on the trust value and class label corresponding to each bounding box, the trust value score of each bounding box for the class it belongs to;
deleting the bounding boxes in a grid cell whose trust value score for their class is below a preset threshold, and performing non-maximum suppression on the retained bounding boxes of the same class in the grid cell to obtain the positions and class information of the targets.
After the target detection results of each grid cell are obtained, non-maximum suppression is performed on bounding boxes of the same class across the whole image to obtain the final target detection results.
Target detection is performed by the above method under each division granularity.
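Under the multi-granularity scheme, each target centre falls into one grid cell per granularity. A minimal sketch, assuming normalised image coordinates in [0, 1) and the example values L = 4, m in {7, 5, 3, 1} given later in the text; the target centre used here is hypothetical:

```python
def grid_cell(cx, cy, m):
    # Map a normalised centre point (cx, cy) in [0, 1) to its cell
    # (row, col) in an m*m grid.
    return int(cy * m), int(cx * m)

granularities = [7, 5, 3, 1]  # L = 4 different values of m
centre = (0.62, 0.30)         # hypothetical target centre

for m in granularities:
    row, col = grid_cell(centre[0], centre[1], m)
    print(m, (row, col))
```

Coarser grids (smaller m) group the same centre into larger cells, so detection at each granularity sees the target at a different spatial scale.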
After the target tracking model is trained, target detection can be performed with it.
Referring to Fig. 2, Fig. 2 is an implementation flowchart of the target tracking method provided by the embodiment of the present application, which may include:
Step S21: the first convolutional neural network performs target detection on the image, obtaining the positions of the detected targets in the image and the classes of the detected targets;
Step S22: the second convolutional neural network performs background-based target detection on the image, obtaining the information in the background associated with targets of different classes;
Step S23: the time recurrent neural network, based on the information in the background associated with targets of different classes, associates the detected targets with different backgrounds at different moments to obtain the target detection results.
The process by which the first convolutional neural network performs target detection on an image may include:
dividing the image into n*n grid cells;
predicting several bounding boxes in each grid cell, and recording the position and size of each bounding box as well as the trust value and class label corresponding to each bounding box;
calculating, based on the trust value and class label corresponding to each bounding box, the trust value score of each bounding box for the class it belongs to;
deleting the bounding boxes in a grid cell whose trust value score for their class is below a preset threshold, and performing non-maximum suppression on the retained bounding boxes of the same class to obtain the positions and class information of the targets in each grid cell.
After the target detection results of each grid cell are obtained, non-maximum suppression is performed on bounding boxes of the same class across the whole image to obtain the final target detection results.
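The trust value score for a class can be read as the box's trust value multiplied by the class probability, with boxes whose best score falls below the preset threshold deleted before non-maximum suppression. A sketch with assumed numbers (the threshold, trust values and class probabilities are all hypothetical):

```python
def trust_value_scores(box_trust, class_probs):
    # Class-specific trust value score: the bounding box's trust value
    # multiplied by the probability of each class label.
    return [box_trust * p for p in class_probs]

threshold = 0.2  # assumed preset threshold

# Hypothetical boxes: (trust value, per-class probabilities).
boxes = [
    (0.9, [0.7, 0.2, 0.1]),
    (0.3, [0.3, 0.4, 0.3]),
]

kept = []
for trust, probs in boxes:
    scores = trust_value_scores(trust, probs)
    if max(scores) >= threshold:
        kept.append((scores.index(max(scores)), max(scores)))

print(kept)  # (class index, score) for each retained box
```

Here only the first box survives the threshold; the second, with a weak trust value spread across classes, is deleted before suppression.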
In another optional embodiment, the process by which the first convolutional neural network performs target detection on an image may include:
dividing the image into m*m grid cells according to L different division granularities, m taking L different values. In an optional embodiment, L may be 4 and m may take the four values 7, 5, 3 and 1. For each division granularity, several bounding boxes are predicted in each grid cell, and the position and size of each predicted bounding box as well as the trust value and class label corresponding to each bounding box are recorded;
calculating, based on the trust value and class label corresponding to each bounding box, the trust value score of each bounding box for the class it belongs to;
deleting the bounding boxes in a grid cell whose trust value score for their class is below a preset threshold, and performing non-maximum suppression on the retained bounding boxes of the same class in the grid cell to obtain the positions and class information of the targets in each grid cell.
After the target detection results of each grid cell are obtained, non-maximum suppression is performed on bounding boxes of the same class across the whole image to obtain the final target detection results.
The target detection process is identical under each division granularity and is not repeated here.
In an optional embodiment, the time recurrent neural network, based on the information in the background associated with targets of different classes, associating the detected targets with different backgrounds at different moments to obtain the target detection results, may include:
the time recurrent neural network associating the detected targets with different backgrounds at different moments through the previously learned association between targets of the same type and different backgrounds at different moments, to obtain the target detection results.
Corresponding to the method embodiment, the present application also provides a target detection apparatus. An implementation diagram of the target detection apparatus provided by the embodiment of the present application is shown in Fig. 3, and may include:
a first detection module 31, a second detection module 32 and an association module 33, wherein
the first detection module 31 is configured to perform target detection on each frame of image in a video stream through the first convolutional neural network, obtaining the positions of the detected targets in the image and the classes of the detected targets;
the second detection module 32 is configured to perform background-based target detection on the image through the second convolutional neural network, obtaining the information in the background associated with targets of different classes;
the association module 33 is configured to associate the detected targets with different backgrounds at different moments based on the information in the background associated with targets of different classes, obtaining the target detection results.
The target detection apparatus provided by the present application combines the two convolutional neural networks with the time recurrent neural network model, solving the problem of a low detection rate for small targets. Moreover, extracting the information in the background associated with targets for target detection improves the speed and accuracy of the target tracking model in video target detection.
In an optional embodiment, the first detection module 31 may be specifically configured to: divide the image into n*n grid cells through the first convolutional neural network; predict several bounding boxes in each grid cell, and record the position and size of each bounding box as well as the trust value and class label corresponding to each bounding box; calculate, based on the trust value and class label corresponding to each bounding box, the trust value score of each bounding box for the class it belongs to; delete the bounding boxes in a grid cell whose trust value score for their class is below a preset threshold, and perform non-maximum suppression separately on the retained bounding boxes of each class to obtain the positions and class information of the targets.
In another optional embodiment, the first detection module 31 may be specifically configured to: divide the image into m*m grid cells according to L different division granularities through the first convolutional neural network, m taking L different values; for each division granularity, predict several bounding boxes in each grid cell, and record the position and size of each bounding box as well as the trust value and class label corresponding to each bounding box; calculate, based on the trust value and class label corresponding to each bounding box in a grid cell, the trust value score of each bounding box for the class it belongs to; delete the bounding boxes whose trust value score for their class is below the preset threshold, and perform non-maximum suppression separately on the bounding boxes of each class retained under the different division granularities to obtain the positions and class information of the targets.
In an optional embodiment, the association module 33 may be specifically configured to:
associate the detected targets with different backgrounds at different moments through the previously learned association between targets of the same type and different backgrounds at different moments, obtaining the target detection results.
In an optional embodiment, the target detection apparatus may further include:
a training module for training the target tracking model, specifically configured to: assign the weights of the convolutional-layer parameters of a YOLO convolutional neural network to the first convolutional neural network, initialize the weights of the other parameters of the first convolutional neural network from a Gaussian random distribution, and train the first convolutional neural network end to end on the target detection and classification task to obtain a first convolutional neural network model;
assign the weights of the convolutional-layer parameters of the first convolutional neural network to the second convolutional neural network, initialize the weights of the other parameters of the second convolutional neural network from a Gaussian random distribution, and train the second convolutional neural network end to end on the background-based target type detection task to obtain a second convolutional neural network model;
assign the weights of the convolutional-layer parameters of the second convolutional neural network model to the convolutional layers of the first convolutional neural network model, and train again by the above steps, cycling twice in this way, to obtain the final first convolutional neural network model and second convolutional neural network model;
train the time recurrent neural network on the task of associating targets of the same type with different backgrounds at different moments using a pre-selected video training set, obtaining a time recurrent neural network model; the video training set contains equal numbers of first-class and second-class videos of identical duration, and the variation amplitude of the targets in the first-class videos is larger than that in the second-class videos;
construct an initial target tracking model: connect all convolutional layers of the first convolutional neural network model to the time recurrent neural network model through a first fully connected layer, connect at least part of the convolutional layers of the second convolutional neural network model (for example, all of the convolutional layers, or the first 12 layers) to the time recurrent neural network model through a second fully connected layer, and connect the output of the time recurrent neural network model to the inputs of the first and second fully connected layers and to the input of a third fully connected layer.
The initial target tracking model is trained on the preset target detection task to obtain the target tracking model.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled artisans may implement the described functions in different ways for each particular application, but such implementations should not be considered beyond the scope of the present invention.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, apparatuses and methods may be implemented in other ways. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, may exist physically alone, or two or more units may be integrated into one unit.
It should be understood that, in the embodiments of the present application, the claims, embodiments and features may be combined with each other to solve the aforementioned technical problem.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash disk, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk or an optical disk.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A target tracking method, characterized in that target detection is performed on each frame of image in a video stream by a pre-trained target tracking model, including:
a first convolutional neural network in the target tracking model performing target detection on the image, obtaining the positions of the detected targets in the image and the classes of the detected targets;
a second convolutional neural network in the target tracking model performing background-based target detection on the image, obtaining the information in the background associated with targets of different classes;
a time recurrent neural network in the target tracking model, based on the information in the background associated with targets of different classes, associating the detected targets with different backgrounds at different moments to obtain target detection results.
2. The method according to claim 1, characterized in that the process by which the first convolutional neural network performs target detection on an image includes:
dividing the image into n*n grid cells;
predicting several bounding boxes in each grid cell, and recording the position and size of each bounding box as well as the trust value and class label corresponding to each bounding box;
calculating, based on the trust value and class label corresponding to each bounding box, the trust value score of each bounding box for the class it belongs to;
deleting the bounding boxes in a grid cell whose trust value score for their class is below a preset threshold, and performing non-maximum suppression separately on the retained bounding boxes of each class to obtain the positions and class information of the targets.
3. The method according to claim 1, characterized in that the process by which the first convolutional neural network performs target detection on an image includes:
dividing the image into m*m grid cells according to L different division granularities, m taking L different values;
for each division granularity, predicting several bounding boxes in each grid cell, and recording the position and size of each bounding box as well as the trust value and class label corresponding to each bounding box;
calculating, based on the trust value and class label corresponding to each bounding box in a grid cell, the trust value score of each bounding box for the class it belongs to;
deleting the bounding boxes whose trust value score for their class is below a preset threshold, and performing non-maximum suppression separately on the bounding boxes of each class retained under the different division granularities to obtain the positions and class information of the targets.
4. The method according to claim 1, characterized in that the time recurrent neural network, based on the information in the background associated with targets of different classes, associating the detected targets with different backgrounds at different moments to obtain the target detection results, includes:
the time recurrent neural network associating the detected targets with different backgrounds at different moments through the previously learned association between targets of the same type and different backgrounds at different moments, to obtain the target detection results.
5. The method according to claim 1, characterized in that the training process of the target tracking model includes:
assigning the weights of the convolutional-layer parameters of a YOLO convolutional neural network to the first convolutional neural network, initializing the weights of the other parameters of the first convolutional neural network from a Gaussian random distribution, and training the first convolutional neural network end to end on the target detection and classification task to obtain a first convolutional neural network model;
assigning the weights of the convolutional-layer parameters of the first convolutional neural network to the second convolutional neural network, initializing the weights of the other parameters of the second convolutional neural network from a Gaussian random distribution, and training the second convolutional neural network end to end on the background-based target type detection task to obtain a second convolutional neural network model;
assigning the weights of the convolutional-layer parameters of the second convolutional neural network model to the convolutional layers of the first convolutional neural network model, and training again by the above steps, cycling twice in this way, to obtain the final first convolutional neural network model and second convolutional neural network model;
training the time recurrent neural network on the task of associating targets of the same type with different backgrounds at different moments using a pre-selected video training set, obtaining a time recurrent neural network model, the video training set containing equal numbers of first-class and second-class videos of identical duration, the variation amplitude of the targets in the first-class videos being larger than that in the second-class videos;
constructing an initial target tracking model: connecting all convolutional layers of the first convolutional neural network model to the time recurrent neural network model through a first fully connected layer, connecting at least part of the convolutional layers of the second convolutional neural network model to the time recurrent neural network model through a second fully connected layer, and connecting the output of the time recurrent neural network model to the inputs of the first and second fully connected layers and to the input of a third fully connected layer;
training the initial target tracking model on a preset target detection task to obtain the target tracking model.
6. The method according to claim 5, characterized in that training the first convolutional neural network end to end on the target detection and classification task includes: the first convolutional neural network performing target detection and classification in the following way:
dividing the image into n*n grid cells;
predicting several bounding boxes in each grid cell, and recording the position and size of each bounding box as well as the trust value and class label corresponding to each bounding box;
calculating, based on the trust value and class label corresponding to each bounding box, the trust value score of each bounding box for the class it belongs to;
deleting the bounding boxes whose trust value score for their class is below a preset threshold, and performing non-maximum suppression separately on the bounding boxes of each class retained over all grid cells to obtain the target detection results;
calculating the error degree of the target detection results of the first convolutional neural network by a preset loss function, the loss function being:
$$
\begin{aligned}
Loss ={}& \lambda_1 \sum_{i=0}^{S^2} \sum_{j=0}^{B} l_{ij}^{obj}\left[(x_{ij}-\hat{x}_{ij})^2+(y_{ij}-\hat{y}_{ij})^2\right]^{1/2}
+ \lambda_1 \sum_{i=0}^{S^2} \sum_{j=0}^{B} l_{ij}^{obj}\left[\left(\sqrt{w_{ij}}-\sqrt{\hat{w}_{ij}}\right)^2+\left(\sqrt{h_{ij}}-\sqrt{\hat{h}_{ij}}\right)^2\right] \\
&+ \lambda_3 \sum_{i=0}^{S^2} \sum_{j=0}^{B} l_{ij}^{obj}\left(C_{ij}-\hat{C}_{ij}\right)^2
+ \lambda_2 \sum_{i=0}^{S^2} \sum_{j=0}^{B} l_{ij}^{noobj}\left(C_{ij}-\hat{C}_{ij}\right)^2
+ \lambda_3 \sum_{i=0}^{S^2} l_{i}^{obj} \sum_{c \in classes}\left(p_i(c)-\hat{p}_i(c)\right)^2
\end{aligned}
$$
where Loss is the error measure of the object detection result of the first convolutional neural network; λ1 is the loss weight of the coordinate prediction loss and may take the value 5; λ2 is the loss weight of the confidence loss of bounding boxes containing no target and may take the value 0.5; λ3 is the loss weight of the confidence loss and the classification loss of bounding boxes containing a target and may take the value 1. The index i distinguishes different grid cells and j distinguishes different bounding boxes; x_ij, y_ij, w_ij, h_ij and C_ij denote predicted values, and x̂_ij, ŷ_ij, ŵ_ij, ĥ_ij and Ĉ_ij denote the corresponding calibrated values; S² is the number of grid cells the image is divided into, B is the number of bounding boxes in a grid cell, C_ij is the confidence score of the j-th bounding box in the i-th grid cell, and p_i(c) is the probability that a target of category c exists in the i-th grid cell. If the article category detected by the j-th bounding box in the i-th grid cell is the same as that of the pre-calibrated bounding box, l_ij^obj takes 1, otherwise l_ij^obj takes 0; under the same condition, l_ij^noobj takes 0, otherwise l_ij^noobj takes 1.
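As a concrete reading of the loss defined above, the following NumPy sketch evaluates it over an S²×B grid of predictions. The dictionary layout, array shapes, and the name `detection_loss` are illustrative assumptions, not part of the patent.

```python
import numpy as np

def detection_loss(pred, truth, obj_mask, lambda1=5.0, lambda2=0.5, lambda3=1.0):
    """Sketch of the detection loss for an S*S grid with B boxes per cell.

    pred/truth: dicts of arrays shaped (S*S, B) for "x", "y", "w", "h", "C"
    and (S*S, n_classes) for "p". obj_mask: (S*S, B) array that is 1 where
    the j-th box of cell i matches a calibrated target (l_ij^obj).
    """
    noobj_mask = 1.0 - obj_mask              # l_ij^noobj
    cell_obj = obj_mask.max(axis=1)          # l_i^obj: cell contains a target

    coord = lambda1 * np.sum(obj_mask * ((pred["x"] - truth["x"]) ** 2
                                         + (pred["y"] - truth["y"]) ** 2))
    size = lambda1 * np.sum(obj_mask * ((np.sqrt(pred["w"]) - np.sqrt(truth["w"])) ** 2
                                        + (np.sqrt(pred["h"]) - np.sqrt(truth["h"])) ** 2))
    conf_obj = lambda3 * np.sum(obj_mask * (pred["C"] - truth["C"]) ** 2)
    conf_noobj = lambda2 * np.sum(noobj_mask * (pred["C"] - truth["C"]) ** 2)
    cls = lambda3 * np.sum(cell_obj[:, None] * (pred["p"] - truth["p"]) ** 2)
    return coord + size + conf_obj + conf_noobj + cls
```

With a single cell and a single box whose predicted x is off by 1, only the coordinate term contributes, weighted by λ1 = 5.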
If the error is greater than or equal to a preset threshold, the weights are updated using the back-propagation algorithm and the Adam update method, and data in the training library that have not yet been used are input for the next round of training, until the difference between the error and the minimum value of the loss function is less than a preset threshold.
7. An object detection apparatus, characterized by comprising:
a first detection module, configured to perform target detection on each frame of image in a video stream through a first convolutional neural network, to obtain the position in the image of each detected target and the category of each detected target;
a second detection module, configured to perform background-based target detection on the image through a second convolutional neural network, to obtain information in the background associated with targets of different categories; and
an association module, configured to associate the detected targets with different backgrounds at different moments based on the information in the background associated with targets of different categories, to obtain a target detection result.
8. The apparatus according to claim 7, characterized in that the first detection module is specifically configured to: divide the image into n*n grid cells through the first convolutional neural network; predict several bounding boxes in each grid cell, and record the position and size of each bounding box as well as the confidence value and class label corresponding to each bounding box; calculate, based on the confidence value and class label corresponding to each bounding box, the confidence score of each bounding box for the category it belongs to; and delete the bounding boxes in the grid cells whose confidence score for their category is lower than a preset threshold, then perform non-maximum suppression separately on the retained bounding boxes of each category, to obtain the position and category information of the targets.
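The filter-then-suppress procedure of this claim can be sketched as follows. The name `filter_and_nms`, the (x1, y1, x2, y2) box format, and the two threshold values are illustrative assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def filter_and_nms(boxes, scores, labels, score_thresh=0.3, iou_thresh=0.5):
    """Delete boxes whose class-confidence score is below the threshold,
    then run non-maximum suppression separately per category."""
    keep = []
    for cls in set(labels):
        idx = [i for i in range(len(boxes))
               if labels[i] == cls and scores[i] >= score_thresh]
        idx.sort(key=lambda i: scores[i], reverse=True)
        while idx:
            best = idx.pop(0)
            keep.append(best)
            idx = [i for i in idx if iou(boxes[best], boxes[i]) < iou_thresh]
    return sorted(keep)
```

For two heavily overlapping "cat" boxes plus one low-scoring box, the low-scoring box is deleted by the threshold and the weaker overlapping box is suppressed, leaving only the strongest detection.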
9. The apparatus according to claim 7, characterized in that the first detection module is specifically configured to: divide the image into m*m grid cells through the first convolutional neural network according to L different division granularities, where m takes L different values; for each division granularity, predict several bounding boxes in each grid cell, and record the position and size of each bounding box as well as the confidence value and class label corresponding to each bounding box; calculate, based on the confidence value and class label corresponding to each bounding box in a grid cell, the confidence score of each bounding box for the category it belongs to; and delete the bounding boxes whose confidence score for their category is lower than a preset threshold, then perform non-maximum suppression separately on the bounding boxes of each category retained under the different division granularities, to obtain the position and category information of the targets.
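The multi-granularity variant can be sketched as pooling boxes across the L grid scales before suppression. The function names, the example granularities, and the score threshold are assumptions for illustration; `predict_fn` stands in for the first convolutional neural network's per-scale prediction.

```python
def detect_multi_granularity(image, predict_fn, granularities=(7, 14, 28),
                             score_thresh=0.3):
    """For each of the L division granularities, divide the image into m*m
    grid cells, collect the boxes predicted at that scale, drop low-scoring
    ones, and pool the survivors for per-category non-maximum suppression.

    predict_fn(image, m) -> iterable of (box, score, label) at granularity m.
    """
    pooled = []
    for m in granularities:
        for box, score, label in predict_fn(image, m):
            if score >= score_thresh:
                pooled.append((box, score, label))
    # sort best-first so NMS can then be applied per category to the pool
    pooled.sort(key=lambda b: b[1], reverse=True)
    return pooled
```

A stub predictor that emits one box per scale shows the low-scoring coarse-scale box being filtered out while the rest are pooled best-first.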
10. The apparatus according to claim 7, characterized in that the association module is specifically configured to associate the detected targets with different backgrounds at different moments through pre-learned association relationships between targets of the same type and different backgrounds at different moments, to obtain the target detection result.
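A minimal sketch of the association step, assuming the pre-learned relationships are represented as a category-to-backgrounds mapping; the patent does not specify this representation, so the mapping shape, the record layout, and the name `associate` are hypothetical.

```python
def associate(detections, background_assoc):
    """Link each detected target to the backgrounds its category is tied to.

    detections: list of (t, category, box) tuples over time.
    background_assoc: pre-learned mapping {category: set of background ids}
    (a hypothetical encoding of the learned association relationships).
    Returns one record per detection, grouping it with its backgrounds.
    """
    results = []
    for t, category, box in detections:
        results.append({"time": t, "category": category, "box": box,
                        "backgrounds": sorted(background_assoc.get(category, []))})
    return results
```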
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710920018.7A CN107808122B (en) | 2017-09-30 | 2017-09-30 | Target tracking method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710920018.7A CN107808122B (en) | 2017-09-30 | 2017-09-30 | Target tracking method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107808122A true CN107808122A (en) | 2018-03-16 |
CN107808122B CN107808122B (en) | 2020-08-11 |
Family
ID=61584759
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710920018.7A Active CN107808122B (en) | 2017-09-30 | 2017-09-30 | Target tracking method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107808122B (en) |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108764215A (en) * | 2018-06-21 | 2018-11-06 | 郑州云海信息技术有限公司 | Target search method for tracing, system, service centre and terminal based on video |
CN108968811A (en) * | 2018-06-20 | 2018-12-11 | 四川斐讯信息技术有限公司 | A kind of object identification method and system of sweeping robot |
CN109145781A (en) * | 2018-08-03 | 2019-01-04 | 北京字节跳动网络技术有限公司 | Method and apparatus for handling image |
CN109410251A (en) * | 2018-11-19 | 2019-03-01 | 南京邮电大学 | Method for tracking target based on dense connection convolutional network |
CN109753931A (en) * | 2019-01-04 | 2019-05-14 | 广州广电卓识智能科技有限公司 | Convolutional neural networks training method, system and facial feature points detection method |
CN109817009A (en) * | 2018-12-31 | 2019-05-28 | 天合光能股份有限公司 | A method of obtaining unmanned required dynamic information |
CN110007366A (en) * | 2019-03-04 | 2019-07-12 | 中国科学院深圳先进技术研究院 | A kind of life searching method and system based on Multi-sensor Fusion |
CN110008792A (en) * | 2018-01-05 | 2019-07-12 | 比亚迪股份有限公司 | Image detecting method, device, computer equipment and storage medium |
CN110087041A (en) * | 2019-04-30 | 2019-08-02 | 中国科学院计算技术研究所 | Video data processing and transmission method and system based on the base station 5G |
CN110443789A (en) * | 2019-08-01 | 2019-11-12 | 四川大学华西医院 | A kind of foundation and application method of immunofixation electrophoresis figure automatic identification model |
CN110487211A (en) * | 2019-09-29 | 2019-11-22 | 中国科学院长春光学精密机械与物理研究所 | Non-spherical element surface testing method, device, equipment and readable storage medium storing program for executing |
CN110619254A (en) * | 2018-06-19 | 2019-12-27 | 海信集团有限公司 | Target tracking method and device based on disparity map and terminal |
CN110826572A (en) * | 2018-08-09 | 2020-02-21 | 京东方科技集团股份有限公司 | Multi-target detection non-maximum suppression method, device and equipment |
CN110826379A (en) * | 2018-08-13 | 2020-02-21 | 中国科学院长春光学精密机械与物理研究所 | Target detection method based on feature multiplexing and YOLOv3 |
CN111104831A (en) * | 2018-10-29 | 2020-05-05 | 香港城市大学深圳研究院 | Visual tracking method, device, computer equipment and medium |
CN111178495A (en) * | 2018-11-10 | 2020-05-19 | 杭州凝眸智能科技有限公司 | Lightweight convolutional neural network for detecting very small objects in images |
CN112306104A (en) * | 2020-11-17 | 2021-02-02 | 广西电网有限责任公司 | Image target tracking holder control method based on grid weighting |
CN112911171A (en) * | 2021-02-04 | 2021-06-04 | 上海航天控制技术研究所 | Intelligent photoelectric information processing system and method based on accelerated processing |
CN115482417A (en) * | 2022-09-29 | 2022-12-16 | 珠海视熙科技有限公司 | Multi-target detection model and training method, device, medium and equipment thereof |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106503723A (en) * | 2015-09-06 | 2017-03-15 | 华为技术有限公司 | A kind of video classification methods and device |
CN106682697A (en) * | 2016-12-29 | 2017-05-17 | 华中科技大学 | End-to-end object detection method based on convolutional neural network |
CN106846364A (en) * | 2016-12-30 | 2017-06-13 | 明见(厦门)技术有限公司 | A kind of method for tracking target and device based on convolutional neural networks |
CN106911930A (en) * | 2017-03-03 | 2017-06-30 | 深圳市唯特视科技有限公司 | It is a kind of that the method for perceiving video reconstruction is compressed based on recursive convolution neutral net |
2017-09-30 CN CN201710920018.7A patent/CN107808122B/en active Active
Non-Patent Citations (1)
Title |
---|
Zhang Shun et al.: "Development of deep convolutional neural networks and their applications in the field of computer vision", Chinese Journal of Computers * |
Cited By (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110008792B (en) * | 2018-01-05 | 2021-10-22 | 比亚迪股份有限公司 | Image detection method, image detection device, computer equipment and storage medium |
CN110008792A (en) * | 2018-01-05 | 2019-07-12 | 比亚迪股份有限公司 | Image detecting method, device, computer equipment and storage medium |
CN110619254A (en) * | 2018-06-19 | 2019-12-27 | 海信集团有限公司 | Target tracking method and device based on disparity map and terminal |
CN110619254B (en) * | 2018-06-19 | 2023-04-18 | 海信集团有限公司 | Target tracking method and device based on disparity map and terminal |
CN108968811A (en) * | 2018-06-20 | 2018-12-11 | 四川斐讯信息技术有限公司 | A kind of object identification method and system of sweeping robot |
CN108764215A (en) * | 2018-06-21 | 2018-11-06 | 郑州云海信息技术有限公司 | Target search method for tracing, system, service centre and terminal based on video |
CN109145781A (en) * | 2018-08-03 | 2019-01-04 | 北京字节跳动网络技术有限公司 | Method and apparatus for handling image |
CN109145781B (en) * | 2018-08-03 | 2021-05-04 | 北京字节跳动网络技术有限公司 | Method and apparatus for processing image |
CN110826572A (en) * | 2018-08-09 | 2020-02-21 | 京东方科技集团股份有限公司 | Multi-target detection non-maximum suppression method, device and equipment |
CN110826379B (en) * | 2018-08-13 | 2022-03-22 | 中国科学院长春光学精密机械与物理研究所 | Target detection method based on feature multiplexing and YOLOv3 |
CN110826379A (en) * | 2018-08-13 | 2020-02-21 | 中国科学院长春光学精密机械与物理研究所 | Target detection method based on feature multiplexing and YOLOv3 |
CN111104831A (en) * | 2018-10-29 | 2020-05-05 | 香港城市大学深圳研究院 | Visual tracking method, device, computer equipment and medium |
CN111104831B (en) * | 2018-10-29 | 2023-09-29 | 香港城市大学深圳研究院 | Visual tracking method, device, computer equipment and medium |
CN111178495A (en) * | 2018-11-10 | 2020-05-19 | 杭州凝眸智能科技有限公司 | Lightweight convolutional neural network for detecting very small objects in images |
CN109410251A (en) * | 2018-11-19 | 2019-03-01 | 南京邮电大学 | Method for tracking target based on dense connection convolutional network |
CN109817009A (en) * | 2018-12-31 | 2019-05-28 | 天合光能股份有限公司 | A method of obtaining unmanned required dynamic information |
CN109753931A (en) * | 2019-01-04 | 2019-05-14 | 广州广电卓识智能科技有限公司 | Convolutional neural networks training method, system and facial feature points detection method |
CN110007366A (en) * | 2019-03-04 | 2019-07-12 | 中国科学院深圳先进技术研究院 | A kind of life searching method and system based on Multi-sensor Fusion |
CN110087041A (en) * | 2019-04-30 | 2019-08-02 | 中国科学院计算技术研究所 | Video data processing and transmission method and system based on the base station 5G |
CN110443789B (en) * | 2019-08-01 | 2021-11-26 | 四川大学华西医院 | Method for establishing and using immune fixed electrophoretogram automatic identification model |
CN110443789A (en) * | 2019-08-01 | 2019-11-12 | 四川大学华西医院 | A kind of foundation and application method of immunofixation electrophoresis figure automatic identification model |
CN110487211B (en) * | 2019-09-29 | 2020-07-24 | 中国科学院长春光学精密机械与物理研究所 | Aspheric element surface shape detection method, device and equipment and readable storage medium |
CN110487211A (en) * | 2019-09-29 | 2019-11-22 | 中国科学院长春光学精密机械与物理研究所 | Non-spherical element surface testing method, device, equipment and readable storage medium storing program for executing |
CN112306104A (en) * | 2020-11-17 | 2021-02-02 | 广西电网有限责任公司 | Image target tracking holder control method based on grid weighting |
CN112911171A (en) * | 2021-02-04 | 2021-06-04 | 上海航天控制技术研究所 | Intelligent photoelectric information processing system and method based on accelerated processing |
CN112911171B (en) * | 2021-02-04 | 2022-04-22 | 上海航天控制技术研究所 | Intelligent photoelectric information processing system and method based on accelerated processing |
CN115482417A (en) * | 2022-09-29 | 2022-12-16 | 珠海视熙科技有限公司 | Multi-target detection model and training method, device, medium and equipment thereof |
CN115482417B (en) * | 2022-09-29 | 2023-08-08 | 珠海视熙科技有限公司 | Multi-target detection model, training method, device, medium and equipment thereof |
Also Published As
Publication number | Publication date |
---|---|
CN107808122B (en) | 2020-08-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107808122A (en) | Method for tracking target and device | |
CN105912990B (en) | The method and device of Face datection | |
CN110443969B (en) | Fire detection method and device, electronic equipment and storage medium | |
CN109902677A (en) | A kind of vehicle checking method based on deep learning | |
CN110188720A (en) | A kind of object detection method and system based on convolutional neural networks | |
US10683173B2 (en) | Method of managing resources in a warehouse | |
CN110164128A (en) | A kind of City-level intelligent transportation analogue system | |
CN107833209A (en) | A kind of x-ray image detection method, device, electronic equipment and storage medium | |
CN110163889A (en) | Method for tracking target, target tracker, target following equipment | |
CN109214948A (en) | A kind of method and apparatus of electric system heat load prediction | |
CN106682697A (en) | End-to-end object detection method based on convolutional neural network | |
CN109271970A (en) | Face datection model training method and device | |
CN103366602A (en) | Method of determining parking lot occupancy from digital camera images | |
CN102054166B (en) | A kind of scene recognition method for Outdoor Augmented Reality System newly | |
CN108062574A (en) | A kind of Weakly supervised object detection method based on particular category space constraint | |
Kim et al. | Structural recurrent neural network for traffic speed prediction | |
CN111160125A (en) | Railway foreign matter intrusion detection method based on railway monitoring | |
CN107563549A (en) | A kind of best-effort path generation method, device and equipment based on BIM models | |
CN108038651A (en) | A kind of monitoring logistics transportation system for tracing and managing | |
CN111242144B (en) | Method and device for detecting abnormality of power grid equipment | |
CN109598430B (en) | Distribution range generation method, distribution range generation device, electronic equipment and storage medium | |
CN107358182A (en) | Pedestrian detection method and terminal device | |
CN108389631A (en) | Varicella morbidity method for early warning, server and computer readable storage medium | |
CN109657077A (en) | Model training method, lane line generation method, equipment and storage medium | |
Nishanthi et al. | Prediction of dengue outbreaks in Sri Lanka using artificial neural networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||