Real-time target tracking and detection method and system based on a fully convolutional neural network
Technical field
The invention belongs to the technical field of deep learning and relates to a real-time target tracking and detection method and system based on a fully convolutional neural network.
Background art
Target tracking and detection has long been a research focus in image and video analysis. It plays an important role in traffic image processing, lesion tracking in medical images, and special video processing (such as mosaic masking). Common target tracking and detection methods include Struck, SCM, ASLA, KCF, TLD and DCF, but these methods have several shortcomings. Some have low recognition accuracy, and some can only recognize objects of specific categories; when a new object category must be recognized, the algorithm has to be rewritten, which lengthens the development cycle. In addition, most current filtering-based detection algorithms cannot reach high semantic segmentation accuracy.
Target tracking and detection algorithms based on deep learning have emerged in recent years. Although these algorithms perform better, they still run slowly, are difficult to train, and impose strict requirements on the size of the input image. Practical applications often call for multi-target detection, whereas conventional methods can usually track only a single target. When several targets must be tracked, either each target has to be tracked separately with synchronized threads or additional methods have to be introduced, both of which severely reduce running speed. Optimizing target tracking and detection methods is therefore an important problem.
Summary of the invention
The primary object of the present invention is to overcome the shortcomings and deficiencies of the prior art by providing a real-time target tracking and detection method based on a fully convolutional neural network, which performs accurate semantic segmentation of the tracked target and runs at high speed.
Another object of the present invention is to provide a real-time target tracking and detection system based on a fully convolutional neural network.
To achieve the first object, the invention adopts the following technical solution:
The real-time target tracking and detection method based on a fully convolutional neural network according to the present invention comprises the following steps:
S1, performing data augmentation on the images in a data set to obtain training samples;
S2, combining each training sample with the target segmentation map corresponding to the first frame of that sample along the color channel dimension to generate a new three-dimensional array, and normalizing all pixel values in the array;
S3, combining the target segmentation map corresponding to the second frame of the training sample with its inverted map along the color channel dimension, the result serving as the label of the neural network for computing loss values;
S4, constructing a fully convolutional adversarial neural network, the main body of which consists of a fully convolutional network for generating target segmentation maps and a discriminator network for adversarial training, wherein the layer settings and structure of the fully convolutional network can be adjusted to the requirements of use, and no classifier is added in the middle of the fully convolutional network;
S5, feeding the segmentation map generated by the fully convolutional network and its label into the discriminator network separately, and training the discriminator as a binary classifier so that it can judge whether a segmentation map is fake data generated by the fully convolutional network or real, human-annotated data, the target values supplied for the fake data and for the real data being fake: [[0], [1]] and real: [[1], [0]] respectively, and loss value 0 being computed with the cross-entropy function;
S6, computing loss value 1 as the cross entropy between the segmentation map generated by the fully convolutional network and its label, then feeding the generated segmentation map into the discriminator network with the target value real: [[1], [0]], and computing loss value 2 with the cross-entropy function;
S7, performing steps S5 and S6 alternately until the fully convolutional network generates target segmentation maps that are as close as possible to real, hand-drawn ones.
In a preferred technical solution, the data augmentation in step S1 specifically includes the following:
(1) pairing every two adjacent frames in the data set, so that an image sequence of n frames yields n-1 samples in total;
(2) pairing images in the data set that are separated by one or more frames as training samples;
(3) swapping the order of the two frames in a pair and using the result as a training sample.
In a preferred technical solution, in step S2, each normalized color image has three color channels and the target segmentation map has one color channel. Since a training sample contains two color frames, the new three-dimensional array has a depth of 7 channels (3 + 3 + 1), while its area remains the same as that of the original image. The new three-dimensional array serves as the input of the neural network.
In a preferred technical solution, in step S3, the inverted map is obtained as follows:
If the target segmentation map corresponding to the second frame is P, its inverted map is computed as $P_{\mathrm{inv}} = 1 - P$.
In a preferred technical solution, in step S4, the last layer of both the fully convolutional network and the discriminator network uses Softmax to compress the data into the interval (0, 1).
In a preferred technical solution, step S5 further includes the following:
Loss value 0 is computed with the cross-entropy function; the parameters of the fully convolutional network are fixed so that they do not change during this training step, and loss value 0 is minimized with the Adam optimizer to optimize the parameters of the discriminator network.
In a preferred technical solution, step S6 further includes the following:
The parameters of the discriminator network are fixed so that they do not change during this training step, and loss value 1 and loss value 2 are minimized with the Adam optimizer to optimize the parameters of the fully convolutional network.
To achieve the second object, the invention adopts the following technical solution:
The real-time target tracking and detection system based on a fully convolutional neural network according to the present invention comprises:
a data augmentation module, configured to perform data augmentation on the images in a data set to obtain training samples;
a three-dimensional array generation module, configured to combine each training sample with the target segmentation map corresponding to the first frame of that sample along the color channel dimension to generate a new three-dimensional array;
a combination module, configured to combine the target segmentation map corresponding to the second frame of the training sample with its inverted map along the color channel dimension;
a neural network construction module, configured to construct a fully convolutional adversarial neural network, the main body of which consists of a fully convolutional network for generating target segmentation maps and a discriminator network for adversarial training;
a judgment module, configured to feed the segmentation map generated by the fully convolutional network and its label into the discriminator network separately and to judge whether a segmentation map is fake data generated by the fully convolutional network or real, human-annotated data;
a loss value calculation module, configured to compute loss value 1 as the cross entropy between the segmentation map generated by the fully convolutional network and its label, then to feed the generated segmentation map into the discriminator network with the target value real: [[1], [0]], and to compute loss value 2 with the cross-entropy function;
a loop module, configured to run the judgment module and the loss value calculation module alternately until the fully convolutional network generates target segmentation maps that are as close as possible to real, hand-drawn ones.
In a preferred technical solution, the three-dimensional array generation module further includes a normalization module. Each normalized color image has three color channels and the target segmentation map has one color channel. Since a training sample contains two color frames, the new three-dimensional array has a depth of 7 channels (3 + 3 + 1), while its area remains the same as that of the original image. The new three-dimensional array serves as the input of the neural network.
In a preferred technical solution, the system further includes a discriminator network parameter optimization module and a fully convolutional network parameter optimization module;
the discriminator network parameter optimization module computes loss value 0 with the cross-entropy function, fixes the parameters of the fully convolutional network so that they do not change during this training step, and minimizes loss value 0 with the Adam optimizer to optimize the parameters of the discriminator network;
the fully convolutional network parameter optimization module minimizes loss value 1 and loss value 2 with the Adam optimizer to optimize the parameters of the fully convolutional network.
Compared with the prior art, the invention has the following advantages and beneficial effects:
The invention achieves high accuracy: it performs accurate semantic segmentation of the tracked target and runs at high speed. Moreover, owing to the structure of the fully convolutional neural network, the network places no requirement on the size of the input image or video, and its training is simple and fast.
Brief description of the drawings
Fig. 1(a) and Fig. 1(b) show the original image and the drawn target segmentation map, respectively;
Fig. 2 shows the core working principle of the neural network of the present invention;
Fig. 3 shows how the image arrays fed to the neural network are combined;
Fig. 4(a) and Fig. 4(b) show a target segmentation map and its inverted map, respectively;
Fig. 5 shows one structure of the fully convolutional neural network used to generate target segmentation maps, where arrows indicate the direction of data flow;
Fig. 6 shows one structure of the discriminator network used for adversarial training, where arrows indicate the direction of data flow;
Fig. 7 is the training flow chart of the neural network, where arrows indicate the direction of data flow;
Fig. 8 shows how the present invention is used in actual detection, where arrows indicate the direction of data flow;
Fig. 9 is a system block diagram of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the embodiments and the accompanying drawings, but embodiments of the present invention are not limited thereto.
Embodiment 1
In a target tracking application (taking video object tracking as an example), the user only needs to draw a single target segmentation map of the same size as the video frame: in the region of the segmentation map that corresponds to the target in the first frame of the video, the pixels are set to 1, and the background region is set to 0 (the process from Fig. 1(a) to Fig. 1(b)). The target segmentation map is then input together with the video, and the method of the present invention returns the corresponding target segmentation map for each frame of the video.
Because two adjacent frames of a video and their corresponding target segmentation maps are usually correlated, the target segmentation map of the next frame can be inferred from the image of the next frame together with the image and target segmentation map of the current frame (as shown in Fig. 2). This is the core idea of the present invention.
The technical solution of the present invention provides a real-time target tracking and detection method based on a fully convolutional neural network, comprising the following steps:
S1: Every two adjacent frames in the data set are paired; an image sequence of n frames yields n-1 samples in total, which provides a large number of training samples. In addition, to further improve the training of the neural network, images separated by one or more frames can also be paired as training samples, and the order of the two frames in a pair can be swapped. Both are effective data augmentation methods.
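The pairing schemes of step S1 can be sketched as follows. This is only a minimal illustration: the function name `make_frame_pairs` and its parameters `max_gap` and `add_reversed` are hypothetical and do not come from the original disclosure. A gap of 1 pairs adjacent frames, and a gap of 2 pairs frames separated by one frame.

```python
from typing import List, Tuple


def make_frame_pairs(n_frames: int, max_gap: int = 1,
                     add_reversed: bool = False) -> List[Tuple[int, int]]:
    """Return (first, second) frame-index pairs for a clip of n_frames frames.

    max_gap=1 keeps only adjacent frames; larger values also pair frames
    that are separated by one or more frames. add_reversed additionally
    emits each pair with the frame order swapped.
    """
    pairs = []
    for gap in range(1, max_gap + 1):
        for i in range(n_frames - gap):
            pairs.append((i, i + gap))
            if add_reversed:
                pairs.append((i + gap, i))
    return pairs


# Adjacent frames only: a clip of n frames gives n - 1 samples.
adjacent = make_frame_pairs(5)

# Adjacent frames plus frames one apart, each in both orders.
augmented = make_frame_pairs(5, max_gap=2, add_reversed=True)
```

With `n_frames = 5`, adjacent pairing alone produces 4 samples, matching the n-1 count stated in step S1.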
S2: Each sample obtained in step S1 is combined with the target segmentation map corresponding to its first frame along the color channel dimension to generate a new three-dimensional array (as shown in Fig. 3), and all pixel values in the array are divided by 255 to normalize them. Each color image has three color channels and the target segmentation map has one; since a sample contains two color frames, the new three-dimensional array has a depth of 7 channels (3 + 3 + 1), and its area remains the same as that of the original image. This three-dimensional array is the input of the neural network.
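A minimal NumPy sketch of the array built in step S2 is given below. It assumes that a sample consists of two RGB frames, that the first-frame segmentation map is stored as an 8-bit image with 255 marking the target (so that dividing by 255 yields the values 0 and 1), and that the channels are ordered first frame, second frame, segmentation map. The function name `build_input` and this channel order are illustrative assumptions, not details given in the text.

```python
import numpy as np


def build_input(frame_a: np.ndarray, frame_b: np.ndarray,
                mask_a: np.ndarray) -> np.ndarray:
    """Stack two RGB frames and the first-frame segmentation map.

    frame_a, frame_b: uint8 arrays of shape (H, W, 3).
    mask_a: uint8 array of shape (H, W), assumed to hold 0 or 255.
    Returns a float32 array of shape (H, W, 7) with values in [0, 1].
    """
    stacked = np.concatenate(
        [frame_a, frame_b, mask_a[..., np.newaxis]], axis=-1)
    # Dividing by 255 normalizes every channel, as described in step S2.
    return stacked.astype(np.float32) / 255.0


# Tiny synthetic example (4 x 6 pixels).
h, w = 4, 6
frame_a = np.full((h, w, 3), 255, dtype=np.uint8)
frame_b = np.zeros((h, w, 3), dtype=np.uint8)
mask_a = np.zeros((h, w), dtype=np.uint8)
mask_a[1:3, 2:4] = 255  # target region in the first frame

x = build_input(frame_a, frame_b, mask_a)
```

The height and width are unchanged; only the channel depth grows from 3 to 7.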
S3: The target segmentation map corresponding to the second frame and its inverted map (shown in Fig. 4(a) and Fig. 4(b)) are combined along the color channel dimension to serve as the label of the neural network for computing loss values. If the target segmentation map corresponding to the second frame is P, its inverted map is computed as follows:
$P_{\mathrm{inv}} = 1 - P$
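The label of step S3 can be sketched in NumPy as follows. The sketch assumes that the second-frame segmentation map P already holds the values 0 and 1 and that P is placed in the first channel, with the map 1 - P in the second; the function name `build_label` and the channel order are assumptions made for illustration.

```python
import numpy as np


def build_label(mask_b: np.ndarray) -> np.ndarray:
    """Combine a 0/1 segmentation map P with the map 1 - P.

    mask_b: array of shape (H, W) with 1 for target and 0 for background.
    Returns a float32 array of shape (H, W, 2); the two channels sum to 1
    at every pixel, which matches a two-way Softmax output.
    """
    p = mask_b.astype(np.float32)
    return np.stack([p, 1.0 - p], axis=-1)


mask_b = np.array([[1, 0],
                   [0, 1]])
label = build_label(mask_b)
```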
S4: A fully convolutional adversarial neural network is constructed. Its main body consists of a fully convolutional network for generating target segmentation maps (as shown in Fig. 5) and a discriminator network for adversarial training (as shown in Fig. 6). The layer settings and structure can be adjusted to the requirements of use; Fig. 5 and Fig. 6 show only one example. Because this neural network does not need to classify objects, no classifier is added in the middle of the fully convolutional network.
The last layer of both the fully convolutional network and the discriminator network uses Softmax rather than sigmoid to compress the data into the interval (0, 1). The reason is that sigmoid saturates easily, which tends to cause vanishing gradients, whereas Softmax propagates gradients more effectively and thus benefits the optimization of the neural network.
S5: The segmentation map generated by the fully convolutional network and its label are fed into the discriminator network separately, and the discriminator judges whether each input is fake data generated by the fully convolutional network or real, human-annotated data. The target values supplied are fake: [[0], [1]] and real: [[1], [0]]. Loss value 0 is computed with the cross-entropy function. The parameters of the fully convolutional network are fixed so that they do not change during this step, and loss value 0 is minimized with the Adam optimizer to optimize the parameters of the discriminator network.
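Loss value 0 of step S5 can be illustrated with the small pure-Python sketch below. The text does not state how the loss on the label and the loss on the generated map are combined, so adding them is an assumption; the names `REAL`, `FAKE`, `cross_entropy` and `discriminator_loss` are likewise hypothetical. The two-element tuples stand for the column vectors [[1], [0]] and [[0], [1]].

```python
import math

REAL = (1.0, 0.0)  # target for human-annotated maps, i.e. [[1], [0]]
FAKE = (0.0, 1.0)  # target for generated maps, i.e. [[0], [1]]


def cross_entropy(target, predicted, eps=1e-12):
    """Cross entropy between a one-hot target and a Softmax output."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))


def discriminator_loss(pred_on_label, pred_on_generated):
    """Loss value 0: the discriminator should answer REAL for the label
    and FAKE for the generated map. Summing the two terms is an assumption."""
    return (cross_entropy(REAL, pred_on_label)
            + cross_entropy(FAKE, pred_on_generated))


# A discriminator that cannot tell the inputs apart ...
undecided = discriminator_loss((0.5, 0.5), (0.5, 0.5))
# ... and one that classifies both inputs fairly well.
confident = discriminator_loss((0.9, 0.1), (0.2, 0.8))
```

A lower value means the discriminator separates real and generated maps better, which is what minimizing loss value 0 with the generator parameters fixed aims for.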
S6: Loss value 1 is computed as the cross entropy between the segmentation map generated by the fully convolutional network and its label. The generated segmentation map is then fed into the discriminator network with the target value real: [[1], [0]], and loss value 2 is computed with the cross-entropy function. The parameters of the discriminator network are fixed so that they do not change during this step, and loss value 1 and loss value 2 are minimized with the Adam optimizer to optimize the parameters of the fully convolutional network.
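The two loss values of step S6 can be sketched in NumPy as follows. The text only says that both are minimized with Adam; averaging loss value 1 over pixels and minimizing the unweighted sum of the two losses are assumptions made here, and the names `cross_entropy` and `generator_loss` are hypothetical.

```python
import numpy as np

REAL = np.array([1.0, 0.0])  # target value [[1], [0]]


def cross_entropy(target, predicted, eps=1e-12):
    """Cross entropy over the last axis, averaged over all other axes."""
    log_p = np.log(np.clip(predicted, eps, 1.0))
    per_item = -(target * log_p).sum(axis=-1)
    return float(np.mean(per_item))


def generator_loss(generated, label, disc_pred_on_generated):
    """Return loss value 1 + loss value 2 (unweighted sum is an assumption).

    generated, label: arrays of shape (H, W, 2).
    disc_pred_on_generated: discriminator Softmax output of shape (2,).
    """
    loss_1 = cross_entropy(label, generated)             # pixel-wise term
    loss_2 = cross_entropy(REAL, disc_pred_on_generated)  # adversarial term
    return loss_1 + loss_2


label = np.array([[[1.0, 0.0], [0.0, 1.0]]])      # shape (1, 2, 2)
generated = np.array([[[0.9, 0.1], [0.2, 0.8]]])  # Softmax output
total = generator_loss(generated, label, np.array([0.5, 0.5]))
```

Loss value 2 falls as the discriminator is increasingly fooled into answering "real" for the generated map, while loss value 1 keeps the generated map close to the label.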
Steps S5 and S6 are run alternately during training (as shown in Fig. 7). This is classic adversarial training: it helps the fully convolutional network generate target segmentation maps that are as close as possible to real, hand-drawn ones, which further improves the quality of the network output.
In the present invention, the fully convolutional neural network outputs two images each time it is used, as a consequence of the final Softmax layer. Only the first image needs to be extracted in use.
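The reason for the two output images can be seen in the following NumPy sketch, which applies Softmax across a two-channel output and keeps only the first channel. It assumes that the first channel corresponds to the target, in line with a label whose first channel is the segmentation map; the function name `channel_softmax` and the example values are illustrative.

```python
import numpy as np


def channel_softmax(logits: np.ndarray) -> np.ndarray:
    """Softmax over the last (channel) axis, computed in a stable way."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


# Raw two-channel network output for a 1 x 2 pixel image.
logits = np.array([[[2.0, -1.0],
                    [0.0, 0.0]]])
probs = channel_softmax(logits)  # shape (1, 2, 2), values in (0, 1)

# The two channels are complementary, so the first one is sufficient.
target_map = probs[..., 0]
```

Because the two channels sum to 1 at every pixel, the second image carries no information beyond the first.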
In actual use, to track a target in a video, only the target segmentation map of the first frame needs to be provided. It is input into the neural network together with the images of the first and second frames to obtain the target segmentation map of the second frame. Likewise, the resulting second-frame segmentation map is input together with the images of the second and third frames to obtain the target segmentation map of the third frame. By analogy, inputting the n-th frame's target segmentation map together with the images of the n-th and (n+1)-th frames yields the target segmentation map of the (n+1)-th frame (as shown in Fig. 8). As long as the target object in the video is never completely occluded, the present invention works well.
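The frame-by-frame propagation described above can be sketched as a Python generator. The trained network is represented by a caller-supplied function `predict_next_mask(previous_frame, frame, mask)`; this name, the function `track`, and the toy stand-in used in the example are hypothetical and only illustrate the data flow.

```python
def track(frames, first_mask, predict_next_mask):
    """Yield one segmentation map per frame.

    frames: any iterable of frames (for example a live video stream).
    first_mask: the hand-drawn segmentation map of the first frame.
    predict_next_mask: callable standing in for the trained network.
    """
    iterator = iter(frames)
    try:
        previous_frame = next(iterator)
    except StopIteration:
        return  # empty video: nothing to track
    mask = first_mask
    yield mask
    for frame in iterator:
        # Frame n, frame n+1 and the map of frame n give the map of frame n+1.
        mask = predict_next_mask(previous_frame, frame, mask)
        yield mask
        previous_frame = frame


# Toy stand-in: frames and maps are plain numbers, and the "network"
# shifts the map by the difference between consecutive frames.
def toy_network(previous_frame, frame, mask):
    return mask + (frame - previous_frame)


masks = list(track([10, 11, 13], 0, toy_network))
```

Since each step needs only the current map and two consecutive frames, the generator can consume frames as they arrive rather than requiring the whole video in advance.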
Because the present invention detects the tracked object using only the image and target segmentation map of the current frame and the image of the next frame, it depends on little data and runs fast. It therefore works in real time and can track the target while the video is being captured.
Embodiment 2
Embodiment 2 is the system corresponding to the real-time target tracking and detection method based on a fully convolutional neural network of Embodiment 1. As shown in Fig. 9, the system includes: a data augmentation module 101, a three-dimensional array generation module 102, a combination module 103, a neural network construction module 104, a judgment module 105, a loss value calculation module 106 and a loop module 107;
The data augmentation module 101 is configured to perform data augmentation on the images in a data set to obtain training samples;
The three-dimensional array generation module 102 is configured to combine each training sample with the target segmentation map corresponding to the first frame of that sample along the color channel dimension to generate a new three-dimensional array. The three-dimensional array generation module further includes a normalization module. Each normalized color image has three color channels and the target segmentation map has one color channel; since a training sample contains two color frames, the new three-dimensional array has a depth of 7 channels (3 + 3 + 1), while its area remains the same as that of the original image. The new three-dimensional array serves as the input of the neural network;
The combination module 103 is configured to combine the target segmentation map corresponding to the second frame of the training sample with its inverted map along the color channel dimension;
The neural network construction module 104 is configured to construct a fully convolutional adversarial neural network, the main body of which consists of a fully convolutional network for generating target segmentation maps and a discriminator network for adversarial training;
The judgment module 105 is configured to feed the segmentation map generated by the fully convolutional network and its label into the discriminator network separately, and to train the discriminator to judge whether a segmentation map is fake data generated by the fully convolutional network or real, human-annotated data;
The loss value calculation module 106 is configured to compute loss value 1 as the cross entropy between the segmentation map generated by the fully convolutional network and its label, then to feed the generated segmentation map into the discriminator network with the target value real: [[1], [0]], and to compute loss value 2 with the cross-entropy function;
The loop module 107 is configured to run the judgment module and the loss value calculation module alternately until the fully convolutional network generates target segmentation maps that are as close as possible to real, hand-drawn ones.
This embodiment further includes a discriminator network parameter optimization module and a fully convolutional network parameter optimization module;
The discriminator network parameter optimization module computes loss value 0 with the cross-entropy function, fixes the parameters of the fully convolutional network so that they do not change during this training step, and minimizes loss value 0 with the Adam optimizer to optimize the parameters of the discriminator network;
The fully convolutional network parameter optimization module minimizes loss value 1 and loss value 2 with the Adam optimizer to optimize the parameters of the fully convolutional network.
It should be noted that the system provided in the above embodiment is illustrated only by the division into the functional modules described above. In practical applications, the functions may be assigned to different functional modules as required; that is, the internal structure may be divided into different functional modules to perform all or part of the functions described above.
The above embodiments are preferred embodiments of the present invention, but embodiments of the present invention are not limited by them. Any other change, modification, substitution, combination or simplification made without departing from the spirit and principle of the present invention shall be regarded as an equivalent replacement and falls within the scope of protection of the present invention.