CN109543684A

CN109543684A - Real-time target tracking detection method and system based on fully convolutional neural networks

Info

Publication number: CN109543684A
Application number: CN201811172150.5A
Authority: CN
Inventors: 黄文恺; 胡凌恺; 薛义豪; 彭广龙; 何杰贤; 倪皓舟; 朱静; 吴羽
Original assignee: Guangzhou University
Current assignee: China Southern Power Grid Internet Service Co ltd; Ourchem Information Consulting Co ltd
Priority date: 2018-10-09
Filing date: 2018-10-09
Publication date: 2019-03-29
Anticipated expiration: 2038-10-09
Also published as: CN109543684B

Abstract

The invention discloses a real-time target tracking detection method and system based on a fully convolutional neural network. The method includes: S1, performing data augmentation to obtain training samples; S2, combining each training sample with the target segmentation map corresponding to its first frame along the color channel dimension; S3, combining the target segmentation map corresponding to the second frame of the training sample and its inverted map along the color channel dimension; S4, constructing a fully convolutional adversarial neural network consisting of a fully convolutional network and a discriminator network; S5, training the discriminator to judge whether a segmentation map is fake data generated by the fully convolutional network or real data; S6, using cross entropy to calculate loss value 1 between the segmentation map and its label, and loss value 2 of the discriminator's judgment on the generated segmentation map; S7, performing steps S5 and S6 alternately until the fully convolutional network generates target segmentation maps as close as possible to real, manually drawn ones. The invention relies on little data, runs fast, and operates in real time, so it can track a target in a video while the video is being shot.

Description

Real-time target tracking detection method and system based on fully convolutional neural networks

Technical field

The invention belongs to the technical field of deep learning and relates to a real-time target tracking detection method and system based on fully convolutional neural networks.

Background art

Target tracking detection has long been a research focus in image and video detection. It plays a very important role in traffic image processing, key-region tracking in medical images, and special video processing (such as mosaic removal).

Common target tracking detection methods include Struck, SCM, ASLA, KCF, TLD, and DCF, but all of them have shortcomings. Some have relatively low recognition accuracy, and some can only recognize objects of specific categories, so the algorithm must be rewritten whenever a new object type needs to be recognized, which lengthens the development cycle. Moreover, most current algorithms based on filter detection cannot achieve high semantic segmentation accuracy.

Target tracking detection algorithms based on deep learning have emerged in recent years. Although they perform better, they still compute slowly, are difficult to train, and impose strict requirements on the size of the input image. Practical applications often require multi-target detection, whereas conventional methods can often track only a single target; multi-target tracking then requires multiple synchronized threads that track each unit separately, or the introduction of other methods, which is very detrimental to computation speed. Optimizing target tracking detection methods is therefore an important problem.

Summary of the invention

The primary object of the present invention is to overcome the shortcomings and deficiencies of the prior art and to provide a real-time target tracking detection method based on fully convolutional neural networks that performs accurate semantic segmentation of the tracked target at a high running speed.

Another object of the present invention is to provide a real-time target tracking detection system based on fully convolutional neural networks.

To achieve the first object above, the present invention adopts the following technical solution:

The real-time target tracking detection method based on fully convolutional neural networks of the present invention comprises the following steps:

S1, performing data augmentation on the images in the dataset to obtain training samples;

S2, combining each obtained training sample with the target segmentation map corresponding to the first frame of the training sample along the color channel dimension to generate a new three-dimensional array, and normalizing all pixel values in it;

S3, combining the target segmentation map corresponding to the second frame of the training sample and its inverted map along the color channel dimension, as the label of the neural network for calculating loss values;

S4, constructing a fully convolutional adversarial neural network whose main body consists of a fully convolutional network for generating target segmentation maps and a discriminator network for adversarial training, wherein the arrangement and structure of the layers of the fully convolutional network can be adjusted according to usage requirements, and no classifier is added in the middle of the fully convolutional network;

S5, inputting the segmentation map generated by the fully convolutional network and its label into the discriminator network separately, and using the discriminator for binary classification training so that it can judge whether a segmentation map is fake data generated by the fully convolutional network or real, manually annotated data; the target values given for the fake data and for the real, manually annotated data are, respectively, fake: [[0], [1]] and real: [[1], [0]], and loss value 0 is calculated with a cross-entropy function;

S6, calculating loss value 1 between the segmentation map generated by the fully convolutional network and its label using cross entropy, then inputting the generated segmentation map into the discriminator network with the target value real: [[1], [0]], and calculating its loss value 2 with a cross-entropy function;

S7, performing steps S5 and S6 alternately until the fully convolutional network generates target segmentation maps as close as possible to real, manually drawn ones.

As a preferred technical solution, in step S1, the data augmentation specifically includes the following approaches:

(1) combining the images of every two adjacent frames in the dataset in pairs, so that a video segment of n frames yields n-1 samples in total;

(2) combining images in the dataset that are one or more frames apart as training samples;

(3) using image pairs from the dataset with the order of the two frames swapped as training samples.

As a preferred technical solution, in step S2, each normalized color image has three color channels and the target segmentation map has one color channel, so the new three-dimensional array, built from the two color frames of the sample and the first frame's map, has a depth of 3 + 3 + 1 = 7 channels while its area (height and width) remains the same as that of the original image; this new three-dimensional array is used as the input of the neural network.

As a preferred technical solution, in step S3, the inverted map is obtained as follows:

Let the target segmentation map corresponding to the second frame be P; its inverted map is calculated as P_Turn = 1 - P.

As a preferred technical solution, in step S4, the last layer of both the fully convolutional network and the discriminator network uses Softmax to compress the data into the range (0, 1).

As a preferred technical solution, step S5 further includes the following step:

Loss value 0 is calculated with a cross-entropy function; the parameters of the fully convolutional network are fixed so that they do not change during this training, and an Adam optimizer minimizes loss value 0 to optimize the parameters of the discriminator network.

As a preferred technical solution, step S6 further includes the following step:

The parameters of the discriminator network are fixed so that they do not change during this training, and an Adam optimizer minimizes loss value 1 and loss value 2 to optimize the parameters of the fully convolutional network.

To achieve the second object above, the present invention adopts the following technical solution:

The real-time target tracking detection system based on fully convolutional neural networks of the present invention comprises:

A data augmentation module, configured to perform data augmentation on the images in the dataset to obtain training samples;

A three-dimensional array generation module, configured to combine each obtained training sample with the target segmentation map corresponding to the first frame of the training sample along the color channel dimension to generate a new three-dimensional array;

A combination module, configured to combine the target segmentation map corresponding to the second frame of the training sample and its inverted map along the color channel dimension;

A neural network construction module, configured to construct a fully convolutional adversarial neural network whose main body consists of a fully convolutional network for generating target segmentation maps and a discriminator network for adversarial training;

A judgment module, configured to input the segmentation map generated by the fully convolutional network and its label into the discriminator network separately, and to judge whether a segmentation map is fake data generated by the fully convolutional network or real, manually annotated data;

A loss value calculation module, configured to calculate loss value 1 between the segmentation map generated by the fully convolutional network and its label using cross entropy, then input the generated segmentation map into the discriminator network with the target value real: [[1], [0]] and calculate its loss value 2 with a cross-entropy function;

A loop module, configured to run the judgment module and the loss value calculation module alternately until the fully convolutional network generates target segmentation maps as close as possible to real, manually drawn ones.

As a preferred technical solution, the three-dimensional array generation module further includes a normalization module. Each normalized color image has three color channels and the target segmentation map has one color channel, so the new three-dimensional array has a depth of 7 (two color frames plus one map) while its area remains the same as that of the original image; this new three-dimensional array is used as the input of the neural network.

As a preferred technical solution, the system further includes a discriminator network parameter optimization module and a fully convolutional network parameter optimization module;

The discriminator network parameter optimization module calculates loss value 0 with a cross-entropy function, fixes the parameters of the fully convolutional network so that they do not change during this training, and uses an Adam optimizer to minimize loss value 0, optimizing the parameters of the discriminator network;

The fully convolutional network parameter optimization module uses an Adam optimizer to minimize loss value 1 and loss value 2, optimizing the parameters of the fully convolutional network.

Compared with the prior art, the present invention has the following advantages and beneficial effects:

The present invention has high accuracy, performs accurate semantic segmentation of the tracked target, and runs fast. Moreover, owing to the structure of fully convolutional neural networks, the network imposes no requirement on the size of the input images or video, and training it is simple and fast.

Brief description of the drawings

Fig. 1(a) and Fig. 1(b) are, respectively, an original image and the drawn target segmentation map;

Fig. 2 illustrates the core working principle of the neural network of the present invention;

Fig. 3 shows how the image arrays input to the neural network are combined;

Fig. 4(a) and Fig. 4(b) are, respectively, a target segmentation map and its inverted map;

Fig. 5 shows one structure of the fully convolutional neural network for generating target segmentation maps, in which arrows indicate the direction of data flow;

Fig. 6 shows one structure of the discriminator network for adversarial training, in which arrows indicate the direction of data flow;

Fig. 7 is the training flow chart of the neural network, in which arrows indicate the direction of data flow;

Fig. 8 shows how the present invention is used in actual detection, in which arrows indicate the direction of data flow;

Fig. 9 is a block diagram of the system of the present invention.

Detailed description of the embodiments

The present invention is described in further detail below with reference to the embodiments and the accompanying drawings, but the embodiments of the present invention are not limited thereto.

Embodiment 1

In a target tracking application (taking video object tracking as an example), one only needs to draw a target segmentation map of the same size as the video frames: the pixels of the map that correspond to the target's region in the first frame of the video are set to 1, and the background region is set to 0 (the process from Fig. 1(a) to Fig. 1(b)). The target segmentation map is then input together with the video, and the method of the present invention returns a corresponding target segmentation map for each frame of the video.
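To make the first-frame initialization concrete, the plain-Python sketch below builds such a map with nested lists. It assumes, for illustration only, that the target region is given as a list of pixel coordinates (here an axis-aligned box from the hypothetical helper `box_pixels`); in practice the region may be drawn by hand in any shape.

```python
def box_pixels(top, left, bottom, right):
    """Pixel coordinates of a box; top/left inclusive, bottom/right exclusive.

    Hypothetical helper: a box is just one convenient way to name a target region.
    """
    return [(r, c) for r in range(top, bottom) for c in range(left, right)]


def draw_first_frame_mask(height, width, target_pixels):
    """Target segmentation map the size of the video frame: 1 on the target, 0 elsewhere."""
    mask = [[0] * width for _ in range(height)]
    for r, c in target_pixels:
        mask[r][c] = 1
    return mask


mask = draw_first_frame_mask(4, 5, box_pixels(1, 1, 3, 4))
for row in mask:
    print(row)
# [0, 0, 0, 0, 0]
# [0, 1, 1, 1, 0]
# [0, 1, 1, 1, 0]
# [0, 0, 0, 0, 0]
```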

Because the images of two adjacent frames in a video, and their corresponding target segmentation maps, are usually correlated, the target segmentation map of the next frame can be inferred from the image of the next frame together with the image and target segmentation map of the current frame (as shown in Fig. 2). This is the core idea of the present invention.

The technical solution of the present invention provides a real-time target tracking detection method based on fully convolutional neural networks, comprising the following steps:

S1: Combining the images of every two adjacent frames in the dataset in pairs yields n-1 samples from a video segment of n frames, which provides a very large number of training samples. In addition, to further improve the training of the neural network, images that are one or more frames apart can be combined as training samples, and the order of the two frames can also be swapped; both are effective data augmentation methods.
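The pairing scheme of step S1 can be sketched as follows. The function name `make_training_pairs` and its parameters `max_gap` and `include_reversed` are hypothetical; the sketch only enumerates (first frame, second frame) index pairs and leaves image loading aside.

```python
def make_training_pairs(num_frames, max_gap=1, include_reversed=False):
    """Enumerate (first_frame, second_frame) index pairs for one video clip.

    max_gap=1 gives only adjacent pairs, i.e. n - 1 pairs for n frames.
    max_gap=k also pairs frames whose indices differ by up to k
    (up to k - 1 frames in between).
    include_reversed=True adds every pair again with the frame order swapped.
    """
    pairs = []
    for gap in range(1, max_gap + 1):
        for i in range(num_frames - gap):
            pairs.append((i, i + gap))
            if include_reversed:
                pairs.append((i + gap, i))
    return pairs


print(len(make_training_pairs(10)))  # 9 adjacent pairs for a 10-frame clip
print(make_training_pairs(4, max_gap=2))
# [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)]
```

For a reversed pair, the segmentation map of the later frame serves as the input map and that of the earlier frame as the label, so the same code path handles both directions.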

S2: Combine the sample obtained in step S1 with the target segmentation map corresponding to the first frame of the sample along the color channel dimension to generate a new three-dimensional array (as shown in Fig. 3), and normalize it by dividing all pixel values by 255. Since each of the two color images has three color channels and the target segmentation map has one color channel, the new three-dimensional array has a depth of 7, while its area remains the same as that of the original image. This three-dimensional array is used as the input of the neural network.
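A minimal sketch of the input assembly in step S2, in plain Python with nested lists standing in for image arrays. Two points are assumptions: the channel order (first frame RGB, second frame RGB, first-frame map), since Fig. 3 defines the actual arrangement; and that the map is stored as an 8-bit image (0 for background, 255 for the target), so that dividing by 255 turns it into 0/1. The name `build_input` is hypothetical.

```python
def build_input(frame_a, frame_b, mask_a):
    """Stack two RGB frames and the first frame's map into an H x W x 7 array.

    frame_a, frame_b: H x W nested lists of (R, G, B) values in 0..255.
    mask_a: H x W nested list in 0..255 (assumed 8-bit storage of the map).
    Channel order [R_a, G_a, B_a, R_b, G_b, B_b, mask] is an assumption.
    """
    height, width = len(frame_a), len(frame_a[0])
    for arr in (frame_b, mask_a):
        if len(arr) != height or len(arr[0]) != width:
            raise ValueError("frames and map must share height and width")
    stacked = []
    for r in range(height):
        row = []
        for c in range(width):
            values = list(frame_a[r][c]) + list(frame_b[r][c]) + [mask_a[r][c]]
            row.append([v / 255.0 for v in values])  # normalize every value to [0, 1]
        stacked.append(row)
    return stacked


a = [[(255, 0, 0), (0, 255, 0)]]      # 1 x 2 frame t
b = [[(0, 0, 255), (255, 255, 255)]]  # 1 x 2 frame t + 1
m = [[255, 0]]                        # target on the left pixel of frame t
x = build_input(a, b, m)
print(len(x), len(x[0]), len(x[0][0]))  # 1 2 7
print(x[0][0])  # [1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0]
```

Because nothing depends on a fixed height or width, the same assembly works for any frame size, which matches the fully convolutional network's lack of input-size requirements.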

S3: Combine the target segmentation map corresponding to the second frame and its inverted map (as shown in Fig. 4(a) and Fig. 4(b)) along the color channel dimension as the label of the neural network for calculating loss values. Let the target segmentation map corresponding to the second frame be P; its inverted map is calculated as follows:

P_Turn=1-P
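Building the label of step S3 is a per-pixel stack of P and P_Turn. In the sketch below (`build_label` is a hypothetical name), each pixel's two label channels sum to 1, the same form as a two-channel Softmax output, which is presumably why the inverted map is paired with P.

```python
def build_label(mask_b):
    """H x W x 2 label: channel 0 is P, channel 1 is P_Turn = 1 - P."""
    return [[[p, 1 - p] for p in row] for row in mask_b]


print(build_label([[1, 0]]))  # [[[1, 0], [0, 1]]]
```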

S4: Construct a fully convolutional adversarial neural network. Its main body consists of a fully convolutional network for generating target segmentation maps (as shown in Fig. 5) and a discriminator network for adversarial training (as shown in Fig. 6); the arrangement and structure of the layers can be adjusted according to usage requirements, and Fig. 5 and Fig. 6 show one example. Since this neural network does not need to classify objects, no classifier is added in the middle of the fully convolutional network.

The last layers of the fully convolutional network and the discriminator network use Softmax rather than sigmoid to compress data into (0, 1) because sigmoid saturates easily and tends to cause vanishing gradients, whereas Softmax propagates gradients more effectively, which benefits the optimization of the neural network.

S5: Input the segmentation map generated by the fully convolutional network and its label into the discriminator network separately, so that it judges whether the image is fake data generated by the fully convolutional network or real, manually annotated data. The target values given are, respectively, fake: [[0], [1]] and real: [[1], [0]]. Loss value 0 is calculated with a cross-entropy function; the parameters of the fully convolutional network are fixed so that they do not change during this training, and an Adam optimizer minimizes loss value 0 to optimize the parameters of the discriminator network.
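The discriminator loss of step S5 can be sketched as below, treating the discriminator output as a two-element probability vector from its final Softmax (first element "real", second "fake", matching the target values above). Summing the real and fake terms into loss value 0 is an assumption, and the Adam update of the discriminator's parameters (with the fully convolutional network frozen) is not shown.

```python
import math

REAL = [1.0, 0.0]  # target value for manually annotated maps: [[1], [0]]
FAKE = [0.0, 1.0]  # target value for generated maps: [[0], [1]]


def cross_entropy(probs, target, eps=1e-12):
    """Categorical cross-entropy -sum(t * log p); eps guards against log(0)."""
    return -sum(t * math.log(max(p, eps)) for p, t in zip(probs, target))


def discriminator_loss(d_on_label, d_on_generated):
    """Loss value 0, assumed here to be the sum of the real and the fake term."""
    return cross_entropy(d_on_label, REAL) + cross_entropy(d_on_generated, FAKE)


# A discriminator that cannot tell the two apart outputs 0.5/0.5 for both inputs:
print(discriminator_loss([0.5, 0.5], [0.5, 0.5]))  # 2 * ln 2, about 1.386
```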

S6: Calculate loss value 1 between the segmentation map generated by the fully convolutional network and its label using cross entropy, then input the generated segmentation map into the discriminator network with the target value real: [[1], [0]] and calculate its loss value 2 with a cross-entropy function. The parameters of the discriminator network are fixed so that they do not change during this training, and an Adam optimizer minimizes loss value 1 and loss value 2 to optimize the parameters of the fully convolutional network.
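Loss values 1 and 2 of step S6 can be sketched in the same style. Averaging the per-pixel cross-entropy over the map and adding the two losses with the hypothetical weight `adv_weight = 1.0` are assumptions; the text only states that both losses are minimized with Adam while the discriminator is frozen.

```python
import math

REAL = [1.0, 0.0]  # target value "real": [[1], [0]]


def cross_entropy(probs, target, eps=1e-12):
    """Categorical cross-entropy -sum(t * log p); eps guards against log(0)."""
    return -sum(t * math.log(max(p, eps)) for p, t in zip(probs, target))


def segmentation_loss(pred, label):
    """Loss value 1: per-pixel cross-entropy between the 2-channel generated map
    and the label (P, 1 - P), averaged over pixels (averaging is an assumption)."""
    losses = [cross_entropy(p, t)
              for pred_row, label_row in zip(pred, label)
              for p, t in zip(pred_row, label_row)]
    return sum(losses) / len(losses)


def generator_loss(pred, label, d_on_generated, adv_weight=1.0):
    """Return (total, loss1, loss2); adv_weight is a hypothetical weighting."""
    loss1 = segmentation_loss(pred, label)
    loss2 = cross_entropy(d_on_generated, REAL)  # the generator wants D to answer "real"
    return loss1 + adv_weight * loss2, loss1, loss2


pred = [[[0.9, 0.1], [0.2, 0.8]]]  # 1 x 2 generated map after Softmax
label = [[[1, 0], [0, 1]]]         # P = [[1, 0]] stacked with 1 - P
total, loss1, loss2 = generator_loss(pred, label, d_on_generated=[0.3, 0.7])
```

Loss value 1 pulls the generated map toward the annotation pixel by pixel, while loss value 2 rewards maps that the frozen discriminator accepts as real.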

Steps S5 and S6 are run alternately during training (as shown in Fig. 7). This is a classic and effective form of adversarial training: it helps the fully convolutional network generate target segmentation maps as close as possible to real, manually drawn ones, and further improves the quality of the network's output.
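Written as objectives, the alternation in Fig. 7 can be summarized as below. The notation is an assumption of this sketch: G is the fully convolutional network, D the discriminator, x the 7-channel input, y the label (P, P_Turn), and the per-pixel cross-entropy in loss value 1 is aggregated over the map; summing the two terms of loss value 0 is likewise assumed.

```latex
\begin{aligned}
\mathrm{CE}(p, t) &= -\textstyle\sum_k t_k \log p_k \\
\mathcal{L}_0 &= \mathrm{CE}\big(D(y), [1, 0]\big) + \mathrm{CE}\big(D(G(x)), [0, 1]\big)
  && \text{S5: minimize over } D,\ G \text{ fixed} \\
\mathcal{L}_1 &= \mathrm{CE}\big(G(x), y\big), \qquad
\mathcal{L}_2 = \mathrm{CE}\big(D(G(x)), [1, 0]\big) \\
&\min_G \; \mathcal{L}_1 + \mathcal{L}_2
  && \text{S6: minimize over } G,\ D \text{ fixed}
\end{aligned}
```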

In the present invention, the fully convolutional neural network outputs two images each time it is used. This results from the final Softmax layer of the network, and only the first image is extracted in use.
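The per-pixel Softmax and the extraction of the first output map can be sketched as below. Treating channel 0 as the target probability follows the label layout (P first, then P_Turn); the function names are hypothetical.

```python
import math


def pixel_softmax(logits):
    """Softmax over the channel axis of an H x W x C nested list of logits."""
    probs = []
    for row in logits:
        prob_row = []
        for channels in row:
            m = max(channels)  # subtract the max for numerical stability
            exps = [math.exp(v - m) for v in channels]
            total = sum(exps)
            prob_row.append([e / total for e in exps])
        probs.append(prob_row)
    return probs


def first_map(probs):
    """Keep only output map 0 (the target channel); map 1 is its complement."""
    return [[channels[0] for channels in row] for row in probs]


probs = pixel_softmax([[[2.0, 0.0], [0.0, 0.0]]])
target = first_map(probs)  # about [[0.881, 0.5]]
```

Because the two channels of every pixel sum to 1, the second map carries no extra information, which is consistent with using only the first image.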

In actual use, to detect and track a target in a video segment, only the target segmentation map of the first frame needs to be provided. It is input into the neural network together with the images of the first and second frames to obtain the target segmentation map of the second frame. Likewise, the obtained target segmentation map of the second frame is input together with the images of the second and third frames to obtain the target segmentation map of the third frame. By analogy, inputting the target segmentation map of the n-th frame together with the images of the n-th and (n+1)-th frames yields the target segmentation map of the (n+1)-th frame (as shown in Fig. 8). As long as the target object in the video is not completely occluded, the present invention works well.

Because the present invention detects the tracked object using only the image and target segmentation map of the current frame and the image of the next frame, it depends on little data and computes quickly; it therefore works in real time and can track a target in a video while the video is being shot.
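The frame-by-frame propagation described above amounts to the loop below. `model` is a hypothetical stand-in for the trained fully convolutional network, assumed to assemble its own 7-channel input, apply Softmax, and return only the first output map.

```python
def track(frames, first_mask, model):
    """Propagate the target segmentation map through a video, one frame at a time.

    frames: sequence of frame images; first_mask: the hand-drawn map of frames[0].
    model: hypothetical callable model(prev_frame, next_frame, prev_mask) -> next_mask.
    Returns one segmentation map per frame.
    """
    masks = [first_mask]
    for prev_frame, next_frame in zip(frames, frames[1:]):
        # The map predicted for frame n becomes the input map for frame n + 1.
        masks.append(model(prev_frame, next_frame, masks[-1]))
    return masks


def dummy(prev_frame, next_frame, prev_mask):
    """Placeholder model that only records how each map was derived."""
    return f"{prev_mask}->{next_frame}"


print(track(["f0", "f1", "f2"], "m0", dummy))
# ['m0', 'm0->f1', 'm0->f1->f2']
```

Each step needs only the previous map and two frames, so for live video the same loop could be written as a generator that yields a map as soon as each new frame arrives.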

Embodiment 2

Embodiment 2 is the system corresponding to the real-time target tracking detection method based on fully convolutional neural networks of Embodiment 1. As shown in Fig. 9, the system includes a data augmentation module 101, a three-dimensional array generation module 102, a combination module 103, a neural network construction module 104, a judgment module 105, a loss value calculation module 106, and a loop module 107;

The data augmentation module 101 is configured to perform data augmentation on the images in the dataset to obtain training samples;

The three-dimensional array generation module 102 is configured to combine each obtained training sample with the target segmentation map corresponding to the first frame of the training sample along the color channel dimension to generate a new three-dimensional array. The three-dimensional array generation module further includes a normalization module; each normalized color image has three color channels and the target segmentation map has one color channel, so the new three-dimensional array has a depth of 7 (two color frames plus one map) while its area remains the same as that of the original image, and this new three-dimensional array is used as the input of the neural network.

The combination module 103 is configured to combine the target segmentation map corresponding to the second frame of the training sample and its inverted map along the color channel dimension;

The neural network construction module 104 is configured to construct a fully convolutional adversarial neural network whose main body consists of a fully convolutional network for generating target segmentation maps and a discriminator network for adversarial training;

The judgment module 105 is configured to input the segmentation map generated by the fully convolutional network and its label into the discriminator network separately, and to train the discriminator to judge whether a segmentation map is fake data generated by the fully convolutional network or real, manually annotated data;

The loss value calculation module 106 is configured to calculate loss value 1 between the segmentation map generated by the fully convolutional network and its label using cross entropy, then input the generated segmentation map into the discriminator network with the target value real: [[1], [0]] and calculate its loss value 2 with a cross-entropy function;

The loop module 107 is configured to run the judgment module and the loss value calculation module alternately until the fully convolutional network generates target segmentation maps as close as possible to real, manually drawn ones.

In this embodiment, the system further includes a discriminator network parameter optimization module and a fully convolutional network parameter optimization module.

It should be noted that the system provided by the above embodiment is illustrated only with the above division into functional modules. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure may be divided into different functional modules to complete all or part of the functions described above.

The above embodiment is a preferred embodiment of the present invention, but the embodiments of the present invention are not limited by it. Any other change, modification, substitution, combination, or simplification made without departing from the spirit and principle of the present invention shall be regarded as an equivalent replacement and is included within the protection scope of the present invention.

Claims

1. A real-time target tracking detection method based on fully convolutional neural networks, characterized by comprising the following steps:

S2, combining each obtained training sample with the target segmentation map corresponding to the first frame of the training sample along the color channel dimension to generate a new three-dimensional array, and normalizing all pixel values in it;

S4, constructing a fully convolutional adversarial neural network whose main body consists of a fully convolutional network for generating target segmentation maps and a discriminator network for adversarial training, wherein the arrangement and structure of the layers of the fully convolutional network can be adjusted according to usage requirements, and no classifier is added in the middle of the fully convolutional network;

S5, inputting the segmentation map generated by the fully convolutional network and its label into the discriminator network separately, the discriminator being used for binary classification training so that it can judge whether a segmentation map is fake data generated by the fully convolutional network or real, manually annotated data; the target values given for the fake data and for the real, manually annotated data are, respectively, fake: [[0], [1]] and real: [[1], [0]], and loss value 0 is calculated with a cross-entropy function;

S6, calculating loss value 1 between the segmentation map generated by the fully convolutional network and its label using cross entropy, then inputting the generated segmentation map into the discriminator network with the target value real: [[1], [0]], and calculating its loss value 2 with a cross-entropy function;

S7, performing steps S5 and S6 alternately until the fully convolutional network generates target segmentation maps as close as possible to real, manually drawn ones.

2. The real-time target tracking detection method based on fully convolutional neural networks according to claim 1, characterized in that, in step S1, the data augmentation specifically includes the following approaches:

(1) combining the images of every two adjacent frames in the dataset in pairs, so that a video segment of n frames yields n-1 samples in total;

3. The real-time target tracking detection method based on fully convolutional neural networks according to claim 1, characterized in that, in step S2, each normalized color image has three color channels and the target segmentation map has one color channel, so the new three-dimensional array has a depth of 7 (two color frames plus one map) while its area remains the same as that of the original image, and this new three-dimensional array is used as the input of the neural network.

4. The real-time target tracking detection method based on fully convolutional neural networks according to claim 1, characterized in that, in step S3, the inverted map is obtained as follows:

5. The real-time target tracking detection method based on fully convolutional neural networks according to claim 1, characterized in that, in step S4, the last layer of both the fully convolutional network and the discriminator network uses Softmax to compress the data into the range (0, 1).

6. The real-time target tracking detection method based on fully convolutional neural networks according to claim 1, characterized in that step S5 further includes the following step:

Loss value 0 is calculated with a cross-entropy function; the parameters of the fully convolutional network are fixed so that they do not change during this training, and an Adam optimizer minimizes loss value 0 to optimize the parameters of the discriminator network.

7. The real-time target tracking detection method based on fully convolutional neural networks according to claim 1, characterized in that step S6 further includes the following step:

The parameters of the discriminator network are fixed so that they do not change during this training, and an Adam optimizer minimizes loss value 1 and loss value 2 to optimize the parameters of the fully convolutional network.

8. A real-time target tracking detection system based on fully convolutional neural networks, characterized in that the system comprises:

A three-dimensional array generation module, configured to combine each obtained training sample with the target segmentation map corresponding to the first frame of the training sample along the color channel dimension to generate a new three-dimensional array;

A combination module, configured to combine the target segmentation map corresponding to the second frame of the training sample and its inverted map along the color channel dimension;

A neural network construction module, configured to construct a fully convolutional adversarial neural network whose main body consists of a fully convolutional network for generating target segmentation maps and a discriminator network for adversarial training;

A judgment module, configured to input the segmentation map generated by the fully convolutional network and its label into the discriminator network separately, and to judge whether a segmentation map is fake data generated by the fully convolutional network or real, manually annotated data;

A loss value calculation module, configured to calculate loss value 1 between the segmentation map generated by the fully convolutional network and its label using cross entropy, then input the generated segmentation map into the discriminator network with the target value real: [[1], [0]] and calculate its loss value 2 with a cross-entropy function;

A loop module, configured to run the judgment module and the loss value calculation module alternately until the fully convolutional network generates target segmentation maps as close as possible to real, manually drawn ones.

9. The real-time target tracking detection system based on fully convolutional neural networks according to claim 8, characterized in that the three-dimensional array generation module further includes a normalization module; each normalized color image has three color channels and the target segmentation map has one color channel, so the new three-dimensional array has a depth of 7 (two color frames plus one map) while its area remains the same as that of the original image, and this new three-dimensional array is used as the input of the neural network.

10. The real-time target tracking detection system based on fully convolutional neural networks according to claim 8, characterized by further comprising a discriminator network parameter optimization module and a fully convolutional network parameter optimization module;

The discriminator network parameter optimization module calculates loss value 0 with a cross-entropy function, fixes the parameters of the fully convolutional network so that they do not change during this training, and uses an Adam optimizer to minimize loss value 0, optimizing the parameters of the discriminator network;

The fully convolutional network parameter optimization module uses an Adam optimizer to minimize loss value 1 and loss value 2, optimizing the parameters of the fully convolutional network.