
CN104408469A - Firework identification method and firework identification system based on deep learning of image

Info

Publication number: CN104408469A
Application number: CN201410711008.9A
Authority: CN
Inventors: 赵俭辉; 王勇; 章登义; 武小平
Original assignee: Wuhan University WHU
Current assignee: Wuhan University WHU
Priority date: 2014-11-28
Filing date: 2014-11-28
Publication date: 2015-03-11

Abstract

The invention discloses a smoke and fire recognition method and system based on image deep learning. The method comprises the following steps: step 1, acquiring an unlabeled sample image set and a labeled sample image set; step 2, obtaining an unlabeled training data set and a labeled training data set; step 3, applying whitening preprocessing to the training data; step 4, based on the whitened unlabeled training data, building a deep neural network based on sparse autoencoding by unsupervised learning, and extracting the basic image feature set of the unlabeled training data; step 5, convolving the image data with the basic image features and pooling the results; step 6, training a Softmax classifier on the convolved and pooled labeled training data set; step 7, inputting the convolved and pooled image to be recognized into the trained Softmax classifier to obtain the recognition result. The method and system effectively improve the visual recognition rate for smoke, fire, and similar-looking objects, and enable automatic smoke and fire recognition with higher precision.

Description

Smoke and fire recognition method and system based on image deep learning

Technical field

The invention belongs to the technical field of digital-image-based intelligent fire monitoring and automatic smoke and fire target recognition, and in particular relates to a smoke and fire recognition method and system based on image deep learning.

Background technology

Intelligent smoke and fire monitoring based on digital images is a classical problem that spans image processing, computer vision, artificial intelligence, machine learning, and other fields. A number of publications on automatic recognition of smoke and fire objects already exist. Recognition can generally be divided into several stages: target segmentation, feature extraction, and comprehensive decision.

Stage one, Target Segmentation.

Automatic segmentation of smoke and fire targets falls roughly into threshold segmentation, edge-detection segmentation, region-based segmentation, and feature-space clustering segmentation. Threshold methods mainly include histogram thresholding, the maximum between-class variance (Otsu) threshold, two-dimensional maximum entropy, fuzzy thresholding, and co-occurrence matrix thresholding. Edge-detection methods mainly include the Sobel, Canny, Laplacian, Roberts, Prewitt, and SUSAN operators, active contour models, the watershed algorithm, and level set methods. Region-based methods mainly include region growing, region splitting and merging, and mathematical morphology. Feature-space clustering methods mainly include K-means, fuzzy C-means, and Mean-Shift. In practice, smoke and fire targets are usually obtained by color segmentation, for example using the color range of fire and the gray-level range of smoke; common color models include RGB, HSI, and YCbCr.

Stage two, feature extraction.

The visual features of smoke and fire targets mainly comprise color, shape, texture, and spatial relationship. Color features are unaffected by image rotation and translation, and after normalization they are also unaffected by scale changes; common color features include the color histogram, color sets, color moments, the color coherence vector, and the color correlogram. Shape features fall into two classes, contour features and region features: contour features concern the object boundary, whereas region features concern the whole object area. Common shape features include boundary chain codes, Fourier descriptors, geometric shape parameters, shape representations, and wavelet relative moments. Texture features are fairly robust to noise but are affected by factors such as resolution, directionality, and prior assumptions; common texture analysis methods include statistical, geometric, and spectral analysis. Spatial relationship refers to the mutual position or direction of multiple targets. It strengthens the ability to discriminate image content but is sensitive to target rotation and scale changes, and in practice spatial relationship information alone is often insufficient. Expressing these features usually requires mathematical tools such as the Laplace operator, the Fourier transform, the gray-level co-occurrence matrix, the hidden Markov model (HMM), the LBP operator, and discrete wavelet analysis.

Stage three, comprehensive decision.

Comprehensive decision means concluding, from the extracted features, whether a fire is present; that is, designing and applying a pattern recognition classifier. Image features commonly used for this decision include brightness value, color distribution value, texture parameters, centroid, area, average density, circularity, curvature, eccentricity, number of sharp corners, fractal image, and transmittance. Pattern classification is either supervised or unsupervised, and can be carried out separately or jointly at three levels: the information level, the feature level, and the decision level. Pattern classification for smoke and fire is mainly done at the feature level; common methods include voting, least-mean-square fusion, the Bayes classifier, fuzzy logic, artificial neural networks, and support vector machines.

The above methods have proven effective in settings such as building fire monitoring. In natural scenes, however, there are sometimes objects that resemble smoke or fire, such as red flowers, red leaves, and red flags, which resemble fire, and mist, clouds, and haze, which resemble smoke. The presence of these objects lowers recognition accuracy and raises both the missed-detection rate and the false-alarm rate. The field of intelligent fire monitoring therefore urgently needs a more accurate smoke and fire target recognition method. In recent years, deep learning techniques from machine learning have gradually been applied to image processing and pattern recognition. Deep learning learns a deep nonlinear network structure to approximate complex functions and to characterize distributed representations of the input data, and it shows a strong ability to learn the essential features of a data set from a small set of samples. To date, no research combining deep learning with smoke and fire recognition has appeared.

Summary of the invention

To address the deficiencies of the prior art, the present invention combines deep learning with smoke and fire recognition and provides a smoke and fire recognition method and system based on image deep learning, so as to achieve automatic smoke and fire recognition with higher precision.

To solve the above technical problem, the present invention adopts the following technical solution:

A smoke and fire recognition method based on image deep learning, comprising the steps of:

Step 1, collect sample image sets, comprising (1) an unlabeled sample image set consisting of unlabeled images of the targets and of target-like objects, and (2) a labeled sample image set consisting of class-labeled images of the targets and of target-like objects;

Step 2, randomly extract unit image blocks from the unlabeled sample image set and from the labeled sample image set, forming an unlabeled training data set and a labeled training data set respectively;

Step 3, apply whitening preprocessing to the training data in the unlabeled and labeled training data sets, the training data being the color value matrices of the three RGB channels of the unit image blocks;

Step 4, based on the whitened unlabeled training data, build a deep neural network based on sparse autoencoding by unsupervised learning, and extract the basic image feature set of the unlabeled training data;

Step 5, convolve the image data with the basic image features of the unlabeled training data and pool the results, the image data comprising the labeled training data and the image to be recognized;

Step 6, train a Softmax classifier on the convolved and pooled labeled training data set;

Step 7, input the convolved and pooled image to be recognized into the trained Softmax classifier to obtain the recognition result.

The whitening preprocessing in step 3 is ZCA whitening or PCA whitening.

Step 4 further comprises the sub-steps of:

4.1 construct the deep neural network, comprising a single input layer, multiple hidden layers, and a single output layer;

4.2 take the whitened unlabeled training data as both the input and the output of the deep neural network, and perform unsupervised learning by training the deep neural network based on sparse autoencoding;

4.3 extract the basic image feature set of the unlabeled training data using the trained deep neural network.

The unsupervised learning by training the deep neural network based on sparse autoencoding in sub-step 4.2 is specifically:

4.2.1 obtain the weighted sum of neuron inputs and the neuron output values;

4.2.2 set an objective cost function with a sparsity constraint added;

4.2.3 set the gradient descent directions of the weight coefficient vector and the bias term vector of the deep neural network, i.e. the iteration rule;

4.2.4 use the L-BFGS parameter training algorithm to solve iteratively for the weight coefficient vector and the bias term vector according to the set iteration rule.

Step 5 further comprises the sub-steps of:

5.1 convolve the basic image features of the unlabeled training data with each color channel of the image data to obtain the convolved image;

5.2 exploiting the local-region statistical property of natural images, reduce the feature dimensionality of the convolved image by mean pooling.

Step 6 further comprises the sub-steps of:

6.1 take the convolved and pooled labeled training data set as the training samples;

6.2 construct the Softmax classifier regression model;

6.3 set the gradient of the regression model parameter, i.e. the iteration rule;

6.4 use the L-BFGS parameter training algorithm to solve iteratively for the model parameter θ according to the set iteration rule.

The system corresponding to the above smoke and fire recognition method based on image deep learning comprises:

A sample image acquisition module, for collecting sample image sets comprising (1) an unlabeled sample image set consisting of unlabeled images of the targets and of target-like objects, and (2) a labeled sample image set consisting of class-labeled images of the targets and of target-like objects;

A training data acquisition module, for randomly extracting unit image blocks from the unlabeled sample image set and from the labeled sample image set, forming an unlabeled training data set and a labeled training data set respectively;

A whitening preprocessing module, for applying whitening preprocessing to the training data in the unlabeled and labeled training data sets, the training data being the color value matrices of the three RGB channels of the unit image blocks;

An unsupervised learning module, for building a deep neural network based on sparse autoencoding by unsupervised learning from the whitened unlabeled training data, and extracting the basic image feature set of the unlabeled training data;

A convolution and pooling module, for convolving the image data with the basic image features of the unlabeled training data and pooling the results, the image data comprising the labeled training data and the image to be recognized;

A classifier training module, for training a Softmax classifier on the convolved and pooled labeled training data set;

A recognition module, for inputting the convolved and pooled image to be recognized into the trained Softmax classifier to obtain the recognition result.

Compared with the prior art, the present invention has the following advantages and beneficial effects:

(1) Deep learning learns a deep nonlinear network structure to approximate complex functions and to characterize distributed representations of the input data, and has a strong ability to learn the essential features of the data from a large sample set. The classification accuracy of the sparse autoencoding deep neural network is therefore higher than that of a traditional neural network.

(2) The ZCA technique adopted serves for data dimensionality reduction, and whitening reduces the correlation between input image pixels, which helps speed up unsupervised learning.

(3) The convolution technique adopted helps reduce the number of parameters the neural network needs to train and simplifies feature extraction; the pooling technique exploits local-region statistical properties to reduce feature dimensionality and to prevent over-fitting.

(4) The Softmax classifier adopted is an extension of binary classification methods and can solve multi-class problems, which helps in recognizing smoke, fire, and a larger number of similar-looking objects.

Brief description of the drawings

Fig. 1 shows a sample image and the basic image feature set learned by the deep neural network.

Embodiment

Those skilled in the art can implement the technical solution of the present invention by means of computer software. The invention is further described below with reference to a specific embodiment.

The specific steps of the present invention are as follows:

Step 1, obtain a sample image set consisting of an unlabeled sample image set and a labeled sample image set, and obtain an unlabeled training data set and a labeled training data set from the sample image set.

This step further comprises the following sub-steps:

Step 1.1, collect unlabeled sample images to form the unlabeled sample image set.

The unlabeled sample image set comprises images of the targets and of target-like objects that carry no class labels. The targets are fire and smoke, and target-like objects are objects that resemble fire or smoke. For example, red flowers, red leaves, and red flags resemble fire; mist, clouds, and haze resemble smoke. In this sub-step, a large number of images of fire, red flowers, red leaves, and red flags form the first class of unlabeled sample image set, and a large number of images of smoke, mist, clouds, and haze form the second class of unlabeled sample image set.

Step 1.2, collect labeled sample images to form the labeled sample image set.

The labeled sample image set comprises class-labeled images of the targets and of target-like objects. Class-labeled images of fire, red flowers, red leaves, and red flags form the first class of labeled sample image set, and class-labeled images of smoke, mist, clouds, and haze form the second class of labeled sample image set.

Step 1.3, obtain the unlabeled training data set and the labeled training data set from the sample image set.

Randomly extract unit image blocks of a fixed size (for example 8 × 8 pixels) from the unlabeled sample images to form the unlabeled training data set; randomly extract unit image blocks of a fixed size (for example 8 × 8 pixels) from the labeled sample images to form the labeled training data set.
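The random block extraction of step 1.3 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `sample_patches`, the list-of-arrays input format, and the column-per-patch output layout are all assumptions made for the example.

```python
import numpy as np

def sample_patches(images, num_patches, patch_size=8, seed=0):
    """Randomly crop square RGB patches and flatten each into one column.

    images: list of H x W x 3 arrays (hypothetical input format).
    Returns an array of shape (patch_size * patch_size * 3, num_patches).
    """
    rng = np.random.default_rng(seed)
    dim = patch_size * patch_size * 3
    patches = np.empty((dim, num_patches))
    for n in range(num_patches):
        # Pick a random image, then a random top-left corner inside it.
        img = images[rng.integers(len(images))]
        h, w, _ = img.shape
        top = rng.integers(h - patch_size + 1)
        left = rng.integers(w - patch_size + 1)
        patch = img[top:top + patch_size, left:left + patch_size, :]
        patches[:, n] = patch.reshape(-1)
    return patches
```

With 8 × 8 RGB blocks each training datum has 8 × 8 × 3 = 192 dimensions, which is the input size the later whitening and autoencoder steps would operate on.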

Step 2, apply whitening preprocessing to the training data in the unlabeled and labeled training data sets. The training data are the unit image blocks obtained in step 1.3, i.e. the color value matrices of the three RGB channels of each unit image block.

In this embodiment, ZCA whitening is applied to the training data through the following sub-steps in order:

Step 2.1, zero-center the training data.

Subtract from each dimension its mean to obtain $x^{i}$, and normalize the training data to the range [0, 1]. Letting $m$ be the number of training data, the covariance matrix $\Sigma$ of the training data is:

$$\Sigma = \frac{1}{m} \sum_{i=1}^{m} x^{i} (x^{i})^{T} \tag{1}$$

Step 2.2, compute the vector basis of the zero-centered training data in the new coordinate system.

Apply singular value decomposition (SVD) to the covariance matrix $\Sigma$ to obtain the diagonal eigenvalue matrix $S$ and the matrix of $n$-dimensional eigenvectors $U = [u_{1}\ u_{2}\ \cdots\ u_{n}]$, where $u_{1}$ is the principal eigenvector of $\Sigma$, $u_{2}$ is the second eigenvector, and $u_{n}$ is the least significant eigenvector. These eigenvectors form a vector basis for the new coordinate system.

Step 2.3, obtain the training data in the new coordinate system.

Transform the training data to obtain the new data $x^{r} = U^{T} x^{i}$, whose dimensions are mutually uncorrelated. Dividing $x^{r}$ by the standard deviation of each dimension then gives every dimension unit variance, so the two necessary conditions of whitening are met: a mean close to 0 and equal variances. Letting $\epsilon$ be the ZCA whitening parameter, taken as $10^{-5}$ in this embodiment, the final ZCA whitening result is:

$$x^{t} = U \cdot \frac{x^{r}}{\sqrt{S + \epsilon}} \cdot U^{T} \tag{2}$$

In the present invention, preprocessing of the training data is not limited to ZCA whitening; other common whitening techniques such as PCA whitening may also be used.
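Steps 2.1 to 2.3 can be sketched in NumPy as below. The sketch assumes the commonly used matrix form of ZCA whitening, $U (S + \epsilon)^{-1/2} U^{T} x$, which may differ in notation from equation (2) as printed. The function name `zca_whiten` and the column-per-sample layout are assumptions, and the [0, 1] normalization is left to the caller.

```python
import numpy as np

def zca_whiten(X, eps=1e-5):
    """ZCA-whiten data X of shape (n_dims, m_samples).

    Returns the whitened data, the whitening matrix, and the per-dimension
    mean, so the same transform can be reused on new data.
    """
    mean = X.mean(axis=1, keepdims=True)
    Xc = X - mean                        # step 2.1: zero-center each dimension
    sigma = Xc @ Xc.T / Xc.shape[1]      # covariance matrix, equation (1)
    U, S, _ = np.linalg.svd(sigma)       # step 2.2: eigenvectors U, eigenvalues S
    # Step 2.3: rotate, rescale each direction to unit variance, rotate back.
    W = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T
    return W @ Xc, W, mean
```

Returning `W` and `mean` is a design choice for this sketch: images to be recognized would need the same transform that was fitted on the training blocks.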

Step 3, perform unsupervised learning on the unlabeled training data and build a deep neural network based on sparse autoencoding.

This step further comprises the following sub-steps:

Step 3.1, construct the deep neural network, comprising an input layer, hidden layers, and an output layer. The input layer and the output layer are each a single layer, and there are multiple hidden layers. The unlabeled training data serve as both the input and the output of the deep neural network.

Step 3.2, perform unsupervised learning by training the deep neural network, i.e. obtain the weight coefficient vector and the bias term vector of the deep neural network.

Step 3.2.1, obtain the weighted sum of neuron inputs:

Let $w_{ij}^{l}$ denote the weight coefficient connecting the $j$-th neuron of layer $l$ to the $i$-th neuron of layer $l+1$, $b_{i}^{l+1}$ the bias term of the $i$-th neuron of layer $l+1$, $S_{l}$ the number of neurons in layer $l$, and $z_{i}^{l+1}$ the weighted sum of inputs of the $i$-th neuron in layer $l+1$. Then:

$$z_{i}^{l+1} = \sum_{j=1}^{S_{l}} w_{ij}^{l} x_{j}^{l} + b_{i}^{l+1} \tag{3}$$

Step 3.2.2, obtain the neuron output values:

Given the neuron activation function, let $a_{i}^{l}$ denote the output value of the $i$-th neuron in layer $l$ of the network. The unlabeled training datum $x^{t}$, i.e. the input sample of the autoencoding deep neural network, is required to equal the output result $y^{t}$, i.e. $y^{t} = x^{t}$. Let $M$ be the number of unlabeled training data and $t$ their index, so that $1 \le t \le M$. Taking the output value of the $j$-th neuron of layer $l$ for each input sample $x^{t}$, the average output value of the $j$-th hidden-layer neuron is the mean of these output values over the $M$ samples.

Step 3.2.3, define the objective cost function of the deep neural network:

A sparsity constraint is added to the deep neural network. Let $\rho$ be the sparsity parameter, a positive number close to 0 that is generally chosen between 0 and 0.05; $\rho = 0.035$ in this embodiment. In other words, the average output value of the $j$-th hidden-layer neuron should be close to $\rho$. To enforce the sparsity constraint, a cost objective function $J(w, b)$ is defined.

The cost objective function is the sum of three parts. The first part is the mean squared error term, the second is the regularization term, and the third is the penalty term, which penalizes average output values that differ significantly from $\rho$ so as to impose the sparsity constraint on the neural network. Here $N$ is the number of layers of the autoencoding deep neural network; $\lambda$ is the regularization coefficient, with $\lambda = 0.003$ in this embodiment; $h_{w,b}(x^{t})$ is the output of the network's output layer for input sample $x^{t}$; $\beta$ is the coefficient controlling the sparsity penalty term, with $\beta = 5$ in this embodiment; $w$ and $b$ are the weight coefficient vector and the bias term vector of the deep neural network. The penalty is based on the relative entropy between $\rho$ and the average output value, a convex function that measures the difference between two distributions.
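The displayed formulas for the average output value, the cost function, and the relative entropy are not present in this text. Assuming the standard sparse autoencoder formulation, which matches the three-part description above, they would take roughly the following form. This is a reconstruction under that assumption: the symbol $\hat{\rho}_{j}$, the summation limits, and the factors of $\frac{1}{2}$ are not confirmed by the source, and equation numbers are omitted because they cannot be recovered.

```latex
% Average output value of the j-th hidden-layer neuron over the M samples
\hat{\rho}_{j} = \frac{1}{M} \sum_{t=1}^{M} a_{j}^{l}(x^{t})

% Cost: mean squared error + regularization + sparsity penalty
J(w, b) = \frac{1}{M} \sum_{t=1}^{M} \frac{1}{2} \left\| h_{w,b}(x^{t}) - y^{t} \right\|^{2}
        + \frac{\lambda}{2} \sum_{l=1}^{N-1} \sum_{i=1}^{S_{l+1}} \sum_{j=1}^{S_{l}} \left( w_{ij}^{l} \right)^{2}
        + \beta \sum_{j} \mathrm{KL}\left( \rho \,\|\, \hat{\rho}_{j} \right)

% Relative entropy between rho and the average output value
\mathrm{KL}\left( \rho \,\|\, \hat{\rho}_{j} \right)
        = \rho \log \frac{\rho}{\hat{\rho}_{j}}
        + (1 - \rho) \log \frac{1 - \rho}{1 - \hat{\rho}_{j}}
```

The relative entropy is zero when $\hat{\rho}_{j} = \rho$ and grows as the two values diverge, which is why it serves as the penalty described above.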

Step 3.2.4, solve the objective cost function:

For the weight coefficient vector and the bias term vector of the deep neural network, define their gradient descent directions:

$$\begin{cases} \nabla w^{l} = \frac{1}{M} \cdot \sigma^{l+1} \cdot (a^{l})^{T} + \lambda w^{l} \\ \nabla b^{l} = \frac{1}{M} \sum_{t=1}^{M} \sigma_{t}^{l+1} \end{cases} \tag{7}$$

In formula (7), $\nabla w^{l}$ is the gradient descent direction of the weight coefficient vector of layer $l$, and $\nabla b^{l}$ is the gradient descent direction of the bias term vector of layer $l$; $w^{l}$ is the weight coefficient vector of layer $l$; $a^{l}$ is the output vector of layer $l$ of the network; $\sigma_{t}^{l+1}$ is the residual at layer $l+1$ for input sample $x^{t}$, and $\sigma^{l+1}$ is the residual vector of that layer.

Formula (7) determines the iteration rule for $w$ and $b$. In this embodiment the L-BFGS parameter training algorithm is used to solve for $w$ and $b$ iteratively. The values of $w$ and $b$ when the iteration converges or reaches the maximum number of iterations are the weight coefficient vector and bias term vector of the trained sparse autoencoding deep neural network. The convergence criterion and the maximum number of iterations are preset according to actual requirements. Once $w$ and $b$ are obtained, training of the sparse autoencoding deep neural network is complete.
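A cost-and-gradient routine of the kind an L-BFGS optimizer consumes can be sketched as follows. Several things here are assumptions rather than the patent's stated design: a single hidden layer instead of several, a sigmoid activation, the standard sparse autoencoder cost, and the names `unpack` and `sparse_ae_cost`. The default hyperparameters are the values given in this embodiment.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unpack(theta, n_in, n_hid):
    """Split the flat parameter vector into W1, b1, W2, b2."""
    a = n_hid * n_in
    W1 = theta[:a].reshape(n_hid, n_in)
    W2 = theta[a:2 * a].reshape(n_in, n_hid)
    b1 = theta[2 * a:2 * a + n_hid]
    b2 = theta[2 * a + n_hid:]
    return W1, b1, W2, b2

def sparse_ae_cost(theta, X, n_hid, lam=0.003, rho=0.035, beta=5.0):
    """Return (cost, gradient) for inputs X of shape (n_in, M)."""
    n_in, M = X.shape
    W1, b1, W2, b2 = unpack(theta, n_in, n_hid)
    # Forward pass: weighted sums as in equation (3), then the activation.
    a2 = sigmoid(W1 @ X + b1[:, None])
    a3 = sigmoid(W2 @ a2 + b2[:, None])
    rho_hat = a2.mean(axis=1)            # average hidden-layer output values
    kl = (rho * np.log(rho / rho_hat)
          + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    cost = (0.5 * np.sum((a3 - X) ** 2) / M                    # squared error
            + 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))  # regularization
            + beta * np.sum(kl))                               # sparsity penalty
    # Backward pass: per-layer residuals (sigma in formula (7)).
    d3 = (a3 - X) * a3 * (1 - a3)
    sparse = beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat))
    d2 = (W2.T @ d3 + sparse[:, None]) * a2 * (1 - a2)
    gW1 = d2 @ X.T / M + lam * W1
    gW2 = d3 @ a2.T / M + lam * W2
    gb1 = d2.mean(axis=1)
    gb2 = d3.mean(axis=1)
    return cost, np.concatenate([gW1.ravel(), gW2.ravel(), gb1, gb2])
```

Because the function returns the cost and gradient together, it could be handed to an off-the-shelf L-BFGS routine, for example `scipy.optimize.minimize(sparse_ae_cost, theta0, args=(X, n_hid), jac=True, method="L-BFGS-B")`. After training, each row of `W1` would play the role of one basic image feature.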

Step 3.3, use the trained deep neural network to extract the basic image feature set that expresses the unlabeled training data.

The basic image feature set is the set of basic image features from which complex images can be composed. See Fig. 1: at the upper left is one of the sample images used for training (1); on the right is the basic image feature set (3) learned from all sample images by the deep neural network; combinations of basic image features can express any unit image block (2) in the sample image.

Step 4, use the basic image feature set to convolve and pool the image data, the image data comprising the labeled training data and the image data to be recognized.

This step further comprises the sub-steps of:

Step 4.1, convolve the basic image features with each color channel of each image datum separately, i.e. average the image pixels within the convolution template and take this mean as the target value; then sum the convolution results of the three color channels to obtain the convolved image.

Step 4.2, exploiting the local-region statistical property of natural images, reduce the feature dimensionality of the convolved image by mean pooling: partition the convolved image into regions, compute the pixel mean of each region, and use that mean to represent the region.

The local-region statistical property is an inherent characteristic of natural images: the statistics of one part of a natural image are similar to those of its other parts. For example, one region of a landscape image resembles other regions. This means that features learned on one part of an image can also be applied to another part, and mean pooling is the concrete way of doing so.
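Steps 4.1 and 4.2 can be sketched as below. The sketch makes several assumptions that the text does not state: one learned feature is reshaped to a `p x p x 3` template, the template is slid over the image without flipping (strictly a cross-correlation, as is common in convolutional networks), no bias or activation is applied, and pooling regions do not overlap. The function names are hypothetical.

```python
import numpy as np

def convolve_feature(image, feature):
    """Valid-mode response of one feature over an H x W x 3 image.

    feature: p x p x 3 array (one learned basic feature, reshaped).
    The responses of the three color channels are summed, as in step 4.1.
    """
    h, w, _ = image.shape
    p = feature.shape[0]
    out = np.zeros((h - p + 1, w - p + 1))
    for r in range(h - p + 1):
        for c in range(w - p + 1):
            window = image[r:r + p, c:c + p, :]
            out[r, c] = np.sum(window * feature)   # sums over all 3 channels
    return out

def mean_pool(conv, pool):
    """Average over non-overlapping pool x pool regions (step 4.2)."""
    rows, cols = conv.shape[0] // pool, conv.shape[1] // pool
    trimmed = conv[:rows * pool, :cols * pool]     # drop any ragged border
    return trimmed.reshape(rows, pool, cols, pool).mean(axis=(1, 3))
```

Applying every learned feature to an image and concatenating the pooled maps would give the fixed-length vector that the Softmax classifier is trained on; the explicit double loop is kept for readability rather than speed.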

Step 5, train the Softmax classifier on the labeled training data set.

This step further comprises the sub-steps of:

Step 5.1, build the training sample set of the Softmax classifier.

The convolved and pooled labeled training data form the training sample set $\{(x^{1}, y^{1}), (x^{2}, y^{2}), \ldots, (x^{K}, y^{K})\}$, where $K$ is the number of labeled training samples, $x^{i}$ is the $i$-th training sample, i.e. a convolved and pooled labeled training datum, and $y^{i}$ is the class label of training sample $x^{i}$. If the Softmax classifier is to solve a $k$-class problem, then $y^{i} \in \{1, 2, \ldots, k\}$.

Step 5.2, construct the Softmax classifier regression model.

Let $\theta$ be the model parameter to be estimated. The evaluation function of the model parameter $\theta$ is:

$$h_{\theta}(x^{i}) = \begin{bmatrix} p(y^{i} = 1 \mid x^{i}; \theta) \\ p(y^{i} = 2 \mid x^{i}; \theta) \\ \vdots \\ p(y^{i} = k \mid x^{i}; \theta) \end{bmatrix}$$

The cost function $J(\theta)$ of the evaluation function $h_{\theta}(x^{i})$ is:

$$J(\theta) = -\frac{1}{K} \left[ \sum_{i=1}^{K} \sum_{j=1}^{k} f(y^{i} = j) \log h_{\theta}(x^{i}) \right] \tag{8}$$

Here $f(y^{i} = j)$ is an indicator function taking the value 0 or 1: if the label of the $i$-th training sample $x^{i}$ is class $j$, then $f(y^{i} = j) = 1$; otherwise $f(y^{i} = j) = 0$.

Step 5.3, define the gradient of the model parameter $\theta$:

$$\nabla_{\theta_{j}} J(\theta) = -\frac{1}{K} \sum_{i=1}^{K} \left[ x^{i} \left( f(y^{i} = j) - p(y^{i} = j \mid x^{i}; \theta) \right) \right] \tag{9}$$

Formula (9) gives the iteration rule for the model parameter $\theta$. In this embodiment the L-BFGS parameter training algorithm is used to solve for $\theta$ iteratively according to the iteration rule of formula (9). The value of $\theta$ when the iteration converges or reaches the maximum number of iterations is the optimal solution of the Softmax classifier regression model parameter. Once $\theta$ is obtained, training of the Softmax classifier is complete.
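The cost (8) and gradient (9) can be sketched in NumPy as follows. The sketch assumes the usual softmax form for $p(y^{i} = j \mid x^{i}; \theta)$, which the text does not spell out, and it uses 0-based labels where the text uses 1 to $k$. The names `softmax_cost` and `predict` are hypothetical.

```python
import numpy as np

def softmax_cost(theta, X, y, k):
    """Cost (8) and gradient (9) of softmax regression.

    theta: flat vector of k * n parameters; X: (n, K) samples in columns;
    y: integer labels in 0..k-1 (the text uses 1..k).
    """
    n, K = X.shape
    Theta = theta.reshape(k, n)
    scores = Theta @ X
    scores -= scores.max(axis=0, keepdims=True)   # numerical stability only
    P = np.exp(scores)
    P /= P.sum(axis=0, keepdims=True)             # p(y = j | x; theta)
    F = np.zeros((k, K))
    F[y, np.arange(K)] = 1.0                      # indicator f(y^i = j)
    cost = -np.sum(F * np.log(P)) / K
    grad = -(F - P) @ X.T / K                     # row j is formula (9)
    return cost, grad.ravel()

def predict(theta, X, k):
    """Return the most probable class index for each column of X."""
    return np.argmax(theta.reshape(k, X.shape[0]) @ X, axis=0)
```

As with the autoencoder, a function that returns the cost and gradient together fits the interface of general-purpose L-BFGS routines such as `scipy.optimize.minimize` with `jac=True` and `method="L-BFGS-B"`.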

After steps 1 to 5 are completed, the image to be recognized is convolved and pooled with the basic image feature set learned by the sparse autoencoding deep neural network, and the convolved and pooled image is input into the trained Softmax classifier to obtain the classification result. The image to be recognized can thus be judged to be an image of fire, red flowers, red leaves, or a red flag, or an image of smoke, mist, clouds, or haze.

The specific embodiment described herein merely illustrates the spirit of the present invention. Those skilled in the art may make various modifications or additions to the described embodiment, or substitute similar approaches, without departing from the spirit of the invention or exceeding the scope defined by the appended claims.

Claims

1. A smoke and fire recognition method based on image deep learning, characterized by comprising the steps of:

2. The smoke and fire recognition method based on image deep learning according to claim 1, characterized in that:

The whitening preprocessing in step 3 is ZCA whitening or PCA whitening.

3. The smoke and fire recognition method based on image deep learning according to claim 1, characterized in that:

Step 4 further comprises the sub-steps of:

4. The smoke and fire recognition method based on image deep learning according to claim 3, characterized in that:

4.2.1 obtain the weighted sum of neuron inputs and the neuron output values;

4.2.2 set an objective cost function with a sparsity constraint added;

5. The smoke and fire recognition method based on image deep learning according to claim 1, characterized in that:

Step 5 further comprises the sub-steps of:

6. The smoke and fire recognition method based on image deep learning according to claim 1, characterized in that:

Step 6 further comprises the sub-steps of:

6.2 construct the Softmax classifier regression model;

6.4 use the L-BFGS parameter training algorithm to solve iteratively for the model parameter θ according to the set iteration rule.

7. A smoke and fire recognition system based on image deep learning, characterized by comprising: