CN108805167A - Laplace function constraint-based sparse depth confidence network image classification method - Google Patents

Laplace function constraint-based sparse depth confidence network image classification method

Info

Publication number
CN108805167A
Authority
CN
China
Prior art keywords
image
layer
rbm
value
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810417793.5A
Other languages
Chinese (zh)
Other versions
CN108805167B (en)
Inventor
宋威
李蓓蓓
王晨妮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangnan University
Original Assignee
Jiangnan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangnan University filed Critical Jiangnan University
Priority to CN201810417793.5A priority Critical patent/CN108805167B/en
Publication of CN108805167A publication Critical patent/CN108805167A/en
Application granted granted Critical
Publication of CN108805167B publication Critical patent/CN108805167B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155 Bayesian classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a sparse depth confidence network image classification method based on Laplace function constraint, belonging to the fields of image processing and deep learning. Inspired by analysis of the primate visual cortex, a penalty regular term is introduced into the likelihood function in the unsupervised stage; the CD algorithm is used to maximize the objective function while the sparse distribution of the training set is obtained through the Laplace sparse constraint, so that unlabeled data can be used to learn visual feature representations.

Description

A sparse depth confidence network image classification method based on Laplace function constraints
Technical field
The present invention relates to the fields of image processing and deep learning, and in particular to a Laplace Sparse Deep Belief Network (LSDBN) image classification method based on Laplace function constraints.
Background technology
Existing image classification mainly relies on methods based on generative or discriminative models. These shallow-structured models have limitations: their ability to express complex functions is limited when samples are finite, and their generalization ability is restricted, which degrades classification performance. Image data contain considerable noise and redundancy and therefore require feature preprocessing, which consumes a large amount of time and resources. Excellent feature extraction algorithms and classification models are therefore an important research direction in image processing.
In recent years, deep learning has developed rapidly. Hinton et al. proposed the Deep Belief Network (DBN) and the unsupervised greedy layer-wise training algorithm in 2006, which alleviated the tendency of deep neural networks to fall into local optima and set off a new wave in the deep learning community. Through multi-level feature transformations, a DBN obtains an abstract representation of the original data, improving the accuracy of tasks such as classification and prediction. Because a DBN can learn features automatically and reduce data dimensionality, it has become one of the most widely used deep learning architectures; DBNs have achieved breakthroughs in related fields such as speech recognition, image classification and face recognition.
Image classification algorithms built on a DBN can integrate the feature representations learned at each level and retain the spatial information of image features, while exploiting the DBN's ability to learn classification features automatically, thereby avoiding the poor generality of traditional hand-crafted feature extraction. Although DBN models have achieved encouraging results, feature homogeneity arises during training: many hidden units learn the same common features, so the posterior probabilities of the hidden-layer units become too high and the network cannot learn useful feature representations of the data; this is especially pronounced when the number of hidden units is small. The current approach to feature homogeneity is to adjust the sparsity of the hidden-layer nodes and reduce the similarity between columns of the connection weight matrix, i.e. to add a sparse penalty factor to the network for sparsification. Studies of the human visual system show that, for a given stimulus, only a small number of neurons are activated. Inspired by this, researchers proposed sparse coding theory to model the sparse representation of the visual system.
Sparse representation is considered robust to local deformation in computer vision. When learning a sparse representation, the most important features of an object are always attended to, so redundant features can be discarded, reducing overfitting and the influence of noise. Introducing sparsity into the training of the Restricted Boltzmann Machine (RBM) to avoid feature homogeneity is therefore a meaningful idea. Scholars have proposed a variety of sparse RBM models for this problem. Some introduced the L0 regularization of the hidden-unit activation probabilities into the RBM likelihood function, but solving the L0-regularized problem is NP-hard. Since L1 regularization is a convex quadratic optimization problem, others introduced the L1 regularization of the hidden-unit activation probabilities into the RBM likelihood function and proposed a novel sparse deep belief network. Hinton proposed a cross-entropy sparse penalty factor using the concept of cross entropy so that the hidden units exhibit overall sparsity; Lee et al. proposed a sparse RBM based on the sum of squared errors (SP-RBM); others proposed a sparse RBM based on rate-distortion theory (SR-RBM), but no sound method exists for obtaining the SR-RBM distortion measure. In short, for a DBN these RBM variants achieve sparse behaviour of the binary hidden units by specifying a "sparsity target". However, the sparsity target must be set in advance, and all hidden-layer nodes then have the same degree of sparsity in a given state.
Summary of the invention
In order to solve the above problems in the prior art, the present invention proposes a sparse depth confidence network image classification method based on Laplace function constraints.
Technical scheme of the present invention:
A sparse depth confidence network image classification method based on Laplace function constraints includes the following steps:
Step 1: choose a training image data set and perform image preprocessing to obtain the training data set;
Step 2: input the training data set preprocessed in step 1 into the LSDBN network model; use the Contrastive Divergence (CD-k) algorithm to train each layer of the Laplace Sparse Restricted Boltzmann Machine (LS-RBM) network individually, without supervision and from the bottom up, the output of the lower LS-RBM network serving as the input of the adjacent upper LS-RBM network; obtain the parameter values of each LS-RBM network through iterative training, and finally obtain the high-level features of the input image data; the parameter values are the weights and biases;
Step 3: use the parameter values obtained in step 2 as the initial values of the fine-tuning stage and fine-tune the entire LSDBN network with the top-down back-propagation algorithm to obtain the LSDBN network model;
Step 4: input the test image data set into the LSDBN network model obtained in step 3 and perform recognition testing with the Softmax classifier, finally outputting the image classification results.
Step 1 is specifically: convert the colour image into a grayscale image by binarization, and normalize the gray values of the grayscale image to [0, 1] to obtain the training data set; the normalization formula is:
x = (x̃ − xmin) / (xmax − xmin)   (1)
where x̃ is a feature value of the image data set, xmax and xmin are respectively the maximum and minimum values over all features of the image data set, and x is the normalized image data set. A short sketch of this normalization is given below.
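The following is a minimal sketch of the normalization of formula (1); it is an illustration under assumed array names, not the patent's reference implementation.

```python
import numpy as np

def minmax_normalize(features):
    """Min-max normalization of image features to [0, 1], as in formula (1)."""
    x_min = features.min()
    x_max = features.max()
    return (features - x_min) / (x_max - x_min + 1e-12)  # epsilon guards against division by zero

# Example: normalize a batch of 28*28 gray images flattened to 784-dimensional vectors
images = np.random.randint(0, 256, size=(100, 784)).astype(np.float64)
x = minmax_normalize(images)
```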
Step 2 is specifically:
Step 2.1: build the LSDBN network model and set the parameter values of the LSDBN network structure: the number of visible-layer nodes, the number of hidden-layer nodes, the number of hidden layers, the number of iterations and the number of fine-tuning epochs; the number of visible-layer nodes is the feature dimensionality of the input image set, and the number of hidden-layer nodes is determined according to the feature dimensionality of the input image set;
Step 2.2: take the preprocessed training data set x as the input of the first LS-RBM and train the LS-RBM with the CD algorithm;
(1) The relationship between the visible layer and the hidden layer is expressed by the energy function:
E(v, h; θ) = −Σi ai vi − Σj bj hj − Σi Σj vi Wij hj   (2)
where θ denotes the parameters of the model, i.e. θ = {Wij, ai, bj}; Wij is the weight matrix between the visible layer and the hidden layer, ai is the bias of the visible node, bj is the bias of the hidden node, i indexes the features of the input image, i.e. the visible-layer nodes, of which there are n in total; j indexes the hidden-layer nodes, of which there are m in total; vi denotes the i-th visible node and hj denotes the j-th hidden node;
(2) Based on the energy function of formula (2), the joint probability distribution of v and h in the RBM is:
P(v, h; θ) = exp(−E(v, h; θ)) / Z(θ),  Z(θ) = Σv Σh exp(−E(v, h; θ))   (3)
where Z(θ) is the sum over all possible configurations of the visible and hidden nodes, ai is the bias of the visible node and bj is the bias of the hidden node;
Using Bayes' rule, the marginal probability distributions of the visible-layer units v and the hidden-layer units h are obtained from the joint distribution of formula (3):
P(v; θ) = Σh P(v, h; θ)   (4)
P(h; θ) = Σv P(v, h; θ)   (5)
Using Bayes' rule and the definition of the sigmoid activation function, the conditional probability distributions of the visible-layer units v and the hidden-layer units h are derived:
P(hj = 1 | v) = σ(Σi Wij vi + bj)   (7)
P(vi = 1 | h) = σ(Σj Wij hj + ai)   (8)
where σ(·) is the sigmoid activation function, i.e. the nonlinear mapping function of the neuron;
Using formulas (7) and (8), an approximate reconstruction P(v; θ) of the training image is obtained by one-step Gibbs sampling with the contrastive divergence algorithm, as sketched below;
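Below is a minimal sketch of the conditional distributions of formulas (7)-(8) and of the one-step Gibbs chain used by CD-1; the function and variable names are illustrative assumptions, not part of the patent.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_h_given_v(v, W, b, rng):
    """P(h_j = 1 | v) = sigmoid(sum_i Wij vi + bj), formula (7)."""
    p_h = sigmoid(v @ W + b)
    return p_h, (rng.random(p_h.shape) < p_h).astype(float)

def sample_v_given_h(h, W, a, rng):
    """P(v_i = 1 | h) = sigmoid(sum_j Wij hj + ai), formula (8)."""
    p_v = sigmoid(h @ W.T + a)
    return p_v, (rng.random(p_v.shape) < p_v).astype(float)

def cd1_chain(v0, W, a, b, rng):
    """One-step Gibbs chain v0 -> h0 -> v1 -> h1 used by CD-1."""
    p_h0, h0 = sample_h_given_v(v0, W, b, rng)
    p_v1, _ = sample_v_given_h(h0, W, a, rng)
    p_h1, _ = sample_h_given_v(p_v1, W, b, rng)
    return p_h0, p_v1, p_h1
```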
(3) P(v; θ) is solved with the maximum-likelihood method to obtain the optimal value of θ; the likelihood function of the LS-RBM is:
Funsup = Σl ln P(v(l); θ)
and the optimal value of the parameters is:
θ* = argmaxθ Σl ln P(v(l); θ)
After adding the sparse penalty term, the objective function optimized in LS-RBM pre-training is:
F = Funsup + λFsparse   (11)
where λ is the sparsity parameter used to adjust the relative importance of Fsparse, and Fsparse denotes the sparse regularization function, given by:
Fsparse = Σj log L(qj, p, u),  L(qj, μ, b) = (1/(2b)) exp(−|qj − μ| / b)
where L(qj, μ, b) is the Laplace probability density function, qj denotes the average of the conditional expectation of the j-th hidden-layer unit given the data, p is a constant that controls the degree of sparsity of the n hidden units hj, and u denotes the scale parameter; the expression of qj is as follows:
qj = (1/m) Σl E[hj(l) | v(l)] = (1/m) Σl g(Σi Wij vi(l) + bj)
where E(·) is the conditional expectation of the j-th hidden-layer unit given the data, l indexes the training images, m is the number of images in the training data set, hj(l) is the j-th hidden unit corresponding to the l-th image, v(l) is the visible-layer unit corresponding to the l-th image, g(Σi Wij vi(l) + bj) is the activation probability of hidden unit hj given the visible layer v, and g is the sigmoid function;
After the sparse penalty term is added, the purpose of training the LS-RBM is to solve for the optimal value of the objective function in formula (11):
θ* = argmaxθ { Σl ln P(v(l)) + λFsparse }
where P(v(l)) is the LS-RBM likelihood function to be optimized, i.e. the distribution P(v; θ) of the visible layer v;
(4) The weight matrix and the hidden-layer biases are updated by differentiating the LS-RBM objective function and applying gradient descent; the derivatives are those of the log-likelihood term plus λ times the derivative of the sparse penalty term.
The differentiated values are substituted into the update rule of the parameter θ to obtain the new parameter values:
a(1) := m·a + α(v1 − v2)   (26)
(5) The network continues to be trained with the new parameter values. By continuously optimizing the objective function, the activation probability of the hidden-layer units gradually approaches the given fixed value p, and a set of weight parameters and corresponding biases is learned; through these suitable parameters, sparse feature vectors are found, the redundant features present in the image are controlled, and the weights are learned jointly from the main features of the image to reconstruct the input data, completing the training of the first LS-RBM and the update of the corresponding parameter value θ. A simplified sketch of this training loop is given below.
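Putting the preceding steps together, the sketch below trains a single LS-RBM with CD-1 plus a sparsity term on the hidden biases. The exact sparse-gradient expressions derived in the patent are not reproduced in the published text, so the `sparse_grad_b` term is a simplified, assumed stand-in that pushes the mean hidden activation qj toward the target p.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_ls_rbm(data, n_hidden, epochs=100, lr=0.1, lam=0.1, p=0.05, u=0.2,
                 batch_size=100, seed=0):
    """Sketch of LS-RBM pre-training: CD-1 plus a sparsity term on the hidden biases."""
    rng = np.random.default_rng(seed)
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    a = np.zeros(n_visible)                    # visible biases
    b = np.zeros(n_hidden)                     # hidden biases
    for _ in range(epochs):
        for start in range(0, len(data), batch_size):
            v0 = data[start:start + batch_size]
            p_h0 = sigmoid(v0 @ W + b)
            h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
            p_v1 = sigmoid(h0 @ W.T + a)       # reconstruction of the visible layer
            p_h1 = sigmoid(p_v1 @ W + b)
            # CD-1 approximation of the log-likelihood gradient
            dW = (v0.T @ p_h0 - p_v1.T @ p_h1) / len(v0)
            da = (v0 - p_v1).mean(axis=0)
            db = (p_h0 - p_h1).mean(axis=0)
            # Assumed, simplified sparsity gradient: drive the mean activation q_j toward p
            q = p_h0.mean(axis=0)
            sparse_grad_b = -np.sign(q - p) / u * q * (1.0 - q)
            W += lr * dW
            a += lr * da
            b += lr * (db + lam * sparse_grad_b)
    return W, a, b
```

In this sketch the sparse term is applied only to the hidden biases for brevity; the patent updates both the weight matrix and the hidden biases with the derivative of the full objective.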
Step 2.3: with W(1) and b(1) trained by the first LS-RBM, use P(h(1) | v(1), W(1), b(1)) to obtain the input features of the second LS-RBM, and continue training the second LS-RBM with the algorithm of step 2.2;
Step 2.4: train recursively according to the above steps until the (L-1)-th layer has been trained; after multiple loop iterations, a deep sparse DBN model, i.e. the LSDBN network model, is obtained;
Step 2.5: initialize W(L) and b(L) of the L-th layer, and use {W(1), W(2), …, W(L)} and {b(1), b(2), …, b(L)} to form a deep neural network with L layers; the output layer is the label data of the training image set, and the Softmax classifier is used as the output layer. A sketch of this greedy layer-wise stacking is given below.
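A sketch of the greedy layer-wise stacking of steps 2.3-2.5 follows; it reuses the hypothetical `train_ls_rbm` helper from the previous sketch, and the hidden activation probabilities of each trained LS-RBM become the input data of the next one.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_lsdbn(data, layer_sizes, **rbm_kwargs):
    """Greedy layer-wise pre-training of an LSDBN (sketch).

    `layer_sizes` lists the hidden-layer widths, e.g. [500, 500].
    Returns the per-layer (W, a, b) parameters and the top-level features.
    """
    params = []
    features = data
    for n_hidden in layer_sizes:
        # train_ls_rbm is the hypothetical helper from the previous sketch
        W, a, b = train_ls_rbm(features, n_hidden, **rbm_kwargs)
        params.append((W, a, b))
        # P(h | v, W, b): hidden activation probabilities feed the next LS-RBM
        features = sigmoid(features @ W + b)
    return params, features
```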
Step 3 is specifically:
Step 3.1: use the parameter values θ trained in step 2 as the initial values of the fine-tuning-stage parameters and input them into the LSDBN network model of the fine-tuning stage;
Step 3.2: compute the activation value of each hidden-layer unit with the forward-propagation algorithm;
Step 3.3: compute the error between the forward-propagated output of the training image set and the corresponding labels of the image set, back-propagate the error, and compute the residual of each hidden-layer unit to indicate its contribution to the error; use the residual of each unit to compute the partial derivatives, and at each iteration update the weight matrix and biases with gradient descent according to formulas (28) and (29) until the maximum number of iterations is reached, obtaining the fine-tuned LSDBN network model;
W := W − α·∂J(W, b)/∂W   (28)
b := b − α·∂J(W, b)/∂b   (29)
where α is the learning rate and J(W, b) is the cost function obtained by computing the error between the actual model output values and the corresponding label values. A simplified sketch of this fine-tuning procedure is given below.
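The sketch below illustrates the fine-tuning stage: forward propagation through the pre-trained layers, a Softmax output layer, and the gradient-descent updates of formulas (28)-(29). It is a simplified, assumed implementation (full-batch, sigmoid hidden layers), not the patent's exact procedure.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def finetune(params, W_out, b_out, x, labels, epochs=100, alpha=0.1):
    """Back-propagation fine-tuning of a pre-trained LSDBN (sketch).

    `params` holds the pre-trained (W, a, b) of each LS-RBM layer (the visible
    biases `a` are not used here); `labels` is an (n_samples, k) one-hot matrix.
    """
    for _ in range(epochs):
        # Step 3.2: forward propagation, activation of every hidden layer
        activations = [x]
        for W, _, b in params:
            activations.append(sigmoid(activations[-1] @ W + b))
        probs = softmax(activations[-1] @ W_out + b_out)
        # Step 3.3: output error, residuals of each hidden layer, gradient descent
        delta_out = (probs - labels) / len(x)
        delta = (delta_out @ W_out.T) * activations[-1] * (1.0 - activations[-1])
        W_out -= alpha * activations[-1].T @ delta_out      # formula (28)
        b_out -= alpha * delta_out.sum(axis=0)              # formula (29)
        for idx in range(len(params) - 1, -1, -1):
            W, a, b = params[idx]
            grad_W = activations[idx].T @ delta
            grad_b = delta.sum(axis=0)
            if idx > 0:
                delta = (delta @ W.T) * activations[idx] * (1.0 - activations[idx])
            params[idx] = (W - alpha * grad_W, a, b - alpha * grad_b)
    return params, W_out, b_out
```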
Step 4 is specifically:
Step 4.1: input the preprocessed test image set into the LSDBN network model fine-tuned in step 3 and extract the main features of the test images;
Step 4.2: input the main features of the test images into the Softmax classifier, which outputs the probability of belonging to each class; for a k-class problem, the output of the Softmax classifier is:
p(y(i) = k | x(i); θ) = exp(θk^T x(i)) / Σl exp(θl^T x(i))
where x(i) is the i-th test image, y(i) denotes the label of the i-th test image, p(y(i) = k | x(i); θ) denotes the probability that the i-th test image belongs to the k-th class, and the denominator Σl exp(θl^T x(i)) normalizes the probability distribution so that all probabilities sum to 1;
Step 4.3: according to the probability with which a test image belongs to a certain class and the class label of that image, compute the classification accuracy and finally output the classification result of the image. A minimal sketch of this testing stage is given below.
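A minimal sketch of step 4 follows: the fine-tuned network extracts the features of the test set, the Softmax layer turns them into class probabilities, and accuracy is computed against the ground-truth labels. The parameter layout follows the earlier sketches and is an assumption.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def classify(params, W_out, b_out, x_test, y_test):
    """Propagate test images through the fine-tuned LSDBN and report accuracy."""
    features = x_test
    for W, _, b in params:                      # step 4.1: extract main features
        features = sigmoid(features @ W + b)
    probs = softmax(features @ W_out + b_out)   # step 4.2: class probabilities
    predictions = probs.argmax(axis=1)          # step 4.3: predicted class labels
    accuracy = float((predictions == y_test).mean())
    return predictions, accuracy
```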
Beneficial effects of the present invention: to give the model greater explanatory and discriminative power, first, a penalty regular term is introduced into the likelihood function in the unsupervised stage and the objective function is maximized during CD training; the sparse distribution of the training set obtained through the sparse constraint allows unlabeled data to be used for learning useful low-level feature representations, and conforms to the energy-minimum economy principle of biological evolution. Second, a sparse deep belief network based on the Laplace function is proposed: the Laplace distribution induces the sparse state of the hidden layer, while the scale parameter of the distribution can be used to control the strength of sparsity, so that sparser representations can be learned and stronger feature extraction ability is obtained.
Description of the drawings
Fig. 1 is the flow chart of the image classification method of the LSDBN model in the present invention.
Fig. 2 shows the LS-RBM classification accuracy results on the Pendigits data set in the present invention.
Fig. 3 shows the LSDBN classification accuracy results on the Pendigits data set in the present invention.
Specific implementation mode
The specific embodiments of the present invention are further described below in conjunction with the drawings and the technical solution.
Embodiment 1:
As shown in Fig. 1, a sparse depth confidence network image classification method based on Laplace function constraints comprises the following specific steps:
Step 1: choose a suitable training image data set and perform image preprocessing on it to obtain the training data set.
Since image classification focuses on the feature extraction process, the colour image is converted into a grayscale image by binarization and the gray values are normalized to [0, 1], so that feature extraction only needs to be performed on a two-dimensional gray-level matrix. The normalization formula is:
x = (x̃ − xmin) / (xmax − xmin)   (1)
where x̃ is a feature value of the image data set, xmax and xmin are respectively the maximum and minimum values over all features of the image data set, and x is the normalized image data set.
Step 2: the preprocessed training data set is used for pre-training the LSDBN network model. The network is pre-trained on the input training data set; the unsupervised Contrastive Divergence (CD-k) algorithm is selected to train each LS-RBM layer individually from the bottom up, the output of the lower LS-RBM being used as the input of the higher LS-RBM; through iterative training the corresponding weights and biases are obtained, and finally the high-level features of the data. The process is as follows:
Step 2.1: build the LSDBN network model and set the parameter values of the LSDBN network structure: the number of visible-layer nodes is the feature dimensionality of the input data set, the number of hidden-layer nodes is configured according to the feature dimensionality of each data set, the number of hidden layers is set to 2, the number of iterations to 100 and the number of fine-tuning epochs to 100, so as to obtain a good convergence result.
Step 2.2: take the preprocessed training data set as the input of the first LS-RBM and train the LS-RBM with the CD algorithm.
(1) Since the RBM is an energy-based model, the relationship between the visible layer and the hidden layer can be expressed by the energy function:
E(v, h; θ) = −Σi ai vi − Σj bj hj − Σi Σj vi Wij hj   (2)
where θ denotes the parameters of the model, i.e. θ = {Wij, ai, bj}; Wij is the weight matrix between the visible layer and the hidden layer, ai is the bias of the visible node, bj is the bias of the hidden node, i indexes the features of the input image, i.e. the visible-layer nodes, of which there are n in total; j indexes the hidden-layer nodes, of which there are m in total; vi denotes the i-th visible node and hj denotes the j-th hidden node.
(2) From the energy function of formula (2), every configuration of the visible and hidden nodes has a corresponding energy value. When all parameters are determined, according to the definition of the energy function and the principles of statistical thermodynamics, the joint probability distribution of v and h in the RBM is defined as:
P(v, h; θ) = exp(−E(v, h; θ)) / Z(θ),  Z(θ) = Σv Σh exp(−E(v, h; θ))   (3)
where Z(θ) is obtained by summing over all possible configurations of the visible and hidden nodes. What the RBM model learns is exactly this joint probability distribution, i.e. the generation of objects.
Further, for a specific problem, what matters most is the probability distribution over the observed data v defined by the RBM, i.e. the marginal of P(v, h; θ). The probability that the model assigns to a visible vector (the training data) is obtained by summing over all possible hidden units:
P(v; θ) = Σh P(v, h; θ) = (1/Z(θ)) Σh exp(−E(v, h; θ))   (4)
Correspondingly, the marginal probability distribution of the hidden layer h is:
P(h; θ) = Σv P(v, h; θ) = (1/Z(θ)) Σv exp(−E(v, h; θ))   (5)
The conditional probability distributions of the visible layer and the hidden layer can be derived from Bayes' rule and the definition of the sigmoid function:
P(hj = 1 | v) = σ(Σi Wij vi + bj)   (7)
P(vi = 1 | h) = σ(Σj Wij hj + ai)   (8)
(3) For a given training sample, training the RBM model means finding the value of the parameter θ under which the RBM fits the training sample as well as possible. Therefore, the log-likelihood function of the RBM is maximized by the maximum-likelihood method to obtain the optimal value of θ, i.e.
θ* = argmaxθ Σl ln P(v(l) | θ)
Assume a training set {v(1), …, v(m)} is given. The unsupervised pre-training optimization model with the sparse penalty term is defined as follows:
F = Funsup + λFsparse   (10)
where Funsup denotes the likelihood function of the RBM, i.e. Σl ln P(v(l)) with P(v) given by formulas (4)-(5), λ is the sparse penalty parameter, and Fsparse denotes an arbitrary sparse regularization function.
From the standpoint of statistical theory, the purpose of a sparse RBM is mainly to make most hidden-layer nodes zero; in other words, the activation probabilities of the hidden-layer units should be close to zero. If the hidden-layer units are sparse (most of them inactive most of the time), then the features of a hidden unit are used to represent only a small portion of the training data.
Defining a sparse regular term can reduce the average activation probability over the training data, ensuring that the "firing rate" of the model neurons (corresponding to the random variables hj) is kept at a rather low level, so that neuron activation is sparse. This requires the distribution of the hidden-unit activation probabilities to have a spiked, heavy-tailed shape.
Based on the above analysis and inspired by compressed sensing theory, the present invention uses a Laplace function penalty to induce the sparse state of the hidden-layer units. This function has a heavy-tailed shape and behaves differently according to the deviation of the hidden-unit activation probability from the fixed value p. In addition, it has a scale parameter that can control the degree of sparsity.
In probability and mathematical statistics, the Laplace distribution is a continuous distribution which, compared with the normal distribution, has flatter (heavier) tails. In the process of solving for a sparse solution, most features gradually move toward the two sides of the function, i.e. closer to zero. A small number of hidden units have higher activation probabilities and will be activated; after training, these activated hidden units can represent the main features of the data, making it easier for the model to achieve a sparse feature representation.
After adding the sparse regularization term, the objective function optimization formula is:
F = Funsup + λFsparse
The Laplace sparse penalty term is defined as follows:
Fsparse = Σj log L(qj, p, u),  L(qj, μ, b) = (1/(2b)) exp(−|qj − μ| / b)
In the formula, L(qj, μ, b) is the Laplace probability density function and qj denotes the average of the conditional expectation of the j-th hidden-layer unit given the data; p is a constant that controls the degree of sparsity of the n hidden units hj, and by defining the value of this parameter the objective function optimization makes the average activation probability of the hidden-layer nodes as close to p as possible; u denotes the scale parameter, and changing its value controls the strength of sparsity. Here qj is expressed as follows:
qj = (1/m) Σl E[hj(l) | v(l)] = (1/m) Σl g(Σi Wij vi(l) + bj)
In the formula, E(·) is the conditional expectation of the j-th hidden-layer node given the data, m is the number of training data, g(Σi Wij vi + bj) is the activation probability of the hidden unit hj given the visible layer v, and g is the sigmoid function. An illustrative sketch of this penalty is given after this paragraph.
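As an illustration, the sketch below evaluates this Laplace penalty for a batch of data; the log-density form used here (location p, scale u) follows the reconstruction above and should be read as an assumption rather than the patent's verbatim formula.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def laplace_sparse_penalty(v, W, b, p=0.05, u=0.2):
    """Assumed Laplace sparsity penalty Fsparse = sum_j log L(qj; p, u).

    qj is the activation probability of hidden unit j averaged over the batch,
    and L(x; mu, scale) = exp(-|x - mu| / scale) / (2 * scale).
    """
    q = sigmoid(v @ W + b).mean(axis=0)        # qj: mean conditional expectation over the data
    log_density = -np.log(2.0 * u) - np.abs(q - p) / u
    return log_density.sum()
```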
Further, the objective function for training the LS-RBM is as follows:
maxθ { Σl ln P(v(l)) + λFsparse }
In the above formula, the first term is the log-likelihood term and the second term is the sparse penalty term, where λ is the parameter of that term, indicating its relative importance in the objective function with respect to the data distribution. Therefore, while the log-likelihood function is maximized, the sparse regular term must also be maximized.
(4) The bias of the hidden layer directly affects the sparsity of the hidden units, so only the weight matrix and the hidden-layer biases are updated. The gradient of the sparse regularization term is computed by differentiating Fsparse with respect to the parameters.
In that gradient, the derivative of the first term follows directly from the Laplace density.
The second term is expanded as follows:
where σj = Σi Wij vi + bj denotes the input of hidden unit j; each term of the expansion is then differentiated.
The derivative with respect to the hidden-layer bias is obtained accordingly.
The differentiated values are substituted into the parameter update rules to obtain the new parameter values:
a := m·a + α(v1 − v2)   (26)
(5) After the parameter values are updated, the network continues to be trained with the new values; by continuously optimizing the objective function, the activation probability of the hidden-layer nodes gradually approaches the given fixed value p, and the model learns a set of weight parameters and corresponding biases. Through these suitable parameters, sparse feature vectors can be found, redundant features are controlled, and the weights are learned jointly from the main features to reconstruct the input data, which improves the robustness of the algorithm to noise. The first LS-RBM and the corresponding parameter values have thus been trained.
Step 2.3: with W(1) and b(1) trained by the first LS-RBM, use P(h(1) | v(1), W(1), b(1)) to obtain the input features of the second LS-RBM, and continue training the second LS-RBM with the algorithm of step 2.2;
Step 2.4: proceed recursively according to the above steps until the (L-1)-th layer has been trained; the parameters trained by the lower LS-RBM and its output serve as the "data" of the next, higher model in training, i.e. the next LS-RBM. After multiple loop iterations, a deep sparse DBN model, i.e. the LSDBN model, is learned;
Step 2.5: initialize W(L) and b(L) of the L-th layer, and use {W(1), W(2), …, W(L)} and {b(1), b(2), …, b(L)} to form a deep neural network with L layers; the output layer is the labeled data, with the Softmax classifier as the output layer;
Step 3: further optimize the LSDBN. The parameter values obtained in the pre-training stage are used as the initial values of the fine-tuning stage, and the entire LSDBN network is fine-tuned. The present invention fine-tunes the entire network with the top-down supervised learning algorithm, i.e. the back-propagation algorithm; the training samples and test data are input for training, and the top-down back-propagation of the error optimizes the network. The main steps are as follows:
Step 3.1: take the network parameters trained in the pre-training stage as the initial values of the fine-tuning stage, and input the training samples into the fine-tuning-stage network for optimization.
Step 3.2: compute the activation value of every layer of neurons with the forward-propagation algorithm.
Step 3.3: compute the error between the forward-propagated output of the training image set and the corresponding labels of the image set, back-propagate the error, and compute the residual of each hidden-layer unit to indicate its contribution to the error; use the residual of each unit to compute the partial derivatives, and at each iteration update the weight matrix and biases with gradient descent according to formulas (28) and (29) until the maximum number of iterations is reached, obtaining the fine-tuned LSDBN network model;
W := W − α·∂J(W, b)/∂W   (28)
b := b − α·∂J(W, b)/∂b   (29)
where α is the learning rate and J(W, b) is the cost function of the network, obtained by computing the error between the actual output values and the expected values. The iterative steps of the gradient descent method are applied repeatedly to reduce the value of J(W, b).
Step 4: the test image data set is input into the LSDBN network model trained in step 3, and recognition testing is performed with the Softmax classifier to output the image classification result. To perform image classification, a classifier is used at the top layer of the network and the test data are input for classification testing. To keep the network widely applicable, the present invention classifies with the Softmax classifier, whose output for a k-class problem is:
p(y(i) = k | x(i); θ) = exp(θk^T x(i)) / Σl exp(θl^T x(i))
where θ is a parameter matrix containing the weights and biases, each row of which is regarded as the classifier parameters corresponding to one class, with k rows in total; the denominator normalizes the probability distribution so that all probabilities sum to 1. Therefore, the cost function J(θ) of the fine-tuning stage is:
J(θ) = −(1/m) Σi Σk 1{y(i) = k} log p(y(i) = k | x(i); θ)
where 1{·} is the indicator function, i.e. 1{a true expression} = 1 and 1{a false expression} = 0.
The method provided by the present invention is tested below on the MNIST handwritten digit database and the Pendigits handwriting recognition data set.
Embodiment 2: experiments on the MNIST handwritten digit database
The MNIST handwritten digit data set contains 60,000 training samples and 10,000 test samples; each image is 28*28 pixels. To facilitate the extraction of image features, the present invention extracts different numbers of images of each class from the 60,000 training data for experimental analysis. The model contains 784 visible-layer nodes and 500 hidden-layer nodes, the learning rate is set to 1, the mini-batch size is 100, the maximum number of iterations is 100, and the model is trained with the CD algorithm with step length 1.
Table 1 shows the sparsity measurement results on the MNIST data set in the present invention, compared with the other two sparse models, according to the sparsity measure adopted in this work.
For a sparse model, higher sparsity means higher algorithm stability and stronger robustness. As can be seen from Table 1, compared with SP-RBM and SR-RBM, the sparsity value of LS-RBM is higher, so it can learn a sparser representation.
Table 1. Sparsity measurement results on the MNIST data set
Table 2 shows the classification accuracy results of LS-RBM on the MNIST data set for different numbers of samples per class in the present invention, compared with the artificial neural network (ANN), autoencoder (AE), restricted Boltzmann machine (RBM), SR-RBM and SP-RBM methods. As can be seen from Table 2, for each number of samples per class of the MNIST data set, the LS-RBM method of the present invention always achieves the best recognition accuracy. In particular, with 1000 samples per class it reaches a recognition rate of 96.8%, 3 percentage points higher than the second-best SR-RBM algorithm, which also shows that the method of the present invention has better feature extraction ability.
Table 2. LS-RBM classification accuracy results on the MNIST data set
In order to learn deeper features, Table 3 presents the classification accuracy results of the LSDBN of the present invention on the MNIST data set for different numbers of samples per class, compared with the DBN, SP-DBN and SR-DBN formed from the DBN, SP-RBM and SR-RBM. For deeper sparse models, each additional layer extracts more abstract features, but some redundant features also remain and affect the final classification. From Table 3 it can be seen that the LSDBN learns features that are easier to discriminate, improving on the second-best SP-DBN by up to 3 percentage points, which shows that the LSDBN is more robust to the interference of redundant features.
Table 3. LSDBN classification accuracy results on the MNIST data set
Embodiment 3: experiments on the Pendigits handwriting recognition data set
The Pen-Based Recognition of Handwritten Digits (PenDigits) data set contains 10,992 data samples divided into 10 classes, with 7,494 training samples and 3,298 test samples, and 16 features per sample; images with different numbers per class are likewise analysed. The number of visible-layer nodes is set to 16, the number of hidden-layer nodes to 10, the learning rate to 1, the mini-batch size to 100 and the maximum number of iterations to 1000.
Fig. 2 shows the classification accuracy results of LS-RBM in the present invention on the Pendigits handwriting recognition data set for different numbers of samples per class. It can be seen that, for most algorithms, the classification accuracy increases as the number of samples per class increases. Even when there are only a few dozen data samples per class on the PenDigits data set, the LS-RBM algorithm still achieves the best classification accuracy; the feature representation learned by the present invention is more discriminative than that of SP-RBM and SR-RBM.
In order to learn deeper features, on the basis of the RBM, SP-RBM, SR-RBM and LS-RBM experiments, the activation probabilities of the hidden-layer units of each RBM are used to train a second RBM, i.e. DBN, SP-DBN, SR-DBN and LS-DBN; the number of hidden units is still set to 10 and the number of iterations to 1000. The classification accuracy of each model is tested on the PenDigits test set, with results shown in Fig. 3. It can be observed that, with the sparsity constraint, the present invention still obtains the main features when there are few samples and improves classification accuracy by 2.7% to 6% over the DBN model, demonstrating the applicability of the proposed algorithm to low-dimensional data sets.

Claims (8)

1. A sparse depth confidence network image classification method based on Laplace function constraints, characterized by comprising the following steps:
Step 1: choose a training image data set and perform image preprocessing to obtain the training data set;
Step 2: input the training data set preprocessed in step 1 into the LSDBN network model; use the contrastive divergence algorithm to train each layer of LS-RBM network individually, without supervision and from the bottom up, the output of the lower LS-RBM network serving as the input of the adjacent upper LS-RBM network; obtain the parameter values of each LS-RBM network through iterative training, and finally obtain the high-level features of the input image data; the parameter values are the weights and biases;
Step 3: use the parameter values obtained in step 2 as the initial values of the fine-tuning stage and fine-tune the entire LSDBN network with the top-down back-propagation algorithm to obtain the LSDBN network model;
Step 4: input the test image data set into the LSDBN network model obtained in step 3 and perform recognition testing with the Softmax classifier, finally outputting the image classification results.
2. The method according to claim 1, characterized in that step 1 is specifically: convert the colour image into a grayscale image by binarization, and normalize the gray values of the grayscale image to [0, 1] to obtain the training data set; the normalization formula is:
x = (x̃ − xmin) / (xmax − xmin)
where x̃ is a feature value of the image data set, xmax and xmin are respectively the maximum and minimum values over all features of the image data set, and x is the normalized image data set.
3. The method according to claim 1 or 2, characterized in that step 2 is specifically:
Step 2.1: build the LSDBN network model and set the parameter values of the LSDBN network structure: the number of visible-layer nodes, the number of hidden-layer nodes, the number of hidden layers, the number of iterations and the number of fine-tuning epochs; the number of visible-layer nodes is the feature dimensionality of the input image set, and the number of hidden-layer nodes is determined according to the feature dimensionality of the input image set;
Step 2.2: take the preprocessed training data set x as the input of the first LS-RBM and train the LS-RBM with the CD algorithm;
(1) the relationship between the visible layer and the hidden layer is expressed by the energy function:
E(v, h; θ) = −Σi ai vi − Σj bj hj − Σi Σj vi Wij hj   (2)
where θ denotes the parameters of the model, i.e. θ = {Wij, ai, bj}; Wij is the weight matrix between the visible layer and the hidden layer, ai is the bias of the visible node, bj is the bias of the hidden node, i indexes the features of the input image, i.e. the visible-layer nodes, of which there are n in total; j indexes the hidden-layer nodes, of which there are m in total; vi denotes the i-th visible node and hj denotes the j-th hidden node;
(2) based on the energy function of formula (2), the joint probability distribution of v and h in the RBM is:
P(v, h; θ) = exp(−E(v, h; θ)) / Z(θ),  Z(θ) = Σv Σh exp(−E(v, h; θ))   (3)
where Z(θ) is the sum over all possible configurations of the visible and hidden nodes, ai is the bias of the visible node and bj is the bias of the hidden node;
using Bayes' rule, the marginal probability distributions of the visible-layer units v and the hidden-layer units h are obtained from the joint distribution of formula (3):
P(v; θ) = Σh P(v, h; θ)   (4)
P(h; θ) = Σv P(v, h; θ)   (5)
using Bayes' rule and the definition of the sigmoid activation function, the conditional probability distributions of the visible-layer units v and the hidden-layer units h are derived:
P(hj = 1 | v) = σ(Σi Wij vi + bj)   (7)
P(vi = 1 | h) = σ(Σj Wij hj + ai)   (8)
where σ(·) is the sigmoid activation function, i.e. the nonlinear mapping function of the neuron;
using formulas (7) and (8), an approximate reconstruction P(v; θ) of the training image is obtained by one-step Gibbs sampling with the contrastive divergence algorithm;
(3) P(v; θ) is solved with the maximum-likelihood method to obtain the optimal value of θ; the likelihood function of the LS-RBM is:
Funsup = Σl ln P(v(l); θ)
and the optimal value of the parameters is:
θ* = argmaxθ Σl ln P(v(l); θ)
after adding the sparse penalty term, the objective function optimized in LS-RBM pre-training is:
F = Funsup + λFsparse   (11)
where λ is the sparsity parameter used to adjust the relative importance of Fsparse, and Fsparse denotes the sparse regularization function, given by:
Fsparse = Σj log L(qj, p, u),  L(qj, μ, b) = (1/(2b)) exp(−|qj − μ| / b)
where L(qj, μ, b) is the Laplace probability density function, qj denotes the average of the conditional expectation of the j-th hidden-layer unit given the data, p is a constant that controls the degree of sparsity of the n hidden units hj, and u denotes the scale parameter; the expression of qj is as follows:
qj = (1/m) Σl E[hj(l) | v(l)] = (1/m) Σl g(Σi Wij vi(l) + bj)
where E(·) is the conditional expectation of the j-th hidden-layer unit given the data, l indexes the training images, m is the number of images in the training data set, hj(l) is the j-th hidden unit corresponding to the l-th image, v(l) is the visible-layer unit corresponding to the l-th image, g(Σi Wij vi(l) + bj) is the activation probability of hidden unit hj given the visible layer v, and g is the sigmoid function;
after the sparse regularization term is added, the purpose of training the LS-RBM is to solve for the optimal value of the objective function in formula (11):
θ* = argmaxθ { Σl ln P(v(l)) + λFsparse }
where P(v(l)) is the LS-RBM likelihood function to be optimized, i.e. the distribution P(v; θ) of the visible layer v;
(4) the weight matrix and the hidden-layer biases are updated by differentiating the LS-RBM objective function and applying gradient descent, the derivatives being those of the log-likelihood term plus λ times the derivative of the sparse penalty term;
the differentiated values are substituted into the update rule of the parameter θ to obtain the new parameter values:
a(1) := m·a + α(v1 − v2)   (26)
(5) the network continues to be trained with the new parameter values; by continuously optimizing the objective function, the activation probability of the hidden-layer units gradually approaches the given fixed value p, and a set of weight parameters and corresponding biases is learned; through these suitable parameters, sparse feature vectors are found, the redundant features present in the image are controlled, and the weights are learned jointly from the main features of the image to reconstruct the input data, completing the training of the first LS-RBM and the update of the corresponding parameter value θ;
Step 2.3: with W(1) and b(1) trained by the first LS-RBM, use P(h(1) | v(1), W(1), b(1)) to obtain the input features of the second LS-RBM, and continue training the second LS-RBM with the algorithm of step 2.2;
Step 2.4: train recursively according to the above steps until the (L-1)-th layer has been trained; after multiple loop iterations, a deep sparse DBN model, i.e. the LSDBN network model, is obtained;
Step 2.5: initialize W(L) and b(L) of the L-th layer, and use {W(1), W(2), …, W(L)} and {b(1), b(2), …, b(L)} to form a deep neural network with L layers; the output layer is the label data of the training image set, and the Softmax classifier is used as the output layer.
4. The method according to claim 1 or 2, characterized in that step 3 is specifically:
Step 3.1: use the parameter values θ trained in step 2 as the initial values of the fine-tuning-stage parameters and input them into the LSDBN network model of the fine-tuning stage;
Step 3.2: compute the activation value of each hidden-layer unit with the forward-propagation algorithm;
Step 3.3: compute the error between the forward-propagated output of the training image set and the corresponding labels of the image set, back-propagate the error, and compute the residual of each hidden-layer unit to indicate its contribution to the error; use the residual of each unit to compute the partial derivatives, and at each iteration update the weight matrix and biases with gradient descent according to formulas (28) and (29) until the maximum number of iterations is reached, obtaining the fine-tuned LSDBN network model;
W := W − α·∂J(W, b)/∂W   (28)
b := b − α·∂J(W, b)/∂b   (29)
where α is the learning rate and J(W, b) is the cost function obtained by computing the error between the actual model output values and the corresponding label values.
5. The method according to claim 3, characterized in that step 3 is specifically:
Step 3.1: use the parameter values θ trained in the pre-training stage as the initial values of the fine-tuning-stage parameters and input them into the LSDBN network model of the fine-tuning stage;
Step 3.2: compute the activation value of each hidden-layer unit with the forward-propagation algorithm;
Step 3.3: compute the error between the forward-propagated output of the training image set and the corresponding labels of the image set, back-propagate the error, and compute the residual of each hidden-layer unit to indicate its contribution to the error; use the residual of each unit to compute the partial derivatives, and at each iteration update the weight matrix and biases with gradient descent according to formulas (28) and (29) until the maximum number of iterations is reached, obtaining the fine-tuned LSDBN network model;
W := W − α·∂J(W, b)/∂W   (28)
b := b − α·∂J(W, b)/∂b   (29)
where α is the learning rate and J(W, b) is the cost function obtained by computing the error between the actual model output values and the corresponding label values.
6. The method according to claim 1, 2 or 5, characterized in that step 4 is specifically:
Step 4.1: input the preprocessed test image set into the LSDBN network model fine-tuned in step 3 and extract the main features of the test images;
Step 4.2: input the main features of the test images into the Softmax classifier, which outputs the probability of belonging to each class; for a k-class problem, the output of the Softmax classifier is:
p(y(i) = k | x(i); θ) = exp(θk^T x(i)) / Σl exp(θl^T x(i))
where x(i) is the i-th test image, y(i) denotes the label of the i-th test image, p(y(i) = k | x(i); θ) denotes the probability that the i-th test image belongs to the k-th class, and the denominator Σl exp(θl^T x(i)) normalizes the probability distribution so that all probabilities sum to 1;
Step 4.3: according to the probability with which the test image belongs to a certain class and the class label of that image, compute the classification accuracy and finally output the classification result of the image.
7. The method according to claim 3, characterized in that step 4 is specifically:
Step 4.1: input the preprocessed test image set into the LSDBN network model fine-tuned in step 3 and extract the main features of the test images;
Step 4.2: input the main features of the test images into the Softmax classifier, which outputs the probability of belonging to each class; for a k-class problem, the output of the Softmax classifier is:
p(y(i) = k | x(i); θ) = exp(θk^T x(i)) / Σl exp(θl^T x(i))
where x(i) is the i-th test image, y(i) denotes the label of the i-th test image, p(y(i) = k | x(i); θ) denotes the probability that the i-th test image belongs to the k-th class, and the denominator Σl exp(θl^T x(i)) normalizes the probability distribution so that all probabilities sum to 1;
Step 4.3: according to the probability with which the test image belongs to a certain class and the class label of that image, compute the classification accuracy and finally output the classification result of the image.
8. The method according to claim 4, characterized in that step 4 is specifically:
Step 4.1: input the preprocessed test image set into the LSDBN network model fine-tuned in step 3 and extract the main features of the test images;
Step 4.2: input the main features of the test images into the Softmax classifier, which outputs the probability of belonging to each class; for a k-class problem, the output of the Softmax classifier is:
p(y(i) = k | x(i); θ) = exp(θk^T x(i)) / Σl exp(θl^T x(i))
where x(i) is the i-th test image, y(i) denotes the label of the i-th test image, p(y(i) = k | x(i); θ) denotes the probability that the i-th test image belongs to the k-th class, and the denominator Σl exp(θl^T x(i)) normalizes the probability distribution so that all probabilities sum to 1;
Step 4.3: according to the probability with which the test image belongs to a certain class and the class label of that image, compute the classification accuracy and finally output the classification result of the image.
CN201810417793.5A 2018-05-04 2018-05-04 Sparse depth confidence network image classification method based on Laplace function constraint Active CN108805167B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810417793.5A CN108805167B (en) 2018-05-04 2018-05-04 Sparse depth confidence network image classification method based on Laplace function constraint

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810417793.5A CN108805167B (en) 2018-05-04 2018-05-04 Sparse depth confidence network image classification method based on Laplace function constraint

Publications (2)

Publication Number Publication Date
CN108805167A true CN108805167A (en) 2018-11-13
CN108805167B CN108805167B (en) 2022-05-13

Family

ID=64093241

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810417793.5A Active CN108805167B (en) 2018-05-04 2018-05-04 Sparse depth confidence network image classification method based on Laplace function constraint

Country Status (1)

Country Link
CN (1) CN108805167B (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109246608A (en) * 2018-11-16 2019-01-18 重庆小富农康农业科技服务有限公司 A kind of point-to-point localization method in interior based on WIFI location fingerprint big data analysis
CN109635931A (en) * 2018-12-14 2019-04-16 吉林大学 A kind of equipment running status evaluation method based on depth conviction net
CN110147834A (en) * 2019-05-10 2019-08-20 上海理工大学 Fine granularity image classification method based on rarefaction bilinearity convolutional neural networks
CN110188692A (en) * 2019-05-30 2019-08-30 南通大学 A kind of reinforcing that effective target quickly identifies circulation Cascading Methods
CN110209813A (en) * 2019-05-14 2019-09-06 天津大学 A kind of incident detection and prediction technique based on autocoder
CN110543918A (en) * 2019-09-09 2019-12-06 西北大学 Sparse data processing method based on regularization and data augmentation
CN111368686A (en) * 2020-02-27 2020-07-03 西安交通大学 Electroencephalogram emotion classification method based on deep learning
CN112188210A (en) * 2020-09-27 2021-01-05 铜仁学院 DVC side information solving method adopting deep belief network
CN112286996A (en) * 2020-11-23 2021-01-29 天津大学 Node embedding method based on network link and node attribute information
CN113095381A (en) * 2021-03-29 2021-07-09 西安交通大学 Underwater sound target identification method and system based on improved DBN
CN113313175A (en) * 2021-05-28 2021-08-27 北京大学 Image classification method of sparse regularization neural network based on multivariate activation function
CN115049814A (en) * 2022-08-15 2022-09-13 聊城市飓风工业设计有限公司 Intelligent eye protection lamp adjusting method adopting neural network model

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104077595A (en) * 2014-06-15 2014-10-01 北京工业大学 Deep belief network image recognition method based on Bayesian regularization
CN104732249A (en) * 2015-03-25 2015-06-24 武汉大学 Deep learning image classification method based on popular learning and chaotic particle swarms
CN106067042A (en) * 2016-06-13 2016-11-02 西安电子科技大学 Polarization SAR sorting technique based on semi-supervised degree of depth sparseness filtering network
CN107528824A (en) * 2017-07-03 2017-12-29 中山大学 A kind of depth belief network intrusion detection method based on two-dimensionses rarefaction

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104077595A (en) * 2014-06-15 2014-10-01 北京工业大学 Deep belief network image recognition method based on Bayesian regularization
CN104732249A (en) * 2015-03-25 2015-06-24 武汉大学 Deep learning image classification method based on popular learning and chaotic particle swarms
CN106067042A (en) * 2016-06-13 2016-11-02 西安电子科技大学 Polarization SAR sorting technique based on semi-supervised degree of depth sparseness filtering network
CN107528824A (en) * 2017-07-03 2017-12-29 中山大学 A kind of depth belief network intrusion detection method based on two-dimensionses rarefaction

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
张艳霞: "Deep learning models based on restricted Boltzmann machines and their applications", China Master's Theses Full-text Database, Information Science and Technology *
房雪键: "Research on image classification algorithms based on deep learning", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109246608A (en) * 2018-11-16 2019-01-18 重庆小富农康农业科技服务有限公司 A kind of point-to-point localization method in interior based on WIFI location fingerprint big data analysis
CN109635931A (en) * 2018-12-14 2019-04-16 吉林大学 A kind of equipment running status evaluation method based on depth conviction net
CN110147834A (en) * 2019-05-10 2019-08-20 上海理工大学 Fine granularity image classification method based on rarefaction bilinearity convolutional neural networks
CN110209813A (en) * 2019-05-14 2019-09-06 天津大学 A kind of incident detection and prediction technique based on autocoder
CN110188692A (en) * 2019-05-30 2019-08-30 南通大学 A kind of reinforcing that effective target quickly identifies circulation Cascading Methods
CN110188692B (en) * 2019-05-30 2023-06-06 南通大学 Enhanced cyclic cascading method for effective target rapid identification
CN110543918A (en) * 2019-09-09 2019-12-06 西北大学 Sparse data processing method based on regularization and data augmentation
CN110543918B (en) * 2019-09-09 2023-03-24 西北大学 Sparse data processing method based on regularization and data augmentation
CN111368686B (en) * 2020-02-27 2022-10-25 西安交通大学 Electroencephalogram emotion classification method based on deep learning
CN111368686A (en) * 2020-02-27 2020-07-03 西安交通大学 Electroencephalogram emotion classification method based on deep learning
CN112188210A (en) * 2020-09-27 2021-01-05 铜仁学院 DVC side information solving method adopting deep belief network
CN112286996A (en) * 2020-11-23 2021-01-29 天津大学 Node embedding method based on network link and node attribute information
CN113095381A (en) * 2021-03-29 2021-07-09 西安交通大学 Underwater sound target identification method and system based on improved DBN
CN113095381B (en) * 2021-03-29 2024-04-05 西安交通大学 Underwater sound target identification method and system based on improved DBN
CN113313175A (en) * 2021-05-28 2021-08-27 北京大学 Image classification method of sparse regularization neural network based on multivariate activation function
CN113313175B (en) * 2021-05-28 2024-02-27 北京大学 Image classification method of sparse regularized neural network based on multi-element activation function
CN115049814A (en) * 2022-08-15 2022-09-13 聊城市飓风工业设计有限公司 Intelligent eye protection lamp adjusting method adopting neural network model

Also Published As

Publication number Publication date
CN108805167B (en) 2022-05-13

Similar Documents

Publication Publication Date Title
CN108805167A Laplace function constraint-based sparse depth confidence network image classification method
CN110689086B (en) Semi-supervised high-resolution remote sensing image scene classification method based on generating countermeasure network
El-Sawy et al. Arabic handwritten characters recognition using convolutional neural network
Huang et al. Deep embedding network for clustering
CN103955702B (en) SAR image terrain classification method based on depth RBF network
CN106503654A (en) A kind of face emotion identification method based on the sparse autoencoder network of depth
Hossain et al. Recognition of handwritten digit using convolutional neural network (CNN)
CN106709482A (en) Method for identifying genetic relationship of figures based on self-encoder
CN110348399A (en) EO-1 hyperion intelligent method for classifying based on prototype study mechanism and multidimensional residual error network
Murugan Implementation of deep convolutional neural network in multi-class categorical image classification
CN110287985B (en) Depth neural network image identification method based on variable topology structure with variation particle swarm optimization
CN106980830A (en) One kind is based on depth convolutional network from affiliation recognition methods and device
CN109034186A (en) The method for establishing DA-RBM sorter model
CN106980831A (en) Based on self-encoding encoder from affiliation recognition methods
CN110298434A (en) A kind of integrated deepness belief network based on fuzzy division and FUZZY WEIGHTED
Huang et al. Design and Application of Face Recognition Algorithm Based on Improved Backpropagation Neural Network.
CN114692732A (en) Method, system, device and storage medium for updating online label
De et al. Hybrid soft computing for multilevel image and data segmentation
Wang Research on handwritten note recognition in digital music classroom based on deep learning
CN110188621A (en) A kind of three-dimensional face expression recognition methods based on SSF-IL-CNN
CN104537660B (en) Image partition method based on Multiobjective Intelligent body evolution clustering algorithm
Chen et al. Reconstruction combined training for convolutional neural networks on character recognition
CN115310491A (en) Class-imbalance magnetic resonance whole brain data classification method based on deep learning
Hou et al. Remote sensing textual image classification based on extreme learning machine and hybrid rice optimization algorithm
Kanungo Analysis of Image Classification Deep Learning Algorithm

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant