CN108108806A - Convolutional neural network initialization method based on pre-training model filter extraction
Convolutional neural network initialization method based on pre-training model filter extraction
- Publication number
- CN108108806A CN108108806A CN201711335174.3A CN201711335174A CN108108806A CN 108108806 A CN108108806 A CN 108108806A CN 201711335174 A CN201711335174 A CN 201711335174A CN 108108806 A CN108108806 A CN 108108806A
- Authority
- CN
- China
- Prior art keywords
- filter
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Image Processing (AREA)
Abstract
The present invention provides a convolutional neural network initialization method based on pre-training model filter extraction, relating to the technical field of video processing. Using a minimum-entropy-loss criterion and a minimal-reconstruction-error criterion, filter parameters are extracted from a pre-trained model to initialize the target task network model, achieving high-quality initialization of the small- and medium-scale networks used in practical applications. Because the filter parameters are extracted from the pre-trained model by minimum entropy loss and minimal linear reconstruction, the invention does not require the target task network structure to match the pre-training network structure; the network structure can therefore be designed flexibly for the application, so that the target task meets the memory-overhead and computing-speed requirements of practical applications.
Description
Technical field
The present invention relates to the technical field of video processing, and in particular to a convolutional neural network initialization method.
Background technology
Deep convolutional neural networks (Convolutional Neural Network, CNN) learn multi-layer nonlinear network structures that approximate the target function through compositions of simple network-structure functions, and can thereby learn feature representations of sample data directly from the original data samples. Benefiting from massive amounts of data, deep convolutional neural networks have become one of the important breakthroughs of recent years in artificial intelligence and machine learning, achieving great success in image analysis, speech recognition, and natural language processing.
Because practical applications often provide only a small amount of data, a CNN model trained on them tends to overfit: its generalization ability is weak and its performance on the target task is poor. An effective remedy is to initialize the network model with a good strategy. The conventional approach initializes the filter parameters of a convolutional network by sampling from a Gaussian distribution. As network structures grow in breadth and depth, Gaussian initialization struggles to meet the requirements of complex architectures. To address this, Lee, Sermanet, and others studied supervised or unsupervised layer-by-layer training to initialize the convolutional layers of a CNN. Because these techniques require additional training time, and layer-wise training of convolutional layers suffers from local optima, such initialization methods have not been widely adopted in practice. Girshick et al. proposed initializing the target task network model from a pre-trained model. A model pre-trained on a large-scale dataset possesses a certain degree of feature-representation and generalization ability; when it is used to initialize the target task network model, the CNN can perform the target task well. However, initializing from a pre-trained model is limited in the following respects. First, using a pre-trained model requires the pre-training network structure to match the target task network structure, e.g. the number of filters per convolutional layer, the filter sizes, and the strides, so the network model cannot be designed flexibly for the target task. Second, because pre-trained networks are usually large in structure and scale, while practical applications impose strict memory-overhead and computing-speed requirements, a large-scale network model cannot be adapted to the target task of a practical application. Studying how to use a pre-trained network model to initialize the small- and medium-scale networks of concrete applications, while meeting their memory-overhead and computing-speed requirements, is therefore of great significance.
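The Gaussian-sampling baseline described above is straightforward to sketch. The filter layout (filters × channels × height × width) and the standard deviation 0.01 below are illustrative assumptions, not values taken from the document:

```python
import numpy as np

def gaussian_init(num_filters, channels, size, std=0.01, seed=0):
    """Conventional baseline: draw every filter weight i.i.d. from N(0, std^2)."""
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, std, size=(num_filters, channels, size, size))

# e.g. 64 filters of size 3x3 over 3 input channels
filters = gaussian_init(num_filters=64, channels=3, size=3)
```

This is the scheme the invention compares against; its weights carry no information about the data, which is why it degrades on deep or wide networks.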
Summary of the invention
To overcome the deficiencies of the prior art, the present invention provides a convolutional neural network initialization method based on pre-training model filter extraction. The object of the invention is to design such a method: using a minimum-entropy-loss criterion and a minimal-reconstruction-error criterion, filter parameters are extracted from the pre-trained model to initialize the target task network model, achieving high-quality initialization of the small- and medium-scale networks used in practical applications.
The technical solution adopted by the present invention to solve this problem comprises the following steps:

The first step: design a CNN network structure for the target task;

The second step: select a pre-training network model;

The third step: according to the target task network structure, extract the filter parameters of the pre-trained model using one of two methods, minimum entropy loss or minimal reconstruction error. Define F_s = {f_i}^M, where F_s denotes the M filters of the pre-trained model, each of size k_s × k_s; the target task network structure has N filters F_t, each of size k_t × k_t; the filters F_s and F_t are assumed to lie at the same convolutional-layer depth;
A) Filter-parameter extraction based on minimum entropy loss

The probability distribution of the pre-trained model's filters is described with a Gaussian mixture model (Gaussian Mixture Model, GMM); the Gaussian mixture distribution g(F_s) of the filter set F_s is expressed as:

$$g(F_s) = \{\pi_i^s,\; g_i(F_s)\}_{i=1}^{C} \qquad (1)$$

In formula (1), C is the number of Gaussian components in the mixture, g_i(F_s) is the i-th Gaussian component, and π_i^s is the prior probability of the i-th component. Let F_{s(i)} denote the filter set obtained by removing the i-th filter from F_s. Using the Kullback-Leibler (KL) divergence, the cross entropy (relative entropy) between the mixture distributions g(F_s) and g(F_{s(i)}) is expressed as:

$$D_{kl}\big(g(F_s)\,\big\|\,g(F_{s(i)})\big) = \sum_j \pi_j^s \log \frac{\sum_k \pi_k^s \exp\{-D_{kl}(g_i(F_s)\,\|\,g_k(F_s))\}}{\sum_l \pi_l^{s(i)} \exp\{-D_{kl}(g_i(F_s)\,\|\,g_l(F_{s(i)}))\}} \qquad (2)$$

where π_j^s and π_k^s are the prior probabilities of the j-th and k-th Gaussian components of F_s, π_l^{s(i)} is the prior probability of the l-th Gaussian component of F_{s(i)}, D_{kl}(g_i(F_s) || g_k(F_s)) is the KL divergence between the i-th and k-th Gaussian components of F_s, and D_{kl}(g_i(F_s) || g_l(F_{s(i)})) is the KL divergence between the i-th Gaussian component of F_s and the l-th Gaussian component of F_{s(i)};

Finally, the N filters with the largest cross entropy are extracted using formula (3):

$$F_t = \arg\max_{1,\dots,N} \sum_{i=1}^{N} D_{kl}\big(g(F_s)\,\big\|\,g(F_{s(i)})\big) \qquad (3)$$

Filters are extracted from F_s without replacement;
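As a concrete illustration of the minimum-entropy-loss extraction, the sketch below fits a GMM to the flattened pre-trained filters with scikit-learn and scores each filter by a Monte Carlo estimate of D_kl(g(F_s) || g(F_s(i))), refitting the mixture with that filter removed. The sampling-based KL estimate stands in for the closed-form variational bound of formula (2), and all sizes are hypothetical:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def select_filters_min_entropy(F_s, n_select, n_components=3, n_mc=2000, seed=0):
    """Score each pre-trained filter by how much its removal changes the GMM
    describing the filter distribution; keep the highest-scoring filters."""
    M = F_s.shape[0]
    flat = F_s.reshape(M, -1)
    full = GaussianMixture(n_components=n_components, covariance_type="diag",
                           random_state=seed).fit(flat)
    samples, _ = full.sample(n_mc)                  # draw from g(F_s)
    log_p = full.score_samples(samples)
    scores = np.empty(M)
    for i in range(M):
        reduced = GaussianMixture(n_components=n_components,
                                  covariance_type="diag",
                                  random_state=seed).fit(np.delete(flat, i, axis=0))
        # Monte Carlo estimate of D_kl(g(F_s) || g(F_s(i)))
        scores[i] = float(np.mean(log_p - reduced.score_samples(samples)))
    # keep the n_select filters whose removal distorts the distribution most
    return np.argsort(scores)[::-1][:n_select]

F_s = np.random.RandomState(1).randn(30, 3, 3)      # 30 hypothetical 3x3 filters
keep = select_filters_min_entropy(F_s, n_select=8)  # indices of retained filters
```

Selecting the top-N scores at once is one reading of the "without replacement" extraction; a stricter reading would rescore after each removal.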
B) Filter-parameter extraction based on minimal reconstruction error

In the minimal-reconstruction-error method, each filter f_i in the set F_s is expressed as a linear reconstruction from the filters f'_j of the set F_t, i.e. $f_i = \sum_{j}^{N} w_j^i f'_j$, where the weight factors w_j^i are scalars and the N factors w_j^i form a weight vector W_i. The reconstruction error over all filters of F_s is expressed as:

$$E = \sum_{i=1}^{M} \Big\| f_i - \sum_{j}^{N} w_j^i f'_j \Big\|_2^2 \qquad (4)$$

In formula (4), all the weight factors w_j^i form a weight matrix W, with W ∈ R^{M×N} and N ≤ M. Constraining the weight matrix W with the L1 norm, the filter-extraction problem is converted to solving the following optimization problem:

$$\min \sum_{i=1}^{M} \Big\| f_i - \sum_{j}^{N} w_j^i f'_j \Big\|_2^2 + \gamma\, |W|_1 \qquad (5)$$

where the regularization term |W|_1 is the L1 norm. Formula (5) is solved with the L1-regularized least squares (L1-regularized Least Squares) method; γ is a weight parameter that adjusts the proportion of the regularization term;
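A sketch of the minimal-reconstruction-error extraction under simplifying assumptions: formula (5) is solved with scikit-learn's Lasso (an L1-regularized least-squares solver), each pre-trained filter is reconstructed from the other filters used as the dictionary, and the filters carrying the largest aggregate weight are kept. γ = 0.4 follows the value reported in the embodiment; the self-dictionary and the alpha scaling are illustrative choices, since the document does not spell out the solver setup:

```python
import numpy as np
from sklearn.linear_model import Lasso

def select_filters_min_reconstruction(F_s, n_select, gamma=0.4):
    """Solve the L1-regularized reconstruction of formula (5) column by column
    and keep the candidates that participate most in the reconstructions."""
    M = F_s.shape[0]
    X = F_s.reshape(M, -1).T                 # columns are candidate filters
    d = X.shape[0]                           # flattened filter dimension
    W = np.zeros((M, M))
    for i in range(M):
        mask = np.arange(M) != i             # never reconstruct a filter from itself
        lasso = Lasso(alpha=gamma / d, fit_intercept=False, max_iter=10000)
        lasso.fit(X[:, mask], X[:, i])
        W[mask, i] = lasso.coef_
    usage = np.abs(W).sum(axis=1)            # total weight carried by each filter
    return np.argsort(usage)[::-1][:n_select]

F_s = np.random.RandomState(0).randn(20, 3, 3)            # 20 hypothetical 3x3 filters
keep = select_filters_min_reconstruction(F_s, n_select=6)  # indices of retained filters
```

The L1 penalty drives most weights to zero, so the surviving columns are the filters that a sparse linear code actually needs — the sparsity property the text relies on.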
The fourth step: according to the filter-size parameter k_t × k_t of the target task network structure, bilinear interpolation is applied to the extracted pre-trained filters of size k_s × k_s, so that the filters F_t of the target task network and the extracted pre-trained filters F_s have the same size and scale; assigning the parameters of F_s to the corresponding parameters of F_t then completes the initialization of the target task network.
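The resizing in this step can be sketched with scipy's first-order spline zoom standing in for the bilinear resampler; the 7×7 → 3×3 sizes below are only an example, not taken from the document:

```python
import numpy as np
from scipy.ndimage import zoom

def resize_filters_bilinear(F_s, k_t):
    """Resize extracted k_s x k_s pre-trained filters to the target size
    k_t x k_t, so their values can be assigned directly to the target
    network's filters. order=1 selects linear (bilinear in 2D) interpolation."""
    _, k_s, _ = F_s.shape
    scale = k_t / k_s
    return np.stack([zoom(f, scale, order=1) for f in F_s])

F_s = np.random.RandomState(0).randn(8, 7, 7)  # hypothetical extracted 7x7 filters
F_t = resize_filters_bilinear(F_s, k_t=3)      # target network uses 3x3 filters
```

The returned array has the target filter shape and serves as the initialization; in a real framework the values would be copied into the corresponding convolution layer's weight tensor.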
The beneficial effect of the present invention is that, by extracting filter parameters from a pre-trained model with the minimum-entropy-loss and minimal-linear-reconstruction methods and using them to initialize the target task network model, the invention does not require the target task network structure to match the pre-training network structure; the network structure can be designed flexibly for the application, so that the target task meets the memory-overhead and computing-speed requirements of practical applications.
Description of the drawings
Fig. 1 is the flow chart of the present invention.
Specific embodiment
The present invention is further described below with reference to the accompanying drawing and an embodiment.

In the method embodiment, the CIFAR10, CIFAR100, SVHN, and STL10 classification datasets serve as the target tasks, and GoogleNet, CaffeNet, and VGG16 trained on ImageNet serve as the pre-trained models. Filter parameters are extracted with the minimum-entropy-loss and minimal-reconstruction-error methods to initialize the target task network model, and the result is compared with Gaussian random initialization in terms of target-network classification performance (testing error) and training convergence rate (normalized AUC). The workflow of the method is shown in Fig. 1.
As shown in Fig. 1, the present invention comprises the following steps:

The first step: design a CNN network structure for the target task;

The present invention designs CNN network structures for the CIFAR10, CIFAR100, SVHN, and STL10 classification tasks; the target task network structures are shown in Table 1.

Table 1. CIFAR10/100, SVHN, and STL10 target task network structures

The second step: select a pre-training network model;

GoogleNet, CaffeNet, VGG16, and similar network models trained on ImageNet are selected as pre-trained models;

The third step: according to the target task network structure, extract the filter parameters of the pre-trained model using one of two methods, minimum entropy loss or minimal reconstruction error. Define F_s = {f_i}^M, where F_s denotes the M filters of the pre-trained model, each of size k_s × k_s; the target task network structure has N filters F_t, each of size k_t × k_t; the filters F_s and F_t are assumed to lie at the same convolutional-layer depth;
A) Filter-parameter extraction based on minimum entropy loss

The probability distribution of the pre-trained model's filters is described with a Gaussian mixture model (Gaussian Mixture Model, GMM); the Gaussian mixture distribution g(F_s) of the filter set F_s is expressed as:

$$g(F_s) = \{\pi_i^s,\; g_i(F_s)\}_{i=1}^{C} \qquad (1)$$

In formula (1), C is the number of Gaussian components in the mixture, g_i(F_s) is the i-th Gaussian component, and π_i^s is the prior probability of the i-th component. Let F_{s(i)} denote the filter set obtained by removing the i-th filter from F_s. Using the Kullback-Leibler (KL) divergence, the cross entropy (relative entropy) between the mixture distributions g(F_s) and g(F_{s(i)}) is expressed as:

$$D_{kl}\big(g(F_s)\,\big\|\,g(F_{s(i)})\big) = \sum_j \pi_j^s \log \frac{\sum_k \pi_k^s \exp\{-D_{kl}(g_i(F_s)\,\|\,g_k(F_s))\}}{\sum_l \pi_l^{s(i)} \exp\{-D_{kl}(g_i(F_s)\,\|\,g_l(F_{s(i)}))\}} \qquad (2)$$

where π_j^s and π_k^s are the prior probabilities of the j-th and k-th Gaussian components of F_s, π_l^{s(i)} is the prior probability of the l-th Gaussian component of F_{s(i)}, D_{kl}(g_i(F_s) || g_k(F_s)) is the KL divergence between the i-th and k-th Gaussian components of F_s, and D_{kl}(g_i(F_s) || g_l(F_{s(i)})) is the KL divergence between the i-th Gaussian component of F_s and the l-th Gaussian component of F_{s(i)};

Finally, the N filters with the largest cross entropy are extracted using formula (3):

$$F_t = \arg\max_{1,\dots,N} \sum_{i=1}^{N} D_{kl}\big(g(F_s)\,\big\|\,g(F_{s(i)})\big) \qquad (3)$$

Filters are extracted from F_s without replacement.

In formula (2), a smaller D_{kl}(g(F_s) || g(F_{s(i)})) indicates that the filter f_i contributes less to the probability distribution of F_s. The invention deletes the filters with smaller cross entropy, retains those with larger cross entropy, and uses the parameters of the retained filters to initialize the target task network model.
B) Filter-parameter extraction based on minimal reconstruction error

In the minimal-reconstruction-error method, each filter f_i in the set F_s is expressed as a linear reconstruction from the filters f'_j of the set F_t, i.e. $f_i = \sum_{j}^{N} w_j^i f'_j$, where the weight factors w_j^i are scalars and the N factors w_j^i form a weight vector W_i. The reconstruction error over all filters of F_s is expressed as:

$$E = \sum_{i=1}^{M} \Big\| f_i - \sum_{j}^{N} w_j^i f'_j \Big\|_2^2 \qquad (4)$$

In formula (4), all the weight factors w_j^i form a weight matrix W, with W ∈ R^{M×N}. Since the target network is usually smaller in structure and scale than the pre-training network, N ≤ M. To strengthen the sparsity of the weight matrix W, W is constrained with the L1 norm, and the filter-extraction problem is converted to solving the following optimization problem:

$$\min \sum_{i=1}^{M} \Big\| f_i - \sum_{j}^{N} w_j^i f'_j \Big\|_2^2 + \gamma\, |W|_1 \qquad (5)$$

where the regularization term |W|_1 is the L1 norm. Formula (5) is solved with the L1-regularized least squares (L1-regularized Least Squares) method; γ is a weight parameter that adjusts the proportion of the regularization term, and the present invention takes γ = 0.4;
The fourth step: according to the filter-size parameter k_t × k_t of the target task network structure, bilinear interpolation is applied to the extracted pre-trained filters of size k_s × k_s, so that the filters F_t of the target task network and the extracted pre-trained filters F_s have the same size and scale; assigning the parameters of F_s to the corresponding parameters of F_t then completes the initialization of the target task network.
The symbols MEL@G, MEL@C, and MEL@V denote initialization with filter parameters extracted by the minimum-entropy-loss method from the pre-trained models GoogleNet, CaffeNet, and VGG16, respectively; the symbols MRE@G, MRE@C, and MRE@V denote initialization with filter parameters extracted by the minimal-reconstruction-error method from the same three models. The classification performance (testing error) and convergence rate (normalized AUC) of the different initialization methods on the target tasks CIFAR10, CIFAR100, SVHN, and STL10 are shown in Table 2.

Table 2. Testing error and convergence rate of different initialization methods on the target tasks

As shown in Table 2, the method of the present invention, which initializes the target task network with filter parameters extracted from a pre-trained model by the minimum-entropy-loss and minimal-reconstruction-error methods, achieves the best testing error and converges faster during training than Gaussian random initialization. It should be pointed out that different pre-trained models perform differently on different target tasks, so the pre-trained model used to initialize the target task network must be selected according to the specific task.
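The document does not spell out how the normalized-AUC convergence metric is computed; one plausible reading, normalizing the area under the accuracy-vs-epoch curve by the area of a curve that sits at the peak accuracy from the start, can be sketched as:

```python
import numpy as np

def normalized_auc(accuracy_per_epoch):
    """Area under the accuracy-vs-epoch curve (trapezoidal rule), divided by
    the area of a curve at the peak accuracy from epoch 0. Higher values
    mean faster convergence. The normalization is an assumption."""
    acc = np.asarray(accuracy_per_epoch, dtype=float)
    auc = float(((acc[1:] + acc[:-1]) / 2.0).sum())   # trapezoidal area, dx = 1
    return auc / (float(acc.max()) * (len(acc) - 1))

fast = normalized_auc([0.5, 0.8, 0.9, 0.9])   # converges quickly
slow = normalized_auc([0.1, 0.3, 0.6, 0.9])   # converges slowly
```

Under this definition a run that reaches its plateau early scores close to 1, which matches how the metric is used in Table 2 to rank initializations.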
Claims (1)
1. A convolutional neural network initialization method based on pre-training model filter extraction, characterized by comprising the following steps:

The first step: design a CNN network structure for the target task;

The second step: select a pre-training network model;

The third step: according to the target task network structure, extract the filter parameters of the pre-trained model using one of two methods, minimum entropy loss or minimal reconstruction error; define F_s = {f_i}^M, where F_s denotes the M filters of the pre-trained model, each of size k_s × k_s; the target task network structure has N filters F_t, each of size k_t × k_t; the filters F_s and F_t are assumed to lie at the same convolutional-layer depth;
A) Filter-parameter extraction based on minimum entropy loss

The probability distribution of the pre-trained model's filters is described with a Gaussian mixture model; the Gaussian mixture distribution g(F_s) of the filter set F_s is expressed as:

$$g(F_s) = \{\pi_i^s,\; g_i(F_s)\}_{i=1}^{C} \qquad (1)$$
In formula (1), C is the number of Gaussian components in the mixture, g_i(F_s) is the i-th Gaussian component, and π_i^s is the prior probability of the i-th component. Let F_{s(i)} denote the filter set obtained by removing the i-th filter from F_s. Using the Kullback-Leibler (KL) divergence, the cross entropy between the mixture distributions g(F_s) and g(F_{s(i)}) is expressed as:
$$D_{kl}\big(g(F_s)\,\big\|\,g(F_{s(i)})\big) = \sum_j \pi_j^s \log \frac{\sum_k \pi_k^s \exp\{-D_{kl}(g_i(F_s)\,\|\,g_k(F_s))\}}{\sum_l \pi_l^{s(i)} \exp\{-D_{kl}(g_i(F_s)\,\|\,g_l(F_{s(i)}))\}} \qquad (2)$$
where π_j^s and π_k^s are the prior probabilities of the j-th and k-th Gaussian components of F_s, π_l^{s(i)} is the prior probability of the l-th Gaussian component of F_{s(i)}, D_{kl}(g_i(F_s) || g_k(F_s)) is the KL divergence between the i-th and k-th Gaussian components of F_s, and D_{kl}(g_i(F_s) || g_l(F_{s(i)})) is the KL divergence between the i-th Gaussian component of F_s and the l-th Gaussian component of F_{s(i)};

Finally, the N filters with the largest cross entropy are extracted using formula (3):
$$F_t = \arg\max_{1,\dots,N} \sum_{i=1}^{N} D_{kl}\big(g(F_s)\,\big\|\,g(F_{s(i)})\big) \qquad (3)$$
Filters are extracted from F_s without replacement;
B) Filter-parameter extraction based on minimal reconstruction error

In the minimal-reconstruction-error method, each filter f_i in the set F_s is expressed as a linear reconstruction from the filters f'_j of the set F_t, i.e. $f_i = \sum_{j}^{N} w_j^i f'_j$, where the weight factors w_j^i are scalars and the N factors w_j^i form a weight vector W_i. The reconstruction error over all filters of F_s is expressed as:
$$E = \sum_{i=1}^{M} \Big\| f_i - \sum_{j}^{N} w_j^i f'_j \Big\|_2^2 \qquad (4)$$
In formula (4), all the weight factors w_j^i form a weight matrix W, with W ∈ R^{M×N} and N ≤ M. Constraining the weight matrix W with the L1 norm, the filter-extraction problem is converted to solving the following optimization problem:
$$\min \sum_{i=1}^{M} \Big\| f_i - \sum_{j}^{N} w_j^i f'_j \Big\|_2^2 + \gamma\, |W|_1 \qquad (5)$$
where the regularization term |W|_1 is the L1 norm. Formula (5) is solved with the L1-regularized least squares method; γ is a weight parameter that adjusts the proportion of the regularization term;
The fourth step: according to the filter-size parameter k_t × k_t of the target task network structure, bilinear interpolation is applied to the extracted pre-trained filters of size k_s × k_s, so that the filters F_t of the target task network and the extracted pre-trained filters F_s have the same size and scale; assigning the parameters of F_s to the corresponding parameters of F_t then completes the initialization of the target task network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711335174.3A CN108108806A (en) | 2017-12-14 | 2017-12-14 | Convolutional neural networks initial method based on the extraction of pre-training model filter |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108108806A true CN108108806A (en) | 2018-06-01 |
Family
ID=62215927
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711335174.3A Pending CN108108806A (en) | 2017-12-14 | 2017-12-14 | Convolutional neural networks initial method based on the extraction of pre-training model filter |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108108806A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109472360A (en) * | 2018-10-30 | 2019-03-15 | 北京地平线机器人技术研发有限公司 | Update method, updating device and the electronic equipment of neural network |
US11328180B2 (en) | 2018-10-30 | 2022-05-10 | Beijing Horizon Robotics Technology Research And Development Co., Ltd. | Method for updating neural network and electronic device |
CN110766044A (en) * | 2019-09-11 | 2020-02-07 | 浙江大学 | Neural network training method based on Gaussian process prior guidance |
CN110766044B (en) * | 2019-09-11 | 2021-10-26 | 浙江大学 | Neural network training method based on Gaussian process prior guidance |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20180601 |