CN108898213A - Adaptive activation function parameter adjusting method for deep neural network - Google Patents

Adaptive activation function parameter adjusting method for deep neural network

Info

Publication number
CN108898213A
CN108898213A
Authority
CN
China
Prior art keywords
activation function
network
parameter
adaptive
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810631395.3A
Other languages
Chinese (zh)
Other versions
CN108898213B (en)
Inventor
胡海根
周莉莉
罗诚
陈胜勇
管秋
周乾伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT
Priority to CN201810631395.3A
Publication of CN108898213A
Application granted
Publication of CN108898213B
Active legal status (Current)
Anticipated expiration legal status


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Abstract

An adaptive activation function parameter adjusting method for a deep neural network, the method comprising the following steps: Step 1, first give a mathematical definition of the adaptive activation function parameter adjusting method; Step 2, compare and analyze the experimental results of the adaptive activation function against other classical activation functions on the MNIST data set, the network used having three hidden layers with 50 neurons each, trained with gradient descent for 100 epochs, with the learning rate set to 0.01 and a mini-batch size of 100; Step 3, after Step 2 yields the optimal activation function version, apply it to the detection of bladder cancer cells. As the network trains, the present invention continually adjusts the activation function's own shape to find the activation function best suited to that network, improving network performance, reducing the total number of learnable parameters that adaptive activation functions add to the network, speeding up network learning, and improving the network's generalization.

Description

Adaptive activation function parameter adjusting method for deep neural network
Technical field
The invention belongs to the field of adaptive activation functions and provides an adaptive activation function parameter adjusting method for deep neural networks. Specifically, the adaptive activation function controls its own shape through added learnable parameters; these learnable parameters are updated together with the network during training via the backpropagation algorithm, which reduces the total number of learnable parameters that adaptive activation functions contribute to the network.
Background technique
Machine learning is now widely applied in everyday life. Traditional machine learning mostly uses shallow structures, such as Gaussian mixture models (GMM), conditional random fields (CRF), and support vector machines (SVM). These shallow structures have a limited capacity to represent complicated functions and extract only relatively primitive features from the raw input signal, so their generalization ability on complex classification problems is restricted, and they struggle with harder natural signal processing problems such as human speech and natural image recognition. Deep learning, which learns by loosely imitating the brain, has greatly advanced machine learning. Its defining feature is transforming raw data through simple but nonlinear models into higher-level, more abstract feature representations, learning a deep nonlinear network structure that can approximate complicated functions and capture the essential characteristics of a data set from a small number of samples. Practice has shown that deep learning excels at discovering intricate structure in high-dimensional data, and it is widely used in research fields such as computer vision, speech recognition, and natural language processing.
As deep learning is applied in more and more fields, a growing body of research focuses on innovating and optimizing deep learning algorithms, including the optimization of classifiers and loss functions, backpropagation-based gradient descent, network weight initialization, and the artificial neural network itself; among these, optimizing the artificial neural network is an important part of algorithmic innovation. Artificial neural networks can have different structures and neuron counts depending on the task, yet people usually use the same activation function in all of these networks, such as Sigmoid, Tanh, or ReLU. Adaptive activation functions proposed in recent years let network neurons take on different shapes, but as the network scale and the number of neurons grow, the learnable parameters used to adjust these neuron shapes increase linearly, significantly dragging down the network's learning efficiency. The basic structure of an artificial neural network can thus be regarded as neurons interconnected with one another, and the activation function plays a very important role within it.
The main role of the activation function in an artificial neural network is to provide the network's nonlinear expressive power. If the neurons in a neural network perform only linear operations, the network can express only a simple linear mapping; even increasing its depth and width still yields a linear mapping, which can hardly model the nonlinearly distributed data of real environments effectively. Once nonlinear activation functions are added, a deep neural network gains a layered nonlinear mapping ability. The present invention mainly improves the activation function and thereby optimizes the connections between neurons in the network, further improving network performance.
Summary of the invention
In order to reduce the total number of learnable parameters that adaptive activation functions add to a network, speed up network learning, and improve the network's generalization ability, the invention proposes an adaptive activation function parameter adjusting method for deep neural networks that, as the network trains, continually adjusts the function's own shape to find the optimal activation function for that network, improving the performance of the network.
The technical solution adopted by the present invention to solve the technical problems is:
An adaptive activation function parameter adjusting method for a deep neural network, the method comprising the following steps:
Step 1, first give a mathematical definition of the adaptive activation function parameter adjusting method. The process is as follows:
Let the number of adjustable parameters of the adaptive activation function be N; the adaptive activation function is then defined as:
f(x) = f(a*x + c)
where a and c are both learnable parameters that control the shape of the activation function. A so-called neural network can be regarded as a combination of many single neurons, so the output of the neural network is defined as a compound function of the weights, the biases, and the learnable neuron parameters:
h(w,b,a,c) = h(f(a*x + c))
where h represents the output of the neural network and w and b represent the weights and biases of the network. This formulation can also be viewed as all neurons in the neural network sharing the same group of learnable parameters; a more general definition is that each neuron in the neural network uses its own parameters, as follows:
fn(x) = f(an*x + cn)
where fn represents each individual neuron of a layer in the network; when the neurons of each layer share the same parameters, the definition becomes:
fl(x) = f(al*x + cl), with one pair (al, cl) per layer l.
The adaptive activation functions in the neural network are trained with the backpropagation algorithm, where the learnable parameters are optimized together with the weights and biases as network training proceeds. The parameters {a1, ..., an, b1, ..., bn} are updated according to the chain rule, as follows:
ai ← ai - η · ∂L/∂ai
where ai ∈ {a1, ..., an, b1, ..., bn}, η is the learning rate, and L denotes the cost function. The factor ∂L/∂f(xi) is obtained from the deeper layer by backpropagation, and the weighted term sums over all positions xi of the feature map or neural network layer. For a variable shared within one layer, the gradient with respect to ai is obtained by summing over the neurons of all channels or of the whole layer, as follows:
∂L/∂ai = Σi (∂L/∂f(xi)) · (∂f(xi)/∂ai)
Step 2, compare and analyze the experimental results of the adaptive activation function against other classical activation functions on the MNIST data set, obtaining the final activation function version. The process is as follows:
The network used has three hidden layers, each with 50 neurons; it is trained with gradient descent for 100 epochs, the learning rate is set to 0.01, and the mini-batch size is 100.
Step 3, after Step 2 yields the optimal activation function version, apply it to the detection of bladder cancer cells. The process is as follows:
3.1, produce a data set from the bladder cancer images;
3.2, select a suitable algorithm and model to initialize the parameters;
3.3, compare and analyze the experimental results of the optimal activation function and the conventional activation function.
Further, in Step 2, the activation functions compared are the traditional Sigmoid function, the traditional ReLU activation function, the unified version of the adaptive activation function, the individual version of the adaptive activation function, and the layer version of the adaptive activation function.
Further, in 3.1, the bladder cancer cell data set is made into pascal_voc2007 format, mainly using generated xml files to save the cells' label information.
In 3.2, the Faster R-CNN algorithm is selected and the vgg16 model is used to initialize the network parameters, i.e. the network parameters are initialized from the vgg16 pre-trained model.
In 3.3, the optimal activation function version generated in Step 2 replaces the conventional activation functions in the Faster R-CNN algorithm, and finally the experimental results are analyzed and compared.
The beneficial effects of the present invention are mainly: theory and experiment demonstrate the validity of the adaptive activation function parameter adjusting method, which provides the best activation function for the network, avoids problems such as the vanishing gradient of conventional activation functions, and improves the fitting ability of the network.
Detailed description of the invention
Fig. 1 is the convergence curve of the activation function AS of the invention;
Fig. 2 shows the adjustment of the learnable parameters of the AS activation function of the invention;
Fig. 3 shows the original Sigmoid activation function and the final AS activation function of the invention;
Fig. 4 shows the experimental comparison between the final AS activation function of the invention and other activation functions;
Fig. 5 is a plot of the Sigmoid activation function;
Fig. 6 is a plot of the Tanh activation function;
Fig. 7 is a plot of the ReLU activation function.
Specific embodiment
The invention is further described below with reference to the accompanying drawings.
Referring to Figs. 1 to 7, an adaptive activation function parameter adjusting method for a deep neural network comprises the following steps:
Step 1, first give a mathematical definition of the adaptive activation function parameter adjusting method. The process is as follows:
Let the number of adjustable parameters of the adaptive activation function be N; the adaptive activation function is then defined as:
f(x) = f(a*x + c)
where a and c are both learnable parameters that control the shape of the activation function. A so-called neural network can be regarded as a combination of many single neurons, so the output of the neural network is defined as a compound function of the weights, the biases, and the learnable neuron parameters:
h(w,b,a,c) = h(f(a*x + c))
where h represents the output of the neural network and w and b represent the weights and biases of the network. This formulation can also be viewed as all neurons in the neural network sharing the same group of learnable parameters; a more general definition is that each neuron in the neural network uses its own parameters, as follows:
fn(x) = f(an*x + cn)
where fn represents each individual neuron of a layer in the network; when the neurons of each layer share the same parameters, the definition becomes:
fl(x) = f(al*x + cl), with one pair (al, cl) per layer l.
The adaptive activation functions in the neural network are trained with the backpropagation algorithm, where the learnable parameters are optimized together with the weights and biases as network training proceeds. The parameters {a1, ..., an, b1, ..., bn} are updated according to the chain rule, as follows:
ai ← ai - η · ∂L/∂ai
where ai ∈ {a1, ..., an, b1, ..., bn}, η is the learning rate, and L denotes the cost function. The factor ∂L/∂f(xi) is obtained from the deeper layer by backpropagation, and the weighted term sums over all positions xi of the feature map or neural network layer. For a variable shared within one layer, the gradient with respect to ai is obtained by summing over the neurons of all channels or of the whole layer, as follows:
∂L/∂ai = Σi (∂L/∂f(xi)) · (∂f(xi)/∂ai)
Step 2, compare and analyze the experimental results of the adaptive activation function against other classical activation functions on the MNIST data set. The process is as follows:
The network used has three hidden layers, each with 50 neurons; it is trained with gradient descent for 100 epochs, the learning rate is set to 0.01, and the mini-batch size is 100.
Further, in Step 2, the activation functions compared are the traditional Sigmoid function, the traditional ReLU activation function, the unified version of the adaptive activation function, the individual version of the adaptive activation function, and the layer version of the adaptive activation function.
Based on the MNIST data set, the present invention adds learnable parameters to the classical sigmoid activation function to obtain an adaptive activation function, Adaptive Sigmoid (AS), and then compares the test results of each version of the adaptive activation function against the two classical activation functions Sigmoid and ReLU.
MNIST is a handwritten digit recognition data set, often called the Drosophila of deep learning experiments. It contains 60000 pictures as training data and 10000 pictures as the test set. Each grayscale picture in the MNIST data set represents a digit from 0 to 9; the pictures are 28*28 in size, with the handwritten digit appearing in the middle of the picture. The activation function AS is defined as:
F = b0*sigmoid(a0*x + a1) + b1
a0, a1, b0 and b1 are four learnable parameters; they control the shape of the function and can be trained together with the network's weights and biases.
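As an illustration only, the AS function can be written as a PyTorch module whose four shape parameters are registered as learnable. The following minimal sketch is ours, not the patent's (the patent gives no code); the initial values are those reported for the experiment below:

    import torch
    import torch.nn as nn

    class ASigmoid(nn.Module):
        # Adaptive Sigmoid AS: F = b0*sigmoid(a0*x + a1) + b1,
        # with a0, a1, b0, b1 learned by backpropagation together
        # with the network's weights and biases.
        def __init__(self):
            super().__init__()
            # initialized as in the experiment (a0=1.0, a1=0.0, b0=1.0, b1=0.0),
            # i.e. the function starts out as a plain sigmoid
            self.a0 = nn.Parameter(torch.tensor(1.0))
            self.a1 = nn.Parameter(torch.tensor(0.0))
            self.b0 = nn.Parameter(torch.tensor(1.0))
            self.b1 = nn.Parameter(torch.tensor(0.0))

        def forward(self, x):
            return self.b0 * torch.sigmoid(self.a0 * x + self.a1) + self.b1

Because the four parameters are registered with the module, any optimizer built over model.parameters() updates them alongside the weights, which is exactly the shared-parameter training the method describes.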
The present invention mainly adds learnable parameters to the classical sigmoid activation function to obtain the adaptive activation function AS. The mathematical definition of the function is as follows:
Let the number of adjustable parameters of the adaptive activation function be N, and assume here that N = 2. The adaptive activation function can then be defined as:
f(x) = f(a*x + c)
where a and c are both learnable parameters that control the shape of the activation function. A so-called neural network can be regarded as a combination of many single neurons, so the output of the neural network is defined as a compound function of the weights, the biases, and the learnable neuron parameters:
h(w,b,a,c) = h(f(a*x + c))
where h represents the output of the neural network, and w and b represent the weights and biases of the network. This formulation can also be viewed as all neurons in the neural network sharing the same group of learnable parameters. A more general definition is that each neuron in the neural network uses its own parameters, as follows:
fn(x) = f(an*x + cn)
where fn represents each individual neuron of a layer in the network. When the neurons of each layer share the same parameters, the definition becomes:
fl(x) = f(al*x + cl), with one pair (al, cl) per layer l.
The present invention trains the adaptive activation functions in the neural network with the backpropagation algorithm, where the learnable parameters are optimized together with the weights and biases as network training proceeds. The parameters {a1, ..., an, b1, ..., bn} can be updated according to the chain rule, as follows:
ai ← ai - η · ∂L/∂ai
where ai ∈ {a1, ..., an, b1, ..., bn}, η is the learning rate, and L denotes the cost function. The factor ∂L/∂f(xi) is obtained from the deeper layer by backpropagation, and the weighted term sums over all positions xi of the feature map or neural network layer. For a variable shared within one layer, the gradient with respect to ai is obtained by summing over the neurons of all channels or of the whole layer, as follows:
∂L/∂ai = Σi (∂L/∂f(xi)) · (∂f(xi)/∂ai)
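As a toy check of this shared-parameter update (our own sketch, assuming PyTorch; not part of the patent), automatic differentiation reproduces the summed chain-rule gradient for a parameter shared across all positions of a layer:

    import torch

    a = torch.tensor(1.0, requires_grad=True)   # shared shape parameter
    c = torch.tensor(0.0, requires_grad=True)
    x = torch.randn(100)                        # all positions of one layer
    f = torch.sigmoid(a * x + c)                # f(x) = f(a*x + c) with shared a, c
    L = f.sum()                                 # stand-in cost function
    L.backward()                                # dL/da is summed over all positions
    with torch.no_grad():
        # manual chain rule: sum_i dL/df(xi) * df(xi)/da, with sigmoid' = f*(1-f)
        manual = (f * (1 - f) * x).sum()
    print(torch.allclose(a.grad, manual))       # True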
In Step 3 the adaptive activation function method is applied to deep learning: the present invention applies the optimal activation function obtained in Step 2 to the detection of bladder cancer cells. The process is as follows:
3.1, produce the data set. The invention makes the bladder cancer cell data set into pascal_voc2007 format, mainly using generated xml files to save the cells' label information.
3.2, select a suitable algorithm and model to initialize the parameters. The invention selects the Faster R-CNN algorithm and uses the vgg16 model to initialize the network parameters; initializing mainly from the vgg16 pre-trained model shortens the training time while reducing the risk of underfitting or overfitting (a sketch of such an initialization follows after 3.3).
3.3, compare and analyze the experimental results of the optimal activation function and the conventional activation function, mainly by replacing the conventional activation functions in the Faster R-CNN algorithm with the optimal activation function version generated in Step 2 and finally analyzing and comparing the experimental results.
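For illustration, a minimal sketch of initializing a detector backbone from vgg16 pre-trained weights, assuming a recent torchvision; the anchor sizes, RoI pooler settings, and class count are our own illustrative choices, since the patent specifies only Faster R-CNN initialized from a vgg16 pre-trained model:

    import torchvision
    from torchvision.models.detection import FasterRCNN
    from torchvision.models.detection.anchor_utils import AnchorGenerator

    # VGG16 feature extractor initialized from pre-trained weights,
    # used as the detection backbone
    backbone = torchvision.models.vgg16(weights="IMAGENET1K_V1").features
    backbone.out_channels = 512  # channel count of VGG16's last conv block

    anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                       aspect_ratios=((0.5, 1.0, 2.0),))
    roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=["0"],
                                                    output_size=7,
                                                    sampling_ratio=2)
    # two classes assumed: background and bladder cancer cell
    model = FasterRCNN(backbone, num_classes=2,
                       rpn_anchor_generator=anchor_generator,
                       box_roi_pool=roi_pooler)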
Finally, with the method proposed by the invention, the same adjustable activation function is used throughout the whole network, so no matter how many neurons the neural network contains, the total number of added parameters is just the number of learnable parameters of the adaptive activation function (the parameters that control the shape of the function). The whole network uses the same variant sigmoid function; much like the superposition of the terms of a compound function, this enhances the nonlinearity of the network, improves its fitting ability, and accelerates its learning.
The present invention is described in detail below with reference to the drawings.
As shown in Fig. 1, the network used has three hidden layers, each with 50 neurons, and is trained with gradient descent for 100 epochs; the learning rate is set to 0.01 and the mini-batch size is 100. The activation functions compared are the traditional Sigmoid function, the traditional ReLU activation function, the unified version of the adaptive activation function, the individual version of the adaptive activation function, and the layer version of the adaptive activation function. Fig. 1 shows the convergence curves of ReLU, Sigmoid, and the three versions based on the adaptive activation function AS. "relu_train" denotes the classification error rate on the training set with the Relu activation function; "relu_test" denotes the classification error rate on the test set with the Relu activation function. "AUsigmoid" denotes the unified version (Unified Version, UV) of the adaptive activation function AS, in which every neuron uses the same activation function. "ALsigmoid" denotes the individual version (Individual Version, IV) of the adaptive activation function, in which each neuron uses its own activation function. "AIsigmoid" denotes the layer version (Layer Version, LV), in which all neurons of a layer use the same activation function, but the activation functions of different layers are not necessarily identical.
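For concreteness, a sketch of this experimental network (our own construction, reusing the ASigmoid module sketched earlier), showing how the unified (UV) and layer (LV) versions differ in the number of parameters they add:

    import torch
    import torch.nn as nn

    def make_mlp(acts):
        # three hidden layers of 50 neurons each, as in the experiment
        return nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 50), acts[0],
            nn.Linear(50, 50), acts[1],
            nn.Linear(50, 50), acts[2],
            nn.Linear(50, 10),
        )

    shared = ASigmoid()                     # from the earlier sketch
    uv_model = make_mlp([shared] * 3)       # UV: one AS shared everywhere, +4 parameters
    lv_model = make_mlp([ASigmoid() for _ in range(3)])  # LV: one AS per layer, +12
    # IV would give each of the 150 hidden neurons its own (a0, a1, b0, b1), +600

    optimizer = torch.optim.SGD(uv_model.parameters(), lr=0.01)
    # trained for 100 epochs with mini-batches of 100, as stated above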
The expression of the traditional Sigmoid activation function is as follows:
sigmoid(x) = 1 / (1 + e^(-x))
See Fig. 5 for a plot of the Sigmoid activation function.
The Sigmoid activation function is a commonly used activation method, because this activation function has a good interpretation in terms of a neuron's firing rate: from not firing at all at 0 to fully saturated firing at the maximum of 1. But the Sigmoid function is now seldom used; a major reason is that the saturation of the Sigmoid function makes the gradient vanish. The Sigmoid neuron has an unfortunate property: when the neuron's activation saturates near 0 or 1, the gradient of the function in those regions is almost 0. During backpropagation this (local) gradient is multiplied by the gradient of the whole loss function with respect to the gate's output, and the product is likewise close to zero. This effectively kills the gradient: almost no signal passes through the neuron to the weights and back to the data, which leads to the vanishing gradient problem.
Another classical activation function is the Tanh nonlinearity, whose expression is as follows:
tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
See Fig. 6 for a plot of the Tanh activation function.
As the figure shows, compared with the Sigmoid function, Tanh compresses a real value into the interval [-1, 1] and, like Sigmoid, suffers from saturation. Unlike the Sigmoid neuron, however, its output is zero-centered. In practice the Tanh nonlinearity is preferable to the Sigmoid nonlinearity; one may say that a Tanh neuron is simply a scaled Sigmoid neuron.
Compared with the two classical activation functions above, ReLU is the activation function most widely used today. Its mathematical formula is as follows:
f(x) = max(0, x)
See Fig. 7 for a plot of the ReLU activation function.
Compared with the Sigmoid and Tanh functions, ReLU greatly accelerates the convergence of gradient descent, owing to its linear, non-saturating form. The ReLU activation function has no gradient saturation problem when the input is positive; when the input is negative, ReLU is not activated at all, which means that once the input becomes negative, ReLU dies. For example, a very large gradient flowing backward through a ReLU neuron may push its weights into a special state in which the neuron is unlikely ever to be activated again by any data point. If this happens, every gradient that subsequently backpropagates through this neuron becomes 0; in other words, this ReLU unit is irreversibly dead during training, which causes a loss of diversity in the data.
As shown in Fig. 1, on the MNIST training set the unified version of activation function AS achieves a lower classification error rate than the Relu activation function; compared with the original Sigmoid activation function, the network also has a stronger fitting ability.
Fig. 2 shows the parameter adjustment process of the unified version of the adaptive activation function. The learnable parameters of the adaptive activation function are initialized as a0 = 1.0, a1 = 0.0, b0 = 1.0, b1 = 0.0; after the training iterations, the final parameters become a0 = 3.87, a1 = 0.07, b0 = 5.89, b1 = -0.51, and thereafter they hardly change.
As shown in Fig. 3, the final unified version of the adaptive activation function has a larger range than the traditional Sigmoid activation function, which largely resolves the vanishing gradient problem of the traditional Sigmoid activation function, so the classification accuracy rises.
As shown in Fig. 4, the present invention compares the experimental results of the final adaptive activation function version (RAS) with those of several other activation functions. The final adaptive activation function formula is:
F = 5.89*sigmoid(3.87*x + 0.07) - 0.51
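Frozen at these learned values, the function becomes an ordinary fixed activation. A small NumPy sketch (the name ras is ours):

    import numpy as np

    def ras(x):
        # final learned AS: F = 5.89*sigmoid(3.87*x + 0.07) - 0.51
        return 5.89 / (1.0 + np.exp(-(3.87 * x + 0.07))) - 0.51

    # the range is roughly (-0.51, 5.38), far wider than sigmoid's (0, 1);
    # this is the larger codomain that Fig. 3 refers to
    print(ras(np.array([-5.0, 0.0, 5.0])))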
The experimental comparison shows that the unified adaptive activation function achieves the best experimental results. In the bladder cancer cell detection experiment, both the detection results and the speed with the unified adaptive activation function are better than with the traditional activation functions, further demonstrating that each network can be trained to obtain the activation function that suits it best.

Claims (5)

1. An adaptive activation function parameter adjusting method for a deep neural network, characterized in that the method comprises the following steps:
Step 1, first give a mathematical definition of the adaptive activation function parameter adjusting method. The process is as follows:
Let the number of adjustable parameters of the adaptive activation function be N; the adaptive activation function is then defined as:
f(x) = f(a*x + c)
where a and c are both learnable parameters that control the shape of the activation function; a so-called neural network is regarded as a combination of many single neurons, so the output of the neural network is defined as a compound function of the weights, the biases, and the learnable neuron parameters:
h(w,b,a,c) = h(f(a*x + c))
where h represents the output of the neural network and w and b represent the weights and biases of the network; this formulation is also viewed as all neurons in the neural network sharing the same group of learnable parameters, and a more general definition is that each neuron in the neural network uses its own parameters, as follows:
fn(x) = f(an*x + cn)
where fn represents each individual neuron of a layer in the network; when the neurons of each layer share the same parameters, the definition becomes:
fl(x) = f(al*x + cl), with one pair (al, cl) per layer l;
the adaptive activation functions in the neural network are trained with the backpropagation algorithm, the learnable parameters are optimized together with the weights and biases as network training proceeds, and the parameters {a1, ..., an, b1, ..., bn} are updated according to the chain rule, as follows:
ai ← ai - η · ∂L/∂ai
where ai ∈ {a1, ..., an, b1, ..., bn}, η is the learning rate, and L denotes the cost function; the factor ∂L/∂f(xi) is obtained from the deeper layer by backpropagation, the weighted term sums over all positions xi of the feature map or neural network layer, and for a variable shared within one layer the gradient with respect to ai is obtained by summing over the neurons of all channels or of the whole layer, as follows:
∂L/∂ai = Σi (∂L/∂f(xi)) · (∂f(xi)/∂ai)
Step 2, compare and analyze the experimental results of the adaptive activation function against other classical activation functions on the MNIST data set. The process is as follows:
the network used has three hidden layers, each with 50 neurons; it is trained with gradient descent for 100 epochs, the learning rate is set to 0.01, and the mini-batch size is 100.
Step 3, after Step 2 yields the optimal activation function version, apply it to the detection of bladder cancer cells. The process is as follows:
3.1, produce a data set from the bladder cancer images;
3.2, select an algorithm and a model to initialize the parameters;
3.3, compare and analyze the experimental results of the optimal activation function and the conventional activation function.
2. The adaptive activation function parameter adjusting method for a deep neural network according to claim 1, characterized in that in Step 2 the activation functions compared are the traditional Sigmoid function, the traditional ReLU activation function, the unified version of the adaptive activation function, the individual version of the adaptive activation function, and the layer version of the adaptive activation function.
3. The adaptive activation function parameter adjusting method for a deep neural network according to claim 1 or 2, characterized in that in 3.1 the bladder cancer cell data set is made into pascal_voc2007 format, mainly using generated xml files to save the cells' label information.
4. The adaptive activation function parameter adjusting method for a deep neural network according to claim 1 or 2, characterized in that in 3.2 the Faster R-CNN algorithm is selected and the vgg16 model is used to initialize the network parameters, i.e. the network parameters are initialized from the vgg16 pre-trained model.
5. The adaptive activation function parameter adjusting method for a deep neural network according to claim 4, characterized in that in 3.3 the optimal activation function version generated in Step 2 replaces the conventional activation functions in the Faster R-CNN algorithm, and finally the experimental results are analyzed and compared.
CN201810631395.3A 2018-06-19 2018-06-19 Adaptive activation function parameter adjusting method for deep neural network Active CN108898213B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810631395.3A CN108898213B (en) 2018-06-19 2018-06-19 Adaptive activation function parameter adjusting method for deep neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810631395.3A CN108898213B (en) 2018-06-19 2018-06-19 Adaptive activation function parameter adjusting method for deep neural network

Publications (2)

Publication Number Publication Date
CN108898213A true CN108898213A (en) 2018-11-27
CN108898213B CN108898213B (en) 2021-12-17

Family

ID=64345490

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810631395.3A Active CN108898213B (en) 2018-06-19 2018-06-19 Adaptive activation function parameter adjusting method for deep neural network

Country Status (1)

Country Link
CN (1) CN108898213B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5113483A (en) * 1990-06-15 1992-05-12 Microelectronics And Computer Technology Corporation Neural network with semi-localized non-linear mapping of the input space
CN104951836A * 2014-03-25 2015-09-30 上海市玻森数据科技有限公司 Posting prediction system based on neural network technique
CN105654136A (en) * 2015-12-31 2016-06-08 中国科学院电子学研究所 Deep learning based automatic target identification method for large-scale remote sensing images
CN105891215A (en) * 2016-03-31 2016-08-24 浙江工业大学 Welding visual detection method and device based on convolutional neural network
CN107122825A * 2017-03-09 2017-09-01 华南理工大学 Activation function generation method for neural network model

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109934222A * 2019-03-01 2019-06-25 长沙理工大学 Insulator string self-explosion recognition method based on transfer learning
CN110084380A * 2019-05-10 2019-08-02 深圳市网心科技有限公司 Iterative training method, device, system and medium
CN110222173A * 2019-05-16 2019-09-10 吉林大学 Short text emotion classification method and device based on neural network
CN110222173B * 2019-05-16 2022-11-04 吉林大学 Short text emotion classification method and device based on neural network
CN110443296A * 2019-07-30 2019-11-12 西北工业大学 Hyperspectral image classification-oriented data adaptive activation function learning method
CN110443296B * 2019-07-30 2022-05-06 西北工业大学 Hyperspectral image classification-oriented data adaptive activation function learning method
CN110570048A * 2019-09-19 2019-12-13 深圳市物语智联科技有限公司 User demand prediction method based on improved online deep learning
CN111860460A (en) * 2020-08-05 2020-10-30 江苏新安电器股份有限公司 Application method of improved LSTM model in human behavior recognition
CN115204352A (en) * 2021-04-12 2022-10-18 洼田望 Information processing apparatus, information processing method, and storage medium
CN115204352B (en) * 2021-04-12 2024-03-12 洼田望 Information processing apparatus, information processing method, and storage medium
WO2023092938A1 (en) * 2021-11-24 2023-06-01 苏州浪潮智能科技有限公司 Image recognition method and apparatus, and device and medium
CN114708460A (en) * 2022-04-12 2022-07-05 济南博观智能科技有限公司 Image classification method, system, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN108898213B (en) 2021-12-17

Similar Documents

Publication Publication Date Title
CN108898213A (en) A kind of adaptive activation primitive parameter adjusting method towards deep neural network
Munakata Fundamentals of the new artificial intelligence: neural, evolutionary, fuzzy and more
CN109992779B (en) Emotion analysis method, device, equipment and storage medium based on CNN
CN103345656B Data identification method and device based on multitask deep neural network
CN109829541A (en) Deep neural network incremental training method and system based on learning automaton
CN106560848B (en) Novel neural network model for simulating biological bidirectional cognitive ability and training method
CN108304826A Facial expression recognition method based on convolutional neural networks
CN106503654A Face emotion recognition method based on deep sparse autoencoder network
CN114049513A (en) Knowledge distillation method and system based on multi-student discussion
CN111858989A (en) Image classification method of pulse convolution neural network based on attention mechanism
CN109255340A Face recognition method fusing multiple improved VGG networks
CN106709482A Method for identifying genetic relationship of persons based on autoencoder
CN108121975A Face recognition method combining original data and generated data
CN107657313B (en) System and method for transfer learning of natural language processing task based on field adaptation
CN108256630A Overfitting solution based on low-dimensional manifold regularization neural network
CN106980830A Affiliation recognition method and device based on deep convolutional network
CN106980831A Affiliation recognition method based on autoencoder
Golovko et al. A new technique for restricted Boltzmann machine learning
CN108154156A Image ensemble classification method and device based on neural topic model
Huang et al. Design and Application of Face Recognition Algorithm Based on Improved Backpropagation Neural Network.
Wu et al. Damage identification of low emissivity coating based on convolution neural network
Ding et al. College English online teaching model based on deep learning
CN110188621A Three-dimensional facial expression recognition method based on SSF-IL-CNN
Kozlova et al. The use of neural networks for planning the behavior of complex systems
CN106096543A Handwritten digit recognition method based on modified extreme learning machine

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant