CN106779050A - Optimization method and device for a convolutional neural network - Google Patents

Optimization method and device for a convolutional neural network

Info

Publication number
CN106779050A
CN106779050A CN201611051664.6A
Authority
CN
China
Prior art keywords
mapping
neural networks
convolutional neural
layer
shortcut
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611051664.6A
Other languages
Chinese (zh)
Inventor
陈书楷
杨奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Central Intelligent Information Technology Co., Ltd.
Original Assignee
Xiamen Zhongkong Biological Recognition Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Zhongkong Biological Recognition Information Technology Co Ltd filed Critical Xiamen Zhongkong Biological Recognition Information Technology Co Ltd
Priority to CN201611051664.6A priority Critical patent/CN106779050A/en
Publication of CN106779050A publication Critical patent/CN106779050A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a method for optimizing a convolutional neural network, comprising: setting a shortcut connection on an added layer of the convolutional neural network, and obtaining the residual mapping corresponding to the shortcut connection through learning; determining the desired mapping corresponding to the shortcut connection according to the residual mapping; and replacing the layers corresponding to the shortcut connection with the desired mapping, and performing convolutional neural network model prediction. The optimization method of the present invention effectively reduces the parameters of the added layers, makes the data flow between layers smoother, and helps improve the prediction accuracy and prediction speed of the model.

Description

Optimization method and device for a convolutional neural network
Technical field
The invention belongs to the field of artificial neural networks, and more particularly relates to a method and device for optimizing a convolutional neural network.
Background technology
A convolutional neural network (CNN) is a type of artificial neural network and has become a research hotspot in the fields of speech analysis and image recognition. The weight-sharing network structure of a convolutional neural network resembles a biological neural network, which effectively reduces the complexity of the network model and the number of weights.
With the development of CNNs, in particular the proposal of the VGG (Visual Geometry Group) convolutional neural network, increasing the number of network layers has become an important research direction for convolutional neural networks. However, as the number of layers increases, gradients may vanish markedly or explode, so that training cannot converge effectively; meanwhile, the number of parameters of the convolutional neural network increases sharply, affecting the prediction accuracy and prediction speed of the system.
Summary of the invention
The object of the present invention is to provide a method for optimizing a convolutional neural network, so as to solve the prior-art problem that, as the number of network layers increases, the number of parameters of the convolutional neural network increases sharply, affecting the prediction accuracy and prediction speed of the system.
In a first aspect, an embodiment of the present invention provides a method for optimizing a convolutional neural network, the method comprising:
setting a shortcut connection on an added layer of the convolutional neural network, and obtaining the residual mapping corresponding to the shortcut connection through learning;
determining the desired mapping corresponding to the shortcut connection according to the residual mapping;
replacing the layers corresponding to the shortcut connection with the desired mapping, and performing convolutional neural network model prediction.
With reference to the first aspect, in a first possible implementation of the first aspect, the step of determining the desired mapping corresponding to the shortcut connection according to the residual mapping comprises:
when it is judged that the desired mapping H(X) is a nonlinear mapping and the mapping variable X has the same dimension as the desired mapping H(X), the desired mapping is H(X) = F(X) + X, where F(X) is the residual mapping;
when it is judged that the desired mapping H(X) is a nonlinear mapping and the mapping variable X differs in dimension from the desired mapping H(X), the desired mapping is H(X) = F(X) + f(X), where f(X) = w × X, F(X) is the residual mapping, and w is a weight of the neural network.
With reference to the first aspect, in a second possible implementation of the first aspect, the method further comprises:
performing normalized initialization on the convolutional neural network and normalization training of the intermediate layers.
With reference to the first aspect, in a third possible implementation of the first aspect, the method further comprises:
performing convolution on the layers of the convolutional neural network using N × N convolution kernels, where 2 < N < 7.
With reference to the first aspect, in a fourth possible implementation of the first aspect, the method further comprises one or more of the following steps:
adding a max-pooling layer after the last convolutional layer of the convolutional neural network, the sampling sliding window of the max-pooling layer being C × C with a stride of 1, where C equals the side length of the image block obtained after the image has been processed by all convolutional layers;
first training the convolutional neural network model on a sample database, and then updating the parameters of the middle convolutional layers and the fully connected layers on an accurately labeled database to fine-tune the convolutional neural network;
reducing the size of the input image while keeping the aspect ratio of the original image;
reducing the number of channels of the convolutional layers.
In a second aspect, an embodiment of the present invention provides a device for optimizing a convolutional neural network, the device comprising:
a shortcut connection setting unit, configured to set a shortcut connection on an added layer of the convolutional neural network and obtain the residual mapping corresponding to the shortcut connection through learning;
a desired mapping acquiring unit, configured to determine the desired mapping corresponding to the shortcut connection according to the residual mapping;
a replacing unit, configured to replace the layers corresponding to the shortcut connection with the desired mapping and perform convolutional neural network model prediction.
With reference to the second aspect, in a first possible implementation of the second aspect, the desired mapping acquiring unit comprises:
a first computing subunit, configured to determine, when the desired mapping H(X) is a nonlinear mapping and the mapping variable X has the same dimension as the desired mapping H(X), that the desired mapping is H(X) = F(X) + X, where F(X) is the residual mapping;
a second computing subunit, configured to determine, when the desired mapping H(X) is a nonlinear mapping and the mapping variable X differs in dimension from the desired mapping H(X), that the desired mapping is H(X) = F(X) + f(X), where f(X) = w × X, F(X) is the residual mapping, and w is a weight of the neural network.
With reference to the second aspect, in a second possible implementation of the second aspect, the device further comprises:
a normalization training unit, configured to perform normalized initialization on the convolutional neural network and normalization training of the intermediate layers.
With reference to the second aspect, in a third possible implementation of the second aspect, the device further comprises:
a convolution unit, configured to perform convolution on the layers of the convolutional neural network using N × N convolution kernels, where 2 < N < 7.
With reference to the second aspect, in a fourth possible implementation of the second aspect, the device further comprises one or more of the following units:
a max-pooling unit, configured to add a max-pooling layer after the last convolutional layer of the convolutional neural network, the sampling sliding window of the max-pooling layer being C × C with a stride of 1, where C equals the side length of the image block obtained after the image has been processed by all convolutional layers;
a fine-tuning unit, configured to first train the convolutional neural network model on a sample database, and then update the parameters of the middle convolutional layers and the fully connected layers on an accurately labeled database to fine-tune the convolutional neural network;
a size adjusting unit, configured to reduce the size of the input image while keeping the aspect ratio of the original image;
a channel adjusting unit, configured to reduce the number of channels of the convolutional layers.
In the present invention, a shortcut connection is set on the added layers of the convolutional neural network, the residual mapping corresponding to the shortcut connection is obtained through learning, the desired mapping corresponding to the shortcut connection is obtained according to the residual mapping, and the added layers are replaced with the desired mapping. This reduces the parameters of the added layers, makes the data flow between layers smoother, and helps improve the prediction accuracy and prediction speed of the model.
Brief description of the drawings
Fig. 1 is a flow chart of an implementation of the method for optimizing a convolutional neural network provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of an implementation of a residual network provided by an embodiment of the present invention;
Fig. 3 is a structural schematic diagram of the device for optimizing a convolutional neural network provided by an embodiment of the present invention.
Specific embodiments
In order to make the purpose, technical solution and advantages of the present invention clearer, the present invention is further described below in conjunction with the drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it.
The purpose of the embodiments of the present invention is to provide a method for optimizing a convolutional neural network, so as to solve the prior-art problem that, when the number of layers of the convolutional neural network increases, the added layers cause the parameters of the neural network model to increase sharply; as the parameters grow, the prediction model needs more memory to compute and store them, which reduces the prediction speed of the model and correspondingly also reduces the prediction accuracy of the model. The present invention is further described below with reference to the drawings.
Fig. 1 shows the implementation flow of the method for optimizing a convolutional neural network provided by an embodiment of the present invention, detailed as follows:
In step S101, a shortcut connection is set on an added layer of the convolutional neural network, and the residual mapping corresponding to the shortcut connection is obtained through learning.
In step S102, the desired mapping corresponding to the shortcut connection is determined according to the residual mapping.
In step S103, the layers corresponding to the shortcut connection are replaced with the desired mapping, and convolutional neural network model prediction is performed.
Specifically, if the added layers are identity mapping layers, as shown in Fig. 2, there exists an identity mapping H such that
H(X) = X    (1)
Then, for the identity mapping H, as the number of layers increases, the training error after adding the layers will not increase compared with the training error before adding them.
If the mapping H(X) corresponding to the added layers is a nonlinear mapping, then:
H(X) = F(X) + X    (2)
that is:
F(X) = H(X) - X    (3)
where F(X) is the residual mapping. If equation (3) is to be optimized to an identity mapping, it is only necessary to optimize F(X) towards 0.
In a convolutional neural network, the desired mapping H(X) = F(X) + X can be realized through a shortcut connection, as shown in Fig. 2. The shortcut connection adds, on top of a standard feed-forward convolutional network, a skip connection that bypasses several layers. Each bypassed group of layers produces a residual block, and during prediction the residual of a convolutional layer is added to the input tensor.
In addition, if the mapping H(X) corresponding to the added layers is a nonlinear mapping and the mapping variable X differs in dimension from the desired mapping H(X), the desired mapping is H(X) = F(X) + f(X), where f(X) = w × X, F(X) is the residual mapping, and w is a weight of the neural network.
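As an illustration of how such a residual block can be realized, the following is a minimal sketch in PyTorch; the class name, channel sizes and layer layout are our own assumptions for illustration and are not part of the patent.

```python
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Computes H(X) = F(X) + X, or H(X) = F(X) + f(X) with f(X) = w * X
    (a 1x1 projection) when input and output dimensions differ."""
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        # F(X): two 3x3 convolutions (cf. Mode 2 below, 2 < N < 7)
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        # Identity shortcut when dimensions match, projection otherwise
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Conv2d(in_channels, out_channels, 1,
                                      stride=stride, bias=False)
        else:
            self.shortcut = nn.Identity()

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # The residual F(X) is added to the (possibly projected) input
        return F.relu(out + self.shortcut(x))
```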
Optimizing the convolutional neural network through a residual network is only one implementation. In the actual optimization process, one or more of the following optimization modes can also be combined to optimize the convolutional neural network:
Mode 1: performing normalized initialization on the convolutional neural network and normalization training of the intermediate layers.
The normalized initialization here uses PReLU (Parametric Rectified Linear Unit), i.e., a ReLU with a learnable parameter, where:
ReLU can be expressed as:
f(x_i) = max(0, x_i)
The normalized initialization (PReLU) can be expressed as:
f(x_i) = x_i if x_i > 0, and f(x_i) = a_i × x_i otherwise
If a_i = 0, PReLU degenerates to ReLU; if a_i is a small fixed value (for example a_i = 0.01), PReLU degenerates to Leaky ReLU (LReLU). In practical use, a_i is initialized to 0.25.
The normalization of the intermediate layers refers to batch normalization, i.e., normalizing batch by batch. For each mini-batch of samples, at every layer of the network, the input is normalized per feature: for the input of each single neuron in the layer, the mean and variance are computed and the input is then normalized so that the result has a mean of 0 and a variance of 1.
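A minimal sketch of Mode 1 expressed as PyTorch layers, assuming the normalized initialization is realized by a PReLU activation and the intermediate-layer normalization by batch normalization; the channel count of 64 is an arbitrary illustrative value.

```python
import torch.nn as nn

# Mode 1 as layers: batch normalization of the intermediate layer followed
# by PReLU, whose learnable a_i is initialized to 0.25 as stated above.
block = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(64),                       # per-feature mean 0, variance 1
    nn.PReLU(num_parameters=64, init=0.25),   # one learnable a_i per channel
)
```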
Mode 2: performing convolution on the layers of the convolutional neural network using N × N convolution kernels, where 2 < N < 7. For example, in a preferred embodiment, a 3 × 3 convolution kernel can be selected for the convolution of the layers of the convolutional neural network. Performing convolution with such kernels increases the discriminative power of the convolutional layers, reduces the network parameters and lowers the computational complexity.
Mode 3: adding a max-pooling layer after the last convolutional layer of the convolutional neural network, the sampling sliding window of the max-pooling layer being C × C with a stride of 1, where C equals the side length of the image block obtained after the image has been processed by all convolutional layers.
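Since C equals the side length of the final feature map, a C × C window with stride 1 collapses each channel to a single value, i.e., global max pooling. A short sketch under that assumption (C = 8 and 256 channels are illustrative values):

```python
import torch
import torch.nn as nn

C = 8                                          # assumed final feature-map side
features = torch.randn(1, 256, C, C)           # (batch, channels, C, C)
pooled = nn.MaxPool2d(kernel_size=C, stride=1)(features)
print(pooled.shape)                            # torch.Size([1, 256, 1, 1])
```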
Mode 4: first training the convolutional neural network model on a sample database, and then updating the parameters of the middle convolutional layers and the fully connected layers on an accurately labeled database to fine-tune the convolutional neural network. Training by fine-tuning increases the generalization ability of the network model. It is worth noting that the sample database is required to contain many sample classes, and the accurately labeled database is required to have the same classes as the sample database.
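A minimal sketch of such fine-tuning in PyTorch: all parameters are frozen after pre-training, then only the middle convolutional layers and the fully connected layers are unfrozen and passed to the optimizer. The model and the layer-name prefixes are hypothetical.

```python
import torch.optim as optim

def build_finetune_optimizer(model, lr=1e-4):
    # Freeze everything learned during pre-training on the sample database
    for p in model.parameters():
        p.requires_grad = False
    # Unfreeze only the middle convolutional and fully connected layers,
    # which are then updated on the accurately labeled database.
    finetune_params = []
    for name, p in model.named_parameters():
        if name.startswith(("mid_conv", "fc")):   # hypothetical layer names
            p.requires_grad = True
            finetune_params.append(p)
    return optim.SGD(finetune_params, lr=lr, momentum=0.9)
```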
Mode 5: reducing the size of the input image while keeping the aspect ratio of the original image.
Mode 6: reducing the number of channels of the convolutional layers.
To verify the optimization effect of the convolutional neural network of the present invention, the following description uses specific experimental data:
Experiment 1: comparison of the prediction results of the residual network of the present invention and a VGG network.
On the MAWI database, images are cropped and normalized to 64 × 64; the training set contains 300,000 images and the validation set contains 37,700 images. The test sets are LAP and Staff, containing 4022 and 790 images respectively. Taking age estimation as an example, the test results are as follows:
Table 1: test results of the VGG network
Table 2: test results of the residual network
As can be seen, the maximum number of channels of both the residual network (34 layers) and the VGG network (53 layers) is set to 256, and the residual block in the residual network is used only once between convolutional layers. From the test results, the residual network estimates age better; compared with the VGG network, it has fewer filters and lower complexity. The residual network here uses Mode 1, Mode 2, Mode 5 above and the optimization method described for Fig. 1.
Experiment 2: miniaturized optimization of the residual network.
On the basis of Experiment 1, the residual network is further optimized by adding max pooling and reducing the number of convolutional-layer channels. On the Staff database, although the mean age error of the worst result increased by 0.21 years, the average time consumed was reduced by at least 54 ms. On the LAP database, although the mean age error of the worst result increased by 0.01 years, the average time consumed was reduced by at least 95 ms. Details are shown in Tables 3, 4 and 5 below:
Table 3: test results for a model size of 14 MB
Table 4: test results for a model size of 3.5 MB
Table 5: test results for a model size of 1.6 MB
It can be seen from Tables 3 to 5 that, after optimization by adding max pooling and reducing the number of convolutional-layer channels, although the age-estimation accuracy of the residual network decreases slightly, the model becomes smaller and age estimation becomes faster, which is very beneficial for engineering applications and shows that the miniaturization of the residual network is successful.
Fig. 3 shows the device for optimizing a convolutional neural network provided by an embodiment of the present invention, detailed as follows:
The device for optimizing a convolutional neural network comprises:
a shortcut connection setting unit 301, configured to set a shortcut connection on an added layer of the convolutional neural network and obtain the residual mapping corresponding to the shortcut connection through learning;
a desired mapping acquiring unit 302, configured to determine the desired mapping corresponding to the shortcut connection according to the residual mapping;
a replacing unit 303, configured to replace the layers corresponding to the shortcut connection with the desired mapping and perform convolutional neural network model prediction.
Preferably, the desired mapping acquiring unit comprises:
a first computing subunit, configured to determine, when the desired mapping H(X) is a nonlinear mapping and the mapping variable X has the same dimension as the desired mapping H(X), that the desired mapping is H(X) = F(X) + X, where F(X) is the residual mapping;
a second computing subunit, configured to determine, when the desired mapping H(X) is a nonlinear mapping and the mapping variable X differs in dimension from the desired mapping H(X), that the desired mapping is H(X) = F(X) + f(X), where f(X) = w × X, F(X) is the residual mapping, and w is a weight of the neural network.
Preferably, the device further comprises:
a normalization training unit, configured to perform normalized initialization on the convolutional neural network and normalization training of the intermediate layers.
Preferably, the device further comprises:
a convolution unit, configured to perform convolution on the layers of the convolutional neural network using N × N convolution kernels, where 2 < N < 7.
Preferably, the device further comprises one or more of the following units:
a max-pooling unit, configured to add a max-pooling layer after the last convolutional layer of the convolutional neural network, the sampling sliding window of the max-pooling layer being C × C with a stride of 1, where C equals the side length of the image block obtained after the image has been processed by all convolutional layers;
a fine-tuning unit, configured to first train the convolutional neural network model on a sample database, and then update the parameters of the middle convolutional layers and the fully connected layers on an accurately labeled database to fine-tune the convolutional neural network;
a size adjusting unit, configured to reduce the size of the input image while keeping the aspect ratio of the original image;
a channel adjusting unit, configured to reduce the number of channels of the convolutional layers.
The device for optimizing a convolutional neural network described in Fig. 3 corresponds to the method for optimizing a convolutional neural network described in Fig. 1.
In the several embodiments provided by the present invention, it should be understood that the disclosed device and method may be implemented in other ways. For example, the device embodiments described above are merely schematic; the division of the units is only a division by logical function, and there may be other division manners in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units, i.e., they may be located in one place or distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the methods described in the embodiments of the present invention. The aforementioned storage medium includes a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and other media capable of storing program code.
The above are only preferred embodiments of the present invention and are not intended to limit the invention. Any modification, equivalent replacement and improvement made within the spirit and principle of the present invention shall be included within the protection scope of the present invention.

Claims (10)

1. A method for optimizing a convolutional neural network, characterized in that the method comprises:
setting a shortcut connection on an added layer of the convolutional neural network, and obtaining the residual mapping corresponding to the shortcut connection through learning;
determining the desired mapping corresponding to the shortcut connection according to the residual mapping;
replacing the layers corresponding to the shortcut connection with the desired mapping, and performing convolutional neural network model prediction.
2. The method according to claim 1, characterized in that the step of determining the desired mapping corresponding to the shortcut connection according to the residual mapping comprises:
when it is judged that the desired mapping H(X) is a nonlinear mapping and the mapping variable X has the same dimension as the desired mapping H(X), the desired mapping is H(X) = F(X) + X, where F(X) is the residual mapping;
when it is judged that the desired mapping H(X) is a nonlinear mapping and the mapping variable X differs in dimension from the desired mapping H(X), the desired mapping is H(X) = F(X) + f(X), where f(X) = w × X, F(X) is the residual mapping, and w is a weight of the neural network.
3. The method according to claim 1, characterized in that the method further comprises:
performing normalized initialization on the convolutional neural network and normalization training of the intermediate layers.
4. The method according to claim 1, characterized in that the method further comprises:
performing convolution on the layers of the convolutional neural network using N × N convolution kernels, where 2 < N < 7.
5. The method according to claim 1, characterized in that the method further comprises one or more of the following steps:
adding a max-pooling layer after the last convolutional layer of the convolutional neural network, the sampling sliding window of the max-pooling layer being C × C with a stride of 1, where C equals the side length of the image block obtained after the image has been processed by all convolutional layers;
first training the convolutional neural network model on a sample database, and then updating the parameters of the middle convolutional layers and the fully connected layers on an accurately labeled database to fine-tune the convolutional neural network;
reducing the size of the input image while keeping the aspect ratio of the original image;
reducing the number of channels of the convolutional layers.
6. A device for optimizing a convolutional neural network, characterized in that the device comprises:
a shortcut connection setting unit, configured to set a shortcut connection on an added layer of the convolutional neural network and obtain the residual mapping corresponding to the shortcut connection through learning;
a desired mapping acquiring unit, configured to determine the desired mapping corresponding to the shortcut connection according to the residual mapping;
a replacing unit, configured to replace the layers corresponding to the shortcut connection with the desired mapping and perform convolutional neural network model prediction.
7. The device according to claim 6, characterized in that the desired mapping acquiring unit comprises:
a first computing subunit, configured to determine, when the desired mapping H(X) is a nonlinear mapping and the mapping variable X has the same dimension as the desired mapping H(X), that the desired mapping is H(X) = F(X) + X, where F(X) is the residual mapping;
a second computing subunit, configured to determine, when the desired mapping H(X) is a nonlinear mapping and the mapping variable X differs in dimension from the desired mapping H(X), that the desired mapping is H(X) = F(X) + f(X), where f(X) = w × X, F(X) is the residual mapping, and w is a weight of the neural network.
8. The device according to claim 6, characterized in that the device further comprises:
a normalization training unit, configured to perform normalized initialization on the convolutional neural network and normalization training of the intermediate layers.
9. The device according to claim 6, characterized in that the device further comprises:
a convolution unit, configured to perform convolution on the layers of the convolutional neural network using N × N convolution kernels, where 2 < N < 7.
10. The device according to claim 6, characterized in that the device further comprises one or more of the following units:
a max-pooling unit, configured to add a max-pooling layer after the last convolutional layer of the convolutional neural network, the sampling sliding window of the max-pooling layer being C × C with a stride of 1, where C equals the side length of the image block obtained after the image has been processed by all convolutional layers;
a fine-tuning unit, configured to first train the convolutional neural network model on a sample database, and then update the parameters of the middle convolutional layers and the fully connected layers on an accurately labeled database to fine-tune the convolutional neural network;
a size adjusting unit, configured to reduce the size of the input image while keeping the aspect ratio of the original image;
a channel adjusting unit, configured to reduce the number of channels of the convolutional layers.
CN201611051664.6A 2016-11-24 2016-11-24 Optimization method and device for a convolutional neural network Pending CN106779050A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611051664.6A CN106779050A (en) 2016-11-24 2016-11-24 Optimization method and device for a convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611051664.6A CN106779050A (en) 2016-11-24 2016-11-24 Optimization method and device for a convolutional neural network

Publications (1)

Publication Number Publication Date
CN106779050A true CN106779050A (en) 2017-05-31

Family

ID=58912738

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611051664.6A Pending CN106779050A (en) Optimization method and device for a convolutional neural network

Country Status (1)

Country Link
CN (1) CN106779050A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107491809A (en) * 2017-08-21 2017-12-19 郑州云海信息技术有限公司 A kind of method that FPGA realizes activation primitive in residual error network
CN108021863A (en) * 2017-11-01 2018-05-11 平安科技(深圳)有限公司 Electronic device, the character classification by age method based on image and storage medium
CN108309251A (en) * 2018-03-20 2018-07-24 清华大学 Quantitative acousto-optic imaging method based on deep neural network
CN108648457A (en) * 2018-06-28 2018-10-12 苏州大学 A kind of method, apparatus and computer readable storage medium of prediction of speed
CN109635920A (en) * 2018-11-12 2019-04-16 北京市商汤科技开发有限公司 Neural network optimization and device, electronic equipment and storage medium
CN110288090A (en) * 2019-06-28 2019-09-27 广东中星微电子有限公司 Method and device, computer equipment and the storage medium of training convolutional neural networks
CN110503968A (en) * 2018-05-18 2019-11-26 北京搜狗科技发展有限公司 A kind of audio-frequency processing method, device, equipment and readable storage medium storing program for executing
WO2020098414A1 (en) * 2018-11-13 2020-05-22 Oppo广东移动通信有限公司 Data processing method for terminal, device, and terminal
CN111915000A (en) * 2020-08-07 2020-11-10 温州医科大学 Network model adjusting method and device for medical image
US10872295B1 (en) 2019-09-19 2020-12-22 Hong Kong Applied Science and Technology Institute Company, Limited Residual quantization of bit-shift weights in an artificial neural network

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102034472A (en) * 2009-09-28 2011-04-27 戴红霞 Speaker recognition method based on Gaussian mixture model embedded with time delay neural network

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102034472A (en) * 2009-09-28 2011-04-27 戴红霞 Speaker recognition method based on Gaussian mixture model embedded with time delay neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KAIMING HE ET AL.: "Deep Residual Learning for Image Recognition", 《IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》 *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107491809B (en) * 2017-08-21 2020-10-16 苏州浪潮智能科技有限公司 Method for realizing activation function in residual error network by FPGA
CN107491809A (en) * 2017-08-21 2017-12-19 郑州云海信息技术有限公司 A kind of method that FPGA realizes activation primitive in residual error network
CN108021863A (en) * 2017-11-01 2018-05-11 平安科技(深圳)有限公司 Electronic device, the character classification by age method based on image and storage medium
CN108309251A (en) * 2018-03-20 2018-07-24 清华大学 Quantitative acousto-optic imaging method based on deep neural network
CN110503968B (en) * 2018-05-18 2024-06-04 北京搜狗科技发展有限公司 Audio processing method, device, equipment and readable storage medium
CN110503968A (en) * 2018-05-18 2019-11-26 北京搜狗科技发展有限公司 A kind of audio-frequency processing method, device, equipment and readable storage medium storing program for executing
CN108648457A (en) * 2018-06-28 2018-10-12 苏州大学 A kind of method, apparatus and computer readable storage medium of prediction of speed
CN109635920A (en) * 2018-11-12 2019-04-16 北京市商汤科技开发有限公司 Neural network optimization and device, electronic equipment and storage medium
CN109635920B (en) * 2018-11-12 2021-09-03 北京市商汤科技开发有限公司 Neural network optimization method and device, electronic device and storage medium
WO2020098414A1 (en) * 2018-11-13 2020-05-22 Oppo广东移动通信有限公司 Data processing method for terminal, device, and terminal
CN110288090B (en) * 2019-06-28 2023-11-07 广东中星微电子有限公司 Method and device for training convolutional neural network, computer equipment and storage medium
CN110288090A (en) * 2019-06-28 2019-09-27 广东中星微电子有限公司 Method and device, computer equipment and the storage medium of training convolutional neural networks
US10872295B1 (en) 2019-09-19 2020-12-22 Hong Kong Applied Science and Technology Institute Company, Limited Residual quantization of bit-shift weights in an artificial neural network
WO2021051463A1 (en) * 2019-09-19 2021-03-25 Hong Kong Applied Science and Technology Research Institute Company Limited Residual quantization of bit-shift weights in artificial neural network
CN111915000A (en) * 2020-08-07 2020-11-10 温州医科大学 Network model adjusting method and device for medical image
CN111915000B (en) * 2020-08-07 2021-07-20 温州医科大学 Network model adjusting method and device for medical image

Similar Documents

Publication Publication Date Title
CN106779050A (en) Optimization method and device for a convolutional neural network
US20220335284A1 (en) Apparatus and method with neural network
Haji et al. Comparison of optimization techniques based on gradient descent algorithm: A review
CN108805185B (en) Face recognition method and device, storage medium and computer equipment
CN109697510B (en) Method and device with neural network
KR102167011B1 (en) An image traning apparatus extracting hard negative samples being used to training a neural network based on sampling and a threshold adjusting adaptively and a method performed by the image training apparatus
US20220303176A1 (en) Efficient optimization for neural network deployment and execution
WO2023035926A1 (en) Method for predicting microsatellite instability from pathological picture on basis of self-attention mechanism
CN109740734B (en) Image classification method of convolutional neural network by optimizing spatial arrangement of neurons
CN113537305A (en) Image classification method based on matching network less-sample learning
CN110007959A (en) Hard-wired stratification mantissa bit length for deep neural network selects
US20240112030A1 (en) Neural network method and apparatus
WO2020243922A1 (en) Automatic machine learning policy network for parametric binary neural networks
CN113609337A (en) Pre-training method, device, equipment and medium of graph neural network
US20220292300A1 (en) Efficient quantization for neural network deployment and execution
US11586902B1 (en) Training network to minimize worst case surprise
CN111429414B (en) Artificial intelligence-based focus image sample determination method and related device
CN115774854B (en) Text classification method and device, electronic equipment and storage medium
Yang et al. Doge tickets: Uncovering domain-general language models by playing lottery tickets
US20230025626A1 (en) Method and apparatus for generating process simulation models
US20220292334A1 (en) Efficient memory use optimization for neural network deployment and execution
Chaturvedi et al. Analyzing the performance of novel activation functions on deep learning architectures
CN113762304B (en) Image processing method, image processing device and electronic equipment
CN115601639A (en) Training method and system of image classification model, and application method and system
CN111527502B (en) System and method for partial digital retraining

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20170821

Address after: 361000, Xiamen three software park, Fujian Province, 8 North Street, room 2001

Applicant after: Xiamen Central Intelligent Information Technology Co., Ltd.

Address before: 361000 Fujian province Xiamen software park two sunrise Road No. 32 403 unit 02 District

Applicant before: XIAMEN ZHONGKONG BIOLOGICAL RECOGNITION INFORMATION TECHNOLOGY CO., LTD.

TA01 Transfer of patent application right
RJ01 Rejection of invention patent application after publication

Application publication date: 20170531

RJ01 Rejection of invention patent application after publication