CN108710944A - Method for generating a trainable piecewise-linear activation function - Google Patents

Method for generating a trainable piecewise-linear activation function

Info

Publication number
CN108710944A
CN108710944A (application CN201810412916.6A)
Authority
CN
China
Prior art keywords
activation function
piecewise linear
segments
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810412916.6A
Other languages
Chinese (zh)
Inventor
潘红兵
郭良蛟
秦子迪
李丽
何书专
李伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN201810412916.6A priority Critical patent/CN108710944A/en
Publication of CN108710944A publication Critical patent/CN108710944A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Computing arrangements based on biological models using neural network models
    • G06N3/04 Architectures, e.g. interconnection topology
    • G06N3/0481 Non-linear activation functions, e.g. sigmoids, thresholds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Computing arrangements based on biological models using neural network models
    • G06N3/08 Learning methods

Abstract

The present invention provides a method for generating a trainable piecewise-linear activation function, which simplifies the computation of complex nonlinear activation functions during hardware acceleration. The method substitutes a piecewise linear function for the nonlinear activation function and continually updates its coefficients through learning, so that the linear function comes to stand in for the nonlinear one. Compared with ordinary linear activation functions, the result is closer to the original nonlinear activation function and has a smaller relative error; because the coefficients are updated during continual backpropagation, learning converges faster, and within a certain range the problems of gradient explosion and gradient vanishing are mitigated.

Description

Method for generating a trainable piecewise-linear activation function
Technical field
The invention belongs to the field of computing, and in particular relates to a method for generating a trainable piecewise-linear activation function.
Background technology
In recent years, machine learning has been put into practice in many fields, such as computing and the Internet, and has played a major role, greatly improving the success rates of functions such as image recognition and speech recognition. In a neural network, the result of each layer is passed through an activation function before being used as that layer's final output. The continual development of activation functions has been an important link in the progress of deep networks: better activation functions make the network's output more accurate.
A learning network containing only convolutional and fully connected layers is, through its multiple layers of operations, still obtained by linear mappings. Even a deep network of this kind can express only linear mappings and has difficulty representing the nonlinear data found in practice. Introducing a nonlinear activation function allows a neural network to partition the plane with smooth curves and classify accordingly, instead of merely using complicated linear combinations to approximate smooth curves to the same end.
With the development of deep networks, a wide variety of activation functions has come into use. In application, however, machine learning on large networks runs slowly on general-purpose hardware, so hardware acceleration of deep networks has become imperative. During hardware acceleration, the scarcity of computational resources makes the complex operations of a nonlinear activation function difficult to implement. Existing linear activation functions, such as ReLU and PReLU, simplify the network, but simple linear functions still fail to solve the problem of nonlinear mapping: the output of the deep network remains at the stage of a linear mapping.
Summary of the invention
The object of the present invention is to overcome the above shortcomings by providing a method for generating a trainable piecewise-linear activation function that uses a fixed amount of computational resources and, while preserving the essential characteristics of an activation function, simplifies existing nonlinear activation functions. The method is realized by the following technical scheme:
The method for generating a trainable piecewise-linear activation function comprises the following steps:
Step 1) determine the nonlinear function to be substituted;
Step 2) segment the selected nonlinear function, and initialize the slope and offset of each segment on the basis of the nonlinear function of step 1);
Step 3) apply the slopes and offsets as the activation function in a neural network model, and iteratively train and update the slopes and offsets with the iterations of that network model.
In a further refinement of the method, the nonlinear function in step 1) is the sigmoid function or the tanh function.
In a further refinement, step 2) divides the function into eight segments: four on the negative half-axis, namely (-∞, -6], (-6, -4], (-4, -2], (-2, 0]; and four on the positive half-axis, namely (0, 2], (2, 4], (4, 6], (6, +∞). The positive and negative half-axes are distributed axially symmetrically about the y-axis.
In a further refinement, the initialization in step 2) is as follows: analyze the nonlinear activation function segment by segment over the eight intervals; take the slope of the function at a representative point of each segment as the initial slope k_i^0 of the trainable piecewise-linear activation function; then obtain the initial offset b_i^0 from a simple mathematical relationship so that the segments of the function link up, each segment of the initial linear function being y = k_i^0·x + b_i^0.
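As a concrete illustration of this initialization, the following sketch (a non-authoritative reconstruction; the tanh target, the representative points, and all function names are assumptions drawn from the embodiment described later) takes the initial slopes from the tanh derivative at one representative point per segment, then chooses offsets so that adjacent segments meet at the breakpoints, with the two segments at the origin passing through it:

```python
import numpy as np

# Eight intervals: (-inf,-6], (-6,-4], (-4,-2], (-2,0], (0,2], (2,4], (4,6], (6,+inf)
BREAKS = np.array([-6., -4., -2., 0., 2., 4., 6.])     # interior breakpoints
REP = np.array([-6., -5., -3., -1., 1., 3., 5., 6.])   # representative points (assumed, per the embodiment)

def tanh_slope(x):
    # derivative of tanh: 1 - tanh(x)^2
    return 1.0 - np.tanh(x) ** 2

def init_piecewise():
    """Initial slopes k_i^0 from the target's derivative at each segment's
    representative point; offsets b_i^0 chosen so that the pieces join
    continuously, with b = 0 for the two segments adjacent to the origin."""
    k = tanh_slope(REP)
    b = np.zeros(8)
    # b[3] = b[4] = 0: the segments (-2,0] and (0,2] pass through the origin.
    # Walk outward, enforcing k[i]*x + b[i] == k[i+1]*x + b[i+1] at each breakpoint x.
    for i in range(2, -1, -1):            # negative side: segments 2, 1, 0
        x = BREAKS[i]
        b[i] = (k[i + 1] - k[i]) * x + b[i + 1]
    for i in range(5, 8):                 # positive side: segments 5, 6, 7
        x = BREAKS[i - 1]
        b[i] = (k[i - 1] - k[i]) * x + b[i - 1]
    return k, b
```

Each segment is then y = k_i·x + b_i, and the eight pieces form a single continuous function across all seven breakpoints.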
In a further refinement, the two of the eight segments of the nonlinear activation function nearest the origin have no offset b.
In a further refinement, the training update in step 3) comprises the following steps:
Step 3-1) merge the eight segments into one complete piecewise activation function; each slope k of the piecewise activation function is iteratively updated by the backpropagation of the neural network, the iterative update following the with-momentum rule of formula (1),

Δk_i ← μ·Δk_i − ε·∂L/∂k_i,  k_i ← k_i + Δk_i    (1)

where μ denotes the momentum and ε the learning rate.
Step 3-2) obtain the new offset b corresponding to each k from a simple mathematical relationship, so that the piecewise-linear function remains connected, forming a new piecewise-linear activation function;
Step 3-3) iterate to obtain the trained piecewise-linear activation function.
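A minimal sketch of steps 3-1 and 3-2 follows. This is an assumption-laden reconstruction: the function names and hyperparameter values are hypothetical, and the gradient of the loss with respect to each slope is assumed to be supplied by the network's backpropagation.

```python
import numpy as np

BREAKS = np.array([-6., -4., -2., 0., 2., 4., 6.])  # interior breakpoints of the eight segments

def offsets_from_slopes(k):
    """Step 3-2: recover the offsets b from the slopes so that the piecewise
    function stays connected; the two segments at the origin keep b = 0."""
    b = np.zeros(8)
    for i in range(2, -1, -1):                       # negative half-axis
        b[i] = (k[i + 1] - k[i]) * BREAKS[i] + b[i + 1]
    for i in range(5, 8):                            # positive half-axis
        b[i] = (k[i - 1] - k[i]) * BREAKS[i - 1] + b[i - 1]
    return b

def train_step(k, v, grad, mu=0.9, eps=0.01):
    """Step 3-1: with-momentum update of the slopes (formula (1)),
    then step 3-2: re-derive the offsets for continuity."""
    v = mu * v - eps * grad   # v is the momentum buffer; mu momentum, eps learning rate
    k = k + v
    return k, v, offsets_from_slopes(k)
```

Step 3-3 then amounts to calling `train_step` once per backpropagation pass until training converges.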
The advantages of the present invention are as follows:
The method of generating an activation function provided by the invention realizes the advantages of both kinds of existing activation functions: it possesses the nonlinearity of ordinary activation functions, while its piecewise linear form reduces the demand on hardware computational resources, thereby speeding up the operation of the neural network.
The simplified activation function has both the nonlinear character of a nonlinear function and the computational simplicity of a linear one; in the computation of a complex network it uses few resources while still giving the data a nonlinear mapping, combining the advantages of the two existing families of activation functions.
Description of the drawings
Fig. 1 is a flow chart of the method for generating a trainable piecewise-linear activation function.
Specific embodiments
The present invention is described in detail below with reference to the accompanying drawings.
This embodiment takes approximating the tanh function as an example to disclose a method for generating a trainable piecewise-linear activation function. The flow is shown in Fig. 1, and the steps are as follows:
Step 1) determine that the nonlinear function to be substituted is the tanh function.
Segment the selected nonlinear function: in this embodiment the nonlinear activation function is divided into eight segments, four on the negative half-axis, namely (-∞, -6], (-6, -4], (-4, -2], (-2, 0], and four on the positive half-axis, namely (0, 2], (2, 4], (4, 6], (6, +∞), the positive and negative half-axes being axially symmetric about the y-axis.
Step 2) analyze each of the eight segments of the nonlinear activation function independently, taking the slope at a representative point of each segment, in this embodiment the slopes of the tanh function at the eight points x = -6, -5, -3, -1, 1, 3, 5, 6, as the initial slopes k_i^0 of the trainable piecewise-linear activation function, and then using a simple mathematical relationship to find the offsets b_i^0 so that the segments link up, each segment of the initial linear function being y = k_i^0·x + b_i^0; the two segments near the origin pass through the origin and so have no offset b. The eight segments are merged into one complete, connected piecewise activation function of the form f(x) = k_i^0·x + b_i^0 for x in the i-th interval, i = 1, …, 8.
Step 3) each coefficient k of the activation function is iteratively updated by the backpropagation of the neural network, using the with-momentum update

Δk_i ← μ·Δk_i − ε·∂L/∂k_i,  k_i ← k_i + Δk_i,

where μ and ε are the momentum and the learning rate used in training in this embodiment.
Each k is updated with the momentum rule, and after each update the corresponding new values of b are found from the simple mathematical relationship, forming a new activation function. After sufficient iterations, the trained piecewise-linear activation function is obtained, of the form f(x) = k_i·x + b_i on the i-th interval.
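To make the result concrete, the trained function can be evaluated by selecting the segment containing x from the breakpoints and applying that segment's linear piece. This is a sketch under the same assumptions as the patent's segmentation; `np.searchsorted` with `side='left'` assigns each breakpoint to the segment on its left, matching the right-closed intervals.

```python
import numpy as np

BREAKS = np.array([-6., -4., -2., 0., 2., 4., 6.])  # interior breakpoints

def piecewise_eval(x, k, b):
    """Evaluate the piecewise-linear activation y = k_i*x + b_i,
    where i (0..7) is the index of the right-closed interval containing x."""
    i = np.searchsorted(BREAKS, x, side='left')
    return k[i] * x + b[i]
```

With the slopes and offsets produced by training, this evaluation replaces the tanh call in the hardware-accelerated forward pass.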
The method of generating an activation function in this embodiment realizes the advantages of both kinds of existing activation functions: it possesses the nonlinearity of ordinary activation functions, while its piecewise linear form reduces the demand on hardware computational resources, thereby speeding up the operation of the neural network. The simplified activation function has both the nonlinear character of a nonlinear function and the computational simplicity of a linear one; in the computation of a complex network it uses few resources while still giving the data a nonlinear mapping, combining the advantages of the two existing families of activation functions.
The foregoing is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention should be covered by the protection scope of the present invention. The protection scope of the present invention shall therefore be determined by the scope of the claims.

Claims (6)

1. A method for generating a trainable piecewise-linear activation function, characterized by comprising the following steps:
Step 1) determining the nonlinear function to be substituted;
Step 2) segmenting the selected nonlinear function, and initializing the slope and offset of each segment on the basis of the nonlinear function of step 1);
Step 3) applying the slopes and offsets as the activation function in a neural network model, and iteratively training and updating the slopes and offsets with the iterations of that network model.
2. The method for generating a trainable piecewise-linear activation function according to claim 1, characterized in that the nonlinear function in step 1) is the sigmoid function or the tanh function.
3. The method for generating a trainable piecewise-linear activation function according to claim 1, characterized in that step 2) divides the function into eight segments: four on the negative half-axis, namely (-∞, -6], (-6, -4], (-4, -2], (-2, 0]; and four on the positive half-axis, namely (0, 2], (2, 4], (4, 6], (6, +∞), the positive and negative half-axes being distributed axially symmetrically about the y-axis.
4. The method for generating a trainable piecewise-linear activation function according to claim 3, characterized in that the initialization in step 2) is: analyzing the nonlinear activation function segment by segment over the eight intervals, taking the slope at a representative point of each segment as the initial slope k_i^0 of the trainable piecewise-linear activation function, and then obtaining the initial offset b_i^0 from a simple mathematical relationship so that the segments link up, each segment of the initial linear function being y = k_i^0·x + b_i^0.
5. The method for generating a trainable piecewise-linear activation function according to claim 1, characterized in that the two of the eight segments of the nonlinear activation function nearest the origin have no offset b.
6. The method for generating a trainable piecewise-linear activation function according to claim 1, characterized in that the training update in step 3) comprises the following steps:
Step 3-1) merging the eight segments into one complete piecewise activation function, each slope k of which is iteratively updated by the backpropagation of the neural network, the iterative update following the with-momentum rule of formula (1),
Δk_i ← μ·Δk_i − ε·∂L/∂k_i,  k_i ← k_i + Δk_i    (1)
where μ denotes the momentum and ε the learning rate;
Step 3-2) obtaining the new offset b corresponding to each k from a simple mathematical relationship, keeping the piecewise-linear function connected and forming a new piecewise-linear activation function;
Step 3-3) iterating to obtain the trained piecewise-linear activation function.
CN201810412916.6A 2018-04-30 2018-04-30 Method for generating a trainable piecewise-linear activation function Pending CN108710944A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810412916.6A CN108710944A (en) 2018-04-30 2018-04-30 Method for generating a trainable piecewise-linear activation function


Publications (1)

Publication Number Publication Date
CN108710944A true CN108710944A (en) 2018-10-26

Family

ID=63867625

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810412916.6A Pending CN108710944A (en) Method for generating a trainable piecewise-linear activation function

Country Status (1)

Country Link
CN (1) CN108710944A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110837885A (en) * 2019-10-11 2020-02-25 西安电子科技大学 Sigmoid function fitting method based on probability distribution
CN111126581A (en) * 2018-12-18 2020-05-08 中科寒武纪科技股份有限公司 Data processing method and device and related products
CN114880693A (en) * 2022-07-08 2022-08-09 蓝象智联(杭州)科技有限公司 Method and device for generating activation function, electronic equipment and readable medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181026