CN108898213B - Adaptive activation function parameter adjusting method for deep neural network - Google Patents


Info

Publication number
CN108898213B
CN108898213B (application CN201810631395.3A)
Authority
CN
China
Prior art keywords
activation function
network
adaptive
parameters
function
Prior art date
Legal status
Active
Application number
CN201810631395.3A
Other languages
Chinese (zh)
Other versions
CN108898213A (en)
Inventor
胡海根
周莉莉
罗诚
陈胜勇
管秋
周乾伟
Current Assignee
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN201810631395.3A priority Critical patent/CN108898213B/en
Publication of CN108898213A publication Critical patent/CN108898213A/en
Application granted granted Critical
Publication of CN108898213B publication Critical patent/CN108898213B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

A method for adjusting parameters of an adaptive activation function for a deep neural network, the method comprising the steps of: step 1, first giving a mathematical definition of the adaptive activation function parameter adjustment method; step 2, comparing and analyzing the experimental results of the adaptive activation function and other classical activation functions on the MNIST data set, where the network used comprises three hidden layers of 50 neurons each and is trained for 100 epochs with stochastic gradient descent, a learning rate of 0.01 and a mini-batch size of 100; and step 3, after the optimal activation function version is obtained in step 2, applying the method to the detection of specific bladder cancer cells. During training, the adaptive activation function continuously adjusts its own shape to find the activation function best suited to the network; this improves network performance, reduces the total number of learnable parameters of the adaptive activation function in the network, accelerates network learning and improves the generalization ability of the network.

Description

Adaptive activation function parameter adjusting method for deep neural network
Technical Field
The invention belongs to the field of adaptive activation functions and provides a parameter adjustment method for adaptive activation functions in deep neural networks. The adaptive activation function controls its own shape through added learnable parameters; these learnable parameters are updated along with network training through the back-propagation algorithm, which reduces the overall number of learnable parameters of the adaptive activation function in the network.
Background
Machine learning is widely applied in social life today, while traditional machine learning mostly adopts shallow structures such as the Gaussian Mixture Model (GMM), the Conditional Random Field (CRF) and the Support Vector Machine (SVM). Shallow structures have limited capability to represent complex functions, extract only relatively elementary features from the original input signal, are restricted in their generalization ability on complex classification problems, and have difficulty solving some complex natural-signal processing problems such as human speech and natural image recognition. Deep learning, by contrast, greatly promotes the development of machine learning by simulating the way the brain learns. Its greatest characteristic is that original data are converted into higher-level and more abstract feature expressions through simple but nonlinear models; a deep nonlinear network structure is learned, the approximation of complex functions is realized, and the essential features of a data set can be learned from a small number of samples. Practice has proved that deep learning is good at discovering complex structures in high-dimensional data and is widely applied in research fields such as computer vision, speech recognition and natural language processing.
With the application of deep learning in various fields, more and more research focuses on innovation and optimization of deep learning algorithms, including optimization of classifiers and loss functions, gradient-descent optimization based on back propagation, initialization of network weight parameters, and optimization of the artificial neural network itself, the last of which is an important component of deep learning algorithm innovation. Artificial neural networks have different network structures and numbers of neurons depending on the task, and in these networks the same activation function, e.g. Sigmoid, Tanh or ReLU, is usually used for every neuron. The adaptive activation functions proposed in recent years allow network neurons to take different shapes, but as the network size and the number of neurons grow, the learnable parameters that adjust the neuron shapes grow linearly, which greatly reduces the learning efficiency of the network. The basic structure of an artificial neural network can be regarded as a connection of many neurons, and the activation function plays a very important role in this basic structure.
The main role of the activation function in an artificial neural network is to provide the network's nonlinear expressive power. If the neurons in a neural network perform only linear operations, the network can only express simple linear mappings; even increasing the depth and width of the network still yields a linear mapping, which makes it difficult to model effectively the nonlinearly distributed data of real environments. After a nonlinear activation function is added, the deep neural network acquires layered nonlinear mapping learning capability. The invention mainly improves the activation function and optimizes the connections between neurons in the network, thereby further improving the performance of the network.
Disclosure of Invention
In order to reduce the total number of learnable parameters of the adaptive activation function in the network, accelerate network learning and improve the generalization capability of the network, the invention provides a parameter adjustment method for adaptive activation functions in deep neural networks.
The technical scheme adopted by the invention for solving the technical problems is as follows:
a method for adjusting parameters of an adaptive activation function for a deep neural network, the method comprising the steps of:
step 1, firstly, a mathematical definition is carried out on a parameter adjustment method of the adaptive activation function, and the process is as follows:
assuming that the number of adjustable parameters of the adaptive activation function is N, the adaptive activation function is defined as:
f(x)=f(a*x+c)
where a and c are both learnable parameters used to control the shape of the activation function. A neural network can be regarded as a combination of many individual neurons, so the output of the neural network is defined as a composite function of the weights, biases and learnable neuron parameters, as follows:
h(w,b,a,c)=h(f(a*x+c))
where h represents the output of the neural network, and w and b represent the weights and biases of the network; in this form, all neurons in the neural network use the same set of learnable parameters. A more general definition lets each neuron in the neural network use its own adjustable parameters, as follows:
h(w,b,a,c)=h(f1(a1*x+c1), f2(a2*x+c2), …, fn(an*x+cn))
where fn represents each neuron in a layer of the network. When every neuron in a layer shares the same adjustable parameters, the definition becomes:
h(w,b,a,c)=h(f(a1*x+c1), f(a2*x+c2), …, f(aL*x+cL))
where the subscript now indexes the layers of the network.
The adaptive activation functions in the neural network are trained with the back-propagation algorithm, in which the learnable parameters are optimized together with the weights and biases as training progresses; the parameters {a1, …, an, b1, …, bn} are updated according to the chain rule, as follows:
∂L/∂ai = (∂L/∂f(xi)) * (∂f(xi)/∂ai)
where ai ∈ {a1, …, an, b1, …, bn} and L denotes the cost function; the term
∂L/∂f(xi)
is obtained from the next layer by back propagation, and the summation Σxi runs over all positions of the feature map or of the neural network layer. For a variable shared within one layer, the gradient with respect to ai is obtained by the following formula, where Σi sums over the neurons in all channels or in one layer:
∂L/∂ai = Σi (∂L/∂f(xi)) * (∂f(xi)/∂ai)
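A minimal sketch, assuming PyTorch, of how a shape parameter shared by all neurons of a layer receives its gradient as the sum of the per-neuron contributions, as in the formula above; the tensor sizes and the use of a plain sum as the cost function L are illustrative.

import torch

# One learnable slope "a" and shift "c" shared by every neuron in a layer.
a = torch.tensor(1.0, requires_grad=True)
c = torch.tensor(0.0, requires_grad=True)

x = torch.randn(8, 50)           # mini-batch of 8 inputs to a 50-neuron layer
f = torch.sigmoid(a * x + c)     # adaptive activation of the form f(a*x + c)
loss = f.sum()                   # stand-in for a real cost function L
loss.backward()

# Because "a" and "c" are shared, autograd accumulates dL/da and dL/dc as sums
# over all neurons and batch positions, matching the summed chain-rule formula.
print(a.grad, c.grad)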
Step 2, comparing and analyzing the experimental results of the adaptive activation function and other classical activation functions on the MNIST data set to obtain the final activation function version. The process is as follows:
The network used has three hidden layers of 50 neurons each; stochastic gradient descent is run for 100 epochs, with the learning rate set to 0.01 and a mini-batch size of 100.
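A sketch of this experimental configuration, assuming PyTorch; the 28*28 input size and 10 output classes follow the standard MNIST setup, the cross-entropy loss and data-loading details are assumptions, and nn.Sigmoid() marks where the adaptive activation module is substituted in the comparison runs.

import torch.nn as nn
import torch.optim as optim

# Three hidden layers of 50 neurons each, as described above.
model = nn.Sequential(
    nn.Linear(28 * 28, 50), nn.Sigmoid(),   # the adaptive activation replaces
    nn.Linear(50, 50), nn.Sigmoid(),        # nn.Sigmoid() in the AS experiments
    nn.Linear(50, 50), nn.Sigmoid(),
    nn.Linear(50, 10),
)
optimizer = optim.SGD(model.parameters(), lr=0.01)  # learning rate 0.01
criterion = nn.CrossEntropyLoss()
# Training loop (omitted): 100 epochs, mini-batches of 100 MNIST images.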
Step 3, after the optimal activation function version is obtained in step 2, applying it to the detection of specific bladder cancer cells. The process is as follows:
3.1, making a bladder cancer cell data set;
3.2, selecting a suitable algorithm and model to initialize the parameters;
and 3.3, comparing and analyzing the experimental results of the optimal activation function and the traditional activation function.
Further, in step 2, the contrast activation functions used include a conventional Sigmoid function, a conventional ReLU activation function, a uniform version of an adaptive activation function, respective versions of the adaptive activation function, and a hierarchical version of the adaptive activation function.
In 3.1, the bladder cancer cell data set is prepared in the pascal_voc2007 format, and the label information of the cells is stored in the generated xml files.
In step 3.2, the Faster R-CNN algorithm is selected and the network parameters are initialized with a pre-trained VGG16 model.
In step 3.3, the optimal activation function version obtained in step 2 replaces the traditional activation function in the Faster R-CNN algorithm, and the experimental results are then analyzed and compared.
The invention has the following beneficial effects: the effectiveness of the adaptive activation function parameter adjustment method is proved both theoretically and experimentally; an optimal activation function is provided for the network, problems such as the vanishing gradients of traditional activation functions are avoided, and the fitting capability of the network is improved.
Drawings
FIG. 1 is the convergence curve of the AS activation function of the present invention;
FIG. 2 is a diagram of the adjustment of the learnable parameters of the AS activation function of the present invention;
FIG. 3 is a diagram of the original sigmoid activation function and the final AS activation function of the present invention;
FIG. 4 is a diagram of the experimental comparison between the final AS activation function of the present invention and other activation functions.
Fig. 5 is a diagram of Sigmoid activation function.
FIG. 6 is a graph of Tanh activation function.
Fig. 7 is a graph of the ReLU activation function.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to fig. 1 to 7, a method for adjusting parameters of an adaptive activation function for a deep neural network includes the following steps:
step 1, firstly, a mathematical definition is carried out on a parameter adjustment method of the adaptive activation function, and the process is as follows:
assuming that the number of adjustable parameters of the adaptive activation function is N, the adaptive activation function is defined as:
f(x)=f(a*x+c)
where a and c are both learnable parameters used to control the shape of the activation function. A neural network can be regarded as a combination of many individual neurons, so the output of the neural network is defined as a composite function of the weights, biases and learnable neuron parameters, as follows:
h(w,b,a,c)=h(f(a*x+c))
where h represents the output of the neural network, and w and b represent the weights and biases of the network; in this form, all neurons in the neural network use the same set of learnable parameters. A more general definition lets each neuron in the neural network use its own adjustable parameters, as follows:
h(w,b,a,c)=h(f1(a1*x+c1), f2(a2*x+c2), …, fn(an*x+cn))
where fn represents each neuron in a layer of the network. When every neuron in a layer shares the same adjustable parameters, the definition becomes:
h(w,b,a,c)=h(f(a1*x+c1), f(a2*x+c2), …, f(aL*x+cL))
where the subscript now indexes the layers of the network.
The adaptive activation functions in the neural network are trained with the back-propagation algorithm, in which the learnable parameters are optimized together with the weights and biases as training progresses; the parameters {a1, …, an, b1, …, bn} are updated according to the chain rule, as follows:
∂L/∂ai = (∂L/∂f(xi)) * (∂f(xi)/∂ai)
where ai ∈ {a1, …, an, b1, …, bn} and L denotes the cost function; the term
∂L/∂f(xi)
is obtained from the next layer by back propagation, and the summation Σxi runs over all positions of the feature map or of the neural network layer. For a variable shared within one layer, the gradient with respect to ai is obtained by the following formula, where Σi sums over the neurons in all channels or in one layer:
∂L/∂ai = Σi (∂L/∂f(xi)) * (∂f(xi)/∂ai)
step 2, carrying out comparison and analysis of experimental results of the adaptive activation function and other classical activation functions based on the MNIST data set, wherein the process is as follows:
The network used has three hidden layers of 50 neurons each; stochastic gradient descent is run for 100 epochs, with the learning rate set to 0.01 and a mini-batch size of 100.
Further, in step 2, the contrast activation functions used include a conventional Sigmoid function, a conventional ReLU activation function, a uniform version of an adaptive activation function, respective versions of the adaptive activation function, and a hierarchical version of the adaptive activation function.
Based on the MNIST data set, the invention adds learnable parameters to the classical sigmoid activation function to turn it into an adaptive activation function, and compares the test results of each version of the adaptive activation function with two classical activation functions, Sigmoid and ReLU.
MNIST is a handwritten digit recognition data set, often called the fruit fly of deep learning experiments, comprising 60,000 training images and 10,000 test images. Each grayscale image in MNIST represents one digit from 0 to 9; the images are 28 x 28 pixels and the handwritten digits appear in the middle of the image. The activation function AS is defined as:
f=b0*sigmoid(a0*x+a1)+b1
where a0, a1, b0 and b1 are four learnable parameters that control the shape of the function and can be trained along with the network weights and biases.
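A minimal sketch, assuming PyTorch, of the AS activation with its four learnable shape parameters; the initial values follow the starting point quoted later in the text (a0 = 1.0, a1 = 0.0, b0 = 1.0, b1 = 0.0), so the function starts out as a plain sigmoid.

import torch
import torch.nn as nn

class ASActivation(nn.Module):
    # Adaptive sigmoid: f(x) = b0 * sigmoid(a0 * x + a1) + b1.
    def __init__(self):
        super().__init__()
        self.a0 = nn.Parameter(torch.tensor(1.0))
        self.a1 = nn.Parameter(torch.tensor(0.0))
        self.b0 = nn.Parameter(torch.tensor(1.0))
        self.b1 = nn.Parameter(torch.tensor(0.0))

    def forward(self, x):
        return self.b0 * torch.sigmoid(self.a0 * x + self.a1) + self.b1

# Example: before any training the response matches a plain sigmoid.
print(ASActivation()(torch.tensor([0.0, 1.0])))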
The invention mainly adds learnable parameters to the classical sigmoid activation function to turn it into the adaptive activation function AS; the mathematical definition of the function is as follows:
let N be the number of adjustable parameters of the adaptive activation function, where N is assumed to be 2. The adaptive activation function may be defined as:
f(x)=f(a*x+c)
where a and c are both learnable parameters used to control the shape of the activation function. The so-called neural network can be seen as a combination of many individual neurons, so the output of the neural network is defined as a composite function of the weights, biases and learnable neuron parameters, as follows:
h(w,b,a,c)=h(f(a*x+c))
where h represents the output of the neural network and w and b represent the weights and biases of the network. In this form, all neurons in the neural network use the same set of learnable parameters. A more general definition lets each neuron in the neural network use its own adjustable parameters, as follows:
h(w,b,a,c)=h(f1(a1*x+c1), f2(a2*x+c2), …, fn(an*x+cn))
where fn represents each neuron in one layer of the network. When every neuron in a layer shares the same adjustable parameters, the definition becomes:
h(w,b,a,c)=h(f(a1*x+c1), f(a2*x+c2), …, f(aL*x+cL))
where the subscript now indexes the layers of the network.
the present invention uses a back-propagation algorithm to train an adaptive activation function in a neural network, where learnable parameters are optimized along with weights and bias execution as the network training progresses. The parameters { a1, …, n, b1, …, n } may be updated according to the chain derivative rule as follows:
Figure BDA0001699947670000072
where ai e { a1, …, n, b1, …, n }, and L represents a cost function.
Figure BDA0001699947670000073
This term can be derived from the latter layer by back propagation, and the weighting term ∑ Xi can be used at all positions of the feature map or neural network layer. For variables shared in one layer, the gradient ai can be found by the following equation, Σ i is used to sum the neurons in all channels or one layer, as follows:
Figure BDA0001699947670000081
In step 3, the adaptive activation function method is applied in deep learning: the optimal activation function obtained in step 2 is applied to the detection of bladder cancer cells. The process is as follows:
and 3.1, making a data set. The bladder cancer cell data set is made into a pascal _ voc2007 format, and the label information of the cells is mainly stored by using the generated xml file.
3.2, selecting a suitable algorithm and model to initialize the parameters. The invention selects the Faster R-CNN algorithm and initializes the network parameters with a pre-trained VGG16 model, which reduces training time and lowers the risk of under-fitting or over-fitting.
3.3, comparing and analyzing the experimental results of the optimal activation function and the traditional activation function. The optimal activation function version obtained in step 2 replaces the traditional activation function in the Faster R-CNN algorithm, and the experimental results are then analyzed and compared.
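A sketch, assuming PyTorch and torchvision, of swapping the conventional activations of a pre-trained VGG16 backbone for the AS module sketched earlier; how the backbone is then assembled into the Faster R-CNN detector is omitted, and the helper name is illustrative.

import torch.nn as nn
from torchvision.models import vgg16

def replace_activations(module, make_act):
    # Recursively replace every ReLU in the module tree with a fresh adaptive activation.
    for name, child in module.named_children():
        if isinstance(child, nn.ReLU):
            setattr(module, name, make_act())
        else:
            replace_activations(child, make_act)

backbone = vgg16(pretrained=True).features     # pre-trained VGG16 initialization;
                                               # newer torchvision uses weights=... instead
replace_activations(backbone, ASActivation)    # ASActivation: the AS module sketched above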
Finally, the method proposed by the present invention, that is, using the same adjustable activation function in the whole network, adds only a fixed number of parameters, namely the learnable parameters of the adaptive activation function that control its shape, no matter how many neurons the neural network contains. Using the same adjustable activation function throughout the network is like superposing polynomial orders in a composite function, thereby enhancing the nonlinearity of the network, improving its fitting capability and accelerating its learning speed.
The present invention will be described in detail below with reference to the accompanying drawings.
As shown in fig. 1, the network used has three hidden layers of 50 neurons each and is trained for 100 epochs with stochastic gradient descent; the learning rate was set to 0.01 and the mini-batch size was 100. The comparison activation functions used are the traditional Sigmoid function, the traditional ReLU activation function, the unified version of the adaptive activation function, the individual version of the adaptive activation function, and the layer version of the adaptive activation function. Fig. 1 shows the convergence curves of ReLU, Sigmoid, and the three versions based on the adaptive activation function AS. "Relu_train" denotes the classification error rate on the training set using the Relu activation function, and "Relu_test" the classification error rate on the test set. "AUsigmoid" denotes the Unified Version (UV) of the adaptive activation function AS, i.e. every neuron in the whole network uses the same activation function. "ALsigmoid" denotes the Individual Version (IV) of the adaptive activation function, i.e. each neuron uses its own activation function. "AIsigmoid" denotes the Layer Version (LV), i.e. all neurons of a layer use the same activation function, but the activation functions of different layers are not necessarily the same.
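A quick count, for the three-hidden-layer, 50-neurons-per-layer network above, of how many extra learnable parameters each AS variant introduces, given that each AS instance carries the four shape parameters a0, a1, b0, b1; the variable names are illustrative.

layers, neurons_per_layer, params_per_as = 3, 50, 4   # a0, a1, b0, b1

unified    = params_per_as                               # UV: 4, one AS shared by the whole network
layer_wise = params_per_as * layers                      # LV: 12, one AS per hidden layer
individual = params_per_as * layers * neurons_per_layer  # IV: 600, one AS per neuron
print(unified, layer_wise, individual)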
The expression of the conventional Sigmoid activation function is as follows:
f(x) = 1/(1 + e^(-x))
the Sigmoid activation function diagram refers to fig. 5.
The Sigmoid activation function is a common choice because it has a clear interpretation in terms of the firing rate of a neuron: from completely inactive at 0 to fully saturated activation at the maximum of 1. However, the Sigmoid function is now rarely used, one important reason being that it saturates and its gradient vanishes. Sigmoid neurons have the undesirable property that they saturate when their activation value is close to 0 or 1; in these regions the gradient of the function is almost 0, so during back propagation this (local) gradient is multiplied by the gradient of the loss function with respect to the unit's output, and the product is also close to zero. This effectively kills the gradient, almost no signal flows through the neuron to its weights and on to the data, and the result is the gradient vanishing (dispersion) problem.
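A small numeric check, in plain Python, of the saturation behaviour just described: the derivative sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)) collapses towards zero once the input moves away from the origin.

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

for x in (0.0, 2.0, 5.0, 10.0):
    s = sigmoid(x)
    print(f"x={x:5.1f}  sigmoid={s:.5f}  gradient={s * (1 - s):.2e}")

# At x = 10 the local gradient is about 4.5e-05, so signals back-propagated
# through a saturated Sigmoid neuron are effectively extinguished.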
Another classical activation function is: tanh nonlinear function, the expression is as follows:
f(x) = (e^x - e^(-x)) / (e^x + e^(-x))
the Tanh activation function diagram refers to FIG. 6.
As can be seen from fig. 6, compared with the Sigmoid function, Tanh compresses real values into the interval [-1, 1], but it has the same saturation problem as Sigmoid. Unlike the Sigmoid neuron, however, its output is zero-centered. In practice the Tanh nonlinearity is therefore more popular than the Sigmoid nonlinearity; the Tanh neuron can be regarded as a simply rescaled Sigmoid neuron.
Compared with the two classical activation functions above, ReLU is the activation function that is now most widely used; its mathematical formula is as follows:
f(x)=max(0,x)
the ReLU activation function map refers to fig. 7.
Compared with the Sigmoid and Tanh functions, ReLU greatly accelerates the convergence of gradient descent because of its linear, non-saturating form. When the input of the ReLU activation function is positive, there is no gradient saturation problem; when the input is negative, the ReLU is completely inactive, that is, the unit outputs zero for any negative input. For example, when a large gradient flows back through a ReLU neuron, it may push the weights into a state in which the neuron is no longer activated by any input. If this happens, the gradients propagating backwards through this neuron will all become 0; in other words, the ReLU unit fails irreversibly during training, which leads to a loss of data diversity.
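A short sketch, assuming PyTorch, of the dying-ReLU failure mode just described: once a unit's pre-activation is negative for every input, its local gradient is zero and its incoming weight stops receiving updates; the weight value and inputs are illustrative.

import torch

w = torch.tensor([-5.0], requires_grad=True)   # weight pushed far negative, e.g. by a large update
x = torch.rand(100, 1)                          # inputs in [0, 1), so w * x is never positive
out = torch.relu(x * w)                         # the ReLU output is 0 everywhere
out.sum().backward()
print(out.max().item(), w.grad)                 # 0.0 and a zero gradient: the unit no longer learns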
As can be seen from fig. 1, using the unified version of the activation function AS achieves a lower classification error rate on the MNIST training set than using the Relu activation function, and the network has a stronger fitting ability than with the original Sigmoid activation function.
As shown in fig. 2, which illustrates the parameter adjustment process of the unified version of the adaptive activation function, the learnable parameters of the adaptive activation function are initially set to a0 = 1.0, a1 = 0.0, b0 = 1.0, b1 = 0.0; after the training iterations the final parameters become a0 = 3.87, a1 = 0.07, b0 = 5.89, b1 = -0.51, after which they remain essentially unchanged.
As shown in fig. 3, the final unified version of the adaptive activation function has a larger value range than that of the traditional Sigmoid activation function, and the problem of gradient dispersion of the traditional Sigmoid activation function is solved to a great extent, so that the accuracy of classification is improved.
As shown in fig. 4, the final adaptive activation function version (RAS) is compared with other activation functions according to the experimental results, and the final adaptive activation function formula is as follows:
f=5.89*sigmoid(3.87*x+0.07)-0.51
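A quick check, in plain Python, that the final AS function quoted above spans a much wider output range than the (0, 1) interval of the original sigmoid, which is the point made about fig. 3.

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def final_as(x):
    return 5.89 * sigmoid(3.87 * x + 0.07) - 0.51

# Limits: x -> -inf gives -0.51 and x -> +inf gives 5.89 - 0.51 = 5.38,
# versus the (0, 1) range of the plain sigmoid.
print(final_as(-10.0), final_as(0.0), final_as(10.0))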
it can be seen from the comparison of the experimental results with fig. 4 that the unified adaptive activation function can achieve the best experimental effect, and in the experiment of detecting bladder cancer cells, the detection result and speed of the unified adaptive activation function are better than those of the traditional activation function, which further proves that each network can be trained to obtain the most suitable activation function.

Claims (5)

1. A deep neural network-oriented adaptive activation function parameter adjustment method is characterized by comprising the following steps:
step 1, firstly, a mathematical definition is carried out on a parameter adjustment method of the adaptive activation function, and the process is as follows:
assuming that the number of adjustable parameters of the adaptive activation function is N, the adaptive activation function is defined as:
f(x)=f(a*x+c)
where a and c are both learnable parameters used to control the shape of the activation function; the neural network is regarded as a combination of many individual neurons, and the output of the neural network is defined as a composite function of the weights, biases and learnable neuron parameters, as follows:
h(w,b,a,c)=h(f(a*x+c))
where h represents the output of the neural network, and w and b represent the weights and biases of the network; in this form, all neurons in the neural network use the same set of learnable parameters; a more general definition lets each neuron in the neural network use its own adjustable parameters, as follows:
h(w,b,a,c)=h(f1(a1*x+c1), f2(a2*x+c2), …, fn(an*x+cn))
where fn represents each neuron in a layer of the network; when every neuron in a layer shares the same adjustable parameters, the definition becomes:
h(w,b,a,c)=h(f(a1*x+c1), f(a2*x+c2), …, f(aL*x+cL))
where the subscript indexes the layers of the network;
the adaptive activation functions in the neural network are trained using a back-propagation algorithm, where the learnable parameters are optimized together with the weights and biases as network training progresses; the parameters {a1, …, an, b1, …, bn} are updated according to the chain rule, as follows:
∂L/∂ai = (∂L/∂f(xi)) * (∂f(xi)/∂ai)
where ai ∈ {a1, …, an, b1, …, bn} and L represents the cost function; the term
∂L/∂f(xi)
is obtained from the next layer by back propagation, and the summation Σxi runs over all positions of the feature map or of the neural network layer; for a variable shared in one layer, the gradient with respect to ai is obtained by the following formula, in which Σi sums over the neurons in all channels or in one layer:
∂L/∂ai = Σi (∂L/∂f(xi)) * (∂f(xi)/∂ai)
step 2, carrying out comparison and analysis of experimental results of the adaptive activation function and other activation functions based on the MNIST data set, wherein the process is as follows:
the network used has three hidden layers of 50 neurons each; stochastic gradient descent is run for 100 epochs, the learning rate is set to 0.01, and the mini-batch size is 100;
step 3, after obtaining the optimal activation function version in step 2, applying the optimal activation function version to the detection of specific bladder cancer cells, wherein the process is as follows:
3.1, making a data set for the bladder cancer;
3.2, initializing parameters by selecting an algorithm and a model;
and 3.3, comparing and analyzing the experimental results of the optimal activation function and the traditional activation function.
2. The method as claimed in claim 1, wherein the contrast activation function used in step 2 includes a conventional Sigmoid function, a conventional ReLU activation function, a uniform version of the adaptive activation function, respective versions of the adaptive activation function, and a hierarchical version of the adaptive activation function.
3. The method for adjusting parameters of the adaptive activation function oriented to the deep neural network as claimed in claim 1 or 2, wherein in 3.1, the bladder cancer cell data set is made into a pascal _ voc2007 format, and the generated xml file is mainly used for storing the label information of the cells.
4. The method for adjusting parameters of an adaptive activation function for a deep neural network as claimed in claim 1 or 2, wherein in 3.2, the Faster R-CNN algorithm is selected and the network parameters are initialized with a pre-trained VGG16 model.
5. The adaptive activation function parameter adjustment method for the deep neural network as claimed in claim 4, wherein in step 3.3, the optimal activation function version obtained in step 2 is used to replace the traditional activation function in the Faster R-CNN algorithm, and finally the experimental results are analyzed and compared.
CN201810631395.3A 2018-06-19 2018-06-19 Adaptive activation function parameter adjusting method for deep neural network Active CN108898213B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810631395.3A CN108898213B (en) 2018-06-19 2018-06-19 Adaptive activation function parameter adjusting method for deep neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810631395.3A CN108898213B (en) 2018-06-19 2018-06-19 Adaptive activation function parameter adjusting method for deep neural network

Publications (2)

Publication Number Publication Date
CN108898213A CN108898213A (en) 2018-11-27
CN108898213B true CN108898213B (en) 2021-12-17

Family

ID=64345490

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810631395.3A Active CN108898213B (en) 2018-06-19 2018-06-19 Adaptive activation function parameter adjusting method for deep neural network

Country Status (1)

Country Link
CN (1) CN108898213B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109934222A (en) * 2019-03-01 2019-06-25 长沙理工大学 A kind of insulator chain self-destruction recognition methods based on transfer learning
CN110084380A (en) * 2019-05-10 2019-08-02 深圳市网心科技有限公司 A kind of repetitive exercise method, equipment, system and medium
CN110222173B (en) * 2019-05-16 2022-11-04 吉林大学 Short text emotion classification method and device based on neural network
CN110443296B (en) * 2019-07-30 2022-05-06 西北工业大学 Hyperspectral image classification-oriented data adaptive activation function learning method
CN110570048A (en) * 2019-09-19 2019-12-13 深圳市物语智联科技有限公司 user demand prediction method based on improved online deep learning
CN111860460A (en) * 2020-08-05 2020-10-30 江苏新安电器股份有限公司 Application method of improved LSTM model in human behavior recognition
JP6942900B1 (en) * 2021-04-12 2021-09-29 望 窪田 Information processing equipment, information processing methods and programs
CN113822386B (en) * 2021-11-24 2022-02-22 苏州浪潮智能科技有限公司 Image identification method, device, equipment and medium
CN114708460A (en) * 2022-04-12 2022-07-05 济南博观智能科技有限公司 Image classification method, system, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5113483A (en) * 1990-06-15 1992-05-12 Microelectronics And Computer Technology Corporation Neural network with semi-localized non-linear mapping of the input space
CN104951836A (en) * 2014-03-25 2015-09-30 上海市玻森数据科技有限公司 Posting predication system based on nerual network technique
CN105654136A (en) * 2015-12-31 2016-06-08 中国科学院电子学研究所 Deep learning based automatic target identification method for large-scale remote sensing images
CN105891215A (en) * 2016-03-31 2016-08-24 浙江工业大学 Welding visual detection method and device based on convolutional neural network
CN107122825A (en) * 2017-03-09 2017-09-01 华南理工大学 A kind of activation primitive generation method of neural network model


Also Published As

Publication number Publication date
CN108898213A (en) 2018-11-27

Similar Documents

Publication Publication Date Title
CN108898213B (en) Adaptive activation function parameter adjusting method for deep neural network
Wang et al. TL-GDBN: Growing deep belief network with transfer learning
Jia Investigation into the effectiveness of long short term memory networks for stock price prediction
Bawa et al. Linearized sigmoidal activation: A novel activation function with tractable non-linear characteristics to boost representation capability
CN109829541A (en) Deep neural network incremental training method and system based on learning automaton
Kong et al. Hexpo: A vanishing-proof activation function
Zhang et al. Evolving neural network classifiers and feature subset using artificial fish swarm
CN108009635A (en) A kind of depth convolutional calculation model for supporting incremental update
Hu et al. A dynamic rectified linear activation units
Yonekawa et al. A ternary weight binary input convolutional neural network: Realization on the embedded processor
CN111382840B (en) HTM design method based on cyclic learning unit and oriented to natural language processing
Chen et al. CNN-based broad learning with efficient incremental reconstruction model for facial emotion recognition
Roudi et al. Learning with hidden variables
Dudek Data-driven randomized learning of feedforward neural networks
CN111144500A (en) Differential privacy deep learning classification method based on analytic Gaussian mechanism
Lagani et al. Training convolutional neural networks with competitive hebbian learning approaches
Liu et al. Comparison and evaluation of activation functions in term of gradient instability in deep neural networks
Qiao et al. SRS-DNN: a deep neural network with strengthening response sparsity
Deng et al. Ensemble SVR for prediction of time series
Chen et al. Deep sparse autoencoder network for facial emotion recognition
Talafha et al. Biologically inspired sleep algorithm for variational auto-encoders
Zhang et al. Weight asynchronous update: Improving the diversity of filters in a deep convolutional network
Ben-Bright et al. Taxonomy and a theoretical model for feedforward neural networks
Abramova et al. Research of the extreme learning machine as incremental learning
CN114004353A (en) Optical neural network chip construction method and system for reducing number of optical devices

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant