CN110782030A - Deep learning weight updating method, system, computer device and storage medium - Google Patents

Deep learning weight updating method, system, computer device and storage medium

Info

Publication number
CN110782030A
Authority
CN
China
Prior art keywords
neural network
weight vector
updating
training
network model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910872174.XA
Other languages
Chinese (zh)
Inventor
王健宗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910872174.XA priority Critical patent/CN110782030A/en
Priority to PCT/CN2019/117553 priority patent/WO2021051556A1/en
Publication of CN110782030A publication Critical patent/CN110782030A/en
Pending legal-status Critical Current

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/08 — Learning methods
    • G06N3/084 — Backpropagation, e.g. using gradient descent

Abstract

The embodiment of the invention provides a deep learning weight updating method based on parameter rewriting, which comprises the following steps: constructing a deep neural network model according to a plurality of neuron output functions; performing parameter updating on each weight vector in the deep neural network model to obtain each updated weight vector; inputting a training sample into the deep neural network model and obtaining a calculation output from the deep neural network model; and updating each weight vector according to the calculation output. With the embodiment of the invention, the weight parameters are rewritten, the limitation of batch normalization on the number of samples is avoided, the convergence speed of the neural network is improved, and the training process of the neural network is accelerated.

Description

Deep learning weight updating method, system, computer device and storage medium
Technical Field
The embodiment of the invention relates to the field of artificial neural networks, and in particular to a method, a system, a computer device and a computer-readable storage medium for updating deep learning weights.
Background
Batch normalization is a common feature normalization method used when training a neural network model: the sample data are normalized using their mean and variance, so that the data distribution is improved and the training speed of the neural network is accelerated. However, batch normalization imposes a limit on the number of training samples; when the number of samples is 1, batch normalization does not work.
The use of batch normalization requires storing the mean and variance of each mini-batch at every time step, which is inefficient, occupies memory, and to some extent slows down the convergence of the neural network.
The invention therefore aims to solve the problems of the sample-number limitation of batch normalization and the slow convergence of the neural network.
Disclosure of Invention
In view of this, an object of the embodiments of the present invention is to provide a method, a system, a computer device and a computer-readable storage medium for updating deep learning weights based on parameter rewriting, which are free from the sample-number limitation of batch normalization and can accelerate the convergence of a neural network model.
In order to achieve the above object, an embodiment of the present invention provides a method for updating a deep learning weight, where the method includes:
constructing a deep neural network model according to a plurality of neuron output functions, wherein the output function of each neuron is y = Φ(WX + b), wherein y represents the output value of the corresponding neuron, Φ represents an excitation function, X represents multidimensional input features, W represents a weight vector, and b represents a deviation scalar of the corresponding neuron;
performing parameter updating on each weight vector in the deep neural network model to obtain each updated weight vector, wherein an updating formula for parameter updating is as follows:
W_n = (g / ||v_{n-1}||) · v_{n-1}
wherein W_n represents the updated weight vector of the corresponding neuron, v represents the unit vector of W_n, g represents the scalar of W_n, g = ||W_n||, and v_{n-1} represents the unit vector of each weight vector at the (n-1)-th training of the deep neural network model;
inputting a training sample into the deep neural network model, and obtaining calculation output from the deep neural network model;
and updating each weight vector according to the calculation output.
Further, before the step of constructing the deep neural network model according to the plurality of neuron output functions, the method further includes:
and initializing each weight vector W and each deviation scalar b.
Further, the step of updating each weight vector according to the calculation output includes:
calculating a training error by using the calculated output and a preset target output according to a training error formula, wherein the training error formula is as follows:
J(W) = (1/2) Σ_{k=1}^{c} (t_k - z_k)²
wherein J(W) represents the training error, t_k represents the target output of the k-th training, z_k represents the calculation output of the k-th training, k is a positive integer, and k = 1, 2 … c;
judging whether back propagation needs to be executed or not according to the training error;
and when the reverse propagation is not required to be executed, taking each weight vector as each weight vector after the deep neural network model is updated.
Further, the step of determining whether back propagation needs to be performed according to the training error includes: comparing the training error with a preset expected value; and
and when the training error is larger than the preset expected value, executing the back propagation to update each weight vector.
Further, after the step of comparing the training error with the preset expected value, the method further includes:
and when the training error is not greater than the preset expected value, obtaining each weight vector without executing the back propagation, and taking each weight vector as each weight vector after the deep neural network model is updated.
Further, when the training error is greater than the preset expected value, the step of performing the back propagation to update each weight vector further includes:
updating each weight vector according to a weight updating formula, wherein the weight updating formula is as follows: W(n+1) = W(n) + ΔW(n), where W(n) represents the weight vector of the corresponding neuron at the n-th training of the deep neural network model, W(n+1) represents the weight vector of the corresponding neuron at the (n+1)-th training of the deep neural network model, ΔW(n) represents the change of the weight vector of the corresponding neuron in the gradient descent direction at the n-th training of the deep neural network model, and η represents the learning rate, with
ΔW(n) = -η · ∂J(W)/∂W(n)
where ∂J(W)/∂W(n) represents the partial derivative of the training error with respect to the weight vector of the corresponding neuron.
Further, when the training error is greater than the preset expected value, after the step of performing the back propagation to update each weight vector, the method further includes:
updating each weight vector W according to the change values of the vector v and the scalar g, wherein the change value of the scalar g in the gradient descent direction is:
∇_g L = (∇_W L · v) / ||v||
wherein ∇_g L represents the partial derivative of the error function with respect to the parameter g and ∇_W L represents the partial derivative of the error function with respect to the weight W, and the change value of the vector v in the gradient descent direction is:
∇_v L = (g / ||v||) · ∇_W L - (g · ∇_g L / ||v||²) · v
where ∇_v L represents the partial derivative of the error function with respect to the parameter v.
In order to achieve the above object, an embodiment of the present invention further provides a deep learning weight updating system, including:
a building module, configured to construct a deep neural network model according to a plurality of neuron output functions, wherein the output function of each neuron is y = Φ(WX + b), wherein y represents the output value of the corresponding neuron, Φ represents an excitation function, X represents multidimensional input characteristics, W represents a weight vector, and b represents a deviation scalar of the corresponding neuron;
a parameter updating module, configured to perform parameter updating on each weight vector in the deep neural network model to obtain each updated weight vector, where an updating formula for parameter updating is as follows:
W_n = (g / ||v_{n-1}||) · v_{n-1}
wherein W_n represents the updated weight vector of the corresponding neuron, v represents the unit vector of W_n, g represents the scalar of W_n, g = ||W_n||, and v_{n-1} represents the unit vector of each weight vector at the (n-1)-th training of the deep neural network model;
the training module is used for inputting training samples into the deep neural network model and obtaining calculation output from the deep neural network model;
and the updating module is used for updating each weight vector according to the calculation output.
In order to achieve the above object, an embodiment of the present invention further provides a computer device, where the computer device includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor implements the steps of the deep learning weight updating method as described above when executing the computer program.
In order to achieve the above object, an embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored, where the computer program is executable by at least one processor to cause the at least one processor to execute the steps of the deep learning weight update method described above.
The deep learning weight updating method, the deep learning weight updating system, the computer equipment and the computer readable storage medium provided by the embodiment of the invention update the weight of the deep neural network model based on parameter rewriting, are free from the problem of limitation of batch normalization on the number of samples, and accelerate the convergence speed of the neural network model.
The invention is described in detail below with reference to the drawings and specific examples, but the invention is not limited thereto.
Drawings
Fig. 1 is a flowchart illustrating steps of a deep learning weight updating method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a program module of a deep learning weight update system according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a hardware structure of a computer device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The technical solutions of the various embodiments may be combined with each other, provided that such a combination can be realized by a person skilled in the art; when technical solutions are contradictory or a combination cannot be realized, the combination should be considered not to exist and does not fall within the protection scope of the present invention.
Example one
Referring to fig. 1, a flowchart illustrating a method for updating deep learning weights according to a first embodiment of the present invention is shown. It is to be understood that the flow charts in the embodiments of the present method are not intended to limit the order in which the steps are performed. The following description is given by taking a computer device as an execution subject, specifically as follows:
and S100, constructing a deep neural network model according to a plurality of neuron output functions.
Specifically, the output function of each neuron is y = Φ(WX + b), where y represents the output value of the neuron, Φ represents an excitation function, X represents a multi-dimensional input feature, W represents a weight vector, i.e., the weights that the inputs carry at the neuron, and b represents a deviation scalar of the neuron. The neuron produces an output when its input exceeds the threshold of the excitation function.
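Purely as an illustrative sketch of the output function described above (the tanh excitation function, the input dimension and the initialization scale are assumptions made for this example, not features fixed by the embodiment), a single neuron output may be computed as follows:

```python
import numpy as np

def neuron_output(W, x, b, phi=np.tanh):
    """y = phi(W·X + b) for one neuron: W is the weight vector, x the
    multidimensional input feature, b the deviation (bias) scalar and
    phi the excitation function (tanh here is an assumption)."""
    return phi(np.dot(W, x) + b)

# Hypothetical usage with assumed dimensions
x = np.array([0.2, -0.5, 0.7])   # multidimensional input feature
W = np.random.randn(3) * 0.1     # randomly initialized weight vector
b = 0.0                          # initialized deviation scalar
y = neuron_output(W, x, b)
```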
Usually, a neural network is composed of an input layer, one or more hidden layers and an output layer, and the number of the hidden layers of the deep neural network is greater than or equal to 2.
In a preferred embodiment, before the deep neural network model is constructed according to a plurality of neuron output functions, the weight vectors W and the bias scalars b are initialized, and the initialization refers to randomly taking values for the weight vectors W and the bias scalars b within a preset value range.
Step S102, performing parameter updating on each weight vector in the deep neural network model to obtain each updated weight vector.
Specifically, the update formula for updating the parameters is as follows:
W_n = (g / ||v_{n-1}||) · v_{n-1}
wherein W_n represents the updated weight vector of the corresponding neuron, v represents the unit vector of W_n, g represents the scalar of W_n, and g = ||W_n||; v_{n-1} represents the unit vector of each weight vector at the (n-1)-th training of the deep neural network model, v_0 is the initial value of the unit vector v, and g / ||v_0|| is also the initial coefficient of W_n. In this embodiment, v_0 takes the value of v when the weight vector W is initialized.
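For illustration, the parameter rewriting above has the same form as the weight-normalization reparameterization W = (g/||v||)·v discussed in the non-patent literature cited below; a minimal sketch under that reading (NumPy, with v_0 taken from the initialized weight vector as in this embodiment) might be:

```python
import numpy as np

def rewrite_weight(v_prev, g):
    """Parameter rewriting W_n = (g / ||v_{n-1}||) * v_{n-1}."""
    return (g / np.linalg.norm(v_prev)) * v_prev

# v_0 is taken from the weight vector W at initialization, as in this embodiment
W_init = np.random.randn(3) * 0.1
v0 = W_init / np.linalg.norm(W_init)   # unit vector of the initialized W
g0 = np.linalg.norm(W_init)            # scalar g = ||W||
W1 = rewrite_weight(v0, g0)            # updated (rewritten) weight vector
```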
Step S104, inputting the training sample into the deep neural network model, and obtaining the calculation output from the deep neural network model.
Specifically, forward propagation calculation is performed using the weight vectors to obtain the calculation output. Forward propagation calculation means that the training samples are propagated forward, layer by layer, through the deep neural network model, and the calculation output is then produced by the output layer.
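A sketch of such layer-by-layer forward propagation is given below; the three-layer architecture, the tanh excitation function and the parameter scales are assumptions for the example only:

```python
import numpy as np

def forward(x, layers, phi=np.tanh):
    """Propagate a training sample forward, layer by layer, through (W, b) pairs;
    the output of the last layer is the calculation output."""
    a = x
    for W, b in layers:
        a = phi(W @ a + b)   # every neuron computes y = phi(WX + b)
    return a

# Assumed toy architecture: 3 inputs -> two hidden layers of 4 -> 1 output
rng = np.random.default_rng(0)
layers = [(rng.normal(scale=0.1, size=(4, 3)), np.zeros(4)),
          (rng.normal(scale=0.1, size=(4, 4)), np.zeros(4)),
          (rng.normal(scale=0.1, size=(1, 4)), np.zeros(1))]
z = forward(np.array([0.2, -0.5, 0.7]), layers)   # calculation output
```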
In a preferred embodiment, in the step of updating the weight vectors according to the calculated output, the calculated output and a preset target output are further input into a preset training error formula to calculate a training error, where the training error formula is:
J(W) = (1/2) Σ_{k=1}^{c} (t_k - z_k)²
wherein W represents the corresponding weight vector, J(W) represents the training error, t_k represents the target output of the k-th training, z_k represents the calculation output of the k-th training, k is a positive integer, and k = 1, 2 … c. Whether back propagation needs to be performed is then judged according to the training error, and when back propagation does not need to be performed, each weight vector is taken as the updated weight vector of the deep neural network model. Illustratively, in the 1st training, the preset target output is 0.5, the calculation output is 0.4, and the training error is J(W) = 1/2 × (0.5 - 0.4)² = 0.005.
In another preferred embodiment, the training error is compared with a preset expected value before determining whether back propagation is required according to the training error. If the training error is greater than the preset expected value, back propagation is required; if the training error is not greater than the preset expected value, training of the deep neural network is stopped and each weight vector is taken as the updated weight vector of the deep neural network model. Illustratively, in the 1st training, the training error is 0.005 and the preset expected value is 0.1; since the training error is not greater than the expected value, training of the deep neural network is stopped and each weight vector is the updated weight vector of the deep neural network.
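A minimal sketch of this error check follows; the sum-of-squares form reproduces the training error formula, and the expected value 0.1 and the target/output pair are taken from the example above:

```python
import numpy as np

def training_error(targets, outputs):
    """J(W) = 1/2 * sum_k (t_k - z_k)^2 over the c outputs."""
    t = np.asarray(targets, dtype=float)
    z = np.asarray(outputs, dtype=float)
    return 0.5 * np.sum((t - z) ** 2)

expected_value = 0.1                  # preset expected value (from the example)
J = training_error([0.5], [0.4])      # 0.005, as in the 1st-training example
need_backprop = J > expected_value    # False here: stop training, keep the weights
```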
In another preferred embodiment, if the training error is greater than the preset expected value, back propagation needs to be performed, and each weight vector is updated according to a weight updating formula, wherein the weight updating formula is: W(n+1) = W(n) + ΔW(n), where W(n) represents the weight vector of the corresponding neuron at the n-th training of the deep neural network model, W(n+1) represents the weight vector of the corresponding neuron at the (n+1)-th training of the deep neural network model, ΔW(n) represents the change of the weight vector of the corresponding neuron in the gradient descent direction at the n-th training of the deep neural network model, and η represents the learning rate, with
ΔW(n) = -η · ∂J(W)/∂W(n)
where ∂J(W)/∂W(n) represents the partial derivative of the training error with respect to the weight vector of the corresponding neuron.
It should be noted that the gradient descent direction refers to the training direction that makes the training error fall below the expected value in the shortest time. Back propagation returns the training error to every neuron of every layer, the partial derivative is solved from the training error and the weight of each neuron, and each weight vector is updated according to the solution of the partial derivative.
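The following sketch illustrates the update W(n+1) = W(n) + ΔW(n) with ΔW(n) = -η·∂J/∂W; the finite-difference gradient helper is only an illustrative stand-in for the partial derivatives that back propagation would return, and the toy neuron and learning rate are assumptions:

```python
import numpy as np

def numerical_grad(loss_fn, W, eps=1e-6):
    """Finite-difference estimate of dJ/dW, an illustrative stand-in for the
    partial derivatives that back propagation would return to each neuron."""
    grad = np.zeros_like(W)
    for i in range(W.size):
        d = np.zeros_like(W)
        d.flat[i] = eps
        grad.flat[i] = (loss_fn(W + d) - loss_fn(W - d)) / (2 * eps)
    return grad

def weight_update(W, loss_fn, lr=0.1):
    """W(n+1) = W(n) + ΔW(n), with ΔW(n) = -lr * dJ/dW(n)."""
    return W - lr * numerical_grad(loss_fn, W)

# Assumed toy case: one neuron, squared error against target t = 0.5
x, t = np.array([0.2, -0.5, 0.7]), 0.5
loss = lambda W: 0.5 * (t - np.tanh(W @ x)) ** 2
W_next = weight_update(np.random.randn(3) * 0.1, loss)
```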
In another preferred embodiment, if the training error is greater than the preset expected value, back propagation needs to be performed, and each weight vector may be further updated according to the change values of the vector v and the scalar g, where the change value of the scalar g in the gradient descent direction is:
∇_g L = (∇_W L · v) / ||v||
wherein ∇_g L represents the partial derivative of the error function with respect to the parameter g and ∇_W L represents the partial derivative of the error function with respect to the weight W, and the change value of the vector v in the gradient descent direction is:
∇_v L = (g / ||v||) · ∇_W L - (g · ∇_g L / ||v||²) · v
wherein ∇_v L represents the partial derivative of the error function with respect to the parameter v. Since the weight W has been rewritten in terms of the parameters, the change originally calculated for the weight W can be converted into changes of the parameters v and g.
Illustratively, when the back propagation calculation is performed, the change value of the scalar g and the change value of the vector v are obtained by evaluating the partial derivative of the error function with respect to the parameter g and the partial derivative of the error function with respect to the parameter v. The scalar g and the vector v are then updated with these change values, and finally each weight vector is updated according to the updated scalar g and vector v.
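A sketch of this step is given below: it computes ∇_g L and ∇_v L from ∇_W L according to the formulas above, updates g and v, and rebuilds the weight vector; the learning-rate value and the plain gradient-descent update are assumptions for the example:

```python
import numpy as np

def update_g_and_v(v, g, grad_W, lr=0.1):
    """Update the scalar g and the vector v from the gradient with respect to the
    rewritten weight W = (g / ||v||) * v, then rebuild the weight vector."""
    norm_v = np.linalg.norm(v)
    grad_g = np.dot(grad_W, v) / norm_v                            # ∇_g L
    grad_v = (g / norm_v) * grad_W - (g * grad_g / norm_v**2) * v  # ∇_v L
    g_new = g - lr * grad_g
    v_new = v - lr * grad_v
    W_new = (g_new / np.linalg.norm(v_new)) * v_new                # rebuilt weight
    return g_new, v_new, W_new
```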
In another preferred embodiment, after each weight vector has been updated according to the gradients of the vector v and the scalar g, the deep neural network model continues to be trained with the updated weight vectors to obtain a corresponding calculation output, and the corresponding training error is then recalculated from the calculation output and the target output according to the training error formula. Training of the neural network is stopped when the training error is not greater than the preset expected value or when the number of training iterations reaches the preset number.
Step S106, updating each weight vector according to the calculation output.
Specifically, calculation output is obtained from the deep neural network model, and then each weight vector is updated according to the calculation output.
Illustratively, a deep neural network is used to classify the blue points and red points in a certain image data set. When the weight vectors are assigned by a random initialization method, their values are drawn from a standard normal distribution, and training the deep neural network with these weight vectors yields the following training effect: the gradient descent takes 41.9968 s and the classification accuracy is 93%. When the present weight updating method is used to update each weight vector at every iteration of the deep neural network, the training effect obtained is: the gradient descent takes 40.8717 s, which is 1.12 s faster than before, and the classification accuracy is 96%, which is 3% higher than before.
The invention updates the weight of the deep neural network model based on parameter rewriting, is not limited by batch normalization to the number of samples, and can accelerate the convergence rate of the neural network model.
Example two
Please refer to fig. 2, which illustrates a schematic diagram of a processing module of a deep learning weight updating system according to a second embodiment of the present invention. In this embodiment, the deep learning weight update system 20 may include or be divided into one or more program modules, and the one or more program modules are stored in a storage medium and executed by one or more processors to implement the deep learning weight update method. The program module referred to in the embodiments of the present invention refers to a series of computer program instruction segments capable of performing specific functions, and is more suitable for describing the execution process of the deep learning weight updating system 20 in the storage medium than the program itself. The following description will specifically describe the functions of the program modules of the present embodiment:
a building module 200, configured to build a deep neural network model according to the plurality of neuron output functions.
Specifically, the output function of each neuron is y = Φ(WX + b), where y represents the output value of the neuron, Φ is an excitation function, X represents a multidimensional input feature, W is a weight vector, i.e., the weights that the inputs carry at the neuron, and b represents a deviation scalar of the neuron. The neuron produces an output when its input exceeds the threshold of the excitation function.
Usually, a neural network is composed of an input layer, one or more hidden layers and an output layer, and the number of the hidden layers of the deep neural network is greater than or equal to 2.
In a preferred embodiment, before the deep neural network model is constructed according to a plurality of neuron output functions, the weight vectors W and the bias scalars b are initialized, and the initialization refers to randomly taking values for the weight vectors W and the bias scalars b within a preset value range.
A parameter updating module 202, configured to perform parameter updating on each weight vector in the deep neural network model to obtain each updated weight vector.
Specifically, the update formula for updating the parameters is as follows:
W_n = (g / ||v_{n-1}||) · v_{n-1}
wherein W_n represents the updated weight vector of the corresponding neuron, v represents the unit vector of W_n, g represents the scalar of W_n, and g = ||W_n||; v_{n-1} represents the unit vector of each weight vector at the (n-1)-th training of the deep neural network model, v_0 is the initial value of the unit vector v, and g / ||v_0|| is also the initial coefficient of W_n. In this embodiment, v_0 takes the value of v when the weight vector W is initialized.
And the training module 204 is configured to input a training sample into the deep neural network model, and obtain a calculation output from the deep neural network model.
Specifically, the training module 204 performs forward propagation calculation using each weight vector to obtain the calculation output. Forward propagation calculation means that the training samples are propagated forward, layer by layer, through the deep neural network model, and the calculation output is then produced by the output layer.
In a preferred embodiment, in the step of updating the weight vectors according to the calculated output, the training module 204 further inputs the calculated output and a preset target output into a preset training error formula to calculate a training error, where the training error formula is:
J(W) = (1/2) Σ_{k=1}^{c} (t_k - z_k)²
wherein W represents the corresponding weight vector, J(W) represents the training error, t_k represents the target output of the k-th training, z_k represents the calculation output of the k-th training, k is a positive integer, and k = 1, 2 … c. Whether back propagation needs to be performed is then judged according to the training error, and when back propagation does not need to be performed, each weight vector is taken as the updated weight vector of the deep neural network model. Illustratively, in the 1st training, the preset target output is 0.5, the calculation output is 0.4, and the training error is J(W) = 1/2 × (0.5 - 0.4)² = 0.005.
In another preferred embodiment, the training module 204 further compares the training error with a preset expected value before determining whether back propagation is required according to the training error. If the training error is greater than the preset expected value, back propagation is required; if the training error is not greater than the preset expected value, training of the deep neural network is stopped and each weight vector W is taken as the updated weight vector of the deep neural network model. Illustratively, in the 1st training, the training error is 0.005 and the preset expected value is 0.1; since the training error is not greater than the expected value, training of the deep neural network is stopped and each weight vector is the updated weight vector of the deep neural network.
In another preferred embodiment, if the training error is greater than the preset expected value, back propagation needs to be performed, and the training module 204 updates each weight vector according to a weight update formula, wherein the weight update formula is: W(n+1) = W(n) + ΔW(n), where W(n) represents the weight vector of the corresponding neuron at the n-th training of the deep neural network model, W(n+1) represents the weight vector of the corresponding neuron at the (n+1)-th training of the deep neural network model, ΔW(n) represents the change of the weight vector of the corresponding neuron in the gradient descent direction at the n-th training of the deep neural network model, and η represents the learning rate, with
ΔW(n) = -η · ∂J(W)/∂W(n)
where ∂J(W)/∂W(n) represents the partial derivative of the training error with respect to the weight vector of the corresponding neuron.
It should be noted that the gradient descent direction refers to the training direction that makes the training error fall below the expected value in the shortest time. Back propagation returns the training error to every neuron of every layer, the partial derivative is solved from the training error and the weight of each neuron, and each weight vector is updated according to the solution of the partial derivative.
In another preferred embodiment, if the training error is greater than the preset expected value and back propagation needs to be performed, the training module 204 may further update each weight vector according to the change values of the vector v and the scalar g, where the change value of the scalar g in the gradient descent direction is:
∇_g L = (∇_W L · v) / ||v||
wherein ∇_g L represents the partial derivative of the error function with respect to the parameter g and ∇_W L represents the partial derivative of the error function with respect to the weight W, and the change value of the vector v in the gradient descent direction is:
∇_v L = (g / ||v||) · ∇_W L - (g · ∇_g L / ||v||²) · v
wherein ∇_v L represents the partial derivative of the error function with respect to the parameter v. Since parameter rewriting has been performed on the weight W, the change of the original weight W can be converted into changes of the parameters v and g.
Illustratively, when performing the back propagation calculation, the training module 204 obtains the change value of the scalar g and the change value of the vector v by evaluating the partial derivative of the error function with respect to the parameter g and the partial derivative of the error function with respect to the parameter v. The scalar g and the vector v are then updated with these change values, and finally each weight vector is updated according to the updated scalar g and vector v.
In another preferred embodiment, after each weight vector has been updated according to the gradients of the vector v and the scalar g, the training module 204 continues training the deep neural network model with the updated weight vectors to obtain a corresponding calculation output, and the corresponding training error is then recalculated from the calculation output and the target output according to the training error formula. Training of the neural network is stopped when the training error is not greater than the preset expected value or when the number of training iterations reaches the preset number.
An updating module 206, configured to update each weight vector according to the calculation output.
Specifically, the updating module 206 obtains the calculation output from the deep neural network model, and then updates each weight vector according to the calculation output.
Illustratively, a deep neural network is used to classify the blue points and red points in a certain image data set. When the weight vectors are assigned by a random initialization method, their values are drawn from a standard normal distribution, and training the deep neural network with these weight vectors yields the following training effect: the gradient descent takes 41.9968 s and the classification accuracy is 93%. When the present weight updating method is used to update each weight vector at every iteration of the deep neural network, the training effect obtained is: the gradient descent takes 40.8717 s, which is 1.12 s faster than before, and the classification accuracy is 96%, which is 3% higher than before.
The invention updates the weight of the deep neural network model based on parameter rewriting, is not limited by batch normalization to the number of samples, and can accelerate the convergence rate of the neural network model.
EXAMPLE III
Fig. 3 is a schematic diagram of a hardware architecture of a computer device according to a third embodiment of the present invention. In the present embodiment, the computer device 2 is a device capable of automatically performing numerical calculation and/or information processing in accordance with preset or stored instructions. The computer device 2 may be a rack server, a blade server, a tower server or a cabinet server (including an independent server or a server cluster composed of a plurality of servers), and the like. As shown in fig. 3, the computer device 2 includes, but is not limited to, at least a memory 21, a processor 22, a network interface 23, and a deep learning weight updating system 20, which are communicatively connected to each other through a system bus. Wherein:
in this embodiment, the memory 21 includes at least one type of computer-readable storage medium including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the storage 21 may be an internal storage unit of the computer device 2, such as a hard disk or a memory of the computer device 2. In other embodiments, the memory 21 may also be an external storage device of the computer device 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like provided on the computer device 2. Of course, the memory 21 may also comprise both internal and external memory units of the computer device 2. In this embodiment, the memory 21 is generally used to store an operating system and various application software installed on the computer device 2, for example, the program code of the deep learning weight updating system 20 in the second embodiment. Further, the memory 21 may also be used to temporarily store various types of data that have been output or are to be output.
Processor 22 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 22 is typically used to control the overall operation of the computer device 2. In this embodiment, the processor 22 is configured to run a program code stored in the memory 21 or process data, for example, run the deep learning weight updating system 20, so as to implement the deep learning weight updating method of the first embodiment.
The network interface 23 may comprise a wireless network interface or a wired network interface, and the network interface 23 is generally used for establishing a communication connection between the computer device 2 and other electronic apparatuses. For example, the network interface 23 is used to connect the computer device 2 to an external terminal through a network, and to establish a data transmission channel and a communication connection between the computer device 2 and the external terminal. The network may be a wireless or wired network such as an Intranet, the Internet, the Global System for Mobile Communications (GSM), Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, Wi-Fi, and the like.
It is noted that fig. 3 only shows the computer device 2 with components 20-23, but it is to be understood that not all shown components are required to be implemented, and that more or less components may be implemented instead.
In this embodiment, the deep learning weight update system 20 stored in the memory 21 can be further divided into one or more program modules, and the one or more program modules are stored in the memory 21 and executed by one or more processors (in this embodiment, the processor 22) to complete the present invention.
For example, fig. 2 shows a schematic diagram of the program modules for implementing the deep learning weight updating system 20. In this embodiment, the deep learning weight updating system 20 may be divided into a building module 200, a parameter updating module 202, a training module 204, and an updating module 206. The program module referred to in the present invention refers to a series of computer program instruction segments capable of performing specific functions, and is more suitable than the program itself for describing the execution process of the deep learning weight updating system 20 in the computer device 2. The specific functions of the program modules 200 to 206 have been described in detail in the second embodiment and are not repeated here.
Example four
The present embodiment also provides a computer-readable storage medium, such as a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, an App application mall, etc., on which a computer program is stored, which when executed by a processor implements corresponding functions. The computer-readable storage medium of this embodiment is used for storing a deep learning weight updating system 20, and when being executed by a processor, the deep learning weight updating system implements the deep learning weight updating method of the first embodiment.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A method for updating deep learning weights, characterized in that the method comprises the following steps:
constructing a deep neural network model according to a plurality of neuron output functions, wherein the output function of each neuron is y = Φ(WX + b), wherein y represents the output value of the corresponding neuron, Φ represents an excitation function, X represents multidimensional input features, W represents a weight vector, and b represents a deviation scalar of the corresponding neuron;
performing parameter updating on each weight vector in the deep neural network model to obtain each updated weight vector, wherein an updating formula for parameter updating is as follows:
W_n = (g / ||v_{n-1}||) · v_{n-1}
wherein W_n represents the updated weight vector of the corresponding neuron, v represents the unit vector of W_n, g represents the scalar of W_n, g = ||W_n||, and v_{n-1} represents the unit vector of each weight vector at the (n-1)-th training of the deep neural network model;
inputting a training sample into the deep neural network model, and obtaining calculation output from the deep neural network model;
and updating each weight vector according to the calculation output.
2. The method for updating deep learning weights according to claim 1, wherein before the step of constructing the deep neural network model according to the plurality of neuron output functions, the method further comprises:
and initializing each weight vector and each deviation scalar.
3. The method for updating deep learning weights according to claim 1, wherein the step of updating the weight vectors according to the computation output comprises:
calculating a training error by using the calculated output and a preset target output according to a training error formula, wherein the training error formula is as follows:
J(W) = (1/2) Σ_{k=1}^{c} (t_k - z_k)²
wherein J(W) represents the training error, t_k represents the target output of the k-th training of the deep neural network model, z_k represents the calculation output of the k-th training of the deep neural network model, k is a positive integer, and k = 1, 2 … c;
judging whether back propagation needs to be executed or not according to the training error;
and when the reverse propagation is not required to be executed, taking each weight vector as each weight vector after the deep neural network model is updated.
4. The method of claim 3, wherein the step of determining whether back propagation needs to be performed according to the training error comprises:
comparing the training error with a preset expected value; and
and when the training error is larger than the preset expected value, executing the back propagation to update each weight vector.
5. The method of claim 4, wherein the step of comparing the training error with a preset expected value is followed by:
and when the training error is not greater than the preset expected value, obtaining each weight vector without executing the back propagation, and taking each weight vector as each weight vector after the deep neural network model is updated.
6. The method of claim 4, wherein the step of performing the back propagation to update the weight vectors when the training error is greater than the predetermined expected value comprises:
updating each weight vector according to a weight updating formula, wherein the weight updating formula is: W(n+1) = W(n) + ΔW(n), where W(n) represents the weight vector of the corresponding neuron at the n-th training of the deep neural network model, W(n+1) represents the weight vector of the corresponding neuron at the (n+1)-th training of the deep neural network model, ΔW(n) represents the change of the weight vector of the corresponding neuron in the gradient descent direction at the n-th training of the deep neural network model, and η represents the learning rate, with
ΔW(n) = -η · ∂J(W)/∂W(n)
where ∂J(W)/∂W(n) represents the partial derivative of the training error with respect to the weight vector of the corresponding neuron.
7. The method of claim 4, wherein the step of performing the back propagation to update the weight vectors when the training error is greater than the preset expected value further comprises:
updating each weight vector according to the vector v and the change value of the scalar g, wherein the change value of the scalar g in the gradient descending direction is as follows:
∇_g L = (∇_W L · v) / ||v||
wherein ∇_g L represents the partial derivative of the error function with respect to the parameter g and ∇_W L represents the partial derivative of the error function with respect to the weight W, and the change value of the vector v in the gradient descent direction is:
∇_v L = (g / ||v||) · ∇_W L - (g · ∇_g L / ||v||²) · v
where ∇_v L represents the partial derivative of the error function with respect to the parameter v.
8. A deep learning weight updating system is characterized by comprising:
a building module, configured to construct a deep neural network model according to a plurality of neuron output functions, wherein the output function of each neuron is y = Φ(WX + b), wherein y represents the output value of the corresponding neuron, Φ represents an excitation function, X represents multidimensional input characteristics, W represents a weight vector, and b represents a deviation scalar of the corresponding neuron;
a parameter updating module, configured to perform parameter updating on each weight vector in the deep neural network model to obtain each updated weight vector, where an updating formula for parameter updating is as follows:
W_n = (g / ||v_{n-1}||) · v_{n-1}
wherein W_n represents the updated weight vector of the corresponding neuron, v represents the unit vector of W_n, g represents the scalar of W_n, g = ||W_n||, and v_{n-1} represents the unit vector of each weight vector at the (n-1)-th training of the deep neural network model;
the training module is used for inputting training samples into the deep neural network model and obtaining calculation output from the deep neural network model;
and the updating module is used for updating each weight vector according to the calculation output.
9. A computer device, characterized in that the computer device comprises a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the deep learning weight updating method according to any one of claims 1 to 7.
10. A computer-readable storage medium, having stored therein a computer program, the computer program being executable by at least one processor to cause the at least one processor to perform the steps of the deep learning weight update method according to any one of claims 1-7.
CN201910872174.XA 2019-09-16 2019-09-16 Deep learning weight updating method, system, computer device and storage medium Pending CN110782030A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910872174.XA CN110782030A (en) 2019-09-16 2019-09-16 Deep learning weight updating method, system, computer device and storage medium
PCT/CN2019/117553 WO2021051556A1 (en) 2019-09-16 2019-11-12 Deep learning weight updating method and system, and computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910872174.XA CN110782030A (en) 2019-09-16 2019-09-16 Deep learning weight updating method, system, computer device and storage medium

Publications (1)

Publication Number Publication Date
CN110782030A true CN110782030A (en) 2020-02-11

Family

ID=69383461

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910872174.XA Pending CN110782030A (en) 2019-09-16 2019-09-16 Deep learning weight updating method, system, computer device and storage medium

Country Status (2)

Country Link
CN (1) CN110782030A (en)
WO (1) WO2021051556A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111340205A (en) * 2020-02-18 2020-06-26 中国科学院微小卫星创新研究院 Anti-irradiation system and method of neural network chip for space application
CN111860828A (en) * 2020-06-15 2020-10-30 北京仿真中心 Neural network training method, storage medium and equipment
CN113505832A (en) * 2021-07-09 2021-10-15 合肥云诊信息科技有限公司 BGRN normalization method for batch grouping response of neural network
CN113642592A (en) * 2020-04-27 2021-11-12 武汉Tcl集团工业研究院有限公司 Training method of training model, scene recognition method and computer equipment
CN114979033A (en) * 2022-06-13 2022-08-30 华北理工大学 Intranet neural computing system based on programmable data plane

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107392310A (en) * 2016-05-16 2017-11-24 北京陌上花科技有限公司 neural network model training method and device
WO2019056470A1 (en) * 2017-09-19 2019-03-28 平安科技(深圳)有限公司 Driving model training method, driver recognition method and apparatus, device, and medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106997484A (en) * 2016-01-26 2017-08-01 阿里巴巴集团控股有限公司 A kind of method and device for optimizing user credit model modeling process
CN109472345A (en) * 2018-09-28 2019-03-15 深圳百诺名医汇网络技术有限公司 A kind of weight update method, device, computer equipment and storage medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107392310A (en) * 2016-05-16 2017-11-24 北京陌上花科技有限公司 neural network model training method and device
WO2019056470A1 (en) * 2017-09-19 2019-03-28 平安科技(深圳)有限公司 Driving model training method, driver recognition method and apparatus, device, and medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TIM SALIMANS et al.: "Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks", arXiv, page 2 *
ZHAO ZHANGMING et al.: "Ant colony neural network training algorithm with heuristic information", Computer Science, vol. 44, no. 11, page 285 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111340205A (en) * 2020-02-18 2020-06-26 中国科学院微小卫星创新研究院 Anti-irradiation system and method of neural network chip for space application
CN111340205B (en) * 2020-02-18 2023-05-12 中国科学院微小卫星创新研究院 Neural network chip anti-irradiation system and method for space application
CN113642592A (en) * 2020-04-27 2021-11-12 武汉Tcl集团工业研究院有限公司 Training method of training model, scene recognition method and computer equipment
CN111860828A (en) * 2020-06-15 2020-10-30 北京仿真中心 Neural network training method, storage medium and equipment
CN111860828B (en) * 2020-06-15 2023-11-28 北京仿真中心 Neural network training method, storage medium and equipment
CN113505832A (en) * 2021-07-09 2021-10-15 合肥云诊信息科技有限公司 BGRN normalization method for batch grouping response of neural network
CN113505832B (en) * 2021-07-09 2023-10-10 合肥云诊信息科技有限公司 BGRN normalization method for neural network batch grouping response of image classification task
CN114979033A (en) * 2022-06-13 2022-08-30 华北理工大学 Intranet neural computing system based on programmable data plane

Also Published As

Publication number Publication date
WO2021051556A1 (en) 2021-03-25

Similar Documents

Publication Publication Date Title
CN110782030A (en) Deep learning weight updating method, system, computer device and storage medium
CN111445007B (en) Training method and system for countermeasure generation neural network
CN108733508B (en) Method and system for controlling data backup
WO2022095432A1 (en) Neural network model training method and apparatus, computer device, and storage medium
CN112101530A (en) Neural network training method, device, equipment and storage medium
WO2020224106A1 (en) Text classification method and system based on neural network, and computer device
CN110705718A (en) Model interpretation method and device based on cooperative game and electronic equipment
CN112232426A (en) Training method, device and equipment of target detection model and readable storage medium
EP4343616A1 (en) Image classification method, model training method, device, storage medium, and computer program
CN111126555A (en) Neural network model training method, device, equipment and storage medium
CN110414620B (en) Semantic segmentation model training method, computer equipment and storage medium
CN113743650B (en) Power load prediction method, device, equipment and storage medium
US20210037084A1 (en) Management device, management method, and management program
CN113011532A (en) Classification model training method and device, computing equipment and storage medium
US20220405561A1 (en) Electronic device and controlling method of electronic device
TWI767122B (en) Model constructing method, system, and non-transitory computer readable storage medium
CN110782017B (en) Method and device for adaptively adjusting learning rate
Barbot et al. Importance sampling for model checking of continuous time markov chains
CN113591398B (en) Intelligent operation batch method and device based on deep reinforcement learning and electronic equipment
WO2018198298A1 (en) Parameter estimation device, parameter estimation method, and computer-readable recording medium
Matsubara et al. Dynamic linear bellman combination of optimal policies for solving new tasks
Al-Dabbagh et al. An integration of compact Genetic algorithm and local search method for optimizing ARMA (1, 1) model of likelihood estimator
WO2019194285A1 (en) Calculation device, calculation method, and calculation program
WO2020170358A1 (en) Tensor decomposition processing system, method and program
CN117475968A (en) Gamma voltage setting method and device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination