WO2019037409A1 - Neural network training system, method, and computer-readable storage medium - Google Patents

Neural network training system, method, and computer-readable storage medium

Info

Publication number
WO2019037409A1
Authority
WO
WIPO (PCT)
Prior art keywords
processing device
vector
neural network
weight vector
correction
Prior art date
Application number
PCT/CN2018/079500
Other languages
English (en)
French (fr)
Inventor
费旭东
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to CN201880025109.7A
Publication of WO2019037409A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Definitions

  • the present application relates to the field of machine learning algorithms, and in particular to a neural network training system, method and computer readable storage medium.
  • Neural networks, also known as artificial neural networks, are a common machine learning algorithm that has achieved great success in many fields such as speech recognition, image recognition, and natural language processing, and is still developing rapidly.
  • the neural network may generally include multiple weight vectors.
  • in a recognition operation, the data vector of the object to be identified may be input into the neural network, the neural network may calculate, based on the data vector and its own multiple weight vectors, the output vector corresponding to the data vector, and the neural network can identify the object to be identified based on the output vector.
  • the weight vector in the neural network in the initial state is unknown.
  • before the neural network can perform the recognition operation normally, the neural network in the initial state needs to be trained.
  • during training, a set of weight vectors can be randomly set for the neural network, and the weight vectors are corrected multiple times based on the results of the neural network's recognition operations on different data vectors under that set of weight vectors, until the neural network can obtain a near-ideal output vector for any data vector based on the corrected weight vectors.
  • in the related art, the neural network can be trained using a neural network training device, wherein the neural network training device can include a processor that needs to perform all operations involved in the neural network training process.
  • the training process of a neural network usually involves many different types of operations, such as vector dot product operations, nonlinear transformation operations, and weight vector correction operations; therefore, the processor in the related-art neural network training device generally needs to be capable of performing all of these operations. to satisfy this computational versatility, the circuit structure of the processor is usually complicated, which makes the processor less efficient and results in low training efficiency for the neural network.
  • the present application provides a neural network training system, method, and computer readable storage medium, which can solve the problem of low training efficiency of a neural network in the related art.
  • the technical solution is as follows:
  • a neural network training system comprising a first processing device and a second processing device, the first processing device being different from the second processing device;
  • the first processing device is configured to:
  • acquire a weight vector of a target neural network, and acquire N data vectors in a training set, where the training set includes multiple data vectors, and N is a positive integer greater than or equal to 1;
  • perform a first operation based on each of the N data vectors and the weight vector to obtain N sets of output values, the first operation including a vector dot product operation.
  • the second processing device is configured to:
  • acquire at least one correction value, where each correction value of the at least one correction value is used to correct one vector element in the weight vector and is calculated according to the N sets of output values;
  • correct the vector elements in the weight vector according to the at least one correction value to obtain a correction weight vector, and send the correction weight vector to the first processing device, where the correction weight vector is used to instruct the first processing device to perform the first operation based on N other data vectors in the training set and the correction weight vector, the N other data vectors being data vectors other than the N data vectors in the training set.
  • the present application sets a first processing device and a second processing device in a neural network training system, wherein the first processing device can perform the vector dot product operations in the neural network training process, and the second processing device can perform the weight vector correction operation and other types of operations in the neural network training process.
  • therefore, the first processing device may include only the special-purpose circuit required to perform a vector dot product operation, so that the first processing device has a relatively simple circuit structure and high computational efficiency; since most of the operations in the neural network training process are vector dot product operations, the higher efficiency of the first processing device improves the training efficiency of the neural network.
  • the second processing device, which performs the weight vector correction operation, can be configured according to different neural network training algorithms; therefore, the requirements of different neural network training algorithms can be satisfied, and the neural network training system is more flexible in training the neural network.
  • the neural network training system may, instead of calculating at least one correction value and correcting the weight vector for each individual data vector in the training set, perform the weight vector correction once for a batch of data vectors.
  • that is, the neural network training system can calculate the recognition operation results corresponding to each batch of data vectors, calculate accumulated correction values according to those results, and correct the weight vector according to the accumulated correction values, so that the number of times the second processing device sends the correction weight vector to the first processing device is reduced, thereby reducing the communication bandwidth required between the second processing device and the first processing device.
  • the correction value may be calculated by the first processing device, and the correction value may be sent to the second processing device, or the correction value may be calculated by the second processing device.
  • the first processing device is further configured to calculate the at least one correction value according to the N sets of output values, and send the calculated at least one correction value to the Said second processing device.
  • the second processing device is specifically configured to receive the at least one correction value sent by the first processing device.
  • the first processing device is further configured to send the N sets of output values to the second processing device when the correction value is calculated by the second processing device.
  • the second processing device is specifically configured to calculate the at least one correction value according to the N sets of output values.
  • in order to reduce the communication bandwidth between the first processing device and the second processing device, the second processing device is further configured to perform a first preset process on the correction weight vector to obtain a processed correction weight vector, where the data amount of the processed correction weight vector is smaller than the data amount of the correction weight vector.
  • the second processing device is specifically configured to send the processed correction weight vector to the first processing device.
  • the first processing device is further configured to receive the processed correction weight vector sent by the second processing device, and perform a second preset process on the processed correction weight vector to obtain the correction weight vector
  • the second preset process is an inverse process of the first preset process.
  • the first preset process includes at least one of a compression process and a quantization process.
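As an illustration of the first preset process (e.g. quantization) and its inverse second preset process, the following sketch uses 8-bit linear quantization; the scheme, array shapes, and function names are illustrative assumptions, not part of the application:

```python
import numpy as np

def quantize(vector, num_bits=8):
    # first preset process: map float weights to low-bit integer codes,
    # shrinking the data amount sent between the processing devices
    lo, hi = float(vector.min()), float(vector.max())
    scale = (hi - lo) / (2 ** num_bits - 1) or 1.0
    codes = np.round((vector - lo) / scale).astype(np.uint8)
    return codes, lo, scale

def dequantize(codes, lo, scale):
    # second preset process: the inverse of the first preset process
    return codes.astype(np.float64) * scale + lo
```

The rounding introduces an error of at most half a quantization step, which is the price paid for the reduced communication bandwidth.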
  • the first processing device is specifically configured to:
  • perform a neural network forward operation based on a first data vector and the weight vector to obtain a forward output vector of each layer of the target neural network, where the first data vector is any one of the N data vectors;
  • acquire an error vector, where the error vector is a difference vector between the forward output vector of the output layer of the target neural network and the ideal output vector corresponding to the first data vector in the training set;
  • perform a neural network inverse operation based on the error vector and the weight vector to obtain an inverse output vector of each layer of the target neural network; and
  • acquire the forward output vector of each layer of the target neural network and the inverse output vector of each layer of the target neural network as a set of output values corresponding to the first data vector.
  • the first processing device is further configured to calculate the at least one correction value according to the N sets of output values by using a formula, where the formula is:
  • Δw_ij = Σ_{b=1}^{N} X_pib · E_(p+1)jb
  • where Δw_ij represents the correction value of the weight vector element corresponding to the directed arc pointing from the ith node in the pth layer to the jth node in the (p+1)th layer of the target neural network, X_pib represents the ith vector element in the forward output vector of the pth layer of the target neural network in the bth of the N sets of output values, E_(p+1)jb represents the jth vector element in the inverse output vector of the (p+1)th layer of the target neural network in the bth of the N sets of output values, and i, j, b, and p are positive integers greater than or equal to 1;
  • the first processing device is further configured to send the calculated at least one correction value to the second processing device;
  • the second processing device is specifically configured to receive the at least one correction value sent by the first processing device.
  • the first processing device is further configured to send the N sets of output values to the second processing device;
  • the second processing device is specifically configured to calculate the at least one correction value according to the N sets of output values by using a formula, where the formula is:
  • Δw_ij = Σ_{b=1}^{N} X_pib · E_(p+1)jb
  • where Δw_ij represents the correction value of the weight vector element corresponding to the directed arc pointing from the ith node in the pth layer to the jth node in the (p+1)th layer of the target neural network, X_pib represents the ith vector element in the forward output vector of the pth layer of the target neural network in the bth of the N sets of output values, E_(p+1)jb represents the jth vector element in the inverse output vector of the (p+1)th layer of the target neural network in the bth of the N sets of output values, and i, j, b, and p are positive integers greater than or equal to 1.
  • the first processing device and the second processing device are integrated in one device.
  • the first processing device includes a second processor
  • the second processing device includes a third processor
  • the first processing device is a processing device that is composed of multiple computing nodes in a preset computing network
  • the second processing device is a processing device deployed at the edge of the cloud or a preset communication network.
  • the first operation further includes a nonlinear transform operation.
  • a neural network training method comprising:
  • the first processing device acquires a weight vector of the target neural network
  • the first processing device acquires N data vectors in the training set, where the training set includes multiple data vectors, and N is a positive integer greater than or equal to 1;
  • the first processing device performs a first operation based on each of the N data vectors and the weight vector to obtain N sets of output values, where the first operation includes a vector dot product operation;
  • the second processing device acquires at least one correction value, where each of the at least one correction value is used to correct one vector element in the weight vector, each correction value is calculated according to the N sets of output values, and the second processing device is different from the first processing device;
  • the second processing device corrects the vector element in the weight vector according to the at least one correction value to obtain a correction weight vector
  • the correction weight vector is used to instruct the first processing device to perform the first operation based on N other data vectors in the training set and the correction weight vector, the N other data vectors being data vectors other than the N data vectors in the training set.
  • the method further includes:
  • the first processing device calculates the at least one correction value according to the N sets of output values, and sends the calculated at least one correction value to the second processing device;
  • the second processing device acquires at least one correction value, including:
  • the second processing device receives the at least one correction value sent by the first processing device.
  • the method further includes:
  • the second processing device acquires at least one correction value, including:
  • the second processing device calculates the at least one correction value according to the N sets of output values.
  • the method further includes:
  • the second processing device performs a first preset process on the correction weight vector to obtain a processed correction weight vector, where the data amount of the processed correction weight vector is smaller than the data amount of the correction weight vector;
  • the method further includes:
  • the first processing device receives the processed correction weight vector sent by the second processing device, and performs a second preset process on the processed correction weight vector to obtain the correction weight vector, where the second preset process is an inverse process of the first preset process.
  • the first preset process includes at least one of a compression process and a quantization process.
  • the first processing device performs a first operation based on each of the N data vectors and the weight vector to obtain N sets of output values, including:
  • the first processing device performs a neural network forward operation based on the first data vector and the weight vector to obtain a forward output vector of each layer of the target neural network, where the neural network forward operation includes a vector dot product operation and a nonlinear transform operation, the first data vector being any one of the N data vectors;
  • the first processing device acquires an error vector, where the error vector is a difference vector between a forward output vector of an output layer of the target neural network and an ideal output vector corresponding to the first data vector in the training set;
  • the first processing device performs a neural network inverse operation based on the error vector and the weight vector to obtain an inverse output vector of each layer of the target neural network, and the neural network inverse operation includes a vector dot product operation;
  • the first processing device acquires a forward output vector of each layer of the target neural network and an inverse output vector of each layer of the target neural network as a set of output values corresponding to the first data vector.
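The four steps above (forward operation, error vector, inverse operation, and collection of the output values) can be sketched as follows; the list-of-weight-matrices representation, the tanh nonlinearity, and the simplified inverse operation (which omits the nonlinearity's derivative) are illustrative assumptions, not the application's prescribed implementation:

```python
import numpy as np

def forward(weights, offsets, x, f=np.tanh):
    # neural network forward operation: vector dot products plus a nonlinear
    # transform, yielding the forward output vector of each layer
    outputs = [x]
    for W, b in zip(weights, offsets):
        outputs.append(f(outputs[-1] @ W + b))
    return outputs

def error_vector(forward_outputs, ideal_output):
    # difference vector between the output layer's forward output vector
    # and the ideal output vector from the training set
    return forward_outputs[-1] - ideal_output

def inverse(weights, error):
    # neural network inverse operation (simplified): propagate the error vector
    # back through the transposed weights, yielding the inverse output vector
    # of each layer
    inverse_outputs = [error]
    for W in reversed(weights):
        inverse_outputs.append(inverse_outputs[-1] @ W.T)
    return list(reversed(inverse_outputs))
```

The forward output vectors and inverse output vectors returned here together form one set of output values for one data vector.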
  • the method further includes:
  • the first processing device calculates the at least one correction value according to the N sets of output values by using a formula, wherein the formula is:
  • Δw_ij = Σ_{b=1}^{N} X_pib · E_(p+1)jb
  • where Δw_ij represents the correction value of the weight vector element corresponding to the directed arc pointing from the ith node in the pth layer to the jth node in the (p+1)th layer of the target neural network, X_pib represents the ith vector element in the forward output vector of the pth layer of the target neural network in the bth of the N sets of output values, E_(p+1)jb represents the jth vector element in the inverse output vector of the (p+1)th layer of the target neural network in the bth of the N sets of output values, and i, j, b, and p are positive integers greater than or equal to 1;
  • the first processing device sends the calculated at least one correction value to the second processing device
  • the second processing device acquires at least one correction value, including:
  • the second processing device receives the at least one correction value sent by the first processing device.
  • the method further includes:
  • the second processing device acquires at least one correction value, including:
  • the second processing device calculates the at least one correction value according to the N sets of output values by using a formula, wherein the formula is:
  • Δw_ij = Σ_{b=1}^{N} X_pib · E_(p+1)jb
  • where Δw_ij represents the correction value of the weight vector element corresponding to the directed arc pointing from the ith node in the pth layer to the jth node in the (p+1)th layer of the target neural network, X_pib represents the ith vector element in the forward output vector of the pth layer of the target neural network in the bth of the N sets of output values, E_(p+1)jb represents the jth vector element in the inverse output vector of the (p+1)th layer of the target neural network in the bth of the N sets of output values, and i, j, b, and p are positive integers greater than or equal to 1.
  • the first processing device and the second processing device are integrated in one device.
  • the first processing device includes a second processor
  • the second processing device includes a third processor
  • the first processing device is a processing device that is composed of multiple computing nodes in a preset computing network
  • the second processing device is a processing device deployed at the edge of the cloud or a preset communication network.
  • the first operation further includes a nonlinear transform operation.
  • a computer readable storage medium is provided, the storage medium storing a computer program, where the stored computer program, when executed by the first processing device of the first aspect, can implement the operations performed by the first processing device of the first aspect during the neural network training process;
  • the stored computer program when executed by the second processing device of the first aspect, is capable of performing the operations performed by the second processing device of the first aspect described above in the neural network training process.
  • a computer program product comprising instructions is provided, which, when run on a first processing device, enables the first processing device to implement the operations performed by the first processing device of the first aspect during the neural network training process; or,
  • when run on a second processing device, enables the second processing device to implement the operations performed by the second processing device of the first aspect during the neural network training process.
  • the first processing device can perform a vector dot product operation in the neural network training process
  • the second processing device can perform the weight vector correction operation and other types of operations in the neural network training process. therefore, the first processing device may include only the special-purpose circuit required to perform a vector dot product operation, so that the first processing device has a relatively simple circuit structure and high computational efficiency; since most of the operations in the neural network training process are vector dot product operations, the higher efficiency of the first processing device improves the training efficiency of the neural network.
  • FIG. 1 is a schematic diagram of a neural network provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of another neural network provided by an embodiment of the present application.
  • FIG. 3 is a block diagram of a neural network training system provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of the pointing directions of directed arcs during a neural network inverse operation according to an embodiment of the present application.
  • FIG. 5 is a flowchart of a method for a neural network training system to calculate at least one correction value according to N sets of output values according to an embodiment of the present application.
  • FIG. 6 is a flowchart of a neural network training method according to an embodiment of the present application.
  • FIG. 7 is a flowchart of a neural network training method according to an embodiment of the present application.
  • FIG. 8 is a flowchart of a neural network training method according to an embodiment of the present application.
  • a neural network can in essence be viewed as a directed graph, which can include multiple layers, each layer including at least one node, wherein the first layer of the neural network can be referred to as the input layer, the last layer can be referred to as the output layer, and the layers between the input layer and the output layer can be referred to as hidden layers.
  • nodes in each layer except the output layer can point, through directed arcs, to all nodes in the next layer, where each directed arc corresponds to a weight.
  • FIG. 1 is a schematic diagram of an exemplary neural network.
  • the neural network may include four layers, wherein the first layer includes two nodes and is the input layer, the second and third layers include three nodes and two nodes respectively and are hidden layers, and the fourth layer includes three nodes and is the output layer.
  • the first node in the first layer (here, "the first node in a layer" refers to the first node in that layer in top-to-bottom order; similar descriptions below are to be understood in the same way) points, through directed arcs a1, a2, and a3 respectively, to all 3 nodes in the second layer, wherein the weights of directed arcs a1, a2, and a3 may be 1, -1, and 1, respectively.
  • each layer in the neural network may correspond to an output vector
  • the output vector corresponding to each layer may be composed of the output values of all nodes in the layer.
  • for each layer of the neural network except the input layer, the output vector corresponding to the layer can be calculated, according to the following formula (1), from the output vector of the previous layer and the weight vectors composed of the weights corresponding to the directed arcs pointing to the nodes in the layer, while the output vector of the input layer is equal to the data vector input to the input layer from the outside:
  • x_(p+1)j = f(u · v + b_(p+1)j)  (1)
  • where x_(p+1)j refers to the value of the jth vector element of the output vector corresponding to the (p+1)th layer of the neural network;
  • u is the output vector of the pth layer of the neural network, u = [x_p1, x_p2, x_p3, ..., x_pn], where x_pn refers to the output value of the nth node in the pth layer of the neural network and the pth layer of the neural network contains n nodes;
  • "·" is the vector dot product operator;
  • v is the weight vector composed of the weights corresponding to all directed arcs pointing to the jth node in the (p+1)th layer of the neural network, v = [w_1j, w_2j, w_3j, ..., w_nj], where w_nj refers to the weight of the directed arc from the nth node in the pth layer of the neural network to the jth node in the (p+1)th layer of the neural network;
  • f is a nonlinear function; and
  • b_(p+1)j is the offset value of the jth node in the (p+1)th layer of the neural network.
  • for example, if the output values of the three nodes in the second layer are 3, 2, and 3 respectively, the output vector composed of them is [3, 2, 3], which is the vector passed to the third layer.
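The per-node computation of formula (1) (dot product, plus offset, then the nonlinear function) can be sketched as follows; the function name is an illustrative assumption:

```python
import numpy as np

def node_output(u, v, b, f):
    # x_(p+1)j = f(u · v + b_(p+1)j): dot the previous layer's output vector u
    # with the node's weight vector v, add the offset b, apply the nonlinearity f
    return f(np.dot(u, v) + b)
```

Computing this for every node j of layer p+1 yields that layer's output vector.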
  • the main role of the neural network is to identify the object to be identified, that is, to perform recognition operations.
  • in a recognition operation, the data vector of the object to be identified may be input through the input layer of the neural network, and then the neural network may operate layer by layer according to the above formula (1) to finally obtain the output vector of the output layer of the neural network as the recognition operation result of the neural network.
  • the neural network can identify the object to be identified based on the output vector of the output layer of the neural network.
  • the present application will describe the recognition process of a neural network by using the neural network shown in FIG. 2, which includes only an input layer and an output layer, to identify bananas and apples.
  • Table 1 shows the characteristic values of the apple and the banana, wherein for color a characteristic value of 1 represents red and -1 represents yellow, and for shape a characteristic value of 1 represents round and -1 represents curved.
  • accordingly, the data vector of the apple can be [1, 1] and the data vector of the banana can be [-1, -1].
  • in the neural network shown in FIG. 2, the input layer includes two nodes and the output layer includes one node, wherein the first node of the input layer points to the output layer node through directed arc a7, whose weight is 1, the second node of the input layer points to the output layer node through directed arc a8, whose weight is also 1, the offset value of the output layer node is 0, and the nonlinear function f is a step function.
  • the neural network shown in Fig. 2 can realize the recognition of apples and bananas.
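A minimal sketch of the FIG. 2 network follows; the step function's output values of 1 (apple) for non-negative inputs and -1 (banana) for negative inputs are an assumption for illustration, since the application does not spell them out:

```python
def step(s):
    # assumed step function: 1 for s >= 0, -1 otherwise
    return 1 if s >= 0 else -1

def recognize(data_vector, weights=(1, 1), offset=0):
    # FIG. 2: two input nodes, one output node; arcs a7 and a8 both have weight 1
    s = sum(x * w for x, w in zip(data_vector, weights)) + offset
    return step(s)
```

With these assumptions, the apple data vector [1, 1] and the banana data vector [-1, -1] map to different outputs, which is what realizes the recognition.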
  • in practical applications, the object to be identified may include more than two characteristic values, each characteristic value may not be a specific value but any value within a certain preset range, and the neural network may also be much more complicated than the neural network shown in FIG. 2, but its recognition principle is the same as the above description.
  • the neural network can usually be trained using a training set, wherein the training set can include multiple data vectors.
  • during training, a set of weight vectors can be randomly set for the neural network in the initial state, a data vector in the training set is identified based on the randomly set weight vectors, and then the randomly set weight vectors are corrected based on the recognition operation result to obtain correction weight vectors.
  • the neural network can then identify another data vector in the training set based on the correction weight vectors and further correct them according to the recognition operation result; this process can be repeated multiple times during training until the neural network can obtain a near-ideal output vector for any data vector based on the corrected weight vectors.
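The iterative identify-then-correct procedure above can be sketched as a deterministic, perceptron-style loop; the zero initialization (instead of random, for reproducibility), the learning rate, and the perceptron update rule are illustrative assumptions, since the application does not prescribe a particular correction rule:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train(training_set, lr=0.1, epochs=20):
    # weight vector, initialized to zero here for a deterministic sketch
    w = [0.0] * len(training_set[0][0])
    for _ in range(epochs):
        for x, target in training_set:
            output = 1 if dot(x, w) >= 0 else -1   # recognition operation
            if output != target:                   # correct based on the result
                w = [wi + lr * target * xi for wi, xi in zip(w, x)]
    return w

w = train([([1, 1], 1), ([-1, -1], -1)])  # apple vs. banana data vectors
```

Each pass identifies a data vector, derives a correction from the result, and applies it to the weight vector, mirroring the repeated correction described above.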
  • common neural network training algorithms include back propagation (English: Back Propagation, BP), Hebbian learning (English: Hebbian Learning), spike-timing-dependent synaptic plasticity (English: Spike-Timing-Dependent Plasticity), restricted Boltzmann machines (English: Restricted Boltzmann Machines, RBM), and the like.
  • in order to train a neural network, a neural network training device in the related art usually needs to be configured with a processor capable of performing all of the above types of operations.
  • the circuit structure of such a processor is usually complicated, which makes the processor less efficient and results in lower training efficiency of the neural network.
  • the present application provides a neural network training system 300.
  • the neural network training system 300 may include a first processing device 301 and a second processing device 302, the first processing device 301 being different from the second processing device 302.
  • the first processing device 301 is configured to: acquire a weight vector of the target neural network; acquire N data vectors in the training set, where the training set includes multiple data vectors, where N is a positive integer greater than or equal to 1; Each of the N data vectors and the weight vector performs a first operation to obtain N sets of output values, the first operation including a vector dot product operation.
  • the second processing device 302 is configured to: acquire at least one correction value, each correction value of the at least one correction value being used to correct one vector element in the weight vector and being calculated according to the N sets of output values; correct the vector elements in the weight vector according to the at least one correction value to obtain a correction weight vector; and send the correction weight vector to the first processing device 301, the correction weight vector being used to instruct the first processing device 301 to perform the first operation based on N other data vectors in the training set and the correction weight vector, the N other data vectors being data vectors other than the N data vectors in the training set.
  • the recognition operation of the neural network may include a vector dot product operation and a nonlinear transformation operation.
  • during training, both of these operation types are required, in addition to operations such as weight vector correction.
  • in practice, most of the operations in the training process of a neural network are vector dot product operations.
  • the neural network training system can separately set a first processing device 301 to perform a vector dot product operation.
  • the first processing device 301 may acquire a weight vector of the target neural network and N data vectors in the training set, and perform a first operation based on the weight vector and each of the N data vectors to obtain N sets of output values corresponding to the N data vectors, wherein the first operation includes a vector dot product operation, and each set of output values includes the result of the neural network's recognition operation on the corresponding data vector.
  • the weight vector of the target neural network acquired by the first processing device 301 may be a weight vector randomly set for the target neural network in the initial state, or may be a correction weight vector sent by the second processing device 302 to the first processing device 301.
  • The neural network training system may further provide a second processing device 302 to perform the other types of operations in the neural network training process besides the vector dot product operation; these other operations generally include the weight vector correction operation and the like.
  • The second processing device 302 may obtain at least one correction value, where each of the at least one correction value is used to correct one vector element in the weight vector, and the second processing device 302 may correct the vector elements in the weight vector according to the at least one correction value to obtain a corrected weight vector.
  • The second processing device 302 may store the corrected weight vector and send it to the first processing device 301, so that the first processing device 301 performs the first operation according to the corrected weight vector and N other data vectors in the training set to obtain another N sets of output values; that is, the first processing device 301 can perform the recognition operation on data vectors in the training set according to the corrected weight vector, and the second processing device 302 then corrects the corrected weight vector again according to the recognition operation results.
  • Correcting the weight vector refers to adding each vector element in the weight vector to its corresponding correction value to obtain a corrected vector element. For example, in the neural network shown in FIG. 1, suppose the directed arcs a4, a5, and a6 pointing to the 1st node in the 3rd layer currently have weights of 1, -1, and 1, respectively, so that the weight vector they constitute is [1, -1, 1].
  • If the correction values acquired by the second processing device 302 are 0.2, 0.3, and -0.1, respectively, then correcting the vector elements of the weight vector [1, -1, 1] according to these correction values yields the corrected vector elements 1.2, -0.7, and 0.9.
  • These corrected vector elements constitute the corrected weight vector [1.2, -0.7, 0.9].
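The element-wise correction described above can be sketched in a few lines of Python; the function name and plain-list representation are illustrative assumptions, not part of the application:

```python
# Illustrative sketch of the weight vector correction: each vector element
# is added to its corresponding correction value (hypothetical helper).
def correct_weight_vector(weights, corrections):
    assert len(weights) == len(corrections)
    return [w + c for w, c in zip(weights, corrections)]

# The example from the text: weights [1, -1, 1], corrections [0.2, 0.3, -0.1].
corrected = correct_weight_vector([1, -1, 1], [0.2, 0.3, -0.1])
print([round(v, 6) for v in corrected])  # [1.2, -0.7, 0.9]
```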
  • Because the first processing device 301 performs only the vector dot product operation, it need only include the special-purpose circuitry required for the vector dot product operation, so the first processing device 301 has a relatively simple circuit structure and high computational efficiency.
  • Since most of the operations in the neural network training process are vector dot product operations, the high efficiency of the first processing device 301 improves the training efficiency of the neural network.
  • In addition, the second processing device 302, which performs the weight vector correction operation, can be configured according to different neural network training algorithms, and can therefore satisfy the requirements of different training algorithms, making the neural network training system more flexible in training the neural network.
  • In a conventional approach, the neural network training system performs one weight vector correction for each data vector in the training set; that is, for every data vector, the system calculates at least one correction value and corrects the weight vector using that correction value.
  • If the neural network training system provided by the present application trained the neural network according to this conventional method, the second processing device 302 would need to perform a weight vector correction operation for each data vector in the training set and send the corrected weight vector to the first processing device 301 each time.
  • In practical applications, the data amount of the weight vector of the target neural network may be very large, so frequently transmitting the corrected weight vector from the second processing device 302 to the first processing device 301 would require a large communication bandwidth.
  • Therefore, the neural network training system may instead perform one weight vector correction for a batch of data vectors in the training set (that is, N is greater than or equal to 2).
  • That is, the neural network training system can calculate the recognition operation results corresponding to a batch of data vectors in the training set to obtain a batch of output values (the N sets of output values), calculate accumulated correction values according to this batch of output values, and correct the weight vector once according to the accumulated correction values. In this way, the number of times the second processing device 302 sends the corrected weight vector to the first processing device 301 is reduced, thereby reducing the communication bandwidth between the second processing device 302 and the first processing device 301.
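The batched scheme can be sketched as follows; the per-sample correction rule is left as a pluggable function because this passage does not fix it, and all names here are assumptions:

```python
# Sketch of batched weight correction (N >= 2): correction values for a
# whole batch are accumulated first, so the corrected weight vector is
# produced (and transmitted) once per batch rather than once per sample.
def batch_correct(weights, batch_outputs, per_sample_correction):
    accumulated = [0.0] * len(weights)
    for outputs in batch_outputs:                 # one set of output values per sample
        for i, c in enumerate(per_sample_correction(outputs)):
            accumulated[i] += c                   # accumulate the correction values
    return [w + c for w, c in zip(weights, accumulated)]

# Toy usage: each "set of output values" already equals its per-weight correction.
new_w = batch_correct([1.0, -1.0], [[0.1, 0.2], [0.1, 0.1]], lambda o: o)
print([round(v, 6) for v in new_w])  # [1.2, -0.7]
```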
  • In one embodiment of the present application, the first processing device 301 and the second processing device 302 may be integrated into the same device; of course, they may also be different devices. In an embodiment of the present application, the first processing device 301 may include a second processor and the second processing device 302 may include a third processor; alternatively, the first processing device 301 may be a processing device composed of multiple computing nodes in a preset computing network (in practical applications, the computing nodes may be mobile phones, computers, and the like).
  • The second processing device 302 can be a processing device deployed in the cloud or at the edge of a preset communication network; for example, the second processing device 302 can be a base station deployed at the edge of a preset communication network.
  • In summary, the present application provides a first processing device and a second processing device in the neural network training system, where the first processing device performs the vector dot product operations in the neural network training process, and the second processing device performs the other types of operations, such as the weight vector correction operation. The first processing device may therefore include only the special-purpose circuitry required for the vector dot product operation, giving it a relatively simple circuit structure and high computational efficiency; since most of the operations in the neural network training process are vector dot product operations, the high efficiency of the first processing device improves the training efficiency of the neural network.
  • The recognition operation of a neural network usually includes two types of operations: a vector dot product operation and a nonlinear transform operation. That is, obtaining the above N sets of output values requires both vector dot product operations and nonlinear transform operations.
  • The first processing device 301 in the present application may perform only the vector dot product operations, with the nonlinear transform operations performed by another processing device to finally obtain the above N sets of output values; alternatively, the first processing device 301 may perform both the vector dot product operations and the nonlinear transform operations, that is, the first operation may further include a nonlinear transform operation. In the latter case, the first processing device 301 need only include the circuitry required to perform the vector dot product and nonlinear transform operations, so it still has a relatively simple circuit structure and high computational efficiency; at the same time, no separate processing device for the nonlinear transform operations needs to be provided, which reduces the hardware overhead.
  • In practical applications, to further reduce the communication bandwidth, the second processing device 302 may perform first preset processing on the corrected weight vector to obtain a processed corrected weight vector, and send the processed corrected weight vector to the first processing device 301, so that the first processing device 301 performs the recognition operation on data vectors in the training set according to the processed corrected weight vector.
  • The data amount of the processed corrected weight vector is usually smaller than that of the corrected weight vector.
  • For example, one vector element in the corrected weight vector may occupy 16 or more bits of storage space (usually 32 bits), while one vector element in the processed corrected weight vector may occupy only 4 to 8 bits.
  • Sending the corrected weight vector to the first processing device 301 after the first preset processing can therefore reduce the communication bandwidth between the second processing device 302 and the first processing device 301.
  • Optionally, the first preset processing may be at least one of compression processing and quantization processing, where quantization processing refers to mapping each vector element in the corrected weight vector to a value with a smaller data amount; the mapping may be implemented by a function or by a lookup table, which is not specifically limited in this application.
  • After receiving the processed corrected weight vector, the first processing device 301 may perform second preset processing on it to obtain the corrected weight vector, where the second preset processing is the inverse of the first preset processing, and then perform the recognition operation on data vectors in the training set based on the obtained corrected weight vector; alternatively, the first processing device 301 may directly perform the recognition operation on data vectors in the training set according to the processed corrected weight vector, which is not specifically limited in this application.
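As an illustration, the first preset processing could be a uniform 8-bit quantization and the second preset processing its inverse; the offset/scale scheme below is one possible mapping chosen here for the sketch, since the application leaves the mapping (function or lookup table) open:

```python
# Hypothetical uniform quantization as the "first preset processing":
# each vector element (typically a 32-bit float) is mapped to a small
# integer (8 bits), plus the offset and scale needed to invert the mapping.
def quantize(weights, bits=8):
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / (2 ** bits - 1) or 1.0
    q = [round((w - lo) / scale) for w in weights]
    return q, lo, scale

# The "second preset processing" is the inverse mapping (dequantization).
def dequantize(q, lo, scale):
    return [lo + v * scale for v in q]

q, lo, scale = quantize([1.2, -0.7, 0.9])
print([round(v, 2) for v in dequantize(q, lo, scale)])  # [1.2, -0.7, 0.9]
```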
  • As described above, the second processing device 302 obtains at least one correction value calculated according to the N sets of output values. The present application provides two manners for the second processing device 302 to obtain the at least one correction value:
  • In the first manner, the first processing device 301 calculates the at least one correction value according to the N sets of output values and sends the calculated correction values to the second processing device 302; correspondingly, the second processing device 302 receives the at least one correction value sent by the first processing device 301.
  • In the second manner, the first processing device 301 sends the N sets of output values to the second processing device 302, and the second processing device 302 calculates the at least one correction value according to the N sets of output values.
  • Taking the relatively common BP (back-propagation) algorithm as an example, the present application now describes the technical process by which the neural network training system calculates the N sets of output values and calculates at least one correction value according to the N sets of output values. As shown in FIG. 5, the technical process may include the following steps:
  • Step 11: The first processing device 301 performs a neural network forward operation based on a first data vector and the weight vector to obtain the forward output vector of each layer of the target neural network, where the first data vector is any one of the N data vectors.
  • The neural network forward operation generally includes vector dot product operations and nonlinear transform operations. It refers to the operation in which a data vector is input at the input layer of the neural network and, after layer-by-layer computation according to formula (1) above, the output vector of the output layer of the neural network is obtained; the output vector of each layer calculated during this process is called the forward output vector of that layer.
  • In other words, in the neural network forward operation, the first layer of the neural network receives the data vector as input, and then the output vectors of the second layer onward are calculated in turn from the weight vectors of the neural network according to formula (1), until the output vector of the last layer is obtained.
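Formula (1) itself is not reproduced in this excerpt, so the sketch below assumes the usual per-node rule of a dot product followed by a nonlinear transform (a sigmoid is used purely as an example nonlinearity):

```python
import math

def dot(a, b):
    # vector dot product operation
    return sum(x * y for x, y in zip(a, b))

def forward(data_vector, layer_weights, g=lambda s: 1.0 / (1.0 + math.exp(-s))):
    """Return the forward output vector of every layer (input layer first)."""
    outputs = [data_vector]                  # layer 1 receives the data vector
    for weight_rows in layer_weights:        # one row of weights per node of the next layer
        prev = outputs[-1]
        outputs.append([g(dot(row, prev)) for row in weight_rows])
    return outputs

# Toy network: 2 inputs feeding one computed layer with 2 nodes.
outs = forward([1.0, 0.5], [[[0.2, -0.4], [0.3, 0.1]]])
print(len(outs))  # 2
```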
  • Step 12: The first processing device 301 acquires, as an error vector, the difference vector obtained by subtracting the ideal output vector corresponding to the first data vector from the forward output vector of the output layer of the target neural network.
  • The ideal output vector corresponding to the first data vector may be stored in the training set.
  • For example, suppose a data vector in the training set is input at the input layer of the neural network (that is, the first layer), the resulting forward output vector of the output layer (that is, the fourth layer) is [3, 2, 3], and the ideal output vector corresponding to that data vector is [1, 1, 1].
  • The first processing device 301 then obtains the difference vector [2, 1, 2] of the vector [3, 2, 3] and the vector [1, 1, 1] as the error vector.
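The error vector of step 12 is a plain element-wise subtraction, as the worked example shows:

```python
# Error vector: forward output of the output layer minus the ideal output.
def error_vector(forward_out, ideal_out):
    return [a - b for a, b in zip(forward_out, ideal_out)]

print(error_vector([3, 2, 3], [1, 1, 1]))  # [2, 1, 2]
```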
  • Step 13 The first processing device 301 performs a neural network inverse operation based on the error vector and the weight vector to obtain an inverse output vector of each layer of the target neural network.
  • The neural network inverse operation includes vector dot product operations. It refers to the operation in which the error vector is input at the output layer of the neural network and, after layer-by-layer computation, the output vector of the input layer of the neural network is obtained; the output vector of each layer calculated during this process is called the inverse output vector of that layer.
  • In the neural network inverse operation, the direction of each directed arc is exactly opposite to its direction in the neural network forward operation.
  • The inverse output vector of each layer of the neural network can be calculated based on formula (2):
  • e_pj = g(t · q)    (2)
  • where e_pj is the value of the j-th vector element in the inverse output vector of the p-th layer of the neural network; g is a function; t is the inverse output vector of the (p+1)-th layer of the neural network, t = [e_(p+1)1, e_(p+1)2, ..., e_(p+1)n], where e_(p+1)n is the n-th vector element of the inverse output vector of the (p+1)-th layer and the (p+1)-th layer includes n nodes; "·" is the vector dot product operator; and q is the weight vector composed of the weights of the directed arcs by which all nodes in the (p+1)-th layer point to the j-th node in the p-th layer.
  • FIG. 4 is a schematic diagram showing the directions of the directed arcs of the neural network shown in FIG. 1 when the neural network inverse operation is performed.
  • In this example, the neural network inverse operation inputs the error vector at the fourth layer of the neural network, then calculates in turn, according to formula (2) above, the inverse output vector of the third layer and the inverse output vector of the second layer, and finally obtains the inverse output vector of the first layer of the neural network.
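The layer-by-layer inverse operation of formula (2) can be sketched as follows; the identity is substituted for g, and the weight layout is an assumption made for illustration:

```python
# Inverse operation per formula (2): e_pj = g(t . q), where t is the
# inverse output vector of layer p+1 and q collects the weights of the
# arcs between node j of layer p and every node of layer p+1.
def backward(error_vec, layer_weights, g=lambda s: s):
    """Return the inverse output vector of every layer, input layer first."""
    inverse_outputs = [error_vec]            # the output layer receives the error vector
    for weight_rows in reversed(layer_weights):
        t = inverse_outputs[-1]
        n_prev = len(weight_rows[0])         # number of nodes in the preceding layer
        inverse_outputs.append(
            [g(sum(t[k] * weight_rows[k][j] for k in range(len(t))))
             for j in range(n_prev)])
    return list(reversed(inverse_outputs))

# 2-node layer feeding a 3-node output layer; error vector [2, 1, 2].
invs = backward([2, 1, 2], [[[1, 0], [0, 1], [1, 1]]])
print(invs[0])  # [4, 3]
```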
  • Step 14: The first processing device 301 obtains the N sets of output values in one-to-one correspondence with the N data vectors in the training set.
  • Specifically, the first processing device 301 acquires the forward output vector of each layer of the target neural network and the inverse output vector of each layer of the target neural network as the set of output values corresponding to the first data vector. Performing the above technical process on each of the N data vectors yields the N sets of output values in one-to-one correspondence with the N data vectors.
  • Step 15: The first processing device 301 or the second processing device 302 calculates the at least one correction value from the N sets of output values by formula (3):
  • Δw_ij = Σ_{b=1}^{N} X_pib · E_(p+1)jb    (3)
  • where Δw_ij represents the correction value of the weight vector element corresponding to the directed arc from the i-th node in the p-th layer to the j-th node in the (p+1)-th layer of the target neural network, X_pib represents the i-th vector element in the forward output vector of the p-th layer of the target neural network in the b-th of the N sets of output values, E_(p+1)jb represents the j-th vector element in the inverse output vector of the (p+1)-th layer of the target neural network in the b-th of the N sets of output values, and i, j, and p are positive integers greater than or equal to 1.
  • When step 15 is performed by the second processing device 302, the first processing device 301 further needs to send the calculated N sets of output values to the second processing device 302.
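Formula (3) can be written out directly; the nested-list layout is an assumption, and any learning-rate factor the full application might apply is omitted here:

```python
# Formula (3): the correction value dw[i][j] for the arc from node i of
# layer p to node j of layer p+1 sums, over the b = 1..N output groups,
# the product X_pib * E_(p+1)jb.
def correction_values(X_p, E_p1):
    """X_p[b][i]: forward outputs of layer p; E_p1[b][j]: inverse outputs of layer p+1."""
    n_i, n_j = len(X_p[0]), len(E_p1[0])
    return [[sum(X_p[b][i] * E_p1[b][j] for b in range(len(X_p)))
             for j in range(n_j)]
            for i in range(n_i)]

# N = 2 output groups; layer p has 2 nodes, layer p+1 has 1 node.
print(correction_values([[1, 0], [0, 1]], [[2], [3]]))  # [[2], [3]]
```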
  • the present application also provides a neural network training method, which is applied to a first processing device.
  • the neural network training method may include the following steps:
  • Step 601 The first processing device acquires a weight vector of the target neural network.
  • Step 602 The first processing device acquires N data vectors in the training set, where the training set includes multiple data vectors, where N is a positive integer greater than or equal to 1.
  • Step 603: The first processing device performs a first operation based on each of the N data vectors and the weight vector to obtain N sets of output values, where the first operation includes a vector dot product operation. This enables a second processing device to obtain at least one correction value, where each of the at least one correction value is used to correct one vector element in the weight vector and is calculated according to the N sets of output values; the second processing device corrects the vector elements in the weight vector according to the at least one correction value to obtain a corrected weight vector, and sends the corrected weight vector to the first processing device. The corrected weight vector is used to instruct the first processing device to perform the first operation based on N other data vectors in the training set and the corrected weight vector, where the N other data vectors are data vectors in the training set other than the N data vectors.
  • In the neural network training method provided by the present application, the first processing device performs the vector dot product operations in the neural network training process, and the second processing device performs the other types of operations such as the weight vector correction operation.
  • The first processing device can therefore include only the special-purpose circuitry required for the vector dot product operation, giving it a relatively simple circuit structure and high computational efficiency; since most of the operations in the neural network training process are vector dot product operations, the high efficiency of the first processing device improves the training efficiency of the neural network.
  • the embodiment of the present application further provides a neural network training method, which is applied to a second processing device.
  • the neural network training method may include the following steps:
  • Step 701: The second processing device acquires at least one correction value, where each of the at least one correction value is used to correct one vector element in a weight vector of the target neural network and is calculated according to N sets of output values, the N sets of output values being obtained by the first processing device by performing a first operation based on the weight vector of the target neural network and N data vectors in the training set, the first operation including a vector dot product operation.
  • Step 702: The second processing device corrects the vector elements in the weight vector according to the at least one correction value to obtain a corrected weight vector.
  • Step 703: The second processing device sends the corrected weight vector to the first processing device, where the corrected weight vector is used to instruct the first processing device to perform the first operation based on N other data vectors in the training set and the corrected weight vector, the N other data vectors being data vectors in the training set other than the N data vectors.
  • In the neural network training method provided by the present application, the first processing device performs the vector dot product operations in the neural network training process, and the second processing device performs the other types of operations such as the weight vector correction operation.
  • The first processing device can therefore include only the special-purpose circuitry required for the vector dot product operation, giving it a relatively simple circuit structure and high computational efficiency; since most of the operations in the neural network training process are vector dot product operations, the high efficiency of the first processing device improves the training efficiency of the neural network.
  • the embodiment of the present application further provides a neural network training method, which is applied to a neural network training system.
  • the neural network training method may include the following steps:
  • Step 801 The first processing device acquires a weight vector of the target neural network.
  • Step 802 The first processing device acquires N data vectors in the training set, where the training set includes multiple data vectors, and N is a positive integer greater than or equal to 1.
  • Step 803 The first processing device performs a first operation based on each of the N data vectors and the weight vector to obtain N sets of output values, where the first operation includes a vector dot product operation.
  • Step 804 The second processing device acquires at least one correction value, each of the at least one correction value is used to correct one vector element in the weight vector, and each correction value is calculated according to the N sets of output values.
  • The present application provides two manners for the second processing device to obtain the at least one correction value:
  • In the first manner, the first processing device calculates the at least one correction value according to the N sets of output values and sends the calculated correction values to the second processing device, and the second processing device receives the at least one correction value sent by the first processing device.
  • In the second manner, the first processing device sends the N sets of output values to the second processing device, and the second processing device calculates the at least one correction value according to the N sets of output values.
  • Step 805: The second processing device corrects the vector elements in the weight vector according to the at least one correction value to obtain a corrected weight vector.
  • Step 806: The second processing device sends the corrected weight vector to the first processing device, where the corrected weight vector is used to instruct the first processing device to perform the first operation based on N other data vectors in the training set and the corrected weight vector, the N other data vectors being data vectors in the training set other than the N data vectors.
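Steps 801 to 806 can be strung together as a toy end-to-end round; the two devices are modeled as plain objects, and the first operation and correction rule are deliberately simplified stand-ins, not the application's actual operations:

```python
class FirstDevice:
    # Stand-in for the dot-product-based first operation (step 803).
    def first_operation(self, weights, data_vectors):
        return [[sum(w * x for w, x in zip(weights, d))] for d in data_vectors]

class SecondDevice:
    # Stand-in for the weight vector correction operation (step 805).
    def correct(self, weights, corrections):
        return [w + c for w, c in zip(weights, corrections)]

first, second = FirstDevice(), SecondDevice()
weights = [0.5, -0.5]                              # step 801: acquire weight vector
batch = [[1.0, 0.0], [0.0, 1.0]]                   # step 802: N = 2 data vectors
outputs = first.first_operation(weights, batch)    # step 803: N sets of output values
corrections = [-o[0] * 0.1 for o in outputs]       # step 804: toy correction rule
weights = second.correct(weights, corrections)     # steps 805-806: corrected vector
print([round(w, 2) for w in weights])  # [0.45, -0.45]
```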
  • In practical applications, to reduce the communication bandwidth, the second processing device may perform first preset processing on the corrected weight vector to obtain a processed corrected weight vector, and send the processed corrected weight vector to the first processing device, so that the first processing device performs the recognition operation on data vectors in the training set according to the processed corrected weight vector.
  • The data amount of the processed corrected weight vector is usually smaller than that of the corrected weight vector.
  • For example, one vector element in the corrected weight vector may occupy 16 or more bits of storage space (usually 32 bits), while one vector element in the processed corrected weight vector may occupy only 4 to 8 bits.
  • Sending the corrected weight vector to the first processing device after the first preset processing can therefore reduce the communication bandwidth between the second processing device and the first processing device.
  • Optionally, the first preset processing may be at least one of compression processing and quantization processing, where quantization processing refers to mapping each vector element in the corrected weight vector to a value with a smaller data amount; the mapping may be implemented by a function or by a lookup table, which is not specifically limited in this application.
  • After receiving the processed corrected weight vector, the first processing device may perform second preset processing on it to obtain the corrected weight vector, where the second preset processing is the inverse of the first preset processing, and then perform the recognition operation on data vectors in the training set based on the obtained corrected weight vector; alternatively, the first processing device may directly perform the recognition operation on data vectors in the training set according to the processed corrected weight vector, which is not specifically limited in this application.
  • In the neural network training method provided by the present application, the first processing device performs the vector dot product operations in the neural network training process, and the second processing device performs the other types of operations such as the weight vector correction operation.
  • The first processing device can therefore include only the special-purpose circuitry required for the vector dot product operation, giving it a relatively simple circuit structure and high computational efficiency; since most of the operations in the neural network training process are vector dot product operations, the high efficiency of the first processing device improves the training efficiency of the neural network.
  • The present application further provides a computer readable storage medium, which may be a non-volatile storage medium, in which a computer program is stored. When the computer program in the computer readable storage medium is executed by the first processing device 301 described above, the operations performed by the first processing device 301 in the neural network training process can be implemented; or, when the computer program in the computer readable storage medium is executed by the second processing device 302 described above, the operations performed by the second processing device 302 in the neural network training process can be implemented.
  • The present application also provides a computer program product containing instructions. When the computer program product runs on the first processing device 301, the first processing device 301 is enabled to implement the operations performed by the first processing device 301 in the neural network training process in the above embodiments; or, when it runs on the second processing device 302, the second processing device 302 is enabled to implement the operations performed by the second processing device 302 in the neural network training process in the above embodiments.
  • A person skilled in the art may understand that all or part of the steps of the above embodiments may be completed by hardware, or by a program instructing related hardware; the program may be stored in a computer readable storage medium, and the storage medium mentioned may be a read-only memory, a magnetic disk, an optical disc, or the like.


Abstract

A neural network training system, method, and computer readable storage medium, belonging to the field of machine learning algorithms. The neural network training system (300) includes a first processing device (301) and a second processing device (302). The first processing device (301) is configured to acquire a weight vector of a target neural network and N data vectors in a training set, and perform a first operation based on each of the N data vectors and the weight vector to obtain N sets of output values, the first operation including a vector dot product operation. The second processing device (302) is configured to acquire at least one correction value calculated according to the N sets of output values, correct vector elements in the weight vector of the neural network stored in the second processing device (302) according to the at least one correction value to obtain a corrected weight vector, and send the corrected weight vector to the first processing device (301). The provided neural network training system can improve the efficiency of neural network training.

Description

Neural network training system, method, and computer readable storage medium
This application claims priority to Chinese Patent Application No. 201710725775.9, filed with the State Intellectual Property Office of China on August 22, 2017 and entitled "Neural network training system, method, and computer readable storage medium", which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the field of machine learning algorithms, and in particular to a neural network training system, method, and computer readable storage medium.
Background
A neural network (also called an artificial neural network) is a common machine learning algorithm that has achieved considerable success in many fields such as speech recognition, image recognition, and natural language processing, and is still developing rapidly.
In practical applications, a neural network generally includes multiple weight vectors. When a recognition operation is performed with the neural network, a data vector of an object to be recognized is input into the neural network; the neural network calculates an output vector corresponding to the data vector based on the data vector and its own weight vectors, and then recognizes the object based on the output vector. Usually, the weight vectors of a neural network in its initial state are unknown; to obtain the weight vectors so that the neural network can perform recognition operations normally, the neural network in the initial state must be trained. During training, a set of weight vectors is typically set randomly for the neural network, and this set of weight vectors is corrected multiple times based on the results of the neural network's recognition operations on different data vectors under the current weight vectors, until the neural network can obtain a nearly ideal output vector for any data vector based on the corrected weight vectors.
In the related art, a neural network training apparatus can be used to train a neural network, where the apparatus includes one processor that must complete all operations involved in the neural network training process.
In the process of implementing the present application, the inventors found that the related art has at least the following problems:
The neural network training process usually involves many different types of operations, such as vector dot product operations, nonlinear transform operations, and weight vector correction operations. Therefore, the processor in the related-art training apparatus is generally one capable of performing many types of operations. To provide this computational generality, the circuit structure of the processor is usually complex, which makes its computational efficiency low and in turn makes the training efficiency of the neural network low.
Summary
The present application provides a neural network training system, method, and computer readable storage medium, which can solve the problem in the related art that the training efficiency of neural networks is low. The technical solutions are as follows:
In a first aspect, a neural network training system is provided, where the neural network training system includes a first processing device and a second processing device, and the first processing device and the second processing device are different.
The first processing device is configured to:
acquire a weight vector of a target neural network;
acquire N data vectors in a training set, where the training set includes multiple data vectors and N is a positive integer greater than or equal to 1;
perform a first operation based on each of the N data vectors and the weight vector to obtain N sets of output values, where the first operation includes a vector dot product operation.
The second processing device is configured to:
acquire at least one correction value, where each of the at least one correction value is used to correct one vector element in the weight vector and is calculated according to the N sets of output values;
correct vector elements in the weight vector according to the at least one correction value to obtain a corrected weight vector;
send the corrected weight vector to the first processing device, where the corrected weight vector is used to instruct the first processing device to perform the first operation based on N other data vectors in the training set and the corrected weight vector, the N other data vectors being data vectors in the training set other than the N data vectors.
In the present application, a first processing device and a second processing device are provided in the neural network training system, where the first processing device can perform the vector dot product operations in the neural network training process, and the second processing device can perform other types of operations such as the weight vector correction operation. Therefore, the first processing device may include only the special-purpose circuitry required for the vector dot product operation, so that it has a relatively simple circuit structure and high computational efficiency; since most of the operations in the neural network training process are vector dot product operations, the high efficiency of the first processing device improves the training efficiency of the neural network.
In addition, since different neural network training algorithms have different weight vector correction strategies, that is, different weight vector correction operations, the second processing device that performs the weight vector correction operation can be configured according to different training algorithms, thereby satisfying the requirements of different neural network training algorithms and making the neural network training system more flexible in training the neural network.
Further, instead of performing, for every data vector in the training set, one pass of calculating at least one correction value and correcting the weight vector based on that correction value, the neural network training system provided by the present application may perform one weight vector correction for a batch of data vectors; that is, the system may calculate the recognition operation results corresponding to a batch of data vectors, calculate accumulated correction values from those results, and correct the weight vector once according to the accumulated correction values. In this way, the number of times the second processing device sends the corrected weight vector to the first processing device is reduced, thereby reducing the communication bandwidth between the second processing device and the first processing device.
In practical applications, the correction values may be calculated by the first processing device and sent to the second processing device, or calculated by the second processing device.
When the first processing device calculates the correction values, the first processing device is further configured to calculate the at least one correction value according to the N sets of output values and send the calculated at least one correction value to the second processing device, and the second processing device is specifically configured to receive the at least one correction value sent by the first processing device.
When the second processing device calculates the correction values, the first processing device is further configured to send the N sets of output values to the second processing device, and the second processing device is specifically configured to calculate the at least one correction value according to the N sets of output values.
In addition, in practical applications, to reduce the communication bandwidth between the first processing device and the second processing device, the second processing device is further configured to perform first preset processing on the corrected weight vector to obtain a processed corrected weight vector, where the data amount of the processed corrected weight vector is smaller than that of the corrected weight vector; the second processing device is specifically configured to send the processed corrected weight vector to the first processing device; and the first processing device is further configured to receive the processed corrected weight vector sent by the second processing device and perform second preset processing on it to obtain the corrected weight vector, where the second preset processing is the inverse of the first preset processing.
Optionally, the first preset processing includes at least one of compression processing and quantization processing.
Optionally, the first processing device is specifically configured to:
perform a neural network forward operation based on a first data vector and the weight vector to obtain a forward output vector of each layer of the target neural network, where the neural network forward operation includes vector dot product operations and nonlinear transform operations, and the first data vector is any one of the N data vectors;
acquire an error vector, where the error vector is the difference vector between the forward output vector of the output layer of the target neural network and the ideal output vector corresponding to the first data vector in the training set;
perform a neural network inverse operation based on the error vector and the weight vector to obtain an inverse output vector of each layer of the target neural network, where the neural network inverse operation includes vector dot product operations;
acquire the forward output vector of each layer of the target neural network and the inverse output vector of each layer of the target neural network as the set of output values corresponding to the first data vector.
Optionally, the first processing device is further configured to calculate the at least one correction value from the N sets of output values by the following formula:
Δw_ij = Σ_{b=1}^{N} X_pib · E_(p+1)jb
where Δw_ij represents the correction value of the weight vector element corresponding to the directed arc from the i-th node in the p-th layer to the j-th node in the (p+1)-th layer of the target neural network, X_pib represents the i-th vector element in the forward output vector of the p-th layer of the target neural network in the b-th of the N sets of output values, E_(p+1)jb represents the j-th vector element in the inverse output vector of the (p+1)-th layer of the target neural network in the b-th of the N sets of output values, and i, j, and p are positive integers greater than or equal to 1.
The first processing device is further configured to send the calculated at least one correction value to the second processing device.
The second processing device is specifically configured to receive the at least one correction value sent by the first processing device.
Optionally, the first processing device is further configured to send the N sets of output values to the second processing device;
and the second processing device is specifically configured to calculate the at least one correction value from the N sets of output values by the following formula:
Δw_ij = Σ_{b=1}^{N} X_pib · E_(p+1)jb
where Δw_ij represents the correction value of the weight vector element corresponding to the directed arc from the i-th node in the p-th layer to the j-th node in the (p+1)-th layer of the target neural network, X_pib represents the i-th vector element in the forward output vector of the p-th layer of the target neural network in the b-th of the N sets of output values, E_(p+1)jb represents the j-th vector element in the inverse output vector of the (p+1)-th layer of the target neural network in the b-th of the N sets of output values, and i, j, and p are positive integers greater than or equal to 1.
Optionally, the first processing device and the second processing device are integrated into one device.
Optionally, the first processing device includes a second processor, and the second processing device includes a third processor.
Optionally, the first processing device is a processing device composed of multiple computing nodes in a preset computing network;
and the second processing device is a processing device deployed in the cloud or at the edge of a preset communication network.
Optionally, the first operation further includes a nonlinear transform operation.
第二方面,提供了一种神经网络训练方法,所述方法包括:
第一处理设备获取目标神经网络的权向量;
所述第一处理设备获取训练集中的N个数据向量,其中,所述训练集包括多个数据向量,N为大于或等于1的正整数;
所述第一处理设备基于所述N个数据向量中的每一个数据向量和所述权向量进行第一运算得到N组输出值,所述第一运算包括向量点积运算;
第二处理设备获取至少一个修正值,所述至少一个修正值中的每个修正值用于修正所述权向量中的一个向量元素,所述每个修正值根据所述N组输出值计算得到,所述第二处理设备与所述第一处理设备不同;
所述第二处理设备根据所述至少一个修正值对所述权向量中的向量元素进行修正,得到修正权向量;
所述第二处理设备将所述修正权向量发送至所述第一处理设备,所述修正权向量用于指示所述第一处理设备基于所述训练集中的N个其他数据向量和所述修正权向量进行所述第一运算,所述N个其他数据向量为所述训练集中除所述N个数据向量之外的数据向量。
可选的,所述方法还包括:
所述第一处理设备根据所述N组输出值计算所述至少一个修正值,并将计算得到的所述至少一个修正值发送至所述第二处理设备;
所述第二处理设备获取至少一个修正值,包括:
所述第二处理设备接收所述第一处理设备发送的所述至少一个修正值。
可选的,所述方法还包括:
所述第一处理设备将所述N组输出值发送至所述第二处理设备;
所述第二处理设备获取至少一个修正值,包括:
所述第二处理设备根据所述N组输出值计算所述至少一个修正值。
可选的,所述方法还包括:
所述第二处理设备对所述修正权向量进行第一预设处理得到处理后的修正权向量,所述处理后的修正权向量的数据量小于所述修正权向量的数据量;
所述第二处理设备将所述修正权向量发送至所述第一处理设备,包括:
所述第二处理设备将所述处理后的修正权向量发送至所述第一处理设备;
所述方法还包括:
所述第一处理设备接收所述第二处理设备发送的所述处理后的修正权向量,并对所述处理后的修正权向量进行第二预设处理得到所述修正权向量,所述第二预设处理是所述第一预设处理的逆处理。
可选的,所述第一预设处理包括压缩处理和量化处理中的至少一个。
可选的,所述第一处理设备基于所述N个数据向量中的每一个数据向量和所述权向量进行第一运算得到N组输出值,包括:
所述第一处理设备基于第一数据向量和所述权向量进行神经网络正向运算得到所述目标神经网络每一层的正向输出向量,所述神经网络正向运算包括向量点积运算和非线性变换运算,所述第一数据向量为所述N个数据向量中的任意一个数据向量;
所述第一处理设备获取误差向量,所述误差向量为所述目标神经网络的输出层的正向输出向量与所述训练集中所述第一数据向量对应的理想输出向量的差向量;
所述第一处理设备基于所述误差向量和所述权向量进行神经网络反向运算得到所述目标神经网络每一层的反向输出向量,所述神经网络反向运算包括向量点积运算;
所述第一处理设备将所述目标神经网络每一层的正向输出向量和所述目标神经网络每一层的反向输出向量获取为对应于所述第一数据向量的一组输出值。
可选的,所述方法还包括:
所述第一处理设备根据所述N组输出值通过公式计算所述至少一个修正值,其中,所述公式为:
Δw_ij = X_pi1×E_(p+1)j1 + X_pi2×E_(p+1)j2 + …… + X_piN×E_(p+1)jN
其中，Δw_ij表示所述目标神经网络中由第p层中第i个节点指向第p+1层中第j个节点的有向弧对应的权向量向量元素的修正值，X_pib表示所述N组输出值的第b组输出值中所述目标神经网络的第p层的正向输出向量中的第i个向量元素，E_(p+1)jb表示所述N组输出值的第b组输出值中所述目标神经网络的第p+1层的反向输出向量中的第j个向量元素，i、j和p均为大于或等于1的正整数；
所述第一处理设备将计算得到的所述至少一个修正值发送至所述第二处理设备;
所述第二处理设备获取至少一个修正值,包括:
所述第二处理设备接收所述第一处理设备发送的所述至少一个修正值。
可选的,所述方法还包括:
所述第一处理设备将所述N组输出值发送至所述第二处理设备;
所述第二处理设备获取至少一个修正值,包括:
所述第二处理设备根据所述N组输出值通过公式计算所述至少一个修正值,其中,所述公式为:
Δw_ij = X_pi1×E_(p+1)j1 + X_pi2×E_(p+1)j2 + …… + X_piN×E_(p+1)jN
其中，Δw_ij表示所述目标神经网络中由第p层中第i个节点指向第p+1层中第j个节点的有向弧对应的权向量向量元素的修正值，X_pib表示所述N组输出值的第b组输出值中所述目标神经网络的第p层的正向输出向量中的第i个向量元素，E_(p+1)jb表示所述N组输出值的第b组输出值中所述目标神经网络的第p+1层的反向输出向量中的第j个向量元素，i、j和p均为大于或等于1的正整数。
可选的,所述第一处理设备和所述第二处理设备集成于一个设备中。
可选的,所述第一处理设备包括第二处理器,所述第二处理设备包括第三处理器。
可选的,所述第一处理设备为预设的运算网络中多个运算节点组成的处理设备;
所述第二处理设备为部署于云端或预设的通信网络边缘的处理设备。
可选的,所述第一运算还包括非线性变换运算。
第三方面,提供了一种计算机可读存储介质,所述计算机可读存储介质中存储有计算机程序,存储的所述计算机程序被上述第一方面所述的第一处理设备执行时能够实现上述第一方面所述的第一处理设备在神经网络训练过程中执行的运算;
存储的所述计算机程序被上述第一方面所述的第二处理设备执行时能够实现上述第一方面所述的第二处理设备在神经网络训练过程中执行的运算。
第四方面,提供了一种包含指令的计算机程序产品,当其在第一处理设备上运行时, 使得该第一处理设备能够实现上述第一方面所述的第一处理设备在神经网络训练过程中执行的运算;或者,
当其在第二处理设备上运行时,使得该第二处理设备能够实现上述第一方面所述的第二处理设备在神经网络训练过程中执行的运算。
本申请提供的技术方案带来的有益效果是:
通过在神经网络训练系统中设置第一处理设备和第二处理设备，其中，第一处理设备可以执行神经网络训练过程中的向量点积运算，而第二处理设备可以执行神经网络训练过程中的权向量修正运算等其他类型的运算。因此，第一处理设备可以仅包含执行向量点积运算所需的特殊电路，电路结构较为简单，运算效率较高。由于神经网络训练过程中大部分的运算均为向量点积运算，第一处理设备较高的运算效率可以显著提高神经网络的整体训练效率。
附图说明
图1是本申请实施例提供的一种神经网络的示意图。
图2是本申请实施例提供的另一种神经网络的示意图。
图3是本申请实施例提供的神经网络训练系统的框图。
图4是本申请实施例提供的一种神经网络在进行神经网络反向运算时有向弧指向方向的示意图。
图5是本申请实施例提供的一种神经网络训练系统根据N组输出值计算至少一个修正值的方法的流程图。
图6是本申请实施例提供的一种神经网络训练方法的流程图。
图7是本申请实施例提供的一种神经网络训练方法的流程图。
图8是本申请实施例提供的一种神经网络训练方法的流程图。
具体实施方式
为使本申请的目的、技术方案和优点更加清楚,下面将结合附图对本申请实施方式作进一步地详细描述。
为了使读者能够理解本申请的技术方案,下面,本申请将对神经网络及神经网络的训练过程进行简要说明。
神经网络本质上可以看成是一种有向图,其可以包括多个层,每一个层包括至少一个节点,其中,神经网络的第一层可以被称为输入层,最后一层可以被称为输出层,输入层和输出层之间的层可以被称为隐含层,在神经网络中,除输出层以外的每一层中的节点都可以通过有向弧指向下一层中的所有的节点,其中,每一个有向弧均对应于一个权值。
图1所示为一个示例性的神经网络的示意图，如图1所示，该神经网络可以包括4个层，其中，第1层包括2个节点，为输入层，第2、3层分别包括3个节点和2个节点，为隐含层，第4层包括3个节点，为输出层。以第1层中的第1个节点（指的是第1层中按照由上至下顺序的第1个节点，下文类似的说明与此处同理）为例，该节点通过有向弧a_1、a_2和a_3分别指向第2层中的所有的3个节点，其中，有向弧a_1、a_2和a_3对应的权值可以分别为1、-1、1。
在实际应用中,神经网络中的每一层都可以对应于一个输出向量,每一层对应的输出向量都可以由该层中所有节点的输出值组成,神经网络中除输入层以外的每一层对应的输出向量均可以根据该层的上一层的输出向量和指向该层中节点的有向弧对应的权值组成的权向量基于下述公式(1)计算得到,而神经网络输入层的输出向量等于外界向输入层输入的数据向量:
x_(p+1)j = f(u·v + b_(p+1)j) = f(x_p1·w_1j + x_p2·w_2j + x_p3·w_3j + …… + x_pn·w_nj + b_(p+1)j)       (1)。
其中，x_(p+1)j指的是神经网络第p+1层对应的输出向量的第j个向量元素的值，u为神经网络第p层的输出向量，且u=[x_p1, x_p2, x_p3, ……, x_pn]，x_pn指的是神经网络第p层中第n个节点的输出值，该神经网络第p层中包含n个节点，“·”为向量点积运算符，v为指向神经网络第p+1层中第j个节点的所有有向弧对应的权值组成的权向量，且v=[w_1j, w_2j, w_3j, ……, w_nj]，w_nj指的是神经网络第p层中第n个节点指向神经网络第p+1层中第j个节点的有向弧对应的权值，f为非线性函数，b_(p+1)j为神经网络第p+1层中第j个节点的偏置值。
例如，如图1所示的神经网络中，第2层中3个节点的输出值分别为3、2、3，其组成的输出向量为[3, 2, 3]，指向第3层中第1个节点的有向弧a_4、a_5和a_6对应的权值分别为1、-1、1，其组成的权向量为[1, -1, 1]，非线性函数f为y=x²，第3层中第1个节点的偏置值为2，则第3层对应输出向量中第1个向量元素的值为：
x_31 = [3×1 + 2×(-1) + 3×1 + 2]² = 6² = 36。
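作为示意，公式(1)的计算过程可以用如下Python代码片段复算上文中x_31=36的例子（假设性示例：函数名与变量名均为本示例自拟，并非本申请的实现）：

```python
def forward_element(u, v, b, f):
    """按公式(1)计算第p+1层中第j个节点的输出值：
    x_(p+1)j = f(u·v + b_(p+1)j)，其中u·v为向量点积运算，f为非线性变换运算"""
    dot = sum(x * w for x, w in zip(u, v))  # 向量点积运算
    return f(dot + b)                       # 非线性变换运算

# 对应图1中第3层第1个节点的例子：
u = [3, 2, 3]          # 第2层的输出向量
v = [1, -1, 1]         # 有向弧a_4、a_5和a_6对应的权向量
b = 2                  # 偏置值
f = lambda x: x ** 2   # 非线性函数 y = x^2
x_31 = forward_element(u, v, b, f)  # (3-2+3+2)^2 = 36
```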
在实际应用中,神经网络的主要作用是对待识别对象进行识别,也即是神经网络的主要作用是进行识别运算。在神经网络的识别运算中,可以由神经网络的输入层输入待识别对象的数据向量,而后神经网络可以根据上述公式(1)逐层运算以最终得到神经网络的输出层的输出向量作为神经网络识别运算的结果,实际应用中,神经网络可以基于该神经网络的输出层的输出向量来对该待识别对象进行识别。
下面本申请将以图2所示的一个仅包含输入层和输出层的神经网络识别香蕉和苹果为例对神经网络的识别过程进行说明。如表1所示为苹果和香蕉的特征值,其中,颜色的特征值为1代表红色,特征值为-1代表黄色,形状的特征值为1代表圆形,特征值为-1代表弯形。
表1
品种 颜色 形状
苹果 1 1
香蕉 -1 -1
则由表1可知，苹果的数据向量可以为[1, 1]，香蕉的数据向量可以为[-1, -1]，图2所示的神经网络中输入层包括两个节点、输出层包括1个节点，其中，输入层的第1个节点指向输出层节点的有向弧a_7对应的权值为1，输入层的第2个节点指向输出层节点的有向弧a_8对应的权值也为1，输出层节点的偏置值为0，非线性函数f为阶梯函数，该阶梯函数为：
f(x) = 1（x>0）；f(x) = 0（x≤0）
则当待识别对象为苹果时，图2中神经网络输出层的节点的输出值为：x_21 = f(1×1 + 1×1 + 0) = f(2) = 1，也即是输出层的输出向量为[1]；当待识别对象为香蕉时，图2中神经网络输出层的节点的输出值为：x_21 = f(-1×1 - 1×1 + 0) = f(-2) = 0，也即是输出层的输出向量为[0]。换句话说，当神经网络输出层的输出向量为[1]时，可以确定待识别对象为苹果，当神经网络输出层的输出向量为[0]时，可以确定待识别对象为香蕉。由此，图2所示的神经网络即可实现对苹果和香蕉的识别。
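上述识别过程可以用如下Python草图复现（假设性示例：阶梯函数在x=0处的取值以及各函数名均为本示例的假设）：

```python
def step(x):
    """阶梯函数（此处假设x>0时输出1，否则输出0）"""
    return 1 if x > 0 else 0

def recognize(data_vector, weights=(1, 1), bias=0):
    """图2所示神经网络的识别运算：x_21 = f(x_11·w_1 + x_12·w_2 + b)"""
    s = sum(x * w for x, w in zip(data_vector, weights)) + bias
    return step(s)

apple = recognize([1, 1])     # f(2) = 1，输出向量为[1]，识别为苹果
banana = recognize([-1, -1])  # f(-2) = 0，输出向量为[0]，识别为香蕉
```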
当然,在实际应用中,待识别对象可以包括两个以上的特征值,且特征值可以不为一个特定的值,而是某一预设范围内的任一数值,神经网络也可以比图2所示的神经网络复杂得多,但其识别原理与上述说明是同理的。
由上述说明可知,为了使神经网络能够准确地对待识别对象进行识别,需要确定神经网络中每一个有向弧对应的权值,也即是需要确定神经网络中的各个权向量,而这就需要对神经网络进行训练。
在实际应用中，通常可以使用训练集对神经网络进行训练，其中，训练集中可以包括多个数据向量。在对神经网络进行训练的过程中，可以为初始状态下的神经网络随机设定一组权向量，并基于该随机设定的权向量对训练集中的一个数据向量进行识别运算，而后基于识别运算的结果对该组随机设定的权向量进行修正，得到修正权向量。神经网络可以基于该修正权向量对训练集中的另一个数据向量进行识别运算，并根据识别运算的结果对该修正权向量进行进一步的修正。在神经网络的训练过程中可以重复多次上述过程，直至神经网络基于修正后的权向量对任一数据向量进行识别运算都能得到接近理想的输出向量为止。
在实际应用中，神经网络的训练算法有许多种，例如，反向传播（英文：Back Propagation；简称：BP）算法、赫布学习（英文：Hebbian Learning）算法、神经突触可塑性（英文：Spike Timing Dependent Plasticity；简称：STDP）算法和受限玻尔兹曼机（英文：Restricted Boltzmann Machines；简称：RBM）算法等。然而，无论是哪一种训练算法都需要根据神经网络对数据向量的识别运算结果对权向量进行修正，不同训练算法的区别仅在于对权向量修正的策略不同。同样地，无论是哪一种训练算法均包含多种类型的运算，如向量点积运算、非线性变换运算、权向量修正运算等。相关技术中，神经网络训练装置为了对神经网络进行训练，通常需要配置一个能够进行上述多种类型运算的处理器，为了满足该处理器的运算通用性，处理器的电路结构通常较为复杂，这使得处理器的运算效率较低，从而导致神经网络的训练效率也较低。
为了解决现有的神经网络的训练效率较低的问题,本申请提供了一种神经网络训练系统300,如图3所示,该神经网络训练系统300可以包括第一处理设备301和第二处理设备302,该第一处理设备301和第二处理设备302不同。
其中,第一处理设备301,用于:获取目标神经网络的权向量;获取训练集中的N个数据向量,其中,该训练集包括多个数据向量,N为大于或等于1的正整数;基于该N个数据向量中的每一个数据向量和该权向量进行第一运算得到N组输出值,该第一运算包括向量点积运算。
该第二处理设备302，用于：获取至少一个修正值，该至少一个修正值中的每个修正值用于修正该权向量中的一个向量元素，每个修正值根据该N组输出值计算得到；根据该至少一个修正值对该权向量中的向量元素进行修正，得到修正权向量；将该修正权向量发送至第一处理设备301，该修正权向量用于指示第一处理设备301基于该训练集中的N个其他数据向量和该修正权向量进行该第一运算，该N个其他数据向量为训练集中除该N个数据向量之外的数据向量。
根据上文的说明,尽管神经网络的训练算法有许多种,然而,无论是哪一种训练算法都需要根据神经网络对数据向量的识别运算结果对权向量进行修正,其中,根据上文中的公式(1)可知,神经网络的识别运算可以包括向量点积运算和非线性变换运算,换句话说,无论是哪一种训练算法都需要进行向量点积运算和非线性变换运算这两种运算类型,实际上,神经网络的训练过程中的大部分运算都是向量点积运算。
本申请提供的神经网络训练系统可以单独设置一个第一处理设备301执行向量点积运算。可选地,该第一处理设备301可以获取目标神经网络的权向量和训练集中的N个数据向量,并基于该权向量和上述N个数据向量中的每一个数据向量进行第一运算,以得到与该N个数据向量一一对应的N组输出值,其中,上述第一运算包括向量点积运算,每一组输出值中都包含神经网络对与其对应的数据向量的识别运算结果。需要指出的是,上述第一处理设备301获取的目标神经网络的权向量可以是为初始状态下为目标神经网络随机设定的权向量,也可以是第二处理设备302向第一处理设备301发送的修正后的权向量。
同时，本申请提供的神经网络训练系统还可以设置第二处理设备302执行神经网络训练过程中除向量点积运算以外的其他类型的运算，该其他类型的运算通常可以包括权向量修正运算等。可选的，第二处理设备302可以获取至少一个修正值，其中，该至少一个修正值中的每个修正值用于修正权向量中的一个向量元素，第二处理设备302可以根据该至少一个修正值对权向量中的向量元素进行修正，得到修正权向量。第二处理设备302可以存储该修正权向量，并将该修正权向量发送至第一处理设备301中，以由第一处理设备301根据该修正权向量和训练集中的N个其他数据向量进行第一运算得到另外的N组输出值，也即是第一处理设备301可以根据该修正权向量对训练集中的数据向量进行识别运算，并由第二处理设备302根据该识别运算结果进一步修正该修正权向量。其中，所谓根据至少一个修正值对权向量的向量元素进行修正指的是：将权向量中的向量元素与对应的修正值相加得到修正向量元素，例如，如图1所示的神经网络中，指向第3层中第1个节点的有向弧a_4、a_5和a_6当前对应的权值分别为1、-1、1，其组成的权向量为[1, -1, 1]，第二处理设备302获取的修正值分别为0.2、0.3和-0.1，则根据该修正值对权向量[1, -1, 1]的向量元素分别进行修正后，可以得到修正向量元素1.2、-0.7和0.9，该修正向量元素可以组成修正权向量[1.2, -0.7, 0.9]。
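上段所述"将权向量中的向量元素与对应的修正值相加"的修正过程可以用如下Python片段示意（假设性示例，函数名为本示例自拟）：

```python
def apply_corrections(weight_vector, corrections):
    """根据至少一个修正值修正权向量：逐元素相加得到修正权向量"""
    return [w + d for w, d in zip(weight_vector, corrections)]

# 对应正文中的例子：权向量[1, -1, 1]与修正值0.2、0.3、-0.1逐元素相加
corrected = apply_corrections([1, -1, 1], [0.2, 0.3, -0.1])
# corrected约为[1.2, -0.7, 0.9]（浮点近似）
```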
这样，由于第一处理设备301可以仅执行向量点积运算，因此，第一处理设备301可以仅包含执行向量点积运算所需的特殊电路，电路结构较为简单，运算效率较高。由于神经网络训练过程中大部分的运算均为向量点积运算，第一处理设备301较高的运算效率可以显著提高神经网络的整体训练效率。同时，由于不同的神经网络训练算法的权向量修正策略不同，因此，执行权向量修正运算的第二处理设备302可以根据不同的神经网络训练算法进行相应的配置，从而能够满足不同的神经网络训练算法的需求，使得神经网络训练系统对神经网络的训练更加灵活。
在传统的神经网络训练过程中，对于训练集中的每一个数据向量而言，神经网络训练系统都需要进行一次权向量修正，也即是，神经网络训练系统对于训练集中的每一个数据向量都需要执行一次计算至少一个修正值和利用该至少一个修正值对权向量进行修正的技术过程。如果本申请提供的神经网络训练系统按照上述传统方法对神经网络进行训练，那么，针对训练集中的每一个数据向量，第二处理设备302都需要进行一次权向量修正运算，并需要将修正权向量发送至第一处理设备301，然而，实际应用中，目标神经网络的权向量的数据量可能十分庞大，因此，第二处理设备302频繁地向第一处理设备301发送修正权向量所需的通信带宽较大。
为了降低第二处理设备302和第一处理设备301之间的通信带宽,本申请提供的神经网络训练系统可以针对训练集中的一批数据向量(也即是N大于或等于2)执行一次权向量修正,也即是,神经网络训练系统可以计算训练集中的一批数据向量分别对应的识别运算结果,得到一批输出值(N组输出值),并根据该一批输出值计算累加的修正值,并根据累加的修正值对权向量进行一次修正,这样,第二处理设备302向第一处理设备301发送修正权向量的次数就减少了,从而降低了第二处理设备302和第一处理设备301之间的通信带宽。
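上述批量修正以降低通信开销的思路可以用如下Python片段示意（假设性示例：每个数据向量对应的修正值如何计算见后文公式，此处仅演示逐元素累加，数值均为自拟）：

```python
def accumulated_corrections(per_sample_corrections):
    """对一批(N个)数据向量各自产生的修正值逐元素累加，
    权向量只需按累加结果修正一次，从而减少修正权向量的发送次数"""
    length = len(per_sample_corrections[0])
    return [sum(c[i] for c in per_sample_corrections) for i in range(length)]

# 例：N=3个数据向量分别产生的修正值（示例数值）
batch = [[0.1, -0.2], [0.05, 0.1], [-0.05, 0.1]]
total = accumulated_corrections(batch)  # 逐元素求和，约为[0.1, 0.0]
```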
需要指出的是，在实际应用中，第一处理设备301和第二处理设备302可以集成于同一设备中；当然，第一处理设备301和第二处理设备302也可以为不同的设备。在本申请的一个实施例中，第一处理设备301可以包括第二处理器，第二处理设备302可以包括第三处理器；或者，第一处理设备301可以为预设的运算网络中多个运算节点组成的处理设备，在实际应用中，该多个运算节点可以为手机或电脑等，第二处理设备302可以为部署于云端或预设的通信网络边缘的处理设备，例如，第二处理设备302可以为部署于预设的通信网络边缘的基站等。
综上所述，本申请提供的神经网络训练系统，通过在神经网络训练系统中设置第一处理设备和第二处理设备，其中，第一处理设备可以执行神经网络训练过程中的向量点积运算，而第二处理设备可以执行神经网络训练过程中的权向量修正运算等其他类型的运算。因此，第一处理设备可以仅包含执行向量点积运算所需的特殊电路，电路结构较为简单，运算效率较高。由于神经网络训练过程中大部分的运算均为向量点积运算，第一处理设备较高的运算效率可以显著提高神经网络的整体训练效率。
在实际应用中，由于神经网络的识别运算通常包含向量点积运算和非线性变换运算这两种类型的运算，也即是，为了得到上述N组输出值需要进行向量点积运算和非线性变换运算两种类型的运算。本申请中第一处理设备301可以仅执行其中的向量点积运算，而由另外的处理设备执行非线性变换运算以最终得到上述N组输出值；或者，本申请中的第一处理设备301可以既执行向量点积运算又执行非线性变换运算，也即是，上述第一运算还可以包括非线性变换运算。在这种情况下，第一处理设备301可以包含执行向量点积运算和非线性变换运算所需的电路，其电路结构仍然较为简单，运算效率也较高，同时，不需要设置另一个进行非线性变换运算的处理设备也可以减小硬件上的开销。
可选的，在本申请中，为了进一步降低第二处理设备302和第一处理设备301之间的通信带宽，第二处理设备302可以对上述修正权向量进行第一预设处理得到处理后的修正权向量，并将处理后的修正权向量发送至第一处理设备301中，以由第一处理设备301根据该处理后的修正权向量对训练集中的数据向量进行识别运算。在实际应用中，处理后的修正权向量的数据量通常小于修正权向量的数据量，例如，在本申请的一个实施例中，修正权向量中一个向量元素占据的存储空间可以为16位以上（通常为32位），而处理后的修正权向量中一个向量元素占据的存储空间可以为4位到8位，因此，将修正权向量经过第一预设处理后再发送给第一处理设备301可以降低第二处理设备302和第一处理设备301之间的通信带宽。其中，上述第一预设处理可以为压缩处理或者量化处理中的至少一个，量化处理指的是将修正权向量中的每个向量元素映射为一个数据量较小的值，其中，该映射过程可以通过函数实现，也可以通过查找表的形式实现，本申请对此不做具体限定。
第一处理设备301在接收到上述经过处理后的修正权向量后可以对该处理后的修正权向量进行第二预设处理得到修正权向量,其中,该第二预设处理是第一预设处理的逆处理,也即是第二预设处理为与第一预设处理相反的处理,而后第一处理设备301基于得到的修正权向量对训练集中的数据向量进行识别运算,或者,第一处理设备301可以根据该处理后的修正权向量直接对训练集中的数据向量进行识别运算,本申请对此不做具体限定。
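上述第一预设处理中的量化处理及其逆处理可以用如下Python草图示意（假设性示例：此处采用均匀量化，把每个向量元素映射为0~255的8位整数编码；量化区间、位宽等参数均为本示例的假设，实际也可以通过查找表实现）：

```python
def quantize(weights, lo, hi, levels=256):
    """第一预设处理（量化）：把每个向量元素映射为一个数据量较小的整数编码"""
    scale = (hi - lo) / (levels - 1)
    return [round((w - lo) / scale) for w in weights]

def dequantize(codes, lo, hi, levels=256):
    """第二预设处理（第一预设处理的逆处理）：由整数编码近似恢复修正权向量"""
    scale = (hi - lo) / (levels - 1)
    return [lo + c * scale for c in codes]

weights = [1.2, -0.7, 0.9]                     # 修正权向量（示例数值）
codes = quantize(weights, lo=-1.5, hi=1.5)     # 发送的编码数据量更小
restored = dequantize(codes, lo=-1.5, hi=1.5)  # 接收端近似恢复修正权向量
```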
可选的,如上所述,第二处理设备302可以获取根据N组输出值计算得到的至少一个修正值,其中,本申请提供了两种第二处理设备302获取该至少一个修正值的方式,其中:
在第一种方式中,第一处理设备301可以根据该N组输出值计算该至少一个修正值,并将计算得到的该至少一个修正值发送至第二处理设备302。该第二处理设备302可以接收第一处理设备301发送的至少一个修正值。
在第二种方式中,第一处理设备301可以将该N组输出值发送至第二处理设备302中,第二处理设备302可以根据该N组输出值计算得到该至少一个修正值。
下面，本申请将以当前较为常见的BP算法为例，对神经网络训练系统计算N组输出值并根据该N组输出值计算至少一个修正值的技术过程进行说明，如图5所示，该技术过程可以包括以下步骤：
步骤11、第一处理设备301基于第一数据向量和权向量进行神经网络正向运算得到该目标神经网络每一层的正向输出向量,其中,该第一数据向量为上述N个数据向量中的任意一个数据向量。
其中,神经网络正向运算通常包括向量点积运算和非线性变换运算,其指的是由神经网络的输入层输入数据向量,根据上述公式(1)经过逐层运算后得到神经网络的输出层的输出向量的运算,在运算过程中计算得到的神经网络每一层的输出向量均可以被称为该层的正向输出向量。例如,如图1所示的神经网络中,神经网络正向运算指的是由神经网络的第1层输入数据向量,而后根据神经网络的权向量通过公式(1)依次计算神经网络第2层的输出向量、神经网络第3层的输出向量,并最终得到神经网络第4层的输出向量的运算。
步骤12、第一处理设备301将目标神经网络输出层的正向输出向量与第一数据向量对应的理想输出向量相减的差向量获取为误差向量。
其中，第一数据向量对应的理想输出向量可以存储于上述训练集中。例如，图1所示的神经网络中，由该神经网络的输入层（也即是该神经网络的第1层）输入的训练集中的数据向量为[1, 1]，经过神经网络正向运算后得到的神经网络的输出层（也即是该神经网络的第4层）的正向输出向量可以为[3, 2, 3]，在训练集中，该数据向量[1, 1]对应的理想输出向量为[1, 1, 1]，则第一处理设备301可以将向量[3, 2, 3]和向量[1, 1, 1]的差向量[2, 1, 2]获取为误差向量。
步骤13、第一处理设备301基于误差向量和权向量进行神经网络反向运算得到目标神经网络每一层的反向输出向量。
其中,神经网络反向运算包括向量点积运算,其指的是由神经网络的输出层输入误差向量,经过逐层运算后得到神经网络的输入层的输出向量的运算,在运算过程中计算得到的神经网络每一层的输出向量均可以被称为该层的反向输出向量,在神经网络反向运算中,有向弧的指向方向与神经网络正向运算中有向弧的指向方向正好相反。在神经网络的反向运算中,神经网络每一层的反向输出向量可以基于公式(2)计算得到:
e_pj = g(t·q) = g(e_(p+1)1×w_(p+1)1j + e_(p+1)2×w_(p+1)2j + …… + e_(p+1)n×w_(p+1)nj)       (2)。
在公式(2)中，e_pj为神经网络第p层的反向输出向量中第j个向量元素的值，g为函数符号，t为神经网络第p+1层的反向输出向量，且t=[e_(p+1)1, e_(p+1)2, ……, e_(p+1)n]，其中，e_(p+1)n为神经网络第p+1层的反向输出向量中的第n个向量元素，神经网络的第p+1层包括n个节点，“·”为向量点积运算符，q为神经网络第p+1层中所有节点指向第p层的第j个节点的有向弧对应的权值所组成的权向量，且q=[w_(p+1)1j, w_(p+1)2j, ……, w_(p+1)nj]，其中，w_(p+1)nj为神经网络的第p+1层中第n个节点指向神经网络第p层中第j个节点的有向弧对应的权值。
图4所示为图1所示的神经网络在进行神经网络反向运算时有向弧的指向方向的示意图,根据图4所示,神经网络反向运算是由神经网络的第4层输入误差向量,而后根据上述公式(2)依次计算神经网络第3层的输出向量、神经网络第2层的输出向量,并最终得到神经网络第1层的输出向量的运算。
步骤14、第一处理设备301得到与训练集中的N个数据向量一一对应的N组输出值。
第一处理设备301将目标神经网络每一层的正向输出向量和目标神经网络每一层的反向输出向量获取为对应于第一数据向量的一组输出值。对上述N个数据向量分别执行上述技术过程即可得到与该N个数据向量一一对应的N组输出值。
步骤15、第一处理设备301或第二处理设备302基于N组输出值通过公式(3)计算至少一个修正值。
其中,公式(3)可以为:
Δw_ij = X_pi1×E_(p+1)j1 + X_pi2×E_(p+1)j2 + …… + X_piN×E_(p+1)jN       (3)。
其中，Δw_ij表示目标神经网络中由第p层中第i个节点指向第p+1层中第j个节点的有向弧对应的权向量向量元素的修正值，X_pib表示N组输出值的第b组输出值中目标神经网络的第p层的正向输出向量中的第i个向量元素，E_(p+1)jb表示N组输出值的第b组输出值中目标神经网络的第p+1层的反向输出向量中的第j个向量元素，i、j和p均为大于或等于1的正整数。
其中,当步骤15由第二处理设备302执行时,第一处理设备301还需要将计算得到的N组输出值发送至第二处理设备302。
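图5所示的步骤11至步骤15可以用如下Python草图串联起来（假设性示例：非线性函数f、反向运算中的函数g在示例末尾均取恒等函数，网络结构与数值为自拟；公式(3)中未包含学习率，此处同样不包含）：

```python
def forward(x, weights, biases, f):
    """步骤11：神经网络正向运算（公式(1)），返回每一层的正向输出向量。
    weights[p][i][j]为第p层第i个节点指向第p+1层第j个节点的权值（p从0起）"""
    outputs = [x]
    for W, b in zip(weights, biases):
        x = [f(sum(xi * W[i][j] for i, xi in enumerate(x)) + b[j])
             for j in range(len(b))]
        outputs.append(x)
    return outputs

def backward(err, weights, g):
    """步骤13：神经网络反向运算（公式(2)），由输出层输入误差向量，
    有向弧指向方向与正向运算相反，返回每一层的反向输出向量"""
    outputs = [err]
    for W in reversed(weights):
        err = [g(sum(e * W[j][n] for n, e in enumerate(err)))
               for j in range(len(W))]
        outputs.append(err)
    return list(reversed(outputs))  # 第p项为第p+1层的反向输出向量（p从0起）

def corrections(samples, ideals, weights, biases, f, g):
    """步骤12、14、15：对N个数据向量计算N组输出值，并按公式(3)累加修正值"""
    delta = [[[0.0] * len(W[0]) for _ in W] for W in weights]
    for x, ideal in zip(samples, ideals):
        fwd = forward(x, weights, biases, f)           # 步骤11
        err = [o - t for o, t in zip(fwd[-1], ideal)]  # 步骤12：误差向量
        bwd = backward(err, weights, g)                # 步骤13
        for p in range(len(weights)):                  # 步骤15：累加X_pib×E_(p+1)jb
            for i, xpi in enumerate(fwd[p]):
                for j, epj in enumerate(bwd[p + 1]):
                    delta[p][i][j] += xpi * epj
    return delta

# 例：2输入、1输出的两层网络，f与g取恒等函数，一批N=2个数据向量
identity = lambda v: v
W = [[[1.0], [1.0]]]   # 输入层两个节点各以权值1指向输出层节点
b = [[0.0]]
delta = corrections([[1.0, 1.0], [1.0, -1.0]], [[1.0], [0.0]],
                    W, b, identity, identity)
```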
本申请还提供了一种神经网络训练方法,该神经网络训练方法应用于第一处理设备中,如图6所示,该神经网络训练方法可以包括以下步骤:
步骤601、第一处理设备获取目标神经网络的权向量。
步骤602、第一处理设备获取训练集中的N个数据向量,其中,该训练集包括多个数据向量,N为大于或等于1的正整数。
步骤603、第一处理设备基于该N个数据向量中的每一个数据向量和该权向量进行第一运算得到N组输出值,该第一运算包括向量点积运算,以使第二处理设备获取至少一个修正值,其中,该至少一个修正值中的每个修正值用于修正该权向量中的一个向量元素,该每个修正值根据该N组输出值计算得到,并使得第二处理设备根据该至少一个修正值对该权向量中的向量元素进行修正,得到修正权向量,并使得第二处理设备将该修正权向量发送至第一处理设备,该修正权向量用于指示第一处理设备基于该训练集中的N个其他数据向量和该修正权向量进行该第一运算,该N个其他数据向量为该训练集中除该N个数据向量之外的数据向量。
综上所述，本申请提供的神经网络训练方法，通过第一处理设备执行神经网络训练过程中的向量点积运算，而第二处理设备执行神经网络训练过程中的权向量修正运算等其他类型的运算，使得第一处理设备可以仅包含执行向量点积运算所需的特殊电路，电路结构较为简单，运算效率较高。由于神经网络训练过程中大部分的运算均为向量点积运算，第一处理设备较高的运算效率可以显著提高神经网络的整体训练效率。
本申请实施例还提供了一种神经网络训练方法,该神经网络训练方法应用于第二处理设备中,如图7所示,该神经网络训练方法可以包括以下步骤:
步骤701、第二处理设备获取至少一个修正值,该至少一个修正值中的每个修正值用于修正目标神经网络的权向量中的一个向量元素,该每个修正值根据N组输出值计算得到,其中,该N组输出值由第一处理设备根据目标神经网络的权向量和训练集中的N个数据向量进行第一运算计算得到,该第一运算包括向量点积运算。
步骤702、第二处理设备根据该至少一个修正值对该权向量中的向量元素进行修正,得到修正权向量。
步骤703、第二处理设备将该修正权向量发送至第一处理设备,该修正权向量用于指示第一处理设备基于该训练集中的N个其他数据向量和该修正权向量进行该第一运算,该N个其他数据向量为该训练集中除该N个数据向量之外的数据向量。
综上所述，本申请提供的神经网络训练方法，通过第一处理设备执行神经网络训练过程中的向量点积运算，而第二处理设备执行神经网络训练过程中的权向量修正运算等其他类型的运算，使得第一处理设备可以仅包含执行向量点积运算所需的特殊电路，电路结构较为简单，运算效率较高。由于神经网络训练过程中大部分的运算均为向量点积运算，第一处理设备较高的运算效率可以显著提高神经网络的整体训练效率。
本申请实施例还提供了一种神经网络训练方法，该神经网络训练方法应用于神经网络训练系统中，如图8所示，该神经网络训练方法可以包括以下步骤：
步骤801、第一处理设备获取目标神经网络的权向量。
步骤802、第一处理设备获取训练集中的N个数据向量,其中,该训练集包括多个数据向量,N为大于或等于1的正整数。
步骤803、第一处理设备基于该N个数据向量中的每一个数据向量和该权向量进行第一运算得到N组输出值,该第一运算包括向量点积运算。
其中,第一处理设备得到N组输出值的技术过程在上文中的步骤11至步骤14中已经进行了说明,本申请在此不再赘述。
步骤804、第二处理设备获取至少一个修正值,该至少一个修正值中的每个修正值用于修正该权向量中的一个向量元素,每个修正值根据该N组输出值计算得到。
本申请提供了两种第二处理设备获取该至少一个修正值的方式,其中:
在第一种方式中,第一处理设备可以根据该N组输出值计算该至少一个修正值,并将计算得到的该至少一个修正值发送至第二处理设备。该第二处理设备可以接收第一处理设备发送的至少一个修正值。
在第二种方式中,第一处理设备可以将该N组输出值发送至第二处理设备中,第二处理设备可以根据该N组输出值计算得到该至少一个修正值。
其中,第一处理设备或第二处理设备根据该N组输出值计算至少一个修正值的技术过程在上文中的步骤15中已经进行了说明,本申请在此不再赘述。
步骤805、第二处理设备根据该至少一个修正值对该权向量中的向量元素进行修正,得到修正权向量。
步骤806、第二处理设备将该修正权向量发送至该第一处理设备,该修正权向量用于指示该第一处理设备基于该训练集中的N个其他数据向量和该修正权向量进行该第一运算,该N个其他数据向量为该训练集中除该N个数据向量之外的数据向量。
在实际应用中,为了降低第二处理设备和第一处理设备之间的通信带宽,第二处理设备可以对上述修正权向量进行第一预设处理得到处理后的修正权向量,并将处理后的修正权向量发送至第一处理设备中,以由第一处理设备根据该处理后的修正权向量对训练集中的数据向量进行识别运算。在实际应用中,处理后的修正权向量的数据量通常小于修正权向量的数据量,例如,在本申请的一个实施例中,修正权向量中一个向量元素占据的存储空间可以为16位以上(通常为32位),而处理后的修正权向量中一个向量元素占据的存储空间可以为4位到8位,因此,将修正权向量经过第一预设处理后再发送给第一处理设备可以降低第二处理设备和第一处理设备之间的通信带宽。其中,上述第一预设处理可以为压缩处理或者量化处理中的至少一个,量化处理指的是将修正权向量中的每个向量元素映射为一个数据量较小的值,其中,该映射过程可以通过函数实现,也可以通过查找表的形式实现,本申请对此不做具体限定。
第一处理设备在接收到上述经过处理后的修正权向量后可以对该处理后的修正权向量进行第二预设处理得到修正权向量,其中,该第二预设处理是第一预设处理的逆处理,也即是第二预设处理为与第一预设处理相反的处理,而后第一处理设备基于得到的修正权向量对训练集中的数据向量进行识别运算,或者,第一处理设备可以根据该处理后的修正权向量直接对训练集中的数据向量进行识别运算,本申请对此不做具体限定。
综上所述，本申请提供的神经网络训练方法，通过第一处理设备执行神经网络训练过程中的向量点积运算，而第二处理设备执行神经网络训练过程中的权向量修正运算等其他类型的运算，使得第一处理设备可以仅包含执行向量点积运算所需的特殊电路，电路结构较为简单，运算效率较高。由于神经网络训练过程中大部分的运算均为向量点积运算，第一处理设备较高的运算效率可以显著提高神经网络的整体训练效率。
在示例性实施例中,本申请还提供了一种计算机可读存储介质,该计算机可读存储介质可以为非易失性存储介质,该计算机可读存储介质中存储有计算机程序,当该计算机可读存储介质中的计算机程序被上文所述的第一处理设备301执行时能够实现上述第一处理设备301在神经网络训练过程中执行的运算,或者,当该计算机可读存储介质中的计算机程序被上述第二处理设备302执行时能够实现上述第二处理设备302在神经网络训练过程中执行的运算。
在示例性的实施例中,本申请还提供了一种包含指令的计算机程序产品,当其在第一处理设备301上运行时,使得该第一处理设备301能够实现上述实施例中第一处理设备301在神经网络训练过程中执行的运算;或者,
当其在第二处理设备302上运行时,使得该第二处理设备302能够实现上述实施例中第二处理设备302在神经网络训练过程中执行的运算。
本领域普通技术人员可以理解实现上述实施例的全部或部分步骤可以通过硬件来完成，也可以通过程序来指令相关的硬件完成，所述的程序可以存储于一种计算机可读存储介质中，上述提到的存储介质可以是只读存储器、磁盘或光盘等。
以上所述仅为本申请的可选实施例,并不用以限制本申请,凡在本申请的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。

Claims (23)

  1. 一种神经网络训练系统,其特征在于,所述神经网络训练系统包括第一处理设备和第二处理设备,所述第一处理设备和所述第二处理设备不同;
    所述第一处理设备,用于:
    获取目标神经网络的权向量;
    获取训练集中的N个数据向量,其中,所述训练集包括多个数据向量,N为大于或等于1的正整数;
    基于所述N个数据向量中的每一个数据向量和所述权向量进行第一运算得到N组输出值,所述第一运算包括向量点积运算;
    所述第二处理设备,用于:
    获取至少一个修正值,所述至少一个修正值中的每个修正值用于修正所述权向量中的一个向量元素,所述每个修正值根据所述N组输出值计算得到;
    根据所述至少一个修正值对所述权向量中的向量元素进行修正,得到修正权向量;
    将所述修正权向量发送至所述第一处理设备,所述修正权向量用于指示所述第一处理设备基于所述训练集中的N个其他数据向量和所述修正权向量进行所述第一运算,所述N个其他数据向量为所述训练集中除所述N个数据向量之外的数据向量。
  2. 根据权利要求1所述的系统,其特征在于,所述第一处理设备,还用于根据所述N组输出值计算所述至少一个修正值,并将计算得到的所述至少一个修正值发送至所述第二处理设备;
    所述第二处理设备,具体用于接收所述第一处理设备发送的所述至少一个修正值。
  3. 根据权利要求1所述的系统,其特征在于,所述第一处理设备,还用于将所述N组输出值发送至所述第二处理设备;
    所述第二处理设备,具体用于根据所述N组输出值计算所述至少一个修正值。
  4. 根据权利要求1所述的系统,其特征在于,所述第二处理设备,还用于对所述修正权向量进行第一预设处理得到处理后的修正权向量,所述处理后的修正权向量的数据量小于所述修正权向量的数据量;
    所述第二处理设备,具体用于将所述处理后的修正权向量发送至所述第一处理设备;
    所述第一处理设备,还用于接收所述第二处理设备发送的所述处理后的修正权向量,并对所述处理后的修正权向量进行第二预设处理得到所述修正权向量,所述第二预设处理为所述第一预设处理的逆处理。
  5. 根据权利要求4所述的系统,其特征在于,所述第一预设处理包括压缩处理和量化处理中的至少一个。
  6. 根据权利要求1所述的系统,其特征在于,所述第一处理设备,具体用于:
    基于第一数据向量和所述权向量进行神经网络正向运算得到所述目标神经网络每一层的正向输出向量，所述神经网络正向运算包括向量点积运算和非线性变换运算，所述第一数据向量为所述N个数据向量中的任意一个数据向量；
    获取误差向量,所述误差向量为所述目标神经网络的输出层的正向输出向量与所述训练集中所述第一数据向量对应的理想输出向量的差向量;
    基于所述误差向量和所述权向量进行神经网络反向运算得到所述目标神经网络每一层的反向输出向量,所述神经网络反向运算包括向量点积运算;
    将所述目标神经网络每一层的正向输出向量和所述目标神经网络每一层的反向输出向量获取为对应于所述第一数据向量的一组输出值。
  7. 根据权利要求6所述的系统,其特征在于,所述第一处理设备,还用于根据所述N组输出值通过公式计算所述至少一个修正值,其中,所述公式为:
    Δw_ij = X_pi1×E_(p+1)j1 + X_pi2×E_(p+1)j2 + …… + X_piN×E_(p+1)jN
    其中，Δw_ij表示所述目标神经网络中由第p层中第i个节点指向第p+1层中第j个节点的有向弧对应的权向量向量元素的修正值，X_pib表示所述N组输出值的第b组输出值中所述目标神经网络的第p层的正向输出向量中的第i个向量元素，E_(p+1)jb表示所述N组输出值的第b组输出值中所述目标神经网络的第p+1层的反向输出向量中的第j个向量元素，i、j和p均为大于或等于1的正整数；
    所述第一处理设备,还用于将计算得到的所述至少一个修正值发送至所述第二处理设备;
    所述第二处理设备,具体用于接收所述第一处理设备发送的所述至少一个修正值。
  8. 根据权利要求6所述的系统,其特征在于,第一处理设备,还用于将所述N组输出值发送至所述第二处理设备;
    所述第二处理设备,具体用于根据所述N组输出值通过公式计算所述至少一个修正值,其中,所述公式为:
    Δw_ij = X_pi1×E_(p+1)j1 + X_pi2×E_(p+1)j2 + …… + X_piN×E_(p+1)jN
    其中，Δw_ij表示所述目标神经网络中由第p层中第i个节点指向第p+1层中第j个节点的有向弧对应的权向量向量元素的修正值，X_pib表示所述N组输出值的第b组输出值中所述目标神经网络的第p层的正向输出向量中的第i个向量元素，E_(p+1)jb表示所述N组输出值的第b组输出值中所述目标神经网络的第p+1层的反向输出向量中的第j个向量元素，i、j和p均为大于或等于1的正整数。
  9. 根据权利要求1所述的系统,其特征在于,所述第一处理设备和所述第二处理设备集成于一个设备中。
  10. 根据权利要求1所述的系统,其特征在于,所述第一处理设备为预设的运算网络中多个运算节点组成的处理设备;
    所述第二处理设备为部署于云端或预设的通信网络边缘的处理设备。
  11. 根据权利要求1-10任一所述的系统,其特征在于,所述第一运算还包括非线性变换运算。
  12. 一种神经网络训练方法,其特征在于,所述方法包括:
    第一处理设备获取目标神经网络的权向量;
    所述第一处理设备获取训练集中的N个数据向量,其中,所述训练集包括多个数据向量,N为大于或等于1的正整数;
    所述第一处理设备基于所述N个数据向量中的每一个数据向量和所述权向量进行第一运算得到N组输出值,所述第一运算包括向量点积运算;
    第二处理设备获取至少一个修正值,所述至少一个修正值中的每个修正值用于修正所述权向量中的一个向量元素,所述每个修正值根据所述N组输出值计算得到,所述第二处理设备与所述第一处理设备不同;
    所述第二处理设备根据所述至少一个修正值对所述权向量中的向量元素进行修正,得到修正权向量;
    所述第二处理设备将所述修正权向量发送至所述第一处理设备,所述修正权向量用于指示所述第一处理设备基于所述训练集中的N个其他数据向量和所述修正权向量进行所述第一运算,所述N个其他数据向量为所述训练集中除所述N个数据向量之外的数据向量。
  13. 根据权利要求12所述的方法,其特征在于,所述方法还包括:
    所述第一处理设备根据所述N组输出值计算所述至少一个修正值,并将计算得到的所述至少一个修正值发送至所述第二处理设备;
    所述第二处理设备获取至少一个修正值,包括:
    所述第二处理设备接收所述第一处理设备发送的所述至少一个修正值。
  14. 根据权利要求12所述的方法,其特征在于,所述方法还包括:
    所述第一处理设备将所述N组输出值发送至所述第二处理设备;
    所述第二处理设备获取至少一个修正值,包括:
    所述第二处理设备根据所述N组输出值计算所述至少一个修正值。
  15. 根据权利要求12所述的方法,其特征在于,所述方法还包括:
    所述第二处理设备对所述修正权向量进行第一预设处理得到处理后的修正权向量,所述处理后的修正权向量的数据量小于所述修正权向量的数据量;
    所述第二处理设备将所述修正权向量发送至所述第一处理设备,包括:
    所述第二处理设备将所述处理后的修正权向量发送至所述第一处理设备;
    所述方法还包括:
    所述第一处理设备接收所述第二处理设备发送的所述处理后的修正权向量,并对所述处理后的修正权向量进行第二预设处理得到所述修正权向量,所述第二预设处理是所述第一预设处理的逆处理。
  16. 根据权利要求15所述的方法,其特征在于,所述第一预设处理包括压缩处理和量化处理中的至少一个。
  17. 根据权利要求12所述的方法,其特征在于,所述第一处理设备基于所述N个数据向量中的每一个数据向量和所述权向量进行第一运算得到N组输出值,包括:
    所述第一处理设备基于第一数据向量和所述权向量进行神经网络正向运算得到所述目标神经网络每一层的正向输出向量,所述神经网络正向运算包括向量点积运算和非线性变换运算,所述第一数据向量为所述N个数据向量中的任意一个数据向量;
    所述第一处理设备获取误差向量,所述误差向量为所述目标神经网络的输出层的正向输出向量与所述训练集中所述第一数据向量对应的理想输出向量的差向量;
    所述第一处理设备基于所述误差向量和所述权向量进行神经网络反向运算得到所述目标神经网络每一层的反向输出向量,所述神经网络反向运算包括向量点积运算;
    所述第一处理设备将所述目标神经网络每一层的正向输出向量和所述目标神经网络每一层的反向输出向量获取为对应于所述第一数据向量的一组输出值。
  18. 根据权利要求17所述的方法,其特征在于,所述方法还包括:
    所述第一处理设备根据所述N组输出值通过公式计算所述至少一个修正值,其中,所述公式为:
    Δw_ij = X_pi1×E_(p+1)j1 + X_pi2×E_(p+1)j2 + …… + X_piN×E_(p+1)jN
    其中，Δw_ij表示所述目标神经网络中由第p层中第i个节点指向第p+1层中第j个节点的有向弧对应的权向量向量元素的修正值，X_pib表示所述N组输出值的第b组输出值中所述目标神经网络的第p层的正向输出向量中的第i个向量元素，E_(p+1)jb表示所述N组输出值的第b组输出值中所述目标神经网络的第p+1层的反向输出向量中的第j个向量元素，i、j和p均为大于或等于1的正整数；
    所述第一处理设备将计算得到的所述至少一个修正值发送至所述第二处理设备;
    所述第二处理设备获取至少一个修正值,包括:
    所述第二处理设备接收所述第一处理设备发送的所述至少一个修正值。
  19. 根据权利要求17所述的方法,其特征在于,所述方法还包括:
    所述第一处理设备将所述N组输出值发送至所述第二处理设备;
    所述第二处理设备获取至少一个修正值,包括:
    所述第二处理设备根据所述N组输出值通过公式计算所述至少一个修正值,其中,所述公式为:
    Δw_ij = X_pi1×E_(p+1)j1 + X_pi2×E_(p+1)j2 + …… + X_piN×E_(p+1)jN
    其中，Δw_ij表示所述目标神经网络中由第p层中第i个节点指向第p+1层中第j个节点的有向弧对应的权向量向量元素的修正值，X_pib表示所述N组输出值的第b组输出值中所述目标神经网络的第p层的正向输出向量中的第i个向量元素，E_(p+1)jb表示所述N组输出值的第b组输出值中所述目标神经网络的第p+1层的反向输出向量中的第j个向量元素，i、j和p均为大于或等于1的正整数。
  20. 根据权利要求12所述的方法,其特征在于,所述第一处理设备和所述第二处理设备集成于一个设备中。
  21. 根据权利要求12所述的方法,其特征在于,所述第一处理设备为预设的运算网络中多个运算节点组成的处理设备;
    所述第二处理设备为部署于云端或预设的通信网络边缘的处理设备。
  22. 根据权利要求12-21任一所述的方法,其特征在于,所述第一运算还包括非线性变换运算。
  23. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质中存储有计算机程序,存储的所述计算机程序被权利要求1-11任一所述的第一处理设备执行时能够实现权利要求1-11任一所述的第一处理设备在神经网络训练过程中执行的运算;或者,
    存储的所述计算机程序被权利要求1-11任一所述的第二处理设备执行时能够实现权利要求1-11任一所述的第二处理设备在神经网络训练过程中执行的运算。
PCT/CN2018/079500 2017-08-22 2018-03-19 神经网络训练系统、方法和计算机可读存储介质 WO2019037409A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201880025109.7A CN110506280B (zh) 2017-08-22 2018-03-19 神经网络训练系统、方法和计算机可读存储介质

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710725775.9 2017-08-22
CN201710725775.9A CN109426859B (zh) 2017-08-22 2017-08-22 神经网络训练系统、方法和计算机可读存储介质

Publications (1)

Publication Number Publication Date
WO2019037409A1 true WO2019037409A1 (zh) 2019-02-28

Family

ID=65438345

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/079500 WO2019037409A1 (zh) 2017-08-22 2018-03-19 神经网络训练系统、方法和计算机可读存储介质

Country Status (2)

Country Link
CN (2) CN109426859B (zh)
WO (1) WO2019037409A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113177355A (zh) * 2021-04-28 2021-07-27 南方电网科学研究院有限责任公司 一种电力负荷预测方法

Families Citing this family (3)

Publication number Priority date Publication date Assignee Title
CN109426859B (zh) * 2017-08-22 2021-03-05 华为技术有限公司 神经网络训练系统、方法和计算机可读存储介质
CN111783932B (zh) * 2019-04-03 2024-07-23 华为技术有限公司 训练神经网络的方法和装置
CN111126596B (zh) * 2019-12-17 2021-03-19 百度在线网络技术(北京)有限公司 神经网络训练中的信息处理方法、设备与存储介质

Citations (5)

Publication number Priority date Publication date Assignee Title
CN105654176A (zh) * 2014-11-14 2016-06-08 富士通株式会社 神经网络系统及神经网络系统的训练装置和方法
CN105678395A (zh) * 2014-11-21 2016-06-15 阿里巴巴集团控股有限公司 神经网络的建立方法及系统和神经网络的应用方法及系统
CN105900116A (zh) * 2014-02-10 2016-08-24 三菱电机株式会社 分层型神经网络装置、判别器学习方法以及判别方法
CN106203622A (zh) * 2016-07-14 2016-12-07 杭州华为数字技术有限公司 神经网络运算装置
CN106203616A (zh) * 2015-05-04 2016-12-07 富士通株式会社 神经网络模型训练装置和方法

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
US20040024750A1 (en) * 2002-07-31 2004-02-05 Ulyanov Sergei V. Intelligent mechatronic control suspension system based on quantum soft computing
CN101101299A (zh) * 2007-06-25 2008-01-09 华东理工大学 一种并-串联模式识别方法及其在机器嗅觉中的应用
NO2310880T3 (zh) * 2008-08-06 2017-12-30
US9235799B2 (en) * 2011-11-26 2016-01-12 Microsoft Technology Licensing, Llc Discriminative pretraining of deep neural networks
CN107688493B (zh) * 2016-08-05 2021-06-18 阿里巴巴集团控股有限公司 训练深度神经网络的方法、装置及系统
CN109426859B (zh) * 2017-08-22 2021-03-05 华为技术有限公司 神经网络训练系统、方法和计算机可读存储介质

Patent Citations (5)

Publication number Priority date Publication date Assignee Title
CN105900116A (zh) * 2014-02-10 2016-08-24 三菱电机株式会社 分层型神经网络装置、判别器学习方法以及判别方法
CN105654176A (zh) * 2014-11-14 2016-06-08 富士通株式会社 神经网络系统及神经网络系统的训练装置和方法
CN105678395A (zh) * 2014-11-21 2016-06-15 阿里巴巴集团控股有限公司 神经网络的建立方法及系统和神经网络的应用方法及系统
CN106203616A (zh) * 2015-05-04 2016-12-07 富士通株式会社 神经网络模型训练装置和方法
CN106203622A (zh) * 2016-07-14 2016-12-07 杭州华为数字技术有限公司 神经网络运算装置

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN113177355A (zh) * 2021-04-28 2021-07-27 南方电网科学研究院有限责任公司 一种电力负荷预测方法
CN113177355B (zh) * 2021-04-28 2024-01-12 南方电网科学研究院有限责任公司 一种电力负荷预测方法

Also Published As

Publication number Publication date
CN109426859A (zh) 2019-03-05
CN110506280B (zh) 2022-12-27
CN109426859B (zh) 2021-03-05
CN110506280A (zh) 2019-11-26


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18848921

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18848921

Country of ref document: EP

Kind code of ref document: A1