CN111788584A - Neural network computing method and device - Google Patents

Neural network computing method and device

Info

Publication number: CN111788584A
Application number: CN201880090586.1A
Authority: CN (China)
Prior art keywords: processed, vector, neural network, network layer, network
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 胡慧 (Hu Hui), 郑成林 (Zheng Chenglin)
Current Assignee: Huawei Technologies Co Ltd
Original Assignee: Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods


Abstract

The embodiments of this application disclose a neural network computing method and apparatus, relate to the field of communication technologies, and solve the prior-art problems that training is difficult to converge and precision drops significantly when a block circulant matrix is used to compress a neural network and the block size is large. The specific solution is as follows: obtain an input vector of a first to-be-processed network layer; obtain a perturbation vector of the first to-be-processed network layer according to a reference random vector of the neural network and the dimension of a random vector of the first to-be-processed network layer, where the dimension of the perturbation vector is equal to the dimension of the input vector of the first to-be-processed network layer; multiply each element in the input vector of the first to-be-processed network layer by the element at the corresponding position in the perturbation vector to obtain a corrected input vector of the first to-be-processed network layer; and obtain an output vector of the first to-be-processed network layer based on the corrected input vector and a computation model of the neural network.

Description

Neural network computing method and device

Technical Field
The embodiments of this application relate to the field of communication technologies, and in particular, to a neural network computing method and apparatus.
Background
A neural network is an operational mathematical model that mimics the behavioral characteristics of animal neural networks and performs distributed parallel information processing; it is widely used in many fields. At present, structured matrices achieve good results in compressing and accelerating neural networks. A block circulant matrix is one kind of structured matrix, and applying block circulant matrices to convolutional layers, fully connected layers, and Long Short-Term Memory (LSTM) layers achieves high-ratio compression of network parameters.
However, in the existing method of compressing a neural network with a block circulant matrix, when the block size is large, the structuring of the weight parameters introduces correlation between output feature maps. The resulting loss of information causes precision loss in application tasks based on the neural network, slows training convergence or even prevents convergence, and degrades model performance.
Disclosure of Invention
The embodiments of this application provide a neural network computing method and apparatus, which can solve the problems that training is difficult to converge and precision drops significantly when a block circulant matrix is used to compress a neural network and the block size is large.
In order to achieve the above purpose, the embodiment of the present application adopts the following technical solutions:
In a first aspect of the embodiments of this application, a neural network computing method is provided, where the neural network includes a plurality of network layers, and the plurality of network layers includes a first to-be-processed network layer. The method includes: obtaining an input vector of the first to-be-processed network layer; obtaining a perturbation vector of the first to-be-processed network layer according to a reference random vector of the neural network and the dimension of a random vector of the first to-be-processed network layer, where the dimension of the perturbation vector is equal to the dimension of the input vector of the first to-be-processed network layer, and the dimension of the random vector of the first to-be-processed network layer is determined based on the weight parameter of the first to-be-processed network layer and the dimension of the input vector of the first to-be-processed network layer; multiplying each element in the input vector of the first to-be-processed network layer by the element at the corresponding position in the perturbation vector to obtain a corrected input vector of the first to-be-processed network layer; and obtaining an output vector of the first to-be-processed network layer based on the corrected input vector and a computation model of the neural network. When the first to-be-processed network layer is the input layer of the neural network, the obtaining of the input vector of the first to-be-processed network layer includes using the input vector of the neural network as the input vector of the first to-be-processed network layer; when the first to-be-processed network layer is not the input layer of the neural network, the obtaining of the input vector of the first to-be-processed network layer includes using the output vector of the previous network layer of the first to-be-processed network layer as the input vector of the first to-be-processed network layer. Based on this solution, the correlation between output feature maps can be reduced, training convergence can be accelerated, and precision loss can be reduced.
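To make the flow of this aspect concrete, the following is a minimal NumPy sketch of the forward computation for one to-be-processed layer, assuming the perturbation vector is already available and the weight parameter is an ordinary (decompressed) matrix; all names are illustrative and not from the patent.

```python
import numpy as np

def corrected_forward(x, perturbation, W, b):
    """Sketch of one to-be-processed layer: element-wise correction
    of the input, then the computation model Y = W x X' + b."""
    assert x.shape == perturbation.shape  # dimensions must match per the claim
    x_corrected = x * perturbation        # multiply elements at corresponding positions
    return W @ x_corrected + b

# Toy usage: a layer with a 4-dimensional input and a 3-dimensional output.
x = np.array([0.5, -1.0, 2.0, 0.25])
perturbation = np.array([1.0, -1.0, 1.0, -1.0])  # e.g. a sign vector
W = np.ones((3, 4))
b = np.zeros(3)
y = corrected_forward(x, perturbation, W, b)
```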
With reference to the first aspect and the foregoing possible implementation manners, in another possible implementation manner, the input vector of the first to-be-processed network layer is {x1, x2 … xn}, the perturbation vector is {y1, y2 … yn}, and the corrected input vector of the first to-be-processed network layer is {x1 × y1, x2 × y2 … xn × yn}, where n is a positive integer. Based on this solution, applying the perturbation vector to the input vector of the first to-be-processed network layer can effectively reduce the correlation of the projection matrix of the first to-be-processed network layer.
With reference to the first aspect and the foregoing possible implementation manners, in another possible implementation manner, the neural network computation model includes: Yj = Cj × Xj + bj, where Yj is the output vector of the neural network computation model, Xj is the input vector of the neural network computation model, Cj is the weight parameter of the neural network computation model, and bj is a preset bias value of the neural network computation model; the neural network computation model is used for neural network computation of the j-th layer among the plurality of network layers, and j is an integer. The obtaining an output vector of the first to-be-processed network layer based on the corrected input vector and the computation model of the neural network includes: using the weight parameter and the corrected input vector of the first to-be-processed network layer as the weight parameter and the input vector of the neural network computation model, respectively, to compute the output vector of the first to-be-processed network layer. Based on this solution, the output vector of the first to-be-processed network layer can be computed from the corrected input vector and the weight parameter.
With reference to the first aspect and the foregoing possible implementation manner, in another possible implementation manner, the weight parameter is a compression matrix corresponding to a block-structured matrix, where the block-structured matrix is uniquely determined by the compression matrix, and before the obtaining an output vector of the first to-be-processed network layer based on the corrected input vector and the computation model of the neural network, the method further includes: decompressing the weight parameter into the block-structured matrix corresponding to the compression matrix. Based on this solution, for a first to-be-processed network layer whose weight parameter is the compression matrix corresponding to a block-structured matrix, the block-structured matrix can be obtained by decompressing the weight parameter, and the output vector can then be computed.
With reference to the first aspect and the foregoing possible implementation manner, in another possible implementation manner, the determining a dimension of a random vector of the first network layer to be processed based on the weight parameter of the first network layer to be processed and a dimension of an input vector of the first network layer to be processed includes: determining a target interval, wherein two interval endpoints of the target interval are the block size of the block structured matrix corresponding to the weight parameter of the first network layer to be processed and the dimension of the input vector of the first network layer to be processed respectively; and randomly determining the dimension of the random vector of the first network layer to be processed in the target interval. Based on the scheme, the dimension of the random vector of the first network layer to be processed can be obtained, and the dimension of the random vector is smaller than that of the input vector.
With reference to the first aspect and the foregoing possible implementation manner, in another possible implementation manner, the plurality of network layers include a plurality of first to-be-processed network layers, and before the obtaining the perturbation vector of the first to-be-processed network layer according to the reference random vector of the neural network and the dimension of the random vector of the first to-be-processed network layer, the method further includes: determining the maximum value among the dimensions of the random vectors of all the first to-be-processed network layers as the dimension of the reference random vector of the neural network; generating random numbers that satisfy the dimension of the reference random vector based on a preset random number generation model; and forming the generated random numbers into the reference random vector. Based on this solution, the dimension of the reference random vector can be determined according to the dimensions of the random vectors of all the first to-be-processed network layers, and the dimension of the reference random vector is smaller than the dimension of the input vector.
With reference to the first aspect and the foregoing possible implementation manners, in another possible implementation manner, the obtaining a perturbation vector of the first network layer to be processed according to dimensions of a reference random vector of the neural network and a random vector of the first network layer to be processed includes: intercepting a vector with the same dimension as the random vector of the first network layer to be processed from the reference random vector as the random vector of the first network layer to be processed; and generating a vector with the same dimension as the input vector of the first network layer to be processed as the perturbation vector of the first network layer to be processed by adopting a cyclic shift or vector element replication mode on the random vector of the first network layer to be processed. Based on the scheme, the disturbance vector of the first network layer to be processed can be obtained in an intercepting and expanding mode according to the dimensions of the reference random vector and the random vector of the first network layer to be processed, so that the accuracy of the neural network can be guaranteed not to be reduced only by adding few storage resources and calculation resources, and the convergence speed is effectively improved.
In a second aspect of the embodiments of this application, a method for training a neural network model is provided, where the method is used to obtain a weight parameter of each network layer in a neural network. The method includes: step 1, initializing the weight parameter of each network layer; step 2, performing neural network computation according to the neural network computation model corresponding to each network layer to obtain a temporary output vector of the neural network, where the weight parameter of the neural network computation model is the initialized weight parameter, and the performing neural network computation according to the neural network computation model corresponding to each network layer includes performing neural network computation on one or more first to-be-processed network layers in each network layer according to the neural network computing method of the first aspect or any implementation manner of the first aspect; step 3, updating the weight parameter of each network layer through reverse transfer (back propagation) of the neural network, taking the updated weight parameter of each network layer as the weight parameter of the neural network computation model corresponding to each network layer, and repeating step 2 and step 3 until the difference between the temporary output vector and a preset output vector of the neural network is smaller than a preset value; and step 4, obtaining the weight parameter of each network layer. Based on this solution, the plurality of network layers of the neural network can be trained, and the weight parameter of each network layer at training convergence can be obtained.
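As a rough illustration of steps 1 to 4, the toy loop below trains a single linear layer whose input is corrected by a fixed sign perturbation; the plain-SGD update and the squared-error stopping rule are assumptions, since this aspect does not fix an update rule or an error measure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: initialize the weight parameter (a toy single linear layer).
W = rng.normal(scale=0.1, size=(3, 4))
b = np.zeros(3)
perturbation = rng.choice([1.0, -1.0], size=4)  # fixed sign perturbation

x = np.array([0.5, -1.0, 2.0, 0.25])
target = np.array([1.0, 0.0, -1.0])             # preset output vector

for step in range(10000):
    # Step 2: forward computation with the corrected input X' = x * perturbation.
    x_corr = x * perturbation
    y = W @ x_corr + b                          # temporary output vector
    err = y - target
    if np.linalg.norm(err) < 1e-3:              # difference below the preset value
        break
    # Step 3: update the weight parameters by back propagation (plain SGD).
    W -= 0.01 * np.outer(err, x_corr)
    b -= 0.01 * err
# Step 4: the trained weight parameters are now held in W and b.
```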
In a third aspect of the embodiments of this application, a neural network computing apparatus is provided, where the neural network includes a plurality of network layers, and the plurality of network layers includes a first to-be-processed network layer. The apparatus includes: a first obtaining unit, configured to obtain an input vector of the first to-be-processed network layer; a second obtaining unit, configured to obtain a perturbation vector of the first to-be-processed network layer according to a reference random vector of the neural network and the dimension of a random vector of the first to-be-processed network layer, where the dimension of the perturbation vector is equal to the dimension of the input vector of the first to-be-processed network layer, and the dimension of the random vector of the first to-be-processed network layer is determined according to the weight parameter of the first to-be-processed network layer and the dimension of the input vector of the first to-be-processed network layer; a first calculating unit, configured to multiply each element in the input vector of the first to-be-processed network layer by the element at the corresponding position in the perturbation vector acquired by the second obtaining unit, to obtain a corrected input vector of the first to-be-processed network layer; and a second calculating unit, configured to obtain an output vector of the first to-be-processed network layer based on the corrected input vector obtained by the first calculating unit and the computation model of the neural network. When the first to-be-processed network layer is the input layer of the neural network, the first obtaining unit is specifically configured to obtain the input vector of the neural network; when the first to-be-processed network layer is not the input layer of the neural network, the first obtaining unit is specifically configured to obtain the output vector of the network layer preceding the first to-be-processed network layer.
With reference to the third aspect and the foregoing possible implementation manners, in another possible implementation manner, the input vector of the first to-be-processed network layer is {x1, x2 … xn}, the perturbation vector is {y1, y2 … yn}, and the corrected input vector of the first to-be-processed network layer is {x1 × y1, x2 × y2 … xn × yn}, where n is a positive integer.
With reference to the third aspect and the foregoing possible implementation manners, in another possible implementation manner, the neural network computation model includes: Yj = Cj × Xj + bj, where Yj is the output vector of the neural network computation model, Xj is the input vector of the neural network computation model, Cj is the weight parameter of the neural network computation model, and bj is a preset bias value of the neural network computation model; the neural network computation model is used for neural network computation of the j-th layer among the plurality of network layers, and j is an integer. The second calculating unit is specifically configured to: use the weight parameter and the corrected input vector of the first to-be-processed network layer as the weight parameter and the input vector of the neural network computation model, respectively, to compute the output vector of the first to-be-processed network layer.
With reference to the third aspect and the foregoing possible implementation manner, in another possible implementation manner, the weight parameter is a compression matrix corresponding to a block-structured matrix, where the block-structured matrix is uniquely determined by the compression matrix, and the apparatus further includes a decompression unit configured to decompress the weight parameter into the block-structured matrix corresponding to the compression matrix.
With reference to the third aspect and the foregoing possible implementation manner, in another possible implementation manner, the apparatus further includes a first determining unit, where the first determining unit is configured to determine a target interval, and two interval endpoints of the target interval are a block size of the block-structured matrix corresponding to the weight parameter of the first network layer to be processed and a dimension of the input vector of the first network layer to be processed, respectively; the first determining unit is further configured to randomly determine a dimension of the random vector of the first to-be-processed network layer within the target interval.
With reference to the third aspect and the foregoing possible implementation manners, in another possible implementation manner, the foregoing multiple network layers include multiple first network layers to be processed, and the apparatus further includes: a second determining unit and a generating unit, wherein the second determining unit is further configured to determine a maximum value among dimensions of random vectors of all the first to-be-processed network layers as a dimension of a reference random vector of the neural network; the generating unit is configured to generate random numbers that satisfy the dimension of the reference random vector based on a preset random number generation model, and compose the generated random numbers into the reference random vector.
With reference to the third aspect and the foregoing possible implementation manner, in another possible implementation manner, the second obtaining unit is specifically configured to intercept, from the reference random vector, a vector having a dimension equal to that of the random vector of the first network layer to be processed as the random vector of the first network layer to be processed; the second obtaining unit is specifically configured to generate, as the perturbation vector of the first network layer to be processed, a vector having a dimension equal to that of the input vector of the first network layer to be processed by using a cyclic shift or vector element replication method for the random vector of the first network layer to be processed.
In a fourth aspect of the embodiments of the present application, there is provided a training apparatus for a neural network model, configured to obtain a weight parameter of each network layer in a neural network, the apparatus including: an initialization unit, configured to initialize the weight parameter of each network layer; a neural network computing unit, configured to perform neural network computation according to the neural network computation model corresponding to each network layer to obtain a temporary output vector of the neural network, where a weight parameter of the neural network computation model is the initialized weight parameter; the neural network computing unit is specifically configured to perform neural network computation on one or more first to-be-processed network layers in each network layer according to the first aspect or the neural network computing method described in any implementation manner of the first aspect; a reverse transfer unit, configured to update the weight parameter of each network layer through reverse transfer of the neural network; an obtaining unit, configured to obtain a weight parameter of each network layer, where the weight parameter of each network layer is obtained when a difference between the temporary output vector of the neural network and a preset output vector of the neural network is smaller than a preset value.
The description of the effects of the third aspect and various implementation manners of the third aspect may refer to the description of the corresponding effects of the first aspect, and the description of the effects of the fourth aspect may refer to the description of the corresponding effects of the second aspect, which is not repeated herein.
In a fifth aspect of the embodiments of the present application, a server is provided, where the server includes a processor and a memory, the memory is configured to be coupled with the processor and store necessary program instructions and data of the server, and the processor is configured to execute the program instructions stored in the memory, so that the server executes the method described above.
A sixth aspect of the embodiments of the present application provides a computer storage medium, in which computer program code is stored, and when the computer program code runs on a processor, the processor is caused to execute the neural network computing method according to the first aspect or any one of the possible implementation manners of the first aspect, or execute the training method of the neural network model according to the second aspect.
In a seventh aspect of the embodiments of this application, a computer program product is provided, where the computer program product stores computer software instructions to be executed by the foregoing processor, and the computer software instructions include a program designed to execute the methods of the above aspects.
In an eighth aspect of the embodiments of the present application, there is provided an apparatus, which exists in the form of a chip product, and which includes a processor and a memory, the memory is configured to be coupled with the processor and stores necessary program instructions and data of the apparatus, and the processor is configured to execute the program instructions stored in the memory, so that the apparatus performs the functions of the neural network computing apparatus or the training apparatus of the neural network model in the above method.
Drawings
Fig. 1 is a schematic diagram of a hardware architecture of a neural network computing according to an embodiment of the present disclosure;
fig. 2 is a schematic diagram of another hardware architecture of neural network computation according to an embodiment of the present disclosure;
fig. 3 is a schematic diagram of another hardware architecture of neural network computation according to an embodiment of the present disclosure;
fig. 4 is a flowchart of a neural network computing method according to an embodiment of the present application;
FIG. 5 is a flow chart of another neural network computing method provided by an embodiment of the present application;
FIG. 6 is a flow chart of another neural network computing method provided by an embodiment of the present application;
FIG. 7 is a flow chart of another neural network computing method provided by an embodiment of the present application;
FIG. 8 is a flow chart of another neural network computing method provided by an embodiment of the present application;
FIG. 9 is a schematic diagram comparing the effects of a prior-art neural network computing method and the neural network computing method provided by an embodiment of this application;
FIG. 10 is a flowchart of a method for training a neural network model according to an embodiment of the present disclosure;
FIG. 11 is a schematic diagram of a neural network computing device according to an embodiment of the present disclosure;
fig. 12 is a schematic composition diagram of a neural network model training apparatus according to an embodiment of the present disclosure;
FIG. 13 is a schematic diagram of another embodiment of a neural network computing device;
fig. 14 is a schematic composition diagram of another neural network model training apparatus according to an embodiment of the present application.
Detailed Description
To solve the prior-art problems that training is difficult to converge and precision drops significantly when a block circulant matrix is used to compress a neural network and the block size is large, the embodiments of this application provide a neural network computing method that can reduce the correlation between output feature maps, accelerate training convergence, and reduce precision loss.
The neural network in the embodiment of the application comprises a plurality of network layers, wherein the plurality of network layers comprise one or more first network layers to be processed. When the first network layer to be processed is an input layer of the neural network, the input vector of the first network layer to be processed is the input vector of the neural network; when the first network layer to be processed is not an input layer of the neural network, the input vector of the first network layer to be processed is an output vector of a previous network layer of the first network layer to be processed. The calculation method for any of the one or more network layers to be processed may be applied to the hardware architecture shown in fig. 1.
As shown in fig. 1, the hardware structure includes: a first obtaining module 101, a second obtaining module 102, a first calculating module 103 and a second calculating module 104. The first obtaining module 101 is sequentially connected to the first calculating module 103 and the second calculating module 104, and the second obtaining module 102 is sequentially connected to the first calculating module 103 and the second calculating module 104.
A first obtaining module 101, configured to obtain an input vector of each first to-be-processed network layer. Illustratively, for the first network layer, the input vector is the input vector of the neural network; for the network layer beyond the first network layer, the input vector is the output vector of the previous network layer.
The second obtaining module 102 is configured to obtain a perturbation vector of the first network layer to be processed according to the input reference random vector and the dimensionality of the random vector of the first network layer to be processed, where the dimensionality of the perturbation vector of the first network layer to be processed may be equal to the dimensionality of the input vector of the first network layer to be processed. Illustratively, the second obtaining module may include a cyclic shift module or a copy module.
The first calculating module 103 is configured to apply a perturbation to the input vector of the first to-be-processed network layer acquired by the first obtaining module 101, where the perturbation is the perturbation vector acquired by the second obtaining module 102. For example, the corrected input vector of the first to-be-processed network layer may be obtained by multiplying each element in the input vector of the first to-be-processed network layer by the element at the corresponding position in the perturbation vector.
A second calculating module 104, configured to compute an output vector of the first to-be-processed network layer according to the corrected input vector obtained by the first calculating module 103 and the computation model of the neural network. For example, the computation model of the neural network may be: Yj = Cj × Xj + bj, where Yj is the output vector of the neural network computation model, Xj is the input vector of the neural network computation model, Cj is the weight parameter of the neural network computation model, and bj is a preset bias value of the neural network computation model; the neural network computation model is used for neural network computation of the j-th layer among the plurality of network layers, and j is an integer. For example, the second calculating module 104 may compute the output vector of the first to-be-processed network layer by a fast computation method; for example, when the block-structured matrix is a block circulant matrix, the output vector of the first to-be-processed network layer may be computed by a fast Fourier transform.
It will be appreciated that figure 1 is merely exemplary and that in practice the hardware architecture of a neural network computing device may include more or fewer components than those shown in figure 1. The architecture shown in fig. 1 does not set any limit to the hardware architecture provided by the embodiments of the present application.
For example, the hardware architecture in the embodiment of the present application may further include a decompression module 105 shown in fig. 2, in addition to the modules shown in fig. 1, where the decompression module 105 is connected to the second computing module 104.
The decompression module 105 is configured to decompress the weight parameters into a block-structured matrix corresponding to the compression matrix, and transmit the block-structured matrix to the second calculation module 104, so as to calculate an output vector of the network layer. For example, when the block structured matrix is a block circulant matrix, the decompressing module 105 may be a cyclic shift module, and decompress the weight parameter into the block circulant matrix through the cyclic shift module, and then perform matrix multiplication operation by using the second calculating module 104 to calculate the output vector of the first layer to be processed.
As shown in fig. 3, an embodiment of the present application further provides a hardware structure, where the hardware structure includes a first determining module 301, a second determining module 302, and a generating module 303, which are connected in sequence.
A first determining module 301, configured to determine the dimension of the random vector of the first network layer to be processed according to the dimension of the input vector and the weight parameter.
A second determining module 302, configured to determine the dimension of the reference random vector according to the dimension of the random vectors of the one or more first network layers to be processed. For example, the dimension of the reference random vector may be the maximum of the dimensions of the random vectors of all the first to-be-processed network layers.
A generating module 303, configured to generate, according to the dimension of the reference random vector, a random vector with that dimension as the reference random vector.
It is understood that fig. 1-3 are only exemplary, and the structures shown in fig. 1-3 do not set any limit to the hardware architecture provided by the embodiments of the present application.
With reference to fig. 1 and fig. 2, as shown in fig. 4, for any first to-be-processed network layer in a plurality of network layers included in a neural network, the calculation method shown in fig. 4 may be adopted to calculate an output vector of the first to-be-processed network layer. As shown in fig. 4, the neural network computing method provided in the embodiment of the present application may include steps S401 to S404.
S401, obtaining an input vector of a first network layer to be processed.
It is understood that step S401 may be performed by the first obtaining module 101 shown in fig. 1.
For example, the first to-be-processed network layer refers to a network layer, among the plurality of network layers of the neural network, whose weight parameter is a compression matrix corresponding to a block-structured matrix. The block-structured matrix can be uniquely determined by the compression matrix.
For example, the block-structured matrix mentioned above means a matrix that can be divided into a plurality of blocks, with each block arranged according to a certain rule. The block-structured matrix may include a block circulant matrix, a block Toeplitz matrix, and the like; the embodiments of this application do not limit the specific type of the block-structured matrix, and only the block circulant matrix is taken as an example for description.
For example, let W be a block circulant matrix: in each block of the block circulant matrix, the second column is obtained by cyclically shifting the first column downward, the third column is obtained by cyclically shifting the second column downward again, and so on. The weight parameter C of the network layer is the compression matrix corresponding to the block circulant matrix W. The compression matrix C is represented by a matrix composed of N w_base vectors, where N is the number of blocks of the block circulant matrix, and the w_base of each block is the minimal set of elements that uniquely determines that block of the block-structured matrix (e.g., w_base may be the first row or first column of a block). In the following formula, the compression matrix of the block circulant matrix W is C (the weight parameter); the block circulant matrix is divided into 4 blocks, and the block circulant matrix W can be uniquely determined by the compression matrix C.
[Formula image PCTCN2018101598-APPB-000001: the block circulant matrix W, divided into 4 blocks, and its compression matrix C]
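The relationship between C and W can be illustrated with a short NumPy sketch that rebuilds each circulant block from its w_base by repeated downward cyclic shifts; taking w_base as the first column of a block is one of the choices the text mentions, assumed here for concreteness.

```python
import numpy as np

def circulant_from_base(w_base):
    """Expand one w_base (the first column of a block) into a circulant block:
    column k is the first column cyclically shifted downward k times."""
    k = len(w_base)
    return np.column_stack([np.roll(w_base, shift) for shift in range(k)])

def block_circulant(bases, grid):
    """Assemble a block circulant matrix from a list of w_base vectors,
    laid out on a (rows, cols) grid of blocks."""
    rows, cols = grid
    blocks = [circulant_from_base(b) for b in bases]
    return np.block([[blocks[r * cols + c] for c in range(cols)]
                     for r in range(rows)])

# The compression matrix C stores only N = 4 w_base vectors; W has 4 blocks.
C = [np.array([1., 2.]), np.array([3., 4.]),
     np.array([5., 6.]), np.array([7., 8.])]
W = block_circulant(C, grid=(2, 2))  # W is uniquely determined by C
```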
It can be understood that the neural network in this embodiment of the application may include one or more first to-be-processed network layers. When a first to-be-processed network layer is the first network layer of the neural network, the input vector of the first to-be-processed network layer is the input vector of the neural network; when the first to-be-processed network layer is any network layer other than the first network layer, the input vector of the first to-be-processed network layer is the output vector of the previous network layer of the first to-be-processed network layer.
For example, the neural network in this embodiment of the application may be a Deep Neural Network (DNN), a Convolutional Neural Network (CNN), or the like, and the first to-be-processed network layer may be a convolutional layer, an LSTM layer, a fully connected layer, or the like. The embodiments of this application are not limited to specific neural network types and structures.
S402, obtaining a perturbation vector of the first to-be-processed network layer according to the reference random vector of the neural network and the dimension of the random vector of the first to-be-processed network layer.
It is understood that step S402 may be performed by the second obtaining module 102 shown in fig. 1.
Wherein the dimension of the perturbation vector is equal to the dimension of the input vector of the first to-be-processed network layer.
For example, the obtaining the perturbation vector of the first to-be-processed network layer in step S402 may include steps S4021 to S4022.
S4021, intercepting a vector with the same dimension as the random vector of the first network layer to be processed from the reference random vector as the random vector of the first network layer to be processed.
For example, the dimension of the reference random vector may be the maximum value among the dimensions of the random vectors of all the first to-be-processed network layers, and thus a vector with the same dimension as the random vector of the first to-be-processed network layer can be intercepted from it as the random vector of the first to-be-processed network layer.
For example, the intercepting, from the reference random vector, a vector with the same dimension as the random vector of the first to-be-processed network layer may include: intercepting, in sequence starting from a certain position (e.g., the first bit) of the reference random vector, a vector whose dimension equals that of the random vector of the first to-be-processed network layer; or intercepting according to a certain interception rule, for example, intercepting the random vector of the first to-be-processed network layer from the odd bits or even bits of the reference random vector.
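A minimal sketch of these interception rules (the starting position and the odd/even conventions are assumptions for illustration):

```python
import numpy as np

def intercept(reference, dim, rule="sequential", start=0):
    """Take a dim-dimensional random vector out of the reference random vector."""
    if rule == "sequential":       # dim consecutive elements starting at `start`
        return reference[start:start + dim]
    if rule == "odd":              # elements at positions 1, 3, 5, ... (1-indexed)
        return reference[::2][:dim]
    if rule == "even":             # elements at positions 2, 4, 6, ... (1-indexed)
        return reference[1::2][:dim]
    raise ValueError(rule)
```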
S4022, generating, from the random vector of the first to-be-processed network layer by cyclic shifting or vector element copying, a vector with the same dimension as the input vector of the first to-be-processed network layer as the perturbation vector of the first to-be-processed network layer.
Illustratively, the dimension of the random vector of the first to-be-processed network layer is smaller than the dimension of the input vector of the first to-be-processed network layer, while the dimension of the perturbation vector equals the dimension of the input vector. Therefore, a vector with the same dimension as the input vector can be obtained as the perturbation vector of the first to-be-processed network layer by expanding the random vector of the first to-be-processed network layer through cyclic shifting or vector element copying.
For example, if the random vector of the first to-be-processed network layer is Ai = [1, -1, 1, 1, -1] and the dimension of the input vector of the first to-be-processed network layer is 15, the perturbation vector of the first to-be-processed network layer obtained by cyclic shifting may be A_Li = [1, -1, 1, 1, -1, -1, 1, 1, -1, 1, 1, 1, -1, 1, -1], and the perturbation vector obtained by vector element copying may be A_Li = [1, -1, 1, 1, -1, 1, -1, 1, 1, -1, 1, -1, 1, 1, -1]. This embodiment of the application does not limit which specific method is used to obtain the perturbation vector of the first to-be-processed network layer from the random vector of the first to-be-processed network layer.
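The two expansion modes can be reproduced with a short NumPy sketch; the shift-left-by-one convention for each successive segment in the cyclic mode is inferred from the example values above:

```python
import numpy as np

def expand(random_vec, input_dim, mode="cyclic"):
    """Grow a layer's random vector to the input dimension to form the
    perturbation vector, by cyclic shifting or by element copying."""
    k = len(random_vec)
    reps = -(-input_dim // k)  # ceiling division: number of segments needed
    if mode == "copy":
        return np.tile(random_vec, reps)[:input_dim]      # tile the vector as-is
    # cyclic mode: each successive segment is shifted one step further left
    segments = [np.roll(random_vec, -shift) for shift in range(reps)]
    return np.concatenate(segments)[:input_dim]

A = np.array([1, -1, 1, 1, -1])
print(expand(A, 15, "cyclic"))  # [ 1 -1  1  1 -1 -1  1  1 -1  1  1  1 -1  1 -1]
print(expand(A, 15, "copy"))    # [ 1 -1  1  1 -1  1 -1  1  1 -1  1 -1  1  1 -1]
```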
It can be understood that, when the perturbation vector of the first network layer to be processed is obtained by adopting a cyclic shift manner, the second obtaining module in fig. 1 includes a cyclic shift module, and when the perturbation vector of the first network layer to be processed is obtained by adopting a vector element copy manner, the second obtaining module in fig. 1 includes a copy module.
It should be noted that, in this embodiment of the application, only the reference random vector of the neural network and the dimensions of the random vectors of all the first to-be-processed network layers are needed: the perturbation vector with the same dimension as the input vector can be generated using the method in step S402. Since the dimension of the random vector of the first to-be-processed network layer is smaller than the dimension of the input vector, only a small amount of storage and computation resources needs to be added, which ensures that the precision of the neural network does not decrease and effectively improves the convergence speed.
And S403, multiplying the elements in the input vector of the first network layer to be processed with the elements at the corresponding positions in the perturbation vector to obtain a corrected input vector of the first network layer to be processed.
It is understood that step S403 may be performed by the first calculation module 103 shown in fig. 1.
For example, if the input vector of the first to-be-processed network layer is { x1, x2 … xn }, the perturbation vector of the first to-be-processed network layer is { y1, y2 … yn }, and the corrected input vector of the first to-be-processed network layer is { x1 × y1, x2 × y2 … xn × yn }, where n is a positive integer.
It should be noted that, in this embodiment of the application, applying a perturbation vector to the input vector of the first to-be-processed network layer can effectively reduce the correlation of the projection matrix of that layer.
S404, obtaining an output vector of the first network layer to be processed based on the correction input vector and the calculation model of the neural network.
It is understood that step S404 may be performed by the second calculation module 104 shown in fig. 1.
Illustratively, the neural network computation model includes: Yj = Cj × Xj + bj, where Yj is the output vector of the neural network computation model, Xj is the input vector of the neural network computation model, Cj is the weight parameter of the neural network computation model, bj is a preset bias value of the neural network computation model, and "×" denotes matrix multiplication; the neural network computation model is used for neural network computation of the j-th network layer among the plurality of network layers, and j is an integer.
It can be understood that the neural network computing method provided in this embodiment of the application is applicable to all network layers whose computation model can be converted into Yj = Cj × Xj + bj. For example, for a fully connected layer, the computation model is the same as above; an LSTM layer may be composed of a plurality of the above computation models; for a convolutional layer, the original computation model is Yj = Cj * Xj + bj, where "*" is a convolution operation, and the convolution operation can be converted into a matrix multiplication operation according to the prior art, i.e., a convolutional layer can also be converted into the above computation model. Therefore, the neural network computing method in this embodiment of the application is applicable to network layers including fully connected layers, convolutional layers, LSTM layers, and the like.
When a convolutional layer is compressed, after the convolution operation is converted into matrix multiplication, the dimension of the input vector of the first to-be-processed network layer is the width of the input matrix after im2col conversion according to the size of the convolution kernel. im2col is the process of expanding the pixel values of each small window of the input image that the convolution kernel processes into one row (column) of a new matrix; the number of columns (rows) of the new matrix is the number of convolution operations performed on one input image (the number of convolution kernel slides).
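For reference, a minimal single-channel im2col sketch (stride 1, no padding, and a column-per-window layout are assumptions):

```python
import numpy as np

def im2col(image, k):
    """Unfold every k x k window of a 2-D image into one column (stride 1).
    The number of columns equals the number of convolution-kernel slides."""
    h, w = image.shape
    out_h, out_w = h - k + 1, w - k + 1
    cols = np.empty((k * k, out_h * out_w))
    for i in range(out_h):
        for j in range(out_w):
            cols[:, i * out_w + j] = image[i:i + k, j:j + k].ravel()
    return cols

img = np.arange(16, dtype=float).reshape(4, 4)
cols = im2col(img, 3)  # shape (9, 4): four 3x3 windows, one per column
```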
For example, the above-mentioned calculating model based on the correction input vector and the neural network to obtain the output vector of the first to-be-processed network layer may include: and respectively taking the weight parameter and the correction input vector of the first network layer to be processed as the weight parameter and the input vector of the neural network calculation model to calculate so as to obtain the output vector of the first network layer to be processed.
Illustratively, the corrected input vector X'j obtained after applying the perturbation and the weight parameter Cj may be used as the input vector and the weight parameter of the neural network computation model for computation, and the output vector is Yj = Cj × X'j + bj, where X'j is the corrected input vector of the neural network computation model and Yj is the output vector of the first to-be-processed network layer.
According to the neural network computing method provided in this embodiment of the application, an input vector of a first to-be-processed network layer is obtained; a perturbation vector of the first to-be-processed network layer is obtained according to a reference random vector of the neural network and the dimension of a random vector of the first to-be-processed network layer, where the dimension of the perturbation vector is equal to the dimension of the input vector of the first to-be-processed network layer; each element in the input vector of the first to-be-processed network layer is multiplied by the element at the corresponding position in the perturbation vector to obtain a corrected input vector of the first to-be-processed network layer; and an output vector of the first to-be-processed network layer is obtained based on the corrected input vector and the computation model of the neural network. By applying the perturbation vector to the input vector of the first to-be-processed network layer, this embodiment can reduce the correlation between output feature maps, accelerate training convergence, and reduce precision loss. In addition, in this embodiment, a perturbation vector with the same dimension as the input vector of the first to-be-processed network layer can be generated from the reference random vector of the neural network and the dimension of the random vector of the first to-be-processed network layer; since the dimension of the random vector of the first to-be-processed network layer is smaller than the dimension of the input vector, only a small amount of storage and computation resources needs to be added, so that the precision and the convergence speed of the neural network can be effectively improved.
The present application further provides an embodiment, as shown in fig. 5, before the step S404, a step S405 may be further included.
S405, decompressing the weight parameters of the first network layer to be processed into a block structured matrix corresponding to the compression matrix.
It is understood that step S405 may be performed by the decompression module 105 shown in fig. 2.
For example, since the weight parameter of the first to-be-processed network layer is a compression matrix corresponding to a block-structured matrix, before the output vector of the first to-be-processed network layer is computed, the weight parameter Cj may first be decompressed to obtain the block-structured matrix Wj corresponding to the weight parameter (compression matrix); the output vector is then computed by matrix multiplication of the block-structured matrix and the input vector.
For example, the output vector of the first to-be-processed network layer is specifically: Ym = Wm × X'm + bm, where Ym is the output vector of the first to-be-processed network layer, X'm is the corrected input vector of the first to-be-processed network layer, Wm is the block-structured matrix corresponding to the weight parameter of the first to-be-processed network layer, and bm is a preset bias value of the first to-be-processed network layer.
For example, if the decompression module 105 in fig. 2 is used to decompress the weight parameter into the block-structured matrix (as in the calculation method shown in fig. 5), the second calculation module 104 in fig. 2 may perform matrix multiplication to obtain the output vector of the first to-be-processed network layer. If the decompression module 105 is not used (as in the calculation method shown in fig. 4), the output vector of the first to-be-processed network layer can be computed directly by the second calculation module 104 in fig. 1 using the fast Fourier transform.
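When the block-structured matrix is a block circulant matrix, the fast Fourier transform route rests on the standard identity that multiplying by a circulant matrix equals circular convolution with its first column. A minimal per-block sketch, assuming w_base stores the first column of the block:

```python
import numpy as np

def circulant_matvec_fft(w_base, x):
    """Compute W_block @ x for a circulant block whose first column is w_base,
    via the identity: circulant multiply = circular convolution = IFFT(FFT * FFT)."""
    return np.real(np.fft.ifft(np.fft.fft(w_base) * np.fft.fft(x)))

# Consistency check against explicit decompression of the block.
w_base = np.array([1., 2., 3., 4.])
x = np.array([0.5, -1., 2., 0.25])
W = np.column_stack([np.roll(w_base, s) for s in range(4)])
assert np.allclose(W @ x, circulant_matvec_fft(w_base, x))
```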
According to the neural network computing method provided in this embodiment of the application, an input vector of a first to-be-processed network layer is obtained; a perturbation vector of the first to-be-processed network layer is obtained according to a reference random vector of the neural network and the dimension of a random vector of the first to-be-processed network layer, where the dimension of the perturbation vector is equal to the dimension of the input vector of the first to-be-processed network layer; each element in the input vector of the first to-be-processed network layer is multiplied by the element at the corresponding position in the perturbation vector to obtain a corrected input vector of the first to-be-processed network layer; the weight parameter of the first to-be-processed network layer is decompressed into the block-structured matrix corresponding to the compression matrix; and an output vector of the first to-be-processed network layer is obtained based on the corrected input vector and the computation model of the neural network. By applying the perturbation vector to the input vector of the first to-be-processed network layer, this embodiment can reduce the correlation between output feature maps, accelerate training convergence, and reduce precision loss. In addition, in this embodiment, a perturbation vector with the same dimension as the input vector of the first to-be-processed network layer can be generated from the reference random vector of the neural network and the dimension of the random vector of the first to-be-processed network layer; since the dimension of the random vector of the first to-be-processed network layer is smaller than the dimension of the input vector, only a small amount of storage and computation resources needs to be added, so that the precision and the convergence speed of the neural network can be effectively improved.
The present application provides yet another embodiment, as shown in fig. 6, the method may further include steps S601-S603.
S601, determining the dimension of the random vector of the first network layer to be processed according to the weight parameter of the first network layer to be processed and the dimension of the input vector of the first network layer to be processed.
It is understood that step S601 may be performed by the first determination module 301 shown in fig. 3.
For example, the determining the dimension of the random vector of the first network layer to be processed based on the weight parameter of the first network layer to be processed and the dimension of the input vector of the first network layer to be processed may include: determining a target interval, wherein two interval endpoints of the target interval are the block size of the block structured matrix corresponding to the weight parameter of the first network layer to be processed and the dimension of the input vector of the first network layer to be processed respectively; and randomly determining the dimension of the random vector of the first network layer to be processed in the target interval. For example, if the block size of the block circulant matrix of the first to-be-processed network layer in the current neural network is 32, the dimension of the input vector of the neural network is 800, and the dimension of the random vector of the first to-be-processed network layer may randomly select an integer within the (32, 800) interval, for example, may be 40.
It is to be understood that the neural network in the embodiment of the present application may include one or more first to-be-processed network layers, and when the neural network includes a plurality of first to-be-processed network layers, the dimensions of the random vectors of the plurality of first to-be-processed network layers may be the same or different, which is not limited in the embodiment of the present application.
S602, determining the maximum value in the dimensionality of all the random vectors of the first network layer to be processed as the dimensionality of the reference random vector of the neural network.
It is understood that step S602 may be performed by the second determination module 302 shown in fig. 3.
For example, when the plurality of network layers of the neural network includes a plurality of first network layers to be processed, the maximum value among the dimensions of the random vectors of all the first network layers to be processed is determined as the dimension of the reference random vector of the neural network. For example, when the neural network includes 5 first network layers to be processed, the dimensions of the random vectors of the 5 first network layers to be processed are: 40. 35, 50, 46, 55, the dimension of the reference random vector of the neural network is the maximum of the dimensions of the random vectors of the 5 first to-be-processed network layers, i.e. 55.
And S603, generating a random number meeting the dimensionality of the reference random vector based on a preset random number generation model.
It is understood that step S603 may be performed by the generation module 303 shown in fig. 3.
For example, a set of random numbers that satisfies the dimension of the reference random vector may be generated according to a preset random number generation model and the dimension of the reference random vector; the random numbers that satisfy the dimension of the reference random vector form the reference random vector. For example, the random number generation model may be a sign vector that follows a binomial distribution: if the dimension of the reference random vector is 55, a sign vector of dimension 55 may be generated, which may be denoted as A, with A = [1, -1, 1, 1, -1 … 1, 1]. This embodiment of the application does not limit the specific form of the random number generation model; any random number generation model is within the scope of this embodiment, and the above is merely an exemplary illustration.
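A compact sketch of steps S601 to S603 together; uniform sampling inside the open target interval and a +1/-1 sign generator are assumptions matching the examples in the text:

```python
import numpy as np

rng = np.random.default_rng()

def random_vector_dim(block_size, input_dim):
    # S601: pick a dimension at random inside the open target interval
    # (block_size, input_dim), e.g. (32, 800) -> perhaps 40.
    return int(rng.integers(block_size + 1, input_dim))

# S602: the reference dimension is the maximum over all to-be-processed layers.
layer_dims = [40, 35, 50, 46, 55]   # example values from the text
ref_dim = max(layer_dims)            # -> 55

# S603: generate ref_dim random numbers, e.g. a +/-1 sign vector.
reference_random_vector = rng.choice([1, -1], size=ref_dim)
```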
It should be noted that, in this embodiment of the application, the process of determining the reference random vector in steps S601 to S603 may be performed before the training of the neural network model described below. During forward computation of the neural network, the output vector of the neural network may be computed using the neural network computing method in steps S401 to S405, according to the weight parameter of each network layer obtained by training the neural network model and the dimensions of the reference random vector and the random vector of the first to-be-processed network layer generated before model training.
According to the neural network computing method provided by the embodiment of the application, an input vector of a first to-be-processed network layer is acquired; a perturbation vector of the first to-be-processed network layer is obtained according to the reference random vector of the neural network and the dimension of the random vector of the first to-be-processed network layer, the dimension of the perturbation vector being equal to the dimension of the input vector of the first to-be-processed network layer; elements of the input vector of the first to-be-processed network layer are multiplied by elements at corresponding positions in the perturbation vector to obtain a corrected input vector of the first to-be-processed network layer; and an output vector of the first to-be-processed network layer is obtained based on the corrected input vector and the computational model of the neural network. By applying a dynamic perturbation vector to the input vector of the first to-be-processed network layer, the embodiment of the application can reduce the correlation between output feature maps, accelerate training convergence, and reduce accuracy loss. In addition, the perturbation vector with the same dimension as the input vector of the first to-be-processed network layer can be generated from the reference random vector of the neural network and the dimension of the random vector of the first to-be-processed network layer; because the dimension of the random vector of the first to-be-processed network layer is smaller than the dimension of the input vector, only a small amount of storage and computation resources need to be added, so that the accuracy and the convergence speed of the neural network can be effectively improved.
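For illustration only, the truncation and expansion of the reference random vector into a perturbation vector, and its elementwise application to the input, might look like the following sketch. The "cyclic" tiling and the per-element "copy" are one reading of the cyclic-shift and vector-element-replication expansions described later in this document; the helper names are invented here:

```python
# A minimal sketch of building the perturbation vector from the reference
# random vector and applying it to the layer input.
import numpy as np

def make_perturbation(reference, layer_dim, input_dim, mode="cyclic"):
    r = reference[:layer_dim]            # intercept the first layer_dim entries
    reps = -(-input_dim // layer_dim)    # ceiling division
    if mode == "cyclic":
        expanded = np.tile(r, reps)      # r, then r shifted around, and so on
    elif mode == "copy":
        expanded = np.repeat(r, reps)    # r1, r1, ..., r2, r2, ...
    else:
        raise ValueError(mode)
    return expanded[:input_dim]          # same dimension as the input vector

def corrected_input(x, perturbation):
    return x * perturbation              # elementwise multiplication
```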
In another embodiment, when the neural network includes a plurality of first to-be-processed network layers, different first to-be-processed network layers in the plurality of first to-be-processed network layers may adopt the same calculation method (such as the calculation method shown in fig. 4 or fig. 5) or different calculation methods. When a plurality of different first to-be-processed network layers adopt different calculation methods, a part of the first to-be-processed network layers may adopt the calculation methods shown in fig. 5 and 6, and another part of the first to-be-processed network layers may adopt the calculation method shown in fig. 7. As shown in fig. 7, the method includes steps S701-S703.
S701, obtaining an input vector of a first network layer to be processed.
It is understood that step S701 may be performed by the first obtaining module 101 shown in fig. 1 or fig. 2.
S702, decompressing the weight parameters of the first network layer to be processed into a block structured matrix corresponding to the compression matrix.
It is understood that step S702 may be performed by the decompression module 105 shown in fig. 2.
For example, since the weight parameter of the first to-be-processed network layer is a compression matrix corresponding to a block-structured matrix, before the output vector of the first to-be-processed network layer is calculated, the weight parameter C_n may first be decompressed to obtain the block-structured matrix W_n corresponding to the weight parameter (the compression matrix).
And S703, obtaining an output vector of the first network layer to be processed according to the input vector of the first network layer to be processed and the calculation model of the neural network.
It is understood that step S703 may be performed by the second calculation module 104 shown in fig. 1 or fig. 2.
Illustratively, the output vector of the first to-be-processed network layer is Y_n = W_n × X_n + b_n, where Y_n is the output vector of the first to-be-processed network layer, X_n is the input vector of the first to-be-processed network layer, W_n is the block-structured matrix corresponding to the weight parameter of the first to-be-processed network layer, and b_n is the preset offset value of the first to-be-processed network layer.
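A minimal sketch of steps S702 and S703 follows. The (R, C, k) layout of the compressed matrix and all names are assumptions; it rebuilds each k × k circulant block from its defining row and then applies the affine map:

```python
# A minimal sketch of S702-S703: expand the compressed weight parameter C_n
# into the block-circulant matrix W_n, then compute Y_n = W_n @ X_n + b_n.
import numpy as np

def expand_block_circulant(compressed, k):
    """compressed has shape (R, C, k): the defining row of each k x k circulant
    block. Row s of a circulant block is its defining row cyclically shifted by s."""
    R, C, _ = compressed.shape
    W = np.zeros((R * k, C * k))
    for i in range(R):
        for j in range(C):
            for s in range(k):
                W[i * k + s, j * k:(j + 1) * k] = np.roll(compressed[i, j], s)
    return W

def first_layer_output(compressed, k, x, b):
    W = expand_block_circulant(compressed, k)   # S702: decompress C_n into W_n
    return W @ x + b                            # S703: Y_n = W_n X_n + b_n
```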
It should be noted that the embodiment of the present application does not limit which of the one or more first to-be-processed network layers included in the neural network calculates its output by the computing method shown in fig. 5 and which by the computing method shown in fig. 7; this may be determined according to the actual network structure. However, when the neural network computing method of the embodiment of the present application is used, at least one first to-be-processed network layer should calculate its output vector by the computing method shown in fig. 5.
The beneficial effects of this embodiment are the same as those of the neural network computing method described above and are not repeated here.
In another embodiment, if the plurality of network layers included in the neural network include one or more second network layers to be processed in addition to the first network layer to be processed, the output vector of any one of the second network layers to be processed may be calculated for the one or more second network layers to be processed by the calculation method in the embodiment of the present application. As shown in fig. 8, the calculation method may include steps S801 to S802.
S801, obtaining an input vector of a second network layer to be processed.
It is understood that step S801 may be performed by the first obtaining module 101 shown in fig. 1 or fig. 2.
The second to-be-processed network layer refers to a network layer, among the plurality of network layers of the neural network, whose weight parameter is an unstructured matrix. An unstructured matrix is a matrix whose elements are not arranged according to any rule and that cannot be represented by a minimal element set uniquely determining a structured matrix; such a matrix cannot be compressed and has no corresponding compression matrix. (The original text illustrates this with an example matrix A given in an equation image, PCTCN2018101598-APPB-000002, not reproduced here.)
For example, when the second to-be-processed network layer is the first network layer of the neural network, the input vector of the second to-be-processed network layer is the input vector of the neural network; when the second network layer to be processed is any network layer except the first network layer, the input vector of the second network layer to be processed is the output vector of the previous network layer of the second network layer to be processed.
S802, obtaining an output vector of the second to-be-processed network layer according to the input vector of the second to-be-processed network layer and the computational model of the neural network.
It is understood that step S802 may be performed by the second calculation module 104 shown in fig. 1 or fig. 2.
Illustratively, the output vector of the second to-be-processed network layer is Y_i = C_i × X_i + b_i, where Y_i is the output vector of the second to-be-processed network layer, X_i is the input vector of the second to-be-processed network layer, C_i is the weight parameter of the second to-be-processed network layer, and b_i is the preset offset value of the second to-be-processed network layer. Since the weight parameter of the second to-be-processed network layer is an unstructured matrix, C_i = W_i; therefore, the output vector of the second to-be-processed network layer can be calculated directly from its input vector, weight parameter and preset offset value, without decompressing the weight parameter.
Fig. 9 shows the results of an experiment on the MNIST database with a convolutional neural network formed by 2 convolutional layers and 1 fully-connected layer, where the second convolutional layer is a first to-be-processed network layer and the first convolutional layer and the fully-connected layer are second to-be-processed network layers. The abscissa is the number of iterations and the ordinate is the accuracy. The line with x markers is the result of the prior-art method without a sign vector, with a block size of 32 for the block circulant matrix. The plain line is the result with block size 32 where the perturbation vector is obtained by cyclic shift, the neural network computing method shown in fig. 5 is used for the first to-be-processed network layer, and the method shown in fig. 8 is used for the second to-be-processed network layers. The line with triangle markers is the result with block size 40 where the perturbation vector is obtained by vector element replication (content copy), again using the method of fig. 5 for the first to-be-processed network layer and the method of fig. 8 for the second to-be-processed network layers. The dotted line is the result with block size 40 where the perturbation vector is obtained by cyclic shift, with the same methods applied to the corresponding layers.
As shown in fig. 9, when the block size is 32, which is a large block size, the network cannot converge with the prior-art computing method without a sign vector, so 32x compression cannot be realized. With the neural network computing method of the embodiment of the present application, in which the perturbation sign vector of the first to-be-processed network layer is obtained by cyclic shift, the network converges rapidly, a clear improvement over the prior art. When the block size is 40 and the perturbation vector of the first to-be-processed network layer is generated by either content copy (vector element replication) or cyclic shift, with the computing methods shown in figs. 5 and 8 applied to the corresponding network layers, the network converges; with cyclic shift, convergence is faster and accuracy is higher. Therefore, when forward inference is performed with the neural network computing method of the embodiment of the present application, the problems that training is difficult to converge and accuracy drops significantly when a block circulant matrix with a large block size is used to compress the neural network can be solved.
The beneficial effects of this embodiment are likewise the same as those of the neural network computing method described above and are not repeated here.
The embodiment of the present application further provides a training method of a neural network model, which is used for obtaining a weight parameter of each network layer in the neural network before calculating the neural network, as shown in fig. 10, the training method of the neural network model includes steps S1001 to S1004.
S1001, initializing the weight parameters of each network layer.
For example, the weight parameter of each of the one or more first to-be-processed network layers and the one or more second to-be-processed network layers included in the neural network is initialized.
S1002, performing neural network calculation according to the neural network calculation model corresponding to each network layer to obtain a temporary output vector of the neural network.
Wherein, the weight parameter of the neural network calculation model is the initialized weight parameter.
For example, the performing the neural network computation according to the neural network computation model corresponding to each network layer may include:
for a first to-be-processed network layer in the first network layer set, the temporary output vector is: Y_m0 = W_m0 × X'_m + b_m, where Y_m0 is the temporary output vector of the first to-be-processed network layer, X'_m is the corrected input vector of the first to-be-processed network layer, W_m0 is the block-structured matrix corresponding to the initialized weight parameter of the first to-be-processed network layer, and b_m is the preset offset value of the first to-be-processed network layer.
For a first to-be-processed network layer in the second network layer set, the temporary output vector is: Y_n0 = W_n0 × X_n + b_n, where Y_n0 is the temporary output vector of the first to-be-processed network layer in the second network layer set, X_n is the input vector of that network layer, W_n0 is the block-structured matrix corresponding to its initialized weight parameter, and b_n is its preset offset value.
For the second to-be-processed network layer, the temporary output vector is: Y_i0 = C_i0 × X_i + b_i, where Y_i0 is the temporary output vector of the second to-be-processed network layer, X_i is the input vector of the second to-be-processed network layer, C_i0 is the initialized weight parameter of the second to-be-processed network layer, and b_i is the preset offset value of the second to-be-processed network layer.
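The three temporary-output formulas above might be dispatched per layer type as in the following sketch; the dictionary-style layer records are an assumption, and expand_block_circulant refers to the earlier decompression sketch:

```python
# A minimal sketch of the per-layer temporary output in S1002, following the
# three formulas above.
import numpy as np

def temp_output(layer, x):
    if layer["kind"] == "first_set":             # first to-be-processed, set 1
        x = x * layer["perturbation"]            # corrected input X'_m
        W = expand_block_circulant(layer["C0"], layer["block_size"])
        return W @ x + layer["b"]                # Y_m0 = W_m0 X'_m + b_m
    if layer["kind"] == "second_set":            # first to-be-processed, set 2
        W = expand_block_circulant(layer["C0"], layer["block_size"])
        return W @ x + layer["b"]                # Y_n0 = W_n0 X_n + b_n
    return layer["C0"] @ x + layer["b"]          # Y_i0 = C_i0 X_i + b_i
```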
For example, when the plurality of network layers of the neural network include only first to-be-processed network layers, the neural network computation may be performed for each network layer according to the computing method shown in fig. 4 or fig. 5; or, when only first to-be-processed network layers are included, the computation may be performed for the corresponding network layers according to the computing method shown in fig. 4 or 5 together with the computing method shown in fig. 7; or, when the plurality of network layers include both the first to-be-processed network layer and the second to-be-processed network layer, the computation may be performed for the corresponding network layers according to the computing method shown in fig. 4 or 5 together with the computing method shown in fig. 8; or, when both layer types are included, according to the computing method shown in fig. 4 or 5 together with the computing methods shown in figs. 7 and 8. The specific computing method may be determined according to the network architecture of the actual application and is not limited here. It should be noted that, in the embodiment of the present application, the computation manner adopted for each network layer during training should be the same as that adopted during neural network inference.
And S1003, updating the weight parameter of each network layer through the reverse transmission (back-propagation) of the neural network.
Illustratively, the updated weight parameter of each network layer is used as the weight parameter of the neural network computational model corresponding to each network layer, and the steps S1002-S1003 are repeated until the difference between the temporary output vector of the neural network and the preset output vector of the neural network is smaller than the preset value.
For example, when the difference between the temporary output vector of the neural network and the preset output vector of the neural network is smaller than the preset value, the training of the neural network is considered to have converged with the current weight parameters. The value of the preset value is not limited in the embodiment of the application and may be determined according to the practical application.
S1004, acquiring the weight parameter of each network layer.
Illustratively, the weight parameters at the point where the neural network training converges are the trained weight parameters; the neural network inference shown in figs. 4 to 8 can be performed using these weight parameters, so that a larger compression ratio can be realized while the neural network still converges quickly.
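For illustration, the loop structure of steps S1001 to S1004, reduced to a single unstructured layer so that the back-propagation update can be written explicitly, might look like this; the squared-error objective and the learning rate are assumptions:

```python
# A minimal sketch of the training procedure S1001-S1004 for one layer
# Y = C @ x + b: initialize, forward, backward-update, repeat until the
# output differs from the target by less than a preset value.
import numpy as np

def train_single_layer(x, y_target, lr=0.01, eps=1e-3, max_iters=10000):
    C = 0.01 * np.random.randn(y_target.size, x.size)  # S1001: initialize
    b = np.zeros(y_target.size)
    for _ in range(max_iters):
        y = C @ x + b                       # S1002: temporary output vector
        err = y - y_target
        if np.max(np.abs(err)) < eps:       # stop when difference < preset value
            break
        C -= lr * np.outer(err, x)          # S1003: update by back-propagation
        b -= lr * err
    return C, b                             # S1004: obtain trained parameters
```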
For example, the training method of the neural network model provided in the embodiment of the present application may further include steps S601 to S603 before step S1001; according to the dimensions of the reference random vector and of the random vector of the first to-be-processed network layer determined in steps S601 to S603, the weight parameter of each network layer is obtained by the model training method of steps S1001 to S1004, and the aforementioned neural network computation process is then performed.
According to the training method of the neural network model provided by the embodiment of the application, the weight parameter of each network layer is initialized; neural network computation is performed according to the neural network computational model corresponding to each network layer to obtain a temporary output vector of the neural network; the weight parameter of each network layer is updated through the reverse transmission of the neural network, the updated weight parameter of each network layer is used as the weight parameter of the corresponding neural network computational model, and the computation is repeated until the difference between the temporary output vector of the neural network and the preset output vector of the neural network is smaller than a preset value; the weight parameter of each network layer is then acquired. In the training process of the embodiment of the application, the corrected input vector is obtained by applying a dynamic perturbation vector to the input vector of the first to-be-processed network layer, and the neural network is trained with the corrected input vector, which can reduce the correlation between output feature maps, accelerate training convergence, and reduce accuracy loss.
The above description has introduced the scheme provided by the embodiments of the present invention mainly from the perspective of the method steps. It will be appreciated that the computer, in order to carry out the above-described functions, may comprise corresponding hardware structures and/or software modules for performing the respective functions. Those of skill in the art would readily appreciate that the present application is capable of being implemented as a combination of hardware and computer software for carrying out the various example elements and algorithm steps described in connection with the embodiments disclosed herein. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiment of the present application, functional modules may be divided according to the above method example, for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. It should be noted that, the division of the modules in the embodiment of the present invention is schematic, and is only a logic function division, and there may be another division manner in actual implementation.
In the case of dividing each functional module by corresponding functions, fig. 11 shows a schematic diagram of a possible structure of the neural network computing device according to the above-described embodiment, and the neural network computing device 1100 includes: a first acquisition unit 1101, a second acquisition unit 1102, a first calculation unit 1103, a second calculation unit 1104, and a decompression unit 1105. The first obtaining unit 1101 is configured to support the neural network computing device 1100 in performing S401 in fig. 4, or S701 in fig. 7, or S801 in fig. 8; the second obtaining unit 1102 is configured to support the neural network computing device 1100 in performing S402 in fig. 4; the first calculating unit 1103 is configured to support the neural network computing device 1100 in performing S403 in fig. 4; the second computing unit 1104 is configured to support the neural network computing device 1100 in performing S404 in fig. 4, or S703 in fig. 7, or S802 in fig. 8; the decompression unit 1105 is configured to support the neural network computing device 1100 in performing S405 in fig. 5, or S702 in fig. 7. For all relevant details of the steps of the above method embodiment, refer to the functional descriptions of the corresponding functional modules shown in fig. 1 or fig. 2; they are not repeated here.
In the case of dividing each functional module by corresponding functions, fig. 12 is a schematic diagram showing a possible structure of the neural network model training apparatus according to the above embodiment, where the neural network model training apparatus 1200 includes: an initialization unit 1201, a first determination unit 1202, a second determination unit 1203, a generation unit 1204, a neural network calculation unit 1205, a back transfer unit 1206, and an acquisition unit 1207. The initialization unit 1201 is configured to support the neural network model training apparatus 1200 to execute S1001 in fig. 10; the first determination unit 1202 is configured to support the neural network model training apparatus 1200 to perform S601 in fig. 6; the second determining unit 1203 is configured to support the neural network model training apparatus 1200 to perform S602 in fig. 6; the generating unit 1204 is configured to support the neural network model training apparatus 1200 to perform S603 in fig. 6; the neural network calculating unit 1205 is used to support the neural network model training apparatus 1200 to execute S1002 in fig. 10; the backward transfer unit 1206 is configured to support the neural network model training apparatus 1200 to perform S1003 in fig. 10; the obtaining unit 1207 is configured to support the neural network model training apparatus 1200 to execute S1004 in fig. 10.
In the case of an integrated unit, fig. 13 shows a schematic diagram of a possible structure of the neural network computing device 1300 involved in the above-described embodiments. The neural network computing device 1300 includes: a storage module 1301 and a processing module 1302. The processing module 1302 is configured to control and manage the actions of the computer; for example, the processing module 1302 is configured to support the neural network computing device 1300 in performing S401-S404 in fig. 4, or S401-S405 in fig. 5, or S601-S603 in fig. 6, or S701-S703 in fig. 7, or S801-S802 in fig. 8, and/or other processes for the techniques described herein. The storage module 1301 is configured to store the program code and data of the computer. In another implementation, the neural network computing device according to the above embodiments may include a processor and an interface, where the processor communicates with the interface and is configured to perform the method of the embodiments of the present invention. The processor may be a CPU, or other hardware such as a Field-Programmable Gate Array (FPGA), or a combination of both.
In the case of an integrated unit, fig. 14 shows a schematic diagram of a possible structure of the neural network model training device 1400 involved in the above-described embodiment. The neural network model training device 1400 includes: a storage module 1401 and a processing module 1402. The processing module 1402 is configured to control and manage the actions of the neural network model training device 1400; for example, the processing module 1402 is configured to support the neural network model training device 1400 in performing S601-S603 in fig. 6, or S1001-S1004 in fig. 10, and/or other processes for the techniques described herein. The storage module 1401 is configured to store the program code and data of the computer. In another implementation, the neural network model training apparatus according to the foregoing embodiment may include a processor and an interface, where the processor communicates with the interface and is configured to perform the method of the embodiments of the present invention. The processor may be a CPU, or other hardware such as a Field-Programmable Gate Array (FPGA), or a combination of both.
The steps of a method or algorithm described in connection with the disclosure herein may be embodied in hardware or in software instructions executed by a processor. The software instructions may consist of corresponding software modules that may be stored in random access memory (RAM), flash memory, erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, a hard disk, a removable disk, a compact disc read-only memory (CD-ROM), or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC. Additionally, the ASIC may reside in a core network interface device. Of course, the processor and the storage medium may also reside as discrete components in a core network interface device.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
The above-mentioned embodiments, objects, technical solutions and advantages of the present invention are further described in detail, it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present invention should be included in the scope of the present invention.

Claims (21)

  1. A neural network computing method, wherein the neural network comprises a plurality of network layers including a first to-be-processed network layer, the method comprising:
    acquiring an input vector of the first network layer to be processed;
    obtaining a disturbance vector of the first network layer to be processed according to the reference random vector of the neural network and the dimensions of the random vector of the first network layer to be processed, wherein the dimensions of the disturbance vector are equal to the dimensions of the input vector of the first network layer to be processed, and the dimensions of the random vector of the first network layer to be processed are determined based on the weight parameters of the first network layer to be processed and the dimensions of the input vector of the first network layer to be processed;
    multiplying elements in the input vector of the first network layer to be processed with elements in the corresponding position in the perturbation vector to obtain a corrected input vector of the first network layer to be processed;
    obtaining an output vector of the first to-be-processed network layer based on the correction input vector and a computational model of the neural network.
  2. The neural network computing method according to claim 1, wherein when the first to-be-processed network layer is an input layer of the neural network, the obtaining an input vector of the first to-be-processed network layer includes using the input vector of the neural network as the input vector of the first to-be-processed network layer; when the first network layer to be processed is not the input layer of the neural network, the obtaining of the input vector of the first network layer to be processed includes using the output vector of the previous network layer of the first network layer to be processed as the input vector of the first network layer to be processed.
  3. The neural network computing method according to claim 1 or 2, wherein the input vector of the first to-be-processed network layer is { x1, x2 … xn }, the disturbance vector is { y1, y2 … yn }, and the correction input vector of the first to-be-processed network layer is { x1 × y1, x2 × y2 … xn × yn }, where n is a positive integer.
  4. The neural network computational method of any one of claims 1 to 3, wherein the neural network computational model comprises:
    Y_j = C_j × X_j + b_j, where Y_j is the output vector of the neural network computational model, X_j is the input vector of the neural network computational model, C_j is the weight parameter of the neural network computational model, and b_j is the preset offset value of the neural network computational model; the neural network computational model is used for the neural network computation of the j-th network layer among the plurality of network layers, and j is an integer.
  5. The neural network computing method of claim 4, wherein the obtaining an output vector of the first to-be-processed network layer based on the corrected input vector and a computational model of the neural network comprises:
    and respectively taking the weight parameter and the correction input vector of the first network layer to be processed as the weight parameter and the input vector of the neural network calculation model to calculate so as to obtain the output vector of the first network layer to be processed.
  6. The neural network computing method according to claim 4 or 5, wherein the weighting parameter is a compression matrix corresponding to a block-structured matrix, wherein the block-structured matrix is uniquely determined by the compression matrix, and before the calculating by the neural network based on the correction input vector to obtain the output vector of the first to-be-processed network layer, the method further comprises:
    and decompressing the weight parameters into a block structured matrix corresponding to the compression matrix.
  7. The neural network computing method of claim 6, wherein determining the dimension of the random vector of the first network layer to be processed based on the weight parameter of the first network layer to be processed and the dimension of the input vector of the first network layer to be processed comprises:
    determining a target interval, wherein two interval endpoints of the target interval are the block size of the block structured matrix corresponding to the weight parameter of the first network layer to be processed and the dimension of the input vector of the first network layer to be processed respectively;
    and randomly determining the dimension of the random vector of the first network layer to be processed in the target interval.
  8. The neural network computing method according to any one of claims 1 to 7, wherein the plurality of network layers includes a plurality of first network layers to be processed, and before the obtaining the perturbation vector of the first network layer to be processed according to the reference stochastic vector of the neural network and the dimensionality of the stochastic vector of the first network layer to be processed, the method further includes:
    determining the maximum value in the dimensionalities of all random vectors of the first network layer to be processed as the dimensionality of a reference random vector of the neural network;
    generating a random number satisfying the dimensionality of the reference random vector based on a preset random number generation model;
    and forming the generated random numbers into the reference random vector.
  9. The neural network computing method according to any one of claims 1 to 8, wherein obtaining the perturbation vector of the first network layer to be processed according to the dimensions of the reference stochastic vector of the neural network and the stochastic vector of the first network layer to be processed comprises:
    intercepting a vector with the same dimension as the random vector of the first network layer to be processed from the reference random vector as the random vector of the first network layer to be processed;
    and generating a vector with the same dimension as the input vector of the first network layer to be processed as the disturbance vector of the first network layer to be processed by adopting a cyclic shift or vector element replication mode on the random vector of the first network layer to be processed.
  10. A training method of a neural network model for obtaining a weight parameter of each network layer in a neural network, the method comprising:
    step 1, initializing the weight parameter of each network layer;
    step 2, performing neural network computation according to a neural network computation model corresponding to each network layer to obtain a temporary output vector of the neural network, wherein weight parameters of the neural network computation model are the initialized weight parameters, and the neural network computation according to the neural network computation model corresponding to each network layer includes performing neural network computation on one or more first to-be-processed network layers in each network layer according to the neural network computation method of any one of claims 1 to 9;
    step 3, updating the weight parameter of each network layer through the reverse transmission of the neural network;
    repeating the step 2 and the step 3 by taking the updated weight parameter of each network layer as the weight parameter of the neural network calculation model corresponding to each network layer until the difference between the temporary output vector and the preset output vector of the neural network is less than a preset value;
    and 4, acquiring the weight parameter of each network layer.
  11. A neural network computing apparatus, wherein the neural network includes a plurality of network layers including a first to-be-processed network layer, the apparatus comprising:
    a first obtaining unit, configured to obtain an input vector of the first to-be-processed network layer;
    a second obtaining unit, configured to obtain a perturbation vector of the first network layer to be processed according to a reference random vector of the neural network and dimensions of a random vector of the first network layer to be processed, where the dimensions of the perturbation vector are equal to those of an input vector of the first network layer to be processed, and the dimensions of the random vector of the first network layer to be processed are determined according to a weight parameter of the first network layer to be processed and the dimensions of the input vector of the first network layer to be processed;
    the first calculation unit is used for multiplying elements in the input vector of the first network layer to be processed with elements at corresponding positions in the disturbance vector acquired by the second acquisition unit to acquire a corrected input vector of the first network layer to be processed;
    and the second calculation unit is used for obtaining an output vector of the first network layer to be processed based on the correction input vector obtained by the first calculation unit and the calculation model of the neural network.
  12. The neural network computing device according to claim 11, wherein when the first to-be-processed network layer is an input layer of the neural network, the obtaining unit is specifically configured to obtain an input vector of the neural network; when the first to-be-processed network layer is not the input layer of the neural network, the obtaining unit is specifically configured to obtain an output vector of a network layer preceding the first to-be-processed network layer.
  13. The neural network computing device of claim 11 or 12, wherein the input vector of the first to-be-processed network layer is {x1, x2 … xn}, the perturbation vector is {y1, y2 … yn}, and the corrected input vector of the first to-be-processed network layer is {x1 × y1, x2 × y2 … xn × yn}, where n is a positive integer.
  14. The neural network computing device of any one of claims 11 to 13, wherein the neural network computing model comprises:
    Y_j = C_j × X_j + b_j, where Y_j is the output vector of the neural network computational model, X_j is the input vector of the neural network computational model, C_j is the weight parameter of the neural network computational model, and b_j is the preset offset value of the neural network computational model; the neural network computational model is used for the neural network computation of the j-th network layer among the plurality of network layers, and j is an integer.
  15. The neural network computing device of claim 14, wherein the second computing unit is specifically configured to: and respectively taking the weight parameter and the correction input vector of the first network layer to be processed as the weight parameter and the input vector of the neural network calculation model to calculate so as to obtain the output vector of the first network layer to be processed.
  16. The neural network computing device of claim 14 or 15, wherein the weighting parameters are compression matrices corresponding to block-structured matrices, wherein the block-structured matrices are uniquely determined by the compression matrices, the device further comprising a decompression unit,
    and the decompression unit is used for decompressing the weight parameters into a block structured matrix corresponding to the compression matrix.
  17. The neural network computing device of claim 16, wherein the device further comprises a first determining unit,
    the first determining unit is configured to determine a target interval, where two interval endpoints of the target interval are a block size of a block-structured matrix corresponding to the weight parameter of the first network layer to be processed and a dimension of an input vector of the first network layer to be processed, respectively;
    the first determining unit is further configured to randomly determine the dimension of the random vector of the first network layer to be processed within the target interval.
  18. The neural network computing device of any one of claims 11 to 17, wherein the plurality of network layers includes a first plurality of to-be-processed network layers, the device further comprising: a second determination unit and a generation unit,
    the second determining unit is further configured to determine a maximum value among dimensions of random vectors of all the first network layers to be processed as a dimension of a reference random vector of the neural network;
    the generating unit is used for generating random numbers meeting the dimensionality of the reference random vector based on a preset random number generation model, and forming the generated random numbers into the reference random vector.
  19. The neural network computing device of any one of claims 11 to 18,
    the second obtaining unit is specifically configured to intercept, from the reference random vector, a vector having a dimension equal to that of the random vector of the first network layer to be processed as the random vector of the first network layer to be processed;
    the second obtaining unit is specifically configured to generate, by using a cyclic shift or vector element replication method on the random vector of the first network layer to be processed, a vector having a dimension equal to that of the input vector of the first network layer to be processed, as the perturbation vector of the first network layer to be processed.
  20. An apparatus for training a neural network model to obtain a weight parameter for each network layer in a neural network, the apparatus comprising:
    the initialization unit is used for initializing the weight parameter of each network layer;
    the neural network computing unit is used for performing neural network computing according to the neural network computing model corresponding to each network layer to obtain a temporary output vector of the neural network, wherein the weight parameter of the neural network computing model is the initialized weight parameter;
    the neural network computing unit is specifically configured to perform neural network computation on one or more first to-be-processed network layers in each network layer according to the neural network computing method of any one of claims 1 to 9;
    a reverse transfer unit, configured to update the weight parameter of each network layer through reverse transfer of the neural network;
    an obtaining unit, configured to obtain a weight parameter of each network layer, where the weight parameter of each network layer is obtained when a difference between a temporary output vector of the neural network and a preset output vector of the neural network is smaller than a preset value.
  21. A computer storage medium having computer program code stored therein, which, when run on a processor, causes the processor to perform the neural network computing method of any one of claims 1-9, or to perform the training method of the neural network model of claim 10.
CN201880090586.1A 2018-08-21 2018-08-21 Neural network computing method and device Pending CN111788584A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/101598 WO2020037512A1 (en) 2018-08-21 2018-08-21 Neural network calculation method and device

Publications (1)

Publication Number Publication Date
CN111788584A true CN111788584A (en) 2020-10-16

Family

ID=69591896

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201880090586.1A Pending CN111788584A (en) 2018-08-21 2018-08-21 Neural network computing method and device

Country Status (2)

Country Link
CN (1) CN111788584A (en)
WO (1) WO2020037512A1 (en)


Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9165243B2 (en) * 2012-02-15 2015-10-20 Microsoft Technology Licensing, Llc Tensor deep stacked neural network
CN104951787B (en) * 2015-06-17 2019-06-28 江苏大学 The electrical energy power quality disturbance recognition methods of dictionary learning is differentiated under a kind of SRC frame
US20170076199A1 (en) * 2015-09-14 2017-03-16 National Institute Of Information And Communications Technology Neural network system, and computer-implemented method of generating training data for the neural network
CN105391179B (en) * 2015-12-23 2017-10-31 南京邮电大学 A kind of annular direct-current grid control method for coordinating based on multiple agent
CN106443598B (en) * 2016-12-08 2019-01-25 中国人民解放军海军航空大学 Radar fence based on convolutional neural networks cooperates with track spoofing distinguishing disturbance method
CN106780468B (en) * 2016-12-22 2019-09-03 中国计量大学 The conspicuousness detection method of view-based access control model perception positive feedback

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112541593A (en) * 2020-12-06 2021-03-23 支付宝(杭州)信息技术有限公司 Method and device for jointly training business model based on privacy protection
CN112765616A (en) * 2020-12-18 2021-05-07 百度在线网络技术(北京)有限公司 Multi-party security calculation method and device, electronic equipment and storage medium
CN112765616B (en) * 2020-12-18 2024-02-02 百度在线网络技术(北京)有限公司 Multiparty secure computing method, multiparty secure computing device, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2020037512A1 (en) 2020-02-27

Similar Documents

Publication Publication Date Title
US10713818B1 (en) Image compression with recurrent neural networks
EP3373210B1 (en) Transposing neural network matrices in hardware
CN108154240B (en) Low-complexity quantum line simulation system
US11645529B2 (en) Sparsifying neural network models
US10664745B2 (en) Resistive processing units and neural network training methods
Aaronson et al. Improved simulation of stabilizer circuits
CN113850389A (en) Construction method and device of quantum line
CN111381968B (en) Convolution operation optimization method and system for efficiently running deep learning task
CN112633482B (en) Efficient width graph convolution neural network model system and training method
CN113222150A (en) Quantum state transformation method and device
CN111788584A (en) Neural network computing method and device
CN117439731B (en) Privacy protection big data principal component analysis method and system based on homomorphic encryption
Jin Accelerating Gaussian Process surrogate modeling using Compositional Kernel Learning and multi-stage sampling framework
Peng et al. MBFQuant: A Multiplier-Bitwidth-Fixed, Mixed-Precision Quantization Method for Mobile CNN-Based Applications
Lee et al. Area-efficient subquadratic space-complexity digit-serial multiplier for type-II optimal normal basis of $ GF (2^{m}) $ using symmetric TMVP and block recombination techniques
US11853391B1 (en) Distributed model training
JP7227322B2 (en) Method and apparatus for performing phase operations
Xu et al. Relaxed majorization-minimization for non-smooth and non-convex optimization
CN113591942B (en) Ciphertext machine learning model training method for large-scale data
WO2020177863A1 (en) Training of algorithms
CN111796797B (en) Method and device for realizing loop polynomial multiplication calculation acceleration by using AI accelerator
CN114418104A (en) Quantum application problem processing method and device
AbdulQader et al. Enabling incremental training with forward pass for edge devices
Singh et al. XCRYPT: Accelerating Lattice Based Cryptography with Memristor Crossbar Arrays
JPH09212489A (en) Parallel processor and method for solving characteristic value problem of symmetrical matrix

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination