CN114676832A - Neural network model operation method, medium, and electronic device - Google Patents
- Publication number
- CN114676832A (application CN202210330475.1A)
- Authority
- CN
- China
- Prior art keywords
- matrix
- time step
- data
- weight matrix
- result
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- General Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Neurology (AREA)
- Image Analysis (AREA)
Abstract
The present application relates to the field of machine learning technologies, and in particular, to a neural network model operation method, medium, and electronic device. In the neural network model operation method, the weight matrices of operation items that have the same operation form and the same input data are spliced together, and the operation unit corresponding to that form of operation item obtains the input data from the storage unit once and, based on the spliced weight matrix, produces the operation results of those operation items. This increases the speed at which the electronic device runs the neural network model.
Description
Technical Field
The present application relates to the field of machine learning technologies, and in particular, to a neural network model operation method, medium, and electronic device.
Background
With the rapid development of Artificial Intelligence (AI) technology, neural networks (e.g., deep neural networks, recurrent neural networks) have recently achieved excellent results in fields such as computer vision, speech, natural language, and reinforcement learning. As neural network algorithms develop, their complexity keeps growing, and model sizes increase to improve recognition accuracy; accordingly, the power consumption and computing-resource consumption of devices running neural network models keep rising. Especially for edge devices with limited computing resources, it is therefore important to increase the speed of running neural network models, save computation time, and reduce power consumption.
Disclosure of Invention
The present application aims to provide a neural network model operation method, medium, and electronic device. With the neural network model operation method of the present application, the weight matrices (W_xz and W_xr, and W_hz and W_hr) of operation items that have the same operation form and the same input data (such as x_t × W_xz + h_{t-1} × W_hz in formula (1) below and x_t × W_xr + h_{t-1} × W_hr in formula (2) below, whose operation forms are the same and whose input data are both x_t and h_{t-1}) are spliced, and the operation unit corresponding to that form of operation item obtains the input data from the storage unit once and, based on the spliced weight matrix, produces the operation result of each operation item. This increases the speed at which the electronic device runs the neural network model.
A first aspect of the present application provides a neural network model operation method applied to an electronic device, wherein the neural network model includes a first operation and a second operation. The method comprises: acquiring a data matrix to be operated on by the first operation or the second operation, wherein the first operation and the second operation have the same number of operation factors, and for each operation factor in the first operation there is a corresponding operation factor in the second operation whose data matrix to be operated on is the same and whose operation coefficient matrix is different; performing a third operation on the data matrix to be operated on to generate a result matrix of the third operation, wherein the third operation is the operation obtained by combining the operation coefficient matrices of corresponding operation factors in the first operation and the second operation; and splitting the result matrix of the third operation to obtain a result matrix of the first operation and a result matrix of the second operation, respectively.
For example, taking the neural network model as a Gated Recurrent Unit (GRU) model, in the GRU network at the t-th time step of the GRU model, the first operation may be x_t × W_xz + h_{t-1} × W_hz of formula (1) below, and the second operation may be x_t × W_xr + h_{t-1} × W_hr of formula (2) below. For example, the arithmetic unit of the processor reads at one time from the storage unit of the processor the input data x_t of the t-th time step, the weight coefficient matrices W_xz, W_hz, W_xr, and W_hr, and the output data h_{t-1} of the t-1-th time step; it splices the weight coefficient matrix W_xz with the weight coefficient matrix W_xr to obtain a spliced weight matrix W_xzr, and splices the weight coefficient matrix W_hz with the weight coefficient matrix W_hr to obtain a spliced weight matrix W_hzr. By running the X × H1 + Y × H2 matrix operation logic with the spliced weight matrix W_xzr as H1, the spliced weight matrix W_hzr as H2, the input data x_t of the t-th time step as X, and the output data h_{t-1} of the t-1-th time step as Y, the operation unit obtains in a single operation both the result of x_t × W_xz + h_{t-1} × W_hz of formula (1) and the result of x_t × W_xr + h_{t-1} × W_hr of formula (2). It then splits this combined result according to the dimension values of the weight coefficient matrices W_xz and W_xr to obtain, respectively, the result of x_t × W_xz + h_{t-1} × W_hz and the result of x_t × W_xr + h_{t-1} × W_hr. The operation result of each operation item is thus obtained with a single read of the input data from the storage unit, which increases the speed at which the electronic device runs the neural network model.
In one possible implementation of the first aspect, the operation coefficient matrices of the first operation and the second operation include two data dimensions, height and width, where the heights and widths of the operation coefficient matrices of corresponding operation factors in the first operation and the second operation are respectively equal. The third operation is the operation obtained after the operation coefficient matrices of corresponding operation factors in the first operation and the second operation are combined along either data dimension direction.
In one possible implementation of the first aspect, the third operation is the operation obtained by combining the operation coefficient matrices of corresponding operation factors in the first operation and the second operation along the width direction.
In one possible implementation of the first aspect, splitting the result matrix of the third operation to obtain the result matrix of the first operation and the result matrix of the second operation respectively includes: the result matrix of the third operation includes two data dimensions, height and width, and the result matrix of the third operation is split along either data dimension direction to obtain the result matrix of the first operation and the result matrix of the second operation, respectively.
In one possible implementation of the first aspect, the result matrix of the third operation is split along the width direction to obtain the result matrix of the first operation and the result matrix of the second operation, respectively.
In one possible implementation of the first aspect, the manner in which the third operation is obtained by combining the operation coefficient matrices of corresponding operation factors in the first operation and the second operation includes: the data matrix to be operated on comprises a first input data matrix and a second input data matrix, and the operation factors of the first operation and the second operation comprise the matrix product of the first input data matrix with its corresponding operation coefficient matrix and the matrix product of the second input data matrix with its corresponding operation coefficient matrix.
The third operation is the operation obtained by combining the operation coefficient matrices corresponding to the first input data matrices of the first operation and the second operation, and combining the operation coefficient matrices corresponding to the second input data matrices of the first operation and the second operation.
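As an illustration of this implementation, the following is a minimal sketch assuming NumPy semantics; the names third_operation, W1x, W1h, W2x, and W2h are illustrative and not from the embodiments:

```python
import numpy as np

def third_operation(x, y, W1x, W1h, W2x, W2h):
    """Combine the operation coefficient matrices of corresponding operation
    factors along the width (column) direction, run the combined operation
    once, and split the result matrix along the width direction."""
    Wx = np.concatenate([W1x, W2x], axis=1)   # combined along the width direction
    Wh = np.concatenate([W1h, W2h], axis=1)
    result = x @ Wx + y @ Wh                  # result matrix of the third operation
    return np.split(result, 2, axis=1)        # results of the first and second operations
```

The even split at the midpoint is valid because, as stated above, the operation coefficient matrices of corresponding operation factors have equal widths.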
In one possible implementation of the first aspect, the neural network model is a recurrent neural network model, and the first operation and the second operation are operations of a fully connected layer of the recurrent neural network model.
In one possible implementation of the first aspect described above, the neural network model comprises at least one of: a gated recurrent unit (GRU) model and a long short-term memory (LSTM) model.
A second aspect of the present application provides an electronic device comprising an operation unit and a storage unit, wherein the operation unit runs a first matrix operation circuit. The operation unit acquires a data matrix to be operated on by a first operation or a second operation from the storage unit, wherein the first operation and the second operation have the same number of operation factors, and for each operation factor in the first operation there is a corresponding operation factor in the second operation whose data matrix to be operated on is the same and whose operation coefficient matrix is different. The operation unit performs a third operation on the data matrix to be operated on by running the first matrix operation circuit to generate a result matrix of the third operation, wherein the third operation is the operation obtained by combining the operation coefficient matrices of corresponding operation factors in the first operation and the second operation, and the first matrix operation circuit is used for generating the result matrices of the first operation and the second operation. The operation unit splits the result matrix of the third operation to obtain a result matrix of the first operation and a result matrix of the second operation, respectively.
For example, when the arithmetic logic unit (i.e., the first matrix operation circuit) of the operation unit that corresponds to the X × H1 + Y × H2 matrix operation can process, at one time, a data amount larger than that needed to generate the output result of the update gate and the output result of the reset gate at the t-th time step, the operation unit may read at one time from the storage unit the input data x_t of the t-th time step, the first weight matrix W_xz, the second weight matrix W_hz, the third weight matrix W_xr, the fourth weight matrix W_hr, and the output data h_{t-1} of the t-1-th time step; splice the first weight matrix W_xz and the third weight matrix W_xr along the width direction to generate a first spliced weight matrix; and splice the second weight matrix W_hz and the fourth weight matrix W_hr along the width direction to generate a second spliced weight matrix. The operation unit then runs the arithmetic logic unit corresponding to the X × H1 + Y × H2 matrix operation, i.e., inputs the first spliced weight matrix as H1 and the second spliced weight matrix as H2, so that the operation unit can determine the output result z_t' of the update gate and the output result r_t' of the reset gate at the t-th time step with a single read of data from the storage unit. This reduces the number of times the operation unit reads data from the storage unit: when a GRU model processes a segment of speech to be recognized, the operation unit only needs to read data n times from the storage unit (halving the number of reads) to generate the text corresponding to the speech to be recognized, which increases the speed of speech processing by the processor, increases the speed of speech recognition of the electronic device, shortens the speech recognition time, and improves the user experience.
A third aspect of the present application provides an electronic device, comprising: a memory for storing instructions to be executed by one or more processors of the electronic device, and one or more processors for executing the instructions in the memory to perform the neural network model operation method of the first aspect.
A fourth aspect of the present application provides a computer-readable storage medium having stored thereon instructions that, when executed on an electronic device, cause the electronic device to perform the neural network model operation method of the first aspect.
A fifth aspect of the present application provides a computer program product comprising instructions for implementing the neural network model operation method of the first aspect.
Drawings
FIG. 1A illustrates an application diagram of a GRU model, according to some embodiments of the present application;
FIG. 1B illustrates a structural schematic diagram of a GRU model, according to some embodiments of the present application;
FIG. 1C illustrates a schematic structural diagram of the GRU network at the t-th time step in FIG. 1B, according to some embodiments of the present application;
FIG. 2 illustrates a schematic structural diagram of an electronic device, according to some embodiments of the present application;
FIG. 3 illustrates a schematic diagram of the output results of an update gate generating a t time step, according to some embodiments of the present application;
FIG. 4 illustrates a schematic diagram of the output results of a reset gate generating a t time step, according to some embodiments of the present application;
FIG. 5 illustrates a schematic diagram of a concatenation of a first weight matrix at a tth time step and a third weight matrix at the tth time step, according to some embodiments of the present application;
FIG. 6 illustrates a diagram of a concatenation of a second weight matrix at a tth time step and a fourth weight matrix at the tth time step, according to some embodiments of the present application;
FIG. 7 illustrates a graph of first output data at a tth time step, according to some embodiments of the present application;
FIG. 8 illustrates a schematic diagram of splitting first output data at a t time step into output results of an update gate and output results of a reset gate, according to some embodiments of the present application;
FIG. 9 illustrates a flow diagram of a GRU model operation method, according to some embodiments of the present application;
FIG. 10 illustrates a flow diagram of another GRU model operation method, according to some embodiments of the present application.
Detailed Description
Illustrative embodiments of the present application include, but are not limited to, a neural network model operation method, apparatus, electronic device, medium, and computer program product. Embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Since the present application relates to a Gated Recurrent Unit (GRU) model, in order to illustrate the solution of the embodiments of the present application more clearly, the GRU model involved in the embodiments of the present application is described in detail below.
(1) Gated Recurrent Unit (GRU) model
The GRU model is one of the Recurrent Neural Network (RNN) models. Like the Long Short-Term Memory (LSTM) model, it was proposed to address the problems of long-term dependency and of gradients in backpropagation.
Fig. 1A shows an application diagram of a GRU model 10. As shown in fig. 1A, the input data of the GRU model 10 is the speech to be recognized, and the output data of the GRU model 10 is the text corresponding to the speech to be recognized.
In other embodiments, the GRU model 10 may also be used for text translation between languages; for example, the input data of the GRU model 10 may be Chinese text and the output data of the GRU model 10 may be the corresponding English text. The GRU model 10 may also be used for image classification; for example, the input data of the GRU model 10 may be multiple frames of images, and the output data of the GRU model 10 may be the image type of each frame. It is to be understood that the GRU model 10 is primarily used for processing and predicting sequence data; what the GRU model 10 recognizes depends on the actual application and is not particularly limited by the present application.
Fig. 1B shows a schematic structural diagram of a GRU model 10. As shown in fig. 1B, the GRU model 10 includes n GRU networks, namely GRU1, GRU2, …, GRUt-1, GRUt, …, GRUn. GRU1 represents the GRU network at the 1st time step, GRU2 the GRU network at the 2nd time step, GRUt-1 the GRU network at the t-1-th time step, GRUt the GRU network at the t-th time step, and GRUn the GRU network at the n-th time step.
For example, as shown in FIG. 1B, {x_1, x_2, …, x_{t-1}, x_t, …, x_n} is the speech data to be recognized, where x_1 is the speech input data of the GRU network (GRU1) at the 1st time step, x_2 is the speech input data of the GRU network (GRU2) at the 2nd time step, x_t is the input data of the GRU network (GRUt) at the t-th time step, …, and x_n is the speech input data of the GRU network (GRUn) at the n-th time step. The output data {h_1, h_2, …, h_{t-1}, h_t, …, h_n} of the GRU model 10 can be the text corresponding to the speech to be recognized, where h_1 is the output data of the GRU network (GRU1) at the 1st time step, h_2 is the output data of the GRU network (GRU2) at the 2nd time step, h_t is the output data of the GRU network (GRUt) at the t-th time step, …, and h_n is the output data of the GRU network (GRUn) at the n-th time step.
It is to be understood that the input data or the output data of the GRU network at each time step may be a matrix, a tensor, a vector, etc., and for convenience of description, in the following description of the embodiments, the input data or the output data matrix of the GRU network is taken as an example to describe the data processing related to the input or the output of each layer of the neural network.
It will be appreciated that since the GRU model 10 is primarily used to process and predict sequence data, the GRU network at the current time step needs to combine the output data at the previous time step, process the input data at the current time step and generate the output data at the current time step.
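As a minimal sketch of this recurrence (Python; run_gru and step are illustrative names, with the per-step computation passed in as a callable):

```python
def run_gru(inputs, h0, step):
    """Run a GRU over a sequence: the network at each time step combines
    the output of the previous time step with the current input."""
    h, outputs = h0, []
    for x_t in inputs:
        h = step(x_t, h)   # per-step computation, detailed in the phases below
        outputs.append(h)
    return outputs
```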
Fig. 1C shows a schematic structure diagram of a GRU network at the t-th time step in fig. 1B. Depending on the gate structure, the GRU network at the t-th time step can be divided into the following four phases:
1. Update phase
The update phase is used to control the extent to which the state information output by the GRU network at the previous time step is carried into the state of the GRU network at the current time step. A larger update gate value indicates that more state information from the previous time step is carried into the current state. Specifically, in the GRU network at the t-th time step, the update phase can calculate an update gate value z_t based on the output data h_{t-1} of the GRU network at the t-1-th time step and the input data x_t of the GRU network at the t-th time step; the update gate value z_t describes the extent to which the previous state is carried into the current state h_t.
Illustratively, the update gate value z_t can be calculated by the following formula (1):
z_t = σ(x_t × W_xz + h_{t-1} × W_hz)    (1)
where σ denotes the sigmoid function, i.e., the result of x_t × W_xz + h_{t-1} × W_hz is mapped into (0, 1) by the sigmoid function; W_xz denotes the weight coefficient matrix of the input data x_t of the GRU network at the t-th time step; W_hz denotes the weight coefficient matrix of the output data h_{t-1} of the GRU network at the t-1-th time step; × denotes matrix multiplication; and + denotes matrix addition.
Illustratively, the result of the matrix multiplication of the input data x_t of the t-th time step with the weight matrix W_xz is added to the result of the matrix multiplication of the output data h_{t-1} of the t-1-th time step with the weight matrix W_hz, i.e., x_t × W_xz + h_{t-1} × W_hz, which corresponds to operation t11 in fig. 1C. The update gate value z_t corresponds to the output of the σ layer t12 in fig. 1C.
2. Reset phase
The reset phase is used to control how much of the state information output by the GRU network at the previous time step is written into the state of the GRU network at the current time step; the smaller the reset gate value, the less state information output at the previous time step is written in. Specifically, in the GRU network at the t-th time step, the reset phase can calculate a reset gate value r_t based on the output data h_{t-1} of the GRU network at the t-1-th time step and the input data x_t of the GRU network at the t-th time step; the reset gate value r_t describes the extent to which the state information output by the GRU network at the previous time step is written into the state of the GRU network at the current time step.
Illustratively, the reset gate value r_t can be calculated by the following formula (2):
r_t = σ(x_t × W_xr + h_{t-1} × W_hr)    (2)
where σ denotes the sigmoid function, i.e., the result of x_t × W_xr + h_{t-1} × W_hr is mapped into (0, 1) by the sigmoid function; W_xr denotes the weight coefficient matrix of the input data x_t of the GRU network at the t-th time step; W_hr denotes the weight coefficient matrix of the output data h_{t-1} of the GRU network at the t-1-th time step; × denotes matrix multiplication; and + denotes matrix addition.
It is easy to see that the result of the matrix multiplication of the input data x_t of the t-th time step with the weight coefficient matrix W_xr is added to the result of the matrix multiplication of the output data h_{t-1} of the t-1-th time step with the weight coefficient matrix W_hr, i.e., x_t × W_xr + h_{t-1} × W_hr, which corresponds to operation t13 in fig. 1C. The reset gate value r_t corresponds to the output of the σ layer t14 in fig. 1C.
3. Update memory phase
The update memory phase updates and memorizes the input data of the GRU network at the current time step, i.e., it retains what is important. Specifically, in the GRU network at the t-th time step, this phase can use the reset gate value r_t, calculated from the output data h_{t-1} of the GRU network at the t-1-th time step and the input data x_t of the t-th time step, to compute the intermediate output result c_t of the t-th time step.
Illustratively, the intermediate output result c_t of the t-th time step can be calculated by the following formula (3):
c_t = tanh(x_t × W_x + (h_{t-1} · r_t) × W_h)    (3)
where tanh denotes the tanh function, i.e., the result of x_t × W_x + (h_{t-1} · r_t) × W_h is mapped into (-1, 1) by the tanh function; W_x denotes the weight coefficient matrix of the input data x_t of the GRU network at the t-th time step; W_h denotes the weight coefficient matrix of the result of the bitwise multiplication of the output h_{t-1} of the GRU network at the t-1-th time step with the reset gate value r_t; · denotes bitwise multiplication, i.e., entries with the same row and column indices in the two matrices are multiplied correspondingly; × denotes matrix multiplication; and + denotes matrix addition.
It is easy to see that the output h_{t-1} of the GRU network at the t-1-th time step is multiplied bitwise with the reset gate value r_t, i.e., h_{t-1} · r_t, which corresponds to t15 in fig. 1C. The result of the matrix multiplication of the input data x_t of the t-th time step with the weight coefficient matrix W_x is added to the result of the matrix multiplication of h_{t-1} · r_t with the weight coefficient matrix W_h, i.e., x_t × W_x + (h_{t-1} · r_t) × W_h, which corresponds to operation t16 in fig. 1C. The intermediate output result c_t of the t-th time step corresponds to the output of the tanh layer t17 in fig. 1C.
4. Output stage
The output stage determines and outputs the output and state of the GRU network at the t-th time step. Specifically, the output stage can calculate the output data h_t of the t-th time step based on the output data h_{t-1} of the GRU network at the t-1-th time step, the update gate value z_t, and the intermediate output result c_t of the t-th time step.
Illustratively, the output data h_t of the t-th time step can be calculated by the following formula (4):
h_t = (1 - z_t) · c_t + z_t · h_{t-1}    (4)
where z_t denotes the update gate value, h_{t-1} denotes the output of the GRU network at the t-1-th time step, c_t denotes the intermediate output result of the t-th time step, · denotes bitwise multiplication, and + denotes vector summation.
It is easy to see that the update gate value z_t is subtracted from 1, i.e., 1 - z_t, which corresponds to operation t18 in fig. 1C. The intermediate output result c_t of the t-th time step is multiplied bitwise with 1 - z_t, i.e., (1 - z_t) · c_t, which corresponds to operation t19 in fig. 1C. The output h_{t-1} of the GRU network at the t-1-th time step is multiplied bitwise with the update gate value z_t, i.e., z_t · h_{t-1}, which corresponds to operation t20 in fig. 1C. Finally, (1 - z_t) · c_t and z_t · h_{t-1} are summed, i.e., (1 - z_t) · c_t + z_t · h_{t-1}, which corresponds to operation t21 in fig. 1C.
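Putting formulas (1) to (4) together, one GRU time step can be sketched as follows (NumPy; gru_step and the weight names are illustrative, and the shapes are whatever makes the products well-formed):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, Wxz, Whz, Wxr, Whr, Wx, Wh):
    """One GRU time step; h_prev is the output h_{t-1} of the previous step."""
    z_t = sigmoid(x_t @ Wxz + h_prev @ Whz)        # formula (1): update gate value
    r_t = sigmoid(x_t @ Wxr + h_prev @ Whr)        # formula (2): reset gate value
    c_t = np.tanh(x_t @ Wx + (h_prev * r_t) @ Wh)  # formula (3): intermediate output
    return (1.0 - z_t) * c_t + z_t * h_prev        # formula (4): output data h_t
```

Here * is the bitwise (element-wise) product denoted · above, and @ is matrix multiplication.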
It can be understood that the processor of the electronic device is provided with an arithmetic unit for each operation term in the above formulas (1) to (4), and the arithmetic unit can obtain the operation result of each operation term by reading the input data of that term from the storage unit of the processor. An operation term can be at least part of the above formulas, such as x_t × W_xz + h_{t-1} × W_hz in formula (1), x_t × W_xr + h_{t-1} × W_hr in formula (2), the sigmoid function σ(), etc.
When the GRU network defined by formulas (1) to (4) above is run by the arithmetic unit of the processor of the electronic device, the input data of the operation terms in each formula must be read from the storage unit of the processor into the arithmetic logic circuit corresponding to the arithmetic unit, after which the operation result of each operation term is obtained according to that term's arithmetic logic. That is, each time the arithmetic unit of the processor runs an operation term, input data needs to be read once from the storage unit of the processor, which increases the number of times the arithmetic unit reads data from the storage unit and reduces the running speed of the neural network model.
For example, to calculate x_t × W_xz + h_{t-1} × W_hz of formula (1) by running the X × H1 + Y × H2 matrix operation logic, the processing unit of the processor first needs to read from the storage unit of the processor the input data x_t, the weight coefficient matrix W_xz, the weight coefficient matrix W_hz, and the output data h_{t-1} of the t-1-th time step; after reading the data, it runs the X × H1 + Y × H2 matrix operation logic on x_t × W_xz + h_{t-1} × W_hz to obtain its result. Then, to calculate x_t × W_xr + h_{t-1} × W_hr of formula (2), the processing unit again runs the X × H1 + Y × H2 matrix operation logic, and needs to read again from the storage unit the input data x_t, the weight coefficient matrix W_xr, the weight coefficient matrix W_hr, and the output data h_{t-1} of the t-1-th time step; after reading the data, it runs the X × H1 + Y × H2 matrix operation logic on x_t × W_xr + h_{t-1} × W_hr to obtain its result. It is easy to see that, to obtain the results of x_t × W_xz + h_{t-1} × W_hz and x_t × W_xr + h_{t-1} × W_hr, the arithmetic unit needs to read data twice from the storage unit.
In order to reduce the number of times the arithmetic unit reads data from the storage unit, an embodiment of the present application provides a neural network model operation method in which the weight matrices (W_xz and W_xr, and W_hz and W_hr) of operation terms that have the same operation form and the same input data (such as x_t × W_xz + h_{t-1} × W_hz in formula (1) and x_t × W_xr + h_{t-1} × W_hr in formula (2), whose operation forms are the same and whose input data are both x_t and h_{t-1}) are spliced, and the operation unit corresponding to that form of operation term obtains the input data from the storage unit once and, based on the spliced weight matrix, produces the operation result of each operation term. This increases the speed at which the electronic device runs the neural network model.
For example, the arithmetic unit of the processor reads at one time from the storage unit of the processor the input data x_t of the t-th time step, the weight coefficient matrices W_xz, W_hz, W_xr, and W_hr, and the output data h_{t-1} of the t-1-th time step; it splices the weight coefficient matrix W_xz with the weight coefficient matrix W_xr to obtain a spliced weight matrix W_xzr, and splices the weight coefficient matrix W_hz with the weight coefficient matrix W_hr to obtain a spliced weight matrix W_hzr. By running the X × H1 + Y × H2 matrix operation logic with the spliced weight matrix W_xzr as H1, the spliced weight matrix W_hzr as H2, the input data x_t of the t-th time step as X, and the output data h_{t-1} of the t-1-th time step as Y, the arithmetic unit obtains in a single operation both the result of x_t × W_xz + h_{t-1} × W_hz of formula (1) and the result of x_t × W_xr + h_{t-1} × W_hr of formula (2); it then splits the combined result according to the dimension values of the weight coefficient matrices W_xz and W_xr to obtain, respectively, the result of x_t × W_xz + h_{t-1} × W_hz and the result of x_t × W_xr + h_{t-1} × W_hr.
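The splice-and-split flow described in the two paragraphs above can be checked with a small sketch (NumPy, illustrative dimensions; the random matrices merely stand in for trained weights):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 2, 4                                  # illustrative dimensions
x_t, h_prev = rng.random((1, d_in)), rng.random((1, d_h))
Wxz, Wxr = rng.random((d_in, d_h)), rng.random((d_in, d_h))
Whz, Whr = rng.random((d_h, d_h)), rng.random((d_h, d_h))

# Splice the weight matrices of the two same-form operation terms.
Wxzr = np.concatenate([Wxz, Wxr], axis=1)         # d_in x 2*d_h
Whzr = np.concatenate([Whz, Whr], axis=1)         # d_h  x 2*d_h

# One pass of the X x H1 + Y x H2 logic over the spliced matrices...
zr = x_t @ Wxzr + h_prev @ Whzr                   # 1 x 2*d_h

# ...then split at the width of Wxz / Whz to recover both results.
z_lin, r_lin = zr[:, :d_h], zr[:, d_h:]
assert np.allclose(z_lin, x_t @ Wxz + h_prev @ Whz)   # term of formula (1)
assert np.allclose(r_lin, x_t @ Wxr + h_prev @ Whr)   # term of formula (2)
```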
To facilitate understanding of the technical solution of the embodiments of the present application, the electronic device 20 that executes the calculation process of the GRU model 10 is described below.
Fig. 2 illustrates a schematic block diagram of an electronic device 20 according to some embodiments of the present application, and as shown in fig. 2, the electronic device 20 includes a processor 21, a system memory 22, a non-volatile memory 23, an input/output device 24, a communication interface 25, and system control logic 26 for coupling the processor 21, the system memory 22, the non-volatile memory 23, the input/output device 24, and the communication interface 25. Wherein:
the Processor 201 may include one or more Processing units, such as Processing modules or Processing circuits that may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a Digital Signal Processor (DSP), a Microprocessor (MCU), a Programmable Gate Array (FPGA), an Artificial Intelligence Processing Unit (AIPU), a Neural Network Processor (NPU), and so on. The different processing units may be separate devices or may be integrated into one or more processors. In some embodiments, processor 201 may perform the computational process of neural network model 10.
In some embodiments, the processor 21 may include a control unit 210, an arithmetic unit 211, and a storage unit 212, where the control unit 210 is configured to schedule the processor 21. In some embodiments, the control unit 210 further includes a Direct Memory Access Controller (DMAC) 2101 configured to transfer data in the storage unit 212 to other units, for example, to the system memory 22.
The operation unit 211 is used to perform specific arithmetic and/or logic operations. In some embodiments, the operation unit 211 may include an arithmetic logic unit, i.e., a combinational logic circuit capable of implementing multiple kinds of arithmetic and logic operations. For example, the operation unit 211 includes the arithmetic logic units corresponding to the operators of formulas (1) to (4).
In some embodiments, the arithmetic unit 211 internally includes a plurality of processing units (PEs). In some implementations, the arithmetic unit 211 is a two-dimensional systolic array. The arithmetic unit 211 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic unit 211 is a general-purpose matrix processor.
For example, assume an input matrix A, a weight matrix B, and an output matrix C. The operation unit 211 obtains the data corresponding to the input matrix A and the weight matrix B from the storage unit 212 and buffers it on each PE in the operation circuit; the operation circuit performs a matrix operation on the data of matrix A and matrix B to obtain a partial or final result of the matrix. It is understood that the operation unit 211 may process the final result of a matrix operation of size 100 × 100 at a time, or of size 10 × 10 at a time.
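For reference, the X × H1 + Y × H2 primitive that this arithmetic logic implements can be written in one line (NumPy stands in for the PE array; the function name is illustrative):

```python
import numpy as np

def x_h1_plus_y_h2(X, H1, Y, H2):
    """The X x H1 + Y x H2 matrix primitive used throughout: two matrix
    multiplications whose results are added element-wise."""
    return X @ H1 + Y @ H2
```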
In other embodiments, the arithmetic unit 211 may further include a plurality of Application Specific Integrated Circuits (ASICs) adapted to run the neural network model, such as a convolution calculation unit, a vector calculation unit, and the like. In some embodiments, the operation unit 211 may be configured to read the input data, the first weight matrix, the second weight matrix, the third weight matrix, the fourth weight matrix, and the output data at the t-1 th time step from the storage unit 212, and generate the gate value of the reset gate and the gate value of the update gate at the t-th time step through an arithmetic operation. The storage unit 212 is used to temporarily store input and/or output data of the arithmetic unit 211. For example, the storage unit 212 may be used to store input data at the t-th time step, a first weight matrix, a second weight matrix, a third weight matrix, a fourth weight matrix, and output data at the t-1 th time step.
It is understood that DMAC 2101 may not be integrated into processor 21 in other embodiments, but may be a separate module coupled to system control logic 26, which is not limited by the embodiments of the present application.
The system memory 22 may include Random-Access Memory (RAM), Double Data Rate Synchronous Dynamic Random-Access Memory (DDR SDRAM), and other memory devices for temporarily storing data or instructions of the electronic device 20.
Non-volatile memory 23 may be a tangible, non-transitory computer-readable medium including one or more instructions for permanently storing data and/or instructions. The nonvolatile memory 23 may include any suitable nonvolatile memory such as a flash memory and/or any suitable nonvolatile storage device, such as a Hard Disk Drive (HDD), a Compact Disc (CD), a Digital Versatile Disc (DVD), a Solid-State Drive (SSD), and the like. In some embodiments, the non-volatile memory 23 may also be a removable storage medium, such as a Secure Digital (SD) memory card or the like. In some embodiments, the non-volatile memory 23 is used to permanently store data or instructions for the electronic device 20, such as instructions for storing the neural network model 10.
Input/output (I/O) devices 24 may include input devices such as a keyboard, mouse, touch screen, etc. for converting user operations into analog or digital signals and communicating them to processor 21; and output devices such as speakers, printers, displays, etc. for presenting information in the electronic device 20 to the user in the form of sounds, text, images, etc.
The communication interface 25 provides a software/hardware interface for the electronic device 20 to communicate with other electronic devices, so that the electronic device 20 can exchange data with other electronic devices 20, for example, the electronic device 20 may obtain data for operating the neural network model from other electronic devices through the communication interface 25, and may also transmit the operation result of the neural network model to other electronic devices through the communication interface 25.
System control logic 26 may include any suitable interface controllers to provide any suitable interfaces to the other modules of electronic device 20 so that the various modules of electronic device 20 may communicate with one another.
In some embodiments, at least one of the processors 21 may be packaged together with logic for one or more controllers of the System control logic 26 to form a System In Package (SiP). In other embodiments, at least one of the processors 21 may also be integrated on the same Chip with logic for one or more controllers of the System control logic 26 to form a System-on-Chip (SoC).
It is understood that the hardware structure of the electronic device 20 shown in fig. 2 is only an example, in other embodiments, the electronic device 20 may also include more or fewer modules, and a part of the modules may also be combined or split, and the embodiment of the present application is not limited.
It is understood that the electronic device 20 may be any electronic device capable of running the GRU model 10, including but not limited to a laptop computer, a desktop computer, a tablet computer, a cell phone, a server, a wearable device, a head-mounted display, a mobile email device, a portable game console, a portable music player, a reader device, a television with one or more processors embedded therein or coupled thereto, and the embodiments of the present application are not limited thereto.
For ease of understanding, the process of the electronic device running the GRU model 10 will be described below in conjunction with the structure of the electronic device 20.
It is understood that the process of the electronic device 20 running each of the GRU networks in the GRU model 10 is similar, and the process of the electronic device 20 running the GRU model 10 is described below by taking the tth GRU network as an example.
In one method for operating the GRU model according to an embodiment of the present application, the operation unit 211 first reads from the storage unit 212 the input data of the t-th time step, the first weight matrix (i.e., the weight coefficient matrix W_xz of formula (1)), the second weight matrix (i.e., the weight coefficient matrix W_hz of formula (1)), and the output data of the t-1-th time step, and determines the output result of the update gate (i.e., the calculation result of the operator x_t × W_xz + h_{t-1} × W_hz of formula (1)) by running the linear operator of the update gate on these data. Then, according to the input data x_t of the t-th time step, the third weight matrix (i.e., the weight coefficient matrix W_xr of formula (2)), the fourth weight matrix (i.e., the weight coefficient matrix W_hr of formula (2)), and the output data of the t-1-th time step read from the storage unit 212 a second time, it generates the output result of the reset gate (i.e., the calculation result of the operator x_t × W_xr + h_{t-1} × W_hr of formula (2)) by running the linear operator of the reset gate.
Taking the input data x_t of the t-th time step as a 1 × 2 matrix, the output data h_{t-1} of the t-1-th time step as a 1 × 4 matrix, the first weight matrix as a 2 × 4 matrix, and the second weight matrix as a 4 × 4 matrix as an example, the following describes how the operation unit 211 runs the linear operator of the update gate of the GRU network at the t-th time step (i.e., the operator x_t × W_xz + h_{t-1} × W_hz in formula (1)) to generate the output result z_t' of the update gate at the t-th time step (a 1 × 4 matrix).
For example, as shown in fig. 3, the input data x_t of the t-th time step is a 1 × 2 matrix whose first row is 1, 2. The first weight matrix W_xz of the t-th time step is a 2 × 4 matrix whose first row is 3, 2, 5, 4 and whose second row is 4, 1, 1, 1. The output data h_{t-1} of the t-1-th time step is a 1 × 4 matrix whose first row is 1, 2, 1, 3. The second weight matrix W_hz of the t-th time step is a 4 × 4 matrix whose first row is 3, 2, 1, 0, second row is 2, 1, 0, 3, third row is 1, 0, 1, 2, and fourth row is 1, 3, 2, 1. The operation unit 211 runs the linear operator of the update gate and generates the output result z_t' of the update gate as a 1 × 4 matrix whose first row is 22, 17, 15, 17.
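These numbers can be reproduced directly (NumPy sketch; the last two entries of the second row of W_xz are inferred, being the unique values consistent with the stated result 22, 17, 15, 17):

```python
import numpy as np

x_t   = np.array([[1, 2]])            # input data of the t-th time step
h_tm1 = np.array([[1, 2, 1, 3]])      # output data of the t-1-th time step
W_xz  = np.array([[3, 2, 5, 4],
                  [4, 1, 1, 1]])      # second row partly inferred (see above)
W_hz  = np.array([[3, 2, 1, 0],
                  [2, 1, 0, 3],
                  [1, 0, 1, 2],
                  [1, 3, 2, 1]])

print(x_t @ W_xz + h_tm1 @ W_hz)      # [[22 17 15 17]]
```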
Taking the input data x_t of the t-th time step as a 1 × 2 matrix, the output data h_{t-1} of the t-1-th time step as a 1 × 4 matrix, the third weight matrix W_xr as a 2 × 4 matrix, and the fourth weight matrix W_hr as a 4 × 4 matrix as an example, the following describes how the operation unit 211 runs the linear operator of the reset gate of the GRU network at the t-th time step (i.e., the operator x_t × W_xr + h_{t-1} × W_hr in formula (2)) to generate the output result r_t' of the reset gate at the t-th time step (a 1 × 4 matrix).
For example, as shown in fig. 4, the input data x_t of the t-th time step is a 1 × 2 matrix whose first row is 1, 2. The third weight matrix W_xr of the t-th time step is a 2 × 4 matrix whose first row is 3, 2, 0, 2 and whose second row includes 2, 1. The output data h_{t-1} of the t-1-th time step is a 1 × 4 matrix whose first row is 1, 2, 1, 3. The fourth weight matrix W_hr of the t-th time step is a 4 × 4 matrix whose first row is 0, 2, 1, 0, whose second row includes 2, 1, 0, whose third row is 1, 0, 1, 2, and whose fourth row is 1, 0, 2, 1. The operation unit 211 runs the linear operator of the reset gate and generates the output result r_t' of the reset gate as a 1 × 4 matrix whose first row is 15, 8, 10, 9.
As can be seen from the descriptions of figs. 3 and 4, the operation unit 211 first needs to read from the storage unit 212 the input data x_t of the t-th time step, the output data h_{t-1} of the t-1-th time step, the first weight matrix W_xz, and the second weight matrix W_hz, and generates the output result z_t' of the update gate by running the linear operator of the update gate. Then, the operation unit 211 reads from the storage unit 212 the input data x_t of the t-th time step, the output data h_{t-1} of the t-1-th time step, the third weight matrix W_xr, and the fourth weight matrix W_hr, and generates the output result r_t' of the reset gate by running the linear operator of the reset gate.
It is easy to see that, to infer the text corresponding to the speech data of one time step from the input data of that time step, the operation unit 211 needs two reads from the storage unit 212 (i.e., a first read of the input data x_t of the t-th time step, the output data h_{t-1} of the t-1-th time step, the first weight matrix W_xz, and the second weight matrix W_hz, and a second read of the input data x_t of the t-th time step, the output data h_{t-1} of the t-1-th time step, the third weight matrix W_xr, and the fourth weight matrix W_hr), performing a matrix operation on the data of each read, before it can generate the output result z_t' of the update gate and the output result r_t' of the reset gate at the t-th time step; this reduces the speed at which the arithmetic unit performs inference on the input data.
It is understood that the GRU model 10 includes GRU networks of n time steps, and in the GRU network of each time step, the arithmetic unit 211 needs to read data from the storage unit 212 twice to calculate the gate values of the update gate and the reset gate. Therefore, when the GRU model 10 processes a segment of speech, the arithmetic unit 211 needs to read data from the storage unit 212 at least 2 × n times, and when the GRU model 10 processes ten segments of speech, the arithmetic unit 211 needs to read data from the storage unit 212 at least 20 × n times. In this way, when the GRU model 10 is run for data processing (e.g., speech recognition), the operation unit 211 needs to read data from the storage unit 212 many times, which affects the speed of speech processing by the processor 21 and, in turn, the speed of speech recognition of the electronic device 20, resulting in long speech recognition times and degraded user experience.
As is apparent from the above description, the matrix operations of the update gate operator and the reset gate operator have the same structure, i.e., both are X × H1 + Y × H2 matrix operations; accordingly, the operation unit 211 has an arithmetic logic unit corresponding to the X × H1 + Y × H2 matrix operation. However, the weight coefficient matrices applied to the two input data differ between the update gate operator and the reset gate operator: the first weight matrix differs from the third weight matrix, and the second weight matrix differs from the fourth weight matrix.
In order to reduce the number of times the arithmetic unit 211 reads data from the storage unit 212, an embodiment of the present application provides another GRU model operation method, in which the arithmetic unit 211 reads at one time from the storage unit 212 the input data x_t of the t-th time step, the first weight matrix W_xz, the second weight matrix W_hz, the third weight matrix W_xr, the fourth weight matrix W_hr, and the output data h_{t-1} of the t-1-th time step; splices the first weight matrix W_xz and the third weight matrix W_xr to obtain a first spliced weight matrix; splices the second weight matrix W_hz and the fourth weight matrix W_hr to obtain a second spliced weight matrix; and performs the X × H1 + Y × H2 matrix operation on the first spliced weight matrix, the second spliced weight matrix, the input data x_t of the t-th time step, and the output data h_{t-1} of the t-1-th time step to obtain the output result z_t' of the update gate and the output result r_t' of the reset gate at the t-th time step. In this way, the number of times the arithmetic unit 211 reads data from the storage unit 212 is reduced, thereby increasing the speed at which the arithmetic unit runs the neural network model.
For example, when the arithmetic logic unit of the operation unit 211 corresponding to the X × H1 + Y × H2 matrix operation can process at one time a data amount larger than that needed to generate the output result of the update gate and the output result of the reset gate at the t-th time step, the arithmetic unit 211 may read at one time from the storage unit 212 the input data x_t of the t-th time step, the first weight matrix W_xz, the second weight matrix W_hz, the third weight matrix W_xr, the fourth weight matrix W_hr, and the output data h_{t-1} of the t-1-th time step, then splice the first weight matrix W_xz and the third weight matrix W_xr along the width direction to generate a first spliced weight matrix, and splice the second weight matrix W_hz and the fourth weight matrix W_hr along the width direction to generate a second spliced weight matrix. The arithmetic unit 211 then runs the arithmetic logic unit corresponding to the X × H1 + Y × H2 matrix operation, i.e., inputs the first spliced weight matrix as H1 and the second spliced weight matrix as H2 to the arithmetic logic unit, so that the arithmetic unit 211 can determine the output result z_t' of the update gate and the output result r_t' of the reset gate at the t-th time step with a single read of data from the storage unit 212. This reduces the number of times the operation unit 211 reads data from the storage unit 212: when the GRU model 10 processes a segment of speech to be recognized, the operation unit 211 only needs to read data n times from the storage unit 212 (halving the number of reads) to generate the text corresponding to the speech to be recognized, which increases the speed of speech processing by the processor 21, increases the speed of speech recognition of the electronic device 20, shortens the speech recognition time, and improves the user experience.
Specifically, for the input data x_t of the t-th time step, the output data h_{t-1} of the t-1-th time step, the first weight matrix W_xz, and the second weight matrix W_hz shown in fig. 3, and the input data x_t of the t-th time step, the output data h_{t-1} of the t-1-th time step, the third weight matrix W_xr, and the fourth weight matrix W_hr shown in fig. 4, the operation unit 211 can read at the same time from the storage unit 212 the input data x_t of the t-th time step, the output data h_{t-1} of the t-1-th time step, the first weight matrix W_xz, the second weight matrix W_hz, the third weight matrix W_xr, and the fourth weight matrix W_hr; splice the first weight matrix W_xz and the third weight matrix W_xr into the first spliced weight matrix W_xzr shown in fig. 5, and splice the second weight matrix W_hz and the fourth weight matrix W_hr into the second spliced weight matrix W_hzr shown in fig. 6; then perform the X × H1 + Y × H2 matrix operation on the first spliced weight matrix W_xzr, the second spliced weight matrix W_hzr, the input data x_t of the t-th time step, and the output data h_{t-1} of the t-1-th time step to obtain the first output data zr_t' of the t-th time step shown in fig. 7. Referring to fig. 8, the operation unit 211 then splits the first output data zr_t' obtained in fig. 7 into the output result z_t' of the update gate and the output result r_t' of the reset gate at the t-th time step shown in figs. 3 and 4. The specific calculation process is described below and is not repeated here.
The following describes in detail a process of the electronic device 20 executing the GRU model 10 operation method provided in the embodiment of the present application, with reference to a hardware structure of the electronic device 20.
Fig. 9 illustrates a flow diagram of a method of operating a GRU model 10 according to some embodiments of the present application. In the following, an embodiment of the present application is described by taking a GRU network (GRUt) at a tth time step in the GRU model 10 as an example, where the operation method includes:
s901: the operation unit 211 reads the input data, the first weight matrix, the second weight matrix, and the output data at the t-1 th time step from the storage unit 212. Wherein the first weight matrix is the weight coefficient matrix W in the formula (1)xzThe second weight matrix is the weight coefficient matrix W in the formula (1)hz。
S902: the operation unit 211 inputs the number of times according to the t-th time stepAnd determining the output result of the updating gate of the t-th time step by operating the linear operator of the updating gate of the GRU network of the t-th time step according to the first weight matrix, the second weight matrix and the output data of the t-1-th time step. Wherein, the linear operator of the update gate of the GRU network at the t-th time step may be the operator x in formula (1)t×Wxz+ht-1×Whz。
S903: the operation unit 211 determines the gate value of the update gate of the GRU network at the t-th time step by operating the nonlinear operator of the update gate of the GRU network at the t-th time step according to the output result of the update gate. The non-linear operator of the update gate of the GRU network may be a σ () (i.e., sigmoid function) operator in formula (1).
S904: the operation unit 211 reads the input data at the t-th time step, the third weight matrix, the fourth weight matrix, and the output data at the t-1-th time step from the storage unit 212. Wherein the third weight matrix is the weight coefficient matrix W in the formula (2)xrThe second weight matrix is the weight coefficient matrix W in the formula (2)hr。
S905: The operation unit 211 determines the output result of the reset gate by running the linear operator of the reset gate of the GRU network, according to the input data of the t-th time step, the third weight matrix, the fourth weight matrix, and the output data of the t-1-th time step.
S906: the operation unit 211 determines a gate value of the reset gate of the GRU network at the t-th time step by operating the nonlinear operator of the reset gate of the GRU network at the t-th time step according to the output result of the reset gate. The non-linear operator of the reset gate of the GRU network may be a sigma () (i.e., sigmoid function) operator in formula (2).
S907: the operation unit 211 determines output data of the t-th time step based on the gate value of the reset gate of the t-th time step, the gate value of the update gate of the t-th time step, the input data of the t-th time step, and the output data of the t-1-th time step.
In some embodiments, the operation unit 211 determines the output data of the t-th time step based on the gate value of the reset gate of the t-th time step, the gate value of the update gate of the t-th time step, the input data of the t-th time step, and the output data of the t-1-th time step. Specifically, the output data of the t-th time step can be calculated according to formula (3) and formula (4); for the specific calculation process, refer to the description of formula (3) and formula (4), which is not repeated here.
It should be noted that, in steps S901 to S906, when the operation unit 211 calculates the gate values of the reset gate and the update gate of the GRU network at the t-th time step, it needs to read data at step S901, that is, the input data of the t-th time step, the first weight matrix, the second weight matrix, and the output data of the t-1-th time step, and to read data again at step S904, that is, the input data of the t-th time step, the third weight matrix, the fourth weight matrix, and the output data of the t-1-th time step.
Therefore, in order to reduce the number of times the operation unit 211 reads data from the storage unit 212 when calculating the gate value of the reset gate and the gate value of the update gate at the t-th time step, the present application further provides an operation method of the GRU model 10 in which the operation unit 211 only needs to read the input data of the t-th time step, the first weight matrix, the second weight matrix, the third weight matrix, the fourth weight matrix, and the output data of the t-1-th time step from the storage unit 212 once; the first weight matrix and the third weight matrix are spliced, the second weight matrix and the fourth weight matrix are spliced, and the operation unit 211 then generates the gate value of the reset gate of the t-th time step and the gate value of the update gate of the t-th time step according to the spliced weight matrices, the input data of the t-th time step, and the output data of the t-1-th time step.
Fig. 10 illustrates a flow diagram of another method of operating the GRU model 10 according to some embodiments of the present application. In the following, an embodiment of the present application is described by taking the GRU network (GRUt) at the t-th time step in the GRU model 10 as an example. The operation method includes:
s1001: the operation unit 211 reads the input data, the first weight matrix, the second weight matrix, the third weight matrix, the fourth weight matrix, and the output data at the t-1 time step from the storage unit 212.
For example, as shown in figs. 1A to 1C, taking the GRU network (GRUt) at the t-th time step in the GRU model 10 as an example, in order to calculate the gate values of the update gate and the reset gate, the operation unit 211 may read the input data of the t-th time step, the first weight matrix, the second weight matrix, the third weight matrix, the fourth weight matrix, and the output data of the t-1-th time step from the storage unit 212, where the first weight matrix is the weight coefficient matrix W_xz in formula (1), the second weight matrix is the weight coefficient matrix W_hz in formula (1), the third weight matrix is the weight coefficient matrix W_xr in formula (2), and the fourth weight matrix is the weight coefficient matrix W_hr in formula (2).
In some embodiments, the input data x_t of the t-th time step is a matrix of size H1 × W1, where H1 represents the number of rows of the input data x_t of the t-th time step and W1 represents its number of columns. The first weight matrix W_xz and the third weight matrix W_xr of the t-th time step are both matrices of size H2 × W2, where H2 represents the number of rows of the first weight matrix W_xz and the third weight matrix W_xr, and W2 represents their number of columns. The second weight matrix W_hz and the fourth weight matrix W_hr of the t-th time step are both matrices of size H3 × W3, where H3 represents the number of rows of the second weight matrix W_hz and the fourth weight matrix W_hr, and W3 represents their number of columns. The output data h_{t-1} of the t-1-th time step is a matrix of size H4 × W4, where H4 represents the number of rows of the output data h_{t-1} of the t-1-th time step and W4 represents its number of columns.
As will be readily appreciated, to ensure that in the linear operator of the update gate of the GRU network the input data x_t of the t-th time step can be matrix-multiplied with the first weight matrix W_xz, the output data h_{t-1} can be matrix-multiplied with the second weight matrix W_hz, and the result of x_t × W_xz can be matrix-added to the result of h_{t-1} × W_hz, the number of rows H1 of the input data of the t-th time step is equal to the number of rows H4 of the output data of the t-1-th time step, the number of rows H2 of the first weight matrix is equal to the number of columns W1 of the input data of the t-th time step, the number of rows H3 of the second weight matrix is equal to the number of columns W4 of the output data of the t-1-th time step, and the number of columns W2 of the first weight matrix is equal to the number of columns W3 of the second weight matrix. Similarly, the correspondence between the numbers of rows and columns of the matrices in the linear operator of the reset gate of the GRU network at the t-th time step follows that of the linear operator of the update gate of the GRU network at the t-th time step, and is not repeated here.
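These row/column constraints can be restated as a short sketch; the helper name and the H1 through W4 bindings mirror the text above and are otherwise illustrative assumptions.

```python
import numpy as np

def check_gate_shapes(x_t, h_prev, W_x, W_h):
    H1, W1 = x_t.shape     # input data of the t-th time step
    H2, W2 = W_x.shape     # first (or third) weight matrix
    H3, W3 = W_h.shape     # second (or fourth) weight matrix
    H4, W4 = h_prev.shape  # output data of the t-1-th time step
    assert H1 == H4  # both products have the same number of rows
    assert H2 == W1  # x_t x W_x is well defined
    assert H3 == W4  # h_{t-1} x W_h is well defined
    assert W2 == W3  # the two products can be matrix-added

check_gate_shapes(np.ones((1, 2)), np.ones((1, 4)),
                  np.ones((2, 4)), np.ones((4, 4)))
```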
In some embodiments, the data in the input data of the t-th time step, the first weight matrix, the second weight matrix, the third weight matrix, the fourth weight matrix, and the output data of the t-1-th time step may be integers or floating-point numbers; the data types of these matrices are not particularly limited and depend on the practical application.
S1002: the operation unit 211 concatenates the first weight matrix at the t-th time step and the third weight matrix at the t-th time step along the width direction of the matrices, and generates a concatenated first concatenation weight matrix.
In some embodiments, the first weight matrix W_xz of the t-th time step is a matrix of size H2 × W2, where H2 represents both the number of rows and the height of the matrix W_xz, and W2 represents both the number of columns and the width of the matrix W_xz. The third weight matrix W_xr of the t-th time step is a matrix of size H2 × W2, where H2 represents both the number of rows and the height of the matrix W_xr, and W2 represents both the number of columns and the width of the matrix W_xr.
It can be seen that the height values and the width values of the first weight matrix of the t-th time step and the third weight matrix of the t-th time step are respectively equal, while the data in the two matrices differ. Therefore, the operation unit 211 may splice the first weight matrix of the t-th time step and the third weight matrix of the t-th time step along the width direction of the matrices to generate the first splicing weight matrix.
In some embodiments, the first weight matrix of the t-th time step is a matrix of size H2 × W2 and the third weight matrix of the t-th time step is a matrix of size H2 × W2, so the operation unit 211 splices the two along the width direction of the matrices to generate the first splicing weight matrix, which is a matrix of size H2 × 2W2.
For example, as shown in fig. 5, the first weight matrix W_xz of the t-th time step and the third weight matrix W_xr of the t-th time step are both matrices of size 2 × 4. The first row of data of the first weight matrix W_xz includes 3, 2, 5, 4, and the second row includes 4, 1. The first row of data of the third weight matrix W_xr includes 3, 2, 0, 2, and the second row includes 2, 1. The operation unit 211 splices the first weight matrix W_xz of the t-th time step and the third weight matrix W_xr along the width direction of the matrices to generate the first splicing weight matrix W_xzr, a matrix of size 2 × 8. As shown in fig. 5, the first row of data of the first splicing weight matrix W_xzr includes 3, 2, 5, 4, 3, 2, 0, 2, and the second row includes 4, 1, 2, 1.
S1003: the operation unit 211 concatenates the second weight matrix at the t-th time step and the fourth weight matrix at the t-th time step along the width direction of the matrices, and generates a concatenated second concatenation weight matrix.
In some embodiments, the second weight matrix W_hz of the t-th time step is a matrix of size H3 × W3, where H3 represents both the number of rows and the height of the matrix W_hz, and W3 represents both the number of columns and the width of the matrix W_hz. The fourth weight matrix W_hr of the t-th time step is a matrix of size H3 × W3, where H3 represents both the number of rows and the height of the matrix W_hr, and W3 represents both the number of columns and the width of the matrix W_hr.
It can be seen that the width values and the height values of the second weight matrix of the t-th time step and the fourth weight matrix of the t-th time step are respectively equal, while the data in the two matrices differ. The operation unit 211 may splice the second weight matrix of the t-th time step and the fourth weight matrix of the t-th time step along the width direction of the matrices to generate the second splicing weight matrix.
In some embodiments, the second weight matrix of the t-th time step and the fourth weight matrix of the t-th time step are both matrices of size H3 × W3; therefore, the operation unit 211 may splice them along the width direction of the matrices to generate the second splicing weight matrix, which is a matrix of size H3 × 2W3.
For example, as shown in fig. 6, the second weight matrix W_hz of the t-th time step and the fourth weight matrix W_hr of the t-th time step are both matrices of size 4 × 4. The first row of data of the second weight matrix W_hz includes 3, 2, 1, 0; the second row includes 2, 1, 0, 3; the third row includes 1, 0, 1, 2; and the fourth row includes 1, 3, 2, 1. The first row of data of the fourth weight matrix W_hr includes 0, 2, 1, 0; the second row includes 2, 1, 0; the third row includes 1, 0, 1, 2; and the fourth row includes 1, 0, 2, 1. The operation unit 211 splices the second weight matrix W_hz of the t-th time step and the fourth weight matrix W_hr along the width direction of the matrices to generate the second splicing weight matrix W_hzr, a matrix of size 4 × 8. The first row of data of the second splicing weight matrix W_hzr includes 3, 2, 1, 0, 0, 2, 1, 0; the second row includes 2, 1, 0, 3, 2, 1, 0; the third row includes 1, 0, 1, 2, 1, 0, 1, 2; and the fourth row includes 1, 3, 2, 1, 1, 0, 2, 1.
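Both splicing steps S1002 and S1003 amount to a concatenation along the column (width) axis. The following is a hedged numpy sketch with arbitrary illustrative values rather than the fig. 5 and fig. 6 data.

```python
import numpy as np

# Illustrative 2x4 and 4x4 weight matrices (values arbitrary)
W_xz, W_xr = np.arange(8.).reshape(2, 4), np.arange(8., 16.).reshape(2, 4)
W_hz, W_hr = np.arange(16.).reshape(4, 4), np.arange(16., 32.).reshape(4, 4)

W_xzr = np.concatenate([W_xz, W_xr], axis=1)  # S1002: 2x4 | 2x4 -> 2x8
W_hzr = np.concatenate([W_hz, W_hr], axis=1)  # S1003: 4x4 | 4x4 -> 4x8
```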
S1004: the operation unit 211 determines the first operator of the GRU network at the t-th time step according to the linear operator of the reset gate and the linear operator of the update gate of the GRU network at the t-th time step, and operates the first operator of the GRU network at the t-th time step according to the input data at the t-th time step, the first splicing weight matrix, the second splicing weight matrix, and the output data at the t-1 th time step to generate the first output data of the GRU network at the t-th time step.
In some embodiments, the operation unit 211 determines the first operator of the t-th time step from the linear operator of the reset gate and the linear operator of the update gate of the GRU network, and runs the first operator of the t-th time step according to the input data of the t-th time step, the first splicing weight matrix, the second splicing weight matrix, and the output data of the t-1-th time step to generate the first output data of the t-th time step. Specifically, the first output data zr_t' of the t-th time step can be expressed by the following formula (5):

zr_t' = x_t × W_xzr + h_{t-1} × W_hzr    (5)

In formula (5), x_t × W_xzr + h_{t-1} × W_hzr represents the first operator of the GRU network at the t-th time step, x_t represents the input data of the t-th time step, h_{t-1} represents the output data of the t-1-th time step, W_xzr represents the first splicing weight matrix of the t-th time step, W_hzr represents the second splicing weight matrix of the t-th time step, × represents matrix multiplication, and + represents matrix addition.
For example, as shown in fig. 7, the input data x_t of the t-th time step is a matrix of size 1 × 2, in which the first row of data includes 1, 2. The output data h_{t-1} of the t-1-th time step is a matrix of size 1 × 4, in which the first row of data includes 1, 2, 1, 3. The first splicing weight matrix W_xzr is a matrix of size 2 × 8; for its specific data, refer to the description of fig. 5. The second splicing weight matrix W_hzr is a matrix of size 4 × 8; for its specific data, refer to the description of fig. 6.
Specifically, the operation unit 211 runs the first operator of the GRU network at the t-th time step according to the input data x_t of the t-th time step, the first splicing weight matrix W_xzr, the second splicing weight matrix W_hzr, and the output data h_{t-1} of the t-1-th time step to generate the first output data zr_t' of the GRU network at the t-th time step. As shown in fig. 7, the first output data zr_t' of the GRU network at the t-th time step is a matrix of size 1 × 8, and its first row of data includes 22, 17, 15, 17, 15, 8, 10, 9.
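A minimal sketch of this fused linear operator follows, assuming random illustrative weights in place of the fig. 5 and fig. 6 data.

```python
import numpy as np

x_t = np.array([[1., 2.]])             # 1x2 input of the t-th time step
h_prev = np.array([[1., 2., 1., 3.]])  # 1x4 output of the t-1-th time step
W_xzr = np.random.rand(2, 8)           # first splicing weight matrix
W_hzr = np.random.rand(4, 8)           # second splicing weight matrix

# Formula (5): zr_t' = x_t x W_xzr + h_{t-1} x W_hzr, a 1x8 matrix
zr = x_t @ W_xzr + h_prev @ W_hzr
```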
S1005: the operation unit 211 determines an output result of the update gate and an output result of the reset gate in the first output data of the GRU network at the t-th time step according to the width values of the first weight matrix and the third weight matrix, or the width values of the second weight matrix and the fourth weight matrix.
For example, as shown in fig. 8, the widths of the first weight matrix W_xz and the third weight matrix W_xr are both 4, so the operation unit 211 takes the data in columns 1 to 4 of the first output data zr_t' as the output result z_t' of the update gate, and the data in columns 5 to 8 of the first output data as the output result r_t' of the reset gate. As shown in fig. 8, the output result z_t' of the update gate of the GRU network at the t-th time step is a matrix of size 1 × 4 whose first row of data includes 22, 17, 15, 17. The output result r_t' of the reset gate is a matrix of size 1 × 4 whose first row of data includes 15, 8, 10, 9.
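The split of S1005 is simply a column slice at the width W2 of the first weight matrix; a small sketch using the fig. 7 and fig. 8 values:

```python
import numpy as np

zr = np.array([[22., 17., 15., 17., 15., 8., 10., 9.]])  # fig. 7/8 values
W2 = 4                                 # width of the first weight matrix
z_out, r_out = zr[:, :W2], zr[:, W2:]  # S1005: columns 1-4 and 5-8
```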
S1006: the operation unit 211 generates a gate value of the reset gate and a gate value of the update gate by operating the nonlinear operator of the GRU network at the t-th time step according to the output result of the reset gate and the output result of the update gate.
In some embodiments, the operation unit 211 determines the gate value of the update gate of the GRU network at the t-th time step by running the nonlinear operator of the update gate of the GRU network at the t-th time step on the output result of the update gate. Specifically, the gate value z_t of the update gate of the GRU network at the t-th time step can be calculated by the following formula (6):

z_t = σ(z_t')    (6)

In formula (6), σ(z_t') represents the nonlinear operator of the update gate of the GRU network at the t-th time step, and σ() represents a sigmoid function, which can be expressed as σ(x) = 1 / (1 + e^(−x)).
In some embodiments, the operation unit 211 determines the gate value of the reset gate of the GRU network at the t-th time step by running the nonlinear operator of the reset gate of the GRU network at the t-th time step on the output result of the reset gate. Specifically, the gate value r_t of the reset gate of the GRU network at the t-th time step can be calculated by the following formula (7):

r_t = σ(r_t')    (7)

In formula (7), σ(r_t') represents the nonlinear operator of the reset gate of the GRU network at the t-th time step, and σ() represents a sigmoid function, which can be expressed as σ(x) = 1 / (1 + e^(−x)).
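A short sketch of the nonlinear step S1006, applying the sigmoid of formulas (6) and (7) to the fig. 8 outputs:

```python
import numpy as np

def sigmoid(a):
    # sigma(x) = 1 / (1 + e^(-x)), the nonlinear operator in (6) and (7)
    return 1.0 / (1.0 + np.exp(-a))

z_out = np.array([[22., 17., 15., 17.]])  # update-gate output (fig. 8)
r_out = np.array([[15., 8., 10., 9.]])    # reset-gate output (fig. 8)
z_t = sigmoid(z_out)  # formula (6): gate value of the update gate
r_t = sigmoid(r_out)  # formula (7): gate value of the reset gate
```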
S1007: the operation unit 211 determines output data of the t-th time step based on the gate value of the reset gate of the t-th time step, the gate value of the update gate of the t-th time step, the input data of the t-th time step, and the output data of the t-1-th time step.
In some embodiments, the operation unit 211 determines the output data of the t-th time step based on the gate value of the reset gate of the t-th time step, the gate value of the update gate of the t-th time step, the input data of the t-th time step, and the output data of the t-1-th time step. Specifically, the output data of the t-th time step can be calculated according to formula (3) and formula (4); for the specific calculation process, refer to the description of formula (3) and formula (4), which is not repeated here.
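For orientation only, the following hedged sketch uses the conventional GRU candidate-state equations in place of formulas (3) and (4), which appear earlier in this document and may differ in detail; W_xh and W_hh are assumed names for the corresponding weight matrices.

```python
import numpy as np

def output_step(x_t, h_prev, r_t, z_t, W_xh, W_hh):
    # Candidate state and output under the conventional GRU equations; this
    # stands in for the patent's formulas (3) and (4), which may differ.
    h_cand = np.tanh(x_t @ W_xh + (r_t * h_prev) @ W_hh)
    return (1.0 - z_t) * h_prev + z_t * h_cand
```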
As can be seen from the flow of the GRU model 10 operation method in fig. 10, when the operation unit 211 calculates the gate values of the reset gate and the update gate of the GRU network at the t-th time step, it generates the first splicing weight matrix by splicing the first weight matrix and the third weight matrix along the width direction, and generates the second splicing weight matrix by splicing the second weight matrix and the fourth weight matrix along the width direction. The operation unit 211 then performs the matrix operation by running the first operator, using the first splicing weight matrix as the weight parameter of the input data of the t-th time step and the second splicing weight matrix as the weight parameter of the output data of the t-1-th time step, so that the operation unit 211 can determine the gate values of the update gate and the reset gate of the t-th time step by reading data from the storage unit 212 only once. Thus, when the GRU model 10 processes a piece of speech to be recognized, whereas the method of fig. 9 requires the operation unit 211 to read data from the storage unit 212 at least 2n times to recognize the text corresponding to the speech, the operation unit 211 here needs to read data from the storage unit 212 only n times, halving the number of data reads, increasing the speed at which the processor 21 performs speech processing, further increasing the speed of speech recognition by the electronic device 20, shortening the speech recognition time, and improving the user experience.
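Putting the steps of fig. 10 together, a compact end-to-end sketch of the fused gate computation (names illustrative, not the patent's implementation):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def fused_gate_values(x_t, h_prev, W_xz, W_xr, W_hz, W_hr):
    # Splice once (S1002-S1003), run one fused linear operator (S1004),
    # split (S1005), then apply the sigmoid nonlinearity (S1006).
    W_xzr = np.concatenate([W_xz, W_xr], axis=1)
    W_hzr = np.concatenate([W_hz, W_hr], axis=1)
    zr = x_t @ W_xzr + h_prev @ W_hzr      # formula (5)
    W2 = W_xz.shape[1]
    z_out, r_out = zr[:, :W2], zr[:, W2:]
    return sigmoid(z_out), sigmoid(r_out)  # formulas (6) and (7)
```

A single call touches each operand once, mirroring the single read of the gate inputs from the storage unit 212.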
The embodiments disclosed herein may be implemented in hardware, software, firmware, or a combination of these implementations. Embodiments of the application may be implemented as computer programs or program code executing on programmable systems comprising at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
Program code may be applied to input instructions to perform the functions described herein and generate output information. The output information may be applied to one or more output devices in a known manner. For purposes of this application, a processing system includes any system having a processor such as, for example, a Digital Signal Processor (DSP), a microcontroller, an Application Specific Integrated Circuit (ASIC), or a microprocessor.
The program code may be implemented in a high level procedural or object oriented programming language to communicate with a processing system. The program code can also be implemented in assembly or machine language, if desired. Indeed, the mechanisms described in this application are not limited in scope to any particular programming language. In any case, the language may be a compiled or interpreted language.
In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors. For example, the instructions may be distributed via a network or via other computer-readable media. Thus, a machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including, but not limited to, floppy diskettes, optical disks, compact disc read-only memories (CD-ROMs), magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), erasable programmable read-only memories (EPROMs), electrically erasable programmable read-only memories (EEPROMs), magnetic or optical cards, flash memory, or a tangible machine-readable memory used to transmit information over the Internet in an electrical, optical, acoustical, or other form of propagated signal (e.g., carrier waves, infrared signals, digital signals, etc.). Thus, a machine-readable medium includes any type of machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (e.g., a computer).
In the drawings, some features of the structures or methods may be shown in a particular arrangement and/or order. However, it is to be understood that such specific arrangement and/or ordering may not be required. Rather, in some embodiments, the features may be arranged in a manner and/or order different from that shown in the illustrative figures. In addition, the inclusion of a structural or methodical feature in a particular figure is not meant to imply that such feature is required in all embodiments, and in some embodiments, may not be included or may be combined with other features.
It should be noted that, in the apparatus embodiments of the present application, each unit/module is a logical unit/module. Physically, one logical unit/module may be one physical unit/module, may be part of one physical unit/module, or may be implemented by a combination of multiple physical units/modules; the physical implementation of the logical unit/module itself is not what matters most, and the combination of functions implemented by these logical units/modules is the key to solving the technical problem addressed by the present application. Furthermore, in order to highlight the innovative part of the present application, the above apparatus embodiments do not introduce units/modules that are less closely related to solving the technical problem addressed by the present application, which does not mean that no other units/modules exist in the above apparatus embodiments.
It is noted that, in the examples and description of this patent, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
While the present application has been shown and described with reference to certain preferred embodiments thereof, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present application.
Claims (11)
1. A neural network model operation method is applied to electronic equipment and is characterized in that the neural network model comprises a first operation and a second operation;
and the method comprises:
acquiring a data matrix to be operated of the first operation or the second operation, wherein the number of operation factors in the first operation and the second operation is the same, and for each operation factor in the first operation, corresponding operation factors with the same data matrix to be operated and different operation coefficient matrixes exist in the second operation;
performing a third operation on the data matrix to be operated to generate a result matrix of the third operation, wherein the third operation is an operation mode obtained by combining operation coefficient matrixes of corresponding operation factors in the first operation and the second operation;
and splitting the result matrix of the third operation to respectively obtain a result matrix of the first operation and a result matrix of the second operation.
2. The method according to claim 1, wherein the operation coefficient matrixes of the first operation and the second operation comprise two data dimensions of height and width, and the operation coefficient matrixes of the operation factors corresponding to the first operation and the second operation have the same height and width;
the third operation is an operation mode obtained after the operation coefficient matrixes of the corresponding operation factors in the first operation and the second operation are combined along any data dimension direction.
3. The method according to claim 2, wherein the third operation is an operation mode obtained by combining operation coefficient matrixes of corresponding operation factors in the first operation and the second operation in a width direction.
4. The method of claim 1, wherein the splitting the result matrix of the third operation to obtain the result matrix of the first operation and the result matrix of the second operation respectively comprises:
the result matrix of the third operation comprises two data dimensions of height and width;
and splitting the result matrix of the third operation along any data dimension direction to respectively obtain a result matrix of the first operation and a result matrix of the second operation.
5. The method of claim 4, wherein the result matrix of the third operation is split along the width direction to obtain a result matrix of the first operation and a result matrix of the second operation.
6. The method according to claim 1, wherein the third operation is an operation obtained by combining operation coefficient matrices of corresponding operation factors in the first operation and the second operation, and comprises:
the data matrix to be operated comprises a first input data matrix and a second input data matrix, and the operation factors of the first operation and the second operation comprise the matrix product of the first input data matrix of the first operation and the second operation and the corresponding operation coefficient matrix, and the matrix product of the second input data matrix of the first operation and the second operation and the corresponding operation coefficient matrix;
the third operation is an operation mode obtained by combining the operation coefficient matrix corresponding to the first input data matrix of the first operation and the second input data matrix of the second operation, and combining the operation coefficient matrix corresponding to the second input data matrix of the first operation and the second operation.
7. The method of claim 1, wherein the neural network model is a recurrent neural network model, and the first and second operations are operations of a fully-connected layer of the recurrent neural network model.
8. The method of claim 7, wherein the neural network model comprises at least one of: a gated recurrent unit model and a long short-term memory model.
9. An electronic device is characterized by comprising an operation unit and a storage unit, wherein the operation unit operates a first matrix operation circuit;
the operation unit acquires a data matrix to be operated of the first operation or the second operation from the storage unit, wherein the number of operation factors in the first operation and the second operation is the same, and for each operation factor in the first operation, corresponding operation factors with the same data matrix to be operated and different operation coefficient matrixes exist in the second operation;
the operation unit performs a third operation on the data matrix to be operated by operating a first matrix operation circuit to generate a result matrix of the third operation, wherein the third operation is an operation mode obtained by combining operation coefficient matrixes of corresponding operation factors in the first operation and the second operation, and the first matrix operation circuit is used for generating the first operation result matrix and the second operation result matrix;
and the operation unit splits the result matrix of the third operation to respectively obtain a result matrix of the first operation and a result matrix of the second operation.
10. An electronic device, comprising:
a memory for storing instructions for execution by one or more processors of the electronic device, an
A processor, one of the one or more processors of the electronic device, configured to perform the neural network model operation method of any one of claims 1 to 8.
11. A readable medium, characterized in that the readable medium of an electronic device has stored thereon instructions that, when executed, cause the electronic device to perform the neural network model operation method of any one of claims 1 to 8.