CN114676832A

CN114676832A - Neural network model operation method, medium, and electronic device

Info

Publication number: CN114676832A
Application number: CN202210330475.1A
Authority: CN
Inventors: 章小龙
Original assignee: ARM Technology China Co Ltd
Current assignee: ARM Technology China Co Ltd
Priority date: 2022-03-30
Filing date: 2022-03-30
Publication date: 2022-06-28

Abstract

The present application relates to the field of machine learning technologies, and in particular, to a neural network model operation method, medium, and electronic device. In the neural network model operation method, the weight matrices of operation terms that have the same operation form and the same input data are concatenated. Based on the concatenated weight matrices, the operation unit corresponding to that form of operation term reads the input data from the storage unit once and obtains the operation result of each operation term. This increases the speed at which the electronic device runs the neural network model.

Description

Neural network model operation method, medium, and electronic device

Technical Field

The present application relates to the field of machine learning technologies, and in particular, to a neural network model operation method, medium, and electronic device.

Background

With the rapid development of Artificial Intelligence (AI) technology, neural networks (e.g., deep neural networks and recurrent neural networks) have recently achieved excellent results in fields such as computer vision, speech, natural language processing, and reinforcement learning. As neural network algorithms develop, they become increasingly complex, and model scale keeps growing to improve recognition accuracy. Accordingly, devices running neural network models consume more and more power and computing resources. For edge devices with limited computing resources in particular, it is therefore important to increase the operating speed of neural network models, shorten operation time, and reduce power consumption.

Disclosure of Invention

The present application aims to provide a neural network model operation method, a medium, and an electronic device. With the operation method of the present application, the weight matrices (W^xz and W^xr, and W^hz and W^hr) of operation terms that have the same operation form and the same input data are concatenated. For example, x_t × W^xz + h_{t-1} × W^hz in formula (1) below and x_t × W^xr + h_{t-1} × W^hr in formula (2) below have the same operation form, and both take x_t and h_{t-1} as input data. Based on the concatenated weight matrices, the operation unit corresponding to that form of operation term reads the input data from the storage unit once and obtains the operation result of each operation term. This increases the speed at which the electronic device runs the neural network model.

A first aspect of the present application provides a neural network model operation method applied to an electronic device, where the neural network model includes a first operation and a second operation. The method includes: acquiring a to-be-operated data matrix of the first operation or the second operation, where the first operation and the second operation contain the same number of operation factors, and for each operation factor in the first operation, the second operation contains a corresponding operation factor with the same to-be-operated data matrix and a different operation coefficient matrix; performing a third operation on the to-be-operated data matrix to generate a result matrix of the third operation, where the third operation is obtained by combining the operation coefficient matrices of the corresponding operation factors in the first operation and the second operation; and splitting the result matrix of the third operation to obtain the result matrix of the first operation and the result matrix of the second operation, respectively.

For example, when the neural network model is a gated recurrent unit (GRU) model, the first operation may be x_t × W^xz + h_{t-1} × W^hz in formula (1) below for the GRU network at the t-th time step, and the second operation may be x_t × W^xr + h_{t-1} × W^hr in formula (2) below. The arithmetic unit of the processor reads, in a single access to the storage unit of the processor, the input data x_t of the t-th time step, the weight coefficient matrices W^xz, W^hz, W^xr, and W^hr, and the output data h_{t-1} of the (t-1)-th time step. It concatenates W^xz and W^xr into a concatenated weight matrix W^xzr, and W^hz and W^hr into a concatenated weight matrix W^hzr. By running the matrix operation logic X × H1 + Y × H2 with W^xzr as H1, W^hzr as H2, x_t as X, and h_{t-1} as Y, the arithmetic unit obtains the results of both x_t × W^xz + h_{t-1} × W^hz in formula (1) and x_t × W^xr + h_{t-1} × W^hr in formula (2) in one operation. It then splits the combined result according to the dimensions of W^xz and W^xr to obtain the result of x_t × W^xz + h_{t-1} × W^hz and the result of x_t × W^xr + h_{t-1} × W^hr separately. Because the result of each operation term is obtained from a single read of the input data from the storage unit, the speed at which the electronic device runs the neural network model is increased.
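The splice, compute-once, and split flow described above can be sketched with plain Python lists. This is a minimal illustration under stated assumptions, not the application's implementation: matrices are row-major lists of rows, x_t and h_{t-1} are single-row matrices, and the helper names (`matmul`, `hconcat`, `x_h1_plus_y_h2`, `fused_gate_terms`) and toy values are hypothetical.

```python
def matmul(a, b):
    """Matrix product of an m x k matrix and a k x n matrix (row-major lists)."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matadd(a, b):
    """Element-wise sum of two matrices with the same shape."""
    return [[u + v for u, v in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def hconcat(a, b):
    """Splice two matrices of equal height along the width direction."""
    return [row_a + row_b for row_a, row_b in zip(a, b)]


def x_h1_plus_y_h2(x, h1, y, h2):
    """The X*H1 + Y*H2 matrix operation logic, run as one operation."""
    return matadd(matmul(x, h1), matmul(y, h2))


def fused_gate_terms(x_t, h_prev, w_xz, w_hz, w_xr, w_hr):
    """Compute the operation terms of formulas (1) and (2) in a single pass."""
    w_xzr = hconcat(w_xz, w_xr)  # W^xzr = [W^xz | W^xr]
    w_hzr = hconcat(w_hz, w_hr)  # W^hzr = [W^hz | W^hr]
    fused = x_h1_plus_y_h2(x_t, w_xzr, h_prev, w_hzr)
    k = len(w_xz[0])             # split point: the width of W^xz
    return [row[:k] for row in fused], [row[k:] for row in fused]


# Toy data: 1 x 2 input x_t, 1 x 2 previous output h_{t-1}, 2 x 2 weights.
x_t, h_prev = [[1.0, 2.0]], [[0.5, -1.0]]
w_xz, w_hz = [[1, 0], [0, 1]], [[1, 1], [0, 1]]
w_xr, w_hr = [[2, 1], [1, 2]], [[0, 1], [1, 0]]

z_term, r_term = fused_gate_terms(x_t, h_prev, w_xz, w_hz, w_xr, w_hr)
# The fused result matches two separate runs of X*H1 + Y*H2.
assert z_term == x_h1_plus_y_h2(x_t, w_xz, h_prev, w_hz)
assert r_term == x_h1_plus_y_h2(x_t, w_xr, h_prev, w_hr)
print(z_term, r_term)  # [[1.5, 1.5]] [[3.0, 5.5]]
```

Splicing along the width works because each column of X × H1 + Y × H2 depends only on the matching columns of H1 and H2, so the columns contributed by W^xz and W^xr (and by W^hz and W^hr) never mix and can be separated afterwards.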

In one possible implementation of the first aspect, the operation coefficient matrices of the first operation and the second operation have two data dimensions, height and width, and the operation coefficient matrices of corresponding operation factors in the first operation and the second operation have equal heights and equal widths. The third operation is obtained by combining the operation coefficient matrices of the corresponding operation factors in the first operation and the second operation along either data dimension.

In one possible implementation of the first aspect, the third operation is obtained by combining the operation coefficient matrices of the corresponding operation factors in the first operation and the second operation along the width direction.

In one possible implementation of the first aspect, splitting the result matrix of the third operation to obtain the result matrix of the first operation and the result matrix of the second operation includes: the result matrix of the third operation has two data dimensions, height and width, and is split along either data dimension to obtain the result matrix of the first operation and the result matrix of the second operation, respectively.

In one possible implementation of the first aspect, the result matrix of the third operation is split along the width direction to obtain the result matrix of the first operation and the result matrix of the second operation, respectively.
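Splitting along the width amounts to cutting every row of the result matrix at the cumulative widths of the original coefficient matrices. The sketch below assumes row-major lists of rows; `split_width` is a hypothetical helper, and its `widths` list generalizes the two-operation case to any number of spliced operations.

```python
from itertools import accumulate


def split_width(result, widths):
    """Split a result matrix along the width into consecutive column blocks.

    widths lists the width of each original operation's result, in splice order.
    """
    if sum(widths) != len(result[0]):
        raise ValueError("widths must add up to the width of the result matrix")
    bounds = [0, *accumulate(widths)]  # e.g. [2, 2] -> [0, 2, 4]
    return [[row[lo:hi] for row in result] for lo, hi in zip(bounds, bounds[1:])]


# A two-row result of a third operation that spliced two operations,
# each of which produces a result two columns wide.
third_result = [[1.5, 1.5, 3.0, 5.5],
                [0.0, 1.0, 2.0, 3.0]]
first, second = split_width(third_result, [2, 2])
print(first)   # [[1.5, 1.5], [0.0, 1.0]]
print(second)  # [[3.0, 5.5], [2.0, 3.0]]
```

Each row is cut at the same boundaries, so the split works for any number of rows (for example, several input vectors processed together).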

In one possible implementation of the first aspect, the third operation being obtained by combining the operation coefficient matrices of corresponding operation factors in the first operation and the second operation includes: the to-be-operated data matrix includes a first input data matrix and a second input data matrix, and the operation factors of each of the first operation and the second operation include the matrix product of the first input data matrix and its corresponding operation coefficient matrix, and the matrix product of the second input data matrix and its corresponding operation coefficient matrix.

The third operation is obtained by combining the operation coefficient matrices that the first operation and the second operation apply to the first input data matrix, and combining the operation coefficient matrices that the first operation and the second operation apply to the second input data matrix.

In one possible implementation of the first aspect, the neural network model is a recurrent neural network model, and the first operation and the second operation are operations of a fully connected layer of the recurrent neural network model.

In one possible implementation of the first aspect, the neural network model includes at least one of: a gated recurrent unit (GRU) model and a long short-term memory (LSTM) model.

A second aspect of the present application provides an electronic device, including an operation unit and a storage unit, where the operation unit runs a first matrix operation circuit. The operation unit acquires, from the storage unit, a to-be-operated data matrix of a first operation or a second operation, where the first operation and the second operation contain the same number of operation factors, and for each operation factor in the first operation, the second operation contains a corresponding operation factor with the same to-be-operated data matrix and a different operation coefficient matrix. By running the first matrix operation circuit, the operation unit performs a third operation on the to-be-operated data matrix to generate a result matrix of the third operation, where the third operation is obtained by combining the operation coefficient matrices of the corresponding operation factors in the first operation and the second operation, and the first matrix operation circuit is used to generate the first operation result matrix and the second operation result matrix. The operation unit splits the result matrix of the third operation to obtain the result matrix of the first operation and the result matrix of the second operation, respectively.

For example, suppose the arithmetic logic unit of the operation unit that implements the matrix operation X × H1 + Y × H2 (i.e., the first matrix operation circuit) can process, in one pass, more data than the combined amount needed to generate the output result of the update gate and the output result of the reset gate at the t-th time step. The operation unit may then read from the storage unit, at one time, the input data x_t of the t-th time step, the first weight matrix W^xz, the second weight matrix W^hz, the third weight matrix W^xr, the fourth weight matrix W^hr, and the output data h_{t-1} of the (t-1)-th time step. It concatenates the first weight matrix W^xz and the third weight matrix W^xr along the width direction to generate a first concatenated weight matrix, and concatenates the second weight matrix W^hz and the fourth weight matrix W^hr along the width direction to generate a second concatenated weight matrix. The operation unit inputs the first concatenated weight matrix as H1 and the second concatenated weight matrix as H2 into the arithmetic logic unit for X × H1 + Y × H2, so that, with a single read from the storage unit, it determines the output result z_t' of the update gate and the output result r_t' of the reset gate at the t-th time step. This reduces the number of times the operation unit reads data from the storage unit. When the GRU model processes a segment of speech to be recognized, the operation unit needs only n reads from the storage unit (instead of 2n) to generate the text corresponding to the speech, which speeds up speech processing by the processor and speech recognition on the electronic device, shortens recognition time, and improves the user experience.
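The n-versus-2n read count can be tallied with a toy model. The sketch is hypothetical: it assumes one storage-unit read per run of the X × H1 + Y × H2 logic, reduces the arithmetic logic unit's capacity to a single count of output elements, and falls back to two separate runs per time step when the spliced operation does not fit. `storage_reads` and its parameters are illustrative names only.

```python
def storage_reads(n_steps, rows, gate_width, alu_capacity):
    """Count storage-unit reads needed for z_t' and r_t' over n_steps time steps.

    Hypothetical model: each run of X*H1 + Y*H2 needs one read, and the spliced
    run (rows x 2*gate_width outputs) is used only if it fits the ALU capacity.
    """
    fused_outputs = rows * 2 * gate_width
    reads_per_step = 1 if fused_outputs <= alu_capacity else 2
    return n_steps * reads_per_step


print(storage_reads(n_steps=100, rows=1, gate_width=64, alu_capacity=256))  # 100
print(storage_reads(n_steps=100, rows=1, gate_width=64, alu_capacity=100))  # 200
```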

A third aspect of the present application provides an electronic device, including: a memory configured to store instructions to be executed by one or more processors of the electronic device; and the one or more processors, configured to execute the instructions in the memory to perform the neural network model operation method of the first aspect.

A fourth aspect of the present application provides a computer-readable storage medium having stored thereon instructions that, when executed on an electronic device, cause the electronic device to perform the neural network model operation method of the first aspect.

A fifth aspect of the present application provides a computer program product comprising instructions for implementing the neural network model operation method of the first aspect.

Drawings

FIG. 1A illustrates an application diagram of a GRU model, according to some embodiments of the present application;

FIG. 1B illustrates a structural schematic diagram of a GRU model, according to some embodiments of the present application;

FIG. 1C illustrates a schematic structural diagram of the GRU network at the t-th time step in FIG. 1B, according to some embodiments of the present application;

FIG. 2 illustrates a schematic structural diagram of an electronic device, according to some embodiments of the present application;

FIG. 3 illustrates a schematic diagram of generating the output result of the update gate at the t-th time step, according to some embodiments of the present application;

FIG. 4 illustrates a schematic diagram of generating the output result of the reset gate at the t-th time step, according to some embodiments of the present application;

FIG. 5 illustrates a schematic diagram of concatenating the first weight matrix and the third weight matrix at the t-th time step, according to some embodiments of the present application;

FIG. 6 illustrates a schematic diagram of concatenating the second weight matrix and the fourth weight matrix at the t-th time step, according to some embodiments of the present application;

FIG. 7 illustrates a schematic diagram of the first output data at the t-th time step, according to some embodiments of the present application;

FIG. 8 illustrates a schematic diagram of splitting the first output data at the t-th time step into the output result of the update gate and the output result of the reset gate, according to some embodiments of the present application;

FIG. 9 illustrates a flow diagram of a GRU model operation method, according to some embodiments of the present application;

FIG. 10 illustrates a flow diagram of another GRU model operation method, according to some embodiments of the present application.

Detailed Description

Illustrative embodiments of the present application include, but are not limited to, a neural network model operation method, apparatus, electronic device, medium, and computer program product. Embodiments of the present application will be described in further detail below with reference to the accompanying drawings.

Since the present application involves the Gated Recurrent Unit (GRU) model, the GRU model involved in the embodiments of the present application is first described in detail below to illustrate the solutions of the embodiments more clearly.

(1) Gated Recurrent Unit (GRU) model

The GRU model is a type of Recurrent Neural Network (RNN) model. Like the Long Short-Term Memory (LSTM) model, it was proposed to address problems with long-term memory and with gradients in backpropagation.

Fig. 1A shows an application diagram of a GRU model 10. As shown in fig. 1A, the input data of the GRU model 10 is the speech to be recognized, and the output data of the GRU model 10 is the text corresponding to the speech to be recognized.

In other embodiments, the GRU model 10 may also be used to translate text between languages; for example, the input data of the GRU model 10 may be Chinese text and the output data may be the corresponding English text. The GRU model 10 may also be used for image classification; for example, the input data may be multiple frames of images and the output data may be the image type of each frame. It can be understood that the GRU model 10 is mainly used to process and predict sequence data; what the GRU model 10 recognizes depends on the actual application and is not specifically limited in the present application.

FIG. 1B shows a schematic structural diagram of the GRU model 10. As shown in FIG. 1B, the GRU model 10 includes n GRU networks: GRU1, GRU2, …, GRUt-1, GRUt, …, GRUn. GRU1 denotes the GRU network at the 1st time step, GRU2 the GRU network at the 2nd time step, GRUt-1 the GRU network at the (t-1)-th time step, GRUt the GRU network at the t-th time step, and GRUn the GRU network at the n-th time step.

For example, as shown in FIG. 1B, {x_1, x_2, …, x_{t-1}, x_t, …, x_n} is the speech data to be recognized, where x_1 is the speech input data of the GRU network at the 1st time step (GRU1), x_2 is the speech input data of the GRU network at the 2nd time step (GRU2), …, x_t is the input data of the GRU network at the t-th time step (GRUt), …, and x_n is the speech input data of the GRU network at the n-th time step (GRUn). The output data {h_1, h_2, …, h_{t-1}, h_t, …, h_n} of the GRU model 10 may be the text corresponding to the speech to be recognized, where h_1 is the output data of GRU1, h_2 is the output data of GRU2, …, h_t is the output data of GRUt, …, and h_n is the output data of GRUn.

It can be understood that the input data and output data of the GRU network at each time step may be matrices, tensors, vectors, and so on. For ease of description, the following embodiments take matrices as the input and output data of the GRU network when describing the data processing involved in the input and output of each layer of the neural network.

It can be understood that, since the GRU model 10 is mainly used to process and predict sequence data, the GRU network at the current time step processes the input data of the current time step together with the output data of the previous time step to generate the output data of the current time step.

Fig. 1C shows a schematic structure diagram of a GRU network at the t-th time step in fig. 1B. Depending on the gate structure, the GRU network at the t-th time step can be divided into the following four phases:

1. Update phase

The update phase controls the extent to which the state information output by the GRU network at the previous time step is carried into the state of the GRU network at the current time step. A larger update gate value indicates that more of the previous time step's state information is carried into the current state. Specifically, in the GRU network at the t-th time step, the update phase computes the update gate z_t from the output data h_{t-1} of the GRU network at the (t-1)-th time step and the input data x_t of the GRU network at the t-th time step; z_t describes the extent to which that state information is carried into the current state h_t.

Illustratively, the update gate z_t can be calculated by the following formula (1):

$$z_t = \sigma\left(x_t \times W^{xz} + h_{t-1} \times W^{hz}\right) \qquad (1)$$

where σ denotes the sigmoid function, which maps the result of x_t × W^xz + h_{t-1} × W^hz into the interval (0, 1); W^xz denotes the weight coefficient matrix for the input data x_t of the GRU network at the t-th time step, and W^hz denotes the weight coefficient matrix for the output data h_{t-1} of the GRU network at the (t-1)-th time step. × denotes matrix multiplication, and + denotes element-wise matrix addition.

That is, the result of multiplying the input data x_t of the t-th time step by the weight matrix W^xz is added to the result of multiplying the output data h_{t-1} of the (t-1)-th time step by the weight matrix W^hz, giving x_t × W^xz + h_{t-1} × W^hz. This sum can be understood as the vector concatenation operation t11 in FIG. 1C, because multiplying the concatenated vector [x_t, h_{t-1}] by W^xz and W^hz stacked along the height yields the same sum. The update gate z_t can be understood as the output of the σ layer t12 in FIG. 1C.
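Formula (1) can be evaluated directly as a small sketch. The helper names and the toy shapes (a 1 × 2 input, a 1 × 2 previous output, 2 × 2 weights) are assumptions for illustration; the sigmoid is written in a two-branch, numerically stable form, which is a common implementation choice rather than something the application specifies.

```python
import math


def matmul(a, b):
    """Matrix product of an m x k matrix and a k x n matrix (row-major lists)."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def sigmoid(v):
    """Logistic sigmoid mapping any real number into (0, 1), without overflow."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def update_gate(x_t, h_prev, w_xz, w_hz):
    """Formula (1): z_t = sigmoid(x_t*W^xz + h_{t-1}*W^hz), applied element-wise."""
    xw = matmul(x_t, w_xz)
    hw = matmul(h_prev, w_hz)
    return [[sigmoid(a + b) for a, b in zip(row_x, row_h)]
            for row_x, row_h in zip(xw, hw)]


z_t = update_gate([[1.0, 2.0]], [[0.5, -1.0]], [[1, 0], [0, 1]], [[1, 1], [0, 1]])
print(z_t)  # both pre-activations are 1.5, so both entries equal sigmoid(1.5)
```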

2. Reset phase

The reset phase controls how much of the state information output by the GRU network at the previous time step is written into the state of the GRU network at the current time step; the smaller the reset gate, the less of that state information is written. Specifically, in the GRU network at the t-th time step, the reset phase computes the reset gate r_t from the output data h_{t-1} of the GRU network at the (t-1)-th time step and the input data x_t of the GRU network at the t-th time step; r_t describes the extent to which the state information output at the previous time step is written into the state of the GRU network at the current time step.

Illustratively, the reset gate r_t can be calculated by the following formula (2):

$$r_t = \sigma\left(x_t \times W^{xr} + h_{t-1} \times W^{hr}\right) \qquad (2)$$

where σ denotes the sigmoid function, which maps the result of x_t × W^xr + h_{t-1} × W^hr into the interval (0, 1); W^xr denotes the weight coefficient matrix for the input data x_t of the GRU network at the t-th time step, and W^hr denotes the weight coefficient matrix for the output data h_{t-1} of the GRU network at the (t-1)-th time step. × denotes matrix multiplication, and + denotes element-wise matrix addition.

That is, the result of multiplying the input data x_t of the t-th time step by the weight coefficient matrix W^xr is added to the result of multiplying the output data h_{t-1} of the (t-1)-th time step by the weight coefficient matrix W^hr, giving x_t × W^xr + h_{t-1} × W^hr, which can be understood as the vector concatenation operation t13 in FIG. 1C. The reset gate r_t can be understood as the output of the σ layer t14 in FIG. 1C.
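The concatenation view in FIG. 1C can be checked numerically. Multiplying the row vector [x_t, h_{t-1}] by W^xr and W^hr stacked along the height gives the same pre-activation as the sum in formula (2). The toy values and helper names below are hypothetical; in general the two forms agree up to floating-point rounding, and with these values they agree exactly.

```python
def matmul(a, b):
    """Matrix product of an m x k matrix and a k x n matrix (row-major lists)."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def hconcat(a, b):
    """Place two matrices of equal height side by side (width direction)."""
    return [row_a + row_b for row_a, row_b in zip(a, b)]


def vconcat(a, b):
    """Stack two matrices of equal width on top of each other (height direction)."""
    return a + b


x_t, h_prev = [[1.0, 2.0]], [[0.5, -1.0]]
w_xr, w_hr = [[2, 1], [1, 2]], [[0, 1], [1, 0]]

# Formula (2) pre-activation as written: a sum of two matrix products.
sum_form = [[u + v for u, v in zip(ra, rb)]
            for ra, rb in zip(matmul(x_t, w_xr), matmul(h_prev, w_hr))]
# Concatenation view (t13): [x_t | h_{t-1}] times [W^xr ; W^hr].
concat_form = matmul(hconcat(x_t, h_prev), vconcat(w_xr, w_hr))
print(sum_form, concat_form)  # [[3.0, 5.5]] [[3.0, 5.5]]
```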

3. Update memory phase

The update memory phase updates and memorizes the input data of the GRU network at the current time step, that is, it selectively retains the important information. Specifically, in the GRU network at the t-th time step, after the reset gate r_t has been computed from the output data h_{t-1} of the GRU network at the (t-1)-th time step and the input data x_t of the GRU network at the t-th time step, the update memory phase uses r_t to compute the intermediate output result c_t of the t-th time step.

Exemplarily, the intermediate output result c_t of the t-th time step can be calculated by the following formula (3):

$$c_t = \tanh\left(x_t \times W^{x} + (h_{t-1} \cdot r_t) \times W^{h}\right) \qquad (3)$$

where tanh denotes the hyperbolic tangent function, which maps the result of x_t × W^x + (h_{t-1} · r_t) × W^h into the interval (-1, 1); W^x denotes the weight coefficient matrix for the input data x_t of the GRU network at the t-th time step, and W^h denotes the weight coefficient matrix for the element-wise product of the output h_{t-1} of the GRU network at the (t-1)-th time step and the reset gate value r_t. · denotes element-wise multiplication (entries at the same row and column position in the two matrices are multiplied), × denotes matrix multiplication, and + denotes element-wise matrix addition.

That is, the output h_{t-1} of the GRU network at the (t-1)-th time step is multiplied element-wise by the reset gate value r_t, giving h_{t-1} · r_t, which can be understood as t15 in FIG. 1C. The result of multiplying the input data x_t of the t-th time step by the weight coefficient matrix W^x is added to the result of multiplying h_{t-1} · r_t by the weight coefficient matrix W^h, giving x_t × W^x + (h_{t-1} · r_t) × W^h, which can be understood as the vector concatenation operation t16 in FIG. 1C. The intermediate output result c_t of the t-th time step can be understood as the output of the tanh layer t17 in FIG. 1C.
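Formula (3) can be sketched directly, assuming row-major lists of rows; `hadamard` and `candidate_state` are hypothetical helper names, and the toy values are illustrative. The two calls at the end show the effect of a fully closed and a fully open reset gate.

```python
import math


def matmul(a, b):
    """Matrix product of an m x k matrix and a k x n matrix (row-major lists)."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def hadamard(a, b):
    """Element-wise product of two matrices with the same shape."""
    return [[u * v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def candidate_state(x_t, h_prev, r_t, w_x, w_h):
    """Formula (3): c_t = tanh(x_t*W^x + (h_{t-1} . r_t)*W^h)."""
    gated = hadamard(h_prev, r_t)  # h_{t-1} . r_t, operation t15
    xw = matmul(x_t, w_x)
    hw = matmul(gated, w_h)
    # Sum of the two products (t16), then the tanh layer (t17).
    return [[math.tanh(a + b) for a, b in zip(ra, rb)]
            for ra, rb in zip(xw, hw)]


x_t, h_prev = [[1.0, 2.0]], [[0.5, -1.0]]
w_x, w_h = [[1, 0], [0, 1]], [[1, 1], [0, 1]]
# A fully closed reset gate (r_t = 0) discards h_{t-1} entirely.
c_closed = candidate_state(x_t, h_prev, [[0.0, 0.0]], w_x, w_h)
# A fully open reset gate (r_t = 1) passes h_{t-1} through unchanged.
c_open = candidate_state(x_t, h_prev, [[1.0, 1.0]], w_x, w_h)
print(c_closed, c_open)
```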

4. Output stage

The output stage determines and outputs the output and state of the GRU network at the t-th time step. Specifically, the output stage computes the output data h_t of the t-th time step from the output data h_{t-1} of the GRU network at the (t-1)-th time step, the update gate z_t, and the intermediate output result c_t of the t-th time step.

Exemplarily, the output data h_t of the t-th time step can be calculated by the following formula (4):

$$h_t = (1 - z_t) \cdot c_t + z_t \cdot h_{t-1} \qquad (4)$$

where z_t denotes the update gate, h_{t-1} denotes the output of the GRU network at the (t-1)-th time step, c_t denotes the intermediate output result of the t-th time step, · denotes element-wise multiplication, and + denotes vector summation.

That is, the update gate value z_t is subtracted from 1, giving 1 - z_t, which can be understood as the operation t18 in FIG. 1C. The intermediate output result c_t of the t-th time step is multiplied element-wise by 1 - z_t, giving (1 - z_t) · c_t, which can be understood as the operation t19 in FIG. 1C. The output h_{t-1} of the GRU network at the (t-1)-th time step is multiplied element-wise by the update gate z_t, giving z_t · h_{t-1}, which can be understood as the operation t20 in FIG. 1C. Finally, the two element-wise products are summed, giving (1 - z_t) · c_t + z_t · h_{t-1}, which can be understood as the operation t21 in FIG. 1C.
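Putting formulas (1) to (4) together, one GRU time step and the loop over time steps in FIG. 1B can be sketched as follows. This is a minimal sketch, not the application's implementation: it assumes row-major lists, a dictionary `w` of weight matrices keyed by hypothetical names (`"xz"`, `"hz"`, `"xr"`, `"hr"`, `"x"`, `"h"`), and no bias terms, since formulas (1) to (4) contain none.

```python
import math


def matmul(a, b):
    """Matrix product of an m x k matrix and a k x n matrix (row-major lists)."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def sigmoid(v):
    """Numerically stable logistic sigmoid."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def affine(x, w_x, h, w_h):
    """x*W_x + h*W_h, the shared form inside formulas (1) to (3)."""
    return [[u + v for u, v in zip(ra, rb)]
            for ra, rb in zip(matmul(x, w_x), matmul(h, w_h))]


def gru_step(x_t, h_prev, w):
    """One GRU time step following formulas (1) to (4)."""
    z = [[sigmoid(v) for v in row] for row in affine(x_t, w["xz"], h_prev, w["hz"])]
    r = [[sigmoid(v) for v in row] for row in affine(x_t, w["xr"], h_prev, w["hr"])]
    gated = [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(h_prev, r)]
    c = [[math.tanh(v) for v in row] for row in affine(x_t, w["x"], gated, w["h"])]
    # Formula (4): h_t = (1 - z_t) . c_t + z_t . h_{t-1}
    return [[(1 - zi) * ci + zi * hi for zi, ci, hi in zip(rz, rc, rh)]
            for rz, rc, rh in zip(z, c, h_prev)]


def run_gru(xs, h0, w):
    """Run the GRU over a sequence x_1..x_n; each step consumes h_{t-1}."""
    h, outputs = h0, []
    for x_t in xs:
        h = gru_step(x_t, h, w)
        outputs.append(h)
    return outputs


# With all-zero weights: z_t = r_t = 0.5 and c_t = 0, so h_t = 0.5 * h_{t-1}.
zeros = [[0.0, 0.0], [0.0, 0.0]]
weights = {name: zeros for name in ("xz", "hz", "xr", "hr", "x", "h")}
hs = run_gru([[[1.0, 1.0]], [[2.0, -3.0]]], [[1.0, -2.0]], weights)
print(hs)  # [[[0.5, -1.0]], [[0.25, -0.5]]]
```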

It can be understood that the processor of the electronic device provides an arithmetic unit for each operation term in formulas (1) to (4), and the arithmetic unit obtains the result of each operation term by reading that term's input data from the storage unit of the processor. An operation term may be at least part of one of the above formulas, such as x_t × W^xz + h_{t-1} × W^hz in formula (1), x_t × W^xr + h_{t-1} × W^hr in formula (2), or the sigmoid function σ().

When the arithmetic unit of the processor of the electronic device runs the GRU network defined by formulas (1) to (4), the input data of the operation terms in each formula must be read from the storage unit of the processor into the arithmetic logic circuit corresponding to the arithmetic unit, and the result of each operation term is then obtained according to its arithmetic logic. In other words, each time the arithmetic unit runs an operation term, it must read input data from the storage unit once, which increases the number of reads from the storage unit and reduces the running speed of the neural network model.

For example, when the processing unit of the processor runs the matrix operation logic X × H1 + Y × H2 to compute x_t × W^xz + h_{t-1} × W^hz in formula (1), it must first read the input data x_t, the weight coefficient matrices W^xz and W^hz, and the output data h_{t-1} of the (t-1)-th time step from the storage unit of the processor; after reading these data, it runs X × H1 + Y × H2 on them to obtain the result of x_t × W^xz + h_{t-1} × W^hz. Next, to compute x_t × W^xr + h_{t-1} × W^hr in formula (2) with the same matrix operation logic, the processing unit must read the input data x_t, the weight coefficient matrices W^xr and W^hr, and the output data h_{t-1} of the (t-1)-th time step from the storage unit again, and then run X × H1 + Y × H2 to obtain the result of x_t × W^xr + h_{t-1} × W^hr. Thus, to obtain the results of x_t × W^xz + h_{t-1} × W^hz and x_t × W^xr + h_{t-1} × W^hr, the arithmetic unit must read data from the storage unit twice.

To reduce the number of times the arithmetic unit reads data from the storage unit, an embodiment of the present application provides a neural network model operation method in which the weight matrices (W^xz and W^xr, and W^hz and W^hr) of operation terms that have the same operation form and the same input data are concatenated. For example, x_t × W^xz + h_{t-1} × W^hz in formula (1) and x_t × W^xr + h_{t-1} × W^hr in formula (2) have the same operation form, and both take x_t and h_{t-1} as input data. Based on the concatenated weight matrices, the arithmetic unit corresponding to that form of operation term reads the input data from the storage unit once and obtains the result of each operation term, thereby increasing the speed at which the electronic device runs the neural network model.

For example, the arithmetic unit of the processor reads, in a single access to the storage unit of the processor, the input data x_t of the t-th time step, the weight coefficient matrices W^xz, W^hz, W^xr, and W^hr, and the output data h_{t-1} of the (t-1)-th time step. It concatenates W^xz and W^xr into a concatenated weight matrix W^xzr, and W^hz and W^hr into a concatenated weight matrix W^hzr. By running the matrix operation logic X × H1 + Y × H2 with W^xzr as H1, W^hzr as H2, x_t as X, and h_{t-1} as Y, it obtains the results of x_t × W^xz + h_{t-1} × W^hz in formula (1) and x_t × W^xr + h_{t-1} × W^hr in formula (2) in one operation. It then splits the combined result according to the dimensions of W^xz and W^xr to obtain the result of x_t × W^xz + h_{t-1} × W^hz and the result of x_t × W^xr + h_{t-1} × W^hr separately.
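The difference in storage-unit reads between computing the two operation terms separately and computing them with spliced weights can be simulated. In this hypothetical sketch, `StorageUnit` is an invented stand-in that counts one read per `read()` call regardless of how many operands it returns; it models read transactions, not bandwidth or bytes moved. The other helper names and the toy data are likewise illustrative.

```python
def matmul(a, b):
    """Matrix product of an m x k matrix and a k x n matrix (row-major lists)."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matadd(a, b):
    """Element-wise sum of two matrices with the same shape."""
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def hconcat(a, b):
    """Splice two matrices of equal height along the width direction."""
    return [ra + rb for ra, rb in zip(a, b)]


def x_h1_plus_y_h2(x, h1, y, h2):
    """The X*H1 + Y*H2 matrix operation logic."""
    return matadd(matmul(x, h1), matmul(y, h2))


class StorageUnit:
    """Hypothetical storage unit that counts read transactions."""

    def __init__(self, **tensors):
        self._tensors = tensors
        self.reads = 0

    def read(self, *names):
        # One call models one read transaction fetching all named operands.
        self.reads += 1
        return [self._tensors[name] for name in names]


def separate_runs(store):
    """Baseline: one read per operation term (two reads in total)."""
    x, w_xz, w_hz, h = store.read("x_t", "W_xz", "W_hz", "h_prev")
    z_term = x_h1_plus_y_h2(x, w_xz, h, w_hz)
    x, w_xr, w_hr, h = store.read("x_t", "W_xr", "W_hr", "h_prev")
    r_term = x_h1_plus_y_h2(x, w_xr, h, w_hr)
    return z_term, r_term


def spliced_run(store):
    """Spliced weights: a single read feeds one X*H1 + Y*H2 run."""
    x, w_xz, w_hz, w_xr, w_hr, h = store.read(
        "x_t", "W_xz", "W_hz", "W_xr", "W_hr", "h_prev")
    fused = x_h1_plus_y_h2(x, hconcat(w_xz, w_xr), h, hconcat(w_hz, w_hr))
    k = len(w_xz[0])  # split according to the width of W^xz
    return [row[:k] for row in fused], [row[k:] for row in fused]


data = dict(x_t=[[1.0, 2.0]], h_prev=[[0.5, -1.0]],
            W_xz=[[1, 0], [0, 1]], W_hz=[[1, 1], [0, 1]],
            W_xr=[[2, 1], [1, 2]], W_hr=[[0, 1], [1, 0]])
baseline_store, spliced_store = StorageUnit(**data), StorageUnit(**data)
assert separate_runs(baseline_store) == spliced_run(spliced_store)
print(baseline_store.reads, spliced_store.reads)  # 2 1
```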

To facilitate understanding of the technical solutions of the embodiments of the present application, the electronic device 20 that executes the computation of the GRU model 10 is described below.

Fig. 2 illustrates a schematic block diagram of an electronic device 20 according to some embodiments of the present application, and as shown in fig. 2, the electronic device 20 includes a processor 21, a system memory 22, a non-volatile memory 23, an input/output device 24, a communication interface 25, and system control logic 26 for coupling the processor 21, the system memory 22, the non-volatile memory 23, the input/output device 24, and the communication interface 25. Wherein:

The processor 21 may include one or more processing units, for example processing modules or processing circuits such as a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a Digital Signal Processor (DSP), a Microcontroller Unit (MCU), a Field-Programmable Gate Array (FPGA), an Artificial Intelligence Processing Unit (AIPU), or a Neural-network Processing Unit (NPU). The different processing units may be separate devices or may be integrated into one or more processors. In some embodiments, the processor 21 may perform the computation of the neural network model 10.

In some embodiments, the processor 21 may include a control unit 210, an arithmetic unit 211 and a storage unit 212. The control unit 210 schedules the processor 21; in some embodiments, it further includes a Direct Memory Access Controller (DMAC) 2101 that transfers data in the storage unit 212 to other units, for example to the system memory 22.

The operation unit 211 performs specific arithmetic and/or logic operations. In some embodiments, the operation unit 211 may include arithmetic logic units, i.e., combinational logic circuits that can implement multiple sets of arithmetic and logic operations. For example, the operation unit 211 includes arithmetic logic units corresponding to the operators of formulas (1) to (4).

In some embodiments, the arithmetic unit 211 internally includes a plurality of processing elements (PEs). In some implementations, the arithmetic unit 211 is a two-dimensional systolic array; it may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic unit 211 is a general-purpose matrix processor.

For example, assume an input matrix A, a weight matrix B and an output matrix C. The operation unit 211 fetches the data of the input matrix A and the weight matrix B from the storage unit 212 and buffers it on the PEs of the operation circuit, which then performs the matrix operation on A and B to obtain a partial result or the final result of C. Depending on its capacity, the operation unit 211 may, for example, produce a 100 × 100 result matrix in a single pass, or only a 10 × 10 block of the result at a time.
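The block-by-block case can be illustrated with a short Python sketch. It is a hypothetical software analogue, assuming a unit that can hold only a tile × tile block of results at once; `tiled_matmul` and `tile` are illustrative names, not the processor's interface.

```python
# Hypothetical illustration: an operation unit whose PE array holds only a
# tile x tile block of results computes C = A x B one block at a time.

def tiled_matmul(a, b, tile):
    """Compute a (m x k) times b (k x n), filling C block by block."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    for i0 in range(0, m, tile):        # block row of C
        for j0 in range(0, n, tile):    # block column of C
            for i in range(i0, min(i0 + tile, m)):
                for j in range(j0, min(j0 + tile, n)):
                    c[i][j] = sum(a[i][p] * b[p][j] for p in range(k))
    return c
```

With `tile=100`, any result up to 100 × 100 is produced as one block; with `tile=10`, the same C is assembled from 10 × 10 blocks. The final matrix is identical either way; only the number of passes differs.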

In other embodiments, the arithmetic unit 211 may further include a plurality of Application-Specific Integrated Circuits (ASICs) suited to running neural network models, such as a convolution calculation unit and a vector calculation unit. In some embodiments, the operation unit 211 may read the input data of the t-th time step, the first, second, third and fourth weight matrices, and the output data of the (t-1)-th time step from the storage unit 212, and generate the gate value of the reset gate and the gate value of the update gate at the t-th time step through arithmetic operations. The storage unit 212 temporarily stores input and/or output data of the arithmetic unit 211; for example, it may store the input data of the t-th time step, the first to fourth weight matrices, and the output data of the (t-1)-th time step.

It is understood that DMAC 2101 may not be integrated into processor 21 in other embodiments, but may be a separate module coupled to system control logic 26, which is not limited by the embodiments of the present application.

The system memory 22 may include Random-Access Memory (RAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM) and other memory devices for temporarily storing data or instructions of the electronic device 20.

The non-volatile memory 23 may be a tangible, non-transitory computer-readable medium for permanently storing data and/or instructions. It may include any suitable non-volatile memory, such as flash memory, and/or any suitable non-volatile storage device, such as a Hard Disk Drive (HDD), a Compact Disc (CD), a Digital Versatile Disc (DVD) or a Solid-State Drive (SSD). In some embodiments, the non-volatile memory 23 may also be a removable storage medium, such as a Secure Digital (SD) memory card. In some embodiments, the non-volatile memory 23 permanently stores data or instructions of the electronic device 20, for example the instructions of the neural network model 10.

Input/output (I/O) devices 24 may include input devices such as a keyboard, mouse, touch screen, etc. for converting user operations into analog or digital signals and communicating them to processor 21; and output devices such as speakers, printers, displays, etc. for presenting information in the electronic device 20 to the user in the form of sounds, text, images, etc.

The communication interface 25 provides a software/hardware interface through which the electronic device 20 communicates and exchanges data with other electronic devices. For example, the electronic device 20 may obtain data for running the neural network model from other electronic devices through the communication interface 25, and may also transmit the results of running the neural network model to other electronic devices through it.

System control logic 26 may include any suitable interface controllers to provide any suitable interfaces to the other modules of electronic device 20 so that the various modules of electronic device 20 may communicate with one another.

In some embodiments, at least one of the processors 21 may be packaged together with logic for one or more controllers of the system control logic 26 to form a System in Package (SiP). In other embodiments, at least one of the processors 21 may also be integrated on the same chip with logic for one or more controllers of the system control logic 26 to form a System-on-Chip (SoC).

It is understood that the hardware structure of the electronic device 20 shown in fig. 2 is only an example, in other embodiments, the electronic device 20 may also include more or fewer modules, and a part of the modules may also be combined or split, and the embodiment of the present application is not limited.

It is understood that the electronic device 20 may be any electronic device capable of running the GRU model 10, including but not limited to a laptop computer, a desktop computer, a tablet computer, a cell phone, a server, a wearable device, a head-mounted display, a mobile email device, a portable game console, a portable music player, a reader device, a television with one or more processors embedded therein or coupled thereto, and the embodiments of the present application are not limited thereto.

For ease of understanding, the process of the electronic device running the GRU model 10 will be described below in conjunction with the structure of the electronic device 20.

The process by which the electronic device 20 runs each GRU network in the GRU model 10 is similar, so the GRU network of the t-th time step is taken as an example below.

In one method of running a GRU model according to the embodiments of the present application, the operation unit 211 first reads from the storage unit 212 the input data of the t-th time step, the first weight matrix (i.e., the weight coefficient matrix W^xz of formula (1)), the second weight matrix (i.e., the weight coefficient matrix W^hz of formula (1)) and the output data of the (t-1)-th time step. By running the linear operator of the update gate on these data, the operation unit 211 determines the output result of the update gate (i.e., the result of the operator x_t × W^xz + h_{t-1} × W^hz of formula (1)). Then, using the input data x_t of the t-th time step, the third weight matrix (i.e., the weight coefficient matrix W^xr of formula (2)), the fourth weight matrix (i.e., the weight coefficient matrix W^hr of formula (2)) and the output data of the (t-1)-th time step, all read from the storage unit 212 a second time, it generates the output result of the reset gate (i.e., the result of the operator x_t × W^xr + h_{t-1} × W^hr of formula (2)) by running the linear operator of the reset gate.

Take as an example input data x_t of the t-th time step that is a 1 × 2 matrix, output data h_{t-1} of the (t-1)-th time step that is a 1 × 4 matrix, a first weight matrix that is 2 × 4 and a second weight matrix that is 4 × 4. The following describes how the operation unit 211 runs the linear operator of the update gate of the GRU network at the t-th time step (i.e., the operator x_t × W^xz + h_{t-1} × W^hz in formula (1)) to generate the output result z_t' of the update gate at the t-th time step (a 1 × 4 matrix).

For example, as shown in FIG. 3, the input data x_t of the t-th time step is a 1 × 2 matrix whose only row is 1, 2. The first weight matrix W^xz of the t-th time step is a 2 × 4 matrix whose first row is 3, 2, 5, 4 and whose second row is 4, 1, 1, 1. The output data h_{t-1} of the (t-1)-th time step is a 1 × 4 matrix whose only row is 1, 2, 1, 3. The second weight matrix W^hz of the t-th time step is a 4 × 4 matrix whose rows are 3, 2, 1, 0; 2, 1, 0, 3; 1, 0, 1, 2; and 1, 3, 2, 1. The operation unit 211 runs the linear operator of the update gate and generates the output result z_t' of the update gate, a 1 × 4 matrix whose only row is 22, 17, 15, 17.
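The FIG. 3 arithmetic can be checked with a few lines of Python. This is an illustrative sketch only: the second row of W^xz is taken as 4, 1, 1, 1, which is the row that reproduces the stated result 22, 17, 15, 17, and `vecmat` is a hypothetical helper.

```python
# FIG. 3 check: z_t' = x_t x W^xz + h_{t-1} x W^hz for row vectors.
x_t = [1, 2]
h_prev = [1, 2, 1, 3]
w_xz = [[3, 2, 5, 4],
        [4, 1, 1, 1]]  # second row assumed, see lead-in
w_hz = [[3, 2, 1, 0],
        [2, 1, 0, 3],
        [1, 0, 1, 2],
        [1, 3, 2, 1]]


def vecmat(v, m):
    """Row vector v (1 x k) times matrix m (k x n)."""
    return [sum(v[p] * m[p][j] for p in range(len(m))) for j in range(len(m[0]))]


z_lin = [a + b for a, b in zip(vecmat(x_t, w_xz), vecmat(h_prev, w_hz))]
print(z_lin)  # [22, 17, 15, 17]
```

The two partial products are 11, 4, 7, 6 (from x_t) and 11, 13, 8, 11 (from h_{t-1}).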

Take as an example input data x_t of the t-th time step that is a 1 × 2 matrix, output data h_{t-1} of the (t-1)-th time step that is a 1 × 4 matrix, a third weight matrix W^xr that is 2 × 4 and a fourth weight matrix W^hr that is 4 × 4. The following describes how the operation unit 211 runs the linear operator of the reset gate of the GRU network at the t-th time step (i.e., the operator x_t × W^xr + h_{t-1} × W^hr in formula (2)) to generate the output result r_t' of the reset gate at the t-th time step (a 1 × 4 matrix).

For example, as shown in FIG. 4, the input data x_t of the t-th time step is a 1 × 2 matrix whose only row is 1, 2. The third weight matrix W^xr of the t-th time step is a 2 × 4 matrix whose first row is 3, 2, 0, 2 and whose second row is 2, 1, 1, 1. The output data h_{t-1} of the (t-1)-th time step is a 1 × 4 matrix whose only row is 1, 2, 1, 3. The fourth weight matrix W^hr of the t-th time step is a 4 × 4 matrix whose rows are 0, 2, 1, 0; 2, 1, 0, 0; 1, 0, 1, 2; and 1, 0, 2, 1. The operation unit 211 runs the linear operator of the reset gate and generates the output result r_t' of the reset gate, a 1 × 4 matrix whose only row is 15, 8, 10, 9.
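The FIG. 4 result can be checked the same way. In this sketch the second rows of W^xr and W^hr are taken as 2, 1, 1, 1 and 2, 1, 0, 0; these are assumptions, chosen as the values consistent with the digits listed for FIG. 4 and with the stated result 15, 8, 10, 9. `vecmat` is again a hypothetical helper.

```python
# FIG. 4 check: r_t' = x_t x W^xr + h_{t-1} x W^hr for row vectors.
x_t = [1, 2]
h_prev = [1, 2, 1, 3]
w_xr = [[3, 2, 0, 2],
        [2, 1, 1, 1]]  # second row assumed, see lead-in
w_hr = [[0, 2, 1, 0],
        [2, 1, 0, 0],  # assumed, see lead-in
        [1, 0, 1, 2],
        [1, 0, 2, 1]]


def vecmat(v, m):
    """Row vector v (1 x k) times matrix m (k x n)."""
    return [sum(v[p] * m[p][j] for p in range(len(m))) for j in range(len(m[0]))]


r_lin = [a + b for a, b in zip(vecmat(x_t, w_xr), vecmat(h_prev, w_hr))]
print(r_lin)  # [15, 8, 10, 9]
```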

As the descriptions of FIG. 3 and FIG. 4 show, the operation unit 211 first has to read the input data x_t of the t-th time step, the output data h_{t-1} of the (t-1)-th time step, the first weight matrix W^xz and the second weight matrix W^hz from the storage unit 212, and run the linear operator of the update gate to generate the output result z_t' of the update gate. It then has to read x_t, h_{t-1}, the third weight matrix W^xr and the fourth weight matrix W^hr from the storage unit 212 again, and run the linear operator of the reset gate to generate the output result r_t' of the reset gate.

In other words, when inferring on the input data of one time step to obtain the characters corresponding to the speech data of that time step, the operation unit 211 has to read data from the storage unit 212 twice (first x_t, h_{t-1}, W^xz and W^hz; then x_t, h_{t-1}, W^xr and W^hr) and perform a matrix operation on each batch of data before it can generate the output result z_t' of the update gate and the output result r_t' of the reset gate at the t-th time step. This slows down the operation unit's inference on the input data.

The GRU model 10 includes GRU networks for n time steps, and at every time step the arithmetic unit 211 reads data from the storage unit 212 twice to calculate the gate values of the update gate and the reset gate. Processing one segment of speech therefore takes at least 2 × n reads from the storage unit 212, and the count grows with the amount of speech processed (for example, at least 20 × n reads for ten such segments). As a result, when the GRU model 10 is run for data processing (e.g., speech recognition), the many reads from the storage unit 212 slow down the speech processing of the processor 21 and hence the speech recognition of the electronic device 20, which lengthens recognition time and degrades user experience.

As the above description shows, the matrix operations of the update-gate operator and the reset-gate operator have the same structure, X × H1 + Y × H2, so the operation unit 211 has an arithmetic logic unit for the X × H1 + Y × H2 matrix operation. The two operators differ only in the weight coefficients applied to their two inputs: the first weight matrix differs from the third weight matrix, and the second weight matrix differs from the fourth weight matrix.

To reduce the number of times the arithmetic unit 211 reads data from the storage unit 212, the embodiments of the present application provide another GRU model operation method. In a single read from the storage unit 212, the arithmetic unit 211 obtains the input data x_t of the t-th time step, the first weight matrix W^xz, the second weight matrix W^hz, the third weight matrix W^xr, the fourth weight matrix W^hr and the output data h_{t-1} of the (t-1)-th time step. It splices W^xz and W^xr into a first spliced weight matrix and W^hz and W^hr into a second spliced weight matrix, and then performs the matrix operation X × H1 + Y × H2 on the first spliced weight matrix, the second spliced weight matrix, x_t and h_{t-1} to obtain both the output result z_t' of the update gate and the output result r_t' of the reset gate at the t-th time step. This reduces the number of reads from the storage unit 212 and thereby speeds up the running of the neural network model.

For example, if the amount of data that the arithmetic logic unit for the X × H1 + Y × H2 matrix operation in the operation unit 211 can process at one time is sufficient to generate both the output result of the update gate and the output result of the reset gate at the t-th time step, the operation unit 211 may read x_t, W^xz, W^hz, W^xr, W^hr and h_{t-1} from the storage unit 212 in one access. It then splices W^xz and W^xr along the width direction to generate the first spliced weight matrix, and splices W^hz and W^hr along the width direction to generate the second spliced weight matrix. The operation unit 211 inputs these to the arithmetic logic unit for X × H1 + Y × H2, with the first spliced weight matrix as H1, the second spliced weight matrix as H2, x_t as X and h_{t-1} as Y. With a single read from the storage unit 212, the operation unit 211 can thus determine both the output result z_t' of the update gate and the output result r_t' of the reset gate at the t-th time step.
The number of reads that the operation unit 211 makes from the storage unit 212 is therefore reduced: when the GRU model 10 processes a segment of speech to be recognized, the operation unit 211 needs only n reads from the storage unit 212 (half as many as before) to generate the characters corresponding to the speech. This speeds up speech processing in the processor 21 and speech recognition in the electronic device 20, shortening recognition time and improving user experience.
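The halving of storage accesses can be illustrated with a toy Python model. It is a hypothetical sketch: `CountingStorage` only counts accesses, and one `read` call stands for one access that fetches all listed operands together; it does not model the processor's real storage interface.

```python
# Hypothetical model: the storage unit only counts accesses. One read()
# call stands for one access that fetches all listed operands together.

class CountingStorage:
    def __init__(self):
        self.reads = 0

    def read(self, *operands):
        self.reads += 1
        return operands


def reads_two_pass(n_steps):
    """Baseline: two accesses per time step (update gate, then reset gate)."""
    storage = CountingStorage()
    for _ in range(n_steps):
        storage.read("x_t", "W_xz", "W_hz", "h_prev")
        storage.read("x_t", "W_xr", "W_hr", "h_prev")
    return storage.reads


def reads_spliced(n_steps):
    """Spliced method: one access per time step for all six operands."""
    storage = CountingStorage()
    for _ in range(n_steps):
        storage.read("x_t", "W_xz", "W_hz", "W_xr", "W_hr", "h_prev")
    return storage.reads


print(reads_two_pass(10), reads_spliced(10))  # 20 10
```

For n time steps the baseline makes 2 × n accesses and the spliced method makes n.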

Specifically, for the data of FIG. 3 and FIG. 4 (the input data x_t of the t-th time step, the output data h_{t-1} of the (t-1)-th time step, and the weight matrices W^xz, W^hz, W^xr and W^hr), the operation unit 211 can read x_t, h_{t-1}, W^xz, W^hz, W^xr and W^hr from the storage unit 212 at the same time. It splices the first weight matrix W^xz and the third weight matrix W^xr into the first spliced weight matrix W^xzr shown in FIG. 5, and splices the second weight matrix W^hz and the fourth weight matrix W^hr into the second spliced weight matrix W^hzr shown in FIG. 6. It then performs the matrix operation X × H1 + Y × H2 on W^xzr, W^hzr, x_t and h_{t-1} to obtain the first output data zr_t' of the t-th time step shown in FIG. 7. Referring to FIG. 8, the operation unit 211 then splits zr_t' into the output result z_t' of the update gate and the output result r_t' of the reset gate at the t-th time step, which are the same results as in FIG. 3 and FIG. 4. The specific calculation process is described in detail below.
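FIGS. 5 to 8 can be traced end to end with the sketch below. It assumes the same reconstructed rows as the FIG. 3 and FIG. 4 checks (4, 1, 1, 1 for W^xz; 2, 1, 1, 1 for W^xr; 2, 1, 0, 0 for W^hr). It uses plain Python lists in place of the hardware arithmetic logic unit, and `vecmat` is a hypothetical helper.

```python
# FIGS. 5-8 in plain Python; rows marked "assumed" are reconstructed.
x_t = [1, 2]
h_prev = [1, 2, 1, 3]
w_xz = [[3, 2, 5, 4], [4, 1, 1, 1]]  # second row assumed
w_xr = [[3, 2, 0, 2], [2, 1, 1, 1]]  # second row assumed
w_hz = [[3, 2, 1, 0], [2, 1, 0, 3], [1, 0, 1, 2], [1, 3, 2, 1]]
w_hr = [[0, 2, 1, 0], [2, 1, 0, 0], [1, 0, 1, 2], [1, 0, 2, 1]]  # row 2 assumed


def vecmat(v, m):
    """Row vector v (1 x k) times matrix m (k x n)."""
    return [sum(v[p] * m[p][j] for p in range(len(m))) for j in range(len(m[0]))]


# FIG. 5 / FIG. 6: splice along the width (column) direction.
w_xzr = [a + b for a, b in zip(w_xz, w_xr)]  # 2 x 8
w_hzr = [a + b for a, b in zip(w_hz, w_hr)]  # 4 x 8

# FIG. 7: one X x H1 + Y x H2 pass over the spliced matrices.
zr = [a + b for a, b in zip(vecmat(x_t, w_xzr), vecmat(h_prev, w_hzr))]

# FIG. 8: split at the column count of W^xz.
n = len(w_xz[0])
z_lin, r_lin = zr[:n], zr[n:]
print(zr)  # [22, 17, 15, 17, 15, 8, 10, 9]
```

The first four values match z_t' from FIG. 3 and the last four match r_t' from FIG. 4.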

The following describes in detail a process of the electronic device 20 executing the GRU model 10 operation method provided in the embodiment of the present application, with reference to a hardware structure of the electronic device 20.

Fig. 9 illustrates a flow diagram of a method of operating a GRU model 10 according to some embodiments of the present application. The embodiment is described below taking the GRU network (GRUt) at the t-th time step in the GRU model 10 as an example. The operation method includes:

S901: The operation unit 211 reads the input data of the t-th time step, the first weight matrix, the second weight matrix and the output data of the (t-1)-th time step from the storage unit 212. The first weight matrix is the weight coefficient matrix W^xz in formula (1), and the second weight matrix is the weight coefficient matrix W^hz in formula (1).

S902: Based on the input data of the t-th time step, the first weight matrix, the second weight matrix and the output data of the (t-1)-th time step, the operation unit 211 determines the output result of the update gate at the t-th time step by running the linear operator of the update gate of the GRU network at the t-th time step. This linear operator may be the operator x_t × W^xz + h_{t-1} × W^hz in formula (1).

S903: Based on the output result of the update gate, the operation unit 211 determines the gate value of the update gate of the GRU network at the t-th time step by running the nonlinear operator of the update gate. This nonlinear operator may be the σ() (sigmoid function) operator in formula (1).

S904: The operation unit 211 reads the input data of the t-th time step, the third weight matrix, the fourth weight matrix and the output data of the (t-1)-th time step from the storage unit 212. The third weight matrix is the weight coefficient matrix W^xr in formula (2), and the fourth weight matrix is the weight coefficient matrix W^hr in formula (2).

S905: Based on the input data of the t-th time step, the third weight matrix, the fourth weight matrix and the output data of the (t-1)-th time step, the operation unit 211 determines the output result of the reset gate by running the linear operator of the reset gate of the GRU network at the t-th time step.

S906: Based on the output result of the reset gate, the operation unit 211 determines the gate value of the reset gate of the GRU network at the t-th time step by running the nonlinear operator of the reset gate. This nonlinear operator may be the σ() (sigmoid function) operator in formula (2).

S907: the operation unit 211 determines output data of the t-th time step based on the gate value of the reset gate of the t-th time step, the gate value of the update gate of the t-th time step, the input data of the t-th time step, and the output data of the t-1-th time step.

In some embodiments, the arithmetic unit 211 calculates the output data of the t-th time step from these four quantities according to formula (3) and formula (4). For the specific calculation process, refer to the description of formulas (3) and (4), which is not repeated here.
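Steps S901 to S906 can be sketched in Python as two separate linear operators, each followed by a sigmoid. This is a hedged illustration. It omits bias terms, matching the operators quoted above, although formulas (1) and (2) themselves may contain more. It stops before S907 because formulas (3) and (4) are not reproduced in this section. The function names are hypothetical.

```python
import math


def vecmat(v, m):
    """Row vector v (1 x k) times matrix m (k x n)."""
    return [sum(v[p] * m[p][j] for p in range(len(m))) for j in range(len(m[0]))]


def sigmoid(values):
    """Element-wise logistic function, the sigma() operator."""
    return [1.0 / (1.0 + math.exp(-a)) for a in values]


def gates_two_pass(x_t, h_prev, w_xz, w_hz, w_xr, w_hr):
    """Baseline S901-S906: update gate and reset gate computed separately."""
    # S901-S902: operands of the first read, update-gate linear operator
    z_lin = [a + b for a, b in zip(vecmat(x_t, w_xz), vecmat(h_prev, w_hz))]
    # S903: update-gate nonlinear operator
    z = sigmoid(z_lin)
    # S904-S905: operands of the second read, reset-gate linear operator
    r_lin = [a + b for a, b in zip(vecmat(x_t, w_xr), vecmat(h_prev, w_hr))]
    # S906: reset-gate nonlinear operator
    r = sigmoid(r_lin)
    return z, r
```

With all-zero inputs both linear outputs are zero, so every gate value is σ(0) = 0.5.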

Note that in steps S901 to S906, to calculate the gate values of the reset gate and the update gate of the GRU network at the t-th time step, the arithmetic unit 211 must read data twice: in step S901 (the input data of the t-th time step, the first weight matrix, the second weight matrix and the output data of the (t-1)-th time step) and again in step S904 (the input data of the t-th time step, the third weight matrix, the fourth weight matrix and the output data of the (t-1)-th time step).

To reduce the number of reads that the operation unit 211 makes from the storage unit 212 when calculating the gate values of the reset gate and the update gate at the t-th time step, the present application further provides an operation method of the GRU model 10. In this method, the operation unit 211 reads the input data of the t-th time step, the first to fourth weight matrices and the output data of the (t-1)-th time step from the storage unit 212 only once. It splices the first weight matrix with the third weight matrix, and the second weight matrix with the fourth weight matrix. It then generates the gate value of the reset gate and the gate value of the update gate at the t-th time step from the spliced weight matrices, the input data of the t-th time step and the output data of the (t-1)-th time step.

Fig. 10 illustrates a flow diagram of another method of operating a GRU model 10 according to some embodiments of the present application. The embodiment is described below taking the GRU network (GRUt) at the t-th time step in the GRU model 10 as an example. The operation method includes:

S1001: The operation unit 211 reads the input data of the t-th time step, the first weight matrix, the second weight matrix, the third weight matrix, the fourth weight matrix and the output data of the (t-1)-th time step from the storage unit 212.

For example, as shown in FIGS. 1A to 1C, taking the GRU network (GRUt) at the t-th time step in the GRU model 10 as an example, to calculate the gate values of the update gate and the reset gate, the operation unit 211 may read the input data of the t-th time step, the first to fourth weight matrices and the output data of the (t-1)-th time step from the storage unit 212. The first weight matrix is the weight coefficient matrix W^xz in formula (1), the second weight matrix is the weight coefficient matrix W^hz in formula (1), the third weight matrix is the weight coefficient matrix W^xr in formula (2), and the fourth weight matrix is the weight coefficient matrix W^hr in formula (2).

In some embodiments, the input data x_t of the t-th time step is an H1 × W1 matrix, where H1 is the number of rows and W1 the number of columns of x_t. The first weight matrix W^xz and the third weight matrix W^xr of the t-th time step are both H2 × W2 matrices, where H2 is their number of rows and W2 their number of columns. The second weight matrix W^hz and the fourth weight matrix W^hr of the t-th time step are both H3 × W3 matrices, where H3 is their number of rows and W3 their number of columns. The output data h_{t-1} of the (t-1)-th time step is an H4 × W4 matrix, where H4 is its number of rows and W4 its number of columns.

In the linear operator of the update gate of the GRU network at the t-th time step, x_t must be multiplied by W^xz, h_{t-1} must be multiplied by W^hz, and the two products must then be added. For these operations to be well defined, the following must hold: the number of rows H1 of the input data of the t-th time step equals the number of rows H4 of the output data of the (t-1)-th time step; the number of rows H2 of the first weight matrix equals the number of columns W1 of the input data of the t-th time step; the number of rows H3 of the second weight matrix equals the number of columns W4 of the output data of the (t-1)-th time step; and the number of columns W2 of the first weight matrix equals the number of columns W3 of the second weight matrix. The same row and column relationships apply to the matrices in the linear operator of the reset gate at the t-th time step and are not repeated here.
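These four conditions can be written as a small Python check. It is a sketch assuming matrices are nested lists; `check_gate_shapes` is a hypothetical name.

```python
def check_gate_shapes(x, w_x, h, w_h):
    """Return True if x (H1 x W1) * w_x (H2 x W2) + h (H4 x W4) * w_h (H3 x W3)
    is well defined, i.e. H1 == H4, H2 == W1, H3 == W4 and W2 == W3."""
    h1, w1 = len(x), len(x[0])
    h2, w2 = len(w_x), len(w_x[0])
    h3, w3 = len(w_h), len(w_h[0])
    h4, w4 = len(h), len(h[0])
    return h1 == h4 and h2 == w1 and h3 == w4 and w2 == w3
```

For the FIG. 3 shapes (x_t 1 × 2, W^xz 2 × 4, h_{t-1} 1 × 4, W^hz 4 × 4) the check passes; giving W^xz a third row makes it fail.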

In some embodiments, the elements of the input data of the t-th time step, the first to fourth weight matrices and the output data of the (t-1)-th time step may be integers or floating-point numbers. Their data types are not specifically limited and depend on the practical application.

S1002: The operation unit 211 splices the first weight matrix and the third weight matrix of the t-th time step along the width direction of the matrices to generate the first spliced weight matrix.

In some embodiments, the first weight matrix W^xz of the t-th time step is an H2 × W2 matrix: H2 is both the number of rows and the height of W^xz, and W2 is both the number of columns and the width of W^xz. The third weight matrix W^xr of the t-th time step is likewise an H2 × W2 matrix: H2 is both its number of rows and its height, and W2 is both its number of columns and its width.

Clearly, the first and third weight matrices of the t-th time step have equal heights and equal widths; only their data differ. The operation unit 211 can therefore splice them along the width direction of the matrices to generate the first spliced weight matrix.

In some embodiments, the first weight matrix and the third weight matrix of the t-th time step are both H2 × W2 matrices, and the operation unit 211 splices them along the width direction of the matrices to generate the first spliced weight matrix, which is an H2 × 2W2 matrix.

For example, as shown in FIG. 5, the first weight matrix W^xz and the third weight matrix W^xr of the t-th time step are both 2 × 4 matrices. The rows of W^xz are 3, 2, 5, 4 and 4, 1, 1, 1; the rows of W^xr are 3, 2, 0, 2 and 2, 1, 1, 1. The operation unit 211 splices W^xz and W^xr along the width direction of the matrices to generate the first spliced weight matrix W^xzr, a 2 × 8 matrix. As shown in FIG. 5, the first row of W^xzr is 3, 2, 5, 4, 3, 2, 0, 2, and the second row is 4, 1, 1, 1, 2, 1, 1, 1.
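Splicing along the width direction simply joins each row of one matrix with the same row of the other. The Python sketch below applies this to the FIG. 5 matrices, assuming the reconstructed second rows 4, 1, 1, 1 (W^xz) and 2, 1, 1, 1 (W^xr); `splice_width` is a hypothetical helper.

```python
def splice_width(a, b):
    """Splice two equally sized matrices along the width (column) direction.

    Two H2 x W2 matrices give one H2 x 2W2 matrix.
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same height and width")
    return [ra + rb for ra, rb in zip(a, b)]


w_xz = [[3, 2, 5, 4], [4, 1, 1, 1]]  # second row assumed
w_xr = [[3, 2, 0, 2], [2, 1, 1, 1]]  # second row assumed
w_xzr = splice_width(w_xz, w_xr)
print(len(w_xzr), len(w_xzr[0]))  # 2 8
```

The size check mirrors the precondition stated above: splicing along the width is only defined here for matrices of equal height and width.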

S1003: The operation unit 211 splices the second weight matrix and the fourth weight matrix of the t-th time step along the width direction of the matrices to generate the second spliced weight matrix.

In some embodiments, the second weight matrix W^hz of the t-th time step is a matrix of size H3 × W3, where H3 is the number of rows (the height) of W^hz and W3 is the number of columns (the width) of W^hz. The fourth weight matrix W^hr of the t-th time step is likewise a matrix of size H3 × W3, where H3 is the number of rows (the height) of W^hr and W3 is the number of columns (the width) of W^hr.

Clearly, the second weight matrix and the fourth weight matrix of the t-th time step have equal widths and equal heights, although the data they contain differ. The operation unit 211 may therefore splice the second weight matrix and the fourth weight matrix of the t-th time step along the width direction of the matrices to generate a second spliced weight matrix.

In some embodiments, the second weight matrix and the fourth weight matrix of the t-th time step are both matrices of size H3 × W3, so the operation unit 211 may splice them along the width direction of the matrices to generate the second spliced weight matrix, which has size H3 × 2W3.

For example, as shown in FIG. 6, the second weight matrix W^hz and the fourth weight matrix W^hr of the t-th time step are both matrices of size 4 × 4. The first to fourth rows of W^hz are 3, 2, 1, 0; 2, 1, 0, 3; 1, 0, 1, 2; and 1, 3, 2, 1. The first to fourth rows of W^hr are 0, 2, 1, 0; 2, 1, 0, 0; 1, 0, 1, 2; and 1, 0, 2, 1. The operation unit 211 splices W^hz and W^hr of the t-th time step along the width direction of the matrices to generate the second spliced weight matrix W^hzr, a matrix of size 4 × 8, whose first to fourth rows are 3, 2, 1, 0, 0, 2, 1, 0; 2, 1, 0, 3, 2, 1, 0, 0; 1, 0, 1, 2, 1, 0, 1, 2; and 1, 3, 2, 1, 1, 0, 2, 1.
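The same kind of hypothetical `splice_width` helper reproduces the second spliced weight matrix of FIG. 6. The second row of W^hr is assumed here to be 2, 1, 0, 0, the value consistent with the FIG. 7 result; the other rows are taken from the example above.

```python
def splice_width(a, b):
    """Concatenate two matrices (lists of rows) of equal height along the width direction."""
    if len(a) != len(b):
        raise ValueError("matrices must have the same height to be spliced along the width")
    return [list(row_a) + list(row_b) for row_a, row_b in zip(a, b)]


# FIG. 6 example: second weight matrix W^hz and fourth weight matrix W^hr, both 4 x 4
W_hz = [[3, 2, 1, 0],
        [2, 1, 0, 3],
        [1, 0, 1, 2],
        [1, 3, 2, 1]]
W_hr = [[0, 2, 1, 0],
        [2, 1, 0, 0],
        [1, 0, 1, 2],
        [1, 0, 2, 1]]

W_hzr = splice_width(W_hz, W_hr)  # second spliced weight matrix, 4 x 8
for row in W_hzr:
    print(row)
# [3, 2, 1, 0, 0, 2, 1, 0]
# [2, 1, 0, 3, 2, 1, 0, 0]
# [1, 0, 1, 2, 1, 0, 1, 2]
# [1, 3, 2, 1, 1, 0, 2, 1]
```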

S1004: the operation unit 211 determines the first operator of the GRU network at the t-th time step from the linear operator of the reset gate and the linear operator of the update gate at the t-th time step, and runs the first operator on the input data of the t-th time step, the first spliced weight matrix, the second spliced weight matrix, and the output data of the (t-1)-th time step to generate the first output data of the GRU network at the t-th time step.

In some embodiments, the operation unit 211 determines the first operator of the t-th time step from the linear operators of the reset gate and the update gate of the GRU network, and runs it on the input data of the t-th time step, the first spliced weight matrix, the second spliced weight matrix, and the output data of the (t-1)-th time step to generate the first output data of the t-th time step. Specifically, the first output data zr'_t of the t-th time step can be expressed by the following formula (5):

$$zr'_t = x_t \times W^{xzr} + h_{t-1} \times W^{hzr} \tag{5}$$

In formula (5), x_t × W^xzr + h_{t-1} × W^hzr is the first operator of the GRU network at the t-th time step, x_t is the input data of the t-th time step, h_{t-1} is the output data of the (t-1)-th time step, W^xzr is the first spliced weight matrix of the t-th time step, W^hzr is the second spliced weight matrix of the t-th time step, × denotes matrix multiplication, and + denotes matrix addition.
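Formula (5) yields the update-gate and reset-gate linear results side by side, because column j of a matrix product depends only on column j of the weight matrix. A minimal sketch of that identity, checked on random integer matrices, is shown below; the helper names and matrix sizes are illustrative assumptions, not part of the described device.

```python
import random


def matmul(a, b):
    """Multiply matrix a (m x k) by matrix b (k x n); matrices are lists of rows."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def add(a, b):
    """Element-wise sum of two matrices with the same shape."""
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def splice_width(a, b):
    """Concatenate two matrices of equal height along the width direction."""
    return [list(ra) + list(rb) for ra, rb in zip(a, b)]


def rand_matrix(rng, rows, cols):
    """Small random integer matrix, so the comparison below can be exact."""
    return [[rng.randint(-3, 3) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
in_dim, hidden = 2, 4                  # illustrative sizes (H2 = 2, W2 = H3 = W3 = 4)
x_t = rand_matrix(rng, 1, in_dim)      # input data of the t-th time step
h_prev = rand_matrix(rng, 1, hidden)   # output data of the (t-1)-th time step
W_xz, W_xr = rand_matrix(rng, in_dim, hidden), rand_matrix(rng, in_dim, hidden)
W_hz, W_hr = rand_matrix(rng, hidden, hidden), rand_matrix(rng, hidden, hidden)

# Separate linear operators: one pass per gate
z_sep = add(matmul(x_t, W_xz), matmul(h_prev, W_hz))
r_sep = add(matmul(x_t, W_xr), matmul(h_prev, W_hr))

# Fused first operator, formula (5): a single pass over x_t and h_prev
zr = add(matmul(x_t, splice_width(W_xz, W_xr)),
         matmul(h_prev, splice_width(W_hz, W_hr)))

assert zr == splice_width(z_sep, r_sep)
print("fused result equals [z' | r']")
```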

For example, as shown in FIG. 7, the input data x_t of the t-th time step is a 1 × 2 matrix whose only row is 1, 2. The output data h_{t-1} of the (t-1)-th time step is a 1 × 4 matrix whose only row is 1, 2, 1, 3. The first spliced weight matrix W^xzr is a 2 × 8 matrix whose data are described with reference to FIG. 5, and the second spliced weight matrix W^hzr is a 4 × 8 matrix whose data are described with reference to FIG. 6.

Specifically, the operation unit 211 runs the first operator of the GRU network at the t-th time step on the input data x_t, the first spliced weight matrix W^xzr, the second spliced weight matrix W^hzr, and the output data h_{t-1} of the (t-1)-th time step to generate the first output data zr'_t of the GRU network at the t-th time step. As shown in FIG. 7, zr'_t is a 1 × 8 matrix whose only row is 22, 17, 15, 17, 15, 8, 10, 9.
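Plugging the FIG. 7 data into formula (5) reproduces the first output data. The sketch below assumes the spliced matrices have the rows listed in the FIG. 5 and FIG. 6 examples, with the second rows reconstructed as 4, 1, 1, 1, 2, 1, 1, 1 for W^xzr and 2, 1, 0, 3, 2, 1, 0, 0 for W^hzr; the helper functions are hypothetical.

```python
def matmul(a, b):
    """Multiply matrix a (m x k) by matrix b (k x n); matrices are lists of rows."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def add(a, b):
    """Element-wise sum of two matrices with the same shape."""
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


x_t = [[1, 2]]            # input data of the t-th time step (1 x 2)
h_prev = [[1, 2, 1, 3]]   # output data of the (t-1)-th time step (1 x 4)
W_xzr = [[3, 2, 5, 4, 3, 2, 0, 2],   # first spliced weight matrix (2 x 8)
         [4, 1, 1, 1, 2, 1, 1, 1]]
W_hzr = [[3, 2, 1, 0, 0, 2, 1, 0],   # second spliced weight matrix (4 x 8)
         [2, 1, 0, 3, 2, 1, 0, 0],
         [1, 0, 1, 2, 1, 0, 1, 2],
         [1, 3, 2, 1, 1, 0, 2, 1]]

zr_t = add(matmul(x_t, W_xzr), matmul(h_prev, W_hzr))   # formula (5)
print(zr_t)  # [[22, 17, 15, 17, 15, 8, 10, 9]]
```

The two partial products are [11, 4, 7, 6, 7, 4, 2, 4] and [11, 13, 8, 11, 8, 4, 8, 5]; their sum is the 1 × 8 result shown in FIG. 7.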

S1005: the operation unit 211 determines an output result of the update gate and an output result of the reset gate in the first output data of the GRU network at the t-th time step according to the width values of the first weight matrix and the third weight matrix, or the width values of the second weight matrix and the fourth weight matrix.

For example, as shown in FIG. 8, the first weight matrix W^xz and the third weight matrix W^xr both have a width of 4, so the operation unit 211 takes columns 1 to 4 of the first output data zr'_t as the output result z'_t of the update gate, and columns 5 to 8 of zr'_t as the output result r'_t of the reset gate. As shown in FIG. 8, the output result z'_t of the update gate of the GRU network at the t-th time step is a 1 × 4 matrix whose only row is 22, 17, 15, 17, and the output result r'_t of the reset gate is a 1 × 4 matrix whose only row is 15, 8, 10, 9.
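Splitting the first output data at the width of the first (or third) weight matrix, as in S1005, amounts to slicing each row. The helper `split_width` below is a hypothetical name used only for illustration.

```python
def split_width(m, width):
    """Split matrix m along the width direction after its first `width` columns."""
    return [row[:width] for row in m], [row[width:] for row in m]


zr_t = [[22, 17, 15, 17, 15, 8, 10, 9]]   # first output data from FIG. 7
W2 = 4                                    # width of W^xz (and of W^xr)

z_prime, r_prime = split_width(zr_t, W2)
print(z_prime)  # [[22, 17, 15, 17]]  output result of the update gate
print(r_prime)  # [[15, 8, 10, 9]]    output result of the reset gate
```

Because the update-gate weights were placed first when splicing, the first W2 columns always belong to the update gate; reversing the splice order would reverse the split.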

S1006: the operation unit 211 runs the nonlinear operators of the GRU network at the t-th time step on the output result of the reset gate and the output result of the update gate to generate the gate value of the reset gate and the gate value of the update gate.

In some embodiments, the operation unit 211 runs the nonlinear operator of the update gate of the GRU network at the t-th time step on the output result of the update gate to determine the gate value of the update gate. Specifically, the gate value z_t of the update gate of the GRU network at the t-th time step can be calculated by the following equation (6):

$$z_t = \sigma(z'_t) \tag{6}$$

In equation (6), σ(z'_t) represents the nonlinear operator of the update gate of the GRU network at the t-th time step, and σ(·) denotes the sigmoid function, which can be expressed as $\sigma(x) = \frac{1}{1 + e^{-x}}$.

In some embodiments, the operation unit 211 runs the nonlinear operator of the reset gate of the GRU network at the t-th time step on the output result of the reset gate to determine the gate value of the reset gate. Specifically, the gate value r_t of the reset gate of the GRU network at the t-th time step can be calculated by the following equation (7):

$$r_t = \sigma(r'_t) \tag{7}$$

In equation (7), σ(r'_t) represents the nonlinear operator of the reset gate of the GRU network at the t-th time step, and σ(·) is the sigmoid function, $\sigma(x) = \frac{1}{1 + e^{-x}}$.
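Equations (6) and (7) apply the sigmoid element-wise to the two gate results. The sketch below uses Python's `math.exp` and the common two-branch form of the sigmoid to avoid overflow for large negative inputs; the helper names are assumptions, and the inputs are the FIG. 8 values.

```python
import math


def sigmoid(v):
    """Logistic sigmoid 1 / (1 + e^(-v)), written to avoid overflow for large |v|."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)          # v < 0, so e is in (0, 1) and cannot overflow
    return e / (1.0 + e)


def elementwise(f, m):
    """Apply f to every element of matrix m (a list of rows)."""
    return [[f(v) for v in row] for row in m]


z_prime = [[22, 17, 15, 17]]   # output result z'_t of the update gate (FIG. 8)
r_prime = [[15, 8, 10, 9]]     # output result r'_t of the reset gate (FIG. 8)

z_t = elementwise(sigmoid, z_prime)   # equation (6): gate value of the update gate
r_t = elementwise(sigmoid, r_prime)   # equation (7): gate value of the reset gate
print(r_t)
```

Because every pre-activation in this example is at least 8, all gate values come out very close to 1.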

S1007: the operation unit 211 determines the output data of the t-th time step based on the gate value of the reset gate of the t-th time step, the gate value of the update gate of the t-th time step, the input data of the t-th time step, and the output data of the (t-1)-th time step.

As can be seen from the GRU model 10 operation method shown in FIG. 10, when calculating the gate values of the reset gate and the update gate of the GRU network at the t-th time step, the operation unit 211 splices the first weight matrix and the third weight matrix along the width direction to generate the first spliced weight matrix, and splices the second weight matrix and the fourth weight matrix along the width direction to generate the second spliced weight matrix. By running the first operator, with the first spliced weight matrix as the weight parameter of the input data of the t-th time step and the second spliced weight matrix as the weight parameter of the output data of the (t-1)-th time step, the operation unit 211 can determine the gate values of the update gate and the reset gate of the t-th time step by reading the data from the storage unit 212 only once. Thus, when the GRU model 10 processes a segment of speech to be recognized, instead of reading data from the storage unit 212 at least 2n times to recognize the corresponding text, the operation unit 211 needs to read data from the storage unit 212 only n times, as shown in FIG. 9. This halves the number of data reads, increases the speed at which the processor 21 processes speech and hence the speed at which the electronic device 20 recognizes speech, shortens the recognition time, and improves the user experience.
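The read-count argument can be illustrated with a small simulation. It assumes, as a simplification of FIG. 9, that each linear-operator pass fetches x_t and h_{t-1} from storage once; `CountingStorage` and `gate_preactivations` are hypothetical names, and the h_{t-1} values are supplied directly instead of being produced by S1007, which the sketch leaves out.

```python
def matmul(a, b):
    """Multiply matrix a (m x k) by matrix b (k x n); matrices are lists of rows."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def add(a, b):
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def splice_width(a, b):
    return [list(ra) + list(rb) for ra, rb in zip(a, b)]


class CountingStorage:
    """Hypothetical stand-in for storage unit 212: counts every fetch of (x_t, h_{t-1})."""

    def __init__(self, xs, hs):
        self.xs, self.hs, self.reads = xs, hs, 0

    def fetch(self, t):
        self.reads += 1
        return self.xs[t], self.hs[t]


def gate_preactivations(store, n, W_xz, W_xr, W_hz, W_hr, fused):
    """Return [z'_t | r'_t] for t = 0..n-1, using two passes per step or one fused pass."""
    if fused:
        W_xzr, W_hzr = splice_width(W_xz, W_xr), splice_width(W_hz, W_hr)
    outputs = []
    for t in range(n):
        if fused:
            x, h = store.fetch(t)                 # one read serves both gates
            outputs.append(add(matmul(x, W_xzr), matmul(h, W_hzr)))
        else:
            x, h = store.fetch(t)                 # read for the update gate
            z = add(matmul(x, W_xz), matmul(h, W_hz))
            x, h = store.fetch(t)                 # read again for the reset gate
            r = add(matmul(x, W_xr), matmul(h, W_hr))
            outputs.append(splice_width(z, r))
    return outputs


n = 5                                        # number of time steps (illustrative)
xs = [[[t, 1]] for t in range(n)]            # illustrative x_t sequence (1 x 2 each)
hs = [[[1, 0, t, 2]] for t in range(n)]      # illustrative h_{t-1} sequence (1 x 4 each)
W_xz = [[3, 2, 5, 4], [4, 1, 1, 1]]
W_xr = [[3, 2, 0, 2], [2, 1, 1, 1]]
W_hz = [[3, 2, 1, 0], [2, 1, 0, 3], [1, 0, 1, 2], [1, 3, 2, 1]]
W_hr = [[0, 2, 1, 0], [2, 1, 0, 0], [1, 0, 1, 2], [1, 0, 2, 1]]

sep_store, fused_store = CountingStorage(xs, hs), CountingStorage(xs, hs)
sep = gate_preactivations(sep_store, n, W_xz, W_xr, W_hz, W_hr, fused=False)
fus = gate_preactivations(fused_store, n, W_xz, W_xr, W_hz, W_hr, fused=True)
assert sep == fus
print(sep_store.reads, fused_store.reads)  # 10 5
```

The two strategies give identical gate pre-activations, while the fused one performs half as many fetches (n instead of 2n).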

The embodiments disclosed herein may be implemented in hardware, software, firmware, or a combination of these implementations. Embodiments of the application may be implemented as computer programs or program code executing on programmable systems comprising at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.

Program code may be applied to input instructions to perform the functions described herein and generate output information. The output information may be applied to one or more output devices in a known manner. For purposes of this application, a processing system includes any system having a processor such as, for example, a Digital Signal Processor (DSP), a microcontroller, an Application Specific Integrated Circuit (ASIC), or a microprocessor.

The program code may be implemented in a high level procedural or object oriented programming language to communicate with a processing system. The program code can also be implemented in assembly or machine language, if desired. Indeed, the mechanisms described in this application are not limited in scope to any particular programming language. In any case, the language may be a compiled or interpreted language.

In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors. For example, the instructions may be distributed via a network or via other computer-readable media. Thus, a machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including, but not limited to, floppy diskettes, optical disks, compact disc read-only memories (CD-ROMs), magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), erasable programmable read-only memories (EPROMs), electrically erasable programmable read-only memories (EEPROMs), magnetic or optical cards, flash memory, or a tangible machine-readable memory used to transmit information over the Internet by means of electrical, optical, acoustical, or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.). Thus, a machine-readable medium includes any type of machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (e.g., a computer).

In the drawings, some features of the structures or methods may be shown in a particular arrangement and/or order. However, it is to be understood that such specific arrangement and/or ordering may not be required. Rather, in some embodiments, the features may be arranged in a manner and/or order different from that shown in the illustrative figures. In addition, the inclusion of a structural or methodical feature in a particular figure is not meant to imply that such feature is required in all embodiments, and in some embodiments, may not be included or may be combined with other features.

It should be noted that each unit/module in the apparatus embodiments of the present application is a logical unit/module. Physically, one logical unit/module may be one physical unit/module, part of one physical unit/module, or a combination of multiple physical units/modules. The physical implementation of the logical unit/module itself is not what matters most; the combination of functions implemented by the logical units/modules is the key to solving the technical problem addressed by the present application. Furthermore, to highlight the innovative part of the present application, the above apparatus embodiments do not introduce units/modules that are not closely related to solving the technical problem addressed by the present application, which does not mean that no other units/modules exist in those apparatus embodiments.

It is noted that, in the examples and descriptions of this patent, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by "comprises a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

While the present application has been shown and described with reference to certain preferred embodiments thereof, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present application.

Claims

1. A neural network model operation method, applied to an electronic device, characterized in that the neural network model comprises a first operation and a second operation;

and the method comprises:

acquiring a data matrix to be operated of the first operation or the second operation, wherein the first operation and the second operation have the same number of operation factors, and for each operation factor in the first operation, the second operation has a corresponding operation factor with the same data matrix to be operated and a different operation coefficient matrix;

performing a third operation on the data matrix to be operated to generate a result matrix of the third operation, wherein the third operation is an operation mode obtained by combining the operation coefficient matrices of corresponding operation factors in the first operation and the second operation;

and splitting the result matrix of the third operation to respectively obtain a result matrix of the first operation and a result matrix of the second operation.

2. The method according to claim 1, wherein the operation coefficient matrices of the first operation and the second operation comprise two data dimensions, height and width, and the operation coefficient matrices of corresponding operation factors in the first operation and the second operation have the same height and width;

the third operation is an operation mode obtained by combining the operation coefficient matrices of corresponding operation factors in the first operation and the second operation along any data dimension direction.

3. The method according to claim 2, wherein the third operation is an operation mode obtained by combining the operation coefficient matrices of corresponding operation factors in the first operation and the second operation along the width direction.

4. The method of claim 1, wherein the splitting the result matrix of the third operation to obtain the result matrix of the first operation and the result matrix of the second operation respectively comprises:

the result matrix of the third operation comprises two data dimensions of height and width;

and splitting the result matrix of the third operation along any data dimension direction to respectively obtain a result matrix of the first operation and a result matrix of the second operation.

5. The method of claim 4, wherein the result matrix of the third operation is split along the width direction to obtain a result matrix of the first operation and a result matrix of the second operation.

6. The method according to claim 1, wherein the third operation is an operation obtained by combining operation coefficient matrices of corresponding operation factors in the first operation and the second operation, and comprises:

the data matrix to be operated comprises a first input data matrix and a second input data matrix, and the operation factors of each of the first operation and the second operation comprise the matrix product of the first input data matrix and the corresponding operation coefficient matrix, and the matrix product of the second input data matrix and the corresponding operation coefficient matrix;

the third operation is an operation mode obtained by combining the operation coefficient matrices corresponding to the first input data matrix in the first operation and the second operation, and combining the operation coefficient matrices corresponding to the second input data matrix in the first operation and the second operation.

7. The method of claim 1, wherein the neural network model is a recurrent neural network model, and the first and second operations are operations of a fully-connected layer of the recurrent neural network model.

8. The method of claim 7, wherein the neural network model comprises at least one of: a gated recurrent unit (GRU) model and a long short-term memory (LSTM) model.

9. An electronic device, characterized by comprising an operation unit and a storage unit, wherein the operation unit runs a first matrix operation circuit;

the operation unit acquires a data matrix to be operated of the first operation or the second operation from the storage unit, wherein the first operation and the second operation have the same number of operation factors, and for each operation factor in the first operation, the second operation has a corresponding operation factor with the same data matrix to be operated and a different operation coefficient matrix;

the operation unit performs a third operation on the data matrix to be operated by running the first matrix operation circuit to generate a result matrix of the third operation, wherein the third operation is an operation mode obtained by combining the operation coefficient matrices of corresponding operation factors in the first operation and the second operation, and the first matrix operation circuit is used for generating the result matrix of the first operation and the result matrix of the second operation;

and the operation unit splits the result matrix of the third operation to respectively obtain a result matrix of the first operation and a result matrix of the second operation.

10. An electronic device, comprising:

a memory for storing instructions for execution by one or more processors of the electronic device; and

a processor, being one of the one or more processors of the electronic device, configured to perform the neural network model operation method of any one of claims 1 to 8.

11. A readable medium, characterized in that the readable medium of an electronic device has stored thereon instructions that, when executed, cause the electronic device to perform the neural network model operation method of any one of claims 1 to 8.