WO2018108126A1 - Neural network convolution operation device and method - Google Patents
Neural network convolution operation device and method
- Publication number
- WO2018108126A1 WO2018108126A1 PCT/CN2017/116161 CN2017116161W WO2018108126A1 WO 2018108126 A1 WO2018108126 A1 WO 2018108126A1 CN 2017116161 W CN2017116161 W CN 2017116161W WO 2018108126 A1 WO2018108126 A1 WO 2018108126A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- matrix
- transformation
- neuron
- neural network
- winograd
- Prior art date
Links
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
- G06N3/065—Analogue means
Definitions
- the present disclosure relates to the field of artificial neural network technologies, and in particular, to a neural network convolution operation device and a convolution operation method of a neural network.
- Multi-layer artificial neural networks are widely used in the fields of pattern recognition, image processing, function approximation and optimization calculation.
- Multi-layer artificial neural networks have received increasing attention from both academia and industry in recent years due to their high recognition accuracy and good parallelism.
- the purpose of the present disclosure is to provide a neural network convolution operation device and a convolution operation method for a neural network, to at least partially solve the above technical problems.
- the disclosure provides a neural network convolution operation device for implementing a convolution operation of a weight matrix and a neuron in a neural network by matrix multiplication, including:
- a shift operator for respectively performing a winograd transformation on the neuron matrix and the weight matrix to obtain a transformed neuron matrix and a transformed weight matrix;
- a matrix multiplication operator configured to perform an element-wise matrix multiplication of the transformed neuron matrix and the transformed weight matrix to obtain a multiplication matrix;
- the shift operator is further configured to perform a winograd inverse transformation on the multiplication matrix to obtain a convolution operation result;
- a controller configured to control the shift operator to perform a winograd transform or a winograd inverse transform, and further configured to control the matrix multiplication operator to perform the matrix multiplication operation.
- an on-chip buffer is further included for storing the neuron matrix and the weight matrix, and for storing a transformation matrix C for performing a winograd transformation on the neuron matrix and a transformation matrix G for performing a winograd transformation on the weight matrix.
- the element values in the transformation matrix C and the transformation matrix G are independently ±2^n or 0, where n is an integer.
- the on-chip buffer is further configured to store the winograd inverse transform matrix, namely an inverse transform matrix A that performs a winograd inverse transform on the neuron matrix.
- the value of each element in the inverse transformation matrix A is ±2^n or 0, where n is an integer.
- the controller is further configured to control the shift operator, according to the transformation matrix C or its transposed matrix C^T, to independently shift the binary value of each element in the neuron matrix or the weight matrix to the left or right; or, according to the transformation matrix G or its transposed matrix G^T, to control the shift operator to independently shift the binary value of each element in the weight matrix to the left or right.
- the controller is further configured to independently shift the binary values of the elements in the multiplication matrix to the left or right according to the inverse transform matrix A or its transposed matrix A^T.
- a sparsification (thinning) processing unit is further included for performing sparsification processing on the transformed weight matrix to generate a binary sparse sequence, where "0" corresponds to an element whose value is "0" in the transformed weight matrix and "1" corresponds to an element whose value is not 0; preferably, the bits of the sparse sequence, from most significant to least significant, correspond to the elements of the transformed weight matrix read row by row or column by column.
- a mapping unit is further included, which generates a mapping relationship table between the sparse sequence and the positions of the elements in the transformed neuron matrix: the K-th bit of the sparse sequence corresponds to the element in row i, column j of the M-row × N-column neuron matrix, satisfying (i-1)×N+j=K or (j-1)×M+i=K.
- the controller is further configured to control the matrix multiplication operator to perform the matrix multiplication operation according to the mapping relationship table, wherein for a bit of "0" in the sparse sequence the corresponding element in the neuron matrix is not subjected to matrix multiplication.
- an adder is further included for accumulating the results of the shift operations of the shift operator according to the matrix multiplication rule when the winograd transform is performed on the neuron matrix and the weight matrix, respectively.
- a data buffer unit is further included for buffering the sparse sequence and the mapping relationship table.
- Another aspect of the present disclosure provides a method for performing a convolution operation using any of the above neural network convolution operation devices, including:
- the neuron matrix and the weight matrix are respectively subjected to a winograd transformation by a shift operator and an adder, to obtain a transformed neuron matrix and a transformed weight matrix;
- an element-wise matrix multiplication of the transformed neuron matrix and the transformed weight matrix is performed by a matrix multiplication operator, to obtain a multiplication matrix;
- the multiplication matrix is subjected to a winograd inverse transformation by the shift operator and the adder, to obtain a convolution operation result;
- the shift operator is controlled by the controller to perform the winograd transform or the winograd inverse transform, and the matrix multiplication operator is controlled by the controller to perform the matrix multiplication operation.
- the method further includes: storing the neuron matrix and the weight matrix in an on-chip cache, together with the transformation matrix C for performing a winograd transformation on the neuron matrix and the transformation matrix G for performing a winograd transformation on the weight matrix.
- the element values in the transformation matrix C and the transformation matrix G are independently ±2^n or 0, where n is an integer.
- the on-chip cache is also employed to store the winograd inverse transform matrix, namely an inverse transform matrix A that performs a winograd inverse transform on the neuron matrix.
- the value of each element in the inverse transformation matrix A is ±2^n or 0, where n is an integer.
- the method further includes acquiring the transformation matrix C for performing a winograd transformation on the neuron matrix and the transformation matrix G for performing a winograd transformation on the weight matrix, and storing the inverse transformation matrix A for performing a winograd inverse transform, including:
- receiving the dimensions of the input neuron matrix and the weight matrix, as well as the sliding-stride data of the convolution operation;
- determining, from the received data, the transformation matrix C, the transformation matrix G, and the inverse transformation matrix A according to the winograd algorithm.
- the controller controls the shift operator, according to the transformation matrix C or its transposed matrix C^T, to independently shift the binary value of each element in the neuron matrix or the weight matrix to the left or right; and, according to the element values in the transformation matrix G or its transposed matrix G^T, to independently shift the binary value of each element in the weight matrix to the left or right, or set it to zero.
- the controller independently shifts the binary value of each element in the multiplication matrix to the left or right, or sets it to zero, according to the element values in the inverse transform matrix A or its transposed matrix A^T.
- the transformed weight matrix is sparsified by the sparsification processing unit to generate a binary sparse sequence, where "0" corresponds to an element whose value is "0" in the transformed weight matrix and "1" corresponds to an element whose value is not 0; preferably, the bits of the sparse sequence, from most significant to least significant, correspond to the elements of the transformed weight matrix read row by row or column by column.
- the matrix multiplication operator is controlled by the controller to perform the matrix multiplication operation according to the mapping relationship table, wherein for a bit of "0" in the sparse sequence the corresponding element in the neuron matrix does not undergo matrix multiplication.
- when the winograd transform is performed on the neuron matrix and the weight matrix respectively, the results of the shift operations of the shift operator are accumulated by the adder according to the matrix multiplication rule.
- the method further includes caching the sparse sequence and mapping relationship table by a data cache unit.
- a further aspect of the present disclosure provides a neural network convolution operation apparatus, comprising one or more of the above-described neural network convolution operation devices, configured to acquire data to be processed and control information and to perform a neural network operation.
- Yet another aspect of the present disclosure provides a combined processing apparatus comprising the above-described neural network computing device, a universal interconnect interface, and other processing devices for performing non-neural-network operations, the other processing devices being connected to the neural network computing device through the universal interconnect interface.
- Still another aspect of the present disclosure provides a chip comprising the above-described neural network computing device or the combined processing device of claim 27.
- a further aspect of the present disclosure provides an electronic device comprising the chip described above.
- the shift operator of the present disclosure can completely replace a multiplier when performing the winograd transform and inverse transform of the neuron matrix and the weight matrix; the multiplications can be completed by shift operations alone;
- the present disclosure can turn a complex convolution operation into a sparse matrix multiplication operation, and the transform and inverse transform processes can be implemented by bit operations; in this way the amount of computation required for convolution can be greatly reduced, the operation speed of the neural network improved, and the efficiency of data processing raised substantially;
- by forming a sparse sequence with the sparsification unit, the present disclosure can reduce the storage space required for storing network parameters and reduce the memory-access bandwidth; it can also reduce the number of multiplications and save overhead when performing the element-wise matrix multiplication of the transformed neuron matrix and the transformed weight matrix.
- FIG. 1 is a schematic block diagram showing the structure of a neural network convolution operation device according to an embodiment of the present disclosure.
- FIG. 2 is a flow chart schematically showing a method of performing a convolution operation by the neural network convolution operation device of the embodiment of the present disclosure.
- FIG. 3 schematically shows a mapping relationship table of an embodiment of the present disclosure.
- FIG. 4 schematically shows a convolution operation.
- FIG. 5 schematically shows, in connection with the apparatus described in the embodiments of the present disclosure, the process of performing the convolution operation of FIG. 4.
- FIG. 6 is a schematic structural diagram of a combined processing apparatus according to an embodiment of the present disclosure.
- the techniques of this disclosure may be implemented in the form of hardware and/or software (including firmware, microcode, etc.). Additionally, the techniques of the present disclosure can take the form of a computer program product on a computer readable medium storing instructions for use by an instruction execution system.
- as shown in FIG. 1, the neural network convolution operation device 100 includes an operator 110 and a controller 120, wherein the operator 110 includes a shift operator 111 and a matrix multiplication operator 112.
- the shift operator 111 is configured to perform a winograd transformation on the neuron matrix and the weight matrix respectively, obtaining the transformed neuron matrix and the transformed weight matrix.
- the matrix multiplication operator 112 is configured to perform the element-wise matrix multiplication of the transformed neuron matrix and the transformed weight matrix to obtain a multiplication matrix; the shift operator 111 is further configured, after the multiplication matrix is obtained, to perform the winograd inverse transform on it to obtain the convolution operation result.
- the controller 120 of the embodiment of the present disclosure is configured to control the shift operator 111 to perform the winograd transform or the winograd inverse transform, and is also used to control the matrix multiplication operator 112 to perform the matrix multiplication operation.
- an on-chip buffer is further included for storing the neuron matrix and the weight matrix, as well as the transformation matrix C for performing a winograd transformation on the neuron matrix and the transformation matrix G for performing a winograd transformation on the weight matrix.
- the on-chip cache can be a cache register.
- the element values in the transformation matrix C and the transformation matrix G are independently ±2^n or 0, where n is an integer.
- "independently" here means that the elements of the two transformation matrices each individually take values satisfying the above condition.
- the on-chip cache includes an input neuron buffer that exclusively stores the neuron matrix, an output neuron buffer, a weight buffer that exclusively stores the weight matrix, and two buffers that specifically store the transformation matrix C and the transformation matrix G; alternatively, any two of the input neuron buffer, the output neuron buffer and the weight buffer can be used to store the transformation matrix C and the transformation matrix G.
- the memory is further configured to store the winograd inverse transform matrix, namely an inverse transform matrix A that performs a winograd inverse transform on the neuron matrix.
- the value of each element in the inverse transformation matrix A is ±2^n or 0, where n is an integer.
- the controller 120 is further configured to control the shift operator, according to the transformation matrix C or its transposed matrix C^T, to independently shift the binary value of each element in the neuron matrix or the weight matrix to the left or right; or, according to the transformation matrix G or its transposed matrix G^T, to control the shift operator to independently shift the binary value of each element in the weight matrix to the left or right, or set it to zero. Since the elements in the matrices C and G are all powers of two or zero, the multiplication of corresponding elements between the matrix C and the neuron matrix can be realized by shifting left, shifting right, or setting to zero.
- the controller is further configured to independently shift the binary values of the elements in the multiplication matrix to the left or right according to the inverse transform matrix A or its transposed matrix A^T.
- the sparsification processing unit 113 is further included, configured to sparsify the transformed weight matrix to generate a binary sparse sequence, where "0" corresponds to an element whose value is "0" in the transformed weight matrix and "1" corresponds to an element whose value is not 0; preferably, the bits of the sparse sequence, from most significant to least significant, correspond to the elements of the transformed weight matrix read row by row or column by column.
- a mapping unit 114 is further included. The mapping unit 114 generates a mapping relationship table between the sparse sequence and the element positions in the transformed neuron matrix: the K-th bit of the sparse sequence corresponds to the element in row i, column j of the M-row × N-column neuron matrix, satisfying (i-1)×N+j=K or (j-1)×M+i=K.
- the controller 120 is further configured to control the matrix multiplication operator 112 to perform the matrix multiplication operation according to the mapping relationship table, wherein for a bit of "0" in the sparse sequence the corresponding element in the neuron matrix does not undergo matrix multiplication. Since the multiplication is performed element-wise (for example, the element in row i, column j of the first matrix is multiplied with the element in row i, column j of the second matrix, and the result becomes the element in row i, column j of the result matrix), the matrix multiplication operator 112 mainly includes one or more multipliers.
- an adder 115 is further included for accumulating the results of the shift operations of the shift operator according to the matrix multiplication rule when the winograd transform is performed on the neuron matrix and the weight matrix, respectively. For example, in the winograd transform and inverse transform, when two 3×3 matrices are multiplied, the value in row 1, column 1 of the result matrix is determined by multiplying the three elements of row 1 of the first matrix with the corresponding three elements of column 1 of the second matrix, and the three products are then accumulated by the adder 115 to obtain the final value of row 1, column 1.
- a data buffer unit 130 is further included for buffering the sparse sequence and the mapping relationship table.
- FIG. 2 is a flow chart schematically showing a method for performing a convolution operation by using the neural network convolution operation device of the above embodiment. As shown in FIG. 2, the method includes:
- Step 1: performing, by the shift operator 111 and the adder 115, a winograd transformation on the neuron matrix and the weight matrix respectively, to obtain the transformed neuron matrix and the transformed weight matrix;
- in this step, the neuron matrix d0 and the weight matrix w0 are subjected to the winograd transformation using the following formulas to obtain the transformed neuron matrix d and the transformed weight matrix w:
- d = C^T d0 C, w = G w0 G^T,
- where C is the transformation matrix of the neuron matrix d0, C^T is the transposed matrix of C, G is the transformation matrix of the weight matrix w0, and G^T is the transposed matrix of G.
- the transformation matrices C and G of the neuron matrix d0 and the weight matrix w0 are obtained using the winograd algorithm.
- for ease of understanding, the winograd algorithm is briefly introduced below: it uses block multiplication of matrices to reduce the number of multiplications in a matrix product. There are many different matrix partitioning methods; one winograd algorithm is as follows:
- to compute the matrix product C = AB, partition each matrix into 2×2 blocks and denote
- S1 = A21 + A22, S2 = S1 - A11, S3 = A11 - A21, S4 = A12 - S2
- S5 = B12 - B11, S6 = B22 - S5, S7 = B22 - B12, S8 = S6 - B21
- M1 = S2·S6, M2 = A11·B11, M3 = A12·B21, M4 = S3·S7
- M5 = S1·S5, M6 = S4·B22, M7 = A22·S8
- T1 = M1 + M2, T2 = T1 + M4
- then the blocks of the result are
- C11 = M2 + M3, C12 = T1 + M5 + M6
- C21 = T2 - M7, C22 = T2 + M5
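- the block formulas above can be made concrete with a minimal Python sketch (an illustrative check, assuming even-sized square numpy matrices; here C = AB is the generic product of this subsection, not the transformation matrix C):

```python
import numpy as np

def winograd_block_matmul(A, B):
    """Compute C = AB with the 7 block products M1..M7 defined above.

    A sketch only: assumes A and B are square with even dimensions,
    so each splits cleanly into 2x2 blocks.
    """
    n = A.shape[0] // 2
    A11, A12, A21, A22 = A[:n, :n], A[:n, n:], A[n:, :n], A[n:, n:]
    B11, B12, B21, B22 = B[:n, :n], B[:n, n:], B[n:, :n], B[n:, n:]

    S1 = A21 + A22; S2 = S1 - A11; S3 = A11 - A21; S4 = A12 - S2
    S5 = B12 - B11; S6 = B22 - S5; S7 = B22 - B12; S8 = S6 - B21

    # 7 block multiplications instead of the naive 8
    M1 = S2 @ S6; M2 = A11 @ B11; M3 = A12 @ B21; M4 = S3 @ S7
    M5 = S1 @ S5; M6 = S4 @ B22; M7 = A22 @ S8

    T1 = M1 + M2; T2 = T1 + M4
    C11 = M2 + M3; C12 = T1 + M5 + M6
    C21 = T2 - M7; C22 = T2 + M5
    return np.block([[C11, C12], [C21, C22]])

A = np.random.randn(6, 6)
B = np.random.randn(6, 6)
assert np.allclose(winograd_block_matmul(A, B), A @ B)
```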
- the transformation matrices required for convolution are obtained through the above winograd algorithm. For example, for a one-dimensional convolution [d1, d2, d3] * [w1, w2], assuming that each convolution slide is 1, the convolution can be expanded into a matrix-multiplied form.
- applying the winograd algorithm gives
- M1 = (-a1 + a2 + a3)·b1, M2 = a1·b1, M3 = a2·b2, M4 = 0
- M5 = (a2 + a3)·(-b1), M6 = 0, M7 = a3·(b1 - b2)
- output1 = M2 + M3 + M6, output2 = M1 + M2 + M4 - M7
- removing the zero-valued terms and the unused parts, this can be rewritten as
- m1 = (-a1 + a2 + a3)·b1, m2 = a1·b1, m3 = a2·b2, m4 = a3·(b1 - b2)
- output1 = m2 + m3, output2 = m1 + m2 - m4
- from which the transformation matrices of the convolution can be obtained.
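- this one-dimensional example can be sanity-checked numerically with a short sketch (arbitrary test values; a = [d1, d2, d3] stands for the neurons and b = [w1, w2] for the weights):

```python
import numpy as np

a = np.array([3.0, -1.0, 2.0])   # neurons d1, d2, d3
b = np.array([0.5, 4.0])         # weights w1, w2

m1 = (-a[0] + a[1] + a[2]) * b[0]
m2 = a[0] * b[0]
m3 = a[1] * b[1]
m4 = a[2] * (b[0] - b[1])

# the two sliding-window outputs of [d1,d2,d3] * [w1,w2] with stride 1
assert np.isclose(m2 + m3,      a[0]*b[0] + a[1]*b[1])  # output1
assert np.isclose(m1 + m2 - m4, a[1]*b[0] + a[2]*b[1])  # output2
```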
- the above way of obtaining the transformation matrices is merely illustrative and should not be understood as limiting the present disclosure; for a high-dimensional matrix, the convolutional transformation matrices can be obtained through multiple matrix partitionings.
- the winograd algorithm has different matrix-blocking methods; for a given blocking method, the specific values and dimensions of the transformation matrices are determined by the dimensions of the input neuron and weight matrices and by the convolution sliding stride, and for the specific transformation method reference can be made to existing winograd algorithms.
- as can be seen from the above, the specific influencing factors are the dimension of the input neuron, the dimension of the weight matrix, and the sliding stride of each convolution operation; once these three factors are determined, the values and dimensions of the transformation matrices are determined as well. Since these three factors can be set in advance in the neural network structure, in this embodiment the setting of each transformation matrix is completed offline or in a preprocessing stage.
- the values in the neuron matrix and the weight matrix are binary, and the values of the elements in the transformation matrices C and G are ±2^n or 0, such as -2, -1, -0.5, 0, 0.5, 1, 2, and so on.
- the embodiment of the present disclosure therefore implements the winograd transform using bit operations, realizing multiplication by 2 and division by 2 through left and right shifts. For example, when a value in the neuron matrix d0 is multiplied by 0.5, the value is shifted right by one bit; when it is multiplied by -0.5, the value is shifted right by one bit and its sign is inverted. Hence no dedicated multiplier is needed in this step; the shift operator 111 and the adder 115 suffice to complete the whole matrix operation.
- the embodiment of the present disclosure thus implements the winograd transformation through bit operations, which reduces the amount of computation and increases the operation speed.
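- in software the shift trick can be sketched as follows (a minimal illustration assuming integer or fixed-point data, where a right shift truncates; the helper shift_mul is hypothetical and not part of the disclosure):

```python
def shift_mul(value: int, sign: int, n: int) -> int:
    """Multiply value by sign * 2**n using only shifts and negation.

    sign is -1, 0 or +1 (a transform-matrix element of 0 simply
    forces the product to zero); a negative n turns the multiplication
    by 2**n into an arithmetic right shift, i.e. a division by 2**|n|.
    """
    if sign == 0:
        return 0
    shifted = value << n if n >= 0 else value >> (-n)
    return shifted if sign > 0 else -shifted

assert shift_mul(13, -1, 1) == -26   # x(-2): shift left once, negate
assert shift_mul(12, 1, -1) == 6     # x0.5: shift right by one bit
```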
- Step 2: performing, by the matrix multiplication operator 112, the element-wise matrix multiplication of the transformed neuron matrix and the transformed weight matrix, to obtain the multiplication matrix t:
- it should be noted that in an ordinary convolution process the two matrices participating in the operation may have different scales, so multiple matrix multiplications must be performed through a sliding operation; in the embodiment of the present disclosure, however, the transformed neuron matrix d and the transformed weight matrix w conform to the matrix multiplication rule, so only one matrix multiplication operation is performed, which greatly reduces the amount of computation.
- when two matrices are multiplied element-wise, if an element of one matrix is known to be 0, its product with the corresponding element of the other matrix is necessarily 0 and need not actually be computed, which saves unnecessary work. The embodiment of the present disclosure therefore maps the transformed weight matrix, via the sparsification processing unit 113, to a sparse sequence consisting of "0" and "1", where "0" corresponds to an element whose value is "0" in the transformed weight matrix and "1" corresponds to an element whose value is not 0.
- referring to FIG. 3, the mapping unit 114 then forms a mapping relationship table between the sparse sequence and the element positions in the transformed neuron matrix. When the matrix multiplication is executed, the elements at the corresponding positions in the transformed neuron matrix are extracted according to the "1" bits recorded in the sparse sequence, to be multiplied with the corresponding elements of the transformed weight matrix.
- for example, the sparse sequence corresponding to w is 1110111011101100 (read row by row; it could equally be read column by column); according to this sequence, the elements [d03, d13, d23, d32, d33] of the transformed neuron matrix do not participate in the operation.
- using the sparse sequence can therefore further reduce the amount of computation of the matrix multiplication. In practical neural network applications, the sparse sequence can be generated offline, and the storage it occupies is very small relative to the storage savings brought by sparsification, so this process does not affect the operation speed or storage footprint of the neural network.
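- a software analogue of the sparsification unit 113 and of the skipped multiplications can be sketched as follows (illustrative helper names only; the hardware works with a bit sequence and a mapping table rather than a Python loop):

```python
import numpy as np

def make_sparse_sequence(w: np.ndarray) -> np.ndarray:
    """One bit per weight, read row by row: 1 for nonzero, 0 for zero."""
    return (w.ravel() != 0).astype(np.uint8)

def masked_elementwise_mul(d: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Element-wise product that skips every position whose bit is 0."""
    bits = make_sparse_sequence(w)
    t = np.zeros(d.size)
    dv, wv = d.ravel(), w.ravel()
    for k, bit in enumerate(bits):   # K-th bit <-> element (i, j) with
        if bit:                      # (i-1)*N + j = K (row-major order)
            t[k] = dv[k] * wv[k]     # "0" bits never reach a multiplier
    return t.reshape(d.shape)
```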
- Step 3: the multiplication matrix is subjected to the winograd inverse transform by the shift operator 111 and the adder 115, to obtain the convolution operation result.
- in this step, the multiplication matrix t is inverse-transformed using the following formula to obtain the operation result output:
- output = A^T t A,
- where A is the inverse transformation matrix and A^T is the transposed matrix of A.
- it should be noted that the inverse transformation matrix A, like C and G, is obtained using the winograd algorithm; the specific process is not repeated here. The element values of the inverse transformation matrix A are likewise 0 or ±2^n, with n an integer, so the operations involving them can also be carried out through bit operations.
- it should be noted that, throughout the above three steps, the controller controls the shift operator to perform the winograd transformation or the winograd inverse transformation, and also controls the matrix multiplication operator to perform the matrix multiplication operation.
- as shown in FIG. 4, the convolution kernel is a 3×3 matrix that slides over the input image; the convolution kernel in the figure is the layer weight matrix of the present disclosure, and the input image is the neuron matrix of the present disclosure.
- for the convolution operation commonly used in neural networks, assuming the kernel slides by one pixel at a time, a total of four convolution operations are required, and in each convolution operation the convolution kernel and the corresponding image data are multiplied and accumulated. Therefore, for different output neurons on the same output feature map, the required input neurons differ while the weights and the connection relationships are the same. For example, the first convolution result is computed as 1*1+1*0+1*1+0*0+1*1+1*0+0*1+0*0+1*1=4 and the second as 1*1+0*1+1*0+1*0+1*1+1*0+0*1+1*0+1*1=3, and so on.
- FIG. 5 schematically shows the process of performing the convolution operation of FIG. 4 according to the embodiment of the present disclosure. As shown in FIG. 5:
- Step S1: the controller 120 reads an instruction from the memory.
- Step S2: the controller 120 decodes it into a microinstruction, and the neural network convolution operation device 100 then reads from the external address space, according to the microinstruction, the data required to perform the convolution operation, including the neuron matrix d0 and the weight matrix w0; the transformation matrices C and G and the inverse transform matrix A are then obtained for the example of FIG. 4.
- Step S3: the shift operator 111 reads the neuron matrix d0 and the weight matrix w0 from the memory or the data buffer, and the shift operator 111 and the adder 115 perform the winograd transform on d0 and w0, namely d = C^T d0 C and w = G w0 G^T.
- Step S4: the sparsification processing unit 113 obtains the sparse sequence from the transformed weight matrix w, namely [1110111011101100]: following the mapping relationship of the mapping unit 114, the weight matrix is traversed, each nonzero value is marked with bit 1 and each zero value with bit 0, and the resulting bit sequence, whose length equals the number of values in the weight matrix, serves as the sparse sequence.
- Step S5: the matrix multiplication operator 112 selects the corresponding neurons and weights according to the sparse sequence and performs the multiplications, completing the element-wise matrix multiplication of the input neurons and the weights; according to the index sequence, the elements [d03, d13, d23, d32, d33] of the transformed neuron matrix d do not participate in the operation, finally yielding the multiplication result t.
- Step S6: the shift operator 111 and the adder 115 perform the winograd inverse transform on the result of the element-wise multiplication, obtaining the output output = A^T t A.
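- the whole pipeline of steps S1-S6 can be traced end to end with the following sketch, run on a 4×4 neuron tile and the 3×3 kernel implied by the arithmetic example above. The matrices C, G and A used here are the standard winograd F(2×2, 3×3) transforms from the literature, chosen only because their elements are all 0 or ±2^n as the disclosure requires; the patent's own figures, which are not reproduced on this page, may use different values:

```python
import numpy as np

# Standard F(2x2, 3x3) winograd transforms (an assumption; every
# element is 0 or +/-2^n, matching the constraint stated above).
C = np.array([[ 1, 0,  0,  0],     # neuron transform:  d = C^T d0 C
              [ 0, 1, -1,  1],
              [-1, 1,  1,  0],
              [ 0, 0,  0, -1]], dtype=float)
G = np.array([[ 1.0,  0.0, 0.0],   # weight transform:  w = G w0 G^T
              [ 0.5,  0.5, 0.5],
              [ 0.5, -0.5, 0.5],
              [ 0.0,  0.0, 1.0]])
A = np.array([[ 1,  0],            # inverse transform: out = A^T t A
              [ 1,  1],
              [ 1, -1],
              [ 0, -1]], dtype=float)

d0 = np.arange(16, dtype=float).reshape(4, 4)   # 4x4 input tile
w0 = np.array([[1., 0., 1.],                    # 3x3 kernel from
               [0., 1., 0.],                    # the example above
               [1., 0., 1.]])

d = C.T @ d0 @ C     # step 1: winograd transform of the neurons
w = G @ w0 @ G.T     # step 1: winograd transform of the weights
t = d * w            # step 2: element-wise matrix multiplication
out = A.T @ t @ A    # step 3: winograd inverse transform

# reference: the four sliding-window results of direct convolution
ref = np.array([[(d0[i:i+3, j:j+3] * w0).sum() for j in range(2)]
                for i in range(2)])
assert np.allclose(out, ref)
```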
- the present disclosure also discloses a neural network computing device including one or more of the neural network convolution operation devices mentioned in the present disclosure, for acquiring data to be processed and control information from other processing devices and executing a specified neural network operation, the execution result being passed to peripheral devices through an I/O interface. Peripheral devices include, for example, cameras, monitors, mice, keyboards, network cards, wifi interfaces and servers.
- when more than one neural network convolution operation device is included, the devices can be linked and transmit data through a specific structure, for example interconnecting and transmitting data through a PCIE bus, to support larger-scale neural network operations. In this case, the devices may share a single control system or have their own control systems, and may share memory or have separate memories for each accelerator. Moreover, the interconnection method can be any interconnection topology.
- the neural network computing device has high compatibility and can be connected to various types of servers through a PCIE interface.
- the present disclosure also discloses a combined processing apparatus including the above-described neural network computing device, a universal interconnect interface, and other processing devices.
- the neural network computing device interacts with the other processing devices (which perform non-neural-network operations) to jointly complete the operation specified by the user.
- FIG. 6 is a schematic structural diagram of the combined processing apparatus.
- the other processing devices include one or more processor types among general-purpose/dedicated processors such as a central processing unit (CPU), a graphics processing unit (GPU), a neural network processor and the like; the number of processors included in the other processing devices is not limited.
- the other processing devices serve as the interface between the neural network computing device and external data and control, performing data transfers and completing basic control of the neural network computing device such as starting and stopping; the other processing devices can also cooperate with the neural network computing device to complete computing tasks.
- the universal interconnect interface is used for transmitting data and control instructions between the neural network computing device and the other processing devices.
- the neural network computing device acquires the required input data from the other processing devices and writes it to the on-chip storage of the neural network computing device; control instructions can be obtained from the other processing devices and written into the on-chip control cache of the neural network computing device;
- data in the storage module of the neural network computing device can also be read and transmitted to the other processing devices.
- the combined processing apparatus can serve as an SOC (system on chip) for devices such as mobile phones, robots, drones and video surveillance equipment, effectively reducing the core area of the control part, increasing the processing speed and lowering the overall power consumption.
- in this case, the universal interconnect interface of the combined processing apparatus is coupled to certain components of the device, such as a camera, a monitor, a mouse, a keyboard, a network card or a wifi interface.
- the present disclosure discloses a chip that includes the neural network computing device or the combined processing apparatus described above.
- the present disclosure discloses a chip package structure that includes the chip described above.
- the present disclosure discloses a board card that includes the chip package structure described above.
- the present disclosure discloses an electronic device that includes the board card described above.
- the electronic device may include a data processing device, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a mobile phone, a driving recorder, a navigator, a sensor, a webcam, a cloud server, a camera, a video camera, a projector, a watch, a headset, mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device.
- the vehicle includes an airplane, a ship, and/or a car;
- the household appliance includes a television, an air conditioner, a microwave oven, a refrigerator, a rice cooker, a humidifier, a washing machine, an electric lamp, a gas stove, and a range hood;
- the medical device includes a nuclear magnetic resonance instrument, a B-ultrasound scanner, and/or an electrocardiograph.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Mathematical Physics (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Data Mining & Analysis (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Neurology (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Algebra (AREA)
- Databases & Information Systems (AREA)
- Complex Calculations (AREA)
- Image Processing (AREA)
Abstract
A neural network convolution operation device and method, the device being used to implement, by means of matrix multiplication, the convolution of a weight matrix with neurons in a neural network, and comprising: a shift operator for respectively performing a winograd transformation on the neuron matrix and the weight matrix to obtain a transformed neuron matrix and a transformed weight matrix; a matrix multiplication operator for performing an element-wise matrix multiplication of the transformed neuron matrix and the transformed weight matrix to obtain a multiplication matrix; the shift operator being further used to perform a winograd inverse transformation on the multiplication matrix to obtain the convolution operation result; and a controller for controlling the shift operator to perform the winograd transform or the winograd inverse transform, and for controlling the matrix multiplication operator to perform the matrix multiplication operation.
Description
The present disclosure relates to the field of artificial neural network technology, and in particular to a neural network convolution operation device and a convolution operation method for a neural network.
Multi-layer artificial neural networks are widely used in fields such as pattern recognition, image processing, function approximation and optimization calculation; in recent years, owing to their high recognition accuracy and good parallelizability, multi-layer artificial networks have received ever wider attention from both academia and industry.
To meet ever-higher task demands, the scale of neural networks keeps growing, and today's large convolutional neural networks already contain network structures with hundreds of layers. The accompanying problem is that neural networks must perform an even larger amount of computation; in particular, for convolutional neural networks the large number of convolution operations lowers the operation speed of the network and hampers its use in practical applications.
Summary of the Invention
In view of this, the purpose of the present disclosure is to provide a neural network convolution operation device and a convolution operation method for a neural network, so as to at least partially solve the technical problems described above.
One aspect of the present disclosure provides a neural network convolution operation device for implementing, by means of matrix multiplication, the convolution of a weight matrix with neurons in a neural network, comprising:
a shift operator for respectively performing a winograd transformation on the neuron matrix and the weight matrix, obtaining a transformed neuron matrix and a transformed weight matrix;
a matrix multiplication operator for performing an element-wise matrix multiplication of the transformed neuron matrix and the transformed weight matrix, obtaining a multiplication matrix;
the shift operator being further used to perform a winograd inverse transformation on the multiplication matrix, obtaining the convolution operation result;
a controller for controlling the shift operator to perform the winograd transform or the winograd inverse transform, and for controlling the matrix multiplication operator to perform the matrix multiplication operation.
In a further embodiment, an on-chip cache is further included for storing the neuron matrix and the weight matrix, as well as the transformation matrix C for performing a winograd transformation on the neuron matrix and the transformation matrix G for performing a winograd transformation on the weight matrix.
In a further embodiment, the element values in the transformation matrix C and the transformation matrix G are independently ±2^n or 0, with n an integer.
In a further embodiment, the on-chip cache is further used to store the winograd inverse transform matrix, namely an inverse transformation matrix A for performing a winograd inverse transformation on the neuron matrix.
In a further embodiment, the element values in the inverse transformation matrix A are ±2^n or 0, with n an integer.
In a further embodiment, the controller is further used to control the shift operator, according to the transformation matrix C or its transposed matrix C^T, to independently shift the binary value of each element in the neuron matrix or the weight matrix to the left or right; or, according to the transformation matrix G or its transposed matrix G^T, to control the shift operator to independently shift the binary value of each element in the weight matrix to the left or right.
In a further embodiment, the controller is further used to independently shift the binary value of each element in the multiplication matrix to the left or right according to the inverse transformation matrix A or its transposed matrix A^T.
In a further embodiment, a sparsification processing unit is further included for sparsifying the transformed weight matrix to generate a binary sparse sequence, in which "0" corresponds to an element whose value is "0" in the transformed weight matrix and "1" corresponds to an element whose value is not 0; preferably, the bits of the sparse sequence, from most significant to least significant, correspond to the elements of the transformed weight matrix read row by row or column by column.
In a further embodiment, a mapping unit is further included, the mapping unit generating a mapping relationship table between the sparse sequence and the element positions in the transformed neuron matrix, the K-th bit of the sparse sequence corresponding to the element in row i, column j of the M-row × N-column neuron matrix and satisfying (i-1)×N+j=K or (j-1)×M+i=K.
In a further embodiment, the controller is further used to control the matrix multiplication operator to perform the matrix multiplication operation according to the mapping relationship table, a bit of "0" in the sparse sequence meaning that the corresponding element in the neuron matrix does not undergo matrix multiplication.
In a further embodiment, an adder is further included for accumulating the results of the shift operations of the shift operator according to the matrix multiplication rule when the winograd transformation is performed on the neuron matrix and the weight matrix respectively.
In a further embodiment, a data cache unit is further included for caching the sparse sequence and the mapping relationship table.
Another aspect of the present disclosure provides a method for performing a convolution operation using any of the above neural network convolution operation devices, comprising:
performing, by a shift operator and an adder, a winograd transformation on the neuron matrix and the weight matrix respectively, obtaining a transformed neuron matrix and a transformed weight matrix;
performing, by a matrix multiplication operator, an element-wise matrix multiplication of the transformed neuron matrix and the transformed weight matrix, obtaining a multiplication matrix;
performing, by the shift operator and the adder, a winograd inverse transformation on the multiplication matrix, obtaining the convolution operation result;
controlling, by a controller, the shift operator to perform the winograd transform or the winograd inverse transform, and controlling, by the controller, the matrix multiplication operator to perform the matrix multiplication operation.
In a further embodiment, the method further comprises: storing the neuron matrix and the weight matrix in an on-chip cache, together with the transformation matrix C for performing a winograd transformation on the neuron matrix and the transformation matrix G for performing a winograd transformation on the weight matrix.
In a further embodiment, the element values in the transformation matrix C and the transformation matrix G are independently ±2^n or 0, with n an integer.
In a further embodiment, the on-chip cache is also used to store the winograd inverse transform matrix, as well as the inverse transformation matrix A for performing a winograd inverse transformation on the neuron matrix.
In a further embodiment, the element values in the inverse transformation matrix A are ±2^n or 0, with n an integer.
In a further embodiment, the method further comprises acquiring the transformation matrix C for performing a winograd transformation on the neuron matrix and the transformation matrix G for performing a winograd transformation on the weight matrix, and storing the inverse transformation matrix A for performing a winograd inverse transformation on the neuron matrix, including:
receiving the dimensions of the input neuron matrix and the weight matrix, as well as the sliding-stride data of the convolution operation;
determining, from the received data, the transformation matrix C, the transformation matrix G and the inverse transformation matrix A according to the winograd algorithm.
In a further embodiment, the controller controls the shift operator, according to the transformation matrix C or its transposed matrix C^T, to independently shift the binary value of each element in the neuron matrix or the weight matrix to the left or right; or, according to the element values in the transformation matrix G or its transposed matrix G^T, controls the shift operator to independently shift the binary value of each element in the weight matrix to the left or right, or set it to zero.
In a further embodiment, the controller independently shifts the binary value of each element in the multiplication matrix to the left or right, or sets it to zero, according to the element values in the inverse transformation matrix A or its transposed matrix A^T.
In a further embodiment, the transformed weight matrix is sparsified by a sparsification processing unit to generate a binary sparse sequence, in which "0" corresponds to an element whose value is "0" in the transformed weight matrix and "1" corresponds to an element whose value is not 0; preferably, the bits of the sparse sequence, from most significant to least significant, correspond to the elements of the transformed weight matrix read row by row or column by column.
In a further embodiment, a mapping relationship table between the sparse sequence and the element positions in the transformed neuron matrix is generated by a mapping unit, the K-th bit of the sparse sequence corresponding to the element in row i, column j of the M-row × N-column neuron matrix and satisfying (i-1)×N+j=K or (j-1)×M+i=K.
In a further embodiment, the controller controls the matrix multiplication operator to perform the matrix multiplication operation according to the mapping relationship table, a bit of "0" in the sparse sequence meaning that the corresponding element in the neuron matrix does not undergo matrix multiplication.
In a further embodiment, when the winograd transformation is performed on the neuron matrix and the weight matrix respectively, the results of the shift operations of the shift operator are accumulated by the adder according to the matrix multiplication rule.
In a further embodiment, the sparse sequence and the mapping relationship table are cached by a data cache unit.
A further aspect of the present disclosure provides a neural network convolution operation apparatus, comprising one or more of the neural network convolution operation devices described above, for acquiring data to be operated on and control information and executing a neural network operation.
Yet another aspect of the present disclosure provides a combined processing apparatus, comprising the neural network computing device described above, a universal interconnect interface, and other processing devices for performing non-neural-network operations, the other processing devices being connected to the neural network computing device through the universal interconnect interface.
A still further aspect of the present disclosure provides a chip, comprising the neural network computing device described above or the combined processing apparatus of claim 27.
A yet further aspect of the present disclosure provides an electronic device, comprising the chip described above.
The shift operator of the present disclosure can completely replace a multiplier when performing the winograd transform and inverse transform of the neuron matrix and the weight matrix, completing the multiplications through shift operations alone;
the present disclosure can turn a complex convolution operation into a sparse matrix multiplication, with the transform and inverse transform processes implemented by bit operations; in this way the amount of computation required by convolution can be greatly reduced, the operation speed of the neural network increased, and the efficiency of data processing substantially improved;
by forming a sparse sequence with the sparsification unit, the present disclosure can reduce the storage space required for network parameters and lower the memory-access bandwidth, and can also reduce the number of multiplications and save overhead when the element-wise matrix multiplication of the transformed neuron matrix and the transformed weight matrix is performed.
FIG. 1 schematically shows the structure of the neural network convolution operation device of an embodiment of the present disclosure.
FIG. 2 schematically shows a flow chart of the method by which the neural network convolution operation device of an embodiment of the present disclosure performs a convolution operation.
FIG. 3 schematically shows the mapping relationship table of an embodiment of the present disclosure.
FIG. 4 schematically shows a convolution operation.
FIG. 5 schematically shows, in connection with the apparatus described in the embodiments of the present disclosure, the process by which an embodiment of the present disclosure performs the convolution operation of FIG. 4.
FIG. 6 is a schematic structural diagram of the combined processing apparatus of an embodiment of the present disclosure.
From the following detailed description of exemplary embodiments of the present disclosure taken together with the accompanying drawings, other aspects, advantages and salient features of the present disclosure will become apparent to those skilled in the art.
In the present disclosure, the terms "include" and "contain" and their derivatives are meant to be inclusive rather than limiting; the term "or" is inclusive, meaning and/or.
In this specification, the various embodiments described below for explaining the principles of the present disclosure are illustrative only and should not be construed in any way as limiting the scope of the invention. The following description with reference to the accompanying drawings serves to assist a full understanding of the exemplary embodiments of the present disclosure defined by the claims and their equivalents. The following description includes numerous specific details to aid understanding, but these details are to be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Furthermore, descriptions of well-known functions and structures are omitted for clarity and conciseness, and throughout the drawings the same reference numerals are used for similar functions and operations.
Some block diagrams and/or flow charts are shown in the accompanying drawings. It should be understood that some blocks of the block diagrams and/or flow charts, or combinations thereof, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer or other programmable data processing apparatus, so that the instructions, when executed by the processor, create means for implementing the functions/operations illustrated in the block diagrams and/or flow charts.
Accordingly, the techniques of the present disclosure may be implemented in the form of hardware and/or software (including firmware, microcode, etc.). In addition, the techniques of the present disclosure may take the form of a computer program product on a computer-readable medium storing instructions, the computer program product being usable by an instruction execution system.
FIG. 1 schematically shows the structure of the neural network convolution operation device of an embodiment of the present disclosure. As shown in FIG. 1, the neural network convolution operation device 100 includes an operator 110 and a controller 120, wherein the operator 110 includes a shift operator 111 and a matrix multiplication operator 112.
The shift operator 111 is used to perform a winograd transformation on the neuron matrix and the weight matrix respectively, obtaining the transformed neuron matrix and the transformed weight matrix. The matrix multiplication operator 112 is used to perform the element-wise matrix multiplication of the transformed neuron matrix and the transformed weight matrix, obtaining the multiplication matrix; the shift operator 111 is further used, after the multiplication matrix is obtained, to perform the winograd inverse transform on it, obtaining the convolution operation result. The controller 120 of the embodiment of the present disclosure is used to control the shift operator 111 to perform the winograd transform or the winograd inverse transform, and also to control the matrix multiplication operator 112 to perform the matrix multiplication operation.
In some embodiments, an on-chip cache is further included for storing the neuron matrix and the weight matrix, as well as the transformation matrix C for performing a winograd transformation on the neuron matrix and the transformation matrix G for performing a winograd transformation on the weight matrix. The on-chip cache can be a cache register. The element values in the transformation matrix C and the transformation matrix G are independently ±2^n or 0, with n an integer; "independently" here means that the elements of the two transformation matrices each take values satisfying the above condition individually. The on-chip cache includes an input neuron buffer that exclusively stores the neuron matrix, an output neuron buffer, a weight buffer that exclusively stores the weight matrix, and two buffers that specifically store the transformation matrix C and the transformation matrix G; alternatively, any two of the input neuron buffer, the output neuron buffer and the weight buffer can be used to store the transformation matrix C and the transformation matrix G.
In some embodiments, the memory is further used to store the winograd inverse transform matrix, namely the inverse transformation matrix A for performing a winograd inverse transformation on the neuron matrix. The element values in the inverse transformation matrix A are ±2^n or 0, with n an integer.
In some embodiments, the controller 120 is further used to control the shift operator, according to the transformation matrix C or its transposed matrix C^T, to independently shift the binary value of each element in the neuron matrix or the weight matrix to the left or right; or, according to the transformation matrix G or its transposed matrix G^T, to control the shift operator to independently shift the binary value of each element in the weight matrix to the left or right, or set it to zero. Since the elements of the matrices C and G are all powers of two or zero, the multiplication between corresponding elements of the matrix C and the neuron matrix can be realized through left shifts, right shifts or zeroing.
In some embodiments, the controller is further used to independently shift the binary value of each element in the multiplication matrix to the left or right according to the inverse transformation matrix A or its transposed matrix A^T.
In some embodiments, a sparsification processing unit 113 is further included for sparsifying the transformed weight matrix to generate a binary sparse sequence, in which "0" corresponds to an element whose value is "0" in the transformed weight matrix and "1" corresponds to an element whose value is not 0; preferably, the bits of the sparse sequence, from most significant to least significant, correspond to the elements of the transformed weight matrix read row by row or column by column.
In some embodiments, a mapping unit 114 is further included; the mapping unit 114 generates a mapping relationship table between the sparse sequence and the element positions in the transformed neuron matrix, the K-th bit of the sparse sequence corresponding to the element in row i, column j of the M-row × N-column neuron matrix and satisfying (i-1)×N+j=K or (j-1)×M+i=K.
In some embodiments, the controller 120 is further used to control the matrix multiplication operator 112 to perform the matrix multiplication operation according to the mapping relationship table, a bit of "0" in the sparse sequence meaning that the corresponding element in the neuron matrix does not undergo matrix multiplication. Since what is implemented is an element-wise multiplication (for example, the element in row i, column j of the first matrix is multiplied with the element in row i, column j of the second matrix, the result being the element in row i, column j of the result matrix), the matrix multiplication operator 112 mainly comprises one or more multipliers.
In some embodiments, an adder 115 is further included for accumulating the results of the shift operations of the shift operator according to the matrix multiplication rule when the winograd transformation is performed on the neuron matrix and the weight matrix respectively. For example, in the winograd transform and inverse transform, when two 3×3 matrices are multiplied, the value in row 1, column 1 of the result matrix is determined by multiplying the three elements of row 1 of the first matrix with the corresponding three elements of column 1 of the second matrix and then accumulating the three products with the adder 115, obtaining the final value of row 1, column 1.
In some embodiments, a data cache unit 130 is further included for caching the sparse sequence and the mapping relationship table.
The specific functions implemented by the various parts of the above neural network convolution operation device 100 are further explained below in connection with the following convolution operation method.
FIG. 2 schematically shows a flow chart of the method of performing a convolution operation with the neural network convolution operation device of the above embodiment. As shown in FIG. 2, the method includes:
Step 1: performing, by the shift operator 111 and the adder 115, a winograd transformation on the neuron matrix and the weight matrix respectively, obtaining the transformed neuron matrix and the transformed weight matrix.
In this step, the neuron matrix d0 and the weight matrix w0 are subjected to the winograd transformation using the following formulas, yielding the transformed neuron matrix d and the transformed weight matrix w:
d = C^T d0 C, w = G w0 G^T,
where C is the transformation matrix of the neuron matrix d0, C^T is the transposed matrix of C, G is the transformation matrix of the weight matrix w0, and G^T is the transposed matrix of G.
The transformation matrices C and G of the neuron matrix d0 and the weight matrix w0 are obtained using the winograd algorithm.
For ease of understanding, the winograd algorithm is briefly introduced below. The algorithm uses block multiplication of matrices to reduce the number of multiplications in a matrix product; there are many different matrix partitioning methods, and one winograd algorithm is as follows:
To compute the matrix product C = AB, partition each matrix into 2×2 blocks, and denote
S1 = A21 + A22, S2 = S1 - A11, S3 = A11 - A21, S4 = A12 - S2
S5 = B12 - B11, S6 = B22 - S5, S7 = B22 - B12, S8 = S6 - B21
M1 = S2·S6, M2 = A11·B11, M3 = A12·B21, M4 = S3·S7
M5 = S1·S5, M6 = S4·B22, M7 = A22·S8
T1 = M1 + M2, T2 = T1 + M4
Then
C11 = M2 + M3, C12 = T1 + M5 + M6
C21 = T2 - M7, C22 = T2 + M5
Through the above winograd algorithm, the transformation matrices required for convolution are obtained. For example, for the one-dimensional convolution [d1, d2, d3] * [w1, w2], assuming each convolution slide is 1, the convolution can be expanded into a matrix-multiplied form.
Applying the winograd algorithm gives
M1 = (-a1 + a2 + a3)·b1, M2 = a1·b1, M3 = a2·b2, M4 = 0
M5 = (a2 + a3)·(-b1), M6 = 0, M7 = a3·(b1 - b2)
output1 = M2 + M3 + M6, output2 = M1 + M2 + M4 - M7
Removing the zero-valued terms and the unused parts, this can be rewritten as
m1 = (-a1 + a2 + a3)·b1, m2 = a1·b1, m3 = a2·b2, m4 = a3·(b1 - b2)
output1 = m2 + m3, output2 = m1 + m2 - m4
from which the transformation matrices of the convolution can be obtained.
The above method of obtaining the transformation matrices is merely illustrative and should not be understood as limiting the present disclosure. For high-dimensional matrices, the convolutional transformation matrices can be obtained through multiple matrix partitionings. The winograd algorithm has different matrix-blocking methods; for a given blocking method, the specific values and dimensions of the transformation matrices are determined by the dimensions of the input neuron and weight matrices and by the convolution sliding stride, and for the specific transformation method reference can be made to existing winograd algorithms.
As can be seen from the above algorithm, the specific values and dimensions of the transformation matrices are determined by the dimensions of the input neurons and the weight matrix; the specific influencing factors are the dimension of the input neurons, the dimension of the weight matrix, and the sliding stride of each convolution operation. Once these three factors are fixed, the values and dimensions of the transformation matrices are fixed as well. Since in a neural network structure these three factors can be set in advance, in this embodiment the setting of each transformation matrix is completed offline or in a preprocessing stage.
In addition, the values in the neuron matrix and the weight matrix are binary, and the element values in the transformation matrices C and G are ±2^n or 0, for example -2, -1, -0.5, 0, 0.5, 1, 2, and so on. The embodiment of the present disclosure thus implements the winograd transform with bit operations, realizing multiplication by 2 and division by 2 through left and right shifts. For example, when a value in the neuron matrix d0 is multiplied by 0.5, the value is shifted right by one bit; when it is multiplied by -0.5, the value is shifted right by one bit and its sign is inverted. Therefore no dedicated multiplier needs to take part in this step: the shift operator 111 and the adder 115 suffice to complete the whole matrix operation. The embodiment of the present disclosure thereby implements the winograd transform through bit operations, reducing the amount of computation and increasing the operation speed.
Step 2: performing, by the matrix multiplication operator 112, the element-wise matrix multiplication of the transformed neuron matrix and the transformed weight matrix, obtaining the multiplication matrix t.
It should be noted that in an ordinary convolution process the two matrices participating in the operation may have different scales, so multiple matrix multiplications have to be performed through a sliding operation; in the embodiment of the present disclosure, however, the transformed neuron matrix d and the transformed weight matrix w conform to the matrix multiplication rule, so only one matrix multiplication operation is performed, which greatly reduces the amount of computation.
In addition, when two matrices are multiplied, if some elements of one matrix are known to have the value 0, the values obtained by multiplying them with the corresponding elements of the other matrix are necessarily 0. In the actual computation these products therefore need not participate in the operation at all, which saves unnecessary work. Accordingly, the embodiment of the present disclosure maps the transformed weight matrix, through the sparsification processing unit 113, to a sparse sequence consisting of "0"s and "1"s, where "0" corresponds to an element whose value is "0" in the transformed weight matrix and "1" to an element whose value is not 0. Referring to FIG. 3, the mapping unit 114 then forms a mapping relationship table between the sparse sequence and the element positions in the transformed neuron matrix, the K-th bit of the sparse sequence corresponding to the element in row i, column j of the M-row × N-column neuron matrix and satisfying (i-1)×N+j=K or (j-1)×M+i=K. When the matrix multiplication is executed, the elements at the corresponding positions in the transformed neuron matrix are extracted according to the "1"s recorded in the sparse sequence, to be multiplied with the corresponding elements of the transformed weight matrix.
For example:
the sparse sequence corresponding to w is 1110111011101100 (read row by row; it can also be read column by column). When the matrix multiplication is executed it follows from this sequence that [d03, d13, d23, d32, d33] in the transformed neuron matrix do not participate in the operation. Using the sparse sequence can therefore further reduce the amount of computation of the matrix multiplication. In practical neural network applications the sparse sequence can be generated offline, and the storage it occupies is very small relative to the storage savings brought by sparsification, so this process does not affect the operation speed or storage of the neural network.
Step 3: performing, by the shift operator 111 and the adder 115, the winograd inverse transform on the multiplication matrix, obtaining the convolution operation result.
In this step, the multiplication matrix t is subjected to the winograd inverse transform using the following formula, yielding the operation result output:
output = A^T t A,
where A is the inverse transformation matrix and A^T is the transposed matrix of A.
It should be noted that the inverse transformation matrix A, like C and G, is obtained using the winograd algorithm; the specific process is not repeated here. In addition, the values of the inverse transformation matrix A are also 0 or ±2^n, with n an integer, so the operations between values can likewise be realized through bit operations.
It should also be noted that throughout the above three steps the controller controls the shift operator to perform the winograd transform or the winograd inverse transform, and also controls the matrix multiplication operator to perform the matrix multiplication operation.
A specific example is given below to further illustrate the process by which the neural network convolution operation device of the embodiment of the present disclosure performs a convolution operation.
FIG. 4 schematically shows a convolution operation. As shown in FIG. 4, the convolution kernel is a 3×3 matrix and slides over the input image; the convolution kernel in the figure is the layer weight matrix of the present disclosure, and the input image is the neuron matrix of the present disclosure.
For the convolution operation commonly used in neural networks, assuming a slide of one pixel each time, four convolution operations are needed in total; in each convolution operation the convolution kernel and the corresponding image data are multiplied and accumulated. Therefore, for different output neurons on the same output feature map, the required input neurons differ while the weights and connection relationships are identical. For example, the first convolution result is computed as 1*1+1*0+1*1+0*0+1*1+1*0+0*1+0*0+1*1=4, the second as 1*1+0*1+1*0+1*0+1*1+1*0+0*1+1*0+1*1=3, and so on.
FIG. 5 schematically shows, in connection with the apparatus described in the embodiments of the present disclosure, the process by which the embodiment of the present disclosure performs the convolution operation of FIG. 4. As shown in FIG. 5:
Step S1: the controller 120 reads an instruction from the memory.
Step S2: the controller 120 decodes it into a microinstruction, after which the neural network convolution operation device 100 reads in, from the external address space and according to the microinstruction, the data required for the convolution operation, including the neuron matrix d0 and the weight matrix w0; the transformation matrices C and G and the inverse transformation matrix A are then obtained for the example of FIG. 4.
Step S3: the shift operator 111 reads the neuron matrix d0 and the weight matrix w0 from the memory or the data cache, and the shift operator 111 and the adder 115 perform the winograd transform on d0 and w0, namely d = C^T d0 C and w = G w0 G^T.
Step S4: the sparsification processing unit 113 obtains the sparse sequence from the transformed weight matrix w, namely [1110111011101100]: following the mapping relationship of the mapping unit 114, the weight matrix is traversed, each nonzero value is marked with bit 1 and each zero value with bit 0, and the resulting bit sequence, whose length equals the number of values in the weight matrix, is taken as the sparse sequence.
Step S5: the matrix multiplication operator 112 selects the corresponding neurons and weights according to the sparse sequence and performs the multiplications, completing the element-wise matrix multiplication of the input neurons and the weights; according to the index sequence, [d03, d13, d23, d32, d33] in the transformed neuron matrix d do not participate in the operation, finally yielding the operation result.
Step S6: the shift operator 111 and the adder 115 perform the winograd inverse transform on the result of the matrix multiplication, obtaining the output output = A^T t A.
The present disclosure also discloses a neural network computing device, which includes one or more of the neural network convolution operation devices mentioned in the present disclosure, for acquiring data to be operated on and control information from other processing devices and executing a specified neural network operation, the execution result being passed to peripheral devices through an I/O interface. Peripheral devices include, for example, cameras, monitors, mice, keyboards, network cards, wifi interfaces and servers. When more than one neural network convolution operation device is included, the devices can be linked and transmit data through a specific structure, for example interconnecting and transmitting data through a PCIE bus, so as to support larger-scale neural network operations. In this case the devices may share a single control system or have their own control systems; they may share memory, or each accelerator may have its own memory. Moreover, the interconnection method can be any interconnection topology.
The neural network computing device has high compatibility and can be connected to various types of servers through a PCIE interface.
The present disclosure also discloses a combined processing apparatus, which includes the above neural network computing device, a universal interconnect interface, and other processing devices. The neural network computing device interacts with the other processing devices (which perform non-neural-network operations) to jointly complete the operation specified by the user. FIG. 6 is a schematic structural diagram of the combined processing apparatus.
The other processing devices include one or more processor types among general-purpose/dedicated processors such as a central processing unit (CPU), a graphics processing unit (GPU), a neural network processor and the like; the number of processors included in the other processing devices is not limited. The other processing devices serve as the interface between the neural network computing device and external data and control, performing data transfers and completing basic control of the neural network computing device such as starting and stopping; the other processing devices can also cooperate with the neural network computing device to complete computing tasks together.
The universal interconnect interface is used for transmitting data and control instructions between the neural network computing device and the other processing devices. The neural network computing device acquires the required input data from the other processing devices and writes it to the on-chip storage of the neural network computing device; control instructions can be obtained from the other processing devices and written into the on-chip control cache of the neural network computing device; data in the storage module of the neural network computing device can also be read and transmitted to the other processing devices.
The combined processing apparatus can serve as the SOC (system on chip) of devices such as mobile phones, robots, drones and video surveillance equipment, effectively reducing the core area of the control part, increasing the processing speed and lowering the overall power consumption. In this case, the universal interconnect interface of the combined processing apparatus is connected to certain components of the device, such as a camera, a monitor, a mouse, a keyboard, a network card or a wifi interface.
In one embodiment, the present disclosure discloses a chip, which includes the above neural network computing device or combined processing apparatus.
In one embodiment, the present disclosure discloses a chip package structure, which includes the above chip.
In one embodiment, the present disclosure discloses a board card, which includes the above chip package structure.
In one embodiment, the present disclosure discloses an electronic device, which includes the above board card.
The electronic device may include a data processing device, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a mobile phone, a driving recorder, a navigator, a sensor, a webcam, a cloud server, a camera, a video camera, a projector, a watch, a headset, mobile storage, a wearable device, a vehicle, a household appliance and/or a medical device.
The vehicle includes an airplane, a ship and/or a car; the household appliance includes a television, an air conditioner, a microwave oven, a refrigerator, a rice cooker, a humidifier, a washing machine, an electric lamp, a gas stove and a range hood; the medical device includes a nuclear magnetic resonance instrument, a B-ultrasound scanner and/or an electrocardiograph.
The specific embodiments described above further explain in detail the purpose, technical solutions and beneficial effects of the present disclosure. It should be understood that the above are merely specific embodiments of the present disclosure and are not intended to limit it; any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present disclosure shall be included within the scope of protection of the present disclosure.
Claims (29)
- A neural network convolution operation device for implementing, by means of matrix multiplication, the convolution of a weight matrix with neurons in a neural network, comprising: a shift operator for respectively performing a winograd transformation on the neuron matrix and the weight matrix to obtain a transformed neuron matrix and a transformed weight matrix; a matrix multiplication operator for performing an element-wise matrix multiplication of the transformed neuron matrix and the transformed weight matrix to obtain a multiplication matrix; the shift operator being further used to perform a winograd inverse transformation on the multiplication matrix to obtain the convolution operation result; and a controller for controlling the shift operator to perform the winograd transform or the winograd inverse transform, and for controlling the matrix multiplication operator to perform the matrix multiplication operation.
- The neural network convolution operation device according to claim 1, further comprising an on-chip cache for storing the neuron matrix and the weight matrix, and for storing the transformation matrix C for performing a winograd transformation on the neuron matrix and the transformation matrix G for performing a winograd transformation on the weight matrix.
- The neural network convolution operation device according to claim 2, wherein the element values in the transformation matrix C and the transformation matrix G are independently ±2^n or 0, n being an integer.
- The neural network convolution operation device according to claim 2, wherein the on-chip cache is further used to store the winograd inverse transform matrix, namely an inverse transformation matrix A for performing a winograd inverse transformation on the neuron matrix.
- The neural network convolution operation device according to claim 4, wherein the element values in the inverse transformation matrix A are ±2^n or 0, n being an integer.
- The neural network convolution operation device according to claim 2, wherein the controller is further used to control the shift operator, according to the transformation matrix C or its transposed matrix C^T, to independently shift the binary value of each element in the neuron matrix or the weight matrix to the left or right; or, according to the transformation matrix G or its transposed matrix G^T, to control the shift operator to independently shift the binary value of each element in the weight matrix to the left or right.
- The neural network convolution operation device according to claim 4, wherein the controller is further used to independently shift the binary value of each element in the multiplication matrix to the left or right according to the inverse transformation matrix A or its transposed matrix A^T.
- The neural network convolution operation device according to claim 1, further comprising a sparsification processing unit for sparsifying the transformed weight matrix to generate a binary sparse sequence, wherein "0" corresponds to an element whose value is "0" in the transformed weight matrix and "1" corresponds to an element whose value is not 0; preferably, the bits of the sparse sequence, from most significant to least significant, correspond to the elements of the transformed weight matrix read row by row or column by column.
- The neural network convolution operation device according to claim 7, further comprising a mapping unit that generates a mapping relationship table between the sparse sequence and the element positions in the transformed neuron matrix, the K-th bit of the sparse sequence corresponding to the element in row i, column j of the M-row × N-column neuron matrix and satisfying (i-1)×N+j=K or (j-1)×M+i=K.
- The neural network convolution operation device according to claim 8, wherein the controller is further used to control the matrix multiplication operator to perform the matrix multiplication operation according to the mapping relationship table, a bit of "0" in the sparse sequence meaning that the corresponding element in the neuron matrix does not undergo matrix multiplication.
- The neural network convolution operation device according to claim 1, further comprising an adder for accumulating the results of the shift operations of the shift operator according to the matrix multiplication rule when the winograd transformation is performed on the neuron matrix and the weight matrix respectively.
- The neural network convolution operation device according to claim 9, further comprising a data cache unit for caching the sparse sequence and the mapping relationship table.
- A method for performing a convolution operation using the neural network convolution operation device according to any one of claims 1-12, comprising: performing, by a shift operator and an adder, a winograd transformation on the neuron matrix and the weight matrix respectively to obtain a transformed neuron matrix and a transformed weight matrix; performing, by a matrix multiplication operator, an element-wise matrix multiplication of the transformed neuron matrix and the transformed weight matrix to obtain a multiplication matrix; performing, by the shift operator and the adder, a winograd inverse transformation on the multiplication matrix to obtain the convolution operation result; and controlling, by a controller, the shift operator to perform the winograd transform or the winograd inverse transform, and controlling, by the controller, the matrix multiplication operator to perform the matrix multiplication operation.
- The method according to claim 13, further comprising: using an on-chip cache to store the neuron matrix and the weight matrix, as well as the transformation matrix C for performing a winograd transformation on the neuron matrix and the transformation matrix G for performing a winograd transformation on the weight matrix.
- The method according to claim 14, wherein the element values in the transformation matrix C and the transformation matrix G are independently ±2^n or 0, n being an integer.
- The method according to claim 14, wherein the on-chip cache is also used to store the winograd inverse transform matrix, as well as the inverse transformation matrix A for performing a winograd inverse transformation on the neuron matrix.
- The method according to claim 16, wherein the element values in the inverse transformation matrix A are ±2^n or 0, n being an integer.
- The method according to claim 13, further comprising acquiring the transformation matrix C for performing a winograd transformation on the neuron matrix and the transformation matrix G for performing a winograd transformation on the weight matrix, and storing the inverse transformation matrix A for performing a winograd inverse transformation on the neuron matrix, including: receiving the dimensions of the input neuron matrix and the weight matrix, as well as the sliding-stride data of the convolution operation; and determining, from the received data, the transformation matrix C, the transformation matrix G and the inverse transformation matrix A according to the winograd algorithm.
- The method according to claim 14, further comprising: controlling, by the controller according to the transformation matrix C or its transposed matrix C^T, the shift operator to independently shift the binary value of each element in the neuron matrix or the weight matrix to the left or right; or, according to the element values in the transformation matrix G or its transposed matrix G^T, controlling the shift operator to independently shift the binary value of each element in the weight matrix to the left or right, or set it to zero.
- The method according to claim 16, further comprising: independently shifting, by the controller according to the element values in the inverse transformation matrix A or its transposed matrix A^T, the binary value of each element in the multiplication matrix to the left or right, or setting it to zero.
- The method according to claim 12, further comprising: sparsifying the transformed weight matrix by a sparsification processing unit to generate a binary sparse sequence, wherein "0" corresponds to an element whose value is "0" in the transformed weight matrix and "1" corresponds to an element whose value is not 0; preferably, the bits of the sparse sequence, from most significant to least significant, correspond to the elements of the transformed weight matrix read row by row or column by column.
- The method according to claim 21, further comprising: generating, by a mapping unit, a mapping relationship table between the sparse sequence and the element positions in the transformed neuron matrix, the K-th bit of the sparse sequence corresponding to the element in row i, column j of the M-row × N-column neuron matrix and satisfying (i-1)×N+j=K or (j-1)×M+i=K.
- The method according to claim 24, wherein the matrix multiplication operator is controlled by the controller to perform the matrix multiplication operation according to the mapping relationship table, a bit of "0" in the sparse sequence meaning that the corresponding element in the neuron matrix does not undergo matrix multiplication.
- The method according to claim 13, comprising accumulating, by the adder, the results of the shift operations of the shift operator according to the matrix multiplication rule when the winograd transformation is performed on the neuron matrix and the weight matrix respectively.
- The method according to claim 22, further comprising caching the sparse sequence and the mapping relationship table by a data cache unit.
- A neural network computing apparatus, comprising a plurality of neural network convolution operation devices according to any one of claims 1-12, for acquiring data to be operated on and control information and executing a neural network operation.
- A combined processing apparatus, comprising the neural network convolution operation device according to any one of claims 1-12, a universal interconnect interface, and other processing devices for performing non-neural-network operations, the other processing devices being connected to the neural network computing device through the universal interconnect interface.
- A chip, comprising the neural network computing apparatus according to claim 26 or the combined processing apparatus according to claim 27.
- An electronic device, comprising the chip according to claim 28.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP17882134.4A EP3557484B1 (en) | 2016-12-14 | 2017-12-14 | Neural network convolution operation device and method |
US16/440,204 US10635965B2 (en) | 2016-12-14 | 2019-06-13 | Neural network convolution computation method and device, and computer-readable storage medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611152537 | 2016-12-14 | ||
CN201611152537.5 | 2016-12-14 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/440,204 Continuation-In-Part US10635965B2 (en) | 2016-12-14 | 2019-06-13 | Neural network convolution computation method and device, and computer-readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018108126A1 true WO2018108126A1 (zh) | 2018-06-21 |
Family
ID=62558042
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2017/116161 WO2018108126A1 (zh) | 2016-12-14 | 2017-12-14 | 神经网络卷积运算装置及方法 |
Country Status (4)
Country | Link |
---|---|
US (1) | US10635965B2 (zh) |
EP (1) | EP3557484B1 (zh) |
CN (1) | CN108229654B (zh) |
WO (1) | WO2018108126A1 (zh) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109190756A (zh) * | 2018-09-10 | 2019-01-11 | 中国科学院计算技术研究所 | 基于Winograd卷积的运算装置及包含该装置的神经网络处理器 |
US20210241083A1 (en) * | 2018-05-15 | 2021-08-05 | Mitsubishi Electric Corporation | Arithmetic device |
TWI842180B (zh) * | 2022-11-04 | 2024-05-11 | 瑞昱半導體股份有限公司 | 卷積電路與卷積計算方法 |
Families Citing this family (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3564864A4 (en) * | 2016-12-30 | 2020-04-15 | Shanghai Cambricon Information Technology Co., Ltd | DEVICES FOR COMPRESSION / DECOMPRESSION, SYSTEM, CHIP AND ELECTRONIC DEVICE |
CN109255432B (zh) * | 2018-08-22 | 2024-04-30 | 中国平安人寿保险股份有限公司 | 神经网络模型构建方法及装置、存储介质、电子设备 |
WO2020046859A1 (en) * | 2018-08-27 | 2020-03-05 | Neuralmagic Inc. | Systems and methods for neural network convolutional layer matrix multiplication using cache memory |
CN109190755B (zh) * | 2018-09-07 | 2021-07-20 | 中国科学院计算技术研究所 | 面向神经网络的矩阵转换装置及方法 |
CN109359730B (zh) * | 2018-09-26 | 2020-12-29 | 中国科学院计算技术研究所 | 面向固定输出范式Winograd卷积的神经网络处理器 |
CN109325591B (zh) * | 2018-09-26 | 2020-12-29 | 中国科学院计算技术研究所 | 面向Winograd卷积的神经网络处理器 |
CN109255434A (zh) * | 2018-10-15 | 2019-01-22 | 旺微科技(上海)有限公司 | 一种卷积神经网络中计算资源的调度方法及装置 |
US11068394B2 (en) * | 2018-10-29 | 2021-07-20 | Electronics And Telecommunications Research Institute | Neural network system including data moving controller |
CN111260020B (zh) * | 2018-11-30 | 2024-04-16 | 深圳市海思半导体有限公司 | 卷积神经网络计算的方法和装置 |
CN111382854B (zh) * | 2018-12-28 | 2021-03-23 | 广州市百果园信息技术有限公司 | 一种卷积神经网络处理方法、装置、设备及存储介质 |
CN109740739B (zh) * | 2018-12-29 | 2020-04-24 | 中科寒武纪科技股份有限公司 | 神经网络计算装置、神经网络计算方法及相关产品 |
CN111523655B (zh) * | 2019-02-03 | 2024-03-29 | 上海寒武纪信息科技有限公司 | 处理装置及方法 |
CN110097172B (zh) * | 2019-03-18 | 2021-10-29 | 中国科学院计算技术研究所 | 一种基于winograd卷积运算的卷积神经网络数据处理方法及装置 |
CN110580519B (zh) * | 2019-08-19 | 2022-03-22 | 中国科学院计算技术研究所 | 一种卷积运算装置及其方法 |
DE102019214402A1 (de) * | 2019-09-20 | 2021-03-25 | Robert Bosch Gmbh | Verfahren und vorrichtung zum verarbeiten von daten mittels eines neuronalen konvolutionsnetzwerks |
KR20210037569A (ko) * | 2019-09-27 | 2021-04-06 | 삼성전자주식회사 | 컨볼루션 신경망 가속기 아키텍처를 위한 전력 효율적인 하이브리드 트래버설 장치 및 방법 |
CN112784207B (zh) * | 2019-11-01 | 2024-02-02 | 中科寒武纪科技股份有限公司 | 运算方法及相关产品 |
CN112784206A (zh) * | 2019-11-01 | 2021-05-11 | 中科寒武纪科技股份有限公司 | winograd卷积运算方法、装置、设备及存储介质 |
CN112766473B (zh) * | 2019-11-01 | 2023-12-05 | 中科寒武纪科技股份有限公司 | 运算装置及相关产品 |
CN112766472B (zh) * | 2019-11-01 | 2024-04-12 | 中科寒武纪科技股份有限公司 | 数据处理方法、装置、计算机设备和存储介质 |
CN112765537B (zh) * | 2019-11-01 | 2024-08-23 | 中科寒武纪科技股份有限公司 | 数据处理方法、装置、计算机设备和存储介质 |
CN112765539B (zh) * | 2019-11-01 | 2024-02-02 | 中科寒武纪科技股份有限公司 | 运算装置、方法及相关产品 |
CN113033813B (zh) * | 2019-12-09 | 2024-04-26 | 中科寒武纪科技股份有限公司 | 数据处理方法、装置、计算机设备和存储介质 |
CN111240746B (zh) * | 2020-01-12 | 2023-01-10 | 苏州浪潮智能科技有限公司 | 一种浮点数据反量化及量化的方法和设备 |
US11216375B2 (en) * | 2020-02-26 | 2022-01-04 | Hangzhou Zhicun Intelligent Technology Co., Ltd. | Data caching |
CN113313228B (zh) * | 2020-02-26 | 2022-10-14 | 杭州知存智能科技有限公司 | 数据缓存电路和方法 |
US11379557B2 (en) * | 2020-05-07 | 2022-07-05 | Meta Platforms, Inc. | Device and method for flexibly summing matrix values |
CN111814957B (zh) * | 2020-06-28 | 2024-04-02 | 深圳云天励飞技术股份有限公司 | 神经网络运算方法及相关设备 |
CN116113941A (zh) * | 2020-09-29 | 2023-05-12 | 华为技术有限公司 | 一种神经网络加速器、加速方法以及装置 |
CN112765552B (zh) * | 2021-01-21 | 2024-05-07 | 中国科学院重庆绿色智能技术研究院 | 基于数组打包的矩阵乘法的分块参数空间优化方法 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5473730A (en) * | 1993-11-09 | 1995-12-05 | At&T Ipm Corp. | High efficiency learning network |
CN104915322A (zh) * | 2015-06-09 | 2015-09-16 | 中国人民解放军国防科学技术大学 | 一种卷积神经网络硬件加速方法及其axi总线ip核 |
CN106066783A (zh) * | 2016-06-02 | 2016-11-02 | 华为技术有限公司 | 基于幂次权重量化的神经网络前向运算硬件结构 |
CN106203617A (zh) * | 2016-06-27 | 2016-12-07 | 哈尔滨工业大学深圳研究生院 | 一种基于卷积神经网络的加速处理单元及阵列结构 |
Family Cites Families (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2001018743A1 (en) * | 1999-09-03 | 2001-03-15 | Cheng T C | Fast and efficient computation of cubic-spline interpolation for data compression |
US6947916B2 (en) * | 2001-12-21 | 2005-09-20 | Quicksilver Technology, Inc. | IC for universal computing with near zero programming complexity |
US7296045B2 (en) * | 2004-06-10 | 2007-11-13 | Hasan Sehitoglu | Matrix-valued methods and apparatus for signal processing |
WO2008153823A1 (en) * | 2007-06-08 | 2008-12-18 | Thomson Licensing | Method and apparatus for multi-lattice sparsity-based filtering |
US7885819B2 (en) * | 2007-06-29 | 2011-02-08 | Microsoft Corporation | Bitstream syntax for multi-process audio decoding |
US8731062B2 (en) * | 2008-02-05 | 2014-05-20 | Ntt Docomo, Inc. | Noise and/or flicker reduction in video sequences using spatial and temporal processing |
US8972329B2 (en) * | 2008-05-02 | 2015-03-03 | The Board Of Trustees Of The Leland Stanford Junior University | Systems and methods for ranking nodes of a graph using random parameters |
CN101710393A (zh) * | 2009-11-25 | 2010-05-19 | 北京航空航天大学 | 一种专家系统知识表示机制和推理方法 |
US20110231355A1 (en) * | 2010-03-22 | 2011-09-22 | University Of Seoul Foundation Of Industry Academic Cooperation | Intelligent ubiquitous-city middleware apparatus and the u-city system having the same |
US10453479B2 (en) * | 2011-09-23 | 2019-10-22 | Lessac Technologies, Inc. | Methods for aligning expressive speech utterances with text and systems therefor |
CN102571768B (zh) * | 2011-12-26 | 2014-11-26 | 北京大学 | 一种钓鱼网站检测方法 |
CN103106181B (zh) * | 2013-01-29 | 2016-03-02 | 北京理工大学 | 一种大点数fft在处理器上的实现方法 |
CN107563497B (zh) * | 2016-01-20 | 2021-03-19 | 中科寒武纪科技股份有限公司 | 用于稀疏人工神经网络的计算装置和运算方法 |
US20170344876A1 (en) * | 2016-05-31 | 2017-11-30 | Samsung Electronics Co., Ltd. | Efficient sparse parallel winograd-based convolution scheme |
US9646243B1 (en) * | 2016-09-12 | 2017-05-09 | International Business Machines Corporation | Convolutional neural networks using resistive processing unit array |
-
2017
- 2017-12-14 CN CN201711343539.7A patent/CN108229654B/zh active Active
- 2017-12-14 WO PCT/CN2017/116161 patent/WO2018108126A1/zh unknown
- 2017-12-14 EP EP17882134.4A patent/EP3557484B1/en active Active
-
2019
- 2019-06-13 US US16/440,204 patent/US10635965B2/en active Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5473730A (en) * | 1993-11-09 | 1995-12-05 | At&T Ipm Corp. | High efficiency learning network |
CN104915322A (zh) * | 2015-06-09 | 2015-09-16 | 中国人民解放军国防科学技术大学 | 一种卷积神经网络硬件加速方法及其axi总线ip核 |
CN106066783A (zh) * | 2016-06-02 | 2016-11-02 | 华为技术有限公司 | 基于幂次权重量化的神经网络前向运算硬件结构 |
CN106203617A (zh) * | 2016-06-27 | 2016-12-07 | 哈尔滨工业大学深圳研究生院 | 一种基于卷积神经网络的加速处理单元及阵列结构 |
Non-Patent Citations (2)
Title |
---|
See also references of EP3557484A4 * |
TAN, FUPING ET AL.: "A New Scheme to Divide Odd-Sized Matrices for the Winograd's Algorithm", COMMUNICATION ON APPLIED MATHEMATICS AND COMPUTATION, vol. 18, no. 1, 30 June 2004 (2004-06-30), pages 92 - 96, XP009515379 *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210241083A1 (en) * | 2018-05-15 | 2021-08-05 | Mitsubishi Electric Corporation | Arithmetic device |
CN109190756A (zh) * | 2018-09-10 | 2019-01-11 | 中国科学院计算技术研究所 | 基于Winograd卷积的运算装置及包含该装置的神经网络处理器 |
CN109190756B (zh) * | 2018-09-10 | 2022-02-18 | 中国科学院计算技术研究所 | 基于Winograd卷积的运算装置及包含该装置的神经网络处理器 |
TWI842180B (zh) * | 2022-11-04 | 2024-05-11 | 瑞昱半導體股份有限公司 | 卷積電路與卷積計算方法 |
Also Published As
Publication number | Publication date |
---|---|
EP3557484A4 (en) | 2020-03-25 |
CN108229654A (zh) | 2018-06-29 |
EP3557484B1 (en) | 2021-11-17 |
CN108229654B (zh) | 2020-08-14 |
EP3557484A1 (en) | 2019-10-23 |
US20190311242A1 (en) | 2019-10-10 |
US10635965B2 (en) | 2020-04-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2018108126A1 (zh) | 神经网络卷积运算装置及方法 | |
CN109101273B (zh) | 神经网络处理装置及其执行向量最大值指令的方法 | |
US11710041B2 (en) | Feature map and weight selection method and accelerating device | |
CN110084361B (zh) | 一种运算装置和方法 | |
KR102354722B1 (ko) | 계산 장치 및 방법 | |
WO2017185391A1 (zh) | 一种用于执行卷积神经网络训练的装置和方法 | |
WO2018107383A1 (zh) | 神经网络的卷积运算方法、装置及计算机可读存储介质 | |
CN108229656A (zh) | 神经网络运算装置及方法 | |
CN108320018B (zh) | 一种人工神经网络运算的装置及方法 | |
CN111381871B (zh) | 运算方法、装置及相关产品 | |
CN117933314A (zh) | 处理装置、处理方法、芯片及电子装置 | |
CN109521994A (zh) | 乘法硬件电路、片上系统及电子设备 | |
WO2019215907A1 (ja) | 演算処理装置 | |
CN113065997B (zh) | 一种图像处理方法、神经网络的训练方法以及相关设备 | |
WO2021185262A1 (zh) | 计算装置、方法、板卡和计算机可读存储介质 | |
CN108960420B (zh) | 处理方法及加速装置 | |
CN111784557A (zh) | 一种处理图像数据的方法、装置、板卡及可读存储介质 | |
CN111382850A (zh) | 运算方法、装置及相关产品 | |
CN111401536A (zh) | 运算方法、装置及相关产品 | |
CN112394996A (zh) | 八位整形转半精度浮点指令处理装置、方法及相关产品 | |
CN114691083A (zh) | 矩阵乘法电路、方法及相关产品 | |
CN113934678A (zh) | 一种计算装置、集成电路芯片、板卡、设备和计算方法 | |
CN112394989A (zh) | 无符号转半精度浮点指令处理装置、方法及相关产品 | |
CN112394993A (zh) | 半精度浮点转短整形指令处理装置、方法及相关产品 | |
CN112394990A (zh) | 浮点转半精度浮点指令处理装置、方法及相关产品 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 17882134 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 2017882134 Country of ref document: EP Effective date: 20190715 |