WO2023098256A1 - Neural network operation method and apparatus, chip, electronic device and storage medium - Google Patents


Info

Publication number
WO2023098256A1
WO2023098256A1 · PCT/CN2022/121427 · CN2022121427W
Authority
WO
WIPO (PCT)
Prior art keywords
sub
convolution
convolution kernel
input data
result
Prior art date
Application number
PCT/CN2022/121427
Other languages
French (fr)
Chinese (zh)
Inventor
徐东 (Xu Dong)
熊先奎 (Xiong Xiankui)
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Publication of WO2023098256A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/15: Correlation function computation including computation of convolution operations
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The data rearrangement mode corresponding to the first sub-convolution kernel group is set so that the position of each part of the input data remains unchanged.
  • After the operation of the i-th sub-convolution kernel group is completed, the value of i is checked: if i is less than Wk*Hk, i is updated to i+1, the (i+1)-th sub-convolution kernel group is loaded, and the operation step is executed again; if i equals Wk*Hk, the operation step ends, and the data at the effective positions in the i-th accumulation result is taken as the output result of the neural network operation.
  • The step division of the above methods is only for clarity of description; in implementation, steps may be combined into one, or a step may be split into multiple steps. As long as the same logical relationship is preserved, such variations fall within the protection scope of this patent. Adding insignificant modifications to, or introducing insignificant designs into, the algorithm or process without changing its core design also falls within the protection scope of this patent.
  • The convolution unit is configured to convolve each sub-convolution kernel group with the rearranged input data corresponding to that group to obtain a convolution result for each group, and to output each group's convolution result to the addition unit.

Abstract

The present application relates to a neural network operation method and apparatus, a chip, an electronic device, and a storage medium. The neural network operation method comprises: acquiring input data and Wk*Hk sub-convolution kernel groups for the neural network operation, and executing an operation step, the N Wk*Hk*C convolution kernels of the neural network operation being split into N*Wk*Hk 1*1*C sub-convolution kernels, and these sub-convolution kernels being divided into the Wk*Hk sub-convolution kernel groups. The operation step comprises: rearranging the input data on the basis of a data rearrangement mode corresponding to each sub-convolution kernel group to obtain rearranged input data corresponding to that group; convolving each sub-convolution kernel group with its rearranged input data to obtain the convolution result of that group; and accumulating the convolution results of the sub-convolution kernel groups to obtain an accumulation result, the data located at the valid positions in the accumulation result being taken as the output result of the neural network operation.

Description

Neural network operation method and apparatus, chip, electronic device and storage medium
Related Application
This application claims priority to Chinese patent application No. 202111466758.0, filed on December 3, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present application relate to the field of data computing, and in particular to a neural network operation method and apparatus, a chip, an electronic device, and a storage medium.
Background
Roughly 90% of the computation in a neural network lies in convolution and fully connected layers, and a fully connected layer is essentially a special kind of convolution operation. Convolution is currently implemented mostly by converting it into matrix operations, realized with systolic arrays or General Matrix Multiplication (GEMM). Existing research on neural networks focuses mainly on how to implement the multiplication and addition operations in convolution efficiently, while ignoring the impact of data access on computing efficiency and the increase in power consumption caused by memory access.
To simplify scheduling, existing neural network accelerators usually use img2col to arrange weights and activation data. After both the weights and the input data have been processed by img2col, the two matrices are fed into the matrix operation unit, and the result of multiplying the two matrices is the output of the neural network convolution. Applying img2col to the weight data does not increase its size; the data only needs to be rearranged, and since the weights can be laid out offline, weight img2col incurs no extra overhead. For the input data, however, img2col significantly increases the data volume because of the convolution sliding window. As shown in FIG. 1, for an original input image with W=10 and H=10, the total amount of data is 10*10=100, while after img2col it is 64*9=576, an expansion of nearly 6 times; for larger input sizes (W*H), the theoretical expansion approaches K_W*K_H times, where K_W and K_H are the convolution kernel dimensions. img2col can be implemented in software or hardware, but either way it increases accesses to the input data, which raises dynamic power consumption. Moreover, since neural network computation is itself memory-bound, the increased data volume also degrades performance.
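The data expansion described above can be reproduced with a short numpy sketch. This is our own illustrative code, not part of the patent (the patent writes "img2col" for the transform commonly called im2col); it assumes stride 1 and no padding:

```python
import numpy as np

H, W = 10, 10          # original input size, as in the example above
Kh, Kw = 3, 3          # a 3x3 sliding window

x = np.arange(H * W).reshape(H, W)

# img2col: one row per sliding-window position, one column per kernel element
rows = [x[i:i + Kh, j:j + Kw].ravel()
        for i in range(H - Kh + 1)
        for j in range(W - Kw + 1)]
col = np.stack(rows)

print(x.size)      # 100 elements in the original input
print(col.shape)   # (64, 9): 64 window positions x 9 kernel elements = 576
```

The 100-element input becomes 576 elements, the near-6x expansion cited above; as W*H grows, the ratio approaches Kh*Kw.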
Summary of the Invention
The main purpose of the embodiments of the present application is to provide a neural network operation method and apparatus, an electronic device, and a storage medium, aiming to eliminate the hardware design overhead, the increased data access volume, and the increased dynamic power consumption caused by img2col.
To achieve the above purpose, an embodiment of the present application provides a neural network operation method, including: acquiring the input data of the neural network operation and Wk*Hk sub-convolution kernel groups, and entering an operation step. The N Wk*Hk*C convolution kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolution kernels, which are divided into the Wk*Hk sub-convolution kernel groups, each group including N 1*1*C sub-convolution kernels; N, Wk, Hk and C are all integers greater than or equal to 1. Each sub-convolution kernel corresponds to part of the input data, and when N≥2, the N sub-convolution kernels in each group correspond to the same part of the input data. The operation step includes: rearranging the input data according to the data rearrangement mode corresponding to each sub-convolution kernel group to obtain the rearranged input data corresponding to that group; convolving each sub-convolution kernel group with its rearranged input data to obtain the convolution result corresponding to that group; and accumulating the convolution results of all groups to obtain an accumulation result, the data at the effective positions in the accumulation result being taken as the output result of the neural network operation. In the rearranged input data of each group, the part of the input data corresponding to that group occupies the same data positions, and these shared data positions are the effective positions.
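The operation step above can be sketched in numpy. This is a minimal illustration under assumed stride-1, valid-only convolution; the function names are our own, and the "rearrangement" of the input reduces here to shifted slicing so that each group's corresponding part of the input data sits at the same positions:

```python
import numpy as np

def conv2d_direct(x, k):
    # Reference: direct valid convolution; x: (H, W, C), k: (N, Hk, Wk, C)
    H, W, C = x.shape
    N, Hk, Wk, _ = k.shape
    out = np.zeros((H - Hk + 1, W - Wk + 1, N))
    for i in range(H - Hk + 1):
        for j in range(W - Wk + 1):
            out[i, j] = np.tensordot(x[i:i + Hk, j:j + Wk], k,
                                     axes=([0, 1, 2], [1, 2, 3]))
    return out

def conv2d_split(x, k):
    # Split each Wk*Hk*C kernel into Wk*Hk 1*1*C sub-kernels; the group at
    # offset (kh, kw) is convolved with the correspondingly rearranged
    # (here: shifted) input, and the group results are accumulated.
    H, W, C = x.shape
    N, Hk, Wk, _ = k.shape
    acc = np.zeros((H - Hk + 1, W - Wk + 1, N))
    for kh in range(Hk):
        for kw in range(Wk):
            group = k[:, kh, kw, :]                       # N sub-kernels, (N, C)
            shifted = x[kh:kh + H - Hk + 1, kw:kw + W - Wk + 1, :]
            acc += shifted @ group.T                      # 1x1 conv == channel matmul
    return acc

rng = np.random.default_rng(0)
x = rng.standard_normal((10, 10, 4))
k = rng.standard_normal((2, 3, 3, 4))    # N=2, Hk=Wk=3, C=4
assert np.allclose(conv2d_direct(x, k), conv2d_split(x, k))
```

Note how `conv2d_split` never materializes an img2col matrix: each 1*1*C group reads the original input directly, which is the data-reuse property the method claims.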
To achieve the above purpose, an embodiment of the present application further provides a neural network operation method, including: acquiring the input data of the neural network operation and Wk*Hk sub-convolution kernel groups, and entering an operation step. The N Wk*Hk*C convolution kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolution kernels, which are divided into the Wk*Hk sub-convolution kernel groups, each group including N 1*1*C sub-convolution kernels; N, Wk, Hk and C are all integers greater than or equal to 1. Each sub-convolution kernel corresponds to part of the input data, and when N≥2, the N sub-convolution kernels in each group correspond to the same part of the input data. The operation step includes: convolving each sub-convolution kernel group with the input data to obtain the convolution result corresponding to that group; rearranging the convolution result of each group according to the data rearrangement mode corresponding to that group to obtain the rearranged convolution result of that group; and accumulating the rearranged convolution results of all groups to obtain an accumulation result, the data at the effective positions in the accumulation result being taken as the output result of the neural network operation. Convolving each sub-convolution kernel group with its corresponding part of the input data yields the effective convolution result of that group; in the rearranged convolution results, the effective convolution results of all groups occupy the same data positions, and these shared data positions are the effective positions.
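This second variant, which rearranges the convolution results rather than the input, can be sketched the same way. In this hypothetical illustration (stride 1 assumed), `np.roll` stands in for the result rearrangement and the bottom-right block plays the role of the effective positions; that layout is our illustrative choice, not one mandated by the patent:

```python
import numpy as np

def conv2d_rearranged_results(x, k):
    # Each 1*1*C sub-kernel group is first convolved with the *whole* input,
    # then its full-size result is rearranged so that every group's effective
    # convolution results land at the same data positions before accumulation.
    H, W, C = x.shape
    N, Hk, Wk, _ = k.shape
    acc = np.zeros((H, W, N))
    for kh in range(Hk):
        for kw in range(Wk):
            full = x @ k[:, kh, kw, :].T                  # (H, W, N) 1x1-group conv
            acc += np.roll(full, (Hk - 1 - kh, Wk - 1 - kw), axis=(0, 1))
    return acc[Hk - 1:, Wk - 1:, :]                       # data at effective positions

rng = np.random.default_rng(1)
x = rng.standard_normal((10, 10, 4))
k = rng.standard_normal((2, 3, 3, 4))
out = conv2d_rearranged_results(x, k)
# spot-check one output element against the direct window sum
assert np.allclose(out[0, 0],
                   np.tensordot(x[0:3, 0:3], k, axes=([0, 1, 2], [1, 2, 3])))
```

Wrapped-around values from `np.roll` land only outside the effective rows and columns, so the final slice discards them.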
To achieve the above purpose, an embodiment of the present application further provides a neural network operation method, including: acquiring the input data of the neural network operation and Wk*Hk sub-convolution kernel groups, and entering an operation step. The N Wk*Hk*C convolution kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolution kernels, which are divided into the Wk*Hk sub-convolution kernel groups, each group including N 1*1*C sub-convolution kernels; N, Wk, Hk and C are all integers greater than or equal to 1. Each sub-convolution kernel corresponds to part of the input data, and when N≥2, the N sub-convolution kernels in each group correspond to the same part of the input data. The operation step includes: convolving the i-th sub-convolution kernel group with the input data to obtain the i-th convolution result, where convolving the i-th group with its corresponding part of the input data yields the effective convolution result of that group, and the i-th convolution result contains this effective convolution result; rearranging the (i-1)-th accumulation result so that its effective convolution results and those of the i-th convolution result occupy the same data positions; accumulating the rearranged (i-1)-th accumulation result with the i-th convolution result to obtain the i-th accumulation result; if i is less than Wk*Hk, updating i to i+1 and executing the operation step again; and if i equals Wk*Hk, taking the effective convolution results in the i-th accumulation result as the output result of the neural network operation. The initial value of i is 1; when i=1, the 0th accumulation result is set to zero, and the rearranged 0th accumulation result is regarded by default as having its effective convolution results at the same data positions as those of the 1st convolution result.
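The iterative variant keeps a single full-size accumulator and rearranges it before each addition. A sketch under the same assumptions as before (stride 1, our own naming, bottom-right block as the effective positions); the per-step rearrangement is modelled as an `np.roll` by the offset between consecutive sub-kernel groups:

```python
import numpy as np

def conv2d_iterative(x, k):
    # i runs over the Wk*Hk sub-kernel groups; the (i-1)-th accumulation
    # result is rearranged so its effective convolution results align with
    # those of the i-th convolution result, then the two are accumulated.
    H, W, C = x.shape
    N, Hk, Wk, _ = k.shape
    acc = np.zeros((H, W, N))        # 0th accumulation result: zero
    prev = (0, 0)                    # aligned with group 1 by default
    for kh in range(Hk):
        for kw in range(Wk):
            full = x @ k[:, kh, kw, :].T                 # i-th convolution result
            dy, dx = kh - prev[0], kw - prev[1]
            acc = np.roll(acc, (dy, dx), axis=(0, 1))    # rearrange (i-1)-th result
            acc += full                                  # i-th accumulation result
            prev = (kh, kw)
    return acc[Hk - 1:, Wk - 1:, :]  # effective convolution results

rng = np.random.default_rng(2)
x = rng.standard_normal((10, 10, 4))
k = rng.standard_normal((2, 3, 3, 4))
out = conv2d_iterative(x, k)
# spot-check one output element against the direct window sum
assert np.allclose(out[0, 0],
                   np.tensordot(x[0:3, 0:3], k, axes=([0, 1, 2], [1, 2, 3])))
```

Because the shifts compose, every valid contribution ends up at the same final position regardless of the order in which the groups are visited, matching the overwrite-in-place accumulation the apparatus below describes.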
To achieve the above purpose, an embodiment of the present application further provides a neural network operation apparatus, including a first storage unit, a second storage unit, a control unit, a first data rearrangement unit, a convolution unit, and an addition unit. The first storage unit stores the input data of the neural network operation, and the second storage unit stores the Wk*Hk sub-convolution kernel groups of the neural network operation, where the N Wk*Hk*C convolution kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolution kernels, which are divided into the Wk*Hk sub-convolution kernel groups, each group including N 1*1*C sub-convolution kernels; N, Wk, Hk and C are all integers greater than or equal to 1; each sub-convolution kernel corresponds to part of the input data, and when N≥2, the N sub-convolution kernels in each group correspond to the same part of the input data. The control unit acquires the input data from the first storage unit and inputs it into the first data rearrangement unit, and also sends the data rearrangement mode corresponding to each sub-convolution kernel group to the first data rearrangement unit. The first data rearrangement unit rearranges the input data according to the data rearrangement mode corresponding to each group, obtains the rearranged input data corresponding to each group, and outputs it to the convolution unit; in the rearranged input data of each group, the part of the input data corresponding to that group occupies the same data positions, and these shared data positions are the effective positions. The control unit further acquires each sub-convolution kernel group from the second storage unit and sends it to the convolution unit. The convolution unit convolves each sub-convolution kernel group with its rearranged input data to obtain the convolution result corresponding to that group, and outputs each group's convolution result to the addition unit. The addition unit accumulates the convolution results of all groups to obtain an accumulation result, and takes the data at the effective positions in the accumulation result as the output result of the neural network operation.
To achieve the above purpose, an embodiment of the present application further provides a neural network operation apparatus, including a first storage unit, a second storage unit, a control unit, a second data rearrangement unit, a convolution unit, and an addition unit. The first storage unit stores the input data of the neural network operation, and the second storage unit stores the Wk*Hk sub-convolution kernel groups of the neural network operation, where the N Wk*Hk*C convolution kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolution kernels, which are divided into the Wk*Hk sub-convolution kernel groups, each group including N 1*1*C sub-convolution kernels; N, Wk, Hk and C are all integers greater than or equal to 1; each sub-convolution kernel corresponds to part of the input data, and when N≥2, the N sub-convolution kernels in each group correspond to the same part of the input data. The control unit acquires the input data from the first storage unit and inputs it into the convolution unit, and also acquires each sub-convolution kernel group from the second storage unit and inputs it into the convolution unit. The convolution unit convolves each sub-convolution kernel group with the input data to obtain the convolution result corresponding to that group, and outputs each group's convolution result to the second data rearrangement unit. The control unit further sends the data rearrangement mode corresponding to each group to the second data rearrangement unit. The second data rearrangement unit rearranges the convolution result of each group according to the data rearrangement mode corresponding to that group, obtains the rearranged convolution result of each group, and outputs it to the addition unit. Convolving each sub-convolution kernel group with its corresponding part of the input data yields the effective convolution result of that group; in the rearranged convolution results, the effective convolution results of all groups occupy the same data positions, and these shared data positions are the effective positions.
To achieve the above purpose, an embodiment of the present application further provides a neural network operation apparatus, including a first storage unit, a second storage unit, a third storage unit, a control unit, a third data rearrangement unit, a convolution unit, and an addition unit. The first storage unit stores the input data of the neural network operation, and the second storage unit stores the Wk*Hk sub-convolution kernel groups of the neural network operation, where the N Wk*Hk*C convolution kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolution kernels, which are divided into the Wk*Hk sub-convolution kernel groups, each group including N 1*1*C sub-convolution kernels; N, Wk, Hk and C are all integers greater than or equal to 1; each sub-convolution kernel corresponds to part of the input data, and when N≥2, the N sub-convolution kernels in each group correspond to the same part of the input data. The control unit acquires the input data from the first storage unit and inputs it into the convolution unit, and also acquires the i-th sub-convolution kernel group from the second storage unit and inputs it into the convolution unit. The convolution unit convolves the i-th sub-convolution kernel group with the input data to obtain the i-th convolution result and outputs it to the addition unit; convolving the i-th group with its corresponding part of the input data yields the effective convolution result of that group, and the i-th convolution result contains this effective convolution result. The control unit further acquires the (i-1)-th accumulation result from the third storage unit and sends it to the third data rearrangement unit. The third data rearrangement unit rearranges the (i-1)-th accumulation result so that its effective convolution results and those of the i-th convolution result occupy the same data positions, and outputs the rearranged (i-1)-th accumulation result to the addition unit. The addition unit accumulates the rearranged (i-1)-th accumulation result with the i-th convolution result to obtain the i-th accumulation result, which is stored in the third storage unit, overwriting the (i-1)-th accumulation result. The control unit further judges the value of i: if i is less than Wk*Hk, i is updated to i+1 and the operation step is executed again; if i equals Wk*Hk, the effective convolution results in the i-th accumulation result are taken as the output result of the neural network operation. The initial value of i is 1; when i=1, the 0th accumulation result is set to zero, and the rearranged 0th accumulation result is regarded by default as having its effective convolution results at the same data positions as those of the 1st convolution result.
To achieve the above purpose, an embodiment of the present application further provides a chip, including: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the above neural network operation method.
To achieve the above purpose, an embodiment of the present application further provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the above neural network operation method.
To achieve the above purpose, an embodiment of the present application further provides a computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the above neural network operation method.
In the neural network operation method proposed by the present application, the input data of the neural network operation and Wk*Hk sub-convolution kernel groups are acquired, and the operation step is entered. The N Wk*Hk*C convolution kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolution kernels, which are divided into the Wk*Hk sub-convolution kernel groups, each group including N 1*1*C sub-convolution kernels; N, Wk, Hk and C are all integers greater than or equal to 1; each sub-convolution kernel corresponds to part of the input data, and when N≥2, the N sub-convolution kernels in each group correspond to the same part of the input data. The operation step includes: rearranging the input data according to the data rearrangement mode corresponding to each sub-convolution kernel group to obtain the rearranged input data of each group; convolving each group with its rearranged input data to obtain the convolution result of each group; and accumulating the convolution results of all groups to obtain an accumulation result, the data at the effective positions in the accumulation result being taken as the output result of the neural network operation, where, in the rearranged input data of each group, the part of the input data corresponding to that group occupies the same data positions, and these shared positions are the effective positions. By splitting the convolution and reusing the input data, the data need not be rearranged by img2col to satisfy scheduling and computation; since no img2col transformation is applied to the input data and the computation is performed directly on the original input data, the hardware design overhead, the increased data access volume, and the increased dynamic power consumption caused by img2col are eliminated.
Description of Drawings
Fig. 1 is a schematic diagram of img2col processing of input data in the prior art;
Fig. 2 is a flowchart of a neural network operation method provided by an embodiment of this application;
Fig. 3 is a schematic diagram of input data provided by an embodiment of this application;
Fig. 4 is a schematic diagram of sub-convolution kernel groups provided by an embodiment of this application;
Fig. 5 is a schematic diagram of the valid positions of rearranged input data provided by an embodiment of this application;
Fig. 6 is a schematic diagram of splitting input data provided by an embodiment of this application;
Fig. 7 is a schematic diagram of convolving input data in the prior art;
Fig. 8 is a schematic diagram of convolving input data provided by an embodiment of this application;
Fig. 9 is a flowchart of a neural network operation method provided by an embodiment of this application;
Fig. 10 is a flowchart of a neural network operation method provided by an embodiment of this application;
Fig. 11 is a flowchart of a neural network operation method provided by an embodiment of this application;
Fig. 12 is a schematic structural diagram of a neural network operation apparatus provided by an embodiment of this application;
Fig. 13 is a schematic structural diagram of a neural network operation apparatus provided by an embodiment of this application;
Fig. 14 is a schematic structural diagram of a neural network operation apparatus provided by an embodiment of this application;
Fig. 15 is a schematic structural diagram of a chip provided by an embodiment of this application;
Fig. 16 is a schematic structural diagram of an electronic device provided by an embodiment of this application.
Detailed Description
To make the purpose, technical solutions and advantages of the embodiments of this application clearer, the embodiments of this application are described in detail below with reference to the accompanying drawings. However, those of ordinary skill in the art will understand that many technical details are given in the embodiments so that readers may better understand this application; the technical solutions claimed by this application can nevertheless be realized without these technical details and with various changes and modifications based on the following embodiments. The division into the following embodiments is for convenience of description and does not limit the specific implementation of this application; the embodiments may be combined with and refer to one another provided they do not contradict.
An embodiment of this application relates to a neural network operation method which, as shown in Fig. 2, includes:
Step 101: obtain the input data of the neural network operation and Wk*Hk sub-convolution kernel groups, and enter the operation step. The N Wk*Hk*C convolution kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolution kernels, which are divided into Wk*Hk sub-convolution kernel groups, each group containing N 1*1*C sub-convolution kernels; N, Wk, Hk and C are all integers greater than or equal to 1. Each sub-convolution kernel corresponds to a part of the input data, and when N ≥ 2 the N sub-convolution kernels in each group all correspond to the same part of the input data.
In an example implementation, as shown in Fig. 3, the obtained input data has the shape W_input*H_input*C_input. When C_input is 1 the input data is two-dimensional; when C_input is greater than 1 the input data is three-dimensional.
In an example implementation, as shown in Fig. 4, the obtained sub-convolution kernel groups are produced by splitting the convolution kernels. The number of sub-convolution kernels in each group is determined by the number of convolution kernels being split: if 9 convolution kernels are split, each resulting sub-convolution kernel group contains 9 sub-convolution kernels. The number of groups is determined by the width W and height H of the convolution kernel: splitting a 3*3 convolution kernel yields 3*3 = 9 sub-convolution kernel groups.
In an example implementation, each sub-convolution kernel corresponds to a part of the input data. For example, sub-convolution kernel 00 in Fig. 4 corresponds to positions 00 to 77 of the input data in Fig. 3, sub-convolution kernel 01 corresponds to 01 to 78, sub-convolution kernel 02 corresponds to 02 to 79, ..., and sub-convolution kernel 22 corresponds to 22 to 99. When a sub-convolution kernel group contains two or more sub-convolution kernels, the N sub-convolution kernels in that group all correspond to the same part of the input data: every sub-convolution kernel 00 in the 1st group of Fig. 4 corresponds to 00 to 77 of the input data in Fig. 3, every sub-convolution kernel 01 in the 2nd group corresponds to 01 to 78, every sub-convolution kernel 02 in the 3rd group corresponds to 02 to 79, ..., and every sub-convolution kernel 22 in the 9th group corresponds to 22 to 99.
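The splitting described above can be sketched in a few lines. This is a hedged illustration, not the patented hardware: the (N, Hk, Wk, C) array layout and the dictionary of groups are assumptions made for clarity.

```python
import numpy as np

# Assumed layout: N kernels of height Hk, width Wk, depth C, stored as
# an (N, Hk, Wk, C) array. Group (h, w) collects the 1*1*C slice at
# position (h, w) from every kernel, giving Wk*Hk groups of N
# 1*1*C sub-convolution kernels each.
N, Hk, Wk, C = 9, 3, 3, 16
kernels = np.arange(N * Hk * Wk * C, dtype=np.float32).reshape(N, Hk, Wk, C)

groups = {(h, w): kernels[:, h, w, :] for h in range(Hk) for w in range(Wk)}

assert len(groups) == Wk * Hk            # 3*3 = 9 sub-convolution kernel groups
assert groups[(0, 0)].shape == (N, C)    # each group: N sub-kernels of depth C
```

Each group is thus an N*C weight matrix, which is exactly the operand shape a 1*1 convolution (a per-position matrix multiply) consumes.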
Step 102: rearrange the input data according to the data rearrangement mode corresponding to each sub-convolution kernel group to obtain the rearranged input data for that group; convolve each sub-convolution kernel group with its rearranged input data to obtain the convolution result for that group; accumulate the convolution results of all groups to obtain an accumulation result, and take the data at the valid positions of the accumulation result as the output of the neural network operation. In the rearranged input data of each group, the part of the input data corresponding to that group occupies the same data positions, and these common positions are the valid positions.
In an example implementation, the operation on each sub-convolution kernel group and the input data is performed in the matrix operation unit of the neural network. Each sub-convolution kernel group has its own data rearrangement mode. Before convolving a group with the input data, the rearrangement mode corresponding to that group is obtained and the input data is rearranged accordingly to obtain the rearranged input data for the group. Each group is then convolved with its rearranged input data to obtain the group's convolution result, the convolution results of all groups are accumulated to obtain an accumulation result, and the data at the valid positions of the accumulation result is taken as the output of the neural network operation.
In an example implementation, in the rearranged input data of each group, the part of the input data corresponding to that group occupies the same data positions, and these common positions are the valid positions. That is, the rearranged input data of each group and its corresponding part of the input data share the same data positions, but the data at those positions has been rearranged. The rearrangement mode of the 1st sub-convolution kernel group 00 in Fig. 4 is set to leave the positions of all parts of the input data unchanged; its rearranged input data is shown in Fig. 5. The rearrangement mode of the 2nd sub-convolution kernel group 01 in Fig. 4 shifts every column of each part of the input data forward by one column; its rearranged input data is shown in Fig. 5. By analogy, the rearrangement mode of the Wk*Hk-th sub-convolution kernel group 22 in Fig. 4 shifts every column forward by two columns and every row upward by two rows; its rearranged input data is shown in Fig. 5. The valid positions are the solid-line parts of the data in Fig. 5.
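The rearrange-convolve-accumulate scheme of step 102 can be checked numerically with a small sketch. The shapes, stride 1 and zero padding are assumptions; `np.roll` stands in for the hardware rearrangement (its wrap-around never touches the valid region), and only the valid region is compared against a reference convolution.

```python
import numpy as np

def direct_conv2d(x, w):
    """Reference: valid 2-D convolution. x: (H, W, C), w: (N, Kh, Kw, C)."""
    H, W, C = x.shape
    N, Kh, Kw, _ = w.shape
    out = np.zeros((H - Kh + 1, W - Kw + 1, N))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for n in range(N):
                out[i, j, n] = np.sum(x[i:i + Kh, j:j + Kw, :] * w[n])
    return out

def split_conv2d(x, w):
    """Step-102 scheme: one shifted 1*1 convolution per sub-kernel group."""
    H, W, C = x.shape
    N, Kh, Kw, _ = w.shape
    acc = np.zeros((H, W, N))
    for dh in range(Kh):                               # one pass per group
        for dw in range(Kw):
            group = w[:, dh, dw, :]                    # N 1*1*C sub-kernels
            shifted = np.roll(x, (-dh, -dw), (0, 1))   # rearranged input
            # 1*1 convolution = per-position dot product over the C channels
            acc += (shifted.reshape(-1, C) @ group.T).reshape(H, W, N)
    return acc[:H - Kh + 1, :W - Kw + 1, :]            # keep valid positions

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 6, 4))
w = rng.standard_normal((2, 3, 3, 4))
assert np.allclose(split_conv2d(x, w), direct_conv2d(x, w))
```

The check passes because shifting the input by (dh, dw) aligns x[i+dh, j+dw, :] with output position (i, j), which is exactly the term the (dh, dw) weight slice contributes in the direct convolution.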
In an example implementation, to keep the matrix operation unit running efficiently, the bit width of the input data supplied for matrix operations must match the scale of the matrix operation. Suppose the matrix operation module can output an M*N matrix per cycle; the input data bandwidth is then M*W_i, where W_i is the bit width of a single data item, e.g. W_i = 8 at INT8 precision and W_i = 16 at FP16 precision. Let C0 be the depth of the input data participating in one matrix operation, that is, the granularity at which the input data is sliced along the depth direction, with a minimum of 1, and let C1 be the number of C0-granularity slices of the total input depth C. The input data is stored in the buffer in C1HWC0 order. As shown in Fig. 6, the M*W_i-bit-wide data is divided into M*W_i/C0 groups, each stored in a memory block of bit width W_i*C0 with its own address management, so that W_i*C0 bits of data can be fetched from any position in a single cycle. The data rearrangement module rearranges the data read from each buffer and then sends it to the matrix operation module for processing.
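The C1HWC0 storage order can be illustrated with a small sketch. Modeling the buffer as a numpy transpose, and the concrete shapes, are assumptions for illustration only.

```python
import numpy as np

# Tile the channel axis C into C1 = C // C0 slices of depth C0 and store
# the data as (C1, H, W, C0), so that one pixel's C0 channels form one
# contiguous W_i*C0-bit word that can be fetched in a single access.
H, W, C, C0 = 4, 4, 8, 4
C1 = C // C0
x = np.arange(H * W * C).reshape(H, W, C)            # original HWC data
buf = x.reshape(H, W, C1, C0).transpose(2, 0, 1, 3)  # C1HWC0 order

assert buf.shape == (C1, H, W, C0)
# the word for pixel (h, w) in depth slice c1 holds channels c1*C0..(c1+1)*C0-1
h, w, c1 = 2, 3, 1
assert np.array_equal(buf[c1, h, w], x[h, w, c1 * C0:(c1 + 1) * C0])
```

With this layout, a 1*1*C0 sub-operation reads one word per pixel, which is why C0 is chosen as the depth granularity of a single matrix operation.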
In an example implementation, when the input data is stored in memory blocks, the instruction issued to the matrix operation unit only needs to specify the start address of the input data, the start address of the weight data, and the sizes m, n and k of the two matrices participating in the matrix operation. Suppose the minimum matrix operation supported by the matrix unit is M*K by K*N; the control module then automatically fetches M*K data items from storage unit 1 and K*N data items from storage unit 2 each cycle and loads them into the matrix operation unit for the matrix operation. For the two data matrices of sizes m*k and k*n, the control unit slices and computes automatically, so m, n and k must be integer multiples of M, N and K; if they are not, they must be padded at input time so that, when participating in the matrix operation, m, n and k satisfy the integer-multiple condition.
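The padding condition above is a simple round-up to the next multiple. The helper below is an illustrative assumption, not part of the described instruction set.

```python
# m, n, k must be integer multiples of the unit's native M, N, K; if they
# are not, they are padded up at input time (illustrative helper).
def pad_up(v, unit):
    """Round v up to the nearest multiple of unit."""
    return ((v + unit - 1) // unit) * unit

M, N, K = 16, 16, 8          # assumed native tile sizes
m, n, k = 100, 30, 20        # requested matrix sizes
assert (pad_up(m, M), pad_up(n, N), pad_up(k, K)) == (112, 32, 24)
```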
In an example implementation, Fig. 7 shows the convolution mode used in the prior art and Fig. 8 shows the convolution process used by this application: the input source data of all sub-convolution kernel groups is the same, and no data inflation occurs. The convolution flow is essentially the same as the traditional one: the input data and weight data are loaded first. For a single convolution block the input data is loaded once; the weight data for the convolution may be split and loaded in batches in the manner shown in Fig. 8, or loaded into the weight cache unit all at once. For better parallelism, the input data and weight data can be loaded into their respective storage units simultaneously. After the input data and weights are loaded, multiple matrix operations are performed according to the splitting rules described in Fig. 8; the input data need not be changed in between, only different start positions need to be specified. The input data is thus highly reused, while different convolution weight data is fetched for each matrix operation.
In an example implementation, suppose the depth C in Fig. 8 is 16. For the 10*10*16 input data of Fig. 8 and the 3*3*16 convolution kernel of Fig. 4 (split into the 1*1*16 sub-convolution kernel groups of Fig. 8), two extra columns of invalid output are computed (the dashed part in Fig. 5). Assuming the matrix computation itself is 100% efficient, the efficiency for 10*10*16 input data under a 3*3*16 kernel is 8*8/(8*10) = 80%. Since a general-purpose neural network accelerator rarely exceeds 50% efficiency in single-image mode, this computation mode has little impact on overall network efficiency, while the input data access is only (10*10)/(9*8*8) = 17.3% of that in img2col mode. For input sizes larger than 10*10 the invalid-computation ratio, (Wk-1)/W, is even lower. If the input size is too small, the computation can be done in batch mode or img2col mode; in that case the input data is usually not the bottleneck, so generating img2col data does not hurt system performance.
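The figures quoted above follow from simple counting, assuming stride 1 and no padding:

```python
# For W_in*H_in input and a Wk*Hk kernel, (Wk-1) of the W_in output
# columns computed by the shifted scheme are invalid.
W_in, H_in, Wk, Hk = 10, 10, 3, 3
W_out, H_out = W_in - Wk + 1, H_in - Hk + 1               # 8*8 valid outputs

efficiency = (W_out * H_out) / (H_out * W_in)             # 8*8 / (8*10)
invalid_fraction = (Wk - 1) / W_in                        # (Wk-1)/W
access_ratio = (W_in * H_in) / (Wk * Hk * W_out * H_out)  # vs. img2col reads

assert abs(efficiency - 0.80) < 1e-12
assert abs(invalid_fraction - 0.20) < 1e-12
assert round(access_ratio, 3) == 0.174                    # ~17.3% of img2col
```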
In an example implementation, for a sub-convolution kernel group containing N sub-convolution kernels, the N sub-convolution kernels are each convolved with the rearranged input data corresponding to that group, yielding N sub-convolution results, which are taken as the N layers of the convolution result corresponding to that group.
In the embodiments of this application, during a neural network operation, the input data of the operation and Wk*Hk sub-convolution kernel groups are obtained, and the operation step is entered. The N Wk*Hk*C convolution kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolution kernels, which are divided into Wk*Hk sub-convolution kernel groups, each group containing N 1*1*C sub-convolution kernels; N, Wk, Hk and C are all integers greater than or equal to 1. Each sub-convolution kernel corresponds to a part of the input data, and when N ≥ 2 the N sub-convolution kernels in each group all correspond to the same part of the input data. The operation step includes: rearranging the input data according to the data rearrangement mode corresponding to each sub-convolution kernel group to obtain the rearranged input data for that group; convolving each sub-convolution kernel group with its rearranged input data to obtain the convolution result for that group; and accumulating the convolution results of all groups to obtain an accumulation result, the data at the valid positions of which is taken as the output of the neural network operation. In the rearranged input data of each group, the part of the input data corresponding to that group occupies the same data positions, and these common positions are the valid positions. By splitting the convolution in this way the input data is reused, and no img2col rearrangement of the data is needed to satisfy scheduling and computation. Because no img2col conversion is applied to the input data and the computation runs directly on the original input data, the hardware design overhead, the increased data-access volume and the increased dynamic power consumption caused by img2col are eliminated.
An embodiment of this application relates to the operation step of a neural network operation method which, as shown in Fig. 9, includes:
Step 201: rearrange the input data according to the data rearrangement mode corresponding to the i-th sub-convolution kernel group to obtain the i-th rearranged input data.
In an example implementation, this application obtains the sub-convolution kernel groups after obtaining the input data, but the Wk*Hk groups are not all obtained at once. Instead, the 1st sub-convolution kernel group is obtained first; only after all operation steps have been executed for it is the 2nd group obtained, and so on, until the Wk*Hk-th sub-convolution kernel group is obtained.
In an example implementation, for each obtained i-th sub-convolution kernel group, the i-th data rearrangement mode corresponding to that group is determined first, and the input data is then rearranged according to it to obtain the i-th rearranged input data, where i ranges from 1 to Wk*Hk.
In an example implementation, the i-th sub-convolution kernel group is loaded in data-overwrite mode: for example, when the 2nd group is loaded it overwrites the 1st group, which reduces the memory occupied by storing the sub-convolution kernel groups.
In an example implementation, when the value of i is 1, the data rearrangement mode corresponding to the 1st sub-convolution kernel group is set to leave the positions of all parts of the input data unchanged.
Step 202: convolve the i-th sub-convolution kernel group with the i-th rearranged input data to obtain the i-th convolution result.
In an example implementation, after the i-th rearranged input data is obtained, the i-th sub-convolution kernel group is convolved with it to obtain the i-th convolution result.
In an example implementation, when the neural network contains X matrix operation units (taking 3 as an example), the 9 sub-convolution kernel groups in Fig. 4 can be divided into 3 batches: groups 1-3 are first input to the 3 matrix operation units, groups 4-6 are input the second time, and groups 7-9 the third time, each producing its corresponding convolution result.
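The three-round schedule described here is simply a chunking of the group list; the dispatch below is an illustrative assumption, not a fixed hardware policy.

```python
# 9 sub-convolution kernel groups dispatched to 3 matrix operation
# units in 9 / 3 = 3 rounds, 3 groups per round.
groups = list(range(1, 10))        # sub-kernel groups 1..9
units = 3
rounds = [groups[i:i + units] for i in range(0, len(groups), units)]

assert rounds == [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
```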
Step 203: accumulate the i-th convolution result and the (i-1)-th accumulation result to obtain the i-th accumulation result.
In an example implementation, the generated i-th convolution result is accumulated with the previous (i-1)-th accumulation result to obtain the i-th accumulation result.
In an example implementation, when the value of i is 1, the 0-th accumulation result corresponding to the 1st sub-convolution kernel group is set to zero.
In an example implementation, when the neural network contains Y addition units (taking 3 as an example), the 9 convolution results corresponding to the 9 sub-convolution kernel groups in Fig. 4 can be divided into 3 batches: convolution results 1-3 are input to the 1st addition unit to compute their partial accumulation, results 4-6 to the 2nd addition unit, and results 7-9 to the 3rd addition unit; the three partial accumulations are then input to any one of the addition units to compute the accumulation result of all 9 convolution results.
Step 204: if i is less than Wk*Hk, update i to i+1 and execute the operation step again; if i equals Wk*Hk, take the data at the valid positions of the i-th accumulation result as the output of the neural network operation.
In an example implementation, after the operation on the i-th sub-convolution kernel group is complete, the value of i is checked. If i is less than Wk*Hk, i is updated to i+1, the (i+1)-th sub-convolution kernel group is loaded, and the operation step is executed again. If i equals Wk*Hk, the operation step ends, and the data at the valid positions of the i-th accumulation result is taken as the output of the neural network operation.
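Steps 201-204 form the serial loop sketched below, under the same illustrative assumptions as before: `np.roll` models the rearrangement, each loaded group overwrites the previous one, and the 0-th accumulation result is zero.

```python
import numpy as np

def serial_operation_step(x, kernels):
    """x: (H, W, C) input; kernels: (N, Kh, Kw, C). Runs the operation
    step once per sub-kernel group i = 1..Wk*Hk, holding only a running
    accumulator and the currently loaded group."""
    H, W, C = x.shape
    N, Kh, Kw, _ = kernels.shape
    acc = np.zeros((H, W, N))                  # 0th accumulation result: zero
    positions = [(dh, dw) for dh in range(Kh) for dw in range(Kw)]
    i = 1
    while True:
        dh, dw = positions[i - 1]
        group = kernels[:, dh, dw, :]          # overwrites the previous group
        xi = np.roll(x, (-dh, -dw), (0, 1))    # step 201: i-th rearranged input
        ci = (xi.reshape(-1, C) @ group.T).reshape(H, W, N)  # step 202
        acc = acc + ci                         # step 203: i-th accumulation
        if i == Kh * Kw:                       # step 204: done, keep valid data
            return acc[:H - Kh + 1, :W - Kw + 1, :]
        i += 1                                 # step 204: i -> i+1, loop again
```

The returned valid region equals an ordinary valid convolution of x with the original Kh*Kw kernels.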
In this embodiment of this application, building on the other embodiments, the convolution of the sub-convolution kernel groups with the input data can also be performed serially or in parallel, so that the neural network operation method described in this application can be applied to all types of neural networks.
An embodiment of this application relates to a neural network operation method which, as shown in Fig. 10, includes:
Step 301: obtain the input data of the neural network operation and Wk*Hk sub-convolution kernel groups, and enter the operation step. The N Wk*Hk*C convolution kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolution kernels, which are divided into Wk*Hk sub-convolution kernel groups, each group containing N 1*1*C sub-convolution kernels; N, Wk, Hk and C are all integers greater than or equal to 1. Each sub-convolution kernel corresponds to a part of the input data, and when N ≥ 2 the N sub-convolution kernels in each group all correspond to the same part of the input data.
In an example implementation, this step is substantially the same as step 101 of the embodiments of this application and is not repeated here.
Step 302: convolve each sub-convolution kernel group with the input data to obtain the convolution result for that group; rearrange each group's convolution result according to the data rearrangement mode corresponding to that group to obtain the rearranged convolution result for that group; accumulate the rearranged convolution results of all groups to obtain an accumulation result, and take the data at the valid positions of the accumulation result as the output of the neural network operation. Each sub-convolution kernel group convolved with its corresponding part of the input data yields the valid convolution result for that group; in the rearranged convolution results of all groups, the valid convolution results occupy the same data positions, and these common positions are the valid positions.
In an example implementation, each sub-convolution kernel group is first convolved with the input data to obtain the group's convolution result. The data rearrangement mode corresponding to each group is then obtained, and each group's convolution result is rearranged accordingly to obtain the group's rearranged convolution result. The rearranged convolution results of all groups are then accumulated to obtain an accumulation result, and the data at the valid positions of the accumulation result is taken as the output of the neural network operation.
在一示例实施中,对于包含N个子卷积核的子卷积核组,子卷积核组中的N个子卷积核分别与该子卷积核组对应的重排后输入数据进行卷积,得到N个子卷积结果,将N个子卷积结果分别作为子卷积核组对应的卷积结果的第N层数据。在一示例实施中,各子卷积核组对应的数据重排方式与本申请实施例步骤102提及的数据重排方式大致相同,此处不一一赘述。In an exemplary implementation, for a sub-convolution kernel group containing N sub-convolution kernels, the N sub-convolution kernels in the sub-convolution kernel group are respectively convolved with the rearranged input data corresponding to the sub-convolution kernel group , N sub-convolution results are obtained, and the N sub-convolution results are respectively used as the Nth layer data of the convolution results corresponding to the sub-convolution kernel group. In an exemplary implementation, the data rearrangement manners corresponding to each sub-convolution kernel group are substantially the same as the data rearrangement manners mentioned in step 102 of the embodiment of the present application, and will not be repeated here.
在一示例实施中,步骤201至步骤204提及的串行运算和并行运算均可以应用在本申请实施例中;先对一个子卷积核组进行卷积、重排后累加,直至进行到最后一个子卷积核组的卷积、重排后累加,便可以得到神经网络运算的输出结果;也可以由多个矩阵运算单元和或多个加法单元进行运算步骤。In an exemplary implementation, both the serial operations and parallel operations mentioned in steps 201 to 204 can be applied in the embodiment of the present application; first, a sub-convolution kernel group is convolved, rearranged and accumulated until the The output result of the neural network operation can be obtained by the convolution and rearrangement of the last sub-convolution kernel group and accumulation; the operation steps can also be performed by multiple matrix operation units and or multiple addition units.
本申请的实施方式,在其他实施例的基础之上还可以先对各子卷积核组和输入数据进行卷积,再对各子卷积核组和输入数据的卷积结果进行重排、累加等操作,使得本申请对卷积和重排的先后顺序有着具体的规定,提高本申请的适用性。In the embodiment of the present application, on the basis of other embodiments, each sub-convolution kernel group and input data may be convolved first, and then the convolution results of each sub-convolution kernel group and input data may be rearranged, Operations such as accumulation make this application have specific regulations on the order of convolution and rearrangement, which improves the applicability of this application.
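The convolve-first scheme above can be illustrated with a minimal sketch, assuming a single-channel input (C=1), a single kernel (N=1), and plain nested Python lists; the function name and data layout are illustrative and not taken from the patent:

```python
def conv2d_via_1x1_split(inp, kernel):
    """Valid 2-D convolution in the order of step 302: each 1x1 sub-kernel is
    applied to the full input, each per-sub-kernel result is then rearranged
    (shifted by the sub-kernel's spatial offset), and the shifted results are
    accumulated; only the valid positions form the output."""
    H, W = len(inp), len(inp[0])
    Hk, Wk = len(kernel), len(kernel[0])
    Ho, Wo = H - Hk + 1, W - Wk + 1
    acc = [[0.0] * W for _ in range(H)]
    for p in range(Hk):
        for q in range(Wk):
            # "1x1 convolution": scale every input element by one weight
            conv = [[kernel[p][q] * inp[y][x] for x in range(W)] for y in range(H)]
            # rearrangement: shift so all contributions to out[y][x] align at (y, x)
            for y in range(H):
                for x in range(W):
                    if y + p < H and x + q < W:
                        acc[y][x] += conv[y + p][x + q]
    # the top-left Ho x Wo block holds the valid positions
    return [row[:Wo] for row in acc[:Ho]]
```

Since the contribution of sub-kernel (p, q) to output element (y, x) is kernel[p][q]*inp[y+p][x+q], shifting each per-sub-kernel product by (p, q) lines all contributions up at the same position, so the accumulation's valid region equals the ordinary valid convolution.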
An embodiment of this application relates to a neural network operation method which, as shown in FIG. 11, includes:
Step 401: obtain the input data of the neural network operation and Wk*Hk sub-convolution kernel groups, and enter the operation step. The N Wk*Hk*C convolution kernels of the neural network operation are split to obtain N*Wk*Hk 1*1*C sub-convolution kernels; the N*Wk*Hk 1*1*C sub-convolution kernels are divided into Wk*Hk sub-convolution kernel groups, each of which includes N 1*1*C sub-convolution kernels, where N, Wk, Hk, and C are all integers greater than or equal to 1. Each sub-convolution kernel corresponds to a part of the input data, and in the case where N≥2, the N sub-convolution kernels in each group correspond to the same part of the input data.
In an example implementation, this step is substantially the same as step 101 of the embodiments of this application and is not repeated here.
Step 402: convolve the i-th sub-convolution kernel group with the input data to obtain the i-th convolution result. Convolving the i-th sub-convolution kernel group with the part of the input data corresponding to that group yields the valid convolution result corresponding to the i-th group, and the i-th convolution result contains this valid convolution result.
In an example implementation, the i-th sub-convolution kernel group is convolved with the input data to obtain the i-th convolution result. During this convolution, the result of convolving the i-th sub-convolution kernel group with the part of the input data corresponding to that group is called the valid convolution result; that is, the i-th convolution result contains the valid convolution result.
In an example implementation, for a sub-convolution kernel group containing N sub-convolution kernels, each of the N sub-convolution kernels in the group is convolved with the rearranged input data corresponding to that group to obtain N sub-convolution results, which serve, respectively, as the N layers of data of the convolution result corresponding to the group.
In an example implementation, the Wk*Hk sub-convolution kernel groups can be loaded in order. When the i-th sub-convolution kernel group is loaded, it is loaded in a data-overwrite manner; that is, the i-th sub-convolution kernel group overwrites the (i-1)-th sub-convolution kernel group.
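The kernel splitting described in step 401 can be sketched as follows, assuming the N kernels are given as nested Python lists indexed [n][p][q][c]; the representation and function name are illustrative, not taken from the patent:

```python
def split_kernels(kernels):
    """Split N kernels of shape Hk x Wk x C into Hk*Wk sub-convolution kernel
    groups. Group (p, q) holds, for each of the N kernels, the 1x1xC slice
    taken at spatial position (p, q), so every group contains N sub-kernels."""
    N = len(kernels)
    Hk, Wk = len(kernels[0]), len(kernels[0][0])
    groups = {}
    for p in range(Hk):
        for q in range(Wk):
            # N vectors of length C, one per kernel
            groups[(p, q)] = [kernels[n][p][q] for n in range(N)]
    return groups
```

All N sub-kernels in group (p, q) share the same spatial offset, which is why, as stated above, they correspond to the same part of the input data.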
Step 403: rearrange the (i-1)-th accumulation result so that the valid convolution result in the rearranged (i-1)-th accumulation result and the valid convolution result in the i-th convolution result occupy the same data positions.
In an example implementation, the (i-1)-th accumulation result is rearranged according to the data rearrangement mode corresponding to the i-th sub-convolution kernel group, so that the valid convolution result in the rearranged (i-1)-th accumulation result and the valid convolution result in the i-th convolution result occupy the same data positions.
Step 404: accumulate the rearranged (i-1)-th accumulation result with the i-th convolution result to obtain the i-th accumulation result.
In an example implementation, the rearranged (i-1)-th accumulation result is accumulated with the i-th convolution result to obtain the i-th accumulation result.
In an example implementation, the initial value of i is 1. When i=1, the 0th accumulation result is set to zero, and the valid convolution result in the rearranged 0th accumulation result and the valid convolution result in the 1st convolution result are regarded by default as occupying the same data positions.
Step 405: if i is less than Wk*Hk, update i to i+1 and execute the operation step again; if i is equal to Wk*Hk, take the valid convolution result in the i-th accumulation result as the output result of the neural network operation.
In an example implementation, after the operation on the i-th sub-convolution kernel group is completed, the value of i is checked. If i is less than Wk*Hk, i is updated to i+1, the (i+1)-th sub-convolution kernel group is loaded, and the operation step is executed again. If i is equal to Wk*Hk, the operation step ends, and the data at the valid positions in the i-th accumulation result is taken as the output result of the neural network operation.
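The loop of steps 402 to 405 can be sketched as follows, under the simplifying assumptions of a single-channel input (C=1), a single kernel (N=1), sub-kernel groups enumerated in row-major order, and nested Python lists; the names are illustrative, not taken from the patent:

```python
def conv2d_serial_accumulate(inp, kernel):
    """Steps 402-405 sketch: for each 1x1 sub-kernel in turn, convolve it with
    the full input (step 402), rearrange (shift) the previous accumulation so
    the valid results line up (step 403), and add (step 404); the valid region
    of the final accumulation is the output (step 405)."""
    H, W = len(inp), len(inp[0])
    Hk, Wk = len(kernel), len(kernel[0])
    Ho, Wo = H - Hk + 1, W - Wk + 1
    offsets = [(p, q) for p in range(Hk) for q in range(Wk)]
    acc = [[0.0] * W for _ in range(H)]  # "0th accumulation result" is zero
    prev = (0, 0)                        # spatial alignment of the accumulation
    for p, q in offsets:
        dy, dx = p - prev[0], q - prev[1]
        # step 403: rearrange the previous accumulation to align with group (p, q)
        shifted = [[0.0] * W for _ in range(H)]
        for y in range(H):
            for x in range(W):
                if 0 <= y - dy < H and 0 <= x - dx < W:
                    shifted[y][x] = acc[y - dy][x - dx]
        # step 402: "1x1 convolution" of this sub-kernel with the full input
        conv = [[kernel[p][q] * inp[y][x] for x in range(W)] for y in range(H)]
        # step 404: accumulate
        acc = [[shifted[y][x] + conv[y][x] for x in range(W)] for y in range(H)]
        prev = (p, q)
    # step 405: the valid convolution result sits at offset `prev` in acc
    return [[acc[y + prev[0]][x + prev[1]] for x in range(Wo)] for y in range(Ho)]
```

Only one accumulation buffer is kept, and each rearrangement is a shift by the offset difference between consecutive sub-kernel positions, which matches the overwrite-style storage described for the serial scheme.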
On the basis of the other embodiments, this implementation of the application may also perform the convolution of the sub-convolution kernel groups with the input data serially or in parallel, so that the neural network operation method described in this application can be applied to various types of neural networks.
The division of the above methods into steps is only for clarity of description. In implementation, steps may be merged into one step, or a step may be split into multiple steps; as long as the same logical relationship is preserved, such variants fall within the scope of protection of this patent. Adding insignificant modifications to an algorithm or flow, or introducing insignificant designs, without changing the core design of the algorithm and flow, also falls within the scope of protection of this patent.
Another embodiment of this application relates to a neural network operation apparatus, the details of which are described below. The following implementation details are provided merely to aid understanding and are not required for implementing this embodiment. FIG. 12 is a schematic diagram of the neural network operation apparatus of this embodiment, which includes: a first storage unit 1201, a second storage unit 1202, a control unit 1203, a first data rearrangement unit 1204, a convolution unit 1205, and an addition unit 1206.
The first storage unit is configured to store the input data of the neural network operation.
The second storage unit is configured to store the Wk*Hk sub-convolution kernel groups of the neural network operation. The N Wk*Hk*C convolution kernels of the neural network operation are split to obtain N*Wk*Hk 1*1*C sub-convolution kernels; the N*Wk*Hk 1*1*C sub-convolution kernels are divided into Wk*Hk sub-convolution kernel groups, each of which includes N 1*1*C sub-convolution kernels, where N, Wk, Hk, and C are all integers greater than or equal to 1. Each sub-convolution kernel corresponds to a part of the input data, and in the case where N≥2, the N sub-convolution kernels in each group correspond to the same part of the input data.
The control unit is configured to obtain the input data from the first storage unit and input it to the first data rearrangement unit; the control unit is further configured to send the data rearrangement mode corresponding to each sub-convolution kernel group to the first data rearrangement unit.
The first data rearrangement unit is configured to rearrange the input data according to the data rearrangement mode corresponding to each sub-convolution kernel group, obtain the rearranged input data corresponding to each group, and output the rearranged input data corresponding to each group to the convolution unit. In the rearranged input data corresponding to each group, the part of the input data corresponding to that group occupies the same data positions, and those shared data positions are the valid positions.
The control unit is further configured to obtain each sub-convolution kernel group from the second storage unit and send it to the convolution unit.
The convolution unit is configured to convolve each sub-convolution kernel group with the rearranged input data corresponding to that group to obtain the convolution result corresponding to each group, and output the convolution results corresponding to the groups to the addition unit.
The addition unit is configured to accumulate the convolution results corresponding to the sub-convolution kernel groups to obtain an accumulation result, and take the data at the valid positions in the accumulation result as the output result of the neural network operation.
In an example implementation, the neural network operation apparatus provided by this application further includes a third storage unit, configured to store, when the neural network operation is performed serially, the result of the previous operation of a sub-convolution kernel group with the input data.
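The dataflow through the rearrangement, convolution, and addition units of this apparatus can be sketched as follows, again assuming C=1 and N=1 and using illustrative names; unlike the convolve-first variant, here the input is rearranged before the convolution, matching the unit order of FIG. 12:

```python
def conv2d_rearrange_input_first(inp, kernel):
    """FIG. 12 dataflow sketch: the data rearrangement unit shifts the input
    for each sub-kernel group, the convolution unit applies the 1x1 sub-kernel
    (an elementwise scale), and the addition unit accumulates; the valid
    positions of the accumulation result form the output."""
    H, W = len(inp), len(inp[0])
    Hk, Wk = len(kernel), len(kernel[0])
    Ho, Wo = H - Hk + 1, W - Wk + 1
    acc = [[0.0] * W for _ in range(H)]
    for p in range(Hk):
        for q in range(Wk):
            # first data rearrangement unit: shift the input for group (p, q)
            shifted = [[inp[y + p][x + q] if y + p < H and x + q < W else 0.0
                        for x in range(W)] for y in range(H)]
            # convolution unit (1x1 scale) feeding the addition unit (accumulate)
            for y in range(H):
                for x in range(W):
                    acc[y][x] += kernel[p][q] * shifted[y][x]
    return [row[:Wo] for row in acc[:Ho]]
```

Because the shift is applied to the input before the 1x1 convolution, the valid results of all groups already occupy the same positions and the addition unit can accumulate them directly, with no rearrangement needed after the convolution.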
Another embodiment of this application relates to a neural network operation apparatus, the details of which are described below. The following implementation details are provided merely to aid understanding and are not required for implementing this embodiment. FIG. 13 is a schematic diagram of the neural network operation apparatus of this embodiment, which includes: a first storage unit 1301, a second storage unit 1302, a control unit 1303, a second data rearrangement unit 1304, a convolution unit 1305, and an addition unit 1306.
The first storage unit is configured to store the input data of the neural network operation.
The second storage unit is configured to store the Wk*Hk sub-convolution kernel groups of the neural network operation. The N Wk*Hk*C convolution kernels of the neural network operation are split to obtain N*Wk*Hk 1*1*C sub-convolution kernels; the N*Wk*Hk 1*1*C sub-convolution kernels are divided into Wk*Hk sub-convolution kernel groups, each of which includes N 1*1*C sub-convolution kernels, where N, Wk, Hk, and C are all integers greater than or equal to 1. Each sub-convolution kernel corresponds to a part of the input data, and in the case where N≥2, the N sub-convolution kernels in each group correspond to the same part of the input data.
The control unit is configured to obtain the input data from the first storage unit and input it to the convolution unit; the control unit is further configured to obtain each sub-convolution kernel group from the second storage unit and input it to the convolution unit.
The convolution unit is configured to convolve each sub-convolution kernel group with the input data to obtain the convolution result corresponding to each group, and output the convolution results corresponding to the groups to the second data rearrangement unit.
The control unit is further configured to send the data rearrangement mode corresponding to each sub-convolution kernel group to the second data rearrangement unit.
The second data rearrangement unit is configured to rearrange the convolution result corresponding to each sub-convolution kernel group according to the data rearrangement mode corresponding to that group, obtain the rearranged convolution result corresponding to each group, and output the rearranged convolution results corresponding to the groups to the addition unit. Convolving each sub-convolution kernel group with the part of the input data corresponding to that group yields the valid convolution result corresponding to that group; the valid convolution results in the rearranged convolution results corresponding to the groups occupy the same data positions, and those shared data positions are the valid positions.
In an example implementation, the neural network operation apparatus provided by this application further includes a third storage unit, configured to store, when the neural network operation is performed serially, the result of the previous operation of a sub-convolution kernel group with the input data.
Another embodiment of this application relates to a neural network operation apparatus, the details of which are described below. The following implementation details are provided merely to aid understanding and are not required for implementing this embodiment. FIG. 14 is a schematic diagram of the neural network operation apparatus of this embodiment, which includes: a first storage unit 1401, a second storage unit 1402, a control unit 1403, a third data rearrangement unit 1404, a convolution unit 1405, and an addition unit 1406.
The first storage unit is configured to store the input data of the neural network operation.
The second storage unit is configured to store the Wk*Hk sub-convolution kernel groups of the neural network operation. The N Wk*Hk*C convolution kernels of the neural network operation are split to obtain N*Wk*Hk 1*1*C sub-convolution kernels; the N*Wk*Hk 1*1*C sub-convolution kernels are divided into Wk*Hk sub-convolution kernel groups, each of which includes N 1*1*C sub-convolution kernels, where N, Wk, Hk, and C are all integers greater than or equal to 1. Each sub-convolution kernel corresponds to a part of the input data, and in the case where N≥2, the N sub-convolution kernels in each group correspond to the same part of the input data.
The control unit is configured to obtain the input data from the first storage unit and input it to the convolution unit; the control unit is further configured to obtain the i-th sub-convolution kernel group from the second storage unit and input it to the convolution unit.
The convolution unit is configured to convolve the i-th sub-convolution kernel group with the input data to obtain the i-th convolution result, and output the i-th convolution result to the addition unit. Convolving the i-th sub-convolution kernel group with the part of the input data corresponding to that group yields the valid convolution result corresponding to the i-th group, and the i-th convolution result contains this valid convolution result.
The control unit is further configured to obtain the (i-1)-th accumulation result from the third storage unit and send it to the third data rearrangement unit.
The third data rearrangement unit is configured to rearrange the (i-1)-th accumulation result so that the valid convolution result in the rearranged (i-1)-th accumulation result and the valid convolution result in the i-th convolution result occupy the same data positions, and to output the rearranged (i-1)-th accumulation result to the addition unit. The rearranged (i-1)-th accumulation result is accumulated with the i-th convolution result to obtain the i-th accumulation result, which is stored in the third storage unit, overwriting the (i-1)-th accumulation result.
The control unit is further configured to check the value of i: if i is less than Wk*Hk, i is updated to i+1 and the operation step is executed again; if i is equal to Wk*Hk, the valid convolution result in the i-th accumulation result is taken as the output result of the neural network operation. The initial value of i is 1; when i=1, the 0th accumulation result is set to zero, and the valid convolution result in the rearranged 0th accumulation result and the valid convolution result in the 1st convolution result are regarded by default as occupying the same data positions.
It is easy to see that this embodiment is a system embodiment corresponding to the above method embodiments, and the two can be implemented in cooperation with each other. The relevant technical details and technical effects mentioned in the above embodiments remain valid in this embodiment and, to reduce repetition, are not repeated here. Correspondingly, the relevant technical details mentioned in this embodiment can also be applied in the above embodiments.
It is worth mentioning that all modules involved in this embodiment are logical modules. In practical applications, a logical unit may be a physical unit, a part of a physical unit, or a combination of multiple physical units. In addition, to highlight the innovative part of this application, units not closely related to solving the technical problem raised by this application are not introduced in this embodiment; this does not mean that no other units exist in this embodiment.
Another embodiment of this application relates to a chip which, as shown in FIG. 6, includes: at least one processor 601; and a memory 602 communicatively connected to the at least one processor 601. The memory 602 stores instructions executable by the at least one processor 601, and the instructions are executed by the at least one processor 601 so that the at least one processor 601 can execute the neural network operation methods of the above embodiments.
The memory and the processor are connected by a bus. The bus may include any number of interconnected buses and bridges, connecting the various circuits of the one or more processors and the memory. The bus may also connect various other circuits such as peripherals, voltage regulators, and power management circuits, which are well known in the art and are therefore not described further here. A bus interface provides an interface between the bus and a transceiver. The transceiver may be one element or multiple elements, such as multiple receivers and transmitters, providing a unit for communicating with various other apparatuses over a transmission medium. Data processed by the processor is transmitted over a wireless medium via an antenna; furthermore, the antenna also receives data and transfers the data to the processor.
The processor is responsible for managing the bus and general processing, and may also provide various functions including timing, peripheral interfacing, voltage regulation, power management, and other control functions. The memory may be used to store data used by the processor when performing operations.
Another embodiment of this application relates to an electronic device which, as shown in FIG. 6, includes: at least one processor 601; and a memory 602 communicatively connected to the at least one processor 601. The memory 602 stores instructions executable by the at least one processor 601, and the instructions are executed by the at least one processor 601 so that the at least one processor 601 can execute the neural network operation methods of the above embodiments.
The memory and the processor are connected by a bus. The bus may include any number of interconnected buses and bridges, connecting the various circuits of the one or more processors and the memory. The bus may also connect various other circuits such as peripherals, voltage regulators, and power management circuits, which are well known in the art and are therefore not described further here. A bus interface provides an interface between the bus and a transceiver. The transceiver may be one element or multiple elements, such as multiple receivers and transmitters, providing a unit for communicating with various other apparatuses over a transmission medium. Data processed by the processor is transmitted over a wireless medium via an antenna; furthermore, the antenna also receives data and transfers the data to the processor.
The processor is responsible for managing the bus and general processing, and may also provide various functions including timing, peripheral interfacing, voltage regulation, power management, and other control functions. The memory may be used to store data used by the processor when performing operations.
Another embodiment of this application relates to a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, the above method embodiments are implemented.
That is, those skilled in the art can understand that all or some of the steps of the methods of the above embodiments can be completed by instructing relevant hardware through a program. The program is stored in a storage medium and includes several instructions for causing a device (which may be a microcontroller, a chip, or the like) or a processor to execute all or some of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Those of ordinary skill in the art can understand that the above implementations are specific embodiments for realizing this application, and that in practical applications various changes in form and detail may be made to them without departing from the spirit and scope of this application.

Claims (20)

  1. A neural network operation method, comprising:
    obtaining input data of a neural network operation and Wk*Hk sub-convolution kernel groups, and entering an operation step; wherein N Wk*Hk*C convolution kernels of the neural network operation are split to obtain N*Wk*Hk 1*1*C sub-convolution kernels, the N*Wk*Hk 1*1*C sub-convolution kernels are divided into the Wk*Hk sub-convolution kernel groups, each sub-convolution kernel group comprises N 1*1*C sub-convolution kernels, and N, Wk, Hk, and C are all integers greater than or equal to 1; each sub-convolution kernel corresponds to a part of the input data, and in a case where N≥2, the parts of the input data corresponding to the N sub-convolution kernels in each sub-convolution kernel group are the same;
    the operation step comprising:
    rearranging the input data according to a data rearrangement mode corresponding to each of the sub-convolution kernel groups to obtain rearranged input data corresponding to each of the sub-convolution kernel groups; convolving each of the sub-convolution kernel groups with the rearranged input data corresponding to that group to obtain a convolution result corresponding to each of the sub-convolution kernel groups; and accumulating the convolution results corresponding to the sub-convolution kernel groups to obtain an accumulation result, and taking data at a valid position in the accumulation result as an output result of the neural network operation;
    wherein, in the rearranged input data corresponding to each of the sub-convolution kernel groups, the parts of the input data corresponding to the sub-convolution kernel groups have the same data position, and the same data position is the valid position.
  2. The neural network operation method according to claim 1, wherein the operation step comprises:
    rearranging the input data according to a data rearrangement mode corresponding to an i-th sub-convolution kernel group to obtain i-th rearranged input data;
    convolving the i-th sub-convolution kernel group with the i-th rearranged input data to obtain an i-th convolution result;
    accumulating the i-th convolution result and an (i-1)-th accumulation result to obtain an i-th accumulation result; and
    if i is less than Wk*Hk, updating i to i+1 and performing the operation step again; if i is equal to Wk*Hk, taking the data at the valid position in the i-th accumulation result as the output result of the neural network operation;
    wherein an initial value of i is 1, and when i=1, the data rearrangement mode corresponding to the first sub-convolution kernel group is set such that the position of each part of the input data remains unchanged, and the 0-th accumulation result is set to zero.
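A minimal single-channel sketch of this loop (N=C=1; for illustration only, with the rearrangement realized as a spatial shift of the input, an assumption since the application does not fix a concrete rearrangement scheme here) shows that accumulating the Wk*Hk 1x1 partial products reproduces a direct valid 2D convolution:

```python
def conv2d_direct(x, k):
    """Reference: direct valid 2D convolution (cross-correlation form)."""
    hk, wk = len(k), len(k[0])
    ho, wo = len(x) - hk + 1, len(x[0]) - wk + 1
    return [[sum(k[dy][dx] * x[r + dy][c + dx]
                 for dy in range(hk) for dx in range(wk))
             for c in range(wo)] for r in range(ho)]

def conv2d_claim2(x, k):
    """Claim-2 style loop: for each of the Wk*Hk steps, rearrange (shift)
    the input, apply one 1x1 sub-kernel, and add onto the running result."""
    hk, wk = len(k), len(k[0])
    ho, wo = len(x) - hk + 1, len(x[0]) - wk + 1
    acc = [[0] * wo for _ in range(ho)]        # the 0th accumulation result
    for dy in range(hk):                       # i enumerates offsets (dy, dx)
        for dx in range(wk):
            w = k[dy][dx]                      # the i-th 1x1 sub-kernel
            for r in range(ho):
                for c in range(wo):
                    # shifted input: the partial input data this sub-kernel
                    # needs, moved to the same (valid) positions every step
                    acc[r][c] += w * x[r + dy][c + dx]
    return acc
```

Every step is a pointwise multiply-accumulate over a shifted view of the input, which is what lets hardware built for 1x1 convolutions execute an arbitrary Wk*Hk kernel.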
  3. The neural network operation method according to claim 2, wherein obtaining the input data of the neural network operation and the Wk*Hk sub-convolution kernel groups comprises:
    loading the input data; and
    loading the i-th sub-convolution kernel group before the input data is rearranged according to the data rearrangement mode corresponding to the i-th sub-convolution kernel group to obtain the i-th rearranged input data.
  4. The neural network operation method according to claim 3, wherein loading the i-th sub-convolution kernel group comprises: loading the i-th sub-convolution kernel group in a data overwriting manner.
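One plausible reading of "loading in a data overwriting manner" can be sketched as follows (the buffer class and its interface are illustrative assumptions, not taken from the application): the weight buffer only needs capacity for a single group of N sub-kernels, because loading group i overwrites group i-1 in place.

```python
class KernelBuffer:
    """Sketch of an on-chip weight buffer sized for one sub-convolution
    kernel group; each load overwrites the previously loaded group."""

    def __init__(self, size):
        self.data = [0] * size

    def load(self, group):
        assert len(group) == len(self.data)
        for j, v in enumerate(group):
            self.data[j] = v   # overwrite the previous group's entry
```

Overwriting keeps the weight storage at one group (N sub-kernels) instead of all Wk*Hk groups, which is the apparent motivation for loading the groups one at a time.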
  5. The neural network operation method according to any one of claims 1 to 4, wherein, in a case where N≥2, convolving each sub-convolution kernel group with the rearranged input data corresponding to that sub-convolution kernel group to obtain the convolution result corresponding to each sub-convolution kernel group comprises:
    for each sub-convolution kernel group, convolving the N sub-convolution kernels in the sub-convolution kernel group respectively with the rearranged input data corresponding to the sub-convolution kernel group to obtain N sub-convolution results, the N sub-convolution results serving as N layers of data in the convolution result corresponding to the sub-convolution kernel group.
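The N-layer result of one group can be illustrated with a small sketch (illustration only; the channel-last nested-list layout and function name are assumptions): each 1*1*C sub-kernel reduces the channel dimension at every spatial position, and the N resulting maps are the N layers of the group's convolution result.

```python
def apply_group(x, group):
    """Apply one sub-convolution kernel group to an H x W x C input.
    Each of the N 1*1*C sub-kernels forms a dot product over the C
    channels at every position; the N maps are the N output layers."""
    h, w, c = len(x), len(x[0]), len(x[0][0])
    return [[[sum(kern[ch] * x[r][col][ch] for ch in range(c))
              for col in range(w)] for r in range(h)]
            for kern in group]

x = [[[1, 2], [3, 4]],
     [[5, 6], [7, 8]]]                 # H = W = C = 2
group = [[1, 0], [0, 1], [1, 1]]       # N = 3 sub-kernels, each 1*1*2
layers = apply_group(x, group)
```

With these sample weights, layer 0 selects channel 0, layer 1 selects channel 1, and layer 2 is their per-position sum.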
  6. A neural network operation method, comprising:
    obtaining input data of a neural network operation and Wk*Hk sub-convolution kernel groups, and entering an operation step; wherein N Wk*Hk*C convolution kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolution kernels, the N*Wk*Hk 1*1*C sub-convolution kernels are divided into the Wk*Hk sub-convolution kernel groups, each sub-convolution kernel group comprises N 1*1*C sub-convolution kernels, and N, Wk, Hk and C are all integers greater than or equal to 1; each sub-convolution kernel corresponds to partial input data in the input data, and in a case where N≥2, the N sub-convolution kernels in each sub-convolution kernel group correspond to the same partial input data;
    the operation step comprising:
    convolving each sub-convolution kernel group with the input data respectively to obtain a convolution result corresponding to each sub-convolution kernel group; rearranging the convolution result corresponding to each sub-convolution kernel group according to a data rearrangement mode corresponding to that sub-convolution kernel group to obtain a rearranged convolution result corresponding to each sub-convolution kernel group; and accumulating the rearranged convolution results corresponding to the sub-convolution kernel groups to obtain an accumulation result, and taking data located at a valid position in the accumulation result as an output result of the neural network operation;
    wherein each sub-convolution kernel group convolved with the partial input data corresponding to that sub-convolution kernel group yields a valid convolution result corresponding to that sub-convolution kernel group, the valid convolution results in the rearranged convolution results corresponding to the sub-convolution kernel groups are located at the same data position, and the same data position is the valid position.
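Claim 6 is the dual of claim 1: the input is left in place, and the rearrangement is applied to each partial convolution result instead. A single-channel sketch (illustration only; the shift-based rearrangement is again an assumption) shows the two orderings are equivalent:

```python
def conv2d_claim6(x, k):
    """Convolve each 1x1 sub-kernel with the un-rearranged input first,
    then rearrange each partial result so valid entries align, and add."""
    hk, wk = len(k), len(k[0])
    h, w = len(x), len(x[0])
    ho, wo = h - hk + 1, w - wk + 1
    acc = [[0] * wo for _ in range(ho)]
    for dy in range(hk):
        for dx in range(wk):
            # A 1x1 convolution with the full input is a pointwise scaling.
            partial = [[k[dy][dx] * x[r][c] for c in range(w)]
                       for r in range(h)]
            # Rearrangement: select the window shifted by (dy, dx), which
            # places this group's valid convolution result at the same
            # positions as every other group's.
            for r in range(ho):
                for c in range(wo):
                    acc[r][c] += partial[r + dy][c + dx]
    return acc
```

Deferring the rearrangement means the convolution unit always consumes the input in its stored layout; only the smaller partial results are moved.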
  7. The neural network operation method according to claim 6, wherein the operation step comprises:
    convolving an i-th sub-convolution kernel group with the input data to obtain an i-th convolution result;
    rearranging the i-th convolution result according to a data rearrangement mode corresponding to the i-th convolution result to obtain an i-th rearranged convolution result;
    accumulating the i-th rearranged convolution result and an (i-1)-th accumulation result to obtain an i-th accumulation result; and
    if i is less than Wk*Hk, updating i to i+1 and performing the operation step again; if i is equal to Wk*Hk, taking the data at the valid position in the i-th accumulation result as the output result of the neural network operation;
    wherein an initial value of i is 1, and when i=1, the data rearrangement mode corresponding to the first convolution result is set such that the position of each part of the first convolution result remains unchanged, and the 0-th accumulation result is set to zero.
  8. The neural network operation method according to claim 7, wherein obtaining the input data of the neural network operation and the Wk*Hk sub-convolution kernel groups comprises:
    loading the input data; and
    loading the i-th sub-convolution kernel group before the i-th sub-convolution kernel group is convolved with the input data to obtain the i-th convolution result.
  9. The neural network operation method according to claim 8, wherein loading the i-th sub-convolution kernel group comprises: loading the i-th sub-convolution kernel group in a data overwriting manner.
  10. The neural network operation method according to any one of claims 6 to 9, wherein, in a case where N≥2, convolving each sub-convolution kernel group with the input data respectively to obtain the convolution result corresponding to each sub-convolution kernel group comprises:
    for each sub-convolution kernel group, convolving the N sub-convolution kernels in the sub-convolution kernel group respectively with the input data to obtain N sub-convolution results, the N sub-convolution results serving as N layers of data in the convolution result corresponding to the sub-convolution kernel group.
  11. A neural network operation method, comprising:
    obtaining input data of a neural network operation and Wk*Hk sub-convolution kernel groups, and entering an operation step; wherein N Wk*Hk*C convolution kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolution kernels, the N*Wk*Hk 1*1*C sub-convolution kernels are divided into the Wk*Hk sub-convolution kernel groups, each sub-convolution kernel group comprises N 1*1*C sub-convolution kernels, and N, Wk, Hk and C are all integers greater than or equal to 1; each sub-convolution kernel corresponds to partial input data in the input data, and in a case where N≥2, the N sub-convolution kernels in each sub-convolution kernel group correspond to the same partial input data;
    the operation step comprising:
    convolving an i-th sub-convolution kernel group with the input data to obtain an i-th convolution result, wherein the i-th sub-convolution kernel group convolved with the partial input data corresponding to the i-th sub-convolution kernel group yields a valid convolution result corresponding to the i-th sub-convolution kernel group, and the i-th convolution result contains the valid convolution result;
    rearranging an (i-1)-th accumulation result so that the valid convolution result in the rearranged (i-1)-th accumulation result and the valid convolution result in the i-th convolution result are located at the same data position;
    accumulating the rearranged (i-1)-th accumulation result and the i-th convolution result to obtain an i-th accumulation result; and
    if i is less than Wk*Hk, updating i to i+1 and performing the operation step again; if i is equal to Wk*Hk, taking the valid convolution result in the i-th accumulation result as an output result of the neural network operation;
    wherein an initial value of i is 1, and when i=1, the 0-th accumulation result is set to zero, and the valid convolution result in the rearranged 0-th accumulation result and the valid convolution result in the first convolution result are deemed by default to be located at the same data position.
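Claim 11 moves the rearrangement onto the running accumulation result: neither the input nor the partial results are shifted, only the accumulator is realigned before each addition. A single-channel sketch (illustration only; the shift-based realignment and full-size accumulator layout are assumptions made to keep the example short):

```python
def shift2d(a, dy, dx):
    """Shift a 2D array by (dy, dx), filling vacated entries with zero."""
    h, w = len(a), len(a[0])
    return [[a[r - dy][c - dx] if 0 <= r - dy < h and 0 <= c - dx < w else 0
             for c in range(w)] for r in range(h)]

def conv2d_claim11(x, k):
    """Convolve each 1x1 sub-kernel with the raw input and, before each
    addition, rearrange (shift) the previous accumulation result so its
    valid entries line up with the new partial result."""
    hk, wk = len(k), len(k[0])
    h, w = len(x), len(x[0])
    ho, wo = h - hk + 1, w - wk + 1
    offsets = [(dy, dx) for dy in range(hk) for dx in range(wk)]
    acc = [[0] * w for _ in range(h)]      # the 0th accumulation result
    prev_dy, prev_dx = offsets[0]          # i=1: no realignment needed
    for i, (dy, dx) in enumerate(offsets):
        if i > 0:
            # realign the accumulator from the previous sub-kernel's
            # frame to the current one
            acc = shift2d(acc, dy - prev_dy, dx - prev_dx)
            prev_dy, prev_dx = dy, dx
        for r in range(h):
            for c in range(w):
                acc[r][c] += k[dy][dx] * x[r][c]
    # the valid convolution result sits at offset (prev_dy, prev_dx)
    return [[acc[r + prev_dy][c + prev_dx] for c in range(wo)]
            for r in range(ho)]
```

Entries outside the valid region accumulate meaningless partial sums, but the realignment guarantees they never mix into the valid positions, so only the valid region is read out at the end.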
  12. The neural network operation method according to claim 11, wherein obtaining the input data of the neural network operation and the Wk*Hk sub-convolution kernel groups comprises:
    loading the input data; and
    loading the i-th sub-convolution kernel group before the i-th sub-convolution kernel group is convolved with the input data to obtain the i-th convolution result.
  13. The neural network operation method according to claim 12, wherein loading the i-th sub-convolution kernel group comprises: loading the i-th sub-convolution kernel group in a data overwriting manner.
  14. The neural network operation method according to any one of claims 11 to 13, wherein, in a case where N≥2, convolving the i-th sub-convolution kernel group with the input data to obtain the i-th convolution result comprises:
    convolving the N sub-convolution kernels in the i-th sub-convolution kernel group respectively with the input data to obtain N sub-convolution results, the N sub-convolution results serving as N layers of data in the convolution result corresponding to the i-th sub-convolution kernel group.
  15. A neural network operation apparatus, comprising: a first storage unit, a second storage unit, a control unit, a first data rearrangement unit, a convolution unit and an addition unit; wherein
    the first storage unit is configured to store input data of a neural network operation, and the second storage unit is configured to store Wk*Hk sub-convolution kernel groups of the neural network operation; wherein N Wk*Hk*C convolution kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolution kernels, the N*Wk*Hk 1*1*C sub-convolution kernels are divided into the Wk*Hk sub-convolution kernel groups, each sub-convolution kernel group comprises N 1*1*C sub-convolution kernels, and N, Wk, Hk and C are all integers greater than or equal to 1; each sub-convolution kernel corresponds to partial input data in the input data, and in a case where N≥2, the N sub-convolution kernels in each sub-convolution kernel group correspond to the same partial input data;
    the control unit is configured to obtain the input data from the first storage unit and input the input data to the first data rearrangement unit, and is further configured to send a data rearrangement mode corresponding to each sub-convolution kernel group to the first data rearrangement unit;
    the first data rearrangement unit is configured to rearrange the input data according to the data rearrangement mode corresponding to each sub-convolution kernel group to obtain rearranged input data corresponding to each sub-convolution kernel group, and output the rearranged input data corresponding to each sub-convolution kernel group to the convolution unit; wherein, in the rearranged input data corresponding to each sub-convolution kernel group, the partial input data corresponding to that sub-convolution kernel group is located at the same data position, and the same data position is a valid position;
    the control unit is further configured to obtain each sub-convolution kernel group from the second storage unit and send each sub-convolution kernel group to the convolution unit;
    the convolution unit is configured to convolve each sub-convolution kernel group with the rearranged input data corresponding to that sub-convolution kernel group to obtain a convolution result corresponding to each sub-convolution kernel group, and output the convolution results corresponding to the sub-convolution kernel groups to the addition unit; and
    the addition unit is configured to accumulate the convolution results corresponding to the sub-convolution kernel groups to obtain an accumulation result, and take data located at the valid position in the accumulation result as an output result of the neural network operation.
  16. A neural network operation apparatus, comprising: a first storage unit, a second storage unit, a control unit, a second data rearrangement unit, a convolution unit and an addition unit; wherein
    the first storage unit is configured to store input data of a neural network operation, and the second storage unit is configured to store Wk*Hk sub-convolution kernel groups of the neural network operation; wherein N Wk*Hk*C convolution kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolution kernels, the N*Wk*Hk 1*1*C sub-convolution kernels are divided into the Wk*Hk sub-convolution kernel groups, each sub-convolution kernel group comprises N 1*1*C sub-convolution kernels, and N, Wk, Hk and C are all integers greater than or equal to 1; each sub-convolution kernel corresponds to partial input data in the input data, and in a case where N≥2, the N sub-convolution kernels in each sub-convolution kernel group correspond to the same partial input data;
    the control unit is configured to obtain the input data from the first storage unit and input the input data to the convolution unit, and is further configured to obtain each sub-convolution kernel group from the second storage unit and input each sub-convolution kernel group to the convolution unit;
    the convolution unit is configured to convolve each sub-convolution kernel group with the input data respectively to obtain a convolution result corresponding to each sub-convolution kernel group, and output the convolution results corresponding to the sub-convolution kernel groups to the second data rearrangement unit;
    the control unit is further configured to send a data rearrangement mode corresponding to each sub-convolution kernel group to the second data rearrangement unit;
    the second data rearrangement unit is configured to rearrange the convolution result corresponding to each sub-convolution kernel group according to the data rearrangement mode corresponding to that sub-convolution kernel group to obtain a rearranged convolution result corresponding to each sub-convolution kernel group, and output the rearranged convolution results corresponding to the sub-convolution kernel groups to the addition unit;
    wherein each sub-convolution kernel group convolved with the partial input data corresponding to that sub-convolution kernel group yields a valid convolution result corresponding to that sub-convolution kernel group, the valid convolution results in the rearranged convolution results corresponding to the sub-convolution kernel groups are located at the same data position, and the same data position is a valid position.
  17. A neural network operation apparatus, comprising: a first storage unit, a second storage unit, a third storage unit, a control unit, a third data rearrangement unit, a convolution unit and an addition unit; wherein
    the first storage unit is configured to store input data of a neural network operation, and the second storage unit is configured to store Wk*Hk sub-convolution kernel groups of the neural network operation; wherein N Wk*Hk*C convolution kernels of the neural network operation are split into N*Wk*Hk 1*1*C sub-convolution kernels, the N*Wk*Hk 1*1*C sub-convolution kernels are divided into the Wk*Hk sub-convolution kernel groups, each sub-convolution kernel group comprises N 1*1*C sub-convolution kernels, and N, Wk, Hk and C are all integers greater than or equal to 1; each sub-convolution kernel corresponds to partial input data in the input data, and in a case where N≥2, the N sub-convolution kernels in each sub-convolution kernel group correspond to the same partial input data;
    the control unit is configured to obtain the input data from the first storage unit and input the input data to the convolution unit, and is further configured to obtain an i-th sub-convolution kernel group from the second storage unit and input the i-th sub-convolution kernel group to the convolution unit;
    the convolution unit is configured to convolve the i-th sub-convolution kernel group with the input data to obtain an i-th convolution result, and output the i-th convolution result to the addition unit; wherein the i-th sub-convolution kernel group convolved with the partial input data corresponding to the i-th sub-convolution kernel group yields a valid convolution result corresponding to the i-th sub-convolution kernel group, and the i-th convolution result contains the valid convolution result;
    the control unit is further configured to obtain an (i-1)-th accumulation result from the third storage unit and send the (i-1)-th accumulation result to the third data rearrangement unit;
    the third data rearrangement unit is configured to rearrange the (i-1)-th accumulation result so that the valid convolution result in the rearranged (i-1)-th accumulation result and the valid convolution result in the i-th convolution result are located at the same data position, and output the rearranged (i-1)-th accumulation result to the addition unit;
    the addition unit is configured to accumulate the rearranged (i-1)-th accumulation result and the i-th convolution result to obtain an i-th accumulation result, and store the i-th accumulation result in the third storage unit, overwriting the (i-1)-th accumulation result; and
    the control unit is further configured to judge the value of i: if i is less than Wk*Hk, i is updated to i+1 and the above operations are performed again; if i is equal to Wk*Hk, the valid convolution result in the i-th accumulation result is taken as an output result of the neural network operation;
    wherein an initial value of i is 1, and when i=1, the 0-th accumulation result is set to zero, and the valid convolution result in the rearranged 0-th accumulation result and the valid convolution result in the first convolution result are deemed by default to be located at the same data position.
  18. A chip, comprising:
    at least one processing module; and
    a storage module communicatively connected to the at least one processing module; wherein
    the storage module stores instructions executable by the at least one processing module, and the instructions are executed by the at least one processing module to enable the at least one processing module to perform the method according to any one of claims 1 to 5, the method according to any one of claims 6 to 10, or the method according to any one of claims 11 to 14.
  19. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1 to 5, the method according to any one of claims 6 to 10, or the method according to any one of claims 11 to 14.
  20. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 5, the method according to any one of claims 6 to 10, or the method according to any one of claims 11 to 14.
PCT/CN2022/121427 2021-12-03 2022-09-26 Neural network operation method and apparatus, chip, electronic device and storage medium WO2023098256A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111466758.0 2021-12-03
CN202111466758.0A CN116306840A (en) 2021-12-03 2021-12-03 Neural network operation method, device, chip, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2023098256A1

Family

ID=86611499


Country Status (2)

Country Link
CN (1) CN116306840A (en)
WO (1) WO2023098256A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116861149A (en) * 2023-09-05 2023-10-10 之江实验室 Convolution operation optimization method, device and processor

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116881618A (en) * 2023-08-25 2023-10-13 之江实验室 General matrix multiplication calculation optimization method, device and processor

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108241890A (en) * 2018-01-29 2018-07-03 清华大学 A kind of restructural neural network accelerated method and framework
US20190065896A1 (en) * 2017-08-23 2019-02-28 Samsung Electronics Co., Ltd. Neural network method and apparatus
US20190188237A1 (en) * 2017-12-18 2019-06-20 Nanjing Horizon Robotics Technology Co., Ltd. Method and electronic device for convolution calculation in neutral network
CN111260037A (en) * 2020-02-11 2020-06-09 深圳云天励飞技术有限公司 Convolution operation method and device for image data, electronic device and storage medium
CN112215745A (en) * 2020-09-30 2021-01-12 深圳云天励飞技术股份有限公司 Image processing method and device and electronic equipment


Also Published As

Publication number Publication date
CN116306840A (en) 2023-06-23


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22900064

Country of ref document: EP

Kind code of ref document: A1