WO2021082722A1 - Computing device and method, and related product - Google Patents


Info

Publication number
WO2021082722A1
WO2021082722A1 (PCT/CN2020/113160)
Authority
WO
WIPO (PCT)
Prior art keywords
data
transformation
sub
result
unit
Prior art date
Application number
PCT/CN2020/113160
Other languages
French (fr)
Chinese (zh)
Inventor
张英男
曾洪博
张尧
刘少礼
黄迪
周诗怡
张曦珊
刘畅
郭家明
高钰峰
Original Assignee
中科寒武纪科技股份有限公司
Priority date
Filing date
Publication date
Application filed by 中科寒武纪科技股份有限公司
Publication of WO2021082722A1


Classifications

    • G06F17/15 Correlation function computation including computation of convolution operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Abstract

A computing device and method, and a related product. The device comprises a master control unit, a slave control unit, a storage unit, a master computing unit, and a slave computing unit. The device improves the energy efficiency ratio and computing speed of deep learning networks at the hardware-architecture level, thereby improving the performance of the deep learning networks.

Description

Computing device, method, and related products
This application claims priority to the Chinese patent application with application number 2019110610783, titled "Computing device, method and related products", filed with the Chinese Patent Office on November 1, 2019, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of computer technology, and in particular to a computing device, a method, and related products.
Background
With the maturation of emerging technologies such as big data and machine learning, more and more tasks involve a wide variety of matrix operations, and the speed of these matrix operations determines the speed of the computer algorithms built on them.
Popular deep learning networks contain a large number of matrix multiplication operations. In a fully connected layer of a deep learning network, the output neurons are computed as y = f(wx + b), where w is the weight matrix, x is the input vector, and b is the bias vector. The output y is obtained by multiplying the matrix w by the vector x, adding the vector b, and then applying an activation function element-wise to the resulting vector. In this process, the matrix-vector multiplication is far more expensive than the subsequent bias addition and activation, so implementing it efficiently has the greatest impact on the overall computation.
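As an illustrative sketch (not part of the disclosure), the fully connected computation y = f(wx + b) can be written in a few lines of NumPy, here using ReLU as an example activation function f:

```python
import numpy as np

def fully_connected(w, x, b):
    """Fully connected layer y = f(wx + b); ReLU is used as an example f."""
    z = w @ x + b              # matrix-vector multiply dominates the cost
    return np.maximum(z, 0.0)  # element-wise activation (ReLU)

w = np.array([[1.0, 2.0],
              [3.0, -4.0]])   # weight matrix
x = np.array([1.0, 1.0])      # input vector
b = np.array([0.5, -0.5])     # bias vector
y = fully_connected(w, x, b)  # -> [3.5, 0.0]
```

For an n-by-n weight matrix, the product wx costs on the order of n squared multiplications, while the bias addition and activation cost only on the order of n operations, which is why the matrix multiplication dominates.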
However, in hardware implementations, the multiplications of the original convolution are costly, resulting in a low energy efficiency ratio and long computation times for deep learning networks.
Summary
The present disclosure provides a computing device, a method, and related products to improve the energy efficiency ratio and computing speed of deep learning networks at the hardware-architecture level, thereby improving the performance of deep learning networks.
In a first aspect, the present disclosure provides a computing device, including: a master control unit, a slave control unit, a storage unit, a master arithmetic unit, and a slave arithmetic unit;
The master control unit is configured to send a first control instruction, which instructs the master arithmetic unit to perform the forward transformation of a Winograd convolution and instructs the slave control unit to send a second control instruction, which in turn instructs the slave arithmetic unit to perform the multiply-add and inverse transformation of the Winograd convolution;
The storage unit is configured to store the data used in the Winograd convolution;
The master arithmetic unit is configured to, in response to the first control instruction, fetch data from the storage unit and perform the forward transformation of the Winograd convolution to obtain a forward transformation result, where the forward transformation is decomposed into summation operations;
The slave arithmetic unit is configured to, in response to the second control instruction, obtain the forward transformation result from the master arithmetic unit, fetch data from the storage unit, and perform the multiply-add and inverse transformation of the Winograd convolution to obtain the Winograd convolution result, where the inverse transformation is decomposed into summation operations.
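To make the division of labor concrete, the following hedged sketch models the master step as the Winograd forward transformation of a 4x4 feature tile and the slave step as the element-wise multiplication with pre-transformed 3x3 weights followed by the inverse transformation. The function names and the F(2x2, 3x3) tile size are illustrative choices, and the transform matrices are the standard textbook ones, not matrices taken from the disclosure:

```python
import numpy as np

# Standard Winograd F(2x2, 3x3) transform matrices (textbook values).
B_T = np.array([[1, 0, -1, 0],
                [0, 1,  1, 0],
                [0, -1, 1, 0],
                [0, 1,  0, -1]], dtype=float)
G = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0, 0.0, 1.0]])
A_T = np.array([[1, 1, 1, 0],
                [0, 1, -1, -1]], dtype=float)

def master_forward_transform(d):
    """Master-unit step: forward-transform a 4x4 feature tile."""
    return B_T @ d @ B_T.T

def slave_multiply_and_inverse(V, U):
    """Slave-unit step: multiply element-wise with the pre-transformed
    weights, then inverse-transform to a 2x2 output tile."""
    return A_T @ (U * V) @ A_T.T

g = np.array([[1.0, 2.0, 1.0],
              [0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])               # 3x3 weights
d = np.arange(16, dtype=float).reshape(4, 4)  # 4x4 feature tile
U = G @ g @ G.T                               # weight transform, done once and stored
Y = slave_multiply_and_inverse(master_forward_transform(d), U)

# Reference: direct 2x2 'valid' convolution of d with g.
ref = np.array([[np.sum(d[i:i+3, j:j+3] * g) for j in range(2)]
                for i in range(2)])
```

The Winograd result Y equals the direct convolution reference, while the only multiplications that scale with the data are the 16 element-wise products in the slave step.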
In a second aspect, the present disclosure provides a neural network computing device that includes one or more computing devices as described in the first aspect, configured to obtain data to be operated on and control information from other processing devices, perform the specified neural network operations, and pass the execution results to other processing devices through an I/O interface;
When the neural network computing device contains multiple such computing devices, the computing devices are connected through a specific structure to transfer data;
Specifically, the multiple computing devices are interconnected and transfer data through a high-speed external device interconnect bus to support larger-scale neural network operations; the computing devices may share a single control system or have their own control systems; they may share memory or have their own memories; and they may be interconnected in an arbitrary topology.
In a third aspect, the present disclosure provides an artificial intelligence chip including the computing device of any implementation of the first aspect.
In a fourth aspect, the present disclosure provides an electronic device including the artificial intelligence chip of the third aspect.
In a fifth aspect, the present disclosure provides a board card, including: a storage device, an interface device, a control device, and the artificial intelligence chip of the third aspect;
The artificial intelligence chip is connected to the storage device, the control device, and the interface device, respectively;
The storage device is configured to store data;
The interface device is configured to implement data transfer between the artificial intelligence chip and external equipment;
The control device is configured to monitor the state of the artificial intelligence chip.
In a sixth aspect, the present disclosure provides a computing method applied to a computing device that includes a master control unit, a slave control unit, a storage unit, a master arithmetic unit, and a slave arithmetic unit. The method includes:
The master control unit sends a first control instruction, which instructs the master arithmetic unit to perform the forward transformation of a Winograd convolution and instructs the slave control unit to send a second control instruction, which in turn instructs the slave arithmetic unit to perform the multiply-add and inverse transformation of the Winograd convolution;
The storage unit stores the data used in the Winograd convolution;
In response to the first control instruction, the master arithmetic unit fetches data from the storage unit and performs the forward transformation of the Winograd convolution to obtain a forward transformation result, where the forward transformation is decomposed into summation operations;
In response to the second control instruction, the slave arithmetic unit obtains the forward transformation result from the master arithmetic unit, fetches data from the storage unit, and performs the multiply-add and inverse transformation of the Winograd convolution to obtain the Winograd convolution result, where the inverse transformation is decomposed into summation operations.
In the computing device, method, and related products provided by the present disclosure, the computing device is provided with a master control unit, a slave control unit, a storage unit, a master arithmetic unit, and a slave arithmetic unit. The master control unit sends a first control instruction, which instructs the master arithmetic unit to perform the forward transformation of the Winograd convolution and instructs the slave control unit to send a second control instruction, which in turn instructs the slave arithmetic unit to perform the multiply-add and inverse transformation. The storage unit stores the data used in the Winograd convolution. In response to the first control instruction, the master arithmetic unit fetches data from the storage unit and performs the forward transformation, obtaining the forward transformation result, where the forward transformation is decomposed into summation operations. In response to the second control instruction, the slave arithmetic unit obtains the forward transformation result from the master arithmetic unit, fetches data from the storage unit, and performs the multiply-add and inverse transformation, obtaining the Winograd convolution result, where the inverse transformation is decomposed into summation operations. This scheme can effectively improve the energy efficiency ratio and computing speed of deep learning networks at the hardware-architecture level, improving the performance of deep learning networks.
The present disclosure can be applied in scenarios including, but not limited to: data processing; electronic products such as robots, computers, printers, scanners, telephones, tablets, smart terminals, mobile phones, dashboard cameras, navigators, sensors, webcams, cloud servers, cameras, video cameras, projectors, watches, earphones, mobile storage, and wearable devices; vehicles such as aircraft, ships, and automobiles; household appliances such as televisions, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lamps, gas stoves, and range hoods; and medical equipment such as nuclear magnetic resonance scanners, B-mode ultrasound scanners, and electrocardiographs.
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the present disclosure and, together with the description, serve to explain the principles of the present disclosure.
Fig. 1 is a schematic structural diagram of a computing device provided by an embodiment of this application;
Fig. 2 is a schematic structural diagram of a computing device provided by another embodiment of this application;
Fig. 3 is a pipeline timing diagram of a computing device provided by yet another embodiment of this application;
Fig. 4 is a schematic flowchart of a computing method provided by yet another embodiment of this application;
Fig. 5 is a schematic diagram of a processing system for a computing method according to an embodiment of the present disclosure;
Fig. 6 is a structural block diagram of a board card according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present disclosure. All other embodiments obtained by those skilled in the art based on the embodiments of the present disclosure without creative effort fall within the protection scope of the present disclosure.
It should be understood that the terms "first", "second", "third", and "fourth" in the claims, specification, and drawings of the present disclosure are used to distinguish different objects, not to describe a particular order. The terms "include" and "comprise" used in the specification and claims of the present disclosure indicate the presence of the described features, wholes, steps, operations, elements, and/or components, but do not exclude the presence or addition of one or more other features, wholes, steps, operations, elements, components, and/or collections thereof.
It should also be understood that the terminology used in this specification of the present disclosure is for the purpose of describing particular embodiments only and is not intended to limit the present disclosure. As used in the specification and claims of the present disclosure, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" used in the specification and claims of the present disclosure refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the claims, the term "if" may be interpreted as "when", "once", "in response to determining", or "in response to detecting", depending on the context. Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted as "once it is determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]", depending on the context.
To clarify the technical solutions of the present application, the technical terms involved in the prior art and in the embodiments of the present application are explained below:
Convolution operation: a convolution operation starts from the upper-left corner of an image by opening a sliding window of the same size as the convolution kernel. The kernel values are multiplied element-wise with the corresponding image pixels inside the window and the products are summed; the result is the first pixel value of the new image produced by the convolution. The window then moves one column to the right, and the element-wise multiplication and summation are repeated to give the second pixel value. Continuing in this way, from left to right and from top to bottom, yields a complete new image.
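The sliding-window procedure described above can be sketched as a minimal reference implementation (illustrative NumPy code, not code from the disclosure):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Direct sliding-window ('valid') convolution: multiply the kernel
    element-wise with each window of the image and sum the products to
    produce one output pixel."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):            # top to bottom
        for j in range(ow):        # left to right
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(9, dtype=float).reshape(3, 3)
kernel = np.ones((2, 2))
result = conv2d_valid(image, kernel)  # -> [[8., 12.], [20., 24.]]
```

Each output pixel costs one kernel-sized batch of multiplications, which is the per-window cost that Winograd convolution reduces.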
Winograd convolution: Winograd convolution is a convolution acceleration technique based on polynomial interpolation. The two inputs of the convolution operation, a first target matrix and a second target matrix, are each subjected to the Winograd forward transformation; the transformed first and second target matrices are then multiplied element-wise; finally, the Winograd inverse transformation is applied to the element-wise product, giving a convolution result equivalent to that of the original convolution operation.
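For concreteness, the three stages (forward transforms, element-wise multiplication, inverse transform) can be illustrated with the standard 1-D Winograd algorithm F(2,3), which produces two convolution outputs from a 4-element input tile and a 3-tap filter using only four element-wise multiplications. The transform matrices below are the textbook ones, not matrices taken from the disclosure:

```python
import numpy as np

# Textbook Winograd F(2,3) transform matrices.
B_T = np.array([[1, 0, -1, 0],
                [0, 1,  1, 0],
                [0, -1, 1, 0],
                [0, 1,  0, -1]], dtype=float)   # input forward transform
G = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0, 0.0, 1.0]])                 # filter forward transform
A_T = np.array([[1, 1, 1, 0],
                [0, 1, -1, -1]], dtype=float)   # inverse transform

def winograd_f23(d, g):
    U = G @ g        # forward transform of the 3-tap filter
    V = B_T @ d      # forward transform of the 4-element input tile
    M = U * V        # element-wise (Hadamard) multiplication: 4 products
    return A_T @ M   # inverse transform -> 2 convolution outputs

d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([1.0, 1.0, 1.0])
y = winograd_f23(d, g)   # -> [6.0, 9.0], same as the direct convolution
```

Computing the same two outputs directly needs six multiplications (d[0:3] with g and d[1:4] with g); F(2,3) needs four, and for larger tiles the savings grow, which is the source of the efficiency gain.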
Convolutional neural network model: a convolutional neural network model is a class of feedforward neural network model that contains convolution computations and has a deep structure, and it is one of the representative models of deep learning. In network layers of a convolutional neural network model such as the convolutional layers and fully connected layers, convolution operations between neurons and convolution kernels are required to obtain feature data.
Fig. 1 is a schematic structural diagram of a computing device provided by an embodiment of this application. As shown in Fig. 1, the device of this embodiment may include a master control unit, a slave control unit, a storage unit, a master arithmetic unit, and a slave arithmetic unit. The master control unit is configured to send a first control instruction, which instructs the master arithmetic unit to perform the forward transformation of the Winograd convolution and instructs the slave control unit to send a second control instruction, which in turn instructs the slave arithmetic unit to perform the multiply-add and inverse transformation of the Winograd convolution. The storage unit is configured to store the data used in the Winograd convolution. The master arithmetic unit is configured to, in response to the first control instruction, fetch data from the storage unit and perform the forward transformation of the Winograd convolution to obtain a forward transformation result, where the forward transformation is decomposed into summation operations. The slave arithmetic unit is configured to, in response to the second control instruction, obtain the forward transformation result from the master arithmetic unit, fetch data from the storage unit, and perform the multiply-add and inverse transformation of the Winograd convolution to obtain the Winograd convolution result, where the inverse transformation is decomposed into summation operations.
In this embodiment, the master control unit and the slave control unit receive an operation control signal and decode it to obtain the corresponding first and second control instructions; the master control unit then sends the first control instruction to the master arithmetic unit, and the second control instruction is sent to the slave arithmetic unit. In response to the first control instruction, the master arithmetic unit fetches data from the storage unit and performs the forward transformation of the Winograd convolution, obtaining the forward transformation result. In response to the second control instruction, the slave arithmetic unit obtains the forward transformation result from the master arithmetic unit, fetches data from the storage unit, and performs the multiply-add and inverse transformation of the Winograd convolution, obtaining the Winograd convolution result.
In this embodiment, the computing device is provided with a master control unit, a slave control unit, a storage unit, a master arithmetic unit, and a slave arithmetic unit. The master control unit sends the first control instruction, which instructs the master arithmetic unit to perform the forward transformation of the Winograd convolution and instructs the slave control unit to send the second control instruction, which in turn instructs the slave arithmetic unit to perform the multiply-add and inverse transformation. The storage unit stores the data used in the Winograd convolution. In response to the first control instruction, the master arithmetic unit fetches data from the storage unit and performs the forward transformation, obtaining the forward transformation result. In response to the second control instruction, the slave arithmetic unit obtains the forward transformation result from the master arithmetic unit, fetches data from the storage unit, and performs the multiply-add and inverse transformation, obtaining the Winograd convolution result. Because the forward transformation in the master arithmetic unit and the inverse transformation in the slave arithmetic unit are both decomposed into summation operations, the scheme effectively improves the energy efficiency ratio and computing speed of deep learning networks at the hardware-architecture level, improving the performance of deep learning networks.
Fig. 2 is a schematic structural diagram of a computing device provided by another embodiment of this application. As shown in Fig. 2, the device of this embodiment may include: a master control unit, a slave control unit, a master storage unit, a slave storage unit, a master arithmetic unit, and a slave arithmetic unit.
In a first optional implementation, the master storage unit is configured to receive and store the feature data, and the slave storage unit is configured to receive and store the forward-transformed weight data.
In this embodiment, the master arithmetic unit obtains the feature data from the master storage unit and decomposes it into multiple sub-tensors; it performs transformation operations on the sub-tensors and sums the results, obtaining the forward-transformed feature data from the summation.
Optionally, the master arithmetic unit parses the feature data into multiple sub-tensors, where the feature data equals the sum of the sub-tensors, the number of sub-tensors equals the number of non-zero elements in the feature data, each sub-tensor contains a single non-zero element, and that element equals the non-zero element at the corresponding position in the feature data. The master arithmetic unit obtains the Winograd transformation result of the meta sub-tensor corresponding to each sub-tensor, where a meta sub-tensor is the tensor obtained by setting the sub-tensor's non-zero element to 1. It multiplies each meta sub-tensor's Winograd transformation result by the value of the sub-tensor's non-zero element, used as a coefficient, to obtain the sub-tensor's Winograd transformation result, and adds the Winograd transformation results of the sub-tensors to obtain the forward-transformed feature data. For each sub-tensor, the master arithmetic unit multiplies the corresponding meta sub-tensor by a left-multiplication matrix on the left and a right-multiplication matrix on the right to obtain the meta sub-tensor's Winograd transformation result; both matrices are determined by the size of the sub-tensor and by the Winograd transformation type, which includes the forward-transformation type and the inverse-transformation type.
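Because the Winograd transformation is linear, the decomposition described above is exact: transforming each sub-tensor's meta sub-tensor (whose transforms can be precomputed as constants), scaling by the non-zero value, and summing reproduces the full transform. A small sketch, using the standard F(2,3) forward-transform matrix as the left/right multiplication matrices for a 4x4 tile (an illustrative choice, not the disclosure's exact matrices):

```python
import numpy as np

# Standard F(2,3) forward-transform matrix, used here as the left
# multiplication matrix for a 4x4 tile; its transpose serves as the
# right multiplication matrix.
B_T = np.array([[1, 0, -1, 0],
                [0, 1,  1, 0],
                [0, -1, 1, 0],
                [0, 1,  0, -1]], dtype=float)
B = B_T.T

def transform_by_subtensor_decomposition(d):
    """Compute B^T d B by splitting d into sub-tensors that each keep a
    single non-zero element of d, transforming the corresponding meta
    sub-tensor (the non-zero element replaced by 1), scaling by the
    element value, and summing."""
    result = np.zeros_like(d)
    for (i, j), value in np.ndenumerate(d):
        if value == 0:
            continue
        meta = np.zeros_like(d)
        meta[i, j] = 1.0                    # meta sub-tensor
        result += value * (B_T @ meta @ B)  # precomputable transform, scaled
    return result

d = np.arange(16, dtype=float).reshape(4, 4)
direct = B_T @ d @ B
decomposed = transform_by_subtensor_decomposition(d)
```

In hardware, the transforms of the meta sub-tensors are constants whose entries here are only 0 and plus or minus 1, so the scaled accumulation involves no general multiplications; this is how the forward (and, analogously, the inverse) transformation reduces to summation.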
In this embodiment, the slave arithmetic unit may also store the Winograd convolution result in a preset address space of the storage unit. This enables targeted storage of the computation results and makes full use of the storage space.
In this embodiment, the master arithmetic unit is communicatively connected to multiple slave arithmetic units, and different slave arithmetic units are responsible for operating on different forward transformation results. This design improves computing capacity and efficiency and enables asynchronous operation.
In this embodiment, the slave arithmetic unit obtains the forward-transformed weight data from the slave storage unit, and then performs an element-wise multiplication of the forward-transformed feature data and the forward-transformed weight data to obtain a multiplication result; it decomposes the multiplication result into multiple sub-tensors, and performs transformation operations on the sub-tensors and sums the results to obtain the Winograd convolution result.
Optionally, the slave operation unit parses the multiplication result into multiple sub-tensors, where the multiplication result is the sum of the sub-tensors, the number of sub-tensors equals the number of non-zero elements in the multiplication result, each sub-tensor contains a single non-zero element, and that non-zero element is identical to the non-zero element at the corresponding position in the multiplication result.
In this embodiment, the slave operation unit obtains the Winograd transformation result of the meta sub-tensor corresponding to each sub-tensor, where a meta sub-tensor is the tensor obtained by setting the non-zero element of the sub-tensor to 1; multiplies the Winograd transformation result of the corresponding meta sub-tensor by the non-zero element value of the sub-tensor as a coefficient to obtain the Winograd transformation result of the sub-tensor; and adds the Winograd transformation results of the sub-tensors to obtain the Winograd convolution result. For each sub-tensor, the slave operation unit multiplies the corresponding meta sub-tensor by a left-multiplication matrix on the left and a right-multiplication matrix on the right to obtain the Winograd transformation result of the meta sub-tensor, where both matrices are determined by the size of the sub-tensor and the Winograd transformation type, and the Winograd transformation type includes a forward-transformation type and an inverse-transformation type.
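A minimal sketch of the meta-sub-tensor scheme, assuming the standard F(2x2, 3x3) forward-transformation matrix B^T (this application does not list concrete matrices): the transform of each meta sub-tensor is a constant matrix that can be precomputed once, and by linearity the coefficient-weighted sum of those constants equals the full left- and right-multiplication of the original tile.

```python
import numpy as np

# Standard forward-transform matrix for a 4x4 tile in F(2x2, 3x3)
# Winograd convolution (assumed for illustration).
BT = np.array([[1., 0., -1., 0.],
               [0., 1., 1., 0.],
               [0., -1., 1., 0.],
               [0., 1., 0., -1.]])

d = np.arange(16, dtype=float).reshape(4, 4)  # illustrative feature tile

# The transform of a meta sub-tensor (a single 1 at position (i, j)) is the
# constant matrix BT @ e_ij @ B, precomputable for every position.
meta_results = {}
for i in range(4):
    for j in range(4):
        e = np.zeros((4, 4))
        e[i, j] = 1.0
        meta_results[(i, j)] = BT @ e @ BT.T

# Scale each precomputed meta result by the element value and sum: by
# linearity this equals the direct left/right matrix transform BT @ d @ B.
acc = sum(d[i, j] * meta_results[(i, j)] for i in range(4) for j in range(4))
assert np.allclose(acc, BT @ d @ BT.T)
```

Since the entries of B^T are only 0 and ±1, the precomputed meta results contain only 0 and ±1 as well, so scaling and summing them involves no general multiplications; this is the sense in which the forward transformation is decomposed into summation operations.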
In a second optional implementation, the main storage unit is used to receive and store the feature data, and the slave storage unit is used to receive and store the weight data.
In this embodiment, the main operation unit obtains the feature data from the main storage unit, decomposes it into multiple sub-tensors, performs transformation operations on the sub-tensors and sums them, and obtains the forward transformation data of the feature data from the result of the summation.
Optionally, the main operation unit parses the feature data into multiple sub-tensors, where the feature data is the sum of the sub-tensors, the number of sub-tensors equals the number of non-zero elements in the feature data, each sub-tensor contains a single non-zero element, and that non-zero element is identical to the non-zero element at the corresponding position in the feature data. The main operation unit obtains the Winograd transformation result of the meta sub-tensor corresponding to each sub-tensor, where a meta sub-tensor is the tensor obtained by setting the non-zero element of the sub-tensor to 1; multiplies the Winograd transformation result of the corresponding meta sub-tensor by the non-zero element value of the sub-tensor as a coefficient to obtain the Winograd transformation result of the sub-tensor; and adds the Winograd transformation results of the sub-tensors to obtain the forward transformation data of the feature data.
In this embodiment, for each sub-tensor, the main operation unit multiplies the corresponding meta sub-tensor by a left-multiplication matrix on the left and a right-multiplication matrix on the right to obtain the Winograd transformation result of the meta sub-tensor, where both the left-multiplication matrix and the right-multiplication matrix are determined by the size of the sub-tensor and the Winograd transformation type, and the Winograd transformation type includes a forward-transformation type and an inverse-transformation type.
In this embodiment, the slave operation unit obtains the weight data from the slave storage unit, decomposes it into multiple sub-tensors, performs transformation operations on the sub-tensors and sums them, and obtains the forward transformation data of the weight data from the result of the summation. The slave operation unit then performs element-wise multiplication of the forward transformation data of the feature data and the forward transformation data of the weight data to obtain a multiplication result, decomposes the multiplication result into multiple sub-tensors, and performs transformation operations on the sub-tensors and sums them to obtain the Winograd convolution result.
Optionally, the slave operation unit parses the multiplication result into multiple sub-tensors, where the multiplication result is the sum of the sub-tensors, the number of sub-tensors equals the number of non-zero elements in the multiplication result, each sub-tensor contains a single non-zero element, and that non-zero element is identical to the non-zero element at the corresponding position in the multiplication result. The slave operation unit obtains the Winograd transformation result of the meta sub-tensor corresponding to each sub-tensor, where a meta sub-tensor is the tensor obtained by setting the non-zero element of the sub-tensor to 1; multiplies the Winograd transformation result of the corresponding meta sub-tensor by the non-zero element value of the sub-tensor as a coefficient to obtain the Winograd transformation result of the sub-tensor; and adds the Winograd transformation results of the sub-tensors to obtain the Winograd convolution result.
In this embodiment, for each sub-tensor, the slave operation unit multiplies the corresponding meta sub-tensor by a left-multiplication matrix on the left and a right-multiplication matrix on the right to obtain the Winograd transformation result of the meta sub-tensor, where both matrices are determined by the size of the sub-tensor and the Winograd transformation type, and the Winograd transformation type includes a forward-transformation type and an inverse-transformation type.
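The same linearity argument applies to the inverse transformation performed by the slave operation unit; a sketch assuming the standard F(2x2, 3x3) inverse-transform matrix A^T (not fixed by this application):

```python
import numpy as np

# Standard inverse-transform matrix for F(2x2, 3x3) Winograd convolution
# (assumed for illustration): maps a 4x4 product tile to a 2x2 output tile.
AT = np.array([[1., 1., 1., 0.],
               [0., 1., -1., -1.]])

M = np.random.rand(4, 4)  # illustrative element-wise multiplication result

# Decompose M into single-non-zero sub-tensors and inverse-transform each
# via its meta sub-tensor; by linearity the sum equals AT @ M @ A.
acc = np.zeros((2, 2))
for i in range(4):
    for j in range(4):
        e = np.zeros((4, 4))
        e[i, j] = 1.0                       # meta sub-tensor for position (i, j)
        acc += M[i, j] * (AT @ e @ AT.T)    # coefficient times transformed meta
assert np.allclose(acc, AT @ M @ AT.T)
```

As with the forward transform, the entries of A^T are only 0 and ±1, so the per-position meta results reduce the inverse transformation to summation operations.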
In a third optional implementation, the main storage unit is used to receive and store the feature data and the weight data, and the slave storage unit is used to receive and store the forward transformation data of the weight data.
In this embodiment, the main operation unit obtains the feature data from the main storage unit, decomposes it into multiple sub-tensors, performs transformation operations on the sub-tensors and sums them, and obtains the forward transformation data of the feature data from the result of the summation.
Optionally, the main operation unit parses the feature data into multiple sub-tensors, where the feature data is the sum of the sub-tensors, the number of sub-tensors equals the number of non-zero elements in the feature data, each sub-tensor contains a single non-zero element, and that non-zero element is identical to the non-zero element at the corresponding position in the feature data.
In this embodiment, the main operation unit obtains the Winograd transformation result of the meta sub-tensor corresponding to each sub-tensor, where a meta sub-tensor is the tensor obtained by setting the non-zero element of the sub-tensor to 1; multiplies the Winograd transformation result of the corresponding meta sub-tensor by the non-zero element value of the sub-tensor as a coefficient to obtain the Winograd transformation result of the sub-tensor; and adds the Winograd transformation results of the sub-tensors to obtain the forward transformation data of the feature data. For each sub-tensor, the main operation unit multiplies the corresponding meta sub-tensor by a left-multiplication matrix on the left and a right-multiplication matrix on the right to obtain the Winograd transformation result of the meta sub-tensor, where both matrices are determined by the size of the sub-tensor and the Winograd transformation type, and the Winograd transformation type includes a forward-transformation type and an inverse-transformation type.
In this embodiment, the main operation unit decomposes the weight data into multiple sub-tensors, performs transformation operations on the sub-tensors and sums them, obtains the forward transformation data of the weight data from the result of the summation, and sends the forward transformation data of the weight data to the slave storage unit. The slave operation unit obtains the forward transformation data of the weight data from the slave storage unit, performs element-wise multiplication of the forward transformation data of the feature data and the forward transformation data of the weight data to obtain a multiplication result, decomposes the multiplication result into multiple sub-tensors, and performs transformation operations on the sub-tensors and sums them to obtain the Winograd convolution result.
Optionally, the slave operation unit parses the multiplication result into multiple sub-tensors, where the multiplication result is the sum of the sub-tensors, the number of sub-tensors equals the number of non-zero elements in the multiplication result, each sub-tensor contains a single non-zero element, and that non-zero element is identical to the non-zero element at the corresponding position in the multiplication result. The slave operation unit obtains the Winograd transformation result of the meta sub-tensor corresponding to each sub-tensor, where a meta sub-tensor is the tensor obtained by setting the non-zero element of the sub-tensor to 1; multiplies the Winograd transformation result of the corresponding meta sub-tensor by the non-zero element value of the sub-tensor as a coefficient to obtain the Winograd transformation result of the sub-tensor; and adds the Winograd transformation results of the sub-tensors to obtain the Winograd convolution result. For each sub-tensor, the slave operation unit multiplies the corresponding meta sub-tensor by a left-multiplication matrix on the left and a right-multiplication matrix on the right to obtain the Winograd transformation result of the meta sub-tensor, where both matrices are determined by the size of the sub-tensor and the Winograd transformation type, and the Winograd transformation type includes a forward-transformation type and an inverse-transformation type.
In a fourth optional implementation, the main operation unit includes a main processing module and a cache. The main processing module is used to extract data from the storage unit in response to the first control instruction and perform the forward transformation operation of the Winograd convolution to obtain a forward transformation result; the cache is used to store the forward transformation result.
In this embodiment, the forward transformation results may be sent to the slave operation unit once a preset number of them have accumulated in the cache. This design batches the data to be processed by the slave operation unit and avoids keeping the slave operation unit permanently busy.
In a fifth optional implementation, the main operation unit and the slave operation unit operate in parallel: before the main operation unit has finished computing the forward transformation data of the feature data, the slave operation unit performs element-wise multiplication on the already-computed element positions of the forward transformation data of the feature data and the corresponding element positions of the forward transformation data of the weight data, until the element-wise product at every element position has been computed, yielding the multiplication result.
In this embodiment, the feature data stored in the main storage unit is divided into multiple pieces of first data for the Winograd convolution operation, where the size of the first data is determined by the size of the convolution kernel; the forward transformation data of the weight data stored in the slave storage unit is divided into multiple pieces of second data for the Winograd convolution operation, where the size of the second data is determined by the size of the first data. In response to the first control instruction sent by the main control unit, the main processing module of the main operation unit sequentially obtains the first data from the main storage unit, performs the forward transformation operation on it to obtain the forward transformation result of the first data, and stores that result in the cache. When the forward transformation results of the first data in the cache reach a preset number, the main processing module sequentially sends them to the slave operation unit. In response to the second control instruction sent by the slave control unit, the slave operation unit obtains the second data from the slave storage unit, performs element-wise multiplication of the forward transformation result of the first data and the second data to obtain an element-wise multiplication result, and performs the inverse transformation operation on that result to obtain an inverse transformation result. The slave operation unit obtains the Winograd convolution result from the inverse transformation result and sends it to the main storage unit for storage.
Fig. 3 is a pipeline timing diagram of an operation device provided by another embodiment of this application. As shown in Fig. 3, the slave functional unit and the main processing module are operation modules, while the cache, the main processing memory and the slave processing memory are storage modules. The input data pre-stored in the main processing memory and the slave processing memory has already been partitioned. The label "main functional unit to main processing memory" indicates that the main processing memory sends feature data to the main functional unit. The following takes a convolution with input size 4x4, kernel size 3x3, stride=1 and output size 2x2 as a detailed example. Suppose m denotes the number of feature data blocks partitioned along the height and width directions, and n denotes the number of weight data blocks partitioned along the Cout direction; k<=4 denotes the number of partitions along the Cin direction, and res(0,0)…(1,0) denote the output results. bd(i,j) denotes the i-th data block of bottom_data along the height and width directions and the j-th along the Cin direction after partitioning; a data block has size 16*16*512 bit, i.e. a block of syn_reuse_iter kernel-sized units, where syn_reuse_iter denotes the number of times a weight is reused and kernel denotes the most basic unit of the Winograd transformation. Wino_bd(i,j) denotes the bottom_data block in the Winograd domain after the Winograd transformation. Wino_w(i,j) denotes the i-th data block along the Cout direction and the j-th along the Cin direction of the transformed Winograd-domain weight; each such block has size 64*16*512 bit and corresponds to a feature data block, and the total number of blocks is n*k, with n=Cout/64 and k<=4. bd(i,j) is processed by the main processing module to produce the bottom_data transformation result Wino_bd(i1,j1), which is then sent to the slave functional unit, undergoes element-wise multiplication with the corresponding Wino_w(i2,j2), and then undergoes the inverse transformation operation to obtain the operation result.
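The worked configuration above (input 4x4, kernel 3x3, stride 1, output 2x2) corresponds to F(2x2, 3x3) Winograd convolution. A sketch using the standard transformation matrices (assumed here, since this application does not list them) runs the full forward transform / element-wise multiply / inverse transform pipeline and checks the result against direct convolution:

```python
import numpy as np

# Standard F(2x2, 3x3) Winograd matrices (assumed for illustration).
BT = np.array([[1, 0, -1, 0], [0, 1, 1, 0],
               [0, -1, 1, 0], [0, 1, 0, -1]], dtype=float)
G = np.array([[1, 0, 0], [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5], [0, 0, 1]])
AT = np.array([[1, 1, 1, 0], [0, 1, -1, -1]], dtype=float)

d = np.random.rand(4, 4)  # feature data tile (bottom_data)
g = np.random.rand(3, 3)  # 3x3 convolution kernel (weight)

V = BT @ d @ BT.T         # forward-transformed feature data (Wino_bd)
U = G @ g @ G.T           # forward-transformed weight data (Wino_w)
M = U * V                 # element-wise multiplication in the Winograd domain
Y = AT @ M @ AT.T         # inverse transformation: 2x2 output tile

# Check against direct stride-1 convolution over the 4x4 input.
direct = np.array([[np.sum(d[r:r+3, c:c+3] * g) for c in range(2)]
                   for r in range(2)])
assert np.allclose(Y, direct)
```

Extending this per-tile computation with the Cin-direction accumulation and the block indices bd(i,j)/Wino_w(i,j) described above yields the pipelined behaviour shown in Fig. 3.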
It should be noted that in Fig. 3, "cache to slave functional unit" means that the cache sends data blocks to the slave functional unit, and "slave processing memory to slave functional unit" means that the slave processing memory sends data blocks to the slave functional unit. During these transfers, a change in the index of the data block being sent reflects the corresponding data reuse relationship. In addition, when the slave functional unit sends data blocks to the main processing memory, it does so at intervals, i.e. data that can still be accumulated along the Cin direction is temporarily kept in the slave functional unit. The above process loops block by block, with Cin as the innermost loop dimension, i.e. the transformed blocks are operated on along the Cin direction. During computation along the Cin direction, the slave functional unit does not output a buffered result immediately, but waits until accumulation along Cin has finished. The number of loop iterations along Cin is <=4; otherwise the transformation would have to be repeated. The bottom_data transformation result of the main functional unit does not have to wait until the whole feature data block has been forward-transformed before being sent to the slave functional unit, and the inversely transformed result of the slave functional unit can likewise be output as soon as part of the operation has completed, without waiting for the transformation of the entire kernel to finish. This design reduces the latency of data operations and increases the operation speed.
In this embodiment, the operation device is provided with a main control unit, a slave control unit, a main storage unit, a slave storage unit, a main operation unit and a slave operation unit. The main control unit sends a first control instruction, which instructs the main operation unit to perform the forward transformation operation of the Winograd convolution and instructs the slave control unit to send a second control instruction; the second control instruction instructs the slave operation unit to perform the multiply-add and inverse transformation operations of the Winograd convolution. The storage units store the data used for the Winograd convolution. In response to the first control instruction, the main operation unit extracts data from the main storage unit and performs the forward transformation operation of the Winograd convolution to obtain a forward transformation result. In response to the second control instruction, the slave operation unit obtains the forward transformation result from the main operation unit, extracts data from the slave storage unit, and performs the multiply-add and inverse transformation operations of the Winograd convolution to obtain the Winograd convolution result. Because the forward transformation of the main operation unit and the inverse transformation of the slave operation unit are both decomposed into summation operations, the energy-efficiency ratio and operation speed of deep learning networks on the hardware architecture are effectively improved, improving the performance of deep learning networks.
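The efficiency claim can be made concrete for the F(2x2, 3x3) case used in the examples of this application: after the transforms are reduced to summations, the only general multiplications left are the element-wise products in the Winograd domain.

```python
# Direct 3x3 convolution of a 2x2 output tile: 2*2 outputs, 9 multiplications each.
direct_muls = 2 * 2 * 3 * 3    # 36
# F(2x2, 3x3) Winograd: one element-wise product of two 4x4 tiles.
winograd_muls = 4 * 4          # 16
assert direct_muls == 36 and winograd_muls == 16
assert direct_muls / winograd_muls == 2.25  # roughly 2.25x fewer multiplications
```

The transform overhead that replaces the saved multiplications consists of additions only, which is what makes the scheme attractive on hardware where multipliers dominate area and energy.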
The present disclosure provides a neural network operation device that includes one or more operation devices as shown in Fig. 1 and Fig. 2, used to obtain data to be operated on and control information from other processing devices, perform specified neural network operations, and pass the execution results to other processing devices through an I/O interface. When the neural network operation device contains multiple operation devices, they may be connected through a specific structure to transmit data; specifically, the operation devices may be interconnected through a fast external device interconnect bus to transmit data, so as to support larger-scale neural network operations. The multiple operation devices may share a single control system or have their own control systems, may share memory or have their own memories, and may be interconnected in any interconnection topology.
Fig. 4 is a schematic flowchart of an operation method provided by another embodiment of this application. As shown in Fig. 4, the method of this embodiment is applied to the operation device shown in Fig. 1 and Fig. 2, and may include the following steps.
Step S101: the main control unit sends a first control instruction, which instructs the main operation unit to perform the forward transformation operation of the Winograd convolution and instructs the slave control unit to send a second control instruction; the second control instruction instructs the slave operation unit to perform the multiply-add and inverse transformation operations of the Winograd convolution.
Step S102: in response to the first control instruction, the main operation unit extracts data from the storage unit and performs the forward transformation operation of the Winograd convolution to obtain a forward transformation result, where the forward transformation operation is decomposed into summation operations.
Step S103: in response to the second control instruction, the slave operation unit obtains the forward transformation result from the main operation unit, extracts data from the storage unit, and performs the multiply-add and inverse transformation operations of the Winograd convolution to obtain the Winograd convolution result, where the inverse transformation operation is decomposed into summation operations.
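Steps S101 to S103 split the Winograd convolution between the two units: the main operation unit owns the forward transform, the slave operation unit owns the multiply-add and inverse transform. A minimal sketch of that division of labour, assuming standard F(2x2, 3x3) matrices and illustrative function names (not from this application):

```python
import numpy as np

# Standard F(2x2, 3x3) Winograd matrices (assumed for illustration).
BT = np.array([[1, 0, -1, 0], [0, 1, 1, 0],
               [0, -1, 1, 0], [0, 1, 0, -1]], dtype=float)
G = np.array([[1, 0, 0], [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5], [0, 0, 1]])
AT = np.array([[1, 1, 1, 0], [0, 1, -1, -1]], dtype=float)

def main_unit_forward(feature_tile):
    # Step S102: forward transformation performed by the main operation unit.
    return BT @ feature_tile @ BT.T

def slave_unit_mul_inverse(v, u):
    # Step S103: element-wise multiply-add and inverse transformation,
    # performed by the slave operation unit.
    return AT @ (u * v) @ AT.T

feature = np.random.rand(4, 4)
weight = np.random.rand(3, 3)
u = G @ weight @ G.T  # forward-transformed weight data held in slave storage
result = slave_unit_mul_inverse(main_unit_forward(feature), u)
assert result.shape == (2, 2)
```

In the device itself, each matrix product above would be realized as the coefficient-weighted sum of precomputed meta-sub-tensor transforms, i.e. as the summation operations described in steps S102 and S103.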
In a possible design, the storage unit includes a main storage unit and a slave storage unit, where:
the main storage unit is used to receive and store the feature data; and
the slave storage unit is used to receive and store the forward transformation data of the weight data.
In a possible design, the main operation unit is specifically used to decompose the feature data into multiple sub-tensors, perform transformation operations on the sub-tensors and sum them, and obtain the forward transformation data of the feature data from the result of the summation.
In a possible design, the main operation unit is specifically used to parse the feature data into multiple sub-tensors, where the feature data is the sum of the sub-tensors, the number of sub-tensors equals the number of non-zero elements in the feature data, each sub-tensor contains a single non-zero element, and that non-zero element is identical to the non-zero element at the corresponding position in the feature data.
In a possible design, the main operation unit is specifically used to obtain the Winograd transformation result of the meta sub-tensor corresponding to each sub-tensor, where a meta sub-tensor is the tensor obtained by setting the non-zero element of the sub-tensor to 1; multiply the Winograd transformation result of the corresponding meta sub-tensor by the non-zero element value of the sub-tensor as a coefficient to obtain the Winograd transformation result of the sub-tensor; and add the Winograd transformation results of the sub-tensors to obtain the forward transformation data of the feature data.
In a possible design, the main operation unit is specifically used, for each sub-tensor, to multiply the corresponding meta sub-tensor by a left-multiplication matrix on the left and a right-multiplication matrix on the right to obtain the Winograd transformation result of the meta sub-tensor, where both matrices are determined by the size of the sub-tensor and the Winograd transformation type, and the Winograd transformation type includes a forward-transformation type and an inverse-transformation type.
In a possible design, the slave operation unit is specifically used to perform element-wise multiplication of the forward transformation data of the feature data and the forward transformation data of the weight data to obtain a multiplication result; decompose the multiplication result into multiple sub-tensors; and perform transformation operations on the sub-tensors and sum them to obtain the Winograd convolution result.
In a possible design, the slave operation unit is specifically configured to parse the multiplication result into multiple sub-tensors, where the multiplication result is the sum of the multiple sub-tensors, the number of sub-tensors equals the number of non-zero elements in the multiplication result, each sub-tensor has a single non-zero element, and the non-zero element of each sub-tensor equals the non-zero element at the corresponding position in the multiplication result.
In a possible design, the slave operation unit is specifically configured to: obtain the winograd transform result of the meta sub-tensor corresponding to each sub-tensor, where a meta sub-tensor is a tensor in which the non-zero element of the sub-tensor is set to 1; multiply the winograd transform result of the corresponding meta sub-tensor by the non-zero element value of the sub-tensor as a coefficient to obtain the winograd transform result of the sub-tensor; and add the winograd transform results of the multiple sub-tensors to obtain the winograd convolution result.

In a possible design, the slave operation unit is specifically configured to, for each sub-tensor, left-multiply the meta sub-tensor corresponding to the sub-tensor by a left-multiplication matrix and right-multiply it by a right-multiplication matrix to obtain the winograd transform result of the meta sub-tensor, where both the left-multiplication matrix and the right-multiplication matrix are determined by the scale of the sub-tensor and by the winograd transform type, the winograd transform type including a forward winograd transform type and an inverse winograd transform type.
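Putting the forward transform, element-wise multiplication, and sub-tensor-decomposed inverse transform together, the whole pipeline can be sketched with the standard F(2×2, 3×3) matrices (an illustrative assumption — the disclosure does not prescribe specific matrices) and checked against a direct sliding-window correlation:

```python
import numpy as np

# Assumed standard F(2x2, 3x3) Winograd matrices.
BT = np.array([[1, 0, -1,  0],
               [0, 1,  1,  0],
               [0, -1, 1,  0],
               [0, 1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

def winograd_conv_2x2_3x3(d, g):
    U = G @ g @ G.T        # forward-transformed weight data (4x4)
    V = BT @ d @ BT.T      # forward-transformed feature data (4x4)
    M = U * V              # element-wise multiplication
    # Inverse transform disassembled into a sum over sub-tensors of M:
    Y = np.zeros((2, 2))
    for i, j in zip(*np.nonzero(M)):
        meta = np.zeros_like(M)
        meta[i, j] = 1.0
        Y += M[i, j] * (AT @ meta @ AT.T)
    return Y

def direct_corr(d, g):
    # 2x2 "valid" cross-correlation, i.e. the convolution used in CNNs.
    return np.array([[np.sum(d[r:r+3, c:c+3] * g) for c in range(2)]
                     for r in range(2)])

rng = np.random.default_rng(0)
d = rng.standard_normal((4, 4))
g = rng.standard_normal((3, 3))
assert np.allclose(winograd_conv_2x2_3x3(d, g), direct_corr(d, g))
```

The element-wise product replaces the 36 multiplications of a direct 2×2-output 3×3 convolution with 16, while the forward and inverse transforms cost only additions once decomposed as above.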
In a possible design, the storage unit includes a master storage unit and a slave storage unit;

the master storage unit is configured to receive and store the feature data;

the slave storage unit is configured to receive and store the weight data.
In a possible design, the slave operation unit is further configured to disassemble the weight data into multiple sub-tensors, perform transform operations on the multiple sub-tensors and sum them, and obtain the forward-transformed weight data from the result of the summation.
In a possible design, the storage unit includes a master storage unit and a slave storage unit;

the master storage unit is configured to receive and store the feature data and the weight data;

the slave storage unit is configured to receive and store the forward-transformed weight data.
In a possible design, the master operation unit is further configured to disassemble the weight data into multiple sub-tensors, perform transform operations on the multiple sub-tensors and sum them, obtain the forward-transformed weight data from the result of the summation, and send the forward-transformed weight data to the slave storage unit.
In a possible design, the master operation unit includes a master processing module and a cache;

the master processing module is configured to, in response to the first control instruction, fetch data from the storage unit and perform the forward transform of the winograd convolution to obtain a forward transform result;

the cache is configured to store the forward transform results.
In a possible design, the cache is further configured to send the stored forward transform results to the slave operation unit once they have accumulated to a preset number.
In a possible design, the slave operation unit is further configured to store the winograd convolution result in a preset address space of the storage unit.
In a possible design, the master operation unit is communicatively connected to multiple slave operation units, and different slave operation units are responsible for operating on different forward transform results.
In a possible design, the master operation unit and the slave operation unit operate in parallel: before the master operation unit has finished computing the forward-transformed feature data, the slave operation unit performs element-wise multiplication between the element positions of the forward-transformed feature data already computed and the element positions of the corresponding forward-transformed weight data, until the element-wise product at every element position has been computed, yielding the multiplication result.
In a possible design, the feature data stored in the master storage unit is divided into multiple pieces of first data used for the winograd convolution, the size of the first data being determined by the size of the convolution kernel;

the forward-transformed weight data stored in the slave storage unit is divided into multiple pieces of second data used for the winograd convolution, the size of the second data being determined by the size of the first data;

the master processing module of the master operation unit, in response to the first control instruction sent by the master control unit, fetches the first data from the master storage unit in sequence, performs the forward transform on the first data to obtain the forward transform result of the first data, and stores the forward transform result of the first data in the cache;

when the forward transform results of the first data in the cache reach a preset number, the master processing module of the master operation unit sends the forward transform results of the first data in the cache to the slave operation unit in sequence;

the slave operation unit, in response to the second control instruction sent by the slave control unit, fetches the second data from the slave storage unit, performs element-wise multiplication between the forward transform result of the first data and the second data to obtain an element-wise product, and performs the inverse transform on the element-wise product to obtain an inverse transform result;

the slave operation unit obtains the winograd convolution result from the inverse transform result and sends the winograd convolution result to the master storage unit for storage.
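The buffering and hand-off in the flow above can be sketched schematically. The flush threshold, tile indexing, and round-robin dispatch below are illustrative assumptions, and the transform math is reduced to string tags so that only the cache/dispatch behavior is modelled:

```python
from collections import deque

PRESET = 2    # assumed cache flush threshold ("preset number")
N_SLAVES = 2  # multiple slave units handle different forward-transform results

def run(first_data):
    cache, out = deque(), {}

    def dispatch():
        while cache:
            idx, v = cache.popleft()
            slave = idx % N_SLAVES                           # round-robin
            out.setdefault(slave, []).append(f"inv({v}*W)")  # multiply + inverse

    for idx, tile in enumerate(first_data):
        cache.append((idx, f"fwd({tile})"))  # master: forward transform, cache
        if len(cache) == PRESET:             # preset number reached: send
            dispatch()
    dispatch()                               # drain any remainder
    return out

res = run(["d0", "d1", "d2", "d3", "d4"])
assert res[0] == ["inv(fwd(d0)*W)", "inv(fwd(d2)*W)", "inv(fwd(d4)*W)"]
assert res[1] == ["inv(fwd(d1)*W)", "inv(fwd(d3)*W)"]
```

The sketch shows why the slave units can start their multiply/inverse work before the master unit has transformed every tile: each flushed batch is independent of the tiles still to come.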
The operation method according to embodiments of the present disclosure can be applied to any processor of a processing system (for example, an artificial intelligence chip) that includes multiple processors (multiple cores). The processor may be a general-purpose processor such as a CPU (Central Processing Unit), or an artificial intelligence processor (IPU) for performing artificial intelligence operations. Artificial intelligence operations may include machine learning operations, brain-like operations, and so on, where machine learning operations include neural network operations, k-means operations, support vector machine operations, and the like. The artificial intelligence processor may include, for example, one or a combination of a GPU (Graphics Processing Unit), an NPU (Neural-Network Processing Unit), a DSP (Digital Signal Processing unit), and a Field-Programmable Gate Array (FPGA) chip. The present disclosure does not limit the specific type of the processor. In addition, the multiple processors in the processing system may be of the same or different types, which is not limited in the present disclosure.
In a possible implementation, the processor mentioned in the present disclosure may include multiple processing units, and each processing unit may independently run the various tasks assigned to it, such as convolution tasks, pooling tasks, or fully-connected tasks. The present disclosure does not limit the processing units or the tasks they run.

Fig. 5 is a schematic diagram of a processing system for the operation method according to an embodiment of the present disclosure. As shown in Fig. 5, the processing system 10 includes multiple processors 11 and a memory 12; the multiple processors 11 are used to execute instruction sequences, and the memory 12 is used to store data and may include random access memory (RAM) and a register file. The multiple processors 11 in the processing system 10 may share part of the storage space, for example part of the RAM space and the register file, and may also each have their own storage space.
It should be noted that, for brevity, the foregoing method embodiments are all described as a series of action combinations, but those skilled in the art will appreciate that the present disclosure is not limited by the described order of actions, because according to the present disclosure certain steps may be performed in another order or simultaneously. Furthermore, those skilled in the art will also appreciate that the embodiments described in the specification are all optional embodiments, and the actions and modules involved are not necessarily required by the present disclosure.

It should be further noted that although the steps in the flowcharts are shown in sequence as indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict restriction on their order of execution, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different moments; their order of execution is not necessarily sequential, and they may be executed in turn or alternately with at least part of the other steps, or of the sub-steps or stages of other steps.
It should be understood that the foregoing device embodiments are merely illustrative, and the device of the present disclosure may also be implemented in other ways. For example, the division of units/modules in the foregoing embodiments is only a division by logical function; other divisions are possible in an actual implementation. For example, multiple units, modules, or components may be combined or integrated into another system, or some features may be omitted or not executed.

In addition, unless otherwise specified, the functional units/modules in the embodiments of the present disclosure may be integrated into one unit/module, each unit/module may exist physically on its own, or two or more units/modules may be integrated together. The integrated unit/module may be implemented in the form of hardware or in the form of a software program module.
If the integrated unit/module is implemented in the form of hardware, the hardware may be a digital circuit, an analog circuit, and so on. Physical implementations of the hardware structure include but are not limited to transistors, memristors, and so on. Unless otherwise specified, the artificial intelligence processor may be any appropriate hardware processor, such as a CPU, GPU, FPGA, DSP, ASIC, and so on. Unless otherwise specified, the storage unit may be any appropriate magnetic or magneto-optical storage medium, such as resistive random access memory (RRAM), dynamic random access memory (DRAM), static random access memory (SRAM), enhanced dynamic random access memory (EDRAM), high-bandwidth memory (HBM), hybrid memory cube (HMC), and so on.

If the integrated unit/module is implemented as a software program module and sold or used as an independent product, it may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present disclosure in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present disclosure. The aforementioned memory includes various media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), removable hard disk, magnetic disk, or optical disc.
In a possible implementation, an artificial intelligence chip is also disclosed, which includes the above operation device.

In a possible implementation, a board card is also disclosed, which includes a storage device, an interface device, a control device, and the above artificial intelligence chip, where the artificial intelligence chip is connected to the storage device, the control device, and the interface device respectively; the storage device is used to store data; the interface device is used to implement data transmission between the artificial intelligence chip and external equipment; and the control device is used to monitor the state of the artificial intelligence chip.
Fig. 6 is a structural block diagram of a board card according to an embodiment of the present disclosure. Referring to Fig. 6, in addition to the chip 389, the board card may include other supporting components, including but not limited to: a storage device 390, an interface device 391, and a control device 392;

The storage device 390 is connected to the artificial intelligence chip through a bus and is used to store data. The storage device may include multiple groups of storage units 393. Each group of storage units is connected to the artificial intelligence chip through a bus. It can be understood that each group of storage units may be DDR SDRAM (Double Data Rate Synchronous Dynamic Random Access Memory).
DDR can double the speed of SDRAM without raising the clock frequency: it allows data to be read on both the rising and falling edges of the clock pulse, so DDR is twice as fast as standard SDRAM. In one embodiment, the storage device may include four groups of storage units, and each group may include multiple DDR4 memory chips. In one embodiment, the artificial intelligence chip may internally include four 72-bit DDR4 controllers, of which 64 bits are used for data transmission and 8 bits for ECC. It can be understood that when DDR4-3200 chips are used in each group of storage units, the theoretical data-transfer bandwidth can reach 25600 MB/s.
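The 25600 MB/s figure follows directly from the stated configuration, assuming DDR4-3200 (3200 MT/s) on a 64-bit data path with the 8 ECC bits of the 72-bit controller excluded:

```python
# Back-of-envelope check of the 25600 MB/s theoretical bandwidth quoted above.
transfers_per_second = 3200 * 10**6   # DDR4-3200: 3200 mega-transfers/s
bytes_per_transfer = 64 // 8          # 64-bit data bus (ECC bits excluded)
bandwidth_mb_s = transfers_per_second * bytes_per_transfer // 10**6
assert bandwidth_mb_s == 25600
```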
In one embodiment, each group of storage units includes multiple double data rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice within one clock cycle. A controller for controlling the DDR is provided in the chip to control the data transmission and data storage of each storage unit.
The interface device is electrically connected to the artificial intelligence chip and is used to implement data transmission between the artificial intelligence chip and external equipment (for example, a server or a computer). For example, in one embodiment, the interface device may be a standard PCIe interface: the data to be processed is transferred from the server to the chip through the standard PCIe interface. Preferably, when a PCIe 3.0 x16 interface is used for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device may also be another interface; the present disclosure does not limit the specific form of such other interfaces, as long as the interface unit can implement the transfer function. In addition, the calculation results of the artificial intelligence chip are transmitted back to the external equipment (for example, a server) by the interface device.
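The roughly 16000 MB/s figure is consistent with nominal PCIe 3.0 x16 parameters (assumed here, since the disclosure only quotes the total): 8 GT/s per lane, 128b/130b line coding, 16 lanes:

```python
# Rough consistency check of the ~16000 MB/s theoretical bandwidth figure.
raw_bits_per_lane = 8e9                          # PCIe 3.0: 8 GT/s per lane
payload_bits_per_lane = raw_bits_per_lane * 128 / 130  # 128b/130b encoding
pcie_mb_s = payload_bits_per_lane / 8 * 16 / 1e6       # 16 lanes, in MB/s
assert 15000 < pcie_mb_s < 16000   # ~15754 MB/s, quoted as ~16000 MB/s
```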
The control device is electrically connected to the artificial intelligence chip and is used to monitor the state of the artificial intelligence chip. Specifically, the artificial intelligence chip and the control device may be electrically connected through an SPI interface. The control device may include a microcontroller unit (MCU). The artificial intelligence chip may include multiple processing chips, multiple processing cores, or multiple processing circuits and may drive multiple loads; it can therefore be in different working states such as heavy load and light load. Through the control device, the working states of the multiple processing chips, processing cores, and/or processing circuits in the artificial intelligence chip can be regulated.
In a possible implementation, an electronic device is disclosed, which includes the above artificial intelligence chip. The electronic device includes a data processing apparatus, robot, computer, printer, scanner, tablet, smart terminal, mobile phone, dashboard camera, navigator, sensor, webcam, server, cloud server, camera, video camera, projector, watch, headset, mobile storage, wearable device, vehicle, household appliance, and/or medical device.

The vehicle includes an airplane, ship, and/or car; the household appliance includes a television, air conditioner, microwave oven, refrigerator, rice cooker, humidifier, washing machine, electric lamp, gas stove, and range hood; the medical device includes a nuclear magnetic resonance instrument, B-mode ultrasound scanner, and/or electrocardiograph.
In the above embodiments, the description of each embodiment has its own emphasis; for a part not detailed in one embodiment, reference may be made to the relevant descriptions of other embodiments. The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features have been described; however, as long as a combination of technical features contains no contradiction, it should be considered within the scope of this specification.

The foregoing may be better understood in light of the following clauses:
Clause 1. An operation device for performing a winograd convolution, including: a master control unit, a slave control unit, a storage unit, a master operation unit, and a slave operation unit;

the master control unit is configured to send a first control instruction, the first control instruction instructing the master operation unit to perform the forward transform of the winograd convolution and instructing the slave control unit to send a second control instruction, the second control instruction instructing the slave operation unit to perform the multiply-add and inverse-transform operations of the winograd convolution;

the storage unit is configured to store data used for the winograd convolution;

the master operation unit is configured to, in response to the first control instruction, fetch data from the storage unit and perform the forward transform of the winograd convolution to obtain a forward transform result, where the forward transform is disassembled into summation operations;

the slave operation unit is configured to, in response to the second control instruction, obtain the forward transform result from the master operation unit, fetch data from the storage unit, and perform the multiply-add and inverse-transform operations of the winograd convolution to obtain the winograd convolution result, where the inverse transform is disassembled into summation operations.
Clause 2. The operation device according to Clause 1, where the storage unit includes a master storage unit and a slave storage unit; the master storage unit is configured to receive and store the feature data; and the slave storage unit is configured to receive and store the forward-transformed weight data.
Clause 3. The operation device according to Clause 2, where the master operation unit is specifically configured to disassemble the feature data into multiple sub-tensors, perform transform operations on the multiple sub-tensors and sum them, and obtain the forward-transformed feature data from the result of the summation.
Clause 4. The operation device according to Clause 3, where the master operation unit is specifically configured to parse the feature data into multiple sub-tensors, the feature data being the sum of the multiple sub-tensors, the number of sub-tensors equaling the number of non-zero elements in the feature data, each sub-tensor having a single non-zero element, and the non-zero element of each sub-tensor equaling the non-zero element at the corresponding position in the feature data.
Clause 5. The operation device according to Clause 4, where the master operation unit is specifically configured to: obtain the winograd transform result of the meta sub-tensor corresponding to each sub-tensor, where a meta sub-tensor is a tensor in which the non-zero element of the sub-tensor is set to 1; multiply the winograd transform result of the corresponding meta sub-tensor by the non-zero element value of the sub-tensor as a coefficient to obtain the winograd transform result of the sub-tensor; and add the winograd transform results of the multiple sub-tensors to obtain the forward-transformed feature data.
Clause 6. The operation device according to Clause 5, where the master operation unit is specifically configured to, for each sub-tensor, left-multiply the meta sub-tensor corresponding to the sub-tensor by a left-multiplication matrix and right-multiply it by a right-multiplication matrix to obtain the winograd transform result of the meta sub-tensor, where both the left-multiplication matrix and the right-multiplication matrix are determined by the scale of the sub-tensor and by the winograd transform type, the winograd transform type including a forward winograd transform type and an inverse winograd transform type.
Clause 7. The operation device according to Clause 2, where the slave operation unit is specifically configured to: perform element-wise multiplication on the forward-transformed feature data and the forward-transformed weight data to obtain a multiplication result; disassemble the multiplication result into multiple sub-tensors; and perform transform operations on the multiple sub-tensors and sum them to obtain the winograd convolution result.
Clause 8. The operation device according to Clause 7, where the slave operation unit is specifically configured to parse the multiplication result into multiple sub-tensors, the multiplication result being the sum of the multiple sub-tensors, the number of sub-tensors equaling the number of non-zero elements in the multiplication result, each sub-tensor having a single non-zero element, and the non-zero element of each sub-tensor equaling the non-zero element at the corresponding position in the multiplication result.
Clause 9. The operation device according to Clause 8, where the slave operation unit is specifically configured to: obtain the winograd transform result of the meta sub-tensor corresponding to each sub-tensor, where a meta sub-tensor is a tensor in which the non-zero element of the sub-tensor is set to 1; multiply the winograd transform result of the corresponding meta sub-tensor by the non-zero element value of the sub-tensor as a coefficient to obtain the winograd transform result of the sub-tensor; and add the winograd transform results of the multiple sub-tensors to obtain the winograd convolution result.
Clause 10. The operation device according to Clause 9, where the slave operation unit is specifically configured to, for each sub-tensor, left-multiply the meta sub-tensor corresponding to the sub-tensor by a left-multiplication matrix and right-multiply it by a right-multiplication matrix to obtain the winograd transform result of the meta sub-tensor, where both the left-multiplication matrix and the right-multiplication matrix are determined by the scale of the sub-tensor and by the winograd transform type, the winograd transform type including a forward winograd transform type and an inverse winograd transform type.
Clause 11. The computing device according to Clause 1, wherein the storage unit includes a master storage unit and a slave storage unit;
the master storage unit is configured to receive and store feature data;
the slave storage unit is configured to receive and store weight data.
Clause 12. The computing device according to Clause 11, wherein
the slave operation unit is further configured to decompose the weight data into multiple sub-tensors, perform transform operations on the multiple sub-tensors and sum the results, and obtain the forward-transformed weight data from the result of the summation.
Clause 13. The computing device according to Clause 1, wherein the storage unit includes a master storage unit and a slave storage unit;
the master storage unit is configured to receive and store feature data and weight data;
the slave storage unit is configured to receive and store the forward-transformed weight data.
Clause 14. The computing device according to Clause 13, wherein
the master operation unit is further configured to decompose the weight data into multiple sub-tensors, perform transform operations on the multiple sub-tensors and sum the results, and obtain the forward-transformed weight data from the result of the summation; and
to send the forward-transformed weight data to the slave storage unit.
Clause 15. The computing device according to any one of Clauses 1-14, wherein the master operation unit includes a main processing module and a cache;
the main processing module is configured to, in response to the first control instruction, extract data from the storage unit and perform the forward transform of the Winograd convolution to obtain a forward transform result;
the cache is configured to store the forward transform result.
Clause 16. The computing device according to Clause 15, wherein the cache is further configured to send the stored forward transform results to the slave operation unit once they have accumulated to a preset number.
Clause 17. The computing device according to any one of Clauses 1-14, wherein the slave operation unit is further configured to store the Winograd convolution result in a preset address space of the storage unit.
Clause 18. The device according to any one of Clauses 1-14, wherein the master operation unit is communicatively connected to multiple slave operation units, and different slave operation units are responsible for operating on different forward transform results.
Clause 19. The device according to Clause 7, wherein the master operation unit and the slave operation unit operate in parallel: before the master operation unit finishes computing the forward-transformed feature data, the slave operation unit performs element-wise multiplication between the already-computed element positions of the forward-transformed feature data and the corresponding element positions of the forward-transformed weight data, until the element-wise product at every element position has been computed, yielding the multiplication result.
Clause 20. The device according to Clause 19, wherein
the feature data stored in the master storage unit is divided into multiple first data used for the Winograd convolution, the size of the first data being determined by the size of the convolution kernel;
the forward-transformed weight data stored in the slave storage unit is divided into multiple second data used for the Winograd convolution, the size of the second data being determined by the size of the first data;
the main processing module of the master operation unit, in response to the first control instruction sent by the master control unit, fetches the first data from the master storage unit in sequence, performs the forward transform on the first data to obtain the forward transform result of the first data, and stores that result in the cache;
when the forward transform results of the first data in the cache reach a preset number, the main processing module of the master operation unit sends them to the slave operation unit in sequence;
the slave operation unit, in response to the second control instruction sent by the slave control unit, fetches the second data from the slave storage unit, performs element-wise multiplication between the forward transform result of the first data and the second data to obtain an element-wise product, and performs the inverse transform on the element-wise product to obtain an inverse transform result;
the slave operation unit obtains the Winograd convolution result from the inverse transform result and sends it to the master storage unit for storage.
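Clauses 19-20 describe a tiled pipeline: the master unit forward-transforms tiles of the feature data while the slave unit multiplies them element-wise with pre-transformed weights and applies the inverse transform. A minimal single-threaded sketch of that data flow, assuming the standard 1D Winograd F(2, 3) matrices with tile size 4 and stride 2 (the hardware's actual tiling, buffering, and parallelism are not modeled):

```python
import numpy as np

# Standard 1D Winograd F(2, 3) transform matrices (an assumption; the
# application determines them from the sub-tensor size and transform type).
B_T = np.array([[1, 0, -1, 0],
                [0, 1, 1, 0],
                [0, -1, 1, 0],
                [0, 1, 0, -1]], dtype=float)
G = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0, 0.0, 1.0]])
A_T = np.array([[1, 1, 1, 0],
                [0, 1, -1, -1]], dtype=float)

def winograd_conv1d(feature, kernel):
    """Clause-20-style flow: tile the feature ("first data"), forward-transform
    each tile (master unit's role), element-wise multiply with the
    pre-transformed kernel ("second data" in the slave storage unit),
    inverse-transform (slave unit's role), and concatenate the tile outputs."""
    U = G @ kernel                               # forward-transformed weight data
    outputs = []
    for start in range(0, len(feature) - 3, 2):  # 4-wide tiles, stride 2
        tile = feature[start:start + 4]          # first data: size set by the kernel size
        V = B_T @ tile                           # forward transform of the feature tile
        M = U * V                                # element-wise multiplication
        outputs.append(A_T @ M)                  # inverse transform -> 2 outputs per tile
    return np.concatenate(outputs)

feature = np.arange(8, dtype=float)
kernel = np.array([1.0, 2.0, 3.0])
expected = np.convolve(feature, kernel[::-1], mode="valid")  # direct correlation
assert np.allclose(winograd_conv1d(feature, kernel), expected)
```

Each 4-element tile yields 2 outputs, so the per-tile element-wise products can be produced and consumed independently, which is what lets the slave unit start multiplying before the master unit has transformed the whole feature map.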
Clause 21. An artificial intelligence chip, comprising the computing device according to any one of Clauses 1-20.
Clause 22. An electronic device, comprising the artificial intelligence chip according to Clause 21.
Clause 23. A board card, comprising: a storage device, an interface device, a control device, and the artificial intelligence chip according to Clause 21;
wherein the artificial intelligence chip is connected to the storage device, the control device, and the interface device, respectively;
the storage device is configured to store data;
the interface device is configured to implement data transfer between the artificial intelligence chip and external equipment;
the control device is configured to monitor the state of the artificial intelligence chip.
Clause 24. The board card according to Clause 23, wherein the storage device includes multiple groups of storage units, each group of storage units being connected to the artificial intelligence chip via a bus, and the storage units being DDR SDRAM;
the chip includes a DDR controller configured to control data transfer to, and data storage in, each storage unit;
the interface device is a standard PCIe interface.
Clause 25. An operation method applied to a computing device, the computing device comprising: a master control unit, a slave control unit, a storage unit, a master operation unit, and a slave operation unit; the method comprising:
the master control unit sends a first control instruction, which instructs the master operation unit to perform the forward transform of the Winograd convolution and instructs the slave control unit to send a second control instruction, which in turn instructs the slave operation unit to perform the multiply-accumulate and inverse-transform operations of the Winograd convolution;
the storage unit stores data used for the Winograd convolution;
the master operation unit, in response to the first control instruction, extracts data from the storage unit and performs the forward transform of the Winograd convolution to obtain a forward transform result, wherein the forward transform is decomposed into summation operations;
the slave operation unit, in response to the second control instruction, obtains the forward transform result from the master operation unit, extracts data from the storage unit, and performs the multiply-accumulate and inverse-transform operations of the Winograd convolution to obtain a Winograd convolution result, wherein the inverse transform is decomposed into summation operations.
The above scheme effectively improves the processing efficiency of the chip. However, because matrix multiplications are still performed in the Winograd forward and inverse transforms, considerable overhead remains in the hardware implementation. To further improve processing efficiency, the embodiments of the present application therefore also propose a Winograd convolution operation method, which can be applied in the hardware implementation of convolutional neural networks.
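For reference, the forward transforms, element-wise multiplication, and inverse transform that the master and slave units cooperate on can be written end to end for the 2D case. A sketch using the standard F(2x2, 3x3) Winograd matrices (an assumption; the application does not fix particular transform matrices), checked against direct convolution:

```python
import numpy as np

# Standard F(2x2, 3x3) Winograd matrices (assumed for illustration).
B_T = np.array([[1, 0, -1, 0],
                [0, 1, 1, 0],
                [0, -1, 1, 0],
                [0, 1, 0, -1]], dtype=float)
G = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0, 0.0, 1.0]])
A_T = np.array([[1, 1, 1, 0],
                [0, 1, -1, -1]], dtype=float)

def winograd_f2x2_3x3(d, g):
    """A 4x4 input tile and a 3x3 kernel yield a 2x2 output: each operand is
    left-multiplied by its transform matrix and right-multiplied by that
    matrix's transpose, the results are multiplied element-wise, and the
    product is inverse-transformed the same way."""
    U = G @ g @ G.T          # forward-transformed weight data
    V = B_T @ d @ B_T.T      # forward-transformed feature tile
    M = U * V                # element-wise multiplication
    return A_T @ M @ A_T.T   # inverse transform

def direct_conv2d_valid(d, g):
    out = np.zeros((2, 2))
    for i in range(2):
        for j in range(2):
            out[i, j] = np.sum(d[i:i + 3, j:j + 3] * g)
    return out

rng = np.random.default_rng(0)
d = rng.standard_normal((4, 4))
g = rng.standard_normal((3, 3))
assert np.allclose(winograd_f2x2_3x3(d, g), direct_conv2d_valid(d, g))
```

Every entry of `B_T` and `A_T` is 0 or ±1, so the left- and right-multiplications they describe reduce to additions and subtractions, which is exactly the decomposition into summation operations that the clauses above rely on.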
The embodiments of the present disclosure have been described in detail above, and specific examples have been used herein to explain the principles and implementations of the present disclosure; the descriptions of the above embodiments are intended only to aid understanding of the methods of the present disclosure and their core ideas. Likewise, any changes or modifications made by those skilled in the art based on the ideas, specific embodiments, and scope of application of the present disclosure fall within its protection scope. In summary, the content of this specification should not be construed as limiting the present disclosure.

Claims (44)

  1. A computing device for performing Winograd convolution, characterized by comprising: a master control unit, a slave control unit, a storage unit, a master operation unit, and a slave operation unit;
    the master control unit is configured to send a first control instruction, the first control instruction instructing the master operation unit to perform the forward transform of the Winograd convolution and instructing the slave control unit to send a second control instruction, the second control instruction instructing the slave operation unit to perform the multiply-accumulate and inverse-transform operations of the Winograd convolution;
    the storage unit is configured to store data used for the Winograd convolution;
    the master operation unit is configured to, in response to the first control instruction, extract data from the storage unit and perform the forward transform of the Winograd convolution to obtain a forward transform result, wherein the forward transform is decomposed into summation operations;
    the slave operation unit is configured to, in response to the second control instruction, obtain the forward transform result from the master operation unit, extract data from the storage unit, and perform the multiply-accumulate and inverse-transform operations of the Winograd convolution to obtain a Winograd convolution result, wherein the inverse transform is decomposed into summation operations.
  2. The computing device according to claim 1, wherein the storage unit comprises a master storage unit and a slave storage unit;
    the master storage unit is configured to receive and store feature data;
    the slave storage unit is configured to receive and store the forward-transformed weight data.
  3. The computing device according to claim 2, wherein
    the master operation unit is specifically configured to decompose the feature data into multiple sub-tensors, perform transform operations on the multiple sub-tensors and sum the results, and obtain the forward-transformed feature data from the result of the summation.
  4. The computing device according to claim 3, wherein
    the master operation unit is specifically configured to parse the feature data into multiple sub-tensors, wherein the feature data is the sum of the multiple sub-tensors, the number of sub-tensors equals the number of non-zero elements in the feature data, each sub-tensor contains a single non-zero element, and the non-zero element in the sub-tensor equals the non-zero element at the corresponding position in the feature data.
  5. The computing device according to claim 4, wherein
    the master operation unit is specifically configured to obtain the Winograd transform result of the meta sub-tensor corresponding to each sub-tensor, wherein a meta sub-tensor is the tensor obtained by setting the sub-tensor's non-zero element to 1; multiply the sub-tensor's non-zero element value, as a coefficient, by the Winograd transform result of the corresponding meta sub-tensor to obtain the Winograd transform result of the sub-tensor; and add the Winograd transform results of the multiple sub-tensors to obtain the forward-transformed feature data.
  6. The computing device according to claim 5, wherein the master operation unit is specifically configured to, for each sub-tensor, left-multiply the meta sub-tensor corresponding to the sub-tensor by a left-multiplication matrix and right-multiply it by a right-multiplication matrix to obtain the Winograd transform result of the meta sub-tensor, wherein the left-multiplication matrix and the right-multiplication matrix are both determined by the size of the sub-tensor and by the Winograd transform type, the Winograd transform type being either a forward transform or an inverse transform.
  7. The computing device according to claim 2, wherein
    the slave operation unit is specifically configured to perform element-wise multiplication between the forward-transformed feature data and the forward-transformed weight data to obtain a multiplication result;
    decompose the multiplication result into multiple sub-tensors; and perform transform operations on the multiple sub-tensors and sum the results to obtain the Winograd convolution result.
  8. The computing device according to claim 7, wherein
    the slave operation unit is specifically configured to parse the multiplication result into multiple sub-tensors, wherein the multiplication result is the sum of the multiple sub-tensors, the number of sub-tensors equals the number of non-zero elements in the multiplication result, each sub-tensor contains a single non-zero element, and the non-zero element in the sub-tensor equals the non-zero element at the corresponding position in the multiplication result.
  9. The computing device according to claim 8, wherein
    the slave operation unit is specifically configured to obtain the Winograd transform result of the meta sub-tensor corresponding to each sub-tensor, wherein a meta sub-tensor is the tensor obtained by setting the sub-tensor's non-zero element to 1; multiply the sub-tensor's non-zero element value, as a coefficient, by the Winograd transform result of the corresponding meta sub-tensor to obtain the Winograd transform result of the sub-tensor; and add the Winograd transform results of the multiple sub-tensors to obtain the Winograd convolution result.
  10. The computing device according to claim 9, wherein the slave operation unit is specifically configured to, for each sub-tensor, left-multiply the meta sub-tensor corresponding to the sub-tensor by a left-multiplication matrix and right-multiply it by a right-multiplication matrix to obtain the Winograd transform result of the meta sub-tensor, wherein the left-multiplication matrix and the right-multiplication matrix are both determined by the size of the sub-tensor and by the Winograd transform type, the Winograd transform type being either a forward transform or an inverse transform.
  11. The computing device according to claim 1, wherein the storage unit comprises a master storage unit and a slave storage unit;
    the master storage unit is configured to receive and store feature data;
    the slave storage unit is configured to receive and store weight data.
  12. The computing device according to claim 11, wherein
    the slave operation unit is further configured to decompose the weight data into multiple sub-tensors, perform transform operations on the multiple sub-tensors and sum the results, and obtain the forward-transformed weight data from the result of the summation.
  13. The computing device according to claim 1, wherein the storage unit comprises a master storage unit and a slave storage unit;
    the master storage unit is configured to receive and store feature data and weight data;
    the slave storage unit is configured to receive and store the forward-transformed weight data.
  14. The computing device according to claim 13, wherein
    the master operation unit is further configured to decompose the weight data into multiple sub-tensors, perform transform operations on the multiple sub-tensors and sum the results, and obtain the forward-transformed weight data from the result of the summation; and
    to send the forward-transformed weight data to the slave storage unit.
  15. The computing device according to any one of claims 1-14, wherein the master operation unit comprises a main processing module and a cache;
    the main processing module is configured to, in response to the first control instruction, extract data from the storage unit and perform the forward transform of the Winograd convolution to obtain a forward transform result;
    the cache is configured to store the forward transform result.
  16. The computing device according to claim 15, wherein the cache is further configured to send the stored forward transform results to the slave operation unit once they have accumulated to a preset number.
  17. The computing device according to any one of claims 1-14, wherein the slave operation unit is further configured to store the Winograd convolution result in a preset address space of the storage unit.
  18. The device according to any one of claims 1-14, wherein the master operation unit is communicatively connected to multiple slave operation units, and different slave operation units are responsible for operating on different forward transform results.
  19. The device according to claim 7, wherein the master operation unit and the slave operation unit operate in parallel: before the master operation unit finishes computing the forward-transformed feature data, the slave operation unit performs element-wise multiplication between the already-computed element positions of the forward-transformed feature data and the corresponding element positions of the forward-transformed weight data, until the element-wise product at every element position has been computed, yielding the multiplication result.
  20. The device according to claim 19, wherein
    the feature data stored in the master storage unit is divided into multiple first data used for the Winograd convolution, the size of the first data being determined by the size of the convolution kernel;
    the forward-transformed weight data stored in the slave storage unit is divided into multiple second data used for the Winograd convolution, the size of the second data being determined by the size of the first data;
    the main processing module of the master operation unit, in response to the first control instruction sent by the master control unit, fetches the first data from the master storage unit in sequence, performs the forward transform on the first data to obtain the forward transform result of the first data, and stores that result in a cache;
    when the forward transform results of the first data in the cache reach a preset number, the main processing module of the master operation unit sends them to the slave operation unit in sequence;
    the slave operation unit, in response to the second control instruction sent by the slave control unit, fetches the second data from the slave storage unit, performs element-wise multiplication between the forward transform result of the first data and the second data to obtain an element-wise product, and performs the inverse transform on the element-wise product to obtain an inverse transform result;
    the slave operation unit obtains the Winograd convolution result from the inverse transform result and sends it to the master storage unit for storage.
  21. An artificial intelligence chip, characterized in that the chip comprises the computing device according to any one of claims 1-20.
  22. An electronic device, characterized in that the electronic device comprises the artificial intelligence chip according to claim 21.
  23. A board card, characterized in that the board card comprises: a storage device, an interface device, a control device, and the artificial intelligence chip according to claim 21;
    wherein the artificial intelligence chip is connected to the storage device, the control device, and the interface device, respectively;
    the storage device is configured to store data;
    the interface device is configured to implement data transfer between the artificial intelligence chip and external equipment;
    the control device is configured to monitor the state of the artificial intelligence chip.
  24. The board card according to claim 23, wherein
    the storage device comprises multiple groups of storage units, each group of storage units being connected to the artificial intelligence chip via a bus, and the storage units being DDR SDRAM;
    the chip comprises a DDR controller configured to control data transfer to, and data storage in, each storage unit;
    the interface device is a standard PCIe interface.
  25. An operation method applied to a computing device, characterized in that the computing device comprises: a master control unit, a slave control unit, a storage unit, a master operation unit, and a slave operation unit; the method comprising:
    the master control unit sends a first control instruction, the first control instruction instructing the master operation unit to perform the forward transform of the Winograd convolution and instructing the slave control unit to send a second control instruction, the second control instruction instructing the slave operation unit to perform the multiply-accumulate and inverse-transform operations of the Winograd convolution;
    the storage unit stores data used for the Winograd convolution;
    the master operation unit, in response to the first control instruction, extracts data from the storage unit and performs the forward transform of the Winograd convolution to obtain a forward transform result, wherein the forward transform is decomposed into summation operations;
    the slave operation unit, in response to the second control instruction, obtains the forward transform result from the master operation unit, extracts data from the storage unit, and performs the multiply-accumulate and inverse-transform operations of the Winograd convolution to obtain a Winograd convolution result, wherein the inverse transform is decomposed into summation operations.
  26. The operation method according to claim 25, wherein the storage unit comprises a master storage unit and a slave storage unit;
    the storage unit storing data used for the Winograd convolution comprises:
    the master storage unit receiving and storing feature data;
    the slave storage unit receiving and storing the forward-transformed weight data.
  27. The operation method according to claim 26, wherein the master operation unit, in response to the first control instruction, extracting data from the storage unit and performing the forward transform of the Winograd convolution to obtain a forward transform result comprises:
    the master operation unit decomposing the feature data into multiple sub-tensors;
    the master operation unit performing transform operations on the multiple sub-tensors and summing the results, and obtaining the forward-transformed feature data from the result of the summation.
  28. The operation method according to claim 27, wherein the main operation unit decomposing the feature data into a plurality of sub-tensors comprises:
    the main operation unit parsing the feature data to obtain the plurality of sub-tensors;
    wherein the feature data is the sum of the plurality of sub-tensors, the number of sub-tensors equals the number of non-zero elements in the feature data, each sub-tensor contains a single non-zero element, and the non-zero element in each sub-tensor is identical to the non-zero element at the corresponding position in the feature data.
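The decomposition recited in claim 28 can be illustrated with a minimal Python sketch (not part of the claims; the function name and the example tile are illustrative assumptions): a tile is split into one sub-tensor per non-zero element, each holding that element at its original position, so the sub-tensors sum back to the original tile.

```python
def split_into_subtensors(tile):
    """Return one sub-tensor per non-zero element of `tile`."""
    rows, cols = len(tile), len(tile[0])
    subtensors = []
    for i in range(rows):
        for j in range(cols):
            if tile[i][j] != 0:
                # Each sub-tensor has a single non-zero element,
                # identical to the element at the same position in `tile`.
                sub = [[0] * cols for _ in range(rows)]
                sub[i][j] = tile[i][j]
                subtensors.append(sub)
    return subtensors

tile = [[1, 0], [2, 3]]
subs = split_into_subtensors(tile)

# The number of sub-tensors equals the number of non-zero elements,
# and element-wise summation recovers the original tile.
total = [[sum(s[i][j] for s in subs) for j in range(2)] for i in range(2)]
assert len(subs) == 3
assert total == tile
```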
  29. The operation method according to claim 28, wherein the main operation unit performing transformation operations on the plurality of sub-tensors and summing the results, and obtaining the forward-transformed feature data according to the result of the summation comprises:
    the main operation unit obtaining the Winograd transformation result of the meta-sub-tensor corresponding to each sub-tensor, wherein a meta-sub-tensor is a tensor in which the non-zero element of the sub-tensor is set to 1;
    the main operation unit multiplying the Winograd transformation result of the corresponding meta-sub-tensor by the non-zero element value of the sub-tensor as a coefficient to obtain the Winograd transformation result of the sub-tensor; and
    the main operation unit adding the Winograd transformation results of the plurality of sub-tensors to obtain the forward-transformed feature data.
  30. The operation method according to claim 29, wherein the main operation unit obtaining the Winograd transformation result of the meta-sub-tensor corresponding to each sub-tensor comprises:
    for each sub-tensor, the main operation unit left-multiplying the meta-sub-tensor corresponding to the sub-tensor by a left-multiplication matrix and right-multiplying it by a right-multiplication matrix to obtain the Winograd transformation result of the meta-sub-tensor;
    wherein both the left-multiplication matrix and the right-multiplication matrix are determined by the scale of the sub-tensor and by the Winograd transformation type, the Winograd transformation type including a forward-transformation type and an inverse-transformation type.
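Claims 29 and 30 can be illustrated together by a hedged sketch. Assuming the common F(2x2, 3x3) Winograd variant (not specified in the claims), the forward-transform left matrix is B^T and the right matrix is its transpose B; by linearity of matrix multiplication, summing the scaled transforms of the meta-sub-tensors (each holding a single 1) equals transforming the tile directly. The helper names and the example tile are illustrative.

```python
def matmul(a, b):
    # Plain-list matrix multiply, so the sketch has no dependencies.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

# Forward-transform matrix B^T for 4x4 input tiles in F(2x2, 3x3).
BT = [[1, 0, -1, 0],
      [0, 1,  1, 0],
      [0, -1, 1, 0],
      [0, 1,  0, -1]]
B = transpose(BT)

d = [[1, 2, 0, 0],      # example sparse 4x4 input tile
     [0, 3, 0, 0],
     [0, 0, 4, 0],
     [0, 0, 0, 5]]

# Direct transform: left-multiply by B^T, right-multiply by B.
direct = matmul(matmul(BT, d), B)

# Per claims 29-30: transform each meta-sub-tensor (a single 1 at one
# position), scale by the element value as a coefficient, and sum.
acc = [[0] * 4 for _ in range(4)]
for i in range(4):
    for j in range(4):
        if d[i][j] != 0:
            E = [[0] * 4 for _ in range(4)]
            E[i][j] = 1                    # meta-sub-tensor
            t = matmul(matmul(BT, E), B)   # its Winograd transform
            for r in range(4):
                for c in range(4):
                    acc[r][c] += d[i][j] * t[r][c]

assert acc == direct   # summed sub-tensor transforms match the direct transform
```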
  31. The operation method according to claim 26, wherein the slave operation unit responding to the second control instruction, obtaining the forward transformation result from the main operation unit, extracting data from the storage unit, and performing the multiply-add operation and the inverse transformation of the Winograd convolution operation to obtain the Winograd convolution result comprises:
    the slave operation unit performing element-wise multiplication of the forward-transformed feature data and the forward-transformed weight data to obtain a multiplication result; and
    the slave operation unit decomposing the multiplication result into a plurality of sub-tensors, performing transformation operations on the plurality of sub-tensors, and summing the results to obtain the Winograd convolution result.
  32. The operation method according to claim 31, wherein the slave operation unit decomposing the multiplication result into a plurality of sub-tensors comprises:
    the slave operation unit parsing the multiplication result to obtain the plurality of sub-tensors;
    wherein the multiplication result is the sum of the plurality of sub-tensors, the number of sub-tensors equals the number of non-zero elements in the multiplication result, each sub-tensor contains a single non-zero element, and the non-zero element in each sub-tensor is identical to the non-zero element at the corresponding position in the multiplication result.
  33. The operation method according to claim 32, wherein the slave operation unit decomposing the multiplication result into a plurality of sub-tensors, performing transformation operations on the plurality of sub-tensors, and summing the results to obtain the Winograd convolution result comprises:
    the slave operation unit obtaining the Winograd transformation result of the meta-sub-tensor corresponding to each sub-tensor;
    wherein a meta-sub-tensor is a tensor in which the non-zero element of the sub-tensor is set to 1;
    the slave operation unit multiplying the Winograd transformation result of the corresponding meta-sub-tensor by the non-zero element value of the sub-tensor as a coefficient to obtain the Winograd transformation result of the sub-tensor; and
    the slave operation unit adding the Winograd transformation results of the plurality of sub-tensors to obtain the Winograd convolution result.
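The slave-unit steps in claims 32-33 apply the same decomposition to the inverse transform. A hedged sketch, again assuming F(2x2, 3x3) (so the inverse-transform left matrix is the 2x4 matrix A^T and the right matrix is A): scaling each meta-sub-tensor's precomputable transform by the element value and accumulating turns the inverse transform into the summation operations recited in claim 25. Names and the example data are illustrative.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

# Inverse-transform matrix A^T for F(2x2, 3x3): maps a 4x4
# element-wise product back to a 2x2 output tile.
AT = [[1, 1,  1,  0],
      [0, 1, -1, -1]]
A = transpose(AT)

M = [[4, 0, 0, 0],      # example element-wise multiplication result
     [0, -3, 1, 0],
     [0, 0, 2, 0],
     [1, 0, 0, 5]]

direct = matmul(matmul(AT, M), A)   # direct inverse transform, 2x2

# Decomposition per claims 32-33: one meta-sub-tensor per non-zero
# element of the multiplication result, scaled and summed.
acc = [[0] * 2 for _ in range(2)]
for i in range(4):
    for j in range(4):
        if M[i][j] != 0:
            E = [[0] * 4 for _ in range(4)]
            E[i][j] = 1
            t = matmul(matmul(AT, E), A)   # 2x2 per meta-sub-tensor
            for r in range(2):
                for c in range(2):
                    acc[r][c] += M[i][j] * t[r][c]

assert acc == direct
```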
  34. The operation method according to claim 33, wherein the slave operation unit obtaining the Winograd transformation result of the meta-sub-tensor corresponding to each sub-tensor comprises:
    for each sub-tensor, the slave operation unit left-multiplying the meta-sub-tensor corresponding to the sub-tensor by a left-multiplication matrix and right-multiplying it by a right-multiplication matrix to obtain the Winograd transformation result of the meta-sub-tensor;
    wherein both the left-multiplication matrix and the right-multiplication matrix are determined by the scale of the sub-tensor and by the Winograd transformation type, the Winograd transformation type including a forward-transformation type and an inverse-transformation type.
  35. The operation method according to claim 25, wherein the storage unit comprises a main storage unit and a slave storage unit;
    the storage unit storing data for the Winograd convolution operation comprises:
    the main storage unit receiving and storing feature data; and
    the slave storage unit receiving and storing weight data.
  36. The operation method according to claim 35, further comprising:
    the slave operation unit decomposing the weight data into a plurality of sub-tensors; and
    the slave operation unit performing transformation operations on the plurality of sub-tensors and summing the results, and obtaining the forward-transformed weight data according to the result of the summation.
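The weight transform in claim 36 follows the same pattern but with non-square matrices, which illustrates why claim 34 makes the left and right matrices depend on the scale of the sub-tensor. A hedged sketch, assuming F(2x2, 3x3) so that the weight-transform left matrix is the 4x3 matrix G and the right matrix is G^T; the example kernel is illustrative.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

# Weight-transform matrix G for 3x3 kernels in F(2x2, 3x3).
G = [[1.0,  0.0, 0.0],
     [0.5,  0.5, 0.5],
     [0.5, -0.5, 0.5],
     [0.0,  0.0, 1.0]]
GT = transpose(G)

g = [[1, 2, 0],      # example sparse 3x3 kernel
     [0, 3, 0],
     [0, 0, 4]]

direct = matmul(matmul(G, g), GT)   # forward-transformed weights, 4x4

# Decomposition per claim 36: one 3x3 meta-sub-tensor per non-zero
# kernel element, transformed, scaled, and summed.
acc = [[0.0] * 4 for _ in range(4)]
for i in range(3):
    for j in range(3):
        if g[i][j] != 0:
            E = [[0] * 3 for _ in range(3)]
            E[i][j] = 1
            t = matmul(matmul(G, E), GT)   # 4x4 per meta-sub-tensor
            for r in range(4):
                for c in range(4):
                    acc[r][c] += g[i][j] * t[r][c]

assert all(abs(acc[r][c] - direct[r][c]) < 1e-9
           for r in range(4) for c in range(4))
```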
  37. The operation method according to claim 25, wherein the storage unit comprises a main storage unit and a slave storage unit;
    the storage unit storing data for the Winograd convolution operation comprises:
    the main storage unit receiving and storing feature data and weight data; and
    the slave storage unit receiving and storing the forward-transformed weight data.
  38. The operation method according to claim 37, further comprising:
    the main operation unit decomposing the weight data into a plurality of sub-tensors, performing transformation operations on the plurality of sub-tensors, summing the results, and obtaining the forward-transformed weight data according to the result of the summation; and
    the main operation unit sending the forward-transformed weight data to the slave storage unit.
  39. The operation method according to any one of claims 25-38, wherein the main operation unit comprises a main processing module and a cache;
    the main operation unit responding to the first control instruction, extracting data from the storage unit, and performing the forward transformation of the Winograd convolution operation to obtain the forward transformation result comprises:
    the main processing module, in response to the first control instruction, extracting data from the storage unit and performing the forward transformation of the Winograd convolution operation to obtain the forward transformation result; and
    the cache storing the forward transformation result.
  40. The operation method according to claim 39, further comprising:
    the cache sending the forward transformation results to the slave operation unit when the stored forward transformation results accumulate to a preset number.
  41. The operation method according to any one of claims 25-38, further comprising:
    the slave operation unit storing the Winograd convolution result in a preset address space of the storage unit.
  42. The operation method according to any one of claims 25-38, wherein the main operation unit is communicatively connected with a plurality of slave operation units, and different slave operation units are responsible for operating on different forward transformation results.
  43. The operation method according to claim 31, wherein the main operation unit and the slave operation unit operate in parallel: before the main operation unit has finished computing the forward-transformed feature data, the slave operation unit performs element-wise multiplication on the already-computed element positions of the forward-transformed feature data and the corresponding element positions of the forward-transformed weight data, until the element-wise product has been computed for every element position, thereby obtaining the multiplication result.
  44. The operation method according to claim 43, wherein the feature data stored in the main storage unit is divided into a plurality of pieces of first data for the Winograd convolution operation, the size of the first data being determined according to the size of the convolution kernel;
    the forward-transformed weight data stored in the slave storage unit is divided into a plurality of pieces of second data for the Winograd convolution operation, the size of the second data being determined according to the size of the first data;
    the main processing module of the main operation unit, in response to the first control instruction sent by the main control unit, sequentially obtains the first data from the main storage unit, performs the forward transformation on the first data to obtain the forward transformation result of the first data, and stores the forward transformation result of the first data in the cache;
    when the forward transformation results of the first data in the cache reach a preset number, the main processing module of the main operation unit sequentially sends the forward transformation results of the first data in the cache to the slave operation unit;
    the slave operation unit, in response to the second control instruction sent by the slave control unit, obtains the second data from the slave storage unit, performs element-wise multiplication of the forward transformation result of the first data and the second data to obtain an element-wise multiplication result, and performs the inverse transformation on the element-wise multiplication result to obtain an inverse transformation result; and
    the slave operation unit obtains the Winograd convolution result according to the inverse transformation result and sends the Winograd convolution result to the main storage unit for storage.
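The main/slave flow in claims 25 and 43-44 can be sketched end-to-end. A hedged, deliberately simplified 1-D F(2, 3) example (the claims themselves do not fix the tile sizes, and the variable names are illustrative): forward-transform the input tile and the kernel, multiply element-wise, inverse-transform, and check against direct convolution.

```python
def matvec(m, v):
    # Matrix-vector product on plain lists.
    return [sum(mi * vi for mi, vi in zip(row, v)) for row in m]

# Standard F(2, 3) transform matrices.
BT = [[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]]
G  = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0.0, 0.0, 1.0]]
AT = [[1, 1, 1, 0], [0, 1, -1, -1]]

d = [1.0, 2.0, 3.0, 4.0]   # input tile ("first data" in the main storage unit)
g = [1.0, 1.0, 1.0]        # kernel (slave storage holds its forward transform)

V = matvec(BT, d)          # main unit: forward transform of the input tile
U = matvec(G, g)           # forward transform of the weights
M = [u * v for u, v in zip(U, V)]   # slave unit: element-wise multiplication
y = matvec(AT, M)          # slave unit: inverse transform -> 2 outputs

# Reference: direct valid convolution (neural-network correlation).
ref = [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]
assert all(abs(a - b) < 1e-9 for a, b in zip(y, ref))
```

With the example data above, both paths yield the output tile [6.0, 9.0], matching the two valid positions of the sliding window.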
PCT/CN2020/113160 2019-11-01 2020-09-03 Computing device and method, and related product WO2021082722A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911061078.3A CN112765539B (en) 2019-11-01 2019-11-01 Computing device, computing method and related product
CN201911061078.3 2019-11-01

Publications (1)

Publication Number Publication Date
WO2021082722A1 true WO2021082722A1 (en) 2021-05-06

Family

ID=75692126

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/113160 WO2021082722A1 (en) 2019-11-01 2020-09-03 Computing device and method, and related product

Country Status (2)

Country Link
CN (1) CN112765539B (en)
WO (1) WO2021082722A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109325591A (en) * 2018-09-26 2019-02-12 中国科学院计算技术研究所 Neural network processor towards Winograd convolution
CN109359730A (en) * 2018-09-26 2019-02-19 中国科学院计算技术研究所 Neural network processor towards fixed output normal form Winograd convolution

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11055063B2 (en) * 2016-05-02 2021-07-06 Marvell Asia Pte, Ltd. Systems and methods for deep learning processor
WO2018107383A1 (en) * 2016-12-14 2018-06-21 上海寒武纪信息科技有限公司 Neural network convolution computation method and device, and computer-readable storage medium
CN108229656A (en) * 2016-12-14 2018-06-29 上海寒武纪信息科技有限公司 Neural network computing device and method
WO2018108126A1 (en) * 2016-12-14 2018-06-21 上海寒武纪信息科技有限公司 Neural network convolution operation device and method
US10482155B2 (en) * 2016-12-30 2019-11-19 Intel Corporation Winograd algorithm on a matrix processing architecture
US10990648B2 (en) * 2017-08-07 2021-04-27 Intel Corporation System and method for an optimized winograd convolution accelerator
US10372787B2 (en) * 2017-12-12 2019-08-06 Facebook, Inc. Hardware accelerator pre-configured with coefficients for matrix-transform operations
CN110163349B (en) * 2018-02-12 2021-03-23 上海寒武纪信息科技有限公司 Network model calculation method and device
CN110147249B (en) * 2018-02-12 2021-02-09 上海寒武纪信息科技有限公司 Network model calculation method and device
US11586907B2 (en) * 2018-02-27 2023-02-21 Stmicroelectronics S.R.L. Arithmetic unit for deep learning acceleration


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FENG SHI, HAOCHEN LI, YUHE GAO, BENJAMIN KUSCHNER, SONG-CHUN ZHU: "Sparse Winograd Convolutional neural networks on small-scale systolic arrays", COMPUTER SCIENCE, 3 October 2018 (2018-10-03), pages 1 - 7, XP080933823 *

Also Published As

Publication number Publication date
CN112765539A (en) 2021-05-07
CN112765539B (en) 2024-02-02

Similar Documents

Publication Publication Date Title
CN109522052B (en) Computing device and board card
CN109543832B (en) Computing device and board card
TWI795519B (en) Computing apparatus, machine learning computing apparatus, combined processing device, neural network chip, electronic device, board, and method for performing machine learning calculation
US20230026006A1 (en) Convolution computation engine, artificial intelligence chip, and data processing method
CN110059797B (en) Computing device and related product
WO2021082725A1 (en) Winograd convolution operation method and related product
WO2021083101A1 (en) Data processing method and apparatus, and related product
CN109670581B (en) Computing device and board card
CN115221102B (en) Method for optimizing convolution operation of system-on-chip and related product
WO2021185262A1 (en) Computing apparatus and method, board card, and computer readable storage medium
WO2021082723A1 (en) Operation apparatus
WO2021082722A1 (en) Computing device and method, and related product
WO2021082746A1 (en) Operation apparatus and related product
WO2021223642A1 (en) Data processing method and apparatus, and related product
WO2021082721A1 (en) Winograd convolution operation method, apparatus, and device, and storage medium
WO2021082724A1 (en) Operation method and related product
CN111382852B (en) Data processing device, method, chip and electronic equipment
WO2021082747A1 (en) Operational apparatus and related product
CN111047030A (en) Operation method, operation device, computer equipment and storage medium
CN111061507A (en) Operation method, operation device, computer equipment and storage medium
WO2021223644A1 (en) Data processing method and device, and related product
CN111222632B (en) Computing device, computing method and related product
WO2021169914A1 (en) Data quantification processing method and apparatus, electronic device and storage medium
WO2021223645A1 (en) Data processing method and apparatus, and related product
WO2021212972A1 (en) Operation method, processor, and related product

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20880593

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20880593

Country of ref document: EP

Kind code of ref document: A1