CN114077718A - Matrix calculation circuit, matrix calculation method, electronic device, and computer-readable storage medium - Google Patents

Info

Publication number
CN114077718A
Authority
CN
China
Prior art keywords
data
matrix
circuit
read
control signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010807002.7A
Other languages
Chinese (zh)
Inventor
Inventor not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Simm Computing Technology Co ltd
Original Assignee
Beijing Simm Computing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Simm Computing Technology Co ltd
Priority to CN202010807002.7A
Publication of CN114077718A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Complex Calculations (AREA)

Abstract

The embodiments of the present disclosure disclose a matrix calculation circuit, a matrix calculation method, an electronic device, and a computer-readable storage medium. The matrix calculation circuit includes: a first data reading circuit, configured to read a plurality of first data in a first matrix according to a read address of the first matrix and to generate a second data read control signal according to the plurality of first data; a second data reading circuit, configured to read a plurality of second data in a second matrix according to the second data read control signal and a read address of the second matrix; and a calculation circuit, configured to calculate third data according to the plurality of first data and the plurality of second data. By generating the read control signal for the second data from the first data that have been read, the matrix calculation circuit solves the prior-art problem of wasting calculation time on data that do not need to be calculated.

Description

Matrix calculation circuit, method, electronic device, and computer-readable storage medium

Technical Field

The present disclosure relates to the field of processors, and in particular to a matrix calculation circuit, a matrix calculation method, an electronic device, and a computer-readable storage medium.

Background

With the development of science and technology, human society is rapidly entering the intelligent era. An important feature of this era is that people acquire more and more types of data in ever larger volumes, while the demands on data-processing speed keep rising. The chip is the cornerstone of task allocation, and it fundamentally determines people's ability to process data. In terms of application fields, chips follow two main routes. One is the general-purpose route, for example CPUs, which offer great flexibility but deliver relatively low effective computing power on domain-specific algorithms. The other is the special-purpose route, for example TPUs, which deliver high effective computing power in certain specific fields but handle flexible, more general workloads poorly or not at all. Because data in the intelligent era is both highly varied and enormous in volume, chips must be extremely flexible, so that they can handle rapidly evolving algorithms across different fields, and extremely powerful, so that they can quickly process very large and rapidly growing amounts of data.

In neural network computation, convolution accounts for most of the total operations, and convolution can be converted into matrix multiplication. Therefore, to raise throughput, reduce latency, and improve the effective computing power of the chip in neural network tasks, the key is to speed up matrix multiplication.

FIG. 1a is a schematic diagram of matrix multiplication in a neural network. As shown in FIG. 1a, M1 is the data matrix, M2 is the parameter matrix, and M is the output matrix. One row of data in M1 and one column of parameters in M2 undergo a multiply-accumulate operation to produce one element of M.

FIG. 1b shows the hardware structure used for matrix calculation. As shown in FIG. 1b, to speed up matrix calculation and use data more efficiently, an array of computing units is often used to perform matrix operations. Take an array of MxN computing units (M>1, N>1) as an example. The array allows data to be fully reused: an element of M1 can be shared simultaneously by the N computing units in the same row, and an element of M2 can be shared simultaneously by the M computing units in the same column. Each step completes the calculation for one column of elements of M1 and one row of elements of M2.
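The two views of the same product described above (row times column for one output element, and one column of M1 times one row of M2 per array step) can be written side by side. This is a standard identity added for illustration; the symbol K, the number of columns of M1 and rows of M2, is an assumed name that does not appear in the source, and M below denotes the output matrix rather than the row count of the computing unit array.

```latex
% Element-wise view (FIG. 1a): one row of M1 times one column of M2
M[i,j] = \sum_{k=1}^{K} M_1[i,k]\, M_2[k,j]

% Array view (FIG. 1b): each step k adds the outer product of
% column k of M1 and row k of M2 to the whole output matrix
M = \sum_{k=1}^{K} M_1[:,k]\; M_2[k,:]
```

In the second form, a step whose column of M1 is entirely zero contributes nothing to M, which is why such a step can be skipped without changing the result.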

The above scheme has the following disadvantage: in many neural networks the data matrix and/or the parameter matrix is sparse, that is, it contains a large number of zeros. These zero elements are still computed as if they were ordinary elements, which wastes computing time, limits how much of the chip's computing capability can be used, and increases power consumption.

Summary of the Invention

This Summary is provided to introduce, in simplified form, concepts that are described in detail in the Detailed Description below. It is not intended to identify key or essential features of the claimed technical solution, nor to limit the scope of the claimed technical solution.

To solve the technical problems in the prior art of inflexible task allocation and complex control of processing cores, the embodiments of the present disclosure propose the following technical solutions:

In a first aspect, an embodiment of the present disclosure provides a matrix calculation circuit, characterized by comprising:

a first data reading circuit, configured to read a plurality of first data in a first matrix according to a read address of the first matrix, and to generate a second data read control signal according to the plurality of first data;

a second data reading circuit, configured to read a plurality of second data in a second matrix according to the second data read control signal and a read address of the second matrix; and

a calculation circuit, configured to calculate third data according to the plurality of first data and the plurality of second data.

Further, the first data reading circuit being configured to generate the second data read control signal according to the plurality of first data includes:

the first data reading circuit is configured to generate the second data read control signal according to whether each of the plurality of first data is zero.

Further, generating the second data read control signal according to whether each of the plurality of first data is zero specifically includes:

in response to the value of every one of the plurality of first data being zero, generating a second data read control signal that instructs the read address of the second matrix to be incremented; or

in response to the value of at least one of the plurality of first data being non-zero, generating a second data read control signal that instructs the second data reading circuit to read the plurality of second data according to the read address of the second matrix.

Further, the first data reading circuit includes:

a first value comparison circuit, configured to determine whether the value of each of the plurality of first data is zero; and

a first control circuit, configured to generate the second data read control signal according to the determination result of the first value comparison circuit.

Further, the second data reading circuit also includes:

a second value comparison circuit, configured to determine whether the value of each of the plurality of second data is zero; and

a second control circuit, configured to generate a calculation control signal according to the determination result of the second value comparison circuit.

Further, the second control circuit is configured to:

in response to at least one of the plurality of second data being non-zero, generate the calculation control signal to instruct the calculation circuit to perform a calculation operation.

Further, the first control circuit includes:

a first control signal generation circuit, configured to generate a first control signal according to the determination result of the first value comparison circuit; and

a first read address generation circuit, configured to increment the read address of the first matrix to obtain a new read address of the first matrix.

Further, the plurality of first data is one column of first data in the first matrix, and the plurality of second data is one row of second data in the second matrix.

Further, the calculation circuit includes:

a computing unit array, wherein the computing unit array includes a plurality of computing units;

each row of computing units in the computing unit array respectively receives the plurality of first data; and

each column of computing units in the computing unit array respectively receives the plurality of second data.

In a second aspect, an embodiment of the present disclosure provides a matrix calculation method, characterized by comprising:

reading a plurality of first data in a first matrix according to a read address of the first matrix;

generating a second data read control signal according to the plurality of first data;

reading a plurality of second data in a second matrix according to the second data read control signal and a read address of the second matrix; and

calculating third data according to the plurality of first data and the plurality of second data.

In a third aspect, an embodiment of the present disclosure provides a chip, including the matrix calculation circuit according to any one of the first aspect.

In a fourth aspect, an embodiment of the present disclosure provides an electronic device, including: a memory configured to store computer-readable instructions; and one or more processors configured to execute the computer-readable instructions such that, when executed, the processors implement the matrix calculation method according to any one of the foregoing first aspect.

In a fifth aspect, an embodiment of the present disclosure provides a non-transitory computer-readable storage medium, characterized in that the non-transitory computer-readable storage medium stores computer instructions, and the computer instructions are used to cause a computer to execute the matrix calculation method according to any one of the foregoing first aspect.

In a sixth aspect, an embodiment of the present disclosure provides a computer program product, characterized by comprising computer instructions; when the computer instructions are executed by a computing device, the computing device can execute the matrix calculation method according to any one of the foregoing first aspect.

In a seventh aspect, an embodiment of the present disclosure provides a computing device, characterized by comprising one or more chips according to the third aspect.

The embodiments of the present disclosure disclose a matrix calculation circuit, a matrix calculation method, an electronic device, and a computer-readable storage medium. The matrix calculation circuit includes: a first data reading circuit, configured to read a plurality of first data in a first matrix according to a read address of the first matrix and to generate a second data read control signal according to the plurality of first data; a second data reading circuit, configured to read a plurality of second data in a second matrix according to the second data read control signal and a read address of the second matrix; and a calculation circuit, configured to calculate third data according to the plurality of first data and the plurality of second data. By generating the read control signal for the second data from the first data that have been read, the matrix calculation circuit solves the prior-art problem of wasting calculation time on data that do not need to be calculated.

The above description is only an overview of the technical solutions of the present disclosure. So that the technical means of the present disclosure can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features, and advantages of the present disclosure are more readily apparent, preferred embodiments are described in detail below with reference to the accompanying drawings.

Brief Description of the Drawings

The above and other features, advantages, and aspects of the embodiments of the present disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale.

FIGS. 1a and 1b are schematic diagrams of the prior art of the present disclosure;

FIG. 2 is a schematic structural diagram of a matrix calculation circuit provided by an embodiment of the present disclosure;

FIG. 3 is a schematic diagram of the storage format of the first matrix provided by an embodiment of the present disclosure;

FIG. 4 is a schematic structural diagram of the first data reading circuit provided by an embodiment of the present disclosure;

FIG. 5 is a schematic structural diagram of the second data reading circuit provided by an embodiment of the present disclosure;

FIGS. 6a-6e are schematic diagrams of an application example of an embodiment of the present disclosure;

FIG. 7 is a flowchart of a matrix calculation method provided by an embodiment of the present disclosure.

Detailed Description

Embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of protection of the present disclosure.

It should be understood that the steps described in the method embodiments of the present disclosure may be performed in a different order and/or in parallel. In addition, method embodiments may include additional steps and/or omit some of the illustrated steps. The scope of the present disclosure is not limited in this respect.

As used herein, the term "including" and its variants are open-ended, that is, "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Definitions of other terms are given in the description below.

It should be noted that concepts such as "first" and "second" mentioned in the present disclosure are used only to distinguish different devices, modules, or units, and are not used to limit the order of, or the interdependence between, the functions performed by these devices, modules, or units.

It should be noted that the modifiers "one" and "a plurality of" mentioned in the present disclosure are illustrative rather than restrictive; those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be read as "one or more".

The names of messages or information exchanged between multiple devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of these messages or information.

FIG. 2 is a schematic diagram of a matrix calculation circuit provided by an embodiment of the present disclosure. The matrix calculation circuit (EU) 200 provided by this embodiment includes:

a first data reading circuit (LD_M1) 201, configured to read a plurality of first data in a first matrix according to a read address of the first matrix, and to generate a second data read control signal according to the plurality of first data;

a second data reading circuit (LD_M2) 202, configured to read a plurality of second data in a second matrix according to the second data read control signal and a read address of the second matrix; and

a calculation circuit 203, configured to calculate third data according to the plurality of first data and the plurality of second data.

Exemplarily, the read address of the first matrix is the storage start address of the first matrix, and the read address of the second matrix is the storage start address of the second matrix. The two storage start addresses are obtained by an instruction decoding circuit ID (Instruction Decoder), which decodes a matrix calculation instruction to obtain parameters such as the storage start address of the first matrix, the storage start address of the second matrix, and the sizes of the first and second matrices.

Exemplarily, the matrix calculation instruction includes an instruction type, the storage start address of the first matrix involved in the calculation, the storage start address of the second matrix, and size parameters of the first and second matrices. In one embodiment, the instruction type is a matrix multiplication instruction, the first matrix is the data matrix in a neural network convolution calculation, and the second matrix is the parameter matrix in the neural network convolution calculation; the first matrix and/or the second matrix is a sparse matrix in which a large number of elements are 0. It can be understood that the storage start addresses and the size parameters of the matrices (such as the numbers of rows and columns) in the matrix calculation instruction may be expressed as register addresses, in which case the instruction decoding circuit obtains the corresponding data from the corresponding registers.

In this embodiment of the present disclosure, the first data reading circuit 201 receives the start address of the first matrix decoded by the instruction decoding circuit and generates the read address of the first matrix from it; according to that read address, it reads a plurality of first data from the first matrix in a single read. Exemplarily, the plurality of first data is one column of the first matrix, and the first matrix is stored in column-first order, that is, column by column. For the first matrix M1 shown in FIG. 3, a11-a41 are stored first, then a12-a42, and so on until the whole matrix has been stored; a11-a44 are stored logically contiguously. Let the storage start address of the first matrix be PD and the number of rows of the first matrix be L1. On each read, the first data reading circuit reads first data starting at the current read address; after reading one column of the first matrix, it automatically adds L1 to the read address, that is, PD=PD+L1, to obtain the start address for the next read, and it continues until the number of reads reaches the number of columns of the first matrix. For M1 shown in FIG. 3, PD=PD+4.
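The column-by-column read with PD=PD+L1 can be sketched in software as follows. This is only an illustrative model, assuming a flat Python list stands in for the memory and one list slot holds one matrix element; the function name `read_first_matrix_columns` and the string labels are hypothetical and not part of the disclosure.

```python
def read_first_matrix_columns(memory, pd, num_rows, num_cols):
    """Model of the first data reading circuit's address stepping.

    memory   : flat list holding the matrix column by column (assumption)
    pd       : storage start address PD of the first matrix
    num_rows : L1, the number of rows of the first matrix
    num_cols : number of columns, i.e. the number of reads
    Returns a list of (read_address, column_data) pairs.
    """
    reads = []
    for _ in range(num_cols):
        column = memory[pd:pd + num_rows]  # one read fetches a whole column
        reads.append((pd, column))
        pd += num_rows                     # PD = PD + L1
    return reads


# The 4x4 matrix M1 of FIG. 3, stored as a11..a41, a12..a42, and so on.
m1_memory = [f"a{r}{c}" for c in range(1, 5) for r in range(1, 5)]
m1_reads = read_first_matrix_columns(m1_memory, pd=0, num_rows=4, num_cols=4)
```

With L1 = 4, the read addresses step through 0, 4, 8, 12, and the first read returns a11, a21, a31, a41.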

In this embodiment of the present disclosure, after the first data reading circuit 201 has read the plurality of first data, it generates a second data read control signal according to them. In matrix calculation, a participating matrix may be sparse, that is, many of the data in the first matrix are 0, and in that case many calculations can be skipped. For example, in matrix multiplication, if a datum of the first matrix is 0, the product is 0 whatever the value of the corresponding second datum in the second matrix, so that calculation need not be performed. Therefore, optionally, the first data reading circuit being configured to generate the second data read control signal according to the plurality of first data includes: the first data reading circuit is configured to generate the second data read control signal according to whether each of the plurality of first data is zero.

Specifically, in response to the value of every one of the plurality of first data being zero, a second data read control signal is generated that instructs the read address of the second matrix to be incremented; or, in response to the value of at least one of the plurality of first data being non-zero, a second data read control signal is generated that instructs the second data reading circuit to read the plurality of second data according to the current read address of the second matrix.

In this embodiment, if every one of the plurality of first data is 0, the corresponding plurality of second data need not be read; the second data read control signal then instructs the read address of the second matrix to be incremented, skipping the read of the current plurality of second data. If at least one of the plurality of first data is non-zero, a calculation must be performed; the second data read control signal then instructs the second data reading circuit to read, according to the current read address of the second matrix, the plurality of second data corresponding to the plurality of first data.
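The two-way decision described above can be modelled as a small function. This is a minimal sketch, not the circuit itself; the enum `ReadControl`, its member names, and the function name are hypothetical labels for the two signal values the text describes.

```python
from enum import Enum


class ReadControl(Enum):
    """Hypothetical encoding of the second data read control signal."""
    SKIP = 0  # only increment the read address of the second matrix
    READ = 1  # read the second data at the current read address


def second_data_read_control(first_data):
    """Return SKIP when every first datum is zero, otherwise READ."""
    if all(value == 0 for value in first_data):
        return ReadControl.SKIP
    return ReadControl.READ


skip_signal = second_data_read_control([0, 0, 0, 0])
read_signal = second_data_read_control([0, 7, 0, 0])
```

A single non-zero element in the column is enough to require the read, because that element still contributes to one row of the output.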

In this embodiment of the present disclosure, the second data reading circuit 202 receives the start address of the second matrix decoded by the instruction decoding circuit and, according to that start address and the second data read control signal generated by the first data reading circuit 201, generates the read address of the second matrix in order to read a plurality of second data from the second matrix. Optionally, the second data reading circuit reads one row of the second matrix at a time, and the second matrix is stored in row-first order, that is, row by row. Let the storage start address of the second matrix be PW and the number of columns of the second matrix be L2. On each read, the second data reading circuit reads second data starting at the current read address; after reading one row of the second matrix, it automatically adds L2 to the read address, that is, PW=PW+L2, to obtain the start address for the next read, and it continues until the number of reads reaches the number of rows of the second matrix.

Specifically, the second data reading circuit determines the actual read address of the second matrix according to the second data read control signal and the current read address of the second matrix. If the second data read control signal instructs the read address of the second matrix to be incremented, the new read address is obtained by PW=PW+L2, and the circuit then waits for a new second data read control signal. If the second data read control signal instructs the second data to be read at the current read address of the second matrix, the circuit first reads the plurality of second data at PW, then computes the next read address by PW=PW+L2, and waits for a new second data read control signal.
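The behaviour of the second data reading circuit under a sequence of control signals can be sketched as below. It is an illustrative model under stated assumptions: a flat list stands in for row-major memory, each control signal is reduced to a boolean (`True` for "read at the current address", `False` for "only increment the address"), and the function name is hypothetical.

```python
def read_second_matrix_rows(memory, pw, num_cols, read_enables):
    """Model of the second data reading circuit.

    memory       : flat list holding the second matrix row by row (assumption)
    pw           : storage start address PW of the second matrix
    num_cols     : L2, the number of columns of the second matrix
    read_enables : one boolean per row, derived from the control signal
    Returns (rows, final_pw); a skipped row is recorded as None.
    """
    rows = []
    for enable in read_enables:
        if enable:
            rows.append(memory[pw:pw + num_cols])  # read the row at PW
        else:
            rows.append(None)                      # skip: no memory access
        pw += num_cols                             # PW = PW + L2 in both cases
    return rows, pw


# A 3x4 second matrix stored row by row; the middle row is skipped.
m2_memory = list(range(1, 13))
m2_rows, m2_final_pw = read_second_matrix_rows(
    m2_memory, pw=0, num_cols=4, read_enables=[True, False, True]
)
```

The address advances by L2 whether or not the row is read, so the row that follows a skipped row is still fetched from the correct location.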

After the plurality of first data and the plurality of second data are obtained, the calculation unit performs the calculation indicated by the matrix calculation instruction on them to obtain third data, where the third data are all or part of the data in the output matrix.

Optionally, the calculation circuit includes a calculation unit array that contains a plurality of calculation units. Each row of calculation units in the array receives the plurality of first data, and each column of calculation units receives the plurality of second data. Specifically, as shown in FIG. 2, the calculation circuit includes a calculation unit array made up of a plurality of calculation units PU, and each row of calculation units receives one of the plurality of first data. Taking M1 in FIG. 3 as an example, when the fetched column of first data is a11 to a41, the first row of calculation units receives a11, the second row receives a21, the third row receives a31, and the fourth row receives a41. The columns of the array receive the plurality of second data in a similar way: all calculation units in one column receive the same second data, and different columns receive different second data. The details are described later.
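The data distribution across the calculation unit array can be modelled as follows. This is a minimal sketch under the assumption that each PU multiplies the two values it receives and adds the product to a local accumulator; the function name `pu_array_step` and the numeric values are illustrative and do not come from the figures.

```python
def pu_array_step(acc, first_column, second_row):
    """Feed one column of first data and one row of second data to the array.

    Row i of the array receives first_column[i]; column j receives
    second_row[j]; PU(i, j) accumulates their product.
    """
    for i, a in enumerate(first_column):
        for j, b in enumerate(second_row):
            acc[i][j] += a * b
    return acc


# Hypothetical 4x2 array: four first data (like a11..a41) and two second data.
acc = [[0, 0] for _ in range(4)]
pu_array_step(acc, first_column=[1, 2, 3, 4], second_row=[10, 20])
print(acc)  # [[10, 20], [20, 40], [30, 60], [40, 80]]
```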

As shown in FIG. 4, to implement the functions of the first data reading circuit described above, the first data reading circuit optionally further includes:

a first value comparison circuit 401 (Z_D), configured to determine whether the value of each of the plurality of first data is zero;

a first control circuit 402 (Ctrl1), configured to generate the second data read control signal according to the determination result of the first value comparison circuit.

As shown in FIG. 4, the first value comparison circuit 401 includes a plurality of data comparison buffer circuits Z_Det. After a plurality of first data are read from the memory or storage area M1, each first data is buffered into one Z_Det circuit, which determines whether that first data is 0 and produces one of the determination results Z_0 to Z_{M-1}. For example, the Z_Det circuit includes an OR gate and a NOT gate. The OR gate has X inputs, where X is the bit width of the first data, and each input corresponds to one bit of the first data. For example, with X = 4 the OR gate has four inputs, one for each bit of the first data Data[3:0]. The OR gate computes the logical OR of all bits of the first data, and the NOT gate inverts that value to give the determination result. If the first data is 1110, the OR of its bits is 1 and the NOT gate outputs 0, so the first data is determined to be non-zero. If the first data is 0000, the OR of its bits is 0 and the NOT gate outputs 1, so the first data is determined to be 0. It will be understood that Z_Det may also omit the NOT gate; in that case an OR gate output of 1 means the first data is non-zero, and an OR gate output of 0 means the first data is 0.
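The zero detector can be mirrored bit by bit in software. The sketch below assumes unsigned data of a fixed bit width; the function name `z_det` and the `with_not_gate` switch (covering the variant without the NOT gate) are names chosen for this example.

```python
def z_det(data: int, width: int = 4, with_not_gate: bool = True) -> int:
    """Model of one Z_Det circuit: OR all bits, then optionally invert.

    With the NOT gate, the result is 1 when data is zero.
    Without it, the result is 1 when data is non-zero.
    """
    or_out = 0
    for i in range(width):
        or_out |= (data >> i) & 1  # one OR-gate input per data bit
    return (1 - or_out) if with_not_gate else or_out


print(z_det(0b1110))                       # 0: data is non-zero
print(z_det(0b0000))                       # 1: data is zero
print(z_det(0b1110, with_not_gate=False))  # 1: raw OR output, data is non-zero
```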

As shown in FIG. 4, the determination results Z_0 to Z_{M-1} of the data comparison buffer circuits Z_Det are input to the first control circuit 402, which generates the second data read control signal C1 from them. For example, the first control circuit includes a first read address generation circuit AG1 and a first control signal generation circuit CL1. CL1 receives the determination results Z_0 to Z_{M-1} and generates the second data read control signal C1. For example, CL1 includes a multi-input AND gate, each input of which receives one of the determination results Z_0 to Z_{M-1}: when Z_0 to Z_{M-1} are all 1, C1 is 1; when they are not all 1, C1 is 0. The second data reading circuit 202 receives the second data read control signal C1. If C1 is 1, all of the fetched first data are 0, so the second data reading circuit 202 accumulates the read address of the second matrix and does not read the second data corresponding to those first data. If C1 is 0, the fetched first data are not all 0, so the second data reading circuit 202 reads the plurality of second data corresponding to the first data at the current read address of the second matrix.
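As a rough software model of CL1, the AND gate over the zero flags can be written as below. The function name `second_data_read_control` is an assumption for this sketch, and the zero flags are computed with a plain comparison rather than gate by gate.

```python
def second_data_read_control(first_data):
    """Return (zero flags Z_0..Z_{M-1}, control signal C1) for one fetch."""
    z_flags = [1 if value == 0 else 0 for value in first_data]
    c1 = 1
    for z in z_flags:
        c1 &= z  # multi-input AND gate: C1 is 1 only if every flag is 1
    return z_flags, c1


print(second_data_read_control([1, 0]))  # ([0, 1], 0): read second data
print(second_data_read_control([0, 0]))  # ([1, 1], 1): skip, advance address
print(second_data_read_control([0, 2]))  # ([1, 0], 0): read second data
```

C1 = 1 therefore means "skip the read and only advance the second-matrix address", while C1 = 0 means "read at the current address".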

As shown in FIG. 5, to implement the functions of the second data reading circuit described above, the second data reading circuit optionally further includes:

a second value comparison circuit 501 (DB), configured to determine whether the value of each of the plurality of second data is zero;

a second control circuit 502 (Ctrl2), configured to generate a calculation control signal according to the determination result of the second value comparison circuit.

Optionally, the second value comparison circuit 501 has the same structure as the first value comparison circuit 401 and likewise includes a plurality of data comparison buffer circuits. It determines whether each of the plurality of second data is zero in the same way that the first value comparison circuit does for the first data, so the steps are not repeated here.

When the second value comparison circuit determines that every second data is 0, the second control circuit 502 generates the calculation control signal C_Start = 0, so that the calculation circuit does not perform a calculation. When the second value comparison circuit determines that at least one of the plurality of second data is not 0, the second control circuit 502 generates C_Start = 1, so that the calculation circuit computes the third data from the plurality of first data and the plurality of second data.
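The effect of C_Start can be illustrated with a short sketch. It assumes the calculation is an element-wise product of every first data with every second data, as in the matrix multiplication example of this disclosure; the function names `calculation_control` and `gated_products` are hypothetical.

```python
def calculation_control(second_data):
    """C_Start is 1 if at least one second data is non-zero, otherwise 0."""
    return int(any(value != 0 for value in second_data))


def gated_products(first_data, second_data):
    """Return the per-PU products, or None when C_Start = 0 (no calculation)."""
    if calculation_control(second_data) == 0:
        return None
    return [[a * b for b in second_data] for a in first_data]


print(gated_products([1, 0], [1, 2]))  # [[1, 2], [0, 0]]
print(gated_products([0, 2], [0, 0]))  # None: the calculation is skipped
```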

Optionally, the second control circuit 502 further includes a second read address generation circuit AG2 and a second control signal generation circuit CL2. CL2 generates the control signal for AG2 from the second data read control signal and generates the calculation control signal from the determination result of the second value comparison circuit. AG2 generates the read address of the second matrix according to the control signal it receives from CL2. For example, after AG2 obtains the first read address of the second data from the decoding circuit, it determines from the C1 signal whether the read address of the second matrix needs to be accumulated. If C1 is 1, AG2 accumulates the read address of the second matrix. If C1 is 0, AG2 does not accumulate the read address first, and the second data reading circuit LD_M2 reads a plurality of second data from the memory or storage area M2 at the current address.

FIGS. 6a to 6e show an example of the calculation process of the matrix calculation circuit in the above embodiment. FIG. 6a shows the matrix multiplication that the matrix calculation circuit is to perform: M1 is the first matrix, M2 is the second matrix, and M is the third matrix obtained by multiplying M1 and M2.

FIG. 6b is an overall schematic diagram of performing the above matrix multiplication with the matrix calculation circuit. The matrix calculation circuit includes a 2*2 calculation unit array. The two first data in each column read by LD_M1 are input to the respective rows of the calculation unit array, and the two second data in each row read by LD_M2 are input to the respective columns of the array. The third matrix M is obtained after three clock cycles of calculation.
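The column-by-row data flow described here corresponds to the standard decomposition of a matrix product into a sum of outer products. The following is a worked statement of that identity, not text from the patent; the entry symbols $a_{ik}$ and $b_{kj}$ for M1 and M2 and the symbol $m_{ij}$ for M are notation assumed for this note.

```latex
% M1 is 2x3 and M2 is 3x2, so M is 2x2 and the inner dimension is 3.
\[
  m_{ij} = \sum_{k=1}^{3} a_{ik}\, b_{kj},
  \qquad
  M = \sum_{k=1}^{3}
      \begin{pmatrix} a_{1k} \\ a_{2k} \end{pmatrix}
      \begin{pmatrix} b_{k1} & b_{k2} \end{pmatrix}.
\]
% Clock cycle k contributes the k-th term: PU(i,j) adds a_{ik} b_{kj}.
% If column k of M1 is all zero, or row k of M2 is all zero,
% the k-th term is the zero matrix and can be skipped without changing M.
```

Each of the three clock cycles contributes one term of the sum, which is why a cycle whose column of first data or row of second data is entirely zero can be skipped.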

FIG. 6c shows the first fetch and the first calculation. LD_M1 fetches the first column of the first matrix, the first data 1 and 0, and buffers them into Z_Det. Z_Det outputs the determination result 01 to Ctrl1, and Ctrl1 ANDs the bits of 01 to obtain C1 = 0. Because C1 = 0, LD_M2 reads the first row of the second matrix, the second data 1 and 2, at the current read address and buffers them into DB. DB determines that the second data are not all 0 and generates the calculation control signal C_Start = 1, so the calculation unit array performs the calculation: PU0,0 receives 1 and 1 and computes the product 1; PU0,1 receives 1 and 2 and computes the product 2; PU1,0 receives 0 and 1 and computes the product 0; PU1,1 receives 0 and 2 and computes the product 0. This gives the result of the first calculation, M_temp. After the column of first data has been read, LD_M1 accumulates the read address of the first matrix; after the row of second data has been read, LD_M2 accumulates the read address of the second matrix.

FIG. 6d shows the second fetch and the second calculation. LD_M1 fetches the second column of the first matrix, the first data 0 and 0, and buffers them into Z_Det. Z_Det outputs the determination result 11 to Ctrl1, and Ctrl1 ANDs the bits of 11 to obtain C1 = 1. Because C1 = 1, LD_M2 only accumulates the read address of the second matrix. In this second calculation both first data are 0, so the calculation unit array skips the calculation and M_temp does not change.

FIG. 6e shows the third fetch and the third calculation. LD_M1 fetches the third column of the first matrix, the first data 0 and 2, and buffers them into Z_Det. Z_Det outputs the determination result 10 to Ctrl1, and Ctrl1 ANDs the bits of 10 to obtain C1 = 0. Because C1 = 0, LD_M2 reads the third row of the second matrix, the second data 0 and 0, at the current read address and buffers them into DB. DB determines that the second data are all 0 and generates the calculation control signal C_Start = 0, so the calculation unit array skips the calculation and M_temp again does not change.

From the size parameters of the first matrix, such as its number of rows and number of columns, the circuit can determine that the matrix calculation has finished, which yields the final third matrix M.
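The three fetches of FIGS. 6c to 6e can be replayed with a small behavioural simulation. The sketch below is an assumption-laden model rather than the patented circuit: the function name `sparse_matmul`, the action labels, and the use of a row index in place of a byte address are illustrative. The second row of M2 is never read in the example, so its values here (3 and 4) are hypothetical placeholders.

```python
def sparse_matmul(m1, m2):
    """Column-by-row matrix product that skips all-zero columns and rows."""
    rows, inner, cols = len(m1), len(m2), len(m2[0])
    m_temp = [[0] * cols for _ in range(rows)]
    pw = 0          # read position in the second matrix, counted in rows
    actions = []
    for k in range(inner):
        first = [m1[i][k] for i in range(rows)]    # LD_M1: column k of M1
        c1 = int(all(v == 0 for v in first))       # Z_Det flags ANDed by Ctrl1
        if c1 == 1:
            pw += 1                                # only advance the address
            actions.append("skip_first_zero")
            continue
        second = m2[pw]                            # LD_M2: read row at PW
        pw += 1                                    # then advance the address
        c_start = int(any(v != 0 for v in second))
        if c_start == 0:
            actions.append("skip_second_zero")
            continue
        for i in range(rows):                      # PU(i, j) accumulates
            for j in range(cols):
                m_temp[i][j] += first[i] * second[j]
        actions.append("compute")
    return m_temp, actions


M1 = [[1, 0, 0],
      [0, 0, 2]]
M2 = [[1, 2],
      [3, 4],   # hypothetical values: this row is skipped and never read
      [0, 0]]
M, actions = sparse_matmul(M1, M2)
print(M)        # [[1, 2], [0, 0]]
print(actions)  # ['compute', 'skip_first_zero', 'skip_second_zero']
```

Only one of the three cycles performs multiplications, yet the result equals the ordinary product of M1 and M2.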

With the matrix calculation circuit described above, whether the fetched first data of the first matrix are all 0 can be determined before the calculation; if they are, the calculation is skipped to save computing resources. When the first data are not all 0, the circuit further determines whether the fetched second data are all 0; if they are, no calculation is performed, which also saves computing resources. The calculation circuit performs the matrix calculation only when the first data are not all 0 and the second data are not all 0. When the first matrix and/or the second matrix is sparse, this can save a large amount of computing resources.

FIG. 7 is a flowchart of the matrix calculation method provided by an embodiment of the present disclosure. As shown in FIG. 7, the method includes the following steps:

Step S701: reading a plurality of first data in the first matrix according to the read address of the first matrix;

Step S702: generating a second data read control signal according to the plurality of first data;

Step S703: reading a plurality of second data in the second matrix according to the second data read control signal and the read address of the second matrix;

Step S704: calculating third data according to the plurality of first data and the plurality of second data.

Further, generating the second data read control signal according to the plurality of first data includes:

generating the second data read control signal according to whether each of the plurality of first data is zero.

Further, generating the second data read control signal according to whether each of the plurality of first data is zero specifically includes:

in response to the value of each of the plurality of first data being zero, generating the second data read control signal that indicates accumulating the read address of the second matrix; or,

in response to the value of at least one of the plurality of first data being non-zero, generating the second data read control signal that instructs the second data reading circuit to read the plurality of second data according to the read address of the second matrix.

Further, generating the second data read control signal according to the plurality of first data includes:

determining whether the value of each of the plurality of first data is zero;

generating the second data read control signal according to the determination result.

Further, the matrix calculation method also includes:

determining whether the value of each of the plurality of second data is zero;

generating a calculation control signal according to the determination result.

Further, generating the calculation control signal according to the determination result includes:

in response to at least one of the plurality of second data being non-zero, generating the calculation control signal to instruct the calculation circuit to perform a calculation operation.

Further, the matrix calculation method also includes:

generating a first control signal according to the determination result;

accumulating the read address of the first matrix to obtain a new read address of the first matrix.

Although the steps of the above method embodiments are described in the order given, those skilled in the art should understand that the steps in the embodiments of the present disclosure are not necessarily performed in that order; they may also be performed in reverse, in parallel, interleaved, or in another order. On the basis of the above steps, those skilled in the art may also add other steps. Such obvious variations or equivalent substitutions also fall within the protection scope of the present disclosure and are not described further here.

An embodiment of the present disclosure further provides a chip that includes at least one matrix calculation circuit according to any of the above embodiments.

An embodiment of the present disclosure provides an electronic device, including: a memory for storing computer-readable instructions; and one or more processors for executing the computer-readable instructions, such that the processors, when running, implement the matrix calculation method of any of the embodiments.

An embodiment of the present disclosure further provides a non-transitory computer-readable storage medium that stores computer instructions, the computer instructions being used to cause a computer to execute the matrix calculation method of any of the foregoing embodiments.

An embodiment of the present disclosure further provides a computer program product that includes computer instructions; when the computer instructions are executed by a computing device, the computing device can execute the matrix calculation method of any of the foregoing embodiments.

An embodiment of the present disclosure further provides a computing device that includes the chip of any of the embodiments.

The flowcharts and block diagrams in the figures of the present disclosure illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a task segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the figures. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The units described in the embodiments of the present disclosure may be implemented in software or in hardware. In some cases, the name of a unit does not constitute a limitation on the unit itself.

The functions described above may be performed, at least in part, by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that may be used include field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and so on.

In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by, or in connection with, an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

Claims (10)

1. A matrix calculation circuit, characterized by comprising:
a first data reading circuit, configured to read a plurality of first data in a first matrix according to a read address of the first matrix, and to generate a second data read control signal according to the plurality of first data;
a second data reading circuit, configured to read a plurality of second data in a second matrix according to the second data read control signal and a read address of the second matrix;
a calculation circuit, configured to calculate third data according to the plurality of first data and the plurality of second data.

2. The matrix calculation circuit according to claim 1, wherein the first data reading circuit being configured to generate the second data read control signal according to the plurality of first data comprises:
the first data reading circuit being configured to generate the second data read control signal according to whether each of the plurality of data is zero.

3. The matrix calculation circuit according to claim 2, wherein generating the second data read control signal according to whether each of the plurality of data is zero specifically comprises:
in response to the value of each of the plurality of first data being zero, generating the second data read control signal indicating accumulation of the read address of the second matrix; or,
in response to the value of at least one of the plurality of first data being non-zero, generating the second data read control signal instructing the second data reading circuit to read the plurality of second data according to the read address of the second matrix.

4. The matrix calculation circuit according to claim 1, characterized in that the first data reading circuit comprises:
a first value comparison circuit, configured to determine whether the value of each of the plurality of first data is zero;
a first control circuit, configured to generate the second data read control signal according to the determination result of the first value comparison circuit.

5. The matrix calculation circuit according to any one of claims 1-4, characterized in that the second data reading circuit further comprises:
a second value comparison circuit, configured to determine whether the value of each of the plurality of second data is zero;
a second control circuit, configured to generate a calculation control signal according to the determination result of the second value comparison circuit.

6. The matrix calculation circuit according to claim 5, characterized in that the second control circuit is configured to:
in response to at least one of the plurality of second data being non-zero, generate the calculation control signal to instruct the calculation circuit to perform a calculation operation.

7. The matrix calculation circuit according to any one of claims 4-6, characterized in that the first control circuit comprises:
a first control signal generation circuit, configured to generate a first control signal according to the determination result of the first value comparison circuit;
a first read address generation circuit, configured to accumulate the read address of the first matrix to obtain a new read address of the first matrix.

8. The matrix calculation circuit according to any one of claims 1-7, characterized in that:
the plurality of first data are one column of first data in the first matrix;
the plurality of second data are one row of second data in the second matrix.

9. The matrix calculation circuit according to claim 1, characterized in that the calculation circuit comprises:
a calculation unit array, wherein the calculation unit array includes a plurality of calculation units;
each row of calculation units in the calculation unit array respectively receives the plurality of first data;
each column of calculation units in the calculation unit array respectively receives the plurality of second data.

10. A matrix calculation method, characterized by comprising:
reading a plurality of first data in a first matrix according to a read address of the first matrix;
generating a second data read control signal according to the plurality of first data;
reading a plurality of second data in a second matrix according to the second data read control signal and a read address of the second matrix;
calculating third data according to the plurality of first data and the plurality of second data.
CN202010807002.7A 2020-08-12 2020-08-12 Matrix calculation circuit, matrix calculation method, electronic device, and computer-readable storage medium Pending CN114077718A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010807002.7A CN114077718A (en) 2020-08-12 2020-08-12 Matrix calculation circuit, matrix calculation method, electronic device, and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010807002.7A CN114077718A (en) 2020-08-12 2020-08-12 Matrix calculation circuit, matrix calculation method, electronic device, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN114077718A true CN114077718A (en) 2022-02-22

Family

ID=80280051

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010807002.7A Pending CN114077718A (en) 2020-08-12 2020-08-12 Matrix calculation circuit, matrix calculation method, electronic device, and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN114077718A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106126481A (en) * 2016-06-29 2016-11-16 Huawei Technologies Co., Ltd. Computing engine and electronic device
CN107957976A (en) * 2017-12-15 2018-04-24 Beijing Zhongke Cambricon Technology Co., Ltd. Calculation method and related product
CN108664447A (en) * 2017-03-31 2018-10-16 Huawei Technologies Co., Ltd. Method and device for matrix-vector multiplication


Similar Documents

Publication Publication Date Title
US11907830B2 (en) Neural network architecture using control logic determining convolution operation sequence
CN111831254B (en) Image processing acceleration method, image processing model storage method and corresponding device
WO2022037257A1 (en) Convolution calculation engine, artificial intelligence chip, and data processing method
US11664070B2 (en) In-memory computation device and in-memory computation method to perform multiplication operation in memory cell array according to bit orders
CN114930311B (en) Cascaded communication between FPGA repeating units
US20200090051A1 (en) Optimization problem operation method and apparatus
CN110597487A (en) A Matrix-Vector Multiplication Circuit and Calculation Method
EP4206996A1 (en) Neural network accelerator with configurable pooling processing unit
CN112200310B (en) Intelligent processor, data processing method and storage medium
CN114077718A (en) Matrix calculation circuit, matrix calculation method, electronic device, and computer-readable storage medium
CN112418413B (en) Apparatus and method for storing data and apparatus for performing packet convolution operation
CN110889259A (en) Sparse matrix vector multiplication calculation unit for arranged block diagonal weight matrix
WO2022053032A1 (en) Matrix calculation circuit, method, electronic device, and computer-readable storage medium
CN115994040A (en) Computing system, method for data broadcasting and data reduction, and storage medium
CN114168894A (en) Matrix calculation circuit, matrix calculation method, electronic device, and computer-readable storage medium
CN113836481B (en) Matrix computing circuit, method, electronic device, and computer-readable storage medium
CN113961871A (en) Matrix computing circuit, method, electronic device, and computer-readable storage medium
CN113988279A (en) Output current reading method and system of storage array supporting negative value excitation
KR20220143333A (en) MobileNet hardware accelerator with distributed SRAM architecture and channel stationary data flow design method thereof
CN117632607B (en) Programmable digital signal parallel processor and abnormality detection and fault recognition method thereof
CN115280272A (en) Data access circuit and method
CN104035744A (en) Optical feedback strategy based on MSD adder
CN114282158A (en) Matrix calculation circuit, matrix calculation method, electronic device, and computer-readable storage medium
CN115617717B (en) Memristor-based coprocessor design method
CN114065124A (en) Computing device and computing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Country or region after: China

Address after: Room 201, No. 6 Fengtong Heng Street, Huangpu District, Guangzhou City, Guangdong Province

Applicant after: Guangzhou Ximu Semiconductor Technology Co.,Ltd.

Address before: Building 202-24, No. 6, Courtyard 1, Gaolizhang Road, Haidian District, Beijing

Applicant before: Beijing SIMM Computing Technology Co.,Ltd.

Country or region before: China