TW201937412A

TW201937412A - Integrated circuit chip device and related product

Info

Publication number: TW201937412A
Application number: TW108100779A
Authority: TW
Inventors: (right to be named waived)
Original assignee: 大陸商上海寒武紀資訊科技有限公司
Priority date: 2018-02-27
Filing date: 2019-01-09
Publication date: 2019-09-16
Also published as: CN110197263A; TWI787430B; CN110197263B

Abstract

Disclosed are an integrated circuit chip device and a related product. The integrated circuit chip device comprises a main processing circuit and a plurality of basic processing circuits. The main processing circuit comprises a first mapping circuit. At least one of the plurality of basic processing circuits comprises a second mapping circuit. The first mapping circuit and the second mapping circuit are both used to perform compression processing on the data in a neural network operation. The plurality of basic processing circuits are arranged in an array. Each basic processing circuit is connected to the adjacent basic processing circuits. The main processing circuit is connected to the n basic processing circuits of the first row, the n basic processing circuits of the m-th row, and the m basic processing circuits of the first column. The technical scheme provided by the invention has the advantages of a small amount of computation and low power consumption.

Description

Integrated circuit chip device and related products

The present disclosure relates to the field of neural networks, and in particular to an integrated circuit chip device and related products.

The artificial neural network (ANN) has been a research hotspot in the field of artificial intelligence since the 1980s. It abstracts the neuron network of the human brain from an information-processing perspective, builds a simple model, and forms different networks according to different connection patterns. In engineering and academia it is often simply called a neural network. A neural network is a computational model consisting of a large number of interconnected nodes (or neurons). Existing neural network operations are implemented on a central processing unit (CPU) or a graphics processing unit (GPU); such operations involve a large amount of computation and high power consumption.

Embodiments of the present disclosure provide an integrated circuit chip device and related products that can increase the processing speed and efficiency of a computing device.

In a first aspect, an integrated circuit chip device is provided. The integrated circuit chip device includes a main processing circuit and a plurality of basic processing circuits. The main processing circuit includes a first mapping circuit, and at least one of the plurality of basic processing circuits (that is, some or all of the basic processing circuits) includes a second mapping circuit. The first mapping circuit and the second mapping circuit are both used to perform compression processing on the data in a neural network operation;

the plurality of basic processing circuits are arranged in an array; each basic processing circuit is connected to the adjacent basic processing circuits, and the main processing circuit is connected to the n basic processing circuits in row 1, the n basic processing circuits in row m, and the m basic processing circuits in column 1;

the main processing circuit is configured to perform the successive operations in the neural network operation and to exchange data with the basic processing circuits connected to it;

the plurality of basic processing circuits are configured to perform the operations in the neural network in parallel according to the transmitted data, and to transmit the operation results to the main processing circuit through the basic processing circuits connected to the main processing circuit.

In a second aspect, a neural network computing device is provided. The neural network computing device includes one or more of the integrated circuit chip devices provided in the first aspect.

In a third aspect, a combined processing device is provided. The combined processing device includes the neural network computing device provided in the second aspect, a universal interconnection interface, and a general-purpose processing device;

the neural network computing device is connected to the general-purpose processing device through the universal interconnection interface.

In a fourth aspect, a chip is provided. The chip integrates the device of the first aspect, the device of the second aspect, or the device of the third aspect.

In a fifth aspect, an electronic device is provided. The electronic device includes the chip of the fourth aspect.

In a sixth aspect, a neural network operation method is provided. The method is applied in an integrated circuit chip device, the integrated circuit chip device being the one described in the first aspect, and the integrated circuit chip device is used to perform neural network operations.

As can be seen, the embodiments of the present disclosure provide mapping circuits that compress the data blocks before the operations are performed, which saves transmission and computing resources and therefore offers low power consumption and a small amount of computation.

To help those skilled in the art better understand the solution of the present disclosure, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the scope of protection of the present disclosure.

In the device provided in the first aspect, the main processing circuit is configured to obtain a data block to be computed and an operation instruction, and to divide the data block to be computed into a horizontal data block and a vertical data block according to the operation instruction; to split the horizontal data block and the pre-stored identification data block associated with the horizontal data block into a plurality of basic data blocks and the identification data blocks associated with the basic data blocks; to distribute the plurality of basic data blocks and their respective associated identification data blocks to the basic processing circuits connected to it; and to broadcast the vertical data block and the identification data block associated with the vertical data block to the basic processing circuits connected to it. The identification data block may be represented by a direct index or a step index, and optionally also in list of lists (LIL), coordinate list (COO), compressed sparse row (CSR), compressed sparse column (CSC), ELLPACK (ELL), or hybrid (HYB) format; this application imposes no limitation.

Taking an identification data block represented by a direct index as an example, the identification data block may be a data block composed of 0s and 1s, where 0 indicates that the absolute value of the corresponding data in the data block (such as a weight or an input neuron) is less than or equal to a first threshold, and 1 indicates that the absolute value of the corresponding data (such as a weight or an input neuron) is greater than the first threshold. The first threshold is set arbitrarily by the user side or the device side, for example 0.05 or 0.
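The direct-index rule above can be illustrated with a minimal Python sketch. The function name `direct_index_mask` and the sample values are hypothetical and chosen only for illustration; the patent describes a hardware mapping circuit, not this code.

```python
def direct_index_mask(block, threshold):
    """Build a direct-index identification block for a 2-D data block.

    Each entry is 1 when the absolute value of the corresponding data is
    greater than the threshold, and 0 when it is less than or equal to it.
    """
    return [[1 if abs(x) > threshold else 0 for x in row] for row in block]


# Hypothetical 2*2 block; 0.05 is one of the example thresholds in the text.
block = [[1.0, 0.02], [-0.05, 0.5]]
mask = direct_index_mask(block, 0.05)
print(mask)
```

Note that a value whose absolute value equals the threshold (here -0.05) is marked 0, since the rule uses a strict "greater than" comparison for 1.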

To reduce the amount of data transmitted and improve transmission efficiency, when the main processing circuit sends data to the basic processing circuits, it may distribute the target data in the plurality of basic data blocks, together with the identification data blocks associated with the respective basic data blocks, to the basic processing circuits connected to it; optionally, it may also broadcast the target data in the processed vertical data block, together with the identification data block associated with that vertical data block, to the basic processing circuits connected to it. The target data is the data in a data block whose absolute value is greater than the first threshold, or the non-zero data in a data block (here, the processed horizontal data block or the processed vertical data block).
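As a rough software analogy of sending only the target data plus the identification data block, the sketch below separates a block into those two parts. The name `extract_target_data`, the row-major ordering of the target data, and the sample values are assumptions made for illustration; the text does not specify a transmission order.

```python
def extract_target_data(block, threshold=0.0):
    """Split a 2-D block into (target data, identification block).

    Target data are the entries whose absolute value exceeds the threshold,
    listed in row-major order (an assumed ordering). The identification
    block marks their positions with 1 and all other positions with 0.
    """
    mask = [[1 if abs(x) > threshold else 0 for x in row] for row in block]
    targets = [x for row in block for x in row if abs(x) > threshold]
    return targets, mask


# Hypothetical 2*3 block with threshold 0.05.
targets, mask = extract_target_data([[0.3, 0.01, 0.0], [0.0, -2.0, 0.04]], 0.05)
print(targets, mask)
```

Only two values and a six-entry 0/1 block would need to be transferred in this example, instead of six full-precision values.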

Correspondingly, the basic processing circuit is configured to start the second mapping circuit to obtain a connection identification data block from the identification data block associated with the vertical data block and the identification data block associated with the basic data block; to process the vertical data block and the basic data block according to the connection identification data block to obtain a processed vertical data block and a processed basic data block; and to perform an inner product operation on the processed vertical data block and the processed basic data block to obtain an operation result, and send the operation result to the main processing circuit;

the main processing circuit is configured to process the operation results to obtain the instruction result of the data block to be computed and the operation instruction.
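The basic-circuit steps just described (combine the two identification blocks, filter both data blocks, take the inner product) can be sketched as follows. This is a simplified illustration on flat vectors; the function name `masked_inner_product` and all values are hypothetical, and the element-wise AND used to combine the identification blocks is taken from the description of the connection identification data block elsewhere in the text.

```python
def masked_inner_product(basic, basic_mask, vertical, vertical_mask):
    """Inner product restricted to positions kept by both identification blocks.

    basic, vertical:           equal-length lists of numbers
    basic_mask, vertical_mask: equal-length lists of 0/1 flags
    """
    if not (len(basic) == len(basic_mask) == len(vertical) == len(vertical_mask)):
        raise ValueError("all inputs must have the same length")
    # Connection identification block: 1 only where both flags are 1.
    connection = [a & b for a, b in zip(basic_mask, vertical_mask)]
    # Multiply and accumulate only the positions that survive the filter.
    return sum(x * y for x, y, keep in zip(basic, vertical, connection) if keep)


result = masked_inner_product(
    [1.0, 0.0, 0.5, 2.0], [1, 0, 1, 1],
    [3.0, 4.0, 0.0, 1.0], [1, 1, 0, 1],
)
print(result)
```

In this example only two of the four multiplications are carried out, which is the source of the claimed savings in computation.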

For example, the horizontal data block is a matrix with M1 rows and N1 columns, and the basic data block is a matrix with M2 rows and N2 columns, where M1 > M2 and N1 > N2. Correspondingly, the identification data block associated with the horizontal data block is also a matrix with M1 rows and N1 columns, and the identification data block associated with the basic data block is also a matrix with M2 rows and N2 columns. Taking a 2*2 basic data block as an example, let the basic data block be [matrix not reproduced in the source text] and the first threshold be 0.05; the identification data block associated with the basic data block is then [matrix not reproduced in the source text]. The processing of data blocks by the first mapping circuit and the second mapping circuit is described in detail later.

In the device provided in the first aspect, the main processing circuit is configured to obtain a data block to be computed and an operation instruction, and to divide the data block to be computed into a horizontal data block and a vertical data block according to the operation instruction; to start the first mapping circuit to process the horizontal data block and the vertical data block to obtain a processed horizontal data block with its associated identification data block and a processed vertical data block with its associated identification data block; to split the processed horizontal data block and its associated identification data block into a plurality of basic data blocks and the identification data blocks associated with the respective basic data blocks; to distribute the plurality of basic data blocks and their respective associated identification data blocks to the basic processing circuits connected to it; and to broadcast the vertical data block and the identification data block associated with the vertical data block to the basic processing circuits connected to it;

the basic processing circuit is configured to start the second mapping circuit to obtain a connection identification data block from the identification data block associated with the vertical data block and the identification data block associated with the basic data block; to process the vertical data block and the basic data block according to the connection identification data block to obtain a processed vertical data block and a processed basic data block; and to perform an inner product operation on the processed vertical data block and the processed basic data block to obtain an operation result, and send the operation result to the main processing circuit;

In an optional embodiment, the main processing circuit is further configured to split the vertical data block or the processed vertical data block, together with the identification data block associated with it, into a plurality of partial vertical data blocks and the identification data blocks associated with the respective partial vertical data blocks; and to send the plurality of partial vertical data blocks and their respective associated identification data blocks to the basic processing circuits in one or more broadcasts; wherein the plurality of partial vertical data blocks together form the vertical data block or the processed vertical data block.

Correspondingly, the basic processing circuit is configured to start the second mapping circuit to obtain a connection identification data block from the identification data block associated with the partial vertical data block and the identification data block associated with the basic data block; to process the partial vertical data block and the basic data block according to the connection identification data block to obtain a processed partial vertical data block and a processed basic data block; and to perform an inner product operation on the processed partial vertical data block and the processed basic data block.

The connection identification data block is obtained by performing an element-wise AND operation on the identification data block associated with the basic data block and the identification data block associated with the partial vertical data block. Optionally, the connection identification data block indicates the positions at which the data in both data blocks (the basic data block and the vertical data block) have absolute values greater than the first threshold. Details are given later.

For example, if the identification data block associated with the horizontal data block is a 2*3 matrix [matrix not reproduced in the source text] and the identification data block associated with the partial vertical data block is a 2*2 matrix [matrix not reproduced in the source text], the corresponding connection identification data block is [matrix not reproduced in the source text].
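Because the matrices of the example are not available in the text, the following sketch uses hypothetical 2*2 identification blocks to show the element-wise AND. It assumes both blocks have the same shape; how blocks of different shapes (such as the 2*3 and 2*2 blocks mentioned above) are aligned is not stated here, so the sketch rejects mismatched shapes rather than guessing.

```python
def connection_identification_block(mask_a, mask_b):
    """Element-wise AND of two identification blocks of identical shape."""
    same_shape = len(mask_a) == len(mask_b) and all(
        len(ra) == len(rb) for ra, rb in zip(mask_a, mask_b)
    )
    if not same_shape:
        raise ValueError("identification blocks must have the same shape")
    return [[a & b for a, b in zip(ra, rb)] for ra, rb in zip(mask_a, mask_b)]


# Hypothetical identification blocks (not the ones from the patent figures).
conn = connection_identification_block([[1, 0], [1, 1]], [[1, 1], [0, 1]])
print(conn)
```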

In the device provided in the first aspect, the main processing circuit is configured to obtain a data block to be computed and an operation instruction, and to divide the data block to be computed into a horizontal data block and a vertical data block according to the operation instruction; to start the first mapping circuit to process the horizontal data block to obtain a processed horizontal data block and the identification data block associated with it, or to start the first mapping circuit to process the horizontal data block according to the pre-stored identification data block associated with the horizontal data block to obtain a processed horizontal data block; to split the processed horizontal data block and its associated identification data block into a plurality of basic data blocks and the identification data blocks associated with the respective basic data blocks; to distribute the plurality of basic data blocks and their respective associated identification data blocks to the basic processing circuits connected to it; and to broadcast the vertical data block to the basic processing circuits connected to it;

the basic processing circuit is configured to start the second mapping circuit to process the vertical data block according to the identification data block associated with the basic data block to obtain a processed vertical data block; and to perform an inner product operation on the processed vertical data block and the processed basic data block to obtain an operation result, and send the operation result to the main processing circuit;

the main processing circuit is configured to process the operation results to obtain an instruction result.

In an optional embodiment, the main processing circuit is further configured to split the vertical data block into a plurality of partial vertical data blocks, and to send the plurality of partial vertical data blocks to the basic processing circuits in one or more broadcasts; wherein the plurality of partial vertical data blocks together form the vertical data block or the processed vertical data block.

Correspondingly, the basic processing circuit is configured to process the partial vertical data block according to the identification data block associated with the basic data block to obtain a processed partial vertical data block, and to perform an inner product operation on the basic data block and the processed partial vertical data block.

In the device provided in the first aspect, the main processing circuit is configured to obtain a data block to be computed and an operation instruction, and to divide the data block to be computed into a horizontal data block and a vertical data block according to the operation instruction; to start the first mapping circuit to process the vertical data block to obtain a processed vertical data block and the identification data block associated with it, or to start the first mapping circuit to process the vertical data block according to the pre-stored identification data block associated with the vertical data block to obtain a processed vertical data block; to split the horizontal data block into a plurality of basic data blocks; to distribute the plurality of basic data blocks to the basic processing circuits connected to it; and to broadcast the processed vertical data block and the identification data block associated with it to the basic processing circuits connected to it;

the basic processing circuit is configured to start the second mapping circuit to process the basic data block according to the identification data block associated with the vertical data block to obtain a processed basic data block; and to perform an inner product operation on the processed vertical data block and the processed basic data block to obtain an operation result, and send the operation result to the main processing circuit;

In an optional embodiment, the main processing circuit is further configured to split the processed vertical data block and the identification data block associated with it into a plurality of partial vertical data blocks and the identification data blocks associated with the respective partial vertical data blocks; and to send the plurality of partial vertical data blocks and their respective associated identification data blocks to the basic processing circuits in one or more broadcasts; wherein the plurality of partial vertical data blocks together form the vertical data block or the processed vertical data block.

Correspondingly, the basic processing circuit is configured to process the basic data block according to the identification data block associated with the partial vertical data block to obtain a processed basic data block, and to perform an inner product operation on the processed basic data block and the partial vertical data block.

In the device provided in the first aspect, the main processing circuit is configured to send the vertical data block (which may be the vertical data block or the processed vertical data block) to the basic processing circuits connected to it in a single broadcast.

In the device provided in the first aspect, the basic processing circuit is configured to perform inner product processing on the basic data block (likewise, the basic data block or the processed basic data block) and the vertical data block to obtain inner product results, to accumulate the inner product results to obtain an operation result, and to send the operation result to the main processing circuit.

In the device provided in the first aspect, the main processing circuit is configured, when the operation results are results of inner product processing, to accumulate the operation results to obtain accumulated results and to arrange the accumulated results to obtain the instruction result of the data block to be computed and the operation instruction.

In the device provided in the first aspect, the main processing circuit is configured to divide the vertical data block into a plurality of partial vertical data blocks and to send the plurality of partial vertical data blocks to the basic processing circuits in multiple broadcasts; the plurality of partial vertical data blocks together form the vertical data block.

In the device provided in the first aspect, the basic processing circuit is configured to perform inner product processing once on the partial vertical data block (which may be the partial vertical data block or the processed partial vertical data block) and the basic data block to obtain an inner product result, to accumulate the inner product results to obtain a partial operation result, and to send the partial operation result to the main processing circuit.

In the device provided in the first aspect, the basic processing circuit is configured to reuse the partial vertical data block n times, performing inner product operations between the partial vertical data block and n basic data blocks to obtain n partial processing results, to accumulate the n partial processing results separately to obtain n partial operation results, and to send the n partial operation results to the main processing circuit, where n is an integer greater than or equal to 2.
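The reuse of one broadcast partial vertical data block across n basic data blocks can be sketched in Python as below. This is an illustrative model only: each basic data block is treated as a flat row vector, the partial vertical data blocks are treated as consecutive segments of one column vector, and the name `accumulate_partial_products` is hypothetical.

```python
def accumulate_partial_products(basic_blocks, partial_vertical_blocks):
    """Accumulate inner products over successive partial vertical blocks.

    basic_blocks:            n row vectors held by one basic processing circuit
    partial_vertical_blocks: consecutive segments of the vertical vector,
                             received one broadcast at a time
    Returns one accumulated result per basic block.
    """
    total = sum(len(part) for part in partial_vertical_blocks)
    if any(len(row) != total for row in basic_blocks):
        raise ValueError("each basic block must match the vertical block length")
    results = [0.0] * len(basic_blocks)
    offset = 0
    for part in partial_vertical_blocks:          # one broadcast per part
        for i, row in enumerate(basic_blocks):    # the same part is reused n times
            segment = row[offset:offset + len(part)]
            results[i] += sum(a * b for a, b in zip(segment, part))
        offset += len(part)
    return results


# Two basic blocks (n = 2) and a vertical block [1, 1, 2, 2] sent in two parts.
out = accumulate_partial_products([[1, 2, 3, 4], [0, 1, 0, 1]], [[1, 1], [2, 2]])
print(out)
```

The accumulated values equal the full inner products of each basic block with the whole vertical block, which is why splitting the broadcast does not change the final result.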

In the device provided in the first aspect, the main processing circuit includes a main register or a main on-chip cache circuit;

the basic processing circuit includes a basic register or a basic on-chip cache circuit.

In the device provided in the first aspect, the main processing circuit includes one or any combination of a vector operator circuit, an arithmetic logic unit circuit, an accumulator circuit, a matrix transposition circuit, a direct memory access circuit, a first mapping circuit, and a data rearrangement circuit.

In the device provided in the first aspect, the basic processing circuit is further configured to forward the vertical data block and the basic data block to other basic processing circuits, so that data processing is performed first and the inner product operation is then performed to obtain an operation result, and to send the operation result to the main processing circuit;

In the device provided in the first aspect, the data block may be represented as a tensor, which may be one or any combination of a vector, a matrix, a three-dimensional data block, a four-dimensional data block, and an n-dimensional data block.

In the device provided in the first aspect, if the operation instruction is a multiplication instruction, the main processing circuit determines the multiplier data block to be the vertical data block and the multiplicand data block to be the horizontal data block;

or, if the operation instruction is a convolution instruction, the main processing circuit determines the convolution input data block to be the vertical data block and the convolution kernel to be the horizontal data block.

In the method provided in the sixth aspect, the neural network operations include one or any combination of convolution operations, matrix-times-matrix operations, matrix-times-vector operations, bias operations, fully connected operations, general matrix multiplication (GEMM) operations, general matrix vector multiplication (GEMV) operations, and activation operations.

Refer to FIG. 1a, which shows an integrated circuit chip device provided by the present disclosure. The integrated circuit chip device includes a main processing circuit and a plurality of basic processing circuits arranged in an array (an m*n array), where m and n are integers greater than or equal to 1 and at least one of m and n is greater than or equal to 2. For the basic processing circuits distributed in the m*n array, each basic processing circuit is connected to the adjacent basic processing circuits, and the main processing circuit is connected to k of the basic processing circuits. The k basic processing circuits may be the n basic processing circuits in row 1, the n basic processing circuits in row m, and the m basic processing circuits in column 1. In the integrated circuit chip device shown in FIG. 1a, the main processing circuit includes a first mapping circuit, which is used to compress data to obtain processed data and identification data. The identification data indicates whether the absolute value of the data is greater than a first threshold. Further, the main processing circuit may send only the processed data (specifically, the data whose absolute value is greater than the first threshold) and the identification data associated with that data to the basic processing circuits. The advantage is that less data is sent to the basic processing circuits for processing, which increases the data processing rate. The first threshold is set by the user side or the device side, for example 0.05 or 0.5, and is not limited.
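To make the connection pattern concrete, the sketch below enumerates which positions of an m*n array are wired to the main processing circuit under the arrangement described above (row 1, row m, and column 1). Zero-based (row, column) coordinates and the name `circuits_connected_to_main` are assumptions of this illustration.

```python
def circuits_connected_to_main(m, n):
    """Return the set of (row, col) positions connected to the main circuit.

    Uses zero-based indices: row 1 of the text is row 0 here, row m is
    row m - 1, and column 1 is column 0.
    """
    if m < 1 or n < 1 or max(m, n) < 2:
        raise ValueError("m and n must be >= 1 and at least one must be >= 2")
    first_row = {(0, c) for c in range(n)}
    last_row = {(m - 1, c) for c in range(n)}
    first_col = {(r, 0) for r in range(m)}
    return first_row | last_row | first_col


connected = circuits_connected_to_main(4, 4)
print(len(connected), sorted(connected))
```

In this sketch the corner circuits belong to both a row and column 1 and are counted once, so the number of distinct positions is smaller than n + n + m; whether the patent's k counts them once or more than once is not stated in this passage.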

For example, the input data of the main processing circuit is a matrix data block. After processing by the first mapping circuit, a processed matrix data block and an identification data block associated with that matrix data block are obtained. The specific processing of the first mapping circuit is described in detail later.

Correspondingly, when the main processing circuit distributes data to the basic processing circuits, it may send only the two values 1 and 0.5 rather than all 8 values of the processed matrix data block. It must also send the identification data block associated with the matrix data block, so that the basic processing circuit can determine, from the received identification data block and the two received values (1 and 0.5), where these two values are located in the original matrix data block. That is, the basic processing circuit can restore the processed matrix data block of the main processing circuit from the received identification data block and the received data.
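The compression and restoration described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the circuit itself: the function names `compress` and `restore`, the row-order convention, and the 2x4 example block are hypothetical, chosen so that with a first threshold of 0.05 only the values 1 and 0.5 survive out of 8.

```python
def compress(block, threshold):
    """Return (kept values in row order, mask) for a 2-D list of numbers.

    mask[i][j] is 1 when |block[i][j]| > threshold, else 0.
    """
    mask = [[1 if abs(x) > threshold else 0 for x in row] for row in block]
    values = [x for row in block for x in row if abs(x) > threshold]
    return values, mask


def restore(values, mask):
    """Rebuild the processed block: kept values where mask is 1, 0 elsewhere."""
    it = iter(values)
    return [[next(it) if bit else 0 for bit in row] for row in mask]


# Hypothetical 2x4 block; not the matrix from the original example.
block = [[1, 0, 0.02, 0],
         [0, 0.5, 0, 0.01]]

values, mask = compress(block, threshold=0.05)  # what the main circuit sends
restored = restore(values, mask)                # what a basic circuit rebuilds
```

Only `values` and `mask` need to travel between circuits; the receiver recovers each value's position from the mask alone, provided both sides agree on the traversal order.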

At least one of the plurality of basic processing circuits (that is, some or all of them) may include a second mapping circuit. Specifically, only some of the basic processing circuits may include a second mapping circuit. For example, in an optional solution, the k basic processing circuits may be configured with second mapping circuits, so that the n basic processing circuits can each be responsible for compressing the data of the m basic processing circuits in their own column. This arrangement improves operation efficiency and reduces power consumption: the n basic processing circuits in the first row are the first to receive the data sent by the main processing circuit, so compressing the received data there reduces both the computation of the subsequent basic processing circuits and the amount of data transmitted to them. Likewise, configuring second mapping circuits in the m basic processing circuits of the first column also yields a small amount of computation and low power consumption. In addition, with this structure the main processing circuit can adopt a dynamic data sending strategy; for example, it broadcasts data to the m basic processing circuits in the first column and sends distribution data to the n basic processing circuits in the first row. The specific processing of the second mapping circuit is described in detail later.

The main processing circuit is configured to perform the successive operations in the neural network operation and to exchange data with the basic processing circuits connected to it. The successive operations include but are not limited to accumulation operations, ALU operations, activation operations, and the like.

The plurality of basic processing circuits are configured to perform operations in the neural network in parallel according to the transmitted data, and to transmit the operation results to the main processing circuit through the basic processing circuits connected to the main processing circuit. The operations performed in parallel include but are not limited to inner product operations, matrix or vector multiplication operations, and the like.

The main processing circuit may include a data sending circuit and a data receiving circuit or interface. The data sending circuit may integrate a horizontal data distribution circuit and a vertical data distribution circuit; in practical applications, the two distribution circuits may also be provided separately. Horizontal data is data that needs to be sent to each basic processing circuit along the row direction (horizontally); as in FIG. 1a, horizontal data is sent to the basic processing circuits in any one or more of the m rows. Vertical data is data that needs to be selectively sent to some of the basic processing circuits along the column direction (vertically). Taking a convolution operation as a specific example, the convolution input data needs to be sent to all basic processing circuits, so it is vertical data, while the convolution kernel needs to be selectively sent to some of the basic processing circuits, so it is horizontal data. Which basic processing circuit a given piece of horizontal data is sent to may be determined by the main processing circuit according to the load or another allocation scheme. Vertical or horizontal data may be sent to each basic processing circuit by broadcast. (In practical applications, the horizontal/vertical data may be sent to each basic processing circuit in a single broadcast or in multiple broadcasts; the specific embodiments of the present disclosure do not limit the number of broadcasts.) Optionally, the main processing circuit may also selectively send the horizontal/vertical data to some of the basic processing circuits.

The main processing circuit (as shown in FIG. 1d) may include a register and/or an on-chip cache circuit. It may further include a control circuit, a vector operator circuit, an arithmetic and logic unit (ALU) circuit, an accumulator circuit, a direct memory access (DMA) circuit, and the like. In practical applications, other circuits may also be added to the main processing circuit, such as a conversion circuit (for example, a matrix transposition circuit), a data rearrangement circuit, or an activation circuit.

Each basic processing circuit may include a basic register and/or a basic on-chip cache circuit. Each basic processing circuit may further include one or any combination of an inner product operator circuit, a vector operator circuit, an accumulator circuit, and the like. The inner product operator circuit, the vector operator circuit, and the accumulator circuit may all be integrated circuits, or they may be separately provided circuits.

Optionally, the accumulator circuits of the n basic processing circuits in the m-th row may perform the accumulation step of the inner product operation, because a basic processing circuit in the m-th row can receive the product results of all the basic processing circuits in its column. Having the n basic processing circuits in the m-th row perform the accumulation allocates computing resources effectively and saves power. This technical solution is particularly suitable when m is large.
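As a rough illustration of accumulating in the last row, the sketch below models one column of m basic processing circuits. It assumes, hypothetically, that each circuit holds one slice of the two operand vectors and produces one product result, and that the circuit in the m-th row sums the results it receives. The function name and the slicing scheme are assumptions, not details taken from the disclosure.

```python
def column_inner_product(x_slices, w_slices):
    """Inner product computed by one column of basic processing circuits.

    x_slices[i] and w_slices[i] are the operand slices held by the circuit
    in row i (hypothetical partitioning).
    """
    # Each of the m circuits multiplies its own slices.
    products = [
        sum(a * b for a, b in zip(xs, ws))
        for xs, ws in zip(x_slices, w_slices)
    ]
    # The circuit in the m-th (last) row accumulates every product result
    # forwarded down the column.
    total = 0
    for p in products:
        total += p
    return total


# Two circuits in the column, each holding two elements.
result = column_inner_product([[1, 2], [3, 4]], [[1, 0], [0, 1]])
```

Under this model the accumulators of the other rows stay idle, which is consistent with the power-saving argument above.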

The circuits that perform data compression may be assigned by the main processing circuit, either explicitly or implicitly. In the explicit mode, the main processing circuit may configure a special indication or instruction: when a basic processing circuit receives the special indication or instruction, it determines to perform data compression; when it does not receive one, it determines not to perform data compression. In the implicit mode, for example, when a basic processing circuit receives sparse data (that is, data containing zeros, or data in which the number of values below a preset threshold exceeds a preset number) and determines that an inner product operation needs to be performed, it compresses the sparse data. In the explicit mode, the special instruction or indication may carry a decrementing sequence whose value decreases by 1 each time it passes through a basic processing circuit. Each basic processing circuit reads the value of the decrementing sequence; if the value is greater than zero, it performs data compression, and if the value is less than or equal to zero, it does not. This setting is configured according to the basic processing circuits arranged in the array. For example, for the m basic processing circuits in the i-th column, if the main processing circuit needs the first 5 basic processing circuits to perform data compression, it issues a special instruction containing a decrementing sequence whose initial value may be 5. The value decreases by 1 at each basic processing circuit, so it is 1 at the 5th basic processing circuit and 0 at the 6th, and the 6th basic processing circuit therefore no longer performs data compression. In this way the main processing circuit can dynamically configure which circuits perform data compression and how many times it is performed.
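The decrementing-sequence mechanism can be sketched as a simple loop over the circuits of one column. The function name `circuits_that_compress` and the choice of 8 circuits are assumptions made for illustration; the initial value 5 follows the example in the text.

```python
def circuits_that_compress(initial_count, num_circuits):
    """Return, for each circuit along a column, whether it compresses data."""
    decisions = []
    count = initial_count
    for _ in range(num_circuits):
        # The circuit reads the current value and compresses only if it is > 0.
        decisions.append(count > 0)
        # The value drops by 1 each time it passes through a circuit.
        count -= 1
    return decisions


# Initial value 5 in a column of 8 circuits (8 is a hypothetical size).
decisions = circuits_that_compress(initial_count=5, num_circuits=8)
```

The 5th circuit reads the value 1 and still compresses; the 6th reads 0 and does not, matching the description above.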

An embodiment of the present disclosure provides an integrated circuit chip device that includes a main processing circuit (also called a main unit) and a plurality of basic processing circuits (also called basic units). The structure of the embodiment is shown in FIG. 1b, in which the dashed box is the internal structure of the neural network computing device, the gray-filled arrows indicate the data transmission paths between the main processing circuit and the basic processing circuit array, and the hollow arrows indicate the data transmission paths between adjacent basic processing circuits within the array. The length and width of the basic processing circuit array may differ, that is, m and n may take different values or the same value; the present disclosure does not limit their specific values.

The circuit structure of a basic processing circuit is shown in FIG. 1c. The dashed box in the figure indicates the boundary of the basic processing circuit, and the thick arrows crossing the dashed box indicate the data input and output channels (an arrow pointing into the dashed box is an input channel, and an arrow pointing out of it is an output channel). The rectangular boxes inside the dashed box represent storage unit circuits (registers and/or on-chip caches), which hold input data 1, input data 2, the multiplication or inner product result, and the accumulated data. The diamond boxes represent operator circuits, including a multiplication or inner product operator and an adder.

In this embodiment, the neural network computing device includes one main processing circuit and 16 basic processing circuits (16 is used only as an example; other numbers may be used in practical applications).

In this embodiment, each basic processing circuit has two data input interfaces and two data output interfaces. In the rest of this example, the horizontal input interface (the horizontal arrow pointing to the unit in FIG. 1b) is called input 0, and the vertical input interface (the vertical arrow pointing to the unit in FIG. 1b) is called input 1. Each horizontal data output interface (the horizontal arrow pointing out of the unit in FIG. 1b) is called output 0, and each vertical data output interface (the vertical arrow pointing out of the unit in FIG. 1b) is called output 1.

The data input interfaces and data output interfaces of each basic processing circuit may be connected to different units, including the main processing circuit and other basic processing circuits.

In this example, input 0 of the four basic processing circuits 0, 4, 8, and 12 (see FIG. 1b for the numbering) is connected to the data output interface of the main processing circuit.

In this example, input 1 of the four basic processing circuits 0, 1, 2, and 3 is connected to the data output interface of the main processing circuit.

In this example, output 1 of the four basic processing circuits 12, 13, 14, and 15 is connected to the data input interface of the main processing circuit.

In this example, the connections between the output interfaces of basic processing circuits and the input interfaces of other basic processing circuits are shown in FIG. 1b and are not listed one by one.

Specifically, when the output interface S1 of an S unit is connected to the input interface P1 of a P unit, the P unit can receive, through its P1 interface, the data that the S unit sends to its S1 interface.
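The connections between the main processing circuit and the 4x4 array in this example can be reproduced with a short sketch. It assumes that the circuits in FIG. 1b are numbered in row-major order starting from 0, which is consistent with the circuit numbers listed above; the names `circuit_id`, `input0_from_main`, `input1_from_main`, and `output1_to_main` are hypothetical.

```python
M, N = 4, 4  # rows and columns of the basic processing circuit array


def circuit_id(row, col):
    """Row-major circuit number (assumed numbering scheme)."""
    return row * N + col


# Input 0 (horizontal) is fed by the main circuit for the first column.
input0_from_main = [circuit_id(r, 0) for r in range(M)]
# Input 1 (vertical) is fed by the main circuit for the first row.
input1_from_main = [circuit_id(0, c) for c in range(N)]
# Output 1 (vertical) of the last row returns results to the main circuit.
output1_to_main = [circuit_id(M - 1, c) for c in range(N)]
```

Changing `M` and `N` gives the corresponding boundary circuits for other array sizes under the same numbering assumption.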

This embodiment contains one main processing circuit, which is connected to an external device (that is, it has both input interfaces and output interfaces). Some of the data output interfaces of the main processing circuit are connected to the data input interfaces of some of the basic processing circuits, and some of the data input interfaces of the main processing circuit are connected to the data output interfaces of some of the basic processing circuits.

Method for using the integrated circuit chip device

The data involved in the usage method provided by the present disclosure may be data that has undergone compression processing. It should be noted that the data in this application may be input neurons or weights in a neural network, and may specifically be matrix data, vector data, or the like, which is not limited in this application. That is, the data or data blocks described below may be input neurons or weights in a neural network, and they may take the form of matrices, vectors, and so on.

The data compression processing involved in this application is performed in the first mapping circuit and the second mapping circuit described above. It should be understood that a neural network is an algorithm with heavy computation and heavy memory access: the more weights there are, the greater the amount of computation and memory access. In particular, when weights are small (for example, 0 or smaller than a set value), the data corresponding to these small weights needs to be compressed in order to increase the calculation rate and reduce overhead. In practical applications, data compression has the most obvious effect in sparse neural networks, where it reduces the computation workload, reduces extra data overhead, and increases the calculation rate.

The specific embodiments of data compression processing are described below, taking input data as an example. The input data includes but is not limited to at least one input neuron and/or at least one weight.

In the first embodiment:

After the first mapping circuit receives first input data (specifically, a data block to be calculated that is sent by the main processing circuit, such as a horizontal data block or a vertical data block), it may process the first input data to obtain processed first input data and identification (mask) data associated with the first input data. The mask data indicates whether the absolute value of the first input data is greater than a first threshold, such as 0.5 or 0.

Specifically, when the absolute value of the first input data is greater than the first threshold, the input data is retained; otherwise the first input data is deleted or set to 0. For example, given an input matrix data block and a first threshold of 0.05, processing by the first mapping circuit yields a processed matrix data block and an identification data block (also called a mask matrix) associated with that matrix data block.

Further, to reduce the amount of data transmitted, when the main processing circuit distributes data to the basic processing circuits connected to it, it may send the target data in the processed matrix data block (in this example 1, 0.06, and 0.5) together with the identification data block associated with the matrix data block. In a specific implementation, the main processing circuit may distribute the target data in the processed matrix data block to the basic processing circuits according to a set rule, for example sending it in row order or in column order, which is not limited in this application. Correspondingly, after receiving the target data and the associated identification data block, the basic processing circuit restores the processed matrix data block according to the same set rule (for example, row order). In this example, the basic processing circuit can determine, from the received data (1, 0.06, and 0.5) and the identification data block, the matrix data block to which the data corresponds (that is, the matrix data block processed by the first mapping circuit in the main processing circuit).

In the embodiments of the present invention, the first input data may be a horizontal data block and/or a vertical data block.

Correspondingly, the second mapping circuit may use the identification data associated with the first input data to process second input data and thereby obtain processed second input data, where the first input data is different from the second input data. For example, when the first input data is at least one weight, the second input data may be at least one input neuron; or, when the first input data is at least one input neuron, the second input data may be at least one weight.

In the embodiments of the present invention, the second input data is different from the first input data, and the second input data may be any one of the following: a horizontal data block, a basic data block, a vertical data block, and a partial vertical data block.

For example, when the first input data is a horizontal data block, the second input data is a partial vertical data block. Suppose the second input data is a matrix data block; processing it with the mask matrix from the example above yields the processed partial vertical data block. Because the matrix data blocks involved in practical applications have large dimensions, the example here is only illustrative and does not constitute a limitation.
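The way the second mapping circuit reuses the mask of the first input data can be sketched as an element-wise selection. The function name `apply_mask` and all example values are hypothetical; they are not the matrices of the original example.

```python
def apply_mask(block, mask):
    """Keep block[i][j] where mask[i][j] is 1 and set it to 0 elsewhere."""
    return [
        [x if bit else 0 for x, bit in zip(row, mask_row)]
        for row, mask_row in zip(block, mask)
    ]


# Hypothetical mask obtained from the first input data (for example, weights).
first_mask = [[1, 0, 1],
              [0, 1, 0]]

# Hypothetical second input data (for example, input neurons).
second_input = [[0.3, 0.7, 0.2],
                [0.9, 0.4, 0.8]]

processed_second = apply_mask(second_input, first_mask)
```

Positions where the first input was discarded contribute nothing to a later inner product, so dropping the matching elements of the second input does not change that result.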

In the second embodiment:

The first mapping circuit may be used to process first input data and second input data to obtain processed first input data and first identification (mask) data associated with the first input data, as well as processed second input data and second identification (mask) data associated with the second input data. The first mask data or second mask data indicates whether the absolute value of the first or second input data is greater than a second threshold, which is set by the user or by the device, for example 0.05 or 0.

The processed first input data or second input data may be input data that has actually been processed, or it may be the input data as it was before processing. For example, the first input data is a horizontal data block, such as the matrix data block in the example above. Processing by the first mapping circuit yields a processed horizontal data block, which may be the original matrix data block or the compressed matrix data block. It should be understood that, to reduce the amount of data transmitted and to improve data processing efficiency in the basic processing circuits, the processed input data (such as a processed basic data block or partial vertical data block) is preferably the compressed data. Preferably, the data sent by the main processing circuit to the basic processing circuits is the target data in the processed input data; the target data may be data whose absolute value is greater than a preset threshold, or non-zero data, and so on.

Correspondingly, in the basic processing circuit, the second mapping circuit may obtain connection identification data from the first identification data associated with the first input data and the second identification data associated with the second input data. The connection identification data indicates the data for which the absolute values of both the first input data and the second input data are greater than a third threshold, where the third threshold is set by the user or by the device, for example 0.05 or 0. Further, the second mapping circuit may process the received first input data and second input data separately according to the connection identification data, thereby obtaining processed first input data and processed second input data.

For example, the first input data is a matrix data block, and the second input data is likewise a matrix data block. Processing by the first mapping circuit yields a first identification data block associated with the first input data and a processed first input data block, and correspondingly a second identification data block associated with the second input data and a processed second input data block. To improve the data transmission rate, the main processing circuit may send to the basic processing circuit only the target data 1, 0.06, and 0.5 in the processed first input data block, together with the first identification data block associated with the first input data block. At the same time, it sends the target data 1, 1.1, 0.6, 0.3, and 0.5 in the processed second input data block, together with the second identification data block associated with the second input data block.

Correspondingly, after receiving the above data, the basic processing circuit may use the second mapping circuit to perform an element-wise AND operation on the first identification data block and the second identification data block to obtain a connection identification data block. The second mapping circuit then uses the connection identification data block to process the processed first input data block and the processed second input data block separately, obtaining a further processed first input data block and a further processed second input data block. In the basic processing circuit, the first data block in which the target data is located (that is, the first data block as processed by the first mapping circuit) can be determined from the first identification data block and the received target data of the first data block. Correspondingly, the second data block in which the target data is located (that is, the second data block as processed by the first mapping circuit) can be determined from the second identification data block and the received target data of the second data block. Then, once the second mapping circuit has obtained the connection identification data block, it performs an element-wise AND operation between the connection identification data block and each of the determined first data block and the determined second data block, to obtain the first data block and the second data block as processed by the second mapping circuit.
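The second embodiment can be sketched as follows. The blocks below are hypothetical: they reuse the target values quoted in the text (1, 0.06, 0.5 and 1, 1.1, 0.6, 0.3, 0.5), but their 2x3 shape and the placement of the values are assumptions, since the original matrices are not available here. The helper names are likewise illustrative.

```python
def connection_mask(mask_a, mask_b):
    """Element-wise AND of two identification (mask) blocks."""
    return [[a & b for a, b in zip(ra, rb)] for ra, rb in zip(mask_a, mask_b)]


def apply_mask(block, mask):
    """Keep block[i][j] where mask[i][j] is 1 and set it to 0 elsewhere."""
    return [[x if bit else 0 for x, bit in zip(r, mr)]
            for r, mr in zip(block, mask)]


def inner_product(a, b):
    """Inner product of two equally shaped 2-D blocks."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Blocks as restored in the basic processing circuit (hypothetical layout).
first_block = [[1, 0, 0.06],
               [0, 0.5, 0]]
first_mask = [[1, 0, 1],
              [0, 1, 0]]
second_block = [[1, 1.1, 0],
                [0.6, 0.3, 0.5]]
second_mask = [[1, 1, 0],
               [1, 1, 1]]

conn = connection_mask(first_mask, second_mask)
first_final = apply_mask(first_block, conn)
second_final = apply_mask(second_block, conn)
```

Only the positions where both blocks hold a retained value survive, and in this sketch `inner_product(first_final, second_final)` gives the same value as `inner_product(first_block, second_block)`, which is why the reduction is safe for inner product operations.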

In the third embodiment:

No first mapping circuit is provided in the main processing circuit, but the main processing circuit may send third input data and pre-stored third identification data associated with the third input data to the basic processing circuits connected to it. A second mapping circuit is provided in the basic processing circuit. The specific embodiment of the data compression processing performed by the second mapping circuit is described below.

It should be understood that the third input data includes but is not limited to a basic data block, a partial vertical data block, a vertical data block, and the like. Likewise, in a neural network processor, the third input data may be at least one weight and/or at least one input neuron, which is not limited in this application.

The second mapping circuit may process the received third input data according to the third identification data associated with it, thereby obtaining processed third input data, so that related operations, such as an inner product operation, can subsequently be performed on the processed third input data.

For example, the third input data received by the second mapping circuit is a matrix data block, and a third identification data block (also called a mask matrix data block) associated with the third input data is pre-stored. The second mapping circuit processes the third input data block according to the third identification data block to obtain the processed third input data block.

In addition, the input neurons and output neurons mentioned in the embodiments of the present invention do not refer to the neurons in the input layer and the output layer of the entire neural network. Rather, for any two adjacent layers of the network, the neurons in the lower layer of the feedforward operation are the input neurons, and the neurons in the upper layer are the output neurons. Taking a convolutional neural network as an example, suppose the network has L layers and K = 1, 2, 3, …, L-1. For the K-th and (K+1)-th layers, the K-th layer is called the input layer and its neurons are the input neurons, while the (K+1)-th layer is called the output layer and its neurons are the output neurons. That is, every layer except the top layer can serve as an input layer, and the layer that follows it is the corresponding output layer.

In the fourth embodiment:

No mapping circuit is provided in the main processing circuit; the first mapping circuit and the second mapping circuit are both provided in the basic processing circuit. For the data processing performed by the first and second mapping circuits, refer to the first to third embodiments described above; details are not repeated here.

Optionally, there is also a fifth embodiment. In the fifth embodiment, no mapping circuit is provided in the basic processing circuit, and the first mapping circuit and the second mapping circuit are both provided in the main processing circuit. For the data processing performed by the first and second mapping circuits, refer to the first to third embodiments described above; details are not repeated here. That is, the main processing circuit completes the compression processing of the data and sends the processed input data to the basic processing circuit, so that the basic processing circuit performs the corresponding operations using the processed input data (specifically, processed neurons and processed weights).

The specific structure of the mapping circuits involved in this application is described below. FIGS. 4a and 4b show two possible mapping circuits. The mapping circuit shown in FIG. 4a includes comparators and selectors, the number of which is not limited in this application. FIG. 4a shows one comparator and two selectors. The comparator determines whether the input data satisfies a preset condition. The preset condition may be customized on the user side or the device side, for example the condition described above that the absolute value of the input data is greater than or equal to a preset threshold. If the preset condition is satisfied, the comparator determines that the input data may be output, and the identification data associated with that input data is 1; otherwise it determines that the input data is not output, or that the input data defaults to 0, and the identification data associated with that input data is 0. In other words, the identification data associated with the input data is known once the data has passed through the comparator.

Further, after the comparator has evaluated the preset condition on the input data, the resulting identification data may be fed to a selector, which uses it to decide whether to output the corresponding input data, thereby producing the processed input data.

As shown in FIG. 4a, taking a matrix data block as the input data, the comparator evaluates the preset condition on each data item in the matrix data block, yielding the identification data block (mask matrix) associated with that matrix data block. The first selector may then use the identification data block to filter the matrix data block, retaining the data whose absolute value is greater than or equal to the preset threshold (that is, the data satisfying the preset condition) and deleting the rest, so as to output the processed matrix data block. Optionally, the second selector may also use the identification data block to process other input data (for example, a second matrix data block), for instance by an element-wise AND operation, so that only the data of the second matrix data block at the positions marked by the identification data block is retained, and output the processed second matrix data block.
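The comparator and selector arrangement of FIG. 4a can be sketched in software as below. This is a hedged illustration, not the circuit itself: the function names, the threshold value 0.1 and the sample data are all assumptions, and "deleting" a value is modelled as writing 0.0 so that the block keeps its shape.

```python
def comparator(block, threshold):
    """Produce the mask: 1 where |value| >= threshold, otherwise 0."""
    return [[1 if abs(value) >= threshold else 0 for value in row]
            for row in block]


def selector(block, mask):
    """Pass a value through where the mask bit is 1; output 0.0 elsewhere.

    Assumption: removed values are represented as 0.0 instead of being
    physically dropped from the block.
    """
    return [[value if bit else 0.0 for value, bit in zip(row, mask_row)]
            for row, mask_row in zip(block, mask)]


# Hypothetical input block, threshold, and a second block of the same shape.
first_block = [[0.05, -1.5],
               [2.0, 0.01]]
second_block = [[3.0, 4.0],
                [5.0, 6.0]]

mask = comparator(first_block, threshold=0.1)
first_out = selector(first_block, mask)    # role of the first selector
second_out = selector(second_block, mask)  # role of the second selector
```

The same mask drives both selectors, which is why a mapping circuit with only selectors can operate on a mask computed elsewhere.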

It should be understood that, in the first and second embodiments described above, the specific structure of the first mapping circuit may include at least one comparator and at least one selector, for example the comparator and the first selector of FIG. 4a in the example above; the specific structure of the second mapping circuit may include one or more selectors, for example the second selector of FIG. 4a.

FIG. 4b is a schematic structural diagram of another mapping circuit. As shown in FIG. 4b, the mapping circuit includes selectors; their number is not limited and may be one or more. Specifically, the selector selects from the input data according to the identification data associated with that input data, outputting the data whose absolute value is greater than or equal to the preset threshold and deleting (not outputting) the rest, thereby obtaining the processed input data.

Taking a matrix data block as the input data, the matrix data block and its associated identification data block are input to the mapping circuit. The selector selects from the matrix data block according to the identification data block, outputting the data whose absolute value is greater than or equal to the preset threshold and withholding the rest, so as to output the processed matrix data block.

It should be understood that the structure shown in FIG. 4b may be applied to the second mapping circuit of the third embodiment; that is, the specific structure of the second mapping circuit in the third embodiment may include at least one selector. Similarly, the first and second mapping circuits provided in the main processing circuit and the basic processing circuit may be formed by cross-combining or splitting the functional components shown in FIGS. 4a and 4b, which is not limited in this application.

Based on the foregoing embodiments, the operations to be completed in the main processing circuit and the basic processing circuit are described in detail below. They may be carried out as follows:

The main processing circuit first enables the first mapping circuit to process the first input data, obtaining the processed first input data and the first identification data associated with the first input data; it then transmits the processed first input data and the associated first identification data to the basic processing circuit for computation. For example, the main processing circuit may process the data to be calculated (such as a horizontal data block or a vertical data block) before transmitting it to the basic processing circuit. This reduces the bit width of the transmitted data and the total number of bits transmitted, and the basic processing circuit operates on data of smaller bit width more efficiently and with lower power consumption.

The basic processing circuit enables the second mapping circuit to process the received second input data using the first identification data, obtaining the processed second input data, and then performs the related operations on the processed first input data and the processed second input data. For example, on receiving second input data transmitted by the main processing circuit (such as sparse data or a vertical data block), the basic processing circuit first compresses it and then computes on it, which improves operating efficiency and reduces power consumption.

Optionally, the main processing circuit may first transmit the first input data (such as a basic data block), the first identification data associated with the first input data, the second input data (such as a partial vertical data block), and the second identification data associated with the second input data to the basic processing circuit for computation.

Accordingly, after receiving the data, the basic processing circuit may first enable the second mapping circuit to obtain a connection identification data block from the first identification data and the second identification data, and then use that connection identification data to process the first input data and the second input data. The basic processing circuit can further complete the operations on the processed first input data and second input data. The benefit is a smaller amount of computation, higher operating efficiency, and lower power consumption.
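How a connection identification data block could be derived and used is sketched below. The sketch assumes that the connection identification data is the element-wise AND of the two masks and that "processing" keeps only the positions where the result is 1; the names `connection_mask` and `select` and the sample values are hypothetical.

```python
def connection_mask(first_mask, second_mask):
    """Element-wise AND of two identification (mask) vectors."""
    return [a & b for a, b in zip(first_mask, second_mask)]


def select(data, mask):
    """Keep only the values whose connection bit is 1."""
    return [value for value, bit in zip(data, mask) if bit]


# Hypothetical first/second input data with their identification data.
first_data = [1.0, 0.0, 3.0, 4.0]
first_mask = [1, 0, 1, 1]
second_data = [0.5, 2.0, 0.0, 2.0]
second_mask = [1, 1, 0, 1]

conn = connection_mask(first_mask, second_mask)
first_kept = select(first_data, conn)
second_kept = select(second_data, conn)

# Inner product over the retained pairs only: 2 multiplications instead of 4.
inner = sum(a * b for a, b in zip(first_kept, second_kept))
```

Because both inputs are filtered by the same connection mask, the retained values stay aligned position by position.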

Optionally, the first identification data associated with the first input data and the second identification data associated with the second input data sent by the main processing circuit may be stored in the main processing circuit in advance, or may be obtained by the main processing circuit enabling the first mapping circuit on the first/second input data; this is not limited in this application.

Method of using the basic processing circuits (as shown in FIG. 2a):

The main processing circuit receives the input data to be calculated from outside the device;

Optionally, the main processing circuit processes the data using its own operation circuits, such as the vector operation circuit, the inner product operation circuit, and the accumulator circuit;

The main processing circuit sends data through its data output interface to the basic processing circuit array (the set of all basic processing circuits is called the basic processing circuit array), as shown in FIG. 2b;

The data may be sent here by sending the same data directly to a subset of the basic processing circuits, that is, in a multiple-broadcast manner;

The data may also be sent here by sending different data to different basic processing circuits, that is, in a distribution manner;

The basic processing circuit array performs calculations on the data;

Each basic processing circuit performs its operation after receiving the input data;

Optionally, after receiving data, a basic processing circuit transmits that data out through its own data output interface (to other basic processing circuits that do not receive data directly from the main processing circuit);

Optionally, a basic processing circuit transmits its operation result (an intermediate or final calculation result) out through its data output interface;

The main processing circuit receives the output data returned from the basic processing circuit array;

Optionally, the main processing circuit continues to process the data received from the basic processing circuit array (for example, by an accumulation or activation operation);

When the main processing circuit has finished processing, it transmits the result through its data output interface to the outside of the device.

The circuit device is used to complete a tensor-by-tensor multiplication. A tensor here is the same as the data block described above and may be any one or a combination of a matrix, a vector, a three-dimensional data block, a four-dimensional data block, and a higher-dimensional data block. Specific implementations of matrix-by-vector and matrix-by-matrix multiplication are shown in FIGS. 2c and 2f respectively.

The circuit device is used to complete a matrix-by-vector multiplication (each row of the matrix is combined with the vector by an inner product, and the results are arranged into a vector in the order of the corresponding rows).

The following describes the multiplication of a matrix S with M rows and L columns by a vector C of length L, as shown in FIG. 2c.
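As a reference for the result the device is expected to produce, the matrix-by-vector product can be written as one inner product per row. The helper name `mat_vec` and the sample values are illustrative assumptions; the sketch ignores masks and the distribution across circuits.

```python
def mat_vec(S, C):
    """Multiply an M x L matrix S by a length-L vector C.

    Each output element is the inner product of one row of S with C,
    placed in the order of the corresponding rows.
    """
    if any(len(row) != len(C) for row in S):
        raise ValueError("every row of S must have the same length as C")
    return [sum(s * c for s, c in zip(row, C)) for row in S]


# Hypothetical example with M = 2 and L = 3.
S = [[1, 2, 3],
     [4, 5, 6]]
C = [1, 0, -1]

result = mat_vec(S, C)
```

The output has M elements, one per row of S.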

This method uses all or some of the basic processing circuits of the neural network computing device; assume that K basic processing circuits are used;

The main processing circuit sends the data in some or all of the rows of the matrix S to each of the K basic processing circuits;

In an optional solution, the control circuit of the main processing circuit sends the data of one row of the matrix S to a given basic processing circuit one number, or several numbers, at a time. (For example, when sending one number at a time to a given basic processing circuit, it sends the 1st number of row 3 the first time, the 2nd number of row 3 the second time, the 3rd number of row 3 the third time, and so on. When sending several numbers at a time, it sends the first two numbers of row 3 (the 1st and 2nd) the first time, the 3rd and 4th numbers of row 3 the second time, the 5th and 6th numbers of row 3 the third time, and so on;)

In an optional solution, the control circuit of the main processing circuit sends the data of several rows of the matrix S to a given basic processing circuit one number, or several numbers, per row at a time. (For example, for a given basic processing circuit, it sends the 1st number of each of rows 3, 4 and 5 the first time, the 2nd number of each of rows 3, 4 and 5 the second time, the 3rd number of each of rows 3, 4 and 5 the third time, and so on; or it sends the first two numbers of each of rows 3, 4 and 5 the first time, the 3rd and 4th numbers of each of those rows the second time, the 5th and 6th numbers of each of those rows the third time, and so on.)
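The two sending patterns just described, one number per row per transmission or several numbers per row per transmission, can be modelled as a simple schedule. The generator `send_schedule`, the dictionary layout and the values are assumptions made for illustration only.

```python
def send_schedule(rows, chunk):
    """Yield one transmission at a time.

    `rows` maps a row index of S to that row's data; each transmission
    carries `chunk` consecutive numbers from every listed row.
    """
    length = len(next(iter(rows.values())))
    for start in range(0, length, chunk):
        yield {index: data[start:start + chunk]
               for index, data in rows.items()}


# Hypothetical rows 3, 4 and 5 of S, each with four numbers.
rows = {3: [31, 32, 33, 34],
        4: [41, 42, 43, 44],
        5: [51, 52, 53, 54]}

one_at_a_time = list(send_schedule(rows, chunk=1))
two_at_a_time = list(send_schedule(rows, chunk=2))
```

Passing a dictionary with a single row reproduces the single-row variant of the scheme.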

The control circuit of the main processing circuit sends the data of the vector C, item by item, to the 0th basic processing circuit;

After receiving data of the vector C, the 0th basic processing circuit forwards that data to the next basic processing circuit connected to it, namely basic processing circuit 1;

Specifically, some basic processing circuits cannot obtain all the data required for the calculation directly from the main processing circuit. For example, basic processing circuit 1 in FIG. 2d has only one data input interface connected to the main processing circuit, so it can obtain only the data of the matrix S directly from the main processing circuit; the data of the vector C must be supplied to it by basic processing circuit 0. Likewise, after receiving the data of the vector C, basic processing circuit 1 must pass it on to basic processing circuit 2.
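The forwarding of the vector C along a chain of basic processing circuits can be sketched as below. The class `BasicCircuit` and its fields are hypothetical; the sketch assumes each circuit already holds one row of S received directly from the main processing circuit and accumulates a product as each element of C passes through.

```python
class BasicCircuit:
    """Toy model of one basic processing circuit in a chain."""

    def __init__(self, row):
        self.row = row          # data of S received from the main circuit
        self.accumulator = 0    # running inner product
        self.position = 0       # index of the next element of the row
        self.next_circuit = None

    def receive(self, c_value):
        # Use the incoming element of C, then forward it down the chain.
        self.accumulator += self.row[self.position] * c_value
        self.position += 1
        if self.next_circuit is not None:
            self.next_circuit.receive(c_value)


S = [[1, 2], [3, 4], [5, 6]]   # hypothetical 3 x 2 matrix
C = [10, 1]                    # hypothetical vector of length 2

circuits = [BasicCircuit(row) for row in S]
for current, following in zip(circuits, circuits[1:]):
    current.next_circuit = following

# The main circuit only talks to circuit 0 for the data of C.
for c_value in C:
    circuits[0].receive(c_value)

results = [circuit.accumulator for circuit in circuits]
```

Every circuit ends up seeing all of C even though only circuit 0 is fed by the main processing circuit.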

Each basic processing circuit performs operations on the data it receives, including but not limited to inner product, multiplication, and addition operations;

In an optional solution, the basic processing circuit computes the product of one or more pairs of data items at a time and accumulates the result in a register and/or an on-chip cache;

In an optional solution, the basic processing circuit computes the inner product of one or more pairs of vectors at a time and accumulates the result in a register and/or an on-chip cache;

After computing a result, the basic processing circuit transmits it out through its data output interface (that is, to the other basic processing circuits connected to it);

In an optional solution, the calculation result may be the final result or an intermediate result of the inner product operation;

After receiving a calculation result from another basic processing circuit, the basic processing circuit transmits that data to the other basic processing circuits, or to the main processing circuit, connected to it;

The main processing circuit receives the inner product results of the individual basic processing circuits and processes them to obtain the final result (the processing may be an accumulation operation, an activation operation, and so on).

An embodiment of the matrix-by-vector multiplication method implemented with the above computing device:

In an optional solution, the basic processing circuits used by the method are arranged as shown in FIG. 2d or FIG. 2e;

As shown in FIG. 2c, the main processing circuit may obtain the mask matrices (that is, the identification data / identification data blocks described above) corresponding to the matrix S and the vector C respectively. Specifically, these mask matrices may be stored in advance in the high-speed memory of the main processing circuit, or may be obtained by the main processing circuit enabling the first mapping circuit on the matrix S and the vector C respectively. The control circuit of the main processing circuit divides the M rows of the matrix S into K groups, with the i-th basic processing circuit responsible for the operations of the i-th group (the set of rows in that group is denoted Ai). Correspondingly, the control circuit also divides the M rows of the first mask matrix corresponding to the matrix S into K groups and sends each group, together with the corresponding sub-matrix formed by dividing the matrix S into K groups, to the corresponding basic processing circuit, where the operations on the related data are completed.

The M rows may be grouped here in any manner that does not assign a row more than once;

In an optional solution, the following assignment is used: row j is assigned to basic processing circuit j % K (where % is the remainder operation);

In an optional solution, when the rows cannot be divided evenly, a portion of the rows may first be distributed evenly and the remaining rows distributed in any manner.
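The remainder-based assignment of rows to circuits can be sketched as follows. The function name `assign_rows` and the sizes M = 7, K = 3 are illustrative assumptions; rows and circuits are numbered from 0.

```python
def assign_rows(M, K):
    """Assign row j of S to basic processing circuit j % K.

    Returns a list of K groups, each holding the row indices handled by
    one circuit. No row is assigned twice.
    """
    groups = [[] for _ in range(K)]
    for j in range(M):
        groups[j % K].append(j)
    return groups


# Hypothetical case where M is not a multiple of K.
groups = assign_rows(M=7, K=3)
```

When M is not a multiple of K, this scheme simply gives the first `M % K` circuits one extra row each.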

Each time, the control circuit of the main processing circuit sends the data in some or all of the rows of the matrix S to the corresponding basic processing circuit in sequence; correspondingly, the control circuit also sends the identification data in the first mask matrix that corresponds to those rows of the matrix S to the same basic processing circuit.

For example, if the matrix S is a 50*50 matrix data block, the main processing circuit may divide it into 10 small matrices of size 5*50 each. The main processing circuit may then send the first small matrix S0 (5 rows, 50 columns), together with the identification data block associated with S0 (5 rows, 50 columns), to the first basic processing circuit, which completes the operations on the related data.
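The 50*50 example can be sketched as a row-wise split in which every small matrix travels with its matching slice of the mask. The helper `split_rows`, the synthetic matrix contents, and the rule used to build the mask (non-zero means 1) are assumptions for illustration.

```python
def split_rows(matrix, rows_per_block):
    """Cut a matrix into consecutive blocks of `rows_per_block` rows."""
    return [matrix[start:start + rows_per_block]
            for start in range(0, len(matrix), rows_per_block)]


# Synthetic 50 x 50 matrix; the values themselves are arbitrary.
S = [[(r * 50 + c) % 7 for c in range(50)] for r in range(50)]

# Assumed mask rule for this sketch: 1 for a non-zero entry, 0 otherwise.
mask = [[1 if value != 0 else 0 for value in row] for row in S]

# Packet i = (small matrix Si, identification data block of Si).
packets = list(zip(split_rows(S, 5), split_rows(mask, 5)))

first_block, first_mask_block = packets[0]
```

Each packet is what one basic processing circuit would receive: a 5*50 data block and a 5*50 identification data block.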

In an optional solution, the control circuit of the main processing circuit sends to the i-th basic processing circuit, each time, one or more data items from one row of the i-th group of data Ai for which that circuit is responsible; the i-th group of data Ai may be data of the matrix S or data of the first mask matrix corresponding to the matrix S;

In an optional solution, the control circuit of the main processing circuit sends to the i-th basic processing circuit, each time, one or more data items from each of some or all of the rows of the i-th group of data Ai for which that circuit is responsible;

The control circuit of the main processing circuit sends the data of the vector C in sequence to the first basic processing circuit; correspondingly, it may also send the data of the second mask matrix associated with the vector C in sequence to the first basic processing circuit.

In an optional solution, the control circuit of the main processing circuit may send one or more data items of the vector C, or of the second mask matrix associated with the vector C, each time;

After receiving data of the vector C or of the second mask matrix, the i-th basic processing circuit may also forward it to the (i+1)-th basic processing circuit connected to it;

After receiving one or more data items from one or several rows of the matrix S and one or more data items from the vector C, each basic processing circuit performs operations (including but not limited to multiplication or addition);

In a specific implementation, each basic processing circuit receives data of the matrix S with the first identification data associated with that data in the first mask matrix, and data of the vector C with the second identification data associated with that data in the second mask matrix. It may first obtain connection identification data from the first identification data and the second identification data, and then use the connection identification data to decide whether to perform the related operation on the data of the matrix S and the data of the vector C. The connection identification data is obtained by an AND operation on the first identification data and the second identification data and may be 0 or 1. A value of 1 indicates that the data at a given position in the matrix S and the data at the same position in the vector C both have an absolute value greater than the preset threshold; a value of 0 indicates that the data at that position in the matrix S and/or the data at the same position in the vector C has an absolute value less than or equal to the preset threshold.

That is, each basic processing circuit enables the second mapping circuit to select, according to the first mask matrix of the matrix S and the second mask matrix of the vector C, the data of the matrix S and of the vector C at positions where both identification data are 1, and performs the related operations, such as multiplication and addition, on that data. In other words, the first mask matrix and the second mask matrix are used together to select the data at the same positions in the matrix S and the vector C whose absolute values are greater than the preset threshold, and the related operation, such as multiplication, is performed on them.

For example, a basic processing circuit receives the data of two rows of the matrix S as a matrix S0, together with the first mask matrix associated with S0, and receives several data items of the vector C as a vector C0, together with the second mask vector associated with C0 (the numerical values of these blocks are not reproduced in the source text). The basic processing circuit may then enable the second mapping circuit to perform an element-wise AND operation on the first mask matrix and the second mask vector to obtain a connection mask matrix, and use that connection mask matrix to process the received matrix S0 and vector C0, obtaining a processed matrix S0 and a processed vector C0 on which the basic processing circuit performs the related operations.
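A masked inner product of this kind can be sketched as below: a multiplication is performed only where the AND of the two identification bits is 1. The function `masked_dot` and all numerical values are hypothetical; they are not the values of the original example.

```python
def masked_dot(row, row_mask, vector, vector_mask):
    """Inner product that skips positions whose connection bit is 0.

    Returns the accumulated sum and the number of multiplications that
    were actually performed.
    """
    total = 0.0
    multiplications = 0
    for s, s_bit, c, c_bit in zip(row, row_mask, vector, vector_mask):
        if s_bit & c_bit:          # connection identification data
            total += s * c
            multiplications += 1
    return total, multiplications


# Hypothetical S0 (two rows of S), its mask, C0 and its mask.
S0 = [[1.0, 0.0, 2.0, 0.0],
      [0.0, 3.0, 0.0, 4.0]]
S0_mask = [[1, 0, 1, 0],
           [0, 1, 0, 1]]
C0 = [5.0, 6.0, 0.0, 7.0]
C0_mask = [1, 1, 0, 1]

results = [masked_dot(row, row_mask, C0, C0_mask)
           for row, row_mask in zip(S0, S0_mask)]
```

In this example three multiplications are performed in total instead of eight, which is the kind of saving the connection mask is meant to provide.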

In an optional solution, if the amount of data received by a basic processing circuit (specifically, data blocks to be calculated, such as the data of certain rows/columns of the matrix S or the vector C, together with the corresponding identification data in the mask matrices) exceeds a preset threshold, that basic processing circuit stops accepting new input data, for example the data of further rows/columns of the matrix S or the vector C and their corresponding identification data subsequently sent by the main processing circuit, until it again has sufficient cache/storage space, after which it resumes receiving data newly sent by the main processing circuit.

In an optional solution, the data received by the basic processing circuit may also be an intermediate result, which is stored in a register and/or an on-chip cache;

The basic processing circuit transmits its local calculation result to the next basic processing circuit, or to the main processing circuit, connected to it;

In an optional solution corresponding to the structure of FIG. 2d, only the output interface of the last basic processing circuit in each column is connected to the main processing circuit. In this case, only the last basic processing circuit can transmit its local calculation result directly to the main processing circuit. Every other basic processing circuit passes its calculation result to the next basic processing circuit, which passes it on in turn, until all results have reached the last basic processing circuit. The last basic processing circuit accumulates its local calculation result with the results received from the other basic processing circuits of the column to obtain an intermediate result and sends that intermediate result to the main processing circuit. Alternatively, the last basic processing circuit may send the results of the other basic processing circuits of the column and its own local result directly to the main processing circuit without accumulating them.
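The two options for the last circuit of a column, accumulate first or forward everything unchanged, can be sketched as follows. The function names and sample values are assumptions; the list order stands for the physical order of the circuits in the column.

```python
def pass_down_column(local_results):
    """Model hop-by-hop forwarding toward the last circuit of a column.

    Each circuit forwards everything it has received plus its own local
    result; the return value is what the last circuit ends up holding.
    """
    in_flight = []
    for local in local_results:      # circuit 0, 1, ..., last
        in_flight = in_flight + [local]
    return in_flight


def last_circuit_output(local_results, accumulate=True):
    """Data that the last circuit sends to the main processing circuit."""
    held = pass_down_column(local_results)
    return sum(held) if accumulate else held


# Hypothetical local results of a column of three basic processing circuits.
column = [1.5, 2.5, 3.0]

accumulated = last_circuit_output(column, accumulate=True)
forwarded = last_circuit_output(column, accumulate=False)
```

Accumulating in the last circuit sends a single value to the main processing circuit, while forwarding leaves the accumulation to the main processing circuit.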

In an optional solution corresponding to the structure of FIG. 2e, every basic processing circuit has an output interface connected to the main processing circuit. In this case, every basic processing circuit transmits its local calculation result directly to the main processing circuit;

After a basic processing circuit receives a calculation result passed on by another basic processing circuit, it transmits that result to the next basic processing circuit connected to it or to the main processing circuit.

The main processing circuit receives the results of the M inner product operations as the result of the matrix-by-vector operation.

Using the circuit device to complete a matrix-by-matrix operation;

The following describes the multiplication of a matrix S of M rows and L columns by a matrix P of L rows and N columns (each row of matrix S has the same length as each column of matrix P, as shown in FIG. 2f).

The method is described using the embodiment of the device shown in FIG. 1b;

The first mapping circuit of the main processing circuit obtains the identification (mask) matrices corresponding to matrix S and matrix P. For example, the first mapping circuit is activated to process matrix S and matrix P separately, so as to obtain a first mask matrix corresponding to matrix S and a second mask matrix corresponding to matrix P;
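As a minimal sketch of how such a mask matrix might be derived in software, the snippet below marks each element whose absolute value exceeds a threshold, following the 0/1 convention given for the identification data in this description. The function name `make_mask` and the threshold value are assumptions for illustration only.

```python
def make_mask(matrix, threshold=0.0):
    """Return a 0/1 mask: 1 where |x| > threshold, otherwise 0 (hypothetical helper)."""
    return [[1 if abs(x) > threshold else 0 for x in row] for row in matrix]


S = [[0.0, 2.5, -0.1],
     [1.2, 0.0, -3.0]]
first_mask = make_mask(S, threshold=0.5)
print(first_mask)  # [[0, 1, 0], [1, 0, 1]]
```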

The control circuit of the main processing circuit sends the data in some or all rows of matrix S to those basic processing circuits that are directly connected to the main processing circuit through a horizontal data input interface (for example, the gray-filled vertical data path at the top of FIG. 1b). At the same time, the control circuit also sends the identification data in the corresponding rows (some or all) of the first mask matrix to the basic processing circuits connected to it. For example, the control circuit sends the first two rows of data of matrix S, together with the first two rows of identification data in the first mask matrix that correspond to them, to the basic processing circuits connected to the main processing circuit.

In an optional solution, the control circuit of the main processing circuit sends the data of a given row of matrix S to a given basic processing circuit one number, or a portion of the numbers, at a time (for example, for a given basic processing circuit, the 1st transmission sends the 1st number of row 3, the 2nd transmission sends the 2nd number of row 3, the 3rd transmission sends the 3rd number of row 3, and so on; or the 1st transmission sends the first two numbers of row 3, the 2nd transmission sends the 3rd and 4th numbers of row 3, the 3rd transmission sends the 5th and 6th numbers of row 3, and so on);

Correspondingly, the control circuit also sends the identification data of the row of the first mask matrix that corresponds to that row of matrix S to a given basic processing circuit, one identification datum or a portion of the identification data at a time.

In an optional solution, the control circuit of the main processing circuit sends the data of several rows of matrix S, together with the identification data of the corresponding rows of the first mask matrix, to a given basic processing circuit, one number or a portion of the numbers from each row at a time (for example, for a given basic processing circuit, the 1st transmission sends the 1st number of each of rows 3, 4 and 5, the 2nd transmission sends the 2nd number of each of rows 3, 4 and 5, the 3rd transmission sends the 3rd number of each of rows 3, 4 and 5, and so on; or the 1st transmission sends the first two numbers of each of rows 3, 4 and 5, the 2nd transmission sends the 3rd and 4th numbers of each of rows 3, 4 and 5, the 3rd transmission sends the 5th and 6th numbers of each of rows 3, 4 and 5, and so on);
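The transmission schedules described above (one number per transmission, or several numbers per transmission, for one or several rows) can be modelled with a small generator. This is a sketch under assumptions: `stream_rows`, its parameters, and the tuple layout of each payload are hypothetical names chosen for illustration, and rows are indexed from 0.

```python
def stream_rows(matrix, mask, row_ids, chunk=1):
    """Yield one payload per transmission.

    Each payload holds, for every selected row, the next `chunk` values of
    that row and the matching identification (mask) data.
    """
    length = len(matrix[row_ids[0]])
    for start in range(0, length, chunk):
        yield [(r, matrix[r][start:start + chunk], mask[r][start:start + chunk])
               for r in row_ids]


S = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
mask_S = [[1, 1, 0, 1], [0, 1, 1, 1], [1, 0, 0, 1]]
for step, payload in enumerate(stream_rows(S, mask_S, row_ids=[0, 2], chunk=2), start=1):
    print(step, payload)
```

Setting `chunk=1` corresponds to sending one number per transmission; a single-element `row_ids` corresponds to streaming a single row.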

The control circuit of the main processing circuit sends the data in some or all columns of matrix P to those basic processing circuits that are directly connected to the main processing circuit through a vertical data input interface (for example, the gray-filled horizontal data paths on the left side of the basic processing circuit array in FIG. 1b). At the same time, the control circuit also sends the identification data in some or all rows of the second mask matrix corresponding to that data to the basic processing circuits connected to it. For example, the control circuit sends the first two rows of data of matrix P, together with the first two rows of identification data in the second mask matrix that correspond to them, to the basic processing circuits connected to the main processing circuit.

In an optional solution, the control circuit of the main processing circuit sends the data of a given column of matrix P to a given basic processing circuit one number, or a portion of the numbers, at a time (for example, for a given basic processing circuit, the 1st transmission sends the 1st number of column 3, the 2nd transmission sends the 2nd number of column 3, the 3rd transmission sends the 3rd number of column 3, and so on; or the 1st transmission sends the first two numbers of column 3, the 2nd transmission sends the 3rd and 4th numbers of column 3, the 3rd transmission sends the 5th and 6th numbers of column 3, and so on). Correspondingly, the control circuit also sends the identification data in the second mask matrix that corresponds to that column of matrix P to a given basic processing circuit, one identification datum or a portion of the identification data at a time.

In an optional solution, the control circuit of the main processing circuit sends the data of several columns of matrix P, together with the corresponding identification data in the second mask matrix, to a given basic processing circuit, one number or a portion of the numbers from each column at a time (for example, for a given basic processing circuit, the 1st transmission sends the 1st number of each of columns 3, 4 and 5, the 2nd transmission sends the 2nd number of each of columns 3, 4 and 5, the 3rd transmission sends the 3rd number of each of columns 3, 4 and 5, and so on; or the 1st transmission sends the first two numbers of each of columns 3, 4 and 5, the 2nd transmission sends the 3rd and 4th numbers of each of columns 3, 4 and 5, the 3rd transmission sends the 5th and 6th numbers of each of columns 3, 4 and 5, and so on);

After receiving the data of matrix S and the identification data of the first mask matrix associated with matrix S, the basic processing circuit transmits this data (specifically, the data of matrix S and the identification data in the first mask matrix corresponding to that data) through its horizontal data output interface to the next basic processing circuit connected to it (for example, the white-filled horizontal data paths in the middle of the basic processing circuit array in FIG. 1b). After receiving the data of matrix P, the basic processing circuit transmits that data through its vertical data output interface to the next basic processing circuit connected to it (for example, the white-filled vertical data paths in the middle of the basic processing circuit array in FIG. 1b);

Each basic processing circuit operates on the data it receives. Specifically, after a basic processing circuit has received the data of one or more rows of matrix S together with the associated first identification data from the first mask matrix, and the data of one or more columns of matrix P together with the associated second identification data from the second mask matrix, it may first obtain connection identification data from the first identification data and the second identification data, and then use the connection identification data to decide whether to perform the relevant operation on the data in matrix S and the data in matrix P. The connection identification data is obtained by performing an AND operation on the first identification data and the second identification data, and takes the value 0 or 1. A value of 1 indicates that the data at a given position in matrix S and the data at the same position in matrix P both have an absolute value greater than a preset threshold; conversely, 0 indicates that the data at that position in matrix S and/or the data at the same position in matrix P has an absolute value less than or equal to the preset threshold. For details, refer to the foregoing embodiments, which are not repeated here.

That is, each basic processing circuit activates its second mapping circuit to select, according to the first mask matrix of matrix S and the second mask matrix of matrix P, the data whose identification data is 1 at the same position in both, and performs the relevant operations on it, such as multiplication and addition.
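The selection step can be illustrated with a short software sketch: the connection identification data is the element-wise AND of the two masks, and only positions where it equals 1 contribute a multiplication. The function name `masked_inner_product` and the sample values are assumptions for illustration and do not describe the hardware implementation.

```python
def masked_inner_product(s_row, s_mask, p_col, p_mask):
    """Inner product of one row of S and one column of P, skipping masked-out pairs."""
    # Connection identification data: 1 only where both masks are 1.
    connection = [a & b for a, b in zip(s_mask, p_mask)]
    # Multiply-accumulate only where the connection identification data is 1.
    return sum(s * p for s, p, c in zip(s_row, p_col, connection) if c)


s_row, s_mask = [2, 0, 3, 4], [1, 0, 1, 1]
p_col, p_mask = [5, 6, 0, 1], [1, 1, 0, 1]
print(masked_inner_product(s_row, s_mask, p_col, p_mask))  # 2*5 + 4*1 = 14
```

In this example only two of the four element pairs are multiplied, which is the source of the reduced amount of calculation that the description attributes to the mapping circuits.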

In an optional solution, if the amount of data received by a basic processing circuit (specifically, data blocks to be calculated, such as the data of certain rows/columns of matrix S or matrix P and the corresponding identification data in the mask matrices) exceeds a preset threshold, the basic processing circuit stops accepting new input data, such as the data of further rows/columns of matrix S or matrix P subsequently sent by the main processing circuit and the corresponding identification data in the mask matrices, until it has sufficient cache/storage space, and then resumes receiving data newly sent by the main processing circuit.
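This flow-control behaviour resembles back-pressure on a bounded buffer. The sketch below is one possible software analogy, assuming that the preset threshold acts as a capacity limit; the class `InputBuffer` and its methods are hypothetical and are not taken from the disclosure.

```python
from collections import deque


class InputBuffer:
    """Bounded input buffer of a basic processing circuit (illustrative model)."""

    def __init__(self, capacity):
        self.capacity = capacity      # stands in for the preset threshold
        self.queue = deque()

    def try_receive(self, block):
        """Accept a block only if it fits; otherwise the sender must retry later."""
        if len(self.queue) + len(block) > self.capacity:
            return False
        self.queue.extend(block)
        return True

    def consume(self, count):
        """Free space by processing up to `count` buffered items."""
        for _ in range(min(count, len(self.queue))):
            self.queue.popleft()


buf = InputBuffer(capacity=4)
print(buf.try_receive([1, 2, 3]))  # True
print(buf.try_receive([4, 5]))     # False: would exceed the capacity
buf.consume(2)
print(buf.try_receive([4, 5]))     # True: space has been freed
```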

After the basic processing circuit has calculated a result, it may transmit the result out through its data output interface;

Specifically, if the basic processing circuit has an output interface directly connected to the main processing circuit, it transmits the result through that interface; if not, it outputs the result toward a basic processing circuit that can output directly to the main processing circuit (for example, in FIG. 1b, the basic processing circuits in the bottom row output their results directly to the main processing circuit, and the other basic processing circuits pass their operation results downward through their vertical output interfaces).

The result is output in a direction from which it can be output directly to the main processing circuit (for example, in FIG. 1b, the basic processing circuits in the bottom row output their results directly to the main processing circuit, and the other basic processing circuits pass their operation results downward through their vertical output interfaces);

Once the main processing circuit has received the inner product results from the basic processing circuits, it obtains the output result.

An embodiment of the "matrix-by-matrix" method:

The method uses a basic processing circuit array arranged as shown in FIG. 1b;

The first mapping circuit of the main processing circuit obtains the identification (mask) matrices corresponding to matrix S and matrix P. For example, the first mapping circuit is activated to process matrix S and matrix P separately, so as to obtain a first mask matrix corresponding to matrix S and a second mask matrix corresponding to matrix P; optionally, the processed matrix S and matrix P may also be obtained. Assuming the processed matrix S has h rows, the control circuit of the main processing circuit divides the h rows of data of matrix S into h groups, with the i-th basic processing circuit responsible for the operations of the i-th group (the set of rows in that group is denoted Hi). At the same time, the control circuit also sends the identification data in the corresponding rows (some or all) of the first mask matrix to the basic processing circuits connected to it. For example, the control circuit sends the first two rows of data of matrix S, together with the first two rows of identification data in the first mask matrix that correspond to them, to the basic processing circuits connected to the main processing circuit.

Here, the h rows of data may be grouped in any manner that does not assign a row more than once;

In an optional solution, the following allocation is used: the control circuit of the main processing circuit assigns the j-th row to the (j % h)-th basic processing circuit;
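The modulo allocation can be sketched as a round-robin assignment. In the snippet below, `num_rows` and `num_circuits` are hypothetical parameter names introduced to keep the number of rows and the number of groups distinct, and indices start at 0; each row is assigned exactly once, which satisfies the no-repeat condition stated above.

```python
def assign_rows(num_rows, num_circuits):
    """Assign row j to circuit j % num_circuits and return the row group of each circuit."""
    groups = [[] for _ in range(num_circuits)]
    for j in range(num_rows):
        groups[j % num_circuits].append(j)
    return groups


print(assign_rows(7, 3))  # [[0, 3, 6], [1, 4], [2, 5]]
```

The same helper applies unchanged to the allocation of the columns of matrix P.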

The control circuit of the main processing circuit divides the W columns of data of matrix P into w groups, with the i-th basic processing circuit responsible for the operations of the i-th group (the set of columns in that group is denoted Wi). Correspondingly, the control circuit also sends the identification data of the columns of the second mask matrix that correspond to those columns of matrix P to a given basic processing circuit, one identification datum or a portion of the identification data at a time.

Here, the W columns of data may be grouped in any manner that does not assign a column more than once;

In an optional solution, the following allocation is used: the control circuit of the main processing circuit assigns the j-th column to the (j % w)-th basic processing circuit;

In an optional solution, when the columns cannot be divided evenly, a portion of the columns may first be distributed evenly, and the remaining columns may be distributed in any manner.

The control circuit of the main processing circuit sends the data in some or all rows of matrix S to the first basic processing circuit of each row of the basic processing circuit array;

In an optional solution, the control circuit of the main processing circuit sends, each time, one or more data items from one row of the i-th group of data Hi to the first basic processing circuit of the i-th row of the basic processing circuit array, which is responsible for that group; the identification data in the mask matrix corresponding to the i-th group of data Hi may be sent to that first basic processing circuit in the same way;

In an optional solution, the control circuit of the main processing circuit sends, each time, one or more data items from each of some or all rows of the i-th group of data Hi to the first basic processing circuit of the i-th row of the basic processing circuit array, which is responsible for that group; the identification data in the mask matrix corresponding to the i-th group of data Hi may be sent to that first basic processing circuit in the same way;

The control circuit of the main processing circuit sends the data in some or all columns of matrix P to the first basic processing circuit of each column of the basic processing circuit array. At the same time, the control circuit also sends the identification data in some or all rows of the second mask matrix corresponding to that data to the basic processing circuits connected to it. For example, the control circuit sends the first two rows of data of matrix P, together with the first two rows of identification data in the second mask matrix that correspond to them, to the basic processing circuits connected to the main processing circuit.

In an optional solution, the control circuit of the main processing circuit sends, each time, one or more data items from one column of the i-th group of data Wi to the first basic processing circuit of the i-th column of the basic processing circuit array, which is responsible for that group;

In an optional solution, the control circuit of the main processing circuit sends, each time, one or more data items from each of some or all columns of the i-th group of data Wi to the first basic processing circuit of the i-th column of the basic processing circuit array, which is responsible for that group;

After receiving the data of matrix S, the basic processing circuit transmits that data through its horizontal data output interface to the next basic processing circuit connected to it (for example, the white-filled horizontal data paths in the middle of the basic processing circuit array in FIG. 1b). After receiving the data of matrix P, the basic processing circuit transmits that data through its vertical data output interface to the next basic processing circuit connected to it (for example, the white-filled vertical data paths in the middle of the basic processing circuit array in FIG. 1b);

Specifically, if the basic processing circuit has an output interface directly connected to the main processing circuit, it transmits the result through that interface; if not, it outputs the result toward a basic processing circuit that can output directly to the main processing circuit (for example, the basic processing circuits in the bottom row output their results directly to the main processing circuit, and the other basic processing circuits pass their operation results downward through their vertical output interfaces).

The result is output in a direction from which it can be output directly to the main processing circuit (for example, the basic processing circuits in the bottom row output their results directly to the main processing circuit, and the other basic processing circuits pass their operation results downward through their vertical output interfaces);

The terms "horizontal" and "vertical" used in the above description serve only to describe the example shown in FIG. 1b; in practice, it is only necessary that the "horizontal" and "vertical" interfaces of each unit denote two different interfaces.

Using the circuit device to complete a fully connected operation:

If the input data of the fully connected layer is a vector (that is, the input of the neural network is a single sample), the weight matrix of the fully connected layer is taken as matrix S and the input vector as vector C, and the operation is performed according to the matrix-by-vector method of the device;

If the input data of the fully connected layer is a matrix (that is, the input of the neural network is multiple samples), the weight matrix of the fully connected layer is taken as matrix S and the input vectors as matrix P, or the weight matrix of the fully connected layer is taken as matrix P and the input vectors as matrix S, and the operation is performed according to the matrix-by-matrix method of the device;
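The mapping of a fully connected layer onto a matrix-by-matrix product can be illustrated with a plain reference computation. The sketch assumes that each column of the input matrix holds one sample and ignores the mask-based skipping; the function name `matmul` and the sample values are illustrative only.

```python
def matmul(S, P):
    """Reference product of S (M x L) and P (L x N) using nested inner products."""
    inner = len(P)
    return [[sum(S[i][k] * P[k][j] for k in range(inner))
             for j in range(len(P[0]))]
            for i in range(len(S))]


weights = [[1, 0, 2],      # weight matrix as matrix S: 2 outputs x 3 inputs
           [0, 3, 1]]
samples = [[1, 4],         # input matrix as matrix P: 3 inputs x 2 samples
           [2, 5],
           [3, 6]]
print(matmul(weights, samples))  # [[7, 16], [9, 21]]
```

Each entry of the result is one inner product, which corresponds to the unit of work that the description assigns to a basic processing circuit.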

Using the circuit device to complete a convolution operation:

The convolution operation is described below. In the following figures, one square represents one data item. The input data is shown in FIG. 3a (N samples, each with C channels, the feature map of each channel having height H and width W), and the weights, that is, the convolution kernels, are shown in FIG. 3b (M convolution kernels, each with C channels, with height KH and width KW). The rules of the convolution operation are the same for all N samples of the input data, so the following explains the convolution on a single sample. On one sample, each of the M convolution kernels performs the same operation; each kernel produces one planar feature map, so the M kernels together produce M planar feature maps (for one sample, the output of the convolution is M feature maps). Each convolution kernel performs an inner product operation at each planar position of the sample, sliding along the H and W directions. For example, FIG. 3c shows a convolution kernel performing an inner product operation at the lower-right position of one sample of the input data; FIG. 3d shows the convolution position slid one cell to the left, and FIG. 3e shows the convolution position slid one cell upward.
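The sliding inner product on a single sample can be sketched as follows. The snippet assumes a stride of 1 and no padding, neither of which is specified in this passage, and uses nested Python lists indexed as `x[c][h][w]` and `kernels[m][c][kh][kw]`; the function name `conv_single_sample` is hypothetical.

```python
def conv_single_sample(x, kernels):
    """Convolve one sample with M kernels; returns M planar feature maps.

    Assumptions for illustration: stride 1, no padding.
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    feature_maps = []
    for kernel in kernels:
        KH, KW = len(kernel[0]), len(kernel[0][0])
        fmap = [[sum(x[c][i + di][j + dj] * kernel[c][di][dj]
                     for c in range(C)
                     for di in range(KH)
                     for dj in range(KW))      # one inner product per position
                 for j in range(W - KW + 1)]
                for i in range(H - KH + 1)]
        feature_maps.append(fmap)
    return feature_maps


sample = [[[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]]]                 # C = 1, H = W = 3
kernels = [[[[1, 0],
             [0, 1]]]]                 # M = 1, C = 1, KH = KW = 2
print(conv_single_sample(sample, kernels))  # [[[6, 8], [12, 14]]]
```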

The first mapping circuit of the main processing circuit may process the data in some or all of the convolution kernels of the weights to obtain the corresponding mask data and the processed weight data (that is, the data in some or all of the convolution kernels of the processed weights).

The control circuit of the main processing circuit sends the data in some or all of the convolution kernels of the weights (this data may be the original weight data or the processed weight data) to those basic processing circuits that are directly connected to the main processing circuit through a horizontal data input interface (for example, the gray-filled vertical data path at the top of FIG. 1b); at the same time, the control circuit also sends the mask data associated with that data to the basic processing circuits connected to the main processing circuit;

In an optional solution, the control circuit of the main processing circuit sends the data of a given convolution kernel of the weights to a given basic processing circuit one number, or a portion of the numbers, at a time (for example, for a given basic processing circuit, the 1st transmission sends the 1st number of row 3, the 2nd transmission sends the 2nd number of row 3, the 3rd transmission sends the 3rd number of row 3, and so on; or the 1st transmission sends the first two numbers of row 3, the 2nd transmission sends the 3rd and 4th numbers of row 3, the 3rd transmission sends the 5th and 6th numbers of row 3, and so on); at the same time, the control circuit sends the mask data corresponding to that convolution kernel to that basic processing circuit in the same way, one number or a portion of the data at a time;

In another optional solution, the control circuit of the main processing circuit sends the data of several convolution kernels of the weights to a given basic processing circuit, one number or a portion of the numbers from each at a time (for example, for a given basic processing circuit, the 1st transmission sends the 1st number of each of rows 3, 4 and 5, the 2nd transmission sends the 2nd number of each of rows 3, 4 and 5, the 3rd transmission sends the 3rd number of each of rows 3, 4 and 5, and so on; or the 1st transmission sends the first two numbers of each of rows 3, 4 and 5, the 2nd transmission sends the 3rd and 4th numbers of each of rows 3, 4 and 5, the 3rd transmission sends the 5th and 6th numbers of each of rows 3, 4 and 5, and so on); correspondingly, the control circuit sends the mask data associated with those convolution kernels to that basic processing circuit in the same way, one number or a portion of the data at a time;

The control circuit of the main processing circuit divides the input data according to convolution position, and sends the data at some or all of the convolution positions of the input data to those basic processing circuits that are directly connected to the main processing circuit through a vertical data input interface (for example, the gray-filled horizontal data paths on the left side of the basic processing circuit array in FIG. 1b). Correspondingly, the control circuit also divides the mask data associated with the input data according to convolution position, and sends the mask data corresponding to the data at some or all of the convolution positions of the input data, together with that data, to the basic processing circuits electrically connected to the main processing circuit;

In an optional solution, the control circuit of the main processing circuit sends the data at a given convolution position of the input data, together with the mask data associated with that data, to a given basic processing circuit one number, or a portion of the numbers, at a time (for example, for a given basic processing circuit, the 1st transmission sends the 1st number of column 3, the 2nd transmission sends the 2nd number of column 3, the 3rd transmission sends the 3rd number of column 3, and so on; or the 1st transmission sends the first two numbers of column 3, the 2nd transmission sends the 3rd and 4th numbers of column 3, the 3rd transmission sends the 5th and 6th numbers of column 3, and so on);

In another optional solution, the control circuit of the main processing circuit sends the data at several convolution positions of the input data, together with the mask data associated with that data, to a given basic processing circuit, one number or a portion of the numbers from each at a time (for example, for a given basic processing circuit, the 1st transmission sends the 1st number of each of columns 3, 4 and 5, the 2nd transmission sends the 2nd number of each of columns 3, 4 and 5, the 3rd transmission sends the 3rd number of each of columns 3, 4 and 5, and so on; or the 1st transmission sends the first two numbers of each of columns 3, 4 and 5, the 2nd transmission sends the 3rd and 4th numbers of each of columns 3, 4 and 5, the 3rd transmission sends the 5th and 6th numbers of each of columns 3, 4 and 5, and so on);

After receiving the data of the weights (specifically, the data of the convolution kernels in the weights, referred to as weight data, or the mask data associated with that weight data), the basic processing circuit transmits that data through its horizontal data output interface to the next basic processing circuit connected to it (for example, the white-filled horizontal data paths in the middle of the basic processing circuit array in FIG. 1b). After receiving the data of the input data (this data may be the input data sent by the main processing circuit and the identification mask data associated with that input data), the basic processing circuit transmits that data through its vertical data output interface to the next basic processing circuit connected to it (for example, the white-filled vertical data paths in the middle of the basic processing circuit array in FIG. 1b);

Specifically, the control circuit of the main processing circuit may send the input data together with the mask data associated with that input data to the basic processing circuit, and the basic processing circuit receives the input data and the mask data associated with it;

Each basic processing circuit performs operations on the data it receives. Specifically, the basic processing circuit may enable the second mapping circuit to obtain connection identification data from the mask data associated with the input data and the mask data associated with the weight data (that is, the mask data associated with the convolution kernel in the weights), and then use the connection identification data to select, from the input data and the weight data, the data whose absolute value is greater than a preset threshold for multiplication.
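The mask-based selection can be sketched as follows. The sketch assumes dense (uncompressed) storage, so that masks and values share the same indices, and assumes the connection identification data is the element-wise AND of the two masks, as the claims describe. The helper names `make_mask`, `connection_mask`, and `masked_multiply_accumulate` are hypothetical.

```python
from typing import List, Sequence


def make_mask(values: Sequence[float], threshold: float = 0.0) -> List[int]:
    """Mask bit is 1 where |value| > threshold, else 0."""
    return [1 if abs(v) > threshold else 0 for v in values]


def connection_mask(input_mask: Sequence[int], weight_mask: Sequence[int]) -> List[int]:
    """Element-wise AND of the two masks (the connection identification data)."""
    return [a & b for a, b in zip(input_mask, weight_mask)]


def masked_multiply_accumulate(
    inputs: Sequence[float],
    weights: Sequence[float],
    input_mask: Sequence[int],
    weight_mask: Sequence[int],
) -> float:
    """Multiply only the positions where both operands pass the threshold."""
    conn = connection_mask(input_mask, weight_mask)
    return sum(x * w for x, w, c in zip(inputs, weights, conn) if c)


inputs = [0.5, 0.0, 2.0, -3.0]
weights = [1.0, 4.0, 0.0, 2.0]
in_mask = make_mask(inputs)      # [1, 0, 1, 1]
w_mask = make_mask(weights)      # [1, 1, 0, 1]
conn = connection_mask(in_mask, w_mask)
total = masked_multiply_accumulate(inputs, weights, in_mask, w_mask)
```

Only two of the four positions survive the AND, so two multiplications are performed instead of four; skipping such positions is the source of the reduced calculation the disclosure refers to.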

In an optional solution, if the amount of data received by a basic processing circuit (specifically, data blocks to be calculated, such as the data of a convolution kernel in the weights and the mask data associated with that data, or the input data and the mask data associated with the input data) exceeds a preset threshold, the basic processing circuit stops receiving new input data, such as the data of several convolution kernels in the weights that the main processing circuit would send subsequently and the mask data associated with that data. Once the basic processing circuit again has sufficient buffer/storage space, it resumes receiving the data newly sent by the main processing circuit.
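This flow control can be sketched as a bounded buffer that refuses new blocks while it is full. The class name `BasicCircuitBuffer`, the capacity counted in blocks, and the boolean return value are assumptions made for illustration; the disclosure does not specify how the threshold is measured or signalled.

```python
from collections import deque
from typing import Any, Deque


class BasicCircuitBuffer:
    """Hypothetical receive buffer of one basic processing circuit."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity          # preset threshold, counted in blocks
        self.blocks: Deque[Any] = deque()

    def receive(self, block: Any) -> bool:
        """Accept the block if space remains; otherwise refuse it."""
        if len(self.blocks) >= self.capacity:
            return False                  # sender must retry later
        self.blocks.append(block)
        return True

    def consume(self) -> Any:
        """Process (remove) the oldest block, freeing buffer space."""
        return self.blocks.popleft()


buf = BasicCircuitBuffer(capacity=2)
accepted = [buf.receive("kernel-0"), buf.receive("kernel-1"), buf.receive("kernel-2")]
first = buf.consume()                     # frees one slot
retry_ok = buf.receive("kernel-2")        # now accepted
```

The third block is refused until one block has been consumed, which mirrors the behaviour of pausing reception until enough buffer space is available.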

In an optional solution, the basic processing circuit computes the multiplication of one or more pairs of data at a time and then accumulates the results into a register and/or an on-chip buffer.

In an optional solution, the basic processing circuit computes the inner product of one or more pairs of vectors at a time and then accumulates the results into a register and/or an on-chip buffer.
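The two accumulation options give the same final value, which the following sketch illustrates by splitting a long inner product into segments and accumulating the partial results. The accumulator variable stands in for the register or on-chip buffer, and the function names and segment length are hypothetical.

```python
from typing import Sequence


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def accumulate_segments(a: Sequence[float], b: Sequence[float], segment: int) -> float:
    """Compute the inner product one segment at a time, accumulating as we go."""
    acc = 0.0  # stands in for the register and/or on-chip buffer
    for start in range(0, len(a), segment):
        acc += inner_product(a[start:start + segment], b[start:start + segment])
    return acc


a = [1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0]
by_scalars = accumulate_segments(a, b, segment=1)   # one multiplication per step
by_vectors = accumulate_segments(a, b, segment=2)   # one short inner product per step
```

With `segment=1` each step is a single multiplication, and with a larger segment each step is an inner product of two short vectors; both accumulate to the full inner product.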

In one embodiment, the present invention discloses a neural network operation device, which includes functional units for performing all or part of the implementations provided in the method embodiments described above.

In one embodiment, the present invention discloses a chip (as shown in FIG. 4) for performing all or part of the implementations provided in the method embodiments described above.

In one embodiment, the present invention discloses an electronic device, which includes functional units for performing all or part of the implementations in the method embodiments described above.

The electronic device includes a data processing device, robot, computer, printer, scanner, tablet computer, smart terminal, mobile phone, driving recorder, navigator, sensor, webcam, server, camera, video camera, projector, watch, earphones, mobile storage, wearable device, vehicle, household appliance, and/or medical device.

The vehicle includes an aircraft, a ship, and/or a motor vehicle; the household appliance includes a television, air conditioner, microwave oven, refrigerator, rice cooker, humidifier, washing machine, electric lamp, gas stove, and range hood; the medical device includes a nuclear magnetic resonance instrument, a B-mode ultrasound scanner, and/or an electrocardiograph.

The specific embodiments described above explain the purpose, technical solutions, and beneficial effects of the present disclosure in further detail. It should be understood that the above are only specific embodiments of the present disclosure and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present disclosure shall fall within its scope of protection.

S, P‧‧‧matrix

C‧‧‧vector

FIG. 1a is a schematic structural diagram of an integrated circuit chip device.

FIG. 1b is a schematic structural diagram of another integrated circuit chip device.

FIG. 1c is a schematic structural diagram of a basic processing circuit.

FIG. 1d is a schematic structural diagram of a main processing circuit.

FIG. 2a is a schematic diagram of a method of using a basic processing circuit.

FIG. 2b is a schematic diagram of data transmission by a main processing circuit.

FIG. 2c is a schematic diagram of a matrix multiplied by a vector.

FIG. 2d is a schematic structural diagram of an integrated circuit chip device.

FIG. 2e is a schematic structural diagram of yet another integrated circuit chip device.

FIG. 2f is a schematic diagram of a matrix multiplied by a matrix.

FIG. 3a is a schematic diagram of convolution input data.

FIG. 3b is a schematic diagram of a convolution kernel.

FIG. 3c is a schematic diagram of an operation window of a three-dimensional data block of the input data.

FIG. 3d is a schematic diagram of another operation window of a three-dimensional data block of the input data.

FIG. 3e is a schematic diagram of yet another operation window of a three-dimensional data block of the input data.

FIGS. 4a to 5b are schematic structural diagrams of two mapping circuits provided by embodiments of the present application.

Claims

An integrated circuit chip device, wherein the integrated circuit chip device includes a main processing circuit and a plurality of basic processing circuits; the main processing circuit includes a first mapping circuit, at least one of the plurality of basic processing circuits includes a second mapping circuit, and the first mapping circuit and the second mapping circuit are both used to perform compression processing on the data in a neural network operation; the plurality of basic processing circuits are distributed in an array; each basic processing circuit is connected to the other adjacent basic processing circuits, and the main processing circuit is connected to the n basic processing circuits in the 1st row, the n basic processing circuits in the m-th row, and the m basic processing circuits in the 1st column; the main processing circuit is configured to perform the successive operations in the neural network operation and to transmit data to the basic processing circuits connected to it; the plurality of basic processing circuits are configured to perform the operations in the neural network in parallel according to the transmitted data, and to transmit the operation results to the main processing circuit through the basic processing circuits connected to the main processing circuit.

The integrated circuit chip device according to claim 1, wherein the main processing circuit is configured to: obtain a data block to be calculated and an operation instruction, and divide the data block to be calculated into a horizontal data block and a vertical data block according to the operation instruction, the horizontal data block being a data block distributed in the horizontal direction to the basic processing circuits connected to the main processing circuit, and the vertical data block being a data block distributed in the vertical direction to the basic processing circuits connected to the main processing circuit; start the first mapping circuit to process the horizontal data block and the vertical data block to obtain a processed horizontal data block and an identification data block associated with the horizontal data block, and a processed vertical data block and an identification data block associated with the vertical data block; split the processed horizontal data block and the identification data block associated with the horizontal data block to obtain a plurality of basic data blocks and identification data blocks respectively associated with the basic data blocks; distribute the plurality of basic data blocks and the identification data blocks respectively associated with them to the basic processing circuits connected to it; and broadcast the processed vertical data block and the identification data block associated with the vertical data block to the basic processing circuits connected to it; the basic processing circuit is configured to: start the second mapping circuit to obtain a connection identification data block according to the identification data block associated with the vertical data block and the identification data block associated with the basic data block; process the vertical data block and the basic data block according to the connection identification data block to obtain a processed vertical data block and a processed basic data block; perform an inner product operation on the processed vertical data block and the processed basic data block to obtain an operation result; and send the operation result to the main processing circuit; the main processing circuit is configured to process the operation result to obtain the instruction result of the data block to be calculated and the operation instruction.

The integrated circuit chip device according to claim 1, wherein the main processing circuit is configured to: obtain a data block to be calculated and an operation instruction, and divide the data block to be calculated into a horizontal data block and a vertical data block according to the operation instruction; start the first mapping circuit to process the horizontal data block to obtain a processed horizontal data block and an identification data block associated with the horizontal data block, or start the first mapping circuit to process the horizontal data block according to a pre-stored identification data block associated with the horizontal data block to obtain a processed horizontal data block; split the processed horizontal data block and the identification data block associated with the horizontal data block to obtain a plurality of basic data blocks and identification data blocks respectively associated with the basic data blocks; distribute the plurality of basic data blocks and the identification data blocks respectively associated with them to the basic processing circuits connected to it; and broadcast the vertical data block to the basic processing circuits connected to it; the basic processing circuit is configured to: start the second mapping circuit to process the vertical data block according to the identification data block associated with the basic data block to obtain a processed vertical data block; perform an inner product operation on the processed vertical data block and the processed basic data block to obtain an operation result; and send the operation result to the main processing circuit; the main processing circuit is configured to process the operation result to obtain the instruction result of the data block to be calculated and the operation instruction.

The integrated circuit chip device according to claim 1, wherein the main processing circuit is configured to: obtain a data block to be calculated and an operation instruction, and divide the data block to be calculated into a horizontal data block and a vertical data block according to the operation instruction; start the first mapping circuit to process the vertical data block to obtain a processed vertical data block and an identification data block associated with the vertical data block, or start the first mapping circuit to process the vertical data block according to a pre-stored identification data block associated with the vertical data block to obtain a processed vertical data block; split the horizontal data block to obtain a plurality of basic data blocks; distribute the plurality of basic data blocks to the basic processing circuits connected to it; and broadcast the processed vertical data block and the identification data block associated with the vertical data block to the basic processing circuits connected to it; the basic processing circuit is configured to: start the second mapping circuit to process the basic data block according to the identification data block associated with the vertical data block to obtain a processed basic data block; perform an inner product operation on the processed vertical data block and the processed basic data block to obtain an operation result; and send the operation result to the main processing circuit; the main processing circuit is configured to process the operation result to obtain the instruction result of the data block to be calculated and the operation instruction.

The integrated circuit chip device according to any one of claims 1 to 4, wherein the data block to be calculated includes at least one weight and/or at least one input neuron.

The integrated circuit chip device according to claim 5, wherein the identification data block is a matrix data block composed of 0s and 1s, where 0 indicates that the absolute value of the weight or the input neuron is less than or equal to a first threshold, and 1 indicates that the absolute value of the weight or the input neuron is greater than the first threshold.

The integrated circuit chip device according to claim 6, wherein the connection identification data block is obtained by performing an element-wise AND operation on the identification data block associated with the vertical data block and the identification data block associated with the basic data block.

The integrated circuit chip device according to any one of claims 2 to 4, wherein the basic processing circuit is specifically configured to perform inner product processing on the basic data block and the vertical data block to obtain an inner product processing result, accumulate the inner product processing result to obtain an operation result, and send the operation result to the main processing circuit; the main processing circuit is configured to, when the operation result is a result of inner product processing, accumulate the operation result to obtain an accumulation result, and arrange the accumulation result to obtain the instruction result of the data block to be calculated and the operation instruction.

The integrated circuit chip device according to any one of claims 2 to 4, wherein the main processing circuit is specifically configured to divide the vertical data block into a plurality of partial vertical data blocks and to broadcast the plurality of partial vertical data blocks to the basic processing circuit over multiple broadcasts, the plurality of partial vertical data blocks combining to form the vertical data block; the basic processing circuit is specifically configured to start the second mapping circuit to process the partial vertical data block according to the identification data block associated with the basic data block to obtain a processed partial vertical data block, and to perform an inner product operation on the basic data block and the processed partial vertical data block.

The integrated circuit chip device according to any one of claims 2 to 4, wherein the main processing circuit is specifically configured to divide the processed vertical data block and the identification data block associated with the vertical data block into a plurality of partial vertical data blocks and identification data blocks respectively associated with the partial vertical data blocks, and to broadcast the plurality of partial vertical data blocks and the identification data blocks respectively associated with them to the basic processing circuit over multiple broadcasts, the plurality of partial vertical data blocks combining to form the vertical data block; the basic processing circuit is specifically configured to: start the second mapping circuit to obtain a connection identification data block according to the identification data block associated with the partial vertical data block and the identification data block associated with the basic data block; process the partial vertical data block and the basic data block according to the connection identification data block to obtain a processed partial vertical data block and a processed basic data block; and perform an inner product operation on the processed partial vertical data block and the processed basic data block; alternatively, the basic processing circuit is specifically configured to start the second mapping circuit to process the basic data block according to the identification data block associated with the partial vertical data block to obtain a processed basic data block, and to perform an inner product operation on the processed basic data block and the partial vertical data block.

The integrated circuit chip device according to any one of claims 2 to 4, wherein the basic processing circuit is specifically configured to perform inner product processing once on the partial vertical data block and the basic data block to obtain an inner product processing result, accumulate the inner product processing result to obtain a partial operation result, and send the partial operation result to the main processing circuit; or the basic processing circuit is specifically configured to reuse the partial vertical data block n times, performing inner product operations of the partial vertical data block with n basic data blocks to obtain n partial processing results, accumulate the n partial processing results respectively to obtain n partial operation results, and send the n partial operation results to the main processing circuit, where n is an integer greater than or equal to 2.

A chip, wherein the chip integrates the device according to any one of claims 1 to 11.

An electronic device, wherein the electronic device includes the chip according to claim 12.

A neural network operation method, wherein the method is applied in an integrated circuit chip device, the integrated circuit chip device includes the integrated circuit chip device according to any one of claims 1 to 11, and the integrated circuit chip device is used to perform operations of a neural network; wherein the operations of the neural network include one or any combination of: a convolution operation, a matrix-multiply-matrix operation, a matrix-multiply-vector operation, a bias operation, a fully connected operation, a general matrix multiplication (GEMM) operation, a general matrix-vector multiplication (GEMV) operation, and an activation operation.