TWI740761B - Data processing apparatus, artificial intelligence chip - Google Patents

Publication number
TWI740761B
Authority
TW
Taiwan
Prior art keywords
selection unit
unit
data processing
input
processing device
Prior art date
Application number
TW109146826A
Other languages
Chinese (zh)
Other versions
TW202129553A (en
Inventor
孫海濤
王文強
胡英俊
蔣科
Inventor has waived the right to be named
Original Assignee
大陸商上海商湯智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 大陸商上海商湯智能科技有限公司
Publication of TW202129553A
Application granted
Publication of TWI740761B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation

Abstract

The present disclosure provides a data processing apparatus and an artificial intelligence chip. The apparatus includes a first selection unit and a second selection unit, each provided with multiple input terminals and multiple output terminals, and multiple arithmetic units. The input terminals of the first selection unit are configurably connected to its output terminals. At least a part of the output terminals of the first selection unit are connected to input terminals of the arithmetic units. Output terminals of the arithmetic units are connected to the input terminals of the second selection unit. The input terminals of the second selection unit are configurably connected to its output terminals. At least a part of the output terminals of the second selection unit are connected to the input terminals of the first selection unit and/or to a data output port of the apparatus. These connections enable the arithmetic units to form different arithmetic paths.

Description

Data processing device, artificial intelligence chip

The present disclosure relates to the field of data processing technology, and in particular to a data processing device and an artificial intelligence chip.

Various application scenarios (for example, neural network applications) involve a wide range of operations. These include basic simple functions such as addition, subtraction, multiplication, and division, as well as a large number of unconventional complex operations. The forms of complex operations differ from one application scenario to another, and new combinations of complex operations keep emerging. As the number of operation types and the amount of computation grow, the area and power consumption of the data processing device that performs the operations grow correspondingly.

The present disclosure provides a data processing device and an artificial intelligence chip.

According to a first aspect of the embodiments of the present disclosure, a data processing device is provided. The device includes: a first selection unit having multiple input terminals and multiple output terminals, a second selection unit having multiple input terminals and multiple output terminals, and multiple arithmetic units. The multiple input terminals of the first selection unit are configurably connected to the multiple output terminals of the first selection unit, and at least a part of the multiple output terminals of the first selection unit are connected to the input terminals of the multiple arithmetic units. The output terminals of the multiple arithmetic units are connected to the multiple input terminals of the second selection unit. The multiple input terminals of the second selection unit are configurably connected to the multiple output terminals of the second selection unit, and at least a part of the multiple output terminals of the second selection unit are connected to the multiple input terminals of the first selection unit and/or to a data output port of the data processing device, so that the multiple arithmetic units form different arithmetic paths.

In some embodiments, the device further includes a first register for storing first configuration information. The first configuration information is used to configure the connection relationship between the multiple input terminals and the multiple output terminals of the first selection unit, and/or the connection relationship between the multiple input terminals and the multiple output terminals of the second selection unit.

In some embodiments, the device further includes a second register for storing second configuration information. The second configuration information is used to configure the operation type of at least a part of the multiple arithmetic units.

In some embodiments, the multiple input terminals of the first selection unit include at least one first input terminal. The first input terminal is connected to a data input port of the data processing device and is used to input an original operand.

In some embodiments, the arithmetic unit is used to detect valid identification information in input data and, in response to detecting the valid identification information in the input data, to perform an operation on the input data.

In some embodiments, the device further includes at least one delay unit. The input terminal of the delay unit is connected to an output terminal of the first selection unit, and the output terminal of the delay unit is connected to an input terminal of the second selection unit. The delay unit is used to delay the data received from the output terminal of the first selection unit and to transmit the delayed data to the input terminal of the second selection unit.

In some embodiments, the device further includes at least one third register. The input terminal of the third register is connected to an output terminal of the second selection unit, and the output terminal of the third register is connected to an input terminal of the first selection unit or to the data output port of the data processing device.

In some embodiments, the multiple input terminals of the first selection unit include at least one second input terminal. The second input terminal is connected through a connecting line to an output terminal of the second selection unit, or through a connecting line to a fourth register used for storing operation parameters.

In some embodiments, the multiple arithmetic units include at least one arithmetic operation unit and/or at least one logic operation unit.

According to a second aspect of the embodiments of the present disclosure, an artificial intelligence chip is provided. The artificial intelligence chip includes the data processing device described in any one of the embodiments.

In some embodiments, the artificial intelligence chip further includes a control unit. The control unit is used to control the data processing device so that the multiple arithmetic units in the data processing device form different arithmetic paths.

In some embodiments, the control unit is further used to set the configuration information of the data processing device so that the multiple arithmetic units in the data processing device form different arithmetic paths.

In some embodiments, the configuration information includes at least one of the following: first configuration information, used to configure the connection relationship between the multiple input terminals and the multiple output terminals of the first selection unit, and/or the connection relationship between the multiple input terminals and the multiple output terminals of the second selection unit; and second configuration information, used to configure the operation type of at least a part of the multiple arithmetic units.

In some embodiments, the multiple input terminals of the first selection unit include at least one first input terminal, and the control unit is further used to write an original operand to the at least one first input terminal of the first selection unit.

The data processing device of the embodiments of the present disclosure can form different data processing paths by changing the internal connections of the first selection unit and the second selection unit. Because the connections between components are not hardwired, the arithmetic units can be reused efficiently, which reduces the area of the data processing device. In addition, once the internal connections of the first selection unit and the second selection unit have been configured, the solution of the embodiments of the present disclosure can automatically form a pipeline of multiple arithmetic units along the arithmetic path, which allows efficient data processing, reduces the power consumed by data processing, and yields a higher energy efficiency ratio.

It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit the present disclosure.

Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. When the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of devices and methods consistent with some aspects of the present disclosure as detailed in the appended claims.

The terms used in the present disclosure are for the purpose of describing specific embodiments only and are not intended to limit the present disclosure. The singular forms "a", "said", and "the" used in the present disclosure and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any or all possible combinations of one or more of the associated listed items. In addition, the term "at least one" herein means any one of multiple items, or any combination of at least two of multiple items.

It should be understood that although the terms first, second, third, and so on may be used in the present disclosure to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present disclosure, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".

To enable those skilled in the art to better understand the technical solutions in the embodiments of the present disclosure, and to make the above objectives, features, and advantages of the embodiments more apparent, the technical solutions in the embodiments of the present disclosure are described in further detail below with reference to the accompanying drawings.

Many application scenarios involve a wide variety of data processing operations, for example arithmetic operations. Taking neural networks as an example, computing an activation function may involve basic simple functions such as addition, subtraction, multiplication, and division, and may also involve a large number of unconventional complex operations. The forms of complex operations differ from one neural network to another, and new types of neural networks keep emerging. The design of a data processing device for performing such operations therefore has to cope with many types of arithmetic units and a large volume of data, while also meeting the requirements of small area and low power consumption.

An embodiment of the present disclosure provides a data processing device. As shown in FIG. 1, the device may include: a first selection unit 101 having multiple input terminals and multiple output terminals, a second selection unit 102 having multiple input terminals and multiple output terminals, and multiple arithmetic units 103.

The multiple input terminals of the first selection unit 101 are configurably connected to the multiple output terminals of the first selection unit 101, and at least a part of the multiple output terminals of the first selection unit 101 are connected to the input terminals of the multiple arithmetic units 103.

The output terminals of the multiple arithmetic units 103 are connected to the multiple input terminals of the second selection unit 102.

The multiple input terminals of the second selection unit 102 are configurably connected to the multiple output terminals of the second selection unit 102, and at least a part of the multiple output terminals of the second selection unit 102 are connected to the multiple input terminals of the first selection unit 101 and/or to the data output port of the data processing device, so that the multiple arithmetic units 103 form different arithmetic paths.

In the embodiments of the present disclosure, the first selection unit 101 and the second selection unit 102 may each include multiple input terminals and multiple output terminals. The first selection unit 101 can be configured to connect some or all of its input terminals to some or all of its output terminals. Similarly, the second selection unit 102 can be configured to connect some or all of its input terminals to some or all of its output terminals, so that the multiple arithmetic units 103 connected between the first selection unit 101 and the second selection unit 102 can form different data processing paths. In practical applications, the number of input terminals of the first selection unit 101 may be the same as or different from the number of its output terminals; the number of input terminals of the second selection unit 102 may be the same as or different from the number of its output terminals; and the number of input terminals of the first selection unit 101 may be the same as or different from the number of output terminals of the second selection unit 102.

In the embodiments of the present disclosure, the internal connections of a selection unit, for example the connections between its input terminals and its output terminals, can be configured as required. Alternatively, the external connections of a selection unit, for example the connections between its input or output terminals and other units, can be configured.

The way the input terminals of the first selection unit 101 are connected to its output terminals, and the way the input terminals of the second selection unit 102 are connected to its output terminals, can be determined according to the first configuration information. The first configuration information may be configured in advance. When the first configuration information changes, the connections between the input and output terminals of the first selection unit 101 and the connections between the input and output terminals of the second selection unit 102 can both change, so that the multiple arithmetic units 103 form different data processing paths. The connection mode specifies, for each input terminal of the first selection unit 101, the output terminal of the first selection unit 101 to which it is connected, and, for each input terminal of the second selection unit 102, the output terminal of the second selection unit 102 to which it is connected.

As shown in FIG. 2, in some embodiments the device further includes a first register 104 for storing the first configuration information. The first configuration information is used to configure the connection relationship between the multiple input terminals and the multiple output terminals of the first selection unit 101, and/or the connection relationship between the multiple input terminals and the multiple output terminals of the second selection unit 102. The first selection unit 101 may be connected to the first register 104 to obtain the first configuration information from it. Similarly, the second selection unit 102 may also be connected to the first register 104 to obtain the first configuration information from it. There may be one or more first registers 104; their number can be determined from the length of the first configuration information and the width of the first register 104.
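The relationship between the length of the configuration information and the number of registers can be illustrated with a short sketch. The encoding used here is purely an assumption made for illustration (one field per output terminal, holding the index of the driving input terminal or a "not connected" code); the disclosure does not specify an encoding, and the function names are hypothetical.

```python
def selection_config_bits(num_inputs: int, num_outputs: int) -> int:
    """Bits needed to describe one selection unit under the assumed encoding."""
    if num_inputs < 1 or num_outputs < 1:
        raise ValueError("a selection unit needs at least one input and one output")
    # Assumed encoding (not from the disclosure): one field per output terminal,
    # holding the index of the input that drives it, plus one extra code that
    # means "not connected". That is num_inputs + 1 codes per field.
    bits_per_output = num_inputs.bit_length()  # equals ceil(log2(num_inputs + 1))
    return bits_per_output * num_outputs


def registers_needed(config_bits: int, register_width: int) -> int:
    """Smallest number of fixed-width registers that can hold config_bits bits."""
    if config_bits < 0 or register_width < 1:
        raise ValueError("invalid configuration length or register width")
    return -(-config_bits // register_width)  # ceiling division on integers


# Two 4-input, 4-output selection units sharing 16-bit configuration registers.
total_bits = 2 * selection_config_bits(4, 4)
print(total_bits, registers_needed(total_bits, 16))
```

Under these assumptions the only point being made is the ceiling division: the register count is the configuration length divided by the register width, rounded up.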

FIG. 3A and FIG. 3B are schematic diagrams of, respectively, a connection mode of the first selection unit 101 and the second selection unit 102 in some embodiments of the present disclosure, and the data processing path formed by the arithmetic units 103 under that connection mode. In FIG. 3A, in the first selection unit 101, input terminal 1 is connected to output terminal 2, input terminal 2 to output terminal 4, and input terminal 3 to output terminal 1; in the second selection unit 102, input terminal 1 is connected to output terminal 1, input terminal 2 to output terminal 2, and input terminal 4 to output terminal 3. The resulting data processing path passes through arithmetic unit 2, arithmetic unit 4, and arithmetic unit 1 in sequence.

FIG. 4A and FIG. 4B are schematic diagrams of, respectively, a connection mode of the first selection unit 101 and the second selection unit 102 in other embodiments of the present disclosure, and the data processing path formed by the arithmetic units 103 under that connection mode. In FIG. 4A, in the first selection unit 101, input terminal 1 is connected to output terminal 1, input terminal 2 to output terminal 2, input terminal 3 to output terminal 3, and input terminal 4 to output terminal 4; in the second selection unit 102, input terminal 1 is connected to output terminal 2, input terminal 2 to output terminal 3, input terminal 3 to output terminal 4, and input terminal 4 to output terminal 1. The resulting data processing path passes through arithmetic unit 1, arithmetic unit 2, arithmetic unit 3, and arithmetic unit 4 in sequence.
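The two connection examples can be checked with a small routing sketch. The fixed wiring assumed below is not stated in the text (the figures are not reproduced here) and is inferred only because it reproduces both stated paths: output terminal k of the first selection unit feeds arithmetic unit k, arithmetic unit k drives input terminal k of the second selection unit, output terminal 1 of the second selection unit is the data output port, and every other output terminal j of the second selection unit feeds input terminal j of the first selection unit. The function name `trace_path` is hypothetical.

```python
from typing import Dict, List


def trace_path(sel1: Dict[int, int], sel2: Dict[int, int],
               operand_input: int = 1, output_port: int = 1) -> List[int]:
    """Return the arithmetic units visited, in order, for one routing configuration.

    sel1 and sel2 map an input terminal of a selection unit to the output
    terminal it is connected to (the role played by the first configuration
    information). The fixed wiring between the units is an assumption.
    """
    path: List[int] = []
    terminal = operand_input
    while True:
        unit = sel1[terminal]          # sel1 output k feeds arithmetic unit k (assumed)
        if unit in path:
            raise ValueError("configuration loops without reaching the output port")
        path.append(unit)
        out = sel2[unit]               # unit k drives sel2 input k (assumed)
        if out == output_port:         # sel2 output 1 is the data output port (assumed)
            return path
        terminal = out                 # sel2 output j feeds sel1 input j (assumed)


# Connection mode described for FIG. 3A.
fig3a = trace_path({1: 2, 2: 4, 3: 1}, {1: 1, 2: 2, 4: 3})
# Connection mode described for FIG. 4A.
fig4a = trace_path({1: 1, 2: 2, 3: 3, 4: 4}, {1: 2, 2: 3, 3: 4, 4: 1})
print(fig3a, fig4a)
```

Changing only the two dictionaries changes the order in which the same four arithmetic units are visited, which is the reuse the configurable selection units are meant to provide.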

Those skilled in the art will understand that the above connection modes and data processing paths are only exemplary, and the present disclosure is not limited to them. In practical applications, the number of input terminals, the number of output terminals, and the connection mode of the first selection unit 101 can all be set as required. In addition, the first selection unit 101 and the second selection unit 102 may each be a single selection unit or a selection unit composed of multiple selection units. The present disclosure does not limit how the selection unit is implemented, as long as the selection function described in the above examples can be realized. For example, the selection unit can be implemented as combinational logic built from basic logic gates.

The arithmetic units 103 may include various types of arithmetic units, and the present disclosure does not limit how they are implemented. For example, an arithmetic unit may directly use an IP core from a common EDA vendor. The types of arithmetic unit may include, but are not limited to, at least one of an arithmetic operation unit and a logic operation unit, or at least one of a vector operation unit, a scalar operation unit, and a matrix operation unit, and so on. The arithmetic operation unit may include, but is not limited to, at least one of the following: an addition and subtraction unit, a multiplication unit, a division unit, an exponent unit, a logarithm unit, a square root unit, a trigonometric function unit, a derivative unit, an integral unit, a convolution unit, a rounding unit, and so on. The logic operation unit may include an AND unit, an OR unit, a NOT unit, and so on. There may be one or more arithmetic units of each type. Each arithmetic unit 103 may include one or more input terminals and one or more output terminals. When an arithmetic unit 103 includes multiple input terminals, each input terminal of the arithmetic unit 103 is connected to one output terminal of the first selection unit 101; when an arithmetic unit 103 includes multiple output terminals, each output terminal of the arithmetic unit 103 is connected to one input terminal of the second selection unit 102.

The operation type of one or more of the arithmetic units 103 can be configured according to actual conditions. For example, a selection comparator can be configured to find a maximum, a minimum, a running maximum, and so on; an addition and subtraction unit can be configured to perform either addition or subtraction; an accumulation unit can be configured to accumulate or to perform addition; and the exponent of an exponent unit can be configured to be either the operand or the negative of the operand. In some embodiments, the device further includes a second register 105 for storing second configuration information, which is used to configure the operation type of at least a part of the multiple arithmetic units 103. At least a part of the multiple arithmetic units 103 may be connected to the second register 105 to obtain the second configuration information from it. There may be one or more second registers 105; their number can be determined from the length of the second configuration information and the width of the second register 105.
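A minimal software sketch of configurable operation types follows. It models three of the units mentioned above; the mode codes (0 and 1), the table names, and the `ConfigurableUnit` class are hypothetical and stand in for whatever the second configuration information actually encodes.

```python
import math
from typing import Callable, Dict

# Hypothetical mode tables: configuration value -> behaviour of the unit.
ADD_SUB_OPS: Dict[int, Callable] = {0: lambda a, b: a + b,   # addition
                                    1: lambda a, b: a - b}   # subtraction
COMPARE_OPS: Dict[int, Callable] = {0: max,                  # maximum
                                    1: min}                  # minimum
EXP_OPS: Dict[int, Callable] = {0: lambda x: math.exp(x),    # exponent is the operand
                                1: lambda x: math.exp(-x)}   # exponent is its negative


class ConfigurableUnit:
    """An arithmetic unit whose operation type is chosen by a configuration value."""

    def __init__(self, ops: Dict[int, Callable]) -> None:
        self.ops = ops
        self.mode = 0  # default mode; an assumption of this sketch

    def configure(self, mode: int) -> None:
        # Plays the role of reading the second configuration information.
        if mode not in self.ops:
            raise ValueError(f"unsupported mode {mode}")
        self.mode = mode

    def __call__(self, *operands):
        return self.ops[self.mode](*operands)


add_sub = ConfigurableUnit(ADD_SUB_OPS)
sum_result = add_sub(5, 3)        # configured for addition
add_sub.configure(1)
diff_result = add_sub(5, 3)       # same unit, now configured for subtraction
print(sum_result, diff_result)
```

The same object serves both operations, which mirrors how one physical unit can cover several operation types instead of requiring a separate unit for each.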

In some embodiments, the first register 104 and the second register 105 may be the same register, with one part of its storage space used for the first configuration information and another part used for the second configuration information. For example, bits 1 to N1 of the register store the first configuration information, and bits N1+1 to N2 store the second configuration information. In other embodiments, the first register 104 and the second register 105 may be different registers.
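The shared-register layout can be sketched with ordinary bit-field packing. The helper names are hypothetical, and the sketch numbers bits from 0, so the text's bits 1 to N1 correspond to bit positions 0 to N1-1 and bits N1+1 to N2 correspond to positions N1 to N2-1.

```python
from typing import Tuple


def pack_config(first_cfg: int, second_cfg: int, n1: int, n2: int) -> int:
    """Pack both configuration fields into one register value of n2 bits."""
    if not 0 < n1 < n2:
        raise ValueError("expected 0 < n1 < n2")
    if first_cfg < 0 or first_cfg >> n1:
        raise ValueError("first configuration does not fit in n1 bits")
    if second_cfg < 0 or second_cfg >> (n2 - n1):
        raise ValueError("second configuration does not fit in n2 - n1 bits")
    # Low n1 bits: first configuration; next n2 - n1 bits: second configuration.
    return first_cfg | (second_cfg << n1)


def unpack_config(register: int, n1: int, n2: int) -> Tuple[int, int]:
    """Recover (first_cfg, second_cfg) from a packed register value."""
    first_cfg = register & ((1 << n1) - 1)
    second_cfg = (register >> n1) & ((1 << (n2 - n1)) - 1)
    return first_cfg, second_cfg


packed = pack_config(0b101, 0b11, n1=3, n2=5)
print(bin(packed), unpack_config(packed, n1=3, n2=5))
```

Each consumer reads only its own field: the selection units would use the first value and the arithmetic units the second.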

In some embodiments, the multiple input terminals of the first selection unit 101 include at least one first input terminal, which is connected to the data input port of the data processing device and is used to input an original operand. The first input terminal is also referred to as an operand input terminal. The data input port of the data processing device may obtain the original operand from a random access memory, an external control unit, or another data processing device. In practical applications, the first input terminal being connected to the data input port of the data processing device also covers the case where the first input terminal itself serves as the data input port of the data processing device.

To operate on an operand, the operand is first input to an operand input terminal of the first selection unit 101. The first selection unit 101 routes the operand to one or more of its output terminals, from which it is passed to the arithmetic units 103 connected to those output terminals and operated on to produce an intermediate result. The intermediate result is output from the output terminal of the arithmetic unit 103 to the input terminal of the second selection unit 102 connected to that arithmetic unit, routed from that input terminal to an output terminal of the second selection unit 102, and then passed from that output terminal to an input terminal of the first selection unit 101. This process is repeated until the final result is obtained, which can be output through the second selection unit 102 to the data output port of the data processing device. Alternatively, an operand input to the first selection unit 101 may yield the final result after a single pass through an arithmetic unit 103, in which case the result is output directly through the second selection unit 102 to the data output port of the data processing device.
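The loop described above can be sketched as a value flowing through the two selection units until it reaches the data output port. The fixed wiring between terminals and units (output k of the first selection unit feeds unit k, unit k drives input k of the second selection unit, output 1 of the second selection unit is the data output port, other outputs j feed input j of the first selection unit), the single-operand units, and the `run` function are all assumptions made for illustration.

```python
from typing import Callable, Dict


def run(operand: float,
        sel1: Dict[int, int], sel2: Dict[int, int],
        units: Dict[int, Callable[[float], float]],
        operand_input: int = 1, output_port: int = 1,
        max_passes: int = 16) -> float:
    """Feed one operand through the configured path and return the final result."""
    terminal, value = operand_input, operand
    for _ in range(max_passes):
        unit = sel1[terminal]        # first selection unit routes to a unit
        value = units[unit](value)   # the unit produces an intermediate result
        out = sel2[unit]             # second selection unit routes the result
        if out == output_port:       # final result leaves through the output port
            return value
        terminal = out               # otherwise it is fed back to the first selection unit
    raise RuntimeError("no final result within max_passes; check the configuration")


# Hypothetical single-operand units; unit 3 exists but is unused in this configuration.
units = {1: lambda v: v + 1, 2: lambda v: v * 2, 3: lambda v: -v, 4: lambda v: v - 3}

# Routing taken from the FIG. 3A description: path is unit 2, unit 4, unit 1.
result = run(5, {1: 2, 2: 4, 3: 1}, {1: 1, 2: 2, 4: 3}, units)
print(result)
```

This software loop handles one operand at a time; in the device described, successive operands could occupy different arithmetic units simultaneously as a pipeline, which the sketch does not attempt to model.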

In some embodiments, the arithmetic unit 103 is configured to detect valid identification information in its input data and, in response to detecting it, to perform an operation on the input data. The input data may be an original operand or an intermediate operation result produced by an arithmetic unit 103. After the arithmetic unit 103 operates on an original operand and obtains an intermediate operation result, it can output the valid identification information together with the intermediate operation result to the second selection unit 102.

In this embodiment, an operation starts when valid identification information is written to the first selection unit 101. The operation is performed only after the valid identification information has been written; otherwise no operation is performed. Within one operation process, when the group of operands input to a given input terminal of the first selection unit 101 contains more than one operand, each operand in the group can carry the valid identification information.

For example, when the data input to the operand input terminal of the first selection unit 101 is {1, x, 2, x, 3}, only the three operands 1, 2, and 3 are valid operands that need to be operated on; both x entries are invalid operands. In this case operands 1, 2, and 3 each carry the valid identification information and x does not, so valid and invalid operands can be told apart. The arithmetic unit 103 processes an operand only if it carries the valid identification information and leaves operands without it unprocessed, which saves power in the data processing device.
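The {1, x, 2, x, 3} example can be illustrated with a small sketch. It assumes, purely for illustration, that the valid identification information is a one-bit flag travelling with each operand and that the arithmetic unit squares its input; the names `Operand` and `process` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Operand:
    value: Optional[float]  # None models an "x" (don't-care) slot
    valid: bool             # models the valid identification information


def process(stream: List[Operand],
            op: Callable[[float], float]) -> List[Operand]:
    """Apply `op` only to operands that carry the valid flag."""
    results = []
    for item in stream:
        if item.valid:
            # The flag is forwarded together with the result.
            results.append(Operand(op(item.value), True))
        else:
            # Invalid slot: no computation is performed.
            results.append(item)
    return results


stream = [Operand(1, True), Operand(None, False), Operand(2, True),
          Operand(None, False), Operand(3, True)]
out = process(stream, lambda v: v * v)
values = [o.value for o in out if o.valid]
```

In hardware the saving comes from not toggling the arithmetic unit on invalid slots; the sketch only shows the gating decision.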

In some embodiments, the device further includes at least one delay unit 106. The input terminal of the delay unit 106 is connected to an output terminal of the first selection unit 101, and the output terminal of the delay unit 106 is connected to an input terminal of the second selection unit 102. The delay unit 106 delays the data it receives from the output terminal of the first selection unit 101 and transmits the delayed data to the input terminal of the second selection unit 102. Because different types of arithmetic units take different amounts of time to complete their operations, using the delay unit 106 to time-align the data at the input terminals of an arithmetic unit 103 ensures that all input data of a multi-input arithmetic unit 103 arrive valid at the same time. In the examples of the present disclosure, the delay unit can be implemented by clocking the data through registers (for example, with a shift register).

For example, consider the operation given in Figure 02_image001. The operand must first pass through the exponent unit, which performs the exponential operation. The result of the exponential operation is input to the first input terminal of the addition unit, the operation parameter 1 is input to the second input terminal of the addition unit, and the addition unit then adds the data at its two input terminals. Because the data at the first input terminal of the addition unit lags the data at the second input terminal, the operation parameter 1 can first be passed through a delay unit and only then input to the second input terminal, so that the data at the first and second input terminals of the addition unit arrive at the same time.
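The alignment can be sketched as a cycle-by-cycle simulation. The two-cycle latency of the exponent unit (`EXP_LATENCY`) is an assumption made for illustration, as is the class name `ShiftRegisterDelay`; the source does not state actual latencies.

```python
import math
from collections import deque


class ShiftRegisterDelay:
    """Delays its input by a fixed number of clock ticks (cycles >= 1)."""

    def __init__(self, cycles: int) -> None:
        self._stages = deque([None] * cycles, maxlen=cycles)

    def tick(self, data_in):
        data_out = self._stages[0]     # oldest stage leaves the register
        self._stages.append(data_in)   # maxlen drops the oldest stage
        return data_out


EXP_LATENCY = 2  # assumed latency of the exponent unit, in cycles

exp_pipeline = ShiftRegisterDelay(EXP_LATENCY)  # models the slow exp path
const_delay = ShiftRegisterDelay(EXP_LATENCY)   # delay unit for parameter 1

inputs = [0.0, 1.0, None, None]  # None: idle cycles that flush the pipeline
sums = []
for x in inputs:
    a = exp_pipeline.tick(None if x is None else math.exp(x))
    b = const_delay.tick(None if x is None else 1.0)
    if a is not None and b is not None:
        sums.append(a + b)  # both adder inputs are valid in the same cycle
```

Without `const_delay`, the constant would reach the adder two cycles before the matching exponential result.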

In some embodiments, the device further includes at least one third register 107. The input terminal of the third register 107 is connected to an output terminal of the second selection unit 102, and the output terminal of the third register 107 is connected to an input terminal of the first selection unit 101 or to the data output port of the data processing device. Here, "connected to the data output port" also covers the case in which the output terminal of the third register 107 itself serves as the data output port of the data processing device. The data output port can be connected to a random access memory or to another data processing device. Providing the third register 107 reduces data transmission delay inside the data processing device.

In some embodiments, the multiple input terminals of the first selection unit 101 include at least one second input terminal. The second input terminal is connected through a connecting line either to an output terminal of the second selection unit 102 or to a fourth register that stores operation parameters. A connecting line represents a connection between the ports of two units and is implemented on the chip as a metal trace of some kind. An operation parameter is a constant; for example, in the operation function y=1+x, the "1" is the operation parameter. There may be one or more fourth registers, and each can store an operation parameter with a different value, for example 0, ±1, ±2, or ±Max. According to third configuration information, the connecting line can selectively connect the second input terminal of the first selection unit 101 to either the output terminal of the second selection unit 102 or the fourth register.

The third register and the fourth register may be the same register or different registers. In some embodiments, the data processing device may include a shared cache unit that is shared by the multiple arithmetic units, and optionally also by the second selection unit and/or the first selection unit, for temporarily storing data such as one or more of original operands, operation parameters, and operation results. The embodiments of the present disclosure do not limit this.

In practical application scenarios, each kind of configuration information above (the first, second, third, and fourth configuration information) can be a configuration code or another type of configuration information. The kinds of configuration information can be different parts of a single piece of configuration information, or separate pieces of configuration information.

Figure 5 shows the configuration information of some embodiments of the present disclosure. For example, bits 1 to c1 are the first configuration information, bits c1+1 to c1+c2 are the second configuration information, bits c1+c2+1 to c1+c2+c3 are the third configuration information, and bits c1+c2+c3+1 to c1+c2+c3+c4 are the fourth configuration information. The fourth configuration information can be used to configure the number of delay cycles of the delay unit 106; depending on its value, the delay unit 106 can be configured to delay by one or more clock cycles. By configuring the internal connections of the first selection unit 101 and the second selection unit 102, the operation type of each arithmetic unit 103, and/or the connections of the connecting lines, the device can adapt to a variety of complex operation types, which improves the reuse rate of the data processing device.
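The bit layout can be illustrated by splitting a configuration word into its four fields. The sketch assumes that bit 1 is the least significant bit and picks arbitrary widths c1=4, c2=3, c3=2, c4=2; neither choice comes from the source, and `split_config` is a hypothetical helper.

```python
from typing import Dict, Sequence, Tuple


def split_config(word: int,
                 widths: Sequence[Tuple[str, int]]) -> Dict[str, int]:
    """Slice `word` into named fields, starting from the lowest bit."""
    fields = {}
    shift = 0
    for name, width in widths:
        mask = (1 << width) - 1
        fields[name] = (word >> shift) & mask
        shift += width
    return fields


# Assumed widths (c1, c2, c3, c4) = (4, 3, 2, 2), for illustration only.
WIDTHS = [("first", 4), ("second", 3), ("third", 2), ("fourth", 2)]

# Written from the high field to the low field: fourth, third, second, first.
word = 0b10_01_101_0110
fields = split_config(word, WIDTHS)
```

Reordering the entries of `WIDTHS` models a layout in which the parts of the configuration information appear in a different order.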

The values of c1, c2, and c3 may be equal or different, and the order of the parts within the configuration information can be adjusted as needed. For example, bits 1 to c2 of the configuration information may be the second configuration information and bits c2+1 to c1+c2 the first configuration information. The function and length of each part of the configuration information can be set in advance.

The kinds of configuration information above (first, second, third, fourth, and so on) can each correspond to one part of an overall piece of configuration information, or each can be a separate piece of configuration information. When they are separate, they can be stored in the same register or in different registers.

All configuration information remains unchanged during one operation process. After an operation process ends, the configuration information can be changed to alter the operation path or the operation type of at least one arithmetic unit on the path. Here, one operation process runs from the input of a group of operands to each of one or more input terminals of the first selection unit 101 until the output of the group of final operation results corresponding to those operands. The group of operands at each input terminal of the first selection unit 101 can contain one or more operands, and the operands of a group are input to their input terminal one after another.

Suppose a group of original operands is input to each of k input terminals of the first selection unit 101, and the groups input to the individual input terminals are (Figure 02_image003), (Figure 02_image005), ..., (Figure 02_image007), where m is the total number of original operands in each group. Then (Figure 02_image009) is first input to the k input terminals, (Figure 02_image011) is input next, and so on, yielding the corresponding final operation results (Figure 02_image013). The process from inputting (Figure 02_image009) until (Figure 02_image015) is obtained is called one operation process. When original operands need to be input to several input terminals of the first selection unit 101 at the same time, the operands for the individual input terminals can be time-aligned before being input.

The embodiments of the present disclosure use two configurable selection units to map one or more original operands onto the inputs of arithmetic units and then remap each arithmetic unit's result onto the input of the next arithmetic unit, until the final operation result is obtained and output. In this way the data and the operations "flow". The configuration information is held constant throughout one operation process, which fixes the operation path for that process, so that the arithmetic units automatically form a pipeline according to the configuration information and complete the computation efficiently. Before the final operation result is obtained, the result output by each arithmetic unit does not need to be stored in random access memory; it is fed directly to the next arithmetic unit, which reduces the number of memory accesses and therefore power consumption. Furthermore, if the operation function is complex, one part of the function can be computed first and its final operation result written to random access memory; that result is then read back from random access memory and used to compute another part of the function, whose final operation result is in turn written to random access memory. Repeating this several times yields the final result of the whole function. After an operation process completes, new configuration information can be input to change how the arithmetic units are connected, so the device of the embodiments can serve many types of formulas, reuse resources efficiently, save area and power, and achieve a high energy-efficiency ratio.

In the embodiments of the present disclosure, different configuration information can be determined in advance for different operation functions and then stored. Later, when a different operation function is needed, the corresponding configuration information can be invoked. When the configuration information changes, the connections of the lines change accordingly.
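Storing one configuration per operation function and looking it up on demand might look like the sketch below. It is a software analogy only: the unit names in `UNIT_OPS`, the function names in `CONFIG_STORE`, and the representation of a configuration as an ordered list of unit names are all assumptions, not the configuration code format of the device.

```python
import math
from typing import Callable, Dict, List

# Hypothetical single-input unit types; the names are illustrative only.
UNIT_OPS: Dict[str, Callable[[float], float]] = {
    "exp_neg": lambda v: math.exp(-v),
    "exp": math.exp,
    "add_one": lambda v: v + 1.0,
    "reciprocal": lambda v: 1.0 / v,
}

# Configurations determined in advance, one per operation function.
CONFIG_STORE: Dict[str, List[str]] = {
    "sigmoid": ["exp_neg", "add_one", "reciprocal"],
    "exp_plus_one": ["exp", "add_one"],
}


def compute(function_name: str, x: float) -> float:
    """Look up the stored configuration and run x along that path."""
    value = x
    for unit in CONFIG_STORE[function_name]:
        value = UNIT_OPS[unit](value)
    return value


half = compute("sigmoid", 0.0)
two = compute("exp_plus_one", 0.0)
```

The same set of units serves both functions; only the stored configuration differs.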

In the embodiments of the present disclosure, the number of arithmetic units, the number of connecting lines, and the width of the configuration code can be chosen according to actual needs to support more types of operations, so the design scales well. The input original operands can be vectors or scalars, and fixed-point or floating-point numbers; supporting them only requires adjusting the types of the arithmetic units and the bit width of the delay units.

The total number of arithmetic units, the total number of delay units, the total number of input terminals of the first selection unit used to input operands, and the total number of input terminals of the first selection unit that are connected to the second selection unit may be equal or different. Each of these numbers can be set according to actual requirements.

The scheme of the embodiments is illustrated below with a specific example. Sigmoid, y=1/(e^(-x)+1), is an activation function common in neural networks. Its evaluation involves the basic operations of exponentiation, addition, and division, and it can be implemented with the device shown in Figure 6. As shown in Figure 6, in this embodiment the device is reconfigured, through suitable configuration, to compute the Sigmoid activation function. The configuration is as follows:

Step 1: Sigmoid is a unary function with a single operand. Assuming the operand arrives at operand input terminal 1 of the first selection unit, configure operand input terminal 1 to connect to the input terminal of the exponent unit.

Step 2: Configure the output terminal of the exponent unit to connect to one input terminal of the addition unit.

Step 3: Configure connecting line 1 to connect the output terminal of the third register that outputs the operation parameter 1 to the other input terminal of the addition unit and also to the dividend input terminal of the division unit.

Step 4: Configure the output terminal of the addition unit to connect to the divisor input terminal of the division unit.

Step 5: Configure the exponent unit to compute exp(-x) and the addition unit to perform addition.

Step 6: Configure the output terminal of the division unit to connect to the final result output terminal.

The operation path formed by this configuration implements the complete Sigmoid function. By modifying the configuration code, the embodiments of the present disclosure can cover a large number of simple and complex operation types with a limited number of arithmetic units and connecting lines.
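Steps 1 to 6 can be mirrored by a small table-driven model. This is a sketch under stated assumptions: the signal names (`operand_in_1`, `param_one`, `exp_out`, `add_out`, `result`) and the tuple format of `SIGMOID_CONFIG` are hypothetical, and the model ignores timing and the delay units.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical operation types selectable for the arithmetic units.
OPS: Dict[str, Callable[..., float]] = {
    "exp_neg": lambda x: math.exp(-x),
    "add": lambda a, b: a + b,
    "div": lambda dividend, divisor: dividend / divisor,
}

# Each entry: (output signal, operation type, input signals).
SIGMOID_CONFIG: List[Tuple[str, str, List[str]]] = [
    ("exp_out", "exp_neg", ["operand_in_1"]),      # steps 1 and 5
    ("add_out", "add", ["exp_out", "param_one"]),  # steps 2, 3 and 5
    ("result", "div", ["param_one", "add_out"]),   # steps 3, 4 and 6
]


def evaluate(config: List[Tuple[str, str, List[str]]], x: float) -> float:
    """Propagate x through the configured units and return the result."""
    signals = {"operand_in_1": x, "param_one": 1.0}
    for out_name, op_name, in_names in config:
        args = [signals[name] for name in in_names]
        signals[out_name] = OPS[op_name](*args)
    return signals["result"]


y = evaluate(SIGMOID_CONFIG, 0.0)
```

Supplying a different table to `evaluate` would reuse the same operation types for another function, which is the reuse the paragraph above describes.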

The embodiments of the present disclosure can provide a data processing device that is efficient, flexibly configurable, and highly scalable, that implements a variety of complex operations efficiently, and that balances area against power consumption. The device can be used to compute the various activation functions that occur in neural network operations. Most of these activation functions are complex functions, and hardware that supports neural network operations needs a degree of generality and scalability; with the embodiments of the present disclosure, a single device can implement multiple activation functions.

The embodiments of the present disclosure also provide an artificial intelligence chip that includes the data processing device of any of the embodiments above. For the embodiments of the data processing device in the chip, see the embodiments of the data processing device described above; they are not repeated here.

In some embodiments, the artificial intelligence chip further includes a control unit, which controls the data processing device so that the multiple arithmetic units in the data processing device form different operation paths.

In some embodiments, the control unit is further used to set the configuration information of the data processing device so that the multiple arithmetic units in the data processing device form different operation paths. The embodiments of the present disclosure do not restrict how the control unit is implemented, provided that it can configure each module, unit, or component and supply the corresponding operands; for example, the control unit can be implemented as suitable peripheral circuits.

In some embodiments, the configuration information includes at least one of the following: first configuration information, used to configure the connections between the multiple input terminals and the multiple output terminals of the first selection unit and/or the connections between the multiple input terminals and the multiple output terminals of the second selection unit; and second configuration information, used to configure the operation type of at least some of the multiple arithmetic units.

In some embodiments, the multiple input terminals of the first selection unit include at least one first input terminal, and the control unit is further used to write original operands to the at least one first input terminal of the first selection unit.

The embodiments of the data processing device in the artificial intelligence chip are the same as the embodiments of the data processing device described above and are not repeated here. For the configuration information set by the control unit of the artificial intelligence chip, see the configuration information in the embodiments of the data processing device described above; it is not repeated here.

Those skilled in the art will readily conceive of other embodiments of the present disclosure after considering the specification and practicing what is disclosed here. The present disclosure is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art that are not disclosed here. The specification and embodiments are to be regarded as exemplary only; the true scope and spirit of the present disclosure are indicated by the claims below.

It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes can be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

The above are only preferred embodiments of the present disclosure and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present disclosure shall fall within its scope of protection.

101: first selection unit
102: second selection unit
103: arithmetic unit
104: first register
105: second register
106: delay unit
107: third register

Fig. 1 is a schematic structural diagram of a data processing device according to an embodiment of the present disclosure.
Fig. 2 is a schematic structural diagram of a data processing device according to other embodiments of the present disclosure.
Fig. 3A is a schematic diagram of how the units are connected in the data processing device of some embodiments of the present disclosure.
Fig. 3B is a schematic diagram of the data processing path corresponding to the connections shown in Fig. 3A.
Fig. 4A is a schematic diagram of how the units are connected in the data processing device of other embodiments of the present disclosure.
Fig. 4B is a schematic diagram of the data processing path corresponding to the connections shown in Fig. 4A.
Fig. 5 is a schematic diagram of the configuration information of an embodiment of the present disclosure.
Fig. 6 is a schematic structural diagram of a data processing device according to a specific embodiment of the present disclosure.

101: first selection unit; 102: second selection unit; 103: arithmetic unit

Claims (14)

一種數據處理裝置,包括:具有多個輸入端和多個輸出端的第一選擇單元;具有多個輸入端和多個輸出端的第二選擇單元;以及多個運算單元;所述第一選擇單元的多個輸入端可配置地連接所述第一選擇單元的多個輸出端,所述第一選擇單元的多個輸出端中的至少一部分連接所述多個運算單元的輸入端,所述多個運算單元的輸出端連接於所述第二選擇單元的多個輸入端,所述第二選擇單元的多個輸入端可配置地連接所述第二選擇單元的多個輸出端,所述第二選擇單元的多個輸出端中的至少一部分與所述第一選擇單元的多個輸入端連接,和/或與所述數據處理裝置的數據輸出端口連接,以使所述多個運算單元構成不同的運算通路。 A data processing device includes: a first selection unit with a plurality of input terminals and a plurality of output terminals; a second selection unit with a plurality of input terminals and a plurality of output terminals; and a plurality of arithmetic units; A plurality of input terminals are configurably connected to a plurality of output terminals of the first selection unit, at least a part of the plurality of output terminals of the first selection unit is connected to the input terminals of the plurality of arithmetic units, and the plurality of The output terminals of the arithmetic unit are connected to the multiple input terminals of the second selection unit, and the multiple input terminals of the second selection unit are configurably connected to the multiple output terminals of the second selection unit. At least a part of the multiple output terminals of the selection unit is connected to the multiple input terminals of the first selection unit, and/or is connected to the data output port of the data processing device, so that the multiple arithmetic units have different configurations的Computer Path. 
如請求項1所述的數據處理裝置,還包括:第一暫存器,用於儲存第一配置資訊,所述第一配置資訊用於配置:所述第一選擇單元的多個輸入端與所述第一選擇單元的多個輸出端之間的連接關係,和/或,所述第二選擇單元的多個輸入端與所述第二選擇單元的多個輸出端之間的連接關係。 The data processing device according to claim 1, further comprising: a first register for storing first configuration information, and the first configuration information is used for configuring: a plurality of input terminals of the first selection unit and The connection relationship between the multiple output terminals of the first selection unit, and/or the connection relationship between the multiple input terminals of the second selection unit and the multiple output terminals of the second selection unit. 如請求項1或2所述的數據處理裝置,還包括:第二暫存器,用於儲存第二配置資訊,所述第二配置資訊用 於配置所述多個運算單元中的至少一部分的運算類型。 The data processing device according to claim 1 or 2, further comprising: a second register for storing second configuration information, and the second configuration information is used for The operation type of at least a part of the plurality of operation units is configured. 如請求項1或2所述的數據處理裝置,其中,所述第一選擇單元的多個輸入端包括至少一個第一輸入端,所述第一輸入端與所述數據處理裝置的數據輸入端口連接,用於輸入原始操作數。 The data processing device according to claim 1 or 2, wherein the multiple input terminals of the first selection unit include at least one first input terminal, and the first input terminal is connected to the data input port of the data processing device Connection, used to enter the original operand. 如請求項1或2所述的數據處理裝置,其中,所述運算單元用於檢測輸入數據中的有效標識資訊,並響應於檢測到所述輸入數據中的有效標識資訊,對所述輸入數據進行運算。 The data processing device according to claim 1 or 2, wherein the arithmetic unit is used to detect valid identification information in the input data, and in response to detecting the valid identification information in the input data, perform processing on the input data Perform calculations. 
如請求項1或2所述的數據處理裝置,還包括:至少一個延遲單元;所述延遲單元的輸入端連接於所述第一選擇單元的輸出端,所述延遲單元的輸出端連接於所述第二選擇單元的輸入端;所述延遲單元用於對從所述第一選擇單元的輸出端接收到的數據進行延遲處理,並將所述延遲處理後的數據傳輸至所述第二選擇單元的輸入端。 The data processing device according to claim 1 or 2, further comprising: at least one delay unit; the input terminal of the delay unit is connected to the output terminal of the first selection unit, and the output terminal of the delay unit is connected to the output terminal of the first selection unit. The input terminal of the second selection unit; the delay unit is used to delay processing the data received from the output terminal of the first selection unit, and transmit the delayed processed data to the second selection The input terminal of the unit. 如請求項1或2所述的數據處理裝置,還包括:至少一個第三暫存器,所述第三暫存器的輸入端連接所述第二選擇單元的輸出端,所述第三暫存器的輸出端連接所述第一選擇單元的輸入端,或連接於所述數據處理裝置的數據輸出端口。 The data processing device according to claim 1 or 2, further comprising: at least one third register, the input terminal of the third register is connected to the output terminal of the second selection unit, and the third temporary register The output terminal of the memory is connected to the input terminal of the first selection unit, or is connected to the data output port of the data processing device. 如請求項1或2所述的數據處理裝置,其中,所述第一選擇單元的多個輸入端包括至少一個第二輸入端,所述第二輸 入端通過連接線與所述第二選擇單元的輸出端連接,或者通過連接線與用於儲存運算參數的第四暫存器連接。 The data processing device according to claim 1 or 2, wherein the multiple input terminals of the first selection unit include at least one second input terminal, and the second input terminal The input terminal is connected to the output terminal of the second selection unit through a connecting line, or is connected to a fourth register for storing operation parameters through a connecting line. 如請求項1或2所述的數據處理裝置,其中,所述多個運算單元包括至少一個算數運算單元和/或至少一個邏輯運算單元。 The data processing device according to claim 1 or 2, wherein the multiple operation units include at least one arithmetic operation unit and/or at least one logical operation unit. 
一種人工智能晶片,包括權利要求1至9任意一項所述的數據處理裝置。 An artificial intelligence chip, comprising the data processing device according to any one of claims 1 to 9. 如請求項10所述的人工智能晶片,還包括:控制單元,所述控制單元用於控制所述數據處理裝置,以使所述數據處理裝置中的多個運算單元構成不同的運算通路。 The artificial intelligence chip according to claim 10, further comprising: a control unit configured to control the data processing device so that a plurality of arithmetic units in the data processing device form different arithmetic paths. 如請求項11所述的人工智能晶片,其中,所述控制單元進一步用於:對所述數據處理裝置的配置資訊進行配置,以使所述數據處理裝置中的多個運算單元構成不同的運算通路。 The artificial intelligence chip according to claim 11, wherein the control unit is further configured to configure the configuration information of the data processing device so that the multiple arithmetic units in the data processing device constitute different operations path. 如請求項12所述的人工智能晶片,其中,所述配置資訊包括以下至少任一:第一配置資訊,用於配置所述第一選擇單元的多個輸入端與所述第一選擇單元的多個輸出端之間的連接關係,和/或所述第二選擇單元的多個輸入端與所述第二選擇單元的多個輸出端之間的連接關係,第二配置資訊,用於配置所述多個運算單元中的至少一部分的運算類型。 The artificial intelligence chip according to claim 12, wherein the configuration information includes at least any one of the following: first configuration information for configuring the multiple input terminals of the first selection unit and the first selection unit The connection relationship between the multiple output terminals, and/or the connection relationship between the multiple input terminals of the second selection unit and the multiple output terminals of the second selection unit, and the second configuration information is used for configuration The operation type of at least a part of the plurality of operation units. 
The artificial intelligence chip according to any one of claims 11 to 13, wherein the plurality of input terminals of the first selection unit include at least one first input terminal, and the control unit is further configured to write original operands into the at least one first input terminal of the first selection unit.
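The claims distinguish first input terminals, into which the control unit writes original operands, from second input terminals, which may receive values fed back from the second selection unit. A minimal sketch of how the two could combine to chain operations over several passes follows. The `ControlUnit` class, its method names, and the single-operation-per-pass simplification are assumptions made for illustration only.

```python
import operator


def select(inputs, routing):
    """Model a selection unit: routing[i] names the input wired to output i."""
    return [inputs[src] for src in routing]


class ControlUnit:
    """Hypothetical controller that owns the first selection unit's input terminals."""

    def __init__(self, n_first_inputs, n_feedback_inputs):
        # First input terminals hold original operands written by the controller.
        self.first_inputs = [0] * n_first_inputs
        # Second input terminals hold values fed back from the second selection unit.
        self.feedback_inputs = [0] * n_feedback_inputs

    def write_operands(self, values):
        """Write original operands into the first input terminals."""
        self.first_inputs[: len(values)] = values

    def step(self, routing, op):
        """Run one pass: select two operands, apply one operation, feed the result back."""
        terminals = self.first_inputs + self.feedback_inputs
        a, b = select(terminals, routing)
        result = op(a, b)
        self.feedback_inputs[0] = result
        return result


ctrl = ControlUnit(n_first_inputs=3, n_feedback_inputs=1)
ctrl.write_operands([4, 5, 6])
partial = ctrl.step(routing=[0, 1], op=operator.add)  # 4 + 5 from first inputs
final = ctrl.step(routing=[3, 2], op=operator.mul)    # fed-back sum times 6
print(partial, final)
```

Here terminal index 3 is the feedback terminal, so the second pass computes (4 + 5) * 6 without the controller rewriting any original operand.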
TW109146826A 2020-01-21 2020-12-30 Data processing apparatus, artificial intelligence chip TWI740761B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010072639.6A CN113222126B (en) 2020-01-21 2020-01-21 Data processing device and artificial intelligence chip
CN202010072639.6 2020-01-21

Publications (2)

Publication Number Publication Date
TW202129553A (en) 2021-08-01
TWI740761B (en) 2021-09-21

Family

ID=76991985

Family Applications (1)

Application Number Title Priority Date Filing Date
TW109146826A TWI740761B (en) 2020-01-21 2020-12-30 Data processing apparatus, artificial intelligence chip

Country Status (5)

Country Link
JP (1) JP7250953B2 (en)
KR (1) KR20210131417A (en)
CN (1) CN113222126B (en)
TW (1) TWI740761B (en)
WO (1) WO2021147602A1 (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103065330A (en) * 2013-01-15 2013-04-24 南京师范大学 Method and device for particle filter target tracking based on pipeline parallel processing technique
US20150310311A1 (en) * 2012-12-04 2015-10-29 Institute Of Semiconductors, Chinese Academy Of Sciences Dynamically reconstructable multistage parallel single instruction multiple data array processing system
US9460048B2 (en) * 2005-03-28 2016-10-04 Gerald George Pechanek Methods and apparatus for creating and executing a packet of instructions organized according to data dependencies between adjacent instructions and utilizing networks based on adjacencies to transport data in response to execution of the instructions
US9508113B2 (en) * 2013-10-11 2016-11-29 Samsung Electronics Co., Ltd. Pipeline system including feedback routes and method of operating the same
US9940534B1 (en) * 2016-10-10 2018-04-10 Gyrfalcon Technology, Inc. Digital integrated circuit for extracting features out of an input image based on cellular neural networks
CN108885543A (en) * 2016-01-26 2018-11-23 Icat有限责任公司 Processor with reconfigurable algorithm pipeline kernel and algorithmic match assembly line compiler
US20190114536A1 (en) * 2017-10-17 2019-04-18 Mediatek Inc. Hybrid non-uniform convolution transform engine for deep learning applications
CN109697185A (en) * 2017-10-20 2019-04-30 图核有限公司 Synchronization in more tile processing arrays
TW201923612A (en) * 2017-10-20 2019-06-16 英商葛夫科有限公司 Parallel computing
US20190196814A1 (en) * 2017-12-22 2019-06-27 Alibaba Group Holding Limited Multiple-pipeline architecture with special number detection
CN110245756A (en) * 2019-06-14 2019-09-17 第四范式(北京)技术有限公司 Method for handling the programming device of data group and handling data group
CN110462602A (en) * 2017-04-07 2019-11-15 英特尔公司 The method and apparatus of deep learning network execution pipeline on multi processor platform
TW201947524A (en) * 2017-05-12 2019-12-16 美商谷歌有限責任公司 Image processor with configurable number of active cores and supporting internal network

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3170599B2 (en) * 1996-03-01 2001-05-28 経済産業省産業技術総合研究所長 Programmable LSI and its operation method
JP2000255668A (en) * 1999-03-10 2000-09-19 Giyoumei Furuyama Food container
JP2004206326A (en) * 2002-12-25 2004-07-22 Seiko Epson Corp Arithmetic processing circuit and semiconductor device using it
US8442927B2 (en) * 2009-07-30 2013-05-14 Nec Laboratories America, Inc. Dynamically configurable, multi-ported co-processor for convolutional neural networks
CN106203617B (en) * 2016-06-27 2018-08-21 哈尔滨工业大学深圳研究生院 A kind of acceleration processing unit and array structure based on convolutional neural networks
CN106126481B (en) * 2016-06-29 2019-04-12 华为技术有限公司 A kind of computing system and electronic equipment
US11562115B2 (en) * 2017-01-04 2023-01-24 Stmicroelectronics S.R.L. Configurable accelerator framework including a stream switch having a plurality of unidirectional stream links
CN107145467A (en) * 2017-05-13 2017-09-08 贾宏博 A kind of distributed computing hardware system in real time
CN109034382A (en) * 2017-10-30 2018-12-18 上海寒武纪信息科技有限公司 The recognition methods of scene or object and Related product
CN110083333A (en) * 2019-03-22 2019-08-02 福州麦辽自动化设备有限公司 A kind of data processing circuit
CN110390383B (en) * 2019-06-25 2021-04-06 东南大学 Deep neural network hardware accelerator based on power exponent quantization
CN110427169B (en) * 2019-07-12 2021-07-02 东南大学 Three-layer structure configurable approximate bit width adder for artificial neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Document published April 30, 2013: Sam Skalicky, "PAPER ACCEPTED: FPGA PIPELINE MODEL", https://samskalicky.wordpress.com/2013/04/27/paper-accepted-fpga-pipeline-model/ *

Also Published As

Publication number Publication date
JP2022527318A (en) 2022-06-01
JP7250953B2 (en) 2023-04-03
CN113222126A (en) 2021-08-06
KR20210131417A (en) 2021-11-02
WO2021147602A1 (en) 2021-07-29
CN113222126B (en) 2022-01-28
TW202129553A (en) 2021-08-01
