CN116431214A - Instruction set device for reconfigurable deep neural network accelerator

Instruction set device for reconfigurable deep neural network accelerator

Info

Publication number
CN116431214A
CN116431214A (application number CN202310334605.3A)
Authority
CN
China
Prior art keywords: module, configuration, hardware, instruction, neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310334605.3A
Other languages
Chinese (zh)
Inventor
梁云
贾连成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University
Priority to CN202310334605.3A
Publication of CN116431214A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 - Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 - Arrangements for executing specific machine instructions
    • G06F8/00 - Arrangements for software engineering
    • G06F8/40 - Transformation of program code
    • G06F8/41 - Compilation
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT]
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses an instruction set device for a reconfigurable deep neural network accelerator. The accelerator comprises an instruction controller and a plurality of hardware modules, including an input/output module, a matrix computation module and a vector computation module. A microkernel programming paradigm provides multi-level hardware configuration: a computing task of the deep neural network accelerator is compiled into a plurality of microkernels, and each microkernel is encoded as a plurality of hardware instructions. Each hardware instruction performs module hardware configuration control and time-level configuration control for a specific computation or data-movement operation, and contains the following fields: instruction type, module type, configuration address, dependency flags and module configuration payload. By using hardware instructions to represent the dataflow reconfiguration and functional reconfiguration of the reconfigurable deep neural network accelerator, the invention realizes efficient programming of various complex reconfigurable-function neural network hardware accelerators.

Description

Instruction set device for reconfigurable deep neural network accelerator
Technical Field
The invention relates to hardware instruction set architecture technology, and in particular to an instruction set interaction interface device for a reconfigurable neural network accelerator.
Background
An instruction set architecture (ISA), also known simply as an instruction set, is the programming-related portion of a computer architecture, covering basic data types, the instruction set, registers, addressing modes, the memory system, interrupts, exception handling and external I/O. An ISA comprises a set of opcodes (machine language) and the basic commands executed by a particular processor.
Deep neural network (DNN) accelerators are a new type of computer hardware architecture for efficiently processing various neural network applications. Compared with a traditional computer, a DNN accelerator has the following characteristics. (1) High parallelism: the accelerator contains thousands of processing elements (PEs) arranged in a rectangular or tree-shaped interconnection array, with data transferred between PEs through hardware dataflows. (2) Limited algorithm support: a DNN accelerator typically only needs to support DNN operations such as matrix multiplication, convolution and activation functions, and does not need to support general-purpose programs. (3) Simple control logic: a DNN accelerator usually acts as a subsystem of a complete computer and therefore does not need to support the full functionality of one, such as complex branch control or interrupts. (4) Explicit memory access: unlike traditional computers, which cache data through multi-level caches, a DNN accelerator uses an explicit memory access mechanism; the user must specify, through instructions, the exact location of data in each level of memory and the order in which each level of memory is accessed in each cycle.
The reconfigurable DNN accelerator is a novel DNN accelerator hardware structure that implements one or more reconfigurable features: dataflow reconfiguration, functional reconfiguration and multi-module reconfiguration. Dataflow reconfiguration dynamically adjusts how data is transferred within the PE array. Functional reconfiguration dynamically adjusts the algorithm implemented by the ALUs. Multi-module reconfiguration runs different DNN computation tasks on multiple sub-modules.
Recent work has designed instruction set architectures for DNN accelerators, including Cambricon [1], VTA [2] and Gemmini [3]. Each of these instruction set architectures is adapted to specific DNN accelerator hardware, but existing instruction sets have limited support for reconfigurable DNN accelerators, typically supporting only one reconfigurable feature. For accelerators with multiple reconfigurable features, a new instruction set architecture is needed that supports the multiple reconfigurable features of multiple DNN accelerators, thereby improving the operating efficiency of the accelerator.
References
[1] Liu, Shaoli, Zidong Du, Jinhua Tao, Dong Han, Tao Luo, Yuan Xie, Yunji Chen, and Tianshi Chen. "Cambricon: An instruction set architecture for neural networks." ACM SIGARCH Computer Architecture News 44, no. 3 (2016): 393-405.
[2] Moreau, Thierry, Tianqi Chen, Luis Vega, Jared Roesch, Eddie Yan, Lianmin Zheng, Josh Fromm et al. "A hardware-software blueprint for flexible deep learning specialization." IEEE Micro 39, no. 5 (2019): 8-16.
[3] Genc, Hasan, et al. "Gemmini: Enabling systematic deep-learning architecture evaluation via full-stack integration." 2021 58th ACM/IEEE Design Automation Conference (DAC). IEEE, 2021.
Disclosure of Invention
To overcome the deficiencies of the prior art, the invention provides an instruction set device for a reconfigurable deep neural network accelerator: an instruction set architecture that supports complex reconfigurable-function neural network accelerators. The dataflow reconfiguration and functional reconfiguration features of the neural network accelerator are represented by hardware instructions, which enables efficient programming of complex neural network hardware accelerators, reduces the length of programming code and supports higher hardware operating efficiency.
For convenience, the following abbreviations are used:
PE (Processing Element): processing element, i.e. computing unit
DMA (Direct Memory Access): direct memory access
FSM (Finite State Machine): finite state machine
SRAM (Static Random-Access Memory): static random-access memory
DRAM (Dynamic Random-Access Memory): dynamic random-access memory
The technical scheme of the invention is an instruction set apparatus for a reconfigurable deep neural network accelerator, as detailed below.
Compared with the prior art, the invention has the following beneficial effects:
Existing instruction set architectures are typically able to support only one type of reconfigurable DNN accelerator hardware. The instruction set architecture provided by the invention supports multiple hardware reconfiguration features, thereby improving the programming efficiency of reconfigurable hardware accelerators.
Detailed Description
The invention is further described by the following examples, which are not intended to limit the scope of the invention in any way.
The invention provides an instruction set device for a reconfigurable deep neural network accelerator that supports various complex reconfigurable-function neural network accelerators by using instructions to represent the accelerator's dataflow reconfiguration and functional reconfiguration features.
The instruction set device defines an instruction format comprising the following fields: instruction type, module type, configuration address, dependency flags and module configuration payload.
The present invention designs an instruction set architecture for a reconfigurable deep neural network accelerator. The accelerator comprises an instruction controller and a plurality of hardware modules, namely an input/output module, a matrix computation module and a vector computation module, and can process various deep neural network computing tasks.
Overall format of instruction set architecture:
The instruction set architecture of the present invention employs a microkernel programming paradigm to provide multi-level hardware configuration. The overall deep neural network (DNN) computing task is compiled into a plurality of microkernels, and each microkernel is encoded as a plurality of hardware instructions. Each hardware instruction performs module hardware configuration control and time-level configuration control for a specific computation or data-movement operation. The module hardware configuration determines the dataflow and function of each hardware module in the reconfigurable deep neural network accelerator. The time-level configuration programs the finite state machines (FSMs) in the accelerator's computation modules and input/output (DMA) modules to realize time-level control of the multi-layer nested-loop algorithms and data-transfer tasks of the deep neural network running on the accelerator.
Table 1 Instruction set architecture format
Bits | Field
0-1 | Instruction type
2-3 | Module type
4-11 | Dependency flags
12-15 | Configuration address
16-127 | Configuration payload
Table 1 shows the format of the 128-bit instruction set architecture (ISA) adopted by the invention. Each hardware instruction contains 5 fields: Inst Type (instruction type), 2 bits; Module Type, 2 bits; Dep. Flags (dependency flags), 8 bits; Config Addr (configuration address), 4 bits; and Config Payload (configuration payload), 112 bits. The meaning and purpose of each field are described below.
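As an illustration only (not part of the patent text), the Table 1 layout can be expressed as a bit-packing routine. The following is a minimal Python sketch assuming the fields are packed from the least significant bit in table order; all function and parameter names are hypothetical:

    def encode_instruction(inst_type: int, module_type: int,
                           dep_flags: int, config_addr: int,
                           payload: int) -> int:
        """Pack the five ISA fields of Table 1 into one 128-bit word.

        Assumed bit layout (low to high): inst_type (2 bits),
        module_type (2), dep_flags (8), config_addr (4), payload (112).
        """
        assert inst_type < (1 << 2) and module_type < (1 << 2)
        assert dep_flags < (1 << 8) and config_addr < (1 << 4)
        assert payload < (1 << 112)
        return (inst_type
                | (module_type << 2)
                | (dep_flags << 4)
                | (config_addr << 12)
                | (payload << 16))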
1) Instruction type, module type, configuration address
The first two bits (Inst Type) determine the type of the hardware instruction, which is one of two: a) pure configuration, or b) configuration-execution. If the type is configuration-execution, then after writing the corresponding configuration the instruction controller sends a start signal to the corresponding hardware module (input/output module, matrix computation module or vector computation module) to invoke the microkernel, and waits for it to complete. Otherwise, the configuration of the microkernel is not yet complete, and the controller fetches the next instruction from the user's program and continues configuring. A two-dimensional table (Table 2) configures each hardware module with different contents.
In each instruction, the module type field selects the column index of Table 2 and indicates the number of the hardware module of the reconfigurable deep neural network accelerator to be configured. The hardware module type is one of the following 4 types: a) input module; b) matrix computation module; c) vector computation module; d) output module, with module numbers 0, 1, 2 and 3 respectively. The configuration address field selects the row index of Table 2 and indicates the configuration type inside the hardware module. When a microkernel computing task is translated (encoded) into hardware instructions, each valid configuration address of the hardware module being configured must be set by one hardware instruction. Addresses 3 to 7 of the input and output modules are invalid, so only addresses 0 to 2 need to be configured, requiring 3 instructions; the matrix and vector computation modules must configure all addresses 0 to 7, requiring 8 instructions in total. For example, when programming a microkernel containing K instructions, the first K-1 instructions use the pure-configuration type and the last instruction uses the configuration-execution type to initiate hardware execution; a sketch of this emission scheme follows.
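As a purely illustrative sketch (the numeric encodings of the Inst Type values below are assumptions, not taken from the patent), a microkernel can be emitted as one instruction per valid configuration address, reusing the hypothetical encode_instruction helper above:

    PURE_CONFIG, CONFIG_EXECUTE = 0, 1   # assumed Inst Type encodings
    INPUT_MODULE, MATRIX_MODULE, VECTOR_MODULE, OUTPUT_MODULE = 0, 1, 2, 3

    def encode_microkernel(module_type: int, dep_flags: int,
                           payloads: dict[int, int]) -> list[int]:
        """Encode one microkernel as K hardware instructions.

        `payloads` maps each valid configuration address of the module
        to its 112-bit payload: addresses 0-2 for input/output modules,
        addresses 0-7 for the matrix and vector computation modules.
        The first K-1 instructions are pure configuration; the last is
        configuration-execution and starts the module.
        """
        addrs = sorted(payloads)
        insts = []
        for i, addr in enumerate(addrs):
            itype = CONFIG_EXECUTE if i == len(addrs) - 1 else PURE_CONFIG
            insts.append(encode_instruction(itype, module_type,
                                            dep_flags, addr, payloads[addr]))
        return insts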
2) Dependency flag
We use an 8-bit dependency flags field to encode the dependencies between different microkernels. When bit x (counting from the low end) of the lower 4 bits is 1, the instruction must wait for the ready signal sent on completion of the module numbered x before it starts executing; when bit x of the upper 4 bits is 1, a ready signal is sent to module x after the instruction completes. For example, if a vector computation instruction depends on a read instruction, the dependency flags of the vector computation module are encoded as 00000001: bit 0 of the lower 4 bits is 1, indicating that the vector computation instruction waits for the ready signal from module number 0 (the read module). Correspondingly, the dependency flags of the read module are set to 01000000: bit 2 of the upper 4 bits is 1, indicating that after the read instruction completes, a ready signal is sent to module number 2 (the vector computation module).
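A hypothetical helper (names assumed, not from the patent) that builds this 8-bit field and reproduces the example above:

    def make_dep_flags(wait_on=(), signal_to=()) -> int:
        """Build the 8-bit dependency flags field.

        Lower 4 bits: module numbers whose ready signal this
        instruction waits on. Upper 4 bits: module numbers that
        receive a ready signal when this instruction completes.
        """
        flags = 0
        for m in wait_on:
            flags |= 1 << m          # wait on module m (bits 0-3)
        for m in signal_to:
            flags |= 1 << (4 + m)    # signal module m (bits 4-7)
        return flags

    # The example from the text: the vector module (number 2) waits on
    # the read module (number 0), and the read module signals module 2.
    assert make_dep_flags(wait_on=[0]) == 0b00000001
    assert make_dep_flags(signal_to=[2]) == 0b01000000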
Table 2 Configuration content of each module at different addresses
Configuration address | Input/output modules | Matrix computation module | Vector computation module
0 | Interconnect configuration mask | Interconnect configuration mask | Interconnect configuration mask
1 | Global interconnect network configuration | Global interconnect network configuration | Global interconnect network configuration
2 | DRAM base address, SRAM base address, read/write length | Reset bit, dataflow | Operation configuration and dataflow configuration
3 | Invalid | Loop ranges | Loop ranges
4-7 | Invalid | Operand base addresses and access strides | Operand base addresses and access strides
3) Configuring content
Table 2 shows the configuration content of each type of module at different addresses. The details of each configuration are described below.
3.1 Global interconnect network configuration
All modules share the same global interconnect network configuration register at address 1. This register determines, for each memory, which module it is written by and which module it is read by. Each memory uses 4 bits of interconnect configuration information: the first two bits are the write information, i.e. the type of the module that writes to the memory; the next two bits are the read information, i.e. the type of the module that reads the memory. The correspondence between module types and values is shown in Table 3. Each instruction changes the configuration information of only some of the memories and leaves the rest unchanged, so the unchanged positions are filled with 0. Since this register is shared by all modules, writes to it are masked (mask bits at address 0) to avoid conflicts; accordingly, for every position filled with a non-zero value, the corresponding mask bits must be set to 3.
Table 3 Global interconnect network configuration values corresponding to module types
Module type | Value
Matrix computation module | 1
Vector computation module | 2
Input/output module | 3
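A hypothetical sketch (not from the patent) of building this register's value and its mask: the bit order within each 4-bit group and all names are assumptions.

    MATRIX, VECTOR, IO = 1, 2, 3  # module values from Table 3

    def interconnect_config(assignments: dict[int, tuple[int, int]]):
        """Return (config, mask) words for the shared interconnect register.

        `assignments` maps a memory index to (writer, reader) module
        values from Table 3. Each memory occupies 4 bits: writer in the
        low 2 bits, reader in the high 2 bits (assumed order). Untouched
        memories stay 0 in both words; each touched 2-bit field gets
        mask value 3, as the text requires.
        """
        config = mask = 0
        for mem, (writer, reader) in assignments.items():
            shift = 4 * mem
            config |= (writer | (reader << 2)) << shift
            mask |= (0b11 | (0b11 << 2)) << shift
        return config, mask

    # Example: memory 0 is written by the I/O module and read by the
    # matrix module; all other memories are left unchanged.
    cfg, msk = interconnect_config({0: (IO, MATRIX)})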
3.2 Module-specific configuration
At address 2, each module configures its specific runtime information. For the input and output modules, the configuration content comprises a DRAM base address (32 bits), an SRAM base address (16 bits) and a read/write length (16 bits). For the matrix computation module, the configuration content comprises a reset bit (determining whether the output matrix is reset: 1 resets, 0 does not) and a dataflow bit (determining how data is transferred within the matrix computation module: 1 is output-stationary, 0 is weight-stationary). For the vector computation module, the operation configuration and the dataflow configuration each contain several items, discussed separately below; a sketch of the input/output and matrix payloads follows.
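For illustration only, the address-2 payloads just described can be packed as follows (the field order within the payload is an assumption; names are hypothetical):

    def io_module_config(dram_base: int, sram_base: int, length: int) -> int:
        """Pack the input/output module's address-2 payload: DRAM base
        (32 bits), SRAM base (16 bits), read/write length (16 bits)."""
        assert dram_base < (1 << 32) and sram_base < (1 << 16)
        assert length < (1 << 16)
        return dram_base | (sram_base << 32) | (length << 48)

    def matrix_module_config(reset: bool, output_stationary: bool) -> int:
        """Pack the matrix module's address-2 payload: reset bit and
        dataflow bit (1 = output-stationary, 0 = weight-stationary)."""
        return int(reset) | (int(output_stationary) << 1)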
3.3 Operation configuration of the vector computation module
There may be multiple vector computation modules in the accelerator. For a computation that uses an immediate, the "use immediate" item must be configured to 1 and the "immediate value" item must be configured to the immediate required by the computation; otherwise, "use immediate" is configured to 0. In addition, the opcode, the source port of each input operand and the output destination port are configured. The operation configuration content of each vector computation module is shown in Table 4.
Table 4 Operation configuration content and bit width of each item
Configuration item | Bit width
Use immediate | 1
Immediate value | 16
Opcode | 4
Input operand A source port | 2
Input operand B source port | 2
Input operand C source port | 2
Output destination port | 2
The configuration value of each opcode is shown in Table 5. For example, when the operation of the vector computation module is set to Mul, the opcode position of its configuration must be set to 7.
Table 5 Opcode definitions and corresponding configuration values
Opcode | Configuration value
Min | 1
Max | 2
Add | 3
Sub | 4
Shl | 5
Shr | 6
Mul | 7
Mac | 8
exp | 9
log | 10
sigmoid | 11
tanh | 12
nop | 13
For the input operand A source, input operand B source, input operand C source and output destination, the configuration values represent port numbers, as shown in Table 6.
Table 6 Configuration values for input operand sources and output destinations
Port | Configuration value
No input / no output | 0
Port 1 | 1
Port 2 | 2
Port 3 | 3
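A hypothetical packing of the Table 4 layout (assuming the fields occupy consecutive bits in table order from the least significant bit; all names are illustrative, not from the patent):

    OPCODES = {"Min": 1, "Max": 2, "Add": 3, "Sub": 4, "Shl": 5, "Shr": 6,
               "Mul": 7, "Mac": 8, "exp": 9, "log": 10, "sigmoid": 11,
               "tanh": 12, "nop": 13}  # Table 5

    def vector_op_config(opcode: str, src_a: int, src_b: int, src_c: int,
                         dst: int, immediate: int | None = None) -> int:
        """Pack the vector module's operation configuration per Table 4:
        use-immediate (1 bit), immediate (16), opcode (4), source ports
        A/B/C (2 bits each), destination port (2 bits).
        Port value 0 means no input / no output (Table 6)."""
        use_imm = immediate is not None
        imm = immediate if use_imm else 0
        return (int(use_imm)
                | (imm << 1)
                | (OPCODES[opcode] << 17)
                | (src_a << 21) | (src_b << 23) | (src_c << 25)
                | (dst << 27))

    # Example: an elementwise multiply (Mul = 7) reading ports 1 and 2
    # and writing port 1, with no immediate and no operand C.
    cfg = vector_op_config("Mul", src_a=1, src_b=2, src_c=0, dst=1)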
3.4 Dataflow configuration of the vector computation module
The dataflow configuration defines the dataflow used by each operand of the vector computation module. The accelerator may contain multiple vector computation modules, but they all share the same dataflow configuration. The bit width of each configuration item is shown in Table 7.
Table 7 Dataflow configuration content and bit width of each item
Configuration item | Bit width
Input operand A dataflow | 2
Input operand B dataflow | 2
Input operand C dataflow | 2
Output operand dataflow | 2
The correspondence between the dataflow of input operand A and the configuration values is shown in Table 8.
Table 8 Configuration values for the dataflow of input operand A
Operand A dataflow | Configuration value
Horizontal multicast | 0
Horizontal systolic | 1
Horizontal systolic, vertical multicast | 2
Horizontal unicast, vertical multicast | 3
The correspondence between the dataflows of input operands B and C and of the output operand and the configuration values is shown in Table 9.
Table 9 configuration values corresponding to the data stream of input operand B or C and output operand
[Table 9 appears only as an image in the original publication; its contents are not reproduced here.]
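Whatever the concrete Table 9 values, Table 7 specifies four 2-bit fields; a hypothetical packing (bit order assumed, names illustrative):

    def vector_dataflow_config(df_a: int, df_b: int, df_c: int,
                               df_out: int) -> int:
        """Pack the shared vector dataflow configuration per Table 7:
        2 bits per operand dataflow, in table order (assumed)."""
        for df in (df_a, df_b, df_c, df_out):
            assert 0 <= df < 4
        return df_a | (df_b << 2) | (df_c << 4) | (df_out << 6)

    # Example: operand A uses "horizontal systolic, vertical multicast"
    # (value 2 from Table 8); the values for B, C and the output would
    # come from Table 9, which is not reproduced in this excerpt.
    cfg = vector_dataflow_config(2, 0, 0, 0)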
3.5 Loop and memory access configuration
The ISA designed by the invention uses the remaining configuration registers (addresses 3 to 7) to handle the control and data accesses of an arbitrary 4-level perfect loop nest in the vector and matrix computation modules, including the range of each loop level, the on-chip buffer base addresses and the access stride of each loop level. The range of each loop level is configured at address 3; there are 4 loop levels, each occupying 16 bits, 64 bits in total.
Addresses 4 to 7 configure, for each operand, the base address used to access memory and the access stride of each loop level. The configuration address used by each operand is shown in Table 10.
Table 10 configuration Address used per operand
[Table 10 appears only as an image in the original publication; its contents are not reproduced here.]
For each operand, the memory address Addr it accesses can be expressed by the following formula:
Addr = Base + St1 × Idx1 + St2 × Idx2 + St3 × Idx3 + St4 × Idx4
where Idx1 to Idx4 are the indices of the 4 loop levels, generated dynamically while the accelerator runs. For each operand, Base and St1 to St4 are configured at the operand's corresponding configuration address; each value occupies 16 bits, 80 bits in total.
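The loop-range and per-operand payloads, and the address computation itself, can be sketched as follows (packing order assumed; names hypothetical):

    def pack_loop_ranges(ranges: list[int]) -> int:
        """Address-3 payload: the range of each of the 4 loop levels,
        16 bits per level (64 bits total)."""
        assert len(ranges) == 4 and all(r < (1 << 16) for r in ranges)
        out = 0
        for i, r in enumerate(ranges):
            out |= r << (16 * i)
        return out

    def pack_operand_access(base: int, strides: list[int]) -> int:
        """Address-4..7 payload for one operand: Base and St1..St4,
        16 bits each (80 bits total)."""
        assert base < (1 << 16) and len(strides) == 4
        out = base
        for i, st in enumerate(strides):
            out |= st << (16 * (i + 1))
        return out

    def access_address(base: int, strides: list[int], idx: list[int]) -> int:
        """Addr = Base + St1*Idx1 + St2*Idx2 + St3*Idx3 + St4*Idx4."""
        return base + sum(st * ix for st, ix in zip(strides, idx))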
The instruction set architecture provided by the invention offers a software/hardware programming interface and supports the controllers, dataflows and interconnection modes required by the hardware. On the software side, the deep neural network runtime framework should include a compiler for the reconfigurable accelerator that generates configuration code conforming to the definitions of this instruction set. On the hardware side, the reconfigurable deep neural network accelerator should implement the corresponding functions according to the configurations described in the invention, including the instruction controller, dataflows, operations and interconnect configuration, so that a variety of deep neural network applications can run efficiently.
It should be noted that the purpose of the disclosed embodiments is to aid further understanding of the present invention, but those skilled in the art will appreciate that: various alternatives and modifications are possible without departing from the scope of the invention and the appended claims. Therefore, the invention should not be limited to the disclosed embodiments, but rather the scope of the invention is defined by the appended claims.

Claims (10)

1. An instruction set device for a reconfigurable deep neural network accelerator, characterized in that the reconfigurable deep neural network accelerator comprises an instruction controller and a plurality of hardware modules, the hardware modules comprising an input/output module, a matrix computation module and a vector computation module; dataflow reconfiguration and functional reconfiguration of the reconfigurable deep neural network accelerator are represented by hardware instructions, thereby realizing efficient programming of various complex reconfigurable-function neural network hardware accelerators;
the instruction set device adopts a microkernel programming paradigm to provide multi-level hardware configuration; compiling a computing task of the deep neural network accelerator into a plurality of microkernels, wherein each microkernel is encoded into a plurality of hardware instructions; each hardware instruction is used for module hardware configuration control and time-plane configuration control of specific computation or data movement operation;
the instruction set adopts a 128-bit instruction set architecture (ISA) format; each hardware instruction includes the following fields: instruction type, module type, configuration address, dependency flags and module configuration payload;
the instruction type of a hardware instruction is either pure configuration or configuration-execution; if the type is configuration-execution, then after writing the corresponding configuration the instruction controller sends a start signal to the corresponding hardware module to invoke the microkernel and waits for the microkernel to finish; otherwise, the next instruction is fetched and configuration continues;
the module type field of each instruction is used for representing the module number corresponding to the hardware module type which needs to be configured by the reconfigurable deep neural network accelerator; the hardware module types include: an input module; a matrix calculation module; a vector calculation module; an output module;
the configuration address field indicates the configuration type inside the hardware module; each valid configuration address of the hardware module being configured is set by one hardware instruction;
the dependency flags field encodes the dependencies between different microkernels; when bit x (counting from the low end) of the lower 4 bits is 1, the hardware instruction starts executing only after receiving the ready signal sent on completion of the module numbered x; when bit x of the upper 4 bits is 1, a ready signal is sent to module x after the hardware instruction completes;
the module configuration payload comprises the global interconnect network configuration, module-specific configuration, operation configuration of the vector computation module, dataflow configuration of the vector computation module, and loop and memory access configuration; all modules share the same global interconnect network configuration, which determines which module each memory is read by and written by; the module-specific configuration includes the module's specific runtime configuration information; the operation configuration of the vector computation module configures the operation performed by the vector computation module; the dataflow configuration of the vector computation module defines the dataflow used by each operand of the vector computation module, and a plurality of vector computation modules share the same dataflow configuration; the loop and memory access configuration handles the control and data accesses of an arbitrary 4-level perfect loop nest in the vector computation module and the matrix computation module, including the range of each loop level, on-chip buffer base addresses and the access stride of each loop level.
2. The instruction set apparatus for a reconfigurable deep neural network accelerator of claim 1, wherein the module hardware configuration determines the dataflow and function of each hardware module in the reconfigurable deep neural network accelerator; the time-level configuration programs the finite state machines in the computation modules and input/output modules of the accelerator to realize time-level control of the multi-layer nested-loop algorithms and data-transfer tasks of the deep neural network running on the accelerator.
3. The instruction set apparatus for a reconfigurable deep neural network accelerator of claim 1, wherein, among the fields of each hardware instruction, the instruction type occupies 2 bits; the module type occupies 2 bits; the dependency flags occupy 8 bits; the configuration address occupies 4 bits; and the configuration payload occupies 112 bits.
4. The instruction set apparatus for a reconfigurable deep neural network accelerator of claim 3, wherein the hardware module types have module numbers of 0, 1, 2 and 3, respectively.
5. The instruction set apparatus for a reconfigurable deep neural network accelerator of claim 4, wherein, when a microkernel computing task is translated into hardware instructions, each valid configuration address in the hardware module being configured is set by one hardware instruction; addresses 3 to 7 of the input module and the output module are invalid, so only addresses 0 to 2 need to be configured, requiring 3 instructions; the matrix and vector computation modules must configure all addresses 0 to 7, requiring 8 instructions in total.
6. The instruction set apparatus for a reconfigurable deep neural network accelerator of claim 5, wherein, when programming a microkernel containing K instructions, the first K-1 instructions are of the pure-configuration type and the last instruction is of the configuration-execution type to initiate hardware execution.
7. The instruction set apparatus for a reconfigurable deep neural network accelerator of claim 5, wherein each memory uses 4 bits of global interconnect network configuration information; the first two bits are the write information, i.e. the type of the module that writes to the memory; the next two bits are the read information, i.e. the type of the module that reads the memory.
8. The instruction set apparatus for a reconfigurable deep neural network accelerator of claim 7, wherein, in the module-specific configuration, for the input module and the output module, the configuration content includes a DRAM base address, an SRAM base address and a read/write length; for the matrix computation module, the configuration content includes a reset bit and a dataflow bit.
9. The instruction set apparatus for a reconfigurable deep neural network accelerator of claim 7, wherein, in the module-specific configuration, for the vector computation module, the operation configuration and the dataflow configuration each comprise a plurality of items, including: whether an immediate is used, the immediate value, the opcode, the source port of each input operand and the output destination port.
10. The instruction set apparatus for a reconfigurable deep neural network accelerator of claim 9, wherein the loop and memory access configuration specifically uses the configuration registers at addresses 3 to 7 to handle the control and data accesses of an arbitrary 4-level perfect loop nest in the vector computation module and the matrix computation module; the range of each loop level is configured at address 3, with 4 loop levels in total, each occupying 16 bits, 64 bits in total.
CN202310334605.3A 2023-03-31 2023-03-31 Instruction set device for reconfigurable deep neural network accelerator Pending CN116431214A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310334605.3A CN116431214A (en) 2023-03-31 2023-03-31 Instruction set device for reconfigurable deep neural network accelerator


Publications (1)

Publication Number Publication Date
CN116431214A true CN116431214A (en) 2023-07-14

Family

ID=87084830


Country Status (1)

Country Link
CN (1) CN116431214A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination