WO2018171715A1 - Automated design method and system applicable for neural network processor - Google Patents

Automated design method and system applicable for neural network processor

Info

Publication number
WO2018171715A1
WO2018171715A1 (PCT/CN2018/080200)
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
hardware
file
data
instruction
Prior art date
Application number
PCT/CN2018/080200
Other languages
French (fr)
Chinese (zh)
Inventor
韩银和
许浩博
王颖
Original Assignee
中国科学院计算技术研究所
Priority date
Filing date
Publication date
Application filed by 中国科学院计算技术研究所
Publication of WO2018171715A1 publication Critical patent/WO2018171715A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/30Circuit design
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks

Definitions

  • the present invention relates to the technical field of neural network processor architecture, and in particular to an automatic design method and system for a neural network processor.
  • the existing neural network hardware acceleration technology includes an Application Specific Integrated Circuit (ASIC) chip and a Field Programmable Gate Array (FPGA).
  • under the same process conditions, the ASIC chip runs fast with low power consumption, but its design flow is complex, its tape-out cycle is long, and its development cost is high, so it cannot adapt to the rapid update of neural network models; the FPGA offers flexible circuit configuration and a short development cycle, but runs relatively slowly, with comparatively large hardware overhead and power consumption.
  • whichever hardware acceleration technology is used, neural network model and algorithm developers need to master hardware development techniques, including processor architecture design, hardware code writing, simulation verification, and placement and routing, while also understanding the network topology and data flow mode; these techniques are difficult for high-level application developers who focus on researching neural network models and structural design and lack hardware design capabilities.
  • the present invention provides an automatic design method and system for a neural network processor, so that high-level developers can efficiently develop neural network technology applications.
  • the present invention provides an automated design method for a neural network processor, the method comprising:
  • Step A: for a neural network model to be implemented as a hardware circuit, acquire the neural network model topology configuration file and the hardware resource constraint file of the target hardware circuit;
  • Step B: construct a neural network processor hardware architecture corresponding to the neural network model according to the neural network model topology configuration file and the hardware resource constraint file, and generate a hardware architecture description file;
  • Step C: generate a control description file for controlling data scheduling, storage, and calculation of the neural network processor according to the neural network model topology, the hardware resource constraint file, and the hardware architecture description file;
  • Step D: generate a hardware circuit description language corresponding to the neural network processor according to the hardware architecture description file and the control description file, so as to implement the hardware circuit of the neural network processor on the target hardware circuit.
  • the step B may include: acquiring the constituent units of the neural model and their associated specific hardware structures from a pre-established cell library according to the topology configuration file and the hardware resource constraints, wherein the cell library consists of various types of reusable units, each including a hardware description file and a configuration script describing its hardware structure; and setting the configuration scripts so obtained to produce the hardware structure description of each unit, thereby obtaining the hardware architecture description file.
  • the hardware resource constraint file may include one or more of the following: an operating frequency of the target hardware circuit, a target circuit area overhead, a target circuit power consumption overhead, a supported data precision, and a target circuit memory size.
  • the neural network model topology configuration file may include the number of neural network layers, the network size of each layer, the data bit width, the weight bit width, the current layer's function attribute, the current layer's number of input layers, the current layer's number of output layers, the current layer's convolution kernel size, the current layer's step size, and the next-layer connection property.
  • the step C may further include determining the convolution kernel splitting and data sharing manner by the following steps: (1) for a given neural network layer, if the convolution kernel size k equals the step value s, the weight sharing mode is adopted and the convolution kernel performs the convolution within a single-layer data map; (2) if the number of data layers is smaller than the computing unit width, the convolution kernel k is split into a plurality of convolution kernels k_s; if the number of data layers is greater than the computing unit width, a data sharing manner is adopted; (3) the calculation mode of the next neural network layer is determined, and the calculation result of the current layer is stored according to the convolution operation mode of the next neural network layer.
  • the control description file may include an instruction stream for controlling data scheduling, storage, and calculation of the neural network processor, wherein the instruction types include load/store instructions and operation instructions.
  • the load/store instructions may include instructions for data exchange between an external memory and the internal memory of the neural network processor, instructions for loading data and weights from the internal memory into the computing unit, and instructions for storing the computing unit's calculation results into memory; and the operation instructions include a convolution operation instruction, a pooling operation instruction, a local response normalization instruction, a clear instruction, and an excitation function operation instruction.
  • the format of the convolution instruction may include the following fields: an opcode marking the instruction type, a computing-core count marking the number of computing cores participating in the operation, an issue interval marking the interval between successive operations of the instruction, an operation mode marking intra-layer versus cross-layer convolution, and a destination register marking the storage location of the calculation result.
  • in another aspect, the present invention provides an automated design method for a neural network processor, comprising: step 1), acquiring the neural network model topology configuration file and the hardware resource constraint file; step 2), generating the neural network processor hardware architecture and a hardware architecture description file; step 3), optimizing data scheduling, storage, and calculation and generating the corresponding control description file; and step 4), searching the constructed neural network reusable cell library, according to the hardware architecture description file and the control description file, for the cells that meet the design requirements, generating the corresponding control logic and the corresponding hardware circuit description language, and converting the hardware circuit description language into a hardware circuit.
  • the neural network model topology configuration file may include the number of neural network layers, the network size of each layer, the data bit width, the weight bit width, the current layer's function attribute, the current layer's number of input layers, the current layer's number of output layers, the current layer's convolution kernel size, the current layer's step size, and the next-layer connection property.
  • the method further includes generating a control instruction stream while generating the neural network circuit model, the instruction types including load/store instructions and operation instructions.
  • the step 3) may include: performing convolution kernel partitioning and data partitioning according to the neural network model topology configuration file and generating a control state machine; and generating a control instruction stream according to the control state machine.
  • the hardware architecture description file includes the input data memory capacity, input memory bit width, weight memory capacity, weight memory bit width, bias memory capacity, bias memory bit width, output data memory capacity, output data memory bit width, data bit width, computing unit width, computing unit depth, data sharing flag bit, and weight sharing flag bit.
  • the present invention also provides an automated design system for a neural network processor, comprising:
  • a data acquisition module configured to acquire a neural network model topology configuration file and a hardware resource constraint file, where the hardware resource constraint file includes a target circuit area overhead, a target circuit power consumption overhead, and a target circuit operating frequency;
  • a hardware architecture description file generation module configured to generate a neural network processor hardware architecture according to the neural network model topology configuration file and the hardware resource constraint file, and to generate a hardware architecture description file;
  • a control description file generation module configured to optimize data scheduling, storage, and calculation according to the neural network model topology, the hardware resource constraint file, and the hardware architecture description file, and to generate a corresponding control description file;
  • a hardware circuit generation module configured to search the constructed neural network reusable cell library, according to the hardware architecture description file and the control description file, for the cells that meet the design requirements, to generate a corresponding hardware circuit description language, and to convert the hardware circuit description language into a hardware circuit.
  • the present invention also provides an optimization method based on the automated design method for a neural network processor described above, comprising:
  • Step (1): for a given neural network layer, if the convolution kernel size k equals the step value s, the weight sharing mode is adopted, and the convolution kernel performs the convolution within a single-layer data map;
  • Step (2): if the number of data layers is smaller than the computing unit width, the convolution kernel k is split into multiple convolution kernels k_s by convolution kernel splitting; if the number of data layers is greater than the computing unit width, a data sharing manner is adopted;
  • Step (3): the calculation mode of the next neural network layer is determined, and the calculation result of the current layer is stored according to the convolution operation mode of the next neural network layer.
  • the neural network model is mapped to hardware description language code for designing the hardware circuit, the designed hardware circuit structure and data storage scheme are automatically optimized according to the hardware resource constraints and network characteristics, and the corresponding control instruction stream is generated at the same time, realizing automated hardware and software co-design of the neural network hardware accelerator; this shortens the design cycle of the neural network processor, improves its performance, and meets the neural network operation requirements of upper-layer application developers.
  • Figure 1 shows a schematic diagram of a topology common to neural networks
  • Figure 2 shows a schematic block diagram of a neural network convolution operation
  • Figure 3 shows a schematic block diagram of a common neural network processor structure
  • FIG. 4 is a schematic diagram of an automated design flow of a neural network processor in accordance with one embodiment of the present invention.
  • FIG. 5 is a schematic diagram of a compiler workflow according to an embodiment of the present invention.
  • FIG. 6 is a flow chart of a control state machine for performing a convolution operation by a neural network processor in accordance with one embodiment of the present invention
  • FIG. 7 is a schematic diagram of the operation of a convolution kernel in a weight sharing mode according to an embodiment of the present invention.
  • FIG. 8 is a schematic diagram of a convolution kernel split according to an embodiment of the present invention.
  • Figure 9 is a diagram of an instruction format of a load/store instruction
  • Figure 10 is a diagram of the instruction format of an operation instruction.
  • the neural network is a mathematical model of the structure and behavioral activity of the human brain. It is usually divided into an input layer, hidden layers, and an output layer, each layer composed of multiple neuron nodes; the output values of the neuron nodes in one layer are passed as inputs to the neuron nodes of the next layer, connecting layer by layer.
  • the neural network itself has bionic characteristics, and its multi-layer abstract iterative process resembles the information processing of the human brain and other sensory organs.
  • Figure 1 shows a common topology diagram of a neural network.
  • the input values to the first layer of the multilayer neural network structure are the original image (the "original image" in the present invention refers to the raw data to be processed, not only an image captured by photographing in the narrow sense).
  • the convolution operation process in the neural network is generally as shown in Fig. 2: a K*K two-dimensional weight convolution kernel scans the feature map; during the scan, the inner products of the kernel weights and the corresponding feature elements in the feature map are computed, and all inner product values are summed to obtain an output-layer feature element; in a convolutional layer, N convolution kernels of K*K size are convolved with the feature maps in this way, and the inner product values are summed to obtain the output-layer feature elements.
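For illustration, the scan-and-accumulate process just described can be sketched in plain Python; the array shapes, unit stride, and absence of padding below are assumptions for the example, not prescriptions from the patent:

```python
import numpy as np

def conv_layer(feature_maps: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Convolve N K*K kernels over C input feature maps (stride 1, no padding).

    feature_maps: (C, H, W) input data
    kernels:      (N, C, K, K) trained weights
    returns:      (N, H-K+1, W-K+1) output feature maps
    """
    c, h, w = feature_maps.shape
    n, _, k, _ = kernels.shape
    out = np.zeros((n, h - k + 1, w - k + 1))
    for o in range(n):                      # one output map per kernel
        for y in range(h - k + 1):
            for x in range(w - k + 1):
                # inner product of the kernel and the K*K window,
                # summed over all C input maps
                window = feature_maps[:, y:y + k, x:x + k]
                out[o, y, x] = np.sum(window * kernels[o])
    return out

print(conv_layer(np.ones((3, 8, 8)), np.ones((4, 3, 3, 3))).shape)  # (4, 6, 6)
```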
  • the neural network operation may also include pooling, normalization calculation, and the like.
  • hardware acceleration techniques are usually used to construct a dedicated neural network processor to implement neural network computing.
  • common hardware acceleration technologies include ASIC or FPGA.
  • FPGAs are more flexible from a design perspective.
  • hardware description languages such as Verilog HDL, VHDL (Very High Speed Integrated Circuit Hardware Description Language), or other hardware description languages can be used to define the internal logic structure and thereby implement custom hardware circuits.
  • Common neural network processors are based on storage-control-calculation logic structures.
  • the storage structure is configured to store the data participating in the calculation, the neural network weights, and the processor's operation instructions;
  • the control structure includes a decoding circuit and a control logic circuit that parse the operation instructions and generate control signals to schedule data within the processor and to control the storage and computation processes of the neural network; the computation structure is responsible for the neural network's computational operations.
  • the storage unit may store data transmitted from outside the neural network processor (for example, original feature map data), trained neural network weights, processing results or intermediate results generated in the calculation process, instruction information participating in the calculation, and the like.
  • FIG. 3 is a schematic diagram showing a conventional neural network processor system 101 including an input data storage unit 102, a control unit 103, an output data storage unit 104, a weight storage unit 105, an instruction storage unit 106, and a calculation unit 107.
  • the input data storage unit 102 is configured to store data participating in the calculation, the data includes original feature map data and data participating in the intermediate layer calculation;
  • the output data storage unit 104 stores the calculated neuron response value;
  • the instruction storage unit 106 stores the instruction information participating in the calculation, which is interpreted as a control flow to schedule the neural network calculation;
  • the weight storage unit 105 is configured to store the trained neural network weights.
  • the control unit 103 is connected to the output data storage unit 104, the weight storage unit 105, the instruction storage unit 106, and the calculation unit 107, respectively.
  • the control unit 103 is responsible for instruction decoding, data scheduling, process control, and the like; for example, it obtains the instructions stored in the instruction storage unit 106, parses them, schedules data according to the parsed control signals, and controls the computing unit to perform the relevant neural network operations.
  • the calculation unit 107 is configured to perform a corresponding neural network calculation according to a control signal generated by the control unit 103.
  • the computing unit 107 is associated with one or more storage units; it may obtain data for calculation from its associated input data storage unit 102 and may write output data to its associated output data storage unit 104.
  • the computing unit 107 performs most of the operations in the neural network algorithm, including vector multiply and add operations, pooling, normalization calculations, and the like.
  • the topology and parameter design of neural network models change with the application scenario and application requirements, and the models themselves evolve quickly; this poses great development challenges for high-level application developers of neural network models and algorithms, who not only need to quickly design or adjust hardware acceleration solutions for different application requirements, but must also understand hardware development technologies such as FPGA while mastering neural network models and algorithms, so development is very difficult.
  • an automated design method, system, or apparatus suitable for a neural network processor comprises a hardware generator and a compiler; the hardware generator automatically generates the hardware description language code of the neural network processor from the neural network model and the hardware resource constraints, so that hardware designers can subsequently produce the processor's hardware circuit from that code using existing hardware circuit design methods; the compiler generates the instruction streams that control the circuit structure and dispatch data.
  • the hardware generator may construct the neural network processor hardware architecture from the topology of the neural network model, the hardware resource constraint file, and the constructed neural network reusable cell library, and generate the hardware description language code according to the processor hardware architecture and the control state machine produced by the compiler.
  • the system may also include a pre-built neural network reusable cell library containing the basic units reusable across neural network models, including, for example but not limited to: a neuron unit, an accumulator unit, a pooling unit, a classifier unit, a local response normalization unit, a lookup table unit, an address generation unit, a control unit, and the like.
  • the specific hardware structure of each unit in the cell library is defined by the hardware description file associated with it.
  • the hardware description file for each unit can be described in Verilog HDL or other hardware description language.
  • each unit also has an associated configuration script by which its hardware structure can be adjusted; for example, configuring the bit width of the registers in the neuron unit, the number of adders in the adder tree unit, the number of comparators in the pooling unit, and so on.
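The patent does not specify the configuration-script format, so the following is a purely hypothetical sketch of how such per-unit parameters (register bit width, adder count, comparator count) might be modeled and overridden before a unit's HDL is emitted:

```python
# Hypothetical illustration only: the cell library's real script format is
# not given in the patent. Each reusable unit exposes named parameters that
# the generator sets before emitting the unit's hardware description.
UNIT_DEFAULTS = {
    "neuron_unit":  {"register_bit_width": 16},
    "adder_tree":   {"num_adders": 8},
    "pooling_unit": {"num_comparators": 4},
}

def configure_unit(unit_name: str, **overrides) -> dict:
    """Return the unit's configuration with any overrides applied."""
    config = dict(UNIT_DEFAULTS[unit_name])
    for key, value in overrides.items():
        if key not in config:
            raise KeyError(f"{unit_name} has no parameter {key!r}")
        config[key] = value
    return config

# e.g. widen the neuron unit's registers for 32-bit data:
print(configure_unit("neuron_unit", register_bit_width=32))
```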
  • FIG. 4 illustrates the workflow of an automated design system for a neural network processor according to an embodiment of the present invention, which may mainly include:
  • Step S1: the neural network model topology configuration file is read.
  • the neural network model topology configuration file is mainly used to describe a neural network model designed according to specific application requirements, including a network topology of the neural network model and various operational layer definitions.
  • the neural network model topology configuration file may include a number of neural network layers, a network size and structure of each layer, a data bit width, a weight bit width, a current layer function attribute, a current layer input layer number, a current layer output layer number, Current layer convolution kernel size, current layer step size, next layer connection properties, and so on.
  • each layer of the neural network includes one or more units, the unit types usually being basic neuron units, convolution units, pooling units, normalization units, recurrent units, and the like.
  • the neural network model description file may include three parts: basic attributes, a parameter description, and connection information, wherein the basic attributes may include the layer name, layer type, layer structure, and the like; the parameter description may include the number of output layers, the convolution kernel size, the step size, and the like; and the connection information may include the connection name, connection direction, connection type, and the like.
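As a purely illustrative sketch (the patent fixes the information content but not a file syntax), one layer entry combining the basic attributes, parameter description, and connection information might look like the following; all field names are assumptions:

```python
# Hypothetical layer entry of a topology configuration file.
layer_entry = {
    "basic": {
        "layer_name": "conv1",
        "layer_type": "convolution",
        "layer_structure": "feed-forward",
    },
    "parameters": {
        "num_input_layers": 3,     # input feature maps of the current layer
        "num_output_layers": 32,   # output feature maps of the current layer
        "kernel_size": 3,          # current layer convolution kernel size
        "stride": 1,               # current layer step size
        "data_bit_width": 16,
        "weight_bit_width": 8,
    },
    "connections": {
        "connection_name": "conv1_to_pool1",
        "connection_direction": "forward",
        "connection_type": "full",
    },
}
print(layer_entry["parameters"]["kernel_size"])
```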
  • Step S2: a hardware resource constraint file is read; it contains parameters describing the hardware resources available on the target hardware circuit on which the neural network processor is to be implemented, and may include, for example, the operating frequency of the target hardware circuit, the target circuit area overhead, the target circuit power consumption overhead, the supported data precision, the target circuit memory size, and so on. These hardware resource constraint parameters can be loaded into the system together in one constraint file.
  • Step S3: the hardware generator of the system constructs the neural network processor hardware architecture according to the neural network model topology configuration file and the hardware resource constraint file, and generates the corresponding hardware architecture description file.
  • the hardware architecture description file may include the hardware circuit structure, input data memory capacity, input memory bit width, weight memory capacity, weight memory bit width, bias memory capacity, bias memory bit width, output data memory capacity, output data memory bit width, data bit width, computing unit width, computing unit depth, data sharing flag, weight sharing flag, and so on.
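Viewed as data, the fields listed above suggest a simple record; the sketch below is one hypothetical rendering (field names and units are assumptions, not the patent's file format):

```python
from dataclasses import dataclass

@dataclass
class HardwareArchitecture:
    # memory parameters (capacities in bytes, widths in bits; units assumed)
    input_data_memory_capacity: int
    input_memory_bit_width: int
    weight_memory_capacity: int
    weight_memory_bit_width: int
    bias_memory_capacity: int
    bias_memory_bit_width: int
    output_data_memory_capacity: int
    output_data_memory_bit_width: int
    # datapath parameters
    data_bit_width: int
    compute_unit_width: int    # feature maps processed in parallel
    compute_unit_depth: int
    # scheduling flags
    data_sharing_flag: bool
    weight_sharing_flag: bool

arch = HardwareArchitecture(65536, 128, 32768, 64, 4096, 32, 65536, 128,
                            16, 16, 4, True, False)
print(arch.compute_unit_width)
```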
  • Step S4: the compiler of the system generates the instruction stream that controls the neural network processor's circuit structure and data dispatch; these processes can be described by a control state machine. For example, data scheduling, storage, and calculation are optimized according to the neural network model topology, the hardware resource constraints, and the hardware architecture description file, and the corresponding control description file is generated.
  • FIG. 5 shows, according to an embodiment of the present invention, the workflow of the compiler, which generates a control instruction stream from the neural network topology, the constructed hardware architecture, and the hardware resource constraint file to perform real-time control of the neural network processor.
  • the workflow may include: step a1, reading the neural network topology configuration file, the hardware architecture description file, and the hardware resource constraint file.
  • in step a2, the compiler performs scheduling optimizations such as convolution kernel partitioning and data partitioning according to the above files, and generates a control state machine.
  • the control state machine can be used to schedule the operational state of the neural network processor hardware circuitry to be implemented.
  • in step a3, a control instruction stream for the neural network processor is generated from the control state machine.
  • Figure 6 depicts a partial control state machine flow diagram with a neural network processor performing a convolution operation as an example.
  • the control logic first reads the neural network data and weight data from the external memory into the internal memory, then loads the neural network data, bias data, and weight data involved in the convolution into the computing unit, controls the computing unit to perform the multiply-add and accumulate operations, and repeats these load and compute operations until all corresponding data have been processed.
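A minimal software model of this load-compute-repeat loop might look like the following sketch (state names and the block-count termination condition are assumptions; FIG. 6 defines the actual flow):

```python
from enum import Enum, auto

class State(Enum):
    LOAD_FROM_EXTERNAL = auto()   # external memory -> internal memory
    LOAD_TO_COMPUTE = auto()      # data / bias / weights -> computing unit
    MULTIPLY_ADD = auto()
    ACCUMULATE = auto()
    DONE = auto()

def convolution_control(num_blocks: int):
    """Yield the state sequence for a convolution over `num_blocks` blocks."""
    yield State.LOAD_FROM_EXTERNAL
    for _ in range(num_blocks):       # repeat until all data are processed
        yield State.LOAD_TO_COMPUTE
        yield State.MULTIPLY_ADD
        yield State.ACCUMULATE
    yield State.DONE

for state in convolution_control(num_blocks=2):
    print(state.name)
```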
  • Step S5: the hardware generator retrieves, from the constructed neural network reusable cell library and according to the hardware architecture description file and the control description file, the cells that meet the design requirements, generates the corresponding control logic, and generates the hardware circuit description language of the neural network processor corresponding to the neural network model. Then, in step S6, the generated hardware circuit description language can be converted into the specific hardware circuit implementing the neural network processor by existing hardware design methods.
  • when mapped to a hardware circuit, a neural network model often cannot be fully unrolled according to its model description; the compiler therefore analyzes the computational throughput and on-chip memory size of the neural network processor and partitions the neural network feature data and weight data into appropriately sized blocks for storage and access.
  • the computational data of the neural network comprise the input feature data and the trained weight data; good data partitioning and storage layout reduce the processor's internal data bandwidth and improve storage space utilization.
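As a toy illustration of such partitioning (the sizing rule here is an assumption, not the patent's algorithm), one can pick the largest block of feature-map rows that fits in on-chip memory alongside the weights:

```python
def choose_block_rows(rows: int, cols: int, channels: int,
                      bytes_per_elem: int, weight_bytes: int,
                      on_chip_bytes: int) -> int:
    """Largest number of feature-map rows per block that fits on chip
    alongside the weights. The sizing rule is illustrative only."""
    budget = on_chip_bytes - weight_bytes
    if budget <= 0:
        raise ValueError("weights alone exceed on-chip memory")
    row_bytes = cols * channels * bytes_per_elem
    return max(1, min(rows, budget // row_bytes))

# e.g. a 224x224x3 feature map, 16-bit data, 18 KiB of weights, 64 KiB on chip:
print(choose_block_rows(224, 224, 3, 2, 18 * 1024, 64 * 1024))  # -> 35
```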
  • the optimization method of the compiler based on convolution kernel partitioning and data sharing is described below with reference to Figs. 7 and 8, and mainly includes the following steps:
  • Step (1): for a given neural network layer, if the convolution kernel size k equals the step value s, the weight sharing mode is adopted, and the convolution kernel performs the convolution within a single-layer data map, as shown in FIG. 7;
  • Step (2): if the number of data layers is smaller than the computing unit width, the convolution kernel splitting method is used to divide the large convolution kernel k into small convolution kernels k_s, as shown in FIG. 8; if the number of data layers is greater than the computing unit width, data sharing is used;
  • Step (3): the calculation mode of the next neural network layer is determined, and the calculation result of the current layer is stored according to the convolution operation mode of the next neural network layer.
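Steps (1) through (3) amount to a small decision procedure; the sketch below restates it in Python (the sub-kernel count formula is an assumption for illustration, and the storage choice of step (3) is only noted in a comment):

```python
def choose_schedule(kernel_size: int, stride: int,
                    num_data_layers: int, compute_unit_width: int) -> dict:
    """Select weight sharing, kernel splitting, or data sharing per the
    three-step rule above. The sub-kernel count below is an assumption."""
    schedule = {"weight_sharing": kernel_size == stride}   # step (1)
    if num_data_layers < compute_unit_width:               # step (2)
        # split the large kernel k into several smaller kernels k_s
        schedule["mode"] = "kernel_split"
        schedule["sub_kernels"] = max(2, compute_unit_width // num_data_layers)
    else:
        schedule["mode"] = "data_sharing"
    # step (3), choosing the storage layout of the current layer's results
    # from the next layer's convolution mode, is omitted in this sketch.
    return schedule

print(choose_schedule(kernel_size=4, stride=4,
                      num_data_layers=3, compute_unit_width=16))
```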
  • the instructions generated by the compiler of the system for control and data scheduling of the neural network processor circuit structure may be collectively referred to as the instruction stream; the instruction stream controls the operation of the designed neural network processor.
  • the instruction types can include load/store instructions and operation instructions; for example, the load/store instructions may include data exchange instructions between external and internal memory, instructions for loading data and weights into the computing unit, and instructions for storing the computing unit's results back to memory.
  • the instruction format of the load/store instructions is introduced below.
  • the instruction format is as shown in FIG. 9: the opcode marks the instruction type; the issue interval marks the interval between successive operations of the instruction; the data start address marks the first address of the data; the operation mode describes the working state of the circuit, including large convolution kernel operation, small convolution kernel operation, pooling operation, and fully connected operation; the convolution kernel size field marks the kernel size; the output image size field marks the output image size; the input layer count marks the number of input layers; the output layer count marks the number of output layers; and the clear signal is used to clear data values.
  • the operation instructions may include a convolution operation instruction for controlling the convolution operation; a pooling operation instruction for controlling the pooling operation; a local response normalization instruction for controlling the local response normalization operation; a clear instruction for clearing the data loaded in the computing unit; an excitation function operation instruction for controlling the excitation function operation and configuring the function mode; and the like.
  • taking the convolution instruction as an example, the instruction format of the operation instructions is introduced below.
  • the instruction format is shown in FIG. 10: the opcode marks the instruction type; the computing core count marks the number of computing cores participating in the operation; the issue interval marks the interval between successive operations of the instruction; the operation mode distinguishes modes such as intra-layer convolution and cross-layer convolution; and the destination register marks the storage location of the calculation result, which may be the output data memory, the excitation function register, or the lookup table register.
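The patent names these fields but not their widths or encodings; the following sketch packs a convolution instruction with assumed field widths purely to make the format concrete:

```python
# Field widths are assumptions for illustration; FIG. 10 defines only the
# field order, not the widths.
FIELDS = [                  # (name, bits)
    ("opcode", 4),          # instruction type
    ("num_cores", 4),       # computing cores participating in the operation
    ("issue_interval", 8),  # interval between successive operations
    ("op_mode", 2),         # 0 = intra-layer conv, 1 = cross-layer conv
    ("dest_reg", 4),        # result location: output memory, excitation
]                           # function register, or lookup-table register

def encode(values: dict) -> int:
    """Pack the field values into one instruction word, high field first."""
    word = 0
    for name, bits in FIELDS:
        v = values[name]
        assert 0 <= v < (1 << bits), f"{name} out of range"
        word = (word << bits) | v
    return word

conv = {"opcode": 0b0001, "num_cores": 8, "issue_interval": 2,
        "op_mode": 0, "dest_reg": 1}
print(f"{encode(conv):#08x}")
```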
  • the compiler may, for example, take the following steps to generate the above instruction stream (see the sketch after this list):
  • Step b1: read the neural network layer name;
  • Step b2: read the neural network layer type;
  • Step b3: parse the neural network layer parameters;
  • Step b4: determine the hardware circuit structure and parameters;
  • Step b5: perform scheduling optimization based on the convolution kernel splitting and data sharing methods described above in connection with FIGS. 7 and 8;
  • Step b6: determine the instruction parameters and generate the control flow instructions according to the neural network working mode and the scheduling mode.
  • the instruction parameters may include, for example, the neural network layer serial number, the number of input layers, the number of output layers, the data size of each layer, the data bit width, the weight bit width, the convolution kernel size, and the like.
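Tying steps b1 through b6 together, a per-layer compilation routine could be sketched as follows (the function shape, dictionary layout, and scheduling rule are assumptions for illustration, not the patent's compiler):

```python
def compile_layer(layer: dict, compute_unit_width: int) -> list:
    """Hypothetical per-layer compilation following steps b1-b6."""
    name = layer["layer_name"]                    # b1: read the layer name
    layer_type = layer["layer_type"]              # b2: read the layer type
    params = layer["parameters"]                  # b3: parse layer parameters
    # b4: the circuit structure/parameters would be fixed here from the
    #     hardware architecture description file (omitted in this sketch)
    if params["num_input_layers"] < compute_unit_width:   # b5: scheduling
        mode = "kernel_split"
    else:
        mode = "data_sharing"
    # b6: emit a control-flow instruction record for the scheduled mode
    return [{"layer": name, "type": layer_type, "mode": mode,
             "kernel_size": params["kernel_size"]}]

print(compile_layer({"layer_name": "conv1", "layer_type": "convolution",
                     "parameters": {"num_input_layers": 3, "kernel_size": 3}},
                    compute_unit_width=16))
```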
  • an automated design method for a neural network processor comprises: step A, for a neural network model to be implemented as a hardware circuit, acquiring the neural network model topology configuration file and the hardware resource constraint file of the target hardware circuit; step B, constructing the hardware architecture of the neural network processor corresponding to the neural network model according to the neural network model topology configuration file and the hardware resource constraint file, and generating a hardware architecture description file; step C, generating a control description file for controlling data scheduling, storage, and calculation of the neural network processor according to the neural network model topology, the hardware resource constraint file, and the hardware architecture description file; and step D, generating a hardware circuit description language corresponding to the neural network processor according to the hardware architecture description file and the control description file, so as to implement the hardware circuit of the neural network processor on the target hardware circuit.
  • the hardware resource constraint file may include one or more of the following: an operating frequency of the target hardware circuit, a target circuit area overhead, a target circuit power consumption overhead, a supported data precision, and a target circuit memory size.
  • the control description file may include an instruction stream for controlling data scheduling, storage, and calculation of the neural network processor, wherein the types of instructions include load/store instructions and operation instructions.
  • step B may include: acquiring the constituent units of the neural model and their associated specific hardware structures from a pre-established cell library according to the neural network model topology configuration file and the hardware resource constraint parameters, wherein the cell library is composed of various types of reusable neural network units, each unit including a hardware description file and a configuration script describing its hardware structure; and setting the configuration scripts of the units acquired from the cell library according to the neural network model description file and the hardware resource constraint parameters to obtain the hardware structure description file of each unit, thereby obtaining the hardware architecture description file of the neural network processor.
  • the step C may further comprise determining the convolution kernel splitting and data sharing manner by the following steps: (1) for a given neural network layer, if the convolution kernel size k equals the step value s, the weight sharing mode is adopted and the convolution kernel performs the convolution within a single-layer data map; (2) if the number of data layers is smaller than the computing unit width, the convolution kernel k is split into a plurality of convolution kernels k_s; if the number of data layers is greater than the computing unit width, a data sharing manner is adopted; (3) the calculation mode of the next neural network layer is determined, and the calculation result of the current layer is stored according to the convolution operation mode of the next neural network layer.
  • an automated design method for a neural network processor includes: step 1), acquiring a neural network model topology configuration file and a hardware resource constraint file, wherein the hardware resource constraint file includes the target circuit area overhead, the target circuit power consumption overhead, and the target circuit operating frequency; step 2), generating a neural network processor hardware architecture according to the neural network model topology configuration file and the hardware resource constraint file, and generating a hardware architecture description file; step 3), optimizing data scheduling, storage, and calculation according to the neural network model topology, the hardware resource constraint file, and the hardware architecture description file, and generating a corresponding control description file; and step 4), searching the constructed neural network reusable cell library, according to the hardware architecture description file and the control description file, for the cells that meet the design requirements, generating the corresponding control logic and the corresponding hardware circuit description language, and converting the hardware circuit description language into a hardware circuit.
  • the neural network model topology configuration file may include the number of neural network layers, the network size of each layer, the data bit width, the weight bit width, the current layer's function attribute, the current layer's number of input layers, the current layer's number of output layers, the current layer's convolution kernel size, the current layer's step size, and the next-layer connection property.
  • the hardware architecture description file may include the input data memory capacity, input memory bit width, weight memory capacity, weight memory bit width, bias memory capacity, bias memory bit width, output data memory capacity, output data memory bit width, data bit width, computing unit width, computing unit depth, data sharing flag bit, and weight sharing flag bit.
  • the method can also include generating a control instruction stream while generating the neural network circuit model, the instruction types including load/store instructions and operation instructions.
  • step 3) may comprise: convolution kernel partitioning, data partitioning according to the neural network model topology configuration file, and generating a control state machine; generating a control instruction stream according to the control state machine.
  • an automated design apparatus for a neural network processor comprising:
  • a data acquisition module configured to acquire a neural network model topology configuration file and a hardware resource constraint file, where the hardware resource constraint file includes a target circuit area overhead, a target circuit power consumption overhead, and a target circuit operating frequency;
  • a hardware architecture description file generation module configured to generate a neural network processor hardware architecture according to the neural network model topology configuration file and the hardware resource constraint file, and to generate a hardware architecture description file;
  • a control description file generation module configured to optimize data scheduling, storage, and calculation according to the neural network model topology, the hardware resource constraint file, and the hardware architecture description file, and to generate a corresponding control description file;
  • a hardware circuit generation module configured to search the constructed neural network reusable cell library, according to the hardware architecture description file and the control description file, for the cells that meet the design requirements, to generate a corresponding hardware circuit description language, and to convert the hardware circuit description language into a hardware circuit.
  • the neural network reusable unit library includes: a neuron unit, an accumulator unit, a pooling unit, a classifier unit, a local response normalization unit, a lookup table unit, an address generation unit, and a control unit .
  • the generating of the control description file includes: performing convolution kernel partitioning and data partitioning according to the neural network model topology configuration file and generating a control state machine; and generating a control instruction stream according to the control state machine.
  • the automated design system applicable to the neural network processor can map a neural network model to the hardware description language of a dedicated neural network processor, optimize the data calculation and scheduling according to the processor structure, and generate the corresponding control flow instructions, thereby realizing the automated design of the neural network processor, shortening its design cycle, and adapting to the rapid update of network models in neural network technology and the requirements of fast operation speed and high energy efficiency.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Hardware Design (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Geometry (AREA)
  • Devices For Executing Special Programs (AREA)
  • Design And Manufacture Of Integrated Circuits (AREA)

Abstract

Disclosed are an automated design method and system applicable for a neural network processor. The method comprises: acquiring a topology configuration file of a neural network model and a hardware resource constraint file of a target hardware circuit; constructing, according to the topology configuration file of the neural network model and the hardware resource constraint file, the hardware architecture of a neural network processor corresponding to the neural network model and its description file, together with a control description file for controlling data scheduling, storage, and computing of the neural network processor; and then, based on the hardware architecture description file and the control description file, generating hardware description code for the neural network processor, so as to realize a hardware circuit of the neural network processor on the target hardware circuit. The system and method realize an automated design of a neural network processor, shorten its design cycle, and adapt to the rapid update of network models in neural network technology and the requirement of high operating speed.

Description

Automated design method and system applicable for a neural network processor

Technical Field

The present invention relates to the technical field of neural network processor architecture, and in particular to an automated design method and system for a neural network processor.

Background Art

With the rapid development of related technologies in the field of artificial intelligence, deep learning, as an interdisciplinary product of computer science and life science, performs outstandingly in solving high-level abstract cognitive problems and has therefore become a research hotspot in academia and industry. To improve the computational performance of neural networks while adapting to more complex application problems, the scale of neural networks keeps expanding, and the amount of computation, the data volume, and the energy consumption of operations increase accordingly. Finding high-performance, low-energy neural network computing methods and devices has become a focus of researchers' attention.

At present, real-time task analysis using deep neural networks mostly relies on large-scale high-performance processors or general-purpose graphics processors. These devices are costly and power-hungry; when applied to portable smart devices, they face a series of problems such as large circuit scale, high energy consumption, and expensive products. Therefore, for high-energy-efficiency real-time processing in application areas such as embedded devices and small low-cost data centers, accelerating neural network model computation with a dedicated neural network processor rather than software becomes a more effective solution. However, the topology and parameter design of a neural network model change with the application scenario, and neural network models evolve rapidly; providing a general-purpose, high-efficiency neural network processor that can serve all application scenarios and cover all neural network models is very difficult, which creates great obstacles for high-level application developers designing hardware acceleration solutions for different application requirements.

The existing neural network hardware acceleration technologies include Application Specific Integrated Circuit (ASIC) chips and Field Programmable Gate Arrays (FPGA). Under the same process conditions, an ASIC chip runs fast with low power consumption, but its design flow is complex, its tape-out cycle is long, and its development cost is high, so it cannot adapt to the rapid update of neural network models; an FPGA offers flexible circuit configuration and a short development cycle, but runs relatively slowly, with comparatively large hardware overhead and power consumption. Whichever hardware acceleration technology is used, neural network model and algorithm developers need to master hardware development techniques, including processor architecture design, hardware code writing, simulation verification, and placement and routing, while also understanding the network topology and data flow mode; these techniques are difficult for high-level application developers who focus on researching neural network models and structural design and lack hardware design capabilities.
Summary of the Invention

In view of the deficiencies of the prior art, the present invention provides an automated design method and system for a neural network processor, so that high-level developers can efficiently develop neural network applications.

In one aspect, the present invention provides an automated design method for a neural network processor, the method comprising:

Step A, for a neural network model to be implemented as a hardware circuit, acquiring the neural network model topology configuration file and the hardware resource constraint file of the target hardware circuit;

Step B, constructing a neural network processor hardware architecture corresponding to the neural network model according to the neural network model topology configuration file and the hardware resource constraint file, and generating a hardware architecture description file;

Step C, generating a control description file for controlling data scheduling, storage, and calculation of the neural network processor according to the neural network model topology, the hardware resource constraint file, and the hardware architecture description file;

Step D, generating a hardware circuit description language corresponding to the neural network processor according to the hardware architecture description file and the control description file, so as to implement the hardware circuit of the neural network processor on the target hardware circuit.

In the above method, the step B may include:

acquiring the constituent units of the neural model and their associated specific hardware structures from a pre-established cell library according to the neural network model topology configuration file and the hardware resource constraint parameters, wherein the cell library is composed of various types of reusable neural network units, each unit including a hardware description file and a configuration script describing its hardware structure; and

setting the configuration scripts of the units acquired from the cell library according to the description file of the neural network model and the hardware resource constraint parameters to obtain the hardware structure description file of each unit, thereby obtaining the hardware architecture description file of the neural network processor.
In the above method, the hardware resource constraint file may include one or more of the following: the operating frequency of the target hardware circuit, the target circuit area overhead, the target circuit power consumption overhead, the supported data precision, and the target circuit memory size.

In the above method, the neural network model topology configuration file may include the number of neural network layers, the network size of each layer, the data bit width, the weight bit width, the current layer's function attribute, the current layer's number of input layers, the current layer's number of output layers, the current layer's convolution kernel size, the current layer's step size, and the next-layer connection property.

In the above method, the step C may further include determining the convolution kernel splitting and data sharing manner by the following steps:

(1) for a given neural network layer, if the convolution kernel size k equals the step value s, the weight sharing mode is adopted, and the convolution kernel performs the convolution within a single-layer data map;

(2) if the number of data layers is smaller than the computing unit width, the convolution kernel k is split into a plurality of convolution kernels k_s; if the number of data layers is greater than the computing unit width, a data sharing manner is adopted;

(3) the calculation mode of the next neural network layer is determined, and the calculation result of the current layer is stored according to the convolution operation mode of the next neural network layer.

In the above method, the control description file may include an instruction stream for controlling data scheduling, storage, and calculation of the neural network processor, the instruction types including load/store instructions and operation instructions.

In the above method, the load/store instructions may include instructions for data exchange between an external memory and the internal memory of the neural network processor, instructions for loading data and weights from the internal memory into the computing unit, and instructions for storing the computing unit's calculation results into memory; and the operation instructions include a convolution operation instruction, a pooling operation instruction, a local response normalization instruction, a clear instruction, and an excitation function operation instruction.

In the above method, the format of the convolution instruction may include the following fields: an opcode marking the instruction type, a computing-core count marking the number of computing cores participating in the operation, an issue interval marking the interval between successive operations of the instruction, an operation mode marking intra-layer versus cross-layer convolution, and a destination register marking the storage location of the calculation result.
In another aspect, the present invention provides an automated design method for a neural network processor, comprising:

1) acquiring a neural network model topology configuration file and a hardware resource constraint file, wherein the hardware resource constraint file includes the target circuit area overhead, the target circuit power consumption overhead, and the target circuit operating frequency;

2) generating a neural network processor hardware architecture according to the neural network model topology configuration file and the hardware resource constraint file, and generating a hardware architecture description file;

3) optimizing data scheduling, storage, and calculation according to the neural network model topology, the hardware resource constraint file, and the hardware architecture description file, and generating a corresponding control description file;

4) searching the constructed neural network reusable cell library, according to the hardware architecture description file and the control description file, for the cells that meet the design requirements, generating the corresponding control logic and the corresponding hardware circuit description language, and converting the hardware circuit description language into a hardware circuit.

The neural network model topology configuration file may include the number of neural network layers, the network size of each layer, the data bit width, the weight bit width, the current layer's function attribute, the current layer's number of input layers, the current layer's number of output layers, the current layer's convolution kernel size, the current layer's step size, and the next-layer connection property. In one embodiment, the method further includes generating a control instruction stream while generating the neural network circuit model, the instruction types including load/store instructions and operation instructions. In one embodiment, the step 3) may include: performing convolution kernel partitioning and data partitioning according to the neural network model topology configuration file and generating a control state machine; and generating a control instruction stream according to the control state machine. In some embodiments, the hardware architecture description file includes the input data memory capacity, input memory bit width, weight memory capacity, weight memory bit width, bias memory capacity, bias memory bit width, output data memory capacity, output data memory bit width, data bit width, computing unit width, computing unit depth, data sharing flag, and weight sharing flag.
In yet another aspect, the present invention also provides an automated design system for a neural network processor, comprising:

a data acquisition module for acquiring a neural network model topology configuration file and a hardware resource constraint file, wherein the hardware resource constraint file includes the target circuit area overhead, the target circuit power consumption overhead, and the target circuit operating frequency;

a hardware architecture description file generation module for generating a neural network processor hardware architecture according to the neural network model topology configuration file and the hardware resource constraint file, and generating a hardware architecture description file;

a control description file generation module for optimizing data scheduling, storage, and calculation according to the neural network model topology, the hardware resource constraint file, and the hardware architecture description file, and generating a corresponding control description file;

a hardware circuit generation module for searching the constructed neural network reusable cell library, according to the hardware architecture description file and the control description file, for the cells that meet the design requirements, generating the corresponding hardware circuit description language, and converting the hardware circuit description language into a hardware circuit.
In yet another aspect, the present invention further provides an optimization method based on the automated design method for a neural network processor as described above, comprising:
Step (1), for a given neural network layer, if the convolution kernel size k equals the stride s, adopting the weight sharing mode, in which the convolution kernel performs the convolution operation within a single-layer data map;
Step (2), if the number of data layers is smaller than the compute unit width, splitting the convolution kernel k into multiple sub-kernels k_s by convolution kernel splitting; if the number of data layers is larger than the compute unit width, adopting the data sharing mode;
Step (3), determining the computation mode of the next neural network layer, and storing the computation results of the current layer according to the convolution operation mode of the next neural network layer.
As can be seen from the above solutions, the advantages of the present invention are:
The neural network model is mapped to hardware description language code for designing the hardware circuit; the designed hardware circuit structure and data storage scheme are automatically optimized according to the hardware resource constraints and the network characteristics, and the corresponding control instruction stream is generated at the same time. This realizes automated hardware/software co-design of neural network hardware accelerators, thereby shortening the design cycle of the neural network processor, improving its performance, and meeting the neural network deployment needs of upper-layer application developers.
BRIEF DESCRIPTION OF THE DRAWINGS
The embodiments of the present invention are further described below with reference to the accompanying drawings, in which:
Fig. 1 shows a schematic diagram of a common neural network topology;
Fig. 2 shows a schematic block diagram of the neural network convolution operation;
Fig. 3 shows a schematic block diagram of a common neural network processor structure;
Fig. 4 is a schematic diagram of the automated design flow of a neural network processor according to one embodiment of the present invention;
Fig. 5 is a schematic diagram of the compiler workflow according to one embodiment of the present invention;
Fig. 6 is a flow chart of the control state machine for a convolution operation performed by a neural network processor according to one embodiment of the present invention;
Fig. 7 is a schematic diagram of convolution kernel operation in the weight sharing mode according to one embodiment of the present invention;
Fig. 8 is a schematic diagram of convolution kernel splitting according to one embodiment of the present invention;
Fig. 9 is an instruction format diagram of the load/store instructions;
Fig. 10 is an instruction format diagram of the operation instructions.
DETAILED DESCRIPTION
In order to make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is further described in detail below through specific embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit it.
A neural network is a mathematical model formed by modeling the structure and behavioral activity of the human brain. It is usually divided into an input layer, hidden layers, and an output layer, each layer consisting of multiple neuron nodes; the output values of the neuron nodes in one layer are passed as inputs to the neuron nodes of the next layer, connecting the layers one by one. The neural network itself has bionic characteristics, and its process of multi-layer abstraction and iteration processes information in a way similar to the human brain and other sensory organs. Fig. 1 shows a common topology of a neural network. The input to the first layer of the multi-layer structure is the original image (in the present invention, "original image" refers to the raw data to be processed, not merely an image obtained by taking a photograph in the narrow sense). Typically, for each layer of the neural network, the node values of the next layer are obtained by computing on that layer's neuron node values (also referred to herein as data) and their corresponding weight values. For example, suppose x = x_1, x_2, x_3, ..., x_n denotes several neuron nodes of one layer in the neural network, which are connected to a node y of the next layer, and w = w_1, w_2, w_3, ..., w_n denotes the weights of the corresponding connections; the value of y is then defined as y = x × w. Therefore, each layer of the neural network involves a large number of convolution operations dominated by multiply-accumulate computations. The convolution operation in a neural network generally proceeds as shown in Fig. 2: a two-dimensional weight convolution kernel of size K*K scans across a feature map; at each position the weights are multiplied element-wise with the corresponding feature elements, and all the products are summed to produce one output-layer feature element. When a convolutional layer has N feature maps, N convolution kernels of size K*K are convolved with the feature maps of that layer, and the N inner products are summed to obtain one output-layer feature element. Besides the convolution computations dominated by vector multiply-accumulate operations described above, neural network operations may also include pooling, normalization, and the like. Considering the complexity of neural network operations, hardware acceleration techniques are usually adopted to construct a dedicated neural network processor for neural network computation. Commonly used hardware acceleration technologies include ASICs and FPGAs; for convenience of description, the FPGA is taken as an example below. From a design perspective, FPGAs are more flexible: for example, a custom hardware circuit can be implemented by defining the internal logic structure in Verilog HDL (Hardware Description Language), VHDL, or another hardware description language.
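To make the layer computation above concrete, the following minimal Python sketch (illustrative only, not part of the patent) computes a node value y = x × w as a weighted sum and performs the K*K convolution scan of Fig. 2 over a single feature map:

```python
def neuron_output(x, w):
    """Weighted sum y = x1*w1 + x2*w2 + ... + xn*wn of one layer's
    node values x with the weights w of their connections to node y."""
    return sum(xi * wi for xi, wi in zip(x, w))

def conv2d(feature_map, kernel, stride=1):
    """Scan a K*K weight kernel over a 2-D feature map; each output
    element is the inner product of the kernel and the covered window."""
    k = len(kernel)
    rows, cols = len(feature_map), len(feature_map[0])
    output = []
    for i in range(0, rows - k + 1, stride):
        out_row = []
        for j in range(0, cols - k + 1, stride):
            acc = 0
            for di in range(k):
                for dj in range(k):
                    acc += feature_map[i + di][j + dj] * kernel[di][dj]
            out_row.append(acc)
        output.append(out_row)
    return output

# With N feature maps, N such per-map results are summed element-wise
# to produce one output-layer feature map, as described above.
```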
Common neural network processors are based on a storage-control-compute logic structure. The storage structure stores the data participating in computation, the neural network weights, and the processor's operation instructions; the control structure includes decoding circuits and control logic circuits for parsing operation instructions and generating control signals that govern the scheduling and storage of data within the processor and the computation process of the neural network; the compute structure is responsible for the neural network computation operations. The storage units may store data transferred from outside the neural network processor (for example, original feature map data), trained neural network weights, processing results or intermediate results generated during computation, instruction information participating in computation, and so on. Fig. 3 is a schematic diagram of a common neural network processor system 101, whose architecture includes an input data storage unit 102, a control unit 103, an output data storage unit 104, a weight storage unit 105, an instruction storage unit 106, and a compute unit 107. The input data storage unit 102 stores the data participating in computation, including original feature map data and data participating in intermediate-layer computation; the output data storage unit 104 stores the computed neuron response values; the instruction storage unit 106 stores the instruction information participating in computation, the instructions being parsed into a control flow to schedule the neural network computation; the weight storage unit 105 stores the trained neural network weights. The control unit 103 is connected to the output data storage unit 104, the weight storage unit 105, the instruction storage unit 106, and the compute unit 107, respectively; it is responsible for instruction decoding, data scheduling, process control, and the like, for example obtaining an instruction stored in the instruction storage unit 106, parsing it, and, according to the parsed control signals, scheduling data and controlling the compute unit to perform the relevant neural network operations. The compute unit 107 performs the corresponding neural network computation according to the control signals generated by the control unit 103. The compute unit 107 is associated with one or more storage units: it may obtain data for computation from the data storage component in its associated input data storage unit 102, and may write data to its associated output data storage unit 104. The compute unit 107 performs most of the operations in the neural network algorithm, including vector multiply-accumulate operations, pooling, normalization, and so on.
However, as mentioned in the Background section, the topology and parameter design of a neural network model change with different application scenarios or requirements, and neural network models evolve rapidly. This poses great development challenges for high-level developers of neural network models and algorithms: not only must related hardware acceleration solutions be designed or adjusted quickly for different application requirements, but high-level developers are also required to understand hardware development technologies such as FPGAs while mastering neural network models and algorithms, which makes development very difficult.
In one embodiment of the present invention, an automated design system or apparatus suitable for a neural network processor is provided. The system or apparatus includes a hardware generator and a compiler. The hardware generator can automatically generate the hardware description language code of the neural network processor according to the neural network model and the hardware resource constraints, so that hardware designers can subsequently generate the hardware circuit of the neural network processor from the hardware description language using existing hardware circuit design methods; the compiler is used to generate the control and data scheduling instruction stream for the neural network processor circuit structure. In some embodiments, the hardware generator may construct the neural network processor hardware architecture according to the topology of the neural network model and the hardware resource constraint file, and generate the hardware description language code according to the processor hardware architecture, the constructed neural network reusable cell library, and the control state machine generated by the compiler.
In some embodiments, to avoid repetitive design and to accommodate hardware implementations of various neural network models, the system may further include a pre-built neural network reusable cell library. The library may contain the reusable basic units of a neural network model, including but not limited to: neuron units, accumulator units, pooling units, classifier units, local response normalization units, lookup table units, address generation units, control units, and so on. The specific hardware structure of each unit in the cell library is defined by its associated hardware description file, which may be written in Verilog HDL or another hardware description language. Preferably, each unit also has an associated configuration script through which its hardware structure can be appropriately adjusted, for example configuring the bit width of the registers in a neuron unit, the number of adders contained in an adder tree unit, the number of comparators in a pooling unit, and so on.
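As a rough illustration of how such a cell library might be organized, the sketch below pairs each unit with a hardware description file and a configuration script of tunable parameters; all file names and parameter values are hypothetical, not taken from the patent:

```python
# Hypothetical cell-library entries: each reusable unit is defined by a
# hardware description file plus a configuration script of tunable knobs.
CELL_LIBRARY = {
    "neuron_unit":  {"hdl_file": "neuron_unit.v",  "config": {"register_bit_width": 16}},
    "adder_tree":   {"hdl_file": "adder_tree.v",   "config": {"num_adders": 32}},
    "pooling_unit": {"hdl_file": "pooling_unit.v", "config": {"num_comparators": 8}},
    "lookup_table": {"hdl_file": "lookup_table.v", "config": {"depth": 256}},
}

def configure_unit(name, **overrides):
    """Apply configuration-script overrides to one unit and return the
    adjusted entry, e.g. configure_unit("neuron_unit", register_bit_width=8)."""
    unit = CELL_LIBRARY[name]
    return {"hdl_file": unit["hdl_file"],
            "config": {**unit["config"], **overrides}}
```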
Fig. 4 shows the workflow of an automated design system for a neural network processor according to one embodiment of the present invention, which may mainly include:
Step S1, reading in the neural network model topology configuration file. This configuration file mainly describes the neural network model designed for the specific application requirements, including the network topology of the model and the definition of each operation layer. For example, it may include the number of neural network layers, the size and structure of each layer, the data bit width, the weight bit width, the current layer's function attribute, the number of input layers of the current layer, the number of output layers of the current layer, the convolution kernel size of the current layer, the stride of the current layer, the connection attribute of the next layer, and so on. Each layer of the neural network includes one or more units, whose types are typically basic neuron units, convolution units, pooling units, normalization units, recurrent units, and so on. In short, the neural network model description file may include three parts: basic attributes, parameter description, and connection information. The basic attributes may include the layer name, layer type, and layer structure; the parameter description may include the number of output layers, the convolution kernel size, and the stride; the connection information may include the connection name, connection direction, and connection type.
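For illustration, a single layer entry of such a configuration file might look like the following sketch; the field names are assumptions chosen to mirror the three parts above, not a format defined by the patent:

```python
conv1_layer = {
    # basic attributes
    "layer_name": "conv1",
    "layer_type": "convolution",
    # parameter description
    "num_output_layers": 64,
    "kernel_size": 3,
    "stride": 1,
    "data_bit_width": 16,
    "weight_bit_width": 16,
    # connection information
    "connection": {"name": "conv1->pool1", "direction": "forward", "type": "pooling"},
}
```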
Step S2, reading in the hardware resource constraint file. The hardware resource constraint file includes parameters describing the hardware resources available on the target hardware circuit on which the neural network processor is to be implemented, for example the operating frequency of the target hardware circuit, the target circuit area overhead, the target circuit power overhead, the supported data precision, the target circuit memory size, and so on. These hardware resource constraint parameters can be placed in a single constraint file and loaded into the system together.
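A constraint file covering the parameters listed above could be as simple as the following sketch; the concrete values are placeholders:

```python
hardware_constraints = {
    "operating_frequency_mhz": 200,   # target hardware circuit frequency
    "area_overhead_mm2": 5.0,         # target circuit area overhead
    "power_overhead_mw": 500,         # target circuit power overhead
    "supported_precision_bits": [8, 16],
    "memory_size_kb": 512,            # target circuit memory size
}
```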
Step S3, the hardware generator of the system constructs the neural network processor hardware architecture according to the neural network model topology configuration file and the hardware resource constraint file, and generates the corresponding hardware architecture description file. The hardware architecture description file may include the hardware circuit structure, input data memory capacity, input memory bit width, weight memory capacity, weight memory bit width, bias memory capacity, bias memory bit width, output data memory capacity, output data memory bit width, data bit width, compute unit width, compute unit depth, data sharing flag, weight sharing flag, and so on.
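The sketch below suggests how a hardware generator might derive a few of these fields from the topology and constraint files sketched above; the sizing rules are illustrative assumptions only, not the generator's actual algorithm:

```python
def build_architecture(layers, constraints):
    """Derive a (partial) hardware architecture description from the
    layer list and the hardware resource constraints."""
    data_bits = max(layer["data_bit_width"] for layer in layers)
    weight_bits = max(layer["weight_bit_width"] for layer in layers)
    widest_layer = max(layer["num_output_layers"] for layer in layers)
    return {
        "input_memory_bit_width": data_bits,
        "weight_memory_bit_width": weight_bits,
        "output_memory_bit_width": data_bits,
        # cap the compute array so it fits the on-chip memory budget
        "compute_unit_width": min(widest_layer, constraints["memory_size_kb"] // 16),
        "compute_unit_depth": 4,
        "data_sharing_flag": False,
        "weight_sharing_flag": False,
    }
```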
Step S4, the compiler of the system generates the control and data scheduling instruction stream for the neural network processor circuit structure; these processes can be described in the form of a control state machine. For example, the data scheduling, storage, and computation scheme is optimized according to the neural network model topology, the hardware resource constraints, and the hardware architecture description file, and the corresponding control description file is generated. Fig. 5 shows the workflow of the compiler according to one embodiment of the present invention: the compiler generates the control instruction stream according to the neural network topology, the constructed hardware architecture, and the hardware resource constraint file, so as to control the neural network processor in real time. Specifically, this may include: step a1, reading in the neural network model topology configuration file, the hardware architecture description file, and the hardware resource constraint file; step a2, the compiler performs scheduling optimizations such as convolution kernel partitioning and data partitioning according to the above files and generates a control state machine, which can be used to schedule the working states of the neural network processor hardware circuit to be implemented; step a3, generating the control instruction stream for the neural network processor based on the control state machine. Fig. 6 depicts part of the control state machine flow, taking a convolution operation performed by the neural network processor as an example: the relevant units are controlled to read the neural network data and weight data from external memory into internal memory; the neural network data, bias data, and weight data involved in the convolution operation are then loaded into the compute unit; the compute unit is then controlled to perform multiply-add and accumulate operations; and the above loading and computation operations are repeated until all the corresponding data have been processed.
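The convolution control flow of Fig. 6 can be pictured as the following state sequence; this is a behavioral sketch only, with state names assumed for illustration:

```python
from enum import Enum, auto

class ConvState(Enum):
    LOAD_FROM_EXTERNAL = auto()   # read data and weights into internal memory
    LOAD_TO_COMPUTE = auto()      # feed data, bias, and weights to the compute unit
    MULTIPLY_ADD = auto()         # vector multiply-add
    ACCUMULATE = auto()           # accumulate partial sums
    DONE = auto()

def conv_control_flow(num_blocks):
    """Yield the control states for num_blocks data/weight blocks,
    repeating load and compute until all data have been processed."""
    yield ConvState.LOAD_FROM_EXTERNAL
    for _ in range(num_blocks):
        yield ConvState.LOAD_TO_COMPUTE
        yield ConvState.MULTIPLY_ADD
        yield ConvState.ACCUMULATE
    yield ConvState.DONE
```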
With continued reference to Fig. 4, in step S5, the hardware generator indexes into the constructed neural network reusable cell library, according to the hardware architecture description file and the control description file, to find the cells that meet the design requirements, generates the corresponding control logic, and generates the hardware circuit description language of the neural network processor corresponding to the neural network model. Then, in step S6, the generated hardware circuit description language can be converted into the specific hardware circuit implementing the neural network processor by existing hardware design methods.
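Tying the steps together, the whole S1-S6 flow might be driven as in the sketch below, reusing build_architecture and conv_control_flow from the sketches above; read_topology, read_constraints, and render_hdl are placeholder stubs, not APIs defined by the patent:

```python
import json

def read_topology(path):
    """Stub for step S1: load the layer list from a JSON-style topology file."""
    with open(path) as f:
        return json.load(f)["layers"]

def read_constraints(path):
    """Stub for step S2: load the hardware resource constraint file."""
    with open(path) as f:
        return json.load(f)

def render_hdl(arch, control_states):
    """Stub for step S5: cell-library lookup and HDL emission would go here."""
    return f"// architecture: {arch}\n// control states: {len(control_states)}"

def design_processor(topology_file, constraint_file):
    layers = read_topology(topology_file)                       # S1
    constraints = read_constraints(constraint_file)             # S2
    arch = build_architecture(layers, constraints)              # S3: hardware generator
    control = list(conv_control_flow(num_blocks=len(layers)))   # S4: compiler
    return render_hdl(arch, control)                            # S5/S6: HDL for synthesis
```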
In some embodiments, due to the resource constraints of the target hardware circuit, a neural network model often cannot be fully unrolled according to its model description when mapped to a hardware circuit. The compiler can therefore analyze the computational throughput of the neural network processor and the on-chip memory size, and partition the neural network feature data and weight data into appropriately sized data blocks for centralized storage and access. The computation data of a neural network includes input feature data and trained weight data; good data partitioning and storage layout can reduce the internal data bandwidth of the processor and improve the utilization of storage space. The compiler's optimization based on convolution kernel splitting and data sharing is described below with reference to Figs. 7 and 8, and mainly includes the following steps (a sketch of the mode selection follows the list):
Step (1), for a given neural network layer, if the convolution kernel size k equals the stride s, the weight sharing mode is adopted, in which the convolution kernel performs the convolution operation within a single-layer data map, as shown in Fig. 7;
Step (2), if the number of data layers is smaller than the compute unit width, the convolution kernel splitting method is used to split the large convolution kernel k into small sub-kernels k_s, as shown in Fig. 8; if the number of data layers is larger than the compute unit width, the data sharing mode is adopted;
Step (3), the computation mode of the next neural network layer is determined, and the computation results of the current layer are stored according to the convolution operation mode of the next neural network layer.
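The following is a minimal sketch of the mode selection in steps (1) and (2); the mode names returned are illustrative labels:

```python
def choose_schedule(kernel_size, stride, num_layers, compute_unit_width):
    """Pick the scheduling modes for one layer per steps (1)-(2)."""
    modes = []
    if kernel_size == stride:
        modes.append("weight_sharing")     # step (1): kernel scans a single-layer map
    if num_layers < compute_unit_width:
        modes.append("kernel_splitting")   # step (2): split k into sub-kernels k_s
    else:
        modes.append("data_sharing")
    return modes
```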
In some embodiments, the instructions generated by the compiler of the system for controlling and scheduling data in the neural network processor circuit structure may be collectively referred to as the instruction stream. The instruction stream controls the operation and working mode of the designed neural network processor. The instruction types may include load/store instructions and operation instructions. For example, the load/store instructions may include the following:
* external/internal memory data transfer instructions, used for data exchange between external memory and internal memory, the data including the data participating in neural network computation, the trained weights, the bias data, and so on;
* input data memory to compute unit transfer instructions, used to load the data in on-chip memory into the compute unit according to the compile-optimized schedule;
* weight memory to compute unit transfer instructions, used to load the weight data in on-chip memory into the compute unit according to the compile-optimized schedule;
* compute unit to output data memory transfer instructions, used to store the computation results of the compute unit into memory.
Taking the input data memory to compute unit transfer instruction as an example, the instruction format of the load/store instructions is shown in Fig. 9: the opcode marks the instruction type; the issue interval marks the interval between successive operations of the instruction; the data base address marks the starting address of the data; the operation mode describes the working state of the circuit, including large-kernel convolution, small-kernel convolution, pooling, fully connected operation, and so on; the kernel size field marks the convolution kernel size; the output picture size field marks the size of the output picture; the number-of-input-layers field marks the number of input layers; the number-of-output-layers field marks the number of output layers; and the clear signal clears the data values.
As another example, the operation instructions may include convolution operation instructions for controlling convolution operations; pooling operation instructions for controlling pooling operations; local response normalization instructions for controlling local response normalization operations; clear instructions for clearing the data loaded in the compute unit; activation function operation instructions for controlling the activation function operation and configuring the function mode; and so on. Taking the convolution instruction as an example, the instruction format of the operation instructions is shown in Fig. 10: the opcode marks the instruction type; the number-of-compute-cores field marks the number of compute cores participating in the operation; the issue interval marks the interval between successive operations of the instruction; the operation mode covers modes such as intra-layer convolution and cross-layer convolution; and the target register marks the storage location of the computation result, including the output data memory, the activation function register, the lookup table register, and so on.
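As a rough illustration of the two formats, the sketch below lists the fields of Figs. 9 and 10 as Python dataclasses; field widths and any binary layout are deliberately omitted, since they are not specified here:

```python
from dataclasses import dataclass

@dataclass
class LoadStoreInstruction:        # fields per Fig. 9
    opcode: str                    # marks the instruction type
    issue_interval: int            # cycles between successive operations
    data_base_address: int         # starting address of the data
    operation_mode: str            # e.g. "large_kernel", "small_kernel", "pooling", "fc"
    kernel_size: int
    output_size: int
    num_input_layers: int
    num_output_layers: int
    clear: bool                    # clears the data values

@dataclass
class ConvInstruction:             # fields per Fig. 10
    opcode: str
    num_compute_cores: int         # compute cores participating in the operation
    issue_interval: int
    operation_mode: str            # "intra_layer" or "cross_layer" convolution
    target_register: str           # output memory / activation register / lookup table
```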
In one embodiment, the compiler may, for example, take the following steps to generate the above instruction stream:
Step b1, reading in the neural network layer name;
Step b2, reading in the neural network layer type;
Step b3, parsing the neural network layer parameters;
Step b4, determining the hardware circuit structure and parameters;
Step b5, performing scheduling optimization through the convolution-kernel-splitting and data-sharing approach described above with reference to Figs. 7 and 8;
Step b6, determining the instruction parameters and generating the control flow instructions according to the working mode and scheduling scheme of the neural network. The instruction parameters may include, for example, the neural network layer index, the number of input layers, the number of output layers, the data size of each layer, the data width, the weight width, the convolution kernel size, and so on.
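Putting steps b1-b6 together, a per-layer compilation pass might look like the sketch below, reusing choose_schedule from the earlier sketch; the emitted record is an illustrative stand-in for the binary formats of Figs. 9 and 10:

```python
def compile_layer(layer, arch):
    """Generate one layer's control-flow record per steps b1-b6."""
    name = layer["layer_name"]                                   # b1
    layer_type = layer["layer_type"]                             # b2
    k, s = layer["kernel_size"], layer["stride"]                 # b3: parse parameters
    width = arch["compute_unit_width"]                           # b4: circuit parameters
    schedule = choose_schedule(k, s, layer["num_output_layers"], width)  # b5
    return {                                                     # b6: instruction parameters
        "layer": name, "type": layer_type, "kernel_size": k,
        "stride": s, "num_output_layers": layer["num_output_layers"],
        "schedule": schedule,
    }
```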
In yet another embodiment, an automated design method for a neural network processor is also provided, the method comprising: Step A, for a neural network model to be implemented as a hardware circuit, acquiring the neural network model topology configuration file and the hardware resource constraint file of the target hardware circuit; Step B, constructing the neural network processor hardware architecture corresponding to the neural network model according to the neural network model topology configuration file and the hardware resource constraint file, and generating a hardware architecture description file; Step C, generating, according to the neural network model topology, the hardware resource constraint file, and the hardware architecture description file, a control description file for controlling the data scheduling, storage, and computation of the neural network processor; Step D, generating the hardware circuit description language corresponding to the neural network processor according to the hardware architecture description file and the control description file, so as to implement the hardware circuit of the neural network processor on the target hardware circuit. The hardware resource constraint file may include one or more of the following: the operating frequency of the target hardware circuit, the target circuit area overhead, the target circuit power overhead, the supported data precision, and the target circuit memory size. The control description file may include an instruction stream for controlling the data scheduling, storage, and computation of the neural network processor, the instruction types including load/store instructions and operation instructions.
In some embodiments, step B may include: according to the neural network model topology configuration file and the hardware resource constraint parameters, and based on a pre-established cell library, obtaining the constituent units of the neural model and their associated specific hardware structures, wherein the cell library consists of various types of reusable units of a neural network, each unit including a hardware description file describing its hardware structure and a configuration script; and setting the configuration scripts of the units obtained from the cell library according to the description file of the neural network model and the hardware resource constraint parameters, so as to obtain the description file of the hardware structure corresponding to each unit and thereby the hardware architecture description file of the neural network processor.
In some embodiments, step C may further include determining the convolution kernel splitting and data sharing scheme through the following steps:
(1), for a given neural network layer, if the convolution kernel size k equals the stride s, the weight sharing mode is adopted, in which the convolution kernel performs the convolution operation within a single-layer data map;
(2), if the number of data layers is smaller than the compute unit width, the convolution kernel k is split into multiple sub-kernels k_s; if the number of data layers is larger than the compute unit width, the data sharing mode is adopted;
(3), the computation mode of the next neural network layer is determined, and the computation results of the current layer are stored according to the convolution operation mode of the next neural network layer.
In yet another embodiment, an automated design method for a neural network processor is also provided, comprising: step 1), acquiring a neural network model topology configuration file and a hardware resource constraint file, wherein the hardware resource constraint file includes the target circuit area overhead, the target circuit power overhead, and the target circuit operating frequency; step 2), generating a neural network processor hardware architecture according to the neural network model topology configuration file and the hardware resource constraint file, and generating a hardware architecture description file; step 3), optimizing the data scheduling, storage, and computation scheme according to the neural network model topology, the hardware resource constraint file, and the hardware architecture description file, and generating a corresponding control description file; step 4), according to the hardware architecture description file and the control description file, searching the constructed neural network reusable cell library for cells that meet the design requirements, generating the corresponding control logic, and generating the corresponding hardware circuit description language, so as to convert the hardware circuit description language into a hardware circuit.
In some embodiments, the neural network model topology configuration file may include the number of neural network layers, the size of each layer, the data bit width, the weight bit width, the current layer's function attribute, the number of input layers of the current layer, the number of output layers of the current layer, the convolution kernel size of the current layer, the stride of the current layer, and the connection attribute of the next layer. In some embodiments, the hardware architecture description file may include the input data memory capacity, input memory bit width, weight memory capacity, weight memory bit width, bias memory capacity, bias memory bit width, output data memory capacity, output data memory bit width, data bit width, compute unit width, compute unit depth, data sharing flag, and weight sharing flag.
In some embodiments, the method may further include generating a control instruction stream while generating the neural network circuit model, the instruction types including load/store instructions and operation instructions. In some embodiments, step 3) may include: performing convolution kernel partitioning and data partitioning according to the neural network model topology configuration file and generating a control state machine; and generating the control instruction stream according to the control state machine.
In some embodiments, an automated design apparatus for a neural network processor is also provided, comprising:
a data acquisition module for acquiring a neural network model topology configuration file and a hardware resource constraint file, wherein the hardware resource constraint file includes the target circuit area overhead, the target circuit power overhead, and the target circuit operating frequency;
a hardware architecture description file generation module for generating a neural network processor hardware architecture according to the neural network model topology configuration file and the hardware resource constraint file, and generating a hardware architecture description file;
a control description file generation module for optimizing the data scheduling, storage, and computation scheme according to the neural network model topology, the hardware resource constraint file, and the hardware architecture description file, and generating a corresponding control description file;
a hardware circuit generation module for searching, according to the hardware architecture description file and the control description file, the constructed neural network reusable cell library for cells that meet the design requirements and generating the corresponding hardware circuit description language, so as to convert the hardware circuit description language into a hardware circuit.
In some embodiments, the neural network reusable cell library includes: neuron units, accumulator units, pooling units, classifier units, local response normalization units, lookup table units, address generation units, and control units. In some embodiments, generating the control description file includes: performing convolution kernel partitioning and data partitioning according to the neural network model topology configuration file and generating a control state machine; and generating the control instruction stream according to the control state machine.
As can be seen from the above embodiments, the automated design system for a neural network processor according to the present invention can map a neural network model to the hardware description language of a dedicated neural network processor, optimize the data computation and scheduling scheme according to the processor structure, and generate the corresponding control flow instructions. It thus realizes the automated design of neural network processors, shortens their design cycle, and suits the characteristics of neural network applications: rapidly evolving network models, demanding computation speeds, and high energy-efficiency requirements.
References in this specification to "various embodiments," "some embodiments," "one embodiment," or "an embodiment" and the like mean that a particular feature, structure, or property described in connection with the embodiment is included in at least one embodiment. Thus, appearances of the phrases "in various embodiments," "in some embodiments," "in one embodiment," or "in an embodiment" in various places throughout this specification do not necessarily refer to the same embodiment. Furthermore, particular features, structures, or properties may be combined in any suitable manner in one or more embodiments. Thus, a particular feature, structure, or property shown or described in connection with one embodiment may be combined, in whole or in part, with the features, structures, or properties of one or more other embodiments without limitation, as long as the combination is not illogical or inoperative. In addition, the elements in the drawings of the present application are for illustration only and are not drawn to scale.
Having thus described several aspects of at least one embodiment of the present invention, it should be appreciated that various changes, modifications, and improvements will readily occur to those skilled in the art. Such changes, modifications, and improvements are intended to be within the spirit and scope of the invention.

Claims (11)

  1. An automated design method for a neural network processor, the method comprising:
    Step A, for a neural network model to be implemented as a hardware circuit, acquiring a neural network model topology configuration file and a hardware resource constraint file of the target hardware circuit;
    Step B, constructing a neural network processor hardware architecture corresponding to the neural network model according to the neural network model topology configuration file and the hardware resource constraint file, and generating a hardware architecture description file;
    Step C, generating, according to the neural network model topology, the hardware resource constraint file, and the hardware architecture description file, a control description file for controlling the data scheduling, storage, and computation of the neural network processor;
    Step D, generating a hardware circuit description language corresponding to the neural network processor according to the hardware architecture description file and the control description file, so as to implement the hardware circuit of the neural network processor on the target hardware circuit.
  2. The method according to claim 1, wherein step B comprises:
    according to the neural network model topology configuration file and the hardware resource constraint parameters, and based on a pre-established cell library, obtaining the constituent units of the neural model and their associated specific hardware structures, wherein the cell library consists of various types of reusable units of a neural network, each unit including a hardware description file describing its hardware structure and a configuration script; and
    setting the configuration scripts of the units obtained from the cell library according to the description file of the neural network model and the hardware resource constraint parameters to obtain the description file of the hardware structure corresponding to each unit, thereby obtaining the hardware architecture description file of the neural network processor.
  3. The method according to claim 1, wherein the hardware resource constraint file comprises one or more of the following: the operating frequency of the target hardware circuit, the target circuit area overhead, the target circuit power overhead, the supported data precision, and the target circuit memory size.
  4. The method according to claim 1, wherein the neural network model topology configuration file comprises the number of neural network layers, the size of each layer, the data bit width, the weight bit width, the current layer's function attribute, the number of input layers of the current layer, the number of output layers of the current layer, the convolution kernel size of the current layer, the stride of the current layer, and the connection attribute of the next layer.
  5. The method according to claim 1, wherein step C further comprises determining the convolution kernel splitting and data sharing scheme of the neural network processor through the following steps:
    (1), for a given neural network layer, if the convolution kernel size k equals the stride s, adopting the weight sharing mode, in which the convolution kernel performs the convolution operation within a single-layer data map;
    (2), if the number of data layers is smaller than the compute unit width, splitting the convolution kernel k into multiple sub-kernels k_s; if the number of data layers is larger than the compute unit width, adopting the data sharing mode;
    (3), determining the computation mode of the next neural network layer, and storing the computation results of the current layer according to the convolution operation mode of the next neural network layer.
  6. The method according to claim 1, wherein the control description file comprises an instruction stream for controlling the data scheduling, storage, and computation of the neural network processor, the instruction types including load/store instructions and operation instructions.
  7. The method according to claim 6, wherein the load/store instructions comprise instructions for data exchange between external memory and the internal memory of the neural network processor, instructions for loading the data and weights in the internal memory into the compute unit, and instructions for storing the computation results of the compute unit into memory; and the operation instructions comprise convolution operation instructions, pooling operation instructions, local response normalization instructions, clear instructions, and activation function operation instructions.
  8. The method according to claim 7, wherein the format of the convolution instruction comprises the following fields: an opcode for marking the instruction type, a number-of-compute-cores field for marking the number of compute cores participating in the operation, an issue interval for marking the interval between successive operations of the instruction, an operation mode for marking modes such as intra-layer convolution and cross-layer convolution, and a target register for marking the storage location of the computation result.
  9. An automated design method for a neural network processor, comprising:
    step 1), acquiring a neural network model topology configuration file and a hardware resource constraint file, wherein the hardware resource constraint file includes the target circuit area overhead, the target circuit power overhead, and the target circuit operating frequency;
    step 2), generating a neural network processor hardware architecture according to the neural network model topology configuration file and the hardware resource constraint file, and generating a hardware architecture description file;
    step 3), optimizing the data scheduling, storage, and computation scheme according to the neural network model topology, the hardware resource constraint file, and the hardware architecture description file, and generating a corresponding control description file;
    step 4), according to the hardware architecture description file and the control description file, searching the constructed neural network reusable cell library for cells that meet the design requirements, generating the corresponding control logic, and generating the corresponding hardware circuit description language, so as to convert the hardware circuit description language into a hardware circuit.
  10. The method according to claim 9, wherein the hardware architecture description file comprises the input data memory capacity, input memory bit width, weight memory capacity, weight memory bit width, bias memory capacity, bias memory bit width, output data memory capacity, output data memory bit width, data bit width, compute unit width, compute unit depth, data sharing flag, and weight sharing flag.
  11. An automated design system for a neural network processor, comprising:
    a data acquisition module for acquiring a neural network model topology configuration file and a hardware resource constraint file, wherein the hardware resource constraint file includes the target circuit area overhead, the target circuit power overhead, and the target circuit operating frequency;
    a hardware architecture description file generation module for generating a neural network processor hardware architecture according to the neural network model topology configuration file and the hardware resource constraint file, and generating a hardware architecture description file;
    a control description file generation module for optimizing the data scheduling, storage, and computation scheme according to the neural network model topology, the hardware resource constraint file, and the hardware architecture description file, and generating a corresponding control description file;
    a hardware circuit generation module for searching, according to the hardware architecture description file and the control description file, the constructed neural network reusable cell library for cells that meet the design requirements and generating the corresponding hardware circuit description language, so as to convert the hardware circuit description language into a hardware circuit.
PCT/CN2018/080200 2017-03-23 2018-03-23 Automated design method and system applicable for neural network processor WO2018171715A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710178679.7A CN107016175B (en) 2017-03-23 2017-03-23 It is applicable in the Automation Design method, apparatus and optimization method of neural network processor
CN201710178679.7 2017-03-23

Publications (1)

Publication Number Publication Date
WO2018171715A1 true WO2018171715A1 (en) 2018-09-27

Family

ID=59444868

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/080200 WO2018171715A1 (en) 2017-03-23 2018-03-23 Automated design method and system applicable for neural network processor

Country Status (2)

Country Link
CN (1) CN107016175B (en)
WO (1) WO2018171715A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022068343A1 (en) * 2020-09-30 2022-04-07 International Business Machines Corporation Memory-mapped neural network accelerator for deployable inference systems

Families Citing this family (63)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107016175B (en) * 2017-03-23 2018-08-31 中国科学院计算技术研究所 It is applicable in the Automation Design method, apparatus and optimization method of neural network processor
CN107480789B (en) * 2017-08-07 2020-12-29 北京中星微电子有限公司 Efficient conversion method and device of deep learning model
CN107480115B (en) * 2017-08-31 2021-04-06 郑州云海信息技术有限公司 Method and system for format conversion of caffe frame residual error network configuration file
CN107578098B (en) * 2017-09-01 2020-10-30 中国科学院计算技术研究所 Neural network processor based on systolic array
CN109697509B (en) * 2017-10-24 2020-10-20 上海寒武纪信息科技有限公司 Processing method and device, and operation method and device
CN107918794A (en) * 2017-11-15 2018-04-17 中国科学院计算技术研究所 Neural network processor based on computing array
WO2019114842A1 (en) 2017-12-14 2019-06-20 北京中科寒武纪科技有限公司 Integrated circuit chip apparatus
CN109961134B (en) * 2017-12-14 2020-06-23 中科寒武纪科技股份有限公司 Integrated circuit chip device and related product
CN109496319A (en) * 2018-01-15 2019-03-19 深圳鲲云信息科技有限公司 Artificial intelligence process device hardware optimization method, system, storage medium, terminal
CN108280305B (en) * 2018-01-30 2020-03-13 西安交通大学 Deep learning-based rapid topological optimization design method for cooling channel of heat dissipation device
EP3770775A4 (en) * 2018-03-23 2021-06-02 Sony Corporation Information processing device and information processing method
CN108764483B (en) * 2018-03-29 2021-05-18 杭州必优波浪科技有限公司 Neural network block optimization method with low computational power requirement and block optimizer
CN108564168B (en) * 2018-04-03 2021-03-09 中国科学院计算技术研究所 Design method for neural network processor supporting multi-precision convolution
US11954576B2 (en) 2018-04-17 2024-04-09 Shenzhen Corerain Technologies Co., Ltd. Method for implementing and developing network model and related product
CN110555334B (en) * 2018-05-30 2022-06-07 东华软件股份公司 Face feature determination method and device, storage medium and electronic equipment
US11663461B2 (en) 2018-07-05 2023-05-30 International Business Machines Corporation Instruction distribution in an array of neural network cores
CN109255148B (en) * 2018-07-27 2023-01-31 石家庄创天电子科技有限公司 Mechanical product design method and system
US10728954B2 (en) 2018-08-07 2020-07-28 At&T Intellectual Property I, L.P. Automated network design and traffic steering
CN110825311B (en) * 2018-08-10 2023-04-18 昆仑芯(北京)科技有限公司 Method and apparatus for storing data
CN109086875A (en) * 2018-08-16 2018-12-25 郑州云海信息技术有限公司 A kind of convolutional network accelerating method and device based on macroinstruction set
CN109409510B (en) * 2018-09-14 2022-12-23 深圳市中科元物芯科技有限公司 Neuron circuit, chip, system and method thereof, and storage medium
CN110991161B (en) * 2018-09-30 2023-04-18 北京国双科技有限公司 Similar text determination method, neural network model obtaining method and related device
CN109359732B (en) 2018-09-30 2020-06-09 阿里巴巴集团控股有限公司 Chip and data processing method based on chip
CN111078284B (en) * 2018-10-19 2021-02-05 中科寒武纪科技股份有限公司 Operation method, system and related product
CN111078285B (en) * 2018-10-19 2021-01-26 中科寒武纪科技股份有限公司 Operation method, system and related product
CN111079913B (en) * 2018-10-19 2021-02-05 中科寒武纪科技股份有限公司 Operation method, device and related product
CN111078293B (en) * 2018-10-19 2021-03-16 中科寒武纪科技股份有限公司 Operation method, device and related product
CN111078282B (en) * 2018-10-19 2020-12-22 安徽寒武纪信息科技有限公司 Operation method, device and related product
CN111078283B (en) * 2018-10-19 2021-02-09 中科寒武纪科技股份有限公司 Operation method, device and related product
CN111079907B (en) * 2018-10-19 2021-01-26 安徽寒武纪信息科技有限公司 Operation method, device and related product
CN111079910B (en) * 2018-10-19 2021-01-26 中科寒武纪科技股份有限公司 Operation method, device and related product
WO2020078446A1 (en) * 2018-10-19 2020-04-23 中科寒武纪科技股份有限公司 Computation method and apparatus, and related product
CN111079911B (en) * 2018-10-19 2021-02-09 中科寒武纪科技股份有限公司 Operation method, system and related product
CN111079915B (en) * 2018-10-19 2021-01-26 中科寒武纪科技股份有限公司 Operation method, device and related product
CN111079914B (en) * 2018-10-19 2021-02-09 中科寒武纪科技股份有限公司 Operation method, system and related product
CN111079912B (en) * 2018-10-19 2021-02-12 中科寒武纪科技股份有限公司 Operation method, system and related product
CN111078280B (en) * 2018-10-19 2021-01-26 中科寒武纪科技股份有限公司 Operation method, device and related product
CN111079916B (en) * 2018-10-19 2021-01-15 安徽寒武纪信息科技有限公司 Operation method, system and related product
CN111079909B (en) * 2018-10-19 2021-01-26 安徽寒武纪信息科技有限公司 Operation method, system and related product
CN111079925B (en) * 2018-10-19 2021-04-09 中科寒武纪科技股份有限公司 Operation method, device and related product
CN111078291B (en) * 2018-10-19 2021-02-09 中科寒武纪科技股份有限公司 Operation method, system and related product
CN111078125B (en) * 2018-10-19 2021-01-29 中科寒武纪科技股份有限公司 Operation method, device and related product
CN111079924B (en) * 2018-10-19 2021-01-08 中科寒武纪科技股份有限公司 Operation method, system and related product
CN111078281B (en) * 2018-10-19 2021-02-12 中科寒武纪科技股份有限公司 Operation method, system and related product
CN111104120B (en) * 2018-10-29 2023-12-22 赛灵思公司 Neural network compiling method and system and corresponding heterogeneous computing platform
CN111144561B (en) * 2018-11-05 2023-05-02 杭州海康威视数字技术股份有限公司 Neural network model determining method and device
WO2020107265A1 (en) * 2018-11-28 2020-06-04 深圳市大疆创新科技有限公司 Neural network processing device, control method, and computing system
CN111240682A (en) * 2018-11-28 2020-06-05 深圳市中兴微电子技术有限公司 Instruction data processing method and device, equipment and storage medium
CN111542818B (en) * 2018-12-12 2023-06-06 深圳鲲云信息科技有限公司 Network model data access method and device and electronic equipment
CN111325311B (en) * 2018-12-14 2024-03-29 深圳云天励飞技术有限公司 Neural network model generation method for image recognition and related equipment
CN111381979B (en) * 2018-12-29 2023-05-23 杭州海康威视数字技术股份有限公司 Development verification method, device and system of neural network and storage medium
CN109799977B (en) * 2019-01-25 2021-07-27 西安电子科技大学 Method and system for developing and scheduling data by instruction program
CN109978160B (en) * 2019-03-25 2021-03-02 中科寒武纪科技股份有限公司 Configuration device and method of artificial intelligence processor and related products
CN111865640B (en) * 2019-04-30 2023-09-26 华为技术服务有限公司 Network architecture description method, device, and medium
CN110210605B (en) * 2019-05-31 2023-04-07 Oppo广东移动通信有限公司 Hardware operator matching method and related product
CN112132271A (en) * 2019-06-25 2020-12-25 Oppo广东移动通信有限公司 Neural network accelerator operation method, architecture and related device
CN110443357B (en) * 2019-08-07 2020-09-15 上海燧原智能科技有限公司 Convolutional neural network calculation optimization method and device, computer equipment and medium
CN113272813B (en) * 2019-10-12 2023-05-05 深圳鲲云信息科技有限公司 Custom data stream hardware simulation method, device, equipment and storage medium
CN111339027B (en) * 2020-02-25 2023-11-28 中国科学院苏州纳米技术与纳米仿生研究所 Automatic design method for reconfigurable artificial intelligence cores and heterogeneous multi-core chips
WO2022135599A1 (en) * 2020-12-25 2022-06-30 中科寒武纪科技股份有限公司 Device, board and method for merging branch structures, and readable storage medium
US11693692B2 (en) 2021-06-17 2023-07-04 International Business Machines Corporation Program event recording storage alteration processing for a neural network accelerator instruction
CN113657059B (en) * 2021-08-17 2023-05-09 成都视海芯图微电子有限公司 Automatic design method and device suitable for point cloud data processor
CN114968602B (en) * 2022-08-01 2022-10-21 成都图影视讯科技有限公司 Architecture, method and apparatus for a dynamically resource-allocated neural network chip

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106022468A (en) * 2016-05-17 2016-10-12 成都启英泰伦科技有限公司 Artificial neural network processor integrated circuit and design method therefor
CN106355244A (en) * 2016-08-30 2017-01-25 深圳市诺比邻科技有限公司 CNN (convolutional neural network) construction method and system
CN106529670A (en) * 2016-10-27 2017-03-22 中国科学院计算技术研究所 Neural network processor based on weight compression, design method, and chip
CN107016175A (en) * 2017-03-23 2017-08-04 中国科学院计算技术研究所 Automated design method, device and optimization method applicable to a neural network processor

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG, YING ET AL.: "DeepBurning: Automatic Generation of FPGA-Based Learning Accelerators for the Neural Network Family", DESIGN AUTOMATION CONFERENCE (DAC), 9 June 2016 (2016-06-09), XP055540634 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022068343A1 (en) * 2020-09-30 2022-04-07 International Business Machines Corporation Memory-mapped neural network accelerator for deployable inference systems
GB2614851A (en) * 2020-09-30 2023-07-19 Ibm Memory-mapped neural network accelerator for deployable inference systems

Also Published As

Publication number Publication date
CN107016175A (en) 2017-08-04
CN107016175B (en) 2018-08-31

Similar Documents

Publication Publication Date Title
WO2018171715A1 (en) Automated design method and system applicable for neural network processor
WO2018171717A1 (en) Automated design method and system for neural network processor
Xu et al. AutoDNNChip: An automated DNN chip predictor and builder for both FPGAs and ASICs
Xu et al. CaFPGA: An automatic generation model for CNN accelerator
Chen et al. A NoC-based simulator for design and evaluation of deep neural networks
Yang et al. S²Engine: A novel systolic architecture for sparse convolutional neural networks
Belabed et al. User driven FPGA-based design automated framework of deep neural networks for low-power low-cost edge computing
Kim. FPGA based neural network accelerators
Cordeiro et al. Machine learning migration for efficient near-data processing
EP3805995A1 (en) Method of and apparatus for processing data of a deep neural network
Wang et al. Briefly Analysis about CNN Accelerator based on FPGA
Odetola et al. 2L-3W: 2-level 3-way hardware-software co-verification for the mapping of deep learning architecture (DLA) onto FPGA boards
Bhowmik et al. ESCA: Event-based split-CNN architecture with data-level parallelism on ultrascale+ FPGA
US20210312278A1 (en) 2021-10-07 Method and apparatus with incremental learning model
Kim et al. Agamotto: A performance optimization framework for CNN accelerator with row stationary dataflow
Gan et al. High performance reconfigurable computing for numerical simulation and deep learning
CN116974868A (en) Chip power consumption estimation device, method, electronic equipment and storage medium
Shahshahani et al. An automated tool for implementing deep neural networks on fpga
Odetola et al. 2L-3W: 2-level 3-way hardware-software co-verification for the mapping of convolutional neural network (CNN) onto FPGA boards
Gonçalves et al. Exploring data size to run convolutional neural networks in low density fpgas
Yu et al. Hardware implementation of CNN based on FPGA for EEG signal patterns recognition
Li et al. ANNA: Accelerating Neural Network Accelerator through software-hardware co-design for vertical applications in edge systems
Ali et al. RISC-V based MPSoC design exploration for FPGAs: area, power and performance
Li et al. An experimental evaluation of extreme learning machines on several hardware devices
Krishnamoorthy et al. Integrated analysis of power and performance for cutting edge Internet of Things microprocessor architectures

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 18772279
Country of ref document: EP
Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 Ep: pct application non-entry in european phase
Ref document number: 18772279
Country of ref document: EP
Kind code of ref document: A1