CN110232441A - Stacked autoencoder system and method based on a unidirectional systolic array - Google Patents

Info

Publication number
CN110232441A
Authority
CN
China
Legal status
Granted
Application number
CN201910528794.1A
Other languages
Chinese (zh)
Other versions
CN110232441B (en)
Inventor
李丽
黄延
傅玉祥
陈沁雨
李伟
Current Assignee
Nanjing University
Original Assignee
Nanjing University
Priority date: 2019-06-18
Filing date: 2019-06-18
Publication date: 2019-09-13
Application filed by Nanjing University
Priority to CN201910528794.1A
Publication of CN110232441A
Application granted
Publication of CN110232441B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Complex Calculations (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The present invention is a hardware implementation of stacked autoencoder inference based on a unidirectional systolic array, comprising a signal control module, an input/output control module, a data address generation module, and a computing array module. Signal control module: receives the start signal, controls communication between modules, and generates the end signal. Input/output control module: on input, reads data from off-chip DDR and stores it in on-chip SRAM in a specific arrangement; on output, writes the on-chip SRAM data back to DDR in a specific arrangement. Data address generation module: generates source-data or result-data addresses. Computing array module: performs neural network inference in the manner of a unidirectional systolic array. The invention supports batch processing and pipelined operation, uses ping-pong operation to partially hide computation time and memory-access time, achieves a high speedup, and scales well.

Description

Stacked autoencoder system and method based on a unidirectional systolic array
Technical field
The present invention relates to the field of artificial intelligence, and in particular to a stacked autoencoder system and method based on a unidirectional systolic array.
Background
The stacked denoising autoencoder is a typical standard neural network with two main parts: a series of autoencoders and a multilayer perceptron (MLP). Inference in a stacked denoising autoencoder is effectively equivalent to the feedforward pass of a multilayer perceptron. Suppose the output of the j-th neuron in some layer of the network is $y_j$, the neuron has $n$ operands, the i-th operand is $x_i$ with corresponding weight $w_{ij}$, and the neuron's bias is $b_j$. Then, with $f$ denoting the activation function:
$$y_j = f\left(\sum_{i=1}^{n} w_{ij} x_i + b_j\right)$$
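The neuron output y_j = f(sum over i of w_ij * x_i + b_j) can be sketched in floating-point Python for a whole fully connected layer. The function names and example numbers are illustrative assumptions, not part of the patent; the weights are indexed `w[i][j]` (operand `i`, neuron `j`) to match w_ij.

```python
import math
from typing import Callable, Sequence


def relu(v: float) -> float:
    return max(0.0, v)


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def dense_forward(x: Sequence[float],
                  w: Sequence[Sequence[float]],
                  b: Sequence[float],
                  f: Callable[[float], float]) -> list:
    """Return [f(sum_i w[i][j] * x[i] + b[j]) for each neuron j]."""
    n, m = len(x), len(b)
    return [f(sum(w[i][j] * x[i] for i in range(n)) + b[j]) for j in range(m)]


# Two operands feeding three neurons (illustrative values).
x = [1.0, 2.0]
w = [[0.5, -1.0, 0.0],
     [0.25, 1.0, 1.0]]
b = [0.0, 0.5, -3.0]
print(dense_forward(x, w, b, relu))  # [1.0, 1.5, 0.0]
```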
Such computation-intensive algorithms require substantial computing power. Before 2007, given the network sizes and data volumes of the time, general-purpose CPU chips could provide sufficient computing power. Later, GPUs developed rapidly, and their parallel computing characteristics happen to suit the large-scale parallel data processing that AI algorithms require, so GPUs became mainstream. Structurally, about 70% of a CPU's transistors are used to build caches and control units, leaving relatively few arithmetic logic units (ALUs), so CPUs struggle to meet the computing demands of AI algorithms. GPUs far exceed CPUs in computing capability, but their hardware structure is fixed: if the AI algorithm changes substantially, a GPU cannot flexibly reconfigure its hardware. In addition, both GPUs and CPUs consume considerable power.
Today, as technology advances and more and more application scenarios emerge, demand for AI chips keeps growing, and AI development faces new problems. For example, driverless cars require real-time, extremely low-latency responses, which rules out high-power, high-cost GPUs.
How to handle the enormous computational load of deep learning within acceptable power and cost limits, while achieving better neural network performance, lower power consumption, and better scalability, is a major technical problem in artificial intelligence today.
Summary of the invention
The present invention aims to overcome the existing technical problems: to improve neural network computing efficiency, make full use of storage and computing resources, and accelerate inference. To this end it provides a stacked autoencoder system based on a unidirectional systolic array, implemented through the following technical solution:
The stacked autoencoder system based on a unidirectional systolic array comprises:
Signal control module: receives the start signal, controls communication between modules, and generates the end signal;
Memory module: comprises off-chip DDR memory and on-chip SRAM memory;
Input/output control module: on input, reads data from the off-chip DDR memory and stores it in order in the on-chip SRAM memory; on output, writes the data in the on-chip SRAM memory back to the off-chip DDR memory in order;
Data address generation module: generates addresses for source data or result data;
Computing array module: performs neural network inference in the manner of a unidirectional systolic array.
In a further design of the system, all results of the neural network algorithm share the same set of storage resources, and the storage locations occupied by intermediate results produced by the algorithm can be overwritten.
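A minimal sketch of how intermediate results can share one store: in the hypothetical model below, two fixed regions of a shared buffer alternate between layers, so each layer's output overwrites the result from two layers earlier, which is no longer needed. The two-region layout and the names `run_stacked` and `width` are assumptions for illustration; the patent does not describe its exact memory map.

```python
from typing import Callable, Sequence


def run_stacked(x: Sequence[float],
                weights: Sequence[Sequence[Sequence[float]]],
                biases: Sequence[Sequence[float]],
                f: Callable[[float], float]) -> list:
    """Run fully connected layers using only two preallocated result regions.

    weights[l][i][j] is the weight from input i to neuron j of layer l.
    """
    width = max([len(x)] + [len(b) for b in biases])
    buf = [[0.0] * width, [0.0] * width]  # two fixed regions of the shared store
    buf[0][:len(x)] = x
    src, n = 0, len(x)
    for w, b in zip(weights, biases):
        dst = 1 - src
        for j in range(len(b)):
            acc = b[j]
            for i in range(n):
                acc += w[i][j] * buf[src][i]
            buf[dst][j] = f(acc)  # overwrites the result from two layers ago
        src, n = dst, len(b)
    return buf[src][:n]
```

Memory use stays at two regions of `width` values no matter how many layers are stacked.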
In a further design, the computing array module comprises a 32x32 unidirectional systolic array. Each independent computing unit in the array contains a 16-bit fixed-point multiplier, an adder, a divider, a linear activation function unit supporting ReLU, and a nonlinear activation function unit supporting tanh and sigmoid, so that it can perform both multiply-accumulate and neural network activation function computation.
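The following Python sketch models one computing unit: 16-bit fixed-point multiply-accumulate, bias addition, and the three activation options. The Q8.8 format (8 fractional bits), saturation on overflow, and the use of `math.tanh`/`math.exp` in place of whatever approximation the hardware activation units use are all assumptions; the patent states only that the units are 16-bit fixed point. The divider is not modeled.

```python
import math

FRAC_BITS = 8  # assumed Q8.8; the patent only says "16-bit fixed point"
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1


def saturate(q: int) -> int:
    return max(INT16_MIN, min(INT16_MAX, q))


def to_fixed(v: float) -> int:
    """Quantize a real number to a saturated 16-bit fixed-point integer."""
    return saturate(round(v * (1 << FRAC_BITS)))


def to_float(q: int) -> float:
    return q / (1 << FRAC_BITS)


def pe_mac(operands, weights, bias_q: int) -> int:
    """Multiply-accumulate in a wide accumulator, add bias, rescale to 16 bits."""
    acc = 0
    for xq, wq in zip(operands, weights):
        acc += xq * wq                    # each product carries 2*FRAC_BITS fraction bits
    acc += bias_q << FRAC_BITS            # align the bias to the product scale
    return saturate(acc >> FRAC_BITS)     # arithmetic shift back to Q8.8


def activate(q: int, kind: str) -> int:
    if kind == "relu":                    # linear activation unit
        return max(0, q)
    v = to_float(q)
    if kind == "tanh":                    # nonlinear activation unit
        return to_fixed(math.tanh(v))
    if kind == "sigmoid":
        return to_fixed(1.0 / (1.0 + math.exp(-v)))
    raise ValueError(f"unknown activation {kind!r}")
```

For example, operands (1.5, -2.0), weights (0.5, 0.25), and bias 0.25 give 0.75 - 0.5 + 0.25 = 0.5, which `pe_mac` returns as the Q8.8 integer 128.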
In a further design, the unidirectional systolic array uses a unidirectional systolic data-transfer scheme that is systolic along columns and broadcast along rows. Specifically, operands are transmitted row by row, simultaneously, to every computing unit in a row, while weights enter the computing units of each column one after another, column by column. This allows both weights and operands to be reused many times.
A stacked autoencoder method using the stacked autoencoder system based on a unidirectional systolic array comprises the following steps:
Step 1) The signal control module receives the algorithm start signal and directs the input/output control module to transfer the input data from the DDR memory to the SRAM memory in a specific order;
Step 2) The signal control module then directs the data address generation module to generate source data addresses; the operands stored in the SRAM memory are passed to the computing array module according to those addresses, and an input-data-valid signal is generated and passed on;
Step 3) After receiving the input-data-valid signal and reading the operands from the SRAM memory, the computing array module starts the neural network inference computation. During computation, for each column, the weights of the corresponding neuron flow from top to bottom through the computing units; for each row, the input data of the same batch is broadcast from left to right to every computing unit of the array; the computation itself is completed inside each computing unit;
Step 4) The computing array module generates an output-data-valid signal; upon receiving it, the signal control module directs the data address generation module to generate result data addresses, and the result data is written into the SRAM memory according to those addresses;
Step 5) The signal control module directs the input/output control module to write the result data from the on-chip SRAM memory into the off-chip DDR memory and generates the end signal, completing one full neural network inference computation.
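The five steps form a fixed handshake sequence between modules. A hypothetical event-driven controller for that sequence might look like the following; the phase and event names are invented labels for the signals described in steps 1) to 5), not identifiers from the patent.

```python
from enum import Enum, auto


class Phase(Enum):
    IDLE = auto()
    LOAD = auto()        # step 1: input data DDR -> on-chip SRAM
    FETCH = auto()       # step 2: source addresses generated, operands sent to array
    COMPUTE = auto()     # step 3: systolic-array inference
    WRITE_BACK = auto()  # step 4: result addresses generated, results -> SRAM
    STORE = auto()       # step 5: results SRAM -> DDR
    DONE = auto()        # end signal asserted


# (current phase, event) -> next phase; event names are hypothetical labels
TRANSITIONS = {
    (Phase.IDLE, "start"): Phase.LOAD,
    (Phase.LOAD, "load_done"): Phase.FETCH,
    (Phase.FETCH, "input_valid"): Phase.COMPUTE,
    (Phase.COMPUTE, "output_valid"): Phase.WRITE_BACK,
    (Phase.WRITE_BACK, "results_stored"): Phase.STORE,
    (Phase.STORE, "store_done"): Phase.DONE,
}


def step(phase: Phase, event: str) -> Phase:
    """Advance the controller; reject events that arrive out of order."""
    try:
        return TRANSITIONS[(phase, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not expected in {phase.name}") from None


def run(events: list) -> list:
    """Return the sequence of phases visited, starting from IDLE."""
    trace = [Phase.IDLE]
    for event in events:
        trace.append(step(trace[-1], event))
    return trace
```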
In a further design of the method, the input data in step 1) includes operands and weights; each address of a storage unit in the SRAM memory holds four 16-bit fixed-point values, and operands and weights are stored sequentially in the storage units.
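As a sketch of the sequential storage described here, four signed 16-bit values can be packed into each 64-bit SRAM word. Placing the first value of each group in the least-significant 16 bits, and zero-padding a partial final word, are assumptions; the patent does not specify the lane order.

```python
def pack_words(values: list) -> list:
    """Pack signed 16-bit values sequentially, four per 64-bit word.

    Assumed layout: element k of each group occupies bits [16k, 16k+15].
    """
    words = []
    for base in range(0, len(values), 4):
        word = 0
        for lane, v in enumerate(values[base:base + 4]):
            if not -(1 << 15) <= v < (1 << 15):
                raise ValueError(f"{v} does not fit in 16 bits")
            word |= (v & 0xFFFF) << (16 * lane)  # two's-complement bits of v
        words.append(word)
    return words


def unpack_words(words: list, count: int) -> list:
    """Inverse of pack_words: recover the first `count` signed values."""
    out = []
    for word in words:
        for lane in range(4):
            raw = (word >> (16 * lane)) & 0xFFFF
            out.append(raw - (1 << 16) if raw & 0x8000 else raw)  # sign-extend
    return out[:count]
```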
Beneficial effects of the present invention:
The hardware implementation of stacked autoencoder inference based on a unidirectional systolic array of the present invention supports a configurable number of network layers and neurons, supports selection among three different interlayer activation functions, supports pipelining and ping-pong operation, supports batch processing, is flexible in application, and scales well.
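The benefit of ping-pong operation can be illustrated with an idealized timing model. In the sketch below, a tile of data takes `load` cycles to bring on chip and `compute` cycles to process; with two alternating buffers, loading the next tile overlaps computing the current one. The model and the function name `total_time` are assumptions for illustration and ignore write-back, control overhead, and DDR contention.

```python
def total_time(n_blocks: int, load: int, compute: int, ping_pong: bool) -> int:
    """Idealized cycle count for processing n_blocks tiles.

    Without ping-pong each tile is loaded and then computed. With two
    buffers, after the first load every stage costs max(load, compute).
    """
    if n_blocks <= 0:
        return 0
    if not ping_pong:
        return n_blocks * (load + compute)
    return load + (n_blocks - 1) * max(load, compute) + compute
```

For example, with 4 tiles, `load=3`, and `compute=5`, the model gives 32 cycles without ping-pong and 23 with it: the shorter of the two stages is hidden behind the longer one.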
Description of the drawings
Fig. 1 is a schematic diagram of a typical stacked autoencoder network model.
Fig. 2 is a schematic diagram of the stacked autoencoder system based on a unidirectional systolic array.
Fig. 3 is a schematic diagram of the data flow in the unidirectional systolic array.
Fig. 4 is a schematic diagram of the operand storage layout.
Fig. 5 is a schematic diagram of the weight storage layout.
Fig. 6 is a schematic diagram of the output storage layout.
Detailed description of embodiments
The present invention is described in detail below with reference to the drawings.
As shown in Fig. 1, this embodiment takes a typical standard neural network as an example. Adjacent layers are fully connected: each neuron receives input from all neurons of the previous layer and is connected to all neurons of the next layer. Inputs propagate through the weighted connections and each neuron's bias, so a neuron's output is determined by its weights, its bias, and the outputs of the neurons in the previous layer.
The stacked autoencoder system based on a unidirectional systolic array in this embodiment consists mainly of a signal control module, an input/output control module, a data address generation module, and a computing array module.
The relationships between the modules are shown in Fig. 2. The signal control module receives the start signal, controls communication between modules, and generates the end signal. Specifically, it receives the start signal; directs the input/output control module to bring in the source data; directs the data address generation module to generate source data addresses, so that source data is read from the storage units at those addresses and passed to the computing array module for computation; after the computation, receives the output-data-valid signal; directs the data address generation module to generate result data addresses; directs the input/output control module to send out the result data; and generates the end signal.
The input/output control module handles communication between the on-chip SRAM and the off-chip DDR. Specifically, after receiving a request signal, it reads data from the off-chip DDR and writes it into the on-chip SRAM according to a specific rule and order, for use by the computing array. After all computation is finished, it receives the end signal, reads the data from the on-chip SRAM, and writes it to the off-chip DDR according to a specific rule and order.
Data address generation module: before computation, it generates and outputs the addresses of the source data (operands and weights); after computation, it generates and outputs the addresses of the output data (the result data);
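A minimal address-generation sketch, assuming the sequential layout of four 16-bit values per 64-bit word: element `k` of a region that starts at word address `base` sits at word `base + k // 4`, lane `k % 4`. The function name and the base addresses are hypothetical; the patent does not give the memory map.

```python
from typing import Iterator, Tuple

VALUES_PER_WORD = 4  # 64-bit word / 16-bit values


def gen_addresses(base: int, count: int) -> Iterator[Tuple[int, int]]:
    """Yield (word_address, lane) for `count` sequentially stored values.

    `base` is a hypothetical starting word address of an operand, weight,
    or result region.
    """
    for k in range(count):
        yield base + k // VALUES_PER_WORD, k % VALUES_PER_WORD
```

For three batches of 128 operands stored from base 0, the last operand would land at word 95, lane 3.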
The computing array module is designed as a unidirectional systolic array that performs all multiply-accumulate operations of neural network inference. Specifically, it receives the input-data-valid signal from the signal control module, starts the inference computation, and, when the computation is finished, generates an output-result-valid signal and sends it to the signal control module.
A concrete implementation is given below with reference to Fig. 3. In this case, the neural network inference module consists of one 32x32 unidirectional systolic array, and the memory module consists of 128 data storage units and 32 constant storage units. Each data storage unit is an SRAM 64 bits wide and 8k deep; each constant storage unit is an SRAM 64 bits wide and 1k deep. Computation uses 16-bit fixed-point numbers, the operand vector length is 128, the number of hidden-layer neurons is 32, and the batch size is 3.
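The sizes in this embodiment can be checked with a little arithmetic. The snippet assumes "8k" and "1k" mean 8192 and 1024 words; the other parameters are the ones stated for this case.

```python
WORD_BITS = 64
DATA_UNITS, DATA_DEPTH = 128, 8 * 1024    # "8k" read as 8192 words (assumption)
CONST_UNITS, CONST_DEPTH = 32, 1 * 1024   # "1k" read as 1024 words (assumption)
VALUE_BITS = 16
INPUT_DIM, HIDDEN, BATCH = 128, 32, 3

data_bytes = DATA_UNITS * DATA_DEPTH * WORD_BITS // 8
const_bytes = CONST_UNITS * CONST_DEPTH * WORD_BITS // 8
values_per_word = WORD_BITS // VALUE_BITS          # 4 values per address

operand_words = BATCH * INPUT_DIM // values_per_word
weight_words = HIDDEN * INPUT_DIM // values_per_word
macs = BATCH * HIDDEN * INPUT_DIM                  # multiply-accumulates for the hidden layer

print(data_bytes, const_bytes, operand_words, weight_words, macs)
```

Under those assumptions the data SRAMs total 8 MiB (8,388,608 bytes) and the constant SRAMs 256 KiB (262,144 bytes); the three operand vectors occupy 96 words, the 32 weight vectors 1,024 words, and the hidden layer needs 3 x 32 x 128 = 12,288 multiply-accumulates.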
The specific steps of the present embodiment are as follows:
Step 1) The signal control module receives the algorithm start signal and directs the input/output control module to transfer the input data from DDR to SRAM in a specific order. The storage layout of the input data (operands and weights) in SRAM is shown in Fig. 4 and Fig. 5: each address of a storage unit (64 bits) holds four 16-bit fixed-point values, and operands and weights are stored sequentially in the storage units.
Step 2) After the data transfer of step 1) is complete, the signal control module directs the data address generation module to generate source data addresses; the operands stored in SRAM are passed to the computing array module according to those addresses, and an input-data-valid signal is generated and passed on.
Step 3) After receiving the input-data-valid signal and reading the operands from SRAM, the computing array module starts the neural network inference computation. The data flow of the computation is shown in Fig. 3:
Let $X^{(1)}, X^{(2)}, X^{(3)}$ be the operands of batch 1, batch 2, and batch 3 respectively; each is a vector of length 128. Let $W_1, W_2, W_3, \ldots, W_{32}$ be the weights corresponding to hidden-layer neurons 1, 2, 3, ..., 32; each is also a vector of length 128. The computing unit in row $i$, column $j$ is denoted PE$(i, j)$. For example, $W_1 = (W_{1,1}, W_{1,2}, W_{1,3}, \ldots, W_{1,128})$.
During computation, for each column, the weights corresponding to different neurons flow from top to bottom into each computing unit (MLU); for each row, the input data of the same batch is broadcast from left to right to each computing unit. The computation is completed inside each computing unit and consists mainly of multiply-accumulate operations.
Taking MLU(1,1) in the first row as an example, the input data (the batch-1 operands $X^{(1)}$) enter the systolic array in row order and are broadcast along the same row, while the corresponding weights $W_{1,1}, W_{1,2}, W_{1,3}, \ldots, W_{1,128}$ enter the array in column order and flow down the same column. The operands and weights are multiply-accumulated inside MLU(1,1). Once the multiply-accumulation is complete, the result is added to the bias $b_1$ of the current MLU and passed through the activation function unit (AU), completing the computation of the output of neuron 1. Likewise, MLU(1,2), MLU(1,3), ..., MLU(1,32) compute the outputs of neurons 2 through 32. The MLUs in the second and third rows follow the same computation as those in the first row, but each row obtains its results one clock cycle after the row above it.
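The row-broadcast, column-systolic data flow can be checked with a small cycle-level simulation. In the sketch below, row `i` holds batch `i` and column `j` holds neuron `j`; weight element `k` enters the top of its column at cycle `k` and moves down one row per cycle, while each row's operand is broadcast to all units in that row when the matching weight arrives. Bias addition and activation are omitted, and the register-level timing is an assumption rather than the patent's RTL.

```python
def simulate_systolic(X, W):
    """Return (acc, finish): acc[i][j] = dot(X[i], W[j]); finish[i] = last cycle of row i.

    X[i]: operand vector of batch i (row i). W[j]: weight vector of neuron j (column j).
    """
    rows, cols, n = len(X), len(W), len(X[0])
    acc = [[0] * cols for _ in range(rows)]
    weight_reg = [[None] * cols for _ in range(rows)]  # weight latched in PE(i, j)
    finish = [None] * rows
    for t in range(n + rows - 1):
        # Shift weights down one row; bottom row first so nothing is overwritten.
        for i in range(rows - 1, 0, -1):
            for j in range(cols):
                weight_reg[i][j] = weight_reg[i - 1][j]
        for j in range(cols):
            weight_reg[0][j] = W[j][t] if t < n else None
        for i in range(rows):
            k = t - i                    # element index currently in row i
            if 0 <= k < n:
                x = X[i][k]              # broadcast to every PE in row i
                for j in range(cols):
                    acc[i][j] += x * weight_reg[i][j]
                if k == n - 1:
                    finish[i] = t
    return acc, finish
```

In this model each row finishes one cycle after the row above it; with 128 operands the three batch rows would finish at cycles 127, 128, and 129.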
In this way, the computing array completes the entire neural network inference computation.
Step 4) The computing array module generates an output-data-valid signal; upon receiving it, the signal control module directs the data address generation module to generate result data addresses, and the result data is written into SRAM according to those addresses.
Step 5) The signal control module directs the input/output control module to write the results from the on-chip SRAM memory into the off-chip DDR memory and generates the end signal, completing one full neural network inference computation.
The above are merely preferred embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be defined by the scope of protection of the claims.

Claims (6)

1. A stacked autoencoder system based on a unidirectional systolic array, characterized by comprising:
Signal control module: receives the start signal, controls communication between modules, and generates the end signal;
Memory module: comprises off-chip DDR memory and on-chip SRAM memory;
Input/output control module: on input, reads data from the off-chip DDR memory and stores it in order in the on-chip SRAM memory; on output, writes the result data in the on-chip SRAM memory back to the off-chip DDR memory in order;
Data address generation module: generates addresses for source data or result data;
Computing array module: performs neural network inference in the manner of a unidirectional systolic array.
2. The stacked autoencoder system based on a unidirectional systolic array according to claim 1, characterized in that all results of the neural network algorithm share the same set of storage resources, and the storage locations occupied by intermediate results produced by the algorithm can be overwritten.
3. The stacked autoencoder system based on a unidirectional systolic array according to claim 1, characterized in that the computing array module comprises a 32x32 unidirectional systolic array, each independent computing unit of which contains a 16-bit fixed-point multiplier, an adder, a divider, a linear activation function unit supporting ReLU, and a nonlinear activation function unit supporting tanh and sigmoid, so as to perform multiply-accumulate and neural network activation function computation.
4. The stacked autoencoder system based on a unidirectional systolic array according to claim 1, characterized in that the unidirectional systolic array uses a unidirectional systolic data-transfer scheme that is systolic along columns and broadcast along rows, specifically: operands are transmitted row by row, simultaneously, to every computing unit in a row, and weights enter the computing units of each column one after another, column by column, so that both weights and operands are reused many times.
5. A stacked autoencoder method using the stacked autoencoder system based on a unidirectional systolic array according to any one of claims 1-4, characterized by comprising the following steps:
Step 1) The signal control module receives the algorithm start signal and directs the input/output control module to transfer the input data from the DDR memory to the SRAM memory in a specific order;
Step 2) The signal control module then directs the data address generation module to generate source data addresses; the operands stored in the SRAM memory are passed to the computing array module according to those addresses, and an input-data-valid signal is generated and passed on;
Step 3) After receiving the input-data-valid signal and reading the operands from the SRAM memory, the computing array module starts the neural network inference computation. During computation, for each column, the weights of the corresponding neuron flow from top to bottom through the computing units; for each row, the input data of the same batch is broadcast from left to right to every computing unit of the array; the computation itself is completed inside each computing unit;
Step 4) The computing array module generates an output-data-valid signal; upon receiving it, the signal control module directs the data address generation module to generate result data addresses, and the result data is written into the SRAM memory according to those addresses;
Step 5) The signal control module directs the input/output control module to write the result data from the on-chip SRAM memory into the off-chip DDR memory and generates the end signal, completing one full neural network inference computation.
6. The stacked autoencoder method according to claim 5, characterized in that the input data in step 1) includes operands and weights, each address of a storage unit in the SRAM memory holds four 16-bit fixed-point values, and operands and weights are stored sequentially in the storage units.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910528794.1A CN110232441B (en) 2019-06-18 2019-06-18 Stacked autoencoder system and method based on unidirectional systolic array

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910528794.1A CN110232441B (en) 2019-06-18 2019-06-18 Stacked autoencoder system and method based on unidirectional systolic array

Publications (2)

Publication Number Publication Date
CN110232441A (en) 2019-09-13
CN110232441B (en) 2023-05-09

Family

ID=67859718

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910528794.1A Active CN110232441B (en) 2019-06-18 2019-06-18 Stacked autoencoder system and method based on unidirectional systolic array

Country Status (1)

Country Link
CN (1) CN110232441B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140163664A1 (en) * 2006-11-21 2014-06-12 David S. Goldsmith Integrated system for the ballistic and nonballistic infixion and retrieval of implants with or without drug targeting
CN104319773A (en) * 2014-11-25 2015-01-28 常熟市五爱电器设备有限公司 Solar energy and electric supply flexible complementation power supply system
CN108710943A (en) * 2018-05-21 2018-10-26 南京大学 Parallel accelerator for multilayer feedforward neural networks

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110689123A (en) * 2019-09-27 2020-01-14 南京大学 Long-short term memory neural network forward acceleration system and method based on pulse array
CN111401522A (en) * 2020-03-12 2020-07-10 上海交通大学 Variable speed pulsating array speed control method and variable speed pulsating array micro-frame
CN111401522B (en) * 2020-03-12 2023-08-15 上海交通大学 Pulsation array variable speed control method and variable speed pulsation array micro-frame system
CN111401532A (en) * 2020-04-28 2020-07-10 南京宁麒智能计算芯片研究院有限公司 Convolutional neural network reasoning accelerator and acceleration method

Also Published As

Publication number Publication date
CN110232441B (en) 2023-05-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant