CN109409510A - Neuron circuit, chip, system and method, storage medium - Google Patents

Neuron circuit, chip, system and method, storage medium Download PDF

Info

Publication number
CN109409510A
CN109409510A (application CN201811076248.0A)
Authority
CN
China
Prior art keywords
deep learning
neuron
module
neural net layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811076248.0A
Other languages
Chinese (zh)
Other versions
CN109409510B (en)
Inventor
王峥
梁明兰
林跃金
李善辽
赵玮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Zhongke Yuanwuxin Technology Co ltd
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN201811076248.0A priority Critical patent/CN109409510B/en
Publication of CN109409510A publication Critical patent/CN109409510A/en
Application granted granted Critical
Publication of CN109409510B publication Critical patent/CN109409510B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The present invention, applicable to the field of computer technology, provides a neuron circuit, chip, system, method, and storage medium. The neuron circuit comprises the following structure: a computing module; a configuration-information storage module, for storing neuron processing-mode configuration information; and a control module, for controlling the computing module, according to the processing-mode configuration information, to adjust to the corresponding computing architecture and execute the corresponding neural network layer node data processing. In this way, the complex and diverse, rapidly iterating neural network computing demands can be met, and the invention can be widely applied in fields where computing resources are limited and a degree of neural network architecture reconfigurability is required, extending the range of application of deep learning chips.

Description

Neuron circuit, chip, system and method, storage medium
Technical field
The invention belongs to the field of computer technology, and more particularly relates to a neuron circuit, chip, system, method, and storage medium.
Background technique
In recent years, with the wide application of deep learning technology based on artificial neural networks in fields such as computer vision, natural language processing, and intelligent system decision-making, artificial intelligence chip technology for accelerating neural network computation has attracted the attention of both academia and industry.
Most existing Application Specific Integrated Circuit (ASIC) chips customized for neural network computation are based on a pre-specified network structure and algorithm. In the excessive pursuit of power and speed performance, their hardware structure is fixed and lacks neural network architecture reconfigurability, so the complex and diverse, rapidly iterating neural network structures of today cannot be deployed. As a result, such ASIC chips cannot be widely applied in fields where computing resources are limited and a degree of neural network architecture reconfigurability is required, such as mobile Internet-of-Things terminals, drones, and autonomous driving, and their application is restricted.
Summary of the invention
The purpose of the present invention is to provide a neuron circuit, chip, system, method, and storage medium, intended to solve the problem in the prior art that the neural network architecture cannot be reconfigured, which limits the application of deep learning chips.
In one aspect, the present invention provides a neuron circuit, the neuron circuit comprising:
A computing module;
A configuration-information storage module, for storing neuron processing-mode configuration information; and
A control module, for controlling the computing module, according to the processing-mode configuration information, to adjust to the corresponding computing architecture and execute the corresponding neural network layer node data processing.
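As a rough, hypothetical sketch of the three claimed modules, they can be modeled in software as follows. All names (`NeuronCircuit`, the layer-type strings) are invented for illustration; the patent describes a hardware circuit, not this code.

```python
class NeuronCircuit:
    """Software model: computing module + configuration-information storage + control module."""

    def __init__(self):
        self.config = None   # configuration-information storage module
        self.mode = None     # current computing architecture of the computing module

    def load_config(self, config):
        """Store a processing-mode configuration (e.g. pushed in by the chip's controller)."""
        self.config = config

    def reconfigure(self):
        """Control module: adjust the computing module to match the stored configuration."""
        self.mode = self.config["layer_type"]

    def process(self, inputs, weights=None):
        """Computing module: execute the node computation for the configured layer type."""
        if self.mode in ("conv", "fc"):
            # multiply-accumulate over inputs and weights
            return sum(x * w for x, w in zip(inputs, weights))
        if self.mode == "pool":
            return max(inputs)                      # max-pooling node
        if self.mode == "activation":
            return [max(0.0, x) for x in inputs]    # ReLU-style activation
        raise ValueError("neuron not configured")

n = NeuronCircuit()
n.load_config({"layer_type": "conv"})
n.reconfigure()
acc = n.process([1.0, 2.0], weights=[0.5, 0.25])   # 1*0.5 + 2*0.25

n.load_config({"layer_type": "pool"})              # same circuit, new configuration
n.reconfigure()
pooled = n.process([3.0, 7.0, 2.0])
```

The point of the sketch is only that one unit serves different layer types purely by swapping its stored configuration, which is the reconfigurability the claim describes.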
In another aspect, the present invention provides a deep learning chip, the deep learning chip comprising:
A storage unit, for storing a deep learning instruction set and the data targeted by the deep learning, the deep learning instruction set comprising several neural network layer instructions with a predetermined processing order;
A neuron array composed of several neuron circuits as described above;
A central controller, for controlling, according to the deep learning instruction set, such that: the neuron circuits in the neuron array are loaded, from the storage unit, with the current processing-mode configuration information corresponding to the current neural network layer instruction and with the data to be processed accordingly; and after the current neural network layer processing task indicated by the current neural network layer instruction is completed, the next neural network layer processing task is executed, until the deep learning task indicated by the deep learning instruction set is completed; and
An input-output unit, for transferring data between the storage unit and the neuron array.
In another aspect, the present invention also provides a deep learning chip cascade system, the deep learning chip cascade system comprising: at least two deep learning chips as described above, with cascade connections between them.
In another aspect, the present invention also provides a deep learning system, the deep learning system comprising: at least one deep learning chip as described above, and peripheral components connected to the deep learning chip.
In another aspect, the present invention also provides a neuron control method, the neuron control method comprising the following steps:
Obtaining neuron processing-mode configuration information;
Controlling a computing module, according to the processing-mode configuration information, to adjust to the corresponding computing architecture and execute the corresponding neural network layer node data processing.
In another aspect, the present invention also provides a deep learning control method, the deep learning control method comprising the following steps:
Obtaining a deep learning instruction set, the deep learning instruction set comprising several neural network layer instructions with a predetermined processing order;
Controlling, according to the deep learning instruction set, such that: the neuron circuits in a neuron array are loaded with the current processing-mode configuration information corresponding to the current neural network layer instruction and with the data to be processed accordingly, wherein a neuron circuit adjusts to the corresponding computing architecture according to the current processing-mode configuration information and executes the corresponding neural network layer node data processing; and after the current neural network layer processing task indicated by the current neural network layer instruction is completed, the next neural network layer processing task is executed, until the deep learning task indicated by the deep learning instruction set is completed.
In another aspect, the present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above methods.
In another aspect, the present invention also provides a deep learning method, the deep learning method being based on a deep learning chip as described above or on a deep learning chip cascade system as described above, the deep learning method comprising the following steps:
Loading the deep learning instruction set and the data into the storage unit;
The central controller controlling, according to the deep learning instruction set, such that: the neuron circuits in the neuron array are loaded with the current processing-mode configuration information corresponding to the current neural network layer instruction and with the data to be processed accordingly, wherein a neuron circuit adjusts to the corresponding computing architecture according to the current processing-mode configuration information and executes the corresponding neural network layer node data processing; and after the current neural network layer processing task indicated by the current neural network layer instruction is completed, the next neural network layer processing task is executed, until the deep learning task indicated by the deep learning instruction set is completed.
In the present invention, the neuron circuit comprises the following structure: a computing module; a configuration-information storage module, for storing neuron processing-mode configuration information; and a control module, for controlling the computing module, according to the processing-mode configuration information, to adjust to the corresponding computing architecture and execute the corresponding neural network layer node data processing. In this way, the neuron circuits, and the deep learning chips in which they are applied, can be flexibly configured according to demands such as scene function, neural network type, neural network scale, and neuron operation mode, so that the deep learning chips and neuron circuits can be reconfigured according to actual neural network computing needs, meeting the complex and diverse, rapidly iterating neural network computing demands; the invention can be widely applied in fields where computing resources are limited and a degree of neural network architecture reconfigurability is required, extending the application of deep learning chips.
Detailed description of the invention
Fig. 1 is the structural schematic diagram for the neuron circuit that the embodiment of the present invention one provides;
Fig. 2 is the structural schematic diagram of neuron circuit provided by Embodiment 2 of the present invention;
Fig. 3 is the structural schematic diagram for the neuron circuit that the embodiment of the present invention three provides;
Fig. 4 is the structural schematic diagram for the neuron circuit that the embodiment of the present invention four provides;
Fig. 5 is the structural schematic diagram for the deep learning chip that the embodiment of the present invention five provides;
Fig. 6 is the structural schematic diagram for the deep learning chip cascade system that the embodiment of the present invention eight provides;
Fig. 7 is the structural schematic diagram for the deep learning system that the embodiment of the present invention nine provides;
Fig. 8 is the flow diagram for the neuron control method that the embodiment of the present invention ten provides;
Fig. 9 is the flow diagram for the deep learning control method that embodiment eleven of the present invention provides;
Figure 10 is the data structure schematic diagram of convolutional network layer instruction in an application example of the invention;
Figure 11 is the data structure schematic diagram of network layer instruction in pond in an application example of the invention;
Figure 12 is the data structure schematic diagram of fully connected network network layers instruction in an application example of the invention;
Figure 13 is the data structure schematic diagram of activation primitive network layer instruction in an application example of the invention;
Figure 14 is the data structure schematic diagram of state action network layer instruction in an application example of the invention;
Figure 15 is the structural schematic diagram of CRNA framework chip in an application example of the invention;
Figure 16 is the structural schematic diagram of neuron circuit in an application example of the invention;
Figure 17 is the control-flow schematic diagram of the 128-stage state machine of the central controller in an application example of the invention.
Specific embodiment
In order to make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is further elaborated below with reference to the accompanying drawings and embodiments. It should be appreciated that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it.
Specific implementations of the invention are described in detail below in conjunction with specific embodiments:
Embodiment one:
Fig. 1 shows the structure of the neuron circuit provided by embodiment one of the present invention; in particular, a digital neuron circuit used to constitute a deep learning neural network, where the deep learning neural network can perform the ordered processing of input data through each required neural network layer, and the neuron circuit executes the data processing required at a corresponding node of a neural network layer. For ease of description, only the parts related to the embodiments of the present invention are shown, including:
A computing module 101, whose computing architecture can be adjusted so as to execute different neural network layer node data processing. In this embodiment, the computing module 101 can perform individual operations such as multiplication, addition, and activation using an activation function, or flexible combinations of these operations. The statement that the computing module 101 can execute at least two different kinds of neural network layer node data processing carries at least the following meanings. First, different types of neural network layers, such as convolutional layers, pooling layers, fully connected layers, activation function layers, and state-action layers, have differing data processing requirements, and the computing module 101 can meet at least two such requirements. This ability to adapt to different kinds of neural network layer data processing is realized through the flexible, on-demand combination of operations such as multiplication, addition, and activation: for example, when a computing module 101 is needed for convolutional layer data processing, one combination of the above operations satisfies the convolutional layer requirements, and when the same computing module 101 is needed for fully connected layer data processing, another combination satisfies those requirements. This flexible combination for different demands is realized together through the cooperation of the configuration-information storage module 102 and the control module 103 described below; in this sense, the neuron circuit can perform the data processing of a node in one neural network layer and also that of a node in another neural network layer. Second, the neuron circuit can perform the data processing of different nodes in neural network layers of the same type: for example, a neuron circuit can process a node in a first convolutional layer and also a node in a second convolutional layer. Third, the neuron circuit can perform the data processing of different nodes within the same neural network layer; that is, the neuron circuit is multiplexed across the data processing required by a single neural network layer. When different node data processing is performed, the computing architecture of the computing module 101 differs accordingly, and the computing module 101 can be controlled to adjust its computing architecture to adapt to the different node data processing. When the data processing of all nodes in a neural network layer is completed, the data processing of that layer is completed; when every layer of a neural network is completed, the data processing of that neural network is completed.
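The third meaning above, multiplexing one neuron circuit across the nodes of a single layer, can be illustrated by a minimal sketch; names are assumptions, and this is a software analogue of the hardware behaviour, not the patent's implementation:

```python
def fc_layer(inputs, weight_rows, node_fn):
    """Run node_fn once per output node, time-multiplexing the same 'circuit'."""
    return [node_fn(inputs, row) for row in weight_rows]

def mac(xs, ws):
    """The shared node computation: a multiply-accumulate."""
    return sum(x * w for x, w in zip(xs, ws))

# Two output nodes of a fully connected layer, computed by one reused unit.
out = fc_layer([1.0, 2.0], [[1.0, 0.0], [0.5, 0.5]], mac)
```

Here `mac` plays the role of the neuron circuit and `fc_layer` the role of the scheduling that reuses it node by node.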
A configuration-information storage module 102, for storing neuron processing-mode configuration information. In this embodiment, the processing-mode configuration information indicates the configuration required when the neuron circuit performs the corresponding neural network layer node data processing; this configuration information can indicate which node operations the neuron circuit needs to realize, and so on.
A control module 103, for controlling the computing module, according to the processing-mode configuration information, to adjust to the corresponding computing architecture and execute the corresponding neural network layer node data processing.
By implementing this embodiment, a neuron circuit can perform the data processing of one node in a given neural network layer of the entire deep learning neural network. In this way, neuron circuits can be flexibly configured according to demands such as scene function, neural network type, neural network scale, and neuron operation mode, so that the neuron circuits can be reconfigured according to actual neural network computing needs, meeting the complex and diverse, rapidly iterating neural network computing demands; the invention can therefore be widely applied in fields where computing resources are limited and a degree of neural network architecture reconfigurability is required, extending the application of deep learning chips.
Embodiment two:
This embodiment builds on the neuron circuit of the other embodiments and further provides the following:
As shown in Fig. 2, the neuron circuit further includes:
A parameter storage module 201, for storing the parameters needed for neural network layer node data processing. In this embodiment, the parameters can be the neural network parameters obtained from training.
An address generation module 202, controlled by the control module 103 to look up the parameters corresponding to the data targeted by the neural network layer node data processing; the parameters found can be input to the computing module 101 to participate in the corresponding data processing.
In this embodiment, since the data processing of certain types of neural network layers, such as pooling layers and activation function layers, does not need to call parameters, a neuron circuit configured for them does not require the above parameter storage module 201 and address generation module 202. Other layers, such as convolutional layers, fully connected layers, and state-action layers, need to call neural network parameters during data processing, and for them the above parameter storage module 201 and address generation module 202 need to be configured in the neuron circuit. This enhances the broad applicability of the neuron circuit.
Embodiment three:
This embodiment builds on the neuron circuit of the other embodiments and further provides the following:
As shown in Fig. 3, the neuron circuit further includes:
A temporary storage module 301, for storing the intermediate data of neural network layer node data processing.
In this embodiment, for certain types of neural networks such as convolutional networks and local area networks, the intermediate data produced within the neuron circuit does not need to be stored for subsequent processing, so the neuron circuit does not require the above temporary storage module 301. Other cases, such as reinforcement learning networks and recurrent networks, need to use the intermediate data produced by the neuron circuit during data processing, and for them the above temporary storage module 301 needs to be configured in the neuron circuit. This likewise enhances the broad applicability of the neuron circuit.
Example IV:
This embodiment builds on the neuron circuit of the other embodiments and further provides the following:
As shown in Fig. 4, the computing module 101 includes:
A basic computation module 401, the basic computation module comprising: a multiplier, an adder, and/or an activation function module, etc.
A gating module 402, for executing the corresponding gating actions under the control of the control module 103 so that the basic computation module 401 forms the corresponding computing architecture; the gating module 402 can include: a multiplexer (MUX) and/or a demultiplexer (DEMUX), etc.
In this embodiment, the basic computation module 401 can perform basic operations such as multiplication, addition, and activation using an activation function, with the required parameters obtainable from the above parameter storage module 201; the gating module 402 can then adjust the basic computation module 401 as required when basic operations are performed, obtaining the computing architecture needed for the current neural network layer node data processing and realizing the reconfiguration of the computing module 101.
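A hedged software analogue of the gating idea: a multiplexer selects which basic computation feeds the output, so the datapath is reconfigured without changing the underlying operators. All names here are invented for illustration and are not the patent's hardware design:

```python
def mux(select, *channels):
    """Software analogue of a multiplexer: route one of several inputs through."""
    return channels[select]

def reconfigure_datapath(select):
    """Pick which basic computation the 'circuit' performs, per the gating signal."""
    multiply = lambda a, b: a * b          # multiplier
    add = lambda a, b: a + b               # adder
    relu = lambda a, _b: max(0.0, a)       # activation-function module
    return mux(select, multiply, add, relu)

product = reconfigure_datapath(0)(3.0, 4.0)      # gated to the multiplier
activated = reconfigure_datapath(2)(-1.0, 0.0)   # gated to the activation module
```

The same fixed set of operators yields different node computations depending only on the select signal, which mirrors how the gating module reshapes the basic computation module.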
Embodiment five:
Fig. 5 shows the structure of the deep learning chip provided by embodiment five of the present invention. For ease of description, only the parts related to the embodiment of the present invention are shown, including:
A storage unit 501, for storing a deep learning instruction set and the data targeted by the deep learning, the deep learning instruction set comprising several neural network layer instructions with a predetermined processing order. In this embodiment, the deep learning instruction set includes all the neural network layer instructions covered by the deep learning task, such as: convolutional layer instructions, pooling layer instructions, fully connected layer instructions, activation function layer instructions, state-action layer instructions, etc. Of course, this instruction set usually also includes the other information needed to complete the deep learning task, such as neural network information and neural network structure information, where the neural network information can indicate whether the neural network is a convolutional network, a local area network, a recurrent network, a reinforcement learning network, etc., and the neural network structure information can include: information on the number of neural network layers the network contains, information on the number of nodes in each layer, and indication information on which operations a layer needs to realize; this indication information corresponds to the processing-mode configuration information stored in the above configuration-information storage module 102.
A neuron array 504 composed of several neuron circuits 502 as described above. Its specific functions and structure are as described in the other embodiments and are not repeated here.
A central controller 503, for controlling, according to the deep learning instruction set, such that: the neuron circuits 502 in the neuron array 504 are loaded, from the storage unit 501, with the current processing-mode configuration information corresponding to the current neural network layer instruction and with the data to be processed accordingly; and after the current neural network layer processing task indicated by the neural network layer instruction is completed, the next neural network layer processing task is executed, until the deep learning task indicated by the deep learning instruction set is completed. In this embodiment, the central controller 503 can use various types of controllers, such as an Advanced Reduced Instruction Set Computer Machine (ARM) controller, an Intel series controller, or a HiSilicon (Huawei) series controller, and its architecture can use a finite state machine, which completes transitions between different states according to conditions, so as to control the workflow of the neural network it covers, specifically including: the configuration flow, the neural network computation flow, the data transfer flow, etc. This may involve the neuron array 504 in the whole deep learning chip performing single-batch processing of one neural network layer, or the neuron array performing multi-batch processing of one neural network layer, where multi-batch processing involves the multiplexing of the neuron circuits 502. The central controller 503 is primarily used to configure the deep learning chip constituting the neural network, so that the neuron array 504 can perform ordered data processing according to the neural network layer instructions in the deep learning instruction set. During the operation of the entire neural network, the central controller 503 realizes the main operations of the neural network, including: instruction updating, content decoding, etc.
An input-output unit 505, for transferring data between the storage unit 501 and the neuron array 504.
The processing flow of a deep learning task is approximately as follows:
The current neural network layer processing task is executed first. The current processing-mode configuration information in the storage unit 501 corresponding to the current neural network layer instruction is loaded, through the input-output unit 505, into the neuron circuits 502 in the neuron array 504, completing the configuration of the neuron circuits 502. The data to be processed in the storage unit 501 is then loaded, again through the input-output unit 505, into the neuron circuits 502 in the neuron array 504, and the neuron circuits 502, on the basis of the completed configuration, process the loaded data; the resulting data serves as the data to be processed by the next neural network layer processing task. Then, by the same method, the next neural network layer processing task is executed, until all neural network layer processing tasks are completed and this deep learning task is finished.
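Under stated assumptions, the layer-by-layer flow above can be sketched as a simple loop in which each layer's configuration is applied before its data is processed, and each layer's output becomes the next layer's input. All names are illustrative, not from the patent:

```python
def run_network(layers, data):
    """Execute layer instructions in their predetermined order."""
    for layer in layers:
        configure = layer["config"]       # processing-mode configuration for this layer
        process = layer["fn"]             # the node data processing this layer performs
        data = process(data, configure)   # result feeds the next layer's processing task
    return data

layers = [
    # a scaling layer standing in for a parameterised (e.g. convolutional) layer
    {"config": 2.0, "fn": lambda d, c: [x * c for x in d]},
    # an activation layer needing no parameters
    {"config": None, "fn": lambda d, _c: [max(0.0, x) for x in d]},
]
result = run_network(layers, [-1.0, 3.0])
```

The loop captures the essential control discipline: configuration first, then data, then hand-off to the next layer instruction until the task completes.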
In addition, if neural network parameters need to be introduced in the deep learning task, then in the above processing flow, after the processing-mode configuration information is loaded into the neuron circuits 502, the corresponding parameters can also be loaded from the storage unit 501 into the neuron circuits 502 in the neuron array 504, before the data is loaded and processed.
By implementing this embodiment, the neuron circuits, and the deep learning chips in which they are applied, can be flexibly configured according to demands such as scene function, neural network type, neural network scale, and neuron operation mode, so that the deep learning chips and neuron circuits can be reconfigured according to actual neural network computing needs, meeting the complex and diverse, rapidly iterating neural network computing demands; the invention can be widely applied in fields where computing resources are limited and a degree of neural network architecture reconfigurability is required, extending the application of deep learning chips.
Embodiment six:
This embodiment builds on the deep learning chip of the other embodiments, with the further feature that:
The input-output unit 505 is a stream-in/stream-out shift register, and an independent data transmission path is established between each neuron circuit 502 and the input-output unit 505.
In this embodiment, the data to be processed for each node of a neural network layer, stored in the storage unit 501, is transferred through the input shift register and an independent data transmission path to the corresponding neuron circuit 502 in the neuron array 504 for processing; after processing is completed, the resulting data is transferred back through the independent data transmission path and the output shift register for storage in the storage unit 501. If the processing results of all nodes of the current neural network layer are the data to be processed by the next neural network layer, then after all nodes of the current layer have completed their data processing, all the resulting data serves as the data to be processed by the next layer.
By implementing this embodiment, stream-in/stream-out pipelined data transfer can be realized. Compared with the many fan-out circuits traditionally required for multiple neurons to access most of the data, there is no longer any need to compute the access addresses of data storage, greatly simplifying reads and writes, reducing the bandwidth requirement on the memory, and considerably reducing input-output power consumption. Moreover, through the use of shift registers and the data transmission paths established between the neuron circuits 502 and the shift registers, the neuron circuits 502 are relatively independent of each other, which avoids the contention that arises when the many cores of a neuron array access the same storage; there is no need, as in a traditional multi-core processor system, for an arbitration mechanism on the communication bus to avoid conflicts, or for a complex cache synchronization mechanism. Thus, because of the multi-array cascaded input-output system constituted by the introduction of the stream-in/stream-out registers, the computing throughput can grow linearly with the number of neuron circuits 502, while the storage access mechanism is optimized and useless computation is avoided.
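A minimal software model of the stream-in/stream-out idea, under assumed behaviour: values are shifted through a fixed-length register chain rather than read from computed addresses, so no storage access address ever needs to be calculated. The class and its interface are invented for illustration:

```python
from collections import deque

class ShiftRegister:
    """Fixed-length register chain: each shift-in pushes the oldest value out."""

    def __init__(self, length):
        self.cells = deque([0] * length, maxlen=length)

    def shift_in(self, value):
        """Push one value in; return the value that falls off the far end."""
        out = self.cells[0]        # the cell about to be shifted out
        self.cells.append(value)   # maxlen drops the leftmost cell automatically
        return out

sr = ShiftRegister(3)
outs = [sr.shift_in(v) for v in [1, 2, 3, 4, 5]]
# the three initial zeros flush first, then the streamed data follows in order
```

Every datum travels the same path in arrival order, which is why no per-access address computation or bus arbitration is needed in this scheme.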
Embodiment seven:
The present embodiment on the deep learning chip basis of other embodiments, further to:
The storage unit 501 is also used to: store the intermediate data of the current neural network layer node data processing.
The purpose of this embodiment is the same as that of embodiment three above and is not repeated here.
Embodiment eight:
Fig. 6 shows the structure of the deep learning chip cascade system provided by embodiment eight of the present invention. For ease of description, only the parts related to the embodiments of the present invention are shown, including:
At least two deep learning chips 601, as in any of the above embodiments, with cascade connections between them.
By implementing this embodiment, multiple accelerator chips can be cascaded to expand parallel processing capability and meet the usage demands of different scenes.
Embodiment nine:
Fig. 7 shows the structure of the deep learning system provided by embodiment nine of the present invention. For ease of description, only the parts related to the embodiment of the present invention are shown, including:
at least one deep learning chip 701 as described above, and peripheral components 702 connected to the deep learning chip 701. In the present embodiment, when there are at least two deep learning chips 701, the deep learning chips 701 may be cascaded, or may be left uncascaded and mutually independent. The peripheral components 702 may be other embedded processors, sensors, or the like.
Embodiment ten:
Fig. 8 shows the flow of the neuron control method provided by embodiment ten of the present invention. For ease of description, only the parts related to the embodiment of the present invention are shown, involving the following steps:
In step S801, neuron processing mode configuration information is obtained.
In step S802, according to the processing mode configuration information, the computing module is controlled to adjust itself to the corresponding basic computing architecture and to execute the corresponding neural network layer node data processing.
The content involved in steps S801 and S802 is presented in detail in the related content of other embodiments, which may be referenced here and is not repeated.
Embodiment 11:
Fig. 9 shows the flow of the deep learning control method provided by embodiment eleven of the present invention. For ease of description, only the parts related to the embodiment of the present invention are shown, involving the following steps:
In step S901, a deep learning instruction set is obtained, which includes several neural network layer instructions with a predetermined processing order.
In step S902, according to the deep learning instruction set, control is exercised so that: the neuron circuits in the neuron array are loaded with the current processing mode configuration information corresponding to the current neural network layer instruction and with the corresponding data to be processed, wherein each neuron circuit adjusts itself to the corresponding basic computing architecture according to the current processing mode configuration information and executes the corresponding neural network layer node data processing; and after the current neural network layer processing task indicated by the current neural network layer instruction is completed, the next neural network layer processing task is executed, until the deep learning task indicated by the deep learning instruction set is completed.
The content involved in steps S901 and S902 is presented in detail in the related content of other embodiments, which may be referenced here and is not repeated.
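The layer-by-layer control flow of steps S901 and S902 can be sketched as a simple loop. This is a hypothetical sketch: the function names `run_instruction_set`, `configure`, and `process`, and the toy instruction strings, are illustration only; the essential point from the text is that each layer's output becomes the next layer's pending data.

```python
# Hypothetical sketch of steps S901-S902: the controller walks the
# layer instructions in their predetermined order; each layer's output
# feeds the next layer until the instruction set is exhausted.

def run_instruction_set(instructions, data, configure, process):
    """`configure` loads a layer's mode into the array; `process` runs it."""
    for layer_instr in instructions:        # predetermined processing order
        configure(layer_instr)              # place config into neuron circuits
        data = process(layer_instr, data)   # layer output -> next layer input
    return data

# toy stand-ins for the array operations
trace = []
result = run_instruction_set(
    instructions=["conv", "pool", "fc"],
    data=[1, 2, 3],
    configure=lambda instr: trace.append(instr),
    process=lambda instr, d: [x + len(instr) for x in d],
)
```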
Embodiment 12:
In the embodiment of the present invention, a computer-readable storage medium is provided, which stores a computer program. When the computer program is executed by a processor, the steps in the above method embodiments ten or eleven are realized, for example, steps S801 to S802 shown in Fig. 8.
The computer-readable storage medium of the embodiment of the present invention may include any entity or device capable of carrying computer program code, or a recording medium, for example, memories such as ROM/RAM, a magnetic disk, an optical disc, or flash memory.
Embodiment 13:
The flow of the deep learning method provided by embodiment thirteen of the present invention is based on a deep learning chip as described above, a deep learning chip cascade system as described above, or a deep learning system as described above. For ease of description, only the parts related to the embodiment of the present invention are shown, involving the following steps:
The deep learning instruction set and the data are placed into storage unit 501;
Central controller 503 controls, according to the deep learning instruction set, so that: the neuron circuits 502 in neuron array 504 are loaded with the current processing mode configuration information corresponding to the current neural network layer instruction and with the corresponding data to be processed, wherein each neuron circuit 502 adjusts itself to the corresponding basic computing architecture according to the current processing mode configuration information and executes the corresponding neural network layer node data processing, and after the current neural network layer processing task indicated by the current neural network layer instruction is completed, the next neural network layer processing task is executed, until the deep learning task indicated by the deep learning instruction set is completed.
An application example is given below to specifically describe the related content of the neuron circuit, chip, system, methods, and storage medium involved in the above embodiments.
This application example specifically relates to a deep learning instruction set and to the design and application of a Coarse-grained Reconfigurable Neuromorphic Array (CRNA) architecture based on that instruction set; the design and application cover the related content of the neuron circuit, chip, system, methods, and storage medium involved in the above embodiments. This application example designs the neurons and the neuron array with fully digital circuits and introduces a pipelined design method, and through dynamic configuration it flexibly realizes the configuration of the neural network type, the neural network structure (number of neural network layer nodes and number of neural network layers), the combined application of multiple neural networks, the operating mode of the neurons, and so on. With this application example, the processing speed of data can be increased substantially, meeting the demands of today's rapidly iterating neural network algorithms. It has the characteristics of low power consumption, fast processing speed, and reconfigurability, is particularly suitable for usage scenarios with limited computing resources, small memory capacity, strict power consumption requirements, and demanding processing speed, and broadens the application fields of neural-network-based hardware and software.
Firstly, the deep learning instruction set is described. The instruction set is the core of processor design and the interface between the software system and the hardware chip. The instruction set of this application example supports hierarchical description of neural networks, and specifically involves the following five (or more) kinds of neural network layer instructions. In this application example the instruction width is 96 bits; of course, in other application examples the instruction width may be adapted. Specifically, the instructions are: the convolutional network layer instruction shown in Fig. 10, the pooling network layer instruction shown in Fig. 11, the fully connected network layer instruction shown in Fig. 12, the activation function network layer instruction shown in Fig. 13, and the state-action network layer instruction shown in Fig. 14. Corresponding bit positions in an instruction can be assigned values to realize the corresponding functions. For example: in Fig. 10, bit 70 can be assigned "1" to indicate padding and "0" to indicate no padding, and bits 65-67 can be assigned "001" to indicate a 1×1 convolution kernel, "010" to indicate a 2×2 kernel, and so on. In Fig. 11, bits 65-67 can be assigned "000" to indicate a max-pooling strategy, "001" to indicate min-pooling, "010" to indicate average-pooling, etc., and bit 70 can be assigned "1" to indicate forward and "0" to indicate reverse. In Fig. 13, bits 5-9 can be assigned "00001" to indicate the Rectified Linear Unit (ReLU) activation function and "00010" to indicate the sigmoid function, with codes "00011"-"11111" reserved for extension. In Fig. 14, bits 45-47 can be assigned "000" to indicate a Deep Q-learning (DQN) iteration strategy and "001" to indicate the State-Action-Reward-State-Action (SARSA) algorithm, with "010"-"111" reserved for extension; the ε-greedy probability can take values 0-100, etc.
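The bit-field layout described above can be illustrated with a small encoder/decoder. This is a sketch under stated assumptions: the field positions (bit 70 = padding flag, bits 65-67 = kernel size code) come from the description of Fig. 10, but the helper names and the bit-numbering convention (bit 0 = least significant) are assumptions for illustration.

```python
# Sketch of packing/decoding fields of the 96-bit convolutional network
# layer instruction of Fig. 10. Bit 70 = padding flag, bits 65-67 =
# kernel size code, per the text; everything else is assumed.

def set_field(instr, lo, width, value):
    """Write `value` into bits [lo, lo+width) of the instruction word."""
    mask = ((1 << width) - 1) << lo
    return (instr & ~mask) | ((value << lo) & mask)

def get_field(instr, lo, width):
    """Read bits [lo, lo+width) of the instruction word."""
    return (instr >> lo) & ((1 << width) - 1)

instr = 0                                   # empty 96-bit instruction word
instr = set_field(instr, 70, 1, 1)          # bit 70 = 1 -> with padding
instr = set_field(instr, 65, 3, 0b010)      # bits 65-67 = "010" -> 2x2 kernel

assert instr < (1 << 96)                    # still fits in 96 bits
padding = get_field(instr, 70, 1)
kernel_code = get_field(instr, 65, 3)
```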
Secondly, the design and application of the CRNA architecture chip are described. The CRNA architecture chip proposed by this application example generally comprises storage unit 1501, input/output unit 1502, several neuron circuits 1503, and central controller 1504, as shown in Fig. 15. The architecture breaks the limitations of the traditional von Neumann architecture: it optimizes memory usage through distributed memories, and flexibly realizes different network modes, neural network structures, and the combined application of networks in various modes through dynamic configuration. Based on the configuration of the control module within neuron circuit 1503 and on central controller 1504, the storage and pipelined rapid computation functions of the neural network are realized, and the hardware-optimized design of the artificial neurons greatly improves the computing capability of the entire CRNA architecture. The CRNA architecture makes full use of memory resources, goes beyond the von Neumann architecture to further improve computing capability, effectively reduces the volume of transmitted data, and greatly lowers power consumption. The CRNA architecture proposed by this application example supports the deployment of multiple hybrid neural network layers and has advantages such as flexible reconfigurability, low power consumption, and high computing capability.
The function of each unit in the CRNA architecture can be as follows:
(1) Memory:
The memory includes storage unit 1501 as shown in Fig. 15 and the parameter memory module 15031 in neuron circuit 1503. Storage unit 1501 can be deployed in a distributed fashion as a first storage sub-unit 15011, a second storage sub-unit 15012, and a third storage sub-unit 15013; these storage sub-units can also be deployed together in a single physical storage. They are described as follows:
The first storage sub-unit 15011 is used to store the data targeted by neural network processing, including input data, data stored between neural network layers, output data, and so on.
The parameter memory module is used to store the parameters needed for node data processing by the trained neural network; the storage of the parameters can be completed in the neural network initialization stage. When the neural network is in the operation stage, neuron circuit 1503 can read the corresponding parameters in the parameter memory module to complete the corresponding neural network layer node operation. Each neuron circuit 1503 reads only local parameters, thereby avoiding any possibility of data accesses between neurons.
The second storage sub-unit 15012: this part of the memory determines the neural network type of the CRNA architecture (convolutional network, regional network, recurrent network, or reinforcement learning network) and the neural network structure (number of neural network layer nodes, number of neural network layers, operation realized by each layer), etc.
The third storage sub-unit 15013: this part of the memory is specific to the reinforcement learning network mode or the recurrent network mode, storing the intermediate data produced by reinforcement learning network or recurrent network operations.
(2) Input and output:
Input/output unit 1502 is used to realize, through the input shift register and the output shift register respectively, the serial-in of input data and the serial-out of output data; see the above embodiment six for details, which are not repeated here.
(3) Artificial neuron:
Neuron circuit 1503 can perform neural operations of a designated mode on the input data of the neural network according to its configuration and obtain the operation results. The artificial neuron design method of the CRNA architecture can flexibly realize the deployment of a single kind of neural network or the combined deployment of multiple kinds of neural networks, described as follows:
Neuron circuit 1503 may include the structure shown in Fig. 16, involving: a computing module, configuration information storage module 1601, control module 1602, parameter memory module 1603, address generation module 1604, temporary storage module 1605, operation cache module 1606, configuration information/parameter input module 1607, data input module 1608, data output module 1609, etc. Among these, configuration information storage module 1601 may use a configuration chain register, operation cache module 1606 may be an accumulator register, and the computing module may include a multiplier, an adder, activation function module 1610, gating modules, and so on. The functions of the modules are detailed as follows:
Configuration information/parameter input module 1607 is used to input to neuron circuit 1503 the neuron processing mode configuration information and the neural network parameters; the processing mode configuration information configures the operating mode of the neuron.
The gating modules can be embodied as multiplexers (MUX) and/or demultiplexers (DEMUX); the MUXes are labeled M1 and M2 in the figure, and the DEMUXes are labeled DM1 and DM2. M1 is used to choose whether to skip the multiplication unit, skipping it if the parameter read is 0; DM1 is used to control whether the destination of the input content is configuration information storage module 1601 or parameter memory module 1603; DM2 is used to specify the activation function or to skip activation processing; M2 selects the output of the activation function. The content selected by M1, M2, DM1, and DM2 is specified by control module 1602.
Address generation module 1604 is used to guarantee in real time that the input data of the neuron matches the parameters read from the parameter memory.
The multiplier and adder form a multiply-add module for performing multiply-add operations on the data and parameters; the result is stored in operation cache module 1606 and read in the next cycle as one of the addition inputs. If the calculation result needs to be backed up, the result is also stored in temporary storage module 1605.
Control module 1602 is used to control the operating mode of the entire neuron according to the configuration information, including the selections of the MUXes and DEMUXes, the operating mode of address generation module 1604, and so on.
Data output module 1609 is used to output the calculation results of neuron circuit 1503.
The workflow of neuron circuit 1503 is approximately as follows:
First, content is serially input through configuration information/parameter input module 1607 to configure the neuron; the neural network parameters and the configuration information are stored in parameter memory module 1603 and configuration information storage module 1601, respectively.
Second, after the neuron configuration is completed, the neuron obtains input data from data input module 1608, finds in parameter memory module 1603 the neural network parameters matching the input data, and performs the multiply-add operations required by the neuron.
Then, for the multiply-add result, the corresponding activation function is selected according to the mode contents of the selection instruction field in the activation function layer instruction and the required activation processing is performed; the neuron activation result is then stored, according to the network mode, into the corresponding memory (operation cache module 1606, or operation cache module 1606 together with temporary storage module 1605).
When the operations on all input data of the current neural network layer node are completed, the output result of the neuron is output through data output module 1609, and is output for storage through input/output unit 1502 of the CRNA architecture.
It should be understood that the above process includes multiply-add operations and activation processing, but in practical applications all of these operations are configured as needed.
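The neuron workflow above can be sketched behaviorally. This is a minimal sketch under stated assumptions, not the circuit itself: the function name `neuron_pass` and the use of Python floats are hypothetical, while the accumulate-into-register loop, the M1 bypass for zero parameters, and the configurable activation step follow the description.

```python
import math

# Behavioral sketch of one neuron's workflow: a multiply-add loop
# accumulating into the operation cache register (1606), the M1 bypass
# that skips the multiplier when a parameter reads 0, and an optional
# activation step selected by configuration (DM2/M2).

def neuron_pass(inputs, params, activation=None):
    acc = 0                               # operation cache module 1606
    for x, w in zip(inputs, params):
        if w == 0:                        # M1: skip the multiplication unit
            continue
        acc += x * w                      # multiply-add into the accumulator
    if activation == "relu":              # DM2/M2: apply or skip activation
        acc = max(0, acc)
    elif activation == "sigmoid":
        acc = 1.0 / (1.0 + math.exp(-acc))
    return acc

out = neuron_pass([1, 2, 3], [0, -4, 2], activation="relu")  # max(0, -8 + 6) = 0
raw = neuron_pass([1, 2, 3], [0, -4, 2])                     # activation skipped: -2
```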
(4) Central control:
In the CRNA architecture, central controller 1504 uses a finite state machine; the state machine completes the transitions between different states according to switch conditions, thereby controlling the workflow of the entire architecture, including, as shown in Fig. 17: configuration flow S1701, neural network operation flow S1702, data transmission flow S1703, etc. Taking the 128-stage state machine control flow shown in Fig. 17 as an example, the details are as follows:
In flow S1701, the second storage sub-unit 15012, parameter memory module 1603, and the first storage sub-unit 15011 are configured according to the algorithm requirements.
In flow S1702, the state machine control flow of the 128-stage extra-long vector pipeline is involved, including instruction update and content decoding, realizing the main operations of the neural network. The CRNA architecture design uses 128 neurons. When the number of nodes of a single neural network layer is greater than the 128 artificial neurons included in the CRNA architecture, the 128 neurons can be used to calculate repeatedly in batches; that is, the artificial neuron array composed of the 128 neurons is constantly multiplexed. By reading and decoding instructions, the processing modes of the 128 neurons are configured; the overall characteristics of the neural network are configured through global parameters; and the data flow controls the jumps between neural network layers, the input and output of data, the distribution of parameters, and so on.
In flow S1703, the output results of the neural network are transferred to the host computer for use.
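The three flows of the central controller can be sketched as a tiny state machine. This is an illustrative sketch only: the state names mirror the flow labels S1701-S1703 from Fig. 17, but the transition table, the "done" terminal state, and the switch condition (flow completion) are assumptions for illustration.

```python
# Minimal sketch of the central controller's finite state machine:
# S1701 configuration -> S1702 network operation -> S1703 data
# transmission, advancing when each flow completes. Names and the
# transition table are hypothetical.

TRANSITIONS = {
    "S1701_configure": "S1702_compute",
    "S1702_compute": "S1703_transmit",
    "S1703_transmit": "done",
}

def run_controller(flows):
    """`flows` maps each state to the work performed in that state."""
    state, visited = "S1701_configure", []
    while state != "done":
        flows[state]()                    # perform the flow's work
        visited.append(state)
        state = TRANSITIONS[state]        # switch condition: flow complete
    return visited

order = run_controller({
    "S1701_configure": lambda: None,
    "S1702_compute": lambda: None,
    "S1703_transmit": lambda: None,
})
```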
The process realized by the above CRNA architecture chip mainly involves:
First, initial configuration of the memories is performed according to the configuration information, parameters, and data corresponding to the deep learning instruction set; specifically, the neural network type (mode), the number of neural network layers, the neuron processing modes, and so on can be configured.
Second, the corresponding neural network operations are performed according to the neural network type. Specifically: the data processing corresponding to the current neural network layer instruction is executed to complete the current neural network layer processing task; then all of the data obtained from the current neural network layer processing is taken as the pending data of the next neural network layer processing task, and the next neural network layer processing task is executed.
When executing a neural network layer task, generally the configuration information is first read from the memory and placed into the neuron circuits in the neuron array, completing the processing mode configuration of the neuron circuits; then the neural network parameters are read from the memory and placed into the neuron circuits in the neuron array; then the pending data is serially input from the memory and processed, with the neural network parameters invoked as needed during processing.
In this way, network operations on the data are realized through the 128-stage extra-long pipeline process. In this process, the operation results are continually accumulated into the accumulator register. After all the data targeted by one neural network layer instruction has completed the network operations of the neuron array, the results are passed in sequence through the output shift register and stored into the memory as the input data for the processing of the next neural network layer. When the last neural network layer is processed, the operation results of the last neural network layer are likewise stored in the memory, and according to the data output flow, the operation results are successively output to the host computer for its subsequent use.
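The array multiplexing used when a layer exceeds the physical array can be sketched with a simple batch scheduler. This is a hypothetical illustration: the text only states that layers larger than 128 nodes are computed in repeated batches over the same 128-neuron array; the function name and slice representation are assumptions.

```python
import math

# Sketch of the array multiplexing described above: when a layer has
# more nodes than the 128 physical neurons, the array is reused in
# ceil(nodes / 128) batches, each batch mapping one node per neuron.

def schedule_batches(layer_nodes, array_size=128):
    """Return (batch count, node index ranges) for reusing the array."""
    batches = math.ceil(layer_nodes / array_size)
    slices = [
        (b * array_size, min((b + 1) * array_size, layer_nodes))
        for b in range(batches)
    ]
    return batches, slices

count, slices = schedule_batches(300)   # a 300-node layer on 128 neurons
# 3 batches covering nodes [0,128), [128,256), [256,300)
```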
The design of this CRNA architecture chip was simulated and logically synthesized based on the 65 nm Complementary Metal Oxide Semiconductor (CMOS) process of United Microelectronics Corporation (UMC). A reconfigurable array integrating 128 digital neurons was designed as the computing unit; each neuron includes two 1 KB data memories and two 1 KB parameter memories, and has a control port that can flexibly shut down the neuron in real time, reducing the dynamic power consumption of the neurons. The host state machine can flexibly adjust the working states of the memories and the neuron array, and realizes functions such as the jumps between network layers, the input and output of data, and the distribution of parameters through data flow control. The synthesis results show that the design of the distributed memory system greatly reduces the fan-out of storage accesses, reduces the complexity of storage control and the bandwidth of data access, and improves the balance of parameter distribution. Through configuration, neural network models of different types and diverse specifications can be effectively deployed. Some test results follow:
Chip simulation: a 10-layer fully connected neural network with arbitrary numbers of input and output nodes was configured. The waveform diagrams show that the usage efficiency of the neuron array, when enabled, approaches 100%, and a layer-to-layer jump incurs only a 2-clock-cycle delay of the state machine. The fully connected network function achieves a complete mapping from algorithm to circuit.
Logic synthesis: the synthesis results indicate that this reconfigurable chip design occupies only about 1.5 mm², saving considerable hardware resources and being easy to integrate in resource-constrained terminals, as shown in Table 1 below; its power consumption, at the level of tens of milliwatts, is easily integrated into low-power terminal devices, as shown in Table 2 below.
Table 1: basic cell usage and area occupancy
Table 2: module power consumption ratio
The above application example has the following advantages:
First, the proposed instructions are assembler-language-level instructions. Unlike the existing operating-system-based, platform-level network deployment frameworks (TensorFlow, Caffe, etc.), they do not need the support of an operating system and directly change the chip operating mode; programming efficiency is high, and they can be deployed directly in ultra-low-power computing scenarios.
Second, the CRNA architecture uses digital-circuit artificial neurons, so that the neurons have advantages such as strong noise immunity, high reliability, high precision, good scalability, and a mature, standardized design methodology. The computational precision of the design is an 8-bit fixed-point quantization scheme; compared with some current deep learning processors that use single-bit binary quantized networks, this design has higher computational precision.
Third, the implementation of neural network layers is more flexible. Complex neural networks are deployed by way of array multiplexing, neural network models with different numbers of nodes are realized block by block, and the neural computing units are heavily multiplexed, greatly improving the utilization of hardware resources, saving hardware cost, and offering high flexibility.
Fourth, the CRNA architecture has reconfigurability and programmability. The CRNA architecture uses a pipelined distributed storage scheme, which reduces delay and power consumption, improves the reliability of the system, and makes each computing unit an independent mini-system that is relatively complete in function and structure. The configuration chain connects directly to each individual neuron, and the configuration process has similarity and a progressive relationship, so that reconfiguration is easier to realize by globally configuring networks of different modes through the configuration instruction memory. Reconfigurability and programmability are therefore realized both globally and locally.
Fifth, integrability and scalability. The distributed storage scheme reduces delay and power consumption, improves the reliability of the system, and also makes the distribution of data and parameters more uniform, with better balance than a centralized storage scheme, so that it has good integrability. The CRNA architecture can be used in combination with embedded processors and sensors, and multiple accelerator chips can be cascaded to expand parallel processing capability to meet the use demands of different scenarios.
It should be understood that each unit or module involved in the above embodiments can be realized by a corresponding hardware or software unit or module; each unit or module can be an independent software or hardware unit or module, or they can be integrated into one software or hardware unit or module, which is not intended to limit the present invention.
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit the invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (13)

1. A neuron circuit, characterized in that the neuron circuit comprises:
a computing module;
a configuration information storage module for storing neuron processing mode configuration information; and
a control module for controlling, according to the processing mode configuration information, the computing module to adjust itself to the corresponding basic computing architecture and to execute the corresponding neural network layer node data processing.
2. The neuron circuit according to claim 1, characterized in that the neuron circuit further comprises:
a parameter memory module for storing the parameters needed by the neural network layer node data processing; and
an address generation module, controlled by the control module, for looking up the parameters corresponding to the data targeted by the neural network layer node data processing.
3. The neuron circuit according to claim 1, characterized in that the neuron circuit further comprises:
a temporary storage module for storing the intermediate data of the neural network layer node data processing.
4. The neuron circuit according to claim 1, characterized in that the computing module comprises:
a basic computing module comprising a multiplier, an adder, and/or an activation function module; and
a gating module for executing the corresponding gating action under the control of the control module, so that the basic computing module constitutes the corresponding basic computing architecture, the gating module comprising a multiplexer and/or a demultiplexer.
5. A deep learning chip, characterized in that the deep learning chip comprises:
a storage unit for storing a deep learning instruction set and the data targeted by deep learning, the deep learning instruction set comprising several neural network layer instructions with a predetermined processing order;
a neuron array composed of several neuron circuits according to any one of claims 1 to 4;
a central controller for controlling, according to the deep learning instruction set, so that: the neuron circuits in the neuron array are loaded from the storage unit with the current processing mode configuration information corresponding to the current neural network layer instruction and with the corresponding data to be processed, and after the current neural network layer processing task indicated by the current neural network layer instruction is completed, the next neural network layer processing task is executed, until the deep learning task indicated by the deep learning instruction set is completed; and
an input/output unit for realizing the transmission of data between the storage unit and the neuron array.
6. The deep learning chip according to claim 5, characterized in that the input/output unit is a serial-in serial-out shift register, establishing independent data transmission paths between the neuron circuits and the input/output unit.
7. The deep learning chip according to claim 5, characterized in that the storage unit is also used to store the intermediate data of the neural network layer node data processing.
8. A deep learning chip cascade system, characterized in that the deep learning chip cascade system comprises: at least two deep learning chips according to any one of claims 5 to 7, with cascade connections between them.
9. A deep learning system, characterized in that the deep learning system comprises: at least one deep learning chip according to any one of claims 5 to 7, and peripheral components connected to the deep learning chip.
10. A neuron control method, characterized in that the neuron control method comprises the following steps:
obtaining neuron processing mode configuration information;
according to the processing mode configuration information, controlling a computing module to adjust itself to the corresponding basic computing architecture and to execute the corresponding neural network layer node data processing.
11. A deep learning control method, characterized in that the deep learning control method comprises the following steps:
obtaining a deep learning instruction set, the deep learning instruction set comprising several neural network layer instructions with a predetermined processing order;
according to the deep learning instruction set, controlling so that: the neuron circuits in a neuron array are loaded with the current processing mode configuration information corresponding to the current neural network layer instruction and with the corresponding data to be processed, wherein each neuron circuit adjusts itself to the corresponding basic computing architecture according to the current processing mode configuration information and executes the corresponding neural network layer node data processing, and after the current neural network layer processing task indicated by the current neural network layer instruction is completed, the next neural network layer processing task is executed, until the deep learning task indicated by the deep learning instruction set is completed.
12. A computer-readable storage medium storing a computer program, characterized in that the steps of the method of claim 10 or 11 are realized when the computer program is executed by a processor.
13. A deep learning method, characterized in that the deep learning method is based on the deep learning chip according to any one of claims 5 to 7, the deep learning chip cascade system according to claim 8, or the deep learning system according to claim 9, and the deep learning method comprises the following steps:
placing the deep learning instruction set and the data into the storage unit;
the central controller controlling, according to the deep learning instruction set, so that: the neuron circuits in the neuron array are loaded with the current processing mode configuration information corresponding to the current neural network layer instruction and with the corresponding data to be processed, wherein each neuron circuit adjusts itself to the corresponding basic computing architecture according to the current processing mode configuration information and executes the corresponding neural network layer node data processing, and after the current neural network layer processing task indicated by the current neural network layer instruction is completed, the next neural network layer processing task is executed, until the deep learning task indicated by the deep learning instruction set is completed.
CN201811076248.0A 2018-09-14 2018-09-14 Neuron circuit, chip, system and method thereof, and storage medium Active CN109409510B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811076248.0A CN109409510B (en) 2018-09-14 2018-09-14 Neuron circuit, chip, system and method thereof, and storage medium

Publications (2)

Publication Number Publication Date
CN109409510A true CN109409510A (en) 2019-03-01
CN109409510B CN109409510B (en) 2022-12-23

Family

ID=65464184

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811076248.0A Active CN109409510B (en) 2018-09-14 2018-09-14 Neuron circuit, chip, system and method thereof, and storage medium

Country Status (1)

Country Link
CN (1) CN109409510B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8306931B1 (en) * 2009-08-06 2012-11-06 Data Fusion & Neural Networks, LLC Detecting, classifying, and tracking abnormal data in a data stream
US20150310311A1 (en) * 2012-12-04 2015-10-29 Institute Of Semiconductors, Chinese Academy Of Sciences Dynamically reconstructable multistage parallel single instruction multiple data array processing system
CN105930903A (en) * 2016-05-16 2016-09-07 浙江大学 Digital-analog hybrid neural network chip architecture
CN106022468A (en) * 2016-05-17 2016-10-12 成都启英泰伦科技有限公司 Artificial neural network processor integrated circuit and design method therefor
CN106295799A (en) * 2015-05-12 2017-01-04 核工业北京地质研究院 Implementation method of a deep learning multilayer neural network
CN106971229A (en) * 2017-02-17 2017-07-21 清华大学 Neural network computing core information processing method and system
CN107016175A (en) * 2017-03-23 2017-08-04 中国科学院计算技术研究所 Automated design method and device for a neural network processor, and optimization method therefor
CN107169560A (en) * 2017-04-19 2017-09-15 清华大学 Adaptively reconfigurable deep convolutional neural network computing method and device
CN108364063A (en) * 2018-01-24 2018-08-03 福州瑞芯微电子股份有限公司 Neural network training method and device for allocating resources based on weights
CN108416436A (en) * 2016-04-18 2018-08-17 中国科学院计算技术研究所 Method and system for neural network partitioning using multi-core processing modules

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Chen Zhijian et al.: "Neural network-based reconfigurable instruction prefetching mechanism and its scalable architecture", Acta Electronica Sinica *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020051918A1 (en) * 2018-09-14 2020-03-19 中国科学院深圳先进技术研究院 Neuronal circuit, chip, system and method therefor, and storage medium
CN110516789A (en) * 2019-08-09 2019-11-29 苏州浪潮智能科技有限公司 The processing method of instruction set, device and relevant device in convolutional network accelerator
CN110516789B (en) * 2019-08-09 2022-02-18 苏州浪潮智能科技有限公司 Method and device for processing instruction set in convolutional network accelerator and related equipment
CN112446475A (en) * 2019-09-03 2021-03-05 芯盟科技有限公司 Neural network intelligent chip and forming method thereof
CN112598107A (en) * 2019-10-01 2021-04-02 创鑫智慧股份有限公司 Data processing system and data processing method thereof
WO2021089009A1 (en) * 2019-11-08 2021-05-14 中国科学院深圳先进技术研究院 Data stream reconstruction method and reconstructable data stream processor
CN111105023A (en) * 2019-11-08 2020-05-05 中国科学院深圳先进技术研究院 Data stream reconstruction method and reconfigurable data stream processor
CN111105023B (en) * 2019-11-08 2023-03-31 深圳市中科元物芯科技有限公司 Data stream reconstruction method and reconfigurable data stream processor
CN111222637A (en) * 2020-01-17 2020-06-02 上海商汤智能科技有限公司 Neural network model deployment method and device, electronic equipment and storage medium
CN111222637B (en) * 2020-01-17 2023-11-28 上海商汤智能科技有限公司 Neural network model deployment method and device, electronic equipment and storage medium
CN111651207A (en) * 2020-08-06 2020-09-11 腾讯科技(深圳)有限公司 Neural network model operation chip, method, device, equipment and medium
CN111651207B (en) * 2020-08-06 2020-11-17 腾讯科技(深圳)有限公司 Neural network model operation chip, method, device, equipment and medium
WO2022246639A1 (en) * 2021-05-25 2022-12-01 Nvidia Corporation Hardware circuit for deep learning task scheduling
CN114970406A (en) * 2022-05-30 2022-08-30 中昊芯英(杭州)科技有限公司 Method, apparatus, medium and computing device for customizing digital integrated circuit

Also Published As

Publication number Publication date
CN109409510B (en) 2022-12-23

Similar Documents

Publication Publication Date Title
CN109409510A (en) Neuron circuit, chip, system and method, storage medium
Pei et al. Towards artificial general intelligence with hybrid Tianjic chip architecture
CN106650922B (en) Hardware neural network conversion method, computing device, software and hardware cooperative system
US20200026992A1 (en) Hardware neural network conversion method, computing device, compiling method and neural network software and hardware collaboration system
CN107578095B (en) Neural computing device and processor comprising the computing device
CN107636638B (en) General parallel computing architecture
CN110334799A Compute-in-memory based integrated artificial neural network inference and training accelerator and operation method thereof
CN107766935B (en) Multilayer artificial neural network
CN109190756A Arithmetic device based on Winograd convolution and neural network processor comprising the same
CN104145281A (en) Neural network computing apparatus and system, and method therefor
CN110309911A (en) Neural network model verification method, device, computer equipment and storage medium
CN109711539A Operation method, device and related product
Stevens et al. Manna: An accelerator for memory-augmented neural networks
Wang et al. Shenjing: A low power reconfigurable neuromorphic accelerator with partial-sum and spike networks-on-chip
Abdelsalam et al. An efficient FPGA-based overlay inference architecture for fully connected DNNs
CN109496319A Artificial intelligence processor hardware optimization method, system, storage medium and terminal
CN112836814A (en) Storage and computation integrated processor, processing system and method for deploying algorithm model
Geng et al. CQNN: a CGRA-based QNN framework
CN111831354A (en) Data precision configuration method, device, chip array, equipment and medium
Bilaniuk et al. Bit-slicing FPGA accelerator for quantized neural networks
CN109643336A Artificial intelligence processor design model establishing method, system, storage medium and terminal
CN110490317B (en) Neural network operation device and operation method
Dazzi et al. 5 parallel prism: A topology for pipelined implementations of convolutional neural networks using computational memory
CN113469326B (en) Integrated circuit device and board for executing pruning optimization in neural network model
CN110720095A (en) General parallel computing architecture

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220921

Address after: Room 201, Building A, No. 1, Qianwan 1st Road, Qianhai Shenzhen-Hong Kong Cooperation Zone, Shenzhen, Guangdong, 518101 (Settled in Shenzhen Qianhai Road Commercial Secretary Co., Ltd.)

Applicant after: Shenzhen Zhongke Yuanwuxin Technology Co.,Ltd.

Address before: 518000 No. 1068, Xue Yuan Avenue, Shenzhen University Town, Nanshan District, Shenzhen, Guangdong.

Applicant before: SHENZHEN INSTITUTES OF ADVANCED TECHNOLOGY CHINESE ACADEMY OF SCIENCES

GR01 Patent grant