CN108255514A - Neuron calculator operation method for a cellular array computing system - Google Patents

Neuron calculator operation method for a cellular array computing system

Info

Publication number
CN108255514A
Authority
CN
China
Prior art keywords
neuron
cellular array
computing unit
computing system
identification code
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611238870.8A
Other languages
Chinese (zh)
Inventor
戴瑾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Ciyu Information Technologies Co Ltd
Original Assignee
Shanghai Ciyu Information Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Ciyu Information Technologies Co Ltd filed Critical Shanghai Ciyu Information Technologies Co Ltd
Priority to CN201611238870.8A
Publication of CN108255514A
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/061 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using biological neurons, e.g. biological neurons connected to an integrated circuit

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Neurology (AREA)
  • General Physics & Mathematics (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a neuron calculator operation method for a cellular array computing system. The cellular array computing system includes a master controller, a bus, and a cellular array formed by multiple computing units. Each computing unit includes a neuron calculator for performing the calculation operations of a neuron and a memory unit, and each computing unit stores its position in the cellular array as an identification code. The input information of each neuron calculator includes the identification code and output data of an upstream neuron. A segment of content-addressable memory included in each neuron calculator stores the identification codes of all upstream neurons. For each input, the identification code carried by the input is compared with the contents of this memory; according to the matching identification code in the comparison result, the weight corresponding to the input is found, each weight is multiplied by its input to obtain a product, and all the products are accumulated to obtain an accumulation signal.

Description

Neuron calculator operation method for a cellular array computing system
Technical field
The present invention relates to the fields of semiconductor chips and artificial intelligence, and more particularly to a neuron calculator operation method for a cellular array computing system.
Background technology
The human brain is a network formed by a vast number of neurons connected in complex ways. Each neuron is connected through a large number of dendrites to many other neurons, from which it receives information; each connection point is a synapse. When external stimulation accumulates to a certain extent, a stimulus signal is generated and sent out through the axon. The axon has a large number of terminals which, through synapses, connect to the dendrites of many other neurons. It is precisely such a network composed of functionally simple neurons that realizes all the intelligent activities of mankind. Human memory and intelligence are generally believed to be stored in the different coupling strengths of the individual synapses.
The response frequency of a neuron is no more than 100 Hz; the CPU of a modern computer is ten million times faster than the human brain, yet its ability to handle many difficult problems is inferior to that of the human brain. This has prompted the computer industry to begin imitating the human brain. The earliest imitation of the human brain was at the software level.
Neural networks are a common algorithm in machine learning. A neuron in a neural network algorithm is simply a function: it has many inputs, and each input corresponds to a weight. The usual algorithm multiplies each input by its weight and adds up the results; the output is either 0 or 1 (determined by a threshold) or a value between 0 and 1. A typical neural network is a network in which the outputs and inputs of a large number of neurons are linked together, usually organized into a multi-level architecture. It contains a great many parameters (weights, thresholds), and the process of learning and training is precisely the adjustment of these parameters; this is a function optimization that requires massive computation. This kind of algorithm has already achieved rich results and is widely used.
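As an illustration of the neuron model just described (not taken from the patent; all names are illustrative), the following minimal Python sketch multiplies each input by its weight, sums the products, and applies a threshold:

```python
def neuron_output(inputs, weights, threshold):
    """Weighted sum of inputs followed by a hard threshold (output 0 or 1)."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# Example: three inputs, three weights, threshold 1.0
print(neuron_output([0.5, 1.0, 0.2], [0.8, 0.3, 0.5], 1.0))  # -> 0, since 0.8 < 1.0
```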
The networks in neural network algorithms are all divided into multiple layers. In the earliest networks, each neuron of one layer is connected with every neuron of the next layer, forming a fully-connected network. One problem of fully-connected networks is that, in applications such as image processing, an image has many pixels and the number of weights needed by each layer is proportional to the square of the pixel count; the program therefore occupies too much memory, and the amount of computation is even more impossible to cope with.
In convolutional neural networks, the front layers are no longer fully connected. The neurons of each layer are arranged in an array, like an image. Each neuron of the next layer is connected only to a small region of this layer. The small region is usually a square region with side length k, where k is called the kernel size of the convolutional network, as shown in Fig. 1.
Convolutional neural networks (CNNs) are so named because the weighted summation over each point of this small region resembles a convolution. This group of weights is the same for every neuron in the same layer (that is, translation invariance), so compared with a fully-connected network the number of weights is greatly reduced, making high-resolution image processing feasible. A convolutional neural network contains multiple layers connected in this way, as well as other kinds of layers.
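To make the local connectivity concrete, the following is a small illustrative Python sketch (not part of the patent) in which each output neuron sums a k-by-k region of the input using one shared set of weights, which is the translation invariance described above:

```python
def conv_layer(image, kernel):
    """Each output neuron is connected only to a k-by-k region of the input,
    and all neurons share the same k*k weights (translation invariance)."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    out = [[0.0] * (w - k + 1) for _ in range(h - k + 1)]
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            out[i][j] = sum(image[i + di][j + dj] * kernel[di][dj]
                            for di in range(k) for dj in range(k))
    return out
```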
With the popularization of deep learning applications, people have begun to develop dedicated neural network chips. Implementing the additions and multiplications of neuron calculations with dedicated circuits is much more efficient than using a CPU or GPU.
Magnetoresistive random access memory (MRAM) is a new memory and storage technology. It can be read and written randomly and quickly like SRAM/DRAM, and it is faster than DRAM; like flash memory it can retain data permanently after power-off, and unlike NAND it can be erased and rewritten an unlimited number of times.
The economics of MRAM are also considered good: the silicon area occupied per unit capacity has a large advantage over SRAM (commonly used as the cache of a CPU) and is expected to approach the level of DRAM. Its performance is also quite good: read/write latency is close to that of the best SRAM, and its power consumption is the best among the various memory and storage technologies. Moreover, unlike DRAM and flash memory, MRAM is not incompatible with the standard CMOS semiconductor process, so MRAM can be integrated with logic circuits in one chip. With MRAM technology, it becomes possible to integrate the three functions of memory, storage and computation on one chip, and a new computing architecture becomes possible.
The characteristic of the human brain is massively parallel computation: not only can a large number of neurons work at the same time, but each neuron is connected with thousands of other neurons. For modern integrated circuit technology, integrating a large number of neurons on a single chip is easy, but providing internal communication bandwidth comparable to that of the human brain is extremely difficult. For example, if the input data of one layer of neurons resides in one block of RAM, at least k clock cycles are needed to read the data out, because different rows of the memory cannot be read and written at the same time. The speed of reading data, that is, the memory bandwidth, becomes the bottleneck of the computation.
Summary of the invention
In view of the above drawbacks of the prior art, the present invention proposes an architecture based on a cellular array and similar to a neural network architecture, composed of numerous neuron calculators that also have a storage function and of dense network connections. This new architecture of the present invention will be widely applied in fields such as massive computation, big data processing and artificial intelligence.
To achieve the above object, the present invention provides a neuron calculator operation method for a cellular array computing system, wherein the cellular array computing system includes a master controller, a bus, and a cellular array formed by multiple computing units; each computing unit of the cellular array includes one or more neuron calculators for performing the calculation operations of neurons, and a memory unit, wherein each computing unit stores its position in the cellular array as an identification code. The neuron calculator operation method includes: making the input information of each neuron calculator include the identification code and output data of an upstream neuron; storing the identification codes of all upstream neurons in a segment of content-addressable memory included in each neuron calculator; for each input, comparing the identification code carried by the input with the contents of the above memory; and, according to the matching identification code found in the comparison result, finding the weight corresponding to the input, multiplying each weight by its input to obtain a product, and accumulating all the products to obtain an accumulation signal.
Preferably, each time the comparison between an identification code stored in the above memory and some input produces a match, a match signal is output, and a completion signal is generated according to all of the match signals.
Preferably, when an identification code matches, a high potential is output to indicate the match; all the high-potential signals indicating matches are connected to a NAND gate, and the NAND gate indicates completion when it outputs a low potential.
Preferably, a counter is used: the counter is initialized so that its count equals the number of input neurons, the counter is decremented by one each time a match occurs, and the completion signal is output when the counter reaches zero.
Preferably, when the completion signal occurs, the neuron calculator maps the accumulation signal and outputs the result.
Preferably, the master controller communicates with each computing unit through the bus; the master controller reads and writes the data in the memory unit of each computing unit through the bus, and the master controller communicates with the neuron calculator of each computing unit through the bus.
Preferably, the memory unit is an MRAM.
Preferably, the memory unit stores weight parameters.
Preferably, the cellular array includes a communication network, so that each computing unit of the cellular array can communicate with its neighboring computing units.
The design, specific structure and technical effects of the present invention are further described below with reference to the accompanying drawings, so that the purpose, features and effects of the present invention can be fully understood.
Description of the drawings
With reference to the accompanying drawings and the following detailed description, a more complete understanding of the present invention will be more easily obtained, and its accompanying advantages and features will be more easily understood, wherein:
Fig. 1 shows the architecture of a convolutional neural network.
Fig. 2 is a schematic diagram of the cellular array computing system architecture according to the preferred embodiment of the invention.
Fig. 3 is a schematic diagram of the communication network of the computing units of the cellular array of the cellular array computing system according to the preferred embodiment of the invention.
Fig. 4 is a schematic diagram of an example of a computing unit of the cellular array of the cellular array computing system according to the preferred embodiment of the invention.
Fig. 5 is a schematic diagram of an example of path selection for network communication within the array according to the preferred embodiment of the invention.
Fig. 6 is a schematic diagram of the case where the starting point is at a corner of the rectangular area according to the preferred embodiment of the invention.
Fig. 7 is a schematic diagram of the case where the starting point is on an edge of the rectangular area according to the preferred embodiment of the invention.
Fig. 8 is a schematic diagram of the case where the starting point is outside the rectangular area according to the preferred embodiment of the invention.
Fig. 9 is a schematic diagram of the case where the starting point is outside the rectangular area according to the preferred embodiment of the invention.
Fig. 10 is a schematic diagram of a specific example of the communication mode according to the preferred embodiment of the invention.
Fig. 11 is a schematic diagram of a specific example of the mass-sending mode according to the preferred embodiment of the invention.
Fig. 12 is a schematic diagram of the neuron calculator operation method according to the preferred embodiment of the invention.
It should be noted that the accompanying drawings are intended to illustrate the present invention, not to limit it. Note that drawings representing structures may not be drawn to scale. Also, in the drawings, the same or similar elements are indicated by the same or similar labels.
Specific embodiment
<Cellular array computing system>
Fig. 2 is a schematic diagram of the cellular array computing system architecture according to the preferred embodiment of the invention.
As shown in Fig. 2, the cellular array computing system according to the preferred embodiment of the invention includes: a master controller 10 (for example, the master controller 10 is a master CPU), a bus 20, and a cellular array formed by multiple computing units 30.
For example, the master controller 10 is an on-chip master controller or an off-chip master controller.
Each computing unit 30 of the cellular array includes: one or more neuron calculators 31 for performing the calculation operations of a neuron (for example, the calculation operations include addition, multiplication and so on; specifically, for example, each input is multiplied by its weight and all the products are then added together), and a memory unit 32.
The memory unit 32 may use SRAM or MRAM, where MRAM is non-volatile and has a higher density. Therefore, the memory unit 32 is preferably an MRAM.
The memory unit 32 stores parameters, such as weight parameters.
The master controller 10 communicates with each computing unit 30 through the bus 20. Specifically, for example, the master controller 10 reads and writes the data in the memory unit 32 of each computing unit 30 through the bus 20, and the master controller 10 communicates with the neuron calculator 31 of each computing unit 30 through the bus 20.
Each computing unit 30 stores its position (x, y) in the cellular array as an identification code, and the software and hardware in the computing unit 30 can read this identification code for use in specific operations.
A characteristic exploited by the embodiments of the present invention is that MRAM can be integrated with logic circuits and that its density can reach 8-20 times that of SRAM. The embodiments of the present invention propose that a small block of MRAM and a neuron calculator form one cell, a large number of such cells form an array, and the cells of the array are then connected by a bus to form an architecture. This architecture greatly expands the memory bandwidth and improves the overall performance of the chip.
<Cellular array mesh communication network>
Fig. 3 is a schematic diagram of the communication network of the computing units of the cellular array of the cellular array computing system according to the preferred embodiment of the invention. As shown in Fig. 3, the cellular array includes a communication network, so that each computing unit 30 of the cellular array can communicate with its neighboring computing units 30.
For example, each computing unit 30 can read and write the data in the memory units 32 of its neighboring computing units 30 through the bus 20, and each computing unit 30 can communicate with the neuron calculators 31 of its neighboring computing units 30 through the bus 20.
The output of each neuron calculator 31 is transmitted through the bus 20 or the communication network to the inputs of the next-stage neuron calculators.
In this embodiment of the invention, the cellular array architecture solves the bottleneck problems of memory and communication through data mass-sending (broadcast) and the internal network. By removing the memory and communication bottlenecks, the embodiment of the present invention can exploit parallel computation to a greater extent and thereby deliver higher computing capability.
<Cellular array bus broadcast method>
The master controller 10 can mass-send (broadcast) instructions or information through the bus to the neuron calculators in the computing units of a rectangular area, and/or mass-send data to the same relative address in the memory units of the computing units of a rectangular area.
A bus with such a broadcast capability can be implemented by the following method:
Fig. 4 is a schematic diagram of an example of a computing unit of the cellular array of the cellular array computing system according to the preferred embodiment of the invention.
As shown in Fig. 4, each computing unit 30 of the cellular array includes: one or more neuron calculators 31 for performing the calculation operations of a neuron, a memory unit 32, a bus controller 33 and an internal bus 34.
The bus controller 33 of each computing unit 30 is connected to the bus 20.
The memory unit 32 of each computing unit 30 is a slave device of the respective internal bus 34; the bus controller 33 and the neuron calculator 31 of each computing unit 30 are master devices of the respective internal bus 34, with the bus controller having the higher priority.
Moreover, the cellular array bus broadcast method may specifically include the following steps:
The master controller 10 reads/writes a memory unit: the master controller 10 broadcasts a destination address on the bus of the cellular array while sending data or preparing to read data; the bus controller receives the destination address and, if the destination address falls within its computing unit, connects the memory unit of the computing unit to perform the read/write operation. If the neuron calculator is currently reading or writing the memory unit of the computing unit, the bus controller, having the higher priority, connects the memory unit of the computing unit to perform the read/write operation, and the neuron calculator resumes reading and writing the memory unit of the computing unit after the operation is completed.
The master controller 10 communicates with a neuron calculator: a first reserved section is reserved in the address space of the master controller 10 for communication with the neuron calculators. The first reserved section is used to store the identification code of the target computing unit. Upon receiving the identification code of the target computing unit, the bus controller identifies the neuron calculator targeted by the current communication and connects the targeted neuron calculator to receive it, so that subsequent operations such as instruction execution, data reception and status reading can be performed.
The master controller 10 performs mass-sending processing: a second reserved section is reserved in the address space of the master controller 10 for mass-sending instructions and/or information to neuron calculators. When mass-sending, the address stored in the second reserved section includes the identification codes of the starting computing unit and the ending computing unit of the target rectangular area (the starting computing unit and the ending computing unit lie on a diagonal of the target rectangular area), so that the instructions and/or information can be sent.
The master controller 10 performs mass-sending processing: a third reserved section is reserved in the address space of the master controller 10 for mass-sending data. When mass-sending data, the address stored in the third reserved section includes the identification codes of the starting computing unit and the ending computing unit of the target rectangular area (the starting computing unit and the ending computing unit lie on a diagonal), and the mass-sent data contains the number of data transmissions. Within this number of data transmissions, the address included in each data transmission contains the relative address within the computing units of the target rectangular area, instructing the computing units to receive the data and store it at that relative address.
For example, the above protocol is implemented by the bus controllers, which are responsible for decoding the addresses on the bus and performing the corresponding data exchanges with the neuron calculator and the memory unit inside the cell.
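As a rough software model of the broadcast addressing described above (an illustration only, under the assumption that identification codes are (x, y) grid positions and that the target rectangle is given by two units on its diagonal), the behavior might be sketched in Python as follows:

```python
def in_target_rectangle(unit_id, start_id, end_id):
    """True if a unit's (x, y) identification code lies inside the rectangle
    whose diagonal runs from start_id to end_id."""
    x, y = unit_id
    x0, x1 = sorted((start_id[0], end_id[0]))
    y0, y1 = sorted((start_id[1], end_id[1]))
    return x0 <= x <= x1 and y0 <= y <= y1

def broadcast_write(units, start_id, end_id, rel_addr, data):
    """Model of mass-sending data: every unit in the rectangle stores `data`
    at the same relative address in its memory unit."""
    for unit_id, memory in units.items():
        if in_target_rectangle(unit_id, start_id, end_id):
            memory[rel_addr] = data
```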
The broadcast capability of the cellular array bus can provide great help to fully-connected neural networks and greatly improve the transmission speed of massive data. The specific operation method is as follows:
A layer of the fully-connected neural network is deployed in a rectangular area (for image processing applications this is quite natural), and each cell performs the function of one or more neurons. The weights of each input of each neuron are stored in the memory unit of that cell.
After a neuron calculator has completed its calculation, the master controller 10 reads the result from its output port, and the data is broadcast into the region where the next layer of the network resides and sent to the neuron calculator of each computing unit.
Alternatively, after one or more neuron calculators in a computing unit have completed their calculations, the results are stored at a preset relative address in the memory unit and read out by the master controller 10, then broadcast into the region where the next layer of neurons resides and stored at the preset relative address in each cell.
In this way, the broadcast capability of the cellular array bus provides great help to fully-connected neural networks and greatly improves the transmission speed of massive data.
<Cellular array internal network communication method>
Fig. 5 is a schematic diagram of an example of path selection for network communication within the array according to the preferred embodiment of the invention.
● Between computing units (cells), every piece of information contains the identification codes of the starting computing unit and the ending computing unit.
● A piece of information travels from the starting computing unit to the ending computing unit through the connections between neighboring computing units, via multiple relays.
● A network controller is provided in each computing unit to relay information rapidly without interfering with other functions.
● While indicating the identification code of the ending computing unit, the information also indicates either an address in the ending computing unit or a neuron calculator.
■ In the mode that indicates an address in the ending computing unit, the information is written directly by the network controller of that computing unit to the appropriate address in the memory unit of that computing unit.
■ In the mode that indicates a neuron calculator in the ending computing unit, the information is handed over to the neuron calculator inside the cell for processing.
● For each computing unit that sends or relays information, the network controller must select one neighboring computing unit as the next stop.
■ When the starting computing unit and the ending computing unit are on one line, there is only one reasonable choice.
■ In other cases, there are two equally reasonable choices, and the network controller can select the neighboring computing unit whose traffic is relatively lighter; a sketch of this next-hop selection is given below.
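The next-stop selection rule above might be modeled as in the following Python sketch; it is an illustration rather than the patent's circuit, and the per-link load counter standing in for "traffic" is an assumption:

```python
def next_stop(current, endpoint, link_load):
    """Pick the neighboring unit that moves the information toward the endpoint;
    when both an x-step and a y-step are reasonable, pick the less busy link."""
    cx, cy = current
    ex, ey = endpoint
    candidates = []
    if ex != cx:
        candidates.append((cx + (1 if ex > cx else -1), cy))
    if ey != cy:
        candidates.append((cx, cy + (1 if ey > cy else -1)))
    if not candidates:   # already at the endpoint
        return None
    # One candidate when source and endpoint share a row or column, else choose by traffic.
    return min(candidates, key=lambda n: link_load.get((current, n), 0))
```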
There is one simple method for sending a large amount of information from one computing unit to a rectangular area: the master controller 10 reads the information and then mass-sends it. Another mode is provided here: the point-to-point communication function between computing units is extended to regional mass-sending. This mode can support a higher degree of parallelism and a much higher total bandwidth, and is very suitable for convolutional neural networks. For mass-sending between computing units, the original sender is responsible for indicating the target area, and the transmission is still completed by serial relays.
● If the sending computing unit and the relaying computing units are within the target area:
■ At a corner of the rectangular area (as shown in Fig. 6):
◆ If the width of the area is 1, only one neighboring computing unit can be selected as the next-stop relay. The network controller of this computing unit receives the data of the information (if this computing unit is not the sender of the message), forwards the information to this neighboring computing unit, and then updates the target area (the length is reduced by 1).
● If this is the last computing unit remaining in the area, the relaying stops.
◆ If both the length and the width of the area are greater than 1, two neighboring computing units can be selected as the next-stop relays. The network controller of this computing unit receives the data of the information (if this computing unit is not the sender of the message), forwards the information to these two neighboring computing units respectively, and then updates the target areas, one of which is a rectangular area of width 1.
■ On an edge of the rectangular area (as shown in Fig. 7):
◆ If the width of the area is 1, two neighboring computing units can be selected as the next-stop relays. The network controller of this computing unit receives the data of the information (if this cell is not the sender of the message), forwards the information to these two neighboring computing units respectively, and then updates the target areas.
◆ If the width of the area is greater than 1, three neighboring computing units can be selected as the next-stop relays. The network controller of this computing unit receives the data of the information (if this cell is not the sender of the message), forwards the information to these three neighboring computing units respectively, and then updates the target areas, two of which are rectangular areas of width 1.
■ In the interior of the rectangular area, four neighboring computing units can be selected as the next-stop relays (in this case the computing unit can only be the sender of the message). The information is forwarded to these four neighboring computing units respectively, and the target areas are then updated, two of which are rectangular areas of width 1.
● If the sending computing unit and the relaying computing units are outside the target area (as shown in Fig. 8 and Fig. 9): considering that a neural network needs to transfer blocks of data over long distances, the communication network can easily become congested in this case. A protocol is adopted here which, according to our study, can avoid congestion in a convolutional network.
■ The transmission direction is indicated when the information is sent.
■ In each relay, the information is transmitted along the transmission direction; after the coordinate in the forward direction enters the coordinate range of the target area, lateral mass-sending with width 1 is gradually performed, as sketched below.
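A simplified model of this congestion-avoiding rule (move along the stated transmission direction until the forward coordinate enters the target range, then fan out laterally in width-1 strips) is sketched below; it merely computes which units are visited, assumes the direction is a step of +1 or -1 along x pointing toward the region, and is not the patent's hardware protocol:

```python
def region_broadcast_coverage(source, direction, region):
    """Move along `direction` (+1 or -1 along x) until the forward coordinate
    enters the region's x-range, then cover each column as a width-1 lateral strip."""
    (x0, x1), (y0, y1) = region          # inclusive coordinate ranges of the target area
    x, y = source
    visited = []
    while not (x0 <= x <= x1):           # forward relays outside the target area
        x += direction
        visited.append((x, y))
    while x0 <= x <= x1:                 # inside: lateral width-1 mass-send per column
        for yy in range(y0, y1 + 1):
            visited.append((x, yy))
        x += direction
    return visited
```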
The specific implementation method (as shown in Fig. 10) is as follows:
1. The communication channel between every two neighboring computing units consists of a pair of FIFOs (first in, first out): one FIFO is written by one computing unit and read by the other, and the other FIFO runs in the opposite direction. From the standpoint of one of the two computing units, they can be called its input FIFO and its output FIFO.
2. The network controller is connected with all the FIFOs (at most 4 pairs) in the computing unit. The network controller is also connected with the MPU in the computing unit and can raise interrupts to it (such as FIFO empty, FIFO full, new information arrived, information sent out, and so on).
3. The network controller sends, receives and relays information.
4. If information enters some input FIFO, the network controller first checks it:
● If the end point is this computing unit, then:
■ If the end point specifies a relative address, then, since the network controller has DMA capability, the information is stored directly at the appropriate address of the memory unit and the neuron calculator is notified.
■ If the end point is a neuron calculator, the neuron calculator is notified directly to handle it.
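The receive path described in steps 1 to 4 can be modeled behaviorally in Python as below; `deque` stands in for a hardware FIFO, and the memory store and notification calls are placeholders rather than patent-defined interfaces:

```python
from collections import deque

class NetworkController:
    """Behavioral model of the receive path: check an input FIFO and dispatch."""

    def __init__(self, my_id, memory):
        self.my_id = my_id
        self.memory = memory         # stands in for the cell's memory unit
        self.input_fifo = deque()    # stands in for one hardware input FIFO

    def poll_input(self, notify_neuron):
        while self.input_fifo:
            info = self.input_fifo.popleft()
            if info["endpoint"] != self.my_id:
                continue                     # a real controller would relay it onward
            if "rel_addr" in info:           # end point given as a relative address
                self.memory[info["rel_addr"]] = info["data"]  # DMA-style direct store
                notify_neuron(info)          # then notify the neuron calculator
            else:                            # end point is the neuron calculator itself
                notify_neuron(info)
```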
The mass-sending function of the cellular array network can provide great help to neural networks, especially convolutional neural networks, and greatly improves the transmission bandwidth of massive data. The specific operation method is as follows:
1. A layer of the neural network is deployed in a rectangular area (for image processing applications this is quite natural), and each computing unit performs the function of one or more neurons. The weights of each input of each neuron are stored in the memory unit/MRAM of that computing unit.
2. Adjacent layers are deployed in adjacent regions.
3. After all the neurons of this layer have completed their calculations, all computing units perform a synchronized mass-send, with the transmission direction pointing to the region where the next layer resides, as shown in Fig. 11.
The advantages of using the present invention to realize neural network computation are obvious:
1. The parallel computation of a large number of computing units greatly speeds up the arithmetic operations, so that the speed of learning and training is greatly improved.
2. The huge bandwidth of the cellular array internal network communication and the mass-sending mechanism likewise contribute significantly to improving the speed.
3. The non-volatility of MRAM allows a successfully trained chip to be directly duplicated and sold as a product that solves a particular problem.
<Neuron calculator operation method>
The calculation performed by a neuron calculator is comparatively simple: the output of each upper-layer neuron connected to it is multiplied by the corresponding weight, and the products are accumulated. Usually the final result also needs a simple mapping, for example being mapped to a number between 0 and 1. However, each neuron has at least dozens and at most thousands of input neurons, and in a network environment it is difficult to guarantee that these inputs arrive in a fixed order. A rapid and efficient method is needed to search and compare for each input, and also to determine whether all the inputs have arrived.
The present invention proposes a method to solve this problem, as shown in Fig. 12:
1. The input information of each neuron calculator in the network is made to include the identification code and output data of the upstream neuron.
2. Each neuron calculator contains a segment of content-addressable memory (CAM) to store the identification codes of all upstream neurons. This memory unit can complete the comparison of an input against all the identification codes at once, and generates an output when a matching identification code is found.
3. For each input, the identification code carried by the input is compared with the contents of the above memory; according to the matching identification code found in the comparison result, the weight corresponding to the input is found, each weight is multiplied by its input to obtain a product, and all the products are accumulated to obtain the accumulation signal. For example, one way to find the weight's address is to store each weight at the address corresponding to its identification code.
4. Each time the comparison between an identification code stored in the above memory and some input produces a match, a match signal is output, and a completion signal is generated according to all of the match signals. There are two implementation methods:
A. When an identification code matches, its match line outputs a high potential. All the match signals are connected to a NAND gate, and the NAND gate indicates completion when it outputs a low potential.
B. A counter is used: the counter is initialized so that its count equals the number of input neurons; every match decrements the counter by one; when the counter reaches zero, the completion signal is output.
5. When the completion signal occurs, the neuron calculator maps the accumulation signal and outputs the result.
In this way, the operation speed is improved even though the inputs cannot be guaranteed to arrive in a fixed order.
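The five steps above can be summarized in the following Python sketch, under the assumptions that the content-addressable memory is modeled as a dictionary keyed by upstream identification codes and that the counter variant (B) is used for completion detection; the class and names are illustrative, not taken from the patent:

```python
import math

class NeuronCalculator:
    """Accepts inputs in any order: each input carries the upstream neuron's
    identification code, which is matched against the stored codes to find the
    corresponding weight; products accumulate until all inputs have arrived."""

    def __init__(self, weights_by_id):
        self.weights_by_id = weights_by_id    # models the CAM plus the weight store
        self.remaining = len(weights_by_id)   # counter variant (B): one count per input neuron
        self.accumulator = 0.0

    def receive(self, upstream_id, value):
        weight = self.weights_by_id.get(upstream_id)
        if weight is None:
            return None                       # no matching identification code
        self.accumulator += weight * value    # multiply and accumulate
        self.remaining -= 1                   # a match decrements the counter
        if self.remaining == 0:               # counter reaches zero: completion signal
            return 1.0 / (1.0 + math.exp(-self.accumulator))  # simple mapping of the sum
        return None

# Usage: the two inputs may arrive in either order.
n = NeuronCalculator({(0, 1): 0.5, (2, 3): -0.2})
n.receive((2, 3), 1.0)        # returns None: not all inputs have arrived yet
out = n.receive((0, 1), 2.0)  # returns the mapped value of 0.5*2.0 + (-0.2)*1.0
```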
<Cellular array three-dimensional communication transmission method>
A three-dimensional cellular array neural network chip can provide even higher performance. The cellular array three-dimensional communication transmission method according to the preferred embodiment of the invention may include:
Multiple layers of cellular array neural network chips (that is, cellular array computing systems) are stacked vertically through through-silicon vias (TSVs) into a three-dimensional chip;
The buses of the individual layers of the cellular array are connected together through the through-silicon vias.
The computing units in two adjacent cellular array neural network chips are connected into a network through the through-silicon vias. In this way, the communication network between computing units is extended into a three-dimensional communication network.
Adjacent neural network layers of the multi-layer neural network are deployed in adjacent cellular array neural network chip layers of the three-dimensional chip. When data needs to be transmitted to the next neural network layer through the communication network, a data transmission in the vertical direction is first performed through the through-silicon vias, and the communication network of each cellular array neural network chip layer is then used to perform the horizontal data transmission.
This mode of the preferred embodiment of the invention is especially efficient for convolutional neural networks, so that a transfer operation that would originally require thousands of cycles can be completed in a few network transmission cycles.
<Convolutional neural networks implementation method>
The cellular array architecture has great flexibility. Another convolutional neural network implementation method is provided here:
1. One layer of the convolutional neural network is deployed, according to the internal positional relations of this layer, in a rectangular area of the cellular array.
2. Using the cellular array bus broadcast method, the weights are sent one by one to the computing units in the rectangular area where each neuron resides, and stored in the memory of those computing units. (The weights of a convolutional neural network have translation invariance: for each neuron, the weights corresponding to the inputs from different nearby cells are different, but all neurons use the same group of weights. This is very suitable for sending with the broadcast mechanism.)
3. This layer of the convolutional neural network receives its inputs and performs the neuron calculation (the neuron calculation includes: multiplying each input by its corresponding weight and then summing all the products); the output of each neuron is temporarily retained in the neuron calculator.
4. Using the cellular array bus broadcast method, the weights of the next layer of the convolutional neural network (the layer following the above-mentioned layer) are sent to the computing units of the rectangular area.
5. Using the cellular array mesh communication network in the cellular array, the outputs of the neurons of the previous layer (the above-mentioned layer of the convolutional neural network) are transferred to the computing units in the predetermined nearby region that need those outputs. (In a convolutional neural network, each neuron outputs only to neurons in a nearby small region, which is very suitable for distribution using the cellular array mesh communication network; only a few clock cycles are needed to complete all the data exchanges.)
6. Using the new weights (the weights of the next layer of the convolutional neural network) and the outputs of the previous layer of neurons, the calculation of the next layer of the convolutional neural network is performed.
7. If the following layers are further network layers or convolutional layers, the process can continue layer by layer in the same way, as outlined in the sketch below.
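The layer-by-layer reuse of the same rectangular cell region can be outlined as the loop below; `broadcast_weights`, `compute_layer` and `exchange_outputs` are hypothetical helpers standing in for the bus broadcast, the neuron calculation and the mesh-network exchange of steps 2 to 6, so this is a high-level sketch rather than the patent's implementation:

```python
def run_convolutional_network(layers, broadcast_weights, compute_layer, exchange_outputs, image):
    """Run successive layers on the same rectangular cell region: broadcast this
    layer's weights, compute, then move outputs to the nearby cells that need
    them before the next layer starts."""
    activations = image
    for layer_weights in layers:
        broadcast_weights(layer_weights)             # steps 2 and 4: bus broadcast of weights
        activations = compute_layer(activations)     # steps 3 and 6: neuron calculation
        activations = exchange_outputs(activations)  # step 5: mesh-network data exchange
    return activations
```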
The advantages of this method are:
1. If the input image is very large, and a convolutional network usually has a dozen or more layers, a large amount of hardware and memory resources would be occupied. This method uses the same hardware to perform the calculations of different layers, which greatly saves hardware resources.
2. Using the cellular array mesh communication network, the data exchange between different layers is very efficient.
The preferred embodiments of the present invention have been shown and described above. It should be understood that the present invention is not limited to the forms disclosed herein, and this disclosure should not be regarded as excluding other embodiments; the invention can be used in various other combinations, modifications and environments, and can be changed within the scope of the inventive concept described herein through the above teachings or the skill or knowledge of the relevant art. Changes and modifications made by those skilled in the art that do not depart from the spirit and scope of the present invention shall all fall within the protection scope of the appended claims of the present invention.

Claims (9)

1. A neuron calculator operation method for a cellular array computing system, wherein the cellular array computing system includes a master controller, a bus, and a cellular array formed by multiple computing units; each computing unit of the cellular array includes: one or more neuron calculators for performing the calculation operations of neurons, and a memory unit; and each computing unit stores its position in the cellular array as an identification code; characterized in that the neuron calculator operation method includes:
making the input information of each neuron calculator include the identification code and output data of an upstream neuron;
storing the identification codes of all upstream neurons in a segment of content-addressable memory included in each neuron calculator;
for each input, comparing the identification code carried by the input with the contents of the above memory; according to the matching identification code found in the comparison result, finding the weight corresponding to the input, multiplying each weight by its input to obtain a product, and accumulating all the products to obtain an accumulation signal.
2. The neuron calculator operation method for a cellular array computing system according to claim 1, characterized by further including: each time the comparison between an identification code stored in the above memory and some input produces a match, outputting a match signal, and generating a completion signal according to all of the match signals.
3. The neuron calculator operation method for a cellular array computing system according to claim 1 or 2, characterized by further including: when an identification code matches, outputting a high potential to indicate the match; and connecting all the high-potential signals indicating matches to a NAND gate, the NAND gate indicating completion when it outputs a low potential.
4. The neuron calculator operation method for a cellular array computing system according to claim 3, characterized in that a counter is used, the counter is initialized so that its count equals the number of input neurons, the counter is decremented by one each time a match occurs, and the completion signal is output when the counter reaches zero.
5. The neuron calculator operation method for a cellular array computing system according to claim 3, characterized in that, when the completion signal occurs, the neuron calculator maps the accumulation signal and outputs the result.
6. The neuron calculator operation method for a cellular array computing system according to claim 1 or 2, characterized in that the master controller communicates with each computing unit through the bus; the master controller reads and writes the data in the memory unit of each computing unit through the bus; and the master controller communicates with the neuron calculator of each computing unit through the bus.
7. The neuron calculator operation method for a cellular array computing system according to claim 1 or 2, characterized in that the memory unit is an MRAM.
8. The neuron calculator operation method for a cellular array computing system according to claim 1 or 2, characterized in that the memory unit stores weight parameters.
9. The neuron calculator operation method for a cellular array computing system according to claim 1 or 2, characterized in that the cellular array includes a communication network, so that each computing unit of the cellular array can communicate with its neighboring computing units.
CN201611238870.8A 2016-12-28 2016-12-28 Neuron calculator operation method for a cellular array computing system Pending CN108255514A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611238870.8A CN108255514A (en) 2016-12-28 2016-12-28 For the neuron calculator operation method of cellular array computing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611238870.8A CN108255514A (en) 2016-12-28 2016-12-28 For the neuron calculator operation method of cellular array computing system

Publications (1)

Publication Number Publication Date
CN108255514A true CN108255514A (en) 2018-07-06

Family

ID=62720269

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611238870.8A Pending CN108255514A (en) 2016-12-28 2016-12-28 For the neuron calculator operation method of cellular array computing system

Country Status (1)

Country Link
CN (1) CN108255514A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0438800A2 (en) * 1990-01-24 1991-07-31 Hitachi, Ltd. Neural network processing system using semiconductor memories
CN101866446A (en) * 2010-03-08 2010-10-20 李爱国 Method for community correction work and correction device
CN105740946A (en) * 2015-07-29 2016-07-06 上海磁宇信息科技有限公司 Method for realizing neural network calculation by using cell array computing system

Similar Documents

Publication Publication Date Title
CN106022468B (en) the design method of artificial neural network processor integrated circuit and the integrated circuit
JP6714690B2 (en) Information processing system, method of operating information processing system, and machine learning computing unit
US11055608B2 (en) Convolutional neural network
US11423296B2 (en) Device and method for distributing convolutional data of a convolutional neural network
CN108256640A (en) Convolutional neural networks implementation method
CN105740946B (en) A kind of method that application cell array computation system realizes neural computing
CN107918794A (en) Neural network processor based on computing array
CN105719000B (en) A kind of neuron hardware unit and the method with this unit simulation impulsive neural networks
CN109416756A (en) Acoustic convolver and its applied artificial intelligence process device
CN110390388A (en) Neuromorphic circuit with 3D stacked structure and the semiconductor device including it
CN104881666B (en) A kind of real-time bianry image connected component labeling implementation method based on FPGA
CN105718996B (en) Cellular array computing system and communication means therein
KR20200037748A (en) Chip device and related product
CN107392309A (en) A kind of general fixed-point number neutral net convolution accelerator hardware structure based on FPGA
JP6912491B2 (en) Energy-saving multiple neural core circuits, methods and neurosynaptic systems
CN107766935B (en) Multilayer artificial neural network
CN112464784A (en) Distributed training method based on hybrid parallel
CN105469143B (en) Network-on-chip method for mapping resource based on neural network dynamic feature
CN106951926A (en) The deep learning systems approach and device of a kind of mixed architecture
CN108256637A (en) A kind of cellular array three-dimensional communication transmission method
CN108647776A (en) A kind of convolutional neural networks convolution expansion process circuit and method
CN108090496A (en) The method and apparatus of image procossing based on convolutional neural networks
CN109670581A (en) A kind of computing device and board
CN114443862A (en) Knowledge graph completion method and system based on weighted graph convolution network
CN108256641A (en) For the cellular array internal network communication method of cellular array computing system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180706