CN108256640A - Convolutional neural networks implementation method - Google Patents
- Publication number
- CN108256640A (publication); CN201611236090.XA (application)
- Authority
- CN
- China
- Prior art keywords
- computing unit
- neuron
- cellular array
- convolutional neural
- neural networks
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/38—Information transfer, e.g. on bus
- G06F13/40—Bus structure
- G06F13/4063—Device-to-bus coupling
- G06F13/4068—Electrical coupling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
- G06F15/17306—Intercommunication techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
- G06F15/17356—Indirect interconnection networks
- G06F15/17362—Indirect interconnection networks hierarchical topologies
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/3001—Arithmetic instructions
Abstract
The invention discloses a convolutional neural network implementation method using a cellular array computing system. The layers of the convolutional neural network are deployed, according to the positional relationships within each layer, onto a rectangular region of the cellular array. The weights of each layer are sent one by one, by the cellular array bus broadcast method, to every cell in the rectangular region and stored in the memory of each computing unit. For the first network layer, each unit receives its input and performs the neuron computation, temporarily keeping each neuron's output in the computing unit. The cellular array mesh communication network then transfers this layer's neuron outputs to the computing units in the predetermined nearby region that need them. Using the next layer's weights and the previous layer's neuron outputs, the next layer of the network is computed, and so on until the last layer is completed.
Description
Technical field
The present invention relates to the fields of semiconductor chips and artificial intelligence, and more particularly to a convolutional neural network implementation method using a cellular array computing system.
Background technology
The human brain is a network of a huge number of intricately connected neurons. Each neuron connects, through a large number of dendrites, to a large number of other neurons, from which it receives information; each connection point is a synapse. When external stimulation accumulates beyond a certain level, the neuron generates a stimulus signal and sends it out through its axon. The axon has a large number of tips which, through synapses, connect to the dendrites of many other neurons. It is exactly such a network, built from neurons of simple function, that realizes all the intelligent activities of mankind. Human memory and intelligence are generally believed to be stored in the different coupling strengths of the synapses.
The response frequency of a neuron is no more than 100 Hz. The CPU of a modern computer is ten million times faster than the human brain, yet its ability to handle many difficult problems falls short of the brain. This has prompted the computer industry to begin imitating the brain, at first at the software level.
Neural networks are a common algorithm in machine learning. A neuron in a neural network algorithm is simply a function: it has many inputs, and each input corresponds to a weight. The usual algorithm multiplies each input by its weight and sums the products. The neuron outputs 0 or 1 (determined by a threshold) or a value between 0 and 1. A typical neural network links the outputs and inputs of a large number of neurons together, usually organized in a multi-level architecture. It contains a great many parameters (weights, thresholds), and the process of learning and training is precisely the adjustment of these parameters. This is a function optimization requiring massive computation. Such algorithms have achieved rich results and are widely applied.
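The neuron model just described — each input multiplied by its weight, the products summed, then compared against a threshold or squashed to a value between 0 and 1 — can be sketched as follows (a minimal illustration; the function names are ours, not the patent's):

```python
import math

def neuron(inputs, weights, threshold):
    """Weighted sum of inputs, compared against a threshold: outputs 0 or 1."""
    s = sum(x * w for x, w in zip(inputs, weights))
    return 1 if s >= threshold else 0

def sigmoid_neuron(inputs, weights, bias=0.0):
    """Variant producing a value between 0 and 1, as the text mentions."""
    s = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-s))
```

Training a network then amounts to adjusting the `weights` and `threshold`/`bias` parameters of every such function.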
The networks in neural network algorithms are all divided into many layers. In the earliest networks, every neuron of one layer connects to every neuron of the next layer, forming a fully-connected network. One problem of fully-connected networks is that in applications such as image processing, an image has many pixels, and the number of weights each layer needs is proportional to the square of the pixel count. The program therefore occupies far too much memory, and the amount of computation is even more unmanageable.
In a convolutional neural network, the many front layers are no longer fully connected. The neurons of each layer are arranged in an array like an image. Each neuron of the next layer connects only to a small region of this layer. The small region is usually a square whose side length is k, where k is called the kernel size of the convolutional network, as shown in Fig. 1.
Convolutional neural networks (CNN) take their name from the weighted summation over every point of this small region, which resembles a convolution. Within one layer, the group of weights is the same at every point (i.e. translation invariance), so compared with a fully-connected network the number of weights is greatly reduced, which makes high-resolution image processing feasible. A convolutional neural network comprises multiple layers connected in this way, along with layers of other kinds.
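A minimal sketch of such a layer, showing how one shared k×k kernel replaces the per-pixel weight matrices of a fully-connected layer (pure Python, illustrative only):

```python
def conv2d_valid(image, kernel):
    """Each output neuron connects only to a k x k region of the previous
    layer, and every position shares the same k x k weights (translation
    invariance), so the layer needs k*k weights instead of pixels**2."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - k + 1):
        row = []
        for j in range(w - k + 1):
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(k) for dj in range(k))
            row.append(s)
        out.append(row)
    return out
```

For a 1000×1000 image, a fully-connected layer would need on the order of 10¹² weights, while a k = 3 convolutional layer needs only 9.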
With the popularization of deep learning applications, people have begun to develop dedicated neural network chips. Implementing the additions and multiplications of neuron computation with special-purpose circuits is much more efficient than using a CPU or GPU.
Magnetoresistive random access memory (MRAM) is a new memory and storage technology. It can be read and written randomly and quickly like SRAM/DRAM, and it is faster than DRAM; like flash memory, it retains data permanently after power-off, but unlike NAND it can be erased and rewritten an unlimited number of times.
The economics of MRAM are considered good: the silicon area occupied per unit of capacity has a large advantage over SRAM (commonly used as the CPU cache) and is expected to approach the level of DRAM. Its performance is also quite good, with read/write latency close to the best SRAM, and its power consumption is the best among the various memory and storage technologies. Moreover, unlike DRAM and flash memory, MRAM is compatible with the standard CMOS semiconductor process, so MRAM can be integrated with logic circuits in one chip. With MRAM technology, the three functions of memory, storage, and computation can be integrated on a single chip, and new computing architectures become possible.
The human brain is characterized by massively parallel computing: not only can a large number of neurons work simultaneously, but each neuron connects to thousands of other neurons. With modern integrated circuit technology it is easy to integrate a large number of neurons on a single chip, but providing internal communication bandwidth like the brain's is extremely difficult. For example, if the input data of one layer of neurons lies in one block of RAM, at least k clock cycles are needed to read the data out, because different rows of the memory cannot be read and written at the same time. The speed of reading data, i.e. the memory bandwidth, thus becomes the bottleneck of the computation.
Summary of the invention
In view of the above drawbacks of the prior art, the present invention proposes an architecture for neural networks based on a cellular array, composed of numerous neuron calculators that combine storage functions with dense network connections. This new architecture will find wide use in fields such as large-scale computing, big data processing, and artificial intelligence.
To achieve the above objective, the present invention provides a convolutional neural network implementation method in which the cellular array computing system comprises a master controller, a bus, and a cellular array formed by multiple computing units, each computing unit of the cellular array including one or more neuron calculators or microprocessors for performing neuron computation, and a memory unit. The convolutional neural network implementation method comprises: deploying the convolutional neural network, according to the positional relationships within each layer, onto a rectangular region of the cellular array; sending the weights of each layer one by one, by the cellular array bus broadcast method, to every cell in the rectangular region, to be stored in the memory of that computing unit; for the first network layer, receiving the input and performing the neuron computation, temporarily keeping the output of each neuron in the computing unit; using the cellular array mesh communication network to transfer this layer's neuron outputs to the computing units that need those outputs within a predetermined nearby region; and using the next layer's weights and the previous layer's neuron outputs to compute the next layer, until the last layer is completed.
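The claimed flow — deploy a layer on a rectangular region, broadcast the layer's shared weights to every cell, compute the neurons, exchange outputs with nearby cells over the mesh, and repeat — can be sketched as a toy software simulation (the grid representation, the per-cell `memory` dict, and zero padding at the border are our modelling assumptions, not the patent's hardware):

```python
def run_layers(inputs, kernels):
    """Simulate the method: one cell per neuron; 'memory' holds the
    broadcast weights, 'output' stays inside the cell between layers."""
    h, w = len(inputs), len(inputs[0])
    memory = [[{} for _ in range(w)] for _ in range(h)]  # per-cell weight store
    output = [row[:] for row in inputs]                  # first-layer input
    for layer, kernel in enumerate(kernels):
        k = len(kernel)
        # 1. bus broadcast: every cell of the rectangle gets the same weights
        for i in range(h):
            for j in range(w):
                memory[i][j][layer] = kernel
        # 2. mesh transfer + neuron computation: each cell gathers the k*k
        #    neighbourhood of previous-layer outputs (zero beyond the border)
        nxt = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                s = 0.0
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - k // 2, j + dj - k // 2
                        if 0 <= ii < h and 0 <= jj < w:
                            s += output[ii][jj] * memory[i][j][layer][di][dj]
                nxt[i][j] = s
        output = nxt                                     # kept for next layer
    return output
```

Each iteration of the outer loop corresponds to one convolutional layer of the claim: broadcast, compute, exchange.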
For a convolutional neural network, each neuron has few weights, connecting only to a nearby small region of neurons in the previous layer. A convolutional neural network usually has more than ten convolutional layers, and a cell usually has enough memory to store all the weights. Furthermore, the weights of a convolutional network are translation invariant, which makes them particularly suitable for distribution by a broadcast mechanism.
Preferably, the cellular array mesh communication network enables each computing unit of the cellular array to communicate with its neighboring computing units, wherein each computing unit transmits data through the mesh communication network to a particular address in a neighboring computing unit's memory unit, or to a neighboring computing unit's neuron calculator.
Preferably, in the cellular array mesh communication network, the communication channel between every two neighboring computing units is formed by a pair of FIFO units, the pair comprising a first FIFO unit and a second FIFO unit; for the two neighboring computing units, the first FIFO unit, into which one of the computing units writes information, operates in the direction opposite to the second FIFO unit, from which information is output to the other computing unit.
Preferably, in the cellular array mesh communication network, each computing unit is provided with a network controller that can directly read and write the memory, and the network controller is connected to all the FIFO units in the computing unit.
Preferably, when one of two neighboring computing units writes information into the first FIFO unit, the network controller judges whether the destination of the written information is this computing unit; and upon judging that the destination is this computing unit, it performs the following operations: when the destination is a relative address, the network controller directly stores the information at the corresponding address of the memory unit and notifies the neuron calculator; when the destination is a neuron calculator, it directly notifies that neuron calculator to process the information.
Preferably, the cellular array bus broadcast method comprises: the master controller, via the bus, multicasting instructions and/or information to the neuron calculators in the computing units of a rectangular region, and/or multicasting data to the same relative address in the memory units of the computing units of a rectangular region.
Preferably, each computing unit of the cellular array further includes a bus controller and an internal bus; the bus controller of each computing unit is connected to the bus; the memory unit of each computing unit is a slave device of the respective internal bus; the bus controller and the neuron calculator of each computing unit are master devices of the respective internal bus, with the bus controller having the higher priority.
Preferably, the master controller reads and writes the memory units, wherein the master controller first broadcasts a target address on the bus of the cellular array, then sends data or prepares to read data; each bus controller receives the target address and, if the target address lies within its computing unit, connects the memory unit of that computing unit to perform the read/write operation.
Preferably, the master controller communicates with the neuron calculators, wherein a first reserved section is set aside in the address space of the master controller for communication with the neuron calculators; the first reserved section is used to store the identification code of the target computing unit; upon receiving the identification code of the target computing unit, the bus controller identifies the neuron calculator targeted by the current communication and connects that neuron calculator to perform the subsequent operations.
The design of the present invention, its concrete structure, and the technical effects it produces are further described below with reference to the accompanying drawings, so that the purpose, features, and effects of the present invention can be fully understood.
Description of the drawings
A more complete understanding of the present invention, and of its attendant advantages and features, will be more easily obtained by reference to the following detailed description and the accompanying drawings, in which:
Fig. 1 shows the architecture of a convolutional neural network.
Fig. 2 is a schematic diagram of the cellular array computing system architecture according to a preferred embodiment of the invention.
Fig. 3 is a schematic diagram of the communication network of the computing units of the cellular array of the cellular array computing system according to a preferred embodiment of the invention.
Fig. 4 is an example schematic diagram of a computing unit of the cellular array of the cellular array computing system according to a preferred embodiment of the invention.
Fig. 5 is a schematic diagram of an example of path selection for network communication within the array according to a preferred embodiment of the invention.
Fig. 6 is a schematic diagram, according to a preferred embodiment of the invention, of a starting point on a corner of the rectangular region.
Fig. 7 is a schematic diagram, according to a preferred embodiment of the invention, of a starting point on an edge of the rectangular region.
Fig. 8 is a schematic diagram, according to a preferred embodiment of the invention, of a starting point outside the rectangular region.
Fig. 9 is another schematic diagram, according to a preferred embodiment of the invention, of a starting point outside the rectangular region.
Fig. 10 is a schematic diagram of a specific example of the communication mode according to a preferred embodiment of the invention.
Fig. 11 is a schematic diagram of a specific example of the mass-sending mode according to a preferred embodiment of the invention.
Fig. 12 is a schematic diagram of the neuron calculator operation method according to a preferred embodiment of the invention.
It should be noted that the accompanying drawings are intended to illustrate the present invention, not to limit it. Note that drawings representing structures may not be drawn to scale. Also, in the drawings, same or similar elements are indicated by same or similar labels.
Specific embodiment
<Cellular array computing system>
Fig. 2 is the schematic diagram of cellular array computing system framework according to the preferred embodiment of the invention.
As shown in Fig. 2, the cellular array computing system according to a preferred embodiment of the invention comprises: a master controller 10 (for example, a master CPU), a bus 20, and a cellular array formed by multiple computing units 30.
For example, the master controller 10 is an on-chip master controller or an off-chip master controller.
Each computing unit 30 of the cellular array includes one or more neuron calculators 31 for performing the neuron computation (the computation including, for example, additions and multiplications; specifically, each input is multiplied by its weight and the products over all inputs are summed), and a memory unit 32.
The memory unit 32 may use SRAM or MRAM; since MRAM is non-volatile and has a higher density, the memory unit 32 is preferably MRAM. The memory unit 32 stores parameters, such as the weight parameters.
The master controller 10 communicates with each computing unit 30 through the bus 20. Specifically, for example, the master controller 10 reads and writes the data in the memory unit 32 of each computing unit 30 through the bus 20, and the master controller 10 communicates with the neuron calculator 31 of each computing unit 30 through the bus 20.
Each computing unit 30 stores its position (x, y) in the cellular array as its identification code, and the software and hardware in the computing unit 30 can read this identification code for use in specific operations.
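The (x, y) identification code can be sketched as a simple packing scheme (the row-major encoding and the `width` parameter are our assumptions; the patent only states that the position is stored and readable):

```python
def make_id(x, y, width):
    """Pack a cell's (x, y) grid position into a single identification code."""
    return y * width + x

def from_id(code, width):
    """Recover the (x, y) position from an identification code."""
    return code % width, code // width
```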
The characteristics of embodiment of the present invention is integrated using MRAM and logic circuit, allows density to reach SRAM
8-20 times the characteristics of;Wherein, the embodiment of the invention proposes to be made of a fritter MRAM and neuron calculator one thin
Born of the same parents form an array, then this array is connected to form a framework by bus by a large amount of cells.This framework pole
The earth has expanded memory bandwidth, improves the overall performance of chip.
<Cellular array mesh communication network>
Fig. 3 is a schematic diagram of the communication network of the computing units of the cellular array of the cellular array computing system according to a preferred embodiment of the invention. As shown in Fig. 3, the cellular array includes a communication network, so that each computing unit 30 of the cellular array can communicate with its neighboring computing units 30.
For example, each computing unit 30 can read and write the data in a neighboring computing unit 30's memory unit 32 through the bus 20, and each computing unit 30 can communicate with a neighboring computing unit 30's neuron calculator 31 through the bus 20. The output of each neuron calculator 31 is transmitted through the bus 20 or the communication network to the input of a next-stage neuron calculator.
In this embodiment of the invention, the cellular array architecture solves the bottleneck problems of memory and communication through data mass-sending and the internal network. By resolving these bottlenecks, the embodiment can exploit parallel computing to a greater extent and thus deliver higher computing capability.
<Cellular array bus broadcast method>
The master controller 10 can multicast instructions or information through the bus to the neuron calculators in the computing units of a rectangular region, and/or multicast data to the same relative address in the memory units of the computing units of a rectangular region.
A bus with such a broadcast capability can be implemented by the following method:
Fig. 4 is an example schematic diagram of a computing unit of the cellular array of the cellular array computing system according to a preferred embodiment of the invention.
As shown in Fig. 4, each computing unit 30 of the cellular array includes: one or more neuron calculators 31 for performing the neuron computation, a memory unit 32, a bus controller 33, and an internal bus 34. The bus controller 33 of each computing unit 30 is connected to the bus 20. The memory unit 32 of each computing unit 30 is a slave device of the respective internal bus 34; the bus controller 33 and the neuron calculator 31 of each computing unit 30 are master devices of the respective internal bus 34, with the bus controller having the higher priority.
Moreover, the cellular array bus broadcast method may specifically include the following steps:
The master controller 10 reads and writes a memory unit: the master controller 10 broadcasts the target address on the bus of the cellular array, then sends data or prepares to read data; each bus controller receives the target address and, if the target address lies within its computing unit, connects the memory unit of that computing unit to perform the read/write operation. If the neuron calculator is reading or writing the memory unit of that computing unit, the memory unit is connected after that read/write operation completes, and the neuron calculator accesses the memory unit of the computing unit again afterward.
The master controller 10 communicates with the neuron calculators: a first reserved section is set aside in the address space of the master controller 10 for communication with the neuron calculators. The first reserved section stores the identification code of the target computing unit. Upon receiving the identification code of the target computing unit, the bus controller identifies the neuron calculator targeted by the current communication and connects that neuron calculator for subsequent operations such as instruction execution, data reception, and status reading.
The master controller 10 performs mass-send processing, wherein a second reserved section is set aside in the address space of the master controller 10 for mass-sending instructions and/or information to the neuron calculators. When mass-sending, the address stored in the second reserved section contains the identification codes of the start computing unit and the end computing unit of the target rectangular region (the start computing unit and the end computing unit lie on a diagonal of the target rectangle), for sending the instructions and/or information.
The master controller 10 also performs mass-send processing wherein a third reserved section in the address space of the master controller 10 is used for mass-sending data. When mass-sending data, the address stored in the third reserved section contains the identification codes of the start computing unit and the end computing unit of the target rectangular region (again on a diagonal), and the mass-sent data contains the number of data transmissions. In each of these data transmissions, the address contained in the transmission is the relative address at which the computing units in the target rectangular region are to receive and store the data.
The above protocol is implemented, for example, by the bus controller, which is responsible for decoding the address on the bus and accordingly performing the data exchange with the neuron calculator and the memory unit inside the cell.
The broadcast capability of the cellular array bus offers great help to fully-connected neural networks, greatly improving the transmission speed of massive data. The concrete operation method is as follows:
A layer of the fully-connected neural network is deployed in a rectangular region (for image processing applications this is quite natural), with each cell performing the function of one or more neurons. The weight of each input of each neuron is stored in the memory unit of its cell.
After a neuron calculator completes a computation, the master controller 10 reads the result from its output port; the data is broadcast into the region where the next layer of the network resides and delivered to the neuron calculator of each computing unit.
Alternatively, after the one or more neuron calculators in a computing unit complete their computation, the results are stored at a preset relative address in the memory unit and read out by the master controller 10, then broadcast into the region where the next layer of neurons resides and stored at a preset relative address in each cell.
<Cellular array internal network communication method>
Fig. 5 is the exemplary schematic diagram of the Path selection of the network communication in array according to the preferred embodiment of the invention.
● Every message between computing units (cells) contains the identification codes of the start computing unit and the end computing unit.
● A message travels from the start computing unit to the end computing unit over the connections between neighboring computing units, through multiple relays.
● A network controller is provided in each computing unit to relay messages rapidly without interfering with other functions.
● Along with the identification code of the end computing unit, the message also indicates an address within the end computing unit, or its neuron calculator.
■ In the mode indicating an address in the end computing unit, the message is written by the network controller of that computing unit directly to the appropriate address in its memory unit.
■ In the mode indicating the neuron calculator of the end computing unit, the message is handed to the cell's neuron calculator for processing.
● In every computing unit that sends or relays a message, the network controller must select one neighboring computing unit as the next stop.
■ When the start computing unit and the end computing unit are on one line, there is only one reasonable choice.
■ Otherwise there are two equally reasonable choices, and the network controller can select the neighboring computing unit whose traffic is relatively light.
There is a simple method for sending a mass message from one computing unit to a rectangular region: the master controller 10 reads the message and mass-sends it again. Another mode is provided here: the point-to-point communication function between computing units is extended to region mass-sending. This mode supports a higher degree of parallelism and a much higher total bandwidth, and is well suited to convolutional neural networks. For mass-sending between computing units, the original sender is responsible for indicating the target region, and delivery is still completed by serial relay.
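The next-stop selection rule above — exactly one reasonable choice when the two cells share a row or column, otherwise two equally reasonable choices with the less busy neighbor preferred — can be sketched as (the congestion map is our modelling assumption):

```python
def next_hops(cur, dst):
    """Candidate next stops toward dst: step along x and/or y.
    One candidate when cur and dst share a line, otherwise two."""
    x, y = cur
    dx, dy = dst
    hops = []
    if dx != x:
        hops.append((x + (1 if dx > x else -1), y))
    if dy != y:
        hops.append((x, y + (1 if dy > y else -1)))
    return hops

def choose_hop(cur, dst, congestion):
    """Pick the candidate with the lightest traffic; None on arrival."""
    cands = next_hops(cur, dst)
    return min(cands, key=lambda c: congestion.get(c, 0)) if cands else None
```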
● If the sending computing unit and the relaying computing units are within the target region:
■ On a corner of the rectangular region (as shown in Fig. 6):
◆ If the width of the region is 1, only one neighboring computing unit can be selected as the next-stop relay. The network controller of the computing unit receives the data of the message (if the computing unit is not the message sender), forwards the message to this neighboring computing unit, then updates the target region (the length decreases by 1).
● If it is the last computing unit remaining, relaying stops.
◆ If both the length and the width of the region are greater than 1, two neighboring computing units can be selected as next-stop relays. The network controller of the computing unit receives the data of the message (if the computing unit is not the message sender) and forwards the message to each of these two neighboring computing units, then updates the target regions, one of which is a rectangular region of width 1.
■ On an edge of the rectangular region (as shown in Fig. 7):
◆ If the width of the region is 1, two neighboring computing units can be selected as next-stop relays. The network controller of the computing unit receives the data of the message (if the cell is not the message sender) and forwards the message to each of the two neighboring computing units, then updates the target regions.
◆ If the width of the region is greater than 1, three neighboring computing units can be selected as next-stop relays. The network controller of the computing unit receives the data of the message (if the cell is not the message sender) and forwards the message to each of these three neighboring computing units, then updates the target regions, two of which are rectangular regions of width 1.
■ In the interior of the rectangular region, four neighboring computing units can be selected as next-stop relays (in this case the computing unit can only be the message sender). The message is forwarded to each of the four neighboring computing units, then the target regions are updated, two of which are rectangular regions of width 1.
● If the sending computing unit and the relaying computing units are outside the target region (as shown in Figure 8 and Figure 9). A neural network often needs to move a large block of data over a long distance, and in that situation the communication network congests easily. A protocol is adopted here which, it has been found, avoids congestion in a convolutional network:
■ The transmission direction is indicated when the message is sent.
■ Every relay hop forwards the message along the transmission direction; once the coordinate along the forward direction enters the coordinate range of the target region, width-1 lateral mass-sends are performed step by step.
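The two relay rules above can be sketched in software. The following is a minimal, hypothetical model (the coordinate convention, region tuple layout, and function names are our own, not the patent's): a unit inside the target rectangle floods its own row as width-1 strips and hands the remaining rows off vertically, while a sender outside the region first travels along the declared direction and then floods row by row.

```python
def forward(unit, region):
    """Sub-messages relayed by a unit inside `region` (x0, y0, x1, y1):
    width-1 strips along its own row, whole sub-rectangles vertically.
    This is one consistent reading of the Figure 6/7 rules, not the
    patent's exact encoding."""
    x0, y0, x1, y1 = region
    x, y = unit
    hops = []
    if x - 1 >= x0:                       # width-1 strip to the left
        hops.append(((x - 1, y), (x0, y, x - 1, y)))
    if x + 1 <= x1:                       # width-1 strip to the right
        hops.append(((x + 1, y), (x + 1, y, x1, y)))
    if y - 1 >= y0:                       # remaining rows above, full width
        hops.append(((x, y - 1), (x0, y0, x1, y - 1)))
    if y + 1 <= y1:                       # remaining rows below, full width
        hops.append(((x, y + 1), (x0, y + 1, x1, y1)))
    return hops

def broadcast(origin, region):
    """Deliver a message to every unit of `region`, starting at `origin`
    (assumed inside the region). Each unit applies `forward` once."""
    delivered, stack = [], [(origin, region)]
    while stack:
        unit, reg = stack.pop()
        delivered.append(unit)
        stack.extend(forward(unit, reg))
    return delivered

def route_and_flood(src, region, dy):
    """Sender outside the region: advance in the fixed direction dy
    (+1 or -1) until the row coordinate enters the region's range, then
    flood each row as a width-1 lateral mass-send. For simplicity the
    source column is assumed to lie within the region's column span."""
    x0, y0, x1, y1 = region
    x, y = src
    while not (y0 <= y <= y1):
        y += dy                           # forward hops only, no turns
    rows = range(y, y1 + 1) if dy > 0 else range(y, y0 - 1, -1)
    return {(c, r) for r in rows for c in range(x0, x1 + 1)}
```

In this model every unit receives the message exactly once and all turns happen on the column of entry, so no two copies of the same message compete for a link — the congestion-avoidance property described above.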
The specific implementation method (as shown in Figure 10):
1. The communication channel between every two neighboring computing units consists of a pair of FIFOs (first-in, first-out buffers): one FIFO written by one computing unit and read by the other, and a second FIFO running in the opposite direction. From the standpoint of either computing unit, they can be called its input FIFO and output FIFO.
2. The network controller is connected to all the FIFOs (at most 4 pairs) in its computing unit. The network controller is also connected to the MPU in the computing unit, to which it issues interrupts (such as FIFO empty, FIFO full, new message arrived, message sent out, and so on).
3. The network controller sends, receives, and relays messages.
4. Whenever a message enters on some input FIFO, the network controller first examines it:
● If the terminal is this computing unit, then:
■ If the terminal is a relative address, then, since the network controller has DMA capability, the message is stored directly at the appropriate address of the memory unit and the neuron calculator is notified.
■ If the terminal is the neuron calculator, it is notified directly so that it can process the message.
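A software model of steps 1-4 may help. This is a hypothetical rendering (the message format, field names, and dict-as-memory are our assumptions): the controller drains its input FIFOs and, when this unit is the terminal, either DMA-stores the payload or notifies the neuron calculator.

```python
from collections import deque

class NetworkController:
    """Hypothetical model of the per-unit network controller: up to
    four input FIFOs, DMA access to the local memory unit, and an
    inbox standing in for interrupts to the neuron calculator/MPU."""

    def __init__(self, unit_id, memory, neuron_inbox):
        self.unit_id = unit_id
        self.memory = memory                  # local memory unit (MRAM)
        self.neuron_inbox = neuron_inbox      # notifications to the calculator
        self.input_fifos = [deque() for _ in range(4)]

    def poll(self):
        """Step 4: examine every message entering on an input FIFO."""
        for fifo in self.input_fifos:
            while fifo:
                msg = fifo.popleft()
                if msg["dest"] == self.unit_id:
                    if "addr" in msg:
                        # terminal is a relative address: DMA the data in,
                        # then notify the neuron calculator
                        self.memory[msg["addr"]] = msg["data"]
                        self.neuron_inbox.append(("stored", msg["addr"]))
                    else:
                        # terminal is the neuron calculator itself
                        self.neuron_inbox.append(("data", msg["data"]))
                # relaying toward other units is omitted from this sketch
```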
The mass-send function of the cellular array network provides great help to neural networks, and to convolutional neural networks in particular, by greatly increasing the transmission bandwidth for bulk data. The concrete method of operation is as follows:
1. Deploy each layer of the neural network in a rectangular region (for image-processing applications this is the most natural arrangement), with each computing unit performing the function of one or more neurons. The weight of every input of each neuron is stored in the memory unit/MRAM of that computing unit.
2. Adjacent layers are deployed in adjacent regions.
After all neurons of a layer have completed their computation, all computing units of the region perform a synchronized mass-send whose transmission direction points toward the region hosting the next layer, as shown in Figure 11.
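The deployment in steps 1-2 amounts to a mapping from neurons to computing units. A minimal illustrative sketch (the round-robin assignment and region tuple are our assumptions, not the patent's):

```python
def deploy_layer(n_neurons, region):
    """Assign the neurons of one layer round-robin to the computing
    units of a rectangular region (x0, y0, x1, y1); each unit may host
    one or more neurons, as in step 1 above."""
    x0, y0, x1, y1 = region
    units = [(x, y) for y in range(y0, y1 + 1) for x in range(x0, x1 + 1)]
    return {n: units[n % len(units)] for n in range(n_neurons)}
```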
The advantages of using the present invention to realize neural-network computation are obvious:
1. The parallel computation of a large number of computing units greatly speeds up arithmetic, so that the speed of learning and training improves enormously.
2. The huge bandwidth of the cellular array internal network communication and its mass-send mechanism likewise contribute remarkably to the speed improvement.
3. The non-volatility of MRAM allows a successfully trained chip to be replicated directly and sold as a product that solves a particular problem.
<Neuron calculator operation method>
The operation of the neuron calculator is comparatively simple: the output of every upper-layer neuron connected to it is multiplied by the corresponding weight, and the products are accumulated. Usually the final result also undergoes a simple mapping, for example onto a number between 0 and 1. However, each neuron has at least tens, and at most thousands, of input neurons, and in a networked environment it is difficult to guarantee that these inputs arrive in a fixed order. A fast and efficient method is needed that, for each input, performs the lookup and comparison and also determines whether all inputs have arrived.
The present invention proposes a method to solve this problem, as shown in Figure 12:
1. The input information of every neuron calculator in the network includes the identification code of the upstream neuron together with its output data.
2. Each neuron calculator contains a section of content-addressable memory (CAM, built from SRAM) that stores the identification codes of all upstream neurons. This memory unit can compare an input against all identification codes at once and produces an output when an identification code matches.
3. For each input, the identification code carried by the input is compared against the contents of the above memory; according to the identification code matched in the comparison, the weight corresponding to that input is found, each weight is multiplied by its input to obtain a product, and all products are accumulated to obtain an accumulation signal. For example, one way to implement the address lookup is to store each weight at the address corresponding to its identification code.
4. Each time an identification code stored in the above memory matches some input, a match signal is output, and a completion signal is generated from all the match signals. There are two implementation methods:
a. Each match outputs a high potential. All the match signals are connected to a NAND gate; when the NAND gate outputs a low potential, all identification codes have matched and completion is indicated.
b. A counter is used: it is initialized so that its count equals the number of input neurons, each match decrements the counter by one, and when the counter reaches zero the completion signal is output.
5. When the completion signal occurs, the neuron calculator maps the accumulation signal and outputs the result.
In this way, operation speed is improved without having to guarantee that the inputs arrive in a fixed order.
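Steps 1-5 can be modeled compactly in software: a dictionary plays the role of the CAM (each weight stored at the address of its identification code), and a countdown counter implements completion method (b). Class and parameter names are illustrative, and the clamping function is only an example of the "simple mapping".

```python
class NeuronCalculator:
    """Order-independent accumulation: inputs tagged with an upstream
    identification code may arrive in any order."""

    def __init__(self, weights, mapping=lambda s: min(max(s, 0.0), 1.0)):
        self.cam = dict(weights)        # id code -> weight ("CAM" lookup)
        self.remaining = len(weights)   # counter preset to the fan-in
        self.acc = 0.0
        self.mapping = mapping          # e.g. squash into [0, 1]

    def receive(self, id_code, value):
        """One upstream output arrives: CAM match -> weight, then
        multiply-accumulate; emit the mapped result on completion."""
        self.acc += self.cam[id_code] * value
        self.remaining -= 1             # one match signal consumed
        if self.remaining == 0:         # counter cleared: completion signal
            return self.mapping(self.acc)
        return None                     # still waiting for more inputs
```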
<Cellular array three-dimensional communication transmission method>
A three-dimensional cellular array neural network chip can provide still higher performance. The cellular array three-dimensional communication transmission method according to the preferred embodiment of the invention may include:
stacking multiple layers of cellular array neural network chips (that is, cellular array computing systems) into a three-dimensional chip by means of vertical through-silicon vias (TSVs);
connecting the buses of every layer's cellular array together through the through-silicon vias;
connecting the computing units of two neighboring cellular array neural network chips into a network through the through-silicon vias, so that the communication network between computing units is extended into a three-dimensional communication network.
Adjacent layers of a multilayer neural network are deployed in neighboring cellular array neural network chip layers of the three-dimensional chip. When data must be transmitted to the next network layer over the communication network, a vertical data transfer is first performed through a through-silicon via, and the communication network of each layer's cellular array neural network chip is then used for the horizontal data transfer.
This pattern of the preferred embodiment of the present invention is especially efficient for convolutional neural networks: transfer work that would otherwise need thousands of cycles can be completed in a few network transmission cycles.
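The vertical-first pattern can be illustrated as a path computation in (layer, x, y) coordinates. A hedged sketch, with our own coordinate convention (one TSV hop per adjacent chip layer, then in-plane hops on the destination layer):

```python
def route_3d(src, dst):
    """Route from src to dst in a TSV-stacked cellular array: all
    vertical (TSV) hops first, then the horizontal hops on the
    destination chip layer's own communication network."""
    (sz, sx, sy), (dz, dx, dy) = src, dst
    path = [src]
    z, x, y = sz, sx, sy
    while z != dz:                      # vertical transfer through TSVs
        z += 1 if dz > z else -1
        path.append((z, x, y))
    while x != dx:                      # horizontal, in-plane hops
        x += 1 if dx > x else -1
        path.append((z, x, y))
    while y != dy:
        y += 1 if dy > y else -1
        path.append((z, x, y))
    return path
```

For adjacent network layers deployed on adjacent chip layers, the vertical part is a single hop, which is why this pattern suits convolutional networks.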
<Convolutional neural networks implementation method>
The cellular array architecture is very flexible, and another convolutional neural network implementation method is provided here:
1. Deploy one layer of the convolutional neural network in a rectangular region of the cellular array according to the internal positional relationships of that layer.
2. Using the cellular array bus broadcast method, send the weights of each neuron one by one to the computing units in the rectangular region, where they are stored in the memory of each computing unit. (The weights of a convolutional neural network are translation-invariant: each neuron's weights correspond to inputs from different nearby cells, but all neurons use the same group of weights, which makes broadcast distribution very suitable.)
3. For this layer of the convolutional neural network, receive the inputs and perform the neuron computation (which consists of multiplying each input by its corresponding weight and then summing all the products); the output of each neuron is retained temporarily in the neuron calculator.
4. Using the cellular array bus broadcast method, send the weights of the next layer of the convolutional neural network (the layer following the one above) to the computing units of the rectangular region.
5. Using the cellular array mesh communication network in the cellular array, transfer the outputs of the upper-layer neurons (the neurons of the layer above) to the computing units within the predetermined nearby region that need those outputs. (In a convolutional neural network, each neuron's output goes only to neurons in a small nearby region, which makes diffusion through the cellular array mesh communication network very suitable; only a few clock cycles are needed to complete all the data exchanges.)
6. Using the new weights (the weights of the next convolutional layer) and the outputs of the upper-layer neurons, perform the computation of the next layer of the convolutional neural network.
7. If several more network layers or convolutional layers follow, they can still be processed layer by layer in the same way.
Here, the convolutional neural network is deployed in a rectangular region of the cellular array according to the internal positional relationships of each of its layers; every weight of every layer is sent one by one, by the cellular array bus broadcast method, to each cell in the rectangular region and stored in the memory of the computing unit; for the first convolutional layer, the inputs are received and the neuron computation is performed, the output of each neuron being retained temporarily in the computing unit; the outputs of each layer's neurons are transferred, through the cellular array mesh communication network in the cellular array, to the computing units in the predetermined nearby region that need those outputs; and the computation of the next layer of the convolutional neural network proceeds with the next layer's weights and the upper layer's outputs, until the last layer is completed. Here the present invention makes an improvement: the weights of every layer are broadcast to each cell in a single pass, so the computation of each subsequent layer need not reload weights, and processing is faster.
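The layer-reuse scheme of steps 1-7 can be mirrored in a few lines of plain Python. This is a functional sketch only (zero padding, a ReLU-style mapping, and single-channel kernels are our assumptions); on the chip, each cell would perform the inner multiply-accumulate for its own position using weights already broadcast to its memory.

```python
def conv_layer(act, kernel):
    """One convolutional layer over a 2-D activation grid, zero-padded
    so every cell keeps its position in the rectangular region."""
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    h, w = len(act), len(act[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):             # each cell: multiply-accumulate
            s = 0.0
            for di in range(kh):
                for dj in range(kw):
                    y, x = i + di - ph, j + dj - pw
                    if 0 <= y < h and 0 <= x < w:
                        s += kernel[di][dj] * act[y][x]
            out[i][j] = max(0.0, s)    # simple mapping after the sum
    return out

def run_layers(image, kernels):
    """Steps 1-7 above: every layer's kernel is assumed broadcast once,
    then the same region of cells computes layer after layer."""
    act = image
    for k in kernels:
        act = conv_layer(act, k)
    return act
```

Note how the same `act` grid (the same rectangular region of cells) is reused for every layer, so the hardware occupied does not grow with network depth.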
The advantages of this method are:
1. If the input image is large, a convolutional network of the usual ten-odd layers would occupy a large amount of hardware and memory resources. This method performs the computation of different layers on the same hardware, dramatically saving hardware resources.
2. With the cellular array mesh communication network, data exchange between different layers is very efficient.
The preferred embodiments of the present invention have been shown and described above. As stated earlier, it should be understood that the invention is not limited to the forms disclosed herein, which are not to be taken as excluding other embodiments; the invention can be used in various other combinations, modifications, and environments, and can be altered within the scope of the inventive concept set forth herein in accordance with the above teachings or the skill or knowledge of the related art. All changes and modifications made by those skilled in the art that do not depart from the spirit and scope of the invention shall fall within the protection scope of the appended claims.
Claims (9)
1. A convolutional neural network implementation method using a cellular array computing system, wherein the cellular array computing system comprises a master controller and a bus, and multiple computing units form a cellular array; wherein each computing unit of the cellular array comprises one or more neuron calculators or microprocessors that perform the computing operations of neurons, and a memory unit;
characterized in that the convolutional neural network implementation method comprises: deploying a convolutional neural network in a rectangular region of the cellular array according to the internal positional relationships of each layer of the convolutional neural network; sending each weight of each layer one by one, by the cellular array bus broadcast method, to each cell in the rectangular region, to be stored in the memory of the computing unit; for the first convolutional layer, receiving inputs and performing the neuron computation, the output of each neuron being retained temporarily in the computing unit; transferring the outputs of each layer's neurons, through the cellular array mesh communication network in the cellular array, to the computing units in the predetermined nearby region that need those outputs; and performing the computation of the next layer of the convolutional neural network using the next layer's weights and the upper layer's outputs, until the last layer is completed.
2. The convolutional neural network implementation method of claim 1, characterized in that the cellular array mesh communication network enables each computing unit of the cellular array to communicate with its neighboring computing units, wherein each computing unit transmits data through the cellular array mesh communication network to a particular address in the memory unit of a neighboring computing unit or to the neuron calculator of the neighboring computing unit.
3. The convolutional neural network implementation method of claim 2, characterized in that, in the cellular array mesh communication network, the communication channel between every two neighboring computing units consists of a pair of FIFO units, the pair of FIFO units comprising a first FIFO unit and a second FIFO unit; wherein the first FIFO unit, into which one of the two neighboring computing units writes information, and the second FIFO unit, from which the other of the two neighboring computing units outputs information, operate in opposite directions.
4. The convolutional neural network implementation method of claim 3, characterized in that, in the cellular array mesh communication network, each computing unit is provided with a network controller capable of directly reading and writing the memory, and the network controller in a computing unit is connected with all the FIFO units in that computing unit.
5. The convolutional neural network implementation method of claim 4, characterized in that, when one of two neighboring computing units writes information into the first FIFO unit, the network controller judges whether the terminal of the written information is this computing unit, and when the terminal of the written information is this computing unit, performs the following operations: when the terminal is a relative address, the network controller stores the information directly at the appropriate address of the memory unit and notifies the neuron calculator; when the terminal is a neuron calculator, it directly notifies that neuron calculator to process the information.
6. The convolutional neural network implementation method of any one of claims 1 to 5, characterized in that the cellular array bus broadcast method comprises: the master controller mass-sending, through the bus, instructions and/or information to the neuron calculators in the computing units of a rectangular region, and/or mass-sending data to the same relative address in the memory units of the computing units of a rectangular region.
7. The convolutional neural network implementation method of claim 6, characterized in that each computing unit of the cellular array further comprises a bus controller and an internal bus; the bus controller of each computing unit is connected with the bus; the memory unit of each computing unit is a slave device of the respective internal bus; and the bus controller and the neuron calculator of each computing unit are master devices of the respective internal bus, the bus controller having the higher priority.
8. The convolutional neural network implementation method of claim 6, characterized in that the master controller reads and writes the memory units, wherein the master controller first broadcasts a destination address on the bus of the cellular array and then sends, or prepares to read, data; a bus controller receives the destination address and, if the destination address lies within its computing unit, connects the memory unit of that computing unit to perform the read or write operation.
9. The convolutional neural network implementation method of claim 6, characterized in that the master controller communicates with the neuron calculators, wherein a first reserved section is set aside in the address space of the master controller for communication with the neuron calculators; the first reserved section is used to store the identification code of a target computing unit; and the bus controller, upon receiving the identification code of the target computing unit, identifies the neuron calculator targeted by the current communication and connects that targeted neuron calculator to perform the subsequent operations.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611236090.XA CN108256640A (en) | 2016-12-28 | 2016-12-28 | Convolutional neural networks implementation method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108256640A true CN108256640A (en) | 2018-07-06 |
Family
ID=62720233
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611236090.XA Pending CN108256640A (en) | 2016-12-28 | 2016-12-28 | Convolutional neural networks implementation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108256640A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103074210A (en) * | 2013-01-25 | 2013-05-01 | 深圳先进技术研究院 | Monitoring system for in-vitro cell photostimulation array device |
CN105718996A (en) * | 2015-07-29 | 2016-06-29 | 上海磁宇信息科技有限公司 | Cell array calculation system and communication method wherein |
CN105718379A (en) * | 2015-07-29 | 2016-06-29 | 上海磁宇信息科技有限公司 | Cell array calculation system and group-sending communication method among cells thereof |
CN105740946A (en) * | 2015-07-29 | 2016-07-06 | 上海磁宇信息科技有限公司 | Method for realizing neural network calculation by using cell array computing system |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110197114A (en) * | 2019-04-04 | 2019-09-03 | 华中科技大学 | A kind of automatic identifying method and device of full brain range single neuron aixs cylinder synaptic knob |
CN110197114B (en) * | 2019-04-04 | 2021-06-29 | 华中科技大学 | Automatic identification method and device for single neuron axon synaptic junction in whole brain range |
CN111753994A (en) * | 2020-06-22 | 2020-10-09 | 深圳鲲云信息科技有限公司 | Data processing method and device of AI chip and computer equipment |
WO2021259231A1 (en) * | 2020-06-22 | 2021-12-30 | 深圳鲲云信息科技有限公司 | Artificial intelligence chip and artificial intelligence chip-based data processing method |
CN111753994B (en) * | 2020-06-22 | 2023-11-03 | 深圳鲲云信息科技有限公司 | Data processing method and device of AI chip and computer equipment |
CN113138957A (en) * | 2021-03-29 | 2021-07-20 | 北京智芯微电子科技有限公司 | Chip for neural network inference and method for accelerating neural network inference |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106022468B (en) | the design method of artificial neural network processor integrated circuit and the integrated circuit | |
CN108256640A (en) | Convolutional neural networks implementation method | |
JP6714690B2 (en) | Information processing system, method of operating information processing system, and machine learning computing unit | |
CN105740946B (en) | A kind of method that application cell array computation system realizes neural computing | |
US11423296B2 (en) | Device and method for distributing convolutional data of a convolutional neural network | |
CN105718996B (en) | Cellular array computing system and communication means therein | |
CN107918794A (en) | Neural network processor based on computing array | |
CN110390388A (en) | Neuromorphic circuit with 3D stacked structure and the semiconductor device including it | |
KR20200037748A (en) | Chip device and related product | |
CN105469143B (en) | Network-on-chip method for mapping resource based on neural network dynamic feature | |
CN105719000A (en) | Neuron hardware structure and method of simulating pulse neural network by adopting neuron hardware structure | |
CN112464784A (en) | Distributed training method based on hybrid parallel | |
CN108256637A (en) | A kind of cellular array three-dimensional communication transmission method | |
CN110348574A (en) | A kind of general convolutional neural networks accelerating structure and design method based on ZYNQ | |
CN113240127B (en) | Training method and device based on federal learning, electronic equipment and storage medium | |
CN108647776A (en) | A kind of convolutional neural networks convolution expansion process circuit and method | |
CN108090496A (en) | The method and apparatus of image procossing based on convolutional neural networks | |
CN107766932A (en) | Image processing method and device based on neural network | |
CN109670581A (en) | A kind of computing device and board | |
US9196077B1 (en) | Efficient inter-processor communication in ray tracing | |
CN108256641A (en) | For the cellular array internal network communication method of cellular array computing system | |
US9558530B2 (en) | Method and apparatus for an inter-cell shortest communication | |
CN109740619A (en) | Neural network terminal operating method and device for target identification | |
CN108256622A (en) | A kind of cellular array computing system | |
CN108256639A (en) | For the cellular array mesh communication network of cellular array computing system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20180706 |