CN108256640A - Convolutional neural networks implementation method - Google Patents

Convolutional neural networks implementation method

Info

Publication number
CN108256640A
CN108256640A (application CN201611236090.XA)
Authority
CN
China
Prior art keywords
computing unit
neuron
cellular array
convolutional neural networks
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611236090.XA
Other languages
Chinese (zh)
Inventor
郭民
郭一民
戴瑾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Ciyu Information Technologies Co Ltd
Original Assignee
Shanghai Ciyu Information Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Ciyu Information Technologies Co Ltd filed Critical Shanghai Ciyu Information Technologies Co Ltd
Priority to CN201611236090.XA priority Critical patent/CN108256640A/en
Publication of CN108256640A publication Critical patent/CN108256640A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/40 Bus structure
    • G06F13/4063 Device-to-bus coupling
    • G06F13/4068 Electrical coupling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G06F15/173 Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17306 Intercommunication techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G06F15/173 Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17356 Indirect interconnection networks
    • G06F15/17362 Indirect interconnection networks hierarchical topologies
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/3001 Arithmetic instructions

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Neurology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a convolutional neural network implementation method using a cellular array computing system. The convolutional neural network is deployed, according to the internal positional relationships of each layer, onto a rectangular region of the cellular array. The weights of every layer are sent, one by one, to each cell in the rectangular region by the cellular array bus broadcast method and stored in the memory of that computing unit. For the first convolutional layer, the input is received, the neuron computations are performed, and the output of each neuron is temporarily kept in its computing unit. Using the cellular array mesh communication network in the cellular array, the outputs of the neurons of the current layer are transferred to the computing units in a predetermined nearby region that need those outputs. The next convolutional layer is then computed from its weights and the outputs of the previous layer, and this continues until the last layer is completed.

Description

Convolutional neural networks implementation method
Technical field
The present invention relates to the fields of semiconductor chips and artificial intelligence, and more particularly to a convolutional neural network implementation method using a cellular array computing system.
Background technology
The human brain is a network formed by a vast number of neurons connected in complex ways. Each neuron is connected, through a large number of dendrites, to a large number of other neurons from which it receives information; each connection point is a synapse. When accumulated external stimulation exceeds a certain level, the neuron generates a stimulus signal and sends it out along its axon. The axon has a large number of terminals which, through synapses, connect to the dendrites of many other neurons. It is such a network of functionally simple neurons that realizes all human intelligent activity, and human memory and intelligence are generally believed to be stored in the different coupling strengths of the individual synapses.
A neuron fires at no more than about 100 Hz, while the CPU of a modern computer is roughly ten million times faster than the human brain, yet its ability to handle many difficult problems is still inferior to the brain. This has prompted the computer industry to imitate the brain, and the earliest imitation was at the software level.
A neural network is a common model in machine learning. A neuron in a neural network algorithm is simply a function: it has many inputs, each input has a corresponding weight, and the usual algorithm multiplies each input by its weight and sums the products. The output is either 0 or 1 (decided by a threshold) or a value between 0 and 1. A typical neural network links the outputs of a large number of neurons to the inputs of others and is usually organized into a multi-level architecture. It contains a great many parameters (weights and thresholds), and the process of learning and training is exactly the adjustment of these parameters, which is a function optimization problem requiring massive computation. Algorithms of this kind have achieved rich results and are widely used.
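For illustration, the following is a minimal sketch of the neuron computation described above: a weighted sum with either a threshold output or a value between 0 and 1. The names and numbers are chosen here for illustration and are not part of the patent.

```python
import math

def neuron_output(inputs, weights, threshold=None):
    """Weighted sum of the inputs; binary output if a threshold is given,
    otherwise a sigmoid value between 0 and 1."""
    s = sum(x * w for x, w in zip(inputs, weights))
    if threshold is not None:
        return 1 if s >= threshold else 0          # 0 or 1, decided by a threshold
    return 1.0 / (1.0 + math.exp(-s))              # a value between 0 and 1

# Example: three inputs, each with its own weight.
print(neuron_output([0.5, 0.2, 0.9], [0.4, -0.3, 0.8], threshold=0.5))
```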
The networks in neural network algorithms are all organized into multiple layers. In the earliest networks, every neuron of one layer was connected to every neuron of the next layer, forming a fully connected network. One problem with fully connected networks is that in applications such as image processing an image has many pixels, and the number of weights needed per layer is proportional to the square of the pixel count, so the program occupies too much memory and the amount of computation becomes unmanageable.
In a convolutional neural network the front layers are no longer fully connected. The neurons of each layer are arranged in an array, like an image, and each neuron of the next layer is connected only to a small region of the current layer. This small region is usually a square with side length k, where k is called the kernel size of the convolutional network, as shown in Figure 1.
A convolutional neural network (CNN) is so named because the weighted sum over each point of this small region resembles a convolution. Every neuron within the same layer uses the same set of weights for the corresponding points of its region (this is translation invariance), so the number of weights is greatly reduced compared with a fully connected network, which makes high-resolution image processing feasible. A convolutional neural network consists of multiple layers connected in this way, together with layers of other kinds. A short sketch of this local, weight-shared computation is given below.
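The sketch below, not part of the patent and with illustrative sizes, shows how each output neuron of a convolutional layer sums a k x k neighbourhood of the previous layer using a single shared weight kernel:

```python
import numpy as np

def conv_layer(prev_layer, kernel):
    """Each output neuron sums a k x k region of the previous layer,
    with every neuron sharing the same kernel weights (translation invariance)."""
    k = kernel.shape[0]
    h, w = prev_layer.shape
    out = np.zeros((h - k + 1, w - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            region = prev_layer[i:i + k, j:j + k]   # the small k x k region
            out[i, j] = np.sum(region * kernel)     # weighted sum, same weights everywhere
    return out

layer0 = np.random.rand(8, 8)     # previous layer arranged as an image-like array
kernel = np.random.rand(3, 3)     # kernel size k = 3
layer1 = conv_layer(layer0, kernel)
print(layer1.shape)               # (6, 6)
```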
With the spread of deep-learning applications, people have begun to develop dedicated neural network chips. Implementing the additions and multiplications of the neuron computation with special-purpose circuits is far more efficient than using a CPU or GPU.
Magnetoresistive random access memory (MRAM) is a new memory and storage technology. It can be read and written randomly and quickly like SRAM/DRAM, and it is faster than DRAM; like flash memory it retains data permanently after power-off, yet unlike NAND it can be rewritten an unlimited number of times.
The economics of MRAM are also considered good: the silicon area occupied per unit capacity has a large advantage over SRAM (which is usually used as the CPU cache) and is expected to approach the level of DRAM. Its performance is likewise very good, with read/write latency close to the best SRAM, and its power consumption is the best among the various memory and storage technologies. Moreover, unlike DRAM and flash memory, MRAM is compatible with the standard CMOS semiconductor process, so MRAM can be integrated with logic circuits on a single chip. With MRAM technology it becomes possible to integrate the three functions of memory, storage, and computation on one chip, and new computing architectures become possible.
The human brain is characterized by massive parallel computation: not only can a large number of neurons work at the same time, but each neuron is connected to thousands of other neurons. With modern integrated-circuit technology it is easy to integrate a large number of neurons on a single die, but providing internal communication bandwidth comparable to the brain is extremely difficult. For example, if the input data of one layer of neurons is stored in a single block of RAM, at least k clock cycles are needed to read the data out, because memory cells in different rows cannot be read or written at the same time. The speed of reading data, that is, the memory bandwidth, therefore becomes the bottleneck of the computation.
Summary of the invention
In view of the above drawbacks of the prior art, the present invention proposes a cellular-array-based architecture matched to the structure of neural networks: it consists of numerous neuron calculators that combine a storage function with dense network connectivity. This new architecture will find wide application in fields such as large-scale computing, big data processing and artificial intelligence.
To achieve the above object, the present invention provides a convolutional neural network implementation method in which the cellular array computing system comprises a master controller, a bus, and a cellular array formed by multiple computing units, and each computing unit of the cellular array comprises one or more neuron calculators or microprocessors for performing the neuron computations and a memory unit. The convolutional neural network implementation method comprises: deploying the convolutional neural network, according to the internal positional relationships of each layer, onto a rectangular region of the cellular array; sending the weights of every layer, one by one, to each cell in the rectangular region by the cellular array bus broadcast method and storing them in the memory of that computing unit; for the first convolutional layer, receiving the input, performing the neuron computations, and temporarily keeping the output of each neuron in its computing unit; using the cellular array mesh communication network in the cellular array to transfer the outputs of the neurons of the current layer to the computing units in a predetermined nearby region that need those outputs; and using the weights of the next layer together with the outputs of the previous layer to compute the next convolutional layer, until the last layer is completed.
For a convolutional neural network the number of weights per neuron is small: each neuron is connected only to the neurons of a small nearby region of the previous layer. A convolutional neural network usually has more than ten convolutional layers, and a cell usually has enough memory to store all of the weights. Moreover, the weights of a convolutional network are translation invariant, which makes them particularly suitable for distribution by the broadcast mechanism.
Preferably, the cellular array mesh communication network enables each computing unit of the cellular array to communicate with its neighboring computing units, and each computing unit transmits data through the cellular array mesh communication network either to a particular address in the memory unit of a neighboring computing unit or to the neuron calculator of a neighboring computing unit.
Preferably, in the cellular array mesh communication network the communication channel between every two neighboring computing units is formed by a pair of FIFO units, the pair comprising a first FIFO unit and a second FIFO unit, where the first FIFO unit, into which one of the two neighboring computing units writes information, operates in the direction opposite to the second FIFO unit, from which the other of the two neighboring computing units outputs information.
Preferably, in the cellular array mesh communication network each computing unit is provided with a network controller that can directly read and write the memory, and the network controller of a computing unit is connected to all FIFO units in that computing unit.
Preferably, when one computing unit of a neighboring pair writes information into the first FIFO unit, the network controller judges whether the destination of the written information is this computing unit, and when the destination is this computing unit it performs the following operations: when the destination is a relative address, the network controller stores the information directly at the corresponding address of the memory unit and notifies the neuron calculator; when the destination is a neuron calculator, it directly notifies that neuron calculator to process the information.
Preferably, the cellular array bus broadcast method comprises: the master controller, through the bus, broadcasts instructions and/or information to the neuron calculators in the computing units of a rectangular region, and/or broadcasts data into the same relative address of the memory units in the computing units of a rectangular region.
Preferably, each computing unit of the cellular array further comprises a bus controller and an internal bus; the bus controller of each computing unit is connected to the bus; the memory unit of each computing unit is a slave device of the respective internal bus; and the bus controller and the neuron calculators of each computing unit are master devices of the respective internal bus, with the bus controller having the higher priority.
Preferably, the master controller reads and writes memory units as follows: the master controller first broadcasts a target address on the bus of the cellular array and then sends, or prepares to read, data; a bus controller receives the target address and, if the target address lies within its computing unit, connects the memory unit of that computing unit to perform the read or write operation.
Preferably, the master controller communicates with neuron calculators as follows: a first reserved segment is set aside in the address space of the master controller for communication with the neuron calculators; the first reserved segment is used to store the identification code of the target computing unit; on receiving the identification code of the target computing unit, the bus controller identifies the neuron calculator targeted by the current communication and connects that neuron calculator to perform the subsequent operations.
The technical effects of the design, the concrete structure and the results produced by the present invention are further described below with reference to the accompanying drawings, so that the purpose, features and effects of the present invention can be fully understood.
Description of the drawings
The present invention, together with its advantages and features, will be more completely and more easily understood by reference to the accompanying drawings and to the following detailed description, in which:
Fig. 1 shows the architecture of a convolutional neural network.
Fig. 2 is a schematic diagram of the cellular array computing system architecture according to a preferred embodiment of the invention.
Fig. 3 is a schematic diagram of the communication network of the computing units of the cellular array of the cellular array computing system according to a preferred embodiment of the invention.
Fig. 4 is a schematic diagram of an example of a computing unit of the cellular array of the cellular array computing system according to a preferred embodiment of the invention.
Fig. 5 is a schematic diagram of an example of path selection for network communication within the array according to a preferred embodiment of the invention.
Fig. 6 is a schematic diagram of the case where the starting point is on a corner of the rectangular region according to a preferred embodiment of the invention.
Fig. 7 is a schematic diagram of the case where the starting point is on an edge of the rectangular region according to a preferred embodiment of the invention.
Fig. 8 is a schematic diagram of a case where the starting point is outside the rectangular region according to a preferred embodiment of the invention.
Fig. 9 is a schematic diagram of another case where the starting point is outside the rectangular region according to a preferred embodiment of the invention.
Fig. 10 is a schematic diagram of a specific example of the communication scheme according to a preferred embodiment of the invention.
Fig. 11 is a schematic diagram of a specific example of the group-send scheme according to a preferred embodiment of the invention.
Fig. 12 is a schematic diagram of the neuron calculator operation method according to a preferred embodiment of the invention.
It should be noted that the drawings are intended to illustrate, not to limit, the present invention. The drawings representing structures are not necessarily drawn to scale, and in the drawings the same or similar elements are indicated by the same or similar reference numerals.
Specific embodiments
<Cellular array computing system>
Fig. 2 is a schematic diagram of the cellular array computing system architecture according to a preferred embodiment of the invention.
As shown in Fig. 2, the cellular array computing system according to the preferred embodiment of the invention comprises: a master controller 10 (for example, a master CPU), a bus 20, and a cellular array formed by multiple computing units 30.
For example, the master controller 10 may be an on-chip master controller or an off-chip master controller.
Each computing unit 30 of the cellular array comprises: one or more neuron calculators 31 for performing the neuron computations (the computations include additions and multiplications, specifically multiplying each input by its weight and summing over all inputs), and a memory unit 32.
The memory unit 32 may use SRAM or MRAM; since MRAM is non-volatile and has a higher density, the memory unit 32 is preferably MRAM.
The memory unit 32 stores parameters, such as the weight parameters.
The master controller 10 communicates with every computing unit 30 through the bus 20. Specifically, the master controller 10 reads and writes the data in the memory unit 32 of each computing unit 30 through the bus 20, and the master controller 10 communicates with the neuron calculators 31 of each computing unit 30 through the bus 20.
Each computing unit 30 stores its position (x, y) in the cellular array as its identification code, and the software and hardware in the computing unit 30 can read this identification code for use in specific operations.
A feature of this embodiment of the invention is the integration of MRAM with logic circuits, which allows a density 8 to 20 times that of SRAM. The embodiment of the invention proposes that a small block of MRAM together with neuron calculators form one cell, that a large number of such cells form an array, and that this array is connected into a single architecture by a bus. This architecture greatly expands the memory bandwidth and improves the overall performance of the chip.
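Purely for illustration, the architecture described above can be modelled as a grid of computing units, each holding a local memory, one or more neuron calculators, and an (x, y) identification code. All names and the data model in this sketch are assumptions, not part of the patent:

```python
from dataclasses import dataclass, field

@dataclass
class ComputingUnit:
    """One cell: local memory plus a neuron computation, addressed by (x, y)."""
    x: int
    y: int
    memory: dict = field(default_factory=dict)   # relative address -> value (weights, outputs)

    @property
    def identification_code(self):
        return (self.x, self.y)

    def neuron_calc(self, inputs, weight_addr):
        """Multiply each input by its stored weight and sum (the neuron computation)."""
        weights = self.memory[weight_addr]
        return sum(i * w for i, w in zip(inputs, weights))

# The cellular array is simply a grid of such units reachable by the bus and mesh network.
array = [[ComputingUnit(x, y) for x in range(4)] for y in range(4)]
array[0][0].memory["w0"] = [0.1, 0.2, 0.3]
print(array[0][0].neuron_calc([1.0, 1.0, 1.0], "w0"))
```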
<Cellular array mesh communication network>
Fig. 3 is a schematic diagram of the communication network of the computing units of the cellular array of the cellular array computing system according to a preferred embodiment of the invention. As shown in Fig. 3, the cellular array contains a communication network so that every computing unit 30 of the cellular array can communicate with its neighboring computing units 30.
For example, each computing unit 30 can read and write the data in the memory unit 32 of a neighboring computing unit 30 through the bus 20, and each computing unit 30 can communicate with the neuron calculator 31 of a neighboring computing unit 30 through the bus 20.
The output of each neuron calculator 31 is transmitted, through the bus 20 or the communication network, to the inputs of the next-stage neuron calculators.
In this embodiment of the invention, the cellular array architecture solves the bottleneck problems of memory and communication through data group-sending and the internal network. With these bottlenecks removed, the parallel computing capability can be exploited to a greater extent, and a higher computing power is obtained.
<Cellular array bus broadcast method>
The master controller 10 can, through the bus, group-send instructions or information to the neuron calculators in the computing units of a rectangular region, and/or group-send data into the same relative address of the memory units in the computing units of a rectangular region.
A bus with such a broadcast capability can be implemented by the following method.
Fig. 4 is a schematic diagram of an example of a computing unit of the cellular array of the cellular array computing system according to a preferred embodiment of the invention.
As shown in Fig. 4, each computing unit 30 of the cellular array comprises: one or more neuron calculators 31 for performing the neuron computations, a memory unit 32, a bus controller 33 and an internal bus 34.
The bus controller 33 of each computing unit 30 is connected to the bus 20.
The memory unit 32 of each computing unit 30 is a slave device of the respective internal bus 34; the bus controller 33 and the neuron calculators 31 of each computing unit 30 are master devices of the respective internal bus 34, with the bus controller having the higher priority.
The cellular array bus broadcast method may specifically include the following steps.
The master controller 10 reads and writes a memory unit as follows: the master controller 10 first broadcasts a target address on the bus of the cellular array and then sends, or prepares to read, data; a bus controller receives the target address and, if the target address lies within its computing unit, connects the memory unit of that computing unit to perform the read or write operation. If a neuron calculator is reading or writing the memory unit of that computing unit at that moment, the memory unit is connected after the neuron calculator has completed its access, and the neuron calculator accesses the memory unit of the computing unit again afterwards.
The master controller 10 communicates with a neuron calculator as follows: a first reserved segment is set aside in the address space of the master controller 10 for communication with neuron calculators. The first reserved segment carries the identification code of the target computing unit. On receiving the identification code of the target computing unit, the bus controller identifies the neuron calculator targeted by the current communication and connects that neuron calculator for subsequent operations such as instruction execution, data reception and status reading.
The master controller 10 performs group-sending of instructions and/or information as follows: a second reserved segment is set aside in the address space of the master controller 10 for group-sending instructions and/or information to neuron calculators; when group-sending, the address carried in the second reserved segment contains the identification codes of the start computing unit and the end computing unit of the target rectangular region (the start and end computing units lie on a diagonal of the target rectangular region), and the instructions and/or information are then sent.
The master controller 10 performs group-sending of data as follows: a third reserved segment is set aside in the address space of the master controller 10 for group-sending data; when group-sending data, the address carried in the third reserved segment contains the identification codes of the start computing unit and the end computing unit of the target rectangular region (again on a diagonal), and the group-sent data contains the number of data transfers. In each of these data transfers, the address carried in the transfer contains the relative address of the computing unit within the target rectangular region, instructing the computing units to receive the data and store it at that relative address.
For example, the above protocol is implemented by the bus controllers, which are responsible for decoding the addresses on the bus and performing the corresponding data exchange with the neuron calculators and the memory unit inside the cell.
The broadcast capability of the cellular array bus is of great help to fully connected neural networks, since it greatly increases the transfer speed of large amounts of data. The concrete operation method is as follows.
A layer of the fully connected neural network is deployed in a rectangular region (for image-processing applications this is the natural choice), and each cell performs the function of one or more neurons. The weights of all inputs of each neuron are stored in the memory unit of that cell.
After a neuron calculator finishes its computation, the master controller 10 reads the result from its output port, and the data is broadcast into the region where the next layer of the network resides and delivered to the neuron calculator of each computing unit.
Alternatively, after the one or more neuron calculators of a computing unit finish their computations, the results are stored at a preset relative address in the memory unit, read out by the master controller 10, broadcast into the region where the next layer of neurons resides, and stored at a preset relative address in each cell.
In this way the broadcast capability of the cellular array bus greatly increases the transfer speed of the large amounts of data that a fully connected neural network has to move. A minimal sketch of this broadcast is given below.
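The following sketch is purely illustrative (the flat-array model and all names are assumptions, not the patent's concrete bus protocol). It shows the effect of a broadcast: writing a value into the same relative address of every computing unit in a rectangular region defined by a diagonal:

```python
# Model each cell's memory as a dict: relative address -> value.
def make_array(width, height):
    return [[{"memory": {}} for _ in range(width)] for _ in range(height)]

def bus_broadcast(array, start, end, rel_addr, value):
    """Write `value` to the same relative address of every cell in the rectangle
    whose diagonal runs from `start` to `end` (both given as (x, y) codes)."""
    (x0, y0), (x1, y1) = start, end
    for y in range(min(y0, y1), max(y0, y1) + 1):
        for x in range(min(x0, x1), max(x0, x1) + 1):
            array[y][x]["memory"][rel_addr] = value   # same relative address everywhere

cells = make_array(8, 8)
bus_broadcast(cells, start=(0, 0), end=(2, 2), rel_addr="w_layer1", value=[0.1, 0.2, 0.3])
print(cells[1][1]["memory"]["w_layer1"])
```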
<Cellular array internal network communication method>
Fig. 5 is a schematic diagram of an example of path selection for network communication within the array according to a preferred embodiment of the invention.
● Every piece of information exchanged between computing units (cells) contains the identification codes of the start computing unit and the end computing unit.
● A piece of information travels from the start computing unit to the end computing unit over the connections between neighboring computing units, through multiple relays.
● A network controller is provided in each computing unit to relay information rapidly without interfering with the unit's other functions.
● Besides the identification code of the end computing unit, the information also indicates either an address in the end computing unit or a neuron calculator of the end computing unit.
■ In the mode where an address in the end computing unit is indicated, the information is written directly by the network controller of that computing unit to the corresponding address in its memory unit.
■ In the mode where a neuron calculator of the end computing unit is indicated, the information is handed over to that neuron calculator inside the cell for processing.
● At every computing unit that sends or relays information, the network controller must choose one neighboring computing unit as the next hop.
■ When the start computing unit and the end computing unit lie on the same line, there is only one reasonable choice.
■ Otherwise there are two equally reasonable choices, and the network controller can pick the neighboring computing unit whose traffic is lighter.
For sending a large amount of information from one computing unit to a rectangular region, one simple method is to have the master controller 10 read the information and group-send it. Another mode is provided here: the point-to-point communication between computing units is extended to a region group-send. This mode supports a higher degree of parallelism and a much higher total bandwidth, and is very suitable for convolutional neural networks. For group-sending between computing units, the original sender is responsible for indicating the target region, and the transfer is still completed by serial relaying.
● If the sending computing unit and the relaying computing units are inside the target region:
■ On a corner of the rectangular region (as shown in Fig. 6):
◆ If the width of the region is 1, only one neighboring computing unit can be chosen as the next-hop relay. The network controller of this computing unit receives the data of the information (if this computing unit is not the original sender), forwards the information to that neighboring computing unit, and then updates the target region (the length decreases by 1).
● If this is the last remaining computing unit, relaying stops.
◆ If both the length and the width of the region are greater than 1, two neighboring computing units can be chosen as next-hop relays. The network controller of this computing unit receives the data of the information (if this computing unit is not the original sender), forwards the information to each of these two neighboring computing units, and then updates the target regions, one of which is a rectangular region of width 1.
■ On an edge of the rectangular region (as shown in Fig. 7):
◆ If the width of the region is 1, two neighboring computing units can be chosen as next-hop relays. The network controller of this computing unit receives the data of the information (if this cell is not the original sender), forwards the information to each of these two neighboring computing units, and then updates the target regions.
◆ If the width of the region is greater than 1, three neighboring computing units can be chosen as next-hop relays. The network controller of this computing unit receives the data of the information (if this cell is not the original sender), forwards the information to each of these three neighboring computing units, and then updates the target regions, two of which are rectangular regions of width 1.
■ In the interior of the rectangular region, four neighboring computing units can be chosen as next-hop relays (in this case the computing unit can only be the original sender). The information is forwarded to each of these four neighboring computing units, and the target regions are then updated, two of which are rectangular regions of width 1.
● If the sending computing unit and the relaying computing units are outside the target region (as shown in Fig. 8 and Fig. 9): considering that a neural network needs to transfer a block of data over a long distance to another place, the communication network can easily become congested in this case. A protocol is used here which, by analysis, avoids congestion in a convolutional network:
■ The transmission direction is indicated when the information is sent.
■ At every relay the information is forwarded along the transmission direction; once the coordinate in the forward direction enters the coordinate range of the target region, sideways group-sends of width 1 are made step by step.
Specific implementation (as shown in Fig. 10; a sketch of the region-forwarding logic follows this list):
1. The communication channel between every two neighboring computing units consists of a pair of FIFOs (first-in, first-out buffers): one FIFO into which one computing unit writes and from which the other computing unit reads, and one FIFO running in the opposite direction. Seen from one computing unit, they can be called its input and output FIFOs.
2. The network controller is connected to all (at most 4 pairs of) FIFOs in the computing unit. The network controller is also connected to the MPU in the computing unit and can raise interrupts to it (such as FIFO empty, FIFO full, new information arrived, information sent out, and so on).
3. The network controller sends, receives and relays information.
4. If some input FIFO has information arriving, the network controller first checks it:
● If the destination is this computing unit, then:
■ If the destination is a relative address, the information is stored directly at the corresponding address of the memory unit (the network controller has DMA capability) and the neuron calculator is notified.
■ If the destination is a neuron calculator, that neuron calculator is directly notified to process the information.
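The following is a purely illustrative sketch of the region group-send idea described above, modelled as a recursive splitting of the target rectangle at each relay step. The representation and names are assumptions and are not the patent's protocol:

```python
def region_send(array, pos, rect, payload):
    """Deliver `payload` to every cell of `rect` = (x0, y0, x1, y1), starting at `pos`,
    which must lie inside the rectangle. Each call models one relay step: the cell
    keeps a copy, then forwards the payload, splitting off width-1 strips as it goes."""
    x, y = pos
    x0, y0, x1, y1 = rect
    array[y][x]["memory"]["msg"] = payload
    # Forward along the row (width-1 strips to the right and to the left) ...
    if x < x1:
        region_send(array, (x + 1, y), (x + 1, y, x1, y), payload)
    if x > x0:
        region_send(array, (x - 1, y), (x0, y, x - 1, y), payload)
    # ... and down/up for the remaining full-width parts of the rectangle.
    if y < y1:
        region_send(array, (x, y + 1), (x0, y + 1, x1, y1), payload)
    if y > y0:
        region_send(array, (x, y - 1), (x0, y0, x1, y - 1), payload)

cells = [[{"memory": {}} for _ in range(6)] for _ in range(6)]
region_send(cells, pos=(2, 1), rect=(1, 0, 4, 3), payload="layer output")
print(all("msg" in cells[y][x]["memory"] for y in range(0, 4) for x in range(1, 5)))
```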
The region group-send capability of the cellular array network is of great help to neural networks, and in particular to convolutional neural networks, because it greatly increases the transfer bandwidth for large amounts of data. The concrete operation method is as follows:
1. A layer of the neural network is deployed in a rectangular region (for image-processing applications this is the natural choice), and each computing unit performs the function of one or more neurons. The weights of all inputs of each neuron are stored in the memory unit/MRAM of that computing unit.
2. Adjacent layers are deployed in adjacent regions.
3. After all neurons of a layer have finished their computations, all computing units group-send their outputs synchronously, with the transmission direction pointing towards the region where the next layer resides, as shown in Fig. 11.
The advantages of implementing neural network computation with the present invention are obvious:
1. The parallel computation of a large number of computing units greatly speeds up the arithmetic, so that the speed of learning and training is greatly improved.
2. The huge bandwidth of the cellular array internal network communication and the group-send mechanism likewise contribute significantly to the speed.
3. The non-volatility of MRAM means that a successfully trained chip can be copied directly and sold as a product that solves a particular problem.
<Neuron calculator operation method>
The neuron calculator itself is comparatively simple: it multiplies the output of each upper-layer neuron connected to it by the corresponding weight and accumulates the products. Usually the final result also undergoes a simple mapping, for example onto a number between 0 and 1. However, each neuron has at least tens and at most thousands of input neurons, and in a network environment it is difficult to guarantee that these inputs arrive in a fixed order. A fast and efficient method is therefore needed to look up and match each input, and also to determine whether all inputs have arrived.
The present invention proposes a method to solve this problem, as shown in Fig. 12:
1. The input information of each neuron calculator in the network contains the identification code of the upstream neuron together with its output data.
2. Each neuron calculator contains a section of content-addressable memory (CAM) that stores the identification codes of all upstream neurons. This memory unit can compare an input against all stored identification codes in a single step and produces an output when an identification code matches.
3. For each input, the identification code of the input neuron is compared with the contents of the above memory; according to the matching identification code, the weight corresponding to that input is found, the weight is multiplied by the input to obtain a product, and all products are accumulated to obtain an accumulation signal. For example, one way to find the weight address is to store each weight at the address corresponding to its identification code.
4. Every time an identification code stored in the above memory matches some input, a match signal is output, and a completion signal is generated from all the match signals. There are two implementations:
a. When all identification codes have matched, each match outputs a high level. All match signals are connected to a NAND gate, and a low output of the NAND gate indicates completion.
b. A counter is used: it is initialized so that its count equals the number of input neurons, each match decrements the counter by one, and when the counter reaches zero the completion signal is output.
5. When the completion signal occurs, the neuron calculator maps the accumulated value and outputs it.
In this way the operation speed is improved without requiring the inputs to arrive in a fixed order.
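The following is a minimal software sketch of the matching-and-accumulation scheme described above, using a dictionary in place of the CAM and a counter (variant b) for completion detection. All names are illustrative assumptions:

```python
import math

class NeuronCalculator:
    """Accepts (upstream_id, value) inputs in any order; a dict stands in for the CAM
    that maps an upstream identification code to its weight, and a counter detects
    when every expected input has arrived."""
    def __init__(self, weights_by_id):
        self.weights_by_id = weights_by_id          # identification code -> weight (the "CAM")
        self.remaining = len(weights_by_id)         # counter initialised to the input count
        self.accum = 0.0

    def receive(self, upstream_id, value):
        weight = self.weights_by_id[upstream_id]    # one-step lookup by identification code
        self.accum += weight * value                # multiply and accumulate
        self.remaining -= 1                         # each match decrements the counter
        if self.remaining == 0:                     # completion signal
            return self.map_output()
        return None

    def map_output(self):
        return 1.0 / (1.0 + math.exp(-self.accum))  # simple mapping onto (0, 1)

n = NeuronCalculator({"n1": 0.4, "n2": -0.3, "n3": 0.8})
for uid, val in [("n3", 0.9), ("n1", 0.5), ("n2", 0.2)]:   # inputs arrive out of order
    out = n.receive(uid, val)
print(out)
```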
<Cellular array three-dimensional communication transmission method>
A three-dimensional cellular array neural network chip can provide even higher performance. The cellular array three-dimensional communication transmission method according to the preferred embodiment of the invention may include:
stacking multiple layers of cellular array neural network chips (that is, cellular array computing systems) vertically into a three-dimensional chip through through-silicon vias (TSVs);
connecting the buses of every layer of the cellular array through the through-silicon vias; and
connecting, through the through-silicon vias, the computing units of two adjacent cellular array neural network chips into a network, so that the communication network between computing units is extended into a three-dimensional communication network.
Adjacent layers of a multilayer neural network are deployed in adjacent cellular array neural network chip layers of the three-dimensional chip. When data must be transmitted to the next neural network layer through the communication network, a data transfer in the vertical direction is first carried out through the through-silicon vias, and the communication network of each cellular array neural network chip layer is then used to carry out the horizontal data transfer.
This mode of the preferred embodiment of the invention is particularly efficient for convolutional neural networks: a few network transfer cycles can complete relaying work that would originally need thousands of cycles. A small sketch of this vertical-then-horizontal transfer follows.
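Purely as an illustration of the routing order described above (the data layout and names are assumptions), the vertical-then-horizontal transfer can be modelled as one TSV hop sequence between chip layers followed by in-layer mesh hops:

```python
def transfer_3d(stack, src, dst, payload):
    """`stack[z][y][x]` is a cell's memory dict. Move `payload` from `src` to `dst`
    (both (z, y, x)): first vertical hops through the TSVs to the destination layer,
    then horizontal hops through the in-layer mesh network."""
    z, y, x = src
    dz, dy, dx = dst
    while z != dz:                       # vertical transfer through the TSVs first
        z += 1 if dz > z else -1
    while y != dy:                       # then horizontal transfer within the layer
        y += 1 if dy > y else -1
    while x != dx:
        x += 1 if dx > x else -1
    stack[z][y][x]["msg"] = payload

stack = [[[{} for _ in range(4)] for _ in range(4)] for _ in range(2)]
transfer_3d(stack, src=(0, 1, 1), dst=(1, 2, 3), payload="neuron output")
print(stack[1][2][3]["msg"])
```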
<Convolutional neural networks implementation method>
The cellular array architecture is very flexible; another convolutional neural network implementation method is provided here:
1. One layer of the convolutional neural network is deployed, according to the internal positional relationships of that layer, onto a rectangular region of the cellular array.
2. The weights are sent, one by one, by the cellular array bus broadcast method to the computing units of the rectangular region where the corresponding neurons reside, and are stored in the memory of those computing units. (The weights of a convolutional neural network are translation invariant: within one neuron the weights corresponding to the inputs from different nearby cells differ, but all neurons use the same set of weights, which makes broadcast distribution particularly suitable.)
3. For this convolutional layer, the input is received and the neuron computation is performed (the neuron computation consists of multiplying each input by its corresponding weight and then summing all products), and the output of each neuron is temporarily kept in its neuron calculator.
4. The weights of the next convolutional layer (the layer following the layer just computed) are sent by the cellular array bus broadcast method to the computing units of the rectangular region.
5. Using the cellular array mesh communication network in the cellular array, the outputs of the neurons of the previous layer (the layer just computed) are transferred to the computing units in a predetermined nearby region that need those outputs. (In a convolutional neural network each neuron's output goes only to a small nearby region, which is very suitable for distribution over the cellular array mesh communication network; only a few clock cycles are needed to complete all data exchanges.)
6. Using the new weights (the weights of the next convolutional layer) and the outputs of the previous layer, the next convolutional layer is computed.
7. If further layers or further convolutional layers follow, the same procedure is carried on layer by layer.
In summary, the convolutional neural network is deployed, according to the internal positional relationships of each layer, onto a rectangular region of the cellular array; the weights of every layer are sent, one by one, by the cellular array bus broadcast method to each cell in the rectangular region and stored in the memory of that computing unit; for the first convolutional layer, the input is received, the neuron computation is performed, and the output of each neuron is temporarily kept in its computing unit; using the cellular array mesh communication network in the cellular array, the outputs of the neurons of the current layer are transferred to the computing units in a predetermined nearby region that need those outputs; and, using the weights of the next layer together with the outputs of the previous layer, the next convolutional layer is computed, until the last layer is completed. As an improvement, the present invention broadcasts the weights of all layers to each cell once and for all, so that the computation of each later layer does not need to reload the weights and the processing is faster. A sketch of this layer-by-layer procedure is given below.
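The following sketch is purely illustrative of the layer-by-layer procedure just described, with all layer weights loaded once and each layer's outputs handed to the next layer before it is computed. The flat-array model and names are assumptions, not the patent's hardware:

```python
import numpy as np

def run_cnn_on_array(image, kernels):
    """Sketch of the claimed procedure: every cell holds all layer weights after a
    single broadcast; each layer's neuron outputs are then exchanged so that every
    neuron of the next layer can read its k x k nearby region of the previous outputs."""
    resident_weights = list(kernels)                # one broadcast loads every layer's weights
    layer_output = image                            # input to the first convolutional layer
    for kernel in resident_weights:                 # compute the layers one after another
        k = kernel.shape[0]
        h, w = layer_output.shape
        nxt = np.zeros((h - k + 1, w - k + 1))
        for i in range(nxt.shape[0]):
            for j in range(nxt.shape[1]):
                # each neuron of the next layer uses the outputs of a nearby region
                nxt[i, j] = np.sum(layer_output[i:i + k, j:j + k] * kernel)
        layer_output = nxt                          # outputs handed to the next layer's region
    return layer_output

out = run_cnn_on_array(np.random.rand(16, 16), [np.random.rand(3, 3) for _ in range(3)])
print(out.shape)    # (10, 10)
```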
The advantages of this method are:
1. If the input image is large, and a convolutional network usually has a dozen or more layers, a large amount of hardware and memory resources would otherwise be occupied. This method uses the same hardware for the computation of different layers and thereby saves hardware resources dramatically.
2. With the cellular array mesh communication network, the data exchange between different layers is very efficient.
The preferred embodiments of the present invention have been shown and described above. As stated before, it should be understood that the present invention is not limited to the forms disclosed herein, that they should not be taken to exclude other embodiments, and that the invention can be used in various other combinations, modifications and environments and can be changed, within the scope of the inventive concept described herein, by the above teachings or by the skill and knowledge of the related art. Changes and modifications made by those skilled in the art that do not depart from the spirit and scope of the present invention shall all fall within the protection scope of the appended claims of the present invention.

Claims (9)

1. A convolutional neural network implementation method using a cellular array computing system, wherein the cellular array computing system comprises a master controller, a bus, and a cellular array formed by multiple computing units, and wherein each computing unit of the cellular array comprises one or more neuron calculators or microprocessors for performing the neuron computations and a memory unit;
characterized in that the convolutional neural network implementation method comprises: deploying the convolutional neural network, according to the internal positional relationships of each layer, onto a rectangular region of the cellular array; sending the weights of every layer, one by one, by the cellular array bus broadcast method to each cell in the rectangular region and storing them in the memory of that computing unit; for the first convolutional layer, receiving the input, performing the neuron computation, and temporarily keeping the output of each neuron in its computing unit; using the cellular array mesh communication network in the cellular array to transfer the outputs of the neurons of the current layer to the computing units in a predetermined nearby region that need those outputs; and using the weights of the next layer together with the outputs of the previous layer to compute the next convolutional layer, until the last layer is completed.
2. The convolutional neural network implementation method of claim 1, characterized in that the cellular array mesh communication network enables each computing unit of the cellular array to communicate with its neighboring computing units, and each computing unit transmits data through the cellular array mesh communication network either to a particular address in the memory unit of a neighboring computing unit or to the neuron calculator of a neighboring computing unit.
3. The convolutional neural network implementation method of claim 2, characterized in that in the cellular array mesh communication network the communication channel between every two neighboring computing units is formed by a pair of FIFO units comprising a first FIFO unit and a second FIFO unit, and the first FIFO unit, into which one of the two neighboring computing units writes information, operates in the direction opposite to the second FIFO unit, from which the other of the two neighboring computing units outputs information.
4. The convolutional neural network implementation method of claim 3, characterized in that in the cellular array mesh communication network each computing unit is provided with a network controller that can directly read and write the memory, and the network controller of a computing unit is connected to all FIFO units in that computing unit.
5. The convolutional neural network implementation method of claim 4, characterized in that when one computing unit of a neighboring pair writes information into the first FIFO unit, the network controller judges whether the destination of the written information is this computing unit, and when the destination is this computing unit it performs the following operations: when the destination is a relative address, the network controller stores the information directly at the corresponding address of the memory unit and notifies the neuron calculator; when the destination is a neuron calculator, it directly notifies that neuron calculator to process the information.
6. The convolutional neural network implementation method of any one of claims 1 to 5, characterized in that the cellular array bus broadcast method comprises: the master controller, through the bus, broadcasts instructions and/or information to the neuron calculators in the computing units of a rectangular region, and/or broadcasts data into the same relative address of the memory units in the computing units of a rectangular region.
7. The convolutional neural network implementation method of claim 6, characterized in that each computing unit of the cellular array further comprises a bus controller and an internal bus; the bus controller of each computing unit is connected to the bus; the memory unit of each computing unit is a slave device of the respective internal bus; and the bus controller and the neuron calculators of each computing unit are master devices of the respective internal bus, with the bus controller having the higher priority.
8. The convolutional neural network implementation method of claim 6, characterized in that the master controller reads and writes memory units by first broadcasting a target address on the bus of the cellular array and then sending, or preparing to read, data; a bus controller receives the target address and, if the target address lies within its computing unit, connects the memory unit of that computing unit to perform the read or write operation.
9. The convolutional neural network implementation method of claim 6, characterized in that the master controller communicates with the neuron calculators, wherein a first reserved segment is set aside in the address space of the master controller for communication with neuron calculators; the first reserved segment is used to store the identification code of the target computing unit; and the bus controller, on receiving the identification code of the target computing unit, identifies the neuron calculator targeted by the current communication and connects that neuron calculator to perform the subsequent operations.
CN201611236090.XA 2016-12-28 2016-12-28 Convolutional neural networks implementation method Pending CN108256640A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611236090.XA CN108256640A (en) 2016-12-28 2016-12-28 Convolutional neural networks implementation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611236090.XA CN108256640A (en) 2016-12-28 2016-12-28 Convolutional neural networks implementation method

Publications (1)

Publication Number Publication Date
CN108256640A true CN108256640A (en) 2018-07-06

Family

ID=62720233

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611236090.XA Pending CN108256640A (en) 2016-12-28 2016-12-28 Convolutional neural networks implementation method

Country Status (1)

Country Link
CN (1) CN108256640A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103074210A (en) * 2013-01-25 2013-05-01 深圳先进技术研究院 Monitoring system for in-vitro cell photostimulation array device
CN105718996A (en) * 2015-07-29 2016-06-29 上海磁宇信息科技有限公司 Cell array calculation system and communication method wherein
CN105718379A (en) * 2015-07-29 2016-06-29 上海磁宇信息科技有限公司 Cell array calculation system and group-sending communication method among cells thereof
CN105740946A (en) * 2015-07-29 2016-07-06 上海磁宇信息科技有限公司 Method for realizing neural network calculation by using cell array computing system

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110197114A (en) * 2019-04-04 2019-09-03 华中科技大学 A kind of automatic identifying method and device of full brain range single neuron aixs cylinder synaptic knob
CN110197114B (en) * 2019-04-04 2021-06-29 华中科技大学 Automatic identification method and device for single neuron axon synaptic junction in whole brain range
CN111753994A (en) * 2020-06-22 2020-10-09 深圳鲲云信息科技有限公司 Data processing method and device of AI chip and computer equipment
WO2021259231A1 (en) * 2020-06-22 2021-12-30 深圳鲲云信息科技有限公司 Artificial intelligence chip and artificial intelligence chip-based data processing method
CN111753994B (en) * 2020-06-22 2023-11-03 深圳鲲云信息科技有限公司 Data processing method and device of AI chip and computer equipment
CN113138957A (en) * 2021-03-29 2021-07-20 北京智芯微电子科技有限公司 Chip for neural network inference and method for accelerating neural network inference

Similar Documents

Publication Publication Date Title
CN106022468B (en) the design method of artificial neural network processor integrated circuit and the integrated circuit
CN108256640A (en) Convolutional neural networks implementation method
JP6714690B2 (en) Information processing system, method of operating information processing system, and machine learning computing unit
CN105740946B (en) A kind of method that application cell array computation system realizes neural computing
US11423296B2 (en) Device and method for distributing convolutional data of a convolutional neural network
CN105718996B (en) Cellular array computing system and communication means therein
CN107918794A (en) Neural network processor based on computing array
CN110390388A (en) Neuromorphic circuit with 3D stacked structure and the semiconductor device including it
KR20200037748A (en) Chip device and related product
CN105469143B (en) Network-on-chip method for mapping resource based on neural network dynamic feature
CN105719000A (en) Neuron hardware structure and method of simulating pulse neural network by adopting neuron hardware structure
CN112464784A (en) Distributed training method based on hybrid parallel
CN108256637A (en) A kind of cellular array three-dimensional communication transmission method
CN110348574A (en) A kind of general convolutional neural networks accelerating structure and design method based on ZYNQ
CN113240127B (en) Training method and device based on federal learning, electronic equipment and storage medium
CN108647776A (en) A kind of convolutional neural networks convolution expansion process circuit and method
CN108090496A (en) The method and apparatus of image procossing based on convolutional neural networks
CN107766932A (en) Image processing method and device based on neural network
CN109670581A (en) A kind of computing device and board
US9196077B1 (en) Efficient inter-processor communication in ray tracing
CN108256641A (en) For the cellular array internal network communication method of cellular array computing system
US9558530B2 (en) Method and apparatus for an inter-cell shortest communication
CN109740619A (en) Neural network terminal operating method and device for target identification
CN108256622A (en) A kind of cellular array computing system
CN108256639A (en) For the cellular array mesh communication network of cellular array computing system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180706