CN105740946A - Method for realizing neural network calculation by using cell array computing system - Google Patents


Info

Publication number
CN105740946A
Authority
CN
China
Prior art keywords
cell
data
cellular array
array
master cpu
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510456306.2A
Other languages
Chinese (zh)
Other versions
CN105740946B (en)
Inventor
戴瑾
郭民
郭一民
王践识
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Ciyu Information Technologies Co Ltd
Original Assignee
Shanghai Ciyu Information Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Ciyu Information Technologies Co Ltd filed Critical Shanghai Ciyu Information Technologies Co Ltd
Priority to CN201510456306.2A priority Critical patent/CN105740946B/en
Publication of CN105740946A publication Critical patent/CN105740946A/en
Application granted granted Critical
Publication of CN105740946B publication Critical patent/CN105740946B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention relates to a method for realizing neural network calculation by using a cell array calculation system. The cell array calculation system includes a main control CPU, a cell array and a cell array bus; the cell array is a two-dimensional array composed of more than one cell with a calculation function and a storage function; the main control CPU communicates with each cell in the cell array through the cell array bus; and communication interfaces are arranged between adjacent cells and can transmit data to each other. The method includes the following steps that: each cell in the cell array, which participates in neural network calculation, stores the code of one or more than one neuronal function; any cell in the cell array, which participates in neural network calculation, receives execution results for the codes of the neuronal functions which are outputted by other cells and adopts the execution results as input data of the cell so as to execute the codes of the neuronal functions stored by the cell, and outputs an execution result, or stores the execution result into a preset address. With the method of the invention adopted, communication bottlenecks between the CPU and memory as well as between the CPU and storage of an existing computer architecture can be eliminated, and the overall performance of the system can be improved.

Description

Method for implementing neural network computation using a cellular array computing system
Technical field
The present invention relates to the field of computers and computer application technology, and in particular to a method for implementing neural network computation using a cellular array computing system.
Background technology
Generally, a computer has three core components: the central processing unit (CPU, Central Processing Unit), memory, and storage.
Through the sustained efforts of some of the world's top companies, the CPU has evolved into an extremely complex semiconductor chip. The number of MOS transistors in a top-end CPU core can exceed 100 million. The current industry trend is that, constrained by power consumption, CPU clock frequencies are hard to raise further, and the operating efficiency of extremely complex modern CPUs is likewise hard to improve. New CPU products are therefore increasingly evolving toward multi-core designs.
In memory, the dominant technology at present is dynamic random access memory (DRAM, Dynamic Random Access Memory). DRAM supports fast random reads and writes, but cannot retain its contents when power is lost. In fact, even while powered, it loses information through leakage of its internal storage capacitors and must be refreshed periodically.
In storage, NAND flash is gradually replacing conventional hard disks. The floating-gate technology on which flash memory relies can retain content without power, but writing (rewriting '1' to '0') is very slow and erasing (rewriting '0' to '1') is slower still, so flash cannot directly support computation the way DRAM does. Flash is built as a block device: erasure must be done a whole block at a time, a block contains many pages, and operations on individual pages are possible only after erasure. Another problem with NAND is its limited lifetime.
DRAM, NAND flash, and CPU logic circuits are all produced with CMOS semiconductor processes, but the processes of the three are mutually incompatible. Consequently, the three core components of a computer cannot coexist on a single die, which has profoundly shaped the architecture of modern computers.
The computer architecture of the prior art is shown in Fig. 1, which depicts multiple CPU cores: CPU1, CPU2, CPU3, ..., CPUn. Each CPU core usually has its own level-1 cache (L1 Cache) and, as required, may also be given a level-2 cache (L2 Cache) and a level-3 cache (L3 Cache). DRAM communicates with each CPU core through a double data rate (DDR, Double Data Rate) interface, while a hard disk (HD, Hard Disk) or solid state drive (SSD, Solid State Drive) communicates with each CPU core through a peripheral interface.
On the one hand, CPUs are developing toward multi-core designs; on the other hand, memory and storage both reside on separate chips. As the data throughput of multi-core CPUs grows proportionally, communication with memory and storage increasingly becomes the bottleneck of system performance. To ease this bottleneck, CPUs have had to adopt ever larger multi-level caches. A cache is a copy of memory contents, usually built from static random access memory (SRAM, Static Random Access Memory), which is faster than DRAM but far more expensive. Such an architecture has poor cost-effectiveness. The cost of a semiconductor chip is determined by its silicon area, and the performance gain delivered by a conventional computer architecture falls far short of being proportional to the increase in silicon area.
A neural network (Neural Network) is a computational method commonly used in machine learning. It imitates the working principle of the human brain and is usually described in terms of perceptrons (Perceptron) or neurons (Neuron). Training a neural network is a massive computation: a neural network is in effect a function with a large number of parameters (possibly tens of thousands), and it needs a large number of training cases, each with input data and a correct answer, to adjust those parameters until the learning goal is reached. Because the amount of computation involved is very large, and because the prior-art computer architecture seriously degrades overall performance through the communication bottleneck between the CPU and memory and storage, that architecture is ill-suited to efficient neural network computation.
Summary of the invention
The problem to be solved by the present invention is that the prior-art computer architecture degrades overall computer performance because of the communication bottleneck between the CPU and memory and storage, which hinders efficient implementation of neural network computation.
To solve the above problem, the technical solution of the present invention provides a method for implementing neural network computation using a cellular array computing system. The cellular array computing system includes a master CPU, a cellular array, and a cellular array bus. The cellular array is a two-dimensional array composed of more than one cell having both computing and storage functions, wherein each cell includes a microprocessor (MPU, Micro Processing Unit) and a non-volatile (NV, Non-Volatile) random access memory. The non-volatile random access memory serves as the random access memory for the data involved when the microprocessor performs computations, and also stores the instruction code of software and data that must be kept persistently. Each cell stores its own position in the cellular array as an identification number (ID, identification) that software or hardware in the cell can read. The master CPU communicates with each cell in the cellular array through the cellular array bus. Communication interfaces exist between adjacent cells in the cellular array, so that adjacent cells can send data to each other. The method includes: storing the code of one or more neuron functions in each cell of the cellular array that participates in neural network computation; the master CPU selecting one or more cells to execute the neuron function code stored in each of them and to output the execution results to one or more target cells; and any cell participating in neural network computation receiving execution results of neuron function code output by other cells, taking them as input data of this cell, executing the neuron function code stored in this cell based on all input data from the other cells, and outputting the execution result to the cells holding the neuron functions that need this result, or storing it at a preset address, or outputting it to the master CPU.
Optionally, any two cells in the cellular array can communicate with each other. The cells participating in cell-to-cell communication comprise a starting point cell, an end point cell, and transfer cells: the starting point cell is the cell that sends data to the end point cell; the end point cell is the cell that finally receives the data sent by the starting point cell; the transfer cells are the cells that are successively adjacent along the cell-to-cell communication path and relay the data sent by the starting point cell through the communication interfaces; and the cell-to-cell communication path is the data transmission path formed by the starting point cell, the transfer cells, and the end point cell. The method further includes: transmitting the execution results of the neuron function code between any two cells by means of the cell-to-cell communication.
Optionally, any cell in the cellular array can also, as the starting point cell, perform mass-sending communication to all cells in a target area. A cell that participates in the mass-sending communication and lies within the target area acts as the starting point cell, as an end point cell, or as both a transfer cell and an end point cell; a cell that participates in the mass-sending communication and lies outside the target area acts as the starting point cell or as a transfer cell. The method further includes: if the neural network is divided into different layers, using the mass-sending communication to transfer data between layers, with the cells of the same layer that participate in neural network computation located in the same target area.
Optionally, the cellular array is further provided with at least one dedicated (full-time) output cell. As an end point cell, the dedicated output cell receives and stores the data that other cells output to the master CPU, and notifies the master CPU with an interrupt signal to read the output data. Outputting the execution result to the master CPU includes: sending the execution result to the dedicated output cell by means of the cell-to-cell communication, for the master CPU to read.
Optionally, the non-volatile random access memory is a magnetic random access memory (MRAM, Magnetic Random Access Memory).
Optionally, the master CPU selects all cells in the cellular array, or the cells in a target area, as the cells participating in neural network computation, broadcasts the code of the same neuron function to the same relative address in each selected cell, and broadcasts an instruction that makes the microprocessors of one or more selected cells start executing the neuron function code from that relative address.
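The core step of the method, in which a cell gathers the execution results of other cells and then runs its own stored neuron function, can be sketched as follows. This is only an illustrative model under stated assumptions: the class name `NeuronCell`, the method `receive`, and the choice of a weighted sum followed by a sigmoid as the neuron function are hypothetical and are not taken from the patent text, which leaves the neuron function unspecified at this point.

```python
import math


def neuron(weights, bias, inputs):
    """Assumed neuron function: weighted sum of the inputs passed through a sigmoid."""
    s = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-s))


class NeuronCell:
    """Hypothetical model of one cell taking part in neural network computation."""

    def __init__(self, cell_id, sources, weights, bias, targets):
        self.cell_id = cell_id          # (x, y) position, used as the cell ID
        self.sources = list(sources)    # IDs of cells whose results are needed
        self.weights = list(weights)    # parameters kept in the cell's own memory
        self.bias = bias
        self.targets = list(targets)    # IDs of cells that need this cell's result
        self.received = {}

    def receive(self, source_id, value):
        """Record one input; run the neuron function once all inputs are present."""
        self.received[source_id] = value
        if len(self.received) < len(self.sources):
            return None                 # still waiting for other cells
        inputs = [self.received[s] for s in self.sources]
        self.received = {}              # ready for the next round
        result = neuron(self.weights, self.bias, inputs)
        return [(target, result) for target in self.targets]
```

In this sketch the returned list stands in for the outputs that would travel to the target cells; storing the result at a preset address or sending it to the master CPU would be alternative sinks for the same value.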
Compared with the prior art, the technical solution of the present invention has at least the following advantages:
In current neural network algorithms, learning and training is usually a massive computation problem, and training a network takes a very long time. A neural network in a cellular array can run thousands of neurons in parallel, which greatly accelerates the learning process of the neural network and improves its learning capability.
A two-dimensional array (called a "cellular array") is composed of more than one unit (called a "cell") having independent computing and storage functions, wherein each cell includes a microprocessor and a non-volatile random access memory. The non-volatile random access memory can serve as the random access memory for the data involved in the microprocessor's computations, and can also store the instruction code of software and data that must be kept persistently, so that the three functions of memory, storage, and computation are integrated in every cell and a dense communication network is formed among the cells. On the one hand, the master CPU can communicate with each cell in the cellular array through the cellular array bus; on the other hand, adjacent cells in the cellular array can send data to each other. Data mass-sending and the internal network thereby overcome the communication bottleneck that exists between the CPU and memory and storage in existing computer architectures, improving the overall performance of the computing system and yielding better cost-effectiveness. When the computing system with the cellular array is applied to neural network computation, the parallel computation of a large number of cells greatly increases computing speed and therefore the speed of learning and training; and because the random access memory in the cells is non-volatile, a successfully trained chip can be copied directly and sold as a product that solves a particular problem.
Through the communication interfaces between adjacent cells in the cellular array, data is relayed repeatedly between adjacent cells, so that any two cells in the cellular array can communicate without relying on the master CPU. The huge bandwidth of the cellular array's internal network improves the efficiency of cell-to-cell communication and also reduces the processing load on the master CPU, which further improves the overall performance of the computing system and better supports efficient neural network computation.
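To make the idea of relaying through adjacent cells concrete, the following minimal sketch builds one possible path between two cells on a four-direction grid. The routing rule used here (move along x first, then along y) and the function name `relay_path` are assumptions chosen for illustration; this part of the text does not state how the path is selected.

```python
def relay_path(start, end):
    """Return a list of adjacent cell IDs from start to end (x first, then y)."""
    x, y = start
    path = [start]
    while x != end[0]:
        x += 1 if end[0] > x else -1   # one hop to the left or right neighbor
        path.append((x, y))
    while y != end[1]:
        y += 1 if end[1] > y else -1   # one hop to the upper or lower neighbor
        path.append((x, y))
    return path


path = relay_path((0, 0), (2, 1))
starting_point_cell = path[0]
transfer_cells = path[1:-1]            # cells that only relay the data
end_point_cell = path[-1]
```

Every consecutive pair in the path is a pair of adjacent cells, so each hop uses only the neighbor-to-neighbor communication interface and never the master CPU.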
Extending the point-to-point communication function between cells to mass-sending over a region supports a higher degree of parallelism and yields a much higher total bandwidth, which further improves the overall performance of the computing system and significantly helps to increase the speed of neural network computation.
A dedicated (full-time) output cell is arranged in the cellular array. Acting as an end point cell, it receives and stores the data that other cells output to the master CPU and notifies the master CPU by an interrupt signal to read the output data. This improves the efficiency with which the master CPU reads output data when only a few cells need to output data to it, so that the final result of neural network computation can be output to the master CPU more efficiently.
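The role of the dedicated (full-time) output cell can be illustrated with a small sketch. The class name `DedicatedOutputCell`, the callback `raise_interrupt`, and the method `drain` are hypothetical names introduced here; the sketch assumes only what the text states, namely that the cell stores incoming output data and signals the master CPU with an interrupt.

```python
class DedicatedOutputCell:
    """Hypothetical model of an output cell that buffers results for the master CPU."""

    def __init__(self, raise_interrupt):
        self.buffer = []                        # output data held until it is read
        self.raise_interrupt = raise_interrupt  # stands in for the interrupt line

    def receive(self, source_id, data):
        """Store data sent by another cell and notify the master CPU."""
        self.buffer.append((source_id, data))
        self.raise_interrupt()

    def drain(self):
        """Called on the master CPU side to read and clear the stored output."""
        out, self.buffer = self.buffer, []
        return out
```

With this arrangement the master CPU reads from a single known cell when interrupted, instead of polling every cell that might have produced a result.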
The cellular array computing system provided in the technical solution of the present invention solves the communication bottleneck between the CPU and memory. Thus, for the same silicon area, the processing power of a large number of small MPUs far exceeds that of a few top-end CPUs on many problems, while power consumption is much lower.
The cellular array uses non-volatile memory to store the results of learning and training (the weight data). Once training is complete, the whole chip can simply be copied and sold as a product.
The cellular array computing system provided in the technical solution of the present invention brings computer architecture closer to the human brain and provides a powerful engine for future artificial intelligence algorithms.
Accompanying drawing explanation
Fig. 1 is a schematic diagram of a computer architecture in the prior art;
Fig. 2 is a schematic structural diagram of a cellular array computing system provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of one communication mode between adjacent cells according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of another communication mode between adjacent cells according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a cell according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a cell that performs cell-to-cell communication in the cellular array according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of path selection for cell-to-cell communication in the cellular array according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of the implementation process of the dedicated (full-time) output cell according to an embodiment of the present invention;
Fig. 9 is a schematic diagram of mass-sending in the cellular array when the starting point cell is at a corner of the target area, according to an embodiment of the present invention;
Fig. 10 is a schematic diagram of mass-sending in the cellular array when the starting point cell is on an edge of the target area, according to an embodiment of the present invention;
Fig. 11 is a schematic diagram of mass-sending in the cellular array when the starting point cell is inside the target area, according to an embodiment of the present invention;
Fig. 12 is a schematic diagram of mass-sending in the cellular array when the starting point cell is outside the target area, according to an embodiment of the present invention;
Fig. 13 is another schematic structural diagram of the cellular array computing system provided by an embodiment of the present invention;
Fig. 14 is still another schematic structural diagram of the cellular array computing system provided by an embodiment of the present invention;
Fig. 15 is yet another schematic structural diagram of the cellular array computing system provided by an embodiment of the present invention;
Fig. 16 is a schematic diagram of a neuron in a neural network;
Fig. 17 is a schematic diagram of neural network computation.
Detailed description of the invention
In the computer architecture of the prior art, the communication bottleneck between the CPU and memory and storage limits improvement of overall computer performance and results in poor cost-effectiveness.
After research, the inventors of the present application concluded that if the three functions of memory, storage, and computation are integrated on one chip to form relatively simple units that each have independent computing and storage functions, and if a dense communication network is formed among a large number of such units, thereby providing data broadcast/mass-sending functions and an internal network capable of transferring data in parallel on a large scale, then a computing architecture resembling the human brain can be developed. This is equivalent to building a large number of microcomputers on a single die.
To this end, the technical solution of the present invention provides a computing architecture similar in structure to the human brain (referred to in the technical solution of the present invention as a "cellular array computing system"). This architecture is composed of numerous units (referred to as "cells") that are relatively simple in structure, combine storage and computing functions, and are connected by a dense network. This new computing architecture will be widely applicable in fields such as large-scale computing, big data processing, and artificial intelligence.
To make the above objects, features, and advantages of the present invention clearer and easier to understand, specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.
As shown in Fig. 2, the cellular array computing system provided by the embodiment of the present invention includes a master CPU, a cellular array, and a cellular array bus. The cellular array is the main body of the cellular array computing system; it is a two-dimensional array composed of more than one cell having both computing and storage functions, wherein each cell includes a microprocessor (MPU) and a non-volatile random access memory (MRAM in the example of Fig. 2). The non-volatile random access memory serves as the random access memory for the data involved when the microprocessor performs computations, and also stores the instruction code of software and data that must be kept persistently. Each cell stores its own position in the cellular array as an ID that software or hardware in the cell can read. The master CPU communicates with each cell in the cellular array through the cellular array bus. Communication interfaces exist between adjacent cells in the cellular array, so that adjacent cells can send data to each other.
It should be noted that in this embodiment the non-volatile random access memory is described taking MRAM as an example. In other embodiments, as non-volatile random access memory technology develops and matures, the non-volatile random access memory may also be implemented with other candidate technologies, for example phase-change random access memory (PCRAM, Phase Change Random Access Memory), resistive random access memory (Resistive Random Access Memory), ferroelectric random access memory (FeRAM, Ferroelectric Random Access Memory), ferroelectric dynamic random access memory (FEDRAM, Ferroelectric Dynamic Random Access Memory), and so on.
MRAM is a new memory and storage technology. Like SRAM/DRAM it supports fast random reads and writes, and it is faster than DRAM; like flash memory it retains data permanently after power is lost; and unlike NAND flash it can be rewritten an unlimited number of times, giving it a longer service life. In addition, MRAM is quite economical: the silicon area it occupies per unit of capacity gives it a large advantage over SRAM (commonly used as CPU cache) and is expected to approach that of DRAM. Its performance is also quite good: its read/write latency is close to that of the best SRAM, and its power consumption is the best among the various memory and storage technologies. Moreover, MRAM is not incompatible with standard CMOS semiconductor processes in the way DRAM and flash are, so MRAM can be integrated with logic circuits in one chip. Adopting MRAM technology makes it possible to integrate the three functions of memory, storage, and computation on one chip, which is what makes the cellular array computing system realizable.
In this embodiment, the microprocessor has the functions of an ordinary CPU. Depending on the application scenario, units such as a floating-point unit (FPU, Floating Point Unit) or an image processor may be added; the microprocessor may therefore integrate at least one of a floating-point unit and an image processor.
In practice, the master CPU may be integrated in one chip together with the cellular array and the cellular array bus. Alternatively, the master CPU may be a separate chip that communicates, through a standard memory interface, with the chip formed by the cellular array and the cellular array bus. When a standard memory interface is used between the master CPU and the cellular array, the master CPU can be a general-purpose CPU chip, which makes the cellular array computing system easier to implement.
In this embodiment, each cell stores its own position in the cellular array as its ID. The position may be expressed as coordinates in the first quadrant of a plane rectangular coordinate system: if (x, y) denotes the position of a cell in the cellular array, then (x, y) is stored in that cell as its ID, and the software and hardware in the cell can read this ID and use it in specific operations.
In this embodiment, the communication that the master CPU carries out with the cells in the cellular array through the cellular array bus includes the following cases:
reading and writing, by address, the non-volatile random access memory of any cell in the cellular array;
broadcasting data to the non-volatile random access memory of every cell in a target area of the cellular array, and writing it to the same relative address in the non-volatile random access memory of each cell in that target area;
sending instructions (including start and pause) or data to the microprocessor of any cell in the cellular array, or reading its state;
broadcasting instructions to the microprocessors of all cells in a target area.
Of course, in other embodiments, the communication that the master CPU carries out with the cells in the cellular array through the cellular array bus may also be one of the above cases or a combination of several of them.
It should be noted that the "target area" in the embodiments of the present invention refers to a region, selected by the master CPU or by any cell in the cellular array, that consists of more than one mutually adjacent cell; the cells in this region are the objects to which the master CPU or that cell broadcasts data or instructions. In this embodiment, the target area is illustrated using a rectangular region as an example (a≤x≤b, c≤y≤d, where a and b are the boundary coordinates of the rectangular region along the x-axis of the plane rectangular coordinate system, and c and d are its boundary coordinates along the y-axis). In other embodiments, the target area may also be a region of another shape, for example a diamond, triangular, or hexagonal region.
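The rectangular target area reduces to a simple test on a cell's ID. The sketch below is illustrative only; the function name `in_target_area` and the tuple layout `(a, b, c, d)` are assumptions made for this example.

```python
def in_target_area(cell_id, area):
    """Return True if cell (x, y) satisfies a <= x <= b and c <= y <= d."""
    x, y = cell_id
    a, b, c, d = area
    return a <= x <= b and c <= y <= d


# Example: the cells of a 4 x 4 array that fall in the area 1 <= x <= 2, 0 <= y <= 1.
area = (1, 2, 0, 1)
selected = [(x, y) for y in range(4) for x in range(4) if in_target_area((x, y), area)]
```

Because every cell already stores its own (x, y) as its ID, each cell can evaluate this test locally without consulting any other cell.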
Additionally, the concept of "broadcast" differs from the concept of "mass-sending" in the embodiments of the present invention: in a broadcast, a piece of data or an instruction is sent once and all target objects receive it, whereas in mass-sending the data is sent to the different objects in multiple separate transmissions.
Besides the master CPU communicating with or broadcasting to any cell in the cellular array (including the microprocessor or non-volatile random access memory in the cell), the cellular array also contains a communication network that allows a cell, under the control of its MPU, to send data to the cells adjacent to it. As shown in Fig. 3, within a plane any cell can communicate with its neighbors in four directions: up, down, left, and right. Of course, communication between adjacent cells is not limited to these four directions. Where the layout can support it, eight directions may be used (up, down, left, right, upper left, upper right, lower left, lower right): as shown in Fig. 4, any cell can communicate with its neighbors in these eight directions.
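The two neighbor arrangements of Fig. 3 and Fig. 4 can be written down as offset tables. In this minimal sketch the function name `neighbors` and the convention that cells on the border of the array simply have fewer neighbors are assumptions for illustration.

```python
# Offsets for up, down, left, right (the four-direction mode).
FOUR_DIRECTIONS = [(0, 1), (0, -1), (-1, 0), (1, 0)]
# Extra offsets for upper left, upper right, lower left, lower right.
DIAGONALS = [(-1, 1), (1, 1), (-1, -1), (1, -1)]


def neighbors(x, y, width, height, eight=False):
    """Return the IDs of the cells adjacent to (x, y) inside a width x height array."""
    offsets = FOUR_DIRECTIONS + DIAGONALS if eight else FOUR_DIRECTIONS
    return [
        (x + dx, y + dy)
        for dx, dy in offsets
        if 0 <= x + dx < width and 0 <= y + dy < height
    ]
```

A cell in the interior of the array has four or eight neighbors depending on the mode, while a corner cell has two or three.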
As shown in Fig. 5, in this embodiment a cell in the cellular array may further include a bus controller and a cell internal bus. The bus controller is connected to the cellular array bus, the microprocessor, and the cell internal bus. The bus controller identifies communication between the master CPU and this cell, and either connects the microprocessor, to pass on the instructions or data sent by the master CPU or to allow state reads, or connects the non-volatile random access memory through the cell internal bus, to carry out data reads and writes.
As those skilled in the art know, a fairly simple CPU with good performance, such as the ARM Cortex-M0, has only about 50,000 MOS transistors; even with a modest addition of FPU functionality, that is far fewer than the more than 100 million MOS transistors of a top-end CPU. The area (cost) increase incurred in raising CPU performance is disproportionate to the gain. If one large CPU is replaced with many small CPUs at the same total cost, total computing power necessarily increases many times over. Conventional computer architectures, however, are limited by the communication bottleneck, so the actual performance gain from using a large number of CPU cores is very limited.
The cellular array computing system provided by the technical solution of the present invention solves the communication bottleneck through data broadcast/mass-sending and the internal network, thereby improving the overall performance of the computing system and yielding better cost-effectiveness. This will be seen more clearly in the application examples that follow.
Preliminary research shows that a cell can be formed from an MPU similar to the Cortex-M0 together with 32 KB of memory. Using a 40-nanometer process, 3000 such cells can be built on a single die, which represents very powerful computing capability. Further research shows that in this way the computing capability of a contemporary top-end CPU (usually measured in floating-point operations per second, FLOPS) can be surpassed in the same silicon area. Because the cellular array computing system of the technical solution of the present invention no longer faces the memory-interface bottleneck, its performance will be even better on many practical problems.
Based on the above cellular array computing system, the embodiment of the present invention also provides a communication method in the cellular array computing system, including: an operation in which the master CPU reads and writes the non-volatile random access memory, a communication operation between the master CPU and a microprocessor, a broadcast operation of the master CPU, and a communication operation between adjacent cells in the cellular array.
The operation in which the master CPU reads and writes the non-volatile random access memory specifically includes: any cell in the cellular array receives the target address that the master CPU broadcasts on the cellular array bus and, if it determines that the target address lies within this cell, connects the non-volatile random access memory of this cell so that the master CPU can read or write data.
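The address check each cell performs can be sketched as below. The memory map is an assumption made purely for illustration: each cell is given one contiguous window of `CELL_MEM_BYTES` bytes, with windows laid out row by row. The class name `Cell` and its methods are likewise hypothetical; the text does not prescribe how addresses are assigned to cells.

```python
CELL_MEM_BYTES = 32 * 1024  # assumed per-cell memory size, for illustration only


class Cell:
    """Hypothetical cell that answers bus accesses falling in its own address window."""

    def __init__(self, x, y, array_width):
        self.id = (x, y)
        self.base = (y * array_width + x) * CELL_MEM_BYTES  # assumed row-major layout
        self.nvram = bytearray(CELL_MEM_BYTES)

    def owns(self, address):
        """True if the target address on the bus lies within this cell."""
        return self.base <= address < self.base + CELL_MEM_BYTES

    def bus_write(self, address, value):
        if not self.owns(address):
            return False                       # not this cell: stay disconnected
        self.nvram[address - self.base] = value
        return True

    def bus_read(self, address):
        if not self.owns(address):
            return None
        return self.nvram[address - self.base]
```

Since the windows do not overlap, exactly one cell responds to any address inside the array's range, which mirrors the behavior of every cell seeing the address on the bus and only the owner connecting its memory.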
The communication operation between the master CPU and a microprocessor specifically includes: a first special address field is reserved in the system address space for communication between the master CPU and a microprocessor, and it stores the ID of the target cell; if any cell in the cellular array, on receiving the first special address field, identifies the communication as being with the microprocessor of this cell, it connects the microprocessor of this cell to complete the subsequent command reception, data reception and status read operations.
It should be noted that the system address space is not limited to the sum of the address spaces formed by the non-volatile random access memories contained in the cells of the cellular array, because the memories connected to the cellular array bus need not be only those non-volatile random access memories; other kinds of memory may well also be connected to the cellular array bus for access by the master CPU. The master CPU therefore needs to use the ID of a cell to identify the cell it intends to access (referred to as the "target cell" in the present embodiment).
The broadcast operation of the master CPU specifically includes: a second special address field is reserved in the system address space for the master CPU to broadcast instructions, and it holds the IDs of cells that help determine the extent of a target area in the cellular array; if any cell in the cellular array, after receiving the second special address field, identifies that it lies within the target area, it either connects the microprocessor of this cell in order to pass on the instructions or data sent by the master CPU or to read status, or connects the non-volatile random access memory of this cell in order to read or write data.
The broadcast operation of the master CPU is illustrated below for the case in which the target area is a rectangular area. A section of the system address space is reserved for broadcast instructions, and part of this address is used to store the ID of the starting point cell of the target rectangular area. The starting point cell is the first cell in the target rectangular area accessed by the master CPU. After the bus controller in a cell receives this special address, it receives the following word of data, which contains the ID of the cell at the corner of the target rectangular area diagonally opposite the starting point cell. If the bus controller determines that this cell lies within the area, it receives the second word of data. The second word indicates whether what follows is an instruction or data for the MPU, or is to be written starting from some relative address in the non-volatile random access memory. In the former case the MPU is connected; in the latter case the non-volatile random access memory is connected, and the subsequent operation is completed.
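The rectangular-area check performed by the bus controller can be sketched as follows. This is only an illustrative sketch: it assumes that a cell ID can be read as a (row, column) pair, and the names `CellId`, `in_target_rectangle` and `handle_broadcast`, as well as the string values used for the second word, are hypothetical and do not come from the original text.

```python
from typing import NamedTuple


class CellId(NamedTuple):
    # Assumption: a cell ID encodes the cell's row and column in the 2-D array.
    row: int
    col: int


def in_target_rectangle(cell: CellId, start: CellId, diagonal: CellId) -> bool:
    """Return True if `cell` lies in the rectangle spanned by the two corner cells."""
    row_lo, row_hi = sorted((start.row, diagonal.row))
    col_lo, col_hi = sorted((start.col, diagonal.col))
    return row_lo <= cell.row <= row_hi and col_lo <= cell.col <= col_hi


def handle_broadcast(cell: CellId, start: CellId, diagonal: CellId, second_word: str) -> str:
    """Decide what a cell's bus controller connects for one broadcast.

    `second_word` is a stand-in for the second data word: "mpu" means an
    instruction or data for the MPU, anything else means a memory write.
    """
    if not in_target_rectangle(cell, start, diagonal):
        return "ignore"  # cell is outside the target area
    return "connect_mpu" if second_word == "mpu" else "connect_nvram"
```

Sorting the two corner coordinates lets the check work regardless of which corner the master CPU names as the starting point cell.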
It should be noted that when the storage space of the second special address field is limited, the cell ID it stores may not by itself fully determine the extent of the target area; in that case, after the second special address field is received, subsequent data must also be received and combined with the cell ID stored in the second special address field to determine the extent of the target area.
The communication operation between adjacent cells in the cellular array includes: any cell in the cellular array sends data to an adjacent cell under the control of its microprocessor.
In the present embodiment, each cell is provided with a bus controller connected to the cellular array bus. A cell-internal bus is arranged inside the cell; the non-volatile random access memory is a slave device (Slave) on this cell-internal bus, and the bus controller and the microprocessor are master devices (Master).
In the communication method described above, the operations in which any cell in the cellular array "determines whether the destination address lies within this cell", "identifies whether the communication is with the microprocessor of this cell", "identifies whether this cell lies within the target area", and "connects the non-volatile random access memory or the microprocessor" are all performed by the bus controller, which connects to the non-volatile random access memory through the cell-internal bus.
In a specific implementation, a read/write operation by the master CPU on the non-volatile random access memory of any cell in the cellular array has higher priority than a read/write operation by the microprocessor of that cell on the same memory. That is, if the microprocessor in a cell needs to read or write the non-volatile random access memory of that cell, it must wait until the master CPU has completed its read/write operation on that memory.
For the specific implementation of the communication method in the cellular array computing system, reference may be made to the implementation of the cellular array computing system described above; it is not repeated here.
In the present embodiment, the internal network of the cellular array not only allows data to be sent to an adjacent cell but is also extended so that data can be sent from one cell to any other cell; that is, cell-cell communication within the cellular array can be realized.
Specifically, any two cells in the cellular array of the cellular array computing system can communicate without relying on the master CPU. The cells participating in cell-cell communication comprise a starting point cell, an end point cell and transfer cells: the starting point cell is the cell that sends data to the end point cell; the end point cell is the cell that finally receives the data sent by the starting point cell; the transfer cells are the cells that are successively adjacent along the cell-cell communication path and relay the data sent by the starting point cell through the communication interfaces; and the cell-cell communication path is the data transmit/receive path formed by the starting point cell, the transfer cells and the end point cell.
Through the communication interfaces between adjacent cells in the cellular array, data is relayed multiple times between adjacent cells, so that any two cells in the cellular array can communicate without relying on the master CPU. This improves the efficiency of cell-cell communication and reduces the processing load of the master CPU, thereby further improving the overall performance of the computing system.
It should be noted that the starting point cell, end point cell and transfer cell are all relative concepts defined with respect to one particular cell-cell communication process: a starting point cell may well act as a transfer cell or end point cell in other cell-cell communication processes, and an end point cell may likewise act as a transfer cell or starting point cell in other cell-cell communication processes.
In a specific implementation, a cell in the cellular array may further include a network controller connected to the microprocessor. During cell-cell communication the network controller performs transmit/receive control over the data sent, the data relayed, or the data finally received, and it also sends interrupt signals to the microprocessor. In the present embodiment, a network controller is provided in each cell so that data can be relayed quickly without disturbing the MPU, which reduces the processing load of the MPU in the cell. In other embodiments, the network controller may be omitted and the relaying of data performed by the MPU instead.
In the present embodiment, "data sent" refers to data that the starting point cell itself sends out; "data relayed" refers to data from the starting point cell that a transfer cell forwards, which the transfer cell itself did not need to send; "data finally received" refers to data received by the end point cell, which has reached its destination after multiple relays and will not be relayed further. The "data sent", "data relayed" and "data finally received" may well be identical in content; they are merely different designations for the data at different stages of communication.
In a specific implementation, a cell in the cellular array may further include one or more groups of first-in-first-out queues (FIFO, First Input First Output) connected to the network controller, each group corresponding to one cell adjacent to this cell. Each group of FIFO queues includes an input FIFO queue and an output FIFO queue: the input FIFO queue stores data entering this cell that is to be relayed or that is finally received by this cell, and the output FIFO queue stores data leaving this cell that is to be relayed or that this cell sends to other cells.
For the communication mode between adjacent cells shown in Fig. 3, the structure of a cell that performs cell-cell communication in the cellular array of the present embodiment is shown in Fig. 6. The network controller in Fig. 6 is connected to the MPU and to 4 groups of FIFO queues, and each group of FIFO queues corresponds one-to-one with the cell adjacent to this cell in one of the four directions (up, down, left, right) of the two-dimensional plane. In a specific implementation, the communication channel between each pair of adjacent cells may share one corresponding group of FIFO queues. Each group of FIFO queues comprises an input FIFO and an output FIFO. From the point of view of one cell, the input FIFO holds data arriving from an adjacent cell and the output FIFO holds data going from this cell to an adjacent cell; the output FIFO of an adjacent cell is an input FIFO for this cell, and the output FIFO of this cell is an input FIFO for the adjacent cell.
It should be noted that the cell shown in Fig. 6 has 4 groups of FIFO queues. A cell located at one of the 4 corners of a rectangular cellular array has only two adjacent cells and therefore 2 groups of FIFO queues; a cell located on one of the 4 edges of a rectangular cellular array has three adjacent cells and therefore 3 groups of FIFO queues.
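The number of FIFO groups per cell follows directly from the number of neighbours in the four directions of Fig. 3. A minimal sketch, assuming cells are indexed by zero-based (row, column) coordinates in a rectangular array (the function name `fifo_group_count` is hypothetical):

```python
def fifo_group_count(row: int, col: int, rows: int, cols: int) -> int:
    """Count the up/down/left/right neighbours of a cell in a rows x cols array.

    Under the one-group-per-neighbour arrangement described above, this is
    also the number of FIFO groups the cell needs.
    """
    neighbours = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    # A neighbour exists only if its coordinates fall inside the array.
    return sum(0 <= r < rows and 0 <= c < cols for r, c in neighbours)
```

For an array of at least 3 x 3 cells this gives 2 for corner cells, 3 for edge cells and 4 for interior cells, matching the description.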
In the present embodiment, the network controller is also connected to the MPU of the cell and sends it interrupt signals, for example FIFO empty, FIFO full, new data arrived, or data sent. The MPU, in turn, can send data through the network controller, and the data to be sent is usually first placed in the corresponding output FIFO queue.
It should be noted that the cell structure in Fig. 6 shows only the modules relevant to cell-cell communication; those skilled in the art will appreciate that the cell structure shown in Fig. 6 can be fully combined with the cell structure shown in Fig. 5.
In addition, the present embodiment uses FIFO queues to store the data entering and leaving a cell, which makes data relaying during cell-cell communication more efficient and reduces the processing load of the MPU. In other embodiments, the data entering and leaving a cell may instead be held in registers.
An embodiment of the present invention further provides a method of communication between cells in the above cellular array computing system, including: the starting point cell in the cellular array sends the data intended for the end point cell to a cell adjacent to the starting point cell in a selected sending direction; when any cell in the cellular array receives data sent or relayed by an adjacent cell, if it determines from the ID of the end point cell indicated in the received data that this cell is the end point cell, it stores the received data in the non-volatile random access memory of this cell or notifies the microprocessor of this cell to process the received data; otherwise this cell acts as a transfer cell and, after selecting a sending direction, relays the received data to a cell adjacent to this cell.
In a specific implementation, every piece of data involved in cell-cell communication contains the IDs of the starting point cell and the end point cell, and any cell can determine from the ID of the end point cell indicated in the received data whether the data is intended for this cell or needs to be relayed further to another adjacent cell. A piece of data reaches the end point cell after multiple relays over the connections between adjacent cells. If the end point cell needs to give feedback on the data sent by the starting point cell, it can send feedback data to the starting point cell according to the ID of the starting point cell: the end point cell takes the ID of the starting point cell indicated in the received data as the ID of the end point cell and marks it in the feedback data obtained by processing the received data. The end point cell thereby becomes the starting point cell of a new cell-cell communication, and the original starting point cell becomes the end point cell of that communication.
In a specific implementation, in addition to the ID of the end point cell, the data sent by the starting point cell to the end point cell also indicates either the address to be accessed in the end point cell or the MPU. Storing the received data in the non-volatile random access memory of the cell is performed after the end point cell identifies the address to be accessed that is indicated in the received data; notifying the MPU of the cell to process the received data is performed after the end point cell identifies the MPU indicated in the received data.
In an actual implementation, if the end point cell identifies an address to be accessed in the received data, the network controller in the end point cell can write the received data directly to the corresponding address in the non-volatile random access memory of that cell; in this case cells can "reproduce", with one cell downloading content to another cell. If the end point cell identifies the MPU in the received data, the received data is handed to the MPU in the end point cell for processing.
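The handling of data at the end point cell, together with the feedback path described earlier, can be sketched as follows. This is a simplified model under stated assumptions, not the patent's actual data format: the `Packet` fields, the convention that `address=None` means "deliver to the MPU", the 64-byte memory size and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Packet:
    src: Tuple[int, int]            # ID of the starting point cell
    dst: Tuple[int, int]            # ID of the end point cell
    payload: bytes
    address: Optional[int] = None   # assumption: None means "hand to the MPU"


@dataclass
class Cell:
    cell_id: Tuple[int, int]
    nvram: bytearray = field(default_factory=lambda: bytearray(64))
    mpu_inbox: List[bytes] = field(default_factory=list)

    def deliver(self, pkt: Packet) -> str:
        """Model the network controller's decision for one received packet."""
        if pkt.dst != self.cell_id:
            return "relay"          # this cell is only a transfer cell
        if pkt.address is not None:
            # Direct write into the cell's memory, without involving the MPU.
            end = pkt.address + len(pkt.payload)
            self.nvram[pkt.address:end] = pkt.payload
            return "stored"
        self.mpu_inbox.append(pkt.payload)  # stands in for an interrupt to the MPU
        return "mpu"


def make_reply(pkt: Packet, payload: bytes) -> Packet:
    """Build feedback data: the old starting point cell becomes the end point cell."""
    return Packet(src=pkt.dst, dst=pkt.src, payload=payload)
```

Swapping the two IDs in `make_reply` mirrors the role reversal described for feedback, in which the original end point cell becomes the starting point cell of a new communication.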
In the present embodiment, because each cell in the cellular array also includes a network controller connected to the MPU, the following are all carried out under the control of the network controller: the starting point cell sending data to the end point cell; any cell in the cellular array receiving data sent or relayed by an adjacent cell and determining whether it is the end point cell or a transfer cell; and storing the received data in the non-volatile random access memory of the cell or notifying the MPU of the cell to process the received data.
In a specific implementation, the data that the starting point cell sends to the end point cell is first placed in the output FIFO queue by the network controller and then output by the network controller from the output FIFO queue to the cell adjacent to the starting point cell. If any cell in the cellular array receives data sent or relayed by an adjacent cell, the received data is placed in the input FIFO queue and, when it is determined that the received data needs to be relayed, the data is then placed in the output FIFO queue.
In addition, if the network controller determines that an input FIFO queue or an output FIFO queue is empty or full, or that data sent or relayed by an adjacent cell has been received, or that data has been sent or relayed to an adjacent cell, it sends an interrupt signal to the microprocessor.
In a specific implementation, the starting point cell or a transfer cell may select the sending direction as follows: if a straight-line communication path can be formed between the starting point cell or transfer cell and the end point cell, the sending direction is the direction from the starting point cell or transfer cell along that straight line towards the end point cell; otherwise the sending direction is the direction from the starting point cell or transfer cell towards a candidate adjacent cell, a candidate adjacent cell being one of the cells adjacent to the starting point cell or transfer cell that is closer to the end point cell. There may of course be two candidate adjacent cells; in that case, the one with fewer pending data-output communication tasks is selected as the transfer cell.
In the present embodiment, the selection of the sending direction by the starting point cell or transfer cell in the above manner can also be regarded as the path selection process for cell-cell communication in the cellular array. Referring to Fig. 7, each rectangle in Fig. 7 represents a cell in the cellular array; the cells shown in Fig. 7 are only part of the whole cellular array, and adjacent cells are assumed to communicate in the manner shown in Fig. 3.
Suppose point A represents a starting point cell that is to send data to the end point cell at point C. Because a straight-line communication path can clearly be formed between point A and point C, the cell at point A sends the data to the adjacent cell at point B. Likewise, the cell at point B, acting as a transfer cell, continues to relay the data along the straight line between point A and point C towards the cell at point C. The successively adjacent cells on the cell-cell communication path formed between point A and point C forward the data sent by the cell at point A repeatedly until it reaches the cell at point C.
Suppose point D represents another starting point cell that is to send data to the end point cell at point G. Because a straight-line communication path clearly cannot be formed between point D and point G, and because, among the cells adjacent to the cell at point D, the cell at point E and the cell at point F are clearly closer to the end point cell at point G, these two cells are the candidate adjacent cells of the cell at point D. The one with fewer pending data-output communication tasks may be selected as the transfer cell; if the two cells have the same data-output communication load, one of them is chosen at random as the transfer cell. As shown in Fig. 7, selecting the cell at point E or the cell at point F results in different cell-cell communication paths.
It should be noted that the present embodiment illustrates path selection for cell-cell communication using the communication mode between adjacent cells shown in Fig. 3. Those skilled in the art will understand that if the communication mode between adjacent cells shown in Fig. 4 is adopted, more sending directions will be available for selection.
In summary, in an actual implementation, the network controller of every cell that sends or relays data must select one adjacent cell as the next stop. When the starting point and end point lie on a straight line, there is generally only one reasonable choice; in other cases there are two equally reasonable choices, and the network controller selects the neighbour whose traffic is less busy.
If data arrives in an input FIFO queue, the network controller first examines it:
If the end point is this cell, then: if the end point is a specific relative address, the network controller, which has direct memory access (DMA, Direct Memory Access) capability, stores the received data directly at the corresponding address in the non-volatile random access memory and notifies the MPU with an interrupt; if the end point is the MPU, it notifies the MPU directly with an interrupt signal so that the MPU processes the data.
If the end point is another cell, or if the MPU of this cell is sending out data, then: if the end point lies on a straight line with this cell, the correct direction is selected and the data is sent to the adjacent cell in that direction; in other cases there are two possible directions, and the candidate adjacent cell whose output FIFO queue is less occupied is selected. If the output FIFO queues of the two candidate adjacent cells are in the same state, one of them may be chosen at random.
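The direction-selection rule can be sketched as a small routing function. The sketch assumes cells are addressed by (row, column) coordinates with the four-neighbour connectivity of Fig. 3, and it models "how busy a candidate is" as a single number per candidate cell supplied in `out_fifo_load`; that dictionary and the function names are hypothetical illustrations.

```python
import random


def candidate_hops(cur, dst):
    """Return the neighbours of `cur` that are one step closer to `dst`."""
    (r, c), (dr, dc) = cur, dst
    hops = []
    if dr != r:
        hops.append((r + (1 if dr > r else -1), c))
    if dc != c:
        hops.append((r, c + (1 if dc > c else -1)))
    return hops  # one entry on a straight line, two otherwise, none at dst


def next_hop(cur, dst, out_fifo_load, rng=random):
    """Pick the next cell: the only candidate, or the less loaded of two."""
    hops = candidate_hops(cur, dst)
    if not hops:
        return None                      # already at the end point cell
    if len(hops) == 1:
        return hops[0]                   # straight-line case
    loads = [out_fifo_load.get(h, 0) for h in hops]
    if loads[0] == loads[1]:
        return rng.choice(hops)          # equal load: choose at random
    return hops[loads.index(min(loads))]


def route(src, dst, out_fifo_load=None):
    """Follow next_hop repeatedly and return the full cell-to-cell path."""
    load = out_fifo_load or {}
    path = [src]
    while path[-1] != dst:
        path.append(next_hop(path[-1], dst, load))
    return path
```

Because every hop moves one step closer to the end point, the path length always equals the row distance plus the column distance, whichever candidate is chosen at each step.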
In an actual implementation, when the thousands of MPUs in the cellular array compute together, how to deliver the output data of each cell to the master CPU becomes a problem. In general, each MPU can leave its output data at an agreed address in the non-volatile random access memory of its cell and let the master CPU read it by polling the MPUs one by one. This is not suitable for every problem, however: in some problems only a few cells in the cellular array need to output data to the master CPU, and polling every MPU one by one is then too inefficient.
Therefore, in the cellular array computing system provided by the embodiment of the present invention, at least one full-time output cell is additionally provided in the cellular array. The full-time output cell, acting as the end point cell, receives and stores the data that other cells output to the master CPU, and notifies the master CPU with an interrupt signal to read the output data.
In a specific implementation, a FIFO queue may also be set up in the non-volatile random access memory of the full-time output cell, and all the data output by other cells to the master CPU is stored in this FIFO queue. The FIFO queue should have enough storage space to hold all the data output by the other cells to the master CPU.
In an actual implementation, one or several cells in the cellular array may be selected as full-time output cells, generally cells whose position makes communication with the master CPU more convenient. An interrupt line is provided between the full-time output cell and the master CPU, and the full-time output cell can send interrupt signals to the master CPU, for example when new output data arrives from other cells, when the FIFO queue set up in the MRAM is full, or when the FIFO queue set up in the MRAM is empty.
Based on the above cellular array computing system provided with a full-time output cell, an embodiment of the present invention further provides a communication method in a cellular array computing system, including: after the full-time output cell receives and stores data output by other cells to the master CPU, it sends the master CPU an interrupt signal notifying it to read; after receiving the interrupt signal notifying it to read, the master CPU reads the output data from the full-time output cell.
In a specific implementation, the other cells may send the output data to the full-time output cell as follows: any one of the other cells, acting as the starting point cell, sends the output data to an adjacent cell in a selected sending direction; when any cell in the cellular array receives output data sent by an adjacent cell, if it determines that the ID of the end point cell indicated in the output data matches its own ID, then, since the ID of the end point cell indicated in the output data is the ID of the full-time output cell, this cell is the full-time output cell and stores the output data in its non-volatile random access memory; otherwise this cell acts as a transfer cell and, after selecting a sending direction, relays the output data to a cell adjacent to it.
While the other cells are sending the output data to the full-time output cell, the starting point cell or a transfer cell may select the sending direction as follows: if a straight-line communication path can be formed between the starting point cell or transfer cell and the full-time output cell, the sending direction is the direction from the starting point cell or transfer cell along that straight line towards the full-time output cell; otherwise the sending direction is the direction from the starting point cell or transfer cell towards a candidate adjacent cell, a candidate adjacent cell being one of the cells adjacent to the starting point cell or transfer cell that is closer to the full-time output cell.
The implementation of the full-time output cell of the embodiment of the present invention can also be seen in Fig. 8. Fig. 8 shows the master CPU, the cellular array and the cellular array bus; each small box in the cellular array simply represents one cell, and the cell at point J (the cell drawn with a bold border) is the full-time output cell. Fig. 8 further shows the structure of the full-time output cell, as indicated by the dashed arrows in Fig. 8: the MRAM in the full-time output cell is provided with a FIFO queue that stores all the data output by other cells to the master CPU.
Suppose the cell at point H and the cell at point I need to provide output data to the master CPU. The output data can then be sent to the cell at point J by means of communication between cells; the cell-cell communication path from point H to point J and the cell-cell communication path from point I to point J are shown in Fig. 8. Because the cell-cell communication mode has been described in detail above, it is not repeated here.
After the cell at point J receives the output data sent by the cell at point H or the cell at point I, it can send the master CPU an interrupt signal notifying it to read; after receiving this interrupt signal, the master CPU can read the output data from the cell at point J through the cellular array bus.
By providing a full-time output cell in the cellular array, which as the end point cell receives and stores the data output by other cells to the master CPU and notifies the master CPU by interrupt signal to read that output data, the efficiency with which the master CPU reads output data is improved when only a few cells need to output data to the master CPU.
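The behaviour of a full-time output cell can be modelled with a bounded queue and an interrupt callback. This is a hedged sketch only: the class name `OutputCell`, the `capacity` parameter, the interrupt names and the choice to reject data when the queue is full are assumptions made for illustration, since the text states only that the queue should be large enough and that interrupts are raised.

```python
from collections import deque


class OutputCell:
    """Toy model of a full-time output cell holding a FIFO of output data."""

    def __init__(self, capacity, raise_interrupt):
        self.fifo = deque()
        self.capacity = capacity
        # Callback standing in for the interrupt line to the master CPU.
        self.raise_interrupt = raise_interrupt

    def receive(self, src, data):
        """Store output data from cell `src` and notify the master CPU."""
        if len(self.fifo) >= self.capacity:
            self.raise_interrupt("fifo_full")
            return False                 # assumption: data is not accepted when full
        self.fifo.append((src, data))
        self.raise_interrupt("new_output")
        return True

    def drain(self):
        """Model the master CPU reading all stored output over the array bus."""
        items = list(self.fifo)
        self.fifo.clear()
        return items
```

With this arrangement the master CPU reads only when notified, instead of polling every cell, which is the efficiency gain described above.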
Another example of applying the above cellular array computing system is described below.
Speech recognition can be performed by comparing the input sound signal against a known sound library, and this comparison can be carried out in the time domain or in the frequency domain. As the number of words to be compared grows, for example reaching tens of thousands once different accents are taken into account, the computing capability of only a few CPUs is insufficient for real-time speech recognition.
The cellular array computing system provided by the embodiment of the present invention is particularly well suited to solving problems of this kind.
To this end, an embodiment of the present invention further provides a method of performing data comparison using the above cellular array computing system, including: after selecting all the cells in the cellular array or the cells in a target area, the master CPU broadcasts the comparison program into the non-volatile random access memory of each selected cell; the master CPU writes the sample that each selected cell is responsible for comparing to the agreed address in that cell; the master CPU broadcasts an instruction to the microprocessors of the selected cells so that each microprocessor, after completing initialization, waits for the data to be compared; the master CPU broadcasts the data to be compared to the microprocessors of the selected cells; the microprocessor of each selected cell runs the comparison program and compares the received data with the sample that the cell is responsible for and, if the two are found to match, uses the communication method in the above cellular array computing system to send the comparison result as output data to the full-time output cell for the master CPU to read.
In a specific implementation, the data to be compared may be speech data to be recognized, image data to be recognized, or any other data that needs to be compared.
In an actual implementation, each MPU continuously receives speech data and performs the comparison. Under normal circumstances, among hundreds to thousands of cells, only one or a few find that the data to be compared matches the sample they are responsible for; these cells send their comparison results to the full-time output cell, which notifies the master CPU by interrupt signal to receive them.
If the data to be compared is speech data, the comparison may be carried out in the time domain or in the frequency domain. In the latter case, the master CPU may first apply a fast Fourier transform (FFT, Fast Fourier Transformation) segment by segment and then broadcast the speech data converted to the frequency domain to the MPUs of the selected cells.
Performing data comparison with the above cellular array computing system provided with a full-time output cell allows a large number of cells in the cellular array to run the comparison program at the same time. The system therefore has very strong parallel processing capability and solves the communication bottleneck between the CPU and memory found in the prior art, so that the capability for real-time speech/image recognition is greatly improved.
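The data flow of the comparison method can be illustrated with a small sequential simulation. In the real system each cell compares in parallel on its own MPU; the loop below only models which results reach the full-time output cell. The function name `broadcast_compare`, the dictionary of samples keyed by a hypothetical (row, column) cell ID, and the default exact-equality comparison are all assumptions for illustration.

```python
def broadcast_compare(samples, query, compare=lambda sample, data: sample == data):
    """Simulate one broadcast of `query` to every selected cell.

    `samples` maps a cell ID to the sample that cell is responsible for.
    Returns the results that would be sent to the full-time output cell,
    as (cell_id, sample) pairs, in the iteration order of `samples`.
    """
    output_fifo = []
    for cell_id, sample in samples.items():   # conceptually runs in parallel
        if compare(sample, query):
            # Only matching cells send output data; the rest stay silent.
            output_fifo.append((cell_id, sample))
    return output_fifo
```

A real comparison program would use a similarity measure on time-domain or frequency-domain data rather than exact equality; the `compare` parameter is left open for that reason.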
As mentioned above, there is a simple way for a cell to mass-send information to a target area of the cellular array: the master CPU reads the information and then broadcasts it. The present embodiment also provides another implementation, in which the point-to-point communication function between cells is extended to mass-sending to an area; this approach can support a higher degree of parallelism and a much higher total bandwidth.
In the cellular array computing system provided by the present embodiment, any cell in the cellular array can also, as the starting point cell, carry out mass-sending communication to all the cells in a target area. A cell that participates in the mass-sending communication and is located within the target area acts as the starting point cell, or as an end point cell, or as both a transfer cell and an end point cell at the same time; a cell that participates in the mass-sending communication and is located outside the target area acts as the starting point cell or as a transfer cell.
In a specific implementation, the network controller connected to the microprocessor in each cell, in addition to handling cell-cell communication between any two cells, also performs transmit/receive control over the data sent, the data relayed, or the data finally received during mass-sending communication; the network controller also sends interrupt signals to the microprocessor.
In an actual implementation, the original sender of an intercellular mass-sending communication (the cell acting as the starting point cell in the cellular array) is responsible for indicating the target area, and the mass-sending of the data is still completed through a series of relays. Those skilled in the art will understand that intercellular mass-sending communication can also be regarded as an efficient superposition of multiple point-to-point communications between cells, so for its specific implementation reference may be made to the implementation of communication between any two cells; for example, the cells in the cellular array may likewise include one or more groups of FIFO queues connected to the network controller, as mentioned above. This is not repeated here.
On the basis that the above cellular array computing system supports intercellular mass-sending communication, the embodiment of the present invention also provides an intercellular mass-sending communication method for that system, as follows. When any cell in the cellular array, acting as the starting-point cell, initiates mass-sending communication to all cells in a target region: if the starting-point cell is located inside the target region, it sends the intercellular mass-sending data to all adjacent cells located inside the target region and updates the target region for each adjacent cell; otherwise, it sends the intercellular mass-sending data to an adjacent cell in the direction approaching the target region. If a cell located outside the target region receives the intercellular mass-sending data sent by an adjacent cell, then, after determining that the target region indicated in the data does not contain it, the cell acts as a transfer cell and transfers the data to an adjacent cell in the direction approaching the target region. If a cell located inside the target region receives the intercellular mass-sending data sent by an adjacent cell, then, after determining that the target region indicated in the data contains it, the cell acts as an end-point cell and either stores the received data in its non-volatile random access memory or notifies its microprocessor to process the data; if cells adjacent to this cell still exist in the target region, the cell also acts as a transfer cell, transfers the received data to all adjacent cells located inside the target region, and updates the target region for each adjacent cell. The updated target region consists of one or more target regions obtained by dividing the target region before the update; each adjacent cell of the cell that sent or transferred the data within the pre-update target region is contained in its own updated target region, and the cell that has already sent or transferred the data is excluded from the updated target regions.
It should be noted that the master CPU can also broadcast the data of a cell to a target region in the cellular array. To distinguish it from the "broadcast data of the master CPU", the data involved in intercellular mass-sending communication is called "intercellular mass-sending data" in the present embodiment. The cell that initiates the intercellular mass-sending communication specifies the target region: the IDs of all cells in the target region, or the range of those IDs, are indicated in the intercellular mass-sending data. Any cell that receives the data can then judge, from the target region indicated in it, whether the data is to be finally received by this cell, needs to be passed on to other adjacent cells, or both.
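The receive-side decision described above can be sketched as follows. This is only an illustration, assuming the target region is a rectangle indicated by inclusive ranges of row and column IDs; the names `Region`, `contains`, `size`, and `roles_on_receive` are hypothetical and do not come from the patent.

```python
from typing import NamedTuple, Set


class Region(NamedTuple):
    """Rectangular target region as inclusive row/column ID ranges (assumed encoding)."""
    top: int
    left: int
    bottom: int
    right: int

    def contains(self, row: int, col: int) -> bool:
        return self.top <= row <= self.bottom and self.left <= col <= self.right

    def size(self) -> int:
        return (self.bottom - self.top + 1) * (self.right - self.left + 1)


def roles_on_receive(region: Region, row: int, col: int) -> Set[str]:
    """Decide what a cell does with mass-sending data that indicates `region`."""
    if not region.contains(row, col):
        return {"transfer"}      # outside the region: only pass the data on
    roles = {"end"}              # inside: store the data or notify the microprocessor
    if region.size() > 1:
        roles.add("transfer")    # other cells in the region still need the data
    return roles
```

In a rectangle holding more than one cell, every cell has at least one adjacent cell inside the region, which is why the size check is enough to decide whether the cell must also transfer.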
In addition, updating the target region for each adjacent cell specifically means dividing the pre-update target region into one or more target regions (the cell that has already sent or transferred the intercellular mass-sending data is excluded from the updated target regions). Each updated target region contains one of the adjacent cells (that is, a cell adjacent to the cell that sent or transferred the data within the pre-update target region), and each adjacent cell continues the intercellular mass-sending communication within its own updated target region. Correspondingly, the target region indicated in the intercellular mass-sending data is updated as well.
The present embodiment is described using the communication mode between adjacent cells shown in Fig. 3 and a rectangular target region determined by the starting-point cell that initiates the mass-sending communication. It should be noted that the intercellular mass-sending communication mode given in the present embodiment is a convenient and efficient choice for actual implementation; those skilled in the art will understand that, in other embodiments, the intercellular mass-sending communication method in the above cellular array computing system is equally applicable to other communication modes between adjacent cells or to target regions of other shapes.
In a specific implementation, the way data is sent or transferred differs depending on the position of the cell acting as the starting-point cell or transfer cell.
When a first cell acting as the starting-point cell or transfer cell is located on a corner of the rectangular target region: if, of the two adjacent sides of the rectangular target region that both contain the first cell, one side contains only 1 cell, the updated target region is the rectangular region formed by excluding the first cell from the other of the two adjacent sides; otherwise, the updated target region consists of two rectangular target regions, one of which is the rectangular region formed by excluding the first cell from either one of the two adjacent sides. It should be noted that "first cell" in the present embodiment is a general term for the class of cells located on a corner of the rectangular target region.
Referring to Fig. 9, assume that the cell at point K is the starting-point cell that initiates the intercellular mass-sending communication, or a transfer cell responsible for transferring the intercellular mass-sending data, and that rectangular target region 101 is the target region determined before the cell at point K sends or transfers the data. The cell at point K is inside rectangular target region 101 and on one of its corners. Because the horizontal side of rectangular target region 101 contains only 1 cell, the cell at point K has only one neighbor that can be chosen as the next transfer hop. The network controller of this cell therefore sends the intercellular mass-sending data to the cell at point L and updates rectangular target region 101; the updated target region is rectangular target region 102, which is equivalent to rectangular target region 101 with the cell at point K excluded. As the target region is updated repeatedly, transfer stops once only the last cell remains in the target region.
Assume that the cell at point M is likewise the starting-point cell that initiates the intercellular mass-sending communication, or a transfer cell responsible for transferring the intercellular mass-sending data, and that rectangular target region 103 is the target region determined before the cell at point M sends or transfers the data. The cell at point M is inside rectangular target region 103 and on one of its corners. Because both adjacent sides of rectangular target region 103 contain more than 1 cell, the cell at point M has two neighbors that can be chosen as the next transfer hop. The network controller of this cell therefore sends the intercellular mass-sending data to the cell at point N and the cell at point O and updates rectangular target region 103. The updated target region consists of two rectangular target regions, rectangular target region 104 and rectangular target region 105, which is equivalent to rectangular target region 103 with the cell at point M excluded. Rectangular target regions 104 and 105 can each continue the data transfer as independent target regions in a manner similar to that described above. As the target regions are updated repeatedly, transfer stops once only the last cell remains in a target region.
When a second cell acting as the starting-point cell or transfer cell is located on a side of the rectangular target region: if the sides of the rectangular target region adjacent to the side on which the second cell lies contain only 1 cell, the updated target region consists of the two rectangular target regions formed by excluding the second cell from the side on which it lies; otherwise, the updated target region consists of three rectangular target regions, two of which are the two rectangular regions formed by excluding the second cell from the side on which it lies. It should be noted that "second cell" in the present embodiment is a general term for the class of cells located on a side of the rectangular target region.
Referring to Fig. 10, assume that the cell at point P is the starting-point cell that initiates the intercellular mass-sending communication, or a transfer cell responsible for transferring the intercellular mass-sending data, and that rectangular target region 111 is the target region determined before the cell at point P sends or transfers the data. The cell at point P is inside rectangular target region 111 and on one of its sides. Because the sides of rectangular target region 111 adjacent to the side on which the cell at point P lies contain more than 1 cell, the cell at point P has three neighbors that can be chosen as the next transfer hop. The network controller of this cell sends the intercellular mass-sending data to the cell at point Q, the cell at point R, and the cell at point S, and updates rectangular target region 111. The updated target region consists of three rectangular target regions, namely rectangular target regions 112, 113, and 114, which is equivalent to rectangular target region 111 with the cell at point P excluded. Rectangular target regions 112 and 113 are the two rectangular regions formed by excluding the cell at point P from the side on which it lies. Rectangular target regions 112, 113, and 114 can each continue the data transfer as independent target regions in a manner similar to that described above. As the target regions are updated repeatedly, transfer stops once only the last cell remains in a target region.
It can be understood that, if the sides of the target region (not shown in Fig. 10) adjacent to the side on which the cell at point P lies contain only 1 cell, the cell at point P has two neighbors that can be chosen as the next transfer hop. The network controller of this cell sends the intercellular mass-sending data to the cell at point Q and the cell at point R and updates the target region; the updated target region consists of two rectangular target regions, specifically rectangular target regions 112 and 113.
When a third cell acting as the starting-point cell is located in the interior of the rectangular target region, the updated target region consists of four rectangular target regions: two of them are the two rectangular regions formed by excluding the third cell from the row or column in which it lies, and the other two are the two rectangular regions formed by dividing the pre-update rectangular target region along that row or column. It should be noted that "third cell" in the present embodiment is a general term for the class of cells located in the interior of the rectangular target region, where the interior of the rectangular target region means the region other than its corners and sides.
Referring to Fig. 11, assume that the cell at point T is the starting-point cell that initiates the intercellular mass-sending communication (in the present embodiment, the cell at point T cannot be a transfer cell responsible for transferring the intercellular mass-sending data), and that rectangular target region 121 is the target region determined before the cell at point T sends the data. The cell at point T is located in the interior of rectangular target region 121 and has four neighbors that can be chosen as the next transfer hop. The network controller of this cell sends the intercellular mass-sending data to the cell at point U, the cell at point V, the cell at point W, and the cell at point X, and updates rectangular target region 121. The updated target region consists of four rectangular target regions, namely rectangular target regions 122, 123, 124, and 125, which is equivalent to rectangular target region 121 with the cell at point T excluded. Rectangular target regions 122 and 123 are the two rectangular regions formed by excluding the cell at point T from the row in which it lies; rectangular target regions 124 and 125 are the two rectangular regions formed by dividing rectangular target region 121 along that row. Rectangular target regions 122, 123, 124, and 125 can each continue the data transfer as independent target regions in a manner similar to that described above. As the target regions are updated repeatedly, transfer stops once only the last cell remains in a target region.
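The corner, side, and interior cases above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented implementation: regions are tuples of inclusive grid coordinates, the function names `split_region` and `mass_send` are hypothetical, and the choice of splitting along the column only for cells on a left or right side is one way to match the cases described.

```python
from collections import Counter, deque
from typing import List, Tuple

Rect = Tuple[int, int, int, int]   # (top, left, bottom, right), inclusive (assumed encoding)
Cell = Tuple[int, int]             # (row, col)


def split_region(rect: Rect, cell: Cell) -> List[Tuple[Cell, Rect]]:
    """Return (adjacent cell, updated region) pairs for a cell inside `rect`.

    The cell itself is excluded; each non-empty sub-rectangle is handed to the
    adjacent cell that it contains.
    """
    top, left, bottom, right = rect
    r, c = cell
    if not (top <= r <= bottom and left <= c <= right):
        raise ValueError("cell must lie inside the target region")
    if c in (left, right) and r not in (top, bottom):
        # Cell on a left/right side (not a corner): split along its column.
        parts = [
            ((r - 1, c), (top, c, r - 1, c)),            # rest of the side, above
            ((r + 1, c), (r + 1, c, bottom, c)),         # rest of the side, below
            ((r, c - 1), (top, left, bottom, c - 1)),    # block to the left
            ((r, c + 1), (top, c + 1, bottom, right)),   # block to the right
        ]
    else:
        # Corner, top/bottom side, or interior: split along the cell's row.
        parts = [
            ((r, c - 1), (r, left, r, c - 1)),           # rest of the row, left
            ((r, c + 1), (r, c + 1, r, right)),          # rest of the row, right
            ((r - 1, c), (top, left, r - 1, right)),     # block above
            ((r + 1, c), (r + 1, left, bottom, right)),  # block below
        ]
    # Drop empty rectangles (a range whose start lies beyond its end).
    return [(n, p) for n, p in parts if p[0] <= p[2] and p[1] <= p[3]]


def mass_send(rect: Rect, start: Cell) -> Counter:
    """Simulate the relaying from `start` (inside `rect`); count copies per cell."""
    received = Counter({start: 1})            # the starting cell already holds the data
    pending = deque(split_region(rect, start))
    while pending:
        cell, region = pending.popleft()
        received[cell] += 1                   # the cell stores the data as an end-point cell
        pending.extend(split_region(region, cell))  # and relays within its updated region
    return received
```

For example, `mass_send((0, 0, 3, 4), (1, 2))` delivers the data to each of the 20 cells of a 4 x 5 region exactly once. In this sketch every receiving cell lies on a corner or side of its updated region, so only the starting-point cell can be an interior cell, in line with the remark about the cell at point T.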
In the present embodiment, when a fourth cell acting as the starting-point cell or transfer cell is located outside the target region: if a straight-line communication path can be formed between the fourth cell and some cell in the target region, the fourth cell sends or transfers the intercellular mass-sending data along that straight line toward the target region; otherwise, the fourth cell sends the data toward a candidate adjacent cell, a candidate adjacent cell being one of the cells adjacent to the fourth cell that is closer to the target region. It should be noted that "fourth cell" in the present embodiment is a general term for the class of cells located outside the rectangular target region.
Referring to Fig. 12, assume that the cell at point Y1 is the starting-point cell that initiates the intercellular mass-sending communication, and that rectangular target region 131 is the target region determined before the cell at point Y1 sends the intercellular mass-sending data. The cell at point Y1 is outside rectangular target region 131. Because this cell lies between the extension lines of two opposite sides of the rectangular target region, a straight-line communication path can be formed between it and the cell at point Y3 in the rectangular target region, and only one neighbor can serve as the next transfer hop. The network controller of the cell at point Y1 sends the intercellular mass-sending data to that neighbor, namely the cell at point Y2, which acts as a transfer cell responsible for transferring the data. The cell at point Y2 transfers the data in the direction shown by the dotted arrow in Fig. 12 until it reaches the cell at point Y3. The cell at point Y3 is located on a side of rectangular target region 131 and can continue the transfer within rectangular target region 131 according to the method described above.
With continued reference to Fig. 12, assume that the cell at point Z1 is the starting-point cell that initiates the intercellular mass-sending communication, and that rectangular target region 131 is the target region determined before the cell at point Z1 sends the intercellular mass-sending data. The cell at point Z1 is outside rectangular target region 131. Because this cell does not lie between the extension lines of two opposite sides of the rectangular target region, no straight-line communication path can be formed between it and any cell in the rectangular target region, and two neighbors can serve as the next transfer hop: the cell at point Z2 and the cell at point Z3. These two cells are the candidate adjacent cells of the cell at point Z1, because among the cells adjacent to the cell at point Z1 they are the ones closer to rectangular target region 131. In an actual implementation, either one can be selected arbitrarily, or the less loaded cell can be selected as the next transfer hop according to the actual communication situation, where the less loaded cell means the cell with fewer pending tasks of outputting data. Starting from the cell at point Z1, the intercellular mass-sending data is transferred along either of the two feasible transfer paths until it reaches the cell at point Z4. The cell at point Z4 is located on a corner of rectangular target region 131 and can continue the transfer within rectangular target region 131 according to the method described above.
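The choice of the next transfer hop for a cell outside the target region can be sketched as below. This is an illustration under assumptions: cells and regions are given as grid coordinates, the names `next_hops` and `route_to_region` are hypothetical, and the optional `load` mapping stands in for whatever measure of pending output tasks a real network controller would use.

```python
from typing import Dict, List, Optional, Tuple

Rect = Tuple[int, int, int, int]   # (top, left, bottom, right), inclusive (assumed encoding)
Cell = Tuple[int, int]             # (row, col)


def next_hops(rect: Rect, cell: Cell) -> List[Cell]:
    """Adjacent cells that bring the data closer to `rect`; empty once inside it."""
    top, left, bottom, right = rect
    r, c = cell
    dr = (r < top) - (r > bottom)    # +1: move down, -1: move up, 0: row already in range
    dc = (c < left) - (c > right)    # +1: move right, -1: move left, 0: column in range
    hops = []
    if dr:
        hops.append((r + dr, c))
    if dc:
        hops.append((r, c + dc))
    return hops                      # one hop: straight-line case; two hops: diagonal case


def route_to_region(rect: Rect, start: Cell,
                    load: Optional[Dict[Cell, int]] = None) -> List[Cell]:
    """Follow the less loaded candidate at each step until a cell of `rect` is reached."""
    load = load or {}
    path = [start]
    while True:
        hops = next_hops(rect, path[-1])
        if not hops:                 # the last cell of the path lies inside the region
            return path
        path.append(min(hops, key=lambda h: load.get(h, 0)))
```

A cell lying between the extension lines of two opposite sides gets a single candidate and the path is a straight line ending on a side, as for Y1 to Y3. A cell in a diagonal position gets two candidates, and whichever is chosen at each step, the path enters the region at the nearest corner, as for Z1 to Z4.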
The intercellular mass-sending communication method in the cellular array computing system provided by the present embodiment extends the intercellular point-to-point communication function to region mass-sending. It supports a higher degree of parallelism and achieves a much higher total bandwidth, thereby further improving the overall performance of the computing system.
As mentioned above, a cell in the cellular array of the embodiment of the present invention combines three functions: memory, storage, and computation. The non-volatile random access memory in the cell provides random access to the data involved when the microprocessor computes, and also stores the instruction code of software and the data that must be kept permanently. However, non-volatile random access memory is usually costly, so the space in a cell's non-volatile random access memory that can be used as memory is limited. When the microprocessor in a cell has a large amount of data to process, the limited memory space may reduce its processing efficiency, and how to extend the memory space of a cell becomes a problem that urgently needs to be solved.
Based on the above consideration, the embodiment of the present invention gives another structure of the cellular array computing system. As shown in Fig. 13, in addition to the master CPU, cellular array, and cellular array bus described above, the cellular array computing system may further include at least one memory unit array. The memory unit array is a two-dimensional array composed of more than one memory unit (internal memory unit). The cellular array and all memory unit arrays are stacked into a three-dimensional structure, and the memory units in each memory unit array are connected one-to-one with the cells in the cellular array. The memory unit works together with the non-volatile random access memory, and both are used for random access to the data involved when the microprocessor computes.
In an actual implementation, the non-volatile random access memory in a cell can be MRAM, and the memory unit array can be an MRAM, DRAM, or SRAM silicon chip. Generally, one or more lower-cost DRAM silicon chips can be chosen, where each DRAM silicon chip is a memory unit array formed of memory units whose positions match those of the cells in the cellular array. All the DRAM silicon chips and one cellular array silicon chip are then combined in 3D, and a communication connection can be established between any memory unit and its corresponding cell in the cellular array by through-silicon vias (TSV), thereby extending the memory of each cell.
In the embodiment of the present invention, by by least one memory cell array being made up of more than one internal storage location, it is built up three dimensional structure with described cellular array, and make the internal storage location in each memory cell array be connected correspondingly with the cell in cellular array, the random access memory of involved data when described internal storage location calculates for described microprocessor, so just with the memory headroom of each cell in relatively low cost extension cellular array, the treatment effeciency of microprocessor in cell can be improved.
It should be pointed out that Fig. 13 shows only the case in which one memory unit array is stacked with the cellular array into a three-dimensional structure; those skilled in the art will equally understand the case in which more than one memory unit array is stacked with the cellular array into a three-dimensional structure.
Those skilled in the art should also understand that the communication methods described above are equally applicable to a cellular array computing system that includes the memory unit array: communication between the master CPU and each cell in the cellular array through the cellular array bus; communication between any two cells without relying on the master CPU; mass-sending communication from any cell to all cells in a target region; and the arrangement of dedicated output cells in the cellular array, which act as end-point cells to receive and store the data that other cells output to the master CPU, for the master CPU to read.
It should be noted that, because the memory space of each cell in the cellular array has been extended, the master CPU can access not only the non-volatile random access memory of a cell but also the memory unit corresponding to that cell (when more than one memory unit array is stacked with the cellular array into the three-dimensional structure, more than one memory unit corresponds to each cell). The communication that the master CPU carries out with each cell in the cellular array through the cellular array bus therefore includes at least one of the following: reading and writing, by address, the non-volatile random access memory of any cell in the cellular array or its corresponding memory unit; broadcasting data to the non-volatile random access memory or corresponding memory unit of each cell in a target region, and writing the data to the same relative address in the non-volatile random access memory or corresponding memory unit of each cell in that target region; sending instructions or data to, or reading the state of, the microprocessor of any cell in the cellular array; and broadcasting instructions to the microprocessors of all cells in a target region.
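The bus operations listed above can be pictured with a toy software model. Everything in this sketch is an assumption made for illustration: the classes `CellMemory` and `MasterBus`, the sizes, and in particular the address map in which a cell's non-volatile RAM comes first and its stacked memory unit (the internal memory attached to the cell) follows are hypothetical; the patent does not specify an address layout.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]            # (row, col) of a cell
Rect = Tuple[int, int, int, int]   # (top, left, bottom, right), inclusive


class CellMemory:
    """One cell's address space: NVRAM first, then the stacked memory unit (assumed layout)."""

    def __init__(self, nvram_size: int, unit_size: int) -> None:
        self.nvram = bytearray(nvram_size)
        self.unit = bytearray(unit_size)
        self.instructions: List[str] = []   # instructions delivered to the microprocessor

    def _locate(self, addr: int, length: int):
        if 0 <= addr and addr + length <= len(self.nvram):
            return self.nvram, addr
        off = addr - len(self.nvram)
        if off >= 0 and off + length <= len(self.unit):
            return self.unit, off
        raise IndexError("access crosses or exceeds the cell's address space")

    def write(self, addr: int, data: bytes) -> None:
        buf, off = self._locate(addr, len(data))
        buf[off:off + len(data)] = data

    def read(self, addr: int, length: int) -> bytes:
        buf, off = self._locate(addr, length)
        return bytes(buf[off:off + length])


class MasterBus:
    """Toy stand-in for the master CPU's view of the cellular array bus."""

    def __init__(self, rows: int, cols: int, nvram_size: int = 16, unit_size: int = 16) -> None:
        self.cells: Dict[Coord, CellMemory] = {
            (r, c): CellMemory(nvram_size, unit_size)
            for r in range(rows) for c in range(cols)
        }

    def _in_region(self, region: Rect) -> List[CellMemory]:
        top, left, bottom, right = region
        return [self.cells[(r, c)]
                for r in range(top, bottom + 1) for c in range(left, right + 1)]

    def write(self, cell: Coord, addr: int, data: bytes) -> None:
        self.cells[cell].write(addr, data)          # write one cell by address

    def read(self, cell: Coord, addr: int, length: int) -> bytes:
        return self.cells[cell].read(addr, length)  # read one cell by address

    def broadcast(self, region: Rect, addr: int, data: bytes) -> None:
        for mem in self._in_region(region):
            mem.write(addr, data)                   # same relative address in every cell

    def broadcast_instruction(self, region: Rect, instruction: str) -> None:
        for mem in self._in_region(region):
            mem.instructions.append(instruction)
```

With the default sizes, `bus.broadcast((0, 0, 1, 1), 20, b"hi")` lands at offset 4 of the stacked memory unit of each of the four cells in the region, while cells outside the region are untouched.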
When a cell in the cellular array also includes a bus controller and a cell internal bus, the cell internal bus connects not only the microprocessor and the non-volatile random access memory but also the memory unit corresponding to the cell. The bus controller is connected to the cellular array bus, the microprocessor, and the cell internal bus. The bus controller identifies communication between the master CPU and this cell, and either connects to the microprocessor to pass on the instructions or data sent by the master CPU or to carry out a state read, or connects through the cell internal bus to the non-volatile random access memory or the memory unit corresponding to the cell to carry out data read and write operations.
As mentioned above, a cell in the cellular array of the embodiment of the present invention combines three functions: memory, storage, and computation. The non-volatile random access memory in the cell provides random access to the data involved when the microprocessor computes, and also stores the instruction code of software and the data that must be kept permanently. However, non-volatile random access memory is usually costly, so the space in a cell's non-volatile random access memory that can be used as storage is likewise limited. When a large number of files or a large amount of data must be stored in the cells of the cellular array, the limited storage space cannot meet the storage requirement and may even reduce the processing efficiency of the microprocessor. How to extend the storage space of a cell is therefore likewise a problem that urgently needs to be solved.
Based on the above consideration, the embodiment of the present invention gives yet another structure of the cellular array computing system. As shown in Fig. 14, in addition to the master CPU, cellular array, and cellular array bus described above, the cellular array computing system may further include at least one storage unit array. The storage unit array is a two-dimensional array composed of more than one storage unit. The cellular array and all storage unit arrays are stacked into a three-dimensional structure, and the storage units in each storage unit array are connected one-to-one with the cells in the cellular array. The storage unit works together with the non-volatile random access memory, and both are used to store the instruction code of software and the data that must be kept permanently.
In an actual implementation, the non-volatile random access memory in a cell can be MRAM, and the storage unit array can specifically be a flash memory silicon chip. Generally, one or more NAND flash silicon chips, which cost less than MRAM, can be chosen, where each NAND flash silicon chip is a storage unit array formed of storage units whose positions match those of the cells in the cellular array. All the NAND flash silicon chips and one cellular array silicon chip are then combined in 3D, and any storage unit can be connected vertically by TSV to its corresponding cell in the cellular array to establish a communication connection, thereby extending the storage space of each cell.
In a specific implementation, a cell in the cellular array also includes a storage controller connected to the microprocessor, which performs data storage access control on the storage unit connected to the cell. After one or more NAND flash silicon chips and one cellular array silicon chip are combined in 3D, each cell in the cellular array can also be configured with a NAND flash controller, through which the MPU of the cell reads and writes the storage unit corresponding to the cell. When a large number of files or a large amount of data is stored in NAND flash, each cell can search its own data through its own NAND channel, which greatly accelerates the search. Compiling a large software system likewise requires compiling thousands of source code files; when these source files are stored in such a cellular array computing system, compilation is likewise greatly accelerated.
The cellular array computing system including the storage unit array provided by the embodiment of the present invention extends the storage space of each cell in the cellular array at relatively low cost and improves the data storage capacity of each cell.
It should be pointed out that Fig. 14 shows only the case in which one storage unit array is stacked with the cellular array into a three-dimensional structure; those skilled in the art will equally understand the case in which more than one storage unit array is stacked with the cellular array into a three-dimensional structure.
Those skilled in the art will understand that the communication methods described above are equally applicable to a cellular array computing system that includes the storage unit array: communication between the master CPU and each cell in the cellular array through the cellular array bus; communication between any two cells without relying on the master CPU; mass-sending communication from any cell to all cells in a target region; and the arrangement of dedicated output cells in the cellular array, which act as end-point cells to receive and store the data that other cells output to the master CPU, for the master CPU to read.
As described above, the space in a cell's non-volatile random access memory that can be used as memory and as storage is very limited in both cases, so how to extend the memory space and the storage space of a cell at the same time is a problem that urgently needs to be solved. Based on this consideration, the embodiment of the present invention gives a further structure of the cellular array computing system. As shown in Fig. 15, in addition to the master CPU, cellular array, and cellular array bus, the cellular array computing system may further include at least one memory unit array as described above (an array of internal memory units, as in Fig. 13) and at least one storage unit array as described above (an array of storage units, as in Fig. 14). For the specific implementation of a cellular array computing system that includes both a memory unit array and a storage unit array, refer to the implementations described above of the systems that include only a memory unit array or only a storage unit array; details are not repeated here.
The cellular array computing system provided by the embodiment of the present invention that includes both the memory unit array and the storage unit array extends the storage space and the memory space of each cell in the cellular array at relatively low cost at the same time, improving both the data storage capacity of each cell and the processing efficiency of the microprocessor in each cell, and thereby further improving the overall performance of the computing system.
As described in the Background, neural network computation involves a very large amount of calculation. However, the existing computer architecture on which its implementation relies has a communication bottleneck between the CPU and memory and between the CPU and storage, which degrades overall computer performance and hinders the efficient realization of neural network computation.
To this end, based on the above cellular array computing system provided by the embodiment of the present invention, the embodiment of the present invention also provides a method for realizing neural network computation using that system, including: each cell in the cellular array that participates in the neural network computation stores the code of one or more neuron functions; the master CPU selects one or more cells to execute the code of the neuron functions stored in each of them and to output the execution results to one or more target cells; any cell participating in the neural network computation receives the execution results of neuron-function code output by other cells, takes them as its input data, executes the code of its own stored neuron function based on all the input data from the other cells, and either outputs the execution result to the cells holding the neuron functions that need this result, stores it at a preset address to wait for the master CPU to read it, or outputs it to the master CPU.
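The data flow of this method can be sketched in software as follows. It is a minimal model under assumptions: each cell holds a single neuron function, the `NeuronCell` class and `run` function are hypothetical names, a queue stands in for intercellular communication, and the `results` dictionary stands in for the preset addresses that the master CPU reads.

```python
from collections import deque
from typing import Callable, Dict, List, Optional


class NeuronCell:
    """A cell storing one neuron function plus its wiring (hypothetical model)."""

    def __init__(self, func: Callable[[List[float]], float],
                 sources: List[str], targets: List[str]) -> None:
        self.func = func            # stored neuron-function code
        self.sources = sources      # senders whose results this neuron needs
        self.targets = targets      # cells whose neuron functions need this result
        self.inputs: Dict[str, float] = {}

    def receive(self, source: str, value: float) -> Optional[float]:
        """Record one input; execute the neuron once all inputs have arrived."""
        self.inputs[source] = value
        if len(self.inputs) < len(self.sources):
            return None
        result = self.func([self.inputs[s] for s in self.sources])
        self.inputs.clear()         # ready for the next round of inputs
        return result


def run(cells: Dict[str, NeuronCell], external: Dict[str, float]) -> Dict[str, float]:
    """Feed externally supplied values to the selected cells and propagate results."""
    results: Dict[str, float] = {}  # results left for the master CPU to read
    queue = deque(("cpu", name, value) for name, value in external.items())
    while queue:
        source, name, value = queue.popleft()
        out = cells[name].receive(source, value)
        if out is None:
            continue
        if cells[name].targets:
            queue.extend((name, target, out) for target in cells[name].targets)
        else:
            results[name] = out
    return results


# Usage: two pass-through input cells feeding one thresholding cell.
cells = {
    "in0": NeuronCell(lambda xs: xs[0], ["cpu"], ["h"]),
    "in1": NeuronCell(lambda xs: xs[0], ["cpu"], ["h"]),
    "h": NeuronCell(lambda xs: 1.0 if 0.5 * xs[0] + 0.5 * xs[1] >= 0.75 else 0.0,
                    ["in0", "in1"], []),
}
```

Calling `run(cells, {"in0": 1.0, "in1": 1.0})` makes cell `h` execute only after both input results have arrived, which mirrors the requirement that a cell execute its neuron function based on all the input data from the other cells.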
To better understand the method for realizing neural network computation with the cellular array computing system provided by the embodiments of the present invention, the principle of neural network computation is first briefly described.
A neural network is an algorithm commonly used in machine learning that imitates the working principle of the human brain. The human brain consists of a large number of neurons. Each neuron connects to other neurons through a large number (thousands) of synapses on its dendrites and receives its input through them; the coupling strength of a synapse serves as memory; the neuron outputs an excitatory or inhibitory signal, which its axon can carry to a large number of distant neurons.
In a neural network computation method modeled on the working principle of the human brain, a neuron is simply a function. As shown in Figure 16, it has many inputs (x1, x2 and x3 are three of them), and each input corresponds to a weight. The usual calculation multiplies each input by its weight and adds up the products. The neuron outputs 0 or 1 (decided by a threshold), or a value between 0 and 1. It has many internal parameters (such as weight parameters and threshold parameters), and the process of adjusting these parameters is the process of machine learning.
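The neuron function just described can be written compactly as follows. This is only an illustrative formulation: the symbols w_i (weights), \theta (threshold) and n (number of inputs) are notation introduced here, and the sigmoid \sigma is an assumed, commonly used choice for an output "between 0 and 1"; the text itself does not name a specific function.

```latex
% Binary output decided by a threshold \theta:
y =
\begin{cases}
1, & \text{if } \sum_{i=1}^{n} w_i x_i \ge \theta \\
0, & \text{otherwise}
\end{cases}

% Continuous output between 0 and 1 (sigmoid assumed for illustration):
y = \sigma\!\left(\sum_{i=1}^{n} w_i x_i - \theta\right),
\qquad \sigma(z) = \frac{1}{1 + e^{-z}}
```

For the three inputs named in Figure 16, the sum is w_1 x_1 + w_2 x_2 + w_3 x_3.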
A typical neural network is a network formed by connecting the outputs and inputs of a large number of neurons, and is usually organized into multiple layers. As shown in Figure 17, the neural network in the figure is organized into three layers: the output of every first-layer neuron is sent to every second-layer neuron, and the output of every second-layer neuron is sent to every third-layer neuron. In Figure 17, the first layer may be called the input layer, the second layer the hidden layer, and the third layer the output layer; the outputs 0, 1, 2, ..., 8, 9 of the output layer are the final result of the neural network computation. Neural networks in practical use generally have such a layered structure; in practice, the network shown could be one for recognizing handwritten digits.
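The layered structure described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of the embodiment: the function names, the use of a sigmoid, and the representation of a layer as a pair (weights, biases) are all assumptions made here, and the values fed in as inputs stand for the outputs of the first-layer neurons.

```python
import math


def sigmoid(z):
    """Squash a weighted sum into a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """Every neuron of this layer receives every output of the previous layer.

    weights[j] holds the weights of neuron j, one per input;
    biases[j] plays the role of a negated threshold (an assumption).
    """
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


def network_forward(inputs, layers):
    """Pass the first-layer outputs through each following layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations
```

For a digit recognizer of the kind mentioned above, the last layer would hold ten neurons, and the index of the largest output could be taken as the recognized digit.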
The learning and training of a neural network is a process of massive computation. A network is in effect a function with a large number of parameters (possibly tens of thousands). Training requires a large number of samples, each consisting of input data and the correct answer, which are used to adjust these parameters until the learning goal is reached (usually by the steepest descent method). For example, training a neural network to recognize handwritten digits requires tens of thousands of pictures; the parameters are adjusted until the network gives the correct result for each picture, after which handwriting recognition achieves a very high success rate.
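To make the parameter-adjustment step concrete, the following Python sketch trains a single neuron by gradient descent on four samples of the logical AND function. It is a toy illustration under stated assumptions: the squared-error loss, the sigmoid output, the per-sample update, the learning rate and the names train_neuron and predict are all choices made here and are not taken from the text.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, inputs):
    """Output of one neuron: sigmoid of the weighted sum plus a bias."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train_neuron(samples, learning_rate=0.5, epochs=5000):
    """Adjust weights and bias by descending the squared-error gradient.

    samples is a list of (inputs, correct_answer) pairs.
    """
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            out = predict(weights, bias, inputs)
            # Derivative of 0.5 * (out - target)**2 with respect to the
            # weighted sum, using sigmoid'(z) = out * (1 - out).
            delta = (out - target) * out * (1.0 - out)
            weights = [w - learning_rate * delta * x
                       for w, x in zip(weights, inputs)]
            bias -= learning_rate * delta
    return weights, bias
```

A real recognizer repeats the same kind of update over far more parameters and samples, which is why each training run requires so many evaluations of the network.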
The inventors consider neural network computation to be very well suited to the cellular array computing system provided by the embodiments of the present invention. Because training requires evaluating the neural network a great many times, the computation can be accelerated by running it in parallel on a large number of MPUs.
In practice, if the number of neurons in the neural network does not exceed the number of cells in the cellular array computing system, one cell in the cellular array can perform the calculation of one neuron; otherwise, the calculations of more than one neuron can be combined in a single cell.
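One possible way to distribute neurons over cells is sketched below in Python. The round-robin policy and the name assign_neurons_to_cells are assumptions made for illustration; the text only requires that one cell may handle one neuron or several.

```python
def assign_neurons_to_cells(num_neurons, cell_ids):
    """Map neuron indices to cells, wrapping around when neurons outnumber cells.

    cell_ids is a list of (row, col) positions in the two-dimensional array.
    Returns a dict: cell_id -> list of neuron indices handled by that cell.
    """
    assignment = {cell_id: [] for cell_id in cell_ids}
    for neuron in range(num_neurons):
        cell_id = cell_ids[neuron % len(cell_ids)]
        assignment[cell_id].append(neuron)
    return assignment
```

With no more neurons than cells, every cell receives at most one neuron; with more neurons than cells, some cells receive several.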
In a specific implementation, each cell stores the code of one or more neuron functions. The master CPU can use data broadcast to send the neuron function code to every cell that acts as a neuron, and can start the neural network computation by broadcasting an instruction to a target region. For example, the master CPU may select all cells in the cellular array, or the cells in a target region, as the cells participating in the neural network computation, broadcast the code of the same neuron function to the same relative address range in each selected cell, and broadcast an instruction that makes the microprocessors of one or more selected cells start executing the neuron function code from that relative address. For details, refer also to the description of the cellular array computing system above.
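The broadcast-and-start sequence can be modelled in software as follows. This Python sketch is a simplified simulation, not the bus protocol of the embodiment: the Cell class, the rectangular region tuple (top, left, bottom, right), the dictionary standing in for non-volatile memory, and the fact that every selected cell receives the same inputs are all assumptions made for illustration.

```python
class Cell:
    """One cell: a microprocessor plus non-volatile RAM, modelled as a dict."""

    def __init__(self, cell_id):
        self.cell_id = cell_id  # (row, col) position, stored as the cell's ID
        self.nvram = {}         # relative address -> stored content
        self.result = None

    def execute_from(self, address, inputs):
        """Run the neuron function stored at the given relative address."""
        neuron_function = self.nvram[address]
        self.result = neuron_function(inputs)


def in_region(cell_id, region):
    row, col = cell_id
    top, left, bottom, right = region
    return top <= row <= bottom and left <= col <= right


def broadcast_code(cells, region, address, neuron_function):
    """Master CPU: write the same code to the same address in each selected cell."""
    for cell in cells:
        if in_region(cell.cell_id, region):
            cell.nvram[address] = neuron_function


def broadcast_start(cells, region, address, inputs):
    """Master CPU: tell each selected cell to start executing from that address."""
    for cell in cells:
        if in_region(cell.cell_id, region):
            cell.execute_from(address, inputs)
```

Because every selected cell holds the code at the same relative address, a single broadcast instruction is enough to start all of them.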
In an embodiment of the present invention, each input datum from another cell corresponds to one weight parameter. Executing the code of the neuron function stored in the cell based on all the input data from other cells may include: multiplying each input datum from another cell by its corresponding weight parameter, summing all the products, comparing the sum with the threshold parameter, and taking the output value determined by that comparison as the execution result. In practice, the weight parameters and the threshold parameter are stored in advance in the non-volatile random access memory of the cell.
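The multiply, sum and compare steps above can be sketched as a small Python function. The name neuron_function and the choice of "greater than or equal to" for the comparison are assumptions; the text only states that the sum is compared with the threshold parameter.

```python
def neuron_function(inputs, weights, threshold):
    """Weighted sum of the inputs, compared against a threshold.

    inputs:    execution results received from other cells
    weights:   one weight parameter per input
    threshold: threshold parameter stored in the cell
    Returns 1 if the weighted sum reaches the threshold, otherwise 0.
    """
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one weight")
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum >= threshold else 0
```

In the system described here, the weights and the threshold would be read from the cell's non-volatile random access memory rather than passed as arguments.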
In practice, if the execution result output by the neuron function code is a continuous value, the output can also be accelerated by table lookup.
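A table lookup of this kind might look like the following Python sketch. The sigmoid as the continuous output function, the input range of -8 to 8, the table size of 1024 entries and the function names are all assumptions chosen for illustration; the text does not specify them.

```python
import math


def build_sigmoid_table(z_min=-8.0, z_max=8.0, entries=1024):
    """Precompute sigmoid values at evenly spaced points in [z_min, z_max]."""
    step = (z_max - z_min) / (entries - 1)
    return [1.0 / (1.0 + math.exp(-(z_min + i * step))) for i in range(entries)]


def lookup_sigmoid(z, table, z_min=-8.0, z_max=8.0):
    """Return the stored value nearest to z instead of evaluating exp()."""
    if z <= z_min:
        return table[0]
    if z >= z_max:
        return table[-1]
    index = round((z - z_min) / (z_max - z_min) * (len(table) - 1))
    return table[index]
```

The table trades a small amount of memory in each cell for the cost of evaluating an exponential on a simple microprocessor; a larger table gives a smaller approximation error.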
In a specific implementation, because the cellular array computing system provided by the embodiments of the present invention supports cell-to-cell communication between any two cells, the execution results of the neuron function code can be transmitted by cell-to-cell communication during neural network computation. The huge bandwidth of the network inside the cellular array makes cell-to-cell communication more efficient and also reduces the processing load of the master CPU, which further improves the overall performance of the computing system and favors an efficient implementation of neural network computation.
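Since only adjacent cells are directly connected, data between two arbitrary cells has to pass through a chain of neighbouring cells. The Python sketch below computes one such path. The column-first, then row-wise order (often called dimension-order routing) and the name route are assumptions made here; the text does not prescribe a particular path selection rule.

```python
def route(start, end):
    """Return the cells visited from start to end, one adjacent cell per hop.

    Cells are (row, col) positions. The first element is the starting cell,
    the last is the end cell, and everything in between relays the data.
    """
    row, col = start
    path = [start]
    while col != end[1]:           # move along the row first
        col += 1 if end[1] > col else -1
        path.append((row, col))
    while row != end[0]:           # then move along the column
        row += 1 if end[0] > row else -1
        path.append((row, col))
    return path
```

For example, path[1:-1] of route((0, 0), (2, 3)) lists the cells that only relay the data.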
In a specific implementation, when the neural network is divided into layers, data can be transferred between layers by the cell-to-cell group-sending mechanism provided by the embodiments of the present invention. In this case, the cells of one layer that participate in the neural network computation are located in the same target region. When a cell in a given layer group-sends the execution result of its neuron function code to a target region, this is equivalent to delivering that result to all participating cells of the next layer. Extending point-to-point communication between cells to region-wide group sending supports a higher degree of parallelism and yields a much higher total bandwidth, which further improves the overall performance of the computing system and helps considerably to speed up neural network computation.
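Layer-to-layer transfer by group sending can be sketched in Python as follows. The rectangular region tuple (top, left, bottom, right), the inbox dictionary and the function names are assumptions made for illustration; how the hardware actually relays the data inside the region is not modelled.

```python
def cells_in_region(region):
    """List every (row, col) position inside a rectangular target region."""
    top, left, bottom, right = region
    return [(r, c)
            for r in range(top, bottom + 1)
            for c in range(left, right + 1)]


def group_send(sender, value, region, inboxes):
    """Deliver one cell's execution result to every cell in the target region.

    inboxes maps a receiving cell to {sending cell: received value}.
    """
    for cell in cells_in_region(region):
        inboxes.setdefault(cell, {})[sender] = value


def send_layer(layer_results, next_layer_region, inboxes):
    """Each cell of one layer group-sends its result to the next layer's region."""
    for sender, value in layer_results.items():
        group_send(sender, value, next_layer_region, inboxes)
```

After send_layer has run, every cell of the next layer holds one input per cell of the previous layer, which matches the fully connected structure of Figure 17.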
In a specific implementation, if the cellular array also contains a dedicated (full-time) output cell, the execution results of the neural network computation can be sent to that dedicated output cell by cell-to-cell communication for the master CPU to read, which outputs the results to the master CPU more efficiently. In practice, if some neuron function in the last layer produces a positive output (a common arrangement is for each neuron in the last layer to be responsible for recognizing one specific data feature or image), the master CPU can be notified through this mechanism.
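The role of the dedicated output cell can be illustrated with the Python sketch below. The class name OutputCell, the callback standing in for the interrupt signal mentioned in claim 7, and the queue used as storage are assumptions introduced for this example.

```python
from collections import deque


class OutputCell:
    """Collects results from other cells and signals the master CPU."""

    def __init__(self, on_interrupt):
        self.buffer = deque()              # results waiting for the master CPU
        self.on_interrupt = on_interrupt   # stands in for the interrupt line

    def receive(self, source_cell, result):
        """Store a result that arrived by cell-to-cell communication."""
        self.buffer.append((source_cell, result))
        self.on_interrupt()

    def read_all(self):
        """Master CPU side: fetch and clear everything stored so far."""
        items = list(self.buffer)
        self.buffer.clear()
        return items
```

With this arrangement the master CPU does not need to poll every last-layer cell; it reads from one place when it is notified.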
For the implementation of neural network computation through cell-to-cell communication, group sending between cells and the dedicated (full-time) output cell, refer to the related descriptions in the embodiments of the cellular array computing system; they are not repeated here.
In summary, the advantages of realizing neural network computation with the cellular array computing system provided by the embodiments of the present invention are clear: parallel computation across a large number of cells greatly increases computation speed, so that learning and training become much faster; the huge bandwidth of the network inside the array and the group-sending mechanism likewise help to raise speed considerably; and the non-volatility of MRAM means that a successfully trained chip can be duplicated directly and sold as a product that solves a particular problem.
Although the present invention is disclosed as above, it is not limited thereto. Any person skilled in the art may make various changes or modifications without departing from the spirit and scope of the present invention; the scope of protection of the present invention is therefore defined by the claims.

Claims (9)

1. A method for realizing neural network computation by using a cellular array computing system, characterized in that the cellular array computing system comprises: a master CPU, a cellular array and a cellular array bus; the cellular array is a two-dimensional array composed of more than one cell having both computing and storage functions, wherein each cell comprises a microprocessor and a non-volatile random access memory; the non-volatile random access memory serves as the random access memory for the data involved in the calculations of the microprocessor, and is also used to store the instruction code of software and data that needs to be persisted; each cell stores its own position in the cellular array as an ID that can be read by software or hardware in the cell; the master CPU communicates with each cell in the cellular array through the cellular array bus; adjacent cells in the cellular array have communication interfaces between them and can send data to each other;
the method comprises:
each cell in the cellular array that participates in the neural network computation stores the code of one or more neuron functions;
the master CPU selects one or more cells to execute the code of the neuron functions stored in each of them, and the execution results are output to one or more target cells;
any cell that participates in the neural network computation receives the execution results of neuron function code output by other cells, takes them as its input data, executes the code of its own stored neuron function based on all the input data from the other cells, and outputs the execution result to the cells holding the neuron functions that need it, or stores it at a preset address, or outputs it to the master CPU.
2. The method for realizing neural network computation by using a cellular array computing system according to claim 1, characterized in that each input datum from another cell corresponds to one weight parameter, and executing the code of the neuron function stored in the cell based on all the input data from other cells comprises: multiplying each input datum from another cell by its corresponding weight parameter, summing all the products, comparing the sum with the threshold parameter, and taking the output value determined by that comparison as the execution result.
3. The method for realizing neural network computation by using a cellular array computing system according to claim 2, characterized in that the weight parameters and the threshold parameter are stored in advance in the non-volatile random access memory of the cell.
4. The method for realizing neural network computation by using a cellular array computing system according to claim 1, characterized in that, if the execution result output by the neuron function code is a continuous value, the output is accelerated by table lookup.
5. The method for realizing neural network computation by using a cellular array computing system according to claim 1, characterized in that any two cells in the cellular array can communicate with each other; the cells taking part in cell-to-cell communication comprise a starting cell, an end cell and relay cells; the starting cell is the cell that sends data to the end cell; the end cell is the cell that finally receives the data sent by the starting cell; the relay cells are cells that are successively adjacent along the cell-to-cell communication path and relay the data sent by the starting cell through the communication interfaces; the cell-to-cell communication path is the data transmission path formed by the starting cell, the relay cells and the end cell; the method further comprises: transmitting the execution results of the neuron function code between any two cells by the cell-to-cell communication.
6. The method for realizing neural network computation by using a cellular array computing system according to claim 5, characterized in that any cell in the cellular array can also act as the starting cell and perform group-sending communication to all cells in a target region; a cell that takes part in the group-sending communication and is located inside the target region acts as the starting cell, as an end cell, or as both a relay cell and an end cell; a cell that takes part in the group-sending communication and is located outside the target region acts as the starting cell or as a relay cell; the method further comprises: if the neural network is divided into layers, using the group-sending communication to transfer data between layers, the cells of the same layer that participate in the neural network computation being located in the same target region.
7. The method for realizing neural network computation by using a cellular array computing system according to claim 5 or 6, characterized in that the cellular array further contains at least one dedicated (full-time) output cell; the dedicated output cell, acting as the end cell, receives and stores the data that other cells output to the master CPU, and notifies the master CPU with an interrupt signal to read the output data; outputting the execution result to the master CPU comprises: sending the execution result to the dedicated output cell by the cell-to-cell communication for the master CPU to read.
8. The method for realizing neural network computation by using a cellular array computing system according to claim 1, characterized in that the non-volatile random access memory is MRAM.
9. The method for realizing neural network computation by using a cellular array computing system according to claim 1, characterized in that the master CPU selects all cells in the cellular array, or the cells in a target region, as the cells participating in the neural network computation, broadcasts the code of the same neuron function to the same relative address range in each selected cell, and broadcasts an instruction that makes the microprocessors of one or more selected cells start executing the neuron function code from that relative address.
CN201510456306.2A 2015-07-29 2015-07-29 Method for realizing neural network calculation by using cell array computing system Active CN105740946B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510456306.2A CN105740946B (en) 2015-07-29 2015-07-29 Method for realizing neural network calculation by using cell array computing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510456306.2A CN105740946B (en) 2015-07-29 2015-07-29 Method for realizing neural network calculation by using cell array computing system

Publications (2)

Publication Number Publication Date
CN105740946A true CN105740946A (en) 2016-07-06
CN105740946B CN105740946B (en) 2019-02-12

Family

ID=56295970

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510456306.2A Active CN105740946B (en) 2015-07-29 2015-07-29 Method for realizing neural network calculation by using cell array computing system

Country Status (1)

Country Link
CN (1) CN105740946B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1341242A (en) * 1999-01-21 2002-03-20 索尼电脑娱乐公司 High-speed processor system, method of using the same, and recording medium
US20090094436A1 (en) * 2007-07-26 2009-04-09 Yuefan Deng Ultra-scalable supercomputer based on mpu architecture
WO2009075694A1 (en) * 2007-12-05 2009-06-18 Hewlett-Packard Development Company, L.P. Hybrid microscale-nanoscale neuromorphic integrated circuit
CN102665049A (en) * 2012-03-29 2012-09-12 中国科学院半导体研究所 Programmable visual chip-based visual image processing system
CN103019656A (en) * 2012-12-04 2013-04-03 中国科学院半导体研究所 Dynamically reconfigurable multi-stage parallel single instruction multiple data array processing system

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108154225B (en) * 2016-12-06 2021-09-03 上海磁宇信息科技有限公司 Neural network chip using analog computation
CN108154226A (en) * 2016-12-06 2018-06-12 上海磁宇信息科技有限公司 A kind of neural network chip calculated using simulation
CN108154225A (en) * 2016-12-06 2018-06-12 上海磁宇信息科技有限公司 A kind of neural network chip calculated using simulation
CN108154226B (en) * 2016-12-06 2021-09-03 上海磁宇信息科技有限公司 Neural network chip using analog computation
CN108256637A (en) * 2016-12-28 2018-07-06 上海磁宇信息科技有限公司 A kind of cellular array three-dimensional communication transmission method
CN108256640A (en) * 2016-12-28 2018-07-06 上海磁宇信息科技有限公司 Convolutional neural networks implementation method
CN108256641A (en) * 2016-12-28 2018-07-06 上海磁宇信息科技有限公司 For the cellular array internal network communication method of cellular array computing system
CN108255775A (en) * 2016-12-28 2018-07-06 上海磁宇信息科技有限公司 For the cellular array bus broadcast method of cellular array computing system
CN108255514A (en) * 2016-12-28 2018-07-06 上海磁宇信息科技有限公司 For the neuron calculator operation method of cellular array computing system
WO2018228295A1 (en) * 2017-06-16 2018-12-20 华为技术有限公司 Peripheral circuit and system for supporting rram-based neural network training
US11409438B2 (en) 2017-06-16 2022-08-09 Huawei Technologies Co., Ltd. Peripheral circuit and system supporting RRAM-based neural network training
CN107817708A (en) * 2017-11-15 2018-03-20 复旦大学 A kind of highly compatible may be programmed neutral net and accelerate array
CN109961136A (en) * 2017-12-14 2019-07-02 北京中科寒武纪科技有限公司 Integrated circuit chip device and Related product
CN111242294A (en) * 2017-12-14 2020-06-05 中科寒武纪科技股份有限公司 Integrated circuit chip device and related product
CN111242294B (en) * 2017-12-14 2023-08-25 中科寒武纪科技股份有限公司 Integrated circuit chip device and related products
CN110308909A (en) * 2018-03-27 2019-10-08 上海寒武纪信息科技有限公司 For the executable program generating means and method of neural network processor
CN110308909B (en) * 2018-03-27 2023-08-01 上海寒武纪信息科技有限公司 Executable program generating device and method for neural network processor
CN110851380A (en) * 2018-08-20 2020-02-28 旺宏电子股份有限公司 Artificial intelligence application program-based data storage device, system and method
TWI696072B (en) * 2018-08-20 2020-06-11 旺宏電子股份有限公司 Data storage apparatus, system and method
US11914860B2 (en) 2018-08-20 2024-02-27 Macronix International Co., Ltd. Data storage for artificial intelligence-based applications
CN109886393A (en) * 2019-02-26 2019-06-14 杭州闪亿半导体有限公司 It is a kind of to deposit the calculation method for calculating integrated circuit and neural network
CN110362280A (en) * 2019-09-04 2019-10-22 南京优存科技有限公司 Mixing storage system based on the nearly data processing MRAM of low-power consumption neural network
CN112269751A (en) * 2020-11-12 2021-01-26 浙江大学 Chip expansion method for hundred million-level neuron brain computer

Also Published As

Publication number Publication date
CN105740946B (en) 2019-02-12

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant