CN105608490A - Cellular array computing system and communication method thereof - Google Patents
- Publication number
- CN105608490A (CN201510456294.3A)
- Authority
- CN
- China
- Prior art keywords
- cell
- cellular array
- data
- array
- cellular
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Multi Processors (AREA)
- Mobile Radio Communication Systems (AREA)
Abstract
The invention discloses a cellular array computing system and a communication method thereof. The cellular array computing system comprises a main control CPU, a cellular array, and a cellular array bus. The cellular array is a two-dimensional array composed of more than one cell having computing and storage functions, and each cell comprises a microprocessor and a non-volatile random access memory. Each non-volatile random access memory is used for random access to the data involved in the computations of the corresponding microprocessor, and for storing software instruction code and data to be preserved permanently. Each cell stores its position in the cellular array as an ID to be read by software or hardware in the cell. The main control CPU communicates with each cell in the cellular array through the cellular array bus, and communication interfaces are provided between adjacent cells in the cellular array so that adjacent cells can send data to each other. The cellular array computing system and the communication method thereof can overcome the communication bottleneck among the CPU, memory, and storage in existing computer architectures and enhance the overall performance of the computing system.
Description
Technical field
The present invention relates to the field of computers and computer application technology, and in particular to a cellular array computing system and a communication method therein.
Background technology
Generally speaking, a computer comprises three core components: a central processing unit (CPU, Central Processing Unit), memory, and storage.
Through the sustained efforts of some of the world's leading companies, the CPU has developed into an extremely complex semiconductor chip. The number of MOS transistors inside a top-end CPU core can exceed 100 million. The current industry trend is that, constrained by power consumption, CPU operating frequencies are difficult to raise further. The operating efficiency of extremely complex modern CPUs is likewise difficult to improve. New CPU products are increasingly evolving toward multi-core designs.
In terms of memory, the dominant technology at present is dynamic random access memory (DRAM, Dynamic Random Access Memory). DRAM supports fast random reads and writes but cannot retain its contents when powered off. In fact, even when powered, it loses information through leakage of the capacitors that store the information, and must be refreshed periodically.
In terms of storage, NAND flash memory is gradually replacing conventional hard disks. The floating-gate technology on which flash memory relies can retain content when powered off, but writing (rewriting '1' as '0') is very slow and erasing (rewriting '0' as '1') is slower still, so flash cannot directly support computation the way DRAM does. It is built as a block device: erasure must be performed on a whole block at once, a block contains many pages, and after erasure each page can be written. Another problem with NAND is its limited lifetime.
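By way of illustration only, the write and erase behaviour described above can be modelled in a few lines of Python. The class name `FlashBlock` and the tiny page and block sizes are assumptions chosen for readability, not properties of any real device.

```python
PAGE_SIZE = 4         # bytes per page (assumed, unrealistically small)
PAGES_PER_BLOCK = 2   # pages per block (assumed)


class FlashBlock:
    """Hypothetical model of one NAND block: erased bytes read as 0xFF."""

    def __init__(self):
        self.pages = [bytearray([0xFF] * PAGE_SIZE) for _ in range(PAGES_PER_BLOCK)]

    def program(self, page, data):
        # Writing can only rewrite '1' bits as '0' bits, hence the bitwise AND.
        for i, byte in enumerate(data):
            self.pages[page][i] &= byte

    def erase(self):
        # Erasing rewrites '0' as '1', and only for the whole block at once.
        for page in self.pages:
            page[:] = bytes([0xFF]) * PAGE_SIZE
```

Programming a page twice with different patterns can only clear more bits; restoring any bit to '1' requires erasing every page of the block.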
DRAM, NAND flash memory, and the logic circuits of a CPU are all produced with CMOS semiconductor processes, yet the three processes are mutually incompatible. The three core components of a computer therefore cannot coexist on one chip, which has profoundly shaped the architecture of modern computers.
A prior-art computer architecture is shown in Fig. 1, which depicts multiple CPU cores, namely CPU1, CPU2, CPU3, ..., CPUn. Each CPU core generally has a corresponding level-1 cache (L1 Cache) and, as required, may also be provided with a corresponding level-2 cache (L2 Cache) and level-3 cache (L3 Cache). The DRAM communicates with each CPU core through a double data rate (DDR, Double Data Rate) interface, and a hard disk (HD, Hard Disk) or solid-state drive (SSD, Solid State Drive) communicates with each CPU core through a peripheral interface.
On the one hand, CPUs are developing toward multi-core designs; on the other hand, memory and storage both reside on separate chips. The amount of information a multi-core CPU handles grows in proportion, so communication with memory and storage increasingly becomes the bottleneck of system performance. To relieve this communication bottleneck, CPUs have had to adopt more and more levels of cache. A cache is a copy of the contents of memory, usually built from static random access memory (SRAM, Static Random Access Memory), which costs much more than DRAM but is faster. The cost-effectiveness of such an architecture is very poor. The cost of a semiconductor chip is determined by its silicon area, and the performance gain delivered by the traditional computer architecture is far from proportional to the increase in silicon area.
Summary of the invention
The problem to be solved by the present invention is that prior-art computer architectures suffer a communication bottleneck between the CPU and memory and storage, which limits improvement of overall computer performance and results in poor cost-effectiveness.
To solve the above problem, the technical solution of the present invention provides a cellular array computing system, comprising: a main control CPU, a cellular array, and a cellular array bus. The cellular array is a two-dimensional array composed of more than one cell having both computing and storage functions, wherein each cell comprises a microprocessor (MPU, Micro Processing Unit) and a non-volatile (NV, Non-Volatile) random access memory. The non-volatile random access memory is used for random access to the data involved in the microprocessor's computations, and also for storing software instruction code and data requiring permanent preservation. Each cell stores its own position in the cellular array as an identification number (ID, Identification) to be read by software or hardware in the cell. The main control CPU communicates with each cell in the cellular array through the cellular array bus. Communication interfaces are provided between adjacent cells in the cellular array, so that adjacent cells can send data to each other.
Optionally, the communication between the main control CPU and each cell in the cellular array through the cellular array bus includes at least one of the following:
Reading and writing, by address, the non-volatile random access memory of any cell in the cellular array;
Broadcasting data to the non-volatile random access memory of each cell in a target region of the cellular array, and writing it to the same relative address in the non-volatile random access memory of each cell in the target region;
Sending an instruction or data to, or reading the state of, the microprocessor of any cell in the cellular array;
Broadcasting an instruction to the microprocessors of all cells in the target region.
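The four kinds of bus communication listed above can be pictured with a small software model. This is only an illustrative sketch: the names `Cell`, `CellArrayBus`, `write`, `broadcast_write`, and so on, the 32-byte memory, the rectangular region tuple, and the string instructions are all assumptions introduced here, not part of the described hardware.

```python
class Cell:
    """Hypothetical model of one cell: an ID, a small NVRAM, and an MPU inbox."""

    def __init__(self, x, y, nvram_size=32):
        self.id = (x, y)
        self.nvram = bytearray(nvram_size)
        self.inbox = []        # instructions delivered to the MPU
        self.state = "idle"    # state that the main control CPU may read


class CellArrayBus:
    """Hypothetical model of the array as seen by the main control CPU."""

    def __init__(self, width, height):
        self.cells = {(x, y): Cell(x, y)
                      for x in range(width) for y in range(height)}

    def _region(self, a, b, c, d):
        # Cells with a <= x <= b and c <= y <= d (a rectangular target region).
        return [cell for (x, y), cell in self.cells.items()
                if a <= x <= b and c <= y <= d]

    # 1. Read and write the NVRAM of any single cell.
    def write(self, cell_id, rel_addr, data):
        self.cells[cell_id].nvram[rel_addr:rel_addr + len(data)] = data

    def read(self, cell_id, rel_addr, length):
        return bytes(self.cells[cell_id].nvram[rel_addr:rel_addr + length])

    # 2. Broadcast data to the same relative address in every cell of a region.
    def broadcast_write(self, region, rel_addr, data):
        for cell in self._region(*region):
            cell.nvram[rel_addr:rel_addr + len(data)] = data

    # 3. Send an instruction to one MPU, or read its state.
    def send_instruction(self, cell_id, instruction):
        self.cells[cell_id].inbox.append(instruction)

    def read_state(self, cell_id):
        return self.cells[cell_id].state

    # 4. Broadcast an instruction to every MPU in a region.
    def broadcast_instruction(self, region, instruction):
        for cell in self._region(*region):
            cell.inbox.append(instruction)
```

In this model a broadcast is a loop, whereas in the described system a single bus transaction would reach every cell in the region at once; the loop only stands in for that behaviour.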
Optionally, each cell in the cellular array further comprises a bus controller and a cell internal bus. The bus controller is connected to the cellular array bus, the microprocessor, and the cell internal bus. The bus controller identifies communication between the main control CPU and this cell, and either connects the microprocessor to deliver instructions or data sent by the main control CPU or to allow its state to be read, or connects the non-volatile random access memory through the cell internal bus for data read and write operations.
Optionally, the microprocessor integrates at least one of a floating-point unit (FPU, Floating Point Unit) and an image processor.
Optionally, the non-volatile random access memory is magnetic random access memory (MRAM, Magnetic Random Access Memory).
Optionally, the main control CPU, the cellular array, and the cellular array bus are integrated in one chip.
Optionally, the main control CPU is an independent chip that communicates, through a standard memory interface, with the chip formed by the cellular array and the cellular array bus.
To solve the above problem, the technical solution of the present invention also provides a communication method in the above cellular array computing system, comprising:
Any cell in the cellular array receives the target address broadcast by the main control CPU on the cellular array bus and, if it determines that the target address lies within this cell, connects the non-volatile random access memory of this cell so that the main control CPU can perform data read and write operations;
A first special address segment is reserved in the system address space for communication between the main control CPU and a microprocessor, and stores the ID of the target cell. If any cell in the cellular array, on receiving the first special address segment, identifies the communication as being with its own microprocessor, it connects its microprocessor to complete the subsequent instruction reception, data reception, and state read operations;
A second special address segment is reserved in the system address space for the main control CPU to broadcast instructions. The second special address segment carries the IDs of cells that help determine the extent of the target region in the cellular array. If any cell in the cellular array, on receiving the second special address segment, identifies itself as being in the target region, it connects its microprocessor to deliver the instructions or data sent by the main control CPU or to allow its state to be read, or connects its non-volatile random access memory for data read and write operations;
Any cell in the cellular array sends data to an adjacent cell under the control of its microprocessor.
Optionally, each cell in the cellular array further comprises a bus controller and a cell internal bus, the bus controller being connected to the cellular array bus, the microprocessor, and the cell internal bus. For any cell in the cellular array, determining whether the target address lies within this cell, identifying whether the communication is with this cell's microprocessor, identifying whether this cell is in the target region, and connecting the non-volatile random access memory or the microprocessor are all performed by the bus controller, which connects the non-volatile random access memory through the cell internal bus.
Optionally, read and write operations by the main control CPU on the non-volatile random access memory of any cell in the cellular array take priority over read and write operations by that cell's microprocessor on the same non-volatile random access memory.
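The priority rule can be expressed as a small arbitration function. This is a minimal sketch under assumed conventions: the function name `arbitrate` and the representation of a pending request as any non-`None` value are hypothetical, and the text does not say how the hardware implements the rule.

```python
def arbitrate(cpu_request, mpu_request):
    """Choose which pending NVRAM request is served first.

    Each argument is None when no request is pending. When both are pending,
    the main control CPU's request is always served first.
    """
    if cpu_request is not None:
        return ("main_cpu", cpu_request)
    if mpu_request is not None:
        return ("mpu", mpu_request)
    return None
```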
The technical solution of the present invention also provides a method of computing a Monte Carlo integral using the above cellular array computing system, comprising:
The main control CPU selects all cells in the cellular array, or the cells in a target region, and broadcasts the program corresponding to the integrand to the same relative address segment of each selected cell;
The main control CPU broadcasts an instruction causing the microprocessors of the selected cells to begin executing the program corresponding to the integrand from that relative address segment;
After each cell completes its integration, it stores its sum at an agreed address, to be read by the main control CPU, which then computes the overall total.
Optionally, when the program corresponding to the integrand starts executing, its built-in random number generator reads the ID of the cell as its seed.
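The method above can be simulated sequentially in Python to show the role of the cell ID as a seed. This is a sketch under stated assumptions: the function names, the way the (x, y) ID is folded into an integer seed, the integration interval [0, 1), and the example integrand are all choices made here for illustration.

```python
import random


def cell_partial_sum(cell_id, integrand, samples):
    """Work done inside one cell: sum integrand values at random points in [0, 1)."""
    x, y = cell_id
    rng = random.Random(x * 1_000_003 + y)   # assumed way to turn the ID into a seed
    return sum(integrand(rng.random()) for _ in range(samples))


def monte_carlo(integrand, region, samples_per_cell):
    """Main control CPU side: select a rectangular region, then total the cells' sums."""
    a, b, c, d = region
    ids = [(x, y) for x in range(a, b + 1) for y in range(c, d + 1)]
    partial_sums = [cell_partial_sum(i, integrand, samples_per_cell) for i in ids]
    return sum(partial_sums) / (len(ids) * samples_per_cell)


estimate = monte_carlo(lambda t: t * t, region=(0, 7, 0, 7), samples_per_cell=2000)
print(estimate)   # close to 1/3, the integral of t^2 over [0, 1]
```

Because every cell has a different ID, every cell draws a different random sequence even though all cells run the same broadcast program; without this, all cells would repeat the same samples and the extra cells would add nothing.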
The technical solution of the present invention also provides another method of computing a Monte Carlo integral using the above cellular array computing system, comprising:
The main control CPU selects all cells in the cellular array, or the cells in a target region;
The main control CPU broadcasts a download program to the same relative address segment of each selected cell, and broadcasts an instruction causing the microprocessors of the selected cells to begin executing the download program from that relative address;
The program corresponding to the integrand is split into two or more subprograms, and the main control CPU broadcasts each subprogram to the microprocessors of the selected cells;
Each microprocessor running the download program selects one of the subprograms to store according to the ID of its own cell, so that the subprograms are deployed in order across a group of successively adjacent cells;
The main control CPU broadcasts an instruction causing the microprocessors of each group of cells to execute, in sequence, the subprograms into which the integrand's program was split, with the intermediate result of each stage passed as input to the next stage;
After each group of cells completes its integration, it stores its sum at an agreed address, to be read by the main control CPU, which then computes the overall total.
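The pipelined variant can be sketched in the same way. Everything specific here is an assumption made for illustration: the example integrand f(t) = (2t + 1)^2, its split into three subprograms, the rule that a cell picks its stage from the x coordinate of its ID, and the use of the row index as the seed.

```python
import random

# Assumed example: f(t) = (2t + 1)^2 split into three subprograms.
SUBPROGRAMS = [lambda t: 2 * t, lambda u: u + 1, lambda v: v * v]


def select_subprogram(cell_id):
    """Download-program logic: a cell picks its stage from its own ID (here, x)."""
    x, _ = cell_id
    return SUBPROGRAMS[x % len(SUBPROGRAMS)]


def run_pipeline_row(y, samples):
    """One group of adjacent cells (x = 0, 1, 2 in row y) acting as a pipeline."""
    stages = [select_subprogram((x, y)) for x in range(len(SUBPROGRAMS))]
    rng = random.Random(y)            # row index as seed, an assumption
    total = 0.0
    for _ in range(samples):
        value = rng.random()
        for stage in stages:          # each result is handed to the adjacent cell
            value = stage(value)
        total += value
    return total


rows, samples = 8, 5000
estimate = sum(run_pipeline_row(y, samples) for y in range(rows)) / (rows * samples)
print(estimate)   # close to 13/3, the integral of (2t + 1)^2 over [0, 1]
```

This simulation runs the stages one after another, whereas cells arranged as a hardware pipeline would work on different samples at the same time.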
Compared with the prior art, the technical solution of the present invention has at least the following advantages:
A two-dimensional array (called a "cellular array") is formed from more than one unit (called a "cell") having independent computing and storage functions. Each cell comprises a microprocessor and a non-volatile random access memory. The non-volatile random access memory supports random access to the data involved in the microprocessor's computations, and also supports storing software instruction code and data requiring permanent preservation, so that the three functions of memory, storage, and computation are integrated in each cell and a dense communication network is formed among the cells. On the one hand, the main control CPU can communicate with each cell in the cellular array through the cellular array bus; on the other hand, adjacent cells in the cellular array can send data to each other. Data mass-sending and the internal network thereby overcome the communication bottleneck between the CPU and memory and storage in existing computer architectures, improving the overall performance of the computing system and its cost-effectiveness.
In the prior art, when a multi-core CPU architecture is used to compute a Monte Carlo integral and the cache of each CPU is insufficient, all CPUs must read the integrand's code through the memory interface, which creates a bottleneck. When the above cellular array computing system is used to compute a Monte Carlo integral, the advantages of its broadcast/mass-sending function and of its internal network for large-scale parallel data transmission are fully realized: a large number of cells computing simultaneously release the full computing power of the cellular array computing system, so the Monte Carlo integral is computed more efficiently.
Brief description of the drawings
Fig. 1 is a schematic diagram of a prior-art computer architecture;
Fig. 2 is a structural schematic diagram of a cellular array computing system provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of one mode of communication between adjacent cells in an embodiment of the present invention;
Fig. 4 is a schematic diagram of another mode of communication between adjacent cells in an embodiment of the present invention;
Fig. 5 is a structural schematic diagram of a cell in an embodiment of the present invention;
Fig. 6 is a schematic diagram of computing a Monte Carlo integral in a pipelined manner using the cellular array computing system;
Fig. 7 is a structural schematic diagram of a cell performing inter-cell communication in the cellular array of an embodiment of the present invention;
Fig. 8 is a schematic diagram of path selection for inter-cell communication in the cellular array of an embodiment of the present invention;
Fig. 9 is a schematic diagram of the implementation process of a dedicated output cell in an embodiment of the present invention;
Fig. 10 is a schematic diagram of cell mass-sending when the start cell is at a corner of the target region in the cellular array of an embodiment of the present invention;
Fig. 11 is a schematic diagram of cell mass-sending when the start cell is on an edge of the target region in the cellular array of an embodiment of the present invention;
Fig. 12 is a schematic diagram of cell mass-sending when the start cell is inside the target region in the cellular array of an embodiment of the present invention;
Fig. 13 is a schematic diagram of cell mass-sending when the start cell is outside the target region in the cellular array of an embodiment of the present invention;
Fig. 14 is another structural schematic diagram of the cellular array computing system provided by an embodiment of the present invention;
Fig. 15 is yet another structural schematic diagram of the cellular array computing system provided by an embodiment of the present invention;
Fig. 16 is a further structural schematic diagram of the cellular array computing system provided by an embodiment of the present invention;
Fig. 17 is a structural schematic diagram of a camera system provided by an embodiment of the present invention;
Fig. 18 is another structural schematic diagram of the camera system provided by an embodiment of the present invention;
Fig. 19 is a schematic diagram of a neuron in a neural network;
Fig. 20 is a schematic diagram of neural network computation;
Fig. 21 is a structural schematic diagram of a cellular array computing system with a debugging interface provided by an embodiment of the present invention.
Detailed description of the invention
Prior-art computer architectures suffer a communication bottleneck between the CPU and memory and storage, which limits improvement of overall computer performance and results in poor cost-effectiveness.
After study, the inventors considered that if the three functions of memory, storage, and computation are integrated on one chip to form relatively simple units having independent computing and storage functions, and a dense communication network is formed among a large number of such units so as to provide a data mass-sending function and an internal network for large-scale parallel data transmission, then a computing architecture with similarities to the human brain can be developed. This is equivalent to fabricating a large number of microcomputers on one chip.
To this end, the technical solution of the present invention provides a computing architecture structurally similar to the human brain (referred to in the technical solution as a "cellular array computing system"). This architecture is composed of numerous units (referred to in the technical solution as "cells") that are relatively simple in structure, combine storage and computing functions, and are connected by a dense network. This new computing architecture is expected to find wide application in fields such as large-scale computing, big data processing, and artificial intelligence.
To make the above objects, features, and advantages of the present invention more apparent, specific embodiments of the invention are described in detail below with reference to the accompanying drawings.
As shown in Fig. 2, the cellular array computing system provided by the embodiment of the present invention comprises: a main control CPU, a cellular array, and a cellular array bus. The cellular array is the main body of the cellular array computing system. It is a two-dimensional array composed of more than one cell having both computing and storage functions, wherein each cell comprises a microprocessor (MPU) and a non-volatile random access memory (MRAM is used as the example in Fig. 2). The non-volatile random access memory is used for random access to the data involved in the microprocessor's computations, and also for storing software instruction code and data requiring permanent preservation. Each cell stores its own position in the cellular array as an ID to be read by software or hardware in the cell. The main control CPU communicates with each cell in the cellular array through the cellular array bus. Communication interfaces are provided between adjacent cells in the cellular array, so that adjacent cells can send data to each other.
It should be noted that in this embodiment the non-volatile random access memory is described using MRAM as an example. In other embodiments, as non-volatile random access memory technology develops and matures, the non-volatile random access memory may also be implemented with other candidate technologies, for example phase-change random access memory (PCRAM, Phase Change Random Access Memory), resistive random access memory (Resistive Random Access Memory), ferroelectric random access memory (FeRAM, Ferroelectric Random Access Memory), or ferroelectric dynamic random access memory (FEDRAM, Ferroelectric Dynamic Random Access Memory).
MRAM is a new memory and storage technology. It supports fast random reads and writes like SRAM/DRAM, and is faster than DRAM; it also retains data permanently after power-off like flash memory, and unlike NAND flash it can be rewritten an unlimited number of times and has a longer service life. In addition, the economics of MRAM are quite good: the silicon area occupied per unit of capacity has a large advantage over SRAM (commonly used for CPU caches) and is expected to approach that of DRAM. Its performance is also good: read and write latency approaches that of the best SRAM, and its power consumption is the lowest among the various memory and storage technologies. Moreover, unlike DRAM and flash, which are incompatible with standard CMOS semiconductor processes, MRAM can be integrated with logic circuits in one chip. By adopting MRAM technology, the three functions of memory, storage, and computation can be integrated on one chip, which makes the cellular array computing system feasible.
In this embodiment, the microprocessor has the functions of an ordinary CPU, and units such as a floating-point unit (FPU, Floating Point Unit) or an image processor may be added depending on the application. The microprocessor may therefore integrate at least one of a floating-point unit and an image processor.
In practice, the main control CPU, the cellular array, and the cellular array bus may be integrated in one chip; alternatively, the main control CPU may be an independent chip that communicates through a standard memory interface with the chip formed by the cellular array and the cellular array bus. When a standard memory interface is used between the main control CPU and the cellular array, the main control CPU can be a general-purpose CPU chip, which makes the cellular array computing system easier to implement.
In this embodiment, each cell stores its own position in the cellular array as its ID. The position may be expressed as a coordinate in the first quadrant of a plane rectangular coordinate system: if (x, y) denotes the position of a cell in the cellular array, (x, y) can be stored in the cell as its ID. Software and hardware in the cell can read this ID and use it in specific operations.
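One possible way to hold such a coordinate in a single stored word is sketched below. The text only says that (x, y) can serve as the ID; the packing into one integer, the 8 bits per axis, and the names `pack_id` and `unpack_id` are assumptions made for illustration.

```python
def pack_id(x, y, bits=8):
    """Pack a first-quadrant coordinate (x, y) into one integer ID.

    The number of bits per axis is an assumed parameter.
    """
    if not (0 <= x < (1 << bits) and 0 <= y < (1 << bits)):
        raise ValueError("coordinate out of range")
    return (x << bits) | y


def unpack_id(cell_id, bits=8):
    """Recover (x, y) from a packed ID."""
    return cell_id >> bits, cell_id & ((1 << bits) - 1)
```

For example, `pack_id(3, 5)` gives 773 with the default width, and `unpack_id(773)` returns `(3, 5)`.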
In this embodiment, the communication between the main control CPU and each cell in the cellular array through the cellular array bus includes the following:
Reading and writing, by address, the non-volatile random access memory of any cell in the cellular array;
Broadcasting data to the non-volatile random access memory of each cell in a target region of the cellular array, and writing it to the same relative address in the non-volatile random access memory of each cell in the target region;
Sending instructions (including start and pause) or data to, or reading the state of, the microprocessor of any cell in the cellular array;
Broadcasting an instruction to the microprocessors of all cells in the target region.
Of course, in other embodiments, the communication between the main control CPU and each cell in the cellular array through the cellular array bus may also be any one of the above, or a combination of several of them.
It should be noted that a "target region" in the embodiments of the present invention refers to a region, selected by the main control CPU or by any cell in the cellular array, formed by more than one mutually adjacent cell. The cells in this region are the recipients of data or instructions broadcast or mass-sent by the main control CPU or by any cell in the cellular array. In this embodiment, the target region is illustrated as a rectangular region a≤x≤b, c≤y≤d (where a and b are the boundary coordinates of the rectangular region along the x axis of the plane rectangular coordinate system, and c and d are its boundary coordinates along the y axis). In other embodiments, the target region may also have another shape, for example a diamond, a triangle, or a hexagon.
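Membership in a target region reduces to a simple test on a cell's ID. The sketch below shows the rectangular case from the text and, as one assumed parameterisation of the diamond alternative, a centre and a radius measured in grid steps; the function names are hypothetical.

```python
def in_rectangle(cell_id, a, b, c, d):
    """True if the cell lies in the rectangle a <= x <= b, c <= y <= d."""
    x, y = cell_id
    return a <= x <= b and c <= y <= d


def in_diamond(cell_id, centre, radius):
    """True if the cell is within `radius` grid steps of `centre` (assumed definition)."""
    x, y = cell_id
    cx, cy = centre
    return abs(x - cx) + abs(y - cy) <= radius


def select(width, height, predicate):
    """List the IDs of all cells in a width-by-height array that satisfy the predicate."""
    return [(x, y) for x in range(width) for y in range(height) if predicate((x, y))]
```

For instance, `select(8, 8, lambda i: in_rectangle(i, 1, 3, 2, 5))` yields the 12 cells of a 3-by-4 rectangle.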
In addition, in the embodiments of the present invention the concept of "broadcast" differs from that of "mass-sending": in the former, data or an instruction is sent once and all recipients receive it; in the latter, it is sent many times, each time to a different recipient.
Besides broadcasting by the main control CPU to any cell in the cellular array (including the microprocessor or non-volatile random access memory in a cell), there is also a communication network within the cellular array that enables a cell, under the control of its MPU, to send data to the cells adjacent to it. As shown in Fig. 3, within a plane any cell can communicate with its adjacent cells in four directions: up, down, left, and right. Of course, communication between adjacent cells is not limited to these four directions. Where the layout supports it, eight directions (up, down, left, right, upper left, upper right, lower left, and lower right) are also possible, as shown in Fig. 4, where any cell can communicate with its adjacent cells in those eight directions.
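The two neighbour arrangements can be enumerated directly from a cell's ID. This is a minimal sketch; the names `FOUR`, `EIGHT`, and `neighbours`, and the assumption that cells at the edge of the array simply have fewer neighbours, are introduced here.

```python
# Offsets for the four-direction case (up, down, left, right) ...
FOUR = [(0, 1), (0, -1), (-1, 0), (1, 0)]
# ... and for the eight-direction case, which adds the four diagonals.
EIGHT = FOUR + [(-1, 1), (1, 1), (-1, -1), (1, -1)]


def neighbours(cell_id, width, height, directions=FOUR):
    """IDs of the adjacent cells that exist inside a width-by-height array."""
    x, y = cell_id
    return [(x + dx, y + dy) for dx, dy in directions
            if 0 <= x + dx < width and 0 <= y + dy < height]
```

In a 4-by-4 array, an interior cell such as (1, 1) has four or eight neighbours depending on the arrangement, while the corner cell (0, 0) has two or three.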
As shown in Fig. 5, in this embodiment each cell in the cellular array may further comprise a bus controller and a cell internal bus. The bus controller is connected to the cellular array bus, the microprocessor, and the cell internal bus. The bus controller identifies communication between the main control CPU and this cell, and either connects the microprocessor to deliver instructions or data sent by the main control CPU or to allow its state to be read, or connects the non-volatile random access memory through the cell internal bus for data read and write operations.
Those skilled in the art know that a fairly simple yet well-performing CPU such as the ARM Cortex-M0 has only about 50,000 MOS transistors. Even with a modest increase for FPU functionality, this is far fewer than the hundred-million-plus MOS transistors of a top-end CPU; the area (cost) increase incurred in raising CPU performance is disproportionate to the gain. If one large CPU is replaced with many small CPUs at the same total cost, the total computing capability increases many times over. Traditional computer architectures, however, are limited by the communication bottleneck, so the actual performance gain from using a large number of CPU cores is very limited.
The cellular array computing system provided by the technical solution of the present invention solves the communication bottleneck through data broadcast and the internal network, thereby improving the overall performance of the computing system and its cost-effectiveness. This will be seen more clearly in the subsequent application examples.
Preliminary study shows that a cell can be formed from an MPU similar to the Cortex-M0 together with 32 KB of memory. With a 40-nanometer process, 3,000 such cells can be fabricated on one chip, which is a very powerful computing capability. Further study shows that in this way the computing capability of contemporary top-end CPUs (generally measured in floating-point operations per second, FLOPS) can be exceeded on the same silicon area. Because the cellular array computing system of the technical solution of the present invention no longer faces the memory-interface bottleneck, its performance will be even better on many practical problems.
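The figures quoted above can be put side by side with a little arithmetic. The sketch below only restates the numbers from the text; it counts logic transistors alone, ignores the area of memory and interconnect, and reads "32 KB" as 32 KiB, all of which are simplifying assumptions.

```python
big_core_transistors = 100_000_000   # order of magnitude given for a top-end CPU core
small_core_transistors = 50_000      # approximate figure given for a Cortex-M0-class core
cores_per_budget = big_core_transistors // small_core_transistors

cells_per_chip = 3000                # from the preliminary study
memory_per_cell_kib = 32             # 32 KB per cell, read here as KiB
total_memory_mib = cells_per_chip * memory_per_cell_kib / 1024

print(cores_per_budget)    # 2000
print(total_memory_mib)    # 93.75
```

On transistor count alone, one large core's budget corresponds to about 2,000 small cores, and 3,000 cells of 32 KB each hold just under 94 MiB in total.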
Based on the above cellular array computing system, the embodiment of the present invention also provides a communication method in the above cellular array computing system, comprising: the operation of the main control CPU reading and writing non-volatile random access memory, the communication operation between the main control CPU and a microprocessor, the broadcast operation of the main control CPU, and the communication operation between adjacent cells in the cellular array.
The operation of the main control CPU reading and writing non-volatile random access memory specifically comprises: any cell in the cellular array receives the target address broadcast by the main control CPU on the cellular array bus and, if it determines that the target address lies within this cell, connects the non-volatile random access memory of this cell so that the main control CPU can perform data read and write operations.
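The check that a cell performs on a broadcast target address can be sketched as follows. The address map is an assumption made purely for illustration: each cell is given a contiguous 32 KB window, and windows are laid out row by row in a 64-cell-wide array. The text does not specify how cell memories are mapped into the system address space.

```python
CELL_MEM_SIZE = 32 * 1024   # assumed NVRAM size per cell, in bytes
ARRAY_WIDTH = 64            # assumed number of cells per row


def cell_base(cell_id):
    """Start of this cell's window in the system address space (assumed layout)."""
    x, y = cell_id
    return (y * ARRAY_WIDTH + x) * CELL_MEM_SIZE


def decode(cell_id, target_addr):
    """Return the relative address if target_addr lies in this cell, else None."""
    offset = target_addr - cell_base(cell_id)
    return offset if 0 <= offset < CELL_MEM_SIZE else None
```

Every cell sees the same broadcast address, but under this layout only one cell's `decode` returns a relative address; that cell connects its memory and the others ignore the access. Since a relative address of 0 is valid, the result should be compared against `None` rather than tested for truth.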
The communication operation between the master CPU and a microprocessor specifically comprises: a first special address segment is reserved in the system address space for communication between the master CPU and the microprocessors and for carrying the ID of the target cell; if any cell in the cellular array, on receiving the first special address segment, recognizes that the communication is addressed to the microprocessor of this cell, it connects the microprocessor of this cell to complete the subsequent command reception, data reception, and status read operations.
It should be noted that the system address space is not limited to the sum of the address spaces of the non-volatile random access memories contained in the cells of the cellular array, because the memories connected to the cellular array bus are not necessarily only the non-volatile random access memories contained in the cells; other types of memory may well also be connected to the cellular array bus for access by the master CPU. The master CPU therefore needs to identify the cell it intends to access by the cell's ID (in this embodiment, that cell is called the "target cell").
The broadcast operation of the master CPU specifically comprises: a second special address segment is reserved in the system address space for the master CPU to broadcast instructions; the second special address segment carries the IDs of cells that help determine the extent of a target region in the cellular array; if any cell in the cellular array, after receiving the second special address segment, recognizes that it lies within the target region, it either connects the microprocessor of this cell to pass on the instructions or data sent by the master CPU or to allow status reads, or connects the non-volatile random access memory of this cell to perform data read/write operations.
The broadcast operation of the master CPU is illustrated below, taking a rectangular target region as an example. A segment of the system address space is reserved for broadcast instructions, and part of this address is used to store the ID of the start cell of the target rectangular region. The start cell is the first cell in the target rectangular region to be accessed by the master CPU. After the bus controller in a cell receives this special address, it receives the following word of data, which contains the ID of the cell diagonally opposite the start cell in the target rectangular region. If the bus controller determines that this cell lies within the region, it receives a second word of data. The second word indicates either that an instruction or data for the MPU follows, or that writing is to begin at some relative address in the non-volatile random access memory. In the former case the MPU is connected; in the latter case the non-volatile random access memory is connected to complete the subsequent operation.
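The rectangular-region test made by each bus controller can be sketched as below. The sketch assumes that a cell ID is an (x, y) coordinate pair and that the second word is modelled as the string "mpu" or "memory"; both encodings, and the function names, are hypothetical.

```python
def in_target_rectangle(cell_id, start_id, diagonal_id):
    """True if cell_id lies in the rectangle spanned by the two corner IDs."""
    (x, y), (x0, y0), (x1, y1) = cell_id, start_id, diagonal_id
    return (min(x0, x1) <= x <= max(x0, x1)
            and min(y0, y1) <= y <= max(y0, y1))


def handle_broadcast(cell_id, start_id, words):
    """Decide what the bus controller connects for one broadcast.

    words[0] carries the ID of the cell diagonal to the start cell;
    words[1] selects the MPU or the memory (assumed encoding).
    """
    diagonal_id, selector = words[0], words[1]
    if not in_target_rectangle(cell_id, start_id, diagonal_id):
        return "ignore"
    return "connect_mpu" if selector == "mpu" else "connect_memory"
```

Taking the minimum and maximum of the two corners means the check does not depend on which corner the master CPU chooses as the start cell.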
It should be noted that, when the storage space of the second special address segment is rather limited, the cell ID it stores may not by itself fully determine the extent of the target region; in that case, after the second special address segment is received, subsequent data must also be received which, together with the cell ID stored in the second special address segment, determine the extent of the target region.
The communication operation between adjacent cells in the cellular array comprises: any cell in the cellular array sends data to an adjacent cell under the control of its microprocessor.
In this embodiment, each cell is provided with a bus controller connected to the cellular array bus. A cell internal bus is provided inside the cell; the non-volatile random access memory is a slave device (Slave) on this cell internal bus, and the bus controller and the microprocessor are master devices (Master).
In the communication method in the above cellular array computing system, the steps "any cell in the cellular array determines whether the target address lies within this cell", "recognizing whether the communication is addressed to the microprocessor of this cell", "recognizing whether this cell lies within the target region", and "connecting the non-volatile random access memory or the microprocessor" are all performed by the bus controller, which connects to the non-volatile random access memory through the cell internal bus.
In a specific implementation, a read/write operation by the master CPU on the non-volatile random access memory of any cell in the cellular array has higher priority than a read/write operation by the microprocessor in that cell on the same non-volatile random access memory. That is, if the microprocessor in a cell needs to read or write the non-volatile random access memory in that cell, it must wait until the master CPU has completed its read/write operation on the non-volatile random access memory of that cell.
For the specific implementation of the communication method in the cellular array computing system, reference may also be made to the implementation of the above cellular array computing system, which is not repeated here.
In addition, an embodiment of the present invention further provides a method of computing a Monte Carlo integral using the above cellular array computing system. Monte Carlo integration is a summation over random numbers and is a computation commonly used in science and engineering; its principle is relatively simple, and the computation follows the formula below.
$$S = \sum_{\mathrm{Random}(x)} F(x_1, x_2, \ldots, x_N)$$
The solution to this problem is used below to further demonstrate the advantages of the cellular array computing system. The computation of a Monte Carlo integral is very well suited to the above cellular array computing system, and the specific steps are as follows:
The master CPU selects all the cells in the cellular array, or the cells in a target region, and broadcasts the program corresponding to the integrand F() to a relative address segment of each selected cell;
The master CPU broadcasts an instruction that causes the microprocessors of the selected cells to start executing the program corresponding to the integrand F() from that relative address segment;
After each cell completes its integration, it stores its sum at an agreed address, so that the master CPU can read the sums and compute the overall total.
In this embodiment, when the program corresponding to the integrand F() starts executing, its built-in random number generator reads the ID of the cell as the seed, which ensures that the random numbers produced by the cells differ from one another.
In actual implementation, several thousand cells start computing simultaneously and their computing power is fully released, no longer limited by the communication bottleneck of the prior art, so the Monte Carlo integral can be computed more efficiently.
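The steps above can be simulated sequentially in a few lines. This is a sketch under stated assumptions, not the patented implementation: the integrand x*x on [0, 1], the 8 x 8 array, the way a cell ID is flattened into an integer seed, and the final division that turns the sum S into a mean are all choices made for the example.

```python
import random


def integrand(x):
    """Illustrative F(); its exact integral over [0, 1] is 1/3."""
    return x * x


def cell_partial_sum(cell_id, samples):
    """Work done by one cell: seed the generator with the cell ID, sum F()."""
    width = 1000  # assumed bound on the array width, used to flatten the ID
    rng = random.Random(cell_id[1] * width + cell_id[0])
    return sum(integrand(rng.random()) for _ in range(samples))


def monte_carlo(cells, samples_per_cell):
    """Master CPU side: read every cell's stored sum and combine them."""
    partials = {cid: cell_partial_sum(cid, samples_per_cell) for cid in cells}
    total = sum(partials.values())                  # the sum S
    return total / (len(cells) * samples_per_cell)  # mean, i.e. the estimate


cells = [(x, y) for x in range(8) for y in range(8)]
estimate = monte_carlo(cells, 2000)
```

Because every cell seeds its generator differently, the partial sums are independent samples; in the real system the per-cell loops would run in parallel rather than one after another.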
In actual implementation, if the integrand F() is so complex that it cannot be loaded into the memory of one cell, the problem can still be solved by pipelining. This embodiment therefore further provides another method of computing a Monte Carlo integral using the above cellular array computing system, comprising:
The master CPU selects all the cells in the cellular array, or the cells in a target region;
The master CPU broadcasts a download program to the same relative address segment of each selected cell, and broadcasts an instruction that causes the microprocessors of the selected cells to start executing the download program from that relative address; the download program then waits for further input;
The program corresponding to the integrand is split into two or more subprograms, and the master CPU broadcasts each subprogram to the microprocessors of the selected cells;
Each microprocessor running the download program selects one of the subprograms to store according to the ID of the cell in which it resides, so that the subprograms are deployed in sequence in a group of successively adjacent cells;
The master CPU broadcasts an instruction that causes the microprocessors of each group of cells to execute in sequence the subprograms into which the program corresponding to the integrand was split, with the intermediate result of each stage delivered as input to the next stage;
After each group of cells completes its integration, it stores its sum at an agreed address, so that the master CPU can read the sums and compute the overall total.
For example, as shown in Fig. 6, the integrand F() is split into three parts f1, f2, and f3 (three subprograms), which are deployed in adjacent cells, with the intermediate result of each stage delivered as input to the next stage.
Specifically, when the master CPU broadcasts f1, f2, and f3 to each MPU (note: they are sent to the MPU, not written to memory), the MPU running the download program selects one of the subprograms to store according to the x-axis coordinate X of its own ID in the rectangular coordinate system (for example, using the remainder of X divided by 3). In this way, through a two-stage broadcast, the three subprograms are deployed according to the desired rule in all the cells taking part in the computation.
In addition, because execution of the program corresponding to the integrand F() actually begins with the first subprogram, when the first subprogram after the split starts executing, its built-in random number generator still reads the ID of the cell as the seed, ensuring that the random numbers produced by the cells differ from one another.
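The selection rule "remainder of X divided by 3" can be illustrated as follows. The three stage functions are placeholders invented for the example, and run_pipeline simply passes a value along three adjacent cells of one row; none of these names comes from the text.

```python
def f1(u):
    return u * u      # placeholder first stage


def f2(v):
    return v + 1.0    # placeholder second stage


def f3(w):
    return 2.0 * w    # placeholder third stage


SUBPROGRAMS = [f1, f2, f3]


def select_subprogram(cell_id):
    """Download-program rule: keep the subprogram indexed by x mod 3."""
    x, _y = cell_id
    return SUBPROGRAMS[x % len(SUBPROGRAMS)]


def run_pipeline(row_y, first_x, value):
    """Pass a value through three adjacent cells of one row, stage by stage.

    first_x is assumed to be a multiple of 3 so that the group starts with f1.
    """
    for x in range(first_x, first_x + len(SUBPROGRAMS)):
        value = select_subprogram((x, row_y))(value)
    return value
```

For instance, run_pipeline(0, 0, 3.0) evaluates f3(f2(f1(3.0))), which with these placeholder stages is 20.0.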
Compared with a traditional computer architecture, the advantages of the cellular array computing system's broadcast capability and of its internal network, which can transfer data in parallel on a large scale, are evident here. If this problem were computed with a traditional multi-core CPU architecture, then whenever the cache of each CPU was insufficient, all the CPUs would need to read the code of the integrand F() through the memory interface, forming a bottleneck.
In this embodiment, the internal network of the cellular array is implemented so that data can not only be sent to adjacent cells but, by extension, be sent from one cell to any other cell, thereby achieving cell-to-cell communication within the cellular array.
Specifically, any two cells in the cellular array of the cellular array computing system can communicate without relying on the master CPU. The cells taking part in cell-to-cell communication comprise a start cell, a terminal cell, and transfer cells: the start cell is the cell that sends data to the terminal cell; the terminal cell is the cell that finally receives the data sent by the start cell; the transfer cells are the cells that are successively adjacent along the cell-to-cell communication path and relay the data sent by the start cell through the communication interfaces; and the cell-to-cell communication path is the data transmission path formed by the start cell, the transfer cells, and the terminal cell.
By relaying data repeatedly between adjacent cells through the communication interfaces between adjacent cells in the cellular array, any two cells in the cellular array can communicate without relying on the master CPU, which improves the efficiency of cell-to-cell communication and reduces the processing load on the master CPU, thereby further improving the overall performance of the computing system.
It should be noted that the start cell, terminal cell, and transfer cell are all concepts relative to a particular cell-to-cell communication process: a start cell may well act as a transfer cell or terminal cell in another cell-to-cell communication process, and a terminal cell may likewise act as a transfer cell or start cell in another cell-to-cell communication process.
In a specific implementation, a cell in the cellular array may further comprise a network controller connected to the microprocessor; the network controller controls the sending and receiving of data sent out, data being relayed, or data finally received during cell-to-cell communication, and also sends interrupt signals to the microprocessor. In this embodiment, a network controller is provided in each cell so that data can be relayed quickly without disturbing the MPU, thereby reducing the processing load on the MPU in the cell. In other embodiments, the network controller may be omitted and the relaying of data performed by the MPU instead.
In this embodiment, "data sent out" refers to data that the start cell itself sends; "relayed data" refers to the data sent by the start cell that a transfer cell relays, data that the transfer cell does not itself need to send; "finally received data" refers to the data received by the terminal cell, which have reached their destination after repeated relaying and will not be relayed further. In terms of content, "data sent out", "relayed data", and "finally received data" may be identical data; they are merely different names for different stages of the communication.
In a specific implementation, a cell in the cellular array may further comprise one or more groups of first-in-first-out (FIFO) queues connected to the network controller, each group corresponding to one cell adjacent to this cell. Each group of FIFO queues comprises an input FIFO queue and an output FIFO queue: the input FIFO queue stores data entering this cell to be relayed or finally received, and the output FIFO queue stores data leaving this cell that need to be relayed or that this cell sends to other cells.
Taking the communication mode between adjacent cells shown in Fig. 3 as an example, the structure of a cell that performs cell-to-cell communication in the cellular array of this embodiment is shown in Fig. 7. The network controller in Fig. 7 is connected to the MPU and to four groups of FIFO queues, the groups corresponding one-to-one to the cells adjacent to this cell in the four directions "up, down, left, and right" in the two-dimensional plane. In a specific implementation, the communication channel between every two adjacent cells may share one corresponding group of FIFO queues. Each group of FIFO queues comprises an input FIFO and an output FIFO. From the viewpoint of a given cell, the input FIFO holds data input from an adjacent cell and the output FIFO holds data output from this cell to an adjacent cell; the output FIFO of an adjacent cell is an input FIFO for this cell, and the output FIFO of this cell is an input FIFO for the adjacent cell.
It should be noted that the cell shown in Fig. 7 has four corresponding groups of FIFO queues. A cell at one of the four corners of a rectangular cellular array has only two adjacent cells and therefore has two corresponding groups of FIFO queues; a cell on one of the four edges of a rectangular cellular array has three adjacent cells and therefore has three corresponding groups of FIFO queues.
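The relationship between a cell's position and its number of FIFO groups can be checked with a short sketch. It assumes (x, y) cell IDs with the origin at one corner and "up" meaning a smaller y; the direction names and the dictionary layout are illustrative only.

```python
from collections import deque


def neighbours(cell_id, width, height):
    """Adjacent cells in the four-direction mode, keyed by direction name."""
    x, y = cell_id
    candidates = {
        "left": (x - 1, y),
        "right": (x + 1, y),
        "up": (x, y - 1),      # assumed orientation: y grows downwards
        "down": (x, y + 1),
    }
    return {d: c for d, c in candidates.items()
            if 0 <= c[0] < width and 0 <= c[1] < height}


def fifo_groups(cell_id, width, height):
    """One input/output FIFO pair per existing neighbour."""
    return {d: {"in": deque(), "out": deque()}
            for d in neighbours(cell_id, width, height)}
```

In a 4 x 4 array this gives two groups for the corner cell (0, 0), three for the edge cell (1, 0), and four for the interior cell (1, 1), matching the counts stated above.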
In this embodiment, the network controller is also connected to the MPU in the cell and sends it interrupt signals, such as FIFO empty, FIFO full, new data arrived, or data sent; the MPU can send data through the network controller, and the data sent are normally first placed in the corresponding output FIFO queue.
It should be noted that the cell structure in Fig. 7 shows only the modules related to cell-to-cell communication; those skilled in the art will understand that the cell structure shown in Fig. 7 can be fully combined with the cell structure shown in Fig. 5.
In addition, this embodiment uses FIFO queues to store the data entering and leaving a cell, which makes the relaying of data during cell-to-cell communication more efficient and reduces the processing load on the MPU. In other embodiments, the data entering and leaving a cell may instead be held in registers.
An embodiment of the present invention further provides a method of communication between cells in the above cellular array computing system, comprising: the start cell in the cellular array sends the data destined for the terminal cell to a cell adjacent to the start cell in a selected sending direction; when any cell in the cellular array receives data sent or relayed by an adjacent cell, if it determines from the ID of the terminal cell indicated in the received data that this cell is the terminal cell, it stores the received data in the non-volatile random access memory of this cell or notifies the microprocessor of this cell to process the received data; otherwise this cell acts as a transfer cell and, after selecting a sending direction, relays the received data to a cell adjacent to this cell.
In a specific implementation, every piece of data involved in cell-to-cell communication contains the IDs of the start cell and the terminal cell, so any cell can determine from the ID of the terminal cell indicated in the received data whether the data are addressed to this cell or need to be relayed further to another adjacent cell. A piece of data travels over the connections between adjacent cells and, after repeated relaying, reaches the terminal cell. If the terminal cell needs to give feedback on the data sent by the start cell, it can send the feedback data to the start cell according to the start cell's ID: the terminal cell takes the ID of the start cell indicated in the received data as the ID of the terminal cell and indicates it in the feedback data obtained after processing the received data. The terminal cell thereby becomes the start cell of a new cell-to-cell communication, and the original start cell becomes the terminal cell of that communication.
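The role of the two IDs carried by every piece of data can be shown with a small sketch. The Packet class and its field names are hypothetical; the text only requires that the start-cell ID and terminal-cell ID travel with the data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    source: tuple        # ID of the start cell
    destination: tuple   # ID of the terminal cell
    payload: bytes


def is_for_me(cell_id, packet):
    """A receiving cell keeps the packet only if it is the terminal cell."""
    return packet.destination == cell_id


def make_feedback(received, payload):
    """Feedback swaps the roles: the old terminal cell becomes the start cell."""
    return Packet(source=received.destination,
                  destination=received.source,
                  payload=payload)
```

Because the reply is built purely from the IDs in the received data, the terminal cell needs no routing table or help from the master CPU to answer.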
In a specific implementation, in addition to indicating the ID of the terminal cell, the data sent by the start cell to the terminal cell also indicate the address or the MPU to be accessed in the terminal cell. Storing the received data in the non-volatile random access memory of this cell is performed after the terminal cell has identified the address to be accessed that is indicated in the received data; notifying the MPU of this cell to process the received data is performed after the terminal cell has identified the MPU indicated in the received data.
In actual implementation, if the terminal cell identifies an address to be accessed in the received data, the network controller in the terminal cell can write the received data directly to the corresponding address in the non-volatile random access memory of that cell. In this case cells can "reproduce": one cell can download a program to another cell. If the terminal cell identifies the MPU in the received data, the received data are handed to the MPU in the terminal cell for processing.
In this embodiment, because the cells in the cellular array also comprise a network controller connected to the MPU, the following are all performed under the control of the network controller: the start cell sending data to the terminal cell; any cell in the cellular array receiving data sent or relayed by an adjacent cell and determining whether this cell is the terminal cell or a transfer cell; and storing the received data in the non-volatile random access memory of this cell or notifying the MPU of this cell to process the received data.
In a specific implementation, the data sent by the start cell to the terminal cell are first placed by the network controller in the output FIFO queue and then output by the network controller from the output FIFO queue to the cell adjacent to the start cell. If any cell in the cellular array receives data sent or relayed by an adjacent cell, the received data are placed in the input FIFO queue and, when it is determined that the received data need to be relayed, are then placed in the output FIFO queue.
In addition, if the network controller determines that an input FIFO queue or output FIFO queue is empty or full, or that data sent or relayed by an adjacent cell have been received, or that data have been sent or relayed to an adjacent cell, it sends an interrupt signal to the microprocessor.
In a specific implementation, the start cell or a transfer cell may select the sending direction as follows: if a straight-line communication path can be formed between the start cell or transfer cell and the terminal cell, the sending direction is the direction from the start cell or transfer cell along that straight line toward the terminal cell; otherwise the sending direction is the direction from the start cell or transfer cell toward a candidate adjacent cell, a candidate adjacent cell being a cell that is adjacent to the start cell or transfer cell and is closer to the terminal cell. The number of candidate adjacent cells may of course be two, in which case the one of the two candidate adjacent cells with fewer pending output-data communication tasks is selected as the transfer cell.
In this embodiment, the selection of the sending direction by the start cell or a transfer cell in the above manner can in fact also be regarded as the path selection process for cell-to-cell communication in the cellular array. Referring to Fig. 8, each rectangle in Fig. 8 represents one cell in the cellular array, all the cells shown in Fig. 8 are part of the whole cellular array, and adjacent cells are assumed to communicate in the mode shown in Fig. 3.
Suppose point A represents a start cell that is to send data to the terminal cell at point C. Because a straight-line communication path can clearly be formed between point A and point C, the cell at point A sends the data to the adjacent cell at point B. Likewise, the cell at point B, acting as a transfer cell, continues to relay the data along the straight line between points A and C toward the cell at point C. The successively adjacent cells on the cell-to-cell communication path formed between points A and C forward the data sent by the cell at point A repeatedly until they reach the cell at point C.
Suppose point D represents another start cell that is to send data to the terminal cell at point G. Because a straight-line communication path clearly cannot be formed between point D and point G, and among the cells adjacent to the cell at point D the cells at point E and point F are clearly closer to the terminal cell at point G, these two cells are the candidate adjacent cells of the cell at point D. The one with fewer pending output-data communication tasks can be selected as the transfer cell; if the two cells have the same output-data communication load, one of them is chosen at random as the transfer cell. As shown in Fig. 8, selecting the cell at point E or the cell at point F results in different cell-to-cell communication paths.
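The direction-selection rule illustrated by the A-to-C and D-to-G cases can be sketched as follows, assuming (x, y) cell IDs and the four-direction mode of Fig. 3. The output_load mapping, which stands for the number of pending output tasks of each candidate cell, and both function names are assumptions made for the example.

```python
import random


def candidate_next_hops(current, destination):
    """Adjacent cells that bring the data closer to the destination."""
    (x, y), (dx, dy) = current, destination
    hops = []
    if dx != x:
        hops.append((x + (1 if dx > x else -1), y))
    if dy != y:
        hops.append((x, y + (1 if dy > y else -1)))
    return hops


def choose_next_hop(current, destination, output_load, rng=random):
    """Pick the less busy candidate; break ties at random."""
    hops = candidate_next_hops(current, destination)
    if not hops:
        return None  # the data are already at the terminal cell
    lightest = min(output_load.get(h, 0) for h in hops)
    return rng.choice([h for h in hops if output_load.get(h, 0) == lightest])
```

When the two cells share a row or a column there is exactly one candidate, which reproduces the straight-line case; otherwise there are two, and the load comparison decides between them.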
It should be noted that this embodiment describes path selection for cell-to-cell communication using the communication mode between adjacent cells shown in Fig. 3 as an example; those skilled in the art will understand that, if the communication mode between adjacent cells shown in Fig. 4 is adopted, there will be more selectable sending directions.
In summary, in actual implementation, the network controller of every cell that sends or relays data must select one adjacent cell as the next hop. When the start point and the end point lie on a straight line, there is only one reasonable choice; in other cases there are two equally reasonable choices, and the network controller selects the neighbor whose traffic is relatively less busy.
If data enter an input FIFO queue, the network controller first inspects them:
If the destination is this cell, then: if the destination is a specific relative address, the network controller, which has direct memory access (DMA) capability, stores the received data directly at the corresponding address in the non-volatile random access memory and notifies the MPU with an interrupt; if the destination is the MPU, it directly notifies the MPU with an interrupt signal to process the data.
If the destination is another cell, or the MPU of this cell is sending data out, then: if the destination and this cell lie on a straight line, the correct direction is selected and the data are sent to the adjacent cell; otherwise there are two possible directions, and the data are sent to the candidate adjacent cell whose output FIFO queue is less occupied; if the output FIFO queues toward the two candidate adjacent cells are in the same state, one adjacent cell may be chosen at random.
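The inspection procedure above can be modelled in software as below. This is a simplified sketch: packets are plain dictionaries with assumed keys "dest", "target" and "payload", interrupts are recorded in a list, no bounds checking is done on the memory write, and ties between two equally occupied output FIFOs are resolved by taking the first candidate rather than at random so that the example is reproducible.

```python
from collections import deque


class Cell:
    def __init__(self, cell_id, memory_size=64):
        self.cell_id = cell_id
        self.memory = bytearray(memory_size)
        self.interrupts = []  # interrupts raised to the MPU
        self.out_fifo = {d: deque() for d in ("left", "right", "up", "down")}

    def directions_toward(self, destination):
        """Directions that reduce the distance (y assumed to grow downwards)."""
        (x, y), (dx, dy) = self.cell_id, destination
        dirs = []
        if dx != x:
            dirs.append("right" if dx > x else "left")
        if dy != y:
            dirs.append("down" if dy > y else "up")
        return dirs

    def on_input(self, packet):
        """Inspect one packet taken from an input FIFO."""
        if packet["dest"] == self.cell_id:
            if packet["target"] == "mpu":
                self.interrupts.append(("for_mpu", packet["payload"]))
            else:
                # Relative address: write directly (DMA style), then notify.
                addr = packet["target"]
                data = packet["payload"]
                self.memory[addr:addr + len(data)] = data
                self.interrupts.append(("data_stored", addr))
            return "delivered"
        # Relay: use the candidate direction with the shortest output FIFO.
        direction = min(self.directions_toward(packet["dest"]),
                        key=lambda d: len(self.out_fifo[d]))
        self.out_fifo[direction].append(packet)
        return direction
```

A packet addressed to a relative address never involves the MPU beyond the final notification, which is the point of giving the network controller DMA capability.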
In actual implementation, when the thousands of MPUs in the cellular array compute together, how to deliver the output data of the individual cells to the master CPU becomes a problem. In general, each MPU can store its output data at an agreed address in the non-volatile random access memory of its cell, and the master CPU can read them by polling each MPU in turn. This is not suitable for every problem, however: in some problems only a few cells in the cellular array have data to output to the master CPU, and polling every MPU in turn is then too inefficient.
Therefore, in the cellular array computing system provided by an embodiment of the present invention, at least one full-time output cell is further provided in the cellular array; the full-time output cell, acting as the terminal cell, receives and stores the output data of other cells destined for the master CPU and notifies the master CPU with an interrupt signal to read the output data.
In a specific implementation, a FIFO queue may also be set up in the non-volatile random access memory of the full-time output cell, and all output data from other cells to the master CPU are stored in this FIFO queue; the FIFO queue should have enough storage space to hold all the output data from the other cells to the master CPU.
In actual implementation, one or several cells in the cellular array may be selected as full-time output cells, generally cells whose position makes communication with the master CPU more convenient. An interrupt line is provided between the full-time output cell and the master CPU, over which the full-time output cell can send interrupt signals to the master CPU, for example: new output data from other cells have arrived, the FIFO queue set up in the MRAM is full, or the FIFO queue set up in the MRAM is empty.
Based on the above cellular array computing system provided with a full-time output cell, an embodiment of the present invention further provides a communication method in the cellular array computing system, comprising: after the full-time output cell receives and stores the output data of other cells destined for the master CPU, it sends the master CPU an interrupt signal notifying it to read; after receiving that interrupt signal, the master CPU reads the output data from the full-time output cell.
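The interaction between the full-time output cell and the master CPU can be sketched as follows. The OutputCell class, the callback that stands in for the interrupt line, and the interrupt names "new_data", "fifo_full" and "fifo_empty" are illustrative assumptions.

```python
from collections import deque


class OutputCell:
    def __init__(self, capacity, raise_interrupt):
        self.fifo = deque()                     # FIFO held in the cell's memory
        self.capacity = capacity
        self.raise_interrupt = raise_interrupt  # stands in for the interrupt line

    def receive(self, source_id, data):
        """Store output data arriving from another cell and notify the CPU."""
        if len(self.fifo) >= self.capacity:
            self.raise_interrupt("fifo_full")
            return False
        self.fifo.append((source_id, data))
        self.raise_interrupt("new_data")
        return True

    def read_all(self):
        """Called by the master CPU over the array bus after an interrupt."""
        items = list(self.fifo)
        self.fifo.clear()
        self.raise_interrupt("fifo_empty")
        return items
```

With this arrangement the master CPU touches the array only when an interrupt arrives, instead of polling thousands of cells of which only a few have anything to report.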
In a specific implementation, the other cells may deliver the output data to the full-time output cell as follows: any one of the other cells, acting as the start cell, sends the output data to an adjacent cell in a selected sending direction; when any cell in the cellular array receives the output data sent by an adjacent cell, if it determines that the ID of the terminal cell indicated in the output data matches the ID of this cell, then, because the ID of the terminal cell indicated in the output data is the ID of the full-time output cell, this cell is the full-time output cell and stores the output data in its non-volatile random access memory; otherwise this cell acts as a transfer cell and, after selecting a sending direction, relays the output data to a cell adjacent to this cell.
While the other cells are delivering the output data to the full-time output cell, the start cell or a transfer cell may select the sending direction as follows: if a straight-line communication path can be formed between the start cell or transfer cell and the full-time output cell, the sending direction is the direction from the start cell or transfer cell along that straight line toward the full-time output cell; otherwise the sending direction is the direction from the start cell or transfer cell toward a candidate adjacent cell, the candidate adjacent cell being the cell closest to the full-time output cell among the cells adjacent to the start cell or transfer cell.
For the implementation of the full-time output cell of the embodiment of the present invention, reference may also be made to Fig. 9. Fig. 9 shows the master CPU, the cellular array, and the cellular array bus; each square in the cellular array simply represents one cell, and the cell at point J (the cell drawn with a bold border) is the full-time output cell. Fig. 9 further shows the structure of the full-time output cell: as indicated by the dashed arrow in Fig. 9, the MRAM in the full-time output cell contains a FIFO queue that stores all output data from other cells to the master CPU.
Suppose the cell at point H and the cell at point I need to provide output data to the master CPU. They can send the output data to the cell at point J by cell-to-cell communication; see Fig. 9 for the cell-to-cell communication path from point H to point J and that from point I to point J. Since cell-to-cell communication has been described in detail above, it is not repeated here.
After the cell at point J receives the output data sent by the cell at point H or the cell at point I, it can send the master CPU an interrupt signal notifying it to read; after receiving that interrupt signal, the master CPU can read the output data from the cell at point J over the cellular array bus.
By providing a full-time output cell in the cellular array, having it act as the terminal cell to receive and store the output data of other cells destined for the master CPU, and having it notify the master CPU by an interrupt signal to read the output data, the efficiency with which the master CPU reads output data can be improved when only a few cells have data to output to the master CPU.
An example application of the above cellular array computing system is described next.
Speech recognition can compare an input speech signal against a known speech library; the comparison can be made in the time domain or in the frequency domain. As the number of words and phrases to compare grows (it can reach tens of thousands once different accents are taken into account), the computing power of a handful of CPUs is insufficient for real-time speech recognition.
The cellular array computing system provided by the embodiment of the present invention is well suited to this class of problem.
To this end, the embodiment of the present invention also provides a method for performing data comparison with the above cellular array computing system, comprising: the main control CPU selects all cells in the cellular array, or the cells in a target region, and broadcasts a comparison program into the non-volatile random access memory of each selected cell; the main control CPU writes the sample that each selected cell is responsible for comparing to an agreed address in that cell; the main control CPU broadcasts an instruction to the microprocessors of the selected cells, so that each microprocessor completes initialization and then waits for the data to be compared; the main control CPU broadcasts the data to be compared to the microprocessors of the selected cells; the microprocessor of each selected cell runs the comparison program and compares the received data with the sample assigned to that cell, and, if the comparison finds the two consistent, uses the communication method of the above cellular array computing system to send the comparison result as output data to the dedicated output cell, from which the main control CPU reads it.
In a specific implementation, the data to be compared may be speech data to be recognized, image data to be recognized, or any other data that needs to be compared.
In practice, each MPU continuously receives speech data and compares it. Normally, among hundreds or thousands of cells, only one or a few find that the data to be compared matches the sample they are responsible for. These cells send their comparison results to the dedicated output cell, which notifies the main control CPU by an interrupt signal to receive them.
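The comparison flow can be sketched in software as follows. This is a sequential simulation under assumptions: cells are dictionary entries, the "comparison program" is an ordinary Python function (exact equality by default), and the dedicated output cell is a plain list. The name `run_comparison` is hypothetical.

```python
def run_comparison(samples, stream, compare=lambda data, sample: data == sample):
    """Simulate the comparison method.

    samples: {cell_id: sample}, the sample each selected cell is responsible for.
    stream:  iterable of data items that the main control CPU broadcasts in order.
    Returns the contents of the dedicated output cell's FIFO as (cell_id, item).
    """
    # Broadcast the comparison program and write each cell's sample.
    cells = {cid: {"program": compare, "sample": s} for cid, s in samples.items()}
    output_fifo = []
    # Broadcast each data item; every cell compares it with its own sample.
    for item in stream:
        for cid, cell in cells.items():      # conceptually runs in parallel
            if cell["program"](item, cell["sample"]):
                output_fifo.append((cid, item))   # only matching cells report
    return output_fifo
```

For example, with three cells holding the samples "yes", "no", and "stop", broadcasting "go", "stop", "yes" produces two reports, one from the "stop" cell and one from the "yes" cell; all other comparisons stay silent.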
If the data to be compared is speech data, the comparison can be performed in the time domain or in the frequency domain. In the latter case, the main control CPU can first segment the speech data and apply a fast Fourier transform (FFT) to each segment, and then broadcast the resulting frequency-domain speech data to the MPUs of the selected cells.
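The segment-then-transform step can be sketched as below. The patent does not specify the frame length, overlap, or windowing, so this illustration assumes non-overlapping frames whose length is a power of two and returns magnitude spectra; `fft` is a textbook recursive radix-2 implementation and `frames_to_spectra` is a hypothetical helper name.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def frames_to_spectra(signal, frame_len):
    """Split `signal` into non-overlapping frames; return one magnitude spectrum per frame."""
    spectra = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        spectra.append([abs(v) for v in fft(frame)])
    return spectra


# A cosine with exactly one cycle per 8-sample frame, two frames long.
signal = [math.cos(2 * math.pi * k / 8) for k in range(16)]
spectra = frames_to_spectra(signal, 8)
```

Each spectrum in `spectra` would then be the unit of data broadcast to the selected cells for frequency-domain comparison.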
Performing data comparison with the above cellular array computing system provided with a dedicated output cell allows a large number of cells in the cellular array to run the comparison program simultaneously. The system therefore has very strong parallel processing capability, resolves the prior-art communication bottleneck between the CPU and memory, and greatly improves real-time speech and image recognition.
As described above, there is already a simple way to broadcast information from one cell to a target region of the cellular array: the main control CPU reads the information and then broadcasts it. This embodiment provides another implementation, in which the inter-cell point-to-point communication function is extended to region group-sending. This approach supports a higher degree of parallelism and a much higher total bandwidth.
In the cellular array computing system provided by this embodiment, any cell in the cellular array can also act as the start cell and perform group-send communication to all cells in a target region. A cell that takes part in the group-send communication and lies inside the target region acts as the start cell, as an end cell, or as both a relay cell and an end cell; a cell that takes part and lies outside the target region acts as the start cell or as a relay cell.
In a specific implementation, the network controller connected to the microprocessor in each cell not only handles inter-cell communication between any two cells but also, during group-send communication, controls the sending and receiving of the data the cell originates, the data it relays, and the data it finally receives. The network controller is also used to send interrupt signals to the microprocessor.
In practice, the original sender of an inter-cell group-send communication (the cell in the cellular array acting as the start cell) is responsible for specifying the target region, while the group-sending of the data is still completed through a series of relays. Those skilled in the art will understand that inter-cell group-send communication can be regarded as an effective superposition of multiple inter-cell point-to-point communications, so its implementation can follow that of communication between any two cells. For example, each cell in the cellular array can likewise include one or more groups of first-in-first-out (FIFO) queues connected to the network controller. The details are not repeated here.
On the basis that the above cellular array computing system supports inter-cell group-send communication, the embodiment of the present invention also provides an inter-cell group-send communication method for the system, comprising: when any cell in the cellular array acts as the start cell and initiates group-send communication to all cells in a target region, if the start cell lies inside the target region, it sends the inter-cell group-send data to all of its neighboring cells that lie inside the target region and updates the target region for each of those neighbors; otherwise, it sends the inter-cell group-send data to a neighboring cell in the direction approaching the target region. If a cell outside the target region receives inter-cell group-send data from a neighboring cell, then, after determining that the target region indicated in the data does not include itself, it acts as a relay cell and relays the data to a neighboring cell in the direction approaching the target region. If a cell inside the target region receives inter-cell group-send data from a neighboring cell, then, after determining that the target region indicated in the data includes itself, it acts as an end cell and stores the received data in its non-volatile random access memory, or notifies its microprocessor to process the data; if cells adjacent to it still remain in the target region, it also acts as a relay cell, relays the received data to all neighboring cells inside the target region, and updates the target region for each of those neighbors. The updated target region consists of one or more target regions formed by dividing the target region before the update: each neighbor, within the pre-update target region, of the cell that sent or relayed the data is contained in its own updated target region, and the cell that has sent or relayed the data is excluded from the updated target regions.
It should be noted that the main control CPU can also broadcast the data of one cell to a target region of the cellular array. To distinguish the two, the data involved in inter-cell group-send communication is called "inter-cell group-send data" in this embodiment, as opposed to the "broadcast data of the main control CPU". The cell that initiates the group-send communication specifies the target region: the IDs of all cells in the target region, or the range of those IDs, are indicated in the inter-cell group-send data. Any cell that receives the data can therefore determine, from the target region indicated in it, whether the data is to be finally received by this cell, relayed onward to other neighboring cells, or both.
In addition, updating the target region for each neighboring cell means dividing the pre-update target region into one or more target regions, with the cell that has sent or relayed the inter-cell group-send data excluded from them. Each updated target region contains one of the neighboring cells (that is, a cell in the pre-update target region adjacent to the cell that sent or relayed the data). Each neighboring cell continues the inter-cell group-send communication within its own updated target region, and the target region indicated in the inter-cell group-send data is updated accordingly.
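The decision a cell makes on receiving inter-cell group-send data can be sketched as a small function. This is an illustration only: it assumes the target region is carried as an inclusive rectangle `(row0, col0, row1, col1)` of cell IDs, and the function name and action labels are hypothetical.

```python
def handle_group_send(cell, region):
    """Decide what a cell does with group-send data addressed to `region`.

    cell:   (row, col) ID of the receiving cell.
    region: (row0, col0, row1, col1), inclusive rectangle named in the data.
    Returns a set of action labels.
    """
    r, c = cell
    r0, c0, r1, c1 = region
    inside = r0 <= r <= r1 and c0 <= c <= c1
    if not inside:
        # Outside the region: act as a relay cell only.
        return {"relay_toward_region"}
    actions = {"receive"}                     # act as an end cell
    if (r1 - r0 + 1) * (c1 - c0 + 1) > 1:     # other cells remain in the region
        actions.add("relay_inside_region")    # also act as a relay cell
    return actions
```

A cell that is the last one left in its region only receives; every other in-region cell both receives and relays.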
This embodiment is described using the neighboring-cell communication scheme shown in Fig. 3 and a rectangular target region determined by the start cell that initiates the group-send communication. It should be noted that the inter-cell group-send scheme given in this embodiment is a convenient and efficient one for practical implementation; those skilled in the art will understand that, in other embodiments, the inter-cell group-send communication method of the above cellular array computing system can equally be applied to other neighboring-cell communication schemes or to target regions of other shapes.
In a specific implementation, the way data is sent or relayed differs according to the position of the start cell or relay cell.
When a first cell acting as start cell or relay cell lies on a corner of the rectangular target region: if either of the two sides of the rectangular target region that meet at the first cell contains only 1 cell, the updated target region is the rectangular region formed by the other of those two sides after the first cell is excluded; otherwise, the updated target region comprises two rectangular target regions, one of which is the rectangular region formed by either of the two adjacent sides after the first cell is excluded. It should be noted that "first cell" in this embodiment is a general term for the class of cells located on a corner of a rectangular target region.
Referring to Fig. 10, suppose the cell at point K is the start cell that initiates the inter-cell group-send communication, or a relay cell responsible for relaying the inter-cell group-send data, and rectangular target region 101 is the target region determined before the cell at point K sends or relays the data. The cell at point K lies inside rectangular target region 101 and on one of its corners. Because the horizontal side of rectangular target region 101 contains only 1 cell, the cell at point K has only one neighbor that can be chosen as the next relay. The network controller of this cell therefore sends the inter-cell group-send data to the cell at point L and updates rectangular target region 101; the updated target region is rectangular target region 102, which amounts to excluding the cell at point K from rectangular target region 101. As the target region is updated repeatedly, relaying stops once only the last cell remains in the target region.
Suppose the cell at point M is likewise the start cell that initiates the inter-cell group-send communication, or a relay cell responsible for relaying the inter-cell group-send data, and rectangular target region 103 is the target region determined before the cell at point M sends or relays the data. The cell at point M lies inside rectangular target region 103 and on one of its corners. Because both adjacent sides of rectangular target region 103 contain more than 1 cell, the cell at point M has two neighbors that can be chosen as the next relay. The network controller of this cell therefore sends the inter-cell group-send data to the cell at point N and the cell at point O and updates rectangular target region 103; the updated target region comprises two rectangular target regions, rectangular target region 104 and rectangular target region 105, which amounts to excluding the cell at point M from rectangular target region 103. Rectangular target regions 104 and 105 can each continue the data relay as an independent target region in a manner similar to that described above. As the target region is updated repeatedly, relaying stops once only the last cell remains in the target region.
When a second cell acting as start cell or relay cell lies on a side of the rectangular target region: if the side of the rectangular target region adjacent to the side on which the second cell lies contains only 1 cell, the updated target region comprises the two rectangular target regions formed by the second cell's side after the second cell is excluded; otherwise, the updated target region comprises three rectangular target regions, two of which are the two rectangular regions formed by the second cell's side after the second cell is excluded. It should be noted that "second cell" in this embodiment is a general term for the class of cells located on a side of a rectangular target region.
Referring to Fig. 11, suppose the cell at point P is the start cell that initiates the inter-cell group-send communication, or a relay cell responsible for relaying the inter-cell group-send data, and rectangular target region 111 is the target region determined before the cell at point P sends or relays the data. The cell at point P lies inside rectangular target region 111 and on one of its sides. Because the side of rectangular target region 111 adjacent to the side on which the cell at point P lies contains more than 1 cell, the cell at point P has three neighbors that can be chosen as the next relay. The network controller of this cell sends the inter-cell group-send data to the cell at point Q, the cell at point R, and the cell at point S, and updates rectangular target region 111; the updated target region comprises three rectangular target regions, namely rectangular target regions 112, 113, and 114, which amounts to excluding the cell at point P from rectangular target region 111. Rectangular target regions 112 and 113 correspond to the two rectangular regions formed by the side on which the cell at point P lies after that cell is excluded. Rectangular target regions 112, 113, and 114 can each continue the data relay as an independent target region in a manner similar to that described above. As the target region is updated repeatedly, relaying stops once only the last cell remains in the target region.
It will be understood that if, in a target region (not shown in Fig. 11), the side adjacent to the side on which the cell at point P lies contains only 1 cell, the cell at point P has two neighbors that can be chosen as the next relay. The network controller of this cell sends the inter-cell group-send data to the cell at point Q and the cell at point R and updates the target region; the updated target region comprises two rectangular target regions, namely rectangular target region 112 and rectangular target region 113.
When a third cell acting as start cell lies in the interior of the rectangular target region, the updated target region comprises four rectangular target regions: two of them are the two rectangular regions formed by the row or column in which the third cell lies after the third cell is excluded, and the other two are the two rectangular regions into which the pre-update rectangular target region is divided by that row or column. It should be noted that "third cell" in this embodiment is a general term for the class of cells located in the interior of a rectangular target region, where the interior of the rectangular target region means the area other than the corners and the sides.
Referring to Fig. 12, suppose the cell at point T is the start cell that initiates the inter-cell group-send communication (in this embodiment the cell at point T cannot be a relay cell responsible for relaying inter-cell group-send data), and rectangular target region 121 is the target region determined before the cell at point T sends the data. The cell at point T lies in the interior of rectangular target region 121 and has four neighbors that can be chosen as the next relay. The network controller of this cell sends the inter-cell group-send data to the cell at point U, the cell at point V, the cell at point W, and the cell at point X, and updates rectangular target region 121; the updated target region comprises four rectangular target regions, namely rectangular target regions 122, 123, 124, and 125, which amounts to excluding the cell at point T from rectangular target region 121. Rectangular target regions 122 and 123 correspond to the two rectangular regions formed by the row in which the cell at point T lies after that cell is excluded, and rectangular target regions 124 and 125 correspond to the two rectangular regions into which rectangular target region 121 is divided by that row. Rectangular target regions 122, 123, 124, and 125 can each continue the data relay as an independent target region in a manner similar to that described above. As the target region is updated repeatedly, relaying stops once only the last cell remains in the target region.
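The corner, side, and interior cases can be captured by one splitting routine and checked with a small simulation. The sketch below is an illustration under assumptions: regions are inclusive rectangles `(row0, col0, row1, col1)`, cells have four orthogonal neighbors, an interior cell is always cut along its row (the text allows row or column), and all function names are hypothetical.

```python
from collections import deque


def split_region(region, cell):
    """Divide `region` into the updated target regions after `cell` is excluded."""
    r0, c0, r1, c1 = region
    r, c = cell
    on_vertical_side = c in (c0, c1)
    on_horizontal_side = r in (r0, r1)
    if on_vertical_side and not on_horizontal_side:
        # Cell on a left/right side: cut along its column.
        parts = [(r0, c, r - 1, c), (r + 1, c, r1, c),
                 (r0, c0, r1, c - 1), (r0, c + 1, r1, c1)]
    else:
        # Corner, top/bottom side, or interior: cut along its row.
        parts = [(r, c0, r, c - 1), (r, c + 1, r, c1),
                 (r0, c0, r - 1, c1), (r + 1, c0, r1, c1)]
    # Drop empty rectangles.
    return [p for p in parts if p[0] <= p[2] and p[1] <= p[3]]


def neighbor_in(part, cell):
    """Return the neighbor of `cell` that lies inside `part`."""
    r, c = cell
    r0, c0, r1, c1 = part
    for n in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
        if r0 <= n[0] <= r1 and c0 <= n[1] <= c1:
            return n


def group_send(region, start):
    """Simulate group-send from `start` (inside `region`); count deliveries per cell."""
    deliveries = {}
    queue = deque([(start, region)])
    while queue:
        cell, reg = queue.popleft()
        deliveries[cell] = deliveries.get(cell, 0) + 1
        for part in split_region(reg, cell):
            queue.append((neighbor_in(part, cell), part))
    return deliveries
```

In this model a corner cell yields one or two updated regions, a side cell two or three, and an interior cell four; because the updated regions never overlap, every cell in the original region receives the data exactly once.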
In this embodiment, when a fourth cell acting as start cell or relay cell lies outside the target region: if a straight-line communication path can be formed between the fourth cell and any cell in the target region, the direction in which the fourth cell sends or relays the inter-cell group-send data runs from the fourth cell along that straight line toward the target region; otherwise, the sending direction runs from the fourth cell toward a candidate neighboring cell, namely a cell, among those adjacent to the fourth cell, that is closer to the target region. It should be noted that "fourth cell" in this embodiment is a general term for the class of cells located outside the rectangular target region.
Referring to Fig. 13, suppose the cell at point Y1 is the start cell that initiates the inter-cell group-send communication, and rectangular target region 131 is the target region determined before the cell at point Y1 sends the data. The cell at point Y1 lies outside rectangular target region 131. Because this cell lies between the extension lines of two opposite sides of the rectangular target region, a straight-line communication path can be formed between it and the cell at point Y3 in the rectangular target region, and only one neighbor can serve as the next relay. The network controller of the cell at point Y1 sends the inter-cell group-send data to that neighbor, the cell at point Y2, which acts as a relay cell responsible for relaying the data. The cell at point Y2 relays the data in the direction shown by the dashed arrow in Fig. 13 until it reaches the cell at point Y3. The cell at point Y3 lies on a side of rectangular target region 131, and the relay process inside rectangular target region 131 can continue according to the method described above.
Referring again to Fig. 13, suppose the cell at point Z1 is the start cell that initiates the inter-cell group-send communication, and rectangular target region 131 is the target region determined before the cell at point Z1 sends the data. The cell at point Z1 lies outside rectangular target region 131. Because this cell does not lie between the extension lines of two opposite sides of the rectangular target region, no straight-line communication path can be formed between it and any cell in the rectangular target region, and two neighbors can serve as the next relay: the cell at point Z2 and the cell at point Z3. These two cells are the candidate neighboring cells of the cell at point Z1, because they are the cells closer to rectangular target region 131 among those adjacent to it. In practice, either one may be chosen arbitrarily, or the less heavily loaded cell may be chosen as the next relay according to the actual communication situation; the less heavily loaded cell is the one with fewer outgoing-data communication tasks. Starting from the cell at point Z1, there are two feasible relay paths, along which the inter-cell group-send data is relayed until it reaches the cell at point Z4. The cell at point Z4 lies on a corner of rectangular target region 131, and the relay process inside rectangular target region 131 can continue according to the method described above.
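The choice of the next relay for a cell outside the target region can be sketched as follows. This illustration assumes an inclusive rectangle `(row0, col0, row1, col1)`, four orthogonal neighbors, and a `load` dictionary mapping a cell to its number of pending outgoing tasks; the function names and the load representation are hypothetical.

```python
def candidate_hops(cell, region):
    """Neighbors of an outside `cell` that are closer to `region`.

    Returns one neighbor when a straight-line path into the region exists,
    two when the cell lies diagonally off a corner, and [] when inside.
    """
    r, c = cell
    r0, c0, r1, c1 = region
    # Row/column step needed to approach the region (0 means already aligned).
    dr = 0 if r0 <= r <= r1 else (1 if r < r0 else -1)
    dc = 0 if c0 <= c <= c1 else (1 if c < c0 else -1)
    hops = []
    if dr:
        hops.append((r + dr, c))
    if dc:
        hops.append((r, c + dc))
    return hops


def choose_hop(cell, region, load=None):
    """Pick the candidate with the fewest pending tasks (first one on a tie)."""
    load = load or {}
    hops = candidate_hops(cell, region)
    return min(hops, key=lambda h: load.get(h, 0)) if hops else None
```

A cell aligned with the region, like the one at point Y1, gets a single candidate, while a cell like the one at point Z1 gets two and can pick the less loaded of them.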
The inter-cell group-send communication method in the cellular array computing system provided by this embodiment extends the inter-cell point-to-point communication function to region group-sending. It can support a higher degree of parallelism and obtain a much higher total bandwidth, further improving the overall performance of the computing system.
As described above, each cell in the cellular array of the embodiment of the present invention combines three functions: memory, storage, and computation. The non-volatile random access memory in the cell both provides random access to the data involved in the microprocessor's computation and stores the instruction code of software and the data that must be kept permanently. However, non-volatile random access memory is generally expensive, so the space in a cell's non-volatile random access memory that serves as memory is limited. When the microprocessor in a cell has a large amount of data to process, the limited memory space may reduce the microprocessor's processing efficiency. How to expand the memory space of a cell therefore becomes a problem to be solved.
Based on the above consideration, the embodiment of the present invention also provides another structure of the cellular array computing system. As shown in Fig. 14, in addition to the main control CPU, cellular array, and cellular array bus described above, the cellular array computing system can further include at least one memory unit array. A memory unit array is a two-dimensional array composed of more than one memory unit. The cellular array and all memory unit arrays are stacked into a three-dimensional structure, and the memory units in each memory unit array are connected one-to-one with the cells in the cellular array. The memory unit works together with the non-volatile random access memory: both are used for random access to the data involved in the microprocessor's computation.
In practice, the non-volatile random access memory in a cell can be MRAM, and the memory unit array can be an MRAM, DRAM, or SRAM silicon chip. Typically one or more lower-cost DRAM silicon chips are chosen, each DRAM silicon chip being a memory unit array composed of memory units whose positions match those of the cells in the cellular array. All the DRAM silicon chips are then combined in 3D with one cellular array silicon chip, and a communication connection can be established by TSV between any memory unit and the corresponding cell in the cellular array, thereby expanding the memory of each cell.
In the embodiment of the present invention, by by least one by the memory cell array that more than internal storage location forms,Be built up three-dimensional structure with described cellular array, and make internal storage location and cell battle array in each memory cell arrayCell in row is connected correspondingly, related data when described internal storage location calculates for described microprocessorArbitrary access, so just can, with the memory headroom of each cell in lower cost expansion cellular array, improve cellThe treatment effeciency of middle microprocessor.
It should be pointed out that Fig. 14 shows only the case where one memory unit array is stacked with the cellular array into a three-dimensional structure; those skilled in the art will equally understand the case where more than one memory unit array is stacked with the cellular array into a three-dimensional structure.
Those skilled in the art will also understand that the communication methods described above apply equally to a cellular array computing system that includes the memory unit array: communication between the main control CPU and each cell in the cellular array through the cellular array bus; communication between any two cells that does not rely on the main control CPU; group-send communication from any cell to all cells in a target region; and the provision of a dedicated output cell in the cellular array that, as the end cell, receives and stores the output data of other cells for the main control CPU to read.
It should be noted that, because the memory space of each cell in the cellular array has been expanded, the main control CPU can access not only the non-volatile random access memory of a cell but also the memory unit corresponding to that cell (when more than one memory unit array is stacked with the cellular array into the three-dimensional structure, more than one memory unit corresponds to each cell). The communication between the main control CPU and each cell in the cellular array through the cellular array bus therefore includes at least one of the following: reading or writing, by address, the non-volatile random access memory or the corresponding memory unit of any cell in the cellular array; broadcasting data to the non-volatile random access memory or the corresponding memory unit of each cell in a target region, writing it to the same relative address in each; sending an instruction or data to, or reading the state of, the microprocessor of any cell in the cellular array; and broadcasting an instruction to the microprocessors of all cells in a target region.
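The address-based access and region broadcast listed above can be illustrated with a toy model. The layout used here is an assumption made only for the example: each cell exposes one flat address space with its non-volatile random access memory first and its stacked memory units after it. The class name, method names, and sizes are hypothetical.

```python
class CellArrayBusModel:
    """Toy model of main control CPU access to per-cell memory over the bus."""

    def __init__(self, rows, cols, nvram_size, unit_sizes):
        # Assumed layout: NVRAM bytes first, then each stacked memory unit.
        size = nvram_size + sum(unit_sizes)
        self.mem = {(r, c): bytearray(size)
                    for r in range(rows) for c in range(cols)}

    def write(self, cell, addr, data):
        """Write `data` at relative address `addr` of one cell."""
        if addr < 0 or addr + len(data) > len(self.mem[cell]):
            raise ValueError("address out of range")
        self.mem[cell][addr:addr + len(data)] = data

    def read(self, cell, addr, length):
        """Read `length` bytes at relative address `addr` of one cell."""
        return bytes(self.mem[cell][addr:addr + length])

    def broadcast(self, region, addr, data):
        """Write `data` to the same relative address in every cell of `region`."""
        r0, c0, r1, c1 = region
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                self.write((r, c), addr, data)


bus = CellArrayBusModel(rows=2, cols=3, nvram_size=16, unit_sizes=[32])
bus.broadcast((0, 0, 1, 1), 20, b"abc")   # address 20 falls in the memory unit
```

After the broadcast, the four cells in the region hold the same bytes at relative address 20, while cells outside the region are untouched.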
When a cell in the cellular array further includes a bus controller and a cell-internal bus, the cell-internal bus connects not only the microprocessor and the non-volatile random access memory but also the memory unit corresponding to the cell. The bus controller is connected to the cellular array bus, the microprocessor, and the cell-internal bus. It identifies communication between the main control CPU and this cell, and either connects the microprocessor, to deliver the instruction or data sent by the main control CPU or to allow the state to be read, or connects, through the cell-internal bus, the non-volatile random access memory or the memory unit corresponding to the cell, to perform data read and write operations.
As described above, each cell in the cellular array of the embodiment of the present invention combines three functions: memory, storage, and computation. The non-volatile random access memory in the cell both provides random access to the data involved in the microprocessor's computation and stores the instruction code of software and the data that must be kept permanently. However, non-volatile random access memory is generally expensive, so the space in a cell's non-volatile random access memory that serves as storage is likewise limited. When a large number of files or a large amount of data must be stored in the cells of the cellular array, the limited storage space cannot meet the storage requirement and may even reduce the processing efficiency of the microprocessor. How to expand the storage space of a cell is therefore likewise a problem to be solved.
Based on the above consideration, the embodiment of the present invention also provides a further structure of the cellular array computing system. As shown in Fig. 15, in addition to the main control CPU, cellular array, and cellular array bus described above, the cellular array computing system can further include at least one storage unit array. A storage unit array is a two-dimensional array composed of more than one storage unit. The cellular array and all storage unit arrays are stacked into a three-dimensional structure, and the storage units in each storage unit array are connected one-to-one with the cells in the cellular array. The storage unit works together with the non-volatile random access memory: both are used to store the instruction code of software and the data that must be kept permanently.
In practice, the non-volatile random access memory in a cell may be MRAM, and the storage unit array may specifically be a flash memory silicon chip. Generally, one or more NAND flash silicon chips, which cost less than MRAM, may be chosen. Each NAND flash silicon chip is a storage unit array composed of storage units whose positions match those of the cells in the cellular array. All the NAND flash silicon chips and one cellular array silicon chip are then combined in 3D, and a communication connection between any storage unit and its corresponding cell in the cellular array can be established by a vertical TSV (through-silicon via) connection, thereby expanding the storage space of each cell.
In a specific implementation, a cell in the cellular array further comprises a storage controller connected to the microprocessor, which performs data access control for the storage unit connected to that cell. After one or more NAND flash silicon chips and one cellular array silicon chip are combined in 3D, each cell in the cellular array may also be configured with a NAND flash controller; the MPU of a cell can read and write the storage unit corresponding to that cell only through the cell's NAND flash controller. When a large number of files or a large amount of data is stored in the NAND flash, each cell can search the data through its own NAND channel, so the search is greatly accelerated. Similarly, compiling a large software system requires compiling thousands of source code files; when these source files are kept in such a cellular array computing system, compilation is likewise greatly accelerated.
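The per-cell search described above can be sketched in software as follows. This is only an illustrative model, not the implementation of the embodiment: threads stand in for the cells' MPUs, a dictionary stands in for the storage units, and the names `search_cell`, `search_array` and `storage` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def search_cell(cell_id, records, keyword):
    """Search one cell's own storage unit, independently of all other cells."""
    return [(cell_id, i) for i, text in enumerate(records) if keyword in text]

def search_array(storage, keyword):
    """Let every cell search its own shard in parallel, then merge the hits."""
    with ThreadPoolExecutor(max_workers=len(storage)) as pool:
        futures = [pool.submit(search_cell, cid, recs, keyword)
                   for cid, recs in storage.items()]
        hits = []
        for future in futures:
            hits.extend(future.result())
    return sorted(hits)

# Hypothetical contents of three storage units, keyed by cell ID (row, column).
storage = {
    (0, 0): ["alpha report", "beta notes"],
    (0, 1): ["gamma log", "alpha draft"],
    (1, 0): ["delta memo"],
}
print(search_array(storage, "alpha"))
```

Each worker touches only the records of its own cell, which mirrors the point that every cell uses its own NAND channel and no shared channel is involved.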
The cellular array computing system comprising the storage unit array provided by the embodiment of the present invention can expand the storage space of each cell in the cellular array at relatively low cost and improve the data storage capacity of each cell.
It should be pointed out that Figure 15 shows only the case in which one storage unit array and the cellular array are stacked into a three-dimensional structure; those skilled in the art will equally understand the case in which more than one storage unit array and the cellular array are stacked into a three-dimensional structure.
Those skilled in the art will understand that the communication methods described above apply equally to the cellular array computing system comprising the storage unit array: the master CPU communicating with each cell through the cellular array bus; communication between any two cells without relying on the master CPU; any cell group-sending to all cells in a target region; and setting dedicated (full-time) output cells in the cellular array as end-point cells that receive and store the data output by other cells to the master CPU, for the master CPU to read.
As described above, because the space in a cell's non-volatile random access memory that is used as memory and as storage area is very limited, how to expand the memory and the storage space of a cell at the same time is a problem demanding a prompt solution. Based on this consideration, the embodiment of the present invention further provides another structure of the cellular array computing system. As shown in Figure 16, in addition to the master CPU, the cellular array and the cellular array bus, the cellular array computing system may further comprise at least one memory unit array as described above and at least one storage unit array as described above. For the specific implementation of a cellular array computing system comprising both the memory unit array and the storage unit array, refer to the implementations above of the cellular array computing systems comprising only the memory unit array or only the storage unit array; details are not repeated here.
The cellular array computing system comprising both the memory unit array and the storage unit array provided by the embodiment of the present invention can expand the storage space and the memory space of each cell in the cellular array at relatively low cost at the same time, improving the data storage capacity of each cell and the processing efficiency of the microprocessor in the cell, and thereby further improving the overall performance of the computing system.
A high-end image sensor acquires image data within an extremely short time (on the order of microseconds), but the data volume of one frame of high-definition image is very large. In current camera systems, because the communication bottleneck between the CPU, memory and storage in existing computer architectures limits overall computer performance, reading out the image data typically takes 1/60 to 1/30 of a second. In most camera systems, therefore, the video processing capability falls far behind the speed at which image data is acquired.
Therefore, because the computer architecture used in prior-art camera systems has a communication bottleneck between the CPU, memory and storage, which greatly affects the overall performance of the computing system, the video processing capability of current camera systems falls far behind the speed at which the image sensor in the camera system collects image data.
To solve the above problem, the embodiment of the present invention also provides a camera system that applies the above cellular array computing system. As shown in Figure 17, the camera system comprises a cellular array computing system and an image sensor. The cellular array computing system comprises a master CPU, a cellular array and a cellular array bus; for details, refer to the description of the related embodiments above, which is not repeated here. The image sensor is a two-dimensional array composed of more than one image acquisition unit. The cellular array and the image sensor are stacked into a three-dimensional structure, and the image acquisition units in the image sensor are connected one-to-one with the cells in the cellular array. The image acquisition units acquire image data for processing by the cells in the cellular array.
In practice, the image sensor may be a CMOS image sensor, which is currently popular. A low-end CMOS image sensor may place the photosensitive devices and the other circuits (for example, signal amplification circuits and analog-to-digital conversion circuits) on the same face of the silicon chip; a high-end CMOS image sensor may place the other circuits on the back side, connected to the photosensitive face by TSVs. In either case, the large number of image acquisition units in the image sensor can be divided into a two-dimensional array whose positions match those of the cells in the cellular array. The image sensor and one cellular array silicon chip are then combined in 3D, and a communication connection between any image acquisition unit and its corresponding cell in the cellular array can be established by TSV.
It should be noted, as those skilled in the art know, that the effective face of each image acquisition unit of the image sensor shown in Figure 17 must be located on the underside of its silicon chip, because the lens can only be below; otherwise the light from above would be blocked by the other silicon chips. In other embodiments, the image sensor may also be stacked on top of the cellular array silicon chip.
In addition, in practice, each frame of image is divided among the image acquisition units of the image sensor, and each image acquisition unit is responsible for acquiring its corresponding part of the frame. The parts of a frame can subsequently be processed in parallel, which improves image processing efficiency.
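As a rough software illustration of dividing one frame among the image acquisition units, the sketch below cuts a small frame into tiles keyed by cell position and applies a per-tile operation. The function names, the 4x4 frame, the 2x2 tiling and the mean-brightness step are all assumptions chosen for illustration; the embodiment does not specify any of them.

```python
def split_frame(frame, rows, cols):
    """Divide a frame (a list of pixel rows) into rows x cols tiles keyed by cell ID."""
    height, width = len(frame), len(frame[0])
    tile_h, tile_w = height // rows, width // cols  # assumes the frame divides evenly
    tiles = {}
    for r in range(rows):
        for c in range(cols):
            tiles[(r, c)] = [row[c * tile_w:(c + 1) * tile_w]
                             for row in frame[r * tile_h:(r + 1) * tile_h]]
    return tiles

def tile_mean(tile):
    """Example per-cell processing step: mean brightness of one tile."""
    pixels = [p for row in tile for p in row]
    return sum(pixels) / len(pixels)

# A hypothetical 4x4 frame whose pixel values are 0..15 in row-major order.
frame = [[x + 4 * y for x in range(4)] for y in range(4)]
tiles = split_frame(frame, 2, 2)
means = {cell: tile_mean(tile) for cell, tile in tiles.items()}
print(means)
```

Because each tile is processed without reference to any other tile, the per-tile step is the part that the cells could carry out in parallel.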
In this embodiment, each cell in the cellular array may also be configured with an image processor, which processes the image data acquired by the image acquisition unit connected to that cell. Of course, in practice, the image processor may also be integrated into the microprocessor in the cell.
When the number of cells in the cellular array reaches several thousand, the number of image acquisition units in the image sensor is of the same scale, so each frame of image can be read out and processed through several thousand channels simultaneously. The image processing capability of the camera system is thereby improved by a factor of hundreds, meeting the higher requirements of high-speed imaging.
In addition, the inventors of the present application further considered that the problem "the video processing capability of prior-art camera systems falls far behind the speed at which the image sensor collects image data, so the image processing capability of the camera system is rather limited" affects more than high-speed digital photography, which still lacks a good solution. Some image recognition applications require even higher speed; for example, future automatic vehicle control systems may need to recognize a frame of image and react within a millisecond, and an effective solution is likewise lacking.
To this end, the embodiment of the present invention also provides an image recognition system comprising a recognition unit and the camera system provided by the embodiment of the present invention. The recognition unit recognizes the images obtained after processing by the camera system.
Those skilled in the art will understand that, since the camera system provided by the embodiment of the present invention can read and process each frame of image faster than the prior art, each frame of image can naturally also be recognized faster.
Therefore, when the camera system comprising the cellular array computing system is applied in an image recognition system, the processing speed of each frame of image is greatly increased, so each frame can be recognized in a shorter time. The image recognition system thus recognizes faster and meets the requirements of high-speed image recognition.
For the specific implementation of the camera system and the image recognition system provided by the embodiment of the present invention, reference may also be made to the related description of the cellular array computing system above; details are not repeated here.
As described above, because the space in the non-volatile random access memory of a cell in the cellular array that is used as memory and as storage area is very limited, how to expand the memory and the storage space of a cell at the same time is a problem demanding a prompt solution. To this end, the embodiment of the present invention provides a cellular array computing system that, in addition to the master CPU, the cellular array and the cellular array bus, comprises both the memory unit array and the storage unit array. This cellular array computing system expands the memory and the storage space of the cells at the same time and is especially suited to tasks that store and process large volumes of data. Because the frames of high-definition images acquired by the image sensor in a camera system carry a large volume of data, expanding the memory and storage space of each cell of the cellular array computing system in the camera system is particularly necessary.
Based on the above consideration, the embodiment of the present invention also provides another structure of the camera system. As shown in Figure 18, the camera system comprises a cellular array computing system and an image sensor. The cellular array computing system comprises a master CPU, a cellular array and a cellular array bus; for details, refer to the description of the related embodiments above, which is not repeated here. In this embodiment, the cellular array computing system further comprises at least one storage unit array and at least one memory unit array. The storage unit array is a two-dimensional array composed of more than one storage unit; the cellular array and the one or more storage unit arrays are stacked into a three-dimensional structure, and the storage units in each storage unit array are connected one-to-one with the cells in the cellular array. The storage units store software instruction code and data that must be preserved permanently. The memory unit array is a two-dimensional array composed of more than one memory unit; the cellular array and the one or more memory unit arrays are stacked into a three-dimensional structure, and the memory units in each memory unit array are connected one-to-one with the cells in the cellular array. The memory units provide random access to the data involved in the microprocessor's computation.
In practice, the storage unit array may be a flash memory silicon chip; the memory unit array may be an MRAM, DRAM or SRAM silicon chip; the cellular array is on one silicon chip; and the image sensor may be a CMOS image sensor, which is currently popular.
For the cellular array computing system included in this other structure of the camera system provided by the embodiment of the present invention, reference may also be made to Figure 16.
At least one storage unit array composed of more than one storage unit, and at least one memory unit array composed of more than one memory unit, are each stacked with the cellular array into a three-dimensional structure, and the storage units in each storage unit array and the memory units in each memory unit array are connected one-to-one with the cells in the cellular array. The storage space and the memory space of each cell in the cellular array can thus be expanded at relatively low cost, which improves both the data storage capacity of each cell and the processing efficiency of the microprocessor in the cell, and thereby further improves the image processing capability of the camera system.
It should be noted that this other structure of the camera system provided by the embodiment of the present invention is described using the example in which the cellular array computing system comprises at least one memory unit array and at least one storage unit array at the same time. In other embodiments, the cellular array computing system included in the camera system may comprise only one of the memory unit array and the storage unit array, for example the cellular array computing system shown in Figure 14 or Figure 15.
In addition, in the structure of the camera system shown in Figure 18, the image sensor and the cellular array silicon chip are stacked directly together, with no storage unit array silicon chip or memory unit array silicon chip between them. This makes the connection between each image acquisition unit and its corresponding cell shorter, so the images acquired by the image sensor can be read and processed by the cells in the cellular array more quickly, improving image processing efficiency.
Of course, the embodiment of the present invention also provides an image recognition system comprising a recognition unit and the camera system shown in Figure 18; the recognition unit recognizes the images obtained after processing by this camera system.
For the specific implementation of the above camera system and image recognition system provided by the embodiment of the present invention, reference may also be made to the description of the cellular array computing system above; details are not repeated here.
A neural network is a computational method commonly used in machine learning. It imitates the operating principle of the human brain and generally uses the concept of a perceptron or a neuron. The training process of a neural network is a massive computation. A neural network is in effect a function with a large number of parameters (possibly tens of thousands); it needs a large number of scenarios, each with input data and a correct answer, to adjust these parameters and achieve the goal of learning. Because the amount of computation in neural network computing is very large, and the communication bottleneck between the CPU, memory and storage in prior-art computer architectures seriously affects overall computer performance, such architectures are unfavorable to the efficient implementation of neural network computing.
To this end, based on the cellular array computing system provided by the embodiment of the present invention, the embodiment of the present invention also provides a method of implementing neural network computing with the cellular array computing system, comprising: storing the code of one or more neuron functions in each cell of the cellular array that participates in the neural network computation; the master CPU selecting one or more cells to execute the code of the neuron functions stored in each such cell and to output the execution results to one or more target cells; and any cell participating in the neural network computation, on receiving execution results of neuron function code output from other cells, taking them as its input data, executing the code of the neuron function it stores based on all the input data from the other cells, and either outputting the execution result to the cells holding the neuron functions that need it, or storing it at a preset address to wait for the master CPU to read it, or outputting it to the master CPU.
To better understand the method of implementing neural network computing with the cellular array computing system provided by the embodiment of the present invention, it is necessary first to describe briefly the principle of neural network computing.
A neural network is an algorithm commonly used in machine learning that imitates the operating principle of the human brain. The human brain is composed of a large number of neurons. Each neuron connects to other neurons through a large number (thousands) of synapses on its dendritic tree and receives input through them; the coupling strength of a synapse has a memory function. The neuron outputs an excitatory or inhibitory signal, which can be transmitted through the axon to a large number of distant neurons.
In the neural network computing method modeled on the operating principle of the human brain, a neuron is simply a function. As shown in Figure 19, it has many inputs, for example the three inputs x1, x2 and x3, and each input has a corresponding weight. The usual computation multiplies each input by its weight and sums the products. The neuron outputs 0 or 1 (decided by a threshold), or a value between 0 and 1. It has many internal parameters (for example weight parameters and a threshold parameter), and the process of adjusting these parameters is the process of machine learning.
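As a compact restatement of the neuron function just described, the weighted sum and the threshold decision can be written as follows. The symbols n (number of inputs), w_i (weights), θ (threshold), s and y are notation introduced here for illustration, and choosing output 1 when the sum exactly equals the threshold is an assumption, since the description does not specify that case.

$$
s = \sum_{i=1}^{n} w_i x_i, \qquad
y = \begin{cases} 1, & s \ge \theta \\ 0, & s < \theta \end{cases}
$$

For the continuous case the description states only that the output lies between 0 and 1; it does not name the function that produces it.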
A typical neural network is a network in which the outputs and inputs of a large number of neurons are connected together, usually organized into a multi-layer structure. As shown in Figure 20, the neural network in the figure is organized into three layers: the output of each first-layer neuron is sent to every second-layer neuron, and the output of each second-layer neuron is sent to every third-layer neuron. In Figure 20, the first layer may be called the input layer, the second layer the hidden layer, and the third layer the output layer; the outputs 0, 1, 2, ..., 8, 9 of the output layer are the final result of the neural network computation. Neural networks used in applications usually have such a layered structure. In practice, this could be a neural network for recognizing handwritten digits.
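The layered structure can be illustrated with a very small forward pass in which every neuron of a layer receives all outputs of the previous layer. The sketch below is an assumption-laden toy, not the network of Figure 20: the function names and the weights are hypothetical, the input layer simply passes its values on, and the weights were chosen by hand so that the network computes exclusive-or.

```python
def neuron(inputs, weights, threshold):
    """Weighted sum of the inputs followed by a threshold decision."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

def layer(inputs, layer_params):
    """Every neuron in the layer sees all outputs of the previous layer."""
    return [neuron(inputs, weights, threshold) for weights, threshold in layer_params]

def forward(inputs, network):
    """Feed the values through each layer in turn and return the last outputs."""
    for layer_params in network:
        inputs = layer(inputs, layer_params)
    return inputs

# Hand-picked (weights, threshold) pairs: hidden layer = [OR, NAND], output = AND.
network = [
    [([1, 1], 1), ([-1, -1], -1.5)],
    [([1, 1], 2)],
]
for pair in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(pair, forward(pair, network))
```

In the cellular array, each call to `neuron` would correspond to one neuron function executed inside a cell, and the list passed between layers would correspond to results exchanged between cells.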
The training process of a neural network is a massive computation. A network is in effect a function with a large number of parameters (possibly tens of thousands); it needs a large number of scenarios, each with input data and a correct answer, to adjust these parameters and achieve the goal of learning (usually by the steepest descent method). For example, training a neural network to recognize handwritten digits requires tens of thousands of pictures; the parameters are adjusted so that the network gives the correct result for each picture, after which handwriting recognition achieves a very high success rate.
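To make the idea of adjusting parameters from scenarios concrete, here is a minimal steepest-descent sketch for a single neuron. Everything beyond the name of the method is an assumption made for illustration: the logistic output, the squared-error loss, the learning rate, the step count and the four training scenarios (the logical OR function) are not taken from the description.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def predict(params, x):
    w1, w2, bias = params
    return sigmoid(w1 * x[0] + w2 * x[1] + bias)

def loss(params, scenes):
    """Mean squared error between the neuron output and the correct answers."""
    return sum((predict(params, x) - t) ** 2 for x, t in scenes) / len(scenes)

def descent_step(params, scenes, rate):
    """Move every parameter a small step against the gradient of the loss."""
    grad = [0.0, 0.0, 0.0]
    for x, t in scenes:
        y = predict(params, x)
        d = 2 * (y - t) * y * (1 - y) / len(scenes)
        grad[0] += d * x[0]
        grad[1] += d * x[1]
        grad[2] += d
    return [p - rate * g for p, g in zip(params, grad)]

# Each scenario pairs input data with its correct answer (logical OR here).
scenes = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
params = [0.0, 0.0, 0.0]
start = loss(params, scenes)
for _ in range(2000):
    params = descent_step(params, scenes, rate=1.0)
end = loss(params, scenes)
print(start, end)
```

Every step evaluates the neuron on all scenarios, which is why training amounts to a very large number of evaluations once the network and the data set grow.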
The inventors of the present application consider neural network computing highly suitable for implementation on the cellular array computing system provided by the embodiment of the present invention. Because the neural network must be evaluated a great many times during training, it can be accelerated by the parallel computation of a large number of MPUs.
In practice, if the number of neurons in the neural network does not exceed the number of cells in the cellular array computing system, one cell in the cellular array can perform the computation of one neuron in the neural network; otherwise, the computations of more than one neuron can be combined into one cell.
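One possible way to place neurons on cells under this rule is sketched below. The round-robin policy and the name `assign_neurons` are assumptions for illustration; the description only says that several neurons may share a cell when neurons outnumber cells.

```python
def assign_neurons(num_neurons, rows, cols):
    """Map neuron indices to cell IDs (row, column), wrapping around when
    there are more neurons than cells."""
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    placement = {cell: [] for cell in cells}
    for n in range(num_neurons):
        placement[cells[n % len(cells)]].append(n)
    return placement

print(assign_neurons(3, 2, 2))  # at most one neuron per cell
print(assign_neurons(6, 2, 2))  # some cells hold two neurons
```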
In a specific implementation, each cell stores the code of one or more neuron functions. The master CPU may broadcast the neuron function code by data broadcast to each cell that acts as a neuron, and start the neural network computation by broadcasting an instruction to a target region. For example, the master CPU may select all cells in the cellular array, or the cells in a target region, as the cells participating in the neural network computation; broadcast the code of the same neuron function to the same relative address range in each selected cell; and broadcast an instruction that makes the microprocessors of the selected one or more cells start executing the neuron function code from that relative address. For the specific implementation, reference may also be made to the related description of the implementation of the cellular array computing system above.
In the embodiment of the present invention, each item of input data from another cell has a corresponding weight parameter. Executing the code of the neuron function stored in the cell based on all the input data from other cells may comprise: multiplying each item of input data from another cell by its corresponding weight parameter, summing all the products, and comparing the sum with the threshold parameter to determine the output value, which is the execution result. In practice, the weight parameters and the threshold parameter are stored in advance in the non-volatile random access memory of the cell.
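The steps above can be sketched from the point of view of a single cell. The class below is a hypothetical model: a dictionary keyed by source cell ID stands in for the weight parameters held in the cell's non-volatile random access memory, and the names `NeuronCell`, `receive`, `ready` and `execute` are invented for illustration. Returning 1 when the sum equals the threshold is also an assumption.

```python
class NeuronCell:
    """Model of one cell that runs a single neuron function."""

    def __init__(self, weights, threshold):
        # weights: source cell ID -> weight parameter, as if preloaded in NVRAM
        self.weights = weights
        self.threshold = threshold
        self.inbox = {}

    def receive(self, source_id, value):
        """Record an execution result that arrived from another cell."""
        self.inbox[source_id] = value

    def ready(self):
        """True once every expected source cell has delivered its input."""
        return set(self.inbox) == set(self.weights)

    def execute(self):
        """Multiply each input by its weight, sum, and compare with the threshold."""
        total = sum(self.weights[src] * val for src, val in self.inbox.items())
        return 1 if total >= self.threshold else 0

cell = NeuronCell({(0, 0): 0.5, (0, 1): -1.0, (0, 2): 2.0}, threshold=1.0)
for source in [(0, 0), (0, 1), (0, 2)]:
    cell.receive(source, 1)
print(cell.ready(), cell.execute())
```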
In practice, if the execution result output after executing the neuron function code is a continuous value, the output may also be accelerated by table lookup.
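A table lookup of this kind might look like the sketch below. The description does not say which continuous function is tabulated; the logistic function, the input range of -8 to 8 and the table of 1025 entries are assumptions chosen for illustration.

```python
import math

LO, HI, STEPS = -8.0, 8.0, 1024  # assumed table range and resolution

# Precompute the function once; at run time only an index computation remains.
TABLE = [1.0 / (1.0 + math.exp(-(LO + i * (HI - LO) / STEPS)))
         for i in range(STEPS + 1)]

def sigmoid_lookup(s):
    """Return the nearest precomputed value, clamping outside the table range."""
    if s <= LO:
        return TABLE[0]
    if s >= HI:
        return TABLE[STEPS]
    return TABLE[round((s - LO) / (HI - LO) * STEPS)]

print(sigmoid_lookup(0.0), sigmoid_lookup(2.0))
```

The table trades a small amount of memory in each cell for the cost of evaluating an exponential on every neuron output; a finer table lowers the approximation error at the price of more memory.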
In a specific implementation, because the cellular array computing system provided by the embodiment of the present invention supports cell-to-cell communication between any two cells, the execution results of the neuron function code can be transmitted by cell-to-cell communication during neural network computing. Cell-to-cell communication relies on the huge bandwidth of the network inside the cellular array, which improves communication efficiency and also reduces the processing load on the master CPU. This further improves the overall performance of the computing system and favors the efficient implementation of neural network computing.
In a specific implementation, when the neural network is divided into layers, data transfer between layers can use the cell-to-cell group sending mechanism provided by the embodiment of the present invention. In this case the cells of the same layer that participate in the neural network computation are located in the same target region, and a cell in one layer that group-sends the execution result of its neuron function code to a target region is in effect sending that result to all the participating cells of the next layer. By extending point-to-point communication between cells to region group sending, a higher degree of parallelism is supported and a much higher total bandwidth is obtained, which further improves the overall performance of the computing system and helps considerably to speed up neural network computing.
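The effect of group sending between layers can be modeled in a few lines. In the sketch below the target region is assumed to be a rectangle given by its top-left and bottom-right cell IDs, layer 1 is assumed to occupy row 0 and layer 2 row 1, and the names `cells_in_region` and `group_send` are hypothetical; the sketch shows only the delivery pattern, not the routing inside the array.

```python
def cells_in_region(top_left, bottom_right):
    """List the IDs of all cells inside a rectangular target region."""
    (r0, c0), (r1, c1) = top_left, bottom_right
    return [(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]

def group_send(inboxes, sender, value, region):
    """Deliver one sender's execution result to every cell in the region."""
    for cell in cells_in_region(*region):
        inboxes.setdefault(cell, {})[sender] = value

# Assumed layout: layer 1 in row 0, layer 2 in row 1, three cells each.
layer2_region = ((1, 0), (1, 2))
layer1_outputs = {(0, 0): 1, (0, 1): 0, (0, 2): 1}
inboxes = {}
for sender, value in layer1_outputs.items():
    group_send(inboxes, sender, value, layer2_region)
print(inboxes)
```

After the three group sends, every layer-2 cell holds the complete set of layer-1 results, which is exactly the input each of its neuron functions needs.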
In a specific implementation, if dedicated (full-time) output cells are also set in the cellular array, the execution results of the neural network computation can be sent by cell-to-cell communication to the dedicated output cells for the master CPU to read, so that the results are output to the master CPU more efficiently. In practice, if some neuron function in the last layer produces a positive output (in common applications, each neuron in the last layer is responsible for recognizing one specific data feature or image), the master CPU can be notified through this mechanism.
For implementing neural network computing through cell-to-cell communication, cell-to-cell group sending and dedicated output cells, refer to the related description in the cellular array computing system embodiments; details are not repeated here.
In summary, the advantages of implementing neural network computing with the cellular array computing system provided by the embodiment of the present invention are evident: the parallel computation of a large number of cells greatly increases computing speed, and in particular greatly increases the speed of training; the huge bandwidth of the array's internal network and the group sending mechanism likewise contribute considerably to speed; and the non-volatility of MRAM allows a successfully trained chip to be copied directly and sold as a product that solves a particular problem.
Those skilled in the art know that any CPU needs a software debugging interface, which is a necessary function for software development. Nearly all CPUs on the market now provide a debugging interface, and most debugging interfaces are designed according to the JTAG (Joint Test Action Group) standard. An external debugging device sends debug commands to the CPU through this interface, including pause, set breakpoint, and read/write memory. Executing these commands helps the programmer examine how a program runs and diagnose software faults.
For the cellular array computing system provided by the embodiment of the present invention, when a large number of CPUs (for example, thousands) are integrated on one chip, how to implement the debugging function conveniently and efficiently is a problem demanding a prompt solution.
Based on the above consideration, the embodiment of the present invention further provides a cellular array computing system with a debugging interface. As shown in Figure 21, in addition to the master CPU, the cellular array and the cellular array bus described above, the cellular array computing system may further comprise a debugging interface connected to the master CPU. Through the debugging interface, a debugging device controls the master CPU to debug the software running in each cell of the cellular array.
In the cellular array computing system provided by the embodiment of the present invention, the master CPU may be implemented with a prior-art CPU, which naturally also supports an existing debugging interface. The external debugging device connects to the master CPU through the debugging interface and uses the read/write memory function supported by the existing debugging interface to send debug commands to the cellular array.
In some cases the software requires the cells to exchange a large amount of information. If one cell stops at a breakpoint while the other cells continue to run, the whole system may fall into confusion and debugging may become impossible. For this reason, in the embodiment of the present invention, the cellular array computing system may further comprise a pause signal line (not shown in Figure 21) connecting every cell in the cellular array. When the software running in any cell hits a breakpoint and pauses, the pause signal line is used to send a pause signal to all the other cells.
With one pause signal line added to the cellular array computing system to connect all the cells, any cell that hits a breakpoint sends a pause signal on this line. On receiving the pause signal, all cells immediately pause the software running in them, and network transfers between cells pause at the same time. This avoids the system-wide confusion that would make debugging impossible and ensures the stability and accuracy of debugging.
Further, the pause signal line may also be connected to the debugging device, so that the pause signal is also sent to the debugging device. In practice, the pause signal line may be led out of the cellular array chip to connect to the debugging device.
Based on the above cellular array computing system with a debugging interface, the embodiment of the present invention also provides a debugging method for the cellular array computing system, comprising: the debugging device, through the debugging interface, controlling the master CPU to send debug commands to a target cell in the cellular array, to the cells in a target region, or to all cells, so as to debug the software running in each cell.
In the embodiment of the present invention, software debugging is carried out by the debugging device controlling the master CPU through the debugging interface.
Specifically, the debug commands that the master CPU sends to a particular cell, or broadcasts to the cells in a target region or to all cells in the cellular array, comprise a pause command, a read/write command and a set-breakpoint command. The pause command can pause the operation of the MPU, pause data transfer between cells, and so on. The read/write command can read the internal registers of the MPU, including the register that records the MPU state (running, paused, breakpoint hit, and so on), and can also read the data in the FIFOs of the communication interfaces between adjacent cells. The set-breakpoint command sets a breakpoint in the software running in a cell, which is achieved by writing into the MPU of the cell.
In a specific implementation, the master CPU, under the control of the debugging device or on detecting another condition that requires a pause, sends a pause command to a target cell, to the cells in a target region, or to all cells. A cell that receives the pause command pauses the software running in it. The software is then debugged by sending read/write commands to read the contents of the internal registers of the microprocessor, of the non-volatile random access memory (for example MRAM), and of the communication interfaces between adjacent cells.
Further, the master CPU can also send a set-breakpoint command to set breakpoints in the software running in the target cell or in the cells in the target region, and periodically read the state of the cells. The state of a cell is one of running, paused, or breakpoint hit for the software in the cell; the other conditions that require a pause include the software in a cell hitting a breakpoint.
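The command flow described above (set a breakpoint, let the cells run, periodically read their state, and pause when a breakpoint is hit) can be sketched as a small software model. This is only an illustration under stated assumptions: the class names `Cell` and `MasterCPU`, the `pc` register, and the rectangular region format are hypothetical and are not taken from the patent, which does not specify a command encoding.

```python
from dataclasses import dataclass, field
from enum import Enum


class CellState(Enum):
    RUNNING = "running"
    PAUSED = "paused"
    AT_BREAKPOINT = "at_breakpoint"


@dataclass
class Cell:
    cell_id: tuple                      # (row, col) position, used as the cell ID
    pc: int = 0                         # hypothetical program counter register
    state: CellState = CellState.RUNNING
    breakpoints: set = field(default_factory=set)

    def step(self):
        """Advance one instruction unless stopped; stop when a breakpoint is reached."""
        if self.state is not CellState.RUNNING:
            return
        self.pc += 1
        if self.pc in self.breakpoints:
            self.state = CellState.AT_BREAKPOINT


class MasterCPU:
    def __init__(self, rows, cols):
        self.cells = {(r, c): Cell((r, c)) for r in range(rows) for c in range(cols)}

    def _targets(self, region=None):
        """region is ((r0, c0), (r1, c1)), inclusive; None selects all cells."""
        if region is None:
            return list(self.cells.values())
        (r0, c0), (r1, c1) = region
        return [cell for (r, c), cell in self.cells.items()
                if r0 <= r <= r1 and c0 <= c <= c1]

    def set_breakpoint(self, address, region=None):
        """Set-breakpoint command: written into the MPU of each targeted cell."""
        for cell in self._targets(region):
            cell.breakpoints.add(address)

    def pause(self, region=None):
        """Pause command: running cells in the target set stop executing."""
        for cell in self._targets(region):
            if cell.state is CellState.RUNNING:
                cell.state = CellState.PAUSED

    def read_state(self, cell_id):
        """Read command: return the state register and program counter of one cell."""
        cell = self.cells[cell_id]
        return cell.state, cell.pc

    def poll(self):
        """Periodic state read: if any cell hit a breakpoint, pause all the others."""
        hit = [c.cell_id for c in self.cells.values()
               if c.state is CellState.AT_BREAKPOINT]
        if hit:
            self.pause()
        return hit
```

In this model a breakpoint set in one cell stops only that cell at first; the remaining cells are paused on the next poll, which corresponds to the "other condition that requires a pause" mentioned above.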
In this embodiment, because the cellular array computing system also includes a pause signal line connecting every cell in the cellular array, the debugging method of the cellular array computing system further comprises: when the software running in any cell hits a breakpoint and pauses, that cell sends a pause signal to all other cells over the pause signal line; a cell that receives the pause signal suspends the software running in it and suspends data transfer with adjacent cells.
In this embodiment, the pause signal line is also connected to the debugging device, so that the pause signal can also be delivered to the debugging device. The debugging method of the cellular array computing system therefore further comprises: when the debugging device receives a pause signal sent by any cell over the pause signal line, it pauses the master CPU. In practice, when a pause signal sent by any cell in the cellular array reaches the debugging device, the debugging device can pause the master CPU immediately in order to inspect problems in the interaction between the master CPU and the cellular array, which further ensures the stability and accuracy of debugging.
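The behavior of the shared pause signal line can be illustrated with a minimal model in which every connected party is notified when any cell asserts the line. The names `PauseLine`, `Cell`, and `Debugger`, and the callback mechanism, are assumptions made for this sketch; the patent describes a hardware signal line, not a software interface.

```python
class PauseLine:
    """Shared line: any cell can assert it, and every connected party is notified."""

    def __init__(self):
        self.listeners = []

    def connect(self, callback):
        self.listeners.append(callback)

    def assert_pause(self, source):
        for callback in self.listeners:
            callback(source)


class Cell:
    def __init__(self, cell_id, line):
        self.cell_id = cell_id
        self.running = True        # software in the cell is executing
        self.link_enabled = True   # data transfer with adjacent cells is active
        self.line = line
        line.connect(self.on_pause)

    def hit_breakpoint(self):
        """The cell stops and drives the pause signal line."""
        self.line.assert_pause(self.cell_id)

    def on_pause(self, source):
        """Suspend the software and the transfers with adjacent cells."""
        self.running = False
        self.link_enabled = False


class Debugger:
    def __init__(self, line):
        self.master_paused = False
        self.pause_source = None
        line.connect(self.on_pause)

    def on_pause(self, source):
        """Pause the master CPU as soon as the pause signal arrives."""
        self.master_paused = True
        self.pause_source = source
```

Because the notification does not pass through the master CPU, all cells and the master CPU stop close together in time, which is what makes it possible to inspect the interaction between them.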
For the specific implementation of the cellular array computing system with a debugging interface and of its debugging method provided by the embodiments of the present invention, reference may also be made to the implementation of the cellular array computing systems with other structures described above.
It should be noted that the embodiments of the present invention describe the cellular array computing system with a debugging interface using a two-dimensional cellular array as the example. In other embodiments, the cellular array may also be a three-dimensional cellular array formed by stacking more than one two-dimensional cellular array. In that case the concept of an "adjacent cell" is no longer confined to the two-dimensional plane but extends to three-dimensional space. If the communication between adjacent cells shown in Figure 3 is adopted in the two-dimensional cellular arrays, then in a spatial rectangular coordinate system any cell has adjacent cells in six directions: the positive and negative directions of the x axis, the y axis, and the z axis. In practice, when multiple 2D cellular array chips are stacked to form a 3D chip, vertical connections are established between adjacent cells by TSVs (through-silicon vias); that is, adjacent cells located in two adjacent two-dimensional cellular arrays establish a communication connection through a TSV. A 3D cellular array chip retains the low-power advantage while increasing the scale of the cellular array and expanding the internal communication bandwidth.
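The six-direction adjacency in a three-dimensional cellular array can be made concrete with a short sketch. It assumes integer (x, y, z) cell coordinates and an array without wrap-around, so cells on a face, edge, or corner have fewer than six neighbors; the function name `neighbors_3d` is hypothetical.

```python
def neighbors_3d(cell, shape):
    """Return the cells adjacent to `cell` along +/-x, +/-y and +/-z.

    cell  -- (x, y, z) coordinates of the cell
    shape -- (nx, ny, nz) size of the stacked array; nz is the number of 2D layers
    """
    x, y, z = cell
    nx, ny, nz = shape
    offsets = [(1, 0, 0), (-1, 0, 0),
               (0, 1, 0), (0, -1, 0),
               (0, 0, 1), (0, 0, -1)]   # the z offsets model the TSV links between layers
    result = []
    for dx, dy, dz in offsets:
        px, py, pz = x + dx, y + dy, z + dz
        if 0 <= px < nx and 0 <= py < ny and 0 <= pz < nz:
            result.append((px, py, pz))
    return result
```

For a 3 x 3 x 3 array, the center cell (1, 1, 1) has all six neighbors, while the corner cell (0, 0, 0) has three.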
Although the present invention is disclosed as above, the present invention is not limited thereto. Any person skilled in the art may make various changes or modifications without departing from the spirit and scope of the present invention; the scope of protection of the present invention shall therefore be as defined by the claims.
Claims (13)
1. A cellular array computing system, characterized by comprising: a master CPU, a cellular array, and a cellular array bus;
The cellular array is a two-dimensional array composed of more than one cell having both computing and storage functions, wherein each cell comprises a microprocessor and a non-volatile random access memory; the non-volatile random access memory is used for random access to the data involved in the computations of the microprocessor, and also for storing the instruction code of software and data that must be preserved permanently;
Each cell stores its own position in the cellular array as an ID that can be read by software or hardware in the cell;
The master CPU communicates with each cell in the cellular array through the cellular array bus;
Communication interfaces are provided between adjacent cells in the cellular array, and adjacent cells can send data to each other.
2. The cellular array computing system according to claim 1, characterized in that the communication that the master CPU carries out with each cell in the cellular array through the cellular array bus comprises at least one of the following:
Reading and writing, by address, the non-volatile random access memory of any cell in the cellular array;
Broadcasting data to the non-volatile random access memories of the cells in a target region of the cellular array, and writing the data to the same relative address in the non-volatile random access memory of each cell in the target region;
Sending instructions or data to, or reading the state of, the microprocessor of any cell in the cellular array;
Broadcasting instructions to the microprocessors of all cells in the target region.
3. The cellular array computing system according to claim 1, characterized in that each cell in the cellular array further comprises a bus controller and a cell-internal bus; the bus controller is connected to the cellular array bus, the microprocessor, and the cell-internal bus; the bus controller is used to identify communication between the master CPU and the cell, and either connects the microprocessor in order to pass on instructions or data sent by the master CPU or to read state, or connects the non-volatile random access memory through the cell-internal bus in order to perform data read and write operations.
4. The cellular array computing system according to claim 1, characterized in that at least one of a floating-point computation processor and an image processor is integrated in the microprocessor.
5. The cellular array computing system according to claim 1, characterized in that the non-volatile random access memory is MRAM.
6. The cellular array computing system according to claim 1, characterized in that the master CPU, the cellular array, and the cellular array bus are integrated in one chip.
7. The cellular array computing system according to claim 1, characterized in that the master CPU, as an independent chip, communicates through a standard memory interface with the chip formed by the cellular array and the cellular array bus.
8. A communication method in the cellular array computing system according to claim 1, characterized by comprising:
Any cell in the cellular array receives a target address broadcast by the master CPU on the cellular array bus and, if it determines that the target address lies within the cell, connects the non-volatile random access memory of the cell so that the master CPU can perform data read and write operations;
A first special address field is reserved in the system address space for communication between the master CPU and the microprocessors and for storing the ID of the target cell; if any cell in the cellular array, on receiving the first special address field, recognizes that the communication is with its own microprocessor, it connects the microprocessor of the cell to complete the subsequent instruction reception, data reception, and state read operations;
A second special address field is reserved in the system address space for instruction broadcast by the master CPU; the second special address field carries the cell IDs that help determine the extent of a target region in the cellular array; if any cell in the cellular array, on receiving the second special address field, recognizes that it lies within the target region, it either connects the microprocessor of the cell in order to pass on instructions or data sent by the master CPU or to read state, or connects the non-volatile random access memory of the cell in order to perform data read and write operations;
Any cell in the cellular array sends data to an adjacent cell under the control of its microprocessor.
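As an illustration of the address recognition recited in claim 8, the following is a minimal sketch of the decision each cell makes when an address appears on the cellular array bus. The whole address layout is an assumption made for the example: the constants `CELL_MEM_SIZE`, `COLS`, `MPU_BASE`, and `BCAST_BASE`, and the packing of cell IDs into 4-bit row and column fields, are hypothetical and are not specified in the patent.

```python
CELL_MEM_SIZE = 0x1000     # assumed size of the memory owned by each cell
COLS = 4                   # assumed width of the cellular array
MPU_BASE = 0x8000_0000     # assumed first special address field (master CPU to one MPU)
BCAST_BASE = 0x9000_0000   # assumed second special address field (region broadcast)


def decode(address, my_id):
    """Return what the cell with ID `my_id` = (row, col) does for a bus address.

    ("memory", offset) -- connect the cell's memory at the given relative address
    ("mpu",)           -- connect the cell's microprocessor
    ("broadcast",)     -- the cell lies in the broadcast target region
    None               -- the address is not meant for this cell
    """
    row, col = my_id
    if address >= BCAST_BASE:
        # Assumed encoding: two corner IDs (r0, c0) and (r1, c1) bound the region.
        bits = address - BCAST_BASE
        r0, c0 = (bits >> 12) & 0xF, (bits >> 8) & 0xF
        r1, c1 = (bits >> 4) & 0xF, bits & 0xF
        inside = r0 <= row <= r1 and c0 <= col <= c1
        return ("broadcast",) if inside else None
    if address >= MPU_BASE:
        # Assumed encoding: the low bits carry the ID of the target cell.
        bits = address - MPU_BASE
        target = ((bits >> 4) & 0xF, bits & 0xF)
        return ("mpu",) if target == my_id else None
    # Ordinary address: each cell owns one contiguous block of the address space.
    owner = address // CELL_MEM_SIZE
    if owner == row * COLS + col:
        return ("memory", address % CELL_MEM_SIZE)
    return None
```

Every cell evaluates the same function on the same broadcast address, so exactly the addressed cell (or every cell in the region) responds, without the master CPU having to select cells individually.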
9. The communication method in the cellular array computing system according to claim 8, characterized in that each cell in the cellular array further comprises a bus controller and a cell-internal bus, the bus controller being connected to the cellular array bus, the microprocessor, and the cell-internal bus; for any cell in the cellular array, determining whether the target address lies within the cell, recognizing whether the communication is with the microprocessor of the cell, recognizing whether the cell lies within the target region, and connecting the non-volatile random access memory or the microprocessor are all performed by the bus controller; the bus controller connects the non-volatile random access memory through the cell-internal bus.
10. The communication method in the cellular array computing system according to claim 8, characterized in that read and write operations by the master CPU on the non-volatile random access memory of any cell in the cellular array have higher priority than read and write operations by the microprocessor in that cell on the same non-volatile random access memory.
11. A method for computing a Monte Carlo integral using the cellular array computing system according to any one of claims 1 to 7, characterized by comprising:
The master CPU selects all cells, or the cells in a target region, of the cellular array, and broadcasts the program corresponding to the integrand to a relative address segment of each selected cell;
The master CPU broadcasts an instruction that causes the microprocessors of the selected cells to start executing the program corresponding to the integrand from the relative address segment;
After each cell completes the integral computation, it stores its sum at an agreed address, to be read by the master CPU and summed into the total.
12. A method for computing a Monte Carlo integral using the cellular array computing system according to any one of claims 1 to 7, characterized by comprising:
The master CPU selects all cells, or the cells in a target region, of the cellular array;
The master CPU broadcasts a download program to the same relative address segment of each selected cell, and broadcasts an instruction that causes the microprocessors of the selected cells to start executing the download program from that relative address;
The program corresponding to the integrand is split into two or more subprograms, and the master CPU broadcasts each subprogram to the microprocessors of the selected cells;
Each microprocessor running the download program selects one of the subprograms according to the ID of its cell and stores it, so that the subprograms are deployed in order in a group of successively adjacent cells;
The master CPU broadcasts an instruction that causes the microprocessors of each group of cells to execute in turn the subprograms into which the program corresponding to the integrand has been split, the intermediate result of each stage being passed as input to the next stage;
After each group of cells completes the integral computation, it stores its sum at an agreed address, to be read by the master CPU and summed into the total.
13. The method for computing a Monte Carlo integral using the cellular array computing system according to claim 11 or 12, characterized in that, when the program corresponding to the integrand starts executing, its built-in random number generator reads the ID of the cell as the seed.
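The scheme of claims 11 and 13 can be illustrated with a short simulation in which every cell runs the same integrand program, seeds its random number generator from its own ID, and leaves a partial sum for the master CPU to total. This is a sketch under stated assumptions, not the claimed implementation: the integration interval [0, 1), the mapping of the (row, col) ID to an integer seed, and the function names `cell_program` and `monte_carlo` are all chosen for the example.

```python
import random


def cell_program(cell_id, integrand, samples, cols):
    """Program run by one cell: seed the RNG from the cell ID, then sum samples.

    Seeding from the ID gives every cell a different random sequence even though
    all cells received the same broadcast program.
    """
    row, col = cell_id
    rng = random.Random(row * cols + col)   # assumed mapping from ID to seed
    return sum(integrand(rng.random()) for _ in range(samples))


def monte_carlo(integrand, rows, cols, samples_per_cell):
    """Master CPU side: collect each cell's partial sum and form the estimate."""
    partial_sums = {
        (r, c): cell_program((r, c), integrand, samples_per_cell, cols)
        for r in range(rows)
        for c in range(cols)
    }
    total = sum(partial_sums.values())      # the master CPU's final summation
    return total / (rows * cols * samples_per_cell)
```

For example, `monte_carlo(lambda x: x * x, 4, 4, 2000)` estimates the integral of x squared over [0, 1), whose exact value is 1/3. If all cells shared one seed they would draw identical samples and the extra cells would add no accuracy, which is the reason for seeding from the ID.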
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510456294.3A CN105608490B (en) | 2015-07-29 | 2015-07-29 | Cellular array computing system and communication method therein |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105608490A true CN105608490A (en) | 2016-05-25 |
CN105608490B CN105608490B (en) | 2018-10-26 |
Family
ID=55988414
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510456294.3A Active CN105608490B (en) | Cellular array computing system and communication method therein | 2015-07-29 | 2015-07-29 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105608490B (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106951955A (en) * | 2017-03-09 | 2017-07-14 | 中国人民解放军军械工程学院 | Electronic cell number system of selection in bus embryo's electronic cell array |
CN108256637A (en) * | 2016-12-28 | 2018-07-06 | 上海磁宇信息科技有限公司 | A kind of cellular array three-dimensional communication transmission method |
CN108897714A (en) * | 2018-07-03 | 2018-11-27 | 中国人民解放军国防科技大学 | Multi-core or many-core processor chip with autonomous region |
WO2019041251A1 (en) * | 2017-08-31 | 2019-03-07 | 北京中科寒武纪科技有限公司 | Chip device and related product |
CN109886393A (en) * | 2019-02-26 | 2019-06-14 | 杭州闪亿半导体有限公司 | It is a kind of to deposit the calculation method for calculating integrated circuit and neural network |
CN110362280A (en) * | 2019-09-04 | 2019-10-22 | 南京优存科技有限公司 | Mixing storage system based on the nearly data processing MRAM of low-power consumption neural network |
CN112106035A (en) * | 2018-04-03 | 2020-12-18 | 赛灵思公司 | System-on-chip interface architecture |
CN112631989A (en) * | 2021-03-08 | 2021-04-09 | 南京蓝洋智能科技有限公司 | Data transmission method among small chips, among chips and among small chips |
CN112667557A (en) * | 2021-03-16 | 2021-04-16 | 南京蓝洋智能科技有限公司 | Data transmission method suitable for chiplet architecture |
CN112732631A (en) * | 2020-12-25 | 2021-04-30 | 南京蓝洋智能科技有限公司 | Data transmission method between small chips |
US11972132B2 (en) | 2018-04-03 | 2024-04-30 | Xilinx, Inc. | Data processing engine arrangement in a device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1341242A (en) * | 1999-01-21 | 2002-03-20 | 索尼电脑娱乐公司 | High-speed processor system, method of using the same, and recording medium |
US20090024829A1 (en) * | 2007-06-22 | 2009-01-22 | Yuefan Deng | Mixed torus and hypercube multi-rank tensor expansion method |
CN101354694A (en) * | 2007-07-26 | 2009-01-28 | 上海红神信息技术有限公司 | Ultra-high expanding super computing system based on MPU structure |
CN101681296A (en) * | 2008-02-29 | 2010-03-24 | 株式会社东芝 | Memory system |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108256637A (en) * | 2016-12-28 | 2018-07-06 | 上海磁宇信息科技有限公司 | A kind of cellular array three-dimensional communication transmission method |
CN106951955A (en) * | 2017-03-09 | 2017-07-14 | 中国人民解放军军械工程学院 | Electronic cell number system of selection in bus embryo's electronic cell array |
CN106951955B (en) * | 2017-03-09 | 2019-12-17 | 中国人民解放军军械工程学院 | Method for selecting electronic cell number in bus embryo electronic cell array |
WO2019041251A1 (en) * | 2017-08-31 | 2019-03-07 | 北京中科寒武纪科技有限公司 | Chip device and related product |
CN112106035A (en) * | 2018-04-03 | 2020-12-18 | 赛灵思公司 | System-on-chip interface architecture |
US11972132B2 (en) | 2018-04-03 | 2024-04-30 | Xilinx, Inc. | Data processing engine arrangement in a device |
CN108897714A (en) * | 2018-07-03 | 2018-11-27 | 中国人民解放军国防科技大学 | Multi-core or many-core processor chip with autonomous region |
CN108897714B (en) * | 2018-07-03 | 2022-05-24 | 中国人民解放军国防科技大学 | Multi-core or many-core processor chip with autonomous region |
CN109886393A (en) * | 2019-02-26 | 2019-06-14 | 杭州闪亿半导体有限公司 | It is a kind of to deposit the calculation method for calculating integrated circuit and neural network |
CN110362280A (en) * | 2019-09-04 | 2019-10-22 | 南京优存科技有限公司 | Mixing storage system based on the nearly data processing MRAM of low-power consumption neural network |
CN112732631A (en) * | 2020-12-25 | 2021-04-30 | 南京蓝洋智能科技有限公司 | Data transmission method between small chips |
CN112631989A (en) * | 2021-03-08 | 2021-04-09 | 南京蓝洋智能科技有限公司 | Data transmission method among small chips, among chips and among small chips |
CN112667557A (en) * | 2021-03-16 | 2021-04-16 | 南京蓝洋智能科技有限公司 | Data transmission method suitable for chiplet architecture |
Also Published As
Publication number | Publication date |
---|---|
CN105608490B (en) | 2018-10-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105608490A (en) | Cellular array computing system and communication method thereof | |
CN105740946B (en) | A kind of method that application cell array computation system realizes neural computing | |
US11580367B2 (en) | Method and system for processing neural network | |
CN105718996A (en) | Cell array calculation system and communication method wherein | |
CN105718994B (en) | Cellular array computing system | |
CN102289390B (en) | Hypervisor scheduler | |
CN102906726B (en) | Association process accelerated method, Apparatus and system | |
KR20210044669A (en) | Data preprocessing method, apparatus, computer equipment and storage medium | |
CN105468291B (en) | Dynamic and static wear balance control method and device | |
CN112035061B (en) | Solid state disk resource allocation method, device and storage medium | |
DE102022129936A1 (en) | Techniques for expanding system memory by utilizing available device memory | |
CN105556503A (en) | Dynamic memory control method and system thereof | |
CN112988080B (en) | Multi-data page simultaneous writing method, 3D flash memory device and computer storage medium | |
US12056382B2 (en) | Inference in memory | |
CN105874431A (en) | Computing system with reduced data exchange overhead and related data exchange method thereof | |
CN105718991B (en) | Cellular array computing system | |
CN110569122A (en) | Multiprocessor system, multi-core processing device, and method of operating the same | |
CN105718990B (en) | Communication means between cellular array computing system and wherein cell | |
CN110750363B (en) | Computer storage management method and device, electronic equipment and storage medium | |
CN105718380B (en) | Cellular array computing system | |
CN105718379B (en) | Cellular array computing system and wherein iuntercellular mass-send communication means | |
CN106776390A (en) | Method for realizing memory access of multiple devices | |
CN105718993A (en) | Cell array calculation system and communication method therein | |
CN107291209A (en) | Cellular array computing system | |
CN104951406B (en) | A kind of paging type address space menagement method and controller |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||