CN105718992B - Cellular array computing system - Google Patents



Publication number
CN105718992B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510456244.5A
Other languages
Chinese (zh)
Other versions
CN105718992A (en)
Inventor
戴瑾
郭民
郭一民
王践识
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Ciyu Information Technologies Co Ltd
Original Assignee
Shanghai Ciyu Information Technologies Co Ltd
Application filed by Shanghai Ciyu Information Technologies Co Ltd
Priority to CN201510456244.5A
Publication of CN105718992A
Application granted
Publication of CN105718992B


Abstract

A cellular array computing system comprises a master CPU, a cellular array, a cellular array bus, and memory cell arrays. The cellular array is a three-dimensional cellular array formed by stacking one or more two-dimensional cellular arrays, each of which consists of one or more cells that combine computing and storage functions. At least one two-dimensional cellular array is stacked with one or more corresponding memory cell arrays to form a three-dimensional structure, and the memory units in each memory cell array are connected one-to-one with the cells in the corresponding two-dimensional cellular array. Each cell stores its own position in the cellular array as an ID that software or hardware in the cell can read. The master CPU communicates with every cell in the cellular array through the cellular array bus. Adjacent cells in the cellular array have communication interfaces between them and can send data to each other. The invention overcomes the communication bottleneck between the CPU and memory and storage in existing computer architectures and improves overall system performance.

Description

Cellular array computing system
Technical field
The present invention relates to computers and computer application technology, and in particular to a cellular array computing system.
Background
A computer usually comprises three core components: the central processing unit (CPU, Central Processing Unit), memory, and storage.
Through the unremitting efforts of some of the world's top companies, the CPU has evolved into an extremely complex semiconductor chip. The number of MOS transistors in a top CPU core can exceed 100 million. The current industry trend is that, limited by power consumption, CPU operating frequencies are difficult to raise further. The operating efficiency of extremely complex modern CPUs is likewise difficult to improve. New CPU products are increasingly evolving toward multi-core designs.
In terms of memory, the dominant technology at present is dynamic random access memory (DRAM, Dynamic Random Access Memory). DRAM supports fast random reads and writes but cannot retain its contents when power is lost. In fact, even when powered, it loses information internally because its storage capacitors leak, so it must be refreshed periodically.
In terms of storage, NAND flash is gradually replacing conventional hard disks. The floating-gate technology on which flash relies can retain contents without power, but writing (changing a '1' to a '0') is very slow and erasing (changing a '0' to a '1') is slower still, so flash cannot directly support computation the way DRAM does. It is built as a block device: a whole block must be erased together, a block contains many pages, and each page can be written only after the erase. Another problem with NAND is its limited lifetime.
Although DRAM, NAND flash, and CPU logic circuits are all manufactured with CMOS semiconductor processes, the three processes are mutually incompatible. As a result, the three core components of a computer cannot coexist on a single chip, which has profoundly shaped the architecture of modern computers.
A prior-art computer architecture is shown in Fig. 1, which shows multiple CPU cores, CPU1, CPU2, CPU3, ..., CPUn. Each CPU core generally has a corresponding level-1 cache (L1 Cache) and, as needed, may further be provided with a level-2 cache (L2 Cache) and a level-3 cache (L3 Cache). DRAM communicates with the CPU cores through a double data rate (DDR, Double Data Rate) interface, while a hard disk (HD, Hard Disk) or solid-state drive (SSD, Solid State Drive) communicates with the CPU cores through a peripheral device interface.
On the one hand, CPUs are moving toward multi-core designs; on the other hand, memory and storage are both on separate chips. As the information throughput of multi-core CPUs grows proportionally, communication with memory and storage increasingly becomes the bottleneck of system performance. To alleviate this bottleneck, CPUs have to use larger and larger multi-level caches. A cache duplicates contents of memory and is usually built from static random access memory (SRAM, Static Random Access Memory), which is faster than DRAM but costs much more. The cost effectiveness of such an architecture is very poor: the cost of a semiconductor chip is determined by its silicon area, and the performance gain delivered by the conventional computer architecture is far out of proportion to the increase in silicon area.
Summary of the invention
The problem to be solved by the present invention is that, in prior-art computer architectures, the communication bottleneck between the CPU and memory and storage limits improvement of overall computer performance and makes cost effectiveness poor.
To solve the above problem, the technical solution of the present invention provides a cellular array computing system comprising a master CPU, a cellular array, a cellular array bus, and memory cell arrays. The cellular array is a three-dimensional cellular array formed by stacking one or more two-dimensional cellular arrays. Each two-dimensional cellular array consists of one or more cells that combine computing and storage functions, and each cell includes a microprocessor (MPU, Micro Processing Unit) and a non-volatile (NV, Non-Volatile) random access memory. The non-volatile random access memory provides random access to the data involved in the microprocessor's computations and also stores software instruction code and data that must be persisted. Each memory cell array is a two-dimensional array of one or more memory units. At least one two-dimensional cellular array is stacked with one or more corresponding memory cell arrays to form a three-dimensional structure, and the memory units in each memory cell array are connected one-to-one with the cells in the corresponding two-dimensional cellular array. The memory units provide random access to the data involved in the microprocessor's computations. Each cell stores its own position in the cellular array as an identification number (ID) that software or hardware in the cell can read. The master CPU communicates with every cell in the cellular array through the cellular array bus. Adjacent cells in the cellular array have communication interfaces between them and can send data to each other.
Optionally, the communication between the master CPU and each cell in the cellular array through the cellular array bus includes at least one of the following:
reading and writing, by address, the non-volatile random access memory of any cell in the cellular array or the corresponding memory unit;
broadcasting data to the non-volatile random access memories, or the corresponding memory units, of all cells in a target area of the cellular array, and writing the data at the same relative address in the non-volatile random access memory or corresponding memory unit of each cell in the target area;
sending instructions or data to, or reading the status of, the microprocessor of any cell in the cellular array;
broadcasting instructions to the microprocessors of all cells in the target area.
Optionally, each cell in the cellular array further includes a bus controller and a cell internal bus. The cell internal bus connects the microprocessor, the non-volatile random access memory, and the memory unit corresponding to the cell. The bus controller is connected to the cellular array bus, the microprocessor, and the cell internal bus. The bus controller recognizes communication between the master CPU and its cell, and either connects the microprocessor to pass on instructions or data sent by the master CPU or to have its status read, or connects the non-volatile random access memory through the cell internal bus for data reads and writes.
Optionally, an equal number of memory cell arrays form each memory cell array group, the number of memory cell array groups equals the number of two-dimensional cellular arrays, and each memory cell array group is stacked with a corresponding two-dimensional cellular array, one to one, to form a three-dimensional structure.
Optionally, at least one of a floating-point unit (FPU, Floating-Point Unit) and an image processor is integrated in the microprocessor.
Optionally, the non-volatile random access memory is magnetic random access memory (MRAM, Magnetic Random Access Memory), each memory cell array is an MRAM, DRAM, or SRAM die, and each two-dimensional cellular array is on one silicon die.
Optionally, adjacent cells located in two neighboring two-dimensional cellular arrays, and each memory unit and its corresponding cell in the two-dimensional cellular array, are all connected for communication by through-silicon vias.
Optionally, any two cells in the cellular array can communicate with each other. The cells taking part in inter-cell communication include a starting cell, a destination cell, and relay cells. The starting cell sends data to the destination cell; the destination cell is the cell that finally receives the data sent by the starting cell; the relay cells are successively adjacent cells along the inter-cell communication path that relay the starting cell's data through the communication interfaces. The inter-cell communication path is the data transmit/receive path formed by the starting cell, the relay cells, and the destination cell.
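The three roles in inter-cell communication can be illustrated with a minimal Python sketch. It assumes that cell IDs are (x, y) grid coordinates and that data is relayed along a dimension-ordered route (x first, then y). The routing rule and the names `xy_route` and `roles` are illustrative assumptions, not the path selection method of the invention.

```python
def xy_route(start, dest):
    """Return the cell IDs visited from start to dest, stepping along x first, then y.

    Assumption: dimension-ordered routing between adjacent cells; the patent
    text does not prescribe this rule here.
    """
    x, y = start
    path = [(x, y)]
    while x != dest[0]:
        x += 1 if dest[0] > x else -1
        path.append((x, y))
    while y != dest[1]:
        y += 1 if dest[1] > y else -1
        path.append((x, y))
    return path


def roles(path):
    """Label the cells on a path as starting cell, relay cells, and destination cell."""
    return {"start": path[0], "relays": path[1:-1], "destination": path[-1]}


path = xy_route((0, 0), (2, 1))
print(roles(path))
```

Every consecutive pair of cells on the returned path is adjacent, so each hop uses only the neighbor-to-neighbor communication interface.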
Optionally, any cell in the cellular array can also act as the starting cell and group-send to all cells in a target area. A cell that takes part in the group-send and lies inside the target area acts as the starting cell, as a destination cell, or as both a relay cell and a destination cell; a cell that takes part in the group-send and lies outside the target area acts as the starting cell or as a relay cell.
Optionally, at least one dedicated output cell is provided in the cellular array. Acting as a destination cell, the dedicated output cell receives and stores the data that other cells output to the master CPU, and notifies the master CPU with an interrupt signal to read the output data.
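The behavior of a dedicated output cell can be sketched in Python as follows. The class name `DedicatedOutputCell`, the callback standing in for the interrupt line, and the `read_all` method are hypothetical names chosen for illustration; the source does not specify how the interrupt or the buffer is implemented.

```python
class DedicatedOutputCell:
    """Toy model: buffers output from other cells and signals the master CPU."""

    def __init__(self, raise_interrupt):
        self.buffer = []                        # stored output data, in arrival order
        self.raise_interrupt = raise_interrupt  # assumed stand-in for the interrupt signal

    def receive(self, source_id, data):
        """Act as destination cell: store the data, then notify the master CPU."""
        self.buffer.append((source_id, data))
        self.raise_interrupt()

    def read_all(self):
        """Called by the master CPU after the interrupt: drain the buffer."""
        drained, self.buffer = self.buffer, []
        return drained


interrupts = []
out_cell = DedicatedOutputCell(lambda: interrupts.append("irq"))
out_cell.receive((1, 2), 42)
out_cell.receive((3, 0), 7)
print(out_cell.read_all())
```

With this arrangement the master CPU reads from one known cell instead of polling every cell that might have output.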
Optionally, each cell in the cellular array further includes a network controller connected to the microprocessor. In inter-cell communication the network controller controls the sending and receiving of data that the cell originates, relays, or finally receives, and it also sends interrupt signals to the microprocessor.
Optionally, each cell in the cellular array further includes one or more groups of first-in-first-out (FIFO) queues connected to the network controller. Each group corresponds to one cell adjacent to this cell and includes an input FIFO queue and an output FIFO queue. The input FIFO queue stores data entering this cell, either to be relayed or finally received; the output FIFO queue stores data leaving this cell, either data to be relayed or data this cell sends to other cells.
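The per-neighbor FIFO groups can be modeled with a short Python sketch. The names `NeighborLink`, `CellQueues`, and `transfer`, and the four direction labels, are assumptions made for illustration; queue depth and flow control are not modeled.

```python
from collections import deque


class NeighborLink:
    """One FIFO group: an input queue and an output queue for a single adjacent cell."""

    def __init__(self):
        self.inbox = deque()   # data that arrived from the neighbor
        self.outbox = deque()  # data waiting to be sent to the neighbor


class CellQueues:
    """All FIFO groups of one cell, keyed by the direction of the neighbor."""

    def __init__(self, directions=("up", "down", "left", "right")):
        self.links = {d: NeighborLink() for d in directions}

    def enqueue_out(self, direction, packet):
        self.links[direction].outbox.append(packet)

    def dequeue_in(self, direction):
        inbox = self.links[direction].inbox
        return inbox.popleft() if inbox else None


def transfer(src, src_dir, dst, dst_dir):
    """Move one packet from src's output FIFO to dst's input FIFO, preserving order."""
    outbox = src.links[src_dir].outbox
    if not outbox:
        return False
    dst.links[dst_dir].inbox.append(outbox.popleft())
    return True


left_cell, right_cell = CellQueues(), CellQueues()
left_cell.enqueue_out("right", "packet-1")
transfer(left_cell, "right", right_cell, "left")
print(right_cell.dequeue_in("left"))
```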
Optionally, the master CPU, the cellular array, and the cellular array bus are integrated in one chip.
Optionally, the master CPU is an independent chip that communicates, through a standard memory interface, with the chip formed by the cellular array and the cellular array bus.
Compared with prior art, technical solution of the present invention has at least the following advantages:
One or more units that combine independent computing and storage functions (called "cells") form a two-dimensional cellular array, and one or more two-dimensional cellular arrays are stacked to form a three-dimensional cellular array. Each cell includes a microprocessor and a non-volatile random access memory. The non-volatile random access memory supports random access to the data involved in the microprocessor's computations and also stores software instruction code and data that must be persisted, so memory, storage, and computing are integrated in every cell, and the cells form a dense communication network. On the one hand, the master CPU can communicate with every cell through the cellular array bus; on the other hand, adjacent cells can send data to each other. Data group-sending and the internal network thus overcome the communication bottleneck between the CPU and memory and storage in existing computer architectures, improving both the overall performance of the computing system and its cost effectiveness. In addition, memory cell arrays, each made up of one or more memory units, are stacked with at least one two-dimensional cellular array to form a three-dimensional structure, with the memory units in each memory cell array connected one-to-one with the cells in the corresponding two-dimensional cellular array. Because the memory units provide random access to the data involved in the microprocessor's computations, the memory space of each cell can be extended at low cost, improving the processing efficiency of the microprocessor in each cell.
The communication interfaces between adjacent cells allow data to be relayed repeatedly from neighbor to neighbor, so any two cells in the cellular array can communicate without relying on the master CPU. This improves the efficiency of inter-cell communication and reduces the processing load on the master CPU, further improving the overall performance of the computing system.
Extending point-to-point inter-cell communication to regional group-sending supports a higher degree of parallelism and yields much higher total bandwidth, further improving the overall performance of the computing system.
A dedicated output cell in the cellular array, acting as a destination cell, receives and stores the data that other cells output to the master CPU and notifies the master CPU by interrupt signal to read it. When only a few cells need to output data to the master CPU, this improves the efficiency with which the master CPU reads output data.
The present invention solves the communication bottleneck between the CPU and memory. As a result, for the same silicon area, the processing power of a large number of small MPUs far exceeds that of a few top CPUs on many problems, while power consumption is much lower.
The invention brings computer architecture closer to the human brain, providing a powerful engine for future artificial intelligence algorithms.
Brief description of the drawings
Fig. 1 is a schematic diagram of a prior-art computer architecture;
Fig. 2 is a structural diagram of the cellular array computing system provided by Embodiment 1 of the present invention;
Fig. 3 is a schematic diagram of one mode of communication between adjacent cells in Embodiment 1;
Fig. 4 is a schematic diagram of another mode of communication between adjacent cells in Embodiment 1;
Fig. 5 is a structural diagram of a cell in Embodiment 1;
Fig. 6 is a schematic diagram of computing a Monte Carlo integral in pipelined fashion with the cellular array computing system;
Fig. 7 is a structural diagram of a cell that performs inter-cell communication in the cellular array of Embodiment 1;
Fig. 8 is a schematic diagram of path selection for inter-cell communication in the cellular array of Embodiment 1;
Fig. 9 is a schematic diagram of the implementation flow of the dedicated output cell in Embodiment 1;
Fig. 10 is a schematic diagram of group-sending in the cellular array of Embodiment 1 with the starting cell at a corner of the target area;
Fig. 11 is a schematic diagram of group-sending in the cellular array of Embodiment 1 with the starting cell on an edge of the target area;
Fig. 12 is a schematic diagram of group-sending in the cellular array of Embodiment 1 with the starting cell inside the target area;
Fig. 13 is a schematic diagram of group-sending in the cellular array of Embodiment 1 with the starting cell outside the target area;
Fig. 14 is a structural diagram of the cellular array computing system provided by Embodiment 2 of the present invention;
Fig. 15 is a structural diagram of the cellular array computing system provided by Embodiment 3 of the present invention;
Fig. 16 is a structural diagram of the cellular array computing system provided by Embodiment 4 of the present invention.
Detailed description of the embodiments
In prior-art computer architectures, the communication bottleneck between the CPU and memory and storage limits improvement of overall computer performance and makes cost effectiveness poor.
After study, the inventors considered that if the three functions of memory, storage, and computing were integrated on one chip, forming relatively simple units that combine independent computing and storage, and if a large number of such units formed a dense communication network providing data broadcast/group-send functions and an internal network capable of massively parallel data transmission, a computing architecture with similarities to the human brain could be developed. This is equivalent to building a large number of microcomputers on one chip.
To this end, the technical solution of the present invention provides a computing architecture similar in structure to the human brain (called a "cellular array computing system" in this technical solution). The architecture consists of numerous units (called "cells" in this technical solution) that are relatively simple in structure, combine storage and computing functions, and are connected by a dense network. This new computing architecture will be widely applicable in fields such as large-scale computing, big data processing, and artificial intelligence.
To make the above objects, features, and advantages of the invention clearer and easier to understand, specific embodiments of the invention are described in detail below with reference to the accompanying drawings.
Embodiment 1
As shown in Fig. 2, the cellular array computing system provided by this embodiment includes a master CPU, a cellular array, and a cellular array bus. The cellular array is the main body of the cellular array computing system; it is a two-dimensional array of one or more cells that combine computing and storage functions, where each cell includes a microprocessor (MPU) and a non-volatile random access memory (MRAM in the example of Fig. 2). The non-volatile random access memory provides random access to the data involved in the microprocessor's computations and also stores software instruction code and data that must be persisted. Each cell stores its own position in the cellular array as an ID that software or hardware in the cell can read. The master CPU communicates with every cell in the cellular array through the cellular array bus. Adjacent cells in the cellular array have communication interfaces between them and can send data to each other.
It should be noted that this embodiment uses MRAM as the example of non-volatile random access memory. In other embodiments, as non-volatile random access memory technology develops and matures, the non-volatile random access memory may be implemented with several other candidate technologies, such as phase-change random access memory (PCRAM, Phase Change Random Access Memory), resistive random access memory (Resistive Random Access Memory), ferroelectric random access memory (FeRAM, Ferroelectric Random Access Memory), and ferroelectric dynamic random access memory (FEDRAM, Ferroelectric Dynamic Random Access Memory).
MRAM is a new memory and storage technology. It supports fast random reads and writes like SRAM and DRAM, and is faster than DRAM. Like flash, it retains data permanently after power loss, but unlike NAND flash it can be rewritten an unlimited number of times and so has a longer service life. MRAM is also economical: its silicon area per unit of capacity has a large advantage over SRAM (commonly used for CPU caches) and is expected to approach that of DRAM. Its performance is also good: read/write latency is close to that of the best SRAM, and its power consumption is the lowest among the various memory and storage technologies. Moreover, unlike DRAM and flash, which are incompatible with the standard CMOS semiconductor process, MRAM can be integrated with logic circuits on one chip. MRAM technology therefore makes it possible to integrate the three functions of memory, storage, and computing on one chip, which makes the cellular array computing system feasible.
In this embodiment, the microprocessor has the functions of an ordinary CPU, and units such as a floating-point unit or an image processor may be added depending on the application scenario. At least one of a floating-point unit and an image processor can therefore be integrated in the microprocessor.
In practice, the master CPU, the cellular array, and the cellular array bus may be integrated in one chip. Alternatively, the master CPU may be an independent chip that communicates through a standard memory interface with the chip formed by the cellular array and the cellular array bus. When a standard memory interface is used between the master CPU and the cellular array, the master CPU can be a general-purpose CPU chip, which makes the cellular array computing system easier to implement.
In this embodiment, each cell stores its own position in the cellular array as its ID. The position may be expressed as coordinates in the first quadrant of a planar rectangular coordinate system: if (x, y) denotes the position of a cell in the cellular array, then (x, y) can be stored in the cell as its ID, and software and hardware in the cell can read this ID and use it in specific operations.
In this embodiment, the communication between the master CPU and each cell in the cellular array through the cellular array bus includes the following:
reading and writing, by address, the non-volatile random access memory of any cell in the cellular array;
broadcasting data to the non-volatile random access memories of all cells in a target area of the cellular array, and writing the data at the same relative address in the non-volatile random access memory of each cell in the target area;
sending instructions (including start and pause) or data to, or reading the status of, the microprocessor of any cell in the cellular array;
broadcasting instructions to the microprocessors of all cells in the target area.
Of course, in other embodiments, the communication between the master CPU and each cell in the cellular array through the cellular array bus may be any one of the above or a combination of several.
It should be noted that a "target area" in the embodiments of the present invention is a region, selected by the master CPU or by any cell in the cellular array, made up of one or more mutually adjacent cells. The cells in the region are the recipients of data or instructions broadcast or group-sent by the master CPU or by a cell in the cellular array. In this embodiment, the target area is illustrated as a rectangular region (a ≤ x ≤ b, c ≤ y ≤ d, where a and b are the boundary coordinates of the rectangle along the x-axis of the planar rectangular coordinate system, and c and d are its boundary coordinates along the y-axis). In other embodiments, the target area may have another shape, such as a diamond, a triangle, or a hexagon.
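The rectangular target area test a ≤ x ≤ b, c ≤ y ≤ d can be written directly in Python. The function names `in_target_area` and `cells_in_target_area` are illustrative and do not come from the source; cell IDs are taken to be (x, y) tuples as described above.

```python
def in_target_area(cell_id, a, b, c, d):
    """Return True if the cell with ID (x, y) satisfies a <= x <= b and c <= y <= d."""
    x, y = cell_id
    return a <= x <= b and c <= y <= d


def cells_in_target_area(a, b, c, d):
    """List the IDs of every cell in the rectangle, row by row."""
    return [(x, y) for y in range(c, d + 1) for x in range(a, b + 1)]


print(in_target_area((2, 3), a=1, b=4, c=3, d=5))
print(len(cells_in_target_area(1, 4, 3, 5)))
```

Because each cell stores its own ID, a check of this kind can be made locally in every cell without a central lookup.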
In addition, "broadcast" differs from "group-send" in the embodiments of the present invention: in a broadcast, data or an instruction is sent once and all recipients receive it, whereas in a group-send it may be sent multiple times, each time to different recipients.
Besides broadcasts from the master CPU to cells in the cellular array (to the microprocessor or the non-volatile random access memory in a cell), the cellular array also contains a communication network that lets a cell, under the control of its MPU, send data to its adjacent cells. As shown in Fig. 3, within a plane any cell can communicate with its neighbors in four directions: up, down, left, and right. The notion of adjacency is not limited to these four directions. Where the configuration supports it, eight directions may be used (up, down, left, right, upper left, upper right, lower left, and lower right), as shown in Fig. 4.
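The four-direction and eight-direction neighbor relations can be enumerated with a small Python sketch. It assumes a `width` by `height` grid with IDs starting at (0, 0); the grid bounds and the name `neighbors` are assumptions for illustration.

```python
# Offsets for up, down, left, right
FOUR_DIRECTIONS = [(0, 1), (0, -1), (-1, 0), (1, 0)]
# The four diagonal offsets are added for the eight-direction variant
EIGHT_DIRECTIONS = FOUR_DIRECTIONS + [(-1, 1), (1, 1), (-1, -1), (1, -1)]


def neighbors(cell_id, width, height, diagonal=False):
    """Return the IDs of the adjacent cells that lie inside the grid."""
    x, y = cell_id
    offsets = EIGHT_DIRECTIONS if diagonal else FOUR_DIRECTIONS
    return [
        (x + dx, y + dy)
        for dx, dy in offsets
        if 0 <= x + dx < width and 0 <= y + dy < height
    ]


print(neighbors((1, 1), 3, 3))
print(neighbors((1, 1), 3, 3, diagonal=True))
```

Cells on an edge or corner of the array simply have fewer neighbors.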
As shown in Fig. 5, in this embodiment each cell in the cellular array may further include a bus controller and a cell internal bus. The bus controller is connected to the cellular array bus, the microprocessor, and the cell internal bus. The bus controller recognizes communication between the master CPU and its cell, and either connects the microprocessor to pass on instructions or data sent by the master CPU or to have its status read, or connects the non-volatile random access memory through the cell internal bus for data reads and writes.
As those skilled in the art know, a fairly simple CPU with good performance, such as the ARM Cortex-M0, has only about 50,000 MOS transistors. Even with a modest addition of FPU functionality, it is far smaller than a top CPU with hundreds of millions of MOS transistors. The area (cost) increase incurred in raising CPU performance is out of proportion to the gain. If one large CPU is replaced with many small CPUs at the same total cost, total computing power certainly increases many times. Conventional computer architectures, however, are limited by the communication bottleneck, so the actual performance gain from using a large number of CPU cores is very limited.
The cellular array computing system provided by the technical solution of the present invention solves the communication bottleneck through data broadcast/group-send and the internal network, improving both the overall performance of the computing system and its cost effectiveness. This will be seen more clearly in the application examples that follow.
Preliminary studies show that a cell can be formed from an MPU similar to the Cortex-M0 together with 32 KB of memory. With a 40 nm process, 3000 such cells can be built on a single chip, which provides very powerful computing capability. Further studies show that in this way the computing power of a contemporary top CPU (generally measured in floating-point operations per second, FLOPS) can be exceeded in the same silicon area. Because the cellular array computing system of the present technical solution no longer faces a memory-interface bottleneck, its performance on many practical problems will be even better.
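As a rough check of the scale these figures imply, the totals can be computed in a few lines of Python. The inputs (3000 cells, 32 KB per cell, about 50,000 MOS transistors per MPU) are the approximate numbers quoted in the text; the transistor total counts MPU logic only and ignores memory and interconnect, which is a simplifying assumption.

```python
CELLS_PER_CHIP = 3000          # from the text
MEMORY_PER_CELL_KB = 32        # from the text
TRANSISTORS_PER_MPU = 50_000   # approximate figure quoted for a Cortex-M0 class core

total_memory_kb = CELLS_PER_CHIP * MEMORY_PER_CELL_KB
total_memory_mib = total_memory_kb / 1024           # taking 1 MiB = 1024 KB
total_mpu_transistors = CELLS_PER_CHIP * TRANSISTORS_PER_MPU

print(f"on-chip memory: {total_memory_kb} KB (about {total_memory_mib:.2f} MiB)")
print(f"MPU logic transistors: {total_mpu_transistors:,}")
```

Under these assumptions the MPU logic of all 3000 cells together is on the order of 150 million transistors, which is comparable to the transistor count cited for a single top CPU core.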
Based on the above cellular array computing system, this embodiment also provides a communication method for it, comprising: operations in which the master CPU reads and writes the non-volatile random access memory, communication operations between the master CPU and the microprocessors, broadcast operations of the master CPU, and communication operations between adjacent cells in the cellular array.
The operation in which the master CPU reads and writes the non-volatile random access memory specifically includes: every cell in the cellular array receives the destination address that the master CPU broadcasts on the cellular array bus; a cell that determines the destination address lies within itself connects its non-volatile random access memory so that the master CPU can read or write data.
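A minimal Python sketch of this address decoding follows. It assumes each cell's non-volatile memory occupies one contiguous, equally sized window of the system address space, laid out row by row; that mapping, the 32 KB window size, and the names `Cell`, `owns`, `bus_write`, and `bus_read` are all assumptions made for illustration.

```python
CELL_MEM_BYTES = 32 * 1024  # assumed size of each cell's address window


class Cell:
    """Toy cell whose bus controller answers only for addresses in its own window."""

    def __init__(self, cell_id, array_width):
        x, y = cell_id
        self.cell_id = cell_id
        self.base = (y * array_width + x) * CELL_MEM_BYTES  # assumed row-major mapping
        self.nvram = bytearray(CELL_MEM_BYTES)

    def owns(self, address):
        return self.base <= address < self.base + CELL_MEM_BYTES

    def bus_write(self, address, value):
        """Every cell sees the address; only the owner connects its memory."""
        if not self.owns(address):
            return False
        self.nvram[address - self.base] = value
        return True

    def bus_read(self, address):
        if not self.owns(address):
            return None
        return self.nvram[address - self.base]


cells = [Cell((x, y), array_width=2) for y in range(2) for x in range(2)]
address = 3 * CELL_MEM_BYTES + 5
print([cell.bus_write(address, 0xAB) for cell in cells])
```

Exactly one cell responds to any address inside the mapped range, so no central arbiter is needed to select the target memory.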
Traffic operation between the master cpu and microprocessor specifically includes: first is reserved in system address space Special address field is for the communication between the master cpu and microprocessor and stores the ID of target cell, if the cell battle array Identify it is communication with the microprocessor of this cell when any cell receives the first special address field in column, then connection should The microprocessor of cell completes subsequent command reception, data receiver and status read operation.
It should be noted that the system address space be not limited only in each cell by cellular array include Non-volatile random access memory composition address space summation because the memory of connection cellular array bus may be not just thin The non-volatile random access memory for including in each cell of born of the same parents' array, entirely possible there is also other kinds of memory and cells Array bus is connected, and accesses for the master cpu.Therefore, the master cpu needs to go to identify according to the ID of cell it and prepares to visit The cell asked (cell is known as " target cell " in the present embodiment at this time).
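The decision made by a cell's bus controller for these two operations can be sketched as follows. This is only an illustration under stated assumptions: the source gives no address map, so the linear mapping of cell memories, the constants `CELL_MEMORY_SIZE`, `MPU_CHANNEL_BASE`, and `BROADCAST_BASE`, and the function `decode` are all hypothetical.

```python
from typing import Optional

# Hypothetical address map (not given in the source).
CELL_MEMORY_SIZE = 32 * 1024      # 32 KB per cell, as in the sizing example above
MPU_CHANNEL_BASE = 0xF000_0000    # stand-in for the first special address segment
BROADCAST_BASE = 0xF800_0000      # stand-in for the second special address segment


def decode(cell_index: int, address: int, target_id: Optional[int] = None) -> str:
    """Decide what one cell's bus controller connects for a bus access.

    Returns "nvram", "mpu", "broadcast", or "ignore".
    """
    if address >= BROADCAST_BASE:
        # Broadcast accesses need the target-area check, handled separately.
        return "broadcast"
    if address >= MPU_CHANNEL_BASE:
        # First special segment: talk to the MPU of the cell named by target_id.
        return "mpu" if target_id == cell_index else "ignore"
    # Ordinary access: assume cell memories are laid out back to back.
    base = cell_index * CELL_MEMORY_SIZE
    if base <= address < base + CELL_MEMORY_SIZE:
        return "nvram"
    return "ignore"
```

Every cell sees the same bus access, and only the cell whose check succeeds connects its memory or MPU; all other cells ignore it.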
The broadcast operation of the master CPU specifically comprises: a second special address segment is reserved in the system address space for the master CPU to broadcast instructions; the second special address segment holds the IDs of cells that help determine the extent of the target area in the cellular array; if any cell in the cellular array, after receiving the second special address segment, recognizes that it lies in the target area, it either connects the microprocessor of the cell to pass on the instructions or data sent by the master CPU or to have its status read, or connects the non-volatile random access memory of the cell to carry out the data read or write.
The broadcast operation of the master CPU is illustrated below for the case where the target area is a rectangular region. One segment of the system address space is reserved for broadcast instructions, and one address in this segment is used to store the ID of the starting-point cell of the target rectangular region. The starting-point cell is the first cell accessed by the master CPU in the target rectangular region. After the bus controller in a cell receives this special address, it receives one subsequent word of data, which contains the ID of the cell diagonally opposite the starting-point cell in the target rectangular region. If the bus controller determines that this cell lies in the region, it receives a second word of data. The second word indicates whether what follows is an instruction or data for the MPU, or is to be written into the non-volatile random access memory starting from some relative address. In the former case the MPU is connected; in the latter case the non-volatile random access memory is connected, and the subsequent operation is completed.
It should be noted that when the storage space of the second special address segment is relatively limited, the cell ID stored in it may not be enough to fully determine the extent of the target area; in that case, after receiving the second special address segment, a cell also needs to receive subsequent data which, together with the cell ID stored in the second special address segment, determines the extent of the target area.
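The rectangular-region check performed by each bus controller can be sketched as below. It assumes, for illustration only, that a cell ID is an (x, y) position in a two-dimensional array; the names `CellID`, `in_target_rectangle`, and `handle_broadcast` are hypothetical and not taken from the source.

```python
from typing import NamedTuple


class CellID(NamedTuple):
    """Assumed ID format: the cell's (x, y) position in a 2D array."""
    x: int
    y: int


def in_target_rectangle(cell: CellID, start: CellID, diagonal: CellID) -> bool:
    """True if `cell` lies in the rectangle spanned by two diagonal corners."""
    x_lo, x_hi = sorted((start.x, diagonal.x))
    y_lo, y_hi = sorted((start.y, diagonal.y))
    return x_lo <= cell.x <= x_hi and y_lo <= cell.y <= y_hi


def handle_broadcast(cell: CellID, start: CellID, diagonal: CellID,
                     second_word_targets_mpu: bool) -> str:
    """Model one bus controller's reaction to a broadcast.

    `start` comes from the special address, `diagonal` from the first data
    word, and `second_word_targets_mpu` stands in for the second data word.
    """
    if not in_target_rectangle(cell, start, diagonal):
        return "ignore"
    return "connect_mpu" if second_word_targets_mpu else "connect_nvram"
```

Sorting the two corners makes the check independent of which corner the master CPU names as the starting point.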
The communication operation between adjacent cells in the cellular array comprises: any cell in the cellular array sends data to an adjacent cell under the control of its microprocessor.
In this embodiment, each cell is provided with a bus controller connected to the cellular array bus. A cell-internal bus is provided inside the cell; the non-volatile random access memory is a slave device (Slave) on the cell-internal bus, and the bus controller and the microprocessor are master devices (Master).
In the communication method described above, the steps "any cell in the cellular array determines whether the destination address lies within this cell", "recognizes whether the communication is with the microprocessor of this cell", "recognizes whether this cell lies in the target area", and "connects the non-volatile random access memory or the microprocessor" are all carried out by the bus controller, which connects to the non-volatile random access memory through the cell-internal bus.
In a specific implementation, read and write operations by the master CPU on the non-volatile random access memory of any cell in the cellular array take priority over read and write operations by the microprocessor in that cell on the same memory. That is, if the microprocessor in a cell needs to read or write the non-volatile random access memory of that cell, it must wait until the master CPU has completed its read or write operation on that memory.
For the specific implementation of the communication method in the cellular array computing system, reference may also be made to the implementation of the cellular array computing system described above; details are not repeated here.
In addition, this embodiment also provides a method of computing a Monte Carlo integral using the above cellular array computing system. Monte Carlo integration is a summation over random numbers; it is a large-scale computation commonly used in science and engineering, its principle is relatively simple, and the calculation follows the formula below.
$$S = \sum_{\mathrm{Random}(x)} F(x_1, x_2, \ldots, x_N)$$
The solution of this problem is used below to further show the advantages of the cellular array computing system. The Monte Carlo integral is well suited to the above cellular array computing system, and the specific steps are as follows:
The master CPU selects all cells in the cellular array, or the cells in a target area, and broadcasts the program corresponding to the integrand F() to a relative address segment of each selected cell;
The master CPU broadcasts an instruction so that the microprocessor of each selected cell executes the program corresponding to the integrand F() starting from the relative address segment;
After each cell completes the integration, it stores its sum at an agreed address, and the master CPU reads these sums and computes the overall total.
In this embodiment, when the program corresponding to the integrand F() starts to execute, its built-in random number generator reads the ID of the cell as its seed, so that each cell generates different random numbers.
In actual implementation, thousands of cells start computing simultaneously and their computing power is fully released, no longer limited by the communication bottleneck of the prior art, so the Monte Carlo integral can be computed more efficiently.
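The steps above can be simulated sequentially in a few lines. This is a minimal sketch, not the patented implementation: the integrand, the integer cell IDs, and the function names are assumptions chosen for illustration, and the cells are run one after another rather than in parallel.

```python
import random


def integrand(x1: float, x2: float) -> float:
    """Hypothetical integrand F; the source does not specify one."""
    return x1 * x1 + x2 * x2


def cell_partial_sum(cell_id: int, samples: int) -> float:
    """Work done by one cell: seed the generator with the cell ID, then sum F."""
    rng = random.Random(cell_id)  # cell ID as seed, so cells draw different numbers
    total = 0.0
    for _ in range(samples):
        total += integrand(rng.random(), rng.random())
    return total


def master_total(cell_ids, samples: int) -> float:
    """Work done by the master CPU: read every partial sum and combine them.

    Dividing by the sample count turns the total into an estimate of the
    integral of F over the unit square (an assumption about the domain).
    """
    partial_sums = [cell_partial_sum(cid, samples) for cid in cell_ids]
    return sum(partial_sums) / (len(partial_sums) * samples)
```

For this particular integrand the exact integral over the unit square is 2/3, which gives a simple sanity check on the combined estimate.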
In actual implementation, if the integrand F() is so complex that the memory of one cell cannot hold it, the problem can be solved by pipelining. This embodiment therefore also provides another method of computing a Monte Carlo integral using the above cellular array computing system, comprising:
The master CPU selects all cells in the cellular array, or the cells in a target area;
The master CPU broadcasts a download program to the same relative address segment of each selected cell, and broadcasts an instruction so that the microprocessor of each selected cell executes the download program starting from that relative address; the download program then waits for the next input;
The program corresponding to the integrand is split into two or more subprograms, and the master CPU broadcasts each subprogram to the microprocessors of the selected cells;
Each microprocessor running the download program selects one of the subprograms to store according to the ID of its own cell, so that the subprograms are deployed in order across a group of successively adjacent cells;
The master CPU broadcasts an instruction so that the microprocessors of each group of cells execute, in turn, the subprograms into which the program corresponding to the integrand was split, with the intermediate result of each stage passed to the next stage as input;
After each group of cells completes the integration, it stores its sum at an agreed address, and the master CPU reads these sums and computes the overall total.
For example, as shown in Fig. 6, the integrand F() can be split into three parts f1, f2, and f3 (three subprograms), which are deployed in adjacent cells, with the intermediate result of each stage passed to the next stage as input.
Specifically, when the master CPU broadcasts f1, f2, and f3 to each MPU (note: they are sent to the MPU, not to the memory), the MPU running the download program selects one of the subprograms to store according to the x coordinate of its own ID along the x axis of the rectangular coordinate system (for example, using the remainder of x/3). In this way, through two stages of broadcasting, the three subprograms are deployed according to the desired rule in all cells taking part in the computation.
In addition, since execution of the program corresponding to the integrand F() actually begins with the first subprogram, when the first subprogram after splitting starts to execute, its built-in random number generator still reads the ID of the cell as its seed, so that each cell generates different random numbers.
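The selection rule based on the remainder of x/3 can be sketched as follows. The three subprograms here are placeholder functions invented for illustration; only the rule "choose by x mod 3 and pass each intermediate result to the next adjacent cell" comes from the text.

```python
def f1(value: float) -> float:
    """Hypothetical first stage of the split integrand."""
    return value + 1.0


def f2(value: float) -> float:
    """Hypothetical second stage."""
    return value * 2.0


def f3(value: float) -> float:
    """Hypothetical third stage."""
    return value * value


SUBPROGRAMS = [f1, f2, f3]


def select_subprogram(cell_x: int, subprograms=SUBPROGRAMS):
    """Download-program rule: keep the subprogram indexed by x mod stage count."""
    return subprograms[cell_x % len(subprograms)]


def run_pipeline(first_cell_x: int, value: float, stages: int = 3) -> float:
    """Pass a value through one group of `stages` successively adjacent cells.

    `first_cell_x` is assumed to be a multiple of `stages`, so that the group
    holds f1, f2, f3 in that order.
    """
    for cell_x in range(first_cell_x, first_cell_x + stages):
        value = select_subprogram(cell_x)(value)
    return value
```

With this rule every cell receives the same broadcast of all three subprograms but keeps only one, so no per-cell addressing is needed during deployment.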
Compared with conventional computer architectures, the advantages of the cellular array computing system's broadcast/multicast function and of its internal network, which can transfer data in a massively parallel manner, are evident here. If this problem were computed with a conventional multi-core CPU architecture, then whenever the cache of each CPU is insufficient, every CPU would have to read the code of the integrand F() through the memory interface, creating a bottleneck.
In this embodiment, the internal network of the cellular array is implemented so that data can not only be sent to adjacent cells but can also be sent from one cell to any other cell, thereby realizing intercellular communication within the cellular array.
Specifically, any two cells in the cellular array of the cellular array computing system can communicate without relying on the master CPU. The cells taking part in an intercellular communication comprise a starting-point cell, an end-point cell, and transfer cells. The starting-point cell is the cell that issues data to the end-point cell; the end-point cell is the cell that finally receives the data issued by the starting-point cell; the transfer cells are cells that are successively adjacent along the intercellular communication path and relay the data issued by the starting-point cell through the communication interfaces; and the intercellular communication path is the data transmit/receive path formed by the starting-point cell, the transfer cells, and the end-point cell.
Through the communication interfaces between adjacent cells in the cellular array, data is relayed multiple times from one adjacent cell to the next, so that any two cells in the cellular array can communicate without relying on the master CPU. This improves the efficiency of intercellular communication and reduces the processing load of the master CPU, further improving the overall performance of the computing system.
It should be noted that starting-point cell, end-point cell, and transfer cell are relative concepts defined with respect to a particular intercellular communication: a starting-point cell may well act as a transfer cell or end-point cell in another intercellular communication, and an end-point cell may likewise act as a transfer cell or starting-point cell in another.
In a specific implementation, a cell in the cellular array may further include a network controller connected to the microprocessor. The network controller controls the sending and receiving of issued data, transferred data, and finally received data during intercellular communication, and also sends interrupt signals to the microprocessor. In this embodiment, providing a network controller in each cell allows data to be relayed quickly without disturbing the MPU, reducing the processing load of the MPU in the cell. In other embodiments, the network controller may be omitted and data relayed by the MPU instead.
In this embodiment, "issued data" refers to data sent out by the starting-point cell itself; "transferred data" refers to data issued by the starting-point cell that a transfer cell passes on, which that cell does not itself need to issue; "finally received data" refers to data received by the end-point cell, which has reached its destination after multiple transfers and will not be transferred further. In terms of content, "issued data", "transferred data", and "finally received data" may be the same data; they are simply different names for it at different stages of communication.
In a specific implementation, a cell in the cellular array may further include one or more groups of first-in-first-out (FIFO) queues connected to the network controller. Each group of FIFO queues corresponds to one cell adjacent to this cell and comprises an input FIFO queue and an output FIFO queue. The input FIFO queue stores data that is input to this cell to be transferred or that is finally received by it; the output FIFO queue stores data to be output from this cell for transfer, or data that this cell issues to other cells.
Taking the communication mode between adjacent cells shown in Fig. 3 as an example, the structure of a cell that carries out intercellular communication in the cellular array of this embodiment is shown in Fig. 7. The network controller in Fig. 7 is connected to the MPU and to four groups of FIFO queues, and the groups correspond one-to-one to the cells adjacent to this cell in the four directions (up, down, left, and right) of the two-dimensional plane. In a specific implementation, the communication channel between every two adjacent cells may share one corresponding group of FIFO queues. Each group of FIFO queues comprises an input FIFO and an output FIFO. From the point of view of one cell, the input FIFO stores data input from an adjacent cell and the output FIFO stores data that this cell outputs to an adjacent cell; the output FIFO of an adjacent cell is an input FIFO from the point of view of this cell, and the output FIFO of this cell is an input FIFO from the point of view of the adjacent cell.
It should be noted that the cell shown in Fig. 7 has four groups of FIFO queues. A cell at one of the four corners of a rectangular cellular array has only two adjacent cells and therefore two groups of FIFO queues; a cell on one of the four edges of a rectangular cellular array has three adjacent cells and therefore three groups of FIFO queues.
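The relation between a cell's position and its number of FIFO groups can be sketched as below, assuming the four-direction communication mode and a rectangular array indexed from (0, 0). The coordinate orientation (y increasing downward) and the names `DIRECTIONS` and `make_fifo_groups` are assumptions for illustration.

```python
from collections import deque

# Assumed orientation: x grows to the right, y grows downward.
DIRECTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def make_fifo_groups(x: int, y: int, width: int, height: int) -> dict:
    """Create one input/output FIFO pair per neighbor that actually exists."""
    groups = {}
    for name, (dx, dy) in DIRECTIONS.items():
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            groups[name] = {"input": deque(), "output": deque()}
    return groups
```

A corner cell ends up with two groups, an edge cell with three, and an interior cell with four, matching the counts given in the text.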
In this embodiment, the network controller is also connected to the MPU in the cell and sends it interrupt signals, such as FIFO empty, FIFO full, new data arrived, and data sent. The MPU can in turn issue data through the network controller, and the issued data is usually first placed in the corresponding output FIFO queue.
It should be noted that the cell structure in Fig. 7 shows only the modules related to intercellular communication; those skilled in the art will understand that the cell structure shown in Fig. 7 can be fully combined with the cell structure shown in Fig. 5.
In addition, in this embodiment FIFO queues are used to store the data input to and output from a cell, which makes data transfer during intercellular communication more efficient and reduces the processing load of the MPU. In other embodiments, the data input to and output from a cell may also be held in registers.
This embodiment also provides a communication method for the above cellular array computing system, comprising: a starting-point cell in the cellular array sends the data to be issued to an end-point cell, in a selected sending direction, to a cell adjacent to the starting-point cell; when any cell in the cellular array receives data issued or transferred by an adjacent cell, if it determines from the ID of the end-point cell carried in the received data that this cell is the end-point cell, it stores the received data in the non-volatile random access memory of this cell or notifies the microprocessor of this cell to process the received data; otherwise this cell acts as a transfer cell and, after selecting a sending direction, relays the received data to a cell adjacent to this cell.
In a specific implementation, every piece of data involved in intercellular communication contains the IDs of the starting-point cell and the end-point cell, and any cell can determine from the end-point cell ID carried in the received data whether the data is addressed to this cell or needs to be relayed further to another adjacent cell. A piece of data reaches the end-point cell through the connections between adjacent cells after multiple transfers. If the end-point cell needs to give feedback on the data issued by the starting-point cell, it can send the feedback data to the starting-point cell according to the ID of the starting-point cell: the end-point cell takes the starting-point cell ID carried in the received data as the end-point cell ID and places it in the feedback data obtained after processing the received data. The end-point cell then becomes the starting-point cell of a new intercellular communication, and the original starting-point cell becomes the end-point cell of that communication.
In a specific implementation, besides the ID of the end-point cell, the data that the starting-point cell issues to the end-point cell also indicates either the address to be accessed in the end-point cell or the MPU. The received data is stored in the non-volatile random access memory of this cell after the end-point cell has identified the address to be accessed that is indicated in the received data; the MPU of this cell is notified to process the received data after the end-point cell has identified the MPU indicated in the received data.
In actual implementation, if the end-point cell identifies an address to be accessed in the received data, the network controller in the end-point cell can write the received data directly to the corresponding address in the non-volatile random access memory of the cell. In this case cells can "reproduce": one cell can download a program to another cell. If the end-point cell identifies the MPU in the received data, the received data is handed to the MPU in the end-point cell for processing.
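The handling of a received piece of data can be sketched as follows. The packet layout is an assumption: the source says only that the data carries the IDs of the starting-point and end-point cells plus either an address or an indication of the MPU, so the `Packet` fields and the helper names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Packet:
    """Assumed packet layout for intercellular communication."""
    source: Tuple[int, int]
    dest: Tuple[int, int]
    payload: bytes
    address: Optional[int] = None  # None means "deliver to the MPU"


def handle_packet(cell_id, packet: Packet, memory: bytearray, mpu_inbox: list) -> str:
    """Model the decision made when a packet arrives at a cell."""
    if packet.dest != cell_id:
        return "transfer"          # not for this cell: pass it on
    if packet.address is not None:
        end = packet.address + len(packet.payload)
        memory[packet.address:end] = packet.payload  # direct write into cell memory
        return "stored"
    mpu_inbox.append(packet.payload)                 # hand over to the MPU
    return "mpu"


def make_feedback(packet: Packet, payload: bytes) -> Packet:
    """Build feedback: the old starting point becomes the new end point."""
    return Packet(source=packet.dest, dest=packet.source, payload=payload)
```

Writing a payload straight into another cell's memory is what allows one cell to download a program to another, as described above.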
In this embodiment, since each cell in the cellular array further includes a network controller connected to the MPU, the following are all carried out under the control of the network controller: the starting-point cell issuing data to the end-point cell; any cell in the cellular array receiving data issued or transferred by an adjacent cell and determining whether this cell is the end-point cell or a transfer cell; and storing the received data in the non-volatile random access memory of this cell or notifying the MPU of this cell to process the received data.
In a specific implementation, the data that the starting-point cell issues to the end-point cell is first placed by the network controller into the output FIFO queue and then output by the network controller from the output FIFO queue to the cell adjacent to the starting-point cell. When any cell in the cellular array receives data issued or transferred by an adjacent cell, the received data is placed in the input FIFO queue and, if it is determined that the received data needs to be transferred, the data is then placed in the output FIFO queue.
In addition, if the network controller determines that an input or output FIFO queue is empty or full, or that data issued or transferred by an adjacent cell has been received, or that data has been issued or transferred to an adjacent cell, it sends an interrupt signal to the microprocessor.
In a specific implementation, the starting-point cell or a transfer cell can select the sending direction as follows: if a straight-line communication path can be formed between the starting-point cell or transfer cell and the end-point cell, the sending direction is the direction along that straight line from the starting-point cell or transfer cell to the end-point cell; otherwise the sending direction is the direction from the starting-point cell or transfer cell to a candidate adjacent cell, a candidate adjacent cell being a cell that, among the cells adjacent to the starting-point cell or transfer cell, is closer to the end-point cell. The number of candidate adjacent cells may of course be two, in which case the one of the two with fewer output-data communication tasks is selected as the transfer cell.
In this embodiment, the selection of the sending direction by the starting-point cell or transfer cell in the above manner can be regarded as the path selection process for intercellular communication in the cellular array. Refer to Fig. 8: each rectangle in Fig. 8 represents one cell in the cellular array, the cells shown in Fig. 8 are part of the whole cellular array, and communication between adjacent cells is assumed to take place in the manner shown in Fig. 3.
Suppose point A denotes a starting-point cell that is about to issue data to the end-point cell at point C. Since a straight-line communication path can clearly be formed between point A and point C, the cell at point A sends the data to the adjacent cell at point B. Likewise, the cell at point B, acting as a transfer cell, relays the data onward along the straight line between point A and point C toward the cell at point C. The successively adjacent cells on the intercellular communication path between point A and point C forward the data issued by the cell at point A in turn until it reaches the cell at point C.
Suppose point D denotes another starting-point cell that is about to issue data to the end-point cell at point G. Since a straight-line communication path clearly cannot be formed between point D and point G, among the cells adjacent to the cell at point D, the cells at point E and point F are evidently closer to the end-point cell at point G, so these two cells are the candidate adjacent cells of the cell at point D. The one with fewer output-data communication tasks can be selected as the transfer cell; if the two have the same output-data communication load, one is chosen at random as the transfer cell. As shown in Fig. 8, selecting the cell at point E or the cell at point F results in different intercellular communication paths.
It should be noted that in this embodiment the path selection for intercellular communication is described using the communication mode between adjacent cells shown in Fig. 3 as an example; those skilled in the art will understand that if the communication mode between adjacent cells shown in Fig. 4 is used, more sending directions will be available.
In summary, in actual implementation, for every cell that issues or transfers data, the network controller must select one adjacent cell as the next hop. When the starting point and the end point lie on a straight line, there is generally only one reasonable choice; in other cases there are two equally reasonable choices, and the network controller selects the neighbor whose traffic is relatively less busy.
If data enters an input FIFO queue, the network controller first examines it:
If the end point is this cell: if the end point is a specific relative address, then since the network controller is capable of direct memory access (DMA, Direct Memory Access), the received data is stored directly at the corresponding address in the non-volatile random access memory and the MPU is notified by an interrupt; if the end point is the MPU, the MPU is notified directly by an interrupt signal to process the data.
If the end point is another cell, or if the data is issued by the MPU of this cell: if the end point and this cell lie on a straight line, the correct direction is selected and the data is sent to the adjacent cell; in other cases there are two possible directions, and the adjacent cell whose output FIFO queue is more idle is selected for sending; if the output FIFO queues of the two candidate adjacent cells are in the same state, one adjacent cell can be chosen at random.
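The next-hop rule summarized above can be sketched as follows, assuming (x, y) cell IDs and the four-direction communication mode. The `output_load` mapping, which stands in for how busy the output FIFO toward each candidate is, and all function names are assumptions made for illustration.

```python
import random


def candidate_next_hops(current, dest):
    """Adjacent cells that are closer to `dest`: one on a straight line, else two."""
    cx, cy = current
    dx, dy = dest
    hops = []
    if dx != cx:
        hops.append((cx + (1 if dx > cx else -1), cy))
    if dy != cy:
        hops.append((cx, cy + (1 if dy > cy else -1)))
    return hops


def choose_next_hop(current, dest, output_load, rng=random):
    """Pick the next cell; `output_load` maps a candidate to its queue occupancy."""
    hops = candidate_next_hops(current, dest)
    if not hops:
        return None                      # already at the end point
    if len(hops) == 1:
        return hops[0]                   # straight line: only one sensible choice
    loads = [output_load.get(hop, 0) for hop in hops]
    if loads[0] != loads[1]:
        return hops[loads.index(min(loads))]  # prefer the less busy neighbor
    return rng.choice(hops)              # equal load: choose at random


def route(source, dest, output_load, rng=random):
    """Follow the rule hop by hop and return the visited cells."""
    path = [source]
    while path[-1] != dest:
        path.append(choose_next_hop(path[-1], dest, output_load, rng))
    return path
```

Because every hop moves one step closer to the end point, any path produced this way has the same number of hops regardless of which candidate wins a tie.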
In actual implementation, when thousands of MPUs in the cellular array are computing together, how the output data of each cell is sent to the master CPU becomes a problem. In general, each MPU can store its output data at an agreed address in the non-volatile random access memory of its cell, and the master CPU reads it by polling the MPUs one by one. However, this does not suit every problem: in some problems only a few cells in the cellular array need to output data to the master CPU, and polling every MPU one by one is then too inefficient.
Therefore, the cellular array computing system provided in this embodiment further comprises at least one full-time output cell provided in the cellular array. The full-time output cell acts as an end-point cell that receives and stores the output data of other cells destined for the master CPU, and notifies the master CPU by an interrupt signal to read the output data.
In a specific implementation, a FIFO queue may also be set up in the non-volatile random access memory of the full-time output cell; all output data from other cells to the master CPU is stored in this FIFO queue, which should have enough storage space to hold all of that output data.
In actual implementation, one or several cells in the cellular array can be selected as full-time output cells; in general, cells whose position makes communication with the master CPU more convenient can be chosen. An interrupt line is provided between the full-time output cell and the master CPU, and the full-time output cell can send interrupt signals to the master CPU, for example when output data from another cell has newly arrived, when the FIFO queue set up in the MRAM is full, or when the FIFO queue set up in the MRAM is empty.
Based on the above cellular array computing system provided with a full-time output cell, this embodiment also provides a communication method for the cellular array computing system, comprising: after receiving and storing the output data of other cells destined for the master CPU, the full-time output cell sends the master CPU an interrupt signal notifying it to read; after receiving that interrupt signal, the master CPU reads the output data from the full-time output cell.
In a specific implementation, the other cells can send the output data to the full-time output cell as follows: any one of the other cells, acting as a starting-point cell, sends the output data in a selected sending direction to an adjacent cell; when any cell in the cellular array receives output data sent by an adjacent cell, if it determines that the end-point cell ID carried in the output data matches the ID of this cell, then, since the end-point cell ID carried in the output data is the ID of the full-time output cell, this cell is the full-time output cell and stores the output data in the non-volatile random access memory of this cell; otherwise this cell acts as a transfer cell and, after selecting a sending direction, relays the output data to a cell adjacent to this cell.
While the other cells send the output data to the full-time output cell, the starting-point cell or a transfer cell can select the sending direction as follows: if a straight-line communication path can be formed between the starting-point cell or transfer cell and the full-time output cell, the sending direction is the direction along that straight line from the starting-point cell or transfer cell to the full-time output cell; otherwise the sending direction is the direction from the starting-point cell or transfer cell to a candidate adjacent cell, a candidate adjacent cell being a cell that, among the cells adjacent to the starting-point cell or transfer cell, is closer to the full-time output cell.
For the implementation of the full-time output cell in this embodiment, refer also to Fig. 9. Fig. 9 shows the master CPU, the cellular array, and the cellular array bus; each small square in the cellular array represents one cell, and the cell at point J (the cell drawn with a bold border) is the full-time output cell. Fig. 9 also shows the structure of the full-time output cell, as indicated by the dashed arrow: the MRAM in the full-time output cell holds a FIFO queue that stores all output data from other cells to the master CPU.
Suppose the cells at point H and point I need to provide output data to the master CPU. The output data can then be sent to the cell at point J by intercellular communication; for the intercellular communication paths from point H to point J and from point I to point J, refer to Fig. 9. Since intercellular communication has been described in detail above, it is not repeated here.
After the cell at point J receives the output data issued by the cell at point H or point I, it can send the master CPU an interrupt signal notifying it to read; after receiving that interrupt signal, the master CPU can read the output data from the cell at point J through the cellular array bus.
By providing a full-time output cell in the cellular array, using it as the end-point cell to receive and store the output data of other cells destined for the master CPU, and notifying the master CPU by an interrupt signal to read the output data, the efficiency with which the master CPU reads output data can be improved when only a few cells need to output data to the master CPU.
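The behavior of a full-time output cell can be modeled roughly as below. This is a software stand-in under stated assumptions: the class name, the interrupt names, and the callback used in place of the interrupt line are all hypothetical.

```python
from collections import deque


class OutputCell:
    """Toy model of a full-time output cell with a bounded FIFO in its memory."""

    def __init__(self, capacity: int, raise_interrupt):
        self.fifo = deque()
        self.capacity = capacity
        # Callback standing in for the interrupt line to the master CPU.
        self.raise_interrupt = raise_interrupt

    def receive(self, source_id, data) -> bool:
        """Store output data from another cell and notify the master CPU."""
        if len(self.fifo) >= self.capacity:
            self.raise_interrupt("fifo_full")
            return False
        self.fifo.append((source_id, data))
        self.raise_interrupt("new_data")
        return True

    def read_all(self) -> list:
        """Master CPU side: drain the FIFO in arrival order."""
        items = list(self.fifo)
        self.fifo.clear()
        self.raise_interrupt("fifo_empty")
        return items
```

The master CPU reacts only when an interrupt arrives, so it reads one cell instead of polling every MPU in the array.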
Introduce the example of an above-mentioned cellular array computing system of application again below.
Speech recognition can be compared with the voice signal of known sound bank and input, this comparison can be when Domain compares and can also compare in frequency domain.When needing the words that compares more and more, such as, it is contemplated that different accents can arrive It is tens of thousands of, seem insufficient if only relying on the computing capability of a few CPU for Real-time speech recognition.
The cellular array computing system provided by the embodiments of the present invention is well suited to problems of this kind.
To this end, this embodiment also provides a method for performing data comparison using the above cellular array computing system, including: after the master CPU selects all cells in the cellular array or the cells in a target area, it broadcasts a comparison program to the non-volatile random access memory of each selected cell; the master CPU writes the sample that each selected cell is responsible for comparing to an agreed address in that cell; the master CPU broadcasts an instruction to the microprocessors of the selected cells, so that each microprocessor completes initialization and then waits for the data to be compared; the master CPU broadcasts the data to be compared to the microprocessors of the selected cells; the microprocessor of each selected cell runs the comparison program to compare the received data with the sample that the cell is responsible for and, if the two are found to match, sends the comparison result as output data to the full-time output cell, using the communication method of the above cellular array computing system, for the master CPU to read.
In a specific implementation, the data to be compared may be voice data to be recognized, image data to be recognized, or any other data that needs to be compared.
In actual implementation, each MPU continuously receives voice data and compares it. Typically, among several hundred to several thousand cells, only one or a few find that the data to be compared matches the sample they are responsible for. These cells send their comparison results to the full-time output cell, which notifies the master CPU with an interrupt signal to receive them.
If the data to be compared is voice data, the comparison can be performed in the time domain or in the frequency domain. In the latter case, the master CPU can first segment the data and apply a fast Fourier transform (FFT), and then broadcast the frequency-domain voice data to the MPUs of the selected cells.
By performing data comparison with the above cellular array computing system equipped with a full-time output cell, a large number of cells in the cellular array can run the comparison program simultaneously. The system therefore has very strong parallel processing capability, addresses the communication bottleneck between CPU and memory in the prior art, and greatly improves real-time voice and image recognition capability.
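The data flow of the comparison method can be illustrated with a small simulation. It is a sketch under simplifying assumptions: the function name `run_comparison`, the dictionary of per-cell samples, and the equality test used as the comparison program are hypothetical, and the sequential loop merely stands in for cells that would run in parallel.

```python
def run_comparison(samples_by_cell, data, matches=lambda sample, d: sample == d):
    """Simulate one broadcast of `data` to the selected cells.

    samples_by_cell maps a cell ID to the sample that cell is responsible for
    (the value the master CPU wrote to the agreed address). Every cell runs the
    same comparison program `matches`; only matching cells produce output,
    which is collected in the order it would enter the output cell's FIFO.
    """
    output_fifo = []
    for cell_id, sample in samples_by_cell.items():  # conceptually parallel
        if matches(sample, data):
            output_fifo.append((cell_id, sample))
    return output_fifo


samples = {(0, 0): "yes", (0, 1): "no", (1, 0): "maybe"}
hits = run_comparison(samples, "no")
```

Because most cells find no match, the output FIFO stays short even when thousands of cells take part, which is why a single full-time output cell is enough here.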
As mentioned above, there is a simple way to send bulk information from one cell to a target area in the cellular array: the master CPU reads the information and then broadcasts it. This embodiment also provides another implementation: the intercellular point-to-point communication function is extended to region-wide mass-sending, which supports a higher degree of parallelism and a much higher total bandwidth.
In the cellular array computing system provided by this embodiment, any cell in the cellular array can also serve as the starting point cell and carry out mass-sending communication to all cells in a target area. A cell that participates in the mass-sending communication and lies inside the target area serves as the starting point cell, as an end point cell, or as both a transfer cell and an end point cell; a cell that participates in the mass-sending communication and lies outside the target area serves as the starting point cell or as a transfer cell.
In a specific implementation, the network controller connected to the microprocessor in each cell controls the sending and receiving of data that is sent, relayed, or finally received, not only in cell-to-cell communication between any two cells but also in mass-sending communication. The network controller is also used to send interrupt signals to the microprocessor.
In actual implementation, the original sender of an intercellular mass-sending communication (the cell in the cellular array that serves as the starting point cell) is responsible for indicating the target area, and the mass-sending of the data is still completed through a series of transfers. Those skilled in the art will appreciate that intercellular mass-sending communication can also be regarded as an effective superposition of multiple intercellular point-to-point communications, so its specific implementation can refer to the implementation of communication between any two cells. For example, the cells in the cellular array may likewise include one or more groups of FIFO queues connected to the network controller, as mentioned above; this is not repeated here.
On the basis that the above cellular array computing system supports intercellular mass-sending communication, the embodiments of the present invention also provide an intercellular mass-sending communication method for the above cellular array computing system, including: when any cell in the cellular array, acting as the starting point cell, initiates mass-sending communication to all cells in a target area, if the starting point cell is located in the target area, it sends the intercellular mass-sending data to all of its adjacent cells in the target area and updates the target area for each of those adjacent cells; otherwise, it sends the intercellular mass-sending data to an adjacent cell in the direction approaching the target area; if a cell located outside the target area receives intercellular mass-sending data sent by an adjacent cell, then, after determining that the target area indicated in the data does not include this cell, the cell acts as a transfer cell and relays the data to an adjacent cell in the direction approaching the target area; if a cell located in the target area receives intercellular mass-sending data sent by an adjacent cell, then, after determining that the target area indicated in the data includes this cell, the cell acts as an end point cell and stores the received data in its non-volatile random access memory, or notifies its microprocessor to process the data; if the target area still contains cells adjacent to this cell, the cell also acts as a transfer cell, relays the received data to all adjacent cells located in the target area, and updates the target area for each of those adjacent cells. The updated target area comprises one or more target areas obtained by dividing the target area before the update; each adjacent cell of the cell that has sent or relayed the intercellular mass-sending data in the target area before the update is contained in its own updated target area, and the cell that has sent or relayed the data is excluded from the updated target areas.
It should be noted that, because the master CPU can also broadcast the data of a cell to a target area in the cellular array, the mass-sending data involved in intercellular mass-sending communication is called "intercellular mass-sending data" in this embodiment to distinguish it from the "broadcast data of the master CPU". The cell that initiates the intercellular mass-sending communication specifies the target area; the IDs of all cells in the target area, or the range of those cell IDs, are indicated in the intercellular mass-sending data. Any cell that receives the intercellular mass-sending data can judge, from the target area indicated in the data, whether the data is to be finally received by this cell, needs to be relayed further to other adjacent cells, or both.
In addition, updating the target area for each adjacent cell specifically means dividing the target area before the update into one or more target areas (with the cell that has sent or relayed the intercellular mass-sending data excluded from the updated target areas). Each of these target areas contains one of the adjacent cells (that is, a cell adjacent to the cell that sent or relayed the data in the target area before the update), and each adjacent cell continues the intercellular mass-sending communication within its own updated target area. Accordingly, the target area indicated in the intercellular mass-sending data is updated as well.
In this embodiment, the description takes as an example the communication mode between adjacent cells shown in Fig. 3 and a rectangular target area determined by the starting point cell that initiates the mass-sending communication. It should be noted that the intercellular mass-sending communication mode given in this embodiment is a convenient and efficient one for actual implementation; those skilled in the art will understand that, in other embodiments, the intercellular mass-sending communication method in the above cellular array computing system can equally be applied to other communication modes between adjacent cells or to target areas of other shapes.
In a specific implementation, the manner of sending or relaying differs according to the position of the cell that serves as the starting point cell or transfer cell.
When a first cell serving as the starting point cell or a transfer cell is located at a corner of the rectangular target area: if one of the two adjacent sides of the rectangular target area that contain the first cell has only 1 cell, the updated target area is the rectangular area formed by the other of the two adjacent sides after the first cell is excluded; otherwise the updated target area includes two rectangular target areas, one of which is the rectangular area formed by either of the two adjacent sides after the first cell is excluded. It should be noted that "first cell" in this embodiment is a general term for the class of cells located at a corner of the rectangular target area.
Referring to Fig. 10, assume that the cell at point K is the starting point cell that initiates intercellular mass-sending communication, or a transfer cell responsible for relaying intercellular mass-sending data, and that rectangular target area 101 is the target area determined before the cell at point K sends or relays the data. The cell at point K is inside rectangular target area 101 and at one of its corners. Because the horizontal side of rectangular target area 101 contains only 1 cell, the cell at point K has only one neighbour available as the next relay hop. The network controller of the cell therefore sends the intercellular mass-sending data to the cell at point L and updates rectangular target area 101; the target area formed after the update is rectangular target area 102, which is equivalent to excluding the cell at point K from rectangular target area 101. As the target area is updated repeatedly, relaying stops once only the last cell remains in the target area.
Assume that the cell at point M is likewise the starting point cell that initiates intercellular mass-sending communication, or a transfer cell responsible for relaying intercellular mass-sending data, and that rectangular target area 103 is the target area determined before the cell at point M sends or relays the data. The cell at point M is inside rectangular target area 103 and at one of its corners. Because the two adjacent sides of rectangular target area 103 each contain more than 1 cell, the cell at point M has two neighbours available as the next relay hop. The network controller of the cell therefore sends the intercellular mass-sending data to the cell at point N and the cell at point O and updates rectangular target area 103. The updated target area includes two rectangular target areas, rectangular target area 104 and rectangular target area 105, which is equivalent to excluding the cell at point M from rectangular target area 103. Rectangular target areas 104 and 105 are then treated as independent target areas in which data relaying continues in the manner described above. As the target area is updated repeatedly, relaying stops once only the last cell remains in the target area.
When a second cell serving as the starting point cell or a transfer cell is located on a side of the rectangular target area: if the sides of the rectangular target area adjacent to the side on which the second cell lies contain only 1 cell, the updated target area includes the two rectangular target areas formed by the side on which the second cell lies after the second cell is excluded; otherwise the updated target area includes three rectangular target areas, two of which are the two rectangular areas formed by the side on which the second cell lies after the second cell is excluded. It should be noted that "second cell" in this embodiment is a general term for the class of cells located on a side of the rectangular target area.
Referring to Fig. 11, assume that the cell at point P is the starting point cell that initiates intercellular mass-sending communication, or a transfer cell responsible for relaying intercellular mass-sending data, and that rectangular target area 111 is the target area determined before the cell at point P sends or relays the data. The cell at point P is inside rectangular target area 111 and on one of its sides. Because the sides of rectangular target area 111 adjacent to the side on which the cell at point P lies contain more than 1 cell, the cell at point P has three neighbours available as the next relay hop. The network controller of the cell sends the intercellular mass-sending data to the cell at point Q, the cell at point R, and the cell at point S, and updates rectangular target area 111. The updated target area includes three rectangular target areas, namely rectangular target areas 112, 113, and 114, which is equivalent to excluding the cell at point P from rectangular target area 111. Rectangular target areas 112 and 113 are the two rectangular areas formed by the side on which the cell at point P lies after that cell is excluded. Rectangular target areas 112, 113, and 114 are then treated as independent target areas in which data relaying continues in the manner described above. As the target area is updated repeatedly, relaying stops once only the last cell remains in the target area.
It can be understood that, if the sides of the target area adjacent to the side on which the cell at point P lies contain only 1 cell (a case not shown in Fig. 11), the cell at point P has two neighbours available as the next relay hop. The network controller of the cell sends the intercellular mass-sending data to the cell at point Q and the cell at point R and updates the target area; the updated target area includes two rectangular target areas, namely rectangular target area 112 and rectangular target area 113.
When a third cell serving as the starting point cell is located in the interior of the rectangular target area, the updated target area includes four rectangular target areas: two are the rectangular areas formed by the row or column containing the third cell after the third cell is excluded, and the other two are the rectangular areas formed when the rectangular target area before the update is divided by the row or column containing the third cell. It should be noted that "third cell" in this embodiment is a general term for the class of cells located in the interior of the rectangular target area, where the interior means the region other than the corners and the sides.
Referring to Fig. 12, assume that the cell at point T is the starting point cell that initiates intercellular mass-sending communication (in this embodiment the cell at point T cannot be a transfer cell responsible for relaying intercellular mass-sending data), and that rectangular target area 121 is the target area determined before the cell at point T sends the data. The cell at point T is in the interior of rectangular target area 121 and has four neighbours available as the next relay hop. The network controller of the cell sends the intercellular mass-sending data to the cells at points U, V, W, and X, and updates rectangular target area 121. The updated target area includes four rectangular target areas, namely rectangular target areas 122, 123, 124, and 125, which is equivalent to excluding the cell at point T from rectangular target area 121. Rectangular target areas 122 and 123 are the two rectangular areas formed by the row containing the cell at point T after that cell is excluded, and rectangular target areas 124 and 125 are the two rectangular areas formed when rectangular target area 121 is divided by that row. Rectangular target areas 122, 123, 124, and 125 are then treated as independent target areas in which data relaying continues in the manner described above. As the target area is updated repeatedly, relaying stops once only the last cell remains in the target area.
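The corner, side, and interior cases above all follow one pattern: remove the sending cell, cut the line of cells it sits on into segments, and hand the rest of the rectangle to the remaining neighbours. The sketch below is one possible reading of that rule, not the patented implementation. The names `split_region` and `mass_send`, the `(x1, y1, x2, y2)` inclusive rectangle encoding, and the choice to split along the row for corner and interior cells are assumptions made for illustration.

```python
from collections import Counter


def split_region(rect, cell):
    """Return [(neighbour, sub_rect), ...] for a cell inside rect.

    rect is (x1, y1, x2, y2) with inclusive bounds; cell is (x, y).
    Each sub_rect is rectangular, contains its neighbour, and excludes `cell`.
    """
    x1, y1, x2, y2 = rect
    cx, cy = cell
    on_vertical_side = cx in (x1, x2)
    on_horizontal_side = cy in (y1, y2)
    if on_vertical_side and not on_horizontal_side:
        # Cell sits on a left/right side: split along its column instead,
        # by transposing, splitting along the row, and transposing back.
        parts = split_region((y1, x1, y2, x2), (cy, cx))
        return [((ny, nx), (b, a, d, c)) for (nx, ny), (a, b, c, d) in parts]
    parts = []
    if cx > x1:   # segment of the cell's row to its left
        parts.append(((cx - 1, cy), (x1, cy, cx - 1, cy)))
    if cx < x2:   # segment of the cell's row to its right
        parts.append(((cx + 1, cy), (cx + 1, cy, x2, cy)))
    if cy > y1:   # everything on the lower-y side of the row
        parts.append(((cx, cy - 1), (x1, y1, x2, cy - 1)))
    if cy < y2:   # everything on the higher-y side of the row
        parts.append(((cx, cy + 1), (x1, cy + 1, x2, y2)))
    return parts


def mass_send(rect, start):
    """Count how many times each cell in rect receives the data."""
    deliveries = Counter()
    pending = [(start, rect)]
    while pending:
        cell, area = pending.pop()
        deliveries[cell] += 1              # this cell is an end point cell
        pending.extend(split_region(area, cell))  # and relays if cells remain
    return deliveries


counts = mass_send((0, 0, 4, 3), (2, 1))   # interior start, 5 x 4 target area
```

With this partition every cell in the target area receives the data exactly once, and a relaying cell always enters its sub-area at a corner or on a side, which is consistent with the remark that an interior cell can only be a starting point cell.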
In this embodiment, when a fourth cell serving as the starting point cell or a transfer cell is located outside the target area: if a straight-line communication path can be formed between the fourth cell and some cell in the target area, the fourth cell sends or relays the intercellular mass-sending data along that straight line, in the direction from the fourth cell toward the target area; otherwise the sending direction is from the fourth cell toward a candidate adjacent cell, where a candidate adjacent cell is one of the cells adjacent to the fourth cell that is closer to the target area. It should be noted that "fourth cell" in this embodiment is a general term for the class of cells located outside the rectangular target area.
Referring to Fig. 13, assume that the cell at point Y1 is the starting point cell that initiates intercellular mass-sending communication and that rectangular target area 131 is the target area determined before the cell at point Y1 sends the intercellular mass-sending data. The cell at point Y1 is outside rectangular target area 131. Because this cell lies between the extension lines of two opposite sides of the rectangular target area, a straight-line communication path can be formed between it and the cell at point Y3 in the rectangular target area, and only one neighbour is available as the next relay hop. The network controller of the cell at point Y1 sends the intercellular mass-sending data to this neighbour, the cell at point Y2, which acts as a transfer cell responsible for relaying the data. The cell at point Y2 relays the data in the direction shown by the dashed arrow in Fig. 13 until the data reaches the cell at point Y3. The cell at point Y3 lies on a side of rectangular target area 131 and can complete the relaying within rectangular target area 131 in the manner described above.
Still referring to Fig. 13, assume that the cell at point Z1 is the starting point cell that initiates intercellular mass-sending communication and that rectangular target area 131 is the target area determined before the cell at point Z1 sends the intercellular mass-sending data. The cell at point Z1 is outside rectangular target area 131. Because this cell does not lie between the extension lines of two opposite sides of the rectangular target area, no straight-line communication path can be formed between it and any cell in the rectangular target area, and two neighbours are available as the next relay hop: the cell at point Z2 and the cell at point Z3. These two cells are the candidate adjacent cells of the cell at point Z1, because among the cells adjacent to the cell at point Z1 they are the ones closer to rectangular target area 131. In actual implementation, either one can be selected arbitrarily, or the less loaded cell can be selected as the next relay hop according to the actual communication situation, where the less loaded cell is the one with fewer pending communication tasks for outgoing data. From the cell at point Z1, the intercellular mass-sending data travels along either of two feasible relay paths until it is relayed to the cell at point Z4. The cell at point Z4 lies at a corner of rectangular target area 131 and can complete the relaying within rectangular target area 131 in the manner described above.
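The two situations in Fig. 13 (one straight-line hop for Y1, a choice of two hops for Z1) can be captured by comparing the cell's coordinates with the bounds of the rectangular target area. The following is a minimal sketch under assumptions: the names `candidate_hops` and `route_to_area`, the inclusive `(x1, y1, x2, y2)` rectangle encoding, and the `load` dictionary used to model how busy a neighbour is are all hypothetical.

```python
def _step_toward(c, lo, hi):
    """Return +1 or -1 to move coordinate c into [lo, hi], or 0 if already in."""
    if c < lo:
        return 1
    if c > hi:
        return -1
    return 0


def candidate_hops(cell, rect):
    """Adjacent cells that are closer to rect; empty if cell is inside rect."""
    x1, y1, x2, y2 = rect
    cx, cy = cell
    dx = _step_toward(cx, x1, x2)
    dy = _step_toward(cy, y1, y2)
    hops = []
    if dx:
        hops.append((cx + dx, cy))
    if dy:
        hops.append((cx, cy + dy))
    return hops


def route_to_area(cell, rect, load=None):
    """Relay hop by hop until the data reaches a cell of rect.

    load maps a cell to its number of pending outgoing tasks (default 0);
    the less loaded candidate is preferred, ties go to the first candidate.
    """
    load = load or {}
    path = [cell]
    while True:
        hops = candidate_hops(path[-1], rect)
        if not hops:
            return path
        path.append(min(hops, key=lambda h: load.get(h, 0)))


area = (3, 3, 6, 5)
straight = route_to_area((4, 0), area)   # like Y1: a single straight path
diagonal = route_to_area((0, 0), area)   # like Z1: two candidates per hop
```

A cell between the extension lines of two opposite sides gets exactly one candidate, so its path is straight and ends on a side; any other outside cell gets two candidates and its path ends at the nearest corner, whichever candidates are chosen along the way.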
The intercellular mass-sending communication method in the cellular array computing system provided by this embodiment extends the intercellular point-to-point communication function to region-wide mass-sending. It can support a higher degree of parallelism and obtain a much higher total bandwidth, thereby further improving the overall performance of the computing system.
Embodiment two
On the basis of the cellular array computing system provided by Embodiment One of the present invention, the cellular array computing system provided by this embodiment extends the two-dimensional cellular array to a three-dimensional cellular array.
As shown in Fig. 14, the cellular array computing system provided by this embodiment includes a master CPU, a cellular array, and a cellular array bus. Unlike the cellular array in Embodiment One, the cellular array in this embodiment is a three-dimensional cellular array (3D cellular array) formed by stacking more than one two-dimensional array. Each two-dimensional array is the two-dimensional cellular array (2D cellular array) of Embodiment One and likewise consists of more than one cell with both computing and storage functions; each cell includes a microprocessor and a non-volatile random access memory. For details, refer to the related content of Embodiment One.
In this embodiment, each cell stores its own position in the cellular array as an ID that can be read by software or hardware in the cell. The cell ID in this embodiment can be represented by coordinates in a spatial rectangular coordinate system, that is, the cell ID/address is three-dimensional (x, y, z): if (x, y, z) denotes the position of a cell in the cellular array, (x, y, z) can be stored in the cell as its ID, and the software and hardware in the cell can read this ID and use it in specific operations.
In this embodiment, the master CPU communicates with each cell in the cellular array through the cellular array bus. The connection between the master CPU and each cell in any one of the two-dimensional arrays can also be seen in Fig. 2.
In this embodiment, adjacent cells in the cellular array have communication interfaces between them and can send data to each other. Unlike in the cellular array of Embodiment One, the concept of "adjacent cell" in the cellular array of this embodiment is not limited to the two-dimensional plane but extends to three-dimensional space. If the communication mode between adjacent cells shown in Fig. 3 is used within each two-dimensional array, then, in a spatial rectangular coordinate system, a cell may have adjacent cells in six directions: the positive and negative directions of the x-axis, the y-axis, and the z-axis.
As is known to those skilled in the art, when a signal is passed from one device to another over a length of wire, the wire has a small capacitance that is continually charged and discharged during transmission, and much of the energy is consumed in this charging and discharging. Although the capacitance of a short wire is small, the power consumed is proportional to the transmission frequency. For a signal transmitted at high speed, the power consumed by a short wire that crosses between different chips is very large compared with transmission inside a chip. For this reason, 3D chip technology based on through-silicon vias (TSV, Through Silicon Vias) is receiving increasing attention.
In actual implementation, multiple 2D cellular array chips can be stacked to form a 3D chip, and vertical connections are established between adjacent cells through TSVs; that is, adjacent cells located in two neighbouring two-dimensional arrays establish a communication connection through TSVs.
The 3D cellular array chip keeps the low-power advantage while increasing the scale of the cellular array and expanding the bandwidth of internal communication.
Those skilled in the art will appreciate that the communication methods of the cellular array computing system described in detail in Embodiment One can equally be extended to the cellular array computing system provided by this embodiment. These include communication between the master CPU and each cell in the cellular array through the cellular array bus, communication between any two cells in the cellular array without relying on the master CPU, mass-sending communication from any cell to all cells in a target area, and the use of a full-time output cell arranged in the cellular array as the end point cell that receives and stores the output data of other cells for the master CPU to read.
It should be noted that in 3D cellular array:
The broadcast mechanism of the master CPU, in which the master CPU broadcasts instructions or data over the cellular array bus to all cells in a target area of the cellular array, is extended to three-dimensional space. In a specific implementation, the target area can be a cuboid from (x1, y1, z1) to (x2, y2, z2), where x2 >= x1, y2 >= y1, and z2 >= z1.
The protocol for intercellular data transmission, that is, communication between any two cells in the cellular array without relying on the master CPU, is also extended to three-dimensional space. In a specific implementation, when data is sent or relayed from a cell (x0, y0, z0) toward another cell (x1, y1, z1), the following one to three neighbours can be chosen as the next hop, and usually the least congested of them is chosen to send or relay the data:
the neighbour whose x coordinate is closer to x1, if x0 ≠ x1;
the neighbour whose y coordinate is closer to y1, if y0 ≠ y1;
the neighbour whose z coordinate is closer to z1, if z0 ≠ z1.
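The three conditions above translate directly into a small helper that lists the eligible next hops. This is a sketch only: the function name `candidate_neighbours` and the tuple encoding of cell IDs are assumptions, and congestion-based selection among the candidates is left out.

```python
def candidate_neighbours(src, dst):
    """Return the 1 to 3 neighbours of src that are one step closer to dst.

    src and dst are (x, y, z) cell IDs. One candidate is produced per axis on
    which the two IDs differ; the list is empty when src equals dst.
    """
    hops = []
    for axis in range(3):
        if src[axis] != dst[axis]:
            step = 1 if dst[axis] > src[axis] else -1
            nxt = list(src)
            nxt[axis] += step
            hops.append(tuple(nxt))
    return hops


hops = candidate_neighbours((1, 1, 1), (3, 1, 0))
```

In this example the y coordinates already agree, so only the x and z neighbours are offered; a real network controller would then pick the less congested of the two.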
Intercellular mass-sending, in which any cell carries out mass-sending communication to all cells in a target area, is likewise extended to three-dimensional space. In a specific implementation, the target area can likewise be a cuboid from (x1, y1, z1) to (x2, y2, z2), where x2 >= x1, y2 >= y1, and z2 >= z1.
In actual implementation, if the target area determined by the starting point cell that initiates the mass-sending communication is a cuboid, then:
a starting point cell inside the cuboid target area, or a transfer cell that has received the intercellular mass-sending data from an adjacent cell in an adjacent two-dimensional array or from an adjacent cell outside the cuboid target area, takes the part of the cuboid target area contained in its own two-dimensional array as an independent sub-target area, mass-sends the intercellular mass-sending data within that sub-target area, and also sends the intercellular mass-sending data to the adjacent cells in the adjacent two-dimensional arrays;
a starting point cell or transfer cell outside the cuboid target area sends the intercellular mass-sending data toward the cell of the cuboid target area nearest to it; that nearest cell lies on a vertex, an edge, or a face of the cuboid target area.
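Two pieces of the cuboid case lend themselves to a short illustration: finding the cell of the target area nearest to an outside cell, and extracting the part of the target area that lies in one two-dimensional array. The sketch below assumes a cuboid encoded as a pair of corner IDs `((x1, y1, z1), (x2, y2, z2))` with inclusive bounds and layers indexed by z; the names `nearest_cell_in_cuboid` and `layer_sub_area` are hypothetical.

```python
def nearest_cell_in_cuboid(cell, cuboid):
    """Clamp each coordinate of cell into the cuboid's range.

    For a cell outside the cuboid the result lies on a vertex, an edge, or a
    face; for a cell inside the cuboid the result is the cell itself.
    """
    lo, hi = cuboid
    return tuple(min(max(c, l), h) for c, l, h in zip(cell, lo, hi))


def layer_sub_area(cuboid, z):
    """Return the rectangle (x1, y1, x2, y2) the cuboid occupies in layer z,
    or None if the cuboid does not reach that layer."""
    (x1, y1, z1), (x2, y2, z2) = cuboid
    if not z1 <= z <= z2:
        return None
    return (x1, y1, x2, y2)


target = ((2, 2, 1), (5, 4, 3))
entry = nearest_cell_in_cuboid((0, 0, 0), target)   # a vertex of the cuboid
sub_area = layer_sub_area(target, 2)                # rectangle in layer z = 2
```

Once the data reaches a cell of the cuboid, that cell's layer rectangle can be handled with the two-dimensional rectangular rules of Embodiment One, while copies are passed to the adjacent layers through the vertical links.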
In this embodiment, the specific implementation of sending the intercellular mass-sending data to adjacent cells in an adjacent two-dimensional array can refer to the relevant approach in Embodiment One of the present invention.
For the specific implementation of the cellular array computing system provided by this embodiment and of the communication methods in it, refer to the related content on the cellular array computing system and its communication methods in Embodiment One; details are not repeated here.
Embodiment three
As mentioned above, the cells in the cellular array of Embodiment One combine the three functions of memory, storage, and computation. The non-volatile random access memory in a cell both provides random access to the data involved when the microprocessor computes and stores the instruction code of software and the data that must be kept persistently. However, non-volatile random access memory is usually relatively expensive, so the space in a cell's non-volatile random access memory that can be used as memory is limited. When the microprocessor in a cell has a large amount of data to process, the limited memory space affects its processing efficiency. How to extend the memory space of a cell therefore becomes a problem to be solved.
Based on the above consideration, and on the basis of Embodiment One, this embodiment gives another structure of the cellular array computing system. As shown in Fig. 15, in addition to the master CPU, cellular array, and cellular array bus described above, the cellular array computing system further includes at least one memory cell array, which is a two-dimensional array composed of more than one memory unit. The cellular array and all memory cell arrays are stacked to form a three-dimensional structure, and the memory units in each memory cell array are connected one-to-one with the cells in the cellular array. Each memory unit works together with the non-volatile random access memory, and the two jointly provide random access to the data involved when the microprocessor computes.
In practice, the non-volatile random access memory in a cell can be MRAM, while a memory cell array can be an MRAM, DRAM, or SRAM silicon wafer. Typically, one or more lower-cost DRAM silicon wafers are chosen. Each DRAM silicon wafer is a memory cell array whose memory units are positioned to match the cells in the cellular array. All of the DRAM silicon wafers are then 3D-stacked with the cellular array silicon wafer, and a communication connection can be established through TSVs (through-silicon vias) between each memory unit and its corresponding cell in the cellular array, thereby extending the memory of each cell.
In this embodiment of the present invention, at least one memory cell array, each composed of more than one memory unit, is stacked with the cellular array to form a three-dimensional structure, and the memory units in each memory cell array are connected one-to-one with the cells in the cellular array. The memory units provide random access to the data involved when the microprocessor computes. In this way, the memory space of each cell in the cellular array can be extended at relatively low cost, improving the processing efficiency of the microprocessor in each cell.
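The memory extension described in this embodiment can be illustrated with a minimal Python sketch. It models one cell whose on-cell non-volatile random access memory is extended by the memory units stacked on it (one per memory cell array). The class name `Cell`, the word sizes, and in particular the flat address layout (non-volatile memory first, then each memory unit in stacking order) are assumptions made for illustration; the text does not specify how addresses are assigned.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Cell:
    """One cell's memory: on-cell NVRAM plus the memory units stacked on it."""
    nvram_words: int          # size of the cell's NVRAM (e.g. MRAM)
    unit_words: List[int]     # size of each stacked memory unit (e.g. one per DRAM wafer)
    nvram: List[int] = field(init=False)
    units: List[List[int]] = field(init=False)

    def __post_init__(self) -> None:
        self.nvram = [0] * self.nvram_words
        self.units = [[0] * n for n in self.unit_words]

    def _locate(self, addr: int) -> Tuple[List[int], int]:
        """Map a flat, cell-local address to (backing store, offset).

        Assumed layout: NVRAM first, then each memory unit in stacking order.
        """
        if addr < 0:
            raise IndexError("negative address")
        if addr < self.nvram_words:
            return self.nvram, addr
        addr -= self.nvram_words
        for unit in self.units:
            if addr < len(unit):
                return unit, addr
            addr -= len(unit)
        raise IndexError("address outside the cell's memory space")

    def read(self, addr: int) -> int:
        store, offset = self._locate(addr)
        return store[offset]

    def write(self, addr: int, value: int) -> None:
        store, offset = self._locate(addr)
        store[offset] = value


cell = Cell(nvram_words=4, unit_words=[8, 8])  # 4 NVRAM words + two 8-word units
cell.write(3, 11)    # last NVRAM word
cell.write(4, 22)    # first word of the first memory unit
cell.write(19, 33)   # last word of the second memory unit
```

In this sketch the microprocessor sees one contiguous space of 20 words, of which only the first 4 are non-volatile; adding another memory cell array to the stack would simply append another entry to `unit_words`.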
It should be pointed out that Figure 15 shows only the case in which one memory cell array is stacked with the cellular array to form a three-dimensional structure; those skilled in the art will equally understand the case in which more than one memory cell array is stacked with the cellular array to form a three-dimensional structure.
Those skilled in the art will also understand that the communication methods described above apply equally to a cellular array computing system that includes the memory cell array. These methods include: the master CPU communicating with each cell in the cellular array through the cellular array bus; any two cells communicating with each other without relying on the master CPU; any cell carrying out group-send communication to all cells in a target region; and a dedicated output cell, provided in the cellular array, acting as an end-point cell that receives and stores the data other cells output to the master CPU, for the master CPU to read.
It should be noted that, because the memory space of each cell in the cellular array has been extended, the master CPU can access not only the non-volatile random access memory of a cell but also the memory unit corresponding to that cell (when more than one memory cell array is stacked with the cellular array to form the three-dimensional structure, more than one memory unit corresponds to each cell). Accordingly, the communication that the master CPU carries out with each cell in the cellular array through the cellular array bus includes at least one of the following: reading or writing, by address, the non-volatile random access memory or the corresponding memory unit of any cell in the cellular array; broadcasting data to the non-volatile random access memory or the corresponding memory unit of every cell in a target region, and writing it to the same relative address in the non-volatile random access memory or corresponding memory unit of each of those cells; sending an instruction to, sending data to, or reading the state of the microprocessor of any cell in the cellular array; and broadcasting an instruction to the microprocessors of all cells in a target region.
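The four kinds of master-CPU communication listed above can be sketched as follows. This is only an illustrative model, not the bus protocol of the invention: the class and method names, the representation of a target region as an inclusive rectangle given by two corners (first corner assumed to be the lower one), and the flattening of each cell's non-volatile memory and memory units into a single list are all assumptions.

```python
from itertools import product


class Cell:
    def __init__(self, mem_words):
        self.memory = [0] * mem_words  # NVRAM and memory unit(s), flattened for brevity
        self.inbox = []                # instructions/data delivered to the microprocessor
        self.state = "idle"


class MasterCPU:
    """Master-CPU view of a 2D cell array reached over the cell array bus."""

    def __init__(self, width, height, mem_words):
        self.cells = {(x, y): Cell(mem_words)
                      for x, y in product(range(width), range(height))}

    @staticmethod
    def _region(corner_a, corner_b):
        """All cell IDs in the inclusive rectangle between two corners."""
        (x0, y0), (x1, y1) = corner_a, corner_b
        return product(range(x0, x1 + 1), range(y0, y1 + 1))

    # 1. Read or write any cell's memory by address.
    def read(self, cell_id, rel_addr):
        return self.cells[cell_id].memory[rel_addr]

    def write(self, cell_id, rel_addr, value):
        self.cells[cell_id].memory[rel_addr] = value

    # 2. Broadcast data to the same relative address in every cell of a region.
    def broadcast_write(self, corner_a, corner_b, rel_addr, value):
        for cell_id in self._region(corner_a, corner_b):
            self.cells[cell_id].memory[rel_addr] = value

    # 3. Send an instruction (or data) to one microprocessor, or read its state.
    def send(self, cell_id, message):
        self.cells[cell_id].inbox.append(message)

    def read_state(self, cell_id):
        return self.cells[cell_id].state

    # 4. Broadcast an instruction to every microprocessor in a region.
    def broadcast_instruction(self, corner_a, corner_b, instruction):
        for cell_id in self._region(corner_a, corner_b):
            self.cells[cell_id].inbox.append(instruction)


cpu = MasterCPU(width=4, height=4, mem_words=16)
cpu.write((0, 0), 2, 7)
cpu.broadcast_write((1, 1), (2, 2), rel_addr=5, value=99)
cpu.broadcast_instruction((1, 1), (2, 2), "start")
```

The point of the "same relative address" rule is visible in `broadcast_write`: one bus transaction carries a single address and value, and every cell in the region applies it to its own local memory.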
When a cell in the cellular array further includes a bus controller and a cell-internal bus, the cell-internal bus connects not only the microprocessor and the non-volatile random access memory but also the memory unit corresponding to the cell. The bus controller is connected to the cellular array bus, the microprocessor, and the cell-internal bus. The bus controller identifies communication between the master CPU and its own cell, and either connects to the microprocessor to pass on the instructions or data sent by the master CPU or to read its state, or connects through the cell-internal bus to the non-volatile random access memory or to the memory unit corresponding to the cell in order to read or write data.
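The role of the bus controller can be sketched as a small dispatcher. The message format (a dictionary with `cell_id`, `kind`, `target`, `addr`, and `payload` keys), the reply values, and all names below are hypothetical and chosen only to make the routing decision concrete; the text does not define a message format.

```python
class BusController:
    """Per-cell bus controller between the cell array bus and the cell's internals."""

    def __init__(self, cell_id, nvram_words, unit_words):
        self.cell_id = cell_id
        # Both stores hang off the cell-internal bus.
        self.stores = {
            "nvram": [0] * nvram_words,
            "memory_unit": [0] * unit_words,
        }
        self.cpu_inbox = []      # instructions/data passed to the microprocessor
        self.cpu_state = "idle"  # state the master CPU may read

    def handle(self, msg):
        """Handle one bus message; return a reply, or None if it is for another cell."""
        if msg["cell_id"] != self.cell_id:
            return None                        # not addressed to this cell: ignore
        kind = msg["kind"]
        if kind in ("instruction", "data"):    # forwarded to the microprocessor
            self.cpu_inbox.append((kind, msg["payload"]))
            return "ack"
        if kind == "read_state":
            return self.cpu_state
        if kind in ("mem_read", "mem_write"):  # routed over the cell-internal bus
            store = self.stores[msg["target"]]
            if kind == "mem_read":
                return store[msg["addr"]]
            store[msg["addr"]] = msg["payload"]
            return "ack"
        raise ValueError(f"unknown message kind: {kind!r}")


ctrl = BusController(cell_id=(0, 1), nvram_words=4, unit_words=8)
ctrl.handle({"cell_id": (0, 1), "kind": "mem_write",
             "target": "memory_unit", "addr": 6, "payload": 42})
ctrl.handle({"cell_id": (0, 1), "kind": "instruction", "payload": "run"})
```

Memory reads and writes never touch `cpu_inbox` in this sketch, which mirrors the description: the master CPU can access a cell's memory through the cell-internal bus without passing through the microprocessor.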
Embodiment four
Those skilled in the art will understand that the structure of the cellular array computing system provided in Embodiment Two faces the same technical problem that the technical solution of Embodiment Three addresses, namely how to extend the memory space of each cell in the cellular array. On the basis of Embodiment Two, this embodiment therefore provides another structure of the cellular array computing system to solve this problem.
As shown in Figure 16, the cellular array computing system includes a master CPU, a cellular array, and a cellular array bus. The cellular array is a three-dimensional cell array (3D cellular array) formed by stacking more than one two-dimensional cellular array. Each two-dimensional cellular array is composed of more than one cell that combines computing and storage functions, and each cell includes a microprocessor and a non-volatile random access memory; for details, refer to the related content described in Embodiment One. In addition, the cellular array computing system provided in this embodiment may further include memory cell arrays. Each memory cell array is a two-dimensional array composed of more than one memory unit. At least one two-dimensional cellular array is stacked with one or more corresponding memory cell arrays to form a three-dimensional structure, and the memory units in each memory cell array are connected one-to-one with the cells in the corresponding two-dimensional cellular array. Each memory unit works together with the non-volatile random access memory of its cell, and the two jointly provide random access to the data involved when the microprocessor computes.
In practice, the non-volatile random access memory in a cell can be MRAM, while a memory cell array can be an MRAM, DRAM, or SRAM silicon wafer. Typically, one or more lower-cost DRAM silicon wafers are chosen. Each DRAM silicon wafer is a memory cell array whose memory units are positioned to match the cells in the two-dimensional cellular array. At least one two-dimensional cellular array silicon wafer (each two-dimensional cellular array is on one silicon wafer) is then 3D-stacked with its one or more corresponding DRAM silicon wafers, and a communication connection can be established through TSVs (through-silicon vias) between each memory unit and its corresponding cell in the two-dimensional cellular array, thereby extending the memory of each cell.
In practice, the same number of memory cell arrays is usually grouped into a memory cell array group, and the number of memory cell array groups is made equal to the number of two-dimensional cellular arrays, so that the memory cell array groups and the two-dimensional cellular arrays are stacked in one-to-one correspondence to form the three-dimensional structure. As shown in Figure 16, each of the three two-dimensional cellular arrays is stacked with one memory cell array group (each group containing one memory cell array) in one-to-one correspondence to form the three-dimensional structure, and the stacking position relationship between each two-dimensional cellular array and its memory cell array group is the same. This arrangement is reasonable both for manufacturing and for practical use. Of course, in other embodiments it is not strictly necessary to extend the memory of every two-dimensional cellular array in the 3D cellular array.
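The grouping just described can be written down as a short sketch. The function names and tuple encodings are hypothetical, and the physical order chosen here (each two-dimensional cellular array immediately followed by its own memory cell array group) is an assumption; the text only requires that every pair has the same stacking position relationship.

```python
def build_stack(num_cell_layers, arrays_per_group):
    """List the layers of the stack in order: each 2D cellular array is
    followed by the memory cell arrays of its own group (assumed ordering)."""
    stack = []
    for layer in range(num_cell_layers):
        stack.append(("cell_array", layer))
        for j in range(arrays_per_group):
            stack.append(("memory_array", layer, j))
    return stack


def memory_units_for(cell, arrays_per_group):
    """Return the memory units wired one-to-one to a cell.

    A cell is (layer, x, y); each memory unit is (layer, array_index, x, y),
    i.e. the unit at the same (x, y) position in every array of that layer's group.
    """
    layer, x, y = cell
    return [(layer, j, x, y) for j in range(arrays_per_group)]


# The Figure 16 configuration: three 2D cellular arrays, one memory array per group.
fig16_stack = build_stack(num_cell_layers=3, arrays_per_group=1)
units = memory_units_for((2, 5, 7), arrays_per_group=1)
```

Because every group has the same size and the same position relative to its cellular array, the mapping from a cell to its memory units depends only on the cell's own coordinates, which is one way to read the remark that the arrangement is convenient for both manufacturing and use.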
The cellular array computing system provided in this embodiment can extend the memory space of each cell in the cellular array at relatively low cost, improving the processing efficiency of the microprocessor in each cell.
Those skilled in the art will understand that the communication methods described above apply equally to the above cellular array computing system that includes the 3D cellular array and the memory cell arrays. These methods include: the master CPU communicating with each cell in the cellular array through the cellular array bus; any two cells communicating with each other without relying on the master CPU; any cell carrying out group-send communication to all cells in a target region; and a dedicated output cell, provided in the cellular array, acting as an end-point cell that receives and stores the data other cells output to the master CPU, for the master CPU to read.
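Communication between two cells of a 3D cellular array without the master CPU relies on data being relayed through successively adjacent cells. The sketch below shows one possible way to choose such a relay path. The routing rule used here, dimension-order routing (resolve x, then y, then z), is an assumption introduced only for illustration, as are the function name and the use of each cell's stored position `(x, y, z)` as its ID; the text does not prescribe a routing algorithm.

```python
def relay_path(start, end):
    """Return the list of cell IDs from start to end, moving one adjacent
    cell at a time and resolving the x, y, and z coordinates in that order."""
    path = [start]
    current = list(start)
    for axis in range(3):
        step = 1 if end[axis] > current[axis] else -1
        while current[axis] != end[axis]:
            current[axis] += step
            path.append(tuple(current))
    return path


path = relay_path((0, 0, 0), (2, 1, 1))
start_cell, transfer_cells, end_cell = path[0], path[1:-1], path[-1]
```

Every consecutive pair in `path` differs by one step along one axis, so each hop uses only the communication interface between adjacent cells; a step along z corresponds to a hop between two neighboring two-dimensional cellular arrays.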
For the specific implementation of the cellular array computing system provided in this embodiment, refer to the related content in Embodiment Three.
Although the present invention is disclosed as above, the present invention is not limited thereto. Any person skilled in the art may make various changes and modifications without departing from the spirit and scope of the present invention; the scope of protection of the present invention shall therefore be that defined by the claims.

Claims (10)

1. A cellular array computing system, characterized by comprising: a master CPU, a cellular array, a cellular array bus, and a memory cell array, wherein the cellular array and the cellular array bus are integrated in one chip;
The cellular array is a three-dimensional cell array formed by stacking more than one two-dimensional cellular array; each two-dimensional cellular array is composed of more than one cell that combines computing and storage functions, wherein each cell comprises a microprocessor and a non-volatile random access memory; the non-volatile random access memory is used for random access to the data involved when the microprocessor computes, and is also used to store software instruction code and data that must be persisted;
The memory cell array is a two-dimensional array composed of more than one memory unit; at least one two-dimensional cellular array is stacked with one or more corresponding memory cell arrays to form a three-dimensional structure, and the memory units in each memory cell array are connected one-to-one with the cells in the corresponding two-dimensional cellular array; the memory units are used for random access to the data involved when the microprocessor computes;
Each cell stores its own position in the cellular array as an ID, to be read by software or hardware in the cell;
The master CPU communicates with each cell in the cellular array through the cellular array bus;
Adjacent cells in the cellular array have a communication interface between them and can send data to each other under the control of software instructions in the cells.
2. The cellular array computing system according to claim 1, characterized in that the communication carried out by the master CPU with each cell in the cellular array through the cellular array bus includes at least one of the following:
Reading or writing, by address, the non-volatile random access memory or the corresponding memory unit of any cell in the cellular array;
Broadcasting data to the non-volatile random access memory or the corresponding memory unit of each cell in a target region of the cellular array, and writing it to the same relative address in the non-volatile random access memory or corresponding memory unit of each cell in the target region;
Sending an instruction to, sending data to, or reading the state of the microprocessor of any cell in the cellular array;
Broadcasting an instruction to the microprocessors of all cells in a target region.
3. The cellular array computing system according to claim 1, characterized in that a cell in the cellular array further comprises a bus controller and a cell-internal bus; the cell-internal bus connects the microprocessor, the non-volatile random access memory, and the memory unit corresponding to the cell; the bus controller is connected to the cellular array bus, the microprocessor, and the cell-internal bus; the bus controller is used to identify communication between the master CPU and the cell, and either connects to the microprocessor to pass on the instructions or data sent by the master CPU or to read its state, or connects through the cell-internal bus to the non-volatile random access memory or to the memory unit corresponding to the cell in order to read or write data.
4. The cellular array computing system according to claim 1, characterized in that the same number of memory cell arrays form a memory cell array group, the number of memory cell array groups is equal to the number of two-dimensional cellular arrays, and the memory cell array groups and the two-dimensional cellular arrays are stacked in one-to-one correspondence to form the three-dimensional structure.
5. The cellular array computing system according to claim 1, characterized in that the non-volatile random access memory is MRAM, the memory cell array is an MRAM, DRAM, or SRAM silicon wafer, and each two-dimensional cellular array is on one silicon wafer.
6. The cellular array computing system according to claim 1, characterized in that communication connections are established through through-silicon vias both between adjacent cells located in two adjacent two-dimensional cellular arrays and between each memory unit and its corresponding cell in the two-dimensional cellular array.
7. The cellular array computing system according to claim 1, characterized in that communication can be carried out between any two cells in the cellular array; the cells participating in cell-to-cell communication include a starting-point cell, an end-point cell, and transfer cells; the starting-point cell is the cell that sends data to the end-point cell; the end-point cell is the cell that finally receives the data sent by the starting-point cell; the transfer cells are cells that are successively adjacent along the cell-to-cell communication path and relay the data sent by the starting-point cell through the communication interfaces; and the cell-to-cell communication path is the data transmission path formed by the starting-point cell, the transfer cells, and the end-point cell.
8. The cellular array computing system according to claim 7, characterized in that any cell in the cellular array can also act as the starting-point cell to carry out group-send communication to all cells in a target region; a cell that participates in the group-send communication and is located inside the target region acts as the starting-point cell, or as an end-point cell, or as both a transfer cell and an end-point cell; and a cell that participates in the group-send communication and is located outside the target region acts as the starting-point cell or as a transfer cell.
9. The cellular array computing system according to claim 7 or 8, characterized in that at least one dedicated output cell is further provided in the cellular array; the dedicated output cell acts as the end-point cell to receive and store the data that other cells output to the master CPU, and notifies the master CPU with an interrupt signal to read the output data.
10. The cellular array computing system according to claim 7 or 8, characterized in that a cell in the cellular array further comprises a network controller connected to the microprocessor; the network controller is used, in cell-to-cell communication, to control the sending and receiving of the data being sent, the data being relayed, or the data finally received, and is also used to send interrupt signals to the microprocessor.
CN201510456244.5A 2015-07-29 2015-07-29 Cellular array computing system Active CN105718992B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510456244.5A CN105718992B (en) 2015-07-29 2015-07-29 Cellular array computing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510456244.5A CN105718992B (en) 2015-07-29 2015-07-29 Cellular array computing system

Publications (2)

Publication Number Publication Date
CN105718992A CN105718992A (en) 2016-06-29
CN105718992B true CN105718992B (en) 2019-02-19

Family

ID=56144866

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510456244.5A Active CN105718992B (en) 2015-07-29 2015-07-29 Cellular array computing system

Country Status (1)

Country Link
CN (1) CN105718992B (en)

Also Published As

Publication number Publication date
CN105718992A (en) 2016-06-29

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant