CN105608490B - Cellular array computing system and communication method therein - Google Patents

Publication number
CN105608490B
Authority
CN
China
Prior art keywords
cell
cellular array
data
array
master cpu
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510456294.3A
Other languages
Chinese (zh)
Other versions
CN105608490A (en
Inventor
戴瑾
郭民
郭一民
王践识
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Ciyu Information Technologies Co Ltd
Original Assignee
Shanghai Ciyu Information Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Ciyu Information Technologies Co Ltd filed Critical Shanghai Ciyu Information Technologies Co Ltd
Priority to CN201510456294.3A priority Critical patent/CN105608490B/en
Publication of CN105608490A publication Critical patent/CN105608490A/en
Application granted granted Critical
Publication of CN105608490B publication Critical patent/CN105608490B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Multi Processors (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

A cellular array computing system and a communication method therein. The cellular array computing system includes a master CPU, a cellular array, and a cellular array bus. The cellular array is a two-dimensional array composed of more than one cell having both computing and storage functions, and each cell includes a microprocessor and a non-volatile random access memory. The non-volatile random access memory provides random access to the data involved when the microprocessor performs computation, and is also used to store the instruction code of software and data that needs to persist. Each cell stores its own position in the cellular array as an ID that the software or hardware in the cell can read. The master CPU communicates with each cell in the cellular array through the cellular array bus. A communication interface exists between adjacent cells in the cellular array, so that they can transmit data to each other. The invention can overcome the communication bottleneck that exists between the CPU and memory and storage in existing computer architectures, and improves the overall performance of the computing system.

Description

Cellular array computing system and communication method therein
Technical field
The present invention relates to the field of computers and computer application technology, and in particular to a cellular array computing system and a communication method therein.
Background art
A computer generally comprises three core components: a central processing unit (CPU, Central Processing Unit), memory, and storage.
Through the sustained efforts of some of the world's leading companies, the CPU has evolved into an extremely complex semiconductor chip. The core of a top-end CPU can contain more than 100 million MOS transistors. The current industry trend is that, constrained by power consumption, the operating frequency of CPUs has become difficult to raise further. The operating efficiency of extremely complex modern CPUs is likewise difficult to improve further. New CPU products are increasingly evolving toward multi-core designs.
In memory, the dominant technology at present is dynamic random access memory (DRAM, Dynamic Random Access Memory). DRAM supports fast random reads and writes but cannot retain its contents when power is lost. In fact, even while powered, it loses information through leakage from its internal storage capacitors and must therefore be refreshed periodically.
In storage, NAND flash technology is gradually replacing the conventional hard disk. The floating gate technology on which flash memory relies can retain contents without power, but writing (changing a '1' to a '0') is slow and erasing (changing a '0' to a '1') is slower still, so flash cannot directly support computation the way DRAM does. It is used as a block device: erasure must be performed on a whole block at once, a block contains many pages, and each page can be written only after the erase. Another problem with NAND flash is its limited lifetime.
Although DRAM, NAND flash, and the logic circuits of the CPU are all produced with CMOS semiconductor processes, the three processes are incompatible with one another. As a result, the three core components of a computer cannot coexist on a single chip, which has profoundly shaped the architecture of modern computers.
The prior-art computer architecture is shown in Fig. 1, which shows multiple CPU cores: CPU1, CPU2, CPU3, ..., CPUn. Each CPU core generally has a corresponding level-1 cache (L1 Cache) and, as needed, may also be given a corresponding level-2 cache (L2 Cache) and level-3 cache (L3 Cache). The DRAM communicates with the CPU cores over a double data rate (DDR, Double Data Rate) interface, while the hard disk (HD, Hard Disk) or solid-state drive (SSD, Solid State Drive) communicates with the CPU cores over a peripheral device interface.
On the one hand, CPUs are evolving toward multiple cores; on the other hand, memory and storage both sit on separate chips. The amount of information a multi-core CPU moves in and out grows proportionally, so communication with memory and storage increasingly becomes the bottleneck of system performance. To relieve the communication bottleneck, CPUs have had to use ever larger multi-level caches. A cache is a copy of part of the memory contents and is typically built from static random access memory (SRAM, Static Random Access Memory), which is faster than DRAM but costs far more. Such an architecture has very poor cost-effectiveness. The cost of a semiconductor chip is determined by its silicon area, and the performance gain delivered by the conventional computer architecture is far from proportional to the increase in silicon area.
Summary of the invention
The problem to be solved by the present invention is that, in prior-art computer architectures, the communication bottleneck between the CPU and memory and storage limits improvement of overall computer performance and results in poor cost-effectiveness.
To solve the above problem, the technical solution of the present invention provides a cellular array computing system, including: a master CPU, a cellular array, and a cellular array bus. The cellular array is a two-dimensional array composed of more than one cell having both computing and storage functions, wherein each cell includes a microprocessor (MPU, Micro Processing Unit) and a non-volatile (NV, Non-Volatile) random access memory. The non-volatile random access memory provides random access to the data involved when the microprocessor performs computation, and is also used to store the instruction code of software and data that needs to persist. Each cell stores its own position in the cellular array as an identification number (ID, identification) that the software or hardware in the cell can read. The master CPU communicates with each cell in the cellular array through the cellular array bus. A communication interface exists between adjacent cells in the cellular array, so that they can transmit data to each other.
Optionally, the communication between the master CPU and each cell in the cellular array through the cellular array bus includes at least one of the following:
reading and writing, by address, the non-volatile random access memory of any cell in the cellular array;
broadcasting data to the non-volatile random access memory of every cell in a target area of the cellular array, the data being written to the same relative address in the non-volatile random access memory of every cell in the target area;
sending instructions to, transmitting data to, or reading the state of the microprocessor of any cell in the cellular array;
broadcasting instructions to the microprocessors of all cells in the target area.
Optionally, each cell in the cellular array further includes a bus controller and a cell internal bus. The bus controller is connected to the cellular array bus, the microprocessor, and the cell internal bus. The bus controller is used to recognize communication between the master CPU and its own cell, and either connects the microprocessor in order to pass on instructions or data sent by the master CPU or to allow state to be read, or connects the non-volatile random access memory through the cell internal bus in order to read or write data.
Optionally, at least one of a floating-point unit (FPU, Floating Point Unit) and an image processor is integrated in the microprocessor.
Optionally, the non-volatile random access memory is magnetic RAM (MRAM, Magnetic Random Access Memory)。
Optionally, the master CPU is integrated with the cellular array and the cellular array bus in one chip.
Optionally, the master CPU is an independent chip that communicates, through a standard memory interface, with the chip formed by the cellular array and the cellular array bus.
To solve the above problem, the technical solution of the present invention also provides a communication method in the above cellular array computing system, including:
any cell in the cellular array receives a target address broadcast by the master CPU on the cellular array bus and, if it determines that the target address lies within itself, connects its non-volatile random access memory so that the master CPU can read or write data;
a first special address segment is reserved in the system address space for communication between the master CPU and the microprocessors and holds the ID of a target cell; if any cell in the cellular array, on receiving the first special address segment, recognizes that the communication is with its own microprocessor, it connects its microprocessor to complete the subsequent instruction reception, data reception, and state read operations;
a second special address segment is reserved in the system address space for the master CPU to broadcast instructions, and the second special address segment holds cell IDs from which the extent of a target area in the cellular array can be determined; if any cell in the cellular array, on receiving the second special address segment, recognizes that it lies within the target area, it either connects its microprocessor in order to pass on instructions or data sent by the master CPU or to allow state to be read, or connects its non-volatile random access memory in order to read or write data;
any cell in the cellular array transmits data to an adjacent cell under the control of its microprocessor.
Optionally, each cell in the cellular array further includes a bus controller and a cell internal bus, the bus controller being connected to the cellular array bus, the microprocessor, and the cell internal bus. In any cell in the cellular array, determining whether the target address lies within the cell, recognizing whether a communication is with the cell's microprocessor, recognizing whether the cell lies within the target area, and connecting the non-volatile random access memory or the microprocessor are all performed by the bus controller, which connects the non-volatile random access memory through the cell internal bus.
Optionally, read and write operations by the master CPU on the non-volatile random access memory of any cell in the cellular array have higher priority than read and write operations by that cell's microprocessor on the same non-volatile random access memory.
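The priority rule above can be illustrated with a minimal Python sketch. The request format and the function name `serve_order` are hypothetical and are not taken from the patent; the sketch only shows one way an arbiter might order pending accesses to a cell's non-volatile random access memory so that master CPU requests are served first.

```python
def serve_order(pending):
    """Return pending memory requests in the order a cell would serve them.

    Each request is a (source, operation) tuple; source is "master" or "mpu"
    (hypothetical labels). sorted() is stable, so requests from the same
    source keep their original relative order.
    """
    return sorted(pending, key=lambda req: 0 if req[0] == "master" else 1)


pending = [("mpu", "read 0x10"), ("master", "write 0x20"), ("mpu", "write 0x14")]
order = serve_order(pending)
print(order)
```

In this sketch the master CPU's write is moved ahead of both microprocessor requests, while the two microprocessor requests keep their original order.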
The technical solution of the present invention also provides a method of computing a Monte Carlo integral using the above cellular array computing system, including:
the master CPU selects all of the cells in the cellular array, or the cells in a target area, and broadcasts the program corresponding to the integrand to a relative address segment in each selected cell;
the master CPU broadcasts an instruction that causes the microprocessor of each selected cell to execute the program corresponding to the integrand starting from that relative address segment;
after each cell completes its integration, it stores its sum at an agreed address, from which the master CPU reads it and computes the overall sum.
Optionally, when the program corresponding to the integrand starts executing, its built-in random number generator reads the ID of the cell and uses it as the seed.
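The method above can be simulated in ordinary Python as a rough sketch. Everything here is an assumption made for illustration: the cells are simulated one after another rather than in parallel, the integrand is f(t) = t^2 on [0, 1], and the way the (x, y) ID is turned into a seed (`x * 65536 + y`) is hypothetical. The patent only states that the ID is used as the seed.

```python
import random


def cell_partial_sum(cell_id, integrand, samples):
    """Work done by one cell: seed the generator from the cell ID, then sum f."""
    x, y = cell_id
    rng = random.Random(x * 65536 + y)  # hypothetical ID-to-seed mapping
    return sum(integrand(rng.random()) for _ in range(samples))


def monte_carlo_on_array(integrand, width, height, samples_per_cell):
    """Master CPU side: select all cells, collect each sum, compute the total."""
    cells = [(x, y) for y in range(height) for x in range(width)]
    partial_sums = {c: cell_partial_sum(c, integrand, samples_per_cell) for c in cells}
    total = sum(partial_sums.values())  # the overall summation by the master CPU
    return total / (len(cells) * samples_per_cell)


# Estimate the integral of t^2 over [0, 1]; the exact value is 1/3.
estimate = monte_carlo_on_array(lambda t: t * t, 8, 8, 2000)
print(estimate)
```

Seeding from the ID means that cells running the same broadcast program draw different sample sequences, so their partial sums are not copies of one another.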
The technical solution of the present invention also provides another method of computing a Monte Carlo integral using the above cellular array computing system, including:
the master CPU selects all of the cells in the cellular array, or the cells in a target area;
the master CPU broadcasts a download program to the same relative address segment of each selected cell, and broadcasts an instruction that causes the microprocessor of each selected cell to execute the download program starting from that relative address;
the program corresponding to the integrand is split into two or more subprograms, and the master CPU broadcasts each subprogram to the microprocessors of the selected cells;
each microprocessor running the download program selects one of the subprograms to store according to the ID of its own cell, so that the subprograms are deployed in order across a group of successively adjacent cells;
the master CPU broadcasts an instruction that causes the microprocessors of each group of cells to execute in turn the subprograms into which the program corresponding to the integrand was split, the intermediate result of each stage being passed to the next stage as its input;
after each group of cells completes its integration, it stores its sum at an agreed address, from which the master CPU reads it and computes the overall sum.
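The pipelined variant can be sketched in the same way. The following Python simulation rests on assumptions that are not in the patent: the integrand exp(-t^2) is split into three hypothetical subprograms, each row of cells forms one group, a cell picks its stage from the x coordinate of its ID, and the row number seeds the group's random number generator.

```python
import math
import random

# Hypothetical split of f(t) = exp(-t^2) into three subprograms (stages).
STAGES = [lambda t: t * t, lambda u: -u, math.exp]


def select_stage(cell_id, stages):
    """Download-program logic: choose the subprogram to keep from the cell ID."""
    x, _ = cell_id
    return stages[x % len(stages)]


def run_group(row, samples, stages=STAGES):
    """One group of adjacent cells: each sample flows through the stages in order."""
    rng = random.Random(row)  # assumed per-group seed
    pipeline = [select_stage((x, row), stages) for x in range(len(stages))]
    total = 0.0
    for _ in range(samples):
        value = rng.random()
        for stage in pipeline:  # the output of one cell is the input of the next
            value = stage(value)
        total += value
    return total


def pipelined_integral(rows, samples_per_group):
    """Master CPU side: read each group's sum and compute the overall result."""
    group_sums = [run_group(r, samples_per_group) for r in range(rows)]
    return sum(group_sums) / (rows * samples_per_group)


# The integral of exp(-t^2) over [0, 1] is approximately 0.7468.
estimate = pipelined_integral(16, 2000)
print(estimate)
```

Splitting the program this way means each cell needs to hold only one subprogram, which matters when the full integrand code would not fit in a single cell's memory.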
Compared with the prior art, the technical solution of the present invention has at least the following advantages:
More than one unit having both independent computing and storage functions (referred to as a "cell") is arranged into a two-dimensional array (referred to as a "cellular array"), wherein each cell includes a microprocessor and a non-volatile random access memory. The non-volatile random access memory can support random access to the data involved when the microprocessor performs computation, and can also store the instruction code of software and data that needs to persist, so that the three functions of memory, storage, and computation are integrated into each cell and a dense communication network is formed among the cells. On the one hand, the master CPU can communicate with each cell in the cellular array through the cellular array bus; on the other hand, adjacent cells in the cellular array can also transmit data to each other. Through data mass-sending and the internal network, the communication bottleneck that exists between the CPU and memory and storage in existing computer architectures is thereby overcome, which improves the overall performance of the computing system and yields better cost-effectiveness.
When a prior-art multi-core CPU architecture is used to compute a Monte Carlo integral and the cache of each CPU is insufficient, every CPU must read the code of the integrand through the memory interface, which creates a bottleneck. When the above cellular array computing system is used to compute the Monte Carlo integral, the advantages of its broadcast and mass-sending functions and of its internal network capable of massively parallel data transmission are fully realized, and the simultaneous operation of a large number of cells fully releases the computing power of the cellular array computing system, so that the Monte Carlo integral is computed more efficiently.
Brief description of the drawings
Fig. 1 is a schematic diagram of a prior-art computer architecture;
Fig. 2 is a schematic structural diagram of a cellular array computing system provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of one communication mode between adjacent cells according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of another communication mode between adjacent cells according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a cell according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of computing a Monte Carlo integral in a pipelined manner using the cellular array computing system;
Fig. 7 is a schematic structural diagram of a cell that performs cell-to-cell communication in the cellular array according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of path selection for cell-to-cell communication in the cellular array according to an embodiment of the present invention;
Fig. 9 is a schematic flow diagram of the implementation of a dedicated output cell according to an embodiment of the present invention;
Fig. 10 is a schematic diagram of cell mass-sending in the cellular array when the starting cell is at a corner of the target area, according to an embodiment of the present invention;
Fig. 11 is a schematic diagram of cell mass-sending in the cellular array when the starting cell is on an edge of the target area, according to an embodiment of the present invention;
Fig. 12 is a schematic diagram of cell mass-sending in the cellular array when the starting cell is inside the target area, according to an embodiment of the present invention;
Fig. 13 is a schematic diagram of cell mass-sending in the cellular array when the starting cell is outside the target area, according to an embodiment of the present invention;
Fig. 14 is another schematic structural diagram of a cellular array computing system provided by an embodiment of the present invention;
Fig. 15 is a further schematic structural diagram of a cellular array computing system provided by an embodiment of the present invention;
Fig. 16 is yet another schematic structural diagram of a cellular array computing system provided by an embodiment of the present invention;
Fig. 17 is a schematic structural diagram of a camera system provided by an embodiment of the present invention;
Fig. 18 is another schematic structural diagram of a camera system provided by an embodiment of the present invention;
Fig. 19 is a schematic diagram of a neuron in a neural network;
Fig. 20 is a schematic diagram of neural network computation;
Fig. 21 is a schematic structural diagram of a cellular array computing system with a debugging interface provided by an embodiment of the present invention.
Detailed description of the embodiments
In prior-art computer architectures, the communication bottleneck between the CPU and memory and storage limits improvement of overall computer performance and results in poor cost-effectiveness.
After study, the inventors of the present application considered that if the three functions of memory, storage, and computation are integrated onto one chip to form relatively simple units that have both independent computing and storage functions, and a dense communication network is formed among a large number of such units, thereby providing a data mass-sending function and an internal network capable of massively parallel data transmission, then a computing architecture with similarities to the human brain can be developed. This is equivalent to placing a large number of microcomputers on a single chip.
To this end, the technical solution of the present invention provides a computing architecture similar in structure to the human brain (referred to in the technical solution of the present invention as a "cellular array computing system"). The computing architecture is composed of numerous units (referred to in the technical solution of the present invention as "cells") that are relatively simple in structure, have both storage and computing functions, and are connected by a dense network. This new computing architecture can be widely applied in fields such as large-scale computing, big data processing, and artificial intelligence.
To make the above objects, features, and advantages of the present invention clearer and easier to understand, specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.
As shown in Fig. 2, the cellular array computing system provided by an embodiment of the present invention includes: a master CPU, a cellular array, and a cellular array bus. The cellular array is the main body of the cellular array computing system. It is a two-dimensional array composed of more than one cell having both computing and storage functions, wherein each cell includes a microprocessor (MPU) and a non-volatile random access memory (MRAM is used as the example in Fig. 2). The non-volatile random access memory provides random access to the data involved when the microprocessor performs computation, and is also used to store the instruction code of software and data that needs to persist. Each cell stores its own position in the cellular array as an ID that the software or hardware in the cell can read. The master CPU communicates with each cell in the cellular array through the cellular array bus. A communication interface exists between adjacent cells in the cellular array, so that they can transmit data to each other.
It should be noted that, in this embodiment, MRAM is used as the example of the non-volatile random access memory. In other embodiments, as non-volatile random access memory technology develops and matures, the non-volatile random access memory may instead be implemented with several other candidate technologies, for example: phase-change random access memory (PCRAM, Phase Change Random Access Memory), resistive random access memory (Resistive Random Access Memory), ferroelectric random access memory (FeRAM, Ferroelectric Random Access Memory), ferroelectric dynamic random access memory (FEDRAM, Ferroelectric Dynamic Random Access Memory), and so on.
MRAM is a new memory and storage technology. It supports fast random reads and writes like SRAM and DRAM, and is faster than DRAM; like flash memory, it retains data permanently after power is lost; and unlike NAND flash, MRAM can be rewritten an unlimited number of times, so its service life is longer. MRAM is also economical: the silicon area it occupies per unit of capacity has a large advantage over SRAM (commonly used for CPU caches) and is expected to approach that of DRAM. Its performance is also quite good: its read and write latency is close to that of the best SRAM, and its power consumption is the lowest among the various memory and storage technologies. Moreover, unlike DRAM and flash, which are incompatible with the standard CMOS semiconductor process, MRAM can be integrated with logic circuits in one chip. By using MRAM technology, the three functions of memory, storage, and computation can be integrated onto one chip, which makes the cellular array computing system feasible.
In this embodiment, the microprocessor has the functions of an ordinary CPU, and units such as a floating-point unit (FPU, Floating Point Unit) or an image processor may be added according to the specific application scenario. Therefore, at least one of a floating-point unit and an image processor may be integrated in the microprocessor.
In an actual implementation, the master CPU may be integrated with the cellular array and the cellular array bus in one chip; alternatively, the master CPU may be an independent chip that communicates, through a standard memory interface, with the chip formed by the cellular array and the cellular array bus. When a standard memory interface is used between the master CPU and the cellular array, the master CPU can be a general-purpose CPU chip, which makes the cellular array computing system easier to implement.
In this embodiment, each cell stores its own position in the cellular array as its ID. The position may be expressed as coordinates in the first quadrant of a plane rectangular coordinate system: if (x, y) denotes the position of a cell in the cellular array, then (x, y) can be stored in that cell as its ID, and the software and hardware in the cell can read this ID and use it in specific operations.
In this embodiment, the communication between the master CPU and each cell in the cellular array through the cellular array bus includes the following:
reading and writing, by address, the non-volatile random access memory of any cell in the cellular array;
broadcasting data to the non-volatile random access memory of every cell in a target area of the cellular array, the data being written to the same relative address in the non-volatile random access memory of every cell in the target area;
sending instructions (including start and pause) to, transmitting data to, or reading the state of the microprocessor of any cell in the cellular array;
broadcasting instructions to the microprocessors of all cells in the target area.
Of course, in other embodiments, the communication between the master CPU and each cell in the cellular array through the cellular array bus may be any one of the above, or a combination of several of them.
It should be noted that a "target area" in the embodiments of the present invention is a region, selected by the master CPU or by any cell in the cellular array, that is made up of more than one cell with adjacency relationships; the cells in the region are the objects to which the master CPU or any cell in the cellular array broadcasts or mass-sends data or instructions. In this embodiment, the target area is described using a rectangular region as the example (a ≤ x ≤ b, c ≤ y ≤ d, where a and b are the boundary coordinates of the rectangular region along the x axis of the plane rectangular coordinate system, and c and d are its boundary coordinates along the y axis). In other embodiments, the target area may be a region of another shape, such as a diamond, a triangle, or a hexagon.
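The rectangular target area defined by a ≤ x ≤ b and c ≤ y ≤ d can be written down directly. The following Python sketch uses hypothetical function names and simply applies the two inequalities to a cell's (x, y) ID.

```python
def in_target_area(cell_id, a, b, c, d):
    """True if the cell's (x, y) ID satisfies a <= x <= b and c <= y <= d."""
    x, y = cell_id
    return a <= x <= b and c <= y <= d


def cells_in_target_area(a, b, c, d):
    """List the IDs of every cell inside the rectangular target area."""
    return [(x, y) for y in range(c, d + 1) for x in range(a, b + 1)]


area = cells_in_target_area(1, 3, 2, 4)
print(len(area))                              # number of cells in the area
print(in_target_area((0, 2), 1, 3, 2, 4))     # a cell just outside the left edge
```

For the bounds used here the area contains 3 x 3 cells, and the cell at (0, 2) falls outside because its x coordinate is below a.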
In addition, "broadcast" and "mass-sending" are distinct concepts in the embodiments of the present invention: a broadcast sends a piece of data or an instruction once so that all of the objects can receive it, whereas mass-sending sends it point to point, many times, to the different objects.
Besides the master CPU broadcasting to any cell in the cellular array (including the microprocessor or the non-volatile random access memory in the cell), the cellular array also contains a communication network that enables a cell, under the control of its MPU, to transmit data to the cells adjacent to it. As shown in Fig. 3, within a plane, any cell can communicate with its adjacent cells in four directions: up, down, left, and right. Of course, communication between adjacent cells is not limited to these four directions. Where the configuration can support it, eight directions (up, down, left, right, upper left, upper right, lower left, and lower right) may be used; as shown in Fig. 4, any cell can then communicate with its adjacent cells in all eight directions.
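The two adjacency schemes can be expressed as a short Python sketch. The array dimensions, the direction tables, and the function name are assumptions made for illustration; cells on the border of the array simply have fewer neighbours.

```python
# Offsets for the four-direction scheme (up, down, left, right).
FOUR_DIRECTIONS = [(0, 1), (0, -1), (-1, 0), (1, 0)]
# The eight-direction scheme adds the four diagonals.
EIGHT_DIRECTIONS = FOUR_DIRECTIONS + [(-1, 1), (1, 1), (-1, -1), (1, -1)]


def neighbours(cell_id, width, height, directions=FOUR_DIRECTIONS):
    """Return the IDs of the adjacent cells that exist inside a width x height array."""
    x, y = cell_id
    return [
        (x + dx, y + dy)
        for dx, dy in directions
        if 0 <= x + dx < width and 0 <= y + dy < height
    ]


print(neighbours((1, 1), 3, 3))                    # interior cell, four directions
print(neighbours((0, 0), 3, 3, EIGHT_DIRECTIONS))  # corner cell, eight directions
```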
As shown in Fig. 5, in this embodiment, each cell in the cellular array may further include a bus controller and a cell internal bus. The bus controller is connected to the cellular array bus, the microprocessor, and the cell internal bus. The bus controller is used to recognize communication between the master CPU and its own cell, and either connects the microprocessor in order to pass on instructions or data sent by the master CPU or to allow state to be read, or connects the non-volatile random access memory through the cell internal bus in order to read or write data.
Those skilled in the art know that a fairly simple CPU with good performance, such as the ARM Cortex-M0, has only about 50,000 MOS transistors; even with a modest addition of FPU functions, it remains far smaller than a top-end CPU with hundreds of millions of MOS transistors. The gain obtained by raising the performance of a single CPU is out of proportion to the increase in area (cost) it brings. If one large CPU is replaced by many small CPUs at the same total cost, the total computing capability is bound to increase many times over. However, the conventional computer architecture is limited by the communication bottleneck, so the actual performance gain from using a large number of CPU cores is very limited.
The cellular array computing system provided by the technical solution of the present invention solves the communication bottleneck problem through data broadcasting and the internal network, thereby improving the overall performance of the computing system with better cost-effectiveness. This will be seen more clearly in the application examples that follow.
Preliminary studies show that an MPU similar to the Cortex-M0, paired with 32 KB of memory, can form one cell. Using a 40 nm process, 3000 such cells can be placed on a single chip, which is a very powerful computing capability. Further studies show that, in this way, the computing capability of a contemporary top-end CPU (generally measured in floating-point operations per second (FLOPS, floating-point operations per second)) can be exceeded within the same silicon area. Because the cellular array computing system of the technical solution of the present invention no longer faces the memory interface bottleneck, its performance can be better still on many practical problems.
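The figures quoted above can be combined into rough chip-level totals. The following Python arithmetic is only a back-of-the-envelope sketch: it uses the 3000 cells, 32 KB per cell, and roughly 50,000 MOS transistors per Cortex-M0 class MPU mentioned in the text, and it deliberately leaves out the transistors of the memory, the bus controllers, and the buses, for which the text gives no numbers.

```python
CELLS_PER_CHIP = 3000          # from the text, at a 40 nm process
MEMORY_PER_CELL_KB = 32        # from the text
TRANSISTORS_PER_MPU = 50_000   # approximate figure quoted for a Cortex-M0 class CPU

total_memory_kb = CELLS_PER_CHIP * MEMORY_PER_CELL_KB
total_mpu_transistors = CELLS_PER_CHIP * TRANSISTORS_PER_MPU

print(total_memory_kb / 1024)   # total on-chip memory in MB (1 MB = 1024 KB)
print(total_mpu_transistors)    # transistors spent on the MPUs alone
```

On these assumptions the MPUs together account for about 150 million transistors, the same order of magnitude as the figure quoted for a top-end CPU core, while providing 3000 independent processors.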
Based on above-mentioned cellular array computing system, the embodiment of the present invention is also provided in a kind of above-mentioned cellular array computing system Communication means, including:Master cpu reads and writes the communication between operation, master cpu and the microprocessor of non-volatile random access memory It operates, the traffic operation in the broadcast operation and cellular array of master cpu between flanking cell;
The operation of the master cpu read-write non-volatile random access memory specifically includes:It is any thin in the cellular array Born of the same parents receive the destination address that the master cpu is broadcasted in the cellular array bus, if judging the destination address at this In cell, then the non-volatile random access memory of the cell is connected so that the master cpu carries out the read-write operation of data.
Traffic operation between the master cpu and microprocessor specifically includes:First is reserved in system address space Special address field is for the communication between the master cpu and microprocessor and stores the ID of target cell, if the cell battle array Identify it is communication with the microprocessor of this cell when any cell receives the first special address field in row, then connection should The microprocessor of cell completes subsequent command reception, data receiver and status read operation.
It should be noted that the system address space is not limited to the sum of the address spaces formed by the non-volatile random access memories contained in the cells of the cellular array, because the memories connected to the cellular array bus may not be only those non-volatile random access memories; other types of memory may well also be connected to the cellular array bus and accessed by the master CPU. The master CPU therefore needs to identify the cell it intends to access by the cell's ID (in this embodiment, that cell is called the "target cell").
The broadcast operation of the master CPU specifically includes: a second special address field is reserved in the system address space for the master CPU to broadcast instructions. The second special address field carries the IDs of cells from which the extent of the target area in the cellular array can be determined. If any cell in the cellular array, after receiving the second special address field, recognizes that it lies within the target area, it either connects its microprocessor to pass on the instructions or data sent by the master CPU or to allow a status read, or connects its non-volatile random access memory to carry out a data read or write operation.
The broadcast operation of the master CPU is illustrated below, taking a rectangular target area as an example. A section of the system address space is reserved for broadcast instructions, and part of this address section stores the ID of a starting point cell in the target rectangular region. The starting point cell is the first cell in the target rectangular region to be accessed by the master CPU. After the bus controller in a cell receives this special address, it receives a subsequent word of data, which contains the ID of the cell diagonal to the starting point cell in the target rectangular region. If the bus controller determines that this cell lies within the region, it receives a second word of data. The second word indicates whether what follows is an instruction or data for the MPU, or data to be written into the non-volatile random access memory starting from some relative address. In the former case the bus controller connects the MPU; in the latter it connects the non-volatile random access memory to complete the subsequent operation.
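The rectangular-region check that each cell's bus controller performs can be sketched as follows. This is a minimal sketch, assuming a cell ID is an (x, y) coordinate pair; the function name `in_target_rectangle` is hypothetical and does not come from the source.

```python
def in_target_rectangle(cell_id, start_id, diagonal_id):
    """Return True if cell_id lies inside the rectangle spanned by the
    starting point cell and the cell diagonal to it (both inclusive).

    Assumption: an ID is an (x, y) tuple giving the cell's array position.
    """
    x, y = cell_id
    x_lo, x_hi = sorted((start_id[0], diagonal_id[0]))
    y_lo, y_hi = sorted((start_id[1], diagonal_id[1]))
    return x_lo <= x <= x_hi and y_lo <= y <= y_hi


# Cells of a 4x4 array that would accept a broadcast to the region (1,1)-(2,3).
selected = [(x, y) for x in range(4) for y in range(4)
            if in_target_rectangle((x, y), (1, 1), (2, 3))]
```

Sorting the two corner coordinates means the check does not depend on which corner the master CPU names as the starting point.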
It should be noted that when the storage space of the second special address field is relatively limited, the cell ID stored in it may not be sufficient to fully determine the extent of the target area. In that case, after receiving the second special address field, a cell also needs to receive subsequent data which, together with the cell ID stored in the second special address field, determines the extent of the target area.
The communication operation between adjacent cells in the cellular array includes: any cell in the cellular array transmits data to an adjacent cell under the control of its microprocessor.
In this embodiment, each cell is provided with a bus controller that is connected to the cellular array bus. A cell internal bus is provided inside the cell; the non-volatile random access memory is a slave device (Slave) on the cell internal bus, and the bus controller and the microprocessor are master devices (Master).
In the communication method described above, the steps in which any cell in the cellular array determines whether the destination address lies within this cell, recognizes whether the communication is with the microprocessor of this cell, recognizes whether this cell lies within the target area, and connects the non-volatile random access memory or the microprocessor are all performed by the bus controller. The bus controller connects the non-volatile random access memory through the cell internal bus.
In a specific implementation, read and write operations by the master CPU on the non-volatile random access memory of any cell in the cellular array take priority over read and write operations by that cell's own microprocessor on the same memory. That is, if the microprocessor in a cell needs to read or write the non-volatile random access memory of that cell, it must wait until any read or write operation of the master CPU on that memory has completed.
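The address decoding that the bus controller carries out can be illustrated with a small sketch. All of the following are assumptions made for illustration only: the per-cell memory size, the contiguous per-cell address windows, the two special address constants, and the use of an integer cell index as the ID.

```python
CELL_MEM_BYTES = 32 * 1024        # assumed per-cell NVRAM size
MPU_COMM_BASE = 0xF000_0000       # hypothetical first special address field
BROADCAST_BASE = 0xF100_0000      # hypothetical second special address field


def decode(address, cell_index, target_id=None):
    """Decide what this cell's bus controller connects for a bus address.

    Returns "nvram", "mpu", "broadcast", or None (address is not for this cell).
    """
    if address >= BROADCAST_BASE:
        return "broadcast"            # region membership is checked on later words
    if address >= MPU_COMM_BASE:
        # The first special address field carries the ID of the target cell.
        return "mpu" if target_id == cell_index else None
    base = cell_index * CELL_MEM_BYTES
    if base <= address < base + CELL_MEM_BYTES:
        return "nvram"                # ordinary read or write by the master CPU
    return None
```

The sketch omits the priority rule between the master CPU and the cell's own microprocessor; it covers only the decision of which unit to connect.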
For the specific implementation of the communication method in the cellular array computing system, reference may also be made to the implementation of the cellular array computing system described above; details are not repeated here.
In addition, an embodiment of the present invention provides a method of computing a Monte Carlo integral using the above cellular array computing system. A Monte Carlo integral is a summation over random numbers. It is a large-scale computation commonly used in science and engineering, and its principle is relatively simple; the computation follows the formula below.
$$S = \sum_{\mathrm{Random}(x)} F(x_1, x_2, \ldots, x_N)$$
The solution of this problem is used below to further demonstrate the advantages of the cellular array computing system. Computing a Monte Carlo integral is very well suited to the cellular array computing system, and the specific steps are as follows:
The master CPU selects all cells in the cellular array, or the cells in a target area, and broadcasts the program corresponding to the integrand F() to a relative address section of each selected cell;
The master CPU broadcasts an instruction that makes the microprocessor of each selected cell execute the program corresponding to the integrand F(), starting from that relative address section;
After each cell completes its integral operation, it stores its sum at an agreed address, to be read by the master CPU, which then computes the overall sum.
In this embodiment, when the program corresponding to the integrand F() starts executing, its built-in random number generator reads the cell's ID as the seed, which ensures that the random numbers generated by each cell are different.
In actual implementation, thousands of cells start computing at the same time and their computing capability is fully released, no longer constrained by the communication bottleneck of the prior art, so the Monte Carlo integral can be computed more efficiently.
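The steps above can be sketched in software as follows. This is a sequential simulation under stated assumptions, not the hardware behavior: each "cell" is a function call, the cell ID is an integer used directly as the seed, sample points are drawn uniformly from the unit hypercube, and the names `cell_partial_sum` and `monte_carlo` are hypothetical.

```python
import random


def cell_partial_sum(cell_id, integrand, dims, samples):
    """Work done by one cell: sum the integrand over its own random points.

    The cell ID seeds the generator, so every cell draws a different sequence.
    """
    rng = random.Random(cell_id)
    total = 0.0
    for _ in range(samples):
        point = [rng.random() for _ in range(dims)]
        total += integrand(point)
    return total


def monte_carlo(integrand, dims, cell_ids, samples_per_cell):
    """Master CPU side: read each cell's partial sum and form the overall mean."""
    cell_ids = list(cell_ids)
    partials = [cell_partial_sum(cid, integrand, dims, samples_per_cell)
                for cid in cell_ids]
    return sum(partials) / (len(cell_ids) * samples_per_cell)


# Integrate x1 + x2 over the unit square; the exact value is 1.0.
estimate = monte_carlo(lambda p: p[0] + p[1], dims=2,
                       cell_ids=range(100), samples_per_cell=1000)
```

In the cellular array the partial sums would be computed by all selected cells at once; only the final read and summation falls to the master CPU.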
In actual implementation, if the integrand F() is so complex that the memory of a single cell cannot hold it, the problem can also be solved by pipelining. This embodiment therefore provides another method of computing a Monte Carlo integral using the above cellular array computing system, including:
The master CPU selects all cells in the cellular array, or the cells in a target area;
The master CPU broadcasts a download program to the same relative address section of each selected cell, and broadcasts an instruction that makes the microprocessor of each selected cell execute the download program starting from that relative address; the download program then waits for the input of the next step;
The program corresponding to the integrand is split into two or more subprograms, and the master CPU broadcasts each subprogram to the microprocessors of the selected cells;
The microprocessor running the download program selects one of the subprograms to store according to the ID of the cell it belongs to, so that the subprograms are deployed in sequence in a group of successively adjacent cells;
The master CPU broadcasts an instruction that makes the microprocessors of each group of cells execute, in sequence, the subprograms into which the program corresponding to the integrand was split, with the intermediate result of each stage passed to the next stage as input;
After each group of cells completes its integral operation, it stores its sum at an agreed address, to be read by the master CPU, which then computes the overall sum.
For example, as shown in Fig. 6, the integrand F() can be split into three parts f1, f2, and f3 (three subprograms), which are deployed in adjacent cells, with the intermediate result of each stage passed to the next stage as input.
Specifically, when the master CPU broadcasts f1, f2, and f3 to each MPU (note: they are sent to the MPU, not to memory), the MPU running the download program selects one of the subprograms to store according to the x coordinate of its own ID along the x axis of a rectangular coordinate system (for example, using the remainder of x/3). In this way, through a two-stage broadcast, the three subprograms are deployed according to the desired rule in all cells that take part in the computation.
In addition, because execution of the program corresponding to the integrand F() actually starts from the first subprogram, it is still the built-in random number generator that reads the cell's ID as the seed when the first subprogram after splitting starts executing, which ensures that the random numbers generated by each cell are different.
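The selection rule used by the download program, and the way an intermediate result flows through one group of adjacent cells, can be sketched as below. The sketch assumes a cell ID is an (x, y) pair and that a group occupies consecutive x positions in one row; the three stage functions are a made-up split of a toy integrand, and the function names are hypothetical.

```python
def select_subprogram(cell_id, subprograms):
    """Download-program logic: keep the stage given by x mod (number of stages)."""
    x, _y = cell_id
    return subprograms[x % len(subprograms)]


def run_pipeline_row(y, subprograms, value):
    """Pass a value through one group of adjacent cells (x = 0, 1, 2, ...) in row y."""
    for x in range(len(subprograms)):
        stage = select_subprogram((x, y), subprograms)
        value = stage(value)       # intermediate result feeds the next stage
    return value


# Hypothetical split of F(v) = (2v + 1)^2 into three stages f1, f2, f3.
f1 = lambda v: 2 * v
f2 = lambda v: v + 1
f3 = lambda v: v * v
result = run_pipeline_row(0, [f1, f2, f3], 3)   # computes (2*3 + 1)^2
```

Because the rule depends only on the cell's own ID, every cell can pick its stage from the same broadcast without any per-cell message from the master CPU.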
Compared with a conventional computer architecture, this is where the advantages of the cellular array computing system's broadcast capability and of its internal network, which can transfer data in parallel on a large scale, become apparent. If this problem were computed on a conventional multi-core CPU architecture, then whenever each CPU's own cache was insufficient, all CPUs would need to read the code of the integrand F() through the memory interface, creating a bottleneck.
In this embodiment, the internal network of the cellular array is implemented so that a cell can not only send data to an adjacent cell, but can also send data to any other cell, thereby realizing cell-cell communication within the cellular array.
Specifically, any two cells in the cellular array of the cellular array computing system can communicate without relying on the master CPU. The cells taking part in cell-cell communication include a starting point cell, an end point cell, and transfer cells. The starting point cell is the cell that sends data to the end point cell; the end point cell is the cell that finally receives the data sent by the starting point cell; the transfer cells are the cells that are successively adjacent along the cell-cell communication path and relay the data sent by the starting point cell through the communication interfaces. The cell-cell communication path is the data sending and receiving path formed by the starting point cell, the transfer cells, and the end point cell.
Through the communication interfaces between adjacent cells in the cellular array, data is relayed multiple times between adjacent cells, so that any two cells in the cellular array can communicate without relying on the master CPU. This improves the efficiency of cell-cell communication and reduces the processing load on the master CPU, further improving the overall performance of the computing system.
It should be noted that starting point cell, end point cell, and transfer cell are relative concepts defined with respect to a particular cell-cell communication process: a starting point cell may well serve as a transfer cell or end point cell in other cell-cell communication processes, and an end point cell may likewise serve as a transfer cell or starting point cell in other cell-cell communication processes.
In a specific implementation, a cell in the cellular array may further include a network controller connected to the microprocessor. In cell-cell communication, the network controller performs send and receive control over the data sent out, the data relayed, or the data finally received, and it also sends interrupt signals to the microprocessor. In this embodiment, providing a network controller in each cell allows data to be relayed quickly without disturbing the MPU, which reduces the processing load on the MPU in the cell. In other embodiments, the network controller may be omitted and the data relayed by the MPU instead.
In this embodiment, "data sent out" refers to data that the starting point cell itself sends; "relayed data" refers to data sent by the starting point cell and relayed by a transfer cell, which the transfer cell itself has no need to originate; "data finally received" refers to data received by the end point cell, which has reached its destination after multiple relays and will not be relayed further. In terms of content, "data sent out", "relayed data", and "data finally received" may be the same data; they are simply different names for it at different communication stages.
In a specific implementation, a cell in the cellular array may further include one or more groups of first-in first-out (FIFO) queues connected to the network controller, each group corresponding to one cell adjacent to this cell. Each group includes an input FIFO queue and an output FIFO queue. The input FIFO queue stores data that enters this cell either to be relayed or to be finally received; the output FIFO queue stores data leaving this cell, either data to be relayed or data that this cell itself sends to other cells.
Taking the communication mode between adjacent cells shown in Fig. 3 as an example, the structure of a cell that carries out cell-cell communication in the cellular array of this embodiment is shown in Fig. 7. The network controller in Fig. 7 is connected to the MPU and to 4 groups of FIFO queues; the groups correspond one to one with the cells adjacent to this cell in the four directions (up, down, left, and right) of the two-dimensional plane. In a specific implementation, the communication channel between every two adjacent cells can share one corresponding group of FIFO queues. Each group of FIFO queues includes an input FIFO and an output FIFO. From the point of view of one cell, the input FIFO stores data coming in from an adjacent cell and the output FIFO stores data going out from this cell to an adjacent cell; the output FIFO of an adjacent cell is an input FIFO for this cell, and the output FIFO of this cell is an input FIFO for the adjacent cell.
It should be noted that the cell shown in Fig. 7 has 4 groups of FIFO queues. A cell at one of the 4 corners of a rectangular cellular array has only two adjacent cells and therefore corresponds to 2 groups of FIFO queues; a cell on one of the 4 edges of a rectangular cellular array has three adjacent cells and therefore corresponds to 3 groups of FIFO queues.
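The relation between a cell's position and its number of FIFO groups can be checked with a short sketch. It assumes a rectangular array indexed by (x, y) with the four-neighbor communication mode of Fig. 3 and an array at least two cells wide and high; which direction counts as "up" is an arbitrary choice here, and the function names are hypothetical.

```python
def adjacent_cells(x, y, width, height):
    """Return the up/down/left/right neighbors of cell (x, y) that exist."""
    candidates = {
        "up": (x, y - 1),
        "down": (x, y + 1),
        "left": (x - 1, y),
        "right": (x + 1, y),
    }
    return {direction: (nx, ny)
            for direction, (nx, ny) in candidates.items()
            if 0 <= nx < width and 0 <= ny < height}


def fifo_group_count(x, y, width, height):
    """One group (input FIFO plus output FIFO) per adjacent cell."""
    return len(adjacent_cells(x, y, width, height))
```

For a 4x4 array, `fifo_group_count` gives 2 for a corner cell such as (0, 0), 3 for an edge cell such as (1, 0), and 4 for an interior cell such as (1, 1).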
In this embodiment, the network controller is also connected to the MPU in the cell and sends it interrupt signals, such as FIFO empty, FIFO full, new data arrived, and data sent out. The MPU, in turn, can send data through the network controller; data sent this way is usually first placed in the corresponding output FIFO queue.
It should be noted that the cell structure in Fig. 7 shows only the modules related to cell-cell communication; those skilled in the art will understand that the cell structure shown in Fig. 7 can be fully combined with the cell structure shown in Fig. 5.
In addition, this embodiment uses FIFO queues to store the data entering and leaving a cell, which makes data relay during cell-cell communication more efficient and reduces the processing load on the MPU. In other embodiments, the data entering and leaving a cell may also be held in registers.
An embodiment of the present invention also provides a method of communication between cells in the above cellular array computing system, including: a starting point cell in the cellular array sends the data destined for an end point cell, in a selected sending direction, to a cell adjacent to the starting point cell; when any cell in the cellular array receives data sent out or relayed by an adjacent cell, if it determines from the end point cell ID carried in the received data that this cell is the end point cell, it stores the received data in the non-volatile random access memory of this cell or notifies the microprocessor of this cell to process the received data; otherwise this cell acts as a transfer cell and, after selecting a sending direction, relays the received data to a cell adjacent to this cell.
In a specific implementation, every piece of data involved in cell-cell communication carries the IDs of the starting point cell and the end point cell. From the end point cell ID carried in received data, any cell can determine whether the data is intended for this cell or needs to be relayed further to another adjacent cell. A piece of data travels over the connections between adjacent cells and, after multiple relays, reaches the end point cell. If the end point cell needs to give feedback on the data sent by the starting point cell, it can send feedback data to the starting point cell according to the starting point cell's ID: the end point cell takes the starting point cell ID carried in the received data as the end point cell ID and writes it into the feedback data obtained by processing the received data. The end point cell thereby becomes the starting point cell of a new cell-cell communication, and the original starting point cell becomes its end point cell.
In a specific implementation, the data that the starting point cell sends to the end point cell carries, in addition to the ID of the end point cell, either the address to be accessed in the end point cell or an indication of the MPU. The end point cell stores the received data in its non-volatile random access memory after recognizing the address to be accessed carried in the received data; the end point cell notifies its MPU to process the received data after recognizing the MPU indication carried in the received data.
In actual implementation, if the end point cell recognizes an address to be accessed in the received data, the network controller in the end point cell can write the received data directly to the corresponding address in the cell's non-volatile random access memory. In this case cells can "breed": one cell can download a program to another cell. If the end point cell recognizes the MPU indication in the received data, the received data is handed to the MPU in the end point cell for processing.
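The information carried by each piece of data, and the two ways the end point cell can handle it, can be sketched as follows. The source does not specify a packet layout; the `Packet` fields, the use of `None` to mean "deliver to the MPU", and the function names are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

CellId = Tuple[int, int]


@dataclass
class Packet:
    source: CellId                  # ID of the starting point cell
    destination: CellId             # ID of the end point cell
    payload: bytes
    address: Optional[int] = None   # relative NVRAM address; None means "deliver to MPU"


def deliver(packet, nvram, mpu_inbox):
    """End point cell: write to NVRAM at the given address, or hand over to the MPU."""
    if packet.address is not None:
        end = packet.address + len(packet.payload)
        nvram[packet.address:end] = packet.payload
    else:
        mpu_inbox.append(packet.payload)


def make_reply(packet, payload, address=None):
    """Feedback data: the roles of starting point and end point are swapped."""
    return Packet(source=packet.destination, destination=packet.source,
                  payload=payload, address=address)
```

Writing a payload of program bytes to an address in another cell's memory is the "breeding" case described above; the sketch does not model the accompanying interrupt to the MPU.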
In this embodiment, because each cell in the cellular array further includes a network controller connected to the MPU, the following are all completed under the control of the network controller: the starting point cell sending data to the end point cell; any cell in the cellular array receiving data sent out or relayed by an adjacent cell and determining whether this cell is the end point cell or a transfer cell; and storing the received data in the non-volatile random access memory of this cell or notifying the MPU of this cell to process the received data.
In a specific implementation, the data that the starting point cell sends to the end point cell is first placed by the network controller into the output FIFO queue, and is then output by the network controller from the output FIFO queue to the cell adjacent to the starting point cell. If any cell in the cellular array receives data sent out or relayed by an adjacent cell, the received data is placed into the input FIFO queue and, when it is determined that the received data needs to be relayed, is then placed into the output FIFO queue.
In addition, if the network controller determines that an input FIFO queue or output FIFO queue is empty or full, or that data sent out or relayed by an adjacent cell has been received, or that data has been sent out or relayed to an adjacent cell, it sends an interrupt signal to the microprocessor.
In a specific implementation, the starting point cell or a transfer cell can select the sending direction as follows: if a straight-line communication path can be formed between the starting point cell or transfer cell and the end point cell, the sending direction is the direction along that straight line from the starting point cell or transfer cell toward the end point cell; otherwise the sending direction is the direction from the starting point cell or transfer cell toward a candidate adjacent cell, a candidate adjacent cell being a cell that, among the cells adjacent to the starting point cell or transfer cell, is closer to the end point cell. The number of candidate adjacent cells may of course be two, in which case the one of the two with fewer output-data communication tasks is selected as the transfer cell.
In this embodiment, the way the starting point cell or a transfer cell selects the sending direction can also be regarded as the path selection process for cell-cell communication in the cellular array. Refer to Fig. 8: each rectangle in Fig. 8 represents one cell in the cellular array, the cells shown in Fig. 8 are part of the entire cellular array, and communication between adjacent cells is assumed to follow the mode shown in Fig. 3.
Suppose point A denotes a starting point cell that is about to send data to the end point cell at point C. Because a straight-line communication path can clearly be formed between points A and C, the cell at point A sends the data to the adjacent cell at point B. Similarly, the cell at point B, acting as a transfer cell, continues to relay the data along the straight line between points A and C toward the cell at point C. The successively adjacent cells on the cell-cell communication path formed between points A and C forward the data sent by the cell at point A again and again until it reaches the cell at point C.
Suppose point D denotes another starting point cell that is about to send data to the end point cell at point G. Because a straight-line communication path clearly cannot be formed between points D and G, among the cells adjacent to the cell at point D, the cell at point E and the cell at point F are clearly closer to the end point cell at point G, so these two cells are the candidate adjacent cells of the cell at point D. The one with fewer output-data communication tasks can be selected as the transfer cell; if the two cells have the same output-data communication load, one of them is chosen at random as the transfer cell. As shown in Fig. 8, selecting the cell at point E or the cell at point F will form different cell-cell communication paths.
It should be noted that this embodiment describes path selection for cell-cell communication using the communication mode between adjacent cells shown in Fig. 3 as an example. Those skilled in the art will understand that if the communication mode between adjacent cells shown in Fig. 4 is used, more sending directions will be available to choose from.
In summary, in actual implementation the network controller of every cell that sends out or relays data must select one adjacent cell as the next stop. When the starting point and the end point lie on a straight line, there is generally only one reasonable choice; in other cases there are two equally reasonable choices, and the network controller selects the neighbor whose traffic is relatively less busy.
If data enters an input FIFO queue, the network controller first examines it:
If the end point is this cell: if the end point is a specific relative address, then because the network controller is capable of direct memory access (DMA), it stores the received data directly at the corresponding address in the non-volatile random access memory and notifies the MPU with an interrupt; if the end point is the MPU, it directly notifies the MPU with an interrupt signal to process the data.
If the end point is another cell, or if the MPU of this cell is sending out data: if the end point and this cell lie on a straight line, the correct direction is selected and the data is sent to the adjacent cell in that direction; in other cases there are two possible directions, and the data is sent to the adjacent cell whose output FIFO queue is more idle; if the output FIFO queues of the two candidate adjacent cells are in the same state, one of them may be chosen at random.
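The next-stop selection performed by the network controller can be sketched as below, assuming (x, y) cell IDs and the four-neighbor mode of Fig. 3. The `output_load` mapping (number of items waiting in a neighbor's output FIFO) and the function names are hypothetical. Unlike the text, which picks at random on a tie, this sketch takes the first candidate so that its behavior is deterministic.

```python
def candidate_next_hops(current, destination):
    """Adjacent cells that bring the data closer to the end point cell."""
    (cx, cy), (dx, dy) = current, destination
    hops = []
    if dx != cx:
        hops.append((cx + (1 if dx > cx else -1), cy))
    if dy != cy:
        hops.append((cx, cy + (1 if dy > cy else -1)))
    return hops


def next_hop(current, destination, output_load):
    """Pick the candidate whose output FIFO is least busy (first one on a tie)."""
    hops = candidate_next_hops(current, destination)
    if not hops:
        return None                      # this cell is the end point cell
    return min(hops, key=lambda cell: output_load.get(cell, 0))


def route(source, destination, output_load=None):
    """Follow next_hop from the starting point cell to the end point cell."""
    output_load = output_load or {}
    path = [source]
    while path[-1] != destination:
        path.append(next_hop(path[-1], destination, output_load))
    return path
```

When the two cells share a row or column there is a single candidate at every step, which reproduces the straight-line case; otherwise there are two candidates and the load decides, as in the D, E, F, G example of Fig. 8.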
In actual implementation, when the thousands of MPUs in the cellular array compute together, how to deliver the output data of each cell to the master CPU becomes a problem. In general, each MPU can store its output data at an agreed address in the non-volatile random access memory of its cell, and the master CPU reads the data by polling the MPUs one by one. However, this does not suit every problem: in some problems only a few cells in the cellular array need to output data to the master CPU, and polling every MPU one by one is then too inefficient.
Therefore, in the cellular array computing system provided by an embodiment of the present invention, the cellular array is further provided with at least one full-time output cell. The full-time output cell acts as the end point cell that receives and stores the output data of other cells destined for the master CPU, and notifies the master CPU with an interrupt signal to read the output data.
In a specific implementation, a FIFO queue may also be set up in the non-volatile random access memory of the full-time output cell, and all output data from other cells destined for the master CPU is stored in this FIFO queue. The FIFO queue should have enough storage space to hold all the output data from other cells destined for the master CPU.
In actual implementation, one or several cells in the cellular array can be selected as full-time output cells; generally, cells whose position makes communication with the master CPU more convenient are chosen. An interrupt line is provided between the full-time output cell and the master CPU, over which the full-time output cell can send interrupt signals to the master CPU, such as: new output data from another cell has arrived, the FIFO queue set up in the MRAM is full, or the FIFO queue set up in the MRAM is empty.
Based on the above cellular array computing system provided with a full-time output cell, an embodiment of the present invention also provides a communication method in the cellular array computing system, including: after the full-time output cell receives and stores the output data of other cells destined for the master CPU, it sends the master CPU an interrupt signal notifying it to read; after receiving that interrupt signal, the master CPU reads the output data from the full-time output cell.
In a specific implementation, the other cells can send the output data to the full-time output cell as follows: any one of the other cells, acting as a starting point cell, sends the output data in a selected sending direction to an adjacent cell; when any cell in the cellular array receives the output data sent by an adjacent cell, if it determines that the end point cell ID carried in the output data matches the ID of this cell, then, because the end point cell ID carried in the output data is the ID of the full-time output cell, this cell is the full-time output cell and stores the output data in its non-volatile random access memory; otherwise this cell acts as a transfer cell and, after selecting a sending direction, relays the output data to a cell adjacent to this cell.
While the other cells send the output data to the full-time output cell, the starting point cell or a transfer cell can select the sending direction as follows: if a straight-line communication path can be formed between the starting point cell or transfer cell and the full-time output cell, the sending direction is the direction along that straight line from the starting point cell or transfer cell toward the full-time output cell; otherwise the sending direction is the direction from the starting point cell or transfer cell toward a candidate adjacent cell, a candidate adjacent cell being a cell that, among the cells adjacent to the starting point cell or transfer cell, is closer to the full-time output cell.
For the implementation of the full-time output cell of this embodiment, refer also to Fig. 9. Fig. 9 shows the master CPU, the cellular array, and the cellular array bus; each small square in the cellular array represents one cell, and the cell at point J (the square drawn with a bold border) is the full-time output cell. Fig. 9 further shows the structure of the full-time output cell, as indicated by the dashed arrow: the MRAM in the full-time output cell contains a FIFO queue that stores all output data from other cells destined for the master CPU.
Suppose the cell at point H and the cell at point I need to provide output data to the master CPU. They can send the output data to the cell at point J by means of cell-cell communication; the cell-cell communication paths from point H to point J and from point I to point J are shown in Fig. 9. Because cell-cell communication has already been described in detail above, it is not repeated here.
After the cell at point J receives the output data sent by the cell at point H or the cell at point I, it can send the master CPU an interrupt signal notifying it to read; after the master CPU receives that interrupt signal, it can read the output data from the cell at point J over the cellular array bus.
By providing a full-time output cell in the cellular array, using it as the end point cell that receives and stores the output data of other cells destined for the master CPU, and notifying the master CPU by interrupt signal to read the output data, the efficiency with which the master CPU reads output data is improved when only a few cells need to output data to the master CPU.
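The behavior of a full-time output cell can be sketched as a bounded queue plus an interrupt callback. The class name `OutputCell`, the queue capacity, the interrupt names, and the use of a Python callback in place of the interrupt line are assumptions made for this illustration.

```python
from collections import deque


class OutputCell:
    """Full-time output cell: queues results and signals the master CPU."""

    def __init__(self, capacity, raise_interrupt):
        self.fifo = deque()                      # stands in for the FIFO queue in MRAM
        self.capacity = capacity
        self.raise_interrupt = raise_interrupt   # stands in for the interrupt line

    def receive(self, source_id, data):
        """Store output data arriving from another cell."""
        if len(self.fifo) >= self.capacity:
            self.raise_interrupt("fifo_full")
            return False
        self.fifo.append((source_id, data))
        self.raise_interrupt("new_data")
        return True

    def read_all(self):
        """Master CPU drains the queue over the cellular array bus."""
        items = list(self.fifo)
        self.fifo.clear()
        return items


interrupts = []
cell_j = OutputCell(capacity=4, raise_interrupt=interrupts.append)
cell_j.receive("H", 1)
cell_j.receive("I", 2)
results = cell_j.read_all()
```

With this arrangement the master CPU touches one cell per result batch instead of polling every cell, which is the efficiency gain the paragraph above describes.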
Another example of applying the above cellular array computing system is introduced below.
Speech recognition can be performed by comparing the input voice signal against a known sound library; the comparison can be done in the time domain or in the frequency domain. When more and more words need to be compared (for example, taking different accents into account, the number can reach tens of thousands), the computing capability of only a few CPUs is insufficient for real-time speech recognition.
Cellular array computing system provided in an embodiment of the present invention is then very suitable for solving problems.
To this end, an embodiment of the present invention further provides a method for data comparison using the above cellular array computing system, comprising: the master CPU selects all cells in the cellular array, or the cells in a target region, and broadcasts the comparison program into the non-volatile random access memory of each selected cell; the master CPU writes the sample that each selected cell is responsible for comparing to an agreed address in that cell; the master CPU broadcasts an instruction to the microprocessors of the selected cells, so that each microprocessor completes initialization and then waits for the data to be compared; the master CPU broadcasts the data to be compared to the microprocessors of the selected cells; the microprocessor of each selected cell runs the comparison program to compare the received data with the sample that the cell is responsible for and, if the two match, sends the comparison result as output data to the full-time output cell, using the communication method of the cellular array computing system described above, for the master CPU to read.
In a specific implementation, the data to be compared may be voice data to be recognized, image data to be recognized, or any other data that needs to be compared.
In practice, each MPU continuously receives voice data and compares it. Typically, among hundreds to thousands of cells, only one or a few find that the data to be compared matches the sample they are responsible for; those cells send the comparison result to the full-time output cell, which notifies the master CPU by interrupt signal to receive it.
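The comparison flow above can be sketched as a small sequential simulation. Everything named here is an assumption made for illustration: the `Cell` class, the `AGREED_ADDRESS` value, the dictionary that stands in for non-volatile random access memory, and the use of simple equality as the comparison program. In the system described, the selected cells would run the comparison concurrently rather than in a loop.

```python
from collections import deque

AGREED_ADDRESS = 0x100  # hypothetical address agreed between master CPU and cells


class Cell:
    """Hypothetical cell: an ID plus a dictionary standing in for its NVRAM."""

    def __init__(self, cell_id):
        self.cell_id = cell_id
        self.nvram = {}

    def compare(self, data):
        # Stand-in for the broadcast comparison program: exact match only.
        return self.nvram[AGREED_ADDRESS] == data


def broadcast_compare(samples, data):
    """samples maps cell ID -> the sample that cell is responsible for."""
    cells = [Cell(cell_id) for cell_id in samples]
    # The master CPU writes each cell's sample to the agreed address.
    for cell in cells:
        cell.nvram[AGREED_ADDRESS] = samples[cell.cell_id]
    # The master CPU broadcasts the data; matching cells report to the
    # full-time output cell, modelled here as a FIFO queue.
    output_fifo = deque()
    for cell in cells:
        if cell.compare(data):
            output_fifo.append((cell.cell_id, data))
    return list(output_fifo)


samples = {(0, 0): "yes", (0, 1): "no", (1, 0): "maybe"}
matches = broadcast_compare(samples, "no")
```

Only the cell holding the matching sample produces output, so the master CPU has a single short queue to read no matter how many cells took part.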
If the data to be compared is voice data, the comparison can be carried out in the time domain or in the frequency domain. In the latter case, the master CPU can first segment the voice data and apply a fast Fourier transform (FFT), and then broadcast the voice data converted to the frequency domain to the MPUs of the selected cells.
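The segment-then-transform step performed by the master CPU can be illustrated with a textbook recursive radix-2 FFT. This is a sketch under assumptions: the frame length of 4 samples, the helper names `frames` and `fft`, and the choice to drop a trailing partial frame are illustrative only, and the source does not specify which FFT variant or frame length is used.

```python
import cmath


def frames(signal, size):
    """Split the signal into consecutive frames of `size` samples, dropping any remainder."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


# Two full frames of four samples; the ninth sample is a partial frame and is dropped.
signal = [1, 1, 1, 1, 1, 0, -1, 0, 9]
spectra = [fft(frame) for frame in frames(signal, 4)]
```

Each entry of `spectra` is the frequency-domain form of one segment, which is what the master CPU would broadcast to the selected cells.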
Performing data comparison with the above cellular array computing system equipped with a full-time output cell allows a large number of cells in the cellular array to run the comparison program simultaneously. The system therefore has strong parallel processing capability, addresses the communication bottleneck between CPU and memory in the prior art, and greatly improves the capability for real-time voice and image recognition.
As mentioned above, there is a simple way to broadcast information from one cell to a target region of the cellular array: the master CPU reads the information and then broadcasts it. This embodiment also provides another implementation, in which the inter-cell point-to-point communication function is extended to region mass-sending; this supports a higher degree of parallelism and a much higher total bandwidth.
In the cellular array computing system provided in this embodiment, any cell in the cellular array can also act as the starting point cell and carry out mass-send communication to all cells in a target region. A cell that takes part in the mass-send communication and lies inside the target region acts as the starting point cell, as the end point cell, or as both transfer cell and end point cell; a cell that takes part in the mass-send communication and lies outside the target region acts as the starting point cell or as a transfer cell.
In a specific implementation, the network controller connected to the microprocessor in each cell is used not only for inter-cell communication between any two cells but also, during mass-send communication, to control the sending and receiving of data that is sent out, relayed, or finally received; the network controller is also used to send interrupt signals to the microprocessor.
In practice, the original sender of an inter-cell mass-send communication (the cell acting as starting point cell) is responsible for indicating the target region, and the mass-send is still completed through a series of relays. Those skilled in the art will appreciate that inter-cell mass-send communication can also be regarded as a superposition of multiple inter-cell point-to-point communications, so its implementation can refer to that of communication between any two cells; for example, as mentioned earlier, the cells in the cellular array may likewise include one or more groups of first-in-first-out queues connected to the network controller. Details are not repeated here.
On the basis that the above cellular array computing system supports inter-cell mass-send communication, an embodiment of the present invention further provides an inter-cell mass-send communication method in the above cellular array computing system, comprising: when any cell in the cellular array acts as the starting point cell and initiates mass-send communication to all cells in a target region, if the starting point cell lies inside the target region, it sends the inter-cell mass-send data to all of its adjacent cells inside the target region and updates the target region for each such adjacent cell; otherwise it sends the inter-cell mass-send data to an adjacent cell in the direction approaching the target region; if a cell outside the target region receives inter-cell mass-send data from an adjacent cell, then, after determining that the target region indicated in the data does not include this cell, this cell acts as a transfer cell and relays the data to an adjacent cell in the direction approaching the target region; if a cell inside the target region receives inter-cell mass-send data from an adjacent cell, then, after determining that the target region indicated in the data includes this cell, this cell acts as the end point cell and stores the received data in its non-volatile random access memory or notifies its microprocessor to process the data; if the target region still contains cells adjacent to this cell, this cell also acts as a transfer cell, relays the received data to all of its adjacent cells inside the target region, and updates the target region for each such adjacent cell. The updated target region comprises one or more target regions obtained by dividing the target region before the update; each cell that, within the pre-update target region, is adjacent to the cell that sent or relayed the inter-cell mass-send data is contained in a separate updated target region, and the cell that sent or relayed the data is excluded from the updated target regions.
It should be noted that, because the master CPU can also broadcast the data of a cell to a target region of the cellular array, the mass-send data involved in inter-cell mass-send communication is called "inter-cell mass-send data" in this embodiment to distinguish it from the "broadcast data of the master CPU". The cell that initiates an inter-cell mass-send communication specifies the target region: the IDs of all cells in the target region, or the range of those IDs, are indicated in the inter-cell mass-send data. Any cell that receives the data can therefore determine, from the target region indicated in it, whether the data is to be finally received by this cell, needs to be relayed further to other adjacent cells, or both.
In addition, updating the target region for each adjacent cell specifically means dividing the pre-update target region into one or more target regions (with the cell that sent or relayed the inter-cell mass-send data excluded from the updated target regions). Each of these target regions contains one adjacent cell (that is, a cell that, within the pre-update target region, is adjacent to the cell that sent or relayed the data), and each adjacent cell continues the inter-cell mass-send communication within its own updated target region. Correspondingly, the target region indicated in the inter-cell mass-send data is updated as well.
This embodiment is described using the communication mode between adjacent cells shown in Fig. 3 and a rectangular target region determined by the starting point cell that initiates the mass-send communication. It should be noted that the inter-cell mass-send communication mode given in this embodiment is a comparatively convenient and efficient one for practical implementation; those skilled in the art will understand that, in other embodiments, the inter-cell mass-send communication method in the above cellular array computing system is equally applicable to other communication modes between adjacent cells or to target regions of other shapes.
In a specific implementation, the way data is sent or relayed depends on the position of the cell acting as starting point cell or transfer cell.
When a first cell acting as starting point cell or transfer cell lies on a corner of the rectangular target region: if one of the two sides of the rectangular target region that meet at the first cell contains only 1 cell, the updated target region is the rectangular region formed by the other of those two sides after the first cell is excluded; otherwise the updated target region comprises two rectangular target regions, one of which is the rectangular region formed by either of those two sides after the first cell is excluded. It should be noted that "first cell" in this embodiment is a general term for cells lying on a corner of the rectangular target region.
Referring to Fig. 10, suppose the cell at point K is the starting point cell that initiates an inter-cell mass-send communication, or a transfer cell responsible for relaying inter-cell mass-send data, and rectangular target region 101 is the target region determined before the cell at point K sends or relays the data. The cell at point K lies inside rectangular target region 101, on one of its corners. Because the horizontal side of rectangular target region 101 contains only 1 cell, the cell at point K has only one neighbor that can serve as the next relay. The network controller of the cell at point K therefore sends the inter-cell mass-send data to the cell at point L and updates rectangular target region 101; the target region formed by the update is rectangular target region 102, which is equivalent to rectangular target region 101 with the cell at point K excluded. As the target region is updated repeatedly, relaying stops once only the last cell remains in the target region.
Suppose the cell at point M is likewise the starting point cell that initiates an inter-cell mass-send communication, or a transfer cell responsible for relaying inter-cell mass-send data, and rectangular target region 103 is the target region determined before the cell at point M sends or relays the data. The cell at point M lies inside rectangular target region 103, on one of its corners. Because the two sides of rectangular target region 103 that meet at this corner each contain more than 1 cell, the cell at point M has two neighbors that can serve as the next relay. The network controller of the cell at point M therefore sends the inter-cell mass-send data to the cell at point N and the cell at point O and updates rectangular target region 103. The updated target region comprises two rectangular target regions, rectangular target region 104 and rectangular target region 105, which is equivalent to rectangular target region 103 with the cell at point M excluded. Rectangular target regions 104 and 105 are then treated as independent target regions in which data relay continues in the manner described above. As the target region is updated repeatedly, relaying stops once only the last cell remains in a target region.
When a second cell acting as starting point cell or transfer cell lies on a side of the rectangular target region: if the side of the rectangular target region adjacent to the side on which the second cell lies contains only 1 cell, the updated target region comprises the two rectangular target regions formed by the side on which the second cell lies after the second cell is excluded; otherwise the updated target region comprises three rectangular target regions, two of which are the two rectangular regions formed by the side on which the second cell lies after the second cell is excluded. It should be noted that "second cell" in this embodiment is a general term for cells lying on a side of the rectangular target region.
Referring to Fig. 11, suppose the cell at point P is the starting point cell that initiates an inter-cell mass-send communication, or a transfer cell responsible for relaying inter-cell mass-send data, and rectangular target region 111 is the target region determined before the cell at point P sends or relays the data. The cell at point P lies inside rectangular target region 111, on one of its sides. Because the side of rectangular target region 111 adjacent to the side on which the cell at point P lies contains more than 1 cell, the cell at point P has three neighbors that can serve as the next relay. The network controller of the cell at point P sends the inter-cell mass-send data to the cell at point Q, the cell at point R and the cell at point S, and updates rectangular target region 111. The updated target region comprises three rectangular target regions, namely rectangular target regions 112, 113 and 114, which is equivalent to rectangular target region 111 with the cell at point P excluded. Rectangular target regions 112 and 113 are the two rectangular regions formed by the side on which the cell at point P lies after the cell at point P is excluded. Rectangular target regions 112, 113 and 114 are then treated as independent target regions in which data relay continues in the manner described above. As the target region is updated repeatedly, relaying stops once only the last cell remains in a target region.
It will be understood that if (in a case not shown in Fig. 11) the side of the target region adjacent to the side on which the cell at point P lies contains only 1 cell, the cell at point P has two neighbors that can serve as the next relay. The network controller of the cell at point P then sends the inter-cell mass-send data to the cell at point Q and the cell at point R and updates the target region; the updated target region comprises two rectangular target regions, namely rectangular target regions 112 and 113.
When a third cell acting as starting point cell lies in the interior of the rectangular target region, the updated target region comprises four rectangular target regions: two of them are the two rectangular regions formed by the row or column in which the third cell lies after the third cell is excluded, and the other two are the two rectangular regions formed when the pre-update rectangular target region is divided by that row or column. It should be noted that "third cell" in this embodiment is a general term for cells lying in the interior of the rectangular target region, where the interior of the rectangular target region means the region other than the corners and sides.
Referring to Fig. 12, suppose the cell at point T is the starting point cell that initiates an inter-cell mass-send communication (in this embodiment the cell at point T cannot be a transfer cell responsible for relaying inter-cell mass-send data), and rectangular target region 121 is the target region determined before the cell at point T sends the data. The cell at point T lies in the interior of rectangular target region 121 and has four neighbors that can serve as the next relay. The network controller of the cell at point T sends the inter-cell mass-send data to the cell at point U, the cell at point V, the cell at point W and the cell at point X, and updates rectangular target region 121. The updated target region comprises four rectangular target regions, namely rectangular target regions 122, 123, 124 and 125, which is equivalent to rectangular target region 121 with the cell at point T excluded. Rectangular target regions 122 and 123 are the two rectangular regions formed by the row in which the cell at point T lies after the cell at point T is excluded; rectangular target regions 124 and 125 are the two rectangular regions formed when rectangular target region 121 is divided by that row. Rectangular target regions 122, 123, 124 and 125 are then treated as independent target regions in which data relay continues in the manner described above. As the target region is updated repeatedly, relaying stops once only the last cell remains in a target region.
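The corner, side and interior cases of Figs. 10 to 12 can be expressed as one region-splitting routine. The following Python sketch is an illustration under assumptions, not the patented implementation: a region is written as a hypothetical tuple `(r0, c0, r1, c1)` of inclusive row and column bounds, cells are `(row, column)` pairs, and where the text allows either a row or a column to be used (corner and interior cells) the sketch arbitrarily picks the row. The function `multicast` simulates the relays with a queue and counts how many times each cell is reached.

```python
from collections import deque


def split_region(region, cell):
    """Return [(neighbor, updated_region), ...] for a sender inside `region`."""
    r0, c0, r1, c1 = region
    r, c = cell
    assert r0 <= r <= r1 and c0 <= c <= c1
    out = []
    # A sender on a left or right side (but not on a corner) splits along its column.
    if c in (c0, c1) and r0 < r < r1:
        if r > r0:
            out.append(((r - 1, c), (r0, c, r - 1, c)))    # strip above the sender
        if r < r1:
            out.append(((r + 1, c), (r + 1, c, r1, c)))    # strip below the sender
        if c > c0:
            out.append(((r, c - 1), (r0, c0, r1, c - 1)))  # remaining block to the left
        if c < c1:
            out.append(((r, c + 1), (r0, c + 1, r1, c1)))  # remaining block to the right
    else:
        # Corner, top or bottom side, or interior: split along the sender's row.
        if c > c0:
            out.append(((r, c - 1), (r, c0, r, c - 1)))    # strip left of the sender
        if c < c1:
            out.append(((r, c + 1), (r, c + 1, r, c1)))    # strip right of the sender
        if r > r0:
            out.append(((r - 1, c), (r0, c0, r - 1, c1)))  # remaining block above
        if r < r1:
            out.append(((r + 1, c), (r + 1, c0, r1, c1)))  # remaining block below
    return out


def multicast(region, start):
    """Simulate the relays; return how many times each cell was reached."""
    reached = {}
    queue = deque([(start, region)])
    while queue:
        cell, current = queue.popleft()
        reached[cell] = reached.get(cell, 0) + 1
        queue.extend(split_region(current, cell))
    return reached


region = (0, 0, 3, 4)             # 4 rows by 5 columns
counts = multicast(region, (2, 2))
```

Because the updated regions are disjoint, exclude the sender, and together cover the rest of the pre-update region, every cell in the target region is reached exactly once in this sketch. Each receiving cell also sits on a corner or side of its updated region, never in its interior.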
In this embodiment, when a fourth cell acting as starting point cell or transfer cell lies outside the target region: if a straight communication path can be formed between the fourth cell and some cell in the target region, the fourth cell sends or relays the inter-cell mass-send data along that straight line, in the direction from the fourth cell toward the target region; otherwise the data is sent in the direction from the fourth cell to a candidate adjacent cell, where the candidate adjacent cells are those cells adjacent to the fourth cell that are closer to the target region. It should be noted that "fourth cell" in this embodiment is a general term for cells lying outside the rectangular target region.
Referring to Fig. 13, suppose the cell at point Y1 is the starting point cell that initiates an inter-cell mass-send communication, and rectangular target region 131 is the target region determined before the cell at point Y1 sends the data. The cell at point Y1 lies outside rectangular target region 131. Because this cell lies between the extension lines of two opposite sides of the rectangular target region, a straight communication path can be formed between it and the cell at point Y3 in the rectangular target region, and only one neighbor can serve as the next relay. The network controller of the cell at point Y1 sends the inter-cell mass-send data to this neighbor, the cell at point Y2, which acts as a transfer cell responsible for relaying the data. The cell at point Y2 relays the data in the direction of the dashed arrow in Fig. 13 until it reaches the cell at point Y3. The cell at point Y3 lies on a side of rectangular target region 131, and the relay process inside rectangular target region 131 continues in the manner described above.
Still referring to Fig. 13, suppose the cell at point Z1 is the starting point cell that initiates an inter-cell mass-send communication, and rectangular target region 131 is the target region determined before the cell at point Z1 sends the data. The cell at point Z1 lies outside rectangular target region 131. Because this cell does not lie between the extension lines of two opposite sides of the rectangular target region, no straight communication path can be formed between it and any cell in the rectangular target region. Two neighbors can then serve as the next relay, the cell at point Z2 and the cell at point Z3; these two cells are the candidate adjacent cells of the cell at point Z1, because among the cells adjacent to the cell at point Z1 they are the ones closer to rectangular target region 131. In practice, either one may be chosen arbitrarily, or the less loaded one may be chosen according to the actual communication situation, where the less loaded cell is the one with fewer pending communication tasks for outgoing data. From the cell at point Z1, the inter-cell mass-send data is relayed along either of the two feasible communication paths until it reaches the cell at point Z4. The cell at point Z4 lies on a corner of rectangular target region 131, and the relay process inside rectangular target region 131 continues in the manner described above.
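The choice of the next relay for a cell outside the target region (the Y1 and Z1 cases of Fig. 13) can be sketched as follows. This is an illustration under assumptions: the region tuple `(r0, c0, r1, c1)` with inclusive bounds, the helper names `next_hops` and `route_to_region`, and the `load` callback that models how busy a candidate neighbor is are all hypothetical.

```python
def next_hops(region, cell):
    """Return the adjacent cells that are closer to the region (empty if already inside)."""
    r0, c0, r1, c1 = region
    r, c = cell
    # Each step is +1, -1, or 0 when that coordinate is already within the region's range.
    dr = (r < r0) - (r > r1)
    dc = (c < c0) - (c > c1)
    hops = []
    if dr:
        hops.append((r + dr, c))
    if dc:
        hops.append((r, c + dc))
    return hops


def route_to_region(region, start, load=lambda cell: 0):
    """Relay hop by hop, preferring the less loaded candidate, until inside the region."""
    path = [start]
    while True:
        hops = next_hops(region, path[-1])
        if not hops:
            return path
        path.append(min(hops, key=load))


region = (2, 2, 4, 5)
straight = route_to_region(region, (3, 0))   # like Y1: a straight line exists
diagonal = route_to_region(region, (0, 0))   # like Z1: two candidate neighbors
busy = lambda cell: 1 if cell == (1, 0) else 0
avoiding = route_to_region(region, (0, 0), load=busy)
```

When the sender's row or column already overlaps the region, only one candidate exists and the path is a straight line that enters the region on a side. Otherwise there are two candidates at each step until one coordinate falls within range, and the path enters at the nearest corner of the region.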
The inter-cell mass-send communication method in the cellular array computing system provided in this embodiment extends the inter-cell point-to-point communication function to region mass-sending. It supports a higher degree of parallelism and a much higher total bandwidth, and thereby further improves the overall performance of the computing system.
As mentioned above, each cell in the cellular array of this embodiment of the present invention combines three functions: memory, storage and computation. The non-volatile random access memory in a cell provides random access to the data involved in the microprocessor's computations and also stores the instruction code of software and the data that must be persisted. However, non-volatile random access memory is usually relatively expensive, so the portion of it that a cell can use as memory is limited. When the microprocessor in a cell processes a large amount of data, the limited memory space reduces its processing efficiency; how to extend the memory space of a cell is therefore a problem to be solved.
Based on the above consideration, an embodiment of the present invention gives another structure of the cellular array computing system. As shown in Fig. 14, in addition to the master CPU, cellular array and cellular array bus described above, the cellular array computing system may further include at least one memory unit array. The memory unit array is a two-dimensional array made up of more than one memory unit; the cellular array and all memory unit arrays are stacked into a three-dimensional structure, and the memory units in each memory unit array are connected one-to-one with the cells in the cellular array. The memory unit works together with the non-volatile random access memory, and the two jointly provide random access to the data involved in the microprocessor's computations.
In practice, the non-volatile random access memory in a cell can be MRAM, and the memory unit array can be an MRAM, DRAM or SRAM silicon die. Generally, one or more lower-cost DRAM dies can be chosen, where each DRAM die is a memory unit array formed of memory units whose positions match those of the cells in the cellular array. All DRAM dies are then combined with the cellular array die by 3D stacking, and any memory unit can establish a communication link with its corresponding cell in the cellular array through a through-silicon via (TSV), thereby extending the memory of each cell.
In this embodiment of the present invention, at least one memory unit array made up of more than one memory unit is stacked with the cellular array into a three-dimensional structure, the memory units in each memory unit array are connected one-to-one with the cells in the cellular array, and the memory units are used for random access to the data involved in the microprocessor's computations. In this way the memory space of each cell in the cellular array can be extended at lower cost, improving the processing efficiency of the microprocessor in the cell.
It should be pointed out that Fig. 14 shows only the case in which one memory unit array is stacked with the cellular array into a three-dimensional structure; those skilled in the art will equally understand the case in which more than one memory unit array is stacked with the cellular array into a three-dimensional structure.
Those skilled in the art will also understand that the communication methods described above are equally applicable to a cellular array computing system that includes the memory unit array: communication between the master CPU and each cell in the cellular array over the cellular array bus; communication between any two cells without relying on the master CPU; mass-send communication from any cell to all cells in a target region; and the arrangement of a full-time output cell in the cellular array as the end point cell that receives and stores the output data of other cells for the master CPU to read.
It should be noted that, because the memory space of each cell in the cellular array has been extended, the master CPU can access not only the non-volatile random access memory of a cell but also the memory unit corresponding to that cell (when more than one memory unit array is stacked with the cellular array into the three-dimensional structure, each cell correspondingly has more than one memory unit). The communication between the master CPU and each cell in the cellular array over the cellular array bus therefore includes at least one of the following: reading or writing, by address, the non-volatile random access memory of any cell in the cellular array or its corresponding memory unit; broadcasting data to the non-volatile random access memory or corresponding memory unit of each cell in a target region, writing it at the same relative address in the non-volatile random access memory or corresponding memory unit of each of those cells; sending an instruction or data to, or reading state from, the microprocessor of any cell in the cellular array; and broadcasting an instruction to the microprocessors of all cells in a target region.
When a cell in the cellular array further includes a bus controller and a cell-internal bus, the cell-internal bus connects the microprocessor and the non-volatile random access memory and is also connected to the memory unit corresponding to the cell. The bus controller is connected to the cellular array bus, the microprocessor and the cell-internal bus. The bus controller identifies communication between the master CPU and this cell, and either connects the microprocessor, to pass on instructions or data sent by the master CPU or to read state, or connects, through the cell-internal bus, the non-volatile random access memory or the memory unit corresponding to this cell, to read or write data.
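How a bus controller might route a master CPU access either to the cell's non-volatile random access memory or to a stacked memory unit can be sketched as below. The source does not specify an address layout, so the flat address map used here (non-volatile memory first, followed by each stacked memory unit in turn), the class name `CellBusController`, and the list that stands in for the microprocessor's input are assumptions made purely for illustration.

```python
class CellBusController:
    """Hypothetical per-cell bus controller with a flat, assumed address map."""

    def __init__(self, nvram_size, unit_size, unit_count):
        self.nvram = bytearray(nvram_size)  # the cell's non-volatile RAM
        self.units = [bytearray(unit_size) for _ in range(unit_count)]  # stacked memory units
        self.unit_size = unit_size
        self.mpu_inbox = []  # stands in for instructions or data passed to the microprocessor

    def _locate(self, address):
        # Addresses below the NVRAM size hit the NVRAM; the rest map onto
        # the stacked memory units, one after another.
        if address < len(self.nvram):
            return self.nvram, address
        index, offset = divmod(address - len(self.nvram), self.unit_size)
        return self.units[index], offset

    def write(self, address, value):
        memory, offset = self._locate(address)
        memory[offset] = value

    def read(self, address):
        memory, offset = self._locate(address)
        return memory[offset]

    def send_to_mpu(self, message):
        self.mpu_inbox.append(message)


controller = CellBusController(nvram_size=16, unit_size=8, unit_count=2)
controller.write(3, 0xAA)           # lands in the NVRAM
controller.write(16 + 8 + 1, 0x55)  # lands in the second stacked memory unit
controller.send_to_mpu("start")
```

A single address space keeps the master CPU's view uniform, so broadcasting to the same relative address in every cell works the same way whether the target is the non-volatile memory or a stacked memory unit.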
As mentioned above, each cell in the cellular array of this embodiment of the present invention combines three functions: memory, storage and computation. The non-volatile random access memory in a cell provides random access to the data involved in the microprocessor's computations and also stores the instruction code of software and the data that must be persisted. However, non-volatile random access memory is usually relatively expensive, so the portion of it that a cell can use as storage is likewise limited. When a large number of files or a large amount of data needs to be stored in the cells of the cellular array, the limited storage space cannot meet the storage requirement and may even reduce the processing efficiency of the microprocessor; how to extend the storage space of a cell is therefore also a problem to be solved.
Based on the above consideration, an embodiment of the present invention gives another structure of the cellular array computing system. As shown in Fig. 15, in addition to the master CPU, cellular array and cellular array bus described above, the cellular array computing system may further include at least one storage unit array. The storage unit array is a two-dimensional array made up of more than one storage unit; the cellular array and all storage unit arrays are stacked into a three-dimensional structure, and the storage units in each storage unit array are connected one-to-one with the cells in the cellular array. The storage unit works together with the non-volatile random access memory, and the two jointly store the instruction code of software and the data that must be persisted.
In practice, the non-volatile random access memory in a cell can be MRAM, and the storage unit array can be a flash memory die. Generally, one or more NAND flash dies, which cost less than MRAM, can be chosen, where each NAND flash die is a storage unit array formed of storage units whose positions match those of the cells in the cellular array. All NAND flash dies are then combined with one cellular array die by 3D stacking, and any storage unit can be connected vertically to its corresponding cell in the cellular array through a through-silicon via (TSV) to establish a communication link, thereby extending the storage space of each cell.
In a specific implementation, each cell in the cellular array further includes a storage controller connected to the microprocessor, which controls data storage access to the storage unit connected to the cell. After one or more NAND flash dies are combined with one cellular array die by 3D stacking, each cell in the cellular array can be given a NAND flash controller, through which the MPU of the cell reads and writes the storage unit corresponding to the cell. When a large number of files or a large amount of data is stored in NAND flash, a data search can be carried out by each cell over its own NAND channel and is thereby greatly accelerated. Compiling a large software system likewise requires compiling thousands of source code files; when these source files are stored in such a cellular array computing system, compilation is similarly greatly accelerated.
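The per-cell search described above can be mimicked in software by giving each cell its own worker. This sketch only illustrates the parallel structure and rests on assumptions: the `storage` dictionary that maps a cell ID to the files held in that cell's NAND flash, the helper names, and the use of Python threads in place of independent microprocessors are all hypothetical, and no speed-up is claimed for the sketch itself.

```python
from concurrent.futures import ThreadPoolExecutor


def search_cell(cell_id, files, keyword):
    """Search only the files held by one cell's own storage unit."""
    return [(cell_id, name) for name, text in files.items() if keyword in text]


def parallel_search(storage, keyword):
    """Run one search task per cell and merge the hits."""
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(search_cell, cell_id, files, keyword)
            for cell_id, files in storage.items()
        ]
        hits = []
        for future in futures:
            hits.extend(future.result())
    return sorted(hits)


# Hypothetical contents of three cells' storage units.
storage = {
    (0, 0): {"a.c": "int main(void)", "b.c": "void helper(void)"},
    (0, 1): {"c.c": "static int counter"},
    (1, 0): {"d.c": "int other(void)"},
}
hits = parallel_search(storage, "int ")
```

Each task reads only its own cell's files, which mirrors the point of the paragraph: the search does not funnel all data through one shared channel.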
The cellular array computing system that includes the storage unit array, as provided in the embodiment of the present invention, can extend the storage space of each cell in the cellular array at a relatively low cost and so improve the data storage capacity of each cell.
It should be noted that Figure 15 shows only the case where a single storage unit array is stacked with the cellular array into a three-dimensional structure; those skilled in the art will likewise understand the case where more than one storage unit array is stacked with the cellular array to form a three-dimensional structure.
Those skilled in the art will appreciate that the communication methods described earlier are equally applicable to a cellular array computing system that includes the storage unit array. These methods include: the master CPU communicating with each cell in the cellular array through the cellular array bus; any two cells communicating without relying on the master CPU; any cell mass-sending (group sending) to all cells in a target area; and providing dedicated (full-time) output cells in the cellular array that act as end-point cells, receiving and storing the data that other cells output to the master CPU so that the master CPU can read it.
As mentioned above, the space in a cell's non-volatile random access memory that serves as memory and the space that serves as storage are both very limited, so how to extend the memory and the storage space of a cell at the same time is a problem to be solved. Based on this consideration, an embodiment of the present invention provides yet another structure of the cellular array computing system. As shown in Figure 16, in addition to the master CPU, cellular array, and cellular array bus, the cellular array computing system may further include at least one of the above memory unit arrays and at least one of the above storage unit arrays. For the implementation of a cellular array computing system that includes both a memory unit array and a storage unit array, refer to the implementations described above for systems that include only a memory unit array or only a storage unit array; details are not repeated here.
The cellular array computing system that includes both the memory unit array and the storage unit array, as provided in the embodiment of the present invention, can extend the storage space and the memory space of each cell in the cellular array at a relatively low cost. This improves both the data storage capacity of each cell and the processing efficiency of the microprocessor in the cell, and thereby further improves the overall performance of the computing system.
A high-end image sensor can capture image data in an extremely short time (on the order of microseconds), but the data volume of one frame of a high-definition image is very large. In current camera systems, because of the communication bottleneck between the CPU and the memory and storage in existing computer architectures, reading out the image data usually takes 1/60 to 1/30 of a second. In the vast majority of camera systems, therefore, the video processing capability falls far behind the speed at which image data is acquired.
Camera systems in the prior art thus suffer from the communication bottleneck between the CPU and the memory and storage in the computer architecture they use. This bottleneck greatly reduces the overall performance of the computing system, so that the video processing capability of current camera systems falls far behind the speed at which the image sensor in the camera system acquires image data.
To solve the above problems, an embodiment of the present invention also provides a camera system that uses the above cellular array computing system. As shown in Figure 17, the camera system includes a cellular array computing system and an image sensor. The cellular array computing system includes a master CPU, a cellular array, and a cellular array bus; for details, refer to the description of the related embodiments above, which is not repeated here. The image sensor is a two-dimensional array composed of more than one image acquisition unit. The cellular array and the image sensor are stacked into a three-dimensional structure, and the image acquisition units in the image sensor are connected one-to-one with the cells in the cellular array. Each image acquisition unit acquires image data for processing by the cells in the cellular array.
In actual implementation, the image sensor can be one of the CMOS image sensors that are currently popular. In a low-end CMOS image sensor, the photosensitive devices and the other circuits (such as signal amplification circuits and analog-to-digital conversion circuits) can be made on the same face of the silicon chip, whereas in a high-end CMOS image sensor the other circuits can be placed on the back face and connected to the photosensitive face through TSVs. In either case, the large number of image acquisition units in the image sensor can be divided into a two-dimensional array whose positions match those of the cells in the cellular array. The image sensor is then 3D-stacked with the cellular array silicon chip, and each image acquisition unit can establish a communication link with its corresponding cell in the cellular array through a TSV.
It should be noted that, as those skilled in the art know, the active surface of each image acquisition unit of the image sensor shown in Figure 17 needs to be on the underside of its silicon chip, because the lens can only be placed below; otherwise, light coming from above would be blocked by the several other silicon chips. In other embodiments, the image sensor can instead be stacked on top of the cellular array silicon chip.
In addition, in actual implementation, each frame is divided among the image acquisition units of the image sensor, with each unit responsible for acquiring its corresponding portion of the frame. The portions of a frame can then be processed in parallel, which improves image processing efficiency.
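The division of a frame among image acquisition units can be sketched as follows. This is a minimal illustration, assuming the frame dimensions divide evenly by the grid size; `split_frame` and `process_tile` are hypothetical names, and the per-tile mean is only a stand-in for whatever processing a cell would actually perform.

```python
def split_frame(frame, grid_rows, grid_cols):
    """Split a 2D frame (list of pixel rows) into tiles keyed by (row, col) of the unit."""
    height, width = len(frame), len(frame[0])
    tile_h, tile_w = height // grid_rows, width // grid_cols
    tiles = {}
    for r in range(grid_rows):
        for c in range(grid_cols):
            rows = frame[r * tile_h:(r + 1) * tile_h]
            tiles[(r, c)] = [row[c * tile_w:(c + 1) * tile_w] for row in rows]
    return tiles

def process_tile(tile):
    """Placeholder per-cell processing: the mean pixel value of the tile."""
    return sum(sum(row) for row in tile) / (len(tile) * len(tile[0]))

# A 4x4 test frame split across a 2x2 grid of acquisition units and cells.
frame = [[x + 4 * y for x in range(4)] for y in range(4)]
tiles = split_frame(frame, 2, 2)
means = {pos: process_tile(tile) for pos, tile in tiles.items()}
print(means)
```

Because no tile depends on any other, each `process_tile` call could run in a different cell at the same time, which is the parallelism the paragraph describes.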
In this embodiment, each cell in the cellular array can also be configured with an image processor, which processes the image data acquired by the image acquisition unit connected to that cell. Of course, in actual implementation, the image processor can also be integrated into the microprocessor of the cell.
When the number of cells in the cellular array reaches the thousands, the image sensor has image acquisition units on the same scale. Each frame can then be read out and processed through thousands of channels simultaneously, improving the image processing capability of the camera system by on the order of a hundred times and meeting the higher demands of high-speed imaging.
In addition, the inventors further considered that the problem of "the video processing capability of prior-art camera systems falling far behind the speed at which the image sensor acquires image data, which leaves the image processing capability of the camera system relatively limited" affects more than high-speed digital photography, for which there is still no good solution. Some image recognition applications have even stricter speed requirements. A future automatic vehicle control system, for example, may need to recognize a frame and react to it within an extremely short time, and an effective solution for this is likewise lacking.
To this end, an embodiment of the present invention also provides an image recognition system that includes a recognition unit and the camera system provided in the embodiment of the present invention. The recognition unit recognizes the images obtained after processing by the camera system.
Those skilled in the art will appreciate that when the camera system provided in the embodiment of the present invention can read and process each frame faster than the prior art, each frame can naturally also be recognized faster.
Therefore, by applying a camera system that includes the cellular array computing system to an image recognition system, the processing speed of each frame is greatly increased, so each frame can be recognized in a shorter time. This gives the image recognition system a faster recognition capability and meets the requirements of high-speed image recognition.
The specific implementation of above-mentioned camera system provided in an embodiment of the present invention and image identification system can also refer to upper The associated description of cellular array computing system is stated, details are not described herein again.
As mentioned above, the space that serves as memory and the space that serves as storage in the non-volatile random access memory of each cell of the cellular array computing system are both very limited, so how to extend the memory and the storage space of a cell at the same time is a problem to be solved. The embodiment of the present invention has therefore provided a cellular array computing system that, in addition to the master CPU, cellular array, and cellular array bus, includes both the storage unit array and the memory unit array. Such a system extends the memory and the storage space of each cell at the same time and is especially suitable for storage and processing tasks with large data volumes. Because the data volume of each frame of high-definition image acquired by the image sensor in a camera system is large, extending the memory and storage space of each cell of the cellular array computing system in the camera system is particularly necessary.
Based on the above considerations, an embodiment of the present invention also provides another structure of the camera system. As shown in Figure 18, the camera system includes a cellular array computing system and an image sensor. The cellular array computing system includes a master CPU, a cellular array, and a cellular array bus; for details, refer to the description of the related embodiments above, which is not repeated here. In this embodiment, the cellular array computing system further includes at least one storage unit array and at least one memory unit array. The storage unit array is a two-dimensional array composed of more than one storage unit; the cellular array and the one or more storage unit arrays are stacked into a three-dimensional structure, and the storage units in each storage unit array are connected one-to-one with the cells in the cellular array. The storage units store the instruction code of the software and the data that needs to be persisted. The memory unit array is a two-dimensional array composed of more than one memory unit; the cellular array and the one or more memory unit arrays are stacked into a three-dimensional structure, and the memory units in each memory unit array are connected one-to-one with the cells in the cellular array. The memory units provide random access to the data involved when the microprocessor performs calculations.
In actual implementation, the storage unit array can be a flash memory silicon chip; the memory unit array can be an MRAM, DRAM, or SRAM silicon chip; the cellular array is on one silicon chip; and the image sensor can be one of the CMOS image sensors that are currently popular.
For the cellular array computing system included in this other structure of the camera system, refer also to Figure 16.
At least one storage unit array composed of more than one storage unit and at least one memory unit array composed of more than one memory unit are each stacked with the cellular array into a three-dimensional structure, with the storage units in each storage unit array and the memory units in each memory unit array connected one-to-one with the cells in the cellular array. In this way, the storage space and the memory space of each cell in the cellular array can be extended at a relatively low cost. This improves both the data storage capacity of each cell and the processing efficiency of the microprocessor in the cell, and thereby further improves the image processing capability of the camera system.
It should be noted that this other structure of the camera system is described using the example of a cellular array computing system that includes both at least one storage unit array and at least one memory unit array. In other embodiments, the cellular array computing system of the camera system may include only one of the storage unit array and the memory unit array, as in the cellular array computing systems shown in Figure 14 or Figure 15.
In addition, in the camera system structure shown in Figure 18, the image sensor and the cellular array silicon chip are stacked directly together, with no memory unit array silicon chip or storage unit array silicon chip between them. This keeps the connection between each image acquisition unit and its corresponding cell short, so that the images acquired by the image sensor can be read and processed by the cells in the cellular array more quickly, improving image processing efficiency.
Of course, an embodiment of the present invention also provides an image recognition system that includes a recognition unit and the camera system shown in Figure 18. The recognition unit recognizes the images obtained after processing by the camera system.
The specific implementation of above-mentioned camera system provided in an embodiment of the present invention and image identification system can also refer to upper The description of relevant cell array computation system is stated, details are not described herein again.
Neural networks are a common computational method in machine learning that imitates the working principle of the human brain, generally using concepts such as the perceptron or the neuron. The learning (training) process of a neural network involves a massive amount of computation. A neural network is in effect a function with a large number of parameters (possibly tens of thousands). Training requires a large number of cases, each with input data and a correct answer, which are used to adjust these parameters so as to achieve the goal of learning. Because neural network computation involves a very large amount of calculation, and because the communication bottleneck between the CPU and the memory and storage in prior-art computer architectures seriously degrades overall performance, those architectures are poorly suited to implementing neural network computation efficiently.
To this end, based on the cellular array computing system described above, an embodiment of the present invention also provides a method for implementing neural network computation using that system, including: storing the code of one or more neuron functions in each cell of the cellular array that participates in the neural network computation; the master CPU selecting one or more cells to execute the neuron function code stored in each of them and to output the execution results to one or more target cells; and any cell participating in the neural network computation receiving the neuron function execution results output by other cells as its own input data, executing the neuron function code it stores based on all the input data from other cells, and then either outputting the execution result to the cells holding the neuron functions that need it, storing it at a preset address for the master CPU to read, or outputting it to the master CPU.
Nerve is realized in order to be better understood from the above-mentioned cellular array computing system of application provided in an embodiment of the present invention The method of network calculations, it is necessary to first the principle of neural computing is briefly described.
Neural networks are a common algorithm in machine learning that imitates the working principle of the human brain. The human brain is composed of a large number of neurons. Each neuron can make contact with other neurons through a large number (thousands) of synapses on its dendrites and so receive input; the coupling strength of a synapse has a memory function; and the neuron outputs an excitatory or inhibitory signal, which the axon can carry to a large number of distant neurons.
In the neural network computation method formed by imitating the working principle of the human brain, a neuron is simply a function. As shown in Figure 19, it has many inputs (x1, x2, and x3 are three of them), and each input corresponds to a weight. The usual computation multiplies each input by its weight and sums the products. The neuron outputs 0 or 1 (determined by a threshold), or a value between 0 and 1. It has many internal parameters (weight parameters, threshold parameters, and so on), and the process of adjusting these parameters is the process of machine learning.
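The neuron function just described can be written in a few lines. This is an illustrative sketch only: the function name `neuron`, the choice of "greater than or equal to" for the threshold comparison, and the sample weights are assumptions that do not come from the patent.

```python
def neuron(inputs, weights, threshold):
    """Multiply each input by its weight, sum the products, and compare with the threshold."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum >= threshold else 0

# Three inputs x1, x2, x3 with hypothetical weights and threshold.
weights = [0.5, 0.25, 0.75]
print(neuron([1, 0, 1], weights, threshold=1.0))  # weighted sum 1.25
print(neuron([1, 1, 0], weights, threshold=1.0))  # weighted sum 0.75
```

Adjusting `weights` and `threshold` so that the outputs match known correct answers is the learning process that the paragraph refers to.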
A typical neural network is a network in which the outputs and inputs of a large number of neurons are connected together, usually organized into a multi-layer architecture. As shown in Figure 20, the neural network in the figure is organized into three layers: the output of each first-layer neuron is sent to every second-layer neuron, and the output of each second-layer neuron is sent to every third-layer neuron. In Figure 20, the first layer can be called the input layer, the second layer the hidden layer, and the third layer the output layer; the values 0, 1, 2, ..., 8, 9 output by the output layer are the final result of the neural network computation. Neural networks usually have this kind of layered structure. In practice, this could be a neural network used to recognize handwritten digits.
The learning (training) process of a neural network involves a massive amount of computation. A network is in effect a function with a large number of parameters (possibly tens of thousands). Training requires a large number of cases, each with input data and a correct answer, which are used to adjust these parameters so as to achieve the goal of learning (usually by the steepest descent method). To train a neural network to recognize handwritten digits, for example, tens of thousands of pictures are needed; the parameters are adjusted so that the network gives the correct result for each picture, after which handwriting recognition achieves a very high success rate.
The inventors consider neural network computation to be very well suited to implementation on the cellular array computing system provided in the embodiment of the present invention. Because training requires the neural network to be evaluated many times, the computation can be accelerated by the parallel computation of a large number of MPUs.
In actual implementation, if the number of neurons in the neural network is smaller than the number of cells in the cellular array computing system, one cell can carry out the computation of one neuron; otherwise, the computation of more than one neuron can be assigned to a single cell.
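One way to picture this mapping is a simple assignment table. The patent does not say how neurons are distributed when there are more neurons than cells; the round-robin policy below, and the name `assign_neurons`, are assumptions chosen only to make the idea concrete.

```python
def assign_neurons(num_neurons, cell_ids):
    """Assign neuron indices to cells round-robin (an assumed policy, not from the source)."""
    assignment = {cell_id: [] for cell_id in cell_ids}
    for neuron_index in range(num_neurons):
        cell_id = cell_ids[neuron_index % len(cell_ids)]
        assignment[cell_id].append(neuron_index)
    return assignment

cells = [(0, 0), (0, 1), (1, 0), (1, 1)]  # a hypothetical 2x2 cellular array

# Fewer neurons than cells: at most one neuron per cell.
print(assign_neurons(3, cells))
# More neurons than cells: several neurons share a cell.
print(assign_neurons(10, cells))
```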
In a specific implementation, each cell stores the code of one or more neuron functions. The master CPU can broadcast the neuron function code to each cell that acts as a neuron by means of data broadcasting, and can start the neural network computation by broadcasting an instruction to a target area. For example, the master CPU can select all cells in the cellular array, or the cells in a target area, as the cells participating in the neural network computation, broadcast the same neuron function code to the same relative address range in each selected cell, and then broadcast an instruction that makes the microprocessors of one or more selected cells execute the neuron function code starting from that relative address. For details, refer also to the description of the implementation of the cellular array computing system above.
In the embodiment of the present invention, each input from another cell corresponds to a weight parameter. Executing the neuron function code stored in the cell based on all input data from other cells may include: multiplying each input from another cell by its corresponding weight parameter, summing all the products, comparing the sum with the threshold parameter, and using the resulting output value as the execution result. In actual implementation, the weight parameters and the threshold parameter are stored in advance in the non-volatile random access memory of the cell.
In actual implementation, if the execution result output by the neuron function code is a continuous value, the output can also be accelerated by table lookup.
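A table lookup of this kind could look like the sketch below. The patent does not name the continuous output function; a sigmoid is assumed here because it yields values between 0 and 1, and the table range, resolution, and function names are likewise hypothetical.

```python
import math

def build_sigmoid_table(lo=-8.0, hi=8.0, steps=1024):
    """Precompute sigmoid values on a uniform grid from lo to hi."""
    step = (hi - lo) / steps
    table = [1.0 / (1.0 + math.exp(-(lo + i * step))) for i in range(steps + 1)]
    return table, lo, step

def sigmoid_lookup(x, table, lo, step):
    """Return the precomputed value nearest to x, clamping x to the table range."""
    index = round((x - lo) / step)
    index = max(0, min(len(table) - 1, index))
    return table[index]

table, lo, step = build_sigmoid_table()
print(sigmoid_lookup(0.0, table, lo, step))
print(sigmoid_lookup(1.0, table, lo, step))
```

The table would be computed once and kept in the cell's memory, so each neuron evaluation replaces an exponential with an index calculation and one read.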
In a specific implementation, because the cellular array computing system provided in the embodiment of the present invention supports cell-to-cell communication between any two cells, the execution results of the neuron function code can be transmitted by cell-to-cell communication during neural network computation. The huge bandwidth of the network inside the cellular array makes this communication efficient and also reduces the processing load on the master CPU, which further improves the overall performance of the computing system and favors the efficient implementation of neural network computation.
In a specific implementation, when the neural network is divided into layers, data transfer between layers can use the mass-send mechanism of cell-to-cell communication provided in the embodiment of the present invention. The cells participating in the computation of one layer are then located in the same target area, and when a cell in one layer mass-sends the execution result of its neuron function code to the target area of the next layer, the result reaches all cells participating in the computation of that next layer. Extending point-to-point cell communication to area-wide mass-sending supports a higher degree of parallelism and a much higher total bandwidth, which further improves overall performance and greatly helps the speed of neural network computation.
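The layer-to-layer mass-send can be simulated in software as follows. This is a sketch under assumptions: each layer is modeled as a list of `(weights, threshold)` pairs, one per cell, the `inbox` dictionary stands in for the data every cell in the next target area receives, and the small XOR network is only a test case, not something taken from the patent.

```python
def forward(layers, inputs):
    """Propagate inputs through layers; each layer's outputs are mass-sent to the next."""
    outputs = list(inputs)
    for layer in layers:
        # Mass-send: every cell in this layer's target area receives the full output vector.
        inbox = {cell_index: list(outputs) for cell_index in range(len(layer))}
        outputs = [
            1 if sum(x * w for x, w in zip(inbox[i], weights)) >= threshold else 0
            for i, (weights, threshold) in enumerate(layer)
        ]
    return outputs

# Hypothetical two-layer network computing XOR with threshold neurons.
hidden = [([1, 1], 1),     # OR neuron
          ([-1, -1], -1)]  # NAND neuron
output = [([1, 1], 2)]     # AND neuron
layers = [hidden, output]

for a in (0, 1):
    for b in (0, 1):
        print(a, b, forward(layers, [a, b]))
```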
In a specific implementation, if a dedicated (full-time) output cell is also provided in the cellular array, the result of the neural network computation can be sent to that output cell by cell-to-cell communication for the master CPU to read, so the result can be delivered to the master CPU more efficiently. In actual implementation, if certain neuron functions in the last layer produce a positive output (in a common application, each neuron in the last layer is responsible for recognizing one specific data feature or image), the master CPU can be notified through this mechanism.
Implement neural computing by cell-cell communication, iuntercellular mass-sending and full-time output cell and can refer to cell Associated description in array computation system embodiment, details are not described herein again.
In summary, the advantages of implementing neural network computation with the cellular array computing system provided in the embodiment of the present invention are clear: the parallel computation of a large number of cells significantly increases computation speed, so that learning and training become far faster; the huge bandwidth of the array's internal network communication and the mass-send mechanism also help considerably; and the non-volatility of MRAM means that a successfully trained chip can be directly replicated and sold as a product that solves a specific problem.
Those skilled in the art know that any CPU needs a software debugging interface, which is a necessary function for software development. Nearly all CPUs currently on the market have a debugging interface, and most are designed according to the JTAG (Joint Test Action Group) standard. External debugging equipment sends debugging instructions to the CPU through this interface, including pause, set breakpoint, and read/write memory. Executing these instructions helps programmers inspect the running state of a program and diagnose software faults.
For cellular array computing system provided in an embodiment of the present invention, when a large amount of (such as thousands of) CPU are integrated into When on one chip, how convenient, efficient realization debugging function is a problem to be solved.
Based on the above considerations, an embodiment of the present invention provides a cellular array computing system with a debugging interface. As shown in Figure 21, in addition to the aforementioned master CPU, cellular array, and cellular array bus, the cellular array computing system may further include a debugging interface connected to the master CPU. Debugging equipment controls the master CPU through the debugging interface to debug the software running in each cell of the cellular array.
In cellular array computing system provided in an embodiment of the present invention, the master cpu may be used in the prior art CPU realize, can naturally also support existing debugging interface.External debugging device connects master cpu by debugging interface, It recycles existing debugging interface to support the function of read/write memory, debugging instruction is sent to cellular array.
In some cases the software requires cells to exchange a large amount of information. If one cell stops at a breakpoint while the other cells continue to run, the whole system can fall into confusion and debugging becomes impossible. For this reason, in the embodiment of the present invention, the cellular array computing system may further include a halt signal line (not shown in Figure 21) connecting every cell in the cellular array. When the software running in any cell hits a breakpoint and pauses, the halt signal line is used to send a halt signal to all other cells.
With a halt signal line connecting all cells, whenever any cell in the cellular array computing system hits a breakpoint, a halt signal is sent on this line. On receiving it, every cell immediately pauses the software it is running, and inter-cell network transmission is paused at the same time. This avoids throwing the whole system into confusion and making debugging impossible, and ensures the stability and accuracy of debugging.
Further, the halt signal line can also be connected with the commissioning device, and the halt signal is also sent to The commissioning device.In actual implementation, the halt signal line can guide to cellular array chip exterior, to connect debugging Equipment.
Based on the above cellular array computing system with a debugging interface, an embodiment of the present invention also provides a debugging method for that system, including: the debugging equipment controlling the master CPU, through the debugging interface, to send debugging instructions to a target cell, to the cells in a target area, or to all cells in the cellular array, so as to debug the software running in each cell.
In embodiments of the present invention, software debugging is to control master cpu by the debugging interface come real by commissioning device It is existing.
Specifically, the debugging instructions that the master CPU sends to a particular cell, or broadcasts to the cells in a target area or to all cells in the cellular array, include a pause instruction, a read/write instruction, and a set-breakpoint instruction. The pause instruction can pause the operation of the MPU, pause inter-cell data transmission, and so on. The read/write instruction can read the internal registers of the MPU, including the register that records the MPU state (running, paused, breakpoint hit, and so on), and can also read the data in the FIFOs of the communication interfaces between adjacent cells. The set-breakpoint instruction sets a breakpoint in the software running in a cell, which is done by writing to the cell's MPU.
In a specific implementation, under the control of the debugging equipment, or upon detecting another condition that requires a pause, the master CPU sends a pause instruction to the target cell, to the cells in the target area, or to all cells. A cell that receives the pause instruction pauses the software running in it. The software is then debugged by sending read/write instructions to read the contents of the internal registers of the microprocessor, the non-volatile random access memory (for example, MRAM), and the communication interfaces between adjacent cells.
Further, the master CPU can also send a set-breakpoint instruction to set a breakpoint in the software running in the target cell or in the cells in the target area, and periodically read the state of the cells. The state of a cell indicates whether its software is running, paused, or has hit a breakpoint; the other conditions that require a pause include the software in a cell hitting a breakpoint.
In the embodiment of the present invention, due to each cell in further including the connection cellular array in cellular array computing system The halt signal line, therefore the adjustment method of above-mentioned cellular array computing system further includes:When what is run in any cell When software meets with breakpoint and suspends, which sends halt signal by the halt signal line to other all cells;It connects The cell for receiving the halt signal suspends the operation of software in this cell, and suspends the data transmission between flanking cell.
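The effect of the shared halt signal line can be shown with a small simulation. This is a simplified sketch, assuming all cells advance one instruction per tick in lockstep; the classes `Cell` and `CellArray`, the program counter model, and the state names are hypothetical and stand in for the hardware behavior.

```python
class Cell:
    """A simulated cell running software with optional breakpoints (program counter values)."""
    def __init__(self, cell_id, breakpoints=()):
        self.cell_id = cell_id
        self.pc = 0
        self.breakpoints = set(breakpoints)
        self.state = "running"

class CellArray:
    """Cells sharing one halt signal line, modeled as a single boolean flag."""
    def __init__(self, cells):
        self.cells = cells
        self.halt_line = False

    def tick(self):
        """Advance every cell by one instruction unless the halt line is asserted."""
        if self.halt_line:
            return
        for cell in self.cells:
            cell.pc += 1
            if cell.pc in cell.breakpoints:
                cell.state = "breakpoint"
                self.halt_line = True  # the cell drives the shared halt signal line
        if self.halt_line:
            for cell in self.cells:
                if cell.state == "running":
                    cell.state = "paused"  # every other cell pauses on seeing the signal

array = CellArray([Cell(0), Cell(1, breakpoints={3}), Cell(2)])
for _ in range(10):
    array.tick()
print([(cell.cell_id, cell.pc, cell.state) for cell in array.cells])
```

All three cells stop at the same point even though only one of them has a breakpoint, which is the consistency property the halt signal line is meant to provide.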
In the embodiment of the present invention, the halt signal line is also connected with the commissioning device, makes the halt signal can also It is enough sent to the commissioning device, therefore the adjustment method of above-mentioned cellular array computing system further includes:When the commissioning device When receiving the halt signal that any cell is transmitted by the halt signal line, control suspends the fortune of the master cpu Row.In actual implementation, when the halt signal that any cell is sent out in cellular array is transmitted to commissioning device, commissioning device can To suspend master cpu at once, to check the interaction problems of master cpu and cellular array, the stabilization of debugging is thereby further ensured that Property and accuracy.
For the specific implementation of the cellular array computing system with a debugging interface and of its debugging method provided by the embodiments of the present invention, reference may also be made to the implementation of the cellular array computing systems with other structures described above.
It should be pointed out that the embodiments of the present invention describe the cellular array computing system with a debugging interface using a two-dimensional cellular array as an example. In other embodiments, the cellular array may also be a three-dimensional cellular array formed by stacking more than one two-dimensional cellular array. The concept of "adjacent cells" in the cellular array is then no longer limited to a two-dimensional plane but extends to three-dimensional space. If the adjacent-cell communication scheme shown in Fig. 3 for the two-dimensional cellular array is adopted, then in a spatial rectangular coordinate system any cell has adjacent cells in six directions: the positive and negative directions of the x-axis, the y-axis and the z-axis. In actual implementation, when multiple 2D cellular array chips are stacked to form a 3D chip, vertical connections are established between adjacent cells by TSVs (through-silicon vias); that is, communication connections are established by TSV between adjacent cells located in two neighbouring two-dimensional cellular arrays. While keeping its low-power advantage, a 3D cellular array chip increases the scale of the cellular array and expands the bandwidth of internal communication.
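The six-direction adjacency can be illustrated with a short sketch. The function name `neighbours_3d`, the `(x, y, z)` coordinate convention and the treatment of cells on the boundary of the array (they simply have fewer neighbours) are assumptions made for the example; the source does not specify boundary behaviour.

```python
def neighbours_3d(cell, shape):
    """Return the adjacent cells of `cell` in a 3D cellular array.

    cell  -- (x, y, z) coordinates of the cell
    shape -- (nx, ny, nz) size of the array; z indexes the stacked 2D layers
    """
    x, y, z = cell
    nx, ny, nz = shape
    # Positive and negative directions of the x-, y- and z-axes.
    offsets = [(1, 0, 0), (-1, 0, 0),
               (0, 1, 0), (0, -1, 0),
               (0, 0, 1), (0, 0, -1)]   # z offsets model the TSV links
    result = []
    for dx, dy, dz in offsets:
        px, py, pz = x + dx, y + dy, z + dz
        # Assumption: cells outside the array do not exist, so boundary
        # cells have fewer than six neighbours.
        if 0 <= px < nx and 0 <= py < ny and 0 <= pz < nz:
            result.append((px, py, pz))
    return result
```

With a single layer (`nz = 1`) the function reduces to the four-neighbour case of a two-dimensional array, which matches the description of the 3D array as a stack of 2D arrays.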
Although the present invention is disclosed as above, it is not limited thereto. Any person skilled in the art may make various changes and modifications without departing from the spirit and scope of the present invention; the scope of protection of the present invention shall therefore be the scope defined by the claims.

Claims (13)

1. A cellular array computing system, characterized by comprising: a master CPU, a cellular array and a cellular array bus, the cellular array and the cellular array bus being integrated in one chip;
The cellular array is a two-dimensional array composed of more than one cell having both computing and storage functions, wherein each cell comprises a microprocessor and a non-volatile random access memory; the non-volatile random access memory is used for random access to the data involved in the microprocessor's computation, and is also used to store the instruction code of software and the data that needs to be persisted;
Each cell stores its own position in the cellular array as an ID, to be read by software or hardware in the cell;
The master CPU communicates with each cell in the cellular array through the cellular array bus;
Communication interfaces exist between adjacent cells in the cellular array, and any cell in the cellular array transmits data to adjacent cells under the control of its microprocessor.
2. The cellular array computing system according to claim 1, characterized in that the communication between the master CPU and each cell in the cellular array through the cellular array bus includes at least one of the following:
Reading and writing the non-volatile random access memory of any cell in the cellular array by address;
Broadcasting data to the non-volatile random access memory of each cell in a target region of the cellular array, the data being written to the same relative address in the non-volatile random access memory of each cell in the target region;
Sending instructions to, transmitting data to, or reading the state of the microprocessor of any cell in the cellular array;
Broadcasting instructions to the microprocessors of all cells in the target region.
3. The cellular array computing system according to claim 1, characterized in that each cell in the cellular array further comprises a bus controller and a cell internal bus; the bus controller is connected to the cellular array bus, the microprocessor and the cell internal bus; the bus controller is used to identify communication between the master CPU and this cell, and either connects the microprocessor to transmit the instructions or data sent by the master CPU or to read state, or connects the non-volatile random access memory through the cell internal bus to perform data read/write operations.
4. The cellular array computing system according to claim 1, characterized in that at least one of a floating-point computation processor and an image processor is integrated in the microprocessor.
5. The cellular array computing system according to claim 1, characterized in that the non-volatile random access memory is MRAM.
6. The cellular array computing system according to claim 1, characterized in that the master CPU, the cellular array and the cellular array bus are integrated in one chip.
7. The cellular array computing system according to claim 1, characterized in that the master CPU, as a separate chip, communicates through a standard memory interface with the chip formed by the cellular array and the cellular array bus.
8. A communication method in the cellular array computing system according to claim 1, characterized by comprising:
Any cell in the cellular array receives the target address broadcast by the master CPU on the cellular array bus and, if it determines that the target address is in this cell, connects the non-volatile random access memory of the cell so that the master CPU can perform data read/write operations;
A first special address segment is reserved in the system address space for communication between the master CPU and the microprocessors and stores the ID of the target cell; if any cell in the cellular array, on receiving the first special address segment, identifies a communication with the microprocessor of this cell, it connects the microprocessor of the cell to complete the subsequent instruction reception, data reception and state read operations;
A second special address segment is reserved in the system address space for the master CPU to broadcast instructions, the second special address segment carrying cell IDs that help determine the extent of a target region in the cellular array; if any cell in the cellular array, after receiving the second special address segment, identifies that this cell is in the target region, it connects the microprocessor of the cell to transmit the instructions or data sent by the master CPU or to read state, or connects the non-volatile random access memory of the cell to perform data read/write operations;
Any cell in the cellular array transmits data to adjacent cells under the control of its microprocessor.
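The address decoding recited in claim 8 can be sketched as follows. The claim does not fix an address layout, so everything concrete here is an assumption made for illustration: the 4 x 4 array size, the constants `CELL_MEM_SIZE`, `UNICAST_BASE` and `BROADCAST_BASE`, the rectangular target region encoded by the IDs of its two corner cells, and the function names `cell_index`, `encode_region` and `decode`.

```python
ROWS, COLS = 4, 4                 # hypothetical array size
CELL_MEM_SIZE = 0x1000            # hypothetical NVRAM size per cell
MEM_TOP = ROWS * COLS * CELL_MEM_SIZE
UNICAST_BASE = 0x100000           # hypothetical first special address segment
BROADCAST_BASE = 0x200000         # hypothetical second special address segment


def cell_index(cell_id):
    """Flatten a (row, col) cell ID into a single index."""
    row, col = cell_id
    return row * COLS + col


def encode_region(top_left, bottom_right):
    """Build a broadcast address carrying the two corner IDs of a rectangle."""
    return BROADCAST_BASE + (cell_index(top_left) << 8) + cell_index(bottom_right)


def decode(address, my_id):
    """Decide what one cell connects for a broadcast address.

    Returns ("nvram", offset), ("cpu", None), or None if the cell ignores it.
    """
    me = cell_index(my_id)
    if 0 <= address < MEM_TOP:
        # Ordinary memory access: only the owning cell connects its NVRAM.
        owner, offset = divmod(address, CELL_MEM_SIZE)
        return ("nvram", offset) if owner == me else None
    if UNICAST_BASE <= address < UNICAST_BASE + ROWS * COLS:
        # First special segment: the address carries the target cell ID.
        return ("cpu", None) if address - UNICAST_BASE == me else None
    if BROADCAST_BASE <= address < BROADCAST_BASE + (1 << 16):
        # Second special segment: the address carries the region's corner IDs.
        code = address - BROADCAST_BASE
        r0, c0 = divmod(code >> 8, COLS)
        r1, c1 = divmod(code & 0xFF, COLS)
        row, col = my_id
        inside = r0 <= row <= r1 and c0 <= col <= c1
        return ("cpu", None) if inside else None
    return None
```

Each cell evaluates `decode` independently for every address placed on the bus, which reflects the claim's point that the cells themselves, rather than the master CPU, decide whether an access concerns them.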
9. The communication method in the cellular array computing system according to claim 8, characterized in that each cell in the cellular array further comprises a bus controller and a cell internal bus, the bus controller being connected to the cellular array bus, the microprocessor and the cell internal bus; for any cell in the cellular array, determining whether the target address is in this cell, identifying whether the communication is with the microprocessor of this cell, identifying whether this cell is in the target region, and connecting the non-volatile random access memory or the microprocessor are all performed by the bus controller, and the bus controller connects the non-volatile random access memory through the cell internal bus.
10. The communication method in the cellular array computing system according to claim 8, characterized in that the read/write operations of the master CPU on the non-volatile random access memory of any cell in the cellular array have a higher priority than the read/write operations of that cell's microprocessor on the corresponding non-volatile random access memory.
11. A method for computing a Monte Carlo integral using the cellular array computing system according to any one of claims 1 to 7, characterized by comprising:
The master CPU selects all cells of the cellular array, or the cells in a target region, and broadcasts the program corresponding to the integrand to a relative address segment of each selected cell;
The master CPU broadcasts an instruction that makes the microprocessors of the selected cells execute the program corresponding to the integrand starting from the relative address segment;
After each cell completes the integration, it stores its sum at an agreed address, to be read by the master CPU, which then computes the total sum.
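The flow of claim 11 can be sketched in software as below. This is a sequential simulation under stated assumptions, not the patented system: the function names `cell_program` and `master_monte_carlo`, the integration interval [0, 1], the dictionary standing in for the agreed address, and the formula that turns a cell ID into a seed are all hypothetical. Seeding from the cell ID follows the idea recited in claim 13.

```python
import random


def cell_program(cell_id, integrand, samples):
    """Work done inside one cell: sample the integrand and return the local sum."""
    row, col = cell_id
    # Assumption: the cell ID is turned into a seed so that every cell draws
    # a different, reproducible random sequence.
    rng = random.Random(row * 1000 + col)
    return sum(integrand(rng.random()) for _ in range(samples))


def master_monte_carlo(integrand, region, samples_per_cell):
    """Work done by the master CPU: broadcast, collect local sums, total them."""
    # The dictionary stands in for the agreed address in each cell's memory.
    agreed_address = {cell_id: cell_program(cell_id, integrand, samples_per_cell)
                      for cell_id in region}
    total = sum(agreed_address.values())
    # Estimate of the integral over [0, 1]: mean of all samples.
    return total / (len(region) * samples_per_cell)


if __name__ == "__main__":
    region = [(r, c) for r in range(4) for c in range(4)]
    print(master_monte_carlo(lambda x: x * x, region, 2000))
```

Because every cell runs the same program on its own random samples, the only data the master CPU moves is one broadcast of the program and one sum per cell.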
12. A method for computing a Monte Carlo integral using the cellular array computing system according to any one of claims 1 to 7, characterized by comprising:
The master CPU selects all cells of the cellular array, or the cells in a target region;
The master CPU broadcasts a download program to the same relative address segment of each selected cell, and broadcasts an instruction that makes the microprocessors of the selected cells execute the download program starting from that relative address;
The program corresponding to the integrand is split into two or more subprograms, and the master CPU broadcasts each subprogram to the microprocessors of the selected cells;
Each microprocessor running the download program selects one of the subprograms to store according to the ID of its own cell, so that the subprograms are deployed in order in a group of successively adjacent cells;
The master CPU broadcasts an instruction that makes the microprocessors of each group of cells execute, in sequence, the subprograms into which the program corresponding to the integrand was split, the intermediate result of each stage being passed to the next stage as its input;
After each group of cells completes the integration, it stores its sum at an agreed address, to be read by the master CPU, which then computes the total sum.
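The staged variant of claim 12 can be sketched in the same spirit. The example integrand exp(-x^2), its split into three subprograms, the rule that a cell picks its subprogram from its column index, the use of the row index as the seed, and the names `STAGES`, `select_stage`, `run_group` and `master_total` are all assumptions for illustration. The stages run one after another here, whereas real cells would work concurrently as a pipeline.

```python
import math
import random

# Hypothetical split of the integrand exp(-x^2) into three subprograms.
STAGES = [lambda x: x * x, lambda u: -u, math.exp]


def select_stage(cell_id):
    """Download-program rule (assumed): pick a subprogram from the column index."""
    return cell_id[1] % len(STAGES)


def run_group(row, samples):
    """One group of adjacent cells in a row, one subprogram per cell."""
    group = [(row, col) for col in range(len(STAGES))]
    programs = [STAGES[select_stage(cell_id)] for cell_id in group]
    rng = random.Random(row)          # assumed: seed derived from the cell ID
    total = 0.0
    for _ in range(samples):
        value = rng.random()
        # The intermediate result of each stage is the input of the next one.
        for program in programs:
            value = program(value)
        total += value
    return total                      # stored at the agreed address


def master_total(rows, samples):
    """Master CPU: read each group's sum and average over all samples."""
    sums = [run_group(row, samples) for row in range(rows)]
    return sum(sums) / (rows * samples)
```

Splitting the integrand this way trades extra neighbour-to-neighbour traffic for smaller programs per cell, which is useful when the full integrand program does not fit comfortably in a single cell.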
13. The method for computing a Monte Carlo integral using the cellular array computing system according to claim 11 or 12, characterized in that, when the program corresponding to the integrand starts executing, its built-in random number generator reads the ID of the cell and uses it as the seed.
CN201510456294.3A 2015-07-29 2015-07-29 Cellular array computing system and communication means therein Active CN105608490B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510456294.3A CN105608490B (en) 2015-07-29 2015-07-29 Cellular array computing system and communication means therein

Publications (2)

Publication Number Publication Date
CN105608490A CN105608490A (en) 2016-05-25
CN105608490B true CN105608490B (en) 2018-10-26

Family

ID=55988414

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510456294.3A Active CN105608490B (en) 2015-07-29 2015-07-29 Cellular array computing system and communication means therein

Country Status (1)

Country Link
CN (1) CN105608490B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108256637A (en) * 2016-12-28 2018-07-06 上海磁宇信息科技有限公司 A kind of cellular array three-dimensional communication transmission method
CN106951955B (en) * 2017-03-09 2019-12-17 中国人民解放军军械工程学院 Method for selecting electronic cell number in bus embryo electronic cell array
KR102481256B1 (en) * 2017-08-31 2022-12-23 캠브리콘 테크놀로지스 코퍼레이션 리미티드 Chip device and related product
US10635622B2 (en) * 2018-04-03 2020-04-28 Xilinx, Inc. System-on-chip interface architecture
US10866753B2 (en) 2018-04-03 2020-12-15 Xilinx, Inc. Data processing engine arrangement in a device
CN108897714B (en) * 2018-07-03 2022-05-24 中国人民解放军国防科技大学 Multi-core or many-core processor chip with autonomous region
CN109886393B (en) * 2019-02-26 2021-02-09 上海闪易半导体有限公司 Storage and calculation integrated circuit and calculation method of neural network
CN110362280A (en) * 2019-09-04 2019-10-22 南京优存科技有限公司 Mixing storage system based on the nearly data processing MRAM of low-power consumption neural network
CN112732631A (en) * 2020-12-25 2021-04-30 南京蓝洋智能科技有限公司 Data transmission method between small chips
CN112631989A (en) * 2021-03-08 2021-04-09 南京蓝洋智能科技有限公司 Data transmission method among small chips, among chips and among small chips
CN112667557A (en) * 2021-03-16 2021-04-16 南京蓝洋智能科技有限公司 Data transmission method suitable for chiplet architecture

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1341242A (en) * 1999-01-21 2002-03-20 索尼电脑娱乐公司 High-speed processor system, method of using the same, and recording medium
CN101354694A (en) * 2007-07-26 2009-01-28 上海红神信息技术有限公司 Ultra-high expanding super computing system based on MPU structure
CN101681296A (en) * 2008-02-29 2010-03-24 株式会社东芝 Memory system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101330413B (en) * 2007-06-22 2012-08-08 上海红神信息技术有限公司 Method for expanding mixed multi-stage tensor based on around network and ultra-cube network structure

Also Published As

Publication number Publication date
CN105608490A (en) 2016-05-25

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant