CN105740946B - Method for implementing neural network computation with a cellular array computing system - Google Patents

Info

Publication number: CN105740946B
Authority: CN (China)
Prior art keywords: cell, data, cellular array, array, communication
Legal status: Active (status as listed by Google Patents; an assumption, not a legal conclusion)
Application number: CN201510456306.2A
Other languages: Chinese (zh)
Other versions: CN105740946A (en)
Inventors: 戴瑾, 郭民, 郭一民, 王践识
Current assignee: Shanghai Ciyu Information Technologies Co Ltd
Original assignee: Shanghai Ciyu Information Technologies Co Ltd
Events:
  • Application filed by Shanghai Ciyu Information Technologies Co Ltd
  • Priority to CN201510456306.2A
  • Publication of CN105740946A
  • Application granted
  • Publication of CN105740946B

Landscapes

  • Mobile Radio Communication Systems (AREA)

Abstract

A method for implementing neural network computation with a cellular array computing system. The cellular array computing system comprises a master CPU, a cellular array, and a cellular array bus. The cellular array is a two-dimensional array of cells, each of which combines computing and storage functions. The master CPU communicates with every cell in the cellular array over the cellular array bus, and adjacent cells are linked by communication interfaces through which they can send data to each other. The method comprises: storing the code of one or more neuron functions in each cell of the cellular array that participates in the neural network computation; and, in any participating cell, taking as input data the results that other cells output by executing their neuron-function code, executing the neuron-function code stored in this cell, and outputting the result or storing it at a preset address. The invention overcomes the communication bottleneck between the CPU and memory/storage in existing computer architectures and improves overall system performance.

Description

Method for implementing neural network computation with a cellular array computing system
Technical field
The present invention relates to computers and computer applications, and in particular to a method for implementing neural network computation with a cellular array computing system.
Background
A computer generally has three core components: the central processing unit (CPU), memory, and storage.
Through the sustained efforts of some of the world's leading companies, the CPU has evolved into an extremely complex semiconductor chip: a top-end CPU can contain more than 100 million MOS transistors. The current industry trend is that, constrained by power consumption, CPU clock frequencies can hardly be raised further, and the operating efficiency of such complex modern CPUs is likewise difficult to improve. New CPU products are therefore increasingly moving toward multi-core designs.
For memory, the dominant technology today is dynamic random access memory (DRAM). DRAM supports fast random reads and writes, but cannot retain its contents when power is lost. In fact, even while powered, the capacitors that store the data leak charge and lose information, so DRAM must be refreshed periodically.
For storage, NAND flash memory is gradually replacing conventional hard disks. The floating-gate technology on which flash relies retains contents when power is lost, but writing (changing a '1' to a '0') is very slow and erasing (changing a '0' to a '1') is even slower, so flash cannot directly support computation the way DRAM can. Flash is built as a block device: it must be erased a whole block at a time, where a block contains many pages, and each page can be written only after erasure. Another problem with NAND is its limited lifetime.
Although DRAM, NAND flash, and CPU logic circuits are all manufactured in CMOS semiconductor processes, the three processes are mutually incompatible. As a result, the three core components of a computer cannot coexist on a single chip, which has profoundly shaped the architecture of modern computers.
A prior-art computer architecture is shown in Fig. 1, which shows multiple CPU cores, CPU1, CPU2, CPU3, ..., CPUn. Each CPU core generally has its own level-1 cache (L1 Cache), and, as needed, each core may also be given a level-2 cache (L2 Cache) and a level-3 cache (L3 Cache). DRAM communicates with the CPU cores over a Double Data Rate (DDR) interface, while a hard disk (HD) or solid-state drive (SSD) communicates with the CPU cores through a peripheral interface.
On the one hand, CPUs are developing toward multiple cores; on the other hand, memory and storage reside on separate chips. The volume of information a multi-core CPU processes grows in proportion to its core count, so communication with memory and storage increasingly becomes the system performance bottleneck. To relieve this bottleneck, CPUs have had to add more levels of cache. A cache is a copy of memory contents, usually built from static random access memory (SRAM), which is much more expensive than DRAM but faster. Such an architecture has very poor cost-effectiveness: the cost of a semiconductor chip is determined by its silicon area, and the performance gains of conventional computer architectures are far from proportional to the increase in silicon area.
Neural networks are a common computational method in machine learning. They imitate the working principle of the human brain and are generally built from perceptrons, also called neurons. Training a neural network is a massive computation: a neural network is in effect a function with a large number of parameters (possibly tens of thousands), and training requires a large number of scenarios, each consisting of input data and the correct answer, which are used to adjust these parameters until the learning goal is reached. Because neural network computation is so computationally demanding, the communication bottleneck between the CPU and memory/storage in prior-art architectures seriously degrades overall computer performance and hinders efficient neural network computation.
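To make "adjusting parameters from scenarios of input data and correct answers" concrete, here is a minimal Python sketch of one perceptron trained with the classic perceptron learning rule. The function names, learning rate, epoch count, and the logical-AND training set are illustrative assumptions; the patent does not prescribe a particular neuron function or training rule.

```python
from typing import List, Sequence, Tuple


def neuron(weights: Sequence[float], bias: float, inputs: Sequence[float]) -> int:
    """A perceptron: weighted sum of the inputs plus a bias, then a step activation."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if s > 0 else 0


def train_perceptron(samples: List[Tuple[List[float], int]],
                     epochs: int = 20, lr: float = 0.1) -> Tuple[List[float], float]:
    """Adjust the parameters using (input data, correct answer) scenarios."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - neuron(weights, bias, inputs)  # -1, 0 or +1
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias


# Each scenario pairs input data with its correct answer (here: logical AND).
AND_SAMPLES = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train_perceptron(AND_SAMPLES)
predictions = [neuron(weights, bias, x) for x, _ in AND_SAMPLES]  # [0, 0, 0, 1]
```

A real network repeats this kind of update across many neurons and far more parameters, which is the computational load the cellular array is meant to spread across cells.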
Summary of the invention
The problem to be solved by the present invention is that, in prior-art computer architectures, the communication bottleneck between the CPU and memory/storage limits overall computer performance and thus hinders efficient neural network computation.
To solve this problem, the technical solution of the present invention provides a method for implementing neural network computation with a cellular array computing system. The cellular array computing system comprises a master CPU, a cellular array, and a cellular array bus. The cellular array is a two-dimensional array of cells, each combining computing and storage functions; each cell includes a microprocessor (MPU, Micro Processing Unit) and a non-volatile (NV) random access memory. The non-volatile random access memory provides random access to the data the microprocessor uses during computation, and also stores software instruction code and data that must persist. Each cell stores its own position in the cellular array as its identification number (ID), which the cell's software or hardware can read. The master CPU communicates with each cell in the cellular array over the cellular array bus, and adjacent cells in the cellular array have communication interfaces through which they can send data to each other. The method comprises: storing the code of one or more neuron functions in each cell of the cellular array that participates in the neural network computation; the master CPU selecting one or more cells to execute the neuron-function code each of them stores and to output the results to one or more target cells; and any participating cell receiving, as its input data, the results output by other cells executing their neuron-function code, executing the neuron-function code stored in this cell on all input data received from other cells, and then outputting the result to the cell(s) holding the neuron function(s) that need it, storing it at a preset address, or outputting it to the master CPU.
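The dataflow described above can be simulated in a few lines. The sketch below is a single forward pass under stated assumptions: each cell holds exactly one neuron function (the patent allows one or more), a cell fires once results from all of its source cells have arrived, and the ReLU neuron and all names (`NeuronCell`, `run_network`, `relu_neuron`) are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List, Tuple

CellID = Tuple[int, int]  # the cell's (x, y) position doubles as its ID


@dataclass
class NeuronCell:
    """Hypothetical model of a participating cell holding one neuron function."""
    cid: CellID
    neuron: Callable[[List[float]], float]  # the stored neuron-function code
    sources: List[CellID]                   # cells whose results this cell consumes
    targets: List[CellID]                   # cells whose neuron functions need this result
    inbox: Dict[CellID, float] = field(default_factory=dict)


def run_network(cells: Dict[CellID, NeuronCell],
                start: Dict[CellID, List[float]]) -> Dict[CellID, float]:
    """The master CPU starts the selected cells with external inputs; after that,
    each cell fires once results from all of its source cells have arrived."""
    final: Dict[CellID, float] = {}
    ready: Deque[Tuple[CellID, List[float]]] = deque(start.items())
    while ready:
        cid, inputs = ready.popleft()
        cell = cells[cid]
        result = cell.neuron(inputs)
        if not cell.targets:
            final[cid] = result  # e.g. stored at a preset address or sent to the master CPU
        for t in cell.targets:
            dst = cells[t]
            dst.inbox[cid] = result
            if len(dst.inbox) == len(dst.sources):
                ready.append((t, [dst.inbox[s] for s in dst.sources]))
    return final


def relu_neuron(weights: List[float], bias: float) -> Callable[[List[float]], float]:
    """Weighted sum plus bias followed by ReLU (an assumed neuron function)."""
    return lambda xs: max(0.0, sum(w * x for w, x in zip(weights, xs)) + bias)


# Two input cells feed one output cell.
cells = {
    (0, 0): NeuronCell((0, 0), relu_neuron([1.0], 0.0), [], [(1, 0)]),
    (0, 1): NeuronCell((0, 1), relu_neuron([2.0], 0.0), [], [(1, 0)]),
    (1, 0): NeuronCell((1, 0), relu_neuron([1.0, 1.0], -1.0), [(0, 0), (0, 1)], []),
}
result = run_network(cells, {(0, 0): [3.0], (0, 1): [0.5]})  # {(1, 0): 3.0}
```

The queue stands in for cells running concurrently; on the real array, every ready cell would execute at the same time on its own MPU.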
Optionally, any two cells in the cellular array can communicate with each other. The cells taking part in cell-to-cell communication are a start cell, an end cell, and relay cells: the start cell sends data to the end cell; the end cell finally receives the data sent by the start cell; and the relay cells are successively adjacent cells along the communication path that relay the start cell's data through the inter-cell communication interfaces. The communication path is the data transmission path formed by the start cell, the relay cells, and the end cell. The method further comprises: transmitting the results of neuron-function code between any two cells by cell-to-cell communication.
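This section does not say how the path is chosen (path selection is the subject of Fig. 7). As one possible policy, and purely as an assumption, the Python sketch below uses dimension-order routing over four-neighbour links: move along x first, then along y. Every cell strictly between the start and end cells is a relay.

```python
from typing import List, Tuple

CellID = Tuple[int, int]


def xy_route(start: CellID, end: CellID) -> List[CellID]:
    """Return the path start -> relay cells -> end, moving along x first, then y.
    Consecutive cells are always adjacent, so each hop uses one neighbour interface."""
    x, y = start
    path = [start]
    while x != end[0]:
        x += 1 if end[0] > x else -1
        path.append((x, y))
    while y != end[1]:
        y += 1 if end[1] > y else -1
        path.append((x, y))
    return path


path = xy_route((0, 0), (2, 1))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
relays = path[1:-1]              # [(1, 0), (2, 0)]
```

Dimension-order routing is deterministic and needs no routing tables, which suits very small per-cell controllers; it is not necessarily the scheme shown in Fig. 7.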
Optionally, any cell in the cellular array, acting as the start cell, can also perform mass-send communication to all cells in a target area. Participating cells inside the target area act as the start cell, as end cells, or as both relay cells and end cells; participating cells outside the target area act as the start cell or as relay cells. The method further comprises: if the neural network is divided into layers, using mass-send communication to transfer data between layers, with the participating cells of each layer located in the same target area.
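The following Python sketch shows one way a mass-send to a rectangular target area could be realized with neighbour links only. The spreading order (relay to the nearest area cell, then along that row, then down every column) and the name `mass_send` are assumptions for illustration; Figs. 9 to 12 may use a different scheme. Cells outside the area act only as relays, and cells inside act as relays and end cells, matching the roles described above.

```python
from typing import List, Tuple

CellID = Tuple[int, int]
Rect = Tuple[int, int, int, int]  # (a, b, c, d): a <= x <= b, c <= y <= d


def mass_send(start: CellID, rect: Rect) -> List[Tuple[CellID, CellID]]:
    """Return the (sender, receiver) hops that deliver one message from `start`
    to every cell of the rectangular target area, using only neighbour links."""
    a, b, c, d = rect
    hops: List[Tuple[CellID, CellID]] = []
    # 1. Outside the area: relay towards the nearest area cell (x first, then y).
    x, y = start
    ex, ey = min(max(x, a), b), min(max(y, c), d)
    while (x, y) != (ex, ey):
        if x != ex:
            nx, ny = x + (1 if ex > x else -1), y
        else:
            nx, ny = x, y + (1 if ey > y else -1)
        hops.append(((x, y), (nx, ny)))
        x, y = nx, ny
    # 2. Inside the area: spread along the entry row in both directions...
    for step in (1, -1):
        cx = ex
        while a <= cx + step <= b:
            hops.append(((cx, ey), (cx + step, ey)))
            cx += step
    # 3. ...then along every column of the area in both directions.
    for col in range(a, b + 1):
        for step in (1, -1):
            cy = ey
            while c <= cy + step <= d:
                hops.append(((col, cy), (col, cy + step)))
                cy += step
    return hops


# E.g. a cell of layer k at (5, 5) sends its result to layer k+1,
# whose cells occupy the target area 0 <= x <= 2, 0 <= y <= 1.
hops = mass_send((5, 5), (0, 2, 0, 1))
```

Inside the area each cell receives the message exactly once, so a rectangle of n cells costs n - 1 internal hops plus the relay hops needed to reach it.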
Optionally, at least one dedicated output cell is also provided in the cellular array. Acting as an end cell, the dedicated output cell receives and stores the output data that other cells send to the master CPU, and notifies the master CPU with an interrupt signal to read that output data. Outputting a result to the master CPU then comprises sending the result to the dedicated output cell by cell-to-cell communication, from which the master CPU reads it.
Optionally, the non-volatile random access memory is a magnetic random access memory (MRAM, Magnetic Random Access Memory).
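A minimal Python model of the dedicated output cell follows. The class name, the callback standing in for the interrupt line, and the choice to raise one interrupt per batch (only when the buffer goes from empty to non-empty) are assumptions; the patent only states that the cell stores the data and notifies the master CPU with an interrupt.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

CellID = Tuple[int, int]


class OutputCell:
    """Hypothetical dedicated output cell: buffers results sent by other cells and
    raises an interrupt so the master CPU can collect them."""

    def __init__(self, raise_interrupt: Callable[[], None]):
        self._buffer: Deque[Tuple[CellID, float]] = deque()
        self._raise_interrupt = raise_interrupt  # stands in for the interrupt line

    def receive(self, sender: CellID, value: float) -> None:
        """Called when another cell's result arrives by cell-to-cell communication."""
        was_empty = not self._buffer
        self._buffer.append((sender, value))
        if was_empty:
            self._raise_interrupt()  # assumed policy: one interrupt per batch

    def read_all(self) -> List[Tuple[CellID, float]]:
        """What the master CPU does in its interrupt handler: drain the buffer."""
        items = list(self._buffer)
        self._buffer.clear()
        return items


interrupts: List[str] = []
out = OutputCell(lambda: interrupts.append("IRQ"))
out.receive((3, 4), 0.75)
out.receive((5, 1), 0.25)
results = out.read_all()  # [((3, 4), 0.75), ((5, 1), 0.25)]
```

Coalescing interrupts keeps the master CPU from being interrupted once per message when several cells report at nearly the same time.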
Optionally, the master CPU selects all cells in the cellular array, or the cells in a target area, as the cells participating in the neural network computation; broadcasts the same piece of neuron-function code to the same relative address in each selected cell; and then broadcasts an instruction that makes the microprocessors of one or more selected cells start executing the neuron-function code from that relative address.
Compared with the prior art, the technical solution of the present invention has at least the following advantages:
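The effect of the two broadcasts can be sketched in Python as below. In hardware each broadcast is a single bus transaction received by all selected cells; the loops only model the resulting state. The 32 KB memory size (borrowed from the cell sizing discussed later in the text), the `pc` field, and all function names are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Tuple

CellID = Tuple[int, int]
MEM_SIZE = 32 * 1024  # assumed per-cell NV memory size in bytes


@dataclass
class CellState:
    memory: bytearray = field(default_factory=lambda: bytearray(MEM_SIZE))
    pc: Optional[int] = None  # where the MPU starts executing; None = not started


def broadcast_code(cells: Dict[CellID, CellState], selected: Iterable[CellID],
                   rel_addr: int, code: bytes) -> None:
    """Write the same neuron-function code at the same relative address in every selected cell."""
    if rel_addr < 0 or rel_addr + len(code) > MEM_SIZE:
        raise ValueError("code does not fit at this relative address")
    for cid in selected:
        cells[cid].memory[rel_addr:rel_addr + len(code)] = code


def broadcast_start(cells: Dict[CellID, CellState], selected: Iterable[CellID],
                    rel_addr: int) -> None:
    """Tell the MPUs of the selected cells to start executing from rel_addr."""
    for cid in selected:
        cells[cid].pc = rel_addr


cells = {(x, y): CellState() for x in range(2) for y in range(2)}
chosen = [(0, 0), (1, 0), (1, 1)]
broadcast_code(cells, chosen, 0x100, b"\x01\x02\x03")
broadcast_start(cells, chosen, 0x100)
```

Because every selected cell holds the code at the same relative address, one start instruction suffices for all of them.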
In current neural network algorithms, training is usually a massive computation problem, and training a single network takes a very long time. On a cellular array, a neural network can compute thousands of neurons in parallel, which greatly accelerates learning and improves learning capability.
Units (called "cells") that each combine independent computing and storage functions form a two-dimensional array (called a "cellular array"). Each cell includes a microprocessor and a non-volatile random access memory that supports both random access to the data the microprocessor uses during computation and storage of software instruction code and data that must persist, so memory, storage, and computation are integrated in every cell, and the cells are joined by a dense communication network. On the one hand, the master CPU can communicate with every cell over the cellular array bus; on the other hand, adjacent cells can send data to each other. Through data broadcast/mass-send and this internal network, the system overcomes the communication bottleneck between the CPU and memory/storage in existing architectures, improving overall system performance with good cost-effectiveness. When such a system is applied to neural network computation, the parallel operation of a large number of cells greatly increases computing speed and therefore training speed; and because the memory in each cell is non-volatile, a successfully trained chip can be copied directly and sold as a product that solves a specific problem.
The communication interfaces between adjacent cells allow data to be relayed repeatedly from neighbour to neighbour, so any two cells in the cellular array can communicate without relying on the master CPU. The huge bandwidth of the cellular array's internal network improves the efficiency of cell-to-cell communication and reduces the processing load on the master CPU, further improving overall system performance and favouring efficient neural network computation.
Extending point-to-point cell communication to area mass-send supports a higher degree of parallelism and a much higher total bandwidth, further improving overall system performance and significantly increasing the speed of neural network computation.
Providing a dedicated output cell in the cellular array, which acts as an end cell to receive and store other cells' output data destined for the master CPU and notifies the master CPU with an interrupt signal to read it, makes it more efficient for the master CPU to read output data when only a few cells need to send output to it. The final results of the neural network computation can thus be delivered to the master CPU more efficiently.
The cellular array computing system provided by the invention solves the communication bottleneck between the CPU and memory. As a result, for the same silicon area, the combined processing capacity of a large number of miniature MPUs far exceeds that of a few top-end CPUs on many problems, at much lower power consumption.
The cellular array stores the results of training (the weight data) in non-volatile memory. Once training is complete, the entire chip can be sold as a product by simple copying.
The cellular array computing system provided by the invention brings computer architecture closer to the human brain and provides a powerful engine for future artificial intelligence algorithms.
Brief description of the drawings
Fig. 1 is a schematic diagram of a prior-art computer architecture;
Fig. 2 is a structural schematic diagram of a cellular array computing system according to an embodiment of the invention;
Fig. 3 is a schematic diagram of one mode of communication between adjacent cells according to an embodiment of the invention;
Fig. 4 is a schematic diagram of another mode of communication between adjacent cells according to an embodiment of the invention;
Fig. 5 is a structural schematic diagram of a cell according to an embodiment of the invention;
Fig. 6 is a structural schematic diagram of the cells taking part in cell-to-cell communication in the cellular array according to an embodiment of the invention;
Fig. 7 is a schematic diagram of path selection for cell-to-cell communication in the cellular array according to an embodiment of the invention;
Fig. 8 is a flowchart of the operation of the dedicated output cell according to an embodiment of the invention;
Fig. 9 is a schematic diagram of mass-send in the cellular array when the start cell is at a corner of the target area, according to an embodiment of the invention;
Fig. 10 is a schematic diagram of mass-send in the cellular array when the start cell is on an edge of the target area, according to an embodiment of the invention;
Fig. 11 is a schematic diagram of mass-send in the cellular array when the start cell is inside the target area, according to an embodiment of the invention;
Fig. 12 is a schematic diagram of mass-send in the cellular array when the start cell is outside the target area, according to an embodiment of the invention;
Fig. 13 is another structural schematic diagram of a cellular array computing system according to an embodiment of the invention;
Fig. 14 is another structural schematic diagram of a cellular array computing system according to an embodiment of the invention;
Fig. 15 is yet another structural schematic diagram of a cellular array computing system according to an embodiment of the invention;
Fig. 16 is a schematic diagram of a neuron in a neural network;
Fig. 17 is a schematic diagram of neural network computation.
Detailed description of embodiments
Because of the communication bottleneck between the CPU and memory/storage, prior-art computer architectures are limited in overall performance and have poor cost-effectiveness.
After study, the inventors concluded that if memory, storage, and computation were integrated on one chip, forming relatively simple units that each combine independent computing and storage, and if these units were joined by a dense communication network providing data broadcast/mass-send and an internal network able to transmit data in massive parallel, a computing architecture resembling the human brain could be built, roughly equivalent to placing a large number of microcomputers on a single chip.
To this end, the technical solution of the present invention provides a computing architecture similar in structure to the human brain (referred to in the invention as a "cellular array computing system"). It consists of numerous structurally simple units (referred to as "cells"), each combining storage and computation, connected by a dense network. This new architecture is intended for wide application in large-scale computing, big data processing, artificial intelligence, and related fields.
To make the above objects, features, and advantages of the invention clearer, specific embodiments are described in detail below with reference to the accompanying drawings.
As shown in Fig. 2, the cellular array computing system according to an embodiment of the invention includes a master CPU, a cellular array, and a cellular array bus. The cellular array is the main body of the system: a two-dimensional array of cells, each combining computing and storage functions and each containing a microprocessor (MPU) and a non-volatile random access memory (MRAM in the example of Fig. 2). The non-volatile random access memory provides random access to the data the microprocessor uses during computation, and also stores software instruction code and data that must persist. Each cell stores its own position in the cellular array as its ID, readable by the cell's software or hardware. The master CPU communicates with each cell over the cellular array bus, and adjacent cells have communication interfaces through which they can send data to each other.
Note that this embodiment uses MRAM as the example of non-volatile random access memory. In other embodiments, as non-volatile memory technologies develop and mature, the non-volatile random access memory may be implemented with other candidate technologies, such as phase-change random access memory (PCRAM, Phase Change Random Access Memory), resistive random access memory (Resistive Random Access Memory), ferroelectric random access memory (FeRAM, Ferroelectric Random Access Memory), and ferroelectric dynamic random access memory (FEDRAM, Ferroelectric Dynamic Random Access Memory).
MRAM is a new memory and storage technology. Like SRAM/DRAM it supports fast random reads and writes, and it is faster than DRAM; like flash it retains data permanently after power loss, and unlike NAND flash it can be rewritten an unlimited number of times, giving it a longer service life. MRAM is also economical: its silicon area per unit of capacity has a large advantage over SRAM (usually used as CPU cache) and is expected to approach that of DRAM. Its performance is also good: read/write latency approaches that of the best SRAM, and its power consumption is the lowest among memory and storage technologies. Moreover, unlike DRAM and flash, MRAM is not incompatible with the standard CMOS semiconductor process, so it can be integrated on one chip with logic circuits. With MRAM, memory, storage, and computation can be integrated on a single chip, which makes the cellular array computing system feasible.
In this embodiment, the microprocessor has the functions of an ordinary CPU, and, depending on the application, units such as a floating-point unit (FPU, Float Point Unit) or an image processor may be added; that is, at least one of a floating-point unit and an image processor may be integrated into the microprocessor.
In practice, the master CPU, the cellular array, and the cellular array bus may be integrated on one chip, or the master CPU may be a separate chip that communicates with the chip containing the cellular array and the cellular array bus through a standard memory interface. When the master CPU and the cellular array communicate through a standard memory interface, a general-purpose CPU chip can serve as the master CPU, which makes the cellular array computing system easier to implement.
In this embodiment, each cell stores its own position in the cellular array as its ID. The position can be expressed as coordinates in the first quadrant of a Cartesian coordinate system: if (x, y) denotes a cell's position in the cellular array, (x, y) can be stored in that cell as its ID, and the cell's software and hardware can read this ID for use in specific operations.
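If the ID must fit in a single bus word, one option is to pack the two coordinates into it. The Python sketch below assumes a 32-bit word with 16 bits per coordinate; the packing layout and the function names are hypothetical, not taken from the patent.

```python
from typing import Dict, Tuple


def encode_id(x: int, y: int) -> int:
    """Pack a first-quadrant position (x, y) into one 32-bit ID word (16 bits each, assumed)."""
    if not (0 <= x < 1 << 16 and 0 <= y < 1 << 16):
        raise ValueError("coordinates must be non-negative and fit in 16 bits")
    return (x << 16) | y


def decode_id(word: int) -> Tuple[int, int]:
    """Recover (x, y) from an ID word produced by encode_id."""
    return word >> 16, word & 0xFFFF


def build_array(width: int, height: int) -> Dict[Tuple[int, int], int]:
    """Each cell stores its own position, encoded, as its ID."""
    return {(x, y): encode_id(x, y) for x in range(width) for y in range(height)}


ids = build_array(4, 3)
```

Using the position itself as the ID means no separate ID assignment step is needed, and a cell can compute its neighbours' IDs from its own.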
In this embodiment, communication between the master CPU and the cells of the cellular array over the cellular array bus includes the following cases:
reading and writing, by address, the non-volatile random access memory of any cell in the cellular array;
broadcasting data to the non-volatile random access memory of every cell in a target area of the cellular array, writing it at the same relative address in each cell's non-volatile random access memory;
sending instructions (including start and pause) or data to, or reading the status of, the microprocessor of any cell in the cellular array;
broadcasting instructions to the microprocessors of all cells in a target area.
Of course, in other embodiments, the communication between the master CPU and the cells over the cellular array bus may consist of any one of the above cases or any combination of them.
Note that a "target area" in the embodiments of the invention is a region of adjacent cells selected by the master CPU, or by any cell in the cellular array, as the objects to which it broadcasts data or instructions. In this embodiment, the target area is illustrated as a rectangle a ≤ x ≤ b, c ≤ y ≤ d, where a and b are the boundary coordinates of the rectangle along the x-axis of the Cartesian coordinate system, and c and d are its boundary coordinates along the y-axis. In other embodiments, the target area may have other shapes, such as a diamond, a triangle, or a hexagon.
In addition, "broadcast" and "mass-send" are different concepts in the embodiments: a broadcast sends data or an instruction once so that all objects receive it, whereas a mass-send may reach different objects through several separate transmissions.
Besides the master CPU's broadcasts to any cell in the cellular array (to the microprocessor or to the non-volatile random access memory in the cell), the cellular array also contains a communication network that lets a cell, under the control of its MPU, send data to its adjacent cells. As shown in Fig. 3, within the plane any cell can communicate with its neighbours in four directions: up, down, left, and right. Communication between adjacent cells is not limited to these four directions: where the configuration supports it, eight directions (up, down, left, right, upper-left, upper-right, lower-left, and lower-right) may be used, as shown in Fig. 4, so that any cell can communicate with its neighbours in all eight directions.
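The two neighbour configurations of Figs. 3 and 4 can be expressed as direction tables, as in this Python sketch. Treating "up" as +y follows the first-quadrant coordinates used for cell IDs; the names are illustrative. Cells on the array boundary simply have fewer neighbours.

```python
from typing import List, Tuple

CellID = Tuple[int, int]

FOUR = [(0, 1), (0, -1), (-1, 0), (1, 0)]              # up, down, left, right
EIGHT = FOUR + [(-1, 1), (1, 1), (-1, -1), (1, -1)]    # plus the four diagonals


def neighbours(cell: CellID, width: int, height: int,
               directions: List[Tuple[int, int]] = FOUR) -> List[CellID]:
    """Adjacent cells reachable through a direct communication interface."""
    x, y = cell
    return [(x + dx, y + dy) for dx, dy in directions
            if 0 <= x + dx < width and 0 <= y + dy < height]


interior4 = neighbours((5, 5), 10, 10)          # 4 neighbours
interior8 = neighbours((5, 5), 10, 10, EIGHT)   # 8 neighbours
corner4 = neighbours((0, 0), 10, 10)            # only up and right
```

Eight-way links roughly halve the hop count for diagonal transfers at the cost of twice as many interfaces per cell.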
As shown in Fig. 5, in this embodiment a cell in the cellular array may further include a bus controller and an internal cell bus. The bus controller is connected to the cellular array bus, the microprocessor, and the internal cell bus. It recognizes communication between the master CPU and this cell and, accordingly, either connects to the microprocessor to transfer instructions or data sent by the master CPU or to read its status, or connects through the internal cell bus to the non-volatile random access memory to read or write data.
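The bus controller's routing decision can be sketched in Python as a dispatcher. The transaction kinds (`mem_read`, `mem_write`, `mpu_cmd`, `mpu_status`), the word-addressed memory list, and the status strings are hypothetical simplifications; a real controller would decode these from bus signals and addresses.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

CellID = Tuple[int, int]


@dataclass
class BusTransaction:
    """One master-CPU transaction on the cellular array bus (hypothetical fields)."""
    target: CellID
    kind: str                        # "mem_read", "mem_write", "mpu_cmd" or "mpu_status"
    offset: int = 0
    value: Union[int, str, None] = None


class BusController:
    """Routes transactions addressed to this cell either to the MPU or, over the
    internal cell bus, to the non-volatile memory."""

    def __init__(self, cell_id: CellID, mem_words: int = 8192):
        self.cell_id = cell_id
        self.memory: List[int] = [0] * mem_words  # stand-in for the NV memory
        self.mpu_status = "idle"

    def handle(self, t: BusTransaction) -> Optional[Union[int, str]]:
        if t.target != self.cell_id:
            return None                           # not addressed to this cell
        if t.kind == "mem_write":
            self.memory[t.offset] = t.value
        elif t.kind == "mem_read":
            return self.memory[t.offset]
        elif t.kind == "mpu_cmd":                 # e.g. "start" or "pause"
            self.mpu_status = {"start": "running", "pause": "paused"}[t.value]
        elif t.kind == "mpu_status":
            return self.mpu_status
        else:
            raise ValueError(f"unknown transaction kind: {t.kind}")
        return None


bc = BusController((2, 3))
bc.handle(BusTransaction((2, 3), "mem_write", 10, 42))
bc.handle(BusTransaction((2, 3), "mpu_cmd", value="start"))
```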
As is known to those skilled in the art, a fairly simple CPU with good performance, such as the ARM Cortex-M0, has only about 50,000 MOS transistors; even with a modest FPU added, it is far smaller than a top-end CPU with hundreds of millions of MOS transistors, because the area (cost) required to raise CPU performance grows disproportionately. Replacing one large CPU with many small CPUs at the same total cost multiplies total computing capacity many times over. Conventional computer architectures, however, are limited by the communication bottleneck, so using a large number of CPU cores yields very limited real performance gains.
The cellular array computing system provided by the invention solves the communication bottleneck through data broadcast/mass-send and the internal network, improving overall system performance with good cost-effectiveness; this will be seen more clearly in the application examples that follow.
Preliminary studies show that a cell can be formed from an MPU similar to the Cortex-M0 paired with 32 KB of memory. In a 40 nm process, about 3,000 such cells fit on a single chip, which is a very powerful computing capability. Further study shows that, for the same silicon area, this approach can exceed the computing capability of today's top-end CPUs (generally measured in floating-point operations per second, FLOPS). Because the cellular array computing system of the invention no longer faces the memory-interface bottleneck, its performance is even better on many practical problems.
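A quick back-of-envelope check of the chip described above, using only figures quoted in this text (3,000 cells, 32 KB per cell, about 50,000 MOS transistors per Cortex-M0-class MPU). It deliberately ignores the transistors of the memories, bus controllers, and interconnect, so it is a rough sketch rather than a die estimate.

```python
CELLS_PER_CHIP = 3000       # cells per chip at 40 nm, as stated in the text
MEM_PER_CELL = 32 * 1024    # 32 KB of memory per cell, in bytes
MOS_PER_MPU = 50_000        # approximate transistor count quoted for a Cortex-M0-class core

total_memory_bytes = CELLS_PER_CHIP * MEM_PER_CELL  # 98,304,000 bytes
total_memory_mib = total_memory_bytes / 2**20       # 93.75 MiB, spread across all cells
total_mpu_mos = CELLS_PER_CHIP * MOS_PER_MPU        # 150,000,000 transistors in the MPUs
```

So the MPUs alone amount to roughly 150 million transistors, the same order as the "more than 100 million" quoted for one top-end CPU, while each MPU sits directly next to its own memory.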
Based on the above cellular array computing system, the embodiments of the invention also provide a communication method within that system, comprising: the master CPU reading and writing non-volatile random access memory; communication between the master CPU and a microprocessor; broadcast by the master CPU; and communication between adjacent cells in the cellular array.
The master CPU reads and writes non-volatile random access memory as follows: every cell in the cellular array receives the destination address that the master CPU places on the cellular array bus, and a cell that determines the destination address lies within itself connects its non-volatile random access memory so that the master CPU can read or write data.
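The address check each cell performs can be sketched in Python under an assumed memory map in which every cell's non-volatile memory occupies one fixed-size window of a flat address space, ordered row by row. The base address, row width, and window size are hypothetical values; the patent does not specify the mapping.

```python
from typing import Optional, Tuple

CellID = Tuple[int, int]
MEM_SIZE = 32 * 1024        # assumed NV memory per cell, in bytes
ARRAY_BASE = 0x4000_0000    # hypothetical start of the cellular-array window
WIDTH = 64                  # hypothetical number of cells per row


def cell_window(cell: CellID) -> Tuple[int, int]:
    """Half-open address range [lo, hi) mapped onto this cell's NV memory."""
    x, y = cell
    lo = ARRAY_BASE + (y * WIDTH + x) * MEM_SIZE
    return lo, lo + MEM_SIZE


def claim(cell: CellID, addr: int) -> Optional[int]:
    """Run by every cell on each destination address seen on the bus: return the
    local offset if the address falls inside this cell, else None (stay disconnected)."""
    lo, hi = cell_window(cell)
    return addr - lo if lo <= addr < hi else None


addr = ARRAY_BASE + (2 * WIDTH + 5) * MEM_SIZE + 100  # byte 100 of cell (5, 2)
```

Because the windows do not overlap, exactly one cell claims any address inside the array window, so no central decoder is required.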
Communication between the master CPU and a microprocessor works as follows: a first special address segment is reserved in the system address space for communication between the master CPU and microprocessors, and it carries the ID of the target cell. When a cell in the cellular array receives an address in the first special address segment and recognizes that the communication is with its own microprocessor, it connects its microprocessor to complete the subsequent command reception, data reception, and status read operations.
Note that the system address space is not limited to the sum of the address spaces of the non-volatile random access memories in the cells of the cellular array, because the memory attached to the cellular array bus is not necessarily limited to those memories; other kinds of memory may also be connected to the cellular array bus for the master CPU to access. The master CPU therefore identifies the cell it intends to access (called the "target cell" in this embodiment) by the cell's ID.
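One way the target cell's ID could be carried inside the first special address segment is sketched in Python below. The tag byte 0xE0 marking the segment and the 12-bit coordinate fields are hypothetical layout choices made for illustration only.

```python
from typing import Tuple

CellID = Tuple[int, int]
MPU_SEGMENT_TAG = 0xE0  # hypothetical top byte marking the first special address segment
COORD_BITS = 12         # assumed width of each coordinate field


def mpu_address(cell: CellID) -> int:
    """Address the master CPU would use to reach the MPU of `cell`."""
    x, y = cell
    limit = 1 << COORD_BITS
    if not (0 <= x < limit and 0 <= y < limit):
        raise ValueError("coordinate out of range")
    return (MPU_SEGMENT_TAG << 24) | (x << COORD_BITS) | y


def is_for_my_mpu(my_id: CellID, addr: int) -> bool:
    """Bus-controller check: is the address in the MPU segment, and does it carry my ID?"""
    if addr >> 24 != MPU_SEGMENT_TAG:
        return False  # an ordinary memory address or another segment
    mask = (1 << COORD_BITS) - 1
    return ((addr >> COORD_BITS) & mask, addr & mask) == my_id


addr = mpu_address((5, 2))  # 0xE0005002
```

Embedding the ID in the address lets every cell decide locally, from a single bus cycle, whether the following command or data is meant for its MPU.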
The master CPU's broadcast operation specifically comprises the following. A second special address field is reserved in the system address space for broadcast instructions from the master CPU. This field carries the IDs of cells that help determine the extent of the target area within the cellular array. When any cell in the cellular array receives the second special address field, it checks whether it lies within the target area. If it does, it takes one of two actions. It may connect its microprocessor, so that the microprocessor receives the instructions or data sent by the master CPU and its status can be read. Alternatively, it may connect its non-volatile random access memory for data read/write operations.
The broadcast operation is illustrated below for the case where the target area is a rectangle. A section of the system address space is reserved for broadcast instructions. An address in this section stores the ID of the starting-point cell of the target rectangle, which is the first cell in the rectangle accessed by the master CPU. After a cell's bus controller receives this special address, it receives the following data word. That word contains the ID of the cell diagonally opposite the starting-point cell in the target rectangle. If the bus controller determines that its cell lies within this rectangle, it receives the second word. The second word indicates whether what follows is an instruction or data for the MPU, or data to be written starting at some relative address in the non-volatile random access memory. In the former case the MPU is connected, and in the latter the non-volatile random access memory is connected. The subsequent operation is then completed.
It should be noted that the storage space of the second special address field may be limited. In that case, the cell ID stored in it may not fully determine the extent of the target area. After receiving the second special address field, a cell then also receives subsequent data, which works together with the stored cell ID to determine the target area.
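The bus controller's rectangle test can be sketched as below, as a minimal illustration. It assumes, hypothetically, that a cell ID encodes the cell's (row, column) position. The starting-point cell and the diagonal cell may be any two opposite corners, so the bounds are sorted first. The function names and the "connect" labels are illustrative, not taken from the patent.

```python
from typing import Tuple

CellId = Tuple[int, int]  # assumption: a cell ID encodes its (row, col) position

def in_rectangle(me: CellId, start: CellId, diagonal: CellId) -> bool:
    """Is this cell inside the rectangle spanned by the starting-point cell
    and the diagonally opposite cell (in either corner order)?"""
    r0, r1 = sorted((start[0], diagonal[0]))
    c0, c1 = sorted((start[1], diagonal[1]))
    return r0 <= me[0] <= r1 and c0 <= me[1] <= c1

def handle_broadcast(me: CellId, start: CellId, diagonal: CellId, to_mpu: bool) -> str:
    """What the bus controller connects, given the word that says MPU vs. memory."""
    if not in_rectangle(me, start, diagonal):
        return "ignore"
    return "connect MPU" if to_mpu else "connect NVRAM"

# A 4 x 6 array; the broadcast targets the rectangle from (1, 4) to (2, 1).
selected = [(r, c) for r in range(4) for c in range(6)
            if in_rectangle((r, c), (1, 4), (2, 1))]
print(len(selected))  # rows 1-2 x columns 1-4 = 8 cells
```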
Communication between adjacent cells in the cellular array comprises the following: any cell in the cellular array sends data to an adjacent cell under the control of its microprocessor.
In this embodiment, each cell is provided with a bus controller connected to the cellular array bus. Each cell also contains a cell-internal bus. The non-volatile random access memory is a slave device on this internal bus, and the bus controller and the microprocessor are master devices.
Several steps of the communication method above are all performed by the bus controller: any cell determining whether the destination address lies within itself, identifying whether a communication is addressed to its microprocessor, identifying whether it lies within the target area, and connecting the non-volatile random access memory or the microprocessor. The bus controller connects to the non-volatile random access memory through the cell-internal bus.
In a specific implementation, read/write operations by the master CPU on the non-volatile random access memory of any cell have higher priority than read/write operations by that cell's own microprocessor on the same memory. That is, if the microprocessor in a cell needs to read or write the cell's non-volatile random access memory, it must wait until the master CPU has finished its read/write operation on that memory.
For further details of the communication method, reference may be made to the implementation of the cellular array computing system described above, and they are not repeated here.
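The priority rule amounts to a fixed-priority arbiter on the memory port. The sketch below decides per request cycle. The patent does not say whether an MPU access already in progress is pre-empted, so that case is deliberately left out. The function name and the trace are illustrative.

```python
from typing import Optional

def arbitrate(cpu_req: bool, mpu_req: bool) -> Optional[str]:
    """Fixed-priority grant for one cell's non-volatile memory port:
    the master CPU always wins; the MPU is granted only when the CPU is idle."""
    if cpu_req:
        return "CPU"
    if mpu_req:
        return "MPU"
    return None

# The MPU requests every cycle but has to wait while the master CPU is active.
trace = [(True, True), (True, True), (False, True), (False, False)]
grants = [arbitrate(c, m) for c, m in trace]
print(grants)
```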
In this embodiment, the internal network of the cellular array is not limited to sending data to adjacent cells. It is extended so that data can be sent from any cell to any other cell, which enables cell-to-cell communication within the cellular array.
Specifically, any two cells in the cellular array of the computing system can communicate without relying on the master CPU. Three kinds of cell take part in a cell-to-cell communication: the starting-point cell, the end-point cell and transfer cells. The starting-point cell issues data to the end-point cell, and the end-point cell is the cell that finally receives that data. The transfer cells lie successively adjacent along the cell-to-cell communication path and relay the starting-point cell's data through the communication interfaces. The cell-to-cell communication path is the data path formed by the starting-point cell, the transfer cells and the end-point cell.
The communication interfaces between adjacent cells allow data to be relayed many times from one adjacent cell to the next. As a result, any two cells in the cellular array can communicate without relying on the master CPU. This improves the efficiency of cell-to-cell communication, reduces the processing load on the master CPU, and further improves the overall performance of the computing system.
It should be noted that "starting-point cell", "end-point cell" and "transfer cell" are roles relative to one particular cell-to-cell communication. A cell that is the starting point in one communication may act as a transfer cell or end-point cell in another. Likewise, an end-point cell may act as a transfer cell or starting-point cell in other communications.
In a specific implementation, each cell in the cellular array may further include a network controller connected to the microprocessor. During cell-to-cell communication, the network controller controls the sending and receiving of outgoing data, relayed data and finally received data. It also sends interrupt signals to the microprocessor. Providing a network controller in each cell lets the cell relay data quickly without disturbing the MPU, which reduces the MPU's processing load. In other embodiments, the network controller may be omitted and the MPU may relay data itself.
In this embodiment, these terms have the following meanings. "Outgoing data" means data that the starting-point cell itself sends out. "Relayed data" means data issued by the starting-point cell that a transfer cell passes on; the transfer cell forwards it because it does not need the data itself. "Finally received data" means data received by the end-point cell; it has reached its destination after several relays and is not forwarded further. Outgoing data, relayed data and finally received data may have identical content; they differ only in the communication stage at which they are observed.
In a specific implementation, each cell in the cellular array may further include one or more groups of first-in first-out queues (FIFOs) connected to the network controller. Each group corresponds to one cell adjacent to this cell, and each group consists of an input FIFO and an output FIFO. The input FIFO stores data arriving at this cell, whether the data is to be relayed or is finally received. The output FIFO stores data leaving this cell, whether it is relayed data or data that this cell issues to other cells.
Suppose adjacent cells communicate in the manner shown in Fig. 3. A cell that participates in cell-to-cell communication in the cellular array of this embodiment is then structured as shown in Fig. 6. The network controller in Fig. 6 is connected to the MPU and to four groups of FIFO queues. Each group corresponds one-to-one to the adjacent cell above, below, to the left or to the right of this cell in the two-dimensional plane. In practice, the communication channel between each pair of adjacent cells may share one group of FIFO queues. Each group consists of an input FIFO and an output FIFO. From the point of view of a given cell, the input FIFO stores data arriving from an adjacent cell, and the output FIFO stores data that this cell sends to an adjacent cell. An adjacent cell's output FIFO therefore serves as this cell's input FIFO, and this cell's output FIFO serves as the adjacent cell's input FIFO.
It should be noted that the cell shown in Fig. 6 has four groups of FIFO queues. A cell at one of the four corners of a rectangular cellular array has only two adjacent cells, so it has two groups of FIFO queues. A cell on one of the four edges has three adjacent cells, so it has three groups.
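The rule that corner cells need two FIFO groups, edge cells three and interior cells four follows directly from counting neighbours in a four-neighbour (Fig. 3 style) mesh. This sketch assumes cells are addressed by (row, column). The function name is illustrative.

```python
from typing import List

def neighbor_directions(row: int, col: int, rows: int, cols: int) -> List[str]:
    """Directions in which a cell has an adjacent cell in a rows x cols mesh;
    the cell needs one input/output FIFO group per direction."""
    dirs = []
    if row > 0:
        dirs.append("up")
    if row < rows - 1:
        dirs.append("down")
    if col > 0:
        dirs.append("left")
    if col < cols - 1:
        dirs.append("right")
    return dirs

rows, cols = 4, 5
print(len(neighbor_directions(0, 0, rows, cols)))  # corner cell
print(len(neighbor_directions(0, 2, rows, cols)))  # edge cell
print(len(neighbor_directions(2, 2, rows, cols)))  # interior cell
```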
In this embodiment, the network controller is also connected to the cell's MPU and sends it interrupt signals. Examples are a FIFO becoming empty or full, new data arriving, and data having been sent out. The MPU in turn can issue data through the network controller. Outgoing data is normally placed first into the corresponding output FIFO queue.
It should be noted that the cell structure in Fig. 6 shows only the modules related to cell-to-cell communication. Those skilled in the art will understand that it can be fully combined with the cell structure shown in Fig. 5.
In addition, this embodiment uses FIFO queues to store a cell's input and output data. This makes relaying during cell-to-cell communication more efficient and reduces the MPU's processing load. In other embodiments, a cell's input and output data may instead be held in registers.
An embodiment of the present invention further provides a method of cell-to-cell communication in the above cellular array computing system. First, the starting-point cell in the cellular array sends the data intended for the end-point cell, in a selected sending direction, to a cell adjacent to the starting-point cell. Then, whenever any cell in the cellular array receives data issued or relayed by an adjacent cell, it reads the end-point cell ID indicated in the data. If that ID shows the cell is the end-point cell, it stores the received data in its non-volatile random access memory or notifies its microprocessor to process the data. Otherwise, the cell acts as a transfer cell and relays the received data, in a selected sending direction, to one of its adjacent cells.
In a specific implementation, every piece of data in a cell-to-cell communication carries the IDs of both the starting-point cell and the end-point cell. Any cell can therefore tell, from the end-point cell ID in the received data, whether the data is meant for itself or must be relayed further to an adjacent cell. The data passes through the connections between adjacent cells and, after several relays, reaches the end-point cell. If the end-point cell needs to respond to the data, it can send feedback to the starting-point cell using that cell's ID. After processing the received data, it writes the starting-point cell ID from the received data into the feedback data as the new end-point cell ID. The end-point cell thus becomes the starting-point cell of a new cell-to-cell communication, and the original starting-point cell becomes its end-point cell.
In practice, the data that the starting-point cell issues to the end-point cell carries the end-point cell ID. It also indicates either an address to be accessed in the end-point cell or that cell's MPU. The end-point cell stores the received data in its non-volatile random access memory after it recognizes an address to be accessed in the data. It notifies its MPU to process the data after it recognizes that the data is addressed to its MPU.
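The packet contents described in the last two paragraphs can be sketched as a small record type: a starting-point ID, an end-point ID, and a target that is either a relative memory address or the MPU. The sketch also shows the role swap when the end-point cell sends feedback. The field names, the (row, column) ID encoding and the choice of "MPU" as the target of a reply are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple, Union

CellId = Tuple[int, int]  # assumption: (row, col) encoding of a cell ID

@dataclass(frozen=True)
class Packet:
    src: CellId               # starting-point cell ID
    dst: CellId               # end-point cell ID
    target: Union[int, str]   # relative address in the end cell's memory, or "MPU"
    payload: bytes

    def reply(self, payload: bytes) -> "Packet":
        """Feedback from the end-point cell: the old source becomes the new end point.
        Addressing the reply to the MPU is an illustrative choice."""
        return Packet(src=self.dst, dst=self.src, target="MPU", payload=payload)

req = Packet(src=(0, 0), dst=(3, 2), target=0x100, payload=b"prog")
ack = req.reply(b"ok")
print(ack.src, ack.dst)  # (3, 2) (0, 0)
```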
In actual implementation, the end-point cell may recognize an address to be accessed in the received data. Its network controller can then write the data directly to that address in the cell's non-volatile random access memory. This lets a cell "reproduce": one cell can download a program into another cell. If the end-point cell instead recognizes that the data is addressed to its MPU, it hands the data over to the MPU for processing.
In this embodiment, each cell also includes a network controller connected to the MPU. The following operations are therefore all completed under the control of the network controller:
- the starting-point cell issuing data to the end-point cell;
- any cell receiving data issued or relayed by an adjacent cell and determining whether it is the end-point cell or a transfer cell;
- storing the received data in the cell's non-volatile random access memory, or notifying the cell's MPU to process it.
In practice, the network controller first places the data that the starting-point cell issues into the output FIFO. It then sends the data from the output FIFO to the cell adjacent to the starting-point cell. When any cell in the cellular array receives data issued or relayed by an adjacent cell, the data is placed in the input FIFO. If the cell determines that the data must be relayed, it then moves the data into the output FIFO.
In addition, the network controller sends an interrupt signal to the microprocessor in the following cases: it determines that the input FIFO or output FIFO is empty or full; it receives data issued or relayed by an adjacent cell; or it issues or relays data to an adjacent cell.
In a specific implementation, the starting-point cell or a transfer cell may select the sending direction as follows. If a straight-line communication path can be formed between that cell and the end-point cell, the sending direction is along that straight line toward the end-point cell. Otherwise, the sending direction is toward a candidate adjacent cell, meaning a cell adjacent to the starting-point or transfer cell that is closer to the end-point cell. There may be two candidate adjacent cells. In that case, the candidate with the lighter output-data communication load is chosen as the transfer cell.
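The direction rule can be sketched as a next-hop function on a four-neighbour mesh. This is a minimal sketch under stated assumptions. Cell IDs are (row, column) coordinates. `out_load` is a hypothetical map from each neighbour to the occupancy of the output FIFO toward it. Ties are broken at random, as the example of points E and F describes. A straight-line destination yields a single candidate; otherwise there are exactly two, one per axis.

```python
import random
from typing import Dict, List, Optional, Tuple

CellId = Tuple[int, int]  # assumption: (row, col)

def next_hop(cur: CellId, dst: CellId, out_load: Dict[CellId, int],
             rng: random.Random) -> Optional[CellId]:
    """Adjacent cell to send to; None if cur is already the end-point cell."""
    r, c = cur
    candidates = []
    if dst[0] != r:
        candidates.append((r + (1 if dst[0] > r else -1), c))
    if dst[1] != c:
        candidates.append((r, c + (1 if dst[1] > c else -1)))
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]          # straight line: only one sensible direction
    a, b = candidates
    la, lb = out_load.get(a, 0), out_load.get(b, 0)
    if la != lb:
        return a if la < lb else b    # the less busy neighbour
    return rng.choice(candidates)     # equal load: choose at random

def route(src: CellId, dst: CellId, out_load: Dict[CellId, int],
          rng: random.Random) -> List[CellId]:
    path = [src]
    while (nxt := next_hop(path[-1], dst, out_load, rng)) is not None:
        path.append(nxt)
    return path

print(route((0, 0), (0, 3), {}, random.Random(0)))  # straight line, as from A to C
print(next_hop((1, 1), (3, 3), {(2, 1): 5, (1, 2): 1}, random.Random(0)))  # lighter neighbour
```

Every hop moves one step closer to the end point, so the path length equals the Manhattan distance regardless of how ties are broken.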
In this embodiment, the way the starting-point cell or transfer cell selects the sending direction can in effect be regarded as route selection for cell-to-cell communication in the cellular array. Refer to Fig. 7, in which each rectangle represents one cell of the cellular array. The cells shown in Fig. 7 are only part of the entire cellular array, and adjacent cells are assumed to communicate in the manner shown in Fig. 3.
Suppose point A denotes a starting-point cell that is to issue data to the end-point cell at point C. A and C clearly lie on a straight-line communication path, so the cell at A sends the data to the adjacent cell at B. Likewise, the cell at B, acting as a transfer cell, relays the data along the straight line from A to C toward the cell at C. The successively adjacent cells on the path between A and C thus forward the data issued by the cell at A, hop by hop, until it reaches the cell at C.
Suppose point D denotes another starting-point cell that is to issue data to the end-point cell at point G. D and G clearly cannot be joined by a straight-line path. Among the cells adjacent to the cell at D, the cells at E and F are evidently closer to the end-point cell at G, so these two cells are the candidate adjacent cells. The candidate with the lighter output-data load can be chosen as the transfer cell. If the two loads are equal, one of the two is chosen at random. As shown in Fig. 7, choosing the cell at E or the cell at F yields different cell-to-cell communication paths.
It should be noted that this embodiment illustrates route selection using the adjacent-cell communication mode of Fig. 3. Those skilled in the art will understand that more sending directions are available with the adjacent-cell communication mode of Fig. 4.
In summary, in actual implementation, the network controller of every cell that issues or relays data must select one adjacent cell as the next hop. When the start point and end point lie on a straight line, there is generally only one reasonable choice. Otherwise, there are two equally reasonable choices, and the network controller selects the neighbour with relatively less traffic.
When data enters an input FIFO queue, the network controller first checks it:
If the end point is this cell, the network controller acts according to the target indicated in the data. If the target is a specific relative address, the controller uses its direct memory access (DMA) capability to store the data directly at that address in the non-volatile random access memory, and then notifies the MPU by interrupt. If the target is the MPU, the controller notifies the MPU by interrupt so that the MPU processes the data directly.
If the end point is another cell, or the data was issued by this cell's own MPU, the controller forwards the data. If the end point lies on a straight line with this cell, the controller sends the data to the adjacent cell in that direction. Otherwise there are two possible directions, and the controller sends the data to the adjacent cell whose output FIFO queue is less busy. If the output FIFO queues toward the two candidate cells are equally loaded, one of the two is chosen at random.
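The check the network controller runs on each packet leaving an input FIFO can be sketched as follows. To keep the example self-contained, the next-hop choice is passed in as a `forward` callback rather than reimplemented. The packet is a plain dict, and the memory is a `bytearray` standing in for the non-volatile memory. The dict keys, the callback names and the interrupt strings are all hypothetical.

```python
from typing import Callable, Tuple

CellId = Tuple[int, int]  # assumption: (row, col)

def on_input_fifo(packet: dict, me: CellId, memory: bytearray,
                  interrupt: Callable[[str], None],
                  forward: Callable[[dict], None]) -> None:
    """Handle one packet taken from an input FIFO."""
    if packet["dst"] != me:
        forward(packet)               # transfer cell: push into the chosen output FIFO
        return
    target, data = packet["target"], packet["payload"]
    if target == "MPU":
        interrupt("data for MPU")     # MPU processes the data itself
        return
    if not 0 <= target <= len(memory) - len(data):
        raise ValueError("write would fall outside this cell's memory")
    memory[target:target + len(data)] = data   # DMA-style direct write
    interrupt("data written")

mem = bytearray(16)
events, forwarded = [], []
on_input_fifo({"dst": (1, 1), "target": 4, "payload": b"\x01\x02"},
              (1, 1), mem, events.append, forwarded.append)
on_input_fifo({"dst": (2, 3), "target": "MPU", "payload": b"x"},
              (1, 1), mem, events.append, forwarded.append)
print(mem[4:6], events, len(forwarded))
```

The bounds check matters in Python specifically: slice assignment past the end of a `bytearray` would silently grow it instead of failing.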
In actual implementation, the thousands of MPUs in the cellular array compute together, which raises the problem of how to deliver each cell's output data to the master CPU. Typically, each MPU stores its output data at an agreed address in its cell's non-volatile random access memory, and the master CPU reads the results by polling the MPUs one by one. This does not suit every problem, however. In some problems only a few cells in the cellular array need to send output data to the master CPU, and polling every MPU in turn is then too inefficient.
Therefore, the cellular array computing system provided by an embodiment of the present invention further comprises at least one full-time output cell in the cellular array. Acting as an end-point cell, the full-time output cell receives and stores the output data that other cells send to the master CPU. It then notifies the master CPU by an interrupt signal to read that output data.
In a specific implementation, a FIFO queue may also be set up in the non-volatile random access memory of the full-time output cell. All output data from other cells to the master CPU is stored in this FIFO queue, so the queue should have enough storage space to hold all of that data.
In actual implementation, one or several cells of the cellular array may be selected as full-time output cells. Cells whose position makes communication with the master CPU convenient are generally chosen. An interrupt line is provided between each full-time output cell and the master CPU. Over this line, the full-time output cell can send interrupt signals to the master CPU, for example when new output data arrives from another cell, when the FIFO queue set up in the MRAM is full, or when that queue is empty.
Based on the above cellular array computing system with a full-time output cell, an embodiment of the present invention further provides a communication method in the cellular array computing system. After the full-time output cell receives and stores output data that other cells send to the master CPU, it sends the master CPU an interrupt signal notifying it to read. After receiving this interrupt signal, the master CPU reads the output data from the full-time output cell.
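The FIFO in the full-time output cell's memory and its interrupt handshake with the master CPU can be sketched as a fixed-capacity ring buffer. Each slot stands for one record in the reserved memory region. The capacity, the record format and the interrupt reason strings are illustrative assumptions. In this sketch, the master CPU's handler drains the queue whenever new data is signalled.

```python
from typing import Callable, List, Optional

class OutputCellFifo:
    """Ring-buffer FIFO kept in a reserved region of the full-time output cell's memory."""

    def __init__(self, capacity: int, raise_irq: Callable[[str], None]):
        self.slots: List[Optional[bytes]] = [None] * capacity
        self.head = 0           # next slot the master CPU reads
        self.count = 0
        self.raise_irq = raise_irq

    def push(self, record: bytes) -> bool:
        """Store output data that arrived from another cell."""
        if self.count == len(self.slots):
            self.raise_irq("fifo full")
            return False
        self.slots[(self.head + self.count) % len(self.slots)] = record
        self.count += 1
        self.raise_irq("new output data")
        return True

    def pop(self) -> Optional[bytes]:
        """Read by the master CPU over the cellular array bus."""
        if self.count == 0:
            return None
        record, self.slots[self.head] = self.slots[self.head], None
        self.head = (self.head + 1) % len(self.slots)
        self.count -= 1
        if self.count == 0:
            self.raise_irq("fifo empty")
        return record

received = []

def master_cpu_irq(reason: str) -> None:
    if reason == "new output data":
        while (rec := fifo.pop()) is not None:
            received.append(rec)

fifo = OutputCellFifo(4, master_cpu_irq)
fifo.push(b"from H")
fifo.push(b"from I")
print(received)
```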
In practice, the other cells may send their output data to the full-time output cell as follows. Any of the other cells, acting as a starting-point cell, sends its output data to an adjacent cell in a selected sending direction. When any cell in the cellular array receives output data from an adjacent cell, it compares the end-point cell ID in the data with its own ID. That end-point ID is the ID of the full-time output cell, so a match means this cell is the full-time output cell, and it stores the output data in its non-volatile random access memory. Otherwise, the cell acts as a transfer cell and relays the output data to an adjacent cell in a selected sending direction.
While output data travels to the full-time output cell, the starting-point cell or transfer cell may select the sending direction as follows. If a straight-line communication path can be formed between that cell and the full-time output cell, the sending direction is along that straight line toward the full-time output cell. Otherwise, the sending direction is toward a candidate adjacent cell, meaning a cell adjacent to the starting-point or transfer cell that is closer to the full-time output cell.
The implementation of the full-time output cell in this embodiment is further illustrated in Fig. 8, which shows the master CPU, the cellular array and the cellular array bus. Each small square in the cellular array represents one cell. The cell at point J, drawn as a square with a bold border, is the full-time output cell. Fig. 8 also shows the structure of the full-time output cell: as indicated by the dashed arrow, the MRAM in the full-time output cell contains a FIFO queue that stores all output data from other cells to the master CPU.
Suppose the cells at points H and I need to provide output data to the master CPU. They can send the output data to the cell at J using cell-to-cell communication. The communication paths from H to J and from I to J are shown in Fig. 8. Cell-to-cell communication has been described in detail above, so it is not repeated here.
After the cell at J receives the output data issued by the cell at H or the cell at I, it can send the master CPU an interrupt signal notifying it to read. On receiving this interrupt signal, the master CPU can read the output data from the cell at J over the cellular array bus.
A full-time output cell can therefore be provided in the cellular array. It acts as the end-point cell that receives and stores other cells' output data for the master CPU, and it notifies the master CPU by an interrupt signal to read that data. When only a few cells need to send output data to the master CPU, this improves the efficiency with which the master CPU reads output data.
An example application of the above cellular array computing system is described below.
Speech recognition can compare an input voice signal against a known sound library, and the comparison can be performed in either the time domain or the frequency domain. The number of words to compare can grow large; for example, accounting for different accents can bring it to tens of thousands. The computing capability of only a few CPUs is then insufficient for real-time speech recognition.
The cellular array computing system provided by embodiments of the present invention is well suited to this kind of problem.
To this end, an embodiment of the present invention further provides a method of performing comparisons using the above cellular array computing system. The method comprises the following steps.
1. The master CPU selects all cells in the cellular array, or the cells in a target area, and broadcasts the comparison program into the non-volatile random access memory of each selected cell.
2. The master CPU writes the samples that each selected cell is responsible for comparing to the agreed address in that cell.
3. The master CPU broadcasts an instruction to the microprocessors of the selected cells, so that each microprocessor initializes and then waits for the data to be compared.
4. The master CPU broadcasts the data to be compared to the microprocessors of the selected cells.
5. The microprocessor of each selected cell runs the comparison program, comparing the received data against the samples that its cell is responsible for. If the two match, it sends the comparison result as output data to the full-time output cell, using the communication method of the cellular array computing system described above, for the master CPU to read.
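The comparison flow can be simulated sequentially to show which cells end up reporting to the full-time output cell. The real cells run concurrently. The matching criterion here, a mean absolute difference under a threshold, is purely a hypothetical stand-in, because the patent does not specify the comparison program. The sample vectors and labels are made-up illustration data.

```python
from typing import Dict, List, Sequence, Tuple

def mean_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def run_comparison(samples_per_cell: Dict[int, Dict[str, List[float]]],
                   query: List[float], threshold: float) -> List[Tuple[int, str]]:
    """Each selected cell compares the broadcast query with the samples it is
    responsible for; only matching cells send (cell_id, label) to the output cell."""
    output_fifo = []
    for cell_id, samples in samples_per_cell.items():   # conceptually in parallel
        for label, sample in samples.items():
            if len(sample) == len(query) and mean_abs_diff(sample, query) <= threshold:
                output_fifo.append((cell_id, label))
    return output_fifo

cells = {
    0: {"yes": [0.1, 0.9, 0.2], "no": [0.8, 0.1, 0.0]},
    1: {"hello": [0.5, 0.5, 0.9]},
    2: {"stop": [0.0, 0.0, 1.0]},
}
print(run_comparison(cells, [0.52, 0.48, 0.88], threshold=0.05))  # [(1, 'hello')]
```

As the text notes, typically only one or a few of many cells produce a result, which is exactly the case the full-time output cell is designed for.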
In practice, the data to be compared may be voice data to be recognized, image data to be recognized, or any other data that needs to be compared.
In actual implementation, each MPU continuously receives voice data and compares it. Typically, among hundreds to thousands of cells, only one or a few find that the incoming data matches the samples they are responsible for. These cells send their comparison results to the full-time output cell, which then notifies the master CPU by an interrupt signal to collect them.
If the data to be compared is voice data, the comparison may be performed in the time domain or in the frequency domain. In the frequency-domain case, the master CPU can first split the data into segments and apply a Fast Fourier Transform (FFT) to each. It then broadcasts the frequency-domain voice data to the MPUs of the selected cells.
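The master CPU's preprocessing for frequency-domain comparison can be sketched as framing followed by an FFT. This uses a textbook recursive radix-2 Cooley-Tukey FFT in plain Python. The non-overlapping frames, the power-of-two frame length and the decision to keep only magnitude spectra are assumptions made for illustration. The patent does not fix any of these parameters.

```python
import cmath
import math
from typing import List, Sequence

def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def frames_to_spectra(samples: List[float], frame_len: int) -> List[List[float]]:
    """Split speech into non-overlapping frames (a trailing partial frame is dropped)
    and return each frame's magnitude spectrum for broadcasting to the cells."""
    return [[abs(v) for v in fft(samples[s:s + frame_len])]
            for s in range(0, len(samples) - frame_len + 1, frame_len)]

# A tone with period 4 samples: in each 8-sample frame it lands in bins 2 and 6.
signal = [math.cos(math.pi * t / 2) for t in range(16)]
spectra = frames_to_spectra(signal, 8)
print(len(spectra), [round(m, 3) for m in spectra[0]])
```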
Applying the above cellular array computing system, equipped with a full-time output cell, to comparison tasks lets a large number of cells in the array run the comparison program simultaneously. The system therefore has very strong parallel processing capability. This resolves the prior-art communication bottleneck between the CPU and memory and greatly improves real-time voice and image recognition capability.
As mentioned earlier, there is a simple way for one cell to send a bulk message to a target area of the cellular array: the master CPU reads the information and then broadcasts it. This embodiment also provides another implementation, which extends point-to-point cell communication to mass-sending to an area. This approach supports a higher degree of parallelism and a much higher total bandwidth.
In the cellular array computing system provided by this embodiment, any cell in the cellular array may also act as the starting-point cell and perform mass-send communication to all cells in a target area. Participating cells inside the target area act as the starting-point cell, as end-point cells, or as both transfer cells and end-point cells. Participating cells outside the target area act as the starting-point cell or as transfer cells.
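The roles in mass-send communication, and the hop a cell outside the target area takes toward it, can be sketched as below. The sketch rests on several assumptions. Cell IDs are (row, column) coordinates. The target area is an inclusive rectangle `(r0, c0, r1, c1)`. The approach step moves along rows first, which is an arbitrary tie rule. The fan-out inside the area and the way the target area is updated for each neighbour are not modelled here.

```python
from typing import Optional, Tuple

CellId = Tuple[int, int]              # assumption: (row, col)
Region = Tuple[int, int, int, int]    # assumption: inclusive rectangle (r0, c0, r1, c1)

def in_region(cell: CellId, region: Region) -> bool:
    r0, c0, r1, c1 = region
    return r0 <= cell[0] <= r1 and c0 <= cell[1] <= c1

def role(cell: CellId, region: Region, is_origin: bool) -> str:
    """Simplified role of a participating cell in a mass-send communication."""
    if is_origin:
        return "starting point"
    return "end point" if in_region(cell, region) else "transfer"

def step_toward_region(cell: CellId, region: Region) -> Optional[CellId]:
    """One hop for a cell outside the area, toward the nearest in-area cell;
    None if the cell is already inside."""
    r0, c0, r1, c1 = region
    tr = min(max(cell[0], r0), r1)    # nearest row inside the area
    tc = min(max(cell[1], c0), c1)    # nearest column inside the area
    if (tr, tc) == cell:
        return None
    if tr != cell[0]:
        return (cell[0] + (1 if tr > cell[0] else -1), cell[1])
    return (cell[0], cell[1] + (1 if tc > cell[1] else -1))

region = (3, 3, 4, 5)
cell, hops = (0, 0), 0
while (nxt := step_toward_region(cell, region)) is not None:
    cell, hops = nxt, hops + 1
print(hops, cell, role(cell, region, is_origin=False))
```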
In practice, each cell's network controller, which is connected to the microprocessor, handles two kinds of communication. It controls the sending and receiving of outgoing, relayed and finally received data in cell-to-cell communication between any two cells. It does the same in mass-send communication. It also sends interrupt signals to the microprocessor.
In actual implementation, the original sender of a mass-send communication, that is, the cell acting as starting-point cell, is responsible for indicating the target area. The data is still distributed through a series of relays. Those skilled in the art will understand that mass-send communication among cells can be regarded as an efficient superposition of multiple point-to-point communications. Its implementation can therefore follow that of communication between any two cells. For example, cells in the cellular array may likewise include one or more groups of FIFO queues connected to the network controller, as described earlier. Details are not repeated here.
Building on the inter-cell multicast supported by the above cellular array computing system, an embodiment of the present invention further provides an inter-cell multicast method for this system, comprising the following.

When any cell in the cellular array, acting as starting-point cell, initiates a multicast to all cells in a target area: if the starting-point cell is located inside the target area, it sends the inter-cell multicast data to all of its neighboring cells inside the target area and updates the target area for each of those neighbors; otherwise, it sends the inter-cell multicast data to a neighboring cell in the direction approaching the target area.

If a cell located outside the target area receives inter-cell multicast data from a neighboring cell and determines that the target area indicated in the data does not include itself, it acts as a relay cell and forwards the data to a neighboring cell in the direction approaching the target area.

If a cell located inside the target area receives inter-cell multicast data from a neighboring cell and determines that the target area indicated in the data includes itself, it acts as an end-point cell: it stores the received data in its non-volatile random access memory, or notifies its microprocessor to process the data. If the target area still contains cells adjacent to it, the cell also acts as a relay cell, forwards the received data to all of its neighboring cells inside the target area, and updates the target area for each of those neighbors.

The updated target area consists of one or more target areas obtained by dividing the target area before the update. Each neighbor of the cell that sent or relayed the inter-cell multicast data is contained in its own updated target area, and the cell that sent or relayed the data is excluded from the updated target areas.
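The per-cell decision in the method above can be sketched as follows. This is a minimal Python illustration, not the patent's implementation: it assumes the target area is encoded as an inclusive rectangle of (row, column) coordinates instead of the cell-ID list or ID range the text mentions, and the names `Rect` and `on_receive` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Hypothetical target-area encoding: inclusive rows r0..r1, columns c0..c1."""
    r0: int
    r1: int
    c0: int
    c1: int

    def contains(self, cell):
        r, c = cell
        return self.r0 <= r <= self.r1 and self.c0 <= c <= self.c1

    def size(self):
        return (self.r1 - self.r0 + 1) * (self.c1 - self.c0 + 1)


def on_receive(cell, target):
    """Role of a cell that has just received inter-cell multicast data.

    Returns (store_locally, relay_inside_area, relay_toward_area).
    """
    if not target.contains(cell):
        # Outside the target area: pure relay cell, forward toward the area.
        return (False, False, True)
    # Inside: end-point cell. If any other cell is left in a rectangle that
    # contains this cell, at least one of them is adjacent, so it relays too.
    return (True, target.size() > 1, False)
```

For example, `on_receive((3, 3), Rect(2, 4, 2, 5))` returns `(True, True, False)`: the cell stores the data and keeps relaying inside the area.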
Note that the master CPU can also broadcast the data of a cell to a target area of the cellular array. To distinguish the two cases, the data involved in inter-cell multicast are called "inter-cell multicast data" in this embodiment, as opposed to "broadcast data of the master CPU". The cell that initiates an inter-cell multicast specifies the target area explicitly: the IDs of all cells in the target area, or the range of those IDs, are indicated in the inter-cell multicast data. When a cell receives the data, it can determine from the indicated target area whether the data is to be finally received by itself, needs to be relayed further to other neighboring cells, or both.
Specifically, updating the target area for each neighboring cell means dividing the target area before the update into one or more target areas, with the cell that sent or relayed the inter-cell multicast data excluded from all of them. Each resulting target area contains one such neighbor (a cell adjacent to the cell that sent or relayed the data within the pre-update target area), and each neighbor continues the inter-cell multicast within its own updated target area. Correspondingly, the target area indicated in the inter-cell multicast data is updated as well.
This embodiment is described using the communication scheme between neighboring cells shown in Fig. 3, taking as the example a rectangular target area determined by the starting-point cell that initiates the multicast. Note that the inter-cell multicast scheme given here is a convenient and efficient choice in practice; those skilled in the art will understand that in other embodiments the method is equally applicable to other neighbor-communication schemes of the above cellular array computing system, or to target areas of other shapes.
In a concrete implementation, the way data is sent or relayed depends on where the cell acting as starting-point cell or relay cell is located.
When a first cell, acting as starting-point cell or relay cell, is located at a corner of the rectangular target area: if one of the two sides of the rectangle that meet at the first cell contains only one cell, the updated target area is the rectangle formed by the other of those two sides after excluding the first cell; otherwise, the updated target area consists of two rectangles, one formed by either of the two sides after excluding the first cell and the other formed by the rest of the rectangle. Note that "first cell" in this embodiment is a general term for any cell located at a corner of the rectangular target area.
Referring to Fig. 9, suppose the cell at point K is the starting-point cell initiating an inter-cell multicast, or a relay cell responsible for relaying inter-cell multicast data, and rectangular target area 101 is the target area determined before the cell at K sends or relays the data. The cell at K lies inside rectangular target area 101, at one of its corners. Because the horizontal side of area 101 contains only one cell, the cell at K has only one neighbor that can serve as the next relay: its network controller sends the data to the cell at point L and updates area 101. The updated target area is rectangular target area 102, which amounts to removing the cell at K from area 101. As the target area keeps being updated, relaying stops once only the last cell remains in the target area.
Suppose instead that the cell at point M is the starting-point cell initiating an inter-cell multicast, or a relay cell responsible for relaying inter-cell multicast data, and rectangular target area 103 is the target area determined before the cell at M sends or relays the data. The cell at M lies inside area 103, at one of its corners. Because both sides of area 103 that meet at this corner contain more than one cell, the cell at M has two neighbors that can serve as the next relay: its network controller sends the data to the cell at point N and to the cell at point O, and updates area 103 into two rectangular target areas, 104 and 105. This amounts to removing the cell at M from area 103; areas 104 and 105 then act as independent target areas in which relaying continues in the same way. As the target areas keep being updated, relaying stops once only the last cell remains in a target area.
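The corner-case update can be sketched as a small function. This is a minimal illustration under stated assumptions: the target area is an inclusive rectangle `(r0, r1, c0, c1)` of row/column indices, and, when both sides are longer than one cell, the sketch always takes the cell's row as the strip (the text allows either side). `split_corner` is a hypothetical name; it returns one `(neighbor, sub_area)` pair per next-hop relay.

```python
def _span(a, b):
    """Inclusive (low, high) bounds of the index range between a and b."""
    return (min(a, b), max(a, b))


def split_corner(rect, cell):
    """Update rule for a 'first cell' sitting on a corner of the target area.

    rect = (r0, r1, c0, c1) with inclusive bounds; cell = (row, col).
    Returns a list of (neighbor, sub_area) pairs, one per next-hop relay.
    """
    r0, r1, c0, c1 = rect
    r, c = cell
    if r not in (r0, r1) or c not in (c0, c1):
        raise ValueError("cell is not on a corner of rect")
    far_r = r1 if r == r0 else r0   # row on the opposite side of the rectangle
    far_c = c1 if c == c0 else c0   # column on the opposite side
    dr = 1 if far_r > r else -1     # unit step inward, vertically
    dc = 1 if far_c > c else -1     # unit step inward, horizontally
    parts = []
    if c != far_c:
        # The cell's row (one side of the rectangle) minus the cell itself.
        parts.append(((r, c + dc), (r, r) + _span(c + dc, far_c)))
    if r != far_r:
        # Every other row; for a one-column area this is the column minus the cell.
        parts.append(((r + dr, c), _span(r + dr, far_r) + _span(c, far_c)))
    return parts
```

A one-column area yields a single pair (as for the cell at K in Fig. 9), a larger area yields two (as for the cell at M), and an empty list means the cell is the last one left, so relaying stops.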
When a second cell, acting as starting-point cell or relay cell, is located on a side of the rectangular target area: if the sides of the rectangle adjacent to the second cell's side contain only one cell, the updated target area consists of the two rectangles formed by the second cell's side after excluding the second cell; otherwise, it consists of three rectangles, two of which are those same two rectangles formed from the second cell's side and the third of which is the rest of the rectangle. Note that "second cell" in this embodiment is a general term for any cell located on a side (but not at a corner) of the rectangular target area.
Referring to Fig. 10, suppose the cell at point P is the starting-point cell initiating an inter-cell multicast, or a relay cell responsible for relaying inter-cell multicast data, and rectangular target area 111 is the target area determined before the cell at P sends or relays the data. The cell at P lies inside area 111, on one of its sides. Because the sides of area 111 adjacent to P's side contain more than one cell, the cell at P has three neighbors that can serve as the next relay: its network controller sends the data to the cells at points Q, R and S respectively, and updates area 111 into three rectangular target areas, 112, 113 and 114. This amounts to removing the cell at P from area 111; areas 112 and 113 are the two rectangles formed by P's side after excluding the cell at P. Areas 112, 113 and 114 then act as independent target areas in which relaying continues in the same way. As the target areas keep being updated, relaying stops once only the last cell remains in a target area.
It will be understood that if the sides of the target area adjacent to P's side contained only one cell (a case not shown in Fig. 10), the cell at P would have two neighbors that could serve as the next relay: its network controller would send the data to the cells at points Q and R, and the updated target area would consist of two rectangles, namely rectangular target areas 112 and 113.
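The side-case update can be sketched in the same spirit. This is a minimal illustration, assuming an inclusive rectangle `(r0, r1, c0, c1)` of row/column indices and 4-neighbor links; the names `split_side`, `_split_top_or_bottom` and `_transpose_rect` are hypothetical. A cell on the left or right side is handled by swapping rows and columns, applying the top/bottom rule, and swapping back.

```python
def _split_top_or_bottom(rect, cell):
    """Side rule for a cell on the top or bottom side (not a corner)."""
    r0, r1, c0, c1 = rect
    r, c = cell
    parts = [((r, c - 1), (r, r, c0, c - 1)),   # side segment before the cell
             ((r, c + 1), (r, r, c + 1, c1))]   # side segment after the cell
    if r0 != r1:
        # Adjacent sides longer than one cell: the rest of the rectangle
        # becomes a third area, entered through the inward neighbor.
        if r == r0:
            parts.append(((r + 1, c), (r0 + 1, r1, c0, c1)))
        else:
            parts.append(((r - 1, c), (r0, r1 - 1, c0, c1)))
    return parts


def _transpose_rect(rect):
    r0, r1, c0, c1 = rect
    return (c0, c1, r0, r1)


def split_side(rect, cell):
    """Update rule for a 'second cell' on a side (not a corner) of the area.

    Returns a list of (neighbor, sub_area) pairs, one per next-hop relay.
    """
    r0, r1, c0, c1 = rect
    r, c = cell
    if r in (r0, r1) and c0 < c < c1:
        return _split_top_or_bottom(rect, cell)
    if c in (c0, c1) and r0 < r < r1:
        # Left/right side: swap rows and columns, split, then swap back.
        parts = _split_top_or_bottom(_transpose_rect(rect), (c, r))
        return [((b, a), _transpose_rect(p)) for (a, b), p in parts]
    raise ValueError("cell is not on a side (excluding corners) of rect")
```

In a one-row area (the case not shown in Fig. 10) the function returns only the two side segments; otherwise it returns three areas, matching 112, 113 and 114.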
When a third cell, acting as starting-point cell, is located in the interior of the rectangular target area, the updated target area consists of four rectangles: two are the rectangles formed by the third cell's row (or column) after excluding the third cell, and the other two are the rectangles into which that row (or column) divides the target area before the update. Note that "third cell" in this embodiment is a general term for any cell located in the interior of the rectangular target area, the interior being the region other than the corners and sides.
Referring to Fig. 11, suppose the cell at point T is the starting-point cell initiating an inter-cell multicast (in this embodiment the cell at T cannot be a relay cell, because a relay cell always receives the data on the boundary of its target area), and rectangular target area 121 is the target area determined before the cell at T sends the data. The cell at T lies in the interior of area 121 and has four neighbors that can serve as the next relay: its network controller sends the data to the cells at points U, V, W and X respectively, and updates area 121 into four rectangular target areas, 122, 123, 124 and 125. This amounts to removing the cell at T from area 121: areas 122 and 123 are the two rectangles formed by T's row after excluding the cell at T, and areas 124 and 125 are the two rectangles into which T's row divides area 121. Areas 122, 123, 124 and 125 then act as independent target areas in which relaying continues in the same way. As the target areas keep being updated, relaying stops once only the last cell remains in a target area.
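The interior case, together with a check of the update rule stated earlier (the sub-areas are disjoint, exclude the sender, cover the rest of the area, and each contains one adjacent neighbor), might look like this. It is a sketch assuming an inclusive rectangle `(r0, r1, c0, c1)` and using the cell's row as the dividing line (the text allows row or column); `split_interior`, `cells` and `is_valid_split` are hypothetical names.

```python
def split_interior(rect, cell):
    """Update rule for a 'third cell' strictly inside the area (starting point only)."""
    r0, r1, c0, c1 = rect
    r, c = cell
    if not (r0 < r < r1 and c0 < c < c1):
        raise ValueError("cell is not in the interior of rect")
    return [
        ((r, c - 1), (r, r, c0, c - 1)),     # row segment left of the cell
        ((r, c + 1), (r, r, c + 1, c1)),     # row segment right of the cell
        ((r - 1, c), (r0, r - 1, c0, c1)),   # block above the cell's row
        ((r + 1, c), (r + 1, r1, c0, c1)),   # block below the cell's row
    ]


def cells(rect):
    """All (row, col) positions in an inclusive rectangle."""
    r0, r1, c0, c1 = rect
    return {(i, j) for i in range(r0, r1 + 1) for j in range(c0, c1 + 1)}


def is_valid_split(rect, cell, parts):
    """Sub-areas are disjoint, exclude the sender and cover the rest of rect;
    each neighbor is adjacent to the sender and inside its own sub-area."""
    seen = set()
    for nb, sub in parts:
        s = cells(sub)
        if nb not in s or abs(nb[0] - cell[0]) + abs(nb[1] - cell[1]) != 1:
            return False
        if s & seen:
            return False
        seen |= s
    return seen == cells(rect) - {cell}
```

Each of the four neighbors lands on the boundary of its own sub-area, which is why later hops only ever need the corner and side rules.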
In this embodiment, when a fourth cell, acting as starting-point cell or relay cell, is located outside the target area: if a straight-line communication path can be formed between the fourth cell and some cell in the target area, the fourth cell sends or relays the inter-cell multicast data along that straight line toward the target area; otherwise, it sends the data toward a candidate neighbor, i.e. one of the fourth cell's adjacent cells that is closer to the target area. Note that "fourth cell" in this embodiment is a general term for any cell located outside the rectangular target area.
Referring to Fig. 12, suppose the cell at point Y1 is the starting-point cell initiating an inter-cell multicast, and rectangular target area 131 is the target area determined before the cell at Y1 sends the data. The cell at Y1 lies outside area 131. Because it lies between the extensions of two opposite sides of the rectangle, a straight-line communication path can be formed between it and the cell at point Y3 in the target area, so only one neighbor can serve as the next relay: the network controller of the cell at Y1 sends the data to that neighbor, the cell at point Y2, which acts as a relay cell. The cell at Y2 relays the data in the direction of the dashed arrow in Fig. 12 until it reaches the cell at Y3. The cell at Y3 lies on a side of area 131 and can complete the relaying within area 131 by the method described above.
Continuing with Fig. 12, suppose the cell at point Z1 is the starting-point cell initiating an inter-cell multicast, and rectangular target area 131 is the target area determined before the cell at Z1 sends the data. The cell at Z1 lies outside area 131 and not between the extensions of any two opposite sides of the rectangle, so no straight-line communication path can be formed between it and any cell in area 131. It therefore has two neighbors that can serve as the next relay, the cells at points Z2 and Z3. These are the candidate neighbors of the cell at Z1, because among its adjacent cells they are the ones closer to area 131. In practice, either may be chosen arbitrarily, or, according to the actual traffic, the less loaded one may be chosen as the next relay, the less loaded cell being the one with fewer pending output communication tasks. Along the two feasible relay paths, the data from the cell at Z1 is relayed to the cell at point Z4, which lies at a corner of area 131 and can complete the relaying within area 131 by the method described above.
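The routing rule for a cell outside the target area can be sketched as follows. This assumes 4-neighbor links, an inclusive rectangle `(r0, r1, c0, c1)` as the target area, and a hypothetical load metric (a count of pending output tasks per cell); `next_hops` and `pick_hop` are illustrative names, not from the patent.

```python
def next_hops(cell, rect):
    """Candidate next-hop neighbors for a 'fourth cell' outside the target area.

    Inside the row band or column band of the rectangle (between the extensions
    of two opposite sides) there is a straight path and exactly one candidate;
    otherwise both neighbors that move closer to the rectangle are candidates.
    """
    r0, r1, c0, c1 = rect
    r, c = cell
    if r0 <= r <= r1 and c0 <= c <= c1:
        raise ValueError("cell is inside the target area")
    # Unit step toward the rectangle on each axis (0 if already within that band).
    dr = 1 if r < r0 else -1 if r > r1 else 0
    dc = 1 if c < c0 else -1 if c > c1 else 0
    hops = []
    if dr:
        hops.append((r + dr, c))
    if dc:
        hops.append((r, c + dc))
    return hops


def pick_hop(hops, pending):
    """Pick the least-loaded candidate; pending maps cell -> queued output tasks."""
    return min(hops, key=lambda h: pending.get(h, 0))
```

Like the cell at Y1, a cell in the row band of the rectangle gets a single candidate; like the cell at Z1, a cell diagonal to it gets two, and `pick_hop` implements the "less loaded" choice.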
The inter-cell multicast method provided in this embodiment extends point-to-point communication between cells to regional multicast, supporting a higher degree of parallelism and a much higher total bandwidth, and thereby further improving the overall performance of the computing system.
As mentioned above, each cell in the cellular array of the embodiments combines memory, storage and computation. Its non-volatile random access memory both provides random access to the data involved in the microprocessor's computations and stores software instruction code and data that must persist. However, non-volatile random access memory is usually expensive, so the portion of it that can serve as working memory is limited; when the microprocessor in a cell has a large amount of data to process, this limited memory space reduces its processing efficiency. Extending each cell's memory space is therefore a problem to be solved.
For this reason, an embodiment of the present invention provides another structure of the cellular array computing system, shown in Fig. 13. In addition to the master CPU, cellular array and cellular array bus described above, the system may further include at least one memory unit array, which is a two-dimensional array of more than one memory unit. The cellular array and all memory unit arrays are stacked into a three-dimensional structure, and the memory units in each memory unit array are connected one-to-one with the cells in the cellular array. Each memory unit works together with the non-volatile random access memory of its cell, and both are used for random access to the data involved in the microprocessor's computations.
In practice, the non-volatile random access memory in each cell can be MRAM, and the memory unit array can be an MRAM, DRAM or SRAM wafer. Typically one or more lower-cost DRAM wafers are chosen, each being a memory unit array made of memory units whose positions match those of the cells in the cellular array. All the DRAM wafers are then 3D-bonded with the cellular array wafer, and each memory unit is connected to the corresponding cell of the cellular array through through-silicon vias (TSVs), thereby extending the memory of each cell.
In this embodiment, at least one memory unit array made of more than one memory unit is stacked with the cellular array into a three-dimensional structure, the memory units in each memory unit array are connected one-to-one with the cells in the cellular array, and the memory units are used for random access to the data involved in the microprocessor's computations. In this way the memory space of each cell in the cellular array can be extended at lower cost, improving the processing efficiency of the microprocessors in the cells.
Note that Fig. 13 shows only one memory unit array stacked with the cellular array into a three-dimensional structure; those skilled in the art will understand that more than one memory unit array may likewise be stacked with the cellular array.
Those skilled in the art will also understand that the communication methods described above apply equally to a cellular array computing system that includes the memory unit array: the master CPU communicating with each cell in the cellular array through the cellular array bus; communication between any two cells without relying on the master CPU; multicast from any cell to all cells in a target area; and a dedicated output cell set up in the cellular array that, acting as an end-point cell, receives and stores the output data other cells send to the master CPU, for the master CPU to read.
Note that because the memory space of each cell in the cellular array is extended, the master CPU can access not only a cell's non-volatile random access memory but also the memory unit corresponding to that cell (when more than one memory unit array is stacked with the cellular array, each cell has more than one corresponding memory unit). Accordingly, the communication between the master CPU and each cell over the cellular array bus includes at least one of the following: reading or writing, by address, the non-volatile random access memory or corresponding memory unit of any cell in the cellular array; broadcasting data to the non-volatile random access memory or corresponding memory units of all cells in a target area, writing it at the same relative address in each; sending instructions or data to, or reading the status of, the microprocessor of any cell in the cellular array; and broadcasting instructions to the microprocessors of all cells in a target area.
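The patent does not specify how the master CPU's bus addresses map onto a cell's non-volatile memory and its stacked memory units, so the layout below (per-cell span, region sizes, number of layers, device names) is entirely hypothetical. The sketch only illustrates the two access styles listed above: read/write by address, and a broadcast that writes the same relative address in every cell of a target area.

```python
from dataclasses import dataclass, field

# Hypothetical per-cell address layout (not specified in the patent).
NVRAM_SIZE = 0x1000                  # bytes of non-volatile RAM per cell
UNIT_SIZE = 0x4000                   # bytes per stacked memory unit
NUM_LAYERS = 2                       # memory unit arrays stacked on the cell array
CELL_SPAN = NVRAM_SIZE + NUM_LAYERS * UNIT_SIZE


def decode(addr):
    """Map a master-CPU bus address to (cell_id, device, offset)."""
    cell_id, local = divmod(addr, CELL_SPAN)
    if local < NVRAM_SIZE:
        return cell_id, "nvram", local
    layer, offset = divmod(local - NVRAM_SIZE, UNIT_SIZE)
    return cell_id, f"mem{layer}", offset


@dataclass
class CellArrayModel:
    """Toy model: a sparse store keyed by (cell_id, device, offset)."""
    store: dict = field(default_factory=dict)

    def write(self, addr, value):
        self.store[decode(addr)] = value

    def read(self, addr):
        return self.store.get(decode(addr), 0)

    def broadcast(self, cell_ids, local_addr, value):
        """Write the same relative address in every cell of the target area."""
        for cid in cell_ids:
            self.write(cid * CELL_SPAN + local_addr, value)
```

With this layout, `decode(CELL_SPAN + NVRAM_SIZE + UNIT_SIZE + 5)` resolves to offset 5 in the second stacked memory unit of cell 1.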
When a cell in the cellular array further includes a bus controller and an internal cell bus, the internal cell bus connects the microprocessor and the non-volatile random access memory and is also connected to the memory unit corresponding to the cell. The bus controller is connected to the cellular array bus, the microprocessor and the internal cell bus. It identifies the communication between the master CPU and the cell, and either connects to the microprocessor, to transfer instructions or data sent by the master CPU or to read status, or connects through the internal cell bus to the non-volatile random access memory or the cell's corresponding memory unit to perform data read and write operations.
As mentioned above, each cell in the cellular array of the embodiments combines memory, storage and computation. Its non-volatile random access memory both provides random access to the data involved in the microprocessor's computations and stores software instruction code and data that must persist. However, non-volatile random access memory is usually expensive, so the portion of it that can serve as storage is likewise limited. When a large number of files or data must be stored in the cells of the cellular array, this limited storage space cannot meet the requirement and may even reduce the processing efficiency of the microprocessors. Extending each cell's storage space is therefore also a problem to be solved.
For this reason, an embodiment of the present invention provides another structure of the cellular array computing system, shown in Fig. 14. In addition to the master CPU, cellular array and cellular array bus described above, the system may further include at least one storage unit array, which is a two-dimensional array of more than one storage unit. The cellular array and all storage unit arrays are stacked into a three-dimensional structure, and the storage units in each storage unit array are connected one-to-one with the cells in the cellular array. Each storage unit works together with the non-volatile random access memory of its cell, and both are used to store software instruction code and data that must persist.
In practice, the non-volatile random access memory in each cell can be MRAM, and the storage unit array can be a flash wafer. Typically one or more NAND flash wafers, which cost less than MRAM, are chosen, each being a storage unit array made of storage units whose positions match those of the cells in the cellular array. All the NAND flash wafers are then 3D-bonded with a cellular array wafer, and each storage unit is connected vertically to the corresponding cell of the cellular array through TSVs, thereby extending the storage space of each cell.
In a concrete implementation, each cell in the cellular array further includes a storage controller connected to the microprocessor, which controls data storage and access for the storage unit connected to the cell. After one or more NAND flash wafers are 3D-bonded with a cellular array wafer, each cell in the cellular array can be given a NAND flash controller, through which the cell's microprocessor (MPU) reads and writes the storage unit corresponding to the cell. When a large number of files or data are stored in NAND flash, a data search can be carried out by every cell in parallel through its own NAND channel, which speeds it up greatly. Compiling a large software system likewise requires compiling thousands of source files; when those source files are stored in such a cellular array computing system, compilation is also greatly accelerated.
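The parallel search described above can be illustrated with a toy model. In this sketch each cell's storage unit is a Python list of text records and a thread pool stands in for the cells searching their own NAND channels at the same time; it shows only the fan-out/merge structure, not real flash access. `search_cell` and `parallel_search` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def search_cell(cell_id, records, needle):
    """One cell scans only its own storage unit (its own NAND channel)."""
    return [(cell_id, i) for i, rec in enumerate(records) if needle in rec]


def parallel_search(storage, needle):
    """storage maps cell_id -> list of records held in that cell's storage unit.

    Every cell searches concurrently; the per-cell hit lists are then merged.
    """
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(search_cell, cid, recs, needle)
                   for cid, recs in storage.items()]
        hits = []
        for f in futures:
            hits.extend(f.result())
    return sorted(hits)
```

Because each cell touches only its own records, the total search time is bounded by the slowest single cell rather than by the sum over all cells.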
The cellular array computing system with the storage unit array provided in this embodiment can extend the storage space of each cell in the cellular array at lower cost, improving the data storage capacity of each cell.
Note that Fig. 14 shows only one storage unit array stacked with the cellular array into a three-dimensional structure; those skilled in the art will understand that more than one storage unit array may likewise be stacked with the cellular array.
Those skilled in the art will understand that the communication methods described above apply equally to a cellular array computing system that includes the storage unit array: the master CPU communicating with each cell in the cellular array through the cellular array bus; communication between any two cells without relying on the master CPU; multicast from any cell to all cells in a target area; and a dedicated output cell set up in the cellular array that, acting as an end-point cell, receives and stores the output data other cells send to the master CPU, for the master CPU to read.
As described above, the portions of a cell's non-volatile random access memory available as memory and as storage are both very limited, so extending a cell's memory and storage space at the same time is also a problem to be solved. For this reason, an embodiment of the present invention provides yet another structure of the cellular array computing system, shown in Fig. 15: in addition to the master CPU, cellular array and cellular array bus, the system may further include at least one storage unit array and at least one memory unit array as described above. A cellular array computing system that includes both can be implemented by referring to the systems described above that include only a memory unit array or only a storage unit array; details are not repeated here.
The cellular array computing system provided in this embodiment, which includes both the memory unit array and the storage unit array, can extend both the storage and the memory space of each cell in the cellular array at lower cost. This improves the data storage capacity of each cell and the processing efficiency of the microprocessor in each cell, further improving the overall performance of the computing system.
As stated in the background art, neural network computation involves a very large amount of computation, but the existing computer architecture on which it runs suffers from communication bottlenecks between the CPU and memory and storage. These bottlenecks limit the overall performance of the computer and hinder efficient neural network computation.
To this end, based on the cellular array computing system provided by the embodiments, an embodiment of the present invention further provides a method of implementing neural network computation with this system, comprising: storing the code of one or more neuron functions in each cell of the cellular array that participates in the neural network computation; the master CPU selecting one or more cells to execute the code of the neuron functions they store and to output the execution results to one or more target cells; and any participating cell receiving, as its input data, the execution results of neuron-function code output by other cells, executing the code of the neuron function it stores on the basis of all the input data from the other cells, and outputting the result to the cells holding the neuron functions that need it, or storing it at a preset address for the master CPU to read, or outputting it to the master CPU.
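The method above is essentially dataflow execution, which can be sketched as an event-driven loop. In this minimal sketch each cell holds one neuron function, fires once it has received input from every upstream cell, and the results of cells that no other cell consumes are kept for the master CPU (standing in for the "preset address"). Delivery between cells is abstracted to a dictionary update; in hardware it would use the neighbor or multicast communication described earlier. `run_network` and its arguments are hypothetical names.

```python
from collections import deque


def run_network(neurons, sources, external_inputs):
    """Event-driven sketch of the neuron-per-cell method.

    neurons: cell_id -> (fn, input_cells); fn takes a list of inputs in input_cells order.
    sources: cell_ids the master CPU starts; external_inputs: cell_id -> input list.
    Returns results of cells whose output no other cell consumes.
    """
    consumers = {cid: [] for cid in neurons}
    for cid, (_, ins) in neurons.items():
        for src in ins:
            consumers[src].append(cid)
    inbox = {cid: {} for cid in neurons}   # inputs received so far, keyed by sender
    ready = deque(sources)
    outputs = {}
    while ready:
        cid = ready.popleft()
        fn, ins = neurons[cid]
        args = external_inputs[cid] if cid in sources else [inbox[cid][s] for s in ins]
        result = fn(args)
        if not consumers[cid]:
            outputs[cid] = result          # kept for the master CPU to read
        for dst in consumers[cid]:
            inbox[dst][cid] = result       # inter-cell delivery in hardware
            if len(inbox[dst]) == len(neurons[dst][1]):
                ready.append(dst)          # all inputs present: execute
    return outputs
```

For example, two source cells `a` and `b` feeding a cell `c` that computes `x[0] + 2 * x[1]` produce `{"c": 7}` for inputs 1 and 3.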
To make the method of implementing neural network computation with the cellular array computing system easier to understand, the principle of neural network computation is first briefly described.
A neural network is a common algorithm in machine learning that imitates the working principle of the human brain. The human brain consists of a large number of neurons. Each neuron can contact other neurons through a large number (thousands) of synapses on its dendritic tree and receive input from them, and the coupling strength of these synapses provides a memory function. A neuron outputs an excitatory or inhibitory signal, which its axon can carry to a large number of distant neurons.
In neural network computation modeled on the brain, a neuron is simply a function, as shown in Figure 16. It has many inputs, of which x1, x2 and x3 are three examples, and each input has a corresponding weight. The usual computation multiplies each input by its weight and sums the products. The output is either 0 or 1 (decided by a threshold) or a value between 0 and 1. A neuron has many internal parameters (such as weight parameters and threshold parameters), and adjusting these parameters is the process of machine learning.
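As a worked formula, a neuron with inputs x1, ..., xn and weights w1, ..., wn can be written as follows. This is a sketch: the symbol θ for the threshold parameter and the logistic function σ for the "value between 0 and 1" case are illustrative assumptions, since the text does not name a specific continuous output function.

$$
s = \sum_{i=1}^{n} w_i x_i, \qquad
y_{\text{step}} = \begin{cases} 1, & s \ge \theta \\ 0, & s < \theta \end{cases}, \qquad
y_{\text{cont}} = \sigma(s - \theta) = \frac{1}{1 + e^{-(s - \theta)}}
$$

For example, with x = (1, 0, 1), w = (0.5, 0.2, 0.4) and θ = 0.6, the weighted sum is s = 0.5 + 0 + 0.4 = 0.9. Since 0.9 ≥ 0.6, the step output is 1, and the continuous output is σ(0.3) ≈ 0.574.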
A typical neural network connects the outputs and inputs of a large number of neurons and is usually organized into a multi-level architecture. As shown in Figure 17, the network in the figure has three layers: the output of every first-layer neuron is sent to every second-layer neuron, and the output of every second-layer neuron is sent to every third-layer neuron. The first layer of the network in Figure 17 can be called the input layer, the second the hidden layer, and the third the output layer; the outputs 0, 1, 2, ..., 8, 9 of the output layer are the final result of the neural network computation. Neural networks usually have such a layered structure, and in a practical implementation this could be a network for recognizing handwritten digits.
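The fully connected, layer-by-layer data flow of Figure 17 can be sketched in plain Python as below. This is a minimal illustration under stated assumptions: the function names are hypothetical, the sigmoid activation is an assumed choice, and picking the strongest output neuron as the recognized digit is one common convention rather than something the text specifies.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic activation, assumed here for the 'value between 0 and 1' output."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, thresholds):
    """Fully connected layer: every input feeds every neuron.
    weights[j][i] is the weight from input i to neuron j."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) - theta)
        for row, theta in zip(weights, thresholds)
    ]


def network_forward(inputs, layers):
    """layers is a list of (weights, thresholds); layer k's output feeds layer k+1."""
    activations = list(inputs)
    for weights, thresholds in layers:
        activations = layer_forward(activations, weights, thresholds)
    return activations


def classify(outputs):
    """Index of the strongest output-layer neuron (e.g. the recognized digit 0-9)."""
    return max(range(len(outputs)), key=lambda k: outputs[k])
```

For a toy 3-2-2 network, `network_forward([1, 0, 1], [([[1, 1, 1], [-1, -1, -1]], [0, 0]), ([[1, -1], [-1, 1]], [0, 0])])` yields two outputs that sum to 1, and `classify` picks output 0.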
Training a neural network involves a massive amount of computation. A network is in effect a function with a very large number of parameters (possibly tens of thousands). Adjusting these parameters so that the network learns (usually by the steepest descent method) requires a large number of scenarios, each consisting of input data and the correct answer. For example, training a network to recognize handwritten digits uses tens of thousands of images, and the parameters are adjusted until the network gives the correct result for each image; after that, handwriting recognition achieves a very high success rate.
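The steepest descent idea can be sketched on a single sigmoid neuron. This is an illustrative assumption, not the patent's training procedure: it uses a cross-entropy loss (so the per-sample gradient is simply (prediction - answer) × input), a hypothetical function name `train_neuron`, and a tiny four-scenario dataset (logical AND) standing in for the tens of thousands of labelled images.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(samples, lr=0.5, epochs=2000):
    """Steepest (gradient) descent on the cross-entropy loss of one sigmoid neuron.
    samples is a list of (inputs, correct_answer) pairs, i.e. the 'scenarios'."""
    n = len(samples[0][0])
    w = [0.0] * n
    b = 0.0  # bias term, equivalent to a negated threshold
    for _ in range(epochs):
        grad_w = [0.0] * n
        grad_b = 0.0
        for x, y in samples:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(n):
                grad_w[i] += err * x[i]
            grad_b += err
        # Step every parameter against the gradient averaged over all scenarios.
        w = [wi - lr * g / len(samples) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(samples)
    return w, b
```

On a real network the same loop runs over every weight in every layer, which is why training is so compute-heavy and benefits from parallel hardware.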
The inventors consider neural network computation highly suitable for implementation on the cellular array computing system provided by the embodiments of the present invention. Because training requires evaluating the network a very large number of times, it can be accelerated by the parallel computation of a large number of MPUs.
In practice, if the number of neurons in the neural network is smaller than the number of cells in the cellular array computing system, each cell in the cellular array can perform the computation of one neuron; otherwise, the computation of several neurons can be consolidated into a single cell.
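A minimal sketch of this neuron-to-cell assignment is shown below. The contiguous block packing and the function names `assign_neurons` and `cell_position` are hypothetical choices for illustration; the text only states that one cell may host one neuron or several.

```python
def assign_neurons(num_neurons: int, num_cells: int):
    """Return cell_of, where cell_of[n] is the index of the cell hosting neuron n.
    With at least as many cells as neurons, each neuron gets its own cell;
    otherwise neurons are packed in contiguous blocks that share a cell."""
    if num_cells <= 0:
        raise ValueError("need at least one cell")
    per_cell = max(1, -(-num_neurons // num_cells))  # ceiling division
    return [n // per_cell for n in range(num_neurons)]


def cell_position(index: int, width: int):
    """Map a linear cell index to its (row, column) position, which serves as its ID."""
    return divmod(index, width)
```

For example, 10 neurons on 4 cells give `[0, 0, 0, 1, 1, 1, 2, 2, 2, 3]`, so no cell hosts more than three neurons.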
In a specific implementation, each cell stores the code of one or more neuron functions. The master CPU can broadcast the neuron-function code, by data broadcasting, to every cell acting as a neuron, and then broadcast an instruction to a target area to start the neural network computation. For example, the master CPU can select all cells in the cellular array, or the cells in a target area, as the cells participating in the computation; broadcast the same neuron-function code to the same relative address range in each selected cell; and broadcast an instruction that causes the microprocessors of one or more selected cells to execute the neuron-function code starting at that relative address. For further implementation details, refer to the related description of the cellular array computing system above.
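The broadcast-then-start sequence can be simulated as follows. This is a hypothetical software model, not the chip's bus protocol: the class and method names, the 256-word memory size, the inclusive rectangular region format `(r0, c0, r1, c1)` and the placeholder opcode strings are all assumptions made for illustration.

```python
class Cell:
    def __init__(self, row, col, mem_size=256):
        self.id = (row, col)          # the cell's array position doubles as its ID
        self.memory = [None] * mem_size
        self.pc = None                # program counter; None means idle


class MasterCPU:
    def __init__(self, cells):
        self.cells = cells

    def select(self, region=None):
        """All cells, or only those inside region=(r0, c0, r1, c1), inclusive."""
        if region is None:
            return list(self.cells)
        r0, c0, r1, c1 = region
        return [c for c in self.cells if r0 <= c.id[0] <= r1 and c0 <= c.id[1] <= c1]

    def broadcast_code(self, targets, code, rel_addr):
        """Write identical code to the same relative address in every target cell."""
        for cell in targets:
            if rel_addr + len(code) > len(cell.memory):
                raise ValueError("code does not fit at this relative address")
            cell.memory[rel_addr:rel_addr + len(code)] = code

    def broadcast_start(self, targets, rel_addr):
        """Tell each target cell's microprocessor to start executing at rel_addr."""
        for cell in targets:
            cell.pc = rel_addr
```

Because every selected cell holds the same code at the same relative address, a single start instruction is enough for all of them, which is what makes the broadcast cheap.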
In the embodiments of the present invention, each input received from another cell has a corresponding weight parameter. Executing the neuron-function code stored in this cell based on all inputs from other cells may comprise: multiplying each input from another cell by its corresponding weight parameter, summing all the products, comparing the sum with a threshold parameter, and taking the resulting output value as the execution result. In practice, the weight parameters and threshold parameter are pre-stored in the non-volatile random access memory of the cell.
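A sketch of the per-cell neuron function follows. The dictionary layout standing in for the cell's non-volatile memory, keying weights by the source cell's (row, column) ID, and the use of `>=` for the threshold comparison are assumptions; the text only says the weighted sum is compared with a stored threshold.

```python
# Hypothetical layout of the parameters a cell keeps in its non-volatile memory.
nvm = {
    "weights": {(0, 0): 0.5, (0, 1): 0.2, (0, 2): 0.4},  # keyed by source cell ID
    "threshold": 0.6,
}


def neuron_step(inputs_by_source, nvm):
    """Multiply each input by the weight stored for its source cell, sum the
    products, and compare the sum with the stored threshold (0/1 output)."""
    total = sum(nvm["weights"][src] * x for src, x in inputs_by_source.items())
    return 1 if total >= nvm["threshold"] else 0
```

With inputs 1, 0, 1 from cells (0, 0), (0, 1) and (0, 2), the sum is 0.9, which exceeds the threshold 0.6, so the cell outputs 1.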
In practice, if the execution result output by the neuron-function code is a continuous value, the output can also be accelerated by table lookup.
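One way to realize the table lookup is sketched below, assuming the continuous output is a sigmoid. The input range of -8 to 8, the 1025-entry resolution and nearest-entry lookup (rather than interpolation) are illustrative assumptions; a real cell would size the table to its memory budget and accuracy needs.

```python
import math

TABLE_MIN, TABLE_MAX, TABLE_SIZE = -8.0, 8.0, 1025  # assumed range and resolution
STEP = (TABLE_MAX - TABLE_MIN) / (TABLE_SIZE - 1)

# Precomputed once; it could live in the cell's non-volatile memory with the weights.
SIGMOID_TABLE = [1.0 / (1.0 + math.exp(-(TABLE_MIN + k * STEP))) for k in range(TABLE_SIZE)]


def sigmoid_lookup(z: float) -> float:
    """Approximate sigmoid(z) by nearest-entry lookup, clamping out-of-range z."""
    if z <= TABLE_MIN:
        return SIGMOID_TABLE[0]
    if z >= TABLE_MAX:
        return SIGMOID_TABLE[-1]
    return SIGMOID_TABLE[round((z - TABLE_MIN) / STEP)]
```

The lookup replaces an exponential and a division with one multiply, one rounding and one memory read; with this step size the error stays below about 0.002 inside the table range.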
In a specific implementation, because the cellular array computing system provided by the embodiments supports communication between any two cells, the execution results of neuron-function code can be transmitted by cell-to-cell communication during neural network computation. The very large bandwidth of the cellular array's internal network makes cell-to-cell communication efficient and also reduces the processing load on the master CPU, which further improves the overall performance of the computing system and favors efficient neural network computation.
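The start cell, transfer cells and end cell of a cell-to-cell path (as defined in claim 5) can be illustrated with a routing sketch. The text does not say how paths are chosen, so dimension-ordered (XY) routing on the 2-D mesh is an assumption here, as are the function names.

```python
def xy_route(start, end):
    """Return a cell-to-cell path from start to end on a 2-D mesh.
    Assumes dimension-ordered (XY) routing: first change the column, then the
    row. Consecutive cells in the path are always adjacent."""
    (r, c), (r_end, c_end) = start, end
    path = [(r, c)]
    while c != c_end:
        c += 1 if c_end > c else -1
        path.append((r, c))
    while r != r_end:
        r += 1 if r_end > r else -1
        path.append((r, c))
    return path


def roles(path):
    """Label the cells on a path: start cell, transfer cells, end cell."""
    return {"start": path[0], "transfer": path[1:-1], "end": path[-1]}
```

For example, a result sent from cell (0, 0) to cell (2, 3) is relayed by the four transfer cells (0, 1), (0, 2), (0, 3) and (1, 3), each forwarding it over the interface to its neighbor.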
In a specific implementation, when the neural network is divided into layers, data transfer between layers can use the inter-cell multicast (mass-send) mechanism provided by the embodiments. In this case, the cells participating in the computation of the same layer are located in the same target area, so when any participating cell in one layer multicasts the execution result of its neuron-function code to a target area, the result is delivered to all participating cells of the next layer. Extending point-to-point cell communication to regional multicast supports a higher degree of parallelism and a much higher total bandwidth, which further improves the overall performance of the computing system and significantly speeds up neural network computation.
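A sketch of mapping layers to target areas and multicasting between them is given below. Placing each layer in its own band of rows, the inclusive region format, and representing delivery as per-cell inboxes are all assumptions for illustration; cells in a band that host no neuron would simply ignore what they receive.

```python
def layer_regions(layer_sizes, width):
    """Place each layer in its own band of rows (a rectangular target area).
    Returns one inclusive region (r0, c0, r1, c1) per layer."""
    regions, row = [], 0
    for size in layer_sizes:
        rows = -(-size // width)  # ceiling division
        regions.append((row, 0, row + rows - 1, width - 1))
        row += rows
    return regions


def cells_in(region):
    r0, c0, r1, c1 = region
    return [(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]


def multicast(sender, value, region, inboxes):
    """Deliver one result from sender to every cell of the target area."""
    for cell in cells_in(region):
        inboxes.setdefault(cell, []).append((sender, value))
```

For layers of 8, 4 and 2 neurons on a 4-wide array, each of the 8 first-layer cells issues one multicast, and every second-layer cell ends up with 8 inputs, instead of the 32 separate point-to-point sends a fully connected layer would otherwise need.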
In a specific implementation, if dedicated (full-time) output cells are also provided in the cellular array, the results of the neural network computation can be sent to a dedicated output cell by cell-to-cell communication for the master CPU to read, so that the results reach the master CPU more efficiently. In practice, if some neuron functions in the last layer produce a positive output (a common application is for each last-layer neuron to be responsible for recognizing one specific data feature or image), the master CPU can be notified through this mechanism.
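The dedicated output cell's role, buffering results and signalling the master CPU with an interrupt (claim 7), can be modeled as follows. The callback standing in for the interrupt line, the queue-based buffer and the class names are hypothetical.

```python
from collections import deque


class DedicatedOutputCell:
    """Buffers results addressed to the master CPU and raises an interrupt."""

    def __init__(self, raise_interrupt):
        self.buffer = deque()
        self.raise_interrupt = raise_interrupt  # callback standing in for the IRQ line

    def receive(self, source_id, data):
        self.buffer.append((source_id, data))
        self.raise_interrupt(self)


class MasterCPUStub:
    def __init__(self):
        self.received = []

    def on_interrupt(self, out_cell):
        # Drain everything the output cell has stored so far.
        while out_cell.buffer:
            self.received.append(out_cell.buffer.popleft())
```

For instance, when the last-layer neuron responsible for digit 7, hosted in cell (3, 7), fires, it sends `((3, 7), 7)` to the output cell, and the master CPU picks it up in its interrupt handler without polling the array.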
For implementing neural network computation with cell-to-cell communication, inter-cell multicast and dedicated output cells, refer to the related description in the cellular array computing system embodiments; the details are not repeated here.
In summary, implementing neural network computation with the cellular array computing system provided by the embodiments of the present invention has clear advantages: the parallel computation of a large number of cells greatly increases computing speed, so training is greatly accelerated; the very large bandwidth of the array's internal network and the multicast mechanism also contribute significantly to speed; and the non-volatility of MRAM allows a successfully trained chip to be replicated and sold directly as a product that solves a specific problem.
Although the present invention is disclosed as above, it is not limited thereto. Any person skilled in the art may make various changes or modifications without departing from the spirit and scope of the present invention; therefore, the scope of protection of the present invention shall be defined by the claims.

Claims (9)

1. A method for implementing neural network computation using a cellular array computing system, characterized in that the cellular array computing system comprises a master CPU, a cellular array and a cellular array bus, the cellular array and the cellular array bus being integrated in one chip; the cellular array is a two-dimensional array composed of more than one cell having both computing and storage functions, wherein each cell comprises a microprocessor and a non-volatile random access memory; the non-volatile random access memory is used for random access to the data involved when the microprocessor computes, and is also used to store software instruction code and data that must persist; each cell stores its position in the cellular array as an ID, which can be read by the software or hardware in the cell; the master CPU communicates with each cell in the cellular array via the cellular array bus; and adjacent cells in the cellular array have communication interfaces between them and can send data to each other under the control of instruction software in the cells;
the method comprises:
storing the code of one or more neuron functions in each cell of the cellular array that participates in the neural network computation;
the master CPU selecting one or more cells to execute the neuron-function code stored in each of them, and outputting the execution results to one or more target cells;
any cell participating in the neural network computation receiving, as its input data, the execution results of neuron-function code output by other cells, executing the neuron-function code stored in this cell based on all input data from the other cells, and outputting the execution result to the cell where the neuron function that needs the result resides, storing it at a preset address, or outputting it to the master CPU.
2. The method for implementing neural network computation using a cellular array computing system according to claim 1, characterized in that each input from another cell has a corresponding weight parameter, and executing the neuron-function code stored in this cell based on all inputs from other cells comprises: multiplying each input from another cell by its corresponding weight parameter, summing all the products, and taking as the execution result the output value determined by comparing the sum with a threshold parameter.
3. The method for implementing neural network computation using a cellular array computing system according to claim 2, characterized in that the weight parameters and the threshold parameter are pre-stored in the non-volatile random access memory of the cell.
4. The method for implementing neural network computation using a cellular array computing system according to claim 1, characterized in that, if the execution result output after executing the neuron-function code is a continuous value, the output is accelerated by table lookup.
5. The method for implementing neural network computation using a cellular array computing system according to claim 1, characterized in that any two cells in the cellular array can communicate with each other; the cells participating in cell-to-cell communication comprise a start cell, an end cell and transfer cells; the start cell is the cell that sends data to the end cell, the end cell is the cell that finally receives the data sent by the start cell, and the transfer cells are cells that are successively adjacent along the cell-to-cell communication path and relay the data sent by the start cell through the communication interfaces; the cell-to-cell communication path is the data transmission path formed by the start cell, the transfer cells and the end cell; and the method further comprises: transmitting the execution results of the neuron-function code between any two cells by the cell-to-cell communication.
6. The method for implementing neural network computation using a cellular array computing system according to claim 5, characterized in that any cell in the cellular array can also act as the start cell and multicast to all cells in a target area; a cell that participates in the multicast and is located inside the target area acts as the start cell, or as the end cell, or as both a transfer cell and an end cell; a cell that participates in the multicast and is located outside the target area acts as the start cell or as a transfer cell; and the method further comprises: if the neural network is divided into different layers, transmitting data between layers by the multicast communication, the cells participating in the neural network computation in the same layer being located in the same target area.
7. The method for implementing neural network computation using a cellular array computing system according to claim 5 or 6, characterized in that at least one dedicated output cell is further provided in the cellular array; the dedicated output cell, acting as the end cell, receives and stores the output data sent by other cells to the master CPU and notifies the master CPU with an interrupt signal to read the output data; and outputting the execution result to the master CPU comprises: sending the execution result to the dedicated output cell by the cell-to-cell communication for the master CPU to read.
8. The method for implementing neural network computation using a cellular array computing system according to claim 1, characterized in that the non-volatile random access memory is MRAM.
9. The method for implementing neural network computation using a cellular array computing system according to claim 1, characterized in that the master CPU selects all cells in the cellular array, or the cells in a target area, as the cells participating in the neural network computation, broadcasts the same neuron-function code to the same relative address range in each selected cell, and broadcasts an instruction that causes the microprocessors of one or more selected cells to execute the neuron-function code starting at that relative address.
CN201510456306.2A 2015-07-29 2015-07-29 A kind of method that application cell array computation system realizes neural computing Active CN105740946B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510456306.2A CN105740946B (en) 2015-07-29 2015-07-29 A kind of method that application cell array computation system realizes neural computing

Publications (2)

Publication Number Publication Date
CN105740946A CN105740946A (en) 2016-07-06
CN105740946B true CN105740946B (en) 2019-02-12

Family

ID=56295970

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510456306.2A Active CN105740946B (en) 2015-07-29 2015-07-29 A kind of method that application cell array computation system realizes neural computing

Country Status (1)

Country Link
CN (1) CN105740946B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108154226B (en) * 2016-12-06 2021-09-03 上海磁宇信息科技有限公司 Neural network chip using analog computation
CN108154225B (en) * 2016-12-06 2021-09-03 上海磁宇信息科技有限公司 Neural network chip using analog computation
CN108255514A (en) * 2016-12-28 2018-07-06 上海磁宇信息科技有限公司 For the neuron calculator operation method of cellular array computing system
CN108256637A (en) * 2016-12-28 2018-07-06 上海磁宇信息科技有限公司 A kind of cellular array three-dimensional communication transmission method
CN108255775A (en) * 2016-12-28 2018-07-06 上海磁宇信息科技有限公司 For the cellular array bus broadcast method of cellular array computing system
CN108256641A (en) * 2016-12-28 2018-07-06 上海磁宇信息科技有限公司 For the cellular array internal network communication method of cellular array computing system
CN108256640A (en) * 2016-12-28 2018-07-06 上海磁宇信息科技有限公司 Convolutional neural networks implementation method
CN109146070B (en) * 2017-06-16 2021-10-22 华为技术有限公司 Peripheral circuit and system for supporting neural network training based on RRAM
CN107817708B (en) * 2017-11-15 2020-07-07 复旦大学 High-compatibility programmable neural network acceleration array
CN111242294B (en) * 2017-12-14 2023-08-25 中科寒武纪科技股份有限公司 Integrated circuit chip device and related products
CN110308909B (en) * 2018-03-27 2023-08-01 上海寒武纪信息科技有限公司 Executable program generating device and method for neural network processor
US11914860B2 (en) * 2018-08-20 2024-02-27 Macronix International Co., Ltd. Data storage for artificial intelligence-based applications
CN109886393B (en) * 2019-02-26 2021-02-09 上海闪易半导体有限公司 Storage and calculation integrated circuit and calculation method of neural network
CN110362280A (en) * 2019-09-04 2019-10-22 南京优存科技有限公司 Mixing storage system based on the nearly data processing MRAM of low-power consumption neural network
CN112269751B (en) * 2020-11-12 2022-08-23 浙江大学 Chip expansion method for hundred million-level neuron brain computer

Citations (4)

Publication number Priority date Publication date Assignee Title
CN1341242A (en) * 1999-01-21 2002-03-20 索尼电脑娱乐公司 High-speed processor system, method of using the same, and recording medium
WO2009075694A1 (en) * 2007-12-05 2009-06-18 Hewlett-Packard Development Company, L.P. Hybrid microscale-nanoscale neuromorphic integrated circuit
CN102665049A (en) * 2012-03-29 2012-09-12 中国科学院半导体研究所 Programmable visual chip-based visual image processing system
CN103019656A (en) * 2012-12-04 2013-04-03 中国科学院半导体研究所 Dynamically reconfigurable multi-stage parallel single instruction multiple data array processing system

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
CN101354694B (en) * 2007-07-26 2010-10-13 上海红神信息技术有限公司 Ultra-high expanding super computing system based on MPU structure

Also Published As

Publication number Publication date
CN105740946A (en) 2016-07-06

Similar Documents

Publication Publication Date Title
CN105740946B (en) A kind of method that application cell array computation system realizes neural computing
CN105718996B (en) Cellular array computing system and communication means therein
CN105608490B (en) Cellular array computing system and communication means therein
CN105718994B (en) Cellular array computing system
US11741345B2 (en) Multi-memory on-chip computational network
US10846621B2 (en) Fast context switching for computational networks
US11580367B2 (en) Method and system for processing neural network
CN106203619B (en) Data optimized neural network traversal
CN106951926A (en) The deep learning systems approach and device of a kind of mixed architecture
DE102010044529B4 (en) AUTONOMOUS MEMORY SUB SYSTEM WITH HARDWARE ACCELERATOR
CN111465943B (en) Integrated circuit and method for neural network processing
DE112020004107T5 (en) CONTENT RECOMMENDATIONS USING ONE OR MORE NEURAL NETWORKS
CN105718991B (en) Cellular array computing system
DE102022121773A1 (en) IN-MEMORY ASSOCIATIVE PROCESSING SYSTEM
CN105718990B (en) Communication means between cellular array computing system and wherein cell
CN105718380B (en) Cellular array computing system
CN105718379B (en) Cellular array computing system and wherein iuntercellular mass-send communication means
CN104156316B (en) A kind of method and system of Hadoop clusters batch processing job
CN117786412A (en) Elastic training method, cluster system, product and medium for large language model
CN105718993B (en) Cellular array computing system and communication means therein
CN110750363B (en) Computer storage management method and device, electronic equipment and storage medium
CN106293491B (en) The processing method and Memory Controller Hub of write request
CN105718992B (en) Cellular array computing system
CN105718995B (en) Cellular array computing system and its adjustment method
CN105719227B (en) A kind of camera system and image identification system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant