CN105740946B - Method for implementing neural network computation using a cellular array computing system - Google Patents
- Publication number
- CN105740946B (application number CN201510456306.2A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Mobile Radio Communication Systems (AREA)
Abstract
A method for implementing neural network computation using a cellular array computing system. The cellular array computing system comprises a master CPU, a cellular array, and a cellular array bus. The cellular array is a two-dimensional array composed of more than one cell having both computing and storage functions. The master CPU communicates with each cell in the cellular array through the cellular array bus. Adjacent cells are connected by communication interfaces and can send data to each other. The method comprises: storing the code of one or more neuron functions in each cell of the cellular array that participates in the neural network computation; any cell participating in the neural network computation receives the execution results of neuron function code output by other cells, uses them as its input data to execute the neuron function code it stores, and outputs the execution result or stores it at a preset address. The invention can overcome the communication bottleneck between the CPU and the memory and storage in existing computer architectures and improve overall system performance.
Description
Technical field
The present invention relates to the field of computers and computer application technology, and in particular to a method for implementing neural network computation using a cellular array computing system.
Background art
Typically, a computer has three core components: the central processing unit (CPU, Central Processing Unit), memory, and storage.
Through the sustained efforts of some of the world's leading companies, the CPU has evolved into an extremely complex semiconductor chip. The number of MOS transistors in a top-end CPU core can exceed 100 million. The current industry trend is that, constrained by power consumption, CPU operating frequency has become difficult to raise further. The operating efficiency of today's extremely complex CPUs is likewise difficult to improve. New CPU products are therefore increasingly evolving toward multi-core designs.
In terms of memory, the dominant technology at present is dynamic random access memory (DRAM, Dynamic Random Access Memory). DRAM supports fast random reads and writes, but it cannot retain its contents when power is lost. In fact, even while powered, DRAM loses information because its internal storage capacitors leak charge, so it must be refreshed periodically.
In terms of storage, NAND flash memory is gradually replacing conventional hard disks. The floating gate technology on which flash memory relies retains its contents when power is lost, but writing (rewriting '1' as '0') is slow and erasing (rewriting '0' as '1') is slower still, so flash cannot directly support computation the way DRAM does. It is built as a block device: an entire block must be erased at once, a block contains many pages, and each page can be written only after the erase. Another problem with NAND is its limited lifetime.
Although DRAM, NAND flash, and CPU logic circuits are all manufactured with CMOS semiconductor processes, the processes of the three are incompatible with one another. As a result, the three core components of a computer cannot coexist on a single chip, which has profoundly shaped the architecture of modern computers.
A prior-art computer architecture is shown in Fig. 1, which depicts multiple CPU cores: CPU1, CPU2, CPU3, ..., CPUn. Each CPU core generally has its own level-1 cache (L1 Cache) and, as needed, may also be provided with a level-2 cache (L2 Cache) and a level-3 cache (L3 Cache). The DRAM communicates with the CPU cores through a double data rate (DDR, Double Data Rate) interface, while the hard disk (HD, Hard Disk) or solid-state drive (SSD, Solid State Drive) communicates with the CPU cores through a peripheral device interface.
On the one hand, CPUs are developing toward multi-core designs; on the other hand, memory and storage both reside on separate chips. As the amount of information a multi-core CPU processes grows in proportion, communication with memory and storage increasingly becomes the bottleneck of system performance. To relieve this communication bottleneck, CPUs have to use ever larger multi-level caches. A cache is a copy of part of the memory contents and is usually built from static random access memory (SRAM, Static Random Access Memory), which is faster than DRAM but costs far more. Such an architecture has poor cost-effectiveness: the cost of a semiconductor chip is determined by its silicon area, and the performance gain delivered by conventional computer architecture is far from proportional to the increase in silicon area.
Neural networks (Neural Networks) are a common computing method in machine learning. They imitate the working principle of the human brain and are generally built on concepts such as the perceptron (Perceptron) or neuron (Neuron). Training a neural network is a massive computation. A neural network is in effect a function with a large number of parameters (possibly tens of thousands). Training requires a large number of examples, each with input data and a correct answer, which are used to adjust those parameters so that the network achieves the goal of learning. Because neural network computation involves a very large amount of calculation, and because the communication bottleneck between the CPU and the memory and storage in prior-art computer architectures severely limits overall computer performance, the prior art is ill-suited to implementing neural network computation efficiently.
Summary of the invention
The problem to be solved by the present invention is that prior-art computer architectures suffer from a communication bottleneck between the CPU and the memory and storage, which limits overall computer performance and hinders the efficient implementation of neural network computation.
To solve the above problem, the technical solution of the present invention provides a method for implementing neural network computation using a cellular array computing system. The cellular array computing system comprises a master CPU, a cellular array, and a cellular array bus. The cellular array is a two-dimensional array composed of more than one cell having both computing and storage functions, wherein each cell comprises a microprocessor (MPU, Micro Processing Unit) and a non-volatile (NV, Non-Volatile) random access memory. The non-volatile random access memory provides random access to the data involved when the microprocessor performs computation, and also stores software instruction code and data that must be kept persistently. Each cell stores its own position in the cellular array as an identification number (ID, Identification) that can be read by software or hardware in the cell. The master CPU communicates with each cell in the cellular array through the cellular array bus. Adjacent cells in the cellular array are connected by communication interfaces and can send data to each other. The method comprises: storing the code of one or more neuron functions in each cell of the cellular array that participates in the neural network computation; the master CPU selecting one or more cells to execute the neuron function code that each of them stores and to output the execution result to one or more target cells; and any cell participating in the neural network computation receiving the execution results of neuron function code output by other cells as its input data, executing the neuron function code it stores based on all input data from the other cells, and outputting the execution result to the cell holding the neuron function that requires the result, storing it at a preset address, or outputting it to the master CPU.
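The per-cell step described above can be sketched as follows. This is a minimal illustration only: the `Cell` class, its method names, and the choice of a weighted sum followed by a sigmoid are assumptions, since the text here does not specify the form of the neuron function.

```python
import math


class Cell:
    """Hypothetical model of one cell that stores and runs a single neuron function."""

    def __init__(self, cell_id, weights, bias=0.0):
        self.cell_id = cell_id        # (x, y) position, used as the cell ID
        self.weights = dict(weights)  # source cell ID -> weight, held in the cell's local memory
        self.bias = bias
        self.inbox = {}               # execution results received from other cells

    def receive(self, source_id, value):
        """Record an execution result output by another cell."""
        self.inbox[source_id] = value

    def ready(self):
        """True once every expected input has arrived."""
        return set(self.inbox) == set(self.weights)

    def fire(self):
        """Run the stored neuron function on all inputs and return the result."""
        if not self.ready():
            raise RuntimeError("not all inputs have arrived")
        total = self.bias + sum(self.weights[s] * v for s, v in self.inbox.items())
        self.inbox.clear()
        return 1.0 / (1.0 + math.exp(-total))  # assumed sigmoid activation


cell = Cell((2, 0), {(1, 0): 1.0, (1, 1): -1.0})
cell.receive((1, 0), 0.5)
cell.receive((1, 1), 0.5)
result = cell.fire()  # the weighted sum is 0.0, so the sigmoid gives 0.5
```

In a real array the returned value would then be forwarded to the cells whose neuron functions need it, stored at a preset address, or sent to the master CPU.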
Optionally, any two cells in the cellular array can communicate with each other. The cells taking part in cell-to-cell communication comprise a start cell, an end cell, and relay cells. The start cell is the cell that sends data to the end cell; the end cell is the cell that finally receives the data sent by the start cell; the relay cells are the cells that are successively adjacent along the cell-to-cell communication path and forward the data sent by the start cell through the communication interfaces. The cell-to-cell communication path is the data transmission path formed by the start cell, the relay cells, and the end cell. The method further comprises: transmitting the execution results of the neuron function code between any two cells by means of cell-to-cell communication.
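To make the roles of start cell, relay cells, and end cell concrete, here is a small sketch that builds one possible communication path. It assumes four-direction adjacency and dimension-order routing (move along x first, then along y); the patent's own path selection is the subject of Fig. 7 and is not reproduced here, so `xy_route` is purely illustrative.

```python
def xy_route(start, end):
    """Return a path of adjacent cell IDs from start to end, moving along x and then y."""
    x, y = start
    path = [start]
    step_x = 1 if end[0] > x else -1
    while x != end[0]:
        x += step_x
        path.append((x, y))
    step_y = 1 if end[1] > y else -1
    while y != end[1]:
        y += step_y
        path.append((x, y))
    return path


path = xy_route((0, 0), (2, 1))
relay_cells = path[1:-1]  # every cell between the start cell and the end cell
```

Each hop in `path` crosses exactly one neighbor-to-neighbor communication interface, which is why no master CPU involvement is needed.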
Optionally, any cell in the cellular array can also act as the start cell and carry out mass-send communication to all cells in a target region. A cell that participates in the mass-send communication and lies inside the target region acts as the start cell, as an end cell, or as both a relay cell and an end cell; a cell that participates in the mass-send communication and lies outside the target region acts as the start cell or as a relay cell. The method further comprises: if the neural network is divided into layers, using the mass-send communication to transfer data between layers, with the cells that participate in the neural network computation within the same layer located in the same target region.
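The layer-to-layer transfer can be pictured with the following sketch, in which each layer occupies one rectangular target region and every cell of one layer mass-sends its result to all cells of the next. The region encoding `(a, b, c, d)`, the helper names, and the dictionary of inboxes are assumptions for illustration; the relaying between cells is abstracted away.

```python
def region_cells(region):
    """List the cell IDs in a rectangular region a <= x <= b, c <= y <= d."""
    a, b, c, d = region
    return [(x, y) for x in range(a, b + 1) for y in range(c, d + 1)]


def mass_send(start_id, value, region, inboxes):
    """Deliver one value from the start cell to every cell in the target region."""
    for cell_id in region_cells(region):
        inboxes.setdefault(cell_id, {})[start_id] = value


# Layer 0 occupies column x = 0 (three cells); layer 1 occupies column x = 1 (two cells).
layer_regions = [(0, 0, 0, 2), (1, 1, 0, 1)]
outputs = {(0, 0): 0.1, (0, 1): 0.2, (0, 2): 0.3}  # results produced by layer 0

inboxes = {}
for start_id, value in outputs.items():
    mass_send(start_id, value, layer_regions[1], inboxes)
```

After the loop, every cell of layer 1 holds the complete set of layer 0 results and can execute its own neuron function.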
Optionally, the cellular array is further provided with at least one dedicated output cell. The dedicated output cell acts as the end cell to receive and store the data that other cells output to the master CPU, and notifies the master CPU with an interrupt signal to read the output data. Outputting the execution result to the master CPU comprises: sending the execution result to the dedicated output cell by cell-to-cell communication for the master CPU to read.
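A minimal sketch of the dedicated output cell follows. The `OutputCell` class and the callback that stands in for the interrupt line are hypothetical; the point is only that results are buffered in one place and the master CPU is notified rather than having to poll many cells.

```python
class OutputCell:
    """Hypothetical dedicated output cell: buffers results and signals the master CPU."""

    def __init__(self, notify_master):
        self.buffer = []
        self.notify_master = notify_master  # stands in for the interrupt signal

    def receive(self, source_id, value):
        """Store a result sent by another cell and raise the interrupt."""
        self.buffer.append((source_id, value))
        self.notify_master()

    def drain(self):
        """Called by the master CPU to read and clear the stored output data."""
        data, self.buffer = self.buffer, []
        return data


interrupts = []
out = OutputCell(lambda: interrupts.append("irq"))
out.receive((4, 2), 0.93)
out.receive((4, 3), 0.07)
results = out.drain()
```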
Optionally, the non-volatile random access memory is magnetic random access memory (MRAM, Magnetic Random Access Memory).
Optionally, the master CPU selects all cells in the cellular array, or the cells in a target region, as the cells participating in the neural network computation, broadcasts the code of the same neuron function to the same relative address in each selected cell, and broadcasts an instruction that causes the microprocessors of one or more selected cells to start executing the neuron function code from that relative address.
Compared with the prior art, the technical solution of the present invention has at least the following advantages:
In current neural network algorithms, training is typically a massive computation, and the training time of a network is very long. A neural network in the cellular array can run thousands of neurons in parallel, which greatly accelerates the learning process of the neural network and improves its learning capability.
A two-dimensional array (called a "cellular array") is formed from more than one unit (called a "cell") that has independent computing and storage functions. Each cell comprises a microprocessor and a non-volatile random access memory that both supports random access to the data involved in the microprocessor's computation and stores software instruction code and data that must be kept persistently. The three functions of memory, storage, and computation are thereby integrated into every cell, and a dense communication network is formed among the cells. On the one hand, the master CPU can communicate with each cell in the cellular array through the cellular array bus; on the other hand, adjacent cells in the cellular array can send data to each other. Data mass-sending and the internal network thus overcome the communication bottleneck between the CPU and the memory and storage in existing computer architectures, improving the overall performance of the computing system with better cost-effectiveness. When such a computing system is applied to neural network computation, the parallel computation of a large number of cells greatly increases the computing speed, so that training is greatly accelerated. Moreover, because the random access memory in each cell is non-volatile, a successfully trained chip can be replicated directly and sold as a product that solves a specific problem.
Through the communication interfaces between adjacent cells in the cellular array, data can be relayed multiple times between adjacent cells, so that any two cells in the cellular array can communicate without relying on the master CPU. The large bandwidth of the cellular array's internal network improves the efficiency of cell-to-cell communication and reduces the processing load on the master CPU, which further improves the overall performance of the computing system and favors the efficient implementation of neural network computation.
By extending point-to-point communication between cells to mass-sending within a region, a higher degree of parallelism can be supported and a much higher total bandwidth obtained, which further improves the overall performance of the computing system and markedly helps to increase the speed of neural network computation.
A dedicated output cell is provided in the cellular array. It acts as the end cell to receive and store the data that other cells output to the master CPU, and notifies the master CPU by an interrupt signal to read that data. This improves the efficiency with which the master CPU reads output data when only a few cells need to output data to it, so that the final result of the neural network computation can be delivered to the master CPU more efficiently.
The cellular array computing system provided in the technical solution of the present invention solves the communication bottleneck between the CPU and the memory. As a result, within the same silicon area, the processing capability of a large number of small MPUs far exceeds that of a few top-end CPUs on many problems, at much lower power consumption.
The cellular array stores the results of training (the weight data) in non-volatile memory. Once training is complete, the whole chip can be sold as a product by simple replication.
The cellular array computing system provided in the technical solution of the present invention brings computer architecture closer to the human brain and provides a powerful engine for future artificial intelligence algorithms.
Brief description of the drawings
Fig. 1 is a schematic diagram of a prior-art computer architecture;
Fig. 2 is a structural schematic diagram of a cellular array computing system provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of one communication mode between adjacent cells in an embodiment of the present invention;
Fig. 4 is a schematic diagram of another communication mode between adjacent cells in an embodiment of the present invention;
Fig. 5 is a structural schematic diagram of a cell in an embodiment of the present invention;
Fig. 6 is a structural schematic diagram of the cells that carry out cell-to-cell communication in the cellular array of an embodiment of the present invention;
Fig. 7 is a schematic diagram of path selection for cell-to-cell communication in the cellular array of an embodiment of the present invention;
Fig. 8 is a schematic diagram of the operating flow of the dedicated output cell in an embodiment of the present invention;
Fig. 9 is a schematic diagram of mass-sending in the cellular array of an embodiment of the present invention when the start cell is at a corner of the target region;
Fig. 10 is a schematic diagram of mass-sending in the cellular array of an embodiment of the present invention when the start cell is on an edge of the target region;
Fig. 11 is a schematic diagram of mass-sending in the cellular array of an embodiment of the present invention when the start cell is inside the target region;
Fig. 12 is a schematic diagram of mass-sending in the cellular array of an embodiment of the present invention when the start cell is outside the target region;
Fig. 13 is another structural schematic diagram of the cellular array computing system provided by an embodiment of the present invention;
Fig. 14 is a further structural schematic diagram of the cellular array computing system provided by an embodiment of the present invention;
Fig. 15 is yet another structural schematic diagram of the cellular array computing system provided by an embodiment of the present invention;
Fig. 16 is a schematic diagram of a neuron in a neural network;
Fig. 17 is a schematic diagram of neural network computation.
Detailed description of the embodiments
In prior-art computer architectures, the communication bottleneck between the CPU and the memory and storage limits improvements in overall computer performance and results in poor cost-effectiveness.
After study, the inventors of the present application concluded that if the three functions of memory, storage, and computation are integrated on a single chip to form relatively simple units that each have independent computing and storage functions, and if a dense communication network is formed among a large number of such units, providing data broadcast/mass-send functions and an internal network capable of massively parallel data transmission, then a computing architecture with similarities to the human brain can be developed. This is equivalent to building a large number of microcomputers on a single chip.
To this end, the technical solution of the present invention provides a computing architecture similar in structure to the human brain (referred to in the technical solution of the present invention as a "cellular array computing system"). The architecture consists of numerous units (referred to as "cells") that are structurally relatively simple, have both storage and computing functions, and are connected by a dense network. This new computing architecture will be widely applicable in fields such as large-scale computing, big data processing, and artificial intelligence.
To make the above objects, features, and advantages of the present invention more apparent and easier to understand, specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.
As shown in Fig. 2, the cellular array computing system provided by an embodiment of the present invention comprises a master CPU, a cellular array, and a cellular array bus. The cellular array is the main body of the cellular array computing system. It is a two-dimensional array composed of more than one cell having both computing and storage functions, wherein each cell comprises a microprocessor (MPU) and a non-volatile random access memory (MRAM is used as the example in Fig. 2). The non-volatile random access memory provides random access to the data involved when the microprocessor performs computation, and also stores software instruction code and data that must be kept persistently. Each cell stores its own position in the cellular array as an ID that can be read by software or hardware in the cell. The master CPU communicates with each cell in the cellular array through the cellular array bus. Adjacent cells in the cellular array are connected by communication interfaces and can send data to each other.
It should be noted that in this embodiment the non-volatile random access memory is described using MRAM as the example. In other embodiments, as non-volatile random access memory technology develops and matures, the non-volatile random access memory may also be implemented with several other candidate technologies, such as phase-change random access memory (PCRAM, Phase Change Random Access Memory), resistive random access memory (Resistive Random Access Memory), ferroelectric random access memory (FeRAM, Ferroelectric Random Access Memory), and ferroelectric dynamic random access memory (FEDRAM, Ferroelectric Dynamic Random Access Memory).
MRAM is a new memory and storage technology. It supports fast random reads and writes like SRAM/DRAM and is faster than DRAM; like flash memory, it retains data permanently after power is lost; and unlike NAND flash, MRAM can be rewritten an unlimited number of times, so its service life is longer. In addition, MRAM is quite economical: the silicon area it occupies per unit of capacity has a large advantage over SRAM (commonly used for CPU caches) and is expected to approach that of DRAM. Its performance is also quite good: its read/write latency is close to that of the best SRAM, and its power consumption is the lowest among the various memory and storage technologies. Moreover, unlike DRAM and flash, which are incompatible with standard CMOS semiconductor processes, MRAM can be integrated with logic circuits on a single chip. Using MRAM technology, the three functions of memory, storage, and computation can be integrated on one chip, which is what makes the cellular array computing system feasible.
In this embodiment, the microprocessor has the functions of an ordinary CPU. Depending on the specific application scenario, units such as a floating-point unit (FPU, Floating Point Unit) or an image processor may be added, so at least one of a floating-point unit and an image processor may be integrated in the microprocessor.
In an actual implementation, the master CPU, the cellular array, and the cellular array bus may be integrated in a single chip. Alternatively, the master CPU may be a separate chip that communicates, through a standard memory interface, with the chip formed by the cellular array and the cellular array bus. When a standard memory interface is used between the master CPU and the cellular array, the master CPU can be a general-purpose CPU chip, which makes the cellular array computing system easier to implement.
In this embodiment, each cell stores its own position in the cellular array as its ID. The position may be expressed as coordinates in the first quadrant of a planar rectangular coordinate system: if (x, y) denotes the position of a cell in the cellular array, then (x, y) can be stored in the cell as its ID, and both software and hardware in the cell can read this ID and use it in specific operations.
In this embodiment, the communication that the master CPU carries out with each cell in the cellular array through the cellular array bus includes the following cases:
reading and writing, by address, the non-volatile random access memory of any cell in the cellular array;
broadcasting data to the non-volatile random access memory of every cell in a target region of the cellular array, writing it to the same relative address in the non-volatile random access memory of each cell in the target region;
sending instructions (including start and pause) or data to the microprocessor of any cell in the cellular array, or reading its state;
broadcasting instructions to the microprocessors of all cells in a target region.
Of course, in other embodiments, the communication that the master CPU carries out with each cell in the cellular array through the cellular array bus may be any one of the above cases or a combination of several of them.
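The four cases listed above can be modeled, very roughly, as follows. The `CellArray` class, its method names, the tiny word-addressed memory, and the `(a, b, c, d)` rectangle encoding are all assumptions for illustration. The software loop in the broadcast methods stands in for what the text describes as a single bus transaction that every cell in the region picks up.

```python
class CellArray:
    """Hypothetical model of the master CPU's view of the array over the cellular array bus."""

    def __init__(self, width, height, mem_words=16):
        self.mem = {(x, y): [0] * mem_words for x in range(width) for y in range(height)}
        self.running = {cell_id: False for cell_id in self.mem}

    def write(self, cell_id, rel_addr, value):
        """Case 1: write one cell's non-volatile memory by address."""
        self.mem[cell_id][rel_addr] = value

    def read(self, cell_id, rel_addr):
        """Case 1: read one cell's non-volatile memory by address."""
        return self.mem[cell_id][rel_addr]

    def _region(self, a, b, c, d):
        return [cid for cid in self.mem if a <= cid[0] <= b and c <= cid[1] <= d]

    def broadcast_write(self, region, rel_addr, value):
        """Case 2: write the same relative address in every cell of the region."""
        for cid in self._region(*region):
            self.mem[cid][rel_addr] = value

    def command(self, cell_id, start):
        """Case 3: start or pause the microprocessor of one cell."""
        self.running[cell_id] = start

    def broadcast_command(self, region, start):
        """Case 4: start or pause the microprocessors of all cells in the region."""
        for cid in self._region(*region):
            self.running[cid] = start


array = CellArray(4, 4)
array.write((0, 0), 0, 7)
array.broadcast_write((1, 2, 1, 2), rel_addr=3, value=42)
array.broadcast_command((1, 2, 1, 2), start=True)
```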
It should be noted that a "target region" in the embodiments of the present invention is a region formed by more than one mutually adjacent cell selected by the master CPU or by any cell in the cellular array; the cells in the region are the objects to which the master CPU or that cell broadcasts data or instructions. In this embodiment, the target region is described using a rectangular region as the example (a ≤ x ≤ b, c ≤ y ≤ d, where a and b are the boundary coordinates of the rectangular region along the x-axis of the planar rectangular coordinate system and c and d are its boundary coordinates along the y-axis). In other embodiments, the target region may have another shape, such as a diamond, a triangle, or a hexagon.
In addition, the concept of "broadcast" in the embodiments of the present invention differs from that of "mass-sending": in a broadcast, data or an instruction is sent once and all target objects receive it, whereas in mass-sending the data is sent in multiple separate transmissions to different objects.
In addition to the master CPU broadcasting to cells in the cellular array (to the microprocessor or the non-volatile random access memory in a cell), the cellular array contains a further communication network, which allows a cell, under the control of its MPU, to send data to the cells adjacent to it. As shown in Fig. 3, within a plane any cell can communicate with its adjacent cells in the four directions up, down, left, and right. Of course, communication between adjacent cells is not limited to these four directions. Where the layout supports it, the eight directions up, down, left, right, upper left, upper right, lower left, and lower right may also be used: as shown in Fig. 4, any cell can communicate with its adjacent cells in these eight directions.
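The two adjacency schemes of Fig. 3 and Fig. 4 can be expressed compactly using the (x, y) cell IDs. The function below is a sketch with hypothetical names; it assumes cell coordinates start at 0 and that cells on the border of the array simply have fewer neighbors.

```python
FOUR_DIRECTIONS = [(0, 1), (0, -1), (-1, 0), (1, 0)]  # up, down, left, right
EIGHT_DIRECTIONS = FOUR_DIRECTIONS + [(-1, 1), (1, 1), (-1, -1), (1, -1)]  # plus diagonals


def neighbors(cell_id, width, height, diagonal=False):
    """Return the IDs of the cells adjacent to cell_id inside a width x height array."""
    x, y = cell_id
    directions = EIGHT_DIRECTIONS if diagonal else FOUR_DIRECTIONS
    return [
        (x + dx, y + dy)
        for dx, dy in directions
        if 0 <= x + dx < width and 0 <= y + dy < height
    ]


inner_four = neighbors((1, 1), 3, 3)
inner_eight = neighbors((1, 1), 3, 3, diagonal=True)
corner_four = neighbors((0, 0), 3, 3)
```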
As shown in Fig. 5, in this embodiment a cell in the cellular array may further comprise a bus controller and a cell internal bus. The bus controller is connected to the cellular array bus, the microprocessor, and the cell internal bus. The bus controller identifies communication between the master CPU and its own cell, and either connects the microprocessor to pass on instructions or data sent by the master CPU or to allow its state to be read, or connects the non-volatile random access memory through the cell internal bus for data read/write operations.
As those skilled in the art know, a fairly simple CPU with good performance, such as the ARM Cortex-M0, has only about 50,000 MOS transistors; even with a modest FPU added, this is far fewer than the more than 100 million MOS transistors in a top-end CPU. The area (cost) increase incurred in raising CPU performance is out of proportion to the gain. If one large CPU is replaced by many small CPUs at the same total cost, the total computing capability is certain to increase many times over. Conventional computer architecture, however, is limited by the communication bottleneck, so the actual performance gain from using a large number of CPU cores is very limited.
The cellular array computing system provided by the technical solution of the present invention solves the communication bottleneck through data broadcast/mass-sending and the internal network, thereby improving the overall performance of the computing system with better cost-effectiveness. This will be seen more clearly in the application examples that follow.
Preliminary studies show that if an MPU similar to the Cortex-M0 is combined with 32 KB of memory to form a cell, then with a 40 nm process 3,000 such cells can be built on a single chip, which is a very powerful computing capability. Further studies show that, in this way, the computing capability of a contemporary top-end CPU (generally measured in floating-point operations per second, FLOPS) can be exceeded within the same silicon area. Because the cellular array computing system of the technical solution of the present invention no longer faces the memory interface bottleneck, its performance will be even better on many practical problems.
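As a rough back-of-the-envelope check of the figures quoted above, the totals for one chip can be computed directly. The sketch uses only numbers from the text (3,000 cells, 32 KB per cell, about 50,000 MOS transistors per Cortex-M0-class MPU); it assumes 1 KB = 1024 bytes and counts MPU logic only, ignoring the transistors or memory elements of the per-cell storage.

```python
CELLS_PER_CHIP = 3000
MEMORY_PER_CELL_KB = 32
TRANSISTORS_PER_MPU = 50_000  # approximate Cortex-M0-class figure quoted in the text

total_memory_kb = CELLS_PER_CHIP * MEMORY_PER_CELL_KB
total_memory_mib = total_memory_kb / 1024
total_mpu_transistors = CELLS_PER_CHIP * TRANSISTORS_PER_MPU  # MPU logic only

print(total_memory_kb, total_memory_mib, total_mpu_transistors)
# prints: 96000 93.75 150000000
```

On these assumptions the MPU logic of 3,000 cells is on the same order as the "more than 100 million" transistors the text attributes to a top-end CPU, while providing 3,000 independent processors and about 94 MiB of on-chip memory.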
Based on the above cellular array computing system, an embodiment of the present invention also provides a communication method in the cellular array computing system, comprising: the operation of the master CPU reading and writing the non-volatile random access memory, the communication operation between the master CPU and a microprocessor, the broadcast operation of the master CPU, and the communication operation between adjacent cells in the cellular array.
The operation of the master CPU reading and writing the non-volatile random access memory specifically comprises: any cell in the cellular array receives the target address that the master CPU broadcasts on the cellular array bus and, if it determines that the target address lies within itself, connects its non-volatile random access memory so that the master CPU can perform the data read/write operation.
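The address check that each cell performs can be sketched as below. The text does not say how cell memories are laid out in the system address space, so the contiguous 32 KB windows in row-major order starting at `base` are assumptions, as are the function names.

```python
CELL_MEM_BYTES = 32 * 1024  # assumed size of each cell's non-volatile memory


def cell_window(cell_id, width, base=0):
    """Return the [start, end) system address range assumed to belong to this cell."""
    x, y = cell_id
    index = y * width + x  # assumed row-major ordering of cells
    start = base + index * CELL_MEM_BYTES
    return start, start + CELL_MEM_BYTES


def decode(cell_id, address, width, base=0):
    """Return the offset inside this cell's memory, or None if the address is not in this cell."""
    start, end = cell_window(cell_id, width, base)
    if start <= address < end:
        return address - start
    return None


WIDTH, HEIGHT = 4, 3
address = 5 * CELL_MEM_BYTES + 0x10  # an address broadcast on the cellular array bus
claims = {(x, y): decode((x, y), address, WIDTH) for y in range(HEIGHT) for x in range(WIDTH)}
owners = [cell_id for cell_id, offset in claims.items() if offset is not None]
```

Every cell sees the same broadcast address, but only the one whose window contains it connects its memory to the bus.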
The communication operation between the master CPU and a microprocessor specifically comprises: a first special address range is reserved in the system address space for communication between the master CPU and the microprocessors and holds the ID of the target cell. If any cell in the cellular array, on receiving the first special address range, recognizes that the communication is with its own microprocessor, it connects its microprocessor to complete the subsequent instruction reception, data reception, and state read operations.
It should be noted that the system address space is not limited to the sum of the address spaces formed by the non-volatile random access memories contained in the cells of the cellular array, because the memories connected to the cellular array bus are not necessarily only those contained in the cells; other types of memory may well be connected to the cellular array bus for access by the master CPU. The master CPU therefore needs to identify the cell it intends to access (referred to in this embodiment as the "target cell") by the cell's ID.
The broadcast operation of the master CPU specifically comprises: a second special address range is reserved in the system address space for the master CPU to broadcast instructions. The second special address range holds cell IDs that help determine the extent of the target region in the cellular array. If any cell in the cellular array, on receiving the second special address range, recognizes that it lies within the target region, it connects its microprocessor to pass on the instructions or data sent by the master CPU or to allow its state to be read, or connects its non-volatile random access memory for data read/write operations.
The broadcast operation of the master CPU is illustrated below, taking a rectangular target area as
an example. A segment of the system address space is reserved for broadcast instructions, and an
address in this segment stores the ID of the start-point cell of the target rectangular region. The
start-point cell is the first cell in the target rectangular region accessed by the master CPU.
After the bus controller in a cell receives this special address, it receives the following word of
data, which contains the ID of the cell diagonally opposite the start-point cell in the target
rectangular region. If the bus controller determines that this cell lies in the region, it receives
a second word of data. The second word indicates whether what follows is an instruction or data for
the MPU, or is to be written into the non-volatile random access memory starting from a given
relative address. In the former case the MPU is connected; in the latter case the non-volatile
random access memory is connected, and the subsequent operation is completed.
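The rectangular broadcast check described above can be sketched as follows. This is a minimal
sketch that assumes a cell ID is a `(row, column)` pair and that the second word is modeled as a
`(kind, relative_address)` tuple; the text does not specify either encoding, so both are
hypothetical.

```python
def in_target_rectangle(cell, start, diagonal):
    """True if `cell` lies in the rectangle spanned by two diagonal corner cells."""
    (r, c), (r0, c0), (r1, c1) = cell, start, diagonal
    return min(r0, r1) <= r <= max(r0, r1) and min(c0, c1) <= c <= max(c0, c1)

def handle_broadcast(cell, start, diagonal, second_word):
    """Decide what a cell's bus controller connects for one broadcast."""
    if not in_target_rectangle(cell, start, diagonal):
        return None                       # not addressed: ignore the following words
    # `second_word` is modeled as ("mpu", None) or ("nvram", relative_address).
    kind, rel_addr = second_word
    return "mpu" if kind == "mpu" else ("nvram", rel_addr)
```

Using `min` and `max` on both corners means the check works whichever of the two diagonal corners
the master CPU names as the start-point cell.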
It should be noted that, where the storage space of the second special address field is relatively
limited, the cell ID stored in it may not by itself fully determine the range of the target area.
In that case, after receiving the second special address field, the cell also needs to receive
subsequent data, which together with the cell ID stored in the second special address field
determines the range of the target area.
The communication operation between adjacent cells in the cellular array includes: any cell in the
cellular array sends data to an adjacent cell under the control of its microprocessor.
In the present embodiment, each cell is provided with a bus controller connected to the cellular
array bus. A cell-internal bus is provided inside the cell; the non-volatile random access memory
is a slave device (Slave) on the cell-internal bus, and the bus controller and the microprocessor
are master devices (Master).
In the communication method of the above cellular array computing system, the steps in which any
cell in the cellular array "judges whether the destination address lies in this cell", "identifies
whether the communication is with the microprocessor of this cell", "identifies whether this cell
lies in the target area" and "connects the non-volatile random access memory or the microprocessor"
are all completed by the bus controller, which connects the non-volatile random access memory
through the cell-internal bus.
In a specific implementation, read/write operations by the master CPU on the non-volatile random
access memory of any cell in the cellular array have higher priority than read/write operations by
that cell's microprocessor on the same memory. That is, if the microprocessor in a cell needs to
read or write the non-volatile random access memory in that cell, it must wait until the master
CPU's read/write operation on that memory has completed.
For the specific implementation of the communication method in the cellular array computing system,
reference may also be made to the implementation of the cellular array computing system described
above; details are not repeated here.
In the present embodiment, the internal network of the cellular array is implemented so that data
can not only be sent to adjacent cells but can also be sent from one cell to any other cell; that
is, inter-cell communication within the cellular array can be realized.
Specifically, any two cells in the cellular array of the cellular array computing system can
communicate without relying on the master CPU. The cells taking part in inter-cell communication
include a start-point cell, an end-point cell and relay cells. The start-point cell is the cell
that issues data to the end-point cell; the end-point cell is the cell that finally receives the
data issued by the start-point cell; the relay cells are the cells that are successively adjacent
along the inter-cell communication path and relay the data issued by the start-point cell through
the communication interfaces. The inter-cell communication path is the data transmission path
formed by the start-point cell, the relay cells and the end-point cell.
Through the communication interfaces between adjacent cells in the cellular array, data is relayed
multiple times between adjacent cells, so that any two cells in the cellular array can communicate
without relying on the master CPU. This improves the efficiency of inter-cell communication and
reduces the processing load on the master CPU, which can further improve the overall performance of
the computing system.
It should be noted that start-point cell, end-point cell and relay cell are relative concepts
defined with respect to one particular inter-cell communication process: a start-point cell may
well serve as a relay cell or end-point cell in other inter-cell communication processes, and an
end-point cell may likewise serve as a relay cell or start-point cell in other inter-cell
communication processes.
In a specific implementation, a cell in the cellular array may further include a network controller
connected to the microprocessor. During inter-cell communication the network controller controls
the sending and receiving of issued data, relayed data and finally received data, and it also sends
interrupt signals to the microprocessor. In the present embodiment, a network controller is
provided in each cell so that data can be relayed quickly without disturbing the MPU, which reduces
the processing load on the MPU in the cell. In other embodiments the network controller may be
omitted and the relaying of data performed by the MPU instead.
In the present embodiment, "issued data" refers to data sent out by the start-point cell itself;
"relayed data" refers to data issued by the start-point cell that a relay cell passes on, which the
relay cell itself did not need to issue; and "finally received data" refers to the data received by
the end-point cell, which has reached its destination after multiple relays and will not be relayed
further. In terms of content, "issued data", "relayed data" and "finally received data" may be the
same data; they are merely different names for it at different stages of the communication.
In a specific implementation, a cell in the cellular array may further include one or more groups
of first-in, first-out queues (FIFO, First In First Out) connected to the network controller, each
group corresponding to one cell adjacent to this cell. Each group includes an input FIFO queue and
an output FIFO queue. The input FIFO queue stores data that is input to this cell for relaying or
as finally received data; the output FIFO queue stores data output from this cell that needs to be
relayed, or data that this cell issues to other cells.
Taking the communication mode between adjacent cells shown in Fig. 3 as an example, the structure
of a cell that performs inter-cell communication in the cellular array of the present embodiment is
shown in Fig. 6. The network controller in Fig. 6 is connected to the MPU and to four groups of
FIFO queues, and the groups correspond one-to-one with the cells adjacent to this cell in the four
directions (up, down, left and right) of the two-dimensional plane. In a specific implementation,
the communication channel between every two adjacent cells may share one corresponding group of
FIFO queues. Each group of FIFO queues includes an input FIFO and an output FIFO. From the point of
view of one cell, the input FIFO stores data input from the adjacent cell and the output FIFO
stores data output from this cell to the adjacent cell; the output FIFO of the adjacent cell is an
input FIFO from this cell's point of view, and this cell's output FIFO is an input FIFO from the
adjacent cell's point of view.
It should be noted that the cell shown in Fig. 6 has four groups of FIFO queues. A cell located at
one of the four corners of a rectangular cellular array has only two adjacent cells and therefore
corresponds to two groups of FIFO queues; a cell located on one of the four edges of a rectangular
cellular array (but not at a corner) has three adjacent cells and corresponds to three groups of
FIFO queues.
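The number of FIFO groups per cell follows directly from the number of neighbors. The short Python
sketch below assumes the four-neighbor communication mode of Fig. 3, zero-based `(row, col)`
coordinates, and an array of at least 2 x 2 cells; the function name is illustrative only.

```python
def fifo_groups(row, col, rows, cols):
    """Directions for which the cell at (row, col) needs one group of FIFO queues."""
    candidates = {"up": (row - 1, col), "down": (row + 1, col),
                  "left": (row, col - 1), "right": (row, col + 1)}
    # Keep only the directions in which a neighboring cell actually exists.
    return [d for d, (r, c) in candidates.items() if 0 <= r < rows and 0 <= c < cols]
```

A corner cell yields two directions, an edge cell three, and an interior cell four, matching the
counts given in the text.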
In the present embodiment, the network controller is also connected to the MPU in the cell and
sends it interrupt signals, such as FIFO empty, FIFO full, new data arrived and data sent out. The
MPU, in turn, can issue data through the network controller; issued data is usually first placed in
the corresponding output FIFO queue.
It should be noted that the cell structure in Fig. 6 shows only the modules relevant to inter-cell
communication. Those skilled in the art will understand that the cell structure shown in Fig. 6 can
be fully combined with the cell structure shown in Fig. 5.
In addition, in the present embodiment FIFO queues are used to store the data input to and output
from a cell, which makes data relaying during inter-cell communication more efficient and reduces
the processing load on the MPU. In other embodiments, the data input to and output from a cell may
also be held in registers.
An embodiment of the present invention also provides a method of communication between cells in the
above cellular array computing system, comprising: the start-point cell in the cellular array sends
the data to be issued to the end-point cell, in the selected sending direction, to a cell adjacent
to the start-point cell; when any cell in the cellular array receives data issued or relayed by an
adjacent cell, if it judges from the ID of the end-point cell indicated in the received data that
this cell is the end-point cell, it stores the received data in the non-volatile random access
memory of this cell or notifies the microprocessor of this cell to process the received data;
otherwise this cell acts as a relay cell and, after selecting a sending direction, relays the
received data to a cell adjacent to this cell.
In a specific implementation, every item of data involved in inter-cell communication contains the
IDs of the start-point cell and the end-point cell. From the ID of the end-point cell indicated in
the received data, any cell can judge whether the data is intended for this cell or needs to be
relayed further to another adjacent cell. An item of data travels over the connections between
adjacent cells and, after multiple relays, reaches the end-point cell. If the end-point cell needs
to respond to the data issued by the start-point cell, it can send feedback data to the start-point
cell using the start-point cell's ID: the end-point cell takes the start-point cell ID indicated in
the received data as the end-point cell ID and indicates it in the feedback data obtained by
processing the received data. The end-point cell thereby becomes the start-point cell of a new
inter-cell communication, and the original start-point cell becomes the end-point cell of that
communication.
In a specific implementation, besides the ID of the end-point cell, the data issued by the
start-point cell to the end-point cell also indicates either the address to be accessed in the
end-point cell or the MPU. Storing the received data in the non-volatile random access memory of
this cell is carried out after the end-point cell identifies the address to be accessed indicated
in the received data; notifying the MPU of this cell to process the received data is carried out
after the end-point cell identifies the MPU indicated in the received data.
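The information carried with each item of data, as described above, can be pictured with a small
record type. The following Python sketch is only one possible layout: the field names, the use of
`None` to mean "deliver to the MPU", and the `make_feedback` helper are assumptions made for
illustration, not a format given in the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

CellId = Tuple[int, int]        # assumed ID encoding: (row, column)

@dataclass
class Packet:
    src: CellId                 # ID of the start-point cell
    dst: CellId                 # ID of the end-point cell
    nvram_addr: Optional[int]   # address to access in the end-point cell, or None for the MPU
    payload: bytes

def make_feedback(received: Packet, payload: bytes) -> Packet:
    """Build feedback data: the old start-point cell becomes the new end-point cell."""
    return Packet(src=received.dst, dst=received.src, nvram_addr=None, payload=payload)
```

Swapping `src` and `dst` is all the end-point cell needs to do to address its reply, because the
start-point cell's ID travels with the original data.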
In actual implementation, if the end-point cell identifies an address to be accessed in the
received data, the network controller in the end-point cell can write the received data directly to
the corresponding address in the cell's non-volatile random access memory. In this way cells can
"breed": one cell can download a program to another cell. If the end-point cell identifies the MPU
in the received data, the received data is handed to the MPU in the end-point cell for processing.
In the present embodiment, since a cell in the cellular array further includes a network controller
connected to the MPU, the following are all completed under the control of the network controller:
the start-point cell issuing data to the end-point cell; any cell in the cellular array receiving
data issued or relayed by an adjacent cell and judging whether this cell is the end-point cell or a
relay cell; and storing the received data in the non-volatile random access memory of this cell or
notifying the MPU of this cell to process the received data.
In a specific implementation, the data issued by the start-point cell to the end-point cell is
first placed by the network controller into the output FIFO queue and then output by the network
controller from the output FIFO queue to the cell adjacent to the start-point cell. When any cell
in the cellular array receives data issued or relayed by an adjacent cell, the received data is
placed into the input FIFO queue and, if it is judged that the received data needs to be relayed,
the data is then placed into the output FIFO queue.
In addition, if the network controller determines that the input FIFO queue or the output FIFO
queue is empty or full, or receives data issued or relayed by an adjacent cell, or issues or relays
data to an adjacent cell, it sends an interrupt signal to the microprocessor.
In a specific implementation, the start-point cell or a relay cell can select the sending direction
as follows: if a straight-line communication path can be formed between the start-point cell or
relay cell and the end-point cell, the sending direction is the direction along that straight line
from the start-point cell or relay cell toward the end-point cell; otherwise the sending direction
is the direction from the start-point cell or relay cell toward a candidate adjacent cell, a
candidate adjacent cell being a cell, among those adjacent to the start-point cell or relay cell,
that is closer to the end-point cell. There may of course be two candidate adjacent cells, in which
case the one with fewer output-data communication tasks is selected as the relay cell.
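The direction-selection rule can be sketched in a few lines of Python. The sketch assumes the
four-neighbor mode of Fig. 3, `(row, column)` cell IDs with rows increasing downward, and a
hypothetical `output_load` mapping that gives a load figure for the candidate in each direction
(for example, the occupancy of the corresponding output FIFO). Ties are broken deterministically
here, whereas the text allows a random choice.

```python
def candidate_directions(here, dest):
    """Directions toward neighbors that are closer to `dest` than `here` is."""
    (r, c), (dr, dc) = here, dest
    dirs = []
    if dr < r:
        dirs.append("up")
    if dr > r:
        dirs.append("down")
    if dc < c:
        dirs.append("left")
    if dc > c:
        dirs.append("right")
    return dirs

def select_direction(here, dest, output_load):
    """Pick the sending direction, preferring the less busy candidate."""
    dirs = candidate_directions(here, dest)
    if not dirs:
        return None      # `here` is already the end-point cell
    # One candidate means a straight-line path; with two, take the lighter load.
    return min(dirs, key=lambda d: output_load.get(d, 0))
```

For a start-point cell at `(0, 0)` and an end-point cell at `(3, 3)`, both "down" and "right" are
candidates, and the one with the smaller load figure is chosen.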
In the present embodiment, the selection of the sending direction by the start-point cell or relay
cell in the above manner can in effect be regarded as the path selection process for inter-cell
communication in the cellular array. Referring to Fig. 7, each rectangle in Fig. 7 represents one
cell in the cellular array, and all the cells shown in Fig. 7 are part of the whole cellular array.
It is assumed that communication between adjacent cells is carried out in the manner shown in
Fig. 3.
Suppose point A represents a start-point cell that is to issue data to the end-point cell at point
C. Since a straight-line communication path can clearly be formed between point A and point C, the
cell at point A sends the data to the adjacent cell at point B. Likewise, the cell at point B acts
as a relay cell and continues to relay the data along the straight line between point A and point C
toward the cell at point C. The successively adjacent cells on the inter-cell communication path
formed between point A and point C forward the data issued by the cell at point A one after another
until it reaches the cell at point C.
Suppose point D represents another start-point cell that is to issue data to the end-point cell at
point G. Since a straight-line communication path clearly cannot be formed between point D and
point G, then among the cells adjacent to the cell at point D, the cells at point E and point F are
evidently closer to the end-point cell at point G, so these two cells are the candidate adjacent
cells of the cell at point D. The one with fewer output-data communication tasks can be chosen as
the relay cell; if the two cells have the same output-data communication load, one of them is
chosen at random as the relay cell. As shown in Fig. 7, choosing the cell at point E or the cell at
point F results in different inter-cell communication paths.
It should be noted that in the present embodiment the path selection for inter-cell communication
is illustrated using the communication mode between adjacent cells shown in Fig. 3. Those skilled
in the art will understand that if the communication mode between adjacent cells shown in Fig. 4 is
used, more sending directions will be available to choose from.
To sum up, in actual implementation, the network controller of every cell that issues or relays
data must select one adjacent cell as the next stop. When the start point and the end point lie on
a straight line, there is generally only one reasonable choice; in other cases there are two
equally reasonable choices, and the network controller selects the neighbor whose traffic is
relatively less busy.
When data enters an input FIFO queue, the network controller first checks it:
If the end point is this cell: if the end point is a specific relative address, then, since the
network controller has direct memory access (DMA, Direct Memory Access) capability, it stores the
received data directly at the corresponding address in the non-volatile random access memory and
notifies the MPU with an interrupt; if the end point is the MPU, it directly notifies the MPU with
an interrupt signal to process the data.
If the end point is another cell, or the MPU of this cell is sending out data: if the end point and
this cell lie on a straight line, the correct direction is selected and the data is sent to the
adjacent cell; in other cases there are two possible directions, and the adjacent cell whose output
FIFO queue is more idle is selected for sending. If the output FIFO queues for the two candidate
adjacent cells are in the same state, one of the adjacent cells may be chosen at random.
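The network controller's handling of one incoming item can be modeled roughly as below. This is a
simplified sketch, not the patented implementation: the `Cell` class, its field names and the
interrupt labels are hypothetical, "more idle" is approximated by the length of this cell's own
output FIFO toward each candidate neighbor, and ties are broken deterministically rather than at
random.

```python
from collections import deque

class Cell:
    def __init__(self, cell_id):
        self.id = cell_id                     # assumed ID encoding: (row, column)
        self.nvram = {}                       # relative address -> stored data
        self.interrupts = []                  # interrupts raised to the MPU
        self.out_fifo = {d: deque() for d in ("up", "down", "left", "right")}

    def route(self, dst):
        """Choose a direction that moves closer to `dst`, preferring the idler FIFO."""
        (r, c), (dr, dc) = self.id, dst
        dirs = [d for d, ok in (("up", dr < r), ("down", dr > r),
                                ("left", dc < c), ("right", dc > c)) if ok]
        return min(dirs, key=lambda d: len(self.out_fifo[d]))

    def on_input(self, dst, nvram_addr, data):
        """Handle one item taken from an input FIFO; return the relay direction, if any."""
        if dst == self.id:
            if nvram_addr is not None:        # DMA-style write, then notify the MPU
                self.nvram[nvram_addr] = data
                self.interrupts.append("data_stored")
            else:                             # hand the data to the MPU
                self.interrupts.append(("data_for_mpu", data))
            return None
        direction = self.route(dst)           # this cell acts as a relay cell
        self.out_fifo[direction].append((dst, nvram_addr, data))
        return direction
```

In this model the MPU is involved only through the `interrupts` list, which mirrors the point made
in the text that relaying should not disturb the MPU.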
In actual implementation, when thousands of MPUs in the cellular array compute together, how to
deliver the output data of each cell to the master CPU becomes a problem. In general, each MPU can
store its output data at an agreed address in the non-volatile random access memory of its cell and
let the master CPU read it by polling the MPUs one by one. However, this does not suit every
problem: in some problems only a few cells in the cellular array need to output data to the master
CPU, and polling every MPU one by one is then too inefficient for the master CPU.
Therefore, in the cellular array computing system provided by the embodiment of the present
invention, at least one dedicated output cell is further provided in the cellular array. The
dedicated output cell acts as an end-point cell that receives and stores the output data of other
cells destined for the master CPU, and notifies the master CPU with an interrupt signal to read the
output data.
In a specific implementation, a FIFO queue may also be set up in the non-volatile random access
memory of the dedicated output cell, and all output data from other cells to the master CPU is
stored in this FIFO queue. The FIFO queue should have enough storage space to hold all the output
data from the other cells to the master CPU.
In actual implementation, one or several cells in the cellular array can be selected as dedicated
output cells; generally, cells whose position makes communication with the master CPU more
convenient can be chosen. An interrupt line is provided between the dedicated output cell and the
master CPU, over which the dedicated output cell can send interrupt signals to the master CPU, for
example: new output data has arrived from another cell, the FIFO queue in the MRAM is full, or the
FIFO queue in the MRAM is empty.
Based on the above cellular array computing system provided with a dedicated output cell, an
embodiment of the present invention also provides a communication method in the cellular array
computing system, comprising: after the dedicated output cell receives and stores the output data
of other cells destined for the master CPU, it sends the master CPU an interrupt signal notifying
it to read; after receiving the interrupt signal notifying it to read, the master CPU reads the
output data from the dedicated output cell.
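The store-then-interrupt behavior of the dedicated output cell can be modeled with a queue and a
callback. In this Python sketch the class name, the `notify_master` callback and the
`"read_request"` label are all hypothetical stand-ins for the interrupt line and the FIFO held in
the cell's non-volatile memory.

```python
from collections import deque

class DedicatedOutputCell:
    def __init__(self, notify_master):
        self.fifo = deque()                  # models the FIFO queue kept in the cell's memory
        self.notify_master = notify_master   # models the interrupt line to the master CPU

    def receive(self, source_id, data):
        """Store output data from another cell, then interrupt the master CPU."""
        self.fifo.append((source_id, data))
        self.notify_master("read_request")

    def read_all(self):
        """Master CPU side: drain everything currently queued, oldest first."""
        items = list(self.fifo)
        self.fifo.clear()
        return items
```

Compared with polling every MPU, the master CPU here touches a single cell, and only when an
interrupt says there is something to read.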
In a specific implementation, the other cells can send the output data to the dedicated output cell
as follows: any one of the other cells, acting as the start-point cell, sends the output data in
the selected sending direction to an adjacent cell; when any cell in the cellular array receives
output data sent by an adjacent cell, if it judges that the ID of the end-point cell indicated in
the output data matches the ID of this cell, then, since the end-point cell ID indicated in the
output data is the ID of the dedicated output cell, this cell is the dedicated output cell and it
stores the output data in its non-volatile random access memory; otherwise this cell acts as a
relay cell and, after selecting a sending direction, relays the output data to a cell adjacent to
this cell.
While the other cells send the output data to the dedicated output cell, the start-point cell or
relay cell can select the sending direction as follows: if a straight-line communication path can
be formed between the start-point cell or relay cell and the dedicated output cell, the sending
direction is the direction along that straight line from the start-point cell or relay cell toward
the dedicated output cell; otherwise the sending direction is the direction from the start-point
cell or relay cell toward a candidate adjacent cell, a candidate adjacent cell being a cell, among
those adjacent to the start-point cell or relay cell, that is closer to the dedicated output cell.
For the implementation of the dedicated output cell of the embodiment of the present invention,
reference may also be made to Fig. 8. Fig. 8 shows the master CPU, the cellular array and the
cellular array bus; each small square in the cellular array represents one cell, and the cell at
point J (the cell drawn with a bold outline) is the dedicated output cell. Fig. 8 also shows the
structure of the dedicated output cell, as indicated by the dashed arrows: the MRAM in the
dedicated output cell holds a FIFO queue that stores all output data from other cells to the master
CPU.
Suppose the cell at point H and the cell at point I need to provide output data to the master CPU.
The output data can be sent to the cell at point J by means of inter-cell communication; for the
inter-cell communication paths from point H to point J and from point I to point J, see Fig. 8.
Since the inter-cell communication mode has been described in detail above, it is not repeated
here.
After the cell at point J receives the output data issued by the cell at point H or the cell at
point I, it can send the master CPU an interrupt signal notifying it to read. After the master CPU
receives this interrupt signal, it can read the output data from the cell at point J through the
cellular array bus.
By providing a dedicated output cell in the cellular array, using it as the end-point cell to
receive and store the output data of other cells destined for the master CPU, and notifying the
master CPU by interrupt signal to read the output data, the efficiency with which the master CPU
reads output data can be improved when only a few cells need to output data to the master CPU.
An example application of the above cellular array computing system is introduced below.
Speech recognition can be performed by comparing the input speech signal against a known speech
library; the comparison can be made in the time domain or in the frequency domain. As the number of
words to be compared grows, for example to tens of thousands once different accents are taken into
account, the computing power of only a few CPUs becomes insufficient for real-time speech
recognition.
The cellular array computing system provided by the embodiment of the present invention is well
suited to solving this kind of problem.
To this end, an embodiment of the present invention also provides a method of performing data
comparison using the above cellular array computing system, comprising: after selecting all the
cells in the cellular array or the cells in a target area, the master CPU broadcasts the comparison
program into the non-volatile random access memory of each selected cell; the master CPU writes the
sample that each selected cell is responsible for comparing to the agreed address of that cell; the
master CPU broadcasts an instruction to the microprocessors of the selected cells so that each
microprocessor, after completing initialization, waits for the data to be compared; the master CPU
broadcasts the data to be compared to the microprocessors of the selected cells; the microprocessor
of each selected cell runs the comparison program and compares the received data against the sample
this cell is responsible for and, if the two are found to match, sends the comparison result as
output data to the dedicated output cell, using the communication method in the cellular array
computing system described above, for the master CPU to read.
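The comparison flow above can be simulated sequentially to make the data movement concrete. In the
following Python sketch, `samples` stands for the per-cell samples written to the agreed addresses,
`query` for the broadcast data to be compared, and `matches` for the comparison program; all three
names are assumptions, and the loop only imitates what the cells would do in parallel.

```python
def broadcast_compare(samples, query, matches=lambda a, b: a == b):
    """Simulate one broadcast comparison.

    `samples` maps a cell ID to the sample stored at that cell's agreed address.
    Returns the results that matching cells would send to the dedicated output cell.
    """
    output_fifo = []
    for cell_id, sample in samples.items():   # each cell does this in parallel in hardware
        if matches(query, sample):
            # Only matching cells produce output data; the others stay silent.
            output_fifo.append((cell_id, sample))
    return output_fifo
```

Because only matching cells produce output, the amount of data the master CPU has to read stays
small even when the number of selected cells is large.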
In a specific implementation, the data to be compared may be speech data to be recognized, image
data to be recognized, or any other data that needs to be compared.
In actual implementation, each MPU continuously receives speech data and compares it. Typically,
among several hundred to several thousand cells, only one or a few find that the data to be
compared matches the sample the cell is responsible for. These cells send their comparison results
to the dedicated output cell, which notifies the master CPU with an interrupt signal to receive
them.
If the data to be compared is speech data, the comparison can be carried out in the time domain or
the frequency domain. In the latter case, the master CPU can first segment the data and apply a
Fast Fourier Transform (FFT), and then broadcast the speech data converted to the frequency domain
to the MPUs of the selected cells.
By performing data comparison with the above cellular array computing system provided with a
dedicated output cell, a large number of cells in the cellular array can run the comparison program
simultaneously. The system thus has very strong parallel processing capability, addresses the
communication bottleneck between CPU and memory in the prior art, and greatly improves real-time
speech and image recognition capability.
As mentioned above, there is a simple way to send a large amount of information from one cell to a
target area in the cellular array: the master CPU reads the information and then broadcasts it. The
present embodiment also provides another implementation, in which the point-to-point inter-cell
communication function is extended to group-sending to a region. This approach can support a higher
degree of parallelism and a much higher total bandwidth.
In the cellular array computing system provided by the present embodiment, any cell in the cellular
array can also act as the start-point cell and carry out group-send communication to all cells in a
target area. A cell that takes part in the group-send communication and is located inside the
target area acts as the start-point cell, as an end-point cell, or as both a relay cell and an
end-point cell; a cell that takes part in the group-send communication and is located outside the
target area acts as the start-point cell or as a relay cell.
In a specific implementation, the network controller connected to the microprocessor in each cell
is used not only for inter-cell communication between any two cells but also, in group-send
communication, to control the sending and receiving of issued data, relayed data and finally
received data; the network controller is also used to send interrupt signals to the microprocessor.
In actual implementation, the original sender of an inter-cell group-send communication (the cell
in the cellular array acting as the start-point cell) is responsible for indicating the target
area, and the group-sending of the data is still completed through a series of relays. Those
skilled in the art will understand that inter-cell group-send communication can also be regarded as
an effective superposition of multiple point-to-point inter-cell communications, so for its
specific implementation reference may be made to the implementation of communication between any
two cells; for example, the cells in the cellular array may likewise include the one or more groups
of FIFO queues connected to the network controller mentioned above, which is not repeated here.
On the basis that the above cellular array computing system supports inter-cell group-send
communication, an embodiment of the present invention also provides an inter-cell group-send
communication method in the above cellular array computing system, comprising: when any cell in the
cellular array acts as the start-point cell and initiates group-send communication to all cells in
a target area, if the start-point cell is located in the target area, it sends the inter-cell
group-send data to all adjacent cells located in the target area and updates the target area for
each adjacent cell; otherwise it sends the inter-cell group-send data to an adjacent cell in the
direction approaching the target area; if a cell located outside the target area receives
inter-cell group-send data sent by an adjacent cell, then, after judging that the target area
indicated in the data does not include this cell, this cell acts as a relay cell and relays the
data to an adjacent cell in the direction approaching the target area; if a cell located in the
target area receives the inter-cell group-send data sent by an adjacent cell, then, after judging
that the target area indicated in the data includes this cell, this cell acts as an end-point cell
and stores the received data in the non-volatile random access memory of this cell or notifies the
microprocessor of this cell to process the data, and, if cells adjacent to this cell still remain
in the target area, this cell also acts as a relay cell, relays the received data to all adjacent
cells located in the target area, and updates the target area for each adjacent cell. The updated
target area comprises one or more target areas obtained by dividing the target area before the
update; each adjacent cell, within the pre-update target area, of the cell issuing or relaying the
inter-cell group-send data is contained in a separate one of the updated target areas, and the cell
that has issued or relayed the inter-cell group-send data is excluded from the updated target
areas.
It should be noted that the master CPU can also broadcast data to a target area in the cellular array. To distinguish it from such "broadcast data of the master CPU", the data involved in intercellular mass-sending communication is referred to in this embodiment as "intercellular mass-sending data". The cell that initiates the intercellular mass-sending communication specifies the target area, and the IDs of all cells in the target area, or the range of those cell IDs, are indicated in the intercellular mass-sending data. Any cell that receives the data can then determine, from the target area indicated in it, whether the data is to be finally received by this cell, needs to be relayed further to other adjacent cells, or both.
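The decision described above can be sketched in a few lines. This is only an illustration under stated assumptions: the patent does not fix a data format, so the `MassSendData` class, the encoding of the target area as an inclusive rectangle of (row, column) cell IDs, and the `roles` function are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Set, Tuple

# Assumed encoding: (row_min, col_min, row_max, col_max), inclusive cell IDs.
Region = Tuple[int, int, int, int]
Cell = Tuple[int, int]


@dataclass(frozen=True)
class MassSendData:
    """Hypothetical container: the indicated target area plus the payload."""
    region: Region
    payload: bytes


def contains(region: Region, cell: Cell) -> bool:
    r0, c0, r1, c1 = region
    return r0 <= cell[0] <= r1 and c0 <= cell[1] <= c1


def roles(data: MassSendData, cell: Cell) -> Set[str]:
    """Return the role(s) the receiving cell plays for this data."""
    if not contains(data.region, cell):
        return {"transfer"}              # outside: relay toward the area
    r0, c0, r1, c1 = data.region
    if r0 == r1 and c0 == c1:
        return {"end point"}             # last remaining cell: receive only
    return {"end point", "transfer"}     # receive and keep relaying
```

For instance, `roles(MassSendData((1, 1, 3, 4), b"x"), (0, 0))` yields only the transfer role, because the cell lies outside the indicated area.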
In addition, updating the target area for each adjacent cell specifically means dividing the target area before the update into one or more target areas (the cell that has sent or relayed the intercellular mass-sending data is excluded from the updated target areas). Each updated target area contains one of the adjacent cells (that is, a cell in the target area before the update that is adjacent to the cell that sent or relayed the data), and each adjacent cell continues the intercellular mass-sending communication within its own updated target area. Correspondingly, the target area indicated in the intercellular mass-sending data is updated as well.
This embodiment is described using the communication mode between adjacent cells shown in Fig. 3 and, as an example, a rectangular target area determined by the starting point cell that initiates the mass-sending communication. It should be noted that the intercellular mass-sending communication mode given in this embodiment is a relatively convenient and efficient one for actual implementation; those skilled in the art will understand that, in other embodiments, the method is equally applicable to other communication modes between adjacent cells in the above cellular array computing system, or to target areas of other shapes.
In specific implementation, the manner of sending or relaying differs depending on the position of the cell acting as the starting point cell or transfer cell.
When a first cell acting as the starting point cell or a transfer cell is located at a corner of the rectangular target area, if either of the two adjacent sides of the rectangular target area that contain the first cell consists of only 1 cell, the updated target area is the rectangular area that remains along the other of the two adjacent sides after the first cell is excluded from the rectangular target area; otherwise, the updated target area comprises two rectangular target areas, one of which is the rectangular area formed by either of the two adjacent sides after the first cell is excluded. It should be noted that "first cell" in this embodiment is a general term for the class of cells located at a corner of the rectangular target area.
Referring to Fig. 9, assume that the cell at point K is the starting point cell that initiates the intercellular mass-sending communication, or a transfer cell responsible for relaying the intercellular mass-sending data, and that rectangular target area 101 is the target area determined before the cell at point K sends or relays the data. The cell at point K is then inside rectangular target area 101 and located at one of its corners. Since the horizontal side of rectangular target area 101 contains only 1 cell, the cell at point K has only one neighbor available as the next relay hop, so the network controller of this cell sends the intercellular mass-sending data to the cell at point L and updates rectangular target area 101. The target area formed after the update is rectangular target area 102, which is equivalent to rectangular target area 101 with the cell at point K excluded. As the target area is updated repeatedly, relaying stops once only the last cell remains in the target area.
Assume that the cell at point M is likewise the starting point cell that initiates the intercellular mass-sending communication, or a transfer cell responsible for relaying the intercellular mass-sending data, and that rectangular target area 103 is the target area determined before the cell at point M sends or relays the data. The cell at point M is then inside rectangular target area 103 and located at one of its corners. Since both adjacent sides of rectangular target area 103 contain more than 1 cell, the cell at point M has two neighbors available as the next relay hop, so the network controller of this cell sends the intercellular mass-sending data to the cell at point N and the cell at point O and updates rectangular target area 103. The updated target area comprises two rectangular target areas, rectangular target area 104 and rectangular target area 105, which is equivalent to excluding the cell at point M from rectangular target area 103. Rectangular target areas 104 and 105 can each serve as an independent target area in which data relaying continues in a manner similar to that described above. As the target areas are updated repeatedly, relaying stops once only the last cell remains in a target area.
When a second cell acting as the starting point cell or a transfer cell is located on a side of the rectangular target area, if the side of the rectangular target area adjacent to the side on which the second cell lies consists of only 1 cell, the updated target area comprises the two rectangular target areas formed by the side on which the second cell lies after the second cell is excluded; otherwise, the updated target area comprises three rectangular target areas, two of which are the two rectangular areas formed by the side on which the second cell lies after the second cell is excluded. It should be noted that "second cell" in this embodiment is a general term for the class of cells located on a side of the rectangular target area.
Referring to Fig. 10, assume that the cell at point P is the starting point cell that initiates the intercellular mass-sending communication, or a transfer cell responsible for relaying the intercellular mass-sending data, and that rectangular target area 111 is the target area determined before the cell at point P sends or relays the data. The cell at point P is then inside rectangular target area 111 and located on one of its sides. Since the side of rectangular target area 111 adjacent to the side on which the cell at point P lies contains more than 1 cell, the cell at point P has three neighbors available as the next relay hop, so the network controller of this cell sends the intercellular mass-sending data to the cell at point Q, the cell at point R and the cell at point S, and updates rectangular target area 111. The updated target area comprises three rectangular target areas, namely rectangular target areas 112, 113 and 114, which is equivalent to excluding the cell at point P from rectangular target area 111. Rectangular target areas 112 and 113 are the two rectangular areas formed by the side on which the cell at point P lies after that cell is excluded. Rectangular target areas 112, 113 and 114 can each serve as an independent target area in which data relaying continues in a manner similar to that described above. As the target areas are updated repeatedly, relaying stops once only the last cell remains in a target area.
It will be understood that if, in a target area (not shown in Fig. 10), the side adjacent to the side on which the cell at point P lies contains only 1 cell, the cell at point P has two neighbors available as the next relay hop. The network controller of this cell then sends the intercellular mass-sending data to the cell at point Q and the cell at point R and updates the target area; the updated target area comprises two rectangular target areas, specifically rectangular target area 112 and rectangular target area 113.
When a third cell acting as the starting point cell is located inside the rectangular target area, the updated target area comprises four rectangular target areas: two are the two rectangular areas formed by the row or column containing the third cell after the third cell is excluded, and the other two are the two rectangular areas formed when the rectangular target area before the update is divided by the row or column containing the third cell. It should be noted that "third cell" in this embodiment is a general term for the class of cells located inside the rectangular target area, where the inside of the rectangular target area refers to the region other than its corners and sides.
Referring to Fig. 11, assume that the cell at point T is the starting point cell that initiates the intercellular mass-sending communication (in this embodiment the cell at point T cannot be a transfer cell responsible for relaying intercellular mass-sending data), and that rectangular target area 121 is the target area determined before the cell at point T sends the data. The cell at point T is then located inside rectangular target area 121 and has four neighbors available as the next relay hop, so the network controller of this cell sends the intercellular mass-sending data to the cell at point U, the cell at point V, the cell at point W and the cell at point X, and updates rectangular target area 121. The updated target area comprises four rectangular target areas, namely rectangular target areas 122, 123, 124 and 125, which is equivalent to excluding the cell at point T from rectangular target area 121. Rectangular target areas 122 and 123 are the two rectangular areas formed by the row containing the cell at point T after that cell is excluded, and rectangular target areas 124 and 125 are the two rectangular areas formed when rectangular target area 121 is divided by the row containing the cell at point T. Rectangular target areas 122, 123, 124 and 125 can each serve as an independent target area in which data relaying continues in a manner similar to that described above. As the target areas are updated repeatedly, relaying stops once only the last cell remains in a target area.
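The corner, side and interior cases above can be viewed as one splitting rule applied to the row containing the sending cell. The following is a minimal sketch of that reading, not the patented implementation: the inclusive `(row_min, col_min, row_max, col_max)` encoding and the names `split_region` and `mass_send` are assumptions introduced for illustration, and the sketch always splits along the row (the text equally allows the column).

```python
from typing import Dict, List, Tuple

Region = Tuple[int, int, int, int]  # (row_min, col_min, row_max, col_max), inclusive
Cell = Tuple[int, int]              # (row, col)


def split_region(region: Region, cell: Cell) -> List[Tuple[Cell, Region]]:
    """Divide `region` around `cell`, which must lie inside it.

    Returns (adjacent cell, updated target area) pairs. `cell` itself is
    excluded, and each updated area contains exactly one of its neighbors.
    """
    r0, c0, r1, c1 = region
    r, c = cell
    assert r0 <= r <= r1 and c0 <= c <= c1
    parts = []
    if c > c0:   # remainder of the cell's row to the left
        parts.append(((r, c - 1), (r, c0, r, c - 1)))
    if c < c1:   # remainder of the cell's row to the right
        parts.append(((r, c + 1), (r, c + 1, r, c1)))
    if r > r0:   # block above the cell's row
        parts.append(((r - 1, c), (r0, c0, r - 1, c1)))
    if r < r1:   # block below the cell's row
        parts.append(((r + 1, c), (r + 1, c0, r1, c1)))
    return parts


def mass_send(region: Region, start: Cell) -> Dict[Cell, int]:
    """Simulate relaying from `start`; count how often each cell receives."""
    received: Dict[Cell, int] = {}
    pending = [(start, region)]
    while pending:
        cell, area = pending.pop()
        received[cell] = received.get(cell, 0) + 1
        # An area holding only this cell yields no parts, so relaying stops.
        pending.extend(split_region(area, cell))
    return received
```

Under these assumptions a corner cell yields one or two updated areas, a side cell two or three, and an interior cell four, and every cell in the original rectangle receives the data exactly once.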
In this embodiment, when a fourth cell acting as the starting point cell or a transfer cell is located outside the target area, if a straight-line communication path can be formed between the fourth cell and any cell in the target area, the direction in which the fourth cell sends or relays the intercellular mass-sending data is along that straight line from the fourth cell toward the target area; otherwise, the sending direction is from the fourth cell toward a candidate adjacent cell, a candidate adjacent cell being a cell, among those adjacent to the fourth cell, that is closer to the target area. It should be noted that "fourth cell" in this embodiment is a general term for the class of cells located outside the rectangular target area.
Referring to Fig. 12, assume that the cell at point Y1 is the starting point cell that initiates the intercellular mass-sending communication, and that rectangular target area 131 is the target area determined before the cell at point Y1 sends the data. The cell at point Y1 is then outside rectangular target area 131. Since this cell lies between the extension lines of two opposite sides of the rectangular target area, a straight-line communication path can be formed between it and the cell at point Y3 in the rectangular target area, and only one neighbor is available as the next relay hop. The network controller of the cell at point Y1 sends the intercellular mass-sending data to this neighbor, namely the cell at point Y2, which acts as a transfer cell responsible for relaying the data. The cell at point Y2 relays the data in the direction shown by the dotted arrow in Fig. 12 until it reaches the cell at point Y3. The cell at point Y3 is located on a side of rectangular target area 131 and can complete the relaying within rectangular target area 131 according to the method described above.
Still referring to Fig. 12, assume that the cell at point Z1 is the starting point cell that initiates the intercellular mass-sending communication, and that rectangular target area 131 is the target area determined before the cell at point Z1 sends the data. The cell at point Z1 is then outside rectangular target area 131. Since this cell does not lie between the extension lines of two opposite sides of the rectangular target area, no straight-line communication path can be formed between it and any cell in the rectangular target area, and two neighbors are available as the next relay hop: the cell at point Z2 and the cell at point Z3. These two cells are the candidate adjacent cells of the cell at point Z1, because among the cells adjacent to the cell at point Z1 they are the ones closer to rectangular target area 131. In actual implementation, one of them may be selected arbitrarily, or the cell with the lighter load may be selected as the next relay hop according to actual communication conditions; the cell with the lighter load is specifically the cell with fewer pending communication tasks for outputting data. From the cell at point Z1, the intercellular mass-sending data is relayed along either of the two feasible communication paths until it reaches the cell at point Z4. The cell at point Z4 is located at a corner of rectangular target area 131 and can complete the relaying within rectangular target area 131 according to the method described above.
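The approach phase for a cell outside the target area can be sketched as follows. This is an illustrative reading only: `next_hops` and `route` are hypothetical names, the target area is assumed to be an inclusive rectangle of (row, column) cell IDs, and the optional `load` table (pending output tasks per cell) stands in for whatever load information a real network controller would have.

```python
from typing import Dict, List, Optional, Tuple

Region = Tuple[int, int, int, int]  # (row_min, col_min, row_max, col_max), inclusive
Cell = Tuple[int, int]


def next_hops(region: Region, cell: Cell) -> List[Cell]:
    """Candidate adjacent cells that bring `cell` closer to `region`.

    One candidate means a straight-line path exists (the Y1 case); two
    candidates correspond to the Z1 case; none means `cell` is inside.
    """
    r0, c0, r1, c1 = region
    r, c = cell
    dr = (r < r0) - (r > r1)   # +1: move down, -1: move up, 0: row aligned
    dc = (c < c0) - (c > c1)   # +1: move right, -1: move left, 0: column aligned
    hops = []
    if dr:
        hops.append((r + dr, c))
    if dc:
        hops.append((r, c + dc))
    return hops


def route(region: Region, start: Cell,
          load: Optional[Dict[Cell, int]] = None) -> List[Cell]:
    """Relay from `start` until the first cell inside `region` is reached.

    When two candidates exist, the one with the smaller assumed load wins;
    ties go to the first candidate.
    """
    load = load or {}
    path = [start]
    cell = start
    while True:
        hops = next_hops(region, cell)
        if not hops:
            return path
        cell = min(hops, key=lambda h: load.get(h, 0))
        path.append(cell)
```

The last entry of the returned path lies on a side or at a corner of the target area, from which relaying inside the area proceeds as described earlier.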
The intercellular mass-sending communication method in the cellular array computing system provided by this embodiment extends the intercellular point-to-point communication function to mass-sending over a region. It can support a higher degree of parallelism and achieve a much higher total bandwidth, thereby further improving the overall performance of the computing system.
As mentioned above, a cell in the cellular array of the embodiments of the present invention combines three functions: memory, storage and computation. The non-volatile random access memory in the cell both provides random access to the data involved in the microprocessor's computation and stores software instruction code and data that must be kept persistently. However, the cost of non-volatile random access memory is usually high, so the portion of it in a cell that serves as memory is limited. When the microprocessor in a cell has a large amount of data to process, the limited memory space reduces its processing efficiency, and how to extend the memory space of a cell becomes a problem to be solved.
Based on the above consideration, an embodiment of the present invention provides another structure of the cellular array computing system. As shown in Fig. 13, in addition to the master CPU, cellular array and cellular array bus described above, the cellular array computing system may further include at least one memory unit array. A memory unit array is a two-dimensional array composed of more than one memory unit. The cellular array and all memory unit arrays are stacked to form a three-dimensional structure, and the memory units in each memory unit array are connected one-to-one with the cells in the cellular array. A memory unit works together with the non-volatile random access memory, the two jointly providing random access to the data involved in the microprocessor's computation.
In actual implementation, the non-volatile random access memory in a cell may be MRAM, and the memory unit array may be an MRAM, DRAM or SRAM die. Generally, one or more lower-cost DRAM dies may be chosen, where each DRAM die is a memory unit array formed of memory units whose positions coincide with those of the cells in the cellular array. All DRAM dies are then combined with the cellular array die in a 3D stack, and a communication connection between any memory unit and its corresponding cell in the cellular array can be established through through-silicon vias (TSV, Through Silicon Vias), thereby extending the memory of each cell.
In this embodiment of the present invention, at least one memory unit array composed of more than one memory unit is stacked with the cellular array to form a three-dimensional structure, and the memory units in each memory unit array are connected one-to-one with the cells in the cellular array, the memory units being used for random access to the data involved in the microprocessor's computation. In this way the memory space of each cell in the cellular array can be extended at lower cost, improving the processing efficiency of the microprocessor in the cell.
It should be pointed out that Fig. 13 shows only the case in which one memory unit array is stacked with the cellular array to form a three-dimensional structure; those skilled in the art will equally understand the case in which more than one memory unit array is stacked with the cellular array to form a three-dimensional structure.
Those skilled in the art will also understand that the communication methods described above are equally applicable to a cellular array computing system that includes the memory unit array. These methods include: the master CPU communicating with each cell in the cellular array via the cellular array bus; any two cells communicating without relying on the master CPU; any cell carrying out mass-sending communication to all cells in a target area; and dedicated output cells arranged in the cellular array acting as end point cells that receive and store other cells' output data destined for the master CPU, for the master CPU to read.
It should be noted that, because the memory space of each cell in the cellular array has been extended, the master CPU can access not only the non-volatile random access memory of a cell but also the memory unit corresponding to that cell (when more than one memory unit array is stacked with the cellular array into the three-dimensional structure, there is correspondingly more than one memory unit for that cell). The communication carried out by the master CPU with each cell in the cellular array via the cellular array bus therefore includes at least one of the following: reading and writing, by address, the non-volatile random access memory or corresponding memory unit of any cell in the cellular array; broadcasting data to the non-volatile random access memory or corresponding memory unit of each cell in a target area, writing it to the same relative address in the non-volatile random access memory or corresponding memory unit of each of those cells; sending an instruction or data to, or reading the state of, the microprocessor of any cell in the cellular array; and broadcasting an instruction to the microprocessors of all cells in a target area.
When a cell in the cellular array further includes a bus controller and a cell internal bus, the cell internal bus connects not only the microprocessor and the non-volatile random access memory but also the memory unit corresponding to this cell. The bus controller is connected to the cellular array bus, the microprocessor and the cell internal bus. It identifies communication between the master CPU and this cell, and either connects the microprocessor, to pass on instructions or data sent by the master CPU or to allow its state to be read, or connects, via the cell internal bus, the non-volatile random access memory or the memory unit corresponding to this cell, to carry out data read and write operations.
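The by-address access and region broadcast described above can be modeled with a small software simulation. This is a sketch under loudly stated assumptions, not the hardware design: the flat per-cell address space (non-volatile RAM first, then the stacked memory unit), the bank sizes, and the class and method names `Cell` and `CellArrayBus` are all hypothetical.

```python
from typing import Dict, Tuple

Region = Tuple[int, int, int, int]  # (row_min, col_min, row_max, col_max), inclusive
CellId = Tuple[int, int]


class Cell:
    """One cell: non-volatile RAM plus one stacked memory unit (assumed sizes)."""

    def __init__(self, nvram_size: int = 16, mem_unit_size: int = 16):
        self.nvram = bytearray(nvram_size)
        self.mem_unit = bytearray(mem_unit_size)

    def _select(self, addr: int):
        # Plays the bus controller's role: pick the bank a relative address hits.
        # Assumed layout: NVRAM occupies the low addresses, the memory unit follows.
        if addr < len(self.nvram):
            return self.nvram, addr
        offset = addr - len(self.nvram)
        if offset < len(self.mem_unit):
            return self.mem_unit, offset
        raise IndexError("relative address outside this cell")

    def write(self, addr: int, data: bytes) -> None:
        for i, byte in enumerate(data):
            bank, offset = self._select(addr + i)
            bank[offset] = byte

    def read(self, addr: int, length: int) -> bytes:
        out = bytearray()
        for i in range(length):
            bank, offset = self._select(addr + i)
            out.append(bank[offset])
        return bytes(out)


class CellArrayBus:
    """Master-CPU side view of the array: addressed access and region broadcast."""

    def __init__(self, rows: int, cols: int):
        self.cells: Dict[CellId, Cell] = {
            (r, c): Cell() for r in range(rows) for c in range(cols)
        }

    def write(self, cell_id: CellId, addr: int, data: bytes) -> None:
        self.cells[cell_id].write(addr, data)

    def read(self, cell_id: CellId, addr: int, length: int) -> bytes:
        return self.cells[cell_id].read(addr, length)

    def broadcast(self, region: Region, addr: int, data: bytes) -> None:
        # Same relative address in every cell of the target area.
        r0, c0, r1, c1 = region
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                self.cells[(r, c)].write(addr, data)
```

With the default sizes, a broadcast of four bytes at relative address 14 lands partly in each target cell's non-volatile RAM and partly in its memory unit, while cells outside the target area are untouched.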
As mentioned above, a cell in the cellular array of the embodiments of the present invention combines three functions: memory, storage and computation. The non-volatile random access memory in the cell both provides random access to the data involved in the microprocessor's computation and stores software instruction code and data that must be kept persistently. However, the cost of non-volatile random access memory is usually high, so the portion of it in a cell that serves as storage is likewise limited. When a large number of files or a large amount of data needs to be stored in the cells of the cellular array, the limited storage space cannot meet the storage requirement and may even affect the processing efficiency of the microprocessor. How to extend the storage space of a cell is therefore likewise a problem to be solved.
Based on the above consideration, an embodiment of the present invention provides yet another structure of the cellular array computing system. As shown in Fig. 14, in addition to the master CPU, cellular array and cellular array bus described above, the cellular array computing system may further include at least one storage unit array. A storage unit array is a two-dimensional array composed of more than one storage unit. The cellular array and all storage unit arrays are stacked to form a three-dimensional structure, and the storage units in each storage unit array are connected one-to-one with the cells in the cellular array. A storage unit works together with the non-volatile random access memory, the two jointly storing software instruction code and data that must be kept persistently.
In actual implementation, the non-volatile random access memory in a cell may be MRAM, and the storage unit array may specifically be a flash memory die. Generally, one or more NAND flash dies, which cost less than MRAM, may be chosen, where each NAND flash die is a storage unit array formed of storage units whose positions coincide with those of the cells in the cellular array. All NAND flash dies are then combined with one cellular array die in a 3D stack, and a communication connection between any storage unit and its corresponding cell in the cellular array can be established vertically through TSVs, thereby extending the storage space of each cell.
In specific implementation, a cell in the cellular array further includes a storage controller connected to the microprocessor, used for data storage access control over the storage unit connected to this cell. After one or more NAND flash dies are combined with one cellular array die in a 3D stack, a NAND flash controller can be configured for each cell in the cellular array, and the MPU of a cell can then read and write the storage unit corresponding to this cell through the cell's NAND flash controller. When a large number of files or a large amount of data is stored in the NAND flash, a data search can be carried out by each cell through its own NAND channel and is thereby greatly accelerated. Likewise, compiling a large software system requires compiling thousands of source code files; when these source files are stored in such a cellular array computing system, compilation is similarly greatly accelerated.
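The per-cell search can be imitated in software by letting each simulated cell scan only its own storage shard. This sketch merely illustrates the partitioning idea: threads stand in for the cells' microprocessors, and the names `search_cell` and `parallel_search`, as well as the dictionary layout of the storage, are assumptions introduced here rather than anything specified in the patent.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

CellId = Tuple[int, int]
Shard = Dict[str, str]   # file name -> file contents held in one cell's storage unit


def search_cell(cell_id: CellId, shard: Shard, keyword: str) -> List[Tuple[CellId, str]]:
    """Scan one cell's own storage unit; no other cell's data is touched."""
    return [(cell_id, name) for name, text in shard.items() if keyword in text]


def parallel_search(storage: Dict[CellId, Shard], keyword: str) -> List[Tuple[CellId, str]]:
    """Run every cell's scan concurrently and merge the hits."""
    with ThreadPoolExecutor(max_workers=max(1, len(storage))) as pool:
        parts = pool.map(
            lambda item: search_cell(item[0], item[1], keyword),
            storage.items(),
        )
        return sorted(hit for part in parts for hit in part)
```

Because each scan reads only its own shard, the scans share no state, which is the property that lets the real system search all NAND channels at once.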
The cellular array computing system including the storage unit array provided by this embodiment of the present invention can extend the storage space of each cell in the cellular array at lower cost and improve the data storage capacity of each cell.
It should be pointed out that Fig. 14 shows only the case in which one storage unit array is stacked with the cellular array to form a three-dimensional structure; those skilled in the art will equally understand the case in which more than one storage unit array is stacked with the cellular array to form a three-dimensional structure.
Those skilled in the art will appreciate that the communication methods described above are equally applicable to a cellular array computing system that includes the storage unit array. These methods include: the master CPU communicating with each cell in the cellular array via the cellular array bus; any two cells communicating without relying on the master CPU; any cell carrying out mass-sending communication to all cells in a target area; and dedicated output cells arranged in the cellular array acting as end point cells that receive and store other cells' output data destined for the master CPU, for the master CPU to read.
As described above, the portions of the non-volatile random access memory in a cell that serve as memory and as storage are both very limited, so how to extend the memory and the storage space of a cell at the same time is a problem to be solved. Based on this consideration, an embodiment of the present invention provides a further structure of the cellular array computing system. As shown in Fig. 15, in addition to the master CPU, cellular array and cellular array bus, the cellular array computing system may further include at least one memory unit array as described above and at least one storage unit array as described above. For a cellular array computing system that includes both the memory unit array and the storage unit array, reference may be made to the specific implementations of the systems described above that include only a memory unit array or only a storage unit array; details are not repeated here.
The cellular array computing system provided by this embodiment of the present invention, which includes both the memory unit array and the storage unit array, can extend both the storage space and the memory space of each cell in the cellular array at lower cost, improving the data storage capacity of each cell and the processing efficiency of the microprocessor in the cell, and thereby further improving the overall performance of the computing system.
As stated in the background, neural network computation involves a very large amount of calculation, but the existing computer architecture on which its implementation relies has a communication bottleneck between the CPU and memory and storage. This bottleneck degrades overall computer performance and therefore hinders efficient implementation of neural network computation.
To this end, based on the above cellular array computing system provided by the embodiments of the present invention, an embodiment of the present invention further provides a method for implementing neural network computation using the above cellular array computing system, comprising: storing the code of one or more neuron functions in each cell of the cellular array that participates in the neural network computation; the master CPU selecting one or more cells to execute the code of the neuron functions stored in each of them and to output the execution results to one or more target cells; and any cell participating in the neural network computation receiving, as the input data of this cell, the execution results of neuron function code output by other cells, executing the neuron function code stored in this cell on the basis of all input data from other cells, and outputting the execution result to the cell holding the neuron function that needs the result, or storing it at a preset address to wait for the master CPU to read it, or outputting it to the master CPU.
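The data flow of this method can be illustrated with a small event-driven simulation. It is a sketch under assumptions rather than the claimed implementation: the class `NeuronCell`, the function `run`, the rule that a cell fires once all of its declared sources have delivered a value, and the use of a `result` attribute in place of the "preset address" are all hypothetical choices made here.

```python
from collections import deque
from typing import Callable, Dict, List, Optional


class NeuronCell:
    """A cell holding one neuron function and the routing of its result."""

    def __init__(self, func: Callable[[List[float]], float],
                 sources: List[str], targets: List[str]):
        self.func = func
        self.sources = list(sources)    # cells (or "master") this cell waits for
        self.targets = list(targets)    # cells that need this cell's result
        self.inbox: Dict[str, float] = {}
        self.result: Optional[float] = None   # stands in for the preset address

    def receive(self, source: str, value: float) -> Optional[float]:
        """Store one input; execute the neuron function once all have arrived."""
        self.inbox[source] = value
        if len(self.inbox) < len(self.sources):
            return None
        self.result = self.func([self.inbox[s] for s in self.sources])
        return self.result


def run(cells: Dict[str, NeuronCell], start_inputs: Dict[str, float]) -> Dict[str, float]:
    """The master starts the selected cells; results then flow cell to cell."""
    queue = deque(("master", name, value) for name, value in start_inputs.items())
    while queue:
        source, dest, value = queue.popleft()
        out = cells[dest].receive(source, value)
        if out is not None:
            for target in cells[dest].targets:
                queue.append((dest, target, out))
    # Cells without targets keep their result for the master CPU to read.
    return {name: cell.result for name, cell in cells.items() if not cell.targets}
```

In this model the master only injects the initial values and reads the final ones; every intermediate result travels directly between cells, which is the point of the method.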
For a better understanding of the method provided by the embodiments of the present invention for implementing neural network computation using the above cellular array computing system, the principle of neural network computation is first briefly described.
A neural network is a common algorithm in machine learning that imitates the working principle of the human brain. The human brain consists of a large number of neurons. Each neuron receives input through contact with other neurons via a large number (thousands) of synapses on its dendrites; the coupling strength of a synapse has a memory function; and the neuron outputs an excitatory or inhibitory signal, which can be sent through the axon to a large number of distant neurons.
In the neural network computation method modeled on the working principle of the human brain, a neuron is simply a function. As shown in Figure 16, it has many inputs (for example, x1, x2 and x3 are three of them), and each input corresponds to a weight. The usual calculation multiplies each input by its weight and then adds the products. The neuron outputs 0 or 1 (determined by a threshold) or a value between 0 and 1. It has many internal parameters (such as weight parameters and threshold parameters), and the process of adjusting these parameters is the process of machine learning.
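The neuron function described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, treating a sum equal to the threshold as firing is an assumption, and the logistic function is just one possible way to produce a value between 0 and 1 (the text does not name a specific function).

```python
import math
from typing import Sequence


def weighted_sum(inputs: Sequence[float], weights: Sequence[float]) -> float:
    """Multiply each input by its weight and add the products."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one weight")
    return sum(x * w for x, w in zip(inputs, weights))


def neuron_binary(inputs: Sequence[float], weights: Sequence[float],
                  threshold: float) -> int:
    """Output 1 when the weighted sum reaches the threshold, else 0.

    Counting 'equal to the threshold' as firing is an assumption.
    """
    return 1 if weighted_sum(inputs, weights) >= threshold else 0


def neuron_continuous(inputs: Sequence[float], weights: Sequence[float],
                      threshold: float) -> float:
    """Output a value between 0 and 1 (logistic function, an assumption)."""
    return 1.0 / (1.0 + math.exp(-(weighted_sum(inputs, weights) - threshold)))
```

With inputs `[1, 0, 1]`, weights of 0.5 each and a threshold of 1.0, the weighted sum is exactly 1.0, so the binary variant fires and the continuous variant sits at its midpoint.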
A typical neural network is a network in which the outputs and inputs of a large number of neurons are connected together, usually organized into a multi-layer architecture. As shown in Figure 17, the neural network in the figure is organized into three layers: the output of each first-layer neuron is sent to every second-layer neuron, and the output of each second-layer neuron is sent to every third-layer neuron. In Figure 17 the first layer may be called the input layer, the second layer the hidden layer, and the third layer the output layer; the outputs 0, 1, 2, ..., 8, 9 of the output layer are the final result of the neural network computation. Neural networks usually have such a layered structure. In practice, this could be a neural network used to recognize handwritten digits.
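The layer-to-layer wiring described above can be illustrated with a short forward pass. This is a sketch, not the patented implementation: the input layer is assumed to pass the raw input values through, each later neuron is assumed to use a logistic output, and `recognized_digit` is a hypothetical readout that picks the strongest of the output neurons.

```python
import math
from typing import List, Sequence, Tuple

# One layer = (weights, thresholds); weights[j] holds the input weights of neuron j.
Layer = Tuple[List[List[float]], List[float]]


def layer_forward(inputs: Sequence[float], layer: Layer) -> List[float]:
    """Every neuron of the layer receives the output of every neuron before it."""
    weights, thresholds = layer
    outputs = []
    for row, threshold in zip(weights, thresholds):
        total = sum(x * w for x, w in zip(inputs, row))
        outputs.append(1.0 / (1.0 + math.exp(-(total - threshold))))
    return outputs


def network_forward(inputs: Sequence[float], layers: Sequence[Layer]) -> List[float]:
    """Feed the outputs of each layer into the next one."""
    activations = list(inputs)
    for layer in layers:
        activations = layer_forward(activations, layer)
    return activations


def recognized_digit(outputs: Sequence[float]) -> int:
    """Index of the output neuron with the strongest response (hypothetical)."""
    return max(range(len(outputs)), key=lambda i: outputs[i])
```

For a digit recognizer in the style of Figure 17, the last layer would hold ten neurons, one per digit 0 to 9.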
The learning and training process of a neural network is a massive computation. A network is in effect a function with a large number of parameters (possibly tens of thousands). Training requires a large number of cases, each with input data and a correct answer, which are used to adjust these parameters to achieve the goal of learning (usually by the steepest descent method). For example, to train a neural network to recognize handwritten digits, tens of thousands of pictures are needed, and the parameters are adjusted so that the neural network gives the correct result for each picture; handwriting recognition thereafter has a very high success rate.
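As a rough illustration of parameter adjustment by steepest descent, the sketch below trains a single neuron on four cases of a logical OR. Everything specific here is an assumption made for the example: the logistic output, the cross-entropy loss, the learning rate, the number of passes, and the toy data set. A real network would have far more parameters and cases.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]  # (input data, correct answer)


def predict(inputs: Sequence[float], weights: Sequence[float],
            threshold: float) -> float:
    """Logistic neuron output for the given parameters."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1.0 / (1.0 + math.exp(-(total - threshold)))


def train(samples: Sequence[Sample], epochs: int = 2000,
          rate: float = 0.5) -> Tuple[List[float], float]:
    """Adjust weights and threshold by gradient descent on cross-entropy loss."""
    weights = [0.0] * len(samples[0][0])
    threshold = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_t = 0.0
        for inputs, answer in samples:
            error = predict(inputs, weights, threshold) - answer
            for i, x in enumerate(inputs):
                grad_w[i] += error * x
            grad_t -= error  # the threshold enters the sum with a minus sign
        # Step against the gradient (steepest descent).
        weights = [w - rate * g for w, g in zip(weights, grad_w)]
        threshold -= rate * grad_t
    return weights, threshold
```

Each pass evaluates the neuron once per case, which is why training involves so many evaluations of the network.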
The present inventors consider neural network computation to be highly suitable for implementation in the cell array computing system provided by the embodiment of the present invention. Because the neural network must be evaluated many times during training, the computation can be accelerated by parallel computation on a large number of MPUs.
In practice, if the number of neurons in the neural network is less than the number of cells in the cell array computing system, one cell in the cell array can perform the computation of one neuron of the neural network; otherwise, the computation of more than one neuron of the neural network can be combined and performed in one cell.
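One possible way to place neurons on cells is sketched below. The round-robin policy and the function name `assign_neurons` are assumptions for illustration; the text only requires that a cell take one neuron when there are enough cells and more than one otherwise.

```python
from typing import Dict, List, Tuple

Position = Tuple[int, int]  # (row, column) of a cell in the two-dimensional array


def assign_neurons(num_neurons: int, rows: int, cols: int) -> Dict[Position, List[int]]:
    """Map neuron indices to cells, wrapping around when neurons outnumber cells."""
    num_cells = rows * cols
    placement: Dict[Position, List[int]] = {}
    for neuron in range(num_neurons):
        cell_index = neuron % num_cells          # wrap around the array
        cell = (cell_index // cols, cell_index % cols)
        placement.setdefault(cell, []).append(neuron)
    return placement
```

With 3 neurons on a 2 x 2 array each used cell holds one neuron; with 5 neurons, cell (0, 0) holds neurons 0 and 4.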
In a specific implementation, each cell stores the code of one or more neuron functions. The master CPU can deliver the neuron function code to each cell acting as a neuron by data broadcast, and start the neural network computation by broadcasting an instruction to a target area. For example, the master CPU may select all cells in the cell array, or the cells in a target area, as the cells participating in the neural network computation, broadcast the code of the same neuron function to the same relative address segment of each selected cell, and broadcast an instruction that causes the microprocessors of one or more selected cells to execute the neuron function code starting from that relative address. For the specific implementation, reference may also be made to the description of the cell array computing system above.
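The broadcast sequence above (select cells, write the same code to the same relative address, then broadcast a start instruction) can be modeled in software as follows. This is a toy simulation under explicit assumptions: the `Cell` and `MasterCPU` classes and their methods are hypothetical names, a Python callable stands in for machine code, and a target area is assumed to be a rectangle given by inclusive corner coordinates.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Position = Tuple[int, int]
Region = Tuple[int, int, int, int]  # (first_row, first_col, last_row, last_col), inclusive
Code = Callable[[Sequence[float]], float]


class Cell:
    """A cell with its own memory; a Python callable stands in for stored code."""

    def __init__(self, position: Position) -> None:
        self.position = position
        self.memory: Dict[int, Code] = {}

    def execute(self, address: int, inputs: Sequence[float]) -> float:
        return self.memory[address](inputs)


class MasterCPU:
    def __init__(self, rows: int, cols: int) -> None:
        self.cells = {(r, c): Cell((r, c)) for r in range(rows) for c in range(cols)}

    def select(self, region: Optional[Region] = None) -> List[Cell]:
        """Select every cell, or only the cells inside a rectangular target area."""
        if region is None:
            return list(self.cells.values())
        r0, c0, r1, c1 = region
        return [cell for (r, c), cell in self.cells.items()
                if r0 <= r <= r1 and c0 <= c <= c1]

    def broadcast_code(self, targets: List[Cell], address: int, code: Code) -> None:
        """Write the same code to the same relative address of each target."""
        for cell in targets:
            cell.memory[address] = code

    def broadcast_start(self, targets: List[Cell], address: int,
                        inputs: Sequence[float]) -> Dict[Position, float]:
        """Tell each target to run the code stored at the relative address."""
        return {cell.position: cell.execute(address, inputs) for cell in targets}
```

In the simulation the cells run one after another; in the hardware described here each cell has its own microprocessor, so the selected cells would run in parallel.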
In the embodiment of the present invention, each input data item from other cells corresponds to a weight parameter. Executing the code of the neuron function stored in this cell based on all input data from other cells may include: multiplying each input data item from other cells by its corresponding weight parameter, summing all the products, comparing the sum with a threshold parameter, and taking the output value so determined as the execution result. In practice, the weight parameters and threshold parameters are pre-stored in the non-volatile random access memory of the cell.
In practice, if the execution result output after executing the neuron function code is a continuous value, the output can also be accelerated by table lookup.
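A table lookup of this kind might look like the sketch below, which precomputes a logistic curve once and then answers each query with a single indexed read. The choice of the logistic function, the input range of -8 to 8, the table size of 1024 and nearest-entry rounding are all assumptions made for the example.

```python
import math

X_MIN, X_MAX, SIZE = -8.0, 8.0, 1024        # assumed range and resolution
STEP = (X_MAX - X_MIN) / (SIZE - 1)

# Precompute the continuous output once; in a cell this table could sit in memory.
TABLE = [1.0 / (1.0 + math.exp(-(X_MIN + i * STEP))) for i in range(SIZE)]


def sigmoid_lookup(x: float) -> float:
    """Return the nearest precomputed value instead of evaluating exp()."""
    if x <= X_MIN:
        return TABLE[0]
    if x >= X_MAX:
        return TABLE[-1]
    return TABLE[round((x - X_MIN) / STEP)]
```

The trade-off is memory for speed: a larger table gives a smaller rounding error at the cost of more storage per cell.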
In a specific implementation, because the cell array computing system provided by the embodiment of the present invention supports cell-to-cell communication between any two cells, the execution results of the neuron function code can be transmitted by cell-to-cell communication during neural network computation. The large bandwidth of the network inside the cell array improves the efficiency of cell-to-cell communication and also reduces the processing load on the master CPU, which further improves the overall performance of the computing system and favors an efficient implementation of neural network computation.
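Since only adjacent cells have a direct communication interface, a message between two arbitrary cells passes through a chain of neighboring cells. The sketch below builds such a chain on a two-dimensional grid. The routing rule used here (move along rows first, then along columns) and the name `route` are assumptions for illustration; this section does not specify how the path is chosen.

```python
from typing import List, Tuple

Position = Tuple[int, int]  # (row, column)


def route(start: Position, end: Position) -> List[Position]:
    """Path of successively adjacent cells from start to end, rows first."""
    row, col = start
    path = [start]
    while row != end[0]:
        row += 1 if end[0] > row else -1
        path.append((row, col))
    while col != end[1]:
        col += 1 if end[1] > col else -1
        path.append((row, col))
    return path


def relays(path: List[Position]) -> List[Position]:
    """Cells between the sender and the final receiver that only forward data."""
    return path[1:-1]
```

For example, a result sent from cell (0, 0) to cell (2, 1) would be forwarded by cells (1, 0) and (2, 0) under this rule, without involving the master CPU.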
In a specific implementation, when the neural network is divided into layers, data transmission between layers can use the cell-to-cell group-sending mechanism provided by the embodiment of the present invention. In this case the cells of the same layer that participate in the neural network computation are located in the same target area, and when a cell in one layer group-sends the execution result of its neuron function code to the target area, the result is in effect sent to all cells of the next layer that participate in the neural network computation. Extending point-to-point communication between cells to group sending to an area supports a higher degree of parallelism and yields a much higher total bandwidth, which further improves the overall performance of the computing system and significantly speeds up neural network computation.
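The effect of group sending between layers can be sketched as follows: each cell of one layer sends its result once, addressed to the area that holds the next layer, and every cell in that area ends up with one input per sender. The rectangular area format, the inbox dictionaries and the function names are assumptions, and the relay hops between cells are deliberately left out of this model.

```python
from typing import Dict, List, Tuple

Position = Tuple[int, int]
Region = Tuple[int, int, int, int]  # (first_row, first_col, last_row, last_col), inclusive
Inboxes = Dict[Position, Dict[Position, float]]


def cells_in_region(region: Region) -> List[Position]:
    """All cell positions inside a rectangular target area."""
    r0, c0, r1, c1 = region
    return [(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]


def group_send(inboxes: Inboxes, sender: Position, value: float,
               region: Region) -> None:
    """Deliver one sender's result to every cell of the target area."""
    for cell in cells_in_region(region):
        inboxes.setdefault(cell, {})[sender] = value
```

If three cells in column 0 each group-send to an area covering two cells in column 1, both receiving cells hold the same three inputs, which matches the fully connected layers described earlier.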
In a specific implementation, if a dedicated output cell is also provided in the cell array, the execution result of the neural network computation can be sent to the dedicated output cell by cell-to-cell communication for the master CPU to read, so that the result reaches the master CPU more efficiently. In practice, if some neuron functions in the last layer produce a positive output (commonly, each neuron in the last layer is responsible for recognizing one specific data feature or image), the master CPU can be notified through this mechanism.
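The role of the dedicated output cell can be modeled with a small buffer that signals the master CPU when data arrives; the claims describe this notification as an interrupt signal. In the sketch below the class name `OutputCell`, the callback standing in for the interrupt line, and the rule that only positive outputs are forwarded are assumptions for illustration.

```python
from typing import Callable, List, Tuple

Position = Tuple[int, int]


class OutputCell:
    """Stores results for the master CPU and signals it when data arrives."""

    def __init__(self, raise_interrupt: Callable[[], None]) -> None:
        self._buffer: List[Tuple[Position, float]] = []
        self._raise_interrupt = raise_interrupt  # stands in for the interrupt line

    def receive(self, sender: Position, value: float) -> None:
        """Called when a result arrives over cell-to-cell communication."""
        self._buffer.append((sender, value))
        self._raise_interrupt()

    def read_all(self) -> List[Tuple[Position, float]]:
        """Master CPU reads and clears the stored output data."""
        data, self._buffer = self._buffer, []
        return data
```

A last-layer cell would call `receive` only when its neuron produces a positive output, so the master CPU is interrupted only when there is something to read.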
For implementing neural network computation by cell-to-cell communication, group sending between cells, and the dedicated output cell, reference may be made to the description in the cell array computing system embodiments, which is not repeated here.
In summary, the advantages of implementing neural network computation with the cell array computing system provided by the embodiment of the present invention are clear: parallel computation across a large number of cells greatly increases the operation speed, so that learning and training become much faster; the large bandwidth of the network inside the array and the group-sending mechanism likewise help to increase speed significantly; and the non-volatility of MRAM allows a successfully trained chip to be replicated and sold directly as a product that solves a specific problem.
Although the present invention is disclosed as above, it is not limited thereto. Any person skilled in the art may make various changes and modifications without departing from the spirit and scope of the present invention, and the scope of protection of the present invention shall therefore be defined by the claims.
Claims (9)
1. A method of implementing neural network computation using a cell array computing system, characterized in that the cell array computing system comprises: a master CPU, a cell array and a cell array bus, the cell array and the cell array bus being integrated in one chip; the cell array is a two-dimensional array composed of more than one cell having both computing and storage functions, wherein each cell comprises a microprocessor and a non-volatile random access memory; the non-volatile random access memory is used for random access to the data involved when the microprocessor computes, and is also used to store software instruction code and data that needs to be persisted; each cell stores its own position in the cell array as an ID to be read by software or hardware in the cell;
the master CPU communicates with each cell in the cell array through the cell array bus; communication interfaces are provided between adjacent cells in the cell array, through which the cells can send data to each other under the control of software instructions in the cell;
the method comprises:
storing the code of one or more neuron functions in each cell of the cell array that participates in the neural network computation;
the master CPU selecting one or more cells to execute the code of the neuron function stored in each of those cells, and to output the execution results to one or more target cells;
any cell participating in the neural network computation receiving the execution results of neuron function code output by other cells as its own input data, executing the code of the neuron function it stores based on all input data from other cells, and outputting the execution result to the cell holding the neuron function that needs that result, or storing it at a preset address, or outputting it to the master CPU.
2. The method of implementing neural network computation using a cell array computing system according to claim 1, characterized in that each input data item from other cells corresponds to a weight parameter, and executing the code of the neuron function stored in this cell based on all input data from other cells comprises: multiplying each input data item from other cells by its corresponding weight parameter, summing all the products, comparing the sum with a threshold parameter, and taking the output value so determined as the execution result.
3. The method of implementing neural network computation using a cell array computing system according to claim 2, characterized in that the weight parameters and the threshold parameter are pre-stored in the non-volatile random access memory of the cell.
4. The method of implementing neural network computation using a cell array computing system according to claim 1, characterized in that, if the execution result output after executing the neuron function code is a continuous value, the output is accelerated by table lookup.
5. The method of implementing neural network computation using a cell array computing system according to claim 1, characterized in that any two cells in the cell array can communicate with each other; the cells participating in cell-to-cell communication include a start cell, an end cell and relay cells, the start cell being the cell that sends data to the end cell, the end cell being the cell that finally receives the data sent by the start cell, and the relay cells being the cells that are successively adjacent along the cell-to-cell communication path and forward the data sent by the start cell through the communication interfaces; the cell-to-cell communication path is the data transmission path formed by the start cell, the relay cells and the end cell; the method further comprises: transmitting the execution results of the neuron function code between any two cells by the cell-to-cell communication.
6. The method of implementing neural network computation using a cell array computing system according to claim 5, characterized in that any cell in the cell array can also act as the start cell and perform group-sending communication to all cells in a target area; a cell that participates in the group-sending communication and is located inside the target area serves as the start cell, or as the end cell, or as both a relay cell and an end cell; a cell that participates in the group-sending communication and is located outside the target area serves as the start cell or as a relay cell; the method further comprises: if the neural network is divided into layers, using the group-sending communication to transmit data between layers, the cells of the same layer that participate in the neural network computation being located in the same target area.
7. The method of implementing neural network computation using a cell array computing system according to claim 5 or 6, characterized in that at least one dedicated output cell is also provided in the cell array; the dedicated output cell serves as the end cell, receives and stores the output data of other cells destined for the master CPU, and notifies the master CPU with an interrupt signal to read the output data; outputting the execution result to the master CPU comprises: sending the execution result to the dedicated output cell by the cell-to-cell communication, for the master CPU to read.
8. The method of implementing neural network computation using a cell array computing system according to claim 1, characterized in that the non-volatile random access memory is MRAM.
9. The method of implementing neural network computation using a cell array computing system according to claim 1, characterized in that the master CPU selects all cells in the cell array, or the cells in a target area, as the cells participating in the neural network computation, broadcasts the code of the same neuron function to the same relative address segment of each selected cell, and broadcasts an instruction that causes the microprocessors of one or more selected cells to execute the neuron function code starting from that relative address.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510456306.2A CN105740946B (en) | 2015-07-29 | 2015-07-29 | A method of implementing neural network computation using a cell array computing system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105740946A CN105740946A (en) | 2016-07-06 |
CN105740946B (en) | 2019-02-12 |
Family
ID=56295970
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510456306.2A Active CN105740946B (en) | A method of implementing neural network computation using a cell array computing system | 2015-07-29 | 2015-07-29 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105740946B (en) |
Families Citing this family (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108154226B (en) * | 2016-12-06 | 2021-09-03 | 上海磁宇信息科技有限公司 | Neural network chip using analog computation |
CN108154225B (en) * | 2016-12-06 | 2021-09-03 | 上海磁宇信息科技有限公司 | Neural network chip using analog computation |
CN108255514A (en) * | 2016-12-28 | 2018-07-06 | 上海磁宇信息科技有限公司 | For the neuron calculator operation method of cellular array computing system |
CN108256637A (en) * | 2016-12-28 | 2018-07-06 | 上海磁宇信息科技有限公司 | A kind of cellular array three-dimensional communication transmission method |
CN108255775A (en) * | 2016-12-28 | 2018-07-06 | 上海磁宇信息科技有限公司 | For the cellular array bus broadcast method of cellular array computing system |
CN108256641A (en) * | 2016-12-28 | 2018-07-06 | 上海磁宇信息科技有限公司 | For the cellular array internal network communication method of cellular array computing system |
CN108256640A (en) * | 2016-12-28 | 2018-07-06 | 上海磁宇信息科技有限公司 | Convolutional neural networks implementation method |
CN109146070B (en) * | 2017-06-16 | 2021-10-22 | 华为技术有限公司 | Peripheral circuit and system for supporting neural network training based on RRAM |
CN107817708B (en) * | 2017-11-15 | 2020-07-07 | 复旦大学 | High-compatibility programmable neural network acceleration array |
CN111242294B (en) * | 2017-12-14 | 2023-08-25 | 中科寒武纪科技股份有限公司 | Integrated circuit chip device and related products |
CN110308909B (en) * | 2018-03-27 | 2023-08-01 | 上海寒武纪信息科技有限公司 | Executable program generating device and method for neural network processor |
US11914860B2 (en) * | 2018-08-20 | 2024-02-27 | Macronix International Co., Ltd. | Data storage for artificial intelligence-based applications |
CN109886393B (en) * | 2019-02-26 | 2021-02-09 | 上海闪易半导体有限公司 | Storage and calculation integrated circuit and calculation method of neural network |
CN110362280A (en) * | 2019-09-04 | 2019-10-22 | 南京优存科技有限公司 | Mixing storage system based on the nearly data processing MRAM of low-power consumption neural network |
CN112269751B (en) * | 2020-11-12 | 2022-08-23 | 浙江大学 | Chip expansion method for hundred million-level neuron brain computer |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1341242A (en) * | 1999-01-21 | 2002-03-20 | 索尼电脑娱乐公司 | High-speed processor system, method of using the same, and recording medium |
WO2009075694A1 (en) * | 2007-12-05 | 2009-06-18 | Hewlett-Packard Development Company, L.P. | Hybrid microscale-nanoscale neuromorphic integrated circuit |
CN102665049A (en) * | 2012-03-29 | 2012-09-12 | 中国科学院半导体研究所 | Programmable visual chip-based visual image processing system |
CN103019656A (en) * | 2012-12-04 | 2013-04-03 | 中国科学院半导体研究所 | Dynamically reconfigurable multi-stage parallel single instruction multiple data array processing system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101354694B (en) * | 2007-07-26 | 2010-10-13 | 上海红神信息技术有限公司 | Ultra-high expanding super computing system based on MPU structure |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||