CN107341129B

CN107341129B - Cell array computing system and testing method thereof

Info

Publication number: CN107341129B
Application number: CN201610284034.7A
Authority: CN
Inventors: 戴瑾
Original assignee: Shanghai Ciyu Information Technologies Co Ltd
Current assignee: Shanghai Ciyu Information Technologies Co Ltd
Priority date: 2016-04-29
Filing date: 2016-04-29
Publication date: 2021-06-29
Anticipated expiration: 2036-04-29
Also published as: CN107341129A

Abstract

A cell array computing system and a testing method thereof, the cell array computing system comprising: a master control CPU, a cell array and a cell array bus; the cell array is a two-dimensional or three-dimensional array consisting of more than one cell with calculation and storage functions, wherein each cell comprises a microprocessor and a nonvolatile random access memory; each cell stores its respective location in the cell array as an ID for software or hardware reading in the cell; the master control CPU communicates with each cell in the cell array through the cell array bus; the cell array reserves more than one redundant cell to be used as the corresponding replacement cell of a damaged cell when any other cell in the cell array is determined to be damaged; the cell array and the cell array bus are integrated on one chip. The invention can overcome the communication bottleneck existing between the CPU, the memory and the storage in the existing computer architecture, improve the overall performance of the computing system and improve the product yield.

Description

Cell array computing system and testing method thereof

Technical Field

The invention relates to the technical field of computers and computer application, in particular to a cell array computing system and a testing method thereof.

Background

Generally, a computer mainly includes three core parts: a Central Processing Unit (CPU), memory and storage.

Through the continuing efforts of world-leading companies, CPUs have evolved into extremely complex semiconductor chips. The number of MOS transistors inside a top-level CPU core may exceed one hundred million. The current industry trend is that the operating frequency of the CPU can hardly be raised any further because of power consumption, and the operating efficiency of modern CPUs, which are already extremely complex, is also difficult to improve. New CPU products are therefore evolving more and more towards multi-core designs.

In terms of memory, Dynamic Random Access Memory (DRAM) technology currently dominates. DRAM supports fast random reads and writes, but cannot hold its contents when power is lost. In fact, even while powered, it loses information due to leakage of the capacitor used internally to store the information, and must be periodically refreshed.

In terms of storage, NAND flash memory technology is gradually replacing traditional hard disks. The floating gate technology on which flash memory relies can hold its contents when power is off, but writing (rewriting '1' to '0') is slow and erasing (rewriting '0' to '1') is slow, so flash cannot directly support computation the way DRAM does. It is built as a block device: a block, which contains many pages, must be erased as a whole, and pages can be written only after erasure. Another problem with NAND is its limited lifetime.

The DRAM and NAND flash memories, as well as the logic circuit of the CPU, are produced based on CMOS semiconductor processes, but the processes of the three are not compatible with each other. Thus, the three core parts of a computer cannot coexist on one chip, which profoundly affects the architecture of modern computers.

A computer architecture in the prior art is shown in fig. 1, which shows a plurality of CPU cores, namely CPU1, CPU2, CPU3, ..., CPUn. Each CPU core generally has a corresponding first-level cache (L1 Cache), and each CPU core may be further equipped with a corresponding second-level cache (L2 Cache) and third-level cache (L3 Cache) as needed. The DRAM communicates with each CPU core through a Double Data Rate (DDR) interface, and the Hard Disk (HD) or Solid State Drive (SSD) communicates with each CPU core through a peripheral interface.

On one hand, the CPU is developing towards multi-core, and on the other hand, the memory and storage are in other chips. The throughput of the multi-core CPU increases in proportion, and the communication with the memory and storage becomes a bottleneck of the system performance. To alleviate the communication bottleneck, CPUs have to employ increasingly larger multi-level caches. Caches are used to copy the contents of Memory, and are typically designed with Static Random Access Memories (SRAM) which are much more expensive but faster than DRAM. Such an architecture is very inefficient. The cost of a semiconductor chip is determined by the area of its silicon die, and the performance improvement brought by traditional computer architectures is far from proportional to the increase in silicon die area.

CPUs are becoming more and more complex, relying on generation after generation of evolving semiconductor processes. This creates a problem: as semiconductor chips become more complex, a chip in an advanced process can contain more than one billion MOS devices. If even one of those one billion components is damaged during manufacturing, the entire chip is generally rejected. The damage rate of the components must therefore be kept below one part per billion, which poses a great challenge to the semiconductor process, lowers the yield, and greatly increases the cost of the chip.

Disclosure of Invention

The invention aims to solve the following problems of the computer architecture in the prior art: the communication bottlenecks between the CPU and the memory and between the CPU and the storage limit the improvement of the overall performance of the computer, the cost efficiency is poor, and the yield is low and the cost is high when the architecture is integrated on a chip.

In order to solve the above problems, an embodiment of the present invention provides a cell array computing system, including: a master control CPU, a cell array and a cell array bus; the cell array is a two-dimensional array or a three-dimensional array composed of more than one cell with calculation and storage functions, wherein each cell comprises a microprocessor (Micro Processing Unit, MPU) and a nonvolatile (NV) random access memory; the nonvolatile random access memory is used for random access of data involved in the calculation of the microprocessor and is also used for storing instruction codes of software and data needing to be permanently stored; each cell stores its respective position in the cell array as an identification number (ID) for software or hardware reading in the cell; the master CPU communicates with each cell in the cell array via the cell array bus; more than one redundant cell is reserved in the cell array as a spare cell, and a spare cell is used as the corresponding replacement cell of a damaged cell when any other cell in the cell array is determined to be damaged; the cell array and the cell array bus are integrated on one chip.

Optionally, the communication of the master CPU with each cell in the cell array through the cell array bus comprises at least one of:

reading and writing, by address, the nonvolatile random access memory of any cell in the cell array;

broadcasting data to the non-volatile random access memory of each cell in a target area in the cell array and writing the same relative address in the non-volatile random access memory of each cell in the target area;

sending instructions, sending data or reading status to a microprocessor of any cell in the cell array;

broadcasting instructions to the microprocessors of all cells within the target area.

Optionally, each cell in the cell array further includes a bus controller and a cell internal bus. The bus controller is connected to the cell array bus, the microprocessor and the cell internal bus, and is configured to listen to instructions on the cell array bus. For an instruction that concerns its own cell, the bus controller either connects to the microprocessor to pass on the instruction or data sent by the main control CPU or to read the state, or connects to the nonvolatile random access memory through the cell internal bus to perform data read-write operations. A first nonvolatile memory is arranged in the bus controller of each spare cell and stores the position in the cell array of the damaged cell that the spare cell replaces. When the chip operates, the damaged cell is switched off, and the bus controller of its replacement cell, when listening to instructions on the cell array bus, recognizes instructions addressed to the replaced damaged cell as instructions addressed to itself.

Optionally, a communication interface is provided between adjacent cells in the cell array, and data can be transmitted and received to and from each other; any two cells can communicate with each other, and the cells involved in intercellular communication include a start cell, an end cell and a transit cell, the start cell is a cell which sends data to the end cell, the end cell is a cell which finally receives the data sent by the start cell, the transit cell is a cell which is adjacent in sequence along an intercellular communication path and relays the data sent by the start cell through the communication interface, and the intercellular communication path is a data sending and receiving path formed by the start cell, the transit cell and the end cell.

Optionally, any cell in the cell array can also be used as the starting cell to perform mass-sending communication to all cells in the target area, a cell involved in the mass-sending communication and located in the target area is used as the starting cell, or used as the destination cell, or used as both the transit cell and the destination cell, and a cell involved in the mass-sending communication and located outside the target area is used as the starting cell or the transit cell.

Optionally, each cell in the cell array further includes a network controller connected to the microprocessor. The network controller performs transceiving and routing control on data that is sent, relayed or finally received, and also sends interrupt signals to the microprocessor. A second nonvolatile memory is arranged in the network controller; in every normal cell adjacent to a damaged cell, the second nonvolatile memory is used to mark that damaged neighbor and to store the position in the cell array of its corresponding replacement cell. When data is communicated, the network controller of a cell neighboring the damaged cell bypasses the damaged cell when performing routing control and, if the damaged cell is the end cell or one of the end cells, forwards the data to the corresponding replacement cell of the damaged cell.

Optionally, the cell array is a two-dimensional array, and the reserved one or more redundant cells are one row or one column of cells in the cell array.

Optionally, at least one of a Floating Point Unit (FPU) and an image processor is integrated in the microprocessor.

Optionally, the nonvolatile Random Access Memory is a Magnetic Random Access Memory (MRAM).

Optionally, the main control CPU, the cell array and the cell array bus are integrated in one chip.

Optionally, the main control CPU is an independent chip and communicates with a chip composed of the cell array and the cell array bus through a standard memory interface.

In order to solve the above problems, the present invention further provides a method for testing the cell array computing system, including: broadcasting a test program so that each cell in the cell array performs a self-test; and, if a cell other than a spare cell fails the test, determining that the cell is a damaged cell, selecting a normal cell from the spare cells as the corresponding replacement cell of the damaged cell, and storing in the replacement cell the position in the cell array of the damaged cell it replaces.

Optionally, after a cell other than a spare cell is determined to be a damaged cell, the damaged cell is also marked in all normal cells adjacent to it, and the position in the cell array of its corresponding replacement cell is stored in those cells.
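The testing method above can be sketched in a few lines. This is only an illustration under stated assumptions, not the patented implementation: the function name `assign_replacements`, the `self_test` callback, the use of Python dictionaries to stand in for the nonvolatile memories, and the four-direction notion of adjacency are all hypothetical choices made for the example.

```python
from typing import Callable, Dict, List, Tuple

Cell = Tuple[int, int]  # (x, y) position in the array, used as the cell ID


def assign_replacements(
    working: List[Cell],
    spares: List[Cell],
    self_test: Callable[[Cell], bool],
) -> Tuple[Dict[Cell, Cell], Dict[Cell, Dict[Cell, Cell]]]:
    """Return (replaced_by_spare, neighbor_marks).

    replaced_by_spare[spare]     -> position of the damaged cell the spare replaces
    neighbor_marks[n][damaged]   -> replacement position, recorded in each
                                    normal neighbor n of a damaged cell
    """
    # Only spares that pass their own self-test may serve as replacements.
    good_spares = [s for s in spares if self_test(s)]
    damaged = [c for c in working if not self_test(c)]
    if len(damaged) > len(good_spares):
        raise RuntimeError("not enough normal spare cells to repair the chip")

    working_set, damaged_set = set(working), set(damaged)
    replaced_by_spare: Dict[Cell, Cell] = {}
    neighbor_marks: Dict[Cell, Dict[Cell, Cell]] = {}
    for bad, spare in zip(damaged, good_spares):
        replaced_by_spare[spare] = bad
        x, y = bad
        # Assumed adjacency: up, down, left, right.
        for n in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
            if n in working_set and n not in damaged_set:
                neighbor_marks.setdefault(n, {})[bad] = spare
    return replaced_by_spare, neighbor_marks
```

In a real chip the two returned maps would be written into the nonvolatile memories of the spare cells and of the neighbors of damaged cells; here they are simply returned so the assignment can be inspected.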

Compared with the prior art, the technical scheme of the invention at least has the following advantages:

As semiconductor technology advances to more advanced process nodes, each device becomes smaller and the number of devices on a chip grows. In a large chip such as a conventional CPU, damage to a single device turns the entire chip into a reject, so production yield is increasingly difficult to control. The technical scheme of the invention provides a new computing architecture that integrates an array of a large number of cells on a chip and adds a redundancy design to the cell array: some redundant cells are reserved as spares and serve as replacement cells when any other cell in the cell array is determined to be damaged. The whole system can therefore keep working even when a small fraction of cells have manufacturing defects, which improves the product yield and reduces the manufacturing cost.
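The effect of redundancy on yield can be illustrated with a simple probability model. The sketch below assumes, purely for illustration, that each cell fails independently with the same probability and that any normal spare can replace any damaged cell; the function name `chip_yield` and any numbers passed to it are hypothetical and do not come from the patent.

```python
from math import comb


def chip_yield(n_cells: int, n_spares: int, p_cell_bad: float) -> float:
    """Probability that a chip is usable under the independence assumption.

    The chip carries n_cells + n_spares cells in total and is usable when
    at most n_spares of them are bad, so that n_cells good cells remain.
    """
    total = n_cells + n_spares
    return sum(
        comb(total, k) * p_cell_bad**k * (1.0 - p_cell_bad) ** (total - k)
        for k in range(n_spares + 1)
    )
```

With no spares the expression reduces to (1 - p)^n, which falls quickly as the number of cells grows; allowing even a few bad cells to be replaced keeps the sum close to 1 for small p.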

Drawings

FIG. 1 is a schematic diagram of a prior art computer architecture;

FIG. 2 is a schematic diagram of a cell array computing system according to an embodiment of the present invention;

FIG. 3 is a schematic illustration of a communication mode between adjacent cells according to an embodiment of the present invention;

FIG. 4 is a schematic illustration of another communication mode between adjacent cells according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of the structure of a cell of an embodiment of the invention;

FIG. 6 is a schematic of a Monte Carlo integration in a pipelined manner using a cell array computing system;

FIG. 7 is a schematic diagram of the structure of cells in a cell array that communicate between cells in accordance with an embodiment of the present invention;

FIG. 8 is a schematic diagram illustrating the routing of intercellular communication in the cell array according to an embodiment of the present invention;

FIG. 9 is a schematic diagram of a process for performing full-time export of cells according to an embodiment of the present invention;

FIG. 10 is a schematic diagram of cell mass sending in which the starting cell is at a corner of the target region in the cell array according to an embodiment of the present invention;

FIG. 11 is a schematic diagram of cell mass sending in which the starting cell is on a side of the target region in the cell array according to an embodiment of the present invention;

FIG. 12 is a schematic diagram of cell mass sending in which the starting cell is inside the target region in the cell array according to an embodiment of the present invention;

FIG. 13 is a schematic diagram of cell mass sending in which the starting cell is outside the target region in the cell array according to an embodiment of the present invention;

FIG. 14 is a schematic structural diagram of a cell array computing system for reserving redundant cells according to an embodiment of the present invention.

Detailed Description

In the computer architecture in the prior art, communication bottlenecks exist among the CPU, the memory and the storage, so that the improvement of the overall performance of the computer is influenced, and the cost efficiency is poor.

After research, the inventor of the present application considers that a computing architecture similar to that of the human brain can be developed if the three functions of memory, storage and calculation are integrated on one chip to form relatively simple units with independent calculation and storage functions, and a large number of such units are connected into a dense communication network that provides both a data mass-sending function and an internal network capable of transmitting data in a massively parallel manner. This is equivalent to manufacturing a large number of microcomputers on one chip.

Therefore, the technical scheme of the invention provides a computing architecture (referred to as a cell array computing system in the technical scheme of the invention) similar to the structure of a human brain, and the computing architecture is composed of a plurality of units (referred to as cells in the technical scheme of the invention) which have relatively simple structures, have storage and computing functions and are densely connected with a network. The new computing architecture can be widely applied to the fields of large-scale computing, big data processing, artificial intelligence and the like.

In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.

As shown in fig. 2, the cell array computing system provided by the embodiment of the present invention includes: a master control CPU, a cell array and a cell array bus; the cell array is a main body in the cell array computing system, and is a two-dimensional array composed of more than one cell with both computing and storing functions, wherein each cell comprises a Microprocessor (MPU) and a nonvolatile random access memory (MRAM is taken as an example in FIG. 2); the non-volatile random access memory is used for random access of data involved in the calculation of the microprocessor and is also used for storing instruction codes of software and data needing to be permanently stored; each cell stores a respective location in the cell array as an ID for software or hardware reading in the cell; the master CPU communicates with each cell in the cell array via the cell array bus; and adjacent cells in the cell array are provided with communication interfaces which can transmit and receive data to and from each other.

In this embodiment, the nonvolatile random access memory is described by taking MRAM as an example. In other embodiments, as nonvolatile random access memory technology develops and matures further, the nonvolatile random access memory can also be implemented with several other candidate technologies, such as Phase Change Random Access Memory (PCRAM), Resistive Random Access Memory (RRAM), Ferroelectric Random Access Memory (FeRAM), Ferroelectric Dynamic Random Access Memory, and the like.

MRAM is a new memory and storage technology. It can be read and written randomly as fast as SRAM/DRAM, and is faster than DRAM; like flash memory, it permanently retains data after power is off, but unlike NAND flash memory, MRAM can be erased and written an unlimited number of times and has a long lifetime. In addition, MRAM is expected to be economical: the silicon area occupied per unit capacity is smaller than that of SRAM (which is usually used as the cache memory of a CPU) and is expected to approach that of DRAM. Its performance is also quite good: the read-write latency is close to that of the best SRAM, and the power consumption is the lowest among the various memory and storage technologies. Moreover, unlike DRAM and Flash, MRAM is compatible with standard CMOS semiconductor processes and can be integrated with logic circuits in one chip. By using MRAM technology, the three functions of memory, storage and computation can be integrated on a single chip, enabling the implementation of the cell array computing system.

In this embodiment, the microprocessor has a function of a general CPU, and units such as a Floating Point Unit (FPU) and an image processor may be added according to a specific application scenario, so that at least one of the floating Point Unit and the image processor may be integrated in the microprocessor.

In practical implementation, the main control CPU, the cell array and the cell array bus may be integrated in one chip, or the main control CPU may be an independent chip and communicate with a chip composed of the cell array and the cell array bus through a standard memory interface. When the master control CPU communicates with the cell array by using a standard memory interface, the master control CPU may be implemented by using a general CPU chip, which is easier to implement in the cell array computing system.

In this embodiment, each cell stores its own position in the cell array as an ID. The position can be represented by coordinates in the first quadrant of a rectangular plane coordinate system: if (x, y) represents the position of a cell in the cell array, then (x, y) is stored in that cell as its ID, and software and hardware in the cell can read the ID for use in specific operations.

In this embodiment, the communication between the master CPU and each cell in the cell array via the cell array bus includes the following:

broadcasting data to the non-volatile random access memory of each cell in the target area in the cell array, and writing the same relative address in the non-volatile random access memory of each cell in the target area;

sending instructions (including start, pause), sending data, or reading status to the microprocessor of any cell in the cell array;

broadcasting instructions to the microprocessors of all cells in the target area.

Of course, in other embodiments, the communication between the master CPU and each cell in the cell array via the cell array bus may be one or more of the above.

In the embodiment of the present invention, the "target region" refers to a region of one or more mutually adjacent cells selected by the master CPU or by any cell in the cell array; the cells in the region are the targets to which the master CPU or that cell broadcasts or mass-sends data or instructions. In this embodiment, the target area is illustrated as a rectangular area (a ≤ x ≤ b and c ≤ y ≤ d, where a and b are the boundary coordinates of the rectangular area in the x-axis direction of the rectangular plane coordinate system, and c and d are its boundary coordinates in the y-axis direction). In other embodiments, the target region may have other shapes, such as a diamond region, a triangular region, a hexagonal region, and the like.
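The rectangular target area lends itself to a very small membership test that each cell could apply to its own stored ID. The following is a minimal sketch; the `Rect` type and the function name `in_target_area` are hypothetical names introduced only for this example.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    a: int  # lower boundary in the x direction
    b: int  # upper boundary in the x direction
    c: int  # lower boundary in the y direction
    d: int  # upper boundary in the y direction


def in_target_area(cell_id: Tuple[int, int], area: Rect) -> bool:
    """True when the cell at (x, y) satisfies a <= x <= b and c <= y <= d."""
    x, y = cell_id
    return area.a <= x <= area.b and area.c <= y <= area.d
```

For example, with `Rect(a=1, b=3, c=2, d=4)` the cell (3, 2) lies on the boundary and is inside the area, while the cell (0, 2) is outside it.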

In addition, "broadcast" and "mass sending" are different concepts in the embodiment of the present invention: the former may transmit data or instructions once so that all targets receive them, whereas the latter may transmit the data or instructions many times to different targets.

In addition to the master CPU broadcasting to any cell in the cell array (to the microprocessor or the nonvolatile random access memory in the cell), the cell array contains a communication network that enables a cell, under the control of its MPU, to send data to its neighboring cells. As shown in fig. 3, in one plane, any cell can communicate with its neighboring cells in four directions: up, down, left and right. Of course, communication between adjacent cells is not limited to these four directions. Where the circuit layout can support it, a cell may communicate with its neighbors in eight directions (up, down, left, right, upper left, upper right, lower left and lower right), as shown in fig. 4.
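The four-direction and eight-direction neighborhoods can be enumerated as follows. This is a sketch under the assumption that cell IDs are (x, y) coordinates starting at (0, 0) in a `width` by `height` array; the function name `neighbors` and its parameters are hypothetical.

```python
from typing import List, Tuple


def neighbors(
    cell: Tuple[int, int], width: int, height: int, eight_way: bool = False
) -> List[Tuple[int, int]]:
    """List the adjacent cells of `cell` that lie inside the array."""
    x, y = cell
    # Up, down, left, right.
    steps = [(0, 1), (0, -1), (-1, 0), (1, 0)]
    if eight_way:
        # Upper left, upper right, lower left, lower right.
        steps += [(-1, 1), (1, 1), (-1, -1), (1, -1)]
    return [
        (x + dx, y + dy)
        for dx, dy in steps
        if 0 <= x + dx < width and 0 <= y + dy < height
    ]
```

Cells on the edge of the array simply have fewer neighbors, which is why the bounds check is applied to every candidate.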

As shown in fig. 5, in this embodiment, the cells in the cell array may further include a bus controller and a cell internal bus, where the bus controller is connected to the cell array bus, the microprocessor and the cell internal bus, and the bus controller is configured to identify communication performed between the main control CPU and the cell, and is connected to the microprocessor to transmit instructions or data sent by the main control CPU and read a state, or is connected to the nonvolatile random access memory through the cell internal bus to perform read-write operation of data.

Those skilled in the art will appreciate that a relatively simple yet well-performing CPU, such as the ARM Cortex-M0, has only about 50,000 MOS transistors. Even with a modest addition of FPU functionality, this is far fewer than the billions of MOS transistors in a top-level CPU; the increase in area (cost) associated with increasing CPU performance is disproportionate. Replacing one large CPU with many small CPUs therefore multiplies the total computing power at the same total cost. However, conventional computer architectures suffer from communication bottlenecks, and the actual performance gain from using a large number of CPU cores is very limited.

The cell array computing system provided by the technical scheme of the invention solves the problem of communication bottleneck through data broadcasting and an internal network, thereby improving the overall performance of the computing system and ensuring better cost efficiency, which can be seen more clearly in the subsequent application examples.

Preliminary studies show that if a cell is composed of an MPU similar to the Cortex-M0 together with 32 KB of memory, then 3000 such cells can be fabricated on a single chip using a 40 nm process, which gives very high computing power. Further studies show that, over the same silicon area, this approach can exceed the computing power (typically measured in floating-point operations per second, FLOPS) of contemporary top-level CPUs. Because the cell array computing system of the technical scheme of the invention no longer faces the bottleneck of an interface with a memory, it performs even better on many practical problems.

Based on the cell array computing system, an embodiment of the present invention further provides a communication method in the cell array computing system, including: an operation in which the master control CPU reads and writes a nonvolatile random access memory, a communication operation between the master control CPU and a microprocessor, a broadcast operation of the master control CPU, and a communication operation between adjacent cells in the cell array.

The operation of the main control CPU for reading and writing the nonvolatile random access memory specifically includes: any cell in the cell array receives the target address broadcast by the main control CPU on the cell array bus and, if it judges that the target address is in the cell, connects the nonvolatile random access memory of the cell so that the main control CPU can perform data reading and writing operations.
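How a cell might judge whether a target address falls inside it can be sketched as below. The patent text does not specify the address layout, so this example assumes a hypothetical row-major mapping in which every cell owns one contiguous 32 KB window of the system address space (32 KB being the per-cell memory size used in the preliminary study above); `CELL_MEM_BYTES`, `cell_window` and `decode` are illustrative names only.

```python
from typing import Optional, Tuple

CELL_MEM_BYTES = 32 * 1024  # assumed per-cell nonvolatile RAM size


def cell_window(cell_id: Tuple[int, int], array_width: int) -> Tuple[int, int]:
    """Return the [base, limit) address window assumed to belong to a cell."""
    x, y = cell_id
    base = (y * array_width + x) * CELL_MEM_BYTES  # assumed row-major layout
    return base, base + CELL_MEM_BYTES


def decode(
    cell_id: Tuple[int, int], array_width: int, target_address: int
) -> Optional[int]:
    """Return the local offset if the address is in this cell, else None."""
    base, limit = cell_window(cell_id, array_width)
    if base <= target_address < limit:
        return target_address - base
    return None
```

Every cell would evaluate the same check against the address seen on the cell array bus, and only the one cell that obtains an offset connects its nonvolatile random access memory.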

The communication operation between the main control CPU and the microprocessor specifically includes: reserving a first special address field in the system address space for communication between the main control CPU and the microprocessor, the field storing the ID of the target cell; if any cell in the cell array, on receiving the first special address field, identifies that the communication is with its own microprocessor, it connects its microprocessor to complete the subsequent instruction receiving, data receiving and state reading operations.

It should be noted that the system address space is not limited to the sum of the address spaces of the non-volatile random access memories contained in the cells of the cell array, since the memory connected to the cell array bus may be not only the non-volatile random access memory contained in the cells of the cell array, but there may be other types of memory connected to the cell array bus for access by the master CPU. Therefore, the master CPU needs to identify the cell it is preparing to access (this time called the "target cell" in this embodiment) based on the ID of the cell.

The broadcast operation of the main control CPU specifically includes: reserving a second special address field in the system address space for broadcasting by the main control CPU, the second special address field storing cell IDs that help determine the range of the target area in the cell array; if any cell in the cell array, after receiving the second special address field, identifies that it is in the target area, it either connects its microprocessor to pass on the instruction or data sent by the main control CPU or to read the state, or connects its nonvolatile random access memory to perform data reading and writing operations.

The following takes a rectangular target area as an example to illustrate the broadcast operation of the main control CPU. A section of the system address space is reserved for broadcast instructions, and a field within the address stores the ID of the starting cell of the target rectangular area. The starting cell is the first cell accessed by the master CPU in the target rectangular area. After receiving this special address, the bus controller in each cell receives the next word of data, which contains the ID of the cell diagonally opposite the starting cell in the target rectangular area. If the bus controller determines that its cell lies within the area, it goes on to receive the second word, which indicates whether the instruction or data is for the MPU or whether writing is to start from a relative address in the nonvolatile random access memory. In the former case the bus controller connects to the MPU, and in the latter case it connects to the nonvolatile random access memory, to complete the subsequent operations.
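The decision a bus controller makes during a rectangular broadcast can be sketched as a small function. The encoding of the second word is not given in the text, so this example assumes, hypothetically, that its top bit (`MPU_FLAG`) selects the MPU and that otherwise the word carries the relative address in the nonvolatile random access memory; the function name `handle_broadcast` is likewise illustrative.

```python
from typing import Optional, Tuple

MPU_FLAG = 1 << 31  # assumed encoding: top bit set means "for the MPU"


def handle_broadcast(
    cell_id: Tuple[int, int],
    start_id: Tuple[int, int],      # taken from the special address
    diagonal_id: Tuple[int, int],   # taken from the next word
    second_word: int,
) -> Optional[Tuple[str, Optional[int]]]:
    """Return None if the cell is outside the rectangle, ("mpu", None) if the
    broadcast is for the MPU, or ("nvram", relative_address) otherwise."""
    (x, y), (x0, y0), (x1, y1) = cell_id, start_id, diagonal_id
    # The two diagonal corners may come in either order.
    inside = min(x0, x1) <= x <= max(x0, x1) and min(y0, y1) <= y <= max(y0, y1)
    if not inside:
        return None  # the cell ignores the rest of the broadcast
    if second_word & MPU_FLAG:
        return ("mpu", None)
    return ("nvram", second_word)
```

Because every cell runs the same check in parallel against its own ID, a single bus transaction reaches all cells of the rectangle at once.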

It should be noted that, under the condition that the storage space of the second special address field is relatively limited, the stored cell ID may not be able to completely determine the range of the target region according to the cell ID, and at this time, after receiving the second special address field, subsequent data needs to be received to determine the range of the target region together with the cell ID stored in the second special address field.

The operation of communicating between adjacent cells within the cell array includes: any cell in the cell array sends data to adjacent cells under the control of its microprocessor.

In this embodiment, each cell is provided with a bus controller, which is connected to the cell array bus. An intracellular bus is arranged inside the cell, the nonvolatile random access memory is a Slave device (Slave) of the intracellular bus, and the bus controller and the microprocessor are Master devices (masters).

In the communication method described above, the steps in which any cell in the cell array judges whether the target address is in the cell, identifies whether the communication is with the microprocessor of the cell, identifies whether the cell is in the target area, and connects the nonvolatile random access memory or the microprocessor are all performed by the bus controller, and the bus controller is connected with the nonvolatile random access memory through the internal bus of the cell.

In specific implementation, the priority of the read-write operation of the master CPU on the nonvolatile random access memory of any cell in the cell array is higher than that of the microprocessor in the cell on the corresponding nonvolatile random access memory. That is, if the microprocessor in a cell needs to read from or write to the nonvolatile random access memory in the cell, the microprocessor must wait until the master CPU finishes the read/write operation of the nonvolatile random access memory in the cell.
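The priority rule can be illustrated with a small sketch. It assumes a cycle-by-cycle model in which the master control CPU occupies the nonvolatile random access memory during a known set of cycles; the names grant_nvram and mpu_grant_cycle are hypothetical.

```python
def grant_nvram(cpu_requesting, mpu_requesting):
    """Decide which master gets the memory in one cycle; the CPU always wins."""
    if cpu_requesting:
        return "CPU"
    if mpu_requesting:
        return "MPU"
    return None


def mpu_grant_cycle(cpu_busy_cycles, mpu_request_cycle):
    """Return the first cycle at which a waiting MPU is granted the memory.

    Assumption: cpu_busy_cycles is the set of cycle numbers during which the
    master control CPU is reading or writing this cell's memory.
    """
    cycle = mpu_request_cycle
    while grant_nvram(cycle in cpu_busy_cycles, True) != "MPU":
        cycle += 1
    return cycle
```

For example, if the CPU occupies cycles 3 to 5 and the MPU requests access at cycle 4, the MPU is served at cycle 6.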

For the specific implementation of the communication method in the cell array computing system, reference may also be made to the implementation of the cell array computing system, and details are not repeated here.

In this embodiment, the internal network of the cell array can not only transmit data between neighboring cells but can also be extended to transmit data from one cell to any other cell, i.e., to enable inter-cell communication throughout the cell array.

Specifically, any two cells in the cell array of the cell array computing system can communicate with each other without depending on the master control CPU. The cells involved in intercellular communication include a starting cell, an end cell, and transit cells. The starting cell is the cell that sends data to the end cell; the end cell is the cell that finally receives the data sent by the starting cell; the transit cells are cells that are adjacent in sequence along the intercellular communication path and relay the data sent by the starting cell through the communication interface. The intercellular communication path is the data transmission and reception path constituted by the starting cell, the transit cells, and the end cell.

Through the communication interface between the adjacent cells in the cell array, multiple transfer of data between the adjacent cells is realized, so that any two cells in the cell array can communicate without depending on the master control CPU, the efficiency of intercellular communication is improved, the processing burden of the master control CPU is also reduced, and the overall performance of the computing system can be further improved.

It should be noted that the starting cell, the end cell and the transit cell are relative concepts with respect to a certain inter-cell communication process, because a certain starting cell may also serve as a transit cell or an end cell in other inter-cell communication processes, and a certain end cell may also serve as a transit cell or a starting cell in other inter-cell communication processes.

In a specific implementation, the cells in the cell array may further include a network controller connected to the microprocessor, where the network controller is configured to perform transceiving control on transmitted data, relayed data, or finally received data during cell-to-cell communication, and is further configured to send an interrupt signal to the microprocessor. In the present embodiment, a network controller is provided in each cell so as to relay data quickly without disturbing the MPUs, thereby reducing the processing load of the MPUs in the cell. In other embodiments, the network controller may not be provided, and the MPU may perform data relay.

In the present embodiment, "transmitted data" refers to data issued by the starting cell itself; "relayed data" refers to data that a transit cell forwards toward the end cell and that did not originate from the transit cell itself; "finally received data" refers to data received by the end cell, which has reached its destination after multiple relays and will not be relayed further. "Transmitted data", "relayed data", and "finally received data" may be identical in content; they are different designations for different communication phases.

In a specific implementation, the cells in the cell array may further include one or more groups of first-in first-out queues connected to the network controller, each group of first-in first-out queues corresponds to one cell adjacent to the cell, each group of first-in first-out queues includes an input first-in first-out queue and an output first-in first-out queue, the input first-in first-out queue is used to store data input into the cell for transfer or data finally received, and the output first-in first-out queue is used to store data output from the cell for transfer or data sent from the cell to other cells.

Taking the communication method between adjacent cells shown in fig. 3 as an example, the structure of a cell performing inter-cell communication in the cell array of this embodiment is shown in fig. 7. The network controller in fig. 7 is connected to the MPU and to 4 groups of FIFO queues, and the groups correspond one-to-one to the cells adjacent to this cell in the four directions of the two-dimensional plane (up, down, left, and right). In specific implementation, the communication channel between every two adjacent cells can share one corresponding group of FIFO queues. Each group of FIFO queues comprises an input FIFO and an output FIFO. From the point of view of one cell, the input FIFO stores data input from an adjacent cell and the output FIFO stores data output from this cell to an adjacent cell; the output FIFO of the adjacent cell therefore serves as the input FIFO of this cell, and the output FIFO of this cell serves as the input FIFO of the adjacent cell.

It should be noted that the cell shown in fig. 7 has 4 groups of FIFO queues. A cell located at one of the 4 corners of the rectangular cell array has only two adjacent cells and therefore corresponds to 2 groups of FIFO queues; a cell located on one of the 4 edges of the rectangular cell array, but not at a corner, has three adjacent cells and therefore corresponds to 3 groups of FIFO queues.
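The number of FIFO groups a cell needs follows directly from its position. The sketch below assumes a rectangular array with the four-direction adjacency of fig. 3 and zero-based (row, column) positions; the function name fifo_group_count is hypothetical.

```python
def fifo_group_count(row, col, rows, cols):
    """Count the adjacent cells (and hence FIFO groups) of the cell at (row, col).

    Assumption: the array has `rows` x `cols` cells and each cell is adjacent
    only to its up, down, left, and right neighbours.
    """
    neighbours = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return sum(0 <= r < rows and 0 <= c < cols for r, c in neighbours)
```

In a 4 x 4 array, for instance, the cell at (0, 0) gets 2 groups, the cell at (0, 2) gets 3 groups, and the cell at (1, 2) gets 4 groups.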

In this embodiment, the network controller is also connected to the MPU in the cell, and sends interrupt signals to the MPU, such as FIFO empty, FIFO full, new data, data out, and the like; the MPU may then issue data through the network controller, which is typically placed in a corresponding one of the output FIFO queues first.

It should be noted that the cell structure in fig. 7 only shows modules related to the communication between cells, and those skilled in the art can understand that the cell structure shown in fig. 7 can be fully combined with the cell structure shown in fig. 5.

In addition, in the embodiment, the FIFO queue is adopted to store data for inputting and outputting a certain cell, so that data transfer in the process of cell-to-cell communication can be more efficient, and the processing load of the MPU is reduced. In other embodiments, inputting and outputting data of a certain cell may be implemented by a register.

The embodiment of the present invention further provides a method for communication between cells in the above cell array computing system, including: the starting cell in the cell array sends the data destined for the end cell to a cell adjacent to the starting cell according to the selected sending direction; when any cell in the cell array receives data sent or relayed by an adjacent cell, if the cell determines from the ID of the end cell marked in the received data that it is the end cell, it stores the received data in its nonvolatile random access memory or notifies its microprocessor to process the received data; otherwise, the cell acts as a transit cell and, after selecting a sending direction, relays the received data to a cell adjacent to it.

In specific implementation, each piece of data involved in the inter-cell communication process contains the IDs of the starting cell and the end cell, and any cell can determine from the ID of the end cell indicated in the received data whether the data is addressed to itself or needs to be relayed further to another adjacent cell. A piece of data travels over the connections between adjacent cells and is relayed a number of times before it reaches the end cell. If the end cell needs to give feedback on the data sent by the starting cell, it can send the feedback data back according to the ID of the starting cell: the end cell takes the ID of the starting cell marked in the received data as the new end cell ID and marks it in the feedback data obtained by processing the received data. At this point, the original end cell becomes the starting cell of a new round of inter-cell communication, and the original starting cell becomes its end cell.
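The role of the two IDs can be illustrated with a minimal sketch. The Packet type, its field names, and the helper functions are hypothetical; the embodiment only requires that each piece of data carry the IDs of the starting cell and the end cell.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    start_id: tuple   # ID of the starting cell, assumed to be (row, column)
    end_id: tuple     # ID of the end cell
    payload: bytes


def is_addressed_to(packet, cell_id):
    """A cell keeps the data only if the end cell ID equals its own ID."""
    return packet.end_id == cell_id


def make_feedback(received, payload):
    """Build feedback data: the two IDs of the received data swap roles."""
    return Packet(start_id=received.end_id, end_id=received.start_id,
                  payload=payload)
```

Because the starting cell ID travels with the data, the end cell can answer without any routing table or help from the master control CPU.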

In specific implementation, in addition to the ID of the end cell, the data sent from the starting cell to the end cell also indicates either the address to be accessed in the end cell or the MPU. Storing the received data into the nonvolatile random access memory of the cell is performed after the end cell identifies the address to be accessed indicated in the received data; notifying the MPU of the cell to process the received data is performed after the end cell identifies that the MPU is indicated in the received data.

In practical implementation, if the destination cell identifies the address to be accessed, which is indicated in the received data, the received data may be directly written into the corresponding address in the nonvolatile random access memory of the destination cell by the network controller in the destination cell, in which case, the cells may "propagate", and one cell may download a program to another cell; if the destination cell identifies the MPU indicated in the received data, the received data is processed by the MPU in the destination cell.

In this embodiment, since each cell in the cell array further includes a network controller connected to the MPU, the following are all performed under the control of the network controller: sending data from the starting cell toward the end cell; receiving, by any cell in the cell array, data sent or relayed by an adjacent cell; determining whether the cell is the end cell or a transit cell; and storing the received data in the nonvolatile random access memory of the cell or notifying the MPU of the cell to process the received data.

In specific implementation, the data sent from the starting cell to the destination cell is input into the output fifo queue by the network controller, and then is output from the output fifo queue to the cell adjacent to the starting cell by the network controller; and if any cell in the cell array receives data sent by adjacent cells or transferred data, inputting the received data into the input first-in first-out queue, and inputting the data into the output first-in first-out queue when the received data is judged to be transferred.
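The flow of data through the queues can be sketched as below. For brevity the sketch assumes a one-dimensional chain of cells with a single input FIFO and a single output FIFO per cell, rather than one group per neighbour; the Cell class and the function names are hypothetical.

```python
from collections import deque


class Cell:
    def __init__(self, cell_id):
        self.cell_id = cell_id
        self.input_fifo = deque()   # data arriving from the adjacent cell
        self.output_fifo = deque()  # data waiting to leave this cell
        self.received = []          # finally received payloads


def send(cell, end_id, payload):
    """The starting cell places its own data in the output FIFO first."""
    cell.output_fifo.append((end_id, payload))


def transfer(sender, receiver):
    """Move one item over the link between two adjacent cells."""
    receiver.input_fifo.append(sender.output_fifo.popleft())


def process_input(cell):
    """Keep data addressed to this cell; queue everything else for relaying."""
    end_id, payload = cell.input_fifo.popleft()
    if end_id == cell.cell_id:
        cell.received.append(payload)
    else:
        cell.output_fifo.append((end_id, payload))
```

Sending from cell 0 to cell 2 through cell 1 then amounts to send, transfer, process_input, transfer, process_input, after which the payload sits in the received list of cell 2.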

In addition, if the network controller judges that the input first-in first-out queue or the output first-in first-out queue is empty or full, or receives data sent or transferred by an adjacent cell, or sends data or transfers data to the adjacent cell, an interrupt signal is sent to the microprocessor.

In a specific implementation, the starting cell or a transit cell may select the sending direction as follows: if a straight-line communication path can be formed between the starting cell or transit cell and the end cell, the sending direction is the direction from the starting cell or transit cell toward the end cell along that straight line; otherwise, the sending direction is the direction from the starting cell or transit cell toward a candidate cell, where a candidate cell is a cell, among those adjacent to the starting cell or transit cell, that is closer to the end cell. There may be two candidate adjacent cells; in this case, the candidate with fewer pending data output tasks is selected as the transit cell.

In this embodiment, the starting cell or the transit cell selects the transmission direction in the above manner, and may actually be considered as a path selection process for communication between cells in the cell array. Referring to fig. 8, each rectangle in fig. 8 represents a cell in the cell array, and all cells shown in fig. 8 are part of the entire cell array, assuming communication between adjacent cells is as shown in fig. 3.

Suppose point A represents a starting cell that is ready to send data to the end cell at point C. Since a straight-line communication path can be formed between point A and point C, the cell at point A sends the data to the adjacent cell at point B. The cell at point B, acting as a transit cell, likewise relays the data toward the cell at point C along the straight line between point A and point C. The data sent by the cell at point A is thus forwarded in turn by the mutually adjacent cells on the intercellular communication path between point A and point C until it reaches the cell at point C.

Suppose point D represents another starting cell that is ready to send data to the end cell at point G. A straight-line communication path cannot be formed between point D and point G. Among the cells adjacent to point D, the cell at point E and the cell at point F are closer to the end cell at point G, so these two cells are the candidate adjacent cells of the cell at point D. The candidate with fewer pending data output tasks can be selected as the transit cell; if both candidates have the same output load, either one may be selected. As shown in fig. 8, selecting the cell at point E or the cell at point F results in a different intercellular communication path.

It should be noted that, in the present embodiment, the selection of the path of the inter-cell communication is described by taking the communication method between the adjacent cells shown in fig. 3 as an example, and it can be understood by those skilled in the art that if the communication method between the adjacent cells shown in fig. 4 is adopted, more transmission directions may be selected.

In summary, in practical implementation, for each cell sending or relaying data, the network controller must select an adjacent cell as the next hop. When the sending or relaying cell and the end cell are on the same straight line, there is only one reasonable choice; in other cases, there are two equally reasonable choices, and the network controller chooses the neighbor whose traffic is less busy.
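The selection rule summarized above can be sketched as follows, assuming the four-direction adjacency of fig. 3 and (row, column) cell IDs. The names candidate_hops and choose_next_hop, and the output_load mapping from a neighbour to the number of items waiting in the output FIFO toward it, are hypothetical.

```python
def candidate_hops(current, end):
    """Return the adjacent cells that are closer to the end cell."""
    (row, col), (end_row, end_col) = current, end
    hops = []
    if end_row != row:
        hops.append((row + (1 if end_row > row else -1), col))
    if end_col != col:
        hops.append((row, col + (1 if end_col > col else -1)))
    return hops


def choose_next_hop(current, end, output_load):
    """Pick the next hop; with two candidates, prefer the less busy one.

    Assumption: output_load maps a neighbour's ID to the number of items
    pending in the output FIFO toward it (missing entries count as 0).
    On a tie the first candidate is taken, which is one arbitrary choice.
    """
    hops = candidate_hops(current, end)
    if not hops:
        return None  # current is already the end cell
    return min(hops, key=lambda hop: output_load.get(hop, 0))
```

When the two cells share a row or a column, candidate_hops returns a single cell, which corresponds to the straight-line case; otherwise it returns two cells and the load decides.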

If an input FIFO queue has data coming in, the network controller will first check it:

If the end cell is this cell: if the destination is a specific relative address, the network controller, which has Direct Memory Access (DMA) capability, stores the received data directly at the corresponding address in the nonvolatile random access memory and notifies the MPU by an interrupt; if the destination is the MPU, the network controller directly notifies the MPU by an interrupt signal to process the data.

If the end cell is another cell, or the MPU of this cell is sending data: if the end cell is on the same straight line as this cell, the network controller selects the correct direction and sends the data to the adjacent cell; in other cases there are two possible directions, and the network controller sends to the adjacent cell whose output FIFO queue is less full. If the output FIFO queues for the two candidate adjacent cells are equally loaded, either one may be chosen.
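The check performed on incoming data can be sketched as a single dispatch function. The dictionary layout of the data, the events list standing in for interrupt and forwarding hardware, and the name handle_incoming are all assumptions made for illustration.

```python
def handle_incoming(data, my_id, nvram, events):
    """Dispatch one item taken from an input FIFO.

    Assumptions: data is a dict with keys "end_id", "address" (an int, or
    None when the destination is the MPU) and "payload"; nvram is a dict
    modelling the nonvolatile memory; events records what the controller did.
    """
    if data["end_id"] != my_id:
        # Not for this cell: hand over to next-hop selection and relay.
        events.append(("relay", data["end_id"]))
    elif data["address"] is not None:
        # DMA-style write to the relative address, then tell the MPU.
        nvram[data["address"]] = data["payload"]
        events.append(("interrupt", "data stored"))
    else:
        # Destination is the MPU itself: only raise the interrupt.
        events.append(("interrupt", "process data"))
```

In the first case the relay direction would then be chosen by the next-hop rule described above.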

In actual implementation, when the several thousand MPUs in the cell array compute together, how to deliver the output data of each cell to the master control CPU becomes a problem. Generally, each MPU can store its output data at a predetermined address in the nonvolatile random access memory of its own cell, and the master control CPU can read the output data by polling the cells one by one. However, this does not suit every problem: when only a few cells in the cell array need to output data to the master control CPU, polling every MPU one by one is too inefficient.

Therefore, the cell array computing system provided by the embodiment of the invention further comprises: the cell array is also provided with at least one full-time output cell, the full-time output cell is used as a terminal cell to receive and store output data of other cells to the main control CPU, and the main control CPU is informed of reading the output data by an interrupt signal.

In a specific implementation, a FIFO queue may be further provided in the non-volatile random access memory of the full-time output cell, and all output data of other cells to the master CPU is stored in the FIFO queue, and the FIFO queue should have enough storage space to store all output data of other cells to the master CPU.

In practice, one or more cells in the cell array may be selected as full-time output cells; typically, cells that are conveniently placed for communicating with the master CPU are chosen. An interrupt line is arranged between the full-time output cell and the master CPU, over which the full-time output cell can send interrupt signals to the master CPU, for example when new output data has arrived from other cells, when the FIFO queue arranged in the MRAM is full, or when the FIFO queue arranged in the MRAM is empty.

Based on the cell array computing system with the full-time output cells, the embodiment of the invention also provides a communication method in the cell array computing system, which comprises the following steps: after receiving and storing the output data of other cells to the main control CPU, the full-time output cell sends an interrupt signal for informing reading to the main control CPU; and after receiving the interrupt signal for informing reading, the main control CPU reads the output data from the full-time output cell.
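The interaction between the full-time output cell and the master control CPU can be sketched as below. The class name, the capacity parameter, the interrupt names, and the choice to reject data when the queue is full are assumptions made for illustration; the embodiment instead expects the queue to be large enough for all output data.

```python
from collections import deque


class FullTimeOutputCell:
    def __init__(self, capacity, notify_cpu):
        self.queue = deque()          # FIFO queue held in the cell's memory
        self.capacity = capacity      # assumed maximum number of entries
        self.notify_cpu = notify_cpu  # callable standing in for the interrupt line

    def receive(self, source_id, data):
        """Store output data from another cell and interrupt the CPU."""
        if len(self.queue) >= self.capacity:
            self.notify_cpu("fifo_full")
            return False
        self.queue.append((source_id, data))
        self.notify_cpu("new_data")
        return True

    def read_all(self):
        """Called by the CPU after an interrupt; drains the queue in order."""
        items = list(self.queue)
        self.queue.clear()
        self.notify_cpu("fifo_empty")
        return items
```

The master control CPU then only touches one cell when results arrive, instead of polling every cell in the array.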

In particular implementations, the other cells may send the output data to the full-time output cell as follows: any one of the other cells, acting as a starting cell, sends the output data to an adjacent cell according to a selected sending direction; when any cell in the cell array receives output data sent by an adjacent cell, if the ID of the end cell marked in the output data matches the ID of this cell (the ID of the end cell marked in the output data being the ID of the full-time output cell, which indicates that this cell is the full-time output cell), the cell stores the output data in its nonvolatile random access memory; otherwise, the cell acts as a transit cell and, after selecting a sending direction, relays the output data to a cell adjacent to it.

In the process of sending the output data from the other cells to the full-time output cell, the starting cell or a transit cell may select the sending direction as follows: if a straight-line communication path can be formed between the starting cell or transit cell and the full-time output cell, the sending direction is the direction from the starting cell or transit cell toward the full-time output cell along that straight line; otherwise, the sending direction is the direction from the starting cell or transit cell toward a candidate cell, where a candidate cell is a cell, among those adjacent to the starting cell or transit cell, that is closer to the full-time output cell.

The operation of the full-time output cell according to embodiments of the present invention is illustrated in fig. 9. Fig. 9 shows a master CPU, a cell array and a cell array bus, where each cell in the cell array is represented simply by a small square. The cell at point J (the small square with a thick border) is a full-time output cell. Fig. 9 further shows the structure of the full-time output cell, indicated by the dotted arrow: the MRAM in the full-time output cell is provided with a FIFO queue for storing all output data from other cells to the master CPU.

Assuming that the cell at point H and the cell at point I need to provide output data to the master control CPU, the output data can be sent to the cell at point J by means of inter-cell communication; for the intercellular communication path from point H to point J and the path from point I to point J, please refer to fig. 9. Since inter-cell communication has been described in detail above, it is not described again here.

After receiving the output data sent by the cell at the point H or the cell at the point I, the cell at the point J can send an interrupt signal informing reading to the master control CPU, and after receiving the interrupt signal informing reading, the master control CPU can read the output data from the cell at the point J through the cell array bus.

The full-time output cells are arranged in the cell array, the full-time output cells are used as the destination cells to receive and store output data of other cells to the main control CPU, and the main control CPU is informed to read the output data in an interrupt signal mode, so that the efficiency of reading the output data by the main control CPU can be improved when only a few cells need to output data to the main control CPU.

An example of an application of the above cell array computing system is described below.

Speech recognition can be performed by comparing the input sound signal against a known speech library, and the comparison can be carried out in the time domain or the frequency domain. As the number of entries to be compared grows, for example to tens of thousands once different accents are taken into account, the computing power of a few CPUs alone is insufficient for real-time speech recognition.

The cell array computing system provided by the embodiment of the invention is very suitable for solving the problems.

Therefore, the embodiment of the present invention further provides a method for comparing data by using the cell array computing system, including: after the master control CPU selects all cells, or the cells in a target area, in the cell array, it broadcasts a comparison program to the nonvolatile random access memory of each selected cell; the master control CPU writes the sample that each selected cell is responsible for comparing into the designated address of that cell; the master control CPU broadcasts instructions to the microprocessors of the selected cells, so that each microprocessor completes initialization and then waits for the data to be compared; the master control CPU broadcasts the data to be compared to the microprocessors of the selected cells; and the microprocessor of each selected cell runs the comparison program and compares the received data to be compared with the sample for which the cell is responsible; if the comparison result is a match, the microprocessor of the selected cell uses the communication method in the cell array computing system to send the comparison result as output data to the full-time output cell for the master control CPU to read.
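The comparison method can be sketched sequentially as below. In the cell array every selected cell would run its comparison at the same time; the loop here only models the outcome. The function name run_comparison, the samples mapping, and the matches predicate are hypothetical.

```python
def run_comparison(samples, data, matches):
    """Model one broadcast of data to all selected cells.

    Assumptions: samples maps a cell ID to the sample that cell is
    responsible for; matches(sample, data) is the comparison program.
    Returns the entries that would arrive at the full-time output cell.
    """
    output_fifo = []
    for cell_id, sample in samples.items():  # conceptually in parallel
        if matches(sample, data):
            # Only matching cells send a result toward the output cell.
            output_fifo.append((cell_id, sample))
    return output_fifo


if __name__ == "__main__":
    samples = {(0, 0): "hello", (0, 1): "world", (1, 0): "cell"}
    print(run_comparison(samples, "world", lambda s, d: s == d))
```

Cells whose sample does not match stay silent, so the amount of data returned to the master control CPU is independent of the number of selected cells.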

In specific implementation, the data to be compared may be voice data to be recognized, image data to be recognized, or other data that needs to be compared.

In practical implementation, each MPU continuously receives voice data for comparison. In general, only one or a few of the hundreds to thousands of cells obtain a comparison result indicating that the data to be compared matches the cell's sample; these cells send the comparison result to the full-time output cell, which notifies the master control CPU by an interrupt signal to receive the comparison result.

If the data to be compared is voice data, the comparison can be performed in the time domain or the frequency domain. For comparison in the frequency domain, the master control CPU can perform a Fast Fourier Transform (FFT) on the voice data segment by segment and then broadcast the voice data converted into the frequency domain to the MPUs of the selected cells.
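The segment-by-segment transform can be sketched with a textbook recursive radix-2 FFT. This is an illustration only: it assumes that the segment length is a power of two and that trailing samples which do not fill a whole segment are dropped; the function names are hypothetical.

```python
import cmath


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def segmented_spectra(samples, segment_len):
    """Split the signal into consecutive segments and transform each one."""
    last_start = len(samples) - segment_len
    return [fft(samples[i:i + segment_len])
            for i in range(0, last_start + 1, segment_len)]
```

Each element of the returned list is the spectrum of one segment, which is the unit the master control CPU would broadcast to the selected cells.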

By using the cell array computing system provided with the full-time output cells to compare data, a large number of cells in the cell array can simultaneously carry out the operation of a comparison program, so that the cell array computing system has extremely strong parallel processing capability, solves the problem of communication bottleneck between a CPU and a memory in the prior art, and greatly improves the capability of real-time voice/image recognition.

As previously mentioned, there has been a simple method of broadcasting information from a cell to a certain target area in a cell array: the master control CPU reads the information and broadcasts the information. The embodiment further provides another implementation manner: the point-to-point communication function among the cells is expanded to the regional mass sending, and the mode can support higher parallelism and higher total bandwidth.

In the cell array computing system provided in this embodiment, any cell in the cell array can also be used as the starting point cell to perform mass-sending communication to all cells in the target area, a cell involved in the mass-sending communication and located in the target area is used as the starting point cell, or used as the destination cell, or used as both the transit cell and the destination cell, and a cell involved in the mass-sending communication and located outside the target area is used as the starting point cell or the transit cell.

In a specific implementation, the network controller connected to the microprocessor in each cell performs, in addition to cell-to-cell communication between any two cells, transceiving control of transmitted data, relayed data, or finally received data during the group transmission communication, and is further configured to transmit an interrupt signal to the microprocessor.

In practical implementation, the original sender of the intercellular mass communication (the cell in the cell array as the starting cell) is responsible for identifying the target area, and the mass sending of the data is still accomplished through a series of relays. It will be understood by those skilled in the art that the intercellular mass communication may also be considered as an effective superposition of point-to-point communication among multiple cells, and therefore, the specific implementation of the intercellular mass communication may refer to the implementation of communication between any two cells, for example, the cells in the cell array mentioned above may also include one or more sets of fifo queues connected to the network controller, and will not be described herein again.

On the basis that the cell array computing system supports intercell mass-sending communication, the embodiment of the invention also provides an intercell mass-sending communication method in the cell array computing system, which comprises the following steps: when any cell in the cell array is used as a starting cell to initiate mass-sending communication to all cells in a target area, if the starting cell is located in the target area, sending intercellular mass-sending data to all adjacent cells located in the target area, updating the target area aiming at each adjacent cell, and otherwise, sending the intercellular mass-sending data to the adjacent cells in a direction close to the target area; if the cell outside the target area receives the intercellular mass-sending data sent by the adjacent cell, after judging that the target area indicated in the intercellular mass-sending data does not contain the cell, the cell is used as a transfer cell, and the intercellular mass-sending data is transferred to the adjacent cell in the direction close to the target area; if the cell located in the target area receives the intercell mass-sending data sent by the adjacent cell, after the target area marked in the intercell mass-sending data is judged to contain the cell, the cell is used as an end-point cell, the received intercell mass-sending data is stored in a nonvolatile random access memory of the cell, or a microprocessor of the cell is informed to process the intercell mass-sending data, if the cell adjacent to the cell still exists in the target area, the cell is also used as a transfer cell, the received intercell mass-sending data is transferred to all the adjacent cells located in the target area, and the target area is updated aiming at each adjacent cell; the updated target area comprises one or more target areas divided by the target area before updating, each adjacent cell of the cells sending or transferring the intercellular mass data in the target area before updating is respectively contained in each updated target area, and the cells sending or transferring the intercellular mass data are excluded from the target area after updating.

In the present embodiment, the master CPU may broadcast data of a certain cell to a certain target area in the cell array, and the mass data involved in the inter-cell mass communication is referred to as "inter-cell mass data" in order to distinguish from "broadcast data of the master CPU". The cell initiating the intercell mass-sending communication defines a target area, the IDs of all cells or the ranges of all cell IDs in the target area are marked in the intercell mass-sending data, and when any cell receives the intercell mass-sending data, it can be determined whether the intercell mass-sending data is finally received by the cell, or needs to be further transferred to other adjacent cells, or both according to the target area marked in the intercell mass-sending data.

In addition, the target area is updated for each neighboring cell, specifically, one or more target areas are obtained by dividing the target area before updating (the cell that has sent or transferred the intercellular mass data is excluded from the target area after updating), wherein each target area includes one neighboring cell (i.e., a cell that is adjacent to the cell that has sent or transferred the intercellular mass data in the target area before updating), each neighboring cell continues to perform intercellular mass communication in the corresponding target area after updating, and accordingly, the target area indicated in the intercellular mass data is also updated.

In this embodiment, a description will be given taking as an example a communication method between adjacent cells as shown in fig. 3, and a shape of a target area specified by an origin cell from which mass-transfer communication is initiated is a rectangle. It should be noted that the intercell mass-sending communication method provided in this embodiment is a convenient and efficient method for practical implementation, and those skilled in the art can understand that in other embodiments, the intercell mass-sending communication method in the cell array computing system can also be applied to other adjacent cell communication methods or target areas with other shapes.
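One possible way to divide a rectangular target area can be sketched as follows. It is an assumption of this sketch, not a statement of the embodiment, that the rest of the sender's own row goes to the left and right neighbours while all rows above and below go to the upper and lower neighbours; this keeps every updated area rectangular, places each neighbour in its own area, and excludes the sender. Cell IDs are assumed to be (x, y) pairs, areas are (x0, y0, x1, y1) with inclusive bounds, and all function names are hypothetical.

```python
from collections import deque


def contains(area, cell):
    x0, y0, x1, y1 = area
    return x0 <= cell[0] <= x1 and y0 <= cell[1] <= y1


def split_area(cell, area):
    """Map each in-area neighbour of cell to its updated rectangular area."""
    (x, y), (x0, y0, x1, y1) = cell, area
    parts = {}
    if x > x0:
        parts[(x - 1, y)] = (x0, y, x - 1, y)    # rest of the row, left side
    if x < x1:
        parts[(x + 1, y)] = (x + 1, y, x1, y)    # rest of the row, right side
    if y > y0:
        parts[(x, y - 1)] = (x0, y0, x1, y - 1)  # all rows on one side
    if y < y1:
        parts[(x, y + 1)] = (x0, y + 1, x1, y1)  # all rows on the other side
    return parts


def step_towards(cell, area):
    """Next hop for a cell outside the area, moving closer to the area."""
    (x, y), (x0, y0, x1, y1) = cell, area
    if x < x0:
        return (x + 1, y)
    if x > x1:
        return (x - 1, y)
    return (x, y + 1) if y < y0 else (x, y - 1)


def mass_send(origin, area):
    """Simulate the relays; return how many times each cell received the data."""
    delivered = {}
    pending = deque([(origin, area, True)])
    while pending:
        cell, current_area, is_origin = pending.popleft()
        if not contains(current_area, cell):
            # Transit cell outside the area: relay toward the area.
            pending.append((step_towards(cell, current_area), current_area, False))
            continue
        if not is_origin:
            delivered[cell] = delivered.get(cell, 0) + 1
        for neighbour, sub_area in split_area(cell, current_area).items():
            pending.append((neighbour, sub_area, False))
    return delivered
```

Because the updated areas never overlap and always shrink, each cell of the target area receives the data exactly once and the relaying stops by itself when an area contains a single cell.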

In practice, the mode of sending or relaying varies depending on where the starting cell or the transit cell is located.

When a first cell serving as a starting cell or a transit cell is located at a corner of a rectangular target region: if one of the two adjacent sides of the rectangular target region that contain the first cell has only 1 cell, the updated target region is the rectangular region formed by excluding the first cell from the other of the two adjacent sides; otherwise, the updated target region includes two rectangular target regions, one of which is the rectangular region formed by excluding the first cell from either one of the two adjacent sides. It should be noted that the first cell in this embodiment is a generic term for a type of cell located at a corner of a rectangular target region.

Referring to fig. 10, assume that the cell at point K is the starting cell that initiates the inter-cell mass-sending communication, or a transit cell responsible for relaying inter-cell mass-sending data, and that the rectangular target area 101 is the target area determined before the cell at point K sends or relays the data. The cell at point K is inside the rectangular target area 101 and located at one of its corners. Because the horizontal side of the rectangular target area 101 contains only 1 cell, only one neighbor of the cell at point K can be selected as the next relay station. The network controller of the cell therefore sends the inter-cell mass-sending data to the cell at point L and updates the rectangular target area 101; the updated target area is the rectangular target area 102, which is equivalent to excluding the cell at point K from the rectangular target area 101. As the target area is continuously updated, relaying stops once only the last cell remains in the target area.

Assume that the cell at point M is likewise a starting cell that initiates the inter-cell mass-sending communication, or a transit cell responsible for relaying inter-cell mass-sending data, and that the rectangular target region 103 is the target region determined before the cell at point M sends or relays the data. The cell at point M is inside the rectangular target region 103 and located at one of its corners. Because both adjacent sides of the rectangular target region 103 contain more than 1 cell, two neighbors of the cell at point M can be selected as next relay stations. The network controller of the cell sends the inter-cell mass-sending data to the cell at point N and the cell at point O and updates the rectangular target region 103. The updated target region includes two rectangular target regions, the rectangular target region 104 and the rectangular target region 105, which together are equivalent to the rectangular target region 103 with the cell at point M excluded. The rectangular target regions 104 and 105 can each be used as an independent target region in which data relaying continues by the same method. As the target region is continuously updated, relaying stops once only the last cell remains in the target region.

When a second cell serving as a starting cell or a transit cell is located on a side of a rectangular target region: if the side of the rectangular target region adjacent to the side where the second cell is located has only 1 cell, the updated target region includes two rectangular target regions, formed by excluding the second cell from the side where the second cell is located; otherwise, the updated target region includes three rectangular target regions, two of which are the two rectangular regions formed by excluding the second cell from the side where the second cell is located. It should be noted that the second cell in this embodiment is a generic term for a type of cell located on a side of the rectangular target region.

Referring to fig. 11, assume that the cell at point P is a starting cell that initiates the inter-cell mass-sending communication, or a transit cell responsible for relaying inter-cell mass-sending data, and that the rectangular target region 111 is the target region determined before the cell at point P sends or relays the data. The cell at point P is inside the rectangular target region 111 and located on one of its sides. Because the side of the rectangular target region 111 adjacent to the side where the cell at point P is located contains more than 1 cell, three neighbors of the cell at point P can be selected as next relay stations. The network controller of the cell sends the inter-cell mass-sending data to the cell at point Q, the cell at point R, and the cell at point S, respectively, and updates the rectangular target region 111. The updated target region includes three rectangular target regions, namely the rectangular target regions 112, 113 and 114, which together are equivalent to the rectangular target region 111 with the cell at point P excluded. Of these, the rectangular target regions 112 and 113 are the two rectangular regions formed by excluding the cell at point P from the side on which it is located. The rectangular target regions 112, 113 and 114 can each be used as an independent target region in which data relaying continues by the same method. As the target region is continuously updated, relaying stops once only the last cell remains in the target region.

It can be understood that, if the side of the target area (not labeled in fig. 11) adjacent to the side where the cell at point P is located contains only 1 cell, then two neighbors of the cell at point P can be selected as next relay stations. The network controller of the cell sends the inter-cell mass-sending data to the cell at point Q and the cell at point R, respectively, and updates the target area; the updated target area includes two rectangular target regions, specifically the rectangular target region 112 and the rectangular target region 113.

When a third cell, which is a starting cell, is located inside a rectangular target region, the updated target region includes four rectangular target regions, two of the target regions are two rectangular regions formed by excluding the third cell from the row or column where the third cell is located, and the other two target regions are two rectangular regions formed by dividing the rectangular target region before updating by the row or column where the third cell is located. It should be noted that the third cell in this embodiment is a generic term for a type of cell located inside a rectangular target region, and the inside of the rectangular target region refers to regions other than "corners" and "sides".

Referring to fig. 12, assume that the cell at point T is the starting cell that initiates the inter-cell mass-sending communication (in this embodiment, the cell at point T is not a transit cell responsible for relaying inter-cell mass-sending data), and that the rectangular target region 121 is the target region determined before the cell at point T sends out the inter-cell mass-sending data. The cell at point T is located inside the rectangular target region 121, so four neighbors of the cell at point T can be selected as next relay stations. The network controller of the cell sends the inter-cell mass-sending data to the cell at point U, the cell at point V, the cell at point W, and the cell at point X, respectively, and updates the rectangular target region 121. The updated target region includes four rectangular target regions, namely the rectangular target regions 122, 123, 124 and 125, which together are equivalent to the rectangular target region 121 with the cell at point T excluded. The rectangular target regions 122 and 123 are the two rectangular regions formed by excluding the cell at point T from the row in which it is located, and the rectangular target regions 124 and 125 are the two rectangular regions formed by dividing the rectangular target region 121 along that row. The rectangular target regions 122, 123, 124 and 125 can each be used as an independent target region in which data relaying continues by the same method. As the target region is continuously updated, relaying stops once only the last cell remains in the target region.
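The corner, side and interior cases described above can be summarized in a short sketch. The Python below is illustrative only and is not taken from the patent: it assumes that a rectangular target region is given by inclusive coordinates (x0, y0, x1, y1), that cells are adjacent in the four axis directions, and that the region is cut along the side on which the sending cell lies (along the row for corner and interior cells). The names split_target and mass_send are hypothetical.

```python
def split_target(rect, cell):
    """Divide rect (with cell excluded) into sub-rectangles, one per neighbor.

    rect is (x0, y0, x1, y1) with inclusive bounds; cell is (x, y) inside it.
    Returns a list of (neighbor, sub_rect) pairs.
    """
    x0, y0, x1, y1 = rect
    cx, cy = cell
    on_vertical_side = cx in (x0, x1)
    on_horizontal_side = cy in (y0, y1)
    if on_vertical_side and not on_horizontal_side:
        # Cell lies on a vertical side only: cut along its column.
        parts = [
            ((cx, cy - 1), (cx, y0, cx, cy - 1)),
            ((cx, cy + 1), (cx, cy + 1, cx, y1)),
            ((cx - 1, cy), (x0, y0, cx - 1, y1)),
            ((cx + 1, cy), (cx + 1, y0, x1, y1)),
        ]
    else:
        # Corner, horizontal side, or interior: cut along its row.
        parts = [
            ((cx - 1, cy), (x0, cy, cx - 1, cy)),
            ((cx + 1, cy), (cx + 1, cy, x1, cy)),
            ((cx, cy - 1), (x0, y0, x1, cy - 1)),
            ((cx, cy + 1), (x0, cy + 1, x1, y1)),
        ]
    # Keep only non-empty rectangles.
    return [(n, r) for n, r in parts if r[0] <= r[2] and r[1] <= r[3]]


def mass_send(rect, start):
    """Simulate the relay; return the cells in the order they receive data."""
    delivered = []
    pending = [(start, rect)]
    while pending:
        cell, region = pending.pop()
        delivered.append(cell)
        # Each neighbor continues the relay inside its own updated region.
        pending.extend(split_target(region, cell))
    return delivered
```

Under these assumptions, an interior starting cell yields four sub-regions, a side cell three (or two when the adjacent side has a single cell), and a corner cell two (or one). Because the sub-regions never overlap, mass_send delivers the data to every cell of the original region exactly once, and every relaying cell lies on a corner or side of its own sub-region.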

In this embodiment, when a fourth cell serving as a starting cell or a transit cell is located outside the target region: if a straight-line communication path can be formed between the fourth cell and some cell in the target region, the sending direction of the mass-sending data sent or relayed by the fourth cell is the direction from the fourth cell toward the target region along that straight line; otherwise, the sending direction is the direction from the fourth cell to a candidate adjacent cell, where a candidate adjacent cell is a cell, among the cells adjacent to the fourth cell, that is closer to the target region. It should be noted that the fourth cell in this embodiment is a generic term for a type of cell located outside the rectangular target region.

Referring to fig. 13, assume that the cell at point Y1 is the starting cell that initiates the inter-cell mass-sending communication, and that the rectangular target region 131 is the target region determined before the cell at point Y1 sends out the inter-cell mass-sending data. The cell at point Y1 is outside the rectangular target region 131. Because it lies between the extension lines of two opposite sides of the rectangular target region, it can form a straight-line communication path with the cell at point Y3 in the rectangular target region, so only one neighbor can serve as the next relay station. The network controller of the cell at point Y1 sends the inter-cell mass-sending data to that neighbor, namely the cell at point Y2, which acts as a transit cell responsible for relaying the data. The cell at point Y2 relays the data in the direction shown by the dotted arrow in fig. 13 until it reaches the cell at point Y3. The cell at point Y3 is located on a side of the rectangular target region 131, and relaying within the rectangular target region 131 then continues according to the method described above.

Continuing with fig. 13, assume that the cell at point Z1 is the starting cell that initiates the inter-cell mass-sending communication, and that the rectangular target region 131 is the target region determined before the cell at point Z1 sends out the inter-cell mass-sending data. The cell at point Z1 is outside the rectangular target region 131. Because it does not lie between the extension lines of two opposite sides of the rectangular target region, it cannot form a straight-line communication path with any cell in the rectangular target region. In this case two neighbors can serve as the next relay station: the cell at point Z2 and the cell at point Z3 are the candidate adjacent cells of the cell at point Z1, being the cells, among those adjacent to the cell at point Z1, that are closer to the rectangular target region 131. In practical implementation, one or more cells with a lighter load, specifically cells with fewer pending tasks for outputting data, may be selected as the next relay station according to the actual communication situation. Starting from the cell at point Z1, the inter-cell mass-sending data can reach the cell at point Z4 through either of two feasible relay paths. The cell at point Z4 is located at a corner of the rectangular target region 131, and relaying within the rectangular target region 131 then continues according to the method described above.
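The choice of sending direction for a cell outside the target region can be sketched as follows. This Python is a simplified illustration under stated assumptions, not the patented implementation: rectangles use inclusive coordinates (x0, y0, x1, y1), load is a hypothetical dictionary giving the number of pending output tasks per cell, and a single least-loaded candidate is chosen at each hop (the text also allows more than one). The names candidate_hops and route_to_region are hypothetical.

```python
def candidate_hops(rect, cell):
    """Return the adjacent cells that move cell closer to rect.

    One candidate means a straight-line path exists (the cell lies between
    the extension lines of two opposite sides); two candidates mean it does
    not; an empty list means the cell is already inside rect.
    """
    x0, y0, x1, y1 = rect
    cx, cy = cell
    step_x = 1 if cx < x0 else -1 if cx > x1 else 0
    step_y = 1 if cy < y0 else -1 if cy > y1 else 0
    hops = []
    if step_x:
        hops.append((cx + step_x, cy))
    if step_y:
        hops.append((cx, cy + step_y))
    return hops


def route_to_region(rect, start, load):
    """Relay from start until the data enters rect; return the path taken."""
    path = [start]
    cell = start
    while True:
        hops = candidate_hops(rect, cell)
        if not hops:
            return path  # the last cell of the path lies inside rect
        # Prefer the candidate with fewer pending output tasks.
        cell = min(hops, key=lambda c: load.get(c, 0))
        path.append(cell)
```

For example, with a target region (2, 2, 4, 4), a cell at (3, 0) has the single candidate (3, 1) and relays in a straight line, whereas a cell at (0, 0) has two candidates and the less loaded one is used. Once the data enters the region on a side or corner, relaying continues with the in-region method.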

The intercell mass-sending communication method in the cell array computing system provided by the embodiment can support higher parallelism and obtain much higher total bandwidth by expanding the point-to-point communication function among the cells to regional mass sending, thereby further improving the overall performance of the computing system.

In this embodiment, more than one unit ("cell") with independent calculation and storage functions is combined into a two-dimensional or three-dimensional array ("cell array"). Each cell includes a microprocessor and a nonvolatile random access memory, and the nonvolatile random access memory supports both random access to the data involved in the calculations performed by the microprocessor and storage of the instruction codes of software and of data to be permanently stored. The three functions of memory, storage and calculation are thus integrated into each cell, and a dense communication network is formed among the cells, which overcomes the communication bottleneck that exists between the CPU, the memory and the storage in existing computer architectures.

Those skilled in the art know that the ever-increasing complexity of CPUs relies on generation after generation of semiconductor process evolution. This creates a problem: as semiconductor chips become more complex and processes more advanced, more than 1 billion MOS devices may be implemented on a single chip. However, if one of those 1 billion components is damaged in the chip manufacturing process, the whole chip generally becomes a waste product, so the component damage rate needs to be controlled to less than one in a billion. This poses a great challenge to the semiconductor process, and a low yield greatly increases the cost of the chip.

The architecture of the cell array computing system shown in fig. 1 provided by the embodiment of the invention integrates a large number of independently operable cells on one chip. If a small number of cells have problems in the production and manufacturing process, the whole cell array computing system may be unable to operate normally, so that the chip integrating the cell array computing system is scrapped, the product yield is reduced, and the cost of the chip is increased.

Therefore, the embodiment of the present invention further provides a cell array computing system with an added redundant design in the cell array, wherein a part of redundant cells is reserved in the cell array as a spare cell, and when any other cell in the cell array is determined to be a damaged cell, the spare redundant cell is used as a corresponding replacement cell for the damaged cell, so that the work of the whole system is not affected when a small part of cells have production and manufacturing problems, thereby improving the product yield and reducing the production and manufacturing cost.
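The effect of redundancy on yield can be illustrated with a simple calculation. The Python sketch below is not from the patent: it assumes that defects occur independently, that a chip without redundancy works only if every device is good, and that a chip with redundancy works as long as the number of damaged cells (spares included) does not exceed the number of spare cells. The function names and the example rates are hypothetical.

```python
import math


def monolithic_yield(n_devices, defect_rate):
    """Probability that all n_devices are good: (1 - p) ** n."""
    # log1p keeps precision when defect_rate is very small.
    return math.exp(n_devices * math.log1p(-defect_rate))


def redundant_yield(n_cells, n_spares, cell_defect_rate):
    """Probability that at most n_spares of n_cells cells are damaged.

    n_cells counts every cell on the chip, spare cells included.
    """
    q = cell_defect_rate
    return sum(
        math.comb(n_cells, k) * q**k * (1 - q) ** (n_cells - k)
        for k in range(n_spares + 1)
    )
```

Under this model, a chip with 1 billion devices and a per-device defect rate of one in a billion has a yield of only about 0.37 (that is, 1/e), which is why the defect rate must be held well below that level. By contrast, redundant_yield stays high as long as the expected number of damaged cells is small compared with the number of spares.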

As shown in fig. 14, on the basis of the cell array computing system shown in fig. 1, more than one redundant cell is reserved in the cell array as a spare cell, and the spare cell is used as a corresponding replacement cell for a damaged cell when any other cell in the cell array is determined to be the damaged cell; the cell array and the cell array bus are integrated on one chip.

It should be noted that, in this embodiment, there are communication interfaces between adjacent cells in the cell array, which can send and receive data to and from each other, but in other embodiments, in order to simplify the design, the cells in the cell array computing system may not have the function of mutual communication, but only the function of communication between the main control CPU and each cell in the cell array is retained.

In practical implementation, a part of redundant cells can be reserved in the cell array as spare cells, and when the cell array is a two-dimensional array, for example, one row or one column of cells in the cell array can be reserved as spare cells. The spare cells are only used as corresponding replacement cells under the condition that other cells in the cell array are damaged, the spare cells which do not become the replacement cells are always in an unopened state and do not participate in the operation of the cell array computing system, and only the spare cells which become the replacement cells participate in the operation of the cell array computing system.

When a chip integrating at least the cell array and the cell array bus undergoes in-line testing during production, each cell in the cell array is tested. In practice, each cell may be made to test itself by broadcasting a test program, which determines which cells are damaged. If a cell fails the test, the cell is declared dead (damaged), a normal (undamaged) cell is selected from the spare cells as its replacement, and the address of the replaced cell (i.e., the location of the damaged cell in the cell array) is written into the corresponding non-volatile register of the replacement cell.

Therefore, if a certain cell other than the spare cells fails to pass the test, the cell is determined to be a damaged cell, a normal cell is selected from the spare cells as a corresponding replacement cell of the damaged cell, and the position of the replaced damaged cell in the cell array is stored in the replacement cell.
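The assignment of replacement cells after the self-test can be sketched as follows. This Python is a minimal illustration, assuming that the self-test results are available as a dictionary from cell position to pass/fail and that spare cells are assigned in list order; the function name assign_replacements and the data layout are hypothetical and not part of the patent.

```python
def assign_replacements(test_passed, spare_ids):
    """Map each damaged non-spare cell to a normal spare cell.

    test_passed: dict mapping cell position -> True if the self-test passed.
    spare_ids:   positions of the cells reserved as spares.
    Returns a dict {damaged_cell: replacement_cell}. The position of the
    damaged cell is what would be stored in the replacement cell.
    """
    spare_ids = list(spare_ids)
    # Only spares that pass the test themselves can act as replacements.
    free_spares = [s for s in spare_ids if test_passed[s]]
    replacement_of = {}
    for cell_id, passed in test_passed.items():
        if passed or cell_id in spare_ids:
            continue
        if not free_spares:
            raise RuntimeError("not enough normal spare cells to repair the chip")
        replacement_of[cell_id] = free_spares.pop(0)
    return replacement_of
```

For a 3 x 3 array whose last column is reserved as spares, a failed cell at (0, 1) is mapped to the first spare that passed its own test; a damaged spare is simply never used.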

In this embodiment, once some cells in the cell array are determined to be damaged cells, these damaged cells are always in a closed state (also referred to as an off state) while the cell array computing system is in operation. The corresponding replacement cells of the damaged cells are responsible for listening to instructions on the cell array bus and perform the relevant operations when they encounter read, write, and broadcast instructions involving the replaced cells.

As previously described, the protocol for reading, writing, and broadcasting to cells in the cell array is implemented as shown in FIG. 5. Each cell has a bus controller for listening to instructions on the cell array bus and executing the instructions associated with that cell. In this embodiment, the bus controller of a spare cell is used to identify the communication between the master CPU and the damaged cell replaced by that spare cell. Specifically, when listening to instructions on the cell array bus, the bus controller of the replacement cell corresponding to a damaged cell recognizes an instruction related to the replaced cell as an instruction related to its own cell (in effect, the bus controller of the replacement cell assumes that its cell has the ID of the replaced cell when listening to the cell array bus). It then connects to the microprocessor to pass on the instruction or data sent by the master CPU or to read the status, or connects to the nonvolatile random access memory through the internal bus of the cell to read or write data.

Redundant cell design requires the addition of a non-volatile memory within the bus controller to write the address (x and y coordinates) of the replaced cell to the spare cell during in-line testing. Therefore, in this embodiment, a first nonvolatile memory is provided in the bus controller of the spare cell for storing the position of the damaged cell replaced by the cell in the cell array. The first nonvolatile memory may be a one-time programmable memory, for example, a currently mature FUSE technology (a nonvolatile memory that can be written only once) may be used.

When the cell array bus has memory read-write command, the address of the corresponding cell (cell ID) can be deduced from the address, the bus controller compares the received cell address with the cell address replaced by the cell, if the cell address is the same as the cell address, the relevant read-write command is executed; when there is a broadcast command on the cell array bus, the bus controller checks whether the replaced cell is within the broadcast area, and if so, executes the relevant read-in command.
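The address comparison performed by the bus controller of a replacement cell can be sketched as follows. This Python is illustrative only. The way a cell ID is derived from a memory address is an assumption made for the example (the cell index is taken from the high address bits and mapped row by row onto a grid of hypothetical width GRID_WIDTH), and the class and method names are likewise hypothetical.

```python
CELL_MEM_BITS = 16  # assumed: each cell owns 2**16 addresses
GRID_WIDTH = 8      # assumed: cells per row of the array


def cell_id_from_address(addr):
    """Derive the (x, y) cell ID from a global memory address (assumed layout)."""
    index = addr >> CELL_MEM_BITS
    return (index % GRID_WIDTH, index // GRID_WIDTH)


class BusController:
    def __init__(self, own_id, replaced_id=None):
        # replaced_id models the first nonvolatile memory of a spare cell:
        # when set, the controller answers for the replaced cell instead.
        self.effective_id = replaced_id if replaced_id is not None else own_id

    def handles_memory_access(self, addr):
        """True if a memory read/write at addr targets this controller's cell."""
        return cell_id_from_address(addr) == self.effective_id

    def handles_broadcast(self, rect):
        """True if the (replaced) cell lies in the broadcast area (x0, y0, x1, y1)."""
        x0, y0, x1, y1 = rect
        x, y = self.effective_id
        return x0 <= x <= x1 and y0 <= y <= y1
```

A spare cell physically located at (7, 0) that replaces the damaged cell (2, 1) would, in this sketch, respond to addresses belonging to (2, 1) and to broadcasts whose area contains (2, 1), regardless of where the spare itself sits in the array.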

As previously mentioned, communication can be performed between any two cells in the cell array, and the cells involved in intercellular communication include a start cell, an end cell, and a transit cell; in addition, any cell in the cell array can also be used as the starting cell to perform mass-sending communication to all cells in the target area.

In this embodiment, a dead cell may also be marked as dead in a nonvolatile memory in each of the adjacent cells around it (up to four cells when the communication method between adjacent cells shown in fig. 3 is used), together with the address of its replacement cell. This information is used to guide the transmission protocol of the inter-cell communication network.

Communication of the cell array internal network is implemented as shown in fig. 7. As mentioned above, the cells in the cell array further include a network controller connected to the microprocessor, and the network controller is configured to perform transceiving and routing control on the transmitted data, the relayed data, or the finally received data during the mass communication or the cell-to-cell communication between any two cells, and is further configured to send an interrupt signal to the microprocessor. In a practical embodiment, the network controller inside each cell is responsible for reading and writing the FIFOs for communication with four adjacent cells, and for communication with the MPU for that cell.

The redundant cell design requires that a non-volatile memory also be added inside the network controller to store the addresses of the replacement cells of adjacent dead cells. Therefore, in this embodiment, a second nonvolatile memory is provided in the network controller, and the second nonvolatile memory in all normal cells adjacent to a damaged cell is used to mark the damaged cell adjacent to that cell and to store the position in the cell array of the corresponding replacement cell of the damaged cell. When data is communicated, the network controller of a cell neighboring the damaged cell bypasses the damaged cell when performing routing control, and if the damaged cell is the end cell (for inter-cell communication between any two cells) or one of the end cells (for mass-sending communication), it controls the forwarding of the data to the corresponding replacement cell of the damaged cell. Like the first non-volatile memory, the second non-volatile memory may also be a one-time programmable memory, for example one implemented with FUSE technology.

In practice, when executing a point-to-point transmission protocol (i.e., communication between any two cells) whose path might pass through a dead cell, the network controller selects a path that bypasses the dead cell.

When a point-to-point transmission protocol is executed and the end point is a dead cell, the relevant message is forwarded to the replacement cell. Specifically, the interrupt may be generated by a network controller, and the forwarding operation may be implemented by software by an MPU.

When a regional broadcast among cells (i.e., inter-cell mass-sending communication) is performed and the target area contains a dead cell, the relevant message is redirected and forwarded to the replacement cell, and the subsequent regional broadcast transmission is re-planned. Specifically, the interrupt may be generated by the network controller, and the forwarding operation may be implemented in software by the MPU.
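The bypass and redirect behavior of a network controller next to a dead cell can be sketched for the point-to-point case as follows. This Python is a simplified illustration and not the patented routing protocol: dead_neighbors is a hypothetical dictionary standing in for the second nonvolatile memory of the current cell (dead adjacent cell position mapped to replacement cell position), array boundaries are not checked, and the tie-breaking rule is an arbitrary choice made for the example.

```python
def next_hop(current, dest, dead_neighbors):
    """Choose the next relay cell for a point-to-point message.

    Returns (hop, target): hop is the adjacent cell to send to (None when the
    message has arrived) and target is the possibly redirected end cell.
    """
    # Redirect: if the end cell is a known dead neighbor, aim at its replacement.
    target = dead_neighbors.get(dest, dest)
    if current == target:
        return None, target
    cx, cy = current
    tx, ty = target
    neighbors = [(cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)]
    # Bypass: never relay through a cell marked as dead.
    live = [n for n in neighbors if n not in dead_neighbors]

    def cost(n):
        dx, dy = abs(n[0] - tx), abs(n[1] - ty)
        # Shortest remaining distance first; prefer a sidestep over going back.
        return (dx + dy, max(dx, dy))

    return min(live, key=cost), target
```

In this sketch, a message travelling from (1, 1) to (3, 1) past a dead cell at (2, 1) sidesteps to (1, 2), and a message addressed to the dead cell (2, 1) itself is re-targeted to its replacement cell.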

Based on the cell array computing system for reserving redundant cells, an embodiment of the present invention further provides a testing method for the cell array computing system, including: broadcasting a test program to enable each cell in the cell array to perform self-testing; and if one cell except the spare cell can not pass the test, determining that the cell is a damaged cell, selecting a normal cell from the spare cells as a corresponding replacement cell of the damaged cell, and storing the position of the replaced damaged cell in the cell array in the replacement cell.

In this embodiment, after determining that a certain cell other than the spare cell is a damaged cell, the damaged cell may be calibrated in all normal cells adjacent to the damaged cell, and the position of the corresponding replacement cell of the damaged cell in the cell array may be stored.

Based on the cell array computing system with reserved redundant cells, an embodiment of the present invention further provides a communication method in the cell array computing system, including: the replacement cell listens for memory read-write instructions on the cell array bus, and if the ID of the target cell determined by a memory read-write instruction is the same as the ID of the damaged cell replaced by this cell, the memory read-write instruction is executed; the replacement cell also listens for broadcast instructions on the cell array bus, and if the damaged cell replaced by this cell is within the target area determined by a broadcast instruction, the relevant read-in instruction is executed.

In addition, an embodiment of the present invention further provides a method for communication between cells in the cell array computing system with reserved redundant cells, which includes, in addition to the entire contents of the method for communication between cells in the cell array computing system shown in fig. 1: when the cell adjacent to the starting cell or the transit cell is a damaged cell, selecting a sending direction that bypasses the damaged cell; when the end cell is a damaged cell, redirecting and forwarding the relevant data to the corresponding replacement cell of the damaged cell.

In addition, an embodiment of the present invention further provides a method for mass-sending communication between cells in the cell array computing system with reserved redundant cells, which includes, in addition to all the contents of the method for mass-sending communication between cells in the cell array computing system shown in fig. 1: if an end cell in the mass-sending communication is a damaged cell, redirecting and forwarding the relevant data to the corresponding replacement cell of the damaged cell, and re-planning the subsequent updated target area. In practice, damaged cells are also excluded from the updated target area.

The advantages of the redundant design of cell arrays are summarized below:

In a traditional computing architecture, huge CPU cores are adopted, and each core contains a large number of components (tens of millions or even hundreds of millions). If even one component has a problem during production and manufacturing, the whole core or even the whole chip becomes a waste product. As the integrated circuit process evolves further and the size of each device shrinks to only 20-30 nm or even smaller, the product yield becomes difficult to control, and the cost rises dramatically.

The architecture of the cell array computing system provided by the embodiment of the invention adopts the micro kernel, a large number of cells capable of independently operating are integrated on one chip, and the redundant design is matched, so that the production and manufacturing problems of a small part of cells do not influence the work of the whole chip, and the dilemma is solved.

The cell array computing system for reserving redundant cells, the testing method thereof, and the communication method provided in the embodiment of the present invention may also refer to the cell array computing system shown in fig. 1, and the implementation related contents of the communication method between cells and the inter-cell mass-sending communication method thereof.

It should be noted that, in the embodiment of the present invention, the cell array computing system with reserved redundant cells is described by taking a two-dimensional cell array as an example. In other embodiments, the cell array may also be a three-dimensional cell array formed by stacking more than one two-dimensional cell array, in which case the concept of "adjacent cells" in the cell array is not limited to a two-dimensional plane but extends to three-dimensional space. In the three-dimensional cell array, if a communication method between adjacent cells like that shown in fig. 3 is used, any cell has adjacent cells in the six directions of a spatial rectangular coordinate system, namely the positive and negative directions of the x-axis, the y-axis, and the z-axis. In practical implementation, a plurality of 2D cell array chips can be stacked together to form a 3D chip, with Through-Silicon Vias (TSVs) used to establish vertical communication between adjacent cells, that is, communication between adjacent cells in two adjacent two-dimensional cell arrays is established through TSVs. The 3D cell array chip increases the scale of the cell array and expands the bandwidth of internal communication while keeping the advantage of low power consumption.
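The extension of adjacency to three dimensions can be sketched as follows. This Python is illustrative only: the coordinate convention (x, y within a layer, z as the layer index) and the function name neighbors_3d are assumptions, and the sketch additionally clips neighbors at the array boundary, where a cell has fewer than six neighbors.

```python
def neighbors_3d(cell, dims):
    """List the cells adjacent to cell in a stacked (3D) cell array.

    cell is (x, y, z); dims is (nx, ny, nz), the array size along each axis.
    Neighbors along z sit in the layers above and below; in the text these
    links are the ones established through TSVs.
    """
    x, y, z = cell
    nx, ny, nz = dims
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    result = []
    for dx, dy, dz in steps:
        n = (x + dx, y + dy, z + dz)
        if 0 <= n[0] < nx and 0 <= n[1] < ny and 0 <= n[2] < nz:
            result.append(n)
    return result
```

In a 3 x 3 x 3 array, the center cell (1, 1, 1) has six neighbors under this convention, two of which lie in the adjacent layers.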

Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. A cell array computing system, comprising: a master control CPU, a cell array and a cell array bus;

the cell array is a two-dimensional array or a three-dimensional array consisting of more than one cell with calculation and storage functions, wherein each cell comprises a microprocessor and a nonvolatile random access memory; the non-volatile random access memory is used for random access of data involved in the calculation of the microprocessor and is also used for storing instruction codes of software and data needing to be permanently stored;

each cell stores a respective location in the cell array as an ID for software or hardware reading in the cell;

the master CPU communicates with each cell in the cell array via the cell array bus; the master CPU communicating with each cell in the cell array over the cell array bus comprises at least one of:

broadcasting instructions to the microprocessors of all cells within the target area;

more than one redundant cell is reserved in the cell array as a spare cell, and the spare cell is used as a corresponding replacement cell of the damaged cell when any other cell in the cell array is determined to be the damaged cell;

the cell array and the cell array bus are integrated on one chip.

2. The cell array computing system of claim 1, wherein the cells in the cell array further comprise a bus controller and an internal cell bus, the bus controller is connected to the cell array bus, the microprocessor and the internal cell bus, and the bus controller is configured to listen to the instructions on the cell array bus and, for an instruction related to its own cell, to connect to the microprocessor to pass on the instruction or data sent by the master control CPU or to read the status, or to connect through the internal cell bus to the nonvolatile random access memory to read and write data; a first nonvolatile memory is arranged in the bus controller of the spare cell and used for storing the position in the cell array of the damaged cell replaced by that cell; when the chip is in operation, the damaged cell is in a closed state, and when the bus controller of the replacement cell corresponding to the damaged cell listens to the instructions on the cell array bus, it recognizes an instruction related to the cell replaced by this cell as an instruction related to this cell.

3. The cell array computing system of claim 1, wherein adjacent cells in the cell array have communication interfaces for transmitting and receiving data to and from each other; any two cells can communicate with each other, and the cells involved in intercellular communication include a start cell, an end cell and a transit cell, the start cell is a cell which sends data to the end cell, the end cell is a cell which finally receives the data sent by the start cell, the transit cell is a cell which is adjacent in sequence along an intercellular communication path and relays the data sent by the start cell through the communication interface, and the intercellular communication path is a data sending and receiving path formed by the start cell, the transit cell and the end cell.
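The start cell, transit cells and end cell of claim 3 can be illustrated with a minimal path computation on a two-dimensional array. The routing rule used here (rows first, then columns) is an assumption for the example only; the claim requires only that consecutive cells on the path be adjacent.

```python
from typing import List, Tuple

Position = Tuple[int, int]  # (row, column) of a cell in the cell array


def route(start: Position, end: Position) -> List[Position]:
    """Return the path from start cell to end cell, one adjacent cell per hop.

    Assumption: the path first moves along rows, then along columns.
    """
    row, col = start
    path = [start]
    while row != end[0]:
        row += 1 if end[0] > row else -1
        path.append((row, col))
    while col != end[1]:
        col += 1 if end[1] > col else -1
        path.append((row, col))
    return path


path = route((0, 0), (2, 1))
start_cell, transit_cells, end_cell = path[0], path[1:-1], path[-1]
```

Here `transit_cells` holds the cells that only relay the data; for the call shown they are (1, 0) and (2, 0).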

4. The cell array computing system of claim 3, wherein any cell in the cell array is further capable, as the start cell, of performing mass-sending communication to all cells in a target area, wherein a cell involved in the mass-sending communication and located in the target area serves as the start cell, as an end cell, or as both a transit cell and an end cell, and wherein a cell involved in the mass-sending communication and located outside the target area serves as the start cell or as a transit cell.

5. The cell array computing system of claim 3 or 4, wherein each cell in the cell array further comprises a network controller connected to the microprocessor, the network controller being configured to send, receive, and route data that the cell originates, relays, or finally receives, and further configured to send an interrupt signal to the microprocessor; a second nonvolatile memory is arranged in the network controller, and the second nonvolatile memory in every normal cell adjacent to the damaged cell is used for marking the damaged cell adjacent to that cell and for storing the position, in the cell array, of the corresponding replacement cell of the damaged cell; when data is communicated, the network controller of a cell adjacent to the damaged cell bypasses the damaged cell in its routing control and, if the damaged cell is the end cell or one of the end cells, forwards the data to the corresponding replacement cell of the damaged cell.
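The bypass and redirection behaviour of claim 5 can be sketched as a path search that avoids damaged cells and substitutes the replacement cell when the addressed end cell is damaged. This is a simplified, global view written for illustration: in the claimed system each network controller only knows about damaged cells adjacent to its own cell, and the breadth-first search and the name `route_around` are assumptions of this example.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Position = Tuple[int, int]  # (row, column) of a cell in the cell array


def route_around(start: Position, end: Position, rows: int, cols: int,
                 damaged: Set[Position],
                 replacement: Dict[Position, Position]) -> Optional[List[Position]]:
    """Return a path of adjacent cells that never enters a damaged cell."""
    # Data addressed to a damaged end cell is forwarded to its replacement cell.
    end = replacement.get(end, end)
    if start in damaged or end in damaged:
        return None
    previous: Dict[Position, Optional[Position]] = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == end:
            path = []
            while cell is not None:  # walk back to the start cell
                path.append(cell)
                cell = previous[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nxt[0] < rows and 0 <= nxt[1] < cols
            if inside and nxt not in damaged and nxt not in previous:
                previous[nxt] = cell
                queue.append(nxt)
    return None  # no path exists that avoids the damaged cells


# 3 x 3 working area plus one spare row (row 3); cell (1, 1) is damaged.
damaged_cells = {(1, 1)}
replacements = {(1, 1): (3, 1)}
bypass = route_around((0, 1), (2, 1), 4, 3, damaged_cells, replacements)
redirected = route_around((0, 1), (1, 1), 4, 3, damaged_cells, replacements)
```

In this example `bypass` goes around cell (1, 1), and `redirected` ends at the spare cell (3, 1) instead of the damaged cell it was addressed to.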

6. The cell array computing system of claim 1, wherein the cell array is a two-dimensional array, and the one or more redundant cells are reserved as one row or one column of cells in the cell array.

7. The cell array computing system of claim 1, wherein the non-volatile random access memory is an MRAM.

8. A method of testing a cell array computing system according to any one of claims 1 to 7, comprising:

broadcasting a test program to enable each cell in the cell array to perform self-testing;

if a cell A other than the spare cell fails the test, determining the cell A to be a damaged cell, selecting a normal cell from the spare cells as the corresponding replacement cell of the damaged cell, and storing, in the replacement cell, the position of the replaced damaged cell in the cell array.

9. The method for testing a cell array computing system according to claim 8, wherein after determining that a cell A other than the spare cell is a damaged cell, the damaged cell is further marked in all normal cells adjacent to the damaged cell, and the position of the corresponding replacement cell of the damaged cell in the cell array is stored in those adjacent cells.
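The bookkeeping of the testing method in claims 8 and 9 can be sketched as follows, taking the self-test results as given. The function `assign_spares`, the dictionary layout, and the decision to raise an error when working spares run out are assumptions of this example and are not taken from the claims.

```python
from typing import Dict, List, Tuple

Position = Tuple[int, int]  # (row, column) of a cell in the cell array


def assign_spares(passed: Dict[Position, bool], spares: List[Position]):
    """Assign a working spare cell to every non-spare cell that failed self-test.

    Returns two tables:
      stored_in_spare[spare]          -> position of the damaged cell it replaces
      neighbour_marks[cell][damaged]  -> position of the replacement cell,
                                         kept in each normal adjacent cell
    """
    free = [s for s in spares if passed[s]]  # only normal spares are usable
    stored_in_spare: Dict[Position, Position] = {}
    neighbour_marks: Dict[Position, Dict[Position, Position]] = {}
    for cell, ok in passed.items():
        if ok or cell in spares:
            continue
        if not free:
            # Assumption: running out of spares is treated as a failed chip.
            raise RuntimeError("not enough working spare cells")
        spare = free.pop(0)
        stored_in_spare[spare] = cell
        r, c = cell
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if passed.get(nb, False):  # mark only in normal adjacent cells
                neighbour_marks.setdefault(nb, {})[cell] = spare
    return stored_in_spare, neighbour_marks


# 2 x 2 working area, spare row at row 2; cell (0, 1) fails its self-test.
results = {(0, 0): True, (0, 1): False, (1, 0): True, (1, 1): True,
           (2, 0): True, (2, 1): True}
stored, marks = assign_spares(results, spares=[(2, 0), (2, 1)])
```

The first table corresponds to what claim 8 stores in the replacement cell, and the second to what claim 9 stores in the normal cells adjacent to the damaged cell.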