CN118035170A

CN118035170A - Two-dimensional Mesh network-on-chip topology reconstruction method based on shortest path first

Info

Publication number: CN118035170A
Application number: CN202410213549.2A
Authority: CN
Inventors: 丁浩; 武政; 钱俊彦
Original assignee: Guangxi Normal University
Current assignee: Guangxi Normal University
Priority date: 2024-02-27
Filing date: 2024-02-27
Publication date: 2024-05-14

Abstract

The invention discloses a two-dimensional Mesh network-on-chip topology reconstruction method based on shortest path first. Redundant columns are arranged at both the left and right ends of the array, which mitigates the disordered logical topology and degraded performance that arise after reconstruction when the compensation distance of a faulty unit is too long. Following a defined rule for searching subsequent units, a greedy strategy replaces each faulty unit with a spare at minimum distance, thereby constructing a high-performance logical topology free of faulty units.

Description

Two-dimensional Mesh network-on-chip topology reconstruction method based on shortest path first

Technical Field

The invention relates to the technical field of networks-on-chip (NoCs), in particular to a two-dimensional Mesh network-on-chip topology reconstruction method based on shortest path first.

Background

With the development of deep-submicron technology and continued advances in packaging, the density of circuit devices has reached unprecedented levels. As the number of transistors on a chip grows, on-chip communication becomes more complex. As chips become larger and more complex, the traditional bus architecture exposes problems such as performance bottlenecks and excessive power consumption. The network-on-chip (NoC), a new type of interconnect architecture, emerged in response. NoCs are usually made as compact as possible using very-large-scale integration, so the probability of failure rises as NoCs grow in scale. To improve NoC reliability, a two-dimensional Mesh NoC topology that contains faulty processing units must be reconstructed using an efficient fault-tolerance technique.

In the early days, chip integration was low, a single die carried few processor cores, and the internal architecture of a core was not as regular as that of memory cells, so research on NoC fault tolerance focused mainly on redundancy at the microarchitecture level. This technique applies mainly to single-core chips. Its principle is that when a component inside a core fails, other related resources in the core are used instead and the core continues to operate in a degraded state; once the redundancy available within the processor is exhausted, the whole core is regarded as unusable. For multi-core chips, however, the number of cores is large and all are manufactured with the latest process, so there is no need to tolerate defects at the microarchitecture level, and core-level redundancy is the more suitable scheme. The idea of core-level redundancy is to place redundant cores on the chip; when a processing unit fails, the system continues to provide service by replacing the failed unit with a redundant core and building a new logical topology of fault-free processing units. However, because different topologies significantly affect both transmission efficiency and communication load, the greatest challenge at present is to preserve the performance of the reconstructed topology.

Disclosure of Invention

The invention addresses the problem that faulty processing units in a two-dimensional Mesh network-on-chip impair the reliability of the whole system, and provides a two-dimensional Mesh network-on-chip topology reconstruction method based on shortest path first.

To solve the above problem, the invention is realized by the following technical scheme:

A two-dimensional Mesh network-on-chip topology reconstruction method based on shortest path first comprises the following steps:

Step 1, defining redundant columns on an original main array of size n × m according to a predetermined redundant column number r;

When the predetermined redundant column number r is an even number, the redundant columns are distributed on the leftmost side and the rightmost side of the original main array completely and evenly, and the number of the left redundant columns of the original main array is the same as the number of the right redundant columns of the original main array;

When the predetermined redundant column number r is an odd number, the redundant columns are relatively evenly distributed at the leftmost side and the rightmost side of the original main array, and the number of the left redundant columns of the original main array is one column less than that of the right redundant columns of the original main array;

Step 2, scanning the faulty processor units in the non-redundant columns of the current main array in order from top to bottom, and constructing, for the current faulty processor unit, a left shortest path to the left redundant columns and a right shortest path to the right redundant columns;

Step 3, comparing the lengths of the left shortest path and the right shortest path obtained in Step 2 for the current faulty processor unit, and taking the shorter one as the fault compensation path; performing a cyclic-left-shift logical replacement on all nodes of the fault compensation path to update the current main array;

Step 4, merging the redundant columns of the current main array, merging the fault processor units on the left redundant column to the left side of the left redundant column in sequence, and merging the fault processor units on the right redundant column to the right side of the right redundant column in sequence;

Step 5, repeating the steps 2-4 until no fault processor unit exists in the non-redundant columns of the current main array;

Step 6, removing the redundant columns on the left and right sides of the current main array to obtain the target array of size n × (m - r).

In Step 2 above, the specific process of constructing the left shortest path L from the current faulty processor unit to the left redundant columns is:

Step 1L, initializing: letting the current faulty processor unit be the initial node L_1 of the left shortest path L;

Step 2L, selecting a subsequent processor unit from the left alternative processor units of the current node L_t; if the coordinates of the current node L_t are (x_t, y_t), the coordinates of the left alternative processor units of the current node L_t are (x_t - 1, y_t - 1), (x_t, y_t - 1) and (x_t + 1, y_t - 1);

① If the processor units at (x_t - 1, y_t - 1), (x_t, y_t - 1) and (x_t + 1, y_t - 1) are all normal processor units, selecting the processor unit at (x_t, y_t - 1) as the subsequent processor unit;

② If the processor unit at (x_t, y_t - 1) is a faulty processor unit and the processor units at (x_t - 1, y_t - 1) and (x_t + 1, y_t - 1) are both normal processor units, further determining whether the rows in which the processor units at (x_t - 1, y_t - 1) and (x_t + 1, y_t - 1) are located are left redundant rows:

If the rows of the processor units at (x_t - 1, y_t - 1) and (x_t + 1, y_t - 1) are both left redundant rows, selecting the processor unit at (x_t - 1, y_t - 1) as the subsequent processor unit;

If only one of the rows of the processor units at (x_t - 1, y_t - 1) and (x_t + 1, y_t - 1) is a left redundant row, selecting the processor unit whose row is the left redundant row as the subsequent processor unit;

If neither of the rows of the processor units at (x_t - 1, y_t - 1) and (x_t + 1, y_t - 1) is a left redundant row, selecting the processor unit at (x_t - 1, y_t - 1) as the subsequent processor unit;

wherein a left redundant row is a row in which the number of faulty processor units in the non-redundant columns is less than the number of normal processor units in the left redundant columns;

③ If the processor unit at (x_t, y_t - 1) is a faulty processor unit and only one of the processor units at (x_t - 1, y_t - 1) and (x_t + 1, y_t - 1) is a normal processor unit, selecting that normal processor unit as the subsequent processor unit;

④ If the processor units at (x_t - 1, y_t - 1), (x_t, y_t - 1) and (x_t + 1, y_t - 1) are all faulty processor units, selecting the processor unit at (x_t, y_t - 1) as the subsequent processor unit;

Step 3L, letting the subsequent processor unit of the current node L_t become the new current node L_t of the left shortest path L;

Step 4L, determining whether the current node L_t is a normal processor unit in a redundant column: if so, stopping the iteration, thereby obtaining the left shortest path L; otherwise, returning to Step 2L.
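The selection rule of Step 2L can be sketched in Python as follows. This is only an illustrative sketch, not the patented implementation: the names `left_successor` and `is_left_redundant_row`, the representation of faults as a set of 1-indexed (row, column) coordinates, and the handling of cases the text does not cover (a candidate outside rows 1..n is treated as unusable, and the straight step is taken whenever the unit at (x_t, y_t - 1) is normal) are all assumptions.

```python
from typing import Sequence, Set, Tuple

Coord = Tuple[int, int]  # (row x, column y), 1-indexed as in the text


def is_left_redundant_row(x: int, faulty: Set[Coord],
                          left_cols: Sequence[int],
                          core_cols: Sequence[int]) -> bool:
    """True if row x has fewer faulty units in its non-redundant columns
    than normal units in its left redundant columns."""
    core_faults = sum((x, y) in faulty for y in core_cols)
    normal_spares = sum((x, y) not in faulty for y in left_cols)
    return core_faults < normal_spares


def left_successor(node: Coord, faulty: Set[Coord], n: int,
                   left_cols: Sequence[int],
                   core_cols: Sequence[int]) -> Coord:
    """Pick the subsequent unit one column to the left of `node`."""
    x, y = node
    up, mid, down = (x - 1, y - 1), (x, y - 1), (x + 1, y - 1)

    def normal(p: Coord) -> bool:
        # Assumption: a candidate outside rows 1..n counts as unusable.
        return 1 <= p[0] <= n and p not in faulty

    if normal(mid):
        # Rule 1. Assumption: also applied when a diagonal unit is faulty.
        return mid
    up_ok, down_ok = normal(up), normal(down)
    if up_ok and down_ok:
        # Rule 2: prefer the diagonal whose row is a left redundant row;
        # if both or neither qualify, take the upper diagonal.
        up_red = is_left_redundant_row(up[0], faulty, left_cols, core_cols)
        down_red = is_left_redundant_row(down[0], faulty, left_cols, core_cols)
        return down if (down_red and not up_red) else up
    if up_ok or down_ok:
        # Rule 3: exactly one diagonal is normal.
        return up if up_ok else down
    # Rule 4: all three candidates are faulty, step straight left.
    return mid
```

For example, in a 3-row array with left redundant column 1 and non-redundant columns 2 to 5, `left_successor((2, 3), {(2, 3), (2, 2)}, 3, [1], [2, 3, 4, 5])` falls under rule ② with both neighbouring rows redundant, so the upper diagonal (1, 2) is chosen.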

In Step 2 above, the specific process of constructing the right shortest path R from the current faulty processor unit to the right redundant columns is:

Step 1R, initializing: letting the current faulty processor unit be the initial node R_1 of the right shortest path R;

Step 2R, selecting a subsequent processor unit from the right alternative processor units of the current node R_t; if the coordinates of the current node R_t are (x_t, y_t), the coordinates of the right alternative processor units of the current node R_t are (x_t - 1, y_t + 1), (x_t, y_t + 1) and (x_t + 1, y_t + 1);

① If the processor units at (x_t - 1, y_t + 1), (x_t, y_t + 1) and (x_t + 1, y_t + 1) are all normal processor units, selecting the processor unit at (x_t, y_t + 1) as the subsequent processor unit;

② If the processor unit at (x_t, y_t + 1) is a faulty processor unit and the processor units at (x_t - 1, y_t + 1) and (x_t + 1, y_t + 1) are both normal processor units, further determining whether the rows in which the processor units at (x_t - 1, y_t + 1) and (x_t + 1, y_t + 1) are located are right redundant rows:

If the rows of the processor units at (x_t - 1, y_t + 1) and (x_t + 1, y_t + 1) are both right redundant rows, selecting the processor unit at (x_t - 1, y_t + 1) as the subsequent processor unit;

If only one of the rows of the processor units at (x_t - 1, y_t + 1) and (x_t + 1, y_t + 1) is a right redundant row, selecting the processor unit whose row is the right redundant row as the subsequent processor unit;

If neither of the rows of the processor units at (x_t - 1, y_t + 1) and (x_t + 1, y_t + 1) is a right redundant row, selecting the processor unit at (x_t - 1, y_t + 1) as the subsequent processor unit;

wherein a right redundant row is a row in which the number of faulty processor units in the non-redundant columns is less than the number of normal processor units in the right redundant columns;

③ If the processor unit at (x_t, y_t + 1) is a faulty processor unit and only one of the processor units at (x_t - 1, y_t + 1) and (x_t + 1, y_t + 1) is a normal processor unit, selecting that normal processor unit as the subsequent processor unit;

④ If the processor units at (x_t - 1, y_t + 1), (x_t, y_t + 1) and (x_t + 1, y_t + 1) are all faulty processor units, selecting the processor unit at (x_t, y_t + 1) as the subsequent processor unit;

Step 3R, letting the subsequent processor unit of the current node R_t become the new current node R_t of the right shortest path R;

Step 4R, determining whether the current node R_t is a normal processor unit in a redundant column: if so, stopping the iteration, thereby obtaining the right shortest path R; otherwise, returning to Step 2R.
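The iteration in Steps 1R to 4R (and likewise 1L to 4L) is a simple greedy walk, sketched below in Python. The function name `build_path`, the idea of passing the selection rule and the stopping test in as callables, and the `max_steps` guard are assumptions made for illustration; the text does not say what happens if no normal redundant unit is ever reached.

```python
from typing import Callable, List, Optional, Tuple

Coord = Tuple[int, int]  # (row x, column y), 1-indexed as in the text


def build_path(start: Coord,
               successor: Callable[[Coord], Coord],
               is_normal_spare: Callable[[Coord], bool],
               max_steps: int) -> Optional[List[Coord]]:
    """Greedy walk from a faulty unit to a normal unit in a redundant column.

    `successor` stands for the Step 2 selection rule and `is_normal_spare`
    for the Step 4 test; both are hypothetical hooks supplied by the caller.
    """
    path = [start]                          # Step 1: the fault is the first node
    while not is_normal_spare(path[-1]):    # Step 4: stop at a normal spare
        if len(path) > max_steps:
            return None                     # guard (assumption): give up
        path.append(successor(path[-1]))    # Steps 2 and 3: advance one column
    return path


# Usage with a deliberately simplified rule that always steps straight right.
faulty = {(2, 3)}
right_cols = [6]
path = build_path(
    start=(2, 3),
    successor=lambda p: (p[0], p[1] + 1),
    is_normal_spare=lambda p: p[1] in right_cols and p not in faulty,
    max_steps=6,
)
print(path)
```

Because each step moves exactly one column toward the redundant side, the path length depends only on how many columns separate the fault from the first normal redundant unit that the walk reaches.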

Compared with the prior art, the invention has the following characteristics:

1. In conventional two-dimensional NoC physical topology reconstruction, the redundant columns are arranged at the right side of the array. The invention instead arranges redundant columns at both the left and right ends of the array, which mitigates the disordered logical topology and degraded performance that arise after reconstruction when the compensation distance of a faulty unit is too long;

2. A shortest-path-first reconstruction algorithm is designed which, following the defined rule for searching subsequent units, uses a greedy strategy to replace each faulty unit with a spare at minimum distance, thereby constructing a high-performance logical topology free of faulty units.

Drawings

Fig. 1 shows an example of a physical array structure of size 4×4 containing 2 redundant columns.

Fig. 2 is an example of the complete shortest-path logical topology reconstruction procedure carried out for a faulty unit.

Detailed Description

To make the objects, technical solutions and advantages of the invention clearer, the invention is described in further detail below with reference to specific examples and the accompanying drawings. In the examples, directional terms such as "upper", "lower", "middle", "left" and "right" refer only to directions in the drawings. The directional terms are therefore used for illustration and are not intended to limit the scope of the invention.

A two-dimensional Mesh network-on-chip typically consists of processor units, network interfaces, routers, communication links, and the like, and each processor unit can communicate with any other processor unit through the NoC infrastructure. A processor unit that cannot process data, or cannot read information from or write information to its surrounding units, has failed and is defined as a faulty processor unit. A processor unit that is not faulty, i.e. one that functions properly, is defined as a normal processor unit or fault-free processor unit. Note that in this invention the network interfaces, routers and communication links are assumed to be fault-free.

A network-on-chip-based processor array as manufactured by lithography is called the physical array, also called the main array. The sub-array of fault-free processor units obtained from the physical array by the reconstruction algorithm is called the logical array, also called the target array. The reconstruction problem of the invention is stated as follows: for a physical array of size n × m that contains faulty processor units, a reconstruction algorithm replaces the faulty processor units with processor units from the redundant columns, so as to obtain a logical array of size n × (m - r) that contains no faulty processor units. For a processor unit (x, y), x is its row coordinate, 1 ≤ x ≤ n, where n is the number of rows of the array; y is its column coordinate, 1 ≤ y ≤ m, where m is the number of columns of the array; r is the given number of redundant columns.

Based on the above, the two-dimensional Mesh network-on-chip topology reconstruction method based on shortest path first provided by the invention comprises the following steps:

Step 1, defining redundant columns on an original main array of size n × m according to a predetermined redundant column number r, as shown in Fig. 1.

When the predetermined redundant column number r is even, the redundant columns are distributed exactly evenly on the leftmost and rightmost sides of the original main array: columns 1 to r/2 on the left and columns m - r/2 + 1 to m on the right are defined as redundant columns, and columns r/2 + 1 to m - r/2 are non-redundant columns. The number of redundant columns on the left side of the original main array is then the same as the number on the right side.

When the predetermined redundant column number r is odd, the redundant columns are distributed as evenly as possible on the leftmost and rightmost sides of the original main array: columns 1 to (r - 1)/2 on the left and columns m - (r + 1)/2 + 1 to m on the right are defined as redundant columns, and columns (r - 1)/2 + 1 to m - (r + 1)/2 are non-redundant columns. The number of redundant columns on the left side of the original main array is then one less than the number on the right side.

Where n is the number of rows of the original main array, m is the number of columns of the original main array, and r is the number of columns of the redundant columns.
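As a minimal sketch of Step 1, the column split can be computed as below. The function name `partition_columns` and the use of 1-indexed column lists are assumptions made for illustration; the only rule taken from the text is that the left side receives the same number of redundant columns as the right when r is even and one fewer when r is odd.

```python
from typing import List, Tuple


def partition_columns(m: int, r: int) -> Tuple[List[int], List[int], List[int]]:
    """Split columns 1..m into (left redundant, non-redundant, right redundant)."""
    if not 0 <= r < m:
        raise ValueError("need 0 <= r < m")
    left = r // 2        # floor(r / 2): one fewer than the right side when r is odd
    right = r - left     # ceil(r / 2)
    left_cols = list(range(1, left + 1))
    core_cols = list(range(left + 1, m - right + 1))
    right_cols = list(range(m - right + 1, m + 1))
    return left_cols, core_cols, right_cols


print(partition_columns(6, 2))  # even r: one redundant column on each side
print(partition_columns(7, 3))  # odd r: one on the left, two on the right
```

In both calls the non-redundant block has m - r columns, which matches the n × (m - r) size of the target array.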

Step 2, scanning the faulty processor units in the non-redundant columns of the current main array in order from top to bottom, and constructing, for the current faulty processor unit, a left shortest path to the left redundant columns and a right shortest path to the right redundant columns.

(1) The specific process of constructing the left shortest path L from the current faulty processor unit to the left redundant columns is as follows:

Step 2L, selecting a subsequent processor unit from the left alternative processor units of the current node L_t;

Assuming that the coordinates of the current node L_t are (x_t, y_t), the coordinates of the left alternative processor units of the current node L_t are (x_t - 1, y_t - 1), (x_t, y_t - 1) and (x_t + 1, y_t - 1);

(2) The specific process of constructing the right shortest path R from the current faulty processor unit to the right redundant columns is as follows:

Step 2R, selecting a subsequent processor unit from the right alternative processor units of the current node R_t;

Assuming that the coordinates of the current node R_t are (x_t, y_t), the coordinates of the right alternative processor units of the current node R_t are (x_t - 1, y_t + 1), (x_t, y_t + 1) and (x_t + 1, y_t + 1);

③ If the processor unit at (x_t, y_t + 1) is a faulty processor unit and only one of the processor units at (x_t - 1, y_t + 1) and (x_t + 1, y_t + 1) is a normal processor unit, selecting that normal processor unit as the subsequent processor unit;

Step 3, comparing the lengths of the left shortest path and the right shortest path obtained in Step 2 for the current faulty processor unit, and taking the shorter one as the fault compensation path; then performing a cyclic-left-shift logical replacement on all nodes of the fault compensation path to update the current main array, as shown in Fig. 2.

The cyclic-left-shift logical replacement moves every node of the fault compensation path one position to the left in a full cycle: for every two adjacent nodes on the path, the node on the right moves to the position of the node on the left, and the leftmost node on the path moves to the position of the rightmost node. After the shift, the current faulty processor unit has been moved into a redundant column.
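Step 3 can be illustrated with the short Python sketch below. It rests on two assumptions that the text does not state: "left" is read in path order, with the faulty unit as the first node and the redundant unit as the last, so that the same rotation serves both left and right paths; and when the two paths have equal length the left one is kept. The names `choose_compensation_path` and `cyclic_left_shift`, and the dictionary that maps positions to unit labels, are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Coord = Tuple[int, int]  # (row x, column y), 1-indexed as in the text


def choose_compensation_path(left_path: Optional[List[Coord]],
                             right_path: Optional[List[Coord]]) -> List[Coord]:
    """Return the shorter of the two paths; None marks a side with no path.

    min() keeps the first of equal-length candidates, so ties go to the
    left path (tie-breaking is an assumption, not taken from the text).
    """
    candidates = [p for p in (left_path, right_path) if p is not None]
    return min(candidates, key=len)  # raises ValueError if both are None


def cyclic_left_shift(grid: Dict[Coord, str], path: List[Coord]) -> None:
    """Rotate the units along `path` by one position, in place.

    Each unit moves into the position of its predecessor on the path, and
    the first unit (the fault) moves into the last position (the spare).
    """
    units = [grid[p] for p in path]
    rotated = units[1:] + units[:1]
    for pos, unit in zip(path, rotated):
        grid[pos] = unit


grid = {(2, 3): "F", (2, 4): "a", (2, 5): "b", (2, 6): "spare"}
left_path = [(2, 3), (2, 2), (2, 1), (2, 0), (2, -1)]  # dummy longer path
right_path = [(2, 3), (2, 4), (2, 5), (2, 6)]
chosen = choose_compensation_path(left_path, right_path)
cyclic_left_shift(grid, chosen)
print(grid)
```

After the call, the faulty unit "F" sits at (2, 6) in the redundant column, and the units that followed it on the path have each moved one position toward the fault's original location.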

Step 4, merging the redundant columns of the current main array.

The faulty processor units in the left redundant columns are merged in sequence toward the left: for the processor units in the left redundant columns of each row of the current main array, the faulty processor units are moved in sequence to the left end of that row's left redundant columns, and the normal processor units are moved in sequence to the right end of that row's left redundant columns.

The faulty processor units in the right redundant columns are merged in sequence toward the right: for the processor units in the right redundant columns of each row of the current main array, the faulty processor units are moved in sequence to the right end of that row's right redundant columns, and the normal processor units are moved in sequence to the left end of that row's right redundant columns.
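The merging of Step 4 amounts to a per-row partition of the redundant columns, which the following Python sketch illustrates. The function name `merge_redundant_columns`, the representation of each row as a list of unit labels with a separate set of faulty labels, and the reading of "in sequence" as preserving relative order are assumptions for illustration.

```python
from typing import List, Set


def merge_redundant_columns(rows: List[List[str]], faulty: Set[str],
                            left: int, right: int) -> None:
    """Reorder the redundant columns of every row in place.

    `left` and `right` are the numbers of left and right redundant columns.
    Faulty units are pushed to the outer edge of each redundant block and
    normal units toward the non-redundant columns; order is otherwise kept.
    """
    for row in rows:
        if left:
            block = row[:left]
            row[:left] = ([u for u in block if u in faulty]
                          + [u for u in block if u not in faulty])
        if right:  # guard: row[-0:] would select the whole row
            block = row[-right:]
            row[-right:] = ([u for u in block if u not in faulty]
                            + [u for u in block if u in faulty])


rows = [["s1", "F1", "a", "b", "F2", "s2"]]
merge_redundant_columns(rows, faulty={"F1", "F2"}, left=2, right=2)
print(rows)
```

After merging, the normal spares "s1" and "s2" are adjacent to the non-redundant columns, so later compensation paths for that row can end one column sooner.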

Step 5, repeating Steps 2 to 4 until no faulty processor unit remains in the non-redundant columns of the current main array.

It should be noted that the examples described above are illustrative and do not limit the invention, so the invention is not limited to the specific embodiments described above. Other embodiments that are apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein are considered to be within the scope of the invention as claimed.

Claims

1. The two-dimensional Mesh network-on-chip topology reconstruction method based on shortest path first is characterized by comprising the following steps:

2. The two-dimensional Mesh network-on-chip topology reconstruction method based on shortest path first as claimed in claim 1, wherein in Step 2, the specific process of constructing the left shortest path L from the current faulty processor unit to the left redundant columns is as follows:

3. The two-dimensional Mesh network-on-chip topology reconstruction method based on shortest path first as claimed in claim 1, wherein in Step 2, the specific process of constructing the right shortest path R from the current faulty processor unit to the right redundant columns is as follows: