CN106339350B

CN106339350B - Method and device for optimizing memory access distance on a many-core processor chip

Info

Publication number: CN106339350B
Application number: CN201610711933.0A
Authority: CN
Inventors: 张洋; 唐志敏; 叶笑春; 张�浩; 范东睿
Original assignee: Smartcore Beijing Co ltd; Institute of Computing Technology of CAS
Current assignee: Smartcore Beijing Co ltd; Institute of Computing Technology of CAS
Priority date: 2016-08-23
Filing date: 2016-08-23
Publication date: 2019-01-11
Anticipated expiration: 2036-08-23
Also published as: CN106339350A

Abstract

The invention belongs to the field of computer technology and provides a method and a device for optimizing memory access distance on a many-core processor chip. The method comprises the following steps: step 1, when a storage controller is on an edge of the n × n topology on the many-core processor chip, finding the vertex of the n × n topology that is closest to the storage controller; step 2, judging whether (n-1) is divisible by 3; if so, adding a link connecting the vertex to the first node located at the 2/3 point of the diagonal ((0,0), (n-1, n-1)) on which the vertex lies; if not, evaluating the benefit of connecting to each of the two candidate first nodes and selecting one of the first nodes to connect to the vertex according to the benefit; step 3, connecting the storage controller to the vertex. The invention thereby effectively reduces the distance between nodes and the memory access controller, lowering the memory access latency of the network-on-chip of the many-core processor.

Description

Method and device for optimizing memory access distance on many-core processor chip

Technical Field

The invention relates to the technical field of computers, in particular to a method and a device for optimizing memory access distance on a many-core processor chip.

Background

As the number of cores on a chip increases, the distances between the storage controllers and the individual cores on a many-core chip differ, so that near cores see a small memory access delay while far cores see a large one. The topology, i.e. the layout and interconnection of the nodes on the chip, is one of the key issues in network-on-chip design and has an important influence on the performance, area and power consumption of the network on chip. Most existing many-core processors adopt classical topologies such as mesh and torus. The mesh is simple in structural design but suffers from the far-core delay problem; the torus connects edge cores through long links to mitigate the far-core delay problem, but its physical implementation is complex.

In high-throughput applications, the tasks running on the cores are highly independent: few messages are exchanged between cores, while much traffic flows between the cores and the storage controller. Traditional topology design aims to shorten the distance between all cores, that is, to reduce the maximum distance between any two nodes, whereas in high-throughput applications the aim is to reduce the distance between any node and the memory access controller.

The Mesh structure is laid out like a chessboard: each routing node is connected to an IP core and to its adjacent nodes in the four directions north, south, east and west. In a network of size n × n, assuming that the coordinates of node X and node Y are (i, j) and (s, t) respectively, the condition for X and Y to be connected is |s-i| + |t-j| = 1. The Mesh structure is simple to design and scales well, but suffers considerably from the far-core memory access problem.
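The Mesh connectivity condition can be written as a one-line predicate. The following is only a minimal sketch: the function name `mesh_connected` and the bounds check against `n` are illustrative assumptions, not part of the patent text.

```python
def mesh_connected(x, y, n):
    """Return True if nodes x=(i, j) and y=(s, t) are directly linked in an n x n Mesh.

    Two nodes are neighbours when their Manhattan distance is exactly 1.
    The range check on the coordinates is an added assumption.
    """
    (i, j), (s, t) = x, y
    in_range = all(0 <= c < n for c in (i, j, s, t))
    return in_range and abs(s - i) + abs(t - j) == 1


print(mesh_connected((0, 0), (0, 1), 4))  # adjacent in the same row
print(mesh_connected((0, 0), (1, 1), 4))  # diagonal, not a Mesh link
```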

XMesh is a structure designed on the basis of Mesh that optimizes the distance between Mesh nodes. XMesh adds two ring paths along the two diagonals of the Mesh structure, each closed by a return edge between its pair of diagonal vertices. In an n × n XMesh structure, on one diagonal the node (i, n-1-i) is connected to the node (i+1, n-2-i), and (0, n-1) is connected to (n-1, 0); on the other diagonal the node (i, i) is connected to the node (i+1, i+1), and the node (0,0) is connected to the node (n-1, n-1).
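As an illustration of the XMesh description, the extra links it adds on top of a Mesh can be enumerated as below. This is a sketch under assumptions: it reads the diagonal rule as (i, i) linking to (i+1, i+1), takes i to range over 0 .. n-2, and uses the hypothetical helper name `xmesh_extra_edges`.

```python
def xmesh_extra_edges(n):
    """List the links XMesh adds to an n x n Mesh (assumed range: 0 <= i <= n-2)."""
    edges = []
    for i in range(n - 1):
        # Main diagonal: (i, i) -> (i+1, i+1)
        edges.append(((i, i), (i + 1, i + 1)))
        # Anti-diagonal: (i, n-1-i) -> (i+1, n-2-i)
        edges.append(((i, n - 1 - i), (i + 1, n - 2 - i)))
    # Return edges that close each diagonal path into a ring
    edges.append(((0, 0), (n - 1, n - 1)))
    edges.append(((0, n - 1), (n - 1, 0)))
    return edges


for edge in xmesh_extra_edges(4):
    print(edge)
```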

XMesh addresses the distance between any two nodes rather than the far-core memory access problem. Moreover, the routing structure on the diagonals is relatively complex to design.

The Torus structure connects the first and last routing nodes of each row and each column of the Mesh structure to shorten the average distance between nodes. In a Torus of size n × n, assuming that the coordinates of node X and node Y are (i, j) and (s, t) respectively, the condition for X and Y to be connected is |s-i| + |t-j| = 1, or s = i and |t-j| = n-1, or t = j and |s-i| = n-1. Torus makes use of the otherwise unused routing node ports at the Mesh boundary and reduces the network diameter. However, the Torus structure also addresses the distance between any two nodes rather than the far-core memory access problem, and it suffers additional delay from its many long links. Existing topology optimization is therefore mostly oriented to the distance between arbitrary nodes, not to the distance between a storage node and an arbitrary node, and is thus not fully suited to high-throughput applications characterized by high independence, high concurrency and heavy memory access. Traditional network-on-chip topology optimization seeks to reduce the distance between any two nodes, i.e. the network diameter of the topology, whereas high-throughput applications rarely interact with one another but each perform numerous memory access operations.
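The Torus connectivity condition, read as "Mesh neighbours, plus wrap-around links within the same row or column", can be sketched as follows. The function name `torus_connected` is an illustrative assumption, and the wrap-around clauses follow the standard Torus definition used to interpret the condition above.

```python
def torus_connected(x, y, n):
    """Return True if nodes x=(i, j) and y=(s, t) are directly linked in an n x n Torus."""
    (i, j), (s, t) = x, y
    if abs(s - i) + abs(t - j) == 1:
        return True  # ordinary Mesh link
    if s == i and abs(t - j) == n - 1:
        return True  # wrap-around link along the first coordinate's line
    return t == j and abs(s - i) == n - 1  # wrap-around link along the other axis


print(torus_connected((0, 0), (0, 3), 4))  # wrap-around link
print(torus_connected((0, 0), (3, 3), 4))  # opposite corners are not linked
```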

In summary, the existing technology for optimizing the memory access distance on the many-core processor obviously has inconvenience and defects in practical use, so that improvement is needed.

Disclosure of Invention

In view of the above-mentioned drawbacks, an object of the present invention is to provide a method and an apparatus for optimizing the access distance on a many-core processor chip, so as to effectively reduce the distance between a node and an access controller, thereby reducing the access delay of the network on the many-core processor chip.

In order to achieve one object of the invention, the invention provides a method for optimizing the on-chip memory access distance of a many-core processor, which comprises the following steps:

step 1, when a storage controller is arranged on the edge of an n x n topological structure on the many-core processor, searching a vertex closest to the storage controller in the n x n topological structure;

step 2, judging whether (n-1) is divisible by 3; if so, adding a link connecting the vertex to the first node located at the 2/3 point of the diagonal ((0,0), (n-1, n-1)) on which the vertex lies; if not, evaluating the benefit of connecting to each of the two candidate first nodes, and selecting one of the first nodes to connect to the vertex according to the benefit;

step 3, connecting the storage controller with the vertex.
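The divisible branch of step 2 can be illustrated with a short sketch. It assumes the vertex is (0,0) and the diagonal runs to (n-1, n-1), as in the embodiments; the function name `two_thirds_node` is hypothetical, and the non-divisible branch is deliberately left out because it depends on the benefit comparison described separately.

```python
def two_thirds_node(n):
    """Return the diagonal node at the 2/3 point, or None when (n-1) is not divisible by 3.

    Assumes the vertex nearest the storage controller is (0, 0), so the
    diagonal runs from (0, 0) to (n-1, n-1).
    """
    if (n - 1) % 3 != 0:
        return None  # handled by the benefit comparison instead
    k = 2 * (n - 1) // 3
    return (k, k)


print(two_thirds_node(4))  # 4 x 4 mesh: (n-1) = 3 is divisible by 3
print(two_thirds_node(8))  # 8 x 8 mesh: (n-1) = 7 is not
```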

According to the method, in step 2, the benefit of connecting to each of the two candidate first nodes is judged by a regression analysis method.

According to the method, the step of judging the benefit of connecting to each of the two candidate first nodes by a regression analysis method comprises:

step 21, setting 3 variables N, C and H related to n, wherein:

g)

h)

i)

step 22, setting 2 functions f (near) and f (far) related to the variables N, C and H; wherein:

e)f(near)＝2*N²-3N；

f)f(far)＝N²-N-C²+2CH+H²；

step 23, comparing f(near) and f(far); if f(near) < f(far), the vertex is connected to the candidate first node corresponding to f(near); otherwise, the vertex is connected to the candidate first node corresponding to f(far).

According to the method, the step 3 comprises:

step 31, connecting the vertex to the second node where the storage controller is located.

According to the method, the topology is a mesh topology; the coordinates of the vertex are (0, 0).

To achieve another object of the present invention, the present invention also provides an apparatus for optimizing an on-chip memory access distance of a many-core processor, the apparatus comprising:

the searching module is used for searching a vertex closest to the storage controller in the n x n topological structure when the storage controller is arranged on the edge of the n x n topological structure on the many-core processor;

the judging module is used for judging whether (n-1) is divisible by 3; if so, adding a link connecting the vertex to the first node located at the 2/3 point of the diagonal ((0,0), (n-1, n-1)) on which the vertex lies; if not, evaluating the benefit of connecting to each of the two candidate first nodes, and selecting one of the first nodes to connect to the vertex according to the benefit;

a connection module to connect the storage controller with the vertex.

According to the device, the judging module judges the benefit of connecting to each of the two candidate first nodes by a regression analysis method.

According to the apparatus, the determining module includes:

a first setting submodule for setting 3 variables N, C, H related to n, wherein:

j)

k)

l)

a second setting submodule for setting 2 functions f (near) and f (far) associated with the variables N, C, H; wherein:

g)f(near)＝2*N²-3N；

h)f(far)＝N²-N-C²+2CH+H²；

the judgment connection submodule is used for comparing f(near) and f(far); if f(near) < f(far), the vertex is connected to the candidate first node corresponding to f(near); otherwise, the vertex is connected to the candidate first node corresponding to f(far).

According to the device, the connection module connects the vertex to the second node where the storage controller is located.

According to the device, the topology is a mesh topology; the coordinates of the vertex are (0, 0).

According to the invention, when a storage controller is on an edge of the n × n topology on the many-core processor chip, the vertex of the n × n topology closest to the storage controller is found; whether (n-1) is divisible by 3 is judged; if so, a link is added connecting the vertex to the first node located at the 2/3 point of the diagonal ((0,0), (n-1, n-1)) on which the vertex lies; if not, the benefit of connecting to each of the two candidate first nodes is evaluated, and one of the first nodes is selected to connect to the vertex according to the benefit; and the storage controller is connected to the vertex. The invention thus exploits the characteristics of high-throughput processor cores (high concurrency, little inter-core communication, and heavy communication between cores and the memory access controller) and shortens the distance between far cores and the memory access controller. Specifically, by optimizing the on-chip mesh network, links between the memory access controller and remote nodes are added, so that the average memory access distance is reduced. The invention effectively reduces both the average distance and the farthest distance from each node to the storage controller, is simple to implement, and adds few long links, which limits the extra delay caused by long links. Routing under the invention only needs to add a distance comparison for some nodes in the far half of the network on top of mesh routing, so the routing remains simple.

Drawings

FIG. 1 is a schematic diagram of a device for optimizing the memory access distance on a many-core processor chip provided by the invention;

FIG. 2 is a schematic diagram of a device for optimizing the memory access distance on a many-core processor chip provided by the invention;

FIG. 3 is a flow diagram of a method for on-chip memory access distance optimization for a many-core processor, provided by the invention;

FIG. 4 is a schematic diagram of node selection to be optimized on a many-core processor as provided in the prior art;

FIG. 5 is a schematic diagram of the memory access distance before optimization of a 4 × 4 Mesh provided by the present invention;

FIG. 6 is a schematic diagram of the memory access distance after optimization of a 4 × 4 Mesh provided by the present invention;

FIG. 7 is a schematic diagram of the memory access distance after optimization of an 8 × 8 Mesh provided by the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Referring to FIG. 1, in a first embodiment of the invention, there is provided an apparatus 100 for on-chip memory access distance optimization for a many-core processor, the apparatus 100 for on-chip memory access distance optimization for a many-core processor comprising:

a lookup module 10, configured to, when the storage controller 101 is on an edge of an n × n topology on the many-core processor chip, lookup a vertex in the n × n topology that is closest to the storage controller (MC) 101;

a judging module 20, configured to judge whether (n-1) is divisible by 3; if so, to add a link connecting the vertex to the first node located at the 2/3 point of the diagonal ((0,0), (n-1, n-1)) on which the vertex lies; if not, to evaluate the benefit of connecting to each of the two candidate first nodes and to select one of the first nodes to connect to the vertex according to the benefit;

a connection module 30, configured to connect the storage controller 101 with the vertex.

In this embodiment, the device 100 for optimizing the memory access distance on the many-core processor chip can optimize the memory access distance of any node in the network-on-chip of the many-core processor. Specifically, the lookup module 10 finds the vertex of the n × n topology closest to the storage controller 101; the coordinates of the vertex are (0,0). The judging module 20 judges whether (n-1) is divisible by 3; if so, a link is added connecting the vertex to the first node located at the 2/3 point of the diagonal ((0,0), (n-1, n-1)) on which the vertex lies; if not, the benefit of connecting to each of the two candidate first nodes is evaluated, and one of the first nodes is selected to connect to the vertex according to the benefit. The connection module 30 connects the storage controller 101 to the vertex, that is, it connects the vertex to the second node where the storage controller 101 is located. Existing topology optimization for on-chip many-core architectures is mostly directed at the distance between arbitrary nodes rather than the distance between a storage node and an arbitrary node, and is therefore not fully suited to high-throughput applications with high independence, high concurrency and heavy memory access. The device 100 provided by the invention takes the storage controller 101 as the center, reduces the average distance from the whole network to the storage controller 101 with a small number of added links, and effectively addresses the distance between the nodes of a high-throughput many-core processor and the memory access controller. Preferably, the topology is a mesh topology.

Referring to FIG. 2, in the second embodiment of the present invention, the judging module 20 judges the benefit of connecting to each of the two candidate first nodes by a regression analysis method. Specifically, the judging module 20 includes:

a first setting submodule 21 for setting 3 variables N, C, H related to n, wherein:

m)

n)

o)

a second setting submodule 22 for setting 2 functions f (near) and f (far) associated with the variables N, C, H; wherein:

i)f(near)＝2*N²-3N；

j)f(far)＝N²-N-C²+2CH+H²；

the judgment connection submodule 23 is used for comparing f(near) and f(far); if f(near) < f(far), the vertex is connected to the candidate first node corresponding to f(near); otherwise, the vertex is connected to the candidate first node corresponding to f(far).

In this embodiment, because the design is based on the Mesh structure, the routing is simple, and the average distance from the whole Mesh network to the storage controller 101 is greatly reduced. Compared with Torus and other ring structures, the structural design is simpler and physical implementation is easier. There are fewer long links, and hence less extra delay caused by long links.

Referring to FIG. 3, in a third embodiment of the invention, a method for memory distance optimization on a many-core processor chip is provided, the method comprising the following steps:

in step S301, when the storage controller 101 is on an edge of an n × n topology on the many-core processor chip, a vertex closest to the storage controller 101 in the n × n topology is searched; this step is implemented by the lookup module 10;

in step S302, it is determined whether (n-1) is divisible by 3; if so, a link is added connecting the vertex to the first node located at the 2/3 point of the diagonal ((0,0), (n-1, n-1)) on which the vertex lies; if not, the benefit of connecting to each of the two candidate first nodes is evaluated, and one of the first nodes is selected to connect to the vertex according to the benefit; this step is implemented by the judging module 20;

step S303, connecting the storage controller 101 to the vertex. This step is implemented by the connection module 30.

In this embodiment, a method for optimizing the memory access distance on a many-core processor chip is provided, in which, when the storage controller 101 is on an edge of the n × n topology on the many-core processor chip, the lookup module 10 finds the vertex of the n × n topology closest to the storage controller 101; the judging module 20 judges whether (n-1) is divisible by 3; if so, a link is added connecting the vertex to the first node located at the 2/3 point of the diagonal ((0,0), (n-1, n-1)) on which the vertex lies; if not, the benefit of connecting to each of the two candidate first nodes is evaluated, and one of the first nodes is selected to connect to the vertex according to the benefit; the connection module 30 connects the storage controller 101 to the vertex, that is, it connects the vertex to the second node where the storage controller 101 is located. The coordinates of the vertex are (0, 0). Preferably, the topology is a mesh topology.

In the fourth embodiment of the present invention, in step S302, the judging module 20 judges the benefit of connecting to each of the two candidate first nodes by a regression analysis method. Specifically, the step of judging the benefit of connecting to each of the two candidate first nodes by a regression analysis method comprises:

step 21, the first setting submodule 21 sets 3 variables N, C, H related to n, where:

p)

q)

r)

step 22, the second setting submodule 22 sets 2 functions f (near) and f (far) related to the variables N, C and H; wherein:

k)f(near)＝2*N²-3N；

l)f(far)＝N²-N-C²+2CH+H²；

step 23, the judgment connection submodule 23 compares f(near) and f(far); if f(near) < f(far), the vertex is connected to the candidate first node corresponding to f(near); otherwise, the vertex is connected to the candidate first node corresponding to f(far).

In one embodiment of the present invention, on an n × n Mesh structure, it is assumed that the storage controller 101 is connected to the routing node at the boundary vertex (0,0) of the Mesh structure, and a link is added connecting the node (0,0) to the node at the 2/3 point of the diagonal ((0,0), (n-1, n-1)). When (n-1) is not divisible by 3, the benefit of connecting to each of the two candidate nodes is evaluated, and one of the two is finally selected to connect to (0,0). The regression analysis method for evaluating the benefit of the two candidates is as follows. Specifically:

(1) set 3 variables N, C, H related to n, wherein:

a)

b)

c)

(2) setting 2 functions f (near) and f (far) related to variables N, C and H; wherein:

a)f(near)＝2*N²-3N；

b)f(far)＝N²-N-C²+2CH+H²；

(3) if f(near) < f(far), the candidate node corresponding to f(near) is selected for connection.

Otherwise, the candidate node corresponding to f(far) is selected for connection.

That is, if the storage controller 101 is on an edge of the mesh structure, the vertex closest to the storage controller 101 is found first, and the first node to be connected is then found by the above method. Links are then added connecting the node where the storage controller 101 is located to the vertex node, and the node where the storage controller 101 is located to the first node, respectively. The resulting topology is the M-mesh, and the resulting connections are the optimization result of the invention.

In one embodiment of the invention, the apparatus 100 for optimizing the memory access distance on a many-core processor realizes the memory access distance optimization on the many-core processor.

Step A: as shown in fig. 4, the memory controller 101 is connected to the vertex 101(0, 0) in the original 4 × 4mesh structure.

Step B: (4-1) is divisible by 3, so a link is added connecting node 101 at (0,0) to the node at the 2/3 point of the diagonal ((0,0), (3,3)), namely node 102 at (2,2).

In this embodiment, before the link is added, the distance from each node to the point (0,0) is as shown in FIG. 5: the average distance is 3 and the farthest distance is 6. After the link is added, the distance from each node to the point (0,0) is as shown in FIG. 6: the average distance is reduced to 2 and the farthest distance to 3.
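The distances quoted for the 4 × 4 example can be checked with a breadth-first search over the mesh graph. The sketch below is illustrative only: it assumes every link, including the added long link from (0,0) to (2,2), counts as one hop, and it averages over all 16 nodes including (0,0) itself. The helper names `distances_from` and `summarize` are hypothetical.

```python
from collections import deque


def distances_from(source, n, extra_links=()):
    """Hop distance from `source` to every node of an n x n mesh plus optional extra links."""
    adj = {(i, j): [] for i in range(n) for j in range(n)}
    for (i, j) in adj:
        for di, dj in ((1, 0), (0, 1)):
            nb = (i + di, j + dj)
            if nb in adj:
                adj[(i, j)].append(nb)
                adj[nb].append((i, j))
    for a, b in extra_links:
        adj[a].append(b)
        adj[b].append(a)
    # Standard breadth-first search: each link costs one hop (an assumption).
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def summarize(dist):
    """Return (average distance, farthest distance) over all nodes."""
    values = list(dist.values())
    return sum(values) / len(values), max(values)


before = summarize(distances_from((0, 0), 4))
after = summarize(distances_from((0, 0), 4, extra_links=[((0, 0), (2, 2))]))
print(before)  # (3.0, 6)
print(after)   # (2.0, 3)
```

Under these assumptions the search reproduces the figures stated in the embodiment: an average of 3 and a maximum of 6 before the link is added, and an average of 2 and a maximum of 3 afterwards.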

Step C: as shown in FIG. 7, the memory controller (MC) 400 is connected to the vertex 401 at (0,0) in the original 8 × 8 mesh structure.

Step D: (8-1) is not divisible by 3.

Step E: the benefit of connecting to node 402 or to node 403 is evaluated, for example using the following regression analysis method, with:

a) N = 5

b) C = 4

c) H = 3

step 404: setting 2 functions f(near) and f(far) related to the variables N, C and H; wherein:

a) f(near) = 2N² - 3N = 2*25 - 15 = 35

b) f(far) = N² - N - C² + 2CH + H² = 25 - 5 - 16 + 2*4*3 + 9 = 37

Step F: since f(near) < f(far), the vertex (0,0) is connected to the candidate corresponding to f(near), i.e., node 402 at (4,4).
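The comparison in this 8 × 8 example can be reproduced with a few lines of code. This is a sketch under stated assumptions: the values N = 5, C = 4 and H = 3 are taken from the arithmetic shown in the example rather than computed from n, since the general definitions of N, C and H are not reproduced in this text. The function names and the "near"/"far" labels are hypothetical.

```python
def f_near(N):
    """f(near) = 2*N^2 - 3*N, as given in the description."""
    return 2 * N**2 - 3 * N


def f_far(N, C, H):
    """f(far) = N^2 - N - C^2 + 2*C*H + H^2, as given in the description."""
    return N**2 - N - C**2 + 2 * C * H + H**2


def choose_candidate(N, C, H):
    """Return 'near' when f(near) < f(far), otherwise 'far' (labels are illustrative)."""
    return "near" if f_near(N) < f_far(N, C, H) else "far"


# Values implied by the arithmetic of the 8 x 8 example (assumed, not derived from n).
N, C, H = 5, 4, 3
print(f_near(N), f_far(N, C, H), choose_candidate(N, C, H))  # 35 37 near
```

With these inputs the rule selects the candidate associated with f(near), which the example identifies as node 402 at (4,4).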

In high-throughput applications, the average distance between nodes, which the prior art optimizes, is not the main performance parameter; instead, the distance between each node and the storage controller becomes the key optimization target. On the basis of the mesh structure, the invention provides a topology suited to high-throughput memory access that takes the storage controller 101 as the center and reduces the average distance from the whole network to the storage controller 101. Compared with Torus and other ring structures, the structure is simpler in design and easier to implement physically. There are fewer long links, and hence less extra delay caused by long links. Being based on Mesh, the routing is simple. By optimizing the on-chip mesh network, the invention adds links between the memory access controller and remote nodes so as to reduce the average memory access distance. The method is simple and effectively addresses the distance between the nodes of a high-throughput many-core processor and the memory access controller. The two topologies mentioned above both target general many-core structures and shorten the distance between any two points on the basis of the original connections.

In summary, in the invention, when a storage controller is on an edge of the n × n topology on the many-core processor chip, the vertex of the n × n topology closest to the storage controller is found; whether (n-1) is divisible by 3 is judged; if so, a link is added connecting the vertex to the first node located at the 2/3 point of the diagonal ((0,0), (n-1, n-1)) on which the vertex lies; if not, the benefit of connecting to each of the two candidate first nodes is evaluated, and one of the first nodes is selected to connect to the vertex according to the benefit; and the storage controller is connected to the vertex. The invention thus exploits the characteristics of high-throughput processor cores (high concurrency, little inter-core communication, and heavy communication between cores and the memory access controller) and shortens the distance between far cores and the memory access controller. Specifically, by optimizing the on-chip mesh network, links between the memory access controller and remote nodes are added, so that the average memory access distance is reduced. The invention effectively reduces both the average distance and the farthest distance from each node to the storage controller, is simple to implement, and adds few long links, which limits the extra delay caused by long links. Routing under the invention only needs to add a distance comparison for some nodes in the far half of the network on top of mesh routing, so the routing remains simple.

The present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof, and it should be understood that various changes and modifications can be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. A method for optimizing memory access distance on a many-core processor chip is characterized by comprising the following steps:

step 2, judging whether (n-1) is divisible by 3; if so, adding a link connecting the vertex to the first node located at the 2/3 point of the diagonal ((0,0), (n-1, n-1)) on which the vertex lies; if not, evaluating the benefit of connecting to each of the two candidate first nodes, and selecting one of the first nodes to connect to the vertex according to the benefit;

step 3, connecting the storage controller with the vertex;

wherein in step 2, the benefit of connecting to each of the two candidate first nodes is judged by a regression analysis method;

the step of judging the benefit of connecting to each of the two candidate first nodes by a regression analysis method comprises:

step 21, setting 3 variables N, C and H related to n, wherein:

a)

b)

c)

a)f(near)＝2*N²-3N；

b)f(far)＝N²-N-C²+2CH+H²；

2. The method of claim 1, wherein step 3 comprises:

3. The method of claim 1, wherein the topology is a mesh topology; the coordinates of the vertex are (0, 0).

4. An apparatus for memory distance optimization on a many-core processor chip, the apparatus comprising:

a connection module for connecting the storage controller with the vertex;

wherein the judging module judges the benefit of connecting to each of the two candidate first nodes by a regression analysis method;

the judging module comprises:

a first setting submodule for setting 3 variables N, C, H related to n, wherein:

d)

e)

f)

c)f(near)＝2*N²-3N；

d)f(far)＝N²-N-C²+2CH+H²；

5. The apparatus of claim 4, wherein the connection module connects the vertex with a second node at which the storage controller is located.

6. The apparatus of claim 4, wherein the topology is a mesh topology; the coordinates of the vertex are (0, 0).