CN106339350B - Method and device for optimizing memory access distance on a many-core processor chip - Google Patents

Publication number
CN106339350B
Authority
CN
China
Prior art keywords
node
vertex
many
far
core processor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610711933.0A
Other languages
Chinese (zh)
Other versions
CN106339350A (en)
Inventor
张洋
唐志敏
叶笑春
张浩
范东睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Smartcore Beijing Co ltd
Institute of Computing Technology of CAS
Original Assignee
Smartcore Beijing Co ltd
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2016-08-23
Filing date
2016-08-23
Publication date
2019-01-11
Application filed by Smartcore Beijing Co ltd, Institute of Computing Technology of CAS
Priority to CN201610711933.0A
Publication of CN106339350A
Application granted
Publication of CN106339350B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G06F15/173 Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17356 Indirect interconnection networks
    • G06F15/17306 Intercommunication techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the field of computer technology and provides a method and a device for optimizing the on-chip memory access distance of a many-core processor. The method comprises the following steps: step 1, when a storage controller is on an edge of the n × n on-chip topology of the many-core processor, finding the vertex of the n × n topology that is closest to the storage controller; step 2, judging whether (n-1) is divisible by 3; if so, adding a link that connects the vertex to the first node located 2/3 of the way along the diagonal ((0,0), (n-1, n-1)) on which the vertex lies; if not, evaluating the benefit of connecting each of the two candidate first nodes and selecting one of them to connect to the vertex according to the benefit; step 3, connecting the storage controller with the vertex. The invention thereby effectively reduces the distance between the nodes and the memory access controller, lowering the memory access latency of the many-core processor's network on chip.

Description

Method and device for optimizing memory access distance on many-core processor chip
Technical Field
The invention relates to the technical field of computers, in particular to a method and a device for optimizing memory access distance on a many-core processor chip.
Background
As the number of on-chip cores grows, the distances between the storage controllers and the individual cores of a many-core chip differ, so near cores see a small memory access delay while far cores see a large one. The topology, i.e. the layout and interconnection of the nodes on the chip, is one of the key problems in network-on-chip design and has an important influence on the performance, area and power consumption of the network on chip. Most existing many-core processors adopt classical topologies such as mesh and torus. The mesh (wireless mesh network) is simple to design but suffers from the far-core delay problem; the torus mitigates far-core delay by connecting the edge cores through long links, but its physical implementation is complex.
In high-throughput applications, the tasks running on the cores are highly independent: few messages are exchanged between cores, while many are exchanged between the cores and the storage controller. Traditional topology optimization aims to reduce the distance between all cores, that is, the maximum distance between any two nodes, whereas in high-throughput applications the aim is to reduce the distance between any node and the memory access controller.
The Mesh structure is laid out like a chessboard: each routing node is attached to an IP core and connected to its adjacent nodes in the four directions north, south, east and west. In a network of size n × n, assuming that the coordinates of node X and node Y are (i, j) and (s, t) respectively, the condition for X and Y to be connected is |s - i| + |t - j| = 1. The Mesh structure is simple to design and scales well, but it suffers considerably from the far-core memory access problem.
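The Mesh connectivity condition can be illustrated with a small predicate. This is a minimal Python sketch for illustration only; the function name `mesh_connected` and the tuple representation of node coordinates are assumptions and do not come from the patent.

```python
def mesh_connected(x, y, n):
    """Return True if nodes x=(i, j) and y=(s, t) are linked in an n x n mesh."""
    (i, j), (s, t) = x, y
    # Both nodes must lie inside the n x n grid.
    in_grid = all(0 <= c < n for c in (i, j, s, t))
    # Mesh rule from the text: Manhattan distance of exactly 1.
    return in_grid and abs(s - i) + abs(t - j) == 1


print(mesh_connected((0, 0), (0, 1), 4))  # neighbours in the same row
print(mesh_connected((0, 0), (1, 1), 4))  # diagonal nodes are not linked
```

In a plain mesh, a corner such as (n-1, n-1) can only be reached from (0,0) through 2(n-1) such links, which is the far-core problem the text describes.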
XMesh is a structure designed on the basis of Mesh that optimizes the distance between Mesh nodes. XMesh adds two ring paths between the two pairs of diagonal vertices of the Mesh structure, each closed by a return edge. On one diagonal of an n × n XMesh, node (i, n-1-i) is connected to node (i+1, n-2-i), and node (0, n-1) is connected to node (n-1, 0); on the other diagonal, node (i, i) is connected to node (i+1, i+1), and node (0,0) is connected to node (n-1, n-1).
XMesh addresses the distance between any two nodes, not the far-core memory access problem. Moreover, the routing structure on the diagonals is relatively complex to design.
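The diagonal links that XMesh adds on top of the Mesh can be enumerated as follows. This is a hedged Python sketch: the function name `xmesh_extra_links` is hypothetical, and it assumes that the truncated coordinates in the description read (i+1, i+1) and (n-1, n-1), which is what the pattern of the other diagonal suggests.

```python
def xmesh_extra_links(n):
    """Return the set of diagonal links XMesh adds to an n x n mesh (n >= 3)."""
    links = set()
    for i in range(n - 1):
        # Main diagonal: (i, i) -- (i+1, i+1)
        links.add(frozenset({(i, i), (i + 1, i + 1)}))
        # Anti-diagonal: (i, n-1-i) -- (i+1, n-2-i)
        links.add(frozenset({(i, n - 1 - i), (i + 1, n - 2 - i)}))
    # Return edges that close each diagonal into a ring.
    links.add(frozenset({(0, 0), (n - 1, n - 1)}))
    links.add(frozenset({(0, n - 1), (n - 1, 0)}))
    return links


print(len(xmesh_extra_links(4)))  # number of added links for a 4 x 4 network
```

Under this reading, the number of added links grows with n, whereas the invention described later adds only a small number of long links.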
The Torus structure connects the head and tail routing nodes of each row and each column of the Mesh structure to shorten the average distance between nodes. In a Torus of size n × n, assuming that the coordinates of node X and node Y are (i, j) and (s, t) respectively, the condition for X and Y to be connected is |s - i| + |t - j| = 1, or s = i and |t - j| = n - 1, or t = j and |s - i| = n - 1. Torus makes use of the unused routing-node ports at the Mesh boundary and reduces the network diameter. However, Torus also addresses the distance between any two nodes rather than the far-core memory access problem, and it suffers additional delay from its many long links. Existing topology optimization is therefore mostly oriented toward the distance between arbitrary nodes, i.e. the network diameter of the topology, not toward the distance between a storage node and an arbitrary node. It is thus not fully suited to high-throughput applications, which are highly independent, highly concurrent and memory-access intensive: such applications rarely interact with each other, but each issues numerous memory access operations.
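For comparison with the Mesh rule, the Torus connectivity condition can be sketched as a predicate as well. This Python sketch assumes the standard torus wrap-around rule (same row or same column, coordinates differing by n - 1); the function name `torus_connected` is hypothetical.

```python
def torus_connected(x, y, n):
    """Return True if nodes x=(i, j) and y=(s, t) are linked in an n x n torus."""
    (i, j), (s, t) = x, y
    if not all(0 <= c < n for c in (i, j, s, t)):
        return False
    # Ordinary mesh link between adjacent nodes.
    mesh_link = abs(s - i) + abs(t - j) == 1
    # Long links joining the first and last node of a row or of a column.
    row_wrap = s == i and abs(t - j) == n - 1
    col_wrap = t == j and abs(s - i) == n - 1
    return mesh_link or row_wrap or col_wrap


print(torus_connected((0, 0), (0, 3), 4))  # wrap-around link in row 0
print(torus_connected((0, 0), (3, 3), 4))  # opposite corners are not linked
```

The wrap-around links are the long links whose extra delay the text refers to: every row and every column contributes one.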
In summary, the existing techniques for optimizing the on-chip memory access distance of many-core processors have evident drawbacks in practical use and need to be improved.
Disclosure of Invention
In view of the above-mentioned drawbacks, an object of the present invention is to provide a method and an apparatus for optimizing the access distance on a many-core processor chip, so as to effectively reduce the distance between a node and an access controller, thereby reducing the access delay of the network on the many-core processor chip.
In order to achieve one object of the invention, the invention provides a method for optimizing the on-chip memory access distance of a many-core processor, which comprises the following steps:
step 1, when a storage controller is arranged on the edge of an n x n topological structure on the many-core processor, searching a vertex closest to the storage controller in the n x n topological structure;
step 2, judging whether (n-1) is divisible by 3; if so, adding a link that connects the vertex to the first node located 2/3 of the way along the diagonal ((0,0), (n-1, n-1)) on which the vertex lies; if not, evaluating the benefit of connecting each of the two candidate first nodes and selecting one of them to connect to the vertex according to the benefit;
step 3, connecting the storage controller with the vertex.
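The divisibility test of step 2 can be sketched in a few lines. This Python sketch assumes that the vertex found in step 1 is (0,0), as in the embodiments; the function name `two_thirds_node` is hypothetical. It covers only the divisible case and returns `None` otherwise, because the non-divisible case requires the benefit comparison.

```python
def two_thirds_node(n):
    """Return the node 2/3 of the way along the diagonal ((0,0), (n-1,n-1)).

    Returns None when (n-1) is not divisible by 3, in which case one of two
    candidate nodes has to be chosen by comparing their benefit.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    if (n - 1) % 3 == 0:
        k = 2 * (n - 1) // 3  # exact because (n-1) is a multiple of 3
        return (k, k)
    return None


print(two_thirds_node(4))  # 4 x 4 network
print(two_thirds_node(8))  # 8 x 8 network: (8-1) is not divisible by 3
```

For n = 4 the sketch yields (2, 2), the node used in the 4 × 4 embodiment.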
According to the method, in step 2, the benefit of connecting each of the two candidate first nodes is evaluated by a regression analysis method.
According to the method, the step of evaluating the benefit of connecting each of the two candidate first nodes by a regression analysis method comprises:
step 21, setting 3 variables N, C and H related to n, wherein:
g)
h)
i)
step 22, setting 2 functions f (near) and f (far) related to the variables N, C and H; wherein:
e) f(near) = 2N^2 - 3N;
f) f(far) = N^2 - N - C^2 + 2CH + H^2;
step 23, comparing f(near) and f(far); if f(near) < f(far), the former candidate first node is selected to connect to the vertex; otherwise, the latter candidate first node is selected.
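Steps 22 and 23 can be sketched as two functions and a comparison. The definitions of N, C and H in terms of n are not reproduced in this text, so the Python sketch below takes them as inputs; the function names are hypothetical, and the sample values N = 5, C = 4, H = 3 are the ones implied by the arithmetic of the 8 × 8 embodiment.

```python
def f_near(N):
    """f(near) = 2N^2 - 3N, as given in step 22."""
    return 2 * N**2 - 3 * N


def f_far(N, C, H):
    """f(far) = N^2 - N - C^2 + 2CH + H^2, as given in step 22."""
    return N**2 - N - C**2 + 2 * C * H + H**2


def choose_candidate(N, C, H):
    """Step 23: pick the former candidate if f(near) < f(far), else the latter."""
    return "former" if f_near(N) < f_far(N, C, H) else "latter"


print(f_near(5), f_far(5, 4, 3), choose_candidate(5, 4, 3))
```

With these sample values f(near) = 35 and f(far) = 37, so the former candidate is selected.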
According to the method, the step 3 comprises:
step 31, connecting the vertex to the second node where the storage controller is located.
According to the method, the topology is a wireless mesh network topology; the coordinates of the vertex are (0, 0).
To achieve another object of the present invention, the present invention also provides an apparatus for optimizing an on-chip memory access distance of a many-core processor, the apparatus comprising:
the searching module is used for searching a vertex closest to the storage controller in the n x n topological structure when the storage controller is arranged on the edge of the n x n topological structure on the many-core processor;
the judging module is used for judging whether (n-1) is divisible by 3; if so, adding a link that connects the vertex to the first node located 2/3 of the way along the diagonal ((0,0), (n-1, n-1)) on which the vertex lies; if not, evaluating the benefit of connecting each of the two candidate first nodes and selecting one of them to connect to the vertex according to the benefit;
a connection module to connect the storage controller with the vertex.
According to the device, the judging module evaluates the benefit of connecting each of the two candidate first nodes by a regression analysis method.
According to the apparatus, the judging module includes:
a first setting submodule for setting 3 variables N, C and H related to n, wherein:
j)
k)
l)
a second setting submodule for setting 2 functions f (near) and f (far) associated with the variables N, C, H; wherein:
g) f(near) = 2N^2 - 3N;
h) f(far) = N^2 - N - C^2 + 2CH + H^2;
the judgment connection submodule is used for comparing f(near) and f(far); if f(near) < f(far), the former candidate first node is selected to connect to the vertex; otherwise, the latter candidate first node is selected.
According to the device, the connection module connects the vertex to the second node where the storage controller is located.
According to the device, the topological structure is a wireless mesh network topological structure; the coordinates of the vertex are (0, 0).
According to the invention, when a storage controller is on an edge of the n × n on-chip topology of the many-core processor, the vertex of the n × n topology closest to the storage controller is found; whether (n-1) is divisible by 3 is judged; if so, a link is added to connect the vertex to the first node located 2/3 of the way along the diagonal ((0,0), (n-1, n-1)) on which the vertex lies; if not, the benefit of connecting each of the two candidate first nodes is evaluated and one of them is selected to connect to the vertex according to the benefit; and the storage controller is connected with the vertex. The invention thus exploits the characteristics of high-throughput processor cores (high concurrency, little inter-core communication, and heavy communication between the cores and the memory access controller) and shortens the distance between the far cores and the memory access controller. Specifically, the on-chip mesh network is optimized by adding a link between the memory access controller and a far node, which reduces the average memory access distance. The invention effectively reduces both the average and the farthest distance from each node to the storage controller, is simple to implement, and adds few long links, which limits the extra delay that long links cause. Routing based on the invention only needs to add a distance comparison for some nodes in the far half of the network on top of mesh routing, so the routing remains simple.
Drawings
FIG. 1 is a schematic diagram of a device for optimizing the memory access distance on a many-core processor chip provided by the invention;
FIG. 2 is a schematic diagram of a device for optimizing the memory access distance on a many-core processor chip provided by the invention;
FIG. 3 is a flow diagram of a method for on-chip memory access distance optimization for a many-core processor, provided by the invention;
FIG. 4 is a schematic diagram of node selection to be optimized on a many-core processor as provided in the prior art;
FIG. 5 is a schematic diagram of the memory access distances before optimization of a 4 × 4 Mesh provided by the present invention;
FIG. 6 is a schematic diagram of the memory access distances after optimization of a 4 × 4 Mesh provided by the present invention;
FIG. 7 is a schematic diagram of the memory access distances after optimization of an 8 × 8 Mesh provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to FIG. 1, a first embodiment of the invention provides an apparatus 100 for optimizing the on-chip memory access distance of a many-core processor, the apparatus 100 comprising:
a lookup module 10, configured to, when the storage controller 101 is on an edge of an n × n topology on the many-core processor chip, lookup a vertex in the n × n topology that is closest to the storage controller (MC) 101;
a judging module 20, configured to judge whether (n-1) is divisible by 3; if so, to add a link that connects the vertex to the first node located 2/3 of the way along the diagonal ((0,0), (n-1, n-1)) on which the vertex lies; if not, to evaluate the benefit of connecting each of the two candidate first nodes and select one of them to connect to the vertex according to the benefit;
a connection module 30, configured to connect the storage controller 101 with the vertex.
In this embodiment, the apparatus 100 can optimize the memory access distance of any node in the network on chip of the many-core processor. Specifically, the lookup module 10 finds the vertex of the n × n topology closest to the storage controller 101; the coordinates of the vertex are (0,0). The judging module 20 judges whether (n-1) is divisible by 3; if so, a link is added to connect the vertex to the first node located 2/3 of the way along the diagonal ((0,0), (n-1, n-1)) on which the vertex lies; if not, the benefit of connecting each of the two candidate first nodes is evaluated and one of them is selected to connect to the vertex according to the benefit. The connection module 30 connects the storage controller 101 with the vertex; specifically, it connects the vertex to the second node where the storage controller 101 is located. Existing topology optimization for on-chip many-core architectures mostly targets the distance between arbitrary nodes rather than the distance between a storage node and an arbitrary node, so it is not fully suited to high-throughput applications with high independence, high concurrency and heavy memory access. The apparatus 100 provided by the invention takes the storage controller 101 as the center and reduces the average distance from the whole network to the storage controller 101 with a small number of added links, effectively addressing the distance between the nodes of a high-throughput many-core processor and the memory access controller. Preferably, the topology is a wireless mesh network (mesh) topology.
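The behaviour of the lookup module 10 can be sketched as a search over the four corner vertices. This is a minimal Python sketch under stated assumptions: the storage controller's position is modelled as a grid coordinate `mc` on an edge of the n × n topology, distance is measured in mesh hops, and ties are broken by the order of the corner list. None of these details, nor the function name `nearest_vertex`, are specified in the patent.

```python
def nearest_vertex(n, mc):
    """Return the corner vertex of an n x n grid closest to edge position mc."""
    r, c = mc
    on_edge = (0 <= r < n and 0 <= c < n
               and (r in (0, n - 1) or c in (0, n - 1)))
    if not on_edge:
        raise ValueError("the storage controller must sit on an edge of the grid")
    corners = [(0, 0), (0, n - 1), (n - 1, 0), (n - 1, n - 1)]
    # Pick the corner with the smallest hop count to the controller position.
    return min(corners, key=lambda v: abs(v[0] - r) + abs(v[1] - c))


print(nearest_vertex(8, (0, 1)))  # controller next to the (0, 0) corner
```

The embodiments always use (0,0) as the vertex; by symmetry, the other corners can be handled by relabelling the coordinates.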
Referring to FIG. 2, in the second embodiment of the present invention, the judging module 20 evaluates the benefit of connecting each of the two candidate first nodes by a regression analysis method. Specifically, the judging module 20 includes:
a first setting submodule 21 for setting 3 variables N, C and H related to n, wherein:
m)
n)
o)
a second setting submodule 22 for setting 2 functions f (near) and f (far) associated with the variables N, C, H; wherein:
i) f(near) = 2N^2 - 3N;
j) f(far) = N^2 - N - C^2 + 2CH + H^2;
the judgment connection submodule 23 is used for comparing f(near) and f(far); if f(near) < f(far), the former candidate first node is selected to connect to the vertex; otherwise, the latter candidate first node is selected.
In this embodiment, routing remains simple because the design is based on the Mesh structure, and the average distance from the whole Mesh network to the storage controller 101 is greatly reduced. Compared with Torus and other ring structures, the structure is simpler to design and easier to implement physically: it has fewer long links and therefore less extra delay caused by long links.
Referring to FIG. 3, in a third embodiment of the invention, a method for memory distance optimization on a many-core processor chip is provided, the method comprising the following steps:
in step S301, when the storage controller 101 is on an edge of an n × n topology on the many-core processor chip, a vertex closest to the storage controller 101 in the n × n topology is searched; this step is implemented by the lookup module 10;
in step S302, it is judged whether (n-1) is divisible by 3; if so, a link is added to connect the vertex to the first node located 2/3 of the way along the diagonal ((0,0), (n-1, n-1)) on which the vertex lies; if not, the benefit of connecting each of the two candidate first nodes is evaluated and one of them is selected to connect to the vertex according to the benefit; this step is implemented by the judging module 20;
step S303, connecting the storage controller 101 to the vertex. This step is implemented by the connection module 30.
In this embodiment, a method for optimizing the on-chip memory access distance of a many-core processor is provided. When the storage controller 101 is on an edge of the n × n on-chip topology of the many-core processor, the lookup module 10 finds the vertex of the n × n topology closest to the storage controller 101; the judging module 20 judges whether (n-1) is divisible by 3; if so, a link is added to connect the vertex to the first node located 2/3 of the way along the diagonal ((0,0), (n-1, n-1)) on which the vertex lies; if not, the benefit of connecting each of the two candidate first nodes is evaluated and one of them is selected to connect to the vertex according to the benefit; the connection module 30 connects the storage controller 101 with the vertex, that is, it connects the vertex to the second node where the storage controller 101 is located. The coordinates of the vertex are (0, 0). Preferably, the topology is a wireless mesh network (mesh) topology.
In the fourth embodiment of the present invention, in step S302, the judging module 20 evaluates the benefit of connecting each of the two candidate first nodes by a regression analysis method. Specifically, the step of evaluating the benefit of connecting each of the two candidate first nodes by a regression analysis method comprises:
step 21, the first setting submodule 21 sets 3 variables N, C and H related to n, wherein:
p)
q)
r)
step 22, the second setting submodule 22 sets 2 functions f (near) and f (far) related to the variables N, C and H; wherein:
k) f(near) = 2N^2 - 3N;
l) f(far) = N^2 - N - C^2 + 2CH + H^2;
step 23, the judgment connection submodule 23 compares f(near) and f(far); if f(near) < f(far), the former candidate first node is selected to connect to the vertex; otherwise, the latter candidate first node is selected.
In one embodiment of the present invention, on an n × n Mesh structure, it is assumed that the storage controller 101 is connected to the routing node at the boundary vertex (0,0) of the Mesh structure, and a link is added to connect node (0,0) to the node located 2/3 of the way along the diagonal ((0,0), (n-1, n-1)). When (n-1) is not divisible by 3, the benefit of connecting each of the two candidate nodes is evaluated, and one of the two is finally selected to connect to (0, 0). The regression analysis method for evaluating the benefit is as follows. Specifically,
(1) set 3 variables N, C and H related to n, wherein:
a)
b)
c)
(2) setting 2 functions f (near) and f (far) related to variables N, C and H; wherein:
a) f(near) = 2N^2 - 3N;
b) f(far) = N^2 - N - C^2 + 2CH + H^2;
(3) if f(near) < f(far), the former candidate node is selected for the connection; otherwise, the latter candidate node is selected.
That is, if the storage controller 101 is on an edge of the mesh structure, the vertex closest to the storage controller 101 is found first, and then the first node to be connected is determined by the above method. Links are then added between the node where the storage controller 101 is located and the vertex node, and between the node where the storage controller 101 is located and the first node, respectively. The resulting topology is the M-mesh, which is the optimization result of the invention.
In one embodiment of the invention, the apparatus 100 for optimizing the memory access distance on a many-core processor realizes the memory access distance optimization on the many-core processor.
Step A: as shown in FIG. 4, the storage controller 101 is connected to the vertex 101 (0, 0) of the original 4 × 4 mesh structure.
Step B: (4-1) is judged to be divisible by 3, so a link is added to connect node 101 (0, 0) to the node located 2/3 of the way along the diagonal ((0,0), (3,3)), namely node 102 at (2,2).
In this embodiment, before the connection, the distance from each node to the point (0,0) is as shown in FIG. 5: the average distance is 3 and the farthest distance is 6. After the connection, the distance from each node to the point (0,0) is as shown in FIG. 6: the average distance is reduced to 2 and the farthest distance to 3.
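The figures quoted for the 4 × 4 embodiment can be checked with a breadth-first search over the mesh, with and without the added link between (0,0) and (2,2). This Python sketch is illustrative only; the function name `distances_from` and the choice to count every link, including the added long link, as one hop are assumptions.

```python
from collections import deque


def distances_from(source, n, extra_links=()):
    """Hop count from `source` to every node of an n x n mesh (BFS).

    `extra_links` is an iterable of node pairs added on top of the mesh;
    each extra link is bidirectional and counts as one hop.
    """
    extra = {}
    for a, b in extra_links:
        extra.setdefault(a, []).append(b)
        extra.setdefault(b, []).append(a)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        i, j = queue.popleft()
        neighbours = [(i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)]
        neighbours += extra.get((i, j), [])
        for s, t in neighbours:
            if 0 <= s < n and 0 <= t < n and (s, t) not in dist:
                dist[(s, t)] = dist[(i, j)] + 1
                queue.append((s, t))
    return dist


before = distances_from((0, 0), 4)
after = distances_from((0, 0), 4, extra_links=[((0, 0), (2, 2))])
for name, d in (("before", before), ("after", after)):
    print(name, sum(d.values()) / len(d), max(d.values()))
# before 3.0 6
# after 2.0 3
```

Averaged over all 16 nodes (including the vertex itself at distance 0), the sketch reproduces the values stated in the text: average 3 and farthest 6 before the link, average 2 and farthest 3 after it.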
In one embodiment of the invention, the apparatus 100 for optimizing the memory access distance on a many-core processor realizes the memory access distance optimization on the many-core processor.
Step C: as shown in FIG. 7, the storage controller (MC) 400 is connected to the vertex 401 (0, 0) of the original 8 × 8 mesh structure.
Step D: (8-1) is judged not to be divisible by 3.
Step E: the benefit of connecting node 402 or node 403 is evaluated, for example by the following regression analysis method:
a) N = 5
b) C = 4
c) H = 3
step 404: setting 2 functions f(near) and f(far) related to the variables N, C and H; wherein:
a) f(near) = 2N^2 - 3N = 2*25 - 15 = 35
b) f(far) = N^2 - N - C^2 + 2CH + H^2 = 25 - 5 - 16 + 2*4*3 + 9 = 37
Step F: since f(near) < f(far), the vertex (0,0) is connected to node 402, i.e. (4, 4).
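As a cross-check of the 8 × 8 selection, the total hop count from all mesh nodes to the storage controller can be computed by breadth-first search for two candidate nodes. This Python sketch rests on several assumptions that are not stated in the patent: the storage controller is modelled as its own node (labelled `MC` here) linked directly to both the vertex (0,0) and the chosen first node, following the description of the M-mesh links; every link counts as one hop; and (5, 5) is assumed to be the second candidate, since the text only gives the coordinates (4, 4) of node 402. It is not the regression analysis method itself.

```python
from collections import deque

MC = "MC"  # hypothetical label for the storage controller's own node


def total_hops_to_mc(n, first_node, vertex=(0, 0)):
    """Sum of hop counts from every node of an n x n mesh to the MC node."""
    dist = {MC: 0}
    queue = deque([MC])
    while queue:
        node = queue.popleft()
        if node == MC:
            # Assumed M-mesh links: MC -- vertex and MC -- first node.
            neighbours = [vertex, first_node]
        else:
            i, j = node
            neighbours = [(i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)]
            neighbours = [(s, t) for s, t in neighbours
                          if 0 <= s < n and 0 <= t < n]
            if node in (vertex, first_node):
                neighbours.append(MC)
        for nb in neighbours:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    # Exclude the MC node itself from the total.
    return sum(d for node, d in dist.items() if node != MC)


print(total_hops_to_mc(8, (4, 4)))  # 280
print(total_hops_to_mc(8, (5, 5)))  # 282
```

Under this assumed model, connecting (4, 4) gives the smaller total, which agrees with the node chosen in Step F. A different distance model, for example measuring to the vertex (0,0) instead of to a separate controller node, need not rank the two candidates the same way.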
In high-throughput applications, the average distance between nodes is not the main performance parameter; instead, the distance between each node and the storage controller becomes the key optimization target. On the basis of the mesh structure, the invention provides a topology suited to high-throughput memory access that takes the storage controller 101 as the center and reduces the average distance from the whole network to the storage controller 101. Compared with Torus and other ring structures, the structure is simpler to design and easier to implement physically: it has fewer long links and therefore less extra delay caused by long links. Because it is based on Mesh, the routing is simple. The invention optimizes the on-chip mesh network by adding a link between the memory access controller and a far node so as to reduce the average memory access distance. The method is simple and effectively addresses the distance between the nodes of a high-throughput many-core processor and the memory access controller. By contrast, the two topologies mentioned above (XMesh and Torus) both target general many-core structures and shorten the distance between any two nodes on top of the original connectivity.
In summary, in the invention, when a storage controller is on an edge of the n × n on-chip topology of the many-core processor, the vertex of the n × n topology closest to the storage controller is found; whether (n-1) is divisible by 3 is judged; if so, a link is added to connect the vertex to the first node located 2/3 of the way along the diagonal ((0,0), (n-1, n-1)) on which the vertex lies; if not, the benefit of connecting each of the two candidate first nodes is evaluated and one of them is selected to connect to the vertex according to the benefit; and the storage controller is connected with the vertex. The invention thus exploits the characteristics of high-throughput processor cores (high concurrency, little inter-core communication, and heavy communication between the cores and the memory access controller) and shortens the distance between the far cores and the memory access controller. Specifically, the on-chip mesh network is optimized by adding a link between the memory access controller and a far node, which reduces the average memory access distance. The invention effectively reduces both the average and the farthest distance from each node to the storage controller, is simple to implement, and adds few long links, which limits the extra delay that long links cause. Routing based on the invention only needs to add a distance comparison for some nodes in the far half of the network on top of mesh routing, so the routing remains simple.
The present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof, and it should be understood that various changes and modifications can be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (6)

1. A method for optimizing memory access distance on a many-core processor chip is characterized by comprising the following steps:
step 1, when a storage controller is arranged on the edge of an n x n topological structure on the many-core processor, searching a vertex closest to the storage controller in the n x n topological structure;
step 2, judging whether (n-1) is divisible by 3; if so, adding a link that connects the vertex to the first node located 2/3 of the way along the diagonal ((0,0), (n-1, n-1)) on which the vertex lies; if not, evaluating the benefit of connecting each of the two candidate first nodes and selecting one of them to connect to the vertex according to the benefit;
step 3, connecting the storage controller with the vertex;
wherein in step 2, the benefit of connecting each of the two candidate first nodes is evaluated by a regression analysis method;
the step of evaluating the benefit of connecting each of the two candidate first nodes by a regression analysis method comprises:
step 21, setting 3 variables N, C and H related to n, wherein:
a)
b)
c)
step 22, setting 2 functions f (near) and f (far) related to the variables N, C and H; wherein:
a) f(near) = 2N^2 - 3N;
b) f(far) = N^2 - N - C^2 + 2CH + H^2;
step 23, comparing f(near) and f(far); if f(near) < f(far), the former candidate first node is selected to connect to the vertex; otherwise, the latter candidate first node is selected.
2. The method of claim 1, wherein step 3 comprises:
step 31, connecting the vertex to the second node where the storage controller is located.
3. The method of claim 1, wherein the topology is a wireless mesh network topology; the coordinates of the vertex are (0, 0).
4. An apparatus for memory distance optimization on a many-core processor chip, the apparatus comprising:
the searching module is used for searching a vertex closest to the storage controller in the n x n topological structure when the storage controller is arranged on the edge of the n x n topological structure on the many-core processor;
the judging module is used for judging whether (n-1) is divisible by 3; if so, adding a link that connects the vertex to the first node located 2/3 of the way along the diagonal ((0,0), (n-1, n-1)) on which the vertex lies; if not, evaluating the benefit of connecting each of the two candidate first nodes and selecting one of them to connect to the vertex according to the benefit;
a connection module for connecting the storage controller with the vertex;
wherein the judging module evaluates the benefit of connecting each of the two candidate first nodes by a regression analysis method;
the judging module comprises:
a first setting submodule for setting 3 variables N, C and H related to n, wherein:
d)
e)
f)
a second setting submodule for setting 2 functions f (near) and f (far) associated with the variables N, C, H; wherein:
c) f(near) = 2N^2 - 3N;
d) f(far) = N^2 - N - C^2 + 2CH + H^2;
the judgment connection submodule is used for comparing f(near) and f(far); if f(near) < f(far), the former candidate first node is selected to connect to the vertex; otherwise, the latter candidate first node is selected.
5. The apparatus of claim 4, wherein the connection module connects the vertex with a second node at which the storage controller is located.
6. The apparatus of claim 4, wherein the topology is a wireless mesh network topology; the coordinates of the vertex are (0, 0).

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610711933.0A CN106339350B (en) 2016-08-23 2016-08-23 The method and device thereof of many-core processor on piece memory access distance optimization

Publications (2)

Publication Number Publication Date
CN106339350A (en) 2017-01-18
CN106339350B (en) 2019-01-11

Family

ID=57824681

Family Applications (1)

Application Number Status Publication Number Priority Date Filing Date Title
CN201610711933.0A Active CN106339350B (en) 2016-08-23 2016-08-23 The method and device thereof of many-core processor on piece memory access distance optimization

Country Status (1)

Country Link
CN (1) CN106339350B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107634909A (en) * 2017-10-16 2018-01-26 北京中科睿芯科技有限公司 Towards the route network and method for routing of multiaddress shared data route bag
CN111985181B (en) * 2020-08-25 2023-09-22 北京灵汐科技有限公司 Node layout method and device, computer equipment and storage medium
CN112613266B (en) * 2020-12-02 2023-01-31 海光信息技术股份有限公司 System on chip with network topology structure, routing path determination method and device and electronic equipment
CN116405555B (en) * 2023-03-08 2024-01-09 阿里巴巴(中国)有限公司 Data transmission method, routing node, processing unit and system on chip
CN116720560B (en) * 2023-07-13 2023-12-01 中电海康集团有限公司 Brain-like system based on many-core processing unit and data processing method
CN117454930B (en) * 2023-12-22 2024-04-05 苏州元脑智能科技有限公司 Method and device for outputting expression characteristic data aiming at graphic neural network

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102591759A (en) * 2011-12-29 2012-07-18 中国科学技术大学苏州研究院 Clock precision parallel simulation system for on-chip multi-core processor
US20130318539A1 (en) * 2011-12-23 2013-11-28 Saurabh Dighe Characterization of within-die variations of many-core processors

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
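EOFDM: A lowest-energy search method for many-core architectures; Zhu Yatao et al.; Journal of Computer Research and Development (计算机研究与发展); 2015-12-31; Vol. 52, No. 6; pp. 1303-1314 *
Research on efficient on-chip memory access mechanisms for many-core processors; Fan Lingjun et al.; Proceedings of the 15th Annual Conference on Computer Engineering and Technology and the 1st Microprocessor Technology Forum; 2011-12-31; pp. 29-34 *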

Also Published As

Publication number Publication date
CN106339350A (en) 2017-01-18

Similar Documents

Publication Publication Date Title
CN106339350B (en) The method and device thereof of many-core processor on piece memory access distance optimization
US10348563B2 (en) System-on-chip (SoC) optimization through transformation and generation of a network-on-chip (NoC) topology
US9473388B2 (en) Supporting multicast in NOC interconnect
US9253085B2 (en) Hierarchical asymmetric mesh with virtual routers
CN101808032B (en) Static XY routing algorithm-oriented two-dimensional grid NoC router optimization design method
US10229087B2 (en) Many-core processor system integrated with network router, and integration method and implementation method thereof
US20140331027A1 (en) Asymmetric mesh noc topologies
CN109189720B (en) Hierarchical network-on-chip topology structure and routing method thereof
US10218581B2 (en) Generation of network-on-chip layout based on user specified topological constraints
US11983481B2 (en) Software-defined wafer-level switching system design method and apparatus
CN114844827B (en) Shared storage-based spanning tree routing hardware architecture and method for network-on-chip
CN110991626A (en) Multi-CPU brain simulation system
KR20230170609A (en) System and method for synthesis of a network-on-chip for deadlock-free transformation
US10469338B2 (en) Cost management against requirements for the generation of a NoC
CN113203940B (en) Parallel test method in 3D NoC test planning
Inam et al. Shortest path routing algorithm for hierarchical interconnection network-on-chip
US9774498B2 (en) Hierarchical asymmetric mesh with virtual routers
CN203982379U (en) For the multimode data transmission connectors of coarseness dynamic reconfigurable array
WO2016127892A1 (en) Point-to-multipoint communication method based on 3d-mesh network and communication node
CN110784781B (en) Waveguide layout optimization method of optical switching structure supporting multicast/broadcast
US11762560B2 (en) Optimizing NOC performance using crossbars
CN107104909B (en) Fault-tolerant special network-on-chip topology generation method
WO2024216854A1 (en) Two-dimensional network-on-chip structure, routing method therefor, apparatus, terminal, and storage medium
US20240048508A1 (en) Mixed-Dimension Order Routing
Gu et al. Research on congestion perception and control of network on chip

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant