WO2022042368A1 - Logical node layout method and apparatus, computer device and storage medium - Google Patents

Logical node layout method and apparatus, computer device and storage medium

Info

Publication number: WO2022042368A1
Authority: WO — WIPO (PCT)
Prior art keywords: node, unlocked, logical, nodes, processing
Application number: PCT/CN2021/112964
Other languages: English (en), French (fr)
Inventor: 何伟 (He Wei), 沈杨书 (Shen Yangshu), 祝夭龙 (Zhu Yaolong)
Original Assignee: 北京灵汐科技有限公司 (Beijing Lynxi Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by 北京灵汐科技有限公司 (Beijing Lynxi Technology Co., Ltd.)
Priority to US application 17/909,417, granted as US11694014B2
Publication of WO2022042368A1

Classifications

    • G06F30/398 — Design verification or optimisation, e.g. using design rule check [DRC], layout versus schematics [LVS] or finite element methods [FEM]
    • H04L49/109 — Packet switching elements integrated on microchip, e.g. switch-on-chip
    • G06F15/7825 — System on chip: globally asynchronous, locally synchronous, e.g. network on chip
    • G06F30/32 — Circuit design at the digital level
    • G06F30/33 — Design verification, e.g. functional simulation or model checking
    • G06F30/392 — Floor-planning or layout, e.g. partitioning or placement
    • G06F9/5066 — Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
    • H04L45/02 — Topology update or discovery
    • G06F2115/02 — System on chip [SoC] design
    • G06F2209/502 — Proximity (indexing scheme relating to G06F9/50, allocation of resources)
    • G06F30/394 — Routing (circuit design at the physical level)
    • Y02D30/70 — Reducing energy consumption in wireless communication networks

Definitions

  • Embodiments of the present invention relate to computer technology, in particular to the technical fields of neural networks and artificial intelligence, and more particularly to a logical node layout method, apparatus, computer device, and storage medium.
  • A processing node (also called a computing core) is the most basic computing element of a many-core system, undertaking important functions such as logical operations, control processing, storage access, and interconnection communication.
  • A many-core system generally contains multiple processing nodes, which are connected to each other through an on-chip network to realize the effective transmission of data across the system.
  • In the related art, each logical node (such as a computing subtask) is laid out in the many-core system statically; that is, each logical node is mapped onto a processing node in the many-core system in advance, and the position of the logical node is not adjusted after layout. Since one logical node may need data computed by other logical nodes, data transmission is required between the corresponding processing nodes.
  • The inventor found that the related art does not arrange the logical nodes in the many-core system effectively. As a result, the data transmission volume between some processing nodes in the on-chip network is too large while that between other processing nodes is unsaturated, so the routing capacity at each location of the many-core system cannot be fully exploited and the processing efficiency of the many-core system is reduced.
  • Embodiments of the present invention provide a logical node layout method, apparatus, computer device, and storage medium, so as to provide a logical node layout method for a many-core system that improves the processing efficiency of the entire many-core system.
  • an embodiment of the present invention provides a logical node layout method, which is used in a many-core system.
  • Among the multiple processing nodes of the many-core system, those located at the edge of the on-chip network are edge processing nodes, and the others are internal processing nodes. The method includes: acquiring multiple pieces of routing information, each of which includes two logical nodes and the data transmission volume between the two logical nodes; and determining the unprocessed routing information with the largest data transmission volume as the current routing information;
  • Each unlocked logical node of the current routing information is mapped to an unlocked processing node, and the mapped logical node and processing node are locked; if at least one unlocked logical node remains, the method returns to the step of determining the unprocessed routing information with the largest data transmission volume as the current routing information. If there is an unlocked edge processing node, the unlocked logical node is mapped to an unlocked edge processing node.
  • the unlocked logical nodes are mapped to the unlocked edge processing nodes at the corners of the on-chip network.
  • Mapping each unlocked logical node of the current routing information to an unlocked processing node includes: if the current routing information includes two unlocked logical nodes, mapping the two unlocked logical nodes to two unlocked processing nodes respectively according to the positions of the unlocked processing nodes; and if the current routing information includes one unlocked logical node and one locked logical node, mapping the unlocked logical node to an unlocked processing node according to the position of the unlocked processing node and the position of the processing node where the locked logical node is located.
  • Mapping the two unlocked logical nodes to two unlocked processing nodes includes: if there are at least two unlocked edge processing nodes, mapping the two unlocked logical nodes to the two closest unlocked edge processing nodes; and/or, if there is only one unlocked edge processing node, mapping one unlocked logical node to that edge processing node and the other unlocked logical node to the unlocked internal processing node closest to that edge processing node; and/or, if there is no unlocked edge processing node, mapping the two unlocked logical nodes to the two closest unlocked internal processing nodes.
  • Mapping the unlocked logical node to an unlocked processing node according to the position of the unlocked processing node and the position of the processing node where the locked logical node is located includes: if there is at least one unlocked edge processing node, mapping the unlocked logical node to the unlocked edge processing node closest to the processing node where the locked logical node is located; and if there is no unlocked edge processing node, mapping the unlocked logical node to the unlocked internal processing node closest to the processing node where the locked logical node is located.
  • an unlocked logical node is an unmapped logical node to be mapped; an unlocked processing node is an empty processing node with no logical nodes in it.
  • Alternatively, an unlocked logical node is a preset logical node that has been pre-mapped to a processing node, and the unlocked processing nodes include preset-mapped processing nodes that each hold a preset logical node.
  • Mapping each unlocked logical node of the current routing information to an unlocked processing node includes: if the unlocked logical node is mapped to a preset-mapped processing node that holds another preset logical node, moving that preset logical node to another unlocked processing node.
  • Mapping each unlocked logical node of the current routing information to an unlocked processing node includes: if the current routing information includes two unlocked logical nodes and the two unlocked logical nodes are located in two edge processing nodes whose distance is less than or equal to a preset threshold, mapping the two unlocked logical nodes to the edge processing nodes where they are respectively located.
  • an embodiment of the present invention provides a logical node layout device, which is used in a many-core system.
  • The device includes: a routing information acquisition module, configured to acquire multiple pieces of routing information, each of which includes two logical nodes and the data transmission volume between the two logical nodes; a current route determination module, configured to determine the unprocessed routing information with the largest data transmission volume as the current routing information; and a mapping module, configured to map each unlocked logical node of the current routing information to an unlocked processing node and lock the mapped logical node and processing node, and, if at least one unlocked logical node remains, to trigger the current route determination module again. If there is an unlocked edge processing node, the unlocked logical node is mapped to an unlocked edge processing node.
  • An embodiment of the present invention provides a computer device including a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the processor executes the computer program, the logical node layout method of any one of the embodiments of the present invention is implemented.
  • An embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the logical node layout method of any one of the embodiments of the present invention is implemented.
  • In the embodiments of the present invention, the unlocked logical nodes are mapped to unlocked processing nodes in descending order of the data transmission volume in the routing information, and as long as any edge processing node remains unlocked, the logical nodes are mapped to unlocked edge processing nodes, thereby preferentially mapping logical nodes with large data transmission volumes to edge processing nodes.
  • Based on the insight that the position of a processing node in the on-chip network determines how frequently it is used during data transmission, the embodiments creatively propose a new way of laying out each logical node in the many-core system according to its data transmission volume, so that the amount of data transmitted through each part of the on-chip network is relatively balanced. This makes full use of the routing capacity of each location in the many-core system and thereby improves the processing efficiency of the entire many-core system.
  • FIG. 1 is an implementation flowchart of a logical node layout method in an embodiment of the present invention;
  • FIG. 2 is an implementation flowchart of another logical node layout method in an embodiment of the present invention;
  • FIG. 3a is an implementation flowchart of another logical node layout method in an embodiment of the present invention;
  • FIG. 3b is a schematic diagram of an application scenario of a logical node layout process to which an embodiment of the present invention is applicable;
  • FIG. 3c is a schematic diagram of a scenario of another logical node layout process to which an embodiment of the present invention is applicable;
  • FIG. 3d is a schematic diagram of a scenario of another logical node layout process to which an embodiment of the present invention is applicable;
  • FIG. 3e is a schematic diagram of a scenario of another logical node layout process to which an embodiment of the present invention is applicable;
  • FIG. 4 is an implementation flowchart of another logical node layout method in an embodiment of the present invention;
  • FIG. 5 is a structural diagram of a logical node layout device in an embodiment of the present invention;
  • FIG. 6 is a structural diagram of a computer device in an embodiment of the present invention;
  • FIG. 7 is a structural diagram of a computer-readable storage medium in an embodiment of the present invention.
  • a many-core system includes multiple processing nodes, and each processing node can be connected to each other through an on-chip network to realize data interaction between any two processing nodes.
  • Data interaction can be performed directly between two adjacent processing nodes, while two non-adjacent processing nodes interact indirectly through forwarding by one or more relay processing nodes.
  • The inventor found that, in order to keep the number of relays small, the closer a processing node in an on-chip network is to the middle position, the more likely it is to be selected as a relay processing node to forward data; that is, during the entire data computation and processing procedure, the processing nodes in the middle positions frequently act as relay processing nodes to forward data.
  • When a computing task (for example, a neural network) needs to be executed, the computing task is divided into multiple computing subtasks, and a corresponding computing subtask (for example, a layer of the neural network) is allocated to each processing node in the many-core system.
  • The computing subtasks are also called logical nodes, and the above allocation process maps (lays out) the logical nodes onto the corresponding processing nodes.
  • The data transmission volume between the logical nodes can also be predetermined. Since the related art has no effective logical node layout method, logical nodes with a large data transmission volume may be laid out at the center of the on-chip network while logical nodes with a small data transmission volume are laid out at its edge. In that case, the processing nodes in the middle positions undertake both heavy data transmission tasks and heavy data relay tasks, so their routing load is excessive, while the processing nodes at the edge positions neither transmit large amounts of data nor are preferentially selected as relay processing nodes, so their routing load is unsaturated and the load across locations is unbalanced.
  • The inventor therefore creatively proposes a new approach that introduces the data transmission volume of each logical node as a reference factor during logical node layout: logical nodes with a large data transmission volume are preferentially laid out at the edge positions of the on-chip network, and logical nodes with a small data transmission volume are preferentially laid out at the center of the on-chip network, so that the processing nodes achieve routing load balance to the greatest extent.
  • an embodiment of the present invention provides a logical node layout method, which is used in a many-core system.
  • Among the multiple processing nodes of the many-core system, those located at the edge of the on-chip network are edge processing nodes, and the others are internal processing nodes.
  • a many-core system includes multiple processing nodes (computing cores), and the processing nodes are connected to each other through an on-chip network.
  • the on-chip network has a predetermined physical shape, such as a matrix shape, a star shape, or a honeycomb shape.
  • The on-chip mesh is not limited to a two-dimensional mesh (2D mesh); for example, it may also have a three-dimensional structure.
  • Some processing nodes are located at the edge of the on-chip network, that is, in the "outermost circle" of the on-chip network graph; these are called edge processing nodes. All other processing nodes are internal processing nodes.
  • When internal processing nodes of the on-chip network are used as relay processing nodes, the number of relay hops is generally smaller. Therefore, during the coordinated data processing carried out by the processing nodes of the many-core system, the internal processing nodes act as relay processing nodes far more frequently than the edge processing nodes.
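The edge/internal distinction can be sketched for a rectangular 2D mesh, one of the shapes the text mentions; the mesh dimensions and the coordinate scheme are assumptions for illustration, as is the extra corner test used by a later embodiment:

```python
# Sketch (assumed W x H rectangular 2D mesh): classify the processing nodes
# of a network-on-chip into edge nodes (the "outermost circle") and
# internal nodes, plus a test for the corner case.
def classify_nodes(width, height):
    edge, internal = [], []
    for x in range(width):
        for y in range(height):
            if x in (0, width - 1) or y in (0, height - 1):
                edge.append((x, y))      # on the outermost circle
            else:
                internal.append((x, y))
    return edge, internal

def is_corner(node, width, height):
    # A corner is an extreme case of the edge: both coordinates are extreme.
    x, y = node
    return x in (0, width - 1) and y in (0, height - 1)
```

For a 4x4 mesh this yields 12 edge nodes surrounding a 2x2 block of internal nodes.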
  • the method of the embodiment of the present invention includes:
  • each routing information includes two logical nodes and the data transmission amount between the two logical nodes.
  • the computing task is divided into multiple computing subtasks (logical nodes), and each logical node is assigned (mapped) to a processing node.
  • Since the computing task is known, the data transmission volume between the logical nodes is also known. In other words, after the logical nodes are determined, multiple pieces of routing information can be determined; each piece of routing information records two logical nodes that will exchange data and the amount of data to be transmitted between the two logical nodes (i.e., the data transmission volume).
  • The data transmission volume between two logical nodes may be the total amount of data exchanged between the two logical nodes, or the unidirectional amount of data transmitted from one logical node (the source logical node) to the other logical node (the target logical node).
  • the routing information of different logical node combinations is acquired.
  • Any one logical node usually performs data transmission not with just one other logical node but possibly with several other logical nodes. That is, a logical node may belong to multiple pieces of routing information at the same time, and different pieces of routing information may include the same logical node.
  • All routing information is sorted in descending order of data transmission volume, and the routing information with the largest data transmission volume is selected from the unprocessed routing information as the current routing information for subsequent processing. Once selected, that routing information is no longer "unprocessed".
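The descending-order selection described above amounts to a simple sort; the record shape `(node_a, node_b, volume)` is an assumption for illustration:

```python
def processing_order(routing_info):
    """Return the routing records sorted by data transmission volume,
    largest first; the head of the result is the next 'current routing
    information'. Record shape (node_a, node_b, volume) is assumed."""
    return sorted(routing_info, key=lambda r: r[2], reverse=True)
```

For example, records with volumes 120, 300, and 50 would be processed in the order 300, 120, 50.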
  • S003: map each unlocked logical node of the current routing information to an unlocked processing node, and lock the mapped logical node and processing node; if at least one unlocked logical node remains, return to the step of determining the unprocessed routing information with the largest data transmission volume as the current routing information.
  • the unlocked logical node is mapped to the unlocked edge processing node.
  • The above mapping of unlocked logical nodes must satisfy: as long as any edge processing node remains unlocked, an unlocked logical node must be mapped to an unlocked edge processing node and must not be mapped to an unlocked internal processing node.
  • After this step is completed, if any unlocked logical node remains in the routing information, the process returns to step S001, and the current routing information is reselected so that the remaining unlocked logical nodes can be allocated.
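The loop formed by these steps can be condensed into a minimal greedy sketch. Only the ordering rule (largest data transmission volume first) and the edge-first rule come from the text; the record shape, the externally supplied `dist` function, and the arbitrary choice when no partner has been placed yet are assumptions:

```python
def layout(routing_info, edge_nodes, internal_nodes, dist):
    """Greedy sketch: process routing records in descending volume order,
    mapping each unplaced logical node to an unlocked processing node,
    preferring edge nodes while any remain unlocked. Assumes there are
    at least as many processing nodes as logical nodes."""
    free_edge = set(edge_nodes)          # unlocked edge processing nodes
    free_internal = set(internal_nodes)  # unlocked internal processing nodes
    placed = {}                          # logical node -> locked processing node

    def take(preferred_near=None):
        # Edge-first rule: use internal nodes only once no edge node is free.
        pool = free_edge if free_edge else free_internal
        if preferred_near is None:
            node = next(iter(pool))      # arbitrary choice (assumption)
        else:
            # Closest unlocked node to the already-placed partner.
            node = min(pool, key=lambda p: dist(p, preferred_near))
        pool.discard(node)               # lock the processing node
        return node

    for a, b, _vol in sorted(routing_info, key=lambda r: r[2], reverse=True):
        for this, other in ((a, b), (b, a)):
            if this not in placed:       # skip already-locked logical nodes
                placed[this] = take(preferred_near=placed.get(other))
    return placed
```

The mapping locks both sides: once a logical node is placed or a processing node is taken, neither is considered again.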
  • In the embodiments of the present invention, the unlocked logical nodes are mapped to unlocked processing nodes in descending order of the data transmission volume in the routing information, and as long as any edge processing node remains unlocked, the logical nodes are mapped to unlocked edge processing nodes, thereby preferentially mapping logical nodes with large data transmission volumes to edge processing nodes.
  • Based on the insight that the position of a processing node in the on-chip network determines how frequently it is used during data transmission, the embodiments creatively propose a new way of laying out each logical node in the many-core system according to its data transmission volume, so that the amount of data transmitted through each part of the on-chip network is relatively balanced. This makes full use of the routing capacity of each location in the many-core system and thereby improves the processing efficiency of the entire many-core system.
  • the unlocked logical nodes are mapped to the unlocked edge processing nodes at the corners of the on-chip network.
  • That is, if there is an unlocked edge processing node at a corner of the on-chip network, the logical node is preferentially mapped to that corner node. Because a "corner" is a further extreme case of the edge, its probability of being selected as a relay processing node is even lower.
  • mapping each unlocked logical node of the current routing information to an unlocked processing node includes:
  • If the current routing information includes two unlocked logical nodes, the two unlocked logical nodes are mapped to two unlocked processing nodes respectively according to the positions of the unlocked processing nodes; if it includes one unlocked logical node and one locked logical node, the unlocked logical node is mapped to an unlocked processing node.
  • how to allocate the two logical nodes can be determined according to the positions of the processing nodes that are not yet locked in the on-chip network.
  • When determining how the unlocked logical node is allocated, the position of the processing node where the locked logical node is located should also be considered.
  • mapping the two unlocked logical nodes to the two unlocked processing nodes includes at least one of the following:
  • If there are at least two unlocked edge processing nodes, both unlocked logical nodes are mapped to unlocked edge processing nodes, choosing two edge processing nodes that are as close to each other as possible.
  • If there is only one unlocked edge processing node, one of the logical nodes is mapped to that edge processing node, and the other logical node is mapped to the unlocked internal processing node closest to it.
  • the two unlocked logical nodes are respectively mapped to the unlocked internal processing nodes closest to each other.
  • The distance between processing nodes is measured by the "shortest transmission step" between the two processing nodes; that is, the distance between two processing nodes in the on-chip network is the length of the shortest path among all paths that can connect the two processing nodes, rather than the straight-line physical distance between them.
  • mapping the unlocked logical node to an unlocked processing node includes:
  • The logical node is mapped to the unlocked processing node (which may be an edge processing node or an internal processing node) closest to the processing node where the locked logical node of the current routing information is located.
  • an unlocked logical node is an unmapped logical node to be mapped; an unlocked processing node is an empty processing node with no logical nodes in it.
  • All logical nodes may be "unmapped" in the initial state, and all processing nodes may be "empty" in the initial state; that is, this embodiment of the present invention is used to map unmapped logical nodes to empty processing nodes, for example, mapping n unmapped logical nodes to n empty processing nodes.
  • Alternatively, an unlocked logical node is a preset logical node that has been pre-mapped to a processing node, and the unlocked processing nodes include preset-mapped processing nodes that each hold a preset logical node.
  • Alternatively, all logical nodes may already be mapped in the initial state, so at least some processing nodes already hold logical nodes (of course, some processing nodes may still be empty). The method in this embodiment of the present invention is then used to "modify" or "adjust" the original mapping, that is, to "move" a logical node from the processing node to which it was originally mapped into a new processing node.
  • After the adjustment, a logical node may in fact still be mapped to the processing node where it was originally located; in other words, it "has not moved".
  • mapping each unlocked logical node of the current routing information to an unlocked processing node includes:
  • If the unlocked logical node is mapped to a preset-mapped processing node that originally holds another preset logical node, that preset logical node should also be "moved" to another unlocked processing node, so as to avoid a situation where one processing node holds multiple logical nodes.
  • mapping each unlocked logical node of the current routing information to an unlocked processing node includes:
  • If the two unlocked logical nodes are located in two edge processing nodes whose distance is less than or equal to a preset threshold, the two unlocked logical nodes are mapped to the edge processing nodes where they are respectively located.
  • That is, if both unlocked logical nodes of the current routing information are already in edge processing nodes and the distance (e.g., the shortest transmission step) between the two edge processing nodes does not exceed the preset threshold, the two unlocked logical nodes can be directly mapped to the edge processing nodes where they originally are; in other words, they "do not move".
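The "do not move" condition can be sketched as a simple predicate; the argument names, the edge-node set, and the externally supplied distance function are assumptions for illustration:

```python
def keep_in_place(pos_a, pos_b, threshold, edge_nodes, dist):
    """True if both logical nodes may stay where they are: both already sit
    in edge processing nodes, and those two edge nodes are no farther apart
    than the preset threshold."""
    return (pos_a in edge_nodes and pos_b in edge_nodes
            and dist(pos_a, pos_b) <= threshold)
```

When the predicate holds, the mapping step simply locks both logical nodes at their current processing nodes instead of relocating them.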
  • FIG. 2 is an implementation flowchart of another node (i.e., logical node, the same below) layout method provided by an embodiment of the present invention.
  • This embodiment can be applied to laying out each node of a chip (that is, a many-core system, the same below) into node vacancies (that is, processing nodes, the same below) when no node has been laid out in any node vacancy in advance. The method can be performed by a node layout device, which can be implemented in software and/or hardware and can generally be integrated into a computer device with data computing functions.
  • the chip to be laid out includes multiple nodes and multiple node vacancies, and each node is correspondingly arranged in the node vacancy to form a transmission network.
  • The node vacancies in the chip are reasonably configured; that is, the position of each node vacancy in the chip is preset, and adjacent node vacancies can communicate with each other.
  • the node vacancy refers to a computing core in the chip that performs data computing tasks.
  • the method of the embodiment of the present invention specifically includes the following steps:
  • each node in the chip is assigned corresponding computing subtasks.
  • the amount of data transmission between each node is also known.
  • Each piece of routing information can be determined; each piece records the two nodes between which data interaction occurs and the amount of data transmitted between the two nodes (i.e., the data transmission volume).
  • the routing information includes: the source node for sending data, the target node for receiving data, and the data transmission amount between the source node and the target node.
  • arranging each of the nodes in the direction of the network edge node vacancy toward the network center node vacancy may include:
  • The routing information can be sorted in descending order of data transmission volume, and the corresponding source nodes and target nodes can be obtained in sequence according to the sorted routing information for layout, until the layout of all nodes in the chip is completed.
  • arranging each of the nodes in the direction of the network edge node vacancy toward the network center node vacancy may include:
  • The routing information is sorted; according to the sorting result, the source node and the target node are obtained from each piece of routing information in sequence, and the non-overlapping nodes are added to a node set until the node set corresponds to all nodes in the chip. In the node set, a source node and a target node belonging to the same routing information are identified. Then, according to the node order in the node set and the routing information relationships between the nodes, each node is laid out along the direction from the network edge node vacancies toward the network center node vacancies.
  • the layout order of each node is obtained according to the sorting result, and each node is laid out according to the layout order.
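The node-set construction described above — walk the sorted routing records and collect each source and target node the first time it appears — can be sketched as follows; the record shape `(source, target, volume)` is an assumption:

```python
def build_node_order(routing_info):
    """Ordered, de-duplicated node list: nodes appearing in higher-volume
    routing records come first, giving the layout order for the chip."""
    order = []
    for src, dst, _vol in sorted(routing_info, key=lambda r: r[2], reverse=True):
        for n in (src, dst):
            if n not in order:   # add only non-overlapping nodes
                order.append(n)
    return order
```

Nodes belonging to the highest-volume route therefore head the list and are laid out first, at the network edge vacancies.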
  • FIG. 3a is an implementation flowchart of another node layout method provided by an embodiment of the present invention. This embodiment refines the above-mentioned embodiment, detailing the operation of laying out each node along the direction from network edge node vacancies toward network center node vacancies.
  • the method of this embodiment specifically includes:
  • the routing information includes: the source node for sending data, the target node for receiving data, and the data transmission amount between the source node and the target node.
  • S250: Determine whether an un-laid-out adjacent network edge node vacancy pair currently exists: if so, go to S270; otherwise, go to S280.
  • both of the above-mentioned nodes need to be laid out in corresponding node vacancies. Since the layout proceeds from network edge node vacancies toward network center node vacancies, it can first be checked whether there are network edge node vacancies in which to lay out the two nodes.
  • FIG. 3b shows a schematic diagram of an application scenario of a node layout process to which the embodiment of the invention is applicable.
  • the two node vacancies A2 and A3 are a pair of adjacent network edge node vacancies.
  • an adjacent network edge node vacancy pair is selected in which to lay out the current source node and the current target node.
  • S260: Determine whether only one of the current source node and the current target node is not laid out in a node vacancy: if so, execute S2130; otherwise, execute S2140.
  • the layout may place the current source node first and then the current target node, or the current target node first and then the current source node, etc., which is not limited in this embodiment.
  • S280 Determine whether there are currently two non-adjacent network edge node vacancies: if so, execute S290; otherwise, execute S2100.
  • FIG. 3c shows a schematic diagram of an application scenario of a node layout process to which the embodiment of the invention is applicable.
  • the two node vacancies B1 and D1 are two non-adjacent network edge node vacancies.
  • a set node vacancy can be used as a reference point, and two non-adjacent network edge node vacancies can be determined by random selection, or by clockwise or counterclockwise selection, in which the current source node and the current target node are laid out.
  • the layout may place the current source node first and then the current target node, or the current target node first and then the current source node, etc., which is not limited in this embodiment.
  • S2100 Determine whether there is currently a unique network edge node vacancy: if yes, go to S2110; if not, go to S2120.
  • the current source node and the current target node cannot both be placed in network edge node vacancies. At this time, it can be further judged whether a single remaining network edge node vacancy currently exists, to ensure that the edge vacancies of the transmission network are fully occupied before the layout continues toward the interior of the transmission network.
  • FIG. 3d shows a schematic diagram of an application scenario of a node layout process to which the embodiment of the invention is applicable.
  • B2 and A2 can exchange data directly without a relay, while B3 and A2 need one relay (A3 or B2), C2 and A2 need one relay (B2), and C3 and A2 need two relays (C3->C2->B2->A2, or C3->B3->A3->A2). Therefore, B2 is the node vacancy with the fewest relays to (i.e., closest to) A2, so A2 and B2 can be selected for laying out the current source node and the current target node.
  • the current source node and the current target node need to be laid out along the direction from network edge node vacancies toward network center node vacancies. It can be understood by those skilled in the art that the nodes may be arranged in any manner along the direction from the edge positions toward the center position.
  • all network edge node vacancies that have already been laid out can first be masked, new network edge node vacancies determined in the transmission network, and the layout continued in the previous manner; alternatively, a center point matching the contour of the entire transmission network can be determined, and the nodes arranged in order of decreasing distance from that center point, which is not limited in this embodiment.
  • S2120 merely shows one optional way of laying out along the direction from network edge node vacancies toward network center node vacancies.
  • the layout of the current source node and the current target node may be:
  • the other of the current source node and the current target node is laid out in the un-laid-out node vacancy closest to the first target node vacancy.
  • before acquiring the minimum edge distance difference corresponding to each currently un-laid-out node vacancy, the method further includes:
  • the minimum edge distance difference corresponding to each non-network edge node vacancy may be calculated first.
  • the closer a non-edge network node vacancy is to a network edge node vacancy, the smaller its minimum edge distance difference. Therefore, the nodes can be laid out in ascending order of minimum edge distance difference, which achieves the layout effect desired by the embodiments of the present invention.
  • S2130: Lay out the current target node or the current source node in an un-laid-out node vacancy matching the already-laid-out current source node or current target node, and execute S2140.
  • the layout position of the other node can be determined according to the currently laid out current source node or the current target node.
  • if the current source node is laid out at a network edge node vacancy, first check whether the network edge node vacancy adjacent to it already holds a node. If not, the current target node can be laid out at that adjacent network edge node vacancy. If it does, continue checking whether any un-laid-out network edge node vacancy remains; if so, lay out the current target node there; if not, lay out the current target node at the non-network-edge node vacancy closest to the node vacancy occupied by the current source node.
  • FIG. 3e shows a schematic diagram of an application scenario of a node layout process to which the embodiment of the invention is applicable.
  • the current source node has been laid out at the network edge node vacancy C1.
  • S2140 Determine whether the layout of all nodes is completed, if so, end the process, otherwise, return to S230.
  • by acquiring a plurality of routing information corresponding to the nodes in the chip, and laying out the nodes along the direction from network edge node vacancies toward network center node vacancies in descending order of the data transmission amount in the routing information, this layout approach builds on the conclusion that a node's position in the network determines how frequently it is used during data transmission, and creatively proposes a new way of laying out nodes within the chip according to their data transmission amounts, so that the maximum performance of each node in the chip is fully utilized, thereby improving the processing efficiency of the entire chip.
  • FIG. 4 is an implementation flowchart of another node layout method provided by an embodiment of the present invention. This embodiment refines the above-mentioned embodiment, detailing the operation of laying out each node along the direction from network edge node vacancies toward network center node vacancies.
  • the method of this embodiment specifically includes:
  • the routing information includes: the source node for sending data, the target node for receiving data, and the data transmission amount.
  • the non-overlapping nodes in the node set include all nodes in the chip, and the arrangement order of the nodes in the node set reflects the descending order of the nodes' data transmission amounts. Therefore, the technical effects of the embodiments of the present invention can be achieved by laying out the nodes sequentially, in the order of the node set, along the direction from network edge node vacancies toward network center node vacancies.
  • a source node and a target node belonging to the same routing information are identified in the node set.
  • the reason for this setting is that, when laying out each node obtained from the node set, if a node belonging to the same routing information as the node currently being laid out has already been placed, the current node can be placed in a node vacancy close to that already-placed node, which minimizes the number of relays between the two nodes and thereby improves the processing efficiency of the entire chip.
  • S350 Determine whether there is a target associated node belonging to the same routing information as the current processing node in the currently laid out node vacancies: if yes, go to S360; otherwise, go to S370.
  • for node vacancy A, it can first be detected whether node vacancy A is a network edge node vacancy; if so, the current processing node can be placed directly in an un-laid-out network edge node vacancy. If not, continue to judge whether other un-laid-out network edge node vacancies exist; if so, the current processing node can be placed directly in one of them; if not, the current processing node can be placed in the un-laid-out non-network-edge node vacancy closest to node vacancy A.
  • S370: Determine whether there is currently a network edge node vacancy: if so, execute S380; otherwise, execute S390.
  • the current processing node is laid out.
  • the layout of the current processing nodes may include:
  • the method may further include:
  • by acquiring a plurality of routing information corresponding to the nodes in the chip, and laying out the nodes along the direction from network edge node vacancies toward network center node vacancies in descending order of the data transmission amount in the routing information, this layout approach builds on the conclusion that a node's position in the network determines how frequently it is used during data transmission, and creatively proposes a new way of laying out nodes within the chip according to their data transmission amounts, so that the maximum performance of each node in the chip is fully utilized, thereby improving the processing efficiency of the entire chip.
  • an embodiment of the present invention further provides a logical node layout apparatus.
  • FIG. 5 is a structural diagram of a logical node layout device according to an embodiment of the present invention.
  • the logical node layout device provided by the embodiment of the present invention is used in a many-core system.
  • among the multiple processing nodes of the many-core system, those located at the edge of the on-chip network are edge processing nodes, and the others are internal processing nodes.
  • the apparatus of the embodiment of the present invention includes:
  • the routing information obtaining module 610 is configured to obtain a plurality of routing information; each routing information includes two logical nodes and the data transmission amount between the two logical nodes.
  • the current routing determination module 620 is configured to determine the unprocessed routing information with the largest data transmission amount as the current routing information.
  • the mapping module 630 is configured to map each unlocked logical node of the current routing information to an unlocked processing node, and to lock the mapped logical node and processing node; if at least one unlocked logical node remains, the current routing determination module is made to work again; wherein, if there is an unlocked edge processing node, the unlocked logical node is mapped to an unlocked edge processing node.
  • the logical node layout apparatus provided by the embodiment of the present invention can execute the logical node layout method provided by any embodiment of the present invention, and has functional modules and beneficial effects corresponding to the execution method.
  • an embodiment of the present invention also provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and runnable on the processor; when executing the computer program, the processor implements the logical node layout method of any embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of a computer device provided by an embodiment of the present invention.
  • the computer device includes a processor 70, a memory 71, an input device 72, and an output device 73; there may be one or more processors 70 in the computer device, and four processors 70 are taken as an example in FIG. 6. The processor 70, memory 71, input device 72, and output device 73 in the computer device can be connected by a bus or in other ways; bus connection is taken as an example.
  • the four processors 70 may cooperate together to implement the method of any embodiment of the present invention.
  • the figure shows an internal structure diagram of a processor to which the embodiments of the present invention are applied.
  • the processor 70 includes one or more nodes (also referred to as computing cores); the number of nodes can be set as required, which is not limited in this embodiment.
  • Each node includes a computing unit and a storage unit, the computing unit is used to implement core computing in the node, and the storage unit is used to perform on-chip storage for data calculated in the node.
  • the memory 71 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like.
  • the memory 71 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
  • memory 71 may further include memory located remotely from processor 70, which may be connected to the computer device through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
  • the input device 72 may be used to receive input numerical or character information and to generate key signal input related to user settings and function control of the computer device.
  • the output device 73 may include a display device such as a display screen.
  • an embodiment of the present invention further provides a computer-readable storage medium 80, on which a computer program is stored, and when the computer program is executed by a processor, implements any logical node layout of the embodiment of the present invention method.
  • the present invention can be realized by software plus necessary general-purpose hardware, and of course can also be realized by hardware, but in many cases the former is the better implementation.
  • the technical solutions of the present invention, in essence, or the parts contributing to the prior art, can be embodied in the form of a software product; the computer software product can be stored in a computer-readable storage medium, such as a computer floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk, or optical disk, and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods of the various embodiments of the present invention.
  • the included units and modules are divided only according to functional logic, but are not limited to the above division, as long as the corresponding functions can be realized; in addition, the specific names of the functional units are only for convenience of distinguishing them from each other and are not used to limit the protection scope of the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Geometry (AREA)
  • Evolutionary Computation (AREA)
  • Architecture (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The disclosed method is used in a many-core system, where among the multiple processing nodes of the many-core system, those located at the edge of the on-chip network are edge processing nodes and the others are internal processing nodes. The method includes: acquiring a plurality of routing information, each piece of which includes two logical nodes and the data transmission amount between the two logical nodes; determining the unprocessed routing information with the largest data transmission amount as the current routing information; mapping each unlocked logical node of the current routing information to an unlocked processing node, and locking the mapped logical node and processing node; and, if at least one unlocked logical node remains, returning to the step of determining the unprocessed routing information with the largest data transmission amount as the current routing information; wherein, if an unlocked edge processing node exists, the unlocked logical node is mapped to an unlocked edge processing node.

Description

Logical node layout method and apparatus, computer device, and storage medium. Technical Field
The embodiments of the present invention relate to computer technology, specifically to the technical fields of neural networks and artificial intelligence, and in particular to a logical node layout method and apparatus, a computer device, and a storage medium.
Background
With the continuous development of computer technology, various many-core systems (such as chips) have been developed. Processing nodes (also called computing cores) are the most basic computing elements of a many-core system, undertaking important functions such as logical operations, control processing, memory access, and interconnect communication. A many-core system generally has multiple processing nodes laid out on it, and the processing nodes are connected to each other through an on-chip network to achieve effective data transmission across the many-core system.
In the related art, logical nodes (such as computing subtasks) are laid out within the many-core system by static layout; that is, the layout of the logical nodes on the processing nodes of the many-core system is determined in advance, and the positions of the logical nodes are not adjusted again after layout. Since one logical node may need data computed by other logical nodes, data transmission must occur between the corresponding processing nodes.
In the process of implementing the present invention, the inventors found that the related art does not lay out the logical nodes in the many-core system effectively. As a result, the data transmission volume between some processing nodes in the on-chip network is excessive while that between other processing nodes is unsaturated, so the maximum routing capability of each position in the many-core system cannot be fully exploited, which in turn reduces the processing efficiency of the many-core system.
Summary
The embodiments of the present invention provide a logical node layout method and apparatus, a computer device, and a storage medium, so as to provide a way of laying out logical nodes within a many-core system that improves the processing efficiency of the entire many-core system.
In a first aspect, an embodiment of the present invention provides a logical node layout method for a many-core system, where among the multiple processing nodes of the many-core system, those located at the edge of the on-chip network are edge processing nodes and the others are internal processing nodes. The method includes: acquiring a plurality of routing information, each piece of which includes two logical nodes and the data transmission amount between the two logical nodes; determining the unprocessed routing information with the largest data transmission amount as the current routing information; mapping each unlocked logical node of the current routing information to an unlocked processing node, and locking the mapped logical node and processing node; and, if at least one unlocked logical node remains, returning to the step of determining the unprocessed routing information with the largest data transmission amount as the current routing information; wherein, if an unlocked edge processing node exists, the unlocked logical node is mapped to an unlocked edge processing node.
In some embodiments, if there is an unlocked edge processing node located at a corner of the on-chip network, the unlocked logical node is mapped to the unlocked edge processing node located at the corner of the on-chip network.
In some embodiments, mapping each unlocked logical node of the current routing information to an unlocked processing node includes: if the current routing information includes two unlocked logical nodes, mapping the two unlocked logical nodes to two unlocked processing nodes according to the positions of the unlocked processing nodes; and, if the current routing information includes one unlocked logical node and one locked logical node, mapping the unlocked logical node to an unlocked processing node according to the positions of the unlocked processing nodes and the position of the processing node where the locked logical node is located.
In some embodiments, mapping the two unlocked logical nodes to two unlocked processing nodes according to the positions of the unlocked processing nodes includes: if at least two unlocked edge processing nodes exist, mapping the two unlocked logical nodes to the two closest unlocked edge processing nodes; and/or, if only one unlocked edge processing node exists, mapping one of the unlocked logical nodes to that unlocked edge processing node and the other unlocked logical node to the unlocked internal processing node closest to that unlocked edge processing node; and/or, if no unlocked edge processing node exists, mapping the two unlocked logical nodes to the two closest unlocked internal processing nodes.
In some embodiments, mapping the unlocked logical node to an unlocked processing node according to the positions of the unlocked processing nodes and the position of the processing node where the locked logical node is located includes: if at least one unlocked edge processing node exists, mapping the unlocked logical node to the edge processing node closest to the processing node where the locked logical node is located; and, if no unlocked edge processing node exists, mapping the unlocked logical node to the internal processing node closest to the processing node where the locked logical node is located.
In some embodiments, the unlocked logical nodes are to-be-mapped logical nodes that have not yet been mapped, and the unlocked processing nodes are empty processing nodes holding no logical node.
In some embodiments, the unlocked logical nodes are preset logical nodes that have been pre-mapped to processing nodes, and the unlocked processing nodes include preset processing nodes holding preset logical nodes.
In some embodiments, mapping each unlocked logical node of the current routing information to an unlocked processing node includes: if an unlocked logical node is mapped into a preset processing node that holds another preset logical node, moving that preset logical node to another unlocked processing node.
In some embodiments, mapping each unlocked logical node of the current routing information to an unlocked processing node includes: if the current routing information includes two unlocked logical nodes, and the two unlocked logical nodes are located in two edge processing nodes whose mutual distance is less than or equal to a preset threshold, mapping the two unlocked logical nodes to the edge processing nodes where they are respectively located.
In a second aspect, an embodiment of the present invention provides a logical node layout apparatus for a many-core system, where among the multiple processing nodes of the many-core system, those located at the edge of the on-chip network are edge processing nodes and the others are internal processing nodes. The apparatus includes: a routing information acquisition module configured to acquire a plurality of routing information, each piece of which includes two logical nodes and the data transmission amount between the two logical nodes; a current routing determination module configured to determine the unprocessed routing information with the largest data transmission amount as the current routing information; and a mapping module configured to map each unlocked logical node of the current routing information to an unlocked processing node and lock the mapped logical node and processing node, and, if at least one unlocked logical node remains, to make the current routing determination module work again; wherein, if an unlocked edge processing node exists, the unlocked logical node is mapped to an unlocked edge processing node.
In a third aspect, an embodiment of the present invention provides a computer device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor; when executing the computer program, the processor implements any logical node layout method of the embodiments of the present invention.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements any logical node layout method of the embodiments of the present invention.
In the embodiments of the present invention, by acquiring a plurality of routing information corresponding to the logical nodes in the many-core system and mapping the unlocked logical nodes therein to unlocked processing nodes in descending order of the data transmission amount in the routing information, and by mapping a logical node to an unlocked edge processing node whenever any edge processing node remains unlocked, logical nodes with large data transmission amounts are preferentially mapped to edge processing nodes. Based on the conclusion that a processing node's position in the network determines how frequently it is used during data transmission, a new way of laying out logical nodes within a many-core system according to their data transmission amounts is creatively proposed, so that the amount of data transmitted through each part of the on-chip network is relatively balanced, the routing capability of each position in the many-core system is fully utilized, and the processing efficiency of the entire many-core system can thus be improved.
Brief Description of the Drawings
FIG. 1 is an implementation flowchart of a logical node layout method in an embodiment of the present invention;
FIG. 2 is an implementation flowchart of another logical node layout method in an embodiment of the present invention;
FIG. 3a is an implementation flowchart of another logical node layout method in an embodiment of the present invention;
FIG. 3b is a schematic diagram of an application scenario of a logical node layout process to which an embodiment of the present invention is applicable;
FIG. 3c is a schematic diagram of a scenario of another logical node layout process to which an embodiment of the present invention is applicable;
FIG. 3d is a schematic diagram of a scenario of another logical node layout process to which an embodiment of the present invention is applicable;
FIG. 3e is a schematic diagram of a scenario of another logical node layout process to which an embodiment of the present invention is applicable;
FIG. 4 is an implementation flowchart of another logical node layout method in an embodiment of the present invention;
FIG. 5 is a structural diagram of a logical node layout apparatus in an embodiment of the present invention;
FIG. 6 is a structural diagram of a computer device in an embodiment of the present invention;
FIG. 7 is a structural diagram of a computer-readable storage medium in an embodiment of the present invention.
Detailed Description
The present invention is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.
Before discussing the exemplary embodiments in more detail, it should be mentioned that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes the operations (or steps) as sequential processing, many of the operations can be performed in parallel, concurrently, or simultaneously. In addition, the order of the operations can be rearranged. The processing may be terminated when its operations are completed, but may also have additional steps not included in the drawings. The processing may correspond to a method, function, procedure, subroutine, subprogram, and so on.
First, to facilitate understanding of the embodiments of the present invention, the implementation concept of the embodiments of the present invention is briefly introduced.
As mentioned above, a many-core system includes multiple processing nodes, which can be connected to each other through an on-chip network to enable data interaction between any two processing nodes. In general, two adjacent processing nodes can exchange data directly, while two non-adjacent processing nodes must exchange data indirectly through forwarding by one or more relay processing nodes. Through research, the inventors found that, to keep the number of relays small, the closer a processing node is to the middle of the on-chip network, the more likely it is to be selected as a relay processing node to forward data; that is, during the entire data computation process, processing nodes in middle positions frequently act as relay processing nodes.
Meanwhile, when a computing task (for example, a neural network) is laid out on a many-core system, the computing task is divided into multiple computing subtasks, and each processing node in the many-core system is assigned a corresponding computing subtask (for example, a certain layer of the neural network). Thus, a computing subtask is also called a logical node, and the above assignment process is the mapping (layout) of logical nodes to corresponding processing nodes.
At this time, since the computing task is predetermined, the data transmission amount between the logical nodes can also be determined in advance. Because the related art lacks an effective logical node layout method, a situation may arise in which logical nodes with large data transmission amounts are laid out at the center of the on-chip network while logical nodes with small data transmission amounts are laid out at its edge. In this case, the processing nodes in the middle bear both heavy data transmission tasks and heavy data relay tasks, so their routing load is excessive, while the processing nodes at the edge neither need to transmit large amounts of data nor are preferentially selected as relay processing nodes, so the routing loads at different positions are unsaturated.
Based on this, the inventors creatively propose a new approach that introduces the data transmission amount of logical nodes as a reference factor during logical node layout: logical nodes with large data transmission amounts are preferentially laid out toward the edge of the on-chip network, while logical nodes with small data transmission amounts are preferentially laid out toward the center of the on-chip network, so that the routing load across the processing nodes is balanced to the greatest extent.
In a first aspect, an embodiment of the present invention provides a logical node layout method for a many-core system, where among the multiple processing nodes of the many-core system, those located at the edge of the on-chip network are edge processing nodes and the others are internal processing nodes.
The many-core system includes multiple processing nodes (computing cores), and the processing nodes are connected to each other through an on-chip network.
Since the position of each processing node in the many-core system is preset, the on-chip network has a set physical shape, for example, a matrix shape, a star shape, or a honeycomb shape.
Of course, the on-chip network is not limited to a two-dimensional network (2D mesh); for example, it may also have a three-dimensional structure.
In the on-chip network, some processing nodes are located at the edge of the on-chip network, that is, they are the processing nodes on the "outermost ring" of the on-chip network topology; these are called edge processing nodes. The processing nodes other than the edge processing nodes are internal processing nodes.
Regardless of the shape of the on-chip network, using internal processing nodes as relay processing nodes generally reduces the number of relays. Therefore, during collaborative data processing by the processing nodes of the many-core system, internal processing nodes act as relay processing nodes far more frequently than edge processing nodes do.
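As an illustration, for a rectangular 2D-mesh on-chip network (one of the shapes named above), the edge and internal processing node positions can be separated as follows. This is a minimal sketch under the assumption of a rows x cols grid with (row, column) coordinates; the function name and coordinate convention are not from the patent.

```python
def classify_slots(rows, cols):
    """Split a rows x cols mesh into edge slots (the outermost ring of the
    on-chip network) and internal slots (all remaining positions)."""
    edge, inner = [], []
    for r in range(rows):
        for c in range(cols):
            if r in (0, rows - 1) or c in (0, cols - 1):
                edge.append((r, c))    # outermost ring: edge processing node
            else:
                inner.append((r, c))   # everything else: internal node
    return edge, inner
```

For a 4 x 4 mesh this yields 12 edge positions and 4 internal positions; the internal positions are exactly the ones most likely to be chosen as relays.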
Referring to FIG. 1, the method of the embodiment of the present invention includes:
S001: Acquire a plurality of routing information.
Each piece of routing information includes two logical nodes and the data transmission amount between the two logical nodes.
When a computing task is to be laid out on a many-core system, the computing task is divided into multiple computing subtasks (logical nodes), and each logical node is assigned (mapped) to a processing node.
At this time, since the computing task is known, the data transmission amount between the logical nodes is also known. In other words, once the logical nodes are determined, a plurality of routing information can be determined; each piece of routing information records two logical nodes between which data interaction will occur and the amount of data that should be transmitted between them (i.e., the data transmission amount).
The data transmission amount between two logical nodes may be the total amount of data exchanged between the two logical nodes, or the one-way amount of data transmitted from one of the logical nodes (the source logical node) to the other (the target logical node).
In this step, routing information for the various combinations of logical nodes is acquired according to the situation of the logical nodes.
It should be understood that any given logical node usually transmits data not with just one other logical node but with multiple other logical nodes; that is, one logical node may belong to multiple pieces of routing information at the same time, and different pieces of routing information may include the same logical node.
S002: Determine the unprocessed routing information with the largest data transmission amount as the current routing information.
All routing information is sorted in descending order of data transmission amount, and from the routing information that has not yet been processed, the piece with the largest data transmission amount is selected as the current routing information for subsequent processing.
It should be understood that after the subsequent processing is completed, that piece of routing information is no longer "unprocessed" routing information.
S003: Map each unlocked logical node of the current routing information to an unlocked processing node, and lock the mapped logical node and processing node; if at least one unlocked logical node remains, return to the step of determining the unprocessed routing information with the largest data transmission amount as the current routing information.
If an unlocked edge processing node exists, the unlocked logical node is mapped to an unlocked edge processing node.
If the current routing information contains an unlocked logical node, it is mapped to an unlocked processing node; after the mapping, the logical node and the processing node are "locked", that is, the logical node is no longer an unlocked logical node and can no longer be remapped to another processing node, and the processing node is no longer an unlocked processing node and no other logical node can be mapped to it.
The mapping of unlocked logical nodes must satisfy the following: as long as any edge processing node remains unlocked, an unlocked logical node must be mapped to an unlocked edge processing node and must not be mapped to an unlocked internal processing node.
Of course, if the current routing information no longer contains any unlocked logical node (i.e., both of its logical nodes were already locked in previous iterations), this step completes directly.
After this step is completed, if any unlocked logical node remains across all routing information, the method returns to the step of determining the current routing information (S002) to reselect the current routing information and assign its unlocked logical nodes.
Of course, if no unlocked logical node remains at this point, the assignment of all logical nodes is complete, and the process can end without having to process every piece of routing information.
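The loop of steps S001 to S003 can be sketched as follows. This is a minimal illustration, assuming routes are given as (node_a, node_b, volume) tuples and the free slots have already been split into edge and internal lists; the names and the simple slot-choice policy (take the first free slot) are assumptions of the sketch, not the patent's closest-slot rules.

```python
def layout(routes, edge_free, inner_free):
    """Greedy edge-first mapping: walk routes in descending order of data
    volume; any still-unlocked logical node is locked into a free edge slot
    while one exists, otherwise into a free internal slot."""
    all_nodes = {n for r in routes for n in r[:2]}
    mapping = {}  # logical node -> processing slot (locked once assigned)
    for a, b, _volume in sorted(routes, key=lambda r: r[2], reverse=True):
        for node in (a, b):
            if node not in mapping:          # skip nodes locked earlier
                pool = edge_free if edge_free else inner_free
                mapping[node] = pool.pop(0)  # slot choice simplified here
        if len(mapping) == len(all_nodes):   # every node locked: end early,
            break                            # remaining routes need no work
    return mapping
```

The early break mirrors the remark above: once every logical node is locked, the remaining routing information need not be processed.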
In the embodiments of the present invention, by acquiring a plurality of routing information corresponding to the logical nodes in the many-core system and mapping the unlocked logical nodes therein to unlocked processing nodes in descending order of the data transmission amount in the routing information, and by mapping a logical node to an unlocked edge processing node whenever any edge processing node remains unlocked, "logical nodes with large data transmission amounts are preferentially mapped to edge processing nodes". Based on the conclusion that a processing node's position in the network determines how frequently it is used during data transmission, a new way of laying out logical nodes within a many-core system according to their data transmission amounts is creatively proposed, so that the amount of data transmitted through each part of the on-chip network is relatively balanced, the routing capability of each position in the many-core system is fully utilized, and the processing efficiency of the entire many-core system can thus be improved.
In some embodiments, if there is an unlocked edge processing node located at a corner of the on-chip network, the unlocked logical node is mapped to the unlocked edge processing node located at the corner of the on-chip network.
As one approach in the embodiments of the present invention, when an unlocked processing node located at a corner of the network exists (it is necessarily an edge processing node, since a corner is part of the edge), logical nodes are preferentially mapped to the unlocked edge processing node at the corner, because a "corner" is an even more extreme case of the edge, and its probability of serving as a relay processing node is even lower.
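The corner preference can be expressed with a small helper, assuming a rectangular mesh with (row, column) coordinates; both helpers and the tie-break (lexicographically smallest corner) are illustrative choices, not specified by the patent.

```python
def corner_slots(rows, cols):
    """Corner vacancies of a rows x cols mesh: the extreme case of edge
    vacancies, preferred first because they are least likely to relay."""
    return {(0, 0), (0, cols - 1), (rows - 1, 0), (rows - 1, cols - 1)}

def pick_slot(free_slots, rows, cols):
    """Prefer a free corner slot; otherwise fall back to the first free
    slot (edge-first ordering of free_slots is left to the caller)."""
    corners = corner_slots(rows, cols) & set(free_slots)
    return min(corners) if corners else free_slots[0]
```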
In some embodiments, mapping each unlocked logical node of the current routing information to an unlocked processing node (S002) includes:
S0021: If the current routing information includes two unlocked logical nodes, map the two unlocked logical nodes to two unlocked processing nodes according to the positions of the unlocked processing nodes.
S0022: If the current routing information includes one unlocked logical node and one locked logical node, map the unlocked logical node to an unlocked processing node according to the positions of the unlocked processing nodes and the position of the processing node where the locked logical node is located.
If neither of the two logical nodes of the current routing information is locked, how to assign the two logical nodes can be determined according to the positions of the processing nodes in the on-chip network that are still unlocked.
If only one of the two logical nodes of the current routing information is unlocked while the other is locked (locked in a previous iteration), the position of the processing node holding the locked logical node should also be considered when determining how to assign the unlocked logical node.
Of course, if neither of the two logical nodes of the current routing information is unlocked (i.e., both were locked in previous iterations), this step ends directly.
In some embodiments, mapping the two unlocked logical nodes to two unlocked processing nodes according to the positions of the unlocked processing nodes (S0021) includes at least one of the following:
S00211: If at least two unlocked edge processing nodes exist, map the two unlocked logical nodes to the two closest unlocked edge processing nodes.
S00212: If only one unlocked edge processing node exists, map one of the unlocked logical nodes to that unlocked edge processing node, and map the other unlocked logical node to the unlocked internal processing node closest to that unlocked edge processing node.
S00213: If no unlocked edge processing node exists, map the two unlocked logical nodes to the two closest unlocked internal processing nodes.
When two unlocked logical nodes of the current routing information need to be mapped, if multiple unlocked edge processing nodes exist, both unlocked logical nodes can be mapped to unlocked edge processing nodes, keeping the distance between the two chosen edge processing nodes as small as possible.
If only one unlocked edge processing node remains, one of the logical nodes is mapped to that edge processing node, and the other is mapped to the internal processing node closest to that edge processing node.
If no unlocked edge processing node remains, the two unlocked logical nodes are mapped to unlocked internal processing nodes that are closest to each other.
In the embodiments of the present invention, the distance between processing nodes is measured by the "shortest transmission step count" between the two processing nodes, that is, by the length of the shortest path among all paths in the on-chip network that connect the two processing nodes, rather than by the straight-line physical distance between them.
It should be understood that if two processing nodes are directly connected, the distance between them is the shortest, and they are said to be "adjacent".
Thus, among all assignment options, mapping logical nodes to adjacent processing nodes is necessarily the "shortest-distance" option.
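Under the shortest-transmission-step metric just described, which on an assumed 2D mesh reduces to Manhattan hop count, choosing the two closest free edge vacancies for rule S00211 can be sketched as follows; the function names and tuple coordinates are illustrative.

```python
from itertools import combinations

def hop_distance(a, b):
    """Shortest transmission step count between two slots on a 2D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def closest_pair(free_slots):
    """Return the two free vacancies with the smallest hop distance
    between them (rule S00211 when applied to free edge vacancies)."""
    return min(combinations(free_slots, 2),
               key=lambda pair: hop_distance(*pair))
```

Adjacent slots have hop distance 1, so when an adjacent pair exists it is always the minimum, matching the remark that adjacency is the shortest-distance option.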
In some embodiments, mapping the unlocked logical node to an unlocked processing node according to the positions of the unlocked processing nodes and the position of the processing node where the locked logical node is located (S0022) includes:
S00221: If at least one unlocked edge processing node exists, map the unlocked logical node to the edge processing node closest to the processing node where the locked logical node is located.
S00222: If no unlocked edge processing node exists, map the unlocked logical node to the internal processing node closest to the processing node where the locked logical node is located.
When only one unlocked logical node of the current routing information needs to be mapped (the other logical node of the current routing information must already have been mapped and locked in a previous iteration), if unlocked edge processing nodes still exist, the logical node is mapped to the edge processing node closest to the processing node holding the locked logical node of the current routing information (that processing node is necessarily an edge processing node: since edge processing nodes still remain unlocked, the previously mapped logical node must be located in an edge processing node).
If no unlocked edge processing node remains, the logical node is mapped to the internal processing node closest to the processing node holding the locked logical node of the current routing information (which may be an edge processing node or an internal processing node).
It should be understood that as long as the condition "as long as any edge processing node remains unlocked, logical nodes are mapped to unlocked edge processing nodes" is satisfied, the specific mapping can take many forms; for example, instead of requiring the "closest" processing nodes as in steps S00211 to S00213 and steps S00221 and S00222 above, other approaches such as random mapping may be used.
In some embodiments, the unlocked logical nodes are to-be-mapped logical nodes that have not yet been mapped, and the unlocked processing nodes are empty processing nodes holding no logical node.
As one approach in the embodiments of the present invention, all logical nodes may initially be "unmapped" and all processing nodes initially "empty"; that is, the embodiment is used to map unmapped logical nodes to empty processing nodes, for example mapping n unmapped logical nodes to n empty processing nodes.
In some embodiments, the unlocked logical nodes are preset logical nodes that have been pre-mapped to processing nodes, and the unlocked processing nodes include preset processing nodes holding preset logical nodes.
As one approach in the embodiments of the present invention, all logical nodes may initially already be "mapped to processing nodes", so at least some processing nodes "already hold logical nodes" (of course, some processing nodes may still be empty). In this case, the method of the embodiment is used to "modify" or "adjust" the original mapping, that is, to "move" a logical node from the processing node of its original mapping to a new processing node.
It should be understood that the existence of an "original mapping" does not mean the logical node or processing node is locked; a logical node and a processing node count as "locked" only when the logical node is mapped to the processing node according to the method of the embodiment of the present invention.
It should be understood that, in the method of the embodiment of the present invention, a logical node may in fact still be mapped to the processing node where it was originally located; in other words, it "does not move".
In some embodiments, mapping each unlocked logical node of the current routing information to an unlocked processing node (S002) includes:
S0023: If an unlocked logical node is mapped into a preset processing node that holds another preset logical node, move that preset logical node to another unlocked processing node.
As one approach in the embodiments of the present invention, when an unlocked logical node is mapped to (or moved into) a preset processing node that holds another preset logical node, the preset logical node originally in that preset processing node should also be "moved" to another unlocked processing node, to avoid a situation where one processing node holds multiple logical nodes.
It should further be understood that if the above preset logical node is moved to another preset processing node, the preset logical node originally in that other preset processing node should also continue to be "moved".
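A single step of the eviction described in S0023 can be sketched as follows. The slot-keyed dictionary and the function name are illustrative assumptions, and only one eviction step is shown; as noted above, in general the move may cascade through several preset processing nodes.

```python
def place_with_eviction(slot_contents, node, slot, free_slots):
    """Put `node` into `slot`; if the slot already held a preset logical
    node, move that occupant into a free slot so that no processing node
    ever holds two logical nodes (rule S0023). One eviction step only."""
    occupant = slot_contents.get(slot)
    slot_contents[slot] = node
    if occupant is not None:
        slot_contents[free_slots.pop(0)] = occupant
    return slot_contents
```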
In some embodiments, mapping each unlocked logical node of the current routing information to an unlocked processing node (S002) includes:
S0024: If the current routing information includes two unlocked logical nodes, and the two unlocked logical nodes are located in two edge processing nodes whose mutual distance is less than or equal to a preset threshold, map the two unlocked logical nodes to the edge processing nodes where they are respectively located.
As one approach in the embodiments of the present invention, if the two unlocked logical nodes of the current routing information are both already in edge processing nodes, and the distance between those two edge processing nodes (e.g., the shortest transmission step count) is less than a preset threshold, the two unlocked logical nodes may be mapped directly to the edge processing nodes where they originally are; in other words, they are "not moved".
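Rule S0024 reduces to a distance check; Manhattan hop count as the distance metric and the default threshold value are assumptions of this sketch, since the patent leaves the threshold unspecified.

```python
def keep_in_place(slot_a, slot_b, threshold=2):
    """True when two pre-mapped logical nodes sit in edge slots close
    enough (hop distance <= threshold) to be left where they are (S0024)."""
    return abs(slot_a[0] - slot_b[0]) + abs(slot_a[1] - slot_b[1]) <= threshold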
Some specific examples of the logical node layout method of the embodiments of the present invention are described in detail below.
FIG. 2 is an implementation flowchart of another node (i.e., logical node, likewise below) layout method provided by an embodiment of the present invention. This embodiment is applicable to laying out nodes in the node vacancies (i.e., processing nodes, likewise below) of a chip (i.e., many-core system, likewise below) according to each node's data transmission amount, in the case where no node has been laid out in any node vacancy in advance. The method can be executed by a node layout apparatus, which can be implemented in software and/or hardware and can generally be integrated on a computer device with data computing capability.
In this embodiment, the chip to be laid out includes multiple nodes and multiple node vacancies, and the nodes laid out in corresponding node vacancies form a transmission network.
In general, the layout position of each node vacancy in the chip is properly configured; that is, the position of each node vacancy in the chip is preset, and adjacent node vacancies can communicate with each other. The node vacancies refer to the computing cores in the chip that execute data computing tasks. Accordingly, the method of the embodiment of the present invention specifically includes the following steps:
S110: Acquire a plurality of routing information.
In a specific example, after a computing task is pre-laid-out on a chip, the nodes in the chip are each assigned a corresponding computing subtask. At this time, since the computing task is known, the data transmission amount between the nodes is also known. In other words, once the nodes in the chip have been assigned definite computing subtasks, multiple pieces of routing information can be determined; each piece of routing information records the two nodes between which data interaction occurs and the amount of data transmitted between them (i.e., the data transmission amount).
Accordingly, the routing information includes: the source node that sends data, the target node that receives data, and the data transmission amount between the source node and the target node.
S120: In descending order of the data transmission amount, lay out the nodes along the direction from network edge node vacancies (edge processing nodes, likewise below) toward network center node vacancies (internal processing nodes, likewise below).
In an optional implementation of this embodiment, laying out the nodes along the direction from network edge node vacancies toward network center node vacancies in descending order of the data transmission amount may include:
sorting the routing information in descending order of data transmission amount, and obtaining the current routing information in sequence according to the sorting result; obtaining the current source node and the current target node in the current routing information, and laying out the current source node and the current target node along the direction from network edge node vacancies toward network center node vacancies; and returning to the operation of obtaining the current routing information in sequence according to the sorting result, until the layout of all nodes is completed.
In this implementation, the routing information can first be sorted in descending order of data transmission amount, and the corresponding source nodes and target nodes obtained in sequence from the sorted routing information for layout, until the layout of all nodes in the chip is completed.
In another optional implementation of this embodiment, laying out the nodes along the direction from network edge node vacancies toward network center node vacancies in descending order of the data transmission amount may include:
sorting the routing information in descending order of data transmission amount; according to the sorting result, obtaining the source node and target node from each piece of routing information in sequence and adding them to a node set without duplication, to obtain a node set corresponding to all nodes in the chip, wherein source nodes and target nodes belonging to the same routing information are identified in the node set; and laying out the nodes along the direction from network edge node vacancies toward network center node vacancies according to the node order in the node set and the routing relationships between the nodes.
In this implementation, after the routing information is sorted in descending order of data transmission amount, the layout order of the nodes is obtained from the sorting result, and the nodes are laid out in that order.
FIG. 3a is an implementation flowchart of another node layout method provided by an embodiment of the present invention. This embodiment refines the above embodiment, detailing the operation of laying out the nodes along the direction from network edge node vacancies toward network center node vacancies in descending order of the data transmission amount. Accordingly, the method of this embodiment specifically includes:
S210: Acquire a plurality of routing information.
The routing information includes: the source node that sends data, the target node that receives data, and the data transmission amount between the source node and the target node.
S220: Sort the routing information in descending order of data transmission amount.
S230: Obtain the current routing information in sequence according to the sorting result.
S240: Determine whether neither the current source node nor the current target node is laid out in a node vacancy: if so, execute S250; otherwise, execute S260.
S250: Determine whether an un-laid-out adjacent network edge node vacancy pair currently exists: if so, execute S270; otherwise, execute S280.
In this embodiment, if it is determined that neither the current source node nor the current target node has been laid out in a node vacancy, this round of layout needs to place both nodes in corresponding node vacancies. Since the layout proceeds from network edge node vacancies toward network center node vacancies, it can first be checked whether there are network edge node vacancies in which to lay out the two nodes.
It can be understood that the smaller the distance between the current source node and current target node to be laid out, the smaller the number of relays for the data transmission between them, and the higher the transmission efficiency. Therefore, placing the current source node and the current target node adjacently is considered first, so it can first be determined whether an un-laid-out adjacent network edge node vacancy pair currently exists.
A so-called adjacent network edge node vacancy pair refers to two adjacent network edge node vacancies. FIG. 3b shows a schematic diagram of an application scenario of a node layout process to which the embodiment of the present invention is applicable. As shown in FIG. 3b, the two node vacancies A2 and A3 form an adjacent network edge node vacancy pair.
If multiple such adjacent network edge node vacancy pairs are detected among the currently un-laid-out node vacancies, one adjacent network edge node vacancy pair can be determined by random selection, or by clockwise or counterclockwise selection using a set node vacancy as a reference point, in which to lay out the current source node and the current target node.
S260: Determine whether only one of the current source node and the current target node is not laid out in a node vacancy: if so, execute S2130; otherwise, execute S2140.
In this embodiment, if one of the current source node and the current target node has already been laid out in a node vacancy, only the other, un-laid-out node needs to be laid out; for the layout method in this case, refer to S2130.
S270: Lay out the current source node and the current target node in the adjacent network edge node vacancy pair, and execute S2140.
The layout may place the current source node first and then the current target node, or the current target node first and then the current source node, etc., which is not limited in this embodiment.
S280: Determine whether two non-adjacent network edge node vacancies currently exist: if so, execute S290; otherwise, execute S2100.
Continuing the previous example, if neither the current source node nor the current target node is laid out in a node vacancy and no adjacent network edge node vacancy pair currently exists, it can be further checked whether both the current source node and the current target node can still be laid out at network edge node vacancies, that is, whether two non-adjacent network edge node vacancies currently exist. If they exist, both the current source node and the current target node can be laid out at network edge node vacancies; if not, the layout must proceed toward the interior of the transmission network.
Two non-adjacent network edge node vacancies are two network edge node vacancies that can exchange data only through the relay of one or more other network edge nodes. FIG. 3c shows a schematic diagram of an application scenario of a node layout process to which the embodiment of the present invention is applicable. As shown in FIG. 3c, the two node vacancies B1 and D1 are two non-adjacent network edge node vacancies.
If more than two non-adjacent network edge node vacancies are detected among the currently un-laid-out node vacancies, two non-adjacent network edge node vacancies can be determined by random selection, or by clockwise or counterclockwise selection using a set node vacancy as a reference point, in which to lay out the current source node and the current target node.
S290: Lay out the current source node and the current target node in the non-adjacent network edge node vacancies, and execute S2140.
The layout may place the current source node first and then the current target node, or the current target node first and then the current source node, etc., which is not limited in this embodiment.
S2100: Determine whether a single remaining network edge node vacancy currently exists: if so, execute S2110; otherwise, execute S2120.
Further, if it is determined that two non-adjacent network edge node vacancies do not currently exist, the current source node and the current target node cannot both be laid out at network edge node vacancies. At this time, it can be further judged whether a single remaining network edge node vacancy currently exists, to ensure that all network edge vacancies in the transmission network are filled before the layout continues toward the interior of the transmission network.
S2110: Lay out either of the current source node and the current target node in the single remaining network edge node vacancy, and lay out the other in the non-network-edge node vacancy closest to that single remaining network edge node vacancy; execute S2140.
If it is determined that a single remaining network edge node vacancy exists, one of the current source node and the current target node can be laid out in that network edge node vacancy; to shorten the number of relays between the current source node and the current target node as much as possible, the other can then be laid out in the non-network-edge node vacancy closest to that single network edge node vacancy.
The vacancy closest to a given vacancy is the vacancy with the smallest number of relays to it. FIG. 3d shows a schematic diagram of an application scenario of a node layout process to which the embodiment of the present invention is applicable. As shown in FIG. 3d, before laying out the current source node and the current target node, it is found that only the single network edge node vacancy A2 exists; at this time, one of the current source node and the current target node is laid out at A2, and among the non-edge network node vacancies B2, B3, C2, and C3, the node vacancy with the fewest relays to A2 is selected for the other. Clearly, B2 and A2 can exchange data directly without a relay, B3 and A2 need one relay (A3 or B2), C2 and A2 need one relay (B2), and C3 and A2 need two relays (C3->C2->B2->A2, or C3->B3->A3->A2). Therefore, B2 is the node vacancy with the fewest relays to (closest to) A2, so A2 and B2 can be selected for laying out the current source node and the current target node.
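The relay counts in the FIG. 3d example can be checked numerically. The (row, column) coordinates below are an assumed reconstruction of the figure's labels (letter = row, digit = column); the helper simply counts intermediate nodes on a shortest mesh path.

```python
def relay_count(a, b):
    """Number of intermediate (relay) nodes on a shortest path between two
    slots of a 2D mesh: a path of k hops passes through k - 1 relays."""
    hops = abs(a[0] - b[0]) + abs(a[1] - b[1])
    return hops - 1

# Assumed coordinates for the vacancies named in FIG. 3d (row, column).
slots = {"A2": (0, 2), "B2": (1, 2), "B3": (1, 3), "C2": (2, 2), "C3": (2, 3)}
```

With these coordinates, B2 needs 0 relays to reach A2, B3 and C2 need 1, and C3 needs 2, matching the text; B2 is therefore the non-edge vacancy closest to A2.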
S2120: lay out the current source node and the current target node according to the positional relationship between the currently unoccupied node slots and each of the network-edge node slots, and execute S2140.
Specifically, if all network-edge node slots are already occupied by nodes, the current source node and the current target node need to be laid out in the direction from the network-edge node slots toward the network-center node slots. Those skilled in the art will appreciate that any manner of laying out nodes from edge positions toward the center position may be adopted.
For example, all already-occupied network-edge node slots may first be masked, new network-edge node slots determined within the transmission network, and the layout of network-edge node slots continued as before; alternatively, a center point matching the outline of the entire transmission network may be determined and the nodes laid out in order of decreasing distance from that center point, etc.; this embodiment imposes no restriction in this regard. S2120 shows only one optional manner of laying out in the direction from network-edge node slots toward network-center node slots.
Specifically, laying out the current source node and the current target node according to the positional relationship between the currently unoccupied node slots and each of the network-edge node slots may be performed as follows:
obtaining the minimum edge distance of each currently unoccupied node slot; and
placing either one of the current source node and the current target node in the unoccupied first target node slot with the smallest minimum edge distance, and placing the other one in the unoccupied node slot closest to the first target node slot.
Optionally, before obtaining the minimum edge distance corresponding to each currently unoccupied node slot, the method further includes:
calculating multiple distance values between each non-network-edge node slot and each of the network-edge node slots, and taking the minimum distance value corresponding to each non-network-edge node slot as that slot's minimum edge distance.
In this optional implementation, the minimum edge distance corresponding to each non-network-edge node slot may first be calculated. The minimum edge distance is the minimum distance value between that non-network-edge node slot and the network-edge node slots, and this minimum distance value can be measured as the relay count plus one. For example, as shown in Fig. 3d, if the relay counts between the non-network-edge node slot B2 and the individual network-edge nodes are 0, 1, 2, 3 and 4, the minimum value 0 + 1 = 1 can be taken as the minimum edge distance of B2.
Clearly, the closer a non-network-edge node slot is to the network-edge nodes, the smaller its minimum edge distance; the nodes can therefore be laid out in order of increasing minimum edge distance, so as to achieve the layout effect required by the embodiments of the present invention.
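The minimum-edge-distance ordering can be sketched as follows on an assumed 5 x 5 mesh with (row, column) coordinates; the distance metric (Manhattan distance, i.e. relay count + 1) and all names are illustrative.

```python
def min_edge_distance(slot, edge_slots):
    """Smallest distance (relay count + 1) from a slot to any edge slot."""
    return min(abs(slot[0] - e[0]) + abs(slot[1] - e[1]) for e in edge_slots)

rows, cols = 5, 5
edge = [(r, c) for r in range(rows) for c in range(cols)
        if r in (0, rows - 1) or c in (0, cols - 1)]
interior = [(r, c) for r in range(1, rows - 1) for c in range(1, cols - 1)]

# Increasing minimum edge distance gives the edge-to-center layout order.
order = sorted(interior, key=lambda s: min_edge_distance(s, edge))
```

On this mesh the ring of interior slots adjacent to the edge all have distance 1 and come first; the central slot (2, 2), with distance 2, is placed last.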
S2130: place the current target node or the current source node in an unoccupied node slot matching the already-placed current source node or current target node, and execute S2140.
As described above, when one of the current source node and the current target node has already been placed in a node slot, the layout position of the other node can be determined from the one already placed.
Specifically, taking the case where the already-placed node is the current source node as an example: if the current source node is placed in a network-edge node slot, it is first checked whether the network-edge node slots adjacent to that slot hold a node. If such a position holds no node, the current target node can be placed in that adjacent network-edge node slot. If it does hold a node, it is further checked whether any unoccupied network-edge node slot still exists; if so, the current target node can be placed in that network-edge node slot; if not, the current target node can be placed in the non-network-edge node slot closest to the node slot in which the current source node is placed.
Fig. 3e shows a schematic diagram of an application scenario of a node layout process to which an embodiment of the present invention is applicable. As shown in Fig. 3e, after the current source node and the current target node are obtained, the current source node is determined to be already placed in network-edge node slot C1. It is then first checked whether nodes are placed at positions B1 and D1; after determining that both positions are occupied, it is further checked whether any unoccupied network-edge node slot exists. If none exists, the unoccupied node slot closest to C1, namely C2, is selected among the currently unoccupied non-network-edge node slots, and the current target node is placed at C2.
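The decision chain of S2130 can be condensed into one helper, again assuming (row, column) coordinates and Manhattan distance as the closeness measure; the function name and argument shapes are hypothetical.

```python
def place_partner(placed_slot, edge_free, interior_free):
    """Choose a slot for the unplaced partner of an already-placed node.

    Preference: a free edge slot adjacent to the placed node, then any
    free edge slot, then the closest free interior (non-edge) slot.
    """
    def dist(s):
        return abs(s[0] - placed_slot[0]) + abs(s[1] - placed_slot[1])

    adjacent = [s for s in edge_free if dist(s) == 1]
    if adjacent:                                  # adjacent edge slot is free
        return adjacent[0]
    if edge_free:                                 # some edge slot is still free
        return min(edge_free, key=dist)
    return min(interior_free, key=dist)           # fall back to interior slots
```

With C1 at (2, 0) and no free edge slots, as in Fig. 3e, the helper returns the closest interior slot, e.g. C2 at (2, 1).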
S2140: determine whether the layout of all nodes is complete; if so, end the process; otherwise, return to S230.
In the embodiment of the present invention, multiple pieces of routing information corresponding to the nodes in the chip are obtained, and the nodes in the chip are laid out in the direction from network-edge node slots toward network-center node slots in descending order of the data transmission volume in the routing information. Based on the conclusion that a node's layout position in the network determines how frequently the node is used during data transmission, this creatively proposes a new way of laying out the nodes within the chip according to their data transmission volume, makes full use of the maximum efficiency of each node in the chip, and can thus improve the processing efficiency of the entire chip.
Fig. 4 is a flowchart of another node layout method provided by an embodiment of the present invention. This embodiment is refined on the basis of the above embodiments; specifically, the operation of laying out the nodes in the direction from network-edge node slots toward network-center node slots in descending order of data transmission volume is refined. Accordingly, the method of this embodiment specifically includes:
S310: obtain multiple pieces of routing information.
Each piece of routing information includes: a source node sending data, a target node receiving data, and a data transmission volume.
S320: sort the pieces of routing information in descending order of data transmission volume.
S330: according to the sorting result, sequentially obtain the source node and target node from each piece of routing information and add them, without duplication, to a node set, so as to obtain a node set corresponding to all nodes in the chip.
In this embodiment, the node set includes, without duplication, all nodes in the chip, and the order of the nodes in the set reflects the descending order of the nodes' data transmission volumes. Therefore, laying out the nodes in the order they appear in the node set, in the direction from network-edge node slots toward network-center node slots, achieves the technical effects of the embodiments of the present invention.
In the node set, the source node and target node belonging to the same piece of routing information are marked as such. The reason for this is that, when laying out each node obtained from the node set, if a node belonging to the same routing information as the node currently being placed has already been laid out, the current node can be placed close to the node slot of that already-placed node, minimizing the number of relays between the two nodes and thereby improving the processing efficiency of the entire chip.
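S320/S330 can be sketched as follows: sort the routes by traffic, collect each node once in that order, and remember each node's route partner so a later placement can stay close to it. The tuple layout and the partner map are illustrative assumptions, not the specification's data structures.

```python
def build_node_sequence(routes):
    """routes: list of (source, target, traffic) tuples.

    Returns the deduplicated node list in descending-traffic order, plus a
    map from each node to the partner of its first (heaviest) route.
    """
    ordered, partner = [], {}
    for src, dst, _ in sorted(routes, key=lambda r: -r[2]):
        for node in (src, dst):
            if node not in partner:   # first appearance: keep traffic order
                ordered.append(node)
        partner.setdefault(src, dst)  # only the heaviest route sets the partner
        partner.setdefault(dst, src)
    return ordered, partner

# Example: route a->b carries more traffic than b->c.
ordered, partner = build_node_sequence([("a", "b", 10), ("b", "c", 5)])
```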
S340: sequentially obtain one node from the node set as the current processing node.
S350: determine whether, among the currently occupied node slots, there is a target associated node belonging to the same routing information as the current processing node; if so, execute S360; otherwise, execute S370.
S360: place the current processing node in an unoccupied node slot matching the node slot of the target associated node, and execute S3100.
In this embodiment, if the target associated node belonging to the same routing information as the current processing node is determined to be already placed in some node slot (e.g., node slot A), it can first be checked whether node slot A is a network-edge node slot.
If node slot A is a network-edge node slot, it is further determined whether an unoccupied network-edge node slot adjacent to node slot A exists. If so, the current processing node can be placed directly in that unoccupied network-edge node slot. If not, it is further determined whether any other unoccupied network-edge node slot exists; if one exists, the current processing node can be placed directly in it; if none exists, the current processing node can be placed in the unoccupied non-network-edge node slot closest to node slot A.
If node slot A is not a network-edge node slot, it is determined that no unoccupied network-edge node slot remains; in this case, the current processing node can be placed in the unoccupied non-network-edge node slot closest to node slot A.
S370: determine whether a network-edge node slot currently exists; if so, execute S380; otherwise, execute S390.
S380: place the current processing node in the network-edge node slot, and execute S3100.
Optionally, if it is determined that none of the currently placed nodes belongs to the same routing information as the current processing node, the current processing node alone can be laid out in the direction from network-edge node slots toward network-center node slots; that is, the current processing node is preferentially placed in a network-edge node slot, and if no network-edge node slot is determined to exist, the current processing node can be laid out according to the positional relationship between the currently unoccupied node slots and each of the network-edge node slots.
S390: lay out the current processing node according to the positional relationship between the currently unoccupied node slots and each of the network-edge node slots, and execute S3100.
Specifically, laying out the current processing node according to the positional relationship between the currently unoccupied node slots and each of the network-edge node slots may include:
obtaining the minimum edge distance of each currently unoccupied node slot (non-network-edge node slot), and placing the current processing node in the unoccupied node slot with the smallest minimum edge distance.
Optionally, before obtaining the minimum edge distance corresponding to each currently unoccupied node slot, the method may further include:
calculating multiple distance values between each non-network-edge node slot and each of the network-edge node slots, and taking the minimum distance value corresponding to each non-network-edge node slot as that slot's minimum edge distance.
S3100: determine whether the layout of all nodes in the node set is complete; if so, end the process; otherwise, return to S340.
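The whole loop S340-S3100 can be condensed into a small sketch, under stated assumptions: slots are (row, column) coordinates on a rectangular mesh, Manhattan distance stands in for closeness, free edge slots are always preferred, and ties are broken deterministically by sorted order. None of these choices is mandated by the specification.

```python
def layout(nodes, partner, rows, cols):
    """Map each node (already in descending-traffic order) to a slot."""
    slots = [(r, c) for r in range(rows) for c in range(cols)]
    edge = {s for s in slots if s[0] in (0, rows - 1) or s[1] in (0, cols - 1)}
    free, placement = set(slots), {}

    def dist(a, b):
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    for node in nodes:
        related = partner.get(node)
        pool = sorted(free & edge) or sorted(free)  # edge slots first (S370/S380)
        if related in placement:                    # S350/S360: stay close to it
            slot = min(pool, key=lambda s: dist(s, placement[related]))
        else:                                       # S390: deterministic free slot
            slot = pool[0]
        placement[node] = slot
        free.discard(slot)
    return placement
```

On a 3 x 3 mesh with nodes a, b, c where a and b share the heaviest route and c routes to b, a lands on the first edge slot and b and c are pulled alongside it.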
In the embodiment of the present invention, multiple pieces of routing information corresponding to the nodes in the chip are obtained, and the nodes in the chip are laid out in the direction from network-edge node slots toward network-center node slots in descending order of the data transmission volume in the routing information. Based on the conclusion that a node's layout position in the network determines how frequently the node is used during data transmission, this creatively proposes a new way of laying out the nodes within the chip according to their data transmission volume, makes full use of the maximum efficiency of each node in the chip, and can thus improve the processing efficiency of the entire chip.
In a second aspect, an embodiment of the present invention further provides a logical node layout apparatus.
Fig. 5 is a structural diagram of a logical node layout apparatus provided by an embodiment of the present invention.
The logical node layout apparatus provided by the embodiment of the present invention is used in a many-core system; among the multiple processing nodes of the many-core system, those located at the edge of the network on chip are edge processing nodes, and the others are internal processing nodes. The apparatus of the embodiment of the present invention includes:
a routing information obtaining module 610, configured to obtain multiple pieces of routing information, each piece of routing information including two logical nodes and the data transmission volume between the two logical nodes;
a current route determining module 620, configured to determine the unprocessed routing information with the largest data transmission volume as the current routing information; and
a mapping module 630, configured to map each unlocked logical node of the current routing information to an unlocked processing node and lock the mapped logical node and processing node, and, if at least one unlocked logical node still exists, to trigger the current route determining module; wherein, if an unlocked edge processing node exists, the unlocked logical node is mapped to an unlocked edge processing node.
The logical node layout apparatus provided by the embodiment of the present invention can execute the logical node layout method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to executing the method.
In a third aspect, an embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor implements the logical node layout method of any embodiment of the present invention.
Fig. 6 is a schematic structural diagram of a computer device provided by an embodiment of the present invention. As shown in Fig. 6, the computer device includes a processor 70, a memory 71, an input apparatus 72, and an output apparatus 73. The number of processors 70 in the computer device may be one or more; Fig. 6 takes four processors 70 as an example. The processor 70, memory 71, input apparatus 72, and output apparatus 73 in the computer device may be connected by a bus or in other manners; Fig. 6 takes a bus connection as an example.
The four processors 70 can cooperate with one another to implement the method of any embodiment of the present invention.
The figure also shows an internal structure diagram of a processor to which the embodiment of the present invention is applicable. The processor 70 includes one or more nodes (also called computing cores); in practice, the required number of nodes can be set as needed, and this embodiment imposes no restriction in this regard. Each node includes a computing unit and a storage unit; the computing unit performs the core computation within the node, and the storage unit provides on-chip storage for the data computed within the node.
The memory 71 may mainly include a program storage area and a data storage area, where the program storage area may store the operating system and the application programs required by at least one function, and the data storage area may store data created according to the use of the terminal, etc. In addition, the memory 71 may include a high-speed random access memory and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the memory 71 may further include memories remotely disposed relative to the processor 70, and these remote memories may be connected to the computer device through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The input apparatus 72 may be configured to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the computer device. The output apparatus 73 may include display devices such as a display screen.
In a fourth aspect, referring to Fig. 7, an embodiment of the present invention further provides a computer-readable storage medium 80 storing a computer program; when executed by a processor, the computer program implements the logical node layout method of any embodiment of the present invention.
From the above description of the implementations, those skilled in the art can clearly understand that the present invention can be implemented by software plus the necessary general-purpose hardware, and certainly also by hardware, but in many cases the former is the preferable implementation. Based on such an understanding, the technical solutions of the present invention, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as a computer floppy disk, a read-only memory (ROM), a random access memory (RAM), a flash memory (FLASH), a hard disk, or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods of the embodiments of the present invention.
It is worth noting that, in the above embodiments of the logical node layout apparatus, the included units and modules are divided only according to functional logic, but the division is not limited to the above, as long as the corresponding functions can be implemented; in addition, the specific names of the functional units are only for ease of mutual distinction and are not intended to limit the protection scope of the present invention.
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments herein, and that various obvious changes, readjustments, and substitutions can be made by those skilled in the art without departing from the protection scope of the present invention. Therefore, although the present invention has been described in some detail through the above embodiments, the present invention is not limited to the above embodiments and may include more other equivalent embodiments without departing from the concept of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (12)

  1. A logical node layout method, used in a many-core system, wherein among the multiple processing nodes of the many-core system, the ones located at the edge of the network on chip are edge processing nodes and the others are internal processing nodes, the method comprising:
    obtaining multiple pieces of routing information, each piece of routing information including two logical nodes and the data transmission volume between the two logical nodes;
    determining the unprocessed routing information with the largest data transmission volume as the current routing information; and
    mapping each unlocked logical node of the current routing information to an unlocked processing node, and locking the mapped logical node and processing node; if at least one unlocked logical node still exists, returning to the step of determining the unprocessed routing information with the largest data transmission volume as the current routing information; wherein, if an unlocked edge processing node exists, the unlocked logical node is mapped to an unlocked edge processing node.
  2. The method according to claim 1, wherein,
    if an unlocked edge processing node located at a corner of the network on chip exists, the unlocked logical node is mapped to the unlocked edge processing node located at the corner of the network on chip.
  3. The method according to claim 1, wherein mapping each unlocked logical node of the current routing information to an unlocked processing node comprises:
    if the current routing information includes two unlocked logical nodes, mapping the two unlocked logical nodes respectively to two unlocked processing nodes according to the positions of the unlocked processing nodes; and
    if the current routing information includes one unlocked logical node and one locked logical node, mapping the unlocked logical node to an unlocked processing node according to the positions of the unlocked processing nodes and the position of the processing node in which the locked logical node is located.
  4. The method according to claim 3, wherein mapping the two unlocked logical nodes respectively to two unlocked processing nodes according to the positions of the unlocked processing nodes comprises:
    if at least two unlocked edge processing nodes exist, mapping the two unlocked logical nodes respectively to the two closest unlocked edge processing nodes;
    and/or,
    if only one unlocked edge processing node exists, mapping one of the two unlocked logical nodes to the unlocked edge processing node, and the other unlocked logical node to the unlocked internal processing node closest to the unlocked edge processing node;
    and/or,
    if no unlocked edge processing node exists, mapping the two unlocked logical nodes respectively to the two closest unlocked internal processing nodes.
  5. The method according to claim 3, wherein mapping the unlocked logical node to an unlocked processing node according to the positions of the unlocked processing nodes and the position of the processing node in which the locked logical node is located comprises:
    if at least one unlocked edge processing node exists, mapping the unlocked logical node to the edge processing node closest to the processing node in which the locked logical node is located; and
    if no unlocked edge processing node exists, mapping the unlocked logical node to the internal processing node closest to the processing node in which the locked logical node is located.
  6. The method according to claim 1, wherein
    an unlocked logical node is a to-be-mapped logical node that has not yet been mapped; and
    an unlocked processing node is an empty processing node holding no logical node.
  7. The method according to claim 1, wherein
    an unlocked logical node is a preset logical node that has been pre-mapped into a processing node; and
    the unlocked processing nodes include preset processing nodes holding preset logical nodes.
  8. The method according to claim 7, wherein mapping each unlocked logical node of the current routing information to an unlocked processing node comprises:
    if an unlocked logical node is mapped into a preset processing node holding another preset logical node, moving that preset logical node to another unlocked processing node.
  9. The method according to claim 7, wherein mapping each unlocked logical node of the current routing information to an unlocked processing node comprises:
    if the current routing information includes two unlocked logical nodes, and the two unlocked logical nodes are located in two edge processing nodes whose mutual distance is less than or equal to a preset threshold, mapping the two unlocked logical nodes respectively to the edge processing nodes in which they are located.
  10. A logical node layout apparatus, used in a many-core system, wherein among the multiple processing nodes of the many-core system, the ones located at the edge of the network on chip are edge processing nodes and the others are internal processing nodes, the apparatus comprising:
    a routing information obtaining module, configured to obtain multiple pieces of routing information, each piece of routing information including two logical nodes and the data transmission volume between the two logical nodes;
    a current route determining module, configured to determine the unprocessed routing information with the largest data transmission volume as the current routing information; and
    a mapping module, configured to map each unlocked logical node of the current routing information to an unlocked processing node and lock the mapped logical node and processing node, and, if at least one unlocked logical node still exists, to trigger the current route determining module; wherein, if an unlocked edge processing node exists, the unlocked logical node is mapped to an unlocked edge processing node.
  11. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the logical node layout method according to any one of claims 1-9.
  12. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the logical node layout method according to any one of claims 1-9.
PCT/CN2021/112964 2020-08-25 2021-08-17 一种逻辑节点布局方法、装置、计算机设备及存储介质 WO2022042368A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/909,417 US11694014B2 (en) 2020-08-25 2021-08-17 Logical node layout method and apparatus, computer device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010868916.4A CN111985181B (zh) 2020-08-25 2020-08-25 一种节点布局方法、装置、计算机设备及存储介质
CN202010868916.4 2020-08-25

Publications (1)

Publication Number Publication Date
WO2022042368A1 true WO2022042368A1 (zh) 2022-03-03

Family

ID=73444136

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/112964 WO2022042368A1 (zh) 2020-08-25 2021-08-17 一种逻辑节点布局方法、装置、计算机设备及存储介质

Country Status (3)

Country Link
US (1) US11694014B2 (zh)
CN (1) CN111985181B (zh)
WO (1) WO2022042368A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230089320A1 (en) * 2020-08-25 2023-03-23 Lynxi Technologies Co., Ltd. Logical node layout method and apparatus, computer device, and storage medium

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
CN114338506B (zh) * 2022-03-15 2022-08-05 之江实验室 一种类脑计算机操作系统的神经任务片内路由方法及装置

Citations (5)

Publication number Priority date Publication date Assignee Title
CN101035023A (zh) * 2007-03-07 2007-09-12 华为技术有限公司 一种网络结构拓扑布局的方法及网管设备
US20100161793A1 (en) * 2008-12-18 2010-06-24 Electronics And Telecommunications Research Institute Method for composing on-chip network topology
CN106339350A (zh) * 2016-08-23 2017-01-18 中国科学院计算技术研究所 众核处理器片上访存距离优化的方法及其装置
CN110955463A (zh) * 2019-12-03 2020-04-03 天津大学 支持边缘计算的物联网多用户计算卸载方法
CN111985181A (zh) * 2020-08-25 2020-11-24 北京灵汐科技有限公司 一种节点布局方法、装置、计算机设备及存储介质

Family Cites Families (8)

Publication number Priority date Publication date Assignee Title
JP5035480B1 (ja) * 2012-01-27 2012-09-26 オムロン株式会社 データ中継装置、データ送信装置、ネットワークシステム
CN105450481B (zh) * 2014-07-10 2018-09-14 龙芯中科技术有限公司 片上网络的布局优化方法及装置
CN107517160B (zh) * 2016-06-15 2020-08-18 阿尔格布鲁控股有限公司 一种用于跨不同自治系统进行数据路由的方法、装置及系统
CN110166279B (zh) * 2019-04-09 2021-05-18 中南大学 一种非结构化云数据管理系统的动态布局方法
US10891414B2 (en) * 2019-05-23 2021-01-12 Xilinx, Inc. Hardware-software design flow for heterogeneous and programmable devices
US10891132B2 (en) * 2019-05-23 2021-01-12 Xilinx, Inc. Flow convergence during hardware-software design for heterogeneous and programmable devices
US11188312B2 (en) * 2019-05-23 2021-11-30 Xilinx, Inc. Hardware-software design flow with high-level synthesis for heterogeneous and programmable devices
CN111194064B (zh) * 2019-11-06 2021-10-01 周口师范学院 数据传输方法、装置、计算机设备和存储介质


Cited By (2)

Publication number Priority date Publication date Assignee Title
US20230089320A1 (en) * 2020-08-25 2023-03-23 Lynxi Technologies Co., Ltd. Logical node layout method and apparatus, computer device, and storage medium
US11694014B2 (en) * 2020-08-25 2023-07-04 Lynxi Technologies Co., Ltd. Logical node layout method and apparatus, computer device, and storage medium

Also Published As

Publication number Publication date
CN111985181A (zh) 2020-11-24
US11694014B2 (en) 2023-07-04
US20230089320A1 (en) 2023-03-23
CN111985181B (zh) 2023-09-22

Similar Documents

Publication Publication Date Title
WO2022042368A1 (zh) 一种逻辑节点布局方法、装置、计算机设备及存储介质
KR102285138B1 (ko) 네트워크 온 칩에서의 시스템 레벨 시뮬레이션
KR102374572B1 (ko) 네트워크 온 칩 설계를 위한 트랜잭션 트래픽 스펙
US9769077B2 (en) QoS in a system with end-to-end flow control and QoS aware buffer allocation
CN107710237B (zh) 服务器上深度神经网络划分
US9015448B2 (en) Message broadcast with router bypassing
US9244880B2 (en) Automatic construction of deadlock free interconnects
US9130856B2 (en) Creating multiple NoC layers for isolation or avoiding NoC traffic congestion
JP2015535630A (ja) 多層相互接続による分散型プロセッサを有する処理システム
EP3399709B1 (en) Method for forwarding packet
US9781043B2 (en) Identification of internal dependencies within system components for evaluating potential protocol level deadlocks
CN110191155B (zh) 一种面向胖树互连网络的并行作业调度方法、系统及存储介质
CN114500355B (zh) 路由方法、片上网络、路由节点和路由装置
WO2022012576A1 (zh) 路径规划方法、装置、路径规划设备及存储介质
WO2022033290A1 (zh) 强一致存储系统、数据强一致存储方法、服务器及介质
Tessier et al. Topology-aware data aggregation for intensive I/O on large-scale supercomputers
KR101382606B1 (ko) 하이브리드 광학 네트워크 온 칩의 태스크 매핑 장치 및 방법과 이를 이용한 하이브리드 광학 네트워크 온 칩 시스템
CN108347377B (zh) 数据转发方法及装置
EP3910522A1 (en) Methods and computer readable media for synthesis of a network-on-chip for deadlock-free transformation
CN109165729A (zh) 神经网络的调度方法及系统
WO2022184008A1 (zh) 众核的路由映射方法、装置、设备及介质
CN112257376B (zh) 馈通路径的规划方法及装置、电子设备、存储介质
CN114221961A (zh) 层级式dag区块链生成方法、设备、介质及程序产品
KR101558807B1 (ko) 호스트 프로세서와 협업 프로세서 간에 협업 처리를 위한 프로세서 스케줄링 방법 및 그 방법을 수행하는 호스트 프로세서
Odendahl et al. Optimized buffer allocation in multicore platforms

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21860206

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21860206

Country of ref document: EP

Kind code of ref document: A1