WO2017107411A1 - Apparatus and method for a vector data return processing unit in a fractal tree, control apparatus, and smart chip - Google Patents

Apparatus and method for a vector data return processing unit in a fractal tree, control apparatus, and smart chip Download PDF

Info

Publication number
WO2017107411A1
WO2017107411A1 (PCT/CN2016/086094)
Authority
WO
WIPO (PCT)
Prior art keywords
vector data
data
leaf
node
leaf nodes
Prior art date
Application number
PCT/CN2016/086094
Other languages
English (en)
French (fr)
Inventor
罗韬
刘少礼
张士锦
陈云霁
Original Assignee
中国科学院计算技术研究所
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院计算技术研究所
Priority to US 15/781,039 (granted as US10866924B2)
Publication of WO2017107411A1

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored program computers
    • G06F15/78Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G06F15/7825Globally asynchronous, locally synchronous, e.g. network on chip
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/16Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163Interprocessor communication
    • G06F15/173Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17306Intercommunication techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/16Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163Interprocessor communication
    • G06F15/173Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17356Indirect interconnection networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9027Trees
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Definitions

  • The invention relates to the technical fields of intelligent devices, autonomous driving, on-chip network data transmission, and the like, and in particular to an apparatus, method, control apparatus, and smart chip for a vector data return processing unit in a fractal tree.
  • The fractal tree structure is a multi-level tree structure composed of a root node serving as the central node and multiple groups of leaf nodes with self-similarity. Fractal trees are widely used in VLSI design because their complete M-ary tree layout occupies area proportional to the number of tree nodes, saving on-chip space, and gives every leaf the same propagation delay, so they are often used as the interconnection network in VLSI multiprocessors.
  • A vector in linear algebra is an ordered array of n real or complex numbers: α = (a1, a2, ..., ai, ..., an) is called an n-dimensional vector, where ai is called the i-th component of the vector α.
  • In the prior art, component data compete with one another at the intermediate nodes of the fractal tree, and maintaining the transmission of component data requires a protocol that guarantees its reliability; moreover, the leaf nodes cannot notify one another of the time at which the component data is returned. The prior art therefore does not provide effective and convenient support for vector data transmission over the on-chip fractal network, and a conflict-free, reliable, and ordered communication method suited to the network communication pattern of vector data transmission on a fractal network is especially urgent and needed.
  • To this end, the present invention provides an apparatus, method, control apparatus, and smart chip for a vector data return processing unit in a fractal tree.
  • The invention provides an apparatus for a vector data return processing unit in a fractal tree, comprising:
  • a central node which is a communication data center of the on-chip network, configured to receive vector data returned by a plurality of leaf nodes;
  • a plurality of repeater modules, each comprising a local cache structure and a data processing component, for data communication with the upper-layer and lower-layer nodes and for processing the vector data;
  • a plurality of leaf nodes divided into N groups, each group having the same number of leaf nodes, wherein the central node communicates with each group of leaf nodes separately through the repeater modules, the communication structure formed by each group of leaf nodes is self-similar, and the plurality of leaf nodes and the central node are communicably connected as a complete M-ary tree through the plurality of repeater modules; each leaf node includes a set bit, and when the set bit requires the vector data in the leaf node to be shifted, the leaf node moves its preset-bandwidth vector data to the corresponding position, otherwise the leaf node transmits the vector data back to the central node unshifted.
  • In the apparatus of the vector data return processing unit in the fractal tree, each leaf node has an id, and the ids increase sequentially from one side of the complete M-ary tree; the apparatus shares one clock signal.
  • Each repeater module includes adders of the preset bandwidth, the number of adders being the total number of leaf nodes, and each adder has an overflow check function: if the vector data has been shifted, the repeater module bit-splices the received vector data and transmits the spliced result to the upper-layer node; otherwise, the repeater module overflow-checks and adds the received vector data and then transmits the sum to the upper-layer node.
  • the invention also proposes a method of using the device, comprising:
  • Each leaf node includes a set bit: if the set bit requires the vector data in the leaf node to be shifted, the leaf node moves its preset-bandwidth vector data to the corresponding position; otherwise the leaf node transmits the vector data back to the central node.
  • Each leaf node has an id, and the ids increase sequentially from one side of the complete M-ary tree; the apparatus shares one clock signal.
  • When the set bit requires the leaf node to shift, the leaf node calculates the number of bits from its id and the preset bandwidth, and moves its vector data to the corresponding position within the full bandwidth.
  • If the vector data has been shifted, the repeater module bit-splices the received vector data and transmits the spliced result to the upper-layer node; otherwise, the repeater module overflow-checks and adds the received vector data and then transmits the sum to the upper-layer node.
  • the leaf node and the central node comply with the handshake protocol.
  • the invention also proposes a control device comprising the device.
  • the invention also proposes a smart chip comprising the control device.
  • FIG. 1 is a schematic diagram of an on-chip multi-core structure of 16+1 cores connected by using an H-tree according to an embodiment of the present invention
  • FIG. 2 is a schematic structural diagram of a hub_two_add_to_one in an embodiment of the present invention
  • FIG. 3 is a schematic diagram of a behavior of a hub_two_add_to_one handshake with a data sender in an embodiment of the present invention
  • FIG. 4 is a schematic structural diagram of an H-tree structure of the present invention developed into a complete binary tree topology
  • FIG. 5 is a schematic diagram showing the behavior of component data being shifted into vector data in a leaf tile according to an embodiment of the present invention
  • FIG. 6 is a schematic diagram of a behavior of bit splicing of vector data in a hub according to an embodiment of the present invention
  • FIG. 7 is a schematic diagram showing vector results of component data of all leaf tiles in the end of a spliced data path according to an embodiment of the present invention.
  • FIG. 8 is a schematic diagram of the behavior of vector data superimposed in a hub according to an embodiment of the present invention.
  • FIG. 9 is a schematic diagram of an on-chip multi-core structure of 64+1 cores connected by using an X-tree according to another embodiment of the present invention.
  • FIG. 10 is a schematic structural diagram of a hub_four_add_to_one according to another embodiment of the present invention.
  • FIG. 11 is a schematic diagram of a behavior of a hub_four_add_to_one handshake with a data sender according to another embodiment of the present invention.
  • FIG. 12 is a schematic diagram showing the behavior of component data being shifted into vector data in a leaf tile according to another embodiment of the present invention.
  • FIG. 13 is a schematic diagram of a behavior of bit splicing of vector data in a hub according to another embodiment of the present invention.
  • FIG. 14 is a schematic diagram showing vector results of component data of all leaf tiles in the end of a spliced data path according to another embodiment of the present invention.
  • Figure 15 is a diagram showing the behavior of vector data superimposed in a hub in another embodiment of the present invention.
  • The invention provides an apparatus for a vector data return processing unit in a fractal tree, comprising:
  • a central node which is a communication data center of the on-chip network, configured to receive vector data returned by a plurality of leaf nodes;
  • a plurality of repeater modules, each comprising a local cache structure and a data processing component, for data communication with the upper-layer and lower-layer nodes and for processing the vector data, including leaf repeater modules directly connected to the leaf nodes, a central repeater module directly connected to the central node, and intermediate repeater modules indirectly connected to the leaf nodes and the central node;
  • a plurality of leaf nodes divided into N groups, each group having the same number of leaf nodes, wherein the central node communicates with each group of leaf nodes separately through the repeater modules, the communication structure formed by each group of leaf nodes is self-similar, and the plurality of leaf nodes and the central node are communicably connected as a complete M-ary tree through the plurality of repeater modules; each leaf node includes a set bit, and when the set bit requires the vector data in the leaf node to be shifted, the leaf node moves its preset-bandwidth vector data to the corresponding position, otherwise the leaf node transmits the vector data back to the central node unshifted.
  • Each leaf node has an id, and the ids increase sequentially from one side of the complete M-ary tree; the apparatus shares one clock signal.
  • Each repeater module includes adders of the preset bandwidth, the number of adders being the total number of leaf nodes, and each adder has an overflow check function: if the vector data has been shifted, the repeater module bit-splices the received vector data and transmits the spliced result to the upper-layer node; otherwise, the repeater module overflow-checks and adds the received vector data and then transmits the sum to the upper-layer node.
  • the invention also proposes a method of using the device, comprising:
  • Each leaf node includes a set bit: if the set bit requires the vector data in the leaf node to be shifted, the leaf node moves its preset-bandwidth vector data to the corresponding position; otherwise the leaf node transmits the vector data back to the central node.
  • Each leaf node has an id, and the ids increase sequentially from one side of the complete M-ary tree; the apparatus shares one clock signal.
  • When the set bit requires the leaf node to shift, the leaf node calculates the number of bits from its id and the preset bandwidth, and moves the vector data in the leaf node to the corresponding position within the full bandwidth.
  • If the vector data has been shifted, the repeater module bit-splices the received vector data and transmits the spliced result to the upper-layer node; otherwise, the repeater module overflow-checks and adds the received vector data and then transmits the sum to the upper-layer node.
  • The leaf nodes and the central node comply with a handshake protocol: when the data senders of the lower-layer nodes are ready to send, they assert the data-valid signal and place the data on the bus; when the data receiver of the upper-layer node is ready to receive, it asserts the data-ready signal; only after both the data-valid signal and the data-ready signal are detected is the data on the bus received by the data receiver.
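  • The valid/ready handshake described above can be sketched in a few lines; this is an illustrative software model, not the patent's hardware, and the class and signal names are my own.

```python
class HandshakeChannel:
    """Illustrative model of the valid/ready handshake described above.

    The sender asserts `valid` and drives `data` onto the bus; the receiver
    asserts `ready` when its buffer can accept data. A transfer happens only
    in a cycle where both signals are high, mirroring the rule that the data
    is taken only after both the data-valid and data-ready signals are seen.
    """

    def __init__(self):
        self.valid = False
        self.ready = False
        self.bus = None          # data currently driven on the bus
        self.buffer = []         # receiver-side local cache

    def sender_drive(self, data):
        # Sender is ready: assert valid and put the data on the bus.
        self.valid = True
        self.bus = data

    def receiver_ready(self):
        # Receiver is ready: assert the data-ready signal.
        self.ready = True

    def clock_edge(self):
        # On each shared clock edge, transfer only if valid AND ready.
        if self.valid and self.ready:
            self.buffer.append(self.bus)
            self.valid = False
            self.bus = None
            return True
        return False

ch = HandshakeChannel()
ch.sender_drive(0x1234)
assert not ch.clock_edge()   # receiver not ready yet: no transfer
ch.receiver_ready()
assert ch.clock_edge()       # both valid and ready: data is latched
assert ch.buffer == [0x1234]
```

Because a transfer requires both signals, neither side can lose data by sending early or reading late, which is the point-to-point reliability property claimed for the network.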
  • The intermediate repeater module splices and transmits the vector data in its data buffer: it first bit-splices, through the adders, the vector data received from all lower-layer nodes, and then passes the result to the upper-layer node.
  • For vector data return through the addition tree between the plurality of leaf nodes and the central node, when the leaf nodes transmit full-bandwidth vector data, the vector data sent by the plurality of leaf nodes is transmitted over the hubs as follows: it is first added and buffered in the data buffers of the leaf repeater modules directly connected to the leaf nodes, then added and forwarded in the data buffers of the intermediate repeater modules, and finally buffered in the data buffer of the central repeater module directly connected to the central node, whose output port delivers the accumulated result to the central node.
  • The intermediate repeater module superimposes and transmits the vector data in its data buffer: it first adds, through the adders, the vector data received from all lower-layer nodes, and then passes the result to the upper-layer node.
  • The invention also proposes a control device comprising the apparatus.
  • the invention also proposes a smart chip comprising the control device.
  • One embodiment is a specific setting of a backhaul processing unit of vector data in an H-tree network structure.
  • FIG. 1 shows a schematic diagram of a communication device for on-chip processing and returning vector data elements of 16+1 processing units connected by an H-tree network structure.
  • The root node of the H-tree is the central tile, which is the end point of vector data transmission; the leaf nodes of the H-tree are the leaf tiles, which are the starting points of the vector data; the remaining intermediate nodes are hubs, which process and forward the vector data.
  • This device implements a communication method on the H-tree for processing the data elements that return vector results.
  • FIG. 2 shows a schematic diagram of a hub structure in an H-tree network structure.
  • The hub is composed of a hub_two_add_to_one module, which contains adders; hub_two_add_to_one processes two sets of full-bandwidth input vector data, 20 and 21, into one set of full-bandwidth output vector data 22, used for the transfer from the leaf tiles to the central tile.
  • Each group of leaf tiles (leaf tile0 labeled 150 and leaf tile1 labeled 151, then 152 and 153, 154 and 155, 156 and 157, 158 and 159, 15a and 15b, 15c and 15d, 15e and 15f) performs the handshake protocol with the leaf hub directly connected to it in the upper layer: hub3_0 labeled 140, and 141, 142, 143, 144, 145, 146, 147, respectively. After the leaf hubs hub3_0 (140) and 141 to 147 in turn handshake successfully with the intermediate hubs hub2_0 (130), 131, 132, and 133, their vector data is input into the data buffers of the intermediate hubs and bit-spliced; similarly, after the intermediate hubs hub2_0 (130), 131, 132, and 133 handshake successfully with the upper-layer hubs hub1_0 (120) and 121, their vector data is input into the data buffers of 120 and 121 and bit-spliced; finally, after the handshake protocol, 120 and 121 input the vector data into the data buffer of the central hub hub0_0, which is directly connected to the central tile, for bit splicing, and the final spliced result is output to the central tile. In this way, bit-spliced vector data return on this network structure is achieved.
  • When the hub_two_add_to_one module labeled 330 has placed the data-ready signal on the bus, and data sender 0 labeled 310 and data sender 1 labeled 320 have placed their data and data-valid signals on the bus, the handshake is considered successful in that cycle: senders 310 and 320 assume that the data receiver 330 has received the data, and in the next cycle 330 stores the data on the bus into its own buffer.
  • This data transmission protocol guarantees the reliability of data in point-to-point data transmission, thus ensuring the reliability of data transmission on the network.
  • The valid data bits transmitted by a leaf tile are the preset-bandwidth vector data, and the leaf tile is required to shift them according to the set bit before the vector data is sent.
  • The leaf tile calculates the shift amount from its unique id and the bit-width of the preset bandwidth, and moves its preset-bandwidth component data to the corresponding position on the full-bandwidth vector data.
  • FIG. 5 shows a specific example implemented on the H-tree of FIG. 1, assuming a full bandwidth of 256 bits, so that the 16-bit component data owned by the 16 leaf tiles can be spliced together.
  • The process of shifting the component data D0 of leaf tile1 is shown in FIG. 5. First, zeros are prepended to the component data so that the vector data D1 reaches the full-bandwidth width, that is, 256 bits. Second, from its id number (1) and its preset bandwidth (the 16-bit width of its component data), the number of bits the vector data should be shifted left is calculated by the formula (id * preset bandwidth); in this case the vector data needs to be shifted left by exactly 16 bits. The shift places the original component D0 at full-bandwidth data[31:16], the position of D2, forming the final vector data D3 to be transmitted.
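  • The shift computation above can be reproduced in a short sketch (the function name is mine; the patent specifies only the formula id * preset bandwidth):

```python
def place_component(component: int, leaf_id: int, bandwidth: int = 16) -> int:
    """Zero-extend a preset-bandwidth component to the full-bandwidth vector
    and shift it left by (id * preset bandwidth) bits, as in FIG. 5."""
    shift = leaf_id * bandwidth          # formula: id * preset bandwidth
    return component << shift

# Leaf tile1 with a 16-bit component: its data lands at data[31:16].
d3 = place_component(0xABCD, leaf_id=1)
assert d3 == 0xABCD0000
assert (d3 >> 16) & 0xFFFF == 0xABCD    # component recovered at bits [31:16]
```

Since each id is unique, every leaf's component lands in a distinct, non-overlapping 16-bit field of the 256-bit vector.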
  • FIG. 4 is the complete binary tree expansion of the H-tree shown in FIG. 1.
  • The id of each leaf tile corresponds to the sequence number assigned in order from one side in the topology obtained by expanding the H-tree into a complete binary tree, with number 0 corresponding to leaf tile0. The id of each leaf tile is therefore unique, and the ids of all leaf tiles are consecutive natural numbers, here 0 to 15. Consequently, the position of each leaf tile's preset-bandwidth component on the full-bandwidth vector data is unique and conflict-free, and all component data are contiguous on the full-bandwidth vector data. As shown in FIG. 7, this vector data represents the result of bit-splicing all the preset-bandwidth components of the leaf tiles in the above example.
  • The component D0 represents the component owned by leaf tile15 in the vector data, located at full-bandwidth data[255:240]; the component D1 represents the component owned by leaf tile14, located at full-bandwidth data[239:224]. The positions of every two different leaf tiles on the full-bandwidth vector data do not conflict, and the components are contiguous and arranged in order. It can be seen that this shift scheme provides technical support for conflict-free, ordered vector result return on the H-tree structure.
  • The hub splices and transmits the vector data in its data buffer.
  • the hub stores the vector data in a local cache.
  • The number of adders in the hub equals the number of leaf nodes; in this example there are 16 leaf nodes, so there are 16 adders. Each adder operates at the preset bandwidth, set to 16 bits in this example, and each adder has an overflow check function.
  • The two vector data D3 and D1 delivered by leaf tile0 and leaf tile1 are superimposed and spliced: after bit splicing, the component D4 of leaf tile0 is located at data[31:16] of the full-bandwidth vector data D2, that is, the position of D0, and the component D5 of leaf tile1 is located at data[15:0] of D2, that is, the position of D1. Their component data are ordered, unique, and non-conflicting on this vector result data.
  • The vector data shown in FIG. 7 is the vector result finally produced by hub0_0 when this method is carried out on the structure of FIG. 1. It can be seen that this method provides technical support for conflict-free, ordered vector result return.
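  • Because every leaf's component occupies a disjoint bit field after shifting, the hub's bit splicing reduces to combining pre-shifted vectors; a minimal sketch (function name mine):

```python
def splice(vectors):
    """Bit-splice pre-shifted full-bandwidth vectors.

    Each input vector carries its component in a disjoint bit field and
    zeros everywhere else, so a bitwise OR (equivalently, addition with no
    carries across fields) merges them without conflict.
    """
    result = 0
    for v in vectors:
        assert result & v == 0, "fields must not overlap"
        result |= v
    return result

# tile0's component at data[15:0], tile1's at data[31:16]:
d2 = splice([0x00001111, 0x22220000])
assert d2 == 0x22221111
```

Applied level by level up the tree, this yields the fully spliced 256-bit result at the central tile with no field ever written twice.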
  • Each group of leaf tiles (leaf tile0 labeled 150 and leaf tile1 labeled 151, then 152 and 153, 154 and 155, 156 and 157, 158 and 159, 15a and 15b, 15c and 15d, 15e and 15f) performs the handshake protocol with the leaf hub directly connected to it in the upper layer (hub3_0 labeled 140, and 141, 142, 143, 144, 145, 146, 147), inputs its vector data into the data buffer of the leaf hub, and the data is superimposed there. After the leaf hubs hub3_0 (140) and 141 to 147 handshake successfully with the upper-layer intermediate hubs hub2_0 (130), 131, 132, and 133, their vector data is input into the data buffers of the intermediate hubs and superimposed; after the intermediate hubs handshake successfully with the upper-layer hubs hub1_0 (120) and 121, the vector data is input into 120 and 121 and superimposed; finally, by handshaking, 120 and 121 input the vector data into the data buffer of the central hub hub0_0 directly connected to the central tile, and the final superimposed result is output to the central tile through the output port. The vector data in the leaf tiles thus completes the addition-tree operation on the path back to the central tile, realizing addition-tree vector data return on this network structure.
  • the hub superimposes and transmits the vector data in the data buffer.
  • the hub stores the vector data in a local cache.
  • The number of adders in the hub equals the number of leaf nodes; in this example there are 16 leaf nodes, so there are 16 adders. Each adder operates at the preset bandwidth, set to 16 bits in this example, and each adder has an overflow check function.
  • The adders superimpose the 16-bit components of the two vector data D3 and D5 passed up by the received lower-layer leaf nodes, leaf tile0 and leaf tile1. The low component D4 of D3 is located at full-bandwidth data[15:0], and the low component D6 of D5 is likewise located at data[15:0]; after the overflow check and judgment, their sum is written to the position of the D0 component of the result D2, that is, data[15:0]. If the sum of D4 and D6 overflows, the adder judges and assigns the value based on the overflow result. In this way, addition-tree vector data return on the above device is achieved.
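  • One preset-bandwidth lane of this addition tree can be sketched as follows. The patent says only that the adder "judges and assigns based on the overflow result" without giving the policy, so saturating to the lane maximum is used here purely as an illustrative assumption; the function names are mine.

```python
BANDWIDTH = 16
LANE_MAX = (1 << BANDWIDTH) - 1

def add_lane(a: int, b: int) -> int:
    """Add two 16-bit components with an overflow check.

    Saturation on overflow is an assumed placeholder for the patent's
    unspecified overflow judgment.
    """
    s = a + b
    return LANE_MAX if s > LANE_MAX else s  # overflow detected: clamp (assumption)

def add_vectors(x: int, y: int, lanes: int = 16) -> int:
    """Apply the per-lane adder across a full-bandwidth vector of lanes."""
    out = 0
    for i in range(lanes):
        a = (x >> (i * BANDWIDTH)) & LANE_MAX
        b = (y >> (i * BANDWIDTH)) & LANE_MAX
        out |= add_lane(a, b) << (i * BANDWIDTH)
    return out

assert add_lane(0x0001, 0x0002) == 0x0003
assert add_lane(0xFFFF, 0x0001) == 0xFFFF   # overflow clamps to lane max
```

Running one such lane per leaf node (16 lanes here) in every hub is exactly the per-hub step of the addition tree.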
  • Another embodiment is a specific setting of a backhaul processing unit of vector data in an X-tree network structure.
  • Figure 9 is a diagram showing a communication device for on-chip processing and returning vector data elements of 64+1 processing units connected by an X-tree network structure.
  • the root node of the X tree is the central tile, which is the end point of the vector data transmission;
  • the leaf node of the X tree is the leaf tile, which is the starting point of the vector data;
  • the remaining intermediate nodes are the hub, which is used to process and transmit the vector data.
  • Each leaf tile has a unique id, corresponding to the sequence number assigned in order in the topology obtained by expanding the X-tree into a complete quadtree, with number 0 corresponding to leaf tile0.
  • the id identifier of each leaf tile is unique, and the ids of all leaf tiles are consecutive natural numbers, in this case, the natural number is 0 to 63.
  • the component data of the preset bandwidth corresponding to each leaf tile on the full-bandwidth vector data is unique and non-conflicting, and all component data is continuous on the full-bandwidth vector data.
  • This device implements a communication method for processing data elements of the X-tree that returns vector results.
  • Figure 10 is a block diagram showing the structure of a hub in an X-tree network structure.
  • The hub is composed of a hub_four_add_to_one module, which contains adders; hub_four_add_to_one processes four sets of full-bandwidth input vector data a1, a2, a3, and a4 into one set of full-bandwidth output vector data a5, used for the transfer from the leaf tiles to the central tile.
  • Each group of leaf tiles (leaf tile0 labeled 940, leaf tile1 labeled 941, leaf tile2 labeled 942, and leaf tile3 labeled 943, then 944, 945, 946, and 947, ..., through 9a0, 9a1, 9a2, and 9a3) performs the handshake protocol with the leaf hub directly connected to it in the upper layer: hub2_0 labeled 930, and 931, 932, 933, 934, 935, 936, 937, 938, 939, 93a, 93b, 93c, 93d, 93e, and 93f, respectively.
  • The handshake protocol is considered successful in that cycle: the senders b1, b2, b3, and b4 assume that the data receiver b5 has received the data, and in the next cycle b5 stores the data on the bus into its own buffer.
  • This data transmission protocol guarantees the reliability of data in point-to-point data transmission, thus ensuring the reliability of data transmission on the network.
  • The valid data bits transmitted by a leaf tile are the preset-bandwidth vector data, and before the vector data is transmitted, the leaf tile is required, according to the set bit, to shift the component data it owns.
  • The leaf tile calculates the shift amount from its unique id and the bit-width of the preset bandwidth, and moves its preset-bandwidth component data to the corresponding position on the full-bandwidth vector data. As shown in FIG. 12, a specific example implemented on the X-tree of FIG. 9 assumes a full bandwidth of 1024 bits, so that the 16-bit component data owned by the 64 leaf tiles can be spliced together.
  • The process of shifting the component data c1 of leaf tile1 is shown in FIG. 12. First, zeros are prepended to the component data so that the vector data c2 reaches the full-bandwidth width, that is, 1024 bits. Second, from its id number (1) and its preset bandwidth (the 16-bit width of its component data), the number of bits the vector data should be shifted left is calculated by the formula (id * preset bandwidth); in this case the vector data needs to be shifted left by exactly 16 bits. The shift places the original component c1 at full-bandwidth data[31:16], the position of c3, forming the final vector data c4 to be transmitted.
  • This vector data represents the result of bit-splicing all the preset-bandwidth components of the leaf tiles in the above example.
  • The component f3 represents the component owned by leaf tile63 in the vector data, located at full-bandwidth data[1023:1008]; the component f2 represents the component owned by leaf tile62 in the vector data.
  • The component f1 shown in the figure represents the component owned by leaf tile1 in the vector data, located at full-bandwidth data[31:16]; the component f0 represents the component owned by leaf tile0, located at full-bandwidth data[15:0].
  • The positions of every two different leaf tiles on the full-bandwidth vector data do not conflict, and the components are contiguous and arranged in order. It can be seen that this shift scheme provides technical support for conflict-free, ordered vector result return on the X-tree structure.
  • The hub splices and transmits the vector data in its data buffer.
  • the hub stores the vector data in a local cache.
  • The number of adders in the hub equals the number of leaf nodes; in this example there are 64 leaf nodes, so there are 64 adders. Each adder operates at the preset bandwidth, set to 16 bits in this example, and each adder has an overflow check function.
  • The four vector data e7, e9, e11, and e13 transmitted by leaf tile0, leaf tile1, leaf tile2, and leaf tile3 are superimposed and spliced. It can be seen that after bit splicing the component e6 of leaf tile0 is located at data[15:0] of the full-bandwidth vector data e5, that is, the position of e1, and the component e8 of leaf tile1 is located at data[31:16] of e5.
  • The vector data shown in FIG. 14 is the vector result finally produced by hub0_0 when this method is carried out on the structure of FIG. 9. It can be seen that this method provides technical support for conflict-free, ordered vector result return.
  • Each group of leaf tiles (leaf tile0 labeled 940, leaf tile1 labeled 941, leaf tile2 labeled 942, and leaf tile3 labeled 943, then 944, 945, 946, and 947, ..., through 9a0, 9a1, 9a2, and 9a3) performs the handshake protocol with the leaf hub directly connected to it in the upper layer, inputs its vector data into the data buffer of the leaf hub, and the data is superimposed there; after the leaf hubs hub2_0 (930), 931, 932, 933, 934, 935, 936, 937, 938, 939, 93a, 93b, 93c, 93d, 93e, and 93f handshake successfully with the hubs of the upper layers, the vector data is superimposed level by level until the final superimposed result is output to the central tile.
  • the hub superimposes and transmits the vector data in the data buffer.
  • the hub stores the vector data in a local cache.
  • the number of adders in a hub equals the number of leaf nodes; in this example there are 64 leaf nodes, so there are 64 adders.
  • each adder operates at the preset bandwidth, set to 16 bits in this example, and each adder has an overflow-check function.
  • the adders separately superimpose the 64 components of the four vector data g5, g7, g9, and g11 received from the group of leaf nodes on the layer below: leaf tile0, leaf tile1, leaf tile2, and leaf tile3.
  • the low-order component g6 of g5 is located at full-bandwidth data[15:0];
  • the low-order component g8 of g7 is located at full-bandwidth data[15:0];
  • the low-order component g10 of g9 is located at full-bandwidth data[15:0];
  • the low-order component g12 of g11 is located at full-bandwidth data[15:0].
  • after the overflow check and judgment, the sum of the four is written to the position of component g13 of the result g4, i.e., data[15:0]. If the superposition of g6, g8, g10, and g12 overflows, the adder judges and estimates the assigned value based on the overflow result. In this way, addition-tree vector-data return on the above apparatus is achieved.
  • a communication apparatus and method for processing data elements that return vector results for a fractal tree, which performs bit splicing, superposition, and similar operations on vector-data results for an on-chip network without collision.
  • the return of vector results is completed reliably and in order, so that communication is more convenient and effective.
  • the invention implements bit splicing, superposition, and similar operations on vector-data results for the on-chip network, so that vector results can be returned without conflict, reliably, and in order, thereby achieving better communication.


Abstract

The present invention provides an apparatus and method for a vector-data return processing unit in a fractal tree, a control device, and an intelligent chip. The apparatus comprises: a central node, which receives the vector data returned by leaf nodes; a plurality of leaf nodes, which compute and shift the vector data; and hub (forwarder) modules, each comprising a local cache structure and a data processing component. The plurality of leaf nodes are divided into N groups, each group containing the same number of leaf nodes; the central node is communicatively connected to each group of leaf nodes individually through the hub modules; the communication structure formed by each group of leaf nodes is self-similar; and the plurality of leaf nodes are communicatively connected to the central node through multiple layers of hub modules in a complete M-ary tree. Each leaf node includes a setting bit: if the setting bit requires the vector data in the leaf node to be shifted, the leaf node shifts its preset-bandwidth vector data to the corresponding position; otherwise the leaf node returns the vector data to the central node directly.

Description

Apparatus and Method for a Vector-Data Return Processing Unit in a Fractal Tree, Control Device, and Intelligent Chip. Technical Field
The present invention relates to the technical fields of intelligent devices, autonomous driving, and network-on-chip data transmission, and in particular to an apparatus and method for a vector-data return processing unit in a fractal tree, a control device, and an intelligent chip.
Background
A fractal tree is a multi-level tree structure consisting of one root node serving as the central node and multiple groups of self-similar leaf nodes. Fractal trees are widely used in very-large-scale integrated circuit design because the layout adopts a complete M-ary tree: the area used by such a layout is proportional to the number of tree nodes, which saves on-chip space, and the propagation delay along the fractal is uniform, so it is often used as the interconnection network in VLSI multiprocessors.
In linear algebra, a vector is an ordered array of n real or complex numbers, called an n-dimensional vector: a = (a1, a2, ..., ai, ..., an), where ai is called the i-th component of the vector a.
When the components of a vector scattered across the fractal leaf nodes are returned to the fractal root node, the component data compete with one another for transmission at the intermediate nodes of the fractal. Maintaining the transmission of component data requires a protocol to guarantee their reliability; the leaf nodes cannot notify one another of the times at which component data are sent back, so when the component data reach the root node they arrive out of order, and the root node must establish a complex mechanism with the leaf nodes to maintain the order among the components and finally assemble the component data in order into vector data.
The prior art provides no effective and convenient support for communicating vector data over an on-chip fractal network. It is therefore particularly urgent and necessary to provide a collision-free, reliable, and ordered communication scheme suited to transmitting vector data over a fractal network.
Disclosure of the Invention
In view of the deficiencies of the prior art, the present invention provides an apparatus and method for a vector-data return processing unit in a fractal tree, a control device, and an intelligent chip.
The present invention provides an apparatus for a vector-data return processing unit in a fractal tree, comprising:
a central node, which is the communication data center of the network-on-chip and is configured to receive the vector data returned by a plurality of leaf nodes;
a plurality of leaf nodes, configured to compute and shift the vector data;
hub (forwarder) modules, each comprising a local cache structure and a data processing component, configured for data communication with upper-layer and lower-layer nodes and for processing the vector data;
wherein the plurality of leaf nodes are divided into N groups, each group containing the same number of leaf nodes; the central node is communicatively connected to each group of leaf nodes individually through the hub modules; the communication structure formed by each group of leaf nodes is self-similar; the plurality of leaf nodes and the central node are communicatively connected through multiple layers of hub modules in a complete M-ary tree; and each leaf node includes a setting bit, so that if the setting bit requires the vector data in the leaf node to be shifted, the leaf node shifts its preset-bandwidth vector data to the corresponding position, otherwise the leaf node returns the vector data to the central node directly.
In the apparatus for a vector-data return processing unit in a fractal tree, each leaf node has an id, and the ids increase in sequence starting from one side of the topology of the complete M-ary tree; the data distribution apparatus shares one clock signal.
In the apparatus for a vector-data return processing unit in a fractal tree, each hub module includes adders of the preset bandwidth, the number of adders equals the total number of leaf nodes, and the adders have an overflow-check function; if the vector data have been shifted, the hub module bit-splices the received vector data and transmits the spliced result to the node on the layer above, otherwise the hub module checks the received vector data for overflow, performs the addition, and then transmits the result to the node on the layer above.
The present invention further provides a method using the apparatus, comprising:
computing and shifting the vector data by the leaf nodes and returning them to the central node, wherein each leaf node includes a setting bit, so that if the setting bit requires the vector data in the leaf node to be shifted, the leaf node shifts its preset-bandwidth vector data to the corresponding position, otherwise the leaf node returns the vector data to the central node directly.
In the method, each leaf node has an id, and the ids increase in sequence starting from one side of the topology of the complete M-ary tree; the data distribution apparatus shares one clock signal.
In the method, if the data transmitted by a leaf node are valid vector data of the preset bandwidth, the setting bit requires the leaf node to shift: the leaf node computes the shift from its id and the number of preset-bandwidth bits and moves its vector data to the corresponding position on the full bandwidth.
In the method, if the vector data have been shifted, the hub module bit-splices the received vector data and transmits the spliced result to the node on the layer above; otherwise, the hub module checks the received vector data for overflow, performs the addition, and then transmits the result to the node on the layer above.
In the method, the leaf nodes and the central node observe a handshake protocol.
The present invention further provides a control device comprising the apparatus.
The present invention further provides an intelligent chip comprising the control device.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of an on-chip multi-core structure of 16+1 cores connected by an H-tree in one embodiment of the present invention;
Fig. 2 is a schematic diagram of the hub_two_add_to_one structure in one embodiment of the present invention;
Fig. 3 is a schematic diagram of the handshake behavior between hub_two_add_to_one and the data senders in one embodiment of the present invention;
Fig. 4 is a schematic diagram of the H-tree structure of the present invention expanded into a complete binary-tree topology;
Fig. 5 is a schematic diagram of component data being shifted into vector data in a leaf tile in one embodiment of the present invention;
Fig. 6 is a schematic diagram of vector data being bit-spliced in a hub in one embodiment of the present invention;
Fig. 7 is a schematic diagram of the vector result of the component data of all leaf tiles at the end of the bit-splicing data path in one embodiment of the present invention;
Fig. 8 is a schematic diagram of vector data being superimposed in a hub in one embodiment of the present invention;
Fig. 9 is a schematic diagram of an on-chip multi-core structure of 64+1 cores connected by an X-tree in another embodiment of the present invention;
Fig. 10 is a schematic diagram of the hub_four_add_to_one structure in another embodiment of the present invention;
Fig. 11 is a schematic diagram of the handshake behavior between hub_four_add_to_one and the data senders in another embodiment of the present invention;
Fig. 12 is a schematic diagram of component data being shifted into vector data in a leaf tile in another embodiment of the present invention;
Fig. 13 is a schematic diagram of vector data being bit-spliced in a hub in another embodiment of the present invention;
Fig. 14 is a schematic diagram of the vector result of the component data of all leaf tiles at the end of the bit-splicing data path in another embodiment of the present invention;
Fig. 15 is a schematic diagram of vector data being superimposed in a hub in another embodiment of the present invention.
Best Mode for Carrying Out the Invention
The present invention provides an apparatus for a vector-data return processing unit in a fractal tree, comprising:
a central node, which is the communication data center of the network-on-chip and is configured to receive the vector data returned by a plurality of leaf nodes;
a plurality of leaf nodes, configured to compute and shift the vector data;
hub modules, each comprising a local cache structure and a data processing component, configured for data communication with upper-layer and lower-layer nodes and for processing the vector data, including leaf hub modules directly connected to the leaf nodes, a central hub module directly connected to the central node, and intermediate hub modules indirectly connected to both the leaf nodes and the central node;
wherein the plurality of leaf nodes are divided into N groups, each group containing the same number of leaf nodes; the central node is communicatively connected to each group of leaf nodes individually through the hub modules; the communication structure formed by each group of leaf nodes is self-similar; the plurality of leaf nodes and the central node are communicatively connected through multiple layers of hub modules in a complete M-ary tree; and each leaf node includes a setting bit, so that if the setting bit requires the vector data in the leaf node to be shifted, the leaf node shifts its preset-bandwidth vector data to the corresponding position, otherwise the leaf node returns the vector data to the central node directly.
Each leaf node has an id, and the ids increase in sequence starting from one side of the topology of the complete M-ary tree; the data distribution apparatus shares one clock signal.
Each hub module includes adders of the preset bandwidth, the number of adders equals the total number of leaf nodes, and the adders have an overflow-check function; if the vector data have been shifted, the hub module bit-splices the received vector data and transmits the spliced result to the node on the layer above, otherwise the hub module checks the received vector data for overflow, performs the addition, and then transmits the result to the node on the layer above.
The present invention further provides a method using the apparatus, comprising:
computing and shifting the vector data by the leaf nodes and returning them to the central node, wherein each leaf node includes a setting bit, so that if the setting bit requires the vector data in the leaf node to be shifted, the leaf node shifts its preset-bandwidth vector data to the corresponding position, otherwise the leaf node returns the vector data to the central node directly.
Each leaf node has an id, and the ids increase in sequence starting from one side of the topology of the complete M-ary tree; the data distribution apparatus shares one clock signal.
If the data transmitted by a leaf node are valid vector data of the preset bandwidth, the setting bit requires the leaf node to shift: the leaf node computes the shift from its id and the number of preset-bandwidth bits and moves its vector data to the corresponding position on the full bandwidth.
If the vector data have been shifted, the hub module bit-splices the received vector data and transmits the spliced result to the node on the layer above; otherwise, the hub module checks the received vector data for overflow, performs the addition, and then transmits the result to the node on the layer above.
The leaf nodes and the central node observe a handshake protocol: after the data senders of both lower-layer nodes are ready to send, a data-valid signal is sent and the data are placed on the bus; after the data receiver of the upper-layer node is ready to receive, a data-ready-to-receive signal is sent; only after both the data-valid signal and the data-ready-to-receive signal have been detected are the data on the bus accepted by the data receiver.
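The handshake above can be sketched as a minimal cycle-level simulation. The function and signal names below are illustrative assumptions, not from the patent; the point is only that a transfer happens in a cycle exactly when every sender's valid signal and the receiver's ready signal coincide.

```python
# Minimal cycle-level sketch of the valid/ready handshake described above.
# Function and signal names are illustrative assumptions, not from the patent.

def handshake_cycle(senders_valid, receiver_ready, bus_data):
    """One clock cycle: the transfer happens only when every lower-layer
    sender asserts data-valid AND the upper-layer receiver asserts
    data-ready-to-receive; otherwise the bus data are not latched."""
    if all(senders_valid) and receiver_ready:
        return list(bus_data)   # receiver latches the bus in the next cycle
    return None                 # no transfer this cycle

# Cycle 1: sender 1 not ready yet, so no transfer.
assert handshake_cycle([True, False], True, [0x1234, 0x5678]) is None
# Cycle 2: receiver busy, so no transfer.
assert handshake_cycle([True, True], False, [0x1234, 0x5678]) is None
# Cycle 3: both valid signals and the ready signal are up, so data are latched.
assert handshake_cycle([True, True], True, [0x1234, 0x5678]) == [0x1234, 0x5678]
```

Because neither side proceeds until both signals are observed, no data word can be dropped or duplicated on the point-to-point link, which is the reliability property the patent relies on.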
The intermediate hub module bit-splices and transmits the vector data in its data buffer: it first bit-splices, by means of the adders, all the vector data received from the nodes on the layer below, and only then passes the vector-data result to the node on the layer above.
When the addition-tree vector-data return from the plurality of leaf nodes to the central node is performed, if the valid data transmitted by the leaf nodes are full-bandwidth vector data, the vector data issued by the plurality of leaf nodes are transmitted over the hubs as follows: they are first entered into the data buffers of the leaf hub modules directly connected to the leaf nodes, where they are superimposed and buffered; they are then superimposed and transmitted in turn in the data buffers of the intermediate hub modules; finally they are entered into the data buffer of the central hub module directly connected to the central node and superimposed there, and the superposition result is output to the central node through the output port.
The intermediate hub module superimposes and transmits the vector data in its data buffer: it first superimposes, by means of the adders, all the vector data received from the nodes on the layer below, and only then passes the vector-data result to the node on the layer above.
The present invention further provides a control device comprising the data distribution apparatus.
The present invention further provides an intelligent chip comprising the control device.
Two embodiments of the present invention are described below in further detail with reference to the accompanying drawings, so that those skilled in the art can carry out the invention according to the description.
One embodiment is the specific configuration of the vector-data return processing unit in an H-tree network structure.
Fig. 1 shows a schematic diagram of a communication device of 16+1 processing units connected by an H-tree network structure that processes and returns vector data elements on chip. The root node of the H-tree is the central tile, which is the end point of the vector-data transmission; the leaf nodes of the H-tree are the leaf tiles, which are the starting points of the vector data; the remaining intermediate nodes are hubs, which process and transmit the vector data. This device implements the communication method of processing data elements for returning vector results over the H-tree.
Fig. 2 shows a schematic diagram of the hub structure in the H-tree network: a hub consists of a hub_two_add_to_one module containing one adder; hub_two_add_to_one processes two groups of full-bandwidth input vector data 20 and 21 into one group of full-bandwidth vector data 22 for output, for transmission from the leaf tiles to the central tile.
As shown in Fig. 1, when the central tile labeled 10 collects vector data whose valid bandwidth is the preset bandwidth from the leaf tiles, the vector data are transmitted over the hubs as follows. First, each group of leaf tiles (leaf tile0 labeled 150 and leaf tile1 labeled 151; 152 and 153; 154 and 155; 156 and 157; 158 and 159; 15a and 15b; 15c and 15d; 15e and 15f) performs the handshake protocol with its directly connected leaf hub on the layer above (hub3_0 labeled 140, and 141, 142, 143, 144, 145, 146, 147); after a successful handshake, each group's vector data are entered into the leaf hub's data buffer and bit-spliced. When the leaf hubs (hub3_0 labeled 140, and 141, 142, 143, 144, 145, 146, 147) have each completed the handshake with the intermediate hub on the layer above (hub2_0 labeled 130, and 131, 132, 133), the vector data are entered into the intermediate hub's data buffer and bit-spliced. Likewise, after the intermediate hubs (hub2_0 labeled 130, and 131, 132, 133) have completed the handshake with the hubs on the layer above (hub1_0 labeled 120, and 121), the vector data are entered into the data buffers of 120 and 121 and bit-spliced. Finally, after the handshake protocol, 120 and 121 enter the vector data into the data buffer of the central hub0_0 directly connected to the central tile, where they are bit-spliced, and the final bit-splicing result is output to the central tile through the output port. Bit-spliced vector-data return over this network structure is thereby realized.
As shown in Fig. 3, the handshake protocol succeeds only when the hub_two_add_to_one module labeled 330 has placed the data-ready-to-receive signal on the bus and data sender 0 labeled 310 and data sender 1 labeled 320 have placed their data and data-valid signals on the bus: in this cycle, 310 and 320 consider that the data receiver 330 has received the data, and in the next cycle 330 stores the data on the bus from this cycle into its own buffer. This data transmission protocol guarantees data reliability in point-to-point data transmission and thereby guarantees the reliability of data transmission over the network-on-chip.
In the above bit-spliced vector-data return, when the valid data bits transmitted by a leaf tile are vector data of the preset bandwidth, the leaf tile is required, before sending the vector data, to select the setting bit so that the component data it owns are shifted. When shifting of the vector data is selected, the leaf tile computes the number of bits to shift from its unique id and the number of preset-bandwidth bits, and moves the preset-bandwidth component data it owns to the corresponding position on the full-bandwidth vector data. Fig. 5 shows a specific example realized on the H-tree of Fig. 1: assume the full bandwidth is 256 bits, which can be spliced from the 16-bit component data owned by the 16 leaf tiles. Fig. 5 depicts the process of shifting the component data D0 of leaf tile1. First, zeros are prepended to the component data so that the vector data D1 reach the full-bandwidth width, i.e., 256 bits. Then, from its id, namely 1, and its preset-bandwidth bits, namely the 16-bit width of its component data, the number of bits by which the vector data should be shifted left is computed by the formula (id x preset bandwidth). In this example, the vector data need to be shifted left by exactly 16 bits. The shift places the original component D0 at full-bandwidth data[31:16], i.e., the position of D2, forming the vector data D3 that will finally be transmitted.
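The shift step above (zero-extend to the full bandwidth, then shift left by id x preset bandwidth) can be sketched as follows; the function name and parameters are illustrative assumptions, not from the patent.

```python
# Sketch of the leaf-tile shift step described above: zero-extend the 16-bit
# component to the full bandwidth, then shift left by (id * preset_bandwidth).
# Names and defaults are illustrative assumptions.

def shift_component(component, tile_id, preset_bw=16, full_bw=256):
    assert 0 <= component < (1 << preset_bw)
    assert tile_id * preset_bw + preset_bw <= full_bw
    # Python integers are arbitrary precision, so zero-extension is implicit.
    return component << (tile_id * preset_bw)

# leaf tile1 with a component value of 0xABCD: after the shift it occupies
# data[31:16] of the 256-bit vector, matching the Fig. 5 example.
v = shift_component(0xABCD, tile_id=1)
assert (v >> 16) & 0xFFFF == 0xABCD   # data[31:16] holds the component
assert v & 0xFFFF == 0                # data[15:0] stays zero
```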
Fig. 4 is the complete binary-tree expansion of the H-tree of Fig. 1. The id of each leaf tile is the queue number, increasing in order from one side, that corresponds to it in the topology obtained by expanding the H-tree into a complete binary tree; leaf tile0 thus corresponds to number 0. It can be seen that the id of each leaf tile is unique and that the ids of all leaf tiles are consecutive natural numbers (in this example, the natural numbers 0 to 15). It follows that the preset-bandwidth component data corresponding to each leaf tile on the full-bandwidth vector data are unique and non-conflicting, and that all component data are contiguous on the full-bandwidth vector data. The vector data in Fig. 7 represent, for the above example, the result of bit-splicing all the preset-bandwidth valid components of the leaf tiles. In this example, component D0 of the vector data is the component owned by leaf tile15, located at full-bandwidth data[255:240]; component D1 is the component owned by leaf tile14, located at full-bandwidth data[239:224]. The positions of any two different leaf tiles on the full-bandwidth vector data do not conflict, are contiguous, and are arranged in order. This shifting scheme thus provides technical support for collision-free, ordered vector-result return on the H-tree structure.
In the above bit-spliced vector-data return, in some of the above examples the hub bit-splices and transmits the vector data in its data buffer. Fig. 6 takes hub3_0 of Fig. 1 as an example. First, the hub stores the vector data in its local cache. The number of adders in a hub equals the number of leaf nodes; in this example there are 16 leaf nodes, so there are 16 adders. Each adder operates at the preset bandwidth, set to 16 bits in this example, and each adder has an overflow-check function. The two vector data D3 and D1 received from the group of leaf nodes on the layer below, leaf tile0 and leaf tile1, are superimposed and bit-spliced. It can be seen that after bit splicing the component D4 of leaf tile0 is located at data[31:16] of the full-bandwidth vector data D2, i.e., the position of D0, while the component D5 of leaf tile1 is located at data[15:0] of D2, i.e., the position of D1. Their component data are therefore ordered, unique, and non-conflicting on this vector result. The vector data shown in Fig. 7 are the final result obtained by hub0_0 when this method is performed on the structure of Fig. 1. This method thus provides technical support for collision-free, ordered vector-result return.
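Because each shifted input occupies a disjoint 16-bit field of the full bandwidth, the hub's bit-splicing step reduces to combining the inputs bitwise. The sketch below is a minimal illustration under that assumption; the function name is hypothetical.

```python
# Sketch of the hub bit-splicing step: each shifted input occupies a disjoint
# 16-bit field, so splicing two (or four) inputs is a plain bitwise OR.
# The function name is an illustrative assumption.

def hub_splice(*shifted_vectors):
    out = 0
    for v in shifted_vectors:
        assert out & v == 0, "fields must not conflict"
        out |= v                 # disjoint fields merge without interference
    return out

lo_input = 0x1111 << 0    # one input's component at data[15:0]
hi_input = 0x2222 << 16   # the other input's component at data[31:16]
spliced = hub_splice(lo_input, hi_input)
assert (spliced >> 0) & 0xFFFF == 0x1111
assert (spliced >> 16) & 0xFFFF == 0x2222
```

Repeating this merge layer by layer up the tree yields the fully assembled vector at hub0_0 with each component still at its id-determined position.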
As shown in Fig. 1, when the central tile labeled 10 receives vector data whose valid bandwidth is the full bandwidth from the leaf tiles, the vector data are transmitted over the hubs as follows. First, each group of leaf tiles (leaf tile0 labeled 150 and leaf tile1 labeled 151; 152 and 153; 154 and 155; 156 and 157; 158 and 159; 15a and 15b; 15c and 15d; 15e and 15f) performs the handshake protocol with its directly connected leaf hub on the layer above (hub3_0 labeled 140, and 141, 142, 143, 144, 145, 146, 147), and the vector data are entered into the leaf hub's data buffer and superimposed. When the leaf hubs (hub3_0 labeled 140, and 141, 142, 143, 144, 145, 146, 147) have each completed the handshake with the intermediate hub on the layer above (hub2_0 labeled 130, and 131, 132, 133), the vector data are entered into the intermediate hub's data buffer and superimposed. Likewise, after the intermediate hubs (hub2_0 labeled 130, and 131, 132, 133) have completed the handshake with the hubs on the layer above (hub1_0 labeled 120, and 121), the vector data are entered into the data buffers of 120 and 121 and superimposed. Finally, after the handshake, 120 and 121 enter the vector data into the data buffer of the central hub0_0 directly connected to the central tile, where they are superimposed, and the final superposition result is output to the central tile through the output port. It can thus be seen that the vector data in the leaf tiles complete the addition-tree operation on their return path to the central tile, realizing addition-tree vector-data return over this network structure.
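The layer-by-layer superposition above amounts to a tree reduction of the leaf vectors. The sketch below models it with plain integer addition standing in for the lane-wise adders; names and the fanout parameter are illustrative assumptions.

```python
# Sketch of the full addition-tree return path: each hub sums the vectors of
# its children, layer by layer, until the central hub delivers one vector to
# the central tile. Plain integer addition stands in for the lane-wise
# adders; names are illustrative assumptions.

def tree_reduce(values, fanout=2):
    """Reduce a list of leaf values through a complete fanout-ary tree."""
    while len(values) > 1:
        values = [sum(values[i:i + fanout])
                  for i in range(0, len(values), fanout)]
    return values[0]

# 16 leaf tiles each contribute the value 1; the central tile receives 16.
assert tree_reduce([1] * 16) == 16
# The result equals a flat sum regardless of the tree's fanout.
assert tree_reduce(list(range(16)), fanout=4) == sum(range(16))
```

Since addition is associative, the tree-shaped order of summation gives the same result as a sequential sum, which is why the hubs can superimpose partial results independently at each layer.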
In the above addition-tree vector-data return, in some examples the hub superimposes and transmits the vector data in its data buffer. Fig. 8 takes hub3_0 of Fig. 1 as an example. First, the hub stores the vector data in its local cache. The number of adders in a hub equals the number of leaf nodes; in this example there are 16 leaf nodes, so there are 16 adders. Each adder operates at the preset bandwidth, set to 16 bits in this example, and each adder has an overflow-check function. The adders separately superimpose the 16 components of the two vector data D3 and D5 received from the group of leaf nodes on the layer below, leaf tile0 and leaf tile1. It can be seen that the low-order component D4 of D3 is located at full-bandwidth data[15:0] and the low-order component D6 of D5 is located at full-bandwidth data[15:0]; in the result produced by the adders, their sum, after the overflow check and judgment, is written to the position of component D0 of the result D2, i.e., data[15:0]. If the superposition of D4 and D6 overflows, the adder judges and estimates the assigned value based on the overflow result. In this way, addition-tree vector-data return on the above apparatus is realized.
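One 16-bit adder lane with an overflow check can be sketched as below. The patent only says the adder "judges and estimates" the assigned value on overflow; clamping to the maximum representable value (saturation) is an assumed realization, and all names are hypothetical.

```python
# Sketch of a 16-bit adder lane with an overflow check. The patent only says
# the adder "judges and estimates" the value on overflow; saturating to the
# maximum unsigned value is an assumed realization, not the patent's rule.

MASK16 = 0xFFFF

def add_lane(a, b):
    s = a + b
    return MASK16 if s > MASK16 else s   # clamp on unsigned overflow

def add_vectors(va, vb, lanes=16):
    """Component-wise addition of two vectors packed as 16-bit lanes."""
    out = 0
    for i in range(lanes):
        lane = add_lane((va >> (16 * i)) & MASK16, (vb >> (16 * i)) & MASK16)
        out |= lane << (16 * i)
    return out

r = add_vectors(0x0001_0003, 0x0002_FFFF)   # lane 0 overflows, lane 1 adds
assert r & MASK16 == MASK16                  # lane 0 saturated at 0xFFFF
assert (r >> 16) & MASK16 == 0x0003          # lane 1 = 0x0001 + 0x0002
```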
Another embodiment is the specific configuration of the vector-data return processing unit in an X-tree network structure.
Fig. 9 shows a schematic diagram of a communication device of 64+1 processing units connected by an X-tree network structure that processes and returns vector data elements on chip. The root node of the X-tree is the central tile, which is the end point of the vector-data transmission; the leaf nodes of the X-tree are the leaf tiles, which are the starting points of the vector data; the remaining intermediate nodes are hubs, which process and transmit the vector data. Each leaf tile has a unique id, namely the queue number, increasing in order from one side, that corresponds to it in the topology obtained by expanding the X-tree into a complete quadtree; leaf tile0 thus corresponds to number 0. This guarantees that the id of each leaf tile is unique and that the ids of all leaf tiles are consecutive natural numbers (in this example, the natural numbers 0 to 63). The preset-bandwidth component data corresponding to each leaf tile on the full-bandwidth vector data are unique and non-conflicting, and all component data are contiguous on the full-bandwidth vector data. This device implements the communication method of processing data elements for returning vector results over the X-tree.
Fig. 10 shows a schematic diagram of the hub structure in the X-tree network: a hub consists of a hub_four_add_to_one module containing one adder; hub_four_add_to_one processes four groups of full-bandwidth input vector data a1, a2, a3, and a4 into one group of full-bandwidth vector data a5 for output, for transmission from the leaf tiles to the central tile.
As shown in Fig. 9, when the central tile labeled 90 collects vector data whose valid bandwidth is the preset bandwidth from the leaf tiles, the vector data are transmitted over the hubs as follows. First, each group of leaf tiles (leaf tile0 labeled 940, leaf tile1 labeled 941, leaf tile2 labeled 942, and leaf tile3 labeled 943; 944, 945, 946, and 947; ...; 9a0, 9a1, 9a2, and 9a3) performs the handshake protocol with its directly connected leaf hub on the layer above (hub2_0 labeled 930, and 931, 932, 933, 934, 935, 936, 937, 938, 939, 93a, 93b, 93c, 93d, 93e, 93f); after a successful handshake, each group's vector data are entered into the leaf hub's data buffer and bit-spliced. When the leaf hubs (hub2_0 labeled 930, and 931, 932, 933, 934, 935, 936, 937, 938, 939, 93a, 93b, 93c, 93d, 93e, 93f) have each completed the handshake with the intermediate hub on the layer above (hub1_0 labeled 920, and 921, 922, 923), the vector data are entered into the intermediate hub's data buffer and bit-spliced. Finally, after the handshake protocol, hub1_0 labeled 920, and 921, 922, 923 enter the vector data into the data buffer of the central hub0_0, labeled 910, directly connected to the central tile, where they are bit-spliced, and the final bit-splicing result is output through the output port to the central tile labeled 90. Bit-spliced vector-data return over this network structure is thereby realized.
As shown in Fig. 11, the handshake protocol succeeds only when the hub_four_add_to_one module labeled b5 has placed the data-ready-to-receive signal on the bus and data sender 0 labeled b1, data sender 1 labeled b2, data sender 2 labeled b3, and data sender 3 labeled b4 have placed their data and data-valid signals on the bus: in this cycle, b1, b2, b3, and b4 consider that the data receiver b5 has received the data, and in the next cycle b5 stores the data on the bus from this cycle into its own buffer. This data transmission protocol guarantees data reliability in point-to-point data transmission and thereby guarantees the reliability of data transmission over the network-on-chip.
In the above bit-spliced vector-data return, when the valid data bits transmitted by a leaf tile are vector data of the preset bandwidth, the leaf tile is required, before sending the vector data, to select the setting bit so that the component data it owns are shifted. When shifting of the vector data is selected, the leaf tile computes the number of bits to shift from its unique id and the number of preset-bandwidth bits, and moves the preset-bandwidth component data it owns to the corresponding position on the full-bandwidth vector data. Fig. 12 shows a specific example realized on the X-tree of Fig. 9: assume the full bandwidth is 1024 bits, which can be spliced from the 16-bit component data owned by the 64 leaf tiles. Fig. 12 depicts the process of shifting the component data c1 of leaf tile1. First, zeros are prepended to the component data so that the vector data c2 reach the full-bandwidth width, i.e., 1024 bits. Then, from its id, namely 1, and its preset-bandwidth bits, namely the 16-bit width of its component data, the number of bits by which the vector data should be shifted left is computed by the formula (id x preset bandwidth). In this example, the vector data need to be shifted left by exactly 16 bits. The shift places the original component c1 at full-bandwidth data[31:16], i.e., the position of c3, forming the vector data c4 that will finally be transmitted.
The vector data in Fig. 14 represent, for the above example, the result of bit-splicing all the preset-bandwidth valid components of the leaf tiles. In this example, component f3 of the vector data is the component owned by leaf tile63, located at full-bandwidth data[1023:1008]; component f2 is the component owned by leaf tile62, located at full-bandwidth data[1007:992]; and so on, such that component f1 shown in the figure is the component owned by leaf tile1, located at full-bandwidth data[31:16], and component f0 is the component owned by leaf tile0, located at full-bandwidth data[15:0]. The positions of any two different leaf tiles on the full-bandwidth vector data do not conflict, are contiguous, and are arranged in order. This shifting scheme thus provides technical support for collision-free, ordered vector-result return on the X-tree structure.
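The id-to-bit-field mapping described above can be stated compactly: with 64 leaf tiles and a 16-bit preset bandwidth, tile k owns data[16k+15 : 16k] of the 1024-bit result. The helper name below is an illustrative assumption.

```python
# Sketch of the id -> bit-field mapping described above: with 64 leaf tiles
# and a 16-bit preset bandwidth, tile k owns data[16*k+15 : 16*k] of the
# 1024-bit result. The helper name is an illustrative assumption.

def field_of(tile_id, preset_bw=16):
    lo = tile_id * preset_bw
    return lo + preset_bw - 1, lo   # (high bit, low bit), both inclusive

assert field_of(0) == (15, 0)        # f0: leaf tile0 at data[15:0]
assert field_of(1) == (31, 16)       # f1: leaf tile1 at data[31:16]
assert field_of(63) == (1023, 1008)  # f3: leaf tile63 at data[1023:1008]

# The fields of any two different tiles never overlap, and together they
# tile the full 1024-bit vector contiguously and in order.
ranges = [field_of(k) for k in range(64)]
assert all(ranges[k][0] + 1 == ranges[k + 1][1] for k in range(63))
```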
In the above bit-spliced vector-data return, in some of the above examples the hub bit-splices and transmits the vector data in its data buffer. Fig. 13 takes hub2_0 of Fig. 9 as an example. First, the hub stores the vector data in its local cache. The number of adders in a hub equals the number of leaf nodes; in this example there are 64 leaf nodes, so there are 64 adders. Each adder operates at the preset bandwidth, set to 16 bits in this example, and each adder has an overflow-check function. The four vector data e7, e9, e11, and e13 received from the group of leaf nodes on the layer below (leaf tile0, leaf tile1, leaf tile2, and leaf tile3) are superimposed and bit-spliced. It can be seen that after bit splicing the component e6 of leaf tile0 is located at data[15:0] of the full-bandwidth vector data e5, i.e., the position of e1; the component e8 of leaf tile1 at data[31:16] of e5, i.e., the position of e2; the component e10 of leaf tile2 at data[47:32] of e5, i.e., the position of e3; and the component e12 of leaf tile3 at data[63:48] of e5, i.e., the position of e4. Their component data are therefore ordered, unique, and non-conflicting on this vector result. The vector data shown in Fig. 14 are the final result obtained by hub0_0 when this method is performed on the structure of Fig. 9. This method thus provides technical support for collision-free, ordered vector-result return.
As shown in Fig. 9, when the central tile labeled 90 receives vector data whose valid bandwidth is the full bandwidth from the leaf tiles, the vector data are transmitted over the hubs as follows. First, each group of leaf tiles (leaf tile0 labeled 940, leaf tile1 labeled 941, leaf tile2 labeled 942, and leaf tile3 labeled 943; 944, 945, 946, and 947; ...; 9a0, 9a1, 9a2, and 9a3) performs the handshake protocol with its directly connected leaf hub on the layer above (hub2_0 labeled 930, and 931, 932, 933, 934, 935, 936, 937, 938, 939, 93a, 93b, 93c, 93d, 93e, 93f), and the vector data are entered into the leaf hub's data buffer and superimposed. When the leaf hubs (hub2_0 labeled 930, and 931, 932, 933, 934, 935, 936, 937, 938, 939, 93a, 93b, 93c, 93d, 93e, 93f) have each completed the handshake with the intermediate hub on the layer above (hub1_0 labeled 920, and 921, 922, 923), the vector data are entered into the intermediate hub's data buffer and superimposed. Finally, after the handshake, hub1_0 labeled 920, and 921, 922, 923 enter the vector data into the data buffer of the central hub0_0, labeled 910, directly connected to the central tile, where they are superimposed, and the final superposition result is output through the output port to the central tile labeled 90. It can thus be seen that the vector data in the leaf tiles complete the addition-tree operation on their return path to the central tile, realizing addition-tree vector-data return over this network structure.
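The four-input superposition performed by hub_four_add_to_one at each layer can be sketched as below; plain integers stand in for full-bandwidth vectors, and all names are illustrative assumptions.

```python
# Sketch of one hub_four_add_to_one superposition step and its repetition up
# the X-tree: 64 leaves -> 16 leaf hubs -> 4 intermediate hubs -> hub0_0.
# Plain integer addition stands in for the lane-wise adders; names are
# illustrative assumptions.

def hub_four_add_to_one(a1, a2, a3, a4):
    """Superimpose four full-bandwidth inputs into one output."""
    return a1 + a2 + a3 + a4

def xtree_return(leaf_values):
    """Reduce 64 leaf values through three layers of 4-to-1 hubs."""
    layer = list(leaf_values)
    while len(layer) > 1:                      # 64 -> 16 -> 4 -> 1
        layer = [hub_four_add_to_one(*layer[i:i + 4])
                 for i in range(0, len(layer), 4)]
    return layer[0]

# 64 leaf tiles each contribute 1; the central tile receives their sum.
assert xtree_return([1] * 64) == 64
assert xtree_return(list(range(64))) == sum(range(64))
```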
In the above addition-tree vector-data return, in some examples the hub superimposes and transmits the vector data in its data buffer. Fig. 15 takes hub2_0 of Fig. 9 as an example. First, the hub stores the vector data in its local cache. The number of adders in a hub equals the number of leaf nodes; in this example there are 64 leaf nodes, so there are 64 adders. Each adder operates at the preset bandwidth, set to 16 bits in this example, and each adder has an overflow-check function. The adders separately superimpose the 64 components of the four vector data g5, g7, g9, and g11 received from the group of leaf nodes on the layer below (leaf tile0, leaf tile1, leaf tile2, and leaf tile3). It can be seen that the low-order component g6 of g5, the low-order component g8 of g7, the low-order component g10 of g9, and the low-order component g12 of g11 are all located at full-bandwidth data[15:0]; in the result produced by the adders, the sum of the four, after the overflow check and judgment, is written to the position of component g13 of the result g4, i.e., data[15:0]. If the superposition of g6, g8, g10, and g12 overflows, the adder judges and estimates the assigned value based on the overflow result. In this way, addition-tree vector-data return on the above apparatus is realized.
The apparatus and processing scales described here are intended to simplify the description of the present invention. Applications, modifications, and variations of the communication apparatus and method of the present invention for processing data elements that return vector results for fractal trees (with the H-tree and the X-tree as examples) will be apparent to those skilled in the art.
As described above, the present invention provides a communication apparatus and method for processing data elements that return vector results for a fractal tree, which performs bit splicing, superposition, and similar operations on vector-data results for the network-on-chip and completes the return of vector results collision-free, reliably, and in order, making communication more convenient and effective.
Industrial Applicability
The present invention implements bit splicing, superposition, and similar operations on vector-data results for the network-on-chip, so that vector results can be returned collision-free, reliably, and in order, thereby achieving better communication.

Claims (10)

  1. An apparatus for a vector-data return processing unit in a fractal tree, characterized by comprising:
    a central node, which is the communication data center of the network-on-chip and is configured to receive the vector data returned by a plurality of leaf nodes;
    a plurality of leaf nodes, configured to compute and shift the vector data;
    hub modules, each comprising a local cache structure and a data processing component, configured for data communication with upper-layer and lower-layer nodes and for processing the vector data;
    wherein the plurality of leaf nodes are divided into N groups, each group containing the same number of leaf nodes; the central node is communicatively connected to each group of leaf nodes individually through the hub modules; the communication structure formed by each group of leaf nodes is self-similar; the plurality of leaf nodes and the central node are communicatively connected through multiple layers of hub modules in a complete M-ary tree; and each leaf node includes a setting bit, so that if the setting bit requires the vector data in the leaf node to be shifted, the leaf node shifts its preset-bandwidth vector data to the corresponding position, otherwise the leaf node returns the vector data to the central node directly.
  2. The apparatus for a vector-data return processing unit in a fractal tree according to claim 1, characterized in that each leaf node has an id, and the ids increase in sequence starting from one side of the topology of the complete M-ary tree; and the data distribution apparatus shares one clock signal.
  3. The apparatus for a vector-data return processing unit in a fractal tree according to claim 1, characterized in that each hub module includes adders of the preset bandwidth, the number of adders equals the total number of leaf nodes, and the adders have an overflow-check function; and if the vector data have been shifted, the hub module bit-splices the received vector data and transmits the spliced result to the node on the layer above, otherwise the hub module checks the received vector data for overflow, performs the addition, and then transmits the result to the node on the layer above.
  4. A method using the apparatus according to any one of claims 1-3, characterized by comprising:
    computing and shifting the vector data by the leaf nodes and returning them to the central node, wherein each leaf node includes a setting bit, so that if the setting bit requires the vector data in the leaf node to be shifted, the leaf node shifts its preset-bandwidth vector data to the corresponding position, otherwise the leaf node returns the vector data to the central node directly.
  5. The method according to claim 4, characterized in that each leaf node has an id, and the ids increase in sequence starting from one side of the topology of the complete M-ary tree; and the data distribution apparatus shares one clock signal.
  6. The method according to claim 5, characterized in that if the data transmitted by a leaf node are valid vector data of the preset bandwidth, the setting bit requires the leaf node to shift, and the leaf node computes the shift from its id and the number of preset-bandwidth bits and moves the vector data in the leaf node to the corresponding position on the full bandwidth.
  7. The method according to claim 4, characterized in that if the vector data have been shifted, the hub module bit-splices the received vector data and transmits the spliced result to the node on the layer above, otherwise the hub module checks the received vector data for overflow, performs the addition, and then transmits the result to the node on the layer above.
  8. The method according to claim 4, characterized in that the leaf nodes and the central node observe a handshake protocol.
  9. A control device comprising the apparatus according to claim 1.
  10. An intelligent chip comprising the control device according to claim 9.
PCT/CN2016/086094 2015-12-24 2016-06-17 Apparatus and method for a vector-data return processing unit in a fractal tree, control device, and intelligent chip WO2017107411A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/781,039 US10866924B2 (en) 2015-12-24 2016-06-17 Device for vector data returning processing unit in fractal tree, method, control device, and intelligent chip

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510983391.8A CN105630733B (zh) 2015-12-24 2015-12-24 Apparatus and method for a vector-data return processing unit in a fractal tree, control device, and intelligent chip
CN201510983391.8 2015-12-24

Publications (1)

Publication Number Publication Date
WO2017107411A1 true WO2017107411A1 (zh) 2017-06-29

Family

ID=56045696

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/086094 WO2017107411A1 (zh) 2015-12-24 2016-06-17 Apparatus and method for a vector-data return processing unit in a fractal tree, control device, and intelligent chip

Country Status (3)

Country Link
US (1) US10866924B2 (zh)
CN (1) CN105630733B (zh)
WO (1) WO2017107411A1 (zh)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105630733B (zh) 2015-12-24 2017-05-03 中国科学院计算技术研究所 Apparatus and method for a vector-data return processing unit in a fractal tree, control device, and intelligent chip
CN105550157B (zh) * 2015-12-24 2017-06-27 中国科学院计算技术研究所 Fractal-tree communication structure and method, control device, and intelligent chip
CN105634960B (zh) * 2015-12-24 2017-04-05 中国科学院计算技术研究所 Data distribution apparatus and method based on a fractal tree structure, control device, and intelligent chip
CN108830436B (zh) * 2018-04-08 2020-08-11 浙江广播电视大学 Shared-bicycle scheduling method based on self-balancing partitioning of a fractal tree
CN111860799A (zh) * 2019-04-27 2020-10-30 中科寒武纪科技股份有限公司 Computing apparatus
US11841822B2 (en) 2019-04-27 2023-12-12 Cambricon Technologies Corporation Limited Fractal calculating device and method, integrated circuit and board card
WO2020220935A1 (zh) 2019-04-27 2020-11-05 中科寒武纪科技股份有限公司 Computing apparatus

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6334125B1 (en) * 1998-11-17 2001-12-25 At&T Corp. Method and apparatus for loading data into a cube forest data structure
US8886677B1 (en) * 2004-07-23 2014-11-11 Netlogic Microsystems, Inc. Integrated search engine devices that support LPM search operations using span prefix masks that encode key prefix length
CN105630733A (zh) * 2015-12-24 2016-06-01 中国科学院计算技术研究所 Apparatus and method for a vector-data return processing unit in a fractal tree, control device, and intelligent chip

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1332563C (zh) * 2003-12-31 2007-08-15 中国科学院计算技术研究所 Coding method for skipped macroblocks in video images
US20070253642A1 (en) * 2006-04-27 2007-11-01 Mapinfo Corporation Method and apparatus for indexing, storing and retrieving raster (GRID) data in a combined raster vector system
US8040823B2 (en) * 2007-01-08 2011-10-18 Industrial Technology Research Institute Method and system for network data transmitting
TWI398127B (zh) * 2008-04-08 2013-06-01 Ind Tech Res Inst 無線感測網路及其取樣頻率設定方法
US20190377580A1 (en) * 2008-10-15 2019-12-12 Hyperion Core Inc. Execution of instructions based on processor and data availability
GB201210702D0 (en) * 2012-06-15 2012-08-01 Qatar Foundation A system and method to store video fingerprints on distributed nodes in cloud systems
CN103974268B (zh) * 2013-01-29 2017-09-29 上海携昌电子科技有限公司 Low-latency sensor-network data transmission method with finely adjustable granularity
CN103281707B (zh) * 2013-06-07 2015-08-19 北京交通大学 Access-network construction method for service-state detection of rail transit infrastructure
US10262274B2 (en) * 2013-07-22 2019-04-16 Aselsan Elektronik Sanayi Ve Ticaret Anonim Sirketi Incremental learner via an adaptive mixture of weak learners distributed on a non-rigid binary tree
CN105512724B (zh) * 2015-12-01 2017-05-10 中国科学院计算技术研究所 Adder apparatus, data accumulation method, and data processing apparatus
US10042875B2 (en) * 2016-09-26 2018-08-07 International Business Machines Corporation Bloom filter index for device discovery
US10817490B2 (en) * 2017-04-28 2020-10-27 Microsoft Technology Licensing, Llc Parser for schema-free data exchange format

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6334125B1 (en) * 1998-11-17 2001-12-25 At&T Corp. Method and apparatus for loading data into a cube forest data structure
US8886677B1 (en) * 2004-07-23 2014-11-11 Netlogic Microsystems, Inc. Integrated search engine devices that support LPM search operations using span prefix masks that encode key prefix length
CN105630733A (zh) * 2015-12-24 2016-06-01 中国科学院计算技术研究所 Apparatus and method for a vector-data return processing unit in a fractal tree, control device, and intelligent chip

Also Published As

Publication number Publication date
US10866924B2 (en) 2020-12-15
US20200272595A1 (en) 2020-08-27
CN105630733A (zh) 2016-06-01
CN105630733B (zh) 2017-05-03

Similar Documents

Publication Publication Date Title
WO2017107411A1 (zh) Apparatus and method for a vector-data return processing unit in a fractal tree, control device, and intelligent chip
TWI803663B (zh) Operation apparatus and operation method
US8819611B2 Asymmetric mesh NoC topologies
CN105049353B (zh) Method and controller for configuring a routing path for a service
CN109698788A (zh) Traffic forwarding method and traffic forwarding apparatus
CN103501236B (zh) Method and apparatus for generating a logical topology of a network control plane
CN111865799B (zh) Path planning method and apparatus, path planning device, and storage medium
US11620501B2 Neural network apparatus
US20070180182A1 System and method for a distributed crossbar network using a plurality of crossbars
CN100573490C Module interconnection structure
CN105634974A (zh) Route determination method and apparatus in a software-defined network
CN107851078A (zh) Method and system for aggregation-friendly address allocation for PCIe devices
CN104866460B (zh) SoC-based fault-tolerant adaptive reconfigurable system and method
CN105634960B (zh) Data distribution apparatus and method based on a fractal tree structure, control device, and intelligent chip
CN105550157B (zh) Fractal-tree communication structure and method, control device, and intelligent chip
CN105681215A (zh) Method and controller for generating a forwarding entry
CN102025615A (zh) Method and apparatus for fine-granularity service path planning in an optical communication network
TW202127840A Initializing on-chip operations
JP2020046713A Parallel computer system, control method for a parallel computer system, and program
CN104539485A (zh) Automatic topology identification method for point-to-point bidirectional links
CN108965012B (zh) Efficient transmission method for fully interconnected row-column node networks
CN101119395B (zh) Method for automatically assigning addresses to communication subscribers, and communication subscriber
CN110196829A (zh) Method and system for managing transaction routing between a source device and at least one target device
CN104601423B (zh) SPI bus node device, communication method therefor, and SPI bus system
CN104158740A (zh) Path management method and controller

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16877224

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16877224

Country of ref document: EP

Kind code of ref document: A1