WO2023279369A1 - A data transmission device, method, and related equipment - Google Patents
- Publication number
- WO2023279369A1 (PCT/CN2021/105474)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- router
- sending
- data
- processing unit
- receiving
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/58—Association of routers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/645—Splitting route computation layer and forwarding layer, e.g. routing according to path computational element [PCE] or based on OpenFlow functionality
- H04L45/655—Interaction between route computation entities and forwarding entities, e.g. for route determination or for flow table update
Definitions
- the present application relates to the field of information technology, and in particular to a data transmission device, method and related equipment.
- IP core: intellectual property core.
- IP cores include, for example, microprocessors, central processing units (CPU), digital signal processors (DSP), graphics processing units (GPU), neural-network processing units (NPU), memory, and network connection chips.
- Asynchronous circuit design inherently eliminates the global clock and enables a globally asynchronous, locally synchronous (GALS) SoC integration architecture, which can greatly reduce chip design complexity as well as development cost and cycle time.
- The network-on-chip (NoC) architecture is the current mainstream bus technology for integrating IP cores at large scale.
- Each routing node is connected to other routing nodes in four interconnection directions to form a network.
- Each processing entity (PE) is connected to only one routing node and communicates with other PEs through it; different PEs may run at different clock frequencies.
- Each routing node can be connected to only one processing entity (i.e., PE), so the number of routers is large and they occupy a large chip area.
- The current NoC architecture adopts a mesh interconnect. The IP cores of the processing units to be integrated are large (for example, on the order of 100 million transistors), and the routers have many long outgoing wires. This introduces large transmission delays, and the delay differs considerably between directions, which makes timing analysis and timing closure difficult.
- Embodiments of the present application provide a data transmission device, method, and related equipment, which can improve system performance while reducing chip area.
- An embodiment of the present application provides a data transmission device, which may include multiple processing units and multiple routers. Each router is connected to one or more processing units, and each router forms a communication connection with any other one of the multiple routers. The multiple routers include a first router, and the first router is connected to a first processing unit.
- The first processing unit is configured to: generate a first request, where the first request is used to request sending target data to a second processing unit, and the target data includes the destination address of the second processing unit; after determining that the data-receiving state of the first router is ready, determine a first clock signal based on the first request; and send the target data to the first router based on the first clock signal, and send the first clock signal to the first router.
- The first router is configured to: receive the first clock signal; receive the target data sent by the first processing unit based on the first clock signal; and send the target data to the second processing unit according to the destination address.
- In the data transmission device based on the asynchronous handshake mechanism, after the processing unit (for example, the first processing unit) determines that the router (for example, the first router) is ready to receive data, it generates a first clock signal based on its request to send data, and sends the first clock signal and the target data to the router according to the first clock signal. The router can then receive the target data by means of the received first clock signal and forward the target data to the second processing unit according to the destination address carried in the target data.
- This asynchronous-handshake transmission mode between the processing unit and the router ensures that the router receives the target data completely.
- When sending the target data to the router, the processing unit also sends the clock signal (that is, the first clock signal), so that the router can receive the data according to that clock signal. This reduces the clock constraints in the data transmission device: data transmission among the multiple routers is not limited by a synchronous clock, decisions are made faster, and the transmission performance of the system is effectively improved.
- The data line connecting the processing unit and the router is relatively short and of well-defined length, so the delay of the corresponding clock signal when the processing unit sends data is small and well defined.
- a router can be asynchronously connected to multiple processing units, which greatly reduces the chip area occupied by the bus.
- The first router is further configured to: after the target data has been received, change its data-receiving state to not ready. The first processing unit is further configured to: determine that sending of the target data is complete after detecting that the data-receiving state of the first router has changed from ready to not ready.
- the processing unit can determine that the data transmission is complete, and then stop the data transmission to save communication resources.
- The ready state and the not-ready state can be represented by a high and a low signal level, respectively.
- Each processing unit includes a first asynchronous handshake circuit. The first processing unit is specifically configured to: after determining, through the first asynchronous handshake circuit, that the data-receiving state of the first router is ready, determine the first clock signal based on the first request.
- the first clock signal (also referred to as a self-sequential clock) is provided by an asynchronous handshake circuit.
- The asynchronous handshake circuit has a simple structure and can generate a self-sequential clock through a self-loop: when the data-receiving state of the router is ready and the first request is present at the same time, the self-sequential clock is generated to drive the asynchronous message transmitter, which sends the target data to the router in a serial single-bit transmission manner.
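The handshake described above (a clock is produced only while the router is ready and a request is pending, and the router drops its ready level once the data has arrived) can be sketched as a small behavioural model. This is only an illustration under stated assumptions: the class names `Router` and `ProcessingUnit`, the method names, and the word-level payload are hypothetical and do not come from the application, which describes hardware circuits rather than software.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    """Receiver side (hypothetical model): a ready level plus clocked capture."""
    ready: bool = True                      # high level = ready, low level = not ready
    received: list = field(default_factory=list)

    def on_clock(self, word):
        # Data is captured only when the forwarded clock pulses.
        self.received.append(word)

    def finish(self):
        # After the whole payload has arrived, drop the ready level.
        self.ready = False


@dataclass
class ProcessingUnit:
    """Sender side (hypothetical model): clock exists only if request AND ready hold."""
    pulses: int = 0

    def send(self, router, payload):
        request = len(payload) > 0          # stands in for the "first request"
        if not (request and router.ready):
            return False                    # no clock is generated
        for word in payload:
            self.pulses += 1                # one self-generated clock pulse
            router.on_clock(word)           # clock and data travel together
        router.finish()
        # The sender treats the ready -> not-ready transition as completion.
        return not router.ready


router = Router()
unit = ProcessingUnit()
done = unit.send(router, [0xA, 0xB, 0xC])
```

A second call to `unit.send` on the same router returns `False` in this model, because the router has not raised its ready level again.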
- Each processing unit includes an asynchronous message transmitter. The first processing unit is specifically configured to: based on the first request, control the asynchronous message transmitter to send the target data to the first router in a serial single-bit transmission manner, driven by the first clock signal.
- Driven by the first clock signal, the asynchronous message transmitter sends the target data to the router in a serial single-bit transmission manner, realizing asynchronous transmission between the processing unit and the router.
- The target data is sent in the form of a variable-length or fixed-length data packet.
- The first processing unit is further configured to: after generating the first request, set the packet header of the target data and send the packet header and the first clock signal to the first router; and after the last bit of the target data has been sent, set and send the packet tail of the target data. The first router is further configured to: start receiving the target data after receiving the packet header corresponding to the target data; and after receiving the packet tail of the target data, change its data-receiving state from ready to not ready.
- Asynchronous transmission of the target data is realized by setting the header and tail of the target data, so no clock synchronization is needed between the processing unit and the router, and it is also easier for one router to integrate multiple heterogeneous processing units or intellectual property cores.
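The header/tail framing of a serial single-bit transfer can be illustrated with a short sketch. The encoding below is an assumption made for illustration only: each clock pulse is modelled as a `(marker, bit)` pair, the markers `"header"`, `"data"`, and `"tail"` are hypothetical, and bits are sent most significant bit first. The application does not specify these details.

```python
def serialize(payload: bytes):
    """Yield one (marker, bit) pair per clock pulse, most significant bit first."""
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    yield ("header", None)            # packet header opens the transfer
    for bit in bits:
        yield ("data", bit)           # one data bit per clock pulse
    yield ("tail", None)              # packet tail closes the transfer


def receive(stream):
    """Collect bits between header and tail, then rebuild the bytes."""
    bits, receiving, ready = [], False, True
    for marker, bit in stream:
        if marker == "header":
            receiving = True
        elif marker == "tail":
            receiving, ready = False, False   # receiver drops its ready state
        elif receiving:
            bits.append(bit)
    data = bytes(
        int("".join(map(str, bits[i:i + 8])), 2) for i in range(0, len(bits), 8)
    )
    return data, ready


data, ready_after = receive(serialize(b"\x5a\x01"))
```

Because the receiver relies on the header and tail markers rather than on a known length, the same code handles variable-length packets.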
- Each processing unit includes a storage area based on a first-in-first-out (FIFO) storage mechanism. The first processing unit is specifically configured to: generate the first request after writing the target data into the storage area based on the FIFO storage mechanism.
- The first processing unit and the first router are connected through an asynchronous message bus, where the asynchronous message bus includes a receive ready signal line, a clock signal line, a message valid bit signal line, and one or more data lines.
- the asynchronous message bus includes four signal lines, that is, a receive ready signal line, a clock signal line, a message valid bit signal line and one or more data lines.
- The receive ready signal line is used to transmit the ready signal, which indicates that the data-receiving state is ready.
- The clock signal line is used to transmit the first clock signal.
- The message valid bit signal line is used to transmit the packet header and packet tail signals of the target data.
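The four signal groups of the asynchronous message bus can be summarised as a small data structure. This is a sketch only: the class name `AsyncMessageBus`, the field names, and the `drive_bit` helper are hypothetical, and toggling a boolean is merely a stand-in for a clock edge on a real wire.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AsyncMessageBus:
    """Snapshot of the four signal groups between a processing unit and a router."""
    ready: bool = False   # receive-ready line, carries the router's ready state
    clock: bool = False   # clock line, carries the first clock signal
    valid: bool = False   # message-valid-bit line, marks packet header and tail
    data: List[int] = field(default_factory=lambda: [0])  # one or more data lines

    def drive_bit(self, bit: int) -> bool:
        """Put one bit on data line 0 and pulse the clock, only if the router is ready."""
        if not self.ready:
            return False
        self.data[0] = bit & 1
        self.clock = not self.clock   # toggle models one clock edge
        return True


bus = AsyncMessageBus(ready=True)
accepted = bus.drive_bit(1)
```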
- Each router includes multiple groups of ports, and each group includes a receiving port and a sending port; each receiving port is used for receiving data, and each sending port is used for sending data.
- The router is connected to a processing unit or to other routers through configurable ports.
- the networking architecture can be flexibly reconfigured, such as point-to-point, multipoint-to-multipoint and other architectures.
- each port inside the router is connected with a receiving unit or a sending unit, so as to send and receive data.
- Each receiving port corresponds to a receiving unit, and each receiving unit includes a storage area based on a first-in-first-out storage mechanism. The first router is specifically configured to: based on the first clock signal, drive the storage area in a first receiving unit to receive, through a target receiving port, the target data sent by the first processing unit, where the target receiving port is the receiving port in the first router that is connected to the first processing unit.
- With the first-in-first-out storage mechanism, when multiple pieces of target data need to be sent, they are sent sequentially in time order, so the sending unit makes faster decisions during sending, which can effectively improve the transmission performance of the system.
- The storage area based on the first-in-first-out storage mechanism can adapt between the synchronous and asynchronous domains. Data can be written synchronously (for example, target data in the processing unit is synchronously written to the sending unit) or read synchronously (for example, the sending unit in the router synchronously reads data from the storage area of the receiving unit), and can be read asynchronously (for example, the processing unit asynchronously sends the target data from its sending unit to the router) or written asynchronously (for example, the first receiving unit in the router writes data asynchronously).
- Each sending port corresponds to a sending unit. The first router is specifically configured to: determine a target sending port in the first router according to the destination address, where the target sending port is the sending port in the first router that corresponds to the second processing unit; and send the target data to the second processing unit through a first sending unit corresponding to the target sending port.
- The port-configurable router determines, according to the destination address, the target sending port corresponding to the first sending unit, and sends the target data to the second processing unit through that sending port.
- Each router includes a mapping table. The mapping table records the relationship between the port identifier of each sending port in the router and the unit identifier of the corresponding processing unit or the routing identifier of another router.
- The unit identifier uniquely identifies a processing unit, and the routing identifier uniquely identifies a router. The first router is specifically configured to: determine the target sending port according to the destination address, based on the mapping table in the first router.
- A forwarding mechanism based on looking up the mapping table simplifies the route forwarding process and improves transmission efficiency.
- The target sending port is the sending port of the first router that reaches the second router in the fewest hops.
- The maximum number of routing hops is the number of routers minus one, and the router can select the transmission path with the fewest hops according to the destination address, so that the target data is sent to the second processing unit.
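A mapping-table lookup that prefers the sending port with the fewest hops can be sketched as follows. Everything concrete here is an assumption for illustration: the four-router topology, the identifiers `R0` to `R3` and `PE0` to `PE3`, and the use of breadth-first search to count hops are not taken from the application, which only states that a mapping table relates sending ports to unit or routing identifiers.

```python
from collections import deque

# Hypothetical topology: router id -> {sending port id: neighbour id}.
# Ids starting with "R" are routers; ids starting with "PE" are processing units.
PORTS = {
    "R0": {0: "PE0", 1: "R1", 2: "R2"},
    "R1": {0: "PE1", 1: "R0", 2: "R3"},
    "R2": {0: "PE2", 1: "R0", 2: "R3"},
    "R3": {0: "PE3", 1: "R1", 2: "R2"},
}


def hops(src, dst):
    """Breadth-first search: number of links from src to dst, or None."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in PORTS.get(node, {}).values():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def build_mapping_table(router):
    """Map every destination processing unit to the port with the fewest hops."""
    table = {}
    destinations = [
        n for ports in PORTS.values() for n in ports.values() if n.startswith("PE")
    ]
    for dest in destinations:
        best = None
        for port, neighbour in PORTS[router].items():
            d = 0 if neighbour == dest else hops(neighbour, dest)
            if d is not None and (best is None or d < best[0]):
                best = (d, port)      # ties keep the lower-numbered port
        table[dest] = best[1]
    return table


table = build_mapping_table("R0")
```

In this sketch the table is computed once, so forwarding a packet reduces to a single dictionary lookup on the destination address.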
- The first router is specifically configured to: when the first sending unit receives a second request sent by the first receiving unit, control the first sending unit to acquire the target data from the storage area of the first receiving unit, where the second request is used to request sending the target data through the first sending unit; and send the target data through the first sending unit.
- A simple sending unit reuses the FIFO storage area of the receiving end through shared data, which reduces data movement and improves transmission efficiency.
- Each router includes a channel selector. The channel selector of the first router is used to connect the data path from the first receiving unit to the first sending unit, so that the first sending unit acquires the target data from the storage area of the first receiving unit.
- The channel selector connects the data path between the receiving unit and the sending unit, so that the sending unit can reuse the FIFO storage area of the receiving unit through that data path. This reduces data migration and greatly improves the transmission performance of the router.
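The idea that the sending unit reads directly from the receiving unit's FIFO, rather than keeping its own copy, can be shown with a minimal software analogy. The class names and the use of `collections.deque` as the FIFO are assumptions for illustration; in the application the channel selector is a hardware data path.

```python
from collections import deque


class ReceivingUnit:
    def __init__(self):
        self.fifo = deque()          # FIFO storage area owned by the receiver


class SendingUnit:
    def __init__(self):
        self.source = None           # set by the channel selector
        self.sent = []

    def drain(self):
        # Read straight from the receiver's FIFO; no intermediate buffer is kept.
        while self.source:
            self.sent.append(self.source.popleft())


class ChannelSelector:
    def connect(self, receiver, sender):
        sender.source = receiver.fifo   # share the same queue object


rx, tx = ReceivingUnit(), SendingUnit()
rx.fifo.extend(["header", 1, 0, 1, "tail"])
ChannelSelector().connect(rx, tx)
shared = tx.source is rx.fifo
tx.drain()
```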
- Each router includes an arbiter, and each sending unit corresponds to one arbiter. The arbiter of the first router is used to: when m receiving units simultaneously request the first sending unit to send data, determine a target receiving unit from the m receiving units according to a preset arbitration rule, where m is greater than 1 and less than or equal to the number of receiving units included in the router.
- The channel selector of the first router is further used to connect the data path from the target receiving unit to the first sending unit after the arbiter determines the target receiving unit, so that the first sending unit acquires data from the storage area of the target receiving unit and sends it.
- An arbiter implements a "many-to-one" fair arbitration mechanism to reduce conflicts when routing and forwarding data. To ensure the normal operation of each sending unit, arbiters and sending units are in one-to-one correspondence.
- The arbiter includes a second asynchronous handshake circuit. The second asynchronous handshake circuit of the first router is used to: after determining that the data-sending state of the first sending unit is ready, determine a second clock signal based on the signal, sent by the target receiving unit to the first sending unit, requesting to send data. The channel selector of the first router is specifically used to: based on the second clock signal, connect the data path from the target receiving unit to the first sending unit.
- The arbiter in the router implements a fair arbitration mechanism in the data transmission device based on a simple token-ring mechanism built from handshake circuits such as Click circuits.
- the arbiter implements a data packet-based transmission mechanism by using a public arbitration mechanism that is dependent on timing with the receiving unit, and has high performance. It can be understood that the arbiter in the router is an asynchronous arbiter.
- The number of second asynchronous handshake circuits in each arbiter is one less than the number of receiving ports in the router.
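A "many-to-one" fair arbitration with a rotating token can be approximated in software by a round-robin arbiter. This is a stand-in chosen for illustration, not the circuit described in the application: the class `RoundRobinArbiter`, its `token` field, and the rotation rule are assumptions, and the preset arbitration rule in the application may differ.

```python
class RoundRobinArbiter:
    """Grants one requester at a time, rotating priority after each grant."""

    def __init__(self, n_receivers: int):
        self.n = n_receivers
        self.token = 0               # index that currently has highest priority

    def grant(self, requests):
        """requests: set of receiving-unit indices that want the sending unit."""
        for offset in range(self.n):
            candidate = (self.token + offset) % self.n
            if candidate in requests:
                self.token = (candidate + 1) % self.n   # pass the token onward
                return candidate
        return None


arbiter = RoundRobinArbiter(4)
grants = [arbiter.grant({0, 2, 3}) for _ in range(4)]
```

With receiving units 0, 2, and 3 requesting continuously, each is served in turn before any is served twice, which is the fairness property the passage refers to.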
- the embodiment of the present application provides a data transmission method, which is applied to a data transmission device.
- the data transmission device includes: a plurality of processing units and a plurality of routers, and each of the above routers is connected to one or more processing units.
- Each router forms a communication connection with any other one of the multiple routers; the multiple routers include a first router, and the first router is connected to a first processing unit. The method includes: generating a first request through the first processing unit, where the first request is used to request sending target data to a second processing unit, and the target data includes the destination address of the second processing unit; when the first processing unit determines that the data-receiving state of the first router is ready, determining a first clock signal based on the first request; sending, through the first processing unit, the target data to the first router based on the first clock signal, and sending the first clock signal to the first router; receiving the first clock signal through the first router; receiving, through the first router, the target data sent by the first processing unit based on the above-ment
- The method further includes: after the target data has been received by the first router, changing the data-receiving state of the first router from ready to not ready; and determining, through the first processing unit, that sending of the target data is complete after detecting that the data-receiving state of the first router has changed from ready to not ready.
- Each processing unit includes a first asynchronous handshake circuit. The first processing unit is specifically configured to: after determining, through the first asynchronous handshake circuit, that the data-receiving state of the first router is ready, determine the first clock signal based on the first request.
- Each processing unit includes an asynchronous message transmitter. The first processing unit is specifically configured to: based on the first request, control the asynchronous message transmitter to send the target data to the first router in a serial single-bit transmission manner, driven by the first clock signal.
- The target data is sent in the form of a variable-length or fixed-length data packet. The method further includes: after the first request is generated by the first processing unit, setting the packet header of the target data and sending the packet header and the first clock signal to the first router; after the last bit of the target data has been sent, setting and sending the packet tail of the target data; starting to receive the target data after the first router receives the packet header corresponding to the target data; and after the packet tail of the target data is received, changing the data-receiving state of the first router from ready to not ready.
- Each processing unit includes a storage area based on a first-in-first-out storage mechanism. Generating the first request through the first processing unit includes: generating the first request after the first processing unit writes the target data into the storage area based on the first-in-first-out storage mechanism.
- The first processing unit and the first router are connected through an asynchronous message bus, where the asynchronous message bus includes a receive ready signal line, a clock signal line, a message valid bit signal line, and one or more data lines.
- Each router includes multiple groups of ports, and each group includes a receiving port and a sending port; each receiving port corresponds to a receiving unit and is used for receiving data, and each sending port corresponds to a sending unit and is used for sending data.
- Each receiving port corresponds to a receiving unit, and each receiving unit includes a storage area based on a first-in-first-out storage mechanism. Receiving, through the first router, the target data sent by the first processing unit includes: based on the first clock signal, driving the storage area in a first receiving unit to receive, through a target receiving port, the target data sent by the first processing unit, where the target receiving port is the receiving port in the first router that is connected to the first processing unit.
- Each sending port corresponds to a sending unit. Sending the target data to the second processing unit through the first router according to the destination address includes: determining a target sending port in the first router according to the destination address, where the target sending port is the sending port in the first router that corresponds to the second processing unit; and sending the target data to the second processing unit through a first sending unit corresponding to the target sending port.
- Each router includes a mapping table. The mapping table records the relationship between the port identifier of each sending port in the router and the unit identifier of the corresponding processing unit or the routing identifier of another router.
- The unit identifier uniquely identifies a processing unit, and the routing identifier uniquely identifies a router. Determining the target sending port in the first router according to the destination address includes: determining the target sending port according to the destination address, based on the mapping table in the first router.
- The target sending port is the sending port of the first router that reaches the second router in the fewest hops.
- Sending the target data to the second processing unit through the first sending unit corresponding to the target sending port includes: receiving, by the first sending unit, the second request sent by the first receiving unit.
- the target sending port sends the target data to the second processing unit in a serial single-bit transmission manner.
- Each router includes a channel selector. The method further includes: connecting the data path from the first receiving unit to the first sending unit through the channel selector of the first router, so that the first sending unit acquires the target data from the storage area of the first receiving unit.
- Each router includes an arbiter, and each sending unit corresponds to one arbiter. The method further includes: when m receiving units simultaneously request the first sending unit to send data, determining, through the arbiter of the first router, a target receiving unit from the m receiving units according to a preset arbitration rule, where m is greater than 1 and less than or equal to the number of receiving units included in the router.
- The method further includes: after the arbiter determines the target receiving unit, connecting the data path from the target receiving unit to the first sending unit through the channel selector of the first router, so that the first sending unit acquires data from the storage area of the target receiving unit and sends it.
- The arbiter includes a second asynchronous handshake circuit. Connecting the data path from the target receiving unit to the first sending unit through the channel selector of the first router after the arbiter determines the target receiving unit includes: after determining that the data-sending state of the first sending unit is ready, determining a second clock signal through the second asynchronous handshake circuit of the first router, based on the signal, sent by the target receiving unit to the first sending unit, requesting to send data; and based on the second clock signal, connecting the data path from the target receiving unit to the first sending unit through the channel selector of the first router.
- The number of second asynchronous handshake circuits in each arbiter is one less than the number of receiving ports in the router.
- an embodiment of the present application provides a computer-readable storage medium for storing computer software instructions used by the data transmission device provided in the first aspect above, which includes a program designed to execute the above aspect.
- an embodiment of the present application provides a computer program product, the computer program product includes instructions, and when the computer program is executed by a computer, the computer can execute the process performed by the data transmission device in the first aspect above.
- the present application provides a chip system, which includes the above first aspect and the device provided in combination with any implementation manner of the first aspect.
- the system-on-a-chip is used to implement the functions of the device involved in the above-mentioned first aspect.
- the chip system further includes a memory, and the memory is configured to store necessary program instructions and data of the data transmission device.
- the system-on-a-chip may consist of chips, or may include chips and other discrete devices.
- the embodiment of the present application provides an electronic device, which includes the first aspect and the apparatus provided in combination with any implementation manner of the first aspect.
- the electronic device is used to implement the functions involved in the first aspect above.
- the data transmission device is based on an asynchronous handshake mechanism: after the processing unit (the first processing unit) determines that the router (the first router) is ready to receive data, it generates the first clock signal based on its own request to send data, and sends both the first clock signal and the target data to the router according to the first clock signal, so that the router connected to the processing unit can receive the target data by means of the first clock signal; the router can then send the target data to the second processing unit according to the destination address carried in the received target data.
- This asynchronous handshake transmission between the processing unit and the router can ensure that the router receives the target data completely.
- the processing unit also sends the clock signal (that is, the first clock signal) when sending the target data to the router, so that the router can receive the data according to that clock signal. This reduces the clock constraints in the data transmission device, makes it easier to integrate multiple heterogeneous processing units or intellectual property cores, frees the routers from the limits of a synchronous clock, speeds up decisions, and effectively improves system transmission performance.
- the data line connecting the processing unit and the router is relatively short and relatively definite, which will further lead to a small and relatively definite time delay of the corresponding clock signal when the processing unit needs to send data.
- a router can be connected to multiple processing units asynchronously at the same time, greatly reducing the chip area occupied by the bus.
- FIG. 1 is a schematic structural diagram of a data packet provided by an embodiment of the present application.
- Fig. 2 is a schematic structural diagram of a data transmission device provided by an embodiment of the present application.
- FIG. 3 is a schematic structural diagram of another data transmission device provided by an embodiment of the present application.
- Fig. 4 is a schematic structural diagram of an asynchronous message transceiver provided by an embodiment of the present application.
- FIG. 5 is a schematic circuit diagram of a Click unit provided in an embodiment of the present application.
- FIG. 6 is a schematic diagram of a working sequence of a Click unit in a working mode provided by an embodiment of the present application.
- FIG. 7 is a schematic structural diagram of a sending unit provided by an embodiment of the present application.
- Fig. 8 is a schematic diagram of an asynchronous message sending processing flow provided by an embodiment of the present application.
- FIG. 9 is a schematic structural diagram of a receiving unit provided by an embodiment of the present application.
- FIG. 10 is a schematic structural diagram of a router provided by an embodiment of the present application.
- FIG. 11 is a schematic structural diagram of a simple data transmission device provided by an embodiment of the present application.
- FIG. 12 is an implementation block diagram of a router provided by an embodiment of the present application.
- FIG. 13 is a schematic diagram of a forwarding process of a router provided in an embodiment of the present application.
- Fig. 14 is a schematic diagram of an arbitration process provided by an embodiment of the present application.
- FIG. 15 is a schematic diagram of an internal circuit structure of an arbiter provided by an embodiment of the present application.
- FIG. 16 is a schematic structural diagram of an extended sending unit based on FIG. 7 provided by the embodiment of the present application.
- Fig. 17 is an effect diagram of data packet transmission based on the sending unit shown in Fig. 7 provided by the embodiment of the present application.
- Fig. 18 is a data packet transmission effect based on the sending unit shown in Fig. 16 provided by the embodiment of the present application.
- FIG. 19 is an implementation block diagram of an extended router corresponding to FIG. 16 provided by the embodiment of the present application.
- FIG. 20 is a schematic flowchart of a data transmission method provided by an embodiment of the present application.
- “At least one (item)” means one or more, and “multiple” means two or more.
- “And/or” describes the association relationship of associated objects and means that three kinds of relationships are possible; for example, “a and/or b” can mean: only a exists, only b exists, or both a and b exist, where a and b can be singular or plural.
- the character “/” generally indicates that the objects before and after it are in an “or” relationship.
- “At least one of the following” or similar expressions refer to any combination of these items, including any combination of single or plural items.
- “At least one item (piece) of a, b or c” can mean: a, b, c, “a and b”, “a and c”, “b and c”, or “a and b and c”, where a, b, c can be single or multiple.
- a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer.
- an application running on a computing device and the computing device can be components.
- One or more components can reside within a process and/or thread of execution and a component can be localized on one computer and/or distributed between two or more computers.
- these components can execute from various computer readable media having various data structures stored thereon.
- a component may, for example, communicate through local and/or remote processes based on a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, in a distributed system, and/or across a network such as the Internet, interacting with other systems by means of the signal).
- serial transmission is a transmission mode in which data is transmitted on a signal line and carried out bit by bit.
- one data line can be used to transmit data, one bit at a time, and multiple bits need to be transmitted one after another.
- in parallel transmission, the data is divided into blocks of a set number of bits, and each block is transmitted at the same time over as many data lines as the block has bits. That is, parallel transmission transmits data on multiple signal lines, using multiple parallel data lines to transmit multiple bits at a time.
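- As a small illustration of the difference between the two modes, the sketch below counts the transfers needed for the same bit sequence. The function names and the 4-line bus width are illustrative assumptions, not part of the embodiment.

```python
from typing import Iterator, List


def serial_transfer(bits: List[int]) -> Iterator[List[int]]:
    """One data line: each transfer carries a single bit."""
    for bit in bits:
        yield [bit]


def parallel_transfer(bits: List[int], lines: int) -> Iterator[List[int]]:
    """`lines` data lines: each transfer carries one block of `lines` bits."""
    if len(bits) % lines != 0:
        raise ValueError("bit count must be a multiple of the line count")
    for start in range(0, len(bits), lines):
        yield bits[start:start + lines]


data = [1, 0, 1, 1, 0, 0, 1, 0]
serial_steps = list(serial_transfer(data))         # one bit per transfer
parallel_steps = list(parallel_transfer(data, 4))  # four bits per transfer
```

The serial form needs one transfer per bit but only one line, which is the trade-off the embodiment exploits to reduce wiring.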
- the transmission frequency of serial transmission is higher than that of parallel transmission.
- the communication methods corresponding to serial transmission and parallel transmission are asynchronous communication and synchronous communication.
- in asynchronous communication, the time gap between transmitted data units can be arbitrary, but the receiving end must be ready to receive at all times.
- the sending end can start sending characters at any time, so a marker must be added at the beginning and end of each character, that is, a start bit and a stop bit, so that the receiving end can receive every character correctly.
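- The start-bit and stop-bit framing can be sketched as follows. The polarity (line idles at 1, start bit 0, stop bit 1) is an assumed UART-style convention that the text does not specify, and `frame_character` and `deframe` are hypothetical names.

```python
from typing import List

START_BIT = 0  # assumption: the line idles high, so a low bit marks a start
STOP_BIT = 1


def frame_character(char_bits: List[int]) -> List[int]:
    """Wrap one character with a start bit and a stop bit."""
    return [START_BIT] + list(char_bits) + [STOP_BIT]


def deframe(line: List[int], char_len: int) -> List[List[int]]:
    """Recover characters from a line that idles at 1 between frames."""
    chars: List[List[int]] = []
    i = 0
    while i < len(line):
        if line[i] == START_BIT:
            frame = line[i + 1:i + 1 + char_len]
            stop_index = i + 1 + char_len
            stop = line[stop_index] if stop_index < len(line) else None
            if len(frame) == char_len and stop == STOP_BIT:
                chars.append(frame)
            i += char_len + 2  # skip start bit, character, stop bit
        else:
            i += 1  # idle gap of arbitrary length
    return chars


# Two characters separated by idle gaps of different lengths.
line = [1, 1] + frame_character([1, 0, 1, 0]) + [1, 1, 1] + frame_character([0, 0, 1, 1])
received = deframe(line, char_len=4)
```

Because every character carries its own markers, the gap between characters can be any length without confusing the receiver.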
- Synchronous communication is a bit-synchronous communication technology that requires the sender and receiver to share a synchronous clock signal with the same frequency and phase. It only needs a specific synchronization character at the front of the transmitted message to make the sender and receiver establish synchronization; bits are then sent and received one by one under the control of the synchronous clock. Compared with synchronous communication, the advantage of asynchronous communication is that the communication equipment is simple and cheap and, most importantly, it does not require strict control of clock synchronization.
- the inter-chip architecture in the embodiment of this application adopts a combination of asynchronous communication and serial transmission, and connects as many processing units as possible to a single routing node, which simplifies chip design complexity and reduces chip area.
- the data to be transmitted may be a data packet.
- the data packet may be of fixed length or of variable length.
- a variable-length data packet is taken as an example, and the structure of the data packet is shown in FIG. 1 .
- please refer to FIG. 1, which is a schematic structural diagram of a data packet provided by the embodiment of the present application. As shown in FIG. 1, each field of the packet is defined as follows:
- the first field is the destination address of the data packet, that is, to determine the receiver of the data packet (the second processing unit).
- the length of this field can be expanded correspondingly according to the scale of the actual bus. For example, a 3-bit flag. In this embodiment of the application, it may be the communication address of the second processing unit.
- the second field is the length field of the data packet, which represents the effective data length of the data packet.
- the effective data of the data packet takes 2 bits as the basic length unit, and the length field indicates that the effective data is a multiple of 2 bits, namely 2*length value.
- the third field is the valid data of the data packet, which stores the information to be transmitted by the data packet.
- the specific format can be agreed upon according to design requirements, and is not specifically limited in this embodiment of the application.
- the fourth field is the transmission check digit of the data packet, which is used to check whether a bit error is introduced in the transmission.
- the check method can be selected according to the actual scene. For example: a parity check and other manners may be used, which is not specifically limited in this embodiment of the present application.
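- The four fields can be sketched as a bit-level encoder and decoder. The 3-bit address width follows the example in the text; the 4-bit length field, the even-parity check over all preceding bits, and the function names are assumptions made only for illustration.

```python
from typing import List, Tuple

ADDR_BITS = 3  # example width given in the text
LEN_BITS = 4   # assumed width; the text does not specify it


def to_bits(value: int, width: int) -> List[int]:
    """Most-significant-bit-first representation of `value`."""
    return [(value >> shift) & 1 for shift in range(width - 1, -1, -1)]


def from_bits(bits: List[int]) -> int:
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def encode_packet(dest: int, payload: List[int]) -> List[int]:
    """Fields in order: destination address, length, valid data, check bit."""
    if len(payload) % 2 != 0:
        raise ValueError("valid data must be a multiple of 2 bits")
    length = len(payload) // 2  # length field counts 2-bit units
    if dest >= 1 << ADDR_BITS or length >= 1 << LEN_BITS:
        raise ValueError("field overflow")
    body = to_bits(dest, ADDR_BITS) + to_bits(length, LEN_BITS) + list(payload)
    return body + [sum(body) % 2]  # assumed even parity


def decode_packet(bits: List[int]) -> Tuple[int, List[int]]:
    dest = from_bits(bits[:ADDR_BITS])
    length = from_bits(bits[ADDR_BITS:ADDR_BITS + LEN_BITS])
    start = ADDR_BITS + LEN_BITS
    end = start + 2 * length  # valid data length is 2 * length value
    if len(bits) != end + 1:
        raise ValueError("packet size does not match length field")
    if sum(bits[:end]) % 2 != bits[end]:
        raise ValueError("parity error")
    return dest, bits[start:end]


packet = encode_packet(dest=5, payload=[1, 0, 1, 1])
decoded = decode_packet(packet)
```

A receiver using this layout can stop after `2 * length` payload bits, which is what allows variable-length packets on a single data line.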
- the embodiment of the present application is described by taking the transmission of the data packet in the data transmission device as an example, and does not limit the specific transmission form of the data in the data structure. For example: it can also be transmitted in the form of data frame, data block, etc.
- FIG. 2 is a schematic structural diagram of a data transmission device provided by an embodiment of the present application.
- in the figure, the circles represent the processing units 01 and the squares represent the routers 02.
- the device includes a plurality of processing units 01 and a plurality of routers 02; connection paths can be formed between the plurality of routers 02, each of the routers 02 is connected to one or more processing units 01, and each processing unit 01 has one and only one corresponding router 02 connected to it. That is, each router 02 can be connected to multiple processing units 01, but each processing unit 01 can only be connected to one router 02. Moreover, each router 02 may form a communication connection with any router 02 among the multiple routers 02.
- any one of the routers 02 in the plurality of routers 02 can perform data transmission with another one of the routers 02, or can perform data transmission, through one or more routers 02, with a processing unit 01 connected to another router 02.
- the data in the processing unit 101 may be transmitted to the processing unit 103 through the router 102 and the router 104 .
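- The connection rule (each processing unit attached to exactly one router, routers linked to each other) and the example path can be sketched as below. The breadth-first search and the identifiers `pu101`, `r102`, `pu103`, `r104` are illustrative assumptions that mirror the reference numerals in the text; the embodiment does not prescribe a route-finding algorithm.

```python
from collections import deque
from typing import Dict, List, Optional, Set


def route(pu_to_router: Dict[str, str],
          router_links: Dict[str, Set[str]],
          src_pu: str, dst_pu: str) -> List[str]:
    """Return the hop sequence from src_pu to dst_pu using the fewest routers.

    A dict maps each processing unit to its single router, which enforces
    the "one and only one router per processing unit" rule by construction.
    """
    start, goal = pu_to_router[src_pu], pu_to_router[dst_pu]
    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        router = queue.popleft()
        if router == goal:
            break
        for neighbor in sorted(router_links.get(router, ())):
            if neighbor not in previous:
                previous[neighbor] = router
                queue.append(neighbor)
    if goal not in previous:
        raise ValueError("no router path between the processing units")
    path: List[str] = []
    node: Optional[str] = goal
    while node is not None:
        path.append(node)
        node = previous[node]
    return [src_pu] + path[::-1] + [dst_pu]


pu_to_router = {"pu101": "r102", "pu103": "r104"}
router_links = {"r102": {"r104"}, "r104": {"r102"}}
hops = route(pu_to_router, router_links, "pu101", "pu103")
```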
- multiple routers can be connected in a back-to-back cascading manner, which reduces outgoing lines of routers, shortens the length of interconnection lines between routers, and shortens transmission delay.
- the so-called back-to-back cascading method is to directly connect the receiving port and the sending port between two interconnected routers through data lines, wires or other data transmission media. That is, two interconnected routers are not connected through a communication network, but are directly connected through relevant data transmission media. For example: the sending port of the router at the sending end is directly connected to the receiving port of the router at the receiving end.
- the router and the processing unit may also be connected in a back-to-back cascading manner.
- the first processing unit in the data transmission device is configured to: generate a first request, where the first request is used to request to send the target data to the second processing unit, and the target data includes the destination address of the second processing unit; after determining that the state of receiving data of the first router is ready, determine a first clock signal based on the first request; and send the target data to the first router based on the first clock signal, and send the first clock signal to the first router.
- the first router is configured to: receive the first clock signal; receive the target data sent by the first processing unit based on the first clock signal; and send the target data to the second processing unit according to the destination address.
- the data transmission device is based on an asynchronous handshake mechanism: after the processing unit (the first processing unit) determines that the router (the first router) is ready to receive data, it generates the first clock signal based on its own request to send data, and sends both the first clock signal and the target data to the router according to the first clock signal, so that the router connected to the processing unit can receive the target data by means of the first clock signal; the router can then send the target data to the second processing unit according to the destination address carried in the received target data.
- This asynchronous handshake transmission between the processing unit and the router can ensure that the router receives the target data completely.
- the processing unit also sends the clock signal (that is, the first clock signal) when sending the target data to the router, so that the router can receive the data according to that clock signal. This reduces the clock constraints in the data transmission device, makes it easier to integrate multiple heterogeneous processing units or intellectual property cores, frees the routers from the limits of a synchronous clock, speeds up decisions, and effectively improves system transmission performance.
- the data line connecting the processing unit and the router is relatively short and relatively definite, which will further lead to a small and relatively definite time delay of the corresponding clock signal when the processing unit needs to send data.
- a router can be asynchronously connected to multiple processing units, which greatly reduces the chip area occupied by the bus.
- the processing unit of the data transmission device may include an intellectual property (IP) core, a microprocessor, a central processing unit (CPU), a digital signal processor (DSP), a graphics processing unit (GPU), a neural-network processing unit (NPU), and other related processing entities (PE) that can perform data processing.
- the structure of the data transmission device provided in the embodiment of the present application is not only a closed structure as shown in FIG. 2 , but may also include a non-closed structure.
- FIG. 3 is a schematic structural diagram of another data transmission device provided by an embodiment of the present application.
- the device includes a plurality of processing units 01 and a plurality of routers 02, and connection paths can be formed between the plurality of routers 02; each of the routers 02 is connected to at least two processing units 01, and each processing unit 01 has one and only one corresponding router 02 connected to it.
- the embodiment of the present application does not specifically limit the connection structure of the data transmission device.
- Both the processing unit 01 and the router 02 in the embodiment of the present application may include an asynchronous transceiver, and the asynchronous transceiver may transmit and receive target data in an asynchronous serial transmission manner.
- the asynchronous transceiver includes a sending unit and a receiving unit; the sending unit includes an asynchronous handshake circuit and an asynchronous message transmitter, and the asynchronous handshake circuit provides the asynchronous message transmitter with a self-timed clock signal (equivalent to the first clock signal), so that the asynchronous message transmitter sends the target data in a serial single-bit transmission manner according to the self-timed clock signal.
- the receiving unit is configured to receive target data.
- FIG. 4 is a schematic structural diagram of an asynchronous message transceiver provided in an embodiment of the present application.
- the sending end includes a sending unit tx of a processing unit (which may also be called an asynchronous message sending unit, a second sending unit, etc.) and a message packet management unit msg; the receiving end includes a receiving unit rx of a router (which may also be called an asynchronous message receiving unit, a first receiving unit, etc.) and a message packet management unit msg.
- the message packet management unit msg at the sending end can drive the asynchronous message sending unit and the asynchronous message receiving unit to transmit data packets to realize the asynchronous message packet transmission mechanism.
- fifo refers to a first-in-first-out mechanism, that is, the target data that first enters the message packet management unit is sent first.
- the message packet management unit at the sending end can synchronously receive and process the target data sent by the processing unit, including one or more of the data sending indication bits (the start bit (msg_bn) and the end bit (msg_end) of the data packet), the valid data, a synchronous clock, and so on. It can also send a receiving indication bit (data received) to the processing unit and feed back the data sending status (success or failure).
- the message packet management unit msg of the receiving end is also a message packet management mechanism based on fifo.
- the message packet management unit of the receiving end can send the target data to the router or the processing unit, including the data sending indication bits (the start bit (msg_bn) and the end bit (msg_end) of the data packet), the valid data, an asynchronous self-clock, and so on. The receiving router or processing unit can also send back a receiving indication bit (data received) and feed back the data sending status (success or failure).
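- A fifo-based message packet management unit of this kind might be sketched as follows. The class name `MessagePacketFifo`, the fixed capacity, and the per-bit tuple layout are assumptions for illustration; only the first-in-first-out order, the msg_bn and msg_end indication bits, and the success or failure feedback come from the text.

```python
from collections import deque
from typing import Deque, Iterator, List, Tuple


class MessagePacketFifo:
    """Queues whole packets and releases them in first-in-first-out order."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.packets: Deque[List[int]] = deque()

    def push(self, packet_bits: List[int]) -> bool:
        """Queue a packet; the result models the success/failure feedback."""
        if len(self.packets) >= self.capacity or not packet_bits:
            return False
        self.packets.append(list(packet_bits))
        return True

    def pop_stream(self) -> Iterator[Tuple[int, int, int]]:
        """Yield (msg_bn, bit, msg_end) for each bit of the oldest packet."""
        packet = self.packets.popleft()
        last = len(packet) - 1
        for index, bit in enumerate(packet):
            yield (int(index == 0), bit, int(index == last))


fifo = MessagePacketFifo(capacity=2)
accepted = [fifo.push([1, 0, 1]), fifo.push([0, 1]), fifo.push([1])]
first_packet = list(fifo.pop_stream())
second_packet = list(fifo.pop_stream())
```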
- the sending unit and the receiving unit provide the following: 1. A serial single-bit transmission mechanism can be realized. Since the router and the processing unit use a back-to-back cascaded transmission architecture, the sending unit and the receiving unit at different ends (for example, between a processing unit and a router, or between two routers) can send or receive data over one or more data lines (such as the sending data line in FIG. 4); for example, the serial single-bit transmission mechanism of the present application can be realized through a single data line.
- 2. The data transmission delay is shortened. The embodiment of the present application uses a packet-based asynchronous handshake mechanism instead of an asynchronous single-bit handshake mechanism, which shortens the transmission delay. For example, as shown in FIG. 4, the sending indication bit, that is, the signal transmitted on the message valid bit signal line, determines the start bit (msg_bn) and the end bit (msg_end) of the packet.
- 3. The timing analysis of integration and docking is simplified. Asynchronous timing is adopted (the self-timing and self-clocking in FIG. 4), and strict clock synchronization is not required between different receiving units of a router, which simplifies the timing analysis of integration and docking and makes the design easy to expand according to business requirements.
- the sending unit includes an asynchronous handshake circuit and an asynchronous message sender; the asynchronous handshake circuit provides a self-timed clock for the asynchronous message sender, and the asynchronous message sender sends the target data to the router in a serial single-bit transmission manner according to the self-timed clock signal.
- The asynchronous handshake circuit may be a click element, hereinafter referred to as the Click unit, and the Click unit may provide the self-timed clock for the asynchronous message transmitter.
- the Click unit can greatly simplify the design complexity of changing a synchronous circuit to an asynchronous circuit due to its simple design.
- the Click unit generates a self-timing clock through a self-loop to drive the asynchronous message transmitter to continuously send serial data, and the time delay of the self-loop is determined by the maximum time delay from the sending unit to the receiving unit.
- the Click unit includes: two AND gates, one OR gate and a phase lock register.
- the AND gate is a basic logic gate circuit that performs an "AND" operation. This circuit has multiple inputs and one output. When all inputs are high (logic 1) at the same time, the output is high, otherwise the output is low (logic 0).
- the OR gate is a circuit that implements logical addition, also known as a logical OR circuit. This circuit has two or more input terminals and one output terminal. As long as one or more input terminals are high level (logic 1), the output of the OR gate is high level (logic 1). The output is low (logic 0) only when all inputs are low (logic 0).
- the phase lock register is used to invert the level of the signal corresponding to B.ack in the embodiment of the present application, that is, when the level of B.ack changes, the changed level is inverted back. For example: B.ack changes from low level to high level, and the phase lock register can change the high level to low level again.
- the signal finally output by the Click unit is Fire = -A.req*A.ack*B.ack + A.req*-A.ack*-B.ack.
- -A.req is the inverted signal of A.req: when A.req is low level, -A.req is high level, and when A.req is high level, -A.req is low level.
- -A.ack and -B.ack are the reversal signals of A.ack and B.ack respectively.
- the high level is 1, and the low level is 0.
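- With high level as 1 and low level as 0, the Fire expression of the Click unit, Fire = -A.req*A.ack*B.ack + A.req*-A.ack*-B.ack, can be evaluated directly. The sketch below is only a combinational model of that expression; it does not model the register or any timing, and the function name `fire` is hypothetical.

```python
def fire(a_req: int, a_ack: int, b_ack: int) -> int:
    """Combinational Fire output: two AND terms joined by an OR."""
    def inv(level: int) -> int:
        return 1 - level

    return (inv(a_req) & a_ack & b_ack) | (a_req & inv(a_ack) & inv(b_ack))


# Enumerate every input combination as (A.req, A.ack, B.ack) -> Fire.
truth_table = {
    (a_req, a_ack, b_ack): fire(a_req, a_ack, b_ack)
    for a_req in (0, 1)
    for a_ack in (0, 1)
    for b_ack in (0, 1)
}
firing_inputs = [inputs for inputs, out in truth_table.items() if out == 1]
```

Only two of the eight input combinations produce Fire: the request differs from both acknowledge lines, and the two acknowledge lines agree with each other.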
- the handshake circuit of the Click unit includes: forward handshake signal lines, backward handshake signal lines and a self-clock signal line.
- the forward handshake signal lines are the two signal lines of request and response, such as A.req and A.ack in Figure 5;
- the backward handshake signal lines are the two signal lines of request and response, such as B.req and B.ack in Figure 5;
- the self-clock signal line Fire can drive a data storage device based on a first-in-first-out mechanism (such as a register, a serial fifo memory, or a fifo queue) to output data according to the self-clock signal Fire.
- FIG. 6 is a schematic diagram of a working sequence of a Click unit in a working mode provided by an embodiment of the present application.
- the circuit working mode of the Click unit is shown in Figure 6, in which the signal in_req is the signal carried by the A.req signal line in Figure 5 above, in_ack is the signal carried by the B.ack signal line, out_req is the signal carried by the B.req signal line, and out_ack is the signal carried by the A.ack signal line.
- in these terms, Fire = -in_req*out_ack*in_ack + in_req*-out_ack*-in_ack.
- the forward handshake signal line A.req and the backward handshake signal line B.ack of the Click unit are the two input signal lines of the Click unit, and the forward handshake signal line A.ack and the backward handshake signal line B.req are the two output signal lines of the Click unit.
- when the input signal in_req changes from low level to high level (rising edge), the Click unit can be triggered to generate the self-clock Fire and send data.
- the Click unit implements a 4-phase handshake protocol, in which both the rising and falling edges of the request can generate the self-timed clock, that is, the Fire shown in Figure 6.
- FIG. 7 is a schematic structural diagram of a sending unit provided by an embodiment of the present application.
- the sending unit may include a Click unit and an asynchronous message sender, and may also include an asynchronous message bus.
- the asynchronous message bus refers to the connection data line connecting the receiving unit and the sending unit, wherein, the sending unit side includes four signal lines, including the receiving ready signal line, the self-sequential clock signal line, the message effective bit signal line, and the data line (transmission data signal lines).
- the receiving ready signal line is used to transmit the indicator bit
- the self-sequence clock signal line is used to transmit the self-sequence clock
- the message valid bit signal line is used to transmit the header and tail of the data packet
- the data line is used to transmit valid data.
- the Click unit (equivalent to the first asynchronous handshake circuit in the present application) generates the self-timed clock through a self-loop to drive the asynchronous message transmitter to continuously send serial data, where the time delay of the self-loop in the Click unit is determined by the maximum delay from the sending unit on the processing unit side to the receiving unit on the router side. It should be noted that the maximum delay may be determined by physical quantities that affect data transmission time, such as the length and material of the data line connecting the sending unit on the processing unit side and the receiving unit on the router side.
- the manner in which the Click unit is generated from the timing clock can refer to the descriptions of the above-mentioned related embodiments in FIG. 5 to FIG.
- the asynchronous message sender, driven by the self-timed clock provided by the above-mentioned Click unit, reads the serial fifo (a fifo-based storage medium) and the valid bit of the data packet, and outputs them to the data line and the message valid bit line of the asynchronous message bus; after a certain delay, it also outputs the self-timed clock to the self-timed clock line of the asynchronous message bus.
- the asynchronous message sender includes an asynchronous message sending process, a message length len, an asynchronous serial fifo, a packet encapsulation module M, and valid data D. Please refer to FIG.
- the asynchronous message sending process starts sending a message packet and checks whether the message length len is greater than 0. If so, it sets the sending request signal A.req and sets the message valid bit to valid; it then waits for A.ack to be set, which indicates that the sending is complete, shifts one bit out of the asynchronous serial fifo, and decreases len by 1. The process repeats until len is 0, at which point the sending of the message is complete and the message valid bit is set to invalid.
- the message len is used for packet length statistics of the target data, and the valid data D is valid data in the target data.
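- The sending loop described above can be written as a short software model. This is a sketch under a simplifying assumption: the A.ack response is treated as arriving immediately after each request, so no real handshake timing is modeled. The function name `send_message` and the trace format are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List


def send_message(serial_fifo: Deque[int], msg_len: int) -> List[Dict[str, int]]:
    """Shift msg_len bits out of serial_fifo and record the bus state per bit."""
    bus_trace: List[Dict[str, int]] = []
    while msg_len > 0:
        # Set the send request A.req and mark the message valid bit as valid.
        bus_trace.append({"req": 1, "valid": 1, "data": serial_fifo[0]})
        # A.ack is assumed to be set at once: shift one bit out of the fifo.
        serial_fifo.popleft()
        msg_len -= 1
    # len has reached 0: sending is complete, message valid bit is invalid.
    bus_trace.append({"req": 0, "valid": 0, "data": 0})
    return bus_trace


fifo = deque([1, 0, 1, 1])
trace = send_message(fifo, msg_len=4)
sent_bits = [step["data"] for step in trace if step["valid"] == 1]
```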
- the processing unit first writes the data packet to be sent from the data interface to the asynchronous serial fifo in the asynchronous message sender.
- the processing unit notifies the asynchronous message sending processing flow in its asynchronous message sending unit to start data sending.
- the asynchronous message sending unit sets the packet header in the packet encapsulation module M and waits for the receiving ready of the receiving end of the routing node to become valid.
- the Click unit triggers the clock and sends the packet header, the first bit of data and the corresponding clock pulse to the asynchronous message bus.
- the sending of this bit is completed: the Click unit feeds back to itself whether the sending of the bit is completed. It should be noted that the delay time set by the delay circuit is determined by the distance the data travels from the sending unit to the receiving unit.
- the reception is complete: after the receiving end of the routing node detects the end of the packet, it sets the receiving ready signal to invalid, indicating that the packet has been received.
- the sending is complete: after detecting the message receiving completion signal, the sending unit of the processing unit notifies the processing unit at the local end that the sending is complete.
- FIG. 9 is a schematic structural diagram of a receiving unit provided by an embodiment of the present application.
- the receiving unit may include: an asynchronous message bus and an asynchronous message receiver.
- the asynchronous message bus refers to the connection data line connecting the receiving unit and the sending unit, wherein, the receiving unit side includes four signal lines (connected to the sending unit side), including a receiving ready signal line, a self-sequential clock signal line, and a message Effective bit signal line, data line (signal line for transmitting data).
- the receiving ready signal line is used to transmit the indicator bit
- the self-sequence clock signal line is used to transmit the self-sequence clock
- the message valid bit signal line is used to transmit the header and tail of the data packet
- the data line is used to transmit valid data.
- Asynchronous message receiver includes asynchronous message receiving processing flow and asynchronous serial fifo.
- the asynchronous message receiving process can supervise the data receiving process and save the data packets in the asynchronous serial fifo. Since the sending unit transmits the self-timing clock through the asynchronous message bus, the receiving unit can directly use this clock signal to receive the data transmitted by the sending unit. The receiving unit needs to wait for the completion signal or event of message sending, and then notify the downstream processing unit or router to read the received data packet.
- the local end (the receiving unit in the router) confirms that it can receive new data packets, and sets a ready signal for receiving.
- the packet header is detected. When the header signal is detected, the receiving unit starts receiving data and counting the packet length.
- the reception is complete.
- the local end sets the reception completion signal to notify the peer end that the message has been received; that is, the receive-ready signal is set to invalid.
- the message is ready.
- the receiving unit notifies the processing unit at the local end (the processing unit that receives the target data) that the message is ready.
- the local processing unit reads the received data packets through the data interface.
- the processing unit at the local end sets the read-complete signal after reading the data packet; the interface unit at the local end detects this signal and repeats step 1 to prepare to receive the next data packet.
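- The receiving flow above can be sketched as a small state machine. This is a hedged illustration only: the function name `receive_packet` and the sample layout `(clock_pulse, marker, bit)` with markers `"header"`, `"data"` and `"tail"` are assumptions made for the example and are not defined in the embodiment.

```python
def receive_packet(samples):
    """Rebuild one packet from bus samples.

    Returns (bits, ready), where ready is the state of the receive-ready
    signal after processing: it is set invalid once the packet tail is seen.
    """
    ready = True        # the receiver confirms it can accept a new packet
    receiving = False
    bits = []
    for clock_pulse, marker, bit in samples:
        if not clock_pulse:
            continue                    # data is only latched on a clock pulse
        if marker == "header":
            receiving = True            # header detected: start receiving
            bits = [bit]
        elif marker == "data" and receiving:
            bits.append(bit)
        elif marker == "tail" and receiving:
            receiving = False
            ready = False               # reception complete: receive-ready invalid
            break
    return bits, ready
```

- The packet length statistic mentioned above corresponds to `len(bits)` in this sketch.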
- steps 6 to 8 are the steps for two processing units to transmit target data when they are connected to the same router and there is no transmission conflict.
- the embodiment of the present application uses a packet-based asynchronous handshake mechanism instead of an asynchronous single-bit handshake mechanism, which shortens the transmission delay (for example, as shown in Figure 4, the indicator bit is sent and the start bit (msg_bn) and end bit (msg_end) of the packet are determined). In addition, asynchronous timing is adopted between different ends (the self-timed clock in Figure 4), so strict clock synchronization is not required, which simplifies the timing analysis of integration and docking and makes it easy to expand according to business needs.
- both the router and the processing unit include a receiving unit and a sending unit, wherein the structures and functions of the receiving unit and the sending unit in the router and the processing unit can refer to the relevant descriptions of the foregoing embodiments.
- The following can be achieved between the receiving unit of the first router and the sending unit of the first processing unit: 1. an asynchronous serial single-bit transmission mechanism; 2. a shorter data transmission delay; 3. simpler timing analysis of integration and docking.
- the router is configured with multiple groups of routing ports, and each group of routing ports includes a receiving port and a sending port, wherein the receiving port of each group of routing ports is also connected to a receiving unit, and the sending port of each group of routing ports is connected to a sending unit.
- the router includes an asynchronous transceiver unit (that is, a receiving unit or a sending unit connected to each port), a mapping table, a routing arbitration and a channel selector.
- FIG. 10 is a schematic structural diagram of a router provided in an embodiment of the present application. As shown in Figure 10:
- Asynchronous transceiver unit: receives and sends message packets through the routing ports. The asynchronous transceiver unit in the router also includes receiving units and transmitting units (i.e., RX and TX).
- the asynchronous transceiver unit includes a plurality of receiving units and sending units.
- Each receiving unit and each sending unit corresponds to a port.
- receiving unit RX0 corresponds to receiving port A
- receiving unit RX1 corresponds to receiving port B
- receiving unit RX2 corresponds to receiving port C
- receiving unit RX3 corresponds to receiving port D
- transmitting unit TX0 corresponds to transmitting port A
- transmitting unit TX1 corresponds to transmitting port B
- transmitting unit TX2 corresponds to transmitting port C
- transmitting unit TX3 corresponds to transmitting port D.
- the receiving units and sending units in the router use the same transmission mechanism as the asynchronous transceiver in the above-mentioned embodiment (1), and can likewise: 1. implement the serial single-bit transmission mechanism; 2. shorten the data transmission delay; 3. simplify the timing analysis of integration and docking; and so on, which will not be repeated in this embodiment of the present application.
- Mapping table: records the connection relationship between each sending port of the router and the processing units or other routers. The router configures the mapping table according to the topological networking of the SOC and uses it to look up the sending port for a message packet.
- the mapping table includes the target port number and the sending port number of the router.
- the target port number includes the unit identifier of the processing unit connected to the port, the routing identifier of the router, or the communication code or communication address corresponding to the port (such as the destination address contained in the data packet), etc., wherein the unit identifier is used to uniquely identify the processing unit, and the routing identifier is used to uniquely identify the router.
- FIG. 11 is a schematic structural diagram of a simple data transmission device provided by an embodiment of the present application.
- the data transmission device includes two routers and six processing units, wherein each router is connected to three processing units.
- based on the connection relationship shown in Figure 11, the mapping table of each router of the data transmission device is as follows:
- each router only saves the mapping table corresponding to its own local port.
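- A per-router mapping table lookup can be sketched as follows. The entries below are purely hypothetical and do not reproduce the tables of Figure 11: the router names `R0` and `R1`, the unit names `PU0` to `PU5`, the port letters, and the assumption that a destination address carries both a router identifier and a unit identifier are all made up for illustration.

```python
# Hypothetical tables: each router stores only the mapping for its own local ports.
MAPPING_TABLES = {
    "R0": {"PU0": "A", "PU1": "B", "PU2": "C", "R1": "D"},
    "R1": {"PU3": "A", "PU4": "B", "PU5": "C", "R0": "D"},
}

def lookup_send_port(router_id, dest_router, dest_unit):
    """Return the local sending port for a packet arriving at router_id.

    If the destination unit hangs off this router, look it up by unit
    identifier; otherwise forward toward the destination router by its
    routing identifier.
    """
    table = MAPPING_TABLES[router_id]
    key = dest_unit if dest_router == router_id else dest_router
    return table[key]
```

- In this sketch a packet for `PU4` that enters `R0` leaves through the port that leads to `R1`, and `R1` then resolves the unit identifier locally.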
- Each sending port corresponds to an arbiter.
- the arbitration condition is, for example: when multiple receiving ports request to send messages to one sending port at the same time, only one receiving port is allowed to send at a time, and every receiving port gets a fair chance to send.
- Channel selector: connects or disconnects the data channel from the receiving port to the sending port. For example: when the arbitration condition is satisfied, the data channel from the receiving port to the sending port is connected or disconnected according to the arbitration result of the arbiter. Another example: when the data transmission is completed, the data channel from the receiving port to the sending port is disconnected.
- FIG. 13 is a schematic diagram of a forwarding process of a router provided in an embodiment of the present application.
- port A, port B, and port C can all send data requests to the arbiter of port D; the arbiter selects port A, port B, or port C to send data to port D; after the channel selector obtains the port determined by the arbiter, it opens the data channel between that port (port A, port B or port C) and port D. The signals involved are: 1. the sending request sent by receiving port A, port B or port C; 2. the message valid bit; 3. the data bit; 4. the feedback that allows the next data bit to be sent.
- after receiving a message packet, the receiving unit of port A of the router extracts the code of the destination processing unit and uses it to look up the corresponding port, such as port D.
- the receiving unit of port A requests the arbiter of the port to send a message packet.
- the arbiter starts arbitration after confirming that port D is ready to send. If there is no conflict, it directly grants port A the right to send its message packet.
- the arbiter sends a signal to select port A to the channel selection unit to open the data channel between port A and port D
- after detecting the ready-to-send signal of port D, port A starts sending the message packet.
- the receiving port A also releases the request at the same time.
- the arbiter notifies the channel selection unit to release the data channel between port A and port D according to the completion of the sending of port D and the release request signal of port A, and thus completes a complete sending process.
- because the arbiter in this application does not have an arbitration protection window based on synchronous clock cycles, the existing synchronous arbiter mechanism cannot be reused as is. Therefore, it is necessary to design a real-time arbiter mechanism based on event arrival time, which can exploit the real-time advantage of asynchronous operation while still achieving fair arbitration.
- FIG. 14 is a schematic diagram of an arbitration process provided by an embodiment of the present application. As shown in Figure 14, the steps are as follows:
- when the arbiter judges that more than one of the preset arbitration conditions is satisfied at the same time, it starts a new round of arbitration; otherwise, it waits for the state to change.
- the preset arbitration conditions are as follows:
- the asynchronous message receiver of at least one receiving port requests to send the target data to the target sending port.
- step 4: determine whether receiving port B needs to be strobed. If port B has a request, set receiving port B to be strobed; otherwise, go to step 4;
- step 6: after receiving port B is strobed, wait for receiving port B to complete the transmission; if it has not completed, keep waiting; otherwise, release the strobe signal of receiving port B and go to step 4;
- step 7: after receiving port C is strobed, wait for receiving port C to complete the transmission; if it has not completed, keep waiting; otherwise, release the strobe signal of receiving port C and go to step 1.
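- The round-robin behaviour described by these steps can be sketched in software. This is a simplified model under stated assumptions, not the asynchronous circuit itself: the function name `round_robin_grants`, the representation of requests as packet counts, and the use of a rotating token index in place of event-driven handshakes are all hypothetical.

```python
def round_robin_grants(requests, ports=("A", "B", "C")):
    """Return the order in which receiving ports are strobed.

    requests maps a port name to the number of packets it wants to send to
    the shared sending port. The token visits the ports cyclically; a port
    with a pending request is strobed for exactly one packet, its strobe is
    released on completion, and the token moves to the next port.
    """
    pending = {p: max(0, requests.get(p, 0)) for p in ports}
    grants = []
    token = 0
    while any(pending.values()):
        port = ports[token]
        if pending[port] > 0:
            grants.append(port)   # strobe this port and wait for completion
            pending[port] -= 1    # transmission done: release the strobe
        token = (token + 1) % len(ports)
    return grants
```

- With simultaneous requests from all three ports the sketch grants A, then B, then C, and a port with several queued packets cannot starve the others because it is served once per token rotation.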
- FIG. 15 is a schematic diagram of an internal circuit structure of an arbiter provided by an embodiment of the present application.
- the asynchronous arbiter is implemented as a circular arbitration circuit based on multiple Click circuits (refer to the embodiment described in Figure 5 above), in which the Click circuits implement a token ring mechanism. The ports are judged in turn so that each port gets a judgment opportunity regardless of the order in which requests arrive, thereby achieving round-robin arbitration.
- for the working mode of the Click circuit, reference may be made to the related description of the above Click circuit embodiment, which will not be repeated here. It should be noted that the number of Click circuits corresponds to the number of receiving ports in the router.
- R_A, R_B, and R_C in Figure 15 represent the signals of receiving port A, receiving port B, and receiving port C; S_A, S_B, and S_C represent the connection paths from receiving port A, receiving port B, and receiving port C, respectively, to the sending port D; T_A is the state switching instruction, T_R is the ready state of the sending port D, and T_C is the sending completion state of the sending port D.
- the ClickA circuit, the ClickB circuit and the ClickC circuit are three asynchronous handshake circuits (equivalent to the second asynchronous handshake circuit in this application), and & represents an AND gate logic circuit.
- 1-7 in FIG. 15 corresponds to 1-7 of the implementation process in FIG. 14 above.
- 1 in the truth tables of the above-mentioned Tables 3 to 5 is logic 1, representing a true, ready state, and 0 is a logic 0, representing a false, not ready state.
- PortA, PortB and PortC under the sending request represent the sending requests of receiving port A, receiving port B and receiving port C respectively; PortA, PortB and PortC under the channel signal represent the channel signals between sending port D and receiving port A, receiving port B and receiving port C respectively; sending ready indicates whether sending port D can carry out a sending task; sending complete indicates whether sending port D has completed a sending task.
- for example, if PortA is 1 at time T0 of the self-timed clock (equivalent to the second clock signal in the embodiment of the present application), receiving port A has a sending request pending at time T0 of the self-timed clock.
- Table 3 (Scenario 1) is the situation where three sending requests arrive at the same time; port A should be served first, then port B, and finally port C. Table 4 (Scenario 2) is the situation where there is only one sending request, which gives the shortest arbitration cycle (such as six Click handshake cycles). Table 5 (Scenario 3) is the situation where two sending requests arrive at the same time, and fair scheduling must also be achieved.
- the arbitrator in the router implements a fair arbitration mechanism in the data transmission device based on a simple token ring mechanism of a handshake circuit such as a Click circuit.
- the arbiter realizes a high-performance, data packet-based transmission mechanism by using a fair arbitration mechanism that has a timing dependency on the receiving unit.
- port-reusable routers and router cascading can be used, so that a router can be connected to multiple processing units or routers.
- the wiring is simple and short, the routing algorithm is simple, the maximum number of hops is the number of routers minus one, and the delay is small and relatively deterministic, which greatly reduces the chip area occupied by the bus. Moreover, using asynchronous transceiver units in routers and asynchronous transceivers in processors reduces clock constraints and makes it easier to integrate multiple heterogeneous processing units or IP cores. The asynchronous arbiter used in the router is not limited by a synchronous clock and makes decisions faster, which can effectively improve the forwarding performance of the system.
- the data lines connecting the receiving unit and the sending unit include four signal lines: a receive-ready signal line, a self-timed clock signal line, a message valid bit signal line, and a data line (the signal line for transmitting data).
- the data line is used to transmit effective data in a serial transmission mode, and when data requiring high-speed transmission such as large data blocks or vectors needs to be transmitted, the transmission speed between the receiving unit and the sending unit is relatively slow.
- the number of data line channels can be easily expanded, and the control lines and control logic of the receiving unit and the sending unit can be completely reused, so that the data lines can be expanded to support large data transmission as needed. Therefore, to speed up the transmission of large data blocks or data with high transmission speed requirements, the number of data lines between the receiving unit and the sending unit can be increased to realize multi-channel serial transmission.
- FIG. 16 is a schematic structural diagram of an expanded sending unit based on FIG. 7 provided by an embodiment of the present application.
- a new data channel is added in Figure 16: a new data line is added to the asynchronous data bus, and a new valid data transmission module D with asynchronous serial fifo-2 is added to the asynchronous message sender.
- Fig. 17 is a data packet transmission effect diagram based on the sending unit shown in Fig. 7 provided by the embodiment of the present application.
- Fig. 18 is a data packet transmission effect diagram based on the sending unit with the expanded circuit structure.
- As shown in Figure 17, when there is only one data line, the data packet is transmitted serially, one bit at a time, over a single data channel; D0, D1 and D2 are the units of data in the data packet, and the size of each unit of data can be 1 bit.
- when there are multiple data lines, the data packet is transmitted serially, one bit at a time, over multiple data channels, where D0, D1, and D2 are the units of data on the corresponding channel of the data packet, and the size of each unit of data can be 1 bit.
- multiple channels of data can be transmitted at the same time.
- the original parallel data can be converted into serial data of multiple channels according to a certain algorithm, such as an odd-even split (for example: odd-numbered bits are transmitted on the data line of channel 1 and stored in its serial fifo, and even-numbered bits are transmitted on the data line of channel 2 and stored in its serial fifo).
- the receiving unit can correspondingly add a serial fifo to store the received data.
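- The odd-even split across two channels, and the matching merge on the receiving side, can be sketched as follows. This is a minimal illustration: the function names are hypothetical, bits are counted from 1 so that the first bit is odd-numbered, and the merge assumes the two channel lists came from the same split.

```python
def split_odd_even(bits):
    """Split a bit sequence into (channel1, channel2).

    Counting bits from 1, odd-numbered bits go to channel 1 and
    even-numbered bits go to channel 2.
    """
    return bits[0::2], bits[1::2]

def merge_odd_even(channel1, channel2):
    """Interleave the two channel fifos back into the original bit order."""
    merged = []
    for i in range(len(channel1) + len(channel2)):
        source = channel1 if i % 2 == 0 else channel2
        merged.append(source[i // 2])
    return merged
```

- Because both channels advance on the same clock pulse, a packet of n bits needs roughly half as many pulses with two data lines as with one.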
- FIG. 19 is an implementation block diagram of an extended router corresponding to FIG. 16 provided by an embodiment of the present application.
- for the router, after the receiving unit and the sending unit are expanded, it is only necessary to expand the channel selector to match the asynchronous data transceiver unit, so that the channel selector supports multi-channel data transmission.
- FIG. 19 is only an example illustration of an extended data channel, and the specific implementation manner can be customized according to business requirements, which is not specifically limited in this embodiment of the present application.
- each unit corresponds to its own program code (or program instruction), and when the program code corresponding to each unit runs on a related hardware device, the unit executes a corresponding process to realize a corresponding function.
- the functions of each unit can also be realized by related hardware.
- Fig. 20 is a schematic flowchart of a data transmission method provided by an embodiment of the present application, which can be applied to the data transmission architecture described in Fig. 2 or Fig. 3 above, where the processing unit can be used to support and execute steps S301 to S304 of the method flow shown in FIG. 3.
- the router may be used to support and execute steps S305-S308 of the method flow shown in FIG. 3 .
- the data transmission method in the embodiment of the present application will be exemplarily described below by taking the sending of target data from the first processing unit to the target processing unit as an example.
- the method may include the following steps S301-S308.
- Step S301 the first processing unit determines target data.
- the first processing unit determines target data, and the target data includes the destination address of the second processing unit.
- the destination address may be a communication address of the second processing unit.
- the data form of the target data during the sending process is a variable-length or fixed-length data packet.
- for example, the packet structure is as described above in Figure 1.
- Step S302 the first processing unit generates a first request.
- the first processing unit generates a first request, and the first request is used to request to send the target data to the second processing unit.
- the first request is equivalent to the A.req signal shown in FIG. 5; generating the first request is equivalent to the A.req signal changing from a low level to a high level.
- the first request may be used to trigger the first asynchronous handshake circuit (as shown in FIG. 5 ) to generate the first clock signal.
- each of the processing units includes a storage area based on a first-in-first-out storage mechanism.
- after the first processing unit writes the target data into the storage area based on the first-in-first-out storage mechanism, the sending request is generated.
- the storage area based on the first-in-first-out storage mechanism may be the asynchronous serial fifo module as shown in FIG. 7 or FIG. 9 above, or other forms of storage area, such as memory, queue or linked list, etc.
- with the first-in-first-out storage mechanism, when multiple pieces of target data need to be sent, they are sent sequentially in time order, so that the sending unit makes decisions faster during sending, which can effectively improve the transmission performance of the system.
- Step S303 After determining that the state of the first router is ready to receive data, the first processing unit determines the first clock signal based on the first request.
- the first processing unit determines the first clock signal based on the first request.
- the first clock signal is a clock signal triggered jointly by the receiving state of the first router and the first request; it can drive the sending unit to send data to the receiving unit, and can also drive the receiving unit to receive the data sent by the sending unit.
- the first clock signal is equivalent to the self-sequential clock signal in the embodiment described above in FIG. 7 or FIG. 9 .
- each of the above-mentioned processing units includes a first asynchronous handshake circuit; the above-mentioned first processing unit is specifically configured to: after the above-mentioned first asynchronous handshake circuit determines that the state of receiving data of the above-mentioned first router is ready, based on the above-mentioned first request, determine the above-mentioned first clock signal.
- the first clock signal (also called a self-sequential clock) is provided by an asynchronous handshake circuit.
- the asynchronous handshake circuit has a simple structure and can generate a self-timed clock through a self-loop; that is, when the data receiving state of the router is ready and the first request exists at the same time, the self-timed clock is generated to drive the asynchronous message transmitter to send the target data to the router in a serial single-bit transmission mode.
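- The condition under which the self-timed clock fires can be illustrated with a discrete-step model. This is only a sketch under stated assumptions: a real Click-based handshake circuit is event-driven rather than stepped, and the function name `transmit_with_self_timed_clock` and the list-based traces are hypothetical.

```python
def transmit_with_self_timed_clock(bits, ready_trace):
    """Model clock generation gated by receive-ready and a pending request.

    ready_trace holds the receiver's ready state at each step. A clock pulse
    fires only when the receiver is ready and data is still waiting to be
    sent; each pulse moves exactly one bit.
    Returns (clock, sent): the pulse train and the bits transferred.
    """
    pending = list(bits)
    clock, sent = [], []
    for ready in ready_trace:
        request = bool(pending)             # a request exists while data remains
        fire = 1 if (ready and request) else 0
        clock.append(fire)
        if fire:
            sent.append(pending.pop(0))     # one bit per pulse, in order
    return clock, sent
```

- No pulse is produced while the receiver is not ready or after the last bit has been sent, which is why no free-running synchronous clock is needed between the two ends.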
- each of the above-mentioned processing units includes an asynchronous message transmitter; the first processing unit is specifically configured to: based on the first request, control the asynchronous message transmitter to send the target data to the first router in a serial single-bit transmission mode based on the first clock signal.
- the asynchronous message transmitter can be driven by the first clock signal to send the target data to the router in a serial single-bit transmission mode, so as to realize asynchronous transmission between the processing unit and the router.
- the first clock signal can also be referred to as a self-sequential clock (such as the above-mentioned embodiments shown in Figures 7-9), which is provided by an asynchronous handshake circuit, such as the Fire signal shown in Figure 5 above.
- Each of the processing units may further include a sending unit (ie, a second sending unit), and the second sending unit includes a first asynchronous handshake circuit and an asynchronous message sender.
- Step S304 the first processing unit sends the target data to the first router based on the first clock signal, and sends the first clock signal to the first router.
- the second sending unit in the first processing unit sends the target data to the first router based on the first clock signal, and sends the first clock signal to the first router.
- the first processing unit and the first router are connected through an asynchronous message bus, wherein the asynchronous message bus includes a receive ready signal line, a clock signal line, a message valid bit signal line and a data line.
- the asynchronous message bus includes four signal lines, that is, a receive ready signal line, a clock signal line, a message valid bit signal line and one or more data lines.
- the receiving ready signal line is used to transmit the ready signal, and the ready signal is used to indicate that the state of the received data is ready
- the clock signal line is used to transmit the first clock signal
- the message valid bit signal line is used to transmit the packet header signal and packet tail signal of the target data
- the one or more data lines are used to transmit the valid data of the target data.
- when the target data is small (for example, the target data is an indication message, a control message, or data whose size is less than a preset threshold), it can be transmitted in a serial single-bit manner through one data line; when the target data is relatively large (for example, the target data is vector data, a video frame, image data, voice data, or data whose size is greater than or equal to the preset threshold), it can be transmitted serially through multiple data lines that support multiple channels. For specific implementation manners, reference may be made to the above-mentioned embodiments, which will not be repeated here.
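- The choice between one data line and several can be sketched as a threshold check. The function names, the unit of bits and the example threshold are assumptions for illustration; the embodiment does not specify a threshold value.

```python
def choose_lanes(data_size_bits, threshold_bits, available_lanes):
    """Return how many data lines to use for a packet of the given size.

    Data smaller than the preset threshold uses a single data line; data at
    or above the threshold uses all available data lines.
    """
    if available_lanes < 1:
        raise ValueError("at least one data line is required")
    return 1 if data_size_bits < threshold_bits else available_lanes

def transfer_pulses(data_size_bits, lanes):
    """Number of clock pulses needed when each lane moves one bit per pulse."""
    return -(-data_size_bits // lanes)  # ceiling division
```

- For instance, with a hypothetical threshold of 1024 bits and four data lines, a 4096-bit block would need 1024 pulses instead of 4096.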
- the four signal lines greatly alleviate the problems in the prior art, such as many and complicated outgoing lines between the processing unit and the router, and reduce the chip area occupied by the entire asynchronous message bus.
- the data line may also be multiple data lines supporting multiple channels.
- Step S305 the first router receives the first clock signal.
- the first router receives the first clock signal.
- each of the above-mentioned routers includes multiple groups of ports, and each of the above-mentioned port groups includes a receiving port and a sending port, wherein each of the above-mentioned receiving ports is used for receiving data, and each of the above-mentioned sending ports is used for sending data.
- the router is connected to processing units or other routers through configurable ports.
- the networking architecture can be flexibly reconfigured, such as point-to-point, multipoint-to-multipoint and other architectures.
- each port inside the router is connected with a receiving unit or a sending unit, so as to send and receive data.
- the data form of the target data during sending is variable-length or fixed-length data packets; the first processing unit is further configured to: set the packet header of the target data after generating the sending request. It can be understood that the packet header needs to be sent to the router together with the first clock signal through the message effective bit signal line.
- Step S306 The first router receives the target data sent by the first processing unit based on the first clock signal.
- the first router receives the target data sent by the first processing unit based on the first clock signal.
- the data form of the target data in the sending process is a variable-length or fixed-length data packet;
- the first processing unit is further configured to: after generating the first request, set the packet header of the target data, and send the packet header together with the first clock signal to the first router;
- the first router is further configured to: start receiving the target data after receiving the packet header corresponding to the target data. For example: as shown in FIG. 7 above, when the first router detects the packet header signal, it starts receiving data and counting the message packet length.
- each of the receiving ports in the router corresponds to a receiving unit, and each of the receiving units includes a storage area based on a first-in-first-out storage mechanism; the first router is specifically configured to: based on the first clock signal, drive, through the target receiving port, the storage area in the first receiving unit to receive the target data sent by the first processing unit, where the target receiving port is the receiving port of the first router that is connected to the first processing unit.
- the storage area based on the first-in-first-out storage mechanism can bridge synchronous and asynchronous operation: data can be written synchronously (for example, the target data in the processing unit is synchronously written to the sending unit) or read synchronously (for example, the sending unit in the router synchronously reads data from the storage area of the receiving unit), and data can be read asynchronously (for example, the target data is asynchronously sent to the router by the second sending unit in the processing unit) or written asynchronously (for example, the first receiving unit writes data asynchronously).
- Step S307 After receiving the target data, adjust the status of the first router from ready to receive data to not ready to receive data.
- the first router is further configured to: after the target data has been received, change its data receiving state from ready to not ready; the first processing unit is further configured to: after detecting that the data receiving state of the first router has changed from ready to not ready, determine that the sending of the target data is complete.
- the processing unit can determine that the data transmission is complete, and then stop the data transmission to save communication resources.
- the ready state and the not ready state can be represented by high and low signal levels respectively.
- the first processing unit is further configured to: after the last bit of the target data is sent, set and send the packet tail of the target data; the first router is further configured to: after receiving the packet tail of the target data, change its data receiving state from ready to not ready.
- Asynchronous transmission of target data is achieved by setting the header and tail of the target data, so that there is no need to synchronize the clock between the processing unit and the router, and it also makes it easier for a router to integrate multiple heterogeneous processing units or intellectual property cores.
- Step S308: The first router sends the target data to the second processing unit according to the destination address.
- the first router sends the target data to the target unit according to the destination address.
- each sending port in the router corresponds to one sending unit; the first router determines a target sending port according to the destination address, where the target sending port is the sending port of the first router that corresponds to the second processing unit, and sends the target data to the second processing unit through the first sending unit corresponding to the target sending port.
- the port-based configurable router determines the target sending port corresponding to the first sending unit according to the destination address, and sends the target data to the second processing unit through the sending port.
- each router includes a mapping table; the mapping table includes a mapping relationship between the port identifier of each sending port in the router and the unit identifier of the corresponding processing unit or the routing identifier of another router, where the unit identifier uniquely identifies a processing unit and the routing identifier uniquely identifies a router.
- the first router is specifically used to: determine the target sending port based on the mapping table in the first router according to the destination address.
- a route-forwarding mechanism based on querying the mapping table simplifies the forwarding process and improves transmission efficiency.
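- when the second processing unit is connected to a second router, and the first router and the second router are two different routers among the plurality of routers, the target sending port is the sending port of the first router with the fewest connection hops to the second router.
- in a communication connection formed by multiple routers, the maximum number of routing hops is the number of routers minus one, and a router can select, according to the destination address, the transmission path with the fewest hops to send the target data to the second processing unit.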
- the first router is specifically configured to: when the first sending unit receives a second request sent by the first receiving unit, control the first sending unit to obtain the target data from the storage area of the first receiving unit, where the second request is used to request that the target data be sent through the first sending unit; and send the target data to the second processing unit through the first sending unit, via the target sending port, in serial single-bit transmission mode.
- a simple sending unit based on shared data reuses the FIFO storage area at the receiving end, which reduces data movement and improves transmission efficiency.
- each router includes a channel selector; the channel selector of the first router is used to connect the data path from the first receiving unit to the first sending unit, so that the first sending unit obtains the target data from the storage area of the first receiving unit.
- when there is data to send, the channel selector can connect the data path between the receiving unit and the sending unit, so that the sending unit can reuse the FIFO storage area of the receiving unit through that data path, which reduces data movement and greatly improves the transmission performance of the router.
- the channel selector releases the data path for other first receiving units to send data to the first sending unit. That is, the first sending unit can only send the data of one first receiving unit at a time, and after the sending is completed, the data path between the first sending unit and the first receiving unit will be disconnected.
- each router includes an arbiter, and each sending unit corresponds to one arbiter; the arbiter of the first router is used to: when m receiving units simultaneously request the first sending unit to send data, determine a target receiving unit from the m receiving units according to a preset arbitration rule, where m is greater than 1 and less than or equal to the total number of receiving units in the router; the channel selector of the first router is further used to: after the arbiter determines the target receiving unit, connect the data path from the target receiving unit to the first sending unit, so that the first sending unit obtains data from the storage area of the target receiving unit and sends it.
- the arbiter implements a "many-to-one" fair arbitration mechanism, which reduces conflicts when routing and forwarding data.
- the arbiter includes a second asynchronous handshake circuit; the second asynchronous handshake circuit of the first router is used to: after determining that the data-sending state of the first sending unit is ready, determine a second clock signal based on the send-request signal that the target receiving unit sends to the first sending unit; the channel selector of the first router is specifically used to: connect, based on the second clock signal, the data path from the target receiving unit to the first sending unit.
- the arbiter in the router implements the fair arbitration mechanism of the data transmission device using a simple token-ring mechanism built on a handshake circuit such as a Click circuit.
- by exploiting its timing dependency on the receiving units, the shared arbitration mechanism of the arbiter realizes a packet-based transmission mechanism with high performance. It can be understood that the arbiter in the router is an asynchronous arbiter.
- the number of the second asynchronous handshake circuits in each of the arbitrators is one less than the number of receiving ports in the router.
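- because every receiving port other than the one paired with the sending port must be able to send messages to that sending port, the number of second asynchronous handshake circuits in the arbiter is one less than the number of receiving ports in the router.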
- in the data transmission device based on the asynchronous handshake mechanism, a processing unit (for example, the first processing unit), after determining that the data-receiving state of a router (for example, the first router) is ready, generates a first clock signal based on its request to send data, and sends the first clock signal and the target data to the router in accordance with the first clock signal, so that the router connected to the processing unit can receive the target data by means of the first clock signal; the router then sends the target data to the second processing unit according to the destination address carried in the received target data.
- This asynchronous-handshake transmission between the processing unit and the router ensures that the router receives the target data completely.
- the processing unit also sends the router the clock signal used when sending the target data (that is, the first clock signal), so that the router can receive the data according to that clock signal. This reduces clock constraints in the data transmission device, makes it easier to integrate multiple heterogeneous processing units or intellectual property cores, and frees the routers from the limits of a synchronous clock, so decisions are made faster and the forwarding performance of the system is effectively improved.
- the data lines connecting a processing unit and its router are relatively short and of relatively fixed length, so when the processing unit needs to send data, the delay of the corresponding clock signal is small and relatively deterministic.
- a router can be asynchronously connected to multiple processing units, which greatly reduces the chip area occupied by the bus.
- both the first router and the first processing unit include a receiving unit and a sending unit: the first router may include a first receiving unit and a first sending unit, and the first processing unit may include a second receiving unit and a second sending unit.
- the first receiving unit and the second receiving unit have similar functions and structures, and both are used to receive data through asynchronous serial single-bit transmission.
- the first sending unit and the second sending unit have similar functions and structures, and both can be used to send data in serial single-bit transmission mode.
- for the first router mentioned in the embodiments of the present application, refer to the routers described with reference to Figure 4 to Figure 19 above; for the first processing unit, refer to the processing units described with reference to Figure 4 to Figure 19 above. Details are not repeated here.
- An embodiment of the present application further provides a chip system, where the chip system includes the device provided in any one of the foregoing embodiments or in any implementation of those embodiments.
- the chip system is used to realize the functions of the above-mentioned data transmission device.
- the chip system further includes a memory, and the memory is configured to store necessary program instructions and data of the data transmission device.
- the system-on-a-chip may consist of chips, or may include chips and other discrete devices.
- An embodiment of the present application further provides an electronic device, where the electronic device includes the device provided in any one of the foregoing embodiments or in any implementation of those embodiments.
- the electronic equipment is used to realize the function of the above-mentioned data transmission device.
- the disclosed device can be implemented in other ways.
- the device embodiments described above are only illustrative.
- the division of the above units is only a logical function division.
- there may be other division methods; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
- the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical or other forms.
- the units described above as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
- each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
- the above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.
- if the above integrated units are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
- the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, or a network device, and specifically may be a processor in the computer device) to execute all or part of the steps of the methods in the various embodiments of the present application.
- the aforementioned storage medium may include: a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), and the like.
Landscapes
- Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Theoretical Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
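Embodiments of the present application provide a data transmission apparatus, a method, and related devices. The data transmission apparatus may include a plurality of processing units and a plurality of routers, where each router is connected to one or more processing units and each router forms a communication connection with any one of the plurality of routers; the plurality of routers includes a first router, which is connected to a first processing unit. The first processing unit is configured to: generate a first request; after determining that the data-receiving state of the first router is ready, determine a first clock signal based on the first request; and send target data to the first router based on the first clock signal, and send the first clock signal to the first router. The first router is configured to: receive, based on the first clock signal, the target data sent by the first processing unit; and send the target data to a second processing unit according to a destination address. Implementing the embodiments of the present application can improve system performance while reducing chip area.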
Description
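The present application relates to the field of information technology, and in particular to a data transmission apparatus, a method, and related devices.
As internet adoption and industry digitalization deepen, manufacturers of personal and industrial smart terminals increasingly use system-on-chip (SoC) technology to integrate many kinds of chip intellectual property (IP) cores, such as microprocessors, central processing units (CPUs), digital signal processors (DSPs), graphics processing units (GPUs), neural-network processing units (NPUs), memory, and network connectivity chips, and the number of integrated IP cores has grown from dozens to more than a hundred. As a result, more interconnect and bandwidth are needed between IP cores to support ever more real-time data communication.
Because asynchronous circuit design inherently eliminates the clock, it enables a new SoC integration architecture that is globally asynchronous and locally synchronous (GALS), which can greatly reduce chip design complexity as well as development investment and cycle time. The network-on-chip (NoC) architecture is currently the mainstream bus integration technology for large-scale IP core integration: each routing node connects to other routing nodes in four interconnect directions, forming a mesh-like, fully interconnected network; each processing entity (PE) connects to only one routing node and communicates with other PEs through it, and different PEs operate at different clock frequencies.
However, in existing NoC architectures each routing node can connect to only one processing entity (PE), so many routers are needed and they occupy a large chip area. Moreover, current NoC architectures use a mesh interconnect, and the IP blocks of the processing units to be integrated are now very large (for example, on the order of hundreds of millions of transistors), so the wires leaving each router are numerous and long, introduce large transmission delays, and have delays that differ greatly between directions, which makes timing analysis and timing closure difficult.
Therefore, how to improve system performance while reducing chip area is a problem that the embodiments of the present application urgently need to solve.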
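Summary of the Invention
Embodiments of the present application provide a data transmission apparatus, a method, and related devices that improve system performance while reducing chip area.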
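In a first aspect, an embodiment of the present application provides a data transmission apparatus, which may include a plurality of processing units and a plurality of routers, where each router is connected to one or more processing units, and each router forms a communication connection with any one of the plurality of routers; the plurality of routers includes a first router, and the first router is connected to a first processing unit.
The first processing unit is configured to: generate a first request, where the first request is used to request that target data be sent to a second processing unit, and the target data includes a destination address of the second processing unit; after determining that the data-receiving state of the first router is ready, determine a first clock signal based on the first request; and send the target data to the first router based on the first clock signal, and send the first clock signal to the first router.
The first router is configured to: receive the first clock signal; receive, based on the first clock signal, the target data sent by the first processing unit; and send the target data to the second processing unit according to the destination address.
In the embodiment provided in the first aspect, after the data transmission apparatus based on an asynchronous handshake mechanism determines that the data-receiving state of a router (for example, the first router) is ready, a processing unit (for example, the first processing unit) generates the first clock signal based on its request to send data, and sends the first clock signal and the target data to the router in accordance with the first clock signal, so that the router can receive the target data by means of the received first clock signal; the router then sends the target data to the second processing unit according to the destination address carried in the received target data. This asynchronous-handshake transmission between the processing unit and the router ensures that the router receives the target data completely. In addition, the processing unit also sends the router the clock signal used when sending the target data (that is, the first clock signal), so that the router can receive the data according to that clock signal. This reduces clock constraints within the data transmission apparatus, makes it easier to integrate multiple heterogeneous processing units or IP cores, and frees the routers from the limits of a synchronous clock, so decisions are made faster and the transmission performance of the system is effectively improved. Furthermore, the data lines connecting a processing unit and its router are relatively short and of relatively fixed length, so when the processing unit needs to send data, the delay of the corresponding clock signal is small and relatively deterministic. At the same time, in the embodiments of the present application one router can be asynchronously connected to multiple processing units, which greatly reduces the chip area occupied by the bus.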
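In a possible implementation, the first router is further configured to: after the target data has been received, change the data-receiving state of the first router from ready to not ready; and the first processing unit is further configured to: after detecting that the data-receiving state of the first router has changed from ready to not ready, determine that sending of the target data is complete. In the embodiments of the present application, once the data-receiving state of the router changes from ready to not ready, the processing unit can determine that the data has been sent and can stop sending, which saves communication resources. The ready and not-ready states may be indicated by high and low signal levels, respectively.
In a possible implementation, each processing unit includes a first asynchronous handshake circuit; the first processing unit is specifically configured to: determine, through the first asynchronous handshake circuit, the first clock signal based on the first request after determining that the data-receiving state of the first router is ready. In the embodiments of the present application, the first clock signal (which may also be called a self-timed clock) is provided by the asynchronous handshake circuit. This circuit has a simple structure and can generate the self-timed clock through a self-loop: the self-timed clock is generated when the router's data-receiving state is ready and the first request is present at the same time, and it drives the asynchronous message transmitter to send the target data to the router in serial single-bit transmission mode.
In a possible implementation, each processing unit includes an asynchronous message transmitter; the first processing unit is specifically configured to: based on the first request, control the asynchronous message transmitter to send the target data to the first router in serial single-bit transmission mode based on the first clock signal. In the embodiments of the present application, the asynchronous message transmitter can be driven by the first clock signal to send the target data to the router in serial single-bit transmission mode, which realizes asynchronous transmission, for example between a processing unit and a router.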
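In a possible implementation, the target data takes the form of a variable-length or fixed-length data packet during transmission; the first processing unit is further configured to: after generating the first request, set a packet header of the target data and send the packet header and the first clock signal to the first router; and after the last bit of the target data has been sent, set and send a packet tail of the target data; the first router is further configured to: start receiving the target data after receiving the packet header corresponding to the target data; and after receiving the packet tail of the target data, change the data-receiving state of the first router from ready to not ready. In the embodiments of the present application, asynchronous transmission of the target data is achieved by setting its packet header and packet tail, so the clocks of the processing unit and the router do not need to be synchronized, and it is also easier for one router to integrate multiple heterogeneous processing units or IP cores.
In a possible implementation, each processing unit includes a storage area based on a first-in-first-out (FIFO) storage mechanism; the first processing unit is specifically configured to: generate the first request after writing the target data into the FIFO-based storage area. In the embodiments of the present application, the FIFO storage mechanism ensures that when multiple pieces of target data need to be sent, they are sent one after another in chronological order, so sending decisions are made faster and the transmission performance of the system is effectively improved.
In a possible implementation, the first processing unit and the first router are connected through an asynchronous message bus, where the asynchronous message bus includes a receive-ready signal line, a clock signal line, a message-valid signal line, and one or more data lines. In the embodiments of the present application, the asynchronous message bus thus includes four kinds of signal lines. The receive-ready signal line carries a ready signal indicating that the data-receiving state is ready; the clock signal line carries the first clock signal; the message-valid signal line carries the packet-header and packet-tail signals of the target data; and the one or more data lines carry the payload of the target data. These four kinds of signal lines greatly alleviate the prior-art problems of numerous and complex wires between processing units and routers, and reduce the chip area occupied by the asynchronous message bus as a whole. Optionally, depending on service requirements, the data lines may be multiple data lines supporting multiple channels.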
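In a possible implementation, each router includes multiple groups of ports, and each group includes a receiving port and a sending port, where each receiving port is used to receive data and each sending port is used to send data. In the embodiments of the present application, a router uses its configurable ports to configure the processing units or other routers connected to it. Port-configurable routers allow a flexibly reconfigurable networking architecture, such as point-to-point or multipoint-to-multipoint. In addition, each port inside the router is connected to a receiving unit or a sending unit for receiving and sending data.
In a possible implementation, each receiving port corresponds to one receiving unit, and each receiving unit includes a FIFO-based storage area; the first router is specifically configured to: based on the first clock signal, drive, through a target receiving port, the storage area in a first receiving unit to receive the target data sent by the first processing unit, where the target receiving port is the receiving port of the first router that is connected to the first processing unit. In the embodiments of the present application, the FIFO storage mechanism ensures that when multiple pieces of target data need to be sent, they are sent one after another in chronological order, so the sending unit makes decisions faster and the transmission performance of the system is effectively improved. In addition, the FIFO-based storage area is suited to synchronous-to-asynchronous adaptation: data may be written synchronously (for example, a processing unit synchronously writes target data into its sending unit) or read synchronously (for example, a sending unit in a router synchronously reads data from a receiving unit's storage area), and data may be read asynchronously (for example, the sending unit of a processing unit asynchronously sends target data to a router) or written asynchronously (for example, the first receiving unit in a router asynchronously writes data).
In a possible implementation, each sending port corresponds to one sending unit; the first router is specifically configured to: determine a target sending port in the first router according to the destination address, where the target sending port is the sending port of the first router that corresponds to the second processing unit; and send the target data to the second processing unit through a first sending unit corresponding to the target sending port. In the embodiments of the present application, the port-configurable router determines, according to the destination address, the target sending port corresponding to the first sending unit, and sends the target data to the second processing unit through that port. Moreover, in a communication connection formed by multiple routers, a port number does not need to be determined when the target data is received; the data can be sent to the second processing unit according to the destination address alone.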
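In a possible implementation, each router includes a mapping table; the mapping table includes a mapping relationship between the port identifier of each sending port in the router and the unit identifier of the corresponding processing unit or the routing identifier of another router, where the unit identifier uniquely identifies a processing unit and the routing identifier uniquely identifies a router; the first router is specifically configured to: determine the target sending port based on the mapping table in the first router according to the destination address. In the embodiments of the present application, a route-forwarding mechanism based on querying the mapping table simplifies the forwarding process and improves transmission efficiency.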
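The mapping-table lookup described above can be illustrated with a minimal sketch. It assumes the table is a plain dictionary from a destination identifier to a local sending port; the identifiers (`PE0`, `R1`, ...), the port names, and the function name `select_send_port` are hypothetical and do not come from the application.

```python
from typing import Dict

# Hypothetical mapping table for one router: destination identifier -> local
# sending port. Identifiers and port names are illustrative assumptions.
MAPPING_TABLE: Dict[str, str] = {
    "PE0": "A",  # processing unit attached directly to port A
    "PE1": "B",  # processing unit attached directly to port B
    "R1": "C",   # neighbouring router reached through port C
    "PE3": "C",  # processing unit behind router R1, so it shares port C
}


def select_send_port(table: Dict[str, str], destination: str) -> str:
    """Return the sending port configured for `destination`."""
    try:
        return table[destination]
    except KeyError:
        # No entry: the router has not been configured for this destination.
        raise LookupError(f"no sending port for {destination!r}") from None


if __name__ == "__main__":
    print(select_send_port(MAPPING_TABLE, "PE3"))
```

A single dictionary lookup per packet reflects the point made in the text: forwarding reduces to a table query on the destination address.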
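In a possible implementation, when the second processing unit is connected to a second router, and the first router and the second router are two different routers among the plurality of routers, the target sending port is the sending port of the first router with the fewest connection hops to the second router. When the embodiments of the present application are implemented, in a communication connection formed by multiple routers the maximum number of routing hops is the number of routers minus one, and a router can itself select, according to the destination address, the transmission path with the fewest hops to send the target data to the second processing unit.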
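As a rough illustration of choosing the sending port with the fewest hops, the sketch below runs a breadth-first search over a router adjacency list. The application does not say how hop counts are obtained, so the search, the topology, the port names, and the function names are all assumptions made for this example.

```python
from collections import deque
from typing import Dict, List, Tuple


def hop_counts(adjacency: Dict[str, List[str]], source: str) -> Dict[str, int]:
    """Breadth-first search: fewest hops from `source` to every reachable router."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def best_send_port(adjacency: Dict[str, List[str]], ports: Dict[str, str],
                   second_router: str) -> Tuple[str, int]:
    """Pick the local sending port whose neighbour is closest to `second_router`.

    `ports` maps each neighbouring router to the local sending port that
    reaches it (hypothetical names).
    """
    best = None
    for neighbour, port in ports.items():
        d = hop_counts(adjacency, neighbour).get(second_router)
        if d is None:
            continue  # second_router is unreachable through this neighbour
        total = d + 1  # one extra hop from the first router to the neighbour
        if best is None or total < best[1]:
            best = (port, total)
    if best is None:
        raise LookupError(f"{second_router!r} is unreachable")
    return best


# Closed (ring) topology of four routers, as seen from R0.
RING = {"R0": ["R1", "R3"], "R1": ["R0", "R2"],
        "R2": ["R1", "R3"], "R3": ["R2", "R0"]}
# Chain topology: the longest path has (number of routers - 1) hops.
CHAIN = {"R0": ["R1"], "R1": ["R0", "R2"], "R2": ["R1", "R3"], "R3": ["R2"]}
```

In the chain topology the farthest router is three hops away, which matches the bound of "number of routers minus one" stated in the text.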
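In a possible implementation, the first router is specifically configured to: when the first sending unit receives a second request sent by the first receiving unit, control the first sending unit to obtain the target data from the storage area of the first receiving unit, where the second request is used to request that the target data be sent through the first sending unit; and send the target data to the second processing unit through the first sending unit, via the target sending port, in serial single-bit transmission mode. In the embodiments of the present application, a simple sending unit based on shared data reuses the FIFO storage area at the receiving end, which reduces data movement and improves transmission efficiency.
In a possible implementation, each router includes a channel selector; the channel selector of the first router is used to connect the data path from the first receiving unit to the first sending unit, so that the first sending unit obtains the target data from the storage area of the first receiving unit. In the embodiments of the present application, when there is data to send, the channel selector can connect the data path between the receiving unit and the sending unit, so that the sending unit can reuse the FIFO storage area of the receiving unit through that data path, which reduces data movement and greatly improves the transmission performance of the router.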
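In a possible implementation, each router includes an arbiter, and each sending unit corresponds to one arbiter; the arbiter of the first router is used to: when m receiving units simultaneously request the first sending unit to send data, determine a target receiving unit from the m receiving units according to a preset arbitration rule, where m is greater than 1 and less than or equal to the total number of receiving units in the router. Optionally, the channel selector of the first router is further used to: after the arbiter determines the target receiving unit, connect the data path from the target receiving unit to the first sending unit, so that the first sending unit obtains data from the storage area of the target receiving unit and sends it. In the embodiments of the present application, the arbiter implements a "many-to-one" fair arbitration mechanism, which reduces conflicts when routing and forwarding data. To ensure that each sending unit works properly, arbiters and sending units are in one-to-one correspondence.
In a possible implementation, the arbiter includes a second asynchronous handshake circuit; the second asynchronous handshake circuit of the first router is used to: after determining that the data-sending state of the first sending unit is ready, determine a second clock signal based on the send-request signal that the target receiving unit sends to the first sending unit; the channel selector of the first router is specifically used to: connect, based on the second clock signal, the data path from the target receiving unit to the first sending unit. In the embodiments of the present application, the arbiter in the router implements the fair arbitration mechanism of the data transmission apparatus using a simple token-ring mechanism built on a handshake circuit such as a Click circuit. Moreover, by exploiting its timing dependency on the receiving units, the shared arbitration mechanism of the arbiter realizes a packet-based transmission mechanism with high performance. It can be understood that the arbiter in the router is an asynchronous arbiter.
In a possible implementation, the number of second asynchronous handshake circuits in each arbiter is one less than the number of receiving ports in the router. In the embodiments of the present application, every receiving port other than the one paired with the sending port must be able to send messages to that sending port, so the number of second asynchronous handshake circuits in the arbiter is one less than the number of receiving ports in the router.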
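The "many-to-one" fair arbitration can be pictured with a small software model of a token ring. This is only a behavioural sketch: the application describes an asynchronous circuit built from Click-style handshake elements, whereas the class below, its name `TokenRingArbiter`, and its round-robin token rule are assumptions chosen to show the fairness property.

```python
from typing import Optional, Set


class TokenRingArbiter:
    """Behavioural model of one arbiter guarding one sending unit.

    `n_receivers` is the number of receiving units that may request this
    sending unit (per the text, one less than the router's receiving ports).
    """

    def __init__(self, n_receivers: int) -> None:
        self.n = n_receivers
        self.token = 0  # index of the receiving unit that currently holds priority

    def grant(self, requests: Set[int]) -> Optional[int]:
        """Grant the first requester at or after the token, then pass the token on."""
        for offset in range(self.n):
            candidate = (self.token + offset) % self.n
            if candidate in requests:
                # Token moves past the winner so it has lowest priority next time.
                self.token = (candidate + 1) % self.n
                return candidate
        return None  # nobody is requesting


if __name__ == "__main__":
    arbiter = TokenRingArbiter(3)  # e.g. a 4-port router: 3 candidate receivers
    print([arbiter.grant({0, 2}) for _ in range(4)])
```

Because the token always moves past the most recent winner, two receiving units that request continuously are served alternately, so neither can starve the other.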
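In a second aspect, an embodiment of the present application provides a data transmission method applied to a data transmission apparatus, where the data transmission apparatus includes a plurality of processing units and a plurality of routers, each router is connected to one or more processing units, and each router forms a communication connection with any one of the plurality of routers; the plurality of routers includes a first router, and the first router is connected to a first processing unit. The method includes: generating, by the first processing unit, a first request, where the first request is used to request that target data be sent to a second processing unit, and the target data includes a destination address of the second processing unit; after the first processing unit determines that the data-receiving state of the first router is ready, determining a first clock signal based on the first request; sending, by the first processing unit, the target data to the first router based on the first clock signal, and sending the first clock signal to the first router; receiving, by the first router, the first clock signal; receiving, by the first router based on the first clock signal, the target data sent by the first processing unit; and sending, by the first router, the target data to the second processing unit according to the destination address.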
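In a possible implementation, the method further includes: after the target data has been received, changing, by the first router, the data-receiving state of the first router from ready to not ready; and after detecting that the data-receiving state of the first router has changed from ready to not ready, determining, by the first processing unit, that sending of the target data is complete.
In a possible implementation, each processing unit includes a first asynchronous handshake circuit; the first processing unit is specifically configured to: determine, through the first asynchronous handshake circuit, the first clock signal based on the first request after determining that the data-receiving state of the first router is ready.
In a possible implementation, each processing unit includes an asynchronous message transmitter; the first processing unit is specifically configured to: based on the first request, control the asynchronous message transmitter to send the target data to the first router in serial single-bit transmission mode based on the first clock signal.
In a possible implementation, the target data takes the form of a variable-length or fixed-length data packet during transmission; the method further includes: after generating the first request, setting, by the first processing unit, a packet header of the target data, and sending the packet header and the first clock signal to the first router; after the last bit of the target data has been sent, setting and sending a packet tail of the target data; starting, by the first router, to receive the target data after receiving the packet header corresponding to the target data; and after receiving the packet tail of the target data, changing the data-receiving state of the first router from ready to not ready.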
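In a possible implementation, each processing unit includes a FIFO-based storage area; the generating, by the first processing unit, of the first request includes: generating the first request after the first processing unit writes the target data into the FIFO-based storage area.
In a possible implementation, the first processing unit and the first router are connected through an asynchronous message bus, where the asynchronous message bus includes a receive-ready signal line, a clock signal line, a message-valid signal line, and one or more data lines.
In a possible implementation, each router includes multiple groups of ports, and each group includes a receiving port and a sending port, where each receiving port corresponds to one receiving unit and is used to receive data, and each sending port corresponds to one sending unit and is used to send data.
In a possible implementation, each receiving port corresponds to one receiving unit, and each receiving unit includes a FIFO-based storage area; the receiving, by the first router based on the first clock signal, of the target data sent by the first processing unit includes: based on the first clock signal, driving, through a target receiving port, the storage area in a first receiving unit to receive the target data sent by the first processing unit, where the target receiving port is the receiving port of the first router that is connected to the first processing unit.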
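In a possible implementation, each sending port corresponds to one sending unit; the sending, by the first router, of the target data to the second processing unit according to the destination address includes: determining a target sending port in the first router according to the destination address, where the target sending port is the sending port of the first router that corresponds to the second processing unit; and sending the target data to the second processing unit through a first sending unit corresponding to the target sending port.
In a possible implementation, each router includes a mapping table; the mapping table includes a mapping relationship between the port identifier of each sending port in the router and the unit identifier of the corresponding processing unit or the routing identifier of another router, where the unit identifier uniquely identifies a processing unit and the routing identifier uniquely identifies a router; the determining of the target sending port in the first router according to the destination address includes: determining the target sending port based on the mapping table in the first router according to the destination address.
In a possible implementation, when the second processing unit is connected to a second router, and the first router and the second router are two different routers among the plurality of routers, the target sending port is the sending port of the first router with the fewest connection hops to the second router.
In a possible implementation, the sending of the target data to the second processing unit through the first sending unit corresponding to the target sending port includes: when the first sending unit receives a second request sent by the first receiving unit, controlling the first sending unit to obtain the target data from the storage area of the first receiving unit, where the second request is used to request that the target data be sent through the first sending unit; and sending the target data to the second processing unit through the first sending unit, via the target sending port, in serial single-bit transmission mode.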
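In a possible implementation, each router includes a channel selector; the method further includes: connecting, by the channel selector of the first router, the data path from the first receiving unit to the first sending unit, so that the first sending unit obtains the target data from the storage area of the first receiving unit.
In a possible implementation, each router includes an arbiter, and each sending unit corresponds to one arbiter; the method further includes: when m receiving units simultaneously request the first sending unit to send data, determining, by the arbiter of the first router according to a preset arbitration rule, a target receiving unit from the m receiving units, where m is greater than 1 and less than or equal to the total number of receiving units in the router.
In a possible implementation, the method further includes: after the arbiter determines the target receiving unit, connecting, by the channel selector of the first router, the data path from the target receiving unit to the first sending unit, so that the first sending unit obtains data from the storage area of the target receiving unit and sends it.
In a possible implementation, the arbiter includes a second asynchronous handshake circuit; the connecting, by the channel selector of the first router after the arbiter determines the target receiving unit, of the data path from the target receiving unit to the first sending unit includes: after determining that the data-sending state of the first sending unit is ready, determining, by the second asynchronous handshake circuit of the first router, a second clock signal based on the send-request signal that the target receiving unit sends to the first sending unit; and connecting, by the channel selector of the first router based on the second clock signal, the data path from the target receiving unit to the first sending unit.
In a possible implementation, the number of second asynchronous handshake circuits in each arbiter is one less than the number of receiving ports in the router.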
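In a third aspect, an embodiment of the present application provides a computer-readable storage medium for storing computer software instructions used by the data transmission apparatus provided in the first aspect, including a program designed to carry out the above aspect.
In a fourth aspect, an embodiment of the present application provides a computer program product that includes instructions which, when the computer program is executed by a computer, enable the computer to perform the procedure performed by the data transmission apparatus in the first aspect.
In a fifth aspect, the present application provides a chip system that includes the apparatus provided in the first aspect or in any implementation of the first aspect. The chip system is used to implement the functions of the apparatus involved in the first aspect. In a possible design, the chip system further includes a memory for storing the program instructions and data necessary for the data transmission apparatus. The chip system may consist of chips, or may include chips and other discrete devices.
In a sixth aspect, an embodiment of the present application provides an electronic device that includes the apparatus provided in the first aspect or in any implementation of the first aspect. The electronic device is used to implement the functions involved in the first aspect.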
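In the embodiments of the present application, in the data transmission apparatus based on an asynchronous handshake mechanism, a processing unit (the first processing unit), after determining that the data-receiving state of a router (the first router) is ready, generates the first clock signal based on its request to send data, and sends the first clock signal and the target data to the router in accordance with the first clock signal, so that the router connected to the processing unit can receive the target data by means of the first clock signal; the router then sends the target data to the second processing unit according to the destination address carried in the received target data. This asynchronous-handshake transmission between the processing unit and the router ensures that the router receives the target data completely. In addition, the processing unit also sends the router the clock signal used when sending the target data (that is, the first clock signal), so that the router can receive the data according to that clock signal. This reduces clock constraints within the data transmission apparatus, makes it easier to integrate multiple heterogeneous processing units or IP cores, and frees the routers from the limits of a synchronous clock, so decisions are made faster and the transmission performance of the system is effectively improved. Furthermore, the data lines connecting a processing unit and its router are relatively short and of relatively fixed length, so when the processing unit needs to send data, the delay of the corresponding clock signal is small and relatively deterministic. At the same time, in the embodiments of the present application one router can be asynchronously connected to multiple processing units simultaneously, which greatly reduces the chip area occupied by the bus.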
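To describe the technical solutions in the embodiments of the present application or in the background more clearly, the accompanying drawings used in the embodiments or the background are described below.
Figure 1 is a schematic structural diagram of a data packet according to an embodiment of the present application.
Figure 2 is a schematic structural diagram of a data transmission apparatus according to an embodiment of the present application.
Figure 3 is a schematic structural diagram of another data transmission apparatus according to an embodiment of the present application.
Figure 4 is a schematic structural diagram of an asynchronous message transceiver according to an embodiment of the present application.
Figure 5 is a schematic circuit diagram of a Click element according to an embodiment of the present application.
Figure 6 is a schematic timing diagram of a Click element in its working mode according to an embodiment of the present application.
Figure 7 is a schematic structural diagram of a sending unit according to an embodiment of the present application.
Figure 8 is a schematic flowchart of asynchronous message sending according to an embodiment of the present application.
Figure 9 is a schematic structural diagram of a receiving unit according to an embodiment of the present application.
Figure 10 is a schematic structural diagram of a router according to an embodiment of the present application.
Figure 11 is a schematic structural diagram of a simple data transmission apparatus according to an embodiment of the present application.
Figure 12 is an implementation block diagram of a router according to an embodiment of the present application.
Figure 13 is a schematic flowchart of forwarding by a router according to an embodiment of the present application.
Figure 14 is a schematic flowchart of arbitration according to an embodiment of the present application.
Figure 15 is a schematic diagram of the internal circuit structure of an arbiter according to an embodiment of the present application.
Figure 16 is a schematic structural diagram of an extension of the sending unit of Figure 7 according to an embodiment of the present application.
Figure 17 is a diagram of the packet transmission effect of the sending unit shown in Figure 7 according to an embodiment of the present application.
Figure 18 is a diagram of the packet transmission effect of the sending unit shown in Figure 16 according to an embodiment of the present application.
Figure 19 is an implementation block diagram of an extended router corresponding to Figure 16 according to an embodiment of the present application.
Figure 20 is a schematic flowchart of a data transmission method according to an embodiment of the present application.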
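The embodiments of the present application are described below with reference to the accompanying drawings.
The terms "first", "second", and the like in the specification, claims, and drawings of the present application are used to distinguish different objects, not to describe a particular order. In addition, the terms "include" and "have" and any variants of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to the process, method, product, or device.
Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
It should be understood that in the present application "at least one (item)" means one or more, and "a plurality of" means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "a and/or b" can mean: only a, only b, or both a and b, where a and b may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects. "At least one of the following (items)" or a similar expression refers to any combination of these items, including any combination of a single item or plural items. For example, at least one of a, b, or c can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where each of a, b, and c may be single or multiple.
The terms "component", "module", "system", and the like used in this specification denote a computer-related entity, hardware, firmware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable file, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computing device and the computing device itself may be components. One or more components may reside within a process and/or thread of execution, and a component may be located on one computer and/or distributed between two or more computers. In addition, these components can execute from various computer-readable media that have various data structures stored on them. Components may communicate by way of local and/or remote processes, for example according to a signal having one or more data packets (for example, data from two components interacting with another component in a local system, in a distributed system, and/or across a network such as the internet, which interacts with other systems by way of the signal).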
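Existing data transmission can be divided into two modes: serial transmission and parallel transmission. In serial transmission, data is sent bit by bit over a single signal line. For example, one data line can be used to transmit data one bit at a time, with multiple bits sent one after another. In parallel transmission, data is split into blocks of a set bit width, and each block is sent at the same time over as many data lines as it has bits. That is, parallel transmission sends data over multiple signal lines, using multiple parallel data lines to transmit several bits at once.
The differences between serial transmission and parallel transmission are as follows:
1. Because of their transmission characteristics, the cables for parallel transmission take up much more space than the cables for serial transmission.
2. In parallel transmission, if the physical properties of the parallel lines are not identical (for example, slight differences in length or different cable materials), the bits sent over the parallel lines do not reach the receiver at the same time, and the receiver is prone to errors when receiving data.
3. Serial transmission can run at a higher frequency than parallel transmission.
4. The line cost of parallel transmission is several times that of serial transmission.
The communication modes that correspond to serial and parallel transmission are asynchronous communication and synchronous communication. In asynchronous communication, the time gap between the units of data being sent can be arbitrary. However, the receiving end must be ready to receive at all times, because the sending end may start sending a character at any moment; a marker must therefore be added at the start and end of each character, that is, a start bit and a stop bit, so that the receiving end can receive each character correctly. Synchronous communication is a bit-synchronous communication technique that requires the sender and receiver to have synchronous clock signals of the same frequency and phase; only a specific synchronization character needs to be added at the very front of the transmitted message to synchronize both sides, after which bits are sent and received one by one under the control of the synchronous clock. Compared with synchronous communication, the advantages of asynchronous communication are that the communication equipment is simple and inexpensive and, most importantly, that strict clock synchronization is not required.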
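As internet adoption and industry digitalization deepen, the number of IP cores integrated on chips has grown from dozens to more than a hundred. As a result, more interconnect and bandwidth are needed between IP cores to support ever more real-time data communication. Moreover, because current IP cores are all designed as synchronous circuits, which require strict clock synchronization, and because they keep growing in scale, the delay introduced by placing and routing IP cores within an SoC can no longer be ignored at high clock frequencies and wide data widths, so the design complexity of larger SoCs increases sharply. Therefore, to simplify chip design, reduce development investment and cycle time, and still guarantee the computing power of the chip, the on-chip architecture in the embodiments of the present application combines asynchronous communication with serial transmission and attaches as many processing units as possible to a single routing node, which simplifies chip design and reduces chip area.
First, in the data transmission apparatus of the embodiments of the present application, the transmitted data may take the form of data packets. A data packet may be of fixed or variable length. The embodiments of the present application use a variable-length data packet as an example, with the packet structure shown in Figure 1. Figure 1 is a schematic structural diagram of a data packet according to an embodiment of the present application. As shown in Figure 1, the fields of the data packet are defined as follows:
The first field is the destination address of the data packet, which determines the receiver of the packet (the second processing unit). The length of this field can be extended according to the actual size of the bus; for example, it may be a 3-bit identifier. In the embodiments of the present application it may be the communication address of the second processing unit.
The second field is the length field of the data packet, which indicates the length of the packet's payload. As shown in Figure 1, the payload uses 2 bits as its basic unit of length, and the length field expresses the payload as a multiple of 2 bits, that is, 2 * (length value).
The third field is the payload of the data packet, which holds the information the packet is to deliver. Its format can be agreed on according to design needs and is not specifically limited in the embodiments of the present application.
The fourth field is the transmission check bit of the data packet, which is used to check whether bit errors were introduced during transmission. The checking method can be chosen according to the actual scenario, for example parity checking; this is not specifically limited in the embodiments of the present application.
It should be noted that the embodiments of the present application describe transmission of this data packet in the data transmission apparatus only as an example and do not limit the specific form in which data is transmitted in this architecture. For example, data may also be transmitted as data frames, data blocks, or other forms.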
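The four-field packet layout can be made concrete with a short encoder and decoder. The 3-bit address and the 2-bit payload unit come from the text; the 4-bit width of the length field, the use of a single even-parity bit over all preceding bits, the MSB-first bit order, and every function name are assumptions made only for this sketch.

```python
from typing import List, Tuple

ADDR_BITS = 3  # example width given in the text
LEN_BITS = 4   # assumption: the text does not state the length-field width
UNIT = 2       # payload length is counted in 2-bit units


def to_bits(value: int, width: int) -> List[int]:
    """Most-significant-bit-first binary expansion (bit order is an assumption)."""
    return [(value >> i) & 1 for i in reversed(range(width))]


def from_bits(bits: List[int]) -> int:
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


def encode_packet(dest: int, payload: List[int]) -> List[int]:
    """Build [address | length | payload | parity] as a list of bits."""
    if len(payload) % UNIT:
        raise ValueError("payload must be a multiple of 2 bits")
    units = len(payload) // UNIT
    if not 0 <= dest < 2 ** ADDR_BITS or units >= 2 ** LEN_BITS:
        raise ValueError("address or length out of range")
    body = to_bits(dest, ADDR_BITS) + to_bits(units, LEN_BITS) + list(payload)
    return body + [sum(body) % 2]  # even parity over all preceding bits


def decode_packet(bits: List[int]) -> Tuple[int, List[int]]:
    """Return (destination, payload); raise ValueError on a malformed packet."""
    header = ADDR_BITS + LEN_BITS
    if len(bits) < header + 1:
        raise ValueError("packet too short")
    dest = from_bits(bits[:ADDR_BITS])
    end = header + from_bits(bits[ADDR_BITS:header]) * UNIT
    if len(bits) != end + 1:
        raise ValueError("length field does not match packet size")
    if sum(bits[:end]) % 2 != bits[end]:
        raise ValueError("parity check failed")
    return dest, bits[header:end]
```

For a 4-bit payload the length field holds 2, matching the "2 * (length value)" rule in the second field's description.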
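Next, taking the data packet shown in Figure 1 as the transmitted data, the data transmission apparatus of the embodiments of the present application, which combines asynchronous communication with serial transmission, is briefly introduced. Figure 2 is a schematic structural diagram of a data transmission apparatus according to an embodiment of the present application. In this diagram, circles represent processing units 01 and squares represent routers 02.
As shown in Figure 2, the apparatus includes a plurality of processing units 01 and a plurality of routers 02. Connection paths can be formed among the routers 02; each router 02 is connected to one or more processing units 01, and each processing unit 01 has one and only one router 02 connected to it. That is, each router 02 can connect to multiple processing units 01, but each processing unit 01 can connect to only one router 02. Moreover, each router 02 can form a communication connection with any one of the routers 02; that is, any one router 02 can exchange data with any other router 02. Consequently, a processing unit 01 connected to one router 02 can exchange data, through one or more routers 02, with a processing unit 01 connected to another router 02. For example, as shown in Figure 2, data in processing unit 101 can be transmitted to processing unit 103 through router 102 and router 104.
Optionally, the routers can be connected in a back-to-back cascade, which reduces the wires leaving each router, shortens the interconnect between routers, and shortens transmission delay. In a back-to-back cascade, the receiving port and sending port of two interconnected routers are connected directly by data lines, wires, or another data transmission medium. That is, the two interconnected routers are connected directly through the transmission medium rather than through a communication network. For example, the sending port of the sending router is connected directly to the receiving port of the receiving router.
Optionally, a router and a processing unit can also be connected in a back-to-back cascade.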
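The first processing unit in the data transmission apparatus is configured to: generate a first request, where the first request is used to request that target data be sent to a second processing unit, and the target data includes a destination address of the second processing unit; after determining that the data-receiving state of the first router is ready, determine a first clock signal based on the first request; and send the target data to the first router based on the first clock signal, and send the first clock signal to the first router.
The first router is configured to: receive the first clock signal; receive, based on the first clock signal, the target data sent by the first processing unit; and send the target data to the second processing unit according to the destination address.
In the embodiments of the present application, in the data transmission apparatus based on an asynchronous handshake mechanism, a processing unit (the first processing unit), after determining that the data-receiving state of a router (the first router) is ready, generates the first clock signal based on its request to send data, and sends the first clock signal and the target data to the router in accordance with the first clock signal, so that the router connected to the processing unit can receive the target data by means of the first clock signal; the router then sends the target data to the second processing unit according to the destination address carried in the received target data. This asynchronous-handshake transmission between the processing unit and the router ensures that the router receives the target data completely. In addition, the processing unit also sends the router the clock signal used when sending the target data (that is, the first clock signal), so that the router can receive the data according to that clock signal. This reduces clock constraints within the data transmission apparatus, makes it easier to integrate multiple heterogeneous processing units or IP cores, and frees the routers from the limits of a synchronous clock, so decisions are made faster and the transmission performance of the system is effectively improved. Furthermore, the data lines connecting a processing unit and its router are relatively short and of relatively fixed length, so when the processing unit needs to send data, the delay of the corresponding clock signal is small and relatively deterministic. At the same time, in the embodiments of the present application one router can be asynchronously connected to multiple processing units, which greatly reduces the chip area occupied by the bus.
It should be noted that the processing units of the data transmission apparatus may include intellectual property (IP) cores, microprocessors, central processing units (CPUs), digital signal processors (DSPs), graphics processing units (GPUs), neural-network processing units (NPUs), and other processing entities (PEs) capable of processing data.
It should also be noted that the structure of the data transmission apparatus provided in the embodiments of the present application is not limited to the closed structure shown in Figure 2 and may also be a non-closed structure. For example, Figure 3 is a schematic structural diagram of another data transmission apparatus according to an embodiment of the present application. As shown in Figure 3, the apparatus includes a plurality of processing units 01 and a plurality of routers 02; connection paths can be formed among the routers 02, each router 02 is connected to at least two processing units 01, and each processing unit 01 has one and only one router 02 connected to it. Unlike the apparatus shown in Figure 2, the routers in this apparatus are arranged in a chain, and the number of routing hops on the longest transmission path is the number of routers in the apparatus minus one. Accordingly, the embodiments of the present application do not specifically limit the connection structure of the data transmission apparatus.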
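Taking a processing unit sending data to a router as an example, the following briefly introduces exemplary electronic devices and related logic modules involved in the asynchronous serial transmission mode of the embodiments of the present application.
(1) Asynchronous message transceiver
In the embodiments of the present application, both the processing unit 01 and the router 02 may include an asynchronous transceiver, which sends and receives target data in asynchronous serial transmission mode.
The asynchronous transceiver includes a sending unit and a receiving unit. The sending unit includes an asynchronous handshake circuit and an asynchronous message transmitter; the asynchronous handshake circuit provides the asynchronous message transmitter with a self-timed clock signal (equivalent to the first clock signal in the present application), so that the asynchronous message transmitter sends the target data in serial single-bit transmission mode according to the self-timed clock signal.
The receiving unit is used to receive the target data.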
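Taking a processing unit sending data to a router as an example, Figure 4 is a schematic structural diagram of an asynchronous message transceiver according to an embodiment of the present application. As shown in Figure 4, the sending end includes the sending unit TX of the processing unit (which may also be called the asynchronous message sending unit, the second sending unit, and so on) and a message packet management unit; the receiving end includes the receiving unit RX of the router (which may also be called the asynchronous message receiving unit, the first receiving unit, and so on) and a message packet management unit. Specifically:
Message packet management unit (MSG) at the sending end: a FIFO-based message packet management mechanism that can drive the asynchronous message sending unit and the asynchronous message receiving unit to transfer data packets, realizing an asynchronous message packet transmission mechanism. FIFO means first in, first out: the target data that enters the message packet management unit first is sent first. In addition, this unit can synchronously receive the target data sent by the processing unit, including one or more of the data-sending indication bits (the start bit (msg_bn) and end bit (msg_end) of the data packet), the payload, and the synchronous clock. It can also send the processing unit a reception indication bit (data received) and feed back the data-sending status (success or failure).
Message packet management unit (MSG) at the receiving end: also a FIFO-based message packet management mechanism. In addition, this unit can send target data to the router or processing unit, including the data-sending indication bits (the start bit (msg_bn) and end bit (msg_end) of the data packet), the payload, and the asynchronous self-timed clock. It can also receive from the router or processing unit a reception indication bit (data received) and feedback on the data-sending status (success or failure).
Sending unit and receiving unit: these provide three things. (1) A serial single-bit transmission mechanism. Because routers and processing units form a back-to-back cascaded transmission architecture, the sending unit and receiving unit at different ends (for example, between a processing unit and a router, or between two routers) can send or receive data over one or more data lines (see the data being sent in Figure 4); for example, the serial single-bit transmission mechanism of the present application can be realized with a single data line. (2) Shorter data transmission delay. Because the minimum unit of transfer between the sending unit and the receiving unit can be a data packet, the embodiments of the present application replace an asynchronous per-bit handshake with a packet-based asynchronous handshake, which shortens transmission delay (for example, as with the send indication bits in Figure 4, the signal carried on the message-valid signal line marks the start bit (msg_bn) and end bit (msg_end) of the packet). (3) Simpler timing analysis for integration and interfacing. Because asynchronous timing is used between different ends (see the self-timed clock in Figure 4), and the different receiving units of one router do not need strict clock synchronization, timing analysis for integration and interfacing is further simplified and the design is easy to extend according to service requirements.
It should be noted that, for the specific implementation of the sending unit and the receiving unit, reference may be made to the descriptions in the sending-unit and receiving-unit apparatus embodiments and the method embodiments below; details are not given here.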
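a. Sending unit
The sending unit includes an asynchronous handshake circuit and an asynchronous message transmitter. The asynchronous handshake circuit provides the asynchronous message transmitter with a self-timed clock, and the asynchronous message transmitter sends the target data to the router in serial single-bit transmission mode according to the self-timed clock signal.
Asynchronous handshake circuit
First, the asynchronous handshake circuit involved in the embodiments of the present application is briefly introduced. This asynchronous handshake circuit may be called a click element, referred to below as a Click element, and it can provide the asynchronous message transmitter with a self-timed clock. Because the Click element is simple in design, it greatly reduces the design complexity of converting a synchronous circuit into an asynchronous one. The Click element generates the self-timed clock through a self-loop to drive the asynchronous message transmitter to send serial data continuously; the delay of the self-loop is determined by the maximum delay from the sending unit to the receiving unit.
Figure 5 is a schematic circuit diagram of a Click element according to an embodiment of the present application. The Click element includes two AND gates, one OR gate, and a phase register.
An AND gate is the basic logic gate that performs the AND operation. It has multiple inputs and one output. The output is high (logic 1) only when all inputs are high (logic 1) at the same time; otherwise the output is low (logic 0).
An OR gate is a circuit that implements logical addition, also called a logical-sum circuit. It has two or more inputs and one output. The output is high (logic 1) as long as one or more inputs are high (logic 1), and it is low (logic 0) only when all inputs are low (logic 0).
The phase register is used in the embodiments of the present application to flip the level of the signal corresponding to B.ack; that is, after the level of B.ack changes, it flips the changed level back. For example, if B.ack changes from low to high, the phase register can change it from high back to low.
That is, the final output of the Click element is Fire = -A.req*A.ack*B.ack + A.req*-A.ack*-B.ack, where -A.req is the inverse of A.req (for example, when A.req is high, -A.req is low). Likewise, -A.ack and -B.ack are the inverses of A.ack and B.ack, respectively. In general, a high level is 1 and a low level is 0.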
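The Fire expression can be checked by enumerating its truth table. The sketch below models only the combinational part given by the formula, Fire = -A.req*A.ack*B.ack + A.req*-A.ack*-B.ack; it does not model the phase register or any timing, and the function name `fire` is an assumption.

```python
from itertools import product


def fire(a_req: bool, a_ack: bool, b_ack: bool) -> bool:
    """Combinational Fire output: two AND terms joined by one OR gate."""
    term_low_req = (not a_req) and a_ack and b_ack          # -A.req * A.ack * B.ack
    term_high_req = a_req and (not a_ack) and (not b_ack)   # A.req * -A.ack * -B.ack
    return bool(term_low_req or term_high_req)


if __name__ == "__main__":
    # Print every input combination for which Fire is asserted.
    for a_req, a_ack, b_ack in product([False, True], repeat=3):
        if fire(a_req, a_ack, b_ack):
            print(f"A.req={int(a_req)} A.ack={int(a_ack)} B.ack={int(b_ack)}")
```

Fire is asserted only when A.req differs from both A.ack and B.ack while those two agree, so a pulse can be produced for either value of the request, provided the acknowledge signals are in the matching state.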
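The Click element handshake circuit includes forward handshake signal lines, backward handshake signal lines, and a self-timed clock signal line:
1. The forward handshake signal lines are a request line and an acknowledge line, A.req and A.ack in Figure 5;
2. The backward handshake signal lines are a request line and an acknowledge line, B.req and B.ack in Figure 5;
3. The self-timed clock signal line is Fire in Figure 5.
The self-timed clock signal line Fire can drive a FIFO-based data storage device (such as a register, a serial FIFO memory, or a FIFO queue) to output data according to the self-timed clock signal Fire.
Figure 6 is a schematic timing diagram of a Click element in its working mode according to an embodiment of the present application. For the working mode of the Click element circuit shown in Figure 6: the signal in_req is the signal on the A.req line in Figure 5, in_ack is the signal on the B.ack line in Figure 5, out_req is the signal on the B.req line in Figure 5, and out_ack is the signal on the A.ack line in Figure 5.
That is, Fire = -in_req*out_ack*in_ack + in_req*-out_ack*-in_ack.
The forward handshake signal line A.req and the backward handshake signal line B.ack are the two input signal lines of the Click element, and the forward handshake signal line A.ack and the backward handshake signal line B.req are its two output signal lines. In the embodiments of the present application, when a request to send the target data is received and the input signal in_req changes from low to high (a rising edge), the self-timed clock Fire of the Click element is triggered and the data is sent.
In addition, the working timing shown in Figure 6 shows that the Click element implements a 4-phase handshake protocol, in which both the rising edge and the falling edge of the request can generate the self-timed clock, that is, Fire in Figure 6.
It should be noted that the embodiments of the present application may also use other circuits to provide the self-timed clock; this is not specifically limited.
It should also be noted that, for the specific application of the Click element in the embodiments of the present application, reference may be made to the description of the apparatus embodiments below; details are not given here.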
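Next, taking the processing unit in Figure 4 as the sending end, the implementation of the sending unit on the processing-unit side when the processing unit sends target data to the router is described. Figure 7 is a schematic structural diagram of a sending unit according to an embodiment of the present application. As shown in Figure 7, the sending unit may include a Click element and an asynchronous message transmitter, and may also include an asynchronous message bus. Specifically:
The asynchronous message bus is the set of data lines connecting the receiving unit and the sending unit. On the sending-unit side it includes four kinds of signal lines: a receive-ready signal line, a self-timed clock signal line, a message-valid signal line, and a data line (the signal line that carries data). The receive-ready signal line carries the indication bit, the self-timed clock signal line carries the self-timed clock, the message-valid signal line carries the packet header and packet tail, and the data line carries the payload.
The Click element (equivalent to the first asynchronous handshake circuit in the present application) generates the self-timed clock through a self-loop to drive the asynchronous message transmitter to send serial data continuously, where the delay of the self-loop in the Click element is determined by the maximum delay from the sending unit on the processing-unit side to the receiving unit on the router side. It should be noted that the maximum delay can be determined by physical quantities that affect data transmission time, such as the length and material of the data lines connecting the sending unit on the processing-unit side and the receiving unit on the router side. It should also be noted that, for the way the Click element generates the self-timed clock, reference may be made to the description of the embodiments related to Figure 5 and Figure 6 above; details are not repeated here.
The asynchronous message transmitter, driven by the self-timed clock provided by the Click element, reads the serial FIFO (a FIFO-based storage medium) and the valid bit of the data packet, outputs data to the data line and the message-valid line of the asynchronous message bus, and, after a certain delay, also outputs the self-timed clock to the self-timed clock line of the asynchronous message bus. As shown in Figure 7, the asynchronous message transmitter includes an asynchronous message sending process, a message length len, an asynchronous serial FIFO, a packet encapsulation module M, and the payload D. Figure 8 is a schematic flowchart of asynchronous message sending according to an embodiment of the present application. As shown in Figure 8, the asynchronous message sending process starts sending a message packet and checks whether the message length len is greater than 0; if so, it sets the send request signal A.req, sets the message-valid bit to valid, waits for sending to complete and sets A.ack, shifts one bit out of the asynchronous serial FIFO, and decreases len by 1. It repeats this loop until len is 0, then marks the message as sent and sets the message-valid bit to invalid. Here len is used to count the packet length of the target data, and the payload D is the payload of the target data.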
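Taking the sending unit sending target data to the router as an example, the asynchronous message transmitter in the sending unit on the processing-unit side sends data in the following steps:
1. Write the message to send. The processing unit first writes the data packet to be sent from the data interface into the asynchronous serial FIFO in the asynchronous message transmitter.
2. Start sending. The processing unit notifies the asynchronous message sending process in its asynchronous message sending unit to start data sending.
3. Set the packet header and wait for reception. The asynchronous message sending unit sets the packet header in the packet encapsulation module M and waits for the receive-ready signal of the receiving end of the routing node to become valid.
4. Request to send. Once receive-ready becomes valid, a send request is issued to the Click unit in the asynchronous message sending unit.
5. Send a bit. The Click unit triggers the clock and sends the packet header, the first data bit, and the corresponding clock pulse to the asynchronous message bus.
6. Current bit sent. Based on a preset delay circuit, the Click unit feeds back to itself whether the bit has been sent. It should be noted that the delay set by this delay circuit is determined by the distance the data travels from the sending unit to the receiving unit.
7. Next bit. Once send completion is detected, the Click unit notifies the asynchronous message sending unit that the next bit can be sent. If the message has not ended, steps 4 to 7 are repeated until the last bit of the message.
8. Set the packet tail. After the last bit is detected, the asynchronous message sending unit sets the packet tail in the packet encapsulation module M and repeats steps 4 to 7.
9. Reception complete. After detecting the packet tail, the receiving end of the routing node sets the receive-ready signal to invalid, indicating that the message has been received.
10. Sending complete. After detecting the reception-complete signal, the sending unit of the processing unit notifies its local processing unit that sending is complete.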
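The sending steps above can be sketched in software as follows. This is a simplified behavioral model, not the circuit: each list element stands for one self-timed clock pulse, the packet header is modeled as the valid bit going high with the first data bit, and the packet tail as one pulse with the valid bit low. These encodings and the name `send_packet` are assumptions made for illustration only.

```python
from collections import deque


def send_packet(bits, receiver_ready=True):
    """Return one (msg_valid, data_bit) sample per self-timed clock pulse."""
    if not receiver_ready:
        return []                    # step 3: still waiting for receive-ready
    fifo = deque(bits)               # step 1: packet written to the serial FIFO
    remaining = len(fifo)            # message length len
    pulses = []
    while remaining > 0:             # steps 4 to 7: one bit per clock pulse
        pulses.append((1, fifo.popleft()))
        remaining -= 1
    pulses.append((0, 0))            # step 8: tail, valid bit set to invalid
    return pulses
```

For example, `send_packet([1, 0, 1])` yields three pulses with the valid bit high followed by one tail pulse with the valid bit low.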
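b. Receiving unit
In addition, refer to FIG. 9, which is a schematic structural diagram of a receiving unit according to an embodiment of this application. As shown in FIG. 9, the receiving unit may consist of two parts: an asynchronous message bus and an asynchronous message receiver.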
其中,异步消息总线是指连接接收单元和发送单元的连接数据线,其中,接收单元侧的包括四根信号线(与发送单元侧连接),包含接收就绪信号线,自时序时钟信号线,消息有效位信号线,数据线(传输数据的信号线)。接收就绪信号线用于传输指示位,自时序时钟信号线用于传输自时序时钟,消息有效位信号线用于传输数据包的包头和包尾,数据线用于传输有效数据。
异步消息接收器包括异步消息接收处理流程和异步串行fifo。异步消息接收处理流程可以监管数据接收流程并将数据包保存在异步串行fifo中。由于发送单元将自时序时钟通过异步消息总线传递过来,所以接收单元可直接利用这个时钟信号来接收发送单元传输过来的数据。接收单元需要等待消息发送的完成信号或事件,即可通知下游的处理单元或路由器等读取接收到的数据包。
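Taking the router in FIG. 4 as the receiving end, the following describes the receiving unit on the router side when the sending unit sends target data to the router. As shown in FIG. 9, the receiving unit receives data in the following steps:
1. Receive ready. On receiving a request from the asynchronous message sending unit to send data, the local end (the receiving unit in the router) confirms that it can accept a new data packet and sets the receive-ready signal.
2. Packet header detected. When the packet header signal is detected, data reception and packet-length counting start.
3. Receive data. The self-timed clock of the peer end (the sending unit in the processing unit) drives the local FIFO to receive data and update the packet-length count.
4. Packet tail detected. When the packet tail signal is detected, the message is confirmed as fully received.
5. Reception complete. The local end sets the reception-complete signal to notify the peer that the message has been received; that is, it sets the receive-ready signal to invalid.
6. Message ready. The local processing unit (the processing unit that receives the target data) is notified that a message is ready.
7. Read the received message. The local processing unit reads the received data packet through the data interface.
8. Read complete. After reading the data packet, the local processing unit sets read-complete; on detecting this signal, the local interface unit returns to step 1 to prepare for the next data packet.
It should be noted that steps 6 to 8 apply when two processing units connected to the same router transfer the target data and there is no sending conflict.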
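The receiving steps can be sketched in the same spirit. In this illustrative model each input element is one (msg_valid, data_bit) sample latched on a pulse of the sender's self-timed clock; the header is taken to be the first sample with the valid bit high and the tail the first later sample with the valid bit low. The encoding and the name `receive_packet` are assumptions, not part of the source.

```python
def receive_packet(pulses):
    """Collect data bits between header and tail; return (bits, length) or None."""
    fifo = []
    receiving = False
    for msg_valid, data_bit in pulses:
        if msg_valid and not receiving:
            receiving = True              # step 2: header detected
        if receiving and msg_valid:
            fifo.append(data_bit)         # step 3: clock drives the FIFO write
        elif receiving and not msg_valid:
            return fifo, len(fifo)        # steps 4 and 5: tail, reception complete
    return None                           # tail not seen yet, packet incomplete
```

Returning `None` when no tail arrives mirrors the text: reception is only reported as complete once the packet tail has been detected.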
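In this embodiment, the minimum transmission unit between the sending unit and the receiving unit is a data packet. That is, a packet-based asynchronous handshake mechanism replaces the asynchronous per-bit handshake mechanism, which shortens the transmission delay (for example, as shown in FIG. 4, an indication bit is sent to mark the start bit (msg_bn) and the end bit (msg_end) of the packet). In addition, the two ends use asynchronous timing (the self-timed clock in FIG. 4), so strict clock synchronization is not required; this simplifies timing analysis for integration and interconnection and makes it easy to extend the design according to service requirements.
In summary, in this embodiment both the router and the processing unit include a receiving unit and a sending unit, whose structure and functions are as described in the foregoing embodiments. For example, between the receiving unit of the first router and the sending unit of the first processing unit: 1. an asynchronous serial single-bit transmission mechanism can be implemented; 2. the data transmission delay is shortened; 3. timing analysis for integration and interconnection is simplified.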
(二)路由器
路由器配置有多组路由端口,每组所述路由端口包括接收端口和发送端口,其中,每组所述路由端口接收端口还连接一个接收单元,每组所述路由端口发送端口连接一个发送单元。
其中,路由器包含有异步收发单元(即:与各个端口连接的接收单元或发送单元),映射表,路由仲裁和通道选择器。请参考附图10,图10是本申请实施例提供的一种路由器的结构示意图。如图10所示:
异步收发单元:从路由端口上接收和发送消息包,路由器中异步收发单元也包括接收单元和发送单元(即,RX和TX)。
如上述图10所示:异步收发单元包括多个接收单元和发送单元。每个接收单元和每个发送单元都对应了一个端口。例如:接收单元RX0对应了接收端口A,接收单元RX1对应了接收端口B,接收单元RX2对应了接收端口C,接收单元RX3对应了接收端口D;发送单元TX0对应了发送端口A,发送单元TX1对应了发送端口B,发送单元TX2对应了发送端口C,发送单元TX3对应了发送端口D。
需要说明的是,路由器中的接收单元和发送单元与上述实施例(一)中异步收发器的传输机制相同,可以实现1、串行单比特传输机制。2、缩短数据传输时延。3、简化集成和对接的时序分析等,本申请实施例在此不再赘述。
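Mapping table: records the connections between each sending port of this router and a processing unit or another router, so that the router can configure the mapping table according to the topology of the SoC and look up the sending port for a message packet. The mapping table includes a destination port number and the sending port number of this router. The destination port number includes the unit identifier of the processing unit connected to the port, the routing identifier of a router, or the communication code or communication address corresponding to the port (for example, the destination address contained in the data packet). The unit identifier uniquely identifies a processing unit, and the routing identifier uniquely identifies a router.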
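Refer to FIG. 11, which is a schematic structural diagram of a simple data transmission apparatus according to an embodiment of this application. The apparatus includes two routers and six processing units, with each router connected to three processing units. For the connections shown in FIG. 11, the mapping table of each router is as follows:
Table 1: Mapping table of router 1
Destination port number | Sending port number of this router |
---|---|
1 - processing unit 1 | 1 |
2 - processing unit 2 | 2 |
3 - processing unit 3 | 3 |
Other | 4 |
Table 2: Mapping table of router 2
Destination port number | Sending port number of this router |
---|---|
4 - processing unit 4 | 1 |
5 - processing unit 5 | 3 |
6 - processing unit 6 | 4 |
Other | 2 |
It should be noted that each router stores only the mapping table for its own local ports.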
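The lookup implied by Table 1 and Table 2 can be sketched as a dictionary with a default entry for the "Other" row. The table contents come from the text; the names `ROUTER1_MAP`, `ROUTER2_MAP`, and `lookup_port` are hypothetical, and keying the table by processing-unit number is an assumption made for illustration.

```python
# Destination processing unit -> local sending port, as listed in Tables 1 and 2.
ROUTER1_MAP = {1: 1, 2: 2, 3: 3}
ROUTER1_DEFAULT = 4          # "Other" row: forward toward the neighboring router
ROUTER2_MAP = {4: 1, 5: 3, 6: 4}
ROUTER2_DEFAULT = 2          # "Other" row of router 2


def lookup_port(table, default_port, dest_unit):
    """Return the local sending port for a destination, or the default port."""
    return table.get(dest_unit, default_port)
```

With these tables, a packet for processing unit 5 that arrives at router 1 leaves on port 4, and router 2 then delivers it on port 3.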
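Arbiter: each sending port corresponds to one arbiter. When an arbitration condition is met (for example, multiple receiving ports send messages to one sending port at the same time), the arbiter handles the send request of only one receiving port at a time and ensures that every receiving port gets a fair opportunity to send;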
通道选择器:连通或断开接收端口到发送端口的数据通道。例如:在满足仲裁条件的情况下,根据仲裁器的仲裁结果,连通或断开接收端口到发送端口的数据通道。又例如:在数据传输完毕的情况下,断开接收端口到发送端口的数据通道。
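Based on the router structure in FIG. 10, the following describes the forwarding process of the router, taking port A sending a message packet to port D as an example. Refer to FIG. 12 and FIG. 13. FIG. 12 is an implementation block diagram of a router according to an embodiment of this application, and FIG. 13 is a schematic flowchart of router forwarding according to an embodiment of this application. As shown in FIG. 12, port A, port B, and port C can each send a data request to the arbiter of port D. The arbiter selects port A, port B, or port C to send data to port D. After obtaining the port selected by the arbiter, the channel selector opens the data channel between that port and port D. Port D then receives from the selected port 1. the send request, 2. the message valid bit (that is, the packet header or packet tail), and 3. the data bits, and 4. feeds back to it to continue sending the next data bit.
Taking port A sending data to port D as an example, as shown in FIG. 13:
1. After the receiving unit of port A of the router receives a message packet, it extracts the code of the destination processing unit and uses that code to look up the corresponding port, for example port D.
2. The receiving unit of port A requests the arbiter of port D to send the message packet.
3. After confirming that port D is ready to send, the arbiter starts arbitration; if there is no conflict, it directly grants port A permission to send the message packet.
4. The arbiter sends the channel selection unit a signal to gate port A, which opens the data channel between port A and port D.
5. After the data channel is opened, the send-ready signal of port D can reach port A.
6. After detecting the send-ready signal of port D, port A starts sending the message packet.
7. After the message has been sent, the arbiter is notified that sending is complete.
8. Receiving port A also releases its request at the same time.
9. Based on the send-complete signal of port D and the release-request signal of port A, the arbiter notifies the channel selection unit to release the data channel between port A and port D. This completes one full sending process.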
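Arbiter in the router
Because the arbiter in this application has no arbitration protection window based on a synchronous clock cycle, existing synchronous arbiter mechanisms cannot be fully reused. A real-time arbiter mechanism based on event arrival time is therefore needed, one that exploits the real-time advantage of asynchronous operation while still arbitrating fairly.
The following uses a conflict scenario in which three receiving ports send to one sending port at the same time to illustrate how the arbiter of that sending port works. Refer to FIG. 14, which is a schematic flowchart of arbitration according to an embodiment of this application. As shown in FIG. 14, the steps are as follows:
1. If the arbiter determines that multiple of the preset arbitration conditions are met at the same time, it starts a new round of arbitration; otherwise it waits for a state change. The preset arbitration conditions are as follows:
1) The asynchronous message receiver of at least one receiving port requests to send target data to the target sending port.
2) This sending port is in the ready state.
3) This sending port is not in the complete state.
4) This sending port is not gated.
2. Determine whether receiving port A needs to be gated: if port A has a request, gate receiving port A; otherwise, go to step 3;
3. Determine whether receiving port B needs to be gated: if port B has a request, gate receiving port B; otherwise, go to step 4;
4. Determine whether receiving port C needs to be gated: if port C has a request, gate receiving port C; otherwise, return to step 1;
5. After receiving port A is gated, wait for receiving port A to finish sending; if it has not finished, keep waiting; otherwise release the gating signal of receiving port A and go to step 3;
6. After receiving port B is gated, wait for receiving port B to finish sending; if it has not finished, keep waiting; otherwise release the gating signal of receiving port B and go to step 4;
7. After receiving port C is gated, wait for receiving port C to finish sending; if it has not finished, keep waiting; otherwise release the gating signal of receiving port C and go to step 1.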
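The round-robin order produced by steps 2 to 7 can be sketched as a simple loop. This is a behavioral sketch only: it assumes the sending port is always ready (conditions 2 to 4 hold whenever a request is pending), treats each grant as one complete packet transfer, and uses the hypothetical name `arbitrate`; it does not model the Click-based token-ring circuit.

```python
def arbitrate(pending, order=("A", "B", "C")):
    """Return the grant sequence for queued packets per receiving port.

    pending maps a port name to the number of packets it wants to send.
    The token visits the ports in a fixed order, so each port is served
    at most once per round before any port is served a second time.
    """
    pending = dict(pending)              # do not mutate the caller's dict
    grants = []
    while any(pending.get(port, 0) > 0 for port in order):   # step 1
        for port in order:               # steps 2 to 4: check A, then B, then C
            if pending.get(port, 0) > 0:
                grants.append(port)      # gate the port, wait for completion
                pending[port] -= 1       # steps 5 to 7: release and move on
    return grants
```

For instance, with two packets queued on port A and one each on ports B and C, the grants come out as A, B, C, A, so port A cannot send its second packet before B and C have each had a turn.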
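Refer to FIG. 15, which is a schematic diagram of the internal circuit structure of an arbiter according to an embodiment of this application. FIG. 15 shows an implementation of the asynchronous arbiter: a round-robin arbitration circuit based on multiple Click circuits (see the embodiment of FIG. 5). The Click circuits implement a token-ring mechanism in which a decision can be made only by the holder of the token, which ensures that every port gets one decision opportunity under any time sequence and thereby achieves round-robin arbitration. For the working mode of the Click circuit, refer to the description of the Click circuit in the foregoing apparatus embodiment; details are not repeated here. It should be noted that the number of Click circuits corresponds to the number of receiving ports in the router.
In FIG. 15, $R_A$, $R_B$, and $R_C$ denote the signals of receiving port A, receiving port B, and receiving port C; $S_A$, $S_B$, and $S_C$ denote the connection paths from receiving port A, receiving port B, and receiving port C to sending port D; $T_A$ is the state-switching indication, $T_R$ is the ready state of sending port D, and $T_C$ is the send-complete state of sending port D. The ClickA, ClickB, and ClickC circuits are three asynchronous handshake circuits (equivalent to the second asynchronous handshake circuit in this application), and & denotes an AND gate. Items ① to ⑦ in FIG. 15 correspond to steps ① to ⑦ of the flow in FIG. 14.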
可以理解的是,本申请实施例对仲裁器的实现方案并不做具体的限定。
基于上述仲裁器内部电路结构,对仲裁器内真值表进行分析。请参考下表3-表5。
表3:三个发送请求同时到达的情况
表4:只有一个发送信号的情况
表5:有两个发送请求同时到达的情况
其中,上述表3至表5的真值表中1为逻辑1,代表真、就绪状态,0为逻辑0,代表假、未就绪状态。发送请求下的PortA、PortB和PortC分别代表接收端口A、接收端口B和接收端口C的发送请求;通道信号下的PortA、PortB和PortC分别代表发送端口D与接收端口A、接收端口B和接收端口C之间的通道信号;发送就绪是指发送端口D是否可以完成发送任务;发送完成是指发送端口D是否完成一个发送任务。例如:发送请求在自时序时钟(相当于本申请实施例中的第二时钟信号)的T0时刻PortA为1是指接收端口A在自时序时钟的T0时刻有发送请求需要被发送。
上述表3(场景一)是三个发送请求同时到达的情况,要实现先发送端口A,再发送端口B,最后发送端口C;表4(场景二)是只有一个发送信号的情况,要实现最短的仲裁周期(如六次Click握手周期);表5(场景三)是有两个发送请求同时到达的情况,也要实现公平调度。
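In this embodiment, the arbiter in the router uses a simple token-ring mechanism built on handshake circuits such as Click circuits to implement fair arbitration in the data transmission apparatus. Moreover, by using a common arbitration mechanism that relies on the timing dependency with the receiving unit, the arbiter implements a packet-based transmission mechanism with high performance.
Based on the asynchronous message bus architecture, this embodiment uses port-reusable routers and router cascading, so that a router can be connected to multiple processing units or routers. Wiring is simple and short, the routing algorithm is simple, the maximum number of hops is the number of asynchronous routers minus one, and the delay is small and relatively deterministic, which greatly reduces the chip area occupied by the bus. In addition, the asynchronous transceiver units in the router and the asynchronous transceivers in the processor reduce clock constraints and make it easier to integrate multiple heterogeneous processing units or IP cores. The router uses an asynchronous arbiter, which is not limited by a synchronous clock and decides faster, effectively improving the forwarding performance of the system.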
(三)扩展后的接收单元和发送单元。
基于上述图7和图9所示的结构示意图,连接接收单元和发送单元的连接数据线包括四根信号线,包含接收就绪信号线,自时序时钟信号线,消息有效位信号线,数据线(传输数据的信号线)。其中,该数据线用于以串行传输的方式传输有效数据,当需要传输大数据块或向量等要求高速传输的数据时,该接收单元和发送单元之间的传输速度较慢。
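Because the implementation in this embodiment is based on asynchronous message transceivers (FIG. 7 and FIG. 9) and uses a bundled-data transmission scheme, the number of data-line channels can be extended easily while the control lines and control logic of the receiving unit and sending unit are fully reused, so the data lines can be extended as needed to support large data transfers. Therefore, to speed up the transfer of large data blocks or of data with high transmission-speed requirements, the number of data lines between the receiving unit and the sending unit can be increased to implement multi-channel serial transmission.
Refer to FIG. 16, which is a schematic structural diagram of the sending unit of FIG. 7 after extension according to an embodiment of this application. As shown in FIG. 16, compared with the message bus in FIG. 7, a new data channel is added: a new data line is added to the asynchronous data bus, and a new valid-data transmission module D and an asynchronous serial FIFO-2 are added to the asynchronous message transmitter.
Refer to FIG. 17 and FIG. 18. FIG. 17 shows the packet transmission effect of the sending unit in FIG. 7, and FIG. 18 shows the packet transmission effect of the extended sending unit in FIG. 16, according to embodiments of this application. As shown in FIG. 17, when there is only one data line, the data packet is transmitted serially, one bit at a time, over one data channel; D0, D1, D2, and so on are the units of data in the packet, and each unit may be 1 bit. As shown in FIG. 18, when there are multiple data lines (two in this example), the data packet is transmitted serially, one bit at a time, over multiple data channels; D0, D1, D2, and so on are the units of data in the corresponding channel, and each unit may be 1 bit. In this case, multiple data streams can be transmitted at the same time under the same asynchronous clock. It should be noted that when a large data packet is sent as parallel data, an algorithm such as odd-even splitting can convert the parallel data into serial data on multiple channels (for example, odd-numbered bits are transmitted and stored on the data line and serial FIFO of channel 1, and even-numbered bits on the data line and serial FIFO of channel 2).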
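The odd-even split mentioned above can be sketched for the two-channel case as follows. Bit positions are counted from 1, so the first, third, fifth bits go to channel 1 and the second, fourth bits go to channel 2; that counting convention and the names `split_lanes` and `merge_lanes` are assumptions for illustration.

```python
def split_lanes(bits):
    """Split a bit sequence into (channel 1, channel 2) by odd/even position."""
    return bits[0::2], bits[1::2]


def merge_lanes(lane1, lane2):
    """Interleave the two channels back into the original bit order."""
    merged = []
    for i in range(len(lane1)):
        merged.append(lane1[i])
        if i < len(lane2):            # lane2 is one shorter for odd lengths
            merged.append(lane2[i])
    return merged
```

Because both channels advance on the same self-timed clock pulse, a packet of n bits needs about n/2 pulses on two data lines instead of n pulses on one.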
需要说明的是,在发送单元的基础上接收单元可以对应增加串行fifo以保存接收到的数据。
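Refer to FIG. 19, which is an implementation block diagram of the extended router corresponding to FIG. 16 according to an embodiment of this application. As shown in FIG. 19, once the receiving unit and sending unit have been extended, the router only needs its channel selector extended to match the asynchronous data transceiver units, so that the channel selector supports multi-channel data transmission.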
需要说明的是,图19只是以扩展数据通道为例示例性的说明,具体的实现方式可以根据业务需求进行定制,本申请实施例对此不做具体的限制。
还需要说明的是,上述图4-上述图19只是本申请实施例示例性的说明,其具体的实现方式本申请实施例并不限制,而且本申请实施例所提及的装置可以是一个控制装置或者一个处理模块等用于对数据传输装置内的数据进行传输,本申请对装置的具体形式不做具体的限定。
还需要说明的是,上述图4-上述图19所述多个单元的划分仅是一种根据功能进行的逻辑划分,不作为对数据传输装置内具体的结构的限定。在具体的实现中,其中部分功能模块可能被细分为更多细小的功能模块,部分功能模块也可能组合成一个功能模块,但无论这些功能模块是进行了细分还是组合,数据传输装置在对数据传输的过程中所执行的大致流程是相同的。通常,每个单元都对应有各自的程序代码(或者说程序指令),这些单元各自对应的程序代码在相关硬件装置上运行时,使得该单元执行相应的流程从而实现相应功能。另外,每个单元的功能还可以通过相关的硬件实现。
基于上述装置实施例提供的相关装置,结合本申请中提供的数据传输方法,对本申请中提出的技术问题进行具体分析和解决。
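Refer to FIG. 20, which is a schematic flowchart of a data transmission method according to an embodiment of this application. The method can be applied to the data transmission architecture in FIG. 2 or FIG. 3. The processing unit can be used to support and perform steps S301 to S304 of the method flow shown in FIG. 20, and the router can be used to support and perform steps S305 to S308 of that flow. The following uses the first processing unit sending target data to the target processing unit as an example to describe the data transmission method in this embodiment. The method may include the following steps S301 to S308.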
步骤S301:第一处理单元确定目标数据。
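Specifically, the first processing unit determines the target data, which includes the destination address of the second processing unit. The destination address may be the communication address of the second processing unit.
Optionally, the target data is sent in the form of variable-length or fixed-length data packets, for example with the packet structure described in FIG. 1.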
步骤S302:第一处理单元生成第一请求。
具体地,第一处理单元生成第一请求,所述第一请求用于请求将目标数据发送至第二处理单元。需要说明的是,该第一请求相当于图5所示的A.req信号,当生成第一请求时,相当于该A.req信号由低电平变为高电平。第一请求可以用于触发第一异步握手电路(如图5所示)生成第一时钟信号。
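Optionally, each processing unit includes a storage area based on a first-in-first-out (FIFO) storage mechanism. The first processing unit generates the first request after writing the target data into this FIFO-based storage area. The storage area may be the asynchronous serial FIFO module in FIG. 7 or FIG. 9, or another form of storage such as a memory, a queue, or a linked list. With the FIFO mechanism, when multiple pieces of target data need to be sent, they are sent one after another in time order, so the sending unit decides faster during sending and the transmission performance of the system is effectively improved.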
步骤S303:第一处理单元在确定第一路由器的接收数据的状态就绪后,基于第一请求,确定第一时钟信号。
具体地,第一处理单元在确定第一路由器的接收数据的状态就绪后,基于第一请求,确定第一时钟信号。需要说明的是,该第一时钟信号是由第一路由器的接收状态就绪和第一请求同时触发的时钟信号,该时钟信号可以驱动发送单元向接收单元发送数据,也可以驱动接收单元接收发送单元发送的数据。该第一时钟信号相当于上述图7或图9所述实施例中的自时序时钟信号。
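Optionally, each processing unit includes a first asynchronous handshake circuit, and the first processing unit is specifically configured to: after determining, through the first asynchronous handshake circuit, that the first router is ready to receive data, determine the first clock signal based on the first request. The first clock signal (also called the self-timed clock) is provided by the asynchronous handshake circuit. This circuit has a simple structure and can generate the self-timed clock through a self-loop: when the receive-ready state of the router and the first request are both present, the self-timed clock is generated to drive the asynchronous message transmitter to send the target data to the router in serial single-bit mode.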
可选的,每个上述处理单元包括异步消息发送器;上述第一处理单元具体用于:基于上述第一请求,控制上述异步消息发送器基于上述第一时钟信号将上述目标数据以串行单比特的传输方式发送至上述第一路由器。该异步消息发送器可以接收第一时钟信号的驱动以串行单比特的传输方式发送目标数据至路由器,实现如处理单元与路由器之间的异步传输。
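It should be noted that the first clock signal, also called the self-timed clock (as in the embodiments of FIG. 7 to FIG. 9), is provided by the asynchronous handshake circuit, for example the Fire signal in FIG. 5. Each processing unit may further include a sending unit (that is, the second sending unit), which includes the first asynchronous handshake circuit and the asynchronous message transmitter.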
步骤S304:第一处理单元基于第一时钟信号向第一路由器发送目标数据,并将第一时钟信号发送至第一路由器。
具体地,第一处理单元中的第二发送单元基于第一时钟信号向第一路由器发送目标数据,并将第一时钟信号发送至第一路由器。
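Optionally, the first processing unit and the first router are connected through an asynchronous message bus, which includes a receive-ready signal line, a clock signal line, a message valid-bit signal line, and a data line. In this embodiment, the asynchronous message bus includes four kinds of signal lines: a receive-ready signal line, a clock signal line, a message valid-bit signal line, and one or more data lines. The receive-ready signal line carries the ready signal, which indicates that the receiver is ready to receive data; the clock signal line carries the first clock signal; the message valid-bit signal line carries the packet header signal and packet tail signal of the target data; and the one or more data lines carry the valid data of the target data. When the data to be transmitted is small (for example, the target data is an indication message, a control message, or data smaller than a preset threshold), it can be transmitted in serial single-bit mode over one data line. When the data is large (for example, vector data, video frames, image data, voice data, or data whose size is greater than or equal to the preset threshold), it can be transmitted serially over multiple data lines that support multiple channels; for the specific implementation, refer to the foregoing embodiments. These four kinds of signal lines greatly alleviate the prior-art problems of numerous and complex outgoing lines between the processing unit and the router, and reduce the chip area occupied by the asynchronous message bus as a whole.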
可选的,根据业务需求,该数据线还可以是支持多通道的多根数据线。
步骤S305:第一路由器接收第一时钟信号。
具体地,第一路由器接收第一时钟信号。
可选的,每个上述路由器包括多组端口,每组上述端口包括接收端口和发送端口,其中,每个上述接收端口用于接收数据,每个上述发送端口用于发送数据。在本申请实施例中,路由器通过可配置的端口,配置与路由器连接的处理单元或其他路由器。通过基于端口可配置的路由器,可灵活重构的组网架构,如点到点,多点到多点等架构。而且,路由器的内部每个端口都连接有接收单元或发送单元,以便收发数据。
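Optionally, the target data is sent in the form of variable-length or fixed-length data packets, and the first processing unit is further configured to set the packet header of the target data after generating the first request. It can be understood that the packet header is sent to the router over the message valid-bit signal line together with the first clock signal.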
步骤S306:第一路由器基于第一时钟信号接收第一处理单元发送的目标数据。
具体地,上述第一路由器基于第一时钟信号接收第一处理单元发送的目标数据。
可选的,目标数据在发送过程中的数据形式为变长或定长的数据包;上述第一处理单元还用于:在生成第一请求后,设置上述目标数据的包头,将上述包头和上述第一时钟信号发送至上述第一路由器;上述第一路由器还用于:在接收到上述目标数据对应的包头后,启动接收上述目标数据。例如:如上述图7所示,第一路由器检测到包头信号时并启动接收数据和消息包长统计。
可选的,路由器中每个上述接收端口与一个接收单元对应,每个上述接收单元包括基于先进先出存储机制的存储区域;上述第一路由器具体用于:基于上述第一时钟信号,通过目标接收端口驱动第一接收单元内的上述存储区域接收上述第一处理单元发送的上述目标数据,上述目标接收端口为上述第一路由器中与上述第一处理单元连接的接收端口。通过先进先出的存储机制使得在多个目标数据需要发送的情况下,按照一定的时间次序依次发送,使得在发送过程中发送单元决策更快,可有效提升系统的传输性能。另外,该基于先进先出存储机制的存储区域可以适用于同步转异步的适配方法,同步写入数据(如:处理单元中将目标数据同步写入发送单元)或读出数据(如:路由器中发送单元基于接收单元的存储区域同步读出数据),异步读出数据(如:处理单元中将目标数据由第二发送单元异步发送至路由器)或写入数据(如:路由器中的第一接收单元异步写入数据)。
步骤S307:在目标数据接收完毕后,将第一路由器的接收数据的状态就绪调整为接收数据的状态未就绪。
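Specifically, the first router is further configured to: after the target data has been received, change its receive state from ready to not ready. The first processing unit is further configured to: after detecting that the receive state of the first router has changed from ready to not ready, determine that sending of the target data is complete. In this embodiment, once the receive state in the router changes from ready to not ready, the processing unit can determine that the data has been sent and can stop sending, which saves communication resources. The ready and not-ready states may be indicated by high and low electrical levels respectively.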
可选的,上述第一处理单元还用于:在上述目标数据的最后一位数据发送完毕后,设置上述目标数据的包尾并发送;上述第一路由器还用于:在接收到上述目标数据的包尾后,将上述第一路由器的接收数据的状态就绪调整为接收数据的状态未就绪。通过设置目标数据的包头和包尾实现目标数据的异步传输,使得不需要同步处理单元与路由器之间的时钟,也令一个路由器更易集成多种异构的处理单元或知识产权核。
步骤S308:第一路由器根据目的地址,向第二处理单元发送目标数据。
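Specifically, the first router sends the target data to the second processing unit according to the destination address.
Optionally, each sending port in the router corresponds to one sending unit. The first router determines the target sending port according to the destination address, where the target sending port is the sending port of the first router that corresponds to the second processing unit, and sends the target data to the second processing unit through the first sending unit corresponding to the target sending port. In other words, the port-configurable router uses the destination address to determine the target sending port corresponding to the first sending unit and sends the target data to the second processing unit through that port. Moreover, in a communication connection formed by multiple routers, no port number needs to be determined at the time the target data is received; the data can be sent toward the second processing unit according to the destination address alone.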
可选的,每个上述路由器包括映射表,上述映射表包括上述路由器中每个上述发送端口的端口标识与对应的上述处理单元的单元标识或其他上述路由器的路由标识之间的映射关系,上述单元标识用于唯一识别上述处理单元,上述路由标识用于唯一识别上述路由器;上述第一路由器具体用于:根据上述目的地址,基于上述第一路由器中的映射表确定上述目标发送端口。在本申请实施例中,通过查询映射表的路由转发机制,简化路由转发过程,提高传输效率。
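Optionally, when the second processing unit is connected to a second router, and the first router and the second router are two different routers among the multiple routers, the target sending port is the sending port of the first router with the fewest hops to the second router. In a communication connection formed by multiple routers, the maximum number of routing hops is the number of routers minus one, and a router can itself choose, according to the destination address, the transmission path with the fewest hops for sending the target data to the second processing unit.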
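One way to pick the sending port with the fewest hops is a breadth-first search over the router connections. The source does not say how the hop count is computed, so this is only a sketch under assumptions: the four-router chain in `LINKS`, its port numbers, and the names `hops` and `best_port` are all hypothetical.

```python
from collections import deque

# Hypothetical chain R1 - R2 - R3 - R4: router -> {local sending port: neighbor}.
LINKS = {
    "R1": {4: "R2"},
    "R2": {2: "R1", 3: "R3"},
    "R3": {2: "R2", 3: "R4"},
    "R4": {2: "R3"},
}


def hops(links, src, dst):
    """Breadth-first search: number of router-to-router hops from src to dst."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for neighbor in links.get(node, {}).values():
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None                          # dst is unreachable from src


def best_port(links, src, dst):
    """Return (port, hop count) of the src port with the fewest hops to dst."""
    best = None
    for port, neighbor in sorted(links[src].items()):
        d = hops(links, neighbor, dst)
        if d is not None and (best is None or d + 1 < best[1]):
            best = (port, d + 1)
    return best
```

In this four-router chain the longest route, from R1 to R4, takes three hops, which matches the stated bound of the number of routers minus one.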
可选的,上述第一路由器具体用于:在上述第一发送单元接收到上述第一接收单元发送的第二请求时,控制上述第一发送单元从上述第一接收单元的存储区域中获取上述目标数据,上述第二请求用于请求通过上述第一发送单元发送上述目标数据;通过上述第一发送单元基于上述目标发送端口以串行单比特的传输方式向上述第二处理单元发送上述目标数据。通过基于共享数据的简单发送单元,重用接收端的fifo机制的存储区域,减少数据搬移,提高传输效率。
可选的,每个上述路由器包括通道选择器;上述第一路由器的通道选择器用于,联通上述第一接收单元到上述第一发送单元的数据通路,以使上述第一发送单元从上述第一接收单元的存储区域获取上述目标数据。当有数据发送的需求时,通道选择器可以连通接收单元与发送单元之间的数据通路,以使发送单元可以通过该数据通路复用接收单元的fifo存储区域,减少数据搬移,大大的提高了路由器的传输性能。
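Optionally, after sending is complete, the channel selector releases the data path so that other first receiving units can send data to the first sending unit. That is, the first sending unit can send the data of only one first receiving unit at a time, and after sending is complete, the data path between the first sending unit and that first receiving unit is disconnected.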
可选的,每个上述路由器包括仲裁器,每个上述发送单元对应一个上述仲裁器;上述第一路由器的仲裁器用于:在m个接收单元同时向上述第一发送单元请求发送数据时,根据预设仲裁规则,从上述m个接收单元中确定目标接收单元,m为大于1且小于或等于上述路由器包含的全部上述接收单元数量;上述第一路由器的通道选择器还用于,在上述仲裁器确定上述目标接收单元后,联通上述目标接收单元到上述第一发送单元的数据通路,以使上述第一发送单元从上述目标接收单元的存储区域获取数据并发送。在本申请实施例中,利用仲裁器实现“多到一”的公平仲裁机制,减少路由转发数据时的冲突。其中,为了保证每个发送单元的正常工作,仲裁器与该发送单元存在一一对应的关系。
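Optionally, the arbiter includes a second asynchronous handshake circuit. The second asynchronous handshake circuit of the first router is configured to: after determining that the first sending unit is ready to send data, determine a second clock signal based on the request-to-send signal sent by the target receiving unit to the first sending unit. The channel selector of the first router is specifically configured to: based on the second clock signal, connect the data path from the target receiving unit to the first sending unit. In this embodiment, the arbiter in the router uses a simple token-ring mechanism built on handshake circuits such as Click circuits to implement fair arbitration in the data transmission apparatus. Moreover, by using a common arbitration mechanism that relies on the timing dependency with the receiving unit, the arbiter implements a packet-based transmission mechanism with high performance. It can be understood that the arbiter in the router is an asynchronous arbiter.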
可选的,每个上述仲裁器中上述第二异步握手电路的数量与上述路由器中接收端口的数量相比少一。在本申请实施例中,为了保证除发送端口对应的接收端口外,其他接收端口都需要向发送端口发送消息,所以,仲裁器中第二异步握手电路的数量与路由器中接收端口的数量相比少一。
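In this embodiment, in the data transmission apparatus based on an asynchronous handshake mechanism, a processing unit (for example, the first processing unit) generates a first clock signal based on its request to send data after determining that the router (for example, the first router) is ready to receive data. It then sends the first clock signal and the target data to the router according to the first clock signal, so that the router connected to the processing unit can receive the target data by using the first clock signal; the router then sends the target data to the second processing unit according to the destination address carried in the received target data. This asynchronous handshake between the processing unit and the router ensures that the router receives the target data completely. In addition, the processing unit also sends the router the clock signal used when sending the target data (that is, the first clock signal), so that the router can receive the data according to that clock signal. This reduces clock constraints in the data transmission apparatus, makes it easier to integrate multiple heterogeneous processing units or intellectual property cores, and frees the routers from synchronous-clock limits so that they decide faster, effectively improving the forwarding performance of the system. Furthermore, the data lines connecting the processing unit and the router are relatively short and of relatively fixed length, so when the processing unit needs to send data, the delay of the corresponding clock signal is small and relatively deterministic. At the same time, one router can be asynchronously connected to multiple processing units, which greatly reduces the chip area occupied by the bus.
It should be noted that both the first router and the first processing unit include a receiving unit and a sending unit. For example, the first router may include a first receiving unit and a first sending unit, and the first processing unit may include a second receiving unit and a second sending unit. The first and second receiving units are similar in function and structure, and both receive data in asynchronous serial single-bit mode; the first and second sending units are similar in function and structure, and both send data in asynchronous serial single-bit mode. For the structure and functions of the receiving units and sending units in the first router and the first processing unit, refer to the foregoing embodiments; details are not repeated here.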
还需要说明的是,本申请实施例中提及的第一路由器可对应参考上述图4-上述图19中所涉及的路由器,本申请实施例中提及的第一处理单元可对应参考上述图4-上述图19中所涉及的处理单元,本申请实施例在此暂不赘述。
本申请实施例还提供了一种芯片系统,该芯片系统包括了上述任意一个实施例以及结合上述实施例的任意一种实现方式所提供的装置。该芯片系统用于实现如上述数据传输装置的功能。在一种可能的设计中,所述芯片系统还包括存储器,所述存储器,用于保存数据传输装置必要的程序指令和数据。该芯片系统,可以由芯片构成,也可以包含芯片和其他分立器件。
本申请实施例还提供了一种电子设备,该电子设备包括了上述任意一个实施例以及结合上述实施例的任意一种实现方式所提供的装置。该电子设备用于实现如上述数据传输装置的功能。
在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其他实施例的相关描述。
需要说明的是,对于前述的各方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本申请并不受所描述的动作顺序的限制,因为依据本申请,某些步骤可能可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于优选实施例,所涉及的动作和模块并不一定是本申请所必须的。
在本申请所提供的几个实施例中,应该理解到,所揭露的装置,可通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如上述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性或其它的形式。
上述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
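If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like, and specifically may be a processor in the computer device) to perform all or some of the steps of the methods in the embodiments of this application. The storage medium may include any medium that can store program code, such as a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a read-only memory (ROM), or a random access memory (RAM).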
以上所述,以上实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围。
Claims (24)
- 一种数据传输装置,其特征在于,包括:多个处理单元和多个路由器,每个所述路由器与一个或多个处理单元连接,每个所述路由器与所述多个路由器中任意一个路由器形成通信连接关系;其中,所述多个路由器包括第一路由器,所述第一路由器与第一处理单元连接;所述第一处理单元,用于:生成第一请求,所述第一请求用于请求将目标数据发送至第二处理单元,所述目标数据包括第二处理单元的目的地址;在确定所述第一路由器的接收数据的状态就绪后,基于所述第一请求,确定第一时钟信号;基于所述第一时钟信号向所述第一路由器发送所述目标数据,并将所述第一时钟信号发送至所述第一路由器;所述第一路由器,用于:接收所述第一时钟信号;基于所述第一时钟信号接收所述第一处理单元发送的所述目标数据;根据所述目的地址,向所述第二处理单元发送所述目标数据。
- 根据权利要求1所述装置,其特征在于,每个所述处理单元包括第一异步握手电路;所述第一处理单元具体用于:通过所述第一异步握手电路在确定所述第一路由器的接收数据的状态就绪后,基于所述第一请求,确定所述第一时钟信号。
- 根据权利要求1或2所述装置,其特征在于,每个所述处理单元包括异步消息发送器;所述第一处理单元具体用于:基于所述第一请求,控制所述异步消息发送器基于所述第一时钟信号将所述目标数据以串行单比特的传输方式发送至所述第一路由器。
- 根据权利要求1-3任意一项所述装置,其特征在于,每个所述处理单元包括基于先进先出存储机制的存储区域;所述第一处理单元具体用于:将所述目标数据写入至所述基于先进先出存储机制的存储区域后,生成所述第一请求。
- 根据权利要求1-4任意一项所述装置,其特征在于,所述第一处理单元和所述第一路由器之间通过异步消息总线连接,其中,所述异步消息总线包括接收就绪信号线,时钟信号线,消息有效位信号线和一根或多根数据线。
- 根据权利要求1-5任意一项所述装置,其特征在于,每个所述路由器包括多组端口,每组所述端口包括接收端口和发送端口,其中,每个所述接收端口用于接收数据,每个所述发送端口用于发送数据。
- 根据权利要求6所述装置,其特征在于,每个所述接收端口与一个接收单元对应,每个所述接收单元包括基于先进先出存储机制的存储区域;所述第一路由器具体用于:基于所述第一时钟信号,通过目标接收端口驱动第一接收单元内的所述存储区域接收所述第一处理单元发送的所述目标数据,所述目标接收端口为所述第一路由器中与所述第一处理单元连接的接收端口。
- 根据权利要求7所述装置,其特征在于,每个所述发送端口与一个发送单元对应;所述第一路由器具体用于:根据所述目的地址,确定所述第一路由器中的目标发送端口,所述目标发送端口为所述第一路由器中与所述第二处理单元对应的发送端口;通过所述目标发送端口对应的第一发送单元向所述第二处理单元发送所述目标数据。
- 根据权利要求8所述装置,其特征在于,在所述第二处理单元与第二路由器对应连接,且所述第一路由器与所述第二路由器为所述多个路由器中两个不同的路由器时,所述目标发送端口为与所述第一路由器中与所述第二路由器连接跳数最少的发送端口。
- 根据权利要求8所述装置,其特征在于,所述第一路由器具体用于:在所述第一发送单元接收到所述第一接收单元发送的第二请求时,控制所述第一发送单元从所述第一接收单元的存储区域中获取所述目标数据,所述第二请求用于请求通过所述第一发送单元发送所述目标数据;通过所述第一发送单元基于所述目标发送端口以串行单比特的传输方式向所述第二处理单元发送所述目标数据。
- 根据权利要求8-10任意一项所述装置,其特征在于,每个所述路由器包括仲裁器,每个所述发送单元对应一个所述仲裁器;所述第一路由器的仲裁器用于:在m个接收单元同时向所述第一发送单元请求发送数据时,根据预设仲裁规则,从所述m个接收单元中确定目标接收单元,m为大于1且小于或等于所述路由器包含的全部所述接收单元数量。
- 一种数据传输方法,其特征在于,应用于数据传输装置,所述数据传输装置包括:多个处理单元和多个路由器,每个所述路由器与一个或多个处理单元连接,每个所述路由器与所述多个路由器中任意一个路由器形成通信连接关系;其中,所述多个路由器包括第一路由器,所述第一路由器与第一处理单元连接;所述方法包括:通过所述第一处理单元生成第一请求,所述第一请求用于请求将目标数据发送至第二处理单元,所述目标数据包括第二处理单元的目的地址;通过所述第一处理单元确定所述第一路由器的接收数据的状态就绪后,基于所述第一请求,确定第一时钟信号;通过所述第一处理单元基于所述第一时钟信号向所述第一路由器发送所述目标数据,并将所述第一时钟信号发送至所述第一路由器;通过所述第一路由器接收所述第一时钟信号;通过所述第一路由器基于所述第一时钟信号接收所述第一处理单元发送的所述目标数据;通过所述第一路由器根据所述目的地址,向所述第二处理单元发送所述目标数据。
- 根据权利要求12所述方法,其特征在于,每个所述处理单元包括第一异步握手电路;所述通过所述第一处理单元确定所述第一路由器的接收数据的状态就绪后,基于所述第一请求,确定第一时钟信号,包括:通过所述第一异步握手电路在确定所述第一路由器的接收数据的状态就绪后,基于所述第一请求,确定所述第一时钟信号。
- 根据权利要求12或13所述方法,其特征在于,每个所述处理单元包括异步消息发送器;所述通过所述第一处理单元基于所述第一时钟信号向所述第一路由器发送所述目标数据,包括:基于所述第一请求,控制所述异步消息发送器基于所述第一时钟信号将所述目标数据以串行单比特的传输方式发送至所述第一路由器。
- 根据权利要求12-14任意一项所述方法,其特征在于,每个所述处理单元包括基于先进先出存储机制的存储区域;所述通过所述第一处理单元生成第一请求,包括:通过所述第一处理单元将所述目标数据写入至所述基于先进先出存储机制的存储区域后,生成所述第一请求。
- 根据权利要求12-15任意一项所述方法,其特征在于,所述第一处理单元和所述第一路由器之间通过异步消息总线连接,其中,所述异步消息总线包括接收就绪信号线,时钟信号线,消息有效位信号线和一根或多根数据线。
- 根据权利要求12-16任意一项所述方法,其特征在于,每个所述路由器包括多组端口,每组所述端口包括接收端口和发送端口,其中,每个所述接收端口用于接收数据,每个所述发送端口用于发送数据。
- 根据权利要求17所述方法,其特征在于,每个所述接收端口与一个接收单元对应,每个所述接收单元包括基于先进先出存储机制的存储区域;所述通过所述第一路由器基于所述第一时钟信号接收所述第一处理单元发送的所述目标数据,包括:基于所述第一时钟信号,通过目标接收端口驱动第一接收单元内的所述存储区域接收 所述第一处理单元发送的所述目标数据,所述目标接收端口为所述第一路由器中与所述第一处理单元连接的接收端口。
- 根据权利要求18所述方法,其特征在于,每个所述发送端口与一个发送单元对应;所述通过所述第一路由器根据所述目的地址,向所述第二处理单元发送所述目标数据,包括:根据所述目的地址,确定所述第一路由器中的目标发送端口,所述目标发送端口为所述第一路由器中与所述第二处理单元对应的发送端口;通过所述目标发送端口对应的第一发送单元向所述第二处理单元发送所述目标数据。
- 根据权利要求19所述方法,其特征在于,在所述第二处理单元与第二路由器对应连接,且所述第一路由器与所述第二路由器为所述多个路由器中两个不同的路由器时,所述目标发送端口为与所述第一路由器中与所述第二路由器连接跳数最少的发送端口。
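- The method according to claim 19, wherein the sending the target data to the second processing unit through the first sending unit corresponding to the target sending port comprises: when the first sending unit receives a second request sent by the first receiving unit, controlling the first sending unit to obtain the target data from the storage area of the first receiving unit, wherein the second request is used to request that the target data be sent through the first sending unit; and sending, by the first sending unit, the target data to the second processing unit through the target sending port in serial single-bit mode.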
- 根据权利要求19-21任意一项所述方法,其特征在于,每个所述路由器包括仲裁器,每个所述发送单元对应一个所述仲裁器;所述方法还包括:在m个接收单元同时向所述第一发送单元请求发送数据时,通过所述第一路由器的仲裁器根据预设仲裁规则,从所述m个接收单元中确定目标接收单元,m为大于1且小于或等于所述路由器包含的全部所述接收单元数量。
- 一种芯片系统,其特征在于,所述芯片系统包括上述权利要求1-11中任意一项所述的装置。
- 一种电子设备,其特征在于,所述电子设备包括上述权利要求1-11中任意一项所述的装置。
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2021/105474 WO2023279369A1 (zh) | 2021-07-09 | 2021-07-09 | 一种数据传输装置、方法及相关设备 |
CN202180100274.6A CN117616735A (zh) | 2021-07-09 | 2021-07-09 | 一种数据传输装置、方法及相关设备 |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2021/105474 WO2023279369A1 (zh) | 2021-07-09 | 2021-07-09 | 一种数据传输装置、方法及相关设备 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023279369A1 true WO2023279369A1 (zh) | 2023-01-12 |
Family
ID=84800239
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/105474 WO2023279369A1 (zh) | 2021-07-09 | 2021-07-09 | 一种数据传输装置、方法及相关设备 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN117616735A (zh) |
WO (1) | WO2023279369A1 (zh) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104065570A (zh) * | 2014-06-23 | 2014-09-24 | 合肥工业大学 | 异步可容错片上网络路由器设计方法 |
CN104683263A (zh) * | 2015-01-26 | 2015-06-03 | 天津大学 | 缓解热点的片上网络拓扑结构 |
US20180159786A1 (en) * | 2016-12-02 | 2018-06-07 | Netspeed Systems, Inc. | Interface virtualization and fast path for network on chip |
CN111131091A (zh) * | 2019-12-25 | 2020-05-08 | 中山大学 | 一种面向片上网络的片间互连方法和系统 |
CN112597075A (zh) * | 2020-12-28 | 2021-04-02 | 海光信息技术股份有限公司 | 用于路由器的缓存分配方法、片上网络及电子设备 |
CN113079100A (zh) * | 2021-03-03 | 2021-07-06 | 桂林电子科技大学 | 一种用于高速数据采集的NoC路由器 |
-
2021
- 2021-07-09 WO PCT/CN2021/105474 patent/WO2023279369A1/zh active Application Filing
- 2021-07-09 CN CN202180100274.6A patent/CN117616735A/zh active Pending
Also Published As
Publication number | Publication date |
---|---|
CN117616735A (zh) | 2024-02-27 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21948863 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 202180100274.6 Country of ref document: CN |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 21948863 Country of ref document: EP Kind code of ref document: A1 |