WO2019228077A1 - Method, apparatus, electronic device, and computer-readable storage medium for implementing data transmission - Google Patents

Method, apparatus, electronic device, and computer-readable storage medium for implementing data transmission

Info

Publication number
WO2019228077A1
Authority
WO
WIPO (PCT)
Prior art keywords
access
instruction
bus
read
write
Prior art date
Application number
PCT/CN2019/082225
Inventor
李嘉昕
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2019228077A1
Priority to US17/007,523 (US11481346B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/36 Handling requests for interconnection or transfer for access to common bus or bus system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/382 Information transfer, e.g. on bus using universal interface adapter
    • G06F13/385 Information transfer, e.g. on bus using universal interface adapter for adaptation of a particular data processing system to different peripheral devices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/40 Bus structure
    • G06F13/4004 Coupling between buses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F5/00 Methods or arrangements for data conversion without changing the order or content of the data handled
    • G06F5/06 Methods or arrangements for data conversion without changing the order or content of the data handled for changing the speed of data flow, i.e. speed regularising or timing, e.g. delay lines, FIFO buffers; over- or underrun control therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2213/00 Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F2213/38 Universal adapter
    • G06F2213/3852 Converter between protocols

Definitions

  • the present application relates to the technical field of integrated circuits, and in particular, to a method, an apparatus, an electronic device, and a computer-readable storage medium for implementing data transmission.
  • a system-on-chip (SOC) is also called a system-level integrated circuit.
  • IC design methods have also evolved from a timing-driven approach to an approach based on the reuse of IP (intellectual property) cores.
  • This application provides a method for implementing data transmission.
  • the present application also provides a method for implementing data transmission, which is executed by an electronic device.
  • the method is applied to the execution of transmission between different types of buses corresponding to instances run by a computing service and external devices, respectively.
  • the access instruction cached in the instruction storage area is continuously transmitted to the access object, and the transmission of the access instruction is not stopped until the flow control is received.
  • the present application also provides a device for implementing data transmission.
  • the device is applied to the execution of transmission between different types of buses corresponding to instances run by a computing service and external devices, respectively.
  • the device includes:
  • An instruction obtaining module configured to obtain an access instruction for reading and writing data, the access instruction being initiated through a bus by either one of the instance and the external device toward the other;
  • An instruction cache module configured to cache the access instruction to an instruction storage area corresponding to the access instruction;
  • An instruction transmission module configured to continuously transmit the access instruction buffered in the instruction storage area to the access object indicated by the access instruction, and not to stop transmitting the access instruction until it is flow-controlled.
  • the present application also provides an electronic device.
  • the electronic device includes:
  • Memory for storing processor-executable instructions
  • the processor is configured to execute the method for implementing data transmission described above.
  • the present application also provides a computer-readable storage medium that stores processor-executable instructions. When the instructions are executed by one or more processors, the foregoing method for implementing data transmission is completed.
  • FIG. 1 is a schematic diagram of an implementation environment after an instance based on the AXI bus is ported to an FPGA chip based on the CCI-P bus;
  • FIG. 2 is a schematic diagram of an implementation environment after an instance based on the CCI-P bus is ported to an FPGA chip based on the AXI bus;
  • Fig. 3 is a flow chart showing a method for implementing data transmission according to an exemplary embodiment
  • FIG. 4 is a flowchart of a method for implementing data transmission provided by another exemplary embodiment based on the embodiment shown in FIG. 3;
  • FIG. 5 is a detailed flowchart of step 310 in the embodiment corresponding to FIG. 3;
  • FIG. 6 is a detailed flowchart of step 312 in the embodiment corresponding to FIG. 5;
  • FIG. 7 is a system frame diagram of a bus conversion device for converting an AXI bus and a CCI-P bus deployed in an FPGA chip;
  • FIG. 8 is a detailed development view of the bus conversion device shown in FIG. 7.
  • Fig. 9 is a flow chart of processing a read request by a read link stream processing module in a fast link protocol converter according to an exemplary embodiment
  • Fig. 10 is a flow chart of processing a read request by a full-link stream processing module in a fast link protocol converter according to an exemplary embodiment
  • Fig. 11 is a flow chart of processing a write request by a write link stream processing module in a fast link protocol converter according to an exemplary embodiment
  • Fig. 12 is a flow chart of processing a write request by a full-link stream processing module in a fast link protocol converter according to an exemplary embodiment
  • FIG. 13 is a schematic flowchart of a method for implementing data transmission provided by an exemplary embodiment of the present application.
  • Fig. 14 is a block diagram of a device for implementing data transmission according to an exemplary embodiment
  • FIG. 15 is a block diagram of a device for implementing data transmission according to another exemplary embodiment based on the embodiment shown in FIG. 14;
  • FIG. 16 is a detailed block diagram of the instruction acquisition module in the embodiment corresponding to FIG. 14.
  • mainstream FPGA (field-programmable gate array) chip manufacturers include Intel and Xilinx. Xilinx generally uses the AXI bus to interconnect the functional modules within the chip, whereas Intel uses the CCI-P bus. Because FPGA chips from different manufacturers use different types of bus interconnection, and in order to minimize secondary development by users, this application adds a fast link protocol converter to the FPGA chip that converts between the CCI-P bus and the AXI bus, so that an instance developed for a Xilinx FPGA chip based on the AXI bus can easily be ported to an Intel FPGA chip based on the CCI-P bus.
  • Figure 1 is a schematic diagram of the implementation environment after an instance based on the AXI bus is ported to an FPGA chip based on the CCI-P bus.
  • the implementation environment includes an FPGA chip 110 and an external device 120 for running an application program.
  • the external device 120 and the FPGA chip 110 are interconnected through a PCIe bus.
  • Instance 111 is a program run by a computing service such as deep learning, graphic image compression processing, or genomics, ported from a chip based on the AXI bus 115.
  • the fast link protocol converter 113 of the present application for data transmission can be deployed in the FPGA chip 110, so that when the instance 111 based on the AXI bus 115 is ported into the FPGA chip 110 based on the CCI-P bus 114, it can be connected to the CCI-P bus 114 of the FPGA chip 110.
  • the fast link protocol converter 113 includes multiple FIFOs (first-in-first-out data buffers), which respectively buffer the data sent by the external device 120 to the instance 111 and the data sent by the instance 111 to the external device 120. Even if the AXI bus 115 and the CCI-P bus 114 differ in clock cycle and in transmission bandwidth, buffering the data transmitted between them avoids wasting transmission bandwidth when high-speed data needs to be transmitted. The buffered data is continuously transmitted to the receiver as long as no flow control signal is received, which realizes stream-mode transmission and improves data transmission efficiency. Further, the data can be output according to the clock cycle of the receiver, so as to realize asynchronous transmission of data.
  • the above-mentioned fast link protocol converter 113 may include a read link stream processing module 101, a write link stream processing module 102, a full link stream processing module 103, and a co-link stream processing module 104.
  • the modules each contain a FIFO.
  • the read link stream processing module 101 can receive the read request transmitted by the instance 111 through the AXI bus 115, and process the read request to obtain a read instruction.
  • the read instruction is buffered in the read instruction asynchronous FIFO, and then the read instruction is asynchronously transmitted to the CCI-P bus 114 in a stream mode, and the read data returned by the external device 120 in response to the read instruction is obtained.
  • the read link stream processing module 101 buffers the read data in the read data asynchronous FIFO, and then asynchronously transmits the read data through the AXI bus 115 to the instance 111 run by the computing service in a stream mode, so that the instance 111 can perform data analysis and processing on the received read data.
  • the write link stream processing module 102 may receive a write request transmitted by the instance 111 through the AXI bus 115, and process the write request to obtain a write instruction.
  • the write instruction is buffered in the write instruction asynchronous FIFO, and then the write instruction is asynchronously transmitted to the CCI-P bus 114 in a stream mode.
  • the external device 120 writes the write data carried by the write instruction and returns a write response.
  • the write link stream processing module 102 buffers the write response in the write response asynchronous FIFO, and then asynchronously transmits the write response to the instance 111 running by the computing service through the AXI bus 115 in a stream mode.
  • the write data may be a result obtained by the instance 111 performing data analysis processing on the read data.
  • the full-link stream processing module 103 has a parallel read-write function, which can receive read-write requests transmitted by the external device 120 through the CCI-P bus 114 in parallel, and process read-write requests to obtain read-write instructions, and cache the read instructions in the read-command asynchronous FIFO.
  • the write instruction is buffered in the write instruction asynchronous FIFO, and then the read or write instruction is asynchronously transmitted to the AXI bus 115 in a stream mode, and the read data returned by the instance 111 in response to the read instruction, or the write response returned in response to the write instruction, is obtained.
  • the full-link stream processing module 103 can further buffer the write response or read data and asynchronously transmit it to the external device 120 through the CCI-P bus 114 in a stream mode.
  • the co-link stream processing module 104 is used to transmit low-speed configuration information to the AXI bus 115. For example, it can assist the read link stream processing module 101 and the write link stream processing module 102 by returning to the AXI bus 115 the header information of the read data, the header information of the write response, and the transmission check code.
  • Figure 2 is a schematic diagram of the implementation environment after an instance based on the CCI-P bus is ported to an FPGA chip based on the AXI bus.
  • the implementation environment includes an FPGA chip 110 and an external device 120 for running an application program.
  • the external device 120 and the FPGA chip 110 are interconnected through a PCIe bus.
  • Instance 111 is a program run by a computing service such as deep learning, graphic image compression processing, or genomics, ported from a chip based on the CCI-P bus 114.
  • the fast link protocol converter 113 of the present application for data transmission can be deployed in the FPGA chip 110, so that when the instance 111 based on the CCI-P bus 114 is ported into the FPGA chip 110 based on the AXI bus 115, it can be docked with the AXI bus 115 of the FPGA chip 110.
  • the fast link protocol converter includes multiple FIFOs, which buffer the data sent by the external device 120 to the instance 111 and the data sent by the instance 111 to the external device 120, thereby realizing data transmission between the AXI bus 115 and the CCI-P bus 114.
  • the fast link protocol converter 113 includes a read link stream processing module 101, a write link stream processing module 102, a full link stream processing module 103, and a co-link stream processing module 104.
  • each module includes a FIFO.
  • the read link stream processing module 101 can receive a read request transmitted by the external device 120 through the AXI bus 115, and process the read request to obtain a read instruction.
  • the read instruction is buffered in the read instruction asynchronous FIFO, and then the read instruction is asynchronously transmitted to the CCI-P bus 114 in a stream mode, and the read data returned by the instance 111 in response to the read instruction is obtained.
  • the read link stream processing module 101 further buffers the read data in the read data asynchronous FIFO, and asynchronously transmits the read data to the external device 120 through the AXI bus 115 in a stream mode, so that the external device 120 can perform data analysis and processing on the received read data.
  • the write link stream processing module 102 can receive the write request transmitted by the external device 120 through the AXI bus 115, and process the write request to obtain the write instruction.
  • the write instruction is buffered in the write instruction asynchronous FIFO and then asynchronously transmitted to the CCI-P bus 114 in a stream mode, and the write response returned by the instance 111 in response to the write instruction is obtained.
  • the write link stream processing module 102 can further buffer the write response in the write response asynchronous FIFO, and asynchronously transmit the write response through the AXI bus 115 in a stream mode to the external device 120.
  • the write data may be a result obtained by the external device 120 performing data analysis processing on the read data.
  • the full-link stream processing module 103 has a parallel read-write function, and can receive read-write requests transmitted by the instance 111 through the CCI-P bus 114 in parallel, and process read-write requests to obtain read-write instructions.
  • the read instruction is cached in the read instruction asynchronous buffer
  • the write instruction is cached in the write instruction asynchronous buffer
  • the read or write instruction is asynchronously transmitted to the AXI bus 115 in a stream mode, thereby obtaining the read data returned by the external device 120 in response to the read instruction.
  • the full-link stream processing module 103 can further buffer the read data and asynchronously transmit it to the instance 111 through the CCI-P bus 114 in a stream mode.
  • the co-link stream processing module 104 is used to transmit low-speed configuration information to the AXI bus 115. For example, it can assist the read link stream processing module 101 and the write link stream processing module 102 by returning to the AXI bus 115 the header information of the read data, the header information of the write response, and the transmission check code.
  • Fig. 3 is a flow chart showing a method for implementing data transmission according to an exemplary embodiment.
  • the method for implementing data transmission can be applied, for example, in the FPGA chip 110 of the implementation environment shown in FIG. 1, to carry out transmission between the different types of buses that correspond respectively to an instance run by a computing service and an external device.
  • as shown in FIG. 3, the method is executed by an electronic device described later, and may specifically include the following steps.
  • In step 310, an access instruction for reading and writing data is obtained, the access instruction being initiated through a bus by either one of the instance and the external device toward the other.
  • computing services refer to certain data processing functions deployed on FPGA chips, such as computing services such as deep learning, graphic image compression processing, and genomics.
  • An instance refers to a program module that completes the aforementioned computing services.
  • An external device is a terminal device that runs an application program, such as a host computer, as opposed to an FPGA chip.
  • the bus used internally by the FPGA chip may be an AXI bus or a CCI-P bus. Instances can be ported from other designs and may use a different type of bus. For example, when the FPGA chip uses the AXI bus, the ported instance may use the CCI-P bus; when the FPGA chip uses the CCI-P bus, the ported instance may use the AXI bus.
  • the method provided in this application can be used to implement the transmission between the CCI-P bus and the AXI bus.
  • the access instruction may be generated according to the access request, and the access instruction includes a read instruction and a write instruction.
  • the access request may be a request, initiated by an instance, to read and write data on an external device through the AXI bus and the CCI-P bus.
  • the access request may also be a request, initiated by an external device, to read and write data on the instance through the AXI bus and the CCI-P bus.
  • FPGA chip can realize the conversion between AXI bus and CCI-P bus through a fast link protocol converter.
  • the fast link protocol converter can receive access requests sent by instances or external devices, and process read and write requests to obtain access instructions.
  • In step 330, the access instruction is buffered to an instruction storage area corresponding to the access instruction.
  • buffering the access instruction to the instruction storage area corresponding to the access instruction means storing the access instructions for the same access object together.
  • the instruction storage area may be an asynchronous FIFO deployed in the FPGA chip.
  • a write operation instruction for the same access object is stored in the write instruction asynchronous FIFO
  • a read operation instruction for the same access object is stored in the read instruction asynchronous FIFO.
  • the access instructions are stored in the instruction storage area in the order in which the access instructions are generated, and in accordance with the first-in-first-out principle, when no back pressure signal is received, the access instructions are sequentially read for output.
  • the fast link protocol converter of the FPGA chip sequentially stores the access instructions obtained by processing the access request in the corresponding instruction storage area.
  • the read instructions of the FPGA chip to access the external device are buffered in the first FIFO;
  • the write instructions of the FPGA chip to access the external device are buffered in the second FIFO;
  • the read instructions of the external device to access the instance in the FPGA chip are buffered in the third FIFO;
  • the write instructions of the external device to access the instance in the FPGA chip are buffered in the fourth FIFO.
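The assignment of access instructions to the four FIFOs described above can be sketched in Python as follows. This is only an illustrative software model, not the hardware design of the application: the `AccessInstruction` fields, the `"instance"`/`"external"` labels, and the `cache_instruction` helper are hypothetical names introduced here.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessInstruction:
    initiator: str  # "instance" or "external" (hypothetical labels)
    kind: str       # "read" or "write"
    address: int


# One FIFO per (initiator, kind) pair, mirroring the first to fourth FIFOs.
fifos = {
    ("instance", "read"): deque(),   # first FIFO
    ("instance", "write"): deque(),  # second FIFO
    ("external", "read"): deque(),   # third FIFO
    ("external", "write"): deque(),  # fourth FIFO
}


def cache_instruction(instr: AccessInstruction) -> None:
    """Append the instruction to the FIFO matching its initiator and kind."""
    fifos[(instr.initiator, instr.kind)].append(instr)


# Instructions are cached in the order in which they are generated.
order = [("instance", "read"), ("external", "write"), ("instance", "read")]
for i, (who, kind) in enumerate(order):
    cache_instruction(AccessInstruction(who, kind, 0x1000 + 64 * i))
```

Keeping one queue per initiator and operation type preserves generation order within each queue while leaving the queues independent of one another.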
  • In step 350, according to the access object indicated by the access instruction, the access instruction cached in the instruction storage area is continuously transmitted to the access object, and the transmission of the access instruction is not stopped until it is flow-controlled.
  • assuming that the access instruction is generated based on an access request sent by an instance, the access object is an external device. Assuming that the access instruction is generated based on an access request sent by an external device, the access object is an instance in the FPGA chip.
  • the bus corresponding to the instance may be an AXI bus, in which case the bus corresponding to the external device is a CCI-P bus.
  • alternatively, the bus corresponding to the instance is a CCI-P bus, and the bus corresponding to the external device is an AXI bus.
  • the AXI bus is an on-chip bus proposed by ARM for high performance, high bandwidth, and low latency.
  • CCI-P bus is a kind of on-chip cache application bus proposed by Intel Corporation.
  • flow control means that the transmission of the access instruction needs to be suspended because the access object cannot process the data.
  • when the flow control signal returned by the access object for the access instruction is received, the transmission of the access instruction is considered to be flow-controlled, and the transmission of the access instruction is stopped.
  • when the access object has no spare processing capacity for the access instruction, it sends a flow control signal to the FPGA chip.
  • as long as the FPGA chip's fast link protocol converter does not receive a flow control signal from the access object for the access instruction, it continuously transmits the cached access instruction to the access object through the bus corresponding to the access object, thereby realizing stream-mode transmission of data and improving transmission efficiency.
  • when the flow control signal for the access instruction is received, the transmission of the access instruction to the access object is stopped.
  • for example, when the access object is an external device, the fast link protocol converter of the FPGA chip continuously transmits the buffered read instruction to the external device until the flow control signal for the read instruction returned by the external device is received, and then pauses the transmission of the read instruction.
  • the fast link protocol converter of the FPGA chip continuously transmits the buffered write instruction to the external device until the flow control signal of the write instruction returned by the external device is received, and then the transmission of the write instruction is suspended. In other words, the transmission of the read instruction and the transmission of the write instruction do not interfere with each other.
  • when the flow control signal for the read instruction is received, the transmission of the read instruction is suspended; if no flow control signal for the write instruction has been received, the transmission of write instructions continues.
  • the present application can continuously transmit the cached access instruction to the access object as long as the flow control signal returned by the access object is not received. For example, when the transmission bandwidth of the AXI bus is large and the transmission bandwidth of the CCI-P bus is small, the high-speed data transmitted by the AXI bus can be buffered, and the buffered data is continuously transmitted to the CCI-P bus as long as no flow control signal is received, which maximizes data transmission efficiency and improves throughput.
  • the technical solution provided in this application caches the access instructions transmitted between the instance and the external device through different types of buses, continuously transmits the cached access instructions to the access object, and does not stop transmitting them until it is flow-controlled, thereby realizing stream-mode transmission of access instructions and improving the efficiency of data transmission. Buffering the access instructions eliminates the bandwidth waste caused by the different bandwidths of different types of buses when instances are migrated across chips, without the instances having to be adjusted, which saves development costs.
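The independence of the read and write channels under flow control can be illustrated with a small cycle-by-cycle model. This is a sketch under stated assumptions: the `step` function, the channel names, and the representation of flow control as a set of channel names are all hypothetical and are not taken from the application.

```python
from collections import deque


def step(channels, flow_controlled):
    """One transmit cycle: each channel sends its oldest cached instruction
    unless that channel is currently flow-controlled by the access object."""
    sent = {}
    for name, fifo in channels.items():
        if fifo and name not in flow_controlled:
            sent[name] = fifo.popleft()
    return sent


channels = {
    "read": deque(["R0", "R1", "R2"]),
    "write": deque(["W0", "W1", "W2"]),
}

# Cycle 0: no flow control. Cycle 1: only the read channel is flow-controlled.
# Cycle 2: flow control on the read channel has been released.
log = [step(channels, fc) for fc in (set(), {"read"}, set())]
```

In the second cycle only the write channel sends, and in the third cycle the read channel resumes with the instruction that was held back, so ordering within each channel is preserved.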
  • the above step 350 specifically includes:
  • the access instruction buffered in the instruction storage area is continuously transmitted to the access object until the flow control signal returned by the access object to the access instruction is received.
  • the clock cycle refers to the timing of transmitting the access instruction to the access object on the bus corresponding to the access object.
  • the flow control signal can be a back pressure signal, or it can be another signal indicating that data cannot currently be processed and that data transmission should be paused.
  • the FPGA chip's fast link protocol converter continuously reads the access instruction from the instruction storage area and transmits it to the access object according to the timing; when it receives the back pressure signal sent by the access object because the access object cannot process the access instruction, it suspends the transmission of access instructions to the access object. When a signal canceling the back pressure is received, it resumes transmitting access instructions to the access object.
  • the bus corresponding to the instance is an AXI bus
  • the bus corresponding to the external device is a CCI-P bus.
  • the clock cycles of different types of buses are different.
  • the access instruction is initiated by an instance and used to access an external device.
  • This application uses a fast link protocol converter to cache the access instructions transmitted between the AXI bus and the CCI-P bus, so that the fast link protocol converter can receive the transmitted access instructions and cache them according to the clock cycle of the AXI bus. Then, the cached access instruction can be transmitted to the external device according to the clock cycle of the CCI-P bus, thereby achieving asynchronous transmission of data between different types of buses in the FPGA chip.
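How a FIFO decouples two buses that run at different clock periods can be shown with a simple discrete-time simulation. The sketch below is a software analogy only: the `simulate` function and its parameters are hypothetical, the clock periods are arbitrary integers, and the pointer synchronization that a hardware asynchronous FIFO needs across clock domains is not modeled.

```python
from collections import deque


def simulate(n_items, write_period, read_period, horizon):
    """Write one item every `write_period` ticks and read one item every
    `read_period` ticks through a shared FIFO; return the items received
    in order and the largest FIFO depth observed."""
    fifo = deque()
    received = []
    produced = 0
    max_depth = 0
    for t in range(horizon):
        # Writer side, paced by the source bus clock.
        if t % write_period == 0 and produced < n_items:
            fifo.append(produced)
            produced += 1
        max_depth = max(max_depth, len(fifo))
        # Reader side, paced by the destination bus clock.
        if t % read_period == 0 and fifo:
            received.append(fifo.popleft())
    return received, max_depth


received, max_depth = simulate(n_items=6, write_period=1, read_period=3, horizon=20)
```

With a fast writer and a slow reader, every item still arrives in order; the FIFO depth is what absorbs the rate difference, which is why the depth has to be sized for the expected burst length.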
  • the method for implementing data transmission provided by the present application further includes the following steps:
  • In step 401, the read-write feedback data returned by the access object according to the access instruction is cached in the data storage area corresponding to the access instruction;
  • the data storage area is used to store the read-write feedback data returned by the access object in response to the access instruction.
  • Read and write feedback data includes read data and write response.
  • the data storage area includes write response asynchronous FIFO and read data asynchronous FIFO.
  • the write response asynchronous FIFO is used to buffer the write response returned by the access object according to the write instruction
  • the read data asynchronous FIFO is used to buffer the read data returned by the access object according to the read instruction.
  • the fast link protocol converter of the FPGA chip caches the read-write feedback data in a data storage area corresponding to the access instruction. For example, the write response returned by the access object in response to the write instruction is buffered in the write response asynchronous FIFO, and the read data returned in response to the read instruction is buffered in the read data asynchronous FIFO.
  • In step 402, according to the initiator of the access instruction, the read-write feedback data buffered in the data storage area is continuously transmitted to the initiator until a flow control signal returned by the initiator for the read-write feedback data is received.
  • the initiator of the access instruction may be an instance or an external device.
  • when the initiator is an instance, the access object is an external device; conversely, when the initiator is an external device, the access object is an instance.
  • the fast link protocol converter of the FPGA chip sends the cached access instruction (taking the read instruction as an example) to the external device, the external device returns read data according to the read instruction, and the read data is then cached in the read data asynchronous FIFO.
  • the fast link protocol converter continuously transmits the read data buffered in the read data asynchronous FIFO to the instance according to the first-in-first-out principle, and suspends the return of the read data until it receives the flow control signal returned by the instance to the read data.
  • step 350 specifically includes:
  • when the data storage area is not full, the access instruction is continuously transmitted to the access object until a flow control signal from the access object for the access instruction is received.
  • before the FPGA chip transmits the cached access instruction to the access object, it needs to determine whether the data storage area is not full. Because the data storage area is used to cache the read-write feedback data returned by the access object according to the access instruction, if the data storage area is full, the returned read-write feedback data cannot be written to the data storage area and needs to be registered separately. Therefore, when the data storage area is not full, the FPGA chip continuously transmits the cached access instruction to the access object until it receives the flow control signal returned by the access object for the access instruction.
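The check that the data storage area is not full before an instruction is sent can be sketched as follows. The `Converter` class, its `data_capacity` parameter, and its method names are hypothetical; the sketch also ignores instructions that are already in flight, whose feedback data has not yet arrived, which a real design would need to account for.

```python
from collections import deque


class Converter:
    """Toy model: an instruction FIFO plus a bounded feedback-data FIFO."""

    def __init__(self, data_capacity):
        self.instr_fifo = deque()
        self.data_fifo = deque()
        self.data_capacity = data_capacity  # hypothetical depth of the data storage area

    def try_send(self, target_ready=True):
        """Send one cached instruction only if the access object is not
        asserting flow control and the data storage area still has room."""
        if not self.instr_fifo or not target_ready:
            return None
        if len(self.data_fifo) >= self.data_capacity:
            return None
        return self.instr_fifo.popleft()

    def on_feedback(self, data):
        """Cache read-write feedback data returned by the access object."""
        self.data_fifo.append(data)


conv = Converter(data_capacity=2)
conv.instr_fifo.extend(["R0", "R1", "R2"])

first = conv.try_send()
conv.on_feedback("D0")
second = conv.try_send()
conv.on_feedback("D1")
blocked = conv.try_send()   # data storage area is full, nothing is sent
conv.data_fifo.popleft()    # the initiator drains one entry
third = conv.try_send()
```

Gating on the feedback FIFO keeps returned data from arriving with nowhere to be stored, at the cost of briefly stalling the instruction stream.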
  • the foregoing step 310 specifically includes:
  • In step 311, an access request for reading and writing data is received, the access request being initiated through a bus by either one of the instance and the external device toward the other;
  • the access request may be initiated by an instance or an external device.
  • the fast link protocol converter of the FPGA chip receives the access request initiated by the instance to read and write data from the external device through the AXI bus or CCI-P bus corresponding to the instance running the computing service.
  • the CCI-P bus or AXI bus corresponding to the external device is used to receive an access request initiated by the external device for reading and writing data to an instance run by the computing service.
  • In step 312, the access request is processed according to a protocol conversion rule between different types of buses to obtain the corresponding access instruction.
  • the FPGA chip 110 may store protocol conversion rules between different buses in advance.
  • the access request transmitted by the instance through the AXI bus can be mapped according to the protocol conversion rules to obtain an access instruction to access the CCI-P bus.
  • the access request transmitted by the instance through the CCI-P bus may be mapped according to the protocol conversion rule to obtain an access instruction for accessing the AXI bus.
  • The CCI-P bus read address is calculated from the AXI bus read address and other control information (such as a check code) carried in the access request.
  • An access instruction containing the CCI-P bus read address is thereby obtained, and the data corresponding to that read address can then be fetched by transmitting the access instruction to the external device.
  • step 312 specifically includes:
  • In step 601, the validity of the access request is determined according to the identification information carried in the access request.
  • The identification information is used to indicate the validity of the access request.
  • When the access request carries identification information a, a read operation is indicated.
  • When the access request carries identification information b, a write operation is indicated.
  • If the access request carries neither of the above, the access request is invalid. Therefore, whether the access request is valid can be determined from the identification information it carries, and an invalid access request need not be processed.
  • In step 602, when the access request is valid, the access request containing an address signal is mapped to an access instruction containing a read-write address according to a protocol conversion rule between different types of buses.
  • The access request transmitted through the AXI bus is based on the AXI bus protocol.
  • The address signal carried in the access request is an access address based on the AXI bus protocol.
  • The address signal based on the AXI bus can be mapped to an access instruction containing a read-write address based on the CCI-P bus.
  • For example, for the address signal 111111 under the AXI bus protocol, the mapped read-write address under the CCI-P bus protocol may be 111000.
  • Likewise, for an access request transmitted through the CCI-P bus, the CCI-P-based address signal carried in the access request can be mapped, according to the protocol conversion rules between the AXI bus and the CCI-P bus, to an access instruction containing an AXI-based read-write address.
  • According to the protocol conversion rules between the AXI bus and the CCI-P bus, the fast link protocol converter of the FPGA chip may take an access request (including a read-write address) transmitted by the instance through the AXI bus and map its AXI read-write address to a read-write address for transmission through the CCI-P bus, thereby obtaining an access instruction including that read-write address.
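Steps 601 and 602 can be illustrated with a small sketch. The text does not disclose the actual protocol conversion rule, so `map_address` below is a purely hypothetical rule (clearing the low three bits), chosen only because it reproduces the 111111 to 111000 example; the class and field names are likewise assumptions.

```python
from dataclasses import dataclass
from typing import Optional

READ_ID = "a"   # identification information for a read operation (per the text)
WRITE_ID = "b"  # identification information for a write operation (per the text)


@dataclass
class AccessRequest:
    ident: Optional[str]  # identification information; None when absent
    axi_addr: int         # address signal under the AXI bus protocol


@dataclass
class AccessInstruction:
    kind: str        # "read" or "write"
    ccip_addr: int   # read-write address under the CCI-P bus protocol


def map_address(axi_addr: int) -> int:
    # HYPOTHETICAL conversion rule: clear the low three bits.
    # The real AXI-to-CCI-P mapping is not given in the source.
    return axi_addr & ~0b111


def convert(req: AccessRequest) -> Optional[AccessInstruction]:
    # Step 601: validity check based on the identification information.
    if req.ident == READ_ID:
        kind = "read"
    elif req.ident == WRITE_ID:
        kind = "write"
    else:
        return None  # invalid request: not processed
    # Step 602: map the address signal to a read-write address.
    return AccessInstruction(kind, map_address(req.axi_addr))
```

For instance, `convert(AccessRequest("a", 0b111111))` yields a read instruction whose address is `0b111000`, while a request without identification information is dropped.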
  • the foregoing step 330 specifically includes:
  • According to the access object and the instruction type indicated by the access instruction, the access instruction is continuously written into the instruction storage area corresponding to that access object and instruction type until the instruction storage area is full.
  • The access objects include external devices and instances.
  • The instruction types include read instructions and write instructions.
  • When the access instruction is a read instruction to access an external device, the read instruction is stored in the corresponding first read instruction asynchronous FIFO. When the first read instruction asynchronous FIFO is not full, read instructions are buffered into it in the order in which they are generated. Therefore, as long as no flow control signal for the read instruction is received from the external device, read instructions can be continuously read from the first read instruction asynchronous FIFO and transmitted to the external device.
  • When the access instruction is a write instruction to access an external device, the write instruction is stored in the corresponding first write instruction asynchronous FIFO.
  • When the first write instruction asynchronous FIFO is not full, write instructions are buffered into it in the order in which they are generated. Therefore, as long as no flow control signal for the write instruction is received from the external device, write instructions can be continuously read from the first write instruction asynchronous FIFO and transmitted to the external device.
  • When the access instruction is a read instruction to access the instance, the read instruction is stored in the corresponding second read instruction asynchronous FIFO.
  • When the second read instruction asynchronous FIFO is not full, read instructions are buffered into it in the order in which they are generated. Therefore, as long as no flow control signal for the read instruction is received from the instance, read instructions can be continuously read from the second read instruction asynchronous FIFO and transmitted to the instance.
  • When the access instruction is a write instruction to access the instance, the write instruction is stored in the corresponding second write instruction asynchronous FIFO.
  • When the second write instruction asynchronous FIFO is not full, write instructions are buffered into it in the order in which they are generated. Therefore, as long as no flow control signal for the write instruction is received from the instance, write instructions can be continuously read from the second write instruction asynchronous FIFO and transmitted to the instance.
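The selection of an instruction storage area by access object and instruction type can be sketched as a lookup keyed on that pair. The key strings, the helper names, and the default depth are assumptions made for illustration; the source only states that four asynchronous FIFOs exist and that writing stops when the target FIFO is full.

```python
from collections import deque


def make_fifos():
    """One instruction FIFO per (access object, instruction type); key names are hypothetical."""
    return {
        ("external_device", "read"): deque(),   # first read instruction asynchronous FIFO
        ("external_device", "write"): deque(),  # first write instruction asynchronous FIFO
        ("instance", "read"): deque(),          # second read instruction asynchronous FIFO
        ("instance", "write"): deque(),         # second write instruction asynchronous FIFO
    }


def cache_instruction(fifos, access_object, kind, payload, depth=4):
    """Append the instruction to its FIFO in generation order; refuse when the FIFO is full."""
    fifo = fifos[(access_object, kind)]
    if len(fifo) >= depth:  # assumed depth; a full FIFO means the writer must wait
        return False
    fifo.append(payload)
    return True
```

A caller that gets `False` back would hold the instruction and retry once the FIFO has drained, which mirrors the "until the instruction storage area is full" condition.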
  • FIG. 7 is a system block diagram of a bus conversion device, deployed in an FPGA chip, that converts between the AXI bus and the CCI-P bus.
  • The bus conversion device may include a fast link protocol converter 113, a first interface 701 of the AXI bus 115, and a second interface 702 of the CCI-P bus 114.
  • The fast link protocol converter 113 is connected to the first interface 701 of the AXI bus 115 and the second interface 702 of the CCI-P bus 114, so as to realize asynchronous transmission and stream-mode transmission of data between the AXI bus 115 and the CCI-P bus 114. Unless a flow control signal is received, the fast link protocol converter 113 can continuously transmit data to the interface, thereby maximizing data transmission efficiency and avoiding waste of transmission bandwidth.
  • the AXI bus 115 includes four data transmission links
  • the CCI-P bus 114 includes three data transmission links
  • The first interface 701 is composed of three independent AXI buses and one AXI-Lite bus; two of the independent AXI bus ports are masters (M: Master) and one independent AXI bus port is a slave (S: Slave).
  • The AXI-Lite bus port is a slave (S: Slave).
  • the second interface 702 is composed of three data transmission links C0, C1, and C2.
  • FIG. 8 is a detailed expanded view of the bus conversion device shown in FIG. 7.
  • The fast link protocol converter 113 includes an AXI interface 106, a CCI-P interface 105, a read link stream processing module 101, a write link stream processing module 102, a full link stream processing module 103, and a co-link stream processing module 104.
  • the CCI-P interface 105 includes three TX (transmitting ends) and two RX (receiving ends). According to the service type, it can be divided into 7 types of services, and the corresponding relationship is shown in Table 1 below.
  • Table 1 List of service types of CCI-P interface
  • the AXI interface 106 includes three AXI buses (AXI0, AXI1, AXI2) and one AXI-Lite bus, of which three AXI buses transmit high-speed data and AXI-Lite buses transmit low-speed configuration data, such as additional header information.
  • the interconnection relationship between the AXI interface 106 and the fast link protocol converter 113 is shown in Table 2 below.
  • The read link stream processing module 101 is mainly used to perform read operations from the AXI bus 115 to the CCI-P bus 114, that is, to receive a read request from the AXI bus 115, read data from the CCI-P bus 114, and return it to the AXI bus 115 (i.e., the MEM_RD and MEM_RD_RSP services).
  • the write link stream processing module 102 is mainly used to perform the write operation of the AXI bus 115 to the CCI-P bus 114, that is, to receive the write request of the AXI bus 115, write data to the CCI-P bus 114, and receive the returned write response (that is MEM_WR and MEM_WR_RSP services).
  • the full-link stream processing module 103 is mainly used to perform read and write operations from the CCI-P bus 114 to the AXI bus 115, that is, to write data to the AXI bus 115 or read data from the AXI bus 115 to the CCI-P bus 114 (that is, MMIO_WR , MMIO_RD and MMIO_RD_RSP services).
  • the co-link stream processing module 104 mainly returns the additional memory header information included in the write memory response and the read memory response transmitted by the C0.RX and C1.RX ports to the AXI interface 106 through the AXI-Lite bus. As shown in Table 3 below, it is a description of processing services of the co-link stream processing module 104.
  • When the access instruction is a read instruction to access the external device 120 and the bus corresponding to the external device 120 is the CCI-P bus 114 (refer to the implementation environment shown in FIG. 1), the read link stream processing module 101 deployed in the FPGA chip 110 processes the MEM_RD (read memory) and MEM_RD_RSP (read memory response) services, and realizes data transmission between different types of buses in the FPGA chip 110.
  • the read link stream processing module 101 includes a read instruction asynchronous FIFO and a read data asynchronous FIFO. As shown in FIG. 9, the specific workflow of the read link stream processing module 101 includes the following steps.
  • In step 901, a read request is detected.
  • When the read link stream processing module is in an idle state, it continuously detects whether there is currently a read request and determines whether the read request is valid. If the current read request is valid, it enters the stage of calculating the read request packet header.
  • In step 902, a read request packet header is calculated: the CCI-P bus read request packet header (that is, the read instruction containing the CCI-P bus read address) is calculated from the AXI bus read address carried by the read request and other control information (such as a check code).
  • In step 903, the read instruction asynchronous FIFO is written.
  • The read request packet header calculated in step 902 is written into the read instruction asynchronous FIFO.
  • In step 904, a CCI-P bus read operation (MEM_RD) is initiated.
  • MEM_RD CCI-P bus read operation
  • In step 905, the read data returned by the CCI-P bus is received.
  • In step 906, when read back pressure is not asserted on the AXI interface, the corresponding read data is returned to the AXI bus.
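The read link workflow above (steps 901 to 906) can be modelled in a few lines. This is a behavioural sketch only: the stages run one after another here, whereas the hardware stages run concurrently, and the names `read_link`, `axi_ready`, and `to_ccip_addr` as well as the dictionary-based request format are assumptions.

```python
from collections import deque


def read_link(requests, memory, axi_ready, to_ccip_addr):
    """Model of the read link: returns (data handed to the AXI side, data still buffered)."""
    instr_fifo, data_fifo, returned = deque(), deque(), []
    for req in requests:
        if not req.get("valid"):                      # step 901: skip invalid read requests
            continue
        header = {"addr": to_ccip_addr(req["addr"])}  # step 902: compute the CCI-P read header
        instr_fifo.append(header)                     # step 903: buffer the read instruction
    while instr_fifo:
        header = instr_fifo.popleft()                 # step 904: issue MEM_RD in FIFO order
        data_fifo.append(memory[header["addr"]])      # step 905: buffer the returned read data
    while data_fifo and axi_ready():                  # step 906: return data while the AXI
        returned.append(data_fifo.popleft())          #           side applies no back pressure
    return returned, list(data_fifo)
```

If `axi_ready` reports back pressure after the first beat, the remaining read data simply stays in the read data FIFO rather than being lost.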
  • The full link stream processing module 103 deployed in the FPGA chip 110 processes the MMIO_RD (memory-mapped I/O read) and MMIO_RD_RSP (memory-mapped I/O read response) services.
  • The full link stream processing module 103 includes a read instruction asynchronous FIFO and a read data asynchronous FIFO.
  • As shown in FIG. 10, the specific workflow by which the full link stream processing module 103 processes the MMIO_RD and MMIO_RD_RSP services to complete a read request includes the following steps.
  • In step 1001, a read request received by the CCI-P C0.RX port is detected.
  • The link continuously detects whether there is currently a read request, and if the current read request is valid, it proceeds to step 1002.
  • In step 1002, the CCI-P bus read address is registered, and the AXI bus read address is calculated based on the CCI-P read address.
  • In step 1003, when the read instruction asynchronous FIFO is not full, the AXI bus read address is written into the read instruction asynchronous FIFO.
  • In step 1004, when the read data asynchronous FIFO is not full and read back pressure is not asserted on the AXI interface, a read operation is initiated on the AXI bus.
  • In step 1005, the read data returned by the AXI interface is written into the read data asynchronous FIFO.
  • In step 1006, when back pressure is not asserted on the CCI-P C2.TX port, the corresponding read data is returned to the CCI-P bus.
  • Because the read link stream processing module 101 and the full link stream processing module 103 of the FPGA chip 110 both include a read instruction asynchronous FIFO and a read data asynchronous FIFO, the AXI bus 115 and the CCI-P bus 114 support the transmission of data and instructions between asynchronous clock domains, and the read link stream processing module 101 and the full link stream processing module 103 support stream-mode data transmission.
  • When a FIFO is not full, instructions or data may be continuously written in stream mode, and instructions or data are continuously output when there is no back pressure, which maximizes access efficiency and improves throughput.
  • When the access instruction is a write instruction to access the external device 120 and the bus corresponding to the external device 120 is the CCI-P bus 114 (refer to the implementation environment shown in FIG. 1), the write link stream processing module 102 deployed in the FPGA chip 110 processes the MEM_WR (write memory) and MEM_WR_RSP (write response) services.
  • the write link stream processing module 102 includes a write instruction asynchronous FIFO and a write response asynchronous FIFO. As shown in FIG. 11, the specific workflow of the write link stream processing module 102 includes the following steps.
  • In step 1101, a write request is detected.
  • The write link stream processing module continuously detects whether there is currently a write request, and if the current write request is valid, it enters the MEM_WR and MEM_WR_RSP phases at the same time.
  • For the MEM_WR phase, when the write request is valid, the process proceeds to step 1102 to register the AXI bus write address and data. After the registration is completed, the write request packet header is calculated.
  • In step 1103, the write request packet header is calculated: a CCI-P bus write request packet header (including the CCI-P bus write address) is calculated from the current AXI bus write address and other control information.
  • In step 1104, when the write instruction asynchronous FIFO is not full, the write address and write data (that is, the write instruction) are written into the write instruction asynchronous FIFO.
  • In step 1105, a CCI-P bus write operation is initiated.
  • The MEM_WR operation is initiated on the CCI-P C1.TX port.
  • In step 1107, a write response returned by the CCI-P bus is received.
  • In step 1108, when the write response asynchronous FIFO is not full, the write response (MEM_WR_RSP) returned on CCI-P C1.RX is written into the write response asynchronous FIFO.
  • In step 1109, when back pressure is not asserted on the write response interface of the AXI bus, the corresponding write response is returned to the AXI bus.
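The write link can be sketched as two bounded queues, one for write instructions and one for write responses, advanced once per loop iteration. This is an assumed software model: the FIFO depth, the request dictionaries, and the `to_ccip_addr` helper are hypothetical, and AXI-side back pressure on the response path is left out for brevity.

```python
from collections import deque


def write_link(requests, device_memory, to_ccip_addr, fifo_depth=4):
    """Model of the write link: returns the write responses delivered to the AXI side."""
    instr_fifo, rsp_fifo, responses = deque(), deque(), []
    pending = deque(r for r in requests if r.get("valid"))  # step 1101: keep valid requests
    while pending or instr_fifo or rsp_fifo:
        # Steps 1102-1104: compute the CCI-P header and buffer the write instruction
        # only while the write instruction FIFO has room.
        if pending and len(instr_fifo) < fifo_depth:
            req = pending.popleft()
            instr_fifo.append((to_ccip_addr(req["addr"]), req["data"]))
        # Steps 1105-1108: issue MEM_WR and buffer the write response if there is room.
        if instr_fifo and len(rsp_fifo) < fifo_depth:
            addr, data = instr_fifo.popleft()
            device_memory[addr] = data
            rsp_fifo.append({"addr": addr, "ok": True})
        # Step 1109: return one write response (no AXI back pressure modelled here).
        if rsp_fifo:
            responses.append(rsp_fifo.popleft())
    return responses
```

Because each queue is drained in arrival order, the responses come back in the same order as the valid write requests were issued.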
  • The full link stream processing module 103 deployed in the FPGA chip 110 processes the MMIO_WR (memory-mapped I/O write) service. The full link stream processing module 103 supports parallel read and write operations: in addition to the read instruction asynchronous FIFO and the read data asynchronous FIFO, it also includes a write instruction asynchronous FIFO. As shown in FIG. 12, the specific workflow by which the full link stream processing module 103 processes the MMIO_WR service includes the following steps.
  • In step 1201, a write request on CCI-P C0.RX is detected.
  • The link continuously detects whether there is currently a write request, and if the current write request is valid, it proceeds to step 1202.
  • In step 1202, the CCI-P write address and data carried in the write request are registered.
  • The process then proceeds to step 1203.
  • In step 1203, the AXI write address is calculated according to the registered CCI-P write request packet header.
  • In step 1204, when the write instruction asynchronous FIFO is not full, the write data and the AXI write address (that is, the write instruction) are written into the write instruction asynchronous FIFO.
  • In step 1205, when back pressure is not asserted on the AXI bus write port, a write operation is performed on the AXI bus until completion.
  • The full link stream processing module 103 includes a write instruction asynchronous FIFO that caches the write data and the write address, so the AXI interface 106 and the CCI-P interface 105 of the FPGA chip 110 support the transmission of data and instructions between asynchronous clock domains; the write link stream processing module 102 and the full link stream processing module 103 support stream-mode data transmission.
  • Instructions or data can be continuously written in stream mode.
  • Instructions or data are continuously output when there is no back pressure, which maximizes access efficiency, improves throughput, and overcomes the bandwidth waste caused by bandwidth mismatch.
  • FIG. 13 is a schematic flowchart of a method for implementing data transmission provided by an exemplary embodiment of the present application. It is assumed that the bus used by the FPGA chip itself is a CCI-P bus, and the bus used by an instance ported from another chip is an AXI bus. As shown in FIG. 13, the method for implementing data transmission may include the following steps. For the case where the FPGA chip uses the AXI bus and the instance uses the CCI-P bus, reference may be made to the current embodiment.
  • In step 1301, the fast link protocol converter of the FPGA chip receives a read-write request transmitted by the instance through the AXI bus (or by the external device through the CCI-P bus).
  • In step 1302, whether the read-write request is valid is determined according to the identification information carried in the request.
  • In step 1303, when the request is valid, it is mapped, according to the protocol conversion rule between the AXI bus and the CCI-P bus, to a read-write instruction including a read-write address.
  • Read instructions are continuously buffered while the read instruction asynchronous FIFO is not full; as long as no flow control signal for the read instruction is returned by the external device through the CCI-P bus (or by the instance through the AXI bus), the buffered read instructions are continuously transmitted, on a first-in-first-out basis, to the external device through the CCI-P bus (or to the instance through the AXI bus).
  • Write instructions are continuously buffered while the write instruction asynchronous FIFO is not full; as long as no flow control signal for the write instruction is returned by the external device through the CCI-P bus (or by the instance through the AXI bus), the buffered write instructions are continuously transmitted, on a first-in-first-out basis, to the external device through the CCI-P bus (or to the instance through the AXI bus).
  • The following are device embodiments of the present application, which can be used to carry out the method embodiments for implementing data transmission performed by the fast link protocol converter in the FPGA chip described above.
  • For details not disclosed in the device embodiments, refer to the method embodiments for implementing data transmission of the present application.
  • FIG. 14 is a block diagram of a device for implementing data transmission according to an exemplary embodiment.
  • The device can be used in the FPGA chip 110 of the implementation environment shown in FIG. 1 to execute all or part of the steps of the method for implementing data transmission shown in any of FIGS. 3 to 6 and FIGS. 9 to 13.
  • The device includes, but is not limited to, an instruction acquisition module 1310, an instruction cache module 1330, and an instruction transmission module 1350.
  • The instruction acquisition module 1310 is configured to obtain an access instruction for reading and writing data, where the access instruction is initiated toward either end over the bus between the instance and the external device;
  • The instruction cache module 1330 is configured to cache the access instruction in the instruction storage area corresponding to the access instruction;
  • The instruction transmission module 1350 is configured to continuously transmit the access instruction cached in the instruction storage area to the access object indicated by the access instruction, and to stop transmitting the access instruction only when flow control is applied.
  • the instruction acquisition module 1310, the instruction cache module 1330, and the instruction transmission module 1350 may be functional modules, and are configured to perform corresponding steps in the foregoing method for implementing data transmission. It can be understood that these modules can be implemented by hardware, software, or a combination of both. When implemented in hardware, these modules may be implemented as one or more hardware modules, such as one or more application specific integrated circuits. When implemented in software, these modules may be implemented as one or more computer programs executing on one or more processors.
  • the instruction transmission module 1350 includes, but is not limited to:
  • An asynchronous transmission unit configured to continuously transmit the access instruction buffered in the instruction storage area to the access object, according to the clock cycle used for transmitting access instructions to the access object, until a flow control signal returned by the access object for the access instruction is received.
  • the device for implementing data transmission further includes, but is not limited to:
  • a data cache module 1370 configured to cache the read and write feedback data returned by the access object according to the access instruction in a data storage area corresponding to the access instruction;
  • a data transmission module 1390 configured to continuously transmit the read-write feedback data buffered in the data storage area to the initiator of the access instruction until a flow control signal returned by the initiator for the read-write feedback data is received.
  • the instruction transmission module 1350 includes, but is not limited to:
  • the continuous transmission unit is configured to continuously transmit the access instruction to the access object when the data storage area is not full until a flow control signal of the access object to the access instruction is received.
  • The instruction acquisition module 1310 includes, but is not limited to:
  • the request receiving unit 1311, configured to receive an access request for reading and writing data, where the access request is initiated toward either end over the bus between the instance and the external device;
  • the protocol conversion unit 1312 is configured to process the access request to obtain a corresponding access instruction according to a protocol conversion rule between different types of buses.
  • the request receiving unit 1311 includes, but is not limited to:
  • a first subunit configured to receive, through an AXI bus or a CCI-P bus corresponding to an instance running a computing service, an access request initiated by the instance for reading and writing data to and from an external device;
  • the second subunit is configured to receive, through a CCI-P bus or an AXI bus corresponding to the external device, an access request initiated by the external device for reading and writing data to an instance run by a computing service.
  • the protocol conversion unit 1312 includes, but is not limited to:
  • a judging subunit configured to judge the validity of the access request according to the identification information carried in the access request
  • the conversion subunit is configured to map an access request including an address signal to an access instruction including a read-write address according to a protocol conversion rule between different types of buses when the access request is valid.
  • the instruction cache module 1330 includes, but is not limited to:
  • a continuous cache unit configured to continuously write the access instruction into an instruction storage area corresponding to the access object and the instruction type according to the access object and the instruction type indicated by the access instruction, until the instruction storage area is full .
  • The present application further provides an electronic device that can be used in the FPGA chip 110 of the implementation environment shown in FIG. 1 to execute all or part of the steps of the method for implementing data transmission shown in any of FIGS. 3 to 6 and FIGS. 9 to 13.
  • The electronic device may include:
  • a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to execute the method for implementing data transmission described in the foregoing embodiments.
  • A storage medium is also provided. The storage medium is a computer-readable storage medium and may be, for example, a transitory or non-transitory computer-readable storage medium including instructions.
  • the storage medium stores a computer program, which can be executed by the FPGA chip 110 in the implementation environment shown in FIG. 1 to complete the above-mentioned method for implementing data transmission.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Bus Control (AREA)

Abstract

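A method and apparatus for implementing data transmission, an electronic device, and a computer-readable storage medium. The method is applied to transmission between the different types of buses that respectively correspond to an instance run by a computing service and an external device, and includes: obtaining an access instruction for reading and writing data, the access instruction being initiated toward either end over the bus between the instance and the external device (310); caching the access instruction in the instruction storage area corresponding to the access instruction (330); and, according to the access object indicated by the access instruction, continuously transmitting the access instruction cached in the instruction storage area to the access object, stopping the transmission of the access instruction only when flow control is applied (350).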

Description

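Method and apparatus for implementing data transmission, electronic device, and computer-readable storage medium
This application claims priority to Chinese Patent Application No. 201810551660.7, filed with the Chinese Patent Office on May 31, 2018 and entitled "Method and apparatus for implementing data transmission, and electronic device", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of integrated circuit technology, and in particular to a method and apparatus for implementing data transmission, an electronic device, and a computer-readable storage medium.
Background
With the rapid development of deep-submicron manufacturing technology and design technology for integrated circuits, integrated circuits have entered the system-on-chip era. A system on chip is a system-level integrated circuit (SoC, System on Chip). Meanwhile, IC design methodology has evolved from a timing-driven approach to an approach based on the reuse of IP (Intellectual Property core) resources.
FPGA (field-programmable gate array) chip vendors have defined different internal bus interconnect standards, and different types of buses have different transmission bandwidths. As a result, when a user design based on a first bus in chip A is ported to chip B, which is based on a second bus, data transmitted through the first bus and the second bus can only be transmitted according to the bus standard with the smaller bandwidth, because the two buses have different transmission bandwidths. This reduces data transmission efficiency and wastes the transmission bandwidth of the other bus. Re-architecting the user design to fit the new bus bandwidth, according to the transmission bandwidth of the second bus, takes a great deal of time and effort, and the development cost is high.
In summary, because different types of buses have different transmission bandwidths, porting a user design across chips reduces data transmission efficiency and wastes bus bandwidth.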
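Summary
To solve the problem in the related art that, because different types of buses have different transmission bandwidths, porting a user design across chips reduces data transmission efficiency and wastes bus bandwidth, this application provides a method for implementing data transmission.
In one aspect, this application provides a method for implementing data transmission, performed by an electronic device. The method is applied to transmission between the different types of buses that respectively correspond to an instance run by a computing service and an external device, and includes:
obtaining an access instruction for reading and writing data, the access instruction being initiated toward either end over the bus between the instance and the external device;
caching the access instruction in the instruction storage area corresponding to the access instruction;
according to the access object indicated by the access instruction, continuously transmitting the access instruction cached in the instruction storage area to the access object, and stopping the transmission of the access instruction only when flow control is applied.
In another aspect, this application further provides an apparatus for implementing data transmission. The apparatus is applied to transmission between the different types of buses that respectively correspond to an instance run by a computing service and an external device, and includes:
an instruction acquisition module, configured to obtain an access instruction for reading and writing data, the access instruction being initiated toward either end over the bus between the instance and the external device;
an instruction cache module, configured to cache the access instruction in the instruction storage area corresponding to the access instruction;
an instruction transmission module, configured to continuously transmit, according to the access object indicated by the access instruction, the access instruction cached in the instruction storage area to the access object, and to stop the transmission of the access instruction only when flow control is applied.
Further, this application also provides an electronic device, the electronic device including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the above method for implementing data transmission.
In addition, this application further provides a computer-readable storage medium storing processor-executable instructions which, when executed by one or more processors, carry out the above method for implementing data transmission.
It should be understood that the above general description and the following detailed description are merely exemplary and do not limit this application.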
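Brief Description of the Drawings
The accompanying drawings are incorporated into and constitute a part of this specification, illustrate embodiments consistent with this application, and together with the specification serve to explain the principles of this application.
FIG. 1 is a schematic diagram of the implementation environment after an AXI-based instance is ported to a CCI-P-based FPGA chip;
FIG. 2 is a schematic diagram of the implementation environment after a CCI-P-based instance is ported to an AXI-based FPGA chip;
FIG. 3 is a flowchart of a method for implementing data transmission according to an exemplary embodiment;
FIG. 4 is a flowchart of a method for implementing data transmission provided by another exemplary embodiment on the basis of the embodiment corresponding to FIG. 3;
FIG. 5 is a detailed flowchart of step 310 in the embodiment corresponding to FIG. 3;
FIG. 6 is a detailed flowchart of step 312 in the embodiment corresponding to FIG. 5;
FIG. 7 is a system block diagram of the bus conversion device, deployed in an FPGA chip, that converts between the AXI bus and the CCI-P bus;
FIG. 8 is a detailed expanded view of the bus conversion device shown in FIG. 7;
FIG. 9 is a schematic flowchart of the read link stream processing module in the fast link protocol converter processing a read request, according to an exemplary embodiment;
FIG. 10 is a schematic flowchart of the full link stream processing module in the fast link protocol converter processing a read request, according to an exemplary embodiment;
FIG. 11 is a schematic flowchart of the write link stream processing module in the fast link protocol converter processing a write request, according to an exemplary embodiment;
FIG. 12 is a schematic flowchart of the full link stream processing module in the fast link protocol converter processing a write request, according to an exemplary embodiment;
FIG. 13 is a complete schematic flowchart of the method for implementing data transmission provided by an exemplary embodiment of this application;
FIG. 14 is a block diagram of an apparatus for implementing data transmission according to an exemplary embodiment;
FIG. 15 is a block diagram of an apparatus for implementing data transmission according to another exemplary embodiment on the basis of the embodiment corresponding to FIG. 14;
FIG. 16 is a detailed block diagram of the instruction acquisition module in the embodiment corresponding to FIG. 14.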
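Detailed Description
Exemplary embodiments are described in detail here, examples of which are shown in the accompanying drawings. When the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this application. Rather, they are merely examples of apparatuses and methods consistent with some aspects of this application as detailed in the appended claims.
At present, the mainstream FPGA (field-programmable gate array) chip vendors are Intel and Xilinx. Xilinx generally uses the AXI bus to interconnect the functional modules inside the chip, whereas Intel uses the CCI-P bus. Because FPGA chips from different vendors use different types of bus interconnect, and in order to minimize secondary development by users, this application adds a fast link protocol converter to the FPGA chip to convert between the CCI-P and AXI buses, so that an AXI-based instance in a Xilinx FPGA chip can be conveniently ported to an Intel FPGA chip based on the CCI-P bus.
FIG. 1 is a schematic diagram of the implementation environment after an AXI-based instance is ported to a CCI-P-based FPGA chip. As shown in FIG. 1, the implementation environment includes an FPGA chip 110 and an external device 120 for running application programs; the external device 120 and the FPGA chip 110 are interconnected through a PCIe bus.
The instance 111 is a program, ported from a chip based on the AXI bus 115, that is run by a computing service such as deep learning, graphics and image compression, or genomics. As shown in FIG. 1, the fast link protocol converter 113 of this application can be deployed in the FPGA chip 110, so that when the instance 111 based on the AXI bus 115 is ported to the FPGA chip 110 based on the CCI-P bus 114, it can be bridged to the CCI-P bus 114 of the FPGA chip 110.
As shown in FIG. 1, the fast link protocol converter 113 includes multiple FIFOs (first-in-first-out data buffers), which respectively buffer the data sent by the external device 120 to the instance 111 and the data sent by the instance 111 to the external device 120. Even though the AXI bus 115 and the CCI-P bus 114 transmit data with different clock periods and different bandwidths, buffering the data transmitted between them avoids wasting transmission bandwidth when high-speed data must be transferred. In addition, buffered data is continuously transmitted to the receiver as long as no flow control signal is received, which realizes stream-mode transmission and improves data transmission efficiency. Furthermore, data can be output according to the receiver's clock period, which realizes asynchronous data transmission.
As shown in FIG. 1, the fast link protocol converter 113 may include a read link stream processing module 101, a write link stream processing module 102, a full link stream processing module 103, and a co-link stream processing module 104, each of which contains FIFOs.
The read link stream processing module 101 can receive a read request transmitted by the instance 111 through the AXI bus 115 and process the read request to obtain a read instruction. The read instruction is buffered in the read instruction asynchronous FIFO and then asynchronously transmitted to the CCI-P bus 114 in stream mode, and the read data returned by the external device 120 in response to the read instruction is obtained. The read link stream processing module 101 buffers the read data in the read data asynchronous FIFO and then asynchronously transmits it in stream mode through the AXI bus 115 to the instance 111 run by the computing service, so that the instance 111 can analyze and process the received read data.
The write link stream processing module 102 can receive a write request transmitted by the instance 111 through the AXI bus 115 and process the write request to obtain a write instruction. The write instruction is buffered in the write instruction asynchronous FIFO and then asynchronously transmitted to the CCI-P bus 114 in stream mode; the external device 120 writes the write data carried by the write instruction and returns a write response. The write link stream processing module 102 buffers the write response in the write response asynchronous FIFO and then asynchronously transmits it in stream mode through the AXI bus 115 to the instance 111 run by the computing service. The write data may be the result obtained by the instance 111 analyzing and processing the read data described above.
The full link stream processing module 103 supports parallel reads and writes. It can receive, in parallel, read and write requests transmitted by the external device 120 through the CCI-P bus 114 and process them to obtain read and write instructions. It buffers read instructions in the read instruction asynchronous FIFO and write instructions in the write instruction asynchronous FIFO, then asynchronously transmits the read or write instructions to the AXI bus 115 in stream mode, and obtains the read data returned by the instance 111 in response to a read instruction or the write response returned in response to a write instruction. The full link stream processing module 103 can further buffer the write response or read data and asynchronously transmit it in stream mode through the CCI-P bus 114 to the external device 120.
The co-link stream processing module 104 is used to transmit low-speed configuration information to the AXI bus 115. For example, it can assist the read link stream processing module 101 and the write link stream processing module 102 by returning to the AXI bus 115 the packet header information of read data and of write responses, and by transmitting check codes and the like.
In the prior art, porting a user design for a Xilinx chip (that is, the instance 111) to an Intel chip requires redesigning the interfaces and re-architecting the user design to fit the bus bandwidth and timing of the Intel chip, which incurs considerable design cost. The method provided by this application implements data transmission between different types of buses, so that user designs can be ported across chips, which saves development cost.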
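How a FIFO decouples a writer and a reader that run at different clock periods can be illustrated with a toy tick-based simulation. Everything in it is an assumption made for illustration (integer clock periods, a single shared tick counter, the function name `simulate`); a real asynchronous FIFO crosses clock domains in hardware and is not modelled at this level.

```python
from collections import deque


def simulate(n_items, write_period, read_period, depth, max_ticks=10_000):
    """Toy model: a writer and a reader with different clock periods share one FIFO.

    Returns (items in the order received, number of writer stalls, ticks elapsed).
    """
    fifo, out, produced, stalls = deque(), [], 0, 0
    for tick in range(max_ticks):
        if len(out) == n_items:
            return out, stalls, tick
        # Writer clock edge: push the next item unless the FIFO is full.
        if produced < n_items and tick % write_period == 0:
            if len(fifo) < depth:
                fifo.append(produced)
                produced += 1
            else:
                stalls += 1  # FIFO full: the writer is held off for this edge
        # Reader clock edge: pop one item, in first-in-first-out order.
        if fifo and tick % read_period == 0:
            out.append(fifo.popleft())
    raise RuntimeError("simulation did not finish within max_ticks")
```

In this model a deeper FIFO absorbs short bursts from the faster side, so the writer stalls less often, while the reader always receives the items in the order they were written.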
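FIG. 2 is a schematic diagram of the implementation environment after a CCI-P-based instance is ported to an AXI-based FPGA chip. The implementation environment includes an FPGA chip 110 and an external device 120 for running an application program; the external device 120 and the FPGA chip 110 are interconnected through a PCIe bus.
The instance 111 is a program, ported from a chip based on the CCI-P bus 114, that runs a computing service such as deep learning, graphics and image compression, or genomics. As shown in FIG. 2, the fast link protocol converter 113 of this application can be deployed in the FPGA chip 110 so that, when the instance 111 based on the CCI-P bus 114 is ported to the FPGA chip 110 based on the AXI bus 115, it can interface with the AXI bus 115 of the FPGA chip 110.
As in FIG. 1, the fast link protocol converter includes multiple FIFOs, which buffer the data sent by the external device 120 to the instance 111 and the data sent by the instance 111 to the external device 120, respectively, thereby realizing asynchronous and streaming-mode transmission of data between the AXI bus 115 and the CCI-P bus 114.
As shown in FIG. 2, the fast link protocol converter 113 includes a read-link stream processing module 101, a write-link stream processing module 102, a full-link stream processing module 103, and a co-link stream processing module 104, each of which contains FIFOs.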
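The read-link stream processing module 101 can receive a read request transmitted by the external device 120 through the AXI bus 115 and process it into a read instruction. The read instruction is buffered in a read-instruction asynchronous FIFO and then transmitted asynchronously, in streaming mode, to the CCI-P bus 114 to obtain the read data that the instance 111 returns in response. The module further buffers the read data in a read-data asynchronous FIFO and transmits it asynchronously, in streaming mode, through the AXI bus 115 to the external device 120, so that the external device 120 can analyze and process the received read data.
The write-link stream processing module 102 can receive a write request transmitted by the external device 120 through the AXI bus 115 and process it into a write instruction. The write instruction is buffered in a write-instruction asynchronous FIFO and then transmitted asynchronously, in streaming mode, to the CCI-P bus 114 to obtain the write response that the instance 111 returns in response. The module can further buffer the write response in a write-response asynchronous FIFO and transmit it asynchronously, in streaming mode, through the AXI bus 115 to the external device 120. The write data may be the result obtained by the external device 120 from analyzing and processing the read data mentioned above.
The full-link stream processing module 103 supports parallel reads and writes. It can receive, in parallel, read and write requests transmitted by the instance 111 through the CCI-P bus 114 and process them into read and write instructions. Read instructions are buffered in a read-instruction asynchronous buffer and write instructions in a write-instruction asynchronous buffer, and are transmitted asynchronously, in streaming mode, to the AXI bus 115 to obtain the read data that the external device 120 returns in response to a read instruction. The module can further buffer the read data and transmit it asynchronously, in streaming mode, through the CCI-P bus 114 to the instance 111.
The co-link stream processing module 104 is used to transmit low-speed configuration information to the AXI bus 115. For example, it can assist the read-link stream processing module 101 and the write-link stream processing module 102 by returning to the AXI bus 115 the header information of read data and of write responses, and by transmitting check codes and the like.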
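FIG. 3 is a flowchart of a method for implementing data transmission according to an exemplary embodiment. The method can be used, for example, in the FPGA chip 110 of the implementation environment shown in FIG. 1 to carry out transmission between the different types of buses that correspond to the instance running the computing service and to the external device. As shown in FIG. 3, the method is performed by the electronic device described later and may include the following steps.
In step 310, an access instruction for reading or writing data is obtained; the access instruction is initiated through a bus toward either end between the instance and the external device.
Here, a computing service is a data processing function deployed on the FPGA chip, such as deep learning, graphics and image compression, or genomics. An instance is the program module that performs the computing service. The external device is, relative to the FPGA chip, a terminal device that runs an application program, such as the host of a computer. The bus used inside the FPGA chip itself may be an AXI bus or a CCI-P bus, and the instance may have been ported from another design and may use a different type of bus. For example, when the FPGA chip uses the AXI bus, the ported instance may use the CCI-P bus; when the FPGA chip uses the CCI-P bus, the ported instance may use the AXI bus. The method provided by this application can be used to carry out transmission between the CCI-P bus and the AXI bus.
It should be noted that the access instruction may be generated from an access request, and that access instructions include read instructions and write instructions. The access request may be initiated by the instance to read or write data on the external device through the AXI bus and the CCI-P bus, or it may be initiated by the external device to read or write data on the instance through the AXI bus and the CCI-P bus. Because the AXI bus and the CCI-P bus differ in timing and bandwidth, the FPGA chip uses the fast link protocol converter to convert between them. The fast link protocol converter can receive an access request sent by the instance or the external device and process that access request to obtain an access instruction.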
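In step 330, the access instruction is buffered in the instruction storage area corresponding to the access instruction.
It should be noted that buffering an access instruction in its corresponding instruction storage area means storing together the access instructions aimed at the same access object. The instruction storage area may be asynchronous FIFOs deployed in the FPGA chip: write instructions for the same access object are stored in a write-instruction asynchronous FIFO, and read instructions for the same access object are stored in a read-instruction asynchronous FIFO. Access instructions are stored in the instruction storage area in the order in which they were generated and, as long as no backpressure signal has been received, are read out and output in order on a first-in, first-out basis.
Specifically, the fast link protocol converter of the FPGA chip stores the access instructions obtained by processing access requests, in order, in the corresponding instruction storage areas. Read instructions with which the FPGA chip accesses the external device are buffered in a first FIFO, and write instructions with which the FPGA chip accesses the external device in a second FIFO; read instructions with which the external device accesses the instance in the FPGA chip are buffered in a third FIFO, and write instructions with which the external device accesses the instance in a fourth FIFO.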
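In step 350, according to the access object indicated by the access instruction, the access instructions buffered in the instruction storage area are continuously transmitted to the access object, and transmission of the access instructions stops only when flow control is applied.
It should be noted that if the access instruction is generated from an access request sent by the instance, the access object is the external device; if it is generated from an access request sent by the external device, the access object is the instance in the FPGA chip. The bus corresponding to the instance may be the AXI bus, with the CCI-P bus corresponding to the external device; alternatively, the bus corresponding to the instance is the CCI-P bus and the bus corresponding to the external device is the AXI bus. The AXI bus is an on-chip bus proposed by ARM for high performance, high bandwidth, and low latency. The CCI-P bus is a bus proposed by Intel for on-chip cache applications. The method provided by this application converts between the CCI-P bus and the AXI bus, so that instances from other FPGA chips can be conveniently ported to the current FPGA chip 110.
Here, being subject to flow control means that transmission of access instructions has to be paused because the access object cannot keep up with processing the data. In one embodiment, if a flow-control signal returned by the access object for the access instruction is received, transmission of the access instruction is considered to be flow-controlled and is stopped. When the access object has no spare capacity to process access instructions, it sends a flow-control signal to the FPGA chip. As long as the fast link protocol converter of the FPGA chip has not received a flow-control signal from the access object for the access instruction, it keeps transmitting the buffered access instructions to the access object through the bus corresponding to the access object, which realizes streaming-mode transmission and improves transmission efficiency. Only when a flow-control signal for the access instruction is received does it stop transmitting access instructions to the access object.
In one embodiment, the access object is the external device. The fast link protocol converter of the FPGA chip keeps transmitting the buffered read instructions to the external device until it receives a flow-control signal for read instructions returned by the external device, at which point it pauses the transmission of read instructions. Likewise, it keeps transmitting the buffered write instructions to the external device until it receives a flow-control signal for write instructions returned by the external device, at which point it pauses the transmission of write instructions. In other words, the transmission of read instructions and the transmission of write instructions do not interfere with each other: when a flow-control signal for read instructions is received, the transmission of read instructions is paused, but if no flow-control signal for write instructions has been received, write instructions can still be transmitted.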
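The independence of read and write flow control described above can be sketched in software. This is only an illustrative model, not the hardware design: the class name `Channel`, the flag `flow_controlled`, and the instruction labels are assumptions introduced here.

```python
from collections import deque


class Channel:
    """One instruction FIFO plus the flow-control state of its target (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.fifo = deque()
        self.flow_controlled = False  # set while the access object asserts flow control

    def drain(self, max_items):
        """Send up to max_items buffered instructions in FIFO order unless flow-controlled."""
        sent = []
        while self.fifo and not self.flow_controlled and len(sent) < max_items:
            sent.append(self.fifo.popleft())
        return sent


read_ch, write_ch = Channel("read"), Channel("write")
read_ch.fifo.extend(["RD0", "RD1"])
write_ch.fifo.extend(["WR0", "WR1"])

read_ch.flow_controlled = True   # the access object throttles reads only
sent_reads = read_ch.drain(4)    # nothing leaves the read FIFO
sent_writes = write_ch.drain(4)  # writes keep streaming
```

Keeping one flag per channel is what lets a stalled read path leave the write path untouched.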
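It should be emphasized that the AXI bus and the CCI-P bus are different types of buses with different bandwidths, so when high-speed data needs to be transferred, the transfer rate is normally limited to that of the bus with the smaller bandwidth, which wastes transmission bandwidth. This application buffers the access instructions and, as long as no flow-control signal has been returned by the access object, continuously transmits the buffered access instructions to the access object. For example, when the AXI bus has the larger bandwidth and the CCI-P bus the smaller, the high-speed data arriving over the AXI bus can be buffered and, while no flow-control signal is received, continuously forwarded to the CCI-P bus, which maximizes data transmission efficiency and improves throughput.
In the technical solution provided by this application, the access instructions transmitted between the instance and the external device over different types of buses are buffered, and the buffered access instructions are continuously transmitted to their access object until flow control is applied. This realizes streaming-mode transmission of access instructions and improves data transmission efficiency. Buffering the access instructions also overcomes the bandwidth waste caused by the differing bandwidths of different bus types when an instance is ported across chips, so the instance does not need to be adjusted and development cost is saved.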
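In an exemplary embodiment, step 350 specifically includes:
According to the clock cycle used for transmitting the access instruction to the access object, continuously transmitting the access instructions buffered in the instruction storage area to the access object until a flow-control signal returned by the access object for the access instruction is received.
Here, the clock cycle refers to the timing with which the bus corresponding to the access object transmits access instructions to the access object. The flow-control signal may be a backpressure signal or any other signal indicating that data transmission must be paused because the data cannot be processed in time. The fast link protocol converter of the FPGA chip follows this timing, continuously reading access instructions from the instruction storage area and transmitting them to the access object until it receives a backpressure signal that the access object sends because it cannot keep up with the access instructions, at which point it pauses transmission to the access object. When a signal withdrawing the backpressure is received, it resumes transmitting access instructions to the access object.
Assume that the bus corresponding to the instance is the AXI bus and the bus corresponding to the external device is the CCI-P bus, and that the two bus types have different clock cycles. Assume further that the access instruction is initiated by the instance to access the external device. This application uses the fast link protocol converter to buffer the access instructions transmitted between the AXI bus and the CCI-P bus, so the converter can receive and buffer incoming access instructions according to the clock cycle of the AXI bus and then transmit the buffered access instructions to the external device according to the clock cycle of the CCI-P bus. This realizes asynchronous transmission of data between different types of buses inside the FPGA chip.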
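A small discrete-time sketch can illustrate how a FIFO lets each side run on its own clock. It is a simplification under stated assumptions: the function `simulate`, the integer clock periods, and the instruction labels are hypothetical, and a real asynchronous FIFO additionally needs clock-domain-crossing logic that is not modeled here.

```python
from collections import deque


def simulate(instructions, write_period, read_period, horizon):
    """Push on source-clock edges and pop on target-clock edges (illustrative model only)."""
    fifo = deque()
    pending = deque(instructions)
    delivered = []  # (time, instruction) pairs as seen by the access object
    for t in range(horizon):
        if t % write_period == 0 and pending:  # edge of the source-bus clock
            fifo.append(pending.popleft())
        if t % read_period == 0 and fifo:      # edge of the target-bus clock
            delivered.append((t, fifo.popleft()))
    return delivered


# Source side clocks every 2 time units, target side every 3.
out = simulate(["I0", "I1", "I2"], write_period=2, read_period=3, horizon=12)
```

The instructions arrive at the target in their original order and only on the target's own clock edges, which is the property the paragraph above relies on.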
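In an exemplary embodiment, as shown in FIG. 4, after step 350 the method for implementing data transmission provided by this application further includes the following steps:
In step 401, the read/write feedback data returned by the access object according to the access instruction is buffered in the data storage area corresponding to the access instruction;
Here, the data storage area is used to store the read/write feedback data that the access object returns in response to the access instruction. Read/write feedback data includes read data and write responses. The data storage area includes a write-response asynchronous FIFO and a read-data asynchronous FIFO: the write-response asynchronous FIFO buffers the write responses returned by the access object for write instructions, and the read-data asynchronous FIFO buffers the read data returned by the access object for read instructions.
Specifically, after the fast link protocol converter of the FPGA chip receives the read/write feedback data that the access object returns in response to the access instruction, it buffers that data in the data storage area corresponding to the access instruction; for example, it buffers in the write-response asynchronous FIFO the write response returned for a write instruction, and in the read-data asynchronous FIFO the read data returned for a read instruction.
In step 402, according to the initiator of the access instruction, the read/write feedback data buffered in the data storage area is continuously transmitted to the initiator until a flow-control signal returned by the initiator for the read/write feedback data is received.
Here, the initiator of the access instruction may be the instance or the external device. When the initiator is the instance, the access object is the external device; conversely, when the initiator is the external device, the access object is the instance. Taking the case where the initiator is the instance and the access object is the external device as an example, the fast link protocol converter of the FPGA chip sends the buffered access instruction (a read instruction, for example) to the external device, receives the read data that the external device returns for the read instruction, and buffers the read data in the read-data asynchronous FIFO. The converter then continuously transmits the read data buffered in the read-data asynchronous FIFO to the instance on a first-in, first-out basis, and pauses returning read data only when it receives a flow-control signal returned by the instance for the read data.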
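Further, step 350 specifically includes:
When the data storage area is not full, continuously transmitting the access instruction to the access object until a flow-control signal from the access object for the access instruction is received.
That is, before transmitting a buffered access instruction to the access object, the FPGA chip also needs to determine whether the data storage area is not full. The data storage area is used to buffer the read/write feedback data that the access object returns according to the access instruction; if it is full, the returned read/write feedback data cannot be written into it and would have to be registered elsewhere. Therefore, the FPGA chip keeps transmitting buffered access instructions to the access object only while the data storage area is not full, until it receives a flow-control signal returned by the access object for the access instruction.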
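The gating condition above (issue only while the response buffer has room and no flow control is asserted) might be sketched as follows. The function name `issue_reads` and its parameters are hypothetical, and reserving one response slot per issued instruction is an added assumption: the text only requires that the data storage area be non-full.

```python
from collections import deque


def issue_reads(instr_fifo, data_fifo, data_capacity, backpressure):
    """Issue buffered read instructions while the response FIFO has room (illustrative sketch)."""
    issued = []
    in_flight = 0  # assumption: reserve one response slot per issued instruction
    while (instr_fifo and not backpressure
           and len(data_fifo) + in_flight < data_capacity):
        issued.append(instr_fifo.popleft())
        in_flight += 1
    return issued


instr = deque(["R0", "R1", "R2"])
data = deque(["D_old"])            # one response is already waiting to be returned
issued = issue_reads(instr, data, data_capacity=3, backpressure=False)
```

With one slot already occupied and a capacity of three, only two instructions are issued and the third stays buffered until space frees up.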
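In an exemplary embodiment, as shown in FIG. 5, step 310 specifically includes:
In step 311, an access request for reading or writing data is received; the access request is initiated through a bus toward either end between the instance and the external device;
Here, the access request may be initiated by the instance or by the external device. In one embodiment, the fast link protocol converter of the FPGA chip receives, through the AXI bus or CCI-P bus corresponding to the instance running the computing service, an access request initiated by the instance for reading or writing data on the external device. In another embodiment, it receives, through the CCI-P bus or AXI bus corresponding to the external device, an access request initiated by the external device for reading or writing data on the instance running the computing service.
In step 312, the access request is processed according to the protocol conversion rules between different types of buses to obtain the corresponding access instruction.
It should be noted that the protocol conversion rules between different buses can be stored in advance in the FPGA chip 110. According to the protocol conversion rule between the AXI bus and the CCI-P bus, an access request transmitted by the instance through the AXI bus can be mapped to an access instruction for accessing the CCI-P bus. In other embodiments, an access request transmitted by the instance through the CCI-P bus can be mapped, according to the protocol conversion rule, to an access instruction for accessing the AXI bus. For example, according to the protocol conversion rule, a CCI-P bus read address is calculated from the AXI bus read address and other control information (such as a check code) carried in the access request, yielding an access instruction that contains the CCI-P bus read address; transmitting this access instruction to the external device then returns the data stored at that read address.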
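Further, as shown in FIG. 6, step 312 specifically includes:
In step 601, the validity of the access request is determined according to the identification information carried by the access request;
Here, the identification information indicates the validity of the access request. For example, an access request carrying identification information a indicates a read operation, and an access request carrying identification information b indicates a write operation. An access request that carries neither is invalid. The validity of an access request can therefore be determined from the identification information it carries, and invalid access requests need not be processed.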
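The validity check can be illustrated with a small sketch. The identifiers "a" and "b" come from the example in the text; representing a request as a dictionary with an `id` key, and the names `Op` and `classify`, are assumptions made for illustration.

```python
from enum import Enum


class Op(Enum):
    READ = "a"   # identification information "a": read operation
    WRITE = "b"  # identification information "b": write operation


def classify(request):
    """Return the operation of a valid request, or None if no known identifier is carried."""
    try:
        return Op(request.get("id"))
    except ValueError:
        return None  # invalid request: it can simply be ignored
```

A caller would drop any request for which `classify` returns `None` and pass the others on to the address-mapping step.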
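In step 602, when the access request is valid, the access request containing an address signal is mapped, according to the protocol conversion rules between different types of buses, to an access instruction containing a read/write address.
It should be noted that an access request transmitted through the AXI bus is based on the AXI bus protocol, and the address signal it carries is an access address based on the AXI bus protocol. According to the protocol conversion rule between the AXI bus and the CCI-P bus, the AXI-based address signal can be mapped to a CCI-P-based access instruction containing a read/write address. For example, for the address signal 111111 based on the AXI bus protocol, the mapped read/write address based on the CCI-P bus protocol might be 111000. Similarly, for an access request transmitted through the CCI-P bus, the CCI-P-based address signal it carries can be mapped, according to the same protocol conversion rule, to an AXI-based access instruction containing a read/write address.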
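The text does not spell out the conversion rule itself, so the following is only one hypothetical rule that happens to reproduce the 111111 to 111000 example: clearing the three low-order address bits. The constant `ALIGN_BITS` and both function names are assumptions, not part of the described design.

```python
ALIGN_BITS = 3  # hypothetical: number of low-order address bits cleared by the rule


def map_address(src_addr, align_bits=ALIGN_BITS):
    """Map a source-bus address to a target-bus address by clearing low-order bits."""
    return src_addr & ~((1 << align_bits) - 1)


def to_instruction(request):
    """Turn a valid access request into an access instruction carrying the mapped address."""
    return {"op": request["op"], "addr": map_address(request["addr"])}
```

A real converter would derive its rule from the address widths and alignment requirements of the two bus protocols; the point here is only that the instruction carries the translated address rather than the original one.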
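Specifically, when the access request is determined to be valid, the fast link protocol converter of the FPGA chip can, according to the protocol conversion rule between the AXI bus and the CCI-P bus, take an access request (containing a read/write address) transmitted by the instance through the AXI bus and map the read/write address used on the AXI bus to a read/write address used on the CCI-P bus, thereby obtaining an access instruction containing the read/write address.
In an exemplary embodiment, step 330 specifically includes:
According to the access object and instruction type indicated by the access instruction, continuously writing the access instruction into the instruction storage area corresponding to that access object and instruction type until the instruction storage area is full.
Here, the access objects include the external device and the instance, and the instruction types include read instructions and write instructions.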
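When the access instruction is a read instruction for accessing the external device, it is stored in the corresponding first read-instruction asynchronous FIFO: while that FIFO is not full, read instructions are buffered in it in the order in which they were generated. As long as no flow-control signal for read instructions is received from the external device, read instructions can be continuously read from the first read-instruction asynchronous FIFO and transmitted to the external device.
When the access instruction is a write instruction for accessing the external device, it is stored in the corresponding first write-instruction asynchronous FIFO: while that FIFO is not full, write instructions are buffered in it in the order in which they were generated. As long as no flow-control signal for write instructions is received from the external device, write instructions can be continuously read from the first write-instruction asynchronous FIFO and transmitted to the external device.
When the access instruction is a read instruction for accessing the instance, it is stored in the corresponding second read-instruction asynchronous FIFO: while that FIFO is not full, read instructions are buffered in it in the order in which they were generated. As long as no flow-control signal for read instructions is received from the instance, read instructions can be continuously fetched from the second read-instruction asynchronous FIFO and transmitted to the instance.
When the access instruction is a write instruction for accessing the instance, it is stored in the corresponding second write-instruction asynchronous FIFO: while that FIFO is not full, write instructions are buffered in it in the order in which they were generated. As long as no flow-control signal for write instructions is received from the instance, write instructions can be continuously fetched from the second write-instruction asynchronous FIFO and transmitted to the instance.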
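The four instruction FIFOs, selected by access object and instruction type, can be modeled as a small lookup structure. This is a software sketch only; the class name `InstructionStore`, the key strings, and the fixed `depth` are assumptions introduced for illustration.

```python
from collections import deque


class InstructionStore:
    """Four bounded FIFOs keyed by (access object, instruction type); illustrative only."""

    def __init__(self, depth):
        self.depth = depth
        self.fifos = {(target, kind): deque()
                      for target in ("external", "instance")
                      for kind in ("read", "write")}

    def push(self, target, kind, instr):
        """Append instr to the matching FIFO; return False when that FIFO is full."""
        queue = self.fifos[(target, kind)]
        if len(queue) >= self.depth:
            return False
        queue.append(instr)
        return True


store = InstructionStore(depth=2)
results = [store.push("external", "read", name) for name in ("A", "B", "C")]
other = store.push("instance", "write", "W")
```

A full FIFO rejects only instructions of its own kind and target; the other three FIFOs continue to accept instructions.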
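FIG. 7 is a system framework diagram of the bus conversion apparatus deployed in the FPGA chip for converting between the AXI bus and the CCI-P bus. As shown in FIG. 7, the bus conversion apparatus may include the fast link protocol converter 113, a first interface 701 of the AXI bus 115, and a second interface 702 of the CCI-P bus 114. The fast link protocol converter 113 connects to the first interface 701 of the AXI bus 115 and to the second interface 702 of the CCI-P bus 114, thereby realizing asynchronous and streaming-mode transmission of data between the AXI bus 115 and the CCI-P bus 114. Unless a flow-control signal is received, the fast link protocol converter 113 can transmit data to the interfaces without interruption, which maximizes data transmission efficiency and avoids wasting transmission bandwidth.
The AXI bus 115 contains four data transmission links and the CCI-P bus 114 contains three. The first interface 701 consists of three independent AXI buses and one AXI-Lite bus: two of the independent AXI bus ports are masters (M: Master), one independent AXI bus port is a slave (S: Slave), and the AXI-Lite bus port is a slave (S: Slave). The second interface 702 consists of three data transmission links, C0, C1, and C2.
FIG. 8 is a detailed expanded view of the bus conversion apparatus shown in FIG. 7. The fast link protocol converter 113 includes an AXI interface 106, a CCI-P interface 105, the read-link stream processing module 101, the write-link stream processing module 102, the full-link stream processing module 103, and the co-link stream processing module 104. The CCI-P interface 105 includes three TX (transmit) ports and two RX (receive) ports. Its traffic can be divided into seven service types, whose correspondence is shown in Table 1 below.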
Table 1: List of CCI-P interface service types
[Table 1 is provided as an image in the original publication: Figure PCTCN2019082225-appb-000001]
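The AXI interface 106 includes three AXI buses (AXI0, AXI1, AXI2) and one AXI-Lite bus. The three AXI buses carry high-speed data, and the AXI-Lite bus carries low-speed configuration data such as additional header information. The interconnection between the AXI interface 106 and the fast link protocol converter 113 is shown in Table 2 below.
Table 2: AXI bus interconnection
Bus number    Master                                 Slave
AXI0          AXI interface                          Read-link stream processing module
AXI1          AXI interface                          Write-link stream processing module
AXI2          Full-link stream processing module     AXI interface
AXI-Lite      Co-link stream processing module       AXI interface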
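Specifically, the read-link stream processing module 101 mainly performs read operations from the AXI bus 115 to the CCI-P bus 114: it receives a read request from the AXI bus 115, reads data from the CCI-P bus 114, and returns it to the AXI bus 115 (the MEM_RD and MEM_RD_RSP services). The write-link stream processing module 102 mainly performs write operations from the AXI bus 115 to the CCI-P bus 114: it receives a write request from the AXI bus 115, writes the data to the CCI-P bus 114, and receives the returned write response (the MEM_WR and MEM_WR_RSP services). The full-link stream processing module 103 mainly performs read and write operations from the CCI-P bus 114 to the AXI bus 115: it writes data to the AXI bus 115, or reads data from the AXI bus 115 and returns it to the CCI-P bus 114 (the MMIO_WR, MMIO_RD, and MMIO_RD_RSP services). The co-link stream processing module 104 mainly returns to the AXI interface 106, through the AXI-Lite bus, the additional header information contained in the memory write responses and memory read responses arriving on the C1.RX and C0.RX ports. Table 3 below describes the services handled by the co-link stream processing module 104.
Table 3: Services handled by the co-link stream processing module
Service type    CCI-P hardware interface    Information returned
MEM_WR_RSP      C1.RX                       Additional header information of the memory write response
MEM_RD_RSP      C0.RX                       Additional header information of the memory read response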
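Because Table 1 is only available as an image, the following sketch collects the seven service types together with the CCI-P ports that the workflow steps of this description name for them. The dictionary name `SERVICE_PORT` is an assumption, and the mapping is assembled from the prose rather than copied from Table 1 itself.

```python
# Service type -> CCI-P port, as named in the workflow steps of this description.
SERVICE_PORT = {
    "MEM_RD": "C0.TX",       # memory read issued toward the external device
    "MEM_RD_RSP": "C0.RX",   # memory read response
    "MEM_WR": "C1.TX",       # memory write
    "MEM_WR_RSP": "C1.RX",   # memory write response
    "MMIO_RD": "C0.RX",      # MMIO read request arriving from the external device
    "MMIO_WR": "C0.RX",      # MMIO write request arriving from the external device
    "MMIO_RD_RSP": "C2.TX",  # MMIO read response
}

tx_ports = sorted({port for port in SERVICE_PORT.values() if port.endswith(".TX")})
rx_ports = sorted({port for port in SERVICE_PORT.values() if port.endswith(".RX")})
```

The result is consistent with the statement that the CCI-P interface has three TX ports and two RX ports carrying seven service types.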
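In an exemplary embodiment, when the access instruction is a read instruction for accessing the external device 120 and the bus corresponding to the external device 120 is the CCI-P bus 114 (see the implementation environment shown in FIG. 1), the read-link stream processing module 101 deployed in the FPGA chip 110 handles the MEM_RD (memory read) and MEM_RD_RSP (memory read response) services, implementing data transmission between the different types of buses in the FPGA chip 110. The read-link stream processing module 101 contains a read-instruction asynchronous FIFO and a read-data asynchronous FIFO. As shown in FIG. 9, its workflow includes the following steps.
In step 901, a read request is detected. While the read-link stream processing module is idle, it continuously checks whether there is a read request and determines whether the read request is valid; if the current read request is valid, it proceeds to the stage of calculating the read request header.
In step 902, the read request header is calculated. The CCI-P bus read request header (that is, the read instruction containing the CCI-P bus read address) is calculated from the AXI bus read address and other control information (such as a check code) carried by the read request.
In step 903, the read-instruction asynchronous FIFO is written. When the read-instruction asynchronous FIFO is not full, the read request header calculated in step 902 is written into it.
In step 904, a CCI-P bus read operation (MEM_RD) is initiated. When read backpressure on the CCI-P interface is not asserted and the read-data asynchronous FIFO is not full, a read operation is initiated on the C0.TX port of the CCI-P interface.
In step 905, the read data returned by the CCI-P bus is received. The read data (MEM_RD_RSP) returned on the C0.RX port of the CCI-P interface is written into the read-data asynchronous FIFO.
In step 906, when read backpressure on the AXI interface is not asserted, the corresponding read data is returned to the AXI bus.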
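Steps 901 to 906 can be sketched as a small software model. Everything here is illustrative: the class `ReadLink`, the dictionary standing in for the external device's memory, and the address rule `axi_addr >> 6` (which assumes byte addresses on the AXI side and 64-byte line addresses on the CCI-P side) are assumptions, since the text does not specify the header calculation.

```python
from collections import deque


class ReadLink:
    """Software model of steps 901-906 with two bounded FIFOs (not the hardware design)."""

    def __init__(self, depth):
        self.depth = depth
        self.read_instr_fifo = deque()
        self.read_data_fifo = deque()

    def accept(self, request):
        """Steps 901-903: check validity, compute the header, buffer it if there is room."""
        if not request.get("valid") or len(self.read_instr_fifo) >= self.depth:
            return False
        header = {"ccip_addr": request["axi_addr"] >> 6}  # hypothetical mapping rule
        self.read_instr_fifo.append(header)
        return True

    def issue(self, ccip_read_backpressure, host_memory):
        """Steps 904-905: send one MEM_RD and buffer the MEM_RD_RSP that comes back."""
        if (ccip_read_backpressure or not self.read_instr_fifo
                or len(self.read_data_fifo) >= self.depth):
            return False
        header = self.read_instr_fifo.popleft()
        self.read_data_fifo.append(host_memory[header["ccip_addr"]])
        return True

    def respond(self, axi_read_backpressure):
        """Step 906: return one buffered read datum to the AXI side unless backpressured."""
        if axi_read_backpressure or not self.read_data_fifo:
            return None
        return self.read_data_fifo.popleft()
```

Each method checks its own FIFO and its own backpressure input, so a stall on one side of the link does not block the other side from making progress.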
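In another exemplary embodiment, when the access instruction is a read instruction for accessing the instance 111 and the bus corresponding to the instance 111 is the AXI bus 115 (see the implementation environment shown in FIG. 1), the full-link stream processing module 103 deployed in the FPGA chip 110 handles the MMIO_RD (memory-mapped I/O read) and MMIO_RD_RSP (memory-mapped I/O read response) services. The full-link stream processing module 103 contains a read-instruction asynchronous FIFO and a read-data asynchronous FIFO. As shown in FIG. 10, the workflow by which the full-link stream processing module 103 handles the MMIO_RD and MMIO_RD_RSP services to complete a read request includes the following steps.
In step 1001, a read request received on the CCI-P C0.RX port is detected. While the system is idle, the link continuously checks whether there is a read request; if the current read request is valid, it proceeds to step 1002.
In step 1002, the CCI-P bus read address is registered, and the AXI bus read address is calculated from the CCI-P read address.
In step 1003, when the read-instruction asynchronous FIFO is not full, the AXI bus read address is written into it.
In step 1004, when the read-data asynchronous FIFO is not full and read backpressure on the AXI interface is not asserted, a read operation is initiated on the AXI bus.
In step 1005, the read data returned by the AXI interface is written into the read-data asynchronous FIFO.
In step 1006, when backpressure on the CCI-P C2.TX port is not asserted, the corresponding read data is returned to the CCI-P bus.
It should be noted that, when handling read tasks, both the read-link stream processing module 101 and the full-link stream processing module 103 of the FPGA chip 110 contain a read-instruction asynchronous FIFO and a read-data asynchronous FIFO, so data and instructions can be transferred between the asynchronous clock domains of the AXI bus 115 and the CCI-P bus 114. Both modules also support streaming-mode data transmission: while a FIFO is not full, instructions or data can be written into it continuously, and while there is no backpressure, instructions or data are output continuously, which maximizes access efficiency and improves throughput.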
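In an exemplary embodiment, when the access instruction is a write instruction for accessing the external device 120 and the bus corresponding to the external device 120 is the CCI-P bus 114 (see the implementation environment shown in FIG. 1), the write-link stream processing module 102 deployed in the FPGA chip 110 handles the MEM_WR (memory write) and MEM_WR_RSP (memory write response) services. The write-link stream processing module 102 contains a write-instruction asynchronous FIFO and a write-response asynchronous FIFO. As shown in FIG. 11, its workflow includes the following steps.
In step 1101, a write request is detected. While the system is idle, the write-link stream processing module continuously checks whether there is a write request; if the current write request is valid, the MEM_WR and MEM_WR_RSP phases are entered simultaneously.
For the MEM_WR phase, once the write request is valid, step 1102 registers the AXI bus write address and data; when registration is complete, the write request header is calculated.
In step 1103, the write request header is calculated: the CCI-P bus write request header (including the CCI-P bus write address) is calculated from the current AXI bus write address and other control information.
In step 1104, when the write-instruction asynchronous FIFO is not full, the write address and write data (that is, the write instruction) are written into it.
In step 1105, a CCI-P bus write operation is initiated. When write backpressure on the CCI-P bus is not asserted, a MEM_WR operation is initiated on the TX port of the CCI-P C1 link.
For the MEM_WR_RSP phase, once the write request is valid, the module enters the wait-for-write-response stage of step 1106.
In step 1107, the write response returned by the CCI-P bus is received.
In step 1108, when the write-response asynchronous FIFO is not full, the write response (MEM_WR_RSP) returned on CCI-P C1.RX is written into it.
In step 1109, when backpressure on the write-response interface of the AXI bus is not asserted, the corresponding write response is returned to the AXI bus.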
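The two phases of the write link (MEM_WR toward the external device, MEM_WR_RSP back toward the AXI side) can be sketched as separate methods on one object. The class `WriteLink`, the dictionary used as host memory, the `ok` field of the response, and the `axi_addr >> 6` address rule are all hypothetical stand-ins; the text does not define the header or response formats.

```python
from collections import deque


class WriteLink:
    """Software model of steps 1101-1109 with separate request and response FIFOs."""

    def __init__(self, depth):
        self.depth = depth
        self.write_instr_fifo = deque()
        self.write_resp_fifo = deque()

    def accept(self, axi_addr, data):
        """Steps 1102-1104: latch address and data, compute the header, buffer the instruction."""
        if len(self.write_instr_fifo) >= self.depth:
            return False
        self.write_instr_fifo.append({"ccip_addr": axi_addr >> 6, "data": data})  # assumed rule
        return True

    def issue(self, ccip_write_backpressure, host_memory):
        """Step 1105: send one MEM_WR; return the response the host model produces."""
        if ccip_write_backpressure or not self.write_instr_fifo:
            return None
        instr = self.write_instr_fifo.popleft()
        host_memory[instr["ccip_addr"]] = instr["data"]
        return {"ccip_addr": instr["ccip_addr"], "ok": True}  # hypothetical response format

    def on_response(self, response):
        """Steps 1107-1108: buffer a MEM_WR_RSP if the response FIFO has room."""
        if len(self.write_resp_fifo) >= self.depth:
            return False
        self.write_resp_fifo.append(response)
        return True

    def respond(self, axi_resp_backpressure):
        """Step 1109: return one buffered write response to the AXI side unless backpressured."""
        if axi_resp_backpressure or not self.write_resp_fifo:
            return None
        return self.write_resp_fifo.popleft()
```

In hardware the two phases run concurrently; in this model that corresponds to `accept`/`issue` and `on_response`/`respond` operating on independent FIFOs.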
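In an exemplary embodiment, when the access instruction is a write instruction for accessing the instance 111 and the bus corresponding to the instance 111 is the AXI bus 115 (see the implementation environment shown in FIG. 1), the full-link stream processing module 103 deployed in the FPGA chip 110 handles the MMIO_WR (memory-mapped I/O write) service. The full-link stream processing module 103 supports parallel read and write operations; in addition to the read-instruction asynchronous FIFO and the read-data asynchronous FIFO, it also contains a write-instruction asynchronous FIFO. As shown in FIG. 12, the workflow by which the full-link stream processing module 103 handles the MMIO_WR service includes the following steps.
In step 1201, a write request on CCI-P C0.RX is detected. While the system is idle, the link continuously checks whether there is a write request; if the current write request is valid, it proceeds to step 1202.
In step 1202, the CCI-P write address and data carried by the write request are registered; when registration is complete, it proceeds to step 1203.
In step 1203, the AXI write address is calculated from the registered CCI-P write request header.
In step 1204, when the write-instruction asynchronous FIFO is not full, the write data and the AXI write address (that is, the write instruction) are written into it.
In step 1205, when backpressure on the AXI bus write port is not asserted, the write operation is performed on the AXI bus until it completes.
It should be noted that, when handling write tasks, the write-link stream processing module 102 of the FPGA chip 110 contains a write-instruction asynchronous FIFO and a write-response asynchronous FIFO, and the full-link stream processing module 103 includes a write-instruction asynchronous FIFO that buffers write data and write addresses, so data and instructions can be transferred between the asynchronous clock domains of the AXI interface 106 and the CCI-P interface 105 of the FPGA chip 110. Both modules also support streaming-mode data transmission: while a FIFO is not full, instructions or data can be written into it continuously, and while there is no backpressure, instructions or data are output continuously, which maximizes access efficiency, improves throughput, and overcomes the bandwidth waste caused by mismatched bandwidths.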
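FIG. 13 is a schematic diagram of the complete flow of the method for implementing data transmission provided by an exemplary embodiment of this application. Assume that the FPGA chip itself uses the CCI-P bus and that the instance ported from another chip uses the AXI bus. As shown in FIG. 13, the method may include the following steps. The case in which the FPGA chip uses the AXI bus and the instance uses the CCI-P bus can be handled by analogy with this embodiment.
In step 1301, the fast link protocol converter of the FPGA chip receives a read/write request transmitted by the instance through the AXI bus (or by the external device through the CCI-P bus);
In step 1302, whether the read/write request is valid is determined according to the identification information carried by the request;
In step 1303, when the request is valid, the request is mapped, according to the protocol conversion rule between the AXI bus and the CCI-P bus, to a read/write instruction containing a read/write address;
In step 1304, read instructions are continuously buffered while the read-instruction asynchronous FIFO is not full and, as long as no flow-control signal for read instructions has been returned by the external device through the CCI-P bus (or by the instance through the AXI bus), the buffered read instructions are continuously transmitted, on a first-in, first-out basis, to the external device through the CCI-P bus (or to the instance through the AXI bus).
Write instructions are continuously buffered while the write-instruction asynchronous FIFO is not full and, as long as no flow-control signal for write instructions has been returned by the external device through the CCI-P bus (or by the instance through the AXI bus), the buffered write instructions are continuously transmitted, on a first-in, first-out basis, to the external device through the CCI-P bus (or to the instance through the AXI bus).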
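The following are apparatus embodiments of this application, which can be used to carry out the method embodiments performed by the fast link protocol converter in the FPGA chip described above. For details not disclosed in the apparatus embodiments, refer to the method embodiments of this application.
FIG. 14 is a block diagram of an apparatus for implementing data transmission according to an exemplary embodiment. The apparatus can be used in the FPGA chip 110 of the implementation environment shown in FIG. 1 to perform all or some of the steps of the method shown in any of FIG. 3 to FIG. 6 and FIG. 9 to FIG. 13. As shown in FIG. 14, the apparatus includes, but is not limited to, an instruction acquisition module 1310, an instruction buffer module 1330, and an instruction transmission module 1350.
The instruction acquisition module 1310 is configured to obtain an access instruction for reading or writing data, the access instruction being initiated through a bus toward either end between the instance and the external device;
The instruction buffer module 1330 is configured to buffer the access instruction in the instruction storage area corresponding to the access instruction;
The instruction transmission module 1350 is configured to continuously transmit, according to the access object indicated by the access instruction, the access instructions buffered in the instruction storage area to the access object, and to stop transmitting the access instructions only when flow control is applied.
For how the functions of the modules in the above apparatus are implemented, see the implementation of the corresponding steps in the method described above; details are not repeated here.
The instruction acquisition module 1310, the instruction buffer module 1330, and the instruction transmission module 1350 may be functional modules for performing the corresponding steps of the method described above. It can be understood that these modules may be implemented in hardware, software, or a combination of the two. When implemented in hardware, they may be implemented as one or more hardware modules, such as one or more application-specific integrated circuits. When implemented in software, they may be implemented as one or more computer programs executed on one or more processors.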
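In an exemplary embodiment, the instruction transmission module 1350 includes, but is not limited to:
An asynchronous transmission unit, configured to continuously transmit, according to the clock cycle used for transmitting the access instruction to the access object, the access instructions buffered in the instruction storage area to the access object until a flow-control signal returned by the access object for the access instruction is received.
In an exemplary embodiment, as shown in FIG. 15, the apparatus for implementing data transmission provided by this application further includes, but is not limited to:
A data buffer module 1370, configured to buffer, in the data storage area corresponding to the access instruction, the read/write feedback data returned by the access object according to the access instruction;
A data transmission module 1390, configured to continuously transmit, according to the initiator of the access instruction, the read/write feedback data buffered in the data storage area to the initiator until a flow-control signal returned by the initiator for the read/write feedback data is received.
In an exemplary embodiment, the instruction transmission module 1350 includes, but is not limited to:
A continuous transmission unit, configured to continuously transmit the access instruction to the access object while the data storage area is not full, until a flow-control signal from the access object for the access instruction is received.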
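In an exemplary embodiment, as shown in FIG. 16, the instruction acquisition module 1310 includes, but is not limited to:
A request receiving unit 1311, configured to receive an access request for reading or writing data, the access request being initiated through a bus toward either end between the instance and the external device;
A protocol conversion unit 1312, configured to process the access request according to the protocol conversion rules between different types of buses to obtain the corresponding access instruction.
The request receiving unit 1311 includes, but is not limited to:
A first subunit, configured to receive, through the AXI bus or CCI-P bus corresponding to the instance running the computing service, an access request initiated by the instance for reading or writing data on the external device;
or,
A second subunit, configured to receive, through the CCI-P bus or AXI bus corresponding to the external device, an access request initiated by the external device for reading or writing data on the instance running the computing service.
The protocol conversion unit 1312 includes, but is not limited to:
A determination subunit, configured to determine the validity of the access request according to the identification information carried by the access request;
A conversion subunit, configured to map, when the access request is valid and according to the protocol conversion rules between different types of buses, the access request containing an address signal to an access instruction containing a read/write address.
In an exemplary embodiment, the instruction buffer module 1330 includes, but is not limited to:
A continuous buffering unit, configured to continuously write, according to the access object and instruction type indicated by the access instruction, the access instruction into the instruction storage area corresponding to that access object and instruction type until the instruction storage area is full.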
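This application further provides an electronic device, which can be used in the FPGA chip 110 of the implementation environment shown in FIG. 1 to perform all or some of the steps of the method shown in any of FIG. 3 to FIG. 6 and FIG. 9 to FIG. 13. Specifically, the electronic device may include:
A processor; and a memory for storing processor-executable instructions; wherein the processor is configured to perform the method for implementing data transmission described in the above embodiments.
The specific manner in which the processor of the electronic device in this embodiment performs operations has been described in detail in the embodiments of the method for implementing data transmission and is not elaborated here.
In an exemplary embodiment, a storage medium is also provided. The storage medium is a computer-readable storage medium, which may be, for example, a transitory or non-transitory computer-readable storage medium including instructions. The storage medium stores a computer program that can be executed by the FPGA chip 110 in the implementation environment shown in FIG. 1 to carry out the method for implementing data transmission described above.
It should be understood that this application is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes can be made without departing from its scope. The scope of this application is limited only by the appended claims.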

Claims (16)

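  1. A method for implementing data transmission, performed by an electronic device, the method being applied to carrying out transmission between different types of buses respectively corresponding to an instance running a computing service and an external device, the method comprising:
    obtaining an access instruction for reading or writing data, the access instruction being initiated through a bus toward either end between the instance and the external device;
    buffering the access instruction in an instruction storage area corresponding to the access instruction; and
    continuously transmitting, according to an access object indicated by the access instruction, the access instruction buffered in the instruction storage area to the access object, and stopping transmission of the access instruction only when flow control is applied.
  2. The method according to claim 1, wherein the continuously transmitting, according to the access object indicated by the access instruction, the access instruction buffered in the instruction storage area to the access object, and stopping transmission of the access instruction only when flow control is applied, comprises:
    continuously transmitting, according to a clock cycle used for transmitting the access instruction to the access object, the access instruction buffered in the instruction storage area to the access object until a flow-control signal returned by the access object for the access instruction is received.
  3. The method according to claim 1, wherein after the continuously transmitting, according to the access object indicated by the access instruction, the access instruction buffered in the instruction storage area to the access object, and stopping transmission of the access instruction only when flow control is applied, the method further comprises:
    buffering, in a data storage area corresponding to the access instruction, read/write feedback data returned by the access object according to the access instruction; and
    continuously transmitting, according to an initiator of the access instruction, the read/write feedback data buffered in the data storage area to the initiator until a flow-control signal returned by the initiator for the read/write feedback data is received.
  4. The method according to claim 3, wherein the continuously transmitting, according to the access object indicated by the access instruction, the access instruction buffered in the instruction storage area to the access object, and stopping transmission of the access instruction only when flow control is applied, comprises:
    continuously transmitting the access instruction to the access object while the data storage area is not full, until a flow-control signal from the access object for the access instruction is received.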
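  5. The method according to claim 1, wherein the obtaining an access instruction for reading or writing data, the access instruction being initiated through a bus toward either end between the instance and the external device, comprises:
    receiving an access request for reading or writing data, the access request being initiated through a bus toward either end between the instance and the external device; and
    processing the access request according to protocol conversion rules between different types of buses to obtain the corresponding access instruction.
  6. The method according to claim 5, wherein the receiving an access request for reading or writing data comprises:
    receiving, through an AXI bus or a CCI-P bus corresponding to the instance running the computing service, an access request initiated by the instance for reading or writing data on the external device.
  7. The method according to claim 5, wherein the receiving an access request for reading or writing data comprises:
    receiving, through a CCI-P bus or an AXI bus corresponding to the external device, an access request initiated by the external device for reading or writing data on the instance running the computing service.
  8. The method according to claim 5, wherein the processing the access request according to protocol conversion rules between different types of buses to obtain the corresponding access instruction comprises:
    determining validity of the access request according to identification information carried by the access request; and
    when the access request is valid, mapping, according to the protocol conversion rules between different types of buses, the access request containing an address signal to an access instruction containing a read/write address.
  9. The method according to claim 1, wherein the buffering the access instruction in an instruction storage area corresponding to the access instruction comprises:
    continuously writing, according to the access object and instruction type indicated by the access instruction, the access instruction into the instruction storage area corresponding to the access object and instruction type until the instruction storage area is full.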
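  10. An apparatus for implementing data transmission, the apparatus being applied to carrying out transmission between different types of buses respectively corresponding to an instance running a computing service and an external device, the apparatus comprising:
    an instruction acquisition module, configured to obtain an access instruction for reading or writing data, the access instruction being initiated through a bus toward either end between the instance and the external device;
    an instruction buffer module, configured to buffer the access instruction in an instruction storage area corresponding to the access instruction; and
    an instruction transmission module, configured to continuously transmit, according to an access object indicated by the access instruction, the access instruction buffered in the instruction storage area to the access object, and to stop transmission of the access instruction only when flow control is applied.
  11. The apparatus according to claim 10, wherein the instruction transmission module comprises:
    an asynchronous transmission unit, configured to continuously transmit, according to a clock cycle used for transmitting the access instruction to the access object, the access instruction buffered in the instruction storage area to the access object until a flow-control signal returned by the access object for the access instruction is received.
  12. The apparatus according to claim 10, wherein the apparatus further comprises:
    a data buffer module, configured to buffer, in a data storage area corresponding to the access instruction, read/write feedback data returned by the access object according to the access instruction; and
    a data transmission module, configured to continuously transmit, according to an initiator of the access instruction, the read/write feedback data buffered in the data storage area to the initiator until a flow-control signal returned by the initiator for the read/write feedback data is received.
  13. The apparatus according to claim 12, wherein the instruction transmission module comprises:
    a continuous transmission unit, configured to continuously transmit the access instruction to the access object while the data storage area is not full, until a flow-control signal from the access object for the access instruction is received.
  14. The apparatus according to claim 10, wherein the instruction acquisition module comprises:
    a request receiving unit, configured to receive an access request for reading or writing data, the access request being initiated through a bus toward either end between the instance and the external device; and
    a protocol conversion unit, configured to process the access request according to protocol conversion rules between different types of buses to obtain the corresponding access instruction.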
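  15. An electronic device, comprising:
    a processor; and
    a memory for storing processor-executable instructions;
    wherein the processor is configured to perform the method for implementing data transmission according to any one of claims 1 to 9.
  16. A computer-readable storage medium storing processor-executable instructions that, when executed by one or more processors, carry out the method for implementing data transmission according to any one of claims 1 to 9.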
PCT/CN2019/082225 2018-05-31 2019-04-11 Method and apparatus for implementing data transmission, electronic device, and computer-readable storage medium WO2019228077A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/007,523 US11481346B2 (en) 2018-05-31 2020-08-31 Method and apparatus for implementing data transmission, electronic device, and computer-readable storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810551660.7A CN110196824B (zh) 2018-05-31 2018-05-31 Method and apparatus for implementing data transmission, and electronic device
CN201810551660.7 2018-05-31

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/007,523 Continuation US11481346B2 (en) 2018-05-31 2020-08-31 Method and apparatus for implementing data transmission, electronic device, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2019228077A1 true WO2019228077A1 (zh) 2019-12-05

Family

ID=67751384

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/082225 WO2019228077A1 (zh) 2018-05-31 2019-04-11 Method and apparatus for implementing data transmission, electronic device, and computer-readable storage medium

Country Status (3)

Country Link
US (1) US11481346B2 (zh)
CN (1) CN110196824B (zh)
WO (1) WO2019228077A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112631658A (zh) * 2021-01-13 2021-04-09 成都国科微电子有限公司 指令发送方法、芯片和电子设备
CN113890783A (zh) * 2021-09-27 2022-01-04 北京微纳星空科技有限公司 一种数据收发系统、方法、电子设备及存储介质

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114008601B (zh) * 2019-06-19 2023-11-10 三菱电机株式会社 命令变换装置、方法及记录介质
CN111061434B (zh) * 2019-12-17 2021-10-01 人和未来生物科技(长沙)有限公司 基因压缩多流数据并行写入及读取方法、系统及介质
CN111290715B (zh) * 2020-02-24 2023-04-28 山东华芯半导体有限公司 一种基于分区实现的安全存储装置
CN111414325B (zh) * 2020-02-29 2021-09-17 苏州浪潮智能科技有限公司 一种Avalon总线转Axi4总线的方法
CN112115504A (zh) * 2020-06-29 2020-12-22 上海金融期货信息技术有限公司 一种基于tds协议的数据库访问方法和系统
CN111949585A (zh) * 2020-07-15 2020-11-17 西安万像电子科技有限公司 数据转换处理方法及装置
CN112463668B (zh) * 2020-11-20 2021-10-22 华中科技大学 一种基于stt-mram的多通道高速数据访存结构
CN112732611A (zh) * 2021-01-18 2021-04-30 上海国微思尔芯技术股份有限公司 一种基于axi的芯片互联系统
US11847489B2 (en) * 2021-01-26 2023-12-19 Apple Inc. United states graphics processor techniques with split between workload distribution control data on shared control bus and corresponding graphics data on memory interfaces
CN113220620B (zh) * 2021-05-21 2024-05-07 北京旋极信息技术股份有限公司 一种用于数据流格式转换的系统以及数据流传输系统
CN114595171A (zh) * 2022-02-21 2022-06-07 杭州加速科技有限公司 一种pcie转gpib的接口转换装置及其使用方法
CN115189977B (zh) * 2022-09-09 2023-01-06 太初(无锡)电子科技有限公司 一种基于axi协议的广播传输方法、系统及介质
CN116719755B (zh) * 2023-08-10 2023-11-07 浪潮电子信息产业股份有限公司 一种多应用内存访问的方法、装置、设备
CN117632820B (zh) * 2024-01-22 2024-05-14 北京开源芯片研究院 请求处理方法、装置、总线桥、电子设备及可读存储介质
CN117971439A (zh) * 2024-03-29 2024-05-03 山东云海国创云计算装备产业创新中心有限公司 一种任务处理方法、系统、设备及计算机可读存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101650701A (zh) * 2009-09-11 2010-02-17 中国电子科技集团公司第十四研究所 并行总线到RapidIO高速串行总线的转换装置
CN102521190A (zh) * 2011-12-19 2012-06-27 中国科学院自动化研究所 一种应用于实时数据处理的多级总线系统
CN102841869A (zh) * 2012-07-03 2012-12-26 深圳市邦彦信息技术有限公司 一种基于fpga的多通道i2c控制器
CN105335326A (zh) * 2015-10-10 2016-02-17 广州慧睿思通信息科技有限公司 一种基于fpga的pcie转sata接口阵列的装置

Family Cites Families (44)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS58225432A (ja) * 1982-06-24 1983-12-27 Toshiba Corp 要求バツフア装置
JPH0628051B2 (ja) * 1986-04-25 1994-04-13 株式会社日立製作所 記憶制御方式
CN1018883B (zh) * 1990-10-30 1992-10-28 海崖微型计算机产品公司 串行数据流控制的方法和装置
US5191649A (en) * 1990-12-21 1993-03-02 Intel Corporation Multiprocessor computer system with data bus and ordered and out-of-order split data transactions
JP3827049B2 (ja) * 1998-03-25 2006-09-27 セイコーエプソン株式会社 プリンタ制御回路、プリンタ及びプリントシステム
US6947430B2 (en) * 2000-03-24 2005-09-20 International Business Machines Corporation Network adapter with embedded deep packet processing
US20020049875A1 (en) * 2000-10-23 2002-04-25 Biran Giora Data communications interfaces
TW533354B (en) * 2001-11-07 2003-05-21 Via Tech Inc Control chip having multi-layer defer queue and the operating method thereof
JP4193607B2 (ja) * 2003-06-26 2008-12-10 日本電気株式会社 データフロー制御方式、方法、およびプログラム
US7921323B2 (en) * 2004-05-11 2011-04-05 L-3 Communications Integrated Systems, L.P. Reconfigurable communications infrastructure for ASIC networks
US7814242B1 (en) * 2005-03-25 2010-10-12 Tilera Corporation Managing data flows in a parallel processing environment
US7620741B2 (en) * 2005-04-22 2009-11-17 Sun Microsystems, Inc. Proxy-based device sharing
US7457905B2 (en) * 2005-08-29 2008-11-25 Lsi Corporation Method for request transaction ordering in OCP bus to AXI bus bridge design
CN100414524C (zh) * 2005-09-20 2008-08-27 中国科学院计算技术研究所 一种控制两种不同速度总线间数据传送的方法
KR100675850B1 (ko) * 2005-10-12 2007-02-02 삼성전자주식회사 AXI 프로토콜을 적용한 NoC 시스템
CN101276318B (zh) * 2008-05-12 2010-06-09 北京航空航天大学 基于pci-e总线的直接存取数据传输控制装置
US8489791B2 (en) * 2010-03-12 2013-07-16 Lsi Corporation Processor bus bridge security feature for network processors or the like
CN102004709B (zh) * 2009-08-31 2013-09-25 国际商业机器公司 处理器局部总线到高级可扩展接口之间的总线桥及映射方法
US8549203B2 (en) * 2010-10-29 2013-10-01 Qualcomm Incorporated Multi-protocol bus interface device
KR20120046461A (ko) * 2010-11-02 2012-05-10 삼성전자주식회사 인터페이스 장치 및 이를 포함하는 시스템
US8954017B2 (en) * 2011-08-17 2015-02-10 Broadcom Corporation Clock signal multiplication to reduce noise coupled onto a transmission communication signal of a communications device
JP5524279B2 (ja) * 2011-09-13 2014-06-18 株式会社東芝 情報処理装置および情報処理方法
US9239808B2 (en) * 2011-12-15 2016-01-19 Marvell World Trade Ltd. Serial interface for FPGA prototyping
CN102999467A (zh) * 2012-12-24 2013-03-27 中国科学院半导体研究所 基于fpga实现的高速接口与低速接口转换电路及方法
CN103049414B (zh) * 2012-12-28 2015-04-15 中国航空工业集团公司第六三一研究所 Fc总线与can总线间数据的转换及传输方法
US20140289445A1 (en) * 2013-03-22 2014-09-25 Antony Savich Hardware accelerator system and method
CN104346135B (zh) * 2013-08-08 2018-06-15 腾讯科技(深圳)有限公司 数据流并行处理的方法、设备及系统
EP3060992B1 (en) * 2013-10-27 2019-11-27 Advanced Micro Devices, Inc. Input/output memory map unit and northbridge
CN104679702B (zh) * 2013-11-28 2018-01-12 中国航空工业集团公司第六三一研究所 多路高速串行接口控制器
JP6331944B2 (ja) * 2014-10-07 2018-05-30 富士通株式会社 情報処理装置、メモリ制御装置及び情報処理装置の制御方法
CN105279123A (zh) * 2014-10-10 2016-01-27 天津市英贝特航天科技有限公司 双冗余1553b总线的串口转换结构及转换方法
CN104467909B (zh) * 2014-12-23 2016-11-30 天津光电通信技术有限公司 一种基于fpga技术的可配置pci总线的收发电路
CN104579885B (zh) * 2015-02-05 2016-03-23 中车青岛四方车辆研究所有限公司 Cpci总线和isa总线的协议转换器和转换方法
KR102106541B1 (ko) * 2015-03-18 2020-05-04 삼성전자주식회사 공유 리소스 액세스 중재 방법 및 이를 수행하기 위한 공유 리소스 액세스 중재 장치 및 공유 리소스 액세스 중재 시스템
CN204886928U (zh) * 2015-09-09 2015-12-16 吉林大学 基于pcie总线的微小时间间隔数据采集装置
CN105183680B (zh) * 2015-09-18 2018-03-20 烽火通信科技股份有限公司 实现PCIe接口转CF卡接口的FPGA芯片及方法
US10210088B2 (en) * 2015-12-28 2019-02-19 Nxp Usa, Inc. Computing system with a cache invalidation unit, a cache invalidation unit and a method of operating a cache invalidation unit in a computing system
US10216669B2 (en) * 2016-02-23 2019-02-26 Honeywell International Inc. Bus bridge for translating requests between a module bus and an axi bus
US20180143903A1 (en) * 2016-11-22 2018-05-24 Mediatek Inc. Hardware assisted cache flushing mechanism
CN107426246B (zh) * 2017-08-31 2020-09-08 北京计算机技术及应用研究所 基于FPGA的万兆以太网和RapidIO协议间高速数据交换系统
CN107577636A (zh) * 2017-09-12 2018-01-12 天津津航技术物理研究所 一种基于soc的axi总线接口数据传输系统及传输方法
US11301415B2 (en) * 2018-01-04 2022-04-12 Intel Corporation Interface discovery between partitions of a programmable logic device
US11307925B2 (en) * 2018-03-29 2022-04-19 Intel Corporation Systems and methods for isolating an accelerated function unit and/or an accelerated function context
US10719452B2 (en) * 2018-06-22 2020-07-21 Xilinx, Inc. Hardware-based virtual-to-physical address translation for programmable logic masters in a system on chip

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112631658A (zh) * 2021-01-13 2021-04-09 成都国科微电子有限公司 指令发送方法、芯片和电子设备
CN112631658B (zh) * 2021-01-13 2022-11-15 成都国科微电子有限公司 指令发送方法、芯片和电子设备
CN113890783A (zh) * 2021-09-27 2022-01-04 北京微纳星空科技有限公司 一种数据收发系统、方法、电子设备及存储介质
CN113890783B (zh) * 2021-09-27 2022-07-26 北京微纳星空科技有限公司 一种数据收发系统、方法、电子设备及存储介质

Also Published As

Publication number Publication date
CN110196824B (zh) 2022-12-09
CN110196824A (zh) 2019-09-03
US20200401542A1 (en) 2020-12-24
US11481346B2 (en) 2022-10-25

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19811074

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19811074

Country of ref document: EP

Kind code of ref document: A1