WO2021120714A1 - Data exchange method, apparatus, processor and computer system - Google Patents

Data exchange method, apparatus, processor and computer system

Info

Publication number
WO2021120714A1
Authority
WO
WIPO (PCT)
Prior art keywords
destination
source
data
address
buffer unit
Prior art date
Application number
PCT/CN2020/114006
Other languages
English (en)
French (fr)
Inventor
蒋宇翔
王晓阳
Original Assignee
成都海光微电子技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 成都海光微电子技术有限公司 filed Critical 成都海光微电子技术有限公司
Publication of WO2021120714A1 publication Critical patent/WO2021120714A1/zh

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0877 Cache access modes
    • G06F 13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 13/14 Handling requests for interconnection or transfer
    • G06F 13/16 Handling requests for interconnection or transfer for access to memory bus
    • G06F 13/1668 Details of memory controller
    • G06F 13/1673 Details of memory controller using buffers
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to the field of data processing technology, and specifically to a data exchange method, device, processor, and computer system.
  • an embodiment of the present application provides a data exchange method that exchanges data between a source storage unit and a destination storage unit through a ring buffer unit.
  • the ring buffer unit includes a first buffer unit and a second buffer unit.
  • The method includes: generating at least one source address read request and sending the at least one source address read request to the source storage unit, so that the source storage unit returns the source data corresponding to each source address read request to the first buffer unit; generating at least one destination address read request and sending the at least one destination address read request to the destination storage unit, so that the destination storage unit returns the destination data corresponding to each destination address read request to the second buffer unit; determining that the source data can be transmitted to the first buffer unit and that the destination data can be transmitted to the second buffer unit; generating corresponding address write requests according to the data received by the ring buffer unit, where each address write request includes the destination address of the corresponding data, the destination address corresponding to the data received by the first buffer unit is located in the destination storage unit, and the destination address corresponding to the data received by the second buffer unit is located in the source storage unit; and using the address write requests to send the corresponding data to the destination addresses of the corresponding storage units.
  • In this way, a source address read request is generated and the corresponding source data is requested from the source storage unit, so that the source storage unit returns the corresponding source data to the first buffer unit of the ring buffer unit; a destination address read request is generated and the corresponding destination data is requested from the destination storage unit, so that the destination storage unit returns the corresponding destination data to the second buffer unit of the ring buffer unit. After it is determined that all the source data can be transmitted to the first buffer unit and all the destination data can be transmitted to the second buffer unit, corresponding address write requests are generated for the source data received by the first buffer unit so that this source data is sent to the destination storage unit, and corresponding address write requests are generated for the destination data received by the second buffer unit so that this destination data is sent to the source storage unit.
  • The present application thus uses the ring buffer unit to realize the data exchange between the source storage unit and the destination storage unit, which reduces power consumption and saves cost compared with the prior art. A minimal sketch of this flow is given below.
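  • The following C sketch models this exchange flow in software, with plain arrays standing in for the hardware units; the names (ring_buffer_t, swap_via_ring_buffer) and the use of memcpy are illustrative assumptions, not part of the application.

```c
#include <stddef.h>
#include <string.h>

/* Software model of the exchange: the two halves of the ring buffer are
 * plain arrays; "read requests" become copies into the halves and
 * "write requests" become copies back out. 32 entries x 512 bits per half. */
typedef struct {
    unsigned char first[32 * 64];   /* first buffer unit: receives source data       */
    unsigned char second[32 * 64];  /* second buffer unit: receives destination data */
} ring_buffer_t;

/* len must not exceed the size of one buffer half. */
void swap_via_ring_buffer(unsigned char *src, unsigned char *dst, size_t len,
                          ring_buffer_t *rb)
{
    memcpy(rb->first, src, len);   /* source address read requests      */
    memcpy(rb->second, dst, len);  /* destination address read requests */

    /* Write requests only run once both sides are safely buffered,
     * so neither side is overwritten before it has been saved. */
    memcpy(dst, rb->first, len);   /* destination address write requests */
    memcpy(src, rb->second, len);  /* source address write requests      */
}
```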
  • In one possible design, at least one piece of the source data is returned to the first buffer unit, and generating the corresponding address write request according to the data received by the ring buffer unit includes: generating a corresponding destination address write request according to the source data received by the first buffer unit, where the destination address write request includes the destination address of the corresponding source data. Using the address write request to send the corresponding data to the destination address of the corresponding storage unit includes: using the destination address write request to send the corresponding source data to the destination address in the destination storage unit.
  • In this way, for the source data received by the first buffer unit, the GPU may generate a destination address write request that includes the destination address of that source data, the destination address being located in the destination storage unit, thereby writing the source data received by the first buffer unit into the destination storage unit.
  • Since the source data received by the first buffer unit comes from the source storage unit, the first buffer unit realizes the transfer of data from the source storage unit to the destination storage unit.
  • In one possible design, after it is determined that at least one piece of the destination data can be transmitted to the second buffer unit, generating the corresponding address write request according to the data received by the ring buffer unit includes: each time it is detected that a piece of source data has been written into the first buffer unit, generating the destination address write request corresponding to the written source data according to the destination address read request corresponding to a piece of the destination data that has already been written into the second buffer unit.
  • In this way, not only can the destination address write request corresponding to the source data written into the first buffer unit be obtained in time, but destination data that has not yet been sent to the second buffer unit is also prevented from being overwritten.
  • In one possible design, at least one piece of the destination data is returned to the second buffer unit, and generating the corresponding address write request according to the data received by the ring buffer unit includes: generating a corresponding source address write request according to the destination data received by the second buffer unit, where the source address write request includes the destination address of the corresponding destination data. Using the address write request to send the corresponding data to the destination address of the corresponding storage unit includes: using the source address write request to send the corresponding destination data to the destination address in the source storage unit.
  • In this way, for the destination data received by the second buffer unit, the GPU may generate a source address write request that includes the destination address of that destination data, the destination address being located in the source storage unit, thereby writing the destination data received by the second buffer unit into the source storage unit.
  • Since the destination data received by the second buffer unit comes from the destination storage unit, the second buffer unit realizes the transfer of data from the destination storage unit to the source storage unit.
  • In one possible design, after it is determined that at least one piece of the source data can be transmitted to the first buffer unit, generating the corresponding address write request according to the data received by the ring buffer unit includes: each time it is detected that a piece of destination data has been written into the second buffer unit, generating the source address write request corresponding to the written destination data according to the source address read request corresponding to a piece of the source data that has already been written into the first buffer unit.
  • In this way, not only can the source address write request corresponding to the destination data written into the second buffer unit be obtained in time, but source data that has not yet been sent to the first buffer unit is also prevented from being overwritten.
  • In one possible design, the communication interface through which the source storage unit sends source data to the first buffer unit is the same as the communication interface through which the destination storage unit sends destination data to the second buffer unit; the step of generating at least one destination address read request and sending the at least one destination address read request to the destination storage unit is performed after the step of generating at least one source address read request and sending the at least one source address read request to the source storage unit.
  • The purpose of sending the destination address read request to the destination storage unit is to make the destination storage unit return the destination data; likewise, the purpose of sending the source address read request to the source storage unit is to make the source storage unit return the source data. When the two storage units transfer data over the same communication interface, returning the destination data in parallel with returning the source data would interleave the two transfers on that interface and lower the transfer efficiency; performing them in different time periods therefore helps improve transfer efficiency.
  • In one possible design, the communication interface through which the source storage unit sends source data to the first buffer unit is different from the communication interface through which the destination storage unit sends destination data to the second buffer unit; the step of generating at least one destination address read request and sending the at least one destination address read request to the destination storage unit is performed in parallel with the step of generating at least one source address read request and sending the at least one source address read request to the source storage unit.
  • The purpose of sending the destination address read request to the destination storage unit is to make the destination storage unit return the destination data; likewise, the purpose of sending the source address read request to the source storage unit is to make the source storage unit return the source data. When the two storage units transfer data over different communication interfaces, the destination storage unit returning the destination data and the source storage unit returning the source data proceed in parallel without interfering with each other, and both can maintain a high transfer rate, which improves the efficiency of data transmission.
  • In one possible design, the data exchange method is performed in an ordered mode, and determining that the source data can be transmitted to the first buffer unit and that the destination data can be transmitted to the second buffer unit includes: determining that the at least one source address read request has all been sent to the source storage unit and that the at least one destination address read request has all been sent to the destination storage unit.
  • Once it is determined that the at least one source address read request has all been sent to the source storage unit, it can be determined that the source data corresponding to each source address read request can be transmitted back to the first buffer unit; once it is determined that the at least one destination address read request has all been sent to the destination storage unit, it can be determined that the destination data corresponding to each destination address read request can be transmitted back to the second buffer unit. This prevents data from being overwritten by data sent from the opposite end before it has been transmitted to the buffer unit.
  • In one possible design, the data exchange method is performed in an out-of-order mode, and determining that the source data can be transmitted to the first buffer unit and that the destination data can be transmitted to the second buffer unit includes: determining that all the source data has been transmitted to the first buffer unit and that all the destination data has been transmitted to the second buffer unit.
  • In the out-of-order mode there is no ordering restriction between read requests and write requests, so only after all the source data has been transmitted to the first buffer unit and all the destination data has been transmitted to the second buffer unit can it be determined that the source data can be transmitted to the first buffer unit and the destination data can be transmitted to the second buffer unit; this likewise prevents data from being overwritten by data sent from the opposite end before it has been transmitted to the buffer unit.
  • an embodiment of the present application provides a data exchange device configured to exchange data between a source storage unit and a destination storage unit through a ring buffer unit.
  • the ring buffer unit includes a first buffer unit and a second buffer unit.
  • The device includes: a source read request generation module configured to generate at least one source address read request and send the at least one source address read request to the source storage unit, so that the source storage unit returns the source data corresponding to each source address read request to the first buffer unit; a destination read request generation module configured to generate at least one destination address read request and send the at least one destination address read request to the destination storage unit, so that the destination storage unit returns the destination data corresponding to each destination address read request to the second buffer unit; a data transmission determination module configured to determine that the source data can be transmitted to the first buffer unit and that the destination data can be transmitted to the second buffer unit; a write request generation module configured to generate a corresponding address write request according to the data received by the ring buffer unit, where the address write request includes the destination address of the corresponding data; and a data sending module configured to use the address write request to send the corresponding data to the destination address of the corresponding storage unit.
  • An embodiment of the present application provides a processor including a ring buffer unit, a source storage unit, a destination storage unit, a source read address logic generating circuit, and a destination read address logic generating circuit. The source read address logic generating circuit is connected to the source storage unit, the destination read address logic generating circuit is connected to the destination storage unit, the ring buffer unit includes a first buffer unit and a second buffer unit, the source storage unit is connected to the first buffer unit and the second buffer unit through a corresponding communication interface, and the destination storage unit is connected to the first buffer unit and the second buffer unit through a corresponding communication interface. The processor is configured to exchange data between the source storage unit and the destination storage unit through the first buffer unit and the second buffer unit. The source read address logic generating circuit is configured to generate at least one source address read request and send the at least one source address read request to the source storage unit, so that the source storage unit returns the source data corresponding to each source address read request to the first buffer unit; the destination read address logic generating circuit is configured to generate at least one destination address read request and send the at least one destination address read request to the destination storage unit, so that the destination storage unit returns the destination data corresponding to each destination address read request to the second buffer unit.
  • The processor may further include a destination write address logic generating circuit connected to the first buffer unit. The destination write address logic generating circuit is configured to generate a corresponding destination address write request according to the source data received by the first buffer unit, where the destination address write request includes the destination address of the corresponding source data; the first buffer unit is configured to use the destination address write request to send the corresponding source data to the destination address in the destination storage unit.
  • The destination write address logic generating circuit is configured to: each time it is detected that a piece of source data has been written into the first buffer unit, generate the destination address write request corresponding to the written source data according to the destination address read request corresponding to a piece of destination data that has already been written into the second buffer unit.
  • The processor may further include a source write address logic generating circuit connected to the second buffer unit. The source write address logic generating circuit is configured to generate a corresponding source address write request according to the destination data received by the second buffer unit, where the source address write request includes the destination address of the corresponding destination data; the second buffer unit is configured to use the source address write request to send the corresponding destination data to the destination address in the source storage unit.
  • The source write address logic generating circuit is configured to: each time it is detected that a piece of destination data has been written into the second buffer unit, generate the source address write request corresponding to the written destination data according to the source address read request corresponding to a piece of source data that has already been written into the first buffer unit.
  • In one implementation, the communication interface through which the source storage unit sends source data to the first buffer unit is the same as the communication interface through which the destination storage unit sends destination data to the second buffer unit.
  • In this case, the source storage unit is a Global Data Share (GDS) memory and the destination storage unit is a GDS memory; or the source storage unit is any one of a cache memory and a device memory, and the destination storage unit is any one of the cache memory and the device memory.
  • In another implementation, the communication interface through which the source storage unit sends source data to the first buffer unit is different from the communication interface through which the destination storage unit sends destination data to the second buffer unit.
  • In this case, the source storage unit is the GDS memory and the destination storage unit is any one of the cache memory and the device memory; or the source storage unit is any one of the cache memory and the device memory and the destination storage unit is the GDS memory.
  • In one embodiment, the data exchange method is performed in an ordered mode;
  • the source read address logic generating circuit is configured to determine that the at least one source address read request has all been sent to the source storage unit;
  • the destination read address logic generating circuit is configured to determine that the at least one destination address read request has all been sent to the destination storage unit.
  • In another embodiment, the data exchange method is performed in an out-of-order mode; the first buffer unit is configured to determine that all the source data has been transmitted to the first buffer unit; the second buffer unit is configured to determine that all the destination data has been transmitted to the second buffer unit.
  • An embodiment of the present application provides a computer system including the processor of the foregoing third aspect or of any optional implementation of the third aspect.
  • this application provides an executable program product that, when the executable program product runs on a computer, causes the computer to execute the method in the first aspect or any possible implementation of the first aspect.
  • This application also provides a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, it executes the method in the first aspect or in any possible implementation of the first aspect.
  • FIG. 1 is a hardware flowchart of a GPU corresponding to an embodiment of the application
  • FIG. 2 is a schematic flowchart of a data exchange method provided by an embodiment of the application.
  • FIG. 3 is a schematic flowchart of a specific implementation manner of some steps of the data exchange method provided by an embodiment of the application;
  • FIG. 5 is a schematic structural block diagram of a data exchange device provided by an embodiment of the application.
  • Fig. 6 is a schematic structural diagram of a computer system according to an embodiment of the application.
  • FIG. 1 is a hardware flow chart of the GPU performing a Direct Memory Access (DMA) operation.
  • The GPU includes a ring buffer unit, a source read address logic generation circuit, a destination write address logic generation circuit, a GDS interface, a cache interface, a Global Data Share (GDS) memory, a cache memory, a device memory, and a device memory controller.
  • the source read address logic generation circuit is respectively connected to the GDS interface and the cache interface
  • the destination write address logic generation circuit is respectively connected to the GDS interface 108 and the cache interface.
  • the GDS memory is connected to the ring buffer unit through the GDS interface
  • the cache memory is connected to the cache interface through a plurality of cache routes
  • the cache interface is also connected to the ring buffer unit.
  • the cache memory is also connected to the device memory through the device memory controller.
  • a ring buffer unit is configured as a buffer unit for buffering data, and the ring buffer unit may be implemented by a static random-access memory (SRAM).
  • the GPU can generate a source address read request by using a source read address logic generating circuit, the source address read request can read data from the source storage unit, and the data obtained from the source storage unit can be written into the ring buffer unit.
  • The ring buffer unit has a pointer corresponding to the source address read request. The pointer points to the blank storage space in the ring buffer unit into which data is to be written; when data is written into that blank storage space, the position the pointer points to is updated so that the pointer points to the next blank storage space into which data is to be written.
  • the source storage unit may be any one of GDS memory, cache memory, and device memory.
  • the GPU can generate a destination address write request by using the destination write address logic generating circuit.
  • the destination address write request can read data from the ring buffer unit, and the data read from the ring buffer unit is written to the destination storage unit.
  • The ring buffer unit also has a pointer corresponding to the destination address write request. This pointer points to the storage space holding the data to be read out of the ring buffer unit; when the corresponding data has been read, the position the pointer points to is updated so that the pointer points to the next storage space holding data to be read out of the ring buffer unit.
  • the destination storage unit may also be any one of GDS memory, cache memory, and device memory.
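  • A minimal sketch of this two-pointer bookkeeping, assuming a fixed number of equally sized slots per buffer half; the names and the slot count are illustrative assumptions.

```c
/* Two-pointer bookkeeping for one buffer half, assuming RB_ENTRIES
 * fixed-size slots; names and slot count are illustrative. */
#define RB_ENTRIES 32u

typedef struct {
    unsigned fill;   /* read-request path: next blank slot to be filled    */
    unsigned drain;  /* write-request path: next filled slot to be drained */
} rb_pointers_t;

/* Data returned by a read request goes into the blank slot the fill
 * pointer names; the pointer then advances to the next blank slot. */
static unsigned rb_claim_blank_slot(rb_pointers_t *p)
{
    unsigned slot = p->fill;
    p->fill = (p->fill + 1u) % RB_ENTRIES;
    return slot;
}

/* A write request reads out the oldest filled slot; the drain pointer
 * then advances to the next filled slot. */
static unsigned rb_take_filled_slot(rb_pointers_t *p)
{
    unsigned slot = p->drain;
    p->drain = (p->drain + 1u) % RB_ENTRIES;
    return slot;
}
```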
  • data exchange is implemented by extending the DMA engine in the command processor (Command processor), and a swap mode is added to the DMA_DATA command packet, thereby using DMA_DATA_SWAP to implement data exchange.
  • a new field [31:31] can be added to DW0 of the DMA_DATA command packet:
  • 0 = DMA_MODE: copy data from source to destination.
  • 1 = SWAP_MODE: swap data between source and destination.
  • The new field above defines the SWAP-mode bit of the DMA SWAP operation.
  • When the SWAP-mode bit is set to 0, data is copied from the source location to the destination location according to the DMA mode; when the SWAP-mode bit is set to 1, the data at the source location and the data at the destination location are exchanged.
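  • The following C macros sketch how such a mode bit could be packed into and read back from DW0; only the bit position [31:31] and the 0/1 semantics come from the text above, and the macro and function names are illustrative assumptions.

```c
#include <stdint.h>

/* Bit [31:31] of DW0 selects the operating mode of the DMA_DATA packet;
 * macro and function names are illustrative. */
#define DMA_DATA_DW0_MODE_SHIFT  31u
#define DMA_DATA_DW0_MODE_MASK   (1u << DMA_DATA_DW0_MODE_SHIFT)

#define DMA_MODE   0u  /* copy data from source to destination     */
#define SWAP_MODE  1u  /* swap data between source and destination */

static inline uint32_t dma_data_dw0_set_mode(uint32_t dw0, uint32_t mode)
{
    return (dw0 & ~DMA_DATA_DW0_MODE_MASK) |
           ((mode << DMA_DATA_DW0_MODE_SHIFT) & DMA_DATA_DW0_MODE_MASK);
}

static inline uint32_t dma_data_dw0_get_mode(uint32_t dw0)
{
    return (dw0 & DMA_DATA_DW0_MODE_MASK) >> DMA_DATA_DW0_MODE_SHIFT;
}
```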
  • FIG. 2 shows a data exchange method provided by an embodiment of the present application.
  • the method can be executed by the GPU shown in FIG. 1.
  • The GPU includes a ring buffer unit 102, a source read address logic generating circuit 104, a source write address logic generating circuit 105, a destination write address logic generating circuit 106, and a destination read address logic generating circuit 107.
  • the ring buffer unit 102 includes a first buffer unit 1021 and a second buffer unit 1022.
  • the first buffer unit 1021 is connected to the GDS interface 108 and the cache interface 110, respectively, and the second buffer unit 1022 is connected to the GDS interface 108 and the cache interface 110, respectively.
  • the source read address logic generating circuit 104 is connected to the GDS interface 108 and the cache interface 110 respectively, the source write address logic generating circuit 105 is connected to the second buffer unit 1022, and the destination write address logic generating circuit 106 is connected to the first buffer unit 1021.
  • The destination read address logic generating circuit 107 is connected to the GDS interface 108 and the cache interface 110, respectively.
  • the GDS interface 108 is also connected to the GDS memory 112, and the cache interface 110 is also connected to the cache memory 116 through a plurality of cache routes 114.
  • the cache memory 116 is also connected to the device memory controller 118, and the device memory controller 118 is also connected to the device memory 120.
  • The storage space of the ring buffer unit 102 may be set to 64*512 bits, and the storage spaces of the first buffer unit 1021 and the second buffer unit 1022 may be equal, each being half of that of the ring buffer unit 102 (for example, the storage spaces of the first buffer unit 1021 and the second buffer unit 1022 may both be 32*512 bits), so that the ring buffer unit 102 is utilized to a greater extent.
  • The storage space of the first buffer unit 1021 may be equal or unequal to the storage space of the second buffer unit 1022, and the specific sizes of the storage spaces of the first buffer unit 1021 and the second buffer unit 1022 should not be construed as limiting the present application. The sizing of this example is spelled out in the sketch below.
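  • A sketch of the example sizing with assumed macro names; only the 64*512-bit total and the even split come from the text.

```c
/* Example sizing from the text: a 64-entry by 512-bit ring buffer,
 * split evenly between the two buffer halves. Other splits are allowed. */
#define RB_ENTRY_BITS     512u
#define RB_TOTAL_ENTRIES  64u
#define RB_HALF_ENTRIES   (RB_TOTAL_ENTRIES / 2u)                 /* 32 entries per half */
#define RB_HALF_BYTES     (RB_HALF_ENTRIES * RB_ENTRY_BITS / 8u)  /* 2048 bytes per half */
#define RB_TOTAL_BYTES    (2u * RB_HALF_BYTES)                    /* 4096 bytes in total */
```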
  • the GPU is configured to exchange data between the source storage unit and the destination storage unit through the first buffer unit 1021 and the second buffer unit 1022 of the ring buffer unit 102.
  • the method specifically includes the following steps S110 to S150:
  • Step S110: Generate at least one source address read request, and send the at least one source address read request to the source storage unit, so that the source storage unit returns the source data corresponding to each source address read request to the first buffer unit 1021.
  • Each source address read request in the at least one source address read request includes a corresponding source address, each source address stores corresponding source data, and each source address is located in a source storage unit.
  • the source address read request may be generated by the GPU through the source read address logic generating circuit 104.
  • the GPU can use the source read address logic generating circuit 104 to send each generated source address read request to the source storage unit.
  • the source storage unit finds the corresponding source data according to the source address, and returns the source data to the first buffer unit 1021.
  • Step S120: Generate at least one destination address read request, and send the at least one destination address read request to the destination storage unit, so that the destination storage unit returns the destination data corresponding to each destination address read request to the second buffer unit 1022.
  • Each destination address read request in the at least one destination address read request includes a corresponding destination address, each destination address stores corresponding destination data, and each destination address is located in a destination storage unit.
  • the destination address read request may be generated by the GPU through the destination read address logic generating circuit 107.
  • The GPU can use the destination read address logic generating circuit 107 to send each generated destination address read request to the destination storage unit.
  • The destination storage unit finds the corresponding destination data according to the destination address, and returns the destination data to the second buffer unit 1022.
  • Step S130: Determine that the source data can be transmitted to the first buffer unit 1021 and that the destination data can be transmitted to the second buffer unit 1022.
  • The GPU determines that all the source data can be transmitted to the first buffer unit 1021, and that all the destination data can be transmitted to the second buffer unit 1022. Determining that all the source data can be transferred to the first buffer unit 1021 avoids the situation in which the destination data is sent to the source storage unit before the source data has been completely transferred to the first buffer unit 1021, causing source data that has not yet been transferred to be overwritten by the destination data. Similarly, determining that all the destination data can be transmitted to the second buffer unit 1022 avoids the situation in which the source data is sent to the destination storage unit before the destination data has been completely transmitted to the second buffer unit 1022, causing destination data that has not yet been transferred to be overwritten by the source data.
  • The condition under which it is determined that the source data can be transmitted to the first buffer unit 1021 and the destination data can be transmitted to the second buffer unit 1022 differs depending on the mode.
  • If the data exchange method is carried out in an ordered mode, step S130 may be: determining that the at least one source address read request has all been sent to the source storage unit, and that the at least one destination address read request has all been sent to the destination storage unit.
  • In the ordered mode, read requests are sent first and write requests are sent later, so the data requested by a read request is already on its way back to the ring buffer unit 102 before any write request is sent. Therefore, once the source read address logic generating circuit 104 determines that the at least one source address read request has all been sent to the source storage unit, it can be determined that the source data corresponding to each source address read request can be transmitted back to the first buffer unit 1021.
  • Likewise, once the destination read address logic generating circuit 107 determines that the at least one destination address read request has all been sent to the destination storage unit, it can be determined that the destination data corresponding to each destination address read request can be transmitted back to the second buffer unit 1022, thereby preventing data from being overwritten by data sent from the opposite end before it has been transmitted to the buffer unit.
  • If the data exchange method is carried out in an out-of-order mode, step S130 may be: determining that all the source data has been transmitted to the first buffer unit 1021 and that all the destination data has been transmitted to the second buffer unit 1022.
  • In the out-of-order mode there is no ordering restriction between read requests and write requests, so only after all the source data has been transmitted to the first buffer unit 1021 and all the destination data has been transmitted to the second buffer unit 1022 can it be determined that the source data can be transmitted to the first buffer unit 1021 and the destination data can be transmitted to the second buffer unit 1022; this prevents data from being overwritten by data sent from the opposite end before it has been transmitted to the buffer unit. A predicate capturing both modes is sketched below.
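  • Both readiness conditions of step S130 can be written as a single predicate; the following C sketch uses assumed counter and type names that are not taken from the application.

```c
#include <stdbool.h>

typedef enum { ORDERED_MODE, OUT_OF_ORDER_MODE } xfer_mode_t;

typedef struct {
    unsigned reqs_total;     /* read requests that must be issued for this side */
    unsigned reqs_sent;      /* read requests already sent to the storage unit  */
    unsigned data_returned;  /* data items already landed in the buffer half    */
} xfer_progress_t;

/* Step S130 as a predicate: in the ordered mode it is enough that every
 * read request has been issued; in the out-of-order mode every data item
 * must already have arrived in its buffer half. */
static bool side_ready(const xfer_progress_t *p, xfer_mode_t mode)
{
    if (mode == ORDERED_MODE)
        return p->reqs_sent == p->reqs_total;
    return p->data_returned == p->reqs_total;
}

/* Write requests (steps S140/S150) may only start once both sides pass. */
static bool can_start_write_requests(const xfer_progress_t *src,
                                     const xfer_progress_t *dst,
                                     xfer_mode_t mode)
{
    return side_ready(src, mode) && side_ready(dst, mode);
}
```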
  • Step S140: Generate a corresponding address write request according to the data received by the ring buffer unit 102.
  • Step S150: Use the address write request to send the corresponding data to the destination address of the corresponding storage unit.
  • The address write request includes the destination address of the corresponding data; the destination address corresponding to the data received by the first buffer unit 1021 is located in the destination storage unit, and the destination address corresponding to the data received by the second buffer unit 1022 is located in the source storage unit.
  • After it is determined that the source data can be transmitted to the first buffer unit 1021 and the destination data can be transmitted to the second buffer unit 1022, the address write request corresponding to the first buffer unit 1021 sends the source data received by the first buffer unit 1021 to the destination storage unit, and the address write request corresponding to the second buffer unit 1022 sends the destination data received by the second buffer unit 1022 to the source storage unit.
  • the present application uses the ring buffer unit 102 to realize the data exchange between the source storage unit and the destination storage unit, which reduces power consumption and saves cost compared with the prior art.
  • steps S140 to S150 correspond to the following steps S141 to S151, respectively:
  • Step S141: Generate a corresponding destination address write request according to the source data received by the first buffer unit 1021.
  • the destination address write request includes the destination address of the corresponding source data, and the destination address is located in the destination storage unit.
  • Each time the first buffer unit 1021 receives source data, the destination write address logic generating circuit 106 can generate a corresponding destination address write request, so that the source data in the first buffer unit 1021 can be transferred to the destination storage unit in time.
  • Step S151: Use the destination address write request to send the corresponding source data to the destination address of the destination storage unit.
  • the GPU executes the destination address write request, and sends the source data to the destination address of the destination storage unit through the first buffer unit 1021, so as to write the source data into the destination storage unit.
  • In this way, the GPU can generate a destination address write request that includes the destination address of the source data, the destination address being located in the destination storage unit, so that the source data received by the first buffer unit 1021 is written into the destination storage unit.
  • the source data received by the first buffer unit 1021 comes from the source storage unit. Therefore, the first buffer unit 1021 is used to realize the data transfer process from the source storage unit to the destination storage unit.
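  • A minimal sketch of this per-data write-request generation; the structure and field names are assumptions, and only the pairing of each piece of source data with a destination address comes from the text.

```c
#include <stdint.h>

/* Addresses associated with one exchanged data item; field names are
 * illustrative, not taken from the application. */
typedef struct {
    uint64_t src_addr;  /* where the source data was read from         */
    uint64_t dst_addr;  /* where that source data has to be written to */
} addr_pair_t;

typedef struct {
    uint64_t write_addr;  /* destination ("where to") address of the data */
    unsigned slot;        /* ring-buffer slot currently holding the data  */
} write_request_t;

/* Steps S141/S151 in miniature: as soon as slot `slot` of the first
 * buffer unit holds source data, emit the destination address write
 * request that pushes it out to the destination storage unit. */
static write_request_t make_dest_write_request(const addr_pair_t *pair,
                                               unsigned slot)
{
    write_request_t wr = { .write_addr = pair->dst_addr, .slot = slot };
    return wr;
}
```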
  • steps S140 to S150 correspond to the following steps S241 to S251, respectively:
  • Step S241: Generate a corresponding source address write request according to the destination data received by the second buffer unit 1022.
  • the source address write request includes the destination address of the corresponding destination data, and the destination address is located in the source storage unit.
  • Each time the second buffer unit 1022 receives destination data, the source write address logic generating circuit 105 generates a corresponding source address write request, so that the destination data in the second buffer unit 1022 can be transferred to the source storage unit in time.
  • Step S251: Use the source address write request to send the corresponding destination data to the destination address of the source storage unit.
  • the GPU executes the source address write request, and sends the destination data to the destination address of the source storage unit through the second buffer unit 1022, so as to write the destination data into the source storage unit.
  • In this way, the GPU can generate a source address write request that includes the destination address of the destination data, the destination address being located in the source storage unit, so that the destination data received by the second buffer unit 1022 is written into the source storage unit.
  • the destination data received by the second buffer unit 1022 comes from the destination storage unit. Therefore, the second buffer unit 1022 is used to realize the data transfer process from the destination storage unit to the source storage unit.
  • If the communication interface through which the source storage unit sends the source data to the first buffer unit 1021 is the same as the communication interface through which the destination storage unit sends the destination data to the second buffer unit 1022, then step S120 can be executed after step S110.
  • the communication interface for the source storage unit to send source data to the first buffer unit 1021 and the communication interface for the destination storage unit to send destination data to the second buffer unit 1022 are both the cache interface 110 shown in FIG. 1.
  • the source storage unit and the destination storage unit may be different storage units connected to the cache interface 110, for example, the source storage unit may be the cache memory 116, and the destination storage unit may be the device memory 120; It can be understood that the source storage unit and the destination storage unit can also be interchanged, that is, the source storage unit may be the device memory 120, and the destination storage unit may be the cache memory 116.
  • The goal of the data exchange method provided by the embodiments of this application is to exchange data held in two places. Therefore, of the two carriers holding the data to be exchanged, either one can be taken as the source storage unit, and the other carrier then naturally becomes the destination storage unit.
  • the source storage unit and the destination storage unit can also be the same storage unit, that is, data exchange can be the exchange of data in different locations of the same storage unit.
  • the source storage unit and the destination storage unit can both be Cache memory 116.
  • the purpose of sending the destination address read request to the destination storage unit is to make the destination storage unit return the destination data.
  • the purpose of sending the source address read request to the source storage unit is to make the source storage unit return the source data.
  • Take the case where the source storage unit is the device memory 120 and the destination storage unit is the cache memory 116 as an example of the communication interfaces being the same (both being the cache interface 110):
  • the source read address logic generating circuit 104 generates multiple source address read requests, and the GPU sends the multiple source address read requests to the source storage unit: the device memory 120.
  • the device memory 120 can obtain the corresponding source data for each source address read request, and send the source data to the first buffer unit 1021 via the cache interface 110.
  • In the ordered mode, after the GPU sends the multiple source address read requests to the device memory 120, it can determine that the source data can be transmitted to the first buffer unit 1021; in the out-of-order mode, the source data corresponding to each of the multiple source address read requests must have been received in the first buffer unit 1021 before it can be determined that the source data can be transmitted to the first buffer unit 1021.
  • the destination read address logic generating circuit 107 generates multiple destination address read requests, and the GPU sends the multiple destination address read requests to the destination storage unit: the cache memory 116.
  • The cache memory 116 can obtain the corresponding destination data for each destination address read request, and send the destination data to the second buffer unit 1022 via the cache interface 110.
  • In the ordered mode, after the GPU sends the multiple destination address read requests to the cache memory 116, it can determine that the destination data can be transmitted to the second buffer unit 1022; in the out-of-order mode, the destination data corresponding to each of the multiple destination address read requests must have been received in the second buffer unit 1022 before it can be determined that the destination data can be transmitted to the second buffer unit 1022.
  • The GPU can then use the destination write address logic generating circuit 106 to generate a corresponding destination address write request, which includes the destination address located in the destination storage unit (the cache memory 116), so that the source data from the source storage unit (the device memory 120) is buffered by the first buffer unit 1021 and transferred to the destination storage unit (the cache memory 116).
  • When it is determined that at least one piece of destination data can be transmitted to the second buffer unit 1022, the GPU uses the destination write address logic generating circuit 106 to generate the corresponding destination address write request as follows: each time a piece of source data is written into the first buffer unit 1021, the destination address write request corresponding to the written source data is generated according to the destination address read request corresponding to a piece of destination data that has already been written into the second buffer unit 1022. In this way, the timeliness of writing the source data to the destination storage unit is guaranteed, and even if at least one piece of destination data has not yet been written into the second buffer unit 1022, that not-yet-buffered destination data is prevented from being overwritten.
  • Correspondingly, the step of generating a corresponding source address write request includes: each time it is detected that a piece of destination data has been written into the second buffer unit 1022, generating the source address write request corresponding to the written destination data according to the source address read request corresponding to a piece of source data that has already been written into the first buffer unit 1021.
  • The GPU can use the source write address logic generating circuit 105 to generate the corresponding source address write request, which includes the destination address located in the source storage unit (the device memory 120), so that the destination data from the destination storage unit (the cache memory 116) is buffered by the second buffer unit 1022 and transferred to the source storage unit (the device memory 120). A counter-based sketch of this pairing rule follows.
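  • The anti-overwrite pairing just described can be expressed with two counters per direction; the following C sketch uses assumed names and models only the rule itself.

```c
#include <stdbool.h>

/* Progress counters for the anti-overwrite pairing; names are assumed. */
typedef struct {
    unsigned src_buffered;        /* source data items in the first buffer unit        */
    unsigned dst_buffered;        /* destination data items in the second buffer unit  */
    unsigned dest_writes_issued;  /* destination address write requests already issued */
    unsigned src_writes_issued;   /* source address write requests already issued      */
} pairing_state_t;

/* A destination address write (source data -> destination storage unit)
 * is only issued for source data that is already buffered, and never runs
 * ahead of the destination data already saved into the second buffer unit,
 * so unsaved destination data cannot be overwritten. */
static bool may_issue_dest_write(const pairing_state_t *p)
{
    return p->dest_writes_issued < p->src_buffered &&
           p->dest_writes_issued < p->dst_buffered;
}

/* Symmetric rule for source address writes (destination data -> source
 * storage unit): never run ahead of the source data already buffered. */
static bool may_issue_source_write(const pairing_state_t *p)
{
    return p->src_writes_issued < p->dst_buffered &&
           p->src_writes_issued < p->src_buffered;
}
```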
  • step S120 can be executed in parallel with step S110.
  • the communication interface through which the source storage unit sends source data to the first buffer unit 1021 is the GDS interface 108 shown in FIG. 1, and the communication interface through which the destination storage unit sends destination data to the second buffer unit 1022 is the cache interface 110.
  • In this case, the destination storage unit returning the destination data and the source storage unit returning the source data proceed in parallel without interfering with each other, and both can maintain a high transmission rate, thereby improving the efficiency of data transmission.
  • Take the case where the source storage unit is the GDS memory 112 and the destination storage unit is the cache memory 116 as an example of the communication interfaces being different:
  • the source read address logic generating circuit 104 generates multiple source address read requests, and the GPU sends the multiple source address read requests to the source storage unit: the GDS memory 112.
  • the GDS memory 112 can obtain the corresponding source data for each source address read request, and send the source data to the first buffer unit 1021 via the GDS interface 108.
  • The destination read address logic generating circuit 107 generates multiple destination address read requests, and the GPU sends the multiple destination address read requests to the destination storage unit: the cache memory 116.
  • The cache memory 116 can obtain the corresponding destination data for each destination address read request, and send the destination data to the second buffer unit 1022 via the cache interface 110.
  • The GPU can use the destination write address logic generating circuit 106 to generate a corresponding destination address write request, which includes the destination address located in the destination storage unit (the cache memory 116), so that the source data from the source storage unit (the GDS memory 112) is buffered by the first buffer unit 1021 and transferred to the destination storage unit (the cache memory 116).
  • The GPU can use the source write address logic generating circuit 105 to generate a corresponding source address write request, which includes the destination address located in the source storage unit (the GDS memory 112), so that the destination data from the destination storage unit (the cache memory 116) is buffered by the second buffer unit 1022 and transferred to the source storage unit (the GDS memory 112). Whether the two read steps run serially or in parallel depends on whether the interfaces are shared, as sketched below.
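  • The scheduling difference between the shared-interface and separate-interface cases can be sketched as follows; the issue routines and enum values are assumptions standing in for steps S110 and S120.

```c
typedef enum { IF_GDS, IF_CACHE } comm_interface_t;

/* Placeholders standing in for "send source address read request i"
 * (step S110) and "send destination address read request i" (step S120). */
void issue_source_read(unsigned i);
void issue_destination_read(unsigned i);

/* Scheduling rule from the text: on a shared interface, all source reads
 * are issued before any destination read so the returned data streams do
 * not interleave on one interface; on separate interfaces the two request
 * streams are interleaved (issued "in parallel"). */
void schedule_reads(comm_interface_t src_if, comm_interface_t dst_if,
                    unsigned n_src, unsigned n_dst)
{
    if (src_if == dst_if) {
        for (unsigned i = 0; i < n_src; i++) issue_source_read(i);
        for (unsigned i = 0; i < n_dst; i++) issue_destination_read(i);
    } else {
        unsigned n = n_src > n_dst ? n_src : n_dst;
        for (unsigned i = 0; i < n; i++) {
            if (i < n_src) issue_source_read(i);
            if (i < n_dst) issue_destination_read(i);
        }
    }
}
```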
  • FIG. 5 shows a data exchange device provided by an embodiment of the present application.
  • the device 300 includes:
  • The source read request generating module 310 is configured to generate at least one source address read request, and send the at least one source address read request to the source storage unit, so that the source storage unit returns the source data corresponding to each source address read request to the first buffer unit.
  • The destination read request generating module 320 is configured to generate at least one destination address read request, and send the at least one destination address read request to the destination storage unit, so that the destination storage unit returns the destination data corresponding to each destination address read request to the second buffer unit.
  • the data transmission determining module 330 is configured to determine that the source data can be transmitted to the first buffer unit, and the destination data can be transmitted to the second buffer unit;
  • the write request generation module 340 is configured to generate a corresponding address write request according to the data received by the ring buffer unit, wherein the address write request includes the destination address of the corresponding data, and the first buffer unit receives The destination address corresponding to the data is located in the destination storage unit, and the destination address corresponding to the data received by the second buffer unit is located in the source storage unit;
  • the data sending module 350 is configured to use the address write request to send the corresponding data to the destination address of the corresponding storage unit.
  • The write request generation module 340 is specifically configured to generate a corresponding destination address write request according to the source data received by the first buffer unit, where the destination address write request includes the destination address of the corresponding source data.
  • the data sending module 350 is specifically configured to use the destination address write request to send the corresponding source data to the destination address of the destination storage unit.
  • The write request generation module 340 is specifically configured to generate a corresponding source address write request according to the destination data received by the second buffer unit, where the source address write request includes the destination address of the corresponding destination data.
  • the data sending module 350 is specifically configured to use the source address write request to send the corresponding destination data to the destination address of the source storage unit.
  • The data exchange method is performed in an ordered mode, and the data transmission determining module 330 is specifically configured to determine that the at least one source address read request has all been sent to the source storage unit and that the at least one destination address read request has all been sent to the destination storage unit.
  • the data exchange method is performed in an out-of-order mode, and the data transmission determining module 330 is specifically configured to determine that the source data are all transmitted to the first buffer unit, and the destination data Are transmitted to the second buffer unit.
  • the data exchange device shown in FIG. 5 corresponds to the data exchange method shown in FIG. 2, and will not be repeated here.
  • The processor includes a ring buffer unit, a source storage unit, a destination storage unit, a source read address logic generating circuit, and a destination read address logic generating circuit; the source read address logic generating circuit is connected to the source storage unit, and the destination read address logic generating circuit is connected to the destination storage unit.
  • The ring buffer unit includes a first buffer unit and a second buffer unit; the source storage unit is connected to the first buffer unit and the second buffer unit through a corresponding communication interface, and the destination storage unit is connected to the first buffer unit and the second buffer unit through a corresponding communication interface.
  • The processor is configured to exchange data between the source storage unit and the destination storage unit through the first buffer unit and the second buffer unit.
  • The source read address logic generating circuit is configured to generate at least one source address read request and send the at least one source address read request to the source storage unit, so that the source storage unit returns the source data corresponding to each source address read request to the first buffer unit; the destination read address logic generating circuit is configured to generate at least one destination address read request and send the at least one destination address read request to the destination storage unit, so that the destination storage unit returns the destination data corresponding to each destination address read request to the second buffer unit.
  • In one embodiment, the processor further includes a destination write address logic generating circuit connected to the first buffer unit; the destination write address logic generating circuit is configured to generate a corresponding destination address write request according to the source data received by the first buffer unit, where the destination address write request includes the destination address of the corresponding source data; the first buffer unit is configured to use the destination address write request to send the corresponding source data to the destination address in the destination storage unit.
  • In one embodiment, the processor further includes a source write address logic generating circuit connected to the second buffer unit; the source write address logic generating circuit is configured to generate a corresponding source address write request according to the destination data received by the second buffer unit, where the source address write request includes the destination address of the corresponding destination data; the second buffer unit is configured to use the source address write request to send the corresponding destination data to the destination address in the source storage unit.
  • the communication interface for the source storage unit to send source data to the first buffer unit is the same as the communication interface for the destination storage unit to send destination data to the second buffer unit;
  • In this case, the source storage unit is a Global Data Share (GDS) memory and the destination storage unit is a GDS memory; or the source storage unit is any one of a cache memory and a device memory, and the destination storage unit is any one of the cache memory and the device memory.
  • the communication interface through which the source storage unit sends source data to the first buffer unit is different from the communication interface through which the destination storage unit sends destination data to the second buffer unit;
  • In this case, the source storage unit is the GDS memory and the destination storage unit is any one of the cache memory and the device memory; or the source storage unit is any one of the cache memory and the device memory and the destination storage unit is the GDS memory.
  • In one embodiment, the data exchange method is performed in an ordered mode;
  • the source read address logic generating circuit is configured to determine that the at least one source address read request has all been sent to the source storage unit;
  • the destination read address logic generating circuit is configured to determine that the at least one destination address read request has all been sent to the destination storage unit.
  • In another embodiment, the data exchange method is performed in an out-of-order mode; the first buffer unit is configured to determine that all the source data has been transmitted to the first buffer unit; the second buffer unit is configured to determine that all the destination data has been transmitted to the second buffer unit.
  • FIG. 6 is a schematic structural diagram of a computer system according to an embodiment of the application.
  • the computer system may be composed of a hardware subsystem and a software subsystem.
  • The computer system includes a processor 601, a memory 602, and a bus 603; the processor 601 and the memory 602 communicate with each other through the bus 603, and the processor 601 is configured to call the program instructions in the memory 602 to perform image- and graphics-related operations.
  • the method for the processor 601 to exchange data is consistent with the foregoing embodiment, and will not be repeated here.
  • The processor 601 may be a graphics processor, a central processing unit (CPU), an accelerated processing unit, or another type of processor, such as a network processor (NP) or an application processor (in some products, the application processor is the CPU). The processor 601 provided in the embodiments of this application can be configured for graphics processing scenarios, or for computational scenarios such as deep computation.
  • the disclosed device and method may be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of the units is only a logical function division, and there may be other divisions in actual implementation.
  • For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be through some communication interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional modules in the various embodiments of the present application may be integrated together to form an independent part, or each module may exist alone, or two or more modules may be integrated to form an independent part.
  • The data exchange method, device, processor, and computer system provided by the present application realize the data exchange between the source storage unit and the destination storage unit by using a ring buffer unit, without launching a new kernel, which not only reduces power consumption but also saves cost.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Information Transfer Systems (AREA)

Abstract

This application provides a data exchange method, apparatus, processor, and computer system, including: generating at least one source address read request and sending the at least one source address read request to a source storage unit, so that the source storage unit returns the corresponding source data to a first buffer unit; generating at least one destination address read request and sending the at least one destination address read request to a destination storage unit, so that the destination storage unit returns the corresponding destination data to a second buffer unit; determining that the source data can be transmitted to the first buffer unit and that the destination data can be transmitted to the second buffer unit; generating corresponding address write requests according to the data received by the ring buffer unit; and using the address write requests to send the corresponding data to the destination addresses of the corresponding storage units. By using a ring buffer unit, this application realizes the data exchange between the source storage unit and the destination storage unit without launching a new kernel, which reduces power consumption and saves cost.

Description

Data exchange method, device, processor, and computer system
Cross-reference to related applications
This application claims priority to the Chinese patent application No. 201911317544X, filed with the Chinese Patent Office on December 18, 2019 and entitled "Data exchange method, device, processor and computer system", the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of data processing technology, and in particular to a data exchange method, device, processor, and computer system.
Background
For electronic devices such as graphics processing units (GPU), there is often a need to exchange data stored in the memory of the electronic device. For example, in the Highly Parallel Linear system package (HPL), it is usually necessary to search for the pivot element and exchange the current row with the row containing the pivot element; or, in a multi-core system, sub-matrix operations need to be performed on each core, and the sub-matrices of these cores then need to be exchanged.
In some scenarios, a new core often has to be introduced to perform such data exchange, which tends to cause a large amount of power consumption.
Summary
In a first aspect, an embodiment of the present application provides a data exchange method in which data of a source storage unit and a destination storage unit is exchanged through a ring buffer unit, the ring buffer unit including a first buffer unit and a second buffer unit. The method includes: generating at least one source address read request and sending the at least one source address read request to the source storage unit, so that the source storage unit returns the source data corresponding to each source address read request to the first buffer unit; generating at least one destination address read request and sending the at least one destination address read request to the destination storage unit, so that the destination storage unit returns the destination data corresponding to each destination address read request to the second buffer unit; determining that the source data can be transmitted to the first buffer unit and that the destination data can be transmitted to the second buffer unit; generating, according to the data received by the ring buffer unit, corresponding address write requests, where an address write request includes the target address of the corresponding data (i.e., the address to which that data is to be written), the target address corresponding to data received by the first buffer unit is located in the destination storage unit, and the target address corresponding to data received by the second buffer unit is located in the source storage unit; and using the address write requests to send the corresponding data to the target addresses in the corresponding storage units.
In the above implementation, source address read requests are generated and the corresponding source data is requested from the source storage unit, so that the source storage unit returns the corresponding source data to the first buffer unit of the ring buffer unit; destination address read requests are generated and the corresponding destination data is requested from the destination storage unit, so that the destination storage unit returns the corresponding destination data to the second buffer unit of the ring buffer unit. After it is determined that all the source data can be transmitted to the first buffer unit and all the destination data can be transmitted to the second buffer unit, corresponding address write requests are generated for the source data received by the first buffer unit, so that the source data received by the first buffer unit is sent to the destination storage unit; corresponding address write requests are also generated for the destination data received by the second buffer unit, so that the destination data received by the second buffer unit is sent to the source storage unit. The present application realizes the data exchange between the source storage unit and the destination storage unit by using the ring buffer unit, which, compared with the prior art, reduces power consumption and saves cost.
In one possible design, at least one piece of the source data is returned to the first buffer unit, and generating the corresponding address write request according to the data received by the ring buffer unit includes: generating, according to the source data received by the first buffer unit, a corresponding destination address write request, where the destination address write request includes the target address of the corresponding source data; and using the address write request to send the corresponding data to the target address in the corresponding storage unit includes: using the destination address write request to send the corresponding source data to the target address in the destination storage unit.
In the above implementation, for the source data received by the first buffer unit, the GPU can generate a destination address write request that includes the target address of that source data, the target address being located in the destination storage unit, thereby writing the source data received by the first buffer unit into the destination storage unit. The source data received by the first buffer unit comes from the source storage unit; the first buffer unit is thus used to transfer data from the source storage unit to the destination storage unit.
In one possible design, after it is determined that at least one piece of the destination data can be transmitted to the second buffer unit, generating the corresponding address write request according to the data received by the ring buffer unit includes: each time it is detected that one piece of source data has been written into the first buffer unit, generating the destination address write request corresponding to the written source data according to the destination address read request corresponding to one piece of the destination data that has already been written into the second buffer unit.
In the above embodiment, not only can the destination address write request corresponding to the source data written into the first buffer unit be obtained in time, but destination data that has not yet been sent to the second buffer unit is also prevented from being overwritten.
In one possible design, at least one piece of the destination data is returned to the second buffer unit, and generating the corresponding address write request according to the data received by the ring buffer unit includes: generating, according to the destination data received by the second buffer unit, a corresponding source address write request, where the source address write request includes the target address of the corresponding destination data; and using the address write request to send the corresponding data to the target address in the corresponding storage unit includes: using the source address write request to send the corresponding destination data to the target address in the source storage unit.
In the above implementation, for the destination data received by the second buffer unit, the GPU can generate a source address write request that includes the target address of that destination data, the target address being located in the source storage unit, thereby writing the destination data received by the second buffer unit into the source storage unit. The destination data received by the second buffer unit comes from the destination storage unit; the second buffer unit is thus used to transfer data from the destination storage unit to the source storage unit.
In one possible design, after it is determined that at least one piece of the source data can be transmitted to the first buffer unit, generating the corresponding address write request according to the data received by the ring buffer unit includes: each time it is detected that one piece of destination data has been written into the second buffer unit, generating the source address write request corresponding to the written destination data according to the source address read request corresponding to one piece of the source data that has already been written into the first buffer unit.
In the above embodiment, not only can the source address write request corresponding to the destination data written into the second buffer unit be obtained in time, but source data that has not yet been sent to the first buffer unit is also prevented from being overwritten.
In one possible design, the communication interface through which the source storage unit sends source data to the first buffer unit is the same as the communication interface through which the destination storage unit sends destination data to the second buffer unit; the step of generating at least one destination address read request and sending the at least one destination address read request to the destination storage unit is performed after the step of generating at least one source address read request and sending the at least one source address read request to the source storage unit.
In the above implementation, the purpose of sending destination address read requests to the destination storage unit is to make the destination storage unit return the destination data; likewise, the purpose of sending source address read requests to the source storage unit is to make the source storage unit return the source data. When the communication interface through which the source storage unit transmits data is the same as the communication interface through which the destination storage unit transmits data, if the destination storage unit returned destination data in parallel with the source storage unit returning source data, the two return streams would interleave on the same communication interface and lower the transmission efficiency. Therefore, the destination storage unit returning destination data and the source storage unit returning source data do not take place in the same time period, which helps improve transmission efficiency.
In one possible design, the communication interface through which the source storage unit sends source data to the first buffer unit is different from the communication interface through which the destination storage unit sends destination data to the second buffer unit; the step of generating at least one destination address read request and sending the at least one destination address read request to the destination storage unit is performed in parallel with the step of generating at least one source address read request and sending the at least one source address read request to the source storage unit.
In the above implementation, the purpose of sending destination address read requests to the destination storage unit is to make the destination storage unit return the destination data; likewise, the purpose of sending source address read requests to the source storage unit is to make the source storage unit return the source data. When the communication interface through which the source storage unit transmits data is different from the communication interface through which the destination storage unit transmits data, the destination storage unit returning destination data and the source storage unit returning source data proceed in parallel without interfering with each other, and both can maintain a high transmission rate, which improves data transmission efficiency.
In one possible design, the data exchange method is performed in an in-order mode, and determining that the source data can be transmitted to the first buffer unit and the destination data can be transmitted to the second buffer unit includes: determining that the at least one source address read request has been sent to the source storage unit and the at least one destination address read request has been sent to the destination storage unit.
In the above implementation, once it is determined that the at least one source address read request has been sent to the source storage unit, it can be determined that the source data corresponding to each of the source address read requests can be transmitted back to the first buffer unit; once it is determined that the at least one destination address read request has been sent to the destination storage unit, it can be determined that the destination data corresponding to each of the destination address read requests can be transmitted back to the second buffer unit, thereby preventing data that has not yet been transmitted to a buffer unit from being overwritten by data sent from the other side.
In one possible design, the data exchange method is performed in an out-of-order mode, and determining that the source data can be transmitted to the first buffer unit and the destination data can be transmitted to the second buffer unit includes: determining that all the source data has been transmitted to the first buffer unit and all the destination data has been transmitted to the second buffer unit.
In the above implementation, the data exchange method is performed in an out-of-order mode, in which there is no ordering restriction between read requests and write requests; therefore, only after all the source data has been transmitted to the first buffer unit and all the destination data has been transmitted to the second buffer unit can it be determined that the source data can be transmitted to the first buffer unit and the destination data can be transmitted to the second buffer unit, thereby preventing data that has not yet been transmitted to a buffer unit from being overwritten by data sent from the other side.
In a second aspect, an embodiment of the present application provides a data exchange device configured to exchange data of a source storage unit and a destination storage unit through a ring buffer unit, the ring buffer unit including a first buffer unit and a second buffer unit. The device includes: a source read request generation module configured to generate at least one source address read request and send the at least one source address read request to the source storage unit, so that the source storage unit returns the source data corresponding to each source address read request to the first buffer unit; a destination read request generation module configured to generate at least one destination address read request and send the at least one destination address read request to the destination storage unit, so that the destination storage unit returns the destination data corresponding to each destination address read request to the second buffer unit; a data transmission determination module configured to determine that the source data can be transmitted to the first buffer unit and the destination data can be transmitted to the second buffer unit; a write request generation module configured to generate corresponding address write requests according to the data received by the ring buffer unit, where an address write request includes the target address of the corresponding data, the target address corresponding to data received by the first buffer unit is located in the destination storage unit, and the target address corresponding to data received by the second buffer unit is located in the source storage unit; and a data sending module configured to use the address write requests to send the corresponding data to the target addresses in the corresponding storage units.
In a third aspect, an embodiment of the present application provides a processor including a ring buffer unit, a source storage unit, a destination storage unit, a source read address logic generation circuit, and a destination read address logic generation circuit. The source read address logic generation circuit is connected to the source storage unit, and the destination read address logic generation circuit is connected to the destination storage unit. The ring buffer unit includes a first buffer unit and a second buffer unit; the source storage unit is connected to the first buffer unit and the second buffer unit through a corresponding communication interface, and the destination storage unit is connected to the first buffer unit and the second buffer unit through a corresponding communication interface. The processor is configured to exchange data of the source storage unit and the destination storage unit through the first buffer unit and the second buffer unit. The source read address logic generation circuit is configured to generate at least one source address read request and send the at least one source address read request to the source storage unit, so that the source storage unit returns the source data corresponding to each source address read request to the first buffer unit. The destination read address logic generation circuit is configured to generate at least one destination address read request and send the at least one destination address read request to the destination storage unit, so that the destination storage unit returns the destination data corresponding to each destination address read request to the second buffer unit. The processor is configured to determine that the source data can be transmitted to the first buffer unit and that the destination data can be transmitted to the second buffer unit; the processor is configured to generate corresponding address write requests according to the data received by the ring buffer unit, where an address write request includes the target address of the corresponding data, the target address corresponding to data received by the first buffer unit is located in the destination storage unit, and the target address corresponding to data received by the second buffer unit is located in the source storage unit; and the processor is configured to use the address write requests to send the corresponding data to the target addresses in the corresponding storage units.
In one possible design, the processor further includes a destination write address logic generation circuit connected to the first buffer unit; the destination write address logic generation circuit is configured to generate, according to the source data received by the first buffer unit, a corresponding destination address write request, where the destination address write request includes the target address of the corresponding source data; and the first buffer unit is configured to use the destination address write request to send the corresponding source data to the target address in the destination storage unit.
In one possible design, after it is determined that at least one piece of the destination data can be transmitted to the second buffer unit, the destination write address logic generation circuit is configured to: each time it is detected that one piece of source data has been written into the first buffer unit, generate the destination address write request corresponding to the written source data according to the destination address read request corresponding to one piece of the destination data that has already been written into the second buffer unit.
In one possible design, the processor further includes a source write address logic generation circuit connected to the second buffer unit; the source write address logic generation circuit is configured to generate, according to the destination data received by the second buffer unit, a corresponding source address write request, where the source address write request includes the target address of the corresponding destination data; and the second buffer unit is configured to use the source address write request to send the corresponding destination data to the target address in the source storage unit.
In one possible design, after it is determined that at least one piece of the source data can be transmitted to the first buffer unit, the source write address logic generation circuit is configured to: each time it is detected that one piece of destination data has been written into the second buffer unit, generate the source address write request corresponding to the written destination data according to the source address read request corresponding to one piece of the source data that has already been written into the first buffer unit.
In one possible design, the communication interface through which the source storage unit sends source data to the first buffer unit is the same as the communication interface through which the destination storage unit sends destination data to the second buffer unit; the source storage unit is a global data share (GDS) memory and the destination storage unit is a GDS memory; or the source storage unit is either one of a cache memory and a device memory, and the destination storage unit is either one of the cache memory and the device memory.
In one possible design, the communication interface through which the source storage unit sends source data to the first buffer unit is different from the communication interface through which the destination storage unit sends destination data to the second buffer unit; the source storage unit is the GDS memory and the destination storage unit is either one of the cache memory and the device memory; or the source storage unit is either one of a cache memory and a device memory, and the destination storage unit is the GDS memory.
In one possible design, the data exchange method is performed in the in-order mode; the source read address logic generation circuit is configured to determine that the at least one source address read request has been sent to the source storage unit; and the destination read address logic generation circuit is configured to determine that the at least one destination address read request has been sent to the destination storage unit.
In one possible design, the data exchange method is performed in the out-of-order mode; the first buffer unit is configured to determine that all the source data has been transmitted to the first buffer unit; and the second buffer unit is configured to determine that all the destination data has been transmitted to the second buffer unit.
In a fourth aspect, an embodiment of the present application provides a computer system including the processor of the third aspect or any optional implementation of the third aspect.
In a fifth aspect, the present application provides an executable program product which, when run on a computer, causes the computer to perform the method in the first aspect or any possible implementation of the first aspect.
In a sixth aspect, the present application provides a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, the method in the first aspect or any possible implementation of the first aspect is performed.
To make the above objectives, features, and advantages of the embodiments of the present application more comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present application more clearly, the drawings required in the embodiments of the present application are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present application and therefore should not be regarded as limiting the scope; a person of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
Fig. 1 is a hardware flowchart of a GPU corresponding to an embodiment of the present application;
Fig. 2 is a schematic flowchart of a data exchange method provided by an embodiment of the present application;
Fig. 3 is a schematic flowchart of one specific implementation of some steps of the data exchange method provided by an embodiment of the present application;
Fig. 4 is a schematic flowchart of another specific implementation of some steps of the data exchange method provided by an embodiment of the present application;
Fig. 5 is a schematic structural block diagram of a data exchange device provided by an embodiment of the present application;
Fig. 6 is a schematic structural diagram of a computer system according to an embodiment of the present application.
Detailed description of the embodiments
Before the embodiments of the present application are introduced, the technical solutions in some scenarios are briefly described:
In some scenarios, in the hardware flow of a GPU performing a direct memory access (DMA) operation, the GPU includes a ring buffer unit, a source read address logic generation circuit, a destination write address logic generation circuit, a GDS interface, a cache interface, a global data share (GDS) memory, a cache memory, a device memory, and a device memory controller. The source read address logic generation circuit is connected to the GDS interface and the cache interface respectively, and the destination write address logic generation circuit is connected to the GDS interface and the cache interface respectively. The GDS memory is connected to the ring buffer unit through the GDS interface, the cache memory is connected to the cache interface through a plurality of cache routers, and the cache interface is also connected to the ring buffer unit. The cache memory is further connected to the device memory through the device memory controller.
In some scenarios, the ring buffer unit (ring buffer) is a buffering unit configured to cache data; the ring buffer unit may be implemented by a static random-access memory (SRAM).
The GPU can use the source read address logic generation circuit to generate source address read requests; a source address read request reads data from the source storage unit, and the data obtained from the source storage unit can be written into the ring buffer unit. The ring buffer unit has a pointer corresponding to the source address read requests, which points to the blank storage space in the ring buffer unit into which data is to be written; after data is written into that blank storage space, the position pointed to by the pointer is updated so that it points to new blank storage space into which data is to be written. The source storage unit may be any one of the GDS memory, the cache memory, and the device memory.
The GPU can use the destination write address logic generation circuit to generate destination address write requests; a destination address write request reads data from the ring buffer unit, and the data read from the ring buffer unit is written into the destination storage unit. The ring buffer unit also has a pointer corresponding to the destination address write requests, which points to the storage space holding the data to be read from the ring buffer unit; after the corresponding data is read, the position pointed to by the pointer is updated so that it points to the storage space holding the next data to be read from the ring buffer unit. If the position pointed to by the pointer corresponding to the destination address write requests is the same as the position pointed to by the pointer corresponding to the source address read requests, the ring buffer unit is empty. The destination storage unit may also be any one of the GDS memory, the cache memory, and the device memory.
It can be seen that, in some scenarios, the above hardware flow can only implement data transfer; to implement data exchange, a new core has to be introduced, which increases power consumption.
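As an editorial illustration of the pointer bookkeeping described above, the following is a minimal C sketch of a single ring buffer: the write pointer advances when data returned by a read request lands in the buffer, and the read pointer advances when a write request drains an entry. The sketch is not taken from the patent; all names are assumptions, and it uses an explicit occupancy count instead of the equal-pointer empty test so that "empty" and "full" remain distinguishable.

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    #define RB_ENTRIES   64           /* illustrative depth                    */
    #define ENTRY_BYTES  (512 / 8)    /* one 512-bit entry                     */

    typedef struct {
        uint8_t  data[RB_ENTRIES][ENTRY_BYTES];
        unsigned wr;                  /* advanced when returned data lands     */
        unsigned rd;                  /* advanced when a write request drains  */
        unsigned count;               /* occupancy, resolves wr == rd ambiguity*/
    } ring_buffer;

    static void rb_init(ring_buffer *rb) { memset(rb, 0, sizeof *rb); }

    static bool rb_empty(const ring_buffer *rb) { return rb->count == 0; }
    static bool rb_full (const ring_buffer *rb) { return rb->count == RB_ENTRIES; }

    /* Data returned for a read request is stored in the next free slot. */
    static bool rb_push(ring_buffer *rb, const uint8_t entry[ENTRY_BYTES])
    {
        if (rb_full(rb)) return false;
        memcpy(rb->data[rb->wr], entry, ENTRY_BYTES);
        rb->wr = (rb->wr + 1) % RB_ENTRIES;
        rb->count++;
        return true;
    }

    /* A write request drains the oldest buffered entry toward its target. */
    static bool rb_pop(ring_buffer *rb, uint8_t entry[ENTRY_BYTES])
    {
        if (rb_empty(rb)) return false;
        memcpy(entry, rb->data[rb->rd], ENTRY_BYTES);
        rb->rd = (rb->rd + 1) % RB_ENTRIES;
        rb->count--;
        return true;
    }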
In the embodiments of the present application, data exchange is implemented by extending the DMA engine in the command processor: a swap mode is added to the DMA_DATA command packet, so that DMA_DATA_SWAP is used to implement the data exchange.
Specifically, a new field [31:31] may be added to DW0 of the DMA_DATA command packet:
SWAP_MODE.
0=DMA_MODE:DMA-Copy data from source to destination.
1=SWAP_MODE:swap data between source and destination.
The above new field defines the SWAP-mode bit of the DMA SWAP operation: setting the SWAP-mode bit to 0 means copying data from the source location to the destination location in the DMA manner; setting the SWAP-mode bit to 1 means exchanging the data at the source location with the data at the destination location.
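A minimal C sketch of how driver-side code might set this bit is given below. Only the bit position [31:31] in DW0 and the two mode values come from the description above; the packet structure, the remaining fields, and all identifiers are hypothetical and shown for illustration only.

    #include <stdint.h>

    /* Bit [31:31] of DW0 in the DMA_DATA command packet (per the text above). */
    #define DMA_DATA_DW0_SWAP_MODE_SHIFT  31u
    #define DMA_DATA_DW0_SWAP_MODE_MASK   (1u << DMA_DATA_DW0_SWAP_MODE_SHIFT)

    enum dma_swap_mode {
        DMA_MODE  = 0,   /* copy data from source to destination              */
        SWAP_MODE = 1,   /* swap data between source and destination          */
    };

    /* Hypothetical packet layout: only the DW0 bit position is from the text. */
    struct dma_data_packet {
        uint32_t dw0;
        uint32_t src_addr_lo, src_addr_hi;   /* assumed fields */
        uint32_t dst_addr_lo, dst_addr_hi;   /* assumed fields */
        uint32_t byte_count;                 /* assumed field  */
    };

    static inline void dma_data_set_swap_mode(struct dma_data_packet *pkt,
                                              enum dma_swap_mode mode)
    {
        pkt->dw0 &= ~DMA_DATA_DW0_SWAP_MODE_MASK;
        pkt->dw0 |= ((uint32_t)mode << DMA_DATA_DW0_SWAP_MODE_SHIFT);
    }

For example, filling a packet and calling dma_data_set_swap_mode(&pkt, SWAP_MODE) would request the exchange behavior, while DMA_MODE keeps the ordinary copy semantics.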
The technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings of the embodiments of the present application.
Referring to Fig. 2, Fig. 2 shows the data exchange method provided by an embodiment of the present application. The method can be executed by the GPU shown in Fig. 1, which includes a ring buffer unit 102, a source read address logic generation circuit 104, a source write address logic generation circuit 105, a destination write address logic generation circuit 106, a destination read address logic generation circuit 107, a GDS interface 108, a cache interface 110, a GDS memory 112, a plurality of cache routers 114, a cache memory 116, a device memory controller 118, and a device memory 120.
The ring buffer unit 102 includes a first buffer unit 1021 and a second buffer unit 1022; the first buffer unit 1021 is connected to the GDS interface 108 and the cache interface 110 respectively, and the second buffer unit 1022 is connected to the GDS interface 108 and the cache interface 110 respectively. The source read address logic generation circuit 104 is connected to the GDS interface 108 and the cache interface 110 respectively, the source write address logic generation circuit 105 is connected to the second buffer unit 1022, the destination write address logic generation circuit 106 is connected to the first buffer unit 1021, and the destination read address logic generation circuit 107 is connected to the GDS interface 108 and the cache interface 110 respectively. The GDS interface 108 is further connected to the GDS memory 112, and the cache interface 110 is further connected to the cache memory 116 through the plurality of cache routers 114. The cache memory 116 is further connected to the device memory controller 118, and the device memory controller 118 is further connected to the device memory 120.
In some possible implementations, suppose the storage space of the ring buffer unit 102 is 64*512 bits; the storage spaces of the first buffer unit 1021 and the second buffer unit 1022 may be equal, each being half of the ring buffer unit 102 (for example, both may be 32*512 bits), so that the ring buffer unit 102 can be utilized to a large extent. It should be understood that the storage space of the first buffer unit 1021 may or may not be equal to that of the second buffer unit 1022, and the specific values of the storage spaces of the first buffer unit 1021 and the second buffer unit 1022 should not be construed as limiting the present application.
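Purely as an illustration of the example sizing above (not a requirement of the patent), the even split of one ring buffer into the two halves could be expressed as:

    /* Illustrative sizing only: a 64 x 512-bit ring buffer split into two
     * equal halves, one buffering source data, the other destination data. */
    #define RING_ENTRIES        64
    #define ENTRY_BITS          512
    #define FIRST_BUF_ENTRIES   (RING_ENTRIES / 2)                  /* e.g. entries  0..31 */
    #define SECOND_BUF_ENTRIES  (RING_ENTRIES - FIRST_BUF_ENTRIES)  /* e.g. entries 32..63 */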
The GPU is configured to exchange data of the source storage unit and the destination storage unit through the first buffer unit 1021 and the second buffer unit 1022 of the ring buffer unit 102. The method specifically includes the following steps S110 to S150:
Step S110: generate at least one source address read request and send the at least one source address read request to the source storage unit, so that the source storage unit returns the source data corresponding to each source address read request to the first buffer unit 1021.
Each of the at least one source address read request includes a corresponding source address, each source address stores corresponding source data, and each source address is located in the source storage unit.
The source address read requests can be generated by the GPU through the source read address logic generation circuit 104. The GPU can use the source read address logic generation circuit 104 to send each generated source address read request to the source storage unit. For each source address read request, the source storage unit finds the corresponding source data according to the source address and returns the source data to the first buffer unit 1021.
Step S120: generate at least one destination address read request and send the at least one destination address read request to the destination storage unit, so that the destination storage unit returns the destination data corresponding to each destination address read request to the second buffer unit 1022.
Each of the at least one destination address read request includes a corresponding destination address, each destination address stores corresponding destination data, and each destination address is located in the destination storage unit.
The destination address read requests can be generated by the GPU through the destination read address logic generation circuit 107. The GPU can use the destination read address logic generation circuit 107 to send each generated destination address read request to the destination storage unit. For each destination address read request, the destination storage unit finds the corresponding destination data according to the destination address and returns the destination data to the second buffer unit 1022.
Step S130: determine that the source data can be transmitted to the first buffer unit 1021 and that the destination data can be transmitted to the second buffer unit 1022.
The GPU determines that all the source data can be transmitted to the first buffer unit 1021, and the GPU determines that all the destination data can be transmitted to the second buffer unit 1022. Determining that all the source data can be transmitted to the first buffer unit 1021 avoids the situation in which destination data is sent to the source storage unit before the source data has been fully transmitted to the first buffer unit 1021, which would cause source data that has not yet been transmitted to be overwritten by destination data. Likewise, determining that all the destination data can be transmitted to the second buffer unit 1022 avoids the situation in which source data is sent to the destination storage unit before the destination data has been fully transmitted to the second buffer unit 1022, which would cause destination data that has not yet been transmitted to be overwritten by source data.
Under different operating modes, the conditions for determining that the source data can be transmitted to the first buffer unit 1021 and that the destination data can be transmitted to the second buffer unit 1022 are different.
In some possible implementations, the data exchange method may be performed in an in-order mode, in which case step S130 may be: determining that the at least one source address read request has been sent to the source storage unit and the at least one destination address read request has been sent to the destination storage unit.
When the data exchange method is performed in the in-order mode, read requests are sent first and write requests are sent afterwards, so the data requested by the read requests is already on its way back to the ring buffer unit 102 before the write requests are sent. Therefore, once the source read address logic generation circuit 104 determines that the at least one source address read request has been sent to the source storage unit, it can be determined that the source data corresponding to each of the source address read requests can be transmitted back to the first buffer unit 1021; once the destination read address logic generation circuit 107 determines that the at least one destination address read request has been sent to the destination storage unit, it can be determined that the destination data corresponding to each of the destination address read requests can be transmitted back to the second buffer unit 1022, thereby preventing data that has not yet been transmitted to a buffer unit from being overwritten by data sent from the other side.
In some possible implementations, the data exchange method may be performed in an out-of-order mode, in which case step S130 may be: determining that all the source data has been transmitted to the first buffer unit 1021 and all the destination data has been transmitted to the second buffer unit 1022.
When the data exchange method is performed in the out-of-order mode, there is no ordering restriction between read requests and write requests, so only after all the source data has been transmitted to the first buffer unit 1021 and all the destination data has been transmitted to the second buffer unit 1022 can it be determined that the source data can be transmitted to the first buffer unit 1021 and the destination data can be transmitted to the second buffer unit 1022, thereby preventing data that has not yet been transmitted to a buffer unit from being overwritten by data sent from the other side.
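A short C sketch of the two readiness conditions of step S130 follows. The bookkeeping structure and function names are assumptions introduced only for illustration; the patent does not prescribe this API.

    #include <stdbool.h>

    /* Hypothetical bookkeeping kept by the DMA engine for one swap command. */
    struct swap_progress {
        unsigned src_reads_issued,  dst_reads_issued;    /* read requests sent   */
        unsigned src_reads_total,   dst_reads_total;     /* read requests needed */
        unsigned src_data_received, dst_data_received;   /* data landed in bufs  */
    };

    /* In-order mode: it is enough that every read request has been issued,
     * because the returned data cannot be overtaken by the later write
     * requests.                                                             */
    static bool swap_may_start_in_order(const struct swap_progress *p)
    {
        return p->src_reads_issued == p->src_reads_total &&
               p->dst_reads_issued == p->dst_reads_total;
    }

    /* Out-of-order mode: wait until all requested data has actually arrived
     * in the first and second buffer units before any write request is
     * generated.                                                            */
    static bool swap_may_start_out_of_order(const struct swap_progress *p)
    {
        return p->src_data_received == p->src_reads_total &&
               p->dst_data_received == p->dst_reads_total;
    }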
Step S140: generate corresponding address write requests according to the data received by the ring buffer unit 102.
Step S150: use the address write requests to send the corresponding data to the target addresses in the corresponding storage units.
An address write request includes the target address of the corresponding data; the target address corresponding to data received by the first buffer unit 1021 is located in the destination storage unit, and the target address corresponding to data received by the second buffer unit 1022 is located in the source storage unit.
After it is determined that all the source data can be transmitted to the first buffer unit 1021 and all the destination data can be transmitted to the second buffer unit 1022, corresponding address write requests are generated for the source data received by the first buffer unit 1021, so that the source data received by the first buffer unit 1021 is sent to the destination storage unit; corresponding address write requests are also generated for the destination data received by the second buffer unit 1022, so that the destination data received by the second buffer unit 1022 is sent to the source storage unit. The present application realizes the data exchange between the source storage unit and the destination storage unit by using the ring buffer unit 102, which, compared with the prior art, reduces power consumption and saves cost.
In some possible implementations, referring to Fig. 3, steps S140 and S150 correspond to the following steps S141 and S151 respectively:
Step S141: generate a corresponding destination address write request according to the source data received by the first buffer unit 1021.
The destination address write request includes the target address of the corresponding source data, the target address being located in the destination storage unit. In some possible implementations, every time the first buffer unit 1021 receives a piece of source data, the destination write address logic generation circuit 106 can generate a corresponding destination address write request, so that the source data in the first buffer unit 1021 can be transmitted to the destination storage unit in time.
Step S151: use the destination address write request to send the corresponding source data to the target address in the destination storage unit.
The GPU executes the destination address write request and sends the source data from the first buffer unit 1021 to the target address in the destination storage unit, so as to write the source data into the destination storage unit.
For the source data received by the first buffer unit 1021, the GPU can generate a destination address write request that includes the target address of that source data, the target address being located in the destination storage unit, thereby writing the source data received by the first buffer unit 1021 into the destination storage unit. The source data received by the first buffer unit 1021 comes from the source storage unit; the first buffer unit 1021 is thus used to transfer data from the source storage unit to the destination storage unit.
In some possible implementations, referring to Fig. 4, steps S140 and S150 correspond to the following steps S241 and S251 respectively:
Step S241: generate a corresponding source address write request according to the destination data received by the second buffer unit 1022.
The source address write request includes the target address of the corresponding destination data, the target address being located in the source storage unit. In some possible implementations, every time the second buffer unit 1022 receives a piece of destination data, the source write address logic generation circuit 105 generates a corresponding source address write request, so that the destination data in the second buffer unit 1022 can be transmitted to the source storage unit in time.
Step S251: use the source address write request to send the corresponding destination data to the target address in the source storage unit.
The GPU executes the source address write request and sends the destination data from the second buffer unit 1022 to the target address in the source storage unit, so as to write the destination data into the source storage unit.
For the destination data received by the second buffer unit 1022, the GPU can generate a source address write request that includes the target address of that destination data, the target address being located in the source storage unit, thereby writing the destination data received by the second buffer unit 1022 into the source storage unit. The destination data received by the second buffer unit 1022 comes from the destination storage unit; the second buffer unit 1022 is thus used to transfer data from the destination storage unit to the source storage unit.
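The following C sketch models how each received datum could be paired with a target address taken from the read request issued on the opposite side, which is the mechanism described in steps S141/S151 and S241/S251. The structures, the one-to-one index pairing, and all names are assumptions added for illustration; the hardware circuits in the patent are not bound to this form.

    #include <stdint.h>
    #include <stddef.h>

    typedef uint64_t addr_t;

    /* Hypothetical per-command state: the addresses used by the read requests
     * are reused, in order, as the target addresses of the corresponding
     * write requests.                                                        */
    struct swap_state {
        const addr_t *src_read_addrs;   /* addresses read in the source unit  */
        const addr_t *dst_read_addrs;   /* addresses read in the dest. unit   */
        size_t        n;                /* number of data items on each side  */
        size_t        src_written;      /* source data items already forwarded*/
        size_t        dst_written;      /* dest. data items already forwarded */
    };

    struct write_request { addr_t target_addr; const void *payload; };

    /* Called when one source datum lands in the first buffer unit: its target
     * address is taken from an already-issued destination address read
     * request, so it overwrites a slot whose old value is already buffered. */
    static struct write_request on_src_datum(struct swap_state *s, const void *d)
    {
        struct write_request w = { s->dst_read_addrs[s->src_written++], d };
        return w;   /* issued toward the destination storage unit */
    }

    /* Symmetric path for a destination datum landing in the second buffer.  */
    static struct write_request on_dst_datum(struct swap_state *s, const void *d)
    {
        struct write_request w = { s->src_read_addrs[s->dst_written++], d };
        return w;   /* issued toward the source storage unit */
    }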
In one specific implementation, the communication interface through which the source storage unit sends source data to the first buffer unit 1021 is the same as the communication interface through which the destination storage unit sends destination data to the second buffer unit 1022; in this case, step S120 may be performed after step S110.
Suppose the communication interface through which the source storage unit sends source data to the first buffer unit 1021 and the communication interface through which the destination storage unit sends destination data to the second buffer unit 1022 are both the cache interface 110 shown in Fig. 1.
In some possible implementations, the source storage unit and the destination storage unit may be different storage units that are both connected to the cache interface 110; for example, the source storage unit may be the cache memory 116 and the destination storage unit may be the device memory 120. It can be understood that the source storage unit and the destination storage unit may also be swapped, i.e., the source storage unit may be the device memory 120 and the destination storage unit may be the cache memory 116. The goal of the data exchange method provided by the embodiments of the present application is to exchange data at two locations; therefore, for the two carriers respectively holding the data to be exchanged, either carrier may be taken as the source storage unit, and the other carrier then naturally becomes the destination storage unit.
In some possible implementations, the source storage unit and the destination storage unit may also be the same storage unit, i.e., the data exchange may be an exchange of data at different locations within the same storage unit; for example, both the source storage unit and the destination storage unit may be the cache memory 116.
The purpose of sending destination address read requests to the destination storage unit is to make the destination storage unit return the destination data; likewise, the purpose of sending source address read requests to the source storage unit is to make the source storage unit return the source data. When the communication interface through which the source storage unit transmits data is the same as the communication interface through which the destination storage unit transmits data, if the destination storage unit returned destination data in parallel with the source storage unit returning source data, the two return streams would interleave on the same communication interface and lower the transmission efficiency. Therefore, the destination storage unit returning destination data and the source storage unit returning source data do not take place in the same time period, which helps improve transmission efficiency.
For ease of description, the case in which the communication interfaces are the same (both are the cache interface 110) is explained by taking the source storage unit as the device memory 120 and the destination storage unit as the cache memory 116 as an example:
The source read address logic generation circuit 104 generates a plurality of source address read requests, and the GPU sends all of them to the source storage unit, i.e., the device memory 120. For each source address read request, the device memory 120 can obtain the corresponding source data and send the source data to the first buffer unit 1021 through the cache interface 110.
In the in-order mode, after the GPU has sent all the source address read requests to the device memory 120, it can be determined that the source data can be transmitted to the first buffer unit; in the out-of-order mode, only after the first buffer unit has received the source data corresponding to each of the source address read requests can it be determined that the source data can be transmitted to the first buffer unit.
The destination read address logic generation circuit 107 generates a plurality of destination address read requests, and the GPU sends all of them to the destination storage unit, i.e., the cache memory 116. For each destination address read request, the cache memory 116 can obtain the corresponding destination data and send the destination data to the second buffer unit 1022 through the cache interface 110.
In the in-order mode, after the GPU has sent all the destination address read requests to the cache memory 116, it can be determined that the destination data can be transmitted to the second buffer unit; in the out-of-order mode, only after the second buffer unit has received the destination data corresponding to each of the destination address read requests can it be determined that the destination data can be transmitted to the second buffer unit.
After it is determined that the destination data can be transmitted to the second buffer unit, every time a piece of source data is written into the first buffer unit, the GPU can use the destination write address logic generation circuit 106 to generate a corresponding destination address write request, which includes a target address located in the destination storage unit (the cache memory 116), so that the source data from the source storage unit (the device memory 120) is buffered by the first buffer unit and transmitted to the destination storage unit (the cache memory 116).
In some possible implementations, when it is determined that at least one piece of destination data can be transmitted to the second buffer unit, the GPU using the destination write address logic generation circuit 106 to generate the corresponding destination address write request may be: each time it is detected that one piece of source data has been written into the first buffer unit, generating the destination address write request corresponding to the written source data according to the destination address read request corresponding to one piece of destination data that has already been written into the second buffer unit. In this way, the timeliness of writing the source data into the destination storage unit is guaranteed. Clearly, even if at least one piece of destination data has not yet been written into the second buffer unit, the destination data not yet written into the second buffer unit is prevented from being overwritten.
Similarly, after it is determined that at least one piece of source data can be transmitted to the first buffer unit, the step of generating the corresponding source address write request includes: each time it is detected that one piece of destination data has been written into the second buffer unit, generating the source address write request corresponding to the written destination data according to the source address read request corresponding to one piece of source data that has already been written into the first buffer unit.
After it is determined that the source data can be transmitted to the first buffer unit, every time a piece of destination data is written into the second buffer unit, the GPU can use the source write address logic generation circuit 105 to generate a corresponding source address write request, which includes a target address located in the source storage unit (the device memory 120), so that the destination data from the destination storage unit (the cache memory 116) is buffered by the second buffer unit and transmitted to the source storage unit (the device memory 120).
In another specific implementation, the communication interface through which the source storage unit sends source data to the first buffer unit 1021 is different from the communication interface through which the destination storage unit sends destination data to the second buffer unit 1022; in this case, step S120 may be performed in parallel with step S110.
Suppose the communication interface through which the source storage unit sends source data to the first buffer unit 1021 is the GDS interface 108 shown in Fig. 1, and the communication interface through which the destination storage unit sends destination data to the second buffer unit 1022 is the cache interface 110.
When the communication interface through which the source storage unit transmits data is different from the communication interface through which the destination storage unit transmits data, the destination storage unit returning destination data and the source storage unit returning source data proceed in parallel without interfering with each other, and both can maintain a high transmission rate, which improves data transmission efficiency.
For ease of description, the case in which the communication interfaces are different is explained by taking the source storage unit as the GDS memory 112 and the destination storage unit as the cache memory 116 as an example:
The source read address logic generation circuit 104 generates a plurality of source address read requests, and the GPU sends all of them to the source storage unit, i.e., the GDS memory 112. For each source address read request, the GDS memory 112 can obtain the corresponding source data and send the source data to the first buffer unit 1021 through the GDS interface 108.
At the same time, the destination read address logic generation circuit 107 generates a plurality of destination address read requests, and the GPU sends all of them to the destination storage unit, i.e., the cache memory 116. For each destination address read request, the cache memory 116 can obtain the corresponding destination data and send the destination data to the second buffer unit 1022 through the cache interface 110.
After it is determined that the destination data can be transmitted to the second buffer unit and that the source data can be transmitted to the first buffer unit, every time a piece of source data is written into the first buffer unit, the GPU can use the destination write address logic generation circuit 106 to generate a corresponding destination address write request, which includes a target address located in the destination storage unit (the cache memory 116), so that the source data from the source storage unit (the GDS memory 112) is buffered by the first buffer unit and transmitted to the destination storage unit (the cache memory 116).
At the same time, every time a piece of destination data is written into the second buffer unit, the GPU can use the source write address logic generation circuit 105 to generate a corresponding source address write request, which includes a target address located in the source storage unit (the GDS memory 112), so that the destination data from the destination storage unit (the cache memory 116) is buffered by the second buffer unit and transmitted to the source storage unit (the GDS memory 112).
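The contrast between the two implementations above (S120 after S110 when one communication interface is shared, S120 in parallel with S110 when the interfaces differ) can be illustrated with the small, self-contained C sketch below. The request counts, the printf stand-ins, and the interleaved-issue loop are editorial assumptions; in the patent the parallelism is a property of the hardware interfaces, not of software scheduling.

    #include <stdbool.h>
    #include <stdio.h>

    #define N_SRC_READS 4            /* illustrative request counts only      */
    #define N_DST_READS 4

    static int src_issued = 0, dst_issued = 0;

    static bool issue_next_source_read(void)        /* false when list is done */
    {
        if (src_issued >= N_SRC_READS) return false;
        printf("source read request %d\n", src_issued++);
        return true;
    }

    static bool issue_next_destination_read(void)   /* false when list is done */
    {
        if (dst_issued >= N_DST_READS) return false;
        printf("destination read request %d\n", dst_issued++);
        return true;
    }

    static void schedule_read_phases(bool same_interface)
    {
        if (same_interface) {
            /* Same interface (e.g. both units behind the cache interface 110):
             * issue all source reads first, then all destination reads, so the
             * two return streams never interleave on one interface.            */
            while (issue_next_source_read())      { }
            while (issue_next_destination_read()) { }
        } else {
            /* Distinct interfaces (e.g. GDS interface 108 and cache interface
             * 110): interleave issue so both interfaces can stream returns
             * concurrently.                                                    */
            bool src_more = true, dst_more = true;
            while (src_more || dst_more) {
                if (src_more) src_more = issue_next_source_read();
                if (dst_more) dst_more = issue_next_destination_read();
            }
        }
    }

    int main(void) { schedule_read_phases(true); return 0; }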
Referring to Fig. 5, Fig. 5 shows the data exchange device provided by an embodiment of the present application. The device 300 includes:
a source read request generation module 310, configured to generate at least one source address read request and send the at least one source address read request to the source storage unit, so that the source storage unit returns the source data corresponding to each source address read request to the first buffer unit;
a destination read request generation module 320, configured to generate at least one destination address read request and send the at least one destination address read request to the destination storage unit, so that the destination storage unit returns the destination data corresponding to each destination address read request to the second buffer unit;
a data transmission determination module 330, configured to determine that the source data can be transmitted to the first buffer unit and the destination data can be transmitted to the second buffer unit;
a write request generation module 340, configured to generate corresponding address write requests according to the data received by the ring buffer unit, where an address write request includes the target address of the corresponding data, the target address corresponding to data received by the first buffer unit is located in the destination storage unit, and the target address corresponding to data received by the second buffer unit is located in the source storage unit; and
a data sending module 350, configured to use the address write requests to send the corresponding data to the target addresses in the corresponding storage units.
In some possible implementations, the write request generation module 340 is specifically configured to generate, according to the source data received by the first buffer unit, a corresponding destination address write request, where the destination address write request includes the target address of the corresponding source data.
The data sending module 350 is specifically configured to use the destination address write request to send the corresponding source data to the target address in the destination storage unit.
In some possible implementations, the write request generation module 340 is specifically configured to generate, according to the destination data received by the second buffer unit, a corresponding source address write request, where the source address write request includes the target address of the corresponding destination data.
The data sending module 350 is specifically configured to use the source address write request to send the corresponding destination data to the target address in the source storage unit.
In some possible implementations, the data exchange method is performed in the in-order mode, and the data transmission determination module 330 is specifically configured to determine that the at least one source address read request has been sent to the source storage unit and the at least one destination address read request has been sent to the destination storage unit.
In some possible implementations, the data exchange method is performed in the out-of-order mode, and the data transmission determination module 330 is specifically configured to determine that all the source data has been transmitted to the first buffer unit and all the destination data has been transmitted to the second buffer unit.
The data exchange device shown in Fig. 5 corresponds to the data exchange method shown in Fig. 2, and details are not repeated here.
An embodiment of the present application provides a processor whose hardware flowchart is shown in Fig. 1. The processor includes a ring buffer unit, a source storage unit, a destination storage unit, a source read address logic generation circuit, and a destination read address logic generation circuit. The source read address logic generation circuit is connected to the source storage unit, and the destination read address logic generation circuit is connected to the destination storage unit. The ring buffer unit includes a first buffer unit and a second buffer unit; the source storage unit is connected to the first buffer unit and the second buffer unit through a corresponding communication interface, and the destination storage unit is connected to the first buffer unit and the second buffer unit through a corresponding communication interface. The processor is configured to exchange data of the source storage unit and the destination storage unit through the first buffer unit and the second buffer unit. The source read address logic generation circuit is configured to generate at least one source address read request and send the at least one source address read request to the source storage unit, so that the source storage unit returns the source data corresponding to each source address read request to the first buffer unit. The destination read address logic generation circuit is configured to generate at least one destination address read request and send the at least one destination address read request to the destination storage unit, so that the destination storage unit returns the destination data corresponding to each destination address read request to the second buffer unit. The processor is configured to determine that the source data can be transmitted to the first buffer unit and that the destination data can be transmitted to the second buffer unit; the processor is configured to generate corresponding address write requests according to the data received by the ring buffer unit, where an address write request includes the target address of the corresponding data, the target address corresponding to data received by the first buffer unit is located in the destination storage unit, and the target address corresponding to data received by the second buffer unit is located in the source storage unit; and the processor is configured to use the address write requests to send the corresponding data to the target addresses in the corresponding storage units.
In some possible implementations, the processor further includes a destination write address logic generation circuit connected to the first buffer unit; the destination write address logic generation circuit is configured to generate, according to the source data received by the first buffer unit, a corresponding destination address write request, where the destination address write request includes the target address of the corresponding source data; and the first buffer unit is configured to use the destination address write request to send the corresponding source data to the target address in the destination storage unit.
In some possible implementations, the processor further includes a source write address logic generation circuit connected to the second buffer unit; the source write address logic generation circuit is configured to generate, according to the destination data received by the second buffer unit, a corresponding source address write request, where the source address write request includes the target address of the corresponding destination data; and the second buffer unit is configured to use the source address write request to send the corresponding destination data to the target address in the source storage unit.
In some possible implementations, the communication interface through which the source storage unit sends source data to the first buffer unit is the same as the communication interface through which the destination storage unit sends destination data to the second buffer unit; the source storage unit is a global data share (GDS) memory and the destination storage unit is a GDS memory; or the source storage unit is either one of a cache memory and a device memory, and the destination storage unit is either one of the cache memory and the device memory.
In some possible implementations, the communication interface through which the source storage unit sends source data to the first buffer unit is different from the communication interface through which the destination storage unit sends destination data to the second buffer unit; the source storage unit is the GDS memory and the destination storage unit is either one of the cache memory and the device memory; or the source storage unit is either one of a cache memory and a device memory, and the destination storage unit is the GDS memory.
In some possible implementations, the data exchange method is performed in the in-order mode; the source read address logic generation circuit is configured to determine that the at least one source address read request has been sent to the source storage unit; and the destination read address logic generation circuit is configured to determine that the at least one destination address read request has been sent to the destination storage unit.
In some possible implementations, the data exchange method is performed in the out-of-order mode; the first buffer unit is configured to determine that all the source data has been transmitted to the first buffer unit; and the second buffer unit is configured to determine that all the destination data has been transmitted to the second buffer unit.
It should be noted that the actions performed by the processor are consistent with the foregoing embodiments and are not repeated here.
Fig. 6 is a schematic structural diagram of a computer system according to an embodiment of the present application; the computer system may be composed of a hardware subsystem and a software subsystem. As shown in Fig. 6, the computer system includes a processor 601, a memory 602, and a bus 603; the processor 601 and the memory 602 communicate with each other through the bus 603; the processor 601 is configured to call the program instructions in the memory 602 to perform image- and graphics-related operations. The method by which the processor 601 exchanges data is consistent with the foregoing embodiments and is not repeated here.
In some possible implementations, the processor 601 may be a graphics processor, a central processing unit (CPU), an accelerated processing unit, or another type of processor such as a network processor (NP) or an application processor; in some products, the application processor is the CPU. The processor 601 provided by the embodiments of the present application may be configured for graphics processing application scenarios, or for computational scenarios such as deep computation.
In the embodiments provided by the present application, it should be understood that the disclosed device and method may be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division of the units is only a logical function division, and there may be other divisions in actual implementation; for another example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be electrical, mechanical, or in other forms.
In addition, the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units, i.e., they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
Furthermore, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist alone, or two or more modules may be integrated to form an independent part.
Herein, relational terms such as first and second are used only to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any such actual relationship or order between these entities or operations.
The above descriptions are only embodiments of the present application and are not intended to limit the protection scope of the present application; for those skilled in the art, the present application may have various modifications and variations. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application shall be included in the protection scope of the present application.
Industrial applicability
The data exchange method, device, processor, and computer system provided by the present application realize the data exchange between the source storage unit and the destination storage unit by using a ring buffer unit without adding a new core, which not only reduces power consumption but also saves cost.

Claims (21)

  1. A data exchange method, wherein data of a source storage unit and a destination storage unit is exchanged through a ring buffer unit, the ring buffer unit comprising a first buffer unit and a second buffer unit, the method comprising:
    generating at least one source address read request and sending the at least one source address read request to the source storage unit, so that the source storage unit returns source data corresponding to each source address read request to the first buffer unit;
    generating at least one destination address read request and sending the at least one destination address read request to the destination storage unit, so that the destination storage unit returns destination data corresponding to each destination address read request to the second buffer unit;
    determining that the source data can be transmitted to the first buffer unit and that the destination data can be transmitted to the second buffer unit;
    generating, according to the data received by the ring buffer unit, corresponding address write requests, wherein an address write request comprises a target address of the corresponding data, the target address corresponding to data received by the first buffer unit is located in the destination storage unit, and the target address corresponding to data received by the second buffer unit is located in the source storage unit; and
    using the address write requests to send the corresponding data to the target addresses in the corresponding storage units.
  2. The method according to claim 1, wherein at least one piece of the source data is returned to the first buffer unit, and the generating, according to the data received by the ring buffer unit, corresponding address write requests comprises:
    generating, according to the source data received by the first buffer unit, a corresponding destination address write request, wherein the destination address write request comprises the target address of the corresponding source data; and
    the using the address write requests to send the corresponding data to the target addresses in the corresponding storage units comprises:
    using the destination address write request to send the corresponding source data to the target address in the destination storage unit.
  3. The method according to claim 2, wherein, after it is determined that at least one piece of the destination data can be transmitted to the second buffer unit, the generating, according to the data received by the ring buffer unit, corresponding address write requests comprises:
    each time it is detected that one piece of source data has been written into the first buffer unit, generating the destination address write request corresponding to the written source data according to the destination address read request corresponding to one piece of the destination data that has been written into the second buffer unit.
  4. The method according to claim 1, wherein at least one piece of the destination data is returned to the second buffer unit, and the generating, according to the data received by the ring buffer unit, corresponding address write requests comprises:
    generating, according to the destination data received by the second buffer unit, a corresponding source address write request, wherein the source address write request comprises the target address of the corresponding destination data; and
    the using the address write requests to send the corresponding data to the target addresses in the corresponding storage units comprises:
    using the source address write request to send the corresponding destination data to the target address in the source storage unit.
  5. The method according to claim 4, wherein, after it is determined that at least one piece of the source data can be transmitted to the first buffer unit, the generating, according to the data received by the ring buffer unit, corresponding address write requests comprises:
    each time it is detected that one piece of destination data has been written into the second buffer unit, generating the source address write request corresponding to the written destination data according to the source address read request corresponding to one piece of the source data that has been written into the first buffer unit.
  6. The method according to claim 1, wherein the communication interface through which the source storage unit sends source data to the first buffer unit is the same as the communication interface through which the destination storage unit sends destination data to the second buffer unit; and
    the step of generating at least one destination address read request and sending the at least one destination address read request to the destination storage unit is performed after the step of generating at least one source address read request and sending the at least one source address read request to the source storage unit.
  7. The method according to claim 1, wherein the communication interface through which the source storage unit sends source data to the first buffer unit is different from the communication interface through which the destination storage unit sends destination data to the second buffer unit; and
    the step of generating at least one destination address read request and sending the at least one destination address read request to the destination storage unit is performed in parallel with the step of generating at least one source address read request and sending the at least one source address read request to the source storage unit.
  8. The method according to claim 1, wherein the data exchange method is performed in an in-order mode, and the determining that the source data can be transmitted to the first buffer unit and that the destination data can be transmitted to the second buffer unit comprises:
    determining that the at least one source address read request has been sent to the source storage unit and the at least one destination address read request has been sent to the destination storage unit.
  9. The method according to claim 1, wherein the data exchange method is performed in an out-of-order mode, and the determining that the source data can be transmitted to the first buffer unit and that the destination data can be transmitted to the second buffer unit comprises:
    determining that all the source data has been transmitted to the first buffer unit and all the destination data has been transmitted to the second buffer unit.
  10. A data exchange device, wherein data of a source storage unit and a destination storage unit is exchanged through a ring buffer unit, the ring buffer unit comprising a first buffer unit and a second buffer unit, the device comprising:
    a source read request generation module, configured to generate at least one source address read request and send the at least one source address read request to the source storage unit, so that the source storage unit returns source data corresponding to each source address read request to the first buffer unit;
    a destination read request generation module, configured to generate at least one destination address read request and send the at least one destination address read request to the destination storage unit, so that the destination storage unit returns destination data corresponding to each destination address read request to the second buffer unit;
    a data transmission determination module, configured to determine that the source data can be transmitted to the first buffer unit and the destination data can be transmitted to the second buffer unit;
    a write request generation module, configured to generate corresponding address write requests according to the data received by the ring buffer unit, wherein an address write request comprises a target address of the corresponding data, the target address corresponding to data received by the first buffer unit is located in the destination storage unit, and the target address corresponding to data received by the second buffer unit is located in the source storage unit; and
    a data sending module, configured to use the address write requests to send the corresponding data to the target addresses in the corresponding storage units.
  11. A processor, comprising a ring buffer unit, a source storage unit, a destination storage unit, a source read address logic generation circuit, and a destination read address logic generation circuit, wherein the source read address logic generation circuit is connected to the source storage unit, the destination read address logic generation circuit is connected to the destination storage unit, the ring buffer unit comprises a first buffer unit and a second buffer unit, the source storage unit is connected to the first buffer unit and the second buffer unit through a corresponding communication interface, the destination storage unit is connected to the first buffer unit and the second buffer unit through a corresponding communication interface, and the processor is configured to exchange data of the source storage unit and the destination storage unit through the first buffer unit and the second buffer unit;
    the source read address logic generation circuit is configured to generate at least one source address read request and send the at least one source address read request to the source storage unit, so that the source storage unit returns source data corresponding to each source address read request to the first buffer unit;
    the destination read address logic generation circuit is configured to generate at least one destination address read request and send the at least one destination address read request to the destination storage unit, so that the destination storage unit returns destination data corresponding to each destination address read request to the second buffer unit;
    the processor is configured to determine that the source data can be transmitted to the first buffer unit and that the destination data can be transmitted to the second buffer unit;
    the processor is configured to generate corresponding address write requests according to the data received by the ring buffer unit, wherein an address write request comprises a target address of the corresponding data, the target address corresponding to data received by the first buffer unit is located in the destination storage unit, and the target address corresponding to data received by the second buffer unit is located in the source storage unit; and
    the processor is configured to use the address write requests to send the corresponding data to the target addresses in the corresponding storage units.
  12. The processor according to claim 11, further comprising a destination write address logic generation circuit connected to the first buffer unit;
    the destination write address logic generation circuit is configured to generate, according to the source data received by the first buffer unit, a corresponding destination address write request, wherein the destination address write request comprises the target address of the corresponding source data; and
    the first buffer unit is configured to use the destination address write request to send the corresponding source data to the target address in the destination storage unit.
  13. The processor according to claim 12, wherein, after it is determined that at least one piece of the destination data can be transmitted to the second buffer unit, the destination write address logic generation circuit is configured to: each time it is detected that one piece of source data has been written into the first buffer unit, generate the destination address write request corresponding to the written source data according to the destination address read request corresponding to one piece of the destination data that has been written into the second buffer unit.
  14. The processor according to claim 11, further comprising a source write address logic generation circuit connected to the second buffer unit;
    the source write address logic generation circuit is configured to generate, according to the destination data received by the second buffer unit, a corresponding source address write request, wherein the source address write request comprises the target address of the corresponding destination data; and
    the second buffer unit is configured to use the source address write request to send the corresponding destination data to the target address in the source storage unit.
  15. The processor according to claim 14, wherein, after it is determined that at least one piece of the source data can be transmitted to the first buffer unit, the source write address logic generation circuit is configured to: each time it is detected that one piece of destination data has been written into the second buffer unit, generate the source address write request corresponding to the written destination data according to the source address read request corresponding to one piece of the source data that has been written into the first buffer unit.
  16. The processor according to claim 11, wherein the communication interface through which the source storage unit sends source data to the first buffer unit is the same as the communication interface through which the destination storage unit sends destination data to the second buffer unit;
    the source storage unit is a global data share (GDS) memory and the destination storage unit is a GDS memory; or
    the source storage unit is either one of a cache memory and a device memory, and the destination storage unit is either one of the cache memory and the device memory.
  17. The processor according to claim 11, wherein the communication interface through which the source storage unit sends source data to the first buffer unit is different from the communication interface through which the destination storage unit sends destination data to the second buffer unit;
    the source storage unit is a GDS memory and the destination storage unit is either one of a cache memory and a device memory; or
    the source storage unit is either one of a cache memory and a device memory, and the destination storage unit is the GDS memory.
  18. The processor according to claim 11, wherein
    the source read address logic generation circuit is configured to determine that the at least one source address read request has been sent to the source storage unit; and
    the destination read address logic generation circuit is configured to determine that the at least one destination address read request has been sent to the destination storage unit.
  19. The processor according to claim 11, wherein
    the first buffer unit is configured to determine that all the source data has been transmitted to the first buffer unit; and
    the second buffer unit is configured to determine that all the destination data has been transmitted to the second buffer unit.
  20. A computer system, comprising the processor according to any one of claims 11 to 19.
  21. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when the computer program is run by a processor, the steps of the method according to any one of claims 1 to 9 are performed.
PCT/CN2020/114006 2019-12-18 2020-09-08 数据交换方法、装置、处理器及计算机系统 WO2021120714A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911317544.X 2019-12-18
CN201911317544.XA CN111124953B (zh) 2019-12-18 2019-12-18 数据交换方法、装置、处理器及计算机系统

Publications (1)

Publication Number Publication Date
WO2021120714A1 true WO2021120714A1 (zh) 2021-06-24

Family

ID=70500930

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/114006 WO2021120714A1 (zh) 2019-12-18 2020-09-08 数据交换方法、装置、处理器及计算机系统

Country Status (2)

Country Link
CN (1) CN111124953B (zh)
WO (1) WO2021120714A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111124953B (zh) * 2019-12-18 2021-04-27 海光信息技术股份有限公司 数据交换方法、装置、处理器及计算机系统
CN112380154A (zh) * 2020-11-12 2021-02-19 海光信息技术股份有限公司 数据传输方法和数据传输装置

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1825292A (zh) * 2005-02-23 2006-08-30 华为技术有限公司 一种直接存储器存取装置及单通道双向数据交互实现方法
US20070088929A1 (en) * 2005-10-13 2007-04-19 Tomohiro Hanai Method for exchanging data between volumes of storage system
CN103714026A (zh) * 2014-01-14 2014-04-09 中国人民解放军国防科学技术大学 一种支持原址数据交换的存储器访问方法及装置
CN110046047A (zh) * 2019-04-15 2019-07-23 Oppo广东移动通信有限公司 一种进程间通信方法、装置及计算机可读存储介质
CN110543433A (zh) * 2019-08-30 2019-12-06 中国科学院微电子研究所 一种混合内存的数据迁移方法及装置
CN111124953A (zh) * 2019-12-18 2020-05-08 海光信息技术有限公司 数据交换方法、装置、处理器及计算机系统

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5209535B2 (ja) * 2009-02-24 2013-06-12 ルネサスエレクトロニクス株式会社 Usbホストコントローラ及びusbホストコントローラの制御方法
CN101908036B (zh) * 2010-07-22 2011-08-31 中国科学院计算技术研究所 一种高密度多处理器系统及其节点控制器
CN103514261B (zh) * 2013-08-13 2017-03-15 北京华电天益信息科技有限公司 一种应用于工业控制系统的数据异步存储及访问方法
CN103955436B (zh) * 2014-04-30 2018-01-16 华为技术有限公司 一种数据处理装置和终端
US10282811B2 (en) * 2017-04-07 2019-05-07 Intel Corporation Apparatus and method for managing data bias in a graphics processing architecture
CN109117416B (zh) * 2018-09-27 2020-05-26 贵州华芯通半导体技术有限公司 插槽间的数据迁移或交换的方法和装置以及多处理器系统
CN110083568B (zh) * 2019-03-29 2021-07-13 海光信息技术股份有限公司 数据交换系统、数据交换命令路由方法、芯片及电子设备

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1825292A (zh) * 2005-02-23 2006-08-30 华为技术有限公司 一种直接存储器存取装置及单通道双向数据交互实现方法
US20070088929A1 (en) * 2005-10-13 2007-04-19 Tomohiro Hanai Method for exchanging data between volumes of storage system
CN103714026A (zh) * 2014-01-14 2014-04-09 中国人民解放军国防科学技术大学 一种支持原址数据交换的存储器访问方法及装置
CN110046047A (zh) * 2019-04-15 2019-07-23 Oppo广东移动通信有限公司 一种进程间通信方法、装置及计算机可读存储介质
CN110543433A (zh) * 2019-08-30 2019-12-06 中国科学院微电子研究所 一种混合内存的数据迁移方法及装置
CN111124953A (zh) * 2019-12-18 2020-05-08 海光信息技术有限公司 数据交换方法、装置、处理器及计算机系统

Also Published As

Publication number Publication date
CN111124953A (zh) 2020-05-08
CN111124953B (zh) 2021-04-27

Similar Documents

Publication Publication Date Title
US10169080B2 (en) Method for work scheduling in a multi-chip system
US9529532B2 (en) Method and apparatus for memory allocation in a multi-node system
CN103119571B (zh) 用于目录高速缓存的分配和写策略的装置和方法
CN117971715A (zh) 多处理器系统中的中继一致存储器管理
US8862801B2 (en) Handling atomic operations for a non-coherent device
US7836144B2 (en) System and method for a 3-hop cache coherency protocol
WO2021120714A1 (zh) 数据交换方法、装置、处理器及计算机系统
US20080065835A1 (en) Offloading operations for maintaining data coherence across a plurality of nodes
US10282293B2 (en) Method, switch, and multiprocessor system using computations and local memory operations
US10592459B2 (en) Method and system for ordering I/O access in a multi-node environment
US20220294743A1 (en) Methods and apparatus for network interface fabric operations
JP2002304328A (ja) マルチプロセッサシステム用コヒーレンスコントローラ、およびそのようなコントローラを内蔵するモジュールおよびマルチモジュールアーキテクチャマルチプロセッサシステム
US11709774B2 (en) Data consistency and durability over distributed persistent memory systems
US20150254183A1 (en) Inter-chip interconnect protocol for a multi-chip system
JP7461895B2 (ja) Gpu主導の通信のためのネットワークパケットテンプレーティング
WO2021114768A1 (zh) 数据处理装置、方法、芯片、处理器、设备及存储介质
US20220179792A1 (en) Memory management device
US20230229595A1 (en) Low latency inter-chip communication mechanism in a multi-chip processing system
US10592465B2 (en) Node controller direct socket group memory access
US9372795B2 (en) Apparatus and method for maintaining cache coherency, and multiprocessor apparatus using the method
US20140250285A1 (en) Inter-domain memory copy method and apparatus
US11275707B2 (en) Multi-core processor and inter-core data forwarding method
CN113900967A (zh) 高速缓存存储系统
WO2022061763A1 (zh) 一种数据存储方法及装置
US20240045588A1 (en) Hybrid memory system and accelerator including the same

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20901297

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20901297

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 03.04.2023)
