WO2023040464A1 - Bus communication method and related device

Info

Publication number
WO2023040464A1
Authority
WO
WIPO (PCT)
Prior art keywords
page table
translation
request
invalidation
address
Prior art date
Application number
PCT/CN2022/107397
Other languages
English (en)
Chinese (zh)
Inventor
刘君龙
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2023040464A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10 Address translation
    • G06F12/109 Address translation for multiple virtual address spaces, e.g. segmentation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/28 Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L12/40 Bus networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/09 Mapping addresses
    • H04L61/25 Mapping addresses of the same type
    • H04L61/2503 Translation of Internet protocol [IP] addresses
    • H04L61/2557 Translation policies or rules
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/50 Address allocation
    • H04L61/5038 Address allocation for local use, e.g. in LAN or USB networks, or in a controller area network [CAN]

Definitions

  • the present application relates to the technical field of bus communication, and in particular to a bus communication method and a related device.
  • the prior art introduces an Address Translation Service (ATS) mechanism: before a computing node performs a data access, it obtains the address translation result for the destination node to be accessed, and then uses that result to issue the corresponding operation request directly to the physical address of the destination node, thereby accessing the destination node.
  • ATS Address Translation Service
  • Embodiments of the present application provide a bus communication method and related equipment, so as to improve the efficiency and flexibility of bus communication.
  • the embodiment of the present application provides a bus communication method, which can be applied to a bus system.
  • the bus system includes a plurality of devices interconnected through a bus, including at least a first device and a second device; the method may include: the first device sends an address translation request to the second device, where the address translation request includes a first identifier for identifying the address translation request and a virtual address of a virtual address space to be translated, and the virtual address space to be translated includes one or more smallest translation units (STUs); the second device generates one or more translation response messages in response to the address translation request, where each of the one or more translation response messages includes the first identifier, target location information, and a target physical address; the target location information is the location information, in the virtual address space to be translated, of one or more target STUs among the one or more STUs, and the target physical address is the physical address obtained by translating the virtual addresses of the one or more target STUs;
  • a device feeds back the one
  • the embodiment of the present application provides a design scheme for bus communication, and in particular for the address translation mechanism in bus communication, including redefining the address translation request and response mechanism to improve address translation efficiency and bus bandwidth utilization.
  • each translation response message returned by the translation response side carries the identifier of its corresponding address translation request and adds target location information, which does not exist in prior-art translation response messages; the target location information is the location, within the address space to be translated, of the address segment currently translated, so that the translation request side can accurately identify which address segment of the address space to be translated any given translation response message translates.
  • the translation response side can feed back translation response messages in batches and out of order according to the actual translation progress (that is, whichever translation completes first can be fed back first), and a batch can contain multiple translation response messages. This avoids the prior-art problem that, because a translation response message carries no position indication, responses must strictly follow an established order (such as the order of contiguous virtual addresses) and only one response message can be fed back at a time, so that an untranslated message delays the feedback of messages that are already translated. The embodiment of the present application can therefore effectively improve the efficiency and parallelism of translation responses and the utilization of bus bandwidth; moreover, once the translation request side receives a translation response message, it can perform data read and write access
  • because the embodiment of the present application adds, in the translation response message, the position information of the smallest translation unit (Smallest Translation Unit, STU) currently translated, the feedback of translation response messages is not restricted by order. The translation response messages can therefore be fed back out of order, in parallel, or serially, and the untranslated virtual addresses corresponding to the translated physical addresses in any two translation response messages can be discontinuous. This avoids the prior-art problems caused by strictly order-preserving feedback, such as inflexible design, longer translation time, low translation efficiency, and low bandwidth utilization.
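The out-of-order feedback described above can be sketched as follows. This is a minimal illustration, not the application's message format: the class names, the field names, the 4 KiB `STU_SIZE`, and the assumption that each response maps one physically contiguous range are all hypothetical. It only shows why a response that carries the request identifier plus a base address and size can be applied in any arrival order.

```python
from dataclasses import dataclass

STU_SIZE = 4096  # hypothetical STU size in bytes


@dataclass(frozen=True)
class TranslationRequest:
    tag: int       # the "first identifier" of the request
    va_base: int   # start of the virtual address space to be translated
    length: int    # size in bytes, a whole number of STUs


@dataclass(frozen=True)
class TranslationResponse:
    tag: int       # echoes the first identifier
    stu_base: int  # location info: virtual address of the first target STU
    size: int      # location info: bytes covered by the target STUs
    pa_base: int   # target physical address for stu_base (assumed contiguous)


class TranslationCache:
    """Requester-side table that accepts responses in any arrival order."""

    def __init__(self):
        self.pending = {}   # tag -> TranslationRequest
        self.va_to_pa = {}  # STU-aligned virtual address -> physical address

    def send(self, req):
        self.pending[req.tag] = req

    def on_response(self, rsp):
        req = self.pending[rsp.tag]
        end = req.va_base + req.length
        if not (req.va_base <= rsp.stu_base and rsp.stu_base + rsp.size <= end):
            raise ValueError("response lies outside the requested range")
        for off in range(0, rsp.size, STU_SIZE):
            self.va_to_pa[rsp.stu_base + off] = rsp.pa_base + off

    def translate(self, va):
        """Return the physical address for va, or None if not yet translated."""
        offset = va % STU_SIZE
        pa = self.va_to_pa.get(va - offset)
        return None if pa is None else pa + offset
```

With this sketch, a response covering the third and fourth STU of a request can be applied, and used for accesses, before the response covering the first two STUs has arrived.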
  • STU Smallest Translation Unit
  • the translation response side such as translation agent (Translation Agent, TA)
  • TA Translation Agent
  • the order preservation requirement of the bus routing is removed and the out-of-order feedback of the translation response message is realized
  • from the perspective of the translation request side, such as the Address Translation Cache (ATC)
  • ATC Address Translation Cache
  • the embodiment of the present application allows more than two translation response messages to be fed back for one address translation request (the prior art specifies no more than two). These translation response messages can be transmitted out of order in the bus routing, and different translation response messages can be sent in parallel or serially, which ensures an efficient completion rate for address translation requests (for example, the total size of the translated address space divided by the size of the untranslated address space requested by the address translation request) and further improves the utilization of bus bandwidth.
  • the target location information includes a base address of a first target STU among the one or more target STUs, and a size of an address space of the one or more target STUs.
  • the target location information added in the translation response message is defined; specifically, it can be the base address of the first translated STU plus the size of the translated address space, so as to indicate to the translation request side the specific location, in the original virtual address space to be translated, of the STUs translated in the current translation response message.
  • the first device initiates a read and write access operation to the target physical address according to the first identifier and the target location information, including: the first device determines the virtual addresses of the one or more target STUs according to the first identifier and the target location information; and determines the read and write access operations corresponding to the virtual addresses of the one or more target STUs, and initiates those read and write access operations to the target physical address.
  • the first device first determines, according to the first identifier in the translation response message, the virtual address of the original virtual address space to be translated that corresponds to the translation response message; based on that virtual address, combined with the target location information, it determines the virtual addresses of the one or more target STUs translated in the current translation response message; it then determines the specific read and write access operations corresponding to those virtual addresses, and finally initiates the read and write access operations to the corresponding physical address.
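The three steps above (identifier to original range, location information to translated virtual addresses, virtual addresses to pending operations) can be sketched as below. The function name, the `Access` type, and the dictionaries used for bookkeeping are illustrative assumptions; the application does not specify how the first device stores its outstanding operations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    op: str  # "read" or "write"
    va: int  # virtual address the device wants to access


def issue_ready_accesses(requests, queued, tag, stu_base, size, pa_base):
    """Apply one translation response and return (issued, still_waiting).

    requests: tag -> (va_base, length) of outstanding translation requests
    queued:   tag -> list of Access operations waiting on that request
    """
    # Step 1: the first identifier selects the original virtual address range.
    va_base, length = requests[tag]
    # Step 2: the location information gives the translated virtual range.
    lo, hi = stu_base, stu_base + size
    if not (va_base <= lo and hi <= va_base + length):
        raise ValueError("response lies outside the requested range")
    # Step 3: operations inside that range are issued to physical addresses.
    issued, waiting = [], []
    for acc in queued.get(tag, []):
        if lo <= acc.va < hi:
            issued.append((acc.op, pa_base + (acc.va - lo)))
        else:
            waiting.append(acc)
    queued[tag] = waiting
    return issued, waiting
```

Operations whose virtual addresses are not covered by the response stay queued until a later response for the same identifier arrives.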
  • the first device is one or more of a graphics processing unit (GPU), a network card, or an accelerator
  • the second device is a processor or a host including a processor.
  • the first device is one or more of a GPU, a network card, and an accelerator that implements an Address Translation Cache (ATC) function
  • the second device is a processor, or a host containing a processor, that implements a Translation Agent (TA) function.
  • the embodiment of the present application gives, by way of example, the functional modules or chips that specifically correspond to the first device and the second device included in the bus system involved in the present application. It can be understood that, in the embodiment of the present application, the first device or the second device includes but is not limited to the specific functional modules or chips mentioned above.
  • the address translation request further includes usage information of the virtual address space, and the usage information includes one or more of the usage frequency, access frequency, or locality expectation of the virtual address space.
  • the address translation request from the ATC side may also carry usage information of the virtual address space to be translated, such as the usage frequency and access frequency of the virtual address space to be accessed and whether it needs to be cached locally (locality), so that the TA side can optimize its handling of the virtual address space to be translated according to this usage information; for example, the TA side may use the usage information to determine whether the page table mapping corresponding to this virtual address space needs to be cached in the MMU of the CPU.
  • each translation response message further includes one or more of the memory attributes of the one or more target STUs, first indication information, or translation extension information; the memory attributes include whether the target STU can be cached or the size of the access delay corresponding to the target STU, and the first indication information is used to indicate whether the current translation response message is the last of the one or more translation response messages.
  • the translation response message fed back from the translation response side (that is, the second device, such as the TA side) to the translation request side (that is, the first device, such as the ATC side) may also carry memory attribute information of the currently translated virtual address space and translation extension information of the translation response message, so that the translation request side (such as the ATC side) can quickly learn characteristics of the virtual address space to be translated and optimize its access operations on that virtual address space accordingly.
  • At least one translation response message among the one or more translation response messages further includes the total number of translation response messages corresponding to the address translation request; the method further includes: the second device judges, according to the number of translation response messages received and the total number of translation response messages, whether all translation response messages corresponding to the address translation request have been received; if so, the tag resource corresponding to the address translation request is released or reused.
  • the translation response message fed back from the translation response side (that is, the second device, such as the TA side) to the translation request side (that is, the first device, such as the ATC side) may also carry the translation corresponding to the address translation request.
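The count-based completion check can be sketched as below. The `TagTracker` class and its method names are hypothetical, and the sketch does not model which device performs the check. It relies only on what the text states: at least one response carries the total, and responses may arrive in any order, so the total may be learned before or after other responses are counted.

```python
class TagTracker:
    """Releases a request tag once all of its responses have been counted."""

    def __init__(self):
        self._state = {}  # tag -> [received_count, total or None]

    def open(self, tag):
        if tag in self._state:
            raise ValueError(f"tag {tag} is still in use")
        self._state[tag] = [0, None]

    def on_response(self, tag, total=None):
        """Count one response; return True if the tag was released."""
        entry = self._state[tag]
        entry[0] += 1
        if total is not None:
            entry[1] = total  # learned from whichever message carries it
        if entry[1] is not None and entry[0] >= entry[1]:
            del self._state[tag]  # tag resource can now be reused
            return True
        return False

    def in_use(self, tag):
        return tag in self._state
```

Once `on_response` returns `True`, the same tag value can be passed to `open` again for a new request.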
  • the method further includes: the second device sends an invalidation request to the first device, where the invalidation request includes a second identifier for identifying the invalidation request and the virtual addresses of multiple invalidation units to be invalidated, and the multiple invalidation units correspond to multiple discrete virtual address spaces to be invalidated; in response to the invalidation request, the first device invalidates the mapping relationship between the virtual address and the physical address of each of the multiple invalidation units.
  • the embodiment of the present application provides a bus communication design scheme.
  • it further designs the address invalidation mechanism in the bus communication, including redefining the address invalidation request mechanism to improve the performance of the invalidate command.
  • the embodiment of the present application defines that an invalidation request can carry multiple discontinuous invalidation address space segments (i.e. multiple invalidation units), so as to improve the transmission efficiency of the invalidation command, thereby improving the overall performance of the system for executing the invalidation command.
  • the bandwidth utilization rate of the invalidation command in the bus is greatly improved.
  • the method further includes: the first device generating one or more invalidation response messages, wherein each invalidation response message includes the second identifier and an invalidation unit number, where the invalidation unit number is the number of the currently invalidated invalidation unit in the plurality of invalidation units.
  • the embodiment of the present application provides a bus communication design scheme.
  • it further designs the address invalidation mechanism in the bus communication, including redefining the address invalidation request and response mechanism to improve the performance of invalidation commands.
  • a mechanism for the invalidation request side (such as the TA side) to improve the transmission efficiency of invalidation commands and the parallelism of invalidation execution, so as to improve the performance of the system as a whole in executing invalidation commands.
  • the bandwidth utilization rate of the invalidation command in the bus is greatly improved; and because invalidation response messages can return the invalidation completion status to the invalidation request side (such as the TA side) in batches, the increased completion delay that would result from aggregating multiple invalidation response messages is also avoided.
  • At least one invalidation response message among the one or more invalidation response messages further includes the total number of invalidation response messages corresponding to the invalidation request;
  • the method further includes: the second device judges, according to the number of invalidation response messages received and the total number of invalidation response messages, whether all invalidation response messages corresponding to the invalidation request have been received; if so, the tag resource corresponding to the invalidation request is released or reused.
  • the invalidation response message fed back from the invalidation response side (that is, the first device, such as the ATC side) to the invalidation request side (that is, the second device, such as the TA side) may also carry the total number of invalidation response messages corresponding to the invalidation request, so that the invalidation request side can decide, according to this information, whether to reuse or recycle the tag resources of the invalidation request, so as to improve resource utilization.
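A minimal sketch of the invalidation flow described above, as seen from the first device: one request carries several discrete invalidation units, and one response is produced per unit, carrying the request identifier, the unit number, and the total. The function name, the dictionary-based mapping table, and the one-response-per-unit choice are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InvalidationUnit:
    va_base: int  # start of one virtual address range to invalidate
    size: int     # length of that range in bytes


def handle_invalidation(mappings, tag, units):
    """Invalidate each unit in the cached table and build the responses.

    mappings: dict of virtual address -> physical address held by the
              first device; entries inside any unit are removed in place.
    Returns one response dict per unit.
    """
    responses = []
    for number, unit in enumerate(units):
        end = unit.va_base + unit.size
        for va in [v for v in mappings if unit.va_base <= v < end]:
            del mappings[va]
        responses.append(
            {"tag": tag, "unit_number": number, "total": len(units)}
        )
    return responses
```

Because each response names the unit it completes, the requester can mark individual ranges as invalidated as soon as their responses arrive, and can release the identifier once `total` responses have been counted.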
  • the method further includes: the first device initiates a page table request to the second device, and the page table request includes the information in the page table request group to which the page table request belongs.
  • the embodiment of the present application provides a bus communication design scheme.
  • it further designs the page table request mechanism in the bus communication, including redefining the page table request mechanism to improve the performance of the page table request command.
  • this embodiment of the application defines that a page table request (such as a command/message) can carry the information of multiple page table application units, and can correspondingly apply to the system to establish multiple discontinuous page table spaces.
  • because a page table request message transmitted on the bus can carry the information of multiple page table application units, the overhead of the page table request command is reduced, and the transmission efficiency of the page table request command and the bandwidth utilization of the bus are improved.
  • the page table request further includes one or more of page table space size, read and write permissions, desired memory attributes, or usage prompt information corresponding to each page table application unit.
  • the page table request may also carry the expected information of the page table space applied for by the page table request side (such as the ATC side), for example, the size of the page table space expected to be established, and the memory model attributes and system attributes of the page table space expected to be established.
  • for the page table request side (such as the ATC side), this provides an interface for proposing more user-desired information to the page table response side (such as the TA side or the system side) when applying for a page table, so that the TA can create a page table for the corresponding ATC based on more accurate user information; the ATC can then optimize its DMA behavior according to the memory model attributes of the corresponding space, which expands the design space of the ATC.
  • from the point of view of the page table response side (such as the TA side), the TA has the opportunity to adjust its page table establishment strategy and the attributes of the established page table according to this expected information, which greatly expands the design space of the TA/system and allows the page table created by the TA/system for the ATC to be more in line with the ATC's expectations, so that there is an opportunity to improve overall page table management efficiency and overall system performance. For example, according to the size and usage of the page table application unit that the ATC wants to apply for, the TA/system can determine the size (granularity) of the page table to be established, whether that page table can be loaded into the cache in advance, and other behaviors.
  • the method further includes: the second device generates one or more page table response messages in response to the page table request, where each page table response message includes the page table request code and the currently established mapping relationship between the virtual address and the physical address of the page table application unit, or each page table response message includes the page table request code; the second device feeds back the one or more page table response messages to the first device.
  • a bus communication design scheme is provided.
  • the page table request and response mechanism in the bus communication is further designed, including redefining the page table request and response mechanism to improve the performance of page table request and response commands.
  • a page table request (such as a command/message) can carry the information of multiple page table application units and can correspondingly apply to the system to establish multiple discontinuous page table spaces; it is further defined that the same PRG can correspond to multiple page table response messages, and the page table response side (such as the TA side) can feed back the page table response messages to the page table request side (such as the ATC side) in batches and without order requirements.
  • these page table response messages can directly carry the page table information that has been successfully established, so that, after a page fault occurs on the ATC's first address translation request and the page table request is then sent, the address mapping result can be obtained directly; compared with the prior art, the sending and feedback of a second address translation request are eliminated.
  • the embodiment of the present application can reduce the steps needed to obtain the required address mapping result after a page fault occurs in the first page table request in the existing ATS technology, thereby improving the memory access performance of the system. It can be understood that when each page table response message only includes the page table request code but does not include the address mapping relationship, then the ATC needs to send the corresponding address translation request after sending the page table request, to further request untranslated virtual addresses.
  • the page table response message in the embodiment of this application supports two implementations: in one, only a page table request needs to be initiated after a page fault; in the other, after the page table request is initiated following a page fault, an address translation request also needs to be initiated. That is, both page table response methods are supported at the same time.
  • each page table response message also includes one or more of the base address of the page table space corresponding to the established page table application unit, the size of the page table space, the memory model attribute of the page table space, the read/write permissions, or the system attributes.
  • the page table response message also carries one or more of the base address of the page table space corresponding to the currently established page table application unit, the size of the page table space, the memory model attribute of the page table space, the read and write permissions, and the system attributes. This lets the page table response side (such as the TA/system side) return the attributes of the established page table, so that the page table request side (such as the ATC side) can obtain more page table attributes and, according to the memory model attribute of the corresponding page table space, optimize the way the ATC itself uses this page table space, for example by optimizing its subsequent DMA behavior, thereby improving the overall performance of the system and expanding the design space of the ATC.
  • At least one of the one or more page table response messages further includes the total number of page table response messages corresponding to the page table request group PRG
  • the method further includes: the second device judges, according to the total number of page table response messages and the number of page table response messages received, whether all page table response messages corresponding to the page table request group PRG have been received; if so, the tag resource corresponding to the page table request group PRG is released or reused.
  • the page table response message fed back from the page table response side (that is, the second device, such as the TA side) to the page table request side (that is, the first device, such as the ATC side) may also carry the associated page
  • the method further includes: the first device receiving the one or more page table response messages fed back by the second device;
  • the first device initiates an access operation to the corresponding physical address according to the page table request code and the currently established mapping relationship between the virtual address and the physical address of the page table request unit; or, for each page table response message, the first device determines the virtual address of the corresponding page table request unit according to the page table request code, and initiates to the second device a translation request for that virtual address.
  • when the page table response message carries the page table information that has been successfully established, then after a page fault occurs on the first address translation request of the ATC on the page table request side and the page table request is sent, the ATC obtains the address mapping result directly from the page table response message and performs subsequent access operations on the corresponding physical address according to the address mapping result.
  • compared with the existing Address Translation Service (ATS) technology, the sending and feedback of the second address translation request are eliminated, which reduces the steps required to obtain the address mapping result after a page fault occurs, thereby improving the memory access performance of the system.
  • when the page table response message does not carry the page table information that has been successfully established, the ATC on the page table request side needs to initiate an address translation request for the virtual address whose page table has not been successfully established.
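The two response-handling paths above can be sketched as one dispatch function on the page table request side. The `PageTableResponse` type, the use of an optional `pa_base` field to signal whether the mapping is carried, and the returned tuples are all illustrative assumptions rather than the application's message layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PageTableResponse:
    prg: int                       # page table request group of the response
    request_code: int              # identifies the page table application unit
    pa_base: Optional[int] = None  # set only when the mapping is carried


def handle_page_table_response(units, rsp):
    """Decide the requester's next step for one page table response.

    units: (prg, request_code) -> (va_base, size) of the page table
           application units that were requested.
    """
    va_base, size = units[(rsp.prg, rsp.request_code)]
    if rsp.pa_base is not None:
        # Mapping carried in the response: access the physical address
        # directly, with no second address translation request.
        return ("access", va_base, rsp.pa_base, size)
    # Only the request code came back: a translation request for this
    # virtual address range is still needed.
    return ("translate", va_base, size)
```

A caller would act on the first element of the returned tuple, either issuing the queued accesses or sending an address translation request for the returned virtual range.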
  • the embodiment of the present application provides a bus system; the bus system may include a plurality of devices interconnected through a bus, including at least a first device and a second device;
  • the first device may be configured to send an address translation request to the second device, where the address translation request includes a first identifier for identifying the address translation request and a virtual address of a virtual address space to be translated, and the virtual address space includes one or more smallest translation units (STUs);
  • the second device may be configured to generate one or more translation response messages in response to the address translation request, where each of the one or more translation response messages includes the first identifier, target location information, and a target physical address; the target location information is the location information, in the virtual address space to be translated, of one or more target STUs among the one or more STUs, and the target physical address is the physical address obtained by translating the virtual addresses of the one or more target STUs; the second device can be used to
  • the target location information includes the base address of the first target STU in the one or more target STUs, and the size of the address space of the one or more target STUs.
  • the first device is specifically configured to: determine the virtual addresses of the one or more target STUs according to the first identifier and the target location information; and determine the read and write access operations corresponding to the virtual addresses of the one or more target STUs, and initiate those read and write access operations to the target physical address.
  • the address translation request further includes usage information of the virtual address space, and the usage information includes one or more of the usage frequency, access frequency, or locality expectation of the virtual address space.
  • each translation response message further includes one or more of the memory attributes of the one or more target STUs, first indication information, or translation extension information; the memory attributes include whether the target STU can be cached or the size of the access delay corresponding to the target STU, and the first indication information is used to indicate whether the current translation response message is the last of the one or more translation response messages.
  • At least one translation response message among the one or more translation response messages further includes the total number of translation response messages corresponding to the address translation request; the second device is further configured to determine, according to the number of translation response messages received and the total number of translation response messages, whether all translation response messages corresponding to the address translation request have been received; if so, the tag resource corresponding to the address translation request is released or reused.
  • the second device is further configured to send an invalidation request to the first device, where the invalidation request includes a second identifier for identifying the invalidation request and the virtual addresses of multiple invalidation units, and the multiple invalidation units correspond to multiple discrete segments of virtual address space to be invalidated; the first device is further configured to, in response to the invalidation request, invalidate the mapping relationship between the virtual address and the physical address of each of the multiple invalidation units.
  • the first device is further configured to generate one or more invalidation response messages, where each invalidation response message includes the second identifier and an invalidation unit number, and the invalidation unit number is the number of the currently invalidated invalidation unit among the multiple invalidation units.
  • At least one invalidation response message among the one or more invalidation response messages further includes the total number of invalidation response messages corresponding to the invalidation request;
  • the second device is further configured to determine, according to the number of received invalidation response messages and the total number of invalidation response messages, whether all invalidation response messages corresponding to the invalidation request have been received; if reception is complete, the tag resource corresponding to the invalidation request is released or reused.
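  • The multi-unit invalidation exchange above can be illustrated with a short sketch. Everything named here is an assumption for illustration: the classes `InvalidationRequest` and `InvalidationResponse`, the function `handle_invalidation`, the 4096-byte page granularity, the choice of one response per invalidation unit, and the choice to carry the total in every response (the text only requires it in at least one).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class InvalidationRequest:
    request_id: int                    # the "second identifier"
    units: List[Tuple[int, int]]       # (virtual base address, size in bytes) per unit


@dataclass
class InvalidationResponse:
    request_id: int                    # echoes the second identifier
    unit_number: int                   # index of the unit just invalidated
    total: int                         # total number of responses for this request


def handle_invalidation(
    atc: Dict[int, int], req: InvalidationRequest, page: int = 4096
) -> List[InvalidationResponse]:
    """Drop cached mappings for each discrete unit and build one response per unit."""
    responses = []
    for number, (base, size) in enumerate(req.units):
        for va in range(base, base + size, page):
            atc.pop(va, None)          # remove the virtual-to-physical mapping if cached
        responses.append(InvalidationResponse(req.request_id, number, len(req.units)))
    return responses
```

  • A single request here covers two or more non-contiguous ranges, which is the header saving the text attributes to carrying multiple invalidation units in one message.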
  • the first device is further configured to initiate a page table request to the second device, where the page table request includes information about the page table request in the page table request group PRG to which it belongs.
  • the page table request further includes one or more of the page table space size, read and write permissions, expected memory attributes, or usage hint information corresponding to each page table application unit among the plurality of page table application units.
  • the second device is further configured to generate one or more page table response messages in response to the page table request; wherein each page table response message includes the page table request code and the mapping relationship between the virtual address and the physical address of the currently established page table application unit, or each page table response message includes the page table request code; the second device is further configured to feed back the one or more page table response messages to the first device.
  • each page table response message also includes one or more of the base address of the page table space corresponding to the established page table application unit, the size of the page table space, the memory model attribute of the page table space, the read/write permissions, and system properties.
  • at least one of the one or more page table response messages further includes the total number of page table response messages corresponding to the page table request group PRG; the second device is further configured to determine, according to the total number of page table response messages and the number of received page table response messages, whether all page table response messages corresponding to the page table request group PRG have been received; if so, the tag resource corresponding to the page table request group PRG is released or reused.
  • the first device is further configured to: receive the one or more page table response messages fed back by the second device; and, for each page table response message, initiate an access operation to the corresponding physical address according to the page table request code and the mapping relationship between the virtual address and the physical address of the currently established page table application unit; or, for each page table response message, determine the virtual address corresponding to the page table application unit according to the page table request code, and initiate a translation request for that virtual address to the second device.
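  • The two alternatives for handling a page table response can be sketched as below. This is only an illustration under assumptions: `PageTableResponse`, `handle_page_table_response`, the `code_to_va` lookup table, and the `send_translation_request` callback are hypothetical names, and the message is reduced to the request code plus an optional mapping.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class PageTableResponse:
    request_code: int                            # page table request code
    mapping: Optional[Tuple[int, int]] = None    # (virtual, physical) when included


def handle_page_table_response(
    resp: PageTableResponse,
    code_to_va: Dict[int, int],
    atc: Dict[int, int],
    send_translation_request: Callable[[int], None],
) -> Tuple[str, int]:
    """Return ("access", physical) or ("translate", virtual) for one response."""
    if resp.mapping is not None:
        # The response already carries the mapping: cache it and access directly.
        va, pa = resp.mapping
        atc[va] = pa
        return ("access", pa)
    # Only the request code came back: recover the virtual address and translate it.
    va = code_to_va[resp.request_code]
    send_translation_request(va)
    return ("translate", va)
```

  • The first branch avoids the second round trip; the second branch corresponds to the case where the response carries only the page table request code.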
  • the present application provides a semiconductor chip, which may include the bus system provided in any one of the implementation manners in the second aspect above.
  • the present application provides a semiconductor chip, which may include: the bus system provided by any one of the implementation manners in the second aspect above, an internal memory coupled to the bus system, and an external memory.
  • the present application provides a system-on-chip (SoC) chip; the SoC chip includes the bus system provided by any one of the implementation manners in the second aspect above, and an internal memory and an external memory coupled to the bus system.
  • the SoC chip may consist of chips, or may include chips and other discrete devices.
  • the present application provides a chip system, which includes the bus system provided in any one of the implementation manners in the second aspect above.
  • the system-on-a-chip further includes a memory, and the memory is configured to store necessary or related program instructions and data during operation of the data processing device.
  • the system-on-a-chip may consist of chips, or may include chips and other discrete devices.
  • the present application provides a bus communication device, the bus communication device has the function of implementing any one of the bus communication methods in the first aspect above. This function may be implemented by hardware, or may be implemented by executing corresponding software on the hardware.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • the present application provides a computer-readable storage medium storing a computer program; when the computer program is executed by the bus system, the process of the bus communication method described in any one of the above-mentioned second aspects is implemented.
  • the embodiment of the present application provides a computer program, the computer program includes an instruction, and when the instruction is executed by the bus system, the bus system can execute the process of the bus communication method described in any one of the above-mentioned second aspects .
  • FIG. 1 is a schematic diagram of a hardware structure of a PCIe bus system provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a hardware structure of a bus system provided by an embodiment of the present application.
  • FIG. 3A is a schematic flowchart of a bus communication method provided by an embodiment of the present application.
  • FIG. 3B is a schematic diagram of an implementation structure of a translation response message provided by an embodiment of the present application.
  • FIG. 3C is a schematic diagram of another implementation structure of a translation response message provided by an embodiment of the present application.
  • FIG. 3D is a schematic diagram of a physical address space after virtual address space translation according to an embodiment of the present application.
  • FIG. 3E is a schematic diagram of a corresponding translation result response for all untranslated address spaces of a translation request provided by an embodiment of the present application.
  • FIG. 4A is a schematic diagram of a data structure of an invalidation request message provided by an embodiment of the present application.
  • FIG. 4B is a schematic diagram showing the space size of translation results provided by the embodiment of the present application.
  • FIG. 4C is a schematic diagram of the format of an invalidation response message provided by the embodiment of the present application.
  • FIG. 5A is a schematic diagram of a format of a page table response message with page table information provided by an embodiment of the present application.
  • FIG. 5B is a schematic diagram showing the size of the page table space provided by the embodiment of the present application.
  • FIG. 5C is a schematic diagram of another page table response message format with page table information provided by the embodiment of the present application.
  • FIG. 5D is another schematic diagram showing the size of the page table space provided by the embodiment of the present application.
  • FIG. 5E is a schematic diagram of a scene where the ATC receives a page table response message in an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a bus system in an embodiment of the present application.
  • a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer.
  • an application running on a computing device and the computing device can be components.
  • One or more components can reside within a process and/or thread of execution and a component can be localized on one computer and/or distributed between two or more computers.
  • these components can execute from various computer readable media having various data structures stored thereon.
  • a component may, for example, communicate through local and/or remote processes based on a signal having one or more packets of data (e.g., data from two components interacting with another component in a local system, in a distributed system, and/or across a network such as the Internet, interacting with other systems by way of the signal).
  • Bus: a public line or path used in a computer to connect various functional components and transmit data between them.
  • Buses can be divided into the following types. The chip bus, also known as the device-level bus, is the bus inside the central processing unit chip.
  • The internal bus, also known as the system bus or board-level bus, is the transmission path between the functional components of the computer.
  • The microcomputer bus is usually called the internal bus.
  • The external bus, also known as the communication bus, is the transmission path between computer systems, or between a computer host and peripheral devices.
  • The bus communication system in this application may include the above-mentioned chip bus, internal bus, or external bus.
  • Virtual address: for each running program, the system allocates a virtual address space, that is, the program runs in the virtual address space. The addresses of this virtual address space do not physically exist in the computer.
  • Physical address: the address placed on the addressing bus.
  • The physical address space physically exists in the computer and is unique within each computer. For a read, the circuit places the data stored in physical memory at the corresponding address onto the data bus according to the value of each address bit. For a write, the circuit stores the content on the data bus into physical memory at the corresponding address according to the value of each address bit.
  • PCIe: Peripheral Component Interconnect Express.
  • Root Complex (Root Complex, RC): A platform system can contain one or more RCs.
  • RC devices can be used to connect the processor and memory subsystem to a PCI Express switch fabric consisting of one or more switch devices.
  • Root Port (Root Port, RP): Each RC can support one or more RPs. Each RP represents an independent hierarchy.
  • Address Translation and Protection Table (ATPT): in a computer operating system, address translation converts the user's logical address into a physical memory address, completing address relocation.
  • The address translation table is a table used for address translation, in which logical addresses and physical addresses correspond one to one. It is mainly used to speed up address translation and improve the speed and efficiency of reading data in the computer system.
  • Direct memory access (DMA): when a device interface tries to send data (usually a large amount of data) directly to another device through the bus, it first sends a DMA request signal to the CPU.
  • The peripheral device sends the CPU a bus request to take over bus control through the DMA controller (DMAC), a dedicated DMA interface circuit.
  • After the current bus cycle ends, the CPU responds to DMA signals according to their priority and the order in which the DMA requests were made.
  • When the CPU responds to a DMA request from a device interface, it relinquishes control of the bus. Under the management of the DMA controller, the peripheral hardware and the memory then exchange data directly, without CPU intervention. After the data transfer is completed, the device interface sends a DMA end signal to the CPU and returns bus control.
  • ATS: address translation service.
  • The ATC is implemented in the PCIe device connected to the host. The ATC initiates an address translation request to the TA; after translating, the TA returns the corresponding address mapping to the ATC, and the ATC caches these address mappings locally, so that when the PCIe device accesses the memory space of the CPU (for example, by DMA), it can obtain the corresponding physical address directly from the local ATC and use that physical address to access the memory space of the CPU directly.
  • When some of the address mappings cached locally by the ATC are about to become invalid in the TA, the TA initiates an invalidation request to the ATC, and the ATC returns an invalidation response to the TA after invalidating the corresponding mappings in its local cache.
  • The ATC side is referred to as the ATS requester (Requester); the TA or RC is collectively referred to as the ATS completer (Completer), or simply as the TA.
  • The address space that one translation request can cover is limited relative to today's high-bandwidth, high-speed buses: its maximum is 16 × 2^(STU+12) bytes, and in existing systems STU often takes the value 0. Because today's buses have higher bandwidth and speed (for example, the speed of each link of PCIe6. …), more translation request commands are required, which consumes more bus bandwidth.
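  • As a worked illustration of the limit stated above, the following sketch evaluates 16 × 2^(STU+12) bytes and counts how many translation requests a region would need. The function names are hypothetical, and the 1 GiB region in the usage note is an arbitrary example rather than a figure from this application.

```python
def max_bytes_per_request(stu: int) -> int:
    """Maximum address space one translation request can cover: 16 * 2^(STU+12) bytes."""
    return 16 * 2 ** (stu + 12)


def requests_needed(region_bytes: int, stu: int) -> int:
    """Number of translation requests needed to cover a region (ceiling division)."""
    per_request = max_bytes_per_request(stu)
    return -(-region_bytes // per_request)
```

  • With STU = 0, one request covers 16 × 4096 = 65536 bytes (64 KiB), so translating a 1 GiB region takes 2^30 / 2^16 = 16384 requests.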
  • One translation request corresponds to at most two translation response messages, and the translation size must be the same in the same response message. This will limit the design space of TA, and may increase the probability that one address translation request cannot obtain the translation results of all requested address spaces, thereby reducing the overall translation efficiency.
  • An invalidation request message can only carry an invalidation indication of one continuous address space segment, which increases the overhead of the invalidation request message header.
  • A page table request cannot carry additional information; for example, it cannot carry the size of the page table space to be established, and one page table request can carry only one virtual address of the page table space to be established, which increases the overhead of the page table request header.
  • In summary, the above PCIe ATS solution has problems such as low communication efficiency, prolonged communication time, excessive bus bandwidth occupation, large bus overhead, and limited design space. The technical problems to be solved in this application therefore include the following: improving the existing ATS technology, which on a high-bandwidth, high-speed bus has insufficient performance and large bus overhead and cannot match the bandwidth and speed of the bus; extending the ATS mechanism so that the new mechanism has higher performance and provides a highly scalable interface for the system; and replacing the inefficient mechanism of the existing ATS technology, in which a page fault requires additionally requesting a page table from the TA through the page table request interface and then obtaining the translation result through an ATS translation request after the page table is established, with a new page table request and response mechanism that reduces the need for a second translation request.
  • the bus communication method in this application can be applied to various bus system architectures, such as PCIe bus system, AMBA bus system, etc.
  • The embodiment of this application takes a PCI Express (PCIe) system as an example of the bus communication system to describe the bus communication method in this application.
  • the basic configuration of the PCIe system will be described first.
  • the PCIe bus is a high-speed serial computer expansion bus standard, which belongs to the point-to-point transmission structure and adopts a serial connection method.
  • On a PCIe link in the PCIe bus system, the connected ports are fully equal and are connected to the sending device and the receiving device respectively; one end of a PCIe link can be connected to only one sending device or receiving device.
  • Multiple devices can also be connected. In the typical structure, a root port (root port) and an endpoint (endpoint) directly form a point-to-point connection pair, and a Switch can connect several endpoints at the same time, which is an example of link expansion through the Switch.
  • a device conforming to the PCIe bus standard is called a PCIe device, and a PCIe bus system may include multiple PCIe devices interconnected through the PCIe bus.
  • Fig. 1 is a schematic diagram of the hardware structure of a PCIe bus system provided by the embodiment of the present application.
  • The PCIe bus system 10 can be located in any electronic device, such as a computer, a mobile phone, a tablet, or other devices.
  • The hardware structure of the PCIe bus system 10 may specifically be a chip, a chipset, or a circuit board equipped with a chip or chipset.
  • The chip, chipset, or circuit board equipped with the chip or chipset can work when driven by the necessary software.
  • The interconnected PCIe devices may include a host 100 and a plurality of node devices interconnected by a PCIe bus, wherein the host 100 may be a computing system that includes one or more CPUs 101, a main memory (Memory) 102, a TA 103 that includes an IO address management function (for example, an input-output memory management unit (IOMMU), the system memory management unit (SMMU) under the ARM system, or VT-d under the X86 system), a cache memory (such as a cache) not shown in the figure, an internal interconnect bus, and so on.
  • The node devices can specifically be a graphics processing unit (GPU1) 105, a smart network card 106, a solid-state disk (SSD) 107, a graphics processing unit (GPU2) 108, etc.; … congestion control and other functions.
  • the host 100 can be interconnected with multiple node devices or switches 109 through the switching device RC 104.
  • FIG. 2 is a schematic diagram of the hardware structure of a bus system provided by the embodiment of the present application.
  • The bus system 20 can also be located in any electronic device, such as a computer, a mobile phone, a tablet, or other devices.
  • The hardware structure of the bus system 20 may specifically be a chip, a chipset, or a circuit board equipped with a chip or chipset.
  • The chip, chipset, or circuit board equipped with the chip or chipset can work when driven by the necessary software.
  • each node device such as SSD205, network card 206, accelerator 207, network card 2 208, GPU1 209, GPU2 210) through the switching device RC 104, and each node device
  • switches Switch
  • each node device may be multipath reachable (multipathing)
  • the entire interconnection topology of the bus system in Figure 2 is a flattened (flat) figure
  • the bus system in Figure 2 supports multipathing.
  • ATS-related data streams will communicate freely on the bus, and each node may be multipath reachable.
  • Because the bus communication method in the embodiment of the present application supports out-of-order feedback of response messages, it can support the above multipath architecture (multipathing inevitably involves out-of-order delivery); a scheme without this capability is unable to support the multipath architecture in Figure 2 above.
  • bus system architectures in FIG. 1 and FIG. 2 are only several exemplary implementations in the embodiments of the present application, and the bus system architectures in the embodiments of the present application include but are not limited to the above application scenarios.
  • The bus communication method in this application can also be applied to other bus communication system architectures, such as parallel buses (for example, the AMBA bus), serial buses, chip buses, internal buses, and external buses. Other scenarios and examples are not listed one by one here.
  • Any CPU included in the host can be a general-purpose processor (CPU), an embedded processor, a special-purpose processor, and the like.
  • The host may also include functional modules such as RC/RP, TA, and ATPT, where:
  • The translation agent (Translation Agent, TA) is implemented by hardware or by a combination of software and hardware, and is responsible for converting virtual addresses in PCIe transactions into actual physical addresses.
  • The TA can support Address Translation Services (ATS), which enable a PCIe Function to obtain the address translation relationship of the target memory in advance, before starting a data access service (such as DMA).
  • The TA may contain a TLB (translation lookaside buffer), which is used to speed up access to the address translation table.
  • the Address Translation and Protection Table is a table that stores address translation relations and can be accessed by TAs.
  • Root Complex (Root Complex, RC): A platform system can contain one or more RCs.
  • RC devices can be used to connect the processor and memory subsystem to a PCI Express switch fabric consisting of one or more switch devices.
  • Each RC can support one or more RPs. Each RP represents an independent hierarchy.
  • Switch: used to expand the bus system, for example, allowing more PCIe devices or PCIe Switches to be connected.
  • A PCIe device can be a network card, GPU, SSD, accelerator, storage device, etc.
  • The network card is mainly used to connect the entire PCIe system to the network, that is, to connect the device to the external network;
  • the accelerator is usually used to assist the CPU in accelerating a specific function, such as a matrix accelerator;
  • the GPU is a microprocessor usually used for image and graphics-related operations;
  • the SSD, also known as a solid-state drive, is a hard disk made of an array of solid-state electronic memory chips.
  • the address translation cache (Address Translation Cache, ATC) is essentially a Cache, which is used to store the above-mentioned TA-translated page table, that is, to cache the virtual address and its mapped physical address. If a PCIe device has its own ATC, there is no need to query the IO TLB, which can relieve the pressure on the IO TLB and improve memory access performance.
  • the above-mentioned system architecture in FIG. 1 or FIG. 2 may include devices that implement ATC, or devices that do not implement ATC. For example, in the above-mentioned system architecture of FIG.
  • the ATC function can be realized in GPU1, GPU2, and the network card, but the ATC function is not implemented in the SSD; for another example, in the above-mentioned system architecture of FIG.
  • the ATC function can be implemented in the network card 2 and the accelerator, but the ATC function is not implemented in the SSD.
  • The above-mentioned interaction between the translation agent (TA) and the address translation cache (ATC) constitutes the address translation service (ATS).
  • When the ATC of a PCIe device cannot complete an address mapping, the PCIe device where the ATC is located sends an ATS request to the TA; after the TA completes the address mapping, it returns the result to the PCIe device, so that the ATC in the PCIe device caches the address mapping entry.
  • PCIe devices are usually divided, according to whether they initiate requests or complete responses, into devices that implement the ATC function and devices that implement the TA function.
  • A device that implements the ATC function is usually the initiator of a translation request or page table request, and the device that implements the TA function (or the device where the RC is located) is usually the responder to the translation request or page table request; conversely, the device that implements the TA function is usually the initiator of an invalidation request, and the device that implements the ATC function is the responder that completes the invalidation response.
  • The ATC initiates an address translation request to the TA; after translating, the TA returns the corresponding address mapping to the ATC, and the ATC caches these address mappings locally, so that when it needs to access the memory space of the device where the TA is located, such as the memory space of the CPU (for example, by DMA), it can obtain the corresponding physical address directly from the local ATC and use that physical address to access the memory space of the CPU directly. When some of the address mappings cached locally by the ATC become invalid in the TA, the TA initiates an invalidation request to the ATC, and the ATC returns an invalidation response to the TA after invalidating the corresponding mappings in its local cache.
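  • The request, cache, and invalidate cycle between the ATC and the TA can be modeled with a small sketch. This is a simplified software model under assumptions, not the hardware behavior defined by any specification: the class names, the dictionary-based page table, the `misses` counter, and the boolean invalidation response are all hypothetical.

```python
from typing import Dict, Optional


class TranslationAgent:
    """Simplified TA: answers translation requests from a page table."""

    def __init__(self, page_table: Dict[int, int]) -> None:
        self.page_table = dict(page_table)

    def translate(self, va: int) -> Optional[int]:
        return self.page_table.get(va)


class AddressTranslationCache:
    """Simplified ATC: caches mappings returned by the TA."""

    def __init__(self, ta: TranslationAgent) -> None:
        self.ta = ta
        self.cache: Dict[int, int] = {}
        self.misses = 0                      # number of translation requests sent

    def lookup(self, va: int) -> int:
        """Return the physical address, asking the TA only on a cache miss."""
        if va not in self.cache:
            self.misses += 1
            pa = self.ta.translate(va)       # address translation request
            if pa is None:
                raise KeyError(f"no mapping for {va:#x}")
            self.cache[va] = pa              # cache the returned mapping locally
        return self.cache[va]

    def invalidate(self, va: int) -> bool:
        """Handle an invalidation request; the return value models the response."""
        self.cache.pop(va, None)
        return True
```

  • After an invalidation, the next access to the same virtual address misses in the ATC and triggers a new translation request, which is the behavior the paragraph above describes.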
  • The ATC side or ATC is used below to refer to the device implementing the ATC function; similarly, the TA side or TA is used to refer to the device implementing the TA function, and this is not repeated later. Some devices in the bus system can have both TA and ATC capabilities, that is, in different ATS exchanges they can act in different roles using their TA capability or their ATC capability. This is not specifically limited.
  • The first device in this application can be considered a device that implements the ATC function or has the ATC capability, and the second device can be considered a device that implements the TA function or has the TA capability.
  • For example, the first device in this application can be the GPU, network card, accelerator, etc. that implements the ATC function in Figure 1 or Figure 2 above, and the second device in this application can be the host with the TA function in Figure 1 or Figure 2 above.
  • The bus communication method in the embodiment of the present application is applicable to, but not limited to, the bus system architecture in FIG. 1 or FIG. 2 above.
  • The PCIe bus system may include multiple PCIe devices, and the multiple PCIe devices may include one or more first devices that implement the ATC function and one or more second devices that implement the TA function.
  • The first device or the second device in this application may be any device with the ATC capability or the TA capability in the corresponding system; that is, the first device and the second device in this application include but are not limited to the devices mentioned in FIG. 1 or FIG. 2 above.
  • FIG. 3A is a schematic flowchart of a bus communication method provided by an embodiment of the present application. This method can be applied to the PCIe bus system described in FIG. 1 or FIG. 2 above, and the bus system 10 can be used to support and execute method steps S301-S304 shown in FIG. 3A. The following description is given from the interaction side of the first device and the second device in the bus system 10, with reference to FIG. 3A.
  • the method may include the following steps S301 to S304, and optionally, may also include steps S305 to S306.
  • Step S301 the first device sends an address translation request to the second device.
  • the bus communication method in the embodiment of the present application is applied to a bus system, and the bus system includes a plurality of devices interconnected through a bus, and the plurality of devices interconnected through a bus include at least a first device and a second device.
  • The first device can be understood as a node device that implements the ATC function or has the ATC capability during bus communication (hereinafter abbreviated as ATC for ease of description), and the second device can be understood as a node device that implements the TA function or has the TA capability during bus communication (hereinafter abbreviated as TA for ease of description).
  • When the ATC is informed through software that it needs to access the address space of the TA, since that address space is a virtual address space, the virtual address space must be translated into one or several corresponding segments of physical address space during the access, so that the ATC can actually access the TA's memory space. Specifically, when the ATC needs to access a certain segment of virtual address space in the TA, the ATC initiates an address translation request to the TA, the TA returns the corresponding address mapping to the ATC after translation, and the ATC caches these address mappings locally, so that when the ATC accesses the virtual address space of the TA (for example, DMA to the memory space of the CPU), it can obtain the corresponding physical address directly from the local ATC and access the memory space of the CPU directly through the physical address.
  • The address translation request includes a first identifier identifying the address translation request and a virtual address of a virtual address space to be translated, and the virtual address space to be translated includes one or more smallest translation units (STUs). The TA carries the first identifier in the translation response messages it feeds back, so that the ATC can identify which address translation request a translation response message corresponds to from the identifier in that message. A smallest translation unit STU is a segment of contiguous virtual address space; that is, the translation response side uses the STU as the smallest unit of translation.
  • The packet length of the address translation request is not restricted as in the prior art: the length field value of the packet can reach that of a general data read request. Therefore, the size of the virtual address space to be translated carried by one translation request in the embodiment of the present application can be expanded, so that the message overhead of address translation requests can be greatly reduced.
  • The address translation request further includes usage information of the virtual address space, and the usage information includes one of the usage frequency, the access frequency, or the locality expectation of the virtual address space.
  • The locality expectation (Locality) can be understood as how strongly the first device side expects the data corresponding to the virtual address space to be cached locally: depending on whether that data needs to be accessed frequently, the data corresponding to the virtual address space is cached locally and does not have to be fetched each time from other memory such as double data rate synchronous dynamic random access memory (Double Data Rate, DDR).
  • In the address translation request, the ATC side may also carry usage information of the virtual address space to be translated, such as the usage frequency and access frequency of the virtual address space to be accessed and whether it needs to be cached locally (Locality), so that the TA side can optimize its handling of the virtual address space to be translated according to this usage information. For example, the TA side may use the usage information to decide whether the page table mapping corresponding to the virtual address space needs to be cached in the MMU of the CPU.
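  • As a minimal sketch of the request contents described above, the following models an address translation request as a small record. The names TransReq, UsageInfo, tag, base_va and stu_count, the 4096-byte STU size, and the STU alignment check are assumptions made for illustration; the embodiment does not fix any of them.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

STU_SIZE = 4096  # assumed STU size in bytes; the description does not fix a value


@dataclass(frozen=True)
class UsageInfo:
    """Optional hints named in the text; the field names are hypothetical."""
    usage_frequency: Optional[int] = None
    access_frequency: Optional[int] = None
    locality_expected: Optional[bool] = None


@dataclass(frozen=True)
class TransReq:
    tag: int        # the "first identifier", echoed in every response
    base_va: int    # virtual address where the space to translate starts
    stu_count: int  # number of STUs requested (one or more)
    usage: Optional[UsageInfo] = None

    def __post_init__(self) -> None:
        if self.stu_count < 1:
            raise ValueError("a request covers one or more STUs")
        if self.base_va % STU_SIZE:
            raise ValueError("base_va is assumed STU-aligned in this sketch")

    def va_range(self) -> Tuple[int, int]:
        """Half-open virtual address range [start, end) covered by the request."""
        return self.base_va, self.base_va + self.stu_count * STU_SIZE


req = TransReq(tag=7, base_va=0x10000, stu_count=32,
               usage=UsageInfo(locality_expected=True))
```

  • Keeping the usage information optional mirrors the text, in which the request may, but need not, carry these hints.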
  • Step S302 the second device generates one or more translation response messages in response to the address translation request.
  • Each translation response message carries the first identifier that uniquely identifies the address translation request, the target location information, within the virtual address space, of one or more target STUs among the one or more STUs, and the target physical address obtained by translating the virtual address of the one or more target STUs.
  • The target location information includes the base address of the first target STU among the one or more target STUs and the size of the address space of the one or more target STUs.
  • Specifically, the target location information added to the translation response message can be the base address of the first translated STU plus the size of the translated address space, which tells the translation request side where the STUs translated in this translation response message lie within the original virtual address space to be translated.
  • Because each translation response message carries the identifier of its address translation request together with the location, within the address space originally requested for translation, of the address space currently translated, the translation request side can accurately determine from this information which originally requested address space any translation response message translates and which segment of that address space it covers.
  • The embodiment of the present application is therefore not constrained by the order in which translation response messages are fed back: they can be fed back out of order, serially, or in parallel. This avoids the inflexible design, longer latency, and low efficiency caused in the prior art by order-preserving feedback, in which response messages must be serialized strictly in a fixed order.
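  • The following sketch illustrates why the identifier plus the target location information make the arrival order irrelevant. It assumes, for illustration only, that the location is expressed as a zero-based starting STU index and an STU count, that an STU is 4096 bytes, and that the requester keeps a hypothetical table mapping each outstanding tag to the base virtual address of its request.

```python
STU_SIZE = 4096  # assumed STU size in bytes


def target_va_range(req_base_va, start_stu, stu_count, stu_size=STU_SIZE):
    """Half-open virtual address range [start, end) covered by one response."""
    start = req_base_va + start_stu * stu_size
    return start, start + stu_count * stu_size


# Hypothetical requester-side table: tag -> base VA of the outstanding request.
pending = {7: 0x10000}

# (tag, starting STU index, number of STUs), listed in an arbitrary arrival order.
responses = [(7, 4, 8), (7, 31, 1), (7, 2, 1)]

resolved = [target_va_range(pending[tag], start, count)
            for tag, start, count in responses]
resolved_reversed = [target_va_range(pending[tag], start, count)
                     for tag, start, count in reversed(responses)]
```

  • Each response is resolved on its own, so processing the same responses in reverse order yields the same set of address ranges.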
  • Each translation response message further includes one or more of the memory attributes of the one or more target STUs, first indication information, or translation extension information. The memory attributes include whether the target STU is cacheable or the magnitude of the access latency corresponding to the target STU, and the first indication information indicates whether the current translation response message is the last of the one or more translation response messages.
  • The translation response message fed back from the translation response side (that is, the second device, such as the TA side) to the translation request side (that is, the first device, such as the ATC side) may also carry memory attribute information of the currently translated virtual address space and translation extension information of the translation response message, so that the translation request side (such as the ATC side) can quickly learn characteristics of the virtual address space to be translated and optimize its access operations on the virtual address space accordingly.
  • The translated address space may be all or only part of the address space to be translated in the address translation request.
  • The second device may fail to find the mapping between virtual and physical addresses for a segment of the address space to be translated, or translation may fail for other reasons, so the address translation response messages received by the first device may correspond to only part of the virtual address space to be translated.
  • The first device can then send a page table request carrying the virtual address and other information to the second device, and the second device returns one or more page table responses to the first device, so that the first device can obtain the translation result directly or initiate an address translation request again to obtain it.
  • For details, refer to the description of the subsequent steps S308 to S311, which is not repeated here.
  • In step S302, after the second device has processed all or part of the address space requested by the address translation request, it can return the corresponding translation result.
  • Taking the interaction between the TA in the second device and the ATC in the first device as an example, the following illustrates how the second device responds to the address translation request in this embodiment of the present application.
  • The TA generates translation results and returns them to the ATC in translation response messages.
  • Each translation request can correspond to multiple translation response messages.
  • For a given address translation request, a translation response message needs to carry: information that uniquely identifies which address translation request the message corresponds to; the translation results (Translations); the STU, within the address space requested for translation, at which the untranslated address space corresponding to the first translation result in this message starts; the size of the address space covered by each translation result; the attributes of the translated address space; whether this message is the last translation response message for the translation request; an indication of whether translation extension information is carried; and, if it is, the translation extension information itself.
  • The untranslated address spaces corresponding to the translation results carried in one translation response message should be contiguous. A translation response message must also indicate the STU, within the address space requested for translation, at which the untranslated address space corresponding to its first translation result starts. A translation response message should have a bit field indicating whether it is the last translation response message for the translation request; if it is the last one, it also indicates the number of all translation response messages for this translation request.
  • FIG. 3B is a schematic diagram of an implementation structure of a translation response message provided by an embodiment of the present application.
  • The Start STU field indicates which STU segment of the untranslated address space of the TransReq is covered by the first 64-bit data item carried in the TransCpl message (the first translation result).
  • If the TransCpl is a completion message without data, this field is reserved: the TA should fill it with 0, and the ATC ignores its value.
  • The L field indicates whether the current TransCpl is the last translation response message corresponding to the translation request.
  • The Cpl_Num field indicates the total number of translation response messages for the address translation request to which the current TransCpl belongs; its value equals the actual number of messages minus 1. When the value is 0, the number of messages indicated is the maximum. Only the Cpl_Num value in the response message whose L field is 1 is required to be accurate.
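  • The fields just described can be sketched as a simple record, together with a helper that derives the total number of responses from Cpl_Num. The record layout and the names TransCpl and total_messages are hypothetical and do not reproduce the bit layout of FIG. 3B. The helper deliberately does not interpret a Cpl_Num value of 0, which the text treats as a special case.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TransCpl:
    tag: int            # identifier of the address translation request
    start_stu: int      # Start STU field
    last: bool          # L field
    cpl_num: int        # Cpl_Num field
    results: List[int]  # 64-bit translation results, one entry per result


def total_messages(cpl: TransCpl) -> Optional[int]:
    """Total responses for the request, or None when this message cannot tell."""
    if not cpl.last:
        return None  # Cpl_Num is only required to be accurate when L is 1
    if cpl.cpl_num == 0:
        return None  # the zero value is a special case that this sketch leaves open
    return cpl.cpl_num + 1  # the field holds the actual count minus 1


final = TransCpl(tag=7, start_stu=31, last=True, cpl_num=3, results=[0x1])
early = TransCpl(tag=7, start_stu=4, last=False, cpl_num=0, results=[0x2, 0x3])
```

  • With these two messages, only the one whose L field is set yields a total, which is 4.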
  • Each translation result also includes the memory attributes of the translated address space, such as whether the space is cacheable and whether its access latency is particularly large or small. FIG. 3C is a schematic diagram of another implementation structure of a translation response message provided by the embodiment of the present application.
  • The memory attribute (MemAttr) field indicates the memory attribute of the translated address space. Its encoding is specific to the system in which the TA resides, or it can be given a more abstract meaning (for example, by establishing a common memory space model). Table 1 below gives one possible encoding:
  • Step S303 the second device feeds back the one or more translation response messages to the first device.
  • The second device can feed back each translation response message as soon as it is generated, that is, it can feed back translation response messages to the first device in batches according to the actual translation progress, and a batch can include one or more translation response messages.
  • The original untranslated virtual addresses corresponding to the translated physical addresses in any two different translation response messages can be discontiguous, and the translation request side can likewise receive translation response messages in batches of one or more. This differs from the prior art, in which the response can be split into at most two parts and only one response message can be fed back at a time, and it reduces the feedback latency of translation response messages and improves bandwidth utilization.
  • With this translation completion response mechanism, the TA in the second device gains design freedom, the completion latency of the translation requests issued by the ATC in the first device is reduced, and the bus transmission utilization of translation requests (size of translated space obtained / number of translation requests) is improved.
  • Step S304 For each translation response message, the first device initiates a read and write access operation to the target physical address according to the first identifier and the target location information.
  • According to the first identifier and the target location information carried in each translation response message, the first device can map the address space segment translated in the current translation response message back to specific virtual addresses in the original address space to be translated, determine from those virtual addresses the specific read/write access operation for the translated segment, and finally initiate that operation to the corresponding physical address.
  • For example, the first device may first use the first identifier in the translation response message to determine the virtual address of the original virtual address space to be translated. From that virtual address and the target location information it can determine the virtual addresses of the address space segment translated in the current translation response message (that is, the one or more target STUs), then determine the read/write access operations that correspond to those virtual addresses, and finally initiate those operations to the corresponding physical addresses.
  • In this way, the translation request side can perform data read/write access operations (such as DMA operations) according to the physical address in each received translation response message, which further shortens the time it waits for translation results and uses the bus bandwidth more fully and efficiently.
  • The embodiment of this application defines a new address translation request and translation completion response, so that one address translation request can cover a larger address space and the translation completion responses can be returned to the requester in batches and out of order. The translation extension information defined in the translation response message can also convey more information about the system in which the TA resides to the requester. This addresses shortcomings (1), (2), and (3) identified in the above analysis of the prior art.
  • At least one of the one or more translation response messages further includes the total number of translation response messages corresponding to the address translation request. The method further includes: the first device determines, according to the number of translation response messages received and the total number of translation response messages, whether all translation response messages corresponding to the address translation request have been received; if so, the tag resource corresponding to the address translation request is released or reused.
  • The TA side may also carry, in the last translation response message it sends, an indication of whether the current translation response message is the last of the one or more translation response messages.
  • The translation response message fed back from the translation response side (that is, the second device, such as the TA side) to the translation request side (that is, the first device, such as the ATC side) may also carry the total number of translation response messages corresponding to the address translation request. Based on this total, or on the total together with the last-message indication carried in the last response message, the translation request side can decide whether to reuse or reclaim the tag resources corresponding to the address translation request, which improves resource utilization.
  • In step S304, after the ATC in the first device has received all the translation response messages of the address translation request, it can determine from the information carried in the messages which of the address spaces requested for translation have obtained translation results, and perform the corresponding address accesses according to those results. The following uses a specific virtual address space and its translated physical address space to illustrate how the ATC in the first device accesses the physical address space of the virtual address space to be accessed according to the translation response messages.
  • Figure 3D is a schematic diagram of a physical address space translated from a virtual address space provided by the embodiment of the present application.
  • The TA has received an address translation request from the ATC that requests translation of a virtual address space of 32 STUs.
  • The TA starts to process the address translation request and can return a translation response message to the ATC after some time. Based on its own micro-architecture and design, the TA decides to return a first translation response message to the ATC. The untranslated address space indicated by the target location information in this first translation response message starts from the segment of the second STU and is 1 STU in size. Therefore, the Start STU value of the translation response message is 0, the value of the L field is 0, and the value of the length field is 2 (DWs); the 64-bit data carried in the message holds the translation result and the corresponding translation size.
  • The ATC receives the first translation response message returned by the TA and can immediately use the information in it to access the corresponding physical address space directly.
  • The TA then decides, based on its own micro-architecture and design, to return a second translation response message to the ATC. The untranslated address space corresponding to this translation response message is a space of 8 consecutive STUs starting from the segment of the fifth STU. Therefore, the Start STU value of the translation response message sent by the TA is 4, the value of the L field is 0, and the value of the length field is 4 (DWs). The two 64-bit data groups carried in the message represent two translation results: the untranslated address space corresponding to the first translation result is STU4 to STU7, and that corresponding to the second is STU8 to STU11.
  • The TA decides, based on its own micro-architecture and design, to return a third translation response message to the ATC. The untranslated address space corresponding to this translation response message is a contiguous space of one STU starting from the segment of the third STU. Therefore, the Start STU value of the translation response message sent by the TA is 2, the value of the L field is 0, and the value of the length field is 2 (DWs); the one 64-bit data group carried in the message represents one translation result.
  • The TA decides, based on its own micro-architecture and design, to end this request. The untranslated address space corresponding to the translation response message that the TA can return at this point is the address space of the last STU, and the remaining untranslated space is no longer processed or returned. Therefore, the Start STU value of the translation response message sent by the TA is 31, the value of the L field is 1, and the value of the length field is 2 (DWs); the one 64-bit data group carried in the message represents one translation result, and the value of Cpl_Num is 3 (indicating that this translation request has 4 translation response messages in total).
  • Translation response messages can arrive out of order, so the ATC may receive the last translation response message before the others. Suppose the ATC now receives the translation response message whose L field is 1 and finds that its Cpl_Num is 3, while it has received only two translation response messages so far; it then knows that two translation response messages are still in flight and must wait for them.
  • The ATC then receives the remaining two translation response messages of the translation request one after another, determines that the translation request has ended and that all of its translation response messages have been received, and releases or reuses the tag and other resources used by the request.
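  • The bookkeeping in this example can be sketched as follows. The class TagTracker and its methods are hypothetical, and the sketch assumes that Cpl_Num is read only from the response whose L field is 1 and that the total equals Cpl_Num plus 1, as in the example above. The replayed scenario is the one described: the final response arrives second, and two more responses follow.

```python
class TagTracker:
    """Requester-side bookkeeping for out-of-order translation responses."""

    def __init__(self):
        self.received = {}   # tag -> number of responses seen so far
        self.expected = {}   # tag -> total, known once the L=1 response arrives
        self.free_tags = []  # tags that have been released for reuse

    def on_response(self, tag, last, cpl_num):
        """Record one response; return True when the request is complete."""
        self.received[tag] = self.received.get(tag, 0) + 1
        if last:
            self.expected[tag] = cpl_num + 1  # field holds the count minus 1
        total = self.expected.get(tag)
        if total is not None and self.received[tag] == total:
            del self.received[tag], self.expected[tag]
            self.free_tags.append(tag)
            return True
        return False

    def outstanding(self, tag):
        """Responses still in flight, or None if the total is not known."""
        total = self.expected.get(tag)
        return None if total is None else total - self.received.get(tag, 0)


atc = TagTracker()
atc.on_response(tag=7, last=False, cpl_num=0)   # an earlier response
atc.on_response(tag=7, last=True, cpl_num=3)    # the L=1 response arrives early
still_waiting = atc.outstanding(7)
atc.on_response(tag=7, last=False, cpl_num=0)
finished = atc.on_response(tag=7, last=False, cpl_num=0)
```

  • Counting responses rather than relying on the L=1 message being the final arrival is what allows the tag to be released safely when responses are reordered on the bus.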
  • The TA can also produce translation results for the entire untranslated address space, and these translation results can be returned in batches and out of order in multiple translation response messages.
  • FIG. 3E is a schematic diagram of translation result responses covering the entire untranslated address space of a translation request, provided by an embodiment of the present application.
  • The ATC considers the translation request complete only after it has collected all the translation response messages, at which point resources such as the corresponding tag can be released or reused.
  • The embodiment of the present application provides a bus communication design, in particular the design of the address translation mechanism in bus communication, which redefines the address translation request and response mechanism to improve address translation efficiency and bus bandwidth utilization.
  • Each translation response message returned by the translation response side carries the identifier of its address translation request and adds target location information, which does not exist in prior-art translation response messages. The target location information is the location of the currently translated address segment within the address space to be translated, so that the translation request side can accurately determine from this information which segment of the address space to be translated any translation response message translates.
  • The translation response side can therefore feed back translation response messages in batches and out of order according to the actual translation progress (that is, whatever is translated first can be fed back first), and a batch can contain multiple translation response messages. In the prior art, because the translation response message carries no location indication, responses must strictly follow a fixed order (such as the order of contiguous virtual addresses) and only one response message can be fed back at a time, so an untranslated message delays the feedback of messages that are already translated. The embodiment of the present application avoids this problem, which effectively improves the efficiency and parallelism of translation responses and the utilization of bus bandwidth. Moreover, once the translation request side receives a translation response message, it can perform data read/write access operations according to the physical address in that message.
  • Because the embodiment of the present application adds to the translation response message the location information of the smallest translation units (Smallest Translation Unit, STU) currently translated, feedback of translation response messages is not constrained by order. Translation response messages can be fed back out of order, in parallel, or serially, so the untranslated virtual addresses corresponding to the translated physical addresses in any two translation response messages can be discontiguous. This avoids the inflexible design, longer translation time, low translation efficiency, and low bandwidth utilization caused in the prior art by order-preserving feedback that strictly follows a fixed order.
  • For the translation response side, such as the translation agent (Translation Agent, TA), the order-preserving requirement of bus routing is removed and translation response messages can be fed back out of order. The translation request side may be, for example, an address translation cache (Address Translation Cache, ATC).
  • The embodiment of the present application allows more than two translation response messages to be fed back for one address translation request (the prior art allows no more than two). These translation response messages can be transmitted out of order in bus routing, and different translation response messages can be sent in parallel or serially. This ensures a high completion rate for address translation requests (for example, total size of the address space covered by translation results / size of the untranslated address space requested by the address translation request) and further improves bus bandwidth utilization.
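  • As a small worked instance of the completion-rate ratio named above, the following applies it to the earlier example in which a request for 32 STUs receives four responses covering 1, 8, 1 and 1 STUs. The function name completion_rate is hypothetical, and sizes are counted in STUs for simplicity.

```python
from fractions import Fraction


def completion_rate(translated_stu_counts, requested_stus):
    """Translated address space size divided by the requested size, in STUs."""
    return Fraction(sum(translated_stu_counts), requested_stus)


# Four responses covering 1, 8, 1 and 1 STUs of a 32-STU request.
rate = completion_rate([1, 8, 1, 1], 32)
```

  • Here 11 of the 32 requested STUs are translated, which is the case where the TA ends the request early; a request answered in full would give a ratio of 1.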
  • The present application further designs the address invalidation mechanism in bus communication, specifically the invalidation request and the invalidation response.
  • The following describes the bus communication method provided by the present application in terms of how the second device 10 side issues an invalidation request for the invalidation units to be invalidated, and how the first device 103 side invalidates the corresponding units according to the invalidation request and feeds back invalidation response messages to the second device. After the above step S304 is performed, the following steps S305 to S307 may also be included:
  • Step S305 the second device sends an invalidation request to the first device.
  • The invalidation request includes a second identifier that identifies the invalidation request and the virtual addresses of multiple invalidation units to be invalidated, where the multiple invalidation units correspond to multiple discrete virtual address spaces to be invalidated.
  • The embodiment of the present application provides a bus communication design that covers not only the address translation mechanism but also the address invalidation mechanism in bus communication, redefining the address invalidation request mechanism to improve the performance of the invalidate command.
  • One invalidation request can be defined to carry invalidation information for multiple discontiguous address space segments, which improves the transmission efficiency of invalidation commands and thus the overall performance of the system in executing them. The bandwidth utilization of invalidation commands on the bus is also greatly improved.
  • Step S306 the first device invalidates the mapping relationship between the virtual address and the physical address of the plurality of invalidation units in response to the invalidation request.
  • The invalidation response side invalidates the virtual-to-physical address mappings of the (possibly multiple, discontiguous) address space segments carried in the invalidation request, for example by deleting the mappings locally. In one possible implementation, invalidation means that the first device side deletes the virtual-to-physical address mapping that it has stored locally. The mapping can also be invalidated in other specific ways, which is not specifically limited in this embodiment of the present application.
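  • A minimal sketch of steps S305 and S306 follows, under the implementation in which invalidation deletes the locally stored mapping. The dictionary used as the ATC's local store, the 4096-byte mapping granularity, and the function name invalidate are assumptions for illustration. Each invalidation unit is given as a base virtual address and a size, and units are numbered by their position in the request.

```python
PAGE = 4096  # assumed granularity of the locally cached mappings


def invalidate(atc_cache, units):
    """Delete every cached VA -> PA mapping inside any unit.

    Returns the numbers of the units that were processed, in order.
    """
    done = []
    for number, (base, size) in enumerate(units):
        stale = [va for va in atc_cache if base <= va < base + size]
        for va in stale:
            del atc_cache[va]
        done.append(number)
    return done


# Hypothetical local store of the first device: virtual address -> physical address.
cache = {0x1000: 0xA000, 0x2000: 0xB000, 0x8000: 0xC000, 0x9000: 0xD000}

# One invalidation request carrying two non-adjacent address space segments.
units = [(0x1000, PAGE), (0x8000, 2 * PAGE)]
done = invalidate(cache, units)
```

  • After the call, only the mapping for 0x2000 remains, since it lies in neither segment; a single request has thus replaced what would otherwise be two separate invalidation commands.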
  • Step S307 the first device generates one or more invalidation response messages.
  • Each invalidation response message includes the second identifier and an invalidation unit number, where the invalidation unit number is the number of the invalidation unit, among the multiple invalidation units, whose invalidation has currently been completed.
  • The second device can determine the specific virtual address of the invalidation unit invalidated by the current invalidation response message from the second identifier and the corresponding invalidation unit number. More specifically, the second device uses the second identifier to determine the corresponding invalidation request and hence the virtual addresses of the multiple invalidation units of that request, and then uses the invalidation unit number to select the invalidation unit concerned.
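  • The lookup described above can be sketched as a two-step table access. The table named outstanding, the tag value 12, and the function resolve are hypothetical; units are again given as a base virtual address and a size, numbered by position.

```python
# Hypothetical requester-side table: second identifier -> invalidation units
# of the outstanding invalidation request, each as (base VA, size in bytes).
outstanding = {12: [(0x1000, 4096), (0x8000, 8192)]}


def resolve(tag, unit_number):
    """Half-open virtual address range [start, end) that a response covers."""
    base, size = outstanding[tag][unit_number]  # request first, then unit
    return base, base + size


segment = resolve(12, 1)
```

  • A response carrying identifier 12 and unit number 1 therefore tells the second device that the segment starting at 0x8000 has been invalidated, without the response having to repeat the address itself.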
  • The embodiment of the present application provides a bus communication design that covers not only the address translation mechanism but also the address invalidation mechanism in bus communication, redefining the address invalidation request and response mechanism to improve the performance of invalidation commands.
  • One invalidation request can carry invalidation information for multiple discontiguous address space segments, and the invalidation completion status of those segments can be returned to the invalidation request side (such as the TA side) in batches in multiple invalidation response messages. This improves the transmission efficiency of invalidation commands and the parallelism of invalidation execution, and thus the overall performance of the system in executing invalidation commands.
  • The bandwidth utilization of invalidation commands on the bus is greatly improved, and because invalidation response messages can return the invalidation completion status to the invalidation request side (such as the TA side) in batches, the added completion latency that would result from aggregating multiple invalidation response messages is avoided.
  • At least one of the one or more invalidation response messages further includes the total number of invalidation response messages corresponding to the invalidation request.
  • The method further includes: the second device determines, according to the number of invalidation response messages received and the total number of invalidation response messages, whether all invalidation response messages corresponding to the invalidation request have been received; if so, the tag resource corresponding to the invalidation request is released or reused.
  • The invalidation response message fed back from the invalidation response side (that is, the first device, such as the ATC side) to the invalidation request side (that is, the second device, such as the TA side) may also carry the total number of invalidation response messages corresponding to the invalidation request, so that the invalidation request side can decide from this information whether to reuse or reclaim the tag resources of the invalidation request, which improves resource utilization.
  • The embodiment of the present application defines a new invalidation request, so that a single invalidation request message can carry commands to invalidate the translations of multiple discontiguous address space segments. This addresses shortcoming (4) identified in the above analysis of the prior art.
  • In step S305, when the TA in the second device needs to invalidate the mapping between some virtual addresses and physical addresses, it can send a request to the ATC in the first device to invalidate the relevant mappings stored in the first device.
  • The following illustrates how the ATC in the first device invalidates the invalidation units according to the invalidation request sent by the TA in the second device.
  • An invalidation request message sent by the TA includes: the number of invalidation commands (invalidation units) used to invalidate address space, and, for each invalidation command, the untranslated base address and the size of the address space to be invalidated.
  • FIG. 4A is a schematic diagram of a data structure of an invalidation request message provided by an embodiment of the present application. As shown in FIG. 4A, the fields marked XX are the same as in the prior art. The value of the message's length field indicates how many invalidation units there are and must be an even number. Each invalidation unit represents the invalidation of a small segment of address space.
  • each invalidation unit has an S field.
  • FIG. 4B is a schematic diagram representing the space size of the translation result provided by the embodiment of the present application.
  • the ATC After receiving the invalidation request message, the ATC starts to invalidate corresponding address space segments according to the invalidation unit carried in the message.
  • the device can flexibly return different invalidation response messages for different invalidation units to TA in batches according to the actual business scenario and its own design.
  • the invalidation response message returned by the ATC needs to include: information that uniquely identifies which invalidation unit of the invalidation request the response message corresponds to, and the total number of invalidation response messages for that invalidation unit of the invalidation request.
  • the ATC can also return the same invalidation completion message for all the invalidation units, in which case this needs to be indicated in the message header.
  • FIG. 4C is a schematic diagram of the format of an invalidation response message provided by the embodiment of the present application, and the fields marked with XX are the same as those in the prior art.
  • the CC field indicates how many invalidation response messages there are in total for all invalidation units, or for a single invalidation unit, of the invalidation request corresponding to the current invalidation response. A value of 0 means that the number of corresponding response messages is the maximum, 256. When the All field is 1, all invalidation units of the invalidation request corresponding to the current invalidation response message share the same invalidation response message, and the CC field then indicates the total number of invalidation response messages for all invalidation units of that invalidation request. InvUnit indicates which invalidation unit of the invalidation request the current invalidation response message responds to; this field is valid only when the All field is 0.
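  • The interpretation of the CC, All and InvUnit fields described above can be sketched as follows. This is only an illustrative sketch: the class and function names are hypothetical, and the 8-bit width of CC is an assumption inferred from the statement that the value 0 encodes 256; the actual bit layout is the one defined in FIG. 4C.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InvalidationResponse:
    """Hypothetical container for the three fields discussed in the text."""
    cc: int        # raw CC field; assumed 8 bits wide, 0 encodes the maximum (256)
    all_flag: int  # All field: 1 means one response is shared by all invalidation units
    inv_unit: int  # InvUnit field: meaningful only when all_flag == 0


def expected_response_count(resp: InvalidationResponse) -> int:
    """Number of invalidation responses announced by the CC field."""
    if not 0 <= resp.cc <= 255:
        raise ValueError("CC is assumed to be an 8-bit field")
    return 256 if resp.cc == 0 else resp.cc


def responding_unit(resp: InvalidationResponse) -> Optional[int]:
    """Invalidation unit this response answers, or None when All == 1."""
    return None if resp.all_flag == 1 else resp.inv_unit
```

  • With this reading, a response with All set to 1 reports a count that covers every invalidation unit of the request, while a response with All set to 0 reports a count for the single unit named by InvUnit.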
  • After receiving the corresponding invalidation completion response message, the TA can accurately determine which invalidation unit of the invalidation request the current invalidation response message corresponds to, and therefore which address segment has been invalidated.
  • this application further designs the page table establishment mechanism in bus communication, specifically including the page table request and page table response in the page table establishment mechanism.
  • the following describes the bus communication method provided by the present application in terms of how the first device 10 side performs a page table request for the currently missing page, and how the first device 103 side establishes the corresponding page table according to the page table request and feeds back the page table establishment result to the second device. Therefore, before or after the execution of the above step S304, the following steps S308-S311 may also be included:
  • Step S308 the first device initiates a page table request to the second device.
  • the page table request includes the page table request code of the page table request within the page table request group (PRG) to which it belongs, and the virtual addresses of multiple page table application units to be applied for, wherein the multiple page table application units correspond to multiple discontinuous page table spaces.
  • the embodiment of the present application provides a bus communication design scheme. In addition to the design of the address translation mechanism in bus communication, it further designs the page table request mechanism in bus communication, including redefining the page table request mechanism to improve the performance of the page table request command. Specifically, it is defined that a page table request (such as a command/message) can carry information of multiple page table application units, and correspondingly can apply to the system to create multiple discontinuous page table spaces.
  • Because a page table request message transmitted over the bus can carry information of multiple page table application units, the overhead of page table request commands can be reduced, and both the transmission efficiency of page table request commands and the bandwidth utilization of the bus can be improved.
  • the second device may not find the mapping relationship between the virtual address and the physical address of the address space segment that needs to be translated, or the translation may fail for other reasons, in which case the address translation response message received by the first device may correspond to only a part of the virtual address space to be translated.
  • the first device can issue the page table request in the above step S308, that is, issue the page table request carrying information such as the virtual address again to the second device, so that the second device returns one or more page table response messages to the first device, so that the first device can directly obtain the translation result or initiate an address translation request again to obtain the translation result.
  • the page table request further includes one or more of page table space size, read and write permissions, desired memory attributes, or usage prompt information corresponding to each page table application unit.
  • the expected memory attribute refers to the memory attribute that the page table request side expects to have in the page table space corresponding to the page table application unit requested to be established.
  • For the specific content of the memory attribute, refer to the description in Table 1 of the memory attributes of the address space to be translated; details are not repeated here.
  • the page table request may also carry the expectations of the page table request side (such as the ATC side) for the page table space applied for, for example, the size of the page table space expected to be established, and the memory model attributes and system attributes of the page table space expected to be established.
  • For the page table request side (such as the ATC side), this provides an interface for conveying more of the user's expectations to the page table response side (such as the TA side or the system side) when applying for a page table, so that the TA can create a page table for the corresponding ATC based on more accurate user information, and the ATC can optimize its DMA behavior according to the memory model attributes of the corresponding space, which expands the design space of the ATC.
  • From the point of view of the page table response side (such as the TA side), the TA has the opportunity to adjust its page table establishment strategy and the attributes of the established page table according to this expected information. This greatly expands the design space of the TA/system and allows the page table created by the TA/system for the ATC to better match the ATC's expectations, which creates an opportunity to improve overall page table management efficiency and overall system performance. For example, according to the size and usage of the page table application unit that the ATC wants to apply for, the TA/system can determine the size (granularity) of the correspondingly established page table, whether the correspondingly established page table can be loaded into the cache in advance, and other behaviors.
  • In step S308, after requesting address translation, the ATC in the first device finds that the requests for some virtual addresses have failed, and the information for these virtual addresses needs to be re-established.
  • the following exemplifies how the ATC in the first device in this application specifically initiates a page table request.
  • ATC initiates a page table request, and the request message/transaction should include the address of the page table applied for, the size of the corresponding page table space, the read and write permissions of the corresponding page table space, the memory attributes that the ATC expects for the page table space, the ATC's prompt information on the use of this space, and so on.
  • a page table request message/transaction can contain multiple page tables to be applied for (page table application units), as well as the code of the current page table request message/transaction among all page table requests under the current PRG, so that even if these page table requests are transmitted out of order, the TA can accurately know how many page table requests there are under the current PRG (this number equals the Page Num field value of the page table request whose L field is 1, plus 1). Each page table to be applied for contains the above information.
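  • The rule that the number of page table requests under a PRG equals the Page Num value of the request whose L field is 1, plus 1, can be illustrated with a small sketch. The function names and the tuple representation are hypothetical, and the sketch assumes that request codes under a PRG start at 0 and are consecutive, which is what the "plus 1" rule suggests but is not stated explicitly.

```python
from typing import List, Optional, Tuple

# Each received page table request is represented as (page_num, l_field).
Request = Tuple[int, int]


def prg_request_total(requests: List[Request]) -> Optional[int]:
    """Total number of requests under the PRG, once the L == 1 request is seen."""
    for page_num, l_field in requests:
        if l_field == 1:
            return page_num + 1
    return None  # the last request has not arrived yet


def prg_all_arrived(requests: List[Request]) -> bool:
    """True when every request code 0..total-1 has been received, in any order."""
    total = prg_request_total(requests)
    if total is None:
        return False
    return {page_num for page_num, _ in requests} == set(range(total))
```

  • Because the total is derived from the request flagged with L, the receiver does not depend on arrival order: the flagged request may arrive first, and the remaining codes can still be checked off as they come in.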
  • FIG. 5A is a schematic diagram of the format of a page table response message with page table information provided by the embodiment of the present application, wherein the XXXX field indicates the same meaning as the prior art.
  • the length field of the message indicates how many sets of page tables that need to be applied are carried in the current page table request message, which is called a page table application unit, and each 2DWs (64 bits) is a set of page table application units.
  • the SSV field indicates whether the current page table request message contains PASID.
  • the bit width of the Page Num field of the page table request message header is 9 bits, which represents the coding of the current page table request in all page table requests under the PRG.
  • Exe and Pri are the same as Exe and Priv in the prior art, respectively indicating whether the requested page table is an executable space and requires special privileges.
  • the Hint field indicates the most likely way for ATC to use the page table space indicated by the current page table application unit, and Table 2 shows one of the possible encoding ways.
  • the memory attribute (MemAttr) field indicates what the memory attribute of the page table space indicated by the current page table application unit is expected by the ATC, and the specific encoding is the same as that shown in table 1.
  • the S field indicates whether the size of the page table space that the current page table application unit wants to apply for is greater than 4096 bytes. Together with the page table address field, it will indicate the size of the page table space that the current page table application unit wants to apply for.
  • Figure 5B is a schematic diagram showing the size of the page table space provided by the embodiment of the present application.
  • Untranslated Address[12] indicates whether the size of the currently invalidated untranslated address space is greater than 8192 bytes: if bit [12] is set to 1, the size of the space invalidated by the current invalidation unit is greater than 8192 bytes; otherwise it is equal to 8192 bytes, and so on.
  • Step S309: the second device generates one or more page table response messages in response to the page table request, wherein each of the page table response messages includes the page table request code and the mapping relationship between the virtual address and the physical address of the currently established page table application unit, or each of the page table response messages includes the page table request code; the second device feeds back the one or more page table response messages to the first device. That is to say, in some possible embodiments, the page table response message contains both the page table request code and the established address mapping relationship, and in other possible embodiments, the page table response message includes only the page table request code and does not include the address mapping relationship.
  • When the page table response message carries the address mapping relationship, the first device can directly obtain the address mapping relationship and directly initiate an access operation to the corresponding physical address; when the page table response message does not carry the address mapping relationship, the first device may not be able to obtain the corresponding address mapping relationship this time, and needs to initiate an address translation request to the second device again, so as to obtain the translation result of the corresponding virtual address.
  • a bus communication design scheme is provided.
  • the page table request and response mechanism in the bus communication is further designed, including redefining the Page table request and response mechanism to improve the performance of page table request and response commands.
  • a page table request (such as a command/message) can carry the information of multiple page table application units and, correspondingly, can apply to the system to establish multiple discontinuous page table spaces. It is further defined that the same PRG can correspond to multiple page table response messages, and that the page table response side (such as the TA side) can feed back the page table response messages to the page table request side (such as the ATC side) in batches and without any ordering requirement.
  • these page table response messages can directly carry the page table information that has been successfully established, so that after a page fault occurs on the first address translation request of the ATC, the address mapping result can be obtained once the page table request has been sent; compared with the prior art, the sending and feedback of the second address translation request are eliminated.
  • the embodiment of the present application can reduce the steps needed to obtain the required address mapping result after a page fault occurs in the first page table request in the existing ATS technology, thereby improving the memory access performance of the system. It can be understood that when each page table response message only includes the page table request code but does not include the address mapping relationship, then the ATC needs to send the corresponding address translation request after sending the page table request, to further request untranslated virtual addresses.
  • the page table response message in the embodiment of this application supports two implementations: in one, only a page table request needs to be initiated after a page fault; in the other, an address translation request also needs to be initiated after the page table request that follows a page fault. That is, both page table response methods are supported at the same time.
  • each page table response message also includes one or more of the base address of the page table space corresponding to the established page table application unit, the size of the page table space, the memory model attribute of the page table space, the read/write permissions, and system attributes. It can be understood that, in the embodiment of the present application, the above one or more types of information are carried in the page table response message in addition to the mapping relationship between the virtual address and the physical address of the established page table application unit that is carried in the page table response message fed back by the second device in step S309 above.
  • Because the page table response message also carries one or more of the base address of the page table space corresponding to the currently established page table application unit, the size of the page table space, the memory model attribute of the page table space, read and write permissions, and system attributes, the page table response side (such as the TA/system side) can return the attributes of the established page table, and the page table request side (such as the ATC side) can obtain more page table attributes. The page table request side ATC can then optimize the way it uses this page table space according to the memory model attribute of the corresponding page table space, for example by optimizing its subsequent DMA behavior, thereby improving the overall performance of the system and expanding the design space of the ATC.
  • At least one of the one or more page table response messages further includes the total number of page table response messages corresponding to the page table request group PRG
  • the method further includes: the second device judges, according to the total number of page table response messages and the number of received page table response messages, whether the page table response messages corresponding to the page table request group PRG have all been received; if they have all been received, the tag resource corresponding to the page table request group PRG is released or reused.
  • the page table response message fed back from the page table response side (that is, the second device, such as the TA side) to the page table request side (that is, the first device, such as the ATC side) may also carry the associated page
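  • The completion check described above, in which the number of received page table response messages is compared with the announced total before the tag resource of the PRG is released, can be sketched as follows. The class name, the method name and the dictionary-based bookkeeping are hypothetical illustrations and are not taken from the application.

```python
from typing import Dict, Optional


class ResponseTracker:
    """Counts responses per tag (for example per PRG) until the total is reached."""

    def __init__(self) -> None:
        self.received: Dict[int, int] = {}  # tag -> responses seen so far
        self.total: Dict[int, int] = {}     # tag -> announced total, once known

    def on_response(self, tag: int, total: Optional[int] = None) -> bool:
        """Record one response; return True when the tag can be released."""
        self.received[tag] = self.received.get(tag, 0) + 1
        if total is not None:
            # At least one response carries the total; it may arrive at any time.
            self.total[tag] = total
        done = self.total.get(tag) == self.received[tag]
        if done:
            # All responses are in: forget the tag so that it can be reused.
            del self.received[tag]
            del self.total[tag]
        return done
```

  • Since only some responses need to carry the total, the tracker keeps counting until a response announces it; the tag is released only when both the total is known and the count matches.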
  • step S309 after requesting address translation, the ATC in the first device finds that some virtual address requests fail, and needs to re-establish these virtual address information.
  • the following exemplifies how the TA in the second device in this application establishes the relevant page table according to the page table request initiated by the ATC in the first device.
  • TA receives the page table request, and starts to build the page table according to the information shown in the page table request message.
  • TA may not satisfy all of the page table space and corresponding memory attributes proposed by the ATC. For the TA, these can be treated as hints only, and the TA (system software) can decide according to its own design and strategy whether to fully meet the ATC's expectations.
  • TA can return multiple page table response messages with page table information to the ATC in batches according to its specific design and micro-architecture, or, as in the existing PCIe ATS technology, it can return to the ATC only a page table response message without data (page table information).
  • this embodiment of the present application defines a page table response message that can carry page table information (with data), and a mechanism whereby multiple page table response messages can correspond to multiple page table requests with the same PRG number.
  • the format of the last page table response message needs to be defined so that it correctly indicates how many page table response messages there are for the page table requests under the current PRG, and so that it can be correctly determined which page table request under this PRG the successfully established page table information carried in each page table response message corresponds to.
  • the page table information carried in the page table response message may include the base address and size of the corresponding page table, the memory attributes of the physical address space corresponding to the page table, the read and write permissions of the physical address space corresponding to the page table, whether the physical address space corresponding to the page table is privileged, whether the physical address space corresponding to the page table is executable, and so on.
  • the number of established page table information groups carried in a page table response message with page table information may be less than the number of page table application units requested by the page table request corresponding to the response. When the number of page table information groups carried by the received page table response message is less than the number of page table application units of the corresponding page table request, the ATC should not assume that the page table application units for which no page table information was returned were not established successfully.
  • FIG. 5C is a schematic diagram of another page table response message format with page table information provided by the embodiment of the present application.
  • the page table response message with page table information is a page table response message with data, and the successfully established page table information is carried in the message in the form of data. Only successfully established page tables are returned to the ATC through the page table response message with data.
  • the L field of the page table response message indicates whether the current page table response message is the last page table response message corresponding to the page table requests under the PRG; Page Num should accurately indicate which page table request under this PRG the current page table response message corresponds to.
  • the RC field value of the response message must be able to accurately indicate the number of all page table response messages under this PRG.
  • each page table information group (2 DWs in total) includes: the physical base address corresponding to the currently established page table, the memory attribute (MemAttr) of the physical address space corresponding to the currently established page table, the read and write permissions (R and W fields), whether it is privileged (Pri), and so on. The specific meanings are shown in Table 3; the encoding of MemAttr shown there is only one possibility, and it can be interpreted and defined according to the specific system in which the TA is located.
  • FIG. 5D is another schematic diagram showing the size of the page table space provided by the embodiment of the present application.
  • Each page table information group also carries the size of the physical address space corresponding to the currently established page table.
  • This size indication is similar to the size indication of the translation result: the S field indicates whether the size of the physical address space of the currently established page table is greater than 4096 bytes; if the S field is 1, bit [12] of the page table address indicates whether the size of the physical address space of the currently established page table is greater than 8192 bytes, and so on.
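  • The size indication described above can be illustrated by a short decoding sketch. The function name is hypothetical, and the way "and so on" is continued for higher bits (each further bit that is 1 doubles the size, and the first 0 bit terminates the encoding) is an assumption modelled on the size encoding used in PCIe ATS; the authoritative definition is the one shown in FIG. 5B and FIG. 5D.

```python
def page_table_space_size(s_field: int, address: int) -> int:
    """Decode the space size in bytes from the S field and the address bits.

    S == 0                  -> 4096 bytes
    S == 1, bit[12] == 0    -> 8192 bytes
    S == 1, bit[12] == 1, bit[13] == 0 -> 16384 bytes, and so on (assumed).
    """
    if s_field == 0:
        return 4096
    bit = 12
    # Walk upward over consecutive 1 bits; the first 0 bit ends the encoding.
    while (address >> bit) & 1:
        bit += 1
    return 1 << (bit + 1)
```

  • Under this assumed continuation, the bits below the terminating 0 bit carry no address information, because a region of that size is aligned to its own size.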
  • Step S310 the second device feeds back the one or more page table response messages to the first device.
  • after the second device has created a page table response message corresponding to a page table application unit, it can feed the message back to the first device; the messages can be fed back in multiple batches, in parallel or serially.
  • Step S311: the first device receives the one or more page table response messages fed back by the second device; for each page table response message, the first device initiates an access operation to the corresponding physical address according to the page table request code and the mapping relationship between the virtual address and the physical address of the currently established page table application unit; or, for each page table response message, the first device determines, according to the page table request code, the virtual address corresponding to the corresponding page table application unit, and initiates to the second device a translation request for the virtual address corresponding to the page table application unit.
  • When the page table response message carries the page table information that has been successfully established, then after a page fault occurs on the first address translation request of the ATC on the page table request side, the ATC can, after sending the page table request, directly obtain the address mapping result from the page table response message and perform subsequent access operations on the corresponding physical address according to the address mapping result.
  • Compared with the prior art, the sending and feedback of the second address translation request are eliminated, which reduces the steps required in the existing address translation service (ATS) technology to obtain the required address mapping result after a page fault occurs during the first page table request, thereby improving the memory access performance of the system.
  • When the page table response message does not carry the page table information that has been successfully established, the ATC on the page table request side needs to initiate an address translation request for the virtual address for which no page table information was returned.
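  • The two alternatives in step S311 can be sketched as a simple dispatch on whether a page table response message carries a mapping. All names here are hypothetical, and the response is modelled as an in-memory object rather than the message format of FIG. 5C.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class PageTableResponse:
    """Hypothetical model of one page table response message."""
    request_code: int                          # page table request code
    mapping: Optional[Tuple[int, int]] = None  # (virtual, physical) if carried


def handle_response(resp: PageTableResponse,
                    requested_va: Dict[int, int]) -> Tuple[str, int]:
    """Decide the next action of the first device for one response.

    requested_va maps a page table request code to the virtual address that
    was requested under that code (bookkeeping kept by the requester).
    """
    if resp.mapping is not None:
        # The mapping is carried in the response: access the physical address.
        _virtual, physical = resp.mapping
        return ("access", physical)
    # No mapping carried: recover the virtual address from the request code
    # and issue a new address translation request for it.
    return ("translate", requested_va[resp.request_code])
```

  • A response without a mapping is not treated as a failure here; it only means that the requester has to ask for the translation separately.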
  • the ATC in the first device may receive one or more page table response messages fed back by the TA in the second device.
  • the following exemplifies how the ATC in the first device in this application specifically receives the page table response message fed back by the TA in the second device.
  • ATC may receive, in batches, multiple page table response messages belonging to page table requests under the same PRG. If the corresponding page table response messages carry page table information, the ATC is allowed to use the page table physical addresses and attributes carried in these page table response messages to directly issue physical addresses and access the corresponding physical address space. The ATC cannot assume that all page table requests under its PRG will obtain page table response messages with data, nor can it assume that the page tables for requests that did not obtain a page table response message with data were not established successfully.
  • FIG. 5E is a schematic diagram of a scene where the ATC receives a page table response message in the embodiment of the present application.
  • the embodiment of this application redefines the page table request so that it can carry the untranslated address and size of the page table requested to be established, as well as hints such as expectations for certain attributes of the page table space applied for, or the requester's likely use of these page tables. This allows the TA to optimize its own design space, and also addresses shortcoming (5) in the above analysis of the prior art.
  • the embodiment of the present application also redefines the page table response message mechanism: the limitation of the existing mechanism that the page table requests under the same PRG can have only one page table response message is removed, so that the page table requests under the same PRG can have multiple page table response messages, and a mechanism is defined that allows the requester (ATC) to accurately identify whether the page table requests under the corresponding PRG have been completed (that is, whether all corresponding page table response messages have been received).
  • this application also defines the page table response message mechanism, so that the page table response message can directly return the page table mapping result of the page table space corresponding to the request.
  • FIG. 6 is a schematic structural diagram of a bus system provided by an embodiment of the present application.
  • the bus system 60 may include a plurality of devices interconnected through a bus, and the plurality of devices interconnected through a bus include at least a first device 601 and a second device 602; wherein, detailed functions of the first device 601 and the second device 602 are described as follows.
  • the first device 601 is configured to send an address translation request to the second device 602, where the address translation request includes a first identifier for identifying the address translation request and a virtual address of the virtual address space to be translated, and the virtual address space to be translated includes one or more minimum translation units STU;
  • the second device 602 is configured to generate one or more translation response messages in response to the address translation request, where each translation response message in the one or more translation response messages includes the first identifier, target location information and a target physical address; the target location information is the location information of one or more target STUs in the virtual address space to be translated, and the target physical address is a physical address translated from the virtual address of the one or more target STUs;
  • the second device 602 is configured to feed back the one or more translation response messages to the first device 601;
  • the first device 601 is configured to, for each translation response message, initiate a read and write access operation to the target physical address according to the first identifier and the target location information.
  • the target location information includes a base address of a first target STU among the one or more target STUs, and a size of an address space of the one or more target STUs.
  • the first device 601 is specifically configured to: determine the virtual addresses of the one or more target STUs according to the first identifier and the target location information; and, for the read and write access operations corresponding to the virtual addresses of the one or more target STUs, initiate the read and write access operations to the target physical address.
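  • How target location information (the base address of the first target STU plus the size of the address space of the target STUs) identifies individual STUs can be sketched as follows. The function names are hypothetical, the 4096-byte STU size is only an example default, and the sketch assumes that the translated physical range is contiguous and starts at the target physical address, which the text does not state. Matching the response to its request through the first identifier is omitted.

```python
from typing import List


def target_stu_addresses(base: int, size: int, stu_size: int = 4096) -> List[int]:
    """Virtual base address of every target STU covered by one response."""
    if size <= 0 or size % stu_size:
        raise ValueError("size must be a positive multiple of the STU size")
    return [base + i * stu_size for i in range(size // stu_size)]


def to_physical(virtual: int, base: int, size: int, target_physical: int) -> int:
    """Physical address for a virtual address inside the translated range.

    Assumes the physical range is contiguous, starting at target_physical.
    """
    if not base <= virtual < base + size:
        raise ValueError("virtual address is outside the translated range")
    return target_physical + (virtual - base)
```

  • A pending read or write whose virtual address falls inside the range covered by a response can then be issued to the physical address computed this way, while accesses outside the range wait for further responses.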
  • the address translation request further includes usage information of the virtual address space, and the usage information includes one or more of usage frequency, access frequency, or regional expectations of the virtual address space.
  • each translation response message further includes one or more of the memory attributes of the one or more target STUs, first indication information, or translation extension information; wherein the memory attribute includes whether the target STU can be cached or the size of the access delay corresponding to the target STU, and the first indication information is used to indicate whether the current translation response message is the last response message of the one or more translation response messages.
  • At least one translation response message among the one or more translation response messages further includes the total number of translation response messages corresponding to the address translation request; the second device 602 is also used to determine, according to the number of translation response messages received and the total number of translation response messages, whether the translation response messages corresponding to the address translation request have all been received; if reception is complete, the tag resource corresponding to the address translation request is released or reused.
  • the second device 602 is further configured to send an invalidation request to the first device 601, where the invalidation request includes a second identifier for identifying the invalidation request and the virtual addresses of multiple invalidation units to be invalidated, wherein the multiple invalidation units correspond to multiple discrete virtual address spaces to be invalidated; the first device 601 is further configured to, in response to the invalidation request, invalidate the mapping relationship between the virtual address and the physical address of each of the multiple invalidation units.
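  • The handling of one invalidation request that carries several discontinuous invalidation units can be sketched as follows. The translation cache is modelled as a plain dictionary from page-aligned virtual addresses to physical addresses, and the function name and the (base, size) tuple form of an invalidation unit are hypothetical simplifications.

```python
from typing import Dict, List, Tuple


def invalidate_units(cache: Dict[int, int],
                     units: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Invalidate every cached mapping that falls inside any invalidation unit.

    cache maps a virtual address to its cached physical address.
    units is a list of (base, size) ranges carried by one invalidation request.
    Returns (unit_number, entries_removed) for each unit, in request order.
    """
    report = []
    for unit_number, (base, size) in enumerate(units):
        hits = [va for va in cache if base <= va < base + size]
        for va in hits:
            del cache[va]  # drop the virtual-to-physical mapping
        report.append((unit_number, len(hits)))
    return report
```

  • The per-unit numbering in the returned report mirrors the invalidation unit number that an invalidation response message carries, so a response can be produced per unit or once for the whole request.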
  • the first device 601 is further configured to generate one or more invalidation response messages, where each of the invalidation response messages includes the second identifier and the invalidation unit number, the invalidation unit number is the number of the currently invalidated invalidation unit in the plurality of invalidation units.
  • At least one invalidation response message among the one or more invalidation response messages further includes the total number of invalidation response messages corresponding to the invalidation request;
  • the second device 602 is further configured to judge whether the invalidation response message corresponding to the invalidation request has been received according to the number of the invalidation response message received and the total number of the invalidation response message ; If the reception is completed, release or reuse the tag resource corresponding to the invalidation request.
  • The first device 601 is further configured to initiate a page table request to the second device 602, where the page table request includes the page table request code of the page table request within the page table request group PRG to which it belongs.
  • The page table request further includes one or more of the following for each page table application unit in the plurality of page table application units: page table space size, read/write permissions, expected memory attributes, or usage hint information.
  • The second device 602 is further configured to generate one or more page table response messages in response to the page table request, where each page table response message includes the page table request code and the mapping between the virtual address and the physical address of the currently established page table application unit, or each page table response message includes the page table request code; the second device 602 is further configured to feed back the one or more page table response messages to the first device.
  • Each page table response message also includes one or more of the following: the base address of the page table space corresponding to the established page table application unit, the size of the page table space, the memory model attribute of the page table space, the read/write permissions, and system properties.
  • At least one of the one or more page table response messages further includes the total number of page table response messages corresponding to the page table request group PRG number; the second device 602 is further configured to determine, according to the total number of page table response messages and the number of page table response messages received, whether all page table response messages corresponding to the page table request group PRG have been received; if reception is complete, the tag resource corresponding to the page table request group PRG is released or reused.
  • The first device 601 is further configured to receive the one or more page table response messages fed back by the second device 602 and, for each page table response message, either to initiate an access operation to the corresponding physical address according to the page table request code and the mapping between the virtual address and the physical address of the currently established page table application unit, or to determine, according to the page table request code, the virtual address of the corresponding page table application unit and initiate to the second device a translation request for that virtual address.
  • For each device in the bus system 60 described in the embodiments of the present application, refer to the relevant descriptions in the method embodiments of FIGS. 1-5E above; details are not repeated here.
  • The embodiments of the present application also provide a computer storage medium that can store a program; when executed, the program performs some or all of the steps described in any one of the above method embodiments.
  • The embodiments of the present application also provide a computer program that includes instructions; when the computer program is executed by a computer, the computer can perform some or all of the steps described in any one of the above method embodiments.
  • The disclosed device can be implemented in other ways.
  • The device embodiments described above are only illustrative.
  • The division of the above units is only a division by logical function.
  • There may be other division methods: for example, multiple units or components can be combined or integrated into another system, or some features may be ignored or not implemented.
  • The mutual coupling, direct coupling, or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical or other forms.
  • the units described above as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.
  • If the above integrated units are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • The technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like, and specifically a processor in the computer device) to execute all or part of the steps of the methods in the embodiments of the present application.
  • The aforementioned storage medium may include a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, read-only memory (ROM), random access memory (RAM), and the like.
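
The completion check described in the bullets above (count the responses received for a request, compare against the total reported by at least one response, then release or reuse the tag resource) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation disclosed in the application: the names `TagTracker`, `PendingRequest`, `issue`, and `on_response` are hypothetical, and the sketch assumes that responses may arrive before the one that carries the total.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PendingRequest:
    """Bookkeeping for one outstanding request (hypothetical structure)."""
    received: int = 0            # number of response messages seen so far
    total: Optional[int] = None  # total count, once some response reports it


@dataclass
class TagTracker:
    """Tracks outstanding tags and frees each one when its responses are complete."""
    pending: Dict[int, PendingRequest] = field(default_factory=dict)

    def issue(self, tag: int) -> None:
        """Reserve a tag for a new request; a tag still in use cannot be reissued."""
        if tag in self.pending:
            raise ValueError(f"tag {tag} is still in use")
        self.pending[tag] = PendingRequest()

    def on_response(self, tag: int, total: Optional[int] = None) -> bool:
        """Record one response message; return True if the tag was released."""
        req = self.pending[tag]
        req.received += 1
        if total is not None:
            # At least one response is assumed to carry the total count.
            req.total = total
        if req.total is not None and req.received == req.total:
            del self.pending[tag]  # tag resource can now be released or reused
            return True
        return False
```

For example, after `tracker.issue(7)`, three calls to `tracker.on_response(7, ...)` of which any one passes `total=3` release tag 7 on the third call, after which the same tag can be issued again. Keeping only a counter and an optional total per tag means the requester does not need to know in advance how many responses a request will produce.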

Abstract

Disclosed are a bus communication method and a related device. The bus communication method can be applied to a bus system; the bus system includes a plurality of devices interconnected by a bus, and the devices include at least a first device and a second device. The method may include the following steps: the first device sends an address translation request to the second device, where the address translation request includes a first identifier and a to-be-translated virtual address of a virtual address space, and the virtual address space includes one or more STUs; the second device generates one or more translation response messages in response to the address translation request, where each translation response message includes the first identifier, target location information, and a target physical address; the second device returns the one or more translation response messages to the first device; and, for each translation response message, the first device initiates a read/write access operation to the target physical address according to the first identifier and the target location information. The present application can improve the performance of the bus system and improve bus utilization.
PCT/CN2022/107397 2021-09-14 2022-07-22 Bus communication method and related device WO2023040464A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111076791.2 2021-09-14
CN202111076791.2A CN115811509A (zh) 2021-09-14 2021-09-14 一种总线通信方法及相关设备

Publications (1)

Publication Number Publication Date
WO2023040464A1 (en) 2023-03-23

Family

ID=85481572

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/107397 WO2023040464A1 (en) 2021-09-14 2022-07-22 Bus communication method and related device

Country Status (2)

Country Link
CN (1) CN115811509A (en)
WO (1) WO2023040464A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117851289A (zh) * 2024-03-07 2024-04-09 北京象帝先计算技术有限公司 页表获取方法、系统、电子组件及电子设备
CN117851291A (zh) * 2024-03-07 2024-04-09 北京象帝先计算技术有限公司 内存访问系统、电子组件及电子设备

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110046106A (zh) * 2019-03-29 2019-07-23 海光信息技术有限公司 一种地址转换方法、地址转换模块及系统
GB202007437D0 (en) * 2020-05-19 2020-07-01 Advanced Risc Mach Ltd Translation table address storage circuitry
WO2021072721A1 (en) * 2019-10-17 2021-04-22 华为技术有限公司 Address translation method and apparatus

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110046106A (zh) * 2019-03-29 2019-07-23 海光信息技术有限公司 一种地址转换方法、地址转换模块及系统
WO2021072721A1 (en) * 2019-10-17 2021-04-22 华为技术有限公司 Address translation method and apparatus
GB202007437D0 (en) * 2020-05-19 2020-07-01 Advanced Risc Mach Ltd Translation table address storage circuitry

Also Published As

Publication number Publication date
CN115811509A (zh) 2023-03-17

Similar Documents

Publication Publication Date Title
US11929927B2 (en) Network interface for data transport in heterogeneous computing environments
US9678918B2 (en) Data processing system and data processing method
WO2023040464A1 (en) Bus communication method and related device
CN107153624B (zh) 经由计算机总线对持久性存储器的控制
US20110004732A1 (en) DMA in Distributed Shared Memory System
DE102018004327A1 (de) Systeme und Verfahren zum Zugreifen auf Massenspeicher als Arbeitsspeicher
CN112988632A (zh) 设备之间的共享存储器空间
WO2015078219A1 (en) Information caching method and apparatus, and communication device
US9584628B2 (en) Zero-copy data transmission system
WO2021244155A1 (en) Inter-process communication method and inter-process communication apparatus
US20120054380A1 (en) Opportunistic improvement of mmio request handling based on target reporting of space requirements
US11829309B2 (en) Data forwarding chip and server
WO2015180598A1 (en) Method, apparatus and system for processing access information of a storage device
US20220222016A1 (en) Method for accessing solid state disk and storage device
WO2023093418A1 (en) Data migration method and apparatus, and electronic device
DE102022107778A1 (de) Adressübersetzung an einer zielnetzwerk-schnittstellenvorrichtung
US20060120376A1 (en) Method and apparatus for providing peer-to-peer data transfer within a computing environment
CN117555836A (zh) 一种数据处理装置、系统及电子设备
US20240104045A1 (en) System and method for ghost bridging
US9424227B2 (en) Providing byte enables for peer-to-peer data transfer within a computing environment
WO2023216603A1 (en) Memory sharing method and apparatus
US20230388371A1 (en) System and method for accessing remote resource
CN117971135A (zh) 存储设备的访问方法、装置、存储介质和电子设备
CN114911411A (zh) 一种数据存储方法、装置及网络设备
CN114661639A (zh) 地址转换技术

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22868841

Country of ref document: EP

Kind code of ref document: A1