WO2022205130A1 - Method for performing read and write operations and SoC chip - Google Patents

Method for performing read and write operations and SoC chip

Info

Publication number
WO2022205130A1
Authority
WO
WIPO (PCT)
Prior art keywords
address
node
operation authority
read
message
Prior art date
Application number
PCT/CN2021/084556
Other languages
English (en)
French (fr)
Inventor
夏晶
信恒超
黎卓南
袁思睿
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to PCT/CN2021/084556 priority Critical patent/WO2022205130A1/zh
Priority to CN202180093103.5A priority patent/CN116940934A/zh
Priority to EP21933798.7A priority patent/EP4310683A4/en
Publication of WO2022205130A1 publication Critical patent/WO2022205130A1/zh
Priority to US18/477,110 priority patent/US20240028528A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/16 Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1668 Details of memory controller
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/16 Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1605 Handling requests for interconnection or transfer for access to memory bus based on arbitration
    • G06F13/1642 Handling requests for interconnection or transfer for access to memory bus based on arbitration with request queuing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/16 Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1605 Handling requests for interconnection or transfer for access to memory bus based on arbitration
    • G06F13/1652 Handling requests for interconnection or transfer for access to memory bus based on arbitration in a multiprocessor architecture
    • G06F13/1663 Access to shared memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/40 Bus structure
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10 Address translation
    • G06F12/1027 Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/16 Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1605 Handling requests for interconnection or transfer for access to memory bus based on arbitration
    • G06F13/161 Handling requests for interconnection or transfer for access to memory bus based on arbitration with latency improvement
    • G06F13/1626 Handling requests for interconnection or transfer for access to memory bus based on arbitration with latency improvement by reordering requests

Definitions

  • the present application relates to the field of storage, and in particular, to a method for performing read and write operations and a system on chip (SoC) chip.
  • Data can be passed between multiple processes (software) by accessing shared memory.
  • multiple processes send read and write commands to hardware (for example, a central processing unit (CPU)), and the hardware performs read and write operations on the shared memory.
  • the memory consistency model can be used to impose different requirements on the execution order of the read and write operations, to ensure that the execution results meet software expectations.
  • the strictness of the execution order required by different storage consistency models is different.
  • when the nodes of one storage consistency model (referred to as the strong order model) comply with strict order (SO) constraints and the nodes of another storage consistency model (referred to as the weak order model) obey relaxed order (RO) constraints, read and write operations issued by the strong order model should also be performed in the weak order model according to the execution order of the strong order model, to ensure that the order in which the execution results become globally observable (GO) conforms to the requirements of the strong order model.
  • Embodiments of the present application provide a method for executing read and write operations and a SoC chip, which are used to ensure that the order in which the execution results of read and write operations performed by a node that complies with the RO constraint become globally visible meets the requirements of the node that complies with the SO constraint.
  • a first aspect provides a method for performing read and write operations, comprising: a first node receives a first message and a second message from a second node, where the first message is used to request read and write operations on a first address managed by a third node, the second message is used to request read and write operations on a second address managed by the third node, and the execution order constraints on the read and write operations of the second node are stricter than the execution order constraints on the read and write operations of the third node; the first node obtains the operation authority of the first address and the operation authority of the second address from the third node; and the first node performs read and write operations on the first address and the second address.
  • the first node receives the first message and the second message from the second node; the second node complies with the SO constraint; the first message requests read and write operations on the first address managed by the third node, and the second message requests read and write operations on the second address managed by the third node; the third node complies with the RO constraint. The first node then obtains the operation authority of the first address and the operation authority of the second address from the third node, so that the first node participates in the management of cache consistency, and other nodes cannot perform read and write operations that require operation authority on the first address and the second address; that is, the execution order of the read and write operations on the first address and the second address is controlled by the first node, and the order in which the execution results become globally visible is therefore also controlled by the first node. Thereby, the order in which the execution results of the read and write operations performed by the node complying with the RO constraint become globally visible meets the requirements of the node complying with the SO constraint.
  • the first node performs read and write operations on the first address and the second address, including: the first node performs read and write operations on the first address and the second address in parallel.
  • This embodiment can realize parallel processing of read and write operation requests from nodes that comply with SO constraints, thereby improving the transmission bandwidth and interaction efficiency between nodes complying with SO constraints (second nodes) and nodes complying with RO constraints (third nodes).
  • the first node performs read and write operations on the first address and the second address in parallel, including: the first node performs read and write operations on the first address and the second address in parallel according to the order in which the first message and the second message are received.
  • This embodiment can ensure that the globally visible order of execution results conforms to the strong order model requirement.
  • the second node obeys the strict order SO constraint
  • the third node obeys the relaxed order RO constraint. This embodiment explains why the execution order constraints of the read and write operations of the second node are stricter than the execution order constraints of the read and write operations of the third node.
  • the method further includes: after the first node completes the read and write operations on the first address, releasing the operation authority of the first address to the third node. In this way, the third node or other nodes can continue to perform read and write operations on the first address. After completing the read and write operations on the second address, the first node releases the operation authority of the second address to the third node. In this way, the third node or other nodes can continue to perform read and write operations on the second address.
  • the first node obtains the operation authority of the first address and the operation authority of the second address from the third node, including: the first node obtains the E state of the first address and the E state of the second address from the third node. This embodiment provides a specific form of the operation authority of the first address and the operation authority of the second address.
  • when the first node has requested the operation authority of the first address but has not yet obtained it, if the first node receives from the third node a request to perform read and write operations on the first address that require operation authority, or a request for the operation authority of the first address, the first node indicates to the third node that the operation authority of the first address has not been obtained, so that the third node or other nodes can perform read and write operations on the first address.
  • similarly, when the first node has requested the operation authority of the second address but has not yet obtained it, if the first node receives from the third node a request to perform read and write operations on the second address that require operation authority, or a request for the operation authority of the second address, the first node indicates to the third node that the operation authority of the second address has not been obtained, so that the third node or other nodes can perform read and write operations on the second address.
  • the method further includes: when a preset condition is satisfied, the first node releases the operation authority of the first address and the operation authority of the second address to the third node. This enables the third node or other nodes to perform read and write operations on the first address and the second address.
  • the preset condition is that the third node requests the operation authority of the first address and the second address. This enables the third node or other nodes to perform read and write operations on the first address and the second address.
  • the preset condition is that the time for which the first node has held the operation authority of the first address obtained from the third node is greater than or equal to a first preset time, and the time for which the first node has held the operation authority of the second address obtained from the third node is greater than or equal to a second preset time.
  • the method further includes: after the first node obtains the operation authority of the first address and before it starts to perform read and write operations on the first address, if the third node requests to perform read and write operations on the first address that require operation authority, or requests the operation authority of the first address, the first node releases the operation authority of the first address to the third node and obtains the operation authority of the first address from the third node again; if the first node obtains the operation authority of the first address from the third node again, it can continue to perform read and write operations on the first address.
  • similarly, after the first node obtains the operation authority of the second address and before it starts to perform read and write operations on the second address, if the third node requests to perform read and write operations on the second address that require operation authority, or requests the operation authority of the second address, the first node releases the operation authority of the second address to the third node and obtains the operation authority of the second address from the third node again; if the first node obtains the operation authority of the second address from the third node again, it can continue to perform read and write operations on the second address.
  • the method further includes: when the first node has started to perform a write operation on the first address but has not yet obtained the cache address corresponding to the first address, if the third node requests to perform read and write operations on the first address that require operation authority, or requests the operation authority of the first address, the first node, after obtaining the cache address corresponding to the first address, sends the data written to that cache address to the third node, or indicates that the operation authority of the first address has been released, enabling the third node or other nodes to perform read and write operations on the first address.
  • similarly, when the first node has started to perform a write operation on the second address but has not yet obtained the cache address corresponding to the second address, if the third node requests to perform read and write operations on the second address that require operation authority, or requests the operation authority of the second address, the first node, after obtaining the cache address corresponding to the second address, sends the data written to that cache address to the third node, or indicates that the operation authority of the second address has been released. This enables the third node or other nodes to perform read and write operations on the second address.
  • the second node is an input/output I/O device other than the system-on-chip SoC chip
  • the first node is a memory management unit (memory management unit, MMU) in the SoC chip
  • the MMU may be SMMU
  • the third node is the memory controller in the SoC chip or the local agent HA in the memory controller.
  • the second node is a processor in the SoC chip
  • the first node is an on-chip interconnection NOC or an interface module of the processor in the SoC chip
  • the third node is a memory controller in the SoC chip or HA in the memory controller.
  • a system-on-chip (SoC) chip is provided, comprising: a first node and a memory controller, where the first node is configured to: receive a first message and a second message from a second node; the first message is used to request read and write operations on a first address managed by the memory controller; the second message is used to request read and write operations on a second address managed by the memory controller;
  • the execution order constraints on the read and write operations of the second node are stricter than the execution order constraints on the read and write operations of the memory controller; the first node obtains the operation authority of the first address and the operation authority of the second address from the memory controller, and performs read and write operations on the first address and the second address.
  • the first node is specifically configured to perform read and write operations on the first address and the second address in parallel.
  • the first node is specifically configured to perform read and write operations on the first address and the second address in parallel according to the receiving order of the first message and the second message.
  • the second node obeys the strict order SO constraint
  • the memory controller obeys the relaxed order RO constraint
  • the first node is further configured to: release the operation authority of the first address to the memory controller after completing the read and write operations on the first address; after completing the read and write operations on the second address After that, the operation authority of the second address is released to the memory controller.
  • the first node is specifically configured to: acquire the E state of the first address and the E state of the second address from the memory controller.
  • the first node is further configured to: when the operation authority of the first address has been requested but not obtained, and a request is received from the memory controller to perform read and write operations on the first address that require operation authority, or a request for the operation authority of the first address, indicate to the memory controller that the operation authority of the first address has not been obtained; and when the operation authority of the second address has been requested but not obtained, and a request is received from the memory controller to perform read and write operations on the second address that require operation authority, or a request for the operation authority of the second address, indicate to the memory controller that the operation authority of the second address has not been obtained.
  • after obtaining the operation authority of the first address and the operation authority of the second address from the memory controller, the first node is further configured to: release the operation authority of the first address and the operation authority of the second address to the memory controller when a preset condition is satisfied.
  • the preset condition is that the memory controller requests the operation authority of the first address and the second address.
  • the preset condition is that the time for which the first node has held the operation authority of the first address obtained from the memory controller is greater than or equal to a first preset time, and the time for which the first node has held the operation authority of the second address obtained from the memory controller is greater than or equal to a second preset time.
  • the second node is an input/output I/O device outside the SoC chip
  • the first node is a memory management unit MMU in the SoC chip.
  • the second node is a processor in the SoC chip
  • the first node is an on-chip interconnection network NOC in the SoC chip or an interface module of the processor.
  • the first node includes a sequence processing module, an operation authority judgment module and a data cache judgment module; the sequence processing module is used to record the order in which the first message and the second message are received; the operation authority judgment module is used to record whether the operation authority of the first address and the operation authority of the second address have been received, and to determine the order of the read and write operations on the first address and the second address according to the recorded order; the data cache judgment module is used to record whether the identifier of the cache address corresponding to the first address and the identifier of the cache address corresponding to the second address have been received, to determine whether to send data.
  • FIG. 1 is a schematic structural diagram of a chip system in which an I/O device communicates with a SoC chip according to an embodiment of the present application;
  • FIG. 2 is a schematic structural diagram of an SMMU provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of RO constraints and SO constraints of different storage consistency models according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram 1 of implementing a globally visible sequence of execution results in a weak sequence model that meets the requirements of a strong sequence model according to an embodiment of the present application;
  • FIG. 5 is a schematic diagram 2 of implementing a globally visible sequence of execution results in a weak sequence model that meets the requirements of a strong sequence model according to an embodiment of the present application;
  • FIG. 6 is a schematic diagram of communication between different modules within a same storage consistency model provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of an improvement to a weak sequence model provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of an improvement to the same storage consistency model provided by an embodiment of the present application.
  • FIG. 9 is a first schematic flowchart of a method for performing a read-write operation provided by an embodiment of the present application.
  • FIG. 10 is a second schematic flowchart of a method for performing a read-write operation provided by an embodiment of the present application.
  • FIG. 11 is a third schematic flowchart of a method for performing a read-write operation provided by an embodiment of the present application.
  • FIG. 12 is a fourth schematic flowchart of a method for performing a read-write operation provided by an embodiment of the present application.
  • FIG. 13 is a fifth schematic flowchart of a method for performing a read-write operation provided by an embodiment of the present application.
  • FIG. 14 is a sixth schematic flowchart of a method for performing a read-write operation provided by an embodiment of the present application.
  • FIG. 15 is a seventh schematic flowchart of a method for performing a read-write operation provided by an embodiment of the present application.
  • FIG. 16 is an eighth schematic flowchart of a method for performing a read-write operation provided by an embodiment of the present application.
  • FIG. 17 is a ninth schematic flowchart of a method for performing a read-write operation provided by an embodiment of the present application.
  • FIG. 19 is an eleventh schematic flowchart of a method for performing a read-write operation provided by an embodiment of the present application.
  • the order in which the execution results of read and write operations (whether the read and write operations have been performed) become visible to other nodes, that is, globally visible, is subject to certain requirements. For example, if a node performs read and write operations on two addresses one after another (equivalent to performing two read and write operations), or performs two read and write operations on one address successively, other nodes must not only know that the two read and write operations have been performed, but the order in which the execution results of the two read and write operations become globally visible must also conform to software expectations, that is, meet the requirements of storage consistency.
  • taking write operations performed first on a first address and then on a second address as an example, the correct globally visible results include: both the first address and the second address are written; only the first address is written; or neither the first address nor the second address is written.
  • the incorrect globally visible result is: only the second address is written (see the sketch below).
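  • As an illustration of the write-write example above (a sketch, not part of the patent text), the following C snippet enumerates the possible observed value pairs, assuming both addresses start at 0 and the writer sets the first address to 1 and then the second address to 1; the only observation that violates the expected order is the one in which the second address appears written while the first does not.

        #include <stdio.h>
        #include <stdbool.h>

        /* Returns true if an observer's view of (first, second) is a globally
         * visible result allowed by the strong order model for a writer that
         * writes the first address before the second address. */
        static bool observation_allowed(int first, int second) {
            /* allowed: neither written, only the first written, or both written */
            return !(first == 0 && second == 1);
        }

        int main(void) {
            int pairs[4][2] = { {0, 0}, {1, 0}, {1, 1}, {0, 1} };
            for (int i = 0; i < 4; i++)
                printf("first=%d second=%d -> %s\n", pairs[i][0], pairs[i][1],
                       observation_allowed(pairs[i][0], pairs[i][1])
                           ? "allowed" : "violates write-write order");
            return 0;
        }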
  • the processor is a fast-running device relative to the memory.
  • when the processor reads and writes the memory, if it waits for the operation to complete before processing other tasks, the processor will be blocked and its work efficiency will be reduced. Therefore, one cache can be configured for each processor (a cache is much faster, but smaller, than the memory).
  • when the processor writes data, a direct memory access (DMA) device stores the data from the cache to the memory; similarly, when the processor reads data in the memory, the DMA device first stores the data from the memory to the cache, and then the processor reads the data from the cache.
  • Cache-coherent devices comply with the MESI protocol, which specifies four states of a cache line (the smallest cache unit in the cache): the E (Exclusive) state, the M (Modified) state, the S (Shared) state and the I (Invalid) state.
  • the E state indicates that the cache line is valid, the data in the cache is consistent with the data in the memory, and the data only exists in this cache
  • the M state indicates that the cache line is valid, the data has been modified, the data in the cache is inconsistent with the data in the memory, and the data Only exists in this cache
  • the S state indicates that the cache line is valid, the data in the cache is consistent with the data in the memory, and the data exists in multiple caches
  • the I state indicates that the cache line is invalid.
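  • The following minimal C sketch (illustrative only; the enum and helper names are assumptions, not taken from the patent) summarizes the properties of the four MESI cache line states described above.

        #include <stdbool.h>
        #include <stdio.h>

        /* The four MESI cache line states described above. */
        typedef enum { MESI_I, MESI_S, MESI_E, MESI_M } mesi_state_t;

        /* Properties implied by each state (for an invalid line the data-related
         * properties are meaningless and simply reported as false). */
        static bool line_valid(mesi_state_t s)             { return s != MESI_I; }
        static bool data_matches_memory(mesi_state_t s)     { return s == MESI_E || s == MESI_S; }
        static bool exclusive_to_this_cache(mesi_state_t s) { return s == MESI_E || s == MESI_M; }

        int main(void) {
            const char *names[] = { "I", "S", "E", "M" };
            for (int s = MESI_I; s <= MESI_M; s++)
                printf("%s: valid=%d matches_memory=%d exclusive=%d\n",
                       names[s], line_valid(s), data_matches_memory(s),
                       exclusive_to_this_cache(s));
            return 0;
        }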
  • storage consistency models include, in order of the strictness of the required execution order from strong to weak: the sequential consistency (SC) model, the total store order (TSO) model, the relaxed model (RM), and so on.
  • the SC model requires that the operation sequence of reading and writing shared memory on the hardware is strictly consistent with the operation sequence required by the software instructions;
  • the TSO model, based on the SC model, introduces a cache mechanism, which relaxes the order constraint on the write-read (write first, then read) combination; that is, the read operation in a write-read combination can be completed before the write operation;
  • the RM model is the most relaxed, and does not impose order constraints on any read and write operations, which simplifies the hardware implementation.
  • when ordering is required, subsequent operations can be blocked by a fence to ensure the order of execution.
  • the strong order model imposes order constraints on certain read and write combinations (such as write-write (write first, then write), write-read (write first, then read), read-write (read first, then write), and read-read (read first, then read)), while the weak order model does not impose these order constraints.
  • therefore, when read and write requests from the strong order model enter the weak order model, the corresponding read and write operations should be executed serially in the weak order model according to the execution order of the strong order model, so as to ensure that the order in which the execution results become globally visible conforms to the requirements of the strong order model.
  • a chip system provided by an embodiment of the present application includes an input/output (I/O) device 11 outside the SoC chip and a SoC chip 12.
  • when the I/O device 11 and the SoC chip 12 are connected through the high-speed serial computer expansion bus standard (peripheral component interconnect express, PCIE), the I/O device 11 can be a PCIE board.
  • when the I/O device 11 and the SoC chip 12 are connected through a network transmission protocol, the I/O device 11 may be an Ethernet interface.
  • the I/O device 11 uses the X86 architecture, the corresponding strong order model is the TSO model, the SoC chip 12 uses the ARM architecture, and the corresponding weak order model is the RM model.
  • the SoC chip 12 may include a graphics processing unit (GPU) 120, a central processing unit (CPU) 121, a neural network processing unit (NPU) 122, a system memory management unit (SMMU) 123, a memory controller 124, a memory 125, and, optionally, a network on chip (NOC) 126.
  • GPU 120, CPU 121, NPU 122, SMMU 123, and memory controller 124 are interconnected through NOC 126.
  • GPU 120 is a graphics processing core
  • CPU 121 is a general-purpose processor core
  • NPU 122 is a dedicated processor core for artificial intelligence (AI)
  • SMMU 123 is a system memory management unit, which is used to provide address translation functions based on page tables
  • the SMMU 123 provides an address translation function between the I/O device 11 and the SoC chip 12
  • the memory controller 124 is used to manage data read and write operations in the memory 125
  • the memory controller 124 may also include a home agent (HA); the HA is responsible for the cache coherency management of the SoC chip, and can be incorporated into the memory controller 124 or independently mounted on the NOC 126
  • the memory 125 can be a main memory or an on-chip memory.
  • the SMMU 123 may include a translation lookaside buffer (TLB) 211 and an address translation circuit 212 .
  • the TLB 211 can reduce the time required to access user memory locations; the TLB 211 stores recent translations of virtual memory to physical memory, and can be called an address translation cache.
  • the address translation circuit 212 is used to perform the translation of virtual addresses to physical addresses.
  • the I/O device (hereinafter referred to as the second node) sending a read/write request to the memory controller in the SoC chip (hereinafter referred to as the third node) through the SMMU in the SoC chip (hereinafter referred to as the first node) means that a device in the strong order model sends a read and write request to a device in the weak order model.
  • both models allow write-read requests (i.e., write first, then read) to be executed out of order, so such requests can be processed in parallel in both models, and there is no impact on the transmission bandwidth and interaction efficiency between the two.
  • however, for the SO-constrained read and write requests of the I/O device, after they enter the SoC chip, the corresponding read and write operations must still be executed in order to ensure that the order in which the execution results become globally visible meets the requirements of the strong order model.
  • the SMMU acts as an interface between different storage consistency models. The I/O device in the strong order model sends write request 1 and write request 2 to the SMMU in parallel; the SMMU then sends write request 1 and write request 2 to the memory controller in the weak order model serially, through two handshakes. That is, the SMMU sends write request 1 and the corresponding data to the memory controller in the first handshake, and after the first handshake is completed, sends write request 2 and the corresponding data to the memory controller in the second handshake.
  • write request 1 and write request 2 indicate that a write operation is to be performed
  • write response 1 and write response 2 indicate that the data can be received and the location where the data will be stored
  • write data 1 and write data 2 include the data to be written and the location of data storage.
  • Write Completion 1 and Write Completion 2 indicate that the write operation is complete
  • acknowledgement (ACK) 1 and ACK2 indicate that the write completion is received.
  • the processing flow in Figure 3 is improved as follows: the SMMU sends write request 1 to the memory controller and receives write response 1; it does not need to wait for write request 1 to complete, and can first send write request 2 and receive write response 2. The SMMU then sends write data 1 and write data 2 in parallel, receives write completion 1 and write completion 2 in parallel, and sends ACK1 and ACK2 in parallel, where ACK1 is earlier than ACK2, which informs the memory controller that the execution results of the write requests have become globally visible.
  • the delay of SMMU waiting for the memory controller to return a write response is still very large, which will still reduce the transmission bandwidth and interaction efficiency between devices with different storage consistency models.
  • the SMMU and the memory controller still need at least one handshake, and the sequential processing mechanism is more cumbersome.
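  • As a recap of the two flows discussed above, the following minimal C sketch (illustrative only; the message names follow the text, the trace functions are assumptions) prints the order in which the SMMU exchanges messages with the memory controller in the serial two-handshake flow and in the improved flow in which write request 2 is issued without waiting for write request 1 to complete.

        #include <stdio.h>

        static void msg(const char *m) { printf("  %s\n", m); }

        /* Serial flow: the second handshake starts only after the first completes. */
        static void serial_two_handshakes(void) {
            puts("serial flow:");
            msg("write request 1"); msg("write response 1");
            msg("write data 1");    msg("write completion 1"); msg("ACK1");
            msg("write request 2"); msg("write response 2");
            msg("write data 2");    msg("write completion 2"); msg("ACK2");
        }

        /* Improved flow: request 2 is issued before request 1 completes; the data,
         * completion and ACK phases proceed in parallel, with ACK1 before ACK2. */
        static void improved_flow(void) {
            puts("improved flow:");
            msg("write request 1"); msg("write response 1");
            msg("write request 2"); msg("write response 2");
            msg("write data 1 || write data 2");
            msg("write completion 1 || write completion 2");
            msg("ACK1, then ACK2");
        }

        int main(void) { serial_two_handshakes(); improved_flow(); return 0; }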
  • similarly, within the same storage consistency model, when a module that complies with the SO constraint sends read and write requests to a module that complies with the RO constraint, the same serial processing will also reduce the transmission bandwidth and interaction efficiency within the model.
  • take the scenario in FIG. 1 in which the processors (such as the GPU 120, the CPU 121, the NPU 122, etc.) belonging to the weak order model in the SoC chip (hereinafter referred to as the second node) send read and write requests to the memory controller 124 (hereinafter referred to as the third node) through interfaces (such as the NOC 126, interface modules in the processors, etc.) (hereinafter referred to as the first node) as an example of a typical application scenario of sending read and write requests between different modules within the same storage consistency model.
  • the embodiments of the present application provide a method for executing read and write operations, which can be applied to communication between different storage consistency models, and can also be applied to communication between different modules within the same storage consistency model, so as to optimize transmission bandwidth and interaction efficiency within the model.
  • for communication between different storage consistency models, as shown in Figure 7, by extending the cache coherence (CC) domain of the weak order model, the interface node SMMU between the different models is also included in the CC range, and sequential processing is completed at the SMMU, so that parallel read and write requests from the strong order model can also be processed in parallel in the weak order model, improving the transmission bandwidth and interaction efficiency between devices that comply with SO constraints and devices that comply with RO constraints.
  • since the SMMU completes the sequential processing, the memory controller of the weak order model does not need a sequential processing mechanism; when the memory controller changes, it is not necessary to re-establish the sequential processing mechanism, so the versatility and scalability are stronger.
  • read and write requests have a clear order relationship in software, and the strong order model where the I/O device is located constrains the order of such read and write requests; in the strong order model, read and write requests issued in order can be processed efficiently in parallel.
  • in the conventional approach, cache coherence is implemented by the memory controller inside the weak order model, and the SMMU does not participate in cache coherency management, so cache coherence processing cannot be performed in the SMMU to ensure that the globally visible order of the execution results in the weak order model meets the requirements of the strong order model.
  • as a result, the memory controller can only achieve cache coherence through the handshake process, so the transmission bandwidth and interaction efficiency between devices with different storage consistency models are reduced.
  • in the present application, the processing authority of cache coherence is moved from the memory controller to the SMMU; after the SMMU receives the read and write requests of the strong order model, it can complete the sequential processing, ensuring that the globally visible order of the execution results in the weak order model meets the requirements of the strong order model.
  • serial handshake with I/O devices can be avoided, and read and write requests can be processed in parallel in the weak sequential model, improving parallel processing efficiency.
  • similarly, for communication between different modules within the same storage consistency model, the CC scope of the module that complies with the RO constraint can be extended so that the interface between the module that complies with the SO constraint (such as the processor) and the module that complies with the RO constraint is also included in the CC scope. In this way, read and write requests from the module that complies with the SO constraint can also be processed in parallel at the interface and at the module that complies with the RO constraint, optimizing transmission bandwidth and interaction efficiency within the model.
  • for this scenario, the present application moves the processing authority of cache coherence from the memory controller to the interface between the processor and the memory controller; after the interface receives read and write requests from the processor, it can complete the sequential processing, which ensures that the order in which the execution results become globally visible at the module that complies with the RO constraint meets the requirements of the module that complies with the SO constraint.
  • the read-write operation execution method includes:
  • the first node receives the first message and the second message from the second node.
  • the first message is used for requesting to perform read and write operations on the first address managed by the third node
  • the second message is used for requesting read and write operations on the second address managed by the third node.
  • the execution order constraints on the read and write operations of the second node are stricter than the execution order constraints on the read and write operations of the third node; that is, the second node complies with the SO constraint, and the third node complies with the RO constraint. Since the second node complies with the SO constraint, the first message is in fact used to request that read and write operations be performed in strict order on the first address managed by the third node, and the second message is used to request that read and write operations be performed in strict order on the second address managed by the third node.
  • the second node refers to the device that obeys the SO constraint in the strong-order model
  • the third node refers to the device that obeys the RO constraint in the weak-order model.
  • the first node refers to an interface node between the strong-order model and the weak-order model, and the first node may be an independent device or an interface module in the second node or the third node.
  • the second node may be the I/O device 11 located outside the SoC chip 12 in FIG. 1 for sending read and write requests;
  • the third node may be the memory controller 124 in the SoC chip 12 in FIG. 1 or The HA in the memory controller 124 is used for cache coherency management, such as managing the directory of the storage space;
  • the first node may be an MMU, such as the SMMU 123 in the SoC chip 12 of FIG. 1, or the read-write operation execution circuit 213 in the SMMU 123 shown in FIG. 10; the read-write operation execution circuit 213 is newly added to the SMMU 123 shown in FIG. 2 and is used for executing the read-write operation execution method provided by the present application.
  • FIG. 10 provides a schematic structural diagram of a read-write operation execution circuit 213 .
  • the read-write operation execution circuit 213 includes a sequence processing module 2131 , an operation authority judgment module 2132 and a data cache judgment module 2133 .
  • the sequence processing module 2131 is used for recording the sequence of receiving the first message and the second message, and is used for the operation authority judging module 2132 to perform read and write operations in order.
  • the operation authority judgment module 2132 is used to record whether the operation authority of the first address (e.g., the E state) and the operation authority of the second address (e.g., the E state) have been received, and to determine the order of the read and write operations on the first address and the second address according to the order of the first message and the second message recorded by the sequence processing module 2131. For example, if the sequence processing module 2131 records that the first message is received first and then the second message is received, a write-back (WriteBack) message for the first address is sent first, and then a write-back message for the second address is sent.
  • the write-back message may include the type of the write operation and the target address (the first address or the second address); for the read operation, the write-back message may include the type of the read operation and the target address.
  • the data cache judgment module 2133 is used to record whether the identifier of the cache address corresponding to the first address returned by the memory controller (for example, a data buffer identifier (data buffer ID, DBID)) and the identifier of the cache address corresponding to the second address (for example, a DBID) have been received, to determine whether data needs to be sent.
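  • The following minimal C sketch (field and function names are illustrative assumptions, not the patent's) shows the kind of per-request bookkeeping that the sequence processing module 2131, the operation authority judgment module 2132 and the data cache judgment module 2133 could maintain: a write-back is issued in receive order once the E state has arrived, and write data is sent only after the cache address identifier (e.g., a DBID) is known.

        #include <stdbool.h>
        #include <stdint.h>
        #include <stdio.h>

        /* Illustrative per-request record kept by the read-write operation
         * execution circuit; field names are assumptions for this sketch. */
        typedef struct {
            uint64_t addr;          /* first address or second address             */
            unsigned seq;           /* receive order (sequence processing module)   */
            bool     have_e_state;  /* operation authority received (operation
                                       authority judgment module)                   */
            bool     have_dbid;     /* cache address identifier received (data
                                       cache judgment module)                       */
        } pending_req_t;

        /* A write-back for a request may be issued, in receive order, once its
         * operation authority has arrived. */
        static bool can_issue_writeback(const pending_req_t *r, unsigned next_seq) {
            return r->have_e_state && r->seq == next_seq;
        }

        /* Write data can be sent only after the corresponding DBID is known. */
        static bool can_send_data(const pending_req_t *r) { return r->have_dbid; }

        int main(void) {
            pending_req_t r1 = { 0x1000, 0, true, false };
            pending_req_t r2 = { 0x2000, 1, true, true  };
            printf("req1 writeback ready: %d, data ready: %d\n",
                   can_issue_writeback(&r1, 0), can_send_data(&r1));
            printf("req2 writeback ready: %d, data ready: %d\n",
                   can_issue_writeback(&r2, 1), can_send_data(&r2));
            return 0;
        }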
  • the read/write operation execution circuit 214 may be located in the SMMU as the first node;
  • when the on-chip processor accesses the on-chip storage, that is, for communication between different modules in the same storage consistency model, the read and write operation execution circuit 214 serving as the first node may be located in the NOC or in the on-chip processor.
  • the present application exemplarily takes a communication scenario between different storage consistency models as an example for description, but is not intended to be limited thereto.
  • for communication between different modules within the same storage consistency model, the second node refers to the module that obeys the SO constraint in the storage consistency model, the third node refers to the module that obeys the RO constraint in the storage consistency model, and the first node refers to an interface module used for interaction between the second node and the third node in the storage consistency model.
  • for example, the second node is the processor (such as the GPU 120, the CPU 121, the NPU 122, etc.) in the SoC chip in FIG. 1, the first node is the on-chip NOC 126 in the SoC chip or the interface module of the processor (this module is a hardware circuit), and the third node is the memory controller 124 in the SoC chip or the HA in the memory controller 124.
  • the first node, the second node and the third node are different hardware modules inside the processor.
  • the read and write operations involved in this application can support write-write (write first, then write), write-read (write first, then read), read-write (read first, then write), read-read (read first, then read) and other combinations of operations.
  • the first message or the second message may be a write request, corresponding to a write operation, or may be a read request, corresponding to a read operation.
  • the first message or the second message is not limited to one, but may be multiple.
  • the message types of the first message and the second message can be the same, for example, both are write requests (i.e., write-write requests) or both are read requests (i.e., read-read requests), or they can be different, for example, one is a write request and the other is a read request (i.e., write-read requests or read-write requests).
  • the first address of the first message and the second address of the second message may be the same or different.
  • the second node may send a first message and a second message to the first node, and the first message and the second message may be write request messages.
  • the first message is used for requesting to perform a write operation on the first address managed by the third node in a strict order
  • the second message is used for requesting a write operation on the second address managed by the third node in a strict order.
  • the first node acquires the operation authority of the first address and the operation authority of the second address from the third node.
  • the operation authority may refer to the E state in the cache coherence, indicating the operation authority that the node has on the address, that is, the first node can obtain the E state of the first address and the E state of the second address from the third node.
  • the CC scope is extended from the third node to the first node, so that the first node participates in the management of cache consistency in the weak order model, and other nodes (for example, the third node) cannot perform read and write operations that require operation authority on the first address and the second address; that is, the third node's sequential processing authority for read and write requests has been transferred to the first node, and the execution order of the read and write operations on the first address and the second address is controlled by the first node.
  • the following specifically describes how the first node acquires the operation authority of the first address and the operation authority of the second address.
  • for the first address, the first node may send a third message to the third node, where the third message includes the first address, and the third message is used to request the operation authority of the first address.
  • the third node may send a fourth message to the first node, the fourth message may be a response message to the third message, and the fourth message is used to indicate the operation authority of the first address.
  • the first node may send a confirmation message of the fourth message to the third node, where the confirmation message is used to indicate that the first node has received the fourth message.
  • similarly, for the second address, the first node may send a third message to the third node, where the third message includes the second address, and the third message is used to request the operation authority of the second address.
  • the third node may send a fourth message to the first node, the fourth message may be a response message to the third message, and the fourth message is used to indicate the operation authority of the second address.
  • the first node may send a confirmation message of the fourth message to the third node, where the confirmation message is used to indicate that the first node has received the fourth message.
  • This application does not limit the order in which the first node obtains the operation authority of the first address and the operation authority of the second address from the third node. For example, assuming that the first node receives the first message (including the first address) first and then the second message (including the second address), the first node may nevertheless first obtain the operation authority of the second address and then obtain the operation authority of the first address.
  • the following describes how the first node acquires the operation authority of the first address and the operation authority of the second address with reference to FIG. 11 .
  • the first node may send the third message 1 and the third message 2 to the third node, and the third message 1 and the third message 2 may be GET_E messages.
  • the third message 1 includes the first address
  • the third message 2 includes the second address.
  • the third message 1 is used to request the operation authority of the first address
  • the third message 2 is used to request the operation authority of the second address.
  • the present application does not limit the order in which the first node sends the third message 1 and the third message 2 to the third node.
  • the third node sends a fourth message 1 and a fourth message 2 to the first node
  • the fourth message 1 may be a response message (RSP1) of the third message 1
  • the fourth message 2 may be a response message (RSP2) of the third message 2.
  • the first node receives the fourth message 1 and the fourth message 2 from the third node.
  • the fourth message 1 is used to instruct the first node to obtain the operation authority of the first address
  • the fourth message 2 is used to instruct the first node to obtain the operation authority of the second address.
  • the first node sends an acknowledgement message 1 (ACK1) of the fourth message 1 and an acknowledgement message 2 (ACK2) of the fourth message 2 to the third node.
  • the two confirmation messages are used to indicate that the first node has received the fourth message.
  • Steps S901 and S902 are not executed in sequence. For example, step S901 may be executed first and then step S902 may be executed, or step S902 may be executed first and then step S901 may be executed.
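  • The following minimal C sketch (the transport helpers are placeholders; the message names follow the flow described above) traces how the first node could request the E state of the first address and the second address from the third node and acknowledge the responses; no serialization is required between the two requests.

        #include <stdint.h>
        #include <stdio.h>

        /* Placeholder transport helpers standing in for the on-chip interconnect. */
        static void send_to_third_node(const char *m, uint64_t addr) {
            printf("first node -> third node: %s(0x%llx)\n", m, (unsigned long long)addr);
        }
        static void recv_from_third_node(const char *m, uint64_t addr) {
            printf("third node -> first node: %s(0x%llx)\n", m, (unsigned long long)addr);
        }

        static void acquire_operation_authority(uint64_t first_addr, uint64_t second_addr) {
            /* Third messages 1 and 2: request the E state of each address;
             * the order of the two requests is not constrained. */
            send_to_third_node("GET_E", first_addr);
            send_to_third_node("GET_E", second_addr);
            /* Fourth messages 1 and 2: responses granting the operation authority. */
            recv_from_third_node("RSP", first_addr);
            recv_from_third_node("RSP", second_addr);
            /* Confirmation messages ACK1 and ACK2. */
            send_to_third_node("ACK", first_addr);
            send_to_third_node("ACK", second_addr);
        }

        int main(void) {
            acquire_operation_authority(0x1000, 0x2000);  /* illustrative addresses */
            return 0;
        }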
  • the first node performs read and write operations on the first address and the second address.
  • This application does not limit the execution order of the read and write operations on the first address and the second address by the first node.
  • the first node can perform read and write operations on the first address and the second address in parallel
  • Parallel means performing the next read and write operation without waiting for the previous read and write operation to complete, so as to realize parallel processing of multiple read and write operations in the weak order model.
  • In this way, parallel requests from the strong order model can be processed in parallel, thereby improving the transmission bandwidth and interaction efficiency between the node that obeys the SO constraint (the second node) and the node that obeys the RO constraint (the first node).
  • the following specifically describes how the first node performs read and write operations on the first address and the second address.
  • the first node may send a fifth message to the third node, where the fifth message is used to instruct a read/write operation to be performed on the first address, and the fifth message may be a write-back (WriteBack) message.
  • the fifth message may include the data to be written, the type of the write operation, and the first address; for the read operation, the fifth message may include the type of the read operation and the first address.
  • the first node may send a fifth message to the third node, where the fifth message is used to instruct a read/write operation to be performed on the second address, and the fifth message may be a write-back (WriteBack) message.
  • the fifth message may include the data to be written, the type of the write operation, and the second address; for the read operation, the fifth message may include the type of the read operation and the second address.
  • the order in which the first node sends the fifth message corresponding to the first address and the fifth message corresponding to the second address may be the same as the order in which the first message and the second message are received. For example, if the first node receives the first message first and then the second message, the first node first sends the fifth message corresponding to the first address and then sends the fifth message corresponding to the second address.
  • the third node may send a sixth message to the first node, where the sixth message may be a response message to the fifth message, and the sixth message is used to indicate a cache address corresponding to the first address.
  • the third node may send a sixth message to the first node, where the sixth message may be a response message to the fifth message, and the sixth message is used to indicate the cache address corresponding to the second address .
  • after receiving the sixth message, the first node sends a seventh message to the third node.
  • the seventh message may be a WriteData message, and the seventh message is used to perform read and write operations on the cache address corresponding to the first address.
  • after receiving the sixth message, the first node sends a seventh message to the third node.
  • the seventh message may be a write data (WriteData) message, and the seventh message is used to perform read and write operations on the cache address corresponding to the second address.
  • the first node sends the fifth message 1 and the fifth message 2 to the third node
  • the fifth message 1 and the fifth message 2 may be write-back (WriteBack) messages.
  • the fifth message 1 corresponds to the first message and is used for instructing to perform a write operation on the first address
  • the fifth message 2 corresponds to the second message and is used for instructing to perform a write operation on the second address. Since the first node first receives the first message from the second node and then receives the second message, the first node first sends the fifth message 1 to the third node and then sends the fifth message 2.
  • the parallelism at this time means that the first node can send the fifth message 2 without waiting for all read and write operations corresponding to the fifth message 1 to be completed.
  • the third node sends the sixth message 1 and the sixth message 2 to the first node
  • the sixth message 1 may be the response message (RSP3) of the fifth message 1
  • the sixth message 2 may be the response message (RSP4) of the fifth message 2 ).
  • the sixth message 1 is used to indicate the cache address corresponding to the first address
  • the sixth message 2 is used to indicate the cache address corresponding to the second address.
  • the first node sends a seventh message 1 and a seventh message 2 to the third node, and the seventh message may be a write data (WriteData) message.
  • the seventh message 1 is used to write data to the cache address corresponding to the first address
  • the seventh message 2 is used to write data to the cache address corresponding to the second address.
  • the first node may release the operation authority of the first address to the third node after completing the read and write operations on the first address.
  • the above seventh message may also be used to release the operation authority of the first address to the third node. In this way, the third node or other nodes can continue to perform read and write operations on the first address.
  • the first node can release the operation authority of the second address to the third node after completing the read and write operations on the second address.
  • the above seventh message can also be used to release the operation authority of the second address to the third node. In this way, the third node or other nodes can continue to perform read and write operations on the second address.
  • the seventh message 1 is further used to instruct the third node to release the operation authority of the first address; the seventh message 2 is also used to instruct the third node to release the operation authority of the second address.
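  • Continuing the sketch above (names remain illustrative assumptions), the following trace shows the first node issuing the write-back messages in the order in which the first message and the second message were received, receiving the responses that carry the cache addresses, and then sending the write data messages that also release each address's operation authority.

        #include <stdint.h>
        #include <stdio.h>

        /* Placeholder transport helpers; message names follow the text above. */
        static void send_msg(const char *m, uint64_t addr) {
            printf("first node -> third node: %s(0x%llx)\n", m, (unsigned long long)addr);
        }
        static void recv_msg(const char *m, uint64_t addr) {
            printf("third node -> first node: %s(0x%llx)\n", m, (unsigned long long)addr);
        }

        int main(void) {
            uint64_t first_addr = 0x1000, second_addr = 0x2000;  /* illustrative */

            /* Fifth messages, issued in receive order; the first node does not
             * wait for the first operation to finish before issuing the second. */
            send_msg("WriteBack", first_addr);
            send_msg("WriteBack", second_addr);

            /* Sixth messages: each carries the cache address (e.g. a DBID). */
            recv_msg("RSP(DBID)", first_addr);
            recv_msg("RSP(DBID)", second_addr);

            /* Seventh messages: write the data to the returned cache addresses
             * and release each address's operation authority to the third node. */
            send_msg("WriteData + release E", first_addr);
            send_msg("WriteData + release E", second_addr);
            return 0;
        }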
  • the first node receives the first message and the second message from the second node; the second node complies with the SO constraint; the first message requests read and write operations on the first address managed by the third node, and the second message requests read and write operations on the second address managed by the third node; the third node complies with the RO constraint. The first node then obtains the operation authority of the first address and the operation authority of the second address from the third node, so that the first node participates in the management of cache consistency, and other nodes cannot perform read and write operations that require operation authority on the first address and the second address; that is, the execution order of the read and write operations on the first address and the second address is controlled by the first node, and the order in which the execution results become globally visible is therefore also controlled by the first node. Thereby, the order in which the execution results of the read and write operations performed by the node complying with the RO constraint become globally visible meets the requirements of the node complying with the SO constraint.
  • the following describes how the first node handles the case in which, during the interaction between the first node and the third node to perform the read and write operations, the third node requests the operation authority of the first address (or the second address) or requests a read or write operation on the first address (or the second address) that requires operation authority, so as to meet the storage consistency requirements and ensure that the order in which the execution results become globally visible conforms to the requirements of the strong order model.
  • if the first node has requested the operation authority of the first address but has not yet obtained it, and it receives from the third node a request to perform a read or write operation on the first address that requires operation authority, or a request for the operation authority of the first address, the first node indicates to the third node that the operation authority of the first address has not been obtained.
  • similarly, if the first node has requested the operation authority of the second address but has not yet obtained it, and it receives from the third node a request to perform a read or write operation on the second address that requires operation authority, or a request for the operation authority of the second address, the first node indicates to the third node that the operation authority of the second address has not been obtained.
  • the above-mentioned read-write operation execution method further includes:
  • before the first node acquires the operation authority of the first address (or the second address), the first node receives an eighth message from the third node.
  • the eighth message is used to request a read and write operation on the first address that requires an operation authority, or in other words, is used to request an operation authority of the first address.
  • the eighth message is used to request a read and write operation on the second address that requires an operation authority, or, in other words, is used to request an operation authority of the second address.
  • the eighth message may be a snoop message.
  • before the third node sends the response message (RSP1) of the third message 1 (GET_E1) to the first node, that is, before the first node obtains the operation authority of the first address (or the second address), the third node sends the eighth message (snoop message) to the first node, so that the first node receives the eighth message from the third node; the eighth message is used to request a read or write operation on the first address (or the second address) that requires operation authority, or to request the operation authority of the first address (or the second address).
  • the first node sends a ninth message to the third node.
  • the ninth message is used to indicate that the operation authority of the first address (or the second address) has not been acquired.
  • the ninth message may be a response message of the eighth message, for example, the ninth message may be a snoop response message.
  • the first node sends the ninth message (snoop response message) to the third node, and the ninth message is used to indicate that the operation authority of the first address (or the second address) has not been obtained.
  • if, before the first node obtains the operation authority of the first address, the third node requests to perform a read or write operation on the first address that requires operation authority, or requests the operation authority of the first address, the first node indicates to the third node that the operation authority of the first address has not been obtained. This enables the third node or other nodes to perform read and write operations on the first address.
  • similarly, if, before the first node obtains the operation authority of the second address, the third node requests to perform a read or write operation on the second address that requires operation authority, or requests the operation authority of the second address, the first node indicates to the third node that the operation authority of the second address has not been obtained. This enables the third node or other nodes to perform read and write operations on the second address. A sketch of this case is given below.
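A minimal C++ sketch of this case, assuming the first node keeps a simple per-address state; the state names and message strings are illustrative only.

```cpp
#include <cstdint>
#include <iostream>
#include <map>

enum class State { Idle, AuthorityRequested, AuthorityHeld };

struct FirstNode {
    std::map<uint64_t, State> state;

    // Eighth message: the third node snoops an address that requires authority.
    void onSnoop(uint64_t addr) {
        if (state[addr] != State::AuthorityHeld) {
            // Ninth message: report that the authority has not been obtained, so the
            // third node (or another node) may access the address itself.
            std::cout << "SnoopResp addr=0x" << std::hex << addr
                      << " authority-not-obtained\n";
        }
    }
};

int main() {
    FirstNode n;
    n.state[0x1000] = State::AuthorityRequested;  // GET_E sent, RSP1 not yet received
    n.onSnoop(0x1000);                            // prints "authority-not-obtained"
}
```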
  • if, after the first node has started a write operation on the first address but before it has acquired the cache address corresponding to the first address, the third node requests to perform a read or write operation on the first address that requires operation authority, or requests the operation authority of the first address, then after acquiring the cache address corresponding to the first address, the first node sends the data written to that cache address to the third node, or indicates that the operation authority of the first address has been released.
  • similarly, if, after the first node has started a write operation on the second address but before it has acquired the cache address corresponding to the second address, the third node requests to perform a read or write operation on the second address that requires operation authority, or requests the operation authority of the second address, then after acquiring the cache address corresponding to the second address, the first node sends the data written to that cache address to the third node, or indicates that the operation authority of the second address has been released.
  • the above-mentioned read-write operation execution method further includes:
  • when the first node has started a write operation on the first address (or the second address) but has not yet obtained the cache address corresponding to the first address (or the second address), the first node receives a twelfth message from the third node.
  • the twelfth message is used to request a read and write operation on the first address (or the second address) that requires operation authority, or in other words, is used to request the operation authority of the first address (or the second address).
  • the twelfth message may be a snoop message.
  • before the third node sends the sixth message (which includes the cache address corresponding to the first address (or the second address)) to the first node, the third node sends the twelfth message (snoop message) to the first node, so that the first node receives the twelfth message from the third node; the twelfth message is used to request a read or write operation on the first address (or the second address) that requires operation authority, or to request the operation authority of the first address (or the second address).
  • after acquiring the cache address corresponding to the first address (or the second address), the first node sends a thirteenth message to the third node.
  • that is, after the first node receives the sixth message 1 (which includes the cache address corresponding to the first address (or the second address)) from the third node, it sends the thirteenth message to the third node.
  • the thirteenth message may be a response message of the twelfth message, for example, the thirteenth message may be a snoop response message.
  • the thirteenth message may include data written to the cache address corresponding to the first address (or the second address).
  • the thirteenth message may have the function of the seventh message instead of the seventh message, that is, the thirteenth message may also be used to instruct the third node to release the operation authority of the first address (or the second address).
  • the thirteenth message may be sent after the seventh message (in this case, the seventh message is used to instruct the third node to release the operation authority of the first address (or the second address)), using To indicate that the operation authority of the first address (or the second address) has been released.
  • in other words, if, after the first node has started a write operation on the first address but before it has obtained the cache address corresponding to the first address, the third node requests to perform a read or write operation on the first address that requires operation authority, or requests the operation authority of the first address, then after obtaining the cache address corresponding to the first address, the first node sends the data written to that cache address to the third node, or indicates that the operation authority of the first address has been released.
  • after obtaining the cache address corresponding to the first address, the first node sends the data written to that cache address to the third node, so that the third node can obtain the data directly; alternatively, after sending the seventh message, the first node indicates that the operation authority of the first address has been released, so that the third node or other nodes can perform read and write operations on the first address.
  • similarly, if, after the first node has started a write operation on the second address but before it has obtained the cache address corresponding to the second address, the third node requests to perform a read or write operation on the second address that requires operation authority, or requests the operation authority of the second address, then after obtaining the cache address corresponding to the second address, the first node sends the data written to that cache address to the third node, or indicates that the operation authority of the second address has been released.
  • after obtaining the cache address corresponding to the second address, the first node sends the data written to that cache address to the third node, so that the third node can obtain the data directly; alternatively, after sending the seventh message, the first node indicates that the operation authority of the second address has been released, so that the third node or other nodes can perform read and write operations on the second address. A sketch of this deferred response is given below.
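A minimal C++ sketch of this deferred response, assuming the first node parks a flag for the snooped address until the cache address (DBID) arrives; the field and message names are illustrative only.

```cpp
#include <cstdint>
#include <iostream>
#include <map>

struct PendingWrite {
    uint64_t data;
    bool snooped;   // a twelfth message arrived while the DBID was still outstanding
};

struct FirstNode {
    std::map<uint64_t, PendingWrite> pending;  // WriteBack issued, DBID not yet known

    void onSnoopWhileWriting(uint64_t addr) {  // twelfth message
        pending[addr].snooped = true;          // cannot answer yet: no cache address
    }

    void onCacheAddress(uint64_t addr, uint64_t dbid) {  // sixth message with the DBID
        const PendingWrite p = pending[addr];
        std::cout << "WriteData addr=0x" << std::hex << addr
                  << " dbid=0x" << dbid << "\n";          // seventh message
        if (p.snooped) {
            // Thirteenth message: hand the freshly written data to the snooper,
            // or simply report that the operation authority has been released.
            std::cout << "SnoopResp addr=0x" << addr
                      << " data=0x" << p.data << " authority-released\n";
        }
        pending.erase(addr);
    }
};

int main() {
    FirstNode n;
    n.pending[0x2000] = {0xBB, false};  // WriteBack for 0x2000 already issued
    n.onSnoopWhileWriting(0x2000);      // snoop arrives before the DBID
    n.onCacheAddress(0x2000, 0x20);     // DBID arrives: WriteData, then deferred SnoopResp
}
```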
  • after the first node obtains the operation authority of the first address and the operation authority of the second address from the third node, the first node may release the operation authority of the first address and the operation authority of the second address to the third node when a preset condition is satisfied. This enables the third node or other nodes to perform read and write operations on the first address and the second address.
  • the preset condition is that the third node requests the operation authority of the first address and the second address.
  • for example, if, after the first node has obtained the operation authority of the first address and before it starts to perform read and write operations on the first address, the third node requests to perform a read or write operation on the first address that requires operation authority, or requests the operation authority of the first address, the first node releases the operation authority of the first address to the third node and then obtains the operation authority of the first address from the third node again.
  • similarly, if, after the first node has obtained the operation authority of the second address and before it starts to perform read and write operations on the second address, the third node requests to perform a read or write operation on the second address that requires operation authority, or requests the operation authority of the second address, the first node releases the operation authority of the second address to the third node and then obtains the operation authority of the second address from the third node again.
  • the above-mentioned read-write operation execution method further includes:
  • after the first node obtains the operation authority of the first address (or the second address) and before it starts to perform read and write operations on the first address (or the second address), the first node receives a tenth message from the third node.
  • the tenth message is used to request a read and write operation on the first address (or the second address) that requires operation authority, or in other words, is used to request the operation authority of the first address (or the second address).
  • the tenth message may be a snoop message.
  • the third node sends the tenth message (snoop message) to the first node, so that the first node receives the tenth message from the third node; the tenth message is used to request a read or write operation on the first address (or the second address) that requires operation authority, or to request the operation authority of the first address (or the second address).
  • the first node sends an eleventh message to the third node, and obtains the operation authority of the first address (or the second address) from the third node again.
  • the eleventh message is used to indicate releasing the operation authority of the first address (or the second address).
  • the eleventh message may be a response message of the tenth message, for example, the eleventh message may be a snoop response message.
  • the first node sends the eleventh message (snoop response message) to the third node, where the eleventh message is used to indicate releasing the operation authority of the first address (or the second address); the first node then re-sends the third message (GET_E) to the third node, receives the fourth message (RSP1/RSP2) from the third node to obtain the operation authority of the first address (or the second address), sends the acknowledgment message (ACK1) of the fourth message to the third node, and then re-executes the read/write operation procedure for the first address (or the second address) and the procedure for releasing the operation authority of the first address (or the second address).
  • in other words, if, after the first node obtains the operation authority of the first address and before it starts to read and write the first address, the third node requests to perform a read or write operation on the first address that requires operation authority, or requests the operation authority of the first address, the first node releases the operation authority of the first address to the third node and obtains the operation authority of the first address from the third node again. After the first node releases the operation authority of the first address to the third node, the third node can directly perform read and write operations on the first address. Once the first node obtains the operation authority of the first address from the third node again, it can continue to perform read and write operations on the first address.
  • similarly, if, after the first node obtains the operation authority of the second address and before it starts to read and write the second address, the third node requests to perform a read or write operation on the second address that requires operation authority, or requests the operation authority of the second address, the first node releases the operation authority of the second address to the third node and obtains the operation authority of the second address from the third node again. After the release, the third node can directly perform read and write operations on the second address; once the first node obtains the operation authority of the second address from the third node again, it can continue to perform read and write operations on the second address. A sketch of this release-and-reacquire behaviour is given below.
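A minimal C++ sketch of this release-and-reacquire behaviour, assuming a single pending address; the message names mirror the tenth, eleventh, third, and fourth messages described above, and everything else is illustrative.

```cpp
#include <cstdint>
#include <iostream>
#include <string>

struct FirstNode {
    bool holdsAuthority = true;  // E-state already obtained, read/write not started yet

    void send(const std::string& msg, uint64_t addr) {
        std::cout << msg << " addr=0x" << std::hex << addr << std::dec << "\n";
    }
    void onSnoop(uint64_t addr) {              // tenth message from the third node
        if (holdsAuthority) {
            send("SnoopResp(release)", addr);  // eleventh message: give the E-state back
            holdsAuthority = false;
            send("GET_E", addr);               // third message again: re-request authority
        }
    }
    void onAuthorityGranted(uint64_t addr) {   // fourth message, received again
        send("ACK", addr);
        holdsAuthority = true;                 // the postponed read/write can now proceed
    }
};

int main() {
    FirstNode n;
    n.onSnoop(0x1000);             // release and immediately re-request
    n.onAuthorityGranted(0x1000);  // authority regained, operations continue
}
```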
  • alternatively, the preset condition is that the time since the first node obtained the operation authority of the first address from the third node is greater than or equal to a first preset time, and the time since the first node obtained the operation authority of the second address from the third node is greater than or equal to a second preset time.
  • the first preset time and the second preset time may be the same or different.
  • steps S901 and S902 may be executed in either order.
  • for example, step S902 may be executed before step S901; that is, the first node obtains the operation authority of the first address (or the second address) in advance, so that when the first message (or the second message) is received, read and write operations on the first address (or the second address) can be performed quickly.
  • the first node may obtain the operation authority of the first address (or the second address) in advance according to the historical read and write operations.
  • if, within a preset time after the first node acquires the operation authority of the first address from the third node, the first message is not received, the first node releases the operation authority of the first address to the third node. Similarly, if, within a preset time after the first node obtains the operation authority of the second address from the third node, the second message is not received, the first node releases the operation authority of the second address to the third node.
  • if, after the first node obtains the operation authority of the first address (or the second address) through the third message and the fourth message, it does not receive the first message (or the second message) within a preset time, the first node sends a fourteenth message to the third node to indicate releasing the operation authority of the first address (or the second address). Subsequently, when the first node receives the first message (or the second message), the interaction procedure corresponding to steps S902-S903 above is re-executed to complete the read and write operations.
  • if, after the first node obtains the operation authority of the first address (or the second address) through the third message and the fourth message, the first node receives the first message (or the second message) within the preset time, the first node executes the interaction procedure corresponding to step S903 to complete the read and write operations.
  • in this way, the first node does not need to perform the operation-authority acquisition procedure after receiving the read/write request from the second node, and can quickly perform read and write operations on the first address (or the second address). A sketch of this timeout-based condition is given below.
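A minimal C++ sketch of the timeout check behind this preset condition, assuming a steady clock as the time source; the structure and names are illustrative only.

```cpp
#include <chrono>
#include <cstdint>
#include <iostream>

using Clock = std::chrono::steady_clock;

struct PrefetchedAuthority {
    uint64_t addr;
    Clock::time_point grantedAt;   // when the E-state was obtained in advance
    Clock::duration presetTime;    // first (or second) preset time

    // Evaluated periodically, or whenever a request for `addr` arrives.
    // Returns true when the fourteenth message should be sent to release the authority.
    bool shouldRelease(bool requestArrived) const {
        if (requestArrived) return false;               // run step S903 directly instead
        return Clock::now() - grantedAt >= presetTime;
    }
};

int main() {
    PrefetchedAuthority a{0x1000, Clock::now(), std::chrono::milliseconds(5)};
    std::cout << "release immediately? " << a.shouldRelease(false) << "\n";  // prints 0
}
```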
  • the magnitude of the sequence numbers of the above processes does not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.
  • the disclosed systems, devices and methods may be implemented in other manners.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • when implemented by a software program, the embodiments may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on the computer, all or part of the processes or functions described in the embodiments of the present application are generated.
  • the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
  • the computer instructions may be stored in or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be downloaded from a website site, computer, server, or data center Transmission to another website site, computer, server or data center via wired (eg coaxial cable, optical fiber, Digital Subscriber Line, DSL) or wireless (eg infrared, wireless, microwave, etc.) means.
  • the computer-readable storage medium may be any available medium accessible by a computer, or a data storage device such as a server or data center that integrates one or more available media.
  • the usable medium may be a magnetic medium (eg, a floppy disk, a hard disk, a magnetic tape), an optical medium (eg, a DVD), or a semiconductor medium (eg, a Solid State Disk (SSD)), and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

This application discloses a read/write operation execution method and an SoC chip, which are used to ensure that the order in which the execution results of read and write operations performed by a node complying with the RO constraint become globally visible conforms to the requirements of a node complying with the SO constraint. The read/write operation execution method includes: a first node receives a first message and a second message from a second node; the first message is used to request read/write operations on a first address managed by a third node; the second message is used to request read/write operations on a second address managed by the third node; the execution-order constraint on the read/write operations of the second node is stricter than the execution-order constraint on the read/write operations of the third node; the first node obtains the operation authority of the first address and the operation authority of the second address from the third node; and the first node performs read/write operations on the first address and the second address.

Description

读写操作执行方法和SoC芯片 技术领域
本申请涉及存储领域,尤其涉及一种读写操作执行方法和片上系统(system on chip,SoC)芯片。
背景技术
多个进程(软件)之间可以通过访问共享内存(shared memory)来传递数据。具体的,多个进程向硬件(例如中央处理器(central processing unit,CPU))发送读写命令,由硬件对共享内存执行读写操作。硬件执行读写操作的顺序将影响软件最终看到的执行结果,因此可以通过存储一致性模型(memory consistency model)来对读写操作的执行顺序进行不同程度的要求,以保证执行结果符合软件预期。
不同的存储一致性模型所要求的执行顺序的严格程度是不同的,当遵守严格顺序(strict order,SO)约束的存储一致性模型(简称强顺序模型)的节点向遵守宽松顺序(relax order,RO)约束的存储一致性模型(简称弱顺序模型)的节点请求读写操作时,也要在弱顺序模型内按照强顺序模型的执行顺序来执行读写操作,以保证执行结果全局可见(global observable,GO)的顺序符合强顺序模型的要求。
发明内容
本申请实施例提供一种读写操作执行方法和SoC芯片,用于实现遵守RO约束的节点执行读写操作的执行结果全局可见的顺序符合遵守SO约束的节点的要求。
为达到上述目的,本申请的实施例采用如下技术方案:
第一方面,提供了一种读写操作执行方法,包括:第一节点从第二节点接收第一消息和第二消息;第一消息用于请求对第三节点管理的第一地址进行读写操作;第二消息用于请求对第三节点管理的第二地址进行读写操作;第二节点的读写操作的执行顺序约束比第三节点的读写操作的执行顺序约束严格;第一节点从第三节点获取第一地址的操作权限和第二地址的操作权限;第一节点对第一地址和第二地址进行读写操作。
本申请实施例提供的读写操作执行方法,第一节点从第二节点接收第一消息和第二消息,第二节点遵守SO约束,第一消息请求对第三节点管理的第一地址进行读写操作,第二消息请求对第三节点管理的第二地址进行读写操作,第三节点遵守RO约束;则第一节点从第三节点获取第一地址的操作权限和第二地址的操作权限,使得第一节点参与到缓存一致性的管理,其他节点都无法对第一地址和第二地址进行需要操作权限的读写操作,即第一地址和第二地址的读写操作的执行顺序由第一节点来控制,那么执行结果全局可见的顺序也就由第一节点来控制。从而实现遵守RO约束的节点执行读写操作的执行结果全局可见的顺序符合遵守SO约束的节点的要求。
在一种可能的实施方式中,第一节点对第一地址和第二地址进行读写操作,包括:第一节点并行对第一地址和第二地址进行读写操作。该实施方式可以实现并行处理来自遵守SO约束的节点的读写操作请求,从而提高遵守SO约束的节点(第二节点)与 遵守RO约束的节点(第三节点)之间的传输带宽和交互效率。
在一种可能的实施方式中,第一节点并行对第一地址和第二地址进行读写操作,包括:第一节点按照第一消息和第二消息的接收顺序并行对第一地址和第二地址进行读写操作。该实施方式可以保证执行结果的全局可见的顺序符合强顺序模型要求。
在一种可能的实施方式中,第二节点遵守严格顺序SO约束,第三节点遵守宽松顺序RO约束。该实施方式解释了何为第二节点的读写操作的执行顺序约束比第三节点的读写操作的执行顺序约束严格。
在一种可能的实施方式中,还包括:第一节点在完成对第一地址的读写操作后,向第三节点释放第一地址的操作权限。这样第三节点或其他节点可以继续对第一地址进行读写操作。第一节点在完成对第二地址的读写操作后,向第三节点释放第二地址的操作权限。这样第三节点或其他节点可以继续对第二地址进行读写操作。
在一种可能的实施方式中,第一节点从第三节点获取第一地址的操作权限和第二地址的操作权限,包括:第一节点从第三节点获取第一地址的E态以及第二地址的E态。该实施方式提供了第一地址的操作权限和第二地址的操作权限的一种具体形式。
在一种可能的实施方式中,第一节点在请求第一地址的操作权限但未获取第一地址的操作权限时,接收到第三节点请求对第一地址进行需要操作权限的读写操作,或者,请求第一地址的操作权限,则第一节点向第三节点指示未获取第一地址的操作权限;使得第三节点或其他节点可以对第一地址进行读写操作。第一节点在请求第二地址的操作权限但未获取第二地址的操作权限时,接收到第三节点请求对第二地址进行需要操作权限的读写操作,或者,请求第二地址的操作权限,则第一节点向第三节点指示未获取第二地址的操作权限;使得第三节点或其他节点可以对第二地址进行读写操作。
在一种可能的实施方式中,在第一节点从第三节点获取第一地址的操作权限和第二地址的操作权限之后,方法还包括:在预设条件满足时,第一节点向第三节点释放第一地址的操作权限和第二地址的操作权限。使得第三节点或其他节点可以对第一地址和第二地址进行读写操作。
在一种可能的实施方式中,预设条件为第三节点请求第一地址和第二地址的操作权限。使得第三节点或其他节点可以对第一地址和第二地址进行读写操作。
在一种可能的实施方式中,预设条件为第一节点从第三节点获取第一地址的操作权限的时间大于或等于第一预设时间,以及第一节点从第三节点获取第二地址的操作权限的时间大于或等于第二预设时间。第一节点在接收了来自第二节点的读写请求后不必再执行获取第一地址的操作权限的流程,可以快速对第一地址执行读写操作。第一节点在接收了来自第二节点的读写请求后不必再执行获取第二地址的操作权限的流程,可以快速对第二地址执行读写操作。
在一种可能的实施方式中,还包括:在第一节点获取了第一地址的操作权限之后并且开始对第一地址进行读写操作之前,第三节点请求对第一地址进行需要操作权限的读写操作,或者,请求第一地址的操作权限,则第一节点向第三节点释放第一地址的操作权限,并重新从第三节点获取第一地址的操作权限;第一节点重新从第三节点获取第一地址的操作权限,则可以继续对第一地址进行读写操作。在第一节点获取了 第二地址的操作权限之后并且开始对第二地址进行读写操作之前,第三节点请求对第二地址进行需要操作权限的读写操作,或者,请求第一地址的操作权限,则第一节点向第三节点释放第二地址的操作权限,并重新从第三节点获取第二地址的操作权限。第一节点重新从第三节点获取第二地址的操作权限,则可以继续对第二地址进行读写操作。
在一种可能的实施方式中,还包括:在第一节点开始对第一地址进行写操作但未获取第一地址对应的缓存地址时,第三节点请求对第一地址进行需要操作权限的读写操作,或者,请求第一地址的操作权限,则第一节点在获取第一地址对应的缓存地址之后,向第三节点发送第一地址对应的缓存地址写入的数据,或者,指示已经释放第一地址的操作权限;使得第三节点或其他节点能够对第一地址进行读写操作。在第一节点开始对第二地址进行写操作但未获取第二地址对应的缓存地址时,第三节点请求对第二地址进行需要操作权限的读写操作,或者,请求第二地址的操作权限,则第一节点在获取第二地址对应的缓存地址之后,向第三节点发送第二地址对应的缓存地址写入的数据,或者,指示已经释放第二地址的操作权限。使得第三节点或其他节点能够对第二地址进行读写操作。
在一种可能的实施方式中,第二节点为片上系统SoC芯片之外的输入输出I/O设备,第一节点为SoC芯片中的内存管理单元(memory management unit,MMU),该MMU可以为SMMU,第三节点为SoC芯片中的内存控制器或内存控制器中的本地代理HA。该实施方式提供了一种具体应用场景。
在一种可能的实施方式中,第二节点为SoC芯片中的处理器,第一节点为SoC芯片中的片上互联网络NOC或者处理器的接口模块,第三节点为SoC芯片中的内存控制器或内存控制器中的HA。该实施方式提供了另一种具体应用场景。
第二方面,提供了一种片上系统SoC芯片,其特征在于,包括:第一节点和内存控制器,第一节点用于:从第二节点接收第一消息和第二消息;第一消息用于请求对内存控制器管理的第一地址进行读写操作;第二消息用于请求对内存控制器管理的第二地址进行读写操作;第二节点的读写操作的执行顺序约束比内存控制器的读写操作的执行顺序约束严格;从内存控制器获取第一地址的操作权限和第二地址的操作权限;对第一地址和第二地址进行读写操作。
在一种可能的实施方式中,第一节点具体用于:并行对第一地址和第二地址进行读写操作。
在一种可能的实施方式中,第一节点具体用于:按照第一消息和第二消息的接收顺序并行对第一地址和第二地址进行读写操作。
在一种可能的实施方式中,第二节点遵守严格顺序SO约束,内存控制器遵守宽松顺序RO约束。
在一种可能的实施方式中,第一节点还用于:在完成对第一地址的读写操作后,向内存控制器释放第一地址的操作权限;在完成对第二地址的读写操作后,向内存控制器释放第二地址的操作权限。
在一种可能的实施方式中,第一节点具体用于:从内存控制器获取第一地址的E态以及第二地址的E态。
在一种可能的实施方式中,第一节点还用于:在请求第一地址的操作权限但未获取第一地址的操作权限时,接收到内存控制器请求对第一地址进行需要操作权限的读写操作,或者,请求第一地址的操作权限,则向内存控制器指示未获取第一地址的操作权限;在请求第二地址的操作权限但未获取第二地址的操作权限时,接收到内存控制器请求对第二地址进行需要操作权限的读写操作,或者,请求第二地址的操作权限,则向内存控制器指示未获取第二地址的操作权限。
在一种可能的实施方式中,在从内存控制器获取第一地址的操作权限和第二地址的操作权限之后,第一节点还用于:在预设条件满足时,向内存控制器释放第一地址的操作权限和第二地址的操作权限。
在一种可能的实施方式中,预设条件为内存控制器请求第一地址和第二地址的操作权限。
在一种可能的实施方式中,预设条件为第一节点从内存控制器获取第一地址的操作权限的时间大于或等于第一预设时间,以及第一节点从内存控制器获取第二地址的操作权限的时间大于或等于第二预设时间。
在一种可能的实施方式中,第二节点为SoC芯片之外的输入输出I/O设备,第一节点为SoC芯片中的内存管理单元MMU。
在一种可能的实施方式中,第二节点为SoC芯片中的处理器,第一节点为SoC芯片中的片上互联网络NOC或者处理器的接口模块。
在一种可能的实施方式中,第一节点包括顺序处理模块、操作权限判断模块和数据缓存判断模块;顺序处理模块用于记录接收第一消息和第二消息的顺序;操作权限判断模块用于记录是否收到第一地址的操作权限和第二地址的操作权限,并根据顺序来确定对第一地址和第二地址进行读写操作的先后顺序;数据缓存判断模块用于记录是否收到第一地址对应的缓存地址的标识以及第二地址对应的缓存地址的标识,从而确定是否发送数据。
第二方面的技术效果参照第一方面及其任一实施方式所述的内容,在此不再重复。
附图说明
图1为本申请实施例提供的一种I/O设备与SoC芯片通信的芯片系统的结构示意图;
图2为本申请实施例提供的一种SMMU的结构示意图;
图3为本申请实施例提供的一种不同存储一致性模型的RO约束和SO约束的示意图;
图4为本申请实施例提供的一种在弱顺序模型实现执行结果的全局可见的顺序符合强顺序模型要求的示意图一;
图5为本申请实施例提供的一种在弱顺序模型实现执行结果的全局可见的顺序符合强顺序模型要求的示意图二;
图6为本申请实施例提供的一种同一存储一致性模型内部不同模块之间通信的示意图;
图7为本申请实施例提供的一种对弱顺序模型的改进的示意图;
图8为本申请实施例提供的一种对同一存储一致性模型的改进的示意图;
图9为本申请实施例提供的一种读写操作执行方法的流程示意图一;
图10为本申请实施例提供的一种读写操作执行方法的流程示意图二;
图11为本申请实施例提供的一种读写操作执行方法的流程示意图三;
图12为本申请实施例提供的一种读写操作执行方法的流程示意图四;
图13为本申请实施例提供的一种读写操作执行方法的流程示意图五;
图14为本申请实施例提供的一种读写操作执行方法的流程示意图六;
图15为本申请实施例提供的一种读写操作执行方法的流程示意图七;
图16为本申请实施例提供的一种读写操作执行方法的流程示意图八;
图17为本申请实施例提供的一种读写操作执行方法的流程示意图九;
图18为本申请实施例提供的一种读写操作执行方法的流程示意图十;
图19为本申请实施例提供的一种读写操作执行方法的流程示意图十一。
具体实施方式
首先对本申请涉及的一些概念进行描述:
存储一致性:指硬件执行读写操作后,读写操作的执行结果(是否执行了读写操作)对其他节点全局可见的顺序有一定要求,例如某一节点先后对两个地址分别进行一次读写操作(相当于执行了两次读写操作),或者,某一节点先后对一个地址进行两次读写操作,其他节点不仅获知已经执行了两次读写操作,并且获知(即全局可见)这两次读写操作的执行结果的顺序符合软件预期,即为满足了存储一致性的要求。例如,在对读写操作的执行顺序有要求的情况下,先后对第一地址和第二地址执行写操作,则执行结果全局可见的正确顺序包括:对第一地址和第二地址均执行了写操作,只对第一地址执行了写操作,或者,第一地址和第二地址均未执行写操作。执行结果全局可见的错误顺序包括:只对第二地址执行了写操作。
缓存一致性:处理器相对于存储器是快速运行的设备,处理器对存储器进行读写操作时,如果等待操作完成再处理其他任务,将造成处理器阻塞,降低处理器的工作效率。因此,可以针对每个处理器配置一个缓存(缓存的速度远远快于存储器但容量小于存储器)。当处理器向存储器中写数据时,可以将数据先写入缓存然后就可以处理其他任务,由直接存储器访问(direct memory access,DMA)器件来将数据存储至存储器;同理,当处理器读存储器中的数据时,由DMA器件先将数据从存储器存储至缓存,再由处理器从缓存中读取数据。当不同处理器通过缓存对存储器中同一地址进行读写操作时,对读写操作的执行顺序有严格要求,即前一个读写操作完成前阻塞后一个读写操作,以防止同时进行读写操作而引起缓存中数据与存储器中数据不一致。
缓存一致性的设备遵守MESI协议,在MESI协议中规定了缓存线(cache line)(缓存中的最小缓存单位)的四种独占状态,包括:E(Exclusive)态、M(Modified)态、S(Shared)态和I(Invalid)态。其中,E态表示该缓存线有效,缓存中数据和存储器中数据一致,数据只存在于本缓存中;M态表示该缓存线有效,数据被修改了,缓存中数据和存储器中数据不一致,数据只存在于本缓存中;S态表示该缓存线有效,缓存中数据和存储器中数据一致,数据存在于多个缓存中;I态表示该缓存线无效。
存储一致性模型按照所要求的执行顺序的严格程度从强到弱包括:顺序一致性(sequential consistency,SC)模型、完全存储定序(total store order,TSO)模型、宽 松模型(relax model,RM)等。SC模型要求硬件上读写共享内存的操作顺序与软件指令要求的操作顺序严格保持一致;TSO模型,在SC模型的基础上引入了缓存机制,放松了对于写-读(先写后读)操作的顺序约束,即写-读操作中的读操作可以先于写操作完成;RM模型最为宽松,不对任何读写操作进行顺序约束,简化了硬件实现,只是在有需求的时候通过一些软件手段,例如阻塞(fence)后续操作的方式来保证执行顺序。
强顺序模型的设备向弱顺序模型的设备发送读写请求时,可能会出现强顺序模型对于某种读写组合(例如写-写(先写后写)、写-读(先写后读)、读-写(先读后写)、读-读(先读后读))有顺序约束,而弱顺序模型没有该顺序约束的情况。为此,对于强顺序模型内的并行读写请求,在弱顺序模型内要按照强顺序模型的执行顺序来串行执行,以保证执行结果能够全局可见的顺序符合强顺序模型的要求。
首先以图1中的芯片系统为例,说明强顺序模型的设备向弱顺序模型的设备发送读写请求的一种典型应用场景。
如图1所示,本申请实施例提供的一种芯片系统包括SoC芯片之外的输入输出(input output,I/O)设备11和SoC芯片12。当I/O设备11与SoC芯片12通过高速串行计算机扩展总线标准(peripheral component interconnect express,PCIE)连接时,I/O设备11可以为PCIE板卡,当I/O设备11与SoC芯片12通过网络传输协议连接时,I/O设备11可以为以太网接口。
I/O设备11使用X86架构,对应的强顺序模型是TSO模型,SoC芯片12使用ARM架构,对应的弱顺序模型是RM模型。
示例性的,SoC芯片12可以包括图形处理单元(graphics processing unit,GPU)120、中央处理器(central processing unit,CPU)121、神经网络处理器(neural network processing unit,NPU)122、系统内存管理单元(system memory management unit,SMMU)123、内存控制器(memory controller)124、存储器125,可选的,还可以包括片上互联网络(network on chip,NOC)126。GPU 120、CPU 121、NPU 122、SMMU 123、内存控制器124通过NOC 126互联。
其中,GPU 120为图形处理核心;CPU 121为通用处理器核心;NPU 122为人工智能(artificial intelligence,AI)专用处理器核心;SMMU 123为系统内存管理单元,用于基于页表提供地址翻译功能,例如,SMMU 123提供I/O设备11与SoC芯片12之间地址翻译功能;内存控制器124用于管理存储器125中的数据读写操作;内存控制器124还可以包括本地代理(home agent,HA),HA负责SoC芯片的缓存一致性管理,可以合并入内存控制器124中,也可以独立挂载在NOC 126上;存储器125可以是存储器,也可以是片内存储器。
进一步的,如图2所示,SMMU 123可以包括转换检测缓冲区(translation lookaside buffer,TLB)211和地址转化电路212。TLB 211可以减少用于访问用户存储器位置的时间,TLB 211将虚拟内存到物理内存的最新转换存储起来,可以称为地址转换缓存。地址转化电路212用于执行虚拟地址到物理地址的转化。
如图3所示,I/O设备(后文中指第二节点)通过SoC芯片中的SMMU(后文中指第一节点)向SoC芯片中的内存控制器(后文中指第三节点)发送读写请求,即为 强顺序模型的设备向弱顺序模型的设备发送读写请求。对于I/O设备的RO约束的读写请求,两个模型都允许乱序的写-读请求(即先写后读),在两种模型中都可以并行处理,因此对于这两个模型之间的传输带宽和交互效率没有影响。而对于I/O设备的SO约束的读写请求,在进入SoC芯片后,仍要按照顺序执行对应的读写操作,保证执行结果能够全局可见的顺序符合强顺序模型的要求。
示例性的,以写操作为例,如图4所示,SMMU作为不同存储一致性模型之间的接口接点,位于强顺序模型中的I/O设备以并行顺序向SMMU发送写请求1和写请求2,SMMU经过两次握手,以串行顺序向弱顺序模型中的内存控制器发送写请求1和写请求2,即SMMU在第一次握手中向内存控制器发送写请求1以及对应的数据,完成后在第二次握手中向内存控制器发送写请求2以及对应的数据。
其中,写请求1和写请求2指示要进行写操作,写响应1和写响应2指示可接收数据以及数据存储的位置,写数据1和写数据2中包括待写的数据以及数据存储的位置,写完成1和写完成2指示写操作完成,确认(acknowledge,ACK)1和ACK2指示接收到写完成。
这种方式使得在弱顺序模型内,有顺序约束的多个读写请求只能串行执行,并且需要SMMU和内存控制器反复握手以保证在弱顺序模型内的执行顺序,降低了不同存储一致性模型的设备之间的传输带宽和交互效率。并且通用性和扩展性不好,内存控制器发生变更时,SMMU要与新的节点重新建立顺序处理机制。
示例性的,如图5所示,为了减少SMMU与内存控制器之间握手耗时,对图3的处理流程进行了如下改进:SMMU向内存控制器发送写请求1并接收写响应1,不需要等待写请求1完成,可以先发送写请求2并接收写响应2,然后SMMU并行发送写数据1和写数据2,并行接收写完成1和写完成2,并行发送ACK1和ACK2。其中,ACK1早于ACK2,告知内存控制器写请求的执行结果已经全局可见。
这种方式中,在某些场景(例如跨芯片)下SMMU等待内存控制器返回写响应的时延仍很大,仍会降低不同的存储一致性模型的设备之间的传输带宽和交互效率。并且,SMMU和内存控制器仍需要至少一次握手,顺序处理机制更加繁琐。
另外,对于同一存储一致性模型内部不同模块之间的通信,当存储一致性模型(例如TSO模型或者SC模型)对执行结果全局可见的顺序有要求时,遵守SO约束的模块向遵守RO约束的模块发送读写请求时,同样会降低模型内部的传输带宽和交互效率。
下面以图1中属于弱顺序模型的SOC芯片中的处理器(例如GPU 120、CPU 121、NPU 122等)(后文中指第二节点)通过接口(例如NOC 126、处理器中的接口模块等)(后文中指第一节点)向内存控制器124(后文中指第三节点)发送读写请求为例,来说明同一存储一致性模型内部不同模块之间发送读写请求的一种典型应用场景。
示例性的,如图6所示,遵守SO约束的处理器并行发出多个读写请求(例如写流(stream-write)请求)时,在这些请求进入遵守RO约束的乱序(out-of-order)总线之前,执行顺序由模型本身保证。当这些请求通过与乱序总线之间的接口进入乱序总线并发送给内存控制器时,为了保证执行结果的全局可见的顺序符合强顺序模型要求,同样采用类似图4的串行执行方式或图5的部分串行执行方式(图6中未示出), 因此会降低模型内部的传输带宽和交互效率。
为此,本申请实施例提供了一种读写操作执行方法,可以应用于不同存储一致性模型之间的通信,也可以应用于同一存储一致性模型内部不同模块之间的通信,以优化模型内部的传输带宽和交互效率。
对于不同存储一致性模型之间的通信,如图7所示,通过对弱顺序模型的缓存一致性(cache coherence,CC)范围(domain)进行扩展,将不同模型之间的接口节点SMMU也纳入CC范围,在SMMU处完成顺序处理,使得来自强顺序模型的并行读写请求在弱顺序模型也可以并行处理,提高遵守SO约束的设备与遵守RO约束的设备之间的传输带宽和交互效率。另外,由于在SMMU完成顺序处理,弱顺序模型的内存控制器不需要顺序处理机制,内存控制器发生变更时,不必重新建立顺序处理机制,因此通用性和延展性更强。
由于读写请求在软件上有明确的先后顺序关系,并且I/O设备所在的强顺序模型约束了这类读写请求之间的顺序,因此在强顺序模型内,读写请求在顺序发出后就可以高效地并行处理。
读写请求到达两个模型之间接口的SMMU后,为了在弱顺序模型保证执行顺序,在未采用本申请提供的方案前,由弱顺序模型内部的内存控制器来实现缓存一致性,SMMU不参与缓存一致性管理,因此无法在SMMU进行缓存一致性处理来保证在弱顺序模型执行结果全局可见的顺序符合强顺序模型要求,只能由内存控制器通过握手过程来实现缓存一致性,从而导致不同的存储一致性模型的设备之间的传输带宽和交互效率降低。本申请通过扩展弱顺序模型的缓存一致性域,将缓存一致性的处理权限从内部内存控制器移到SMMU,SMMU收到强顺序模型的读写请求之后就可以完成顺序处理,保证了执行结果在弱顺序模型的全局可见的顺序符合强顺序模型的要求。
在SMMU完成了顺序处理之后,可以避免与I/O设备的串行握手,读写请求可以在弱顺序模型并行处理,提高并行处理效率。
对于同一存储一致性模型内部不同模块之间的通信,如图8所示,可以对遵守RO约束的CC范围进行扩展,将模型内部位于遵守SO约束的模块(例如处理器)与遵守RO约束的模块(例如内存控制器)之间的接口也纳入CC范围,使得来自遵守SO约束的模块的读写请求在接口和遵守RO约束的模块也可以并行处理,以优化模型内部的传输带宽和交互效率。
另外,本申请通过扩展弱顺序模型的缓存一致性域,将缓存一致性的处理权限从内存控制器移到处理器与内存控制器之间的接口,该接口收到来自处理器的读写请求之后就可以完成顺序处理,保证了执行结果在遵守RO约束的模块全局可见的顺序符合强遵守RO约束的模块的要求。
如图9所示,本申请实施例提供的读写操作执行方法,包括:
S901、第一节点从第二节点接收第一消息和第二消息。
第一消息用于请求对第三节点管理的第一地址进行读写操作,第二消息用于请求对第三节点管理的第二地址进行读写操作。第二节点读写操作的执行顺序约束比第三节点的读写操作的执行顺序约束更严格,即第二节点遵守SO约束,第三节点遵守RO约束。由于第二节点遵守SO约束,所以实际上,第一消息用于请求对第三节点管理 的第一地址按照严格顺序进行读写操作,第二消息用于请求对第三节点管理的第二地址按照严格顺序进行读写操作。
对于不同存储一致性模型之间的通信来说,第二节点指强顺序模型中遵守SO约束的设备,第三节点指弱顺序模型中遵守RO约束的设备。第一节点指位于强顺序模型和弱顺序模型之间的接口节点,第一节点可以是独立的设备,也可以是第二节点或第三节点中的接口模块。
例如,第二节点可以为图1中的位于SoC芯片12之外的I/O设备11,用于发送读写请求;第三节点可以为图1中的SoC芯片12中的内存控制器124或内存控制器124中的HA,用于缓存一致性管理,例如管理存储空间的目录;第一节点可以为MMU,例如可以为图1的SoC芯片12中的SMMU 123或者如图10所示的SMMU 123中的读写操作执行电路213,该读写操作执行电路213是在图2所示的SMMU 123上新增的,用于执行本申请提供的读写操作执行方法。
进一步,图10提供了一种读写操作执行电路213的结构示意图,该读写操作执行电路213包括顺序处理模块2131、操作权限判断模块2132和数据缓存判断模块2133。
顺序处理模块2131用于记录接收第一消息和第二消息的顺序,用于操作权限判断模块2132保序进行读写操作。
操作权限判断模块2132用于记录是否收到第一地址的操作权限(例如E态)和第二地址的操作权限(例如E态),并且根据顺序处理模块2131记录的第一消息和第二消息顺序来确定对第一地址和第二地址进行读写操作的先后顺序,例如顺序处理模块2131记录了先接收到第一消息后接收到第二消息,则顺序处理模块2131先发送针对第一地址的回写(WriteBack)消息,后发送针对第二地址的回写消息。对于写操作来说,回写消息可以包括写操作类型、目标地址(第一地址或第二地址);对于读操作来说,回写消息可以包括读操作类型、目标地址。
数据缓存判断模块2133用于记录是否收到内存控制器返回的第一地址对应的缓存地址的标识(例如数据缓冲标识(data buffer ID,DBID)和第一地址对应的缓存地址的标识(例如DBID),从而确定是否需要发送数据。
需要说明的是,在针对I/O设备访问SoC片内存储的场景,即不同存储一致性模型之间的通信,读写操作执行电路214作为第一节点可以位于SMMU中;类似地,在针对片内处理器访问片内存储的场景,即同一存储一致性模型内部不同模块之间的通信,读写操作执行电路214作为第一节点可以位于NOC或者片内处理器。
另外需要说明的是,本申请示例性的以不同存储一致性模型之间的通信场景为例进行描述,但并不意在限定于此。
对于同一存储一致性模型内部的模块之间的通信来说,第一节点指存储一致性模型中遵守SO约束的模块,第三节点指存储一致性模型中遵守RO约束的模块,第二节点指存储一致性模型中用于第一节点与第三节点之间交互的接口模块。
例如,第二节点为图1中SoC芯片中的处理器(例如GPU 120、CPU 121、NPU 122等),第二节点为SoC芯片中的片上NOC 126或者处理器的接口模块(该模块为硬件电路),第三节点为SoC芯片中的内存控制器124或内存控制器124中的HA。或者,第一节点、第二节点和第三节点为处理器内部不同的硬件模块。
本申请涉及的读写操作,可以支持写-写(先写后写)、写-读(先写后读)、读-写(先读后写)、读-读(先读后读)等操作。第一消息或第二消息可以为写请求,对应写操作,或者,可以为读请求,对应读操作。第一消息或第二消息不限定一个,可以是多个。并且第一消息和第二消息的消息类型可以相同,例如均为写请求(即写-写请求)或读请求(即读-读请求),也可以不同,例如,一个为写请求另一个为读请求(即写-读请求或读-写请求)。并且第一消息的第一地址与第二消息的第二地址可以相同或不同。
示例性的,如图11所示,第二节点可以向第一节点发送第一消息和第二消息,第一消息和第二消息可以为写请求消息。第一消息用于请求对第三节点管理的第一地址按照严格顺序进行写操作,第二消息用于请求对第三节点管理的第二地址按照严格顺序进行写操作。
S902、第一节点从第三节点获取第一地址的操作权限和第二地址的操作权限。
操作权限可以指缓存一致性中的E态,表示节点对于该地址拥有的操作权限,也就是说,第一节点可以从第三节点获取第一地址的E态和第二地址的E态。
第一节点获取第一地址的操作权限和第二地址的操作权限后,CC范围从第三节点扩展到第一节点,使得第一节点参与到弱顺序模型中缓存一致性的管理,其他节点(例如第三节点)不能对第一地址和第二地址进行需要操作权限的读写操作,即第三节点对读写请求的顺序处理权限已经转移到第一节点,第一地址和第二地址的读写操作的执行顺序由第一节点来控制。
下面具体说明第一节点如何获取第一地址的操作权限和第二地址的操作权限。
第一节点可以向第三节点发送第三消息,第三消息中包括第一地址,第二消息用于请求第一地址的操作权限。第三节点在接收到第二消息后,可以向第一节点发送第四消息,第四消息可以为第三消息的响应消息,第四消息用于指示第一地址的操作权限。第一节点接收到第四消息后,可以向第三节点发送第四消息的确认消息,该确认消息用于指示第一节点接收到第四消息。
同理,第一节点可以向第三节点发送第三消息,第三消息中包括第二地址,第二消息用于请求第二地址的操作权限。第三节点在接收到第二消息后,可以向第一节点发送第四消息,第四消息可以为第三消息的响应消息,第四消息用于指示第二地址的操作权限。第一节点接收到第四消息后,可以向第三节点发送第四消息的确认消息,该确认消息用于指示第一节点接收到第四消息。
本申请不限定第一节点从第三节点获取第一地址的操作权限和第二地址的操作权限的顺序,例如,假设第一节点先接收第一消息(包括第一地址)后接收第二消息(包括第二地址),第一节点可以先获取第二地址的操作权限后获取第一地址的操作权限。
下面结合图11说明第一节点如何获取第一地址的操作权限和第二地址的操作权限。
示例性的,如图11所示,第一节点可以向第三节点发送第三消息1和第三消息2,第三消息1和第三消息2可以为GET_E消息。第三消息1中包括第一地址,第三消息2中包括第二地址。第三消息1用于请求第一地址的操作权限,第三消息2用于请求第二地址的操作权限。本申请不限定第一节点向第三节点发送第三消息1和第三消息 2的顺序。
第三节点向第一节点发送第四消息1和第四消息2,第四消息1可以为第三消息1的响应消息(RSP1),第四消息2可以为第三消息2的响应消息(RSP2)。相应地,第一节点从第三节点接收第四消息1和第四消息2。第四消息1用于指示第一节点获取第一地址的操作权限,第四消息2用于指示第一节点获取第二地址的操作权限。
第一节点向第三节点发送第四消息1的确认消息1(ACK1)以及第四消息2的确认消息2(ACK2)。这两个确认消息用于指示第一节点接收到第四消息。
步骤S901和S902无先后执行顺序,例如,可以先执行步骤S901后执行步骤S902,或者,先执行步骤S902后执行步骤S901。
S903、第一节点对第一地址和第二地址进行读写操作。
本申请不对第一节点对第一地址和第二地址进行读写操作的执行顺序进行限定,在一种可能的实施方式中,第一节点可以并行对第一地址和第二地址进行读写操作,并行指不等上一个读写操作完成即进行下一个读写操作,从而在弱顺序模型实现多个读写操作的并行处理。
通过第一节点并行对第一地址和第二地址进行读写操作,则可以实现并行处理来自强顺序模型的请求,从而提高遵守SO约束的节点(第二节点)与遵守RO约束的节点(第三节点)之间的传输带宽和交互效率。
第一节点开始对第一地址和第二地址进行读写操作的顺序可以与第一消息和第二消息的接收顺序相同,也就是说,第一节点可以按照第一消息和第二消息的接收顺序并行对第一地址和第二地址进行读写操作。例如,第一节点先接收第一消息后接收第二消息,则第一节点可以先对第一地址进行读写操作,后对第二地址进行读写操作。从而由第一节点完成读写请求的顺序处理,以实现存储一致性。
下面具体说明第一节点如何对第一地址和第二地址进行读写操作。
第一节点可以向第三节点发送第五消息,第五消息用于指示对第一地址进行读写操作,第五消息可以为回写(WriteBack)消息。对于写操作来说,第五消息可以包括待写的数据、写操作类型、第一地址;对于读操作来说,第五消息可以包括读操作类型、第一地址。同理,第一节点可以向第三节点发送第五消息,第五消息用于指示对第二地址进行读写操作,第五消息可以为回写(WriteBack)消息。对于写操作来说,第五消息可以包括待写的数据、写操作类型、第二地址;对于读操作来说,第五消息可以包括读操作类型、第二地址。
第一节点发送第一地址对应的第五消息和第二地址对应的第五消息的顺序与第一消息和第二消息的接收顺序可以相同。例如,第一节点先接收第一消息后接收第二消息,则第一节点先发送第一地址对应的第五消息后发送第二地址对应的第五消息。
第三节点在接收到第五消息后可以向第一节点发送第六消息,该第六消息可以为第五消息的响应消息,该第六消息用于指示第一地址对应的缓存地址。同理,第三节点在接收到第五消息后可以向第一节点发送第六消息,该第六消息可以为第五消息的响应消息,该第六消息用于指示第二地址对应的缓存地址。
第一节点在接收到第六消息后,向第三节点发送第七消息,第七消息可以为写数据(WriteData)消息,第七消息用于对第一地址对应的缓存地址进行读写操作。同理, 第一节点在接收到第六消息后,向第三节点发送第七消息,第七消息可以为写数据(WriteData)消息,第七消息用于对第二地址对应的缓存地址进行读写操作。
示例性的,如图11所示,第一节点向第三节点发送第五消息1和第五消息2,第五消息1和第五消息2可以为回写(WriteBack)消息。第五消息1与第一消息对应,用于指示对第一地址进行写操作,第五消息2与第二消息对应,用于指示对第二地址进行写操作。由于第一节点先从第二节点接收第一消息后接收第二消息,所以第一节点先向第三节点先发送第五消息1后发送第五消息2。此时的并行指第一节点不用等待第五消息1对应的读写操作全部完成即可以发送第五消息2。
第三节点向第一节点发送第六消息1和第六消息2,第六消息1可以为第五消息1的响应消息(RSP3),第六消息2可以为第五消息2的响应消息(RSP4)。第六消息1用于指示第一地址对应的缓存地址,第五消息2用于指示第二地址对应的缓存地址。
第一节点向第三节点发送第七消息1和第七消息2,第七消息可以为写数据(WriteData)消息。第七消息1用于向第一地址对应的缓存地址写入数据,第七消息2用于向第二地址对应的缓存地址写入数据。
第一节点在完成对第一地址的读写操作后可以向第三节点释放第一地址的操作权限。例如,上述第七消息还可以用于向第三节点释放第一地址的操作权限。这样第三节点或其他节点可以继续对第一地址进行读写操作。同理,第一节点在完成对第二地址的读写操作后可以向第三节点释放第二地址的操作权限,例如,上述第七消息还可以用于向第三节点释放第二地址的操作权限。这样第三节点或其他节点可以继续对第二地址进行读写操作。
示例性的,如图11所示,第七消息1还用于指示向第三节点释放第一地址的操作权限;第七消息2还用于指示向第三节点释放第二地址的操作权限。
本申请实施例提供的读写操作执行方法,第一节点从第二节点接收第一消息和第二消息,第二节点遵守SO约束,第一消息请求对第三节点管理的第一地址进行读写操作,第二消息请求对第三节点管理的第二地址进行读写操作,第三节点遵守RO约束;则第一节点从第三节点获取第一地址的操作权限和第二地址的操作权限,使得第一节点参与到缓存一致性的管理,其他节点都无法对第一地址和第二地址进行需要操作权限的读写操作,即第一地址和第二地址的读写操作的执行顺序由第一节点来控制,那么执行结果全局可见的顺序也就由第一节点来控制。从而实现遵守RO约束的节点执行读写操作的执行结果全局可见的顺序符合遵守SO约束的节点的要求。
下面结合图12-图19,基于缓存一致性的原理,说明如果在第一节点与第三节点交互以进行读写操作的过程中,如果其他节点要对第一地址(或第二地址)进行需要操作权限的读写操作,则第三节点请求第一地址(或第二地址)的操作权限或者对第一地址(或第二地址)进行需要操作权限的读写操作,第一节点将如何处理,以满足存储一致性要求,保证执行结果全局可见的顺序符合强顺序模型要求。
在一种可能的实施方式中,如果第一节点在请求第一地址的操作权限但未获取第一地址的操作权限时,接收到第三节点请求对第一地址进行需要操作权限的读写操作,或者,请求第一地址的操作权限,则第一节点向第三节点指示未获取第一地址的操作权限。同理,如果第一节点在请求第二地址的操作权限但未获取第二地址的操作权限 之前,接收到第三节点请求对第二地址进行需要操作权限的读写操作,或者,请求第二地址的操作权限,则第一节点向第三节点指示未获取第二地址的操作权限。如图12所示,上述读写操作执行方法还包括:
S1201、在第一节点获取第一地址(或第二地址)的操作权限之前,第一节点从第三节点接收第八消息。
第八消息用于请求对第一地址进行需要操作权限的读写操作,或者说,用于请求第一地址的操作权限。同理,第八消息用于请求对第二地址进行需要操作权限的读写操作,或者说,用于请求第二地址的操作权限。示例性的,第八消息可以是嗅探(snoop)消息。
示例性的,如图13所示,在第三节点向第一节点发送第三消息1(GET_E1)的响应消息(RSP1)以使第一节点获取第一地址(或第二地址)的操作权限之前,第三节点向第一节点发送第八消息(嗅探消息),使得第一节点从第三节点接收第八消息,该第八消息用于请求对第一地址(或第二地址)进行需要操作权限的读写操作,或者,用于请求第一地址(或第二地址)的操作权限。
S1202、第一节点向第三节点发送第九消息。
第九消息用于指示未获取第一地址(或第二地址)的操作权限。第九消息可以为第八消息的响应消息,例如第九消息可以为嗅探响应消息。
示例性的,如图13所示,第一节点向第三节点发送第九消息(嗅探响应消息),第八消息用于指示未获取第一地址(或第二地址)的操作权限。
该实施方式中,如果在第一节点获取第一地址的操作权限之前,第三节点请求对第一地址进行需要操作权限的读写操作,或者,请求第一地址的操作权限,则第一节点向第三节点指示未获取第一地址的操作权限。使得第三节点或其他节点可以对第一地址进行读写操作。同理,如果在第一节点获取第二地址的操作权限之前,第三节点请求对第二地址进行需要操作权限的读写操作,或者,请求第二地址的操作权限,则第一节点向第三节点指示未获取第二地址的操作权限。使得第三节点或其他节点可以对第二地址进行读写操作。
在又一种可能的实施方式中,如果在第一节点开始对第一地址进行写操作但获取第一地址对应的缓存地址之前,第三节点请求对第一地址进行需要操作权限的读写操作,或者,请求第一地址的操作权限,则第一节点在获取了第一地址对应的缓存地址之后,向第三节点发送向第一地址对应的缓存地址写入的数据,或者,指示已经释放第一地址的操作权限。同理,如果在第一节点开始对第二地址进行写操作但获取第二地址对应的缓存地址之前,第三节点请求对第二地址进行需要操作权限的读写操作,或者,请求第二地址的操作权限,则第一节点在获取了第二地址对应的缓存地址之后,向第三节点发送向第二地址对应的缓存地址写入的数据,或者,指示已经释放第二地址的操作权限。如图14所示,上述读写操作执行方法还包括:
S1401、在第一节点开始对第一地址(或第二地址)进行写操作但未获取第一地址(或第二地址)对应的缓存地址,第一节点从第三节点接收第十二消息。
第十二消息用于请求对第一地址(或第二地址)进行需要操作权限的读写操作,或者说,用于请求第一地址(或第二地址)的操作权限。例如,第十二消息可以是嗅 探(snoop)消息。
示例性的,如图15所示,在第三节点向第一节点发送第六消息(包括第一地址(或第二地址)对应的缓存地址)之前,第三节点向第一节点发送第十二消息(嗅探消息),使得第一节点从第三节点接收第十二消息,该第十二消息用于请求对第一地址(或第二地址)进行需要操作权限的读写操作,或者,用于请求第一地址(或第二地址)的操作权限。
S1402、第一节点在获取第一地址(或第二地址)对应的缓存地址之后,向第三节点发送第十三消息。
示例性的,如图15所示,在第一节点接收到来自第一节点的第六消息1(包括第一地址(或第二地址)对应的缓存地址)之后,向第三节点发送第十三消息。第十三消息可以为第十二消息的响应消息,例如第十三消息可以为嗅探响应消息。
在一种可能的实施方式中,第十三消息中可以包括向第一地址(或第二地址)对应的缓存地址写入的数据。第十三消息可以具有第七消息的功能以取代第七消息,即第十三消息还可以用于指示向第三节点释放第一地址(或第二地址)的操作权限。
在另一种可能的实施方式中,第十三消息可以在第七消息(此时第七消息用于指示向第三节点释放第一地址(或第二地址)的操作权限)之后发送,用于指示已经释放第一地址(或第二地址)的操作权限。
在图14和图15所示的实施方式中,如果在第一节点开始对第一地址进行写操作但获取第一地址对应的缓存地址之前,第三节点请求对第一地址进行需要操作权限的读写操作,或者,请求第一地址的操作权限,则第一节点在获取了第一地址对应的缓存地址之后,向第三节点发送向第一地址对应的缓存地址写入的数据,或者,指示已经释放第一地址的操作权限。则第一节点在获取了第一地址对应的缓存地址之后,向第三节点发送向第一地址对应的缓存地址写入的数据,使得第三节点能够直接得到该数据;或者,第一节点在发送第七消息后指示已经释放第一地址的操作权限,使得第三节点或其他节点能够对第一地址进行读写操作。
同理,如果在第一节点开始对第二地址进行写操作但获取第二地址对应的缓存地址之前,第三节点请求对第二地址进行需要操作权限的读写操作,或者,请求第二地址的操作权限,则第一节点在获取了第二地址对应的缓存地址之后,向第三节点发送向第二地址对应的缓存地址写入的数据,或者,指示已经释放第二地址的操作权限。则第一节点在获取了第二地址对应的缓存地址之后,向第三节点发送向第二地址对应的缓存地址写入的数据,使得第三节点能够直接得到该数据;或者,第一节点在发送第七消息后指示已经释放第二地址的操作权限,使得第三节点或其他节点能够对第二地址进行读写操作。
在第一节点从第三节点获取第一地址的操作权限和第二地址的操作权限之后,在预设条件满足时,第一节点可以向第三节点释放第一地址的操作权限和第二地址的操作权限。使得第三节点或其他节点可以对第一地址和第二地址进行读写操作。
在一种可能的实施方式中,预设条件为第三节点请求第一地址和第二地址的操作权限。
示例性的,如果在第一节点获取了第一地址的操作权限之后并且开始对第一地址 进行读写操作之前,第三节点请求对第一地址进行需要操作权限的读写操作,或者,请求第一地址的操作权限,则第一节点向第三节点释放第一地址的操作权限,并重新从第三节点获取第一地址的操作权限。同理,如果在第一节点获取了第二地址的操作权限之后并且开始对第二地址进行读写操作之前,第三节点请求对第二地址进行需要操作权限的读写操作,或者,请求第二地址的操作权限,则第一节点向第三节点释放第二地址的操作权限,并重新从第三节点获取第二地址的操作权限。如图16所示,上述读写操作执行方法还包括:
S1601、在第一节点获取了第一地址(或第二地址)的操作权限之后并且开始对第一地址(或第二地址)进行读写操作之前,第一节点从第三节点接收第十消息。
第十消息用于请求对第一地址(或第二地址)进行需要操作权限的读写操作,或者说,用于请求第一地址(或第二地址)的操作权限。例如,第十消息可以是嗅探(snoop)消息。
示例性的,如图17所示,在第一节点向第三节点发送第四消息1之前,第三节点向第一节点发送第十消息(嗅探消息),使得第一节点从第三节点接收第十消息,该第十消息用于请求对第一地址(或第二地址)进行需要操作权限的读写操作,或者,用于请求第一地址(或第二地址)的操作权限。
S1602、第一节点向第三节点发送第十一消息,并重新从第三节点获取第一地址(或第二地址)的操作权限。
第十一消息用于指示释放第一地址(或第二地址)的操作权限。第十一消息可以为第十消息的响应消息,例如第十一消息可以为嗅探响应消息。
示例性的,如图17所示,第一节点向第三节点发送第十一消息(嗅探响应消息),第十一消息用于指示释放第一地址(或第二地址)的操作权限,并重新向第三节点发送第三消息(GET_E),并从第三节点接收第四消息(RSP1/RSP2)以获取第一地址(或第二地址)的操作权限,并向第三节点发送第四消息的确认消息(ACK1),然后重新执行对第一地址(或第二地址)的读写操作流程以及释放第一地址(或第二地址)的操作权限流程。
在该实施方式中,如果在第一节点获取了第一地址的操作权限之后并且开始对第一地址进行读写操作之前,第三节点请求对第一地址进行需要操作权限的读写操作,或者,请求第一地址的操作权限,则第一节点向第三节点释放第一地址的操作权限,并重新从第三节点获取第一地址的操作权限。在第一节点向第三节点释放第一地址的操作权限之后,使得第三节点可以直接对第一地址进行读写操作。第一节点重新从第三节点获取第一地址的操作权限,则可以继续对第一地址进行读写操作。
同理,如果在第一节点获取了第二地址的操作权限之后并且开始对第二地址进行读写操作之前,第三节点请求对第二地址进行需要操作权限的读写操作,或者,请求第二地址的操作权限,则第一节点向第三节点释放第二地址的操作权限,并重新从第三节点获取第二地址的操作权限。在第一节点向第三节点释放第二地址的操作权限之后,使得第三节点可以直接对第二地址进行读写操作。第一节点重新从第三节点获取第二地址的操作权限,则可以继续对第二地址进行读写操作。
在另一种可能的实施方式中,预设条件为第一节点从第三节点获取第一地址的操 作权限的时间大于或等于第一预设时间,以及第一节点从第三节点获取第二地址的操作权限的时间大于或等于第二预设时间。第一预设时间和第二预设时间可以相同或不同。
如前文所述的,步骤S901和S902无先后执行顺序,对于先执行步骤S902后执行步骤S901,即第一节点预先获取第一地址(或第二地址)的操作权限,当接收到第一消息(或第二消息)时,可以快速对第一地址(或第二地址)执行读写操作。第一节点可以根据历史的读写操作提前获取第一地址(或第二地址)的操作权限。
在第一节点从第三节点获取第一地址的操作权限之后的预设时间内,如果未接收到第一消息,则第一节点向第三节点释放第一地址的操作权限。同理,在第一节点从第三节点获取第二地址的操作权限之后的预设时间内,如果未接收到第二消息,则第一节点向第三节点释放第二地址的操作权限。
示例性的,如图18所示,第一节点在通过第三消息和第四消息获取第一地址(或第二地址)的操作权限后,经过预设时间未接收到第一消息(或第二消息),则第一节点向第三节点发送第十四消息以指示释放第一地址(或第二地址)的操作权限。后续第一节点再接收到第一消息(或第二消息)时,重新执行上述步骤S902-S903对应的交互流程,以完成读写操作。
示例性的,如图19所示,第一节点在通过第三消息和第四消息获取第一地址(或第二地址)的操作权限后,在预设时间内接收到第一消息(或第二消息),则第一节点执行步骤S903对应的交互流程,以完成读写操作。
在该实施方式中,第一节点在接收了来自第二节点的读写请求后不必再执行获取操作权限的流程,可以快速对第一地址(或第二地址)执行读写操作。
应理解,在本申请的各种实施例中,上述各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、设备和方法,可以通过其它的方式实现。例如,以上所描述的设备实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,设备或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到 多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件程序实现时,可以全部或部分地以计算机程序产品的形式来实现。该计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行计算机程序指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或者数据中心通过有线(例如同轴电缆、光纤、数字用户线(Digital Subscriber Line,DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可以用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质(例如,软盘、硬盘、磁带),光介质(例如,DVD)、或者半导体介质(例如固态硬盘(Solid State Disk,SSD))等。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (25)

  1. 一种读写操作执行方法,其特征在于,包括:
    第一节点从第二节点接收第一消息和第二消息;所述第一消息用于请求对第三节点管理的第一地址进行读写操作;所述第二消息用于请求对所述第三节点管理的第二地址进行读写操作;所述第二节点的读写操作的执行顺序约束比所述第三节点的读写操作的执行顺序约束严格;
    所述第一节点从所述第三节点获取所述第一地址的操作权限和所述第二地址的操作权限;
    所述第一节点对所述第一地址和所述第二地址进行读写操作。
  2. 根据权利要求1所述的方法,其特征在于,所述第一节点对所述第一地址和所述第二地址进行读写操作,包括:
    所述第一节点并行对所述第一地址和所述第二地址进行读写操作。
  3. 根据权利要求2所述的方法,其特征在于,所述第一节点并行对所述第一地址和所述第二地址进行读写操作,包括:
    所述第一节点按照所述第一消息和所述第二消息的接收顺序并行对所述第一地址和所述第二地址进行读写操作。
  4. 根据权利要求1-3任一项所述的方法,其特征在于,所述第二节点遵守严格顺序SO约束,第三节点遵守宽松顺序RO约束。
  5. 根据权利要求1-4任一项所述的方法,其特征在于,还包括:
    所述第一节点在完成对所述第一地址的读写操作后,向所述第三节点释放所述第一地址的操作权限;
    所述第一节点在完成对所述第二地址的读写操作后,向所述第三节点释放所述第二地址的操作权限。
  6. 根据权利要求1-5任一项所述的方法,其特征在于,所述第一节点从第三节点获取所述第一地址的操作权限和所述第二地址的操作权限,包括:
    所述第一节点从所述第三节点获取所述第一地址的E态以及所述第二地址的E态。
  7. 根据权利要求1-6任一项所述的方法,其特征在于,还包括:
    所述第一节点在请求所述第一地址的操作权限但未获取所述第一地址的操作权限时,接收到所述第三节点请求对所述第一地址进行需要操作权限的读写操作,或者,请求所述第一地址的操作权限,则所述第一节点向所述第三节点指示未获取所述第一地址的操作权限;
    所述第一节点在请求所述第二地址的操作权限但未获取所述第二地址的操作权限时,接收到所述第三节点请求对所述第二地址进行需要操作权限的读写操作,或者,请求所述第二地址的操作权限,则所述第一节点向所述第三节点指示未获取所述第二地址的操作权限。
  8. 根据权利要求1-7任一项所述的方法,其特征在于,在所述第一节点从所述第三节点获取所述第一地址的操作权限和所述第二地址的操作权限之后,所述方法还包括:
    在预设条件满足时,所述第一节点向所述第三节点释放所述第一地址的操作权限和所述第二地址的操作权限。
  9. 根据权利要求8所述的方法,其特征在于,所述预设条件为所述第三节点请求所述第一地址和所述第二地址的操作权限。
  10. 根据权利要求8所述的方法,其特征在于,所述预设条件为所述第一节点从所述第三节点获取所述第一地址的操作权限的时间大于或等于第一预设时间,以及所述第一节点从所述第三节点获取所述第二地址的操作权限的时间大于或等于第二预设时间。
  11. 根据权利要求1-10任一项所述的方法,其特征在于,所述第二节点为片上系统SoC芯片之外的输入输出I/O设备,所述第一节点为所述SoC芯片中的内存管理单元MMU,所述第三节点为所述SoC芯片中的内存控制器或所述内存控制器中的本地代理HA。
  12. 根据权利要求1-10任一项所述的方法,其特征在于,所述第二节点为SoC芯片中的处理器,所述第一节点为所述SoC芯片中的片上互联网络NOC或者所述处理器的接口模块,所述第三节点为所述SoC芯片中的内存控制器或所述内存控制器中的HA。
  13. 一种片上系统SoC芯片,其特征在于,包括:第一节点和内存控制器,
    所述第一节点用于:
    从第二节点接收第一消息和第二消息;所述第一消息用于请求对所述内存控制器管理的第一地址进行读写操作;所述第二消息用于请求对所述内存控制器管理的第二地址进行读写操作;所述第二节点的读写操作的执行顺序约束比所述内存控制器的读写操作的执行顺序约束严格;
    从所述内存控制器获取所述第一地址的操作权限和所述第二地址的操作权限;
    对所述第一地址和所述第二地址进行读写操作。
  14. 根据权利要求13所述的SoC芯片,其特征在于,所述第一节点具体用于:
    并行对所述第一地址和所述第二地址进行读写操作。
  15. 根据权利要求14所述的SoC芯片,其特征在于,所述第一节点具体用于:
    按照所述第一消息和所述第二消息的接收顺序并行对所述第一地址和所述第二地址进行读写操作。
  16. 根据权利要求13-15任一项所述的SoC芯片,其特征在于,所述第二节点遵守严格顺序SO约束,内存控制器遵守宽松顺序RO约束。
  17. 根据权利要求13-16任一项所述的SoC芯片,其特征在于,所述第一节点还用于:
    在完成对所述第一地址的读写操作后,向所述内存控制器释放所述第一地址的操作权限;
    在完成对所述第二地址的读写操作后,向所述内存控制器释放所述第二地址的操作权限。
  18. 根据权利要求13-17任一项所述的SoC芯片,其特征在于,所述第一节点具体用于:
    从所述内存控制器获取所述第一地址的E态以及所述第二地址的E态。
  19. 根据权利要求13-18任一项所述的SoC芯片,其特征在于,所述第一节点还用 于:
    在请求所述第一地址的操作权限但未获取所述第一地址的操作权限时,接收到所述内存控制器请求对所述第一地址进行需要操作权限的读写操作,或者,请求所述第一地址的操作权限,则向所述内存控制器指示未获取所述第一地址的操作权限;
    在请求所述第二地址的操作权限但未获取所述第二地址的操作权限时,接收到所述内存控制器请求对所述第二地址进行需要操作权限的读写操作,或者,请求所述第二地址的操作权限,则向所述内存控制器指示未获取所述第二地址的操作权限。
  20. 根据权利要求13-19任一项所述的SoC芯片,其特征在于,在从所述内存控制器获取所述第一地址的操作权限和所述第二地址的操作权限之后,所述第一节点还用于:
    在预设条件满足时,向所述内存控制器释放所述第一地址的操作权限和所述第二地址的操作权限。
  21. 根据权利要求20所述的SoC芯片,其特征在于,所述预设条件为所述内存控制器请求所述第一地址和所述第二地址的操作权限。
  22. 根据权利要求20所述的SoC芯片,其特征在于,所述预设条件为所述第一节点从所述内存控制器获取所述第一地址的操作权限的时间大于或等于第一预设时间,以及所述第一节点从所述内存控制器获取所述第二地址的操作权限的时间大于或等于第二预设时间。
  23. 根据权利要求13-22任一项所述的SoC芯片,其特征在于,所述第二节点为所述SoC芯片之外的输入输出I/O设备,所述第一节点为所述SoC芯片中的内存管理单元MMU。
  24. 根据权利要求13-22任一项所述的SoC芯片,其特征在于,所述第二节点为所述SoC芯片中的处理器,所述第一节点为所述SoC芯片中的片上互联网络NOC或者所述处理器的接口模块。
  25. 根据权利要求13-24任一项所述的SoC芯片,其特征在于,所述第一节点包括顺序处理模块、操作权限判断模块和数据缓存判断模块;
    所述顺序处理模块用于记录接收所述第一消息和所述第二消息的顺序;
    所述操作权限判断模块用于记录是否收到所述第一地址的操作权限和所述第二地址的操作权限,并根据所述顺序来确定对所述第一地址和所述第二地址进行读写操作的先后顺序;
    所述数据缓存判断模块用于记录是否收到所述第一地址对应的缓存地址的标识以及所述第二地址对应的缓存地址的标识,从而确定是否发送数据。
PCT/CN2021/084556 2021-03-31 2021-03-31 读写操作执行方法和SoC芯片 WO2022205130A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
PCT/CN2021/084556 WO2022205130A1 (zh) 2021-03-31 2021-03-31 读写操作执行方法和SoC芯片
CN202180093103.5A CN116940934A (zh) 2021-03-31 2021-03-31 读写操作执行方法和SoC芯片
EP21933798.7A EP4310683A4 (en) 2021-03-31 2021-03-31 METHOD FOR PERFORMING A READ/WRITE OPERATION AND SOC CHIP
US18/477,110 US20240028528A1 (en) 2021-03-31 2023-09-28 Read/write operation execution method and soc chip

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/084556 WO2022205130A1 (zh) 2021-03-31 2021-03-31 读写操作执行方法和SoC芯片

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/477,110 Continuation US20240028528A1 (en) 2021-03-31 2023-09-28 Read/write operation execution method and soc chip

Publications (1)

Publication Number Publication Date
WO2022205130A1 true WO2022205130A1 (zh) 2022-10-06

Family

ID=83455366

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/084556 WO2022205130A1 (zh) 2021-03-31 2021-03-31 读写操作执行方法和SoC芯片

Country Status (4)

Country Link
US (1) US20240028528A1 (zh)
EP (1) EP4310683A4 (zh)
CN (1) CN116940934A (zh)
WO (1) WO2022205130A1 (zh)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101176083A (zh) * 2005-03-23 2008-05-07 高通股份有限公司 在弱有序处理系统中强制执行强有序请求
US20100199048A1 (en) * 2009-02-04 2010-08-05 Sun Microsystems, Inc. Speculative writestream transaction
CN106796561A (zh) * 2014-09-12 2017-05-31 高通股份有限公司 将强有序写入事务桥接到弱有序域中的装置和相关设备、方法和计算机可读媒体

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050289306A1 (en) * 2004-06-28 2005-12-29 Sridhar Muthrasanallur Memory read requests passing memory writes

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101176083A (zh) * 2005-03-23 2008-05-07 高通股份有限公司 在弱有序处理系统中强制执行强有序请求
US20100199048A1 (en) * 2009-02-04 2010-08-05 Sun Microsystems, Inc. Speculative writestream transaction
CN106796561A (zh) * 2014-09-12 2017-05-31 高通股份有限公司 将强有序写入事务桥接到弱有序域中的装置和相关设备、方法和计算机可读媒体

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4310683A4 *

Also Published As

Publication number Publication date
EP4310683A4 (en) 2024-05-01
CN116940934A (zh) 2023-10-24
EP4310683A1 (en) 2024-01-24
US20240028528A1 (en) 2024-01-25

Similar Documents

Publication Publication Date Title
US11822786B2 (en) Delayed snoop for improved multi-process false sharing parallel thread performance
US10169080B2 (en) Method for work scheduling in a multi-chip system
US7024521B2 (en) Managing sparse directory evictions in multiprocessor systems via memory locking
US9529532B2 (en) Method and apparatus for memory allocation in a multi-node system
US7613882B1 (en) Fast invalidation for cache coherency in distributed shared memory system
US20110004732A1 (en) DMA in Distributed Shared Memory System
US20150254182A1 (en) Multi-core network processor interconnect with multi-node connection
US10592459B2 (en) Method and system for ordering I/O access in a multi-node environment
US9372800B2 (en) Inter-chip interconnect protocol for a multi-chip system
US6920532B2 (en) Cache coherence directory eviction mechanisms for modified copies of memory lines in multiprocessor systems
JP7193547B2 (ja) キャッシュ・メモリ動作の調整
US6934814B2 (en) Cache coherence directory eviction mechanisms in multiprocessor systems which maintain transaction ordering
JP2001167077A (ja) ネットワークシステムにおけるデータアクセス方法、ネットワークシステムおよび記録媒体
US6925536B2 (en) Cache coherence directory eviction mechanisms for unmodified copies of memory lines in multiprocessor systems
US9183150B2 (en) Memory sharing by processors
US20220114098A1 (en) System, apparatus and methods for performing shared memory operations
US6647469B1 (en) Using read current transactions for improved performance in directory-based coherent I/O systems
US20030182509A1 (en) Methods and apparatus for speculative probing at a request cluster
CN114356839B (zh) 处理写操作的方法、设备、处理器及设备可读存储介质
WO2022205130A1 (zh) 读写操作执行方法和SoC芯片
EP4220375A1 (en) Systems, methods, and devices for queue management with a coherent interface
CN110083548B (zh) 数据处理方法及相关网元、设备、系统
KR20200143922A (ko) 메모리 카드 및 이를 이용한 데이터 처리 방법
WO2022246848A1 (zh) 分布式缓存系统和数据缓存方法
CN109597776B (zh) 一种数据操作方法、内存控制器以及多处理器系统

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21933798

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 202180093103.5

Country of ref document: CN

WWE Wipo information: entry into national phase

Ref document number: 2021933798

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2021933798

Country of ref document: EP

Effective date: 20231019

NENP Non-entry into the national phase

Ref country code: DE