WO2022205130A1 - Read-write operation execution method and SoC chip - Google Patents
- Publication number: WO2022205130A1 (application PCT/CN2021/084556)
- Authority: WIPO (PCT)
- Prior art keywords: address, node, operation authority, read, message
Classifications
- G06F9/54 — Interprogram communication
- G06F13/1668 — Details of memory controller
- G06F13/1642 — Handling requests for access to memory bus based on arbitration, with request queuing
- G06F13/1663 — Access to shared memory (arbitration in a multiprocessor architecture)
- G06F13/40 — Bus structure
- G06F12/1027 — Address translation using associative or pseudo-associative means, e.g. translation look-aside buffer [TLB]
- G06F13/1626 — Arbitration with latency improvement by reordering requests
Definitions
- The present application relates to the field of storage, and in particular to a method for performing read and write operations and a system on chip (SoC).
- Data can be passed between multiple processes (software) by accessing shared memory: the processes send read and write commands to hardware (for example, a central processing unit (CPU)), and the hardware performs the read and write operations on the shared memory.
- A memory consistency model places requirements on the execution order of read and write operations so that the execution results meet software expectations. Different consistency models differ in how strict an execution order they require.
- Nodes of a stricter model (referred to here as the strong order model) comply with strict order (SO) constraints, while nodes of a weaker model (the weak order model) comply with relaxed order (RO) constraints.
- When operations originate from a strong order node, the read and write operations should still be performed in the weak order model according to the execution order of the strong order model, so that the order in which the execution results become globally observable (GO) conforms to the requirements of the strong order model.
- Embodiments of the present application provide a method for performing read and write operations and a SoC chip, which ensure that the order in which the execution results of read and write operations performed by a node complying with the RO constraint become globally visible meets the requirements of a node complying with the SO constraint.
- A first aspect provides a method for performing read and write operations, comprising: a first node receives a first message and a second message from a second node, where the first message requests a read or write operation on a first address managed by a third node, the second message requests a read or write operation on a second address managed by the third node, and the execution-order constraints on the read and write operations of the second node are stricter than those of the third node; the first node obtains the operation authority of the first address and the operation authority of the second address from the third node; and the first node performs the read and write operations on the first address and the second address.
- In this method, the second node complies with the SO constraint and the third node complies with the RO constraint. Because the first node obtains the operation authority of both addresses from the third node, the first node participates in cache-coherency management: no other node can perform operations on the first or second address that require operation authority. The execution order of the read and write operations on the two addresses, and therefore the order in which their results become globally visible, is controlled by the first node. As a result, the order in which the execution results of read and write operations performed by the RO-constrained node become globally visible meets the requirements of the SO-constrained node.
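As a rough illustration of the first aspect, operation authority behaves like an exclusive lock held per address: while the first node holds it, requests from other nodes for that address are refused, so the first node alone decides the execution order. The names below are illustrative, not from the patent:

```c
#include <assert.h>

/* Illustrative model of per-address operation authority. While the
 * first node holds exclusive authority over an address, any other
 * node's attempt to acquire it is refused, so the first node controls
 * the order in which operations on that address become visible. */
enum { NOBODY = 0, FIRST_NODE = 1, OTHER_NODE = 2 };
static int authority_holder = NOBODY;

static int try_acquire(int node) {
    if (authority_holder == NOBODY) { authority_holder = node; return 1; }
    return authority_holder == node;  /* refuse all other holders */
}

static void release_authority(int node) {
    if (authority_holder == node) authority_holder = NOBODY;
}
```

Once the first node releases the authority, the third node or other nodes can acquire it again, mirroring the release step described in the method.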
- The first node may perform the read and write operations on the first address and the second address in parallel.
- This allows read and write requests from the SO-constrained node to be processed in parallel, improving the transmission bandwidth and interaction efficiency between the SO-constrained node (the second node) and the RO-constrained node (the third node).
- Specifically, the first node may perform the read and write operations on the first address and the second address in parallel according to the order in which the first message and the second message were received.
- This ensures that the globally visible order of the execution results conforms to the requirements of the strong order model.
- The second node obeys the strict order (SO) constraint and the third node obeys the relaxed order (RO) constraint. This explains why the execution-order constraints on the read and write operations of the second node are stricter than those of the third node.
- The method further includes: after completing the read and write operations on the first address, the first node releases the operation authority of the first address to the third node, so that the third node or other nodes can continue to perform read and write operations on the first address; likewise, after completing the read and write operations on the second address, the first node releases the operation authority of the second address to the third node.
- Obtaining the operation authority may consist of the first node acquiring the E state of the first address and the E state of the second address from the third node. This provides a specific form for the operation authority of the two addresses.
- If the first node has requested the operation authority of the first address but has not yet obtained it, and then receives from the third node either a request to perform an authority-requiring read or write operation on the first address or a request for the operation authority of the first address, the first node indicates to the third node that it has not obtained the operation authority of the first address, so that the third node or other nodes can perform read and write operations on the first address.
- The same applies to the second address: if the first node has requested but not yet obtained its operation authority and receives such a request from the third node, the first node indicates to the third node that it has not obtained the operation authority of the second address, so that the third node or other nodes can perform read and write operations on the second address.
- The method further includes: when a preset condition is satisfied, the first node releases the operation authority of the first address and the operation authority of the second address to the third node, enabling the third node or other nodes to perform read and write operations on those addresses.
- The preset condition may be that the third node requests the operation authority of the first address and the second address.
- Alternatively, the preset condition may be that the first node has held the operation authority of the first address for at least a first preset time and has held the operation authority of the second address for at least a second preset time.
- The method further includes: if, after the first node obtains the operation authority of the first address but before it starts the read and write operations on that address, the third node requests an authority-requiring read or write operation on the first address or requests the operation authority of the first address, the first node releases the operation authority of the first address to the third node and later obtains it from the third node again; once the third node obtains the operation authority, it can continue to perform read and write operations on the first address.
- Likewise for the second address: if the third node makes such a request after the first node obtains the operation authority of the second address but before it starts operating on that address, the first node releases the operation authority of the second address to the third node and obtains it again later; once it has re-obtained the authority, the first node can continue to perform read and write operations on the second address.
- The method further includes: if the first node has started a write operation on the first address but has not yet obtained the cache address corresponding to the first address, and the third node requests an authority-requiring read or write operation on the first address or requests its operation authority, then after obtaining the cache address the first node either sends the data written at that cache address to the third node or indicates that it has released the operation authority of the first address, enabling the third node or other nodes to perform read and write operations on the first address.
- Likewise for the second address: if such a request arrives while the first node has started a write operation on the second address but has not yet obtained the corresponding cache address, the first node, after obtaining the cache address, sends the data written there to the third node or indicates that the operation authority of the second address has been released, enabling the third node or other nodes to perform read and write operations on the second address.
- The second node may be an input/output (I/O) device outside the SoC chip, the first node may be a memory management unit (MMU) in the SoC chip (for example, a system memory management unit (SMMU)), and the third node may be the memory controller in the SoC chip or the home agent (HA) in the memory controller.
- Alternatively, the second node may be a processor in the SoC chip, the first node may be the network on chip (NOC) or an interface module of the processor in the SoC chip, and the third node may be the memory controller in the SoC chip or the HA in the memory controller.
- A second aspect provides a system on chip (SoC) comprising a first node and a memory controller. The first node is configured to: receive a first message and a second message from a second node, where the first message requests a read or write operation on a first address managed by the memory controller, the second message requests a read or write operation on a second address managed by the memory controller, and the execution-order constraints on the read and write operations of the second node are stricter than those of the memory controller; obtain the operation authority of the first address and the operation authority of the second address from the memory controller; and perform the read and write operations on the first address and the second address.
- the first node is specifically configured to perform read and write operations on the first address and the second address in parallel.
- the first node is specifically configured to perform read and write operations on the first address and the second address in parallel according to the receiving order of the first message and the second message.
- the second node obeys the strict order SO constraint
- the memory controller obeys the relaxed order RO constraint
- the first node is further configured to: release the operation authority of the first address to the memory controller after completing the read and write operations on the first address; after completing the read and write operations on the second address After that, the operation authority of the second address is released to the memory controller.
- the first node is specifically configured to: acquire the E state of the first address and the E state of the second address from the memory controller.
- The first node is further configured to: when it has requested but not obtained the operation authority of the first address, upon receiving from the memory controller a request to perform an authority-requiring read or write operation on the first address or a request for that operation authority, indicate to the memory controller that the operation authority of the first address has not been obtained; and likewise, when it has requested but not obtained the operation authority of the second address, upon receiving such a request from the memory controller, indicate to the memory controller that the operation authority of the second address has not been obtained.
- After obtaining the operation authority of the first address and the operation authority of the second address from the memory controller, the first node is further configured to release both operation authorities to the memory controller when a preset condition is satisfied.
- the preset condition is that the memory controller requests the operation authority of the first address and the second address.
- Alternatively, the preset condition is that the first node has held the operation authority of the first address, obtained from the memory controller, for at least a first preset time and has held the operation authority of the second address for at least a second preset time.
- the second node is an input/output I/O device outside the SoC chip
- the first node is a memory management unit MMU in the SoC chip.
- the second node is a processor in the SoC chip
- the first node is an on-chip interconnection network NOC in the SoC chip or an interface module of the processor.
- The first node includes a sequence processing module, an operation authority judgment module and a data cache judgment module. The sequence processing module records the order in which the first message and the second message are received; the operation authority judgment module records whether the operation authority of the first address and the operation authority of the second address have been received, and determines the order of the read and write operations on the two addresses according to the recorded sequence; the data cache judgment module records whether the identifier of the cache address corresponding to the first address and the identifier of the cache address corresponding to the second address have been received, and determines whether data can be sent.
- FIG. 1 is a schematic structural diagram of a chip system in which an I/O device communicates with a SoC chip according to an embodiment of the present application;
- FIG. 2 is a schematic structural diagram of an SMMU provided by an embodiment of the present application;
- FIG. 3 is a schematic diagram of RO constraints and SO constraints of different storage consistency models according to an embodiment of the present application;
- FIG. 4 is a first schematic diagram of making the globally visible order of execution results in a weak order model meet the requirements of a strong order model according to an embodiment of the present application;
- FIG. 5 is a second schematic diagram of making the globally visible order of execution results in a weak order model meet the requirements of a strong order model according to an embodiment of the present application;
- FIG. 6 is a schematic diagram of communication between different modules within a same storage consistency model provided by an embodiment of the present application;
- FIG. 7 is a schematic diagram of an improvement to a weak order model provided by an embodiment of the present application;
- FIG. 8 is a schematic diagram of an improvement to the same storage consistency model provided by an embodiment of the present application;
- FIG. 9 is a first schematic flowchart of a method for performing read and write operations provided by an embodiment of the present application;
- FIG. 10 is a second schematic flowchart of a method for performing read and write operations provided by an embodiment of the present application;
- FIG. 11 is a third schematic flowchart of a method for performing read and write operations provided by an embodiment of the present application;
- FIG. 12 is a fourth schematic flowchart of a method for performing read and write operations provided by an embodiment of the present application;
- FIG. 13 is a fifth schematic flowchart of a method for performing read and write operations provided by an embodiment of the present application;
- FIG. 14 is a sixth schematic flowchart of a method for performing read and write operations provided by an embodiment of the present application;
- FIG. 15 is a seventh schematic flowchart of a method for performing read and write operations provided by an embodiment of the present application;
- FIG. 16 is an eighth schematic flowchart of a method for performing read and write operations provided by an embodiment of the present application;
- FIG. 17 is a ninth schematic flowchart of a method for performing read and write operations provided by an embodiment of the present application;
- FIG. 19 is an eleventh schematic flowchart of a method for performing read and write operations provided by an embodiment of the present application.
- The order in which the execution results of read and write operations (that is, whether each operation has been performed) become visible to other nodes matters. For example, when a node performs read and write operations on two addresses one after another (two operations), or performs two successive read and write operations on one address, other nodes must not only learn that both operations were performed, but must also observe (that is, see as globally visible) the execution results in an order that conforms to software expectations; this is what meeting the storage consistency requirement means.
- Suppose a node writes a first address and then a second address. The correct globally visible orders of the execution results are: both the first address and the second address have been written; only the first address has been written; or neither address has been written.
- The incorrect globally visible order is: only the second address has been written.
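The allowed observation states above can be encoded as a small check; the helper name is illustrative, not from the patent:

```c
#include <assert.h>
#include <stdbool.h>

/* A writer writes address A first and address B second. Under the
 * strong order model an observer may see {both written}, {only A
 * written}, or {neither written}; seeing {only B written} would mean
 * B became globally visible before A, which is forbidden. */
static bool observation_allowed(bool a_written, bool b_written)
{
    return !(b_written && !a_written);
}
```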
- The processor is fast relative to memory. If the processor, when reading from or writing to memory, waited for each operation to complete before processing other tasks, it would be blocked and its efficiency would drop. Therefore, each processor can be configured with a cache, which is much faster but smaller than memory.
- When the processor writes data, it writes the data to the cache first, and a direct memory access (DMA) device then stores the data to memory; similarly, when the processor reads data from memory, the DMA device first moves the data from memory into the cache, and the processor then reads it from the cache.
- Cache-coherent devices comply with the MESI protocol, which defines four states for a cache line (the smallest unit in the cache): the E (Exclusive) state, the M (Modified) state, the S (Shared) state and the I (Invalid) state.
- The E state indicates that the cache line is valid, the data in the cache is consistent with the data in memory, and the data exists only in this cache.
- The M state indicates that the cache line is valid, the data has been modified so that the data in the cache is inconsistent with the data in memory, and the data exists only in this cache.
- The S state indicates that the cache line is valid, the data in the cache is consistent with the data in memory, and the data exists in multiple caches.
- The I state indicates that the cache line is invalid.
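A minimal sketch of these four states and the two distinctions they encode (clean vs. dirty, sole vs. shared copy); names are illustrative:

```c
#include <assert.h>

/* The four MESI cache-line states described above. */
typedef enum { MESI_M, MESI_E, MESI_S, MESI_I } mesi_state;

/* E and S are "clean": the cached copy matches memory.
 * M is "dirty": the line was modified locally, so memory is stale.
 * I holds no valid copy at all. */
static int clean_copy(mesi_state s) { return s == MESI_E || s == MESI_S; }

/* E and M guarantee this cache holds the only copy of the line. */
static int only_copy(mesi_state s)  { return s == MESI_E || s == MESI_M; }
```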
- Ordered from strong to weak by the strictness of the required execution order, storage consistency models include the sequential consistency (SC) model, the total store order (TSO) model, the relaxed model (RM), and others.
- The SC model requires that the order of read and write operations on shared memory performed by the hardware be strictly consistent with the order required by the software instructions.
- The TSO model builds on the SC model by introducing a cache mechanism and relaxes the order constraint on write-read pairs (write first, then read); that is, the read of a write-read pair may complete before the write.
- The RM model is the most relaxed: it imposes no order constraints on any read and write operations, which simplifies the hardware implementation. When ordering is needed, a fence can be used to block subsequent operations and guarantee execution order.
- In general, the strong order model imposes order constraints on certain read-write combinations (write-write, write-read, read-write, read-read, i.e., write then write, write then read, read then write, read then read) that the weak order model does not impose.
- Therefore, when requests originate from a strong order node, the weak order model should execute them serially in the strong order model's execution order, so that the order in which the execution results become globally visible conforms to the requirements of the strong order model.
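The per-model relaxations above can be tabulated as simple predicates; the helper names are illustrative, not from the patent:

```c
#include <assert.h>

/* Which reorderings each model permits, per the description above:
 * SC permits none; TSO relaxes only write->read; RM relaxes all. */
typedef enum { MODEL_SC, MODEL_TSO, MODEL_RM } mem_model;

static int may_reorder_write_read(mem_model m)  { return m != MODEL_SC; }
static int may_reorder_write_write(mem_model m) { return m == MODEL_RM; }
static int may_reorder_read_read(mem_model m)   { return m == MODEL_RM; }
static int may_reorder_read_write(mem_model m)  { return m == MODEL_RM; }
```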
- A chip system provided by an embodiment of the present application includes a SoC chip 12 and an input/output (I/O) device 11 outside the SoC chip.
- When the I/O device 11 and the SoC chip 12 are connected through the peripheral component interconnect express (PCIe) high-speed serial expansion bus, the I/O device 11 can be a PCIe board; when they are connected through a network transmission protocol, the I/O device 11 may be an Ethernet interface.
- For example, the I/O device 11 may use the x86 architecture, whose corresponding strong order model is the TSO model, while the SoC chip 12 uses the ARM architecture, whose corresponding weak order model is the RM model.
- The SoC chip 12 may include a graphics processing unit (GPU) 120, a central processing unit (CPU) 121, a neural network processing unit (NPU) 122, a system memory management unit (SMMU) 123, a memory controller 124, a memory 125 and, optionally, a network on chip (NOC) 126.
- The GPU 120, CPU 121, NPU 122, SMMU 123 and memory controller 124 are interconnected through the NOC 126.
- GPU 120 is a graphics processing core
- CPU 121 is a general-purpose processor core
- NPU 122 is a dedicated processor core for artificial intelligence (AI)
- SMMU 123 is a system memory management unit, which is used to provide address translation functions based on page tables
- the SMMU 123 provides an address translation function between the I/O device 11 and the SoC chip 12
- the memory controller 124 is used to manage data read and write operations in the memory 125
- The memory controller 124 may also include a home agent (HA), which is responsible for the cache coherency management of the SoC chip; the HA can be incorporated into the memory controller 124 or independently mounted on the NOC 126.
- The memory 125 can be a main memory or an on-chip memory.
- The SMMU 123 may include a translation lookaside buffer (TLB) 211 and an address translation circuit 212.
- The TLB 211 stores recent translations from virtual memory to physical memory and can therefore be called an address translation cache; it reduces the time needed to access memory locations.
- The address translation circuit 212 performs the translation of virtual addresses to physical addresses.
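A lookup in an address translation cache of the kind the TLB provides can be sketched as follows; the direct-mapped organization, sizes and names are illustrative, not from the patent (a real SMMU TLB is typically associative):

```c
#include <assert.h>

/* Illustrative direct-mapped TLB: virtual page number (vpn) ->
 * physical frame number (pfn). */
#define TLB_ENTRIES 16u

typedef struct { unsigned vpn, pfn; int valid; } tlb_entry;
static tlb_entry tlb[TLB_ENTRIES];

/* Install a translation, evicting whatever shared its slot. */
static void tlb_fill(unsigned vpn, unsigned pfn) {
    tlb_entry *e = &tlb[vpn % TLB_ENTRIES];
    e->vpn = vpn; e->pfn = pfn; e->valid = 1;
}

/* Returns 1 on hit and writes the translation; 0 means the address
 * translation circuit must resolve the address via the page tables. */
static int tlb_lookup(unsigned vpn, unsigned *pfn) {
    const tlb_entry *e = &tlb[vpn % TLB_ENTRIES];
    if (e->valid && e->vpn == vpn) { *pfn = e->pfn; return 1; }
    return 0;
}
```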
- the I/O device (hereinafter referred to as the second node) sends a read to the memory controller in the SoC chip (hereinafter referred to as the third node) through the SMMU in the SoC chip (hereinafter referred to as the first node)
- a write request means that a device in the strong order model sends a read and write request to a device in the weak order model.
- both models allow out-of-order write-read requests (i.e., write first and then read), which can be processed in parallel in both models, so for either model there is no impact on the transmission bandwidth and interaction efficiency between the two.
- after the SO-constrained read and write requests of I/O devices enter the SoC chip, the corresponding read and write operations must still be executed in order to ensure that the order in which the execution results are globally visible meets the requirements of the strong order model.
- the SMMU acts as an interface between different storage consistency models: the I/O device in the strong order model sends write request 1 and write request 2 to the SMMU in parallel, and the SMMU sends write request 1 and write request 2 to the memory controller in the weak order model in serial order after two handshakes; that is, the SMMU sends write request 1 and the corresponding data to the memory controller in the first handshake, and sends write request 2 and the corresponding data to the memory controller in the second handshake after the first completes.
- write request 1 and write request 2 indicate that a write operation is to be performed
- write response 1 and write response 2 indicate that data can be received, together with the location for data storage
- write data 1 and write data 2 include the data to be written and the location of data storage.
- Write Completion 1 and Write Completion 2 indicate that the write operation is complete
- acknowledgement (ACK) 1 and ACK2 indicate that the write completion is received.
- the processing flow in Figure 3 is improved as follows: after the SMMU sends write request 1 to the memory controller and receives write response 1, it does not need to wait for the completion of write request 1; it can first send write request 2 and receive write response 2. The SMMU then sends write data 1 and write data 2 in parallel, receives write completion 1 and write completion 2 in parallel, and sends ACK1 and ACK2 in parallel, where ACK1 is sent earlier than ACK2 to inform the memory controller that the execution result of the write request has become globally visible.
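For illustration only, the serial two-handshake flow of Figure 3 and the improved pipelined flow described above can be contrasted as event sequences; the function names, event labels, and the representation of time as list order are hypothetical and not part of the disclosure.

```python
# Hypothetical sketch contrasting the serial flow of Figure 3 with the
# improved flow: in the improved flow the SMMU issues write request 2
# without waiting for the completion of write request 1, and only the
# final ACKs remain ordered (ACK1 before ACK2).

def serial_flow():
    # request -> response -> data -> completion -> ACK, fully serialized
    events = []
    for i in (1, 2):
        events += [f"WriteReq{i}", f"WriteRsp{i}", f"WriteData{i}",
                   f"WriteDone{i}", f"ACK{i}"]
    return events

def improved_flow():
    # both requests pipelined; data and completions in parallel; ACKs ordered
    return ["WriteReq1", "WriteRsp1", "WriteReq2", "WriteRsp2",
            "WriteData1", "WriteData2", "WriteDone1", "WriteDone2",
            "ACK1", "ACK2"]

assert serial_flow().index("WriteReq2") > serial_flow().index("ACK1")
assert improved_flow().index("WriteReq2") < improved_flow().index("WriteData1")
assert improved_flow().index("ACK1") < improved_flow().index("ACK2")
```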
- the delay of the SMMU waiting for the memory controller to return a write response is still very large, which still reduces the transmission bandwidth and interaction efficiency between devices with different storage consistency models.
- the SMMU and the memory controller still need at least one handshake, and the sequential processing mechanism is more cumbersome.
- when a module that complies with the SO constraint sends read and write requests to a module that complies with the RO constraint, the transmission bandwidth and interaction efficiency within the model will also be reduced.
- the processors (such as GPU 120, CPU 121, NPU 122, etc.) belonging to the weak order model in the SoC chip in Figure 1 (hereinafter referred to as the second node) sending read and write requests through interfaces (such as NOC 126 or interface modules in the processor, hereinafter referred to as the first node) to the memory controller 124 (hereinafter referred to as the third node) is taken as an example to illustrate a typical application scenario of sending read and write requests between different modules within the same storage consistency model.
- the embodiments of the present application provide a method for executing read and write operations, which can be applied to communication between different storage consistency models, and can also be applied to communication between different modules within the same storage consistency model, so as to optimize the transmission bandwidth and interaction efficiency within the model.
- for the communication between different storage consistency models, as shown in Figure 7, by extending the cache coherence (CC) domain of the weak order model, the interface node SMMU between different models is also included in the CC range, and sequential processing is completed at the SMMU, so that parallel read and write requests from the strong order model can also be processed in parallel in the weak order model, improving the transmission bandwidth and interaction efficiency between devices that comply with the SO constraint and devices that comply with the RO constraint.
- since the SMMU completes the sequential processing, the memory controller of the weak order model does not need a sequential processing mechanism; when the memory controller changes, it is not necessary to re-establish the sequential processing mechanism, so the versatility and scalability are stronger.
- read and write requests that have a clear sequence relationship in software are constrained in order by the strong order model where the I/O device is located; within the strong order model, read and write requests issued in sequence can be processed efficiently in parallel.
- only the memory controller inside the weak order model participates in cache coherency management, so it is impossible to perform cache coherence processing in the SMMU to ensure that the globally visible order of the execution results in the weak order model meets the requirements of the strong order model; the memory controller can only achieve cache coherence through the handshake process, resulting in reduced transmission bandwidth and interaction efficiency between devices with different storage consistency models.
- the processing authority of cache coherence is moved from the memory controller to the SMMU; after the SMMU receives the read and write requests of the strong order model, the sequential processing can be completed, ensuring that the globally visible order of the execution results in the weak order model meets the requirements of the strong order model.
- in this way, serial handshakes with I/O devices can be avoided, and read and write requests can be processed in parallel in the weak order model, improving parallel processing efficiency.
- the scope of the CC that complies with the RO constraint can be extended, so that the interface between the module (such as the processor) located inside the model that complies with the SO constraint and the module that complies with the RO constraint is also included in the CC scope; read and write requests from modules that comply with the SO constraint can then also be processed in parallel at the interface and at the modules that comply with the RO constraint, optimizing the transmission bandwidth and interaction efficiency within the model.
- the present application moves the processing authority of cache coherence from the memory controller to the interface between the processor and the memory controller; after the interface receives read and write requests from the processor, the sequential processing can be completed, which ensures that the globally visible order of the execution results at the module that complies with the RO constraint meets the requirements of the module that complies with the SO constraint.
- the read-write operation execution method includes:
- the first node receives the first message and the second message from the second node.
- the first message is used for requesting to perform read and write operations on the first address managed by the third node
- the second message is used for requesting read and write operations on the second address managed by the third node.
- the execution order constraint of the read and write operations of the second node is stricter than that of the third node; that is, the second node complies with the SO constraint and the third node complies with the RO constraint. Since the second node complies with the SO constraint, the first message is in fact used to request that read and write operations be performed on the first address managed by the third node in strict order, and the second message is used to request that read and write operations be performed on the second address managed by the third node in strict order.
- the second node refers to the device that obeys the SO constraint in the strong-order model
- the third node refers to the device that obeys the RO constraint in the weak-order model.
- the first node refers to an interface node between the strong-order model and the weak-order model, and the first node may be an independent device or an interface module in the second node or the third node.
- the second node may be the I/O device 11 located outside the SoC chip 12 in FIG. 1 for sending read and write requests;
- the third node may be the memory controller 124 in the SoC chip 12 in FIG. 1 or the HA in the memory controller 124, which is used for cache coherency management, such as managing the directory of the storage space;
- the first node may be an MMU, such as the SMMU 123 in the SoC chip 12 of FIG. 1, or the read-write operation execution circuit 213 in the SMMU 123 shown in FIG. 10; the read-write operation execution circuit 213 is newly added to the SMMU 123 shown in FIG. 2 and is used to execute the read-write operation execution method provided by the present application.
- FIG. 10 provides a schematic structural diagram of a read-write operation execution circuit 213 .
- the read-write operation execution circuit 213 includes a sequence processing module 2131 , an operation authority judgment module 2132 and a data cache judgment module 2133 .
- the sequence processing module 2131 is used to record the sequence in which the first message and the second message are received, so that the operation authority judgment module 2132 can perform the read and write operations in order.
- the operation authority judgment module 2132 is used to record whether the operation authority of the first address (e.g., the E state) and the operation authority of the second address (e.g., the E state) have been received, and to determine the order of read and write operations on the first address and the second address according to the sequence of the first message and the second message recorded by the sequence processing module 2131. For example, if the sequence processing module 2131 records that the first message was received before the second message, a write-back (WriteBack) message for the first address is sent first, followed by a write-back message for the second address.
- for the write operation, the write-back message may include the type of the write operation and the target address (the first address or the second address); for the read operation, the write-back message may include the type of the read operation and the target address.
- the data cache judgment module 2133 is used to record whether the identifier of the cache address corresponding to the first address (for example, a data buffer identifier (data buffer ID, DBID)) and the identifier of the cache address corresponding to the second address (for example, a DBID) returned by the memory controller have been received, so as to determine whether data needs to be sent.
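For illustration only, the cooperation of the three modules of the read-write operation execution circuit 213 can be sketched as follows; the class, method, and field names are hypothetical and do not describe the actual hardware implementation.

```python
# Hypothetical sketch of the read-write operation execution circuit 213:
# the sequence processing module (2131) records arrival order, the
# operation authority judgment module (2132) tracks whether the E state
# of each address is held, and the data cache judgment module (2133)
# would track whether a DBID has been returned.
class RWExecCircuit:
    def __init__(self):
        self.order = []        # module 2131: message arrival order
        self.authority = {}    # module 2132: address -> E state held?
        self.dbid = {}         # module 2133: address -> DBID received

    def record_message(self, addr):
        self.order.append(addr)

    def grant_authority(self, addr):
        self.authority[addr] = True

    def writeback_order(self):
        # WriteBack messages are issued in the recorded arrival order,
        # restricted to addresses whose operation authority is held
        return [a for a in self.order if self.authority.get(a)]

c = RWExecCircuit()
c.record_message("addr1"); c.record_message("addr2")
c.grant_authority("addr2"); c.grant_authority("addr1")
assert c.writeback_order() == ["addr1", "addr2"]  # arrival order, not grant order
```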
- the read-write operation execution circuit 214 may be located in the SMMU as the first node; for the scenario in which the on-chip processor accesses the on-chip storage, that is, communication between different modules in the same storage consistency model, the read-write operation execution circuit 214 as the first node may be located in the NOC or in the on-chip processor.
- the present application exemplarily takes a communication scenario between different storage consistency models as an example for description, but is not intended to be limited thereto.
- the second node refers to the module that obeys the SO constraint in the storage consistency model, the third node refers to the module that obeys the RO constraint in the storage consistency model, and the first node refers to an interface module used for interaction between the second node and the third node in the storage consistency model.
- the second node is a processor (such as GPU 120, CPU 121, NPU 122, etc.) in the SoC chip in FIG. 1, the first node is the on-chip NOC 126 in the SoC chip or an interface module of the processor (this module is a hardware circuit), and the third node is the memory controller 124 in the SoC chip or the HA in the memory controller 124.
- the first node, the second node and the third node are different hardware modules inside the processor.
- the read and write operations involved in this application can support write-write (write first and then write), write-read (write first and then read), read-write (read first and then write), read-read (read first and then read), and other operations.
- the first message or the second message may be a write request, corresponding to a write operation, or may be a read request, corresponding to a read operation.
- the first message or the second message is not limited to one, but may be multiple.
- the message types of the first message and the second message can be the same, for example, both are write requests (i.e., write-write requests) or both are read requests (i.e., read-read requests); they can also be different, for example, one is a write request and the other is a read request (i.e., write-read requests or read-write requests).
- the first address of the first message and the second address of the second message may be the same or different.
- the second node may send a first message and a second message to the first node, and the first message and the second message may be write request messages.
- the first message is used for requesting to perform a write operation on the first address managed by the third node in a strict order
- the second message is used for requesting a write operation on the second address managed by the third node in a strict order.
- the first node acquires the operation authority of the first address and the operation authority of the second address from the third node.
- the operation authority may refer to the E state in the cache coherence, indicating the operation authority that the node has on the address, that is, the first node can obtain the E state of the first address and the E state of the second address from the third node.
- the CC scope is extended from the third node to the first node, so that the first node participates in the management of cache consistency in the weak order model, and other nodes (for example, the third node) cannot perform read and write operations that require operation authority on the first address and the second address; that is, the third node's sequential processing authority for read and write requests has been transferred to the first node, and the execution order of read and write operations on the first address and the second address is controlled by the first node.
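As an illustrative aid (not part of the disclosure), the effect of extending the CC scope can be sketched as a directory recording which node holds the E state of each address: while the first node holds the authority, acquisition attempts by other nodes fail until the authority is released. `Directory`, `acquire`, and `release` are hypothetical names.

```python
# Hypothetical sketch of E-state ownership after extending the CC scope:
# only one node at a time may hold the operation authority of an address.
class Directory:
    def __init__(self):
        self.owner = {}   # address -> node currently holding the E state

    def acquire(self, node, addr):
        if self.owner.get(addr) not in (None, node):
            return False          # authority held by another node
        self.owner[addr] = node
        return True

    def release(self, node, addr):
        if self.owner.get(addr) == node:
            del self.owner[addr]  # authority returned, others may acquire

d = Directory()
assert d.acquire("first_node", "first_addr")
assert not d.acquire("third_node", "first_addr")   # blocked while held
d.release("first_node", "first_addr")
assert d.acquire("third_node", "first_addr")       # allowed after release
```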
- the following specifically describes how the first node acquires the operation authority of the first address and the operation authority of the second address.
- the first node may send a third message to the third node, where the third message includes the first address, and the third message is used to request the operation authority of the first address.
- the third node may send a fourth message to the first node, the fourth message may be a response message to the third message, and the fourth message is used to indicate the operation authority of the first address.
- the first node may send a confirmation message of the fourth message to the third node, where the confirmation message is used to indicate that the first node has received the fourth message.
- the first node may send a third message to the third node, where the third message includes the second address, and the third message is used to request the operation authority of the second address.
- the third node may send a fourth message to the first node, the fourth message may be a response message to the third message, and the fourth message is used to indicate the operation authority of the second address.
- the first node may send a confirmation message of the fourth message to the third node, where the confirmation message is used to indicate that the first node has received the fourth message.
- This application does not limit the order in which the first node obtains the operation authority of the first address and the operation authority of the second address from the third node. For example, assuming that the first node receives the first message (including the first address) first and then the second message (including the second address), the first node may nevertheless first obtain the operation authority of the second address and then obtain the operation authority of the first address.
- the following describes how the first node acquires the operation authority of the first address and the operation authority of the second address with reference to FIG. 11 .
- the first node may send the third message 1 and the third message 2 to the third node, and the third message 1 and the third message 2 may be GET_E messages.
- the third message 1 includes the first address
- the third message 2 includes the second address.
- the third message 1 is used to request the operation authority of the first address
- the third message 2 is used to request the operation authority of the second address.
- the present application does not limit the order in which the first node sends the third message 1 and the third message 2 to the third node.
- the third node sends a fourth message 1 and a fourth message 2 to the first node
- the fourth message 1 may be a response message (RSP1) of the third message 1
- the fourth message 2 may be a response message (RSP2) of the third message 2.
- the first node receives the fourth message 1 and the fourth message 2 from the third node.
- the fourth message 1 is used to instruct the first node to obtain the operation authority of the first address
- the fourth message 2 is used to instruct the first node to obtain the operation authority of the second address.
- the first node sends an acknowledgement message 1 (ACK1) of the fourth message 1 and an acknowledgement message 2 (ACK2) of the fourth message 2 to the third node.
- the two confirmation messages are used to indicate that the first node has received the fourth message 1 and the fourth message 2, respectively.
- Steps S901 and S902 are not executed in sequence. For example, step S901 may be executed first and then step S902 may be executed, or step S902 may be executed first and then step S901 may be executed.
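The authority-acquisition handshake of steps S901 and S902 (third message GET_E, fourth message RSP, and confirmation ACK) can be sketched as follows for illustration; the function name, the dictionary representation of the nodes, and the tuple message format are hypothetical assumptions, and the sketch deliberately allows either address to be requested first.

```python
# Hypothetical sketch of one authority-acquisition handshake:
# GET_E (third message) -> grant of the E state (fourth message) -> ACK.
def acquire_authority(first_node, third_node, addr):
    third_node["pending"].append(("GET_E", addr))   # third message sent
    third_node["owner"][addr] = "first_node"        # fourth message: E state granted
    first_node["authority"][addr] = True            # first node records the grant
    return ("ACK", addr)                            # confirmation message

first = {"authority": {}}
third = {"pending": [], "owner": {}}
# the order of the two handshakes is not constrained: second address first here
assert acquire_authority(first, third, "second_addr") == ("ACK", "second_addr")
assert acquire_authority(first, third, "first_addr") == ("ACK", "first_addr")
assert first["authority"] == {"second_addr": True, "first_addr": True}
```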
- the first node performs read and write operations on the first address and the second address.
- This application does not limit the execution order of the read and write operations on the first address and the second address by the first node.
- the first node can perform read and write operations on the first address and the second address in parallel
- parallel means that the next read and write operation does not wait for the completion of the previous read and write operation, so as to realize parallel processing of multiple read and write operations in the weak order model.
- in this way, requests from the strong order model can be processed in parallel, thereby improving the transmission bandwidth and interaction efficiency between the node that obeys the SO constraint (the second node) and the nodes in the weak order model.
- the following specifically describes how the first node performs read and write operations on the first address and the second address.
- the first node may send a fifth message to the third node, where the fifth message is used to instruct a read/write operation to be performed on the first address, and the fifth message may be a write-back (WriteBack) message.
- the fifth message may include the data to be written, the type of the write operation, and the first address; for the read operation, the fifth message may include the type of the read operation and the first address.
- the first node may send a fifth message to the third node, where the fifth message is used to instruct a read/write operation to be performed on the second address, and the fifth message may be a write-back (WriteBack) message.
- the fifth message may include the data to be written, the type of the write operation, and the second address; for the read operation, the fifth message may include the type of the read operation and the second address.
- the order in which the first node sends the fifth message corresponding to the first address and the fifth message corresponding to the second address may be the same as the order in which the first message and the second message are received. For example, if the first node receives the first message first and then the second message, the first node first sends the fifth message corresponding to the first address and then sends the fifth message corresponding to the second address.
- the third node may send a sixth message to the first node, where the sixth message may be a response message to the fifth message, and the sixth message is used to indicate a cache address corresponding to the first address.
- the third node may send a sixth message to the first node, where the sixth message may be a response message to the fifth message, and the sixth message is used to indicate the cache address corresponding to the second address .
- after receiving the sixth message, the first node sends a seventh message to the third node; the seventh message may be a write data (WriteData) message used to perform read and write operations on the cache address corresponding to the first address.
- after receiving the sixth message, the first node sends a seventh message to the third node; the seventh message may be a write data (WriteData) message used to perform read and write operations on the cache address corresponding to the second address.
- the first node sends the fifth message 1 and the fifth message 2 to the third node
- the fifth message 1 and the fifth message 2 may be write-back (WriteBack) messages.
- the fifth message 1 corresponds to the first message and is used for instructing to perform a write operation on the first address
- the fifth message 2 corresponds to the second message and is used for instructing to perform a write operation on the second address. Since the first node first receives the first message from the second node and then receives the second message, the first node first sends the fifth message 1 to the third node and then sends the fifth message 2.
- the parallelism at this time means that the first node can send the fifth message 2 without waiting for all read and write operations corresponding to the fifth message 1 to be completed.
- the third node sends the sixth message 1 and the sixth message 2 to the first node
- the sixth message 1 may be the response message (RSP3) of the fifth message 1
- the sixth message 2 may be the response message (RSP4) of the fifth message 2.
- the sixth message 1 is used to indicate the cache address corresponding to the first address
- the sixth message 2 is used to indicate the cache address corresponding to the second address.
- the first node sends a seventh message 1 and a seventh message 2 to the third node, and the seventh message may be a write data (WriteData) message.
- the seventh message 1 is used to write data to the cache address corresponding to the first address
- the seventh message 2 is used to write data to the cache address corresponding to the second address.
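The parallel write flow formed by the fifth through seventh messages (WriteBack, RSP carrying the cache address/DBID, and WriteData) can be sketched as follows for illustration; the function name, the DBID mapping, and the event-trace representation are hypothetical assumptions.

```python
# Hypothetical sketch of the parallel write flow: WriteBack messages are
# sent in the order the requests were received, but WriteBack 2 does not
# wait for the operations of WriteBack 1 to finish; each RSP carries the
# cache address (DBID), and WriteData then targets that cache address.
def write_flow(received_order, dbid_of):
    trace = []
    for addr in received_order:          # fifth messages, in receive order
        trace.append(("WriteBack", addr))
    for addr in received_order:          # sixth messages: RSP with DBID
        trace.append(("RSP", addr, dbid_of[addr]))
    for addr in received_order:          # seventh messages: WriteData
        trace.append(("WriteData", dbid_of[addr]))
    return trace

t = write_flow(["first_addr", "second_addr"],
               {"first_addr": "DBID0", "second_addr": "DBID1"})
assert t[0] == ("WriteBack", "first_addr")
assert t[1] == ("WriteBack", "second_addr")   # sent before WriteData 1
assert ("WriteData", "DBID0") in t
```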
- the first node may release the operation authority of the first address to the third node after completing the read and write operations on the first address.
- the above seventh message may also be used to release the operation authority of the first address to the third node. In this way, the third node or other nodes can continue to perform read and write operations on the first address.
- the first node can release the operation authority of the second address to the third node after completing the read and write operations on the second address; the above seventh message may also be used to release the operation authority of the second address to the third node. In this way, the third node or other nodes can continue to perform read and write operations on the second address.
- the seventh message 1 is further used to instruct the third node to release the operation authority of the first address; the seventh message 2 is also used to instruct the third node to release the operation authority of the second address.
- the first node receives the first message and the second message from the second node, where the second node complies with the SO constraint, the first message requests read and write operations on the first address managed by the third node, the second message requests read and write operations on the second address managed by the third node, and the third node complies with the RO constraint. The first node then obtains the operation authority of the first address and the operation authority of the second address from the third node, so that the first node participates in the management of cache consistency and other nodes cannot perform read and write operations that require operation authority on the first address and the second address; that is, the execution order of the read and write operations on the first address and the second address is controlled by the first node, and therefore the order in which the execution results are globally visible is also controlled by the first node. Thereby, the order in which the execution results of the read and write operations performed by the nodes complying with the RO constraint become globally visible meets the requirements of the strong order model.
- the following describes how the first node processes the case in which the third node requests the operation authority of the first address (or the second address), or requests to perform read and write operations that require operation authority on the first address (or the second address), so as to meet the storage consistency requirements and ensure that the order in which the execution results are globally visible conforms to the requirements of the strong order model.
- if, after the first node has requested the operation authority of the first address but before it has obtained that authority, the first node receives from the third node a request to perform read and write operations that require operation authority on the first address, or a request for the operation authority of the first address, the first node indicates to the third node that the operation authority of the first address has not been obtained.
- if, after the first node has requested the operation authority of the second address but before it has obtained that authority, the first node receives from the third node a request to perform read and write operations that require operation authority on the second address, or a request for the operation authority of the second address, the first node indicates to the third node that the operation authority of the second address has not been obtained.
- the above-mentioned read-write operation execution method further includes:
- before the first node acquires the operation authority of the first address (or the second address), the first node receives an eighth message from the third node.
- the eighth message is used to request a read and write operation on the first address that requires an operation authority, or in other words, is used to request an operation authority of the first address.
- the eighth message is used to request a read and write operation on the second address that requires an operation authority, or, in other words, is used to request an operation authority of the second address.
- the eighth message may be a snoop message.
- before the third node sends the response message (RSP1) of the third message 1 (GET_E1) to the first node, that is, before the first node obtains the operation authority of the first address (or the second address), the third node sends the eighth message (snoop message) to the first node, so that the first node receives the eighth message from the third node; the eighth message is used to request read and write operations that require operation authority on the first address (or the second address), or to request the operation authority of the first address (or the second address).
- the first node sends a ninth message to the third node.
- the ninth message is used to indicate that the operation authority of the first address (or the second address) has not been acquired.
- the ninth message may be a response message of the eighth message; for example, the ninth message may be a snoop response message.
- the first node sends a ninth message (snoop response message) to the third node, and the ninth message is used to indicate that the operation authority of the first address (or the second address) has not been obtained.
- if, before the first node obtains the operation authority of the first address, the third node requests to perform read and write operations that require operation authority on the first address, or requests the operation authority of the first address, the first node indicates to the third node that the operation authority of the first address has not been obtained. This enables the third node or other nodes to perform read and write operations on the first address.
- if, before the first node obtains the operation authority of the second address, the third node requests to perform read and write operations that require operation authority on the second address, or requests the operation authority of the second address, the first node indicates to the third node that the operation authority of the second address has not been obtained. This enables the third node or other nodes to perform read and write operations on the second address.
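The snoop handling of the eighth and ninth messages described above can be sketched as follows for illustration; the function name, the dictionary representation of the first node's authority state, and the tuple message format are hypothetical assumptions.

```python
# Hypothetical sketch of the eighth/ninth-message exchange: if a snoop
# arrives before the first node holds the E state of the address, the
# first node answers that the authority has not been obtained, so the
# third node (or another node) may still operate on the address.
def handle_snoop(first_node_authority, addr):
    if not first_node_authority.get(addr):
        return ("SnoopRsp", addr, "authority_not_obtained")  # ninth message
    return ("SnoopRsp", addr, "authority_held")

assert handle_snoop({}, "first_addr") == \
    ("SnoopRsp", "first_addr", "authority_not_obtained")
assert handle_snoop({"first_addr": True}, "first_addr") == \
    ("SnoopRsp", "first_addr", "authority_held")
```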
- if the third node requests to perform read and write operations that require operation authority on the first address, or requests the operation authority of the first address, after the first node starts to perform a write operation on the first address but before it acquires the cache address corresponding to the first address, then after obtaining the cache address corresponding to the first address, the first node sends the data written to that cache address to the third node, or indicates that the operation authority of the first address has been released.
- similarly, if the third node requests to perform read and write operations that require operation authority on the second address, or requests the operation authority of the second address, then after obtaining the cache address corresponding to the second address, the first node sends the data written to that cache address to the third node, or indicates that the operation authority of the second address has been released.
- the above-mentioned read-write operation execution method further includes:
- after the first node starts to perform a write operation on the first address (or the second address) but before it obtains the cache address corresponding to the first address (or the second address), the first node receives a twelfth message from the third node.
- the twelfth message is used to request a read and write operation on the first address (or the second address) that requires operation authority, or in other words, is used to request the operation authority of the first address (or the second address).
- the twelfth message may be a snoop message.
- before the third node sends the sixth message (including the cache address corresponding to the first address (or the second address)) to the first node, the third node sends the twelfth message to the first node, so that the first node receives the twelfth message from the third node; the twelfth message is used to request read and write operations that require operation authority on the first address (or the second address), or to request the operation authority of the first address (or the second address).
- After acquiring the cache address corresponding to the first address (or the second address), the first node sends a thirteenth message to the third node.
- That is, after the first node receives the sixth message (which includes the cache address corresponding to the first address (or the second address)) from the third node, it sends the thirteenth message to the third node.
- The thirteenth message may be a response message to the twelfth message; for example, the thirteenth message may be a snoop response message.
- the thirteenth message may include data written to the cache address corresponding to the first address (or the second address).
- The thirteenth message may replace the seventh message and take over its function; that is, the thirteenth message may also be used to indicate to the third node that the operation authority of the first address (or the second address) is released.
- Alternatively, the thirteenth message may be sent after the seventh message (in which case the seventh message is used to instruct the third node to release the operation authority of the first address (or the second address)), and is used to indicate that the operation authority of the first address (or the second address) has been released.
- In summary, if the third node, after the first node starts a write operation on the first address but before the first node obtains the corresponding cache address, requests to perform a read and write operation on the first address that requires operation authority, or requests the operation authority of the first address, then after obtaining the cache address corresponding to the first address, the first node sends the data written to that cache address to the third node, or indicates that the operation authority of the first address has been released.
- After obtaining the cache address corresponding to the first address, the first node sends the data written to that cache address to the third node, so that the third node can obtain the data directly; alternatively, after the seventh message is sent, the first node indicates that the operation authority of the first address has been released, so that the third node or other nodes can perform read and write operations on the first address.
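The deferred-snoop behavior above can be illustrated with a minimal sketch. This is our own illustration, not code from the application; the class and method names (`FirstNode`, `on_snoop`, `on_cache_address`) are assumptions. It shows a first node that holds a snoop (the twelfth message) arriving while a write is pending and the cache address is unknown, then answers with a thirteenth message carrying the written data and a release indication once the cache address arrives.

```python
# Hypothetical sketch of the deferred snoop response; names are assumptions.
class FirstNode:
    def __init__(self):
        self.pending_snoops = []   # twelfth messages received before the cache address
        self.cache_addr = None     # cache address corresponding to the first address
        self.written_data = None   # data written to that cache address
        self.responses = []        # thirteenth messages sent to snoopers

    def on_snoop(self, requester):
        """Twelfth message: a request for an authority-requiring read/write."""
        if self.cache_addr is None:
            # Write started but cache address not yet obtained: hold the snoop.
            self.pending_snoops.append(requester)
        else:
            self._respond(requester)

    def on_cache_address(self, cache_addr, data):
        """Sixth message delivered the cache address; complete the write."""
        self.cache_addr = cache_addr
        self.written_data = data
        # Answer every deferred snoop with a thirteenth message that both
        # carries the written data and indicates the authority is released.
        for requester in self.pending_snoops:
            self._respond(requester)
        self.pending_snoops.clear()

    def _respond(self, requester):
        self.responses.append(
            {"to": requester, "data": self.written_data, "released": True})
```

A snoop arriving before `on_cache_address` produces no immediate response; the response is emitted when the cache address is delivered.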
- Similarly, if the third node requests to perform a read and write operation on the second address that requires operation authority, or requests the operation authority of the second address,
- the first node, after obtaining the cache address corresponding to the second address, sends the data written to that cache address to the third node, or indicates that the operation authority of the second address has been released.
- After obtaining the cache address corresponding to the second address, the first node sends the data written to that cache address to the third node, so that the third node can obtain the data directly; alternatively, after the seventh message is sent, the first node indicates that the operation authority of the second address has been released, so that the third node or other nodes can perform read and write operations on the second address.
- When a preset condition is satisfied, the first node may release the operation authority of the first address and the operation authority of the second address to the third node. This enables the third node or other nodes to perform read and write operations on the first address and the second address.
- the preset condition is that the third node requests the operation authority of the first address and the second address.
- For example, if the third node requests to perform a read and write operation on the first address that requires operation authority, or requests the operation authority of the first address, the first node releases the operation authority of the first address to the third node, and later obtains the operation authority of the first address from the third node again.
- Similarly, if the third node requests to perform a read and write operation on the second address that requires operation authority, or requests the operation authority of the second address, the first node releases the operation authority of the second address to the third node, and later obtains the operation authority of the second address from the third node again.
- the above-mentioned read-write operation execution method further includes:
- After the first node obtains the operation authority of the first address (or the second address) and before it starts to perform read and write operations on the first address (or the second address), the first node may receive a tenth message from the third node.
- the tenth message is used to request a read and write operation on the first address (or the second address) that requires operation authority, or in other words, is used to request the operation authority of the first address (or the second address).
- the tenth message may be a snoop message.
- That is, the third node sends the tenth message (a snoop message) to the first node, so that the first node receives the tenth message from the third node; the tenth message is used to request a read and write operation on the first address (or the second address) that requires operation authority, or is used to request the operation authority of the first address (or the second address).
- the first node sends an eleventh message to the third node, and obtains the operation authority of the first address (or the second address) from the third node again.
- the eleventh message is used to indicate releasing the operation authority of the first address (or the second address).
- The eleventh message may be a response message to the tenth message; for example, the eleventh message may be a snoop response message.
- The first node sends the eleventh message (snoop response message) to the third node, where the eleventh message is used to indicate that the operation authority of the first address (or the second address) is released; the first node then re-sends the third message (GET_E) to the third node, receives the fourth message (RSP1/RSP2) from the third node to obtain the operation authority of the first address (or the second address) again, sends the acknowledgment message (ACK1) of the fourth message to the third node, and then re-executes the read and write operation procedure for the first address (or the second address) and the procedure for releasing the operation authority of the first address (or the second address).
- In summary, if, after the first node obtains the operation authority of the first address and before it starts to read and write the first address, the third node requests to perform a read and write operation on the first address that requires operation authority, or requests the operation authority of the first address, the first node releases the operation authority of the first address to the third node and obtains the operation authority of the first address from the third node again. After the first node releases the operation authority of the first address to the third node, the third node can directly perform read and write operations on the first address. If the first node obtains the operation authority of the first address from the third node again, it can continue to perform read and write operations on the first address.
- Similarly, if the third node requests to perform a read and write operation on the second address that requires operation authority, or requests the operation authority of the second address,
- the first node releases the operation authority of the second address to the third node, and obtains the operation authority of the second address from the third node again.
- After the first node releases the operation authority of the second address, the third node can directly perform read and write operations on the second address. If the first node obtains the operation authority of the second address from the third node again, it can continue to perform read and write operations on the second address.
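The release-and-reacquire exchange described above can be sketched as follows. The message names GET_E, RSP1, and ACK1 come from the text; the `Directory` class (standing in for the third node's authority tracking) and all method names are our assumptions, not part of the application.

```python
# Hypothetical sketch of the release-then-reacquire exchange; the Directory
# abstraction and method names are assumptions, message names come from the text.
class Directory:
    """Stands in for the third node, tracking who holds the operation authority."""
    def __init__(self):
        self.owner = None
        self.log = []

    def get_e(self, node):
        """Third message (GET_E): request exclusive operation authority."""
        self.owner = node
        self.log.append(("GET_E", node))
        return "RSP1"                       # fourth message granting authority

    def ack(self, node):
        """Acknowledgment (ACK1) of the fourth message."""
        self.log.append(("ACK1", node))

    def snoop_response(self, node):
        """Eleventh message: holder indicates the authority is released."""
        assert self.owner == node
        self.owner = None
        self.log.append(("RELEASE", node))

def handle_snoop_then_reacquire(directory, node):
    # Snooped after acquiring authority but before starting the read/write:
    directory.snoop_response(node)          # release to the third node
    rsp = directory.get_e(node)             # re-send GET_E, receive RSP1/RSP2
    directory.ack(node)                     # confirm with ACK1
    return rsp                              # ready to redo the read/write flow
```

After `handle_snoop_then_reacquire` returns, the node once again holds the authority and can re-execute its read and write procedure.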
- Alternatively, the preset condition is that the time for which the first node has held the operation authority of the first address obtained from the third node is greater than or equal to a first preset time, and the time for which the first node has held the operation authority of the second address obtained from the third node is greater than or equal to a second preset time.
- the first preset time and the second preset time may be the same or different.
- The execution order of steps S901 and S902 is not limited.
- For example, step S902 may be executed first and step S901 second; that is, the first node obtains the operation authority of the first address (or the second address) in advance, so that when it receives the first message (or the second message), it can perform read and write operations on the first address (or the second address) quickly.
- the first node may obtain the operation authority of the first address (or the second address) in advance according to the historical read and write operations.
- Within a preset time after the first node acquires the operation authority of the first address from the third node, if the first message is not received, the first node releases the operation authority of the first address to the third node. Similarly, within a preset time after the first node obtains the operation authority of the second address from the third node, if the second message is not received, the first node releases the operation authority of the second address to the third node.
- That is, if, after the first node obtains the operation authority of the first address (or the second address) through the third message and the fourth message, it does not receive the first message (or the second message) within a preset time, the first node sends a fourteenth message to the third node to indicate that the operation authority of the first address (or the second address) is released. Subsequently, when the first node receives the first message (or the second message) again, the interaction process corresponding to the above steps S902-S903 is re-executed to complete the read and write operations.
- If, after the first node obtains the operation authority of the first address (or the second address) through the third message and the fourth message, the first node receives the first message (or the second message) within the preset time, the first node executes the interaction process corresponding to step S903 to complete the read and write operations.
- In this case, the first node does not need to perform the permission acquisition process after receiving the read and write request from the second node, and can quickly perform read and write operations on the first address (or the second address).
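The timeout rule above can be condensed into a small sketch. The event-driven framing (a `wait_for_request` callable with a timeout) and the return values are our assumptions for illustration; only the fourteenth-message release behavior comes from the text.

```python
# Hypothetical sketch of the pre-acquired-authority timeout; the wait callable
# and return strings are assumptions made for illustration.
def hold_authority(wait_for_request, preset_time):
    """Hold pre-acquired operation authority for up to preset_time.

    wait_for_request(timeout) returns the first message, or None on timeout.
    """
    request = wait_for_request(preset_time)
    if request is None:
        # No read/write request arrived in time: send the fourteenth message
        # to release the operation authority back to the third node.
        return "FOURTEENTH_MESSAGE"
    # Authority is already held, so no acquisition round-trip is needed and
    # the read/write operation can be performed immediately (step S903).
    return f"READ_WRITE({request})"
```

With a request arriving in time, the read/write proceeds at once; otherwise the authority is released and would be reacquired later via steps S902-S903.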
- It should be understood that the sequence numbers of the above processes do not imply an execution order; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
- the disclosed systems, devices and methods may be implemented in other manners.
- the device embodiments described above are only illustrative.
- The division of the units is only a logical function division; in actual implementation, there may be other division manners.
- For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
- the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
- the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
- each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
- The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
- When implemented using a software program, they may be implemented in whole or in part in the form of a computer program product.
- the computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on the computer, all or part of the processes or functions described in the embodiments of the present application are generated.
- the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
- The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means.
- The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media.
- The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), a semiconductor medium (e.g., a solid-state drive (SSD)), or the like.
Claims (25)
- 1. A read/write operation execution method, comprising: receiving, by a first node, a first message and a second message from a second node, wherein the first message is used to request a read and write operation on a first address managed by a third node, the second message is used to request a read and write operation on a second address managed by the third node, and an execution-order constraint on read and write operations of the second node is stricter than an execution-order constraint on read and write operations of the third node; acquiring, by the first node, operation authority of the first address and operation authority of the second address from the third node; and performing, by the first node, read and write operations on the first address and the second address.
- 2. The method according to claim 1, wherein the performing, by the first node, read and write operations on the first address and the second address comprises: performing, by the first node, read and write operations on the first address and the second address in parallel.
- 3. The method according to claim 2, wherein the performing, by the first node, read and write operations on the first address and the second address in parallel comprises: performing, by the first node, read and write operations on the first address and the second address in parallel in the order in which the first message and the second message were received.
- 4. The method according to any one of claims 1-3, wherein the second node complies with a strict order (SO) constraint, and the third node complies with a relaxed order (RO) constraint.
- 5. The method according to any one of claims 1-4, further comprising: releasing, by the first node, the operation authority of the first address to the third node after completing the read and write operation on the first address; and releasing, by the first node, the operation authority of the second address to the third node after completing the read and write operation on the second address.
- 6. The method according to any one of claims 1-5, wherein the acquiring, by the first node, the operation authority of the first address and the operation authority of the second address from the third node comprises: acquiring, by the first node, an E state of the first address and an E state of the second address from the third node.
- 7. The method according to any one of claims 1-6, further comprising: when the first node has requested the operation authority of the first address but has not yet acquired it, and receives from the third node a request to perform a read and write operation on the first address that requires operation authority, or a request for the operation authority of the first address, indicating, by the first node to the third node, that the operation authority of the first address has not been acquired; and when the first node has requested the operation authority of the second address but has not yet acquired it, and receives from the third node a request to perform a read and write operation on the second address that requires operation authority, or a request for the operation authority of the second address, indicating, by the first node to the third node, that the operation authority of the second address has not been acquired.
- 8. The method according to any one of claims 1-7, wherein, after the first node acquires the operation authority of the first address and the operation authority of the second address from the third node, the method further comprises: releasing, by the first node, the operation authority of the first address and the operation authority of the second address to the third node when a preset condition is satisfied.
- 9. The method according to claim 8, wherein the preset condition is that the third node requests the operation authority of the first address and the second address.
- 10. The method according to claim 8, wherein the preset condition is that the time for which the first node has held the operation authority of the first address acquired from the third node is greater than or equal to a first preset time, and the time for which the first node has held the operation authority of the second address acquired from the third node is greater than or equal to a second preset time.
- 11. The method according to any one of claims 1-10, wherein the second node is an input/output (I/O) device outside a system-on-chip (SoC) chip, the first node is a memory management unit (MMU) in the SoC chip, and the third node is a memory controller in the SoC chip or a home agent (HA) in the memory controller.
- 12. The method according to any one of claims 1-10, wherein the second node is a processor in an SoC chip, the first node is a network on chip (NOC) in the SoC chip or an interface module of the processor, and the third node is a memory controller in the SoC chip or an HA in the memory controller.
- 13. A system-on-chip (SoC) chip, comprising a first node and a memory controller, wherein the first node is configured to: receive a first message and a second message from a second node, wherein the first message is used to request a read and write operation on a first address managed by the memory controller, the second message is used to request a read and write operation on a second address managed by the memory controller, and an execution-order constraint on read and write operations of the second node is stricter than an execution-order constraint on read and write operations of the memory controller; acquire operation authority of the first address and operation authority of the second address from the memory controller; and perform read and write operations on the first address and the second address.
- 14. The SoC chip according to claim 13, wherein the first node is specifically configured to: perform read and write operations on the first address and the second address in parallel.
- 15. The SoC chip according to claim 14, wherein the first node is specifically configured to: perform read and write operations on the first address and the second address in parallel in the order in which the first message and the second message were received.
- 16. The SoC chip according to any one of claims 13-15, wherein the second node complies with a strict order (SO) constraint, and the memory controller complies with a relaxed order (RO) constraint.
- 17. The SoC chip according to any one of claims 13-16, wherein the first node is further configured to: release the operation authority of the first address to the memory controller after completing the read and write operation on the first address; and release the operation authority of the second address to the memory controller after completing the read and write operation on the second address.
- 18. The SoC chip according to any one of claims 13-17, wherein the first node is specifically configured to: acquire an E state of the first address and an E state of the second address from the memory controller.
- 19. The SoC chip according to any one of claims 13-18, wherein the first node is further configured to: when it has requested the operation authority of the first address but has not yet acquired it, and receives from the memory controller a request to perform a read and write operation on the first address that requires operation authority, or a request for the operation authority of the first address, indicate to the memory controller that the operation authority of the first address has not been acquired; and when it has requested the operation authority of the second address but has not yet acquired it, and receives from the memory controller a request to perform a read and write operation on the second address that requires operation authority, or a request for the operation authority of the second address, indicate to the memory controller that the operation authority of the second address has not been acquired.
- 20. The SoC chip according to any one of claims 13-19, wherein, after the operation authority of the first address and the operation authority of the second address are acquired from the memory controller, the first node is further configured to: release the operation authority of the first address and the operation authority of the second address to the memory controller when a preset condition is satisfied.
- 21. The SoC chip according to claim 20, wherein the preset condition is that the memory controller requests the operation authority of the first address and the second address.
- 22. The SoC chip according to claim 20, wherein the preset condition is that the time for which the first node has held the operation authority of the first address acquired from the memory controller is greater than or equal to a first preset time, and the time for which the first node has held the operation authority of the second address acquired from the memory controller is greater than or equal to a second preset time.
- 23. The SoC chip according to any one of claims 13-22, wherein the second node is an input/output (I/O) device outside the SoC chip, and the first node is a memory management unit (MMU) in the SoC chip.
- 24. The SoC chip according to any one of claims 13-22, wherein the second node is a processor in the SoC chip, and the first node is a network on chip (NOC) in the SoC chip or an interface module of the processor.
- 25. The SoC chip according to any one of claims 13-24, wherein the first node comprises an order processing module, an operation authority determination module, and a data cache determination module; the order processing module is configured to record the order in which the first message and the second message are received; the operation authority determination module is configured to record whether the operation authority of the first address and the operation authority of the second address have been received, and to determine, according to the order, the sequence in which read and write operations are performed on the first address and the second address; and the data cache determination module is configured to record whether an identifier of the cache address corresponding to the first address and an identifier of the cache address corresponding to the second address have been received, so as to determine whether to send data.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2021/084556 WO2022205130A1 (zh) | 2021-03-31 | 2021-03-31 | 读写操作执行方法和SoC芯片 |
CN202180093103.5A CN116940934A (zh) | 2021-03-31 | 2021-03-31 | 读写操作执行方法和SoC芯片 |
EP21933798.7A EP4310683A4 (en) | 2021-03-31 | 2021-03-31 | METHOD FOR PERFORMING A READ/WRITE OPERATION AND SOC CHIP |
US18/477,110 US20240028528A1 (en) | 2021-03-31 | 2023-09-28 | Read/write operation execution method and soc chip |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2021/084556 WO2022205130A1 (zh) | 2021-03-31 | 2021-03-31 | 读写操作执行方法和SoC芯片 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US18/477,110 Continuation US20240028528A1 (en) | 2021-03-31 | 2023-09-28 | Read/write operation execution method and soc chip |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022205130A1 true WO2022205130A1 (zh) | 2022-10-06 |
Family
ID=83455366
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/084556 WO2022205130A1 (zh) | 2021-03-31 | 2021-03-31 | 读写操作执行方法和SoC芯片 |
Country Status (4)
Country | Link |
---|---|
US (1) | US20240028528A1 (zh) |
EP (1) | EP4310683A4 (zh) |
CN (1) | CN116940934A (zh) |
WO (1) | WO2022205130A1 (zh) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101176083A (zh) * | 2005-03-23 | 2008-05-07 | 高通股份有限公司 | 在弱有序处理系统中强制执行强有序请求 |
US20100199048A1 (en) * | 2009-02-04 | 2010-08-05 | Sun Microsystems, Inc. | Speculative writestream transaction |
CN106796561A (zh) * | 2014-09-12 | 2017-05-31 | 高通股份有限公司 | 将强有序写入事务桥接到弱有序域中的装置和相关设备、方法和计算机可读媒体 |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050289306A1 (en) * | 2004-06-28 | 2005-12-29 | Sridhar Muthrasanallur | Memory read requests passing memory writes |
-
2021
- 2021-03-31 EP EP21933798.7A patent/EP4310683A4/en active Pending
- 2021-03-31 WO PCT/CN2021/084556 patent/WO2022205130A1/zh active Application Filing
- 2021-03-31 CN CN202180093103.5A patent/CN116940934A/zh active Pending
-
2023
- 2023-09-28 US US18/477,110 patent/US20240028528A1/en active Pending
Non-Patent Citations (1)
Title |
---|
See also references of EP4310683A4 * |
Also Published As
Publication number | Publication date |
---|---|
EP4310683A4 (en) | 2024-05-01 |
CN116940934A (zh) | 2023-10-24 |
EP4310683A1 (en) | 2024-01-24 |
US20240028528A1 (en) | 2024-01-25 |