CN118113631A - Data processing system, method, device, medium and computer program product - Google Patents
- Publication number
- CN118113631A (application number CN202410534079.XA)
- Authority
- CN
- China
- Prior art keywords
- consistency
- interface
- request
- cache
- processor
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
The invention discloses a data processing system, method, device, medium and computer program product in the field of computer technology. A coherency function processing device and at least one coherency interface are provided in each device, and different coherency interfaces are connected one-to-one, thereby achieving cache coherence between different processor devices, between different target devices, and between any processor device and any target device. The scheme supports more application scenarios, scales flexibly, can meet high memory capacity and latency requirements, and improves the flexibility and scalability of cache coherence between different devices.
Description
Technical Field
The present invention relates to the field of computer technology, and in particular, to a data processing system, a method, an apparatus, a medium, and a computer program product.
Background
CXL (Compute Express Link, a cache-coherent interconnect technology) is an asymmetric cache coherence bus. Devices are generally interconnected in Master mode and Slave mode, so masters and slaves are restricted to fixed roles; moreover, interconnection between a Master and a Slave has to be extended by means of an expansion card, which limits the usage scenarios.
Therefore, how to improve flexibility and scalability of cache consistency between different devices is a problem that needs to be solved by those skilled in the art.
Disclosure of Invention
Accordingly, it is an object of the present invention to provide a data processing system, method, apparatus, medium and computer program product for improving flexibility and scalability of cache coherency among different devices. The specific scheme is as follows:
In a first aspect, the present invention provides a data processing system comprising: at least one processor device and at least one target device connected to the at least one processor device;
Each processor device and each target device includes: a coherency function processing device and at least one coherency interface; in the same device, the consistency function processing device is in communication connection with at least one consistency interface;
Coherency interfaces are communicatively connected in pairs between different processor devices, between different target devices, and between any processor device and any target device, so as to achieve cache coherence between different processor devices, between different target devices, and between any processor device and any target device.
In another aspect, any coherency function processing device includes:
a register configuration mode selection module, configured to set the interface that connects to the coherency interface as an AXI interface or an ACE interface according to a received register configuration request;
an ACE interface processing module, configured to determine the processing logic for the current coherence protocol request according to the transaction type of the received coherence protocol request; and
an interface conversion module, configured to convert between the AXI interface and the ACE interface.
In another aspect, any coherency function processing device further includes:
a multipath detection module, configured to write response data from a peer processor cache into the requesting processor's cache upon detecting that the response data required by the requesting processor exists in a peer processor cache in the current device.
In another aspect, any coherency function processing device further includes:
a second-level cache interface module, configured to convert the data format of a received second-level cache access request into a data format matching the second-level cache interface, to process the request, and, after processing is complete, to return the result to the sender of the request along the original path.
In another aspect, any coherency function processing device further includes:
a virtual memory transaction module, configured to detect, when the transaction type of the coherence protocol request is determined to be a distributed virtual memory transaction, whether the current coherence protocol request is an invalidation request, and, if so, to process the invalidation request.
In another aspect, any coherency function processing device further includes: an ACE_AXI interface processing module, configured to detect the currently active interface type, to process received coherence protocol requests according to that type, and to filter out request processing logic that does not match it; the currently active interface type is the ACE interface or the AXI interface.
In another aspect, any coherency interface includes:
an ACE/AXI-to-CPI interface conversion module, configured to convert a received ACE interface signal or AXI interface signal into the CPI interface format.
In another aspect, any coherency interface further includes:
an interface channel module, configured to implement the transmission control functions of the physical layer, the link layer and the transport layer, wherein the transmission control function of the physical layer includes initialization and control of the physical link; the transmission control function of the link layer includes control and management of the data link state; and the transmission control function of the transport layer includes packing and unpacking of messages.
In another aspect, any coherency interface further includes:
a configuration module, configured to respond to configuration instructions for the interface channel module and to configure the configuration space and the memory space registers of the interface channel module.
In another aspect, with any processor device or any target device taken as the initiating end and any communication peer of the initiating end taken as the destination end, the cache coherence process between the initiating end and the destination end includes:
the coherency function processing device in the initiating end transmits a memory data synchronization request to the corresponding target coherency interface in the initiating end; the target coherency interface generates a coherence protocol request from the memory data synchronization request and transmits it to the destination coherency interface in the destination end; the destination coherency interface passes the coherence protocol request to the coherency function processing device in the destination end; the coherency function processing device in the destination end converts the coherence protocol request into a memory protocol request; and the memory system in the destination end reads the corresponding data according to the memory protocol request and returns the data to the initiating end along the original path.
In another aspect, a cache coherency process between any processor device and any target device includes:
The current processor device loads a memory data synchronization request of the current target device;
And if the request data of the current memory data synchronous request exists in the current processor cache, reading the request data from the current processor cache.
In another aspect, the cache coherence process between any processor device and any target device further comprises:
If the current processor cache does not have the request data of the current memory data synchronization request, the current memory data synchronization request is sent to an ACE interface processing module in a consistency function processing device of the current processor device through an ACE interface in the consistency function processing device of the current processor device, so that the ACE interface processing module judges whether the current memory data synchronization request is an invalid request or not;
if the request is not an invalid request, the ACE interface processing module sends the current memory data synchronization request to a multi-path detection module in a consistency function processing device of the current processor device, so that the multi-path detection module detects a secondary cache and a peer processor cache in the current processor device;
And if the request data exists in the secondary cache or the peer processor cache, reading the request data from the secondary cache or the peer processor cache, and updating the cache line state.
In another aspect, cache coherence between any processor device and any target device further comprises:
If the request data exists in neither the second-level cache nor the peer processor cache, the coherency function processing device in the current processor device transmits the current memory data synchronization request to the corresponding target coherency interface in the current processor device; the target coherency interface generates a coherence protocol request from the current memory data synchronization request and transmits it to the destination coherency interface in the current target device; the destination coherency interface passes the coherence protocol request to the coherency function processing device in the current target device; the coherency function processing device in the current target device converts the coherence protocol request into a memory protocol request; and the memory system in the current target device reads the corresponding data according to the memory protocol request and returns the data to the current processor device along the original path.
In another aspect, a cache coherence process between different target devices includes:
The first target device loads a memory data synchronization request of the second target device;
and if the request data of the current memory data synchronous request exists in the first target equipment cache, reading the request data from the first target equipment cache.
In another aspect, a cache coherence process between different target devices includes:
If the first target equipment cache does not have the request data of the current memory data synchronous request, sending a detection request to a consistency interface of the second target equipment through a consistency function processing device and the consistency interface of the first target equipment; the consistency interface of the second target equipment checks the local cache state according to the detection request, and if the local cache of the second target equipment is hit, the request data is read from the local cache of the second target equipment; and if the local cache of the second target device is not hit, reading the request data from the local memory of the second target device.
On the other hand, the cache consistency protocol between different devices is MESI, MOESI or MESIF.
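As an illustration of the first of these protocols, the following is a minimal sketch of the textbook MESI state transitions for a single cache line. The event names (`local_read_shared`, `remote_write`, and so on) and the `next_state` helper are assumptions introduced here for illustration only; the patent text does not define a transition table, and MOESI and MESIF add further states that are not shown.

```python
# Textbook MESI transitions for one cache line (illustrative sketch only).
# States: M = Modified, E = Exclusive, S = Shared, I = Invalid.
# Event names are hypothetical labels, not taken from the patent.
MESI_TRANSITIONS = {
    ("I", "local_read_shared"): "S",     # another cache also holds the line
    ("I", "local_read_exclusive"): "E",  # no other cache holds the line
    ("I", "local_write"): "M",
    ("S", "local_read"): "S",
    ("S", "local_write"): "M",           # other copies must be invalidated
    ("S", "remote_read"): "S",
    ("S", "remote_write"): "I",
    ("E", "local_read"): "E",
    ("E", "local_write"): "M",           # silent upgrade, no bus traffic needed
    ("E", "remote_read"): "S",
    ("E", "remote_write"): "I",
    ("M", "local_read"): "M",
    ("M", "local_write"): "M",
    ("M", "remote_read"): "S",           # dirty data is written back first
    ("M", "remote_write"): "I",          # dirty data is written back first
}


def next_state(state: str, event: str) -> str:
    """Return the next MESI state, or raise KeyError for an unknown pair."""
    return MESI_TRANSITIONS[(state, event)]
```

For example, a line that is read exclusively, written locally, and then read by a remote device moves through I, E, M and finally S.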
In another aspect, the present invention provides a data processing method applied to a data processing system, the data processing system comprising: at least one processor device and at least one target device connected to the at least one processor device; each processor device and each target device includes: a coherency function processing device and at least one coherency interface; in the same device, the consistency function processing device is in communication connection with at least one consistency interface; the two consistency interfaces are in communication connection among different processor devices, different target devices, any processor device and any target device and are used for realizing cache consistency among different processor devices, cache consistency among different target devices and cache consistency between any processor device and any target device;
With any processor device or any target device taken as the initiating end and any communication peer of the initiating end taken as the destination end, the data processing method includes:
the coherency function processing device in the initiating end transmits a memory data synchronization request to the corresponding target coherency interface in the initiating end; the target coherency interface generates a coherence protocol request from the memory data synchronization request and transmits it to the destination coherency interface in the destination end; the destination coherency interface passes the coherence protocol request to the coherency function processing device in the destination end; the coherency function processing device in the destination end converts the coherence protocol request into a memory protocol request; and the memory system in the destination end reads the corresponding data according to the memory protocol request and returns the data to the initiating end along the original path.
In another aspect, the present invention provides a processor device comprising: a coherency function processing device and at least one coherency interface; the consistency function processing device is in communication connection with at least one consistency interface;
Any coherency interface of the processor device is communicatively connected to a coherency interface of another device, so as to achieve cache coherence between the processor device and the other device.
In another aspect, the present invention provides an accelerator apparatus comprising: a coherency function processing device and at least one coherency interface; the consistency function processing device is in communication connection with at least one consistency interface;
Any coherency interface of the accelerator device is communicatively connected to a coherency interface of another device, so as to achieve cache coherence between the accelerator device and the other device.
In another aspect, the present invention provides an electronic device, including:
A memory for storing a computer program;
And a processor for executing the computer program to implement the previously disclosed data processing method.
In another aspect, the present invention provides a non-volatile storage medium for storing a computer program, wherein the computer program when executed by a processor implements the data processing method disclosed above.
In another aspect, the invention provides a computer program product comprising computer programs/instructions which when executed by a processor implement the steps of the previously disclosed data processing method.
As can be seen from the above, the present invention provides a data processing system, comprising: at least one processor device and at least one target device connected to the at least one processor device; each processor device and each target device includes: a coherency function processing device and at least one coherency interface; in the same device, the consistency function processing device is in communication connection with at least one consistency interface; the two consistency interfaces are in communication connection with each other in different processor devices, in different target devices and in any processor device and any target device, and are used for realizing cache consistency among different processor devices, cache consistency among different target devices and cache consistency between any processor device and any target device.
The beneficial effects of the invention are as follows: a coherency function processing device and at least one coherency interface are provided in each device, and different coherency interfaces are connected one-to-one, so that cache coherence is achieved between different processor devices, between different target devices, and between any processor device and any target device. The scheme supports more application scenarios, scales flexibly, can meet high memory capacity and latency requirements, and improves the flexibility and scalability of cache coherence between different devices.
Accordingly, the data processing method, the device, the nonvolatile storage medium and the computer program product also have the technical effects.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a data processing system according to the present disclosure;
FIG. 2 is a schematic diagram of a consistent interconnect based on a RISC-V processor implementation of the present disclosure;
FIG. 3 is a detailed schematic of FIG. 2 of the present disclosure;
FIG. 4 is a schematic diagram of a coherency function processing module according to the present disclosure;
FIG. 5 is a schematic diagram of a cache coherence protocol interface module disclosed in the present invention;
FIG. 6 is a schematic diagram of a data processing method according to the present disclosure;
FIG. 7 is a diagram illustrating a server configuration according to the present invention;
FIG. 8 is a diagram of a terminal structure according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other examples, which a person of ordinary skill in the art would obtain without undue burden based on the embodiments of the invention, are within the scope of the invention.
The interconnect buses of a processor are divided into on-chip and inter-chip buses. On-chip interconnection refers to interconnection between IP modules within a chip, such as the on-chip TileLink bus of a RISC-V processor. Inter-chip interconnection refers to interconnection between chips and is divided into inter-processor interconnection and processor-to-peripheral interconnection. Inter-processor interconnects include UPI (Ultra Path Interconnect) between Intel processors, the IF (Infinity Fabric) bus between AMD processors, and the like. Interconnect buses between a processor and peripherals include low-speed buses such as I2C (Inter-Integrated Circuit), SPI (Serial Peripheral Interface), USB (Universal Serial Bus) and HDMI (High Definition Multimedia Interface, a digital video/audio interface technology), and high-speed buses such as PCIe (Peripheral Component Interconnect Express, a high-speed serial computer expansion bus standard).
At present, the CXL bus is an asymmetric cache coherence bus in which devices are generally interconnected in Master mode and Slave mode, so masters and slaves are restricted to fixed roles; moreover, interconnection between a Master and a Slave has to be extended through a CXL Switch, which limits the usage scenarios. The invention therefore provides a data processing scheme that improves the flexibility and scalability of cache coherence between different devices.
Referring to FIG. 1, an embodiment of the present invention discloses a data processing system, comprising: at least one processor device and at least one target device connected to the at least one processor device.
Each processor device and each target device includes a coherency function processing device and at least one coherency interface. Each processor device and each target device may, of course, further include a multilevel cache, memory, and the like. Within the same device, the coherency function processing device is communicatively connected to the at least one coherency interface. Coherency interfaces are communicatively connected in pairs between different processor devices, between different target devices, and between any processor device and any target device, so as to achieve cache coherence between different processor devices, between different target devices, and between any processor device and any target device. The target device may be embodied as an accelerator device, a memory device, or the like. It should be noted that any two coherency interfaces may be connected directly, one to one, or through a switch or other expansion card. When connected through a switch or other expansion card, the same coherency interface can be multiplexed.
In one example, with any processor device or any target device taken as the initiating end and any communication peer of the initiating end (i.e. another processor device or another target device) taken as the destination end, the cache coherence process between the initiating end and the destination end includes: the coherency function processing device in the initiating end transmits a memory data synchronization request to the corresponding target coherency interface in the initiating end; the target coherency interface generates a coherence protocol request from the memory data synchronization request and transmits it to the destination coherency interface in the destination end; the destination coherency interface passes the coherence protocol request to the coherency function processing device in the destination end; the coherency function processing device in the destination end converts the coherence protocol request into a memory protocol request; and the memory system in the destination end reads the corresponding data according to the memory protocol request and returns the data to the initiating end along the original path.
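The five steps of this example can be sketched as follows. This is a minimal software model written under stated assumptions, not the disclosed hardware: the `Device` class, its `read_remote` and `serve` methods, the dictionary used as a coherence protocol request, and the tuple used as a memory protocol request are all hypothetical names and shapes chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Device:
    """Hypothetical model of a processor device or target device."""
    name: str
    memory: Dict[int, int] = field(default_factory=dict)
    peer: Optional["Device"] = None

    def read_remote(self, address: int, trace: List[str]) -> int:
        """Act as the initiating end for one memory data synchronization request."""
        # Step 1: the function processing device hands the request to the local interface.
        trace.append(f"{self.name}.function_device -> {self.name}.interface")
        # Step 2: the local interface builds a coherence protocol request and sends it on.
        protocol_request = {"op": "read", "address": address}
        trace.append(f"{self.name}.interface -> {self.peer.name}.interface")
        data = self.peer.serve(protocol_request, trace)
        # Step 5: the data travels back along the same path.
        trace.append(f"{self.peer.name} -> {self.name} (data)")
        return data

    def serve(self, protocol_request: dict, trace: List[str]) -> int:
        """Act as the destination end."""
        # Step 3: the destination interface passes the request to its function processing device.
        trace.append(f"{self.name}.interface -> {self.name}.function_device")
        # Step 4: the request is converted into a memory protocol request and executed.
        memory_request = ("load", protocol_request["address"])
        trace.append(f"{self.name}.function_device -> {self.name}.memory")
        return self.memory[memory_request[1]]
```

Because both roles live in the same class, either device can act as the initiating end, which mirrors the symmetric arrangement described above, in contrast to a fixed Master/Slave split.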
In one embodiment, any coherency function processing device includes:
And the register configuration mode selection module is used for configuring an interface for connecting the consistency interface as an AXI interface or an ACE interface according to the received register configuration request.
And the ACE interface processing module is used for determining the processing logic of the current consistency protocol request according to the transaction type of the received consistency protocol request.
And the interface conversion module is used for realizing interface conversion between the AXI interface and the ACE interface.
And the multipath detection module is used for writing the response data in the peer processor cache into the request processor cache when detecting that the response data required by the request processor exists in the peer processor cache in the current device.
The second-level cache interface module is used for converting the data format of a received second-level cache access request into a data format matching the second-level cache interface and processing the request; after processing is complete, the result is returned to the sender of the second-level cache access request along the original path.
The virtual memory transaction module is used for detecting, when the transaction type of the coherence protocol request is determined to be a distributed virtual memory transaction, whether the current coherence protocol request is an invalidation request; if so, the invalidation request is processed.
The ACE_AXI interface processing module is used for detecting the currently active interface type, processing received coherence protocol requests according to that type, and filtering out request processing logic that does not match it; the currently active interface type is the ACE interface or the AXI interface.
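The interplay between the register configuration mode selection module and the ACE_AXI interface processing module can be sketched as below. This is a hedged illustration only: the one-bit register encoding, the class `CoherencyFunctionDevice`, and its methods are assumptions made for this sketch. The transaction names are standard AMBA ACE transaction types, but the particular subset chosen here as coherent-only is illustrative.

```python
from enum import Enum


class InterfaceMode(Enum):
    AXI = 0
    ACE = 1


# A few ACE transaction types that only make sense on a coherent (ACE) interface.
# The choice of this subset is an assumption for the sketch.
COHERENT_ONLY = {"ReadShared", "ReadUnique", "CleanUnique", "DVMMessage"}


class CoherencyFunctionDevice:
    """Hypothetical model: a mode register selects AXI or ACE behaviour."""

    def __init__(self) -> None:
        self.mode = InterfaceMode.AXI

    def write_mode_register(self, value: int) -> None:
        # Assumed encoding: bit 0 = 0 selects AXI, bit 0 = 1 selects ACE.
        self.mode = InterfaceMode(value & 1)

    def accept(self, txn_type: str) -> bool:
        """Filter out requests that do not match the currently active interface type."""
        if self.mode is InterfaceMode.AXI and txn_type in COHERENT_ONLY:
            return False
        return True
```

In this model a coherent transaction such as `ReadShared` is filtered out while the device is configured for AXI and accepted once the register selects ACE; non-coherent transactions such as `ReadNoSnoop` pass in both modes.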
In one embodiment, any coherency interface includes the following modules. The ACE/AXI-to-CPI interface conversion module is used for converting a received ACE interface signal or AXI interface signal into the CPI interface format. The interface channel module is used for implementing the transmission control functions of the physical layer, the link layer and the transport layer, wherein the transmission control function of the physical layer includes initialization and control of the physical link, the transmission control function of the link layer includes control and management of the data link state, and the transmission control function of the transport layer includes packing and unpacking of messages. The configuration module is used for responding to configuration instructions for the interface channel module and configuring the configuration space and the memory space registers of the interface channel module.
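The message packing and unpacking performed at the transport layer can be illustrated with a small sketch. The CPI message layout is not given in this document, so the 14-byte header used here (opcode, source identifier, tag, address, payload length) is purely an assumption, as are the function names `pack_message` and `unpack_message`.

```python
import struct

# Hypothetical big-endian header: opcode (1 byte), source id (1 byte),
# tag (2 bytes), address (8 bytes), payload length (2 bytes).
HEADER = struct.Struct(">BBHQH")


def pack_message(opcode: int, source_id: int, tag: int, address: int,
                 payload: bytes = b"") -> bytes:
    """Prepend the assumed header to the payload."""
    return HEADER.pack(opcode, source_id, tag, address, len(payload)) + payload


def unpack_message(frame: bytes):
    """Split a frame back into header fields and payload."""
    opcode, source_id, tag, address, length = HEADER.unpack_from(frame)
    payload = frame[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated frame")
    return opcode, source_id, tag, address, payload
```

A frame produced by `pack_message` is returned unchanged, field by field, by `unpack_message`; a frame whose payload is shorter than the declared length is rejected.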
In one embodiment, the cache coherence process between any processor device and any target device includes: the current processor device loads a memory data synchronization request of the current target device (the memory data synchronization request is used for accessing the memory data of the current target device); if the request data of the current memory data synchronization request already exists in the current processor cache (namely, a first-level cache in a processor core loading the memory data synchronization request in the current processor device), the request data is read from the current processor cache. If the current processor cache does not have the request data of the current memory data synchronization request, the current memory data synchronization request is sent to an ACE interface processing module in a consistency function processing device of the current processor device through an ACE interface in the consistency function processing device of the current processor device, so that the ACE interface processing module judges whether the current memory data synchronization request is an invalid request or not; if the request is not an invalid request, the ACE interface processing module sends the current memory data synchronization request to a multi-path detection module in a consistency function processing device of the current processor device, so that the multi-path detection module detects a secondary cache (namely, a secondary cache in the current processor device) and a peer processor cache (a primary cache in other processor cores which are not loaded with the memory data synchronization request in the current processor device); and if the request data exists in the secondary cache or the peer processor cache, reading the request data from the secondary cache or the peer processor cache, and updating the cache line state. 
If the requested data exists in neither the level-2 cache nor any peer processor cache, the coherency function processing device in the current processor device generates a coherency protocol request from the current memory data synchronization request and transmits the current memory data synchronization request to the corresponding target coherency interface in the current processor device; the target coherency interface generates a coherency protocol request from the current memory data synchronization request and transmits it to the coherency interface in the current target device; that coherency interface passes the coherency protocol request to the coherency function processing device in the current target device; the coherency function processing device in the current target device converts the coherency protocol request into a memory protocol request; and the memory system in the current target device reads the corresponding data according to the memory protocol request and returns the data to the current processor device along the original path.
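The lookup order described above (own level-1 cache, then level-2 cache and peer level-1 caches, then the target device) can be sketched as follows. This is only an illustrative model: the class names, the dictionary-based caches, the source labels and the decision to fill the requesting level-1 cache on every miss are assumptions, and the invalidation-request check and cache line state updates are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TargetDevice:
    """Hypothetical target device that owns the requested memory."""
    memory: Dict[int, int] = field(default_factory=dict)

    def read(self, addr: int) -> int:
        # Stands in for the remote path: coherency protocol request,
        # conversion to a memory protocol request, then the memory read.
        return self.memory[addr]


@dataclass
class ProcessorDevice:
    """Hypothetical processor device: one L1 cache per core plus a shared L2."""
    l1: List[Dict[int, int]]
    l2: Dict[int, int] = field(default_factory=dict)

    def load(self, core: int, addr: int, target: TargetDevice) -> Tuple[int, str]:
        # 1. Hit in the requesting core's own L1 cache.
        if addr in self.l1[core]:
            return self.l1[core][addr], "l1"
        # 2. Probe the shared L2 cache, then the peer cores' L1 caches.
        peers = [c for i, c in enumerate(self.l1) if i != core and addr in c]
        if addr in self.l2:
            value, source = self.l2[addr], "l2"
        elif peers:
            value, source = peers[0][addr], "peer"
        else:
            # 3. Local miss everywhere: fetch from the target device.
            value, source = target.read(addr), "remote"
        # Assumption: the requesting L1 is filled with the returned data.
        self.l1[core][addr] = value
        return value, source


target = TargetDevice(memory={0x1000: 7})
cpu = ProcessorDevice(l1=[{}, {0x2000: 9}])
first = cpu.load(0, 0x1000, target)   # fetched from the target device
second = cpu.load(0, 0x1000, target)  # now served by core 0's own L1
third = cpu.load(0, 0x2000, target)   # served by core 1's L1 (a peer cache)
```

The returned label only records which level satisfied the request; a real implementation would also track the cache line state under the selected protocol.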
In one embodiment, the cache coherency process between different target devices includes: the first target device loads a memory data synchronization request for the second target device; if the requested data exists in the cache of the first target device, the data is read from that cache. If it does not, a probe request is sent to the coherency interface of the second target device through the coherency function processing device and the coherency interface of the first target device; the coherency interface of the second target device checks its local cache state according to the probe request; if the local cache of the second target device hits, the requested data is read from that local cache; if it misses, the requested data is read from the local memory of the second target device.
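A minimal sketch of this target-to-target flow is shown below, assuming dictionary-based caches and memory. The class, method and label names are hypothetical and only mirror the three outcomes in the text: hit in the requester's cache, hit in the owner's local cache, or fallback to the owner's local memory.

```python
from typing import Dict, Tuple


class Target:
    """Hypothetical target device with a local cache and local memory."""

    def __init__(self, memory: Dict[int, int]) -> None:
        self.cache: Dict[int, int] = {}
        self.memory: Dict[int, int] = dict(memory)

    def handle_probe(self, addr: int) -> Tuple[int, str]:
        # The owner checks its local cache state first, then its local memory.
        if addr in self.cache:
            return self.cache[addr], "owner-cache"
        return self.memory[addr], "owner-memory"

    def load_from(self, owner: "Target", addr: int) -> Tuple[int, str]:
        # The requester serves the data from its own cache when possible.
        if addr in self.cache:
            return self.cache[addr], "requester-cache"
        # Otherwise a probe request goes to the owner's coherency interface.
        return owner.handle_probe(addr)


first_target = Target(memory={})
second_target = Target(memory={0x10: 5})

miss = first_target.load_from(second_target, 0x10)   # owner cache misses
second_target.cache[0x10] = 6                        # owner now caches a newer value
hit = first_target.load_from(second_target, 0x10)    # owner cache hits
first_target.cache[0x10] = 6
local = first_target.load_from(second_target, 0x10)  # requester cache hits
```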
In one embodiment, the cache coherence protocol between different devices is MESI, MOESI, or MESIF.
It can be seen that, in this embodiment, a coherency function processing device and at least one coherency interface are provided in each device, and coherency interfaces in different devices are connected one-to-one, which achieves cache coherency between different processor devices, between different target devices, and between any processor device and any target device. The scheme supports more application scenarios, scales flexibly, can meet demanding memory-capacity and latency requirements, and improves the flexibility and scalability of cache coherency across devices.
A data processing method provided by an embodiment of the present invention is described below; it may be read in conjunction with the other embodiments described herein.
The present invention provides a data processing method applied to a data processing system. The data processing system includes at least one processor device and at least one target device connected to the at least one processor device. Each processor device and each target device includes a coherency function processing device and at least one coherency interface; within the same device, the coherency function processing device is communicatively connected to the at least one coherency interface. Between different processor devices, between different target devices, and between any processor device and any target device, two coherency interfaces are communicatively connected to each other, so as to achieve cache coherency between different processor devices, between different target devices, and between any processor device and any target device.
In this embodiment, any processor device or any target device acts as the initiator, and any communication peer of the initiator acts as the destination. The data processing method then includes: the coherency function processing device in the initiator transmits a memory data synchronization request to the corresponding target coherency interface in the initiator; the target coherency interface generates a coherency protocol request from the memory data synchronization request and transmits it to the destination coherency interface in the destination; the destination coherency interface passes the coherency protocol request to the coherency function processing device in the destination; that coherency function processing device converts the coherency protocol request into a memory protocol request; and the memory system in the destination reads the corresponding data according to the memory protocol request and returns the data to the initiator along the original path.
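The five hops of this method can be traced with a small sketch. The request classes, the `"READ"` opcode and the hop names are hypothetical placeholders chosen for illustration; the sketch only shows the order of the conversions and that the data returns by reversing the request path.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SyncRequest:
    addr: int  # memory data synchronization request


@dataclass(frozen=True)
class CoherencyRequest:
    opcode: str  # hypothetical opcode name
    addr: int


@dataclass(frozen=True)
class MemoryRequest:
    addr: int


def remote_read(req: SyncRequest, dest_memory: Dict[int, int]) -> Tuple[int, List[str], List[str]]:
    # Initiator: coherency function processing device -> target coherency interface.
    hops = ["initiator.coherency_function", "initiator.coherency_interface"]
    # The initiator-side interface generates the coherency protocol request.
    proto = CoherencyRequest(opcode="READ", addr=req.addr)
    # Destination: coherency interface -> coherency function processing device.
    hops += ["destination.coherency_interface", "destination.coherency_function"]
    # The destination converts the coherency request into a memory protocol request.
    mem = MemoryRequest(addr=proto.addr)
    hops.append("destination.memory_system")
    data = dest_memory[mem.addr]
    # The data travels back along the original path, in reverse order.
    return data, hops, list(reversed(hops))


data, request_path, return_path = remote_read(SyncRequest(addr=0x40), {0x40: 42})
```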
In one embodiment, any coherency function processing device includes: a register configuration mode selection module, configured to set the interface that connects to the coherency interface to an AXI interface or an ACE interface according to a received register configuration request; an ACE interface processing module, configured to determine the processing logic for the current coherency protocol request according to the transaction type of the received coherency protocol request; and an interface conversion module, configured to convert between the AXI interface and the ACE interface.
In one embodiment, any coherency function processing device further includes a multi-path detection module, configured to write the response data held in a peer processor cache into the requesting processor's cache upon detecting that the response data required by the requesting processor exists in a peer processor cache in the current device.
In one embodiment, any coherency function processing device further includes a level-2 cache interface module, configured to convert a received level-2 cache access request into the data format of the level-2 cache interface, process the request, and, once processing completes, return the result to the sender of the request along the original path.
In one embodiment, any coherency function processing device further includes a virtual memory transaction module, configured to detect, when the transaction type of a coherency protocol request is determined to be a distributed virtual memory transaction, whether the request is an invalidation request and, if so, to process the invalidation request.
In one embodiment, any coherency function processing device further includes an ACE_AXI interface processing module, configured to detect the currently active interface type, process received coherency protocol requests according to that type, and filter out request processing logic that does not match it; the currently active interface type is either the ACE interface or the AXI interface.
In one embodiment, any coherency interface includes an ACE/AXI-to-CPI interface conversion module, configured to convert a received ACE or AXI interface signal into the CPI interface format.
In one embodiment, any coherency interface further includes an interface channel module, configured to implement the transmission control functions of a physical layer, a link layer and a transport layer, where the physical layer handles initialization and control of the physical link, the link layer handles control and management of the data link state, and the transport layer handles message encapsulation and decapsulation.
In one embodiment, any coherency interface further includes a configuration module, configured to configure the configuration space and the memory space registers of the interface channel module.
In one embodiment, with any processor device or any target device acting as the initiator and any communication peer of the initiator acting as the destination, the cache coherency process between the initiator and the destination includes: the coherency function processing device in the initiator transmits a memory data synchronization request to the corresponding target coherency interface in the initiator; the target coherency interface generates a coherency protocol request from the memory data synchronization request and transmits it to the destination coherency interface in the destination; the destination coherency interface passes the coherency protocol request to the coherency function processing device in the destination; that coherency function processing device converts the coherency protocol request into a memory protocol request; and the memory system in the destination reads the corresponding data according to the memory protocol request and returns the data to the initiator along the original path.
In one embodiment, the cache coherency process between any processor device and any target device includes: the current processor device loads a memory data synchronization request for the current target device; if the requested data exists in the current processor cache, the data is read from the current processor cache.
In one embodiment, the cache coherency process between any processor device and any target device further includes: if the requested data does not exist in the current processor cache, the request is sent through the ACE interface of the coherency function processing device of the current processor device to the ACE interface processing module of that coherency function processing device, which judges whether the request is an invalidation request; if it is not, the ACE interface processing module sends the request to the multi-path detection module of the same coherency function processing device, which probes the level-2 cache and the peer processor caches in the current processor device; if the requested data exists in the level-2 cache or in a peer processor cache, it is read from there and the cache line state is updated.
In one embodiment, the cache coherency process between any processor device and any target device further includes: if the requested data exists in neither the level-2 cache nor any peer processor cache, the coherency function processing device in the current processor device generates a coherency protocol request from the current memory data synchronization request and transmits the current memory data synchronization request to the corresponding target coherency interface in the current processor device; the target coherency interface generates a coherency protocol request from the current memory data synchronization request and transmits it to the coherency interface in the current target device; that coherency interface passes the coherency protocol request to the coherency function processing device in the current target device; the coherency function processing device in the current target device converts the coherency protocol request into a memory protocol request; and the memory system in the current target device reads the corresponding data according to the memory protocol request and returns the data to the current processor device along the original path.
In one embodiment, the cache coherency process between different target devices includes: the first target device loads a memory data synchronization request for the second target device; if the requested data exists in the cache of the first target device, the data is read from that cache.
In one embodiment, the cache coherency process between different target devices further includes: if the requested data does not exist in the cache of the first target device, a probe request is sent to the coherency interface of the second target device through the coherency function processing device and the coherency interface of the first target device; the coherency interface of the second target device checks its local cache state according to the probe request; if the local cache of the second target device hits, the requested data is read from that local cache; if it misses, the requested data is read from the local memory of the second target device.
In one embodiment, the cache coherence protocol between different devices is MESI, MOESI, or MESIF.
It can be seen that this embodiment achieves cache coherency between different processor devices, between different target devices, and between any processor device and any target device. The scheme supports more application scenarios, scales flexibly, can meet demanding memory-capacity and latency requirements, and improves the flexibility and scalability of cache coherency across devices.
A processor device and an accelerator device provided by embodiments of the present invention are described below; they may be read in conjunction with the other embodiments described herein.
The present invention provides a processor device comprising: a coherency function processing device and at least one coherency interface; the consistency function processing device is in communication connection with at least one consistency interface; any consistency interface of the processor device is in communication connection with a consistency interface of other devices (such as an accelerator device or the processor device provided by the invention) and is used for realizing cache consistency between the processor device and the other devices.
The present invention also provides an accelerator apparatus comprising: a coherency function processing device and at least one coherency interface; the consistency function processing device is in communication connection with at least one consistency interface; any consistency interface of the accelerator device is in communication connection with a consistency interface of other devices (such as the accelerator device or the processor device provided by the invention) and is used for realizing cache consistency between the accelerator device and the other devices.
In one embodiment, any coherency function processing device includes: a register configuration mode selection module, configured to set the interface that connects to the coherency interface to an AXI interface or an ACE interface according to a received register configuration request; an ACE interface processing module, configured to determine the processing logic for the current coherency protocol request according to the transaction type of the received coherency protocol request; and an interface conversion module, configured to convert between the AXI interface and the ACE interface.
In one embodiment, any coherency function processing device further includes a multi-path detection module, configured to write the response data held in a peer processor cache into the requesting processor's cache upon detecting that the response data required by the requesting processor exists in a peer processor cache in the current device.
In one embodiment, any coherency function processing device further includes a level-2 cache interface module, configured to convert a received level-2 cache access request into the data format of the level-2 cache interface, process the request, and, once processing completes, return the result to the sender of the request along the original path.
In one embodiment, any coherency function processing device further includes a virtual memory transaction module, configured to detect, when the transaction type of a coherency protocol request is determined to be a distributed virtual memory transaction, whether the request is an invalidation request and, if so, to process the invalidation request.
In one embodiment, any coherency function processing device further includes an ACE_AXI interface processing module, configured to detect the currently active interface type, process received coherency protocol requests according to that type, and filter out request processing logic that does not match it; the currently active interface type is either the ACE interface or the AXI interface.
In one embodiment, any coherency interface includes an ACE/AXI-to-CPI interface conversion module, configured to convert a received ACE or AXI interface signal into the CPI interface format.
In one embodiment, any coherency interface further includes an interface channel module, configured to implement the transmission control functions of a physical layer, a link layer and a transport layer, where the physical layer handles initialization and control of the physical link, the link layer handles control and management of the data link state, and the transport layer handles message encapsulation and decapsulation.
In one embodiment, any coherency interface further includes a configuration module, configured to configure the configuration space and the memory space registers of the interface channel module.
In one embodiment, with any processor device or any target device acting as the initiator and any communication peer of the initiator acting as the destination, the cache coherency process between the initiator and the destination includes: the coherency function processing device in the initiator transmits a memory data synchronization request to the corresponding target coherency interface in the initiator; the target coherency interface generates a coherency protocol request from the memory data synchronization request and transmits it to the destination coherency interface in the destination; the destination coherency interface passes the coherency protocol request to the coherency function processing device in the destination; that coherency function processing device converts the coherency protocol request into a memory protocol request; and the memory system in the destination reads the corresponding data according to the memory protocol request and returns the data to the initiator along the original path.
In one embodiment, the cache coherency process between any processor device and any target device includes: the current processor device loads a memory data synchronization request for the current target device; if the requested data exists in the current processor cache, the data is read from the current processor cache.
In one embodiment, the cache coherency process between any processor device and any target device further includes: if the requested data does not exist in the current processor cache, the request is sent through the ACE interface of the coherency function processing device of the current processor device to the ACE interface processing module of that coherency function processing device, which judges whether the request is an invalidation request; if it is not, the ACE interface processing module sends the request to the multi-path detection module of the same coherency function processing device, which probes the level-2 cache and the peer processor caches in the current processor device; if the requested data exists in the level-2 cache or in a peer processor cache, it is read from there and the cache line state is updated.
In one embodiment, the cache coherency process between any processor device and any target device further includes: if the requested data exists in neither the level-2 cache nor any peer processor cache, the coherency function processing device in the current processor device generates a coherency protocol request from the current memory data synchronization request and transmits the current memory data synchronization request to the corresponding target coherency interface in the current processor device; the target coherency interface generates a coherency protocol request from the current memory data synchronization request and transmits it to the coherency interface in the current target device; that coherency interface passes the coherency protocol request to the coherency function processing device in the current target device; the coherency function processing device in the current target device converts the coherency protocol request into a memory protocol request; and the memory system in the current target device reads the corresponding data according to the memory protocol request and returns the data to the current processor device along the original path.
In one embodiment, the cache coherency process between different target devices includes: the first target device loads a memory data synchronization request for the second target device; if the requested data exists in the cache of the first target device, the data is read from that cache.
In one embodiment, the cache coherency process between different target devices further includes: if the requested data does not exist in the cache of the first target device, a probe request is sent to the coherency interface of the second target device through the coherency function processing device and the coherency interface of the first target device; the coherency interface of the second target device checks its local cache state according to the probe request; if the local cache of the second target device hits, the requested data is read from that local cache; if it misses, the requested data is read from the local memory of the second target device.
In one embodiment, the cache coherence protocol between different devices is MESI, MOESI, or MESIF.
It can be seen that this embodiment achieves cache coherency between different processor devices, between different target devices, and between any processor device and any target device. The scheme supports more application scenarios, scales flexibly, can meet demanding memory-capacity and latency requirements, and improves the flexibility and scalability of cache coherency across devices.
Taking a RISC-V processor as an example, the following introduces a scheme for implementing a server-grade symmetric cache-coherent interconnect bus on the side of a RISC-V processor (a free, open-source processor based on a reduced instruction set). With this scheme, a RISC-V server can be designed that supports remote coherent memory expansion and also lets remote devices coherently access the memory of the RISC-V processor, which broadens the application range of RISC-V processors.
Referring to fig. 2, a schematic diagram of the coherent interconnect implemented on the RISC-V processor side, the RISC-V processor deploys a coherency function processing device and a cache-coherent peripheral interface (i.e., a coherency interface), which connects to the memory coherency interface of a peripheral. The peripheral may be an FPGA, a GPU or a memory unit. The interconnect bus shown in fig. 2 broadens the applications of the RISC-V processor, so that high-performance RISC-V-based servers can be designed for cloud computing, AI, finance and other scenarios.
In the scheme shown in fig. 2, the cache coherency of the bus lets the processor coherently access the memory of a remote device and lets the remote device coherently access the memory of the processor. This is particularly suitable for scenarios with high memory demand, such as model training, and for scenarios that interact with processor memory, such as network cards. In other words, the memory in the processor is shared by the processor and the downstream peripheral, so it is called shared memory; the memory in the downstream peripheral is likewise shared by the processor and the downstream peripheral, and is also called shared memory. The symmetry of the bus cache coherency enables coherent access between RISC-V processors and coherent access between devices, with native support for mesh topologies and high scalability. Different RISC-V processors use different cache coherence protocols, and the coherent interconnect bus can be switched through register configuration to the protocol that matches the processor. For example, a RISC-V processor that supports MESI uses the MESI protocol for its interconnect with devices, a RISC-V processor that supports MOESI uses the MOESI protocol, and a RISC-V processor system with a NUMA (Non-Uniform Memory Access) structure uses the MESIF protocol. Different interface modes can also be selected flexibly according to the application scenario of the current processor system, which reduces power consumption.
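As a rough illustration of register-selected protocols, the sketch below maps a protocol mode register value to the set of cache line states of the chosen protocol. The state sets are the standard ones for MESI, MOESI and MESIF; the register encoding (0, 1, 2) and all identifiers are assumptions, since the text does not give the actual register layout.

```python
from enum import Enum
from typing import FrozenSet


class State(Enum):
    MODIFIED = "M"
    OWNED = "O"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"
    FORWARD = "F"


_BASE = frozenset({State.MODIFIED, State.EXCLUSIVE, State.SHARED, State.INVALID})

# Cache line states used by each protocol named in the text.
PROTOCOL_STATES = {
    "MESI": _BASE,
    "MOESI": _BASE | {State.OWNED},
    "MESIF": _BASE | {State.FORWARD},
}

# Hypothetical encoding of the protocol mode register.
PROTOCOL_MODE = {0: "MESI", 1: "MOESI", 2: "MESIF"}


def states_for(mode_register: int) -> FrozenSet[State]:
    """Return the cache line states allowed under the configured protocol."""
    return PROTOCOL_STATES[PROTOCOL_MODE[mode_register]]


# The union of the three protocols covers six distinct states.
all_states = frozenset().union(*PROTOCOL_STATES.values())
```

Under this model a single state machine with the six states M, O, E, S, I and F can serve all three protocols, with the register deciding which states are reachable.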
The scheme is designed around the open-source XuanTie C910 RISC-V processor core, which supports 1 to 4 configurable cores, the RV64GC instruction set architecture, a 12-stage deep pipeline, and a 3-decode, 8-execute superscalar architecture; the server-grade cache-coherent bus is designed on the basis of this processor core. Referring to fig. 3, the first interface type is ACE, the second interface type is AXI, and the third interface type is CPI. The RISC-V Core module is a RISC-V processor core (such as Core 0 and Core 1 in fig. 3). It may be an open-source or a commercial RISC-V core and implements pipeline functions such as instruction fetch, instruction decode, execute, memory access, write-back and branch prediction. Its external interface is the ACE (AXI Coherency Extensions) interface of AMBA (Advanced Microcontroller Bus Architecture, a specification for interconnecting modules), which supports cache coherency. The coherency function processing module is the cache coherency processing module on the host side. It implements the cache coherence protocol over the ACE interface as a six-state state machine and, depending on the register configuration of the processor, can select among different coherence protocols, implementing MESI, MOESI or MESIF accordingly. The module receives coherent requests initiated by a core and sends them to the downstream cache-coherent peripheral interface module, which forwards them to the peripheral, giving coherent access to remote memory; it also receives coherent read/write requests initiated by the peripheral and sends them to the memory unit, giving the device coherent access to host memory. The level-2 cache is a level-2 cache unit shared by multiple cores.
The cache-coherent peripheral interface module is the coherent interconnect bus interface on the host side and implements the host side of the coherent interconnect protocol. The register unit module implements register reads and writes for the configuration space and the memory space of the coherent interconnect protocol. The AXI interconnect module implements AXI interface interconnection and connects multiple AXI devices, such as the memory subsystem, a serial port and other external units.
Referring to fig. 4, the consistency function processing module includes:
Register configuration mode selection module: Core 0 initiates a write request to the configuration mode register; the module parses it and outputs two mode selection registers. The protocol mode register is used by the ACE interface processing module to configure the cache coherence protocol mode. The interface mode register is used by the ACE_AXI interface processing module to select the user interface format of that module: a value of 1 indicates the AXI interface and a value of 0 indicates the ACE interface.
ACE interface processing module: receives coherent read/write requests initiated by the RISC-V core and dispatches them by transaction type. A distributed virtual memory transaction is sent to the virtual memory transaction module for subsequent processing; all other transaction types are sent to the multi-path detection module for subsequent processing.
ACE_AXI interface processing module: receives coherent read/write requests initiated by the cache-coherent peripheral interface. It implements functions similar to those of the ACE interface processing module and adds ACE/AXI interface mode switching: if the interface mode register is set to AXI mode, the ACE-interface-related logic, which occupies a relatively large share of the module's logic, is bypassed, which saves resources and therefore power.
Virtual memory transaction module: and processing the distributed virtual memory transaction request, and mainly processing the invalidation requests of the TLB and the instruction Cache.
Multi-path detection module: handles MESI cache line states. It detects whether a peer cache holds the data for the requested address; if the peer cache is found to hold the data, the peer cache data is written directly to the current cache address, which avoids a write to memory and improves lookup performance.
L2 cache interface module: converts the L2 cache read/write requests initiated by each module into the L2 cache interface format.
ACE2AXI interface conversion module: performs the conversion from the ACE interface to the AXI interface.
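The mode selection and request dispatch described for the modules above can be sketched in Python as follows. This is only an illustrative model, not the patented hardware: the names `ModeRegisters`, `TxnType`, `parse_mode_write`, `dispatch_ace_request` and `ace_logic_enabled` are hypothetical, and the bit layout of the configuration write is an assumption. The text only states that an interface mode value of 1 means AXI and 0 means ACE, and that distributed virtual memory transactions go to the virtual memory transaction module while other transactions go to the multi-path detection module.

```python
from dataclasses import dataclass
from enum import Enum


class TxnType(Enum):
    DVM = "distributed_virtual_memory"  # TLB / instruction cache invalidation
    READ = "coherent_read"
    WRITE = "coherent_write"


@dataclass
class ModeRegisters:
    protocol_mode: str   # e.g. "MESI"; the encoding is an assumption
    interface_mode: int  # 1 = AXI, 0 = ACE (as stated in the text)


def parse_mode_write(value: int) -> ModeRegisters:
    """Split one configuration write into the two mode registers.

    Assumed (hypothetical) layout: bit 0 is the interface mode,
    bits 2..1 are an index into the supported protocol list.
    """
    protocols = ["MESI", "MOESI", "MESIF"]
    index = (value >> 1) & 0b11
    if index >= len(protocols):
        raise ValueError("unsupported protocol mode index")
    return ModeRegisters(protocols[index], value & 1)


def dispatch_ace_request(txn: TxnType) -> str:
    """Return the name of the module that handles the transaction."""
    if txn is TxnType.DVM:
        return "virtual_memory_transaction_module"
    return "multi_path_detection_module"


def ace_logic_enabled(regs: ModeRegisters) -> bool:
    """In AXI mode the ACE-related logic is bypassed to save resources."""
    return regs.interface_mode == 0
```

For example, under the assumed layout, `parse_mode_write(0b001)` selects MESI with the AXI interface, in which case `ace_logic_enabled` returns `False`.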
Referring to fig. 5, the cache coherence protocol interface module includes:
ACE/AXI-to-CPI interface conversion module: converts the ACE or AXI interface signal format output by the consistency function processing module into the CPI interface format of the consistency protocol interface channel module.
The ACE/AXI interface signal format is defined by the AMBA AXI/ACE specification. CPI (Coherence Protocol Interface) is the custom interface signal format output by the consistency protocol interface channel and mainly comprises three signals: request, response and data. The request signal format is shown in table 1.
Table 1: request signal format of CPI
The data signal format of CPI is shown in table 2.
Table 2: data signal format of CPI
The CPI response signal format is shown in table 3.
Table 3: response signal format of CPI
The valid flag field indicates whether the signal is valid; the operation code field indicates the type of request or response; the ID number is a sequence number; the load data and response data carry the read/write data content; the address is the memory address to be read or written; and the end flag field marks the end of the data.
Coherence protocol interface channel module: divided into a physical layer, a link layer and a transport layer. The physical layer performs physical link initialization and control, the link layer controls and manages the data link state, and the transport layer encapsulates and decapsulates messages, so that various consistency protocol messages can be parsed.
Configuration module: completes the initialization configuration of the device-side consistency protocol interface channel module, mainly the initialization of the configuration space and memory space registers.
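As a rough illustration of how a CPI request could be encapsulated into bytes for the link and decapsulated on the other side, the following Python sketch models one request message. The field widths, the byte order and the choice of which fields a request carries are assumptions made for the example, since the actual CPI formats are given only in tables 1 to 3; `CpiRequest`, `encapsulate` and `decapsulate` are hypothetical names.

```python
import struct
from dataclasses import dataclass

# Assumed layout (hypothetical): valid flag (1 byte), operation code (1 byte),
# ID number (2 bytes), address (8 bytes), little-endian, no padding.
_REQUEST_LAYOUT = struct.Struct("<BBHQ")


@dataclass(frozen=True)
class CpiRequest:
    valid: bool    # valid flag field
    opcode: int    # operation code: which kind of request this is
    txn_id: int    # ID number used as a sequence number
    address: int   # memory address to read or write


def encapsulate(req: CpiRequest) -> bytes:
    """Pack a request into the assumed fixed-size wire format."""
    return _REQUEST_LAYOUT.pack(int(req.valid), req.opcode, req.txn_id, req.address)


def decapsulate(raw: bytes) -> CpiRequest:
    """Unpack bytes received from the link back into a request."""
    valid, opcode, txn_id, address = _REQUEST_LAYOUT.unpack(raw)
    return CpiRequest(bool(valid), opcode, txn_id, address)
```

A round trip such as `decapsulate(encapsulate(req))` returns a request equal to `req`, which is the property the receiving side relies on when it reconstructs the CPI request from serial data.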
The ACE_AXI interface processing module of the consistency function processing unit selects among the cache consistency peripheral interfaces according to the destination address of the request, so that different off-chip interfaces can be connected.
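Selecting an interface by destination address can be pictured as a lookup over address windows. The sketch below is a minimal Python model under stated assumptions: the source does not say how address ranges are assigned to interfaces, so `AddressWindow`, `select_interface` and the example ranges are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class AddressWindow(NamedTuple):
    base: int          # first address served by the interface
    size: int          # window length in bytes
    interface_id: int  # index of the cache consistency peripheral interface


def select_interface(addr: int, windows: Sequence[AddressWindow]) -> Optional[int]:
    """Return the interface whose window contains addr, or None if unmapped."""
    for window in windows:
        if window.base <= addr < window.base + window.size:
            return window.interface_id
    return None


# Hypothetical map: two off-chip devices, each behind its own interface.
WINDOWS = [
    AddressWindow(base=0x1_0000_0000, size=0x1000_0000, interface_id=0),
    AddressWindow(base=0x2_0000_0000, size=0x1000_0000, interface_id=1),
]
```

With this map, a request to `0x1_0000_0040` is routed to interface 0, and a request to an unmapped address returns `None`, which a real design would have to treat as an error or as local memory.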
Referring to fig. 5, take a topology of two processors and two accelerator peripherals as an example. It implements coherent access by a processor to accelerator device memory or remote processor memory, and coherent access by an accelerator device to remote accelerator device memory or processor memory. The coherent interconnection access process is illustrated with two cases: processor coherent access to accelerator device memory, and accelerator device coherent access to remote device memory. As shown in fig. 5, the processors and the accelerator peripherals are directly connected through cache consistency interfaces to form a MESH topology, which supports coherent access between processors, between processors and accelerators, and between accelerators. A MESH topology is one in which the nodes are interconnected, each node is connected to at least two other nodes, and all nodes together form a single network.
The power-on initialization process of the system shown in fig. 5 is as follows. After power-on, the physical layers of the cache consistency peripheral interfaces on the processor side and the device side complete link training and link-up. After link-up, the configuration module on the processor side initializes the configuration space and memory space registers of the peripheral's cache consistency peripheral interface, and at the same time configures the ID numbers of the local end and the device end for protocol layer forwarding. The RISC-V core then initiates a mode register configuration request according to the processor type and the operation type; here the protocol mode register is configured for the MESI protocol and the interface mode register for the AXI interface. The host core then initiates requests to read and write the device memory. The first interface type is ACE, the second interface type is AXI, and the third interface type is CPI.
The data processing flow for processor coherent access to accelerator device memory is shown in fig. 6. In fig. 6, a core in the processor initiates a request to load peripheral memory data, and it is checked whether the peripheral memory data is cached in the current core's first-level cache. If so, the data is read from the first-level cache. If not, the request is sent through the ACE interface to the ACE interface processing module of the consistency function processing module, which judges from the request type whether it is a TLB/ICache invalidation request. If so, the request is sent to the virtual memory transaction module for subsequent processing; this branch is not taken in the current scenario. If it is not an invalidation request, the request is sent to the multi-path detection module, which reads the second-level cache and detects whether other cores hold the corresponding data. If the requested data is in the second-level cache, it is read from the second-level cache and returned directly. If the second-level cache misses, the detection result is checked: on a hit, the data is read from the other core and the cache line state is updated; on a miss, the request is sent through the AXI interface to the cache consistency peripheral interface module to read the peripheral memory.
The cache consistency peripheral interface converts the AXI request into a request in the CPI format and sends it to the consistency protocol interface channel. The consistency protocol interface channel converts the consistency request into serial data and sends it over the physical link to the peripheral's cache consistency peripheral interface. The peripheral's cache consistency interface channel converts the serial data received from the link back into a custom CPI format request and sends it to the peripheral's consistency function processing module, which converts the CPI format consistency request into the AXI memory bus interface format. The peripheral's memory subsystem reads the corresponding memory data and returns it to the requesting core along the reverse data path, and the core caches the data in the second-level cache and in its own first-level cache. The first interface is ACE, the second interface is AXI, and the third interface is CPI.
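The lookup order of the load flow above (first-level cache, second-level cache, peer cores, then peripheral memory over the consistency interface) can be summarized in a short Python sketch. It is a simplified model under stated assumptions: caches are plain dictionaries keyed by address, MESI state updates and invalidation requests are omitted, filling the first-level cache after a second-level or peer hit is an assumption, and `coherent_load` and `read_remote` are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple

Cache = Dict[int, int]  # address -> data; cache line state is not modelled


def coherent_load(
    addr: int,
    l1: Cache,
    l2: Cache,
    peer_l1s: List[Cache],
    read_remote: Callable[[int], int],
) -> Tuple[int, str]:
    """Return (data, source) following the lookup order described above."""
    # 1. Hit in the requesting core's own first-level cache.
    if addr in l1:
        return l1[addr], "L1"
    # 2. The multi-path detection step: second-level cache first.
    if addr in l2:
        value, source = l2[addr], "L2"
    else:
        # 3. Then check whether another core holds the data.
        for peer in peer_l1s:
            if addr in peer:
                value, source = peer[addr], "peer"
                break
        else:
            # 4. Miss everywhere: read peripheral memory over the
            #    consistency interface and cache the result in L2.
            value, source = read_remote(addr), "remote"
            l2[addr] = value
    # Assumption: the requesting core keeps a copy in its own L1.
    l1[addr] = value
    return value, source
```

Here `read_remote` stands in for the whole AXI-to-CPI conversion, link transfer and peripheral memory read; a repeated load of the same address is then served from the first-level cache.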
The coherent memory access flow between accelerators is as follows. An accelerator initiates a request to read or write remote device memory, and the consistency function processing unit looks up the device cache. On a hit, the device cache data is read. On a miss, a detection request is sent through the cache consistency interface module to the remote accelerator; the remote accelerator's cache consistency interface receives the detection request, checks its local cache state and returns that state to the local accelerator. The local accelerator checks the returned peer cache state: if the peer cache hits, it initiates a request to fetch the data from the peer cache; if the peer cache misses, it initiates a read or write of the memory address. The returned data is written back to the device cache and returned to the user acceleration unit. When the external interface of the user acceleration unit is AXI, the register can be configured to select the AXI interface; other conversions are similar.
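The accelerator-to-accelerator read path above has two phases on a local miss: a detection request that returns only the peer cache state, followed by either a fetch from the peer cache or a read of the remote memory. A minimal Python sketch, assuming dictionary-backed caches and hypothetical callback names (`snoop_remote`, `fetch_from_peer_cache`, `read_remote_memory`), might look like this:

```python
from typing import Callable, Dict


def accelerator_read(
    addr: int,
    device_cache: Dict[int, int],
    snoop_remote: Callable[[int], bool],
    fetch_from_peer_cache: Callable[[int], int],
    read_remote_memory: Callable[[int], int],
) -> int:
    """Read addr on behalf of the user acceleration unit."""
    # Local device cache hit: no traffic on the consistency interface.
    if addr in device_cache:
        return device_cache[addr]
    # Phase 1: detection request; the remote side reports hit or miss.
    if snoop_remote(addr):
        # Phase 2a: peer cache hit, fetch the data from the peer cache.
        data = fetch_from_peer_cache(addr)
    else:
        # Phase 2b: peer cache miss, read the remote memory address.
        data = read_remote_memory(addr)
    # Returned data is written back to the device cache.
    device_cache[addr] = data
    return data
```

Passing the three remote operations as callbacks keeps the sketch independent of how the link is modelled; write requests and cache line state transitions are left out.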
It should be noted that the process by which a processor accesses remote processor memory is similar and is not described again here. Moreover, the consistency processing units in the processor and in the accelerator function similarly and can support symmetric consistency processing, with the data flow decoupled from the host; the initialization configuration, however, must still be initiated by the processor.
This embodiment realizes the consistency interconnection function through the consistency function processing module and the cache consistency peripheral interface on the processor side. Both the processor side and the device side comprise a consistency function processing module, a cache consistency peripheral interface module and a memory subsystem, forming a symmetrical consistency model that natively supports the MESH topology. It therefore supports coherent memory access between processors, between processors and peripherals, and between devices, which covers more application scenarios and allows flexible expansion. Coherent processor access to device memory and coherent device access to host memory improve performance in scenarios with demanding memory capacity and latency requirements, such as large model training, financial acceleration and high-performance network cards. In addition, for memory expansion, various memory media are supported, such as persistent memory (PMEM), LPDDR (low-power DDR) and SSD.
In a specific implementation, different cache coherence protocol mode configurations are supported for different RISC-V processors: the MESI protocol is used to interconnect a MESI-capable RISC-V processor with devices, the MOESI protocol to interconnect a MOESI-capable RISC-V processor with devices, and the MESIF protocol for RISC-V processor systems with a NUMA structure. Multiple types of RISC-V processors can thus be used for server-level applications. Different interface modes can also be chosen flexibly according to the application scenario of the current processor system: for example, the AXI interface can be used for memory expansion (the host reads and writes device memory), whereas a high-performance, low-latency network card that needs to read and write host memory is configured to use the ACE interface.
In this embodiment, a symmetrical cache consistency interface is designed. Through it the processor can coherently access peripheral memory and the peripheral can coherently access host memory, which improves performance for memory expansion applications such as large model training and greatly reduces latency for applications such as network cards. The symmetry supports flexible expansion, including coherent memory access between processors and coherent memory access between devices, and effectively relieves the memory wall and the IO wall.
An electronic device provided in the embodiments of the present invention is described below, and an electronic device described below may refer to other embodiments described herein.
The embodiment of the invention discloses an electronic device, which comprises:
a memory for storing a computer program;
and a processor for executing the computer program to implement the method disclosed in any of the above embodiments.
Further, the embodiment of the invention also provides electronic equipment. The electronic device may be a server as shown in fig. 7 or a terminal as shown in fig. 8. Fig. 7 and 8 are each a block diagram of an electronic device according to an exemplary embodiment, and the contents of the drawings should not be construed as any limitation on the scope of use of the present invention.
Fig. 7 is a schematic structural diagram of a server according to an embodiment of the present invention. The server specifically may include: at least one processor, at least one memory, a power supply, a communication interface, an input-output interface, and a communication bus. Wherein the memory is configured to store a computer program that is loaded and executed by the processor to implement the relevant steps in the data processing disclosed in any of the foregoing embodiments.
In this embodiment, the power supply is configured to provide a working voltage for each hardware device on the server; the communication interface can create a data transmission channel between the server and external equipment, and the communication protocol to be followed by the communication interface is any communication protocol applicable to the technical scheme of the invention, and the communication protocol is not particularly limited; the input/output interface is used for acquiring external input data or outputting data to the external, and the specific interface type can be selected according to the specific application requirement, and is not limited in detail herein.
In addition, the memory may be a read-only memory, a random access memory, a magnetic disk, an optical disk, or the like as a carrier for storing resources, where the resources stored include an operating system, a computer program, data, and the like, and the storage mode may be transient storage or permanent storage.
The operating system is used for managing and controlling each hardware device and computer program on the server, so that the processor can operate on and process the data in the memory; it may be Windows Server, NetWare, Unix, Linux or the like. In addition to the computer program used to perform the data processing method disclosed in any of the embodiments above, the computer programs may further include programs for performing other specific tasks. In addition to update information of the application program, the data may include information such as the developer of the application program.
Fig. 8 is a schematic structural diagram of a terminal according to an embodiment of the present invention, where the terminal may specifically include, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, or the like.
Generally, the terminal in this embodiment includes: a processor and a memory.
The processor may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array) and PLA (Programmable Logic Array). The processor may also include a main processor and a coprocessor: the main processor, also called a CPU (Central Processing Unit), processes data in the awake state, and the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor may integrate a GPU (Graphics Processing Unit), which renders and draws the content to be displayed on the display screen. In some embodiments, the processor may also include an AI (Artificial Intelligence) processor for computing operations related to machine learning.
The memory may include one or more computer non-volatile storage media, which may be non-transitory. The memory may also include high-speed random access memory as well as non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In this embodiment, the memory is at least used to store a computer program which, after being loaded and executed by the processor, can implement the relevant steps of the data processing method performed on the terminal side as disclosed in any of the foregoing embodiments. In addition, the resources stored in the memory may also include an operating system, data and the like, and the storage mode may be short-term storage or permanent storage. The operating system may include Windows, Unix, Linux and the like. The data may include, but is not limited to, update information for the application.
In some embodiments, the terminal may further include a display screen, an input-output interface, a communication interface, a sensor, a power supply, and a communication bus.
Those skilled in the art will appreciate that the structure shown in fig. 8 is not limiting of the terminal and may include more or fewer components than shown.
A non-volatile storage medium according to an embodiment of the present invention is described below, and the non-volatile storage medium described below and other embodiments described herein may be referred to with reference to each other.
A non-volatile storage medium for storing a computer program which, when executed by a processor, implements the data processing method disclosed in the foregoing embodiments. The nonvolatile storage medium is a computer readable nonvolatile storage medium, and can be read-only memory, random access memory, magnetic disk or optical disk, etc. as a carrier for storing resources, and the resources stored on the nonvolatile storage medium include an operating system, a computer program, data, etc., and the storage mode can be transient storage or permanent storage.
A computer program product provided by embodiments of the present invention is described below, and the computer program product described below may be referred to with respect to other embodiments described herein.
A computer program product comprising computer programs/instructions which when executed by a processor implement the steps of the previously disclosed data processing method.
In this specification, the embodiments are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of non-volatile storage medium known in the art.
The principles and embodiments of the present invention have been described herein with reference to specific examples, the description of which is intended only to assist in understanding the methods of the present invention and the core ideas thereof; meanwhile, as those skilled in the art will have variations in the specific embodiments and application scope in accordance with the ideas of the present invention, the present description should not be construed as limiting the present invention in view of the above.
Claims (22)
1. A data processing system, comprising: at least one processor device and at least one target device connected to the at least one processor device;
Each processor device and each target device includes: a coherency function processing device and at least one coherency interface; in the same device, the consistency function processing device is in communication connection with at least one consistency interface;
Between different processor devices, between different target devices, and between any processor device and any target device, two consistency interfaces are communicatively connected to each other, so as to realize cache consistency among different processor devices, cache consistency among different target devices, and cache consistency between any processor device and any target device.
2. The system of claim 1, wherein any coherence function processing device comprises:
The register configuration mode selection module is used for configuring an interface for connecting the consistency interface as an AXI interface or an ACE interface according to the received register configuration request;
The ACE interface processing module is used for determining processing logic of the current consistency protocol request according to the transaction type of the received consistency protocol request;
And the interface conversion module is used for realizing interface conversion between the AXI interface and the ACE interface.
3. The system of claim 1, wherein any coherence function processing device further comprises:
and the multipath detection module is used for writing the response data in the peer processor cache into the request processor cache when detecting that the response data required by the request processor exists in the peer processor cache in the current device.
4. The system of claim 1, wherein any coherence function processing device further comprises:
The second-level cache interface module is used for converting the data format of the received second-level cache access request into a data format matched with the second-level cache interface and processing the received second-level cache access request; after the processing is completed, the processing result is returned to the sending end of the secondary cache access request in an original path.
5. The system of claim 2, wherein any coherence function processing device further comprises:
The virtual memory transaction module is used for detecting whether the current consistency protocol request is an invalidation request if the transaction type of the consistency protocol request is determined to be a distributed virtual memory transaction; if it is an invalidation request, the invalidation request is processed.
6. The system of claim 2, wherein any coherence function processing device further comprises:
The ACE_AXI interface processing module is used for detecting the currently effective interface type, processing the received consistency protocol request according to the currently effective interface type, and filtering out request processing logic that does not match the currently effective interface type; the currently effective interface type is the ACE interface or the AXI interface.
7. The system of claim 1, wherein any coherence interface comprises:
And the ACE/AXI conversion CPI interface module is used for converting the received ACE interface signal or AXI interface signal into a CPI interface format.
8. The system of claim 1, wherein any coherence interface further comprises:
The interface channel module is used for realizing the transmission control functions of a physical layer, a link layer and a transport layer; wherein the transmission control function of the physical layer includes the initialization and control of the physical link, the transmission control function of the link layer includes the control and management of the data link state, and the transmission control function of the transport layer includes the encapsulation and decapsulation of messages.
9. The system of claim 8, wherein any coherence interface further comprises:
The configuration module is used for, in response to a configuration instruction for the interface channel module, configuring the configuration space and the memory space registers of the interface channel module.
10. The system of claim 1, wherein,
Taking any processor device or any target device as an initiating end and taking any communication peer of the initiating end as a destination end, the cache consistency process between the initiating end and the destination end comprises the following steps:
The consistency function processing device in the initiating end transmits a memory data synchronization request to a corresponding target consistency interface in the initiating end; the target consistency interface generates a consistency protocol request according to the memory data synchronization request, and transmits the consistency protocol request to a destination consistency interface in the destination end; the destination consistency interface transmits the consistency protocol request to a consistency function processing device in the destination end; the consistency function processing device in the destination end converts the consistency protocol request into a memory protocol request; and the memory system in the destination end reads the corresponding data in the memory system according to the memory protocol request, and returns the read data to the initiating end along the original path.
11. The system of claim 1, wherein the cache coherence process between any processor device and any target device comprises:
The current processor device loads a memory data synchronization request of the current target device;
And if the request data of the current memory data synchronous request exists in the current processor cache, reading the request data from the current processor cache.
12. The system of claim 11, wherein the cache coherence process between any processor device and any target device further comprises:
If the current processor cache does not have the request data of the current memory data synchronization request, the current memory data synchronization request is sent to an ACE interface processing module in a consistency function processing device of the current processor device through an ACE interface in the consistency function processing device of the current processor device, so that the ACE interface processing module judges whether the current memory data synchronization request is an invalidation request;
if the request is not an invalidation request, the ACE interface processing module sends the current memory data synchronization request to a multi-path detection module in the consistency function processing device of the current processor device, so that the multi-path detection module detects a secondary cache and a peer processor cache in the current processor device;
And if the request data exists in the secondary cache or the peer processor cache, reading the request data from the secondary cache or the peer processor cache, and updating the cache line state.
13. The system of claim 11, wherein cache coherence between any processor device and any target device further comprises:
If the request data does not exist in the secondary cache or the peer processor cache, a consistency function processing device in the current processor device generates a consistency protocol request according to the current memory data synchronization request, and transmits the current memory data synchronization request to a corresponding target consistency interface in the current processor device; the target consistency interface generates a consistency protocol request according to the current memory data synchronization request, and transmits the consistency protocol request to a destination consistency interface in the current target device; the destination consistency interface transmits the consistency protocol request to a consistency function processing device in the current target device; the consistency function processing device in the current target device converts the consistency protocol request into a memory protocol request; and the memory system in the current target device reads the corresponding data in the memory system according to the memory protocol request, and the read data is returned to the current processor device along the original path.
14. The system of claim 1, wherein the cache coherence process between different target devices comprises:
The first target device loads a memory data synchronization request of the second target device;
and if the request data of the current memory data synchronous request exists in the first target equipment cache, reading the request data from the first target equipment cache.
15. The system of claim 14, wherein the cache coherence process between different target devices comprises:
If the first target equipment cache does not have the request data of the current memory data synchronous request, sending a detection request to a consistency interface of the second target equipment through a consistency function processing device and the consistency interface of the first target equipment; the consistency interface of the second target equipment checks the local cache state according to the detection request, and if the local cache of the second target equipment is hit, the request data is read from the local cache of the second target equipment; and if the local cache of the second target device is not hit, reading the request data from the local memory of the second target device.
16. The system of any one of claims 1 to 15, wherein the cache coherence protocol between different devices is MESI, MOESI or MESIF.
17. A data processing method, characterized by being applied to a data processing system, the data processing system comprising: at least one processor device and at least one target device connected to the at least one processor device; each processor device and each target device includes: a coherency function processing device and at least one coherency interface; in the same device, the consistency function processing device is in communication connection with at least one consistency interface; the two consistency interfaces are in communication connection among different processor devices, different target devices, any processor device and any target device and are used for realizing cache consistency among different processor devices, cache consistency among different target devices and cache consistency between any processor device and any target device;
The data processing method includes the following steps, with any processor device or any target device taken as an initiating end and any communication peer of the initiating end taken as a destination end:
The consistency function processing device in the initiating end transmits a memory data synchronization request to a corresponding target consistency interface in the initiating end; the target consistency interface generates a consistency protocol request according to the memory data synchronization request, and transmits the consistency protocol request to a destination consistency interface in the destination end; the destination consistency interface transmits the consistency protocol request to a consistency function processing device in the destination end; the consistency function processing device in the destination end converts the consistency protocol request into a memory protocol request; and the memory system in the destination end reads the corresponding data in the memory system according to the memory protocol request, and returns the read data to the initiating end along the original path.
18. An electronic device, comprising:
A memory for storing a computer program;
a processor for executing the computer program to implement the method of claim 17.
19. A non-volatile storage medium for storing a computer program, wherein the computer program when executed by a processor implements the method of claim 17.
20. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the data processing method of claim 17.
21. A processor device, comprising: a consistency function processing device and at least one consistency interface; the consistency function processing device is communicatively connected to the at least one consistency interface;
and any consistency interface of the processor device is communicatively connected to a consistency interface of another device, so as to achieve cache consistency between the processor device and the other device.
22. An accelerator device, comprising: a consistency function processing device and at least one consistency interface; the consistency function processing device is communicatively connected to the at least one consistency interface;
and any consistency interface of the accelerator device is communicatively connected to a consistency interface of another device, so as to achieve cache consistency between the accelerator device and the other device.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410534079.XA CN118113631B (en) | 2024-04-30 | 2024-04-30 | Data processing system, method, device, medium and computer program product |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410534079.XA CN118113631B (en) | 2024-04-30 | 2024-04-30 | Data processing system, method, device, medium and computer program product |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118113631A (en) | 2024-05-31 |
CN118113631B (en) | 2024-07-02 |
Family
ID=91212761
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410534079.XA Active CN118113631B (en) | 2024-04-30 | 2024-04-30 | Data processing system, method, device, medium and computer program product |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118113631B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118474209A (en) * | 2024-07-11 | 2024-08-09 | 山东海量信息技术研究院 | Memory expansion system and data package packaging method, device, medium and product thereof |
CN118503195A (en) * | 2024-07-17 | 2024-08-16 | 浪潮电子信息产业股份有限公司 | Data transmission method, equipment, heterogeneous system and consistency interconnection processing device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110221985A (en) * | 2019-06-06 | 2019-09-10 | 成都海光集成电路设计有限公司 | The apparatus and method of across chip maintenance buffer consistency strategy |
US20190347125A1 (en) * | 2016-12-31 | 2019-11-14 | Intel Corporation | Systems, methods, and apparatuses for heterogeneous computing |
CN116418869A (en) * | 2022-01-07 | 2023-07-11 | 三星电子株式会社 | Apparatus and method for cache coherency |
EP4273706A1 (en) * | 2022-05-02 | 2023-11-08 | Samsung Electronics Co., Ltd. | Storage device, memory device, and system including storage device and memory device |
CN117407194A (en) * | 2023-10-27 | 2024-01-16 | 中电科申泰信息科技有限公司 | Heterogeneous communication architecture based on cache consistency |
Also Published As
Publication number | Publication date |
---|---|
CN118113631B (en) | 2024-07-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN118113631B (en) | Data processing system, method, device, medium and computer program product | |
US6918012B2 (en) | Streamlined cache coherency protocol system and method for a multiple processor single chip device | |
JP5348429B2 (en) | Cache coherence protocol for persistent memory | |
CN108268385B (en) | Optimized caching agent with integrated directory cache | |
US8037253B2 (en) | Method and apparatus for global ordering to insure latency independent coherence | |
CN101794271B (en) | Implementation method and device of consistency of multi-core internal memory | |
US8806232B2 (en) | Systems and method for hardware dynamic cache power management via bridge and power manager | |
CN113495861A (en) | System and method for computing | |
TW200534110A (en) | A method for supporting improved burst transfers on a coherent bus | |
TW200910100A (en) | Cache memory having configurable associativity | |
JP2020523674A (en) | Reduced cache transfer overhead within the system | |
US20140089600A1 (en) | System cache with data pending state | |
US9864687B2 (en) | Cache coherent system including master-side filter and data processing system including same | |
US20140025930A1 (en) | Multi-core processor sharing L1 cache and method of operating same | |
EP4123649A1 (en) | Memory module, system including the same, and operation method of memory module | |
US11436146B2 (en) | Storage control apparatus, processing apparatus, computer system, and storage control method | |
US20140156950A1 (en) | Emulated message signaled interrupts in multiprocessor systems | |
KR20060023963A (en) | Apparatus and method to provide multithreaded computer processing | |
US20220269433A1 (en) | System, method and apparatus for peer-to-peer communication | |
US20210224213A1 (en) | Techniques for near data acceleration for a multi-core architecture | |
US20030023794A1 (en) | Cache coherent split transaction memory bus architecture and protocol for a multi processor chip device | |
US20150074357A1 (en) | Direct snoop intervention | |
TW201423403A (en) | Efficient processing of access requests for a shared resource | |
US20180074964A1 (en) | Power aware hash function for cache memory mapping | |
US9141560B2 (en) | Multi-level storage apparatus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |