CN117827706A - Data processing method, data processing device, electronic equipment and storage medium


Info

Publication number: CN117827706A
Application number: CN202311846596.2A
Authority: CN (China)
Legal status: Pending
Other languages: Chinese (zh)
Inventor: 贾启祥
Applicant and current assignee: Haiguang Information Technology Co Ltd
Prior art keywords: node, directory, data, processor core, snoop


Abstract

A data processing method, a data processing apparatus, an electronic device, and a storage medium. The data processing method is used for a first node among a plurality of nodes communicatively connected to each other. The first node includes a directory snoop extension filter for the first node, the directory snoop extension filter includes a first coherence directory, and the directory vector of each directory entry in the first coherence directory includes, for object data, a first portion and a second portion, where the first portion indicates whether nodes other than the first node cache the object data and the second portion indicates whether the first node caches the object data. The data processing method includes the following steps: in response to the directory snoop extension filter receiving a query requirement for first data, querying directory entries of the first coherence directory to determine whether the first data is cached within the first node and within nodes other than the first node. The method can relieve the bandwidth occupation pressure of transmission between node partitions.

Description

Data processing method, data processing device, electronic equipment and storage medium
Technical Field
Embodiments of the present disclosure relate to a data processing method, a data processing apparatus, an electronic device, and a storage medium.
Background
Currently, in computer systems having multiple processors and multiple caches, the caches may hold large amounts of data; a piece of data may be held exclusively in one cache, or copies of the data at the same address may be held in different caches. Consistency problems arise when different caches operate on cached data at the same address, and many techniques have emerged to maintain data consistency. For example, when multiple caches store identical copies of data and a processor modifies the data stored in one of the caches, the data may be marked invalid in the other cached copies to avoid a coherency error.
Disclosure of Invention
At least one embodiment of the present disclosure provides a data processing method for a first node of a plurality of nodes communicatively connected to each other, wherein the first node includes a directory snoop extension filter for the first node, the directory snoop extension filter including a first coherence directory, a directory vector of each directory entry in the first coherence directory including, for object data, a first portion and a second portion, the first portion for indicating whether nodes other than the first node cache the object data, the second portion for indicating whether the first node caches the object data; the data processing method includes: in response to the directory snoop extension filter receiving a query requirement for first data, querying directory entries of the first coherence directory to determine whether the first data is cached within the first node and within nodes other than the first node.
For example, in a data processing method according to at least one embodiment of the present disclosure, the first node further includes a first processor core, and the data processing method further includes: generating, by the first processor core, a first access request for the first data, wherein a storage address of the first data is located in a node other than the first node, and the query requirement is generated based on the first access request; and sending, by the directory snoop extension filter, a first snoop request according to a first query result of querying the directory entries of the first coherence directory.
For example, in a data processing method according to at least one embodiment of the present disclosure, the first node further includes a second processor core, the second processor core includes at least one cache, and sending, by the directory snoop extension filter, the first snoop request according to the first query result of querying the directory entries of the first coherence directory includes: in response to determining, according to the first query result, that the first data is already cached in the cache of the second processor core, sending, by the directory snoop extension filter, the first snoop request to the second processor core.
For example, a data processing method according to at least one embodiment of the present disclosure further includes: sending, by the second processor core, the first data from the cache of the second processor core to the first processor core in response to the first access request according to the first snoop request.
For example, in a data processing method according to at least one embodiment of the present disclosure, sending, by the directory snoop extension filter, the first snoop request according to the first query result of querying the directory entries of the first coherence directory includes: in response to determining, according to the first query result, that the first data is cached in nodes other than the first node, sending the first snoop request to the nodes other than the first node.
For example, a data processing method according to at least one embodiment of the present disclosure further includes: receiving, by the first node, an off-node snoop request for a cache coherency state of the first data, wherein the query requirement is generated from the off-node snoop request.
For example, a data processing method according to at least one embodiment of the present disclosure further includes: in response to determining, by querying a directory entry of the first coherence directory, that the first data is not cached within the first node, forwarding, by the first node, the off-node snoop request without processing the off-node snoop request.
For example, in a data processing method according to at least one embodiment of the present disclosure, the plurality of nodes are divided into a plurality of node partitions, each of the plurality of node partitions includes at least one node, and the first portion indicates, by recording partition information, whether nodes other than the first node cache the first data.
For example, in a data processing method according to at least one embodiment of the present disclosure, the inter-node coherence extension unit of the first node queries the first coherence directory of the directory snoop extension filter according to the query requirement.
For example, in a data processing method according to at least one embodiment of the present disclosure, the first node further includes a first processor core, a first memory controller, and a directory snoop filter for the first memory controller, the directory snoop filter including a second coherence directory, a directory vector of each directory entry in the second coherence directory also including the first portion and the second portion for the object data, and the data processing method further includes: generating, by the first processor core, a second access request for second data, wherein a storage address of the second data is located in the first node, and querying a directory entry of the second coherence directory according to the second access request to obtain a second query result for determining whether the second data is cached in the first node and in nodes other than the first node; and sending, by the directory snoop filter, a second snoop request according to the second query result.
For example, in a data processing method according to at least one embodiment of the present disclosure, the first node further includes a second processor core, the second processor core includes at least one cache, and the data processing method further includes: in response to determining, according to the second query result, that the second data is cached in the cache of the second processor core, sending, by the second processor core, the second data from the cache of the second processor core to the first processor core according to the second snoop request in response to the second access request.
At least one embodiment of the present disclosure provides an electronic device including a plurality of nodes communicatively connected to each other, wherein the plurality of nodes includes a first node, the first node includes a directory snoop extension filter for the first node, the directory snoop extension filter including a first coherence directory, a directory vector of each directory entry in the first coherence directory including, for object data, a first portion and a second portion, the first portion for indicating whether nodes other than the first node cache the object data, the second portion for indicating whether the first node caches the object data; the first node is configured to, in response to the directory snoop extension filter receiving a query requirement for first data, query directory entries of the first coherence directory to determine whether the first data is cached within the first node and within nodes other than the first node.
For example, in the electronic device according to at least one embodiment of the present disclosure, the first node further includes a first processor core and an inter-node coherence extension unit; the first processor core is configured to generate a first access request for the first data, a storage address of the first data being located in a node other than the first node; the inter-node coherence extension unit is configured to generate the query requirement based on the first access request; and the directory snoop extension filter is configured to send a first snoop request according to a first query result of querying the directory entries of the first coherence directory.
For example, in an electronic device according to at least one embodiment of the present disclosure, the first node further includes a second processor core, the second processor core includes at least one cache, and the directory snoop extension filter is further configured to send the first snoop request to the second processor core in response to determining that the first data is already cached in the cache of the second processor core according to the first query result.
For example, in an electronic device according to at least one embodiment of the present disclosure, the second processor core is configured to send the first data from the cache of the second processor core to the first processor core in response to the first access request according to the first snoop request.
For example, in an electronic device according to at least one embodiment of the present disclosure, the directory snoop extension filter is further configured to send the first snoop request to nodes other than the first node in response to determining, according to the first query result, that the first data is already cached in the nodes other than the first node.
For example, in the electronic device according to at least one embodiment of the present disclosure, the first node further includes an inter-node coherence extension unit configured to receive an off-node snoop request for a cache coherence state of the first data; the directory snoop extension filter is further configured to generate the query requirement from the off-node snoop request.
For example, in an electronic device according to at least one embodiment of the present disclosure, the inter-node coherence extension unit is further configured to forward the off-node snoop request and not process the off-node snoop request in response to a directory entry of the first coherence directory being queried to determine that the first data is not cached within the first node.
For example, in the electronic device according to at least one embodiment of the present disclosure, the plurality of nodes are divided into a plurality of node partitions, each of the plurality of node partitions includes at least one node, and the first portion indicates, by recording partition information, whether nodes other than the first node cache the first data.
For example, in an electronic device according to at least one embodiment of the present disclosure, the first node further comprises a switching unit, and the inter-node coherence extension unit is configured to communicate with the directory snoop extension filter through the switching unit.
For example, in an electronic device according to at least one embodiment of the present disclosure, the first node further includes a first processor core, a first memory controller, and a directory snoop filter for the first memory controller, the directory snoop filter including a second coherence directory, a directory vector of each directory entry in the second coherence directory also including the first portion and the second portion for the object data; the first processor core is configured to generate a second access request for second data, wherein a storage address of the second data is located in the first node; the first memory controller is configured to query a directory entry of the second coherence directory according to the second access request to obtain a second query result for determining whether the second data is cached in the first node and in nodes other than the first node; and the directory snoop filter is configured to issue a second snoop request according to the second query result.
For example, in an electronic device according to at least one embodiment of the present disclosure, the first node further includes a second processor core, the second processor core includes at least one cache, and the second processor core is configured to, in response to the second query result determining that the second data is already cached in the cache of the second processor core, send the second data from the cache of the second processor core to the first processor core according to the second snoop request in response to the second access request.
At least one embodiment of the present disclosure also provides a data processing apparatus including a memory and at least one processor. The memory is configured to store computer-executable instructions; the at least one processor is configured to execute the computer-executable instructions, wherein the computer-executable instructions, when executed by the at least one processor, implement the method as in any of the embodiments above.
At least one embodiment of the present disclosure also provides a non-transitory storage medium that non-transitory stores computer-executable instructions, wherein the computer-executable instructions, when executed by at least one processor, implement a method as in any of the embodiments above.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings of the embodiments will be briefly described below, and it is apparent that the drawings in the following description relate only to some embodiments of the present disclosure, not to limit the present disclosure.
Fig. 1 shows a schematic diagram of a processor node.
FIG. 2 shows a schematic diagram of a directory organization of a coherence directory of a directory snoop filter.
Fig. 3 illustrates a schematic diagram of a processor node of an electronic device in accordance with at least one embodiment of the present disclosure.
Fig. 4 illustrates a schematic diagram of a directory organization of a coherence directory of a directory snoop filter in accordance with at least one embodiment of the present disclosure.
Fig. 5A shows a schematic diagram of an electronic device according to an embodiment of the present disclosure, the electronic device in this embodiment being a two-node system.
Fig. 5B shows a schematic diagram of an electronic device according to an embodiment of the present disclosure, the electronic device in this embodiment being a three-node system.
Fig. 5C shows a schematic diagram of an electronic device according to an embodiment of the present disclosure, which is a multi-node system.
Fig. 6 illustrates a schematic diagram of a directory organization of a coherence directory of a directory snoop filter in accordance with at least one other embodiment of the present disclosure.
Fig. 7 shows a schematic diagram of a data processing apparatus according to an embodiment of the present disclosure.
Fig. 8 shows a schematic diagram of a non-transitory storage medium provided by an embodiment of the present disclosure.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present disclosure. It will be apparent that the described embodiments are some, but not all, of the embodiments of the present disclosure. All other embodiments, which can be made by one of ordinary skill in the art without the need for inventive faculty, are within the scope of the present disclosure, based on the described embodiments of the present disclosure.
Unless defined otherwise, technical or scientific terms used in this disclosure should be given the ordinary meaning as understood by one of ordinary skill in the art to which this disclosure belongs. The terms "first," "second," and the like, as used in this disclosure, do not denote any order, quantity, or importance, but rather are used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that elements or items preceding the word are included in the element or item listed after the word and equivalents thereof, but does not exclude other elements or items. The terms "connected" or "connected," and the like, are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", etc. are used merely to indicate relative positional relationships, which may also be changed when the absolute position of the object to be described is changed.
In order to keep the following description of the embodiments of the present disclosure clear and concise, the present disclosure omits a detailed description of some known functions and known components.
In multiprocessor (or multi-processor-core) systems, techniques such as snoop filtering have been proposed to improve the efficiency of data coherency maintenance. Snoop filtering techniques may help track the cache states of data in multiple caches, for example: whether a single copy of certain data in some cache is in an exclusive state, whether multiple copies of certain data across caches are in a shared state, or whether certain data exists only in main memory, etc.
In a multiprocessor (or multi-processor-core), multi-cache computer system, a directory snoop filter tracks the state of the processor caches and stores the tracked state information in the snoop filter. After the coherence agent snoops a coherence transaction on the bus, it can query the cache coherence information tracked in the snoop filter and send out corresponding snoop requests to complete the maintenance of coherence.
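For illustration, the cache-line states that such a snoop filter distinguishes can be sketched as the following C enumeration; the names are hypothetical MESI-style labels chosen for the sketch, not terms taken from this disclosure.

```c
/* Hypothetical MESI-style cache-line states that a directory snoop
 * filter may track; the names are illustrative only. */
enum cache_state {
    STATE_INVALID,   /* no valid copy in this cache                  */
    STATE_SHARED,    /* one of possibly several read-only copies     */
    STATE_EXCLUSIVE, /* the only copy, unmodified                    */
    STATE_MODIFIED   /* the only copy, dirty relative to main memory */
};
```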
Fig. 1 shows a schematic diagram of a processor node 100, which is for example a single processor chip or a separate part of a processor chip, and which together with other processor nodes constitutes a larger system, such as an electronic device.
As shown in FIG. 1, processor node 100 is a basic multiprocessor, multi-cache system that uses directory snoop filtering to maintain cache coherency. Processor node 100 includes n+1 processor cores 10 to 1n, n+1 caches (hereinafter also simply referred to as "caches") 20 to 2n, a coherency interconnect bus 14, m+1 memory controllers 40 to 4m, m+1 system memories 30 to 3m, and m+1 directory snoop filters (or "coherency directory snoop filters") 50 to 5m, where n and m are integers greater than or equal to 0.
For example, each of the caches 20 to 2n is configured to store data corresponding to at least one piece of data information. For example, the caches 20 to 2n may be dedicated caches used by the processor cores 10 to 1n, respectively. In addition, the processor node 100 may include a shared cache for the processor cores 10 to 1n.
As shown in fig. 1, the m+1 memory controllers 40 to 4m correspond to the m+1 system memories 30 to 3m, respectively; the m+1 memory controllers 40 to 4m correspond to the m+1 directory snoop filters 50 to 5m, respectively. These memory controllers are used to manage and control access to and operation of the system memories, and, as coherency nodes, connect the corresponding directory snoop filters and system memories to the coherency interconnect bus 14.
These directory snoop filters are used to maintain coherency, in the caches 20 to 2n, of data in the corresponding system memories. More specifically, the directory snoop filters 50 to 5m may help track the state of cached data in the caches 20 to 2n; for example, for certain data, the cached data state may be: only a single copy of the data is cached in one of the caches 20 to 2n, multiple copies of the data are cached, or the data exists only in main memory. The directory snoop filters 50 to 5m track the cache data states of the caches 20 to 2n of the processor cores 10 to 1n and store the tracked state information in the directories of the directory snoop filters 50 to 5m. After the coherence agent snoops a coherence transaction on the bus, it queries the state information tracked in the directory snoop filters 50 to 5m, and the corresponding directory snoop filter sends out a corresponding snoop request to complete the coherence maintenance.
For example, as shown in FIG. 1, the coherency interconnect bus 14 is directly coupled to the caches 20 to 2n and also directly coupled to the memory controllers 40 to 4m. The coherency interconnect bus 14 is a common communication trunk for information transmission; for example, it is a transmission harness composed of electronic components such as wires in a chip.
As shown in fig. 1, the processor node 100 further includes a coherence extension unit 15, where the coherence extension unit 15 is configured to communicate with other processor nodes (e.g., chips) to maintain data cache coherence among the plurality of nodes by querying the directory snoop filters 50 to 5m.
FIG. 2 is a schematic diagram of a directory organization of a coherence directory of a directory snoop filter. As shown in fig. 2, the directory is a data table, and includes multiple directory entries (or referred to as target entries), and each directory entry may include the following data entries:
directory valid bit: indicating whether the directory entry is valid.
Owner ID: if the data stored at the address corresponding to the directory entry is exclusively owned by one cache, the ID of that cache is recorded, so that when another cache accesses the data stored at that address, a snoop request may be sent directly to the cache that exclusively owns the data.
Directory vector: indicates which caches have a backup of the data stored at that address. The directory vector may make a precise record, for example recording whether each cache in the system has cached the data, or may make a fuzzy record, for example recording only which nodes, or even which groups of nodes, have cached the data.
Cache state information: represents the backup state, in the corresponding cache, of the data stored at the address corresponding to the directory entry, such as a Shared state, an Exclusive state, a Modified state, etc.
Directory address tag: records the address information of the data stored at the address corresponding to the directory entry, generally the upper N bits of the address.
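For illustration, the data items listed above can be modeled as the following C structure; the field widths are assumptions made for the sketch, since, as noted below, the actual widths depend on the system.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one directory entry of FIG. 2; the field
 * widths are illustrative only. */
typedef struct {
    bool     valid;       /* directory valid bit                             */
    uint8_t  owner_id;    /* ID of a cache exclusively owning the data       */
    uint32_t dir_vector;  /* one bit per cache (precise) or per node (fuzzy) */
    uint8_t  cache_state; /* Shared / Exclusive / Modified, etc.             */
    uint32_t addr_tag;    /* upper N bits of the tracked address             */
} dir_entry_t;
```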
The cache states, request types, snoop types, and the like depend on which of the MSI/MESI/MOESI protocols the system adopts; the embodiments of the present disclosure are not limited in this regard.
The inventors of the present disclosure have noted in research that, among the data items of the above-described directory entries, the directory vector may make a precise record, such as recording whether each cache has cached the data, or may make a fuzzy record, such as recording only which processor nodes, or even which groups of processor nodes, have cached the data. Accordingly, the more precise the directory vector record, the better the system performance, but the greater the corresponding storage overhead; for example, in a system with 8 nodes of 8 caches each, a precise vector requires 64 bits per entry, while a per-node record requires only 8 bits, but a set bit then requires snooping all 8 caches of that node. While degrading the directory recording precision can solve the problem of storage overhead, it results in many meaningless snoop requests and responses for coherency maintenance, which occupy bandwidth within the system and waste power.
Based on the above understanding, at least one embodiment of the present disclosure provides a data processing method and an electronic device.
An electronic device provided in at least one embodiment of the present disclosure includes a plurality of nodes communicatively connected to each other, where the plurality of nodes includes a first node, the first node includes a directory snoop extension filter for the first node, the directory snoop extension filter including a first coherence directory, a directory vector of each directory entry in the first coherence directory including, for object data, a first portion and a second portion, the first portion being configured to indicate whether nodes other than the first node cache the object data, the second portion being configured to indicate whether the first node caches the object data; the first node is configured to, in response to the directory snoop extension filter receiving a query requirement for first data, query directory entries of the first coherence directory to determine whether the first data is cached within the first node and within nodes other than the first node.
Corresponding to the electronic device, at least one embodiment of the present disclosure provides a data processing method, including the following steps: in response to the directory snoop extension filter receiving a query requirement for the first data, directory entries of the first coherence directory are queried to determine whether the first data is cached within the first node and within other nodes other than the first node.
In the above embodiment, the first node may be any one of the plurality of nodes of the electronic device, for example, a single chip or a part of a single chip, and is itself a multiprocessor, multi-cache system. In operation, the first node may act as a requesting node, a pass-through node, or a home node, and accordingly processes the snoop requests it generates or receives. Unlike the directory snoop filter provided for a memory controller, mentioned later, the directory snoop extension filter is provided for the node as a whole; the directory vector of each directory entry in its coherence directory may comprise two or more portions, whereby the directory vector is recorded hierarchically and hierarchical processing of snoop requests may be achieved.
The above-mentioned "object data" refers to the data corresponding to each directory entry in the directory itself, so that different directory entries correspond to different object data; the "first data" and the later-mentioned "second data" refer to the data that is the object of the current operation. The "query requirement", which is the premise of a query operation, may take, for example, the form of a query request or a form other than a request; the coherence directory is queried based on the query requirement.
For example, the other nodes in the electronic device may be identical to the first node and likewise record the directory vectors of their coherence directories hierarchically (whether in a directory snoop extension filter or in a directory snoop filter), so that hierarchical processing of snoop requests for cache coherence maintenance may be implemented.
The electronic device and the data processing method provided by the above embodiments of the present disclosure can implement hierarchical recording of directory vectors and accordingly process snoop requests hierarchically, so that the number of meaningless snoop requests and responses can be reduced, the bandwidth occupation pressure of inter-node (e.g., inter-chip) transmission is relieved, and, because inter-node data traffic is reduced, the power consumption of inter-chip transmission is reduced; in some embodiments, access latency may also be effectively shortened, which improves overall system performance.
For example, in the electronic device of at least one embodiment, the first node further includes a first processor core and an inter-node coherence extension unit, the first processor core configured to generate a first access request for first data, a storage address of the first data being located in a node other than the first node; the inter-node coherence extension unit is configured to generate a query requirement based on the first access request; the directory snoop extension filter is configured to send a first snoop request based on a first query result of querying a directory entry of the first coherence directory. In this embodiment, the first node is the requesting node, whereby the inter-node coherence extension unit of the first node handles the inter-node communication.
For example, in the electronic device of at least one embodiment, the first node further comprises a second processor core comprising at least one cache; the directory snoop extension filter is further configured to send a first snoop request to the second processor core in response to determining from the first query result that the first data is already cached in the cache of the second processor core.
For example, in an electronic device of at least one embodiment, a second processor core is configured to send first data from a cache of the second processor core to a first processor core in response to a first access request according to a first snoop request.
For example, in the electronic device of at least one embodiment, the directory snoop extension filter is further configured to send a first snoop request to nodes other than the first node in response to determining from the first query result that the first data has been cached in the nodes other than the first node.
For example, in the electronic device of at least one embodiment, the first node further includes an inter-node coherence extension unit; the inter-node coherence extension unit is configured to receive an off-node snoop request for the cache coherency state of the first data; the directory snoop extension filter is further configured to generate a query requirement from the off-node snoop request. In this embodiment, the first node is a pass-through node for the snoop request (i.e., the off-node snoop request).
For example, in the electronic device of at least one embodiment, the inter-node coherence extension unit is further configured to forward the off-node snoop request and not process the off-node snoop request in response to a determination, by querying a directory entry of the first coherence directory, that the first data is not cached within the first node. In the case where the first node is a pass-through node, if it can be determined through the first coherence directory that the data involved in the snoop request is not cached within the node, the first node itself need not process the snoop request beyond forwarding it, thereby reducing the first node's own snoop-handling operations and the responses they would otherwise produce.
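A minimal C sketch of this pass-through behavior follows; all type, function, and field names are hypothetical (not from the disclosure), and the lookup/forward helpers are only declared, since their implementations depend on the node's interconnect.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical types and helpers; not named in the disclosure. */
typedef struct { bool valid; uint32_t local_vector; } dsef_entry_t;
typedef struct { uint64_t addr; int type; } snoop_req_t;

dsef_entry_t *dsef_lookup(uint64_t addr);           /* query the first coherence directory */
void forward_to_next_node(const snoop_req_t *req);  /* via the inter-node coherence extension unit */
void snoop_local_caches(const snoop_req_t *req, uint32_t vec);

/* Pass-through handling of an off-node snoop request: if the first
 * coherence directory shows the data is not cached in this node, the
 * request is only forwarded and not processed locally. */
void on_off_node_snoop(const snoop_req_t *req) {
    dsef_entry_t *e = dsef_lookup(req->addr);
    if (e == NULL || !e->valid || e->local_vector == 0) {
        forward_to_next_node(req);                /* forward; no local processing */
    } else {
        snoop_local_caches(req, e->local_vector); /* handle within this node */
    }
}
```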
For example, in the electronic device of at least one embodiment, the first node further comprises a first processor core, a first memory controller, and a directory snoop filter for the first memory controller, the directory snoop filter comprising a second coherence directory, a directory vector of each directory entry in the second coherence directory also comprising a first portion and a second portion for the object data; the first processor core is configured to generate a second access request for second data, wherein a storage address of the second data is located in the first node; the first memory controller is configured to query the directory entries of the second coherence directory according to the second access request to obtain a second query result for determining whether the second data is cached in the first node and in nodes other than the first node; the directory snoop filter is configured to issue a second snoop request based on the second query result. In this embodiment, the first node is the requesting node and the data to be read is also located in the first node, so the directory snoop filter corresponding to the memory controller handles the snoop request and related operations.
For example, in the electronic device of at least one embodiment, the first node further comprises a second processor core comprising at least one cache; the second processor core is configured to, in response to the second query result determining that the second data has been cached in the cache of the second processor core, send the second data from the cache of the second processor core to the first processor core according to the second snoop request in response to the second access request.
In any of the above embodiments, for example, the inter-node coherence extension unit is configured to communicate with other processor nodes (e.g., chips), receive snoop requests, query the directory snoop extension filter, and perform cache coherence maintenance between the processor nodes.
The inter-node coherence extension unit can filter out snoop requests that do not involve the first node, thereby reducing responses to unnecessary snoop requests within the first node, relieving the bandwidth occupation pressure of intra-node transmission, and reducing intra-node power consumption. Moreover, the inter-node coherence extension unit makes the electronic device convenient to expand flexibly: the electronic device can be expanded to any number of nodes without modifying the other nodes, and when the number of processor nodes grows linearly, the capacity of the directory memory within each processor node need not be increased, which provides a performance-optimized scheme for building a many-core system.
For example, the capacity of the first coherence directory of the directory snoop extension filter and the capacity of the second coherence directory of the directory snoop filter in the same node may each be set and maintained independently, with no necessary relationship between them. For example, an eviction from the first coherence directory may directly eliminate or overwrite a directory entry without notifying the caches concerned, while an eviction from the second coherence directory may send a snoop request to notify the caches concerned of the eviction operation. As another example, an eviction from the first coherence directory may also notify the caches concerned.
For example, in an electronic device of at least one embodiment, the directory snoop extension filter communicates directly with the inter-node coherence extension unit; as another example, the first node further comprises a switching unit, and the inter-node coherence extension unit is configured to communicate (indirectly) with the directory snoop extension filter through the switching unit; for example, the switching unit, the inter-node coherence extension unit, and the directory snoop extension filter are all coupled to a coherence interconnect bus, and the inter-node coherence extension unit and the directory snoop extension filter communicate through the switching unit.
For example, in an electronic device of at least one embodiment, the first node includes p processor cores, the second portion of the directory vector includes q bits for marking the p processor cores, respectively, where p and q are positive integers and q is greater than or equal to p. For example, the first node includes 4 processor cores and the second portion of the directory vector includes 4 bits for marking the 4 processor cores, respectively; e.g., the first/second/third/fourth bit of the 4 bits marks the first/second/third/fourth processor core, respectively.
For example, in an electronic device of at least one embodiment, the plurality of nodes are divided into a plurality of node partitions, each of the plurality of node partitions includes at least one node, the first portion of the directory vector is multiple bits wide with each bit representing the information of one node, and whether nodes other than the first node cache the first data is indicated by recording partition information. For example, the directory vector may include a third portion in addition to the first portion and the second portion, where the first portion indicates whether the node partitions containing the other nodes cache the object data, the third portion indicates whether the node partition where the first node itself is located caches the object data, and the second portion is used to indicate, in the case that the first node caches the object data, the information of at least one cache that caches the object data. Therefore, in this embodiment, the directory vector further includes node partition information, so that snoop requests and the like can be managed per node partition, further relieving the bandwidth occupation pressure of transmission between node partitions; because the data traffic transmitted between node partitions is reduced, the power consumption of the whole system is reduced. In some examples of this embodiment, access latency may also be effectively shortened, improving overall system performance.
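The three-portion vector just described can be illustrated with a short C sketch; the widths below (four remote node partitions, four local caches) are assumptions chosen only to make the example concrete.

```c
/* Hypothetical three-portion directory vector for a partitioned system,
 * assuming 4 remote node partitions and 4 caches in the local node. */
typedef struct {
    unsigned remote_partitions : 4; /* first portion: per remote node partition   */
    unsigned own_partition     : 1; /* third portion: own partition caches it?    */
    unsigned local_caches      : 4; /* second portion: which local caches hold it */
} dir_vector_t;

/* A snoop must cross partition boundaries only if a remote-partition
 * bit is set, which is how partition-level filtering saves bandwidth. */
static inline int needs_cross_partition_snoop(dir_vector_t v) {
    return v.remote_partitions != 0;
}
```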
Correspondingly, in the data processing method of at least one embodiment of the present disclosure, the operation steps corresponding to the above configuration are also performed, and will not be described in detail herein.
Some embodiments of the present disclosure and examples thereof will be described below with reference to the accompanying drawings.
Fig. 3 illustrates a schematic diagram of a processor node of an electronic device in accordance with at least one embodiment of the present disclosure. The electronic device includes a plurality of processor nodes including node 200. The processor node 200 is, for example, a single processor chip or a separate part of a processor chip and together with other processor nodes constitutes the electronic device.
For example, a Multi-Chip Module (MCM) package or a chiplet (Chiplet) package may be employed for the chips in the electronic device. MCM packaging technology packages multiple individual chips into a single module. Chiplet technology splits a complete chip design into multiple small modules, each referred to as a chiplet; these chiplets can be designed and fabricated separately and then integrated together by interconnect technology and packaged to form a complete chip system.
In embodiments of the present disclosure, a node refers to, for example, a chip containing one or more processor cores, one or more memories, one or more (inter-chip) coherency extension units.
As shown in fig. 3, processor node 200 is a multiprocessor, multi-cache system that uses directory snoop filtering to maintain cache coherency. The processor node 200 includes n+1 processor cores 110 to 11n, n+1 caches (hereinafter also simply referred to as "caches") 120 to 12n, a coherency interconnect bus 114, m+1 memory controllers 140 to 14m, m+1 system memories 130 to 13m, m+1 directory snoop filters 150 to 15m, a coherency extension unit 115, and a directory snoop extension filter 15X, where n and m are integers greater than or equal to 0.
Here, the directory snoop filters 150 to 15m are examples of the above-described "directory snoop filter", each including the second coherence directory; the directory snoop extension filter 15X is an example of the above-described "directory snoop extension filter", including the first coherence directory; and the coherency extension unit 115 is an example of the above-described "inter-node coherence extension unit".
For example, each of the caches 120 to 12n is configured to store data corresponding to at least one piece of data information. For example, the caches 120 to 12n may be dedicated caches used by the processor cores 110 to 11n, respectively. In addition, the processor node 200 may include shared caches for the processor cores 110 to 11n, e.g., one shared cache for every two processor cores.
As shown in fig. 3, the m+1 memory controllers 140 to 14m correspond to the m+1 system memories 130 to 13m, respectively; the m+1 memory controllers 140 to 14m correspond to the m+1 directory listening filters 150 to 15m, respectively. These memory controllers are used to manage and control access and operation to system memory and, as coherency nodes, connect corresponding directory snoop filters and system memory to coherency interconnect bus 114.
These directory snoop filters are used to maintain coherency, in the caches 120 to 12n, of data in the corresponding system memories. More specifically, the directory snoop filters 150 to 15m may help track the state of cached data in the caches 120 to 12n; for example, for certain data, the cached data state may be: only a single copy of the data is cached in one of the caches 120 to 12n, multiple copies of the data are cached, or the data exists only in main memory. The directory snoop filters 150 to 15m and the directory snoop extension filter 15X track the cache data states of the caches 120 to 12n of the processor cores 110 to 11n and store the tracked state information in the directories of the directory snoop filters 150 to 15m and the directory snoop extension filter 15X. After the coherence agent snoops a coherence transaction on the bus, it queries the directory snoop extension filter 15X or, if necessary, the state information tracked in the directory snoop filters 150 to 15m, and the corresponding directory snoop filter or directory snoop extension filter issues a corresponding snoop request to complete the maintenance of coherency.
For example, as shown in FIG. 3, the coherency interconnect bus 114 is directly coupled to the caches 120 to 12n and also to the memory controllers 140 to 14m. The coherency interconnect bus 114 is a common communication trunk for information transfer; for example, it is a transmission harness consisting of electrical components such as wires in a chip.
As shown in fig. 3, in the processor node 200, the coherence extension unit 115 is configured to communicate with other processor nodes (e.g., chips), and is further connected to the directory snoop extension filter 15X, and hierarchically manages snoop requests to the node 200 by querying the directory snoop extension filter 15X.
Although specific numbers of directory snoop filters, processor cores, memory controllers, caches, and system memory are shown in FIG. 3, embodiments of the disclosure are not limited to these specific numbers, which may be set to any value as desired.
The electronic device in at least one embodiment of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), wearable electronic devices, etc., fixed terminals such as digital TVs, desktop computers, smart home devices, etc., server side devices such as local area network servers, wide area network servers, cloud servers, etc.
For example, the system memory in at least one embodiment of the present disclosure may be a main memory (e.g., DRAM), or may be a memory external to the processor node 200, such as a hard disk, a floppy disk, an optical disk, a USB flash drive, or the like.
For example, a memory controller, directory snoop filter, coherency extension unit, etc. may be implemented in software, hardware, firmware, or any viable combination thereof, and may include, for example, logic devices such as registers, latches, flip-flops, buffers, inverters, etc. For example, a processor core may be a Central Processing Unit (CPU) or another form of processing unit having data processing and/or program execution capabilities, such as a Field Programmable Gate Array (FPGA) or a Tensor Processing Unit (TPU); for example, the Central Processing Unit (CPU) may be of an X86, RISC-V, or ARM architecture, or the like.
Embodiments of the present disclosure are not limited in the type of processor core, the microarchitecture employed, the instruction set, etc., and may employ, for example, the X86 instruction set and corresponding microarchitecture, the Arm instruction set and corresponding microarchitecture, or the RISC-V instruction set and corresponding microarchitecture, etc.
Similarly, embodiments of the present disclosure are not limited in terms of cache, type of system memory, etc. It should be noted that, in embodiments of the present disclosure, a private cache of a processor represents a cache that may be accessed only by the processor, while a shared cache may be accessed by multiple processors.
For example, when the processor core 110 needs to read certain data, at least one private cache 120 used by the processor core 110 is accessed first; when the private cache 120 does not store the data that the processor core 110 needs to read, the processor core 110 accesses the next lower-level cache; when that cache also misses, the processor core 110 may then continue to access at least one shared cache (not shown) used by the processor core 110, or the system memory.
For example, each cache includes a cache control module that can access each storage location in the cache to read the content stored there and then parse the stored content. When judging whether the data to be read by a read request sent by a processor core is stored in a certain cache, the cache control module in that cache executes the judgment: it reads and parses the content of the storage unit and compares the parsed content with the data information to be read determined from the read request, thereby judging whether the data stored in the storage unit is consistent with the data to be read.
Fig. 4 is a schematic diagram of a directory organization of a coherence directory of a directory snoop filter in accordance with at least one embodiment of the present disclosure, for use with the directory snoop extension filter 15X described above, and further for use with the directory snoop filters 150 to 15m.
As shown in FIG. 4, the coherence directory is a data table comprising multiple directory entries (or target entries), each directory entry may comprise multiple data entries, and the directory vector comprises multiple parts, as compared to the case shown in FIG. 2, such as:
First portion: indicates whether other nodes possess a backup.
Second portion: the local-node cache ID vector.
For example, in one example, the first portion has a size (bit width) of 1 bit, where a value of "1" indicates that caches of other nodes (other than the current node) have backups of data corresponding to the current directory entry, and a value of "0" indicates that caches of other nodes do not have backups of the data. Whether the current node's own cache owns a backup of the data may be determined by querying the second portion.
For example, in another example, the first portion is more than 1 bit in size, and each 1 bit represents a node in the system to indicate whether the node has a backup of some data; for example, the system has 5 nodes, and the first portion is 5 bits in size and each 1 bit represents one of the 5 nodes to indicate whether the node has a backup of certain data.
As another example, the first portion is a fuzzy record and the second portion is a precise record. The first portion may be 1 bit or multiple bits wide, where each bit represents one direction toward the remote nodes; for example:
with only one direction toward the remote nodes, a 1-bit width may be used; or
with 2 directions toward the remote nodes, a 2-bit width may be used; or
with 4 directions toward the remote nodes, a 4-bit width may be used.
In this case, how much bit width the first part has in particular may be determined by the interconnection structure of the system, and once the interconnection structure of the system is defined, the bit width of the first part "whether other nodes have backups" in the directory vector may be selected accordingly.
Similarly, once the number of processor cores owned by each node is determined, the bit width of the second portion of the directory vector, the "local-node cache ID vector", is also determined.
For example, for the second portion, which is recorded precisely: if the current node includes 2 caches, 2 bits may be used to correspond to the 2 caches of the current node, with each bit value of "1" or "0" indicating whether the corresponding cache has a backup of the data; as another example, if the current node includes 4 caches, 4 bits may be used to correspond to the 4 caches of the current node, with each bit value of "1" or "0" indicating whether the corresponding cache has a backup of the data.
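For illustration, a 1-bit first portion and a 2-bit second portion can be manipulated as in the following C sketch; the macro names and the bit assignment (bit 2 = remote summary, bits 1..0 = local caches) are assumptions made for the example.

```c
#include <stdint.h>

/* Hedged sketch of a 3-bit directory vector: a 1-bit first portion
 * ("does any other node have a backup?") plus a 2-bit second portion
 * (one bit per local cache), as in the 2-cache example above. */
#define REMOTE_BIT (1u << 2)  /* first portion                  */
#define LOCAL_MASK 0x3u       /* second portion: caches 0 and 1 */

static inline uint8_t mark_local(uint8_t vec, unsigned cache_id) {
    return (uint8_t)(vec | (1u << cache_id)); /* local cache now has a backup */
}
static inline int remote_has_backup(uint8_t vec) {
    return (vec & REMOTE_BIT) != 0;
}
static inline int local_has_backup(uint8_t vec, unsigned cache_id) {
    return (vec >> cache_id) & 1u;
}
```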
The meaning and examples of the other data items of each directory entry may be referred to the description of fig. 2, and are not repeated here.
In one example, each directory entry may be 32 bits: e.g., the directory valid bit is 1 bit, the owner ID is 4 bits, the directory vector is 3 bits (where the first portion, "whether other nodes have backups", is 1 bit and the second portion, the local-node cache ID vector, is 2 bits), the cache state information is 3 bits, and the directory address tag is 21 bits. For example, in operation, the coherence directory is stored in a static random access memory (SRAM); for example, the access line width of the SRAM is 128 bits, that is, each read/write is performed at a granularity of 128 bits, so 4 directory entries are read/written at a time. The organization of the SRAM can vary, and the embodiments of the present disclosure are not limited thereto; e.g., reads and writes may be performed at other granularities. Moreover, the embodiments of the present disclosure do not specifically limit the content and bit width covered by each item of the recorded information.
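The stated 32-bit layout can be written out as a packed C bitfield, with one 128-bit SRAM line holding four entries; bitfield ordering is compiler-dependent, and the row/tag split below (ROW_BITS, the cache-line shift) is an assumption added for the sketch.

```c
#include <stdint.h>

/* The 32-bit directory entry example: 1 + 4 + 3 + 3 + 21 = 32 bits. */
typedef struct {
    uint32_t valid      : 1;  /* directory valid bit                        */
    uint32_t owner_id   : 4;  /* exclusive owner's cache ID                 */
    uint32_t dir_vector : 3;  /* 1-bit first portion + 2-bit second portion */
    uint32_t state      : 3;  /* cache state information                    */
    uint32_t addr_tag   : 21; /* directory address tag (upper address bits) */
} dir_entry32_t;

/* One 128-bit SRAM access line holds 4 such entries. */
typedef struct { dir_entry32_t e[4]; } sram_line_t;

/* Hypothetical address split: low-order bits index the SRAM row, and the
 * upper 21 bits are stored as the tag. The 6-bit cache-line offset and
 * ROW_BITS are assumptions. */
#define ROW_BITS 10
static inline uint32_t dir_row(uint64_t addr) {
    return (uint32_t)((addr >> 6) & ((1u << ROW_BITS) - 1));
}
static inline uint32_t dir_tag(uint64_t addr) {
    return (uint32_t)((addr >> (6 + ROW_BITS)) & 0x1FFFFFu);
}
```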
Fig. 5A shows a schematic diagram of an electronic device according to an embodiment of the present disclosure, the electronic device in this embodiment being a two-node system. As shown, the system has 2 nodes, and the topology for cache coherency maintenance for each node may employ an embodiment such as that shown in fig. 3.
In a multi-node system, a "requesting node" refers to a node that sends an access request, and a "home node" refers to a node where memory to be accessed by the access request is located. That is, the requesting node and home node are not fixed with respect to a particular operation.
For the dual-node system of FIG. 5A, for example, in operation 1, processor core 0 of the left node sends a read request accessing the memory of the right node; then, for operation 1, the left node is the requesting node and the right node is the home node. For example, in operation 2, processor core 1 of the left node sends a read request accessing the memory of the left node; then, for operation 2, both the requesting node and the home node refer to the left node. For example, in operation 3, processor core 0 of the right node sends a read request accessing the memory of the left node; for operation 3, the right node is the requesting node and the left node is the home node.
In fig. 5A, a non-limiting description is given taking the left node as the requesting node and the right node as the home node. The requesting node and the home node are functionally identical; in the schematic diagram, the requesting node additionally depicts a processor core and an inter-chip coherency extension unit, and the home node additionally depicts a system memory.
Each of the two nodes has, for example, 2 processor cores and 2 caches corresponding to the two processor cores respectively, and the nodes perform inter-chip communication through their respective inter-chip coherency extension units. Moreover, in both nodes, the directory vectors of the directory entries in the directories of the directory snoop extension filter and the directory snoop filters include a 1-bit first portion and a 2-bit second portion. The 1 bit of the first portion indicates whether the remote node has a backup of the corresponding data; the 2 bits of the second portion indicate whether the two caches of the local node each have a backup of the corresponding data.
In the state shown in fig. 5A, when the caches of both processor cores of the requesting node have a backup of some data, the directory vector of the corresponding directory entry of the directory snoop extension filter (and the directory snoop filter) of the home node is { remote node vector = 1b, local node vector = 00b }, indicating that there is no backup of the data in the home node's caches and there is a backup of the data in the requesting node's caches. The directory vector of the directory snoop extension filter (and directory snoop filter) of the requesting node is { remote node vector = 0b, local node vector = 11b }, indicating that there is a backup of the data in both caches of the requesting node and no backup of the data in the home node's caches. At this time, when the directory snoop filter or the directory snoop extension filter in the home node sends a snoop request, it only needs to send it to the requesting node.
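Re-expressed numerically (reusing the bit assignment from the earlier sketch: bit 2 = remote node, bits 1..0 = the two local caches), the two vectors are:

```c
/* Fig. 5A state from the text, as 3-bit values. */
enum {
    HOME_DIR_VEC = 0x4, /* {remote = 1b, local = 00b} = 3'b100 */
    REQ_DIR_VEC  = 0x3  /* {remote = 0b, local = 11b} = 3'b011 */
};

/* The home node's vector has only the remote bit set, so its snoop
 * goes off-node to the requesting node and nowhere else. */
static inline int snoop_goes_off_node(unsigned vec) { return (vec & 0x4) != 0; }
static inline int snoop_stays_in_node(unsigned vec) { return (vec & 0x3) != 0; }
```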
In at least one example, for a multi-node system, after adding a directory snoop extension filter, the directory vector of the directory snoop filter corresponding to a memory controller within the node may be more compact: it only needs to record the cache state of the present node and whether other nodes possess a backup.
Some exemplary processes for processing data access requests are described below in connection with the dual node system shown in FIG. 5A.
Case 1: the case where directory snoop filter 0 of the right node misses.
First, processor core 0 of the requesting node (the left node in the figure) issues a read request 1 to the system memory 0 corresponding to memory controller 0 of the home node (the right node in the figure), in order to access data A (an example of the first data) in system memory 0, where data A has no cached backup in the requesting node.
Read request 1 is then sent over an interconnect within the requesting node, such as a network on chip (NoC), to the requesting node's inter-chip coherence extension unit, which queries the coherence directory of the directory snoop extension filter (an example of a first coherence directory); the query result is a miss. The inter-chip coherence extension unit therefore forwards read request 1 to the home node, where it is routed through the home node's inter-chip coherence extension unit and intra-node interconnect (e.g., a network on chip) to memory controller 0. Memory controller 0 queries directory snoop filter 0 based on the address of data A; this query is also a miss.
Memory controller 0 sends a read request to system memory 0 and returns data A, and records the address of data A into an entry in the coherence directory of directory snoop filter 0: it indexes a specific SRAM row of the directory with the low-order bits of the address, writes the high-order bits of the address into the directory address tag field of the entry, sets the directory valid bit to 1, points the owner ID at (the cache of) processor core 0 of the left node, sets the directory vector to 3'b100, indicating that the other node holds a backup of the data and no cache in the present node does, and sets the cache state to an appropriate state, such as Exclusive.
When the read data A returns through the requesting node's inter-chip coherence extension unit, that unit records the address of data A into an entry in the coherence directory of the directory snoop extension filter: it indexes a specific SRAM row of the directory with the low-order bits of the address, writes the high-order bits of the address into the directory address tag field of the entry, points the owner ID at (the cache of) processor core 0 of the present node, and sets the directory vector to 3'b001, indicating that the other node holds no backup of the data while the cache of the present node's processor core 0 does. The read data A is then delivered over the intra-node routing path to processor core 0 of the requesting node and placed into processor core 0's cache.
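The entry-allocation step common to both filters in Case 1 can be sketched in C. The directory depth, line size, field names, and direct-mapped organization below are assumptions for illustration, not the disclosed design; the sketch captures only the index/tag split and the fields written on a miss, as described above.

```c
#include <stdint.h>

#define DIR_ROWS   1024    /* assumed directory depth (SRAM rows) */
#define LINE_SHIFT 6       /* assumed 64-byte cache lines         */

enum cache_state { STATE_INVALID, STATE_SHARED, STATE_EXCLUSIVE };

struct dir_entry {
    uint64_t tag;          /* high-order address bits                  */
    uint8_t  valid;        /* directory valid bit                      */
    uint8_t  owner_id;     /* core/cache pointed at by the owner ID    */
    uint8_t  vector;       /* invented {remote, core1, core0} encoding */
    uint8_t  state;        /* recorded cache state                     */
};

static struct dir_entry dir[DIR_ROWS];

/* Miss handling in directory snoop filter 0 of the home node: index a
 * directory row with the low-order address bits, write the high-order
 * bits into the tag field, set the valid bit, point the owner ID at
 * the requester, and set the vector to 3'b100 (remote backup only). */
static void alloc_on_miss(uint64_t addr, uint8_t requester_id)
{
    uint64_t line = addr >> LINE_SHIFT;
    struct dir_entry *e = &dir[line % DIR_ROWS];

    e->tag      = line / DIR_ROWS;
    e->valid    = 1;
    e->owner_id = requester_id;
    e->vector   = 0x4;               /* 3'b100 */
    e->state    = STATE_EXCLUSIVE;
}

int main(void)
{
    alloc_on_miss(0x1F40u, 0);  /* data A, requested by left-node core 0 */
    /* The requesting node's extension filter performs the symmetric
     * write with vector 3'b001 (backup in local core 0 only). */
    return 0;
}
```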
Case 2: the case where directory snoop filter 0 of the right node hits.
After the above operation has left a backup of data A in processor core 0's cache, processor core 1 of the requesting node then issues read request 2 for the same address of data A as read request 1, thereby again targeting memory controller 0 of the home node.
Similarly, read request 2 is routed to the requesting node's inter-chip coherence extension unit, which queries the coherence directory of the directory snoop extension filter (an example of a first coherence directory); the query result is a hit, and directory vector 3'b001 confirms that the other node holds no backup while the cache of the present node's processor core 0 does. The directory snoop extension filter therefore issues a snoop request only to processor core 0 of the requesting node, without sending any snoop request out of the node. On receiving the snoop request, processor core 0 returns the cached data A directly to processor core 1 of the requesting node; according to the type (actual requirement) of the snoop request, processor core 0's copy of data A can be set to the Invalid state or the Shared state, and the cache of processor core 1 then caches data A.
Processor core 1 of the requesting node sends a read response acknowledgment to the requesting node's inter-chip coherence extension unit, which updates the directory entry for data A in the coherence directory of the directory snoop extension filter, setting the directory vector to 3'b011, indicating that other nodes hold no backup of the data while the caches of both processor core 0 and processor core 1 of the present node do.
Then, the requesting node's inter-chip coherence extension unit sends an update request to directory snoop filter 0, which corresponds to memory controller 0 of the home node. Via the home node's inter-chip coherence extension unit, directory snoop filter 0 receives the directory update request and updates its coherence directory: the directory vector stays at 3'b100, indicating that another node holds a backup of the data and no cache in the home node does, while the cache state is set to an appropriate state, for example Shared.
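Taken together, the two directory updates at the end of Case 2 are small: the requesting node sets one more local-core bit, and the home node leaves its vector unchanged but downgrades the recorded cache state. A minimal sketch, continuing the invented 3-bit {remote, core1, core0} encoding from the earlier examples:

```c
#include <stdint.h>
#include <stdio.h>

enum cache_state { STATE_INVALID, STATE_SHARED, STATE_EXCLUSIVE };

struct dir_entry {
    uint8_t vector;  /* invented {remote, core1, core0} encoding */
    uint8_t state;
};

int main(void)
{
    /* State after Case 1: requesting node 3'b001, home node 3'b100. */
    struct dir_entry req  = { 0x1, STATE_EXCLUSIVE };
    struct dir_entry home = { 0x4, STATE_EXCLUSIVE };

    /* Case 2 epilogue: core 1 of the requesting node now also holds
     * data A, so its local bits go from 01b to 11b (3'b001 -> 3'b011). */
    req.vector |= 1u << 1;

    /* The home node keeps vector 3'b100 (remote backup, none local)
     * but downgrades the recorded cache state to Shared. */
    home.state = STATE_SHARED;

    printf("req=0x%x home=0x%x home_state=%d\n",
           req.vector, home.vector, home.state);
    return 0;
}
```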
Fig. 5A shows the state at this point, after the above operations have completed.
Fig. 5B shows a schematic diagram of an electronic device according to an embodiment of the present disclosure; the electronic device in this embodiment is a three-node system. As shown in fig. 5B, the system has 3 nodes, and the topology for cache coherency maintenance of each node may, for example, adopt the embodiment shown in fig. 3. The three nodes are connected in series: as shown, the left node and the right node are each connected to the intermediate node, so that all nodes can communicate with one another.
In fig. 5B, a non-limiting description takes the left node as the requesting node, the intermediate node as the pass-through node, and the right node as the home node. The three nodes are functionally identical as a whole, but the schematic depicts only the processor cores, the inter-chip coherence extension unit, and the directory snoop extension filter of the requesting node; only the inter-chip coherence extension unit and the directory snoop extension filter of the pass-through node; and only the inter-chip coherence extension unit, the directory snoop extension filter, memory controller 0, directory snoop filter 0, and system memory 0 of the home node.
Each of the three nodes has, for example, 2 processor cores and 2 caches, one corresponding to each processor core, and the nodes communicate chip-to-chip through their respective inter-chip coherence extension units. In each node, the directory vector of a directory entry, in both the directory snoop extension filter and the directory snoop filter, includes a first portion of 2 bits and a second portion of 2 bits. The 2 bits of the first portion indicate whether a backup of the corresponding data exists in either of the two remote nodes; the 2 bits of the second portion indicate, respectively, whether each of the local node's two caches holds a backup of the data.
In the state shown in fig. 5B, the directory vector in the home node's directory snoop extension filter (and directory snoop filter) is {remote node vector = 10b, local node vector = 00b}, indicating that no backup of the data corresponding to the directory entry exists in the home node's caches, while a backup exists in a remote node's cache in the leftward direction from the home node.
The directory vector in the pass-through node's directory snoop extension filter (and directory snoop filter) is {remote node vector = 10b, local node vector = 00b}, indicating that no backup of the corresponding data exists in the pass-through node's two caches, while a backup exists in a remote node's cache in the leftward direction from the pass-through node.
The directory vector in the requesting node's directory snoop extension filter (and directory snoop filter) is {remote node vector = 00b, local node vector = 11b}, indicating that a backup of the corresponding data exists in both caches of the requesting node, and none exists in the caches of any other node (the node to the left and/or the node to the right of the requesting node).
For example, when the home node's directory snoop filter needs to send a snoop request, the request only needs to be sent toward the left node; no snoop request needs to be issued inside the home node. On receiving this (off-node) snoop request, the pass-through node's inter-chip coherence extension unit queries the directory of its own directory snoop extension filter and, according to the query result, only forwards the snoop request toward the left node without issuing or processing it inside the pass-through node. Similarly, on receiving the (off-node) snoop request, the requesting node's inter-chip coherence extension unit queries the directory of its own directory snoop extension filter and, according to the result, can send the snoop request to processor core 0 and processor core 1 and process it accordingly.
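The forwarding decision made at each hop can be sketched as follows. The 2-bit + 2-bit layout and the direction macros are assumptions for this three-node chain, not the disclosed encoding; the sketch shows only the dispatch logic: snoop locally where local bits are set, and forward only in directions whose remote bit is set.

```c
#include <stdint.h>
#include <stdio.h>

/* Invented 4-bit vector for the three-node chain: bits 3..2 are the
 * 2-bit first portion (backup via the left / right link), bits 1..0
 * are the 2-bit second portion (local core 1 / core 0 caches). */
#define VEC_LEFT  (1u << 3)
#define VEC_RIGHT (1u << 2)
#define VEC_LOCAL 0x3u

/* Dispatch an incoming off-node snoop request at one node. */
static void handle_off_node_snoop(uint8_t vec)
{
    if (vec & VEC_LOCAL)
        printf("snoop local cores (bits %u)\n", vec & VEC_LOCAL);
    if (vec & VEC_LEFT)
        printf("forward snoop toward the left node\n");
    if (vec & VEC_RIGHT)
        printf("forward snoop toward the right node\n");
    if (vec == 0)
        printf("no recorded backup; forward onward without processing\n");
}

int main(void)
{
    /* Pass-through node in fig. 5B: {remote=10b, local=00b} means a
     * leftward forward only, with no snooping inside this node. */
    handle_off_node_snoop(VEC_LEFT);
    return 0;
}
```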
This hierarchically recorded directory vector can thus be extended to any multi-node system without modifying the storage capacity or design of the directory.
Fig. 5C shows a schematic diagram of an electronic device according to an embodiment of the present disclosure; this electronic device is a multi-node system. As shown, the system has more than 3 nodes, for example m rows and n columns of nodes (m and n being positive integers greater than or equal to 2), and the topology for cache coherency maintenance of each node may, for example, adopt the embodiment shown in fig. 3. In this system, the nodes are connected to one another in a mesh (e.g., via a network on chip (NoC)) so as to communicate.
In this embodiment, the first portion of the directory vector, used to indicate the state of remote nodes, is 4 bits wide; according to the routing rules, these 4 bits indicate respectively whether a backup of the corresponding data exists in the up, down, left, and right directions, i.e., whether a snoop should be routed up, down, left, or right. The bit width of the second portion, used to indicate the state of the local node, depends on the number of processor cores in each node; it is likewise fixed once the node design is fixed and does not grow with the number of nodes in the system. It follows that, regardless of the number of nodes in the system or the number of processor cores per node, each node of this embodiment can record the directory vector hierarchically and precisely, so that snoop requests can locate, along the forwarding path, exactly the caches that must process them.
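A possible shape of such a vector, sketched in C under assumed parameters (NCORES, the field names, and the bitfield layout are invented for illustration, not the disclosed layout): the point is that the first portion stays at 4 direction bits however large the mesh grows, while the second portion is fixed by the node design.

```c
#include <assert.h>

#define NCORES 2  /* assumed number of processor cores per node */

/* Invented bitfield layout for the m x n mesh of fig. 5C: the first
 * portion is always 4 bits, one per routing direction, regardless of
 * how many nodes the mesh contains; the second portion is one bit per
 * local processor core and is fixed once the node design is fixed. */
struct mesh_dir_vector {
    unsigned up    : 1;       /* backup reachable via the up link    */
    unsigned down  : 1;       /* backup reachable via the down link  */
    unsigned left  : 1;       /* backup reachable via the left link  */
    unsigned right : 1;       /* backup reachable via the right link */
    unsigned local : NCORES;  /* backup in each local core's cache   */
};

int main(void)
{
    /* A line cached somewhere upward and in local core 0: the snoop
     * fan-out on the forwarding path reads directly from these bits. */
    struct mesh_dir_vector v = { .up = 1, .local = 0x1 };
    assert(v.up == 1 && v.down == 0 && v.left == 0 && v.right == 0);
    return 0;
}
```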
In the above embodiments, the directory snoop extension filter interfaces with the inter-chip coherence extension unit; however, the attachment point of the directory snoop extension filter is not limited to the inter-chip coherence extension unit, and it may interface with other units in the system.
Fig. 6 illustrates a schematic diagram of a processor node 300 of an electronic device in accordance with at least one embodiment of the present disclosure. The electronic device includes a plurality of processor nodes, including node 300. Processor node 300 is, for example, a single processor chip, or a separate part of a processor chip, and together with the other processor nodes it constitutes the electronic device.
As shown in fig. 6, processor node 300 maintains cache coherency for a multiprocessor, multi-cache system using directory snoop filtering. Processor node 300 includes n+1 processor cores 210-21n, n+1 caches 220-22n, a coherency interconnect bus 214, m+1 memory controllers 240-24m, m+1 system memories 230-23m, m+1 directory snoop filters 250-25m, a coherence extension unit 215, and a directory snoop extension filter 25X, where n and m are integers greater than or equal to 0. Here, directory snoop filters 250-25m are examples of the above-described "directory snoop filter", and directory snoop extension filter 25X is an example of the above-described "directory snoop extension filter".
Except for the differences described below, processor node 300 is identical to processor node 200 and will not be described in detail.
As shown in fig. 6, in processor node 300, the coherence extension unit 215 is configured to communicate with other processor nodes (e.g., chips) and is communicatively connected to the switching unit 210; the directory snoop extension filter 25X is also communicatively connected to the switching unit 210. The coherence extension unit 215 communicates indirectly with the directory snoop extension filter 25X through the switching unit 210, whereby snoop requests to node 300 can be managed hierarchically by querying the directory snoop extension filter 25X. The switching unit 210 is connected to the coherency interconnect bus 214 and is configured to control communication within node 300 over that bus; further, interconnection between nodes can be accomplished by the switching unit 210 and the coherence extension unit 215 together.
As another example, in the case shown in fig. 6, the plurality of nodes is divided into a plurality of node partitions, each including at least one node, and the first portion of the directory vector indicates whether nodes other than the first node cache the first data by recording region information. For example, the directory vector may include a third portion in addition to the first and second portions: the first portion indicates whether a node partition containing other nodes caches certain data, the third portion indicates whether the node partition where the first node itself resides caches the data, and the second portion indicates which cache within the first node holds the data if the first node caches it.
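A hedged sketch of such a three-portion vector follows; NPARTS, NCACHES, and all field names are illustrative assumptions, not the disclosed layout.

```c
#define NPARTS  4  /* assumed number of node partitions          */
#define NCACHES 2  /* assumed number of caches in the first node */

/* Invented three-portion layout: the first portion records region
 * information (one bit per other node partition), the third portion
 * records whether the first node's own partition holds a backup, and
 * the second portion records which cache inside the first node holds
 * the data when the first node caches it. */
struct part_dir_vector {
    unsigned other_parts : NPARTS - 1; /* first portion  */
    unsigned own_part    : 1;          /* third portion  */
    unsigned caches      : NCACHES;    /* second portion */
};

int main(void)
{
    /* Backup held only in another partition (partition 2 of 4): one
     * first-portion bit set, third and second portions clear. */
    struct part_dir_vector v = { .other_parts = 1u << 1 };
    (void)v;
    return 0;
}
```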
Some embodiments of the present disclosure also provide another data processing apparatus. Fig. 7 is a schematic diagram of a data processing apparatus according to some embodiments of the present disclosure.
As shown in fig. 7, a data processing apparatus 500 according to an embodiment of the present disclosure may include a processor 501 and a memory 502, which may be interconnected by a bus 503.
The processor 501 may perform various actions and processes according to programs or code stored in the memory 502. In particular, the processor 501 may be an integrated circuit chip with signal processing capability. For example, the processor 501 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or another programmable logic device, discrete gate or transistor logic, or discrete hardware components, and may implement or perform the various methods and steps disclosed in embodiments of the present disclosure. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like, and may be of the X86 architecture, the ARM architecture, or the like.
Memory 502 is used for non-transitory storage of computer-executable instructions and processor 501 is used for execution of computer-executable instructions. Computer-executable instructions, when executed by processor 501, implement a data processing method provided by at least one embodiment of the present disclosure.
For example, the memory 502 may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. The volatile memory may be random access memory (RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous link dynamic random access memory (SLDRAM), and direct Rambus random access memory (DRRAM). It should be noted that the memory of the methods described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
Embodiments of the present disclosure also provide a non-transitory storage medium, which may be a non-transitory computer-readable storage medium. The non-transitory storage medium is used to non-transitory store computer executable instructions that, when executed by a computer, implement the data processing methods provided by some embodiments of the present disclosure.
Fig. 8 is a schematic diagram of a non-transitory storage medium provided by some embodiments of the present disclosure.
As shown in fig. 8, the non-transitory storage medium 600 may non-transitory store computer executable instructions 610, which when executed by a computer, the computer executable instructions 610 implement a data processing method provided by any of the embodiments of the present disclosure.
Similarly, the non-transitory storage medium in embodiments of the present disclosure may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. It should be noted that the memory of the methods described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
Embodiments of the present disclosure also provide a computer program product comprising computer instructions; the computer instructions may be stored in a non-transitory storage medium. A processor of a computer device reads the computer instructions, and the processor executes them, causing the computer device to perform a data processing method according to any embodiment of the present disclosure.
The technical effects of the data processing apparatus and the non-transitory storage medium are the same as those of the data processing method, and will not be described here again.
It is noted that the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises at least one executable instruction for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In general, the various example embodiments of the disclosure may be implemented in hardware or special purpose circuits, software, firmware, logic, or any combination thereof. Some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device. While aspects of the embodiments of the present disclosure are illustrated or described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that the blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
For the purposes of this disclosure, the following points are also noted:
(1) The drawings of the embodiments of the present disclosure relate only to the structures related to the embodiments of the present disclosure, and other structures may refer to the general design.
(2) In the drawings for describing embodiments of the present disclosure, thicknesses and dimensions of layers or structures are exaggerated for clarity. It will be understood that when an element such as a layer, film, region or substrate is referred to as being "on" or "under" another element, it can be "directly on" or "under" the other element or intervening elements may be present.
(3) The embodiments of the present disclosure and features in the embodiments may be combined with each other to arrive at a new embodiment without conflict.
The foregoing is merely a specific embodiment of the disclosure, but the scope of the disclosure is not limited thereto and should be determined by the scope of the claims.

Claims (24)

1. A data processing method for a first node of a plurality of nodes communicatively connected to each other, wherein the first node comprises a directory snoop extension filter for the first node, the directory snoop extension filter comprising a first coherence directory, a directory vector of each directory entry in the first coherence directory comprising, for object data, a first portion and a second portion, the first portion being used to indicate whether nodes other than the first node cache the object data, and the second portion being used to indicate whether the first node caches the object data;
The data processing method comprises the following steps:
in response to the directory snoop extension filter receiving a query requirement for first data, querying directory entries of the first coherence directory to determine whether the first data is cached within the first node and within nodes other than the first node.
2. The data processing method of claim 1, wherein the first node further comprises a first processor core,
the data processing method further comprises the following steps:
generating, by the first processor core, a first access request for the first data, wherein a storage address of the first data is located in a node other than the first node,
generating the query requirement based on the first access request;
and sending, by the directory snoop extension filter, a first snoop request according to a first query result of querying the directory entries of the first coherence directory.
3. The data processing method of claim 2, wherein the first node further comprises a second processor core, the second processor core comprising at least one cache,
sending, by the directory snoop extension filter, a first snoop request according to a first query result of querying a directory entry of the first coherence directory comprises:
in response to determining, according to the first query result, that the first data is already cached in the cache of the second processor core, sending, by the directory snoop extension filter, the first snoop request to the second processor core.
4. A data processing method according to claim 3, further comprising:
sending, by the second processor core according to the first snoop request and in response to the first access request, the first data from the cache of the second processor core to the first processor core.
5. The data processing method of claim 2, wherein sending, by the directory snoop extension filter, a first snoop request according to a first query result of querying a directory entry of the first coherence directory, comprises:
in response to determining, according to the first query result, that the first data is cached in nodes other than the first node, sending the first snoop request to the nodes other than the first node.
6. The data processing method of claim 1, further comprising:
receiving, by the first node, an off-node snoop request for a cache coherency state of the first data;
and generating the query requirement from the off-node snoop request.
7. The data processing method of claim 6, further comprising:
in response to determining, by querying a directory entry of the first coherence directory, that the first data is not cached within the first node, forwarding, by the first node, the off-node snoop request without processing the off-node snoop request.
8. The data processing method of claim 1, wherein the plurality of nodes are divided into a plurality of node partitions, each of the plurality of node partitions including at least one node,
the first portion indicates, by recording region information, whether nodes other than the first node cache the first data.
9. The data processing method of claim 1, wherein the first coherence directory of the directory snoop extension filter is queried by an inter-node coherence extension unit of the first node according to the query requirement.
10. The data processing method of claim 1, wherein the first node further comprises a first processor core, a first memory controller, and a directory snoop filter for the first memory controller, the directory snoop filter comprising a second coherence directory, a directory vector for each directory entry in the second coherence directory also comprising the first portion and the second portion for the object data,
The data processing method further comprises the following steps:
generating, by the first processor core, a second access request for second data, wherein a storage address of the second data is located in the first node,
querying, according to the second access request, the directory entry of the second coherence directory to obtain a second query result, the second query result being used to determine whether the second data is cached in the first node and in nodes other than the first node;
and issuing, by the directory snoop filter, a second snoop request according to the second query result.
11. The data processing method of claim 10, wherein the first node further comprises a second processor core, the second processor core comprising at least one cache,
the data processing method further comprises the following steps:
and responding to the second query result to determine that the second data is cached in the cache of the second processor core, and sending the second data from the cache of the second processor core to the first processor core by the second processor core according to the second interception request and responding to the second access request.
12. An electronic device comprising a plurality of nodes communicatively coupled to each other, wherein the plurality of nodes comprises a first node,
the first node comprises a directory snoop extension filter for the first node, the directory snoop extension filter comprising a first coherence directory, a directory vector of each directory entry in the first coherence directory comprising, for object data, a first portion and a second portion, the first portion being used to indicate whether nodes other than the first node cache the object data, and the second portion being used to indicate whether the first node caches the object data;
the first node is configured to, in response to the directory snoop extension filter receiving a query requirement for first data, query a directory entry of the first coherence directory to determine whether the first data is cached within the first node and within nodes other than the first node.
13. The electronic device of claim 12, wherein the first node further comprises a first processor core and an inter-node coherence extension unit, the first processor core configured to generate a first access request for the first data, the memory address of the first data being located in a node other than the first node,
the inter-node coherence extension unit is configured to generate the query requirement based on the first access request;
the directory snoop extension filter is configured to send a first snoop request according to a first query result of querying a directory entry of the first coherence directory.
14. The electronic device of claim 13, wherein the first node further comprises a second processor core, the second processor core comprising at least one cache,
the directory snoop extension filter is further configured to send the first snoop request to the second processor core in response to determining from the first query result that the first data is already cached in the cache of the second processor core.
15. The electronic device of claim 14, wherein the second processor core is configured to send the first data from the cache of the second processor core to the first processor core in response to the first access request according to the first snoop request.
16. The electronic device of claim 13, wherein the directory snoop extension filter is further configured to, in response to determining from the first query result that the first data is already cached in nodes other than the first node, send the first snoop request to the nodes other than the first node.
17. The electronic device of claim 12, wherein the first node further comprises an inter-node coherence extension unit,
the inter-node coherence extension unit is configured to receive an off-node snoop request for a cache coherency state of the first data;
the directory snoop extension filter is further configured to generate the query requirement from the off-node snoop request.
18. The electronic device of claim 17, wherein the inter-node coherence extension unit is further configured to, in response to determining by querying a directory entry of the first coherence directory that the first data is not cached within the first node, forward the off-node snoop request without processing the off-node snoop request.
19. The electronic device of claim 12, wherein the plurality of nodes is divided into a plurality of node partitions, each of the plurality of node partitions including at least one node,
the first portion indicates, by recording region information, whether nodes other than the first node cache the first data.
20. The electronic device of claim 13 or 17, wherein the first node further comprises a switching unit,
the inter-node coherence extension unit is configured to communicate with the directory snoop extension filter through the switching unit.
21. The electronic device of claim 12, wherein the first node further comprises a first processor core, a first memory controller, and a directory snoop filter for the first memory controller, the directory snoop filter comprising a second coherence directory, a directory vector for each directory entry in the second coherence directory also comprising the first portion and the second portion for the object data,
the first processor core is configured to generate a second access request for second data, wherein a memory address of the second data is located in the first node,
the first memory controller is configured to query, according to the second access request, a directory entry of the second coherence directory to obtain a second query result for determining whether the second data is cached in the first node and in nodes other than the first node;
the directory snoop filter is configured to issue a second snoop request according to the second query result.
22. The electronic device of claim 21, wherein the first node further comprises a second processor core, the second processor core comprising at least one cache,
the second processor core is configured to, in response to determining according to the second query result that the second data has been cached in the cache of the second processor core, send the second data from the cache of the second processor core to the first processor core according to the second snoop request and in response to the second access request.
23. A data processing apparatus comprising:
a memory configured to store computer-executable instructions; and
at least one processor configured to execute the computer-executable instructions,
wherein the computer-executable instructions, when executed by the at least one processor, implement the method of any of claims 1-11.
24. A non-transitory storage medium storing non-transitory computer-executable instructions, wherein the computer-executable instructions, when executed by at least one processor, implement the method of any one of claims 1-11.
CN202311846596.2A 2023-12-28 2023-12-28 Data processing method, data processing device, electronic equipment and storage medium Pending CN117827706A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311846596.2A CN117827706A (en) 2023-12-28 2023-12-28 Data processing method, data processing device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117827706A 2024-04-05

Family

ID=90510998

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311846596.2A Pending CN117827706A (en) 2023-12-28 2023-12-28 Data processing method, data processing device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117827706A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination