CN115454887A - Data processing method and device, electronic equipment and readable storage medium - Google Patents


Info

Publication number
CN115454887A
CN115454887A
Authority
CN
China
Prior art keywords
request
address
cache
tag
target state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211014820.7A
Other languages
Chinese (zh)
Inventor
韩新辉
姚永斌
Current Assignee
Beijing Eswin Computing Technology Co Ltd
Original Assignee
Beijing Eswin Computing Technology Co Ltd
Priority date
Application filed by Beijing Eswin Computing Technology Co Ltd filed Critical Beijing Eswin Computing Technology Co Ltd
Priority to CN202211014820.7A priority Critical patent/CN115454887A/en
Publication of CN115454887A publication Critical patent/CN115454887A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0811Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3003Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3024Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a central processing unit [CPU]

Abstract

A data processing method and apparatus, an electronic device, and a readable storage medium are provided. The method includes: receiving a first request containing a first address of a data value to be accessed; then detecting tag information in a tag cache based on the first address while performing a conflict check between the first request and a second request, the second request being executed before the first request. When the detection result for the tag information is a cache hit and the first request conflicts with the second request, a target state that the second request has not yet updated to the tag cache is acquired, forwarded to the first request, and associated with it, and data processing is performed based on the first request and the target state. By forwarding the target state to the first request, the first request can obtain the correct state without accessing the tag random access memory again, which improves CPU execution efficiency while ensuring cache coherence.

Description

Data processing method and device, electronic equipment and readable storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data processing method and apparatus, an electronic device, and a readable storage medium.
Background
A single-core processor is a computing component with only one central processing unit (CPU); all programs or software execute on that single core. A multi-core processor is a computing component composed of two or more processors, called "cores", which read and execute program instructions more efficiently than a single core. Multiple cores allow a computer to run several processes simultaneously more easily, improving performance for multitasking and for demanding applications and programs.
Because each core of a multi-core CPU has its own cache and the cores operate independently, a new challenge arises: maintaining cache coherence.
At present, cache coherence can be maintained by snoop technology. However, when a snoop request conflicts with a local request, the request later in the execution order is blocked; the blocked request must access the cache again after the other conflicting request completes, which occupies port resources, delays the execution of subsequent requests, and degrades CPU performance.
Disclosure of Invention
The purpose of the embodiments of the present application is to address the problem that CPU execution efficiency is insufficient when cache coherence must be guaranteed.
In a first aspect, the present application provides a data processing method, including:
receiving a first request to be processed, wherein the first request comprises a first address of a data value to be accessed;
detecting the tag information in the tag cache based on the first address, and simultaneously performing conflict check on the first request and the second request, wherein the second request is executed before the first request;
when the detection result for the tag information is a cache hit and the first request conflicts with the second request, acquiring a target state that has not been updated to the tag cache after the second request is executed;
the target state is forwarded to the first request to associate the target state with the first request, and data processing is performed based on the associated first request and the target state.
In an optional embodiment of the first aspect, obtaining a target state that has not been updated to the tag cache after the second request is executed includes:
determining a request identifier of the second request;
and acquiring the target state of the second request corresponding to the request identifier based on the request identifier.
In an optional embodiment of the first aspect, the tag information includes at least one address, and the detecting the tag information in the tag cache based on the first address includes:
querying for the first address among the addresses in the tag information, and if the addresses in the tag information contain the first address, determining that the detection result for the tag information is a cache hit;
if the addresses in the tag information do not contain the first address, the detection result for the tag information is a cache miss.
In an optional embodiment of the first aspect, the tag information further includes a cache state corresponding to each address, and after a detection result for the tag information is a cache hit, the method further includes:
obtaining a cache state corresponding to a first address in the tag information, and taking the cache state as a tag state corresponding to the first request;
and if the first request and the second request do not conflict, taking the tag state as the target state.
In an optional embodiment of the first aspect, the first request is located in a preset first request queue, and the second request is located in a preset second request queue; performing data processing based on the associated first request and target state, comprising:
after the second request is completed, the second request queue sends a request identifier corresponding to the second request to the first request queue;
and after the first request queue receives the request identification corresponding to the second request, performing data processing based on the associated first request and the target state.
In an optional embodiment of the first aspect, the first request is a snoop request, and the first request queue is a snoop request queue; the second request is a local request and the second request queue is a local request queue.
In an optional embodiment of the first aspect, the first request is a local request, and the first request queue is a local request queue; the second request is a snoop request, and the second request queue is a snoop request queue.
In a second aspect, there is provided a data processing apparatus comprising:
the device comprises a request receiving module, a processing module and a processing module, wherein the request receiving module is used for receiving a first request to be processed, and the first request comprises a first address of a data value to be accessed;
the conflict detection module is used for detecting the tag information in the tag cache based on the first address and simultaneously performing conflict check on the first request and the second request, wherein the second request is executed before the first request;
the state acquisition module is used for acquiring a target state which has not been updated to the tag cache after the second request is executed, when the detection result for the tag information is a cache hit and the first request conflicts with the second request;
and the state forwarding module is used for forwarding the target state to the first request, associating the target state with the first request and processing data based on the associated first request and the target state.
In a third aspect, an electronic device is provided, which includes:
a memory, a processor, and a program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the data processing method of any of the above embodiments.
In a fourth aspect, a readable storage medium is provided, which stores a program that when executed by a processor implements the data processing method of any of the above embodiments.
In the embodiment of the application, when the detection result for the tag information is cache hit and the first request and the second request conflict, the target state which is not updated to the tag cache after the second request is executed is obtained, and the target state is forwarded to the first request, so that the target state is associated with the first request, and data processing is performed based on the associated first request and the target state. According to the method and the device, the target state is forwarded to the first request, so that the first request can acquire the correct state without accessing the tag random access memory again, and the execution efficiency of the CPU is improved while the cache consistency is ensured.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
Fig. 1 is a schematic flowchart of a data processing method according to an embodiment of the present application;
fig. 2 is an interaction diagram of a data processing method according to an embodiment of the present application;
fig. 3 is an interaction diagram of a data processing method according to an embodiment of the present application;
fig. 4 is a schematic flowchart of a data processing method according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a data processing electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below in conjunction with the drawings in the present application. It should be understood that the embodiments set forth below in connection with the drawings are exemplary descriptions for explaining technical solutions of the embodiments of the present application, and do not limit the technical solutions of the embodiments of the present application.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the terms "comprises" and/or "comprising," when used in this specification in connection with embodiments of the present application, specify the presence of stated features, information, data, steps, operations, elements, and/or components, but do not preclude the presence or addition of other features, information, data, steps, operations, elements, components, and/or groups thereof.
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The terms referred to in this application will first be introduced and explained:
a Cache memory: the Memory, which is located between the CPU and the main Memory DRAM (Dynamic Random Access Memory), has a small size but a high speed, and is generally composed of an SRAM (Static Random Access Memory).
Three-level cache (L1, L2, L3): the three cache levels (the L1 first-level cache, the L2 second-level cache and the L3 third-level cache) are integrated in the CPU and serve as high-speed data buffers between the CPU and main memory. L1 is closest to the CPU core, L2 is next, and L3 is farthest. In speed, L1 is fastest, L2 is next, and L3 is slowest; in capacity, L1 is smallest, L2 is larger, and L3 is largest.
Request blocking refers to: before the call result of a request returns, the thread executing the request is suspended without releasing the CPU; the thread can do nothing else and must wait. The request can continue to execute only after the call result returns.
Today's application environments increasingly involve multitasking, which requires processors with stronger processing capability and higher processing speed. When multiple single-threaded programs run simultaneously on a multi-core processor, the operating system dispatches the instructions of the programs to the respective cores, greatly increasing the speed at which the programs complete.
Because each core of a multi-core CPU has its own cache and the cores operate independently, a new challenge arises: maintaining cache coherence.
At present, cache coherence can be maintained by snooping (snoop). However, when a snoop request conflicts with a local request, a problem arises: because a request's modification of the cache state is often performed in the request's final stage, the request later in the execution order may not obtain the correct cache state when a conflict occurs, or it is blocked. The blocked request must access the cache again after the other conflicting request completes, seizing port resources and delaying the execution of subsequent requests, thereby degrading CPU performance.
The application provides a data processing method, a data processing device, an electronic device and a readable storage medium, which aim to solve the above technical problems in the prior art.
The technical solutions of the embodiments of the present application and the technical effects produced by the technical solutions of the present application will be described below through descriptions of several exemplary embodiments. It should be noted that the following embodiments may be referred to, referred to or combined with each other, and the description of the same terms, similar features, similar implementation steps and the like in different embodiments is not repeated.
Fig. 1 shows a data processing method according to an embodiment of the present application. As shown in Fig. 1, the method includes the following steps:
step S101, a first request to be processed is received, where the first request includes a first address of a data value to be accessed.
In the embodiment of the present application, the pending first request may be any request that needs to operate on the preset cache, where the operation on the preset cache includes, but is not limited to, read, write, modify, invalidate, and the like. The pre-set cache may be used to store the data value to be accessed in the first request. In this embodiment, the predetermined cache may be an L2 cache.
Each first request may have a corresponding first address for characterizing an address in the cache of a data value to be accessed by the first request.
In the embodiment of the present application, the prefix "first" of the "request" is only used for distinguishing, and is distinguished from the "second request" later, so as to facilitate the description of the method flow.
In an actual application scenario, the method provided by the application can be applied to the L2 cache design of a multi-core CPU. The first request may be a listening request or a local request, which is not limited in this application.
Specifically, when the execution order of a local request in the CPU is earlier than that of a snoop request, the first request may be the snoop request; when the execution order of a snoop request in the CPU is earlier than that of a local request, the first request may be the local request. In general, the first request in this embodiment is the later-executed request.
The specific steps of the embodiments of the present application applied to the above two cases will be described in detail later.
Step S102, detecting the tag information in the tag cache based on the first address, and simultaneously performing conflict check on the first request and the second request, wherein the second request is executed before the first request.
Specifically, the manner of performing the conflict check on the first request and the second request includes, but is not limited to: matching the first address against the second addresses respectively corresponding to at least one second request; if any second address is the same as the first address, the first request and the second request are considered to conflict.
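The address-matching conflict check described above can be sketched as follows. This is an illustrative Python model, not the patent's hardware implementation, and the request-record fields are hypothetical:

```python
# Illustrative model of the conflict check: the first request's address is
# compared against the second addresses of all in-flight second requests;
# any match means the two requests conflict.
def has_conflict(first_address, second_requests):
    """Return True if any pending second request targets the same address."""
    return any(req["address"] == first_address for req in second_requests)

# Two in-flight second requests (hypothetical example data).
pending = [{"id": 0, "address": 0x1A0}, {"id": 1, "address": 0x2C0}]
assert has_conflict(0x2C0, pending)       # same address -> conflict
assert not has_conflict(0x3F0, pending)   # no match -> no conflict
```

In hardware this comparison would be performed in parallel against every queue entry; the sequential loop here only models the logical result.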
In this embodiment, the data value to be accessed by the first request may be stored in the preset cache, and the preset cache may include a tag cache, where the tag cache is used for storing tag information.
Specifically, the preset cache may be an L2 cache or an L3 cache, which is not limited in the present application; the tag information may include an address and a cache status.
The tag information in the tag cache can be detected based on the first address, checking whether any tag information contains an address consistent with the first address; meanwhile, the first address may be matched against the second addresses respectively corresponding to the at least one second request. Each second request has a corresponding second address, which represents the address in the cache of the data targeted by that second request. If any second address is the same as the first address, the matching is successful.
In an actual application scenario, the embodiment of the present application may be applied to an L2 cache design of a multi-core CPU, and is not limited thereto. The second request may be a listening request or a local request, which is not limited in this application.
Specifically, when the execution sequence of the local requests in the CPU is earlier than the snoop requests, the first request may be the snoop request, the second request may be the local request, the tag information in the tag cache may be detected according to the first address of the snoop request, and meanwhile, the first address of the snoop request may be respectively matched with the second address of at least one local request.
When the execution sequence of the snoop requests in the CPU is earlier than the local requests, the first request may be a local request, the second request may be a snoop request, and at this time, the tag information in the tag cache may be detected according to the first address of the local request, and simultaneously, the first address of the local request may be respectively matched with the second address of at least one snoop request.
In this embodiment of the present application, the tag information includes at least one address, and detecting the tag information in the tag cache based on the first address may include the following steps: querying for the first address among the addresses in the tag information; if the addresses in the tag information contain the first address, determining that the detection result for the tag information is a cache hit; if they do not, the detection result for the tag information is a cache miss.
In this embodiment of the present application, cache lines (cache lines) are constituent units of a preset cache, where the preset cache includes a tag cache, and each cache line also has a respective tag cache portion. Each cache line corresponds to a unique tag information (or called tag, tag), and the tag information can be stored in the tag cache part of each cache line. The tag information may include an address and a state corresponding to the corresponding cache line.
In addition to its tag information, each cache line has an index and a block offset. The processor can locate the cache line corresponding to the first address in the preset cache through the index, but the data in that cache line is not necessarily what the first request needs; the first address must be further compared against the address contained in the tag information of the cache line. If the address in the tag information is the first address, that is, the address corresponding to some cache line equals the first address, the detection result for the tag information is a cache hit (hit); otherwise, the detection result is a cache miss (miss).
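The tag/index/offset lookup just described can be roughly illustrated as follows; the field widths and the direct-mapped organization are assumptions for the sketch, not taken from the patent:

```python
# Sketch of tag-cache detection: split an address into tag / index / offset,
# use the index to select a line, and compare the stored tag with the
# address tag to decide hit or miss.
OFFSET_BITS = 6   # assumed 64-byte cache lines
INDEX_BITS = 8    # assumed 256 lines, direct-mapped for simplicity

def split_address(addr):
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

def lookup(tag_cache, addr):
    """tag_cache maps index -> stored tag; returns 'hit' or 'miss'."""
    tag, index, _ = split_address(addr)
    return "hit" if tag_cache.get(index) == tag else "miss"

tag_cache = {}
tag, index, _ = split_address(0x12345)
tag_cache[index] = tag                        # fill one line
assert lookup(tag_cache, 0x12345) == "hit"    # same tag and index -> hit
assert lookup(tag_cache, 0x99999999) == "miss"
```

A set-associative cache would store several tags per index and compare against all of them, but the hit/miss decision is the same tag comparison.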
When the request reaches the preset cache, whether the data required by the request exists in the preset cache can be determined in a mode of inquiring the tag information.
Step S103, when the detection result for the tag information is cache hit and the first request and the second request conflict, acquiring a target state that is not updated to the tag cache after the second request is executed.
When the first request and the second request conflict, that is, a target second address matching the first address exists in at least one second address, the target state that has not been updated to the tag cache after the second request is executed may be obtained. At this time, the target state is a state corresponding to a target second address matching the first address.
Specifically, when the address of the tag information includes a first address, that is, an address corresponding to any cache line is the first address, the detection result for the tag information is a cache hit. The presence of a target second address matching the first address in the at least one second address may refer to: if the same address as the first address exists in the second addresses, it may be considered that a second address matching the first address exists, and the matching second address may be used as the target second address.
A target state corresponding to the target second address may then be obtained. Specifically, during the execution of a request, modifying the state is typically done in the request's last execution phase. The state to be written for the cache line corresponding to the second address can be stored in the second request in advance, so the target state can be read from the second request corresponding to the target second address and forwarded to the conflicting first request. The first request then need not access the tag random access memory again to obtain the correct state, which improves the execution efficiency of requests.
In this embodiment of the present application, obtaining a target state that is not updated to the tag cache after the second request is executed may include the following steps:
and determining a request identifier of the second request, and acquiring a target state of the second request corresponding to the request identifier based on the request identifier.
In this embodiment, the request identifier may refer to a request ID of each second request in the second request queue.
The second request may be located in a second request queue, the second request queue may store a correspondence between each second request and each request identifier in advance, and may determine, according to the correspondence and the target second address, a request identifier corresponding to the target second address, and then determine, based on the request identifier corresponding to the target second address, a second request corresponding to the target second address in the second request queue, and obtain a target state in the second request.
In some embodiments, since each second request has a corresponding relationship with the second address, the second request corresponding to the target second address may also be directly determined according to the corresponding relationship between the second request and the second address, and then the target state in the second request is obtained.
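The lookup of the not-yet-written-back target state by request identifier might be modeled as follows; the queue layout and field names are hypothetical:

```python
# Each entry in the second request queue records the second address and the
# cache-line state that request will write back in its final phase.
second_request_queue = {
    7: {"address": 0x2C0, "target_state": "Invalid"},
    9: {"address": 0x1A0, "target_state": "Modified"},
}

def forward_target_state(request_id, first_request):
    """Read the pending state from the identified second request and attach
    it to the conflicting first request, instead of re-reading the tag RAM."""
    state = second_request_queue[request_id]["target_state"]
    first_request["target_state"] = state   # associate state with first request
    return state

snoop = {"address": 0x2C0}                  # hypothetical conflicting request
assert forward_target_state(7, snoop) == "Invalid"
assert snoop["target_state"] == "Invalid"
```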
Step S104, forwarding the target state to the first request, so as to associate the target state with the first request, and performing data processing based on the associated first request and the target state.
In the embodiment of the present application, when the detection result of detecting the tag information based on the first address of the first request is a cache hit and there is a second request conflicting with the first request, the first request is not executed directly after reaching the cache; instead, it enters the first request queue to wait until the execution of the conflicting second request completes, after which the first request continues to execute.
After the target state of the second request that conflicts with the first request is acquired, the target state can be forwarded to the first request in the first request queue and associated with it. Once the execution of the conflicting second request completes, data processing is performed based on the associated first request and the target state.
Wherein, the execution strategy of the first request is different according to the different target states.
In the embodiment of the present application, the types of target states include, but are not limited to, several of the MESI protocols:
(1) Modified: indicates that the data in this cache line has been updated but has not yet been written back to memory.
(2) Invalid (Invalidated): indicates that the data in this cache line has been invalidated; data in this state cannot be read.
(3) In either the Exclusive or the Shared state, the data in the cache is "clean": the data in the cache line is consistent with the data in main memory.
In the Exclusive state, the corresponding cache line is loaded only into the cache of the current CPU core; no other CPU core has loaded the corresponding data into its own cache. In this case, data can be written freely to the exclusive cache line without informing other CPU cores.
In the Exclusive state, if a request to read the corresponding cache line is received from the bus, the line becomes Shared, because at that point another CPU core has also loaded the corresponding cache line from memory into its own cache.
In the Shared state, the same data is stored in the caches of several CPU cores. Therefore, when data in the cache needs to be updated, it cannot be modified directly: a request is first broadcast to all other CPU cores asking them to set their copies to the Invalid state, and only then is the data in the current cache updated. This broadcast operation, commonly called RFO (Request For Ownership), obtains ownership of the corresponding cache line data.
As computer technology develops, state types have diversified; the four states above are the basic types and serve only as examples. The present application does not limit the type of state used in a specific application.
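A minimal sketch of the MESI transitions described above (illustration only; as noted, real designs may use richer state sets):

```python
# Local write: an Exclusive or Modified line can be written freely; a Shared
# line must first broadcast an RFO so other cores invalidate their copies.
def on_local_write(state):
    if state in ("Exclusive", "Modified"):
        return "Modified", False   # no broadcast needed
    if state == "Shared":
        return "Modified", True    # RFO broadcast required first
    raise ValueError("an Invalid line must be refilled before writing")

# Remote (snooped) read: an Exclusive line downgrades to Shared, because
# another core now also holds a copy of the line.
def on_remote_read(state):
    return "Shared" if state == "Exclusive" else state

assert on_local_write("Exclusive") == ("Modified", False)
assert on_local_write("Shared") == ("Modified", True)
assert on_remote_read("Exclusive") == "Shared"
```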
Because the execution policy for the first request differs depending on the target state, the first request must obtain the correct state before it executes. Forwarding the target state to the first request lets it obtain the correct state without accessing the tag random access memory again, which speeds up the execution of other requests and improves CPU performance.
In the embodiment of the application, when the detection result for the tag information is cache hit and the first request and the second request conflict, the target state which is not updated to the tag cache after the second request is executed is obtained, and the target state is forwarded to the first request, so that the target state is associated with the first request, and data processing is performed based on the associated first request and the target state. According to the method and the device, the target state is forwarded to the first request, so that the first request can acquire the correct state without accessing the tag random access memory again, and the execution efficiency of the CPU is improved while the cache consistency is ensured.
In the embodiment of the present application, a possible implementation manner is provided: the tag information further includes a cache state corresponding to each address, and after the detection result for the tag information is a cache hit, the method may further include the following steps:
obtaining a cache state corresponding to a first address in the tag information, and taking the cache state as a tag state corresponding to the first request;
if there is no conflict between the first request and the second request (for example, there is no target second address matching the first address in the second address), the tag status is used as the target status.
In this embodiment of the application, when detecting tag information of a tag cache based on a first address, if a detection result is a cache hit, a cache state corresponding to the first address in the tag information may be simultaneously obtained as a tag state corresponding to a first request.
If none of the second addresses matches the first address, there is no second request conflicting with the first request, and the tag state may be used directly as the target state corresponding to the first request.
An embodiment of the present application provides a possible implementation in which, after the detection result for the tag information is a cache hit, the first request is placed in a preset first request queue and the second request is placed in a preset second request queue.
Performing data processing based on the associated first request and target state may include the steps of:
(1) After the second request corresponding to the target second address is completed, the second request queue may send the request identifier of that second request to the first request queue, notifying the first request queue that the conflicting second request has finished and that the first request may continue to execute.
(2) After the first request queue receives the request identifier corresponding to the second request, data processing may be performed based on the associated first request and the target state.
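Steps (1) and (2) above amount to a completion handshake between the two queues. The sketch below models it with plain Python objects; the data structures and names are illustrative assumptions, not the hardware queue design.

```python
# Sketch of the completion handshake between the two request queues
# (illustrative; real hardware queues would be fixed-size structures).

class FirstRequestQueue:
    def __init__(self):
        self.waiting = {}     # blocking req_id -> (first_request, target_state)
        self.processed = []   # requests whose data processing has completed

    def block_on(self, blocking_req_id, first_request, target_state):
        # The forwarded target state is stored alongside the blocked request,
        # so no second tag-RAM access is needed later.
        self.waiting[blocking_req_id] = (first_request, target_state)

    def notify_done(self, req_id):
        # Called by the second queue once the conflicting request completes
        # (step (1)); data processing then proceeds (step (2)).
        if req_id in self.waiting:
            first_request, target_state = self.waiting.pop(req_id)
            self.processed.append((first_request, target_state))

q = FirstRequestQueue()
q.block_on(blocking_req_id=7, first_request="snoop@0x1000", target_state="Invalid")
q.notify_done(7)
# q.processed is now [("snoop@0x1000", "Invalid")]
```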
Specifically, the execution policy of the first request will be different according to the target state, and data processing may be performed based on the associated first request and the target state.
For example, suppose the obtained target state is "invalidated," meaning that the data in the corresponding cache line (Cache Line) has already been invalidated, that is, the data in the cache line can no longer be trusted. If the purpose of the snoop request is to invalidate that cache line, and the latest state it obtains is already "invalidated," the snoop request performs no further operations on the data in the cache line; in particular, no subsequent "modify tag" operation is required.
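The example above is one instance of choosing an execution policy from the forwarded state. A minimal sketch, with invented operation names ("modify_tag", "respond") standing in for whatever follow-up operations a real design would issue:

```python
def snoop_policy(target_state):
    """Pick the follow-up operations of an invalidating snoop request from
    the forwarded target state (operation names are illustrative)."""
    if target_state == "Invalid":
        # The line was already invalidated by the earlier request: nothing
        # left to do, so the "modify tag" step is skipped entirely.
        return []
    # Otherwise the snoop still has to invalidate the line and reply.
    return ["modify_tag", "respond"]
```

A "Shared" forwarded state would still require the tag modification, whereas "Invalid" short-circuits all further work.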
In some embodiments, the first request may be a snoop request, the first request queue being a snoop request queue; the second request may be a local request and the second request queue may be a local request queue.
In a practical application scenario, the data processing method provided by the present application can be applied to the L2 cache design of a multi-core CPU. When a local request is ordered in the system earlier than a snoop request and the snoop request is blocked, the snoop request corresponds to the first request in the present application and the local request to the second request.
In other embodiments, the first request may be a local request and the first request queue may be a local request queue; the second request is a snoop request and the second request queue is a snoop request queue.
Specifically, when the execution order of the snoop requests in the system is earlier than the local requests and the local requests are blocked, the local request may be the first request in the present application, and the snoop request may be the second request in the present application.
The above two cases will be explained in detail below.
As an example, the method can be applied to the L2 cache design of a multi-core CPU.
A conflict arises between a local request and a snoop request of the L2 cache when they carry the same address, that is, when they operate on the same cache line.
When a conflict occurs, the following two cases can be distinguished:
Case one: local requests are executed in the system in an order that is earlier than snoop requests, and the snoop requests are blocked.
Case two: snoop requests are executed in the system in an order that is earlier than local requests, which are blocked.
In the embodiment of the present application, the execution order of the snoop request and the local request in the system may be determined by the following method:
The execution order of requests in the system depends on the order seen by the bus (home node). If a local request conflicts with a snoop request and the local request has not been sent to the bus, or has been sent to the bus but its response message (response) has not been received, the snoop request is considered to be ordered earlier than the local request; if the local request has been sent to the bus and its response message has been received, the local request is considered to be ordered earlier than the snoop request.
The above method for determining the execution sequence of the requests in the system is only an example, and the application is not limited thereto.
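The ordering rule just described reduces to a two-input predicate. The sketch below states it directly; the argument names are illustrative, and this is only the example ordering rule, not a mandated one.

```python
def local_ordered_before_snoop(sent_to_bus, response_received):
    """Ordering as seen by the bus (home node): a local request is ordered
    before a conflicting snoop only if it was sent to the bus AND its
    response message already came back; otherwise the snoop is earlier
    and the local request is blocked (case two)."""
    return sent_to_bus and response_received
```

For instance, a local request that has been sent but is still awaiting its response loses the ordering race, so the snoop request is treated as earlier.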
The method provided by the present application will be described below with respect to case one and case two, respectively.
In case one, the local requests are executed in the system in an order that is earlier than the snoop requests, and the snoop requests are blocked. As shown in fig. 2, the specific method flow is as follows:
Step S1, after the snoop request reaches the L2 cache, the tag (tag) in the cache may be queried according to the first address corresponding to the snoop request. A tag in the present application may include two parts: an address and a cache state. Whether the addresses in the tags include the first address corresponding to the snoop request is queried. If they do, the target data addressed by the snoop request is in the cache, the query result for the tag is a hit, and the snoop request enters the snoop request queue; if they do not, the target data is not in the cache, the query result for the tag is a miss, and the snoop request does not enter the snoop request queue.
A conflict check may be performed while querying whether the tag contains the first address. Specifically, the first address corresponding to the snoop request may be matched against the second addresses of the local requests in the local request queue; if a local request's second address is the same as the snoop request's first address, the two requests are considered to conflict.
Step S2, obtain the request identifier (ID) of the conflicting local request, and use it to obtain the state of that local request as the forwarding state. This forwarding state may not yet have been updated into the tag random access memory (tag RAM), but it is the state the snoop request needs to obtain.
Step S3, after querying a tag (tag) in the cache according to the first address corresponding to the snoop request, if a query result for the tag is a hit, the cache state corresponding to the first address in the tag may be obtained as the tag state.
The target state may be determined from the tag state and the forwarding state. Specifically, whenever there is a local request conflicting with the snoop request, the forwarding state of that local request is used as the target state.
If the second address of the local request is not the same as the first address of the snoop request, the two requests do not conflict, and the tag state is used as the target state.
Step S4, write the target state into the snoop request queue (SNPQ). The snoop request queue stores snoop requests and completes subsequent operations such as updating the tag and reading the cache.
A snoop request in the snoop request queue waits for the conflicting local request to complete and then continues to execute.
Step S5, after the conflicting local request completes, the local request queue holding it sends the request identifier of the conflicting local request to the snoop request queue, notifying the snoop request queue that the conflicting local request has completed and that subsequent operations of the snoop request may begin.
Because the snoop request queue has already received the target state through state forwarding and the target state is associated with the snoop request, the execution policy of the snoop request can be determined directly from the target state, and the snoop request can be executed according to that policy to complete the subsequent data processing operations without accessing the tag random access memory again.
For example, suppose the obtained target state is "invalidated," meaning that the data in the corresponding cache line (Cache Line) has already been invalidated, that is, the data in the cache line can no longer be trusted. If the purpose of the snoop request is to invalidate that cache line, and the latest state it obtains is already "invalidated," the snoop request performs no further operations on the data in the cache line; in particular, no subsequent "modify tag" operation is required.
As the above flow shows, when a snoop request is blocked by a local request, the latest state is obtained through state forwarding, so the tag random access memory need not be accessed again and the tag need not be re-read when the snoop request is executed. This relieves access pressure on the tag random access memory, speeds up the progress of other requests, and improves CPU performance.
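Steps S1 through S5 of case one can be strung together into a single event trace. The sketch below is a behavioral model under the assumption that the local request queue is a list of small dictionaries; the event labels are invented for illustration.

```python
def run_case_one(snoop_addr, tags, local_queue):
    """Trace steps S1-S5 for a snoop request blocked by a local request
    (behavioral sketch, not RTL; event labels are illustrative)."""
    trace = []
    # S1: query the tag by the snoop request's first address.
    if snoop_addr not in tags:
        trace.append("miss")                      # snoop does not enter SNPQ
        return trace
    trace.append("hit")                           # snoop enters SNPQ
    tag_state = tags[snoop_addr]                  # S3: tag state on a hit
    # S2 (in parallel with S1): conflict check against the local queue.
    conflict = next((r for r in local_queue if r["addr"] == snoop_addr), None)
    if conflict is not None:
        target_state = conflict["state"]          # forwarding state wins
    else:
        target_state = tag_state
    trace.append(("S4", target_state))            # write target state to SNPQ
    if conflict is not None:
        trace.append(("S5", conflict["id"]))      # local done, SNPQ notified
    trace.append("execute-without-tag-reread")    # no second tag-RAM access
    return trace

trace = run_case_one(0x40, {0x40: "Shared"},
                     [{"id": 3, "addr": 0x40, "state": "Invalid"}])
# trace: ["hit", ("S4", "Invalid"), ("S5", 3), "execute-without-tag-reread"]
```

Case two is the mirror image, with the roles of the local request queue and snoop request queue exchanged.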
In case two, snoop requests are executed in the system in an order that is earlier than local requests, and the local requests are blocked. Here the address corresponding to the local request is the first address and the address corresponding to the snoop request is the second address; "first" and "second" merely distinguish the addresses of the two requests to simplify the description of the method flow.
As shown in fig. 3, the specific method flow is as follows:
step S1, after the local request reaches the L2 cache, a tag (tag) in the cache may be queried according to a first address corresponding to the local request.
A tag in the present application may include two parts: an address and a cache state. Whether the addresses in the tags include the first address corresponding to the local request is queried. If they do, the target data addressed by the local request is in the cache, the query result for the tag is a hit, and the local request enters the local request queue; if they do not, the target data is not in the cache, the query result for the tag is a miss, and the local request does not enter the local request queue.
A conflict check may be performed while querying whether the tag contains the first address. Specifically, the first address corresponding to the local request may be matched against the second addresses of the snoop requests in the snoop request queue; if a snoop request's second address is the same as the local request's first address, the two requests are considered to conflict.
Step S2, obtain the request identifier (ID) of the conflicting snoop request, and use it to obtain the state of that snoop request as the forwarding state. This forwarding state may not yet have been updated into the tag random access memory (tag RAM), but it is the state the local request needs to obtain.
Step S3, after querying a tag (tag) in the cache according to the first address corresponding to the local request, if a query result for the tag is a hit, the cache state corresponding to the first address in the tag may be obtained as the tag state.
The target state may be determined from the tag state and the forwarding state. Specifically, whenever there is a snoop request conflicting with the local request, the forwarding state of that snoop request is used as the target state.
If the second address of the snoop request is not the same as the first address of the local request, the two requests do not conflict, and the tag state is used as the target state.
Step S4, write the target state into the local request queue (REQQ). The local request queue stores local requests and completes subsequent operations such as updating the tag and reading the cache.
The local request in the local request queue will wait for the completion of the conflicting snoop request and then continue execution.
Step S5, after the conflicting snoop request completes, the snoop request queue holding it sends the request identifier of the conflicting snoop request to the local request queue, notifying the local request queue that the conflicting snoop request has completed and that subsequent operations of the local request may begin.
Because the local request queue has already received the target state through state forwarding and the target state is associated with the local request, the execution policy of the local request can be determined directly from the target state, and the local request can be executed according to that policy to complete the subsequent data processing operations without accessing the tag random access memory again. This relieves access pressure on the tag random access memory, speeds up the execution of other requests, and improves CPU performance.
In an embodiment of the present application, as shown in fig. 4, a data processing method provided by the present application may include the following steps:
step S401, receiving a first request to be processed, wherein the first request comprises a first address;
step S402, detecting the tag information in the tag cache based on the first address, and at the same time performing a conflict check between the first request and the second request; specifically, the conflict check may be performed by matching the first address against the second address corresponding to the second request; the first address is looked up among the addresses of the tag information, and if the addresses of the tag information contain the first address, the detection result for the tag information is a cache hit, otherwise the detection result for the tag information is a cache miss;
step S403, judging whether the detection result of the tag information is cache hit, if so, entering step S405, otherwise, entering step S404;
step S404, ending the flow;
step S405, judging whether there is a target second address matching the first address; if so, proceeding to step S407, otherwise proceeding to step S406;
step S406, acquiring the cache state corresponding to the first address in the tag information, using it as the tag state corresponding to the first request and as the final target state, and proceeding to step S409;
step S407, determining the request identifier corresponding to the target second address, and based on that request identifier acquiring the target state of the corresponding second request;
step S408, forwarding the target state to the first request, and associating the target state with the first request;
step S409, performing data processing based on the associated first request and target state; specifically, after the second request corresponding to the target second address is completed, the second request queue may send the request identifier of that second request to the first request queue, and once the first request queue receives the request identifier, data processing is performed based on the associated first request and the target state.
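The branch structure of steps S401 through S409 (Fig. 4) can be summarized as a function that returns the steps visited for each combination of detection result and conflict outcome. This is an illustrative control-flow sketch only.

```python
def flow_steps(hit, conflicting_second_addr_exists):
    """Return the step numbers of Fig. 4 visited for the given detection
    result and conflict-check outcome (illustrative sketch)."""
    steps = ["S401", "S402", "S403"]              # receive, detect, judge
    if not hit:
        steps.append("S404")                      # cache miss: end of flow
        return steps
    steps.append("S405")                          # hit: check for a conflict
    if conflicting_second_addr_exists:
        steps += ["S407", "S408", "S409"]         # forward state, then process
    else:
        steps += ["S406", "S409"]                 # tag state is the target state
    return steps
```

For example, a cache hit with a conflicting target second address follows the path S401, S402, S403, S405, S407, S408, S409.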
An embodiment of the present application provides a data processing apparatus, and as shown in fig. 5, the data processing apparatus 50 may include: a request receiving module 501, a conflict detecting module 502, a status obtaining module 503, and a status forwarding module 504, wherein,
a request receiving module 501, configured to receive a first request to be processed, where the first request includes a first address of a data value to be accessed;
a conflict detection module 502, configured to detect tag information in a tag cache based on a first address, and perform conflict check on a first request and a second request at the same time, where the second request is executed before the first request;
a state obtaining module 503, configured to, when a detection result for the tag information is a hit and a conflict exists between the first request and the second request, obtain a target state that is not updated to the tag cache after the second request is executed;
a state forwarding module 504, configured to forward the target state to the first request, associate the target state with the first request, and perform data processing based on the associated first request and the target state.
In this embodiment of the present application, when acquiring a target state that is not updated to the tag cache after the second request is executed, the state acquiring module 503 is specifically configured to:
determining a request identifier of the second request;
and acquiring the target state of the second request corresponding to the request identifier based on the request identifier.
In this embodiment of the present application, the tag information includes at least one address, and when the conflict detection module 502 detects the tag information in the tag cache based on the first address, it is specifically configured to:
inquiring a first address in the address of the tag information, wherein if the address of the tag information contains the first address, the detection result aiming at the tag information is cache hit;
if the address of the tag information does not include the first address, the detection result for the tag information is a cache miss.
In this embodiment of the present application, the tag information further includes a cache state corresponding to each address, and the apparatus further includes a tag state obtaining module configured to, after the detection result for the tag information is a cache hit:
obtaining a cache state corresponding to a first address in the tag information, and taking the cache state as a tag state corresponding to the first request;
and if the first request and the second request do not have conflict, taking the tag state as a target state.
In the embodiment of the application, the first request is located in a preset first request queue, and the second request is located in a preset second request queue; when performing data processing based on the associated first request and the target state, the state forwarding module 504 is specifically configured to:
after the second request is completed, the second request queue sends a request identifier corresponding to the second request to the first request queue;
and after the first request queue receives the request identification corresponding to the second request, performing data processing based on the associated first request and the target state.
In the embodiment of the application, the first request is a snoop request, and the first request queue is a snoop request queue; the second request is a local request, and the second request queue is a local request queue.
In the embodiment of the application, the first request is a local request, and the first request queue is a local request queue; the second request is a snoop request, and the second request queue is a snoop request queue.
The apparatus in the embodiment of the present application may execute the method provided in the embodiments of the present application, and its implementation principle is similar. The actions performed by the modules of the apparatus correspond to the steps of the method; for detailed functional descriptions of the modules, reference may be made to the description of the corresponding method above, and details are not repeated here.
In an embodiment of the present application, an electronic device is provided, including a memory, a processor, and a computer program stored in the memory; the processor executes the computer program to implement the steps of the foregoing data processing method. Compared with the related art: by forwarding the target state to the first request, the first request can obtain the correct state without accessing the tag random access memory again, which speeds up the execution of other requests and improves CPU performance.
In an alternative embodiment, an electronic device is provided, as shown in fig. 6, the electronic device 4000 shown in fig. 6 comprising: a processor 4001 and a memory 4003. Processor 4001 is coupled to memory 4003, such as via bus 4002. Optionally, the electronic device 4000 may further include a transceiver 4004, and the transceiver 4004 may be used for data interaction between the electronic device and other electronic devices, such as transmission of data and/or reception of data. In addition, the transceiver 4004 is not limited to one in practical applications, and the structure of the electronic device 4000 is not limited to the embodiment of the present application.
The Processor 4001 may be a CPU (Central Processing Unit), a general-purpose Processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other Programmable logic device, a transistor logic device, a hardware component, or any combination thereof. Which may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. The processor 4001 may also be a combination that performs a computing function, e.g., comprising one or more microprocessors, a combination of DSPs and microprocessors, etc.
Bus 4002 may include a path that carries information between the aforementioned components. The bus 4002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 4002 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 6, but this is not intended to represent only one bus or type of bus.
The memory 4003 may be a ROM (Read-Only Memory) or another type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or another type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read-Only Memory), a CD-ROM (Compact Disc Read-Only Memory) or other optical disc storage (including compact disc, laser disc, digital versatile disc, Blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store a computer program and that can be read by a computer, without limitation herein.
The memory 4003 is used for storing computer programs for executing the embodiments of the present application, and is controlled by the processor 4001 to execute. The processor 4001 is used to execute computer programs stored in the memory 4003 to implement the steps shown in the foregoing method embodiments.
Among them, electronic devices include but are not limited to: mobile terminals such as mobile phones, notebook computers, PADs, etc., and fixed terminals such as digital TVs, desktop computers, etc.
The embodiments of the present application provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, can implement the steps of the foregoing method embodiments and corresponding content.
Embodiments of the present application further provide a computer program product, which includes a computer program, and when the computer program is executed by a processor, the steps and corresponding contents of the foregoing method embodiments can be implemented.
The terms "first," "second," "third," "fourth," "1," "2," and the like in the description and in the claims of the present application and in the above-described drawings (if any) are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used are interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in other sequences than illustrated or otherwise described herein.
It should be understood that, although each operation step is indicated by an arrow in the flowchart of the embodiment of the present application, the implementation order of the steps is not limited to the order indicated by the arrow. In some implementation scenarios of the embodiments of the present application, the implementation steps in the flowcharts may be performed in other sequences as needed, unless explicitly stated otherwise herein. In addition, some or all of the steps in each flowchart may include multiple sub-steps or multiple stages based on an actual implementation scenario. Some or all of these sub-steps or stages may be performed at the same time, or each of these sub-steps or stages may be performed at different times, respectively. Under the scenario that the execution time is different, the execution sequence of the sub-steps or phases may be flexibly configured according to the requirement, which is not limited in the embodiment of the present application.
The foregoing is only an optional implementation manner of a part of implementation scenarios in this application, and it should be noted that, for those skilled in the art, other similar implementation means based on the technical idea of this application are also within the protection scope of the embodiments of this application without departing from the technical idea of this application.

Claims (10)

1. A data processing method, comprising:
receiving a first request to be processed, wherein the first request comprises a first address of a data value to be accessed;
detecting the tag information in the tag cache based on the first address, and simultaneously performing conflict check on the first request and a second request, wherein the second request is executed before the first request;
when the detection result aiming at the tag information is cache hit and the first request and the second request have conflict, acquiring a target state which is not updated to the tag cache after the second request is executed;
forwarding the target state to the first request to associate the target state with the first request, and performing data processing based on the associated first request and the target state.
2. The data processing method of claim 1, wherein the obtaining the target state that has not been updated to the tag cache after the second request is executed comprises:
determining a request identifier of the second request;
and acquiring the target state of the second request corresponding to the request identifier based on the request identifier.
3. The data processing method of claim 1, wherein the tag information comprises at least one address, and the detecting the tag information in the tag cache based on the first address comprises:
inquiring the first address in the address of the tag information, wherein if the address of the tag information contains the first address, the detection result aiming at the tag information is cache hit;
if the address of the tag information does not include the first address, the detection result for the tag information is a cache miss.
4. The data processing method according to claim 3, wherein the tag information further includes a cache status corresponding to each address, and after the detection result for the tag information is a cache hit, the method further comprises:
obtaining a cache state corresponding to the first address in the tag information, and taking the cache state as a tag state corresponding to the first request;
and if the first request and the second request do not have conflict, taking the tag state as the target state.
5. The data processing method according to any one of claims 2 to 4, wherein the first request is in a predetermined first request queue, and the second request is in a predetermined second request queue; the data processing based on the associated first request and the target state comprises:
after the second request is completed, the second request queue sends a request identifier corresponding to the second request to the first request queue;
and after the first request queue receives the request identifier corresponding to the second request, performing data processing based on the associated first request and the target state.
6. The data processing method of claim 5, wherein the first request is a snoop request, and the first request queue is a snoop request queue; the second request is a local request, and the second request queue is a local request queue.
7. The data processing method of claim 5, wherein the first request is a local request, and the first request queue is a local request queue; the second request is a snoop request, and the second request queue is a snoop request queue.
8. A data processing apparatus, comprising:
a request receiving module, configured to receive a first request to be processed, where the first request includes a first address of a data value to be accessed;
a conflict detection module, configured to detect tag information in a tag cache based on the first address, and perform conflict check on the first request and a second request at the same time, where the second request is executed before the first request;
a state obtaining module, configured to, when a detection result for the tag information is a hit and a conflict exists between the first request and the second request, obtain a target state that is not updated to a tag cache after the second request is executed;
and the state forwarding module is used for forwarding the target state to the first request, associating the target state with the first request and processing data based on the associated first request and the target state.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory, characterized in that the processor executes the computer program to implement the steps of the method of any of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the data processing method of any one of claims 1 to 7.
CN202211014820.7A 2022-08-23 2022-08-23 Data processing method and device, electronic equipment and readable storage medium Pending CN115454887A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211014820.7A CN115454887A (en) 2022-08-23 2022-08-23 Data processing method and device, electronic equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211014820.7A CN115454887A (en) 2022-08-23 2022-08-23 Data processing method and device, electronic equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN115454887A true CN115454887A (en) 2022-12-09

Family

ID=84298215

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211014820.7A Pending CN115454887A (en) 2022-08-23 2022-08-23 Data processing method and device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN115454887A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116089116A (en) * 2022-12-16 2023-05-09 成都海光集成电路设计有限公司 Data processing method and device
CN116701246A (en) * 2023-05-23 2023-09-05 合芯科技有限公司 Method, device, equipment and storage medium for improving cache bandwidth
CN116701246B (en) * 2023-05-23 2024-05-07 合芯科技有限公司 Method, device, equipment and storage medium for improving cache bandwidth
CN117349199A (en) * 2023-11-30 2024-01-05 摩尔线程智能科技(北京)有限责任公司 Cache management device and system

Similar Documents

Publication Publication Date Title
CN115454887A (en) Data processing method and device, electronic equipment and readable storage medium
US8706973B2 (en) Unbounded transactional memory system and method
US7840759B2 (en) Shared cache eviction
US8762651B2 (en) Maintaining cache coherence in a multi-node, symmetric multiprocessing computer
JP4680851B2 (en) System controller, same address request queuing prevention method, and information processing apparatus
US8423736B2 (en) Maintaining cache coherence in a multi-node, symmetric multiprocessing computer
US9323675B2 (en) Filtering snoop traffic in a multiprocessor computing system
EP2891984B1 (en) Transaction abort method in a multi-core CPU
JP4874165B2 (en) Multiprocessor system and access right setting method in multiprocessor system
CN111737564B (en) Information query method, device, equipment and medium
CN110291507B (en) Method and apparatus for providing accelerated access to a memory system
WO2020225615A1 (en) Executing multiple data requests of multiple-core processors
US6405292B1 (en) Split pending buffer with concurrent access of requests and responses to fully associative and indexed components
US6526480B1 (en) Cache apparatus and control method allowing speculative processing of data
CN114518900A (en) Instruction processing method applied to multi-core processor and multi-core processor
US8560776B2 (en) Method for expediting return of line exclusivity to a given processor in a symmetric multiprocessing data processing system
US6839806B2 (en) Cache system with a cache tag memory and a cache tag buffer
CN113900968B (en) Method and device for realizing synchronous operation of multi-copy non-atomic write storage sequence
US11334486B2 (en) Detection circuitry
US8938588B2 (en) Ensuring forward progress of token-required cache operations in a shared cache
US11321146B2 (en) Executing an atomic primitive in a multi-core processor system
CN114780447A (en) Memory data reading method, device, equipment and storage medium
CN113867801A (en) Instruction cache, instruction cache group and request merging method thereof
CN103885824B (en) Interface control circuit, equipment and mark changing method
WO2021055056A1 (en) Dynamic hammock branch training for branch hammock detection in an instruction stream executing in a processor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination