CN114217855A - Method and system for realizing store_buffer merge - Google Patents

Method and system for realizing store_buffer merge

Info

Publication number
CN114217855A
CN114217855A (application CN202111549628.3A)
Authority
CN
China
Prior art keywords
store
buffer
data
pipeline
entry
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111549628.3A
Other languages
Chinese (zh)
Inventor
李长林
刘磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Saifang Technology Co ltd
Original Assignee
Guangdong Saifang Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Saifang Technology Co ltd
Priority to CN202111549628.3A
Publication of CN114217855A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30094Condition code generation, e.g. Carry, Zero flag
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098Register arrangements

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention relates to the technical field of CPUs (central processing units), and in particular to a method and a system for implementing store_buffer merging. The invention provides a time window during which store data can merge with later store instructions to the same cacheline, so that pipeline resources are used more fully and more pipeline passes are freed for other requests, thereby improving the overall performance of the CPU. When store_buffer entries are running low, an entry is forced onto the pipeline instead of waiting for a younger store to merge; likewise, a certain number of cycles after a store_buffer entry is allocated, it is forced onto the pipeline rather than waiting for a younger store to merge. Both measures improve the overall performance of the CPU.

Description

Method and system for realizing store_buffer merge
Technical Field
The invention relates to the technical field of CPUs (central processing units), and in particular to a method and a system for implementing store_buffer merging.
Background
After a store instruction retires (graduates), its data is stored in the store_buffer. Because of data locality, the data of several consecutive store instructions often falls within the same cacheline. If each store instruction issues to the pipeline independently, many pipeline passes are occupied; if the store data is instead given a time window during which later store instructions to the same cacheline can merge with it, multiple store instructions can be combined, greatly reducing the number of pipeline passes made from the store_buffer. Pipeline resources are thus fully utilized, more pipeline resources serve other requests, and the overall performance of the CPU improves.
If the merge time window is too long, however, the store_buffer entries may fill up: retired store data can no longer move normally from the store queue into the store_buffer, and store queue entries cannot be deallocated, which hurts the overall performance of the CPU. To solve this problem, two methods are proposed: (1) when store_buffer entries are running low, force the entry onto the pipeline instead of waiting for a younger store to merge; (2) a certain number of cycles after a store_buffer entry is allocated, force the entry onto the pipeline instead of waiting for a younger store to merge.
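As a minimal illustration only, the following C++ sketch combines the two policies: a low-watermark check on free entries and a per-entry cycle counter. The structure, interface, and threshold values (kLowWatermark, kMergeWindow) are assumptions made for this sketch, not details taken from the patent.

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    struct StoreBufferEntry {
        bool     valid     = false;
        bool     issued    = false;   // already sent up the pipeline
        uint64_t cacheline = 0;       // cacheline-aligned address
        uint32_t ageCycles = 0;       // cycles since allocation
    };

    class StoreBuffer {
    public:
        explicit StoreBuffer(size_t n) : entries_(n) {}

        // Called once per clock cycle.
        void tick() {
            size_t freeCnt = freeEntries();
            for (auto& e : entries_) {
                if (!e.valid || e.issued) continue;
                e.ageCycles++;
                // Policy (1): entries running low -> issue now, do not
                // wait for a younger store to merge.
                // Policy (2): entry older than the merge window -> issue.
                if (freeCnt <= kLowWatermark || e.ageCycles >= kMergeWindow)
                    issueToPipeline(e);
            }
        }

    private:
        static constexpr size_t   kLowWatermark = 2;  // assumed threshold
        static constexpr uint32_t kMergeWindow  = 8;  // assumed cycle count

        size_t freeEntries() const {
            size_t n = 0;
            for (const auto& e : entries_) if (!e.valid) n++;
            return n;
        }

        void issueToPipeline(StoreBufferEntry& e) {
            e.issued = true;  // placeholder: hand the entry to the pipeline
        }

        std::vector<StoreBufferEntry> entries_;
    };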
Disclosure of Invention
Aiming at the defects of the prior art, the invention discloses a method and a system for implementing store_buffer merging, so as to solve the above problems.
The invention is realized by the following technical scheme:
In a first aspect, the present invention provides a method for implementing store_buffer merging, including the following steps:
s1 is initialized, after the Store instruction enters the Store buffer from the Store queue, the service condition of the Store buffer entry item is judged, and corresponding processing is carried out;
s2, sending a read request by the store _ buffer, uploading pipeline, judging whether the M/E state or the S state is hit in the D _ cache, and performing corresponding processing;
the S3miss queue entry sends a load request to L2, and L2 returns load E-state data and simultaneously wakes up the miss in the store buffer to hit the S-state store _ buffer entry;
s4, backfilling the reload _ buffer with the reload data, uploading pipeline, detecting whether the data of the same cacheline needs to be stored in the store _ buffer, and performing corresponding processing.
Furthermore, in the method, if store_buffer entries are running low, the store instruction is forced onto the pipeline as soon as possible so that its store_buffer entry is freed.
Furthermore, in the method, if more than two store_buffer entries are still empty, the newly entered entry is delayed for a period of time before issuing to the pipeline; if a later store instruction to the same address appears, it is merged into the entry.
Further, in the method, if the D_cache hits in the M/E state, the store_buffer then issues a write request, writes the data into the D_cache, and modifies the cacheline state to the E state.
Furthermore, in the method, if the D_cache misses, or hits in the S state, an entry is requested from the miss queue.
Furthermore, in the method, while returning the E-state refill data, L2 wakes up the store_buffer entries that missed or hit in the S state; the woken store_buffer entry issues to the pipeline only after the refill is given a merge window of several clock cycles on the pipeline.
Furthermore, in the method, after the refill data is backfilled into the refill_buffer, the refill immediately issues to the pipeline; the pipeline then detects whether the store_buffer holds store data for the same cacheline, and if so, the store data is merged with the refill data and written into the D_cache, while the store_buffer and refill_buffer entries are deallocated.
In a second aspect, the present invention provides a system for implementing store_buffer merging, configured to implement the method of the first aspect, and including a store_buffer module, a pipeline module, a miss queue module, and a refill module.
Furthermore, the store_buffer module is configured to, after the store queue completes the sta and std operations and the graduation module confirms that the store data of the store instruction can be written into memory, write the store data into the store_buffer;
the store_buffer determines, according to the D_cache hit state, whether to write the store data directly into the D_cache or to request allocation of a missq entry; after L2 returns the load data, the refill issues to the pipeline and the refill data and the store data are merged and written into the D_cache;
the pipeline module is used for data access after instructions issue to the pipeline; on a miss, a new missq entry is requested; on refill, the refill issues to the pipeline, and if the store_buffer holds store data with a matching address, the two are merged and written into the D_cache together;
the miss queue module is used for sending a load request to L2 after a D_cache miss;
the refill module is configured so that, after receiving the data returned by L2, the refill_buffer issues to the pipeline.
The invention has the following beneficial effects:
The invention provides a time window during which store data can merge with later store instructions to the same cacheline, so that multiple store instructions can merge together and the number of pipeline passes made from the store_buffer is greatly reduced, thereby fully utilizing pipeline resources and freeing more pipeline resources for other requests, and improving the overall performance of the CPU.
The method forces a store_buffer entry onto the pipeline when entries are running low rather than waiting for a younger store to merge; likewise, a certain number of cycles after a store_buffer entry is allocated, it is forced onto the pipeline rather than waiting for a younger store to merge, improving the overall performance of the CPU.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a basic block diagram of a store_buffer merge implementation system;
FIG. 2 is a flowchart of a store instruction after entering the store_buffer, according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the drawings. It is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
Example 1
This embodiment provides a method for implementing store_buffer merging, which includes the following steps:
S1, initialization: after the store instruction enters the store_buffer from the store queue, the usage of the store_buffer entries is judged and handled accordingly;
S2, the store_buffer sends a read request and issues to the pipeline, and whether the D_cache hits in the M/E state or the S state is judged and handled accordingly;
S3, the miss queue entry sends a load request to L2; while returning the E-state refill data, L2 also wakes up the store_buffer entries that missed or hit in the S state;
S4, the refill data is backfilled into the refill_buffer and issues to the pipeline, and whether the store_buffer holds store data for the same cacheline is detected and handled accordingly.
In this embodiment, a time window is provided during which store data can merge with later store instructions to the same cacheline, so that multiple store instructions can merge together and the number of pipeline passes made from the store_buffer is greatly reduced.
This embodiment forces a store_buffer entry onto the pipeline when entries are running low, instead of waiting for a younger store to merge.
Likewise, a certain number of cycles after a store_buffer entry is allocated, the entry is forced onto the pipeline instead of waiting for a younger store to merge.
Example 2
At the implementation level, referring to FIG. 2, this embodiment provides the flow after a store instruction enters the store_buffer, as follows:
After the store instruction enters the store_buffer from the store queue:
If store_buffer entries are running low, the store instruction is forced onto the pipeline as soon as possible and its store_buffer entry is freed, so that later stores are not blocked.
If more than two store_buffer entries are still empty, the newly entered entry is delayed for a period of time before issuing to the pipeline; if a later store instruction to the same address appears, it is merged into the entry, reducing the number of pipeline passes the store instructions make. This entry decision is sketched below.
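A hypothetical C++ sketch of this decision follows; the Entry and StoreBuffer interfaces and the kLowWatermark threshold are assumptions declared here only for illustration, and a 64-byte cacheline is assumed.

    #include <cstddef>
    #include <cstdint>

    // Assumed minimal interfaces; bodies are omitted in this sketch.
    struct Entry {
        uint64_t line;
        void merge(uint64_t addr, const uint8_t* src, size_t len);
    };

    struct StoreBuffer {
        Entry* findByLine(uint64_t line);  // pending entry for this line?
        Entry* allocate(uint64_t line);
        size_t freeEntries() const;
        void   issueNow(Entry* e);         // force onto the pipeline
        void   delayIssue(Entry* e);       // hold open a merge window
    };

    constexpr size_t kLowWatermark = 2;    // assumed "running low" bound

    // Decision taken when a store leaves the store queue.
    void enterStoreBuffer(StoreBuffer& sb, uint64_t addr,
                          const uint8_t* bytes, size_t len) {
        uint64_t line = addr & ~uint64_t{63};
        if (Entry* e = sb.findByLine(line)) {
            e->merge(addr, bytes, len);    // same cacheline: coalesce
            return;                        // no extra pipeline pass
        }
        Entry* e = sb.allocate(line);
        e->merge(addr, bytes, len);
        if (sb.freeEntries() <= kLowWatermark)
            sb.issueNow(e);    // running low: issue at once
        else
            sb.delayIssue(e);  // plenty of entries: wait for merges
    }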
In this embodiment, after the store_buffer sends a read request and issues to the pipeline:
If the D_cache hits in the M/E state, the store_buffer then issues a write request, writes the data into the D_cache, and modifies the cacheline state to the E state.
If the D_cache misses, or hits in the S state, an entry is requested from the miss queue. This lookup outcome is sketched below.
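The following C++ sketch illustrates the lookup outcome under the MESI states named above; the types and names are assumptions for illustration.

    enum class LineState { Invalid, Shared, Exclusive, Modified };  // MESI

    // Assumed result of the store_buffer's read request on the pipeline.
    struct LookupResult {
        bool      hit;
        LineState state;
    };

    enum class StoreAction { WriteDCache, AllocateMissq };

    StoreAction onDCacheLookup(const LookupResult& r) {
        if (r.hit && (r.state == LineState::Modified ||
                      r.state == LineState::Exclusive))
            return StoreAction::WriteDCache;  // writable copy: issue write
        // Miss, or hit in S state (no write permission): the miss queue
        // allocates an entry and sends a load request to L2.
        return StoreAction::AllocateMissq;
    }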
The miss queue entry in this embodiment then sends a load request to L2.
In this embodiment, while returning the E-state refill data, L2 wakes up the store_buffer entries that missed or hit in the S state; the woken store_buffer entry does not issue to the pipeline immediately, however, but first gives the refill a merge window of several clock cycles on the pipeline, after which the store_buffer entry issues again. This ordering is sketched below.
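A minimal C++ sketch of this ordering follows, assuming a hypothetical window length (kRefillMergeWindow) and a per-entry countdown; all names are illustrative only.

    #include <cstdint>

    struct WokenEntry {
        bool     pending  = false;
        uint32_t waitLeft = 0;  // cycles left in the refill's merge window
    };

    constexpr uint32_t kRefillMergeWindow = 4;  // assumed window length

    // L2 has returned E-state refill data: wake the entry but hold it back.
    void onRefillDataFromL2(WokenEntry& e) {
        e.pending  = true;
        e.waitLeft = kRefillMergeWindow;        // let the refill go first
    }

    // Called each cycle; returns true when the entry may issue again.
    bool tickWokenEntry(WokenEntry& e) {
        if (!e.pending) return false;
        if (e.waitLeft > 0) { e.waitLeft--; return false; }
        e.pending = false;
        return true;                            // window elapsed: issue now
    }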
In this embodiment, after the refill data is backfilled into the refill_buffer, the refill immediately issues to the pipeline; the pipeline then detects whether the store_buffer holds store data for the same cacheline, and if so, the store data is merged with the refill data and written into the D_cache, while the store_buffer and refill_buffer entries are deallocated, as sketched below.
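A C++ sketch of the byte-level merge is given below; the 64-byte line and the byte-valid mask are assumptions carried over from the earlier entry sketch, and the caller is assumed to write the merged line into the D_cache and then deallocate both entries.

    #include <array>
    #include <cstddef>
    #include <cstdint>

    using Line = std::array<uint8_t, 64>;  // 64-byte cacheline assumed

    struct PendingStore {
        Line                 data{};       // bytes held in the store_buffer
        std::array<bool, 64> byteValid{};  // which bytes the store wrote
    };

    // Pending store bytes override the refill bytes from L2.
    Line mergeRefill(const Line& refill, const PendingStore& st) {
        Line out = refill;                 // start from the refill data
        for (size_t i = 0; i < out.size(); ++i)
            if (st.byteValid[i])
                out[i] = st.data[i];       // store data wins
        return out;  // caller writes this line into the D_cache, then
    }                // deallocates the store_buffer and refill_buffer entries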
If the merge time window is too long, however, the store_buffer entries may fill up: retired store data can no longer move normally from the store queue into the store_buffer, and store queue entries cannot be deallocated, which affects the overall performance of the CPU.
Therefore, this embodiment forces a store_buffer entry onto the pipeline when entries are running low, instead of waiting for a younger store to merge.
Likewise, a certain number of cycles after a store_buffer entry is allocated, the entry is forced onto the pipeline instead of waiting for a younger store to merge.
Example 3
Referring to FIG. 1, this embodiment provides a store_buffer merge implementation system, including a store_buffer module, a pipeline module, a miss queue module, and a refill module.
The store_buffer module in this embodiment is configured to, after the store queue completes the sta and std operations and the graduation module confirms that the store data of the store instruction can be written into memory, write the store data into the store_buffer;
the store_buffer determines, according to the D_cache hit state, whether to write the store data directly into the D_cache or to request allocation of a missq entry; after L2 returns the load data, the refill issues to the pipeline and the refill data and the store data are merged and written into the D_cache.
The pipeline module in this embodiment is used for data access after instructions issue to the pipeline; on a miss, a new missq entry is requested; on refill, the refill issues to the pipeline, and if the store_buffer holds store data with a matching address, the two are merged and written into the D_cache together.
The miss queue module in this embodiment is used for sending a load request to L2 after a D_cache miss.
The refill module in this embodiment is configured so that, after receiving the data returned by L2, the refill_buffer issues to the pipeline. The lifecycle of one store through these four modules is sketched below.
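As a compact illustration of how the four modules hand a store off to one another, the following C++ sketch models one store's lifecycle as a state machine; the state names and the dcacheWritable flag are assumptions for illustration.

    enum class StoreState {
        InStoreBuffer,  // store_buffer module: data written after graduation
        InPipeline,     // pipeline module: D_cache lookup
        InMissQueue,    // miss queue module: load request sent to L2
        InRefill,       // refill module: waiting for / merging refill data
        Done            // merged data written into the D_cache
    };

    // dcacheWritable is true when the D_cache hit in the M or E state.
    StoreState next(StoreState s, bool dcacheWritable) {
        switch (s) {
            case StoreState::InStoreBuffer: return StoreState::InPipeline;
            case StoreState::InPipeline:
                return dcacheWritable ? StoreState::Done       // direct write
                                      : StoreState::InMissQueue;
            case StoreState::InMissQueue:   return StoreState::InRefill;
            case StoreState::InRefill:      return StoreState::Done; // merge
            default:                        return StoreState::Done;
        }
    }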
In summary, the invention provides a time window during which store data can merge with later store instructions to the same cacheline, so that multiple store instructions can merge together and the number of pipeline passes made from the store_buffer is greatly reduced, thereby fully utilizing pipeline resources, freeing more pipeline resources for other requests, and improving the overall performance of the CPU.
The method forces a store_buffer entry onto the pipeline when entries are running low rather than waiting for a younger store to merge; likewise, a certain number of cycles after a store_buffer entry is allocated, it is forced onto the pipeline rather than waiting for a younger store to merge, improving the overall performance of the CPU.
The above examples are only intended to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced, and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (9)

1. A method for implementing store_buffer merging, characterized by comprising the following steps:
S1, initialization: after the store instruction enters the store_buffer from the store queue, the usage of the store_buffer entries is judged and handled accordingly;
S2, the store_buffer sends a read request and issues to the pipeline, and whether the D_cache hits in the M/E state or the S state is judged and handled accordingly;
S3, the miss queue entry sends a load request to L2; while returning the E-state refill data, L2 also wakes up the store_buffer entries that missed or hit in the S state;
S4, the refill data is backfilled into the refill_buffer and issues to the pipeline, and whether the store_buffer holds store data for the same cacheline is detected and handled accordingly.
2. The method for implementing store_buffer merging as claimed in claim 1, wherein, in the method, if store_buffer entries are running low, the store instruction is forced onto the pipeline as soon as possible so that its store_buffer entry is freed.
3. The method for implementing store_buffer merging as claimed in claim 2, wherein, in the method, if more than two store_buffer entries are still empty, the newly entered entry is delayed for a period of time before issuing to the pipeline, and a later store instruction to the same address is merged into the entry.
4. The method for implementing store_buffer merging as claimed in claim 1, wherein, in the method, if the D_cache hits in the M/E state, the store_buffer then issues a write request, writes the data into the D_cache, and modifies the cacheline state to the E state.
5. The method for implementing store_buffer merging as claimed in claim 4, wherein, in the method, if the D_cache misses, or hits in the S state, an entry is requested from the miss queue.
6. The method for implementing store_buffer merging as claimed in claim 1, wherein, in the method, while returning the E-state refill data, L2 wakes up the store_buffer entries that missed or hit in the S state, and the woken store_buffer entry issues to the pipeline only after the refill is given a merge window of several clock cycles on the pipeline.
7. The method for implementing store_buffer merging as claimed in claim 1, wherein, in the method, after the refill data is backfilled into the refill_buffer, the refill immediately issues to the pipeline; the pipeline detects whether the store_buffer holds store data for the same cacheline, and if so, the store data is merged with the refill data and written into the D_cache while the store_buffer and refill_buffer entries are deallocated.
8. A system for implementing store_buffer merging, the system being used for implementing the method for implementing store_buffer merging as claimed in any one of claims 1 to 7, and comprising a store_buffer module, a pipeline module, a miss queue module, and a refill module.
9. The system for implementing store_buffer merging according to claim 8, wherein
the store_buffer module is configured to, after the store queue completes the sta and std operations and the graduation module confirms that the store data of the store instruction can be written into memory, write the store data into the store_buffer;
the store_buffer determines, according to the D_cache hit state, whether to write the store data directly into the D_cache or to request allocation of a missq entry; after L2 returns the load data, the refill issues to the pipeline and the refill data and the store data are merged and written into the D_cache;
the pipeline module is used for data access after instructions issue to the pipeline; on a miss, a new missq entry is requested; on refill, the refill issues to the pipeline, and if the store_buffer holds store data with a matching address, the two are merged and written into the D_cache together;
the miss queue module is used for sending a load request to L2 after a D_cache miss;
the refill module is configured so that, after receiving the data returned by L2, the refill_buffer issues to the pipeline.
CN202111549628.3A 2021-12-17 2021-12-17 Method and system for realizing store_buffer merge Pending CN114217855A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111549628.3A CN114217855A (en) 2021-12-17 2021-12-17 Method and system for realizing store_buffer merge

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111549628.3A CN114217855A (en) 2021-12-17 2021-12-17 Method and system for realizing store_buffer merge

Publications (1)

Publication Number Publication Date
CN114217855A (en) 2022-03-22

Family

ID=80703461

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111549628.3A Pending CN114217855A (en) 2021-12-17 2021-12-17 Method and system for realizing store_buffer merge

Country Status (1)

Country Link
CN (1) CN114217855A (en)

Similar Documents

Publication Publication Date Title
US10209900B2 (en) Buffer allocation and memory management using mapping table
US20020042863A1 (en) Storing a flushed cache line in a memory buffer of a controller
KR20130018742A (en) Gpu support for garbage collection
JP2009540438A (en) Processor core stack expansion
US6560681B1 (en) Split sparse directory for a distributed shared memory multiprocessor system
US10901887B2 (en) Buffered freepointer management memory system
EP1605360B1 (en) Cache coherency maintenance for DMA, task termination and synchronisation operations
WO2024045586A1 (en) Cache supporting simt architecture and corresponding processor
CN114217861A (en) Data processing method and device, electronic device and storage medium
EP1030243B1 (en) Optimized hardware cleaning function for virtual index virtual tag data cache
US7694041B2 (en) Method for managing buffers pool and a system using the method
US6202134B1 (en) Paging processing system in virtual storage device and paging processing method thereof
US8990537B2 (en) System and method for robust and efficient free chain management
CN113138851B (en) Data management method, related device and system
WO2024045817A1 (en) Method for scheduling returned data of simt architecture processor, and corresponding processor
US9507725B2 (en) Store forwarding for data caches
CN100380347C (en) Semiconductor
US20050044321A1 (en) Method and system for multiprocess cache management
CN101847128A (en) TLB management method and device
JPH06202951A (en) Cash memory system
CN114217855A (en) Method and system for realizing store _ buffer merge
US7606961B2 (en) Computer system and data pre-fetching method
CN113467935A (en) Method and system for realizing L1cache load forward
US20180081580A1 (en) Buffer Allocation and Memory Management
KR102619406B1 (en) Memory access device and method for allocating cores to programming engines using same

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination