CN114217855A - Method and system for realizing store_buffer merge - Google Patents

Method and system for realizing store_buffer merge

Info

Publication number
CN114217855A
CN114217855A (application CN202111549628.3A)
Authority
CN
China
Prior art keywords
store
buffer
data
pipeline
entry
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111549628.3A
Other languages
Chinese (zh)
Inventor
李长林
刘磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Saifang Technology Co ltd
Original Assignee
Guangdong Saifang Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Saifang Technology Co ltd
Priority to CN202111549628.3A
Publication of CN114217855A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30094Condition code generation, e.g. Carry, Zero flag
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098Register arrangements

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention relates to the technical field of CPUs (central processing units), and in particular to a method and a system for implementing store_buffer merging. The invention provides a time window during which store data can merge with later store instructions to the same cacheline, so that pipeline resources are used more fully and more pipeline passes are freed for other requests, thereby improving the overall performance of the CPU. When store_buffer entries are running low, an entry is forced onto the pipeline instead of waiting for a younger store to merge; likewise, a certain number of cycles after a store_buffer entry is allocated, it is forced onto the pipeline rather than waiting for a younger store to merge. Both measures improve the overall performance of the CPU.

Description

Method and system for realizing store_buffer merge
Technical Field
The invention relates to the technical field of CPUs (central processing units), and in particular to a method and a system for implementing store_buffer merging.
Background
After a store instruction retires (graduates), its data is stored in the store_buffer. Because of data locality, the data of several consecutive store instructions often falls within the same cacheline. If each store instruction issues to the pipeline independently, many pipeline passes are occupied; if the store data is instead given a time window during which later store instructions to the same cacheline can merge with it, multiple store instructions can be combined, greatly reducing the number of pipeline passes made from the store_buffer. Pipeline resources are thus fully utilized, more pipeline resources serve other requests, and the overall performance of the CPU improves.
If the merge time window is too long, however, the store_buffer entries may fill up: retired store data can no longer move normally from the store queue into the store_buffer, and store queue entries cannot be deallocated, which hurts the overall performance of the CPU. To solve this problem, two methods are proposed: (1) when store_buffer entries are running low, force the entry onto the pipeline instead of waiting for a younger store to merge; (2) a certain number of cycles after a store_buffer entry is allocated, force the entry onto the pipeline instead of waiting for a younger store to merge.
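As a minimal illustration only, the following C++ sketch combines the two policies: a low-watermark check on free entries and a per-entry cycle counter. The structure, interface, and threshold values (kLowWatermark, kMergeWindow) are assumptions made for this sketch, not details taken from the patent.

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    struct StoreBufferEntry {
        bool     valid     = false;
        bool     issued    = false;   // already sent up the pipeline
        uint64_t cacheline = 0;       // cacheline-aligned address
        uint32_t ageCycles = 0;       // cycles since allocation
    };

    class StoreBuffer {
    public:
        explicit StoreBuffer(size_t n) : entries_(n) {}

        // Called once per clock cycle.
        void tick() {
            size_t freeCnt = freeEntries();
            for (auto& e : entries_) {
                if (!e.valid || e.issued) continue;
                e.ageCycles++;
                // Policy (1): entries running low -> issue now, do not
                // wait for a younger store to merge.
                // Policy (2): entry older than the merge window -> issue.
                if (freeCnt <= kLowWatermark || e.ageCycles >= kMergeWindow)
                    issueToPipeline(e);
            }
        }

    private:
        static constexpr size_t   kLowWatermark = 2;  // assumed threshold
        static constexpr uint32_t kMergeWindow  = 8;  // assumed cycle count

        size_t freeEntries() const {
            size_t n = 0;
            for (const auto& e : entries_) if (!e.valid) n++;
            return n;
        }

        void issueToPipeline(StoreBufferEntry& e) {
            e.issued = true;  // placeholder: hand the entry to the pipeline
        }

        std::vector<StoreBufferEntry> entries_;
    };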
Disclosure of Invention
Aiming at the defects of the prior art, the invention discloses a method and a system for implementing store_buffer merging, so as to solve the above problems.
The invention is realized by the following technical scheme:
In a first aspect, the present invention provides a method for implementing store_buffer merging, including the following steps:
s1 is initialized, after the Store instruction enters the Store buffer from the Store queue, the service condition of the Store buffer entry item is judged, and corresponding processing is carried out;
s2, sending a read request by the store _ buffer, uploading pipeline, judging whether the M/E state or the S state is hit in the D _ cache, and performing corresponding processing;
the S3miss queue entry sends a load request to L2, and L2 returns load E-state data and simultaneously wakes up the miss in the store buffer to hit the S-state store _ buffer entry;
s4, backfilling the reload _ buffer with the reload data, uploading pipeline, detecting whether the data of the same cacheline needs to be stored in the store _ buffer, and performing corresponding processing.
Furthermore, in the method, if store_buffer entries are running low, the store instruction is forced onto the pipeline as soon as possible so that its store_buffer entry is freed.
Furthermore, in the method, if more than two store_buffer entries are still empty, the newly entered entry is delayed for a period of time before issuing to the pipeline; if a later store instruction to the same address appears, it is merged into the entry.
Further, in the method, if the D_cache hits in the M/E state, the store_buffer then issues a write request, writes the data into the D_cache, and modifies the cacheline state to the E state.
Furthermore, in the method, if the D_cache misses, or hits in the S state, an entry is requested from the miss queue.
Furthermore, in the method, while returning the E-state refill data, L2 wakes up the store_buffer entries that missed or hit in the S state; the woken store_buffer entry issues to the pipeline only after the refill is given a merge window of several clock cycles on the pipeline.
Furthermore, in the method, after the refill data is backfilled into the refill_buffer, the refill immediately issues to the pipeline; the pipeline then detects whether the store_buffer holds store data for the same cacheline, and if so, the store data is merged with the refill data and written into the D_cache, while the store_buffer and refill_buffer entries are deallocated.
In a second aspect, the present invention provides a system for implementing store_buffer merging, configured to implement the method of the first aspect, and including a store_buffer module, a pipeline module, a miss queue module, and a refill module.
Furthermore, the store_buffer module is configured to, after the store queue completes the sta and std operations and the graduation module confirms that the store data of the store instruction can be written into memory, write the store data into the store_buffer;
the store_buffer determines, according to the D_cache hit state, whether to write the store data directly into the D_cache or to request allocation of a missq entry; after L2 returns the load data, the refill issues to the pipeline and the refill data and the store data are merged and written into the D_cache;
the pipeline module is used for data access after instructions issue to the pipeline; on a miss, a new missq entry is requested; on refill, the refill issues to the pipeline, and if the store_buffer holds store data with a matching address, the two are merged and written into the D_cache together;
the miss queue module is used for sending a load request to L2 after a D_cache miss;
the refill module is configured so that, after receiving the data returned by L2, the refill_buffer issues to the pipeline.
The invention has the following beneficial effects:
The invention provides a time window during which store data can merge with later store instructions to the same cacheline, so that multiple store instructions can merge together and the number of pipeline passes made from the store_buffer is greatly reduced, thereby fully utilizing pipeline resources and freeing more pipeline resources for other requests, and improving the overall performance of the CPU.
The method forces a store_buffer entry onto the pipeline when entries are running low rather than waiting for a younger store to merge; likewise, a certain number of cycles after a store_buffer entry is allocated, it is forced onto the pipeline rather than waiting for a younger store to merge, improving the overall performance of the CPU.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a basic block diagram of a store_buffer merge implementation system;
FIG. 2 is a flowchart of a store instruction after entering the store_buffer, according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the drawings. It is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
Example 1
This embodiment provides a method for implementing store_buffer merging, which includes the following steps:
S1, initialization: after the store instruction enters the store_buffer from the store queue, the usage of the store_buffer entries is judged and handled accordingly;
S2, the store_buffer sends a read request and issues to the pipeline, and whether the D_cache hits in the M/E state or the S state is judged and handled accordingly;
S3, the miss queue entry sends a load request to L2; while returning the E-state refill data, L2 also wakes up the store_buffer entries that missed or hit in the S state;
S4, the refill data is backfilled into the refill_buffer and issues to the pipeline, and whether the store_buffer holds store data for the same cacheline is detected and handled accordingly.
In this embodiment, a time window is provided during which store data can merge with later store instructions to the same cacheline, so that multiple store instructions can merge together and the number of pipeline passes made from the store_buffer is greatly reduced.
This embodiment forces a store_buffer entry onto the pipeline when entries are running low, instead of waiting for a younger store to merge.
Likewise, a certain number of cycles after a store_buffer entry is allocated, the entry is forced onto the pipeline instead of waiting for a younger store to merge.
Example 2
At the implementation level, referring to FIG. 2, this embodiment provides the flow after a store instruction enters the store_buffer, as follows:
After the store instruction enters the store_buffer from the store queue:
If store_buffer entries are running low, the store instruction is forced onto the pipeline as soon as possible and its store_buffer entry is freed, so that later stores are not blocked.
If more than two store_buffer entries are still empty, the newly entered entry is delayed for a period of time before issuing to the pipeline; if a later store instruction to the same address appears, it is merged into the entry, reducing the number of pipeline passes the store instructions make. This entry decision is sketched below.
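A hypothetical C++ sketch of this decision follows; the Entry and StoreBuffer interfaces and the kLowWatermark threshold are assumptions declared here only for illustration, and a 64-byte cacheline is assumed.

    #include <cstddef>
    #include <cstdint>

    // Assumed minimal interfaces; bodies are omitted in this sketch.
    struct Entry {
        uint64_t line;
        void merge(uint64_t addr, const uint8_t* src, size_t len);
    };

    struct StoreBuffer {
        Entry* findByLine(uint64_t line);  // pending entry for this line?
        Entry* allocate(uint64_t line);
        size_t freeEntries() const;
        void   issueNow(Entry* e);         // force onto the pipeline
        void   delayIssue(Entry* e);       // hold open a merge window
    };

    constexpr size_t kLowWatermark = 2;    // assumed "running low" bound

    // Decision taken when a store leaves the store queue.
    void enterStoreBuffer(StoreBuffer& sb, uint64_t addr,
                          const uint8_t* bytes, size_t len) {
        uint64_t line = addr & ~uint64_t{63};
        if (Entry* e = sb.findByLine(line)) {
            e->merge(addr, bytes, len);    // same cacheline: coalesce
            return;                        // no extra pipeline pass
        }
        Entry* e = sb.allocate(line);
        e->merge(addr, bytes, len);
        if (sb.freeEntries() <= kLowWatermark)
            sb.issueNow(e);    // running low: issue at once
        else
            sb.delayIssue(e);  // plenty of entries: wait for merges
    }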
In this embodiment, after the store_buffer sends a read request and issues to the pipeline:
If the D_cache hits in the M/E state, the store_buffer then issues a write request, writes the data into the D_cache, and modifies the cacheline state to the E state.
If the D_cache misses, or hits in the S state, an entry is requested from the miss queue. This lookup outcome is sketched below.
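The following C++ sketch illustrates the lookup outcome under the MESI states named above; the types and names are assumptions for illustration.

    enum class LineState { Invalid, Shared, Exclusive, Modified };  // MESI

    // Assumed result of the store_buffer's read request on the pipeline.
    struct LookupResult {
        bool      hit;
        LineState state;
    };

    enum class StoreAction { WriteDCache, AllocateMissq };

    StoreAction onDCacheLookup(const LookupResult& r) {
        if (r.hit && (r.state == LineState::Modified ||
                      r.state == LineState::Exclusive))
            return StoreAction::WriteDCache;  // writable copy: issue write
        // Miss, or hit in S state (no write permission): the miss queue
        // allocates an entry and sends a load request to L2.
        return StoreAction::AllocateMissq;
    }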
The miss queue entry in this embodiment then sends a load request to L2.
In this embodiment, while returning the E-state refill data, L2 wakes up the store_buffer entries that missed or hit in the S state; the woken store_buffer entry does not issue to the pipeline immediately, however, but first gives the refill a merge window of several clock cycles on the pipeline, after which the store_buffer entry issues again. This ordering is sketched below.
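A minimal C++ sketch of this ordering follows, assuming a hypothetical window length (kRefillMergeWindow) and a per-entry countdown; all names are illustrative only.

    #include <cstdint>

    struct WokenEntry {
        bool     pending  = false;
        uint32_t waitLeft = 0;  // cycles left in the refill's merge window
    };

    constexpr uint32_t kRefillMergeWindow = 4;  // assumed window length

    // L2 has returned E-state refill data: wake the entry but hold it back.
    void onRefillDataFromL2(WokenEntry& e) {
        e.pending  = true;
        e.waitLeft = kRefillMergeWindow;        // let the refill go first
    }

    // Called each cycle; returns true when the entry may issue again.
    bool tickWokenEntry(WokenEntry& e) {
        if (!e.pending) return false;
        if (e.waitLeft > 0) { e.waitLeft--; return false; }
        e.pending = false;
        return true;                            // window elapsed: issue now
    }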
In this embodiment, after the refill data is backfilled into the refill_buffer, the refill immediately issues to the pipeline; the pipeline then detects whether the store_buffer holds store data for the same cacheline, and if so, the store data is merged with the refill data and written into the D_cache, while the store_buffer and refill_buffer entries are deallocated, as sketched below.
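A C++ sketch of the byte-level merge is given below; the 64-byte line and the byte-valid mask are assumptions carried over from the earlier entry sketch, and the caller is assumed to write the merged line into the D_cache and then deallocate both entries.

    #include <array>
    #include <cstddef>
    #include <cstdint>

    using Line = std::array<uint8_t, 64>;  // 64-byte cacheline assumed

    struct PendingStore {
        Line                 data{};       // bytes held in the store_buffer
        std::array<bool, 64> byteValid{};  // which bytes the store wrote
    };

    // Pending store bytes override the refill bytes from L2.
    Line mergeRefill(const Line& refill, const PendingStore& st) {
        Line out = refill;                 // start from the refill data
        for (size_t i = 0; i < out.size(); ++i)
            if (st.byteValid[i])
                out[i] = st.data[i];       // store data wins
        return out;  // caller writes this line into the D_cache, then
    }                // deallocates the store_buffer and refill_buffer entries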
If the merge time window is too long, however, the store_buffer entries may fill up: retired store data can no longer move normally from the store queue into the store_buffer, and store queue entries cannot be deallocated, which affects the overall performance of the CPU.
Therefore, this embodiment forces a store_buffer entry onto the pipeline when entries are running low, instead of waiting for a younger store to merge.
Likewise, a certain number of cycles after a store_buffer entry is allocated, the entry is forced onto the pipeline instead of waiting for a younger store to merge.
Example 3
Referring to FIG. 1, this embodiment provides a store_buffer merge implementation system, including a store_buffer module, a pipeline module, a miss queue module, and a refill module.
The store_buffer module in this embodiment is configured to, after the store queue completes the sta and std operations and the graduation module confirms that the store data of the store instruction can be written into memory, write the store data into the store_buffer;
the store_buffer determines, according to the D_cache hit state, whether to write the store data directly into the D_cache or to request allocation of a missq entry; after L2 returns the load data, the refill issues to the pipeline and the refill data and the store data are merged and written into the D_cache.
The pipeline module in this embodiment is used for data access after instructions issue to the pipeline; on a miss, a new missq entry is requested; on refill, the refill issues to the pipeline, and if the store_buffer holds store data with a matching address, the two are merged and written into the D_cache together.
The miss queue module in this embodiment is used for sending a load request to L2 after a D_cache miss.
The refill module in this embodiment is configured so that, after receiving the data returned by L2, the refill_buffer issues to the pipeline. The lifecycle of one store through these four modules is sketched below.
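As a compact illustration of how the four modules hand a store off to one another, the following C++ sketch models one store's lifecycle as a state machine; the state names and the dcacheWritable flag are assumptions for illustration.

    enum class StoreState {
        InStoreBuffer,  // store_buffer module: data written after graduation
        InPipeline,     // pipeline module: D_cache lookup
        InMissQueue,    // miss queue module: load request sent to L2
        InRefill,       // refill module: waiting for / merging refill data
        Done            // merged data written into the D_cache
    };

    // dcacheWritable is true when the D_cache hit in the M or E state.
    StoreState next(StoreState s, bool dcacheWritable) {
        switch (s) {
            case StoreState::InStoreBuffer: return StoreState::InPipeline;
            case StoreState::InPipeline:
                return dcacheWritable ? StoreState::Done       // direct write
                                      : StoreState::InMissQueue;
            case StoreState::InMissQueue:   return StoreState::InRefill;
            case StoreState::InRefill:      return StoreState::Done; // merge
            default:                        return StoreState::Done;
        }
    }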
In summary, the invention provides a time window during which store data can merge with later store instructions to the same cacheline, so that multiple store instructions can merge together and the number of pipeline passes made from the store_buffer is greatly reduced, thereby fully utilizing pipeline resources, freeing more pipeline resources for other requests, and improving the overall performance of the CPU.
The method forces a store_buffer entry onto the pipeline when entries are running low rather than waiting for a younger store to merge; likewise, a certain number of cycles after a store_buffer entry is allocated, it is forced onto the pipeline rather than waiting for a younger store to merge, improving the overall performance of the CPU.
The above examples are only intended to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced, and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (9)

1. A method for implementing store_buffer merging, characterized by comprising the following steps:
S1, initialization: after the store instruction enters the store_buffer from the store queue, the usage of the store_buffer entries is judged and handled accordingly;
S2, the store_buffer sends a read request and issues to the pipeline, and whether the D_cache hits in the M/E state or the S state is judged and handled accordingly;
S3, the miss queue entry sends a load request to L2; while returning the E-state refill data, L2 also wakes up the store_buffer entries that missed or hit in the S state;
S4, the refill data is backfilled into the refill_buffer and issues to the pipeline, and whether the store_buffer holds store data for the same cacheline is detected and handled accordingly.
2. The method for implementing store_buffer merging as claimed in claim 1, wherein, in the method, if store_buffer entries are running low, the store instruction is forced onto the pipeline as soon as possible so that its store_buffer entry is freed.
3. The method for implementing store_buffer merging as claimed in claim 2, wherein, in the method, if more than two store_buffer entries are still empty, the newly entered entry is delayed for a period of time before issuing to the pipeline, and a later store instruction to the same address is merged into the entry.
4. The method for implementing store_buffer merging as claimed in claim 1, wherein, in the method, if the D_cache hits in the M/E state, the store_buffer then issues a write request, writes the data into the D_cache, and modifies the cacheline state to the E state.
5. The method for implementing store_buffer merging as claimed in claim 4, wherein, in the method, if the D_cache misses, or hits in the S state, an entry is requested from the miss queue.
6. The method for implementing store_buffer merging as claimed in claim 1, wherein, in the method, while returning the E-state refill data, L2 wakes up the store_buffer entries that missed or hit in the S state, and the woken store_buffer entry issues to the pipeline only after the refill is given a merge window of several clock cycles on the pipeline.
7. The method for implementing store_buffer merging as claimed in claim 1, wherein, in the method, after the refill data is backfilled into the refill_buffer, the refill immediately issues to the pipeline; the pipeline detects whether the store_buffer holds store data for the same cacheline, and if so, the store data is merged with the refill data and written into the D_cache while the store_buffer and refill_buffer entries are deallocated.
8. A system for implementing store_buffer merging, the system being used for implementing the method for implementing store_buffer merging as claimed in any one of claims 1 to 7, and comprising a store_buffer module, a pipeline module, a miss queue module, and a refill module.
9. The system for implementing store_buffer merging according to claim 8, wherein
the store_buffer module is configured to, after the store queue completes the sta and std operations and the graduation module confirms that the store data of the store instruction can be written into memory, write the store data into the store_buffer;
the store_buffer determines, according to the D_cache hit state, whether to write the store data directly into the D_cache or to request allocation of a missq entry; after L2 returns the load data, the refill issues to the pipeline and the refill data and the store data are merged and written into the D_cache;
the pipeline module is used for data access after instructions issue to the pipeline; on a miss, a new missq entry is requested; on refill, the refill issues to the pipeline, and if the store_buffer holds store data with a matching address, the two are merged and written into the D_cache together;
the miss queue module is used for sending a load request to L2 after a D_cache miss;
the refill module is configured so that, after receiving the data returned by L2, the refill_buffer issues to the pipeline.
CN202111549628.3A 2021-12-17 2021-12-17 Method and system for realizing store_buffer merge Pending CN114217855A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111549628.3A CN114217855A (en) 2021-12-17 2021-12-17 Method and system for realizing store_buffer merge

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111549628.3A CN114217855A (en) 2021-12-17 2021-12-17 Method and system for realizing store_buffer merge

Publications (1)

Publication Number Publication Date
CN114217855A (en) 2022-03-22

Family

ID=80703461

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111549628.3A Pending CN114217855A (en) 2021-12-17 2021-12-17 Method and system for realizing store_buffer merge

Country Status (1)

Country Link
CN (1) CN114217855A (en)

Similar Documents

Publication Publication Date Title
US10209900B2 (en) Buffer allocation and memory management using mapping table
US20020042863A1 (en) Storing a flushed cache line in a memory buffer of a controller
KR20130018742A (en) Gpu support for garbage collection
JP2009540438A (en) Processor core stack expansion
US6560681B1 (en) Split sparse directory for a distributed shared memory multiprocessor system
US10901887B2 (en) Buffered freepointer management memory system
EP1605360B1 (en) Cache coherency maintenance for DMA, task termination and synchronisation operations
WO2024045586A1 (en) Cache supporting simt architecture and corresponding processor
CN114217861A (en) Data processing method and device, electronic device and storage medium
EP1030243B1 (en) Optimized hardware cleaning function for virtual index virtual tag data cache
US7694041B2 (en) Method for managing buffers pool and a system using the method
US6202134B1 (en) Paging processing system in virtual storage device and paging processing method thereof
US8990537B2 (en) System and method for robust and efficient free chain management
CN113138851B (en) Data management method, related device and system
WO2024045817A1 (en) Method for scheduling returned data of simt architecture processor, and corresponding processor
US9507725B2 (en) Store forwarding for data caches
CN100380347C (en) Semiconductor
US20050044321A1 (en) Method and system for multiprocess cache management
CN101847128A (en) TLB management method and device
JPH06202951A (en) Cash memory system
CN114217855A (en) Method and system for realizing store _ buffer merge
US7606961B2 (en) Computer system and data pre-fetching method
CN113467935A (en) Method and system for realizing L1cache load forward
US20180081580A1 (en) Buffer Allocation and Memory Management
KR102619406B1 (en) Memory access device and method for allocating cores to programming engines using same

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination