CN110688155A - Merging method for storage instruction accessing non-cacheable area

Info

Publication number: CN110688155A
Application number: CN201910859164.2A
Authority: CN
Inventors: 胡向东; 王飙; 杨剑新; 路冬冬; 张晓东
Original assignee: Shanghai Integrated Circuits with Highperformance Center
Current assignee: Shanghai Integrated Circuits with Highperformance Center
Priority date: 2019-09-11
Filing date: 2019-09-11
Publication date: 2020-01-14

Abstract

The invention relates to a method for merging store instructions that access a non-cacheable area. A merge buffer is placed behind the store instruction queue; multiple store instructions that access the non-cacheable area and whose access addresses fall within the same Cache block are merged, and their write data are merged and stored in a single "non-cacheable area write data buffer" entry. The invention reduces the occupation of the relevant request channels and data channels by store instructions accessing the non-cacheable area.

Description

Merging method for storage instruction accessing non-cacheable area

Technical Field

The invention relates to the technical field of microarchitecture design for central processing units, and in particular to a method for merging store instructions that access a non-cacheable area.

Background

In a modern microprocessor, to address the memory wall caused by the growing gap between memory access speed and processor execution speed, the storage system is generally organized, from top to bottom, into a first-level Cache (L1 Cache), a second-level Cache (L2 Cache), a third-level Cache (L3 Cache), main memory, disk and so on. In a general-purpose microprocessor, to better exploit program locality and get the most out of each level of Cache, when a memory access instruction accesses the storage system, the entire Cache block containing the access address is written into each level of Cache. Correspondingly, a memory access instruction accesses the Cache levels from top to bottom: if the data block for the current access address is in the first-level Cache, the data is read from the first-level Cache; if not, the second-level Cache, the third-level Cache and then main memory are accessed in turn, and the data block for the access address is loaded into each level of Cache.

In a general-purpose microprocessor, before a store instruction is executed, the entire Cache block containing the access address must be loaded into the first-level data Cache and write permission for the Cache block must be obtained; the data may also exist in lower-level Caches at the same time. When executed, the store instruction writes its data directly into the first-level data Cache; if the first-level data Cache uses a write-through policy, the store instruction writes the data into the first-level data Cache and the second-level Cache simultaneously.

A hierarchical storage system is effective only if the access behavior has good locality. For a storage area without access locality, writing its data into each level of Cache evicts data that may be useful in the future and "pollutes" the Cache. In such cases, software can mark the areas without access locality as non-cacheable areas; when a store instruction accesses such an area, the write request and the write data are sent directly to main memory for execution. Alternatively, special memory access instructions can be provided in the microprocessor that access memory in a non-cached manner, that is, they access main memory directly when executed and the accessed data blocks are not written into any level of Cache. In addition, when a store instruction accessing an IO device is executed, the write request and the write data must also be sent to the corresponding IO device for execution. In the following, the non-cacheable storage area, the storage area accessed in a non-cached manner, and the address space where IO devices are located are collectively referred to as the non-cacheable area.

The execution of a store instruction that accesses the non-cacheable area can be divided into the following steps:

1) once all older instructions in program order have completed normally, the store instruction accessing the non-cacheable area exits the store instruction queue, and its write data is stored in the "non-cacheable area write data buffer", which has multiple entries;

2) the store instruction sends a write request for the non-cacheable area to the Cache coherence processing unit, carrying the number of the entry that holds its write data in the "non-cacheable area write data buffer";

3) the Cache coherence processing unit sends a data fetch request to the "non-cacheable area write data buffer";

4) the "non-cacheable area write data buffer" sends the write data to the Cache coherence processing unit and releases the entry that held the write data;

5) the Cache coherence processing unit sends the write request and the write data to main memory or the IO device, where the write operation is performed.
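The five steps above can be sketched as a small software model. This is only an illustration under stated assumptions: the class `WriteDataBuffer`, the function `execute_uncached_store`, the four-entry capacity, and the use of a `bytearray` as main memory are all hypothetical and do not come from the patent, and the Cache coherence processing unit is collapsed into plain function-local steps.

```python
from dataclasses import dataclass, field


@dataclass
class WriteDataBuffer:
    """Model of the 'non-cacheable area write data buffer' (hypothetical names)."""
    num_entries: int = 4  # assumed capacity; the patent only says "multiple entries"
    entries: dict = field(default_factory=dict)  # entry number -> write data

    def allocate(self, data: bytes) -> int:
        # Step 1: store the write data in a free entry and return its number.
        for number in range(self.num_entries):
            if number not in self.entries:
                self.entries[number] = data
                return number
        raise RuntimeError("no free entry; the store would have to wait")

    def fetch_and_release(self, number: int) -> bytes:
        # Steps 3-4: hand the data to the requester and free the entry.
        return self.entries.pop(number)


def execute_uncached_store(addr: int, data: bytes,
                           wdb: WriteDataBuffer, memory: bytearray) -> None:
    entry = wdb.allocate(data)                       # step 1
    request = (addr, len(data), entry)               # step 2: request carries the entry number
    req_addr, size, req_entry = request              # received by the coherence unit
    payload = wdb.fetch_and_release(req_entry)       # steps 3 and 4
    memory[req_addr:req_addr + size] = payload       # step 5: write performed in memory
```

In this model every store occupies the request path and the data path once, which is the cost the invention sets out to reduce.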

In step 2), after the Cache coherence processing unit receives the write request for the non-cacheable area, if it finds that some level of Cache holds the latest data for that address, it must notify that Cache to write the latest data back to main memory.

Processing store instructions that access the non-cacheable area in this way has the advantage that the Cache coherence request channel and the data channel used to write Cache data back to main memory can be reused. However, when multiple store instructions write a contiguous address range in the non-cacheable area, they are executed one after another, occupying the Cache coherence request channel and the write-back data channel to main memory many times, and the execution efficiency of the store instructions is low.

Disclosure of Invention

The technical problem to be solved by the present invention is to provide a method for merging store instructions that access a non-cacheable area, which reduces the occupation of the relevant request channels and data channels by such store instructions.

The technical solution adopted by the invention to solve this problem is as follows: a method for merging store instructions that access a non-cacheable area is provided, in which a merge buffer is placed behind the store instruction queue, multiple store instructions that access the non-cacheable area and whose access addresses fall within the same Cache block are merged, and the write data of these store instructions are merged and stored in a single "non-cacheable area write data buffer" entry.

Before multiple store instructions that access the non-cacheable area and whose access addresses fall within the same Cache block are merged, a merge determination is performed between the store instruction accessing the non-cacheable area and the write request in the merge buffer, and the corresponding processing is carried out according to the result.

When the merge determination is performed between a store instruction accessing the non-cacheable area and the write request in the merge buffer, if the write request in the merge buffer is invalid, the store instruction is converted into a write request and registered in the merge buffer, a new "non-cacheable area write data buffer" entry is allocated to record the write data of the store instruction, and the number of the allocated entry is registered in the merge buffer.

When the merge determination is performed between a store instruction accessing the non-cacheable area and the write request in the merge buffer, if the write request in the merge buffer is valid and the store instruction can be merged with it, the store instruction is merged with the write request in the merge buffer to form a new write request, which is registered in the merge buffer; at the same time, the write data of the store instruction is merged with the write data in the "non-cacheable area write data buffer" entry corresponding to the merge buffer.

When the merge determination is performed between a store instruction accessing the non-cacheable area and the write request in the merge buffer, if the write request in the merge buffer is valid but the store instruction cannot be merged with it, the write request in the merge buffer is issued immediately, the store instruction is converted into a write request and registered in the merge buffer, and a new "non-cacheable area write data buffer" entry is allocated to record the write data of the store instruction.

While the write request in the merge buffer is waiting to be merged with subsequent store instructions accessing the non-cacheable area, if a load instruction is executed whose access address falls within the same Cache block as the access address of the write request in the merge buffer, the write request in the merge buffer is issued immediately, ensuring that the write request in the merge buffer is issued first and that the read request corresponding to the load instruction accessing the non-cacheable area is issued afterwards.

Advantageous effects

Owing to the above technical solution, the invention has the following advantages and positive effects compared with the prior art: multiple store instructions accessing the non-cacheable area are merged into one write request before being sent to the Cache coherence processing unit, and the write data of the merged store instructions share a single "non-cacheable area write data buffer" entry.

Drawings

FIG. 1 is a schematic diagram of the location of the merge buffer in the present invention;

FIG. 2 is a flow diagram of converting a store instruction accessing the non-cacheable area into a write request and registering it in the merge buffer when the merge buffer is invalid;

FIG. 3 is a flow diagram of merging a store instruction accessing the non-cacheable area with the write request in the merge buffer when the merge buffer is valid;

FIG. 4 is a flow diagram of the processing when the merge buffer is valid but a store instruction accessing the non-cacheable area cannot be merged with the write request in the merge buffer;

FIG. 5 is a flow diagram of a load instruction accessing the non-cacheable area closing the merge buffer;

FIG. 6 is a flow diagram of the execution of a write request accessing the non-cacheable area after it is issued from the merge buffer.

Detailed Description

The invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art after reading the teaching of the present invention, and such equivalents may fall within the scope of the present invention as defined in the appended claims.

The embodiment of the invention relates to a method for merging store instructions that access a non-cacheable area, in which a merge buffer is placed behind the store instruction queue, multiple store instructions that access the non-cacheable area and whose access addresses fall within the same Cache block are merged, and the write data of these store instructions are merged and stored in a single "non-cacheable area write data buffer" entry.

As shown in FIG. 1, in this embodiment a merge buffer is placed behind the store instruction queue. The merge buffer records information about the request currently waiting to be merged, such as its access address and access granularity, together with the number of the "non-cacheable area write data buffer" entry that holds the write data of that request. After a store instruction accessing the non-cacheable area exits the store instruction queue, a merge determination is performed between it and the write request in the merge buffer, and the corresponding processing is carried out according to the result.

When the merge determination is performed between a store instruction accessing the non-cacheable area and the write request in the merge buffer, if the write request in the merge buffer is invalid, the store instruction is converted into a write request and registered in the merge buffer, a new "non-cacheable area write data buffer" entry is allocated to record the write data of the store instruction, and the number of the allocated entry is registered in the merge buffer. As shown in FIG. 2, after the store instruction accessing the non-cacheable area exits the store instruction queue, an address merge determination is performed between it and the write request in the merge buffer. Because the request in the merge buffer is invalid at this point, the store instruction registers its write address and write granularity in the merge buffer, allocates a new "non-cacheable area write data buffer" entry, and writes its write data into that entry.

When the merge determination is performed between a store instruction accessing the non-cacheable area and the write request in the merge buffer, if the write request in the merge buffer is valid and the store instruction can be merged with it, the store instruction is merged with the write request in the merge buffer to form a new write request, which is registered in the merge buffer; at the same time, the write data of the store instruction is merged with the write data in the "non-cacheable area write data buffer" entry corresponding to the merge buffer. As shown in FIG. 3, after the store instruction accessing the non-cacheable area exits the store instruction queue, an address merge determination is performed between it and the write request in the merge buffer. At this point the request in the merge buffer is valid and the store instruction can be merged with it; the store instruction therefore merges its write address and write granularity into the merge buffer and merges its write data with the data in the "non-cacheable area write data buffer" entry corresponding to the merge buffer.
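One way to picture the data merge described above is a block-sized entry into which each store writes its bytes at its offset within the Cache block. The sketch below is an assumption-laden illustration: the 64-byte block size, the class name `WriteDataEntry`, and the per-byte valid mask are all hypothetical, since the text only says that a write address and write granularity are recorded and does not specify how the merged extent is tracked.

```python
BLOCK_SIZE = 64  # assumed Cache block size in bytes; not fixed by the patent


class WriteDataEntry:
    """One 'non-cacheable area write data buffer' entry (hypothetical model)."""

    def __init__(self) -> None:
        self.data = bytearray(BLOCK_SIZE)
        # Assumption: a per-byte valid mask records which bytes have been written.
        self.valid = [False] * BLOCK_SIZE

    def merge(self, addr: int, payload: bytes) -> None:
        """Merge one store's write data into the entry at its block offset."""
        offset = addr % BLOCK_SIZE
        if offset + len(payload) > BLOCK_SIZE:
            raise ValueError("store crosses a Cache block boundary")
        # A later store overwrites bytes written by an earlier one, which
        # preserves program order because stores are merged in order.
        self.data[offset:offset + len(payload)] = payload
        for i in range(offset, offset + len(payload)):
            self.valid[i] = True
```

For example, merging an 8-byte store at `0x1000` and then an 8-byte store at `0x1004` leaves 12 valid bytes in the entry, with the second store's data in the overlapping bytes.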

When the merge determination is performed between a store instruction accessing the non-cacheable area and the write request in the merge buffer, if the write request in the merge buffer is valid but the store instruction cannot be merged with it, the write request in the merge buffer is issued immediately, the store instruction is converted into a write request and registered in the merge buffer, and a new "non-cacheable area write data buffer" entry is allocated to record the write data of the store instruction. As shown in FIG. 4, after the store instruction accessing the non-cacheable area exits the store instruction queue, an address merge determination is performed between it and the write request in the merge buffer. At this point the request in the merge buffer is valid, but the store instruction cannot be merged with it; the request in the merge buffer is therefore sent immediately to the Cache coherence processing unit, and the store instruction registers its write address and write granularity in the merge buffer, allocates a new "non-cacheable area write data buffer" entry, and writes its write data into that entry.

In this embodiment, therefore, the basic criterion for deciding whether store instructions accessing the non-cacheable area can be merged is whether their access addresses fall within the same Cache block. If the merge buffer is valid and the access address of the store instruction accessing the non-cacheable area falls within the same Cache block as the access address of the write request in the merge buffer, the store instruction is merged with the write request in the merge buffer. Otherwise, the write request in the merge buffer is sent to the Cache coherence processing unit, the store instruction accessing the non-cacheable area is registered in the merge buffer, a new "non-cacheable area write data buffer" entry is allocated, and the write data is written into the allocated entry.
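The three cases of the merge determination (buffer invalid, mergeable, not mergeable) can be sketched as a small state machine. This is a minimal model, not the patented hardware: the names `MergeBuffer`, `StoreMerger`, `retire_store` and `flush`, and the 64-byte block size, are hypothetical, and write data handling is omitted so that only the decision and the number of issued write requests are visible.

```python
from dataclasses import dataclass

BLOCK_SIZE = 64  # assumed Cache block size in bytes; not fixed by the patent


@dataclass
class MergeBuffer:
    valid: bool = False
    block: int = 0          # Cache block number of the pending write request
    merged_stores: int = 0  # how many store instructions the request covers


class StoreMerger:
    def __init__(self) -> None:
        self.buf = MergeBuffer()
        # Write requests sent to the coherence unit: (block, merged store count).
        self.issued = []

    def flush(self) -> None:
        """Issue the pending write request, if any, and invalidate the buffer."""
        if self.buf.valid:
            self.issued.append((self.buf.block, self.buf.merged_stores))
            self.buf = MergeBuffer()

    def retire_store(self, addr: int) -> str:
        """Called when a non-cacheable store exits the store instruction queue."""
        block = addr // BLOCK_SIZE
        if not self.buf.valid:              # case of FIG. 2: register a new request
            self.buf = MergeBuffer(True, block, 1)
            return "registered"
        if self.buf.block == block:         # case of FIG. 3: same Cache block, merge
            self.buf.merged_stores += 1
            return "merged"
        self.flush()                        # case of FIG. 4: issue, then register
        self.buf = MergeBuffer(True, block, 1)
        return "flushed_then_registered"
```

In this model, three stores to `0x1000`, `0x1008` and `0x1010` followed by one to `0x2000` produce a single write request covering the first three stores, where the unmerged scheme would have issued three.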

In an out-of-order microprocessor, load instructions are issued and executed out of order; store instructions may be issued out of order but must be executed in order. Therefore, if the access address of a load instruction being executed overlaps the access address of the write request in the merge buffer, the one or more store instructions corresponding to that write request must be executed first according to program order. The merge buffer is therefore closed immediately, that is, it stops waiting for subsequent store instructions and sends the write request to the Cache coherence processing unit at once, ensuring that the Cache coherence processing unit first receives the write request corresponding to the store instructions that are earlier in program order. As shown in FIG. 5, while the write request in the merge buffer is waiting for the merge determination with a subsequent store instruction, a load instruction accessing the non-cacheable area is received whose access address overlaps the access address of the write request in the merge buffer. The write request in the merge buffer is therefore sent immediately to the Cache coherence processing unit, ensuring that it reaches the Cache coherence processing unit before the load instruction accessing the non-cacheable area does.
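The ordering rule for loads can be illustrated with a short model that records the order in which requests reach the Cache coherence processing unit. The class `OrderedIssue` and its methods are hypothetical names, the overlap test is simplified to "same Cache block" (as in claim 6), and the block size of 64 bytes is an assumption.

```python
BLOCK_SIZE = 64  # assumed Cache block size in bytes; not fixed by the patent


class OrderedIssue:
    def __init__(self) -> None:
        # Cache block number of the write request waiting in the merge buffer,
        # or None when the merge buffer is invalid.
        self.pending_block = None
        # Requests in the order they are sent to the coherence unit.
        self.sent = []

    def hold_write(self, addr: int) -> None:
        """Register a write request that waits for further stores to merge."""
        self.pending_block = addr // BLOCK_SIZE

    def close_merge_buffer(self) -> None:
        """Stop waiting for more stores and issue the pending write request."""
        if self.pending_block is not None:
            self.sent.append(("write", self.pending_block))
            self.pending_block = None

    def execute_load(self, addr: int) -> None:
        """Issue a non-cacheable load, flushing an overlapping write first."""
        block = addr // BLOCK_SIZE
        if self.pending_block == block:
            self.close_merge_buffer()      # the write must be sent before the read
        self.sent.append(("read", block))
```

A load to a different Cache block leaves the pending write request in the merge buffer, so merging can continue; only a load to the same block forces the early issue.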

FIG. 6 shows the processing flow after the write request in the merge buffer and the number of the corresponding "non-cacheable area write data buffer" entry have been sent to the Cache coherence processing unit. The Cache coherence processing unit issues a data fetch request for the corresponding entry of the "non-cacheable area write data buffer"; on receiving the data fetch request, the "non-cacheable area write data buffer" sends the write data to the Cache coherence processing unit and releases the corresponding entry; after receiving the write data, the Cache coherence processing unit sends it to the non-cacheable area together with information such as the write address and write granularity, and the actual write operation is completed in the non-cacheable area. Because multiple store instructions are merged into one write request and executed together, the total time required to execute them can be effectively shortened, which improves the performance of the processor on the relevant program segments.

It can readily be seen that the invention merges multiple store instructions accessing the non-cacheable area into one write request before sending it to the Cache coherence processing unit, and that the write data of the merged store instructions share a single "non-cacheable area write data buffer" entry. The method thus not only reduces the occupation of the relevant request channels and data channels by store instructions accessing the non-cacheable area, but also improves the utilization of "non-cacheable area write data buffer" entries and the execution efficiency of store instructions accessing the non-cacheable area.

Claims

1. A method for merging store instructions that access a non-cacheable area, characterized in that a merge buffer is placed behind the store instruction queue, multiple store instructions that access the non-cacheable area and whose access addresses fall within the same Cache block are merged, and the write data of these store instructions are merged and stored in a single "non-cacheable area write data buffer" entry.

2. The method for merging store instructions that access a non-cacheable area according to claim 1, wherein before the multiple store instructions that access the non-cacheable area and whose access addresses fall within the same Cache block are merged, the method further comprises performing a merge determination between the store instruction accessing the non-cacheable area and the write request in the merge buffer, and carrying out the corresponding processing according to the result.

3. The method according to claim 2, wherein when the merge determination is performed between the store instruction accessing the non-cacheable area and the write request in the merge buffer, if the write request in the merge buffer is invalid, the store instruction is converted into a write request and registered in the merge buffer, a new "non-cacheable area write data buffer" entry is allocated to record the write data of the store instruction, and the number of the allocated "non-cacheable area write data buffer" entry is registered in the merge buffer.

4. The method according to claim 2, wherein when the merge determination is performed between the store instruction accessing the non-cacheable area and the write request in the merge buffer, if the write request in the merge buffer is valid and the store instruction can be merged with it, the store instruction and the write request in the merge buffer are merged to form a new write request, which is registered in the merge buffer, and at the same time the write data of the store instruction is merged with the write data in the "non-cacheable area write data buffer" entry corresponding to the merge buffer.

5. The method according to claim 2, wherein when the merge determination is performed between the store instruction accessing the non-cacheable area and the write request in the merge buffer, if the write request in the merge buffer is valid but the store instruction cannot be merged with it, the write request in the merge buffer is issued immediately, the store instruction is converted into a write request and registered in the merge buffer, and a new "non-cacheable area write data buffer" entry is allocated to record the write data of the store instruction.

6. The method according to claim 1, wherein while the write request in the merge buffer is waiting to be merged with a subsequent store instruction accessing the non-cacheable area, if a load instruction is executed whose access address falls within the same Cache block as the access address of the write request in the merge buffer, the write request in the merge buffer is issued immediately, ensuring that the write request in the merge buffer is issued first and that the read request corresponding to the load instruction accessing the non-cacheable area is issued afterwards.