CN113867801A - Instruction cache, instruction cache group and request merging method thereof - Google Patents


Info

Publication number
CN113867801A
CN113867801A (application number CN202111162371.6A)
Authority
CN
China
Prior art keywords
request
access
access requests
arbiter
instruction cache
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111162371.6A
Other languages
Chinese (zh)
Inventor
Not disclosed (不公告发明人)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Biren Intelligent Technology Co Ltd
Original Assignee
Shanghai Biren Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Biren Intelligent Technology Co Ltd filed Critical Shanghai Biren Intelligent Technology Co Ltd
Priority to CN202111162371.6A priority Critical patent/CN113867801A/en
Publication of CN113867801A publication Critical patent/CN113867801A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3802 Instruction prefetching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3853 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution of compound instructions

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention provides an instruction cache, an instruction cache set, and a request merging method thereof. An arbiter selects one of a plurality of access requests as an execution request, wherein the access requests are sent by a plurality of threads and request arbitration from the arbiter at the same time; the access requests that were not selected by the arbiter but have the same access address as the execution request are merged with the execution request to obtain a merged request; data is read based on the access address; and the read data is broadcast to all threads corresponding to the merged request.

Description

Instruction cache, instruction cache group and request merging method thereof
Technical Field
The invention relates to an instruction cache, an instruction cache group and a request merging method thereof.
Background
Generally, in a general-purpose graphics processing unit (GPGPU), an instruction cache is shared by multiple threads, and multiple threads may issue instruction fetch requests to the instruction cache simultaneously. After arbitration, the instruction cache selects one thread's request and sends it to the instruction memory to read out the corresponding instruction data. Consequently, when multiple threads read the same instruction at the same time, that instruction is read multiple times, because the instruction cache can only process one thread's request at a time and the requests are therefore serialized. This increases the read latency of instructions and wastes the data bandwidth of the instruction memory.
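The bandwidth cost of this serial behavior can be illustrated with a toy model (a sketch with made-up names; `serial_reads` and `merged_reads` are not from the patent):

```python
def serial_reads(thread_addresses):
    # Without merging: one instruction-memory read per thread request,
    # even when every thread asks for the same address.
    return len(thread_addresses)

def merged_reads(thread_addresses):
    # With merging: one read per distinct address requested in the same cycle.
    return len(set(thread_addresses))

addrs = [0x00A] * 8           # 8 threads fetching the same instruction
print(serial_reads(addrs))    # 8: the word is read eight times, serially
print(merged_reads(addrs))    # 1: one read, broadcast to all eight threads
```

The gap between the two counts is exactly the wasted instruction-memory bandwidth the patent targets.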
Disclosure of Invention
The invention is directed to an instruction cache, an instruction cache set, and a request merging method thereof, which can process multiple simultaneous requests as one merged request, reducing processing latency and the access bandwidth of the instruction memory.
The request merging method for an instruction cache of the invention comprises the following steps: selecting one of a plurality of access requests as an execution request via an arbiter, wherein the plurality of access requests are respectively sent by a plurality of threads and request arbitration from the arbiter at the same time; merging those of the plurality of access requests that are not selected by the arbiter and have the same access address as the execution request with the execution request to obtain a merged request; reading data based on the access address; and broadcasting the read data to all threads corresponding to the merged request.
According to an embodiment of the present invention, merging an access request that is not selected by the arbiter and has the same access address as the execution request with the execution request to obtain the merged request includes: recording an identifier corresponding to the merged request, wherein the identifier at least comprises identification information of the threads corresponding to the access requests with the same access address among the plurality of access requests.
According to an embodiment of the present invention, the step of broadcasting the read data to all threads corresponding to the merged request includes: broadcasting the read data to all threads corresponding to the merged request based on the identifier.
According to an embodiment of the present invention, after selecting one of the plurality of access requests as the execution request via the arbiter, the method further comprises: comparing the access requests not selected by the arbiter among the plurality of access requests with the execution request to find the access requests having the same access address as the execution request.
The instruction cache of the invention comprises: an arbiter, configured to select one of a plurality of access requests as an execution request, wherein the plurality of access requests are respectively sent by a plurality of threads and request arbitration from the arbiter at the same time; first operation logic, configured to merge those of the plurality of access requests that are not selected by the arbiter and have the same access address as the execution request with the execution request to obtain a merged request; an instruction memory that reads data based on the access address; and second operation logic, configured to broadcast the read data to all threads corresponding to the merged request.
In the instruction cache according to an embodiment of the present invention, the first operation logic records an identifier corresponding to the merged request, where the identifier includes at least identification information of the threads corresponding to the access requests with the same access address among the plurality of access requests.
In the instruction cache according to an embodiment of the present invention, the second operation logic broadcasts the read data to all threads corresponding to the merged request based on the identifier.
In the instruction cache according to an embodiment of the present invention, the first operation logic compares the access requests not selected by the arbiter among the plurality of access requests with the execution request to find the access requests having the same access address as the execution request.
The instruction cache set of the present invention comprises a plurality of instruction caches, at least one of which is the instruction cache described above.
In the instruction cache set according to an embodiment of the invention, the access addresses of access requests that simultaneously access at least two of the plurality of instruction caches are different.
Based on the above, the present invention introduces a request merging step between the arbiter and the instruction memory of an instruction cache, merging access requests that have the same access address at the same time. This shortens the instruction read time and saves access bandwidth of the instruction memory.
Drawings
FIG. 1 is a block diagram of an instruction access apparatus according to an embodiment of the present invention.
FIG. 2 is a flow diagram of the request merging method according to an embodiment of the invention.
Description of the reference numerals
100: instruction access apparatus
100A: execution unit
100B: instruction cache
110: arbiter
120: first operation logic
130: instruction memory
140: second operation logic
T1 to Tn: threads
r1 to rn: access requests
r: execution request
S205 to S220: steps of the request merging method
Detailed Description
Reference will now be made in detail to exemplary embodiments of the invention, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings and the description to refer to the same or like parts.
FIG. 1 is a block diagram of an instruction access apparatus according to an embodiment of the present invention. Referring to FIG. 1, the instruction access apparatus 100 includes an execution unit (EU) 100A and an instruction cache 100B. In some examples, the execution unit 100A is, for example, part of a central processing unit (CPU) and performs the operations and calculations directed by a computer program. In the present embodiment, the execution unit 100A issues a plurality of threads T1 to Tn. In other embodiments, the threads T1 to Tn may be issued by a plurality of different EUs, and one EU may issue one thread or several threads simultaneously.
It should be understood that the threads T1 to Tn may also be issued by other devices; the invention does not limit the manner in which the threads T1 to Tn are generated.
The instruction cache 100B is, for example, a cache. A cache exists within each processor, and the processor does not interact directly with memory when performing read and write operations, but rather goes through the cache. In this embodiment, the processor is a general-purpose graphics processing unit (GPGPU). The instruction cache 100B includes an arbiter 110, first operation logic 120, an instruction memory 130, and second operation logic 140.
The steps of the request merging method are further described below with reference to the instruction access apparatus 100.
FIG. 2 is a flow diagram of the request merging method according to an embodiment of the invention. Referring to FIG. 1 and FIG. 2, in step S205, one of the access requests r1 to rn is selected as the execution request r by the arbiter 110. Here, the access requests r1 to rn are sent by the threads T1 to Tn, respectively, and request arbitration from the arbiter 110 at the same time. That is, when the threads T1 to Tn enter the instruction cache 100B and request arbitration from the arbiter 110 simultaneously, the arbiter 110 first selects one of them as the subsequent execution request r.
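Step S205 can be sketched as a simple arbiter model. The fixed-priority policy below is an assumption made for illustration; the patent does not specify how the arbiter 110 chooses among simultaneous requests:

```python
def arbitrate(pending):
    """pending maps thread_id -> access address; returns the granted
    (thread_id, address) pair. Policy (assumed): lowest thread index wins."""
    winner = min(pending)
    return winner, pending[winner]

pending = {1: 0x00A, 3: 0x00A, 5: 0x00A}   # r1, r3, r5 arrive in the same cycle
print(arbitrate(pending))                  # (1, 10): r1 becomes the execution request r
```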
In step S210, the first operation logic 120 merges those of the access requests r1 to rn that were not selected by the arbiter 110 and have the same access address as the execution request r with the execution request r to obtain a merged request. For example, the first operation logic 120 compares the unselected access requests with the execution request r to find the access requests having the same access address as the execution request r.
Here, each access request includes an identifier (or identification information) of its corresponding thread; that is, the identifier indicates which thread sent the access request. The first operation logic 120 may obtain the merged request by adding to the execution request r the identifier of each access request that has the same access address as the execution request r. In other words, the merged request is the execution request r plus the identifiers of the one or more access requests sharing its access address, and the identifier of the merged request comprises the identifier of the execution request r together with those identifiers.
In other embodiments, the first operation logic 120 may instead add a tag to each access request with the same access address as the execution request r, so that the tagged access requests and the execution request r together serve as the merged request for subsequent use by the second operation logic 140.
In still other embodiments, the first operation logic 120 may be used with a register, recording into the register the identifier of each access request with the same access address as the execution request r, so that the register contents and the execution request r together serve as the merged request for subsequent use by the second operation logic 140.
For example, assume that the arbiter 110 selects the access request r1, whose access address is 0x00A, as the execution request r. The first operation logic 120 compares the unselected access requests r2 to rn with the execution request r to find those with access address 0x00A. Assuming the first operation logic 120 finds that the access addresses of the access requests r3 and r5 are also 0x00A, the access requests r3 and r5 are merged with the execution request r (access request r1) to obtain the merged request. For example, the identifiers of the threads T3 and T5 corresponding to the access requests r3 and r5 are added to the execution request r; alternatively, a tag is added to the access requests r3 and r5 for identification by the second operation logic 140; alternatively, the identifiers of the threads T3 and T5 in the access requests r3 and r5 are recorded in the register.
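The comparison and merge in this example can be sketched as follows, using the identifier-in-request variant. All names (`Request`, `merge_with_execution`) are illustrative, not from the patent:

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    thread_id: int
    address: int
    merged_ids: list = field(default_factory=list)  # identifiers added on merge

def merge_with_execution(execution, others):
    # Compare each unselected request's address with the execution request's;
    # matching requests contribute only their thread identifier.
    execution.merged_ids = [execution.thread_id] + [
        o.thread_id for o in others if o.address == execution.address
    ]
    leftovers = [o for o in others if o.address != execution.address]
    return execution, leftovers   # leftovers wait for the next arbitration round

r1 = Request(1, 0x00A)                    # selected as execution request r
others = [Request(2, 0x014), Request(3, 0x00A), Request(5, 0x00A)]
merged, leftovers = merge_with_execution(r1, others)
print(merged.merged_ids)                  # [1, 3, 5]
print([o.thread_id for o in leftovers])   # [2]
```

The merged request carries the address 0x00A once, plus the identifiers of threads T1, T3, and T5.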
Next, in step S215, the instruction memory 130 reads data based on the access address. Specifically, after the first operation logic 120 completes the request merging, the merged request is sent to the instruction memory 130, and the instruction memory 130 reads data based on the access address of the merged request.
Thereafter, in step S220, the second operation logic 140 broadcasts the read data to all threads corresponding to the merged request. For example, the second operation logic 140 passes the read data to the corresponding threads based on all identifiers included in the merged request. Alternatively, the second operation logic 140 obtains the identifiers from the execution request r and the tagged access requests and transmits the read data to the corresponding threads. Alternatively, the second operation logic 140 obtains the identifier of the execution request r and the identifiers recorded in the register used with the first operation logic 120, and transmits the read data to the corresponding threads.
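Steps S215 and S220 then amount to a single read followed by a broadcast keyed on the merged identifiers. A minimal sketch, with a toy `instruction_memory` dictionary standing in for the real instruction memory 130:

```python
instruction_memory = {0x00A: b"\x90\x90\x90\x90"}   # toy address -> instruction word

def read_and_broadcast(address, merged_ids, memory):
    data = memory[address]                    # S215: one read serves the whole merge
    return {tid: data for tid in merged_ids}  # S220: broadcast to each merged thread

delivered = read_and_broadcast(0x00A, [1, 3, 5], instruction_memory)
print(sorted(delivered))                      # [1, 3, 5]: every thread got the word
```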
After step S220 is completed, the access requests that could not be merged with the execution request r wait, together with subsequent access requests sent by other threads, for the next round of arbitration. That is, after the current execution request r completes, the arbiter 110 selects another request from the access requests that have not yet been served and that simultaneously request arbitration, and steps S210 to S220 are executed again, and so on until all access requests have been served.
The request merging method may also be used in an instruction cache set. The instruction cache set includes a plurality of instruction caches, at least one of which employs the request merging method. For example, an instruction cache set may include a plurality of memory blocks (banks). Each memory block using the request merging method is configured like the instruction cache 100B and includes an arbiter 110, first operation logic 120, an instruction memory 130, and second operation logic 140. In one embodiment, only one memory block may be accessed at a time; when multiple access requests enter a memory block simultaneously, steps S205 to S220 of the request merging method are performed. In another embodiment, multiple memory blocks may be accessed simultaneously as long as their access addresses do not conflict; that is, the access addresses of access requests that simultaneously access different memory blocks are different. For example, the access address of the request accessing memory block b01 is 0x0001, that of the request accessing memory block b02 is 0x0002, that of the request accessing memory block b03 is 0x0003, and that of the request accessing memory block b04 is 0x0004. Accordingly, the threads T1 to Tn can be divided into four groups to access the memory blocks b01 to b04, respectively. If multiple access requests enter the memory blocks b01 to b04 at the same time, each of the memory blocks b01 to b04 executes steps S205 to S220 of the request merging method.
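The banked cache set can be sketched as routing each request to a memory block by address and merging within each block. The bank-selection function below (low address bits) is an assumption; the patent does not fix one:

```python
from collections import defaultdict

NUM_BANKS = 4

def bank_of(address):
    return address % NUM_BANKS          # assumed bank-selection function

def route_and_merge(requests):
    """requests: list of (thread_id, address). Returns per-bank merge tables:
    bank -> address -> list of thread identifiers sharing that address."""
    banks = defaultdict(lambda: defaultdict(list))
    for tid, addr in requests:
        banks[bank_of(addr)][addr].append(tid)   # same address, same bank: merge
    return banks

reqs = [(1, 0x0001), (2, 0x0002), (3, 0x0001), (4, 0x0003)]
banks = route_and_merge(reqs)
print(sorted(banks[1][0x0001]))   # [1, 3]: threads T1 and T3 merged in bank 1
```

Requests to different banks proceed in parallel; within each bank, each address list is one merged request served by steps S205 to S220.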
In summary, the present invention introduces a request merging step between the arbiter and the instruction memory of an instruction cache: the access request of the thread selected by the arbiter is compared with the access requests of the other threads, access requests with the same access address are merged, and the read data is then broadcast to all related threads. In this way, multiple requests can be processed together at the same time, shortening the instruction read time, reducing processing latency, and saving access bandwidth of the instruction memory.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A request merging method for an instruction cache, comprising:
selecting one of a plurality of access requests as an execution request via an arbiter, wherein the plurality of access requests are respectively sent by a plurality of threads and request arbitration from the arbiter at the same time;
merging those of the plurality of access requests that are not selected by the arbiter and have the same access address as the execution request with the execution request to obtain a merged request;
reading data based on the access address; and
broadcasting the read data to all threads corresponding to the merged request.
2. The request merging method of claim 1, wherein merging the execution request with the access requests of the plurality of access requests that are not selected by the arbiter and have the same access address as the execution request to obtain the merged request comprises:
recording an identifier corresponding to the merged request, wherein the identifier at least comprises identification information of the threads corresponding to the access requests with the same access address among the plurality of access requests.
3. The request merging method according to claim 2, wherein the step of broadcasting the read data to all threads corresponding to the merged request comprises:
broadcasting the read data to all threads corresponding to the merged request based on the identifier.
4. The request merging method of claim 1, further comprising, after selecting one of the plurality of access requests as the execution request via the arbiter:
comparing the access requests not selected by the arbiter among the plurality of access requests with the execution request to find the access requests having the same access address as the execution request.
5. An instruction cache, comprising:
an arbiter, configured to select one of a plurality of access requests as an execution request, wherein the plurality of access requests are respectively sent by a plurality of threads and request arbitration from the arbiter at the same time;
first operation logic, configured to merge those of the plurality of access requests that are not selected by the arbiter and have the same access address as the execution request with the execution request to obtain a merged request;
an instruction memory that reads data based on the access address; and
second operation logic, configured to broadcast the read data to all threads corresponding to the merged request.
6. The instruction cache of claim 5, wherein the first operation logic records an identifier corresponding to the merged request, the identifier comprising at least identification information of the threads corresponding to the access requests of the plurality of access requests having the same access address.
7. The instruction cache of claim 6, wherein the second operation logic broadcasts the read data to all threads corresponding to the merged request based on the identifier.
8. The instruction cache of claim 5 wherein the first operation logic compares access requests of the plurality of access requests not selected by the arbiter to the execution request to find access requests having the same access address as the execution request.
9. An instruction cache set, comprising:
a plurality of instruction caches;
wherein at least one of the plurality of instruction caches is an instruction cache according to any one of claims 5 to 8.
10. The instruction cache set of claim 9 wherein access addresses of access requests to access at least two of the plurality of instruction caches simultaneously are different.
CN202111162371.6A, priority date 2021-09-30, filing date 2021-09-30: Instruction cache, instruction cache group and request merging method thereof (pending, published as CN113867801A)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111162371.6A CN113867801A (en) 2021-09-30 2021-09-30 Instruction cache, instruction cache group and request merging method thereof


Publications (1)

Publication Number Publication Date
CN113867801A 2021-12-31

Family

ID=79001355

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111162371.6A Pending CN113867801A (en) 2021-09-30 2021-09-30 Instruction cache, instruction cache group and request merging method thereof

Country Status (1)

Country Link
CN (1) CN113867801A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115269455A (en) * 2022-09-30 2022-11-01 湖南兴天电子科技股份有限公司 Disk data read-write control method and device based on FPGA and storage terminal
CN115269455B (en) * 2022-09-30 2022-12-23 湖南兴天电子科技股份有限公司 Disk data read-write control method and device based on FPGA and storage terminal


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
Country or region after: China
Address after: 201100 room 1302, 13 / F, building 16, No. 2388, Chenhang highway, Minhang District, Shanghai
Applicant after: Shanghai Bi Ren Technology Co.,Ltd.
Address before: 201100 room 1302, 13 / F, building 16, No. 2388, Chenhang highway, Minhang District, Shanghai
Applicant before: Shanghai Bilin Intelligent Technology Co.,Ltd.
Country or region before: China