CN113867801A - Instruction cache, instruction cache group and request merging method thereof - Google Patents
- Publication number: CN113867801A
- Application number: CN202111162371.6A
- Authority: CN (China)
- Prior art keywords: request, access, access requests, arbiter, instruction cache
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3802—Instruction prefetching
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3853—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution of compound instructions
Abstract
The invention provides an instruction cache, an instruction cache group and a request merging method thereof. One of a plurality of access requests is selected as an execution request via an arbiter, wherein the access requests are respectively sent by a plurality of threads and request arbitration from the arbiter at the same time; each access request that is not selected by the arbiter but has the same access address as the execution request is merged with the execution request to obtain a merged request; data is read based on the access address; and the read data is broadcast to all threads corresponding to the merged request.
Description
Technical Field
The invention relates to an instruction cache, an instruction cache group and a request merging method thereof.
Background
Generally, in a general-purpose computing graphics processing unit (GPGPU), an instruction cache is shared by multiple threads, and in normal operation multiple threads may simultaneously issue instruction fetch requests to it. After arbitration, the instruction cache forwards one thread's request to the instruction memory to read out the corresponding instruction data. Because the instruction cache can only process one thread's request at a time, when multiple threads fetch the same instruction simultaneously the requests are serialized and the same instruction is read multiple times. This increases the read latency of the instructions and wastes the data bandwidth of the instruction memory.
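The serialization cost can be sketched numerically (a hypothetical Python model; the function names and request encoding are invented for illustration and are not taken from the patent):

```python
def serial_reads(requests):
    # One instruction-memory read per request when same-address
    # fetches are serialized by the arbiter.
    return len(requests)

def merged_reads(requests):
    # One read per distinct address when same-address requests
    # are merged and the result is broadcast.
    return len(set(requests))

addrs = [0x00A] * 8          # 8 threads fetch the same instruction
assert serial_reads(addrs) == 8
assert merged_reads(addrs) == 1
```

Under this model, eight threads fetching one instruction cost eight memory reads when serialized but only one when merged.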
Disclosure of Invention
The invention is directed to an instruction cache, an instruction cache group and a request merging method thereof, which can merge multiple simultaneous requests and process them together, reducing processing delay and the access bandwidth consumed on the instruction memory.
The request merging method for an instruction cache of the invention comprises the following steps: selecting one of a plurality of access requests as an execution request via an arbiter, wherein the access requests are respectively sent by a plurality of threads and request arbitration from the arbiter at the same time; merging each access request that is not selected by the arbiter but has the same access address as the execution request with the execution request to obtain a merged request; reading data based on the access address; and broadcasting the read data to all threads corresponding to the merged request.
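The four steps above can be sketched as follows (a minimal Python model that assumes requests are a mapping from thread id to access address and uses a fixed-priority choice in place of real arbitration; all names are illustrative, not the patented hardware):

```python
def merge_and_serve(requests, memory):
    """One arbitration round: select, merge, read, broadcast.
    requests: {thread_id: address}; memory: {address: data}."""
    # Step 1: the arbiter selects one pending request as the execution
    # request (lowest thread id stands in for real arbitration here).
    winner = min(requests)
    exec_addr = requests[winner]
    # Step 2: unselected requests with the same access address are
    # merged; the merged request records all matching thread ids.
    merged_threads = [t for t, a in requests.items() if a == exec_addr]
    # Step 3: the instruction memory is read once, at the shared address.
    data = memory[exec_addr]
    # Step 4: the read data is broadcast to every merged thread.
    return {t: data for t in merged_threads}
```

For instance, with `requests = {1: 0xA, 2: 0xB, 3: 0xA}` one round serves threads 1 and 3 with a single memory read; thread 2 waits for the next round.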
According to an embodiment of the present invention, merging an access request that is not selected by the arbiter and has the same access address as the execution request with the execution request to obtain the merged request includes: recording an identifier corresponding to the merged request, wherein the identifier at least comprises identification information of the threads corresponding to the access requests having the same access address.
According to an embodiment of the present invention, the step of broadcasting the read data to all threads corresponding to the merged request includes: broadcasting the read data based on the identifier.
According to an embodiment of the present invention, the method further comprises, after selecting one of the plurality of access requests as the execution request via the arbiter: comparing the access requests that were not selected by the arbiter with the execution request to find the access requests having the same access address as the execution request.
The instruction cache of the invention comprises: an arbiter, configured to select one of a plurality of access requests as an execution request, wherein the access requests are respectively sent by a plurality of threads and request arbitration from the arbiter at the same time; first operation logic, configured to merge each access request that is not selected by the arbiter but has the same access address as the execution request with the execution request to obtain a merged request; an instruction memory that reads data based on the access address; and second operation logic, configured to broadcast the read data to all threads corresponding to the merged request.
In the instruction cache according to the embodiment of the present invention, the first operation logic records an identifier corresponding to the merged request, where the identifier includes at least identification information of the threads corresponding to the access requests having the same access address.
In the instruction cache according to the embodiment of the present invention, the second operation logic broadcasts the read data to all threads corresponding to the merged request based on the identifier.
In the instruction cache according to the embodiment of the present invention, the first operation logic compares the access requests that were not selected by the arbiter with the execution request to find the access requests having the same access address as the execution request.
The instruction cache set of the invention comprises a plurality of instruction caches, wherein at least one of the plurality of instruction caches is an instruction cache as described above.
In the instruction cache set according to the embodiment of the invention, the access addresses of access requests that simultaneously access at least two of the plurality of instruction caches are different.
Based on the above, the present invention introduces a request merging step between the arbiter and the instruction memory of an instruction cache: access requests that arrive at the same time with the same access address are merged. The read time of an instruction is thereby shortened, and access bandwidth of the instruction memory is saved.
Drawings
FIG. 1 is a block diagram of an instruction access apparatus according to an embodiment of the present invention.
FIG. 2 is a flow diagram of a request merge method according to an embodiment of the invention.
Description of the reference numerals
100: instruction access device
100A: execution unit
100B: instruction cache
110: arbitrator
120: first operation logic
130: instruction memory
140: second operation logic
T1-Tn: threading
r 1-rn: access request
r: executing a request
S205 to S220: steps of a request merging method
Detailed Description
Reference will now be made in detail to exemplary embodiments of the invention, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings and the description to refer to the same or like parts.
FIG. 1 is a block diagram of an instruction access apparatus according to an embodiment of the present invention. Referring to FIG. 1, the instruction access apparatus 100 includes an execution unit (EU) 100A and an instruction cache 100B. In some examples, the execution unit 100A is, for example, part of a central processing unit (CPU), and performs the operations and calculations directed by a computer program. In the present embodiment, the execution unit 100A issues a plurality of threads T1-Tn. In other embodiments, the threads T1-Tn may be issued by a plurality of different execution units, and one execution unit may issue one thread or several threads simultaneously.
It should be understood that the threads T1-Tn can also be issued by other devices; the invention is not limited to the manner in which the threads T1-Tn are generated.
The steps of the request merging method are further described below with reference to the instruction access apparatus 100.
FIG. 2 is a flow diagram of a request merging method according to an embodiment of the invention. Referring to FIG. 1 and FIG. 2, in step S205, one of the access requests r1-rn is selected as the execution request r by the arbiter 110. Here, the access requests r1-rn are sent by the threads T1-Tn, respectively, and request arbitration from the arbiter 110 at the same time. That is, when the threads T1-Tn enter the instruction cache 100B and request arbitration simultaneously, the arbiter 110 first selects the request of one of the threads T1-Tn as the subsequent execution request r.
In step S210, the first operation logic 120 merges each of the access requests r1-rn that was not selected by the arbiter 110 but has the same access address as the execution request r with the execution request r to obtain a merged request. To do so, the first operation logic 120 compares the unselected ones of the access requests r1-rn with the execution request r to find the access requests having the same access address as r.
Here, each access request carries an identifier (identification information) of its corresponding thread, which records by which thread the access request was sent. The first operation logic 120 may form the merged request by appending to the execution request r the identifier of every access request whose access address equals that of r. The identifier set of the merged request thus comprises the identifier of the execution request r together with the identifiers of all access requests sharing its access address.
In other embodiments, the first operation logic 120 may instead add a tag to each access request having the same access address as the execution request r, so that the tagged access requests and the execution request r together serve as the merged request for subsequent use by the second operation logic 140.
In still other embodiments, the first operation logic 120 may work with a register: the identifier of each access request having the same access address as the execution request r is recorded into the register, and the recorded identifiers together with the execution request r serve as the merged request for subsequent use by the second operation logic 140.
For example, assume the arbiter 110 selects the access request r1, whose access address is 0x00A, as the execution request r. The first operation logic 120 compares the unselected access requests r2-rn with the execution request r to find those whose access address is also 0x00A. If it finds that the access addresses of the access requests r3 and r5 are 0x00A as well, the access requests r3 and r5 are merged with the execution request r (access request r1) to obtain a merged request: the identifiers of the threads T3 and T5 corresponding to r3 and r5 are added to the execution request r; or a tag is added to r3 and r5 for identification by the second operation logic 140; or the identifiers of the threads T3 and T5 carried by r3 and r5 are recorded in the register.
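The example above can be recreated in a short sketch (Python, with assumed thread names and a dict-based request encoding that is illustrative only):

```python
# Thread -> access address; only T1, T3 and T5 share address 0x00A.
requests = {"T1": 0x00A, "T2": 0x00B, "T3": 0x00A,
            "T4": 0x00C, "T5": 0x00A}

winner = "T1"                      # the arbiter picked r1 (thread T1)
merged = {
    "address": requests[winner],
    # Record the identifiers of every thread whose request shares
    # the execution request's access address (T1, T3, T5).
    "threads": [t for t, a in requests.items() if a == requests[winner]],
}
assert merged["threads"] == ["T1", "T3", "T5"]
```

The merged request carries one address and three thread identifiers, so a single memory read can later be broadcast to T1, T3 and T5.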
Next, in step S215, the instruction memory 130 reads data based on the access address. Specifically, after the first operation logic 120 completes the merge, the merged request is sent to the instruction memory 130, which reads data at the access address of the merged request.
Thereafter, in step S220, the second operation logic 140 broadcasts the read data to all threads corresponding to the merged request. For example, the second operation logic 140 delivers the read data to each thread identified by the identifiers included in the merged request. Alternatively, the second operation logic 140 obtains the identifiers from the execution request r and the tagged access requests, and delivers the read data to the corresponding threads. Alternatively, the second operation logic 140 obtains the identifier of the execution request r and the identifiers recorded in the register used by the first operation logic 120, and delivers the read data to the corresponding threads.
After step S220 completes, the access requests that could not be merged with the execution request r wait, together with access requests subsequently sent by other threads, for the next round of arbitration. That is, after the current execution request r has been served, the arbiter 110 selects another request from the access requests that have not yet been served and that simultaneously request arbitration, and steps S210-S220 are repeated until all access requests have been served.
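The round-by-round behavior described here can be modeled as a loop (an illustrative Python sketch; `min` stands in for real arbitration and is not the patent's arbiter):

```python
def serve_all(requests, memory):
    """Repeat arbitrate/merge/read/broadcast rounds until every
    pending access request has been served."""
    results, pending, rounds = {}, dict(requests), 0
    while pending:
        winner = min(pending)                 # stand-in arbitration
        addr = pending[winner]
        group = [t for t, a in pending.items() if a == addr]
        data = memory[addr]                   # one memory read per round
        for t in group:
            results[t] = data                 # broadcast to merged threads
            del pending[t]
        rounds += 1
    return results, rounds
```

Three requests over two distinct addresses finish in two rounds (two memory reads) instead of three under a serial scheme.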
The request merging method may also be used in an instruction cache set, which includes a plurality of instruction caches, at least one of which employs the request merging method. For example, an instruction cache set may include a plurality of memory blocks (banks). Each memory block that uses the request merging method is configured like the instruction cache 100B, and includes an arbiter 110, first operation logic 120, an instruction memory 130, and second operation logic 140. In one embodiment, only one memory block may be accessed at a time; when multiple access requests enter that memory block simultaneously, steps S205-S220 of the request merging method are performed. In another embodiment, multiple memory blocks may be accessed simultaneously as long as their access addresses do not conflict, i.e., the access addresses of the requests that simultaneously access different memory blocks are different. For example, the access address of the request accessing memory block b01 is 0x0001, that of the request accessing memory block b02 is 0x0002, that of the request accessing memory block b03 is 0x0003, and that of the request accessing memory block b04 is 0x0004. The threads T1-Tn can accordingly be divided into four groups that access memory blocks b01-b04, respectively. When multiple access requests enter memory blocks b01-b04 at the same time, each of the blocks b01-b04 performs steps S205-S220 of the request merging method.
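The bank-grouped variant can be sketched as follows (a Python model with an assumed low-order-bit bank mapping; the patent does not specify how addresses map to memory blocks):

```python
N_BANKS = 4

def bank_of(address):
    # Assumed mapping: low-order address bits select the memory block.
    return address % N_BANKS

def group_by_bank(requests):
    """Split {thread: address} requests into per-bank groups; banks
    whose addresses do not conflict can be accessed in the same cycle,
    each running its own arbitrate-and-merge (steps S205-S220)."""
    groups = {b: {} for b in range(N_BANKS)}
    for thread, addr in requests.items():
        groups[bank_of(addr)][thread] = addr
    return groups

g = group_by_bank({"T1": 0x0001, "T2": 0x0002, "T3": 0x0003, "T4": 0x0004})
assert g[1] == {"T1": 0x0001} and g[0] == {"T4": 0x0004}
```

With the four example addresses, each request lands in a different bank, so all four can be served in parallel, one merge round per bank.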
In summary, the present invention introduces a request merging step between the arbiter and the instruction memory of an instruction cache: the access request of the thread selected by the arbiter is compared with the access requests of the other threads, requests with the same access address are merged, and the data read from memory is then broadcast to all related threads. Multiple requests are thus processed jointly at the same time, which shortens instruction read time, reduces processing delay, and saves access bandwidth of the instruction memory.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A request merging method for an instruction cache, comprising:
selecting one of a plurality of access requests as an execution request via an arbiter, wherein the plurality of access requests are respectively sent by a plurality of threads and request arbitration to the arbiter at the same time;
merging the access request which is not selected by the arbiter and has the same access address with the execution request in the plurality of access requests with the execution request to obtain a merged request;
reading data based on the access address; and
and broadcasting the read data to all threads corresponding to the merging request.
2. The request merging method of claim 1, wherein merging the execution request with an access request of the plurality of access requests that is not selected by the arbiter and has a same access address as the execution request to obtain the merged request comprises:
and recording an identifier corresponding to the merged request, wherein the identifier at least comprises identification information of threads corresponding to access requests with the same access address in the multiple access requests.
3. The request merging method according to claim 2, wherein the step of broadcasting the read data to all threads corresponding to the merged request comprises:
broadcasting the read data to all threads corresponding to the merging request based on the identifier.
4. The request merging method of claim 1, wherein after selecting one of the plurality of access requests as an execution request via an arbiter, further comprising:
comparing the access requests which are not selected by the arbiter among the plurality of access requests with the execution request to find the access request having the same access address as the execution request.
5. An instruction cache, comprising:
an arbiter, configured to select one of a plurality of access requests as an execution request, wherein the plurality of access requests are respectively sent by a plurality of threads and request arbitration from the arbiter at the same time;
first operation logic, configured to merge an access request, which is not selected by the arbiter and has the same access address as the execution request, of the multiple access requests with the execution request to obtain a merged request;
an instruction memory that reads data based on the access address; and
second operation logic, configured to broadcast the read data to all threads corresponding to the merge request.
6. The instruction cache of claim 5 wherein the first operation logic records an identifier corresponding to the merged request, the identifier comprising at least identification information of a thread corresponding to an access request of the plurality of access requests having a same access address.
7. The instruction cache of claim 6 wherein the second operation logic broadcasts the read data to all threads corresponding to the merge request based on the identifier.
8. The instruction cache of claim 5 wherein the first operation logic compares access requests of the plurality of access requests not selected by the arbiter to the execution request to find access requests having the same access address as the execution request.
9. An instruction cache set, comprising:
a plurality of instruction caches;
wherein the plurality of instruction caches comprises at least one instruction cache according to any of claims 5 to 8.
10. The instruction cache set of claim 9 wherein access addresses of access requests to access at least two of the plurality of instruction caches simultaneously are different.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111162371.6A CN113867801A (en) | 2021-09-30 | 2021-09-30 | Instruction cache, instruction cache group and request merging method thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113867801A | 2021-12-31 |
Family
ID=79001355
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111162371.6A Pending CN113867801A (en) | 2021-09-30 | 2021-09-30 | Instruction cache, instruction cache group and request merging method thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113867801A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115285139A (en) * | 2022-08-04 | 2022-11-04 | 中国第一汽车股份有限公司 | Intelligent cabin control method, device, equipment and medium |
CN115269455A (en) * | 2022-09-30 | 2022-11-01 | 湖南兴天电子科技股份有限公司 | Disk data read-write control method and device based on FPGA and storage terminal |
CN115269455B (en) * | 2022-09-30 | 2022-12-23 | 湖南兴天电子科技股份有限公司 | Disk data read-write control method and device based on FPGA and storage terminal |
CN117742793A (en) * | 2023-11-01 | 2024-03-22 | 上海合芯数字科技有限公司 | Instruction merging circuit, method and chip for data cache instruction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | ||

Change of applicant information:
- Country or region after: China
- Address after: 201100 room 1302, 13/F, building 16, No. 2388, Chenhang highway, Minhang District, Shanghai
- Applicant after: Shanghai Bi Ren Technology Co.,Ltd.
- Address before: 201100 room 1302, 13/F, building 16, No. 2388, Chenhang highway, Minhang District, Shanghai
- Applicant before: Shanghai Bilin Intelligent Technology Co.,Ltd.
- Country or region before: China