CN113867801A - Instruction cache, instruction cache group and request merging method thereof - Google Patents


Info

Publication number
CN113867801A
CN113867801A (application number CN202111162371.6A)
Authority
CN
China
Prior art keywords
request
access
access requests
arbiter
instruction cache
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111162371.6A
Other languages
Chinese (zh)
Inventor
Not disclosed (不公告发明人)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Biren Intelligent Technology Co Ltd
Original Assignee
Shanghai Biren Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Biren Intelligent Technology Co Ltd filed Critical Shanghai Biren Intelligent Technology Co Ltd
Priority to CN202111162371.6A priority Critical patent/CN113867801A/en
Publication of CN113867801A publication Critical patent/CN113867801A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3802 Instruction prefetching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3853 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution of compound instructions

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention provides an instruction cache, an instruction cache set, and a request merging method thereof. An arbiter selects one of a plurality of access requests as an execution request, wherein the access requests are sent by a plurality of threads and request arbitration from the arbiter at the same time; the access requests that were not selected by the arbiter but have the same access address as the execution request are merged with the execution request to obtain a merged request; data is read based on the access address; and the read data is broadcast to all threads corresponding to the merged request.

Description

Instruction cache, instruction cache group and request merging method thereof
Technical Field
The invention relates to an instruction cache, an instruction cache group and a request merging method thereof.
Background
Generally, in a general-purpose graphics processing unit (GPGPU), an instruction cache is shared by multiple threads, and multiple threads may issue instruction fetch requests to the instruction cache simultaneously. After arbitration, the instruction cache selects one thread's request and sends it to the instruction memory to read out the corresponding instruction data. Consequently, when multiple threads read the same instruction at the same time, that instruction is read multiple times, because the instruction cache can only process one thread's request at a time and the requests are therefore serialized. This increases the read latency of instructions and wastes the data bandwidth of the instruction memory.
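The bandwidth cost of this serial behavior can be illustrated with a toy model (a sketch with made-up names; `serial_reads` and `merged_reads` are not from the patent):

```python
def serial_reads(thread_addresses):
    # Without merging: one instruction-memory read per thread request,
    # even when every thread asks for the same address.
    return len(thread_addresses)

def merged_reads(thread_addresses):
    # With merging: one read per distinct address requested in the same cycle.
    return len(set(thread_addresses))

addrs = [0x00A] * 8           # 8 threads fetching the same instruction
print(serial_reads(addrs))    # 8: the word is read eight times, serially
print(merged_reads(addrs))    # 1: one read, broadcast to all eight threads
```

The gap between the two counts is exactly the wasted instruction-memory bandwidth the patent targets.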
Disclosure of Invention
The invention is directed to an instruction cache, an instruction cache set, and a request merging method thereof, which can process multiple simultaneous requests as one merged request, reducing processing latency and the access bandwidth of the instruction memory.
The request merging method for an instruction cache of the invention comprises the following steps: selecting one of a plurality of access requests as an execution request via an arbiter, wherein the plurality of access requests are respectively sent by a plurality of threads and request arbitration from the arbiter at the same time; merging those of the plurality of access requests that are not selected by the arbiter and have the same access address as the execution request with the execution request to obtain a merged request; reading data based on the access address; and broadcasting the read data to all threads corresponding to the merged request.
According to an embodiment of the present invention, merging an access request that is not selected by the arbiter and has the same access address as the execution request with the execution request to obtain the merged request includes: recording an identifier corresponding to the merged request, wherein the identifier at least comprises identification information of the threads corresponding to the access requests with the same access address among the plurality of access requests.
According to an embodiment of the present invention, the step of broadcasting the read data to all threads corresponding to the merged request includes: broadcasting the read data to all threads corresponding to the merged request based on the identifier.
According to an embodiment of the present invention, after selecting one of the plurality of access requests as the execution request via the arbiter, the method further comprises: comparing the access requests not selected by the arbiter among the plurality of access requests with the execution request to find the access requests having the same access address as the execution request.
The instruction cache of the invention comprises: an arbiter, configured to select one of a plurality of access requests as an execution request, wherein the plurality of access requests are respectively sent by a plurality of threads and request arbitration from the arbiter at the same time; first operation logic, configured to merge those of the plurality of access requests that are not selected by the arbiter and have the same access address as the execution request with the execution request to obtain a merged request; an instruction memory that reads data based on the access address; and second operation logic, configured to broadcast the read data to all threads corresponding to the merged request.
In the instruction cache according to an embodiment of the present invention, the first operation logic records an identifier corresponding to the merged request, where the identifier includes at least identification information of the threads corresponding to the access requests with the same access address among the plurality of access requests.
In the instruction cache according to an embodiment of the present invention, the second operation logic broadcasts the read data to all threads corresponding to the merged request based on the identifier.
In the instruction cache according to an embodiment of the present invention, the first operation logic compares the access requests not selected by the arbiter among the plurality of access requests with the execution request to find the access requests having the same access address as the execution request.
The instruction cache set of the present invention comprises a plurality of instruction caches, at least one of which is the instruction cache described above.
In the instruction cache set according to an embodiment of the invention, the access addresses of access requests that simultaneously access at least two of the plurality of instruction caches are different.
Based on the above, the present invention introduces a request merging step between the arbiter and the instruction memory of an instruction cache, merging access requests that have the same access address at the same time. This shortens the instruction read time and saves access bandwidth of the instruction memory.
Drawings
FIG. 1 is a block diagram of an instruction access apparatus according to an embodiment of the present invention.
FIG. 2 is a flow diagram of the request merging method according to an embodiment of the invention.
Description of the reference numerals
100: instruction access apparatus
100A: execution unit
100B: instruction cache
110: arbiter
120: first operation logic
130: instruction memory
140: second operation logic
T1 to Tn: threads
r1 to rn: access requests
r: execution request
S205 to S220: steps of the request merging method
Detailed Description
Reference will now be made in detail to exemplary embodiments of the invention, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings and the description to refer to the same or like parts.
FIG. 1 is a block diagram of an instruction access apparatus according to an embodiment of the present invention. Referring to FIG. 1, the instruction access apparatus 100 includes an execution unit (EU) 100A and an instruction cache 100B. In some examples, the execution unit 100A is, for example, part of a central processing unit (CPU) and performs the operations and calculations directed by a computer program. In the present embodiment, the execution unit 100A issues a plurality of threads T1 to Tn. In other embodiments, the threads T1 to Tn may be issued by a plurality of different EUs, and one EU may issue one thread or several threads simultaneously.
It should be understood that the threads T1 to Tn may also be issued by other devices; the invention does not limit the manner in which the threads T1 to Tn are generated.
The instruction cache 100B is, for example, a cache. A cache exists within each processor, and the processor does not interact directly with memory when performing read and write operations, but rather goes through the cache. In this embodiment, the processor is a general-purpose graphics processing unit (GPGPU). The instruction cache 100B includes an arbiter 110, first operation logic 120, an instruction memory 130, and second operation logic 140.
The steps of the request merging method are further described below with reference to the instruction access apparatus 100.
FIG. 2 is a flow diagram of the request merging method according to an embodiment of the invention. Referring to FIG. 1 and FIG. 2, in step S205, one of the access requests r1 to rn is selected as the execution request r by the arbiter 110. Here, the access requests r1 to rn are sent by the threads T1 to Tn, respectively, and request arbitration from the arbiter 110 at the same time. That is, when the threads T1 to Tn enter the instruction cache 100B and request arbitration from the arbiter 110 simultaneously, the arbiter 110 first selects one of them as the subsequent execution request r.
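Step S205 can be sketched as a simple arbiter model. The fixed-priority policy below is an assumption made for illustration; the patent does not specify how the arbiter 110 chooses among simultaneous requests:

```python
def arbitrate(pending):
    """pending maps thread_id -> access address; returns the granted
    (thread_id, address) pair. Policy (assumed): lowest thread index wins."""
    winner = min(pending)
    return winner, pending[winner]

pending = {1: 0x00A, 3: 0x00A, 5: 0x00A}   # r1, r3, r5 arrive in the same cycle
print(arbitrate(pending))                  # (1, 10): r1 becomes the execution request r
```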
In step S210, the first operation logic 120 merges those of the access requests r1 to rn that were not selected by the arbiter 110 and have the same access address as the execution request r with the execution request r to obtain a merged request. For example, the first operation logic 120 compares the unselected access requests with the execution request r to find the access requests having the same access address as the execution request r.
Here, each access request includes an identifier (or identification information) of its corresponding thread; that is, the identifier indicates which thread sent the access request. The first operation logic 120 may obtain the merged request by adding to the execution request r the identifier of each access request that has the same access address as the execution request r. In other words, the merged request is the execution request r plus the identifiers of the one or more access requests sharing its access address, and the identifier of the merged request comprises the identifier of the execution request r together with those identifiers.
In other embodiments, the first operation logic 120 may instead add a tag to each access request with the same access address as the execution request r, so that the tagged access requests and the execution request r together serve as the merged request for subsequent use by the second operation logic 140.
In still other embodiments, the first operation logic 120 may be used with a register, recording into the register the identifier of each access request with the same access address as the execution request r, so that the register contents and the execution request r together serve as the merged request for subsequent use by the second operation logic 140.
For example, assume that the arbiter 110 selects the access request r1, whose access address is 0x00A, as the execution request r. The first operation logic 120 compares the unselected access requests r2 to rn with the execution request r to find those with access address 0x00A. Assuming the first operation logic 120 finds that the access addresses of the access requests r3 and r5 are also 0x00A, the access requests r3 and r5 are merged with the execution request r (access request r1) to obtain the merged request. For example, the identifiers of the threads T3 and T5 corresponding to the access requests r3 and r5 are added to the execution request r; alternatively, a tag is added to the access requests r3 and r5 for identification by the second operation logic 140; alternatively, the identifiers of the threads T3 and T5 in the access requests r3 and r5 are recorded in the register.
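The comparison and merge in this example can be sketched as follows, using the identifier-in-request variant. All names (`Request`, `merge_with_execution`) are illustrative, not from the patent:

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    thread_id: int
    address: int
    merged_ids: list = field(default_factory=list)  # identifiers added on merge

def merge_with_execution(execution, others):
    # Compare each unselected request's address with the execution request's;
    # matching requests contribute only their thread identifier.
    execution.merged_ids = [execution.thread_id] + [
        o.thread_id for o in others if o.address == execution.address
    ]
    leftovers = [o for o in others if o.address != execution.address]
    return execution, leftovers   # leftovers wait for the next arbitration round

r1 = Request(1, 0x00A)                    # selected as execution request r
others = [Request(2, 0x014), Request(3, 0x00A), Request(5, 0x00A)]
merged, leftovers = merge_with_execution(r1, others)
print(merged.merged_ids)                  # [1, 3, 5]
print([o.thread_id for o in leftovers])   # [2]
```

The merged request carries the address 0x00A once, plus the identifiers of threads T1, T3, and T5.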
Next, in step S215, the instruction memory 130 reads data based on the access address. Specifically, after the first operation logic 120 completes the request merging, the merged request is sent to the instruction memory 130, and the instruction memory 130 reads data based on the access address of the merged request.
Thereafter, in step S220, the second operation logic 140 broadcasts the read data to all threads corresponding to the merged request. For example, the second operation logic 140 passes the read data to the corresponding threads based on all identifiers included in the merged request. Alternatively, the second operation logic 140 obtains the identifiers from the execution request r and the tagged access requests and transmits the read data to the corresponding threads. Alternatively, the second operation logic 140 obtains the identifier of the execution request r and the identifiers recorded in the register used with the first operation logic 120, and transmits the read data to the corresponding threads.
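Steps S215 and S220 then amount to a single read followed by a broadcast keyed on the merged identifiers. A minimal sketch, with a toy `instruction_memory` dictionary standing in for the real instruction memory 130:

```python
instruction_memory = {0x00A: b"\x90\x90\x90\x90"}   # toy address -> instruction word

def read_and_broadcast(address, merged_ids, memory):
    data = memory[address]                    # S215: one read serves the whole merge
    return {tid: data for tid in merged_ids}  # S220: broadcast to each merged thread

delivered = read_and_broadcast(0x00A, [1, 3, 5], instruction_memory)
print(sorted(delivered))                      # [1, 3, 5]: every thread got the word
```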
After step S220 is completed, the access requests that could not be merged with the execution request r wait, together with subsequent access requests sent by other threads, for the next round of arbitration. That is, after the current execution request r completes, the arbiter 110 selects another request from the access requests that have not yet been served and that simultaneously request arbitration, and steps S210 to S220 are executed again, and so on until all access requests have been served.
The request merging method may also be used in an instruction cache set. The instruction cache set includes a plurality of instruction caches, at least one of which employs the request merging method. For example, an instruction cache set may include a plurality of memory blocks (banks). Each memory block using the request merging method is configured like the instruction cache 100B and includes an arbiter 110, first operation logic 120, an instruction memory 130, and second operation logic 140. In one embodiment, only one memory block may be accessed at a time; when multiple access requests enter a memory block simultaneously, steps S205 to S220 of the request merging method are performed. In another embodiment, multiple memory blocks may be accessed simultaneously as long as their access addresses do not conflict; that is, the access addresses of access requests that simultaneously access different memory blocks are different. For example, the access address of the request accessing memory block b01 is 0x0001, that of the request accessing memory block b02 is 0x0002, that of the request accessing memory block b03 is 0x0003, and that of the request accessing memory block b04 is 0x0004. Accordingly, the threads T1 to Tn can be divided into four groups to access the memory blocks b01 to b04, respectively. If multiple access requests enter the memory blocks b01 to b04 at the same time, each of the memory blocks b01 to b04 executes steps S205 to S220 of the request merging method.
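The banked cache set can be sketched as routing each request to a memory block by address and merging within each block. The bank-selection function below (low address bits) is an assumption; the patent does not fix one:

```python
from collections import defaultdict

NUM_BANKS = 4

def bank_of(address):
    return address % NUM_BANKS          # assumed bank-selection function

def route_and_merge(requests):
    """requests: list of (thread_id, address). Returns per-bank merge tables:
    bank -> address -> list of thread identifiers sharing that address."""
    banks = defaultdict(lambda: defaultdict(list))
    for tid, addr in requests:
        banks[bank_of(addr)][addr].append(tid)   # same address, same bank: merge
    return banks

reqs = [(1, 0x0001), (2, 0x0002), (3, 0x0001), (4, 0x0003)]
banks = route_and_merge(reqs)
print(sorted(banks[1][0x0001]))   # [1, 3]: threads T1 and T3 merged in bank 1
```

Requests to different banks proceed in parallel; within each bank, each address list is one merged request served by steps S205 to S220.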
In summary, the present invention introduces a request merging step between the arbiter and the instruction memory of an instruction cache: the access request of the thread selected by the arbiter is compared with the access requests of the other threads, access requests with the same access address are merged, and the read data is then broadcast to all related threads. In this way, multiple requests can be processed together at the same time, shortening the instruction read time, reducing processing latency, and saving access bandwidth of the instruction memory.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A request merging method for an instruction cache, comprising:
selecting one of a plurality of access requests as an execution request via an arbiter, wherein the plurality of access requests are respectively sent by a plurality of threads and request arbitration from the arbiter at the same time;
merging those of the plurality of access requests that are not selected by the arbiter and have the same access address as the execution request with the execution request to obtain a merged request;
reading data based on the access address; and
broadcasting the read data to all threads corresponding to the merged request.
2. The request merging method of claim 1, wherein merging the execution request with the access requests of the plurality of access requests that are not selected by the arbiter and have the same access address as the execution request to obtain the merged request comprises:
recording an identifier corresponding to the merged request, wherein the identifier at least comprises identification information of the threads corresponding to the access requests with the same access address among the plurality of access requests.
3. The request merging method according to claim 2, wherein the step of broadcasting the read data to all threads corresponding to the merged request comprises:
broadcasting the read data to all threads corresponding to the merged request based on the identifier.
4. The request merging method of claim 1, further comprising, after selecting one of the plurality of access requests as the execution request via the arbiter:
comparing the access requests not selected by the arbiter among the plurality of access requests with the execution request to find the access requests having the same access address as the execution request.
5. An instruction cache, comprising:
an arbiter, configured to select one of a plurality of access requests as an execution request, wherein the plurality of access requests are respectively sent by a plurality of threads and request arbitration from the arbiter at the same time;
first operation logic, configured to merge those of the plurality of access requests that are not selected by the arbiter and have the same access address as the execution request with the execution request to obtain a merged request;
an instruction memory that reads data based on the access address; and
second operation logic, configured to broadcast the read data to all threads corresponding to the merged request.
6. The instruction cache of claim 5, wherein the first operation logic records an identifier corresponding to the merged request, the identifier comprising at least identification information of the threads corresponding to the access requests of the plurality of access requests having the same access address.
7. The instruction cache of claim 6, wherein the second operation logic broadcasts the read data to all threads corresponding to the merged request based on the identifier.
8. The instruction cache of claim 5 wherein the first operation logic compares access requests of the plurality of access requests not selected by the arbiter to the execution request to find access requests having the same access address as the execution request.
9. An instruction cache set, comprising:
a plurality of instruction caches;
wherein at least one of the plurality of instruction caches is an instruction cache according to any one of claims 5 to 8.
10. The instruction cache set of claim 9 wherein access addresses of access requests to access at least two of the plurality of instruction caches simultaneously are different.
CN202111162371.6A, priority date 2021-09-30, filing date 2021-09-30: Instruction cache, instruction cache group and request merging method thereof (pending, published as CN113867801A)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111162371.6A CN113867801A (en) 2021-09-30 2021-09-30 Instruction cache, instruction cache group and request merging method thereof


Publications (1)

Publication Number Publication Date
CN113867801A 2021-12-31

Family

ID=79001355

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111162371.6A Pending CN113867801A (en) 2021-09-30 2021-09-30 Instruction cache, instruction cache group and request merging method thereof

Country Status (1)

Country Link
CN (1) CN113867801A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115269455A (en) * 2022-09-30 2022-11-01 湖南兴天电子科技股份有限公司 Disk data read-write control method and device based on FPGA and storage terminal
CN115269455B (en) * 2022-09-30 2022-12-23 湖南兴天电子科技股份有限公司 Disk data read-write control method and device based on FPGA and storage terminal


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
Country or region after: China
Address after: 201100 room 1302, 13 / F, building 16, No. 2388, Chenhang highway, Minhang District, Shanghai
Applicant after: Shanghai Bi Ren Technology Co.,Ltd.
Address before: 201100 room 1302, 13 / F, building 16, No. 2388, Chenhang highway, Minhang District, Shanghai
Applicant before: Shanghai Bilin Intelligent Technology Co.,Ltd.
Country or region before: China