CN111984387B - Method and processor for scheduling instructions in issue queues - Google Patents

Method and processor for scheduling instructions in issue queues

Info

Publication number
CN111984387B
Authority
CN
China
Prior art keywords
instructions
issue queue
instruction
issue
segment
Prior art date
Legal status
Active
Application number
CN202010869614.9A
Other languages
Chinese (zh)
Other versions
CN111984387A (en)
Inventor
张康康
王健斌
Current Assignee
Shanghai Zhaoxin Semiconductor Co Ltd
Original Assignee
Shanghai Zhaoxin Semiconductor Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Zhaoxin Semiconductor Co Ltd filed Critical Shanghai Zhaoxin Semiconductor Co Ltd
Priority to CN202010869614.9A priority Critical patent/CN111984387B/en
Publication of CN111984387A publication Critical patent/CN111984387A/en
Application granted granted Critical
Publication of CN111984387B publication Critical patent/CN111984387B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/546 Message passing systems or structures, e.g. queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/54 Indexing scheme relating to G06F9/54
    • G06F2209/548 Queue

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

The invention provides a method and a processor for scheduling instructions in an issue queue. The method comprises the following steps: selecting, by selection logic, at most a first number of instructions from a corresponding issue queue segment; determining, by a filter and according to feedback data fed back by an arbiter, whether any instruction issued in the previous clock cycle duplicates a selected instruction; when an issued instruction duplicates a selected instruction, discarding the duplicate instructions and retaining at most a second number of remaining instructions in a buffer; and determining, by the arbiter, an instruction issue number for the issue queue segment, and selecting instructions from the remaining instructions in the buffer for output and issue according to that number. The limitations imposed by circuit speed and issue queue size are thereby greatly relaxed, so the instruction issue window can be enlarged to improve the out-of-order execution efficiency of the processor.

Description

Method and processor for scheduling instructions in issue queues
Technical Field
The present invention relates to a method and processor for scheduling instructions in an issue queue, and more particularly to a method and processor for scheduling instructions in an issue queue of a superscalar processor.
Background
The instruction issue window is a key factor in achieving high performance in modern superscalar processors, and one major determinant of its size is the size of the issue queue.
Two types of issue queue are currently common: the conventional issue queue and the cyclic segmented issue queue. Fig. 1 shows a conventional issue queue. As shown, instructions from the rename unit enter the issue queue at arbitrary entries, so the issue queue entries carry no program order; the order (or age) is kept in a separate sequence or matrix. Such issue queues achieve high utilization, but the scheduler has low flexibility.
FIG. 2 shows a cyclic segmented issue queue. As shown, instructions from the rename unit enter the segmented issue queues seg0 through seg(S-1) in sequence, so the entries of the issue queue are cyclically ordered by program order. Such an issue queue greatly simplifies the scheduler.
Both structures require a single-clock-cycle request-and-acknowledge loop. This single-cycle confirmation loop is typically the critical path of the scheduler and directly limits the circuit speed and the size of the issue queue, thereby limiting the size of the instruction issue window and degrading the processor's out-of-order execution performance. The constraint arises because an instruction must issue and clear its ready state within the same clock cycle; otherwise the instruction may be issued twice, causing an error.
Accordingly, there is a need for a method and processor for scheduling instructions in an issue queue that ameliorates the above-described problems.
Disclosure of Invention
The following disclosure is illustrative only and is not intended to be limiting in any way. In addition to the illustrative aspects, embodiments, and features described here, further aspects, embodiments, and features will become apparent by reference to the drawings and the following detailed description. That is, the following disclosure is provided to introduce the concepts, benefits, and novel and non-obvious technical advantages of the present invention. Selected, but not all, embodiments are described in further detail below. Thus, the following disclosure is not intended to be exhaustive or to limit the scope of the claimed subject matter to the precise form disclosed.
It is therefore a primary object of the present invention to provide a method and processor for scheduling instructions in an issue queue, which ameliorates the above-mentioned disadvantages.
The invention provides a method for scheduling instructions in an issue queue, comprising the following steps: selecting, by selection logic, at most a first number of instructions from a corresponding issue queue segment; determining, by a filter and according to feedback data fed back by an arbiter, whether any instruction issued in the previous clock cycle duplicates a selected instruction; when an issued instruction duplicates a selected instruction, discarding the duplicate instructions and retaining at most a second number of remaining instructions in a buffer; and determining, by the arbiter, an instruction issue number for the issue queue segment, and selecting instructions from the remaining instructions in the buffer for output and issue according to that number.
In some embodiments, the first number is twice the second number.
In some embodiments, the feedback data includes an address or tag of each instruction issued in the previous clock cycle.
In some embodiments, when the number of instructions remaining after the filter discards the duplicate instructions exceeds the second number, the filter retains the second number of oldest remaining instructions as the remaining instructions.
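To make the above steps concrete, the following Python sketch walks through them for a single issue queue segment in one clock cycle, assuming the first number is twice the second number and that the oldest surviving instructions are retained. It is an illustration only: the names Instr and schedule_segment_cycle, the age-ordered selection, and the use of tag sets for the feedback data are assumptions of the example, not part of the claimed circuit.

from collections import namedtuple
from typing import List, Set

# One issue queue entry: smaller age = older in program order; tag identifies the entry.
Instr = namedtuple("Instr", ["age", "tag"])

def schedule_segment_cycle(ready: List[Instr],
                           feedback_tags: Set[int],
                           issue_count: int,
                           w: int) -> List[Instr]:
    # Step 1: selection logic picks at most 2*w (the "first number") oldest ready instructions.
    selected = sorted(ready, key=lambda i: i.age)[:2 * w]
    # Step 2: the filter drops any instruction the arbiter reports as issued last cycle.
    survivors = [i for i in selected if i.tag not in feedback_tags]
    # Step 3: at most w (the "second number") oldest survivors are kept in the buffer.
    buffered = survivors[:w]
    # Step 4: the arbiter outputs up to its granted issue count from the buffer.
    return buffered[:issue_count]

# Example: w = 2, four ready entries, and the entry with tag 7 was issued last cycle.
ready = [Instr(age=3, tag=9), Instr(age=0, tag=7), Instr(age=1, tag=8), Instr(age=2, tag=5)]
print(schedule_segment_cycle(ready, feedback_tags={7}, issue_count=2, w=2))
# -> [Instr(age=1, tag=8), Instr(age=2, tag=5)]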
The present invention also provides a processor for scheduling instructions in an issue queue, the processor comprising: selection logic for selecting at most a first number of instructions from a corresponding issue queue segment; a filter, coupled to the selection logic, for determining according to feedback data fed back by an arbiter whether any instruction issued in the previous clock cycle duplicates a selected instruction, wherein when an issued instruction duplicates a selected instruction, the filter discards the duplicate instructions and retains at most a second number of remaining instructions; a buffer, coupled to the filter, for receiving and buffering the remaining instructions transmitted by the filter; and the arbiter, coupled to the buffer, for determining an instruction issue number for the issue queue segment and selecting instructions from the remaining instructions in the buffer for output and issue according to that number.
The invention thus greatly relaxes the limitations imposed by circuit speed and issue queue size, so the instruction issue window can be enlarged to improve the out-of-order execution efficiency of the processor.
Drawings
Fig. 1 shows a conventional issue queue.
FIG. 2 shows a circular segment issue queue.
FIG. 3 is a block diagram of an example computing system whose processor can incorporate embodiments of the invention.
Fig. 4 shows a schematic diagram of a processor according to an embodiment of the invention.
FIG. 5 shows a detailed architecture, according to an embodiment of the invention, for scheduling instructions in a corresponding issue queue segment, which may implement the processor depicted in FIG. 4.
FIG. 6 depicts an exemplary computer-controlled method flow for scheduling instructions in an issue queue, according to an embodiment of the present invention.
Detailed Description
Embodiments of the invention provide a method and a processor for scheduling instructions in an issue queue, in which a scheduling structure is added for the cyclic segmented issue queue. This addresses the problem that the size of the instruction issue window is limited by the single-clock-cycle confirmation loop, and the problem that the cyclic segmented issue queue structure may issue an instruction twice and cause an error.
Reference will now be made in detail to the embodiments of the present invention, examples of which are illustrated in the accompanying drawings. While the invention will be described in conjunction with these embodiments, it will be understood that they are not intended to limit the invention to these embodiments. On the contrary, the invention is intended to cover alternatives, modifications and equivalents, which may be included within the spirit and scope of the invention as defined by the appended claims. Furthermore, in the following detailed description of embodiments of the invention, numerous specific details are set forth in order to provide a thorough understanding of the invention. However, it will be recognized by one skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the embodiments of the present invention.
Symbols and terms:
The detailed description that follows is presented in terms of procedures, logic blocks, processing, and other symbolic representations of operations on data bits within a computer memory. These descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. In the present invention, procedures, logic blocks, processes, and so on, are conceived to be a self-consistent sequence of steps or instructions leading to a desired result. The steps are those utilizing physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated in a computer system. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as transactions, bits, values, elements, symbols, characters, samples, pixels, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout the present invention, discussions utilizing terms such as "selecting", "assigning", "determining", "storing", "sending", "deciding", "verifying", or the like, refer to the actions and processes of a computer system (e.g., the method flowchart 600 of FIG. 6), or similar electronic computing device or processor (e.g., the system 300 of FIG. 3). A computer system or similar electronic computing device operates on and converts data represented in physical (electronic) quantities within a computer system memory, registers, or other such information storage, transmission, or display devices.
The described embodiments of the invention may be generally discussed in terms of computer-executable instructions residing on some form of computer-readable storage medium, such as program modules, being executed by one or more computers or other devices. By way of example, and not limitation, computer readable storage media may comprise non-transitory computer readable storage media and communication media; non-transitory computer-readable media include all computer-readable media except transitory propagating signals. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The functionality of the program modules may be combined or distributed as desired in various embodiments.
Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical disk storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. The computer storage medium itself contains no signals.
Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term "modulated data signal" refers to a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF), infrared and other wireless media. Combinations of the above should also be included within the scope of computer-readable media.
FIG. 3 is a block diagram of an example of a computing system 300 that can incorporate a processor 304 of an embodiment of the invention. Computing system 300 broadly represents any single-processor or multi-processor computing device or system capable of executing computer-readable instructions. Examples of computing system 300 include, but are not limited to, a workstation, a laptop, a client terminal, a server, a distributed computing system, a handheld device, or any other computing system or device. In its most basic configuration, computing system 300 may include at least one processor 304 and system memory 306 in accordance with an embodiment of the present invention.
Processor 304 encompasses embodiments of the present invention and generally represents any type or form of processing unit capable of processing data or interpreting and executing instructions. In some embodiments, the processor 304 may receive instructions from a software application or module. These instructions may cause processor 304 to implement the functions of one or more of the exemplary embodiments described and/or illustrated herein. In one embodiment, the processor 304 may be a superscalar (Superscalar) processor. In various embodiments, processor 304 may include multiple processors that operate in parallel.
The system memory 306 generally represents any type or form of volatile or non-volatile storage device or medium capable of storing data and/or other computer-readable instructions. Examples of system memory 306 include, but are not limited to, RAM, ROM, flash memory, or any other suitable memory device. Although not required, in some embodiments computing system 300 may include both volatile memory units (e.g., system memory 306) and non-volatile storage (e.g., main storage 322).
Computing system 300 may include one or more components or elements in addition to processor 304 and system memory 306. For example, in the embodiment of FIG. 3, computing system 300 includes a memory controller 308, an input/output (I/O) controller 310, and a communication interface 312, each of which may be interconnected via a communication infrastructure 302. Communication infrastructure 302 generally represents any type or form of infrastructure capable of facilitating communication between one or more elements in a computing device. Examples of communication infrastructure 302 include, but are not limited to, communication buses such as Industry Standard Architecture (ISA), Peripheral Component Interconnect (PCI), PCI Express (PCIe), or similar buses, and networks.
Memory controller 308 generally represents any type or form of device capable of processing memory or data or controlling communication between one or more elements of computing system 300. For example, the memory controller 308 may control communications between the processor 304, the system memory 306, and the I/O controller 310 via the communication infrastructure 302.
I/O controller 310 generally represents any type or form of module capable of coordinating and/or controlling the input-output functions of a computing device. For example, the I/O controller 310 may control or facilitate transfer of data between one or more elements of the computing system 300, such as the processor 304, the system memory 306, the communication interface 312, the display card 316, the input interface 320, and the storage interface 324.
Communication interface 312 broadly represents any type or form of communication device or adapter capable of facilitating communication between exemplary computing system 300 and one or more additional devices. For example, the communication interface 312 may facilitate communication between the computing system 300 and a private or public network that includes additional computing systems. Examples of communication interface 312 include, but are not limited to, a wired network interface (such as a network interface card), a wireless network interface (such as a wireless network interface card), a modem, and any other suitable interface. In one embodiment, the communication interface 312 provides a direct connection to a remote server via a direct link to a network such as the Internet. Communication interface 312 may also indirectly provide such a connection through any other suitable connection.
Communication interface 312 may also represent a host interface card configured to facilitate communication between computing system 300 and one or more additional networks or storage devices via an external bus or communication channel. Examples of host interface cards include, but are not limited to, Small Computer System Interface (SCSI) host interface cards, Universal Serial Bus (USB) host interface cards, IEEE 1394 host interface cards, Serial Advanced Technology Attachment (SATA) and external SATA (eSATA) host adapters, Advanced Technology Attachment (ATA) and Parallel ATA (PATA) host interface cards, Fibre Channel interface adapters, Ethernet adapters, and the like. The communication interface 312 may also allow the computing system 300 to participate in distributed or remote computing. For example, the communication interface 312 may receive instructions from, or send instructions to, a remote device for execution.
As shown in fig. 3, computing system 300 may also include at least one display device 314 coupled to communication infrastructure 302 via a display card 316. Display 314 generally represents any type or form of device capable of visually displaying information forwarded by display adapter 316. Similarly, the display card 316 generally represents any type or form of device configured to forward graphics, text, and other data for display on the display device 314.
As shown in fig. 3, computing system 300 may also include at least one input device 318 coupled to communication infrastructure 302 via an input interface 320. Input device 318 is generally representative of any type or form of input device capable of providing computer-generated or manually-generated input to computing system 300. Examples of input devices 318 include, but are not limited to, a keyboard, a pointing device, a voice recognition device, or any other input device.
As shown in fig. 3, computing system 300 may also include a primary storage 322 and an optional backup storage 323 coupled to communication infrastructure 302 via a storage interface 324. Storage devices 322 and 323 generally represent any type or form of storage device or medium capable of storing data and/or other computer readable instructions. Storage devices 322 and 323 may be, for example, magnetic disk drives (e.g., so-called hard disk drives), floppy disk drives, tape drives, optical disk drives, flash drives, and the like. Storage interface 324 generally represents any type or form of interface or device for transferring data between storage devices 322 and 323 and other elements of computing system 300.
In one example, database 330 may be stored within main storage 322. Database 330 may represent a single database or portion of a computing device, or it may represent multiple databases or computing devices. For example, database 330 may represent (be stored in) a portion of computing system 300. Alternatively, database 330 may represent (be stored in) one or more physically separate devices that are accessible by a computing device, such as computing system 300.
With continued reference to fig. 3, the storage devices 322 and 323 may be configured to read from and/or write to removable storage device units configured to store computer software, data, or other computer readable information. Examples of suitable removable storage units include, but are not limited to, floppy disks, tape cassettes, optical disks, flash memory devices, and the like. Storage devices 322 and 323 may also include other similar structures or devices to allow computer software, data, or other computer readable instructions to be loaded into computing system 300. For example, storage devices 322 and 323 may be configured to read and write software, data, or other computer readable information. Storage devices 322 and 323 may also be part of computing system 300 or may be separate devices accessed through other interface systems.
Many other devices or subsystems may be connected to computing system 300. Conversely, not all of the components and devices shown in FIG. 3 need be present to practice the embodiments of the present invention. The devices and subsystems referenced above may also be interconnected in different ways from that shown in FIG. 3. Computing system 300 may also employ any number of software, firmware, and/or hardware configurations. For example, exemplary embodiments of the invention may be encoded as a computer program (also known as computer software, software applications, computer-readable instructions, or computer control logic) on a computer-readable medium.
The computer readable medium containing the computer program may be loaded into computing system 300. All or a portion of the computer program stored on the computer readable medium may then be stored in system memory 306 and/or portions of storage devices 322 and 323. The computer programs loaded into computing system 300 when executed by processor 304 may enable processor 304 to implement and/or be a means for implementing the functions of the exemplary embodiments described and/or illustrated herein. Additionally or alternatively, the exemplary embodiments described and/or illustrated herein may be implemented in firmware and/or hardware.
Fig. 4 shows an architecture diagram of a processor 400 according to an embodiment of the invention. The exemplary processor 400 includes an issue queue divided into S segments, and a scheduler 410, a buffer 430, and an arbiter 440 for each segment. In one embodiment, the issue queue may be a cyclic segmented issue queue. For ease of illustration, fig. 4 shows only the scheduler 410, buffer 430, and arbiter 440 corresponding to the i-th segment. An instruction takes one clock cycle of execution time to travel from the issue queue to the buffer 430, and another clock cycle to travel from the buffer 430 to the output of the arbiter 440. In one embodiment, the issue queue receives instructions from a rename unit (not shown), and the instructions output by the arbiters 440 of all segments are merged and issued to corresponding execution units (not shown; e.g., floating point units, etc.).
The scheduler 410 in turn includes selection logic 412 and a filter 414. The selection logic 412 requests at most a first number of instructions to be selected from the entries of the corresponding issue queue segment (e.g., seg0, seg1, ..., seg(S-1)). The selection logic 412 may be implemented, for example, as a binary tree structure, although the invention is not limited in this regard. The filter 414 is coupled to the selection logic 412 and receives the at most first-number instructions together with feedback data fed back by the arbiter 440. Based on the feedback data, the filter 414 determines whether any instruction issued in the previous clock cycle (from the filter 414's point of view) duplicates any of the at most first-number instructions selected in the present clock cycle. If so, the filter 414 discards the duplicate instructions and retains at most a second number of the remaining instructions. In one embodiment, if more than the second number of instructions remain after the duplicates are discarded, the filter 414 keeps the second number of oldest survivors as the remaining instructions, although the invention is not limited in this respect. In one embodiment, the first number is twice the second number, but the invention is not limited thereto; the ratio between the first number and the second number can be chosen according to different requirements.
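A minimal sketch of the filter's behavior, assuming instructions carry an age and a tag and the feedback data is a set of tags; the helper name filter_remaining is illustrative, not part of the design.

def filter_remaining(selected, feedback_tags, second_number):
    # Drop every selected instruction whose tag matches one issued in the previous
    # clock cycle (reported in the arbiter's feedback data), then keep at most
    # second_number of the oldest survivors as the "remaining instructions".
    survivors = [ins for ins in selected if ins["tag"] not in feedback_tags]
    survivors.sort(key=lambda ins: ins["age"])      # oldest (smallest age) first
    return survivors[:second_number]

# Example: eight instructions selected this cycle, two of them already issued last cycle.
selected = [{"age": a, "tag": 100 + a} for a in range(8)]
print(filter_remaining(selected, feedback_tags={100, 101}, second_number=4))
# -> the four oldest survivors, tags 102..105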
The buffer 430 is coupled to the filter 414 and receives and buffers the remaining instructions transmitted from the filter 414. The pipeline stage before the buffer 430 and the stage after the buffer 430 differ by one clock cycle. The arbiter 440 is coupled to the buffer 430 and feeds back feedback data to the filter 414, where the feedback data is the address or tag of each instruction issued in the previous clock cycle (from the filter 414's point of view), although the invention is not limited thereto. The arbiter 440 determines the number of instructions issued by the i-th segment in the present clock cycle based on the number of instructions ready for issue in each segment from the time-order-oldest segment of the cyclic queue up to the (i-1)-th segment and the number of instructions ready for issue in the i-th segment. For example, the issue count of the present segment is the smaller of (a) the second number minus the sum of the ready counts from the oldest segment through the (i-1)-th segment, and (b) the ready count of the i-th segment. In this way, the sum of the issue counts of the first i segments in time order never exceeds the second number, so the total number of instructions issued by all segments of the issue queue never exceeds the second number, which corresponds to the second number of execution units. The arbiter 440 then selects instructions from the remaining instructions in the buffer 430 according to the issue count of the i-th segment in the present clock cycle and outputs them for issue; in one embodiment, if the number of remaining instructions exceeds the issue count, the arbiter 440 may select the oldest instructions up to the issue count, although the invention is not limited in this respect. The arbiter 440 also transmits acknowledge information to the corresponding entries of the issue queue segment to invalidate the valid bit of each issued instruction's entry, indicating that the instruction has issued (for the issue queue, the arbiter 440 operates in the previous clock cycle at this point). The acknowledge information may include the address or tag of each instruction issued in the previous clock cycle (from the issue queue's point of view), although the invention is not limited thereto. Note that the acknowledge information need not carry the same content as the feedback data: for example, the feedback data may carry the tags of the issued instructions while the acknowledge information carries their addresses; of course, the two may also be the same. Finally, the instructions issued by all segments of the issue queue are merged and issued to the corresponding execution units to execute the instruction content.
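The per-segment grant rule can be sketched as follows. This is a simplified model: the function name issue_counts is illustrative, and clamping the grant at zero when older segments already consume the full width is an added assumption so the example never returns a negative count.

def issue_counts(ready_per_segment, w):
    # Walk the segments from time-order oldest to youngest; grant each segment the
    # smaller of (w minus the ready counts of all older segments) and its own ready
    # count, so the total number of instructions issued never exceeds w.
    grants = []
    older_ready_sum = 0
    for ready in ready_per_segment:
        grant = max(0, min(w - older_ready_sum, ready))   # non-negative clamp: an assumption
        grants.append(grant)
        older_ready_sum += ready                          # forwarded to younger segments
    return grants

# Example with the "second number" w = 4 and four segments holding 1, 3, 2 and 4
# ready instructions: only the two oldest segments are granted anything.
print(issue_counts([1, 3, 2, 4], w=4))    # -> [1, 3, 0, 0]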
FIG. 5 shows a detailed architecture 500, according to an embodiment of the invention, for scheduling instructions in a corresponding issue queue segment, which may implement the processor 400 depicted in FIG. 4. As shown in FIG. 5, a plurality of compare arithmetic logic circuits 5122 and shifters 5124 in the selection logic 512 select at most 2W instructions from the M entries of the corresponding issue queue segment 502. Those skilled in the art will appreciate that many different implementations of the selection logic 512 exist in the prior art, so they are not repeated here. The filter 514 receives the at most 2W instructions from the selection logic 512 (the bus width between the selection logic 512 and the filter 514 is 2W) together with feedback data fed back by the arbiter 540. Based on the feedback data, the filter 514 determines whether any instruction issued in the previous clock cycle (from the filter 514's point of view) duplicates any of the at most 2W instructions of the present clock cycle; if so, the filter 514 discards the duplicate instructions and retains the remaining instructions. The filter 514 may be implemented, for example, as a set of comparator logic circuits that compare, one by one, the addresses of the instructions issued in the previous clock cycle with the addresses of the instructions of the present clock cycle, although the invention is not limited thereto. In one embodiment, if more than W instructions remain after the duplicates are discarded, the filter 514 keeps the W oldest of them as the remaining instructions, although the invention is not limited thereto.
The ready counter 552 counts the number RdyCnt of instructions ready for issue in the corresponding issue queue segment 502 (i.e., instructions whose valid bit is set in the corresponding entry). The subtraction logic 554 subtracts the previous clock cycle's issue count IssueCnt (from the point of view of the issue queue and the ready counter 552) from RdyCnt to obtain the segment's net ready instruction count NtRdyCnt for the present clock cycle; when the difference RdyCnt minus IssueCnt is greater than W, the subtraction logic 554 saturates NtRdyCnt at W. The buffer 530 receives and buffers the remaining instructions transmitted by the filter 514; the bus widths between the filter 514 and the buffer 530, and between the buffer 530 and the arbiter 540, are both W. The control logic 5402 in the arbiter 540 receives the present segment's net ready instruction count NtRdyCnt from the subtraction logic 554, and receives from the statistics logic 558 the sum Offset of the net ready instruction counts, in the present clock cycle, of every segment from the time-order-oldest segment of the cyclic queue up to but not including the present segment. The control logic 5402 then sets the present segment's issue count for the present clock cycle to the smaller of W minus Offset and NtRdyCnt, i.e., min(W - Offset, NtRdyCnt), so that the total number of instructions issued by the present segment and all time-order-older segments does not exceed W. In addition, the present segment's NtRdyCnt is forwarded to all time-order-younger segments so that each of them can accumulate the sum of the net ready instruction counts of the segments older than itself. The grant logic 5404 in the arbiter 540 then selects, from the remaining instructions delivered by the buffer 530, the instructions to issue in the present clock cycle according to the issue count received from the control logic 5402, and in the next clock cycle feeds back feedback data to the filter 514, where the feedback data is the address or tag of each instruction issued in that clock cycle (from the arbiter 540's point of view). In one embodiment, if the number of remaining instructions exceeds the issue count, the grant logic 5404 may select the oldest instructions up to the issue count for issue, although the invention is not limited in this respect. The arbiter 540 also transmits acknowledge information to the entries of the corresponding issue queue segment to invalidate the valid bit of each issued instruction's entry, indicating that the instruction has issued (for the issue queue, the arbiter 540 operates in the previous clock cycle at this point); the acknowledge information may include the address or tag of each instruction issued in the previous clock cycle (from the issue queue's point of view).
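The counter arithmetic above can be sketched as below, using the quantities RdyCnt, IssueCnt, NtRdyCnt, W, and Offset named in FIG. 5; the function names are illustrative and the non-negative clamp is an assumption of the example.

W = 4   # per-segment issue width (the "second number"); 2W is the selection width

def net_ready_count(rdy_cnt, issue_cnt_prev):
    # Subtraction logic 554: RdyCnt minus last cycle's IssueCnt, saturated at W.
    return min(max(rdy_cnt - issue_cnt_prev, 0), W)

def segment_issue_count(nt_rdy_cnt, offset):
    # Control logic 5402: min(W - Offset, NtRdyCnt), where Offset is the sum of the
    # net ready counts of all time-order-older segments (from statistics logic 558).
    return max(0, min(W - offset, nt_rdy_cnt))    # clamp at zero: an assumption

# Example: a segment with RdyCnt = 7 ready entries and IssueCnt = 2 issued last cycle,
# while older segments already account for Offset = 3 net ready instructions.
nt = net_ready_count(rdy_cnt=7, issue_cnt_prev=2)        # saturates at W = 4
print(nt, segment_issue_count(nt, offset=3))             # -> 4 1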
In one embodiment, the arbiter 540 may take the remaining instructions from the buffer 530 as the inputs of a multiplexer, and the grant logic 5404 outputs a plurality of selection signals that control the multiplexer to output the corresponding number of instructions for issue, although the invention is not limited thereto. Finally, the instructions issued by all segments of the issue queue are merged and issued to the corresponding execution units to execute the instruction content.
FIG. 6 depicts an exemplary computer-controlled method flow 600 for scheduling instructions in an issue queue, according to an embodiment of the invention. Although the steps in the flowchart are shown and described in a sequential order, those skilled in the art will appreciate that some or all of the steps may be performed in a different order, and some or all of the steps may be performed in parallel. Furthermore, in one or more embodiments of the invention, one or more of the steps described below may be omitted, repeated, and/or performed in a different order. Accordingly, the arrangement of steps in FIG. 6 should not be construed as limiting the disclosed aspects. In addition, other functional flows are within the scope and spirit of the present invention, as will be apparent to those skilled in the relevant art in view of the teachings provided herein. The method flow 600 is described with continued reference to the exemplary embodiments described above, although the method is not limited to those embodiments.
FIG. 6 shows a flowchart 600 of a method for scheduling instructions in an issue queue according to an embodiment of the present invention. The method may be performed in a processor as shown in fig. 3-5.
In step S605, selection logic in the scheduler selects at most a first number of instructions from the corresponding issue queue segment. Next, in step S610, a filter in the scheduler determines, according to feedback data fed back by an arbiter, whether any instruction issued in the previous clock cycle (from the filter's point of view) duplicates any of the at most first-number selected instructions, where the feedback data is the address or tag of each instruction issued in the previous clock cycle. Then, in step S615, when a selected instruction duplicates an issued instruction, the filter discards the duplicate instructions and retains at most a second number of remaining instructions, which are transferred to the buffer; the first number may be twice the second number. In step S620, the arbiter determines the issue count of each issue queue segment so that the total number of instructions issued by all segments of the issue queue does not exceed the second number, selects instructions from the remaining instructions in the buffer for issue according to that count, and provides the feedback data to the filter (for the arbiter, the filter receives it in the next clock cycle).
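The short two-cycle trace below illustrates, under the same simplified model as the earlier sketches, why the feedback path prevents double issue: the instructions issued in cycle 0 still appear ready when cycle 1 selects, because the acknowledge that clears their valid bits arrives a cycle later, and only the filter keeps them from issuing again. The per-cycle modeling and variable names are assumptions for illustration.

W = 2                                                   # issue width; the filter sees up to 2W = 4
queue = [{"age": a, "tag": a} for a in range(5)]        # five ready entries; valid bits clear late
feedback_tags = set()                                   # nothing issued before cycle 0
issued_log = []

for cycle in range(2):
    selected = sorted(queue, key=lambda i: i["age"])[:2 * W]                 # step S605
    remaining = [i for i in selected if i["tag"] not in feedback_tags][:W]   # steps S610/S615
    issued = remaining[:W]                                                   # step S620
    issued_log.append([i["tag"] for i in issued])
    feedback_tags = {i["tag"] for i in issued}          # reaches the filter one cycle later
    # The issued entries are not removed from `queue`, modeling the late valid-bit clear,
    # so cycle 1 sees them again and relies on the filter to drop them.

print(issued_log)                                       # -> [[0, 1], [2, 3]]
assert not set(issued_log[0]) & set(issued_log[1])      # no instruction issued twice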
As described above, because the filter discards the instructions that the feedback data identifies as issued in the previous clock cycle, the error caused by issuing an instruction twice is avoided. The scheduler can therefore select from the issue queue, in each clock cycle, twice as many instructions as will be output, and the critical path of the confirmation loop is stretched to two clock cycles. This greatly relaxes the limitations imposed by circuit speed and issue queue size, so the instruction issue window can be enlarged to improve the out-of-order execution efficiency of the processor. Although the latency of this operation increases slightly, the benefit of expanding the instruction issue window is a significant improvement in the processor's out-of-order execution performance.
Any particular order or hierarchy of steps in the processes disclosed herein is purely exemplary. Based on design preferences, it is understood that any specific order or hierarchy of steps in the processes may be rearranged within the scope of the disclosure herein. The accompanying method claims present elements of the various steps in a sample order and are therefore not limited to the specific order or hierarchy presented.
The use of ordinal terms such as "first," "second," "third," etc., in the claims to modify a component does not by itself connote any priority, precedence, order or order of steps performed by a method, but are used merely as labels to distinguish between different components having the same name (but for use of the ordinal term).
The above description covers only preferred embodiments of the present application and does not limit it; any person skilled in the art may make further modifications and variations without departing from the spirit and scope of the present application, and the scope of protection of the present application is defined by the appended claims.

Claims (14)

1. A method for scheduling instructions in an issue queue, comprising:
selecting, by selection logic, at most a first number of instructions from a corresponding issue queue segment;
determining, by a filter and according to feedback data fed back by an arbiter, whether an instruction issued in a previous clock cycle duplicates an instruction selected in the current clock cycle;
when the issued instruction duplicates the selected instruction, discarding, by the filter, the duplicate instructions and retaining at most a second number of remaining instructions to be sent to a buffer; and
determining, by the arbiter, an instruction issue number of the issue queue segment, and selecting instructions from the remaining instructions in the buffer according to the instruction issue number of the issue queue segment for output and issue.
2. The method for scheduling instructions in an issue queue of claim 1 wherein said first number is twice said second number.
3. The method for scheduling instructions in an issue queue of claim 1, wherein said feedback data comprises an address or tag of the instruction issued in the previous clock cycle.
4. The method for scheduling instructions in an issue queue of claim 1, wherein when the number of instructions remaining after said filter discards the duplicate instructions exceeds said second number, said filter retains said second number of oldest remaining instructions as said remaining instructions.
5. The method for scheduling instructions in an issue queue of claim 1 wherein said arbiter further transmits an acknowledge to said issue queue segment to invalidate a valid bit in an entry of said issue queue segment corresponding to an instruction issued by said arbiter.
6. The method for scheduling instructions in an issue queue of claim 5 further comprising:
counting the number of instructions whose valid bits are valid in the corresponding entries of the issue queue segment, and subtracting the instruction issue number of the previous clock cycle to obtain a net ready instruction number, wherein the arbiter determines the instruction issue number of the issue queue segment according to the net ready instruction number of the issue queue segment and a sum of the net ready instruction numbers from the time-order-oldest issue queue segment to the segment preceding the issue queue segment.
7. The method of claim 6 wherein said arbiter determines said number of instructions issued by said issue queue segment to be the smaller of said second number minus said sum and said number of net ready instructions for said issue queue segment.
8. A processor for scheduling instructions in an issue queue, said processor comprising:
selection logic that selects at most a first number of instructions from a corresponding issue queue segment;
a filter, coupled to the selection logic, that determines, according to feedback data fed back by an arbiter, whether an instruction issued in a previous clock cycle duplicates an instruction selected in the current clock cycle, wherein when the issued instruction duplicates the selected instruction, the filter discards the duplicate instructions and retains at most a second number of remaining instructions;
a buffer, coupled to the filter, that receives and buffers the remaining instructions transmitted by the filter; and
the arbiter, coupled to the buffer, that determines an instruction issue number of the issue queue segment and selects instructions from the remaining instructions in the buffer according to the instruction issue number of the issue queue segment for output and issue.
9. The processor for scheduling instructions in an issue queue of claim 8 wherein said first number is twice said second number.
10. The processor for scheduling instructions in an issue queue of claim 8, wherein said feedback data comprises an address or tag of the instruction issued in the previous clock cycle.
11. The processor for scheduling instructions in an issue queue of claim 8, wherein when the number of instructions remaining after said filter discards the duplicate instructions exceeds said second number, said filter retains said second number of oldest remaining instructions as said remaining instructions.
12. The processor for scheduling instructions in an issue queue of claim 8 wherein said arbiter further transmits an acknowledge to said issue queue segment to invalidate a valid bit in an entry of said issue queue segment corresponding to an instruction issued by said arbiter.
13. The processor for scheduling instructions in an issue queue of claim 12, wherein said arbiter determines said instruction issue number of said issue queue segment based on a net ready instruction number of said issue queue segment and a sum of the net ready instruction numbers of each segment from the time-order-oldest issue queue segment to the segment preceding said issue queue segment, wherein said net ready instruction number is the number of instructions whose valid bit in the corresponding entry of said issue queue segment is valid, minus the instruction issue number of the previous clock cycle.
14. The processor for scheduling instructions in an issue queue of claim 13 wherein said arbiter determines said number of instruction issues for said issue queue segment to be the smaller of said second number minus said sum and said number of net ready instructions for said issue queue segment.
CN202010869614.9A 2020-08-26 2020-08-26 Method and processor for scheduling instructions in issue queues Active CN111984387B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010869614.9A CN111984387B (en) 2020-08-26 2020-08-26 Method and processor for scheduling instructions in issue queues

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010869614.9A CN111984387B (en) 2020-08-26 2020-08-26 Method and processor for scheduling instructions in issue queues

Publications (2)

Publication Number Publication Date
CN111984387A CN111984387A (en) 2020-11-24
CN111984387B true CN111984387B (en) 2024-06-25

Family

ID=73443471

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010869614.9A Active CN111984387B (en) 2020-08-26 2020-08-26 Method and processor for scheduling instructions in issue queues

Country Status (1)

Country Link
CN (1) CN111984387B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112764912B (en) * 2021-02-27 2022-09-30 中电万维信息技术有限责任公司 Lightweight distributed scheduling method and system for data integration

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3065050A2 (en) * 2015-03-03 2016-09-07 VIA Alliance Semiconductor Co., Ltd. Parallelized multiple dispatch system and method for ordered queue arbitration

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6185672B1 (en) * 1999-02-19 2001-02-06 Advanced Micro Devices, Inc. Method and apparatus for instruction queue compression
US7380104B2 (en) * 2006-04-25 2008-05-27 International Business Machines Corporation Method and apparatus for back to back issue of dependent instructions in an out of order issue queue
US20070255874A1 (en) * 2006-04-28 2007-11-01 Jennings Kevin F System and method for target device access arbitration using queuing devices
GB2469822B (en) * 2009-04-28 2011-04-20 Imagination Tech Ltd Method and apparatus for scheduling the issue of instructions in a multithreaded microprocessor
US8018958B1 (en) * 2009-06-23 2011-09-13 Juniper Networks, Inc. System and method for fair shared de-queue and drop arbitration in a buffer
US10122645B2 (en) * 2012-12-07 2018-11-06 Cisco Technology, Inc. Output queue latency behavior for input queue based device
KR101814412B1 (en) * 2013-03-15 2018-01-03 인텔 코포레이션 Providing snoop filtering associated with a data buffer
US11275590B2 (en) * 2015-08-26 2022-03-15 Huawei Technologies Co., Ltd. Device and processing architecture for resolving execution pipeline dependencies without requiring no operation instructions in the instruction memory
US10684969B2 (en) * 2016-07-15 2020-06-16 Advanced Micro Devices, Inc. Command arbitration for high speed memory interfaces
US10901744B2 (en) * 2017-11-30 2021-01-26 International Business Machines Corporation Buffered instruction dispatching to an issue queue

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3065050A2 (en) * 2015-03-03 2016-09-07 VIA Alliance Semiconductor Co., Ltd. Parallelized multiple dispatch system and method for ordered queue arbitration

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Mimic security processor architecture for the industrial control field; Wei Shuai; Yu Hong; Gu Zeyu; Zhang Xingming; Journal of Cyber Security (信息安全学报), No. 01; full text *

Also Published As

Publication number Publication date
CN111984387A (en) 2020-11-24

Similar Documents

Publication Publication Date Title
KR102074961B1 (en) Method and apparatus for efficient scheduling for asymmetrical execution units
US8495265B2 (en) Avoiding non-posted request deadlocks in devices by holding the sending of requests
US8566509B2 (en) Efficiently implementing a plurality of finite state machines
CN105190539A (en) Method and apparatus to avoid deadlock during instruction scheduling using dynamic port remapping
CN102834816A (en) Bus arbitration techniques to reduce access latency
US10289419B2 (en) Method and apparatus for sorting elements in hardware structures
US10649781B2 (en) Enhanced performance-aware instruction scheduling
US20050246340A1 (en) Resource management
US20240143392A1 (en) Task scheduling method, chip, and electronic device
CN111984387B (en) Method and processor for scheduling instructions in issue queues
US10223310B2 (en) Multi-source flow management using elastic FIFO structures
CN112003800A (en) Method and device for exchanging and transmitting messages of ports with different bandwidths
CN112040001A (en) Request processing method and device based on distributed storage
EP2899644A1 (en) Device and method for inter-core communication in multi-core processor
US20180335957A1 (en) Lock-free datapath design for efficient parallel processing storage array implementation
US10958588B2 (en) Reliability processing of remote direct memory access
WO2024114728A1 (en) Heterogeneous processor and related scheduling method
US10169260B2 (en) Multiprocessor cache buffer management
CN110515749B (en) Method, device, server and storage medium for queue scheduling of information transmission
US9645637B2 (en) Managing a free list of resources to decrease control complexity and reduce power consumption
US7028116B2 (en) Enhancement of transaction order queue
CN115981893A (en) Message queue task processing method and device, server and storage medium
US9092581B2 (en) Virtualized communication sockets for multi-flow access to message channel infrastructure within CPU
CN117331510B (en) Data migration method, device and equipment applied to NVMe controller
CN117453127A (en) ID resource storage system, allocation method, recovery method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
Address after: Room 301, 2537 Jinke Road, Zhangjiang hi tech park, Shanghai 201203
Applicant after: Shanghai Zhaoxin Semiconductor Co.,Ltd.
Address before: Room 301, 2537 Jinke Road, Zhangjiang hi tech park, Shanghai 201203
Applicant before: VIA ALLIANCE SEMICONDUCTOR Co.,Ltd.
GR01 Patent grant