WO2024093063A1 - Decoding method, processor, chip, and electronic device - Google Patents

Decoding method, processor, chip, and electronic device

Info

Publication number
WO2024093063A1
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
decoder
group
microinstruction
groups
Application number
PCT/CN2023/078435
Other languages
English (en)
French (fr)
Inventor
崔泽汉 (Cui Zehan)
Original Assignee
海光信息技术股份有限公司 (Hygon Information Technology Co., Ltd.)
Application filed by 海光信息技术股份有限公司 (Hygon Information Technology Co., Ltd.)
Publication of WO2024093063A1

Classifications

    • G06F 9/00: Arrangements for program control, e.g. control units (G: Physics; G06: Computing, calculating or counting; G06F: Electric digital data processing)
    • G06F 9/06: using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/38: Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F 9/3818: Decoding for concurrent execution
    • G06F 9/3822: Parallel decoding, e.g. parallel decode units
    • G06F 9/3802: Instruction prefetching
    • G06F 9/3814: Implementation provisions of instruction buffers, e.g. prefetch buffer, banks
    • G06F 9/3867: Concurrent instruction execution using instruction pipelines
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management (Y02D: Climate change mitigation technologies in information and communication technologies, i.e. ICT aiming at the reduction of their own energy use)

Definitions

  • Embodiments of the present disclosure relate to a decoding method, a processor, a chip, and an electronic device.
  • Decoding is the process of parsing and translating fetched instructions to obtain microinstructions (micro-ops, Uops).
  • the embodiments of the present disclosure provide a decoding method, a processor, a chip and an electronic device to implement parallel decoding of instructions and obtain a microinstruction sequence consistent with the instruction fetch order, thereby improving the decoding performance of the processor.
  • the present disclosure provides a decoding method, which is applied to a processor.
  • the method includes:
  • generating an instruction fetch request, the instruction fetch request carrying at least one switch mark, the switch mark at least indicating an instruction position for performing a decoder group switch;
  • in response to microinstructions being obtained by decoding of the decoder groups: obtaining the instruction stream indicated by the instruction fetch request, and determining the instruction position for performing decoder group switching in the instruction stream according to the switch mark carried by the instruction fetch request; according to the instruction position, allocating the instruction stream to a plurality of decoder groups for parallel decoding, and attaching a switch mark to a target microinstruction obtained by decoding a target instruction, the target instruction being the instruction corresponding to the instruction position;
  • in response to microinstructions being obtained by searching the microinstruction cache: obtaining the microinstructions corresponding to the instruction fetch request from the microinstruction cache, the obtained microinstructions not being accompanied by a switch mark.
  • the present disclosure also provides a processor, including:
  • a branch prediction unit configured to generate an instruction fetch request, wherein the instruction fetch request carries at least one switch tag, and the switch tag at least indicates an instruction position for performing a decoder group switch;
  • an instruction cache configured to, in response to microinstructions being obtained by decoding of the decoder groups, obtain the instruction stream indicated by the instruction fetch request, and determine the instruction position for performing decoder group switching in the instruction stream according to the switch mark carried by the instruction fetch request;
  • an instruction distribution unit configured to distribute the instruction stream to a plurality of decoder groups for parallel decoding according to the instruction position;
  • a plurality of decoder groups, each configured to decode the assigned instructions to obtain microinstructions, wherein when a decoder group decodes a target instruction, a switch mark is attached to the target microinstruction obtained by decoding the target instruction, the target instruction being the instruction corresponding to the instruction position;
  • a microinstruction cache configured such that, in response to microinstructions being obtained by searching the microinstruction cache, if the instruction fetch request hits in the microinstruction cache, the microinstructions corresponding to the instruction fetch request are obtained from the microinstruction cache, and the obtained microinstructions are not accompanied by a switch mark.
  • An embodiment of the present disclosure also provides a chip, comprising the processor as described above.
  • An embodiment of the present disclosure also provides an electronic device, comprising the chip as described above.
  • the decoding method provided by the embodiment of the present disclosure can be applied to a processor, and at least one switching mark can be carried in an instruction fetch request, and the switching mark at least indicates the instruction position for switching the decoder group.
  • When the processor, in response to microinstructions being obtained by decoding of the decoder groups, obtains the instruction stream indicated by the instruction fetch request, the instruction position for switching the decoder group in the instruction stream can be determined according to the switch mark carried by the instruction fetch request. Then, according to the instruction position, the instruction stream is allocated to multiple decoder groups for parallel decoding, and a switch mark is attached to the target microinstruction obtained by decoding the target instruction, the target instruction being the instruction corresponding to the instruction position. In this way, the microinstructions decoded by the multiple decoder groups can be merged according to the switch marks attached to the target microinstructions, and microinstructions corresponding to the instruction fetch order can be obtained.
  • If, instead, the instruction fetch request hits in the microinstruction cache, the embodiments of the present disclosure may not decode the instructions through the decoder groups, but obtain the microinstructions corresponding to the instruction fetch request from the microinstruction cache.
  • The disclosed embodiments can thus be applied to a processor having both a microinstruction cache and decoder groups, by carrying a switch mark in the instruction fetch request.
  • The switch mark is carried in the instruction fetch request and at least indicates the instruction position for switching the decoder group. In the decoder mode, in which the decoder groups decode instructions into microinstructions, the switch mark is transparently transmitted through the target instruction to the target microinstruction, realizing parallel decoding supported by multiple decoder groups, and the decoded microinstructions can be merged according to the instruction fetch order, thereby improving decoding efficiency. For the microinstruction cache mode, in which the processor obtains microinstructions by searching the microinstruction cache, the embodiments of the present disclosure may simply not process the switch mark carried in the instruction fetch request, thereby remaining compatible with the microinstruction cache mode of the processor.
  • The embodiments of the present disclosure can therefore realize support for parallel decoding in a processor that supports both the decoder mode and the microinstruction cache mode, thereby improving decoding performance.
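  • For illustration only, the two paths summarized above can be modeled by the following Python sketch; the names and data shapes below are assumptions of this sketch, not features recited by the disclosure.

```python
# Illustrative model only: a fetch request carrying switch marks takes one
# of two paths, the microinstruction cache path, where the marks are
# ignored, or the decoder path, where the marks drive splitting, parallel
# decoding and merging.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class FetchRequest:
    addresses: List[int]                                   # instruction fetch addresses
    switch_marks: List[int] = field(default_factory=list)  # marked instruction positions

def fetch_uops(req: FetchRequest,
               uop_cache: Dict[int, str],
               icache: Dict[int, str]) -> List[str]:
    # Microinstruction cache mode: taken only when every address hits;
    # the returned microinstructions carry no switch marks.
    if all(addr in uop_cache for addr in req.addresses):
        return [uop_cache[addr] for addr in req.addresses]
    # Decoder mode: fetch the instruction stream; the switch marks are
    # passed along to steer splitting, parallel decoding and merging.
    stream = [icache[addr] for addr in req.addresses]
    return decode_stream(stream, req.switch_marks)

def decode_stream(stream: List[str], marks: List[int]) -> List[str]:
    # Stand-in for the split/decode/merge machinery sketched further below.
    return [f"uop({ins})" for ins in stream]
```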
  • FIG. 1A is a block diagram of a processor architecture;
  • FIG. 1B is another architecture block diagram of a processor;
  • FIG. 2A is an architecture block diagram of a processor provided by at least one embodiment of the present disclosure;
  • FIG. 2B is an optional flow chart of a decoding method provided by at least one embodiment of the present disclosure;
  • FIG. 2C is another architecture block diagram of a processor provided by at least one embodiment of the present disclosure;
  • FIG. 3A is a schematic diagram of dividing an instruction stream provided by at least one embodiment of the present disclosure;
  • FIG. 3B is a schematic diagram of merging microinstructions provided by at least one embodiment of the present disclosure;
  • FIG. 4A is a block diagram of a processor architecture with a microinstruction cache;
  • FIG. 4B is another architecture block diagram of a processor provided by at least one embodiment of the present disclosure;
  • FIG. 5A is an optional schematic diagram of storing microinstructions in the microinstruction cache mode provided by at least one embodiment of the present disclosure;
  • FIG. 5B is another optional schematic diagram of storing microinstructions in the microinstruction cache mode provided by at least one embodiment of the present disclosure;
  • FIG. 6 is another optional flow chart of a decoding method provided by at least one embodiment of the present disclosure.
  • Instructions are commands that control the execution of computer operations, also known as machine instructions.
  • The role of instructions is to coordinate the working relationship between hardware components; an instruction reflects the basic functions of the computer and is the smallest functional unit of computer operation.
  • the processor needs to process the instruction and convert it into machine language that can be recognized by the machine.
  • pipeline technology is generally used to implement instruction processing.
  • a branch prediction unit can also be set at the front end of the pipeline of the processor processing instructions to realize the branch prediction of the instruction.
  • FIG. 1A exemplarily shows an architecture block diagram of a processor, which includes: a branch prediction unit 101, an instruction cache 102, and a decoder group 103.
  • the branch prediction unit 101 is a digital circuit that can perform branch prediction on instructions and generate instruction fetch requests based on the branch prediction results.
  • the branch prediction results include whether the current instruction is a branch instruction, the branch result of the branch instruction (direction, address, target address, etc.), etc.
  • the branch prediction unit can perform branch prediction on the instruction based on the historical execution information and results of the branch instruction, thereby obtaining the instruction fetch address range of the instruction and generating an instruction fetch request.
  • the instruction fetch request generated by the branch prediction unit contains the instruction fetch addresses of several instructions, which are used to read the corresponding instructions from the instruction cache 102.
  • Instructions are stored in the instruction cache 102, and the instruction cache 102 mainly stores instructions through instruction cache blocks.
  • Each instruction cache block corresponds to a Tag, which identifies the instruction cache block in the instruction cache, so that when the instruction cache fetches instructions according to an instruction fetch request, the corresponding instruction cache block can be found based on the Tag.
  • the instruction fetch address generated by the branch prediction unit may correspond to multiple instructions, and the multiple instructions form an instruction stream.
  • The instruction cache 102 can be the portion of the processor's first-level cache that is used to store instructions.
  • the instruction fetch address generated by the branch prediction unit contains a Tag area (address identification area) and an Index area (address index area).
  • The Index area in the instruction fetch address can be used to read the Tags of multiple instruction cache blocks in the instruction cache; the read Tags are then matched against the Tag area of the instruction fetch address to obtain the storage location of the instruction corresponding to the instruction fetch address in the instruction cache (i.e., the location of the instruction cache block), so that the corresponding instruction can be read.
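  • As a sketch only, the following Python code models this Tag/Index lookup, assuming a direct-mapped organization with an illustrative set count and block size (the disclosure does not fix these parameters).

```python
# Sketch of the Tag/Index lookup, assuming a direct-mapped organization
# with illustrative set count and block size (not fixed by the disclosure).
NUM_SETS = 64       # sets addressed by the Index area
BLOCK_BYTES = 64    # bytes per instruction cache block

def split_fetch_address(addr: int):
    index = (addr // BLOCK_BYTES) % NUM_SETS   # Index area: selects a set
    tag = addr // (BLOCK_BYTES * NUM_SETS)     # Tag area: identifies the block
    return tag, index

def icache_lookup(cache: dict, addr: int):
    """cache maps a set index to a (tag, instruction_block) pair."""
    tag, index = split_fetch_address(addr)
    entry = cache.get(index)
    if entry is not None and entry[0] == tag:  # Tag match: cache hit
        return entry[1]                        # the instruction cache block
    return None                                # miss: fetch from lower levels
```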
  • the decoder group 103 can parse and translate the instructions. By decoding the instructions, the decoded instructions can be obtained.
  • the decoded instructions can be the operation information that can be executed by the machine obtained by translating the instructions, such as the Uop (micro-op) that can be executed by the machine formed by the control field; that is, the decoder can decode the instructions to obtain microinstructions.
  • The processor in FIG. 1A uses a single decoder group to decode instructions; limited by the throughput of that decoder group, the decoding efficiency of instructions is difficult to improve effectively. On this basis, processors that use multiple decoder groups to decode instructions in parallel have emerged.
  • FIG. 1B exemplarily shows another architecture block diagram of a processor. In combination with FIG. 1A and FIG. 1B, the processor shown in FIG. 1B is provided with multiple decoder groups 1031 to 103n, where the specific value of n can be determined according to the specific design of the processor, and the embodiments of the present disclosure are not limited thereto.
  • The instruction stream fetched by the instruction cache 102 based on the instruction fetch request can be allocated to the multiple decoder groups for decoding, so that the decoder groups decode instructions in parallel and each outputs the microinstructions obtained by decoding, thereby improving the decoding efficiency of instructions.
  • the multiple decoder groups may be two decoder groups, for example, decoder group 0 and decoder group 1.
  • While decoder group 0 performs a decoding operation on an instruction, decoder group 1 can also perform a decoding operation on an instruction; for example, within one clock cycle of the processor, decoder group 0 and decoder group 1 can simultaneously perform decoding operations and obtain microinstructions, thereby realizing parallel decoding of instructions.
  • decoder group 0 and decoder group 1 may not decode in the order of the instructions, but may support parallel decoding of the instructions.
  • the processor may set more than two decoder groups as needed. For ease of understanding, the disclosed embodiment only shows an example of two decoder groups.
  • FIG2A exemplarily shows an architecture block diagram of a processor provided by at least one embodiment of the present disclosure.
  • The processor shown in FIG. 2A is further provided with an instruction allocation unit 201, which is used to split the instruction stream fetched by the instruction cache 102 so as to obtain multiple instruction groups allocated to the decoder groups for parallel decoding. At the same time, in order to give the division of the instruction stream a basis, the embodiments of the present disclosure carry a switch mark in the instruction fetch request, and the switch mark can at least indicate the instruction position for switching the decoder group. By transparently transmitting the switch mark to the instruction stream fetched by the instruction cache according to the instruction fetch request, the instruction allocation unit 201 can split the instruction stream according to the instruction positions for switching the decoder group, and the multiple instruction groups obtained after splitting are allocated to multiple decoder groups for parallel decoding, providing technical support for parallel decoding of instructions by multiple decoder groups.
  • The switch mark can also be transmitted into the microinstruction through the instruction corresponding to the instruction position, so that after the decoder groups finish decoding, the microinstructions can be merged based on the switch marks carried in the microinstructions, providing technical support for a merged result that corresponds to the instruction fetch order.
  • FIG. 2B exemplarily shows an optional flow chart of a decoding method provided by at least one embodiment of the present disclosure.
  • the decoding method can be considered as a method flow of a processor when a decoder group is used for decoding, that is, a method flow of obtaining microinstructions by decoding by a decoder group.
  • the method flow may include the following steps.
  • In step S21, an instruction fetch request is generated, wherein the instruction fetch request carries at least one switch mark, and the switch mark at least indicates an instruction position for performing decoder group switching.
  • the instruction fetch request generated by the branch prediction unit may carry a switch tag.
  • the branch prediction direction is mainly divided into two types: branch instruction jump and branch instruction non-jump; accordingly, the instruction fetch address generated by the branch prediction unit can be divided into two types: the instruction fetch address corresponding to the branch prediction direction of the jump, and the instruction fetch address corresponding to the branch prediction direction of the non-jump.
  • the embodiment of the present disclosure can set a switch tag according to the address position corresponding to the branch prediction direction of the jump, and generate an instruction fetch request carrying at least one switch tag.
  • the switch flag may also be set by other mechanisms, not limited to the branch prediction unit setting the switch flag in the instruction fetch request based on the branch prediction situation.
  • the embodiment of the present disclosure may use other devices in the processor (such as the instruction cache) to set the switch flag in the instruction fetch request.
  • the instruction cache may set a switch mark in the instruction fetch request based on an instruction boundary, and the instruction boundary may indicate the end position of the instruction.
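  • As a hypothetical sketch of one of these mechanisms, switch-mark placement by the branch prediction unit may look as follows; the record layout and function names are assumptions used for illustration only.

```python
# Hypothetical sketch of switch-mark placement in the branch prediction
# unit: a mark is recorded at the end position of each instruction whose
# branch is predicted taken. The record layout is purely illustrative.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class FetchRequest:
    fetch_addresses: List[int]
    switch_marks: List[int] = field(default_factory=list)

def build_fetch_request(predictions: List[Tuple[int, bool]]) -> FetchRequest:
    """predictions: (instruction_end_address, predicted_taken) pairs."""
    req = FetchRequest(fetch_addresses=[end for end, _ in predictions])
    for end_addr, taken in predictions:
        if taken:                              # predicted-taken branch:
            req.switch_marks.append(end_addr)  # mark this end position for a
    return req                                 # decoder group switch
```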
  • In step S22, the instruction stream indicated by the instruction fetch request is obtained, and the instruction position for performing decoder group switching in the instruction stream is determined according to the switch mark carried by the instruction fetch request.
  • the instruction cache obtains the instruction fetch request of the branch prediction unit, and fetches the instruction according to the instruction fetch address in the instruction fetch request, thereby obtaining the instruction stream corresponding to the instruction fetch request. If the instruction fetch request carries a switch mark, and the switch mark at least indicates the instruction position for performing a decoder group switch, the instruction position for performing a decoder group switch in the instruction stream can be determined according to the switch mark carried by the instruction fetch request.
  • an instruction stream is a set of instruction sequences including several instructions. If there is no clear boundary in the instruction sequence, the end position of the instruction in the instruction sequence cannot be determined.
  • the boundary of the instruction stream can be determined by determining the instruction position for switching the decoder group in the instruction stream according to the switch mark carried by the instruction fetch request, and the instruction position can be used as the end position. If the instruction corresponding to the instruction position is used as the target instruction, the instruction position is the end position of the target instruction, so that the end position of the target instruction in the instruction stream can be determined according to the instruction position indicated by the switch mark carried by the instruction fetch request.
  • Since the switch mark is used at least to indicate the instruction position for switching the decoder group, its setting position in the instruction fetch request affects neither the instruction stream fetched by the instruction cache nor the structure of the fetched instruction stream.
  • the specific setting position and expression form of the switch mark in the embodiment of the present disclosure are not limited, and it can be, for example, an indication field outside the instruction stream fetched by the instruction cache or represented by a switch mark indication bit.
  • In step S23, according to the instruction position, the instruction stream is distributed to multiple decoder groups for parallel decoding, and a switch mark is attached to the target microinstruction obtained by decoding the target instruction, the target instruction being the instruction corresponding to the instruction position.
  • the instruction allocation unit may allocate the instruction stream to multiple decoder groups for parallel decoding according to the instruction position (i.e., the instruction position for switching the decoder group).
  • the instruction allocation unit may divide the instruction stream according to the instruction position to obtain multiple instruction groups, and then allocate the multiple instruction groups to multiple decoder groups for parallel decoding.
  • The instruction allocation unit divides the instruction stream according to the instruction position; that is, it may divide the instruction stream, with the instruction position as a boundary, into a plurality of instruction groups, wherein the target instruction serving as the boundary between two adjacent instruction groups is cut into the previous instruction group.
  • The instruction allocation unit can then assign a decoder group to the next instruction group according to the switch mark corresponding to the target instruction cut into the previous instruction group, the decoder group assigned to the next instruction group being different from the decoder group assigned to the previous instruction group; the switch mark corresponding to the target instruction can be the switch mark indicating the end position of the target instruction.
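  • As a runnable sketch only (the function names are assumptions, and the round-robin policy models the requirement that adjacent groups receive different decoder groups), the splitting and allocation rule described above may look as follows.

```python
# A runnable sketch of the splitting and allocation rule (assumed names;
# the round-robin policy models the "different from the previous group"
# requirement, one option the disclosure describes).

def split_stream(stream, mark_positions):
    """stream: list of instructions; mark_positions: indices of target
    instructions. Each target instruction closes the previous group."""
    groups, start = [], 0
    for pos in sorted(mark_positions):
        groups.append(stream[start:pos + 1])   # target instruction included
        start = pos + 1
    groups.append(stream[start:])              # trailing, unmarked group
    return [group for group in groups if group]

def assign_decoder_groups(groups, num_decoders, default=0):
    assignment, current = [], default
    for group in groups:
        assignment.append((current, group))
        current = (current + 1) % num_decoders  # adjacent groups differ
    return assignment

# Example: a switch mark after the instruction at index 2 splits six
# instructions into two groups handled by decoder groups 0 and 1.
print(assign_decoder_groups(split_stream(list("ABCDEF"), [2]), 2))
# [(0, ['A', 'B', 'C']), (1, ['D', 'E', 'F'])]
```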
  • The disclosed embodiments can parse and translate the target instruction corresponding to the instruction position, and attach a switch mark to the target microinstruction obtained by decoding the target instruction.
  • the target microinstruction obtained by decoding the target instruction can be a combination of two microinstructions, one of which is a microinstruction without a switch mark, and the other is a microinstruction with a switch mark.
  • the disclosed embodiments may also merge the microinstructions decoded by the multiple decoder groups according to the switch mark attached to the target microinstruction to obtain microinstructions corresponding to the instruction fetching order. It is understandable that in order to achieve complete operation of the program, it is necessary to merge the microinstructions decoded by multiple decoder groups, and the order of the microinstruction sequence obtained after the microinstructions are merged must also correspond to the instruction fetching order.
  • the switch mark at least indicates the instruction position for switching the decoder group.
  • the instruction stream indicated by the instruction fetch request can be obtained, and the instruction position for switching the decoder group in the instruction stream can be determined according to the switch mark carried by the instruction fetch request, and then according to the instruction position, the instruction stream is allocated to multiple decoder groups for parallel decoding, and the switch mark is attached to the target microinstruction obtained by decoding the target instruction, and the target instruction is the instruction corresponding to the instruction position.
  • the switch mark can be used to indicate the instruction position for switching the decoder group, and the switch mark is transmitted from the instruction fetch request to the instruction position in the instruction stream of the instruction fetch, so as to realize the segmentation of the instruction stream of the instruction fetch based on the instruction position, and allocate it to multiple decoder groups for parallel decoding, which effectively improves the decoding efficiency of the processor.
  • the disclosed embodiment can pass the switch mark into the target microinstruction by parsing and translating the target instruction serving as the boundary.
  • the microinstructions decoded by the multiple decoder groups are merged according to the switch mark to obtain the microinstructions corresponding to the instruction fetching order, so as to facilitate the accurate execution of the microinstructions.
  • FIG2C shows another architecture block diagram of a processor provided by at least one embodiment of the present disclosure.
  • each decoder group is provided with a corresponding instruction queue and a microinstruction queue, for example, decoder groups 1031 to 103n are provided with instruction queues 2021 to 202n, and microinstruction queues 2031 to 203n, respectively, and one decoder group corresponds to one instruction queue and one microinstruction queue.
  • The instruction queue of a decoder group is used to store the instruction group assigned to that decoder group by the instruction allocation unit; that is, the instruction queue stores the instructions to be decoded by the decoder group. For example, the instruction queue 2021 stores the instructions to be decoded by the decoder group 1031, and so on, until the instruction queue 202n stores the instructions to be decoded by the decoder group 103n. When implementing parallel decoding by multiple decoder groups, as long as each instruction queue is filled with instructions to be decoded faster than its decoder group consumes them, the decoder groups can continuously obtain instructions from their instruction queues for decoding, thereby realizing parallel decoding by multiple decoder groups.
  • the microinstruction queue of the decoder group is used to store the microinstructions decoded by the decoder group; for example, the microinstruction queue 2031 stores the microinstructions decoded by the decoder group 1031, and so on, the microinstruction queue 203n stores the microinstructions decoded by the decoder group 103n.
  • The processor is further provided with a merging unit 204, which can read microinstructions from the multiple microinstruction queues and merge them so that the order of the merged microinstructions corresponds to the instruction fetch order.
  • the processor architecture shown in Figure 2C can, during the decoding process, pass the switch mark in the instruction fetch request through the corresponding instruction to the microinstruction to achieve parallel decoding of multiple decoder groups, and merge the microinstructions in the order of instruction fetch.
  • the optional specific process is shown below.
  • the branch prediction unit 101 generates an instruction fetch request carrying a switch tag, and sends the instruction fetch request to the instruction cache 102, so as to read the corresponding instruction stream according to the instruction fetch request in the instruction cache 102.
  • The switch mark of the instruction fetch request indicates the instruction position for the decoder group switch, and does not affect the lookup, by the instruction fetch address, of the corresponding instructions in the instruction cache.
  • The instruction cache 102 reads the instruction stream according to the instruction fetch address of the instruction fetch request, wherein the switch mark carried by the instruction fetch request does not affect the instruction fetching of the instruction cache. After reading the instruction stream, the instruction cache 102 can determine the instruction position for the decoder group switch in the instruction stream according to the switch mark carried by the instruction fetch request.
  • The instruction allocation unit 201 divides the instruction stream according to the instruction positions to obtain multiple instruction groups, and allocates the multiple instruction groups to the instruction queues 2021 to 202n corresponding to the decoder groups 1031 to 103n.
  • There may be multiple instruction positions indicated by switch marks, so multiple instruction positions may be determined in the instruction stream. When the instruction allocation unit splits the instruction stream, each time one of the instruction positions is identified in the instruction stream, a split is performed, and the target instruction corresponding to the instruction position is cut into the previous instruction group. According to the switch mark corresponding to the target instruction cut into the previous instruction group, a decoder group is allocated to the next instruction group, and the decoder group allocated to the previous instruction group differs from the decoder group allocated to the next instruction group. In this way, the instruction stream is split into multiple instruction groups with the instruction positions as boundaries, and the multiple instruction groups are allocated to multiple decoder groups for parallel decoding.
  • the instruction allocation unit 201 may save the first instruction group among the multiple instruction groups to the instruction queue corresponding to the default decoder group, and for the non-first instruction group among the multiple instruction groups, the instruction allocation unit 201 may determine a decoder group different from the decoder group assigned to the previous instruction group from the multiple decoder groups according to the switching mark corresponding to the target instruction in the previous instruction group, and then save the non-first instruction group to the instruction queue corresponding to the determined decoder group.
  • the decoder group assigned to each non-first instruction group may be determined in sequence from the plurality of decoder groups according to the switching mark corresponding to the target instruction in the previous instruction group of the non-first instruction group, in accordance with the order of the plurality of decoder groups. For example, if the first decoder group 1031 is the default decoder group, the first instruction group after the instruction stream is segmented is assigned to the decoder group 1031, and then, in accordance with the order of the decoder groups, each non-first instruction group is sequentially assigned to the decoder group after the decoder group 1031, until the decoder group 103n.
  • the allocation principle of the instruction allocation unit to allocate decoder groups to instruction groups can be to allocate them according to the order of the decoder groups, and according to the switching mark corresponding to the target instruction in the previous instruction group, a decoder group different from the decoder group allocated to the previous instruction group is allocated to the subsequent instruction group, so as to achieve reasonable allocation of instruction groups in the instruction queues corresponding to the decoder groups, ensure that multiple decoder groups can read the instructions to be decoded in the corresponding instruction queues, and realize parallel decoding of multiple decoder groups.
  • the switch mark may also include information about the decoder group to be switched, which is used to specifically indicate the decoder group to be switched.
  • the instruction allocation unit can allocate a specific decoder group to the next instruction group based on the switch mark corresponding to the target instruction in the previous instruction group, so as to allocate the decoder group to the instruction group not in the order of the decoder groups.
  • The instruction allocation unit allocates the corresponding decoder group to each instruction group according to the decoder group specifically indicated in the switch mark, until all instruction groups are allocated.
  • the default decoder group may be the first decoder group assigned in sequence, or may be a decoder group assigned specifically by the processor, and the present disclosure does not impose too many restrictions on this.
  • The decoder groups 1031 to 103n save the decoded microinstructions in the microinstruction queues 2031 to 203n. If a decoded instruction is a target instruction, the decoder group obtains the corresponding target microinstruction by decoding and, according to the instruction position indicated by the corresponding switch mark in the target instruction, attaches the switch mark to the target microinstruction.
  • Merging unit 204 reads microinstructions in microinstruction queues 2031 to 203n, and merges the read microinstructions to obtain a microinstruction sequence that can be executed.
  • merging unit 204 can read microinstructions in order in microinstruction queues 2031 to 203n and merge based on the switch mark attached to the target microinstruction.
  • The merging unit reads microinstructions from microinstruction queue 2031 according to the order of the microinstruction queues; when a target microinstruction with an attached switch mark is read in microinstruction queue 2031, the merging unit switches in sequence to the next microinstruction queue after microinstruction queue 2031 and reads microinstructions from that queue. If a target microinstruction with an attached switch mark is read in that microinstruction queue, the merging unit again switches to the next microinstruction queue to read microinstructions, and so on, until all microinstructions are read.
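  • A minimal sketch of this merge loop is shown below; the queue representation and the stop-on-empty behavior are simplifying assumptions of the sketch (hardware would instead wait for the queue to fill).

```python
# Minimal sketch of the merge loop (assumed queue representation; real
# hardware would wait on an empty queue rather than stop).
from collections import deque

def merge_uop_queues(queues, first_queue=0):
    """queues: one deque per decoder group holding (uop, has_switch_mark)
    pairs in decode order; reading starts at the first-assigned group."""
    merged, current = [], first_queue
    while any(queues):
        if not queues[current]:          # nothing (yet) in this queue:
            break                        # a simplification of this sketch
        uop, has_mark = queues[current].popleft()
        merged.append(uop)
        if has_mark:                     # target uop read: switch queues
            current = (current + 1) % len(queues)
    return merged

# Example: decoder group 0 produced A, B (B carries the switch mark);
# decoder group 1 produced C, D. The merge restores fetch order.
q0 = deque([("A", False), ("B", True)])
q1 = deque([("C", False), ("D", False)])
print(merge_uop_queues([q0, q1]))        # ['A', 'B', 'C', 'D']
```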
  • The first microinstruction queue from which the merging unit reads microinstructions may correspond to the instruction queue to which the first instruction group was assigned (e.g., a microinstruction queue and an instruction queue belonging to the same decoder group).
  • For example, if the instruction allocation unit assigns the first instruction group to the instruction queue 2021 corresponding to the decoder group 1031, the merging unit first reads the microinstructions in the microinstruction queue 2031 when merging microinstructions, providing support for the merged microinstruction sequence to correspond to the instruction fetch order.
  • The switch mark corresponding to the target instruction may also specifically indicate the decoder group to be switched to; then, when the target instruction is decoded and the switch mark is attached to the target microinstruction, the switch mark may specifically indicate the microinstruction queue to be switched to. Therefore, when the merging unit merges the microinstructions, the microinstruction queue being read can be switched based on the switch mark attached to the target microinstruction, so as to realize reading not in the order of the microinstruction queues.
  • For example, when the merging unit reads the switch mark attached to a target microinstruction in the microinstruction queue 2031, it can switch to the microinstruction queue 203n to continue reading microinstructions; if a switch mark attached to a target microinstruction is read in the microinstruction queue 203n, it switches back to the microinstruction queue 2031 to read microinstructions.
  • FIG3A exemplarily shows a schematic diagram of the splitting of the instruction stream.
  • the instruction stream includes instructions 310 to 31m, where m is the number of instructions in the instruction stream, which may be determined according to actual conditions, and the embodiments of the present disclosure are not limited thereto.
  • the instruction position for switching the decoder group indicated by the switching mark is shown as the dotted arrow in the figure, then the instruction 31k corresponding to the instruction position may be the target instruction, and the instruction position may be the end position of the target instruction 31k, wherein the switching mark may be set, for example, by a branch prediction unit.
  • the disclosed embodiment can use the instruction position as the boundary for instruction stream segmentation to segment the instruction stream 310 to 31m.
  • the target instruction 31k is adjacent to the instruction 31k+1.
  • the target instruction 31k is segmented into the previous instruction group (i.e., the instructions 310 to 31k are a group), and the instruction 31k+1 is segmented into the next instruction group (i.e., the instructions 31k+1 to 31m are a group), and two adjacent different instruction groups are obtained.
  • When there are multiple switch marks, there are multiple instruction positions in the instruction stream, and there are also multiple corresponding target instructions.
  • the instruction stream can be segmented in this way to obtain multiple instruction groups.
  • the disclosed embodiment can allocate the first instruction group (i.e., instruction 310 to target instruction 31k) after segmentation to the instruction queue corresponding to decoder group 0 in the order of decoder group 0 and decoder group 1, and decoder group 0 performs decoding operations on instruction 310 to target instruction 31k. Since target instruction 31k is the instruction corresponding to the instruction position for switching the decoder group, it is necessary to allocate a decoder group different from decoder group 0 to the instructions after target instruction 31k, so that the instruction group of instruction 31k+1 to instruction 31m is allocated to the instruction queue corresponding to decoder group 1, and decoder group 1 performs decoding operations on instruction 31k+1 to instruction 31m.
  • FIG3B exemplarily shows a schematic diagram of merging microinstructions.
  • Decoder group 0 decodes the instruction group from instruction 310 to target instruction 31k to obtain microinstructions 320 to target microinstruction 32k (not shown in the figure), wherein decoder group 0 parses and translates the target instruction 31k, and the obtained target microinstruction 32k is a combination of microinstruction 32k' and microinstruction 32k"; microinstruction 32k' is a microinstruction without a switch mark, and microinstruction 32k" is a microinstruction with a switch mark. Decoder group 1 decodes the instruction group from instruction 31k+1 to instruction 31m to obtain microinstructions 32k+1 to 32m. Microinstructions 320 to target microinstruction 32k are stored in the microinstruction queue of decoder group 0, and microinstructions 32k+1 to 32m are stored in the microinstruction queue of decoder group 1.
  • the microinstructions may be read first in the microinstruction queue of decoder group 0 according to the order of the decoder groups.
  • When the microinstruction 32k" with the switch mark is read, the merging unit switches to the microinstruction queue of decoder group 1 to read microinstructions. That is, when a microinstruction with a switch mark is read from the microinstruction queue currently being read, reading switches to the next microinstruction queue, until all microinstructions are read.
  • the read microinstructions may be switched in the microinstruction queues of multiple decoder groups according to the switch mark attached to the target microinstruction, so that the read microinstructions can correspond to the instruction fetch order.
  • the instruction position for switching the decoder group is indicated by the switch mark of the instruction fetch request, and the switch mark is transparently transmitted to the microinstruction obtained by the decoder group through the target instruction in the instruction stream according to the instruction fetch request, so as to support multiple decoder groups for parallel decoding and sequential merging of microinstructions, which effectively improves the decoding efficiency of the processor.
  • The disclosed embodiments further provide a high-performance processor with a micro-op cache (Micro-Op Cache, OC), i.e., a microinstruction cache.
  • FIG4A is a block diagram of a processor architecture with a microinstruction cache.
  • the branch prediction unit 101 sends the generated instruction fetch request to the microinstruction cache 104.
  • the microinstruction cache 104 can be used to cache microinstructions.
  • The microinstruction cache 104 can include multiple entries, and each entry may contain multiple microinstructions.
  • the instruction fetch request generated by the branch prediction unit 101 may correspond to multiple microinstruction cache entries.
  • The microinstruction cache can perform a hit judgment between the starting address of the instruction fetch request and the address of the first microinstruction of every entry; on a hit, the microinstructions in the first matching entry are obtained. If the end address of the last microinstruction in that entry is smaller than the end address of the address range of the instruction fetch request, the end address corresponding to the last microinstruction is used for a further hit judgment against the first-microinstruction addresses of all entries; on a hit, the microinstructions in the second matching entry are obtained. This process repeats until the end address of the address range in the instruction fetch request does not exceed the end address of the last microinstruction in an entry, at which point the microinstructions can be read from the microinstruction cache based on the instruction fetch request.
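  • For illustration only, this hit-judgment loop can be sketched as follows; the entry layout (keyed by the address of an entry's first microinstruction, with the covered end address recorded) is an assumption of the sketch.

```python
# Illustrative hit-judgment loop; the entry layout (keyed by the address
# of the entry's first microinstruction, recording the uops held and the
# end address they cover) is an assumption of this sketch.

def uop_cache_lookup(entries, start_addr, end_addr):
    """entries: dict mapping first-uop address -> (uops, covered_end).
    Returns the uops covering [start_addr, end_addr), or None on a miss."""
    uops, addr = [], start_addr
    while True:
        entry = entries.get(addr)          # hit judgment against entries
        if entry is None:
            return None                    # miss: fall back to decoder mode
        entry_uops, covered_end = entry
        uops.extend(entry_uops)
        if end_addr <= covered_end:        # request range fully covered
            return uops
        if covered_end <= addr:            # defensive: entries must advance
            return None
        addr = covered_end                 # continue with the next entry
```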
  • When the addresses in the instruction fetch request generated by the branch prediction unit all hit in the microinstruction cache, the microinstruction cache can output the corresponding microinstructions; when the starting address in the instruction fetch request fails to hit in a microinstruction cache entry, the microinstruction cache cannot output microinstructions.
  • the processor may include multiple decoding modes, wherein the multiple decoding modes include a decoder mode and a microinstruction cache mode, wherein the decoder mode is decoded by the decoder group to obtain microinstructions, and the microinstruction cache mode is searched by the microinstruction cache to obtain microinstructions.
  • FIG4B exemplarily shows another architecture block diagram of a processor provided by at least one embodiment of the present disclosure.
  • the processor of the embodiment of the present disclosure is compatible with the decoder mode and the microinstruction cache mode.
  • For the instruction fetch request carrying the switch mark generated by the branch prediction unit 101, microinstructions can be obtained through two paths.
  • the two paths can be respectively a path for obtaining the microinstruction by decoding by the decoder group (corresponding to the decoder mode) and a path for obtaining the microinstruction by searching the microinstruction cache (corresponding to the microinstruction cache mode).
  • the branch prediction unit 101 sends the instruction fetch request carrying the switch mark to the instruction cache 102, and the instruction cache 102 fetches the instruction stream according to the address in the instruction fetch request, and the instruction stream is divided into multiple instruction groups according to the instruction position in the instruction stream through the instruction allocation unit 201, and the obtained multiple instruction groups are allocated to the instruction queues 2021 to 202n corresponding to the multiple decoder groups.
  • The multiple decoder groups 1031 to 103n read the instructions to be decoded from the corresponding instruction queues 2021 to 202n and perform decoding operations to obtain microinstructions, and then save the decoded microinstructions to the corresponding microinstruction queues; based on the existence of the microinstruction cache 104, the decoded microinstructions can also be cached in the microinstruction cache.
  • In response to obtaining microinstructions from the microinstruction cache, that is, in the microinstruction cache mode, the branch prediction unit 101 sends the instruction fetch request carrying the switch mark to the microinstruction cache 104, so that microinstructions can be output according to the hit of the instruction fetch request in the microinstruction cache; based on the existence of the microinstruction queues, the obtained microinstructions can be saved in the microinstruction queue corresponding to the default decoder group.
  • The fetched microinstructions are saved in the microinstruction queue corresponding to the default decoder group. The microinstruction queue corresponding to the default decoder group can be the microinstruction queue of the first decoder group determined in sequence, or the microinstruction queue of a decoder group specified by the processor; alternatively, the decoder group, and thus its microinstruction queue, can be determined based on whether the last instruction decoded by a decoder group before the decoding mode switched to the microinstruction cache mode has an instruction position indicated by a corresponding switch mark.
  • FIG. 5A is an optional schematic diagram of switching to microinstruction cache mode to save microinstructions in at least one embodiment of the present disclosure.
  • In the decoder mode, the instruction stream (i.e., instructions 510 to 51m) is read from the instruction cache according to the instruction fetch request, wherein the switch mark indicates the instruction position for switching the decoder group as shown by the dotted line in the figure; the instruction 51k corresponding to the instruction position is the target instruction, and there is no instruction position for switching the decoder group at the end position of the instruction stream. The instruction stream is divided according to the instruction position to obtain two adjacent, different instruction groups (i.e., instructions 510 to 51k and instructions 51k+1 to 51m), and the corresponding decoder groups are allocated; the last instruction 51m is decoded by decoder group 1 to obtain microinstruction 52m without a switch mark, which is saved in the corresponding microinstruction queue 1.
  • When switching to the microinstruction cache mode, the instruction fetch address is looked up and hits in the microinstruction cache, and the microinstructions (i.e., microinstructions 530 to 53m) are read out and correspondingly saved in microinstruction queue 1.
  • FIG. 5B is another optional schematic diagram of switching to the microinstruction cache mode to save microinstructions in at least one embodiment of the present disclosure.
  • In the decoder mode, the instruction stream (i.e., instructions 510 to 51m) is read from the instruction cache according to the instruction fetch request, wherein the instruction position indicated by the switch mark for performing decoder group switching falls on the last instruction 51m (the instruction position indicated by switch mark 2 in the figure). After the instruction stream is divided and the decoder groups are allocated, instructions 510 to 51k are decoded by decoder group 0 and instructions 51k+1 to 51m are decoded by decoder group 1; decoding the last instruction 51m yields microinstructions 52m' and 52m", with microinstruction 52m" accompanied by a switch mark, and the microinstructions decoded by decoder group 1 are saved in the corresponding microinstruction queue 1.
  • When switching to the microinstruction cache mode, the instruction fetch address is looked up and hits in the microinstruction cache, and the microinstructions (i.e., microinstructions 530 to 53m) are read out; the read microinstructions are correspondingly saved in microinstruction queue 0, the queue of the decoder group (i.e., decoder group 0) to which the switch mark corresponding to the last instruction 51m indicated switching before the decoding mode was switched.
  • the microinstruction cache may not respond to the switch mark carried in the instruction fetch request, and the switch mark is not included in the read microinstruction.
  • If the instruction fetch request does not hit in the microinstruction cache, the decoder mode is entered; multiple decoder groups in the decoder mode decode the instructions corresponding to the instruction fetch request in parallel to obtain microinstructions, and the microinstructions decoded in the decoder mode can be saved in the microinstruction cache.
  • the instruction stream indicated by the instruction fetch request is segmented according to the instruction position indicated by the switch mark, and the first instruction group of the obtained multiple instruction groups is assigned to the instruction queue corresponding to the default decoder group.
  • the instruction queue corresponding to the default decoder group can be the instruction queue corresponding to the first decoder group determined in sequence, or the instruction queue corresponding to the decoder group specified by the processor, or the instruction queue corresponding to the corresponding decoder group can be determined according to whether the last instruction decoded by the decoder group has the instruction position indicated by the switch mark before entering the microinstruction cache mode.
  • Before the processor's decoding mode switches to the microinstruction cache mode, if the last instruction decoded by the decoder group does not have an instruction position indicated by a corresponding switch mark, then after the microinstruction cache mode is switched back to the decoder mode, the first instruction group among the multiple instruction groups obtained by segmenting the instruction stream is allocated to the instruction queue corresponding to the decoder group that decoded the last instruction before the switch to the microinstruction cache mode.
  • In FIG. 5A, before the decoding mode of the processor is switched to the microinstruction cache mode, the last instruction 51m decoded by decoder group 1 does not have an instruction position indicated by a corresponding switch mark. Therefore, when the processor has switched to the microinstruction cache mode and the instruction fetch address in an instruction fetch request does not hit in the microinstruction cache, the decoding mode is switched back to the decoder mode, and the first instruction group (i.e., instructions 510 to 51k) obtained after segmenting the instruction stream according to the instruction position indicated by the switch mark carried by the instruction fetch request (the position shown by the dotted line in FIG. 5A) is allocated to the instruction queue 1 corresponding to decoder group 1.
  • If the last instruction decoded by the decoder group has an instruction position indicated by a corresponding switch mark, then after the microinstruction cache mode is switched back to the decoder mode, the first instruction group among the multiple instruction groups obtained by dividing the instruction stream is allocated to the instruction queue corresponding to the decoder group indicated for switching by the switch mark of that last instruction before the microinstruction cache mode was entered.
  • In FIG. 5B, before the decoding mode of the processor is switched to the microinstruction cache mode, the last instruction 51m decoded by decoder group 1 has an instruction position indicated by a corresponding switch mark. Therefore, when the processor has switched to the microinstruction cache mode and the instruction fetch address in an instruction fetch request does not hit in the microinstruction cache, the decoding mode is switched back to the decoder mode, and according to the instruction position indicated by the switch mark carried by the instruction fetch request (switch mark 2 in FIG. 5B), the first instruction group (i.e., instructions 510 to 51k) obtained after dividing the instruction stream is allocated to the instruction queue 0 corresponding to decoder group 0, the group to which the switch mark of the last instruction 51m indicated switching before the microinstruction cache mode was entered.
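  • The rule in these two cases can be sketched as follows, for illustration only; the function name and encoding are assumptions of the sketch.

```python
# Illustrative rule for the two cases above (function name and encoding
# are assumptions): the default decoder group after a mode switch is
# derived from the last instruction decoded before the switch.

def default_group_after_mode_switch(last_group, last_had_mark, num_groups):
    """last_group: decoder group that decoded the last instruction;
    last_had_mark: whether that instruction had a marked position."""
    if last_had_mark:
        return (last_group + 1) % num_groups  # mark: switch to next group
    return last_group                         # no mark: keep the same group

# FIG. 5A case: instruction 51m had no mark, so decoder group 1 is kept.
print(default_group_after_mode_switch(1, False, 2))   # 1
# FIG. 5B case: instruction 51m carried a mark, so decoder group 0 is used.
print(default_group_after_mode_switch(1, True, 2))    # 0
```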
  • The embodiments of the present disclosure may be implemented in a processor having a microinstruction cache and decoder groups, by carrying a switch mark in an instruction fetch request, the switch mark at least indicating the instruction position for switching the decoder group. In the decoder mode (corresponding to obtaining microinstructions by decoding by the decoder groups), the switch mark may be transparently transmitted through the target instruction to the target microinstruction, supporting parallel decoding by multiple decoder groups and improving decoding efficiency. For the microinstruction cache mode of the processor (corresponding to obtaining microinstructions by searching the microinstruction cache), the embodiments of the present disclosure may simply not process the switch mark carried in the instruction fetch request, thereby remaining compatible with the microinstruction cache mode of the processor.
  • the embodiments of the present disclosure may support parallel decoding and improve decoding performance in a processor that supports the decoder mode and the microinstruction cache mode.
  • For example, when the processor is compatible with the microinstruction cache mode, parallel decoding in the decoder mode can still be supported, improving decoding performance.
  • FIG. 6 shows another optional flow chart of the decoding method provided by an embodiment of the present disclosure.
  • the method shown in FIG. 6 may be executed by the processor shown in FIG. 4B , wherein the contents described below may correspond to the contents described above.
  • the method may include:
  • Step S60: Generate an instruction fetch request.
  • step S60 may be performed by a branch prediction unit, and the branch prediction unit may set a switch mark in the instruction fetch request according to the branch prediction jump result to indicate the instruction position for performing decoder group switching.
  • Step S61: Determine whether the processor is currently in the microinstruction cache mode; if not, execute step S62; if yes, execute step S69.
  • In the microinstruction cache mode, microinstructions are obtained by looking up the microinstruction cache.
  • Step S62: Access the instruction cache and fetch the instruction stream according to the instruction fetch request.
  • When the instruction cache reads the instruction stream according to the instruction fetch request, the switch mark carried by the request is passed through alongside the fetched instruction stream. Since the switch mark at least indicates the instruction position at which a decoder group switch is performed, that position in the instruction stream, and hence the target instruction corresponding to it, can be determined.
  • Step S63: Determine whether a switch mark exists in the instruction fetch request; if not, execute step S64; if yes, execute step S65.
  • Step S64: Send the instruction stream to the default decoder group, which decodes the instructions and saves the obtained microinstructions into the corresponding microinstruction queue.
  • The switch mark carried by the instruction fetch request is used at least to indicate the instruction position at which a decoder group switch is performed; when the request carries no switch mark, no decoder group switch is needed, the default decoder group decodes the instruction stream, and the obtained microinstructions are saved to the microinstruction queue corresponding to the default decoder group.
  • the default decoder group can be the first decoder group assigned in sequence, or it can be a decoder group assigned by the processor, and in the process of processing instructions, for each instruction fetch request, the default decoder group can also be the decoder group currently switched to.
  • the default decoder group in the embodiment of the present disclosure is not specified as a fixed decoder group, and can be selected according to actual needs.
  • Step S65: Split the instruction stream according to the switch mark, and assign the target instruction and the instructions before it to the instruction queue corresponding to the default decoder group, which decodes them; the obtained microinstructions are saved to the corresponding microinstruction queue, wherein the target microinstruction obtained by decoding the target instruction carries the switch mark.
  • Step S66: Determine whether the remaining instructions still contain a target instruction corresponding to a switch mark; if not, execute step S67; if yes, execute step S68.
  • Step S67: Allocate the remaining instructions to the instruction queue corresponding to a next decoder group different from the previous decoder group; that decoder group decodes them, and the obtained microinstructions are saved to the corresponding microinstruction queue.
  • Step S68: Assign the target instruction and the instructions before it to the instruction queue corresponding to a next decoder group different from the previous decoder group; that decoder group decodes them, and the obtained microinstructions are saved to the corresponding microinstruction queue, wherein the microinstruction obtained by decoding the target instruction carries the switch mark; return to step S66.
  • Since instructions are assigned to the decoder groups' instruction queues, from which the decoder groups read instructions to decode, the instruction stream can be split and distributed to the instruction queues of multiple decoder groups whenever instructions are assigned to the queues faster than the decoder groups decode them, so that the decoder groups decode the assigned instructions in parallel. Steps S65 to S68 are only an optional implementation, in the embodiments of the present disclosure, of splitting the instruction stream according to the switch marks corresponding to the target instructions to obtain multiple instruction groups and distributing them to multiple decoder groups for parallel decoding.
  • Step S69: Fetch the microinstructions from the microinstruction cache and save them to the corresponding microinstruction queue.
  • Fetching microinstructions from the microinstruction cache presupposes that the instruction fetch request hits in the microinstruction cache; if the request misses, the processor enters the decoder mode and step S62 is executed.
  • In the decoder mode, microinstructions are obtained by decoder group decoding.
  • Step S70: Read microinstructions from the microinstruction queue corresponding to the first decoder group.
  • Step S71: Determine whether the microinstruction just read carries a switch mark; if yes, execute step S72; if not, return to step S70 until all microinstructions have been read.
  • Step S72: Switch to the microinstruction queue corresponding to the next decoder group to read microinstructions, and return to step S71.
  • After all microinstructions have been read, the embodiments of the present disclosure may further execute them.
  • the disclosed embodiments can support parallel decoding in a processor that supports a decoder mode and a microinstruction cache mode, thereby improving decoding performance.
  • An embodiment of the present disclosure further provides a processor, whose structure may refer to FIG. 4B; the content described below can be regarded as the functional modules the processor needs in order to implement the decoding method provided by the present disclosure, and may be read in correspondence with the content described above.
  • the processor at least includes:
  • a branch prediction unit configured to generate an instruction fetch request, wherein the instruction fetch request carries at least one switch tag, and the switch tag at least indicates an instruction position for performing a decoder group switch;
  • an instruction cache configured to, in response to microinstructions being obtained by decoder group decoding, obtain the instruction stream fetched by the instruction fetch request, and determine, according to the switch mark carried by the instruction fetch request, the instruction position in the instruction stream at which a decoder group switch is performed;
  • an instruction distribution unit configured to distribute the instruction stream to multiple decoder groups for parallel decoding according to the instruction position;
  • a decoder group used for decoding the assigned instruction to obtain a microinstruction; the number of the decoder groups is multiple, wherein when the decoder group decodes a target instruction, a target microinstruction obtained by decoding the target instruction is accompanied by a switching mark, and the target instruction is the instruction corresponding to the instruction position;
  • a microinstruction cache configured to, in response to microinstructions being obtained by microinstruction cache lookup, obtain the microinstructions corresponding to the instruction fetch request from the microinstruction cache if the instruction fetch request hits in the microinstruction cache, the obtained microinstructions not carrying a switch mark.
  • Optionally, the instruction position is the end position of the target instruction; the step of the instruction cache determining the instruction position for decoder group switching in the instruction stream may include: determining the end position of the target instruction in the instruction stream according to the instruction position indicated by the switch mark carried by the instruction fetch request.
  • the step of the instruction allocating unit, for allocating the instruction stream to multiple decoder groups for parallel decoding according to the instruction position may include:
  • the instruction stream is segmented according to the instruction positions to obtain a plurality of instruction groups, and the plurality of instruction groups are allocated to a plurality of decoder groups for parallel decoding.
  • the instruction allocating unit divides the instruction stream according to the instruction position to obtain a plurality of instruction groups, including:
  • the instruction stream is divided into a plurality of instruction groups with the instruction position as a boundary in the instruction stream, wherein the target instruction as a boundary in two adjacent instruction groups is divided into the previous instruction group;
  • the step of the instruction distribution unit distributing the plurality of instruction groups to a plurality of decoder groups for parallel decoding comprises:
  • a decoder group is allocated to the following instruction group according to the switch mark corresponding to the target instruction split into the preceding instruction group, and the decoder group allocated to the preceding instruction group is different from the decoder group allocated to the following instruction group.
  • one decoder group is correspondingly provided with an instruction queue for storing instructions to be decoded
  • the step of the instruction allocation unit allocating a decoder group to the following instruction group according to the switch mark corresponding to the target instruction segmented into the preceding instruction group, the decoder group allocated to the preceding instruction group being different from that allocated to the following instruction group, includes:
  • saving the first instruction group among the multiple instruction groups to the instruction queue corresponding to the default decoder group; and, for each non-first instruction group, determining from the multiple decoder groups, according to the switch mark corresponding to the target instruction in the preceding instruction group, a decoder group different from the one allocated to the preceding instruction group, and saving the non-first instruction group to the instruction queue corresponding to the determined decoder group.
  • the instruction allocation unit determines, according to a switching flag corresponding to a target instruction in a previous instruction group, from the multiple decoder groups a decoder group different from a decoder group allocated to the previous instruction group, comprising:
  • the decoder group assigned to each non-first instruction group is determined in sequence from the multiple decoder groups, in the order of the multiple decoder groups.
  • the step of the instruction allocation unit saving the first instruction group among the multiple instruction groups to the instruction queue corresponding to the default decoder group includes:
  • after exiting the microinstruction cache mode and entering the decoder mode, the first instruction group of the multiple instruction groups is assigned to the instruction queue corresponding to the default decoder group; wherein in the decoder mode microinstructions are obtained by decoder group decoding, and in the microinstruction cache mode microinstructions are obtained by microinstruction cache lookup;
  • before the decoding mode is switched to the microinstruction cache mode, if the last instruction decoded by a decoder group has no corresponding instruction position indicated by a switch mark, then after the decoding mode subsequently switches back to the decoder mode, the first instruction group is allocated to the instruction queue corresponding to the decoder group that decoded that last instruction;
  • before the decoding mode is switched to the microinstruction cache mode, if the last instruction decoded by a decoder group has a corresponding instruction position indicated by a switch mark, then after the decoding mode subsequently switches back to the decoder mode, the first instruction group is assigned to the instruction queue corresponding to the decoder group indicated by the switch mark corresponding to that last instruction.
  • the processor further includes a merging unit for merging the microinstructions decoded by the multiple decoder groups in a decoder mode according to a switch mark attached to the target microinstructions, so as to obtain microinstructions corresponding to the instruction fetch order.
  • one decoder group is correspondingly provided with one microinstruction queue
  • the decoder group is also used to store the decoded microinstructions in the corresponding microinstruction queue
  • the step of the merging unit merging the microinstructions decoded by the plurality of decoder groups according to the switch flag attached to the target microinstruction to obtain the microinstructions corresponding to the instruction fetching sequence includes:
  • merging is performed by switching among the microinstruction queues corresponding to the respective decoder groups according to the switch mark attached to the target microinstruction, to obtain microinstructions corresponding to the instruction fetch order.
  • the merging unit switches the merging of microinstructions in the microinstruction queues corresponding to the respective decoder groups according to the switching mark attached to the target microinstructions to obtain the microinstructions corresponding to the instruction fetching sequence, including the following steps:
  • Microinstructions are read starting from the microinstruction queue corresponding to the default decoder group. If the read microinstructions are accompanied by a switching mark, the next microinstruction queue for switching to read the microinstructions is determined based on the switching mark attached to the microinstructions, until the microinstructions in the microinstruction queues corresponding to each decoder group are read.
  • the step of the merging unit determining, according to the switch mark carried by a read microinstruction, the next microinstruction queue to switch to for reading includes:
  • the microinstruction queue for reading the microinstruction is switched in sequence from the microinstruction queues of each decoder group according to the switch mark carried by the microinstruction and the order of the microinstruction queues.
  • the microinstruction cache is further used to, in the microinstruction cache mode, save the acquired microinstructions to a microinstruction queue corresponding to the default decoder group;
  • the decoder group is also used to save the decoded microinstructions into the microinstruction cache;
  • wherein, if the instruction fetch request misses in the microinstruction cache while in the microinstruction cache mode, the decoder mode is entered.
  • a chip is also provided in an embodiment of the present disclosure, and the chip may include the above-mentioned processor.
  • An embodiment of the present disclosure also provides an electronic device, which may include the above-mentioned chip.


Abstract

Embodiments of the present disclosure provide a decoding method, a processor, a chip, and an electronic device. The method includes: generating an instruction fetch request carrying at least one switch mark, the switch mark at least indicating an instruction position at which a decoder group switch is performed; in response to microinstructions being obtained by decoder group decoding, obtaining the instruction stream fetched by the instruction fetch request, and determining, according to the switch mark carried by the instruction fetch request, the instruction position in the instruction stream at which a decoder group switch is performed; distributing the instruction stream to multiple decoder groups for parallel decoding according to the instruction position, and attaching the switch mark to the target microinstruction obtained by decoding the target instruction, the target instruction being the instruction corresponding to the instruction position; and, in response to microinstructions being obtained by microinstruction cache lookup, if the instruction fetch request hits in the microinstruction cache, obtaining the corresponding microinstructions from the microinstruction cache. The embodiments of the present disclosure can improve the decoding performance of a processor.

Description

A decoding method, processor, chip, and electronic device
This application claims priority to Chinese Patent Application No. 202211350246.2, filed on October 31, 2022, the entire disclosure of which is incorporated herein by reference as part of this application.
Technical Field
Embodiments of the present disclosure relate to a decoding method, a processor, a chip, and an electronic device.
Background Art
In a modern processor, an instruction goes through processing stages such as instruction fetch, decode, and execute; decoding is the process of parsing and translating a fetched instruction to obtain microinstructions (micro-ops, Uops). Decoding being an essential task of the processor, how to improve a processor's decoding performance has long been a subject of study for those skilled in the art.
Summary of the Invention
In view of this, embodiments of the present disclosure provide a decoding method, a processor, a chip, and an electronic device, so as to achieve parallel decoding of instructions and obtain a microinstruction sequence consistent with the instruction fetch order, thereby improving the decoding performance of the processor.
To achieve the above object, the embodiments of the present disclosure provide the following technical solutions.
An embodiment of the present disclosure provides a decoding method applied to a processor, the method including:
generating an instruction fetch request, the instruction fetch request carrying at least one switch mark, and the switch mark at least indicating an instruction position at which a decoder group switch is performed;
in response to microinstructions being obtained by decoder group decoding, obtaining the instruction stream fetched by the instruction fetch request, and determining, according to the switch mark carried by the instruction fetch request, the instruction position in the instruction stream at which a decoder group switch is performed; distributing the instruction stream to multiple decoder groups for parallel decoding according to the instruction position, and attaching a switch mark to the target microinstruction obtained by decoding the target instruction, the target instruction being the instruction corresponding to the instruction position;
in response to microinstructions being obtained by microinstruction cache lookup, if the instruction fetch request hits in the microinstruction cache, obtaining the microinstructions corresponding to the instruction fetch request from the microinstruction cache, the obtained microinstructions not carrying a switch mark.
An embodiment of the present disclosure further provides a processor, including:
a branch prediction unit for generating an instruction fetch request, the instruction fetch request carrying at least one switch mark, and the switch mark at least indicating an instruction position at which a decoder group switch is performed;
an instruction cache for, in response to microinstructions being obtained by decoder group decoding, obtaining the instruction stream fetched by the instruction fetch request, and determining, according to the switch mark carried by the instruction fetch request, the instruction position in the instruction stream at which a decoder group switch is performed;
an instruction distribution unit for distributing the instruction stream to multiple decoder groups for parallel decoding according to the instruction position;
decoder groups for decoding the assigned instructions to obtain microinstructions, the number of decoder groups being multiple, wherein when a decoder group decodes a target instruction, the target microinstruction obtained by decoding the target instruction carries a switch mark, the target instruction being the instruction corresponding to the instruction position;
a microinstruction cache for, in response to microinstructions being obtained by microinstruction cache lookup, if the instruction fetch request hits in the microinstruction cache, obtaining the microinstructions corresponding to the instruction fetch request from the microinstruction cache, the obtained microinstructions not carrying a switch mark.
An embodiment of the present disclosure further provides a chip including the processor described above.
An embodiment of the present disclosure further provides an electronic device including the chip described above.
The decoding method provided by the embodiments of the present disclosure is applicable to a processor, and an instruction fetch request may carry at least one switch mark that at least indicates the instruction position at which a decoder group switch is performed. Thus, when the processor obtains microinstructions by decoder group decoding, it can obtain the instruction stream fetched by the instruction fetch request and determine, according to the switch mark carried by the request, the instruction position in the instruction stream at which a decoder group switch is performed; the instruction stream is then distributed to multiple decoder groups for parallel decoding according to that instruction position, and the target microinstruction obtained by decoding the target instruction (the instruction corresponding to the instruction position) carries the switch mark, so that the microinstructions decoded by the multiple decoder groups can subsequently be merged according to the switch marks carried by the target microinstructions, yielding microinstructions corresponding to the instruction fetch order. When the processor obtains microinstructions by microinstruction cache lookup and the instruction fetch request hits in the microinstruction cache, the embodiments of the present disclosure may obtain the microinstructions corresponding to the request directly from the microinstruction cache rather than decoding instructions through the decoder groups.
In a processor having both a microinstruction cache and decoders, the embodiments of the present disclosure carry a switch mark in the instruction fetch request, the switch mark at least indicating the instruction position at which a decoder group switch is performed. In the decoder mode, in which microinstructions are obtained by decoder group decoding, the switch mark is passed through transparently via the target instruction and the target microinstruction, which supports parallel decoding by multiple decoder groups and allows the decoded microinstructions to be merged in the instruction fetch order, improving decoding efficiency. For the microinstruction cache mode, in which the processor obtains microinstructions by microinstruction cache lookup, the embodiments of the present disclosure may simply not act on the switch mark carried in the instruction fetch request, thereby remaining compatible with that mode. In a processor supporting both the decoder mode and the microinstruction cache mode, the embodiments of the present disclosure thus support parallel decoding and improve decoding performance.
Brief Description of the Drawings
To explain the embodiments of the present disclosure more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings described below are merely embodiments of the present disclosure, and those of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1A is an architectural block diagram of a processor;
FIG. 1B is another architectural block diagram of a processor;
FIG. 2A is an architectural block diagram of a processor provided by at least one embodiment of the present disclosure;
FIG. 2B is an optional flow chart of a decoding method provided by at least one embodiment of the present disclosure;
FIG. 2C is yet another architectural block diagram of a processor provided by at least one embodiment of the present disclosure;
FIG. 3A is a schematic diagram of splitting an instruction stream provided by at least one embodiment of the present disclosure;
FIG. 3B is a schematic diagram of merging microinstructions provided by at least one embodiment of the present disclosure;
FIG. 4A is an architectural block diagram of a processor having a microinstruction cache;
FIG. 4B is still another architectural block diagram of a processor provided by at least one embodiment of the present disclosure;
FIG. 5A is an optional schematic diagram of saving microinstructions in the microinstruction cache mode provided by at least one embodiment of the present disclosure;
FIG. 5B is another optional schematic diagram of saving microinstructions in the microinstruction cache mode provided by at least one embodiment of the present disclosure; and
FIG. 6 is another optional flow chart of a decoding method provided by at least one embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be described below clearly and completely with reference to the drawings of the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the protection scope of the present disclosure.
An instruction is a command that directs a computer to perform an operation, also called a machine instruction. Instructions coordinate the working relationships among the hardware components, reflect the basic functions the computer possesses, and are the smallest functional unit of its operation. When the computer executes an operation command, the processor must process the instruction and convert it into machine language that the machine can recognize. Processors generally use pipeline technology to process instructions.
In a processor pipeline, an instruction goes through stages such as instruction fetch, decode, and execute. Instruction fetch reads, from the processor's cache or main memory, the instructions corresponding to the running program; decoding parses the fetched instructions to determine their opcodes and/or address codes; execution performs the instruction operations according to the obtained opcodes and/or address codes so that the program runs. Because branch instructions that change the program flow exist among the instructions, and to avoid the pipeline delay caused by the processor having to wait for a branch instruction's execution result before deciding what to fetch next, a branch prediction unit may further be placed at the front end of the instruction-processing pipeline to perform branch prediction on instructions.
FIG. 1A exemplarily shows an architectural block diagram of a processor, which includes a branch prediction unit 101, an instruction cache 102, and a decoder group 103.
The branch prediction unit 101 is a digital circuit that performs branch prediction on instructions and generates instruction fetch requests based on the branch prediction results, for example whether the current instruction is a branch instruction and the branch result (direction, address, target address, etc.) of a branch instruction. In one implementation, the branch prediction unit may predict branches based on the historical execution information and results of branch instructions, thereby obtaining the fetch address range of instructions and generating an instruction fetch request. The instruction fetch request generated by the branch prediction unit contains the fetch addresses of several instructions and is used to read the corresponding instructions from the instruction cache 102.
The instruction cache 102 stores instructions, mainly in instruction cache blocks. In the instruction cache, each instruction cache block corresponds to a Tag used to identify that block, so that when fetching according to a fetch request, the instruction cache can locate the corresponding block based on the Tag. The fetch address generated by the branch prediction unit may correspond to multiple instructions, which form an instruction stream. Optionally, the instruction cache 102 may be the portion of the processor's level-one cache used to store instructions.
It should be noted that the fetch address generated by the branch prediction unit contains a Tag field (address tag field) and an Index field (address index field). The Index field of the fetch address selects multiple instruction cache blocks in the instruction cache and reads out their Tags, which are then matched against the Tag field of the fetch address to determine where the instruction corresponding to the fetch address is stored in the instruction cache (i.e., the location of the instruction cache block), so that the corresponding instruction can be read.
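To make the Tag/Index lookup concrete, the following minimal Python sketch models a set-associative lookup of this kind; the field widths, the sets mapping, and all names are illustrative assumptions made for this sketch and are not details fixed by the present disclosure.

    def icache_lookup(sets, fetch_addr, index_bits=6, offset_bits=6):
        # The Index field of the fetch address selects a set; the Tags of the
        # ways in that set are compared with the Tag field of the fetch address.
        index = (fetch_addr >> offset_bits) & ((1 << index_bits) - 1)
        tag = fetch_addr >> (offset_bits + index_bits)
        for way_tag, cache_block in sets.get(index, []):
            if way_tag == tag:
                return cache_block   # hit: this block stores the instructions
        return None                  # miss: fetch from lower memory levels

A set-associative layout is assumed here only to illustrate how the Index selects candidate blocks and how the Tag comparison selects among them.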
The decoder group 103 parses and translates instructions; decoding an instruction through the decoder group yields the decoded instruction, which may be machine-executable operation information derived from translating the instruction, such as a machine-executable Uop (micro-op) formed by control fields. That is, the decoder can decode an instruction to obtain microinstructions.
The processor architecture shown in FIG. 1A uses a single decoder group to decode instructions; limited by the throughput of that decoder group, the decoding efficiency of instructions is hard to improve effectively. On this basis, processors that use multiple decoder groups to decode instructions in parallel have emerged. FIG. 1B exemplarily shows another architectural block diagram of a processor. Referring to FIGS. 1A and 1B, the processor in FIG. 1B is provided with multiple decoder groups 1031 to 103n, where the specific value of n depends on the specific processor design and is not limited by the embodiments of the present disclosure. With the multiple decoder groups 1031 to 103n, the instruction stream fetched by the instruction cache 102 according to a fetch request can be distributed to the multiple decoder groups for decoding, so that the decoder groups decode instructions in parallel and each outputs the decoded microinstructions, improving decoding efficiency.
In one example, the multiple decoder groups may be two decoder groups, for example decoder group 0 and decoder group 1, where decoder group 1 can decode instructions while decoder group 0 is decoding; for example, within one clock cycle of the processor, decoder group 0 and decoder group 1 can both perform decoding and obtain microinstructions, achieving parallel decoding of instructions. Moreover, decoder group 0 and decoder group 1 need not decode in instruction order but can support parallel decoding of instructions. It should be noted that, in practical applications, the processor may be provided with more than two decoder groups as needed; for ease of understanding, the embodiments of the present disclosure only show an example with two decoder groups.
However, parallel decoding by multiple decoder groups differs from sequential decoding by a single decoder group and faces more complex situations, such as how the instruction stream fetched by the instruction cache is distributed to the multiple decoder groups, and how the microinstructions decoded by the multiple decoder groups are merged so that the microinstructions finally executed correspond to the instruction fetch order. To solve the above problems, the embodiments of the present disclosure provide a further improved processor architecture. FIG. 2A exemplarily shows an architectural block diagram of a processor provided by at least one embodiment of the present disclosure. Referring to FIGS. 1B and 2A, the processor in FIG. 2A is further provided with an instruction distribution unit 201 for splitting the instruction stream fetched by the instruction cache 102, thereby obtaining multiple instruction groups to be distributed to the decoder groups for parallel decoding. Meanwhile, to give the splitting of the instruction stream a basis, the embodiments of the present disclosure carry a switch mark in the instruction fetch request, the switch mark at least indicating the instruction position at which a decoder group switch is performed; by passing the switch mark through to the process in which the instruction cache fetches the instruction stream according to the fetch request, the instruction distribution unit 201 can split the instruction stream at the decoder-group-switch positions and distribute the resulting instruction groups to multiple decoder groups for parallel decoding, providing technical support for parallel decoding of instructions across multiple decoder groups. Further, the switch mark can also be passed through, via the instruction corresponding to the instruction position, into the microinstructions, so that after the decoder groups obtain the decoded microinstructions, the microinstructions can be merged based on the switch marks they carry, providing technical support for a merge result that corresponds to the fetch order.
As an optional implementation, FIG. 2B exemplarily shows an optional flow chart of the decoding method provided by at least one embodiment of the present disclosure. This method flow can be regarded as the flow when the processor decodes with decoder groups, i.e., when microinstructions are obtained by decoder group decoding. Referring to FIG. 2B, the flow may include the following steps.
In step S21, an instruction fetch request is generated, the instruction fetch request carrying at least one switch mark, and the switch mark at least indicating the instruction position at which a decoder group switch is performed.
In some embodiments, the instruction fetch request generated by the branch prediction unit may carry a switch mark. In branch prediction, the predicted direction falls into two kinds: branch taken and branch not taken; accordingly, the fetch addresses produced by the branch prediction unit fall into two kinds: fetch addresses corresponding to taken branch directions, and fetch addresses corresponding to not-taken branch directions. As an optional implementation, the embodiments of the present disclosure may set switch marks according to the address positions corresponding to taken branch directions and generate an instruction fetch request carrying at least one switch mark.
In some embodiments, the switch mark may also be set by other mechanisms, not limited to the branch prediction unit setting it in the fetch request based on branch prediction. As an optional implementation, after the branch prediction unit generates an instruction fetch request (not carrying a switch mark), other components of the processor (for example, the instruction cache) may set the switch mark in the fetch request. In one implementation example, after obtaining the fetch request from the branch prediction unit, the instruction cache may set a switch mark in the fetch request based on instruction boundaries, where an instruction boundary may denote the end position of an instruction.
In step S22, the instruction stream fetched by the instruction fetch request is obtained, and the instruction position at which a decoder group switch is performed in the instruction stream is determined according to the switch mark carried by the request.
The instruction cache obtains the fetch request from the branch prediction unit and fetches instructions according to the fetch addresses in the request, obtaining the instruction stream corresponding to the request. If the fetch request carries a switch mark, and the switch mark at least indicates the instruction position at which a decoder group switch is performed, the decoder-group-switch position in the instruction stream can be determined according to the switch mark carried by the request.
It can be understood that an instruction stream is a sequence containing several instructions; if no clear boundary exists in the sequence, the end positions of the instructions in the sequence cannot be determined. In the embodiments of the present disclosure, by determining the decoder-group-switch position in the instruction stream according to the switch mark carried by the fetch request, a boundary of the instruction stream can be determined, and that instruction position can serve as an end position. If the instruction corresponding to the instruction position is taken as the target instruction, the instruction position is the end position of the target instruction; thus, according to the instruction position indicated by the switch mark carried by the fetch request, the end position of the target instruction in the instruction stream can be determined.
It should be noted that, since the switch mark is used at least to indicate the instruction position at which a decoder group switch is performed, where it is placed in the fetch request neither affects the instruction stream fetched by the instruction cache nor damages the structure of the fetched stream. Moreover, the embodiments of the present disclosure do not limit the specific placement or representation of the switch mark; it may, for example, reside in an indication field outside the instruction stream fetched by the instruction cache, or be represented by a switch mark indication bit.
In step S23, the instruction stream is distributed to multiple decoder groups for parallel decoding according to the instruction position, and a switch mark is attached to the target microinstruction obtained by decoding the target instruction, the target instruction being the instruction corresponding to the instruction position.
In some embodiments, based on the instruction stream fetched by the instruction cache, the instruction distribution unit may distribute the stream to multiple decoder groups for parallel decoding according to the instruction position (i.e., the decoder-group-switch position). In an optional example, the instruction distribution unit may split the instruction stream at the instruction position to obtain multiple instruction groups, and then distribute the multiple instruction groups to multiple decoder groups, which decode in parallel.
Further, in some embodiments, the instruction distribution unit splitting the instruction stream according to the instruction position may consist of splitting the stream into multiple instruction groups with the instruction position as the boundary, wherein the target instruction serving as the boundary between two adjacent instruction groups is split into the preceding group. When distributing the instruction stream to multiple decoder groups for parallel decoding, the instruction distribution unit may then allocate a decoder group to the following instruction group according to the switch mark corresponding to the target instruction split into the preceding group, the decoder group allocated to the preceding group being different from that allocated to the following group, wherein the switch mark corresponding to the target instruction may be the switch mark indicating the end position of the target instruction.
It should be noted that, after the multiple decoder groups decode their assigned instruction groups into microinstructions, in order for the microinstructions decoded by the multiple decoder groups to be merged in the fetch order, the embodiments of the present disclosure may, by parsing and translating the target instruction corresponding to the instruction position, attach a switch mark to the target microinstruction obtained by decoding the target instruction. As an optional implementation, the target microinstruction obtained by decoding the target instruction may be a combination of two microinstructions, one not carrying the switch mark and one carrying it.
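The optional implementation just described, in which a target instruction decodes into a pair of microinstructions of which only the second carries the switch mark, can be sketched as follows. Uop, decode_instruction, and the payload strings are hypothetical names used only for illustration and are not defined by the present disclosure.

    from dataclasses import dataclass

    @dataclass
    class Uop:
        payload: str        # stand-in for the machine-executable control fields
        switch_mark: bool   # True if the micro-op carries the pass-through mark

    def decode_instruction(inst, is_target):
        # An ordinary instruction yields unmarked micro-ops; a target instruction
        # yields two micro-ops, the second carrying the switch mark.
        if is_target:
            return [Uop(inst + ".u0", False), Uop(inst + ".u1", True)]
        return [Uop(inst + ".u0", False)]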
In some embodiments, after obtaining the microinstructions decoded by the multiple decoder groups, the embodiments of the present disclosure may further merge the microinstructions decoded by the multiple decoder groups according to the switch marks attached to the target microinstructions, obtaining microinstructions corresponding to the fetch order. It can be understood that, to carry out the complete operation of a program, the microinstructions decoded by the multiple decoder groups must be merged, and the order of the merged microinstruction sequence must also correspond to the fetch order.
In the embodiments of the present disclosure, when the instruction fetch request carries a switch mark, the switch mark at least indicates the instruction position at which a decoder group switch is performed. The instruction stream fetched by the request can thus be obtained, the decoder-group-switch position in the stream determined according to the switch mark carried by the request, the stream distributed to multiple decoder groups for parallel decoding according to that position, and the target microinstruction obtained by decoding the target instruction (the instruction corresponding to the position) made to carry the switch mark. The embodiments of the present disclosure can indicate the decoder-group-switch position through the switch mark and pass the mark from the fetch request through to the instruction position in the fetched stream, so that the stream is split at that position and distributed to multiple decoder groups for parallel decoding, effectively improving the processor's decoding efficiency. Further, by parsing and translating the target instruction serving as the boundary, the switch mark is passed through into the target microinstruction, so that after the microinstructions decoded by the multiple decoder groups are obtained, they can be merged according to the switch marks into microinstructions corresponding to the fetch order, facilitating accurate execution of the microinstructions.
In some embodiments, FIG. 2C shows yet another architectural block diagram of a processor provided by at least one embodiment of the present disclosure. Referring to FIGS. 2A and 2C, in the processor of FIG. 2C each decoder group is provided with a corresponding instruction queue and microinstruction queue; for example, decoder groups 1031 to 103n are provided with instruction queues 2021 to 202n and microinstruction queues 2031 to 203n respectively, one instruction queue and one microinstruction queue per decoder group. A decoder group's instruction queue stores the instruction group assigned to it by the instruction distribution unit, i.e., the instructions the decoder group has yet to decode; for example, instruction queue 2021 stores the instructions to be decoded by decoder group 1031, and so on, with instruction queue 202n storing those of decoder group 103n. In achieving parallel decoding by multiple decoder groups, as long as instructions are saved into the decoder groups' instruction queues faster than the decoder groups decode them, the decoder groups can continuously take instructions from their queues and decode, achieving parallel decoding by multiple decoder groups. A decoder group's microinstruction queue stores the microinstructions it decodes; for example, microinstruction queue 2031 stores the microinstructions decoded by decoder group 1031, and so on, with microinstruction queue 203n storing those of decoder group 103n. To merge the microinstructions decoded by the decoder groups, the processor is further provided with a merging unit 204, which can read microinstructions from the multiple microinstruction queues and merge them so that the order of the merged microinstructions corresponds to the fetch order.
Based on the principle of the method flow in FIG. 2B, the processor architecture in FIG. 2C can, during decoding, pass the switch mark in the fetch request through the corresponding instructions into the microinstructions, achieving parallel decoding by multiple decoder groups and merging of the microinstructions in fetch order. An optional concrete process is as follows.
The branch prediction unit 101 generates a fetch request carrying a switch mark and issues it to the instruction cache 102, so that the corresponding instruction stream is read from the instruction cache 102 according to the request. The switch mark of the fetch request indicates the instruction position for decoder group switching and does not affect the lookup, by the fetch address, of the instruction corresponding to the instruction position in the instruction cache.
The instruction cache 102 reads the instruction stream according to the fetch address of the request, the switch mark carried by the request having no effect on the cache's instruction fetching. Having read the instruction stream, the instruction cache 102 can determine, according to the switch mark carried by the request, the instruction position in the stream at which a decoder group switch is performed.
The instruction distribution unit 201 splits the instruction stream at the instruction position into multiple instruction groups and assigns the multiple groups to the instruction queues 2021 to 202n corresponding to decoder groups 1031 to 103n. As an optional implementation, the switch marks may indicate multiple instruction positions, so multiple positions may be determined in the stream; when splitting the stream, the instruction distribution unit may perform one split each time it recognizes such a position in the stream, splitting the target instruction at that position into the preceding group, and allocating a decoder group to the following group according to the switch mark of the target instruction split into the preceding group, with the decoder group of the preceding group differing from that of the following group. In this way, the stream is split into multiple instruction groups with the positions as boundaries, and the multiple groups are distributed to multiple decoder groups for parallel decoding.
In some embodiments, based on the instruction queues provided for the decoder groups to store instructions awaiting decoding, the instruction distribution unit 201 may save the first of the multiple instruction groups into the instruction queue corresponding to the default decoder group; for each non-first group among the multiple groups, the instruction distribution unit 201 may determine, from the multiple decoder groups and according to the switch mark of the target instruction in the preceding group, a decoder group different from the one allocated to the preceding group, and then save that non-first group into the instruction queue of the determined decoder group.
In an optional implementation of allocating decoder groups to the non-first instruction groups, the decoder groups allocated to the non-first groups may be determined in sequence from the multiple decoder groups, in the order of the decoder groups, according to the switch mark of the target instruction in the group preceding each non-first group. For example, if the first decoder group 1031 is the default decoder group, the first group after splitting is allocated to decoder group 1031, and the non-first groups are then allocated in decoder group order to the decoder groups after 1031, up to decoder group 103n.
It can be understood that the allocation principle by which the instruction distribution unit allocates decoder groups to instruction groups may be allocation in decoder group order, allocating to the following group, according to the switch mark of the target instruction in the preceding group, a decoder group that differs from the preceding group's; this achieves a reasonable allocation of instruction groups among the decoder groups' instruction queues, ensures each decoder group can read instructions awaiting decoding from its corresponding queue, and achieves parallel decoding by multiple decoder groups.
It should be noted that, in another optional implementation, the switch mark may also contain information on the decoder group to be switched to, specifically indicating the decoder group to switch to. Based on the switch mark of the target instruction in the preceding group, the instruction distribution unit can then allocate a specific decoder group to the following group, achieving allocation out of decoder group order. For example, for decoder groups 1031 to 103n, in the order of appearance of the switch marks: if the first switch mark records the information of decoder group 103n, then after the first instruction group is allocated to the default decoder group, the following group is allocated to decoder group 103n as specifically indicated by the first switch mark, and decoder group 103n performs the decoding; if the next switch mark specifically indicates decoder group 1031, the next group is allocated decoder group 1031, which performs the decoding. In this way, the instruction distribution unit allocates the corresponding decoder groups to the instruction groups according to the decoder groups specifically indicated by the switch marks, until the instruction groups are fully allocated.
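A sketch of this allocation rule is given below, assuming each switch mark either names the decoder group to switch to or is left implicit (None), in which case the next decoder group in order is used; all names are illustrative assumptions for this sketch only.

    def allocate_decoder_groups(instruction_groups, marks, n_groups, default=0):
        # marks[i] is the switch mark of the target instruction ending group i:
        # an explicit decoder-group index, or None for "next group in order".
        allocation, group = [], default
        for i, chunk in enumerate(instruction_groups):
            allocation.append((group, chunk))
            if i < len(marks):
                group = marks[i] if marks[i] is not None else (group + 1) % n_groups
        return allocation   # list of (decoder group index, instruction group)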
It should further be noted that the default decoder group may be the first decoder group in allocation order or a decoder group designated by the processor; the present disclosure imposes no particular restriction on this.
Continuing with FIG. 2C, decoder groups 1031 to 103n save the decoded microinstructions into microinstruction queues 2031 to 203n correspondingly; if the instruction decoded by one of decoder groups 1031 to 103n is a target instruction, that group can decode the target microinstruction corresponding to the target instruction and, according to the instruction position of the corresponding switch mark in the target instruction, attach the switch mark to the target microinstruction.
The merging unit 204 reads microinstructions from microinstruction queues 2031 to 203n and merges the read microinstructions to obtain an executable microinstruction sequence. When the merging of the microinstructions in queues 2031 to 203n is carried out by the merging unit 204, the merging unit 204 may read the queues in sequence, based on the switch marks attached to the target microinstructions, and merge. For example: reading in microinstruction queue order, the merging unit reads from microinstruction queue 2031; upon reading, in queue 2031, a target microinstruction carrying a switch mark, it switches sequentially to the next queue after 2031 and reads microinstructions there; upon reading a marked target microinstruction in that queue, it continues switching to the next queue to read, and so on, until all microinstructions have been read.
In some embodiments, the first microinstruction queue from which the merging unit reads may correspond to the instruction queue to which the first instruction group was assigned (for example, the microinstruction queue and instruction queue belonging to the same decoder group). In one example, if the instruction distribution unit assigns the first instruction group to instruction queue 2021 of decoder group 1031, the merging unit, when merging microinstructions, first reads from microinstruction queue 2031, supporting a merged microinstruction sequence that corresponds to the fetch order.
In other embodiments, the switch mark of the target instruction may also specifically indicate the decoder group to switch to, in which case the switch mark attached to the target microinstruction decoded from the target instruction may specifically indicate the microinstruction queue to switch to. When merging, the merging unit can then switch which microinstruction queue it reads based on the marks attached to the target microinstructions, reading out of queue order. In one example, suppose a target microinstruction resides in microinstruction queue 2031 and its attached switch mark specifically indicates queue 203n as the queue to switch to; when the merging unit reads that switch mark in queue 2031, it can switch to queue 203n to continue reading; if in queue 203n it reads a target microinstruction whose attached mark specifically indicates queue 2031, it switches to queue 2031 to read microinstructions.
To facilitate understanding of the principle of splitting the instruction stream based on the decoder-group-switch positions in the stream, an introduction with two decoder groups follows. FIG. 3A exemplarily shows a schematic diagram of splitting an instruction stream. As shown in FIG. 3A, the instruction stream includes instructions 310 to 31m, m being the number of instructions in the stream, which depends on the actual situation and is not limited by the embodiments of the present disclosure. Among instructions 310 to 31m, the decoder-group-switch position indicated by the switch mark is shown by the dotted arrow in the figure; the instruction 31k corresponding to that position may be the target instruction, and the position may be the end position of target instruction 31k, where the switch mark may, for example, be set by the branch prediction unit.
As shown in FIG. 3A, the embodiments of the present disclosure may take the instruction position as the boundary for splitting and split the instruction stream 310 to 31m. Target instruction 31k is adjacent to instruction 31k+1; when splitting the stream at the instruction position, target instruction 31k is split into the preceding group (i.e., instructions 310 to 31k form one group) and instruction 31k+1 into the following group (i.e., instructions 31k+1 to 31m form one group), yielding two adjacent, different instruction groups. When there are multiple switch marks, there are multiple instruction positions in the stream and correspondingly multiple target instructions, and the stream can be split in this way into multiple instruction groups.
Referring to FIG. 3A, the embodiments of the present disclosure may, in the order of decoder group 0 and decoder group 1, assign the first group after splitting (i.e., instruction 310 to target instruction 31k) to the instruction queue corresponding to decoder group 0, which performs the decoding of instructions 310 to target instruction 31k. Since target instruction 31k is the instruction corresponding to the decoder-group-switch position, the instructions after target instruction 31k must be assigned a decoder group different from decoder group 0; thus, the group of instructions 31k+1 to 31m is assigned to the instruction queue corresponding to decoder group 1, which performs the decoding of instructions 31k+1 to 31m.
To facilitate understanding of the principle of merging microinstructions based on the mark-carrying microinstructions, an introduction with two decoder groups follows. FIG. 3B exemplarily shows a schematic diagram of merging microinstructions.
Decoder group 0 decodes the instruction group from instruction 310 to target instruction 31k, obtaining microinstruction 320 to target microinstruction 32k (not shown in the figure), where decoder group 0 parses and translates target instruction 31k, and the resulting target microinstruction 32k is a combination of microinstruction 32k' and microinstruction 32k'': 32k' carries no switch mark and 32k'' carries the switch mark. Decoder group 1 decodes the instruction group from instruction 31k+1 to instruction 31m, obtaining microinstructions 32k+1 to 32m. Microinstructions 320 to target microinstruction 32k are stored in decoder group 0's microinstruction queue, and microinstructions 32k+1 to 32m in decoder group 1's microinstruction queue.
When merging the microinstructions, in decoder group order, microinstructions may first be read from decoder group 0's microinstruction queue; upon reading the mark-carrying microinstruction 32k'' within target microinstruction 32k, reading switches to decoder group 1's microinstruction queue. That is, when a mark-carrying microinstruction is read from the queue currently being read, reading switches to the next microinstruction queue, until all microinstructions have been read. Since the fetched instruction stream is split according to the positions indicated by the switch marks corresponding to the target instructions and distributed to multiple decoder groups for parallel decoding, when reading the decoded microinstructions one can switch among the decoder groups' microinstruction queues according to the switch marks attached to the target microinstructions, so that the microinstructions read correspond to the fetch order.
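The two-decoder-group example of FIG. 3A and FIG. 3B can be reproduced end to end with the following toy run. The instruction names and the tuple encoding of a micro-op as (payload, has_switch_mark) are assumptions made only for this sketch, and for brevity each instruction decodes into a single micro-op here rather than the two-micro-op pair described above.

    # Six instructions; the switch mark ends the group at i2 (the target).
    instrs = ["i0", "i1", "i2", "i3", "i4", "i5"]
    group0, group1 = instrs[:3], instrs[3:]          # split at the marked position
    uop_q0 = [("u_" + i, i == "i2") for i in group0] # i2's micro-op carries the mark
    uop_q1 = [("u_" + i, False) for i in group1]
    queues, current, merged = [uop_q0, uop_q1], 0, []
    while any(queues):
        payload, has_mark = queues[current].pop(0)
        merged.append(payload)
        if has_mark:
            current = 1 - current                    # switch to the other queue
    assert merged == ["u_" + i for i in instrs]      # fetch order is restored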
The above describes how, when the processor has multiple decoder groups, the switch mark of the fetch request indicates the decoder-group-switch position and is passed, according to the fetch request, through the target instruction in the instruction stream into the microinstructions decoded by the decoder groups, thereby supporting parallel decoding by multiple decoder groups and in-order merging of the microinstructions, effectively improving the processor's decoding efficiency. Although the processor's multiple decoder groups can decode instructions in parallel, obtaining microinstructions still requires going through fetch and decode, which makes the process of obtaining microinstructions relatively cumbersome; on this basis, to increase the speed at which microinstructions are obtained, the embodiments of the present disclosure further provide a high-performance processor with a Micro-Op Cache (OC, microinstruction cache).
FIG. 4A is an architectural block diagram of a processor with a microinstruction cache. As shown in FIG. 4A, the branch prediction unit 101 issues the generated fetch request to the microinstruction cache 104, which may be used to cache microinstructions. In some embodiments, the microinstruction cache 104 may include multiple entries, each capable of holding multiple microinstructions. A fetch request generated by the branch prediction unit 101 may correspond to multiple microinstruction cache entries.
When a fetch request generated by the branch prediction unit fetches microinstructions from the microinstruction cache, the cache may compare the start address of the request against the address of the first microinstruction of every entry for a hit; on a hit, the microinstructions of the first entry are obtained. If the end address of the last microinstruction in that entry is smaller than the end address of the request's address range, the end address in the address range corresponding to that last microinstruction must further be compared against the first-microinstruction addresses of all entries for a hit; on a hit, the microinstructions of the second entry are obtained. The above process repeats until the end address of the address range in the fetch request is no larger than the end address of the last microinstruction in an entry, whereupon microinstructions have been read from the microinstruction cache based on the fetch request.
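The entry-by-entry hit check described above can be sketched as follows, assuming entries maps the address of an entry's first microinstruction to the pair (microinstructions of the entry, end address of its last microinstruction); the function and parameter names are illustrative assumptions.

    def micro_op_cache_lookup(entries, start_addr, end_addr):
        uops, addr = [], start_addr
        while addr in entries:            # hit against an entry's first micro-op
            entry_uops, entry_end = entries[addr]
            uops.extend(entry_uops)
            if end_addr <= entry_end:     # the request's range is fully covered
                return uops
            addr = entry_end              # match the next entry from this address
        return None                       # miss: no micro-ops can be output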
In some embodiments, when all the addresses in the fetch request generated by the branch prediction unit hit in the microinstruction cache, the microinstruction cache can output the corresponding microinstructions; when the start address in the fetch request fails to hit a microinstruction cache entry, the microinstruction cache cannot output microinstructions.
Based on a processor having both a microinstruction cache and decoder groups, the processor may include multiple decoding modes, the multiple decoding modes including a decoder mode and a microinstruction cache mode, wherein in the decoder mode microinstructions are obtained by decoder group decoding, and in the microinstruction cache mode microinstructions are obtained by microinstruction cache lookup. FIG. 4B exemplarily shows still another architectural block diagram of a processor provided by at least one embodiment of the present disclosure.
As shown in FIG. 4B, the processor of the embodiments of the present disclosure is compatible with both the decoder mode and the microinstruction cache mode; the mark-carrying fetch request generated by the branch prediction unit 101 can obtain microinstructions through two paths, which may respectively be the path in which microinstructions are obtained by decoder group decoding (corresponding to the decoder mode) and the path in which microinstructions are obtained by microinstruction cache lookup (corresponding to the microinstruction cache mode).
In response to microinstructions being obtained by decoder group decoding, i.e., in the decoder mode, the branch prediction unit 101 issues the mark-carrying fetch request to the instruction cache 102, which fetches the instruction stream according to the addresses in the request; the instruction distribution unit 201 splits the stream at the instruction positions in it into multiple instruction groups and assigns the resulting groups to the instruction queues 2021 to 202n corresponding to the multiple decoder groups. The multiple decoder groups 1031 to 103n read instructions awaiting decoding from their respective instruction queues 2021 to 202n and perform decoding to obtain microinstructions, then save the decoded microinstructions into the corresponding microinstruction queues; moreover, given the existence of the microinstruction cache 104, the decoded microinstructions may also be cached into the microinstruction cache.
In response to microinstructions being obtained by microinstruction cache lookup, i.e., in the microinstruction cache mode, the branch prediction unit 101 issues the mark-carrying fetch request to the microinstruction cache 104, so that, according to the hit of the request in the microinstruction cache, the cache outputs the corresponding microinstructions; given the existence of the microinstruction queues, the obtained microinstructions can be saved into the microinstruction queue corresponding to the default decoder group.
In some embodiments, if the fetch request hits in the microinstruction cache, the obtained microinstructions are saved into the microinstruction queue corresponding to the default decoder group, where that queue may be the microinstruction queue of the first decoder group determined in order, the microinstruction queue of a decoder group designated by the processor, or a decoder group's microinstruction queue determined according to whether the last instruction decoded by a decoder group, before the decoding mode switched to the microinstruction cache mode, had a corresponding switch-mark-indicated instruction position.
As an optional implementation, if, before the processor's decoding mode switches from the decoder mode to the microinstruction cache mode, the last instruction decoded by a decoder group has no corresponding switch-mark-indicated instruction position, the microinstructions read from the microinstruction cache are saved to the microinstruction queue of the decoder group that decoded the last instruction before the mode switch. FIG. 5A is an optional schematic diagram of saving microinstructions after switching to the microinstruction cache mode in at least one embodiment of the present disclosure. As shown in FIG. 5A, in the decoder mode, the instruction stream (i.e., instructions 510 to 51m) is read from the instruction cache according to the fetch request; the decoder-group-switch position indicated by the switch mark is shown by the dotted line in the figure, so instruction 51k corresponding to that position is the target instruction, and no decoder-group-switch position exists at the end of the stream. The stream is split at the instruction position into two adjacent, different instruction groups (i.e., instructions 510 to 51k, and instructions 51k+1 to 51m), which are assigned corresponding decoder groups; moreover, the last instruction 51m is decoded by decoder group 1 into microinstruction 52m, which carries no switch mark and is saved to the corresponding microinstruction queue 1. After the decoding mode switches to the microinstruction cache mode, the fetch address is looked up and hits in the microinstruction cache, microinstructions are read out (i.e., microinstructions 530 to 53m), and the read microinstructions 530 to 53m are saved correspondingly into microinstruction queue 1.
As another optional implementation, if, before the processor's decoding mode switches from the decoder mode to the microinstruction cache mode, the last instruction decoded by a decoder group does have a corresponding switch-mark-indicated decoder-group-switch position, the microinstructions read from the microinstruction cache are saved, according to the switch mark corresponding to the last instruction, to the microinstruction queue of the decoder group the switch mark indicates switching to. FIG. 5B is another optional schematic diagram of saving microinstructions after switching to the microinstruction cache mode in at least one embodiment of the present disclosure. As shown in FIG. 5B, in the decoder mode, the instruction stream (i.e., instructions 510 to 51m) is read from the instruction cache according to the fetch request, where the last instruction 51m has a corresponding switch-mark-indicated decoder-group-switch position (the position indicated by switch mark 2 in the figure). After the stream is split and decoder groups are assigned, instructions 510 to 51k are decoded by decoder group 0 and instructions 51k+1 to 51m by decoder group 1; of the microinstructions 52m' and 52m'' obtained by decoder group 1 decoding the last instruction 51m, microinstruction 52m'' carries the switch mark, and the microinstructions decoded by decoder group 1 are saved to its corresponding microinstruction queue 1. After the decoding mode switches to the microinstruction cache mode, the fetch address is looked up and hits in the microinstruction cache, microinstructions are read out (i.e., microinstructions 530 to 53m), and the read microinstructions 530 to 53m are saved into the microinstruction queue (i.e., microinstruction queue 0) corresponding to the decoder group (i.e., decoder group 0) that the switch mark of the last instruction 51m indicated switching to before the mode switch.
It can be understood that, since the switch mark carried in the fetch request at least indicates the decoder-group-switch position, while the microinstruction cache mode does not decode instructions through the decoder groups, the microinstruction cache may not respond to the switch mark carried in the fetch request, and the read microinstructions likewise carry no switch marks.
In other embodiments, in the microinstruction cache mode, if the fetch request misses in the microinstruction cache, the decoder mode is entered: the multiple decoder groups of the decoder mode decode the instructions corresponding to the fetch request in parallel to obtain microinstructions, and the microinstructions decoded in the decoder mode may be saved into the microinstruction cache. After the switch to the decoder mode, once the instruction stream fetched by the fetch request has been split at the instruction positions indicated by the switch marks, the first of the resulting multiple instruction groups is assigned to the instruction queue corresponding to the default decoder group; that instruction queue may be the instruction queue of the first decoder group determined in order, the instruction queue of a decoder group designated by the processor, or the instruction queue of a decoder group determined according to whether the last instruction decoded by a decoder group, before entering the microinstruction cache mode, had a corresponding switch-mark-indicated instruction position.
As an optional implementation, if, before the processor's decoding mode switched to the microinstruction cache mode, the last instruction decoded by a decoder group has no corresponding switch-mark-indicated instruction position, then after the microinstruction cache mode switches back to the decoder mode, the first of the instruction groups obtained by splitting the stream is assigned to the instruction queue of the decoder group that decoded the last instruction before the switch to the microinstruction cache mode. Continuing with FIG. 5A, before the processor's decoding mode switched to the microinstruction cache mode, the last instruction 51m decoded by decoder group 1 has no corresponding switch-mark-indicated instruction position; thus, when, after the switch to the microinstruction cache mode, the fetch address in the fetch request misses in the microinstruction cache, the decoding mode switches back to the decoder mode, and the first instruction group (i.e., instructions 510 to 51k) obtained by splitting the stream at the position indicated by the switch mark carried by the request (the position shown by the dotted line in FIG. 5A) is assigned to instruction queue 1 corresponding to decoder group 1.
As another optional implementation, if, before the processor's decoding mode switched to the microinstruction cache mode, the last instruction decoded by a decoder group has a corresponding switch-mark-indicated instruction position, then after the microinstruction cache mode switches back to the decoder mode, the first of the instruction groups obtained by splitting the stream is assigned to the instruction queue of the decoder group that the last instruction's switch mark indicated switching to before the switch to the microinstruction cache mode. Continuing with FIG. 5B, before the processor's decoding mode switched to the microinstruction cache mode, the last instruction 51m decoded by decoder group 1 has a corresponding switch-mark-indicated instruction position; thus, when, after the switch to the microinstruction cache mode, the fetch address in the fetch request misses in the microinstruction cache, the decoding mode switches back to the decoder mode, and the first instruction group (i.e., instructions 510 to 51k) obtained by splitting the stream at the position indicated by the switch mark carried by the request (the position of switch mark 2 in FIG. 5B) is assigned to instruction queue 0 corresponding to decoder group 0, the group that the switch mark of the last instruction 51m indicated switching to.
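Both optional implementations (FIG. 5A and FIG. 5B) reduce to a single selection rule, sketched below as a minimal Python function; the function name and parameters are hypothetical and serve only to illustrate the rule.

    def first_group_queue_after_fallback(last_group, pending_mark):
        # last_group: decoder group that decoded the last instruction before the
        # switch to micro-op cache mode; pending_mark: None if that instruction
        # had no switch-mark-indicated position, else the group the mark names.
        if pending_mark is None:
            return last_group     # FIG. 5A case: stay with the last decoding group
        return pending_mark       # FIG. 5B case: honor the pending switch mark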
In a processor having a microinstruction cache and decoders, the embodiments of the present disclosure carry a switch mark in the fetch request, the switch mark at least indicating the decoder-group-switch position; thus, in the decoder mode (corresponding to microinstructions obtained by decoder group decoding), the switch mark can be passed through via the target instruction and the target microinstruction, supporting parallel decoding by multiple decoder groups and improving decoding efficiency; and for the processor's microinstruction cache mode (corresponding to microinstructions obtained by microinstruction cache lookup), the embodiments of the present disclosure may not process based on the switch mark carried in the fetch request, thereby remaining compatible with the processor's microinstruction cache mode. In a processor supporting both the decoder mode and the microinstruction cache mode, the embodiments of the present disclosure can support parallel decoding and improve decoding performance; that is, while the processor remains compatible with the microinstruction cache mode, parallel decoding in the decoder mode is supported, improving decoding performance.
As an optional example, FIG. 6 shows another optional flow chart of the decoding method provided by an embodiment of the present disclosure; the method of FIG. 6 may be executed by the processor shown in FIG. 4B, and the content described below may be read in correspondence with the content described above. Referring to FIG. 6, the method may include:
Step S60: Generate an instruction fetch request.
Optionally, step S60 may be performed by the branch prediction unit, which may set a switch mark in the fetch request according to the branch prediction taken result, to indicate the instruction position for decoder group switching.
Step S61: Determine whether the processor is currently in the microinstruction cache mode; if not, execute step S62; if yes, execute step S69.
In the microinstruction cache mode, microinstructions are obtained by microinstruction cache lookup.
Step S62: Access the instruction cache and fetch the instruction stream according to the fetch request.
It should be noted that when the instruction cache reads the instruction stream according to the fetch request, the switch mark carried by the request is passed through alongside the fetched stream; since the switch mark at least indicates the decoder-group-switch position, that position in the stream, and hence the target instruction corresponding to it, can be determined.
Step S63: Determine whether a switch mark exists in the fetch request; if not, execute step S64; if yes, execute step S65.
Step S64: Send the instruction stream to the default decoder group, which decodes the instructions; the obtained microinstructions are saved to the corresponding microinstruction queue.
It can be understood that the switch mark carried by the fetch request is used at least to indicate the decoder-group-switch position; when no switch mark exists in the fetch request, no decoder group switch is needed, the default decoder group decodes the instruction stream, and the obtained microinstructions are saved to the microinstruction queue corresponding to the default decoder group. The default decoder group may be the first decoder group in allocation order or a decoder group designated by the processor; moreover, during instruction processing, for each fetch request the default decoder group may also be the decoder group currently switched to. The default decoder group in the embodiments of the present disclosure is not fixed to a particular decoder group and may be chosen according to actual needs.
Step S65: Split the instruction stream according to the switch mark, and assign the target instruction and the instructions before it to the instruction queue corresponding to the default decoder group, which decodes them; the obtained microinstructions are saved to the corresponding microinstruction queue, where the target microinstruction obtained by decoding the target instruction carries the switch mark.
Step S66: Determine whether the remaining instructions still contain a target instruction corresponding to a switch mark; if not, execute step S67; if yes, execute step S68.
Step S67: Assign the remaining instructions to the instruction queue corresponding to a next decoder group different from the previous decoder group; that decoder group decodes them, and the obtained microinstructions are saved to the corresponding microinstruction queue.
Step S68: Assign the target instruction and the instructions before it to the instruction queue corresponding to a next decoder group different from the previous decoder group; that decoder group decodes them, and the obtained microinstructions are saved to the corresponding microinstruction queue, where the microinstruction obtained by decoding the target instruction carries the switch mark; return to step S66.
It should be noted that, since instructions are assigned to the decoder groups' instruction queues and the decoder groups then read instructions from the queues to decode, when instructions are assigned to the instruction queues faster than the decoder groups decode them, the embodiments of the present disclosure can split the instruction stream and distribute it to the instruction queues of multiple decoder groups, so that the decoder groups decode the assigned instructions in parallel. It should further be noted that steps S65 to S68 are merely an optional implementation, in the embodiments of the present disclosure, of splitting the instruction stream according to the switch marks corresponding to the target instructions to obtain multiple instruction groups and distributing them to multiple decoder groups for parallel decoding.
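Steps S65 to S68 amount to the following split-and-assign loop, shown as a minimal Python sketch; mark_positions (the indices of the target instructions) and the round-robin choice of the next decoder group are illustrative assumptions rather than the only implementation permitted by the present disclosure.

    def split_and_assign(instructions, mark_positions, instr_queues, default=0):
        group, start = default, 0
        for pos in mark_positions:                   # each target instruction ends a group
            instr_queues[group].extend(instructions[start:pos + 1])  # steps S65/S68
            start = pos + 1
            group = (group + 1) % len(instr_queues)  # next group differs from the previous
        instr_queues[group].extend(instructions[start:])             # step S67
        return group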
Step S69: Fetch the microinstructions from the microinstruction cache; the obtained microinstructions are saved to the corresponding microinstruction queue.
The selection of the microinstruction queue is as described above and is not repeated here.
It should be noted that fetching microinstructions from the microinstruction cache based on the fetch request presupposes that the fetch request hits in the microinstruction cache; if the fetch request cannot hit in the microinstruction cache, the decoder mode is entered and step S62 is executed.
In the decoder mode, microinstructions are obtained by decoder group decoding.
Step S70: Read microinstructions from the microinstruction queue corresponding to the first decoder group.
Step S71: Determine whether the microinstruction just read carries a switch mark; if yes, execute step S72; if not, return to step S70, until all microinstructions have been read.
Step S72: Switch to the microinstruction queue corresponding to the next decoder group to read microinstructions; return to step S71.
After all microinstructions have been read, the embodiments of the present disclosure may further execute the microinstructions.
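Steps S70 to S72 correspond to the following merge loop, assuming every micro-op is a (payload, has_switch_mark) pair and all micro-ops are already present in the queues; the names are illustrative, and the error branch only guards this toy model against malformed marks.

    from collections import deque

    def merge_micro_ops(uop_queues, start_group=0):
        queues = [deque(q) for q in uop_queues]
        remaining = sum(len(q) for q in queues)
        merged, group = [], start_group
        while remaining:
            if not queues[group]:
                raise ValueError("malformed switch marks")
            payload, has_mark = queues[group].popleft()
            merged.append(payload)        # steps S70/S71: read and check the mark
            remaining -= 1
            if has_mark:                  # step S72: switch to the next queue
                group = (group + 1) % len(queues)
        return merged                     # micro-ops in instruction fetch order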
In a processor supporting both the decoder mode and the microinstruction cache mode, the embodiments of the present disclosure can support parallel decoding and improve decoding performance.
An embodiment of the present disclosure further provides a processor, whose structure may refer to FIG. 4B; the content described below can be regarded as the functional modules the processor needs in order to implement the decoding method provided by the embodiments of the present disclosure, and may be read in correspondence with the content described above. The processor at least includes:
a branch prediction unit for generating an instruction fetch request, the fetch request carrying at least one switch mark, and the switch mark at least indicating an instruction position at which a decoder group switch is performed;
an instruction cache for, in response to microinstructions being obtained by decoder group decoding, obtaining the instruction stream fetched by the fetch request, and determining, according to the switch mark carried by the fetch request, the instruction position in the stream at which a decoder group switch is performed;
an instruction distribution unit for distributing the instruction stream to multiple decoder groups for parallel decoding according to the instruction position;
decoder groups for decoding the assigned instructions to obtain microinstructions, the number of decoder groups being multiple, wherein when a decoder group decodes a target instruction, the target microinstruction obtained by decoding the target instruction carries a switch mark, the target instruction being the instruction corresponding to the instruction position;
a microinstruction cache for, in response to microinstructions being obtained by microinstruction cache lookup, if the fetch request hits in the microinstruction cache, obtaining the microinstructions corresponding to the fetch request from the microinstruction cache, the obtained microinstructions not carrying a switch mark.
Optionally, the instruction position is the end position of the target instruction; the step of the instruction cache determining, according to the switch mark carried by the fetch request, the instruction position in the stream at which a decoder group switch is performed may include:
determining the end position of the target instruction in the stream according to the instruction position indicated by the switch mark carried by the fetch request.
Optionally, the step of the instruction distribution unit distributing the stream to multiple decoder groups for parallel decoding according to the instruction position may include:
splitting the stream according to the instruction position to obtain multiple instruction groups, and distributing the multiple instruction groups to multiple decoder groups for parallel decoding.
Optionally, the step of the instruction distribution unit splitting the stream according to the instruction position to obtain multiple instruction groups includes:
splitting the stream into multiple instruction groups with the instruction position as the boundary in the stream, wherein the target instruction serving as the boundary between two adjacent instruction groups is split into the preceding group;
the step of the instruction distribution unit distributing the multiple instruction groups to multiple decoder groups for parallel decoding includes:
allocating a decoder group to the following instruction group according to the switch mark corresponding to the target instruction split into the preceding group, wherein the decoder group allocated to the preceding group is different from that allocated to the following group.
Optionally, each decoder group is correspondingly provided with an instruction queue for storing instructions to be decoded;
the step of the instruction distribution unit allocating a decoder group to the following instruction group according to the switch mark corresponding to the target instruction split into the preceding group, the decoder group allocated to the preceding group being different from that allocated to the following group, includes:
saving the first instruction group among the multiple instruction groups to the instruction queue corresponding to the default decoder group;
for each non-first instruction group among the multiple instruction groups, determining, from the multiple decoder groups according to the switch mark corresponding to the target instruction in the preceding group, a decoder group different from the one allocated to the preceding group, and saving that non-first group to the instruction queue corresponding to the determined decoder group.
Optionally, the step of the instruction distribution unit determining, for the non-first instruction groups, from the multiple decoder groups according to the switch mark corresponding to the target instruction in the preceding group, a decoder group different from the one allocated to the preceding group includes:
for the non-first instruction groups among the multiple instruction groups, determining in sequence from the multiple decoder groups, in the order of the multiple decoder groups and according to the switch mark corresponding to the target instruction in the group preceding each non-first group, the decoder group allocated to each non-first group.
Optionally, the step of the instruction distribution unit saving the first instruction group among the multiple instruction groups to the instruction queue corresponding to the default decoder group includes:
after exiting the microinstruction cache mode and entering the decoder mode, assigning the first instruction group among the multiple instruction groups to the instruction queue corresponding to the default decoder group; wherein in the decoder mode microinstructions are obtained by decoder group decoding, and in the microinstruction cache mode microinstructions are obtained by microinstruction cache lookup;
wherein, before the decoding mode switches to the microinstruction cache mode, if the last instruction decoded by a decoder group has no corresponding switch-mark-indicated instruction position, then after the decoding mode subsequently switches to the decoder mode, the first instruction group is assigned to the instruction queue corresponding to the decoder group that decoded that last instruction;
before the decoding mode switches to the microinstruction cache mode, if the last instruction decoded by a decoder group has a corresponding switch-mark-indicated instruction position, then after the decoding mode subsequently switches to the decoder mode, the first instruction group is assigned to the instruction queue corresponding to the decoder group indicated by the switch mark corresponding to that last instruction.
Optionally, the processor further includes a merging unit for, in the decoder mode, merging the microinstructions decoded by the multiple decoder groups according to the switch marks attached to the target microinstructions, to obtain microinstructions corresponding to the fetch order.
Optionally, each decoder group is correspondingly provided with a microinstruction queue;
the decoder group is further used to save the decoded microinstructions in the corresponding microinstruction queue;
the step of the merging unit merging the microinstructions decoded by the multiple decoder groups according to the switch marks attached to the target microinstructions, to obtain microinstructions corresponding to the fetch order, includes:
merging the microinstructions by switching among the microinstruction queues corresponding to the respective decoder groups according to the switch marks attached to the target microinstructions, to obtain microinstructions corresponding to the fetch order.
Optionally, the step of the merging unit merging the microinstructions by switching among the microinstruction queues corresponding to the respective decoder groups according to the switch marks attached to the target microinstructions, to obtain microinstructions corresponding to the fetch order, includes the following:
reading microinstructions starting from the microinstruction queue corresponding to the default decoder group; if a read microinstruction carries a switch mark, determining, according to the switch mark carried by the microinstruction, the next microinstruction queue to switch to for reading, until the microinstructions in the microinstruction queues corresponding to the respective decoder groups have all been read.
Optionally, the step of the merging unit determining, if a read microinstruction carries a switch mark, the next microinstruction queue to switch to for reading according to the switch mark carried by the microinstruction includes:
if a read microinstruction carries a switch mark, switching, according to the switch mark carried by the microinstruction and in the order of the microinstruction queues, in sequence among the microinstruction queues of the respective decoder groups as the queue from which microinstructions are read.
Optionally, the microinstruction cache is further used to, in the microinstruction cache mode, save the obtained microinstructions to the microinstruction queue corresponding to the default decoder group;
the decoder group is further used to save the decoded microinstructions into the microinstruction cache;
wherein, if the fetch request misses in the microinstruction cache while in the microinstruction cache mode, the decoder mode is entered.
An embodiment of the present disclosure further provides a chip, which may include the processor described above.
An embodiment of the present disclosure further provides an electronic device, which may include the chip described above.
A number of embodiments of the present disclosure are described above; the optional approaches introduced in the embodiments may, where they do not conflict, be combined and cross-referenced with one another, thereby extending to a variety of possible embodiments, all of which may be regarded as embodiments disclosed by the present disclosure.
Although the embodiments of the present disclosure are disclosed as above, the present disclosure is not limited thereto. Any person skilled in the art may make various changes and modifications without departing from the spirit and scope of the present disclosure; therefore, the protection scope of the present disclosure shall be the scope defined by the claims.

Claims (24)

  1. A decoding method applied to a processor, the decoding method comprising:
    generating an instruction fetch request, wherein the instruction fetch request carries at least one switch mark, and the switch mark at least indicates an instruction position at which a decoder group switch is performed;
    in response to microinstructions being obtained by decoder group decoding, obtaining an instruction stream fetched by the instruction fetch request, and determining, according to the switch mark carried by the instruction fetch request, the instruction position in the instruction stream at which a decoder group switch is performed; distributing the instruction stream to multiple decoder groups for parallel decoding according to the instruction position, and attaching the switch mark to a target microinstruction obtained by decoding a target instruction, the target instruction being the instruction corresponding to the instruction position; and
    in response to microinstructions being obtained by microinstruction cache lookup, if the instruction fetch request hits in the microinstruction cache, obtaining the microinstructions corresponding to the instruction fetch request from the microinstruction cache.
  2. The decoding method according to claim 1, wherein the instruction position is an end position of the target instruction; and determining, according to the switch mark carried by the instruction fetch request, the instruction position in the instruction stream at which a decoder group switch is performed comprises:
    determining the end position of the target instruction in the instruction stream according to the instruction position indicated by the switch mark carried by the instruction fetch request.
  3. The decoding method according to claim 1 or 2, wherein distributing the instruction stream to multiple decoder groups for parallel decoding according to the instruction position comprises:
    splitting the instruction stream according to the instruction position to obtain multiple instruction groups, and distributing the multiple instruction groups to the multiple decoder groups for parallel decoding.
  4. The decoding method according to claim 3, wherein splitting the instruction stream according to the instruction position to obtain multiple instruction groups comprises:
    splitting the instruction stream into multiple instruction groups with the instruction position as a boundary in the instruction stream, wherein the target instruction serving as the boundary between two adjacent instruction groups is split into the preceding instruction group; and
    distributing the multiple instruction groups to the multiple decoder groups for parallel decoding comprises:
    allocating a decoder group to the following instruction group according to the switch mark corresponding to the target instruction split into the preceding instruction group, wherein the decoder group allocated to the preceding instruction group is different from the decoder group allocated to the following instruction group.
  5. The decoding method according to claim 4, wherein each decoder group is correspondingly provided with an instruction queue for storing instructions to be decoded; and allocating a decoder group to the following instruction group according to the switch mark corresponding to the target instruction split into the preceding instruction group, the decoder group allocated to the preceding instruction group being different from the decoder group allocated to the following instruction group, comprises:
    saving the first instruction group among the multiple instruction groups to the instruction queue corresponding to a default decoder group;
    for a non-first instruction group among the multiple instruction groups, determining, from the multiple decoder groups according to the switch mark corresponding to the target instruction in the preceding instruction group, a decoder group different from the decoder group allocated to the preceding instruction group, and saving the non-first instruction group to the instruction queue corresponding to the determined decoder group.
  6. The decoding method according to claim 5, wherein determining, for the non-first instruction group among the multiple instruction groups, from the multiple decoder groups according to the switch mark corresponding to the target instruction in the preceding instruction group, a decoder group different from the decoder group allocated to the preceding instruction group comprises:
    for the non-first instruction group among the multiple instruction groups, determining in sequence from the multiple decoder groups, in the order of the multiple decoder groups and according to the switch mark corresponding to the target instruction in the instruction group preceding the non-first instruction group, the decoder group allocated to each non-first instruction group.
  7. The decoding method according to claim 5 or 6, wherein saving the first instruction group among the multiple instruction groups to the instruction queue corresponding to the default decoder group comprises:
    after exiting a microinstruction cache mode and entering a decoder mode, assigning the first instruction group among the multiple instruction groups to the instruction queue corresponding to the default decoder group; wherein in the decoder mode microinstructions are obtained by decoder group decoding, and in the microinstruction cache mode microinstructions are obtained by lookup in the microinstruction cache;
    wherein, before the decoding mode switches to the microinstruction cache mode, if the last instruction decoded by a decoder group has no corresponding instruction position indicated by the switch mark, then after the decoding mode subsequently switches to the decoder mode, the first instruction group is assigned to the instruction queue corresponding to the decoder group that decoded the last instruction;
    before the decoding mode switches to the microinstruction cache mode, if the last instruction decoded by a decoder group has a corresponding instruction position indicated by the switch mark, then after the decoding mode subsequently switches to the decoder mode, the first instruction group is assigned to the instruction queue corresponding to the decoder group indicated by the switch mark corresponding to the last instruction.
  8. The decoding method according to any one of claims 1-6, wherein the processor comprises multiple decoding modes, the multiple decoding modes comprising a decoder mode and a microinstruction cache mode; in the decoder mode microinstructions are obtained by decoder group decoding, and in the microinstruction cache mode microinstructions are obtained by microinstruction cache lookup.
  9. The decoding method according to claim 8, further comprising:
    in the decoder mode, merging the microinstructions decoded by the multiple decoder groups according to the switch mark attached to the target microinstruction, to obtain microinstructions corresponding to the instruction fetch order.
  10. The decoding method according to claim 9, wherein each decoder group is correspondingly provided with a microinstruction queue; the decoding method further comprises:
    saving the microinstructions decoded by each decoder group in the corresponding microinstruction queue;
    and merging the microinstructions decoded by the multiple decoder groups according to the switch mark attached to the target microinstruction, to obtain microinstructions corresponding to the instruction fetch order, comprises:
    merging the microinstructions by switching among the microinstruction queues corresponding to the respective decoder groups according to the switch mark attached to the target microinstruction, to obtain microinstructions corresponding to the instruction fetch order.
  11. The decoding method according to claim 10, wherein merging the microinstructions by switching among the microinstruction queues corresponding to the respective decoder groups according to the switch mark attached to the target microinstruction, to obtain microinstructions corresponding to the instruction fetch order, comprises:
    reading microinstructions starting from the microinstruction queue corresponding to the default decoder group; if a read microinstruction carries the switch mark, determining, according to the switch mark carried by the microinstruction, the next microinstruction queue to switch to for reading microinstructions, until the microinstructions in the microinstruction queues corresponding to the respective decoder groups have all been read.
  12. The decoding method according to claim 11, wherein, if a read microinstruction carries the switch mark, determining, according to the switch mark carried by the microinstruction, the next microinstruction queue to switch to for reading microinstructions comprises:
    if a read microinstruction carries the switch mark, switching, according to the switch mark carried by the microinstruction and in the order of the microinstruction queues, in sequence among the microinstruction queues of the respective decoder groups as the queue from which microinstructions are read.
  13. The decoding method according to claim 10, further comprising:
    in the microinstruction cache mode, saving the obtained microinstructions to the microinstruction queue corresponding to the default decoder group.
  14. The decoding method according to any one of claims 8-13, further comprising:
    in the microinstruction cache mode, if the instruction fetch request misses in the microinstruction cache, entering the decoder mode;
    in the decoder mode, saving the microinstructions decoded by the multiple decoder groups into the microinstruction cache.
  15. A processor, comprising:
    a branch prediction unit configured to generate an instruction fetch request, wherein the instruction fetch request carries at least one switch mark, and the switch mark at least indicates an instruction position at which a decoder group switch is performed;
    an instruction cache configured to, in response to microinstructions being obtained by decoder group decoding, obtain an instruction stream fetched by the instruction fetch request, and determine, according to the switch mark carried by the instruction fetch request, the instruction position in the instruction stream at which a decoder group switch is performed;
    an instruction distribution unit configured to distribute the instruction stream to multiple decoder groups for parallel decoding according to the instruction position;
    decoder groups configured to decode the assigned instructions to obtain microinstructions, the number of decoder groups being multiple, wherein when a decoder group decodes a target instruction, the target microinstruction obtained by decoding the target instruction carries the switch mark, the target instruction being the instruction corresponding to the instruction position; and
    a microinstruction cache configured to, in response to microinstructions being obtained by microinstruction cache lookup, if the instruction fetch request hits in the microinstruction cache, obtain the microinstructions corresponding to the instruction fetch request from the microinstruction cache, the obtained microinstructions not carrying the switch mark.
  16. The processor according to claim 15, wherein the instruction position is an end position of the target instruction; and the instruction cache determining, according to the switch mark carried by the instruction fetch request, the instruction position in the instruction stream at which a decoder group switch is performed comprises:
    determining the end position of the target instruction in the instruction stream according to the instruction position indicated by the switch mark carried by the instruction fetch request.
  17. The processor according to claim 15 or 16, wherein the instruction distribution unit distributing the instruction stream to multiple decoder groups for parallel decoding according to the instruction position comprises:
    the instruction distribution unit splitting the instruction stream according to the instruction position to obtain multiple instruction groups, and distributing the multiple instruction groups to the multiple decoder groups for parallel decoding.
  18. The processor according to claim 17, wherein the instruction distribution unit splitting the instruction stream according to the instruction position to obtain multiple instruction groups comprises:
    the instruction distribution unit splitting the instruction stream into multiple instruction groups with the instruction position as a boundary in the instruction stream, wherein the target instruction serving as the boundary between two adjacent instruction groups is split into the preceding instruction group; and
    allocating a decoder group to the following instruction group according to the switch mark corresponding to the target instruction split into the preceding instruction group, wherein the decoder group allocated to the preceding instruction group is different from the decoder group allocated to the following instruction group.
  19. The processor according to claim 18, wherein each decoder group is correspondingly provided with an instruction queue for storing instructions to be decoded;
    the instruction distribution unit allocating a decoder group to the following instruction group according to the switch mark corresponding to the target instruction split into the preceding instruction group, the decoder group allocated to the preceding instruction group being different from the decoder group allocated to the following instruction group, comprises:
    the instruction distribution unit saving the first instruction group among the multiple instruction groups to the instruction queue corresponding to a default decoder group;
    for a non-first instruction group among the multiple instruction groups, determining, from the multiple decoder groups according to the switch mark corresponding to the target instruction in the preceding instruction group, a decoder group different from the decoder group allocated to the preceding instruction group, and saving the non-first instruction group to the instruction queue corresponding to the determined decoder group.
  20. The processor according to any one of claims 15-19, further comprising:
    a merging unit configured to, in a decoder mode, merge the microinstructions decoded by the multiple decoder groups according to the switch mark attached to the target microinstruction, to obtain microinstructions corresponding to the instruction fetch order, wherein in the decoder mode microinstructions are obtained by decoder group decoding.
  21. The processor according to claim 20, wherein each decoder group is correspondingly provided with a microinstruction queue; the decoder group is further configured to save the decoded microinstructions in the corresponding microinstruction queue;
    the merging unit merging the microinstructions decoded by the multiple decoder groups according to the switch mark attached to the target microinstruction, to obtain microinstructions corresponding to the instruction fetch order, comprises:
    merging the microinstructions by switching among the microinstruction queues corresponding to the respective decoder groups according to the switch mark attached to the target microinstruction, to obtain microinstructions corresponding to the instruction fetch order.
  22. The processor according to claim 21, wherein the microinstruction cache is further configured to, in a microinstruction cache mode, save the obtained microinstructions to the microinstruction queue corresponding to the default decoder group, wherein in the microinstruction cache mode microinstructions are obtained by lookup in the microinstruction cache;
    the decoder group is further configured to save the decoded microinstructions into the microinstruction cache;
    wherein, if the instruction fetch request misses in the microinstruction cache while in the microinstruction cache mode, the decoder mode is entered.
  23. A chip comprising the processor according to any one of claims 15-22.
  24. An electronic device comprising the chip according to claim 23.
PCT/CN2023/078435 2022-10-31 2023-02-27 A decoding method, processor, chip and electronic device WO2024093063A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211350246.2 2022-10-31
CN202211350246.2A CN115525344B (zh) 2022-10-31 2022-10-31 A decoding method, processor, chip and electronic device

Publications (1)

Publication Number Publication Date
WO2024093063A1 true WO2024093063A1 (zh) 2024-05-10

Family

ID=84703803

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/078435 WO2024093063A1 (zh) 2022-10-31 2023-02-27 A decoding method, processor, chip and electronic device

Country Status (2)

Country Link
CN (1) CN115525344B (zh)
WO (1) WO2024093063A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115525344B (zh) * 2022-10-31 2023-06-27 海光信息技术股份有限公司 A decoding method, processor, chip and electronic device
CN116414463B (zh) * 2023-04-13 2024-04-12 海光信息技术股份有限公司 Instruction scheduling method, instruction scheduling apparatus, processor and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10207707A (ja) * 1997-01-14 1998-08-07 Ind Technol Res Inst Apparatus and method for parallel decoding of variable-length instructions in a superscalar pipelined data processing device
CN107358125A (zh) * 2017-06-14 2017-11-17 北京多思科技工业园股份有限公司 A processor
CN112631660A (zh) * 2020-12-16 2021-04-09 广东赛昉科技有限公司 Method for fetching instructions in parallel and readable storage medium
CN114090077A (zh) * 2021-11-24 2022-02-25 海光信息技术股份有限公司 Method and apparatus for fetching instructions, processing apparatus, and storage medium
US20220100519A1 (en) * 2020-09-25 2022-03-31 Advanced Micro Devices, Inc. Processor with multiple fetch and decode pipelines
CN115098169A (zh) * 2022-06-24 2022-09-23 海光信息技术股份有限公司 Method and apparatus for fetching instructions based on capacity sharing
CN115525344A (zh) * 2022-10-31 2022-12-27 海光信息技术股份有限公司 A decoding method, processor, chip and electronic device

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7281120B2 (en) * 2004-03-26 2007-10-09 International Business Machines Corporation Apparatus and method for decreasing the latency between an instruction cache and a pipeline processor
CN102243578A (zh) * 2010-05-10 2011-11-16 北京凡达讯科技有限公司 Command decoding method, system and device for a chip
GB2519103B (en) * 2013-10-09 2020-05-06 Advanced Risc Mach Ltd Decoding a complex program instruction corresponding to multiple micro-operations
CN106406814B (zh) * 2016-09-30 2019-06-14 上海兆芯集成电路有限公司 Processor and method for translating architectural instructions into microinstructions
CN112540797A (zh) * 2019-09-23 2021-03-23 阿里巴巴集团控股有限公司 Instruction processing apparatus and instruction processing method
CN112130897A (zh) * 2020-09-23 2020-12-25 上海兆芯集成电路有限公司 Microprocessor
CN114138341B (zh) * 2021-12-01 2023-06-02 海光信息技术股份有限公司 Method, apparatus, program product and chip for scheduling microinstruction cache resources
CN114201219B (zh) * 2021-12-21 2023-03-17 海光信息技术股份有限公司 Instruction scheduling method, instruction scheduling apparatus, processor and storage medium
CN114579312A (zh) * 2022-03-04 2022-06-03 海光信息技术股份有限公司 Instruction processing method, processor, chip and electronic device


Also Published As

Publication number Publication date
CN115525344A (zh) 2022-12-27
CN115525344B (zh) 2023-06-27


Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase (Ref document number: 2023884011; Country of ref document: EP)
ENP Entry into the national phase (Ref document number: 2023884011; Country of ref document: EP; Effective date: 20240529)
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 23884011; Country of ref document: EP; Kind code of ref document: A1)