CN115658150A - Instruction distribution method, processor, chip and electronic equipment - Google Patents


Publication number
CN115658150A
Authority
CN
China
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211348765.5A
Other languages
Chinese (zh)
Other versions
CN115658150B (en)
Inventor
崔泽汉
Current Assignee
Haiguang Information Technology Co Ltd
Original Assignee
Haiguang Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Haiguang Information Technology Co Ltd filed Critical Haiguang Information Technology Co Ltd
Priority to CN202211348765.5A
Publication of CN115658150A
Application granted
Publication of CN115658150B
Legal status: Active

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiments of the present application provide an instruction distribution method, a processor, a chip, and an electronic device. The method includes: reading an instruction stream from an instruction cache according to an instruction fetch address, and reading instruction boundary information from an instruction boundary cache according to the same instruction fetch address, where the instruction boundary information indicates instruction positions at which the instruction stream is to be split; splitting the instruction stream at the instruction positions indicated by the instruction boundary information, and distributing the split instruction stream to a plurality of decoder groups for parallel decoding. By splitting the instruction stream according to the instruction boundary information recorded in the instruction boundary cache and distributing it to a plurality of decoder groups for decoding, the throughput of the decoder groups is improved and the decoding performance of the processor is improved.

Description

Instruction distribution method, processor, chip and electronic equipment
Technical Field
The embodiment of the application relates to the technical field of processors, in particular to an instruction distribution method, a processor, a chip and electronic equipment.
Background
In modern processors, an instruction passes through stages such as instruction fetch, decode, and execute; decoding is the process of analyzing and translating an instruction to obtain micro-instructions (Uops). To improve decoding performance, a plurality of decoders may be provided in the processor to decode multiple instructions in parallel; however, how to distribute instructions among the plurality of decoders is a problem to be solved.
Disclosure of Invention
In view of this, embodiments of the present application provide an instruction distribution method, a processor, a chip, and an electronic device, so as to split the fetched instruction stream and distribute it to a plurality of decoder groups for decoding, thereby improving the throughput of the decoder groups and the decoding performance of the processor.
In order to achieve the above object, the embodiments of the present application provide the following technical solutions.
In a first aspect, an embodiment of the present application provides an instruction allocation method, including:
reading an instruction stream from an instruction cache according to an instruction fetching address, and reading instruction boundary information from an instruction boundary cache according to the instruction fetching address, wherein the instruction boundary information indicates an instruction position for performing instruction segmentation;
and segmenting the instruction stream according to the instruction position indicated by the instruction boundary information, and distributing the segmented instruction stream to a plurality of decoder groups for parallel decoding.
In a second aspect, an embodiment of the present application further provides a processor, including:
the instruction cache is used for acquiring an instruction stream according to the instruction fetching address;
the instruction boundary cache is used for acquiring instruction boundary information according to the instruction fetching address, and the instruction boundary information indicates an instruction position for performing instruction segmentation;
and the instruction splitting unit is used for splitting the instruction stream according to the instruction position indicated by the instruction boundary information and distributing the split instruction stream to a plurality of decoder groups for parallel decoding.
In a third aspect, an embodiment of the present application further provides a chip, which includes the processor as described in the second aspect.
In a fourth aspect, an embodiment of the present application further provides an electronic device, which includes the chip as described in the third aspect.
The instruction distribution method provided by the embodiments of the present application records instruction boundary information through an instruction boundary cache, where the instruction boundary information indicates instruction positions at which the fetched instruction stream is to be split. On this basis, the instruction stream is read from the instruction cache according to the instruction fetch address while the instruction boundary information is read from the instruction boundary cache according to the same address; the instruction stream is then split at the instruction positions indicated by the instruction boundary information, and the split instruction stream is distributed to a plurality of decoder groups for parallel decoding. Because the fetched instruction stream can be split and its parts decoded in parallel by a plurality of decoder groups, the problem of distributing instructions to a plurality of decoder groups is solved, the throughput of the decoder groups is improved, and the decoding performance of the processor is improved.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in their description are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is an architectural block diagram of a processor.
Fig. 2 is another block diagram of an architecture of a processor according to an embodiment of the present disclosure.
Fig. 3 is a block diagram of another architecture of a processor according to an embodiment of the present disclosure.
Fig. 4 is a flowchart of an instruction allocation method according to an embodiment of the present disclosure.
Fig. 5 is a schematic diagram of instruction stream segmentation provided in the embodiment of the present application.
Fig. 6A is an alternative diagram of an internal structure of an instruction boundary cache block in an instruction boundary cache according to an embodiment of the present disclosure.
Fig. 6B is another alternative diagram illustrating an internal structure of an instruction boundary cache block in an instruction boundary cache according to an embodiment of the present application.
Fig. 7 is an alternative schematic diagram of reading an instruction stream and instruction boundary information according to an embodiment of the present disclosure.
Fig. 8 is another alternative diagram of reading an instruction stream and instruction boundary information according to an embodiment of the present disclosure.
FIG. 9 is a block diagram of a processor architecture with a micro instruction cache according to an embodiment of the present disclosure.
FIG. 10 is a block diagram of an embodiment of a processor with a micro instruction cache.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, and not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
Processors typically process instructions using pipeline techniques. In the pipeline of a processor, an instruction goes through stages such as Instruction Fetch, Instruction Decode, and Execute. Instruction fetch reads the instruction corresponding to the program from a cache or main memory of the processor; decode interprets the fetched instruction to determine its operation code and/or address code; execute carries out the instruction operation according to the obtained operation code and/or address code, thereby realizing the program's operation. Decoding is mainly performed by a plurality of decoder groups provided in the processor. As an alternative implementation, FIG. 1 illustrates an architectural block diagram of a processor that includes a branch prediction unit 101, an instruction cache 102, and a plurality of decoder groups 1031 to 103n; the specific value of n depends on the design of the processor and is not limited in this embodiment.
Branch prediction unit 101 is a digital circuit that performs branch prediction on instructions and generates fetch requests based on the results of the branch prediction. It should be noted that, because of the possibility of a branch instruction changing the program flow, in order to reduce the pipeline delay caused by the processor waiting for the execution result of the branch instruction to determine the next fetch, the front end of the processor may be provided with a branch prediction unit to implement the branch prediction of the instruction.
The branch prediction result is, for example, whether the current instruction is a branch instruction, the branch outcome (direction, address, target address, etc.) of the branch instruction, etc. In one implementation, a branch prediction unit may perform branch prediction of an instruction based on historical execution information and results of the branch instruction, thereby deriving a fetch address range for the instruction and generating a fetch request. The instruction fetch request generated by the branch prediction unit includes instruction fetch addresses of a plurality of instructions, and is used for reading corresponding instructions from the instruction cache 102 according to the instruction fetch addresses.
The instruction cache 102 stores instructions, mainly in instruction cache blocks. An instruction cache block consists of two parts: a tag part (also referred to as the tag field) and a data part (also referred to as the data field), each containing a number of storage units. The tag field stores tags; according to a tag, it can be determined whether the instruction corresponding to an instruction fetch address is stored in the instruction cache. The data field stores the actual contents of instructions; the corresponding instruction can be located in the data field according to the tag that hits in the instruction cache.
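The tag-field/data-field lookup described above can be illustrated with a minimal Python model. This is a sketch, not the patented hardware: the set-associative parameters, the address split, and all class and method names are illustrative assumptions.

```python
# Minimal model of a tag/data instruction cache lookup. The tag field decides
# whether a fetch address hits; the data field holds the instruction bytes.
# Parameters and names are illustrative, not from the patent.

class InstructionCacheBlock:
    def __init__(self, tag, data):
        self.tag = tag    # tag field: identifies which addresses this block holds
        self.data = data  # data field: the cached instruction bytes

class InstructionCache:
    def __init__(self, num_sets=4, block_size=4):
        self.num_sets = num_sets
        self.block_size = block_size
        self.sets = [dict() for _ in range(num_sets)]  # set index -> {tag: block}

    def split_address(self, fetch_addr):
        offset = fetch_addr % self.block_size
        index = (fetch_addr // self.block_size) % self.num_sets
        tag = fetch_addr // (self.block_size * self.num_sets)
        return tag, index, offset

    def lookup(self, fetch_addr):
        """Return the cached instruction bytes for fetch_addr, or None on a miss."""
        tag, index, offset = self.split_address(fetch_addr)
        block = self.sets[index].get(tag)
        if block is None:
            return None             # tag mismatch: instruction not in the cache
        return block.data[offset:]  # hit: read instructions from the data field

    def fill(self, fetch_addr, data):
        tag, index, _ = self.split_address(fetch_addr)
        self.sets[index][tag] = InstructionCacheBlock(tag, data)
```

A fetch address whose tag matches a stored tag at its set index is a hit and returns instruction bytes from the data field; any other address misses.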
The plurality of decoder groups 1031 to 103n are capable of decoding a plurality of instructions in parallel. For any decoder group, the decoder group can carry out decoding operation on the instruction to obtain a decoded instruction; the decoded instructions may be machine-executable operational information derived from interpreting the instructions, such as machine-executable uops (micro-ops, micro-instructions) formed by control fields.
However, for the processor to decode instructions in parallel with multiple decoder groups, several problems must be solved, such as how the instruction stream fetched from the instruction cache 102 is distributed among the decoder groups. To address this, the embodiment of the present application provides an improved processor architecture; fig. 2 schematically illustrates another architectural block diagram of the processor provided by the embodiment of the present application. Referring to fig. 1 and fig. 2, the processor shown in fig. 2 is further provided with an instruction splitting unit 201, which splits the instruction stream fetched from the instruction cache 102 to obtain a plurality of instruction groups that are distributed to a plurality of decoder groups for parallel decoding.
In a possible implementation, the instruction splitting unit 201 may perform instruction splitting on the instruction stream of the fetch according to the instruction branch jump predicted by the branch prediction unit.
It should be noted that branch instructions may exist in an instruction stream; a branch instruction may change the execution flow of the program or call a subroutine. By predicting a branch instruction, branch prediction unit 101 can determine the fetch address of the instruction following the branch instruction. The prediction of a branch instruction generally has two outcomes: the branch is not taken, or the branch is taken. When branch prediction unit 101 predicts that a branch is taken, the current fetch range in the fetch request ends at the end position of the branch jump instruction, and the fetch start address of the next instruction is the predicted jump target address. Since a branch jump instruction is a complete instruction, its end position can serve as a valid instruction boundary. Therefore, a predicted taken branch can be used as a basis for the instruction splitting unit 201 to split the instruction stream.
The inventor found in the course of research that, although the instruction splitting unit can split the instruction stream based on the taken branches predicted by the branch prediction unit, for instruction streams with sparse branches or no branches at all, the instruction splitting unit may split the instruction stream too late or at poorly chosen positions.
In one example, if there are no branches in a long instruction stream, the branch prediction unit cannot predict any taken branch. Since the instruction stream contains no branch jump instruction, the instruction splitting unit cannot split it, so the entire long instruction stream is allocated to the default decoder group while the remaining decoder groups sit idle. The decoder groups therefore cannot decode instructions in parallel, which reduces their aggregate throughput and degrades the decoding performance of the processor.
In another example, if only a small number of branch jump instructions exist in a long instruction stream, say only one, then only one of the fetch addresses produced by the branch prediction unit indicates a taken branch, and the instruction splitting unit can split the instruction stream only once, at a position dictated by that branch jump instruction. If the branch jump instruction sits at the start of the instruction stream, most of the instructions end up in a single decoder group after the split, again leaving the remaining decoder groups idle, so the throughput of the decoder groups cannot be effectively improved.
Based on this, the embodiment of the present application provides a further improved processor architecture, and fig. 3 schematically illustrates yet another architecture block diagram of the processor provided in the embodiment of the present application. As shown in fig. 2 and fig. 3, an instruction boundary cache 202 is further disposed in the processor shown in fig. 3, and is configured to store instruction boundary information, where the instruction boundary information indicates an instruction position for performing instruction splitting, and can provide a basis for the instruction splitting unit to split the instruction stream. In some embodiments, the instruction splitting unit splits the instruction stream based on valid instruction boundaries.
In some embodiments, when the instruction cache is accessed based on the fetch request, the instruction boundary cache may be accessed synchronously: while the instruction stream is read from the instruction cache according to the fetch address, the instruction boundary information is read from the instruction boundary cache. The instruction splitting unit can then split the fetched instruction stream based on the instruction boundary information read from the instruction boundary cache. Compared with relying on branch predictions alone, this allows an instruction stream with sparse branches or no branches to be split in time and distributed to a plurality of decoder groups for parallel decoding, which improves the throughput of the decoder groups and the decoding performance of the processor.
As an alternative implementation, fig. 4 exemplarily shows a flowchart of an instruction allocation method provided in the embodiment of the application. As shown in fig. 4, the method flow may include the following steps.
In step S41, an instruction stream is read from an instruction cache according to an instruction fetch address, and instruction boundary information is read from an instruction boundary cache according to the instruction fetch address, where the instruction boundary information indicates an instruction position for performing instruction splitting.
The instruction fetch address carries a tag (address identification), which serves as the tag information of the fetch address and corresponds to the identification recorded in the tag field of the instruction cache for the fetched instruction. In some embodiments, the instruction boundary cache may multiplex the tags of the instruction cache, so that the instruction boundary information can be read from the instruction boundary cache synchronously while the instruction stream is read from the instruction cache according to the fetch address. As one alternative implementation, the instruction boundary cache may be a structure separate from the instruction cache, disposed outside it, for storing instruction boundary information. As another alternative implementation, the instruction boundary cache may be a region allocated within the data field of the instruction cache for storing instruction boundary information; in that case the region is located inside the instruction cache.
It should be noted that the effective instruction boundaries of the instructions cannot be known by the decoder group before the instructions are decoded. Based on this, as an optional implementation, as shown in fig. 3, when the decoder group decodes an instruction, the embodiment of the present application may store instruction boundary information determined by the instruction decoded by the decoder group into an instruction boundary cache, where the instruction boundary information may specifically indicate an instruction ending address, so that the instruction splitting unit performs splitting of an instruction stream according to the instruction ending address indicated by the instruction boundary information.
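The write-back path described above — a decoder group decodes a fetch block once, and the instruction end addresses it discovers are stored into the instruction boundary cache for later fetches of the same block — can be sketched as follows. This is a hypothetical software model: the boundary cache is shown as a plain dictionary keyed by fetch address, and `decoded_lengths` stands in for the byte lengths the decoder group learns while decoding variable-length instructions.

```python
# Hypothetical sketch of boundary training: after decoding, the end offset of
# each instruction is recorded so that future fetches of the same block can be
# split before decoding. Names are illustrative, not from the patent.

def record_boundaries(boundary_cache, fetch_addr, decoded_lengths):
    """Store instruction end offsets learned while decoding a fetch block."""
    end = 0
    ends = []
    for length in decoded_lengths:
        end += length          # end address of this instruction within the block
        ends.append(end)
    boundary_cache[fetch_addr] = ends  # boundary info now available for this address
    return ends
```

On the next fetch of `fetch_addr`, the splitting unit can look up these end offsets instead of waiting for a predicted branch.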
In step S42, the instruction stream is segmented according to the instruction position indicated by the instruction boundary information, and the segmented instruction stream is allocated to multiple decoder groups for parallel decoding.
In some embodiments, since the instruction boundary information indicates an instruction position (for example, an instruction end address) at which the instruction splitting is performed, after the instruction splitting unit obtains the instruction stream and the instruction boundary information, the instruction stream can be split according to the instruction position indicated by the instruction boundary information, and the split instruction stream is allocated to a plurality of decoder groups for parallel decoding.
As an optional implementation of splitting the instruction stream, the instruction splitting unit splits the instruction stream into a preceding instruction group after the splitting and a subsequent instruction group after the splitting with an instruction position indicated by the instruction boundary information as a boundary, thereby splitting the instruction stream into a plurality of instruction groups. When the instruction stream is distributed to a plurality of decoder groups for parallel decoding, the instruction splitting unit distributes the former instruction group after splitting to the default decoder group for decoding, and switches the latter instruction group after splitting to the next decoder group of the default decoder group for decoding.
To facilitate understanding of the principle of splitting the instruction stream at the instruction positions indicated by the instruction boundary information, two decoder groups are taken as an example. Fig. 5 schematically shows an instruction stream splitting diagram. As shown in fig. 5, the instruction stream includes instructions 510 to 51m, where m relates to the number of instructions in the instruction stream and may be determined according to the actual situation; the embodiment of the present application is not limited in this respect. If the instruction boundary information read from the instruction boundary cache corresponds to the end position of instruction 51k, the end position of instruction 51k is used as the boundary for splitting the stream of instructions 510 to 51m.
Referring to fig. 5, instruction 51k is adjacent to instruction 51k+1. When the instruction stream is split, instructions 510 to 51k form one instruction group and instructions 51k+1 to 51m form another, yielding two different instruction groups; instructions 510 to 51k are referred to as the preceding instruction group after the split, and instructions 51k+1 to 51m as the following instruction group after the split.
With continued reference to fig. 5, the embodiment of the present application may assign the preceding instruction group (i.e., instructions 510 to 51k) to decoder group 521 according to the order of decoder group 521 and decoder group 522, and decoder group 521 decodes instructions 510 to 51k. Because the instruction position indicated by the instruction boundary information is the end position of instruction 51k, the instructions after instruction 51k are switched to a different decoder group: the following instruction group (i.e., instructions 51k+1 to 51m) is allocated to decoder group 522, which decodes instructions 51k+1 to 51m.
It should be noted that when multiple pieces of instruction boundary information are read, the instruction stream may be split according to the same principle: if the instruction boundary information indicates multiple instruction positions, the instruction stream is split multiple times, once at each indicated position. When the split produces multiple instruction groups, as an optional implementation, the embodiment of the present application may allocate the instruction groups to the decoder groups sequentially.
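The splitting rule described above — cut the stream after each boundary instruction, then hand the resulting instruction groups to the decoder groups in order, starting from the default group and wrapping around — can be sketched as a minimal Python model. All names here are illustrative assumptions, not terms from the patent.

```python
def split_and_dispatch(instructions, boundary_positions, num_decoder_groups):
    """Split the instruction stream at each boundary position, then assign the
    resulting instruction groups to decoder groups in order (wrapping around)."""
    groups = []
    start = 0
    for pos in sorted(boundary_positions):
        groups.append(instructions[start:pos + 1])  # boundary instruction ends its group
        start = pos + 1
    if start < len(instructions):
        groups.append(instructions[start:])         # trailing instructions after the last boundary
    # Sequentially allocate instruction groups to decoder groups, starting at
    # the default decoder group 0 and switching to the next group at each split.
    return [(i % num_decoder_groups, grp) for i, grp in enumerate(groups)]
```

With one boundary and two decoder groups this reproduces the fig. 5 example: the preceding group goes to the default decoder group and the following group to the next one.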
As an optional implementation, an instruction queue may be correspondingly arranged in front of each decoder group, and is used for storing instructions to be decoded. When the instruction splitting unit distributes the split instructions to a plurality of decoder groups, the instruction splitting unit can store the instructions into the instruction queues corresponding to the decoder groups, so that each decoder group can read the instructions to be decoded in the corresponding instruction queues and perform decoding operation.
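The per-decoder-group instruction queues described above can be sketched as follows — a minimal model assuming one FIFO queue in front of each decoder group; the class and method names are illustrative.

```python
from collections import deque

# Minimal sketch: the splitting unit pushes each split instruction group into
# the queue of its target decoder group, and each decoder group pops pending
# work from its own queue. Names are illustrative, not from the patent.

class DecoderGroupQueues:
    def __init__(self, num_groups):
        self.queues = [deque() for _ in range(num_groups)]

    def dispatch(self, group_id, instruction_group):
        # The splitting unit stores instructions to be decoded into the
        # instruction queue corresponding to the decoder group.
        self.queues[group_id].append(instruction_group)

    def next_for(self, group_id):
        # Each decoder group reads the next pending instructions from its own queue.
        return self.queues[group_id].popleft() if self.queues[group_id] else None
```

Decoupling dispatch from decode this way lets the splitting unit run ahead while each decoder group drains its own queue at its own pace.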
As a possible implementation, if a branch jump instruction occurs in the fetched instruction stream, the instruction splitting unit may additionally split the instruction stream according to the branch jump instruction, for example at the end position of the branch jump instruction: instructions at or before that end position belong to the group preceding the split, and instructions after it belong to the group following the split.
It can be seen that in the embodiment of the present application, an instruction boundary cache may be added to the processor to store instruction boundary information, where the instruction boundary information indicates the instruction positions at which instruction splitting is performed. When the processor reads the instruction stream from the instruction cache according to the fetch address, it synchronously accesses the instruction boundary cache with the same address and reads the instruction boundary information. The instruction stream can then be split in time at the instruction positions indicated by the instruction boundary information, and the split instruction stream is distributed to a plurality of decoder groups for parallel decoding. Because the instruction boundary information corresponding to the fetch address is already stored in the instruction boundary cache when the instruction stream is read, the stream can be split promptly and its parts decoded in parallel, which solves the problem of distributing instructions to a plurality of decoder groups, improves their throughput, and improves the decoding performance of the processor.
The embodiment of the present application is thus not limited to splitting the fetched instruction stream by branch jump information: the instruction stream can be split based on the instruction boundary information stored in the instruction boundary cache, so it can be split in time even when no fetch address corresponds to a branch jump, and the split instruction stream can be distributed to a plurality of decoder groups for parallel decoding, solving the problem of distributing instructions among them.
In some embodiments, an instruction boundary cache may multiplex tags of the instruction cache, so that embodiments of the present application may read instruction boundary information corresponding to a target tag hit in the instruction cache based on a fetch address.
It should be noted that the target tag is the tag in the instruction cache that corresponds to the fetch address. Based on the tag carried in the fetch address, which records the identification corresponding to the tag part of the fetched instruction in the instruction cache, the tag in the fetch address can be compared with the tags in the instruction cache to determine which tag the fetch address hits; the hit tag is the target tag, and the instruction stream is obtained by reading the instructions corresponding to the target tag in the instruction cache.
In some further embodiments, an instruction cache block of the instruction cache may include multiple ways of tag fields and corresponding multiple ways of data fields, and the instruction boundary cache may include multiple ways of data fields. As an optional implementation, the multiple data-field ways of the instruction boundary cache may multiplex the multiple tag-field ways of the instruction cache, so that the data fields of the instruction boundary cache are associated with tag fields without tag fields of their own, and the data-field ways of the instruction boundary cache correspond one-to-one with the data-field ways of the instruction cache.
In one example, according to the target tag hit by the fetch address in the instruction cache, the instruction boundary information corresponding to the target tag is read from the instruction boundary cache as follows: the tag-field way number of the target tag in the instruction cache is determined, the data-field way number corresponding to that tag-field way number is determined in the instruction boundary cache, and the instruction boundary information is read from the data field of the instruction boundary cache corresponding to that data-field way number.
In some embodiments, one way of the data field of the instruction boundary cache corresponds to one instruction boundary cache block, and the instruction boundary cache stores the instruction boundary information through instruction boundary cache blocks. One instruction boundary cache block of the instruction boundary cache may store a plurality of instruction boundary information and a plurality of valid bits, where one instruction boundary information corresponds to one valid bit, and the valid bit indicates whether the corresponding instruction boundary information is valid. Fig. 6A is an alternative diagram illustrating an internal structure of an instruction boundary cache block in an instruction boundary cache according to an embodiment of the present application. As shown in fig. 6A, M instruction boundary information (instruction boundary information 0, instruction boundary information 1, ..., instruction boundary information M-1) and M valid bits (valid bit 0, valid bit 1, ..., valid bit M-1) may be stored in one instruction boundary cache block, where one instruction boundary information corresponds to one valid bit; for example, instruction boundary information 0 corresponds to valid bit 0, instruction boundary information 1 corresponds to valid bit 1, and so on, up to instruction boundary information M-1 corresponding to valid bit M-1.
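A minimal software model of the Fig. 6A block (the class name and methods are illustrative assumptions, not the patent's terminology) pairs each of the M instruction boundary information slots with a valid bit:

```python
class BoundaryCacheBlock:
    """Models one instruction boundary cache block of Fig. 6A."""

    def __init__(self, m):
        self.boundary_info = [0] * m   # instruction boundary information 0..M-1
        self.valid = [False] * m       # valid bit 0..M-1, all invalid initially

    def write(self, slot, info):
        # writing boundary information sets the paired valid bit to "valid"
        self.boundary_info[slot] = info
        self.valid[slot] = True

    def read_valid(self):
        # only entries whose valid bit indicates validity are returned
        return [b for b, v in zip(self.boundary_info, self.valid) if v]
```

In the processor's initial state `read_valid()` returns nothing, matching the description of an empty instruction boundary cache; after a write, only the written slot reads back as valid.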
Optionally, in the initial state of the processor, if the instruction boundary cache in the embodiment of the present application is empty, all valid bits in the instruction boundary cache blocks constituting the instruction boundary cache indicate invalidity. As instruction boundary information is written into the instruction boundary cache, the valid bits corresponding to the written instruction boundary information are set to indicate validity. For example, when one instruction boundary information is written into the corresponding instruction boundary cache block of the instruction boundary cache, the valid bit corresponding to that instruction boundary information in the instruction boundary cache block is set to indicate valid.
In other embodiments, one way of the data field of the instruction boundary cache corresponds to one instruction boundary cache block, where one instruction boundary cache block of the instruction boundary cache may include a plurality of instruction boundary information; specifically, the instruction boundary cache block may include only the plurality of instruction boundary information. Fig. 6B is an exemplary diagram illustrating an alternative internal structure of an instruction boundary cache block in an instruction boundary cache according to an embodiment of the present application. As shown in fig. 6B, there are M instruction boundary information (instruction boundary information 0, instruction boundary information 1, ..., instruction boundary information M-1) in one instruction boundary cache block, and no valid bits corresponding to the instruction boundary information exist in the instruction boundary cache block of fig. 6B. For the case where no valid bit exists in the instruction boundary cache block, the embodiment of the present application may designate certain values of the instruction boundary information to indicate whether the instruction boundary information is valid or invalid.
For example, for 2-bit-wide instruction boundary information stored in an instruction boundary cache block, binary encoding may be used, designating the value 0 to indicate that the instruction boundary information is invalid and the values 1 to 3 to indicate that it is valid. For instance, if the value of instruction boundary information 0 is 00, instruction boundary information 0 is invalid; if the value of instruction boundary information 1 is 01, instruction boundary information 1 is valid. Of course, it may instead be designated that the value 3 indicates invalid and the values 0 to 2 indicate valid, which is not limited in this embodiment of the present application.
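The first convention above (value 0 invalid, values 1 to 3 valid) can be sketched as a one-line validity check; the names are illustrative:

```python
INVALID = 0b00  # designated "invalid" value for 2-bit boundary information

def is_valid(info):
    """A 2-bit boundary information value is valid unless it equals 0."""
    return info != INVALID
```

So `is_valid(0b00)` is false (instruction boundary information 0 in the example), while `is_valid(0b01)` is true (instruction boundary information 1). The opposite convention would simply set `INVALID = 0b11`.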
Both valid and invalid instruction boundary information may exist in an instruction boundary cache block: valid instruction boundary information indicates that the instruction corresponding to the instruction boundary is a complete instruction, whereas invalid instruction boundary information does not correspond to any instruction boundary, i.e., the instruction has no boundary information and is an incomplete instruction. Therefore, when the instruction boundary information stored in an instruction boundary cache block is read from the instruction boundary cache based on the fetch address, the embodiment of the present application correspondingly reads the valid instruction boundary information (one or more pieces).
In a further optional implementation, when valid bits exist in the instruction boundary cache block, the embodiment of the present application reads instruction boundary information from the instruction boundary cache as follows: based on the tag-field way number corresponding to the target tag hit by the fetch address in the instruction cache, one or more pieces of instruction boundary information whose valid bits indicate validity are read from the data field of the corresponding way number in the instruction boundary cache. Alternatively, when validity is represented by designated values of the instruction boundary information stored in the instruction boundary cache block, valid instruction boundary information is read according to its value from the data field of the way number corresponding to the tag-field way number of the target tag hit by the fetch address in the instruction cache.
To facilitate understanding of the process of reading the instruction stream and the instruction boundary information, fig. 7 exemplarily shows an alternative schematic diagram of reading the instruction stream and the instruction boundary information in the embodiment of the present application. As shown in fig. 7, the instruction boundary cache, configured to store instruction boundary information, is disposed outside the instruction cache and is independent from it. The fetch address is composed of three parts: tag (tag), index (index), and offset (offset). For example, for a 48-bit fetch address, bits 47 to 12 are the tag, bits 11 to 6 are the index, and bits 5 to 0 are the offset. The tag is a finer-grained address mark obtained by further dividing the fetch address and is the basis for judging whether the instruction to be accessed is in the instruction cache; the index serves as the address indicator of the instruction cache, indicating the range of the instruction cache to be accessed, such as a certain line or lines; the offset represents the offset of the instruction to be accessed within the instruction cache.
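The example decomposition of a 48-bit fetch address can be sketched as follows. The field widths (6-bit offset, 6-bit index) follow the bit ranges above; the implied geometry of 64-byte cache blocks and 64 sets is an assumption for illustration:

```python
def split_fetch_address(addr):
    """Split a 48-bit fetch address into (tag, index, offset)."""
    offset = addr & 0x3F          # bits 5..0: offset within the cache block
    index = (addr >> 6) & 0x3F    # bits 11..6: selects the cache line(s)
    tag = addr >> 12              # bits 47..12: compared against cached tags
    return tag, index, offset
```

For the address 0x40013 used later in the text, this yields tag 0x40, index 0x0, and offset 0x13.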
The instruction cache is divided into a tag portion (i.e., tag field) and a data portion (i.e., data field), each of which is further divided into N ways (i.e., way 0 to way N-1). When an instruction is read from the instruction cache according to the fetch address, first, using the index in the fetch address, one tag is read from each way of the tag portion of the instruction cache, giving N tags in total (i.e., tag 0 to tag N-1). Then, according to the index of the fetch address, one corresponding data entry is read from each way of the data portion of the instruction cache, giving N data entries (i.e., data 0 to data N-1). The tag in the fetch address is compared with the N tags read from the instruction cache one by one: if an identical tag exists, the fetch address hits in the instruction cache; otherwise, the fetch address misses, i.e., no instruction corresponding to the fetch address exists in the instruction cache. In the case of a hit, the embodiment of the present application selects, from the N data entries and according to the comparison result, the data entry of the way whose tag equals the hit tag, thereby obtaining the instruction output.
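The N-way lookup just described can be sketched as a behavioral model (hypothetical names, not hardware):

```python
def icache_lookup(fetch_tag, index, tag_ways, data_ways):
    """tag_ways/data_ways: one list per way, each indexed by the set index."""
    set_tags = [way[index] for way in tag_ways]    # tag 0 .. tag N-1
    set_data = [way[index] for way in data_ways]   # data 0 .. data N-1
    for way, tag in enumerate(set_tags):
        if tag == fetch_tag:       # identical tag found -> hit
            return set_data[way]   # select the data of the hit way
    return None                    # miss: no matching tag in any way
```

With two ways and two sets, `icache_lookup(0xD, 1, [[0xA, 0xB], [0xC, 0xD]], [["insA", "insB"], ["insC", "insD"]])` hits in way 1 of set 1 and returns `"insD"`.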
The instruction boundary cache is organized in one-to-one correspondence with the data portion of the instruction cache; for example, one instruction cache block in the instruction cache corresponds to one instruction boundary cache block in the instruction boundary cache. In one example, the instruction boundary cache may directly multiplex the tag portion of the instruction cache, so that only a data portion (i.e., data field) needs to be set in the instruction boundary cache, with the data portion of the instruction boundary cache in one-to-one correspondence with the data portion of the instruction cache; the instruction boundary cache can thus perform its lookup simultaneously with the instruction cache according to the fetch address. With continued reference to fig. 7, first, using the index in the fetch address, N ways (i.e., way 0 to way N-1) are read from the instruction boundary cache, one instruction boundary information per way, giving N instruction boundary information in total. Then, according to the comparison result between the tag of the fetch address and the tags of the instruction cache, the boundary information corresponding to the way of the hit tag is selected from the N instruction boundary information read from the instruction boundary cache, thereby obtaining the instruction boundary information output.
In another example, fig. 8 schematically shows another alternative schematic diagram of reading an instruction stream and instruction boundary information in the embodiment of the present application. As shown in fig. 8, the instruction boundary cache is an area (dotted line portion in fig. 8) for storing instruction boundary information allocated in the data portion of the instruction cache, so that the instruction boundary cache can multiplex the tag portion of the instruction cache, and the data portion of the instruction boundary cache corresponds to the data portion of the instruction cache one to one.
Referring to fig. 8, when an instruction is read from the instruction cache according to the fetch address, first, using the index in the fetch address, one tag is read from each way of the tag portion of the instruction cache, giving N tags (i.e., tag 0 corresponding to way 0 to tag N-1 corresponding to way N-1). Then, corresponding data is read from each way of the data portion of the instruction cache according to the index of the fetch address. Since an area for storing instruction boundary information is allocated in the data portion of the instruction cache, when the corresponding data is read from each way according to the index of the fetch address, N data entries can be read from the area storing instructions and N data entries from the area storing instruction boundary information, giving 2N data entries in total (i.e., data 0 to data N-1 corresponding to the instructions in the instruction cache, and data 0 to data N-1 corresponding to the instruction boundary information in the instruction cache). Further, after the tag in the fetch address is compared one by one with the N tags read from the tag portion of the instruction cache and a hit is determined, the embodiment of the present application selects, according to the comparison result, the data entry of the way equal to the hit tag from the N data entries corresponding to the instruction data, and likewise the data entry of the way equal to the hit tag from the N data entries corresponding to the instruction boundary information, thereby outputting the instruction and the instruction boundary information.
The following further explains the case of an instruction cache with two ways. The tag portion and the data portion of the instruction cache each have two ways, for example way 0 and way 1, and the data portion of the instruction boundary cache also has two ways, for example way 0 and way 1; the instruction boundary cache multiplexes the tag portion of the instruction cache. Therefore, in the embodiment of the present application, the tag of way 0 and the tag of way 1 are read from the two ways of the tag portion of the instruction cache according to the index of the fetch address, the data of way 0 and the data of way 1 are read from the two ways of the data portion of the instruction cache, and the instruction boundary information of way 0 and of way 1 is read from the data portion of the instruction boundary cache. The tag of the fetch address is compared with the tags of way 0 and way 1 of the tag portion of the instruction cache; if the tag of the fetch address is the same as the tag of way 0, the instruction corresponding to way 0 in the data portion of the instruction cache is selected, thereby obtaining the instruction output. Meanwhile, based on way 0, where the tag hit by the fetch address in the instruction cache is located, the instruction boundary information corresponding to way 0 is selected from the data portion of the instruction boundary cache, thereby obtaining the instruction boundary information output.
In some embodiments, the instruction boundary information indicates the instruction position at which instruction splitting is performed. Optionally, the instruction boundary information may specifically indicate an instruction end address; for example, if 0x40013 is the start address of an instruction of length 4, the instruction end address is 0x40016. Based on this, when there is no branch jump in the instruction stream, an optional implementation for the instruction splitting unit to split the instruction stream is that the instruction splitting unit records a count of consecutive sequential fetches of the instruction stream, and the recorded count of sequential fetching may be the first numerical information. When the first numerical information reaches a first threshold (for example, the first threshold may be 2), if the instruction boundary information read based on the current fetch request is valid, the instruction stream is split according to the instruction end address indicated by the instruction boundary information, thereby obtaining a plurality of instruction groups. The first numerical information recorded by the instruction splitting unit may be the number of fetch requests corresponding to consecutive sequential fetching, or the number of bytes of fetch requests corresponding to consecutive sequential fetching, which is not limited in this application.
It should be noted that after the recorded first numerical information reaches the first threshold, the read instruction boundary information is valid, and the instruction splitting unit splits the instruction stream according to the read instruction boundary information, the first numerical information is reset and the count of consecutive sequential fetching is re-recorded (for example, the number of fetch requests corresponding to consecutive sequential fetching, or the number of bytes of such fetch requests, is re-recorded). Further, if the instruction splitting unit encounters a branch jump instruction, the instruction stream is split at the instruction end position of the branch jump instruction, and the first numerical information is also reset.
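The counter behavior described above can be sketched as follows. This is an illustrative model, not the patent's implementation; the counter here counts fetch requests, which is one of the two options the text allows:

```python
class SplitUnit:
    """Models the instruction splitting unit's first-numerical-information counter."""

    def __init__(self, first_threshold=2):
        self.first_threshold = first_threshold
        self.first_value = 0   # count of consecutive sequential fetch requests

    def on_fetch(self, boundary_valid, is_branch_jump):
        """Return True if the instruction stream should be split here."""
        if is_branch_jump:
            self.first_value = 0      # split at the branch's end position, reset
            return True
        self.first_value += 1
        if self.first_value >= self.first_threshold and boundary_valid:
            self.first_value = 0      # split at the boundary info's end address, reset
            return True
        return False
```

With the example threshold of 2, the first sequential fetch does not split, the second one does (given valid boundary information), and a branch jump always splits and resets the count.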
In some embodiments, for example in the initial state, the instruction boundary cache is empty, i.e., no instruction boundary information is stored in the instruction boundary cache before the decoder groups perform decoding operations. Optionally, storing instruction boundary information into the instruction boundary cache may occur while the decoder groups decode instructions; for example, the instruction boundary information determined from a decoded instruction is stored into the instruction boundary cache by a decoder group. As an optional implementation, when a plurality of decoder groups decode instructions, each decoder group records a count of consecutive sequential decoding, and the recorded count may be the second numerical information. When the second numerical information reaches a second threshold (for example, the second threshold is 1), the decoder group stores the instruction boundary information of the current instruction, determined by decoding the current instruction, into the instruction boundary cache. The second numerical information recorded by a decoder group may be the number of fetch requests corresponding to consecutive sequential decoding, or the number of instructions corresponding to consecutive sequential decoding, which is not limited in this application.
It is to be understood that, based on the structure in which the instruction boundary cache and the data portion of the instruction cache are in one-to-one correspondence, when instruction boundary information is written into the instruction boundary cache, the instruction boundary information of an instruction may be written into the instruction boundary cache block that corresponds one to one with the instruction cache block holding that instruction. In one example, when an instruction is fetched from the instruction cache and allocated to a decoder group for decoding, the decoder group also receives the index and way information corresponding to the instruction boundary information of the instruction. When the decoder group writes the instruction boundary information into the instruction boundary cache, it writes the instruction boundary information into the instruction boundary cache block corresponding to that index and way information.
It should be further noted that after the second numerical information recorded by a decoder group reaches the second threshold and the instruction boundary information determined by decoding the current instruction is stored into the instruction boundary cache, the second numerical information is reset, and the count of consecutive sequential decoding is re-recorded (for example, the number of fetch requests corresponding to consecutive sequential decoding, or the number of instructions corresponding to consecutive sequential decoding, is re-recorded). The second numerical information is also reset when the decoder group encounters a branch jump instruction.
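The decoder-group counter can be sketched the same way as the splitting unit's counter. The model below is illustrative (names and the dict standing in for the instruction boundary cache are assumptions); it counts decoded instructions, one of the two options the text allows:

```python
class DecoderGroup:
    """Models a decoder group's second-numerical-information counter."""

    def __init__(self, second_threshold=1):
        self.second_threshold = second_threshold
        self.second_value = 0
        self.boundary_cache = {}   # (index, way) -> instruction end address

    def on_decode(self, index, way, end_addr, is_branch_jump):
        if is_branch_jump:
            self.second_value = 0  # branch jump resets the count, no write
            return
        self.second_value += 1
        if self.second_value >= self.second_threshold:
            # write the boundary info into the block selected by index and way
            self.boundary_cache[(index, way)] = end_addr
            self.second_value = 0  # reset and re-count sequential decoding
```

With the example threshold of 1, every sequentially decoded instruction writes its end address into the boundary-cache block addressed by the index and way information received with the instruction.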
According to the embodiment of the present application, the instruction boundary information of instructions can be recorded through the instruction boundary cache and the instruction stream can be split accordingly, so that the split instruction stream is distributed to a plurality of decoder groups for parallel decoding. This achieves the effect of splitting the instruction stream and distributing it to a plurality of decoder groups for decoding, which can improve the throughput of the decoder groups and thus the decoding performance of the processor.
In the above description, the processor in the embodiment of the present application achieves parallel decoding of instructions by providing a plurality of decoder groups, thereby improving the throughput of the decoder groups; however, the resulting micro-instructions still have to go through the fetch and decode processes each time they are needed.
FIG. 9 is a block diagram of a processor architecture with a micro instruction cache according to an embodiment of the present disclosure. Referring to fig. 3 and 9, a micro instruction cache 203 is further added to the processor shown in fig. 9, and is configured to store micro instructions obtained by decoding instructions of a plurality of decoder groups, so that when an instruction fetch request generated by the branch prediction unit is directly issued to the micro instruction cache, if the instruction fetch request hits in the micro instruction cache, the micro instruction cache directly outputs a corresponding micro instruction.
It should be noted that the micro instruction cache 203 includes a plurality of entries, where each entry can accommodate up to N micro-instructions, and each fetch request (i.e., the fetch address range it indicates) may correspond to a plurality of micro instruction cache entries. When fetching from the micro instruction cache, the fetch start address in the fetch request is first matched against the address of the first micro-instruction of every micro instruction cache entry; if the matching succeeds, the up to N micro-instructions in the matching entry are obtained. If the end address of the last micro-instruction in that entry is smaller than the end address of the fetch range, the end address of the last micro-instruction in the entry is used to further match the addresses of the first micro-instructions of all entries; if the matching succeeds, the up to N micro-instructions in the second matching entry are obtained. The above process is repeated until the end address of the fetch range no longer exceeds the end address of the last micro-instruction in the matching entry.
As can be seen from the above process, although each micro instruction cache entry stores up to N micro-instructions, only the start address of the first micro-instruction participates in the lookup comparison. For example, if a micro instruction cache entry stores micro-instructions 0 to 7, the entry can only be obtained by looking up with the start address of micro-instruction 0. That is, micro instruction cache entry construction and lookup are paired. For example, suppose a fetch request corresponds to instructions 1 to 7: if the micro instruction cache entry is constructed as micro-instructions 0 to 7, the entry cannot be accessed by the fetch request, because the start address of instruction 1 in the fetch request does not match the address of micro-instruction 0 in the entry; if the entry is constructed as micro-instructions 1 to 8, the entry is accessible by the fetch request, because the start address of instruction 1 in the fetch request matches the address of micro-instruction 1 in the entry.
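The lookup loop above can be sketched as follows. This is a simplified model: micro-ops are represented by inclusive (start, end) address pairs, and the assumption that the next match starts at the byte after an entry's last micro-instruction is one reading of "further matching" in the text:

```python
def uop_cache_lookup(fetch_start, fetch_end, entries):
    """entries: dict mapping a first-uop start address -> list of (start, end) uops."""
    uops, addr = [], fetch_start
    while addr <= fetch_end:
        entry = entries.get(addr)   # only first-uop addresses participate in lookup
        if entry is None:
            return None             # miss at this point of the fetch range
        uops.extend(entry)
        last_end = entry[-1][1]
        if last_end >= fetch_end:
            break                   # fetch range fully covered
        addr = last_end + 1         # re-match against the next entry's first uop
    return uops
```

Note how the model also reproduces the pairing principle: a fetch starting at 0x101 misses against an entry whose first micro-op starts at 0x100, even though that entry covers the address.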
Based on this construction principle of micro instruction cache entries, the embodiment of the present application further improves the processor, so that when the micro instruction cache constructs a micro instruction entry, the instruction boundary information of the corresponding instruction carried by the micro-instruction is stored into the instruction boundary cache. Fig. 10 is a block diagram illustrating another architecture of a processor provided in an embodiment of the present application.
As shown in FIG. 10, the decoder sets 1031 to 103n store decoded micro instructions in the micro instruction cache 203, and the micro instruction cache 203 fills the instruction boundary information into the instruction boundary cache 202. When a processor accesses instruction cache 102 based on an instruction fetch request generated by branch prediction unit 101, instruction boundary cache 202 can be accessed synchronously; thus, according to the fetch address in the fetch request, the instruction stream is read in the instruction cache 102, and the instruction boundary information is read in the instruction boundary cache 202; further, the instruction splitting unit 201 splits the instruction stream according to the effective instruction boundary information to obtain a plurality of instruction groups, and allocates the instruction groups to the decoder groups 1031 to 103n for parallel decoding.
It should be noted that, in the processor shown in fig. 10, the filling of instruction boundary information into the instruction boundary cache by the micro instruction cache occurs when the micro instruction cache constructs a micro instruction cache entry, where a micro instruction cache entry may also be referred to as an entry of micro-instructions. As an optional implementation, in the initial state, for example, the decoder groups 1031 to 103n have not performed decoding operations, so the micro instruction cache cannot construct micro instruction cache entries and thus cannot obtain instruction boundary information; at this time, the instruction boundary cache may be empty. After the decoder groups 1031 to 103n perform decoding operations, the decoded sequential micro-instructions may be sent to the micro instruction cache 203, so that the micro instruction cache constructs entries and fills the sequential micro-instructions into the micro instruction cache. When a micro instruction cache entry is constructed, the micro instruction cache stores the boundary information corresponding to the micro-instruction at the end position of the entry into the instruction boundary cache, where the boundary information corresponding to the micro-instruction at the end position of a micro instruction cache entry is the end position of the instruction.
It should be further explained that, when an entry is constructed, the micro-instructions to be placed need to satisfy the construction conditions of the micro instruction cache entry. In an optional example, the micro-instructions obtained by decoding in a decoder group are consecutive micro-instructions, and the decoder group sends these consecutive micro-instructions to the micro instruction cache for entry construction and filling, where each micro instruction cache entry has storage capacity for N micro-instructions. The micro instruction cache checks the micro-instructions to be placed to determine whether they satisfy the several construction conditions of the corresponding entry; when any construction condition is not satisfied, the micro-instruction to be placed cannot enter the entry. For example, A0, A1, A2, A3, A4 are consecutive micro-instructions sent by a decoder group to the micro instruction cache. When A0 to A4 are used to construct micro instruction cache entries, A0, A1, A2 satisfy the construction conditions and form one micro instruction cache entry, while A3 cannot satisfy the construction conditions; starting from A3, a new entry different from the entry formed by A0, A1, A2 is constructed. At this time, the boundary information corresponding to A2 is the end position of the instruction and is filled into the instruction boundary cache.
For another example: b0, B1, B2, B3, B4 are consecutive microinstructions that the decoder group sends to the microinstruction cache, when B0, B1, B2, B3, B4 is in constructing the cache table entry of the microinstruction, B0, B1, B2, B3 form a cache table entry of the microinstruction, B4 can't meet the construction condition, begin to construct a new table entry from B4 again, at this moment, the correspondent boundary information of B3 is the end position of the order, is filled into the instruction boundary cache.
It can be understood that, since the boundary information corresponding to the micro-instruction at the end position of a micro instruction cache entry is the end position of the instruction, when the micro instruction cache 203 constructs a micro instruction cache entry and writes the instruction boundary information into the instruction boundary cache 202, the decoder group may further pass the index and way information corresponding to the instruction boundary information of the instruction to the micro instruction cache, and the micro instruction cache writes the instruction boundary information into the instruction boundary cache block corresponding to that index and way information.
According to the embodiment of the present application, the instruction stream read from the instruction cache can be split in time based on the instruction boundary information recorded in the instruction boundary cache, avoiding the situation in which a plurality of decoder groups cannot decode in parallel because the instructions are not split in time, thereby effectively improving the decoding throughput and the decoding performance of the processor.
An embodiment of the present application further provides a processor, which, in combination with fig. 9 or fig. 10, includes at least:
the instruction cache is used for acquiring an instruction stream according to the instruction fetching address;
the instruction boundary cache is used for acquiring instruction boundary information according to the instruction fetching address, and the instruction boundary information indicates an instruction position for performing instruction segmentation;
and the instruction splitting unit is used for splitting the instruction stream according to the instruction position indicated by the instruction boundary information and distributing the split instruction stream to a plurality of decoder groups for parallel decoding.
In some embodiments, the instruction boundary cache being configured to acquire instruction boundary information according to the fetch address includes:
reading, from the instruction boundary cache, the instruction boundary information corresponding to a target tag hit by the fetch address in the instruction cache, wherein the instruction boundary cache multiplexes the tags of the instruction cache, and the target tag is the instruction cache tag corresponding to the fetch address.
In some embodiments, the instruction cache includes multiple ways of tag fields and a data field corresponding to each way of tag field, and the instruction boundary cache includes multiple ways of data fields; the multiple ways of data fields of the instruction boundary cache multiplex the multiple ways of tag fields of the instruction cache, in one-to-one correspondence.
In some embodiments, the instruction boundary cache being configured to read, according to a target tag hit by the fetch address in the instruction cache, the instruction boundary information corresponding to the target tag from the instruction boundary cache includes:
determining, from the instruction boundary cache, the data field way number corresponding to the tag field way number of the target tag in the instruction cache; and reading the instruction boundary information in the data field corresponding to that data field way number of the instruction boundary cache.
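The tag-multiplexed lookup described above can be modeled roughly as follows (a behavioral sketch with hypothetical class and parameter names, assuming a conventional set-associative organization with 64-byte lines): the boundary cache stores no tags of its own; a hit in way `w` of the instruction cache's tag array directly selects data field `w` of the boundary cache for the same set.

```python
# Behavioral model of a boundary cache that multiplexes the instruction
# cache's tags (hypothetical names; organization parameters are assumptions).

class TagMultiplexedBoundaryCache:
    def __init__(self, num_sets, num_ways):
        self.num_sets = num_sets
        self.num_ways = num_ways
        # Instruction-cache tag array: tags[set][way]
        self.tags = [[None] * num_ways for _ in range(num_sets)]
        # Boundary-cache data fields, one per tag way: boundary[set][way]
        self.boundary = [[None] * num_ways for _ in range(num_sets)]

    def lookup(self, fetch_addr, line_bytes=64):
        set_idx = (fetch_addr // line_bytes) % self.num_sets
        tag = fetch_addr // (line_bytes * self.num_sets)
        for way in range(self.num_ways):
            if self.tags[set_idx][way] == tag:      # target tag hit
                return self.boundary[set_idx][way]  # same way number selects data
        return None                                 # miss: no boundary info

cache = TagMultiplexedBoundaryCache(num_sets=4, num_ways=2)
# Pretend the instruction cache allocated tag 7 into way 1 of set 0.
cache.tags[0][1] = 7
cache.boundary[0][1] = [12, 47]  # instruction end offsets within the line
addr = 7 * 4 * 64                # tag=7, set=0
print(cache.lookup(addr))        # -> [12, 47]
```

The design point this models is the one the patent emphasizes: the boundary cache needs no tag array of its own, because the hit way number of the instruction cache identifies the matching data field.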
In some embodiments, one way of the data field of the instruction boundary cache is stored in a cache block; the one way of the data field of the instruction boundary cache includes a plurality of instruction boundary information and a plurality of valid bits respectively indicating whether each piece of instruction boundary information is valid, or includes a plurality of instruction boundary information and a single valid bit that uniformly indicates whether the instruction boundary information is valid.
In some embodiments, the instruction boundary cache, configured to read instruction boundary information in a data field corresponding to the data field number of the instruction boundary cache, includes:
reading, in the data field corresponding to the data field way number of the instruction boundary cache, one or more pieces of instruction boundary information whose valid bit indicates valid.
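The two block layouts just described can be sketched as follows (function names and the four-entry block size are illustrative assumptions): layout A carries one valid bit per boundary entry, layout B a single valid bit covering the whole block.

```python
# Two hypothetical layouts for one boundary-cache data field (one cache block).

def read_per_entry_valid(entries, valid_bits):
    """Layout A: each boundary entry has its own valid bit."""
    return [e for e, v in zip(entries, valid_bits) if v]

def read_shared_valid(entries, shared_valid):
    """Layout B: one valid bit uniformly covers the whole block."""
    return list(entries) if shared_valid else []

block = [12, 47, 0, 0]
print(read_per_entry_valid(block, [1, 1, 0, 0]))  # -> [12, 47]
print(read_shared_valid(block, 0))                # -> []
```

Layout A allows a partially filled block to expose only its valid entries; layout B trades that precision for a single bit of storage.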
In some embodiments, the instruction boundary information indicates instruction locations for instruction splitting, including: the instruction boundary information indicates an instruction end address.
In some embodiments, the instruction splitting unit, configured to split the instruction stream according to the instruction position indicated by the instruction boundary information, includes:
recording first numerical information of consecutive sequential instruction fetches; and when the first numerical information reaches a first threshold and the instruction boundary information read based on the current fetch request is valid, segmenting the instruction stream according to the instruction end address indicated by the instruction boundary information to obtain a plurality of instruction groups.
In some embodiments, the first numerical information includes a number of fetch addresses corresponding to consecutive sequential fetches, or a number of bytes of fetch addresses corresponding to consecutive sequential fetches.
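A minimal sketch of this split decision (class name, threshold value, and addresses are hypothetical): a counter tracks consecutive sequential fetches; once it reaches the first threshold and the boundary information read for the current fetch is valid, the stream is cut at the indicated instruction end address and the counter restarts.

```python
# Hypothetical model of the threshold-gated split decision.

class SplitDecider:
    def __init__(self, first_threshold):
        self.first_threshold = first_threshold
        self.sequential_count = 0   # the "first numerical information"

    def on_fetch(self, is_sequential, boundary_end_addr):
        if not is_sequential:
            self.sequential_count = 0   # a jump breaks the sequential run
            return None
        self.sequential_count += 1
        if (self.sequential_count >= self.first_threshold
                and boundary_end_addr is not None):   # boundary info valid
            self.sequential_count = 0                 # restart after a split
            return boundary_end_addr                  # split at this end address
        return None

d = SplitDecider(first_threshold=3)
results = [d.on_fetch(True, None), d.on_fetch(True, 0x40),
           d.on_fetch(True, 0x80), d.on_fetch(True, 0xC0)]
print(results)  # -> [None, None, 128, None]
```

Note that the second fetch carries valid boundary information but does not split, because the threshold has not yet been reached; this matches the gating described in the text.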
In some embodiments, the split instruction stream is divided, with the position of the split instruction as the boundary, into a former instruction group before the split and a latter instruction group after the split. The instruction splitting unit being configured to distribute the split instruction stream to a plurality of decoder groups for parallel decoding includes:
allocating the former instruction group to a default decoder group for decoding, and allocating the latter instruction group to the decoder group next to the default decoder group for decoding.
In some further embodiments, the processor may further comprise: a plurality of decoder groups;
and the decoder group is used for decoding the distributed instructions and storing the instruction boundary information determined by the decoded instructions into the instruction boundary cache.
In some embodiments, the saving the instruction boundary information determined by the decoded instruction into the instruction boundary cache comprises:
recording, in the decoder group, second numerical information of consecutive sequential decoding; and when the second numerical information reaches a second threshold, storing into the instruction boundary cache the instruction boundary information determined for the instruction currently being decoded by the decoder group.
In some embodiments, the second numerical information includes a number of instruction fetch requests corresponding to sequential decoding, or a number of instructions corresponding to sequential decoding.
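The decode-side fill path can be sketched in the same style (class name, threshold, and the dict standing in for the boundary cache are hypothetical): the decoder group counts how long it has been decoding sequentially, and once the count reaches the second threshold, the end address of the instruction currently being decoded is written into the boundary cache as a future split point.

```python
# Hypothetical model of a decoder group recording boundary info after
# sustained sequential decoding.

class DecoderGroup:
    def __init__(self, second_threshold, boundary_cache):
        self.second_threshold = second_threshold
        self.boundary_cache = boundary_cache   # maps fetch block -> end addr
        self.sequential_count = 0              # the "second numerical information"

    def decode(self, is_sequential, fetch_block, instr_end_addr):
        if not is_sequential:
            self.sequential_count = 0          # a jump breaks the run
            return
        self.sequential_count += 1
        if self.sequential_count >= self.second_threshold:
            # Record the current instruction's end address as a boundary.
            self.boundary_cache[fetch_block] = instr_end_addr
            self.sequential_count = 0

cache = {}
dg = DecoderGroup(second_threshold=2, boundary_cache=cache)
dg.decode(True, fetch_block=0x100, instr_end_addr=0x113)
dg.decode(True, fetch_block=0x100, instr_end_addr=0x11A)
print(cache)  # -> {256: 282}
```

Thresholding on sustained sequential decoding means boundaries are only recorded for hot straight-line code, where a pre-recorded split point is most likely to be reused.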
In some further embodiments, the instruction splitting unit is further configured to:
if the fetch address is an address indicating a branch jump, segmenting the instruction stream according to the branch jump information, and distributing the segmented instruction stream to the plurality of decoder groups for parallel decoding.
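Combining the two split sources can be sketched as a simple priority rule (function and parameter names are illustrative assumptions): a taken branch always supplies the split point from the branch-jump information; otherwise the boundary cache supplies it, if valid.

```python
# Hypothetical priority between branch-jump splitting and boundary-cache
# splitting for one fetch.

def split_point(fetch_addr, is_branch_jump, jump_source_end, boundary_end):
    if is_branch_jump:
        return jump_source_end   # split at the branch, per jump information
    return boundary_end          # may be None: no split this fetch

print(split_point(0x200, True, 0x20C, None))    # -> 524
print(split_point(0x200, False, None, 0x230))   # -> 560
```

This reflects the structure of the embodiments: branch jumps already provide a natural instruction boundary, and the instruction boundary cache supplies one for long sequential runs where no branch is available.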
In some further embodiments, the processor further comprises: a microinstruction cache, configured to store the microinstructions obtained by the decoder groups decoding instructions, and, when constructing an entry storing the microinstructions, to store the instruction boundary information carried by the microinstructions into the instruction boundary cache.
The embodiment of the application also provides a chip, and the chip can comprise the processor.
The embodiment of the application further provides an electronic device, which may include the chip.
While various embodiments of the present disclosure have been described above, the alternatives described in the various embodiments can, where there is no conflict, be combined and cross-referenced, thereby extending the variety of possible embodiments that may be considered disclosed embodiments of the present disclosure.
Although the embodiments of the present application are disclosed above, the present application is not limited thereto. Those skilled in the art may make various changes and modifications without departing from the spirit and scope of the present disclosure, and the scope of protection shall therefore be defined by the appended claims.

Claims (24)

1. An instruction allocation method, comprising:
reading an instruction stream from an instruction cache according to an instruction fetching address, and reading instruction boundary information from an instruction boundary cache according to the instruction fetching address, wherein the instruction boundary information indicates an instruction position for performing instruction segmentation;
and segmenting the instruction stream according to the instruction position indicated by the instruction boundary information, and distributing the segmented instruction stream to a plurality of decoder groups for parallel decoding.
2. The method of claim 1, wherein reading instruction boundary information from an instruction boundary cache according to the instruction fetch address comprises:
and according to a target tag hit by the instruction fetching address in the instruction cache, reading instruction boundary information corresponding to the target tag from the instruction boundary cache, wherein the instruction boundary cache multiplexes the tag of the instruction cache.
3. The method of claim 2, wherein the instruction cache includes multiple tag fields and data fields corresponding to the multiple tag fields, and wherein the instruction boundary cache includes multiple data fields; the multi-path data field of the instruction boundary cache is multiplexed with the multi-path tag field of the instruction cache, and the multi-path data field of the instruction boundary cache is in one-to-one correspondence with the multi-path tag field of the instruction cache.
4. The method according to claim 3, wherein the reading instruction boundary information corresponding to the target tag from the instruction boundary cache according to the target tag hit by the fetch address in the instruction cache comprises:
determining a data domain path number corresponding to the tag domain path number from the instruction boundary cache according to the tag domain path number of the target tag in the instruction cache; and reading instruction boundary information in a data domain corresponding to the data domain path number of the instruction boundary cache.
5. The method of claim 4, wherein a way data field of the instruction boundary cache is stored in an instruction boundary cache block; the instruction boundary cache block includes a plurality of instruction boundary information and a plurality of valid bits respectively indicating whether the respective instruction boundary information is valid or not, or the instruction boundary cache block includes a plurality of instruction boundary information.
6. The method of claim 5, wherein the reading instruction boundary information in a data field corresponding to the data field way number of the instruction boundary cache comprises:
reading, in the data field corresponding to the data field way number of the instruction boundary cache, one or more pieces of instruction boundary information whose valid bit indicates valid; or reading valid instruction boundary information according to the value of the instruction boundary information in the data field corresponding to the data field way number of the instruction boundary cache.
7. The instruction distribution method according to claim 1, wherein the instruction boundary information indicates an instruction position for performing instruction splitting, and comprises: the instruction boundary information indicates an instruction end address.
8. The instruction distribution method according to claim 7, wherein the splitting the instruction stream according to the instruction position indicated by the instruction boundary information comprises:
recording first numerical information of consecutive sequential instruction fetches; and when the first numerical information reaches a first threshold and the instruction boundary information read based on the current fetch address is valid, segmenting the instruction stream according to the instruction end address indicated by the instruction boundary information to obtain a plurality of instruction groups.
9. The method of claim 8, wherein the first value information comprises a number of fetch addresses corresponding to consecutive sequential fetches, or a number of bytes of the fetch addresses corresponding to consecutive sequential fetches.
10. The instruction distribution method according to claim 1, wherein the sliced instruction stream is divided into a preceding instruction group after slicing and a succeeding instruction group after slicing with a position of the slicing instruction as a boundary; the allocating the split instruction stream to a plurality of decoder groups for parallel decoding comprises:
and allocating the segmented former instruction group to a default decoder group for decoding, and allocating the segmented latter instruction group to a next decoder group of the default decoder group for decoding.
11. The instruction distribution method of claim 1, further comprising:
when the decoder group decodes the instruction, the instruction boundary information determined by the instruction decoded by the decoder group is stored in the instruction boundary cache.
12. The method of claim 11, wherein saving the instruction boundary information determined by the decoder group decoding the instruction into the instruction boundary cache when the decoder group decodes the instruction comprises:
and recording second numerical value information of continuous sequential decoding in the decoder group, and storing the instruction boundary information of the current instruction determined by the current instruction decoded by the decoder group into the instruction boundary cache when the second numerical value information reaches a second threshold value.
13. The method of claim 12, wherein the second numerical information comprises a number of fetch addresses corresponding to sequential decoding, or a number of instructions corresponding to sequential decoding.
14. The instruction distribution method according to claim 1, further comprising:
and if the instruction fetching address is an address indicating branch jumping, the instruction stream is segmented according to branch jumping information, and the segmented instruction stream is distributed to a plurality of decoder groups for parallel decoding.
15. The instruction distribution method of claim 1, further comprising:
storing microinstructions obtained by decoding the instructions by the plurality of decoder groups into a microinstruction cache;
and when the microinstruction cache finishes constructing the entry storing the microinstructions, storing the instruction boundary information carried by the microinstructions into the instruction boundary cache.
16. A processor, comprising:
the instruction cache is used for acquiring an instruction stream according to the instruction fetching address;
the instruction boundary cache is used for acquiring instruction boundary information according to the instruction fetching address, wherein the instruction boundary information indicates an instruction position for performing instruction segmentation;
and the instruction splitting unit is used for splitting the instruction stream according to the instruction position indicated by the instruction boundary information and distributing the split instruction stream to a plurality of decoder groups for parallel decoding.
17. The processor of claim 16, wherein the instruction boundary information indicates instruction locations for instruction slicing comprising: the instruction boundary information indicates an instruction end address.
18. The processor of claim 17, wherein the instruction splitting unit is configured to split the instruction stream according to the instruction position indicated by the instruction boundary information, and wherein the instruction splitting unit is configured to:
recording first numerical information of consecutive sequential instruction fetches; and when the first numerical information reaches a first threshold and the instruction boundary information read based on the current fetch request is valid, segmenting the instruction stream according to the instruction end address indicated by the instruction boundary information to obtain a plurality of instruction groups.
19. The processor of claim 16, wherein the stream of sliced instructions is divided into a preceding instruction group after slicing and a succeeding instruction group after slicing, with a position of the slicing instruction as a boundary; the instruction splitting unit is configured to allocate the split instruction stream to a plurality of decoder groups for parallel decoding, and includes:
and allocating the segmented former instruction group to a default decoder group for decoding, and allocating the segmented latter instruction group to a next decoder group of the default decoder group for decoding.
20. The processor of claim 16, further comprising: a plurality of decoder groups;
and the decoder group is used for decoding the distributed instruction group and storing the instruction boundary information determined by the decoded instruction into the instruction boundary cache.
21. The processor as claimed in claim 16, wherein the instruction splitting unit is further configured to split the instruction stream according to branch jump information and allocate the split instruction stream to a plurality of decoder groups for parallel decoding if the fetch address is an address indicating branch jump.
22. The processor of claim 16, further comprising:
and the micro instruction cache is used for storing the micro instructions obtained by the decoding instructions of the decoder group and storing the instruction boundary information carried by the micro instructions into the instruction boundary cache when the table entry for storing the micro instructions is constructed.
23. A chip comprising a processor as claimed in any one of claims 16 to 22.
24. An electronic device comprising the chip of claim 23.
CN202211348765.5A 2022-10-31 2022-10-31 Instruction distribution method, processor, chip and electronic equipment Active CN115658150B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211348765.5A CN115658150B (en) 2022-10-31 2022-10-31 Instruction distribution method, processor, chip and electronic equipment


Publications (2)

Publication Number Publication Date
CN115658150A true CN115658150A (en) 2023-01-31
CN115658150B CN115658150B (en) 2023-06-09

Family

ID=84992598



Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108279927A (en) * 2017-12-26 2018-07-13 芯原微电子(上海)有限公司 The multichannel command control method and system, controller of adjustable instruction priority
CN110069285A (en) * 2019-04-30 2019-07-30 海光信息技术有限公司 A kind of method and processor of detection branches prediction
CN112631660A (en) * 2020-12-16 2021-04-09 广东赛昉科技有限公司 Method for parallel instruction extraction and readable storage medium
CN113986774A (en) * 2021-11-16 2022-01-28 中国科学院上海高等研究院 Cache replacement system and method based on instruction stream and memory access mode learning
CN114201219A (en) * 2021-12-21 2022-03-18 海光信息技术股份有限公司 Instruction scheduling method, instruction scheduling device, processor and storage medium




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant