CN117348933B - Processor and computer system - Google Patents

Processor and computer system

Info

Publication number
CN117348933B
CN117348933B (application CN202311652922.6A)
Authority
CN
China
Prior art keywords
instruction
buffer
output
unit
instruction information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311652922.6A
Other languages
Chinese (zh)
Other versions
CN117348933A (en)
Inventor
刘宇翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ruisixinke Shenzhen Technology Co ltd
Original Assignee
Ruisixinke Shenzhen Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ruisixinke Shenzhen Technology Co ltd filed Critical Ruisixinke Shenzhen Technology Co ltd
Priority to CN202311652922.6A priority Critical patent/CN117348933B/en
Publication of CN117348933A publication Critical patent/CN117348933A/en
Application granted granted Critical
Publication of CN117348933B publication Critical patent/CN117348933B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/30003: Arrangements for executing specific machine instructions
    • G06F 9/30007: Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F 9/30025: Format conversion instructions, e.g. Floating-Point to Integer, decimal conversion
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

The invention belongs to the technical field of processors and provides a processor and a computer system. By adding additional instruction group buffers that hold the instruction information likely to be output in the next cycle, the instruction buffer unit no longer selects the corresponding instructions from the main buffer through a large crossbar matrix when it needs to output; instead, it selects the contents of the corresponding instruction group buffer for output and judgment, while the instruction group buffers are replenished from the main buffer or from the input of the instruction buffer unit. With this structure, about two stages of logic can be saved on the output path. As the main buffer size and the issue width of the superscalar processor system grow, the timing benefit of the proposed processor increases, which helps achieve a higher operating frequency and better performance.

Description

Processor and computer system
Technical Field
The present invention relates to the field of processor technology, and more particularly, to a processor and a computer system.
Background
The processor is the primary computing component in a computer system. It is responsible for executing instructions and performing arithmetic and logic operations. With the continuous development of computer technology, processor performance and functionality have improved greatly. Processors on the market currently fall into two main categories: reduced instruction set computing (RISC) processors and complex instruction set computing (CISC) processors. A RISC processor achieves efficient instruction processing with a simple instruction set, while a CISC processor supports more complex instructions and higher-level operations. With the development of technologies such as artificial intelligence and machine learning, the performance requirements placed on processors keep increasing.
A simplified typical superscalar processor pipeline is shown in FIG. 1. As shown in FIG. 1, it can be divided into four modules: an instruction fetch unit, an instruction buffer unit, a decoding unit, and an execution unit. The instruction fetch unit is responsible for fetching, in each cycle, the instructions to be executed by the processor from memory; the instruction buffer unit is responsible for holding the instructions fetched by the instruction fetch unit and balancing the instruction throughput gap between the instruction fetch unit and the decoding unit; the decoding unit is responsible for decoding the fetched instructions to obtain operands and sending the relevant information to the execution unit, which executes them to produce results. Among these, the instruction buffer unit is a key component of a superscalar processor, whose function is to store and schedule instructions. In a high-performance superscalar processor system, for reasons of system balance, the number of instructions fetched per cycle by the instruction fetch unit is generally greater than the number of instructions the later decoding stage can decode per cycle, so the fetched instructions are first stored in the instruction buffer unit and then sent to the later decoding unit, balancing the different load requirements on the two sides. Instruction buffer unit performance therefore directly affects the performance and efficiency of a superscalar processor.
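To make this division concrete, the following is a minimal, hypothetical cycle-level C++ sketch of the four modules and of the width imbalance the instruction buffer unit absorbs; the class names and the example widths (4-wide fetch, 3-wide decode) are illustrative assumptions, not taken from the patent.

```cpp
// Hypothetical cycle-level sketch of the four-stage division described above
// (fetch -> instruction buffer -> decode -> execute). Names and widths are
// illustrative assumptions; capacity limits and back-pressure are omitted.
#include <cstdint>
#include <deque>
#include <vector>

struct Instr { uint32_t raw = 0; };

struct Pipeline {
    std::vector<Instr> fetch_group;   // produced by the fetch unit each cycle
    std::deque<Instr>  instr_buffer;  // balances fetch width vs. decode width
    std::vector<Instr> decoded;       // operands resolved, ready to execute

    static constexpr int kFetchWidth  = 4;  // fetch is usually wider...
    static constexpr int kDecodeWidth = 3;  // ...than decode

    void step() {
        // Call the stages back to front so data moves one stage per cycle.
        execute();
        decode();
        buffer_refill();
        fetch();
    }
    void fetch()  { /* pull up to kFetchWidth instructions from memory */ }
    void buffer_refill() {
        for (const auto& i : fetch_group) instr_buffer.push_back(i);
        fetch_group.clear();
    }
    void decode() {
        for (int n = 0; n < kDecodeWidth && !instr_buffer.empty(); ++n) {
            decoded.push_back(instr_buffer.front());
            instr_buffer.pop_front();
        }
    }
    void execute() { decoded.clear(); /* consume decoded instructions */ }
};
```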
The main body of an existing instruction buffer unit is a fixed-size first-in first-out (FIFO) queue, i.e., a main buffer, with the structure shown in FIG. 2. When the superscalar processor runs, the instruction fetch unit of the preceding stage stores the fetched pieces of instruction information in the main buffer of the instruction buffer unit in order. Meanwhile, at the output of the instruction buffer unit, a certain number of instructions are issued to the later-stage processing unit each cycle according to the number of instructions currently remaining in the instruction buffer unit.
As shown in FIG. 2, the conventional instruction buffer unit mainly consists of two crossbar matrices, an output selection logic unit, an output buffer, and a main buffer. In operation, when the previous stage delivers an instruction packet, if the remaining space in the main buffer of the instruction buffer unit is insufficient, the front-end request is back-pressured; if the remaining space is sufficient, the main buffer positions for the incoming instruction packet are selected according to the current write pointer of the main buffer, the packet is stored into the FIFO of the main buffer in order, and the FIFO write pointer is updated. When the instruction buffer unit outputs instructions to the next stage, the instructions to be output are selected from the main buffer through a crossbar matrix according to the FIFO read pointer and stored in the output buffer; the output selection logic unit then determines which instructions can actually be output and sends them to the decoding unit of the next stage.
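The following is a hypothetical C++ sketch of the conventional structure just described: a fixed-size FIFO main buffer with read and write pointers, a write side that back-pressures the front end when the packet does not fit, and a read-side crossbar that copies the entries at the read pointer into the output buffer. The depths (32-entry main buffer, 6-entry output buffer) are assumptions chosen only for illustration.

```cpp
// Hypothetical sketch of the conventional instruction buffer unit of FIG. 2.
#include <array>
#include <cstdint>

constexpr int kMainDepth   = 32;  // assumed main-buffer depth
constexpr int kOutputDepth = 6;   // assumed output-buffer depth

struct Entry { uint16_t half = 0; bool valid = false; };  // one buffer slot

struct ConventionalIBuf {
    std::array<Entry, kMainDepth> main_buf{};
    std::array<Entry, kOutputDepth> out_buf{};
    int rd_ptr = 0, wr_ptr = 0, count = 0;

    // Write side: back-pressure if the packet does not fit, otherwise enqueue
    // at the write pointer and update it.
    bool push(const Entry* pkt, int n) {
        if (kMainDepth - count < n) return false;   // back-pressure the front end
        for (int i = 0; i < n; ++i) {
            main_buf[wr_ptr] = pkt[i];
            wr_ptr = (wr_ptr + 1) % kMainDepth;
        }
        count += n;
        return true;
    }

    // Read side: the crossbar selects kOutputDepth entries starting at rd_ptr.
    // Because of the FIFO wrap, each output slot is effectively a
    // kMainDepth-to-1 selection.
    void crossbar_read() {
        for (int i = 0; i < kOutputDepth; ++i)
            out_buf[i] = main_buf[(rd_ptr + i) % kMainDepth];
    }
};
```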
In a superscalar processor system of this architecture, to ensure that the pipeline at the instruction buffer unit runs continuously without interruption, the instruction buffer unit needs to complete the following work in each cycle, as shown in FIG. 3.
1. According to the latest main buffer read pointer of the current cycle, read out of the main buffer the range of instructions that may be output this cycle.
2. According to the content of the first instruction in that range, determine whether it is a compressed or non-compressed instruction, and from this determine the starting position of the next instruction.
3. Repeat the previous step in a loop to determine the boundaries of all instructions and obtain the number of main buffer entries occupied by the output instructions.
4. Based on the current valid depth of the main buffer, determine the number of instructions that can actually be issued to the later pipeline stages.
5. Update the read pointer of the main buffer.
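Continuing the ConventionalIBuf sketch above, steps 1 to 5 of this per-cycle output path can be written out as follows. The 16-bit/32-bit boundary test is an assumption in the style of a RISC-V compressed encoding; the patent itself only distinguishes compressed from non-compressed instructions.

```cpp
// Hypothetical sketch of the per-cycle output path (steps 1-5 above), reusing
// the ConventionalIBuf types. The compressed-instruction test (low two bits
// not 0b11 means a 16-bit instruction) is an illustrative assumption.
struct OutputPathResult { int instr_count; int entries_used; };

OutputPathResult output_path(const ConventionalIBuf& b, int max_issue /* e.g. 3 */) {
    // Step 1: read the candidate window at the current read pointer
    // (this is the crossbar read of FIG. 2).
    Entry window[kOutputDepth];
    for (int i = 0; i < kOutputDepth; ++i)
        window[i] = b.main_buf[(b.rd_ptr + i) % kMainDepth];

    // Steps 2-3: walk the window, classifying each instruction as compressed
    // (1 entry) or non-compressed (2 entries), to find all boundaries.
    int instr_count = 0, entries_used = 0;
    while (instr_count < max_issue && entries_used < kOutputDepth &&
           entries_used < b.count) {
        bool compressed = (window[entries_used].half & 0x3u) != 0x3u;  // assumption
        int len = compressed ? 1 : 2;
        // Step 4: stop if the valid depth or the window cannot supply the
        // whole instruction.
        if (entries_used + len > b.count || entries_used + len > kOutputDepth) break;
        entries_used += len;
        ++instr_count;
    }
    // Step 5 (done by the caller): advance rd_ptr by entries_used.
    return {instr_count, entries_used};
}
```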
It can be seen that, since both the starting point and the end point of this path are the read pointer of the main buffer, the whole path must be completed within one cycle to keep the pipeline at the instruction buffer unit running without interruption. This becomes a key factor limiting the frequency the superscalar processor can achieve. At the same time, the logic depth of the crossbar matrix in the output path and the complexity of the output selection logic grow with the main buffer size of the instruction buffer unit and the issue width of the superscalar processor system; this is even more significant for high-performance, high-throughput systems and can therefore become a bottleneck in some computation-intensive applications, preventing the superscalar processor from fully using its processing power.
Disclosure of Invention
The invention provides a processor and a computer system, which solve the problem in the prior art that the critical timing path associated with the output end of the instruction buffer unit in the processor is too long, affecting processor performance and efficiency.
In a first aspect, the present invention provides a processor, where the processor includes an instruction fetch unit, an instruction buffer unit connected to the output end of the instruction fetch unit, a decoding unit connected to the output end of the instruction buffer unit, and an execution unit connected to the output end of the decoding unit;
the instruction fetch unit is used for fetching, in each cycle, the instruction information to be executed by the processor from memory; the instruction buffer unit is used for storing the instruction information fetched by the instruction fetch unit and balancing the throughput gap of instruction information between the instruction fetch unit and the decoding unit; the decoding unit is used for decoding the fetched instruction information to obtain operands and sending them to the execution unit; the execution unit is used for executing the instruction information and obtaining results;
the instruction buffer unit comprises a main buffer, a first crossbar matrix connected with the input end of the main buffer, a plurality of instruction group buffers respectively connected with the output end of the main buffer and the output end of the first crossbar matrix, a second crossbar matrix connected with the output ends of the plurality of instruction group buffers, an output buffer connected with the output end of the second crossbar matrix and an output selection logic unit connected with the output end of the output buffer; the output end of the output selection logic unit is connected with the decoding unit;
the instruction group buffer is used for storing instruction information to be output in the next cycle; the first crossbar matrix is used for splitting the instruction information output by the instruction fetch unit into pieces matching the storage granularity of the instruction buffer unit according to the instruction information currently stored in the instruction buffer unit, and outputting them to the corresponding positions in the main buffer and the instruction group buffers for storage; the main buffer is used for storing the instruction information output by the instruction fetch unit; the second crossbar matrix is used for selecting the instruction information in the corresponding instruction group buffer from the plurality of instruction group buffers and outputting it to the output buffer; the output buffer is used for temporarily storing the instruction information of the corresponding instruction group buffer; the output selection logic unit is used for making a judgment according to the instruction information output by the output buffer, selecting the instruction information to be output in the current cycle, and outputting it to the decoding unit.
Preferably, after the main buffer receives the instruction information output by the instruction fetch unit, if the remaining space of the main buffer is currently insufficient, a back-pressure is applied to the front-end request of the instruction fetch unit; if the remaining space of the main buffer is sufficient, the instruction information is stored into the first-in first-out queue of the main buffer in order, and the write pointer position of the first-in first-out queue is updated.
Preferably, the number of instruction group buffers corresponds to the maximum number of entries the main buffer can output per cycle.
Preferably, the instruction information received by the instruction group buffer includes instruction information output by the main buffer and instruction information output by the instruction fetch unit in the current cycle.
Preferably, after the output selection logic unit outputs instruction information to the decoding unit, the read pointer position of the main buffer is updated according to the amount of instruction information output.
Preferably, the instruction information of the corresponding instruction group buffer is selected as the instruction information output by the output buffer according to the amount of instruction information output by the main buffer in the previous cycle.
In a second aspect, the present invention also provides a computer system comprising a processor as in any one of the above embodiments.
Compared with the prior art, the invention adds additional instruction group buffers to hold the instruction information likely to be output in the next cycle, so that when the instruction buffer unit needs to output, it no longer selects the corresponding instructions from the main buffer through a large crossbar matrix; instead, it selects the contents of the corresponding instruction group buffer for output and judgment, while the instruction group buffers are replenished from the main buffer or from the input of the instruction buffer unit. With this structure, about two stages of logic can be saved. As the main buffer size and the issue width of the superscalar processor system grow, the timing benefit of the proposed processor increases, which helps achieve a higher operating frequency and better performance.
Drawings
The present invention will be described in detail with reference to the accompanying drawings. The foregoing and other aspects of the invention will become more apparent and more readily appreciated from the following detailed description taken in conjunction with the accompanying drawings. In the accompanying drawings:
FIG. 1 is a schematic diagram of a simplified superscalar processor pipeline provided by the related art;
FIG. 2 is a schematic diagram of an instruction buffer unit according to the related art;
FIG. 3 is a schematic diagram of a critical timing path of an output stage of an instruction buffer unit according to the related art;
FIG. 4 is a schematic diagram of an instruction buffer unit according to an embodiment of the present invention.
Reference numerals: 100, instruction buffer unit; 101, first crossbar matrix; 102, main buffer; 103, instruction group buffer; 104, second crossbar matrix; 105, output buffer; 106, output selection logic unit.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Embodiment 1
Referring to FIG. 4, the present invention provides a processor, which includes an instruction fetch unit, an instruction buffer unit 100 connected to the output end of the instruction fetch unit, a decoding unit connected to the output end of the instruction buffer unit 100, and an execution unit connected to the output end of the decoding unit;
the instruction fetch unit is used for fetching, in each cycle, the instruction information to be executed by the processor from memory; the instruction buffer unit 100 is configured to store the instruction information fetched by the instruction fetch unit and to balance the throughput gap of instruction information between the instruction fetch unit and the decoding unit; the decoding unit is used for decoding the fetched instruction information to obtain operands and sending them to the execution unit; the execution unit is used for executing the instruction information and obtaining results;
the instruction buffer unit 100 includes a main buffer 102, a first crossbar matrix 101 connected to an input of the main buffer 102, a plurality of instruction group buffers 103 connected to an output of the main buffer 102 and an output of the first crossbar matrix 101, respectively, a second crossbar matrix 104 connected to outputs of the plurality of instruction group buffers 103, an output buffer 105 connected to an output of the second crossbar matrix 104, and an output selection logic unit 106 connected to an output of the output buffer 105; wherein, the output end of the output selection logic unit 106 is connected with the decoding unit;
the instruction set buffer 103 is used for storing instruction information to be output in the next period; the first crossbar 101 is configured to split the instruction information output by the instruction fetching unit into corresponding instruction information with a size stored in the instruction buffer unit 100 according to the current instruction information situation stored in the instruction buffer unit 100, and output the instruction information to corresponding positions in the main buffer 102 and the instruction group buffer 103 for storage; the main buffer 102 is configured to store instruction information output by the instruction fetch unit; the second crossbar 104 is configured to select instruction information in the corresponding instruction set buffer 103 from the plurality of instruction set buffers 103, and output the instruction information to the output buffer 105; the output buffer 105 is configured to temporarily store instruction information in the corresponding instruction set buffer 103; the output selection logic unit 106 is configured to determine according to the instruction information output by the output buffer 105, and select instruction information to be output in the current period to output to the decoding unit.
In the embodiment of the present invention, after the main buffer 102 receives the instruction information output by the instruction fetch unit, if the remaining space of the main buffer 102 is currently insufficient, a back-pressure is applied to the front-end request of the instruction fetch unit; if the remaining space of the main buffer 102 is sufficient, the instruction information is stored into the first-in first-out queue of the main buffer 102 in order, and the write pointer position of the first-in first-out queue is updated.
In an embodiment of the present invention, the number of instruction group buffers 103 corresponds to the maximum number of entries the main buffer 102 can output per cycle. Specifically, the instruction buffer unit 100 outputs at most 3 instructions per cycle, so the main buffer 102 outputs at most 6 entries of content per cycle; 7 instruction group buffers 103 are therefore required (the extra one covers the case in which no instruction is output in the current cycle), and each instruction group buffer 103 holds one complete output buffer 105 worth of content, i.e., 6 entries. It should be noted that the number of instruction group buffers 103 is related to the maximum output of the instruction buffer unit 100 per cycle, and other numbers of instruction group buffers 103 are possible.
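As a quick check of these numbers, under the assumption that a non-compressed instruction occupies two main-buffer entries (consistent with the 3-instruction, 6-entry figures above, though not stated explicitly in the text), the count of instruction group buffers follows directly:

```cpp
// Worked check of the buffer count used in this embodiment. The "up to 2
// entries per instruction" factor is an assumption, not an explicit statement
// of the patent.
constexpr int kMaxInstrPerCycle   = 3;  // at most 3 instructions output per cycle
constexpr int kMaxEntriesPerInstr = 2;  // assumed: a non-compressed instruction spans 2 entries
constexpr int kMaxEntriesConsumed = kMaxInstrPerCycle * kMaxEntriesPerInstr;  // = 6

// The number of entries consumed in a cycle can be anything from 0 (nothing
// output) up to kMaxEntriesConsumed, so one instruction group buffer is kept
// per possible value, each pre-holding the 6-entry window that would become
// current for that consumption count.
constexpr int kNumGroupBuffers = kMaxEntriesConsumed + 1;  // = 7
static_assert(kNumGroupBuffers == 7, "matches the 7 instruction group buffers of the example");
```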
In the embodiment of the present invention, the instruction information received by the instruction group buffer 103 includes the instruction information output by the main buffer 102 and the instruction information output by the instruction fetch unit in the current cycle. Specifically, when the corresponding instruction information in the main buffer 102 is valid, the instruction group buffer 103 receives the instruction information output by the main buffer 102; otherwise, it is checked whether there is currently valid input instruction information, and when the instruction fetch unit provides valid instruction information, the corresponding instruction information input by the instruction fetch unit is selected instead.
For example, taking 7 instruction group buffers 103: when the instruction buffer unit 100 outputs instruction information to the decoding unit, if the main buffer 102 output 2 entries of instruction information in the previous cycle, the instruction group buffer 103 corresponding to the count 2 among the 7 instruction group buffers 103 is selected; its instruction information is output to the output buffer 105 through the second crossbar matrix 104, then screened by the output selection logic unit 106, and finally output to the decoding unit.
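Putting the pieces together, a hypothetical per-cycle output step for the InstrBufferUnit100 sketch above might look as follows; it reuses the compressed-instruction assumption from earlier, and assumes that the first crossbar matrix 101 also writes the same incoming instruction information into the main buffer 102 elsewhere in the cycle.

```cpp
// Hypothetical per-cycle output step of instruction buffer unit 100.
struct IssueResult { int instr_count; int entries_used; };

IssueResult issue_cycle(InstrBufferUnit100& u, const Slot* new_input, int new_count) {
    // Second crossbar 104: one 7-to-1 selection; e.g. if 2 entries were
    // consumed from the main buffer last cycle, group buffer number 2 already
    // holds the window that starts 2 entries later.
    u.out_buf = u.group_buf[u.entries_consumed_last_cycle];

    // Output selection logic 106: find instruction boundaries in out_buf
    // (same 16/32-bit compressed-instruction assumption as before) and decide
    // how many instructions really go to the decoding unit.
    int instrs = 0, used = 0;
    while (instrs < kIssueWidth && used < kGroupDepth && u.out_buf[used].valid) {
        int len = ((u.out_buf[used].half & 0x3u) != 0x3u) ? 1 : 2;  // assumption
        if (used + len > kGroupDepth) break;                // spills past the window
        if (len == 2 && !u.out_buf[used + 1].valid) break;  // second half missing
        used += len;
        ++instrs;
    }

    // Update the main buffer read pointer by the entries consumed this cycle.
    u.rd_ptr = (u.rd_ptr + used) % kMainBufDepth;
    u.count -= used;
    u.entries_consumed_last_cycle = used;

    // Refill the group buffers for the next cycle: group k mirrors the window
    // rd_ptr + k .. rd_ptr + k + kGroupDepth - 1. A slot comes from the main
    // buffer when that entry is valid; otherwise it is taken directly from the
    // instruction information arriving this cycle (the first crossbar 101
    // path), as described above.
    for (int k = 0; k < kGroupBufCount; ++k) {
        for (int i = 0; i < kGroupDepth; ++i) {
            int off = k + i;
            if (off < u.count)
                u.group_buf[k][i] = u.main_buf[(u.rd_ptr + off) % kMainBufDepth];
            else if (off - u.count < new_count)
                u.group_buf[k][i] = new_input[off - u.count];  // bypass from input
            else
                u.group_buf[k][i] = Slot{};                    // nothing valid yet
        }
    }
    return IssueResult{instrs, used};
}
```

In this sketch the only selection left on the output side is the 7-to-1 pick of a group buffer, which is what the comparison in the next paragraph quantifies as a saving of about two logic levels.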
In the existing processor design, the instruction output end needs a 32-to-1 selection to obtain the contents of the output buffer, i.e., about 6 levels of logic; in the processor of the present design, only a 7-to-1 selection is needed, i.e., about 4 levels of logic, so about two levels of logic can be saved.
In the embodiment of the present invention, after the output selection logic unit 106 outputs instruction information to the decoding unit, the read pointer position of the main buffer 102 is updated according to the amount of instruction information output.
In the embodiment of the present invention, the instruction information of the corresponding instruction group buffer 103 is selected as the instruction information output by the output buffer 105 according to the amount of instruction information output by the main buffer 102 in the previous cycle.
Compared with the prior art, the invention adds additional instruction group buffers to hold the instruction information likely to be output in the next cycle, so that when the instruction buffer unit needs to output, it no longer selects the corresponding instructions from the main buffer through a large crossbar matrix; instead, it selects the contents of the corresponding instruction group buffer for output and judgment, while the instruction group buffers are replenished from the main buffer or from the input of the instruction buffer unit. With this structure, about two stages of logic can be saved. As the main buffer size and the issue width of the superscalar processor system grow, the timing benefit of the proposed processor increases, which helps achieve a higher operating frequency and better performance.
Embodiment 2
The embodiment of the present invention also provides a computer system, including the processor described in Embodiment 1. Since the computer system in this embodiment includes the processor of the above embodiment, it can achieve the technical effects achieved by the processor of Embodiment 1, which are not described here again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
While the embodiments of the present invention have been illustrated and described in connection with the drawings as what are presently considered the most practical and preferred embodiments, it is to be understood that the invention is not limited to the disclosed embodiments but, on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (7)

1. A processor, characterized in that the processor comprises an instruction fetch unit, an instruction buffer unit connected to the output end of the instruction fetch unit, a decoding unit connected to the output end of the instruction buffer unit, and an execution unit connected to the output end of the decoding unit;
the instruction fetch unit is used for fetching, in each cycle, the instruction information to be executed by the processor from memory; the instruction buffer unit is used for storing the instruction information fetched by the instruction fetch unit and balancing the throughput gap of instruction information between the instruction fetch unit and the decoding unit; the decoding unit is used for decoding the fetched instruction information to obtain operands and sending them to the execution unit; the execution unit is used for executing the instruction information and obtaining results;
the instruction buffer unit comprises a main buffer, a first crossbar matrix connected with the input end of the main buffer, a plurality of instruction group buffers respectively connected with the output end of the main buffer and the output end of the first crossbar matrix, a second crossbar matrix connected with the output ends of the plurality of instruction group buffers, an output buffer connected with the output end of the second crossbar matrix and an output selection logic unit connected with the output end of the output buffer; the output end of the output selection logic unit is connected with the decoding unit;
the instruction group buffer is used for storing instruction information to be output in the next cycle; the first crossbar matrix is used for splitting the instruction information output by the instruction fetch unit into pieces matching the storage granularity of the instruction buffer unit according to the instruction information currently stored in the instruction buffer unit, and outputting them to the corresponding positions in the main buffer and the instruction group buffers for storage; the main buffer is used for storing the instruction information output by the instruction fetch unit; the second crossbar matrix is used for selecting the instruction information in the corresponding instruction group buffer from the plurality of instruction group buffers and outputting it to the output buffer; the output buffer is used for temporarily storing the instruction information of the corresponding instruction group buffer; the output selection logic unit is used for making a judgment according to the instruction information output by the output buffer, selecting the instruction information to be output in the current cycle, and outputting it to the decoding unit.
2. The processor as set forth in claim 1, wherein, after said main buffer receives instruction information output by said instruction fetch unit, if the remaining space of said main buffer is currently insufficient, a back-pressure is applied to the front-end request of said instruction fetch unit; and if the remaining space of said main buffer is sufficient, the instruction information is stored into the first-in first-out queue of said main buffer in order, and the write pointer position of the first-in first-out queue is updated.
3. The processor of claim 1, wherein the number of instruction group buffers corresponds to the maximum number of entries the main buffer can output per cycle.
4. The processor of claim 1, wherein the instruction information received by the instruction group buffer includes instruction information output by the main buffer and instruction information output by the instruction fetch unit in the current cycle.
5. The processor of claim 1, wherein, after the output selection logic unit outputs instruction information to the decoding unit, the read pointer position of the main buffer is updated according to the amount of instruction information output.
6. The processor of claim 1, wherein instruction information of the corresponding instruction group buffer is selected as the instruction information output by the output buffer according to the amount of instruction information output by the main buffer in the previous cycle.
7. A computer system comprising a processor as claimed in any one of claims 1 to 6.
CN202311652922.6A 2023-12-05 2023-12-05 Processor and computer system Active CN117348933B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311652922.6A CN117348933B (en) 2023-12-05 2023-12-05 Processor and computer system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311652922.6A CN117348933B (en) 2023-12-05 2023-12-05 Processor and computer system

Publications (2)

Publication Number Publication Date
CN117348933A (en) 2024-01-05
CN117348933B (en) 2024-02-06

Family

ID=89361763

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311652922.6A Active CN117348933B (en) 2023-12-05 2023-12-05 Processor and computer system

Country Status (1)

Country Link
CN (1) CN117348933B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0651321A1 (en) * 1993-10-29 1995-05-03 Advanced Micro Devices, Inc. Superscalar microprocessors
CN102156637A (en) * 2011-05-04 2011-08-17 中国人民解放军国防科学技术大学 Vector crossing multithread processing method and vector crossing multithread microprocessor
CN105242904A (en) * 2015-09-21 2016-01-13 中国科学院自动化研究所 Apparatus for processor instruction buffering and circular buffering and method for operating apparatus
WO2016016726A2 (en) * 2014-07-30 2016-02-04 Linear Algebra Technologies Limited Vector processor
CN111512298A (en) * 2018-04-03 2020-08-07 英特尔公司 Apparatus, method and system for conditional queuing in configurable spatial accelerators
CN116483441A (en) * 2023-06-21 2023-07-25 睿思芯科(深圳)技术有限公司 Output time sequence optimizing system, method and related equipment based on shift buffering
CN116501389A (en) * 2023-06-28 2023-07-28 睿思芯科(深圳)技术有限公司 Instruction buffer unit, processor and computer system

Also Published As

Publication number Publication date
CN117348933A (en) 2024-01-05

Similar Documents

Publication Publication Date Title
EP0689128B1 (en) Computer instruction compression
FI90804B (en) A data processor control unit having an interrupt service using instruction prefetch redirection
EP1368732B1 (en) Digital signal processing apparatus
EP2671150B1 (en) Processor with a coprocessor having early access to not-yet issued instructions
EP1046983B1 (en) VLIW processor and program code compression device and method
US6108768A (en) Reissue logic for individually reissuing instructions trapped in a multiissue stack based computing system
US6654871B1 (en) Device and a method for performing stack operations in a processing system
CN116483441B (en) Output time sequence optimizing system, method and related equipment based on shift buffering
US6275903B1 (en) Stack cache miss handling
CN116501389B (en) Instruction buffer unit, processor and computer system
CN117348933B (en) Processor and computer system
US6237086B1 (en) 1 Method to prevent pipeline stalls in superscalar stack based computing systems
JP3779012B2 (en) Pipelined microprocessor without interruption due to branching and its operating method
US6237087B1 (en) Method and apparatus for speeding sequential access of a set-associative cache
US6170050B1 (en) Length decoder for variable length data
US5155818A (en) Unconditional wide branch instruction acceleration
EP0992889A1 (en) Interrupt processing during iterative instruction execution
KR100639146B1 (en) Data processing system having a cartesian controller
US6550003B1 (en) Not reported jump buffer
CN117667222B (en) Two-stage branch prediction system, method and related equipment with optimized time sequence
CN112181497B (en) Method and device for transmitting branch target prediction address in pipeline
CN109614146B (en) Local jump instruction fetch method and device
US7941638B2 (en) Facilitating fast scanning for control transfer instructions in an instruction fetch unit
EP0689129B1 (en) Processing of computer instructions with a reduced number of bits for operand specifiers
WO2007094256A1 (en) Queue processor and data processing method using queue processor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant