CN111475203B - Instruction reading method for processor and corresponding processor - Google Patents

Instruction reading method for processor and corresponding processor

Info

Publication number
CN111475203B
Authority
CN
China
Prior art keywords
instruction
instructions
cache
memory
buffer
Prior art date
Legal status
Active
Application number
CN202010258353.7A
Other languages
Chinese (zh)
Other versions
CN111475203A (en)
Inventor
倪永良
Current Assignee
Xiaohua Semiconductor Co ltd
Original Assignee
Xiaohua Semiconductor Co ltd
Priority date
Filing date
Publication date
Application filed by Xiaohua Semiconductor Co ltd
Priority to CN202010258353.7A
Publication of CN111475203A
Application granted
Publication of CN111475203B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003: Arrangements for executing specific machine instructions
    • G06F9/3004: Arrangements for executing specific machine instructions to perform operations on memory
    • G06F9/30047: Prefetch instructions; cache control instructions
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00: Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02: Addressing or allocation; Relocation
    • G06F12/08: Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802: Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0875: Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, with dedicated cache, e.g. instruction or stack
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Advance Control (AREA)

Abstract

The invention relates to an instruction reading method for a processor, and further to a corresponding processor. In this method, instructions that the prefetch unit prefetches and hits are placed in a buffer rather than in the cache, which reduces cache occupancy and improves cache utilization. At the same time, because the cache stores the instructions fetched from the ROM on the first read and the remaining sequential instructions can be prefetched by the prefetch unit, when a program is called a second time the instruction-fetch speed is essentially the same as with a conventional cache and is unaffected by wait cycles.

Description

Instruction reading method for processor and corresponding processor
Technical Field
The present invention relates generally to the field of processors, and more particularly, to an instruction fetching method for a processor. The invention further relates to a processor.
Background
With the progress of semiconductor processes, the processing speed of processors such as general-purpose processors and micro-controller units (MCUs) has increased greatly, their frequencies rising from a few MHz in the past to several GHz. At the same time, however, the instruction access speed of memory devices such as computer hard disks and read-only memories (ROMs) has not kept pace with the instruction execution speed of processors: a hard disk or ROM is read at only tens to a little over a hundred MB/s, far below the instruction processing rate of the processor, which at GHz frequencies reaches several billion instructions per second (thousands of MIPS). Moreover, the speed gap between processors and memory tends to widen.
In order to bridge the speed gap between processor and memory and so reduce processor waiting time, various solutions have been proposed in the prior art. These schemes are described below taking as an example a CPU reading from a ROM that requires 2 wait cycles:
1. When no optimization is performed, the CPU waits 2 cycles on every instruction fetch; that is, it needs an average of 3 cycles to fetch one instruction.
2. Increase the bit width of the ROM and add a buffer. When the ROM's bit width is increased to 4 instructions, 4 instructions can be read per access, and the buffer stores the 4 instructions read most recently. When the CPU fetches instructions sequentially, it waits 2 cycles only when fetching the first of the 4 instructions; the remaining 3 instructions are provided directly from the buffer without waiting. On average, then, 4 instructions are fetched in 6 cycles. Increasing the ROM bit width and adding a buffer thus improves the CPU's instruction-fetch efficiency.
3. Add a prefetch unit (Prefetch Unit). The prefetch unit sits between the buffer and the ROM and performs prefetching, i.e. reading and saving several instructions in advance, so as to reduce the CPU's waiting time. When an instruction prefetched by the prefetch unit is exactly the instruction the CPU wants next, this is called a prefetch hit; otherwise it is a miss. On a miss, the CPU must read the instruction from the ROM. With a prefetch unit, only the first instruction of a sequentially executed program incurs wait cycles; thereafter up to an average of one instruction per cycle can be executed. However, since the prefetch unit assumes sequential execution, 2 wait cycles must be inserted whenever a jump occurs.
4. Add a cache (Cache). The cache sits between the CPU and the ROM and stores all instructions the CPU has executed. When the CPU executes these instructions again, they are provided directly from the cache and need not be read from the slow ROM. Once a program is stored entirely in the cache, repeated execution of it, whether sequential or with jumps, leaves CPU performance unaffected by ROM wait cycles. To improve performance on first execution, a cache is typically combined with a prefetch unit.
As can be appreciated from the above, a cache greatly reduces CPU waiting time. However, while the cache's access speed is very high, so is its cost. It is therefore desirable to achieve the lowest possible processor latency with the smallest possible cache.
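The cycle counts quoted in schemes 1 and 2 above can be reproduced with a small model (an illustrative sketch with hypothetical function names, not part of the patent): a ROM access costs 1 cycle plus a fixed number of wait cycles, and a wide ROM amortizes that cost over a whole instruction group.

```python
# Sketch of the wait-cycle arithmetic from the background section.
# Assumption: a ROM access costs 1 cycle plus WAIT wait cycles.
WAIT = 2  # wait cycles per ROM access, as in the example

def avg_cycles_no_optimization():
    # Scheme 1: every fetch is a fresh ROM read -> 1 + WAIT cycles each.
    return 1 + WAIT

def avg_cycles_wide_rom_with_buffer(width=4):
    # Scheme 2: one ROM read fills the buffer with `width` instructions;
    # the remaining width - 1 fetches come from the buffer at 1 cycle each.
    return ((1 + WAIT) + (width - 1)) / width

print(avg_cycles_no_optimization())        # 3 cycles per instruction
print(avg_cycles_wide_rom_with_buffer())   # 1.5 cycles per instruction
```

This matches the figures in the text: 3 cycles per instruction without optimization, and 6 cycles per 4 instructions (1.5 on average) with a 4-instruction-wide ROM and a buffer.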
Disclosure of Invention
Starting from the prior art, the object of the present invention is to provide an instruction fetch method for a processor by which processor latency can be reduced as much as possible while cache occupancy is also reduced, thereby lowering the required cache capacity or raising the cache's utilization efficiency.
According to the invention, this task is solved by an instruction fetch method for a processor, comprising the following steps:
providing, by an arithmetic unit, an instruction address of a first instruction to be read;
providing, by the buffer, the first instruction if the first instruction is present in the buffer; otherwise:
if the first instruction is present in the cache, providing, by the cache, the first instruction and storing, by the buffer, a group of instructions cached in the cache that includes the first instruction; otherwise:
if the first instruction is present in the prefetch unit, providing, by the prefetch unit, the first instruction and storing, by the buffer, an instruction group including the first instruction; otherwise:
an instruction group including the first instruction is read by the memory according to an instruction address of the first instruction and provided to the arithmetic unit, and the instruction group is cached by the cache and stored by the buffer.
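A minimal sketch of the lookup order these steps describe (function and variable names are hypothetical; the sketch additionally assumes instruction groups are aligned blocks of GROUP consecutive instructions, which the patent does not mandate):

```python
# Lookup order of the claimed method: buffer -> cache -> prefetch -> memory.
# Each level maps a group base address to a list of GROUP instructions.
GROUP = 4

def group_base(addr):
    # Base address of the aligned group containing addr (sketch assumption).
    return addr - (addr % GROUP)

def fetch(addr, buffer, cache, prefetch, rom):
    base = group_base(addr)
    if base in buffer:                    # buffer hit: provide directly
        return buffer[base][addr - base], "buffer"
    if base in cache:                     # cache hit: group is copied to buffer
        buffer[base] = cache[base]
        return buffer[base][addr - base], "cache"
    if base in prefetch:                  # prefetch hit: group goes to the
        buffer[base] = prefetch[base]     # buffer only, NOT to the cache
        return buffer[base][addr - base], "prefetch"
    group = rom[base]                     # miss everywhere: read from memory;
    cache[base] = group                   # the group is cached AND buffered
    buffer[base] = group
    return group[addr - base], "memory"
```

Note how the prefetch-hit branch writes only to the buffer; this is the distinguishing feature of the method.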
In the present invention, "arithmetic unit" refers to a unit in a processor for processing or executing instructions. A "processor" should be broadly interpreted as a device that executes instructions, such as a general purpose processor, a special purpose processor, an MCU, and so forth.
In one embodiment of the invention, provision is made for:
storing, by the buffer, a set of instructions including a first instruction includes: storing, by a buffer, the group of instructions and an instruction address of the group of instructions in association; and/or
Caching, by a cache, a set of instructions including a first instruction includes: the instruction group and the instruction address of the instruction group are stored in association with each other by the cache.
In a further embodiment of the invention, provision is made for:
the instruction group is a plurality of instructions having consecutive storage locations.
For example, the instruction group consists of the instructions at 4 consecutive addresses starting from the target instruction. Other instruction-fetch, prefetch, and cache-fetch schemes are also conceivable.
In a further embodiment of the invention, provision is made for:
the bit width of the memory is n times the instruction length and the storage capacity of the buffer is n times the instruction length, where n =2 k K is an integer and k is not less than 0; and/or
The instruction group comprises n instructions, wherein n =2 k K is an integer and k is not less than 0; and/or
The prefetch unit prefetches n instructions at a time, where n =2 k K is an integer and k is not less than 0.
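The condition n = 2^k with integer k ≥ 0 is the usual power-of-two constraint; a sketch of validating a configured group size (the helper name is hypothetical):

```python
def is_valid_group_size(n):
    # n must be a positive power of two: 1, 2, 4, 8, ...
    # For such n, n & (n - 1) clears the single set bit, giving 0.
    return n >= 1 and (n & (n - 1)) == 0

print([m for m in range(1, 9) if is_valid_group_size(m)])  # [1, 2, 4, 8]
```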
In a further embodiment of the invention, it is provided that the processor is a microcontroller unit MCU and the memory is a read-only memory ROM.
Furthermore, the invention relates to a processor configured to perform the method according to the invention.
In addition, the invention also provides a micro control unit, wherein the memory is a read only memory ROM, and the micro control unit is configured to execute the method according to the invention.
In a second aspect of the invention, the aforementioned task is solved by a processor comprising:
the arithmetic unit is configured to send a target instruction address and receive a target instruction corresponding to the target instruction address for execution;
a buffer configured to store instructions read from the memory, and to store instructions prefetched by the prefetch unit when the target instruction is present in neither the buffer nor the cache;
a cache configured to store instructions each time they are read from memory;
a prefetch unit configured to prefetch instructions from a memory at a set timing; and a memory configured to output, according to the target instruction address, a plurality of stored instructions including the target instruction.
In one embodiment of the invention, it is provided that the memory comprises one or more of the following: SDRAM, DRAM, and read only memory.
The invention has at least the following beneficial effects: (1) by using a prefetch unit together with a buffer, the processor's waiting time is effectively reduced; (2) instructions that the prefetch unit prefetches and hits are placed in the buffer rather than in the cache, which reduces cache occupancy and improves cache utilization; at the same time, because the cache stores the instructions read from the ROM on the first pass and the remaining sequential instructions can be prefetched by the prefetch unit, when the program is called a second time the instruction-fetch speed is essentially the same as with a conventional cache and is unaffected by wait cycles.
Drawings
The invention is further elucidated with reference to the drawings in conjunction with the detailed description.
FIG. 1 illustrates an architecture of a processor according to the present invention;
FIG. 2 illustrates an embodiment according to the present invention; and
fig. 3 shows a flow of the method according to the invention.
Detailed Description
It should be noted that the components in the figures may be exaggerated and not necessarily to scale for illustrative purposes. In the figures, identical or functionally identical components are provided with the same reference symbols.
In the present invention, "disposed on …", "disposed above …", and "disposed over …" do not exclude the presence of an intermediate element in between, unless otherwise specified. Furthermore, "disposed on or above …" merely indicates the relative positional relationship between two components; in certain cases, for example after reversing the product orientation, it can switch to "disposed under or below …", and vice versa.
In the present invention, the embodiments are only intended to illustrate the aspects of the present invention, and should not be construed as limiting.
In the present invention, the terms "a" and "an" do not exclude the presence of a plurality of elements, unless otherwise specified.
It is further noted herein that in embodiments of the present invention, only a portion of the components or assemblies may be shown for clarity and simplicity, but those of ordinary skill in the art will appreciate that, given the teachings of the present invention, required components or assemblies may be added as needed in a particular scenario.
It is also noted herein that, within the scope of the present invention, the terms "same", "equal", and the like do not mean that two values are absolutely equal but allow some reasonable error; that is, these terms also encompass "substantially the same" and "substantially equal". By analogy, in the present invention the directional terms "perpendicular", "parallel", and the like also encompass "substantially perpendicular" and "substantially parallel".
The numbering of the steps of the methods of the present invention does not limit the order in which the method steps are performed. Unless specifically stated, the method steps may be performed in a different order. In particular, in the present application, some acts of the components of the processor may be performed in parallel, e.g., the prefetch instruction act may be performed in parallel with other acts. Thus, the order in which the method steps of the present application are sequenced does not necessarily imply that the associated steps can only be performed in that order.
Furthermore, in the present invention, the term "instruction set" or "set of instructions" may comprise one or more instructions, e.g. depending on parameter limitations such as bandwidth of the memory.
Fig. 1 shows the architecture of a processor 100 according to the invention.
The processor 100 includes, for example, an arithmetic unit (AU) 101, a buffer 102, a cache 103, a prefetch unit 104, and a memory, here a read-only memory ROM 105. These components are in data communication with one another via data lines such as an address bus and a data bus. In the present embodiment, the buffer 102 and the cache 103 read instructions faster than the ROM 105 does, while their storage capacities increase in that order. The bit width of the ROM 105 is, for example, 4 instructions; the capacity of the buffer 102 is likewise, for example, 4 instructions; and the capacity of the cache 103 is, for example, 1 kB, i.e. 1024 bytes. In other embodiments, other bit widths and capacities may be set.
The following components of the processor 100 are described separately.
The operator 101 is configured, for example, to send instruction addresses, such as ROM instruction addresses, to the other components of the processor (the buffer 102, the cache 103, the prefetch unit 104, and the ROM 105), for example via an address bus (not shown), and to receive from those components, for example via a data bus (not shown), the instructions stored at those addresses for execution. The operator 101 may itself be composed of components such as an arithmetic logic unit (ALU), an accumulator, a status register, and general registers. It is configured to execute instructions of one or more instruction sets, such as addition, multiplication, and shift operations.
The buffer 102 is configured to store instructions read from the cache 103 or the prefetch unit 104 or the ROM 105, for example.
The cache (Cache) 103 is configured, for example, to store the instructions each time they are read from the ROM 105. For example, when the target instruction is not present in the cache and the instructions prefetched by the prefetch unit also miss the target instruction (i.e. do not contain it), the instruction must be fetched from the ROM 105; the cache 103 then stores the instructions fetched from the ROM 105.
The prefetch unit 104 is configured, for example, to prefetch instructions from the ROM 105 every instruction cycle or at some other interval (e.g. every two instruction cycles; other timings are also conceivable), and, when the target instruction is not present in the cache, to send prefetch-hit instructions to the buffer 102 for storage. Prefetching is preferably performed when the ROM is idle; it can then proceed in parallel with the actions of other components, such as a processor read. The exact prefetch timing may be determined and optimized for the usage scenario and actual demand.
The ROM 105 stores the instructions available to the processor 100 and outputs the instructions stored at a given address according to the corresponding instruction address on the address bus. In the present embodiment, the bit width of the ROM 105 is, for example, 4 instructions, i.e. 4 instructions can be read per access.
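The 4-instruction bit width means that one ROM access returns the whole group containing the requested address. A sketch of the address-to-group mapping (it assumes aligned groups, which the embodiment does not state explicitly):

```python
WIDTH = 4  # ROM bit width in instructions, as in the embodiment

def rom_read(rom_contents, addr):
    # One access returns the aligned group of WIDTH instructions
    # containing the instruction at `addr`.
    base = (addr // WIDTH) * WIDTH
    return rom_contents[base:base + WIDTH]

program = ["insn%d" % i for i in range(8)]
print(rom_read(program, 5))  # ['insn4', 'insn5', 'insn6', 'insn7']
```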
The method of operation of the processor according to the invention is briefly described below.
1. The arithmetic unit AU supplies the instruction address of the next instruction to be read (hereinafter "this instruction") and issues a read request to the buffer BUF.
2. If the buffer BUF holds this instruction, the buffer BUF provides it directly and the fetch is complete.
3. If the buffer BUF does not hold this instruction, the buffer BUF sends a read request to the cache CACHE; the instruction read back is provided to the arithmetic unit AU, and the instruction group containing it is stored in the buffer BUF.
4. If the cache CACHE holds this instruction, the cache CACHE directly provides the instruction group containing it, and the fetch is complete.
5. If the cache CACHE does not hold this instruction, the cache CACHE sends a read request to the prefetch unit PF; the instruction group containing this instruction that is read back is provided to the buffer BUF, and, if the prefetch unit reports a miss, the group is also stored in the cache CACHE.
6. If the prefetch unit PF holds this instruction, the prefetch unit PF directly provides the instruction group containing it, reports a prefetch hit, and the fetch is complete.
7. If the prefetch unit PF does not hold this instruction, the prefetch unit PF sends a read request to the memory ROM, provides the instruction group containing this instruction that is read back to the cache CACHE, reports a prefetch miss, and the fetch is complete.
8. When, for example, the memory is idle and the next instruction group predicted from the instruction addresses issued by the AU is not yet stored in the prefetch unit, the prefetch unit actively issues a read request to the memory ROM and stores the instruction group read back into itself.
In short, each instruction fetch performs one of the following actions A-D:
A. If the buffer holds the instruction, actions 1 and 2 are performed.
B. If the buffer does not hold the instruction but the cache does, actions 1, 3 and 4 are performed.
C. If neither the buffer nor the cache holds the instruction but the prefetch unit does, actions 1, 3, 5 and 6 are performed.
D. If none of the buffer, cache and prefetch unit holds the instruction, actions 1, 3, 5 and 7 are performed.
Action 8 may be performed concurrently with actions A-C above.
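The complete sequence, actions A-D plus the background prefetch of action 8, can be sketched as a small simulation (the class and all names are hypothetical; prefetch is modeled simply as "fetch the next sequential group"):

```python
GROUP = 4

class Machine:
    """Toy model of the buffer/cache/prefetch/ROM interplay above."""

    def __init__(self, rom):
        self.rom = rom            # {group base address: [instructions]}
        self.buf, self.cache, self.pf = {}, {}, {}

    def fetch(self, addr):
        base = addr - addr % GROUP
        if base in self.buf:                      # action A
            action = "A"
        elif base in self.cache:                  # action B
            self.buf[base] = self.cache[base]
            action = "B"
        elif base in self.pf:                     # action C: prefetch hit goes
            self.buf[base] = self.pf[base]        # to the buffer, not the cache
            action = "C"
        else:                                     # action D: ROM read fills
            self.cache[base] = self.rom[base]     # both cache and buffer
            self.buf[base] = self.rom[base]
            action = "D"
        nxt = base + GROUP                        # action 8: while the ROM is
        if nxt in self.rom:                       # idle, prefetch the next
            self.pf[nxt] = self.rom[nxt]          # sequential group
        return self.buf[base][addr - base], action
```

In a run over two sequential groups, the second group reaches the buffer via the prefetch unit (action C) and never occupies the cache, which is the cache-saving behaviour the scheme aims at.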
Fig. 2 shows an embodiment according to the invention. In this embodiment, the bit width of the ROM 105 is 4 times the instruction length, and the ROM requires 2 read wait cycles.
When subroutine A is called for the first time, the instructions read from the ROM 105 are placed in the cache 103, while the instructions provided on hits of the prefetch unit 104 are placed not in the cache 103 but in the buffer 102. The calls of subroutine A and program B require 4 instruction groups (4 x 4 instructions) to be stored.
When subroutine A is called a second time, the 4 instruction groups stored in the cache 103, combined with the corresponding instruction-fetch actions of the prefetch unit 104, allow the performance of the processor 100 (or CPU) to be unaffected by the ROM 105 access latency.
With this scheme, therefore, the cache capacity that a given subprogram must occupy to keep CPU performance unaffected by ROM access waits is reduced, which effectively increases the usable capacity of the cache.
Fig. 3 shows a flow of a method 300 according to the invention.
In step 302, an instruction address of a first instruction to be fetched is provided by an arithmetic unit.
At step 304, if the first instruction is present in the buffer, providing, by the buffer, the first instruction; otherwise the method proceeds to step 306.
At step 306, if the first instruction is present in the cache, providing, by the cache, the first instruction and storing, by the buffer, a group of instructions cached in the cache including the first instruction; otherwise, the method proceeds to step 308.
At step 308, if the first instruction is present in the prefetch unit, the first instruction is provided by the prefetch unit and the instruction group containing the first instruction stored in the prefetch unit is stored by the buffer, and the prefetch unit outputs "hit" information to the cache; otherwise, if the first instruction is not present in the prefetch unit, the prefetch unit outputs a "miss" to the cache and the method proceeds to step 310.
At step 310, an instruction group including a first instruction is fetched by the memory according to an instruction address of the first instruction and provided to the arithmetic unit, and the instruction group is cached by the cache and stored by the buffer.
Although some embodiments of the present invention have been described herein, those skilled in the art will appreciate that they have been presented by way of example only. Numerous variations, substitutions and modifications will occur to those skilled in the art in light of the teachings of the present invention without departing from the scope thereof. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.

Claims (10)

1. An instruction fetch method for a processor, comprising the steps of:
providing, by an arithmetic unit, an instruction address of a first instruction to be read;
providing, by the buffer, the first instruction if the first instruction is present in the buffer; otherwise:
if the first instruction is present in the cache, providing, by the cache, the first instruction and storing, by the buffer, a group of instructions cached in the cache including the first instruction; otherwise:
if the first instruction is present in the prefetch unit, providing, by the prefetch unit, the first instruction and storing, by the buffer, an instruction group including the first instruction, without the instruction group being stored by the cache; otherwise:
an instruction group including the first instruction is read by the memory according to an instruction address of the first instruction and provided to the arithmetic unit, and the instruction group is cached by the cache and stored by the buffer.
2. The method of claim 1, wherein:
storing, by the buffer, a set of instructions including a first instruction includes: storing, by a buffer, the group of instructions and the instruction addresses of the group of instructions in association; and/or
Caching, by a cache, a set of instructions including a first instruction includes: the group of instructions and the instruction addresses of the group of instructions are stored in association with each other by a cache.
3. The method of claim 1, wherein: the instruction group is a plurality of instructions whose storage locations are consecutive.
4. The method of claim 1, wherein:
the bit width of the memory is n times the instruction length and the storage capacity of the buffer is n times the instruction length, where n = 2^k, k is an integer and k ≥ 0; and/or
the instruction group comprises n instructions, where n = 2^k, k is an integer and k ≥ 0; and/or
the prefetch unit prefetches n instructions at a time, where n = 2^k, k is an integer and k ≥ 0.
5. The method of claim 1, wherein the processor is a Micro Control Unit (MCU) and the memory is a Read Only Memory (ROM).
6. The method of claim 1, further comprising the steps of:
the second set of instructions is prefetched by the prefetch unit based on the instruction address of the first instruction and the prediction algorithm when the memory is idle.
7. A processor configured to perform the method of one of claims 1 to 6.
8. A micro control unit, wherein the memory is a read only memory ROM, the micro control unit being configured to perform the method according to one of claims 1 to 6.
9. A processor, comprising:
the arithmetic unit is configured to send a target instruction address and receive a target instruction corresponding to the target instruction address for execution;
a buffer configured to store instructions read from the memory, and to store instructions prefetched by the prefetch unit when the target instruction is present in neither the buffer nor the cache;
a cache configured to store instructions each time they are read from memory;
a prefetch unit configured to prefetch instructions from the memory at a set time, wherein instructions provided on a prefetch unit hit are not placed in the cache but are placed in the buffer; and
a memory configured to output, according to the target instruction address, a plurality of stored instructions including the target instruction.
10. The processor of claim 9, wherein the memory comprises one or more of: SDRAM, DRAM, and read only memory.
CN202010258353.7A 2020-04-03 2020-04-03 Instruction reading method for processor and corresponding processor Active CN111475203B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010258353.7A CN111475203B (en) 2020-04-03 2020-04-03 Instruction reading method for processor and corresponding processor


Publications (2)

Publication Number Publication Date
CN111475203A (en) 2020-07-31
CN111475203B (en) 2023-03-14

Family

ID=71749804

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010258353.7A Active CN111475203B (en) 2020-04-03 2020-04-03 Instruction reading method for processor and corresponding processor

Country Status (1)

Country Link
CN (1) CN111475203B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1462388A (en) * 2001-02-20 2003-12-17 皇家菲利浦电子有限公司 Cyclic prefetching of sequential memory
CN1484157A (en) * 2002-09-20 2004-03-24 联发科技股份有限公司 Embedded system and instruction prefetching device and method thereof
CN101228507A (en) * 2005-06-10 2008-07-23 高通股份有限公司 Method and apparatus for managing instruction flushing in a microprocessor's instruction pipeline
CN101526895A (en) * 2009-01-22 2009-09-09 杭州中天微系统有限公司 High-performance low-power-consumption embedded processor based on instruction dual-issue
CN102169428A (en) * 2010-06-22 2011-08-31 上海盈方微电子有限公司 Dynamically configurable instruction access accelerator
CN104049954A (en) * 2013-03-14 2014-09-17 英特尔公司 Multiple Data Element-To-Multiple Data Element Comparison Processors, Methods, Systems, and Instructions
CN107479860A (en) * 2016-06-07 2017-12-15 华为技术有限公司 Processor chip and instruction cache prefetching method
CN110442382A (en) * 2019-07-31 2019-11-12 西安芯海微电子科技有限公司 Prefetch buffer control method, device, chip and computer readable storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7032097B2 (en) * 2003-04-24 2006-04-18 International Business Machines Corporation Zero cycle penalty in selecting instructions in prefetch buffer in the event of a miss in the instruction cache
JP2009230374A (en) * 2008-03-21 2009-10-08 Fujitsu Ltd Information processor, program, and instruction sequence generation method
US8533437B2 (en) * 2009-06-01 2013-09-10 Via Technologies, Inc. Guaranteed prefetch instruction
US10719321B2 (en) * 2015-09-19 2020-07-21 Microsoft Technology Licensing, Llc Prefetching instruction blocks
GB201701841D0 (en) * 2017-02-03 2017-03-22 Univ Edinburgh Branch target buffer for a data processing apparatus

Also Published As

Publication number Publication date
CN111475203A (en) 2020-07-31

Similar Documents

Publication Publication Date Title
CN107479860B (en) Processor chip and instruction cache prefetching method
US9141553B2 (en) High-performance cache system and method
US6131155A (en) Programmer-visible uncached load/store unit having burst capability
US6665749B1 (en) Bus protocol for efficiently transferring vector data
US6564313B1 (en) System and method for efficient instruction prefetching based on loop periods
US6513107B1 (en) Vector transfer system generating address error exception when vector to be transferred does not start and end on same memory page
US9396117B2 (en) Instruction cache power reduction
US6643755B2 (en) Cyclically sequential memory prefetch
US7152170B2 (en) Simultaneous multi-threading processor circuits and computer program products configured to operate at different performance levels based on a number of operating threads and methods of operating
US20090177842A1 (en) Data processing system and method for prefetching data and/or instructions
JP5625809B2 (en) Arithmetic processing apparatus, information processing apparatus and control method
US20190079771A1 (en) Lookahead out-of-order instruction fetch apparatus for microprocessors
US20040230780A1 (en) Dynamically adaptive associativity of a branch target buffer (BTB)
US20140019690A1 (en) Processor, information processing apparatus, and control method of processor
US8266379B2 (en) Multithreaded processor with multiple caches
CN111475203B (en) Instruction reading method for processor and corresponding processor
US20150193348A1 (en) High-performance data cache system and method
CN112148366A (en) FLASH acceleration method for reducing chip power consumption and improving performance
JP4354001B1 (en) Memory control circuit and integrated circuit
CN112559389A (en) Storage control device, processing device, computer system, and storage control method
US20170147498A1 (en) System and method for updating an instruction cache following a branch instruction in a semiconductor device
CN112395000B (en) Data preloading method and instruction processing device
JP4413663B2 (en) Instruction cache system
CN111399913B (en) Processor accelerated instruction fetching method based on prefetching
CN111124494B (en) Method and circuit for accelerating unconditional jump in CPU

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220729

Address after: 201210 floor 10, block a, building 1, No. 1867, Zhongke Road, China (Shanghai) pilot Free Trade Zone, Pudong New Area, Shanghai

Applicant after: Xiaohua Semiconductor Co.,Ltd.

Address before: 201210 8th floor, block a, 1867 Zhongke Road, Pudong New Area, Shanghai

Applicant before: HUADA SEMICONDUCTOR Co.,Ltd.

GR01 Patent grant