CN113849226A - Instruction extraction device, processor, board card and instruction extraction method - Google Patents


Info

Publication number: CN113849226A
Application number: CN202010599747.9A
Authority: CN (China)
Prior art keywords: backfill, instructions, instruction, sending, external memory
Legal status: Pending (an assumption, not a legal conclusion; Google has not performed a legal analysis)
Other languages: Chinese (zh)
Inventor: Not disclosed
Current assignee: Shanghai Cambricon Information Technology Co Ltd
Original assignee: Shanghai Cambricon Information Technology Co Ltd
Application filed by Shanghai Cambricon Information Technology Co Ltd
Priority to CN202010599747.9A
Publication of CN113849226A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3802Instruction prefetching

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The present disclosure provides an instruction fetch apparatus, a processor, an integrated circuit chip, a board card, an electronic device, and a method for fetching instructions. In one embodiment, the instruction fetch apparatus may be included in a computing device, and the computing device may be included in a combined processing device that also includes a general-purpose interconnect interface and other processing devices. The computing device interacts with the other processing devices to jointly complete computing operations specified by a user. The combined processing device may further comprise a storage device connected to the computing device and the other processing devices, respectively, for storing their data. The disclosed scheme can improve instruction-fetch efficiency in various computing fields, including, for example, the field of artificial intelligence, thereby accelerating operations and reducing their overall overhead and cost.

Description

Instruction extraction device, processor, board card and instruction extraction method
Technical Field
The present disclosure relates generally to the field of computing. More particularly, the present disclosure relates to an instruction fetch apparatus, a processor, an integrated circuit chip, a board, an electronic device, and a method for fetching instructions.
Background
When a cache miss occurs, a conventional central processing unit ("CPU") must backfill instructions from a dynamic random access memory ("DRAM") into the cache before it can continue fetching instructions from the cache for operations such as decoding. In some scenarios, to improve the efficiency of fetching instructions ("instruction fetching" for short), the CPU performs a hardware prefetch during backfilling: in addition to the currently missed instruction (the "backfill instruction" for short), several instructions following it are also backfilled into the cache. Even so, instruction fetching stalls while the backfill completes, which limits instruction-fetch efficiency.
Disclosure of Invention
To address at least the above-identified problems in the prior art, the present disclosure provides a scheme for efficiently fetching instructions. With the disclosed scheme, received backfill instructions can be sent out (for example, to a decoder) during the backfill operation, so that instruction-fetch efficiency is markedly improved. Owing to this improvement, aspects of the present disclosure may also enhance the processing performance of hardware such as processors, reduce power consumption, and increase the execution efficiency of computing operations.
In a first aspect, the present disclosure provides an instruction fetch apparatus comprising: a buffer configured to buffer a plurality of instructions fetched from an external memory; an instruction fetch circuit configured to fetch, from the buffer, a plurality of instructions to be sent out; and a backfill request circuit configured to: detect whether a cache miss event occurs while the instruction fetch apparatus is sending instructions out; and, in response to detecting the occurrence of the cache miss event, send a backfill request to the external memory. The instruction fetch circuit is further configured to: receive, in batches from the external memory, backfill instructions in response to the backfill request; and, during the intervals between receiving the batches, send the received backfill instructions out.
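The cooperation of the buffer, instruction fetch circuit, and backfill request circuit named in this aspect can be sketched in software. The following Python model is purely illustrative (all class and method names are hypothetical, not taken from the disclosure) and collapses the batch-timing behavior into simple calls:

```python
from collections import deque


class InstructionFetchDevice:
    """Illustrative software model of the first-aspect apparatus."""

    def __init__(self, external_memory):
        self.external_memory = external_memory  # models the DRAM
        self.buffer = {}                        # cache: address -> instruction
        self.pipeline = deque()                 # instruction-fetch pipeline

    def fetch(self, addresses):
        """Queue instructions to send out; backfill on a cache miss."""
        missing = [a for a in addresses if a not in self.buffer]
        if missing:                             # cache miss event detected
            self._send_backfill_request(missing)
        for a in addresses:
            self.pipeline.append(self.buffer[a])
        return list(self.pipeline)

    def _send_backfill_request(self, addresses):
        """Models the backfill request circuit: fetch the missing
        instructions from external memory and backfill the cache."""
        for a in addresses:
            self.buffer[a] = self.external_memory[a]
```

For example, fetching addresses `[0, 1, 2]` against an empty cache triggers a miss, backfills all three instructions, and still returns them queued in sending order.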
In a second aspect, the present disclosure provides a processor comprising: an instruction fetch apparatus as described above and to be described in a plurality of embodiments below; a decoder configured to receive and decode instructions from the instruction fetch device to obtain a plurality of microinstructions and/or control signals; and processing circuitry configured to perform operations in accordance with the plurality of microinstructions and/or control signals.
In a third aspect, the present disclosure provides an integrated circuit chip comprising an instruction fetch device or processor as described above and in a number of embodiments below.
In a fourth aspect, the present disclosure provides a board card comprising an integrated circuit chip as described above and in the following embodiments.
In a fifth aspect, the present disclosure provides an electronic device comprising an integrated circuit chip as described above and in the following embodiments.
In a sixth aspect, the present disclosure provides a method for fetching instructions, the method comprising: detecting whether a cache miss event occurs while a plurality of instructions to be sent out are being fetched from a cache; in response to detecting the occurrence of the cache miss event, sending a backfill request to an external memory; receiving, in batches from the external memory, backfill instructions responsive to the backfill request; and, during the intervals between receiving the batches, sending the received backfill instructions out.
With the instruction fetch apparatus, processor, integrated circuit chip, board card, electronic device, and method of the present disclosure, received backfill instructions can be sent out during the intervals of the backfill operation, so that instruction-fetch efficiency is markedly improved. Owing to this improvement, the disclosed scheme further speeds up instruction-related operations, thereby improving overall hardware performance and reducing computational overhead.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. In the drawings, several embodiments of the disclosure are illustrated by way of example and not by way of limitation, and like or corresponding reference numerals indicate like or corresponding parts and in which:
FIG. 1 is a block diagram illustrating an instruction fetch apparatus according to an embodiment of the present disclosure;
FIG. 2 is a block diagram illustrating a processor according to an embodiment of the present disclosure;
FIG. 3 is a flow chart illustratively showing a method for fetching instructions in accordance with an embodiment of the present disclosure;
FIG. 4 is a block diagram illustrating a combined processing device according to an embodiment of the present disclosure; and
FIG. 5 is a schematic diagram illustrating the structure of a board card according to an embodiment of the present disclosure.
Detailed Description
The disclosed aspects provide a hardware architecture that speeds up instruction fetching. When the hardware architecture is implemented in an instruction fetch apparatus, the apparatus includes a backfill request circuit configured to detect whether a cache miss event occurs while the apparatus is sending instructions out, and to send a backfill request to an external memory when such an event occurs. In one embodiment, the instruction fetch apparatus of the present disclosure further includes an instruction fetch circuit configured to receive, in batches from the external memory, backfill instructions in response to the backfill request, and to send the received backfill instructions out, for example to a decoder, during the intervals between batches. By virtue of this scheme, instruction fetch operations can be performed efficiently, thereby accelerating the execution of instructions. When the disclosed instruction fetch apparatus is applied to a computing device, the processing performance and computing efficiency of the computing device can be markedly improved, and the computational overhead reduced accordingly.
The technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are some, but not all, of the embodiments of the present disclosure. All other embodiments that can be derived by one of ordinary skill in the art from the disclosed embodiments without creative effort shall fall within the scope of protection of the present disclosure.
Fig. 1 is a block diagram illustrating an instruction fetch apparatus 100 according to an embodiment of the present disclosure. For ease of understanding, an external memory 104 is also shown in FIG. 1 that interacts with instruction fetch device 100.
As shown in FIG. 1, the instruction fetch apparatus 100 of the present disclosure may include a buffer 102 configured to buffer a plurality of instructions fetched from an external memory 104 over a bus 106. By caching a plurality of instructions in the buffer in advance from external storage (e.g., a memory), frequent accesses between the instruction-executing entity (e.g., a processor core) and the external storage can be avoided, thereby reducing the I/O traffic caused by instruction transfer.
While instructions are being sent out, for example to a decoder, the buffer passes to the fetch circuitry 108 a plurality of instructions to be sent, such as instructions 110, 112, and 114 shown in the figure, together with the instructions to be backfilled between them, referred to in this disclosure simply as backfill instructions 116, 118, and 120. In one embodiment, the instruction fetch circuitry may be configured to queue the plurality of instructions in sending order, forming an instruction fetch pipeline from which they are sent out in sequence. A cache miss event, as referred to herein, occurs when an instruction required by a computing operation or by code execution is not cached in the cache. The occurrence of this event means that the required instruction or instructions are absent from the current buffer and fetch pipeline, such as the missing instructions 116, 118, and 120 awaiting backfill, shown schematically in dashed lines in FIG. 1. It should be understood that the three backfill instructions in FIG. 1 are shown by way of example only; in different scenarios, depending on how often cache miss events occur, more backfill instructions may need to be backfilled from the external memory, and the backfill instructions need not be interleaved with other instructions as shown in FIG. 1 but may appear anywhere in the fetch pipeline.
To determine whether a cache miss event as described above occurs, the instruction fetch apparatus of the present disclosure may further include a backfill request circuit 122, which may be configured to detect whether a cache miss event occurs while the apparatus is sending instructions out. When a cache miss event is detected (for example, the required instructions are not in the cache, so the fetch pipeline lacks them), the backfill request circuitry sends to the external memory a backfill request for the missing instructions (i.e., the backfill instructions). In response to receiving the backfill request, the external memory (e.g., a DRAM) performs a backfill operation directed at the cache, i.e., it backfills into the cache the backfill instructions that triggered the cache miss event.
In practical application scenarios, because the number of backfill requests is often large, the backfill instructions returned by the external memory usually must be transferred in multiple batches. In view of this, the present disclosure proposes that while a backfill instruction that triggered a cache miss event is being backfilled into the cache, the fetch circuitry also backfill it into the fetch pipeline at the appropriate position, such as the positions indicated by dashed boxes 116, 118, and 120 in FIG. 1, for queuing and outbound sending. Since the external memory transfers backfill instructions in batches, the instruction fetch circuitry of the present disclosure may be configured to send the already-received backfill instructions out during the intervals between batches. In this way, the efficiency of the disclosed instruction fetch operation is significantly improved, thereby accelerating instruction fetching and subsequent instruction execution.
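The interval-based sending just described can be illustrated with a small Python sketch. It is a behavioral sketch only (the batch contents and the `send` callback are assumed for illustration): each batch that arrives is forwarded immediately during the interval before the next batch, instead of waiting for the whole backfill to finish:

```python
def receive_and_send(batches, send):
    """Receive backfill instructions batch by batch from external memory.

    During the interval after each batch arrives (and before the next one),
    the already-received instructions are sent out via `send`, e.g. to a
    decoder, rather than waiting for all batches to be backfilled first.
    """
    sent = []
    for batch in batches:        # one transfer from the external memory
        for instr in batch:      # interval before the next batch arrives:
            send(instr)          # forward what has been received so far
            sent.append(instr)
    return sent


# Three batches returned by the external memory for one backfill request
out = []
received = receive_and_send([["i0", "i1"], ["i2"], ["i3", "i4"]], out.append)
```

Here `out` ends up holding all five backfill instructions in arrival order, with the first two already forwarded before the second batch is received.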
In one embodiment, upon detecting the occurrence of the cache miss event, the buffer is configured to suspend passing to the fetch circuitry the instructions to be sent out, and to receive the backfill instructions from the external memory in batches. In one scenario, after the fetch circuitry sends out the received backfill instructions, the buffer may be configured to receive the next batch of backfill instructions from the external memory. In another embodiment, the instruction fetch apparatus of the present disclosure further includes a buffer 124 configured to receive the backfill instructions from the external memory in batches, and to backfill them into the buffer 102 after the instruction fetch circuit finishes sending the batch-received backfill instructions out. In this way, the fetch circuitry can perform the backfill of the buffer after all backfill instructions in the fetch pipeline have been sent, thereby reducing frequent instruction accesses between the buffer and the external memory.
In one embodiment, the backfill request circuit may be further configured to, in response to detecting multiple cache miss events triggered by the same backfill instruction, send a backfill request to the external memory for that backfill instruction only once. Under the disclosed scheme, because the instruction fetch circuit does not wait until all backfill instructions have been refilled into the buffer before sending instructions out through the fetch pipeline, multiple cache miss events triggered by the same backfill instruction may be detected by the backfill request circuit. To avoid re-sending a backfill request to the external memory for the same backfill instruction, the backfill request circuitry of the present disclosure sends a backfill request for a given backfill instruction only once. This reduces the number of backfill requests sent and the communication overhead between the backfill request circuitry and the external memory. By contrast, when a new cache miss event triggered by a new backfill instruction is detected during an interval between batches, the backfill request circuitry may be configured to send a new backfill request to the external memory for the new backfill instruction.
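The once-per-instruction request policy described above amounts to tracking which backfills are pending, which can be sketched as follows (a minimal Python illustration; the callback and the completion hook are assumptions, not part of the disclosure):

```python
class BackfillRequestCircuit:
    """Sends at most one backfill request per missing instruction address."""

    def __init__(self, send_request):
        self.send_request = send_request  # issues a request to external memory
        self.pending = set()              # addresses already requested

    def on_cache_miss(self, address):
        """Repeated misses on the same backfill instruction send one request."""
        if address not in self.pending:
            self.pending.add(address)
            self.send_request(address)    # new backfill instruction: new request

    def on_backfill_complete(self, address):
        self.pending.discard(address)     # a later miss may request it again


requests = []
circuit = BackfillRequestCircuit(requests.append)
circuit.on_cache_miss(0x10)   # first miss: request sent
circuit.on_cache_miss(0x10)   # same backfill instruction: suppressed
circuit.on_cache_miss(0x20)   # new backfill instruction: new request
```

After the three misses above, only two requests (for `0x10` and `0x20`) have reached the external memory, matching the dedup behavior the paragraph describes.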
The instruction fetch apparatus of the present disclosure and various embodiments thereof are described in detail above in conjunction with FIG. 1. With the disclosed instruction fetch apparatus, instruction fetch operations can be accelerated, achieving efficient instruction fetching. Further, when a cache miss event occurs, the apparatus can also reduce unnecessary backfill requests, saving the I/O bandwidth used for instruction delivery. When the apparatus sends instructions to a decoder, it facilitates efficient decoding, thereby improving instruction execution efficiency.
FIG. 2 is a block diagram illustrating a processor 200 according to an embodiment of the present disclosure. As shown in FIG. 2, processor 200 includes the instruction fetch apparatus 100 described above in connection with FIG. 1, which operates to fetch cached instructions from the cache, to obtain backfill instructions from the external memory upon a cache miss event, and to continue sending instructions out through its internal fetch pipeline during backfill intervals. Since the instruction fetch apparatus 100 has been described in detail in conjunction with FIG. 1, it is not described again here.
In one embodiment, the processor of the present disclosure further includes a decoder 204 configured to receive and decode instructions from the instruction fetch apparatus to obtain a plurality of microinstructions and/or control signals that may be executed, for example, within a processor or processing circuit. In the context of the present disclosure, the aforementioned microinstructions and/or control signals may also be referred to collectively as operation instructions, which may include (or indicate) one or more operations to be performed by the processor. Depending on the computing scenario, the operations may include, but are not limited to, addition, multiplication, convolution, pooling, and the like. For performing these operations, the processor of the present disclosure also includes processing circuitry 206, which in one or more embodiments may include one or more processing sub-circuits. When there are multiple processing sub-circuits, they may be connected in a regular structure, for example a multi-dimensional array, to perform operations such as parallel operations or multi-stage pipelined operations. Depending on the implementation and computing scenario, the processing circuitry may include, but is not limited to, operators and operation circuits such as a random number processing circuit, an addition-subtraction circuit, a table look-up circuit, a parameter configuration circuit, a multiplier, a pooling unit, a comparator, an absolute value circuit, a logical operator, a position index circuit, or a filter.
While the processor of the present disclosure including the instruction fetch apparatus is described above in connection with fig. 2, it is noted that the description herein is merely exemplary and not limiting and that many modifications and alternatives are possible to those skilled in the art in light of the teachings herein. For example, although the instruction fetch device, decoder and processing circuitry are described as three separate entities in a processor, the instruction fetch device and decoder may be incorporated or simultaneously disposed within control circuitry that controls the processing circuitry, depending on the implementation scenario. For another example, in some scenarios, the decoder may be arranged in the processing circuit as an entity or unit of a decoding function, such that the processing circuit also supports a decoding function for the fetched instructions.
FIG. 3 is a flow diagram illustratively depicting a method 300 for fetching instructions in accordance with an embodiment of the present disclosure. From the foregoing description in conjunction with FIGS. 1 and 2, those skilled in the art will appreciate that the method 300 shown in FIG. 3 may be implemented by the instruction fetch apparatus 100 of FIG. 1 or the processor 200 of FIG. 2.
As shown in FIG. 3, at step 302 the method 300 detects whether a cache miss event occurs while a plurality of instructions to be sent out are being fetched from the cache. As mentioned above, the occurrence of a cache miss event means that the requested instruction is not cached in the cache and a backfill must be requested from an external memory such as a DRAM. Thus, at step 304, in response to detecting the occurrence of the cache miss event, the method 300 sends a backfill request to the external memory, for example using the backfill request circuitry shown in FIG. 1. Then, at step 306, the method 300 receives, in batches from the external memory, backfill instructions in response to the backfill request. Finally, at step 308, the method 300 sends the received backfill instructions out during the intervals between batches. As previously described, the sending here may be an instruction issue operation performed by the instruction fetch circuitry shown in FIG. 1 using a fetch pipeline queued in order. Further, the sending may be directed to the decoder shown in FIG. 2, so that the decoder decodes or parses the instructions to obtain, for example, the microinstructions and/or control signals required to perform arithmetic operations.
In one embodiment, in response to detecting the occurrence of multiple cache miss events triggered by the same backfill instruction, the method 300 sends a backfill request to the external memory only once for the same backfill instruction. In another embodiment, in response to detecting a new cache miss event triggered by a new backfill instruction during an interval in which backfill instructions are received in batches, the method 300 sends a new backfill request to the external memory for the new backfill instruction.
The method for fetching instructions of the present disclosure is described above in connection with the steps illustrated in FIG. 3 for the sake of brevity. Those skilled in the art can also appreciate that the method may include more steps according to the disclosure of the present disclosure, and the execution of the steps may implement various operations of the present disclosure described in conjunction with fig. 1 and fig. 2, which are not described herein again.
Fig. 4 is a block diagram illustrating a combined processing device 400 according to an embodiment of the present disclosure. As shown, the combined processing device 400 includes a computing device 402, which may include an instruction fetch device or processor as described in connection with the present disclosure. In one or more embodiments, the computing device may also be implemented as an integrated circuit chip, board, or electronic device that includes the instruction fetch apparatus of the present disclosure. In addition, the combined processing device includes a universal interconnect interface 404 and other processing devices 406. The computing device 402 of the disclosed aspects may interact with other processing devices 406 through a universal interconnect interface 404 to collectively perform user-specified operations, including, for example, instruction fetch operations, subsequent decoding, and/or execution operations of the present disclosure.
According to aspects of the present disclosure, the other processing devices may include one or more types of general-purpose and/or special-purpose processors, such as a central processing unit ("CPU"), a graphics processing unit ("GPU"), or an artificial intelligence processor, and their number is not limited but may be determined according to actual needs. In one or more embodiments, the other processing devices may serve as the interface between the computing device of the present disclosure and external data and control, performing basic controls including, but not limited to, data movement and starting and stopping the computing device; the other processing devices may also cooperate with the computing device to complete computational tasks jointly. In one implementation scenario, a computing device according to aspects of the present disclosure may be implemented as a machine learning computing device.
In accordance with aspects of the present disclosure, the universal interconnect interface may be used to transfer data and control instructions, such as the computing instructions of the present disclosure, between a computing device and other processing devices. For example, the computing device may retrieve the required input data from other processing devices via the universal interconnect interface, write to an on-chip storage device of the computing device, such as an external memory of the present disclosure, e.g., a DRAM. Further, the computing device may obtain control instructions from other processing devices via the universal interconnect interface and write to an on-chip control cache of the computing device, such as the cache of the present disclosure. Alternatively or optionally, the universal interconnect interface may also read data from a storage device of the computing device and transmit the data to other processing devices.
Optionally, the combined processing device may further comprise a storage device 408, which may be connected to the computing device and the other processing device, respectively. In one or more embodiments, the storage device may be used to store data for the computing device and the other processing devices, particularly data that may not be stored in its entirety in internal or on-chip storage within the computing device or other processing devices.
Depending on the application scenario, the combined processing device of the present disclosure can serve as the SOC (system-on-chip) of devices such as mobile phones, robots, unmanned aerial vehicles, and video surveillance equipment, effectively reducing the core area of the control portion, increasing processing speed, and lowering overall power consumption. In this case, the universal interconnect interface of the combined processing device is connected with certain components of the equipment, such as a camera, a display, a mouse, a keyboard, a network card, or a Wi-Fi interface.
In some embodiments, the present disclosure also discloses a chip or an integrated circuit chip comprising the above-mentioned computing device or combined processing device. In other embodiments, the present disclosure also discloses a chip packaging structure, which includes the above chip.
In some embodiments, the present disclosure also discloses a board card comprising the above chip packaging structure. Referring to FIG. 5, an exemplary board card is provided, which may include, in addition to the chip 502, other accessories including but not limited to: a memory device 504, an interface device 506, and a control device 508.
The memory device is connected by a bus to the chip in the chip packaging structure and is used to store data. The memory device may include a plurality of groups of memory cells 510, each group connected to the chip by a bus. It will be appreciated that each group of memory cells may be DDR SDRAM ("Double Data Rate SDRAM").
DDR doubles the speed of SDRAM without increasing the clock frequency: data is read out on both the rising and the falling edge of the clock pulse, so DDR is twice as fast as standard SDRAM. In one embodiment, the memory device may include four groups of memory cells, and each group may include a plurality of DDR4 granules (chips). In one embodiment, the chip may internally include four 72-bit DDR4 controllers, with 64 of the 72 bits used for data transmission and 8 bits used for ECC checking.
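The figures above (a 72-bit controller split into 64 data bits plus 8 ECC bits, with two transfers per clock) can be checked with a short back-of-envelope computation; the 1200 MHz clock below is an assumed value for illustration only:

```python
# Per-controller bus layout stated above
bus_bits = 72
data_bits, ecc_bits = 64, 8
assert data_bits + ecc_bits == bus_bits

# DDR transfers data on both clock edges ("double data rate")
clock_hz = 1_200_000_000              # assumed clock frequency (illustrative)
transfers_per_second = 2 * clock_hz   # two transfers per clock cycle
data_bytes_per_second = transfers_per_second * data_bits // 8
# data_bytes_per_second == 19_200_000_000, i.e. 19.2 GB/s per controller
```

With four such controllers, the aggregate data bandwidth at the assumed clock would be four times this per-controller figure.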
In one embodiment, each group of memory cells includes a plurality of double-data-rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice in one clock cycle. A controller for the DDR is provided in the chip to control the data transmission and data storage of each memory cell.
The interface device is electrically connected to the chip in the chip packaging structure and is used to enable data transfer between the chip and an external device 512, such as a server or a computer. For example, in one embodiment, the interface device may be a standard PCIe interface; data to be processed is transmitted by the server to the chip through the standard PCIe interface, thereby effecting the data transfer. In another embodiment, the interface device may be a different interface; the present disclosure does not limit its specific form, as long as the interface unit can implement the transfer function. In addition, the computation results of the chip are transmitted back to the external device (e.g., the server) by the interface device.
The control device is electrically connected to the chip and is used to monitor the chip's state. Specifically, the chip and the control device may be electrically connected through an SPI interface. The control device may include a single-chip microcomputer (MCU). In one or more embodiments, the chip may include a plurality of processing chips, processing cores, or processing circuits and may drive multiple loads; the chip can therefore be in different working states such as heavy load and light load. The control device can regulate the working states of the processing chips, processing cores, and/or processing circuits in the chip.
In some embodiments, the present disclosure also discloses an electronic device or apparatus that includes the above board card. Depending on the application scenario, the electronic device or apparatus may include a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet computer, a smart terminal, a mobile phone, a vehicle data recorder, a navigator, a sensor, a camera, a server, a cloud server, a video camera, a projector, a watch, an earphone, a mobile storage device, a wearable device, a vehicle, a household appliance, and/or a medical device. Vehicles include airplanes, ships, and/or cars; household appliances include televisions, air conditioners, microwave ovens, refrigerators, electric rice cookers, humidifiers, washing machines, electric lamps, gas stoves, and range hoods; medical devices include nuclear magnetic resonance apparatuses, B-mode ultrasound scanners, and/or electrocardiographs.
It is noted that while, for simplicity of explanation, the foregoing method embodiments have been described as a series of acts or combinations of acts, it will be appreciated by those skilled in the art that the present disclosure is not limited by the order of acts described, as some steps may, in accordance with the present disclosure, occur in other orders or concurrently. Further, those skilled in the art will also appreciate that the embodiments described in the specification are exemplary embodiments, and that the acts and modules referred to are not necessarily required by the disclosure.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in the present disclosure, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative; for instance, the division of the units is only one type of logical-function division, and there may be other divisions in actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection between devices or units through some interfaces, and may be in an electrical, optical, acoustic, magnetic, or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software program module.
The integrated units, if implemented in the form of software program modules and sold or used as stand-alone products, may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present disclosure. The aforementioned memory includes: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and other media capable of storing program codes.
In the above embodiments of the present disclosure, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to the related descriptions of other embodiments. The technical features of the embodiments may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The foregoing may be better understood in light of the following clauses:
Clause 1, an instruction fetch apparatus, comprising:
a cache configured to cache a plurality of instructions fetched from an external memory;
an instruction fetch circuit configured to fetch, from the cache, a plurality of instructions to be sent out; and
a backfill request circuit configured to:
detecting whether a cache miss event occurs while instructions are being sent out by the instruction fetch apparatus; and
in response to detecting the occurrence of the cache miss event, sending a backfill request to the external memory;
wherein the instruction fetch circuit is further configured to:
receiving, in batches from the external memory, backfill instructions in response to the backfill request; and
during intervals in which the backfill instructions are received in batches, sending out the received backfill instructions.
Clause 2, the instruction fetch apparatus of clause 1, wherein the instruction fetch circuit is further configured to:
queuing the plurality of instructions to be sent out in a sending sequence to form an instruction fetch pipeline; and
queuing the received backfill instructions in a corresponding sequence, so as to send the backfill instructions out through the instruction fetch pipeline.
Clause 3, the instruction fetch apparatus of clause 1, wherein upon detecting the occurrence of the cache miss event, the cache is configured to:
suspending the delivery of instructions to be sent out to the instruction fetch circuit; and
receiving the backfill instructions from the external memory in batches.
Clause 4, the instruction fetch apparatus of clause 3, wherein the instruction fetch circuit is configured to send out a received backfill instruction during an interval in which the cache receives the backfill instructions in batches, and the cache is configured to receive a next batch of backfill instructions from the external memory after the instruction fetch circuit sends out the received backfill instruction.
Clause 5, the instruction fetch apparatus of clause 1, further comprising a buffer configured to:
receiving the backfill instructions from the external memory in batches; and
after the instruction fetch circuit finishes sending out the backfill instructions received in batches, backfilling the backfill instructions received in batches into the cache.
Clause 6, the instruction fetch device of any of clauses 1-5, wherein the backfill request circuit is further configured to:
in response to detecting multiple cache miss events triggered by the same backfill instruction, sending the backfill request to the external memory only once for that backfill instruction.
Clause 7, the instruction fetch device of any of clauses 1-5, wherein the backfill request circuit is further configured to:
in response to detecting a new cache miss event triggered by a new backfill instruction during an interval in which backfill instructions are received in batches, sending a new backfill request to the external memory for the new backfill instruction.
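Clauses 6 and 7 above can be illustrated with a small, hedged sketch: one backfill request per missing instruction, while a miss on a new instruction still triggers its own request even during batch reception. The class and method names below are hypothetical and not taken from the disclosure.

```python
# Hypothetical sketch of the backfill request circuit's deduplication
# behavior (clauses 6 and 7). The disclosure describes a hardware
# circuit; this model is an illustrative assumption.

class BackfillRequestCircuit:
    def __init__(self):
        self.outstanding = set()  # instructions with a request already sent
        self.requests = []        # requests actually sent to external memory

    def on_cache_miss(self, instruction_addr: int) -> None:
        # Clause 6: repeated misses on the same instruction send only
        # one request while that request is still outstanding.
        if instruction_addr not in self.outstanding:
            self.outstanding.add(instruction_addr)
            # Clause 7: a miss on a *new* instruction always sends a new
            # request, even while earlier backfill batches are arriving.
            self.requests.append(instruction_addr)

    def on_backfill_complete(self, instruction_addr: int) -> None:
        self.outstanding.discard(instruction_addr)
```

Under these assumptions, three misses on addresses `0x40`, `0x40`, `0x80` would send only two requests, one per distinct instruction.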
Clause 8, a processor, comprising:
the instruction fetch apparatus of any of clauses 1-7;
a decoder configured to receive and decode instructions from the instruction fetch device to obtain a plurality of microinstructions and/or control signals; and
processing circuitry configured to perform operations in accordance with the plurality of microinstructions and/or control signals.
Clause 9, an integrated circuit chip comprising the instruction fetch apparatus according to any one of clauses 1-7 or the processor according to clause 8.
Clause 10, a board comprising the integrated circuit chip of clause 9.
Clause 11, an electronic device, comprising the integrated circuit chip of clause 9.
Clause 12, a method for fetching instructions, comprising:
detecting whether a cache miss event occurs while a plurality of instructions to be sent out are fetched from a cache;
in response to detecting the occurrence of the cache miss event, sending a backfill request to an external memory;
receiving, in batches from the external memory, backfill instructions responsive to the backfill request; and
during intervals in which the backfill instructions are received in batches, sending the received backfill instructions out.
Clause 13, the method of clause 12, wherein sending instructions out comprises:
queuing the plurality of instructions to be sent out in a sending sequence to form an instruction fetch pipeline; and
queuing the received backfill instructions in a corresponding sequence, so as to send the backfill instructions out through the instruction fetch pipeline.
Clause 14, the method of clause 12, further comprising: upon detecting the occurrence of the cache miss event, causing the cache to suspend delivery of instructions to be sent out, and receiving the backfill instructions from the external memory in batches.
Clause 15, the method of clause 14, wherein during an interval in which the cache receives the backfill instructions in batches, the method comprises sending out the backfill instructions that have been received, and, after the sending, causing the cache to receive a next batch of backfill instructions from the external memory.
Clause 16, the method of clause 12, further comprising buffering, in a buffer, backfill instructions received in batches from the external memory, and backfilling the backfill instructions received in batches into the cache after sending them out.
Clause 17, the method of any one of clauses 12-16, wherein sending the backfill request to the external memory comprises:
in response to detecting multiple cache miss events triggered by the same backfill instruction, sending the backfill request to the external memory only once for that backfill instruction.
Clause 18, the method of any of clauses 12-16, wherein sending the backfill request to the external memory comprises:
in response to detecting a new cache miss event triggered by a new backfill instruction during an interval in which backfill instructions are received in batches, sending a new backfill request to the external memory for the new backfill instruction.
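Purely as an illustrative model, the method of clauses 12-18 might be simulated as follows. The disclosure describes hardware, and this sketch makes simplifying assumptions (aligned fixed-size batches, a software-visible cache set, one instruction sent per step); all identifiers are invented for illustration.

```python
# Non-authoritative software simulation of the fetch method of
# clauses 12-18. Batch alignment and the data model are assumptions.

class FetchSimulator:
    def __init__(self, cached, memory, batch_size=2):
        self.cached = set(cached)   # addresses currently in the cache
        self.memory = memory        # external memory: addr -> instruction
        self.batch_size = batch_size
        self.requested = set()      # dedup of backfill requests (clause 17)
        self.backfill_requests = 0  # requests actually sent
        self.sent = []              # instructions sent out, in order

    def fetch(self, addresses):
        for addr in addresses:
            if addr not in self.cached:
                # Cache miss: send a backfill request at most once per
                # missing instruction (clause 17).
                if addr not in self.requested:
                    self.requested.add(addr)
                    self.backfill_requests += 1
                # Receive one aligned batch of backfill instructions from
                # external memory and backfill it into the cache (clause 16).
                batch_start = addr - addr % self.batch_size
                for a in range(batch_start, batch_start + self.batch_size):
                    if a in self.memory:
                        self.cached.add(a)
            # Send the instruction out; on a miss this happens during the
            # interval in which batches are received (clauses 12 and 15).
            self.sent.append(self.memory[addr])
        return self.sent
```

With `memory = {a: f"insn{a}" for a in range(8)}` and addresses `0` and `1` initially cached, fetching `[0, 1, 2, 3, 4]` would, under these assumptions, send all five instructions in order while issuing only two backfill requests (for the batches starting at `2` and `4`).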
The foregoing detailed description of the embodiments of the present disclosure has been presented for purposes of illustration and description; it is intended to be exemplary only and is not intended to be exhaustive or to limit the disclosure to the precise forms disclosed. Meanwhile, for a person skilled in the art, there may be variations in the specific embodiments and the application scope based on the ideas of the present disclosure. In summary, the contents of this specification should not be construed as limiting the present disclosure.
It should be understood that the terms "first," "second," "third," and "fourth," etc. in the claims, description, and drawings of the present disclosure are used to distinguish between different objects and are not used to describe a particular order. The terms "comprises" and "comprising," when used in the specification and claims of this disclosure, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the disclosure herein is for the purpose of describing particular embodiments only, and is not intended to be limiting of the disclosure. As used in the specification and claims of this disclosure, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in the specification and claims of this disclosure refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.
As used in this specification and claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
The foregoing detailed description of the embodiments of the present disclosure has been presented for purposes of illustration and description; it is intended to be exemplary only and is not intended to be exhaustive or to limit the disclosure to the precise forms disclosed. Meanwhile, a person skilled in the art may, based on the ideas of the present disclosure, make changes or modifications to the specific embodiments and the application scope. In view of the above, this description should not be construed as limiting the present disclosure.

Claims (18)

1. An instruction fetch apparatus, comprising:
a cache configured to cache a plurality of instructions fetched from an external memory;
an instruction fetch circuit configured to fetch, from the cache, a plurality of instructions to be sent out; and
a backfill request circuit configured to:
detecting whether a cache miss event occurs while instructions are being sent out by the instruction fetch apparatus; and
in response to detecting the occurrence of the cache miss event, sending a backfill request to the external memory;
wherein the instruction fetch circuit is further configured to:
receiving, in batches from the external memory, backfill instructions in response to the backfill request; and
during intervals in which the backfill instructions are received in batches, sending out the received backfill instructions.
2. The instruction fetch device of claim 1, wherein the instruction fetch circuit is further configured to:
queuing the plurality of instructions to be sent out in a sending sequence to form an instruction fetch pipeline; and
queuing the received backfill instructions in a corresponding sequence, so as to send the backfill instructions out through the instruction fetch pipeline.
3. The instruction fetch device of claim 1, wherein upon detecting the occurrence of the cache miss event, the cache is configured to:
suspending the delivery of instructions to be sent out to the instruction fetch circuit; and
receiving the backfill instructions from the external memory in batches.
4. The instruction fetch apparatus according to claim 3, wherein the instruction fetch circuit is configured to send out a received backfill instruction during an interval in which the cache receives the backfill instructions in batches, and the cache is configured to receive a next batch of backfill instructions from the external memory after the instruction fetch circuit sends out the received backfill instruction.
5. The instruction fetch device of claim 1, further comprising a buffer configured to:
receiving the backfill instructions from the external memory in batches; and
after the instruction fetch circuit finishes sending out the backfill instructions received in batches, backfilling the backfill instructions received in batches into the cache.
6. The instruction fetch device of any of claims 1-5, wherein the backfill request circuit is further configured to:
in response to detecting multiple cache miss events triggered by the same backfill instruction, sending the backfill request to the external memory only once for that backfill instruction.
7. The instruction fetch device of any of claims 1-5, wherein the backfill request circuit is further configured to:
in response to detecting a new cache miss event triggered by a new backfill instruction during an interval in which backfill instructions are received in batches, sending a new backfill request to the external memory for the new backfill instruction.
8. A processor, comprising:
the instruction fetch apparatus of any one of claims 1-7;
a decoder configured to receive and decode instructions from the instruction fetch device to obtain a plurality of microinstructions and/or control signals; and
processing circuitry configured to perform operations in accordance with the plurality of microinstructions and/or control signals.
9. An integrated circuit chip comprising an instruction fetch apparatus according to any one of claims 1-7 or a processor according to claim 8.
10. A board card comprising the integrated circuit chip of claim 9.
11. An electronic device comprising the integrated circuit chip of claim 9.
12. A method for fetching instructions, comprising:
detecting whether a cache miss event occurs while a plurality of instructions to be sent out are fetched from a cache;
in response to detecting the occurrence of the cache miss event, sending a backfill request to an external memory;
receiving, in batches from the external memory, backfill instructions responsive to the backfill request; and
during intervals in which the backfill instructions are received in batches, sending the received backfill instructions out.
13. The method of claim 12, wherein sending instructions out comprises:
queuing the plurality of instructions to be sent out in a sending sequence to form an instruction fetch pipeline; and
queuing the received backfill instructions in a corresponding sequence, so as to send the backfill instructions out through the instruction fetch pipeline.
14. The method of claim 12, further comprising: upon detecting the occurrence of the cache miss event, causing the cache to suspend delivery of instructions to be sent out, and receiving the backfill instructions from the external memory in batches.
15. The method of claim 14, wherein during an interval in which the cache receives the backfill instructions in batches, the method comprises sending out the backfill instructions that have been received, and, after the sending, causing the cache to receive a next batch of backfill instructions from the external memory.
16. The method of claim 12, further comprising buffering, in a buffer, backfill instructions received in batches from the external memory, and backfilling the backfill instructions received in batches into the cache after sending them out.
17. The method of any of claims 12-16, wherein sending the backfill request to the external memory comprises:
in response to detecting multiple cache miss events triggered by the same backfill instruction, sending the backfill request to the external memory only once for that backfill instruction.
18. The method of any of claims 12-16, wherein sending the backfill request to the external memory comprises:
in response to detecting a new cache miss event triggered by a new backfill instruction during an interval in which backfill instructions are received in batches, sending a new backfill request to the external memory for the new backfill instruction.
CN202010599747.9A 2020-06-28 2020-06-28 Instruction extraction device, processor, board card and instruction extraction method Pending CN113849226A (en)

Priority application: CN202010599747.9A, filed 2020-06-28 (priority date 2020-06-28).
Publication: CN113849226A, published 2021-12-28.
Family ID: 78972657.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination