CN111159062A - Cache data scheduling method and device, CPU chip and server - Google Patents


Info

Publication number
CN111159062A
CN111159062A (application CN201911305828.7A)
Authority
CN
China
Prior art keywords
instruction
data
target data
target
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911305828.7A
Other languages
Chinese (zh)
Other versions
CN111159062B (en)
Inventor
陈立勤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Haiguang Information Technology Co Ltd
Original Assignee
Haiguang Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Haiguang Information Technology Co Ltd filed Critical Haiguang Information Technology Co Ltd
Priority to CN201911305828.7A priority Critical patent/CN111159062B/en
Publication of CN111159062A publication Critical patent/CN111159062A/en
Application granted granted Critical
Publication of CN111159062B publication Critical patent/CN111159062B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0862Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/0223User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/023Free address space management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/10Providing a specific technical effect
    • G06F2212/1016Performance improvement
    • G06F2212/1024Latency reduction
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The embodiment of the invention discloses a cache data scheduling method and device, a CPU chip, and a server, relating to the field of computer technology, which can effectively improve instruction execution speed. The method comprises the following steps: before executing a data load instruction, searching a cache for target data according to the storage address of the target data, the data load instruction being used to load the target data from memory into the processor; and, if the target data is not in the cache, requesting the target data from memory according to its storage address in memory. The invention is suitable for cache data scheduling.

Description

Cache data scheduling method and device, CPU chip and server
Technical Field
The invention relates to the technical field of computers, in particular to a cache data scheduling method and device, a CPU chip and a server.
Background
When a CPU executes a load/store instruction, it sends the effective address of the data to the CPU's memory access unit. If the data required by the load/store instruction is not in the cache (i.e., a CACHE MISS occurs), the CPU can only wait while the memory access unit loads the data at the corresponding address from the next level of memory into the data cache; only then can instruction execution continue. During this stage the CPU must wait for many cycles, severely reducing instruction execution speed.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for scheduling cache data, a CPU chip, and a server, which can effectively improve instruction execution speed.
In a first aspect, an embodiment of the present invention provides a method for scheduling cache data, including:
before executing a data loading instruction, searching target data in a cache according to a storage address of the target data; the data loading instruction is used for loading the target data from the memory into the processor;
and if the target data is not in the cache, requesting the target data from the memory according to the storage address of the target data in the memory.
Optionally, before executing the data load instruction, searching for the target data in the cache according to the storage address of the target data includes:
determining an execution order of the data load instructions in all instructions of a program compilation result;
selecting an instruction as a target instruction in an instruction set of which the execution order is prior to the data loading instruction, and attaching a pre-fetching operator to the target instruction;
when the target instruction is executed, searching the target data in the cache according to the storage address of the target data and the pre-fetching operator.
Optionally, the selecting an instruction as a target instruction in an instruction set whose execution order precedes that of the data load instruction includes:
and in an instruction set of which the execution order precedes that of the data loading instruction, selecting a storage address generation instruction of the target data or selecting an instruction of which the execution order follows the storage address generation instruction as the target instruction.
Optionally, the scheduling method further includes:
and adjusting the execution order of the instructions in the program compiling result so that at least one program instruction is separated between the target instruction and the data loading instruction.
Optionally, before searching for the target data in the cache according to the storage address of the target data, the method further includes:
and generating a storage address of the target data according to the numerical value stored in the preset register.
Optionally, after the request for the target data from the memory, the method further includes:
and executing the data loading instruction, wherein the fetching operation of the data loading instruction is at least one clock cycle later than the operation of requesting the target data from the memory.
In a second aspect, an embodiment of the present invention further provides a scheduling apparatus for buffering data, including:
the searching unit is used for searching the target data in the cache according to the storage address of the target data before executing the data loading instruction; the data loading instruction is used for loading the target data from the memory into the processor;
and the request unit is used for requesting the target data from the memory according to the storage address of the target data in the memory if the target data is not found in the cache.
Optionally, the searching unit includes:
the determining module is used for determining the execution order of the data loading instruction in all instructions of a program compiling result;
the selection module is used for selecting an instruction as a target instruction in an instruction set of which the execution order is prior to the data loading instruction, and attaching a pre-fetching operator to the target instruction;
and the searching module is used for searching the target data in the cache according to the storage address of the target data and the pre-fetching operator when the target instruction is executed.
Optionally, the selecting module is specifically configured to:
and in an instruction set of which the execution order precedes that of the data loading instruction, selecting a storage address generation instruction of the target data or selecting an instruction of which the execution order follows the storage address generation instruction as the target instruction.
Optionally, the apparatus further comprises:
and the adjusting unit is used for adjusting the execution order of the instructions in the program compiling result so as to enable at least one program instruction to be separated between the target instruction and the data loading instruction.
Optionally, the apparatus further includes a generating unit, configured to generate the storage address of the target data according to a numerical value stored in a preset register before searching the target data in the cache according to the storage address of the target data.
Optionally, the apparatus further includes an execution unit, configured to execute the data load instruction after the target data is requested to the memory, where a fetch operation of the data load instruction is at least one clock cycle later than the operation of requesting the target data to the memory.
In a third aspect, an embodiment of the present invention further provides a CPU chip, including: at least one processor core, a cache; the processor core to:
before executing a data loading instruction, searching the target data in the cache according to the storage address of the target data; the data loading instruction is used for loading the target data from the memory into the processor;
and if the target data is not in the cache, requesting the target data from the memory according to the storage address of the target data in the memory.
Optionally, the processor core is specifically configured to:
determining an execution order of the data load instructions in all instructions of a program compilation result;
selecting an instruction as a target instruction in an instruction set of which the execution order is prior to the data loading instruction, and attaching a pre-fetching operator to the target instruction;
when the target instruction is executed, searching the target data in a cache according to the storage address of the target data and the pre-fetching operator.
Optionally, the processor core is specifically configured to:
and in an instruction set of which the execution order precedes that of the data loading instruction, selecting a storage address generation instruction of the target data or selecting an instruction of which the execution order follows the storage address generation instruction as the target instruction.
Optionally, the processor core is further configured to:
and adjusting the execution order of the instructions in the program compiling result so that at least one program instruction is separated between the target instruction and the data loading instruction.
Optionally, the processor core is further configured to:
and generating the storage address of the target data according to the numerical value stored in a preset register before searching the target data in the cache according to the storage address of the target data.
Optionally, the processor core is further configured to:
and executing the data loading instruction after the target data is requested to the memory, wherein the fetching operation of the data loading instruction is at least one clock cycle later than the operation of requesting the target data to the memory.
In a fourth aspect, an embodiment of the present invention further provides a server, including: a housing, a processor, a memory, a circuit board, and a power supply circuit, wherein the circuit board is arranged in a space enclosed by the housing, and the processor and the memory are arranged on the circuit board; the power supply circuit is used for supplying power to each circuit or device of the server; the memory is used for storing executable program code; and the processor, by reading the executable program code stored in the memory, runs a program corresponding to the executable program code, so as to execute any cache data scheduling method provided by the embodiments of the present invention.
According to the scheduling method and device for cache data, the CPU chip and the server provided by the embodiment of the invention, before the data loading instruction is executed, the target data can be searched in the cache according to the storage address of the target data, and if the target data is not in the cache, the target data is requested to the memory according to the storage address of the target data in the memory. Therefore, corresponding data can be fetched from the cache in advance before the data loading instruction is executed, so that even if the data is not in the cache, the data is read from the memory into the cache before the data loading instruction is executed, the data is loaded and utilized by the processor, the processor does not need to wait for a plurality of clock cycles, and the execution efficiency of the program instruction is effectively improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart of a method for scheduling cache data according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a scheduling apparatus for buffering data according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a CPU chip according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
It should be understood that the described embodiments are only some embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, an embodiment of the present invention provides a method for scheduling cache data, including:
s11, before executing the data loading instruction, searching the target data in the cache according to the storage address of the target data; the data loading instruction is used for loading the target data from the memory into the processor;
the data load instruction may refer to an instruction to read data from the memory into the processor. When a pipeline processor executes an instruction, it can be generally broken down into the following operations: fetching an instruction, decoding the instruction, executing the instruction, accessing a memory, and writing back an execution result. The time period required for a processor to execute an instruction may be referred to as an instruction cycle, and an instruction cycle may include one or more clock cycles. The processor may perform one of the above operations on one instruction every clock cycle therein. Meanwhile, after one operation of one instruction is completed, the processor can execute the operation on the next instruction, so that an instruction pipeline processing structure is formed. For example, after the instruction detection 1 is decoded, on one hand, detection 1 may be executed, and on the other hand, detection 2 may be decoded at the same time.
In an embodiment of the present invention, a memory access unit exists in the processor, and the memory access unit is dedicated to data interaction with the memory, and in this step, before the data load instruction is executed, the storage address of the target data may be sent to the memory access unit, so that the memory access unit may request data from the cache or the memory in advance before the data load instruction is executed.
And S12, if the target data is not in the cache, requesting the target data from the memory according to the storage address of the target data in the memory.
In this step, if the target data is not in the cache, the data may be requested from the next level of memory, and since the data is requested from the memory before the data load instruction is executed, there is more time to read the data from the memory to the cache before the data load instruction is executed, which effectively improves the program execution speed of the processor.
According to the scheduling method of the cache data provided by the embodiment of the invention, before the data loading instruction is executed, the target data can be searched in the cache according to the storage address of the target data, and if the target data is not in the cache, the target data is requested to the memory according to the storage address of the target data in the memory. Therefore, corresponding data can be fetched from the cache in advance before the data loading instruction is executed, so that even if the data is not in the cache, more time is provided for reading the data from the memory to the cache before the data loading instruction is executed, the data is loaded and utilized by the processor, the processor does not need to wait for a plurality of clock cycles, and the execution efficiency of the program instruction is effectively improved.
Optionally, in an embodiment of the present invention, before executing the data load instruction, in step S11, searching for the target data in the cache according to the storage address of the target data may specifically include:
determining an execution order of the data load instructions in all instructions of a program compilation result;
selecting an instruction as a target instruction in an instruction set of which the execution order is prior to the data loading instruction, and attaching a pre-fetching operator to the target instruction;
when the target instruction is executed, searching the target data in the cache according to the storage address of the target data and the pre-fetching operator.
For example, in one embodiment of the present invention, the program compiles into instructions INST1, INST2, INST3, and INST4, executed in that order. If INST3 is a data load instruction and the data address used by INST3 is calculated in INST2, then a prefetch operator may be appended to INST2 so that the data is requested from the cache in advance or, if the data is not present in the cache, brought into the cache from memory in advance. If the data address used by INST3 is calculated in INST1, a prefetch operator may be appended to INST1 or INST2.
In the embodiment of the present invention, as a result of program compilation, there may be a plurality of instructions whose execution order precedes that of the data load instruction, and the specific selection of which instruction is used as the target instruction may be set as required.
Optionally, in an embodiment of the present invention, in an instruction set whose execution order precedes that of the data load instruction, selecting an instruction as a target instruction may include:
and in an instruction set of which the execution order precedes that of the data loading instruction, selecting a storage address generation instruction of the target data or selecting an instruction of which the execution order follows the storage address generation instruction as the target instruction.
That is, since data prefetching needs to be performed according to the memory address where the target data is located, the prefetching operation can be performed when the storage address of the target data is generated, so that the prefetched data can be obtained as early as possible, and the prefetching operation can be performed after the storage address of the target data is generated and before the target data is loaded. Optionally, the prefetch operation may be implemented by adding a prefetch operator to the original instruction, or by adding a prefetch operation instruction alone, which is not limited in the embodiment of the present invention.
Further, after the target instruction is selected, in an embodiment of the present invention the execution order of the instructions in the program compilation result may be adjusted so that at least one program instruction separates the target instruction from the data load instruction. For example, an instruction unrelated to the data load can be inserted between the target instruction and the data load instruction, so that the prefetch operation runs more clock cycles ahead, further ensuring smooth execution of the data load instruction. As another example, in one embodiment of the invention the instruction execution order is: INST1, INST4, INST3, INST2. If INST3 is the data load instruction, INST1 is the target instruction, and INST2 is unrelated to the data load, the instruction execution order in the program compilation result can be readjusted by inserting INST2 between INST1 and INST3; the adjusted execution order may be, for example: INST1, INST4, INST2, INST3.
Since the target data needs to be searched for in the cache based on the storage address of the target data in step S11, at least the storage address of the target data is already known when searching for the target data. The method for obtaining the storage address of the target data may be various, and various known addressing methods may be adopted, which is not limited in the embodiment of the present invention.
For example, in an embodiment of the present invention, before searching the cache for the target data according to its storage address in step S11, the cache data scheduling method may further comprise: determining the storage address of the target data according to the value stored in a preset register. For example, if instruction 1 adds two immediate values, and instruction 2 takes the result of instruction 1 as an address and fetches the data stored at that address, then the storage address of the target data is generated when instruction 1 is executed.
Further, in step S12, if the target data is not found in the cache, after the request for the target data from the memory according to the storage address of the target data in the memory, the method for scheduling cache data according to the embodiment of the present invention may further include: and executing the data loading instruction, wherein the fetching operation of the data loading instruction is at least one clock cycle later than the operation of requesting the target data from the memory.
That is, when the target data needs to be requested from the memory, the operation time of at least one clock cycle for the data requesting operation is prepared in advance, so that the data loading instruction can smoothly load the target data from the memory into the processor.
The data prefetching scheduling method provided by the embodiment of the present invention is described in detail below through a specific embodiment.
A portion of the high level language program instructions in an embodiment of the invention include:
high level language (e.g., C language) fragments:
int a; // a is a global variable with an address of 0x12345678
{
int c = a;
}
The pipeline of assembler instructions for the high-level language program may be as shown in table 1.
TABLE 1
(Table 1: pipeline of the assembler instructions; reproduced as an image in the original publication and not shown here.)
where i is the instruction number, INSN ASM is the instruction code, and n, n+1, etc. are clock cycles.
i: lui $r2, 0x1234 // $r2 = 0x1234,0000
i+1: ori.p $r2, $r2, 0x5678 // $r2 = 0x1234,5678; $r2 now holds the address of variable a, and the CPU sends address 0x12345678 to the memory access unit for data prefetch in cycle (n+3).
i+2: lw $r2, ($r2) // $r2 receives the value of variable a.
The i-th instruction shifts the hexadecimal immediate 0x1234 into the upper half-word (appending four hexadecimal zeros) and places the result in register $r2, i.e., it places 0x1234 x 2^16 in $r2. The (i+1)-th instruction performs a logical OR of the hexadecimal number 0x5678 with the number in register $r2, yielding 0x12345678, and places 0x12345678 in $r2. At the same time, its prefetch function is executed, prefetching the data at address 0x12345678, i.e., querying whether the cache holds the data from the memory location with address 0x12345678. The (i+2)-th instruction takes the value in register $r2 as an address, reads the data at that address, and stores the data in $r2.
As can be seen from the pipeline in Table 1, because the suggestive address-forming instruction ori.p is used, when the CPU executes the (i+1)-th instruction it sends the value of register $r2 as an address to the memory access unit for a data prefetch (mem prefetch) in cycle (n+3); if a data CACHE MISS occurs, the memory access unit can request the data from the next level of memory in advance.
When the CPU executes the (i+2)-th instruction, it issues the value of register $r2 as an address to the memory access unit in cycle (n+5); since the data prefetch was issued two cycles earlier by the (i+1)-th instruction, the (i+2)-th instruction completes execution two clock cycles earlier even if a data CACHE MISS occurs.
In this way, the high-level language compiler generates an address-forming instruction with a suggestive (hint) property, guiding the CPU, while executing that instruction, to also send the address immediate to the memory access unit in advance, completing the prefetch of the memory location the address denotes.
According to the cache data scheduling method provided by the embodiment of the invention, the use position of the data address can be found in advance through the high-level language compiler, the corresponding address forming instruction with the suggestive property is generated, and the data request is sent to the memory access unit in advance, so that even if the data CACHE MISS occurs, the waiting clock period can be saved when the data is actually used, the effect of improving the hit rate of the data cache is achieved, and the program execution speed of the processor is effectively improved.
In the prior art, the (i +1) th instruction is not prefetched, but is directly executed in the (i +2) th instruction, and as can be seen from the pipeline in table 2, when the CPU executes the (i +2) th instruction, the CPU sends the register value of $ r2 as an address to the memory access unit in the (n +5) th cycle, and at this time, once the data CACHE MISS, CPU occurs, a wait cycle is inserted into the stream until the memory access unit retrieves the data in the corresponding address from the next stage of memory.
TABLE 2
(Table 2: pipeline without prefetch; reproduced as an image in the original publication and not shown here.)
Correspondingly, as shown in fig. 2, an embodiment of the present invention further provides a scheduling apparatus for buffering data, including:
a search unit 31, configured to search the target data in the cache according to a storage address of the target data before executing the data load instruction; the data loading instruction is used for loading the target data from the memory into the processor;
a requesting unit 32, configured to request the memory for the target data according to a storage address of the target data in the memory if the target data is not found in the cache.
The scheduling apparatus for cache data provided in the embodiments of the present invention can search the target data in the cache according to the storage address of the target data before executing the data load instruction, and request the target data from the memory according to the storage address of the target data in the memory if the target data is not in the cache. Therefore, corresponding data can be fetched from the cache in advance before the data loading instruction is executed, so that even if the data is not in the cache, the data is read from the memory into the cache before the data loading instruction is executed, the data is loaded and utilized by the processor, the processor does not need to wait for a plurality of clock cycles, and the execution efficiency of the program instruction is effectively improved.
Optionally, the searching unit 31 may include:
the determining module is used for determining the execution order of the data loading instruction in all instructions of a program compiling result;
the selection module is used for selecting an instruction as a target instruction in an instruction set of which the execution order is prior to the data loading instruction, and attaching a pre-fetching operator to the target instruction;
and the searching module is used for searching the target data in the cache according to the storage address of the target data and the pre-fetching operator when the target instruction is executed.
Optionally, the selection module may be specifically configured to:
and in an instruction set of which the execution order precedes that of the data loading instruction, selecting a storage address generation instruction of the target data or selecting an instruction of which the execution order follows the storage address generation instruction as the target instruction.
Optionally, the scheduling apparatus may further include:
and the adjusting unit is used for adjusting the execution order of the instructions in the program compiling result so as to enable at least one program instruction to be separated between the target instruction and the data loading instruction.
Optionally, the scheduling apparatus may further include:
the generating unit is used for determining the storage address of the target data according to the numerical value stored in the preset register before searching the target data in the cache according to the storage address of the target data.
Optionally, the scheduling apparatus may further include:
and the execution unit is used for executing the data loading instruction after the target data is requested from the memory, wherein the fetch operation of the data loading instruction is at least one clock cycle later than the operation of requesting the target data from the memory.
Correspondingly, as shown in fig. 3, an embodiment of the present invention further provides a CPU chip 5, including: at least one processor core 51, a cache 52;
a processor core 51 to:
search for the target data in the cache, according to the storage address of the target data, before executing a data loading instruction; the data loading instruction is used for loading the target data from the memory into the processor;
and, if the target data is not in the cache, request the target data from the memory according to the storage address of the target data in the memory.
The processor core 51 according to the embodiment of the present invention can search for the target data in the cache, according to the storage address of the target data, before the data loading instruction is executed, and, if the target data is not in the cache, request the target data from the memory according to the storage address of the target data in the memory. The corresponding data can therefore be fetched from the cache in advance of the data loading instruction; even if the data is not yet in the cache, more time is available to read it from the memory into the cache before the data loading instruction executes, so that the processor can load and use the data without waiting many clock cycles, which effectively improves the execution efficiency of the program instructions.
Optionally, the processor core 51 may be specifically configured to:
determine the execution order of the data loading instruction among all instructions of a program compiling result;
select an instruction, from an instruction set whose execution order precedes the data loading instruction, as a target instruction, and attach a pre-fetching operator to the target instruction;
and, when the target instruction is executed, search for the target data in the cache according to the storage address of the target data and the pre-fetching operator.
Optionally, the processor core 51 may be specifically configured to:
in an instruction set whose execution order precedes the data loading instruction, select the storage address generation instruction of the target data, or an instruction whose execution order follows the storage address generation instruction, as the target instruction.
Optionally, the processor core 51 may be further configured to:
adjust the execution order of the instructions in the program compiling result so that at least one program instruction separates the target instruction from the data loading instruction.
Optionally, the processor core 51 may be further configured to:
determine the storage address of the target data according to the value stored in a preset register, before searching for the target data in the cache according to that storage address.
Optionally, the processor core 51 may be further configured to:
execute the data loading instruction after the target data is requested from the memory, wherein the fetch operation of the data loading instruction is at least one clock cycle later than the operation of requesting the target data from the memory.
Accordingly, as shown in fig. 4, a server provided in an embodiment of the present invention may include: a shell 61, a processor 62, a memory 63, a circuit board 64, and a power circuit 65, wherein the circuit board 64 is arranged inside the space enclosed by the shell 61, and the processor 62 and the memory 63 are arranged on the circuit board 64; the power circuit 65 is used for supplying power to each circuit or device of the server; the memory 63 is used for storing executable program code; and the processor 62 runs a program corresponding to the executable program code, by reading the executable program code stored in the memory 63, so as to execute any one of the cache data scheduling methods provided in the foregoing embodiments.
It is noted that, herein, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. An element preceded by the phrase "comprises a ..." does not, without more constraints, preclude the existence of additional identical elements in the process, method, article, or apparatus that comprises the element.
All the embodiments in this specification are described in a related manner; identical or similar parts among the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the apparatus embodiment is described relatively simply because it is substantially similar to the method embodiment; for the relevant points, reference may be made to the corresponding description of the method embodiment.
For convenience of description, the above apparatuses are described as being divided, by function, into various units/modules. Of course, when the present invention is implemented, the functions of the units/modules may be implemented in one or more pieces of software and/or hardware.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (19)

1. A method for scheduling cache data, comprising:
before executing a data loading instruction, searching for target data in a cache according to a storage address of the target data, wherein the data loading instruction is used for loading the target data from a memory into a processor;
and, if the target data is not in the cache, requesting the target data from the memory according to the storage address of the target data in the memory.
2. The method of claim 1, wherein searching for the target data in the cache according to the storage address of the target data before executing the data loading instruction comprises:
determining the execution order of the data loading instruction among all instructions of a program compiling result;
selecting an instruction, from an instruction set whose execution order precedes the data loading instruction, as a target instruction, and attaching a pre-fetching operator to the target instruction;
and, when the target instruction is executed, searching for the target data in the cache according to the storage address of the target data and the pre-fetching operator.
3. The method of claim 2, wherein selecting an instruction, from an instruction set whose execution order precedes the data loading instruction, as a target instruction comprises:
in an instruction set whose execution order precedes the data loading instruction, selecting the storage address generation instruction of the target data, or an instruction whose execution order follows the storage address generation instruction, as the target instruction.
4. The scheduling method of claim 2 or 3, further comprising:
adjusting the execution order of the instructions in the program compiling result so that at least one program instruction separates the target instruction from the data loading instruction.
5. The scheduling method according to any one of claims 1 to 3, wherein, before searching for the target data in the cache according to the storage address of the target data, the method further comprises:
determining the storage address of the target data according to the value stored in a preset register.
6. The scheduling method according to any one of claims 1 to 3, wherein, after the target data is requested from the memory, the method further comprises:
executing the data loading instruction, wherein the fetch operation of the data loading instruction is at least one clock cycle later than the operation of requesting the target data from the memory.
7. A scheduling apparatus for cache data, comprising:
a searching unit, used for searching for target data in a cache, according to a storage address of the target data, before a data loading instruction is executed, wherein the data loading instruction is used for loading the target data from a memory into a processor;
and a request unit, used for requesting the target data from the memory, according to the storage address of the target data in the memory, if the target data is not found in the cache.
8. The scheduling apparatus of claim 7, wherein the searching unit comprises:
a determining module, used for determining the execution order of the data loading instruction among all instructions of a program compiling result;
a selection module, used for selecting an instruction, from an instruction set whose execution order precedes the data loading instruction, as a target instruction, and attaching a pre-fetching operator to the target instruction;
and a searching module, used for searching for the target data in the cache, according to the storage address of the target data and the pre-fetching operator, when the target instruction is executed.
9. The scheduling device of claim 8, wherein the selection module is specifically configured to:
in an instruction set whose execution order precedes the data loading instruction, select the storage address generation instruction of the target data, or an instruction whose execution order follows the storage address generation instruction, as the target instruction.
10. The scheduling apparatus according to claim 8 or 9, further comprising:
an adjusting unit, used for adjusting the execution order of the instructions in the program compiling result so that at least one program instruction separates the target instruction from the data loading instruction.
11. The scheduling apparatus according to any one of claims 7 to 10, further comprising a generating unit, used for determining the storage address of the target data according to the value stored in a preset register before the target data is searched for in the cache according to that storage address.
12. The scheduling apparatus of any one of claims 7 to 10, further comprising an execution unit, used for executing the data loading instruction after the target data is requested from the memory, wherein the fetch operation of the data loading instruction is at least one clock cycle later than the operation of requesting the target data from the memory.
13. A CPU chip, comprising: at least one processor core, a cache;
the processor core to:
before executing a data loading instruction, search for the target data in the cache according to the storage address of the target data, wherein the data loading instruction is used for loading the target data from the memory into the processor;
and, if the target data is not in the cache, request the target data from the memory according to the storage address of the target data in the memory.
14. The CPU chip of claim 13, wherein the processor core is specifically configured to:
determine the execution order of the data loading instruction among all instructions of a program compiling result;
select an instruction, from an instruction set whose execution order precedes the data loading instruction, as a target instruction, and attach a pre-fetching operator to the target instruction;
and, when the target instruction is executed, search for the target data in the cache according to the storage address of the target data and the pre-fetching operator.
15. The CPU chip of claim 14, wherein the processor core is specifically configured to:
in an instruction set whose execution order precedes the data loading instruction, select the storage address generation instruction of the target data, or an instruction whose execution order follows the storage address generation instruction, as the target instruction.
16. The CPU chip of claim 14 or 15, wherein the processor core is further configured to:
adjust the execution order of the instructions in the program compiling result so that at least one program instruction separates the target instruction from the data loading instruction.
17. The CPU chip of any of claims 13 to 15, wherein the processor core is further configured to:
determine the storage address of the target data according to the value stored in a preset register, before searching for the target data in the cache according to that storage address.
18. The CPU chip of any of claims 13 to 15, wherein the processor core is further configured to:
execute the data loading instruction after the target data is requested from the memory, wherein the fetch operation of the data loading instruction is at least one clock cycle later than the operation of requesting the target data from the memory.
19. A server, comprising: a shell, a processor, a memory, a circuit board, and a power circuit, wherein the circuit board is arranged inside a space enclosed by the shell, and the processor and the memory are arranged on the circuit board; the power circuit is used for supplying power to each circuit or device of the server; the memory is used for storing executable program code; and the processor runs a program corresponding to the executable program code, by reading the executable program code stored in the memory, so as to execute the cache data scheduling method according to any one of claims 1 to 6.
CN201911305828.7A 2019-12-20 2019-12-20 Cache data scheduling method and device, CPU chip and server Active CN111159062B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911305828.7A CN111159062B (en) 2019-12-20 2019-12-20 Cache data scheduling method and device, CPU chip and server

Publications (2)

Publication Number Publication Date
CN111159062A true CN111159062A (en) 2020-05-15
CN111159062B CN111159062B (en) 2023-07-07

Family

ID=70557574

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911305828.7A Active CN111159062B (en) 2019-12-20 2019-12-20 Cache data scheduling method and device, CPU chip and server

Country Status (1)

Country Link
CN (1) CN111159062B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111831451A (en) * 2020-07-21 2020-10-27 平安科技(深圳)有限公司 Cloud host memory allocation method, cloud host, cloud device and storage medium
CN112199400A (en) * 2020-10-28 2021-01-08 支付宝(杭州)信息技术有限公司 Method and apparatus for data processing
CN112444731A (en) * 2020-10-30 2021-03-05 海光信息技术股份有限公司 Chip testing method and device, processor chip and server
CN116303125A (en) * 2023-05-16 2023-06-23 太初(无锡)电子科技有限公司 Request scheduling method, cache, device, computer equipment and storage medium

Citations (3)

Publication number Priority date Publication date Assignee Title
US20130318523A1 (en) * 2012-05-25 2013-11-28 Verizon Patent And Licensing Inc. Hypervisor-based stack pre-fetch cache
CN105677580A (en) * 2015-12-30 2016-06-15 杭州华为数字技术有限公司 Method and device for accessing cache
CN110442382A (en) * 2019-07-31 2019-11-12 西安芯海微电子科技有限公司 Prefetch buffer control method, device, chip and computer readable storage medium

Cited By (7)

Publication number Priority date Publication date Assignee Title
CN111831451A (en) * 2020-07-21 2020-10-27 平安科技(深圳)有限公司 Cloud host memory allocation method, cloud host, cloud device and storage medium
WO2021120843A1 (en) * 2020-07-21 2021-06-24 平安科技(深圳)有限公司 Cloud host memory allocation method, cloud host, device, and storage medium
CN112199400A (en) * 2020-10-28 2021-01-08 支付宝(杭州)信息技术有限公司 Method and apparatus for data processing
CN112444731A (en) * 2020-10-30 2021-03-05 海光信息技术股份有限公司 Chip testing method and device, processor chip and server
CN112444731B (en) * 2020-10-30 2023-04-11 海光信息技术股份有限公司 Chip testing method and device, processor chip and server
CN116303125A (en) * 2023-05-16 2023-06-23 太初(无锡)电子科技有限公司 Request scheduling method, cache, device, computer equipment and storage medium
CN116303125B (en) * 2023-05-16 2023-09-29 太初(无锡)电子科技有限公司 Request scheduling method, cache, device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN111159062B (en) 2023-07-07

Similar Documents

Publication Publication Date Title
CN111159062B (en) Cache data scheduling method and device, CPU chip and server
JP5357017B2 (en) Fast and inexpensive store-load contention scheduling and transfer mechanism
JP3830651B2 (en) Microprocessor circuit, system, and method embodying a load target buffer for prediction of one or both of loop and stride
JP4934267B2 (en) Compiler device
US8140768B2 (en) Jump starting prefetch streams across page boundaries
TWI437490B (en) Microprocessor and method for reducing tablewalk time
US6401192B1 (en) Apparatus for software initiated prefetch and method therefor
TWI515567B (en) Translation address cache for a microprocessor
US8006041B2 (en) Prefetch processing apparatus, prefetch processing method, storage medium storing prefetch processing program
JPH07281895A (en) Branch cache
EP1039382B1 (en) Memory access optimizing method
JP2007207246A (en) Self prefetching l2 cache mechanism for instruction line
WO2005088455A2 (en) Cache memory prefetcher
CN108874691B (en) Data prefetching method and memory controller
US20090204791A1 (en) Compound Instruction Group Formation and Execution
CN108874690B (en) Data prefetching implementation method and processor
US6785796B1 (en) Method and apparatus for software prefetching using non-faulting loads
CN112099851A (en) Instruction execution method and device, processor and electronic equipment
JP5238797B2 (en) Compiler device
CN114924797A (en) Method for prefetching instruction, information processing apparatus, device, and storage medium
JP2000207224A (en) Software prefetching method
US20050144420A1 (en) Data processing apparatus and compiler apparatus
JP3739556B2 (en) Information processing device
KR20080073822A (en) Apparatus for controlling program command prefetch and method thereof
JP2008191824A (en) Prefetch method and unit for cache mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 300000 North 2-204 industrial incubation-3-8, No. 18, Haitai West Road, Huayuan Industrial Zone, Binhai New Area, Tianjin

Applicant after: Haiguang Information Technology Co.,Ltd.

Address before: 300000 North 2-204 industrial incubation-3-8, No. 18, Haitai West Road, Huayuan Industrial Zone, Binhai New Area, Tianjin

Applicant before: HAIGUANG INFORMATION TECHNOLOGY Co.,Ltd.

GR01 Patent grant