WO2024139445A1 - Data prefetching method and data prefetching apparatus

Info

Publication number: WO2024139445A1
Application number: PCT/CN2023/120191
Authority: WO
Inventors: Wang Kebing (王科兵); Chen Zhangqi (陈章麒)
Original assignee: Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date: 2022-12-30
Filing date: 2023-09-20
Publication date: 2024-07-04

Abstract

Embodiments of the present application provide a data prefetching method and a data prefetching apparatus, which are mainly applied to a processing system. The data prefetching method comprises: executing a first instruction, accessing a storage unit according to an operand of the first instruction, and storing obtained first data into a cache and a first register, wherein the operand of the first instruction comprises a first address; executing a second instruction, according to the first data in the first register, determining a second address and accessing the storage unit, and storing obtained second data into the cache and the first register; and executing a third instruction, according to the second data in the first register, determining a third address and accessing the storage unit, and storing obtained third data into the cache, wherein the second address corresponds to the second data in the storage unit, and the third address corresponds to the third data in the storage unit. According to the embodiments of the present application, three new prefetching instructions and the first register used for accessing prefetched data are introduced, so that data prefetching based on irregular indirect addressing is realized.

Description

Data prefetching method and data prefetching device

This application claims priority to Chinese Patent Application No. 202211719527.0, filed with the State Intellectual Property Office on December 30, 2022 and entitled “A data prefetching method and data prefetching device”, the entire contents of which are incorporated herein by reference.

Technical Field

The present application relates to the field of storage technology, and in particular to a data pre-fetching method and a data pre-fetching device.

Background

In recent years, advances in processor technology have greatly increased the data processing speed of CPUs. In current processor computing, however, memory access has become the main bottleneck to further performance gains. To improve memory access performance, the CPU usually predicts which data will be accessed and loads that data in advance from a slower storage device into a faster one.

At present, the memory accesses that processor prefetching must handle fall into two main patterns: fixed-pattern memory access and irregular indirect-addressing memory access. For example, when accessing B[func(A[i])], array A is accessed by traversing its subscripts in order, which is a fixed-pattern memory access. Because the access pattern is fixed, existing data prefetching mechanisms can easily load this data in advance. The subscripts of array B, however, are obtained by applying the function func to the elements of array A, which is an irregular indirect-addressing memory access. Because future access addresses cannot be inferred from the history of past accesses, existing data prefetching mechanisms find it difficult to load such data in advance.
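The difference between the two patterns can be seen by listing the addresses each one touches. The sketch below is illustrative only: the element size, the base addresses, and the stand-in function `func` are all assumptions, not part of the application. The accesses to A advance by a constant stride that a conventional prefetcher can learn, while the accesses to B jump around.

```python
ELEM_SIZE = 8      # assumed element size in bytes
BASE_A = 0x1000    # assumed base address of array A
BASE_B = 0x8000    # assumed base address of array B


def func(x):
    # Arbitrary stand-in for the index transformation (an assumption).
    return (x * 7919) % 1024


def address_streams(a):
    """Return the byte addresses touched by A[i] and by B[func(A[i])]."""
    addr_a = [BASE_A + i * ELEM_SIZE for i in range(len(a))]
    addr_b = [BASE_B + func(v) * ELEM_SIZE for v in a]
    return addr_a, addr_b


def strides(addrs):
    """Differences between consecutive addresses; a constant stride is easy to predict."""
    return [nxt - cur for cur, nxt in zip(addrs, addrs[1:])]


if __name__ == "__main__":
    addr_a, addr_b = address_streams([3, 141, 59, 26, 535, 897])
    print(set(strides(addr_a)))  # {8}
    print(strides(addr_b))       # strides vary from one access to the next
```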

In today's systems, array-based random indirect addressing has become an increasingly common data access pattern. There is therefore an urgent need to remove the data prefetching bottleneck for random indirect addressing.

Summary of the invention

The embodiments of the present application provide a data prefetching method and a data prefetching device, which are used to simplify data prefetching for irregular indirect addressing and to improve both the prefetching effect and system performance.

In order to achieve the above objectives, this application adopts the following technical solutions:

In a first aspect, an embodiment of the present application provides a data pre-fetching method, which is mainly applied to a processing system, wherein the processing system executes a first instruction set, and the data pre-fetching method includes:

Execute a first instruction, access a storage unit according to an operand of the first instruction, and store the acquired first data in a cache and a first register; execute a second instruction, determine a second address according to the first data in the first register, access a storage unit, and store the acquired second data in the cache and the first register; execute a third instruction, determine a third address according to the second data in the first register, access a storage unit, and store the acquired third data in the cache; wherein the operand of the first instruction includes a first address, the first address corresponds to the first data in the storage unit, the second address corresponds to the second data in the storage unit, and the third address corresponds to the third data in the storage unit.
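To make the data flow between the three instructions concrete, the following Python sketch models them as operations on a toy word-addressed memory, a cache represented as a set of addresses, and a small first register. The names `pf1`, `pf2`, and `pf3`, the choice to read the most recently stored register entry, and the addition-based address calculation (described in a later implementation) are assumptions made for illustration; the application does not define an encoding. The example chains the instructions to prefetch for an access of the form C[B[A[i]]].

```python
from collections import deque


class IndirectPrefetcher:
    """Toy model of the three prefetch instructions (pf1/pf2/pf3 are hypothetical names).

    memory: dict mapping word address -> stored value (the "storage unit")
    cache:  set of word addresses that have been brought into the cache
    reg:    the "first register", modelled as a small FIFO of prefetched values
    """

    def __init__(self, memory, reg_capacity=4):
        self.memory = memory
        self.cache = set()
        self.reg = deque(maxlen=reg_capacity)  # oldest entry drops out when full

    def _fetch(self, addr):
        # Read one word from the storage unit and record that it is now cached.
        value = self.memory[addr]
        self.cache.add(addr)
        return value

    def pf1(self, first_addr):
        # First instruction: the operand is the first address.
        self.reg.append(self._fetch(first_addr))

    def pf2(self, base):
        # Second instruction: second address = operand + newest register value.
        second_addr = base + self.reg[-1]
        self.reg.append(self._fetch(second_addr))

    def pf3(self, base):
        # Third instruction: third address = operand + newest register value;
        # the fetched word goes to the cache only, not to the register.
        third_addr = base + self.reg[-1]
        self._fetch(third_addr)


def build_memory(layout):
    """Place each (base, values) array at consecutive word addresses."""
    memory = {}
    for base, values in layout:
        for i, value in enumerate(values):
            memory[base + i] = value
    return memory


if __name__ == "__main__":
    A_BASE, B_BASE, C_BASE = 0, 100, 200   # assumed array placement
    A = [5, 2, 7]
    B = [0, 0, 9, 0, 0, 4, 0, 1]
    C = list(range(1000, 1010))
    p = IndirectPrefetcher(build_memory([(A_BASE, A), (B_BASE, B), (C_BASE, C)]))
    p.pf1(A_BASE + 0)       # loads A[0] = 5 into cache and register
    p.pf2(B_BASE)           # loads B[5] = 4 into cache and register
    p.pf3(C_BASE)           # brings C[4] into the cache only
    print(sorted(p.cache))  # [0, 105, 204]
    print(list(p.reg))      # [5, 4]
```

In this run the first instruction loads A[0] = 5, the second uses it to load B[5] = 4, and the third uses that value to bring C[4] into the cache without keeping it in the register.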

In this embodiment of the present application, three new prefetch instructions and a first register for holding prefetched data are introduced into the processing system. The three prefetch instructions read and write the first register and operate on the prefetched data it holds to derive the address of the next prefetch. This realizes data prefetching for irregular indirect addressing while reducing the number and complexity of the prefetching steps.

In a possible implementation, when acquiring the first data from the storage unit according to the first instruction and the first address, the processing system also uses a first offset. The processing system can calculate the correct prefetch address from the first offset and the first address, and acquire the first data from that prefetch address. Optionally, the first offset can be a value preset in the processor, or can be supplied as an operand of the first instruction.
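A minimal sketch of how the first offset might be combined with the first address, assuming the offset is counted in array elements and the element size is 8 bytes; the constants and the function name are hypothetical. In a loop, this makes the first instruction fetch an element a fixed distance ahead of the one currently being processed.

```python
ELEM_SIZE = 8               # assumed element size in bytes
PRESET_FIRST_OFFSET = 16    # assumed value preset in the processor (in elements)


def first_prefetch_address(first_addr, first_offset=None):
    """Return the address the first instruction actually prefetches.

    If no first offset is supplied as an operand, the preset value is used.
    The offset is counted in elements and scaled to bytes (an assumption).
    """
    if first_offset is None:
        first_offset = PRESET_FIRST_OFFSET
    return first_addr + first_offset * ELEM_SIZE


if __name__ == "__main__":
    base_a = 0x1000
    for i in range(3):
        # While iteration i works on A[i], prefetch A[i + 4].
        print(hex(first_prefetch_address(base_a + i * ELEM_SIZE, first_offset=4)))
```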

In one possible implementation, the operand of the first instruction includes a first offset, wherein the first offset is adjusted according to a preset criterion. The preset criterion includes prefetch timeliness, which indicates whether the prefetched data is already stored in the cache at the time the prefetched data is read.

In the processing system, prefetch timeliness is an important criterion for evaluating data prefetching: when the processor executes a given instruction, it is checked whether the instruction's operand (the corresponding prefetched data) is already stored at the corresponding cache address, that is, whether the earlier prefetch succeeded and was timely. The processing system can adjust the first offset according to the timeliness of the previous prefetch. When the prefetch was timely, the first offset is kept unchanged; when the timeliness indicates that the data was replaced too early, the first offset is reduced; when the timeliness indicates that the data had not yet been prefetched when it was needed, the first offset is increased. By adjusting the first offset, the processing system can flexibly adapt to the prefetching requirements of different instructions, making data prefetching more accurate.
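The adjustment rule above can be written as a small feedback function. This is a sketch under stated assumptions: the fixed step of one element and the clamping bounds are not specified in the application, and a real implementation would also need a mechanism for classifying each prefetch as timely, too early, or too late.

```python
from enum import Enum


class Timeliness(Enum):
    TIMELY = "timely"        # prefetched data was in the cache when it was read
    TOO_EARLY = "too_early"  # data was prefetched but replaced before it was read
    TOO_LATE = "too_late"    # data had not yet been prefetched when it was read


def adjust_first_offset(offset, timeliness, step=1, min_offset=1, max_offset=64):
    """Adjust the first offset from the timeliness of the previous prefetch.

    step, min_offset and max_offset are illustrative assumptions.
    """
    if timeliness is Timeliness.TOO_EARLY:
        return max(min_offset, offset - step)   # prefetch less far ahead
    if timeliness is Timeliness.TOO_LATE:
        return min(max_offset, offset + step)   # prefetch further ahead
    return offset                               # timely: keep the offset


if __name__ == "__main__":
    offset = 8
    for t in (Timeliness.TOO_LATE, Timeliness.TOO_LATE, Timeliness.TIMELY, Timeliness.TOO_EARLY):
        offset = adjust_first_offset(offset, t)
    print(offset)  # 9
```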

In a possible implementation, when the processor determines the second address according to the first data and the second instruction and obtains the second data from the storage unit according to the second address, the second data is obtained using a second offset.

In a possible implementation, when the processor determines the third address according to the second data and the third instruction and obtains the third data from the storage unit according to the third address, the third data is obtained using a third offset.

In a possible implementation, the operand of the second instruction, which the processor executes according to the first data, includes a fourth address, and the second address can be determined by combining the fourth address with the first data in the first register.

That is, the fourth address is used as an operand of the second instruction, and the second instruction uses it to operate on the first data that the first instruction stored in the first register. The fourth address can be an offset, or a register or memory address. For example, the second instruction adds the first data and the fourth address to obtain the second address. Adding this operand to the second instruction makes data prefetching in the processing system more flexible and adaptable to a variety of irregular indirect addressing operations.

In a possible implementation, the operand of the third instruction, which the processor executes according to the second data, includes a fifth address, and the third address can be determined by combining the fifth address with the second data in the first register. Adding this operand to the third instruction likewise makes data prefetching more flexible and adaptable to a variety of irregular indirect addressing operations.
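A sketch of the address calculation in the second instruction, assuming the plain addition described above. The register name `x1`, the optional `scale` factor for byte-addressed arrays, and the rule for telling an immediate operand from a register operand are assumptions for illustration; a memory-address operand, which the text also allows, is not modelled. The third instruction would combine the fifth address with the second data in the same way.

```python
def resolve_operand(operand, regs):
    """Return the value of the fourth-address operand.

    Assumed encoding: an int is an immediate offset or base address,
    a str names a register in the hypothetical register file `regs`.
    """
    if isinstance(operand, str):
        return regs[operand]
    return operand


def second_address(first_data, fourth_operand, regs, scale=1):
    """Second address = fourth address + first data (first data optionally scaled)."""
    return resolve_operand(fourth_operand, regs) + first_data * scale


if __name__ == "__main__":
    regs = {"x1": 0x8000}                                # hypothetical register holding &B[0]
    print(hex(second_address(5, 0x8000, regs)))          # 0x8005
    print(hex(second_address(5, "x1", regs, scale=8)))   # 0x8028
```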

In a possible implementation, when the first instruction, the second instruction, or the third instruction is executed, it is determined, before the storage unit is accessed, that the corresponding first address, second address, or third address does not cross the array boundary. That is, if the processor finds while executing any of the three instructions that the access would cross the boundary, the corresponding instruction is not executed.

Because prefetch addresses are derived from address information that may itself be computed, a prefetch may trigger an out-of-bounds access to the relevant storage device. This includes at least the following two situations: 1. data out of bounds, for example exceeding the boundary of an array; 2. access rights out of bounds, for example an illegal access, or an access to memory beyond the instruction's access rights. Treating a prefetch instruction that triggers an out-of-bounds access as a no-operation (NOP) instruction, and making no change to the architectural state of the CPU, avoids risks such as system lockup.
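The check-then-NOP behavior can be sketched as follows. The sketch assumes the array bounds and the permitted address ranges are known to the prefetch logic (the application does not say how they are obtained) and models the cache as a set of addresses; returning without touching the cache stands in for leaving the architectural state unchanged.

```python
def checked_prefetch(addr, array_bounds, permitted_ranges, cache):
    """Prefetch addr into the cache only if it passes both checks; otherwise act as a NOP.

    array_bounds:     (start, end) of the array being indexed, end exclusive
    permitted_ranges: list of (start, end) ranges the instruction may access
    Returns True if the prefetch was performed, False if it was treated as a NOP.
    """
    start, end = array_bounds
    if not start <= addr < end:
        return False  # case 1: data out of bounds
    if not any(lo <= addr < hi for lo, hi in permitted_ranges):
        return False  # case 2: access rights out of bounds
    cache.add(addr)   # only a successful prefetch changes any state
    return True


if __name__ == "__main__":
    cache = set()
    print(checked_prefetch(150, (100, 200), [(0, 1000)], cache))  # True
    print(checked_prefetch(250, (100, 200), [(0, 1000)], cache))  # False
    print(sorted(cache))                                          # [150]
```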

In a possible implementation, the processor stores the prefetched data obtained through the third instruction only in the cache and not in the first register. This reduces the storage pressure on the first register and frees space for data stored by the first instruction and the second instruction in subsequent prefetches.

In one possible implementation, when the first register has no empty storage cells and fifth data is to be stored, fourth data in the first register is replaced, where the fourth data is the data stored earliest in the first register. In this way the contents of the first register are kept up to date and its utilization is improved.
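A minimal sketch of this replacement policy, assuming the first register is organized as a small number of cells and that the store order of each cell is tracked with a counter; the class name and its fields are hypothetical. It behaves like a FIFO: empty cells are filled first, and once the register is full the earliest-stored entry (the fourth data) is overwritten by the new entry (the fifth data).

```python
class FirstRegister:
    """First register with a fixed number of storage cells and oldest-first replacement."""

    def __init__(self, num_cells):
        self.cells = [None] * num_cells
        self.stored_at = [None] * num_cells  # store order of each cell; None = empty
        self.clock = 0

    def store(self, value):
        """Store value and return the index of the cell used."""
        if None in self.stored_at:
            idx = self.stored_at.index(None)  # an empty cell is available
        else:
            # No empty cell: replace the data that was stored earliest.
            idx = min(range(len(self.cells)), key=lambda i: self.stored_at[i])
        self.cells[idx] = value
        self.stored_at[idx] = self.clock
        self.clock += 1
        return idx


if __name__ == "__main__":
    r = FirstRegister(2)
    for v in ("a", "b", "c", "d"):
        r.store(v)
    print(r.cells)  # ['c', 'd']
```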

In a second aspect, an embodiment of the present application provides a data prefetching device, comprising a prefetch unit and a register unit, wherein the prefetch unit is electrically coupled to a storage unit and to a cache unit. For example, the prefetch unit can be electrically coupled to the storage unit and the cache unit through an external interface. The prefetch unit can execute at least three instructions in a first instruction set, and the register unit stores the data obtained by the prefetch unit through these instructions. The prefetch unit is specifically configured to:

Execute a first instruction to obtain data in a storage unit and store it in a register unit and a cache unit, wherein the operand of the first instruction includes a first address; execute a second instruction based on the data in the register unit to obtain data in the storage unit and store it in the register unit and the cache unit; execute a third instruction based on the data in the register unit to obtain data in the storage unit and store it in the cache unit.

In this embodiment of the present application, three new prefetch instructions and a first register for storing prefetched data are introduced. The three prefetch instructions read and write the first register and operate on the prefetched data it holds to derive the address for the next prefetch. This realizes data prefetching for irregular indirect addressing and reduces the number and complexity of the prefetching steps.

In a possible implementation, when the prefetch unit obtains the first data from the storage unit according to the first instruction and the first address, the prefetch unit also uses a first offset. The prefetch unit can calculate the correct prefetch address from the first offset and the first address, and obtain the first data from that prefetch address. Optionally, the first offset can be a value preset in the processor, or can be supplied as an operand of the first instruction.

In the processing system, prefetch timeliness is an important criterion for evaluating data prefetching: when the processor executes a given instruction, the processing system can determine whether the instruction's operand (the corresponding prefetched data) is already stored at the corresponding cache address, that is, whether the earlier prefetch succeeded and was timely. The processing system can adjust the first offset according to the timeliness of the previous prefetch. When the prefetch was timely, the first offset is kept unchanged; when the timeliness indicates that the data was replaced too early, the first offset is reduced; when the timeliness indicates that the data had not yet been prefetched, the first offset is increased. By adjusting the first offset, the processing system can flexibly adapt to the prefetch requirements of different instructions, making data prefetching more accurate.

In a possible implementation, when the prefetch unit determines the second address according to the first data and the second instruction and obtains the second data from the storage unit according to the second address, the second data is obtained using a second offset.

In a possible implementation, when the prefetch unit determines the third address according to the second data and the third instruction and obtains the third data from the storage unit according to the third address, the third data is obtained using a third offset.

In a possible implementation, the operand of the second instruction, which the prefetch unit executes according to the first data, includes a fourth address, and the second address can be determined by combining the fourth address with the first data in the first register.

In a possible implementation, the operand of the third instruction, which the prefetch unit executes according to the second data, includes a fifth address, and the third address can be determined by combining the fifth address with the second data in the first register. Adding this operand to the third instruction makes data prefetching in the processing system more flexible and adaptable to a variety of irregular indirect addressing operations.

In a possible implementation, when the first instruction, the second instruction, or the third instruction is executed, it is determined, before the storage unit is accessed, that the corresponding first address, second address, or third address does not cross the array boundary. That is, if the prefetch unit finds while executing any of the three instructions that the access would cross the boundary, the corresponding instruction is not executed.

In a possible implementation, the prefetch unit stores the prefetched data obtained through the third instruction only in the cache and not in the first register. This reduces the storage pressure on the first register and frees space for data stored by the first instruction and the second instruction in subsequent prefetches.

In a third aspect, an embodiment of the present application provides a data pre-fetching device, the device comprising: a processor, a first register, a first memory and a cache, wherein the data stored in the first memory includes data to be pre-fetched;

The processor is used to execute instructions in a first instruction set, including at least three instructions:

The first instruction is used to obtain first data from the first memory according to the first address, and store it in the first register and the cache; the second instruction is used to determine the second address according to the first data in the first register, and obtain the second data corresponding to the second address from the first memory, and store it in the first register and the cache; the third instruction is used to determine the prefetch address according to the second data in the first register, and obtain the data to be prefetched from the first memory and store it in the cache.

In a possible implementation, when the processor obtains the first data from the first memory according to the first instruction and the first address, the processor also uses a first offset. The processor can calculate the correct prefetch address from the first offset and the first address, and obtain the first data from that prefetch address. Optionally, the first offset can be a value preset in the processor, or can be supplied as an operand of the first instruction.

In a possible implementation, the operand of the first instruction includes a first offset, wherein the first offset is adjusted according to a preset criterion. The preset criterion includes prefetch timeliness, which indicates whether the prefetched data is already stored in the cache at the time the prefetched data is read.

In a processing system, prefetch timeliness is an important criterion for evaluating data prefetching: when the processor executes a given instruction, it is checked whether the instruction's operand (the corresponding prefetched data) is already stored at the corresponding cache address, that is, whether the earlier prefetch succeeded and was timely. The processing system can adjust the first offset according to the timeliness of the previous prefetch. When the prefetch was timely, the first offset is kept unchanged; when the timeliness indicates that the data was replaced too early, the first offset is reduced; when the timeliness indicates that the data had not yet been prefetched, the first offset is increased. By adjusting the first offset, the processing system can flexibly adapt to the prefetching requirements of different instructions, making data prefetching more accurate.

In a possible implementation, when the processor determines the second address according to the first data and the second instruction and obtains the second data from the first memory according to the second address, the second data is obtained using a second offset.

In a possible implementation, when the processor determines the prefetch address according to the second data and the third instruction and obtains the data to be prefetched from the first memory according to the prefetch address, that data is obtained using a third offset.

In a possible implementation, the operand of the second instruction, which the processor executes according to the first data, includes a third address, and the second address can be determined by combining the third address with the first data in the first register.

That is, the third address is used as an operand of the second instruction, and the second instruction uses it to operate on the first data that the first instruction stored in the first register. The third address can be an offset, or a register or memory address. For example, the second instruction adds the first data and the third address to obtain the second address. Adding this operand to the second instruction makes data prefetching in the processing system more flexible and adaptable to a variety of irregular indirect addressing operations.

In a possible implementation, the operand of the third instruction, which the processor executes according to the second data, includes a fourth address, and the prefetch address can be determined by combining the fourth address with the second data in the first register. Adding this operand to the third instruction likewise makes data prefetching more flexible and adaptable to a variety of irregular indirect addressing operations.

In a possible implementation, if the processor finds while executing any of the three instructions that the corresponding first address, second address, or prefetch address would access data out of bounds, the corresponding instruction is not executed.

In a possible implementation, the processor, the first register, and the cache are all integrated on one chip.

In a fourth aspect, an embodiment of the present application provides an electronic device, comprising a memory and a processor. The processor is configured to send a first prefetch instruction, a second prefetch instruction, and a third prefetch instruction to the memory, so as to load at least one data block stored in a first data area of the memory into at least one cache unit.

In a fifth aspect, an embodiment of the present application provides a processing system, which includes a memory and a processor, wherein the memory is used to store a computer program, and the processor is configured to execute all or part of the computer program stored in the memory to execute the method described in the first aspect.

In a sixth aspect, an embodiment of the present application provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program and a first instruction set, and when the computer program is executed by a controller, it is used to implement the method described in the first aspect.

In a seventh aspect, an embodiment of the present application provides a computer program product, which, when executed by a controller, is used to implement the method described in the first aspect above.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.

FIG. 1 is a schematic diagram of the structure of a processor provided in an embodiment of the present application;

FIG. 2 is a schematic diagram of a structure in which a processor pre-reads data from a memory provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of a processor software pre-fetching data provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of a processor pre-reading data from a memory provided by an embodiment of the present application;

FIG. 5 is a schematic diagram of the structure of an electronic device provided in an embodiment of the present application;

FIG. 6 is a schematic diagram of a structure in which a processor pre-fetches data through a first pre-fetch instruction provided by an embodiment of the present application;

FIG. 7 is a schematic diagram of a structure in which a processor pre-fetches data through a second pre-fetch instruction provided in an embodiment of the present application;

FIG. 8 is a schematic diagram of a structure in which a processor pre-fetches data through a third pre-fetch instruction provided by an embodiment of the present application;

FIG. 9 is a schematic diagram of yet another processor software pre-fetching data provided by an embodiment of the present application.

Detailed Description of Embodiments

The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of this application.

The words "first", "second" and the like mentioned herein do not indicate any order, quantity or importance, but are only used to distinguish different components. Similarly, words such as "one" or "an" do not indicate quantity limitation, but indicate the existence of at least one.

In the embodiments of the present application, words such as "exemplary" or "for example" are used to indicate examples, illustrations or descriptions. Any embodiment or design described as "exemplary" or "for example" in the embodiments of the present application should not be interpreted as being more preferred or more advantageous than other embodiments or designs. Specifically, the use of words such as "exemplary" or "for example" is intended to present related concepts in a concrete way. In the description of the embodiments of the present application, unless otherwise specified, the meaning of "multiple" refers to two or more.

Finally, it should be noted that the above embodiments are only used to illustrate, not to limit, the technical solutions of the present application. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they can still modify the technical solutions described in the foregoing embodiments, or replace some or all of the technical features therein with equivalents, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.

To help readers understand the solutions of the embodiments of the present application, some technical terms used in the embodiments are explained below.

The processor (central processing unit, CPU) is a core component of an electronic device. Its main function is to interpret computer instructions and process the data in computer software; it is responsible for reading, decoding, and executing instructions. As shown in FIG. 1, the processor 1 mainly includes two parts: a processing control unit 11 and an arithmetic logic unit (ALU) 12. In addition, the processor 1 includes a general register group 13 and a bus that provides data and control connections. The general register group 13 includes registers for storing address codes, registers for storing data or instructions, registers for storing other information, and so on.

The memory is a component used to store programs and various data. Memory can be divided into two categories: main memory and auxiliary memory. The main memory exchanges information directly with the processor. Main memory stores or reads information according to the address of a storage unit; this mode of operation is called memory access. The collection of storage units in the main memory is called the storage body. Each storage unit in the storage body can store a string of information represented as binary code, and the total number of bits of that information is called the word length of the storage unit. Addresses and storage units correspond one to one: each storage unit has exactly one fixed address, while the information stored in it can be replaced. The binary code identifying each storage unit is called its address code, and to locate a storage unit its address code must be given first.

To allow the processor 1 to operate in a standardized, efficient, and fast manner, an instruction set architecture (ISA) is defined. The instruction set is the standard language required for the operation of the processor 1 and includes a variety of predefined instructions. These instructions can be roughly divided into operation instructions, data movement instructions, and control instructions. Operation instructions specify the calculations performed in the arithmetic logic unit 12 of the processor 1; data movement instructions instruct the processor 1 to read or store the data or instructions to be processed; control instructions instruct the processor 1 to change the order in which instructions are executed, implementing jumps, loops, and similar operations in the program. An instruction mainly consists of two parts: the operation code (opcode) and the operand. The operand identifies the object the instruction operates on, such as an address code; the opcode specifies the operation that the processor 1 performs on the operand. In addition, a processor 1 with a more complex design may define relatively complex instructions, whereas a processor 1 with a simpler design generally defines simpler instructions.

As shown in FIG. 2, the processor 1 needs to access the data or instructions stored in the memory 2 in order to operate normally. The workflow of the processor 1 can be roughly divided into five stages: instruction fetch, instruction decode, instruction execute, data memory access, and data write-back. In instruction fetch, the instructions that the processor 1 executes are stored in the memory 2, so when the processor 1 runs, it sends a fetch request to the memory 2 to read the instructions it needs to execute. In instruction decode, the processor 1 translates the instruction read from the memory 2. At this stage, if the decoded instruction contains operand register indexes, these indexes are used to read the corresponding operands from the general register group 13 of the processor 1. Once decoding has determined the type of computation required and the operands have been read from the general register group 13, the instruction is executed. In instruction execute, the processor 1 actually performs the operation specified by the instruction; for example, for an addition instruction, the operands are added. In this stage, the arithmetic logic unit 12 is typically the hardware functional unit that performs the operation. In data memory access, the processor 1 sends data access requests to the memory 2. Memory access instructions are among the most important instruction types in an instruction set; they read data from the memory 2 or write data into the memory 2.

Data write-back refers to writing the result of the instruction back to the general register group 13 or the memory 2. For an ordinary operation instruction, the result is the value calculated in the execute stage; for a memory read instruction, the result is the data read from the memory 2 in the data memory access stage.
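As an illustration only, the five stages can be traced in a tiny interpreter. The two-instruction ISA below (an `add` and a `load`), its tuple encoding, and the list-based register file are hypothetical and chosen for brevity; real processors overlap these stages in a pipeline rather than running one instruction through all of them before starting the next.

```python
def step(pc, program, regs, data_mem):
    """Run the instruction at pc through the five stages; return the next pc.

    Toy ISA (hypothetical): ("add", rd, rs1, rs2) and ("load", rd, rs1, imm).
    """
    # 1. Instruction fetch: read the instruction from instruction memory.
    instr = program[pc]
    # 2. Instruction decode: split opcode and operand indexes, read source registers.
    op, rd, rs1, third = instr
    src = regs[rs1]
    # 3. Execute: the ALU adds two registers or computes a load address.
    if op == "add":
        result = src + regs[third]
    elif op == "load":
        addr = src + third
        # 4. Data memory access: only loads read data memory.
        result = data_mem[addr]
    else:
        raise ValueError(f"unknown opcode {op!r}")
    # 5. Write back: put the result into the destination register.
    regs[rd] = result
    return pc + 1


def run(program, regs, data_mem):
    """Execute a straight-line program and return the final register file."""
    pc = 0
    while pc < len(program):
        pc = step(pc, program, regs, data_mem)
    return regs


if __name__ == "__main__":
    program = [("add", 3, 1, 2),    # r3 = r1 + r2
               ("load", 0, 3, 0)]   # r0 = mem[r3 + 0]
    print(run(program, [0, 10, 5, 0], {15: 99}))  # [99, 10, 5, 15]
```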

The processor 1 has two main modes for addressing data or instructions in the memory 2: direct addressing and indirect addressing. Taking the instruction in FIG. 3 as an example, its operand is address code information that indicates the address, in the memory 2, of the data to be read. If the location indicated by the address code holds the data to be read itself, the addressing mode is direct addressing. Alternatively, that location may hold second address code information: after accessing the memory 2, the processor 1 obtains the second address code information and must access the memory again at that address to obtain the data to be read. This addressing mode is indirect addressing. Direct addressing gives the processor high memory access efficiency; indirect addressing offers a large addressing range, but the processor must access the memory several times, so the instruction takes longer to execute.
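To make the difference in memory traffic concrete, the following Python sketch models memory as a dictionary and counts reads. The addresses, contents and function names are hypothetical; the sketch only illustrates that each level of indirection costs one additional memory access.

```python
# Hypothetical memory image: address -> stored word.
MEMORY = {0x10: 42, 0x20: 0x10}


class CountingMemory:
    """Wraps a dict-based memory and counts every read access."""

    def __init__(self, contents):
        self.contents = dict(contents)
        self.reads = 0

    def read(self, addr):
        self.reads += 1
        return self.contents[addr]


def load_direct(mem, addr):
    # Direct addressing: the operand is the address of the data itself.
    return mem.read(addr)


def load_indirect(mem, addr):
    # Indirect addressing: the operand points at a cell holding a second
    # address, so a second memory access is needed to reach the data.
    second_addr = mem.read(addr)
    return mem.read(second_addr)


mem = CountingMemory(MEMORY)
print(load_indirect(mem, 0x20), mem.reads)  # prints: 42 2
```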

Because of the gap between the operating speeds of the processor 1 and the memory 2, the processor 1 often has to wait for data, which lowers its efficiency. A common remedy is to read the data required by the processor 1 from the memory 2 in advance. Such prefetching is divided into hardware prefetching and software prefetching. In hardware prefetching, a memory controller is generally provided; it predicts the data that the processor 1 is likely to read, calculates the corresponding prefetch addresses, and prefetches the data from the memory 2. Software prefetching is realized mainly through prefetch instructions that are either inserted explicitly in the software program or added by the compiler. The programs run by the processor 1 are expressed in an instruction set, for example as assembly language, and a compiler converts them into machine-readable binary code. During compilation, the compiler can predict the data that the processor 1 will read while running, estimate the storage addresses of that data in the memory 2, and generate prefetch instructions for those addresses. The predicted data is then read in advance, so that the processor 1 can access it quickly when it executes the instructions that operate on it. For example, the PRFM (Prefetch From Memory) instruction in the ARM instruction set can be used to prefetch the data at a given address into the cache.
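As an illustration of how a hardware prefetcher might estimate future addresses, the following Python sketch implements a simple stride detector. Stride detection is one common technique, chosen here only for illustration; the text does not specify the memory controller's prediction method, and the class name and `degree` parameter are hypothetical.

```python
class StridePrefetcher:
    """Illustrative stride detector (not the method claimed in the text).

    When the last two address deltas are equal and non-zero, it predicts
    the next `degree` addresses along that stride.
    """

    def __init__(self, degree=2):
        self.degree = degree
        self.last_addr = None
        self.last_stride = None

    def observe(self, addr):
        """Record one demand access and return the addresses to prefetch."""
        prefetches = []
        if self.last_addr is not None:
            stride = addr - self.last_addr
            if stride != 0 and stride == self.last_stride:
                prefetches = [addr + stride * k for k in range(1, self.degree + 1)]
            self.last_stride = stride
        self.last_addr = addr
        return prefetches


p = StridePrefetcher()
for a in (0x100, 0x108, 0x110):
    hints = p.observe(a)
print([hex(h) for h in hints])  # prints: ['0x118', '0x120']
```

A stride detector needs two equal deltas before it predicts anything, which is exactly why it works well for regular traversals and poorly for data-dependent addresses.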

While the processor 1 runs prefetch instructions, the timeliness of prefetching has a great impact on the efficiency of the whole processing system. A prefetch is timely if, when the processor 1 executes an instruction whose operand is the first prefetched data, that data has just been stored at the corresponding cache address. In other words, the first prefetched data should neither arrive so early that it prematurely replaces other valid data at the corresponding cache address, nor so late that the processing flow of the processor 1 has to wait for it. The timeliness of software prefetch instructions often varies from platform to platform.
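The three possible outcomes (late, early, timely) can be expressed as a small classification function. This is a hypothetical model measured in cycles; the parameter names are assumptions, and real hardware would observe timeliness through its own mechanisms, which the text does not specify.

```python
def classify_prefetch(issue_time, latency, use_time, evict_time=None):
    """Classify one prefetch as 'late', 'early' or 'timely' (toy model).

    issue_time: cycle at which the prefetch is issued
    latency:    cycles until the prefetched data arrives in the cache
    use_time:   cycle at which the processor first needs the data
    evict_time: cycle at which the data is replaced (None if never)
    """
    arrival = issue_time + latency
    if arrival > use_time:
        return "late"    # the pipeline has to wait for the data
    if evict_time is not None and evict_time < use_time:
        return "early"   # the data was replaced before it was used
    return "timely"


print(classify_prefetch(0, 100, 50))  # prints: late
```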

In today's processing systems, prefetching achieves very good timeliness for memory accesses with a fixed pattern. For example, accessing array A[i] is in effect a traversal of array A by index. Taking FIG. 4 as an example, the processor 1 includes a cache unit 13 and a prefetch unit 14, and the storage units 11, 12, and 13 in the memory 2 hold the consecutive contents of array A[i]. When the prefetch unit 14 performs software or hardware prefetching of array A[i], it fetches the consecutive contents of the storage units 11, 12, and 13 from the memory 2 and stores them in the cache unit 13.

Because this access pattern is fixed, current prefetching mechanisms can easily load the data in advance, and both hardware and software prefetching achieve good timeliness. Memory accesses based on irregular indirect addressing, however, remain difficult for the processing system to prefetch, for example accesses to array B[func(A[i])]. Referring to FIG. 4, the storage units 22, 24, and 26 in the memory 2 hold the elements of B[func(A[i])]. Only after the contents of A[i] have been obtained can the addresses of the storage units 22, 24, and 26 be calculated and those units accessed. The storage units 22, 24, and 26 may in turn hold address information used for further address calculation, which is not repeated here. Because the subscript of array B is produced by a function of the data in array A, future access addresses are difficult to infer from the history of past accesses, so prefetching timeliness is poor.
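The contrast between the two access streams can be seen in a short Python sketch. The array contents, base addresses, element size and `func` are all hypothetical; the point is only that the address deltas of A[i] are constant, while those of B[func(A[i])] depend on the data and show no fixed stride.

```python
def address_deltas(addresses):
    """Differences between consecutive addresses in an access stream."""
    return [b - a for a, b in zip(addresses, addresses[1:])]


ELEM = 8          # assumed element size in bytes
A_BASE = 0x1000   # hypothetical base address of array A
B_BASE = 0x8000   # hypothetical base address of array B
A = [7, 2, 9, 4, 0, 5]   # hypothetical contents of A


def func(x):
    # Hypothetical index transform; any data-dependent function would do.
    return (x * 5) % 11


a_addrs = [A_BASE + i * ELEM for i in range(len(A))]
b_addrs = [B_BASE + func(A[i]) * ELEM for i in range(len(A))]

print(address_deltas(a_addrs))  # prints: [8, 8, 8, 8, 8]
print(address_deltas(b_addrs))  # prints: [64, -72, 64, -72, 24]
```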

In addition, when processor 1 prefetches with existing software prefetch instructions, the prefetched data is usually stored only in the cache. Because these instructions do not return data to a register, processor 1 cannot use the prefetched data directly and must execute additional instructions, such as load instructions, to obtain it. If such a load accesses memory out of bounds, an exception is raised and the instruction pipeline is flushed, which may cause processor 1 to hang or behave abnormally. Avoiding this requires further instructions for bounds checking, which reduces the operating efficiency of processor 1.
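A rough model of the conventional instruction sequence for prefetching one element of B[func(A[i + dist])] is sketched below. The mnemonics serve only as labels in a hypothetical instruction trace, and the function name and parameters are assumptions. The sketch shows why an extra load and explicit bounds checks are needed when an ordinary prefetch leaves the index in the cache rather than in a register.

```python
def prior_art_indirect_prefetch(A, b_len, i, dist, func, issued=None):
    """Trace the conventional instructions needed to prefetch B[func(A[i + dist])].

    Returns a list of (mnemonic, argument) pairs. A PRFM only warms the
    cache, so the index A[j] must be re-read with an ordinary load (LDR),
    and the load and the derived index each need an explicit bounds check
    (CMP), because an out-of-bounds load would fault.
    """
    if issued is None:
        issued = []
    j = i + dist
    issued.append(("PRFM", ("A", j)))   # prefetch hint: does not fault
    issued.append(("CMP", j))           # bounds check before the real load
    if not 0 <= j < len(A):
        return issued
    idx = A[j]
    issued.append(("LDR", ("A", j)))    # extra load just to learn the index
    k = func(idx)
    issued.append(("CMP", k))           # bounds check on the derived index
    if 0 <= k < b_len:
        issued.append(("PRFM", ("B", k)))
    return issued


ops = prior_art_indirect_prefetch([3, 1, 4, 1, 5], 10, 0, 2, lambda x: x * 2)
print([op for op, _ in ops])  # prints: ['PRFM', 'CMP', 'LDR', 'CMP', 'PRFM']
```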

To address the poor timeliness of irregular indirect addressing and improve processor efficiency, an embodiment of the present application provides a data prefetching method and a data prefetching device that combine software and hardware. As shown in FIG. 5, the data prefetching device may be an electronic device 5 including a processor 1, a first memory 2, a first register 131, and a cache 132. When prefetching data, the processor 1 sends a first prefetch instruction to the first memory 2; the first prefetch instruction instructs the first memory 2 to store a fixed number of consecutive data blocks from a first data area of the first memory 2 into both the first register 131 and the cache 132.

In a subsequent step, the processor 1 can also send a third prefetch instruction to the first memory 2 and the first register 131. The third prefetch instruction generates a new prefetch address from the data block information stored in the first register 131 and instructs the first memory 2 to store a fixed number of consecutive data blocks at that prefetch address into the cache 132.

Furthermore, when performing indirect addressing, the processor 1 can also run a second prefetch instruction, which generates a new prefetch address from the data block information stored in the first register 131 and instructs the first memory 2 to store a fixed number of consecutive data blocks at that prefetch address into both the cache 132 and the first register 131. The data blocks now held in the first register 131 can then be used for further indirect addressing when the processor 1 runs another second or third prefetch instruction.

As can be seen from the above, the first register 131 added to the hardware temporarily stores the data that the processor 1 obtains from the first memory 2 through software prefetch instructions (such as the first and second prefetch instructions). Through software prefetch instructions (such as the second and third prefetch instructions), the processor 1 can use the data held in the first register 131 to calculate a new prefetch address, which simplifies memory access based on irregular indirect addressing. To work with the newly added first register 131, software prefetch instructions must also be modified or added (such as the first, second, and third prefetch instructions above) so that the processor 1 can properly use the first register 131 and its contents when calculating new prefetch addresses or storing prefetched data.
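The division of labour between the three instructions and the first register can be summarized with a toy Python model. It is a sketch under simplifying assumptions: memory is a dictionary, each prefetch moves a single word rather than a fixed number of consecutive data blocks, and the method names merely mirror the instruction names (PRFMTRAS, PRFMBRTR, PRFMBR) that this description later uses for the first, second and third prefetch instructions.

```python
class PrefetchModel:
    """Toy model of the first register (PRDR) and the three prefetch instructions.

    Simplifications (assumptions): memory is a dict of address -> word, each
    prefetch moves one word, and there is a single PRDR.
    """

    def __init__(self, memory):
        self.memory = memory
        self.cache = {}   # address -> word brought in by prefetches
        self.prdr = None  # first register: holds the last word returned

    def _fetch(self, addr):
        value = self.memory[addr]
        self.cache[addr] = value
        return value

    def prfmtras(self, addr, offset=0):
        # First prefetch instruction: address comes from the operand;
        # the result goes to the cache AND the PRDR.
        self.prdr = self._fetch(addr + offset)

    def prfmbrtr(self, offset=0):
        # Second prefetch instruction: address derived from PRDR;
        # the result goes to the cache AND replaces the PRDR contents.
        self.prdr = self._fetch(self.prdr + offset)

    def prfmbr(self, offset=0):
        # Third prefetch instruction: address derived from PRDR;
        # the result goes to the cache only; PRDR is left unchanged.
        self._fetch(self.prdr + offset)


# Two levels of indirection: 0x100 -> 0x200 -> 0x300 -> 99 (hypothetical).
m = PrefetchModel({0x100: 0x200, 0x200: 0x300, 0x300: 99})
m.prfmtras(0x100)
m.prfmbrtr()
m.prfmbr()
print(sorted(hex(a) for a in m.cache))  # prints: ['0x100', '0x200', '0x300']
```

No separate load instruction appears between the steps: each register-based prefetch reads its input directly from the model's PRDR.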

By adding the first register 131 and new software prefetch instructions to the processing system architecture, the first prefetched data obtained by one prefetch during indirect-addressing prefetching is stored in the newly added first register 131, so that the next prefetch instruction can read it directly to calculate a new prefetch address, which improves prefetching timeliness. In the prior art, the prefetched data is stored only in the cache, and an additional data load instruction must be executed to retrieve it before the new prefetch address can be calculated, which causes extra cache accesses. The present approach avoids these extra accesses, speeds up the processing system, and avoids possible out-of-bounds accesses. In particular, with multi-level nested indirect addressing, the number of accesses by processor 1 to the first memory 2 or the cache 132 can be greatly reduced, improving computing efficiency.

In some possible implementations, the first register 131 may be a PRDR (Prefetch Data Register) register, and the first prefetch instruction may be a PRFMTRAS (PRefetch From Memory To Register with Adaptive Stride) instruction. Its operand includes address information and may optionally include an address offset; the instruction causes the processor to prefetch the data corresponding to the address information into both the processor cache and the PRDR register. Exemplarily, the address information may be the identifier of a storage unit in which the first prefetch address is stored. As shown in FIG. 6, through the PRFMTRAS instruction the processor obtains the corresponding prefetch data from the memory according to the first prefetch address and the address offset, and stores it in the cache and the PRDR register. Exemplarily, the first prefetch address may be the starting address of the storage location of the data to be prefetched, and the data is prefetched with a preset address offset. For example, if the first prefetch address is 0x90 and the address offset is 0x08, data is read starting from address 0x98 up to and including address 0xA0. Alternatively, the first prefetch address may indicate the storage location of the data to be prefetched, with the tail address of that location also stored in the address information; or the address information may store the identifier of a memory address register that supplies the tail address. In some possible implementations, the address offset of the first prefetch instruction may appear in the instruction as an operand, or it may be preset in the system rather than encoded as an operand.

In some possible implementations, the address offset in the instruction operand can be a fixed value, such as #3, meaning that the actual prefetch address is the address given by the operand offset by 3; or it can indicate the unit of an integer-multiple offset, such as 8, meaning that the actual prefetch address is the operand address offset by an integer multiple of 8.
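The worked 0x90/0x08 example and the two readings of the offset operand can be checked with a few lines of Python. The function names are hypothetical; `prfmtras_range` follows the worked example literally (start at base + offset, continue to start + offset inclusive), and `effective_offset` covers the fixed-value (#3) and integer-multiple (8) cases.

```python
def prfmtras_range(base, offset):
    """Address range read in the PRFMTRAS worked example (inclusive).

    Follows the example literally: reading starts at base + offset and
    continues up to start + offset.
    """
    start = base + offset
    end = start + offset
    return start, end


def effective_offset(operand_offset, mode="fixed", multiple=1):
    """Two readings of the offset operand.

    mode="fixed":    the operand is added as-is (e.g. #3 -> +3).
    mode="multiple": the operand is a unit scaled by an integer
                     (e.g. unit 8 with multiple 2 -> +16).
    """
    if mode == "fixed":
        return operand_offset
    if mode == "multiple":
        return operand_offset * multiple
    raise ValueError(f"unknown mode: {mode}")


print([hex(a) for a in prfmtras_range(0x90, 0x08)])  # prints: ['0x98', '0xa0']
```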

In some possible implementations, the address offset can be adjusted when the PRFMTRAS instruction is actually executed. Exemplarily, the processor can adjust the offset by monitoring the timeliness of software prefetch instructions, that is, whether the first data prefetched by a software prefetch instruction is already in the corresponding register (or cache) when the processor needs it. When the processor detects that the current software prefetch instruction is not timely enough (for example, the first data has not yet returned when it is needed), it can increase the address offset used by PRFMTRAS, either by multiplying the preset or operand-specified offset by an integer or by adding a preset offset value to it, so that the access reaches farther ahead and timeliness improves. Conversely, when the processor detects that the current software prefetch instruction is issued too early (so that the first data has already been replaced out of the corresponding register by the time it is needed), it can reduce the address offset of PRFMTRAS. Specifically, the new offset can be an integer multiple of a preset address offset that is smaller than the offset of the previous prefetch instruction, or the preset offset value can be subtracted from the previous offset. The access then reaches less far ahead and less data is held in the corresponding register, so the data in the register is replaced more slowly and prefetch timeliness improves.
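A minimal sketch of such an adaptive policy follows, assuming the processor reports each prefetch as 'late', 'early' or 'timely'. It uses the additive option (add or subtract one preset unit); the multiplicative option would scale the offset instead. The function name and the clamping bounds are assumptions.

```python
def adjust_offset(current, unit, status, min_offset=None, max_offset=None):
    """Adjust the PRFMTRAS prefetch distance from observed timeliness.

    status: 'late'   -> data not back when needed: reach farther ahead
            'early'  -> data replaced before use: reach less far ahead
            'timely' -> keep the current distance
    The additive step and the clamping bounds are assumptions.
    """
    if min_offset is None:
        min_offset = unit  # never shrink below one unit (assumed)
    if status == "late":
        new = current + unit
    elif status == "early":
        new = current - unit
    else:
        new = current
    if max_offset is not None:
        new = min(new, max_offset)
    return max(new, min_offset)


offset = 8
for status in ("late", "late", "early"):
    offset = adjust_offset(offset, 8, status)
print(offset)  # prints: 16
```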

This embodiment does not specifically limit the actual method for adjusting the address offset when the processor executes the PRFMTRAS instruction.

In some possible implementations, the third prefetch instruction may be a PRFMBR (PRefetch From Memory Based on Register) instruction, which causes the processor to prefetch the data stored at a second prefetch address, derived from the contents of the PRDR register, into the processor cache. In indirect prefetching, the processor executes at least two software prefetch instructions in sequence, and every instruction except the last obtains address information from the first memory. For example, the processor first executes the PRFMTRAS instruction and then the PRFMBR instruction. Through PRFMTRAS, the processor obtains first address information from the first memory and stores it in the PRDR register. As shown in FIG. 7, when the processor executes PRFMBR, it calculates the second prefetch address from the first address information, obtains the corresponding prefetch data from the memory according to the second prefetch address, and stores it in the cache.

The operand of PRFMBR may be empty, that is, the second prefetch address is equal to the first address information. The operand of PRFMBR may also include address information. Exemplarily, the address information may be an identifier of a storage unit or a fixed offset. The PRFMBR instruction calculates the second prefetch address through the address information and the first address information, and the calculation method includes addition.

Further, when the processor executes the PRFMBR instruction, the second pre-fetch address calculation can be performed according to the address offset and the corresponding data can be pre-fetched, that is, the operand of the PRFMBR can also include the address offset. Optionally, the address offset can be a fixed value or an adjustable value, and the adjustment method can refer to the PRFMTRAS instruction.

In some possible implementations, the second prefetch instruction may be a PRFMBRTR (PRefetch From Memory Based on Register To Register) instruction, which causes the processor to prefetch the data stored at a third prefetch address, derived from the contents of the PRDR register, into both the processor cache and the PRDR register. As described above, in indirect prefetching every software prefetch instruction except the last obtains address information from the first memory. Taking the case where the processor first executes PRFMTRAS and then PRFMBRTR as an example, through PRFMTRAS the processor obtains second address information from the first memory and stores it in the PRDR register. As shown in FIG. 8, when the processor executes PRFMBRTR, it calculates the third prefetch address from the second address information, obtains the corresponding prefetch data from the memory according to the third prefetch address, and stores it in the cache and the PRDR register. The prefetched data may replace the second address information in the PRDR register. Optionally, the prefetched data may instead be stored in another designated PRDR register, and the second address information deleted.

The operand of PRFMBRTR may be empty, that is, the third prefetch address is equal to the second address information. The operand of PRFMBRTR may also include address information. For example, the address information may be an identifier of a storage unit or a fixed offset. The PRFMBRTR instruction calculates the third prefetch address through the address information and the second address information, and the calculation method includes addition.

Furthermore, when the processor executes the PRFMBRTR instruction, it can calculate the third prefetch address using an address offset and prefetch the corresponding data; that is, the operand of PRFMBRTR can also include an address offset. Optionally, the address offset can be fixed or adjustable, and the adjustment method can follow that of the PRFMTRAS instruction.
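The address calculation shared by PRFMBR and PRFMBRTR, and the one place where they differ (whether the fetched word is written back to PRDR), can be sketched as follows. Addition is the calculation method the text names; the function names and the dictionary-based memory and cache are hypothetical.

```python
def register_based_address(prdr_value, operand=None, offset=0):
    """Prefetch address for PRFMBR / PRFMBRTR.

    - empty operand: the address equals the value held in PRDR;
    - otherwise the operand (e.g. a fixed offset or a register's contents)
      is added to the PRDR value;
    - an optional address offset is added on top.
    """
    addr = prdr_value if operand is None else prdr_value + operand
    return addr + offset


def prefetch_from_register(memory, cache, prdr, to_register, operand=None, offset=0):
    """Run one PRFMBR (to_register=False) or PRFMBRTR (to_register=True).

    Both put the fetched word into `cache`. Returns the new PRDR contents:
    unchanged for PRFMBR, replaced by the fetched word for PRFMBRTR.
    """
    addr = register_based_address(prdr, operand, offset)
    value = memory[addr]
    cache[addr] = value
    return value if to_register else prdr


print(hex(register_based_address(0x200, 0x10, offset=8)))  # prints: 0x218
```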

As can be seen from the above, all three newly introduced prefetch instructions read or write the PRDR register, and each instruction after the first depends on the result of the preceding one, so the instructions should be submitted for execution in order. Exemplarily, possible execution sequences include: PRFMTRAS followed by PRFMBRTR; PRFMTRAS followed by PRFMBR; or PRFMTRAS, then at least one PRFMBRTR, and finally PRFMBR. When an irregular indirect addressing prefetch begins, the first instruction executed should be PRFMTRAS, which obtains the relevant prefetch data from the memory and stores it in the PRDR register so that subsequent PRFMBRTR or PRFMBR instructions can be executed.
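The permitted orderings can be checked with a small validator. It accepts only the sequences explicitly named in the text (whose "etc." leaves room for others), and the function name is hypothetical.

```python
def is_valid_chain(ops):
    """Check one indirect prefetch chain against the orderings in the text.

    Accepted: PRFMTRAS first, then zero or more PRFMBRTR, ending in either
    PRFMBRTR or PRFMBR. At least one instruction must follow PRFMTRAS,
    matching the listed examples.
    """
    if len(ops) < 2 or ops[0] != "PRFMTRAS":
        return False
    middle, last = ops[1:-1], ops[-1]
    if any(op != "PRFMBRTR" for op in middle):
        return False  # PRFMBR may only appear as the final instruction
    return last in ("PRFMBRTR", "PRFMBR")


print(is_valid_chain(["PRFMTRAS", "PRFMBRTR", "PRFMBR"]))  # prints: True
```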

In some possible implementations, executing a newly introduced software prefetch instruction may trigger an out-of-bounds or illegal access to the relevant storage device. The processor treats a prefetch instruction that triggers such an access as a no-operation (NOP) instruction and makes no change to the CPU architectural state. Skipping the prefetch in this way avoids risks such as system lockup. Out-of-bounds accesses include at least the following two cases: 1. data out of bounds, such as exceeding the boundary of an array; 2. access-permission violations, such as illegal accesses or accesses to memory beyond the instruction's access rights.
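The NOP behaviour can be sketched as a prefetch routine that swallows access faults. The fault type, the permission set, the dictionary memory and the function names are assumptions made for illustration; an ordinary load in the same situation would let the fault propagate.

```python
class AccessFault(Exception):
    """Stands in for the fault an ordinary load would raise."""


def checked_read(memory, addr, allowed):
    """Hypothetical read: fault if addr is unmapped or not permitted."""
    if addr not in memory:
        raise AccessFault(f"out of bounds: {addr:#x}")
    if addr not in allowed:
        raise AccessFault(f"permission denied: {addr:#x}")
    return memory[addr]


def execute_prefetch(memory, allowed, cache, addr):
    """Run a prefetch; if it would fault, treat it as a NOP.

    Returns True if data was prefetched and False if the instruction was
    dropped. On a fault nothing is raised and `cache` is not modified.
    """
    try:
        value = checked_read(memory, addr, allowed)
    except AccessFault:
        return False
    cache[addr] = value
    return True


cache = {}
print(execute_prefetch({0x10: 1}, {0x10}, cache, 0x30))  # prints: False
```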

In some possible implementations, the PRDR register 4 may specifically be a FIFO register.
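One way to picture a FIFO-organized PRDR is a bounded double-ended queue. The depth, the pop-on-read behaviour and the choice to drop the oldest entry when the FIFO is full are all assumptions rather than details from the text; dropping an entry here only loses a prefetch, which costs performance but not correctness.

```python
from collections import deque


class FifoPRDR:
    """PRDR modelled as a bounded FIFO (depth is an assumption).

    Results of PRFMTRAS / PRFMBRTR are pushed at the tail; the next
    register-based prefetch pops the oldest entry, so several chains in
    flight are consumed in issue order.
    """

    def __init__(self, depth=4):
        self.entries = deque(maxlen=depth)

    def push(self, value):
        # With maxlen set, appending to a full deque discards the oldest entry.
        self.entries.append(value)

    def pop(self):
        return self.entries.popleft()

    def __len__(self):
        return len(self.entries)


f = FifoPRDR(depth=2)
for v in (1, 2, 3):
    f.push(v)
print(f.pop())  # prints: 2
```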

In some possible implementations, the data pre-fetching device includes a plurality of PRDR registers 4 for storing pre-fetched data. Optionally, the storage capacities of the plurality of PRDR registers 4 may be completely equal, partially equal, or completely unequal.

Optionally, multiple PRDR registers 4 can support renaming to resolve Write-after-Write (WAW) conflicts that may arise when several software prefetch instructions are executed. Although the newly introduced software prefetch instructions should be submitted in order, most processors support out-of-order execution, so in a specific implementation the prefetch instructions may be submitted in order while unrelated instructions are interleaved with them during execution. Register renaming resolves the resulting WAW conflicts. Here, unrelated instructions are instructions that have no data or register dependency on the prefetch instructions being executed.
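The following Python sketch shows the idea behind renaming PRDR: each write, in program order, is given its own physical register, so an older write that completes late cannot overwrite a newer one. The pool size, the free-list policy and the class name are assumptions.

```python
class PRDRRenamer:
    """Minimal rename table for several physical PRDR registers (sketch)."""

    def __init__(self, num_physical=4):
        self.free = list(range(num_physical))
        self.current = None   # physical register the architectural PRDR maps to
        self.values = {}

    def allocate_for_write(self):
        # Called in program order when a PRDR-writing prefetch is renamed.
        if not self.free:
            raise RuntimeError("no free physical PRDR; the writer must stall")
        phys = self.free.pop(0)
        self.current = phys
        return phys

    def write(self, phys, value):
        # Called when the prefetch completes, possibly out of order.
        self.values[phys] = value

    def read_current(self):
        return self.values.get(self.current)

    def release(self, phys):
        self.values.pop(phys, None)
        self.free.append(phys)


r = PRDRRenamer(num_physical=2)
a = r.allocate_for_write()   # older chain
b = r.allocate_for_write()   # newer chain
r.write(b, 0x300)            # newer write completes first
r.write(a, 0x200)            # older write completes later
print(hex(r.read_current()))  # prints: 0x300
```

Because the architectural PRDR still maps to the newer chain's register, the late write from the older chain lands in its own register and the WAW hazard does not occur.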

Exemplarily, the first memory 2 may be a memory of the electronic device, and the cache 132 may be an in-core cache of the processor 1. The in-core cache is usually built from fast but expensive storage and is used for high-speed data access during computation. When the processor 1 reads data, it first searches the in-core cache; if the data is found, it is read immediately and sent to the processor 1 for processing. If not, the data is read from the memory, at a relatively large access-time cost, and sent to the processor 1; at the same time, the data block containing the data is loaded into the in-core cache, so that later reads from that block can be served from the in-core cache without accessing the memory.
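The lookup-and-fill behaviour can be sketched as follows, assuming word-addressed memory, a fixed block size of four words and unlimited cache capacity (so no replacement occurs). These parameters and the class name are illustrative only.

```python
class BlockCache:
    """Sketch of cache lookup with whole-block fill on a miss."""

    def __init__(self, memory, block_size=4):
        self.memory = memory          # list used as word-addressed memory
        self.block_size = block_size
        self.lines = {}               # block number -> list of words
        self.misses = 0

    def read(self, addr):
        block = addr // self.block_size
        if block not in self.lines:
            # Miss: read the whole aligned block from memory into the cache.
            self.misses += 1
            start = block * self.block_size
            self.lines[block] = self.memory[start:start + self.block_size]
        # Hit (or freshly filled line): serve the word from the cache.
        return self.lines[block][addr % self.block_size]


c = BlockCache(list(range(100, 116)))
print(c.read(5), c.read(6), c.misses)  # prints: 105 106 1
```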

Exemplarily, the electronic device 5 may be a computer, a tablet, an audio player, a video player, a data processing device, a data computing device, or any other device involving data processing and data storage.

The electronic device 5 shown in FIG. 5 in this embodiment of the present application can optimize the prefetching of example instruction fragments using the software prefetch instructions shown in FIG. 6, FIG. 7, and FIG. 8. The example instruction fragment is shown in Table 1 below:

Table 1

This example instruction fragment shows an array-based random indirect memory access pattern. Register x21 holds the address used to index an array, which is accessed sequentially. Instruction (1) loads the data at the address in x21 into x22. In instruction (2), the processor uses the contents of x22 to calculate a new address and loads the data there into x22, replacing the value previously loaded from the x21 address; instruction (2) is a typical irregular indirect access. The data returned by instruction (2) is then used to calculate another memory access address (see instruction (3)). Finally, instruction (5) increments x21 by 8 so that the array is accessed in address order.

Instruction (3) is the second level of nested irregular indirect addressing. With current techniques, neither hardware prefetching nor software prefetching can easily predict the address and data accessed by instruction (3). Therefore, the software prefetch instructions shown in FIG. 6, FIG. 7, and FIG. 8 are introduced after instruction (3) to help prefetch the data accessed by the loop body in the example instruction fragment.

The three newly added instructions are all software prefetch instructions, which generally consist of an opcode and an operand. The format of software prefetch instructions differs between architectures. Taking the ARM architecture as an example, the prefetch operation can be specified by three components: the operation type, the target cache level, and the retention policy. The operation type is PLD or PST: PLD denotes a prefetch in preparation for a load, and PST denotes a prefetch in preparation for a store. The target cache level is, for example, L1 for the L1 cache, L2 for the L2 cache, and so on. The retention policy is KEEP or STRM: KEEP denotes a temporal prefetch whose data is allocated in the cache normally, and STRM denotes a streaming (non-temporal) prefetch for data that is expected to be used only once. Combining these components yields different prefetch operation specifiers (for example, PLDL1KEEP), so that a software prefetch instruction can perform different prefetch operations according to the specifier.
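The specifier names can be enumerated by combining the three components. The sketch below uses only the components mentioned in the text; the ARM A64 PRFM instruction also defines a PLI type (prefetch for instruction fetch), which is omitted here. The function name is hypothetical.

```python
from itertools import product

TYPES = ("PLD", "PST")          # prefetch for load / prefetch for store
TARGETS = ("L1", "L2", "L3")    # target cache level
POLICIES = ("KEEP", "STRM")     # temporal (keep) / streaming (non-temporal)


def prfop_names():
    """All <type><target><policy> combinations, e.g. 'PLDL1KEEP'."""
    return ["".join(parts) for parts in product(TYPES, TARGETS, POLICIES)]


print(prfop_names()[:3])  # prints: ['PLDL1KEEP', 'PLDL1STRM', 'PLDL2KEEP']
```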

The operand in the software prefetch instruction is generally a memory address, which can be understood as a prefetch address here, or the processor can calculate the prefetch address according to the memory address, and the data to be prefetched is stored in the prefetch address.

Table 2

As shown in Table 2, the added instruction (4), corresponding to the PRFMTRAS instruction in FIG. 6, prefetches the data of the array accessed sequentially through x21, with a preset address offset of 8 bytes. The prefetched data is loaded into both the processor cache and the PRDR register, and is later used by instruction (1). Furthermore, starting from the preset address offset, the actual hardware implementation can dynamically adjust the actual prefetch offset according to the timeliness of previous prefetches, for example to an integer multiple of the preset offset.

The added instruction (5), corresponding to the PRFMBRTR instruction in FIG. 8, uses the data stored in PRDR to calculate a prefetch address and prefetch data. As described above, with in-order processing, the data that the preceding PRFMTRAS instruction prefetched into PRDR (that is, the data of the array accessed sequentially through x21) is added to the contents of register x6 and the immediate value 3 to form the second data prefetch address. According to this address, the processor prefetches the second data and loads it into the cache and the PRDR register, replacing the data of the x21 array previously held in PRDR. The second data prefetched by PRFMBRTR is later used by instruction (2).

The added instruction (6), corresponding to the PRFMBR instruction in FIG. 7, also uses the data stored in PRDR to calculate a prefetch address and prefetch data. Specifically, as the instruction shows, the second data that the preceding PRFMBRTR prefetched into PRDR is added to the contents of register x14 to form the third data prefetch address. According to this address, the processor prefetches the third data and loads it into the cache. The third data prefetched by PRFMBR is later used by instruction (3).
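Following the description literally, the addresses generated by instructions (4) to (6) can be traced in Python. The register values and memory contents are hypothetical, the function name is an assumption, and the "+ 3" term is taken as a plain addition of the immediate 3 as stated in the text. Since Table 2 itself is not reproduced here, this is a sketch of the described data flow rather than a transcription of the table.

```python
def table2_prefetch_addresses(x21, x6, x14, memory, offset=8):
    """Addresses formed by instructions (4)-(6), per the description.

    (4) PRFMTRAS: x21 + offset; the word read is placed in PRDR.
    (5) PRFMBRTR: PRDR + x6 + 3; the word read replaces PRDR.
    (6) PRFMBR:   PRDR + x14; the word read goes to the cache only.
    `memory` is a hypothetical dict standing in for main memory.
    """
    a4 = x21 + offset
    prdr = memory[a4]          # result of (4) held in PRDR
    a5 = prdr + x6 + 3
    prdr = memory[a5]          # result of (5) replaces PRDR
    a6 = prdr + x14
    return a4, a5, a6


memory = {0x108: 0x20, 0x1023: 0x40}
print([hex(a) for a in table2_prefetch_addresses(0x100, 0x1000, 0x2000, memory)])
# prints: ['0x108', '0x1023', '0x2040']
```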

Specifically, FIG. 9 is an example diagram of irregular indirect addressing based on prefetch instructions provided in this embodiment, illustrating how the processor 1 prefetches irregular indirect addressing data using the three newly introduced prefetch instructions. The processor 1 includes a cache unit 13 and a prefetch unit 14; the PRDR register is not shown. The processor executes the PRFMTRAS, PRFMBRTR, and PRFMBR instructions in sequence to prefetch the data in the storage units 31, 32, and 35 in the memory 2.

By executing PRFMTRAS, the data in the storage units 11, 12, and 13 in the memory 2 is stored in the cache unit 13 and the PRDR register; this data is the first address information.

By executing PRFMBRTR, the processor reads the first address information from the PRDR register, calculates the addresses of the storage units 22, 24, and 26 in the memory 2, and prefetches their data into the cache unit 13 and the PRDR register. The data in the storage units 22, 24, and 26 is the second address information.

By executing PRFMBR, the processor reads the second address information from the PRDR register, calculates the addresses of the storage units 31, 32, and 35 in the memory 2, and prefetches their data into the cache unit 13.

Therefore, through the newly introduced three instructions, processor 1 can achieve fast prefetching of irregular indirect addressing data, and CPU performance is greatly improved.

An embodiment of the present application also provides a computer program product including instructions, which, when executed on a computer, enables the computer to execute the data prefetch instructions of the embodiments shown in Figures 6, 7 and 8 above, and the data prefetch method of the embodiment shown in Figure 9.

An embodiment of the present application also provides a computer-readable storage medium, including instructions, which, when executed on a computer, enable the computer to execute the data prefetch instructions of the embodiments shown in Figures 6, 7 and 8 above, and the data prefetch method of the embodiment shown in Figure 9.

In another possible design, when the processing system is a chip in a terminal, the chip includes a processing unit and a communication unit. The processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit. The processing unit may execute computer-executable instructions stored in a storage unit, so that the chip in the terminal executes the data prefetch instructions of the embodiments shown in Figures 6, 7 and 8 above and the data prefetch method of the embodiment shown in Figure 9. Optionally, the storage unit is a storage unit in the chip, such as a register or a cache; the storage unit may also be a storage unit in the terminal located outside the chip, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).

Among them, the processor mentioned in any of the above places can be a general-purpose central processing unit, a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the program of the above-mentioned data prefetching method.

Those skilled in the art can clearly understand that, for the convenience and brevity of description, the specific working processes of the systems, devices and units described above can refer to the corresponding processes in the aforementioned method embodiments and will not be repeated here.

In the several embodiments provided in the present application, it should be understood that the disclosed systems, devices and methods can be implemented in other ways. For example, the device embodiments described above are only schematic. For example, the division of the units is only a logical function division. There may be other division methods in actual implementation, such as multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed. Another point is that the mutual coupling or direct coupling or communication connection shown or discussed can be an indirect coupling or communication connection through some interfaces, devices or units, which can be electrical, mechanical or other forms.

The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit. The above-mentioned integrated unit may be implemented in the form of hardware or in the form of software functional units.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the present application essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

In this application, the terms "first", "second", and the like are used to distinguish between identical or similar items whose roles and functions are substantially the same. It should be understood that "first", "second", and "nth" imply no logical or temporal dependency and limit neither quantity nor execution order. It should also be understood that although the following description uses the terms first, second, and the like to describe various elements, these elements should not be limited by the terms. The terms are only used to distinguish one element from another. For example, without departing from the scope of the various described examples, a first image may be referred to as a second image, and similarly, a second image may be referred to as a first image. Both the first image and the second image may be images and, in some cases, may be separate and different images.

The term "at least one" in this application means one or more, and the term "multiple" in this application means two or more, for example, multiple second messages means two or more second messages. The terms "system" and "network" are often used interchangeably herein.

It should be understood that the terms used in the description of the various examples herein are only for describing specific examples and are not intended to be limiting. As used in the description of the various examples and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It should also be understood that the term "and/or" used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. The term "and/or" is a description of the association relationship of associated objects, indicating that three relationships may exist. For example, A and/or B can represent: A exists alone, A and B exist at the same time, and B exists alone. In addition, the character "/" in this application generally indicates that the associated objects before and after are in an "or" relationship.

It should also be understood that, in the various embodiments of the present application, the sequence numbers of the processes do not imply an order of execution. The execution order of the processes should be determined by their functions and internal logic and should not constitute any limitation on the implementation of the embodiments of the present application.

It should be understood that determining B based on A does not mean determining B only based on A. B can also be determined based on A and/or other information.

It should also be understood that the term “comprise” (also known as “includes,” “including,” “comprises” and/or “comprising”) when used in this specification specifies the presence of stated features, integers, steps, operations, elements, and/or components, but does not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It should also be understood that the term "if" may be interpreted to mean "when" or "upon" or "in response to determining" or "in response to detecting." Similarly, the phrase "if it is determined that ..." or "if [a stated condition or event] is detected" may be interpreted to mean "upon determining that ..." or "in response to determining that ..." or "upon detecting [a stated condition or event]" or "in response to detecting [a stated condition or event]," depending on the context.

It should be understood that the references to "one embodiment", "an embodiment", or "a possible implementation" throughout the specification mean that specific features, structures, or characteristics related to the embodiment or implementation are included in at least one embodiment of the present application. Therefore, the references to "in one embodiment" or "in an embodiment", or "a possible implementation" appearing throughout the specification do not necessarily refer to the same embodiment. In addition, these specific features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

In summary, the above embodiments are only intended to illustrate, not to limit, the technical solutions of the present application. Although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced with equivalents, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims

A data prefetching method, characterized in that the method comprises:

executing a first instruction in a first instruction set, accessing a storage unit according to an operand of the first instruction, and storing the acquired first data in a cache and a first register, wherein the operand includes a first address;

executing a second instruction in the first instruction set, determining a second address according to the first data in the first register, accessing the storage unit, and storing the acquired second data in the cache and the first register; and

executing a third instruction in the first instruction set, determining a third address according to the second data in the first register, accessing the storage unit, and storing the acquired third data in the cache;

wherein the second address corresponds to the second data in the storage unit, and the third address corresponds to the third data in the storage unit.
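
As a rough illustration only, the three-step chain of this claim can be modelled in software for an irregular indirect access such as A[B[C[i]]]. The following is a hypothetical Python sketch, not the claimed hardware: memory is a flat list indexed by address, the cache is a dict from address to data, and the first register is a list of fetched values. Deriving each next address as an assumed base address plus the previously fetched value (in the spirit of the address operands of the second and third instructions, with element-size scaling omitted) is also an assumption.

```python
def prefetch_chain(memory, first_address, b_base, a_base):
    """Hypothetical model of the first, second and third prefetch instructions.

    memory: flat list of words indexed by address (stands in for the storage unit).
    first_address: operand of the first instruction (e.g. the address of C[i]).
    b_base, a_base: assumed base addresses of arrays B and A.
    """
    cache = {}            # address -> data held in the cache
    first_register = []   # fetched values kept for later address generation

    # First instruction: access the storage unit at the first address.
    first_data = memory[first_address]
    cache[first_address] = first_data
    first_register.append(first_data)

    # Second instruction: second address derived from the first data in the register.
    second_address = b_base + first_register[-1]
    second_data = memory[second_address]
    cache[second_address] = second_data
    first_register.append(second_data)

    # Third instruction: third address derived from the second data; the third
    # data is placed in the cache only, not in the first register.
    third_address = a_base + first_register[-1]
    cache[third_address] = memory[third_address]

    return cache, first_register


# Arrays laid out back to back: C at 0..3, B at 4..7, A at 8..11.
memory = [2, 0, 3, 1,
          1, 3, 0, 2,
          10, 11, 12, 13]
cache, reg = prefetch_chain(memory, first_address=0, b_base=4, a_base=8)
# C[0] = 2 -> B[2] = memory[6] = 0 -> A[0] = memory[8] = 10
```

After the call, the cache holds the three fetched words while the first register holds only the two values needed to generate later addresses, mirroring the split between cache-only and register-and-cache storage in the claim.
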
The data prefetching method according to claim 1, characterized in that executing the first instruction, whose operand includes the first address, further comprises:

executing the first instruction to obtain the first data according to the first address and a first offset;

wherein the first instruction indicates the first offset.
The data prefetching method according to claim 2, characterized in that the operand of the first instruction further includes the first offset, wherein the first offset is adjusted according to prefetching timeliness, and the prefetching timeliness indicates whether the prefetched data is already stored in the cache when the prefetched data is read.
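
The claim does not state how the first offset is adjusted. One plausible reading, shown here as a hypothetical Python sketch, treats the first offset as a prefetch distance: if a read finds that the prefetched data was not yet in the cache (a late prefetch), the distance is increased. The step size, the upper limit, and leaving the offset unchanged after a timely prefetch are all assumptions.

```python
def adjust_first_offset(offset, was_timely, step=1, max_offset=64):
    """Hypothetical timeliness-driven update of the first offset.

    was_timely: True if the prefetched data was already in the cache when it
    was read, False if the read arrived before the prefetch completed.
    """
    if not was_timely:
        # Late prefetch: look further ahead next time, up to a cap.
        return min(offset + step, max_offset)
    return offset


def first_prefetch_address(first_address, offset):
    # The first instruction targets the element `offset` positions ahead.
    return first_address + offset


offset = 2
for timely in [False, False, True, False, True]:
    offset = adjust_first_offset(offset, timely)
# offset is now 5, so a first address of 100 is prefetched at 105
```

A real design might also shrink the offset when prefetched lines are evicted before use; that refinement is outside what the claim describes.
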
The data prefetching method according to any one of claims 1 to 3, characterized in that the operand of the second instruction includes a fourth address; and

executing the second instruction and determining the second address according to the first data in the first register specifically comprises:

executing the second instruction, and determining the second address according to the operand of the second instruction and the first data in the first register.
The data prefetching method according to any one of claims 1 to 4, characterized in that the operand of the third instruction includes a fifth address; and

executing the third instruction and determining the third address according to the second data in the first register specifically comprises:

executing the third instruction, and determining the third address according to the operand of the third instruction and the second data in the first register.
The data prefetching method according to any one of claims 1 to 5, characterized in that the method further comprises:

after executing the first instruction, the second instruction, or the third instruction and before accessing the storage unit, determining that the corresponding first address, second address, or third address does not cross an array boundary.
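
The boundary check of this claim can be sketched as a guard evaluated after an address is computed and before the storage unit is touched. In this hypothetical Python model, the array is described by an assumed base address and element count, and a prefetch whose address falls outside the array is simply dropped. The claim itself does not say what happens in that case.

```python
def within_array(address, array_base, array_len):
    """True if address lies in [array_base, array_base + array_len)."""
    return array_base <= address < array_base + array_len


def guarded_prefetch(memory, cache, address, array_base, array_len):
    # Check the computed address before accessing the storage unit.
    if not within_array(address, array_base, array_len):
        return False          # out of bounds: the prefetch is skipped (assumption)
    cache[address] = memory[address]
    return True


memory = list(range(100, 110))
cache = {}
issued = guarded_prefetch(memory, cache, 3, array_base=0, array_len=5)
skipped = guarded_prefetch(memory, cache, 5, array_base=0, array_len=5)
```

Because the address in an indirect chain comes from loaded data rather than a predictable stride, a guard of this kind keeps a garbage index from generating a wild memory access.
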
The data prefetching method according to any one of claims 4 to 6, characterized in that the operand of the second instruction further includes a second offset, and/or the operand of the third instruction further includes a third offset.
The data prefetching method according to any one of claims 1 to 7, characterized in that:

when the first register has no empty storage location and fifth data is to be stored, fourth data in the first register is replaced with the fifth data, wherein the fourth data is the data stored earliest in the first register.
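
The replacement rule of this claim is first-in, first-out. A hypothetical Python model of the first register with a fixed number of entries (the capacity is an assumption) shows the behaviour: once every entry is occupied, storing new data evicts the entry that was stored earliest.

```python
from collections import deque


class FirstRegister:
    """Hypothetical fixed-capacity model of the first register (FIFO replacement)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = deque()

    def store(self, data):
        """Store data; return the evicted (earliest-stored) entry, or None."""
        evicted = None
        if len(self.entries) == self.capacity:
            # No empty entry: replace the data that was stored earliest.
            evicted = self.entries.popleft()
        self.entries.append(data)
        return evicted


reg = FirstRegister(capacity=2)
reg.store("first data")
reg.store("second data")
evicted = reg.store("third data")   # evicts "first data"
```
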
A data prefetching device, characterized in that the data prefetching device comprises a prefetch unit and a register unit, wherein the prefetch unit is electrically coupled to a storage unit and a cache unit respectively;

the prefetch unit is configured to execute at least three instructions in a first instruction set, and the register unit is configured to store data obtained by the prefetch unit through the at least three instructions;

wherein the prefetch unit is specifically configured to:

execute a first instruction to obtain data from the storage unit and store the data in the register unit and the cache unit, wherein the operand of the first instruction is a first address;

execute a second instruction according to the data in the register unit to obtain data from the storage unit and store the data in the register unit and the cache unit; and

execute a third instruction according to the data in the register unit to obtain data from the storage unit and store the data in the cache unit.
The data prefetching device according to claim 9, characterized in that, when executing the first instruction to obtain data from the storage unit, the prefetch unit is specifically configured to:

obtain the data from the storage unit according to the first instruction, the first address, and a first offset;

wherein the first instruction indicates the first offset.
The data prefetching device according to claim 10, characterized in that the operand of the first instruction further includes the first offset, wherein the first offset is adjusted according to prefetch timeliness, and the prefetch timeliness indicates whether the prefetched data is already stored in the cache unit when the data prefetching device accesses the prefetched data.
The data prefetching device according to any one of claims 9 to 11, characterized in that the operand of the second instruction includes a second address, and, when executing the second instruction according to the data in the register unit to obtain data from the storage unit, the prefetch unit is specifically configured to:

execute the second instruction according to the data in the register unit and the second address to obtain the data from the storage unit.
The data prefetching device according to any one of claims 9 to 12, characterized in that the operand of the third instruction includes a third address, and, when executing the third instruction according to the data in the register unit to obtain data from the storage unit, the prefetch unit is specifically configured to:

execute the third instruction according to the data in the register unit and the third address to obtain the data from the storage unit.
The data prefetching device according to any one of claims 9 to 13, characterized in that the prefetch unit is further configured to:

after executing the first instruction, the second instruction, or the third instruction and before accessing the storage unit, determine that the corresponding first address, second address, or third address does not cross an array boundary.
The data prefetching device according to any one of claims 9 to 14, characterized in that the operand of the second instruction includes a second offset, and the operand of the third instruction includes a third offset;

when executing the second instruction according to the data in the register unit to obtain data from the storage unit, the prefetch unit is specifically configured to:

execute the second instruction according to the data in the register unit and the second offset to obtain the data from the storage unit; and

when executing the third instruction according to the data in the register unit to obtain data from the storage unit, the prefetch unit is specifically configured to:

execute the third instruction according to the data in the register unit and the third offset to obtain the data from the storage unit.
The data prefetching device according to any one of claims 9 to 15, characterized in that:

when the first register has no empty storage location and fifth data is to be stored, fourth data in the first register is replaced with the fifth data, wherein the fourth data is the data stored earliest in the first register.
A data prefetching device, characterized in that the data prefetching device comprises a processor, a first register, a first memory, and a cache;

the first memory is configured to store data, wherein the data includes data to be prefetched;

the processor is configured to:

obtain first data from the first memory according to a first instruction in a first instruction set and a first address, and store the first data in the first register and the cache;

determine a second address according to the first data in the first register and a second instruction in the first instruction set, obtain second data corresponding to the second address from the first memory, and store the second data in the first register and the cache; and

determine a prefetch address according to the second data in the first register and a third instruction in the first instruction set, and store the to-be-prefetched data that corresponds to the prefetch address in the first memory into the cache;

wherein the first register and the second memory are configured to store first address information obtained by the processor according to at least one instruction in the first instruction set.
The data prefetching device according to claim 17, characterized in that, when obtaining the first data from the first memory according to the first instruction and the first address in the first instruction set:

the processor is configured to obtain the first data according to the first instruction, the first address, and a first offset;

wherein the first instruction indicates the first offset.
The data prefetching device according to claim 18, characterized in that the operand of the first instruction includes the first offset, wherein the first offset is adjusted according to prefetch timeliness, and the prefetch timeliness indicates whether the prefetched data is already stored in the cache when the processor accesses the prefetched data.
The data prefetching device according to any one of claims 17 to 19, characterized in that, when the processor determines the second address according to the first data in the first register and the second instruction in the first instruction set:

the operand of the second instruction includes a third address, and the second address is determined according to the first data and the third address.
The data prefetching device according to any one of claims 17 to 20, characterized in that, when the processor determines the prefetch address according to the second data in the first register and the third instruction in the first instruction set:

the operand of the third instruction includes a fourth address, and the prefetch address is determined according to the second data and the fourth address.
The data prefetching device according to any one of claims 17 to 21, characterized in that the processor is further configured to:

after executing the first instruction, the second instruction, or the third instruction and before accessing the storage unit, determine that the corresponding first address, second address, or third address does not cross an array boundary.
The data prefetching device according to any one of claims 17 to 22, characterized in that the operand of the second instruction includes a second offset, and the operand of the third instruction includes a third offset;

when determining the second address according to the first data in the first register and the second instruction in the first instruction set, the processor is configured to:

determine the second address according to the first data, the second instruction, and the second offset; and

when determining the prefetch address according to the second data in the first register and the third instruction in the first instruction set, the processor is configured to:

determine the prefetch address according to the second data, the third instruction, and the third offset.
The data prefetching device according to any one of claims 17 to 23, characterized in that:

when the first register has no empty storage location and fifth data is to be stored, fourth data in the first register is replaced with the fifth data, wherein the fourth data is the data stored earliest in the first register.
A processing system, characterized in that it comprises:

a memory, configured to store a program; and

a processor, configured to execute the program stored in the memory, wherein, when the program is executed, the processor is configured to perform the method according to any one of claims 1 to 8.
A computer-readable storage medium, characterized in that it stores instructions which, when executed on a computer, cause the computer to perform the method according to any one of claims 1 to 8.