WO2024139445A1 - Data prefetching method and data prefetching apparatus - Google Patents



Publication number
WO2024139445A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
instruction
address
register
prefetch
Application number
PCT/CN2023/120191
Other languages
French (fr)
Chinese (zh)
Inventor
王科兵
陈章麒
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司
Publication of WO2024139445A1


Abstract

Embodiments of the present application provide a data prefetching method and a data prefetching apparatus, which are mainly applied to a processing system. The data prefetching method comprises: executing a first instruction, accessing a storage unit according to an operand of the first instruction, and storing obtained first data into a cache and a first register, wherein the operand of the first instruction comprises a first address; executing a second instruction, according to the first data in the first register, determining a second address and accessing the storage unit, and storing obtained second data into the cache and the first register; and executing a third instruction, according to the second data in the first register, determining a third address and accessing the storage unit, and storing obtained third data into the cache, wherein the second address corresponds to the second data in the storage unit, and the third address corresponds to the third data in the storage unit. According to the embodiments of the present application, three new prefetching instructions and the first register used for accessing prefetched data are introduced, so that data prefetching based on irregular indirect addressing is realized.

Description

A data prefetching method and a data prefetching apparatus
This application claims priority to the Chinese patent application No. 202211719527.0, entitled "A data prefetching method and a data prefetching apparatus", filed with the China National Intellectual Property Administration on December 30, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of storage technology, and in particular to a data prefetching method and a data prefetching apparatus.
Background
In recent years, advances in processor technology have greatly increased CPU data-processing speed, and memory access has become the bottleneck for further performance gains. To improve access performance, a CPU typically predicts which data will be accessed and loads the predicted data in advance from a slower storage device into a faster one.
The memory accesses targeted by processor prefetching currently fall into two main categories: fixed-pattern accesses and irregular indirect-addressing accesses. Consider an access to B[func(A[i])]. Array A is traversed by index, which is a fixed-pattern access; because the pattern is fixed, existing prefetching mechanisms can easily load its data in advance. The index into array B is produced by applying a function to the contents of an element of A, which is an irregular indirect-addressing access; because future addresses cannot be inferred from the access history, existing prefetching mechanisms have difficulty loading its data in advance.
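The B[func(A[i])] pattern can be illustrated with a short sketch. The array contents and the function `func` below are hypothetical (the source does not specify them); they are chosen only to show that indices into A advance with a constant stride while indices into B do not.

```python
# Hypothetical data: A is traversed sequentially, B is indexed through A.
A = [7, 2, 9, 4, 0, 5]
B = list(range(100, 110))  # ten elements, valid indices 0..9


def func(x):
    # Assumed index transformation; the source leaves func unspecified.
    return (x * 3) % len(B)


a_indices = list(range(len(A)))               # fixed pattern: 0, 1, 2, ...
b_indices = [func(A[i]) for i in a_indices]   # depends on the contents of A

# Differences between consecutive indices (the "stride" a prefetcher sees).
a_strides = [j - i for i, j in zip(a_indices, a_indices[1:])]
b_strides = [j - i for i, j in zip(b_indices, b_indices[1:])]
```

A stride-based predictor would see a constant stride for A and no usable pattern for B, which is the difficulty the paragraph describes.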
In today's systems, array-based irregular indirect addressing has become an increasingly common memory-access pattern. Removing the prefetching bottleneck for irregular indirect addressing is therefore an urgent problem.
Summary of the Invention
Embodiments of the present application provide a data prefetching method and a data prefetching apparatus that simplify data prefetching for irregular indirect addressing and improve prefetching effectiveness and system performance.
To achieve the above objective, the present application adopts the following technical solutions:
In a first aspect, an embodiment of the present application provides a data prefetching method, mainly applied to a processing system that executes a first instruction set. The data prefetching method includes:
Executing a first instruction, accessing a storage unit according to an operand of the first instruction, and storing the obtained first data in a cache and a first register; executing a second instruction, determining a second address according to the first data in the first register, accessing the storage unit, and storing the obtained second data in the cache and the first register; and executing a third instruction, determining a third address according to the second data in the first register, accessing the storage unit, and storing the obtained third data in the cache. The operand of the first instruction includes a first address; the first address corresponds to the first data in the storage unit, the second address corresponds to the second data in the storage unit, and the third address corresponds to the third data in the storage unit.
In this embodiment, three new prefetch instructions and a first register for storing and reading prefetched data are introduced into the processing system. The three instructions read and write the first register and operate on the prefetched data stored in it to obtain the address for the next prefetch step. This makes data prefetching based on irregular indirect addressing possible and reduces the number of steps and the complexity of prefetching.
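The three-instruction chain can be sketched as a small software model. This is an illustration under stated assumptions, not the patented hardware: the class `PrefetchModel`, its method names, the list-based memory, and the single-entry first register are all hypothetical, and the loaded data is assumed to be usable directly as the next address.

```python
class PrefetchModel:
    """Toy model: memory is a list of words and an address is a list index."""

    def __init__(self, memory):
        self.memory = memory
        self.cache = {}             # address -> data held in the cache
        self.first_register = None  # simplified here to a single entry

    def _load(self, address, to_register):
        data = self.memory[address]
        self.cache[address] = data
        if to_register:
            self.first_register = data
        return data

    def first_instruction(self, first_address):
        # First data goes to both the cache and the first register.
        return self._load(first_address, to_register=True)

    def second_instruction(self):
        # Assumption: the first data is used directly as the second address.
        return self._load(self.first_register, to_register=True)

    def third_instruction(self):
        # Third data goes to the cache only; the register is left untouched.
        return self._load(self.first_register, to_register=False)


# Hypothetical memory image forming the chain 2 -> 5 -> 9 -> 42.
memory = [0] * 16
memory[2], memory[5], memory[9] = 5, 9, 42

model = PrefetchModel(memory)
model.first_instruction(2)
model.second_instruction()
model.third_instruction()
```

After the three calls, the cache holds the data for addresses 2, 5, and 9, and the first register still holds the second data, since the third instruction writes only to the cache.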
In a possible implementation, when obtaining the first data from the storage unit according to the first instruction and the first address, the processing system also uses a first offset. The processing system calculates the prefetch address from the first offset and the first address and obtains the first data from that prefetch address. Optionally, the first offset may be a value preset inside the processor or may be supplied as an operand of the first instruction.
In a possible implementation, the operand of the first instruction includes the first offset, and the first offset is adjusted according to a preset criterion. The preset criterion includes prefetch timeliness, which indicates whether the prefetched data is present in the cache when it is read.
In a processing system, prefetch timeliness is an important criterion for evaluating data prefetching. It measures whether, when the processor executes an instruction, the operand of that instruction (the prefetched data) is present at the corresponding cache address, that is, whether the preceding prefetch was both successful and timely. The processing system may adjust the first offset according to the timeliness of the previous prefetch: when the prefetch was timely, the first offset is left unchanged; when the timeliness indicates that the data was replaced too early, the first offset is decreased; when it indicates that the data had not yet been prefetched, the first offset is increased. Adjusting the first offset lets the processing system adapt to the prefetching needs of different instructions and makes data prefetching more accurate.
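The offset adjustment rule can be sketched as follows. The enum `Timeliness`, the function `adjust_first_offset`, the step size of 1, and the lower bound of 0 are assumptions made for illustration; the source states only the direction of each adjustment.

```python
from enum import Enum


class Timeliness(Enum):
    ON_TIME = "on_time"      # data was in the cache when it was read
    TOO_EARLY = "too_early"  # data was prefetched but replaced before use
    TOO_LATE = "too_late"    # data had not yet been prefetched when needed


def adjust_first_offset(offset, timeliness, step=1, minimum=0):
    """Return the first offset to use for the next prefetch."""
    if timeliness is Timeliness.TOO_EARLY:
        return max(minimum, offset - step)  # prefetch closer to the use
    if timeliness is Timeliness.TOO_LATE:
        return offset + step                # prefetch further ahead
    return offset                           # timely: leave unchanged
```

A real implementation would also need a way to classify each prefetch as early, late, or timely, which this sketch takes as given.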
In a possible implementation, when the processor determines the second address according to the first data and the second instruction and obtains the second data from the storage unit according to the second address, it also uses a second offset to obtain the second data.
In a possible implementation, when the processor determines the third address (the prefetch address) according to the second data and the third instruction and obtains the third data (the prefetched data) from the storage unit according to that address, it also uses a third offset to obtain the third data.
In a possible implementation, the operand of the second instruction, which the processor executes according to the first data, includes a fourth address, and the second address is determined from the fourth address and the first data in the first register.
That is, the fourth address serves as an operand of the second instruction, and the second instruction uses it in a calculation on the first data that the first instruction stored in the first register. The fourth address may be an offset, or a register or memory address. The second instruction adds the first data and the fourth address to obtain the second address. Adding this operand to the second instruction makes the data prefetching of the processing system more flexible and able to accommodate a variety of irregular indirect-addressing computations.
In a possible implementation, the operand of the third instruction, which the processor executes according to the second data, includes a fifth address, and the third address is determined from the fifth address and the second data in the first register. Adding this operand to the third instruction makes the data prefetching of the processing system more flexible and able to accommodate a variety of irregular indirect-addressing computations.
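The address calculation described for the second and third instructions is a single addition, shown here as a worked example. The function name `indirect_address` and the numeric values are hypothetical; the fourth address is assumed to be the base address of a target array and the first data a byte offset into it.

```python
def indirect_address(register_data, address_operand=0):
    # Sum of the data held in the first register and the address operand,
    # as described for the second instruction (and likewise the third).
    return register_data + address_operand


base_of_b = 0x1000   # hypothetical fourth address: base of array B
first_data = 0x24    # hypothetical value loaded by the first instruction

second_address = indirect_address(first_data, base_of_b)
```

With no operand supplied, the sketch falls back to using the register data itself as the address.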
In a possible implementation, after the first instruction, the second instruction, or the third instruction is executed and before the storage unit is accessed, it is determined that the corresponding first address, second address, or third address does not go beyond the array bounds. That is, if the processor finds an out-of-bounds condition while executing any of the three instructions, it does not carry out that instruction.
Because prefetching relies on address information that may itself be computed, it can trigger out-of-bounds access to the storage device concerned. This includes at least two cases: (1) the data is out of bounds, for example beyond the boundary of an array; (2) the access exceeds permissions, for example an illegal access, or the memory to be accessed lies outside the access rights of the instruction. Treating a prefetch instruction that triggers an out-of-bounds condition as a no-operation (NOP) instruction, without changing the CPU architectural state, avoids risks such as system lockup.
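The NOP handling of out-of-bounds prefetches can be sketched as a guard placed before the memory access. The function `guarded_prefetch`, its parameters, and the boolean `readable` flag standing in for a permission check are assumptions for illustration only.

```python
def guarded_prefetch(memory, cache, address, lower, upper, readable=True):
    """Return the data if the prefetch is performed, or None if it is a NOP."""
    out_of_bounds = not (lower <= address < upper)
    if out_of_bounds or not readable:
        return None              # NOP: no memory access, no state change
    data = memory[address]
    cache[address] = data
    return data


# Hypothetical ten-word array occupying addresses 0..9.
memory = list(range(10))
cache = {}

skipped = guarded_prefetch(memory, cache, 12, lower=0, upper=10)
cache_after_skip = dict(cache)
fetched = guarded_prefetch(memory, cache, 3, lower=0, upper=10)
denied = guarded_prefetch(memory, cache, 4, lower=0, upper=10, readable=False)
```

Returning without touching `cache` mirrors the requirement that a dropped prefetch leaves the architectural state unchanged.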
In a possible implementation, the processor stores the prefetched data obtained by the third instruction only in the cache, not in the first register. This reduces storage pressure on the first register and leaves more space for data stored by the first and second instructions in subsequent prefetch operations.
In a possible implementation, when the first register has no empty storage slot and fifth data is waiting to be stored, fourth data in the first register is replaced, the fourth data being the data that was stored in the first register earliest. This keeps the contents of the first register current and improves its utilization.
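Replacing the earliest-stored entry is a first-in, first-out policy, which can be sketched with a bounded deque. The class name `FirstRegister`, the `store` method, and the capacity parameter (assumed to be at least 1) are hypothetical.

```python
from collections import deque


class FirstRegister:
    """Multi-entry register that evicts the earliest-stored entry when full."""

    def __init__(self, capacity):
        # Assumption: capacity >= 1.
        self.entries = deque(maxlen=capacity)

    def store(self, data):
        """Store data and return the evicted entry, or None if none was evicted."""
        evicted = None
        if len(self.entries) == self.entries.maxlen:
            evicted = self.entries[0]   # oldest entry sits at the left end
        self.entries.append(data)       # a full deque drops its oldest entry
        return evicted


register = FirstRegister(capacity=2)
first_eviction = register.store("a")
second_eviction = register.store("b")
third_eviction = register.store("c")    # register is full: "a" is replaced
```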
In a second aspect, an embodiment of the present application provides a data prefetching apparatus comprising a prefetch unit and a register unit. The prefetch unit is electrically coupled to a storage unit and to a cache unit; for example, it may be coupled to them through an external interface. The prefetch unit can execute at least three instructions of a first instruction set, and the register unit stores the data that the prefetch unit obtains through those instructions. The prefetch unit is specifically configured to:
Execute a first instruction to obtain data from the storage unit and store it in the register unit and the cache unit, where the operand of the first instruction includes a first address; execute a second instruction, according to the data in the register unit, to obtain data from the storage unit and store it in the register unit and the cache unit; and execute a third instruction, according to the data in the register unit, to obtain data from the storage unit and store it in the cache unit.
In this embodiment, three new prefetch instructions and a first register for storing and reading prefetched data are introduced. The three instructions read and write the first register and operate on the prefetched data stored in it to obtain the address for the next prefetch step. This makes data prefetching based on irregular indirect addressing possible and reduces the number of steps and the complexity of prefetching.
In a possible implementation, when the prefetch unit obtains the first data from the storage unit according to the first instruction and the first address, it also uses a first offset. The processor calculates the prefetch address from the first offset and the first address and obtains the first data from that prefetch address. Optionally, the first offset may be a value preset inside the processor or may be supplied as an operand of the first instruction.
In a possible implementation, the operand of the first instruction includes the first offset, and the first offset is adjusted according to a preset criterion. The preset criterion includes prefetch timeliness, which indicates whether the prefetched data is present in the cache when it is read.
In a processing system, prefetch timeliness is an important criterion for evaluating data prefetching. It measures whether, when the processor executes an instruction, the operand of that instruction (the prefetched data) is present at the corresponding cache address, that is, whether the preceding prefetch was both successful and timely. The processing system may adjust the first offset according to the timeliness of the previous prefetch: when the prefetch was timely, the first offset is left unchanged; when the timeliness indicates that the data was replaced too early, the first offset is decreased; when it indicates that the data had not yet been prefetched, the first offset is increased. Adjusting the first offset lets the processing system adapt to the prefetching needs of different instructions and makes data prefetching more accurate.
In a possible implementation, when the prefetch unit determines the second address according to the first data and the second instruction and obtains the second data from the storage unit according to the second address, it also uses a second offset to obtain the second data.
In a possible implementation, when the prefetch unit determines the third address (the prefetch address) according to the second data and the third instruction and obtains the third data (the prefetched data) from the storage unit according to that address, it also uses a third offset to obtain the third data.
In a possible implementation, the operand of the second instruction, which the prefetch unit executes according to the first data, includes a fourth address, and the second address is determined from the fourth address and the first data in the first register.
That is, the fourth address serves as an operand of the second instruction, and the second instruction uses it in a calculation on the first data that the first instruction stored in the first register. The fourth address may be an offset, or a register or memory address. The second instruction adds the first data and the fourth address to obtain the second address. Adding this operand to the second instruction makes the data prefetching of the processing system more flexible and able to accommodate a variety of irregular indirect-addressing computations.
In a possible implementation, the operand of the third instruction, which the prefetch unit executes according to the second data, includes a fifth address, and the third address is determined from the fifth address and the second data in the first register. Adding this operand to the third instruction makes the data prefetching of the processing system more flexible and able to accommodate a variety of irregular indirect-addressing computations.
In a possible implementation, after the first instruction, the second instruction, or the third instruction is executed and before the storage unit is accessed, it is determined that the corresponding first address, second address, or third address does not go beyond the array bounds. That is, if the prefetch unit finds an out-of-bounds condition while executing any of the three instructions, it does not carry out that instruction.
Because prefetching relies on address information that may itself be computed, it can trigger out-of-bounds access to the storage device concerned. This includes at least two cases: (1) the data is out of bounds, for example beyond the boundary of an array; (2) the access exceeds permissions, for example an illegal access, or the memory to be accessed lies outside the access rights of the instruction. Treating a prefetch instruction that triggers an out-of-bounds condition as a no-operation (NOP) instruction, without changing the CPU architectural state, avoids risks such as system lockup.
In a possible implementation, the prefetch unit stores the prefetched data obtained by the third instruction only in the cache, not in the first register. This reduces storage pressure on the first register and leaves more space for data stored by the first and second instructions in subsequent prefetch operations.
In a possible implementation, when the first register has no empty storage slot and fifth data is waiting to be stored, fourth data in the first register is replaced, the fourth data being the data that was stored in the first register earliest. This keeps the contents of the first register current and improves its utilization.
In a third aspect, an embodiment of the present application provides a data prefetching apparatus comprising a processor, a first register, a first memory, and a cache, where the data stored in the first memory includes data to be prefetched.
The processor is configured to execute instructions of a first instruction set, including at least three instructions:
The first instruction obtains first data from the first memory according to a first address and stores it in the first register and the cache; the second instruction determines a second address according to the first data in the first register, obtains second data corresponding to the second address from the first memory, and stores it in the first register and the cache; and the third instruction determines a prefetch address according to the second data in the first register, obtains the data to be prefetched from the first memory, and stores it in the cache.
In this embodiment, three new prefetch instructions and a first register for storing and reading prefetched data are introduced into the processing system. The three instructions read and write the first register and operate on the prefetched data stored in it to obtain the address for the next prefetch step. This makes data prefetching based on irregular indirect addressing possible and reduces the number of steps and the complexity of prefetching.
In a possible implementation, when the processor obtains the first data from the first memory according to the first instruction and the first address, it also uses a first offset. The processor calculates the prefetch address from the first offset and the first address and obtains the first data from that prefetch address. Optionally, the first offset may be a value preset inside the processor or may be supplied as an operand of the first instruction.
In a possible implementation, the operand of the first instruction includes the first offset, and the first offset is adjusted according to a preset criterion. The preset criterion includes prefetch timeliness, which indicates whether the prefetched data is present in the cache when it is read.
In a processing system, prefetch timeliness is an important criterion for evaluating data prefetching. It measures whether, when the processor executes an instruction, the operand of that instruction (the prefetched data) is present at the corresponding cache address, that is, whether the preceding prefetch was both successful and timely. The processing system may adjust the first offset according to the timeliness of the previous prefetch: when the prefetch was timely, the first offset is left unchanged; when the timeliness indicates that the data was replaced too early, the first offset is decreased; when it indicates that the data had not yet been prefetched, the first offset is increased. Adjusting the first offset lets the processing system adapt to the prefetching needs of different instructions and makes data prefetching more accurate.
In a possible implementation, when the processor determines the second address according to the first data and the second instruction and obtains the second data from the first memory according to the second address, it also uses a second offset to obtain the second data.
In a possible implementation, when the processor determines the third address (the prefetch address) according to the second data and the third instruction and obtains the prefetched data from the first memory according to that address, it also uses a third offset to obtain the prefetched data.
In a possible implementation, the operand of the second instruction, which the processor executes according to the first data, includes a third address, and the second address is determined from the third address and the first data in the first register.
That is, the third address serves as an operand of the second instruction and is used in a calculation on the first data that the first instruction stored in the first register. The third address may be an offset, or a register or memory address. The second instruction adds the first data and the third address to obtain the second address. Adding this operand to the second instruction makes the data prefetching of the processing system more flexible and able to accommodate a variety of irregular indirect-addressing computations.
In a possible implementation, the operand of the third instruction, which the processor executes according to the second data, includes a fourth address, and the prefetch address is determined from the fourth address and the second data in the first register. Adding this operand to the third instruction makes the data prefetching of the processing system more flexible and able to accommodate a variety of irregular indirect-addressing computations.
In a possible implementation, if the processor, while executing any of the three instructions, finds that the corresponding first address, second address, or prefetch address is out of bounds, it does not carry out that instruction.
Because prefetching relies on address information that may itself be computed, it can trigger out-of-bounds access to the storage device concerned. This includes at least two cases: (1) the data is out of bounds, for example beyond the boundary of an array; (2) the access exceeds permissions, for example an illegal access, or the memory to be accessed lies outside the access rights of the instruction. Treating a prefetch instruction that triggers an out-of-bounds condition as a no-operation (NOP) instruction, without changing the CPU architectural state, avoids risks such as system lockup.
In a possible implementation, the processor stores the prefetched data obtained by the third instruction only in the cache, not in the first register. This reduces storage pressure on the first register and leaves more space for data stored by the first and second instructions in subsequent prefetch operations.
In a possible implementation, when the first register has no empty storage slot and fifth data is waiting to be stored, fourth data in the first register is replaced, the fourth data being the data that was stored in the first register earliest. This keeps the contents of the first register current and improves its utilization.
In a possible implementation, the processor, the first register, and the cache are all integrated on a single chip.
In a fourth aspect, an embodiment of the present application provides an electronic device comprising a memory and a processor. The processor is configured to send a first prefetch instruction, a second prefetch instruction, and a third prefetch instruction to the memory, so as to store at least one data block held in a first data area of the memory into at least one cache unit.
In a fifth aspect, an embodiment of the present application provides a processing system comprising a memory and a processor. The memory stores a computer program, and the processor is configured to execute all or part of the computer program stored in the memory to perform the method described in the first aspect.
In a sixth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program and a first instruction set. When executed by a controller, the computer program implements the method described in the first aspect.
In a seventh aspect, an embodiment of the present application provides a computer program product which, when executed by a controller, implements the method described in the first aspect.
BRIEF DESCRIPTION OF THE DRAWINGS
To describe the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of a processor provided in an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a processor pre-reading data from a memory provided in an embodiment of the present application;
FIG. 3 is a schematic diagram of software prefetching of data by a processor provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of a processor pre-reading data from a memory provided in an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an electronic device provided in an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a processor prefetching data through a first prefetch instruction provided in an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a processor prefetching data through a second prefetch instruction provided in an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a processor prefetching data through a third prefetch instruction provided in an embodiment of the present application;
FIG. 9 is a schematic diagram of another example of software prefetching of data by a processor provided in an embodiment of the present application.
DETAILED DESCRIPTION
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. Evidently, the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without creative effort fall within the scope of protection of the present application.
The terms "first", "second", and similar words used herein do not denote any order, quantity, or importance; they are used only to distinguish different components. Likewise, "a", "an", and similar words do not denote a limitation of quantity; they denote the presence of at least one.
In the embodiments of the present application, words such as "exemplary" or "for example" are used to indicate an example, illustration, or explanation. Any embodiment or design described as "exemplary" or "for example" in the embodiments of the present application should not be construed as preferred or advantageous over other embodiments or designs. Rather, such words are intended to present the relevant concepts in a concrete manner. In the description of the embodiments of the present application, unless otherwise specified, "multiple" means two or more.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.
To help the reader understand the solutions of the embodiments of the present application, some technical terms used in the embodiments are explained below.
The processor (central processing unit, CPU) is one of the main components of an electronic device and is its core part. Its main function is to interpret computer instructions and process data in computer software; it is the core component responsible for fetching instructions, decoding them, and executing them. As shown in FIG. 1, the processor 1 mainly includes two parts: a processing control unit 11 and an arithmetic logic unit (ALU) 12. In addition, the processor 1 includes a general-purpose register group 13 and a bus that provides the data and control connections. The general-purpose register group 13 includes registers for storing address codes, registers for storing data or instructions, registers for storing other information, and so on.
The memory is the component used to store programs and various kinds of data. Memory can be divided into two broad categories: main memory and auxiliary memory. Main memory is the memory that exchanges information directly with the processor. Main memory works by storing or reading information according to the address of a storage unit, an operation referred to as accessing the memory. The carrier that holds the storage units in main memory is called a memory bank. Each storage unit in the memory bank can hold information represented by a string of binary code, and the total number of bits of that information is called the word length of the storage unit. There is a one-to-one correspondence between the address of a storage unit and the information stored in it: each unit has exactly one fixed address, whereas the information stored in it can be replaced. The binary code that identifies each storage unit is called the address code. To locate a storage unit, its address code must be given first.
So that the processor 1 can run in a standardized, efficient, and fast manner, an instruction set (instruction set architecture, ISA) is defined. The instruction set is the specification language that the processor 1 needs in order to run. It includes a variety of predefined instructions, which can be roughly divided into arithmetic instructions, data movement instructions, and control instructions. Arithmetic instructions indicate the specific computation performed in the arithmetic logic unit 12 of the processor 1. Data movement instructions instruct the processor 1 to read or store the data or instructions to be processed. Control instructions instruct the processor 1 to change the order in which instructions are executed, so as to implement jumps, loops, and similar operations in the executing program. An instruction mainly consists of two parts: an opcode and an operand. The operand contains the object on which the instruction operates, such as an address code; the opcode specifies the operation that the processor 1 performs on the operand. In addition, relatively complex instructions can be provided in a processor 1 with a higher level of integration, whereas a processor 1 with a simpler level of integration is generally provided with relatively simple instructions.
As shown in FIG. 2, the processor 1 needs to access the data or instructions stored in the memory 2 in order to operate normally. The workflow of the processor 1 can be roughly divided into five parts: instruction fetch, instruction decode, instruction execution, data memory access, and data write-back. Instruction fetch: the instructions that the processor 1 needs to execute are stored in the memory 2; when the processor 1 starts to run, it sends an instruction-fetch command to the memory 2 to read out the instructions it needs to execute. Instruction decode: the processor 1 translates the instructions read from the memory 2. At this stage, if the decoded instruction contains operand register indexes, those indexes can be used to read the corresponding operands from the general-purpose register group 13 of the processor 1. Once decoding is complete, the type of computation required is known and the required operands have been read from the general-purpose register group 13, so instruction execution follows. Instruction execution: the processor 1 performs the actual operation specified by the instruction. For example, if the instruction is an addition instruction, the operands are added. At this stage, the arithmetic logic unit 12 usually serves as the hardware functional unit that carries out the operation. Data memory access: the processor 1 sends a data access instruction to the memory 2. Data access instructions are often among the most important instruction types in an instruction set; they cause data to be read from, or written to, the memory 2. Data write-back: the result of instruction execution is written back to the general-purpose register group 13 or the memory 2. For an ordinary arithmetic instruction, the result comes from the instruction execution stage; for a memory read instruction, the result is the data read from the memory 2 in the data memory access stage.
The processor 1 accesses data or instructions in the memory 2 mainly in two ways: direct addressing and indirect addressing. Taking the instruction in FIG. 3 as an example, its operand is address code information that indicates the address in the memory 2 of the data to be read. When the location indicated by the address code information holds the data to be read, the addressing mode is direct addressing. Alternatively, the location indicated by the address code may hold second address code information. In that case, after accessing the memory 2, the processor 1 obtains the second address code information and must access the memory again according to it in order to obtain the data to be read. This addressing mode is indirect addressing. With direct addressing, memory accesses by the processor are efficient. Indirect addressing has the advantage of a large addressing range, but the processor accesses the memory multiple times, so instruction execution takes longer.
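The difference in access count between the two addressing modes can be shown with a small sketch. It assumes memory is modelled as a Python dictionary from address to content; the function names and the returned access count are illustrative and not taken from the embodiment.

```python
def load_direct(memory, address):
    """Direct addressing: the addressed location holds the data itself.
    Returns (data, number_of_memory_accesses)."""
    return memory[address], 1


def load_indirect(memory, address):
    """Indirect addressing: the addressed location holds a second address,
    which must be accessed again to obtain the data."""
    second_address = memory[address]   # first access yields an address
    data = memory[second_address]      # second access yields the data
    return data, 2
```

For a memory such as `{0x100: 0x200, 0x200: 42}`, a direct load from `0x200` and an indirect load through `0x100` both yield 42, but the indirect load costs two accesses.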
Because the processor 1 and the memory 2 operate at different frequencies, the processor 1 can work inefficiently. The operating efficiency of the processor 1 is generally improved by reading the data it needs from the memory 2 in advance. There are two ways of reading data in advance: hardware prefetching and software prefetching. With hardware prefetching, a memory controller is typically provided; the memory controller estimates the data that the processor 1 is likely to read, computes the corresponding prefetch address, and then prefetches the data from the memory 2. Software prefetching is implemented mainly by explicitly inserting prefetch instructions in the software program or by having the compiler add prefetch instructions. The program run inside the processor 1 is generally in assembly language or the like, which is one embodiment of the instruction set. While the processor 1 is running, a compiler program converts the assembly language into machine-readable binary. In this process, the compiler can be configured to predict the data that the processor 1 is about to read. From the compiler's prediction, the storage address in the memory 2 of the data to be read can be estimated, and a prefetch instruction can be generated from that storage address. The predicted data is thus read in advance, so that the processor 1 can read it quickly when it executes the instruction that operates on it. For example, the PRFM (Prefetch From Memory) instruction in the ARM instruction set can be used to prefetch the operand at a given address into the cache.
When the processor 1 runs prefetch instructions, the timeliness of prefetching has a large effect on the efficiency of the whole processing system. Timeliness means that, when the processor 1 executes an instruction whose operand is first prefetched data, that first prefetched data is already stored at the corresponding cache address. Note that the first prefetched data must neither arrive so early that it replaces other valid data at the corresponding cache address, nor arrive so late that the processing flow of the processor 1 has to wait. The timeliness of software prefetch instructions often varies from platform to platform.
In today's processing systems, prefetching for fixed-pattern memory accesses has very good timeliness. For example, prefetching the data in array A[i] amounts, in practice, to traversing array A in index order. Taking FIG. 4 as an example, the processor 1 includes a cache unit 13 and a prefetch unit 14, and storage units 11, 12, and 13 in the memory 2 hold the contiguous data of array A[i]. When the prefetch unit 14 performs software or hardware prefetching on array A[i], it can take the contiguous data in storage units 11, 12, and 13 out of the memory 2 and store it in the cache unit 13.
Because the access pattern is fixed, the data is very easy for existing prefetch mechanisms to load in advance, and both hardware prefetching and software prefetching have good timeliness. Memory access based on irregular indirect addressing, however, remains a difficulty for data prefetching in processing systems. An example is an access to array B[func(A[i])]. Referring to FIG. 4, storage units 22, 24, and 26 in the memory 2 hold the data of array B[func(A[i])]. After the content corresponding to array A[i] is obtained, a further computation is needed to obtain the addresses of storage units 22, 24, and 26 before those storage units can finally be accessed. Furthermore, storage units 22, 24, and 26 may themselves hold address information used for further address computation, which is not described again here. In practice, because the index of array B is formed by applying a function to the data in array A, it is difficult to infer future access addresses from the access history, so prefetch timeliness is poor.
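The contrast between the two access patterns can be made concrete with a small sketch. The base addresses, element size, and the example `func` used below are assumptions chosen purely for illustration; only the access patterns A[i] and B[func(A[i])] come from the text.

```python
def strides(addresses):
    """Differences between consecutive addresses."""
    return [b - a for a, b in zip(addresses, addresses[1:])]


def addresses_of_a(base_a, count, elem_size):
    # A[i] is visited in index order, so addresses advance by a constant stride.
    return [base_a + i * elem_size for i in range(count)]


def addresses_of_b(base_b, a_values, func, elem_size):
    # B[func(A[i])]: each address depends on the contents of A,
    # so it cannot be derived from the previous addresses alone.
    return [base_b + func(x) * elem_size for x in a_values]
```

With three 8-byte elements, the addresses of A differ by a constant 8, whereas for contents such as `[5, 1, 9]` and `func(x) = 2 * x` the addresses of B jump by -64 and then by 128, which a history-based stride predictor cannot anticipate.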
In addition, when the processor 1 prefetches using existing software prefetch instructions, the prefetched data is usually stored in the cache. Because of instruction limitations, however, the prefetched data in the cache cannot be used directly; the processor 1 must use an additional instruction, such as a load instruction, to load the data. During that access, an out-of-bounds access triggers an exception that flushes the instruction pipeline, which can cause the processor 1 to hang or behave abnormally. Solving this requires additional instructions for bounds checking, which reduces the operating efficiency of the processor 1.
To address the poor timeliness of irregular indirect addressing and to improve the operating efficiency of the processor, the embodiments of the present application provide a data prefetching method and a data prefetching apparatus based on a combination of software and hardware. As shown in FIG. 5, the data prefetching apparatus may be an electronic device 5 that includes a processor 1, a first memory 2, a first register 131, and a cache 132. When prefetching data, the processor 1 sends a first prefetch instruction to the first memory 2. The first prefetch instruction instructs the first memory 2 to store a fixed number of consecutive data blocks from a first data area of the first memory 2 into the first register 131 and the cache 132.
In a next step, the processor 1 can also send a third prefetch instruction to the first memory 2 and the first register 131. The third prefetch instruction is used to compute a new prefetch address from the data block information stored in the first register 131, and to instruct the first memory 2 to store a fixed number of consecutive data blocks corresponding to that prefetch address into the cache 132.
Furthermore, when performing indirect addressing, the processor 1 can run a second prefetch instruction. The second prefetch instruction is used to compute a new prefetch address from the data block information stored in the first register 131, and to instruct the first memory 2 to store a fixed number of consecutive data blocks corresponding to that prefetch address into the cache 132 and the first register 131. The data block content stored in the first register 131 can then be used for subsequent indirect addressing when the processor 1 runs a further second prefetch instruction or a third prefetch instruction.
As can be seen from the above, the first register 131 added to the hardware system temporarily holds the data that the processor 1 obtains from the first memory 2 through software prefetch instructions (such as the first prefetch instruction and the second prefetch instruction). Through software prefetch instructions (such as the second prefetch instruction and the third prefetch instruction), the processor 1 can use the data held in the first register 131 to compute a new prefetch address, which simplifies the handling of memory accesses based on irregular indirect addressing. To work with the newly added first register 131, software prefetch instructions also need to be modified or added (the first prefetch instruction and the second prefetch instruction described above), so that the processor 1 makes proper use of the first register 131 and its data when computing a new prefetch address or storing prefetched data.
By adding the first register 131 to the processing system architecture, together with the new software prefetch instructions, the processing system stores the first prefetched data obtained by the previous prefetch in the first register 131 when prefetching data for indirect addressing. When the next prefetch instruction is executed, the first prefetched data can be read directly and used to compute the new prefetch address, which improves prefetch timeliness. This avoids the repeated cache accesses of the prior art, in which the first prefetched data obtained by the previous prefetch is stored only in the cache and an additional data load instruction must be issued on the cache to obtain it before the new prefetch address can be computed. The operation of the processing system is thereby accelerated, and possible out-of-bounds accesses are avoided. Particularly with multi-level nested indirect addressing, the number of accesses by the processor 1 to the first memory 2 or the cache 132 can be greatly reduced, improving computational efficiency.
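The interplay of the three prefetch instructions with the first register and the cache can be sketched as a simple software model. Everything here is an assumption made for illustration: the class `PrefetchUnit`, the single-entry register, the dictionary memory, the one-block transfers, and the optional `func` used for the address computation are not defined by the embodiment.

```python
class PrefetchUnit:
    """Toy model (assumption): memory maps address -> content, the cache is a
    dict, and the first register holds a single value."""

    def __init__(self, memory):
        self.memory = memory
        self.cache = {}
        self.first_register = None

    def first_prefetch(self, address):
        # First prefetch instruction: the address comes from the operand;
        # the data goes to both the cache and the first register.
        data = self.memory[address]
        self.cache[address] = data
        self.first_register = data

    def second_prefetch(self, func=lambda x: x):
        # Second prefetch instruction: the address is computed from the first
        # register; the data goes to the cache and back into the first register.
        address = func(self.first_register)
        data = self.memory[address]
        self.cache[address] = data
        self.first_register = data

    def third_prefetch(self, func=lambda x: x):
        # Third prefetch instruction: the address is computed from the first
        # register; the data goes to the cache only.
        address = func(self.first_register)
        self.cache[address] = self.memory[address]
```

In this model, a chain such as `first_prefetch(0x10)`, `second_prefetch()`, `third_prefetch()` walks two levels of indirection without any separate load from the cache between steps, which is the saving the paragraph above describes.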
In some possible implementations, the first register 131 may be a PRDR (Prefetch Data Register) register, and the first prefetch instruction may be a PRFMTRAS (PRefetch From Memory To Register with Adaptive Stride) instruction. Its operand includes address information and, optionally, an address offset, and it causes the processor to prefetch the data corresponding to the address information into the processor cache and the PRDR register. By way of example, the address information may be the identifier of a storage unit in which a first prefetch address is stored. As shown in FIG. 6, through the PRFMTRAS instruction the processor can obtain the corresponding prefetched data from the memory according to the first prefetch address and the address offset, and store it in the cache and the PRDR register. By way of example, the first prefetch address may be the starting address of the storage location of the data to be prefetched, and the data is prefetched at a preset address offset. For example, if the first prefetch address is 0x90 and the address offset is 0x08, data is read starting from address 0x98 and continuing up to and including address 0xA0. In addition, the first prefetch address may instead indicate the tail address of the storage location of the data to be prefetched; alternatively, the data address information may hold the identifier of a memory address register, and that memory address register provides the corresponding tail address. In some possible implementations, the address offset of the first prefetch instruction may appear in the instruction as an operand, or it may be preset in the system without appearing as an operand of the instruction.
In some possible implementations, the address offset given as an instruction operand may be a definite value, such as #3, meaning that the actual prefetch address is the address information in the instruction operand offset by 3. It may also indicate an integer multiple of an offset, such as 8, meaning that the actual prefetch address is the address information in the instruction operand offset by an integer multiple of 8.
In some possible implementations, the address offset can be adjusted when the PRFMTRAS instruction is actually executed. By way of example, the processor can monitor the timeliness of software prefetch instructions and adjust the address offset accordingly, where the timeliness of a software prefetch instruction refers to whether the first data prefetched by that instruction is in the corresponding register (cache) when the processor needs it. When the processor observes that the current software prefetch instruction is arriving too late (for example, the first data has not yet been returned when it is needed), it can increase the address offset used when executing the PRFMTRAS instruction. Specifically, the new offset may be an integer multiple of the address offset that is preset or given in the instruction operand, or that address offset plus a preset offset value, so that this memory access reaches further ahead and timeliness improves. Conversely, when the processor observes that the current software prefetch instruction is arriving too early (that is, the prefetch instruction was issued so early that the first data has already been replaced out of the corresponding register by the time it is needed), it can reduce the address offset of the PRFMTRAS instruction. Specifically, the new offset may be an integer multiple of the preset address offset that is smaller than the address offset of the previous prefetch instruction, or the address offset of the previous prefetch instruction minus a preset offset value, so that this memory access reaches less far ahead and less data is stored in the corresponding register. This slows the rate at which data in that register is replaced and improves prefetch timeliness.
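One of the adjustment options described above, adding or subtracting a preset offset value, can be sketched as follows. The embodiment does not fix a concrete policy, so the function `adjust_offset`, the labels `"late"` and `"early"`, and the lower bound `minimum` are assumptions made for illustration.

```python
def adjust_offset(offset, timeliness, step, minimum=0):
    """Return the address offset to use for the next prefetch.

    timeliness: "late"  -> data had not arrived when it was needed
                "early" -> data had already been replaced when it was needed
                anything else -> keep the current offset
    step: preset offset value added or subtracted on each adjustment
    """
    if timeliness == "late":
        # Prefetch further ahead so the data arrives in time.
        return offset + step
    if timeliness == "early":
        # Prefetch less far ahead so the data is not replaced before use.
        return max(minimum, offset - step)
    return offset
```

The multiplicative variant mentioned in the text (an integer multiple of the preset offset) would replace the addition and subtraction with a change of multiplier; it is omitted here for brevity.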
This embodiment does not specifically limit how the processor actually adjusts the address offset when executing the PRFMTRAS instruction.
In some possible implementations, the third prefetch instruction may be a PRFMBR (PRefetch From Memory Based on Register) instruction, which causes the processor to prefetch the data stored at a second prefetch address into the processor cache according to the content of the PRDR register. In an indirect data prefetch, the processor needs to execute at least two software data prefetch instructions in sequence, and what the processor obtains from the first memory through each of them except the last is address information. Take as an example the case in which the processor first executes the PRFMTRAS instruction and then the PRFMBR instruction. Through the PRFMTRAS instruction, the processor obtains first address information from the first memory and stores it in the PRDR register. As shown in FIG. 7, when executing the PRFMBR instruction, the processor obtains the second prefetch address by performing a computation on the first address information, obtains the corresponding prefetched data from the memory according to the second prefetch address, and stores it in the cache.
PRFMBR的操作数可以为空,即上述第二预取地址等于第一地址信息。PRFMBR的操作数也可以包括地址信息,示例性地,所述地址信息可以是存储单元的标识或固定偏移量,PRFMBR指令通过该地址信息和第一地址信息计算获得第二预取地址,计算方式包括相加。The operand of PRFMBR may be empty, that is, the second prefetch address is equal to the first address information. The operand of PRFMBR may also include address information. Exemplarily, the address information may be an identifier of a storage unit or a fixed offset. The PRFMBR instruction calculates the second prefetch address through the address information and the first address information, and the calculation method includes addition.
进一步地,处理器执行PRFMBR指令的时候,可以按照地址偏移量进行第二预取地址计算、并对相应数据预取,即,PRFMBR的操作数还可以包括该地址偏移量。可选地,该地址偏移量可以是固定值,也可以是可调整的数值,调整方式可参照PRFMTRAS指令。Further, when the processor executes the PRFMBR instruction, the second pre-fetch address calculation can be performed according to the address offset and the corresponding data can be pre-fetched, that is, the operand of the PRFMBR can also include the address offset. Optionally, the address offset can be a fixed value or an adjustable value, and the adjustment method can refer to the PRFMTRAS instruction.
In some possible implementations, the second prefetch instruction may be a PRFMBRTR (PRefetch From Memory Based on Register To Register) instruction, which causes the processor to prefetch the data stored at a third prefetch address into the processor cache according to the content of the PRDR register. In an indirect data prefetch, the processor executes at least two software data prefetch instructions in sequence, and every instruction except the last one obtains address information from the first memory. Take the case in which the processor executes the PRFMTRAS instruction and then the PRFMBRTR instruction: through the PRFMTRAS instruction, the processor obtains second address information from the first memory and stores it in the PRDR register. As shown in FIG. 8, when executing the PRFMBRTR instruction, the processor computes the third prefetch address from the second address information, obtains the corresponding prefetch data from the memory according to that address, and stores the data in both the cache and the PRDR register. The prefetched data may replace the second address information stored in the PRDR register; optionally, the prefetched data may instead be stored in another designated PRDR register, and the second address information is deleted.
The operand of PRFMBRTR may be empty, in which case the third prefetch address equals the second address information. The operand of PRFMBRTR may also include address information, for example an identifier of a storage unit or a fixed offset; the PRFMBRTR instruction then computes the third prefetch address from this address information and the second address information, for example by addition.
Further, when executing the PRFMBRTR instruction, the processor may compute the third prefetch address according to an address offset and prefetch the corresponding data; that is, the operand of PRFMBRTR may also include the address offset. Optionally, the address offset may be a fixed value or an adjustable value, and it may be adjusted in the same way as for the PRFMTRAS instruction.
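The difference between the two register-based instructions can be illustrated with a small software model. This is only a sketch under stated assumptions: memory and cache are modeled as Python dictionaries, the PRDR register as a one-entry dictionary, and the optional operand as an integer that is added to the register content; none of these names come from the application.

```python
def prfmbr(memory, cache, prdr, operand=0):
    """Model of PRFMBR: prefetch into the cache only; the PRDR is left unchanged."""
    address = prdr["value"] + operand  # an empty operand is modeled as 0
    cache[address] = memory[address]
    return address


def prfmbrtr(memory, cache, prdr, operand=0):
    """Model of PRFMBRTR: prefetch into the cache and write the data back to the PRDR."""
    address = prdr["value"] + operand
    data = memory[address]
    cache[address] = data
    prdr["value"] = data  # replaces the address information previously held
    return address


memory = {100: 200, 204: 7}   # hypothetical contents
cache, prdr = {}, {"value": 100}
prfmbrtr(memory, cache, prdr)        # reads address 100, PRDR now holds 200
prfmbr(memory, cache, prdr, 4)       # reads address 200 + 4, PRDR unchanged
```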
As can be seen from the above, each of the three newly introduced prefetch instructions writes or reads the PRDR register and depends on the execution result of the preceding instruction, so the three instructions should be submitted for execution in order. For example, possible execution orders include, among others: PRFMTRAS followed by PRFMBRTR; PRFMTRAS followed by PRFMBR; or PRFMTRAS, then PRFMBRTR at least once, and finally PRFMBR. When an irregular indirect-addressing prefetch begins, the first instruction executed should be PRFMTRAS, which obtains the relevant prefetch data from the memory and stores it in the PRDR register so that the subsequent PRFMBRTR or PRFMBR instruction can be executed.
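The ordering constraint can be expressed as a simple check. The sketch below accepts only the orders listed above; treating a lone PRFMTRAS, or any instruction after PRFMBR, as invalid is an assumption of this sketch, since the text leaves other orders open. The function name is hypothetical.

```python
def is_valid_chain(instructions):
    """Check that a prefetch chain starts with PRFMTRAS and only then uses the PRDR.

    Accepted shape: PRFMTRAS, zero or more PRFMBRTR, then a final PRFMBRTR or PRFMBR.
    """
    if len(instructions) < 2 or instructions[0] != "PRFMTRAS":
        return False
    *middle, last = instructions[1:]
    return all(op == "PRFMBRTR" for op in middle) and last in ("PRFMBRTR", "PRFMBR")


print(is_valid_chain(["PRFMTRAS", "PRFMBRTR", "PRFMBR"]))
print(is_valid_chain(["PRFMBR", "PRFMTRAS"]))
```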
In some possible implementations, executing one of the newly introduced software prefetch instructions may trigger an out-of-bounds or illegal access to the relevant storage device. The processor treats a prefetch instruction that triggers such an access as a no-operation (NOP) instruction and makes no change to the CPU architectural state. Skipping the prefetch avoids risks such as a system lockup. An out-of-bounds access covers at least two cases: (1) data out of bounds, for example beyond the boundary of an array; and (2) access rights exceeded, for example an illegal access or a target memory that lies outside the instruction's access rights.
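A minimal sketch of this behavior follows, assuming a dictionary-based memory model and a set of readable addresses that stands in for the access-rights check; both are illustrative and not taken from the application.

```python
def guarded_prefetch(memory, cache, address, readable):
    """Prefetch memory[address] into the cache, or act as a NOP on an illegal access.

    Returns True if the prefetch happened, False if it was skipped.
    """
    if address not in memory or address not in readable:
        return False  # no exception is raised and no state is changed
    cache[address] = memory[address]
    return True


memory = {0: 1, 8: 2}      # hypothetical memory contents
readable = {0}             # address 8 exists but may not be accessed
cache = {}
guarded_prefetch(memory, cache, 16, readable)  # out of bounds: skipped
guarded_prefetch(memory, cache, 8, readable)   # no access right: skipped
guarded_prefetch(memory, cache, 0, readable)   # legal: prefetched
```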
In some possible implementations, the PRDR register 4 may specifically be a FIFO register.
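As a rough software analogy, a FIFO register of fixed capacity can be modeled with a bounded double-ended queue: once it is full, storing a new entry discards the entry that was stored earliest. The capacity of 3 and the helper name are assumptions made for this sketch.

```python
from collections import deque


def fill_fifo(capacity, values):
    """Store values one by one into a FIFO of the given capacity and return its contents."""
    prdr = deque(maxlen=capacity)  # a full deque drops its oldest entry on append
    for value in values:
        prdr.append(value)
    return list(prdr)


print(fill_fifo(3, [11, 12, 13, 14]))
```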
In some possible implementations, the data prefetching apparatus contains multiple PRDR registers 4 for storing prefetched data. Optionally, the storage capacities of the PRDR registers 4 may be all equal, partly equal, or all different.
Optionally, the PRDR registers 4 may support renaming to resolve write-after-write (WAW) conflicts that may occur when multiple software prefetch instructions are executed. Although the newly introduced software prefetch instructions should be submitted in order, most processors support out-of-order execution, so in a specific implementation unrelated instructions may be inserted during execution of the in-order prefetch instructions; register renaming resolves the resulting WAW conflicts. An unrelated instruction is one that has no data or register dependency on the prefetch instruction to be executed.
For example, the first memory 2 may be the main memory of the electronic device, and the cache 132 may be an in-core cache of the processor 1. An in-core cache is usually built from fast but expensive storage and provides high-speed data access during computation. When the processor 1 reads data, it first looks in the in-core cache; on a hit, the data is read immediately and passed to the processor 1. On a miss, the data is read from main memory at a comparatively large access-time cost and passed to the processor 1, and the data block containing it is brought into the in-core cache so that later reads of the whole block are served from the cache without accessing main memory again.
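The lookup-then-fill behavior described above can be sketched as follows. The block size of 4 addresses and the dictionary-based memory and cache are assumptions chosen for illustration only.

```python
BLOCK_SIZE = 4  # assumed number of addresses per data block


def read(address, memory, cache):
    """Return (data, hit). On a miss, bring the whole containing block into the cache."""
    if address in cache:
        return cache[address], True
    base = address - address % BLOCK_SIZE
    for block_address in range(base, base + BLOCK_SIZE):
        if block_address in memory:
            cache[block_address] = memory[block_address]
    return memory[address], False


memory = {a: a * 10 for a in range(8)}  # hypothetical contents
cache = {}
read(5, memory, cache)  # miss: block 4..7 is filled
read(6, memory, cache)  # hit: served from the cache
```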
For example, the electronic device 5 may be a computer, a tablet, an audio player, a video player, a data processing device, a data computing device, or any other device that involves data processing and data storage.
The electronic device 5 shown in FIG. 4 of the embodiments of the present application can apply prefetch optimization to an example instruction fragment using the software prefetch instructions shown in FIG. 6, FIG. 7, and FIG. 8. The example instruction fragment is shown in Table 1 below:
Table 1
The example instruction fragment shows an array-based irregular indirect-addressing memory access pattern. Register x21 holds the address index of an array, and the array is accessed sequentially. Instruction (1) loads the data returned by the access to the address in x21 into x22. In instruction (2), the processor uses the content of x22 to compute a new address and loads the data into x22, replacing the data previously returned from x21; instruction (2) is a typical irregular indirect access. The data returned by instruction (2) is then used to compute a further memory access address (see instruction (3)). Finally, in instruction (5), the x21 register advances by 8 in address order for the next loop iteration.
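The access pattern just described can be mimicked in software as below. This sketch follows only the prose description of instructions (1), (2), (3), and (5); it does not reproduce the contents of Table 1. The base addresses `base_a` and `base_b`, the dictionary-based memory, and the omission of instruction (4) are assumptions.

```python
def walk(memory, start, count, base_a=0, base_b=0):
    """Follow the two-level indirect access pattern for count loop iterations."""
    loaded = []
    x21 = start
    for _ in range(count):
        x22 = memory[x21]              # (1) sequential load from the index array
        x22 = memory[base_a + x22]     # (2) first-level indirect load, overwrites x22
        value = memory[base_b + x22]   # (3) second-level indirect load
        loaded.append(value)
        x21 += 8                       # (5) advance to the next 8-byte element
    return loaded


memory = {0: 100, 8: 300, 100: 500, 300: 400, 500: "p", 400: "q"}  # hypothetical
print(walk(memory, 0, 2))
```

Only the addresses read in step (1) are sequential; the addresses in steps (2) and (3) depend on loaded data, which is why they are hard to predict.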
Instruction (3) is a second level of nested irregular indirect addressing, and neither current hardware prefetching nor current software prefetching can readily predict its access address and data. The software prefetch instructions shown in FIG. 6, FIG. 7, and FIG. 8 are therefore introduced after instruction (3) to help prefetch the data accessed by the loop body of the example instruction fragment.
The three newly added instructions are all software prefetch instructions. A software prefetch instruction generally consists of an opcode and operands, and the opcode format may differ between architectures. Taking software prefetch instructions under the ARM architecture as an example, the opcode may be composed of three items: operation purpose, destination register, and operation measure. The operation purpose is either PLD or PST, where PLD denotes a prefetch for a load and PST denotes a prefetch for a store. The destination register may be, for example, L1 for the L1 cache, L2 for the L2 cache, and so on. The operation measures include KEEP and STRM, where KEEP is generally used for prefetching into the cache and STRM is generally used for prefetching into memory space that is read or written only once. Combining these character strings yields different operators, and the software prefetch instruction performs different prefetch operations depending on the operator.
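To illustrate how the three items combine into operators, the sketch below enumerates the combinations as concatenated strings such as PLDL1KEEP. Including L3 is an assumption based on "and so on" in the text, and the function name is hypothetical.

```python
from itertools import product


def prefetch_operations(purposes=("PLD", "PST"),
                        targets=("L1", "L2", "L3"),
                        measures=("KEEP", "STRM")):
    """Build every operator by concatenating purpose, target, and measure."""
    return ["".join(parts) for parts in product(purposes, targets, measures)]


for name in prefetch_operations():
    print(name)
```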
The operand of a software prefetch instruction is generally a memory address. Here it can be understood as the prefetch address itself, or as an address from which the processor computes the prefetch address; the data to be prefetched is stored at the prefetch address.
Table 2
As shown in Table 2, the added instruction (4), which corresponds to the PRFMTRAS instruction in FIG. 6, prefetches the data of the sequential access through x21 with a preset address offset of 8 bytes, and the returned data is loaded into both the processor cache and the PRDR register. The data prefetched by this instruction is later used by instruction (1). Furthermore, starting from the preset address offset, an actual hardware implementation may dynamically adjust the actual prefetch offset according to the timeliness of earlier prefetches, for example to an integer multiple of the preset address offset.
The added instruction (5), which corresponds to the PRFMBRTR instruction in FIG. 8, uses the data stored in the PRDR to compute a prefetch address and prefetch data. As described above, under in-order processing, the data prefetched into the PRDR by the preceding PRFMTRAS instruction, that is, the data of the array accessed sequentially through x21, is combined with the data in register x6 and the immediate value 3 to form the second data prefetch address. According to this address, the processor prefetches the second data and loads it into the cache and the PRDR register, replacing the sequentially accessed array data previously stored in the PRDR register. The second data prefetched by the PRFMBRTR instruction is later used by instruction (2).
The added instruction (6), which corresponds to the PRFMBR instruction in FIG. 7, likewise uses the data stored in the PRDR to compute a prefetch address and prefetch data. Specifically, the instruction takes the second data prefetched into the PRDR by the preceding PRFMBRTR instruction and adds the data in register x14 to compute the third data prefetch address. According to this address, the processor prefetches the third data and loads it into the cache. The third data prefetched by the PRFMBR instruction is later used by instruction (3).
Specifically, FIG. 9 is an example diagram of irregular indirect addressing based on the prefetch instructions provided in this embodiment. It illustrates how the processor 1 prefetches irregular indirect-addressing data through the three newly introduced prefetch instructions. The processor 1 includes a cache unit 13 and a prefetch unit 14; the PRDR register is not shown. The processor executes the PRFMTRAS, PRFMBRTR, and PRFMBR instructions in order to prefetch the data in storage units 31, 32, and 35 of the memory 2.
By executing PRFMTRAS, the data in storage units 11, 12, and 13 of the memory 2 is stored in the cache unit 13 and the PRDR register; this data is the first address information.
By executing PRFMBRTR, the first address information stored in the PRDR register is used to compute the addresses of storage units 22, 24, and 26 of the memory 2, and the data in those units is prefetched into the cache unit 13 and the PRDR register. The data in storage units 22, 24, and 26 is the second address information.
By executing PRFMBR, the second address information stored in the PRDR register is used to compute the addresses of storage units 31, 32, and 35 of the memory 2, and the data in those units is prefetched into the cache unit 13.
Thus, with the three newly introduced instructions, the processor 1 can quickly prefetch irregular indirect-addressing data, and CPU performance improves considerably.
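The FIG. 9 walk-through can be modeled in a few lines. The sketch makes several assumptions that are not stated in the text: the storage-unit numbers are used directly as addresses, all operands are empty so that each prefetch address equals the register content, the pairing of units (11 to 22, 12 to 24, 13 to 26, 22 to 31, 24 to 32, 26 to 35) follows the listing order, and the final payload strings are invented.

```python
def prefetch_chain(memory, start_addresses):
    """Run PRFMTRAS, PRFMBRTR, PRFMBR for each start address; return the cache contents."""
    cache = {}
    for first in start_addresses:
        prdr = memory[first]          # PRFMTRAS: data goes to cache and PRDR
        cache[first] = prdr
        second = prdr                 # PRFMBRTR: address taken from the PRDR
        prdr = memory[second]         # data goes to cache and replaces the PRDR content
        cache[second] = prdr
        third = prdr                  # PRFMBR: address taken from the PRDR
        cache[third] = memory[third]  # data goes to the cache only
    return cache


memory = {11: 22, 12: 24, 13: 26,
          22: 31, 24: 32, 26: 35,
          31: "d31", 32: "d32", 35: "d35"}
cache = prefetch_chain(memory, [11, 12, 13])
print(sorted(cache))
```

After the chain completes, all nine storage units are present in the modeled cache, so the later demand loads of the loop body would hit.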
An embodiment of the present application further provides a computer program product including instructions which, when run on a computer, cause the computer to execute the data prefetch instructions of the embodiments shown in FIG. 6, FIG. 7, and FIG. 8 and the data prefetching method of the embodiment shown in FIG. 9.
An embodiment of the present application further provides a computer-readable storage medium including instructions which, when run on a computer, cause the computer to execute the data prefetch instructions of the embodiments shown in FIG. 6, FIG. 7, and FIG. 8 and the data prefetching method of the embodiment shown in FIG. 9.
In another possible design, when the processing system is a chip in a terminal, the chip includes a processing unit and a communication unit. The processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit. The processing unit can execute computer-executable instructions stored in a storage unit so that the chip in the terminal executes the data prefetch instructions of the embodiments shown in FIG. 6, FIG. 7, and FIG. 8 and the data prefetching method of the embodiment shown in FIG. 9. Optionally, the storage unit is a storage unit inside the chip, such as a register or a cache; the storage unit may also be a storage unit of the terminal located outside the chip, such as a read-only memory (ROM), another type of static storage device that can store static information and instructions, or a random access memory (RAM).
The processor mentioned anywhere above may be a general-purpose central processing unit, a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling execution of a program of the above data prefetching method.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or units, and may be electrical, mechanical, or of other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
In the present application, terms such as "first" and "second" are used to distinguish identical or similar items whose roles and functions are substantially the same. It should be understood that "first", "second", and "nth" carry no logical or temporal dependency and do not limit quantity or execution order. It should also be understood that, although the following description uses the terms first, second, and so on to describe various elements, these elements should not be limited by the terms. The terms are used only to distinguish one element from another. For example, without departing from the scope of the various described examples, a first image may be called a second image and, similarly, a second image may be called a first image. The first image and the second image may both be images and, in some cases, may be separate and different images.
In the present application, the term "at least one" means one or more, and the term "multiple" means two or more; for example, multiple second messages means two or more second messages. The terms "system" and "network" are often used interchangeably herein.
It should be understood that the terms used in the description of the various examples herein are only for describing specific examples and are not intended to be limiting. As used in the description of the various examples and in the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used herein refers to and covers any and all possible combinations of one or more of the associated listed items. The term "and/or" describes an association between associated objects and indicates that three relationships are possible; for example, A and/or B may mean A alone, both A and B, or B alone. In addition, the character "/" in the present application generally indicates an "or" relationship between the objects before and after it.
It should also be understood that, in the embodiments of the present application, the magnitude of the sequence number of each process does not imply an order of execution. The execution order of the processes should be determined by their functions and internal logic and should not limit the implementation of the embodiments of the present application in any way.
It should be understood that determining B according to A does not mean determining B according to A alone; B may also be determined according to A and/or other information.
It should also be understood that the term "include" (also "includes", "including", "comprises", and/or "comprising"), when used in this specification, specifies the presence of the stated features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "if" may be interpreted to mean "when" or "upon", or "in response to determining", or "in response to detecting". Similarly, depending on the context, the phrase "if it is determined that ..." or "if [a stated condition or event] is detected" may be interpreted to mean "upon determining that ...", "in response to determining that ...", "upon detecting [the stated condition or event]", or "in response to detecting [the stated condition or event]".
It should be understood that references throughout the specification to "one embodiment", "an embodiment", or "a possible implementation" mean that a specific feature, structure, or characteristic related to the embodiment or implementation is included in at least one embodiment of the present application. Therefore, occurrences of "in one embodiment", "in an embodiment", or "a possible implementation" throughout the specification do not necessarily refer to the same embodiment. In addition, these specific features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
The above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (26)

  1. A data prefetching method, characterized in that the method comprises:
    executing a first instruction in a first instruction set, accessing a storage unit according to an operand of the first instruction, and storing the obtained first data in a cache and a first register, wherein the operand includes a first address;
    executing a second instruction in the first instruction set, determining a second address according to the first data in the first register, accessing the storage unit, and storing the obtained second data in the cache and the first register;
    executing a third instruction in the first instruction set, determining a third address according to the second data in the first register, accessing the storage unit, and storing the obtained third data in the cache;
    wherein the second address corresponds to the second data in the storage unit, and the third address corresponds to the third data in the storage unit.
  2. The data prefetching method according to claim 1, characterized in that executing the first instruction, wherein the operand includes the first address, further comprises:
    executing the first instruction, and obtaining the first data according to the first address and a first offset;
    wherein the first instruction is used to indicate the first offset.
  3. The data prefetching method according to claim 2, characterized in that the operand of the first instruction further includes the first offset, wherein the first offset is adjusted according to prefetch timeliness, and the prefetch timeliness indicates whether the prefetched data is stored in the cache when the prefetched data is read.
  4. The data prefetching method according to any one of claims 1 to 3, characterized in that the operand of the second instruction includes a fourth address, and
    executing the second instruction and determining the second address according to the first data in the first register specifically comprises:
    executing the second instruction, and determining the second address according to the operand of the second instruction and the first data in the first register.
  5. The data prefetching method according to any one of claims 1 to 4, characterized in that the operand of the third instruction includes a fifth address, and
    executing the third instruction and determining the third address according to the second data in the first register specifically comprises:
    executing the third instruction, and determining the third address according to the operand of the third instruction and the second data in the first register.
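
One plausible reading of claims 4 and 5, sketched for illustration: the address operand acts as an array base and the register data acts as an index into that array, as in an access pattern of the form `A[B[C[i]]]`. The base-plus-scaled-index formula, the element size, and all concrete values are assumptions; the claims say only that the address is determined from the operand and the register data.

```python
ELEMENT_SIZE = 4  # assumed element width in bytes


def indirect_address(base_address, register_data, element_size=ELEMENT_SIZE):
    """Combine an address operand with register data as base + index * size."""
    return base_address + register_data * element_size


fourth_address = 0x2000  # hypothetical base of the array indexed by the first data
fifth_address = 0x3000   # hypothetical base of the array indexed by the second data

first_data = 3           # value assumed to be in the first register
second_address = indirect_address(fourth_address, first_data)

second_data = 5          # value assumed to be found at second_address
third_address = indirect_address(fifth_address, second_data)
```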
  6. The data prefetching method according to any one of claims 1 to 5, characterized in that the method further comprises:
    after executing the first instruction, the second instruction or the third instruction and before accessing the storage unit, determining that the corresponding first address, second address or third address is not out of array bounds.
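
The bounds check of claim 6 can be sketched as a guard placed in front of the memory access. This is an illustration under assumptions: the array is described by a hypothetical base address and element count, and an out-of-bounds prefetch is simply skipped, which the claim does not spell out.

```python
def in_bounds(address, array_base, array_length, element_size=4):
    """Return True if the address falls inside the array's byte range."""
    return array_base <= address < array_base + array_length * element_size


def guarded_prefetch(memory, cache, address, array_base, array_length, element_size=4):
    """Access memory only after the address has passed the bounds check."""
    if not in_bounds(address, array_base, array_length, element_size):
        return False  # assumption: an out-of-bounds prefetch is dropped
    cache[address] = memory[address]
    return True


memory = {0x100: 7}
cache = {}
ok_inside = guarded_prefetch(memory, cache, 0x100, array_base=0x100, array_length=4)
ok_outside = guarded_prefetch(memory, cache, 0x110, array_base=0x100, array_length=4)
```

Because a prefetch is only a performance hint, dropping it on a failed check does not change program results.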
  7. The data prefetching method according to any one of claims 4 to 6, characterized in that the operand of the second instruction further includes a second offset, and/or the operand of the third instruction further includes a third offset.
  8. The data prefetching method according to any one of claims 1 to 7, characterized in that:
    when there is no empty storage unit in the first register and fifth data is to be stored, fourth data in the first register is replaced, wherein the fourth data is the data stored earliest in the first register.
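
Claim 8 describes a first-in, first-out replacement policy for the first register. A minimal sketch, assuming the register holds a fixed number of entries; the class name `FirstRegister` and its `capacity` parameter are hypothetical.

```python
from collections import deque


class FirstRegister:
    """Fixed-capacity register file that replaces its oldest entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = deque()  # leftmost entry is the earliest stored

    def store(self, data):
        """Store data; return the replaced entry, or None if a slot was free."""
        replaced = None
        if len(self.entries) == self.capacity:
            replaced = self.entries.popleft()  # the earliest-stored data
        self.entries.append(data)
        return replaced


register = FirstRegister(capacity=2)
register.store("first")
register.store("second")
replaced = register.store("third")  # register is full, so "first" is replaced
```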
  9. A data prefetching apparatus, characterized in that the data prefetching apparatus comprises a prefetch unit and a register unit, wherein the prefetch unit is electrically coupled to a storage unit and to a cache unit;
    the prefetch unit is configured to execute at least three instructions in a first instruction set, and the register unit is configured to store data obtained by the prefetch unit through the at least three instructions,
    wherein the prefetch unit is specifically configured to:
    execute a first instruction to obtain data in the storage unit and store the data in the register unit and the cache unit, wherein the operand of the first instruction is a first address;
    execute, according to the data in the register unit, a second instruction to obtain data in the storage unit and store the data in the register unit and the cache unit;
    execute, according to the data in the register unit, a third instruction to obtain data in the storage unit and store the data in the cache unit.
  10. The data prefetching apparatus according to claim 9, characterized in that the prefetch unit being specifically configured to execute the first instruction to obtain data in the storage unit further comprises:
    the prefetch unit being specifically configured to obtain data in the storage unit according to the first instruction, the first address and a first offset;
    wherein the first instruction is used to indicate the first offset.
  11. The data prefetching apparatus according to claim 10, characterized in that the operand of the first instruction further includes the first offset, wherein the first offset is adjusted according to prefetch timeliness, and the prefetch timeliness indicates whether the prefetched data is stored in the cache unit when the data prefetching apparatus retrieves the prefetched data.
  12. The data prefetching apparatus according to any one of claims 9 to 11, characterized in that the operand of the second instruction includes a second address, and the prefetch unit being specifically configured to execute, according to the data in the register unit, the second instruction to obtain data in the storage unit further comprises:
    executing the second instruction to obtain data in the storage unit according to the data in the register unit and the second address.
  13. The data prefetching apparatus according to any one of claims 9 to 12, characterized in that the operand of the second instruction includes a third address, and the prefetch unit being specifically configured to execute, according to the data in the register unit, the third instruction to obtain data in the storage unit further comprises:
    executing the third instruction to obtain data in the storage unit according to the data in the register unit and the third address.
  14. The data prefetching apparatus according to any one of claims 9 to 13, characterized in that the prefetch unit is further configured to:
    after executing the first instruction, the second instruction or the third instruction and before accessing the storage unit, determine that the corresponding first address, second address or third address is not out of array bounds.
  15. The data prefetching apparatus according to any one of claims 9 to 14, characterized in that the operand of the second instruction includes a second offset, and the operand of the third instruction includes a third offset;
    the prefetch unit being specifically configured to execute, according to the data in the register unit, the second instruction to obtain data in the storage unit further comprises:
    executing the second instruction to obtain data in the storage unit according to the data in the register unit and the second offset;
    the prefetch unit being specifically configured to execute, according to the data in the register unit, the third instruction to obtain data in the storage unit further comprises:
    executing the third instruction to obtain data in the storage unit according to the data in the register unit and the third offset.
  16. The data prefetching apparatus according to any one of claims 9 to 15, characterized in that:
    when there is no empty storage unit in the first register and fifth data is to be stored, fourth data in the first register is replaced, wherein the fourth data is the data stored earliest in the first register.
  17. A data prefetching apparatus, characterized in that the data prefetching apparatus comprises: a processor, a first register, a first memory and a cache;
    the first memory is configured to store data, wherein the data includes data to be prefetched;
    the processor is configured to:
    obtain first data from the first memory according to a first instruction in the first instruction set and a first address, and store the first data in the first register and the cache;
    determine the second address according to the first data in the first register and a second instruction in the first instruction set, obtain second data corresponding to the second address from the first memory, and store the second data in the first register and the cache;
    determine the prefetch address according to the second data in the first register and a third instruction in the first instruction set, and store the data to be prefetched that corresponds to the prefetch address in the first memory into the cache;
    wherein the first register and the second memory are configured to store first address information obtained by the processor according to at least one instruction in the first instruction set.
  18. The data prefetching apparatus according to claim 17, characterized in that the processor being configured to obtain the first data from the first memory according to the first instruction in the first instruction set and the first address further comprises:
    the processor being configured to obtain the first data according to the first instruction, the first address and a first offset;
    wherein the first instruction is used to indicate the first offset.
  19. The data prefetching apparatus according to claim 18, characterized in that the operand of the first instruction includes the first offset, wherein the first offset is adjusted according to prefetch timeliness, and the prefetch timeliness indicates whether the prefetched data is stored in the cache when the processor retrieves the prefetched data.
  20. The data prefetching apparatus according to any one of claims 17 to 19, characterized in that the processor being configured to determine the second address according to the first data in the first register and the second instruction in the first instruction set further comprises:
    the operand of the second instruction includes a third address, and the second address is determined according to the first data and the third address.
  21. The data prefetching apparatus according to any one of claims 17 to 20, characterized in that the processor being configured to determine the prefetch address according to the second data in the first register and the third instruction in the first instruction set further comprises:
    the operand of the third instruction includes a fourth address, and the prefetch address is determined according to the second data and the fourth address.
  22. The data prefetching apparatus according to any one of claims 17 to 21, characterized in that the method further comprises:
    after executing the first instruction, the second instruction or the third instruction and before accessing the storage unit, determining that the corresponding first address, second address or third address is not out of array bounds.
  23. The data prefetching apparatus according to any one of claims 17 to 22, characterized in that the operand of the second instruction includes a second offset, and the operand of the third instruction includes a third offset;
    the processor being configured to determine the second address according to the first data in the first register and the second instruction in the first instruction set further comprises:
    determining the second address according to the first data, the second instruction and the second offset;
    the processor being configured to determine the prefetch address according to the second data in the first register and the third instruction in the first instruction set further comprises:
    determining the prefetch address according to the second data, the third instruction and the third offset.
  24. The data prefetching apparatus according to any one of claims 17 to 23, characterized in that:
    when there is no empty storage unit in the first register and fifth data is to be stored, fourth data in the first register is replaced, wherein the fourth data is the data stored earliest in the first register.
  25. A processing system, characterized in that it comprises:
    a memory, configured to store a program;
    a processor, configured to execute the program stored in the memory, wherein when the program is executed, the processor is configured to perform the method according to any one of claims 1 to 8.
  26. A computer-readable storage medium, characterized in that it comprises instructions which, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 8.
PCT/CN2023/120191 2022-12-30 2023-09-20 Data prefetching method and data prefetching apparatus WO2024139445A1 (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211719527.0 2022-12-30

Publications (1)

Publication Number Publication Date
WO2024139445A1 (en) 2024-07-04

Similar Documents

Publication Publication Date Title
US6957305B2 (en) Data streaming mechanism in a microprocessor
US7434030B2 (en) Processor system having accelerator of Java-type of programming language
EP1353267B1 (en) Microprocessor with repeat prefetch instruction
JP4027620B2 (en) Branch prediction apparatus, processor, and branch prediction method
US6151662A (en) Data transaction typing for improved caching and prefetching characteristics
EP0381471B1 (en) Method and apparatus for preprocessing multiple instructions in a pipeline processor
US8364902B2 (en) Microprocessor with repeat prefetch indirect instruction
US5829028A (en) Data cache configured to store data in a use-once manner
US20200364054A1 (en) Processor subroutine cache
US5727227A (en) Interrupt coprocessor configured to process interrupts in a computer system
KR20010075258A (en) Method for calculating indirect branch targets
US5919256A (en) Operand cache addressed by the instruction address for reducing latency of read instruction
US5930820A (en) Data cache and method using a stack memory for storing stack data separate from cache line storage
KR100638935B1 (en) Processor with memory and data prefetch unit
US20150227373A1 (en) Stop bits and predication for enhanced instruction stream control
JP3837289B2 (en) Microprocessor containing multiple register files occupying the same logical space
US20080184010A1 (en) Method and apparatus for controlling instruction cache prefetch
JP2009524167A5 (en)
KR100777753B1 (en) Data processing using a coprocessor
JP3973129B2 (en) Cache memory device and central processing unit using the same
US20040172518A1 (en) Information processing unit and information processing method
WO2024139445A1 (en) Data prefetching method and data prefetching apparatus
WO2023185993A1 (en) Systems and methods for load-dependent-branch pre-resolution
CN112559389A (en) Storage control device, processing device, computer system, and storage control method
CN114528025B (en) Instruction processing method and device, microcontroller and readable storage medium