WO2023237084A1 - Data prefetching method, compiling method and related apparatus - Google Patents

Data prefetching method, compiling method and related apparatus

Info

Publication number
WO2023237084A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
metadata
address
chained
instruction
Application number
PCT/CN2023/099303
Other languages
French (fr)
Chinese (zh)
Inventor
勾玥
孙文博
刘盈盈
Original Assignee
Huawei Technologies Co., Ltd.
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2023237084A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 - Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 - Addressing or allocation; Relocation
    • G06F12/08 - Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0862 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, with prefetch

Definitions

  • the present application relates to the field of computer technology, and in particular, to a data prefetching method, compilation method and related devices.
  • In computer systems, different storage devices usually have different access speeds. Computers with multi-level storage systems usually use data prefetching technology to improve access performance: the computer predicts the data to be accessed and loads the predicted data in advance from a storage device with a slower access speed to one with a faster access speed, for example from the memory to the cache.
  • Existing data prefetching technologies usually prefetch the data to be accessed based on historical memory access information. For example, when the computer detects from historical memory access information that the program accesses data in an address-incrementing manner, it prefetches the data to be accessed based on the currently accessed data and the same address-incrementing pattern.
  • current data prefetching technology can only achieve effective prefetching for data whose storage addresses have certain patterns, such as data with continuous storage addresses or data with storage addresses that increase by a certain value.
  • For data in a chained data structure with irregular storage addresses, however, current data prefetching technology can hardly achieve effective prefetching.
  • A chained data structure usually consists of multiple dispersedly stored data, and each data that includes a pointer points to the address where the next data is stored.
  • When current data prefetching technology prefetches data in a chained data structure, it is often difficult to determine the amount of data to prefetch: prefetching too much data at one time causes cache pollution, while prefetching too little fails to improve data access performance.
  • This application provides a data prefetching method that can effectively prefetch data in a chained data structure.
  • a first aspect of the present application provides a data prefetching method, the method is applied to a first instance of a computer system, and the computer system further includes a second instance.
  • The data prefetching method includes: the first instance obtains a prefetch instruction in an executable file, where the prefetch instruction is used to indicate the address of a data access instruction and at least one metadata, and the data access instruction is used to indicate the address of a chained data structure.
  • the chained data structure includes multiple data with discontinuous addresses.
  • the at least one metadata is used to indicate the address of the data in the chained data structure.
  • the address of the chained data structure can refer to the address of any data in the chained data structure.
  • The address of the chained data structure can be determined based on the address of the first data that needs to be accessed in the chained data structure. For example, if the first data that needs to be accessed is the first data in the chained data structure, then the data access instruction indicates the address of the first data in the chained data structure.
  • the address of the chained data structure may indicate a specific single address, such as the starting address where certain data is stored; the address of the chained data structure may also indicate an address segment, such as the address segment where certain data is stored.
  • the first instance obtains the address of the chained data structure according to the address of the data access instruction. Furthermore, the first instance prefetches data in the chained data structure according to the address of the chained data structure and the at least one metadata.
  • the second instance executes the data access instructions to access data in the chained data structure.
  • The first instance controls, according to the number of times the second instance executes the data access instruction, the progress of prefetching the data in the chained data structure; the progress is used to ensure that the data in the chained data structure is prefetched into the cache before being accessed.
  • When the running device executes the executable file, it can determine the data access instruction and the at least one metadata based on the prefetch instruction in the executable file, thereby realizing prefetching of data in the chained data structure. Moreover, after obtaining the address of the data access instruction from the prefetch instruction, the running device can learn the data access progress in the chained data structure from the number of times the data access instruction has been executed, and thereby control the progress of prefetching the data in the chained data structure, that is, adaptively adjust the amount of prefetched data to ensure effective prefetching of the data in the chained data structure.
  • The data prefetched by the first instance may not include pointers to other data; or, among all the data prefetched by the first instance, part of the data may include pointers to other data while the other part does not.
  • When the second instance executes the data access instruction, the data accessed by the second instance may not include pointers to other data; or, among all the data accessed by the second instance, part of the data may include pointers to other data while the other part does not.
  • the difference between the amount of prefetched data and the amount of accessed data is within a preset range.
  • For example, the first instance can control the number of prefetched data in the chained data structure to always be 5-10 more than the number of data actually accessed, which ensures the timeliness of prefetching while avoiding polluting the cache with too much prefetched data.
  • The first instance can dynamically adjust the above-mentioned preset range according to the data access progress and the available cache space of the running device, to ensure a balance between the amount of prefetched data and the available cache space.
  • The data in the chained data structure includes pointers pointing to the addresses of other data in the chained data structure, and the at least one metadata respectively corresponds to different data in the chained data structure.
  • the at least one metadata is used to indicate the position of the pointer in the corresponding data.
  • That the first instance prefetches data in the chained data structure based on the address of the chained data structure and the at least one metadata includes: the first instance prefetches data in the chained data structure based on the address of the chained data structure; the first instance obtains the pointer in the prefetched data according to the prefetched data and the metadata corresponding to the prefetched data; and the first instance prefetches, from the chained data structure, the other data pointed to by the prefetched data according to the address pointed to by the pointer in the prefetched data.
  • the at least one metadata is also used to indicate the size of other data pointed to by the corresponding data. That is to say, for a certain metadata, the metadata is also used to indicate the size of other data pointed to by the pointer in the corresponding data. For example, assume that metadata 1 corresponds to data 1, and the pointer in data 1 points to data 2; then, metadata 1 is also used to indicate the size of data 2 pointed to by the pointer in data 1.
  • each metadata in the at least one metadata is also used to indicate the type of the corresponding data and the type of other data pointed to by the corresponding data.
  • the prefetch instruction is specifically used to indicate an address offset between the prefetch instruction and the data access instruction.
  • The prefetch instruction is specifically used to indicate the address of the at least one metadata; the method further includes: the first instance obtains the at least one metadata according to the address of the at least one metadata.
  • a second aspect of this application provides a compilation method, including: a compiler obtains a first code; wherein the first code may refer to a program source code, such as a code based on high-level languages such as java, c, c++, python, etc.
  • When recognizing that the first code contains code requesting access to a chained data structure, the compiler generates a data access instruction and at least one metadata according to the chained data structure, where the chained data structure includes a plurality of data with discontinuous addresses, the at least one metadata is used to indicate the addresses of the data in the chained data structure, and the data access instruction is used to indicate the address of the chained data structure and to request access to the chained data structure.
  • The compiler generates a prefetch instruction according to the at least one metadata and the data access instruction to obtain the compiled second code, where the prefetch instruction is used to indicate the address of the data access instruction and the at least one metadata.
  • In this way, the data access instruction and at least one metadata used to indicate the addresses of the data to be accessed in the chained data structure are generated, and a prefetch instruction is inserted before the data access instruction to indicate the address of the data access instruction and the at least one metadata.
  • When the running device executes the compiled executable file, it can determine the data access instruction and the at least one metadata based on the prefetch instruction, thereby realizing prefetching of data in the chained data structure. Moreover, after the running device obtains the address of the data access instruction from the prefetch instruction, it can learn the data access progress in the chained data structure from the number of times the data access instruction has been executed, thereby adaptively adjusting the amount of prefetched data to ensure effective prefetching of the data in the chained data structure.
  • the data in the chained data structure includes pointers pointing to addresses of other data
  • the at least one metadata respectively corresponds to different data in the chained data structure
  • the at least one metadata is used to indicate the position of the pointer in the corresponding data.
  • the at least one metadata is also used to indicate the size of other data pointed to by the corresponding data.
  • each metadata in the at least one metadata is also used to indicate the type of the corresponding data and the type of other data pointed to by the corresponding data.
  • the prefetch instruction is specifically used to indicate an address offset between the prefetch instruction and the data access instruction.
  • the prefetch instruction is specifically used to indicate the address of the at least one metadata.
  • the prefetch instruction is used to indicate the starting address of the at least one metadata and the quantity of the at least one metadata, and the sizes of the at least one metadata are the same.
  • the at least one metadata is located in a code segment or data segment in the second code, and the second code is compiled based on the first code.
  • the third aspect of this application provides a data prefetching device, including:
  • an acquisition unit, configured to acquire a prefetch instruction, where the prefetch instruction is used to indicate the address of a data access instruction and at least one metadata, the data access instruction is used to indicate the address of a chained data structure, the chained data structure includes multiple data with discontinuous addresses, and the at least one metadata is used to indicate the addresses of the data in the chained data structure;
  • the acquisition unit is also configured to acquire the address of the chained data structure according to the address of the data access instruction
  • a prefetch unit configured to prefetch data in the chained data structure according to the address of the chained data structure and the at least one metadata
  • An execution unit used to execute the data access instructions to access the data in the chained data structure
  • the first instance controls, according to the number of times the second instance executes the data access instruction, the progress of prefetching the data in the chained data structure, which is used to ensure that the data in the chained data structure is prefetched into the cache before being accessed.
  • the difference between the amount of prefetched data and the amount of accessed data is within a preset range.
  • the data in the chained data structure includes pointers pointing to the addresses of other data in the chained data structure, and the at least one metadata respectively corresponds to different data in the chained data structure;
  • the at least one metadata is used to indicate the position of the pointer in the corresponding data;
  • the prefetch unit is specifically configured to: prefetch data in the chained data structure according to the address of the chained data structure; obtain the pointer in the prefetched data according to the prefetched data in the chained data structure and the metadata corresponding to the prefetched data; and prefetch, from the chained data structure, the other data pointed to by the prefetched data according to the address pointed to by the pointer in the prefetched data.
  • the at least one metadata is also used to indicate the size of other data pointed to by the corresponding data.
  • each metadata in the at least one metadata is also used to indicate the type of the corresponding data and the type of other data pointed to by the corresponding data.
  • the prefetch instruction is specifically used to indicate the address offset between the prefetch instruction and the data access instruction.
  • the prefetch instruction is specifically used to indicate the address of the at least one metadata
  • the acquisition unit is further configured to: acquire the at least one metadata according to the address of the at least one metadata.
  • the at least one metadata has the same size, and the prefetch instruction is used to indicate the starting address of the at least one metadata and the quantity of the at least one metadata;
  • the acquisition unit is further configured to: acquire the at least one metadata starting from a starting address of the at least one metadata according to the quantity and size of the at least one metadata.
  • a fourth aspect of this application provides a compilation device, including:
  • a processing unit, configured to generate a data access instruction and at least one metadata according to the chained data structure when it is recognized that the first code contains code requesting access to the chained data structure, where the chained data structure includes multiple data with discontinuous addresses, the at least one metadata is used to indicate the addresses of the data in the chained data structure, and the data access instruction is used to indicate the address of the chained data structure and to request access to the chained data structure;
  • the processing unit is further configured to generate a prefetch instruction according to the at least one metadata and the data access instruction to obtain the compiled second code, where the prefetch instruction is used to indicate the address of the data access instruction and the at least one metadata.
  • the data in the chained data structure includes pointers pointing to addresses of other data
  • the at least one metadata respectively corresponds to different data in the chained data structure
  • the at least one metadata is used to indicate the position of the pointer in the corresponding data.
  • the at least one metadata is also used to indicate the size of other data pointed to by the corresponding data.
  • each metadata in the at least one metadata is also used to indicate the type of the corresponding data and the type of other data pointed to by the corresponding data.
  • the prefetch instruction is specifically used to indicate an address offset between the prefetch instruction and the data access instruction.
  • the prefetch instruction is specifically used to indicate the address of the at least one metadata.
  • the prefetch instruction is used to indicate the starting address of the at least one metadata and the quantity of the at least one metadata, and the sizes of the at least one metadata are the same.
  • the at least one metadata is located in a code segment or data segment in the second code, and the second code is compiled based on the first code.
  • a fifth aspect of the present application provides an electronic device.
  • The electronic device includes a memory and a processor; the memory stores code, the processor is configured to execute the code, and when the code is executed, the electronic device performs the method in any one of the implementations of the first aspect or the second aspect.
  • a sixth aspect of the present application provides a computer-readable storage medium.
  • A computer program is stored in the computer-readable storage medium; when the computer program is run on a computer, it causes the computer to execute the method in any one of the implementations of the first aspect or the second aspect.
  • a seventh aspect of the present application provides a computer program product that, when run on a computer, causes the computer to execute the method implemented in any one of the first aspect or the second aspect.
  • An eighth aspect of this application provides a chip including one or more processors. Some or all of the processors are used to read and execute a computer program stored in a memory, so as to perform the method in any possible implementation of any of the above aspects.
  • Optionally, the chip further includes a memory, and the processor is connected to the memory through circuits or wires.
  • the chip also includes a communication interface, and the processor is connected to the communication interface.
  • the communication interface is used to receive data and/or information that needs to be processed.
  • the processor obtains the data and/or information from the communication interface, processes the data and/or information, and outputs the processing results through the communication interface.
  • the communication interface may be an input-output interface.
  • Figure 1 is a schematic diagram of a chained data structure provided by an embodiment of the present application.
  • Figure 2 is a schematic diagram of multiple different chained data structures provided by embodiments of the present application.
  • Figure 3 is a schematic diagram of a running device executing an application program provided by an embodiment of the present application
  • Figure 4 is a schematic flowchart of a compilation method provided by an embodiment of the present application.
  • Figure 5 is a schematic diagram of the correspondence between data and metadata in a chained data structure provided by an embodiment of the present application
  • Figure 6 is a schematic diagram of the types of data in a chained data structure according to an embodiment of the present application.
  • Figure 7 is a schematic flow chart of a data prefetching method provided by an embodiment of the present application.
  • Figure 8 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • Figure 9 is a schematic flowchart of a compilation method provided by an embodiment of the present application.
  • Figure 10A is a schematic diagram of compiling a verification program based on an existing compiler provided by an embodiment of the present application
  • Figure 10B is a schematic diagram of a verification program compiled by a compiler with the newly added optimization pass provided by an embodiment of the present application;
  • Figure 11 is a schematic flow chart of a data prefetching method provided by an embodiment of the present application.
  • Figure 12 is a schematic structural diagram of a compilation device 1200 provided by an embodiment of the present application.
  • Figure 13 is a schematic structural diagram of a data prefetching device 1300 provided by an embodiment of the present application.
  • Figure 14 is a schematic structural diagram of a computer-readable storage medium provided by an embodiment of the present application.
  • The chained data structure includes multiple data with discontinuous addresses, and the multiple data have address pointing relationships with each other, that is, the previous data in the chained data structure points to the address of the next data.
  • Figure 1 is a schematic diagram of a chained data structure provided by an embodiment of the present application.
  • Each data in the chained data structure includes two parts: one is the valid data part, and the other is the pointer part, which points to the address of the next data linked to the current data.
  • the chained data structure uses pointers to reflect the logical relationship between data elements. In this way, in the process of accessing the chained data structure, the access is usually performed from front to back, that is, the previous data is accessed first, and then the next data can be accessed based on the address indicated by the previous data.
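  • For illustration only, the following sketch (hypothetical C++ type and field names, not taken from this application) shows such a structure and the pointer-chasing traversal it forces: each data is allocated at an arbitrary address, and the address of the next data is only known after the pointer part of the current data has been read.

```cpp
#include <cstdint>

// Hypothetical node layout: a "valid data" part followed by a pointer part
// that stores the address of the next, non-contiguously allocated data.
struct Node {
    std::int64_t payload;  // valid data part
    Node*        next;     // pointer part: address of the next data in the chain
};

// Pointer-chasing traversal: each load depends on the previous one, so the
// addresses touched follow no arithmetic pattern a prefetcher could guess.
std::int64_t sum_list(const Node* head) {
    std::int64_t sum = 0;
    for (const Node* cur = head; cur != nullptr; cur = cur->next) {
        sum += cur->payload;  // use the valid data part
    }
    return sum;
}
```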
  • the structural forms of linked data structures mainly include one-way linked list, doubly linked list, circular linked list, spine-rib (Backbone-rib) linked list, binary tree structure and structure array structure.
  • The structure array structure refers to storing an array of structures in continuous memory, where the structures contain pointers.
  • Figure 2 is a schematic diagram of multiple different chained data structures provided by embodiments of the present application. (1) in Figure 2 shows a Backbone-rib linked list, (2) in Figure 2 shows a binary tree structure, and (3) in Figure 2 shows a structure array structure.
  • chained data structures are mainly composed of dynamically connected data, usually in the form of trees/graphs/linked lists.
  • Chained data structures are widely used in general computing, high performance computing (HPC), databases and artificial intelligence. They are also an important data structure underlying the containers provided by object-oriented programming languages such as C++/Java.
  • the chained data structure can make full use of computer memory space and achieve flexible dynamic memory management.
  • The disadvantage of the chained data structure is that there is no spatial locality between data. Reading a chained data structure is therefore mostly a typical irregular memory access, which easily causes memory access delays, limits central processing unit (CPU) performance, and appears as a performance bottleneck in different application scenarios.
  • Memory access latency is the delay caused by waiting for access to data stored in system memory to complete.
  • Compilation refers to the process of using a compiler to generate object code from a source program written in a source language.
  • Object code is a language between high-level language and machine language.
  • the object code can be further converted into executable binary machine code.
  • compilation is the conversion of a source program written in a high-level language into an object code that is closer to machine language. Since the computer only recognizes 1 and 0, compilation actually means turning the high-level language that people are familiar with into a binary language that the computer can recognize.
  • the compiler's process of translating a source program into a target program is divided into five stages: lexical analysis; syntax analysis; semantic checking and intermediate code generation; code optimization; and target code generation.
  • Intermediate code is an internal representation of the source program, which can also be called the intermediate representation (IR).
  • the function of the intermediate representation is to make the structure of the compiled program logically simpler and clearer, especially to make the optimization of the target code easier to implement.
  • the complexity of the intermediate representation is somewhere between source programming language and machine language.
  • Code optimization refers to performing various equivalent transformations on the program so that more effective target code can be generated based on the transformed program.
  • the so-called equivalence means that the running results of the program are not changed.
  • the so-called effective mainly refers to the short running time of the target code and the small storage space occupied. This transformation is called optimization.
  • Optimization passes are an important part of the compilation framework; an optimization pass analyzes and modifies the intermediate representation. During code optimization, the intermediate representation is analyzed and modified by multiple optimization passes, and each pass completes specific optimization work.
  • Metadata, also known as intermediary data or relay data, is data that describes data (data about data). Metadata mainly describes data attributes (properties) and is used to support functions such as indicating storage location, historical data, resource search, and file recording. Specifically, metadata is a kind of electronic catalogue: to achieve the purpose of cataloguing, it must describe and collect the content or characteristics of the data, thereby assisting data retrieval.
  • an application program is composed of program segments such as program code segments, data segments, and read-only data segments.
  • the program code segments are composed of consecutive instructions.
  • the operating system loads the program segments of the application program into the memory, and then the running device sequentially executes the instructions in the program code segments based on a certain order, thereby realizing the execution of the application program.
  • A running device usually includes a control unit, a storage unit and an arithmetic unit.
  • the control unit includes an instruction counter and an instruction register.
  • the instruction counter is used to store the address of the next instruction to be executed in the memory
  • the instruction register is used to store the instruction to be executed.
  • the storage unit usually includes multiple registers, such as general-purpose registers, floating-point registers, etc.
  • the registers in the storage unit are usually used to store data needed during the execution of instructions.
  • the computing unit is used to process data according to the currently executed instructions.
  • The operating principle of the running device is as follows: under the action of the timing pulse, the control unit sends the instruction address pointed to by the instruction counter (that is, the address of the instruction in the memory) to the address bus (not shown in Figure 3), and the running device then reads the instruction at this address into the instruction register for decoding. For the data needed during instruction execution, the running device sends the corresponding data address to the address bus and, based on that address, reads the data into its internal temporary storage unit. Finally, the arithmetic unit in the running device processes the data according to the currently executed instruction. In general, the running device fetches instructions and the corresponding data from the memory one by one, and operates on the data according to the operation codes in the instructions until the program has been executed.
  • The working process of the running device can be divided into five stages: instruction fetch, instruction decode, instruction execution, memory access, and result write-back.
  • the instruction fetch phase is the process of fetching an instruction from memory to the instruction register.
  • the value in the instruction counter is used to indicate the location of the next instruction to be executed in the memory.
  • the value in the instruction counter is automatically incremented according to the length of the instruction.
  • After fetching the instruction, the running device immediately enters the instruction decoding stage.
  • the instruction decoder splits and interprets the retrieved instructions according to the predetermined instruction format, and identifies and distinguishes different instruction categories and various methods of obtaining operands.
  • After the instruction fetch and instruction decoding stages, the running device enters the instruction execution stage.
  • The task of the instruction execution stage is to complete the various operations specified by the instruction and thus realize its function. To do so, different parts of the running device are connected to perform the required operations. For example, if an addition operation is required, the arithmetic logic unit in the arithmetic unit is connected to a set of inputs and a set of outputs; the inputs provide the values to be added, and the outputs hold the final operation result.
  • the running device may need to access memory to read the operands.
  • Next, the running device enters the memory access and data fetch stage.
  • The task of the memory access stage is: the running device obtains the address of the operand in memory according to the instruction's address code, and reads the operand from memory for the operation.
  • the result write-back stage "writes back" the running result data of the execution instruction stage into a certain storage structure.
  • The result data is usually written to an internal register of the running device so that it can be quickly accessed by subsequent instructions; in some cases, the result data can also be written to memory, which is slower but cheaper and has a larger capacity.
  • After the instruction has been executed and the result data written back, the running device obtains the address of the next instruction from the instruction counter and starts a new cycle; the next instruction is fetched in the next instruction cycle.
  • The running device usually needs to execute the memory access stage when processing each memory access instruction, and only after that stage has been executed can it perform computational processing on the data obtained from the memory.
  • the running device needs to wait for data to be fetched from the memory to the cache every time it processes a memory access instruction, resulting in a huge memory access delay.
  • Prefetching technology mainly includes software prefetching technology (SoftWare Prefetch, SWP) and hardware prefetching technology (HardWare Prefetch, HWP).
  • Software prefetching technology refers to explicitly inserting prefetch instructions into the program to allow the running device to read the data at the specified address from the memory into the cache (Cache).
  • Prefetch instructions can be added automatically by the compiler or manually by the programmer.
  • Software prefetching has almost no hardware requirements; its biggest technical challenge is how to correctly add prefetch instructions to the target code. Chained data structures are difficult to optimize through software prefetching because the address calculation overhead of prefetching a chained data structure is very high, which easily causes the problem of insufficient prefetch advance.
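  • As a sketch of explicit software prefetching (using the GCC/Clang __builtin_prefetch builtin; the loop and the prefetch distance of 16 elements are assumptions for illustration), a prefetch can be inserted a fixed distance ahead of the use when the future address is computable, which is exactly what a chained data structure does not allow:

```cpp
#include <cstddef>

// Array case: the future address is computable, so a prefetch can be issued a
// fixed distance ahead of the use (a distance of 16 elements is an assumption).
void scale(double* a, std::size_t n, double k) {
    constexpr std::size_t kDistance = 16;
    for (std::size_t i = 0; i < n; ++i) {
        if (i + kDistance < n) {
            __builtin_prefetch(&a[i + kDistance]);  // GCC/Clang prefetch builtin
        }
        a[i] *= k;
    }
}
```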
  • Hardware prefetching technology uses hardware to prefetch possible future memory access units into the cache based on historical memory access information.
  • Typical hardware prefetchers include stream prefetchers and stride prefetchers.
  • The role of the stream prefetcher is as follows: when it detects that the program accesses data at increasing addresses, it automatically prefetches the data of the next cache line (cacheline).
  • The stride prefetcher monitors each memory load instruction (load); when regular strided reads are detected, the prefetcher precalculates the next address and initiates a prefetch.
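  • The behavior of a stride prefetcher can be modelled roughly as follows (a simplified software model with assumed structure and field names, not a description of any particular hardware): the last address seen by a load instruction is recorded, and once the same stride has been observed repeatedly, the next address is precalculated and prefetched.

```cpp
#include <cstdint>

// Simplified per-load-instruction stride detector (a software model only).
struct StrideEntry {
    std::uint64_t last_addr = 0;
    std::int64_t  stride    = 0;
    int           hits      = 0;  // consecutive observations of the same stride
};

// Called on every execution of the monitored load; returns the address to
// prefetch, or 0 when there is no confident prediction yet.
std::uint64_t on_load(StrideEntry& e, std::uint64_t addr) {
    const std::int64_t new_stride = static_cast<std::int64_t>(addr - e.last_addr);
    if (new_stride != 0 && new_stride == e.stride) {
        ++e.hits;
    } else {
        e.stride = new_stride;
        e.hits   = 0;
    }
    e.last_addr = addr;
    return (e.hits >= 2) ? addr + e.stride : 0;  // next predicted address
}
```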
  • Most existing hardware prefetching technologies in the industry are based on the assumptions of temporal locality and spatial locality. Linked-list data structures, however, are very unfriendly to the current CPU memory access architecture, which is why current commercial CPUs perform unsatisfactorily in such applications: it is difficult to prefetch complex, irregular memory accesses.
  • embodiments of the present application provide a compilation method and a data prefetching method.
  • During compilation, when a behavior of accessing a chained data structure is recognized in the code, a data access instruction and at least one metadata used to indicate the addresses of the data to be accessed in the chained data structure are generated, and a prefetch instruction is inserted before the data access instruction to indicate the address of the data access instruction and the at least one metadata.
  • In this way, when the running device executes the compiled executable file, it can determine the data access instruction and the at least one metadata based on the prefetch instruction, thereby realizing prefetching of data in the chained data structure. Moreover, after the running device obtains the address of the data access instruction from the prefetch instruction, it can learn the data access progress in the chained data structure from the number of times the data access instruction has been executed, thereby adaptively adjusting the amount of prefetched data to ensure effective prefetching of the data in the chained data structure.
  • the compilation method provided by the embodiment of the present application can be applied to compile codes with chained data structure access behavior, such as code compilation in fields such as general computing, high-performance computing, databases, and artificial intelligence.
  • the data prefetching method provided by the embodiment of the present application can be applied to scenarios that need to execute applications that require access to chained data structures.
  • the compilation method and data prefetching method provided by the embodiments of the present application can be applied to electronic devices.
  • The electronic device provided by the embodiments of the present application can be, for example, a server, a smart phone (mobile phone), a personal computer (PC), a notebook computer, a tablet computer, a smart TV, a mobile internet device (MID), a wearable device, a virtual reality (VR) device, an augmented reality (AR) device, a wireless terminal in industrial control, a wireless terminal in self-driving, a wireless terminal in remote surgery (remote medical surgery), a wireless terminal in a smart grid, a wireless terminal in transportation safety, a wireless terminal in a smart city, a wireless terminal in a smart home, etc.
  • Figure 4 is a schematic flowchart of a compilation method provided by an embodiment of the present application. As shown in Figure 4, the compilation method includes the following steps 401-403.
  • Step 401 Obtain the first code.
  • the first code may refer to program source code.
  • program source code refers to an uncompiled text file written in accordance with certain programming language specifications. It is a series of human-readable computer language instructions.
  • the program source code may be code written based on high-level languages such as java, c, c++, python, etc.
  • Step 402 When it is recognized that the first code contains code requesting access to a chained data structure, generate at least one metadata and a data access instruction according to the chained data structure, where the chained data structure includes a plurality of data with discontinuous addresses, the at least one metadata is used to indicate the addresses of the data to be accessed in the chained data structure, and the data access instruction is used to indicate the address of the chained data structure and to request access to the chained data structure.
  • When the compiler is compiling the first code and recognizes that there is a behavior in the first code that requests access to a chained data structure, then, since each data in the chained data structure points to the address of the next data, the compiler can obtain the addresses of the data to be accessed in the chained data structure based on the actual structure of the chained data structure, thereby generating at least one metadata.
  • The at least one metadata generated by the compiler is used to indicate the addresses of multiple data to be accessed in the chained data structure.
  • the chained data structure includes multiple data with discontinuous addresses, and there is an address pointing relationship between the data.
  • The multiple data to be accessed in the chained data structure may be all of the data in the chained data structure or part of the data in the chained data structure, which is not specifically limited in this embodiment.
  • At least one metadata described in this embodiment refers to one or more metadata.
  • “at least one metadata” will be referred to as “metadata” below.
  • The metadata generated by the compiler may be located in a code segment or data segment of the second code, which is compiled based on the first code; that is, the second code is the executable file compiled by the compiler based on the first code. The metadata may be stored in the code segment of the second code as instruction code that is not executed, or may be stored in the data segment of the second code as a type of data in the program code.
  • When the compiler recognizes that there is a request to access the chained data structure in the first code, the compiler also generates a data access instruction, so that the running device can subsequently access the chained data structure according to the data access instruction when executing the compiled executable file.
  • the data access instruction is specifically used to request access to the chained data structure, and the data access instruction also indicates the address of the chained data structure.
  • the address of the chained data structure can refer to the address of any data in the chained data structure.
  • the address of the chained data structure may be determined based on the address of the first data in the chained data structure that needs to be accessed. For example, in the first code, if the first data that needs to be accessed in the chained data structure is the first data in the chained data structure, then the data access instruction indicates the address of the first data in the chained data structure; if The first data that needs to be accessed in the chained data structure is some data in the middle of the chained data structure, then the data access instruction indicates the address of the data in the middle of the chained data structure.
  • the address of the chained data structure may indicate a specific single address, such as the starting address where certain data is stored; the address of the chained data structure may also indicate an address segment, such as the address segment where certain data is stored.
  • addresses described in the embodiments of this application may refer to physical storage addresses or virtual storage addresses, which are not specifically limited in this embodiment.
  • Step 403 Generate a prefetch instruction according to the at least one metadata and the data access instruction to obtain the compiled second code.
  • the prefetch instruction is used to indicate the address of the data access instruction and the at least one metadata.
  • After the compiler generates the metadata and the data access instruction, the compiler further generates a prefetch instruction, which is used to indicate the address of the data access instruction and the metadata. In addition, the compiler can insert the prefetch instruction before the data access instruction, so that during the application execution phase, when the running device executes the compiled executable file, it executes the prefetch instruction first and then executes the data access instruction.
  • In this way, the running device can determine the data access instruction and the metadata based on the prefetch instruction when executing the executable file, thereby realizing prefetching of the data in the chained data structure. That is, the running device first determines the address of the data access instruction based on the prefetch instruction, thereby obtaining the starting storage address of the chained data structure from the data access instruction at that address; then, based on the starting storage address of the chained data structure and the metadata, it prefetches the data in the chained data structure in order.
  • After the running device obtains the address of the data access instruction based on the prefetch instruction, it can learn the data access progress in the chained data structure based on the number of times the data access instruction has been executed, thereby adaptively adjusting the amount of prefetched data to ensure effective prefetching of the data in the chained data structure.
  • The above-mentioned prefetch instruction may directly indicate the address of the data access instruction; for example, the prefetch instruction indicates that the address of the data access instruction is 0x1002.
  • Alternatively, the prefetch instruction may indicate an address offset between the prefetch instruction and the data access instruction. For example, if the address of the prefetch instruction is 0x1008 and the address of the data access instruction is 0x1002, the address offset between the prefetch instruction and the data access instruction is 0x06. It is understandable that, considering the timeliness and effectiveness of the prefetch instruction in prompting the running device to perform prefetching, the prefetch instruction is generated before the data access instruction, and the address offset between the two instructions is small.
  • In this way, the space occupied by the encoding of the prefetch instruction can be reduced, thereby saving instruction overhead.
  • In this way, during the compilation process the compiler generates the metadata and the data access instruction, and inserts, before the data access instruction, a prefetch instruction that indicates the metadata and the data access instruction.
  • the above-mentioned prefetch instruction may also indicate the content of the metadata, or indicate the storage address of the metadata.
  • Implementation mode 1: the prefetch instruction is used to indicate the starting storage address of the metadata and the quantity of the metadata, and the size of each metadata is the same.
  • In this implementation, the sizes of the metadata generated by the compiler are all the same, and the storage addresses of the metadata are continuous. Therefore, the compiler may indicate the starting storage address of the metadata and the quantity of the metadata in the prefetch instruction. In this way, the running device can first take out the first metadata based on the starting storage address of the metadata and the size of the metadata, and then use the size of the metadata as the address offset to take out the subsequent metadata one by one, thereby prefetching all of the metadata.
  • the compiler can indicate in the prefetch instruction that the starting storage address of metadata is 0x0004 and the number of metadata is 4. In this way, the running device can determine the storage address of each metadata based on the size of the metadata being 4 bytes and the starting storage address and quantity of the metadata.
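  • A sketch of how a running device could enumerate such fixed-size metadata from the starting storage address and quantity carried by the prefetch instruction (the record layout and field names are assumptions for illustration):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-size (4-byte) metadata record, matching the example above.
struct Metadata {
    std::uint16_t pointer_offset;  // position of the pointer inside the corresponding data
    std::uint16_t next_size;       // size of the data that the pointer points to
};
static_assert(sizeof(Metadata) == 4, "fixed-size records, as assumed by the prefetch instruction");

// Enumerate 'count' consecutive records starting at 'start_addr'.
std::vector<Metadata> read_metadata(std::uintptr_t start_addr, std::size_t count) {
    std::vector<Metadata> out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        out.push_back(*reinterpret_cast<const Metadata*>(start_addr + i * sizeof(Metadata)));
    }
    return out;
}
```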
  • In another implementation mode, the prefetch instruction is used to indicate the starting storage address of the metadata and the size of each metadata.
  • the size of each metadata indicated by the prefetch instruction may be different.
  • For example, assume the prefetch instruction indicates that the starting storage address of the multiple metadata is 0x0000, the size of the first metadata is 2 bytes, the size of the second metadata is 4 bytes, the size of the third metadata is 2 bytes, and the size of the fourth metadata is 6 bytes. Based on the starting storage address and the size of each metadata, the running device can determine that the storage address of the first metadata is 0x0000-0x0001, the storage address of the second metadata is 0x0002-0x0005, the storage address of the third metadata is 0x0006-0x0007, and the storage address of the fourth metadata is 0x0008-0x000d.
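  • The address ranges in this example follow from a simple running offset over the per-metadata sizes, assuming the records are stored back to back (a sketch; how the sizes themselves are encoded is not specified here):

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    const std::uintptr_t start   = 0x0000;        // starting storage address of the metadata
    const unsigned       sizes[] = {2, 4, 2, 6};  // per-metadata sizes in bytes
    std::uintptr_t addr = start;
    for (unsigned i = 0; i < 4; ++i) {
        // Record i occupies [addr, addr + size - 1]; the next record starts right after it.
        std::printf("metadata %u: 0x%04lx-0x%04lx\n", i + 1,
                    static_cast<unsigned long>(addr),
                    static_cast<unsigned long>(addr + sizes[i] - 1));
        addr += sizes[i];
    }
    return 0;
}
```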
  • the metadata generated by the compiler may not directly indicate the address of the data in the chained data structure, but may indicate the location of the pointer in the data.
  • the plurality of metadata generated by the compiler may respectively correspond to different data in the chained data structure.
  • Each metadata in the plurality of metadata corresponds one-to-one to a piece of data to be accessed in the chained data structure.
  • the plurality of metadata are used to indicate the position of the pointer in the corresponding data.
  • FIG. 5 is a schematic diagram of the correspondence between data and metadata in a chained data structure provided by an embodiment of the present application.
  • the chained data structure includes 4 data to be accessed, namely: data 1, data 2, data 3 and data 4.
  • the compiler generates 4 metadata based on the 4 data to be accessed in the chain data structure, namely: metadata 1, metadata 2, metadata 3 and metadata 4.
  • the four pieces of metadata generated by the compiler correspond to the four pieces of data to be accessed in the chained data structure.
  • Metadata 1 indicates that the offset of pointer 1 within data 1 (that is, the offset between the starting storage address of pointer 1 and the starting storage address of data 1) is 8; metadata 2 indicates that the offset of pointer 2 within data 2 is 14; metadata 3 indicates that the offset of pointer 3 within data 3 is 4; and metadata 4 indicates that the offset of pointer 4 within data 4 is 14.
  • After the running device obtains each metadata, it can determine the position of the pointer in each data prefetched from the chained data structure based on the content indicated by the metadata, and then determine the address of the next data that needs to be prefetched.
  • each of the plurality of metadata generated by the compiler is also used to indicate the size of other data pointed to by its corresponding data.
  • When the running device executes the compiled executable file, it can determine the size of the next data that needs to be prefetched according to the indication of the metadata, and then prefetch that data based on its starting storage address and size.
  • metadata 1 corresponds to data 1
  • metadata 1 can also indicate the size of data 2 pointed to by data 1.
  • After the running device determines the position of pointer 1 in data 1 based on the offset indicated in metadata 1, it can determine the starting storage address of data 2 based on pointer 1; then, the running device combines the size of data 2 indicated in metadata 1 with the starting storage address of data 2 to determine the actual storage address of the entire data 2, thereby realizing prefetching of data 2.
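  • A sketch of this step (reusing the hypothetical metadata record from the earlier sketch; the 64-byte cache-line size is an assumption): the first instance reads the pointer at the recorded offset inside the already-prefetched data, obtains the starting storage address of the next data, and prefetches its full extent.

```cpp
#include <cstddef>
#include <cstdint>

struct Metadata {                  // same hypothetical record as in the earlier sketch
    std::uint16_t pointer_offset;  // where the pointer sits inside the corresponding data
    std::uint16_t next_size;       // size of the data that the pointer points to
};

// Given already-prefetched data and its metadata, prefetch the data it points to.
void prefetch_next(const void* current, const Metadata& md) {
    constexpr std::size_t kCacheLine = 64;  // assumed cache-line size
    // Step 1: locate the pointer inside the current data using the recorded offset
    // (for data 1 in Figure 5 this offset would be 8).
    const unsigned char* p = static_cast<const unsigned char*>(current) + md.pointer_offset;
    const void* next = *reinterpret_cast<void* const*>(p);
    if (next == nullptr) {
        return;
    }
    // Step 2: the metadata also records the size of the pointed-to data, so its
    // whole extent can be brought into the cache, one line at a time.
    const unsigned char* base = static_cast<const unsigned char*>(next);
    for (std::size_t off = 0; off < md.next_size; off += kCacheLine) {
        __builtin_prefetch(base + off);
    }
}
```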
  • each of the plurality of metadata generated by the compiler is also used to indicate the type of its corresponding data and the type of other data pointed to by its corresponding data.
  • FIG. 6 is a schematic diagram of data types in a chained data structure according to an embodiment of the present application.
  • part of the data is linked to multiple data, that is, part of the data points to the addresses of multiple other data.
  • Assume that the data types within the same row are the same, that is, the data type of the first row of data is 0, the data type of the second row of data is 1, and the data type of the third row of data is 2.
  • In this way, when a metadata indicates data types 0 and 1, the running device can determine that the data corresponding to the metadata is in the first row and that the data pointed to by that data is in the second row; similarly, when a metadata indicates data types 0 and 0, the running device can determine that the data corresponding to the metadata is in the first row and that the data pointed to by that data is also in the first row.
  • FIG. 7 is a schematic flowchart of a data prefetching method provided by an embodiment of the present application.
  • the data prefetching method includes the following steps 701-704.
  • the data prefetching method is applied to the first instance of the computer system, and the computer system further includes the second instance.
  • Step 701 The first instance obtains a prefetch instruction, wherein the prefetch instruction is used to indicate the address of a data access instruction and at least one metadata.
  • the data access instruction is used to indicate the address of a chained data structure.
  • the chained data structure includes a plurality of data with discontinuous addresses, and the at least one metadata is used to indicate the address of the data in the chained data structure.
  • the first instance in the running device can obtain the prefetch instructions in the executable file.
  • the prefetch instructions in the executable file are the prefetch instructions compiled in the above-mentioned compilation method. For details, please refer to the above-mentioned compilation method, which will not be described again here.
  • the second instance is used to execute a data access instruction to request access to data in the chained data structure.
  • the second instance is used to execute the executable file of the application program, while the first instance independently executes the data prefetching method provided by the embodiment of the present application.
  • The first instance and the second instance in the embodiment of the present application may be two physically independent execution units.
  • For example, the first instance and the second instance may respectively be two independent processors or processing cores.
  • the first instance and the second instance may also be two virtual independent execution units.
  • the first instance and the second instance may be different threads, hyper-threads or processes respectively, which is not specifically limited in this embodiment.
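  • For example, the two instances could be realized as two threads sharing a progress counter (a minimal skeleton with placeholder thread bodies; the names are illustrative, and the application does not prescribe any particular threading API):

```cpp
#include <atomic>
#include <cstddef>
#include <thread>

// Shared progress counter: incremented by the second instance each time it
// executes the data access instruction, read by the first instance to pace prefetching.
std::atomic<std::size_t> accessed_count{0};

int main() {
    std::thread first_instance([] {
        // The prefetch loop for the chained data structure would run here
        // (see the control-loop sketch later in this description).
    });
    std::thread second_instance([] {
        // The traversal executing the data access instructions would run here,
        // incrementing accessed_count after each access.
        accessed_count.fetch_add(1, std::memory_order_relaxed);
    });
    second_instance.join();
    first_instance.join();
    return 0;
}
```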
  • Step 702 The first instance obtains the address of the chained data structure according to the address of the data access instruction.
  • the running device executes the prefetch instruction to start the first instance.
  • the first instance can also obtain the prefetch instruction. Since the metadata and the address of the data access instruction are indicated in the prefetch instruction, the first instance can obtain the metadata according to the prefetch instruction and temporarily store the metadata to facilitate subsequent prefetching of data based on the metadata.
  • the running device can also obtain the address of the chained data structure indicated by the data access instruction based on the address of the data access instruction indicated by the prefetch instruction.
  • In this way, the first instance can monitor the address of the data access instruction in real time to determine when the second instance executes the data access instruction.
  • the first instance can obtain the address of the chained data structure based on the data access instruction.
  • the address of the chained data structure can refer to the address of any data in the chained data structure.
  • The address of the chained data structure can be determined based on the address of the first data that needs to be accessed in the chained data structure. For example, in the first code, if the first data that needs to be accessed in the chained data structure is the first data in the chained data structure, then the data access instruction indicates the address of the first data in the chained data structure; if the first data that needs to be accessed is some data in the middle of the chained data structure, then the data access instruction indicates the address of that data in the middle of the chained data structure.
  • the address of the chained data structure may indicate a specific single address, such as the starting address where certain data is stored; the address of the chained data structure may also indicate an address segment, such as the address segment where certain data is stored.
  • Step 703 The first instance prefetches data in the chained data structure based on the address of the chained data structure and the at least one metadata.
  • the first instance can sequentially prefetch data in the chained data structure based on the starting storage address and the address of the data to be accessed indicated by the metadata.
  • Step 704 The second instance executes the data access instruction to access data in the chained data structure.
  • The first instance controls, according to the number of times the second instance executes the data access instruction, the progress of prefetching the data in the chained data structure, which is used to ensure that the data in the chained data structure is prefetched into the cache before being accessed.
  • the amount of data in the chained data structure prefetched by the first instance is related to the number of executions of the data access instruction, to ensure that the amount of data prefetched by the first instance is always greater than the amount of data actually accessed.
  • the number of executions of the data access instruction represents the number of data already accessed in the chained data structure.
  • the data access instruction may instruct the second instance to access the address indicated by a specific location in a register, so as to access data in the chained data structure; in addition, after the second instance accesses the data, the data in the register is replaced with the newly obtained data. In this way, when the second instance continues to execute the data access instruction, it can obtain the next data in the chained data structure according to the address indicated by the new data in the register.
  • the data access instruction may be an instruction that accesses the address indicated at a specific offset in a certain register, that is, it accesses the address indicated by a pointer located at a specific offset within the data stored in the register.
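  • as a simplified C-level illustration (not the embodiment's exact instruction encoding), traversing a linked list compiles to exactly such an access: a load whose address is the current value of a register plus a fixed offset, after which the register is overwritten with the loaded pointer.

```c
#include <stddef.h>

struct node {
    int          payload;
    struct node *next;      /* pointer stored at a fixed offset inside each node */
};

int sum_list(const struct node *p)  /* p is typically kept in a register */
{
    int sum = 0;
    while (p != NULL) {
        sum += p->payload;
        p = p->next;        /* the data access instruction: a load at offsetof(struct node, next);
                               the register then holds the address of the next data */
    }
    return sum;
}
```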
  • once the first instance obtains the address of the data access instruction based on the prefetch instruction, it can monitor the number of executions of the data access instruction in real time, learn the access progress of the data in the chained data structure from that count, and then adaptively adjust the amount of prefetched data, avoiding prefetching too little or too much data and ensuring effective prefetching of the data in the chained data structure.
  • for example, the first instance may control the prefetching so that the difference between the number of data prefetched from the chained data structure and the number of data actually accessed stays within a preset range. For example, assuming the preset range is 5-10, the first instance can keep the number of prefetched data in the chained data structure 5-10 greater than the number of data actually accessed, which ensures the timeliness of prefetching while avoiding cache pollution caused by prefetching too much data.
  • the value of the preset range can be adjusted according to the actual application scenario. For example, when the running device has a large cache and high data access performance requirements, the preset range can be adjusted to a larger value; when the running device has a small cache or lower data access performance requirements, the preset range can be adjusted to a smaller value.
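  • a minimal sketch of this throttling policy is shown below, assuming the example bounds 5 and 10 quoted above and assuming the prefetched count never falls behind the accessed count; the helper name is invented for illustration.

```c
#include <stddef.h>

enum { MIN_AHEAD = 5, MAX_AHEAD = 10 };   /* example preset range */

/* Returns 1 if another prefetch should be issued, given how many nodes have
 * already been prefetched and how many the program has already accessed. */
static int should_prefetch_more(size_t prefetched, size_t accessed)
{
    size_t ahead = (prefetched > accessed) ? prefetched - accessed : 0;
    if (ahead < MIN_AHEAD) return 1;   /* too little prefetched: keep going          */
    if (ahead > MAX_AHEAD) return 0;   /* far enough ahead: pause to protect the cache */
    return 0;                          /* within the preset range: no action needed   */
}
```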
  • the prefetch instruction obtained by the first instance may be specifically used to indicate the address offset between the prefetch instruction and the data access instruction.
  • the running device can determine the actual address of the data access instruction based on the actual address of the prefetch instruction and the address offset between it and the data access instruction.
  • the data prefetched by the first instance may not include pointers to other data; or, among all the data prefetched by the first instance, part of the data may include pointers to other data while the other part does not.
  • similarly, when the second instance executes the data access instruction, the data accessed by the second instance may not include pointers to other data; or, among all the data accessed by the second instance, part of the data may include pointers to other data while the other part does not.
  • the embodiments of this application do not specifically limit the content of prefetched data and accessed data.
  • the above-mentioned prefetch instruction may also indicate the content of metadata, or indicate the storage address of the metadata. Then, when the prefetch instruction indicates the storage address, the running device obtains the metadata based on the storage address of the metadata.
  • the prefetch instruction specifically indicates the starting storage address of the plurality of metadata and the number of the plurality of metadata. Therefore, after obtaining the prefetch instruction, the running device obtains the plurality of metadata starting from the starting storage address of the metadata according to the number and size of the plurality of metadata. Specifically, the running device first takes out the first metadata among the multiple metadata according to the starting storage address of the metadata and the size of the metadata, and then uses the size of the metadata as the address offset to take out the subsequent metadata one by one, thereby obtaining all the metadata.
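  • a sketch of that retrieval step, assuming each metadata entry occupies a fixed 4 bytes (the value of X is only an example) and with a function name invented for illustration:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define METADATA_SIZE 4u   /* example fixed size of one metadata entry */

/* Read `count` fixed-size metadata entries starting at `meta_base`,
 * using the entry size as the address offset between entries. */
static void load_metadata(const void *meta_base, size_t count, uint32_t *out)
{
    const unsigned char *p = meta_base;
    for (size_t i = 0; i < count; i++) {
        uint32_t entry;
        memcpy(&entry, p + i * METADATA_SIZE, sizeof entry);
        out[i] = entry;
    }
}
```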
  • the prefetch instruction can also indicate the storage address of the metadata through other implementation methods. For details, please refer to the embodiment corresponding to Figure 3 above, which will not be described again here.
  • the plurality of metadata indicated in the prefetch instruction may respectively correspond to different data in the chained data structure.
  • the plurality of metadata are used to indicate the position of the pointer in the corresponding data.
  • the metadata generated by the compiler may not directly indicate the address of the data in the chained data structure, but indicate the location of the pointer in the data.
  • the running device can implement data prefetching based on the following steps.
  • the running device prefetches the first data in the chained data structure according to the starting storage address of the chained data structure.
  • the running device obtains the pointer in the prefetched data according to the prefetched data in the chained data structure and the metadata corresponding to the prefetched data, where the prefetched data includes the first data in the chained data structure or other data obtained by continuing to prefetch based on the first data.
  • the running device prefetches, from the chained data structure, the other data pointed to by the prefetched data according to the address pointed to by the pointer in the prefetched data.
  • the running device prefetches data 1 (ie, the first data) of the chained data structure according to the starting storage address of the chained data structure.
  • the running device obtains the position of the pointer in data 1 based on data 1 and metadata 1 corresponding to data 1, and then obtains the pointer in data 1.
  • the running device prefetches data 2 pointed to by data 1 from the chained data structure based on the address pointed to by the pointer in data 1.
  • the running device can continue to prefetch data 3 pointed to by data 2 based on data 2 and metadata 2 corresponding to data 2, and this cycle continues until the required amount of data has been prefetched.
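  • the cycle described above can be pictured with the following software sketch, which assumes one metadata entry per hop and reduces each entry to the byte offset of the pointer inside its node; __builtin_prefetch is the GCC/Clang prefetch builtin, used here purely for illustration.

```c
#include <stddef.h>

struct meta { size_t ptr_offset; };   /* illustrative: position of the pointer in the node */

/* Chase the chain starting at `first`, prefetching up to `depth` nodes. */
static void prefetch_chain(const void *first, const struct meta *m, int depth)
{
    const void *cur = first;
    for (int i = 0; i < depth && cur != NULL; i++) {
        __builtin_prefetch(cur, 0, 3);        /* prefetch data i (read, keep in cache) */
        /* reading the pointer field is the software analogue of waiting for the
         * returned data before computing the next prefetch address */
        cur = *(const void *const *)((const char *)cur + m[i].ptr_offset);
    }
}
```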
  • the at least one metadata is also used to indicate the size of other data pointed to by the corresponding data. Please refer to the above embodiment for details, which will not be described again here.
  • each metadata in the at least one metadata is also used to indicate the type of corresponding data and the type of other data pointed to by the corresponding data.
  • Figure 8 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • the system architecture includes a compiler and running equipment.
  • the compiler is used to compile the application source code to obtain the executable file of the application.
  • the executable file compiled by the compiler includes prefetch instructions, data access instructions and metadata.
  • the running device is used to execute the executable file of the application and obtain data access instructions and metadata according to the prefetch instructions in the executable file to implement prefetching of data in the chained data structure.
  • Figure 9 is a schematic flowchart of a compilation method provided by an embodiment of the present application. As shown in Figure 9, the compilation method includes the following steps 901-904.
  • Step 901 The compiler identifies the memory access behavior of the chained data structure in the source code, refines the data link relationship in the chained data structure, and obtains at least one metadata.
  • each metadata corresponds to one piece of data in the chained data structure, and each metadata indicates another data pointed to by its corresponding data.
  • each metadata can also indicate the size and data type of another data that its corresponding data points to.
  • the size of each metadata generated by the compiler is the same.
  • for example, each metadata generated by the compiler is saved in X bytes, where X can be set according to different application scenarios and is not specifically limited here.
  • Step 902 The compiler generates data access instructions based on the memory access behavior of the chained data structure in the source code.
  • the data access instruction is used to instruct access to data in the chained data structure, and the data access instruction also indicates the starting storage address of the chained data structure.
  • Step 903 The compiler inserts a prefetch instruction before the data access instruction to indicate the address of the data access instruction and the metadata.
  • the prefetch instruction is used to indicate the address of the data access instruction and the metadata, and the prefetch instruction is inserted before the data access instruction. That is to say, during the application execution phase, when the running device executes the compiled executable file, it first executes the prefetch instruction and then executes the data access instruction, which facilitates data prefetching.
  • the prefetch instruction may indicate the starting storage address of the metadata, the amount of metadata, and the address of the data access instruction (for example, the offset between the data access instruction and the prefetch instruction).
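  • one way to picture the three pieces of information the prefetch instruction carries in this example is as a small descriptor; the field names below are invented for illustration and do not reflect an actual encoding.

```c
#include <stdint.h>

/* Illustrative operand layout of the inserted prefetch instruction. */
struct prefetch_operands {
    uint32_t metadata_count;      /* number of metadata entries                           */
    int32_t  metadata_offset;     /* address offset from the prefetch instruction to the
                                     first metadata entry                                 */
    int32_t  access_insn_offset;  /* address offset from the prefetch instruction to the
                                     data access instruction                              */
};
```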
  • Step 904 The compiler generates an executable file carrying prefetch instructions, data access instructions and metadata.
  • in this way, an executable file carrying the prefetch instruction, the data access instruction and the metadata can be generated.
  • this embodiment extracts the access pattern to the chained data structure from a typical workload and constructs a verification program. Then, the compiler with the newly added optimization PASS is used to compile the verification program and generate a binary program (i.e., an executable file) carrying prefetch instructions, and the binary program is run on the emulator for verification.
  • Figure 10A is a schematic diagram of compiling a verification program based on an existing compiler provided by an embodiment of the present application.
  • a in Figure 10A shows a partial structure of the chained data structure
  • b in Figure 10A shows the code instructing access to the chained data structure in the verification program.
  • the existing compiler generates the corresponding data access instruction after compilation.
  • Figure 10B is a schematic diagram of a verification program compiled by a compiler based on the newly added optimization PASS provided by an embodiment of the present application.
  • a new optimization PASS is added to the compiler, so that with the new optimization PASS, corresponding prefetch instructions can be generated for the memory access behavior of the chained data structure during the compilation phase.
  • the assembly code compiled based on the new optimized PASS compiler also includes corresponding prefetch instructions and metadata.
  • the newly added prefetch instruction is the instruction indicated by the address 400b00, which is specifically [2, 0x104, 0xc0].
  • in the prefetch instruction, 2 represents the number of metadata; 0x104 represents the address offset between the prefetch instruction and the metadata; 0xc0 represents the address offset between the prefetch instruction and the data access instruction.
  • based on the address of the prefetch instruction (400b00) and the address offset 0x104 indicated by the prefetch instruction, the running device can determine that the address of the metadata is 400c04; based on the address of the prefetch instruction and the address offset 0xc0 indicated by the prefetch instruction, the running device can determine that the address of the data access instruction is 400bc0.
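  • the address arithmetic in this example can be checked directly; the sketch below simply replays the values quoted above.

```c
#include <assert.h>
#include <stdint.h>

int main(void)
{
    uint64_t prefetch_insn = 0x400b00;          /* address of the prefetch instruction */
    assert(prefetch_insn + 0x104 == 0x400c04);  /* address of the metadata             */
    assert(prefetch_insn + 0x0c0 == 0x400bc0);  /* address of the data access insn     */
    return 0;
}
```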
  • when the compiler compiles the verification program, it recognizes the memory access behavior of the chained data structure in the verification program. For the verification program shown in Figure 10B, the compiler recognizes the loop shown in b in Figure 10B and generates the corresponding metadata based on the data structure information shown in a in Figure 10B.
  • the metadata is in units of 4 bytes.
  • Each metadata stores information about a certain data in the chained data structure and another data pointed to by a pointer in the data.
  • the content stored in each metadata includes the following five types of information.
  • Node identification (Node-ID). Each type of data in the chained data structure is assigned a Node-ID.
  • the Node-ID in the metadata is used to indicate the type of data corresponding to the metadata.
  • Offset: in units of N bytes, stores the relative offset of the pointer (Ptr) within the data corresponding to the current metadata, that is, it indicates the position of the pointer within the data corresponding to the metadata.
  • Nextnode-ID indicates the ID of the next data pointed to by Ptr, that is, it indicates the type of another data pointed to by the data corresponding to the metadata.
  • Nextnode-size: in units of M bytes, stores the size of the next data pointed to by Ptr, that is, it indicates the size of the other data pointed to by the data corresponding to the metadata.
  • N and M and the coding space occupied by each part can be adjusted according to the application and architecture. This embodiment does not limit the specific values of these parameters.
  • this example of the application allocates the encoding space of the metadata in the above manner and sets Offset and Nextnode-size to be expressed in bytes; then, for the program shown in Figure 10B, the metadata generated by the compiler is as shown in Table 1.
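  • under the field layout just described, one possible C rendering of a 4-byte metadata entry is sketched below; the 8-bit field widths are placeholders chosen for illustration, not values fixed by the embodiment.

```c
#include <stdint.h>

/* One 4-byte metadata entry; bit widths are illustrative only. */
struct chain_metadata {
    uint32_t node_id       : 8;  /* Node-ID: type of the data this entry describes        */
    uint32_t offset        : 8;  /* Offset: position of the pointer in the data, in units
                                    of N bytes                                             */
    uint32_t nextnode_id   : 8;  /* Nextnode-ID: type of the data the pointer points to   */
    uint32_t nextnode_size : 8;  /* Nextnode-size: size of that data, in units of M bytes */
};
```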
  • since the program shown in b in Figure 10B needs to access the data of the BackboneNode node and the data of the RibNode node, the compiler passes to the hardware, through the metadata, the information required to calculate the addresses of these two kinds of nodes. Since the memory access behavior in b in Figure 10B does not access the ArcNode data, there is no need to provide metadata for calculating the ArcNode address; that is, the metadata generated by the compiler is used to indicate the addresses of the data to be accessed.
  • Figure 11 is a schematic flow chart of a data prefetching method provided by an embodiment of the present application. As shown in Figure 11, the data prefetching method includes the following steps 1101-1108.
  • Step 1101 The running device determines whether the instruction is a prefetch instruction.
  • the decoding unit in the running device determines whether the instruction currently to be executed is a prefetch instruction.
  • Step 1102 The running device initializes the chain prefetcher based on the prefetch instruction to obtain and save the metadata indicated by the prefetch instruction and the address of the data access instruction.
  • the running device initializes the chained prefetcher based on the prefetch instruction. In this way, the chained prefetcher in the running device can obtain and save the metadata indicated by the prefetch instruction and the address of the data access instruction.
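  • a software analogue of this initialization step might simply record what the prefetch instruction indicates, as in the sketch below; the structure and function names are invented for illustration.

```c
#include <stdint.h>
#include <stddef.h>

/* State captured by the chained prefetcher when the prefetch instruction is decoded. */
struct chain_prefetcher {
    uintptr_t metadata_addr;     /* where the metadata entries are stored         */
    size_t    metadata_count;    /* how many entries to read                      */
    uintptr_t access_insn_addr;  /* address of the monitored data access insn     */
    size_t    prefetch_count;    /* prefetch requests issued so far               */
    size_t    access_count;      /* observed executions of the access instruction */
};

static void chain_prefetcher_init(struct chain_prefetcher *p, uintptr_t meta_addr,
                                  size_t meta_count, uintptr_t access_addr)
{
    p->metadata_addr    = meta_addr;
    p->metadata_count   = meta_count;
    p->access_insn_addr = access_addr;
    p->prefetch_count   = 0;
    p->access_count     = 0;
}
```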
  • Step 1103 The running device determines whether the instruction is a data access instruction.
  • the decoding unit in the running device continuously monitors whether the data access instruction is executed, that is, it determines whether the currently decoded instruction is the data access instruction indicated by the prefetch instruction.
  • Step 1104 The running device obtains the starting storage address of the chained data structure based on the data access instruction.
  • the data access instruction indicates the starting storage address of the chained data structure.
  • Step 1105 The running device sends a prefetch request to the cache according to the starting storage address.
  • Step 1106 The operating device determines whether the cache has returned the prefetched data.
  • if the running device determines that the cache has returned the prefetched data, the running device continues to execute step 1107.
  • Step 1107 The running device calculates the next prefetch address based on the metadata indicated by the prefetch instruction and the returned data.
  • the chained prefetcher in the running device can calculate the next prefetch address according to the metadata indicated by the prefetch instruction and the returned data.
  • Step 1108 The operating device determines whether to stop prefetching based on the number of execution times of the data access instructions and the number of data prefetching.
  • the running device can monitor the number of executions of the data access instruction based on the address of the data access instruction. When the difference between the number of data prefetches and the number of executions of the data access instruction is less than the preset range, the running device continues to prefetch data based on the calculated next prefetch address; when the difference between the number of data prefetches and the number of executions of the data access instruction is greater than the preset range, the running device stops prefetching data.
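  • steps 1105-1108 can be summarized with the following software sketch, which assumes a single pointer offset per node, assumes the prefetch count never falls behind the access count, uses the GCC/Clang __builtin_prefetch builtin in place of a hardware prefetch request, and reduces the preset range to a single example threshold.

```c
#include <stddef.h>

enum { PRESET_AHEAD = 8 };   /* example threshold within the preset range */

/* Issue a prefetch, derive the next address from the returned node and its
 * metadata, and stop once the prefetcher is far enough ahead of the accesses. */
static void run_chain_prefetch(const void *start, size_t ptr_offset,
                               size_t *prefetch_count, const size_t *access_count)
{
    const void *cur = start;
    while (cur != NULL &&
           *prefetch_count - *access_count < PRESET_AHEAD) {   /* step 1108 */
        __builtin_prefetch(cur, 0, 3);                         /* step 1105 */
        (*prefetch_count)++;
        /* steps 1106-1107: once the node is available, read its pointer field
         * (located via the metadata Offset) to obtain the next prefetch address */
        cur = *(const void *const *)((const char *)cur + ptr_offset);
    }
}
```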
  • FIG. 12 is a schematic structural diagram of a compilation device 1200 provided by an embodiment of the present application.
  • the compilation device 1200 includes an acquisition unit 1201 and a processing unit 1202 .
  • the acquisition unit 1201 is used to obtain the first code;
  • the processing unit 1202 is used to generate a data access instruction and at least one metadata according to the chained data structure when it is recognized that the first code contains code requesting access to the chained data structure, wherein the chained data structure includes a plurality of data with discontinuous addresses, the at least one metadata is used to indicate the addresses of the data in the chained data structure, and the data access instruction is used to indicate the address of the chained data structure and the request to access the chained data structure;
  • the processing unit 1202 is also configured to generate a prefetch instruction according to the at least one metadata and the data access instruction to obtain the compiled second code, where the prefetch instruction is used to indicate the address of the data access instruction and the at least one metadata.
  • the data in the chained data structure includes pointers pointing to addresses of other data
  • the at least one metadata respectively corresponds to different data in the chained data structure
  • the at least one metadata is used to indicate the position of the pointer in the corresponding data.
  • the at least one metadata is also used to indicate the size of other data pointed to by the corresponding data.
  • each metadata in the at least one metadata is also used to indicate the type of the corresponding data and the type of other data pointed to by the corresponding data.
  • the prefetch instruction is specifically used to indicate an address offset between the prefetch instruction and the data access instruction.
  • the prefetch instruction is specifically used to indicate the address of the at least one metadata.
  • the prefetch instruction is used to indicate the starting address of the at least one metadata and the quantity of the at least one metadata, and the sizes of the at least one metadata are the same.
  • the at least one metadata is located in a code segment or data segment in the second code, and the second code is compiled based on the first code.
  • FIG. 13 is a schematic structural diagram of a data prefetching device 1300 provided by an embodiment of the present application.
  • the data prefetching device 1300 includes: an acquisition unit 1301 , a prefetching unit 1302 and an execution unit 1303 .
  • the acquisition unit 1301 is configured to acquire a prefetch instruction, wherein the prefetch instruction is used to indicate the address of a data access instruction and at least one metadata, the data access instruction is used to indicate the address of a chained data structure, the chained data structure includes multiple data with discontinuous addresses, and the at least one metadata is used to indicate the addresses of the data in the chained data structure; the acquisition unit 1301 is also used to acquire the address of the chained data structure according to the address of the data access instruction.
  • the prefetching unit 1302 is used to prefetch the data in the chained data structure according to the address of the chained data structure and the at least one metadata, and the execution unit 1303 is used to execute the data access instruction to access the data in the chained data structure; the progress of prefetching the data in the chained data structure is controlled according to the number of times the data access instruction is executed, and the progress is used to make the data in the chained data structure prefetched into the cache before being accessed.
  • the difference between the amount of prefetched data and the amount of accessed data is within a preset range.
  • the data in the chained data structure includes pointers pointing to addresses of other data in the chained data structure, the at least one metadata respectively corresponds to different data in the chained data structure, and each of the at least one metadata is used to indicate the position of the pointer in the corresponding data;
  • the prefetch unit 1302 is specifically configured to: prefetch data in the chained data structure according to the address of the chained data structure; obtain the pointer in the prefetched data according to the prefetched data in the chained data structure and the metadata corresponding to the prefetched data; and prefetch, from the chained data structure, the other data pointed to by the prefetched data according to the address pointed to by the pointer in the prefetched data.
  • the at least one metadata is also used to indicate the size of other data pointed to by the corresponding data.
  • each metadata in the at least one metadata is also used to indicate the type of the corresponding data and the type of other data pointed to by the corresponding data.
  • the prefetch instruction is specifically used to indicate an address offset between the prefetch instruction and the data access instruction.
  • the prefetch instruction is specifically used to indicate the address of the at least one metadata
  • the obtaining unit 1301 is also configured to: obtain the at least one metadata according to the address of the at least one metadata.
  • the at least one metadata has the same size, and the prefetch instruction is used to indicate the starting address of the at least one metadata and the quantity of the at least one metadata;
  • the acquisition unit 1301 is further configured to: acquire the at least one metadata starting from the starting address of the at least one metadata according to the quantity and size of the at least one metadata.
  • the compiling method and data prefetching method provided by the embodiments of the present application can be specifically executed by a chip in an electronic device.
  • the chip includes: a processing unit and a communication unit.
  • the processing unit can be, for example, a processor, and the communication unit can be, for example, an input/output interface, a pin or a circuit, etc.
  • the processing unit can execute computer-executable instructions stored in the storage unit, so that the chip in the electronic device executes the methods described in the embodiments shown in FIGS. 1 to 11.
  • the storage unit is a storage unit within the chip, such as a register, cache, etc.
  • the storage unit can also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM), etc.
  • the present application also provides a computer-readable storage medium.
  • the methods disclosed in the above embodiments can be implemented as computer program instructions encoded in a machine-readable format on the computer-readable storage medium or encoded on other non-transitory media or articles.
  • FIG. 14 schematically illustrates a conceptual partial view of an example computer-readable storage medium including a computer program for executing a computer process on a computing device, arranged in accordance with at least some embodiments presented herein.
  • the computer-readable storage medium 1400 is provided using a signal bearing medium 1401.
  • Signal bearing medium 1401 may include one or more program instructions 1402 that, when executed by one or more processors, may provide the functionality or portions of the functionality described above with respect to FIG. 4 or FIG. 7 .
  • program instructions 1402 in Figure 14 also describe example instructions.
  • signal bearing media 1401 may include computer readable media 1403 such as, but not limited to, a hard drive, compact disk (CD), digital video disc (DVD), digital tape, memory, ROM or RAM, and the like.
  • signal bearing media 1401 may include computer recordable media 1404 such as, but not limited to, memory, read/write (R/W) CDs, R/W DVDs, and the like.
  • signal bearing medium 1401 may include communication media 1405, such as, but not limited to, digital and/or analog communication media (eg, fiber optic cables, waveguides, wired communication links, wireless communication links, etc.).
  • signal bearing medium 1401 may be conveyed by a wireless form of communication medium 1405 (eg, a wireless communication medium that complies with the IEEE 802.14 standard or other transmission protocol).
  • One or more program instructions 1402 may be, for example, computer-executable instructions or logic-implemented instructions.
  • for example, the computing device may be configured to provide various operations, functions, or actions in response to the program instructions 1402 conveyed to the computing device by one or more of the computer-readable medium 1403, the computer-recordable medium 1404, and/or the communication medium 1405.
  • the disclosed systems, devices and methods can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or can be integrated into another system, or some features can be ignored, or not implemented.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application can be integrated into one processing unit, each unit can exist physically alone, or two or more units can be integrated into one unit.
  • the above integrated units can be implemented in the form of hardware or software functional units.
  • the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application.
  • the aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, an optical disk, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

A data prefetching method and a compiling method, which may implement effective prefetching of data in a linked data structure. In the present solution, when a code accessing the linked data structure is identified in a compiling process, a data access instruction and metadata used for indicating addresses of the data in the linked data structure are generated, and a prefetching instruction is generated to indicate an address of the data access instruction and the metadata. In this way, when a running device executes an executable file obtained by compiling, the data access instruction and the metadata can be determined according to the prefetching instruction, such that prefetching of the data in the linked data structure is implemented. Moreover, after the running device acquires the address of the data access instruction on the basis of the prefetching instruction, an access progress of the data in the linked data structure can be learned according to the number of accesses of the data access instruction, thereby adaptively adjusting the quantity of prefetched data, and ensuring the effective prefetching of the data in the linked data structure.

Description

A data prefetching method, compilation method and related devices
This application claims priority to the Chinese patent application filed with the China Patent Office on June 10, 2022, with application number 202210654495.4 and the invention title "A data prefetching method, compilation method and related devices", the entire content of which is incorporated in this application by reference.
Technical field
The present application relates to the field of computer technology, and in particular, to a data prefetching method, a compilation method and related devices.
Background
In computer systems, different storage devices usually have different access speeds. In computers with multi-level storage systems, the computer usually uses data prefetching technology to improve the access performance of the system. Specifically, the computer predicts the data to be accessed and loads the predicted data in advance from a storage device with a slower access speed into a storage device with a faster access speed, for example, loading the predicted data from the memory into the cache.
Currently, existing data prefetching technologies usually prefetch the data to be accessed based on historical memory access information. For example, when the computer detects from the historical memory access information that a program accesses data in an address-incrementing manner, the computer prefetches the data to be accessed based on the currently accessed data and the same address-incrementing manner.
However, current data prefetching technology can only achieve effective prefetching for data whose storage addresses follow a certain pattern, such as data with continuous storage addresses or data whose storage addresses increase by a certain value. For data in a chained data structure whose storage addresses follow no pattern, current data prefetching technology can hardly achieve effective prefetching. A chained data structure usually includes multiple data stored in a scattered manner, and the data in the chained data structure that includes a pointer points to the address where the next data is stored. When prefetching data in a chained data structure, current data prefetching technology often finds it difficult to determine the amount of data to prefetch: too much data may be prefetched at one time, causing cache pollution, or too little data may be prefetched, failing to achieve the purpose of improving data access performance.
Therefore, there is an urgent need for a method that can effectively prefetch data in a chained data structure.
Summary of the invention
This application provides a data prefetching method that can effectively prefetch data in a chained data structure.
A first aspect of the present application provides a data prefetching method. The method is applied to a first instance of a computer system, and the computer system further includes a second instance. Specifically, the data prefetching method includes: the first instance obtains a prefetch instruction in an executable file, where the prefetch instruction is used to indicate the address of a data access instruction and at least one metadata, the data access instruction is used to indicate the address of a chained data structure, the chained data structure includes multiple data with discontinuous addresses, and the at least one metadata is used to indicate the addresses of the data in the chained data structure. The address of the chained data structure may refer to the address of any data in the chained data structure. The address of the chained data structure may be determined based on the address of the first data that needs to be accessed in the chained data structure; for example, if the first data that needs to be accessed in the chained data structure is the first data in the chained data structure, then the data access instruction indicates the address of the first data in the chained data structure. In addition, the address of the chained data structure may indicate a specific single address, such as the starting address where certain data is stored; the address of the chained data structure may also indicate an address segment, such as the address segment where certain data is stored.
Then, the first instance obtains the address of the chained data structure according to the address of the data access instruction. Furthermore, the first instance prefetches data in the chained data structure according to the address of the chained data structure and the at least one metadata.
In addition, the second instance executes the data access instruction to access the data in the chained data structure.
Specifically, in the process of the first instance prefetching data in the chained data structure, the first instance controls, according to the number of times the second instance executes the data access instruction, the progress of prefetching the data in the chained data structure, where the progress is used to make the data in the chained data structure prefetched into the cache before being accessed.
In this solution, when the running device executes the executable file, it can determine the data access instruction and the at least one metadata based on the prefetch instruction in the executable file, thereby realizing prefetching of the data in the chained data structure. Moreover, after the running device obtains the address of the data access instruction based on the prefetch instruction, it can learn the access progress of the data in the chained data structure according to the number of accesses of the data access instruction, thereby controlling the progress of prefetching the data in the chained data structure, that is, adaptively adjusting the amount of prefetched data and ensuring effective prefetching of the data in the chained data structure.
It should be noted that, during the process of the first instance prefetching data in the chained data structure, the data prefetched by the first instance may not include pointers to other data; or, among all the data prefetched by the first instance, part of the data includes pointers to other data while the other part does not. Similarly, in the process of the second instance executing the data access instruction, the data accessed by the second instance may also not include pointers to other data; or, among all the data accessed by the second instance, part of the data includes pointers to other data while the other part does not.
In a possible implementation, during the process of the first instance prefetching data in the chained data structure, the difference between the amount of prefetched data and the amount of accessed data is within a preset range. For example, assuming the preset range is 5-10, the first instance can control the number of prefetched data in the chained data structure to always be 5-10 more than the number of data actually accessed, ensuring the timeliness of prefetching while avoiding polluting the cache by prefetching too much data. In addition, during prefetching, the first instance can dynamically adjust the above preset range according to the progress of the data and the available cache space of the running device, to ensure a balance between the amount of prefetched data and the available cache space.
In a possible implementation, the data in the chained data structure includes pointers pointing to addresses of other data in the chained data structure, the at least one metadata respectively corresponds to different data in the chained data structure, and each of the at least one metadata is used to indicate the position of the pointer in the corresponding data.
The first instance prefetching data in the chained data structure according to the address of the chained data structure and the at least one metadata includes: the first instance prefetches data in the chained data structure according to the address of the chained data structure; the first instance obtains the pointer in the prefetched data according to the prefetched data in the chained data structure and the metadata corresponding to the prefetched data; and the first instance prefetches, from the chained data structure, the other data pointed to by the prefetched data according to the address pointed to by the pointer in the prefetched data.
In a possible implementation, the at least one metadata is also used to indicate the size of the other data pointed to by the corresponding data. That is to say, for a certain metadata, the metadata is also used to indicate the size of the other data pointed to by the pointer in the data corresponding to the metadata. For example, assume that metadata 1 corresponds to data 1, and the pointer in data 1 points to data 2; then metadata 1 is also used to indicate the size of data 2 pointed to by the pointer in data 1.
In a possible implementation, each metadata in the at least one metadata is also used to indicate the type of the corresponding data and the type of the other data pointed to by the corresponding data.
In this solution, by indicating the data type of the data in the metadata, it can be determined which data the pointer indicated by the metadata actually points to, thereby determining the link relationship of the data in the chained data structure, so that the running device can effectively prefetch data in a complex chained data structure.
In a possible implementation, the prefetch instruction is specifically used to indicate the address offset between the prefetch instruction and the data access instruction.
In this solution, by indicating, in the prefetch instruction, the address offset between the prefetch instruction and the data access instruction, the encoding space occupied by the prefetch instruction can be reduced, thereby saving instruction overhead.
In a possible implementation, the prefetch instruction is specifically used to indicate the address of the at least one metadata; the method further includes: the first instance obtains the at least one metadata according to the address of the at least one metadata.
In this solution, by indicating the storage address of the metadata in the prefetch instruction, directly storing the metadata in the prefetch instruction can be avoided, reducing the encoding space occupied by the prefetch instruction and thereby saving instruction overhead.
In a possible implementation, the at least one metadata has the same size, and the prefetch instruction is used to indicate the starting address of the at least one metadata and the quantity of the at least one metadata; the first instance obtaining the at least one metadata according to the storage address of the at least one metadata includes: the first instance obtains the at least one metadata starting from the starting storage address of the at least one metadata according to the quantity and size of the at least one metadata.
A second aspect of this application provides a compilation method, including: a compiler obtains first code, where the first code may refer to program source code, for example, code written in a high-level language such as Java, C, C++ or Python.
When recognizing that the first code contains code requesting access to a chained data structure, the compiler generates a data access instruction and at least one metadata according to the chained data structure, where the chained data structure includes multiple data with discontinuous addresses, the at least one metadata is used to indicate the addresses of the data in the chained data structure, and the data access instruction is used to indicate the address of the chained data structure and the request to access the chained data structure.
Finally, the compiler generates a prefetch instruction according to the at least one metadata and the data access instruction to obtain compiled second code, where the prefetch instruction is used to indicate the address of the data access instruction and the at least one metadata.
In this solution, when code accessing a chained data structure is identified during compilation, a data access instruction and at least one metadata used to indicate the addresses of the data to be accessed in the chained data structure are generated, and a prefetch instruction is inserted before the data access instruction to indicate the address of the data access instruction and the at least one metadata. In this way, when the running device executes the compiled executable file, it can determine the data access instruction and the at least one metadata based on the prefetch instruction, thereby realizing prefetching of the data in the chained data structure. Moreover, after the running device obtains the address of the data access instruction based on the prefetch instruction, it can learn the access progress of the data in the chained data structure according to the number of accesses of the data access instruction, thereby adaptively adjusting the amount of prefetched data and ensuring effective prefetching of the data in the chained data structure.
In a possible implementation, the data in the chained data structure includes pointers pointing to addresses of other data, the at least one metadata respectively corresponds to different data in the chained data structure, and each of the at least one metadata is used to indicate the position of the pointer in the corresponding data.
In a possible implementation, the at least one metadata is also used to indicate the size of the other data pointed to by the corresponding data.
In a possible implementation, each metadata in the at least one metadata is also used to indicate the type of the corresponding data and the type of the other data pointed to by the corresponding data.
In a possible implementation, the prefetch instruction is specifically used to indicate the address offset between the prefetch instruction and the data access instruction.
In a possible implementation, the prefetch instruction is specifically used to indicate the address of the at least one metadata.
In a possible implementation, the prefetch instruction is used to indicate the starting address of the at least one metadata and the quantity of the at least one metadata, and the sizes of the at least one metadata are the same.
In a possible implementation, the at least one metadata is located in a code segment or a data segment of the second code, and the second code is compiled based on the first code.
A third aspect of this application provides a data prefetching apparatus, including:
an acquisition unit, configured to acquire a prefetch instruction, where the prefetch instruction is used to indicate the address of a data access instruction and at least one metadata, the data access instruction is used to indicate the address of a chained data structure, the chained data structure includes multiple data with discontinuous addresses, and the at least one metadata is used to indicate the addresses of the data in the chained data structure;
the acquisition unit is also configured to acquire the address of the chained data structure according to the address of the data access instruction;
a prefetch unit, configured to prefetch data in the chained data structure according to the address of the chained data structure and the at least one metadata;
an execution unit, configured to execute the data access instruction to access the data in the chained data structure;
where, in the process of the first instance prefetching data in the chained data structure, the first instance controls, according to the number of times the second instance executes the data access instruction, the progress of prefetching the data in the chained data structure, and the progress is used to make the data in the chained data structure prefetched into the cache before being accessed.
In a possible implementation, during the process of the prefetch unit prefetching data in the chained data structure, the difference between the amount of prefetched data and the amount of accessed data is within a preset range.
In a possible implementation, the data in the chained data structure includes pointers pointing to addresses of other data in the chained data structure, the at least one metadata respectively corresponds to different data in the chained data structure, and each of the at least one metadata is used to indicate the position of the pointer in the corresponding data;
the prefetch unit is specifically configured to: prefetch data in the chained data structure according to the address of the chained data structure; obtain the pointer in the prefetched data according to the prefetched data in the chained data structure and the metadata corresponding to the prefetched data; and prefetch, from the chained data structure, the other data pointed to by the prefetched data according to the address pointed to by the pointer in the prefetched data.
In a possible implementation, the at least one metadata is also used to indicate the size of the other data pointed to by the corresponding data.
In a possible implementation, each metadata in the at least one metadata is also used to indicate the type of the corresponding data and the type of the other data pointed to by the corresponding data.
In a possible implementation, the prefetch instruction is specifically used to indicate the address offset between the prefetch instruction and the data access instruction.
In a possible implementation, the prefetch instruction is specifically used to indicate the address of the at least one metadata;
the acquisition unit is further configured to: acquire the at least one metadata according to the address of the at least one metadata.
In a possible implementation, the at least one metadata has the same size, and the prefetch instruction is used to indicate the starting address of the at least one metadata and the quantity of the at least one metadata;
the acquisition unit is further configured to: acquire the at least one metadata starting from the starting address of the at least one metadata according to the quantity and size of the at least one metadata.
A fourth aspect of this application provides a compilation apparatus, including:
an acquisition unit, configured to acquire first code;
a processing unit, configured to generate a data access instruction and at least one metadata according to a chained data structure when it is recognized that the first code contains code requesting access to the chained data structure, where the chained data structure includes multiple data with discontinuous addresses, the at least one metadata is used to indicate the addresses of the data in the chained data structure, and the data access instruction is used to indicate the address of the chained data structure and the request to access the chained data structure;
the processing unit is also configured to generate a prefetch instruction according to the at least one metadata and the data access instruction to obtain compiled second code, where the prefetch instruction is used to indicate the address of the data access instruction and the at least one metadata.
In a possible implementation, the data in the chained data structure includes pointers pointing to addresses of other data, the at least one metadata respectively corresponds to different data in the chained data structure, and each of the at least one metadata is used to indicate the position of the pointer in the corresponding data.
In a possible implementation, the at least one metadata is also used to indicate the size of the other data pointed to by the corresponding data.
In a possible implementation, each metadata in the at least one metadata is also used to indicate the type of the corresponding data and the type of the other data pointed to by the corresponding data.
In a possible implementation, the prefetch instruction is specifically used to indicate the address offset between the prefetch instruction and the data access instruction.
In a possible implementation, the prefetch instruction is specifically used to indicate the address of the at least one metadata.
In a possible implementation, the prefetch instruction is used to indicate the starting address of the at least one metadata and the quantity of the at least one metadata, and the sizes of the at least one metadata are the same.
In a possible implementation, the at least one metadata is located in a code segment or a data segment of the second code, and the second code is compiled based on the first code.
A fifth aspect of this application provides an electronic device, including a memory and a processor. The memory stores code, and the processor is configured to execute the code; when the code is executed, the electronic device performs the method of any one of the implementations of the first aspect or the second aspect.
A sixth aspect of this application provides a computer-readable storage medium storing a computer program that, when run on a computer, causes the computer to perform the method of any one of the implementations of the first aspect or the second aspect.
A seventh aspect of this application provides a computer program product that, when run on a computer, causes the computer to perform the method of any one of the implementations of the first aspect or the second aspect.
An eighth aspect of this application provides a chip including one or more processors. Some or all of the processors are configured to read and execute a computer program stored in a memory, so as to perform the method in any possible implementation of any of the foregoing aspects.
Optionally, the chip includes the memory, and the memory is connected to the processor through circuits or wires. Optionally, the chip further includes a communication interface, and the processor is connected to the communication interface. The communication interface is used to receive data and/or information to be processed; the processor obtains the data and/or information from the communication interface, processes it, and outputs the processing result through the communication interface. The communication interface may be an input/output interface. The method provided by this application may be implemented by a single chip or cooperatively by multiple chips.
For the beneficial effects of the second to eighth aspects of this application, reference may be made to the description of the first aspect, and details are not repeated here.
Description of the drawings
Figure 1 is a schematic diagram of a chained data structure according to an embodiment of this application;
Figure 2 is a schematic diagram of several different chained data structures according to an embodiment of this application;
Figure 3 is a schematic diagram of an execution device executing an application program according to an embodiment of this application;
Figure 4 is a schematic flowchart of a compilation method according to an embodiment of this application;
Figure 5 is a schematic diagram of the correspondence between data and metadata in a chained data structure according to an embodiment of this application;
Figure 6 is a schematic diagram of data types in a chained data structure according to an embodiment of this application;
Figure 7 is a schematic flowchart of a data prefetching method according to an embodiment of this application;
Figure 8 is a schematic diagram of a system architecture according to an embodiment of this application;
Figure 9 is a schematic flowchart of a compilation method according to an embodiment of this application;
Figure 10A is a schematic diagram of compiling a verification program with an existing compiler according to an embodiment of this application;
Figure 10B is a schematic diagram of compiling a verification program with a compiler that includes the newly added optimization pass according to an embodiment of this application;
Figure 11 is a schematic flowchart of a data prefetching method according to an embodiment of this application;
Figure 12 is a schematic structural diagram of a compilation apparatus 1200 according to an embodiment of this application;
Figure 13 is a schematic structural diagram of a data prefetching apparatus 1300 according to an embodiment of this application;
Figure 14 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of this application.
Description of embodiments
The embodiments of this application are described below with reference to the accompanying drawings. The terms used in the implementation section of this application are intended only to explain specific embodiments of this application and are not intended to limit this application.
The embodiments of this application are described below with reference to the accompanying drawings. A person of ordinary skill in the art will appreciate that, as technology develops and new scenarios emerge, the technical solutions provided in the embodiments of this application are equally applicable to similar technical problems.
The terms "first", "second", and the like in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that terms used in this way are interchangeable where appropriate; they are merely a way of distinguishing objects with the same attributes when describing the embodiments of this application. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, system, product, or device that includes a series of units is not necessarily limited to those units, but may include other units that are not explicitly listed or that are inherent to the process, method, product, or device.
To facilitate understanding, the technical terms involved in the embodiments of this application are explained below.
Chained data structure: a chained data structure includes multiple data items with discontinuous addresses, and the data items have address-pointing relationships with each other, i.e., the previous data item in the chained data structure points to the address of the next data item. Refer to Figure 1, which is a schematic diagram of a chained data structure according to an embodiment of this application. As shown in Figure 1, each data item in a chained data structure consists of two parts: a payload part and a pointer part, where the pointer part points to the address of the next data item linked to the current one. In short, a chained data structure uses pointers to express the logical relationships between data elements. Consequently, a chained data structure is normally traversed from front to back: the previous data item must be accessed first before the next one can be accessed based on the address it indicates.
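As a minimal sketch of such a data item in C (assuming a singly linked list; the field names and sizes are purely illustrative):

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative node of a chained data structure: a payload part
 * followed by a pointer part that links to the next node. */
struct node {
    uint8_t payload[24];   /* valid data part */
    struct node *next;     /* pointer part: address of the next node */
};

/* Front-to-back traversal: node i+1 can only be located after node i
 * has been read, which is what makes this access pattern hard to
 * prefetch with conventional techniques. */
static void walk(const struct node *head) {
    for (const struct node *n = head; n != NULL; n = n->next) {
        (void)n->payload;  /* consume the payload */
    }
}
```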
The common structural forms of chained data structures include singly linked lists, doubly linked lists, circular linked lists, backbone-rib linked lists, binary trees, and arrays of structures. An array-of-structures form stores an array of structures in contiguous memory, where each structure contains a pointer; a sketch of this form is given below. For example, refer to Figure 2, which is a schematic diagram of several different chained data structures according to an embodiment of this application. Part (1) of Figure 2 shows a backbone-rib linked list, part (2) shows a binary tree, and part (3) shows an array of structures.
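A minimal sketch of the array-of-structures form, assuming each element carries a pointer to out-of-line data; the names are illustrative:

```c
#include <stdlib.h>

/* Array-of-structures form: the elements themselves sit in contiguous
 * memory, but each element points to data allocated elsewhere, so the
 * pointed-to accesses are still irregular. */
struct entry {
    int key;
    char *detail;          /* pointer into non-contiguous memory */
};

struct entry *make_table(size_t n) {
    struct entry *table = calloc(n, sizeof *table);
    if (table == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        table[i].key = (int)i;
        table[i].detail = malloc(64);   /* scattered allocations */
    }
    return table;
}
```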
At present, chained data structures consist mainly of dynamically linked data, usually in the form of trees, graphs, or linked lists. Chained data structures are widely used in general-purpose computing, high-performance computing (HPC), databases, and artificial intelligence. They are also the key underlying data structures of the containers provided by object-oriented programming languages such as C++ and Java. A chained data structure can make full use of computer memory and enables flexible dynamic memory management. Its drawback, however, is that there is no spatial locality between its data items, so reading a chained data structure is mostly a typical irregular memory access pattern, which easily causes memory access latency, limits the performance of the central processing unit (CPU), and manifests as a performance bottleneck in various application scenarios.
Memory access latency: the delay incurred while waiting for an access to data stored in system memory to complete.
Compilation: the process of using a compiler to generate object code from a source program written in a source language. Object code is a language between high-level language and machine language, and it can be further converted into executable binary machine code. Simply put, compilation converts a source program written in a high-level language into object code that is closer to machine language. Since a computer only understands 1s and 0s, compilation in effect turns a high-level language that people are familiar with into a binary language that the computer can recognize. The process by which a compiler translates a source program into a target program is divided into five stages: lexical analysis; syntax analysis; semantic checking and intermediate code generation; code optimization; and target code generation.
Intermediate code: an internal representation of the source program, also called an intermediate representation (IR). The intermediate representation makes the structure of the compiler logically simpler and clearer, and in particular makes optimization of the target code easier to implement. Its complexity lies between that of the source programming language and that of machine language.
Code optimization: applying various equivalent transformations to a program so that more efficient target code can be generated from the transformed program. "Equivalent" means that the running results of the program are not changed. "Efficient" mainly means that the target code runs in less time and occupies less storage space. Such transformations are called optimizations.
Optimization pass: optimization passes are an important part of a compilation framework. An optimization pass analyzes and modifies the intermediate representation. During code optimization, multiple optimization passes analyze and modify the intermediate representation, and each pass performs a specific optimization task.
Program counter (PC): a register that stores the address of the next instruction to be executed by the execution device.
Metadata: metadata, also known as intermediary data or relay data, is data about data. Metadata mainly describes the properties of data and is used to support functions such as indicating storage locations, historical data, resource lookup, and file records. In essence, metadata is a kind of electronic catalogue: to serve its cataloguing purpose, it must describe and record the content or characteristics of the data, thereby assisting data retrieval.
Generally, an application program consists of program segments such as a code segment, a data segment, and a read-only data segment, and the code segment consists of a sequence of consecutive instructions. During execution of the application, the operating system loads the program segments into memory, and the execution device then executes the instructions in the code segment in a certain order, thereby executing the application.
Refer to Figure 3, which is a schematic diagram of an execution device executing an application program according to an embodiment of this application. As shown in Figure 3, an execution device usually includes a control unit, a storage unit, and an arithmetic unit. The control unit includes a program counter and an instruction register: the program counter stores the memory address of the next instruction to be executed, and the instruction register stores the instruction to be executed. The storage unit usually includes multiple registers, such as general-purpose registers and floating-point registers, which store the data needed while instructions are executed. The arithmetic unit processes data according to the instruction currently being executed.
Based on this structure, the execution device operates as follows: driven by the clock, the control unit places the instruction address held in the program counter (i.e., the address of the instruction in memory) onto the address bus (not shown in Figure 3), and the execution device then reads the instruction at that address into the instruction register for decoding. For data needed while executing an instruction, the execution device places the corresponding data address onto the address bus and, based on that address, reads the data into its internal storage unit for temporary storage. Finally, the arithmetic unit processes the data according to the instruction currently being executed. In general, the execution device fetches instructions and the corresponding data from memory one by one, and performs operations on the data according to the opcodes in the instructions until the program finishes.
Specifically, the working process of the execution device can be divided into five stages: instruction fetch, instruction decode, instruction execution, memory access, and write-back.
1. Instruction fetch (IF) stage.
The instruction fetch stage is the process of fetching an instruction from memory into the instruction register. The value in the program counter indicates the location in memory of the next instruction to be executed. After an instruction is fetched, the value in the program counter is automatically incremented according to the length of the instruction.
2. Instruction decode (ID) stage.
After the instruction is fetched, the execution device immediately enters the instruction decode stage. In this stage, the instruction decoder splits and interprets the fetched instruction according to the predetermined instruction format, identifying the different instruction categories and the various ways of obtaining the operands.
3. Execute (EX) stage.
After the instruction fetch and decode stages, the execution device enters the execute stage. The task of this stage is to carry out the operations specified by the instruction so as to realize its function. To this end, different parts of the execution device are connected to perform the required operations. For example, if an addition is required, the arithmetic logic unit in the arithmetic unit is connected to a set of inputs and a set of outputs; the inputs provide the values to be added, and the outputs hold the final result.
4. Memory access (MEM) stage.
While executing an instruction, the execution device may need to access memory to read operands, depending on the instruction. In this case, the execution device enters the memory access stage. The task of this stage is for the execution device to obtain the memory address of the operand from the instruction's address code and read the operand from memory for the operation.
5. Write-back (WB) stage.
As the final stage, the write-back stage writes the result of the execute stage back into some storage structure. For example, the result data is usually written into the internal registers of the execution device so that it can be quickly accessed by subsequent instructions; in some cases, the result data may also be written into memory, which is slower but cheaper and larger.
After the instruction has been executed and the result written back, the execution device obtains the address of the next instruction from the program counter and starts a new cycle; in the next instruction cycle, the next instruction is fetched in sequence.
From the description of the execution device's working process above, it can be seen that the execution device usually has to go through the memory access stage for each memory access instruction, and it can only operate on the data fetched from memory after that stage completes. As a result, when a program contains a large number of such instructions, the execution device must wait for data to be fetched from memory into the cache for every memory access instruction it processes, causing significant memory access latency.
In view of this, the industry generally tries to hide memory access latency through prefetching. Prefetching techniques mainly include software prefetching (SWP) and hardware prefetching (HWP).
Software prefetching explicitly inserts prefetch instructions into the program so that the execution device reads data at a specified address from memory into the cache. Prefetch instructions can be added automatically by the compiler or manually by the programmer. Software prefetching imposes almost no requirements on the hardware; its biggest technical challenge is how to correctly insert the prefetch instructions into the target code. Chained data structures are difficult to optimize through software prefetching, because computing the prefetch addresses for a chained data structure is expensive and easily leads to insufficient prefetch distance.
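A minimal sketch of conventional software prefetching on a linked list, using the GCC/Clang __builtin_prefetch intrinsic; it also illustrates the prefetch-distance problem described above, since only the immediately next node address is cheaply available:

```c
/* Conventional software prefetching on a linked list. Only n->next is
 * known at each step, so the prefetch can run at most one node ahead. */
struct node { int value; struct node *next; };

long sum_with_prefetch(const struct node *head) {
    long sum = 0;
    for (const struct node *n = head; n != NULL; n = n->next) {
        if (n->next != NULL)
            __builtin_prefetch(n->next, 0 /* read */, 3 /* keep in cache */);
        sum += n->value;
    }
    return sum;
}
```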
Hardware prefetching uses hardware to fetch likely future memory locations into the cache in advance, based on the history of memory accesses. Typical hardware prefetchers include stream prefetchers and stride prefetchers. A stream prefetcher automatically prefetches the data of the next cache line when it detects that the program is accessing data at increasing addresses. A stride prefetcher monitors each memory load instruction; when it detects regular strided reads, it computes the next address in advance and issues a prefetch. Most existing hardware prefetching techniques are based on assumptions of temporal and spatial locality. Linked data structures, however, are very unfriendly to current CPU memory access architectures, which is why current commercial CPUs perform poorly on such applications: complex irregular memory accesses are hard to prefetch.
Moreover, when current data prefetching techniques prefetch data in a chained data structure, it is often difficult to determine how much data to prefetch: prefetching too much data at once pollutes the cache, while prefetching too little fails to improve data access performance.
In view of this, embodiments of this application provide a compilation method and a data prefetching method. When access to a chained data structure is identified during compilation, a data access instruction and at least one piece of metadata indicating the addresses of the data to be accessed in the chained data structure are generated, and a prefetch instruction indicating the address of the data access instruction and the at least one piece of metadata is inserted before the data access instruction. In this way, when executing the compiled executable file, the execution device can determine the data access instruction and the at least one piece of metadata from the prefetch instruction, thereby prefetching the data in the chained data structure. Furthermore, after obtaining the address of the data access instruction from the prefetch instruction, the execution device can learn the progress of data access in the chained data structure from the number of times the data access instruction has been executed, and adaptively adjust the amount of prefetched data, ensuring effective prefetching of the data in the chained data structure.
The compilation method provided by the embodiments of this application can be applied to compiling code that accesses chained data structures, for example code in general-purpose computing, high-performance computing, databases, and artificial intelligence. The data prefetching method provided by the embodiments of this application can be applied to scenarios in which applications that need to access chained data structures are executed.
For example, the compilation method and the data prefetching method provided by the embodiments of this application can be applied to electronic devices. The electronic device may be, for example, a server, a smartphone (mobile phone), a personal computer (PC), a laptop, a tablet, a smart TV, a mobile internet device (MID), a wearable device, a virtual reality (VR) device, an augmented reality (AR) device, a wireless terminal in industrial control, a wireless terminal in self-driving, a wireless terminal in remote medical surgery, a wireless terminal in a smart grid, a wireless terminal in transportation safety, a wireless terminal in a smart city, a wireless terminal in a smart home, or the like.
Refer to Figure 4, which is a schematic flowchart of a compilation method according to an embodiment of this application. As shown in Figure 4, the compilation method includes the following steps 401-403.
Step 401: obtain first code.
In this embodiment, the first code may be program source code. Program source code is an uncompiled text file written according to a certain programming language specification, i.e., a sequence of human-readable computer language instructions. For example, the program source code may be written in a high-level language such as Java, C, C++, or Python.
Step 402: when code requesting access to a chained data structure is identified in the first code, generate at least one piece of metadata and a data access instruction according to the chained data structure, where the chained data structure includes multiple data items with discontinuous addresses, the at least one piece of metadata indicates the addresses of the data to be accessed in the chained data structure, and the data access instruction indicates the address of the chained data structure and requests access to the chained data structure.
In this embodiment, while compiling the first code, when the compiler identifies that the first code requests access to a chained data structure, the compiler can obtain the addresses of the data to be accessed in the chained data structure according to the actual structure of the chained data structure, because each data item in the chained data structure points to the address of the next data item; the compiler thereby generates at least one piece of metadata. The at least one piece of metadata generated by the compiler indicates the addresses of the multiple data items to be accessed in the chained data structure. The chained data structure includes multiple data items with discontinuous addresses, and the data items have address-pointing relationships with each other. In addition, the multiple data items to be accessed may be all of the data in the chained data structure or only part of it; this embodiment does not specifically limit this.
It should be noted that the at least one piece of metadata described in this embodiment refers to one or more pieces of metadata. For ease of description, "the at least one piece of metadata" is hereinafter simply referred to as "the metadata".
Optionally, the metadata generated by the compiler may be located in a code segment or a data segment of the second code, where the second code is obtained by compiling the first code; that is, the second code is in effect the executable file that the compiler produces from the first code. In other words, the metadata may be stored in the code segment of the second code as instruction code that is never executed, or it may be stored in the data segment of the second code as data of the program.
In addition, when the compiler identifies that the first code requests access to a chained data structure, the compiler also generates a data access instruction, so that the execution device can later access the chained data structure according to the data access instruction when executing the compiled executable file. The data access instruction is specifically used to request access to the chained data structure, and it also indicates the address of the chained data structure.
The address of the chained data structure may be the address of any data item in the chained data structure. It may be determined from the address of the first data item that needs to be accessed in the chained data structure. For example, in the first code, if the first data item that needs to be accessed is the first data item of the chained data structure, the data access instruction indicates the address of the first data item of the chained data structure; if the first data item that needs to be accessed is a data item in the middle of the chained data structure, the data access instruction indicates the address of that data item. In addition, the address of the chained data structure may indicate a specific single address, for example the starting address at which a data item is stored, or it may indicate an address range, for example the address range in which a data item is stored.
In addition, the addresses described in the embodiments of this application, such as the address of the chained data structure and the address of the data access instruction, may be physical storage addresses or virtual storage addresses; this embodiment does not specifically limit this.
Step 403: generate a prefetch instruction according to the at least one piece of metadata and the data access instruction to obtain compiled second code, where the prefetch instruction indicates the address of the data access instruction and the at least one piece of metadata.
After the compiler has generated the metadata and the data access instruction, the compiler further generates a prefetch instruction, which indicates the address of the data access instruction and the metadata. In addition, the compiler may insert the prefetch instruction before the data access instruction, so that during program execution, when the execution device runs the compiled executable file, it executes the prefetch instruction first and then the data access instruction.
In this embodiment, by inserting the prefetch instruction into the compiled executable file, the execution device can determine the data access instruction and the metadata from the prefetch instruction when running the executable file, thereby prefetching data in the chained data structure. That is, the execution device first determines the address of the data access instruction from the prefetch instruction, and obtains the starting storage address of the chained data structure by observing the data access instruction at that address; then, based on the starting storage address of the chained data structure and the metadata, it prefetches the data in the chained data structure in order. Moreover, after obtaining the address of the data access instruction from the prefetch instruction, the execution device can learn the progress of data access in the chained data structure from the number of times the data access instruction has been executed, and adaptively adjust the amount of prefetched data, ensuring effective prefetching of the data in the chained data structure.
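A conceptual, C-level sketch of what the compiled second code could look like after this step; prefetch_chain() stands in for the inserted prefetch instruction, the metadata table layout and all names are hypothetical, and the offset value mirrors the example below rather than any real encoding:

```c
#include <stddef.h>

struct node { int value; struct node *next; };

/* Stand-in for the inserted prefetch instruction: it carries the
 * offset to the data access site and names the metadata describing
 * the chain. This is an illustration only, not the actual encoding. */
static void prefetch_chain(long access_site_offset, const void *meta, int meta_count) {
    (void)access_site_offset; (void)meta; (void)meta_count;   /* hint only */
}

/* Metadata placed in the data segment: offset of the 'next' pointer
 * inside each node. */
static const struct { int pointer_offset; } chain_meta[] = {
    { (int)offsetof(struct node, next) }
};

long walk_list(const struct node *head) {
    long sum = 0;
    prefetch_chain(0x06, chain_meta, 1);     /* inserted before the access */
    for (const struct node *n = head; n != NULL; n = n->next)
        sum += n->value;                     /* the data access */
    return sum;
}
```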
Optionally, the prefetch instruction may directly indicate the address of the data access instruction; for example, the prefetch instruction indicates that the address of the data access instruction is 0x1002. The prefetch instruction may instead indicate the address offset between the prefetch instruction and the data access instruction. For example, if the address of the prefetch instruction is 0x1008 and the address of the data access instruction is 0x1002, the address offset between the two instructions is 0x06. It should be understood that, to keep the prefetch hint timely and effective, the prefetch instruction is generated before the data access instruction, and the address offset between the two instructions is small.
Therefore, in this embodiment, compared with directly indicating the address of the data access instruction in the prefetch instruction, indicating the address offset between the prefetch instruction and the data access instruction reduces the encoding space occupied in the prefetch instruction, thereby saving instruction overhead.
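A minimal sketch of how the execution device might recover the data access instruction's address from the encoded offset; the field names and the sign convention (offset subtracted from the prefetch instruction's own address, matching the 0x1008/0x1002 example above) are assumptions:

```c
#include <stdint.h>

/* Recovering the data access instruction's address from the offset
 * encoded in the prefetch instruction (illustrative layout). */
typedef struct {
    uint64_t pc;        /* address of the prefetch instruction itself    */
    uint32_t offset;    /* encoded offset to the data access instruction */
} prefetch_hint_t;

static uint64_t access_instruction_address(const prefetch_hint_t *h) {
    /* e.g. pc = 0x1008, offset = 0x06 -> access address 0x1002 */
    return h->pc - h->offset;
}
```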
The foregoing describes how the compiler generates the metadata and the data access instruction during compilation and inserts, before the data access instruction, a prefetch instruction indicating the metadata and the data access instruction. For ease of understanding, the metadata generated by the compiler is described in detail below.
Optionally, the prefetch instruction may indicate the content of the metadata, or it may indicate the storage address of the metadata.
There are several ways for the prefetch instruction to indicate the storage address of the metadata.
Implementation 1: the prefetch instruction indicates the starting storage address of the metadata and the quantity of metadata, and each piece of metadata has the same size.
In this embodiment, all of the pieces of metadata generated by the compiler are the same size, and their storage addresses are contiguous. The compiler can therefore indicate, in the prefetch instruction, the starting storage address of the metadata and the quantity of metadata. The execution device can then fetch the first piece of metadata based on the starting storage address and the metadata size, and, using the metadata size as the address stride, fetch the remaining pieces one by one, thereby prefetching all of the metadata.
For example, suppose there are four pieces of metadata: the first is stored at 0x0004-0x0007, the second at 0x0008-0x000b, the third at 0x000c-0x000f, and the fourth at 0x0010-0x0013. The compiler can then indicate in the prefetch instruction that the starting storage address of the metadata is 0x0004 and that there are four pieces. Based on the 4-byte metadata size and the starting storage address and quantity, the execution device can determine the storage address of each piece of metadata.
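A minimal sketch of the address computation for Implementation 1, assuming a fixed 4-byte entry size as in the example above:

```c
#include <stdint.h>
#include <stdio.h>

/* Implementation 1: fixed-size metadata entries stored contiguously.
 * The prefetch instruction only needs to carry the start address and
 * the entry count; each entry's address follows at a fixed stride. */
#define META_SIZE 4u   /* bytes per metadata entry (illustrative) */

static void list_metadata_addresses(uint64_t start, unsigned count) {
    for (unsigned i = 0; i < count; i++) {
        uint64_t lo = start + (uint64_t)i * META_SIZE;
        uint64_t hi = lo + META_SIZE - 1;
        printf("metadata %u: 0x%04llx-0x%04llx\n", i + 1,
               (unsigned long long)lo, (unsigned long long)hi);
    }
}

/* list_metadata_addresses(0x0004, 4) reproduces the example above:
 * 0x0004-0x0007, 0x0008-0x000b, 0x000c-0x000f, 0x0010-0x0013. */
```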
Implementation 2: the prefetch instruction indicates the starting storage address of the metadata and the size of each piece of metadata.
Compared with Implementation 1, in Implementation 2 the sizes of the pieces of metadata indicated by the prefetch instruction may differ.
For example, suppose there are four pieces of metadata, and the prefetch instruction indicates that their starting storage address is 0x0000, the first piece is 2 bytes, the second 4 bytes, the third 2 bytes, and the fourth 6 bytes. Based on the starting storage address of the metadata and the size of each piece, the execution device can determine that the first piece is stored at 0x0000-0x0001, the second at 0x0002-0x0005, the third at 0x0006-0x0007, and the fourth at 0x0008-0x000d.
It can be understood that, since the data in the chained data structure includes pointers to the addresses of other data, i.e., each data item contains a pointer to the address of the next data item, in some possible implementations the metadata generated by the compiler may not directly indicate the addresses of the data in the chained data structure, but may instead indicate where the pointer is located within each data item.
For example, the pieces of metadata generated by the compiler may correspond to different data items in the chained data structure, e.g., each piece of metadata corresponds one-to-one with a data item to be accessed in the chained data structure, and each piece of metadata indicates the position of the pointer within the corresponding data item.
Refer to Figure 5, which is a schematic diagram of the correspondence between data and metadata in a chained data structure according to an embodiment of this application. As shown in Figure 5, the chained data structure contains four data items to be accessed: data 1, data 2, data 3, and data 4. Based on these four data items, the compiler generates four pieces of metadata: metadata 1, metadata 2, metadata 3, and metadata 4, which correspond one-to-one with the four data items. Metadata 1 indicates that the offset of pointer 1 within data 1 (i.e., the offset of pointer 1's starting storage address relative to data 1's starting storage address) is 8; metadata 2 indicates that the offset of pointer 2 within data 2 is 14; metadata 3 indicates that the offset of pointer 3 within data 3 is 4; and metadata 4 indicates that the offset of pointer 4 within data 4 is 14. In this way, after obtaining each piece of metadata, the execution device can determine, from what the metadata indicates, the position of the pointer within each data item prefetched from the chained data structure, and then determine the address of the next data item that needs to be prefetched.
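A minimal sketch of locating the embedded pointer through such a metadata entry, assuming the entry carries only the pointer offset; the types and names are illustrative:

```c
#include <stdint.h>
#include <string.h>

/* Metadata that records where, inside a data item, the pointer to the
 * next item lives (offset in bytes from the item's base address). */
typedef struct {
    uint16_t pointer_offset;
} chain_meta_t;

/* Given the base address of a prefetched item and its metadata,
 * read out the address of the next item to prefetch. */
static const void *next_item_address(const void *item_base, chain_meta_t meta) {
    const uint8_t *p = (const uint8_t *)item_base + meta.pointer_offset;
    const void *next;
    memcpy(&next, p, sizeof next);   /* load the embedded pointer */
    return next;
}
```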
Optionally, each of the pieces of metadata generated by the compiler may also indicate the size of the other data pointed to by its corresponding data item. In this way, when the data items in the chained data structure differ in size, the execution device running the compiled executable file can determine, from the metadata, the size of the next data item to be prefetched, and prefetch that item based on its starting storage address and size.
For example, taking Figure 5 as an example, metadata 1 corresponds to data 1 and may also indicate the size of data 2, which data 1 points to. After determining the position of pointer 1 within data 1 from the offset indicated in metadata 1, the execution device can determine the starting storage address of data 2 from pointer 1; it then combines the size of data 2 indicated in metadata 1 with that starting storage address to determine the full storage range of data 2, thereby prefetching data 2.
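A sketch of the size-aware variant, assuming an extended metadata entry whose field names are illustrative; it derives the full address range of the next item rather than only its first bytes:

```c
#include <stdint.h>
#include <string.h>

/* Extended metadata: pointer offset plus the size of the item that the
 * pointer leads to. */
typedef struct {
    uint16_t pointer_offset;   /* where the pointer sits inside this item */
    uint16_t next_size;        /* size in bytes of the pointed-to item    */
} chain_meta_sized_t;

typedef struct { uint64_t base; uint64_t size; } prefetch_range_t;

/* Derive the address range of the next item so that the whole item can
 * be prefetched, not just its first cache line. */
static prefetch_range_t next_prefetch_range(const void *item_base,
                                            chain_meta_sized_t meta) {
    const void *next;
    memcpy(&next, (const uint8_t *)item_base + meta.pointer_offset, sizeof next);
    prefetch_range_t r = { (uint64_t)(uintptr_t)next, meta.next_size };
    return r;
}
```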
Optionally, each of the pieces of metadata generated by the compiler may also indicate the type of its corresponding data item and the type of the other data pointed to by that data item.
Refer to Figure 6, which is a schematic diagram of data types in a chained data structure according to an embodiment of this application. As shown in Figure 6, in a backbone-rib chained data structure, some data items are linked to multiple data items, i.e., they point to the addresses of multiple other data items. In this case, data items in the same row are given the same type: the data type of the first row is 0, that of the second row is 1, and that of the third row is 2. Then, when a piece of metadata indicates the data types 0 and 1, the execution device can determine that the data item corresponding to the metadata belongs to the first row and that the data item it points to belongs to the second row; similarly, when a piece of metadata indicates the data types 0 and 0, the execution device can determine that the data item corresponding to the metadata belongs to the first row and that the data item it points to also belongs to the first row.
In this way, by indicating the data type in the metadata, it can be determined which data item the pointer indicated by the metadata actually points to, and thereby the link relationships among the data in the chained data structure can be determined, so that the execution device can effectively prefetch data even in complex chained data structures.
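A sketch of a metadata entry carrying both type fields; the field names, the (0, 1)/(0, 0) reading, and the offsets are assumptions used only to mirror the Figure 6 description:

```c
#include <stdint.h>

/* Metadata variant that also records the type of the item it describes
 * and the type of the item its pointer leads to; type values 0, 1, 2
 * stand for the rows in the backbone-rib example. */
typedef struct {
    uint16_t pointer_offset;   /* position of the pointer in this item */
    uint8_t  src_type;         /* type of the described item           */
    uint8_t  dst_type;         /* type of the pointed-to item          */
} chain_meta_typed_t;

/* A (0, 1) entry: a first-row item pointing into the second row;
 * a (0, 0) entry: a first-row item pointing to another first-row item. */
static const chain_meta_typed_t example_meta[] = {
    { .pointer_offset = 8,  .src_type = 0, .dst_type = 1 },
    { .pointer_offset = 16, .src_type = 0, .dst_type = 0 },
};
```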
The foregoing describes a compilation method provided by an embodiment of this application. For ease of understanding, a data prefetching method provided by an embodiment of this application is described below, to show how the execution device prefetches data based on the compiled executable file.
Refer to Figure 7, which is a schematic flowchart of a data prefetching method according to an embodiment of this application. As shown in Figure 7, the data prefetching method includes the following steps 701-704. The data prefetching method is applied to a first instance of a computer system, and the computer system further includes a second instance.
Step 701: the first instance obtains a prefetch instruction, where the prefetch instruction indicates the address of a data access instruction and at least one piece of metadata, the data access instruction indicates the address of a chained data structure, the chained data structure includes multiple data items with discontinuous addresses, and the at least one piece of metadata indicates the addresses of data in the chained data structure.
In this embodiment, while executing the executable file of an application, the first instance in the execution device can obtain the prefetch instruction in the executable file. The prefetch instruction in the executable file is the prefetch instruction produced by the compilation method described above; for details, refer to that compilation method, which is not repeated here.
The second instance is used to execute the data access instruction to request access to the data in the chained data structure. Specifically, the second instance executes the executable file of the application, while the first instance independently executes the data prefetching method provided by this embodiment of the application.
It should be noted that the first instance and the second instance in this embodiment may be two physically independent execution units, for example two independent processors or processor cores. They may also be two virtual independent execution units, for example different threads, hyper-threads, or processes; this embodiment does not specifically limit this.
Step 702: the first instance obtains the address of the chained data structure according to the address of the data access instruction.
In this embodiment, after the second instance obtains the prefetch instruction in the executable file, the execution device executes the prefetch instruction to start the first instance. Once started, the first instance can likewise obtain the prefetch instruction. Since the prefetch instruction indicates the metadata and the address of the data access instruction, the first instance can obtain the metadata from the prefetch instruction and temporarily store it, so that data can later be prefetched based on the metadata. In addition, the execution device can obtain the address of the chained data structure indicated by the data access instruction, based on the address of the data access instruction indicated by the prefetch instruction.
Specifically, since the prefetch instruction is inserted before the data access instruction, after the first instance obtains the address of the data access instruction from the prefetch instruction, it can monitor that address in real time to determine when the second instance reaches the data access instruction. When the second instance executes the data access instruction, the first instance can obtain the address of the chained data structure from that data access instruction.
The address of the chained data structure may be the address of any data item in the chained data structure. It may be determined from the address of the first data item that needs to be accessed in the chained data structure. For example, in the first code, if the first data item that needs to be accessed is the first data item of the chained data structure, the data access instruction indicates the address of the first data item of the chained data structure; if the first data item that needs to be accessed is a data item in the middle of the chained data structure, the data access instruction indicates the address of that data item. In addition, the address of the chained data structure may indicate a specific single address, for example the starting address at which a data item is stored, or it may indicate an address range, for example the address range in which a data item is stored.
Step 703: the first instance prefetches data in the chained data structure according to the address of the chained data structure and the at least one piece of metadata.
After obtaining the starting storage address of the chained data structure and the metadata, the first instance can prefetch the data in the chained data structure item by item, based on that starting storage address and the addresses of the data to be accessed indicated by the metadata.
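A minimal sketch of how such a prefetch walk might look, assuming the offset-style metadata described earlier and using the GCC/Clang __builtin_prefetch intrinsic to stand in for the actual prefetch operation; all names are illustrative:

```c
#include <stdint.h>
#include <string.h>

typedef struct { uint16_t pointer_offset; uint16_t next_size; } chain_meta_t;

/* Walk the chain from its starting address, issuing a prefetch for each
 * item and following the embedded pointer located via the metadata. */
static void prefetch_chain_walk(const void *chain_start,
                                const chain_meta_t *meta, unsigned meta_count) {
    const void *item = chain_start;
    for (unsigned i = 0; i < meta_count && item != NULL; i++) {
        __builtin_prefetch(item, 0, 3);                 /* bring the item into cache */
        const uint8_t *ptr_loc = (const uint8_t *)item + meta[i].pointer_offset;
        const void *next;
        memcpy(&next, ptr_loc, sizeof next);            /* read pointer to next item */
        item = next;
    }
}
```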
步骤704,所述第二实例执行所述数据访问指令,以访问所述链式数据结构中的数据。Step 704: The second instance executes the data access instruction to access data in the chained data structure.
其中,所述第一实例在预取所述链式数据结构中数据的过程中,所述第一实例根据所述第二实例执行所述数据访问指令的次数来控制预取所述链式数据结构中数据的进度,所述进度用于使所述链式数据结构中的数据在被访问前已预取到缓存中。Wherein, in the process of prefetching data in the chained data structure, the first instance controls the prefetching of the chained data according to the number of times the second instance executes the data access instruction. The progress of the data in the structure, which is used to make the data in the chained data structure prefetched into the cache before being accessed.
具体来说,在第一实例预取数据的过程中,第一实例在链式数据结构中的预取数据的数量是与数据访问指令的执行次数有关的,以保证第一实例所预取的数据始终多于实际所访问的数据。其中,数据访问指令的执行次数表示了链式数据结构中已访问的数据的个数。第二实例每执行一次数据访问指令,则代表链式数据结构中已访问的数据增加一个。具体来说,该数据访问指令可以是指示第二实例访问某个寄存器内特定位置所指示的地址,以实现访问链式数据结构中的数据;此外,第二实例还在根据寄存器内特定位置所指示的地址获取到需要访问的数据之后,将该寄存器内的数据替换为新获取到的数据。这样一来,第二实例继续执行该数据访问指令时,则能够根据寄存器内新的数据所指示的地址,获取到链式数据结构中的下一个数据。Specifically, during the process of prefetching data by the first instance, the amount of prefetched data in the chained data structure of the first instance is related to the number of execution times of the data access instructions to ensure that the data prefetched by the first instance There is always more data than is actually accessed. Among them, the number of execution times of data access instructions represents the number of accessed data in the chain data structure. Each time the second instance executes a data access instruction, it means that the accessed data in the chained data structure increases by one. Specifically, the data access instruction may instruct the second instance to access the address indicated by a specific location in a register to access the data in the chained data structure; in addition, the second instance also accesses the address indicated by the specific location in the register. After the indicated address obtains the data that needs to be accessed, the data in the register is replaced with the newly obtained data. In this way, when the second instance continues to execute the data access instruction, it can obtain the next data in the chain data structure according to the address indicated by the new data in the register.
例如,假设链式数据结构中的每个数据中指向特定类型数据的指针都是位于相同的位置,即指针在数据中的偏移都是相同的。这样,数据访问指令可以是指示访问某个寄存器内特定偏移下所指示的地址,即访问寄存器内所保存的数据中特定偏移下的指针所指示的地址。那么,在第二实例每次执行数据访问指令后,寄存器内的数据都会发生变化,且第二实例下一次执行数据访问指令时,能够根据寄存器内的数据来访问链式数据结构内的下一个数据。For example, assume that the pointer to a specific type of data in each data structure in the chained data structure is located at the same location, that is, the offset of the pointer in the data is the same. In this way, the data access instruction may be an instruction to access an address indicated at a specific offset in a certain register, that is, to access an address indicated by a pointer at a specific offset in the data stored in the register. Then, every time the second instance executes the data access instruction, the data in the register will change, and the next time the second instance executes the data access instruction, it can access the next one in the chained data structure based on the data in the register. data.
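As a rough illustration, the following C sketch shows the kind of traversal this access pattern corresponds to; the structure and field names (Node, payload, next) are assumptions made for illustration only and are not taken from the verification program discussed later.

#include <stddef.h>

struct Node {
    int          payload;   /* data carried by this node                      */
    struct Node *next;      /* pointer at the same fixed offset in every node */
};

int sum_list(const struct Node *cur) {
    int sum = 0;
    while (cur != NULL) {
        sum += cur->payload;
        /* The load "cur = cur->next" corresponds to the data access instruction
         * discussed above: it reads the address stored at a fixed offset of the
         * data currently held in a register and overwrites that register with it. */
        cur = cur->next;
    }
    return sum;
}

Each iteration of such a loop executes the data access instruction exactly once, which is why its execution count tracks how many data items of the chain have already been accessed.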
In other words, after the first instance obtains the address of the data access instruction based on the prefetch instruction, it can monitor in real time the number of times the data access instruction is executed, learn from that count how far data accesses in the chained data structure have progressed, and adaptively adjust the amount of prefetched data, thereby avoiding prefetching too little or too much data and ensuring effective prefetching of the data in the chained data structure.
Optionally, since the number of times the data access instruction has been executed can indicate the number of data items actually accessed in the chained data structure, the first instance may keep the difference between the number of prefetched data items and the number of actually accessed data items in the chained data structure within a preset range. For example, if the preset range is 5-10, the first instance may keep the number of prefetched data items in the chained data structure always 5-10 greater than the number of data items actually accessed, so as to guarantee timely prefetching while avoiding polluting the cache with too much prefetched data.
It can be understood that the value of the preset range can be adjusted according to the actual application scenario. For example, when the running device has a large cache and the data access performance requirements are high, the value of the preset range can be adjusted to a larger value; when the running device has a small cache and the data access performance requirements are not high, the value of the preset range can be adjusted to a smaller value.
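A minimal sketch of this throttling rule, under the assumption that the prefetcher keeps two counters (the number of prefetched items and the execution count of the data access instruction), might look as follows; the function and parameter names are illustrative only and do not describe the actual hardware.

/* Decide whether the chained prefetcher should issue the next prefetch.
 * prefetched: number of data items already prefetched
 * accessed:   number of times the data access instruction has executed
 * min_ahead/max_ahead: the preset range, e.g. 5 and 10 in the example above */
int should_prefetch_next(unsigned prefetched, unsigned accessed,
                         unsigned min_ahead, unsigned max_ahead) {
    unsigned ahead = prefetched - accessed;
    if (ahead < min_ahead)
        return 1;   /* lagging behind: keep prefetching                 */
    if (ahead > max_ahead)
        return 0;   /* far enough ahead: pause to avoid cache pollution */
    return 1;       /* inside the preset range: continue                */
}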
Optionally, the prefetch instruction obtained by the first instance may specifically indicate the address offset between the prefetch instruction and the data access instruction. In this way, the running device can determine the actual address of the data access instruction based on the actual address of the prefetch instruction and the address offset between the two.
It should be noted that, in the process of the first instance prefetching data in the chained data structure, the data prefetched by the first instance may contain no pointers to other data, or part of the prefetched data may contain pointers to other data while the rest does not. Similarly, when the second instance executes the data access instruction, the data accessed by the second instance may contain no pointers to other data, or part of the accessed data may contain pointers to other data while the rest does not. The embodiments of this application do not specifically limit the content of the prefetched data or of the accessed data.
In some possible implementations, the above prefetch instruction may also indicate the content of the metadata, or indicate the storage address of the metadata. In the case where the prefetch instruction indicates the storage address, the running device obtains the metadata based on the storage address of the metadata.
For example, in the case where each of the multiple metadata items has the same size, the prefetch instruction specifically indicates the starting storage address of the multiple metadata items and the number of the multiple metadata items. After obtaining the prefetch instruction, the running device therefore obtains the multiple metadata items starting from the starting storage address of the metadata, according to their number and size. Specifically, the running device first fetches the first of the multiple metadata items according to the starting storage address and the metadata size, and then, using the metadata size as the address offset, continues to fetch the remaining metadata items, thereby prefetching all of the metadata.
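Assuming, purely for illustration, that every metadata item occupies META_SIZE bytes, the walk over the metadata described above can be sketched as:

#include <stdint.h>

#define META_SIZE 4u   /* assumed metadata size; the text leaves the size open as X bytes */

/* Fetch 'count' fixed-size metadata items starting at address 'meta_base'. */
void fetch_all_metadata(uintptr_t meta_base, unsigned count, uint32_t *out) {
    for (unsigned i = 0; i < count; i++) {
        /* each subsequent item lies META_SIZE bytes after the previous one */
        out[i] = *(const uint32_t *)(meta_base + (uintptr_t)i * META_SIZE);
    }
}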
In addition, the prefetch instruction may also indicate the storage address of the metadata in other ways. For details, refer to the embodiment corresponding to Figure 3 above, which is not repeated here.
Optionally, since the data in the chained data structure contains pointers pointing to the addresses of other data within the same chained data structure, the multiple metadata items indicated in the prefetch instruction may correspond respectively to different data items in the chained data structure, and each of the multiple metadata items is used to indicate the position of the pointer within its corresponding data item.
That is to say, the metadata generated by the compiler may not directly indicate the addresses of the data in the chained data structure, but instead indicate where the pointer is located within each data item. In this case, the running device may implement data prefetching based on the following steps.
First, the running device prefetches the first data item of the chained data structure according to the starting storage address of the chained data structure.
Then, the running device obtains the pointer in already prefetched data according to the prefetched data in the chained data structure and the metadata corresponding to that prefetched data, where the prefetched data includes the first data item of the chained data structure or other data obtained by continuing to prefetch from that first data item.
Finally, the running device prefetches, from the chained data structure, the other data pointed to by the prefetched data, according to the address pointed to by the pointer in the prefetched data.
For example, taking Figure 5 as an example, assume that the running device has prefetched data 1 (i.e., the first data item) of the chained data structure according to the starting storage address of the chained data structure. The running device then obtains the position of the pointer in data 1 based on data 1 and the metadata 1 corresponding to data 1, and thereby obtains the pointer in data 1. Finally, the running device prefetches, from the chained data structure, data 2 pointed to by data 1 according to the address pointed to by the pointer in data 1. Similarly, after prefetching data 2, the running device can continue to prefetch data 3 pointed to by data 2 based on data 2 and the metadata 2 corresponding to data 2, and so on in a loop until enough data has been prefetched.
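Under the assumption that each metadata item records only the byte offset of the pointer inside its corresponding data item, this chained prefetch loop could be sketched roughly as follows; the struct and the use of the GCC builtin __builtin_prefetch are illustrative choices, not the actual hardware mechanism, and in the described scheme the pointer would be taken from the cache line returned by the previous prefetch.

#include <stdint.h>

/* Illustrative metadata: only the pointer offset inside the data item is modelled. */
struct meta {
    uint32_t ptr_offset;   /* byte offset of the pointer within the corresponding data */
};

/* Start at 'addr' (data 1), follow the pointer found at the offset given by each
 * data item's metadata, and prefetch up to 'depth' data items of the chain. */
void prefetch_chain(uintptr_t addr, const struct meta *metas, unsigned depth) {
    for (unsigned i = 0; i < depth && addr != 0; i++) {
        __builtin_prefetch((const void *)addr);            /* prefetch data i+1        */
        uintptr_t ptr_slot = addr + metas[i].ptr_offset;   /* locate its pointer       */
        addr = *(const uintptr_t *)ptr_slot;               /* address of the next data */
    }
}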
Optionally, the at least one metadata is further used to indicate the size of the other data pointed to by the corresponding data. For details, refer to the above embodiments, which are not repeated here.
Optionally, each of the at least one metadata is further used to indicate the type of the corresponding data and the type of the other data pointed to by the corresponding data. For details, refer to the above embodiments, which are not repeated here.
For ease of understanding, the compilation method and data prefetching method described above are introduced in detail below with reference to specific examples.
Refer to Figure 8, which is a schematic diagram of a system architecture provided by an embodiment of this application. As shown in Figure 8, the system architecture includes a compiler and a running device. The compiler is used to compile the application source code to obtain an executable file of the application, and the executable file produced by the compiler includes prefetch instructions, data access instructions and metadata. The running device is used to execute the executable file of the application and to obtain the data access instructions and metadata according to the prefetch instructions in the executable file, so as to implement prefetching of the data in the chained data structure.
Refer to Figure 9, which is a schematic flowchart of a compilation method provided by an embodiment of this application. As shown in Figure 9, the compilation method includes the following steps 901-904.
Step 901: The compiler identifies the memory access behavior for the chained data structure in the source code, extracts the data link relationships in the chained data structure, and obtains at least one metadata.
While the compiler is compiling the source code, when the compiler recognizes that the source code contains memory access behavior for a chained data structure, the compiler extracts the data link relationships in the chained data structure and obtains at least one metadata. Each metadata item corresponds to one data item in the chained data structure, and each metadata item indicates another data item pointed to by its corresponding data item. In addition, each metadata item may also indicate the size and data type of the other data item pointed to by its corresponding data item.
In some embodiments, to facilitate the management of the metadata, every metadata item generated by the compiler has the same size. For example, every metadata item generated by the compiler is stored as X bytes, where X can be set according to different application scenarios and is not specifically limited here.
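As a hedged illustration of what such a compiler pass might record, the following sketch derives one metadata entry for a hypothetical node type; the ListNode struct, the meta_entry layout and the field names are assumptions for illustration and are not the encoding actually used by this embodiment.

#include <stddef.h>
#include <stdint.h>

/* Hypothetical node type in the program being compiled. */
struct ListNode {
    uint64_t         key;
    struct ListNode *next;
};

/* Unpacked, illustrative form of one metadata entry. */
struct meta_entry {
    uint32_t node_id;         /* type of the data item the entry describes   */
    uint32_t ptr_offset;      /* where the pointer sits inside that data     */
    uint32_t next_node_id;    /* type of the data item the pointer points to */
    uint32_t next_node_size;  /* size of the data item the pointer points to */
};

/* What the pass could emit for the 'next' pointer of ListNode. */
static const struct meta_entry listnode_meta = {
    .node_id        = 0,
    .ptr_offset     = offsetof(struct ListNode, next),
    .next_node_id   = 0,                         /* next points to another ListNode */
    .next_node_size = sizeof(struct ListNode),
};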
Step 902: The compiler generates a data access instruction based on the memory access behavior for the chained data structure in the source code.
The data access instruction is used to instruct access to data in the chained data structure, and the data access instruction also indicates the starting storage address of the chained data structure.
Step 903: The compiler inserts a prefetch instruction before the data access instruction to indicate the address of the data access instruction and the metadata.
In this embodiment, the prefetch instruction is used to indicate the address of the data access instruction and the metadata, and the prefetch instruction is inserted before the data access instruction. That is, in the application execution phase, when the running device executes the compiled executable file, it first executes the prefetch instruction and then executes the data access instruction, so as to facilitate data prefetching.
Specifically, the prefetch instruction may indicate the starting storage address of the metadata, the number of metadata items, and the address of the data access instruction (for example, the offset between the data access instruction and the prefetch instruction).
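For illustration only, the three pieces of information carried by such a prefetch instruction could be modelled as the operand record below; the field names and widths are assumptions rather than the actual instruction encoding. In the concrete example of Figure 10B discussed below, these three fields take the values 2, 0x104 and 0xc0.

#include <stdint.h>

/* Illustrative operands of the prefetch instruction (not the real encoding). */
struct prefetch_operands {
    uint32_t meta_count;      /* number of metadata items                             */
    int32_t  meta_offset;     /* offset from the prefetch instruction to the metadata */
    int32_t  access_offset;   /* offset from the prefetch instruction to the data
                                 access instruction                                   */
};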
Step 904: The compiler generates an executable file carrying the prefetch instruction, the data access instruction and the metadata.
Finally, after the compiler has finished compiling the source code, an executable file carrying the prefetch instruction, the data access instruction and the metadata can be generated.
To describe the embodiments of this application in detail, this embodiment extracts the access pattern for chained data structures from a typical workload and constructs a verification program. The verification program is then compiled with a compiler to which a new optimization PASS has been added, generating a binary program (i.e., an executable file) with prefetch instructions, and the binary program is run on a simulator for verification.
Refer to Figure 10A, which is a schematic diagram of compiling the verification program with an existing compiler, provided by an embodiment of this application. As shown in Figure 10A, part a of Figure 10A shows part of the chained data structure, and part b of Figure 10A shows the code in the verification program that instructs access to the chained data structure. In part c of Figure 10A, for the code in the verification program that instructs access to the chained data structure, the existing compiler generates the corresponding data access instruction after compilation.
Refer to Figure 10B, which is a schematic diagram of compiling the verification program with the compiler having the newly added optimization PASS, provided by an embodiment of this application. As shown in Figure 10B, in this embodiment of the application an optimization PASS is added to the compiler, so that with the newly added optimization PASS the corresponding prefetch instruction is generated for the memory access behavior of the chained data structure during the compilation phase. As shown in part c of Figure 10B, compared with the assembly code compiled by the existing compiler in Figure 10A, the assembly code compiled by the compiler with the newly added optimization PASS also includes the corresponding prefetch instruction and metadata. The newly added prefetch instruction is the instruction at address 400b00, which is specifically [2, 0x104, 0xc0]. In the prefetch instruction, 2 indicates the number of metadata items, 0x104 indicates the address offset between the prefetch instruction and the metadata, and 0xc0 indicates the address offset between the prefetch instruction and the data access instruction. Based on the address of the prefetch instruction and the address offset 0x104 indicated by the prefetch instruction, the running device can determine that the address of the metadata is 400c04; based on the address of the prefetch instruction and the address offset 0xc0 indicated by the prefetch instruction, the running device can determine that the address of the data access instruction is 400bc0.
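The address reconstruction of this example can be written out explicitly; the sketch below simply repeats the arithmetic and assumes the three operands have already been decoded from the instruction.

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uintptr_t prefetch_pc   = 0x400b00;  /* address of the prefetch instruction   */
    uint32_t  meta_count    = 2;         /* number of metadata items              */
    uint32_t  meta_offset   = 0x104;     /* offset to the metadata                */
    uint32_t  access_offset = 0xc0;      /* offset to the data access instruction */

    uintptr_t meta_addr   = prefetch_pc + meta_offset;    /* 0x400c04 */
    uintptr_t access_addr = prefetch_pc + access_offset;  /* 0x400bc0 */

    printf("%u metadata items at %#lx, data access instruction at %#lx\n",
           meta_count, (unsigned long)meta_addr, (unsigned long)access_addr);
    return 0;
}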
With reference to Figure 10B, one implementation in which the compiler of this embodiment generates metadata based on the chained data structure is described in detail below.
When compiling the verification program, the compiler recognizes the memory access behavior for the chained data structure in the verification program. For the verification program shown in Figure 10B, the compiler recognizes the loop shown in part b of Figure 10B and generates the corresponding metadata according to the data structure information shown in part a of Figure 10B.
Specifically, the metadata generated by the compiler is stored in 4-byte units, and each metadata item stores the information about one data item in the chained data structure and about another data item pointed to by a pointer in that data item. The content stored in each metadata item includes the following five kinds of information.
1. Node identifier (Node-ID). Each type of data in the chained data structure is assigned a Node-ID. The Node-ID in a metadata item indicates the type of the data corresponding to that metadata item.
2. Address offset (Offset). Offset, in units of N bytes, stores the relative offset of the pointer (Ptr) within the data corresponding to the current metadata item, that is, it indicates the position of the pointer within the data corresponding to the metadata item.
3. Nextnode-ID. Nextnode-ID indicates the ID of the next data item pointed to by Ptr, that is, the type of the other data item pointed to by the data corresponding to the metadata item.
4. Nextnode-size. Nextnode-size, in units of M bytes, stores the size of the next data item pointed to by Ptr, that is, the size of the other data item pointed to by the data corresponding to the metadata item.
5. RSV: reserved for other metadata information that may need to be provided to the hardware in future extensions.
It should be noted that the above N and M, as well as the encoding space occupied by each field, can be adjusted according to the application and the architecture; this embodiment does not limit the specific values of these parameters.
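Purely as an illustration of one possible way to pack these five fields into a 4-byte word, the following C bit-field uses assumed field widths; the actual widths of Node-ID, Offset, Nextnode-ID, Nextnode-size and RSV are left open by this embodiment and may differ, and the layout of C bit-fields is itself implementation-defined.

#include <stdint.h>

/* One possible 32-bit packing of a metadata item (assumed field widths). */
struct chain_meta {
    uint32_t node_id        : 4;   /* type of the described data item               */
    uint32_t offset         : 8;   /* pointer offset inside it, in units of N bytes */
    uint32_t next_node_id   : 4;   /* type of the data item the pointer points to   */
    uint32_t next_node_size : 8;   /* its size, in units of M bytes                 */
    uint32_t rsv            : 8;   /* reserved for future extensions                */
};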
Specifically, this example of the application allocates the encoding space of the metadata in the above manner and specifies that Offset and Nextnode-size are expressed in bytes; then, for the program shown in Figure 10B, the metadata generated by the compiler is as shown in Table 1.
Table 1
Metadata      Node-ID   Offset   Nextnode-ID   Nextnode-size
0x00001800    0         0        0             24
0x00805000    0         16       1             16
Since the program shown in part b of Figure 10B needs to access the data of the BackboneNode node and the data of the RibNode node, the compiler passes to the hardware, through the metadata, the information needed to calculate the addresses of these two kinds of nodes. Since the memory access behavior in part b of Figure 10B does not access the data of ArcNode, no metadata for calculating the ArcNode address needs to be provided; that is, the metadata generated by the compiler is used to indicate the addresses of the data to be accessed.
In Table 1, the metadata 0x00001800 has the following meaning: at offset=0 of the node represented by NodeID=0 (BackboneNode), there is a pointer to Nextnode-ID=0 (the BackboneNode node type), and the size of the node represented by Nextnode-ID=0 is 24 bytes (Nextnode-size=24). That is, from this metadata item the running device can calculate the address of the next BackboneNode node that needs to be prefetched in each iteration.
In addition, the metadata 0x00805000 has the following meaning: at offset=16 of the node represented by NodeID=0 (BackboneNode), there is a pointer to Nextnode-ID=1 (the RibNode node type), and the size of the node represented by Nextnode-ID=1 is 16 bytes (Nextnode-size=16). That is, from this metadata item the running device can calculate the address of the next RibNode node that needs to be prefetched in each iteration.
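Putting the two metadata items together, the per-iteration address computation they enable can be sketched as follows; the node layouts are assumptions that merely match the offsets and sizes above (a next-BackboneNode pointer at offset 0, a RibNode pointer at offset 16, node sizes of 24 and 16 bytes) and are not the actual definitions from the verification program.

#include <stdint.h>

/* Assumed layouts consistent with Table 1 on a 64-bit machine. */
struct RibNode      { uint8_t payload[16]; };   /* 16-byte RibNode */
struct BackboneNode {
    struct BackboneNode *next;   /* offset 0,  described by metadata 0x00001800 */
    uint64_t             value;
    struct RibNode      *rib;    /* offset 16, described by metadata 0x00805000 */
};                               /* 24-byte BackboneNode */

/* Addresses the prefetcher derives for the next iteration. */
void next_prefetch_addresses(const struct BackboneNode *cur,
                             uintptr_t *next_backbone, uintptr_t *next_rib) {
    *next_backbone = (uintptr_t)cur->next;   /* next 24-byte BackboneNode to prefetch */
    *next_rib      = (uintptr_t)cur->rib;    /* next 16-byte RibNode to prefetch      */
}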
Refer to Figure 11, which is a schematic flowchart of a data prefetching method provided by an embodiment of this application. As shown in Figure 11, the data prefetching method includes the following steps 1101-1108.
Step 1101: The running device determines whether the instruction is a prefetch instruction.
While the running device is executing the executable file of the application, the decoding unit in the running device determines whether the instruction currently to be executed is a prefetch instruction.
Step 1102: The running device initializes the chained prefetcher based on the prefetch instruction, so as to obtain and save the metadata indicated by the prefetch instruction and the address of the data access instruction.
If the instruction currently being decoded by the running device is a prefetch instruction, the running device initializes the chained prefetcher based on the prefetch instruction. In this way, the chained prefetcher in the running device can obtain and save the metadata indicated by the prefetch instruction and the address of the data access instruction.
Step 1103: The running device determines whether the instruction is a data access instruction.
Since the prefetch instruction is inserted before the data access instruction, after the running device has obtained the address of the data access instruction based on the prefetch instruction, the decoding unit in the running device continuously monitors whether execution has reached the data access instruction, that is, whether the instruction currently being decoded is the data access instruction indicated by the prefetch instruction.
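A minimal sketch of this monitoring step, under the assumption that the prefetcher simply compares the program counter of every decoded instruction with the saved address of the data access instruction, might be:

#include <stdint.h>

/* Illustrative state of the chained prefetcher. */
struct chain_prefetcher {
    uintptr_t data_access_pc;   /* address saved from the prefetch instruction    */
    unsigned  exec_count;       /* times the data access instruction has executed */
};

/* Called for every decoded instruction; returns 1 when the data access
 * instruction indicated by the prefetch instruction has been reached. */
int on_decode(struct chain_prefetcher *p, uintptr_t pc) {
    if (pc == p->data_access_pc) {
        p->exec_count++;        /* one more data item of the chain has been accessed */
        return 1;
    }
    return 0;
}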
Step 1104: The running device obtains the starting storage address of the chained data structure based on the data access instruction.
The data access instruction indicates the starting storage address of the chained data structure.
Step 1105: The running device sends a prefetch request to the cache according to the starting storage address.
Step 1106: The running device determines whether the cache has returned the prefetched data.
If the running device determines that the cache has returned the prefetched data, the running device continues with step 1107.
Step 1107: The running device calculates the next prefetch address based on the metadata indicated by the prefetch instruction and the returned data.
Since the metadata indicates the position of the pointer within its corresponding data, and the pointer in that data in turn indicates the address of the next data item, the chained prefetcher in the running device can calculate the next prefetch address based on the metadata indicated by the prefetch instruction and the returned data.
Step 1108: The running device determines, based on the number of times the data access instruction has been executed and the number of prefetched data items, whether to stop prefetching.
Since the prefetch instruction indicates the address of the data access instruction, the running device can monitor the number of times the data access instruction is executed based on that address. When the difference between the number of prefetched data items and the number of times the data access instruction has been executed is smaller than the preset range, the running device continues to prefetch data based on the calculated next prefetch address; when the difference between the number of prefetched data items and the number of times the data access instruction has been executed is greater than the preset range, the running device stops prefetching data.
The compilation method and the data prefetching method provided by the embodiments of this application have been introduced above. The execution devices used to perform the above methods are introduced below.
For details, refer to Figure 12, which is a schematic structural diagram of a compilation apparatus 1200 provided by an embodiment of this application. The compilation apparatus 1200 includes an acquisition unit 1201 and a processing unit 1202. The acquisition unit 1201 is configured to obtain a first code. The processing unit 1202 is configured to, when it is recognized that the first code contains code requesting access to a chained data structure, generate a data access instruction and at least one metadata according to the chained data structure, where the chained data structure includes a plurality of data with discontinuous addresses, the at least one metadata is used to indicate addresses of data in the chained data structure, and the data access instruction is used to indicate the address of the chained data structure and to request access to the chained data structure. The processing unit 1202 is further configured to generate a prefetch instruction according to the at least one metadata and the data access instruction to obtain a compiled second code, where the prefetch instruction is used to indicate the address of the data access instruction and the at least one metadata.
In a possible implementation, the data in the chained data structure includes pointers pointing to addresses of other data, the at least one metadata respectively corresponds to different data in the chained data structure, and each of the at least one metadata is used to indicate the position of the pointer in the corresponding data.
In a possible implementation, the at least one metadata is further used to indicate the size of the other data pointed to by the corresponding data.
In a possible implementation, each of the at least one metadata is further used to indicate the type of the corresponding data and the type of the other data pointed to by the corresponding data.
In a possible implementation, the prefetch instruction is specifically used to indicate the address offset between the prefetch instruction and the data access instruction.
In a possible implementation, the prefetch instruction is specifically used to indicate the address of the at least one metadata.
In a possible implementation, the prefetch instruction is used to indicate the starting address of the at least one metadata and the quantity of the at least one metadata, and the at least one metadata has the same size.
In a possible implementation, the at least one metadata is located in a code segment or a data segment of a second code, and the second code is obtained by compiling the first code.
For details, refer to Figure 13, which is a schematic structural diagram of a data prefetching apparatus 1300 provided by an embodiment of this application. The data prefetching apparatus 1300 includes an acquisition unit 1301, a prefetch unit 1302 and an execution unit 1303. The acquisition unit 1301 is configured to obtain a prefetch instruction, where the prefetch instruction is used to indicate the address of a data access instruction and at least one metadata, the data access instruction is used to indicate the address of a chained data structure, the chained data structure includes a plurality of data with discontinuous addresses, and the at least one metadata is used to indicate addresses of data in the chained data structure. The acquisition unit 1301 is further configured to obtain the address of the chained data structure according to the address of the data access instruction. The prefetch unit 1302 is configured to prefetch data in the chained data structure according to the address of the chained data structure and the at least one metadata. The execution unit 1303 is configured to execute the data access instruction to access data in the chained data structure. In the process of the first instance prefetching data in the chained data structure, the first instance controls the progress of prefetching data in the chained data structure according to the number of times the second instance executes the data access instruction, where the progress is used to ensure that data in the chained data structure has been prefetched into the cache before being accessed.
In a possible implementation, in the process of the prefetch unit 1302 prefetching data in the chained data structure, the difference between the amount of prefetched data and the amount of accessed data is within a preset range.
In a possible implementation, the data in the chained data structure includes pointers pointing to addresses of other data in the chained data structure, the at least one metadata respectively corresponds to different data in the chained data structure, and each of the at least one metadata is used to indicate the position of the pointer in the corresponding data.
The prefetch unit 1302 is specifically configured to: prefetch data in the chained data structure according to the address of the chained data structure; obtain the pointer in prefetched data according to the prefetched data in the chained data structure and the metadata corresponding to the prefetched data; and prefetch, from the chained data structure, the other data pointed to by the prefetched data according to the address pointed to by the pointer in the prefetched data.
In a possible implementation, the at least one metadata is further used to indicate the size of the other data pointed to by the corresponding data.
In a possible implementation, each of the at least one metadata is further used to indicate the type of the corresponding data and the type of the other data pointed to by the corresponding data.
In a possible implementation, the prefetch instruction is specifically used to indicate the address offset between the prefetch instruction and the data access instruction.
In a possible implementation, the prefetch instruction is specifically used to indicate the address of the at least one metadata.
The acquisition unit 1301 is further configured to obtain the at least one metadata according to the address of the at least one metadata.
In a possible implementation, the at least one metadata has the same size, and the prefetch instruction is used to indicate the starting address of the at least one metadata and the quantity of the at least one metadata.
The acquisition unit 1301 is further configured to obtain the at least one metadata starting from the starting address of the at least one metadata according to the quantity and size of the at least one metadata.
The compilation method and data prefetching method provided by the embodiments of this application may specifically be performed by a chip in an electronic device. The chip includes a processing unit and a communication unit; the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin or a circuit. The processing unit can execute the computer-executable instructions stored in a storage unit, so that the chip in the electronic device performs the methods described in the embodiments shown in Figures 1 to 11. Optionally, the storage unit is a storage unit within the chip, such as a register or a cache; the storage unit may also be a storage unit located outside the chip within the wireless access device, such as a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, or a random access memory (RAM), and so on.
Referring to Figure 14, this application further provides a computer-readable storage medium. In some embodiments, the methods disclosed in the above embodiments may be implemented as computer program instructions encoded in a machine-readable format on a computer-readable storage medium or on other non-transitory media or articles.
Figure 14 schematically illustrates a conceptual partial view of an example computer-readable storage medium arranged according to at least some of the embodiments presented here, where the example computer-readable storage medium includes a computer program for executing a computer process on a computing device.
In one embodiment, the computer-readable storage medium 1400 is provided using a signal bearing medium 1401. The signal bearing medium 1401 may include one or more program instructions 1402 which, when executed by one or more processors, may provide the functions or part of the functions described above with respect to Figure 4 or Figure 7. Thus, for example, referring to the embodiment shown in Figure 4, one or more features of steps 401-403 may be undertaken by one or more instructions associated with the signal bearing medium 1401. In addition, the program instructions 1402 in Figure 14 also describe example instructions.
In some examples, the signal bearing medium 1401 may include a computer-readable medium 1403, such as, but not limited to, a hard disk drive, a compact disc (CD), a digital video disc (DVD), a digital tape, a memory, a ROM or a RAM, and so on.
In some implementations, the signal bearing medium 1401 may include a computer-recordable medium 1404, such as, but not limited to, a memory, a read/write (R/W) CD, an R/W DVD, and so on. In some implementations, the signal bearing medium 1401 may include a communication medium 1405, such as, but not limited to, a digital and/or analog communication medium (for example, a fiber optic cable, a waveguide, a wired communication link, a wireless communication link, and so on). Thus, for example, the signal bearing medium 1401 may be conveyed by a wireless form of the communication medium 1405 (for example, a wireless communication medium complying with the IEEE 802.14 standard or another transmission protocol).
The one or more program instructions 1402 may be, for example, computer-executable instructions or logic-implementing instructions. In some examples, the computing device may be configured to provide various operations, functions or actions in response to the program instructions 1402 conveyed to the computing device by one or more of the computer-readable medium 1403, the computer-recordable medium 1404 and/or the communication medium 1405.
It should be understood that the arrangements described here are for example purposes only. Accordingly, those skilled in the art will understand that other arrangements and other elements (for example, machines, interfaces, functions, orders and groups of functions, and so on) can be used instead, and that some elements may be omitted altogether depending on the desired result. In addition, many of the described elements are functional entities that can be implemented as discrete or distributed components, or in combination with other components, in any suitable combination and location.
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. For example, the division of the units is only a logical functional division, and there may be other ways of division in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses or units, and may be in electrical, mechanical or other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk or an optical disc.

Claims (20)

1. A data prefetching method, characterized in that the method is applied to a first instance of a computer system, the computer system further comprises a second instance, and the method comprises:
    obtaining, by the first instance, a prefetch instruction, wherein the prefetch instruction is used to indicate an address of a data access instruction and at least one metadata, the data access instruction is used to indicate an address of a chained data structure, the chained data structure comprises a plurality of data with discontinuous addresses, and the at least one metadata is used to indicate addresses of data in the chained data structure;
    obtaining, by the first instance, the address of the chained data structure according to the address of the data access instruction;
    prefetching, by the first instance, data in the chained data structure according to the address of the chained data structure and the at least one metadata;
    executing, by the second instance, the data access instruction to access data in the chained data structure;
    wherein, in a process of the first instance prefetching data in the chained data structure, the first instance controls, according to a number of times the second instance executes the data access instruction, a progress of prefetching data in the chained data structure, and the progress is used to ensure that data in the chained data structure has been prefetched into a cache before being accessed.
2. The method according to claim 1, characterized in that, in the process of the first instance prefetching data in the chained data structure, a difference between a quantity of prefetched data and a quantity of accessed data is within a preset range.
3. The method according to claim 1 or 2, characterized in that the data in the chained data structure comprises pointers pointing to addresses of other data in the chained data structure, the at least one metadata respectively corresponds to different data in the chained data structure, and each of the at least one metadata is used to indicate a position of a pointer in the corresponding data;
    the prefetching, by the first instance, data in the chained data structure according to the address of the chained data structure and the at least one metadata comprises:
    prefetching, by the first instance, data in the chained data structure according to the address of the chained data structure;
    obtaining, by the first instance, a pointer in prefetched data according to the prefetched data in the chained data structure and metadata corresponding to the prefetched data;
    prefetching, by the first instance, from the chained data structure, other data pointed to by the prefetched data according to an address pointed to by the pointer in the prefetched data.
4. The method according to claim 3, characterized in that the at least one metadata is further used to indicate a size of other data pointed to by the corresponding data.
5. The method according to claim 3 or 4, characterized in that each of the at least one metadata is further used to indicate a type of the corresponding data and a type of other data pointed to by the corresponding data.
6. The method according to any one of claims 1-5, characterized in that the prefetch instruction is specifically used to indicate an address offset between the prefetch instruction and the data access instruction.
7. The method according to any one of claims 1-6, characterized in that the prefetch instruction is specifically used to indicate an address of the at least one metadata;
    the method further comprises:
    obtaining, by the first instance, the at least one metadata according to the address of the at least one metadata.
8. The method according to claim 7, characterized in that the at least one metadata has a same size, and the prefetch instruction is used to indicate a starting address of the at least one metadata and a quantity of the at least one metadata;
    the obtaining, by the first instance, the at least one metadata according to the address of the at least one metadata comprises:
    obtaining, by the first instance, the at least one metadata starting from the starting address of the at least one metadata according to the quantity and size of the at least one metadata.
9. A compilation method, characterized by comprising:
    obtaining a first code;
    when it is recognized that the first code contains code requesting access to a chained data structure, generating a data access instruction and at least one metadata according to the chained data structure, wherein the chained data structure comprises a plurality of data with discontinuous addresses, the at least one metadata is used to indicate addresses of data in the chained data structure, and the data access instruction is used to indicate an address of the chained data structure and to request access to the chained data structure;
    generating a prefetch instruction according to the at least one metadata and the data access instruction to obtain a compiled second code, wherein the prefetch instruction is used to indicate an address of the data access instruction and the at least one metadata.
10. The method according to claim 9, characterized in that the data in the chained data structure comprises pointers pointing to addresses of other data, the at least one metadata respectively corresponds to different data in the chained data structure, and each of the at least one metadata is used to indicate a position of a pointer in the corresponding data.
11. The method according to claim 10, characterized in that the at least one metadata is further used to indicate a size of other data pointed to by the corresponding data.
12. The method according to claim 10 or 11, characterized in that each of the at least one metadata is further used to indicate a type of the corresponding data and a type of other data pointed to by the corresponding data.
13. The method according to any one of claims 9-12, characterized in that the prefetch instruction is specifically used to indicate an address offset between the prefetch instruction and the data access instruction.
14. The method according to any one of claims 9-13, characterized in that the prefetch instruction is specifically used to indicate an address of the at least one metadata.
15. The method according to claim 14, characterized in that the prefetch instruction is used to indicate a starting address of the at least one metadata and a quantity of the at least one metadata, and the at least one metadata has a same size.
16. The method according to any one of claims 9-15, characterized in that the at least one metadata is located in a code segment or a data segment of a second code, and the second code is obtained by compiling the first code.
17. An electronic device, characterized by comprising a memory and a processor, wherein the memory stores code, the processor is configured to execute the code, and when the code is executed, the electronic device performs the method according to any one of claims 1 to 8.
18. An electronic device, characterized by comprising a memory and a processor, wherein the memory stores code, the processor is configured to execute the code, and when the code is executed, the electronic device performs the method according to any one of claims 9 to 16.
19. A computer-readable storage medium, characterized by comprising computer-readable instructions, wherein, when the computer-readable instructions are run on a computer, the computer is caused to perform the method according to any one of claims 1 to 16.
20. A computer program product, characterized by comprising computer-readable instructions, wherein, when the computer-readable instructions are run on a computer, the computer is caused to perform the method according to any one of claims 1 to 16.
PCT/CN2023/099303 2022-06-10 2023-06-09 Data prefetching method, compiling method and related apparatus WO2023237084A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210654495.4 2022-06-10
CN202210654495.4A CN117251387A (en) 2022-06-10 2022-06-10 Data prefetching method, compiling method and related devices

Publications (1)

Publication Number Publication Date
WO2023237084A1 true WO2023237084A1 (en) 2023-12-14

Family

ID=89117547

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/099303 WO2023237084A1 (en) 2022-06-10 2023-06-09 Data prefetching method, compiling method and related apparatus

Country Status (2)

Country Link
CN (1) CN117251387A (en)
WO (1) WO2023237084A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108874690A (en) * 2017-05-16 2018-11-23 龙芯中科技术有限公司 The implementation method and processor of data pre-fetching
US20190278709A1 (en) * 2018-03-06 2019-09-12 Arm Limited Prefetching using offset data to access a pointer within a current data element for use in prefetching a subsequent data element
US10684857B2 (en) * 2018-02-01 2020-06-16 International Business Machines Corporation Data prefetching that stores memory addresses in a first table and responsive to the occurrence of loads corresponding to the memory addresses stores the memory addresses in a second table
CN113407119A (en) * 2021-06-28 2021-09-17 海光信息技术股份有限公司 Data prefetching method, data prefetching device and processor

Also Published As

Publication number Publication date
CN117251387A (en) 2023-12-19


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23819247

Country of ref document: EP

Kind code of ref document: A1