WO2023237084A1 - Data prefetching method, compiling method and related apparatus - Google Patents

Data prefetching method, compiling method and related apparatus

Info

Publication number
WO2023237084A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
metadata
address
chained
instruction
Application number
PCT/CN2023/099303
Other languages
French (fr)
Chinese (zh)
Inventor
勾玥
孙文博
刘盈盈
Original Assignee
Huawei Technologies Co., Ltd.
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2023237084A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 - Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 - Addressing or allocation; Relocation
    • G06F12/08 - Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0862 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, with prefetch

Definitions

  • the present application relates to the field of computer technology, and in particular, to a data prefetching method, compilation method and related devices.
  • In computer systems, different storage devices usually have different access speeds. Computers with multi-level storage systems usually use data prefetching technology to improve access performance: the computer predicts the data to be accessed and loads the predicted data in advance from a storage device with a slower access speed to one with a faster access speed, for example from the memory to the cache.
  • Existing data prefetching technologies usually prefetch the data to be accessed based on historical memory access information. For example, when the computer detects from historical memory access information that the program accesses data in an address-incrementing manner, it prefetches the data to be accessed based on the currently accessed data and the same address-incrementing pattern.
  • current data prefetching technology can only achieve effective prefetching for data whose storage addresses have certain patterns, such as data with continuous storage addresses or data with storage addresses that increase by a certain value.
  • For data in a chained data structure with irregular storage addresses, however, current data prefetching technology can hardly achieve effective prefetching.
  • A chained data structure usually consists of multiple dispersedly stored data, and each data that includes a pointer points to the address where the next data is stored.
  • When current data prefetching technology prefetches data in a chained data structure, it is often difficult to determine the amount of data to prefetch: prefetching too much data at one time causes cache pollution, while prefetching too little fails to improve data access performance.
  • This application provides a data prefetching method that can effectively prefetch data in a chained data structure.
  • a first aspect of the present application provides a data prefetching method, the method is applied to a first instance of a computer system, and the computer system further includes a second instance.
  • The data prefetching method includes: the first instance obtains a prefetch instruction in an executable file, where the prefetch instruction is used to indicate the address of a data access instruction and at least one metadata, and the data access instruction is used to indicate the address of a chained data structure.
  • the chained data structure includes multiple data with discontinuous addresses.
  • the at least one metadata is used to indicate the address of the data in the chained data structure.
  • the address of the chained data structure can refer to the address of any data in the chained data structure.
  • The address of the chained data structure can be determined based on the address of the first data that needs to be accessed in the chained data structure. For example, if the first data that needs to be accessed is the first data in the chained data structure, then the data access instruction indicates the address of the first data in the chained data structure.
  • the address of the chained data structure may indicate a specific single address, such as the starting address where certain data is stored; the address of the chained data structure may also indicate an address segment, such as the address segment where certain data is stored.
  • the first instance obtains the address of the chained data structure according to the address of the data access instruction. Furthermore, the first instance prefetches data in the chained data structure according to the address of the chained data structure and the at least one metadata.
  • the second instance executes the data access instructions to access data in the chained data structure.
  • The first instance controls, according to the number of times the second instance executes the data access instruction, the progress of prefetching the data in the chained data structure; the progress is used to ensure that the data in the chained data structure is prefetched into the cache before being accessed.
  • When the running device executes the executable file, it can determine the data access instruction and the at least one metadata based on the prefetch instruction in the executable file, thereby realizing prefetching of data in the chained data structure. Moreover, after obtaining the address of the data access instruction from the prefetch instruction, the running device can learn the data access progress in the chained data structure from the number of times the data access instruction has been executed, and thereby control the progress of prefetching the data in the chained data structure, that is, adaptively adjust the amount of prefetched data to ensure effective prefetching of the data in the chained data structure.
  • The data prefetched by the first instance may not include pointers to other data; or, among all the data prefetched by the first instance, part of the data may include pointers to other data while the other part does not.
  • When the second instance executes the data access instruction, the data accessed by the second instance may not include pointers to other data; or, among all the data accessed by the second instance, part of the data may include pointers to other data while the other part does not.
  • the difference between the amount of prefetched data and the amount of accessed data is within a preset range.
  • For example, the first instance can control the number of prefetched data in the chained data structure to always be 5-10 more than the number of data actually accessed, which ensures the timeliness of prefetching while avoiding polluting the cache with too much prefetched data.
  • The first instance can dynamically adjust the above-mentioned preset range according to the data access progress and the available cache space of the running device, to ensure a balance between the amount of prefetched data and the available cache space.
  • The data in the chained data structure includes pointers pointing to the addresses of other data in the chained data structure, and the at least one metadata respectively corresponds to different data in the chained data structure.
  • the at least one metadata is used to indicate the position of the pointer in the corresponding data.
  • That the first instance prefetches data in the chained data structure based on the address of the chained data structure and the at least one metadata includes: the first instance prefetches data in the chained data structure based on the address of the chained data structure; the first instance obtains the pointer in the prefetched data according to the prefetched data and the metadata corresponding to the prefetched data; and the first instance prefetches, from the chained data structure, the other data pointed to by the prefetched data according to the address pointed to by the pointer in the prefetched data.
  • the at least one metadata is also used to indicate the size of other data pointed to by the corresponding data. That is to say, for a certain metadata, the metadata is also used to indicate the size of other data pointed to by the pointer in the corresponding data. For example, assume that metadata 1 corresponds to data 1, and the pointer in data 1 points to data 2; then, metadata 1 is also used to indicate the size of data 2 pointed to by the pointer in data 1.
  • each metadata in the at least one metadata is also used to indicate the type of the corresponding data and the type of other data pointed to by the corresponding data.
  • the prefetch instruction is specifically used to indicate an address offset between the prefetch instruction and the data access instruction.
  • The prefetch instruction is specifically used to indicate the address of the at least one metadata; the method further includes: the first instance obtains the at least one metadata according to the address of the at least one metadata.
  • a second aspect of this application provides a compilation method, including: a compiler obtains a first code; wherein the first code may refer to a program source code, such as a code based on high-level languages such as java, c, c++, python, etc.
  • When recognizing that the first code contains code requesting access to a chained data structure, the compiler generates a data access instruction and at least one metadata according to the chained data structure, where the chained data structure includes a plurality of data with discontinuous addresses, the at least one metadata is used to indicate the addresses of the data in the chained data structure, and the data access instruction is used to indicate the address of the chained data structure and to request access to the chained data structure.
  • The compiler generates a prefetch instruction according to the at least one metadata and the data access instruction to obtain the compiled second code, where the prefetch instruction is used to indicate the address of the data access instruction and the at least one metadata.
  • In this way, the data access instruction and at least one metadata used to indicate the addresses of the data to be accessed in the chained data structure are generated, and a prefetch instruction is inserted before the data access instruction to indicate the address of the data access instruction and the at least one metadata.
  • When the running device executes the compiled executable file, it can determine the data access instruction and the at least one metadata based on the prefetch instruction, thereby realizing prefetching of data in the chained data structure. Moreover, after the running device obtains the address of the data access instruction from the prefetch instruction, it can learn the data access progress in the chained data structure from the number of times the data access instruction has been executed, thereby adaptively adjusting the amount of prefetched data to ensure effective prefetching of the data in the chained data structure.
  • the data in the chained data structure includes pointers pointing to addresses of other data
  • the at least one metadata respectively corresponds to different data in the chained data structure
  • the at least one metadata is used to indicate the position of the pointer in the corresponding data.
  • the at least one metadata is also used to indicate the size of other data pointed to by the corresponding data.
  • each metadata in the at least one metadata is also used to indicate the type of the corresponding data and the type of other data pointed to by the corresponding data.
  • the prefetch instruction is specifically used to indicate an address offset between the prefetch instruction and the data access instruction.
  • the prefetch instruction is specifically used to indicate the address of the at least one metadata.
  • the prefetch instruction is used to indicate the starting address of the at least one metadata and the quantity of the at least one metadata, and the sizes of the at least one metadata are the same.
  • the at least one metadata is located in a code segment or data segment in the second code, and the second code is compiled based on the first code.
  • the third aspect of this application provides a data prefetching device, including:
  • an acquisition unit, configured to acquire a prefetch instruction, where the prefetch instruction is used to indicate the address of a data access instruction and at least one metadata, the data access instruction is used to indicate the address of a chained data structure, the chained data structure includes multiple data with discontinuous addresses, and the at least one metadata is used to indicate the addresses of the data in the chained data structure;
  • the acquisition unit is also configured to acquire the address of the chained data structure according to the address of the data access instruction
  • a prefetch unit configured to prefetch data in the chained data structure according to the address of the chained data structure and the at least one metadata
  • An execution unit used to execute the data access instructions to access the data in the chained data structure
  • the first instance controls, according to the number of times the second instance executes the data access instruction, the progress of prefetching the data in the chained data structure, which is used to ensure that the data in the chained data structure is prefetched into the cache before being accessed.
  • the difference between the amount of prefetched data and the amount of accessed data is within a preset range.
  • the data in the chained data structure includes pointers pointing to the addresses of other data in the chained data structure, and the at least one metadata respectively corresponds to different data in the chained data structure;
  • the at least one metadata is used to indicate the position of the pointer in the corresponding data;
  • the prefetch unit is specifically configured to: prefetch data in the chained data structure according to the address of the chained data structure; obtain the pointer in the prefetched data according to the prefetched data in the chained data structure and the metadata corresponding to the prefetched data; and prefetch, from the chained data structure, the other data pointed to by the prefetched data according to the address pointed to by the pointer in the prefetched data.
  • the at least one metadata is also used to indicate the size of other data pointed to by the corresponding data.
  • each metadata in the at least one metadata is also used to indicate the type of the corresponding data and the type of other data pointed to by the corresponding data.
  • the prefetch instruction is specifically used to indicate the address offset between the prefetch instruction and the data access instruction.
  • the prefetch instruction is specifically used to indicate the address of the at least one metadata
  • the acquisition unit is further configured to: acquire the at least one metadata according to the address of the at least one metadata.
  • the at least one metadata has the same size, and the prefetch instruction is used to indicate the starting address of the at least one metadata and the quantity of the at least one metadata;
  • the acquisition unit is further configured to: acquire the at least one metadata starting from a starting address of the at least one metadata according to the quantity and size of the at least one metadata.
  • a fourth aspect of this application provides a compilation device, including:
  • a processing unit, configured to generate a data access instruction and at least one metadata according to the chained data structure when it is recognized that the first code contains code requesting access to the chained data structure, where the chained data structure includes multiple data with discontinuous addresses, the at least one metadata is used to indicate the addresses of the data in the chained data structure, and the data access instruction is used to indicate the address of the chained data structure and to request access to the chained data structure;
  • the processing unit is further configured to generate a prefetch instruction according to the at least one metadata and the data access instruction to obtain the compiled second code, where the prefetch instruction is used to indicate the address of the data access instruction and the at least one metadata.
  • the data in the chained data structure includes pointers pointing to addresses of other data
  • the at least one metadata respectively corresponds to different data in the chained data structure
  • the at least one metadata is used to indicate the position of the pointer in the corresponding data.
  • the at least one metadata is also used to indicate the size of other data pointed to by the corresponding data.
  • each metadata in the at least one metadata is also used to indicate the type of the corresponding data and the type of other data pointed to by the corresponding data.
  • the prefetch instruction is specifically used to indicate an address offset between the prefetch instruction and the data access instruction.
  • the prefetch instruction is specifically used to indicate the address of the at least one metadata.
  • the prefetch instruction is used to indicate the starting address of the at least one metadata and the quantity of the at least one metadata, and the sizes of the at least one metadata are the same.
  • the at least one metadata is located in a code segment or data segment in the second code, and the second code is compiled based on the first code.
  • a fifth aspect of the present application provides an electronic device.
  • The electronic device includes a memory and a processor; the memory stores code, the processor is configured to execute the code, and when the code is executed, the electronic device performs the method in any one of the implementations of the first aspect or the second aspect.
  • a sixth aspect of the present application provides a computer-readable storage medium.
  • A computer program is stored in the computer-readable storage medium; when the computer program is run on a computer, it causes the computer to execute the method in any one of the implementations of the first aspect or the second aspect.
  • a seventh aspect of the present application provides a computer program product that, when run on a computer, causes the computer to execute the method implemented in any one of the first aspect or the second aspect.
  • An eighth aspect of this application provides a chip including one or more processors. Some or all of the processors are used to read and execute a computer program stored in a memory, so as to perform the method in any possible implementation of any of the above aspects.
  • Optionally, the chip further includes a memory, and the processor is connected to the memory through circuits or wires.
  • the chip also includes a communication interface, and the processor is connected to the communication interface.
  • the communication interface is used to receive data and/or information that needs to be processed.
  • the processor obtains the data and/or information from the communication interface, processes the data and/or information, and outputs the processing results through the communication interface.
  • the communication interface may be an input-output interface.
  • Figure 1 is a schematic diagram of a chained data structure provided by an embodiment of the present application.
  • Figure 2 is a schematic diagram of multiple different chained data structures provided by embodiments of the present application.
  • Figure 3 is a schematic diagram of a running device executing an application program provided by an embodiment of the present application
  • Figure 4 is a schematic flowchart of a compilation method provided by an embodiment of the present application.
  • Figure 5 is a schematic diagram of the correspondence between data and metadata in a chained data structure provided by an embodiment of the present application
  • Figure 6 is a schematic diagram of the types of data in a chained data structure according to an embodiment of the present application.
  • Figure 7 is a schematic flow chart of a data prefetching method provided by an embodiment of the present application.
  • Figure 8 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • Figure 9 is a schematic flowchart of a compilation method provided by an embodiment of the present application.
  • Figure 10A is a schematic diagram of compiling a verification program based on an existing compiler provided by an embodiment of the present application
  • Figure 10B is a schematic diagram of a verification program compiled by a compiler with the newly added optimization pass provided by an embodiment of the present application;
  • Figure 11 is a schematic flow chart of a data prefetching method provided by an embodiment of the present application.
  • Figure 12 is a schematic structural diagram of a compilation device 1200 provided by an embodiment of the present application.
  • Figure 13 is a schematic structural diagram of a data prefetching device 1300 provided by an embodiment of the present application.
  • Figure 14 is a schematic structural diagram of a computer-readable storage medium provided by an embodiment of the present application.
  • The chained data structure includes multiple data with discontinuous addresses, and the multiple data have address pointing relationships with each other, that is, the previous data in the chained data structure points to the address of the next data.
  • Figure 1 is a schematic diagram of a chained data structure provided by an embodiment of the present application.
  • Each data in the chained data structure includes two parts: one is the valid data part, and the other is the pointer part, which points to the address of the next data linked to the current data.
  • the chained data structure uses pointers to reflect the logical relationship between data elements. In this way, in the process of accessing the chained data structure, the access is usually performed from front to back, that is, the previous data is accessed first, and then the next data can be accessed based on the address indicated by the previous data.
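  • For illustration only, the following sketch (hypothetical C++ type and field names, not taken from this application) shows such a structure and the pointer-chasing traversal it forces: each data is allocated at an arbitrary address, and the address of the next data is only known after the pointer part of the current data has been read.

```cpp
#include <cstdint>

// Hypothetical node layout: a "valid data" part followed by a pointer part
// that stores the address of the next, non-contiguously allocated data.
struct Node {
    std::int64_t payload;  // valid data part
    Node*        next;     // pointer part: address of the next data in the chain
};

// Pointer-chasing traversal: each load depends on the previous one, so the
// addresses touched follow no arithmetic pattern a prefetcher could guess.
std::int64_t sum_list(const Node* head) {
    std::int64_t sum = 0;
    for (const Node* cur = head; cur != nullptr; cur = cur->next) {
        sum += cur->payload;  // use the valid data part
    }
    return sum;
}
```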
  • the structural forms of linked data structures mainly include one-way linked list, doubly linked list, circular linked list, spine-rib (Backbone-rib) linked list, binary tree structure and structure array structure.
  • The structure array structure refers to storing an array of structures in continuous memory, where the structures contain pointers.
  • Figure 2 is a schematic diagram of multiple different chained data structures provided by embodiments of the present application. (1) in Figure 2 shows a Backbone-rib linked list, (2) in Figure 2 shows a binary tree structure, and (3) in Figure 2 shows a structure array structure.
  • chained data structures are mainly composed of dynamically connected data, usually in the form of trees/graphs/linked lists.
  • Chained data structures are widely used in general computing, high performance computing (HPC), databases and artificial intelligence. They are also an important data structure underlying the containers provided by object-oriented programming languages such as C++/Java.
  • the chained data structure can make full use of computer memory space and achieve flexible dynamic memory management.
  • The disadvantage of the chained data structure is that there is no spatial locality between data. Reading a chained data structure is therefore mostly a typical irregular memory access, which easily causes memory access delays, limits central processing unit (CPU) performance, and appears as a performance bottleneck in different application scenarios.
  • Memory access latency is the delay caused by waiting for access to data stored in system memory to complete.
  • Compilation refers to the process of using a compiler to generate object code from a source program written in a source language.
  • Object code is a language between high-level language and machine language.
  • the object code can be further converted into executable binary machine code.
  • compilation is the conversion of a source program written in a high-level language into an object code that is closer to machine language. Since the computer only recognizes 1 and 0, compilation actually means turning the high-level language that people are familiar with into a binary language that the computer can recognize.
  • the compiler's process of translating a source program into a target program is divided into five stages: lexical analysis; syntax analysis; semantic checking and intermediate code generation; code optimization; and target code generation.
  • Intermediate code is an internal representation of the source program, which can also be called the intermediate representation (IR).
  • the function of the intermediate representation is to make the structure of the compiled program logically simpler and clearer, especially to make the optimization of the target code easier to implement.
  • the complexity of the intermediate representation is somewhere between source programming language and machine language.
  • Code optimization refers to performing various equivalent transformations on the program so that more effective target code can be generated based on the transformed program.
  • the so-called equivalence means that the running results of the program are not changed.
  • the so-called effective mainly refers to the short running time of the target code and the small storage space occupied. This transformation is called optimization.
  • Optimization passes are an important part of the compilation framework; an optimization pass analyzes and modifies the intermediate representation. During code optimization, the intermediate representation is analyzed and modified by multiple optimization passes, and each pass completes specific optimization work.
  • Metadata, also known as intermediary data or relay data, is data that describes data (data about data). Metadata mainly describes data attributes (properties) and is used to support functions such as indicating storage location, historical data, resource search, and file recording. Specifically, metadata is a kind of electronic catalogue: to achieve the purpose of cataloguing, it must describe and collect the content or characteristics of the data, thereby assisting data retrieval.
  • an application program is composed of program segments such as program code segments, data segments, and read-only data segments.
  • the program code segments are composed of consecutive instructions.
  • the operating system loads the program segments of the application program into the memory, and then the running device sequentially executes the instructions in the program code segments based on a certain order, thereby realizing the execution of the application program.
  • A running device usually includes a control unit, a storage unit and an arithmetic unit.
  • the control unit includes an instruction counter and an instruction register.
  • the instruction counter is used to store the address of the next instruction to be executed in the memory
  • the instruction register is used to store the instruction to be executed.
  • the storage unit usually includes multiple registers, such as general-purpose registers, floating-point registers, etc.
  • the registers in the storage unit are usually used to store data needed during the execution of instructions.
  • the computing unit is used to process data according to the currently executed instructions.
  • The operating principle of the running device is as follows: under the action of the timing pulse, the control unit sends the instruction address pointed to by the instruction counter (that is, the address of the instruction in the memory) to the address bus (not shown in Figure 3), and the running device then reads the instruction at this address into the instruction register for decoding. For the data needed during instruction execution, the running device sends the corresponding data address to the address bus and, based on that address, reads the data into its internal temporary storage unit. Finally, the arithmetic unit in the running device processes the data according to the currently executed instruction. In general, the running device fetches instructions and the corresponding data from the memory one by one, and operates on the data according to the operation codes in the instructions until the program has been executed.
  • The working process of the running device can be divided into five stages: instruction fetch, instruction decode, instruction execution, memory access, and result write-back.
  • the instruction fetch phase is the process of fetching an instruction from memory to the instruction register.
  • the value in the instruction counter is used to indicate the location of the next instruction to be executed in the memory.
  • the value in the instruction counter is automatically incremented according to the length of the instruction.
  • After fetching the instruction, the running device immediately enters the instruction decoding stage.
  • the instruction decoder splits and interprets the retrieved instructions according to the predetermined instruction format, and identifies and distinguishes different instruction categories and various methods of obtaining operands.
  • After the instruction fetch and instruction decoding stages, the running device enters the instruction execution stage.
  • The task of the instruction execution stage is to complete the various operations specified by the instruction and thus realize its function. To do so, different parts of the running device are connected to perform the required operations. For example, if an addition operation is required, the arithmetic logic unit in the arithmetic unit is connected to a set of inputs and a set of outputs; the inputs provide the values to be added, and the outputs hold the final operation result.
  • the running device may need to access memory to read the operands.
  • Next, the running device enters the memory access and data fetch stage.
  • The task of the memory access stage is: the running device obtains the address of the operand in memory according to the instruction's address code, and reads the operand from memory for the operation.
  • the result write-back stage "writes back" the running result data of the execution instruction stage into a certain storage structure.
  • The result data is usually written to an internal register of the running device so that it can be quickly accessed by subsequent instructions; in some cases, the result data can also be written to memory, which is slower but cheaper and has a larger capacity.
  • After the instruction has been executed and the result data written back, the running device obtains the address of the next instruction from the instruction counter and starts a new cycle; the next instruction is fetched in the next instruction cycle.
  • The running device usually needs to execute the memory access stage when processing each memory access instruction, and only after that stage has been executed can it perform computational processing on the data obtained from the memory.
  • the running device needs to wait for data to be fetched from the memory to the cache every time it processes a memory access instruction, resulting in a huge memory access delay.
  • Prefetching technology mainly includes software prefetching technology (SoftWare Prefetch, SWP) and hardware prefetching technology (HardWare Prefetch, HWP).
  • Software prefetching technology refers to explicitly inserting prefetch instructions into the program to allow the running device to read the data at the specified address from the memory into the cache (Cache).
  • Prefetch instructions can be added automatically by the compiler or manually by the programmer.
  • Software prefetching has almost no hardware requirements; its biggest technical challenge is how to correctly add prefetch instructions to the target code. Chained data structures are difficult to optimize through software prefetching because the address calculation overhead of prefetching a chained data structure is very high, which easily causes the problem of insufficient prefetch advance.
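  • As a sketch of explicit software prefetching (using the GCC/Clang __builtin_prefetch builtin; the loop and the prefetch distance of 16 elements are assumptions for illustration), a prefetch can be inserted a fixed distance ahead of the use when the future address is computable, which is exactly what a chained data structure does not allow:

```cpp
#include <cstddef>

// Array case: the future address is computable, so a prefetch can be issued a
// fixed distance ahead of the use (a distance of 16 elements is an assumption).
void scale(double* a, std::size_t n, double k) {
    constexpr std::size_t kDistance = 16;
    for (std::size_t i = 0; i < n; ++i) {
        if (i + kDistance < n) {
            __builtin_prefetch(&a[i + kDistance]);  // GCC/Clang prefetch builtin
        }
        a[i] *= k;
    }
}
```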
  • Hardware prefetching technology uses hardware to prefetch possible future memory access units into the cache based on historical memory access information.
  • Typical hardware prefetchers include stream prefetchers and stride prefetchers.
  • The role of the stream prefetcher is as follows: when it detects that the program accesses data at increasing addresses, it automatically prefetches the data of the next cache line (cacheline).
  • The stride prefetcher monitors each memory load instruction (load); when regular strided reads are detected, the prefetcher precalculates the next address and initiates a prefetch.
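  • The behavior of a stride prefetcher can be modelled roughly as follows (a simplified software model with assumed structure and field names, not a description of any particular hardware): the last address seen by a load instruction is recorded, and once the same stride has been observed repeatedly, the next address is precalculated and prefetched.

```cpp
#include <cstdint>

// Simplified per-load-instruction stride detector (a software model only).
struct StrideEntry {
    std::uint64_t last_addr = 0;
    std::int64_t  stride    = 0;
    int           hits      = 0;  // consecutive observations of the same stride
};

// Called on every execution of the monitored load; returns the address to
// prefetch, or 0 when there is no confident prediction yet.
std::uint64_t on_load(StrideEntry& e, std::uint64_t addr) {
    const std::int64_t new_stride = static_cast<std::int64_t>(addr - e.last_addr);
    if (new_stride != 0 && new_stride == e.stride) {
        ++e.hits;
    } else {
        e.stride = new_stride;
        e.hits   = 0;
    }
    e.last_addr = addr;
    return (e.hits >= 2) ? addr + e.stride : 0;  // next predicted address
}
```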
  • Most existing hardware prefetching technologies in the industry are based on the assumptions of temporal locality and spatial locality. Linked-list data structures, however, are very unfriendly to the current CPU memory access architecture, which is why current commercial CPUs perform unsatisfactorily in such applications: it is difficult to prefetch complex, irregular memory accesses.
  • embodiments of the present application provide a compilation method and a data prefetching method.
  • During compilation, when a behavior of accessing a chained data structure is recognized in the code, a data access instruction and at least one metadata used to indicate the addresses of the data to be accessed in the chained data structure are generated, and a prefetch instruction is inserted before the data access instruction to indicate the address of the data access instruction and the at least one metadata.
  • In this way, when the running device executes the compiled executable file, it can determine the data access instruction and the at least one metadata based on the prefetch instruction, thereby realizing prefetching of data in the chained data structure. Moreover, after the running device obtains the address of the data access instruction from the prefetch instruction, it can learn the data access progress in the chained data structure from the number of times the data access instruction has been executed, thereby adaptively adjusting the amount of prefetched data to ensure effective prefetching of the data in the chained data structure.
  • the compilation method provided by the embodiment of the present application can be applied to compile codes with chained data structure access behavior, such as code compilation in fields such as general computing, high-performance computing, databases, and artificial intelligence.
  • the data prefetching method provided by the embodiment of the present application can be applied to scenarios that need to execute applications that require access to chained data structures.
  • the compilation method and data prefetching method provided by the embodiments of the present application can be applied to electronic devices.
  • The electronic device provided by the embodiments of the present application can be, for example, a server, a smart phone (mobile phone), a personal computer (PC), a notebook computer, a tablet computer, a smart TV, a mobile internet device (MID), a wearable device, a virtual reality (VR) device, an augmented reality (AR) device, a wireless terminal in industrial control, a wireless terminal in self-driving, a wireless terminal in remote surgery (remote medical surgery), a wireless terminal in a smart grid, a wireless terminal in transportation safety, a wireless terminal in a smart city, a wireless terminal in a smart home, etc.
  • Figure 4 is a schematic flowchart of a compilation method provided by an embodiment of the present application. As shown in Figure 4, the compilation method includes the following steps 401-403.
  • Step 401 Obtain the first code.
  • the first code may refer to program source code.
  • program source code refers to an uncompiled text file written in accordance with certain programming language specifications. It is a series of human-readable computer language instructions.
  • the program source code may be code written based on high-level languages such as java, c, c++, python, etc.
  • Step 402 When it is recognized that the first code contains code requesting access to a chained data structure, generate at least one metadata and a data access instruction according to the chained data structure, where the chained data structure includes a plurality of data with discontinuous addresses, the at least one metadata is used to indicate the addresses of the data to be accessed in the chained data structure, and the data access instruction is used to indicate the address of the chained data structure and to request access to the chained data structure.
  • When the compiler is compiling the first code and recognizes that there is a behavior in the first code that requests access to a chained data structure, then, since each data in the chained data structure points to the address of the next data, the compiler can obtain the addresses of the data to be accessed in the chained data structure based on the actual structure of the chained data structure, thereby generating at least one metadata.
  • The at least one metadata generated by the compiler is used to indicate the addresses of multiple data to be accessed in the chained data structure.
  • the chained data structure includes multiple data with discontinuous addresses, and there is an address pointing relationship between the data.
  • The multiple data to be accessed in the chained data structure may be all of the data in the chained data structure or part of the data in the chained data structure, which is not specifically limited in this embodiment.
  • At least one metadata described in this embodiment refers to one or more metadata.
  • “at least one metadata” will be referred to as “metadata” below.
  • The metadata generated by the compiler may be located in a code segment or data segment of the second code, which is compiled based on the first code; that is, the second code is the executable file compiled by the compiler based on the first code. The metadata may be stored in the code segment of the second code as instruction code that is not executed, or may be stored in the data segment of the second code as a type of data in the program code.
  • When the compiler recognizes that there is a request to access the chained data structure in the first code, the compiler also generates a data access instruction, so that the running device can subsequently access the chained data structure according to the data access instruction when executing the compiled executable file.
  • the data access instruction is specifically used to request access to the chained data structure, and the data access instruction also indicates the address of the chained data structure.
  • the address of the chained data structure can refer to the address of any data in the chained data structure.
  • the address of the chained data structure may be determined based on the address of the first data in the chained data structure that needs to be accessed. For example, in the first code, if the first data that needs to be accessed in the chained data structure is the first data in the chained data structure, then the data access instruction indicates the address of the first data in the chained data structure; if The first data that needs to be accessed in the chained data structure is some data in the middle of the chained data structure, then the data access instruction indicates the address of the data in the middle of the chained data structure.
  • the address of the chained data structure may indicate a specific single address, such as the starting address where certain data is stored; the address of the chained data structure may also indicate an address segment, such as the address segment where certain data is stored.
  • addresses described in the embodiments of this application may refer to physical storage addresses or virtual storage addresses, which are not specifically limited in this embodiment.
  • Step 403 Generate a prefetch instruction according to the at least one metadata and the data access instruction to obtain the compiled second code.
  • the prefetch instruction is used to indicate the address of the data access instruction and the at least one metadata.
  • After the compiler generates the metadata and the data access instruction, the compiler further generates a prefetch instruction, which is used to indicate the address of the data access instruction and the metadata. In addition, the compiler can insert the prefetch instruction before the data access instruction, so that during the application execution phase, when the running device executes the compiled executable file, it executes the prefetch instruction first and then executes the data access instruction.
  • In this way, the running device can determine the data access instruction and the metadata based on the prefetch instruction when executing the executable file, thereby realizing prefetching of the data in the chained data structure. That is, the running device first determines the address of the data access instruction based on the prefetch instruction, thereby obtaining the starting storage address of the chained data structure from the data access instruction at that address; then, based on the starting storage address of the chained data structure and the metadata, it prefetches the data in the chained data structure in order.
  • After the running device obtains the address of the data access instruction based on the prefetch instruction, it can learn the data access progress in the chained data structure based on the number of times the data access instruction has been executed, thereby adaptively adjusting the amount of prefetched data to ensure effective prefetching of the data in the chained data structure.
  • The above-mentioned prefetch instruction may directly indicate the address of the data access instruction; for example, the prefetch instruction indicates that the address of the data access instruction is 0x1002.
  • Alternatively, the prefetch instruction may indicate an address offset between the prefetch instruction and the data access instruction. For example, if the address of the prefetch instruction is 0x1008 and the address of the data access instruction is 0x1002, the address offset between the prefetch instruction and the data access instruction is 0x06. It is understandable that, considering the timeliness and effectiveness of the prefetch instruction in prompting the running device to perform prefetching, the prefetch instruction is generated before the data access instruction, and the address offset between the two instructions is small.
  • In this way, the space occupied by the encoding of the prefetch instruction can be reduced, thereby saving instruction overhead.
  • In this way, during the compilation process the compiler generates the metadata and the data access instruction, and inserts, before the data access instruction, a prefetch instruction that indicates the metadata and the data access instruction.
  • the above-mentioned prefetch instruction may also indicate the content of the metadata, or indicate the storage address of the metadata.
  • Implementation mode 1: the prefetch instruction is used to indicate the starting storage address of the metadata and the quantity of the metadata, and the size of each metadata is the same.
  • In this implementation, the sizes of the metadata generated by the compiler are all the same, and the storage addresses of the metadata are continuous. Therefore, the compiler may indicate the starting storage address of the metadata and the quantity of the metadata in the prefetch instruction. In this way, the running device can first take out the first metadata based on the starting storage address of the metadata and the size of the metadata, and then use the size of the metadata as the address offset to take out the subsequent metadata one by one, thereby prefetching all of the metadata.
  • the compiler can indicate in the prefetch instruction that the starting storage address of metadata is 0x0004 and the number of metadata is 4. In this way, the running device can determine the storage address of each metadata based on the size of the metadata being 4 bytes and the starting storage address and quantity of the metadata.
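  • A sketch of how a running device could enumerate such fixed-size metadata from the starting storage address and quantity carried by the prefetch instruction (the record layout and field names are assumptions for illustration):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-size (4-byte) metadata record, matching the example above.
struct Metadata {
    std::uint16_t pointer_offset;  // position of the pointer inside the corresponding data
    std::uint16_t next_size;       // size of the data that the pointer points to
};
static_assert(sizeof(Metadata) == 4, "fixed-size records, as assumed by the prefetch instruction");

// Enumerate 'count' consecutive records starting at 'start_addr'.
std::vector<Metadata> read_metadata(std::uintptr_t start_addr, std::size_t count) {
    std::vector<Metadata> out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        out.push_back(*reinterpret_cast<const Metadata*>(start_addr + i * sizeof(Metadata)));
    }
    return out;
}
```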
  • In another implementation mode, the prefetch instruction is used to indicate the starting storage address of the metadata and the size of each metadata.
  • the size of each metadata indicated by the prefetch instruction may be different.
  • For example, assume the prefetch instruction indicates that the starting storage address of the multiple metadata is 0x0000, the size of the first metadata is 2 bytes, the size of the second metadata is 4 bytes, the size of the third metadata is 2 bytes, and the size of the fourth metadata is 6 bytes. Based on the starting storage address and the size of each metadata, the running device can determine that the storage address of the first metadata is 0x0000-0x0001, the storage address of the second metadata is 0x0002-0x0005, the storage address of the third metadata is 0x0006-0x0007, and the storage address of the fourth metadata is 0x0008-0x000d.
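  • The address ranges in this example follow from a simple running offset over the per-metadata sizes, assuming the records are stored back to back (a sketch; how the sizes themselves are encoded is not specified here):

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    const std::uintptr_t start   = 0x0000;        // starting storage address of the metadata
    const unsigned       sizes[] = {2, 4, 2, 6};  // per-metadata sizes in bytes
    std::uintptr_t addr = start;
    for (unsigned i = 0; i < 4; ++i) {
        // Record i occupies [addr, addr + size - 1]; the next record starts right after it.
        std::printf("metadata %u: 0x%04lx-0x%04lx\n", i + 1,
                    static_cast<unsigned long>(addr),
                    static_cast<unsigned long>(addr + sizes[i] - 1));
        addr += sizes[i];
    }
    return 0;
}
```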
  • the metadata generated by the compiler may not directly indicate the address of the data in the chained data structure, but may indicate the location of the pointer in the data.
  • the plurality of metadata generated by the compiler may respectively correspond to different data in the chained data structure.
  • Each metadata in the plurality of metadata corresponds one-to-one to a piece of data to be accessed in the chained data structure.
  • the plurality of metadata are used to indicate the position of the pointer in the corresponding data.
  • FIG. 5 is a schematic diagram of the correspondence between data and metadata in a chained data structure provided by an embodiment of the present application.
  • the chained data structure includes 4 data to be accessed, namely: data 1, data 2, data 3 and data 4.
  • the compiler generates 4 metadata based on the 4 data to be accessed in the chain data structure, namely: metadata 1, metadata 2, metadata 3 and metadata 4.
  • the four pieces of metadata generated by the compiler correspond to the four pieces of data to be accessed in the chained data structure.
  • Metadata 1 indicates that the offset of pointer 1 within data 1 (that is, the offset between the starting storage address of pointer 1 and the starting storage address of data 1) is 8; metadata 2 indicates that the offset of pointer 2 within data 2 is 14; metadata 3 indicates that the offset of pointer 3 within data 3 is 4; and metadata 4 indicates that the offset of pointer 4 within data 4 is 14.
  • After the running device obtains each metadata, it can determine the position of the pointer in each data prefetched from the chained data structure based on the content indicated by the metadata, and then determine the address of the next data that needs to be prefetched.
  • each of the plurality of metadata generated by the compiler is also used to indicate the size of other data pointed to by its corresponding data.
  • When the running device executes the compiled executable file, it can determine the size of the next data that needs to be prefetched according to the indication of the metadata, and then prefetch that data based on its starting storage address and size.
  • metadata 1 corresponds to data 1
  • metadata 1 can also indicate the size of data 2 pointed to by data 1.
  • After the running device determines the position of pointer 1 in data 1 based on the offset indicated in metadata 1, it can determine the starting storage address of data 2 based on pointer 1; then, the running device combines the size of data 2 indicated in metadata 1 with the starting storage address of data 2 to determine the actual storage address of the entire data 2, thereby realizing prefetching of data 2.
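  • A sketch of this step (reusing the hypothetical metadata record from the earlier sketch; the 64-byte cache-line size is an assumption): the first instance reads the pointer at the recorded offset inside the already-prefetched data, obtains the starting storage address of the next data, and prefetches its full extent.

```cpp
#include <cstddef>
#include <cstdint>

struct Metadata {                  // same hypothetical record as in the earlier sketch
    std::uint16_t pointer_offset;  // where the pointer sits inside the corresponding data
    std::uint16_t next_size;       // size of the data that the pointer points to
};

// Given already-prefetched data and its metadata, prefetch the data it points to.
void prefetch_next(const void* current, const Metadata& md) {
    constexpr std::size_t kCacheLine = 64;  // assumed cache-line size
    // Step 1: locate the pointer inside the current data using the recorded offset
    // (for data 1 in Figure 5 this offset would be 8).
    const unsigned char* p = static_cast<const unsigned char*>(current) + md.pointer_offset;
    const void* next = *reinterpret_cast<void* const*>(p);
    if (next == nullptr) {
        return;
    }
    // Step 2: the metadata also records the size of the pointed-to data, so its
    // whole extent can be brought into the cache, one line at a time.
    const unsigned char* base = static_cast<const unsigned char*>(next);
    for (std::size_t off = 0; off < md.next_size; off += kCacheLine) {
        __builtin_prefetch(base + off);
    }
}
```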
  • each of the plurality of metadata generated by the compiler is also used to indicate the type of its corresponding data and the type of other data pointed to by its corresponding data.
  • FIG. 6 is a schematic diagram of data types in a chained data structure according to an embodiment of the present application.
  • part of the data is linked to multiple data, that is, part of the data points to the addresses of multiple other data.
  • Assume that the data types within the same row are the same, that is, the data type of the first row of data is 0, the data type of the second row of data is 1, and the data type of the third row of data is 2.
  • In this way, when a metadata indicates data types 0 and 1, the running device can determine that the data corresponding to the metadata is in the first row and that the data pointed to by that data is in the second row; similarly, when a metadata indicates data types 0 and 0, the running device can determine that the data corresponding to the metadata is in the first row and that the data pointed to by that data is also in the first row.
  • FIG. 7 is a schematic flowchart of a data prefetching method provided by an embodiment of the present application.
  • the data prefetching method includes the following steps 701-704.
  • the data prefetching method is applied to the first instance of the computer system, and the computer system further includes the second instance.
  • Step 701 The first instance obtains a prefetch instruction, wherein the prefetch instruction is used to indicate the address of a data access instruction and at least one metadata.
  • the data access instruction is used to indicate the address of a chained data structure.
  • the chained data structure includes a plurality of data with discontinuous addresses, and the at least one metadata is used to indicate the address of the data in the chained data structure.
  • the first instance in the running device can obtain the prefetch instructions in the executable file.
  • the prefetch instructions in the executable file are the prefetch instructions compiled in the above-mentioned compilation method. For details, please refer to the above-mentioned compilation method, which will not be described again here.
  • the second instance is used to execute a data access instruction to request access to data in the chained data structure.
  • the second instance is used to execute the executable file of the application program, while the first instance independently executes the data prefetching method provided by the embodiment of the present application.
  • The first instance and the second instance in the embodiment of the present application may be two physically independent execution units.
  • For example, the first instance and the second instance may respectively be two independent processors or processing cores.
  • the first instance and the second instance may also be two virtual independent execution units.
  • the first instance and the second instance may be different threads, hyper-threads or processes respectively, which is not specifically limited in this embodiment.
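  • For example, the two instances could be realized as two threads sharing a progress counter (a minimal skeleton with placeholder thread bodies; the names are illustrative, and the application does not prescribe any particular threading API):

```cpp
#include <atomic>
#include <cstddef>
#include <thread>

// Shared progress counter: incremented by the second instance each time it
// executes the data access instruction, read by the first instance to pace prefetching.
std::atomic<std::size_t> accessed_count{0};

int main() {
    std::thread first_instance([] {
        // The prefetch loop for the chained data structure would run here
        // (see the control-loop sketch later in this description).
    });
    std::thread second_instance([] {
        // The traversal executing the data access instructions would run here,
        // incrementing accessed_count after each access.
        accessed_count.fetch_add(1, std::memory_order_relaxed);
    });
    second_instance.join();
    first_instance.join();
    return 0;
}
```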
  • Step 702 The first instance obtains the address of the chained data structure according to the address of the data access instruction.
  • the running device executes the prefetch instruction to start the first instance.
  • the first instance can also obtain the prefetch instruction. Since the metadata and the address of the data access instruction are indicated in the prefetch instruction, the first instance can obtain the metadata according to the prefetch instruction and temporarily store the metadata to facilitate subsequent prefetching of data based on the metadata.
  • the running device can also obtain the address of the chained data structure indicated by the data access instruction based on the address of the data access instruction indicated by the prefetch instruction.
  • In this way, the first instance can monitor the address of the data access instruction in real time to determine when the second instance executes the data access instruction.
  • the first instance can obtain the address of the chained data structure based on the data access instruction.
  • the address of the chained data structure can refer to the address of any data in the chained data structure.
  • The address of the chained data structure can be determined based on the address of the first data that needs to be accessed in the chained data structure. For example, in the first code, if the first data that needs to be accessed in the chained data structure is the first data in the chained data structure, then the data access instruction indicates the address of the first data in the chained data structure; if the first data that needs to be accessed is some data in the middle of the chained data structure, then the data access instruction indicates the address of that data in the middle of the chained data structure.
  • the address of the chained data structure may indicate a specific single address, such as the starting address where certain data is stored; the address of the chained data structure may also indicate an address segment, such as the address segment where certain data is stored.
  • Step 703 The first instance prefetches data in the chained data structure based on the address of the chained data structure and the at least one metadata.
  • the first instance can sequentially prefetch data in the chained data structure based on the starting storage address and the address of the data to be accessed indicated by the metadata.
  • Step 704 The second instance executes the data access instruction to access data in the chained data structure.
  • The first instance controls, according to the number of times the second instance executes the data access instruction, the progress of prefetching the data in the chained data structure, which is used to ensure that the data in the chained data structure is prefetched into the cache before being accessed.
  • the amount of data in the chained data structure prefetched by the first instance is related to the number of executions of the data access instruction, to ensure that the amount of data prefetched by the first instance is always greater than the amount of data actually accessed.
  • the number of executions of the data access instruction represents the number of data already accessed in the chained data structure.
  • the data access instruction may instruct the second instance to access the address indicated by a specific location in a register, so as to access data in the chained data structure; in addition, after the second instance accesses the data, the data in the register is replaced with the newly obtained data. In this way, when the second instance continues to execute the data access instruction, it can obtain the next data in the chained data structure according to the address indicated by the new data in the register.
  • the data access instruction may be an instruction that accesses the address indicated at a specific offset in a certain register, that is, it accesses the address indicated by a pointer located at a specific offset within the data stored in the register.
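  • as a simplified C-level illustration (not the embodiment's exact instruction encoding), traversing a linked list compiles to exactly such an access: a load whose address is the current value of a register plus a fixed offset, after which the register is overwritten with the loaded pointer.

```c
#include <stddef.h>

struct node {
    int          payload;
    struct node *next;      /* pointer stored at a fixed offset inside each node */
};

int sum_list(const struct node *p)  /* p is typically kept in a register */
{
    int sum = 0;
    while (p != NULL) {
        sum += p->payload;
        p = p->next;        /* the data access instruction: a load at offsetof(struct node, next);
                               the register then holds the address of the next data */
    }
    return sum;
}
```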
  • once the first instance obtains the address of the data access instruction based on the prefetch instruction, it can monitor the number of executions of the data access instruction in real time, learn the access progress of the data in the chained data structure from that count, and then adaptively adjust the amount of prefetched data, avoiding prefetching too little or too much data and ensuring effective prefetching of the data in the chained data structure.
  • for example, the first instance may control the prefetching so that the difference between the number of data prefetched from the chained data structure and the number of data actually accessed stays within a preset range. For example, assuming the preset range is 5-10, the first instance can keep the number of prefetched data in the chained data structure 5-10 greater than the number of data actually accessed, which ensures the timeliness of prefetching while avoiding cache pollution caused by prefetching too much data.
  • the value of the preset range can be adjusted according to the actual application scenario. For example, when the running device has a large cache and high data access performance requirements, the preset range can be adjusted to a larger value; when the running device has a small cache or lower data access performance requirements, the preset range can be adjusted to a smaller value.
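  • a minimal sketch of this throttling policy is shown below, assuming the example bounds 5 and 10 quoted above and assuming the prefetched count never falls behind the accessed count; the helper name is invented for illustration.

```c
#include <stddef.h>

enum { MIN_AHEAD = 5, MAX_AHEAD = 10 };   /* example preset range */

/* Returns 1 if another prefetch should be issued, given how many nodes have
 * already been prefetched and how many the program has already accessed. */
static int should_prefetch_more(size_t prefetched, size_t accessed)
{
    size_t ahead = (prefetched > accessed) ? prefetched - accessed : 0;
    if (ahead < MIN_AHEAD) return 1;   /* too little prefetched: keep going          */
    if (ahead > MAX_AHEAD) return 0;   /* far enough ahead: pause to protect the cache */
    return 0;                          /* within the preset range: no action needed   */
}
```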
  • the prefetch instruction obtained by the first instance may be specifically used to indicate the address offset between the prefetch instruction and the data access instruction.
  • the running device can determine the actual address of the data access instruction based on the actual address of the prefetch instruction and the address offset between it and the data access instruction.
  • the data prefetched by the first instance may not include pointers to other data; or, among all the data prefetched by the first instance, part of the data may include pointers to other data while the other part does not.
  • similarly, when the second instance executes the data access instruction, the data accessed by the second instance may not include pointers to other data; or, among all the data accessed by the second instance, part of the data may include pointers to other data while the other part does not.
  • the embodiments of this application do not specifically limit the content of prefetched data and accessed data.
  • the above-mentioned prefetch instruction may also indicate the content of metadata, or indicate the storage address of the metadata. Then, when the prefetch instruction indicates the storage address, the running device obtains the metadata based on the storage address of the metadata.
  • the prefetch instruction specifically indicates the starting storage address of the plurality of metadata and the number of the plurality of metadata. Therefore, after obtaining the prefetch instruction, the running device obtains the plurality of metadata starting from the starting storage address of the metadata according to the number and size of the plurality of metadata. Specifically, the running device first takes out the first metadata among the multiple metadata according to the starting storage address of the metadata and the size of the metadata, and then uses the size of the metadata as the address offset to take out the subsequent metadata one by one, thereby obtaining all the metadata.
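  • a sketch of that retrieval step, assuming each metadata entry occupies a fixed 4 bytes (the value of X is only an example) and with a function name invented for illustration:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define METADATA_SIZE 4u   /* example fixed size of one metadata entry */

/* Read `count` fixed-size metadata entries starting at `meta_base`,
 * using the entry size as the address offset between entries. */
static void load_metadata(const void *meta_base, size_t count, uint32_t *out)
{
    const unsigned char *p = meta_base;
    for (size_t i = 0; i < count; i++) {
        uint32_t entry;
        memcpy(&entry, p + i * METADATA_SIZE, sizeof entry);
        out[i] = entry;
    }
}
```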
  • the prefetch instruction can also indicate the storage address of the metadata through other implementation methods. For details, please refer to the embodiment corresponding to Figure 3 above, which will not be described again here.
  • the plurality of metadata indicated in the prefetch instruction may respectively correspond to different data in the chained data structure.
  • the plurality of metadata are used to indicate the position of the pointer in the corresponding data.
  • the metadata generated by the compiler may not directly indicate the address of the data in the chained data structure, but indicate the location of the pointer in the data.
  • the running device can implement data prefetching based on the following steps.
  • the running device prefetches the first data in the chained data structure according to the starting storage address of the chained data structure.
  • the running device obtains the pointer in the prefetched data according to the prefetched data in the chained data structure and the metadata corresponding to the prefetched data, where the prefetched data includes the first data in the chained data structure or other data obtained by continuing to prefetch based on the first data.
  • the running device prefetches, from the chained data structure, the other data pointed to by the prefetched data according to the address pointed to by the pointer in the prefetched data.
  • the running device prefetches data 1 (ie, the first data) of the chained data structure according to the starting storage address of the chained data structure.
  • the running device obtains the position of the pointer in data 1 based on data 1 and metadata 1 corresponding to data 1, and then obtains the pointer in data 1.
  • the running device prefetches data 2 pointed to by data 1 from the chained data structure based on the address pointed to by the pointer in data 1.
  • the running device can continue to prefetch data 3 pointed to by data 2 based on data 2 and metadata 2 corresponding to data 2, and this cycle continues until the required amount of data has been prefetched.
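  • the cycle described above can be pictured with the following software sketch, which assumes one metadata entry per hop and reduces each entry to the byte offset of the pointer inside its node; __builtin_prefetch is the GCC/Clang prefetch builtin, used here purely for illustration.

```c
#include <stddef.h>

struct meta { size_t ptr_offset; };   /* illustrative: position of the pointer in the node */

/* Chase the chain starting at `first`, prefetching up to `depth` nodes. */
static void prefetch_chain(const void *first, const struct meta *m, int depth)
{
    const void *cur = first;
    for (int i = 0; i < depth && cur != NULL; i++) {
        __builtin_prefetch(cur, 0, 3);        /* prefetch data i (read, keep in cache) */
        /* reading the pointer field is the software analogue of waiting for the
         * returned data before computing the next prefetch address */
        cur = *(const void *const *)((const char *)cur + m[i].ptr_offset);
    }
}
```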
  • the at least one metadata is also used to indicate the size of other data pointed to by the corresponding data. Please refer to the above embodiment for details, which will not be described again here.
  • each metadata in the at least one metadata is also used to indicate the type of corresponding data and the type of other data pointed to by the corresponding data.
  • Figure 8 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • the system architecture includes a compiler and running equipment.
  • the compiler is used to compile the application source code to obtain the executable file of the application.
  • the executable file compiled by the compiler includes prefetch instructions, data access instructions and metadata.
  • the running device is used to execute the executable file of the application and obtain data access instructions and metadata according to the prefetch instructions in the executable file to implement prefetching of data in the chained data structure.
  • Figure 9 is a schematic flowchart of a compilation method provided by an embodiment of the present application. As shown in Figure 9, the compilation method includes the following steps 901-904.
  • Step 901 The compiler identifies the memory access behavior of the chained data structure in the source code, refines the data link relationship in the chained data structure, and obtains at least one metadata.
  • each metadata corresponds to one piece of data in the chained data structure, and each metadata indicates another data pointed to by its corresponding data.
  • each metadata can also indicate the size and data type of another data that its corresponding data points to.
  • the size of each metadata generated by the compiler is the same.
  • for example, each metadata generated by the compiler is saved in X bytes, where X can be set according to different application scenarios and is not specifically limited here.
  • Step 902 The compiler generates data access instructions based on the memory access behavior of the chained data structure in the source code.
  • the data access instruction is used to instruct access to data in the chained data structure, and the data access instruction also indicates the starting storage address of the chained data structure.
  • Step 903 The compiler inserts a prefetch instruction before the data access instruction to indicate the address of the data access instruction and the metadata.
  • the prefetch instruction is used to indicate the address of the data access instruction and the metadata, and the prefetch instruction is inserted before the data access instruction. That is to say, during the application execution phase, when the running device executes the compiled executable file, it first executes the prefetch instruction and then executes the data access instruction, which facilitates data prefetching.
  • the prefetch instruction may indicate the starting storage address of the metadata, the amount of metadata, and the address of the data access instruction (for example, the offset between the data access instruction and the prefetch instruction).
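  • one way to picture the three pieces of information the prefetch instruction carries in this example is as a small descriptor; the field names below are invented for illustration and do not reflect an actual encoding.

```c
#include <stdint.h>

/* Illustrative operand layout of the inserted prefetch instruction. */
struct prefetch_operands {
    uint32_t metadata_count;      /* number of metadata entries                           */
    int32_t  metadata_offset;     /* address offset from the prefetch instruction to the
                                     first metadata entry                                 */
    int32_t  access_insn_offset;  /* address offset from the prefetch instruction to the
                                     data access instruction                              */
};
```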
  • Step 904 The compiler generates an executable file carrying prefetch instructions, data access instructions and metadata.
  • in this way, an executable file carrying the prefetch instruction, the data access instruction and the metadata can be generated.
  • this embodiment extracts the access pattern to the chained data structure from a typical workload and constructs a verification program. Then, the compiler with the newly added optimization PASS is used to compile the verification program and generate a binary program (i.e., an executable file) carrying prefetch instructions, and the binary program is run on the emulator for verification.
  • Figure 10A is a schematic diagram of compiling a verification program based on an existing compiler provided by an embodiment of the present application.
  • a in Figure 10A shows a partial structure of the chained data structure
  • b in Figure 10A shows the code instructing access to the chained data structure in the verification program.
  • the existing compiler generates the corresponding data access instruction after compilation.
  • Figure 10B is a schematic diagram of a verification program compiled by a compiler based on the newly added optimization PASS provided by an embodiment of the present application.
  • a new optimization PASS is added to the compiler, so that with the new optimization PASS, corresponding prefetch instructions can be generated for the memory access behavior of the chained data structure during the compilation phase.
  • the assembly code compiled based on the new optimized PASS compiler also includes corresponding prefetch instructions and metadata.
  • the newly added prefetch instruction is the instruction indicated by the address 400b00, which is specifically [2, 0x104, 0xc0].
  • in the prefetch instruction, 2 represents the number of metadata; 0x104 represents the address offset between the prefetch instruction and the metadata; 0xc0 represents the address offset between the prefetch instruction and the data access instruction.
  • based on the address of the prefetch instruction (400b00) and the address offset 0x104 indicated by the prefetch instruction, the running device can determine that the address of the metadata is 400c04; based on the address of the prefetch instruction and the address offset 0xc0 indicated by the prefetch instruction, the running device can determine that the address of the data access instruction is 400bc0.
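  • the address arithmetic in this example can be checked directly; the sketch below simply replays the values quoted above.

```c
#include <assert.h>
#include <stdint.h>

int main(void)
{
    uint64_t prefetch_insn = 0x400b00;          /* address of the prefetch instruction */
    assert(prefetch_insn + 0x104 == 0x400c04);  /* address of the metadata             */
    assert(prefetch_insn + 0x0c0 == 0x400bc0);  /* address of the data access insn     */
    return 0;
}
```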
  • when the compiler compiles the verification program, it recognizes the memory access behavior of the chained data structure in the verification program. For the verification program shown in Figure 10B, the compiler recognizes the loop shown in b in Figure 10B and generates the corresponding metadata based on the data structure information shown in a in Figure 10B.
  • the metadata is in units of 4 bytes.
  • Each metadata stores information about a certain data in the chained data structure and another data pointed to by a pointer in the data.
  • the content stored in each metadata includes the following five types of information.
  • Node identification (Node-ID). Each type of data in the chained data structure is assigned a Node-ID.
  • the Node-ID in the metadata is used to indicate the type of data corresponding to the metadata.
  • Offset: in units of N bytes, stores the relative offset of the pointer (Ptr) within the data corresponding to the current metadata, that is, it indicates the position of the pointer within the data corresponding to the metadata.
  • Nextnode-ID indicates the ID of the next data pointed to by Ptr, that is, it indicates the type of another data pointed to by the data corresponding to the metadata.
  • Nextnode-size: in units of M bytes, stores the size of the next data pointed to by Ptr, that is, it indicates the size of the other data pointed to by the data corresponding to the metadata.
  • N and M and the coding space occupied by each part can be adjusted according to the application and architecture. This embodiment does not limit the specific values of these parameters.
  • this example of the application allocates the encoding space of the metadata in the above manner and sets Offset and Nextnode-size to be expressed in bytes; then, for the program shown in Figure 10B, the metadata generated by the compiler is as shown in Table 1.
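  • under the field layout just described, one possible C rendering of a 4-byte metadata entry is sketched below; the 8-bit field widths are placeholders chosen for illustration, not values fixed by the embodiment.

```c
#include <stdint.h>

/* One 4-byte metadata entry; bit widths are illustrative only. */
struct chain_metadata {
    uint32_t node_id       : 8;  /* Node-ID: type of the data this entry describes        */
    uint32_t offset        : 8;  /* Offset: position of the pointer in the data, in units
                                    of N bytes                                             */
    uint32_t nextnode_id   : 8;  /* Nextnode-ID: type of the data the pointer points to   */
    uint32_t nextnode_size : 8;  /* Nextnode-size: size of that data, in units of M bytes */
};
```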
  • since the program shown in b in Figure 10B needs to access the data of the BackboneNode node and the data of the RibNode node, the compiler passes to the hardware, through the metadata, the information required to calculate the addresses of these two kinds of nodes. Since the memory access behavior in b in Figure 10B does not access the ArcNode data, there is no need to provide metadata for calculating the ArcNode address; that is, the metadata generated by the compiler is used to indicate the addresses of the data to be accessed.
  • Figure 11 is a schematic flow chart of a data prefetching method provided by an embodiment of the present application. As shown in Figure 11, the data prefetching method includes the following steps 1101-1108.
  • Step 1101 The running device determines whether the instruction is a prefetch instruction.
  • the decoding unit in the running device determines whether the instruction currently to be executed is a prefetch instruction.
  • Step 1102 The running device initializes the chain prefetcher based on the prefetch instruction to obtain and save the metadata indicated by the prefetch instruction and the address of the data access instruction.
  • the running device initializes the chained prefetcher based on the prefetch instruction. In this way, the chained prefetcher in the running device can obtain and save the metadata indicated by the prefetch instruction and the address of the data access instruction.
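  • a software analogue of this initialization step might simply record what the prefetch instruction indicates, as in the sketch below; the structure and function names are invented for illustration.

```c
#include <stdint.h>
#include <stddef.h>

/* State captured by the chained prefetcher when the prefetch instruction is decoded. */
struct chain_prefetcher {
    uintptr_t metadata_addr;     /* where the metadata entries are stored         */
    size_t    metadata_count;    /* how many entries to read                      */
    uintptr_t access_insn_addr;  /* address of the monitored data access insn     */
    size_t    prefetch_count;    /* prefetch requests issued so far               */
    size_t    access_count;      /* observed executions of the access instruction */
};

static void chain_prefetcher_init(struct chain_prefetcher *p, uintptr_t meta_addr,
                                  size_t meta_count, uintptr_t access_addr)
{
    p->metadata_addr    = meta_addr;
    p->metadata_count   = meta_count;
    p->access_insn_addr = access_addr;
    p->prefetch_count   = 0;
    p->access_count     = 0;
}
```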
  • Step 1103 The running device determines whether the instruction is a data access instruction.
  • the decoding unit in the running device continuously monitors whether the data access instruction is executed, that is, it determines whether the currently decoded instruction is the data access instruction indicated by the prefetch instruction.
  • Step 1104 The running device obtains the starting storage address of the chained data structure based on the data access instruction.
  • the data access instruction indicates the starting storage address of the chained data structure.
  • Step 1105 The running device sends a prefetch request to the cache according to the starting storage address.
  • Step 1106 The operating device determines whether the cache has returned the prefetched data.
  • if the running device determines that the cache has returned the prefetched data, the running device continues to execute step 1107.
  • Step 1107 The running device calculates the next prefetch address based on the metadata indicated by the prefetch instruction and the returned data.
  • the chained prefetcher in the running device can calculate the next prefetch address according to the metadata indicated by the prefetch instruction and the returned data.
  • Step 1108 The operating device determines whether to stop prefetching based on the number of execution times of the data access instructions and the number of data prefetching.
  • the running device can monitor the number of executions of the data access instruction based on the address of the data access instruction. When the difference between the number of data prefetches and the number of executions of the data access instruction is less than the preset range, the running device continues to prefetch data based on the calculated next prefetch address; when the difference between the number of data prefetches and the number of executions of the data access instruction is greater than the preset range, the running device stops prefetching data.
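  • steps 1105-1108 can be summarized with the following software sketch, which assumes a single pointer offset per node, assumes the prefetch count never falls behind the access count, uses the GCC/Clang __builtin_prefetch builtin in place of a hardware prefetch request, and reduces the preset range to a single example threshold.

```c
#include <stddef.h>

enum { PRESET_AHEAD = 8 };   /* example threshold within the preset range */

/* Issue a prefetch, derive the next address from the returned node and its
 * metadata, and stop once the prefetcher is far enough ahead of the accesses. */
static void run_chain_prefetch(const void *start, size_t ptr_offset,
                               size_t *prefetch_count, const size_t *access_count)
{
    const void *cur = start;
    while (cur != NULL &&
           *prefetch_count - *access_count < PRESET_AHEAD) {   /* step 1108 */
        __builtin_prefetch(cur, 0, 3);                         /* step 1105 */
        (*prefetch_count)++;
        /* steps 1106-1107: once the node is available, read its pointer field
         * (located via the metadata Offset) to obtain the next prefetch address */
        cur = *(const void *const *)((const char *)cur + ptr_offset);
    }
}
```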
  • FIG. 12 is a schematic structural diagram of a compilation device 1200 provided by an embodiment of the present application.
  • the compilation device 1200 includes an acquisition unit 1201 and a processing unit 1202 .
  • the acquisition unit 1201 is used to obtain the first code;
  • the processing unit 1202 is used to generate a data access instruction and at least one metadata according to the chained data structure when it is recognized that the first code contains code requesting access to the chained data structure, wherein the chained data structure includes a plurality of data with discontinuous addresses, the at least one metadata is used to indicate the addresses of the data in the chained data structure, and the data access instruction is used to indicate the address of the chained data structure and the request to access the chained data structure;
  • the processing unit 1202 is also configured to generate a prefetch instruction according to the at least one metadata and the data access instruction to obtain the compiled second code, where the prefetch instruction is used to indicate the address of the data access instruction and the at least one metadata.
  • the data in the chained data structure includes pointers pointing to addresses of other data
  • the at least one metadata respectively corresponds to different data in the chained data structure
  • the at least one metadata is used to indicate the position of the pointer in the corresponding data.
  • the at least one metadata is also used to indicate the size of other data pointed to by the corresponding data.
  • each metadata in the at least one metadata is also used to indicate the type of the corresponding data and the type of other data pointed to by the corresponding data.
  • the prefetch instruction is specifically used to indicate an address offset between the prefetch instruction and the data access instruction.
  • the prefetch instruction is specifically used to indicate the address of the at least one metadata.
  • the prefetch instruction is used to indicate the starting address of the at least one metadata and the quantity of the at least one metadata, and the sizes of the at least one metadata are the same.
  • the at least one metadata is located in a code segment or data segment in the second code, and the second code is compiled based on the first code.
  • FIG. 13 is a schematic structural diagram of a data prefetching device 1300 provided by an embodiment of the present application.
  • the data prefetching device 1300 includes: an acquisition unit 1301 , a prefetching unit 1302 and an execution unit 1303 .
  • the acquisition unit 1301 is configured to acquire a prefetch instruction, wherein the prefetch instruction is used to indicate the address of a data access instruction and at least one metadata, the data access instruction is used to indicate the address of a chained data structure, the chained data structure includes multiple data with discontinuous addresses, and the at least one metadata is used to indicate the addresses of the data in the chained data structure; the acquisition unit 1301 is also used to acquire the address of the chained data structure according to the address of the data access instruction.
  • the prefetching unit 1302 is used to prefetch the data in the chained data structure according to the address of the chained data structure and the at least one metadata, and the execution unit 1303 is used to execute the data access instruction to access the data in the chained data structure; the progress of prefetching the data in the chained data structure is controlled according to the number of times the data access instruction is executed, and the progress is used to make the data in the chained data structure prefetched into the cache before being accessed.
  • the difference between the amount of prefetched data and the amount of accessed data is within a preset range.
  • the data in the chained data structure includes pointers pointing to addresses of other data in the chained data structure, the at least one metadata respectively corresponds to different data in the chained data structure, and each of the at least one metadata is used to indicate the position of the pointer in the corresponding data;
  • the prefetch unit 1302 is specifically configured to: prefetch data in the chained data structure according to the address of the chained data structure; obtain the pointer in the prefetched data according to the prefetched data in the chained data structure and the metadata corresponding to the prefetched data; and prefetch, from the chained data structure, the other data pointed to by the prefetched data according to the address pointed to by the pointer in the prefetched data.
  • the at least one metadata is also used to indicate the size of other data pointed to by the corresponding data.
  • each metadata in the at least one metadata is also used to indicate the type of the corresponding data and the type of other data pointed to by the corresponding data.
  • the prefetch instruction is specifically used to indicate an address offset between the prefetch instruction and the data access instruction.
  • the prefetch instruction is specifically used to indicate the address of the at least one metadata
  • the obtaining unit 1301 is also configured to: obtain the at least one metadata according to the address of the at least one metadata.
  • the at least one metadata has the same size, and the prefetch instruction is used to indicate the starting address of the at least one metadata and the quantity of the at least one metadata;
  • the acquisition unit 1301 is further configured to: acquire the at least one metadata starting from the starting address of the at least one metadata according to the quantity and size of the at least one metadata.
  • the compiling method and data prefetching method provided by the embodiments of the present application can be specifically executed by a chip in an electronic device.
  • the chip includes: a processing unit and a communication unit.
  • the processing unit can be, for example, a processor, and the communication unit can be, for example, an input/output interface, a pin or a circuit, etc.
  • the processing unit can execute computer-executable instructions stored in the storage unit, so that the chip in the electronic device executes the methods described in the embodiments shown in FIGS. 1 to 11.
  • the storage unit is a storage unit within the chip, such as a register, cache, etc.
  • the storage unit can also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM), etc.
  • the present application also provides a computer-readable storage medium.
  • the methods disclosed in the above embodiments can be implemented as computer program instructions encoded in a machine-readable format on the computer-readable storage medium or encoded on other non-transitory media or articles.
  • FIG. 14 schematically illustrates a conceptual partial view of an example computer-readable storage medium including a computer program for executing a computer process on a computing device, arranged in accordance with at least some embodiments presented herein.
  • the computer-readable storage medium 1400 is provided using a signal bearing medium 1401.
  • Signal bearing medium 1401 may include one or more program instructions 1402 that, when executed by one or more processors, may provide the functionality or portions of the functionality described above with respect to FIG. 4 or FIG. 7 .
  • program instructions 1402 in Figure 14 also describe example instructions.
  • signal bearing media 1401 may include computer readable media 1403 such as, but not limited to, a hard drive, compact disk (CD), digital video disc (DVD), digital tape, memory, ROM or RAM, and the like.
  • signal bearing media 1401 may include computer recordable media 1404 such as, but not limited to, memory, read/write (R/W) CDs, R/W DVDs, and the like.
  • signal bearing medium 1401 may include communication media 1405, such as, but not limited to, digital and/or analog communication media (eg, fiber optic cables, waveguides, wired communication links, wireless communication links, etc.).
  • signal bearing medium 1401 may be conveyed by a wireless form of communication medium 1405 (eg, a wireless communication medium that complies with the IEEE 802.14 standard or other transmission protocol).
  • One or more program instructions 1402 may be, for example, computer-executable instructions or logic-implemented instructions.
  • for example, the computing device may be configured to provide various operations, functions, or actions in response to the program instructions 1402 conveyed to the computing device by one or more of the computer-readable medium 1403, the computer-recordable medium 1404, and/or the communication medium 1405.
  • the disclosed systems, devices and methods can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or can be integrated into another system, or some features can be ignored, or not implemented.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application can be integrated into one processing unit, each unit can exist physically alone, or two or more units can be integrated into one unit.
  • the above integrated units can be implemented in the form of hardware or software functional units.
  • the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application.
  • the aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, an optical disk, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

A data prefetching method and a compiling method, which may implement effective prefetching of data in a linked data structure. In the present solution, when a code accessing the linked data structure is identified in a compiling process, a data access instruction and metadata used for indicating addresses of the data in the linked data structure are generated, and a prefetching instruction is generated to indicate an address of the data access instruction and the metadata. In this way, when a running device executes an executable file obtained by compiling, the data access instruction and the metadata can be determined according to the prefetching instruction, such that prefetching of the data in the linked data structure is implemented. Moreover, after the running device acquires the address of the data access instruction on the basis of the prefetching instruction, an access progress of the data in the linked data structure can be learned according to the number of accesses of the data access instruction, thereby adaptively adjusting the quantity of prefetched data, and ensuring the effective prefetching of the data in the linked data structure.

Description

A data prefetching method, compilation method and related devices
This application claims priority to the Chinese patent application filed with the China Patent Office on June 10, 2022, with application number 202210654495.4 and the invention title "A data prefetching method, compilation method and related devices", the entire content of which is incorporated in this application by reference.
Technical field
The present application relates to the field of computer technology, and in particular, to a data prefetching method, a compilation method and related devices.
Background
In computer systems, different storage devices usually have different access speeds. In computers with multi-level storage systems, the computer usually uses data prefetching technology to improve the access performance of the system. Specifically, the computer predicts the data to be accessed and loads the predicted data in advance from a storage device with a slower access speed into a storage device with a faster access speed, for example, loading the predicted data from the memory into the cache.
Currently, existing data prefetching technologies usually prefetch the data to be accessed based on historical memory access information. For example, when the computer detects from the historical memory access information that a program accesses data in an address-incrementing manner, the computer prefetches the data to be accessed based on the currently accessed data and the same address-incrementing manner.
However, current data prefetching technology can only achieve effective prefetching for data whose storage addresses follow a certain pattern, such as data with continuous storage addresses or data whose storage addresses increase by a certain value. For data in a chained data structure whose storage addresses follow no pattern, current data prefetching technology can hardly achieve effective prefetching. A chained data structure usually includes multiple data stored in a scattered manner, and the data in the chained data structure that includes a pointer points to the address where the next data is stored. When prefetching data in a chained data structure, current data prefetching technology often finds it difficult to determine the amount of data to prefetch: too much data may be prefetched at one time, causing cache pollution, or too little data may be prefetched, failing to achieve the purpose of improving data access performance.
Therefore, there is an urgent need for a method that can effectively prefetch data in a chained data structure.
Summary of the invention
This application provides a data prefetching method that can effectively prefetch data in a chained data structure.
A first aspect of the present application provides a data prefetching method. The method is applied to a first instance of a computer system, and the computer system further includes a second instance. Specifically, the data prefetching method includes: the first instance obtains a prefetch instruction in an executable file, where the prefetch instruction is used to indicate the address of a data access instruction and at least one metadata, the data access instruction is used to indicate the address of a chained data structure, the chained data structure includes multiple data with discontinuous addresses, and the at least one metadata is used to indicate the addresses of the data in the chained data structure. The address of the chained data structure may refer to the address of any data in the chained data structure. The address of the chained data structure may be determined based on the address of the first data that needs to be accessed in the chained data structure; for example, if the first data that needs to be accessed in the chained data structure is the first data in the chained data structure, then the data access instruction indicates the address of the first data in the chained data structure. In addition, the address of the chained data structure may indicate a specific single address, such as the starting address where certain data is stored; the address of the chained data structure may also indicate an address segment, such as the address segment where certain data is stored.
Then, the first instance obtains the address of the chained data structure according to the address of the data access instruction. Furthermore, the first instance prefetches data in the chained data structure according to the address of the chained data structure and the at least one metadata.
In addition, the second instance executes the data access instruction to access the data in the chained data structure.
Specifically, in the process of the first instance prefetching data in the chained data structure, the first instance controls, according to the number of times the second instance executes the data access instruction, the progress of prefetching the data in the chained data structure, where the progress is used to make the data in the chained data structure prefetched into the cache before being accessed.
In this solution, when the running device executes the executable file, it can determine the data access instruction and the at least one metadata based on the prefetch instruction in the executable file, thereby realizing prefetching of the data in the chained data structure. Moreover, after the running device obtains the address of the data access instruction based on the prefetch instruction, it can learn the access progress of the data in the chained data structure according to the number of accesses of the data access instruction, thereby controlling the progress of prefetching the data in the chained data structure, that is, adaptively adjusting the amount of prefetched data and ensuring effective prefetching of the data in the chained data structure.
It should be noted that, during the process of the first instance prefetching data in the chained data structure, the data prefetched by the first instance may not include pointers to other data; or, among all the data prefetched by the first instance, part of the data includes pointers to other data while the other part does not. Similarly, in the process of the second instance executing the data access instruction, the data accessed by the second instance may also not include pointers to other data; or, among all the data accessed by the second instance, part of the data includes pointers to other data while the other part does not.
In a possible implementation, during the process of the first instance prefetching data in the chained data structure, the difference between the amount of prefetched data and the amount of accessed data is within a preset range. For example, assuming the preset range is 5-10, the first instance can control the number of prefetched data in the chained data structure to always be 5-10 more than the number of data actually accessed, ensuring the timeliness of prefetching while avoiding polluting the cache by prefetching too much data. In addition, during prefetching, the first instance can dynamically adjust the above preset range according to the progress of the data and the available cache space of the running device, to ensure a balance between the amount of prefetched data and the available cache space.
In a possible implementation, the data in the chained data structure includes pointers pointing to addresses of other data in the chained data structure, the at least one metadata respectively corresponds to different data in the chained data structure, and each of the at least one metadata is used to indicate the position of the pointer in the corresponding data.
The first instance prefetching data in the chained data structure according to the address of the chained data structure and the at least one metadata includes: the first instance prefetches data in the chained data structure according to the address of the chained data structure; the first instance obtains the pointer in the prefetched data according to the prefetched data in the chained data structure and the metadata corresponding to the prefetched data; and the first instance prefetches, from the chained data structure, the other data pointed to by the prefetched data according to the address pointed to by the pointer in the prefetched data.
In a possible implementation, the at least one metadata is also used to indicate the size of the other data pointed to by the corresponding data. That is to say, for a certain metadata, the metadata is also used to indicate the size of the other data pointed to by the pointer in the data corresponding to the metadata. For example, assume that metadata 1 corresponds to data 1, and the pointer in data 1 points to data 2; then metadata 1 is also used to indicate the size of data 2 pointed to by the pointer in data 1.
In a possible implementation, each metadata in the at least one metadata is also used to indicate the type of the corresponding data and the type of the other data pointed to by the corresponding data.
In this solution, by indicating the data type of the data in the metadata, it can be determined which data the pointer indicated by the metadata actually points to, thereby determining the link relationship of the data in the chained data structure, so that the running device can effectively prefetch data in a complex chained data structure.
In a possible implementation, the prefetch instruction is specifically used to indicate the address offset between the prefetch instruction and the data access instruction.
In this solution, by indicating, in the prefetch instruction, the address offset between the prefetch instruction and the data access instruction, the encoding space occupied by the prefetch instruction can be reduced, thereby saving instruction overhead.
In a possible implementation, the prefetch instruction is specifically used to indicate the address of the at least one metadata; the method further includes: the first instance obtains the at least one metadata according to the address of the at least one metadata.
In this solution, by indicating the storage address of the metadata in the prefetch instruction, directly storing the metadata in the prefetch instruction can be avoided, reducing the encoding space occupied by the prefetch instruction and thereby saving instruction overhead.
In a possible implementation, the at least one metadata has the same size, and the prefetch instruction is used to indicate the starting address of the at least one metadata and the quantity of the at least one metadata; the first instance obtaining the at least one metadata according to the storage address of the at least one metadata includes: the first instance obtains the at least one metadata starting from the starting storage address of the at least one metadata according to the quantity and size of the at least one metadata.
A second aspect of this application provides a compilation method, including: a compiler obtains first code, where the first code may refer to program source code, for example, code written in a high-level language such as Java, C, C++ or Python.
When recognizing that the first code contains code requesting access to a chained data structure, the compiler generates a data access instruction and at least one metadata according to the chained data structure, where the chained data structure includes multiple data with discontinuous addresses, the at least one metadata is used to indicate the addresses of the data in the chained data structure, and the data access instruction is used to indicate the address of the chained data structure and the request to access the chained data structure.
Finally, the compiler generates a prefetch instruction according to the at least one metadata and the data access instruction to obtain compiled second code, where the prefetch instruction is used to indicate the address of the data access instruction and the at least one metadata.
In this solution, when code accessing a chained data structure is identified during compilation, a data access instruction and at least one metadata used to indicate the addresses of the data to be accessed in the chained data structure are generated, and a prefetch instruction is inserted before the data access instruction to indicate the address of the data access instruction and the at least one metadata. In this way, when the running device executes the compiled executable file, it can determine the data access instruction and the at least one metadata based on the prefetch instruction, thereby realizing prefetching of the data in the chained data structure. Moreover, after the running device obtains the address of the data access instruction based on the prefetch instruction, it can learn the access progress of the data in the chained data structure according to the number of accesses of the data access instruction, thereby adaptively adjusting the amount of prefetched data and ensuring effective prefetching of the data in the chained data structure.
In a possible implementation, the data in the chained data structure includes pointers pointing to addresses of other data, the at least one metadata respectively corresponds to different data in the chained data structure, and each of the at least one metadata is used to indicate the position of the pointer in the corresponding data.
In a possible implementation, the at least one metadata is also used to indicate the size of the other data pointed to by the corresponding data.
In a possible implementation, each metadata in the at least one metadata is also used to indicate the type of the corresponding data and the type of the other data pointed to by the corresponding data.
In a possible implementation, the prefetch instruction is specifically used to indicate the address offset between the prefetch instruction and the data access instruction.
In a possible implementation, the prefetch instruction is specifically used to indicate the address of the at least one metadata.
In a possible implementation, the prefetch instruction is used to indicate the starting address of the at least one metadata and the quantity of the at least one metadata, and the sizes of the at least one metadata are the same.
In a possible implementation, the at least one metadata is located in a code segment or a data segment of the second code, and the second code is compiled based on the first code.
A third aspect of this application provides a data prefetching apparatus, including:
an acquisition unit, configured to acquire a prefetch instruction, where the prefetch instruction is used to indicate the address of a data access instruction and at least one metadata, the data access instruction is used to indicate the address of a chained data structure, the chained data structure includes multiple data with discontinuous addresses, and the at least one metadata is used to indicate the addresses of the data in the chained data structure;
the acquisition unit is also configured to acquire the address of the chained data structure according to the address of the data access instruction;
a prefetch unit, configured to prefetch data in the chained data structure according to the address of the chained data structure and the at least one metadata;
an execution unit, configured to execute the data access instruction to access the data in the chained data structure;
where, in the process of the first instance prefetching data in the chained data structure, the first instance controls, according to the number of times the second instance executes the data access instruction, the progress of prefetching the data in the chained data structure, and the progress is used to make the data in the chained data structure prefetched into the cache before being accessed.
In a possible implementation, during the process of the prefetch unit prefetching data in the chained data structure, the difference between the amount of prefetched data and the amount of accessed data is within a preset range.
In a possible implementation, the data in the chained data structure includes pointers pointing to addresses of other data in the chained data structure, the at least one metadata respectively corresponds to different data in the chained data structure, and each of the at least one metadata is used to indicate the position of the pointer in the corresponding data;
the prefetch unit is specifically configured to: prefetch data in the chained data structure according to the address of the chained data structure; obtain the pointer in the prefetched data according to the prefetched data in the chained data structure and the metadata corresponding to the prefetched data; and prefetch, from the chained data structure, the other data pointed to by the prefetched data according to the address pointed to by the pointer in the prefetched data.
In a possible implementation, the at least one metadata is also used to indicate the size of the other data pointed to by the corresponding data.
In a possible implementation, each metadata in the at least one metadata is also used to indicate the type of the corresponding data and the type of the other data pointed to by the corresponding data.
In a possible implementation, the prefetch instruction is specifically used to indicate the address offset between the prefetch instruction and the data access instruction.
In a possible implementation, the prefetch instruction is specifically used to indicate the address of the at least one metadata;
the acquisition unit is further configured to: acquire the at least one metadata according to the address of the at least one metadata.
In a possible implementation, the at least one metadata has the same size, and the prefetch instruction is used to indicate the starting address of the at least one metadata and the quantity of the at least one metadata;
the acquisition unit is further configured to: acquire the at least one metadata starting from the starting address of the at least one metadata according to the quantity and size of the at least one metadata.
A fourth aspect of this application provides a compilation apparatus, including:
an acquisition unit, configured to acquire first code;
a processing unit, configured to generate a data access instruction and at least one metadata according to a chained data structure when it is recognized that the first code contains code requesting access to the chained data structure, where the chained data structure includes multiple data with discontinuous addresses, the at least one metadata is used to indicate the addresses of the data in the chained data structure, and the data access instruction is used to indicate the address of the chained data structure and the request to access the chained data structure;
the processing unit is also configured to generate a prefetch instruction according to the at least one metadata and the data access instruction to obtain compiled second code, where the prefetch instruction is used to indicate the address of the data access instruction and the at least one metadata.
In a possible implementation, the data in the chained data structure includes pointers pointing to addresses of other data, the at least one metadata respectively corresponds to different data in the chained data structure, and each of the at least one metadata is used to indicate the position of the pointer in the corresponding data.
In a possible implementation, the at least one metadata is also used to indicate the size of the other data pointed to by the corresponding data.
In a possible implementation, each metadata in the at least one metadata is also used to indicate the type of the corresponding data and the type of the other data pointed to by the corresponding data.
In a possible implementation, the prefetch instruction is specifically used to indicate the address offset between the prefetch instruction and the data access instruction.
In a possible implementation, the prefetch instruction is specifically used to indicate the address of the at least one metadata.
In a possible implementation, the prefetch instruction is used to indicate the starting address of the at least one metadata and the quantity of the at least one metadata, and the sizes of the at least one metadata are the same.
In a possible implementation, the at least one metadata is located in a code segment or a data segment of the second code, and the second code is compiled based on the first code.
A fifth aspect of this application provides an electronic device, including a memory and a processor. The memory stores code, and the processor is configured to execute the code; when the code is executed, the electronic device performs the method of any one of the implementations of the first aspect or the second aspect.
A sixth aspect of this application provides a computer-readable storage medium storing a computer program that, when run on a computer, causes the computer to perform the method of any one of the implementations of the first aspect or the second aspect.
A seventh aspect of this application provides a computer program product that, when run on a computer, causes the computer to perform the method of any one of the implementations of the first aspect or the second aspect.
An eighth aspect of this application provides a chip including one or more processors. Some or all of the processors are configured to read and execute a computer program stored in a memory, so as to perform the method in any possible implementation of any of the foregoing aspects.
Optionally, the chip includes the memory, and the memory is connected to the processor through circuits or wires. Optionally, the chip further includes a communication interface, and the processor is connected to the communication interface. The communication interface is used to receive data and/or information to be processed; the processor obtains the data and/or information from the communication interface, processes it, and outputs the processing result through the communication interface. The communication interface may be an input/output interface. The method provided by this application may be implemented by a single chip or cooperatively by multiple chips.
For the beneficial effects of the second to eighth aspects of this application, reference may be made to the description of the first aspect, and details are not repeated here.
Description of the drawings
Figure 1 is a schematic diagram of a chained data structure according to an embodiment of this application;
Figure 2 is a schematic diagram of several different chained data structures according to an embodiment of this application;
Figure 3 is a schematic diagram of an execution device executing an application program according to an embodiment of this application;
Figure 4 is a schematic flowchart of a compilation method according to an embodiment of this application;
Figure 5 is a schematic diagram of the correspondence between data and metadata in a chained data structure according to an embodiment of this application;
Figure 6 is a schematic diagram of data types in a chained data structure according to an embodiment of this application;
Figure 7 is a schematic flowchart of a data prefetching method according to an embodiment of this application;
Figure 8 is a schematic diagram of a system architecture according to an embodiment of this application;
Figure 9 is a schematic flowchart of a compilation method according to an embodiment of this application;
Figure 10A is a schematic diagram of compiling a verification program with an existing compiler according to an embodiment of this application;
Figure 10B is a schematic diagram of compiling a verification program with a compiler that includes the newly added optimization pass according to an embodiment of this application;
Figure 11 is a schematic flowchart of a data prefetching method according to an embodiment of this application;
Figure 12 is a schematic structural diagram of a compilation apparatus 1200 according to an embodiment of this application;
Figure 13 is a schematic structural diagram of a data prefetching apparatus 1300 according to an embodiment of this application;
Figure 14 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of this application.
Description of embodiments
The embodiments of this application are described below with reference to the accompanying drawings. The terms used in the implementation section of this application are intended only to explain specific embodiments of this application and are not intended to limit this application.
The embodiments of this application are described below with reference to the accompanying drawings. A person of ordinary skill in the art will appreciate that, as technology develops and new scenarios emerge, the technical solutions provided in the embodiments of this application are equally applicable to similar technical problems.
The terms "first", "second", and the like in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that terms used in this way are interchangeable where appropriate; they are merely a way of distinguishing objects with the same attributes when describing the embodiments of this application. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, system, product, or device that includes a series of units is not necessarily limited to those units, but may include other units that are not explicitly listed or that are inherent to the process, method, product, or device.
To facilitate understanding, the technical terms involved in the embodiments of this application are explained below.
Chained data structure: a chained data structure includes multiple data items with discontinuous addresses, and the data items have address-pointing relationships with each other, i.e., the previous data item in the chained data structure points to the address of the next data item. Refer to Figure 1, which is a schematic diagram of a chained data structure according to an embodiment of this application. As shown in Figure 1, each data item in a chained data structure consists of two parts: a payload part and a pointer part, where the pointer part points to the address of the next data item linked to the current one. In short, a chained data structure uses pointers to express the logical relationships between data elements. Consequently, a chained data structure is normally traversed from front to back: the previous data item must be accessed first before the next one can be accessed based on the address it indicates.
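As a minimal sketch of such a data item in C (assuming a singly linked list; the field names and sizes are purely illustrative):

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative node of a chained data structure: a payload part
 * followed by a pointer part that links to the next node. */
struct node {
    uint8_t payload[24];   /* valid data part */
    struct node *next;     /* pointer part: address of the next node */
};

/* Front-to-back traversal: node i+1 can only be located after node i
 * has been read, which is what makes this access pattern hard to
 * prefetch with conventional techniques. */
static void walk(const struct node *head) {
    for (const struct node *n = head; n != NULL; n = n->next) {
        (void)n->payload;  /* consume the payload */
    }
}
```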
The common structural forms of chained data structures include singly linked lists, doubly linked lists, circular linked lists, backbone-rib linked lists, binary trees, and arrays of structures. An array-of-structures form stores an array of structures in contiguous memory, where each structure contains a pointer; a sketch of this form is given below. For example, refer to Figure 2, which is a schematic diagram of several different chained data structures according to an embodiment of this application. Part (1) of Figure 2 shows a backbone-rib linked list, part (2) shows a binary tree, and part (3) shows an array of structures.
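A minimal sketch of the array-of-structures form, assuming each element carries a pointer to out-of-line data; the names are illustrative:

```c
#include <stdlib.h>

/* Array-of-structures form: the elements themselves sit in contiguous
 * memory, but each element points to data allocated elsewhere, so the
 * pointed-to accesses are still irregular. */
struct entry {
    int key;
    char *detail;          /* pointer into non-contiguous memory */
};

struct entry *make_table(size_t n) {
    struct entry *table = calloc(n, sizeof *table);
    if (table == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        table[i].key = (int)i;
        table[i].detail = malloc(64);   /* scattered allocations */
    }
    return table;
}
```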
At present, chained data structures consist mainly of dynamically linked data, usually in the form of trees, graphs, or linked lists. Chained data structures are widely used in general-purpose computing, high-performance computing (HPC), databases, and artificial intelligence. They are also the key underlying data structures of the containers provided by object-oriented programming languages such as C++ and Java. A chained data structure can make full use of computer memory and enables flexible dynamic memory management. Its drawback, however, is that there is no spatial locality between its data items, so reading a chained data structure is mostly a typical irregular memory access pattern, which easily causes memory access latency, limits the performance of the central processing unit (CPU), and manifests as a performance bottleneck in various application scenarios.
Memory access latency: the delay incurred while waiting for an access to data stored in system memory to complete.
Compilation: the process of using a compiler to generate object code from a source program written in a source language. Object code is a language between high-level language and machine language, and it can be further converted into executable binary machine code. Simply put, compilation converts a source program written in a high-level language into object code that is closer to machine language. Since a computer only understands 1s and 0s, compilation in effect turns a high-level language that people are familiar with into a binary language that the computer can recognize. The process by which a compiler translates a source program into a target program is divided into five stages: lexical analysis; syntax analysis; semantic checking and intermediate code generation; code optimization; and target code generation.
Intermediate code: an internal representation of the source program, also called an intermediate representation (IR). The intermediate representation makes the structure of the compiler logically simpler and clearer, and in particular makes optimization of the target code easier to implement. Its complexity lies between that of the source programming language and that of machine language.
Code optimization: applying various equivalent transformations to a program so that more efficient target code can be generated from the transformed program. "Equivalent" means that the running results of the program are not changed. "Efficient" mainly means that the target code runs in less time and occupies less storage space. Such transformations are called optimizations.
Optimization pass: optimization passes are an important part of a compilation framework. An optimization pass analyzes and modifies the intermediate representation. During code optimization, multiple optimization passes analyze and modify the intermediate representation, and each pass performs a specific optimization task.
Program counter (PC): a register that stores the address of the next instruction to be executed by the execution device.
Metadata: metadata, also known as intermediary data or relay data, is data about data. Metadata mainly describes the properties of data and is used to support functions such as indicating storage locations, historical data, resource lookup, and file records. In essence, metadata is a kind of electronic catalogue: to serve its cataloguing purpose, it must describe and record the content or characteristics of the data, thereby assisting data retrieval.
Generally, an application program consists of program segments such as a code segment, a data segment, and a read-only data segment, and the code segment consists of a sequence of consecutive instructions. During execution of the application, the operating system loads the program segments into memory, and the execution device then executes the instructions in the code segment in a certain order, thereby executing the application.
Refer to Figure 3, which is a schematic diagram of an execution device executing an application program according to an embodiment of this application. As shown in Figure 3, an execution device usually includes a control unit, a storage unit, and an arithmetic unit. The control unit includes a program counter and an instruction register: the program counter stores the memory address of the next instruction to be executed, and the instruction register stores the instruction to be executed. The storage unit usually includes multiple registers, such as general-purpose registers and floating-point registers, which store the data needed while instructions are executed. The arithmetic unit processes data according to the instruction currently being executed.
Based on this structure, the execution device operates as follows: driven by the clock, the control unit places the instruction address held in the program counter (i.e., the address of the instruction in memory) onto the address bus (not shown in Figure 3), and the execution device then reads the instruction at that address into the instruction register for decoding. For data needed while executing an instruction, the execution device places the corresponding data address onto the address bus and, based on that address, reads the data into its internal storage unit for temporary storage. Finally, the arithmetic unit processes the data according to the instruction currently being executed. In general, the execution device fetches instructions and the corresponding data from memory one by one, and performs operations on the data according to the opcodes in the instructions until the program finishes.
Specifically, the working process of the execution device can be divided into five stages: instruction fetch, instruction decode, instruction execution, memory access, and write-back.
1. Instruction fetch (IF) stage.
The instruction fetch stage is the process of fetching an instruction from memory into the instruction register. The value in the program counter indicates the location in memory of the next instruction to be executed. After an instruction is fetched, the value in the program counter is automatically incremented according to the length of the instruction.
2. Instruction decode (ID) stage.
After the instruction is fetched, the execution device immediately enters the instruction decode stage. In this stage, the instruction decoder splits and interprets the fetched instruction according to the predetermined instruction format, identifying the different instruction categories and the various ways of obtaining the operands.
3. Execute (EX) stage.
After the instruction fetch and decode stages, the execution device enters the execute stage. The task of this stage is to carry out the operations specified by the instruction so as to realize its function. To this end, different parts of the execution device are connected to perform the required operations. For example, if an addition is required, the arithmetic logic unit in the arithmetic unit is connected to a set of inputs and a set of outputs; the inputs provide the values to be added, and the outputs hold the final result.
4. Memory access (MEM) stage.
While executing an instruction, the execution device may need to access memory to read operands, depending on the instruction. In this case, the execution device enters the memory access stage. The task of this stage is for the execution device to obtain the memory address of the operand from the instruction's address code and read the operand from memory for the operation.
5. Write-back (WB) stage.
As the final stage, the write-back stage writes the result of the execute stage back into some storage structure. For example, the result data is usually written into the internal registers of the execution device so that it can be quickly accessed by subsequent instructions; in some cases, the result data may also be written into memory, which is slower but cheaper and larger.
After the instruction has been executed and the result written back, the execution device obtains the address of the next instruction from the program counter and starts a new cycle; in the next instruction cycle, the next instruction is fetched in sequence.
From the description of the execution device's working process above, it can be seen that the execution device usually has to go through the memory access stage for each memory access instruction, and it can only operate on the data fetched from memory after that stage completes. As a result, when a program contains a large number of such instructions, the execution device must wait for data to be fetched from memory into the cache for every memory access instruction it processes, causing significant memory access latency.
In view of this, the industry generally tries to hide memory access latency through prefetching. Prefetching techniques mainly include software prefetching (SWP) and hardware prefetching (HWP).
Software prefetching explicitly inserts prefetch instructions into the program so that the execution device reads data at a specified address from memory into the cache. Prefetch instructions can be added automatically by the compiler or manually by the programmer. Software prefetching imposes almost no requirements on the hardware; its biggest technical challenge is how to correctly insert the prefetch instructions into the target code. Chained data structures are difficult to optimize through software prefetching, because computing the prefetch addresses for a chained data structure is expensive and easily leads to insufficient prefetch distance.
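A minimal sketch of conventional software prefetching on a linked list, using the GCC/Clang __builtin_prefetch intrinsic; it also illustrates the prefetch-distance problem described above, since only the immediately next node address is cheaply available:

```c
/* Conventional software prefetching on a linked list. Only n->next is
 * known at each step, so the prefetch can run at most one node ahead. */
struct node { int value; struct node *next; };

long sum_with_prefetch(const struct node *head) {
    long sum = 0;
    for (const struct node *n = head; n != NULL; n = n->next) {
        if (n->next != NULL)
            __builtin_prefetch(n->next, 0 /* read */, 3 /* keep in cache */);
        sum += n->value;
    }
    return sum;
}
```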
Hardware prefetching uses hardware to fetch likely future memory locations into the cache in advance, based on the history of memory accesses. Typical hardware prefetchers include stream prefetchers and stride prefetchers. A stream prefetcher automatically prefetches the data of the next cache line when it detects that the program is accessing data at increasing addresses. A stride prefetcher monitors each memory load instruction; when it detects regular strided reads, it computes the next address in advance and issues a prefetch. Most existing hardware prefetching techniques are based on assumptions of temporal and spatial locality. Linked data structures, however, are very unfriendly to current CPU memory access architectures, which is why current commercial CPUs perform poorly on such applications: complex irregular memory accesses are hard to prefetch.
Moreover, when current data prefetching techniques prefetch data in a chained data structure, it is often difficult to determine how much data to prefetch: prefetching too much data at once pollutes the cache, while prefetching too little fails to improve data access performance.
In view of this, embodiments of this application provide a compilation method and a data prefetching method. When access to a chained data structure is identified during compilation, a data access instruction and at least one piece of metadata indicating the addresses of the data to be accessed in the chained data structure are generated, and a prefetch instruction indicating the address of the data access instruction and the at least one piece of metadata is inserted before the data access instruction. In this way, when executing the compiled executable file, the execution device can determine the data access instruction and the at least one piece of metadata from the prefetch instruction, thereby prefetching the data in the chained data structure. Furthermore, after obtaining the address of the data access instruction from the prefetch instruction, the execution device can learn the progress of data access in the chained data structure from the number of times the data access instruction has been executed, and adaptively adjust the amount of prefetched data, ensuring effective prefetching of the data in the chained data structure.
The compilation method provided by the embodiments of this application can be applied to compiling code that accesses chained data structures, for example code in general-purpose computing, high-performance computing, databases, and artificial intelligence. The data prefetching method provided by the embodiments of this application can be applied to scenarios in which applications that need to access chained data structures are executed.
For example, the compilation method and the data prefetching method provided by the embodiments of this application can be applied to electronic devices. The electronic device may be, for example, a server, a smartphone (mobile phone), a personal computer (PC), a laptop, a tablet, a smart TV, a mobile internet device (MID), a wearable device, a virtual reality (VR) device, an augmented reality (AR) device, a wireless terminal in industrial control, a wireless terminal in self-driving, a wireless terminal in remote medical surgery, a wireless terminal in a smart grid, a wireless terminal in transportation safety, a wireless terminal in a smart city, a wireless terminal in a smart home, or the like.
Refer to Figure 4, which is a schematic flowchart of a compilation method according to an embodiment of this application. As shown in Figure 4, the compilation method includes the following steps 401-403.
Step 401: obtain first code.
In this embodiment, the first code may be program source code. Program source code is an uncompiled text file written according to a certain programming language specification, i.e., a sequence of human-readable computer language instructions. For example, the program source code may be written in a high-level language such as Java, C, C++, or Python.
Step 402: when code requesting access to a chained data structure is identified in the first code, generate at least one piece of metadata and a data access instruction according to the chained data structure, where the chained data structure includes multiple data items with discontinuous addresses, the at least one piece of metadata indicates the addresses of the data to be accessed in the chained data structure, and the data access instruction indicates the address of the chained data structure and requests access to the chained data structure.
In this embodiment, while compiling the first code, when the compiler identifies that the first code requests access to a chained data structure, the compiler can obtain the addresses of the data to be accessed in the chained data structure according to the actual structure of the chained data structure, because each data item in the chained data structure points to the address of the next data item; the compiler thereby generates at least one piece of metadata. The at least one piece of metadata generated by the compiler indicates the addresses of the multiple data items to be accessed in the chained data structure. The chained data structure includes multiple data items with discontinuous addresses, and the data items have address-pointing relationships with each other. In addition, the multiple data items to be accessed may be all of the data in the chained data structure or only part of it; this embodiment does not specifically limit this.
It should be noted that the at least one piece of metadata described in this embodiment refers to one or more pieces of metadata. For ease of description, "the at least one piece of metadata" is hereinafter simply referred to as "the metadata".
Optionally, the metadata generated by the compiler may be located in a code segment or a data segment of the second code, where the second code is obtained by compiling the first code; that is, the second code is in effect the executable file that the compiler produces from the first code. In other words, the metadata may be stored in the code segment of the second code as instruction code that is never executed, or it may be stored in the data segment of the second code as data of the program.
In addition, when the compiler identifies that the first code requests access to a chained data structure, the compiler also generates a data access instruction, so that the execution device can later access the chained data structure according to the data access instruction when executing the compiled executable file. The data access instruction is specifically used to request access to the chained data structure, and it also indicates the address of the chained data structure.
The address of the chained data structure may be the address of any data item in the chained data structure. It may be determined from the address of the first data item that needs to be accessed in the chained data structure. For example, in the first code, if the first data item that needs to be accessed is the first data item of the chained data structure, the data access instruction indicates the address of the first data item of the chained data structure; if the first data item that needs to be accessed is a data item in the middle of the chained data structure, the data access instruction indicates the address of that data item. In addition, the address of the chained data structure may indicate a specific single address, for example the starting address at which a data item is stored, or it may indicate an address range, for example the address range in which a data item is stored.
In addition, the addresses described in the embodiments of this application, such as the address of the chained data structure and the address of the data access instruction, may be physical storage addresses or virtual storage addresses; this embodiment does not specifically limit this.
Step 403: generate a prefetch instruction according to the at least one piece of metadata and the data access instruction to obtain compiled second code, where the prefetch instruction indicates the address of the data access instruction and the at least one piece of metadata.
After the compiler has generated the metadata and the data access instruction, the compiler further generates a prefetch instruction, which indicates the address of the data access instruction and the metadata. In addition, the compiler may insert the prefetch instruction before the data access instruction, so that during program execution, when the execution device runs the compiled executable file, it executes the prefetch instruction first and then the data access instruction.
In this embodiment, by inserting the prefetch instruction into the compiled executable file, the execution device can determine the data access instruction and the metadata from the prefetch instruction when running the executable file, thereby prefetching data in the chained data structure. That is, the execution device first determines the address of the data access instruction from the prefetch instruction, and obtains the starting storage address of the chained data structure by observing the data access instruction at that address; then, based on the starting storage address of the chained data structure and the metadata, it prefetches the data in the chained data structure in order. Moreover, after obtaining the address of the data access instruction from the prefetch instruction, the execution device can learn the progress of data access in the chained data structure from the number of times the data access instruction has been executed, and adaptively adjust the amount of prefetched data, ensuring effective prefetching of the data in the chained data structure.
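A conceptual, C-level sketch of what the compiled second code could look like after this step; prefetch_chain() stands in for the inserted prefetch instruction, the metadata table layout and all names are hypothetical, and the offset value mirrors the example below rather than any real encoding:

```c
#include <stddef.h>

struct node { int value; struct node *next; };

/* Stand-in for the inserted prefetch instruction: it carries the
 * offset to the data access site and names the metadata describing
 * the chain. This is an illustration only, not the actual encoding. */
static void prefetch_chain(long access_site_offset, const void *meta, int meta_count) {
    (void)access_site_offset; (void)meta; (void)meta_count;   /* hint only */
}

/* Metadata placed in the data segment: offset of the 'next' pointer
 * inside each node. */
static const struct { int pointer_offset; } chain_meta[] = {
    { (int)offsetof(struct node, next) }
};

long walk_list(const struct node *head) {
    long sum = 0;
    prefetch_chain(0x06, chain_meta, 1);     /* inserted before the access */
    for (const struct node *n = head; n != NULL; n = n->next)
        sum += n->value;                     /* the data access */
    return sum;
}
```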
Optionally, the prefetch instruction may directly indicate the address of the data access instruction; for example, the prefetch instruction indicates that the address of the data access instruction is 0x1002. The prefetch instruction may instead indicate the address offset between the prefetch instruction and the data access instruction. For example, if the address of the prefetch instruction is 0x1008 and the address of the data access instruction is 0x1002, the address offset between the two instructions is 0x06. It should be understood that, to keep the prefetch hint timely and effective, the prefetch instruction is generated before the data access instruction, and the address offset between the two instructions is small.
Therefore, in this embodiment, compared with directly indicating the address of the data access instruction in the prefetch instruction, indicating the address offset between the prefetch instruction and the data access instruction reduces the encoding space occupied in the prefetch instruction, thereby saving instruction overhead.
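A minimal sketch of how the execution device might recover the data access instruction's address from the encoded offset; the field names and the sign convention (offset subtracted from the prefetch instruction's own address, matching the 0x1008/0x1002 example above) are assumptions:

```c
#include <stdint.h>

/* Recovering the data access instruction's address from the offset
 * encoded in the prefetch instruction (illustrative layout). */
typedef struct {
    uint64_t pc;        /* address of the prefetch instruction itself    */
    uint32_t offset;    /* encoded offset to the data access instruction */
} prefetch_hint_t;

static uint64_t access_instruction_address(const prefetch_hint_t *h) {
    /* e.g. pc = 0x1008, offset = 0x06 -> access address 0x1002 */
    return h->pc - h->offset;
}
```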
The foregoing describes how the compiler generates the metadata and the data access instruction during compilation and inserts, before the data access instruction, a prefetch instruction indicating the metadata and the data access instruction. For ease of understanding, the metadata generated by the compiler is described in detail below.
Optionally, the prefetch instruction may indicate the content of the metadata, or it may indicate the storage address of the metadata.
There are several ways for the prefetch instruction to indicate the storage address of the metadata.
Implementation 1: the prefetch instruction indicates the starting storage address of the metadata and the quantity of metadata, and each piece of metadata has the same size.
In this embodiment, all of the pieces of metadata generated by the compiler are the same size, and their storage addresses are contiguous. The compiler can therefore indicate, in the prefetch instruction, the starting storage address of the metadata and the quantity of metadata. The execution device can then fetch the first piece of metadata based on the starting storage address and the metadata size, and, using the metadata size as the address stride, fetch the remaining pieces one by one, thereby prefetching all of the metadata.
For example, suppose there are four pieces of metadata: the first is stored at 0x0004-0x0007, the second at 0x0008-0x000b, the third at 0x000c-0x000f, and the fourth at 0x0010-0x0013. The compiler can then indicate in the prefetch instruction that the starting storage address of the metadata is 0x0004 and that there are four pieces. Based on the 4-byte metadata size and the starting storage address and quantity, the execution device can determine the storage address of each piece of metadata.
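A minimal sketch of the address computation for Implementation 1, assuming a fixed 4-byte entry size as in the example above:

```c
#include <stdint.h>
#include <stdio.h>

/* Implementation 1: fixed-size metadata entries stored contiguously.
 * The prefetch instruction only needs to carry the start address and
 * the entry count; each entry's address follows at a fixed stride. */
#define META_SIZE 4u   /* bytes per metadata entry (illustrative) */

static void list_metadata_addresses(uint64_t start, unsigned count) {
    for (unsigned i = 0; i < count; i++) {
        uint64_t lo = start + (uint64_t)i * META_SIZE;
        uint64_t hi = lo + META_SIZE - 1;
        printf("metadata %u: 0x%04llx-0x%04llx\n", i + 1,
               (unsigned long long)lo, (unsigned long long)hi);
    }
}

/* list_metadata_addresses(0x0004, 4) reproduces the example above:
 * 0x0004-0x0007, 0x0008-0x000b, 0x000c-0x000f, 0x0010-0x0013. */
```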
Implementation 2: the prefetch instruction indicates the starting storage address of the metadata and the size of each piece of metadata.
Compared with Implementation 1, in Implementation 2 the sizes of the pieces of metadata indicated by the prefetch instruction may differ.
For example, suppose there are four pieces of metadata, and the prefetch instruction indicates that their starting storage address is 0x0000, the first piece is 2 bytes, the second 4 bytes, the third 2 bytes, and the fourth 6 bytes. Based on the starting storage address of the metadata and the size of each piece, the execution device can determine that the first piece is stored at 0x0000-0x0001, the second at 0x0002-0x0005, the third at 0x0006-0x0007, and the fourth at 0x0008-0x000d.
It can be understood that, since the data in the chained data structure includes pointers to the addresses of other data, i.e., each data item contains a pointer to the address of the next data item, in some possible implementations the metadata generated by the compiler may not directly indicate the addresses of the data in the chained data structure, but may instead indicate where the pointer is located within each data item.
For example, the pieces of metadata generated by the compiler may correspond to different data items in the chained data structure, e.g., each piece of metadata corresponds one-to-one with a data item to be accessed in the chained data structure, and each piece of metadata indicates the position of the pointer within the corresponding data item.
Refer to Figure 5, which is a schematic diagram of the correspondence between data and metadata in a chained data structure according to an embodiment of this application. As shown in Figure 5, the chained data structure contains four data items to be accessed: data 1, data 2, data 3, and data 4. Based on these four data items, the compiler generates four pieces of metadata: metadata 1, metadata 2, metadata 3, and metadata 4, which correspond one-to-one with the four data items. Metadata 1 indicates that the offset of pointer 1 within data 1 (i.e., the offset of pointer 1's starting storage address relative to data 1's starting storage address) is 8; metadata 2 indicates that the offset of pointer 2 within data 2 is 14; metadata 3 indicates that the offset of pointer 3 within data 3 is 4; and metadata 4 indicates that the offset of pointer 4 within data 4 is 14. In this way, after obtaining each piece of metadata, the execution device can determine, from what the metadata indicates, the position of the pointer within each data item prefetched from the chained data structure, and then determine the address of the next data item that needs to be prefetched.
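A minimal sketch of locating the embedded pointer through such a metadata entry, assuming the entry carries only the pointer offset; the types and names are illustrative:

```c
#include <stdint.h>
#include <string.h>

/* Metadata that records where, inside a data item, the pointer to the
 * next item lives (offset in bytes from the item's base address). */
typedef struct {
    uint16_t pointer_offset;
} chain_meta_t;

/* Given the base address of a prefetched item and its metadata,
 * read out the address of the next item to prefetch. */
static const void *next_item_address(const void *item_base, chain_meta_t meta) {
    const uint8_t *p = (const uint8_t *)item_base + meta.pointer_offset;
    const void *next;
    memcpy(&next, p, sizeof next);   /* load the embedded pointer */
    return next;
}
```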
Optionally, each of the pieces of metadata generated by the compiler may also indicate the size of the other data pointed to by its corresponding data item. In this way, when the data items in the chained data structure differ in size, the execution device running the compiled executable file can determine, from the metadata, the size of the next data item to be prefetched, and prefetch that item based on its starting storage address and size.
For example, taking Figure 5 as an example, metadata 1 corresponds to data 1 and may also indicate the size of data 2, which data 1 points to. After determining the position of pointer 1 within data 1 from the offset indicated in metadata 1, the execution device can determine the starting storage address of data 2 from pointer 1; it then combines the size of data 2 indicated in metadata 1 with that starting storage address to determine the full storage range of data 2, thereby prefetching data 2.
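A sketch of the size-aware variant, assuming an extended metadata entry whose field names are illustrative; it derives the full address range of the next item rather than only its first bytes:

```c
#include <stdint.h>
#include <string.h>

/* Extended metadata: pointer offset plus the size of the item that the
 * pointer leads to. */
typedef struct {
    uint16_t pointer_offset;   /* where the pointer sits inside this item */
    uint16_t next_size;        /* size in bytes of the pointed-to item    */
} chain_meta_sized_t;

typedef struct { uint64_t base; uint64_t size; } prefetch_range_t;

/* Derive the address range of the next item so that the whole item can
 * be prefetched, not just its first cache line. */
static prefetch_range_t next_prefetch_range(const void *item_base,
                                            chain_meta_sized_t meta) {
    const void *next;
    memcpy(&next, (const uint8_t *)item_base + meta.pointer_offset, sizeof next);
    prefetch_range_t r = { (uint64_t)(uintptr_t)next, meta.next_size };
    return r;
}
```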
Optionally, each of the pieces of metadata generated by the compiler may also indicate the type of its corresponding data item and the type of the other data pointed to by that data item.
Refer to Figure 6, which is a schematic diagram of data types in a chained data structure according to an embodiment of this application. As shown in Figure 6, in a backbone-rib chained data structure, some data items are linked to multiple data items, i.e., they point to the addresses of multiple other data items. In this case, data items in the same row are given the same type: the data type of the first row is 0, that of the second row is 1, and that of the third row is 2. Then, when a piece of metadata indicates the data types 0 and 1, the execution device can determine that the data item corresponding to the metadata belongs to the first row and that the data item it points to belongs to the second row; similarly, when a piece of metadata indicates the data types 0 and 0, the execution device can determine that the data item corresponding to the metadata belongs to the first row and that the data item it points to also belongs to the first row.
In this way, by indicating the data type in the metadata, it can be determined which data item the pointer indicated by the metadata actually points to, and thereby the link relationships among the data in the chained data structure can be determined, so that the execution device can effectively prefetch data even in complex chained data structures.
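A sketch of a metadata entry carrying both type fields; the field names, the (0, 1)/(0, 0) reading, and the offsets are assumptions used only to mirror the Figure 6 description:

```c
#include <stdint.h>

/* Metadata variant that also records the type of the item it describes
 * and the type of the item its pointer leads to; type values 0, 1, 2
 * stand for the rows in the backbone-rib example. */
typedef struct {
    uint16_t pointer_offset;   /* position of the pointer in this item */
    uint8_t  src_type;         /* type of the described item           */
    uint8_t  dst_type;         /* type of the pointed-to item          */
} chain_meta_typed_t;

/* A (0, 1) entry: a first-row item pointing into the second row;
 * a (0, 0) entry: a first-row item pointing to another first-row item. */
static const chain_meta_typed_t example_meta[] = {
    { .pointer_offset = 8,  .src_type = 0, .dst_type = 1 },
    { .pointer_offset = 16, .src_type = 0, .dst_type = 0 },
};
```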
The foregoing describes a compilation method provided by an embodiment of this application. For ease of understanding, a data prefetching method provided by an embodiment of this application is described below, to show how the execution device prefetches data based on the compiled executable file.
Refer to Figure 7, which is a schematic flowchart of a data prefetching method according to an embodiment of this application. As shown in Figure 7, the data prefetching method includes the following steps 701-704. The data prefetching method is applied to a first instance of a computer system, and the computer system further includes a second instance.
Step 701: the first instance obtains a prefetch instruction, where the prefetch instruction indicates the address of a data access instruction and at least one piece of metadata, the data access instruction indicates the address of a chained data structure, the chained data structure includes multiple data items with discontinuous addresses, and the at least one piece of metadata indicates the addresses of data in the chained data structure.
In this embodiment, while executing the executable file of an application, the first instance in the execution device can obtain the prefetch instruction in the executable file. The prefetch instruction in the executable file is the prefetch instruction produced by the compilation method described above; for details, refer to that compilation method, which is not repeated here.
The second instance is used to execute the data access instruction to request access to the data in the chained data structure. Specifically, the second instance executes the executable file of the application, while the first instance independently executes the data prefetching method provided by this embodiment of the application.
It should be noted that the first instance and the second instance in this embodiment may be two physically independent execution units, for example two independent processors or processor cores. They may also be two virtual independent execution units, for example different threads, hyper-threads, or processes; this embodiment does not specifically limit this.
Step 702: the first instance obtains the address of the chained data structure according to the address of the data access instruction.
In this embodiment, after the second instance obtains the prefetch instruction in the executable file, the execution device executes the prefetch instruction to start the first instance. Once started, the first instance can likewise obtain the prefetch instruction. Since the prefetch instruction indicates the metadata and the address of the data access instruction, the first instance can obtain the metadata from the prefetch instruction and temporarily store it, so that data can later be prefetched based on the metadata. In addition, the execution device can obtain the address of the chained data structure indicated by the data access instruction, based on the address of the data access instruction indicated by the prefetch instruction.
Specifically, since the prefetch instruction is inserted before the data access instruction, after the first instance obtains the address of the data access instruction from the prefetch instruction, it can monitor that address in real time to determine when the second instance reaches the data access instruction. When the second instance executes the data access instruction, the first instance can obtain the address of the chained data structure from that data access instruction.
The address of the chained data structure may be the address of any data item in the chained data structure. It may be determined from the address of the first data item that needs to be accessed in the chained data structure. For example, in the first code, if the first data item that needs to be accessed is the first data item of the chained data structure, the data access instruction indicates the address of the first data item of the chained data structure; if the first data item that needs to be accessed is a data item in the middle of the chained data structure, the data access instruction indicates the address of that data item. In addition, the address of the chained data structure may indicate a specific single address, for example the starting address at which a data item is stored, or it may indicate an address range, for example the address range in which a data item is stored.
Step 703: the first instance prefetches data in the chained data structure according to the address of the chained data structure and the at least one piece of metadata.
After obtaining the starting storage address of the chained data structure and the metadata, the first instance can prefetch the data in the chained data structure item by item, based on that starting storage address and the addresses of the data to be accessed indicated by the metadata.
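A minimal sketch of how such a prefetch walk might look, assuming the offset-style metadata described earlier and using the GCC/Clang __builtin_prefetch intrinsic to stand in for the actual prefetch operation; all names are illustrative:

```c
#include <stdint.h>
#include <string.h>

typedef struct { uint16_t pointer_offset; uint16_t next_size; } chain_meta_t;

/* Walk the chain from its starting address, issuing a prefetch for each
 * item and following the embedded pointer located via the metadata. */
static void prefetch_chain_walk(const void *chain_start,
                                const chain_meta_t *meta, unsigned meta_count) {
    const void *item = chain_start;
    for (unsigned i = 0; i < meta_count && item != NULL; i++) {
        __builtin_prefetch(item, 0, 3);                 /* bring the item into cache */
        const uint8_t *ptr_loc = (const uint8_t *)item + meta[i].pointer_offset;
        const void *next;
        memcpy(&next, ptr_loc, sizeof next);            /* read pointer to next item */
        item = next;
    }
}
```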
步骤704,所述第二实例执行所述数据访问指令,以访问所述链式数据结构中的数据。Step 704: The second instance executes the data access instruction to access data in the chained data structure.
其中,所述第一实例在预取所述链式数据结构中数据的过程中,所述第一实例根据所述第二实例执行所述数据访问指令的次数来控制预取所述链式数据结构中数据的进度,所述进度用于使所述链式数据结构中的数据在被访问前已预取到缓存中。Wherein, in the process of prefetching data in the chained data structure, the first instance controls the prefetching of the chained data according to the number of times the second instance executes the data access instruction. The progress of the data in the structure, which is used to make the data in the chained data structure prefetched into the cache before being accessed.
具体来说,在第一实例预取数据的过程中,第一实例在链式数据结构中的预取数据的数量是与数据访问指令的执行次数有关的,以保证第一实例所预取的数据始终多于实际所访问的数据。其中,数据访问指令的执行次数表示了链式数据结构中已访问的数据的个数。第二实例每执行一次数据访问指令,则代表链式数据结构中已访问的数据增加一个。具体来说,该数据访问指令可以是指示第二实例访问某个寄存器内特定位置所指示的地址,以实现访问链式数据结构中的数据;此外,第二实例还在根据寄存器内特定位置所指示的地址获取到需要访问的数据之后,将该寄存器内的数据替换为新获取到的数据。这样一来,第二实例继续执行该数据访问指令时,则能够根据寄存器内新的数据所指示的地址,获取到链式数据结构中的下一个数据。Specifically, during the process of prefetching data by the first instance, the amount of prefetched data in the chained data structure of the first instance is related to the number of execution times of the data access instructions to ensure that the data prefetched by the first instance There is always more data than is actually accessed. Among them, the number of execution times of data access instructions represents the number of accessed data in the chain data structure. Each time the second instance executes a data access instruction, it means that the accessed data in the chained data structure increases by one. Specifically, the data access instruction may instruct the second instance to access the address indicated by a specific location in a register to access the data in the chained data structure; in addition, the second instance also accesses the address indicated by the specific location in the register. After the indicated address obtains the data that needs to be accessed, the data in the register is replaced with the newly obtained data. In this way, when the second instance continues to execute the data access instruction, it can obtain the next data in the chain data structure according to the address indicated by the new data in the register.
例如,假设链式数据结构中的每个数据中指向特定类型数据的指针都是位于相同的位置,即指针在数据中的偏移都是相同的。这样,数据访问指令可以是指示访问某个寄存器内特定偏移下所指示的地址,即访问寄存器内所保存的数据中特定偏移下的指针所指示的地址。那么,在第二实例每次执行数据访问指令后,寄存器内的数据都会发生变化,且第二实例下一次执行数据访问指令时,能够根据寄存器内的数据来访问链式数据结构内的下一个数据。For example, assume that the pointer to a specific type of data in each data structure in the chained data structure is located at the same location, that is, the offset of the pointer in the data is the same. In this way, the data access instruction may be an instruction to access an address indicated at a specific offset in a certain register, that is, to access an address indicated by a pointer at a specific offset in the data stored in the register. Then, every time the second instance executes the data access instruction, the data in the register will change, and the next time the second instance executes the data access instruction, it can access the next one in the chained data structure based on the data in the register. data.
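As a rough illustration, the following C sketch shows the kind of traversal this access pattern corresponds to; the structure and field names (Node, payload, next) are assumptions made for illustration only and are not taken from the verification program discussed later.

#include <stddef.h>

struct Node {
    int          payload;   /* data carried by this node                      */
    struct Node *next;      /* pointer at the same fixed offset in every node */
};

int sum_list(const struct Node *cur) {
    int sum = 0;
    while (cur != NULL) {
        sum += cur->payload;
        /* The load "cur = cur->next" corresponds to the data access instruction
         * discussed above: it reads the address stored at a fixed offset of the
         * data currently held in a register and overwrites that register with it. */
        cur = cur->next;
    }
    return sum;
}

Each iteration of such a loop executes the data access instruction exactly once, which is why its execution count tracks how many data items of the chain have already been accessed.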
In other words, after the first instance obtains the address of the data access instruction based on the prefetch instruction, it can monitor in real time the number of times the data access instruction is executed, learn from that count how far data accesses in the chained data structure have progressed, and adaptively adjust the amount of prefetched data, thereby avoiding prefetching too little or too much data and ensuring effective prefetching of the data in the chained data structure.
Optionally, since the number of times the data access instruction has been executed can indicate the number of data items actually accessed in the chained data structure, the first instance may keep the difference between the number of prefetched data items and the number of actually accessed data items in the chained data structure within a preset range. For example, if the preset range is 5-10, the first instance may keep the number of prefetched data items in the chained data structure always 5-10 greater than the number of data items actually accessed, so as to guarantee timely prefetching while avoiding polluting the cache with too much prefetched data.
It can be understood that the value of the preset range can be adjusted according to the actual application scenario. For example, when the running device has a large cache and the data access performance requirements are high, the value of the preset range can be adjusted to a larger value; when the running device has a small cache and the data access performance requirements are not high, the value of the preset range can be adjusted to a smaller value.
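A minimal sketch of this throttling rule, under the assumption that the prefetcher keeps two counters (the number of prefetched items and the execution count of the data access instruction), might look as follows; the function and parameter names are illustrative only and do not describe the actual hardware.

/* Decide whether the chained prefetcher should issue the next prefetch.
 * prefetched: number of data items already prefetched
 * accessed:   number of times the data access instruction has executed
 * min_ahead/max_ahead: the preset range, e.g. 5 and 10 in the example above */
int should_prefetch_next(unsigned prefetched, unsigned accessed,
                         unsigned min_ahead, unsigned max_ahead) {
    unsigned ahead = prefetched - accessed;
    if (ahead < min_ahead)
        return 1;   /* lagging behind: keep prefetching                 */
    if (ahead > max_ahead)
        return 0;   /* far enough ahead: pause to avoid cache pollution */
    return 1;       /* inside the preset range: continue                */
}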
Optionally, the prefetch instruction obtained by the first instance may specifically indicate the address offset between the prefetch instruction and the data access instruction. In this way, the running device can determine the actual address of the data access instruction based on the actual address of the prefetch instruction and the address offset between the two.
It should be noted that, in the process of the first instance prefetching data in the chained data structure, the data prefetched by the first instance may contain no pointers to other data, or part of the prefetched data may contain pointers to other data while the rest does not. Similarly, when the second instance executes the data access instruction, the data accessed by the second instance may contain no pointers to other data, or part of the accessed data may contain pointers to other data while the rest does not. The embodiments of this application do not specifically limit the content of the prefetched data or of the accessed data.
In some possible implementations, the above prefetch instruction may also indicate the content of the metadata, or indicate the storage address of the metadata. In the case where the prefetch instruction indicates the storage address, the running device obtains the metadata based on the storage address of the metadata.
For example, in the case where each of the multiple metadata items has the same size, the prefetch instruction specifically indicates the starting storage address of the multiple metadata items and the number of the multiple metadata items. After obtaining the prefetch instruction, the running device therefore obtains the multiple metadata items starting from the starting storage address of the metadata, according to their number and size. Specifically, the running device first fetches the first of the multiple metadata items according to the starting storage address and the metadata size, and then, using the metadata size as the address offset, continues to fetch the remaining metadata items, thereby prefetching all of the metadata.
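Assuming, purely for illustration, that every metadata item occupies META_SIZE bytes, the walk over the metadata described above can be sketched as:

#include <stdint.h>

#define META_SIZE 4u   /* assumed metadata size; the text leaves the size open as X bytes */

/* Fetch 'count' fixed-size metadata items starting at address 'meta_base'. */
void fetch_all_metadata(uintptr_t meta_base, unsigned count, uint32_t *out) {
    for (unsigned i = 0; i < count; i++) {
        /* each subsequent item lies META_SIZE bytes after the previous one */
        out[i] = *(const uint32_t *)(meta_base + (uintptr_t)i * META_SIZE);
    }
}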
In addition, the prefetch instruction may also indicate the storage address of the metadata in other ways. For details, refer to the embodiment corresponding to Figure 3 above, which is not repeated here.
Optionally, since the data in the chained data structure contains pointers pointing to the addresses of other data within the same chained data structure, the multiple metadata items indicated in the prefetch instruction may correspond respectively to different data items in the chained data structure, and each of the multiple metadata items is used to indicate the position of the pointer within its corresponding data item.
That is to say, the metadata generated by the compiler may not directly indicate the addresses of the data in the chained data structure, but instead indicate where the pointer is located within each data item. In this case, the running device may implement data prefetching based on the following steps.
First, the running device prefetches the first data item of the chained data structure according to the starting storage address of the chained data structure.
Then, the running device obtains the pointer in already prefetched data according to the prefetched data in the chained data structure and the metadata corresponding to that prefetched data, where the prefetched data includes the first data item of the chained data structure or other data obtained by continuing to prefetch from that first data item.
Finally, the running device prefetches, from the chained data structure, the other data pointed to by the prefetched data, according to the address pointed to by the pointer in the prefetched data.
For example, taking Figure 5 as an example, assume that the running device has prefetched data 1 (i.e., the first data item) of the chained data structure according to the starting storage address of the chained data structure. The running device then obtains the position of the pointer in data 1 based on data 1 and the metadata 1 corresponding to data 1, and thereby obtains the pointer in data 1. Finally, the running device prefetches, from the chained data structure, data 2 pointed to by data 1 according to the address pointed to by the pointer in data 1. Similarly, after prefetching data 2, the running device can continue to prefetch data 3 pointed to by data 2 based on data 2 and the metadata 2 corresponding to data 2, and so on in a loop until enough data has been prefetched.
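Under the assumption that each metadata item records only the byte offset of the pointer inside its corresponding data item, this chained prefetch loop could be sketched roughly as follows; the struct and the use of the GCC builtin __builtin_prefetch are illustrative choices, not the actual hardware mechanism, and in the described scheme the pointer would be taken from the cache line returned by the previous prefetch.

#include <stdint.h>

/* Illustrative metadata: only the pointer offset inside the data item is modelled. */
struct meta {
    uint32_t ptr_offset;   /* byte offset of the pointer within the corresponding data */
};

/* Start at 'addr' (data 1), follow the pointer found at the offset given by each
 * data item's metadata, and prefetch up to 'depth' data items of the chain. */
void prefetch_chain(uintptr_t addr, const struct meta *metas, unsigned depth) {
    for (unsigned i = 0; i < depth && addr != 0; i++) {
        __builtin_prefetch((const void *)addr);            /* prefetch data i+1        */
        uintptr_t ptr_slot = addr + metas[i].ptr_offset;   /* locate its pointer       */
        addr = *(const uintptr_t *)ptr_slot;               /* address of the next data */
    }
}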
Optionally, the at least one metadata is further used to indicate the size of the other data pointed to by the corresponding data. For details, refer to the above embodiments, which are not repeated here.
Optionally, each of the at least one metadata is further used to indicate the type of the corresponding data and the type of the other data pointed to by the corresponding data. For details, refer to the above embodiments, which are not repeated here.
For ease of understanding, the compilation method and data prefetching method described above are introduced in detail below with reference to specific examples.
Refer to Figure 8, which is a schematic diagram of a system architecture provided by an embodiment of this application. As shown in Figure 8, the system architecture includes a compiler and a running device. The compiler is used to compile the application source code to obtain an executable file of the application, and the executable file produced by the compiler includes prefetch instructions, data access instructions and metadata. The running device is used to execute the executable file of the application and to obtain the data access instructions and metadata according to the prefetch instructions in the executable file, so as to implement prefetching of the data in the chained data structure.
Refer to Figure 9, which is a schematic flowchart of a compilation method provided by an embodiment of this application. As shown in Figure 9, the compilation method includes the following steps 901-904.
Step 901: The compiler identifies the memory access behavior for the chained data structure in the source code, extracts the data link relationships in the chained data structure, and obtains at least one metadata.
While the compiler is compiling the source code, when the compiler recognizes that the source code contains memory access behavior for a chained data structure, the compiler extracts the data link relationships in the chained data structure and obtains at least one metadata. Each metadata item corresponds to one data item in the chained data structure, and each metadata item indicates another data item pointed to by its corresponding data item. In addition, each metadata item may also indicate the size and data type of the other data item pointed to by its corresponding data item.
In some embodiments, to facilitate the management of the metadata, every metadata item generated by the compiler has the same size. For example, every metadata item generated by the compiler is stored as X bytes, where X can be set according to different application scenarios and is not specifically limited here.
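As a hedged illustration of what such a compiler pass might record, the following sketch derives one metadata entry for a hypothetical node type; the ListNode struct, the meta_entry layout and the field names are assumptions for illustration and are not the encoding actually used by this embodiment.

#include <stddef.h>
#include <stdint.h>

/* Hypothetical node type in the program being compiled. */
struct ListNode {
    uint64_t         key;
    struct ListNode *next;
};

/* Unpacked, illustrative form of one metadata entry. */
struct meta_entry {
    uint32_t node_id;         /* type of the data item the entry describes   */
    uint32_t ptr_offset;      /* where the pointer sits inside that data     */
    uint32_t next_node_id;    /* type of the data item the pointer points to */
    uint32_t next_node_size;  /* size of the data item the pointer points to */
};

/* What the pass could emit for the 'next' pointer of ListNode. */
static const struct meta_entry listnode_meta = {
    .node_id        = 0,
    .ptr_offset     = offsetof(struct ListNode, next),
    .next_node_id   = 0,                         /* next points to another ListNode */
    .next_node_size = sizeof(struct ListNode),
};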
Step 902: The compiler generates a data access instruction based on the memory access behavior for the chained data structure in the source code.
The data access instruction is used to instruct access to data in the chained data structure, and the data access instruction also indicates the starting storage address of the chained data structure.
Step 903: The compiler inserts a prefetch instruction before the data access instruction to indicate the address of the data access instruction and the metadata.
In this embodiment, the prefetch instruction is used to indicate the address of the data access instruction and the metadata, and the prefetch instruction is inserted before the data access instruction. That is, in the application execution phase, when the running device executes the compiled executable file, it first executes the prefetch instruction and then executes the data access instruction, so as to facilitate data prefetching.
Specifically, the prefetch instruction may indicate the starting storage address of the metadata, the number of metadata items, and the address of the data access instruction (for example, the offset between the data access instruction and the prefetch instruction).
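For illustration only, the three pieces of information carried by such a prefetch instruction could be modelled as the operand record below; the field names and widths are assumptions rather than the actual instruction encoding. In the concrete example of Figure 10B discussed below, these three fields take the values 2, 0x104 and 0xc0.

#include <stdint.h>

/* Illustrative operands of the prefetch instruction (not the real encoding). */
struct prefetch_operands {
    uint32_t meta_count;      /* number of metadata items                             */
    int32_t  meta_offset;     /* offset from the prefetch instruction to the metadata */
    int32_t  access_offset;   /* offset from the prefetch instruction to the data
                                 access instruction                                   */
};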
Step 904: The compiler generates an executable file carrying the prefetch instruction, the data access instruction and the metadata.
Finally, after the compiler has finished compiling the source code, an executable file carrying the prefetch instruction, the data access instruction and the metadata can be generated.
To describe the embodiments of this application in detail, this embodiment extracts the access pattern for chained data structures from a typical workload and constructs a verification program. The verification program is then compiled with a compiler to which a new optimization PASS has been added, generating a binary program (i.e., an executable file) with prefetch instructions, and the binary program is run on a simulator for verification.
Refer to Figure 10A, which is a schematic diagram of compiling the verification program with an existing compiler, provided by an embodiment of this application. As shown in Figure 10A, part a of Figure 10A shows part of the chained data structure, and part b of Figure 10A shows the code in the verification program that instructs access to the chained data structure. In part c of Figure 10A, for the code in the verification program that instructs access to the chained data structure, the existing compiler generates the corresponding data access instruction after compilation.
Refer to Figure 10B, which is a schematic diagram of compiling the verification program with the compiler having the newly added optimization PASS, provided by an embodiment of this application. As shown in Figure 10B, in this embodiment of the application an optimization PASS is added to the compiler, so that with the newly added optimization PASS the corresponding prefetch instruction is generated for the memory access behavior of the chained data structure during the compilation phase. As shown in part c of Figure 10B, compared with the assembly code compiled by the existing compiler in Figure 10A, the assembly code compiled by the compiler with the newly added optimization PASS also includes the corresponding prefetch instruction and metadata. The newly added prefetch instruction is the instruction at address 400b00, which is specifically [2, 0x104, 0xc0]. In the prefetch instruction, 2 indicates the number of metadata items, 0x104 indicates the address offset between the prefetch instruction and the metadata, and 0xc0 indicates the address offset between the prefetch instruction and the data access instruction. Based on the address of the prefetch instruction and the address offset 0x104 indicated by the prefetch instruction, the running device can determine that the address of the metadata is 400c04; based on the address of the prefetch instruction and the address offset 0xc0 indicated by the prefetch instruction, the running device can determine that the address of the data access instruction is 400bc0.
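The address reconstruction of this example can be written out explicitly; the sketch below simply repeats the arithmetic and assumes the three operands have already been decoded from the instruction.

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uintptr_t prefetch_pc   = 0x400b00;  /* address of the prefetch instruction   */
    uint32_t  meta_count    = 2;         /* number of metadata items              */
    uint32_t  meta_offset   = 0x104;     /* offset to the metadata                */
    uint32_t  access_offset = 0xc0;      /* offset to the data access instruction */

    uintptr_t meta_addr   = prefetch_pc + meta_offset;    /* 0x400c04 */
    uintptr_t access_addr = prefetch_pc + access_offset;  /* 0x400bc0 */

    printf("%u metadata items at %#lx, data access instruction at %#lx\n",
           meta_count, (unsigned long)meta_addr, (unsigned long)access_addr);
    return 0;
}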
With reference to Figure 10B, one implementation in which the compiler of this embodiment generates metadata based on the chained data structure is described in detail below.
When compiling the verification program, the compiler recognizes the memory access behavior for the chained data structure in the verification program. For the verification program shown in Figure 10B, the compiler recognizes the loop shown in part b of Figure 10B and generates the corresponding metadata according to the data structure information shown in part a of Figure 10B.
Specifically, the metadata generated by the compiler is stored in 4-byte units, and each metadata item stores the information about one data item in the chained data structure and about another data item pointed to by a pointer in that data item. The content stored in each metadata item includes the following five kinds of information.
1. Node identifier (Node-ID). Each type of data in the chained data structure is assigned a Node-ID. The Node-ID in a metadata item indicates the type of the data corresponding to that metadata item.
2. Address offset (Offset). Offset, in units of N bytes, stores the relative offset of the pointer (Ptr) within the data corresponding to the current metadata item, that is, it indicates the position of the pointer within the data corresponding to the metadata item.
3. Nextnode-ID. Nextnode-ID indicates the ID of the next data item pointed to by Ptr, that is, the type of the other data item pointed to by the data corresponding to the metadata item.
4. Nextnode-size. Nextnode-size, in units of M bytes, stores the size of the next data item pointed to by Ptr, that is, the size of the other data item pointed to by the data corresponding to the metadata item.
5. RSV: reserved for other metadata information that may need to be provided to the hardware in future extensions.
It should be noted that the above N and M, as well as the encoding space occupied by each field, can be adjusted according to the application and the architecture; this embodiment does not limit the specific values of these parameters.
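Purely as an illustration of one possible way to pack these five fields into a 4-byte word, the following C bit-field uses assumed field widths; the actual widths of Node-ID, Offset, Nextnode-ID, Nextnode-size and RSV are left open by this embodiment and may differ, and the layout of C bit-fields is itself implementation-defined.

#include <stdint.h>

/* One possible 32-bit packing of a metadata item (assumed field widths). */
struct chain_meta {
    uint32_t node_id        : 4;   /* type of the described data item               */
    uint32_t offset         : 8;   /* pointer offset inside it, in units of N bytes */
    uint32_t next_node_id   : 4;   /* type of the data item the pointer points to   */
    uint32_t next_node_size : 8;   /* its size, in units of M bytes                 */
    uint32_t rsv            : 8;   /* reserved for future extensions                */
};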
Specifically, this example of the application allocates the encoding space of the metadata in the above manner and specifies that Offset and Nextnode-size are expressed in bytes; then, for the program shown in Figure 10B, the metadata generated by the compiler is as shown in Table 1.
Table 1
Metadata      Node-ID   Offset   Nextnode-ID   Nextnode-size
0x00001800    0         0        0             24
0x00805000    0         16       1             16
Since the program shown in part b of Figure 10B needs to access the data of the BackboneNode node and the data of the RibNode node, the compiler passes to the hardware, through the metadata, the information needed to calculate the addresses of these two kinds of nodes. Since the memory access behavior in part b of Figure 10B does not access the data of ArcNode, no metadata for calculating the ArcNode address needs to be provided; that is, the metadata generated by the compiler is used to indicate the addresses of the data to be accessed.
In Table 1, the metadata 0x00001800 has the following meaning: at offset=0 of the node represented by NodeID=0 (BackboneNode), there is a pointer to Nextnode-ID=0 (the BackboneNode node type), and the size of the node represented by Nextnode-ID=0 is 24 bytes (Nextnode-size=24). That is, from this metadata item the running device can calculate the address of the next BackboneNode node that needs to be prefetched in each iteration.
In addition, the metadata 0x00805000 has the following meaning: at offset=16 of the node represented by NodeID=0 (BackboneNode), there is a pointer to Nextnode-ID=1 (the RibNode node type), and the size of the node represented by Nextnode-ID=1 is 16 bytes (Nextnode-size=16). That is, from this metadata item the running device can calculate the address of the next RibNode node that needs to be prefetched in each iteration.
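Putting the two metadata items together, the per-iteration address computation they enable can be sketched as follows; the node layouts are assumptions that merely match the offsets and sizes above (a next-BackboneNode pointer at offset 0, a RibNode pointer at offset 16, node sizes of 24 and 16 bytes) and are not the actual definitions from the verification program.

#include <stdint.h>

/* Assumed layouts consistent with Table 1 on a 64-bit machine. */
struct RibNode      { uint8_t payload[16]; };   /* 16-byte RibNode */
struct BackboneNode {
    struct BackboneNode *next;   /* offset 0,  described by metadata 0x00001800 */
    uint64_t             value;
    struct RibNode      *rib;    /* offset 16, described by metadata 0x00805000 */
};                               /* 24-byte BackboneNode */

/* Addresses the prefetcher derives for the next iteration. */
void next_prefetch_addresses(const struct BackboneNode *cur,
                             uintptr_t *next_backbone, uintptr_t *next_rib) {
    *next_backbone = (uintptr_t)cur->next;   /* next 24-byte BackboneNode to prefetch */
    *next_rib      = (uintptr_t)cur->rib;    /* next 16-byte RibNode to prefetch      */
}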
Refer to Figure 11, which is a schematic flowchart of a data prefetching method provided by an embodiment of this application. As shown in Figure 11, the data prefetching method includes the following steps 1101-1108.
Step 1101: The running device determines whether the instruction is a prefetch instruction.
While the running device is executing the executable file of the application, the decoding unit in the running device determines whether the instruction currently to be executed is a prefetch instruction.
Step 1102: The running device initializes the chained prefetcher based on the prefetch instruction, so as to obtain and save the metadata indicated by the prefetch instruction and the address of the data access instruction.
If the instruction currently being decoded by the running device is a prefetch instruction, the running device initializes the chained prefetcher based on the prefetch instruction. In this way, the chained prefetcher in the running device can obtain and save the metadata indicated by the prefetch instruction and the address of the data access instruction.
Step 1103: The running device determines whether the instruction is a data access instruction.
Since the prefetch instruction is inserted before the data access instruction, after the running device has obtained the address of the data access instruction based on the prefetch instruction, the decoding unit in the running device continuously monitors whether execution has reached the data access instruction, that is, whether the instruction currently being decoded is the data access instruction indicated by the prefetch instruction.
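A minimal sketch of this monitoring step, under the assumption that the prefetcher simply compares the program counter of every decoded instruction with the saved address of the data access instruction, might be:

#include <stdint.h>

/* Illustrative state of the chained prefetcher. */
struct chain_prefetcher {
    uintptr_t data_access_pc;   /* address saved from the prefetch instruction    */
    unsigned  exec_count;       /* times the data access instruction has executed */
};

/* Called for every decoded instruction; returns 1 when the data access
 * instruction indicated by the prefetch instruction has been reached. */
int on_decode(struct chain_prefetcher *p, uintptr_t pc) {
    if (pc == p->data_access_pc) {
        p->exec_count++;        /* one more data item of the chain has been accessed */
        return 1;
    }
    return 0;
}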
Step 1104: The running device obtains the starting storage address of the chained data structure based on the data access instruction.
The data access instruction indicates the starting storage address of the chained data structure.
Step 1105: The running device sends a prefetch request to the cache according to the starting storage address.
Step 1106: The running device determines whether the cache has returned the prefetched data.
If the running device determines that the cache has returned the prefetched data, the running device continues with step 1107.
Step 1107: The running device calculates the next prefetch address based on the metadata indicated by the prefetch instruction and the returned data.
Since the metadata indicates the position of the pointer within its corresponding data, and the pointer in that data in turn indicates the address of the next data item, the chained prefetcher in the running device can calculate the next prefetch address based on the metadata indicated by the prefetch instruction and the returned data.
Step 1108: The running device determines, based on the number of times the data access instruction has been executed and the number of prefetched data items, whether to stop prefetching.
Since the prefetch instruction indicates the address of the data access instruction, the running device can monitor the number of times the data access instruction is executed based on that address. When the difference between the number of prefetched data items and the number of times the data access instruction has been executed is smaller than the preset range, the running device continues to prefetch data based on the calculated next prefetch address; when the difference between the number of prefetched data items and the number of times the data access instruction has been executed is greater than the preset range, the running device stops prefetching data.
The compilation method and the data prefetching method provided by the embodiments of this application have been introduced above. The execution devices used to perform the above methods are introduced below.
For details, refer to Figure 12, which is a schematic structural diagram of a compilation apparatus 1200 provided by an embodiment of this application. The compilation apparatus 1200 includes an acquisition unit 1201 and a processing unit 1202. The acquisition unit 1201 is configured to obtain a first code. The processing unit 1202 is configured to, when it is recognized that the first code contains code requesting access to a chained data structure, generate a data access instruction and at least one metadata according to the chained data structure, where the chained data structure includes a plurality of data with discontinuous addresses, the at least one metadata is used to indicate addresses of data in the chained data structure, and the data access instruction is used to indicate the address of the chained data structure and to request access to the chained data structure. The processing unit 1202 is further configured to generate a prefetch instruction according to the at least one metadata and the data access instruction to obtain a compiled second code, where the prefetch instruction is used to indicate the address of the data access instruction and the at least one metadata.
In a possible implementation, the data in the chained data structure includes pointers pointing to addresses of other data, the at least one metadata respectively corresponds to different data in the chained data structure, and each of the at least one metadata is used to indicate the position of the pointer in the corresponding data.
In a possible implementation, the at least one metadata is further used to indicate the size of the other data pointed to by the corresponding data.
In a possible implementation, each of the at least one metadata is further used to indicate the type of the corresponding data and the type of the other data pointed to by the corresponding data.
In a possible implementation, the prefetch instruction is specifically used to indicate the address offset between the prefetch instruction and the data access instruction.
In a possible implementation, the prefetch instruction is specifically used to indicate the address of the at least one metadata.
In a possible implementation, the prefetch instruction is used to indicate the starting address of the at least one metadata and the quantity of the at least one metadata, and the at least one metadata has the same size.
In a possible implementation, the at least one metadata is located in a code segment or a data segment of a second code, and the second code is obtained by compiling the first code.
For details, refer to Figure 13, which is a schematic structural diagram of a data prefetching apparatus 1300 provided by an embodiment of this application. The data prefetching apparatus 1300 includes an acquisition unit 1301, a prefetch unit 1302 and an execution unit 1303. The acquisition unit 1301 is configured to obtain a prefetch instruction, where the prefetch instruction is used to indicate the address of a data access instruction and at least one metadata, the data access instruction is used to indicate the address of a chained data structure, the chained data structure includes a plurality of data with discontinuous addresses, and the at least one metadata is used to indicate addresses of data in the chained data structure. The acquisition unit 1301 is further configured to obtain the address of the chained data structure according to the address of the data access instruction. The prefetch unit 1302 is configured to prefetch data in the chained data structure according to the address of the chained data structure and the at least one metadata. The execution unit 1303 is configured to execute the data access instruction to access data in the chained data structure. In the process of the first instance prefetching data in the chained data structure, the first instance controls the progress of prefetching data in the chained data structure according to the number of times the second instance executes the data access instruction, where the progress is used to ensure that data in the chained data structure has been prefetched into the cache before being accessed.
In a possible implementation, in the process of the prefetch unit 1302 prefetching data in the chained data structure, the difference between the amount of prefetched data and the amount of accessed data is within a preset range.
In a possible implementation, the data in the chained data structure includes pointers pointing to addresses of other data in the chained data structure, the at least one metadata respectively corresponds to different data in the chained data structure, and each of the at least one metadata is used to indicate the position of the pointer in the corresponding data.
The prefetch unit 1302 is specifically configured to: prefetch data in the chained data structure according to the address of the chained data structure; obtain the pointer in prefetched data according to the prefetched data in the chained data structure and the metadata corresponding to the prefetched data; and prefetch, from the chained data structure, the other data pointed to by the prefetched data according to the address pointed to by the pointer in the prefetched data.
In a possible implementation, the at least one metadata is further used to indicate the size of the other data pointed to by the corresponding data.
In a possible implementation, each of the at least one metadata is further used to indicate the type of the corresponding data and the type of the other data pointed to by the corresponding data.
In a possible implementation, the prefetch instruction is specifically used to indicate the address offset between the prefetch instruction and the data access instruction.
In a possible implementation, the prefetch instruction is specifically used to indicate the address of the at least one metadata.
The acquisition unit 1301 is further configured to obtain the at least one metadata according to the address of the at least one metadata.
In a possible implementation, the at least one metadata has the same size, and the prefetch instruction is used to indicate the starting address of the at least one metadata and the quantity of the at least one metadata.
The acquisition unit 1301 is further configured to obtain the at least one metadata starting from the starting address of the at least one metadata according to the quantity and size of the at least one metadata.
The compilation method and data prefetching method provided by the embodiments of this application may specifically be performed by a chip in an electronic device. The chip includes a processing unit and a communication unit; the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin or a circuit. The processing unit can execute the computer-executable instructions stored in a storage unit, so that the chip in the electronic device performs the methods described in the embodiments shown in Figures 1 to 11. Optionally, the storage unit is a storage unit within the chip, such as a register or a cache; the storage unit may also be a storage unit located outside the chip within the wireless access device, such as a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, or a random access memory (RAM), and so on.
Referring to Figure 14, this application further provides a computer-readable storage medium. In some embodiments, the methods disclosed in the above embodiments may be implemented as computer program instructions encoded in a machine-readable format on a computer-readable storage medium or on other non-transitory media or articles.
Figure 14 schematically illustrates a conceptual partial view of an example computer-readable storage medium arranged according to at least some of the embodiments presented here, where the example computer-readable storage medium includes a computer program for executing a computer process on a computing device.
In one embodiment, the computer-readable storage medium 1400 is provided using a signal bearing medium 1401. The signal bearing medium 1401 may include one or more program instructions 1402 which, when executed by one or more processors, may provide the functions or part of the functions described above with respect to Figure 4 or Figure 7. Thus, for example, referring to the embodiment shown in Figure 4, one or more features of steps 401-403 may be undertaken by one or more instructions associated with the signal bearing medium 1401. In addition, the program instructions 1402 in Figure 14 also describe example instructions.
In some examples, the signal bearing medium 1401 may include a computer-readable medium 1403, such as, but not limited to, a hard disk drive, a compact disc (CD), a digital video disc (DVD), a digital tape, a memory, a ROM or a RAM, and so on.
In some implementations, the signal bearing medium 1401 may include a computer-recordable medium 1404, such as, but not limited to, a memory, a read/write (R/W) CD, an R/W DVD, and so on. In some implementations, the signal bearing medium 1401 may include a communication medium 1405, such as, but not limited to, a digital and/or analog communication medium (for example, a fiber optic cable, a waveguide, a wired communication link, a wireless communication link, and so on). Thus, for example, the signal bearing medium 1401 may be conveyed by a wireless form of the communication medium 1405 (for example, a wireless communication medium complying with the IEEE 802.14 standard or another transmission protocol).
The one or more program instructions 1402 may be, for example, computer-executable instructions or logic-implementing instructions. In some examples, the computing device may be configured to provide various operations, functions or actions in response to the program instructions 1402 conveyed to the computing device by one or more of the computer-readable medium 1403, the computer-recordable medium 1404 and/or the communication medium 1405.
It should be understood that the arrangements described here are for example purposes only. Accordingly, those skilled in the art will understand that other arrangements and other elements (for example, machines, interfaces, functions, orders and groups of functions, and so on) can be used instead, and that some elements may be omitted altogether depending on the desired result. In addition, many of the described elements are functional entities that can be implemented as discrete or distributed components, or in combination with other components, in any suitable combination and location.
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. For example, the division of the units is only a logical functional division, and there may be other ways of division in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses or units, and may be in electrical, mechanical or other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk or an optical disc.

Claims (20)

1. A data prefetching method, characterized in that the method is applied to a first instance of a computer system, the computer system further comprises a second instance, and the method comprises:
    obtaining, by the first instance, a prefetch instruction, wherein the prefetch instruction is used to indicate an address of a data access instruction and at least one metadata, the data access instruction is used to indicate an address of a chained data structure, the chained data structure comprises a plurality of data with discontinuous addresses, and the at least one metadata is used to indicate addresses of data in the chained data structure;
    obtaining, by the first instance, the address of the chained data structure according to the address of the data access instruction;
    prefetching, by the first instance, data in the chained data structure according to the address of the chained data structure and the at least one metadata;
    executing, by the second instance, the data access instruction to access data in the chained data structure;
    wherein, in a process of the first instance prefetching data in the chained data structure, the first instance controls, according to a number of times the second instance executes the data access instruction, a progress of prefetching data in the chained data structure, and the progress is used to ensure that data in the chained data structure has been prefetched into a cache before being accessed.
2. The method according to claim 1, characterized in that, in the process of the first instance prefetching data in the chained data structure, a difference between a quantity of prefetched data and a quantity of accessed data is within a preset range.
3. The method according to claim 1 or 2, characterized in that the data in the chained data structure comprises pointers pointing to addresses of other data in the chained data structure, the at least one metadata respectively corresponds to different data in the chained data structure, and each of the at least one metadata is used to indicate a position of a pointer in the corresponding data;
    the prefetching, by the first instance, data in the chained data structure according to the address of the chained data structure and the at least one metadata comprises:
    prefetching, by the first instance, data in the chained data structure according to the address of the chained data structure;
    obtaining, by the first instance, a pointer in prefetched data according to the prefetched data in the chained data structure and metadata corresponding to the prefetched data;
    prefetching, by the first instance, from the chained data structure, other data pointed to by the prefetched data according to an address pointed to by the pointer in the prefetched data.
4. The method according to claim 3, characterized in that the at least one metadata is further used to indicate a size of other data pointed to by the corresponding data.
5. The method according to claim 3 or 4, characterized in that each of the at least one metadata is further used to indicate a type of the corresponding data and a type of other data pointed to by the corresponding data.
6. The method according to any one of claims 1-5, characterized in that the prefetch instruction is specifically used to indicate an address offset between the prefetch instruction and the data access instruction.
7. The method according to any one of claims 1-6, characterized in that the prefetch instruction is specifically used to indicate an address of the at least one metadata;
    the method further comprises:
    obtaining, by the first instance, the at least one metadata according to the address of the at least one metadata.
8. The method according to claim 7, characterized in that the at least one metadata has a same size, and the prefetch instruction is used to indicate a starting address of the at least one metadata and a quantity of the at least one metadata;
    the obtaining, by the first instance, the at least one metadata according to the address of the at least one metadata comprises:
    obtaining, by the first instance, the at least one metadata starting from the starting address of the at least one metadata according to the quantity and size of the at least one metadata.
9. A compilation method, characterized by comprising:
    obtaining a first code;
    when it is recognized that the first code contains code requesting access to a chained data structure, generating a data access instruction and at least one metadata according to the chained data structure, wherein the chained data structure comprises a plurality of data with discontinuous addresses, the at least one metadata is used to indicate addresses of data in the chained data structure, and the data access instruction is used to indicate an address of the chained data structure and to request access to the chained data structure;
    generating a prefetch instruction according to the at least one metadata and the data access instruction to obtain a compiled second code, wherein the prefetch instruction is used to indicate an address of the data access instruction and the at least one metadata.
10. The method according to claim 9, characterized in that the data in the chained data structure comprises pointers pointing to addresses of other data, the at least one metadata respectively corresponds to different data in the chained data structure, and each of the at least one metadata is used to indicate a position of a pointer in the corresponding data.
11. The method according to claim 10, characterized in that the at least one metadata is further used to indicate a size of other data pointed to by the corresponding data.
12. The method according to claim 10 or 11, characterized in that each of the at least one metadata is further used to indicate a type of the corresponding data and a type of other data pointed to by the corresponding data.
13. The method according to any one of claims 9-12, characterized in that the prefetch instruction is specifically used to indicate an address offset between the prefetch instruction and the data access instruction.
14. The method according to any one of claims 9-13, characterized in that the prefetch instruction is specifically used to indicate an address of the at least one metadata.
15. The method according to claim 14, characterized in that the prefetch instruction is used to indicate a starting address of the at least one metadata and a quantity of the at least one metadata, and the at least one metadata has a same size.
16. The method according to any one of claims 9-15, characterized in that the at least one metadata is located in a code segment or a data segment of a second code, and the second code is obtained by compiling the first code.
17. An electronic device, characterized by comprising a memory and a processor, wherein the memory stores code, the processor is configured to execute the code, and when the code is executed, the electronic device performs the method according to any one of claims 1 to 8.
18. An electronic device, characterized by comprising a memory and a processor, wherein the memory stores code, the processor is configured to execute the code, and when the code is executed, the electronic device performs the method according to any one of claims 9 to 16.
19. A computer-readable storage medium, characterized by comprising computer-readable instructions, wherein, when the computer-readable instructions are run on a computer, the computer is caused to perform the method according to any one of claims 1 to 16.
20. A computer program product, characterized by comprising computer-readable instructions, wherein, when the computer-readable instructions are run on a computer, the computer is caused to perform the method according to any one of claims 1 to 16.
PCT/CN2023/099303 2022-06-10 2023-06-09 Data prefetching method, compiling method and related apparatus WO2023237084A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210654495.4 2022-06-10
CN202210654495.4A CN117251387A (en) 2022-06-10 2022-06-10 Data prefetching method, compiling method and related devices

Publications (1)

Publication Number Publication Date
WO2023237084A1 true WO2023237084A1 (en) 2023-12-14

Family

ID=89117547

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/099303 WO2023237084A1 (en) 2022-06-10 2023-06-09 Data prefetching method, compiling method and related apparatus

Country Status (2)

Country Link
CN (1) CN117251387A (en)
WO (1) WO2023237084A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108874690A (en) * 2017-05-16 2018-11-23 龙芯中科技术有限公司 The implementation method and processor of data pre-fetching
US20190278709A1 (en) * 2018-03-06 2019-09-12 Arm Limited Prefetching using offset data to access a pointer within a current data element for use in prefetching a subsequent data element
US10684857B2 (en) * 2018-02-01 2020-06-16 International Business Machines Corporation Data prefetching that stores memory addresses in a first table and responsive to the occurrence of loads corresponding to the memory addresses stores the memory addresses in a second table
CN113407119A (en) * 2021-06-28 2021-09-17 海光信息技术股份有限公司 Data prefetching method, data prefetching device and processor

Also Published As

Publication number Publication date
CN117251387A (en) 2023-12-19


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23819247

Country of ref document: EP

Kind code of ref document: A1