CN111651379A - DAX device address translation caching method and system


Info

Publication number
CN111651379A
Authority
CN
China
Prior art keywords: address, DAX, register, address translation, cache
Legal status: Granted
Application number
CN202010357810.8A
Other languages
Chinese (zh)
Other versions
CN111651379B (en)
Inventor
熊子威
蒋德钧
熊劲
Current Assignee
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN202010357810.8A
Publication of CN111651379A
Application granted
Publication of CN111651379B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/10 Address translation
    • G06F 12/1081 Address translation for peripheral access to main memory, e.g. direct memory access [DMA]
    • G06F 12/1009 Address translation using page tables, e.g. page table structures
    • G06F 2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F 2212/10 Providing a specific technical effect
    • G06F 2212/1016 Performance improvement

Abstract

The invention provides a DAX device address translation caching method and system. The method comprises: constructing a DAX address translation cache consisting of a mapped-file base address register (MFA), an object offset register (OFS), a file number register (FID) and an address translation table; according to the address translation function, writing the file number in the persistent address and the object offset in the persistent address into the file number register and the object offset register respectively; the TLB translates a virtual address sent by the CPU into a physical address, the DAX address translation cache searches the address translation table using the data stored in the file number register, adds the base address corresponding to the search result to the data in the object offset register to obtain a direct access address, and feeds the direct access address back to the CPU as the translation result of the virtual address. The invention halves the instruction overhead of the address translation function and greatly improves the efficiency of handling multiple mapped files.

Description

DAX device address translation caching method and system
Technical Field
The invention relates to the fields of computer architecture and non-volatile memory, and in particular to a DAX device address translation caching method and system.
Background
Opening a file, mapping it into memory, and then accessing it with load and store instructions is the common way of accessing a mapped file in current systems. In this scheme the file is mapped into a large byte array whose base address is determined by the mapping function, and the application is free to access any data in the array at runtime. This creates a problem: if a program writes data into a mapped file and wishes to retrieve that data after the program restarts, it must format the data in some way, because the mapping function does not guarantee that the same file is mapped to the same address. After a restart, the mapping address of the previous run is invalid in the current run, so the program cannot locate, through a virtual address, the data it stored into the mapped file during the previous run.
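As an illustration, the following C sketch shows this access pattern; the file name and mapping size are placeholders.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    int fd = open("data.pool", O_RDWR);  /* placeholder file name */
    if (fd < 0) return 1;

    /* The kernel chooses the base address; it may differ on every run. */
    size_t len = 4096;
    uint8_t *base = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) return 1;

    base[128] = 42;             /* plain store instruction into the file */
    printf("%u\n", base[128]);  /* plain load, no read() system call     */

    munmap(base, len);
    close(fd);
    return 0;
}
```

If the file is remapped at a different base on the next run, any raw pointers saved inside it during this run no longer point at the intended data, which is exactly the problem described above.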
With the emergence of a new generation of non-volatile memory (NVM), researchers and enterprises are building NVM development libraries in the hope of giving developers a friendly programming interface. These libraries work on persistence devices that support DAX (Direct Access) mode; libraries that integrate well with existing operating systems can let a file system manage the non-volatile memory and access its resources through file mapping. These development libraries therefore also need to offer a way for programs to conveniently access data on the NVM after a restart.
Existing development libraries are commonly designed as follows: for each storage object the library maintains a persistent address that stores the number of the mapped file and the offset of the object relative to the base address of that file, and the library provides an address translation function that converts the persistent address into a virtual address at runtime, avoiding the overhead of formatting data. As explained above, such a translation is necessary: virtual addresses are volatile, and after each process restart and remapping of the file there is no guarantee that the base address of the mapped file equals the mapped address of the previous run, so the library has to maintain persistent addresses separately. This design lets a program access data in the NVM normally after a restart, but it also becomes a performance bottleneck of the NVM development library.
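The design just described can be pictured with a short C sketch; every name here is hypothetical rather than taken from a particular development library, and the cache is deliberately tiny to mirror the simple software caches such libraries keep.

```c
#include <stdint.h>

/* Hypothetical persistent address: it survives restarts because it stores
   a file number and an offset instead of a raw virtual address. */
typedef struct {
    uint64_t file_id;  /* number of the mapped file (nonzero)            */
    uint64_t offset;   /* offset of the object within the file (nonzero) */
} persistent_addr;

/* Small software-maintained cache of mapped-file base addresses. */
typedef struct { uint64_t file_id; uintptr_t base; } base_entry;
#define CACHE_SLOTS 4
static base_entry cache[CACHE_SLOTS];

/* Slow path: stub standing in for a query of the library's mapping tables;
   the returned constant is illustrative. */
static uintptr_t lookup_base(uint64_t file_id) {
    (void)file_id;
    return 0x7f0000000000;
}

/* Software address translation: the branches, checks and linear search here
   are the kind of per-call overhead the text attributes to such functions. */
void *translate(persistent_addr pa) {
    if (pa.file_id == 0 || pa.offset == 0)  /* validity check */
        return NULL;
    for (int i = 0; i < CACHE_SLOTS; i++)   /* search the software cache */
        if (cache[i].file_id == pa.file_id)
            return (void *)(cache[i].base + pa.offset);
    uintptr_t base = lookup_base(pa.file_id); /* miss: slow lookup */
    cache[0].file_id = pa.file_id;            /* crude replacement */
    cache[0].base = base;
    return (void *)(base + pa.offset);
}
```

Every call pays for the checks and the search above; the next paragraph quantifies that cost.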
In existing NVM development libraries the overhead of the address translation function is large, around 13% of the total. Such a function cannot be optimized away in software, because on conventional hardware address translation is performed by hardware, whereas current development libraries perform it in software, which costs more time. Moreover, if several files must be managed, the address translation function has to look up the base addresses of different files over and over at runtime, which is extremely inefficient. And because the logic of the address translation function is very simple and its code is very short, software-level optimization is very difficult.
Disclosure of Invention
The invention designs a DAX address translation cache in hardware, giving development libraries a hardware mechanism that accelerates address translation and improves the operating efficiency of applications that use such libraries.
Specifically, in order to overcome the defects of the prior art, the present invention provides a DAX device address translation caching method, comprising:
step 1, constructing a DAX address translation cache consisting of a mapped-file base address register (MFA), an object offset register (OFS), a file number register (FID) and an address translation table;
step 2, according to the address translation function, writing the file number in the persistent address and the object offset in the persistent address into the file number register and the object offset register respectively;
step 3, the TLB translates the virtual address sent by the CPU into a physical address; the DAX address translation cache searches the address translation table using the data stored in the file number register, adds the base address corresponding to the search result to the data in the object offset register to obtain a direct access address, and feeds the direct access address back to the CPU as the translation result of the virtual address.
In the DAX device address translation caching method, step 3 comprises:
step 31, if the direct access address is 0, the address translation function fills the base address of the mapped file into the mapped-file base address register and writes 0 to the DAX address translation cache; after receiving the write request, the DAX address translation cache fills the data in the file number register and the mapped-file base address register into the address translation table through a replacement algorithm.
The DAX device address translation caching method further comprises:
step 4, sending the physical address to a cache memory and taking the data corresponding to the physical address in the cache memory as a response result; judging whether the direct access address is valid; if so, feeding the direct access address back to the CPU, and otherwise feeding the response result back to the CPU.
In the DAX device address translation caching method, the address translation table consists of 32 register pairs.
The invention also provides a DAX device address translation cache system, comprising:
module 1, for constructing a DAX address translation cache consisting of a mapped-file base address register (MFA), an object offset register (OFS), a file number register (FID) and an address translation table;
module 2, for writing, according to the address translation function, the file number in the persistent address and the object offset in the persistent address into the file number register and the object offset register respectively;
module 3, in which the TLB translates the virtual address sent by the CPU into a physical address, and the DAX address translation cache searches the address translation table using the data stored in the file number register, adds the base address corresponding to the search result to the data in the object offset register to obtain a direct access address, and feeds the direct access address back to the CPU as the translation result of the virtual address.
In the DAX device address translation cache system, module 3 comprises:
if the direct access address is 0, the address translation function fills the base address of the mapped file into the mapped-file base address register and writes 0 to the DAX address translation cache; after receiving the write request, the DAX address translation cache fills the data in the file number register and the mapped-file base address register into the address translation table through a replacement algorithm.
The DAX device address translation cache system further comprises:
module 4, for sending the physical address to a cache memory and taking the data corresponding to the physical address in the cache memory as a response result; judging whether the direct access address is valid; if so, feeding the direct access address back to the CPU, and otherwise feeding the response result back to the CPU.
In the DAX device address translation cache system, the address translation table consists of 32 register pairs.
According to the above scheme, the advantages of the invention are:
The invention halves the instruction overhead of the address translation function and greatly improves the efficiency of handling multiple mapped files.
Drawings
FIG. 1 is a diagram of the address translation cache structure;
FIG. 2 is a diagram of the connections between the CPU, the TLB and the Cache;
FIG. 3 is a structural diagram of the present invention;
FIG. 4 compares the measured results of the invention with the prior art.
Detailed Description
While studying the efficiency of the address translation function, the inventors found that the defect in the prior art is caused by redundant instructions: conditional branches, repeated address loads, safety checks, and the like. These instructions exist to maintain a simple software cache that temporarily stores the base address of the most recently accessed mapped file.
Obviously, a larger cache cannot be maintained in software, otherwise lookup becomes extremely slow, and validity checks on the cache bring still more redundant instructions. Considering that the purpose of these instructions is address translation, and that current computer architectures already contain a TLB to accelerate address translation, it is natural to let hardware do this work. The design, however, must meet several requirements: (1) minimize changes to the existing computer architecture; the data path must not be perturbed excessively and preferably is not changed at all; (2) avoid adding new instructions as far as possible, otherwise the practical value of the invention is compromised; (3) be easy to use; developers who expect a performance gain from the device must not be forced to rewrite large amounts of code.
Taking the structure of the TLB as a reference, the invention designs the DAX address translation cache. With this cache the instruction count of the address translation function is cut in half, and because parallel lookup is highly efficient in hardware, the device greatly improves the performance of the address translation function when several mapped files are in use.
The main points of the invention are the following:
Key point 1: balancing hardware performance against power consumption, the DAX address translation cache is made up of 32 register pairs and 3 independent registers. The 32 register pairs form the Address Translation Table (ATT); the three independent registers are the MFA register (mapped-file base address register), the OFS register (object offset register) and the FID register (file number register). Each register pair stores the number of a mapped file and that file's base address, and the three independent registers can transfer data to the table;
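As a model of this layout (an illustration in C, not a hardware description):

```c
#include <stdint.h>

/* One Address Translation Table entry: a register pair holding a mapped
   file's number and that file's base address. */
typedef struct {
    uint64_t file_id;    /* 0 marks an empty entry          */
    uint64_t base_addr;  /* base address of the mapped file */
} att_entry;

/* The DAX address translation cache: 32 register pairs plus the three
   independent registers named above. */
typedef struct {
    att_entry att[32];   /* Address Translation Table            */
    uint64_t  mfa;       /* MFA: mapped-file base address        */
    uint64_t  ofs;       /* OFS: object offset                   */
    uint64_t  fid;       /* FID: file number                     */
} dax_atc;
```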
Key point 2: the address translation function accesses the DAX address translation cache explicitly, through ordinary memory access instructions. Since the invention must not disturb the existing data path of current computer architectures, the address translation function explicitly accesses the DAX address translation cache to obtain the required base address. On the one hand, this avoids slowing the existing data path and hurting the performance of the existing system when the DAX address translation cache is added; on the other hand, it avoids adding new instructions or modifying existing ones. The only change required is to register four virtual addresses with the operating system and map them onto the DAX address translation cache. Since chips of every current architecture already reserve an address range of a certain size, this is not complicated;
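For instance, the four registered virtual addresses might be exposed to the address translation function as plain volatile pointers; the addresses below are purely illustrative assumptions.

```c
#include <stdint.h>

/* Hypothetical virtual addresses that the OS has registered and mapped onto
   the DAX address translation cache's reserved physical addresses. */
#define DAX_ATC_FID ((volatile uint64_t *)0xFFFFC00000000000ULL) /* file number    */
#define DAX_ATC_OFS ((volatile uint64_t *)0xFFFFC00000000008ULL) /* object offset  */
#define DAX_ATC_MFA ((volatile uint64_t *)0xFFFFC00000000010ULL) /* file base addr */
#define DAX_ATC_RES ((volatile uint64_t *)0xFFFFC00000000018ULL) /* result / fill  */

/* Ordinary stores and loads through these pointers are the explicit memory
   access instructions referred to above; no new instruction is required. */
```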
Key point 3: the DAX address translation cache is responsible for checking address validity; a file number of 0 or an offset of 0 is an illegal address;
Key point 4: the DAX address translation cache can be written but its contents cannot be read back directly (a read only returns the result of a translation), which prevents a malicious program from illegally obtaining, through reads of the DAX cache, the address of a mapped file it has no right to access.
In order to make the aforementioned features and effects of the present invention more comprehensible, embodiments accompanied with figures are described in detail below.
DAX address translation cache structure:
the address translation cache structure is shown in fig. 1. In this figure, three separate registers may be used to send data uni-directionally to the address translation table, and the address registers in the OFS registers and address translation table may send data to an adder that generates the result as a virtual address translated from the DAX address translation cache.
DAX address translation cache location:
In modern computer architectures, the connections between the CPU, the TLB (translation lookaside buffer) and the Cache are shown in FIG. 2. When a memory access instruction executes, the CPU sends the virtual address generated by the instruction to the TLB; on a hit, the TLB translates it and produces a physical address. The physical address is sent to the Cache, and on a hit the data in the Cache is returned to the CPU, completing the access. On a miss, the physical address is sent to the memory bus, then to the memory controller, and finally to the DRAM, which completes the read. The Cache is completely transparent to the programmer and caches the DRAM; the DRAM itself holds instructions and data.
The DAX address translation Cache should be placed between the TLB and the Cache and be addressed at the CPU's reserved addresses. After the TLB completes address translation, the resulting physical address is sent directly to both the DAX address translation Cache and the Cache. If the physical address is an access to the DAX address translation Cache, the DAX address translation Cache responds and transmits the data to the CPU; otherwise the Cache transmits the data to the CPU or reports an error. Arbitration logic is therefore required between the DAX address translation Cache and the Cache: the response of the DAX address translation Cache has the higher priority, and its data is transmitted first.
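A behavioral C sketch of that arbitration (assuming, for illustration, that both units answer in the same cycle):

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     valid;  /* unit claims this physical address */
    uint64_t data;   /* data to return to the CPU         */
} unit_response;

/* The TLB's physical address goes to both units in parallel; a valid
   response from the DAX address translation Cache always wins. */
unit_response arbitrate(unit_response dax_atc, unit_response cache) {
    if (dax_atc.valid)
        return dax_atc;  /* higher priority: DAX address translation Cache */
    return cache;        /* otherwise the ordinary Cache answers or faults */
}
```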
The address translation function is written by the developer and uses the DAX cache to speed up its execution. Normally such a function has to query a software-maintained cache and then decide how to perform the address translation; the invention in effect moves that cache from software into hardware. The address translation function should perform the following flow (a code sketch of the whole flow follows the steps):
1. Write the FID register: write the file number from the persistent address into the register. Other than being nonzero, the file number has no special requirement; each development library is free to choose how to generate file numbers. The file number in the address is determined by the upper-layer developer and is simply a 64-bit integer. The PMDK currently developed by Intel manages objects in the form of file number plus intra-file offset; there, FID corresponds to the PMDK file number and OFS to the intra-file offset.
2. Write the OFS register: write the object offset from the persistent address into the register. Other than being nonzero, the object offset has no special requirement.
3. Read the address translation table inside the DAX address translation cache. The DAX address translation cache searches the address translation table using the data stored in FID; if a matching entry is found, the corresponding base address is added to the data in the OFS register, and the result is returned as the response to the read request of the address translation function.
4. Check whether the data read back is 0. If it is not 0, the address translation ends; if it is 0, proceed to the next step.
5. Write the MFA register: fill the base address of the mapped file into the register. When programming, the first step of accessing the NVM is to map the file, so the base address of the mapped file is known at this point.
6. Write 0 to the DAX address translation cache. After receiving the write request, the DAX address translation cache fills the data in FID and MFA into the address translation table through a replacement algorithm.
7. The address translation function ends.
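Assembled into one routine, the seven steps might look as follows in C, reusing the hypothetical register pointers sketched under key point 2; map_file_base() stands in for whatever mapping routine the development library uses and is an assumption, not a prescribed API.

```c
#include <stdint.h>

/* Hypothetical: maps the file if necessary and returns its base address. */
uintptr_t map_file_base(uint64_t file_id);

void *dax_translate(uint64_t file_id, uint64_t offset) {
    *DAX_ATC_FID = file_id;                   /* step 1: write FID (nonzero)   */
    *DAX_ATC_OFS = offset;                    /* step 2: write OFS (nonzero)   */
    uint64_t addr = *DAX_ATC_RES;             /* step 3: read triggers lookup  */
    if (addr != 0)                            /* step 4: nonzero means a hit   */
        return (void *)(uintptr_t)addr;
    uintptr_t base = map_file_base(file_id);  /* step 5: obtain base address   */
    *DAX_ATC_MFA = base;                      /*         and fill the MFA      */
    *DAX_ATC_RES = 0;                         /* step 6: write 0 fills the ATT */
    return (void *)(base + offset);           /* step 7: done                  */
}
```

The next translation of an address in the same file then hits in the address translation table and skips steps 5 and 6.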
Arbitration between the DTLB (direct-access TLB, i.e. the DAX address translation cache) and the Cache:
As stated above, when the TLB completes address translation, the resulting physical address should be sent to the DTLB and the Cache simultaneously; their responses are arbitrated, and the DTLB's response is sent to the CPU with priority. FIG. 3 shows the hardware structure used to perform this arbitration.
Evaluation. Since it is currently impractical to add this component to a CPU, the evaluation is performed by simulation. The invention uses the gem5 simulator, which models CPUs of different architectures, including X86 and ARM, and offers two modes: full-system simulation and system-call emulation. Because the invention works in user mode, no operating system needs to be run, so the system-call emulation mode is used.
In the test, the pmemobj_direct address translation function in the PMDK, developed and maintained by Intel, is compared with a self-written address translation function that calls the DAX address translation cache; 8 million persistent objects are translated, once using a single memory pool and once using multiple memory pools. The elapsed times (in seconds) are shown in FIG. 4.
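A sketch of the software side of such a test is shown below; pmemobj_open() and pmemobj_direct() are real PMDK calls, but the pool path, the layout name and the way the OID array is filled are assumptions.

```c
#include <libpmemobj.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 8000000  /* 8 million persistent objects, as in the test */

int main(void) {
    /* Pool path and layout name are placeholders. */
    PMEMobjpool *pop = pmemobj_open("/mnt/pmem/pool.obj", "bench");
    if (pop == NULL) return 1;

    PMEMoid *oids = malloc(N * sizeof(PMEMoid));
    /* ... fill oids[] with the OIDs of N objects allocated in the pool ... */

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < N; i++) {
        volatile void *p = pmemobj_direct(oids[i]); /* software translation */
        (void)p;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    printf("%.3f s\n",
           (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
    pmemobj_close(pop);
    free(oids);
    return 0;
}
```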
Impact on existing systems:
To evaluate the impact that adding a DAX address translation cache has on an existing system, the performance of the components of an existing computer system must be examined.
Currently the TLB can respond within 1 clock cycle and the Cache within 5 clock cycles, so in theory, once an instruction finishes decoding and enters the execution stage, data can reach the CPU after as few as 6 clock cycles. Published figures put the first-level Cache hit rate at about 95% and the second-level Cache hit rate at about 97%, from which the average memory access latency can be estimated at roughly 9 clock cycles. If the DAX address translation Cache were inserted between the TLB and the Cache, every access would need an extra clock cycle to arbitrate whether the physical address is forwarded to the Cache; ordinary access instructions would then suffer an additional 1-cycle delay and performance would drop by roughly 20%. This is why the design insists that the TLB send the physical address to the Cache and the DAX address translation Cache simultaneously and select the response through arbitration logic, rather than passing through the DAX address translation Cache first and the Cache second.
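For reference, the estimate can be reproduced with the usual average-access-time formula; the TLB, L1 and hit-rate figures are the ones quoted above, while the L2 and DRAM latencies are assumptions chosen only to illustrate the arithmetic.

```latex
\[
T_{\mathrm{avg}} = t_{\mathrm{TLB}} + h_1 t_{L1}
  + (1-h_1)\bigl(h_2 t_{L2} + (1-h_2)\, t_{\mathrm{DRAM}}\bigr)
\]
\[
t_{\mathrm{TLB}}=1,\quad t_{L1}=5,\quad h_1=0.95,\quad h_2=0.97,\quad
t_{L2}\approx 60,\quad t_{\mathrm{DRAM}}\approx 300
\;\Rightarrow\;
T_{\mathrm{avg}} \approx 1 + 4.75 + 0.05\,(58.2 + 9) \approx 9\ \text{cycles}.
\]
```

An extra arbitration cycle on the roughly 6-cycle hit path is of the same order as the quoted 20% figure.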
The following is a system embodiment corresponding to the method embodiment above; the two can be implemented in cooperation. The technical details mentioned in the embodiment above remain valid here and, to reduce repetition, are not restated; conversely, the technical details mentioned in this embodiment also apply to the embodiment above.
The invention also provides a DAX device address translation cache system, comprising:
module 1, for constructing a DAX address translation cache consisting of a mapped-file base address register (MFA), an object offset register (OFS), a file number register (FID) and an address translation table;
module 2, for writing, according to the address translation function, the file number in the persistent address and the object offset in the persistent address into the file number register and the object offset register respectively;
module 3, in which the TLB translates the virtual address sent by the CPU into a physical address, and the DAX address translation cache searches the address translation table using the data stored in the file number register, adds the base address corresponding to the search result to the data in the object offset register to obtain a direct access address, and feeds the direct access address back to the CPU as the translation result of the virtual address.
In the DAX device address translation cache system, module 3 comprises:
if the direct access address is 0, the address translation function fills the base address of the mapped file into the mapped-file base address register and writes 0 to the DAX address translation cache; after receiving the write request, the DAX address translation cache fills the data in the file number register and the mapped-file base address register into the address translation table through a replacement algorithm.
The DAX device address translation cache system further comprises:
module 4, for sending the physical address to a cache memory and taking the data corresponding to the physical address in the cache memory as a response result; judging whether the direct access address is valid; if so, feeding the direct access address back to the CPU, and otherwise feeding the response result back to the CPU.
In the DAX device address translation cache system, the address translation table consists of 32 register pairs.

Claims (8)

1. A DAX device address translation caching method, characterized by comprising the following steps:
step 1, constructing a DAX address translation cache consisting of a mapped-file base address register (MFA), an object offset register (OFS), a file number register (FID) and an address translation table;
step 2, according to the address translation function, writing the file number in the persistent address and the object offset in the persistent address into the file number register and the object offset register respectively;
step 3, the TLB translates the virtual address sent by the CPU into a physical address; the DAX address translation cache searches the address translation table using the data stored in the file number register, adds the base address corresponding to the search result to the data in the object offset register to obtain a direct access address, and feeds the direct access address back to the CPU as the translation result of the virtual address.
2. The DAX device address translation caching method of claim 1, wherein step 3 comprises:
step 31, if the direct access address is 0, the address translation function fills the base address of the mapped file into the mapped-file base address register and writes 0 to the DAX address translation cache; after receiving the write request, the DAX address translation cache fills the data in the file number register and the mapped-file base address register into the address translation table through a replacement algorithm.
3. The DAX device address translation caching method of claim 1, further comprising:
step 4, sending the physical address to a cache memory and taking the data corresponding to the physical address in the cache memory as a response result; judging whether the direct access address is valid; if so, feeding the direct access address back to the CPU, and otherwise feeding the response result back to the CPU.
4. The DAX device address translation caching method of claim 1, characterized in that the address translation table consists of 32 register pairs.
5. A DAX device address translation cache system, characterized by comprising:
module 1, for constructing a DAX address translation cache consisting of a mapped-file base address register (MFA), an object offset register (OFS), a file number register (FID) and an address translation table;
module 2, for writing, according to the address translation function, the file number in the persistent address and the object offset in the persistent address into the file number register and the object offset register respectively;
module 3, in which the TLB translates the virtual address sent by the CPU into a physical address, and the DAX address translation cache searches the address translation table using the data stored in the file number register, adds the base address corresponding to the search result to the data in the object offset register to obtain a direct access address, and feeds the direct access address back to the CPU as the translation result of the virtual address.
6. The DAX device address translation cache system of claim 5, wherein module 3 comprises:
if the direct access address is 0, the address translation function fills the base address of the mapped file into the mapped-file base address register and writes 0 to the DAX address translation cache; after receiving the write request, the DAX address translation cache fills the data in the file number register and the mapped-file base address register into the address translation table through a replacement algorithm.
7. The DAX device address translation cache system of claim 5, further comprising:
module 4, for sending the physical address to a cache memory and taking the data corresponding to the physical address in the cache memory as a response result; judging whether the direct access address is valid; if so, feeding the direct access address back to the CPU, and otherwise feeding the response result back to the CPU.
8. The DAX device address translation cache system of claim 5, characterized in that the address translation table consists of 32 register pairs.
CN202010357810.8A 2020-04-29 2020-04-29 DAX device address translation caching method and system Active CN111651379B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010357810.8A CN111651379B (en) 2020-04-29 2020-04-29 DAX device address translation caching method and system


Publications (2)

Publication Number Publication Date
CN111651379A 2020-09-11
CN111651379B CN111651379B (en) 2023-09-12

Family

Family ID: 72346609

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010357810.8A Active CN111651379B (en) 2020-04-29 2020-04-29 DAX device address translation caching method and system

Country Status (1)

Country Link
CN (1) CN111651379B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040250053A1 (en) * 2000-08-09 2004-12-09 Mcgrath Kevin J. Multiple entry points for system call instructions
CN101609429A (en) * 2009-07-22 2009-12-23 大唐微电子技术有限公司 A kind of method and apparatus of debugging embedded operating system
CN102495132A (en) * 2011-12-13 2012-06-13 东北大学 Multi-channel data acquisition device for submarine pipeline magnetic flux leakage internal detector
CN102929796A (en) * 2012-06-01 2013-02-13 杭州中天微系统有限公司 Memory management module simultaneously supporting software backfilling and hardware backfilling
US9058284B1 (en) * 2012-03-16 2015-06-16 Applied Micro Circuits Corporation Method and apparatus for performing table lookup
CN105740168A (en) * 2016-01-23 2016-07-06 中国人民解放军国防科学技术大学 Fault-tolerant directory cache controller
CN106940815A (en) * 2017-02-13 2017-07-11 西安交通大学 A kind of programmable convolutional neural networks Crypto Coprocessor IP Core
CN108959125A (en) * 2018-07-03 2018-12-07 中国人民解放军国防科技大学 Storage access method and device supporting rapid data acquisition


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
沙行勉 et al.: "Efficient Shared Memory File System for Co-resident Virtual Machines", vol. 42, no. 4, pages 800-819 *

Also Published As

Publication number Publication date
CN111651379B (en) 2023-09-12

Similar Documents

Publication Publication Date Title
Seshadri et al. RowClone: Fast and energy-efficient in-DRAM bulk data copy and initialization
US10705972B2 (en) Dynamic adaptation of memory page management policy
US6138208A (en) Multiple level cache memory with overlapped L1 and L2 memory access
KR100421749B1 (en) Method and apparatus for implementing non-faulting load instruction
US7219185B2 (en) Apparatus and method for selecting instructions for execution based on bank prediction of a multi-bank cache
US7472253B1 (en) System and method for managing table lookaside buffer performance
US8296518B2 (en) Arithmetic processing apparatus and method
KR20120096031A (en) System, method, and apparatus for a cache flush of a range of pages and tlb invalidation of a range of entries
JP2023531650A (en) System and method for performing binary conversion
KR20070080589A (en) Technique for using memory attributes
JP2011013858A (en) Processor and address translating method
US7386670B2 (en) Processing of self-modifying code in multi-address-space and multi-processor systems
Kumar et al. Survey on various advanced technique for cache optimization methods for RISC based system architecture
US9037903B2 (en) Apparatus and method for partial memory mirroring
US7549035B1 (en) System and method for reference and modification tracking
KR20170100448A (en) Data storage
US6862675B1 (en) Microprocessor and device including memory units with different physical addresses
US9342303B2 (en) Modified execution using context sensitive auxiliary code
CN111651379B (en) 2023-09-12 DAX device address translation caching method and system
CN115269199A (en) Data processing method and device, electronic equipment and computer readable storage medium
US10754743B2 (en) Apparatus and method using debug status storage element
CN113885943A (en) Processing unit, system on chip, computing device and method
US11693725B2 (en) Detecting execution hazards in offloaded operations
US11693780B2 (en) System, method, and apparatus for enhanced pointer identification and prefetching
US20100077145A1 (en) Method and system for parallel execution of memory instructions in an in-order processor

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant