WO2013044829A1 - Data readahead method and device for non-uniform memory access - Google Patents

Data readahead method and device for non-uniform memory access

Info

Publication number
WO2013044829A1
Authority
WO
WIPO (PCT)
Prior art keywords
prefetch
data
parameter
node
weight
Prior art date
Application number
PCT/CN2012/082202
Other languages
French (fr)
Chinese (zh)
Inventor
谭玺
韦竹林
刘轶
朴明铉
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2013044829A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0862 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch

Definitions

  • The present invention relates to the field of communications, and more particularly to a data prefetching method and apparatus for non-uniform memory access.
  • At present, disks are still the primary storage medium for computer systems.
  • However, as technology advances, disks face two major challenges: their input/output (I/O) bandwidth cannot keep pace with the development of the central processing unit (CPU) and memory, and the gap between disk access latency and CPU/memory read/write speeds keeps widening.
  • Among CPU speed, disk transfer speed, and disk I/O access speed, disk I/O access speed has improved the most slowly; in particular, the gap between disk I/O access speed and CPU speed keeps growing, and disk I/O access latency has become one of the main bottlenecks constraining system I/O performance.
  • At the operating-system level, asynchronization is a very effective I/O performance optimization strategy, and data prefetching is a common way to make I/O asynchronous.
  • Data prefetching means that the system performs I/O operations in advance in the background, loading the required data into memory ahead of time to hide the application's I/O latency, thereby effectively improving the utilization of the computer system.
  • Compared with traditional serial processing, the asynchronous operation strategy provided by data prefetching eliminates CPU waiting time and lets the CPU and the disk work in parallel, improving the system's overall I/O performance.
  • Data prefetching uses pattern matching: that is, by monitoring the application's access sequence for each file, the system maintains a history of accesses and matches it one by one against the recognized patterns. If the behavior matches the characteristics of an access pattern, data can be predicted and prefetched accordingly.
  • Specific implementation techniques include heuristic prefetching and informed prefetching. Heuristic prefetching is transparent to upper-layer applications: by automatically observing a program's historical access records, it analyzes the program's I/O characteristics and independently predicts and prefetches the data blocks that are about to be accessed.
  • Linux kernel versions after 2.6.23 provide a heuristic-based on-demand prefetch (readahead) algorithm. It works at the Virtual File System (VFS) layer, uniformly serving the various file read operations above it (through system call APIs) while remaining independent of the specific file system below.
  • The on-demand prefetch algorithm introduces page state and page cache state and adopts loose sequentiality decision conditions, providing effective support for sequential I/O operations, including asynchronous/non-blocking I/O, multi-threaded interleaved I/O, mixed sequential/random I/O, and large-scale concurrent I/O.
  • When an application wants to access data, it accesses a disk file via the page cache through the system call interface. On this standard file access path, the kernel invokes the prefetch algorithm, tracks the application's access sequence, and performs appropriate prefetching.
  • Specifically, the heuristic-based on-demand prefetch algorithm provided by Linux determines the application's access mode mainly by monitoring its read requests and the page cache, and then decides the position and size of the prefetch according to that access mode.
  • The prefetching framework can be roughly divided into two parts: a monitoring part and a decision/processing part. The monitoring part is embedded in read-request handling routines such as the do_generic_file_read() function. It checks whether each page in the request is already in the file's cache address space; if not, it allocates a new page, and the application is temporarily suspended while waiting for I/O to load the page (synchronous readahead).
  • If the offset of the new page is exactly the position pointed to by the readahead parameter async_size, the page is marked with the prefetch flag (PG_readahead).
  • In the subsequent data prefetching process, when such a tagged page (PG_readahead page) is detected, it means that the time for the next prefetch I/O has arrived, and the system performs asynchronous readahead.
  • The decision part is located in the ondemand_readahead() function, which is logically composed of a set of independent decision modules that determine whether an access is an initial file read, a small-file read, a sequential read, or a random read. The on-demand prefetching framework supports both sequential and random read access modes; small-file reads are simply discarded, with no data prefetching performed. A simplified sketch of this decision flow is given below.
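  • The following is a minimal user-space C sketch of the decision flow just described. The field names mirror the kernel's readahead bookkeeping, but the classification logic and thresholds here are illustrative assumptions, not the kernel's actual ondemand_readahead() code.

```c
#include <stdio.h>

/* Readahead bookkeeping, loosely after the kernel's file_ra_state. */
struct ra_state {
    long start;      /* first page of the last readahead window */
    long size;       /* pages in the last readahead window */
    long async_size; /* async readahead triggers with this many pages left */
};

enum ra_decision { RA_INITIAL, RA_SEQUENTIAL, RA_ASYNC, RA_RANDOM };

static enum ra_decision classify(const struct ra_state *ra,
                                 long offset, int hit_readahead_marker)
{
    if (offset == 0 && ra->size == 0)
        return RA_INITIAL;              /* first read of the file */
    if (offset == ra->start + ra->size)
        return RA_SEQUENTIAL;           /* continues the previous window */
    if (hit_readahead_marker)
        return RA_ASYNC;                /* a PG_readahead page was consumed */
    return RA_RANDOM;                   /* no pattern: no prefetching */
}

int main(void)
{
    struct ra_state ra = { .start = 0, .size = 16, .async_size = 8 };
    printf("%d\n", classify(&ra, 16, 0)); /* sequential read: prints 1 */
    return 0;
}
```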
  • Whether the heuristic-based on-demand prefetch algorithm provided by Linux kernels after 2.6.23 or other data prefetching techniques, all were originally designed for single-processor systems. Because a single-processor system is itself limited by factors such as processor computing power, memory capacity, and bandwidth, the corresponding data prefetch design is relatively conservative, especially the prefetch-amount management part: using the page count of the initial read request as a baseline, it adopts a doubling strategy (a multiplication factor of 2) and sets an upper-bound window.
  • With the ever-increasing performance demands of scientific computing and transaction processing, Symmetrical Multi-Processing (SMP) systems are applied more and more widely and at ever larger scales. A Non-Uniform Memory Access (NUMA) multiprocessor system consists of a number of independent nodes connected by a high-speed dedicated network, where each node can be a single CPU or an SMP system. As a class of distributed shared-memory architecture, NUMA combines the easy programmability of SMP systems with the high scalability of distributed storage systems, and has become one of the mainstream architectures of today's high-performance servers.
  • Multiprocessor systems based on the distributed shared-memory NUMA architecture differ greatly from single-processor systems in CPU access queue control, memory access control, node load balancing, and other architectural aspects; data prefetching designed for single-processor systems can no longer satisfy the multiprocessor environment of the NUMA architecture.
  • If a Linux system is deployed on a distributed shared-memory NUMA server, the prefetch-amount management method provided by Linux does not take into account properties unique to NUMA servers, such as CPU load, per-node remaining memory, and global remaining memory; therefore, the actual performance of this single-processor-oriented data prefetching cannot be optimal.
  • For example, when multiple processors access files simultaneously, prefetching data in the amounts designed for a single-processor system may overload the disk system.
  • As another example, when a node of the NUMA architecture has little local memory remaining, prefetching in those amounts is problematic: because the distributed memory of the NUMA architecture makes remote memory accesses slow, it is quite possible that, before the data in the node's local memory has been taken away (by the nodes accessing it remotely), the newly prefetched data further worsens the pressure on the node's remaining local memory.
  • Embodiments of the present invention provide a data prefetching method and apparatus for non-uniform memory access, to improve the reliability and accuracy of file prefetching under the NUMA architecture.
  • An embodiment of the present invention provides a data prefetching method for non-uniform memory access, where the method includes: obtaining a data prefetch-amount parameter factor r according to a parameter characterizing disk load in the NUMA system and the free prefetch buffer capacity of the node where the process is located; computing the product Ssize of the previous prefetch window size Rprev_size, the maximum prefetch multiplier Tscale, and the factor r; and comparing the preset maximum prefetch amount MAXreadahead with Ssize, using the smaller of the two as the size of the current prefetch window to prefetch data.
  • An embodiment of the present invention provides a data prefetching apparatus for non-uniform memory access, including: a data prefetch-amount parameter factor acquisition module, configured to obtain the factor r according to a parameter characterizing disk load in the NUMA system and the free prefetch buffer capacity of the node where the process is located; a prefetch-amount window multiplication module, configured to compute the product Ssize of the previous prefetch window size Rprev_size, the maximum prefetch multiplier Tscale, and the factor r; and a prefetch window acquisition module, configured to compare the preset maximum prefetch amount MAXreadahead with Ssize and use the smaller of the two as the size of the current prefetch window to prefetch data.
  • From the above, once the factor r has been obtained, the size of the current prefetch window is determined from the relationship between Ssize (= Rprev_size × Tscale × r) and the preset maximum prefetch amount MAXreadahead, and data is finally prefetched according to the window size so determined. Because the parameter characterizing disk load in the NUMA system is related to the current operating system's input/output I/O queue, and the factor r is obtained from the free prefetch buffer capacity of the node where the process is located, the data prefetching method provided by the embodiments of the present invention, compared with prior-art prefetch algorithms designed for single processors, comprehensively considers factors that affect system performance, such as disk I/O load and a node's remaining memory.
  • FIG. 1 is a schematic flowchart of a data prefetching method for non-uniform memory access according to an embodiment of the present invention;
  • FIG. 2 is a schematic diagram of the general design principle of a data prefetching algorithm;
  • FIG. 3 is a schematic diagram of the working layers of a data prefetching algorithm;
  • FIG. 4 is a schematic diagram of the prefetch window in a data prefetching algorithm;
  • FIG. 5 is a schematic structural diagram of a data prefetching apparatus for non-uniform memory access according to an embodiment of the present invention;
  • FIG. 6 is a schematic structural diagram of a data prefetching apparatus for non-uniform memory access according to another embodiment of the present invention;
  • FIG. 7 is a schematic structural diagram of a data prefetching apparatus for non-uniform memory access according to another embodiment of the present invention.
  • FIG. 1 is a schematic flowchart of a data prefetching method for non-uniform memory access according to an embodiment of the present invention, which mainly includes the following steps:
  • S101: Obtain the data prefetch-amount parameter factor r according to a parameter characterizing disk load in the NUMA system and the free prefetch buffer capacity of the node where the process is located.
  • It should be noted that although a NUMA system contains multiple nodes, it runs only one operating system; data prefetching is therefore performed with respect to the operating system as a whole. In this embodiment, the parameter characterizing disk load in the NUMA system is related to the current operating system's input/output I/O queue.
  • The "current operating system's I/O queue" refers to the I/O queues that are managed by the operating system and are currently accessing the disk, i.e., how many read/write queues in the current NUMA system are accessing the disk.
  • To explain a node's free prefetch buffer capacity, the Linux system is taken as an example here to briefly introduce the data prefetching algorithm in terms of its design principles and the layer at which it works.
  • After the Linux kernel finishes reading data, it caches the file pages it has recently accessed in memory for a period of time; this memory used for caching file pages is called the page cache.
  • The data reads usually spoken of (through the system's read() API) take place between the application buffer and the page cache, as shown in FIG. 2, while the data prefetch algorithm is responsible for reading data from disk to fill the page cache.
  • When an application reads from the page cache into its application buffer, the read granularity is generally small; for example, the read/write granularity of a file-copy command is generally 4 KByte (kilobytes). The kernel's data prefetcher instead fills data from disk into the page cache at a size it considers more appropriate, for example 16 KByte to 128 KByte.
  • As for the layer at which the data prefetch algorithm works, see FIG. 3: the algorithm works at the VFS layer, uniformly serving the various file read operations above it (system call APIs) while remaining independent of the specific file system below. A user-space sketch of this read path follows.
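  • Below is a small, hedged user-space illustration of the read path just described: ordinary read() calls are served from the page cache, and advisory calls such as posix_fadvise(POSIX_FADV_SEQUENTIAL) merely hint to the kernel that sequential readahead will pay off. The file path is an arbitrary example; the prefetching itself happens inside the kernel.

```c
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/etc/hosts", O_RDONLY); /* any readable file works */
    if (fd < 0)
        return 1;

    /* Hint that we will read sequentially, encouraging a larger window. */
    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    char buf[4096]; /* small application-buffer granularity, e.g. 4 KByte */
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        ; /* data arrives via the page cache, filled by kernel readahead */

    close(fd);
    return n == 0 ? 0 : 1;
}
```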
  • A node's prefetch buffer is memory allocated by the system to the node for caching the file pages recently accessed by the kernel on that node, that is, the page cache; a node's free prefetch buffer is the memory remaining after removing the memory occupied by already-prefetched data in the page cache.
  • In the NUMA architecture, the free prefetch buffer capacity of a node is thus also one of the factors that affect the amount of data prefetched.
  • When a thread running on a node reads a file, each time a data prefetch request is issued, the data prefetch algorithm records the request in a data structure called the "prefetch window" to indicate the length of the data requested for prefetching, as shown in FIG. 4.
  • The fields start and size form a prefetch window that records the position and size of the most recent prefetch request, and async_size indicates the amount of readahead done in advance.
  • A PG_readahead marker is set during the previous prefetch I/O; when the application has consumed enough of the readahead window to reach it, the time for the next prefetch I/O has arrived, and asynchronous readahead is initiated to read more file pages. It is therefore easy to obtain the size of the previous prefetch window, Rprev_size, from the recorded data prefetch request, as the sketch below indicates.
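  • The following C sketch shows this bookkeeping. The field names follow the Linux kernel's struct file_ra_state; the two helpers are illustrative, not kernel code.

```c
#include <stdbool.h>

/* Prefetch-window record, after the kernel's struct file_ra_state. */
struct prefetch_window {
    unsigned long start;      /* first page of the current window */
    unsigned long size;       /* window length, in pages */
    unsigned long async_size; /* trigger async readahead when this many
                                 pages of the window remain unconsumed */
};

/* Rprev_size is read directly off the recorded request. */
static unsigned long rprev_size(const struct prefetch_window *w)
{
    return w->size;
}

/* The PG_readahead marker sits async_size pages before the window end. */
static bool hits_readahead_marker(const struct prefetch_window *w,
                                  unsigned long page_offset)
{
    return page_offset == w->start + w->size - w->async_size;
}
```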
  • It should be noted that, in this embodiment, the initial prefetch window can be set larger than the data length of the first prefetch request; for example, the prefetch window can be set to twice the length of the data requested the first time. Of course, other multiples may be used; in principle, anything larger than the data length of the first prefetch request is acceptable, and the present invention does not particularly limit this.
  • The maximum prefetch multiplier Tscale is used to limit how much the prefetch amount may be multiplied each time and can be set by the user according to the actual situation. Together with the previous prefetch window size Rprev_size and the data prefetch-amount parameter factor r, it gives the product Ssize = Rprev_size × Tscale × r.
  • Of course, the prefetch window cannot grow without bound under this relation; that is, some restriction should be placed on the window size. In this embodiment of the present invention, a maximum prefetch amount MAXreadahead can be set by the user, and the smaller of MAXreadahead and Ssize is used as the size of the current prefetch window.
  • From the above, the data prefetching method for non-uniform memory access obtains the data prefetch-amount parameter factor r from a parameter characterizing disk load in the NUMA system and the free prefetch buffer capacity of the node where the process is located, determines the size of the current prefetch window from the relationship between the product Ssize of the three factors and the preset maximum prefetch amount MAXreadahead, and finally prefetches data according to the window size so determined.
  • Compared with prior-art prefetch algorithms for single processors, the data prefetching method provided by this embodiment of the present invention comprehensively considers factors affecting system performance such as disk I/O load and a node's remaining memory; that is, when the disk I/O load is light and the node has ample remaining memory, the data prefetch amount is appropriately enlarged, which helps hide data I/O, and when the disk I/O load is heavy and the node has little remaining memory, the data prefetch amount is appropriately reduced, which helps save system resources. A minimal sketch of the window-sizing rule follows.
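  • A minimal sketch of the rule just summarized, assuming page-count units: Ssize = Rprev_size × Tscale × r, clamped by MAXreadahead. The concrete numbers in main() are illustrative only.

```c
#include <stdio.h>

static unsigned long next_window_pages(unsigned long rprev_size,
                                       double tscale, double r,
                                       unsigned long max_readahead)
{
    double ssize = (double)rprev_size * tscale * r;
    if (ssize < 0)
        ssize = 0; /* heavy disk load can drive the factor r negative */
    unsigned long pages = (unsigned long)ssize;
    return pages < max_readahead ? pages : max_readahead;
}

int main(void)
{
    /* previous window of 32 pages, Tscale = 2, light load (r = 0.8),
     * cap of 128 pages: min(51, 128) = 51 pages */
    printf("%lu\n", next_window_pages(32, 2.0, 0.8, 128));
    return 0;
}
```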
  • Further, in step S101, obtaining the data prefetch-amount parameter factor r from the parameter characterizing disk load in the NUMA system and the free prefetch buffer capacity of the node where the process is located can be implemented as follows:
  • Obtain the weight by which the disk load decreases the prefetch amount and the weight by which the free prefetch buffer capacity of the node where the process is located increases the prefetch amount; then take the difference between the weight by which the node's free prefetch buffer capacity increases the prefetch amount and the weight by which the disk load decreases it. That difference is the data prefetch-amount parameter factor r.
  • Obtaining the disk load's weight on the prefetch amount according to the parameter characterizing disk load in the NUMA system can be implemented by calling an I/O queue acquisition module to obtain the length of the operating system's current I/O queue. Specifically, the ratio of the length of the operating system's current I/O queue (denoted Q_current) to the operating-system-defined maximum I/O queue length (denoted Q_max) is multiplied by a first adjustable factor (denoted a); the disk load's weight on the prefetch amount is thus a × Q_current / Q_max.
  • Obtaining the weight by which the free prefetch buffer capacity of the node where the process is located increases the prefetch amount can be implemented by calling a memory acquisition module to obtain that node's free prefetch buffer capacity. Specifically, the ratio of the node's free prefetch buffer capacity (denoted M_free) to the node's total prefetch buffer capacity (denoted M_total) is multiplied by a second adjustable factor (denoted b), giving a weight of b × M_free / M_total, so that r = b × M_free / M_total − a × Q_current / Q_max.
  • It should be noted that the first adjustable factor a and the second adjustable factor b may be determined by the user according to the hardware environment and the user's own requirements, taking values in the range (0, 1]. If the user does not adjust the first adjustable factor a and the second adjustable factor b, both may take the default value 1.
  • The first adjustable factor a and the second adjustable factor b are used to adjust the relative weights, within the prefetch amount, of the node's prefetch-buffer idle ratio (i.e., M_free/M_total) and the disk load condition (i.e., Q_current/Q_max). Specifically, when a is relatively large and b is relatively small, the disk load condition (Q_current/Q_max) has a relatively large influence on the prefetch amount; conversely, when a is relatively small and b is relatively large, the prefetch-buffer idle ratio of the node where the process is located (M_free/M_total) has a relatively large influence on the prefetch amount. A sketch of this computation is given below.
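  • The following sketch computes the factor r as described above, r = b × (M_free/M_total) − a × (Q_current/Q_max). The sign convention (free memory raises r, disk load lowers it) follows the surrounding text, and the variable names are ours, not the patent's.

```c
/* r = b * (M_free / M_total) - a * (Q_current / Q_max) */
static double prefetch_factor(double q_current, double q_max,
                              double m_free, double m_total,
                              double a /* in (0, 1], default 1 */,
                              double b /* in (0, 1], default 1 */)
{
    double load_weight = a * (q_current / q_max);  /* shrinks the prefetch */
    double memory_weight = b * (m_free / m_total); /* grows the prefetch */
    return memory_weight - load_weight;
}
```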
  • The embodiments above also apply when the NUMA system runs a virtualization environment in which each virtual machine runs a separate operating system.
  • In that case the data prefetching method for non-uniform memory access provided by the foregoing embodiments is basically unchanged, except that, because of the virtualized I/O subsystem, the operating system running independently in each virtual machine can have its own file system and manage its own I/O queues; therefore, the I/O queue length inside one such operating system does not reflect the disk I/O load of the entire system.
  • Preferably, the I/O queue acquisition module uses a calling interface provided by the virtualization system to obtain the current I/O queue length of the entire NUMA system, rather than obtaining it from the independent operating system running in a virtual machine; if the virtualization system does not provide such a calling interface, the current NUMA system's I/O queue length is obtained from the operating system running in the virtual machine.
  • In a virtualization system, a management tool (for example, a hypervisor) uniformly coordinates memory management and communication among the virtualization systems running on the nodes, and the strategy it adopts for memory allocation and scheduling is public.
  • A specific implementation is as follows: if the virtualization system on a node is running the prefetch software, the software can first obtain that node's I/O queue length and then, from the management tool's memory scheduling strategy, derive the current I/O queue length of the entire NUMA system.
  • The maximum prefetch multiplier Tscale is used to limit how much the prefetch amount may be multiplied each time. It can be determined by the user according to the total prefetch buffer capacity of the node where the process is located and the characteristics of the system's main applications; that is, if the node's total prefetch buffer is large and the main applications are characterized by long sequential file reads, the maximum prefetch multiplier Tscale can be set to a larger value, so that, when conditions allow, the prefetch window can grow rapidly and the data prefetch hit rate improves.
  • In this embodiment, the maximum prefetch multiplier Tscale may take values in the range [0, 8], where the symbol "[]" denotes a closed interval; within [0, 8], the same principle is followed: the larger the node's total prefetch buffer and the more pronounced the sequential-read character of the main applications, the larger the value chosen for Tscale.
  • In summary, the data prefetching method for non-uniform memory access provided by the present invention can bring at least the following effects:
  • First, the present invention does not change the basic framework of Linux kernel file prefetching; rather, on top of the traditional data prefetching algorithm, it proposes a new prefetch-amount management strategy for the NUMA environment. As an optimization of prefetch-amount management, it does not affect system stability.
  • Second, the present invention comprehensively considers the multiple architectural features of NUMA systems that affect file prefetching, such as disk load and memory management, solving the mismatch of the Linux kernel's data prefetching algorithm and improving the reliability and accuracy of data prefetching.
  • Third, a dynamically predicted data prefetch-amount parameter factor r is proposed: the prefetch amount is determined by the NUMA system's current disk load, the free prefetch buffer size of the node where the process is located, and the global memory situation, rather than by simply multiplying by a fixed number (for example, 2 or 4). This realizes dynamic determination of the data prefetch amount and manages the prefetch window size scientifically and effectively.
  • Referring to FIG. 5, which is a schematic structural diagram of a data prefetching apparatus 05 for non-uniform memory access according to an embodiment of the present invention; for convenience of description, only the parts related to this embodiment are shown.
  • The data prefetching apparatus 05 for non-uniform memory access illustrated in FIG. 5 includes a data prefetch-amount parameter factor acquisition module 501, a prefetch-amount window multiplication module 502, and a prefetch window acquisition module 503, where:
  • The data prefetch-amount parameter factor acquisition module 501 is configured to obtain the data prefetch-amount parameter factor r according to the parameter characterizing disk load in the non-uniform memory access NUMA system and the free prefetch buffer capacity of the node where the process is located, where the parameter characterizing disk load in the NUMA system is related to the current operating system's input/output I/O queue.
  • It should be noted that the parameter characterizing disk load in the NUMA system is related to the current operating system's input/output I/O queue. The "current operating system's I/O queue" refers to the I/O queues that are managed by the operating system and are currently accessing the disk, i.e., how many read/write queues in the current NUMA system are accessing the disk.
  • As before, to explain a node's free prefetch buffer capacity, the Linux system is taken as an example to briefly introduce the data prefetching algorithm in terms of its design principles and the layer at which it works.
  • After the Linux kernel finishes reading data, it caches the file pages it has recently accessed in memory for a period of time; this memory used for caching file pages is called the page cache. The data reads usually spoken of (through the system's read() API) take place between the application buffer and the page cache, as shown in FIG. 2, while the data prefetch algorithm is responsible for reading data from disk to fill the page cache. When an application reads from the page cache into its application buffer, the read granularity is generally small; for example, the read/write granularity of a file-copy command is generally 4 KByte. The kernel's data prefetcher instead fills data from disk into the page cache at a size it considers more appropriate, for example 16 KByte to 128 KByte.
  • As shown in FIG. 3, the data prefetch algorithm works at the VFS layer, uniformly serving the various file read operations above it (system call APIs) while remaining independent of the specific file system below.
  • When an application requests file data through different system APIs such as read(), pread(), readv(), aio_read(), sendfile(), and splice(), it enters the unified read-request handling function do_generic_file_read(). This function takes data from the page cache to satisfy the application's request and, when appropriate, calls the readahead routine to perform the necessary readahead I/O. Readahead I/O requests issued by the readahead algorithm are preprocessed by __do_page_cache_readahead(), which checks whether each page in the request is already in the file's cache address space and allocates a new page if not. If the offset of the new page is exactly the position pointed to by the readahead parameter async_size, the PG_readahead flag is set for the page. Finally, all the new pages are passed to read_pages(), where they are added one by one to the in-memory radix tree and the inactive_list, and the readpage() of the file system they belong to is called to deliver the pages to I/O.
  • A node's prefetch buffer is memory allocated by the system to the node for caching the file pages recently accessed by the kernel on that node, that is, the page cache; a node's free prefetch buffer is the memory remaining after removing the memory occupied by already-prefetched data in the page cache. In the NUMA architecture, the free prefetch buffer capacity of a node is thus also one of the factors that affect the amount of data prefetched.
  • The prefetch-amount window multiplication module 502 is configured to compute the product Ssize of the previous prefetch window size Rprev_size, the maximum prefetch multiplier Tscale, and the data prefetch-amount parameter factor r obtained by the data prefetch-amount parameter factor acquisition module 501.
  • When a thread running on a node reads a file, each time a data prefetch request is issued, the data prefetch algorithm records the request in a data structure called the "prefetch window" to indicate the length of the data requested for prefetching, as shown in FIG. 4. The fields start and size form a prefetch window that records the position and size of the most recent prefetch request, and async_size indicates the amount of readahead done in advance. A PG_readahead marker is set during the previous prefetch I/O; when the application has consumed enough of the readahead window to reach it, the time for the next prefetch I/O has arrived, and asynchronous readahead is initiated to read more file pages. It is therefore easy to obtain the size of the previous prefetch window, Rprev_size, from the recorded data prefetch request.
  • It should be noted that, in this embodiment, the initial prefetch window can be set larger than the data length of the first prefetch request; for example, the prefetch window can be set to twice the length of the data requested the first time. Of course, other multiples may be used; in principle, anything larger than the data length of the first prefetch request is acceptable, and the present invention does not particularly limit this.
  • The maximum prefetch multiplier Tscale is used to limit how much the prefetch amount may be multiplied each time and can be set by the user according to the actual situation; together with the previous prefetch window size Rprev_size and the factor r, it gives Ssize = Rprev_size × Tscale × r.
  • The prefetch window acquisition module 503 is configured to compare the preset maximum prefetch amount MAXreadahead with the Ssize obtained by the prefetch-amount window multiplication module 502, and to use the smaller of MAXreadahead and Ssize as the size of the current prefetch window for prefetching data.
  • As noted above, the prefetch window cannot grow without bound under the relation Ssize = Rprev_size × Tscale × r; that is, some restriction should be placed on the window size. In this embodiment, a maximum prefetch amount MAXreadahead can be set by the user.
  • From the data prefetching apparatus 05 for non-uniform memory access described above, it can be seen that once the data prefetch-amount parameter factor acquisition module 501 has obtained the factor r from the parameter characterizing disk load in the NUMA system and the free prefetch buffer capacity of the node where the process is located, the prefetch window acquisition module 503 can determine the size of the current prefetch window from the relationship between Ssize (the product of the previous prefetch window size Rprev_size, the maximum prefetch multiplier Tscale, and the factor r) and the preset maximum prefetch amount MAXreadahead, and data is finally prefetched according to the window size so determined.
  • Because the parameter characterizing disk load in the NUMA system is related to the current operating system's input/output I/O queue, and the factor r is obtained from the free prefetch buffer capacity of the node where the process is located, the data prefetching apparatus provided by this embodiment of the present invention, compared with prior-art prefetch algorithms for single processors, comprehensively considers factors affecting system performance such as disk I/O load and a node's remaining memory; that is, when the disk I/O load is light and the node has ample remaining memory, the data prefetch amount is appropriately enlarged, which helps hide data I/O, and when the disk I/O load is heavy and the node has little remaining memory, the data prefetch amount is appropriately reduced, which helps save system resources.
  • It should be noted that the division into the functional modules above is merely an example; in practical applications, the functions above may be assigned to different functional modules as needed, for example according to the corresponding hardware's configuration requirements or for the convenience of software implementation. That is, the internal structure of the data prefetching apparatus for non-uniform memory access may be divided into different functional modules to complete all or part of the functions described above.
  • Moreover, in practical applications, the corresponding functional modules in this embodiment may be implemented by corresponding hardware, or by corresponding hardware executing corresponding software. For example, the foregoing data prefetch-amount parameter factor acquisition module may be hardware that performs the function of obtaining the factor r from the parameter characterizing disk load in the non-uniform memory access NUMA system and the free prefetch buffer capacity of the node where the process is located, for example a data prefetch-amount parameter factor acquirer, or it may be a general-purpose processor or another hardware device capable of executing a corresponding computer program to perform that function. Likewise, the prefetch-amount window multiplication module may be hardware that performs the function of computing the product of the previous prefetch window size Rprev_size, the maximum prefetch multiplier Tscale, and the factor r obtained by the data prefetch-amount parameter factor acquisition module (or the data prefetch-amount parameter factor acquirer), for example a prefetch-amount window multiplier, or it may be a general-purpose processor or another hardware device capable of executing a corresponding computer program to perform that function. (The same applies to the various embodiments described in this specification.)
  • Further, the data prefetch-amount parameter factor acquisition module 501 illustrated in FIG. 5 may include a weight acquisition sub-module 601 and a difference sub-module 602, as in the data prefetching apparatus 06 for non-uniform memory access illustrated in FIG. 6, where:
  • The weight acquisition sub-module 601 is configured to obtain, according to the parameter characterizing disk load in the non-uniform memory access NUMA system and the free prefetch buffer capacity of the node where the process is located, the weight by which the disk load decreases the prefetch amount and the weight by which the free prefetch buffer capacity of the node where the process is located increases the prefetch amount;
  • The difference sub-module 602 is configured to obtain the difference between the weight by which the node's free prefetch buffer capacity increases the prefetch amount and the weight by which the disk load decreases it, yielding the data prefetch-amount parameter factor r.
  • Further, the weight acquisition sub-module 601 illustrated in FIG. 6 may include a memory acquisition unit 701 and a prefetch-amount weight acquisition unit 702, as in the data prefetching apparatus 07 for non-uniform memory access illustrated in FIG. 7, where:
  • The memory acquisition unit 701 is configured to call the I/O queue acquisition module and the memory acquisition module to obtain, respectively, the length of the operating system's current I/O queue and the free prefetch buffer capacity of the node where the process is located; the prefetch-amount weight acquisition unit 702 is configured to multiply the ratio of the length of the operating system's current I/O queue to the operating-system-defined maximum I/O queue length by the first adjustable factor to obtain the weight by which the disk load decreases the prefetch amount, and to multiply the ratio of the free prefetch buffer capacity of the node where the process is located to the node's total prefetch buffer capacity by the second adjustable factor to obtain the weight by which the node's free prefetch buffer capacity increases the prefetch amount.
  • In this embodiment, the ratio of the length of the operating system's current I/O queue to the operating-system-defined maximum I/O queue length serves as the parameter characterizing disk load in the non-uniform memory access NUMA system.
  • Specifically, the memory acquisition unit 701 can obtain the length of the operating system's current I/O queue by calling the I/O queue acquisition module, in combination with which the prefetch-amount weight acquisition unit 702 obtains the weight by which the disk load decreases the prefetch amount.
  • In a Linux system, the jprobe technique can be used to probe the do_generic_make_request() function, from which the length of the system's current I/O queue, that is, the I/O queue length in use, is obtained (by probing the count parameter of the do_generic_make_request() function); the operating-system-defined maximum I/O queue length can also be obtained (by probing the max_io_length parameter of the do_generic_make_request() function). A module-style sketch follows.
  • Then, the prefetch-amount weight acquisition unit 702 multiplies the ratio of the operating system's current I/O queue length (denoted Q_current) to the operating-system-defined maximum I/O queue length (denoted Q_max) by the first adjustable factor (denoted a) to obtain the weight by which the disk load decreases the prefetch amount, i.e., a × Q_current / Q_max.
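  • The following is a minimal sketch of probing a request-submission function with a jprobe, as the text suggests. Jprobes existed in Linux kernels before 4.15; the symbol do_generic_make_request() and its count/max_io_length parameters are taken from the patent text, so treat the exact symbol name and signature as assumptions about the target kernel.

```c
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/kprobes.h>

static unsigned long q_current, q_max;

/* Handler with the same signature as the probed function; it records the
 * queue-length arguments and must end with jprobe_return(). */
static void probe_handler(unsigned long count, unsigned long max_io_length)
{
    q_current = count;
    q_max = max_io_length;
    jprobe_return();
}

static struct jprobe queue_probe = {
    .entry = probe_handler,
    .kp = { .symbol_name = "do_generic_make_request" },
};

static int __init queue_probe_init(void)
{
    return register_jprobe(&queue_probe);
}

static void __exit queue_probe_exit(void)
{
    unregister_jprobe(&queue_probe);
}

module_init(queue_probe_init);
module_exit(queue_probe_exit);
MODULE_LICENSE("GPL");
```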
  • Specifically, the memory acquisition unit 701 can also obtain the free prefetch buffer capacity of the node where the process is located by calling the memory acquisition module, in combination with which the prefetch-amount weight acquisition unit 702 obtains the weight by which the node's free prefetch buffer capacity increases the prefetch amount.
  • Specifically, the prefetch-amount weight acquisition unit 702 multiplies the ratio of the free prefetch buffer capacity of the node where the process is located (denoted M_free) to the node's total prefetch buffer capacity (denoted M_total) by the second adjustable factor (denoted b) to obtain that weight, i.e., b × M_free / M_total.
  • It should be noted that the first adjustable factor a and the second adjustable factor b may be determined by the user according to the hardware environment and the user's own requirements, taking values in the range (0, 1]. If the user does not adjust the first adjustable factor a and the second adjustable factor b, both may take the default value 1.
  • The first adjustable factor a and the second adjustable factor b are used to adjust the relative weights, within the prefetch amount, of the node's prefetch-buffer idle ratio (i.e., M_free/M_total) and the disk load condition (i.e., Q_current/Q_max); specifically, when the first adjustable factor a is relatively large and the second adjustable factor b is relatively small, the disk load condition (Q_current/Q_max) has a relatively large influence on the prefetch amount; conversely, when a is relatively small and b is relatively large, the prefetch-buffer idle ratio of the node where the process is located (M_free/M_total) has a relatively large influence on the prefetch amount.
  • The embodiments above also apply when the NUMA system runs a virtualization environment in which each virtual machine runs a separate operating system.
  • In that case the data prefetching approach for non-uniform memory access provided by the foregoing embodiments is basically unchanged, except that, because of the virtualized I/O subsystem, the operating system running independently in each virtual machine can have its own file system and manage its own I/O queues; therefore, the I/O queue length inside one such operating system does not reflect the disk I/O load of the entire system.
  • Preferably, the I/O queue acquisition module uses a calling interface provided by the virtualization system to obtain the current I/O queue length of the entire NUMA system, rather than obtaining it from the independent operating system running in a virtual machine; if the virtualization system does not provide such a calling interface, the current NUMA system's I/O queue length is obtained from the operating system running in the virtual machine.
  • In a virtualization system, a management tool (for example, a hypervisor) uniformly coordinates memory management and communication among the virtualization systems running on the nodes, and the strategy it adopts for memory allocation and scheduling is public. A specific implementation is as follows: if the virtualization system on a node is running the prefetch software, the software can first obtain that node's I/O queue length and then, from the management tool's (hypervisor's) memory scheduling strategy, derive the current I/O queue length of the entire NUMA system (that is, the length of the I/O queues being used by the entire NUMA system).
  • It should be noted that the maximum prefetch multiplier Tscale is used to limit how much the prefetch amount may be multiplied each time and can be determined by the user according to the total prefetch buffer capacity of the node where the process is located and the characteristics of the system's main applications; that is, if the node's total prefetch buffer is large and the main applications are characterized by long sequential file reads, the maximum prefetch multiplier Tscale can be set to a larger value, so that, when conditions allow, the prefetch window can grow rapidly and the data prefetch hit rate improves.
  • In the data prefetching apparatus 07 for non-uniform memory access illustrated in FIG. 7, the maximum prefetch multiplier Tscale may take values in the range [0, 8], where the symbol "[]" denotes a closed interval; within [0, 8], the same principle is followed: the larger the node's total prefetch buffer and the more pronounced the sequential-read character of the main applications, the larger the value chosen for Tscale.
  • After the data prefetch-amount parameter factor r is obtained, the product Ssize of the previous prefetch window size Rprev_size, the maximum prefetch multiplier Tscale, and the factor r is computed; the preset maximum prefetch amount MAXreadahead is compared with Ssize, and the smaller of MAXreadahead and Ssize is used as the size of the current prefetch window for prefetching data.
  • A person of ordinary skill in the art can understand that all or part of the steps of the methods in the embodiments above may be completed by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium, and the storage medium may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data readahead method and device for non-uniform memory access (NUMA), the method comprising: obtaining a data readahead volume parameter factor r according to a parameter representing the magnetic disk load of a NUMA system and the idle readahead buffer capacity of the node that a process is in; calculating the product (Ssize) of the previous readahead window size (Rprev_size), the maximum readahead volume multiplier (Tscale), and the data readahead volume parameter factor r; and comparing the preset maximum readahead volume (MAXreadahead) with Ssize, using the smaller value of MAXreadahead and Ssize as the readahead window size to read the data ahead. The method of the present invention comprehensively considers factors affecting system performance, such as magnetic disk I/O load and the remaining cache size of the node, thus facilitating data I/O hiding and system resource saving.

Description

Data prefetching method and apparatus for non-uniform memory access

TECHNICAL FIELD

The present invention relates to the field of communications, and in particular to a data prefetching method and apparatus for non-uniform memory access.

BACKGROUND

At present, disks are still the primary storage medium of computer systems. However, as technology advances, disks face two major challenges: their input/output (I/O) bandwidth cannot keep pace with the development of the central processing unit (CPU) and memory, and the gap between disk access latency and CPU/memory read/write speeds keeps widening. Among CPU speed, disk transfer speed, and disk I/O access speed, disk I/O access speed has improved the most slowly; in particular, the gap between disk I/O access speed and CPU speed keeps growing, and disk I/O access latency has become one of the main bottlenecks constraining system I/O performance. At the operating-system level, asynchronization is a very effective I/O performance optimization strategy, and data prefetching is a common way to make I/O asynchronous.

Data prefetching means that the system performs I/O operations in advance in the background, loading the required data into memory ahead of time to hide the application's I/O latency, thereby effectively improving the utilization of the computer system. Compared with traditional serial processing, the asynchronous operation strategy provided by data prefetching eliminates CPU waiting time and lets the CPU and the disk work in parallel, improving the system's overall I/O performance. Data prefetching uses pattern matching: by monitoring the application's access sequence for each file, the system maintains a history of accesses and matches it one by one against the recognized patterns; if the behavior matches the characteristics of an access pattern, data can be predicted and prefetched accordingly. Specific implementation techniques include heuristic prefetching and informed prefetching. Heuristic prefetching is transparent to upper-layer applications: by automatically observing a program's historical access records, it analyzes the program's I/O characteristics and independently predicts and prefetches the data blocks about to be accessed. Linux kernel versions after 2.6.23 provide a heuristic-based on-demand prefetch (readahead) algorithm that works at the Virtual File System (VFS) layer, uniformly serving the various file read operations above it (through system call APIs) while remaining independent of the specific file system below. The on-demand prefetch algorithm introduces page state and page cache state and adopts loose sequentiality decision conditions, effectively supporting sequential I/O operations, including asynchronous/non-blocking I/O, multi-threaded interleaved I/O, mixed sequential/random I/O, and large-scale concurrent I/O. When an application wants to access data, it accesses a disk file via the page cache through the system call interface. On this standard file access path the kernel invokes the prefetch algorithm, tracks the application's access sequence, and performs appropriate prefetching. Specifically, the heuristic-based on-demand prefetch algorithm provided by Linux determines the application's access mode mainly by monitoring its read requests and the page cache, and then decides the position and size of the prefetch according to that mode.

The prefetching framework can be roughly divided into two parts: a monitoring part and a decision/processing part. The monitoring part is embedded in read-request handling routines such as the do_generic_file_read() function. It checks whether each page in the request is already in the file's cache address space; if not, it allocates a new page, and the application is temporarily suspended while waiting for I/O to load the page (synchronous readahead). If the offset of the new page is exactly the position pointed to by the readahead parameter async_size, the page is marked with the prefetch flag (PG_readahead). In the subsequent data prefetching process, detecting such a tagged page (PG_readahead page) means that the time for the next prefetch I/O has arrived, and the system performs asynchronous readahead. The decision part is located in the ondemand_readahead() function, which is logically composed of a set of independent decision modules that determine whether an access is an initial file read, a small-file read, a sequential read, or a random read. The on-demand prefetching framework supports both sequential and random read access modes; small-file reads are simply discarded, with no data prefetching performed.

Whether the heuristic-based on-demand prefetch algorithm provided by Linux kernels after 2.6.23 or other data prefetching techniques, all were originally designed for single-processor systems. Because a single-processor system is itself limited by factors such as processor computing power, memory capacity, and bandwidth, the corresponding data prefetch design is relatively conservative, especially the prefetch-amount management part: using the page count of the initial read request as a baseline, it adopts a doubling strategy (a multiplication factor of 2) and sets an upper-bound window.

With the ever-increasing performance demands of scientific computing and transaction processing, Symmetrical Multi-Processing (SMP) systems are applied more and more widely and at ever larger scales. A Non-Uniform Memory Access (NUMA) multiprocessor system consists of a number of independent nodes connected by a high-speed dedicated network, where each node can be a single CPU or an SMP system. As a class of distributed shared-memory architecture, NUMA combines the easy programmability of SMP systems with the high scalability of distributed storage systems, and has become one of the mainstream architectures of today's high-performance servers.

Multiprocessor systems based on the distributed shared-memory NUMA architecture differ greatly from single-processor systems in CPU access queue control, memory access control, node load balancing, and other architectural aspects; data prefetching designed for single-processor systems can no longer satisfy the multiprocessor environment of the NUMA architecture. If a Linux system is deployed on a distributed shared-memory NUMA server, the prefetch-amount management method provided by Linux does not take into account properties unique to NUMA servers, such as CPU load, per-node remaining memory, and global remaining memory, so the actual performance of this single-processor-oriented data prefetching cannot be optimal. For example, when multiple processors access files simultaneously, prefetching in the amounts designed for a single-processor system may overload the disk system. As another example, when a NUMA node has little local memory remaining, prefetching in those amounts is problematic: because the distributed memory of the NUMA architecture makes remote memory accesses slow, it is quite possible that, before the data in the node's local memory has been taken away (by the nodes accessing it remotely), the newly prefetched data further worsens the pressure on the node's remaining local memory.

SUMMARY OF THE INVENTION
Embodiments of the present invention provide a data prefetching method and apparatus for non-uniform memory access, to improve the reliability and accuracy of file prefetching under the NUMA architecture.

An embodiment of the present invention provides a data prefetching method for non-uniform memory access, the method including:

obtaining a data prefetch-amount parameter factor r according to a parameter characterizing disk load in the non-uniform memory access NUMA system and the free prefetch buffer capacity of the node where the process is located;

computing the product Ssize of the previous prefetch window size Rprev_size, the maximum prefetch multiplier Tscale, and the data prefetch-amount parameter factor r; and

comparing the preset maximum prefetch amount MAXreadahead with Ssize, and using the smaller of MAXreadahead and Ssize as the size of the current prefetch window to prefetch data.

An embodiment of the present invention provides a data prefetching apparatus for non-uniform memory access, the apparatus including:

a data prefetch-amount parameter factor acquisition module, configured to obtain the data prefetch-amount parameter factor r according to a parameter characterizing disk load in the non-uniform memory access NUMA system and the free prefetch buffer capacity of the node where the process is located;

a prefetch-amount window multiplication module, configured to compute the product Ssize of the previous prefetch window size Rprev_size, the maximum prefetch multiplier Tscale, and the data prefetch-amount parameter factor r; and

a prefetch window acquisition module, configured to compare the preset maximum prefetch amount MAXreadahead with Ssize and use the smaller of the two as the size of the current prefetch window to prefetch data.

As can be seen from the above embodiments of the present invention, once the data prefetch-amount parameter factor r has been obtained from the parameter characterizing disk load in the NUMA system and the free prefetch buffer capacity of the node where the process is located, the size of the current prefetch window can be determined from the relationship between the product Ssize (= Rprev_size × Tscale × r) and the preset maximum prefetch amount MAXreadahead, and data is finally prefetched according to the window size so determined. Because the parameter characterizing disk load in the NUMA system is related to the current operating system's input/output (I/O) queue, and the factor r is obtained from the free prefetch buffer capacity of the node where the process is located, the data prefetching method provided by the embodiments of the present invention, compared with prior-art prefetch algorithms for single processors, comprehensively considers factors affecting system performance such as disk I/O load and a node's remaining memory: when the disk I/O load is light and the node has ample remaining memory, the data prefetch amount is appropriately enlarged, which helps hide data I/O; when the disk I/O load is heavy and the node has little remaining memory, the data prefetch amount is appropriately reduced, which helps save system resources.

BRIEF DESCRIPTION OF THE DRAWINGS
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings used in describing the prior art or the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person skilled in the art can obtain other drawings from them.

FIG. 1 is a schematic flowchart of a data prefetching method for non-uniform memory access according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of the general design principle of a data prefetching algorithm;

FIG. 3 is a schematic diagram of the working layers of a data prefetching algorithm;

FIG. 4 is a schematic diagram of the prefetch window in a data prefetching algorithm;

FIG. 5 is a schematic structural diagram of a data prefetching apparatus for non-uniform memory access according to an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of a data prefetching apparatus for non-uniform memory access according to another embodiment of the present invention;

FIG. 7 is a schematic structural diagram of a data prefetching apparatus for non-uniform memory access according to another embodiment of the present invention.

DETAILED DESCRIPTION
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by persons skilled in the art based on the embodiments of the present invention shall fall within the protection scope of the present invention.
Referring to FIG. 1, which is a schematic flowchart of a data prefetch method for non-uniform memory access according to an embodiment of the present invention, the method mainly includes the following steps.
S101: Obtain a data prefetch amount parameter factor T according to a parameter characterizing the disk load in the non-uniform memory access (NUMA) system and the free prefetch buffer capacity of the node where the process is located.
It should be noted that, although a NUMA system contains multiple nodes, it runs only one operating system, so data prefetching is performed for the operating system as a whole. In this embodiment of the present invention, the parameter characterizing the disk load in the NUMA system is related to the input/output (I/O) queues of the current operating system. The "I/O queues of the current operating system" are the I/O queues that are managed by the operating system and are currently accessing the disk, that is, how many read/write queues in the NUMA system are accessing the disk at the moment.
To facilitate the description of the free prefetch buffer capacity of a node, the design principle and working level of the data prefetch algorithm are briefly introduced here, taking the Linux system as an example. After the Linux kernel finishes reading data, it keeps the recently accessed file pages cached in memory for a period of time; this memory is called the page cache. Normally, a data read (through the system's read() API) takes place between the application buffer and the page cache, as shown in FIG. 2, while the data prefetch algorithm is responsible for reading data from the disk to fill the page cache. When an application reads from the page cache into its application buffer, the read granularity is generally small; for example, the read/write granularity of a file copy command is typically 4 KByte, whereas the kernel's data prefetch fills data from the disk into the page cache at a size it considers more suitable, for example 16 KByte to 128 KByte. The working level of the data prefetch algorithm is shown in FIG. 3. The algorithm works at the VFS layer: upward, it uniformly serves the various file read operations (system call APIs); downward, it is independent of the specific file system. When an application requests file data through different system APIs such as read(), pread(), readv(), aio_read(), sendfile() and splice(), the request enters the unified read-request handling function do_generic_file_read(). This function takes data out of the page cache to satisfy the application's request and, when appropriate, invokes the readahead routine to issue the necessary readahead I/O. The readahead I/O requests issued by the readahead algorithm are preprocessed by __do_page_cache_readahead(), which checks whether each page of the request is already in the file's cache address space and, if not, allocates a new page. If the offset of a new page is exactly the position pointed to by the readahead parameter async_size, the PG_readahead flag is set on that page. Finally, all new pages are passed to read_pages(), where they are added one by one to the in-memory radix tree and the inactive list, and the readpage() of the underlying file system is called to submit the pages for I/O.
In this embodiment of the present invention, the prefetch buffer of a node is the memory that the system allocates to the node for caching the file pages recently accessed by the node's kernel, namely the page cache, and the free prefetch buffer of the node is the memory that remains after the memory occupied by already-prefetched data in the page cache is deducted. The free prefetch buffer capacity of the node is also one of the factors that affect the size of the data prefetch amount.
S102: Compute the product S_size of the previous prefetch window size R_prev_size, the maximum prefetch multiplication factor T_scale, and the data prefetch amount parameter factor T.
In the data prefetch algorithm, when a thread of a process running on a node reads a file, every time a data prefetch request is issued, the algorithm records the request in a data structure called the "prefetch window" to indicate the length of the data requested for prefetching, as shown in FIG. 4. The start and the size constitute a prefetch window, recording the position and size of the most recent prefetch request, and async_size indicates the lead distance of asynchronous prefetching. The PG_readahead page is set during the previous prefetch I/O; it indicates that the application has consumed enough of the readahead window, that the time for the next prefetch I/O has come, and that asynchronous readahead should be started to read more file pages. Therefore, the previous prefetch window size R_prev_size is easily obtained from the recorded data prefetch request.
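For illustration only, the prefetch window state described above can be sketched as the following C structure; the field names follow the text (the Linux kernel keeps equivalent state in struct file_ra_state), and the types are simplifications rather than the patented implementation:

```c
/* Sketch of the "prefetch window" record described above. */
struct prefetch_window {
    unsigned long start;      /* offset of the most recent prefetch request */
    unsigned long size;       /* length (in pages) of that request          */
    unsigned long async_size; /* lead distance: when only this many pages
                               * remain unread, the page marked PG_readahead
                               * triggers the next asynchronous prefetch    */
};
```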
It should be noted that, if the process accesses the file for the first time, no previously recorded prefetch window exists. In this case, the prefetch window size may be set larger than the data length requested by the first prefetch; for example, it may be set to twice the data length of the first request. Of course, other multiples may also be used; in principle it only needs to be larger than the data length of the first request, and the present invention imposes no particular restriction on this. In this embodiment of the present invention, the maximum prefetch multiplication factor T_scale is used to limit the multiplication factor of each prefetch and may be set by the user according to the actual situation. The relationship among the previous prefetch window size R_prev_size, the maximum prefetch multiplication factor T_scale and the data prefetch amount parameter factor T is S_size = R_prev_size × T_scale × T.
S103: Compare the set maximum prefetch amount MAX_readahead with the size of S_size, and use the smaller of MAX_readahead and S_size as the size of the current prefetch window for prefetching data.
Because of restrictions such as the prefetch buffer capacity, the size of the prefetch window cannot grow without bound; for example, it cannot grow indefinitely according to the relationship S_size = R_prev_size × T_scale × T. That is, some limit should be imposed on the size of the prefetch window. In this embodiment of the present invention, a maximum prefetch amount MAX_readahead may be set by the user. MAX_readahead is compared with the S_size (= R_prev_size × T_scale × T) computed in step S102, and finally the smaller of MAX_readahead and S_size is used as the size of the prefetch window for prefetching data.
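A short sketch of steps S102 and S103 may make the window-size computation concrete. The helper below is illustrative only: it uses plain floating point where a kernel would use scaled integers, and it folds in the first-access rule (twice the length of the first request) described earlier:

```c
/* Illustrative computation of the next prefetch window size (in pages). */
static long next_window_pages(long r_prev_size,     /* previous window, 0 if none */
                              double t_scale,       /* max multiplication factor  */
                              double t_factor,      /* parameter factor T         */
                              long max_readahead,   /* user-set upper bound       */
                              long first_req_pages) /* length of first request    */
{
    double s_size;

    if (r_prev_size == 0)            /* first access: no recorded window;   */
        return 2 * first_req_pages;  /* e.g. twice the first request length */

    s_size = (double)r_prev_size * t_scale * t_factor;   /* step S102 */
    if (s_size < 0)  /* heavy load, little free memory: an assumption, the */
        s_size = 0;  /* text does not specify the negative-T case          */
    if (s_size > (double)max_readahead)                  /* step S103 */
        return max_readahead;
    return (long)s_size;
}
```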
It can be seen from the data prefetch method for non-uniform memory access provided by the above embodiment of the present invention that, once the data prefetch amount parameter factor T has been obtained from the parameter characterizing the disk load in the NUMA system and from the free prefetch buffer capacity of the node where the process is located, the size of the current prefetch window is determined by the relationship between the set maximum prefetch amount MAX_readahead and the product S_size of the previous prefetch window size R_prev_size, the maximum prefetch multiplication factor T_scale and the factor T, and data is finally prefetched according to the determined window size. Because the parameter characterizing the disk load in the NUMA system is related to the I/O queues of the current operating system, and the factor T is obtained from the free prefetch buffer capacity of the node where the process is located, compared with prior-art data prefetch algorithms for single processors, the data prefetch method provided by this embodiment of the present invention comprehensively considers factors that affect system performance, such as the disk I/O load and the remaining memory of the node. That is, when the disk I/O load is light and the node has ample free memory, the data prefetch amount is enlarged appropriately, which helps hide data I/O; when the disk I/O load is heavy and the node has little free memory, the data prefetch amount is reduced appropriately, which saves system resources.
In an embodiment provided by the present invention, obtaining the data prefetch amount parameter factor T from the parameter characterizing the disk load in the NUMA system and the free prefetch buffer capacity of the node where the process is located may be implemented as follows.
First, according to the parameter characterizing the disk load in the NUMA system and the free prefetch buffer capacity of the node where the process is located, obtain the weight of the disk load on the growth of the prefetch amount and the weight of the prefetch buffer capacity of the node where the process is located on the growth of the prefetch amount; then compute the difference between these two weights (the weight contributed by the node's free prefetch buffer capacity minus the weight contributed by the disk load), and this difference is the data prefetch amount parameter factor T.
In the above embodiment, the weight of the disk load on the growth of the prefetch amount is obtained from the parameter characterizing the disk load in the NUMA system by calling an I/O queue acquisition module to obtain the length of the current I/O queue of the operating system. Specifically, the jprobe technique may be used to probe the do_generic_make_request() function; from this function, the length of the current I/O queue of the system, that is, the length of the operating system I/O queue in use, is obtained (through the parameter count of the probed do_generic_make_request() function), and the maximum I/O queue length allowed by the operating system can also be obtained (through the parameter max_io_length of the probed do_generic_make_request() function). Then, the ratio of the length of the current operating system I/O queue (denoted Q_current) to the maximum I/O queue length allowed by the operating system (denoted Q_max) is multiplied by a first adjustable factor (denoted a) to obtain the weight of the disk load on the growth of the prefetch amount; that is, this weight is a × Q_current/Q_max. Similarly, the weight of the prefetch buffer capacity of the node where the process is located on the growth of the prefetch amount is obtained from the free prefetch buffer capacity of that node by calling a memory acquisition module. Specifically, the ratio of the free prefetch buffer capacity of the node where the process is located (denoted M_free) to the total prefetch buffer capacity of that node (denoted M_total) is multiplied by a second adjustable factor (denoted b) to obtain the weight of the node's prefetch buffer capacity on the growth of the prefetch amount; that is, this weight is b × M_free/M_total.
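As an illustration of this probing step, a minimal kernel-module sketch is given below. It assumes the jprobe interface of older Linux kernels (removed in later versions) and, as the text states, that the probed do_generic_make_request() function exposes count and max_io_length parameters; the handler signature is therefore an assumption rather than the patented code:

```c
#include <linux/module.h>
#include <linux/kprobes.h>

static long q_current; /* operating-system I/O queue length in use   */
static long q_max;     /* maximum I/O queue length allowed by the OS */

/* A jprobe handler mirrors the probed function's parameter list and must
 * end with jprobe_return(). The (count, max_io_length) list follows the
 * text and is an assumption about the probed function. */
static void probe_handler(int count, int max_io_length)
{
    q_current = count;
    q_max = max_io_length;
    jprobe_return();
}

static struct jprobe io_queue_probe = {
    .entry = (void *)probe_handler,
    .kp = { .symbol_name = "do_generic_make_request" },
};

static int __init io_probe_init(void)
{
    return register_jprobe(&io_queue_probe);
}

static void __exit io_probe_exit(void)
{
    unregister_jprobe(&io_queue_probe);
}

module_init(io_probe_init);
module_exit(io_probe_exit);
MODULE_LICENSE("GPL");
```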
At this time, the data prefetch amount parameter factor T is the difference between the two weights, namely

T = b × (M_free / M_total) − a × (Q_current / Q_max),

so that T grows as the free ratio of the node's prefetch buffer grows and shrinks as the disk load grows.
It should be noted that, in this embodiment of the present invention, the first adjustable factor a and the second adjustable factor b may be determined by the user according to the hardware environment and the user's own needs, and their value range is (0, 1]. If the user does not adjust the first adjustable factor a and the second adjustable factor b, both may take the default value 1. As the expression of the data prefetch amount parameter factor T shows, the first adjustable factor a and the second adjustable factor b adjust the relative influence of the disk load (that is, Q_current/Q_max) and of the occupancy of the node's total prefetch buffer (that is, M_free/M_total) on the prefetch amount. Specifically, when the second adjustable factor b is relatively large and the first adjustable factor a is relatively small, the free ratio of the prefetch buffer of the node where the process is located (that is, M_free/M_total) has a relatively large influence on the prefetch amount; conversely, when the first adjustable factor a is relatively large and the second adjustable factor b is relatively small, the disk load (that is, Q_current/Q_max) has a relatively large influence on the prefetch amount.
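Putting the two weights together, the factor T might be computed as in the following sketch; the floating-point helper and its names are illustrative assumptions, not the patented implementation:

```c
/* T = b*(M_free/M_total) - a*(Q_current/Q_max): the free-buffer weight minus
 * the disk-load weight, with a and b in (0, 1] (default 1). Light disk load
 * and an ample free buffer give a large T; heavy load and a full buffer give
 * a small or negative T, shrinking the next prefetch window. */
static double prefetch_factor(long q_current, long q_max,
                              long m_free, long m_total,
                              double a, double b)
{
    double load_weight = a * (double)q_current / (double)q_max;  /* disk load  */
    double mem_weight  = b * (double)m_free   / (double)m_total; /* free buffer */

    return mem_weight - load_weight;
}
```

With the defaults a = b = 1, T ranges over [-1, 1], so the product S_size can both grow and shrink relative to the previous window.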
To use resources efficiently, a server system (which may be one node of a NUMA system) often runs multiple virtual machines, each running an independent operating system. In a virtualized system, the data prefetch method for non-uniform memory access provided by the foregoing embodiment remains basically unchanged. The difference is that, because of the virtualized I/O subsystem, the independent operating system running in each virtual machine may own an independent file system and manage its I/O queues independently; therefore, the I/O queue length inside one operating system cannot reflect the disk I/O load of the whole system. In this case, if the virtualization system provides a call interface for obtaining the I/O queue length of the whole NUMA system, the I/O queue acquisition module uses that interface to obtain the current I/O queue length of the whole NUMA system, instead of obtaining it from the independent operating system running in a virtual machine; if the virtualization system does not provide such an interface, the current I/O queue length of the whole NUMA system is derived from the operating system running in the virtual machine. Specifically, when the virtualization system provides no call interface, a virtualization system management tool (for example, a hypervisor) may be consulted. Management tools such as a hypervisor coordinate the virtualized systems running on the nodes in terms of memory management, communication, and so on, and the policies they use for memory allocation and scheduling are public. The concrete implementation is as follows: if the virtualized system on some node is running the prefetch software, the I/O queue length of that node can be obtained first, and the current I/O queue length of the whole NUMA system (that is, the length of the I/O queues being used by the whole NUMA system) is then deduced from the memory scheduling policy of the management tool (hypervisor).
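For illustration, the selection between these two sources of the system-wide queue length might look like the following sketch; all three helper functions are hypothetical placeholders for the call interface, the per-guest query, and the policy-based extrapolation described above (no real hypervisor API is implied):

```c
/* Hypothetical interfaces (assumptions, see text): */
long hv_query_io_queue_len(long *len);     /* 0 on success, if provided        */
long guest_io_queue_len(void);             /* queue length seen by this guest  */
long hv_extrapolate_queue_len(long guest); /* policy-based system-wide estimate */

/* Obtain the I/O queue length of the whole NUMA system in a
 * virtualized environment. */
static long system_io_queue_len(void)
{
    long len;

    if (hv_query_io_queue_len(&len) == 0)  /* call interface provided */
        return len;

    /* No interface: extrapolate from this guest's queue length using the
     * management tool's published scheduling policy. */
    return hv_extrapolate_queue_len(guest_io_queue_len());
}
```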
It should further be noted that the maximum prefetch multiplication factor T_scale may be determined by the user according to the total prefetch buffer capacity of the node where the process is located and the characteristics of the main applications of the system. That is, if the total prefetch buffer capacity allocated to the node where the process is located is large and the main applications are characterized by continuous sequential file reads, the maximum prefetch multiplication factor T_scale can be set to a larger value, so that the prefetch window can grow quickly when conditions allow, improving the data prefetch hit rate.
As an embodiment of the present invention, the value range of the maximum prefetch multiplication factor T_scale may be [0, 8], where the symbol "[ ]" denotes a closed interval; within [0, 8], the same principle applies that the larger the prefetch buffer capacity of the node, the larger the value chosen for the maximum prefetch multiplication factor T_scale.
In summary of the above embodiments provided by the present invention, compared with the data prefetch algorithms provided by the prior art for single processors, the data prefetch method for non-uniform memory access provided by the present invention can at least bring about the following effects.
First, the present invention does not change the basic framework of the existing file prefetching of the Linux kernel; instead, it proposes a new prefetch amount management strategy on top of it. It is a reinforcement of the traditional data prefetch algorithm and an optimization of prefetch amount management in a dedicated environment, and does not affect the stability of the system.
Second, the present invention comprehensively considers the multiple architectural characteristics of a NUMA system that affect the file prefetch effect, such as disk load and memory management, solves the mismatch between the Linux kernel data prefetch algorithm and such systems, and improves the reliability and accuracy of data prefetching.
Third, the data prefetch amount parameter factor T of an active prediction algorithm is proposed. Based on the positive correlation between the data prefetch amount and the free prefetch buffer capacity of the node where the process is located, and on the inverse correlation with the disk load of the NUMA system, each determined data prefetch amount is jointly determined by the current disk load of the NUMA system, the free prefetch buffer capacity of the node where the process is located, and the global memory size, instead of being simply multiplied by some fixed coefficient (for example, 2 or 4). This realizes dynamic determination of the size of the data prefetch amount parameter factor T and scientific, effective management of the prefetch window size.
Fourth, the prefetch data size and lead distance are determined dynamically and adaptively, ensuring that even if a program terminates its sequential or reverse-order access at any moment, the prefetch hit rate remains at an acceptable level.
Referring to FIG. 5, which is a schematic structural diagram of a data prefetch apparatus 05 for non-uniform memory access according to an embodiment of the present invention. For ease of description, only the parts related to this embodiment of the present invention are shown. The data prefetch apparatus 05 for non-uniform memory access illustrated in FIG. 5 includes a data prefetch amount parameter factor acquisition module 501, a prefetch window multiplication module 502 and a prefetch window acquisition module 503, where:
the data prefetch amount parameter factor acquisition module 501 is configured to obtain the data prefetch amount parameter factor T according to the parameter characterizing the disk load in the non-uniform memory access NUMA system and the free prefetch buffer capacity of the node where the process is located, where the parameter characterizing the disk load in the NUMA system is related to the input/output (I/O) queues of the current operating system.
It should be noted that, although a NUMA system contains multiple nodes, it runs only one operating system, so data prefetching is performed for the operating system as a whole. In the embodiment shown in FIG. 5, the parameter characterizing the disk load in the NUMA system is related to the I/O queues of the current operating system. The "I/O queues of the current operating system" are the I/O queues that are managed by the operating system and are currently accessing the disk, that is, how many read/write queues in the NUMA system are accessing the disk at the moment.
To facilitate the description of the free prefetch buffer capacity of a node, the design principle and working level of the data prefetch algorithm are again briefly introduced here, taking the Linux system as an example. After the Linux kernel finishes reading data, it keeps the recently accessed file pages cached in memory for a period of time; this memory is called the page cache. Normally, a data read (through the system's read() API) takes place between the application buffer and the page cache, as shown in FIG. 2, while the data prefetch algorithm is responsible for reading data from the disk to fill the page cache. When an application reads from the page cache into its application buffer, the read granularity is generally small; for example, the read/write granularity of a file copy command is typically 4 KByte, whereas the kernel's data prefetch fills data from the disk into the page cache at a size it considers more suitable, for example 16 KByte to 128 KByte. The working level of the data prefetch algorithm is shown in FIG. 3. The algorithm works at the VFS layer: upward, it uniformly serves the various file read operations (system call APIs); downward, it is independent of the specific file system. When an application requests file data through different system APIs such as read(), pread(), readv(), aio_read(), sendfile() and splice(), the request enters the unified read-request handling function do_generic_file_read(). This function takes data out of the page cache to satisfy the application's request and, when appropriate, invokes the readahead routine to issue the necessary readahead I/O. The readahead I/O requests issued by the readahead algorithm are preprocessed by __do_page_cache_readahead(), which checks whether each page of the request is already in the file's cache address space and, if not, allocates a new page. If the offset of a new page is exactly the position pointed to by the readahead parameter async_size, the PG_readahead flag is set on that page. Finally, all new pages are passed to read_pages(), where they are added one by one to the in-memory radix tree and the inactive list, and the readpage() of the underlying file system is called to submit the pages for I/O.
In the embodiment shown in FIG. 5, the prefetch buffer of a node is the memory that the system allocates to the node for caching the file pages recently accessed by the node's kernel, namely the page cache, and the free prefetch buffer of the node is the memory remaining after the memory occupied by already-prefetched data in the page cache is deducted. The free prefetch buffer capacity of the node is also one of the factors that affect the size of the data prefetch amount. The prefetch window multiplication module 502 is configured to compute the product S_size of the previous prefetch window size R_prev_size, the maximum prefetch multiplication factor T_scale, and the data prefetch amount parameter factor T obtained by the data prefetch amount parameter factor acquisition module 501.
In the data prefetch algorithm, when a thread of a process running on a node reads a file, every time a data prefetch request is issued, the algorithm records the request in a data structure called the "prefetch window" to indicate the length of the data requested for prefetching, as shown in FIG. 4. The start and the size constitute a prefetch window, recording the position and size of the most recent prefetch request, and async_size indicates the lead distance of asynchronous prefetching. The PG_readahead page is set during the previous prefetch I/O; it indicates that the application has consumed enough of the readahead window, that the time for the next prefetch I/O has come, and that asynchronous readahead should be started to read more file pages. Therefore, the previous prefetch window size R_prev_size is easily obtained from the recorded data prefetch request.
It should be noted that, if the process accesses the file for the first time, no previously recorded prefetch window exists. In this case, the prefetch window size may be set larger than the data length requested by the first prefetch; for example, it may be set to twice the data length of the first request. Of course, other multiples may also be used; in principle it only needs to be larger than the data length of the first request, and the present invention imposes no particular restriction on this.
In the embodiment shown in FIG. 5, the maximum prefetch multiplication factor T_scale is used to limit the multiplication factor of each prefetch and may be set by the user according to the actual situation. The relationship among the previous prefetch window size R_prev_size, the maximum prefetch multiplication factor T_scale and the data prefetch amount parameter factor T is S_size = R_prev_size × T_scale × T.
The prefetch window acquisition module 503 is configured to compare the set maximum prefetch amount MAX_readahead with the size of S_size obtained by the prefetch window multiplication module 502, and to use the smaller of MAX_readahead and S_size as the size of the current prefetch window for prefetching data.
Because of restrictions such as the prefetch buffer capacity, the size of the prefetch window cannot grow without bound; for example, it cannot grow indefinitely according to the relationship S_size = R_prev_size × T_scale × T. That is, some limit should be imposed on the size of the prefetch window. In the embodiment shown in FIG. 5, a maximum prefetch amount MAX_readahead may be set by the user. MAX_readahead is compared with the S_size (= R_prev_size × T_scale × T) obtained by the prefetch window multiplication module 502, and finally the smaller of MAX_readahead and S_size is used as the size of the current prefetch window for prefetching data.
It can be seen from the data prefetch apparatus 05 for non-uniform memory access provided by the embodiment shown in FIG. 5 that, after the data prefetch amount parameter factor acquisition module 501 obtains the data prefetch amount parameter factor T from the parameter characterizing the disk load in the NUMA system and the free prefetch buffer capacity of the node where the process is located, the prefetch window acquisition module 503 can determine the size of the prefetch window from the relationship between the set maximum prefetch amount MAX_readahead and the product S_size of the previous prefetch window size R_prev_size, the maximum prefetch multiplication factor T_scale and the factor T, and finally prefetch data according to the determined window size. Because the parameter characterizing the disk load in the NUMA system is related to the I/O queues of the current operating system, and the factor T is obtained from the free prefetch buffer capacity of the node where the process is located, compared with prior-art data prefetch algorithms for single processors, the data prefetch apparatus provided by this embodiment of the present invention comprehensively considers factors that affect system performance, such as the disk I/O load and the remaining memory of the node. That is, when the disk I/O load is light and the node has ample free memory, the data prefetch amount is enlarged appropriately, which helps hide data I/O; when the disk I/O load is heavy and the node has little free memory, the data prefetch amount is reduced appropriately, which saves system resources.
It should be noted that, in the above implementation of the data prefetch apparatus for non-uniform memory access, the division into functional modules is merely an example. In practical applications, the above functions may be allocated to different functional modules as needed, for example according to the configuration requirements of the corresponding hardware or the convenience of software implementation; that is, the internal structure of the data prefetch apparatus for non-uniform memory access may be divided into different functional modules to complete all or part of the functions described above. Moreover, in practical applications, the corresponding functional modules in this embodiment may be implemented by corresponding hardware, or by corresponding hardware executing corresponding software. For example, the aforementioned data prefetch amount parameter factor acquisition module may be hardware that obtains the data prefetch amount parameter factor T from the parameter characterizing the disk load in the non-uniform memory access NUMA system and the free prefetch buffer capacity of the node where the process is located, for example a data prefetch amount parameter factor acquirer, or a general-purpose processor or other hardware device capable of executing a corresponding computer program to complete this function. Likewise, the aforementioned prefetch window multiplication module may be hardware that computes the product S_size of the previous prefetch window size R_prev_size, the maximum prefetch multiplication factor T_scale and the data prefetch amount parameter factor T obtained by the data prefetch amount parameter factor acquisition module (or data prefetch amount parameter factor acquirer), for example a prefetch window multiplier, or a general-purpose processor or other hardware device capable of executing a corresponding computer program to complete this function. (The above principles of description apply to every embodiment provided in this specification.)
The data prefetch amount parameter factor acquisition module 501 illustrated in FIG. 5 further includes a weight acquisition submodule 601 and a difference submodule 602, as in the data prefetch apparatus 06 for non-uniform memory access illustrated in FIG. 6, where:
the weight acquisition submodule 601 is configured to obtain, according to the parameter characterizing the disk load in the non-uniform memory access NUMA system and the free prefetch buffer capacity of the node where the process is located, the weight of the disk load on the growth of the prefetch amount and the weight of the prefetch buffer capacity of the node where the process is located on the growth of the prefetch amount; and
the difference submodule 602 is configured to compute the difference between the two weights (the weight contributed by the node's prefetch buffer capacity minus the weight contributed by the disk load) to obtain the data prefetch amount parameter factor T.
The weight acquisition submodule 601 illustrated in FIG. 6 further includes a memory acquisition unit 701 and a prefetch weight acquisition unit 702, as in the data prefetch apparatus 07 for non-uniform memory access illustrated in FIG. 7, where:
the memory acquisition unit 701 is configured to call the I/O queue acquisition module and the memory acquisition module to obtain, respectively, the length of the current I/O queue of the operating system and the free prefetch buffer capacity of the node where the process is located; and the prefetch weight acquisition unit 702 is configured to multiply the ratio of the length of the current I/O queue of the operating system to the maximum I/O queue length allowed by the operating system by the first adjustable factor to obtain the weight of the disk load on the growth of the prefetch amount, and to multiply the ratio of the free prefetch buffer capacity of the node where the process is located to the total prefetch buffer capacity of that node by the second adjustable factor to obtain the weight of the node's prefetch buffer capacity on the growth of the prefetch amount, where the ratio of the length of the current I/O queue of the operating system to the maximum I/O queue length allowed by the operating system is the parameter characterizing the disk load in the non-uniform memory access NUMA system.
In the data prefetch apparatus 07 for non-uniform memory access illustrated in FIG. 7, the memory acquisition unit 701 may obtain the length of the current I/O queue of the operating system by calling the I/O queue acquisition module and, together with the prefetch weight acquisition unit 702, obtain the weight of the disk load on the growth of the prefetch amount. Specifically, the jprobe technique may be used to probe the do_generic_make_request() function to obtain the length of the current I/O queue of the system, that is, the operating system I/O queue length in use (through the parameter count of the probed do_generic_make_request() function), and the maximum I/O queue length allowed by the operating system (through the parameter max_io_length of the probed do_generic_make_request() function). Then, the prefetch weight acquisition unit 702 multiplies the ratio of the length of the current I/O queue of the operating system (denoted Q_current) to the maximum I/O queue length allowed by the operating system (denoted Q_max) by the first adjustable factor (denoted a) to obtain the weight of the disk load on the growth of the prefetch amount, that is, a × Q_current/Q_max.
In the data prefetch apparatus 07 for non-uniform memory access illustrated in FIG. 7, the memory acquisition unit 701 may also obtain the free prefetch buffer capacity of the node where the process is located by calling the memory acquisition module and, together with the prefetch weight acquisition unit 702, obtain the weight of the node's prefetch buffer capacity on the growth of the prefetch amount. Specifically, the prefetch weight acquisition unit 702 multiplies the ratio of the free prefetch buffer capacity of the node where the process is located (denoted M_free) to the total prefetch buffer capacity of that node (denoted M_total) by the second adjustable factor (denoted b) to obtain the weight of the node's prefetch buffer capacity on the growth of the prefetch amount, that is, b × M_free/M_total.
At this time, the data prefetch amount parameter factor T is the difference between the two weights, namely

T = b × (M_free / M_total) − a × (Q_current / Q_max).
It should be noted that, in the data prefetch apparatus 06 or 07 for non-uniform memory access illustrated in FIG. 6 or FIG. 7, the first adjustable factor a and the second adjustable factor b may be determined by the user according to the hardware environment and the user's own needs, and their value range is (0, 1]. If the user does not adjust the first adjustable factor a and the second adjustable factor b, both may take the default value 1. As the expression of the data prefetch amount parameter factor T shows, the first adjustable factor a and the second adjustable factor b adjust the relative influence of the disk load (that is, Q_current/Q_max) and of the occupancy of the total prefetch buffer of the node where the process is located (that is, M_free/M_total) on the prefetch amount. Specifically, when the second adjustable factor b is relatively large and the first adjustable factor a is relatively small, the free ratio of the prefetch buffer of the node where the process is located (that is, M_free/M_total) has a relatively large influence on the prefetch amount; conversely, when the first adjustable factor a is relatively large and the second adjustable factor b is relatively small, the disk load (that is, Q_current/Q_max) has a relatively large influence on the prefetch amount.
To use resources efficiently, a server system (which may be one node of a NUMA system) often runs multiple virtual machines, each running an independent operating system. In a virtualized system, the data prefetch method for non-uniform memory access provided by the foregoing embodiments remains basically unchanged. The difference is that, because of the virtualized I/O subsystem, the independent operating system running in each virtual machine may own an independent file system and manage its I/O queues independently, so the I/O queue length inside one operating system cannot reflect the disk I/O load of the whole system. In this case, if the virtualization system provides a call interface for obtaining the I/O queue length of the whole NUMA system, the I/O queue acquisition module uses that interface to obtain the current I/O queue length of the whole NUMA system instead of obtaining it from the independent operating system running in a virtual machine; if the virtualization system does not provide such an interface, the current I/O queue length of the whole NUMA system is derived from the operating system running in the virtual machine. Specifically, when the virtualization system provides no call interface, a virtualization management tool (for example, a hypervisor) may be consulted. Management tools such as a hypervisor coordinate the virtualized systems running on the nodes in terms of memory management, communication, and so on, and the policies they use for memory allocation and scheduling are public. The concrete implementation is as follows: if the virtualized system on some node is running the prefetch software, the I/O queue length of that node can be obtained first, and the current I/O queue length of the whole NUMA system (that is, the length of the I/O queues being used by the whole NUMA system) is then deduced from the memory scheduling policy of the management tool (hypervisor).

It should further be noted that the maximum prefetch multiplication factor T_scale, which limits the multiplication factor of each prefetch, may be determined by the user according to the total prefetch buffer capacity of the node where the process is located and the characteristics of the main applications of the system. That is, if the total prefetch buffer capacity allocated to the node where the process is located is large and the main applications are characterized by continuous sequential file reads, the maximum prefetch multiplication factor T_scale can be set to a larger value, so that the prefetch window can grow quickly when conditions allow, improving the data prefetch hit rate.

As an embodiment of the data prefetch apparatus 07 for non-uniform memory access illustrated in FIG. 7, the value range of the maximum prefetch multiplication factor T_scale may be [0, 8], where the symbol "[ ]" denotes a closed interval; within [0, 8], the same principle applies that the larger the prefetch buffer capacity of the node, the larger the value chosen for the maximum prefetch multiplication factor T_scale.
It should be noted that the information interaction and execution processes between the modules/units of the above apparatus are based on the same idea as the method embodiments of the present invention, and their technical effects are the same as those of the method embodiments. For details, refer to the description in the method embodiments of the present invention, which is not repeated here.
Persons of ordinary skill in the art can understand that all or part of the steps of the methods in the above embodiments may be completed by a program instructing the relevant hardware, for example one or more or all of the following:
obtaining the data prefetch amount parameter factor T according to the parameter characterizing the disk load in the non-uniform memory access NUMA system and the free prefetch buffer capacity of the node where the process is located; computing the product S_size of the previous prefetch window size R_prev_size, the maximum prefetch multiplication factor T_scale and the data prefetch amount parameter factor T; and comparing the set maximum prefetch amount MAX_readahead with the size of S_size, and using the smaller of MAX_readahead and S_size as the size of the current prefetch window for prefetching data. The program may be stored in a computer-readable storage medium, and the storage medium may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
The data prefetch method and apparatus for non-uniform memory access provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is only intended to help understand the method of the present invention and its core idea. Meanwhile, persons of ordinary skill in the art may, based on the idea of the present invention, make changes to the specific implementations and the application scope. In conclusion, the content of this specification should not be construed as a limitation on the present invention.

Claims

1. A data prefetch method for non-uniform memory access, wherein the method comprises:
obtaining a data prefetch amount parameter factor T according to a parameter characterizing the disk load in the non-uniform memory access NUMA system and the free prefetch buffer capacity of the node where the process is located;

computing the product S_size of the previous prefetch window size R_prev_size, the maximum prefetch multiplication factor T_scale and the data prefetch amount parameter factor T; and

comparing the set maximum prefetch amount MAX_readahead with the size of S_size, and using the smaller of MAX_readahead and S_size as the size of the current prefetch window for prefetching data.
2. The method according to claim 1, wherein the obtaining of the data prefetch amount parameter factor T according to the parameter characterizing the disk load in the non-uniform memory access NUMA system and the free prefetch buffer capacity of the node where the process is located comprises:
obtaining, according to the parameter characterizing the disk load in the non-uniform memory access NUMA system and the free prefetch buffer capacity of the node where the process is located, the weight of the disk load on the growth of the prefetch amount and the weight of the prefetch buffer capacity of the node where the process is located on the growth of the prefetch amount; and
computing the difference between the two weights (the weight contributed by the node's prefetch buffer capacity minus the weight contributed by the disk load) to obtain the data prefetch amount parameter factor T.
3. The method according to claim 2, wherein the obtaining of the weight of the disk load on the growth of the prefetch amount and the weight of the prefetch buffer capacity of the node where the process is located on the growth of the prefetch amount according to the parameter characterizing the disk load in the non-uniform memory access NUMA system and the free prefetch buffer capacity of the node where the process is located comprises:
calling an input/output (I/O) queue acquisition module and a memory acquisition module to obtain, respectively, the length of the current I/O queue of the operating system and the free prefetch buffer capacity of the node where the process is located; and
将所述操作系统当前 I/O队列的长度与操作系统限定的最大 I/O队列长 度的比值乘以第一可调因子以获取磁盘负载对预取量增长的权重, 将所述 进程所在节点的空闲预取缓冲区容量与所述线程所在节点总预取缓冲区容 量的比值乘以第二可调因子以获取所述线程所在节点的预取缓冲区容量对 预取量增长的权重,所述操作系统当前 I/O队列的长度与操作系统限定的最 大 I/O队列长度的比值为表征非一致性内存访问 NUMA系统中磁盘负载的 参数。 Multiplying a ratio of a length of a current I/O queue of the operating system to a maximum I/O queue length defined by an operating system by a first adjustable factor to obtain a weight of a disk load to a prefetch amount, The ratio of the capacity of the idle prefetch buffer of the node where the process is located to the total prefetch buffer capacity of the node where the thread is located is multiplied by the second adjustable factor to obtain the prefetch buffer capacity of the node where the thread is located, which increases the prefetch amount. Weight, the ratio of the length of the current I/O queue of the operating system to the maximum I/O queue length defined by the operating system is a parameter characterizing the disk load in the non-uniform memory access NUMA system.
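A sketch of the two weights and their difference from claims 2-3; the factor names alpha and beta are placeholders for the first and second adjustable factors, whose values the claims leave open, and the subtraction order follows the literal wording of claim 2:

```c
/* gamma = w_disk - w_buf, per the difference step of claim 2.
 * io_len, io_max      : current and OS-defined maximum I/O queue lengths (claim 3)
 * buf_free, buf_total : free and total prefetch buffer on the process's node
 * alpha, beta         : the first and second adjustable factors */
static double prefetch_gamma(unsigned io_len, unsigned io_max,
                             size_t buf_free, size_t buf_total,
                             double alpha, double beta)
{
    double w_disk = alpha * ((double)io_len / (double)io_max);     /* disk-load weight */
    double w_buf  = beta * ((double)buf_free / (double)buf_total); /* buffer weight */

    return w_disk - w_buf;
}
```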
4. The method according to claim 1, wherein the maximum prefetch multiplication factor M_scale takes values in the range [0, 8], where the symbol "[ ]" denotes a closed interval.
5. The method according to claim 1, wherein the larger the free prefetch buffer capacity, the larger the value of the maximum prefetch multiplication factor M_scale.
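Claims 4-5 only require M_scale to lie in [0, 8] and to grow with the free buffer capacity; the linear mapping below is one assumption that satisfies both, not a formula given in the patent:

```c
/* Hypothetical: scale the [0, 8] range of claim 4 linearly with the
 * free-buffer fraction, which also gives the monotonicity of claim 5. */
static double m_scale_from_buffer(size_t buf_free, size_t buf_total)
{
    if (buf_total == 0)
        return 0.0;
    return 8.0 * (double)buf_free / (double)buf_total;
}
```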
6. A data prefetching apparatus for non-uniform memory access, wherein the apparatus comprises:
a data prefetch amount parameter factor acquisition module, configured to obtain the data prefetch amount parameter factor γ according to a parameter characterizing the disk load in the non-uniform memory access (NUMA) system and the free prefetch buffer capacity of the node where the process is located;
a prefetch window multiplication module, configured to obtain the product S_read of the size R_size of the previous prefetch window, the maximum prefetch multiplication factor M_scale, and the data prefetch amount parameter factor γ;
a prefetch window acquisition module, configured to compare a set maximum prefetch amount MAX_readahead with the size of S_read, and to prefetch data with the smaller of MAX_readahead and S_read as the size of the current prefetch window.
7. The apparatus according to claim 6, wherein the data prefetch amount parameter factor acquisition module comprises:
a weight acquisition submodule, configured to obtain, according to the parameter characterizing the disk load in the NUMA system and the free prefetch buffer capacity of the node where the process is located, the weight of the disk load on the growth of the prefetch amount and the weight of the prefetch buffer capacity of the node where the process is located on the growth of the prefetch amount;
a difference submodule, configured to obtain the difference between the weight of the disk load on the growth of the prefetch amount and the weight of the prefetch buffer capacity of the node where the thread is located on the growth of the prefetch amount, to obtain the data prefetch amount parameter factor γ.
8. The apparatus according to claim 7, wherein the weight acquisition submodule comprises:
a memory acquisition unit, configured to invoke an input/output (I/O) queue acquisition module and a memory acquisition module to obtain, respectively, the length of the current I/O queue of the operating system and the free prefetch buffer capacity of the node where the thread is located;
a prefetch weight acquisition unit, configured to multiply the ratio of the length of the current I/O queue of the operating system to the maximum I/O queue length defined by the operating system by a first adjustable factor to obtain the weight of the disk load on the growth of the prefetch amount, and to multiply the ratio of the free prefetch buffer capacity of the node where the thread is located to the total prefetch buffer capacity of the node where the thread is located by a second adjustable factor to obtain the weight of the prefetch buffer capacity of the node where the thread is located on the growth of the prefetch amount, wherein the ratio of the length of the current I/O queue of the operating system to the maximum I/O queue length defined by the operating system is the parameter characterizing the disk load in the NUMA system.
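One way to read the apparatus of claims 6-8 is as a table of per-module operations; the struct below is an illustrative decomposition, with hypothetical names, not an interface defined by the patent:

```c
#include <stddef.h>

/* Modules of claims 6-8 expressed as function pointers. */
struct readahead_device {
    /* parameter factor acquisition module (claim 6), with the weight
     * acquisition and difference submodules of claims 7-8 folded in */
    double (*get_gamma)(unsigned io_len, unsigned io_max,
                        size_t buf_free, size_t buf_total,
                        double alpha, double beta);

    /* prefetch window multiplication module: S_read = R_size * M_scale * gamma */
    double (*multiply_window)(size_t r_size, double m_scale, double gamma);

    /* prefetch window acquisition module: min(MAX_readahead, S_read) */
    size_t (*pick_window)(double s_read, size_t max_readahead);
};
```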
9. The apparatus according to claim 6, wherein the maximum prefetch multiplication factor M_scale takes values in the range [0, 8], where the symbol "[ ]" denotes a closed interval.
10. The apparatus according to claim 6 or 9, wherein the larger the free prefetch buffer capacity, the larger the value of the maximum prefetch multiplication factor M_scale.
PCT/CN2012/082202 2011-09-27 2012-09-27 Data readahead method and device for non-uniform memory access WO2013044829A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201110296544.3 2011-09-27
CN201110296544.3A CN102508638B (en) 2011-09-27 2011-09-27 Data pre-fetching method and device for non-uniform memory access

Publications (1)

Publication Number Publication Date
WO2013044829A1

Family

ID=46220732

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/082202 WO2013044829A1 (en) 2011-09-27 2012-09-27 Data readahead method and device for non-uniform memory access

Country Status (2)

Country Link
CN (1) CN102508638B (en)
WO (1) WO2013044829A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104657198A (en) * 2015-01-24 2015-05-27 深圳职业技术学院 Memory access optimization method and memory access optimization system for NUMA (Non-Uniform Memory Access) architecture system in virtual machine environment

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102508638B (en) * 2011-09-27 2014-09-17 华为技术有限公司 Data pre-fetching method and device for non-uniform memory access
CN103577158B (en) * 2012-07-18 2017-03-01 阿里巴巴集团控股有限公司 Data processing method and device
US9665491B2 (en) * 2014-07-17 2017-05-30 Samsung Electronics Co., Ltd. Adaptive mechanism to tune the degree of pre-fetches streams
KR20170014496A (en) * 2015-07-30 2017-02-08 에스케이하이닉스 주식회사 Memory system and operation method for the same
CN107203480B (en) * 2016-03-17 2020-11-17 华为技术有限公司 Data prefetching method and device
WO2018032519A1 (en) * 2016-08-19 2018-02-22 华为技术有限公司 Resource allocation method and device, and numa system
CN106844740B (en) * 2017-02-14 2020-12-29 华南师范大学 Data pre-reading method based on memory object cache system
CN108877199A (en) 2017-05-15 2018-11-23 华为技术有限公司 Control method, equipment and the car networking system of fleet
CN109471671B (en) * 2017-09-06 2023-03-24 武汉斗鱼网络科技有限公司 Program cold starting method and system
CN110019086B (en) * 2017-11-06 2024-02-13 中兴通讯股份有限公司 Multi-copy reading method, device and storage medium based on distributed file system
CN112445725A (en) * 2019-08-27 2021-03-05 华为技术有限公司 Method and device for pre-reading file page and terminal equipment
CN110865947B (en) * 2019-11-14 2022-02-08 中国人民解放军国防科技大学 Cache management method for prefetching data
CN113128531B (en) * 2019-12-30 2024-03-26 上海商汤智能科技有限公司 Data processing method and device
CN112380017B (en) * 2020-11-30 2024-04-09 成都虚谷伟业科技有限公司 Memory management system based on loose memory release
CN112558866B (en) * 2020-12-03 2022-12-09 Oppo(重庆)智能科技有限公司 Data pre-reading method, mobile terminal and computer readable storage medium
CN112748989A (en) * 2021-01-29 2021-05-04 上海交通大学 Virtual machine memory management method, system, terminal and medium based on remote memory
CN114238417A (en) * 2021-12-27 2022-03-25 四川启睿克科技有限公司 Data caching method
CN116795877B (en) * 2023-08-23 2023-12-19 本原数据(北京)信息技术有限公司 Method and device for pre-reading database, computer equipment and storage medium


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5761706A (en) * 1994-11-01 1998-06-02 Cray Research, Inc. Stream buffers for high-performance computer memory system
CN1604055A (en) * 2003-09-30 2005-04-06 国际商业机器公司 Apparatus and method for pre-fetching data to cached memory using persistent historical page table data
CN102508638A (en) * 2011-09-27 2012-06-20 华为技术有限公司 Data pre-fetching method and device for non-uniform memory access

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LI, QIONG ET AL.: "Design of High-Performance Scalable Distributed Shared Parallel I/O Systems", COMPUTER ENGINEERING & SCIENCE, vol. 28, no. 1, January 2006 (2006-01-01), pages 137 *

Also Published As

Publication number Publication date
CN102508638A (en) 2012-06-20
CN102508638B (en) 2014-09-17

Similar Documents

Publication Publication Date Title
WO2013044829A1 (en) Data readahead method and device for non-uniform memory access
US9804798B2 (en) Storing checkpoint file in high performance storage device for rapid virtual machine suspend and resume
US10037222B2 (en) Virtualization of hardware accelerator allowing simultaneous reading and writing
KR101361928B1 (en) Cache prefill on thread migration
US8738875B2 (en) Increasing memory capacity in power-constrained systems
US10204175B2 (en) Dynamic memory tuning for in-memory data analytic platforms
US20140195772A1 (en) System and method for out-of-order prefetch instructions in an in-order pipeline
CN106293944B (en) non-consistency-based I/O access system and optimization method under virtualized multi-core environment
TW201734758A (en) Multi-core communication acceleration using hardware queue device
US20130318269A1 (en) Processing structured and unstructured data using offload processors
US20080104325A1 (en) Temporally relevant data placement
KR20120025612A (en) Mapping of computer threads onto heterogeneous resources
US11048447B2 (en) Providing direct data access between accelerators and storage in a computing environment, wherein the direct data access is independent of host CPU and the host CPU transfers object map identifying object of the data
US20100185817A1 (en) Methods and Systems for Implementing Transcendent Page Caching
Kim et al. GPUdmm: A high-performance and memory-oblivious GPU architecture using dynamic memory management
Sun et al. Scheduling algorithm based on prefetching in MapReduce clusters
CN113407119A (en) Data prefetching method, data prefetching device and processor
Sun et al. HPSO: Prefetching based scheduling to improve data locality for MapReduce clusters
US10579419B2 (en) Data analysis in storage system
Zhao et al. Selective replication in memory-side GPU caches
Seelam et al. Masking I/O latency using application level I/O caching and prefetching on Blue Gene systems
Chen et al. Data prefetching and eviction mechanisms of in-memory storage systems based on scheduling for big data processing
Yoon et al. Design of DRAM-NAND flash hybrid main memory and Q-learning-based prefetching method
Jahre et al. A high performance adaptive miss handling architecture for chip multiprocessors
Lv et al. Dynamic I/O-aware scheduling for batch-mode applications on chip multiprocessor systems of cluster platforms

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12834825

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12834825

Country of ref document: EP

Kind code of ref document: A1