CN108897618B - Resource allocation method based on task perception under heterogeneous memory architecture - Google Patents

Resource allocation method based on task perception under heterogeneous memory architecture

Info

Publication number
CN108897618B
CN108897618B (application CN201810632230.8A; also published as CN108897618A)
Authority
CN
China
Prior art keywords
task
memory
nodes
node
migration
Prior art date
Legal status (assumption, not a legal conclusion)
Active
Application number
CN201810632230.8A
Other languages
Chinese (zh)
Other versions
CN108897618A (en
Inventor
许胤龙
陈吉强
李永坤
郭帆
刘军明
Current Assignee
Pingkai Star Beijing Technology Co ltd
Original Assignee
University of Science and Technology of China USTC
Priority date (assumption, not a legal conclusion)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC filed Critical University of Science and Technology of China USTC
Priority to CN201810632230.8A priority Critical patent/CN108897618B/en
Publication of CN108897618A publication Critical patent/CN108897618A/en
Application granted granted Critical
Publication of CN108897618B publication Critical patent/CN108897618B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • G06F9/5088Techniques for rebalancing the load in a distributed system involving task migration

Abstract

The invention discloses a resource allocation method based on task perception under a heterogeneous memory architecture, comprising the steps of process performance metadata recording, node task allocation recording, a task-characteristic-aware scheduling policy, and a page-aware migration policy. Because different task types are distinguished, tasks are distributed relatively uniformly across the NUMA nodes, which relieves CPU cache contention and memory access contention compared with the system's default task placement strategy; at the same time, because pages with different read/write characteristics are distinguished and placed adaptively in the heterogeneous memory, the number of NVM write operations is reduced and the NVM lifetime is extended. With the method of the invention, most write operations occur in DRAM, so performance loss is kept as small as possible.

Description

Resource allocation method based on task perception under heterogeneous memory architecture
Technical Field
The invention belongs to the technical field of computer memory management, and in particular relates to a method for building a heterogeneous memory from emerging non-volatile memory (NVM) and conventional dynamic random access memory (DRAM) in widely used non-uniform memory access (NUMA) servers, and, on this basis, achieving efficient task resource allocation through task-characteristic awareness.
Background
In September 1999, IBM integrated NUMA technology into its Unix servers. NUMA was a breakthrough that freed multiprocessor systems from the constraints of a single oversized shared bus, greatly increasing the number of processors, memory modules and I/O slots a single operating system can manage. Facing today's big-data scenarios, more and more applications are shifting from traditional compute-intensive workloads to data-intensive ones, and heterogeneous memory architectures have gradually been proposed to satisfy their larger memory requirements. Future NUMA heterogeneous memory architectures will therefore exhibit a high degree of non-uniformity: complex application classes, asymmetric read and write speeds across storage media, and NUMA's inherent access non-uniformity. Traditional NUMA technology cannot distinguish the characteristics of different memory media, cannot treat different types of applications differently to obtain optimal running performance, and cannot place specific pages on the appropriate storage medium to obtain optimal storage performance, so the actual performance of the system falls far short of the theoretical optimum.
Disclosure of Invention
The invention aims to provide a resource allocation method based on task perception under a heterogeneous memory architecture, on one hand, aiming at different types of applications, adaptive CPU and memory allocation is adopted; on the other hand, different page placement strategies are adopted for application access pages with different characteristics, so that the defects of the existing NUMA management technology when applied to the heterogeneous memory are overcome, and under the condition of ensuring low software overhead, efficient distribution of multiple tasks and efficient use of the heterogeneous memory are realized.
The invention discloses a resource allocation method based on task perception under a heterogeneous memory architecture, which is characterized by comprising the following steps:
the first step is as follows: process performance metadata records
Aiming at all task processes to be optimized, two performance parameters are acquired through hardware performance counters: the process's memory write requests per second WAPS (write accesses per second) and the process's total memory footprint MF; from these, the task classification criterion TC (task classification) = WAPS × MF is computed, where WAPS is expressed in millions and MF in GB. Tasks are divided into two broad categories by TC value: when TC < 1, the task is a compute-intensive application; when TC > 1, it is a data-intensive application;
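As a minimal illustration, the classification rule above can be sketched in Python; the function name is ours, and the handling of TC exactly equal to 1 is an assumption the text leaves open:

```python
def classify_task(waps_millions: float, mf_gb: float) -> str:
    """Classify a task by TC = WAPS * MF, with WAPS in millions of
    memory write requests per second and MF in GB of memory footprint."""
    tc = waps_millions * mf_gb
    # The text defines TC < 1 as compute-intensive and TC > 1 as
    # data-intensive; treating TC == 1 as data-intensive is our choice.
    return "compute-intensive" if tc < 1 else "data-intensive"
```

For instance, a process issuing 0.5 million writes per second over a 5GB footprint has TC = 2.5 and is classified as data-intensive.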
the second step is that: node task allocation record
According to each process's CPU occupancy, memory allocation and performance metadata records, a task process record table is established for each node in the NUMA architecture, recording the metadata of the processes on that node; at the same time, a resource allocation record table is created for each node, recording the occupancy of the node's CPU cores and the node's free memory capacity;
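The two per-node tables could be represented, for example, by plain record types like the following; the field and type names are illustrative, not taken from the patent:

```python
from dataclasses import dataclass, field

@dataclass
class ProcessRecord:
    """Performance metadata kept per process (first step)."""
    pid: int
    waps: float  # memory write requests per second, in millions
    mf: float    # total memory footprint, in GB

    @property
    def tc(self) -> float:
        # Task classification value, TC = WAPS * MF
        return self.waps * self.mf

@dataclass
class NodeRecord:
    """Per-node task process table plus resource allocation table."""
    free_cores: int
    free_mem_gb: float
    processes: list = field(default_factory=list)

    def counts(self):
        """(compute-intensive, data-intensive) task counts on this node."""
        compute = sum(1 for p in self.processes if p.tc < 1)
        return compute, len(self.processes) - compute
```

A periodic scan of all processes would then refresh each `NodeRecord` before the scheduling pass.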
the third step: task characteristic aware scheduling policy
Starting from the system's default task resource allocation mode, task migration adjustment between nodes is completed periodically according to each node's task allocation record, so that different types of applications are uniformly distributed over all nodes;
Firstly, the allocation record tables of all NUMA nodes are traversed to find Node1, the node running the most compute-intensive applications (TC < 1), and Node2, the node running the most data-intensive applications (TC > 1); for each of the two nodes, the number of compute-intensive tasks is recorded as computing_task_NUM and the number of data-intensive tasks as data_task_NUM;
For the two nodes: if |computing_task_NUM − data_task_NUM| > 1, the task placement of Node1 and Node2 is not uniform enough; if both nodes have enough free memory to support task migration, one compute-intensive application on Node1 is migrated to Node2 and one data-intensive application on Node2 is migrated to Node1; if the free memory of the two nodes cannot support task migration, no task migration is performed;
For the two nodes: if |computing_task_NUM − data_task_NUM| ≤ 1, the task placement of Node1 and Node2 is sufficiently uniform, and hence different types of applications are uniformly distributed over all nodes, so no task migration adjustment is needed;
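A compact sketch of this third step, under the assumption that each node is modelled as a dictionary with the fields shown and each task as a (name, footprint-in-GB) pair; the helper names are ours, and picking the smallest-footprint task of each class follows the embodiment described later:

```python
def uneven(node) -> bool:
    """True when |computing_task_NUM - data_task_NUM| > 1."""
    return abs(node["computing_task_NUM"] - node["data_task_NUM"]) > 1

def rebalance(node1, node2) -> bool:
    """One periodic adjustment between the node with the most
    compute-intensive tasks (node1) and the node with the most
    data-intensive tasks (node2). Returns True if a swap happened."""
    if not (uneven(node1) and uneven(node2)):
        return False  # placement already uniform enough
    # Pick the smallest-footprint task of each class to move.
    a = min(node1["compute_tasks"], key=lambda t: t[1])
    b = min(node2["data_tasks"], key=lambda t: t[1])
    # Migrate only if each destination has enough free memory.
    if a[1] > node2["free_mem"] or b[1] > node1["free_mem"]:
        return False
    node1["compute_tasks"].remove(a); node2["compute_tasks"].append(a)
    node2["data_tasks"].remove(b); node1["data_tasks"].append(b)
    node1["computing_task_NUM"] -= 1; node2["computing_task_NUM"] += 1
    node2["data_task_NUM"] -= 1; node1["data_task_NUM"] += 1
    # A node frees its outgoing task's memory and consumes the incoming one's.
    node1["free_mem"] += a[1] - b[1]
    node2["free_mem"] += b[1] - a[1]
    return True
```

Running this once on two unbalanced nodes swaps exactly one task of each class, after which the uniformity test suppresses further migration.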
the fourth step: task page aware migration policy
If the application's memory footprint is still growing, the task is still in its initial memory allocation phase and no page migration is performed;
If the application's memory footprint is relatively stable, the task is in its computation phase and page migration is started, in two parts: (1) pages in DRAM that have not recently been written are migrated to NVM; (2) pages in NVM that have recently been written are migrated to DRAM.
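The two-way page movement of this fourth step can be expressed as a small pure function; here each page is a (page_id, recently_written) pair, with recently_written standing in for the dirty bit of the page table entry (this representation is ours):

```python
def plan_page_migration(footprint_stable, dram_pages, nvm_pages):
    """Return (to_nvm, to_dram): ids of DRAM pages not recently
    written (cold, moved to NVM) and of NVM pages recently written
    (hot, moved to DRAM). No migration while the footprint grows."""
    if not footprint_stable:
        return set(), set()
    to_nvm = {pid for pid, written in dram_pages if not written}
    to_dram = {pid for pid, written in nvm_pages if written}
    return to_nvm, to_dram
```

With the page sets of the later embodiment (DRAM holds A to F with B and D dirty; NVM holds G to L with H, I and K dirty), this yields {A, C, E, F} moving to NVM and {H, I, K} moving to DRAM.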
The resource allocation method based on task perception under a heterogeneous memory architecture mainly comprises the following operation steps: process performance metadata recording, node task allocation recording, a task-characteristic-aware scheduling policy, and a page-aware migration policy. Because different task types are distinguished, tasks are distributed relatively uniformly across the NUMA nodes, which relieves CPU cache contention and memory access contention compared with the system's default task placement strategy; at the same time, because pages with different read/write characteristics are distinguished and placed adaptively in the heterogeneous memory, the number of NVM write operations is reduced and the NVM lifetime is extended; since most write operations then occur in DRAM, performance loss is minimized.
Drawings
Fig. 1 is a schematic diagram of an implementation operation flow of the resource allocation method based on task awareness in the heterogeneous memory architecture.
FIG. 2 is a schematic diagram of a default allocation manner applied under a two-node NUMA architecture.
Fig. 3 shows a task placement diagram after feature perception adjustment.
FIG. 4 is a diagram illustrating memory usage after initial allocation is applied.
FIG. 5 is a diagram illustrating memory footprint after page aware migration.
Detailed Description
The following describes the resource allocation method based on task awareness in the heterogeneous memory architecture in detail by using specific embodiments with reference to the accompanying drawings.
Example 1:
In the resource allocation method based on task awareness of this embodiment, 4 compute-intensive applications (A1, A2, A3, A4) and 4 data-intensive applications (B1, B2, B3, B4) are allocated to run in a two-node NUMA heterogeneous memory architecture. Each node has 4 cores, 4GB of DRAM and 12GB of NVM. The applications are initially allocated with the system's default allocation mode, and the task-aware policy is applied periodically for adjustment. Fig. 1 is a schematic diagram of the operation flow of this embodiment's method, which comprises two major parts: a periodic task-characteristic-aware scheduling policy and a periodic task-page-aware migration policy.
The resource allocation method based on task awareness under the heterogeneous memory architecture specifically comprises the following steps:
the first step is as follows: process performance metadata records
For all task processes to be optimized, two performance parameters are acquired through hardware performance counters: the process's memory write requests per second WAPS (write accesses per second) and the process's total memory footprint MF (see box ① in Fig. 1). From these, the task classification criterion TC = WAPS × MF is computed, where WAPS is expressed in millions and MF in GB. Tasks are divided into two broad categories by TC value: when TC < 1, the task is a compute-intensive application; when TC > 1, it is a data-intensive application.
Fig. 2 shows the distribution of the 8 applications under the system's default allocation mode in this embodiment. Each dashed box represents a NUMA node and each solid square an application; gray squares are compute-intensive applications and white squares are data-intensive applications. Compute-intensive applications are characterized by a small amount of accessed data with strong locality; the task's main bottleneck is CPU computation. Data-intensive applications are characterized by a large amount of accessed data with poor locality; the task's main bottleneck is memory access. As shown in Fig. 2, 3 compute-intensive applications (A1, A2, A3) and 1 data-intensive application (B1) are allocated on Node1, and 1 compute-intensive application (A4) and 3 data-intensive applications (B2, B3, B4) on Node2. The performance metadata of each application, the memory write requests per second WAPS and total memory footprint MF shown in the figure, are sampled and tallied with hardware performance counters, and each application's TC value is computed as the criterion for task awareness. For the compute-intensive applications A1, A2, A3, A4 the TC values are: 0.0005, 0.002, 0.0075, 0.0005. For the data-intensive applications B1, B2, B3, B4 the TC values are: 2.5, 1.8, 2.5. In the present invention, TC = 1 is set as the threshold distinguishing compute-intensive from data-intensive applications.
The second step is that: node task allocation record
According to each process's CPU occupancy, memory allocation and performance metadata records, a task process record table is established for each node in the NUMA architecture, recording the metadata of the processes on that node; meanwhile, a resource allocation record table is created for each node, recording the occupancy of the node's CPU cores and the node's free memory capacity (see box ② in Fig. 1).
In this embodiment, all the process metadata records are periodically traversed, and the task process record table of each NUMA node is updated. Under the default distribution mode of the system, the number of the compute-intensive applications in the Node1 is 3, and the number of the data-intensive applications is 1; the number of compute-intensive applications in Node2 is 1 and the number of data-intensive applications is 3.
Meanwhile, each node's resource allocation record table is updated according to the occupancy of system resources. The memory footprints MF of the 4 applications on Node1 are 0.5, 1.0, 1.5 and 5.0 GB respectively, so its remaining free memory is 8GB and no CPU cores are free. The memory footprints MF of the 4 applications on Node2 are 0.5, 4.0, 5.0 and 5.0 GB respectively, so its remaining free memory is 1.5GB and no CPU cores are free.
The third step: task characteristic aware scheduling policy
Starting from the system's default task resource allocation mode, task migration adjustment between nodes is completed periodically according to each node's task allocation record, so that different types of applications are uniformly distributed over all nodes;
Firstly, the allocation record tables of all NUMA nodes are traversed to find Node1, the node running the most compute-intensive applications (TC < 1), and Node2, the node running the most data-intensive applications (TC > 1); for each node, the number of compute-intensive tasks is recorded as computing_task_NUM and the number of data-intensive tasks as data_task_NUM;
For the two nodes: if |computing_task_NUM − data_task_NUM| > 1, the task placement of Node1 and Node2 is not uniform enough; if both nodes have enough free memory to support task migration, one compute-intensive application on Node1 is migrated to Node2 and one data-intensive application on Node2 is migrated to Node1; if the free memory of the two nodes cannot support task migration, no task migration is performed;
For the two nodes: if |computing_task_NUM − data_task_NUM| ≤ 1, the task placement of Node1 and Node2 is sufficiently uniform, and hence different types of applications are uniformly distributed over all nodes, so no task migration adjustment is needed.
The default task resource allocation strategy in existing systems assigns tasks to the different nodes round-robin according to their arrival time, and tries to keep each task's allocated CPU and memory on the same node. This approach is limited: it does not take the characteristics of arriving tasks into account and cannot adjust adaptively. The task-characteristic-aware scheduling policy instead starts from the system's default allocation and periodically performs task migration adjustment between nodes according to each node's task allocation record, ensuring that different types of applications are uniformly distributed over all nodes.
In this embodiment, the task process record tables of all nodes are first traversed and compared, giving: Node1 is the node running the most compute-intensive applications (TC < 1) and Node2 is the node running the most data-intensive applications (TC > 1). For Node1, computing_task_NUM is 3 and data_task_NUM is 1; for Node2, computing_task_NUM is 1 and data_task_NUM is 3. In this adjustment round both nodes therefore satisfy |computing_task_NUM − data_task_NUM| > 1 (see box ③ in Fig. 1), which indicates that the task placement of Node1 and Node2 is not uniform enough. A compute-intensive application and a data-intensive application are selected for scheduling (see box ④ in Fig. 1), and the nodes' resource allocation record tables are checked to judge whether enough free memory supports task migration between the two nodes (see box ⑤ in Fig. 1): the compute-intensive application A1 with the smallest memory footprint on Node1 occupies 0.5GB, less than Node2's remaining 1.5GB of free memory; meanwhile, the data-intensive application B2 with the smallest memory footprint on Node2 occupies 4GB, less than Node1's remaining 8GB of free memory. The free memory is therefore sufficient to complete the migration: task A1 is migrated from Node1 to Node2, and task B2 is migrated from Node2 to Node1. Fig. 3 shows the task placement after this adjustment is completed.
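The free-memory feasibility check and the resulting bookkeeping in this embodiment amount to a few lines of arithmetic (all values taken from the example above):

```python
# Smallest candidate tasks (GB) and free memory before migration (GB)
a1_mf, b2_mf = 0.5, 4.0            # A1 on Node1, B2 on Node2
node1_free, node2_free = 8.0, 1.5

# Each destination must be able to hold the incoming task
can_migrate = a1_mf <= node2_free and b2_mf <= node1_free

# Free memory after the swap: a node regains its outgoing task's
# footprint and loses the incoming task's footprint
node1_free_after = node1_free + a1_mf - b2_mf   # 8.0 + 0.5 - 4.0
node2_free_after = node2_free + b2_mf - a1_mf   # 1.5 + 4.0 - 0.5
```

So the migration is allowed, leaving 4.5GB free on Node1 and 5.0GB free on Node2 afterwards.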
The task process record tables of all nodes are periodically traversed again, and the comparison shows that Node1 and Node2 now hold the same number of compute-intensive and data-intensive applications: computing_task_NUM is 2 and data_task_NUM is 2 on each node. In this adjustment round |computing_task_NUM − data_task_NUM| > 1 is not satisfied (see box ③ in Fig. 1), which indicates that the task placement of Node1 and Node2 is uniform and no task migration adjustment is required.
The fourth step: task page aware migration policy
If the memory occupation amount of the application is still increased, indicating that the task is still in the initial memory allocation stage, not carrying out page migration;
if the memory occupation amount of the application is relatively stable, indicating that the task is in a calculation operation stage, starting page migration, and specifically dividing the page migration into two parts: (1) migrating a page set which has not recently undergone write operation in the DRAM into the NVM; (2) the set of pages in the NVM where the write operation has recently occurred is migrated to DRAM.
Compared with the conventional DRAM medium, the new NVM media suffer from low write speed and a limited number of write/erase cycles. Related research shows that NVM writes are 10 to 20 times slower than DRAM writes, and that under server workloads the write-endurance lifetime is 3 to 5 years. At the same time, however, NVM offers large storage capacity and non-volatility. To better exploit the performance of the heterogeneous memory, the invention adopts periodic task page awareness and applies a corresponding page migration strategy according to the different read/write characteristics of pages.
The task-page-aware migration strategy is based on the following experimental observations, obtained by analysing the page characteristics of a large number of applications: ignoring writes during the initial memory allocation phase, during an application's computation phase the number of pages that are written is far smaller than the number of pages the application has allocated, and the pages that are written remain relatively fixed throughout the application's run. The following page migration policy is therefore executed periodically:
First, the task process record tables of all nodes are traversed and compared with the previous traversal result to judge which applications are still in the memory-growth phase; no page migration is performed for these. For applications whose memory footprint is relatively stable, the page-aware migration strategy is applied:
Fig. 4 is a schematic diagram of the memory usage of the 8 applications in this embodiment after the initial memory allocation phase. The dashed frame divides each node's memory into two parts by storage medium: DRAM and NVM. White squares represent page sets that have not recently been written; gray squares represent page sets that have recently been written. Since the system's default memory allocation preferentially allocates DRAM before NVM, some pages that have not recently been written end up in DRAM (A, C, E, F) and some pages that have recently been written end up in NVM (H, I, K). The page table entries of all pages are traversed (see box ⑥ in Fig. 1) and the pages are divided according to the dirty-page flag bit (see box ⑦ in Fig. 1): A, C, G, E, F, J, L form the set of pages not recently written; B, H, I, D, K form the set of recently written pages. It is then detected whether DRAM holds pages that have not been written (see box ⑧ in Fig. 1): the page set A, C, E, F is migrated to NVM. It is likewise detected whether NVM holds pages that have been written (see box ⑨ in Fig. 1): the page set H, I, K is migrated to DRAM.
Fig. 5 shows the page placement after the migration is completed. Since the set of written pages is relatively fixed, written pages can be kept in DRAM as far as possible throughout the application's run, without large amounts of swapping between DRAM and NVM. Moreover, whether a page has recently been written can be judged from the dirty-page flag bit of its page table entry, so no extra metadata needs to be recorded per page and the operating overhead is low.
In this embodiment, by using a resource allocation method based on task awareness in a heterogeneous memory architecture, 4 compute-intensive applications and 4 data-intensive applications are uniformly allocated to two NUMA nodes, thereby alleviating cache contention and memory access contention of a CPU. And different read-write characteristic pages of the application are distinguished, and an adaptive placement strategy is adopted under a heterogeneous memory architecture, so that the write operation times of the NVM are reduced, and the service life of the NVM is prolonged. Since most of the write operations occur in DRAM, the impact of heterogeneous memory on the runtime of the application is mitigated.

Claims (1)

1. A resource allocation method based on task perception under a heterogeneous memory architecture is characterized by comprising the following steps:
the first step is as follows: process performance metadata records
Aiming at all task processes to be optimized, two performance parameters are acquired through hardware performance counters: the process's memory write requests per second WAPS and the process's total memory footprint MF; from these, the task classification criterion TC = WAPS × MF is computed, where WAPS is expressed in millions and MF in GB. Tasks are divided into two broad categories by TC value: when TC < 1, the task is a compute-intensive application; when TC > 1, it is a data-intensive application;
the second step is that: node task allocation record
According to each process's CPU occupancy, memory allocation and performance metadata records, a task process record table is established for each node in the NUMA architecture, recording the metadata of the processes on that node; at the same time, a resource allocation record table is created for each node, recording the occupancy of the node's CPU cores and the node's free memory capacity;
the third step: task characteristic aware scheduling policy
Starting from the system's default task resource allocation mode, task migration adjustment between nodes is completed periodically according to each node's task allocation record, so that different types of applications are uniformly distributed over all nodes;
Firstly, the allocation record tables of all NUMA nodes are traversed to find Node1, the node running the most compute-intensive applications (TC < 1), and Node2, the node running the most data-intensive applications (TC > 1); for each of the two nodes, the number of compute-intensive tasks is recorded as computing_task_NUM and the number of data-intensive tasks as data_task_NUM;
For the two nodes: if |computing_task_NUM − data_task_NUM| > 1, the task placement of Node1 and Node2 is not uniform enough; if both nodes have enough free memory to support task migration, one compute-intensive application on Node1 is migrated to Node2 and one data-intensive application on Node2 is migrated to Node1; if the free memory of the two nodes cannot support task migration, no task migration is performed;
For the two nodes: if |computing_task_NUM − data_task_NUM| ≤ 1, the task placement of Node1 and Node2 is sufficiently uniform, and hence different types of applications are uniformly distributed over all nodes, so no task migration adjustment is needed;
the fourth step: task page aware migration policy
If the application's memory footprint is still growing, the task is still in its initial memory allocation phase and no page migration is performed;
If the application's memory footprint is relatively stable, the task is in its computation phase and page migration is started, in two parts: (1) pages in DRAM that have not recently been written are migrated to NVM; (2) pages in NVM that have recently been written are migrated to DRAM.
CN201810632230.8A 2018-06-19 2018-06-19 Resource allocation method based on task perception under heterogeneous memory architecture Active CN108897618B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810632230.8A CN108897618B (en) 2018-06-19 2018-06-19 Resource allocation method based on task perception under heterogeneous memory architecture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810632230.8A CN108897618B (en) 2018-06-19 2018-06-19 Resource allocation method based on task perception under heterogeneous memory architecture

Publications (2)

Publication Number Publication Date
CN108897618A CN108897618A (en) 2018-11-27
CN108897618B true CN108897618B (en) 2021-10-01

Family

ID=64345409

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810632230.8A Active CN108897618B (en) 2018-06-19 2018-06-19 Resource allocation method based on task perception under heterogeneous memory architecture

Country Status (1)

Country Link
CN (1) CN108897618B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112214302B (en) * 2020-10-30 2023-07-21 中国科学院计算技术研究所 Process scheduling method

Citations (2)

Publication number Priority date Publication date Assignee Title
CN105117285A (en) * 2015-09-09 2015-12-02 重庆大学 Non-volatile memory schedule optimization method based on mobile virtualization system
CN107391031A (en) * 2017-06-27 2017-11-24 北京邮电大学 Data migration method and device in a kind of computing system based on mixing storage

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US10540098B2 (en) * 2016-07-19 2020-01-21 Sap Se Workload-aware page management for in-memory databases in hybrid main memory systems

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
CN105117285A (en) * 2015-09-09 2015-12-02 重庆大学 Non-volatile memory schedule optimization method based on mobile virtualization system
CN107391031A (en) * 2017-06-27 2017-11-24 北京邮电大学 Data migration method and device in a kind of computing system based on mixing storage

Non-Patent Citations (3)

Title
CLOCK-DWF: A Write-History-Aware Page Replacement Algorithm for Hybrid PCM and DRAM Memory Architectures; Soyoon Lee; IEEE Transactions on Computers; 2014-09-30; Vol. 63, No. 9; pp. 2187-2200 *
Write-Aware Management of NVM-based Memory Extensions; Amro Awad; Proceedings of the 2016 International Conference on Supercomputing; 2016-06-30; pp. 1-12 *
Research on Adaptive Page Management Algorithms for Hybrid Storage Architectures; Sun Zhiwen; China Master's Theses Full-text Database, Information Science and Technology; 2017-01-15; Vol. 2017, No. 1; I137-18 *

Also Published As

Publication number Publication date
CN108897618A (en) 2018-11-27

Similar Documents

Publication Publication Date Title
CN110134514B (en) Extensible memory object storage system based on heterogeneous memory
US11086792B2 (en) Cache replacing method and apparatus, heterogeneous multi-core system and cache managing method
US9081702B2 (en) Working set swapping using a sequentially ordered swap file
JP6678230B2 (en) Storage device
US8214596B2 (en) Apparatus and method for segmented cache utilization
CN114610232A (en) Storage system, memory management method and management node
WO2013189186A1 (en) Buffering management method and apparatus for non-volatile storage device
CN111930643B (en) Data processing method and related equipment
CN103324466A (en) Data dependency serialization IO parallel processing method
CN110795213B (en) Active memory prediction migration method in virtual machine migration process
CN106909323B (en) Page caching method suitable for DRAM/PRAM mixed main memory architecture and mixed main memory architecture system
CN110347338B (en) Hybrid memory data exchange processing method, system and readable storage medium
CN109558093B (en) Hybrid memory page migration method for image processing type load
CN108664217B (en) Caching method and system for reducing jitter of writing performance of solid-state disk storage system
CN108563586B (en) Method for separating garbage recovery data and user data in solid-state disk
US10684964B1 (en) System and method for reducing read latency in storage devices
US20070079061A1 (en) Writing to file by multiple application threads in parallel
An et al. Avoiding read stalls on flash storage
CN108897618B (en) Resource allocation method based on task perception under heterogeneous memory architecture
CN111078143B (en) Hybrid storage method and system for data layout and scheduling based on segment mapping
CN113867641B (en) Host memory buffer management method and device and solid state disk
CN116364148A (en) Wear balancing method and system for distributed full flash memory system
CN110413235B (en) SSD (solid State disk) deduplication oriented data distribution method and system
CN108920254B (en) Memory allocation method based on fine granularity
US9760488B2 (en) Cache controlling method for memory system and cache system thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220908

Address after: 100192 207, floor 2, building C-1, Zhongguancun Dongsheng science and Technology Park, No. 66, xixiaokou Road, Haidian District, Beijing

Patentee after: Pingkai star (Beijing) Technology Co.,Ltd.

Address before: 230026 Jinzhai Road, Baohe District, Hefei, Anhui Province, No. 96

Patentee before: University of Science and Technology of China
