WO2022267443A1

WO2022267443A1 - Memory resource dynamic regulation and control method and system based on memory access and performance modeling

Info

Publication number: WO2022267443A1
Application number: PCT/CN2022/070519
Authority: WO
Inventors: 徐易难; 周耀阳; 王卅; 唐丹; 孙凝晖; 包云岗
Original assignee: 中国科学院计算技术研究所
Priority date: 2021-06-24
Filing date: 2022-01-06
Publication date: 2022-12-29
Also published as: CN113505084B; CN113505084A

Abstract

Provided in the present invention are a memory resource dynamic regulation and control method and system based on memory access and performance modeling. By means of technology of guaranteeing quality of service of key applications on real-time multi-core hardware by means of dynamic memory bandwidth resource partitioning, a non-invasive solution with fine granularity, high precision and quick response is provided. In the present invention, an overall architecture of an automatic process performance regulation and control mechanism is designed; by means of a label mechanism, hardware directly acquires the priority of an upper-layer application, so as to provide differentiated hardware resource allocation for processes with different priorities. On the basis of a machine learning method, latency modeling is performed on a bank structure of a dynamic random access memory. For the problem of guaranteeing quality of service of key applications, in a real-time multi-core environment, memory access interference of other processes to a key process can be effectively reduced by means of dynamically adjusting memory bandwidth allocation, such that the quality of service of a high-priority process can be accurately guaranteed.

Description

Method and system for dynamic regulation and control of memory resources based on memory access and performance modeling

technical field

The invention belongs to the technical field of key application service quality assurance in real-time multi-core system scenarios, and particularly relates to a memory resource dynamic regulation method and system based on memory access and performance modeling.

Background technique

In a real-time system, the quality of service of key applications must be guaranteed, and when reflected in hardware, the allocation of hardware resources for key processes must be guaranteed. With the continuous improvement of application requirements for computing resources, scenarios such as cloud computing, smartphones, and 5G base stations have also continuously increased the processing capabilities of computer hardware. Multi-core has become the standard configuration for almost all real-time systems. However, in a multi-core scenario, multiple applications running on the same processor will compete with each other for hardware resources, resulting in performance fluctuations, which in turn affect the performance of real-time systems.

Therefore, some work has been carried out around the problem of "how to correctly and efficiently control the allocation of hardware resources between different applications in a system with real-time requirements, and ensure the quality of service of key applications".

Intel Xeon series processors are equipped with Resource Director Technology (RDT), including cache monitoring technology, cache allocation technology, memory bandwidth monitoring technology, memory bandwidth allocation technology, etc. The operating system uses resource allocation technology to monitor the cache and bandwidth usage of different cores, and adjusts the number of available resources of a single core by directly setting the resource allocation ratio, thereby reducing performance interference and ensuring the performance of key loads in complex environments.

The application performance loss model (Application Slowdown Model, ASM) combines the analysis of shared cache and main memory, and believes that for applications with limited memory access, its performance is proportional to the sending speed of memory access requests, and the highest priority In this case, the process can reach its maximum memory access bandwidth. On the one hand, ASM minimizes the interference from the memory access period, on the other hand, it quantifies the interference in the shared cache, and then periodically evaluates the performance loss to realize the feedback dynamic adjustment of hardware resources.

Intel RDT allows only static partitioning of resources, and allocation amounts are based only on the needs of known sensitive applications. In actual operation, because the hardware has no perception of program requirements, RDT relies on the manual control of software (operating system or user), and the hardware cannot dynamically adjust the number of resources at runtime. Since software usually has a coarser control granularity, this causes hardware resources waste and negatively impact overall system performance.

The application performance loss model does not have the general applicability of the architecture. Due to the existence of a large number of shared resources, in order to realize the control of hardware resources, ASM needs to make extensive intrusive modifications to the system bus, memory controller, prefetcher and other components to ensure Its support for priority is very expensive to implement. Due to the need to consider hardware implementation details, the complexity of modeling is increased, which makes it difficult to migrate resource competition evaluation models between different platforms.

Existing methods all have the problem of "using heuristic rules to judge inter-core interference in specific scenarios, and insufficient application universality". RDT hardware cannot automatically identify and adjust resource division, and ASM believes that applications with limited memory access In the case of the highest priority, its maximum memory access bandwidth can be achieved to quantify the bandwidth loss caused by interference between multi-cores, but this premise is not necessarily true.

invention disclosure

The purpose of the present invention is to solve the shortcomings of the above-mentioned prior art that can only be statically divided, does not have architecture universality, and does not have application universality, and proposes a technology based on memory access delay prediction, performance loss prediction, and bandwidth dynamic adjustment technology. Key application service quality assurance method and system.

Aiming at the deficiencies of the prior art, the present invention proposes a method for dynamic regulation and control of memory resources based on memory access and performance modeling, which includes:

Step 1. In the multi-core system, use the historical memory access request information of the preset process alone to the DRAM as the training data, and use the delay corresponding to the historical memory access request information as the training target to train the neural network model to obtain the memory access delay model ;

Step 2. When the multi-core system runs with multiple processes, record the target memory access request of the target process and input it into the memory access delay model, and obtain the memory access delay t of the target memory access request without inter-process interference _solo , and detect the actual memory access delay t _mix of the target memory access request at the same time, and divide the memory access delay t _solo by the memory access delay t _mix to obtain the increase ratio of the memory access delay;

Step 3, count the number of execution clock cycles inside and outside the core of the target process, and combine the increase ratio of the memory access delay to obtain the performance loss of the target process when it is running in multiple processes, relative to its single operation;

Step 4. When the performance loss is greater than the threshold, limit the DRAM memory access traffic of processes other than the target process to dynamically allocate DRAM bandwidth resources in real time to ensure the service quality of the target process.

The memory access and performance modeling-based dynamic control method for memory resources, wherein the historical memory access request information includes the current request information _h0 of the access target Bank and the past k memory access histories h _i (i=1,..., k), where h _i (i=0,...,k) includes the issue time t _i of _hi , the row address row _i and the column address col _i accessed by h _i , and the input of the memory access delay model is the memory access history and the current The difference between request information t ₀ -t _i , row ₀ -row _i and col ₀ -col _i , the output of the memory access delay model is the memory access delay of the current request information h ₀ Latency=g(h ₀ ,…, h _k ), complete the training of the memory access latency model by fitting the function g.

The method for dynamically regulating memory resources based on memory access and performance modeling, wherein the performance loss is:

Among them, A is the number of execution clock cycles in the core, B is the number of execution clock cycles outside the core, and Lats is the increase ratio of memory access delay.

In the method for dynamically regulating memory resources based on memory access and performance modeling, step 4 includes: using token bucket technology to limit DRAM memory access traffic of processes other than the target process.

The method for dynamic regulation and control of memory resources based on memory access and performance modeling, wherein

Each core of a multi-core system has an independent token bucket, and a certain number of tokens are automatically added to the token bucket every certain period, and the token bucket has a maximum token capacity, and all memory access requests issued by the core All will go through the token bucket, and any access request data packet will be marked when it enters the token bucket, record the time when it enters the token bucket, and judge whether there is an available token in the token bucket, and if so, the data The packet will be sent to the lower layer, and the number of tokens in the token bucket will decrease according to the data volume of the memory access request, otherwise, the memory access request will be sent to the waiting queue.

The present invention also proposes a dynamic control system for memory resources based on memory access and performance modeling, which includes:

Module 1 is used to use the historical memory access request information of the preset process to DRAM alone as the training data in the multi-core system, and use the delay corresponding to the historical memory access request information as the training target to train the neural network model to obtain the memory access delay Model;

Module 2, used to record the target memory access request of the target process and input it into the memory access delay model when the multi-core system runs with multiple processes, and obtain the memory access of the target memory access request without inter-process interference Delay t _solo , and detect the actual memory access delay t _mix of the target memory access request at the same time, and divide the memory access delay t _solo by the memory access delay t _mix to obtain the increase ratio of memory access delay;

Module 3 is used to count the number of execution clock cycles in the core and outside the core of the target process, and in combination with the increase ratio of the memory access delay, to obtain the performance loss of the target process relative to its separate operation when multi-process operation is performed;

Module 4 is used to limit the DRAM access traffic of processes other than the target process when the performance loss is greater than a threshold, so as to dynamically allocate DRAM bandwidth resources in real time and ensure the service quality of the target process.

The memory resource dynamic control system based on memory access and performance modeling, wherein the historical memory access request information includes the current request information h ₀ of the access target Bank and the past k memory access histories h _i (i=1,..., k), where h _i (i=0,...,k) includes the issue time t _i of _hi , the row address row _i and the column address col _i accessed by h _i , and the input of the memory access delay model is the memory access history and the current The difference between request information t ₀ -t _i , row ₀ -row _i and col ₀ -col _i , the output of the memory access delay model is the memory access delay of the current request information h ₀ Latency=g(h ₀ ,…, h _k ), complete the training of the memory access latency model by fitting the function g.

The memory resource dynamic control system based on memory access and performance modeling, wherein the performance loss is:

In the memory resource dynamic control system based on memory access and performance modeling, the module 4 includes: using token bucket technology to limit the DRAM memory access flow of processes other than the target process.

The memory resource dynamic control system based on memory access and performance modeling, wherein

As can be seen from the above scheme, the present invention has the advantages of:

The present invention proposes a technique for ensuring key application service quality through dynamic memory bandwidth resource division on real-time multi-core hardware, and provides a fine-grained, high-precision, and fast-response non-intrusive solution. The present invention designs the overall architecture of the process performance automatic regulation mechanism, allows the hardware to directly obtain the priority of upper-layer applications through the label mechanism, and provides differentiated hardware resource allocation for processes with different priorities. Innovatively based on the machine learning method to model the delay of the DRAM (Dynamic random access memory, DRAM) bank structure, in most scenarios, it can achieve a prediction accuracy of more than 90%, and the average error is 2.78%. Based on the memory access delay, the performance loss of a process is estimated relative to its independent operation, and the average error is only 8.78%, which is better than the existing related technologies. Aiming at the problem of quality of service guarantee for key applications, in a real-time multi-core environment, the memory access interference of other processes to key processes is effectively reduced by dynamically adjusting memory bandwidth allocation, and the quality of service of high-priority processes is accurately guaranteed to achieve their independent operation. 90% of the time.

Brief description of the drawings

Fig. 1 is the position and composition structure schematic diagram of AutoMBA of the present invention in the system;

Fig. 2 is the input and output schematic diagram of multi-layer perceptron model;

Fig. 3 is a schematic diagram showing that the execution time of a sequential processor can be divided into two parts, inside the core and outside the core;

Fig. 4 is a schematic diagram of the operating principle of the token bucket mechanism.

BEST MODE FOR CARRYING OUT THE INVENTION

When the inventors conducted multi-core program performance analysis and dynamic resource adjustment and optimization research, they found that the existing technology did not combine low-level hardware information such as memory delay, bandwidth, and program memory access characteristics with high-level software information, resulting in hardware Problems such as lack of information, unknowable actual software performance, complex control technology, etc. After research, the inventor found that this problem can be solved by "using delay, bandwidth, access characteristics and other information combined with the actual performance loss of the software to model, and deriving the estimated performance loss of the software online on the hardware, based on continuous observation results Feedback To implement memory bandwidth allocation based on token bucket". Specifically, this application includes the following key technical points:

Key point 1. Using machine learning methods to model memory latency offline based on historical memory access address sequences, the average error rate on the SPEC CPU2006 test program is 2.84%. The offline means that the model has been established when the program is running (the program has been completed offline), and there is no need to establish the model online when the program is running.

Key point 2, using machine learning methods, based on the measured memory access delay, estimated ideal memory access delay, program memory access bandwidth, program memory access frequency and other information to model the performance loss of the program in the case of multi-core offline, tested in SPEC CPU2006 The average error of the procedure is 8.78%.

Key point 3, use the quota-based token bucket technology to control the memory access bandwidth of the program, and use the adjustment of the token bucket parameters to dynamically control the memory access bandwidth allocation of different programs, so as to guarantee the performance of key programs and achieve the set The ideal performance target of 90% (standard deviation 4.19%).

In order to make the above-mentioned features and effects of the present invention more clear and understandable, the following specific examples are given together with the accompanying drawings for detailed description as follows.

As shown in Figure 1, the core technical principles of the present invention include: (1) Based on the memory access request sequence of a single application and their timestamps in a multi-core environment, the main memory behavior is consistently simulated by modeling the memory, and online estimation The latency of the memory access request when it is running alone, combined with the memory access latency in the actual mixed environment, obtains the memory access latency increase ratio (Latency Scale-up, LatS); (2) Statistical process's in-core and out-of-core execution clocks The number of cycles, combined with LatS, calculates the execution time of the process when it runs alone, and quantifies the amount of interference the process receives; (3) According to the priority relationship of different cores, the token bucket technology is used to dynamically allocate memory bandwidth resources in real time to ensure Quality of service for critical applications.

The concrete implementation process of technical scheme of the present invention is as follows:

(1) DRAM memory access delay modeling

In order to obtain the delay of the process memory access request in a separate operating environment in real time in a multi-core environment, the present invention proposes (1) modeling the access delay of DRAM, and considering the DRAM Bank structure as a black box model ShadowDRAM, wherein the present invention is to The overall DRAM is modeled. The behavior of each bank in the DRAM is similar. The same black box model is used for different banks. The input is the relevant information of the current and past k memory access requests, and the output is the current access (2) When multiple processes are running, obtain the process information of the upper-layer application based on the labeling environment, record the memory access request of a single process by the hardware and input it into the DRAM memory access delay model ShadowDRAM, and obtain the memory access request in no time Delay t _solo in the case of multi-core interference.

The present invention considers that the current state of the DRAM and its controller depends on all historical access requests, and if a current memory access request input is given on this basis, the access delay of the request can be accurately predicted by modeling. The reason is that for a memory access component, its input signal is non-trivial only when it receives a memory access request, and it will not affect the way the internal state changes at other times, that is, it is a Moore type when there is no memory access request sequential circuit. Therefore, the internal state of the DRAM and its controller only needs to be determined from meaningful input signals in the past, ie, historical access request sequences.

More specifically, the inventors found that the delay of a DRAM access request can be determined by a small number of memory access request sequences that access the same Bank before and after it. This is because (1) the state of the row cache in the DRAM Bank has the greatest impact on the memory access delay, and the state of the row cache is determined by the last request to access the Bank, and the earlier memory access request will not affect the row cache (2) Non-blocking cache (Non-blocking Cache) brings memory-level parallelism (Memory-level Parallelism), that is, the cache continuously sends out multiple memory access requests without receiving a reply, and may Reply data is received out of order, which makes the processing flow of a single request affected by all requests within a short period of time before and after it, and is processed in advance or delayed.

The present invention proposes that the DRAM delay prediction model (black box model) samples q and predicts its delay only when no access request for the same Bank is issued within the time period after the memory access request q is issued and before the reply data is received . The main reasons are as follows: (1) The prediction of the delay needs to be relatively real-time, and the time difference between two consecutive requests for a Bank access may be large, so the input of the model cannot contain future information; (2) The sampling condition restrictions mentioned above ensure that future requests will not affect the processing time of request q, so the historical access request sequence information input by the model is sufficient for the delay prediction of the current request; (3) the model requires The calculation time may be long, and the sampling method is used to select some requests to predict the delay and performance loss, which can not only reserve sufficient time for calculation, but also reduce system power consumption.

Based on the above discussion, in order to find a general delay prediction mechanism and minimize unnecessary research on the internal structure of DRAM, based on the assumption that the current state of DRAM and its controller can be determined by the historical access requests of some current Banks, the present invention attempts to The Bank structure in DRAM is modeled. Based on the sampling of some requests that meet certain conditions, the delay of the current request is predicted by using the limited number of access histories corresponding to the Bank. The certain condition may be, for example, fetching a request every 100 items, or requiring no new instruction to be issued within a period of time after the request is sent and before the reply is received, or other feasible filtering conditions.

The present invention proposes to model the DRAM Bank using a machine learning method, which has the advantages of: (1) using basic arithmetic operations and a small amount of historical information storage and calculation to predict the delay of the current memory access request with a fairly high accuracy; 2) The machine learning model is versatile and universal, and its training process requires very little internal information of the DRAM controller and DRAM particles. When transplanting on different platforms, it is only necessary to re-grab the memory access information training model; ( 3) Based on partial sampling of memory access requests, the number of memory access delay predictions is effectively reduced, and the dynamic power consumption of the system is reduced.

The present invention uses a machine learning method to model the DRAM Bank based on a Multi-layer Perceptron (MLP) model. The reasons are: (1) MLP has universal applicability, and it can accurately Simulate the DRAM Bank under different configurations accurately, which enables the memory access delay prediction mechanism to be reused between different platforms, effectively reducing the workload of transplantation; (2) The MLP model and the ReLU activation function do not include complex function operations, only The result can be output after a simple multiplication and addition operation, and it is no longer necessary to maintain a large number of queues and state machines inside the DRAM and its controller, which effectively reduces the power consumption of the delay prediction mechanism.

As shown in Figure 2, the present invention uses the current request h ₀ of the access target Bank and past k memory access histories h _i (i=1,...,k) to predict the memory access delay, where h _i (i=0,... ,k) contains the information of the issue time t _i of _hi , the row address row _i and the column address col _i accessed by _hi , and the actual input memory access history of the model can be determined by the difference between the historical request information and the current request information To represent. In the training and prediction of the model, for each memory access history h _i , we use t ₀ -t _i ,row ₀ -row _i ,col ₀ -col _i to represent their relationship with the current request, and the output of the model is The memory access latency of the current request is Latency=g(h ₀ ,...,h _k ), and the training process of the model is to continuously fit the function g. Among them, the memory access delay of the current request can be obtained by pre-running some preset threads or programs to capture their memory access and the delay of the corresponding request.

(2) Program performance loss modeling

From the perspective of microstructure, we can divide the execution process of the process into two parts: inside the core and outside the core. The process inside the core includes the operation unit, access to the core private cache, etc., and the part outside the core includes sending a read request to DRAM. , DRAM internal processing, etc.

As shown in Figure 3, in a sequential processor, when a memory access instruction causes an L2 cache miss, the internal pipeline of the core will stop and wait for DRAM to return data before continuing to execute. Therefore, during program execution, CPU clock cycles can be strictly divided into two types: in-core time and out-of-core time, among which only out-of-core time is subject to multi-core interference.

In an out-of-order processor, instructions at a later position may be executed in advance under certain conditions, so a cache miss does not necessarily cause the internal pipeline of the CPU core to stop. However, from the perspective of the system as a whole, in any clock cycle, either the memory access request outside the core is being processed, or all the memory access requests have been completed. These two types of clock cycles are called memory cycles (Memory Cycles, MCs). ) and non-memory cycles (Non-Memory Cycles, NMCs).

On the other hand, in a multi-core architecture, affected by the interference caused by resource competition among multiple cores, for requests sent from L2 to the DRAM controller through the system bus, their overall delay will increase due to multi-core resource competition. When the system bus is being occupied by another core, the request must wait for the other core's request to complete the transfer before it can be issued. The biggest source of inter-core interference is inside the DRAM Bank. The line caching mechanism is similar to Cache. When the locality of the program is good, its memory access delay will be greatly reduced. However, in a multi-core environment, memory access requests of other processes are likely to be inserted between two consecutive memory access requests of a single process, thereby destroying the locality of DRAM access and affecting memory access latency.

Considering the above two aspects comprehensively, the present invention considers that when running simultaneously with other processes under the multi-core architecture, the number of MCs of a process will increase compared with that of running alone, while the number of NMCs is the same. In a multi-core environment, by judging whether there are currently outstanding DRAM access requests, the hardware can divide each clock cycle into two types: NMC and MC, and count the number of two types of time cycles in a period of time, which are recorded as A ,B. Due to the existence of inter-core interference, the memory access request delay increases and the number of MCs increases, and the higher the increase ratio, the greater the interference caused by resource competition, and the greater the performance loss of the process.

Based on this, the present invention proposes a performance loss estimation (Slowdown Estimation via Memory Access Latency, SEMAL) model based on memory access delay: after the single-core memory access delay t _solo of the process is obtained by a machine learning method, combined with the actual multi-core memory access delay t _mix , we can estimate the increase ratio of the MC number of the process LatS=t _mix /t _solo based on the latency increase ratio (Latency Scale-up, LatS). Corresponding to the single running scenario, when the process executes the same stage, the number of NMCs is still A, but the number of MCs will be reduced in proportion to LatS, that is, the execution time required by the process is A+B/LatS.

Therefore, in a multi-core environment, based on the relative extension of process execution time, we calculate the performance loss when it is running alone:

In the formula, Execution_Time is the execution time of the process or program, Execution_Time _solo is the individual execution time of the process or program, and Execution_Time _mix is the mixed execution time of the process or program.

(3) Service quality assurance technology based on token bucket

Based on the SEMAL model, the present invention proposes an automatic memory bandwidth allocation mechanism (Automatic Memory Bandwidth Allocation, AutoMBA): using SEMAL, we can dynamically monitor the operation of the program at runtime and obtain the degree of performance loss. Based on the memory access bandwidth during hybrid operation, Forecasts its optimal-case (standalone) bandwidth requirements. Using the token bucket technology, the system can regulate the memory access bandwidth of different cores, and by limiting low-priority memory access traffic, the service quality of key applications can be guaranteed first.

Token Bucket (TB) technology is the basic tool of AutoMBA, which can effectively and accurately control the memory access bandwidth of different cores. As shown in Figure 4, each CPU core has a private and independent token bucket, and a certain number of tokens (Inc) are automatically added every certain period (Freq), and the token bucket has a maximum capacity (Size). All memory access requests issued by the core will pass through the token bucket, and any request data packet will be marked when it enters the token bucket, and the time when it enters the token bucket will be recorded. At this time, if there is an available token in the token bucket (such as PACKET ₀ sent at time t ₀ ), the data packet will be sent to the lower layer, and the number of tokens will decrease according to the amount of requested data. Otherwise, if there is no remaining token in the token bucket (such as PACKET ₁ and PACKET ₂ sent at time t ₁ and t ₂ respectively), the request will be blocked and sent to the waiting queue. A timer synchronized with the system clock is set in the token bucket, which will reset every time it reaches Freq, and trigger the token to automatically increase the number of Inc tokens. At this time, in the case that there are still remaining tokens, the requests in the waiting queue will be sent to the lower layer and the number of tokens will be reduced accordingly. For example, the previously blocked PACKET ₂ and PACKET ₃ requests are released at time _t3 . If the requested data volume of the two requests is 1 respectively, ntokens will be reduced by 2 at the same time.

In each sampling period (Sampling Interval, SI): (1) The TB of each core runs automatically according to the set parameters, and the core sends memory access requests based on the current remaining number of tokens; (2) The delay prediction module (LPM) records The memory access history of the high-priority core (or target process), predicting the delay t _solo of the memory access request issued by the core when running alone; (3) the token bucket controller (TBM) handles the memory access issued by TB on the one hand Request, forward requests from different cores to the DRAM controller, on the other hand, get the predicted t _solo from the LPM, detect the actual memory access delay t _mix , use the relative ratio of t _solo and t _mix and the core memory access clock cycle The number estimates the performance loss of the target process when it is running alone, and records it in the preset related registers.

An updating interval (Updating Interval, UI) consists of multiple sampling periods. At the end of each UI, the AutoMBA mechanism evaluates the degree of performance loss of the target process in the past UI. When the performance loss of the target process is too much, TBM automatically restricts the memory access traffic of the remaining cores, thereby reducing the interference between multiple cores and improving the performance of the target process; when the performance of the target process meets the requirements, the flow control of the remaining cores can be relaxed.

AutoMBA's control algorithm is divided into observation (Observe) and action (Act) two steps. Observe is performed at the end of each SI. Combined with memory access delay and memory access cycle count, the hardware calculates the performance loss of the process and sets the corresponding counter. Act is carried out at the end of each UI. If the hardware judges that the target process meets the performance loss of less than 10% in most SIs, increase the maximum allowable flow of the remaining cores, and the more time it satisfies, the more the number will be increased; when the target process When the performance loss reaches more than 50% in no less than 3 SIs, the traffic allowed by the remaining cores is directly halved; when the performance loss of the target process is within the remaining range, there is also a corresponding token bucket Inc parameter adjustment.

The following are system embodiments corresponding to the foregoing method embodiments, and this implementation manner may be implemented in cooperation with the foregoing implementation manners. The relevant technical details mentioned in the foregoing implementation manners are still valid in this implementation manner, and will not be repeated here in order to reduce repetition. Correspondingly, the relevant technical details mentioned in this implementation manner may also be applied in the foregoing implementation manners.

Module 3, which is used to count the number of execution clock cycles inside and outside the core of the target process, and in combination with the increase ratio of the memory access delay, to obtain the performance loss of the target process when it is running in multiple processes, relative to its single operation;

Industrial applicability

The invention proposes a method and system for dynamic regulation and control of memory resources based on memory access and performance modeling. The present invention proposes a dynamic control method and system for memory resources based on memory access and performance modeling. Memory latency model to predict the memory access latency of the application to be executed when it runs alone, combined with the memory access latency of the application to be executed in a mixed execution environment of multiple applications, to obtain the increase ratio of memory access latency; count the in-core and out-of-core of the process The number of execution clock cycles, combined with the increase ratio, is used to obtain the execution time when the process is running alone. Based on this, the performance loss suffered by the process is quantified. When the performance loss of the target process is greater than the threshold, the memory access traffic other than from the target process is limited. It ensures that the quality of service of the high-priority process is close to the quality of service when it runs alone.

Claims

A method for dynamic regulation and control of memory resources based on memory access and performance modeling, characterized in that it includes:

Step 1. In the multi-core system, use the historical memory access request information of the preset process alone to the DRAM as the training data, and use the delay corresponding to the historical memory access request information as the training target to train the neural network model to obtain the memory access delay model ;

Step 2. When the multi-core system runs with multiple processes, record the target memory access request of the target process and input it into the memory access delay model, and obtain the memory access delay t of the target memory access request without inter-process interference solo , and detect the actual memory access delay t mix of the target memory access request at the same time, and divide the memory access delay t solo by the memory access delay t mix to obtain the increase ratio of the memory access delay;

Step 3, count the number of execution clock cycles inside and outside the core of the target process, and combine the increase ratio of the memory access delay to obtain the performance loss of the target process when it is running in multiple processes, relative to its single operation;

Step 4. When the performance loss is greater than the threshold, limit the DRAM memory access traffic of processes other than the target process to dynamically allocate DRAM bandwidth resources in real time to ensure the service quality of the target process.
The memory resource dynamic control method based on memory access and performance modeling according to claim 1, wherein the historical memory memory request information includes the current request information h0 of the access target Bank and the past k pieces of memory memory memory history hi (i=1,...,k), where h i (i=0,...,k) includes the issue time t i of hi , the row address row i and the column address col i accessed by hi , and the memory access delay model The input is the difference between the memory access history and the current request information t 0 -t i , row 0 -row i and col 0 -col i , and the output of the memory access delay model is the memory access delay Latency of the current request information h 0 = g(h 0 ,…,h k ), complete the training of the memory access latency model by fitting the function g.
The memory resource dynamic control method based on memory access and performance modeling as claimed in claim 1, wherein the performance loss is:

Among them, A is the number of execution clock cycles in the core, B is the number of execution clock cycles outside the core, and Lats is the increase ratio of memory access delay.
The method for dynamically controlling memory resources based on memory access and performance modeling according to claim 1, wherein step 4 comprises: using token bucket technology to limit the DRAM memory access flow of processes other than the target process.
The memory resource dynamic control method based on memory access and performance modeling as claimed in claim 1, characterized in that,

Each core of a multi-core system has an independent token bucket, and a certain number of tokens are automatically added to the token bucket every certain period, and the token bucket has a maximum token capacity, and all memory access requests issued by the core All will go through the token bucket, and any access request data packet will be marked when it enters the token bucket, record the time when it enters the token bucket, and judge whether there is an available token in the token bucket, and if so, the data The packet will be sent to the lower layer, and the number of tokens in the token bucket will decrease according to the data volume of the memory access request, otherwise, the memory access request will be sent to the waiting queue.
A memory resource dynamic control system based on memory access and performance modeling, characterized in that it includes:

Module 1 is used to use the historical memory access request information of the preset process to DRAM alone as the training data in the multi-core system, and use the delay corresponding to the historical memory access request information as the training target to train the neural network model to obtain the memory access delay Model;

Module 2, used to record the target memory access request of the target process and input it into the memory access delay model when the multi-core system runs with multiple processes, and obtain the memory access of the target memory access request without inter-process interference Delay t solo , and detect the actual memory access delay t mix of the target memory access request at the same time, and divide the memory access delay t solo by the memory access delay t mix to obtain the increase ratio of memory access delay;

Module 3, which is used to count the number of execution clock cycles inside and outside the core of the target process, and in combination with the increase ratio of the memory access delay, to obtain the performance loss of the target process when it is running in multiple processes, relative to its single operation;

Module 4 is used to limit the DRAM access traffic of processes other than the target process when the performance loss is greater than a threshold, so as to dynamically allocate DRAM bandwidth resources in real time and ensure the service quality of the target process.
The memory resource dynamic control system based on memory access and performance modeling according to claim 6, wherein the historical memory memory request information includes the current request information h0 of the access target Bank and the past k pieces of memory memory memory history hi (i=1,...,k), where h i (i=0,...,k) includes the issue time t i of hi , the row address row i and the column address col i accessed by hi , and the memory access delay model The input is the difference between the memory access history and the current request information t 0 -t i , row 0 -row i and col 0 -col i , and the output of the memory access delay model is the memory access delay Latency of the current request information h 0 = g(h 0 ,…,h k ), complete the training of the memory access latency model by fitting the function g.
The memory resource dynamic control system based on memory access and performance modeling as claimed in claim 6, wherein the performance loss is:

Among them, A is the number of execution clock cycles in the core, B is the number of execution clock cycles outside the core, and Lats is the increase ratio of memory access delay.
The memory resource dynamic control system based on memory access and performance modeling according to claim 6, wherein the module 4 includes: using token bucket technology to limit the DRAM memory access flow of processes other than the target process.
The memory resource dynamic control system based on memory access and performance modeling as claimed in claim 6, wherein,

Each core of a multi-core system has an independent token bucket, and a certain number of tokens are automatically added to the token bucket every certain period, and the token bucket has a maximum token capacity, and all memory access requests issued by the core All will go through the token bucket, and any access request data packet will be marked when it enters the token bucket, record the time when it enters the token bucket, and judge whether there is a token available in the token bucket, and if so, the data The packet will be sent to the lower layer, and the number of tokens in the token bucket will decrease according to the data volume of the memory access request, otherwise, the memory access request will be sent to the waiting queue.