CN113505084A - Memory resource dynamic regulation and control method and system based on memory access and performance modeling - Google Patents

Memory resource dynamic regulation and control method and system based on memory access and performance modeling Download PDF

Info

Publication number
CN113505084A
CN113505084A (application CN202110702890.0A)
Authority
CN
China
Prior art keywords
memory access
access
memory
delay
token bucket
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110702890.0A
Other languages
Chinese (zh)
Other versions
CN113505084B (en
Inventor
徐易难
周耀阳
王卅
唐丹
孙凝晖
包云岗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN202110702890.0A priority Critical patent/CN113505084B/en
Publication of CN113505084A publication Critical patent/CN113505084A/en
Priority to PCT/CN2022/070519 priority patent/WO2022267443A1/en
Application granted granted Critical
Publication of CN113505084B publication Critical patent/CN113505084B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/0223User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/023Free address space management
    • G06F12/0238Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory
    • G06F12/0246Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory in block erasable memory, e.g. flash memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a memory resource dynamic regulation and control method and system based on memory access and performance modeling. It guarantees the quality of service of key applications on real-time multi-core hardware through dynamic partitioning of memory bandwidth resources, providing a non-invasive solution with fine granularity, high precision and quick response. The invention designs the overall architecture of an automatic process-performance regulation mechanism: hardware directly obtains the priority of upper-layer applications through a label mechanism, thereby providing differentiated hardware resource allocation for processes with different priorities. The Bank structure of the dynamic random access memory is delay-modeled based on a machine learning method. For the problem of guaranteeing the quality of service of key applications in a real-time multi-core environment, dynamically adjusting the memory bandwidth allocation effectively reduces the memory access interference of other processes on the key process and accurately guarantees the quality of service of the high-priority process.

Description

Memory resource dynamic regulation and control method and system based on memory access and performance modeling
Technical Field
The invention belongs to the technical field of critical application service quality assurance in a real-time multi-core system scene, and particularly relates to a memory resource dynamic regulation and control method and system based on memory access and performance modeling.
Background
In a real-time system, the quality of service of key applications must be guaranteed, which at the hardware level means guaranteeing the amount of hardware resources allocated to key processes. As applications' demand for computing resources keeps growing, scenarios such as cloud computing, smartphones and 5G base stations place ever higher processing requirements on computer hardware, and multi-core has become the standard configuration of almost all real-time systems. However, in a multi-core scenario, multiple applications running on the same processor may contend for hardware resources, causing performance fluctuation and in turn affecting the performance of the real-time system.
Therefore, there are some works around the problem of how to accurately and efficiently control the allocation of hardware resources among different applications and guarantee the quality of service of critical applications in a system with real-time requirements.
Intel Xeon series processors are equipped with Resource Director Technology (RDT), which includes cache monitoring technology, cache allocation technology, memory bandwidth monitoring technology, memory bandwidth allocation technology, and the like. The operating system uses these resource allocation technologies to monitor the cache and bandwidth usage of different cores and adjusts the amount of resources available to a single core by directly assigning a resource allocation proportion, thereby reducing performance interference and guaranteeing the performance of key loads in a complex environment.
The Application Slowdown Model (ASM) combines analysis of the shared cache and main memory; it holds that for a memory-access-bound application, performance is proportional to the issue rate of memory access requests, and that a process can reach its maximum memory access bandwidth when given the highest priority. ASM reduces interference on the memory access path as much as possible on one hand, and quantifies interference in the shared cache on the other, thereby periodically evaluating performance loss and realizing feedback-based dynamic adjustment of hardware resources.
Intel RDT only allows static partitioning of resources and the allocation amount is based only on the requirements of known sensitive applications. In actual operation, because hardware does not sense program requirements, RDT relies on manual control of software (operating system or user), hardware cannot dynamically adjust resource quantity during operation, and because software generally has a coarse regulation granularity, hardware resources are wasted and overall system performance is negatively affected.
The performance loss model of the application program does not have architecture universality, and due to the existence of a large amount of shared resources, in order to realize the control of hardware resources, the ASM needs to carry out large-range invasive modification on components such as a system bus, a memory controller, a prefetcher and the like, so that the support of the ASM on priority is ensured, and the realization cost is very high. Due to the fact that hardware implementation details need to be considered, modeling complexity is increased, and migration of the resource competition evaluation model between different platforms becomes difficult.
The existing methods all suffer from insufficient application generality: heuristic rules are used to judge inter-core interference in specific scenarios; RDT hardware cannot automatically identify and adjust the resource partitioning; and ASM assumes that a memory-access-bound application reaches its maximum memory access bandwidth under the highest priority in order to quantify the bandwidth loss caused by inter-core interference, but this premise does not necessarily hold.
Disclosure of Invention
The invention aims to overcome the defects that the prior art can only realize static division, does not have architecture universality and does not have application universality, and provides a key application service quality guarantee method and a system based on memory access delay prediction, performance loss prediction and bandwidth dynamic adjustment technology.
Aiming at the defects of the prior art, the invention provides a memory resource dynamic regulation and control method based on memory access and performance modeling, which comprises the following steps:
step 1, taking, as training data, historical memory access request information of a preset process accessing the DRAM alone in the multi-core system, and taking the delays corresponding to the historical memory access request information as the training target, training a neural network model to obtain a memory access delay model;
step 2, when the system runs multiple processes, recording a target memory access request of a target process and inputting it into the memory access delay model to obtain the memory access delay t_solo the target request would have in the absence of multi-process interference; simultaneously measuring the actual memory access delay t_mix of the target request, and dividing t_mix by t_solo to obtain the memory access delay increase ratio;
step 3, counting the numbers of clock cycles the target process executes inside and outside the core, and combining them with the memory access delay increase ratio to obtain the performance loss of the target process when running with multiple processes relative to running alone;
and 4, when the performance loss is larger than a threshold value, limiting the DRAM access flow of the process except the target process so as to dynamically allocate DRAM bandwidth resources in real time and ensure the service quality of the target process.
The memory resource dynamic regulation and control method based on memory access and performance modeling, wherein the historical memory access request information comprises the current request h_0 to a target Bank and the past k memory access histories h_i (i = 1, …, k), where each h_i (i = 0, …, k) includes the time t_i at which h_i was issued, and the row address row_i and column address col_i accessed by h_i; the inputs of the memory access delay model are the differences t_0 - t_i, row_0 - row_i and col_0 - col_i between each memory access history and the current request information; the output of the memory access delay model is the memory access delay g(h_0, …, h_k) of the current request h_0; and the training of the memory access delay model is completed by fitting the function g.
The memory resource dynamic regulation and control method based on memory access and performance modeling, wherein the performance loss is:

PerfLoss = (A + B) / (A + B / LatS)

wherein A is the number of clock cycles of in-core execution, B is the number of clock cycles of out-of-core execution, and LatS is the memory access delay increase ratio.
The memory resource dynamic regulation and control method based on memory access and performance modeling, wherein the step 4 comprises the following steps: using token bucket technology, DRAM access traffic for processes other than the target process is limited.
The memory resource dynamic regulation and control method based on memory access and performance modeling, wherein
Each core of the multi-core system is provided with an independent token bucket; a fixed number of tokens is automatically added to the token bucket at regular intervals, and the token bucket has a maximum token capacity; all memory access requests issued by a core pass through its token bucket; each memory access request packet is tagged on entering the token bucket, and its entry time is recorded; whether an available token exists in the token bucket is then judged: if so, the packet is sent to the lower layer, and the number of tokens in the bucket is reduced according to the data volume of the request; if not, the request is sent to a waiting queue.
The invention also provides a memory resource dynamic regulation and control system based on memory access and performance modeling, which comprises:
the module 1 is used for taking, as training data, historical memory access request information of a preset process accessing the DRAM alone in the multi-core system, and taking the delays corresponding to the historical memory access request information as the training target, training a neural network model to obtain a memory access delay model;
module 2, used for, when the system runs multiple processes, recording a target memory access request of a target process and inputting it into the memory access delay model to obtain the memory access delay t_solo the target request would have in the absence of multi-process interference, simultaneously measuring the actual memory access delay t_mix of the target request, and dividing t_mix by t_solo to obtain the memory access delay increase ratio;
a module 3, configured to count the numbers of clock cycles the target process executes inside and outside the core and combine them with the memory access delay increase ratio to obtain the performance loss of the target process when running with multiple processes relative to running alone;
and the module 4 is used for limiting the DRAM access flow of the process except the target process when the performance loss is greater than a threshold value so as to dynamically allocate DRAM bandwidth resources in real time and guarantee the service quality of the target process.
The memory resource dynamic regulation and control system based on memory access and performance modeling, wherein the historical memory access request information comprises the current request h_0 to a target Bank and the past k memory access histories h_i (i = 1, …, k), where each h_i (i = 0, …, k) includes the time t_i at which h_i was issued, and the row address row_i and column address col_i accessed by h_i; the inputs of the memory access delay model are the differences t_0 - t_i, row_0 - row_i and col_0 - col_i between each memory access history and the current request information; the output of the memory access delay model is the memory access delay g(h_0, …, h_k) of the current request h_0; and the training of the memory access delay model is completed by fitting the function g.
The memory resource dynamic regulation and control system based on memory access and performance modeling, wherein the performance loss is:

PerfLoss = (A + B) / (A + B / LatS)

wherein A is the number of clock cycles of in-core execution, B is the number of clock cycles of out-of-core execution, and LatS is the memory access delay increase ratio.
The memory resource dynamic regulation and control system based on memory access and performance modeling, wherein the module 4 comprises: using token bucket technology, DRAM access traffic for processes other than the target process is limited.
The memory resource dynamic regulation and control system based on memory access and performance modeling is characterized in that
Each core of the multi-core system is provided with an independent token bucket; a fixed number of tokens is automatically added to the token bucket at regular intervals, and the token bucket has a maximum token capacity; all memory access requests issued by a core pass through its token bucket; each memory access request packet is tagged on entering the token bucket, and its entry time is recorded; whether an available token exists in the token bucket is then judged: if so, the packet is sent to the lower layer, and the number of tokens in the bucket is reduced according to the data volume of the request; if not, the request is sent to a waiting queue.
According to the scheme, the invention has the advantages that:
the invention provides a technology for guaranteeing the quality of service of key application through dynamic memory bandwidth resource division on real-time multi-core hardware, and provides a non-invasive solution with fine granularity, high precision and quick response. The invention designs the overall architecture of a process performance automatic regulation mechanism, and hardware directly acquires the priority of upper application through a label mechanism, thereby providing differentiated hardware resource allocation for processes with different priorities. The method is characterized in that a body (Bank) structure of a Dynamic Random Access Memory (DRAM) is innovatively subjected to delay modeling based on a machine learning method, the prediction accuracy can reach over 90% in most scenes, and the average error is 2.78%. The performance loss of the process relative to the single operation of the process is estimated based on the memory access delay, and the average error is only 8.78 percent, which is better than that of the prior related art. Aiming at the problem of guaranteeing the service quality of key application, under the real-time multi-core environment, the memory bandwidth allocation is dynamically adjusted to effectively reduce the memory access interference of other processes to the key process, and the service quality of the high-priority process is accurately guaranteed to reach 90% of that of the high-priority process when the high-priority process operates alone.
Drawings
FIG. 1 is a schematic diagram of the position of the AutoMBA in the system and the composition structure thereof;
FIG. 2 is a schematic diagram of the input and output of a multi-layered perceptron model;
FIG. 3 is a schematic diagram showing that the execution time of an in-order processor can be divided into in-core and out-of-core portions;
fig. 4 is a schematic diagram illustrating the operation of the token bucket mechanism.
Detailed Description
When the inventor conducts multi-core program performance analysis and dynamic resource adjustment optimization research, the inventor finds that the prior art does not combine low-level hardware information such as delay, bandwidth and memory access characteristics of a program with high-level software information, so that the problems of hardware information loss, unknown actual software performance, complex control technology and the like are caused. The inventor finds that solving the problem can be realized by modeling by combining information such as delay, bandwidth and access characteristics with actual performance loss of software, deducing on line on hardware to obtain estimated performance loss of the software, and performing token bucket-based memory bandwidth allocation in a feedback manner according to continuous observation results. Specifically, the present application includes the following key technical points:
Key point 1: using a machine learning method, memory delay is modeled offline based on historical memory access address sequences, with an average error rate of 2.84% on the SPEC CPU2006 benchmarks. Here, offline means that the model is built before the program runs (training is completed offline); there is no need to build the model online during program execution.
Key point 2: using a machine learning method, the performance loss of a program under multi-core conditions is modeled offline based on information such as the actually measured memory access delay, the estimated ideal memory access delay, and the program's memory access bandwidth and frequency, with an average error of 8.78% on the SPEC CPU2006 benchmarks.
Key point 3: the memory access bandwidth of a program is controlled using a quota-based token bucket technique; by adjusting token bucket parameters, the memory bandwidth allocation of different programs is dynamically controlled, thereby guaranteeing the performance of key programs and achieving the set ideal performance target of 90% (standard deviation 4.19%).
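The quota-based token bucket control of key point 3 can be sketched as a minimal software model (the patent specifies a hardware mechanism; the class, parameter names and cycle-based refill policy below are illustrative assumptions, not the patented implementation):

```python
# Illustrative sketch of a per-core token bucket: periodic refill up to a
# maximum capacity, token consumption proportional to request data volume,
# and a waiting queue for requests that find no available tokens.
from collections import deque

class TokenBucket:
    def __init__(self, capacity, refill_tokens, refill_interval):
        self.capacity = capacity            # maximum token capacity
        self.tokens = capacity
        self.refill_tokens = refill_tokens  # tokens added per interval
        self.refill_interval = refill_interval
        self.wait_queue = deque()           # requests lacking tokens

    def tick(self, cycle):
        # Periodically add a fixed number of tokens, capped at capacity,
        # then try to release queued requests in arrival order.
        if cycle % self.refill_interval == 0:
            self.tokens = min(self.capacity, self.tokens + self.refill_tokens)
            self._drain()

    def request(self, cycle, size):
        # Tag the request with its entry time, then check token availability.
        req = {"enter_cycle": cycle, "size": size}
        if self.tokens >= size:
            self.tokens -= size  # consume tokens per request data volume
            return "sent"
        self.wait_queue.append(req)
        return "queued"

    def _drain(self):
        while self.wait_queue and self.tokens >= self.wait_queue[0]["size"]:
            req = self.wait_queue.popleft()
            self.tokens -= req["size"]
```

Shrinking `refill_tokens` or `capacity` for non-critical cores throttles their DRAM traffic, which is the knob the dynamic regulation mechanism turns when the target process's performance loss exceeds the threshold.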
In order to make the aforementioned features and effects of the present invention more comprehensible, embodiments accompanied with figures are described in detail below.
As shown in fig. 1, the core technical principle of the present invention includes: (1) based on the memory access request sequence of single application and the time stamps of the memory access request sequence under the multi-core environment, the main memory behavior is consistently simulated by modeling the memory, the delay of the memory access request in the independent operation is estimated on line, and the memory access delay under the actual mixed environment is combined to obtain the memory access delay improvement ratio (Latency Scale-up, LatS); (2) counting the number of clock cycles of the process during the in-core execution and the out-core execution, and calculating by combining with the LatS to obtain the execution time of the process during the independent operation, thereby quantifying the interference on the process; (3) and dynamically allocating memory bandwidth resources in real time by using a token bucket technology according to the priority relations of different cores, and ensuring the service quality of key application.
The technical scheme of the invention is implemented as follows:
(1) DRAM memory access delay modeling
In order to obtain, in real time under a multi-core environment, the delay a process's memory access request would have in a solo-run environment, the invention proposes: (1) modeling the memory access delay of the DRAM, regarding the DRAM Bank structure as a black-box model, ShadowDRAM; the invention models the whole DRAM, and since the behavior of each Bank in the DRAM is similar, the same black-box model is used for different Banks; its input is the relevant information of the current and past k memory access requests, and its output is the delay of the current memory access request; (2) during multi-process operation, process information of the upper-layer application is obtained through the labeling mechanism, and the memory access requests of a single process are recorded by hardware and input into the DRAM memory access delay model ShadowDRAM to obtain the delay t_solo of the memory access request in the absence of multi-core interference.
The present invention recognizes that the current state of the DRAM and its controller is dependent on all historical access requests, and if a current access request input is given based thereon, the access latency of that request can be accurately predicted by modeling. The reason is that for a memory access component, the input signal is non-trivial only when the memory access request is received, and the change mode of the internal state cannot be influenced at other moments, namely, the memory access component is a Moore type sequential circuit when no memory access request exists. Thus, the internal state of the DRAM and its controller need only be determined from past meaningful input signals, i.e., a historical access request sequence.
More specifically, the inventors have discovered that the latency of a DRAM access request can be determined by the sequence of memory requests that access the same Bank within a short time window before and after that request. This is because (1) the row buffer state in a DRAM Bank has the greatest effect on access latency, and that state is determined by the most recent requests to access the Bank rather than by older ones; (2) the non-blocking cache brings memory-level parallelism, that is, the cache continuously issues multiple memory access requests without waiting for replies and may receive reply data out of order, so the processing of a single request is advanced or delayed by all requests within a short time before and after it.
The invention provides that only when no access request of the same Bank is sent in a time period after a memory access request q is sent and before reply data is received, a DRAM delay prediction model (black box model) samples q and predicts the delay of the q. The reasons are mainly as follows: (1) the prediction of the delay requires relative real-time, and the time difference between two consecutive requests for a Bank access can be large, so the input of the model cannot contain future information; (2) the sampling condition limitation ensures that future requests cannot influence the processing time of the request q, so that the historical memory access request sequence information input by the model is enough to perform delay prediction of the current request; (3) the calculation time required by the model is possibly longer, and part of requests are selected by a sampling method to predict delay and performance loss, so that sufficient time can be reserved for calculation, and the power consumption of the system can be reduced.
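The sampling condition above can be expressed as a small predicate (a hedged sketch; the function and field names are our assumptions, and a hardware implementation would track this with in-flight request state rather than a scan):

```python
# A request q is sampled for delay prediction only if no other request to the
# same Bank is issued between q's issue time and the arrival of q's reply,
# so that future requests cannot influence q's processing time.

def should_sample(q, all_requests):
    # q and entries of all_requests are dicts with keys: bank, issue, reply.
    for r in all_requests:
        if r is q or r["bank"] != q["bank"]:
            continue
        # another same-Bank request issued inside q's in-flight window
        if q["issue"] < r["issue"] < q["reply"]:
            return False
    return True
```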
In view of the above discussion, to find a generic latency prediction mechanism that minimizes unnecessary study of the internal structure of the DRAM, and based on the assumption that the current state of the DRAM and its controller can be determined by part of the historical access requests to the current Bank, the present invention attempts to model the Bank structure in the DRAM using a limited number of access histories of the corresponding Bank, and to predict the latency of the current request by sampling the requests that satisfy certain conditions. Such conditions may be, for example, sampling one of every 100 memory access requests, or requiring that no new same-Bank request is issued in the period after a request is issued and before its reply is received; other filtering conditions are also applicable.
The invention provides that the DRAM Bank is modeled by using a machine learning method, and the method has the advantages that: (1) predicting the delay of the current memory access request at a relatively high precision by utilizing basic arithmetic operation and a small amount of historical information storage calculation; (2) the machine learning model has universality and universality, the requirements of the training process on the DRAM controller and the DRAM particle internal information are very little, and the memory access information training model only needs to be grabbed again when the memory access information training model is transplanted on different platforms; (3) based on the partial sampling of the memory access request, the memory access delay prediction number is effectively reduced, and the dynamic power consumption of the system is reduced.
The invention adopts a machine learning method and models the DRAM Bank based on a Multi-Layer Perceptron (MLP) model, for the following reasons: (1) the MLP is broadly general and, under different parameters, can accurately simulate DRAM Banks under different configurations, so the memory access delay prediction mechanism can be reused across platforms, effectively reducing porting workload; (2) the MLP model and the ReLU activation function involve no complex function operations; results can be produced by simple multiply-add operations alone, and there is no need to maintain the large numbers of queues and state machines in the DRAM and its controller, effectively reducing the power overhead of the delay prediction mechanism.
As shown in FIG. 2, the present invention uses the current request h_0 to a target Bank and the past k memory access histories h_i (i = 1, …, k) to predict memory access delay, where each h_i (i = 0, …, k) contains the time t_i at which h_i was issued, and the row address row_i and column address col_i accessed by h_i; the actual memory access history input to the model is represented by the differences between the historical request information and the current request information. In the training and prediction of the model, for each memory access history h_i we use t_0 - t_i, row_0 - row_i and col_0 - col_i to represent its relationship to the current request; the output of the model is the memory access delay of the current request, g(h_0, …, h_k), and the training process of the model continuously fits the function g. The memory access delay of the current request can be obtained by running some preset threads or programs in advance and capturing the memory access traces and delays of the corresponding requests.
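The feature construction and the multiply-add-plus-ReLU structure described above can be sketched as follows (an illustrative toy: the function names and the fixed weights are our assumptions, standing in for parameters that would come from offline training on captured traces):

```python
# Encode each history entry h_i relative to the current request h_0 as
# (t_0 - t_i, row_0 - row_i, col_0 - col_i), then run a one-hidden-layer
# MLP forward pass with ReLU to produce a predicted access delay.

def encode(history, k):
    # history[0] is the current request h_0; history[1..k] are past requests
    # to the same Bank, each a dict with keys t, row, col.
    h0 = history[0]
    feats = []
    for hi in history[1:k + 1]:
        feats += [h0["t"] - hi["t"], h0["row"] - hi["row"], h0["col"] - hi["col"]]
    return feats

def relu(x):
    return x if x > 0 else 0.0

def mlp_predict(feats, w_hidden, b_hidden, w_out, b_out):
    # One hidden layer with ReLU, scalar output: predicted access latency.
    hidden = [relu(sum(w * f for w, f in zip(ws, feats)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out
```

Only basic arithmetic and a small history buffer are needed at inference time, which is why the patent argues the mechanism is cheap enough for hardware.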
(2) Program performance loss modeling
From the microarchitectural viewpoint, the execution of a process can be divided into an in-core part and an out-of-core part: the in-core flow includes the arithmetic units and accesses to the core-private caches, while the out-of-core part includes read requests issued to the DRAM, the DRAM's internal processing, and the like.
As shown in FIG. 3, in an in-order processor, after a memory access instruction causes an L2 cache miss, the in-core pipeline stalls and execution continues only after the DRAM returns data. Therefore, during program execution, the CPU clock cycles of an in-order processor can be strictly divided into in-core time and out-of-core time, of which only the out-of-core time is subject to multi-core interference.
In an out-of-order processor, instructions that are located later under certain conditions may be executed in advance, and therefore a cache miss does not necessarily cause a stall of the CPU core internal pipeline. However, from the overall system perspective, in any one clock cycle, either the out-of-core Memory requests are being processed or all the Memory requests are completed, and these two types of clock Cycles are respectively referred to as Memory Cycles (MCs) and Non-Memory Cycles (NMCs).
On the other hand, in a multi-core architecture, resource contention among the cores causes interference: the latency of requests traveling from L2 through the system bus to the DRAM controller increases as a whole. For example, if the system bus is occupied by another core when a request is issued, the request must wait for the other core's request to finish transmitting before it can be sent. The largest source of inter-core interference lies inside the DRAM Bank. The row buffer mechanism is similar to a cache: when a program has good locality, its access latency is greatly reduced. In a multi-core environment, however, the access requests of other processes are likely to be inserted between two consecutive access requests of a single process, so the locality of DRAM accesses is destroyed and the access latency suffers.
In view of the above two aspects, the present invention observes that, when a process runs simultaneously with other processes on a multi-core architecture, its number of MCs increases compared with running alone, while its number of NMCs stays the same. In a multi-core environment, by checking whether there is currently an uncompleted DRAM access request, hardware can classify each clock cycle as an NMC or an MC and count the number of each type within a time interval, denoted A and B respectively. Because of inter-core interference, memory access request latency grows and the number of MCs increases; the higher the increase ratio, the larger the interference caused by resource contention and the larger the performance loss of the process.
Based on this, the invention proposes a memory-access-latency-based performance loss estimation (SEMAL) model: after obtaining the process's single-core memory access latency tsolo through machine learning, we use the actual multi-core memory access latency tmix to compute the latency Scale-up, LatS = tmix/tsolo, which estimates the increase ratio of the process's MC count. Compared with the mixed-run scenario, when the process executes the same stage alone, its number of NMCs is still A, but its number of MCs shrinks by the LatS ratio; that is, the execution time the process would need when running alone is A + B/LatS.
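A minimal numeric sketch of this estimate (the function name is ours; the arithmetic follows the text):

```python
def estimate_solo_cycles(A, B, t_solo, t_mix):
    """Given NMC count A and MC count B measured in the mixed run, and the
    predicted solo latency t_solo vs. measured mixed latency t_mix,
    estimate the cycles the process would need when running alone."""
    lat_s = t_mix / t_solo   # latency scale-up, LatS = t_mix / t_solo
    return A + B / lat_s     # NMCs unchanged, MCs shrink by LatS

# Example: A = 1000 NMCs, B = 500 MCs, latency doubled under contention.
print(estimate_solo_cycles(1000, 500, t_solo=50, t_mix=100))  # 1250.0
```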
Therefore, in a multi-core environment, based on the relative extension of the execution time of the process, we calculate the performance loss of the process relative to running alone as:
Performance_Loss = (Execution_Time_mix - Execution_Time_solo) / Execution_Time_solo = ((A + B) - (A + B/LatS)) / (A + B/LatS)
where Execution_Time_solo is the execution time of the process or program when running alone, and Execution_Time_mix is its execution time under mixed (multi-process) execution.
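Using the quantities defined in the text (mixed execution time A + B cycles, estimated solo execution time A + B/LatS cycles), the performance loss can be computed directly. This is a sketch; the function name and example values are illustrative:

```python
def performance_loss(A, B, lat_s):
    """Relative slowdown of the process versus running alone."""
    t_mix = A + B            # measured cycles in the mixed run
    t_solo = A + B / lat_s   # estimated cycles when running alone
    return (t_mix - t_solo) / t_solo

# Example: with A = 1000, B = 500 and LatS = 2, the process takes
# 1500 cycles instead of an estimated 1250, a 20% performance loss.
print(performance_loss(1000, 500, 2))  # 0.2
```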
(3) Service quality guarantee technology based on token bucket
Based on the SEMAL model, the invention proposes an Automatic Memory Bandwidth Allocation (AutoMBA) mechanism: using SEMAL, the running state of a program can be monitored dynamically at runtime to obtain its degree of performance loss, and its bandwidth demand in the optimal (solo-run) case can be predicted from its memory access bandwidth during mixed execution. Using the token bucket technique, the system can then regulate the memory access bandwidth of different cores and preferentially guarantee the quality of service of critical applications by throttling low-priority memory traffic.
The Token Bucket (TB) is the basic tool of AutoMBA and can effectively and precisely control the memory access bandwidth of different cores. As shown in FIG. 4, each CPU core has a private, independent token bucket: a certain number of tokens (Inc) is added automatically at a fixed interval (Freq), and the bucket has a maximum capacity (Size). All memory access requests issued by the core pass through the token bucket; every request packet is marked when it enters the bucket, and its entry time is recorded. If tokens are available in the bucket (e.g., PACKET0 sent at time t0), the packet is forwarded to the lower layer and the token count is reduced according to the amount of data requested. Otherwise, if no tokens remain in the bucket (e.g., PACKET1 and PACKET2 sent at times t1 and t2), the request is blocked and placed in a wait queue. A timer synchronized with the system clock is kept in the token bucket; each time it reaches Freq it resets and triggers the automatic addition of Inc tokens. At that point, if tokens are available, requests in the wait queue are forwarded to the lower layer and the token count is reduced accordingly; for example, at time t3 the previously blocked PACKET2 and PACKET3 are released, and if the data amount of each of the two requests is 1, ntokens is decreased by 2.
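The token bucket behavior described above can be sketched as a small simulation. Size, Inc, and Freq are the parameters from the text; the class layout, initial token count, and per-request cost of 1 are illustrative assumptions:

```python
from collections import deque

class TokenBucket:
    def __init__(self, size, inc, freq, ntokens):
        self.size, self.inc, self.freq = size, inc, freq
        self.ntokens = ntokens   # current token count
        self.waitq = deque()     # blocked requests
        self.timer = 0           # synchronized with the system clock
        self.released = []       # packets forwarded to the lower layer

    def send(self, packet, cost=1):
        # Forward immediately if tokens are available, otherwise block
        # the request in the wait queue.
        if self.ntokens >= cost:
            self.ntokens -= cost
            self.released.append(packet)
        else:
            self.waitq.append((packet, cost))

    def tick(self):
        # Every Freq cycles the timer resets, Inc tokens are added
        # (capped at Size), and blocked requests are retried in order.
        self.timer += 1
        if self.timer == self.freq:
            self.timer = 0
            self.ntokens = min(self.size, self.ntokens + self.inc)
            while self.waitq and self.ntokens >= self.waitq[0][1]:
                pkt, cost = self.waitq.popleft()
                self.ntokens -= cost
                self.released.append(pkt)

# Example in the spirit of FIG. 4: PACKET0 passes at once, PACKET2 and
# PACKET3 block until the next refill releases both, consuming 2 tokens.
tb = TokenBucket(size=4, inc=2, freq=2, ntokens=1)
tb.send("PACKET0")                       # token available: forwarded
tb.send("PACKET2"); tb.send("PACKET3")   # no tokens left: queued
tb.tick(); tb.tick()                     # Freq reached: +Inc, queue drains
print(tb.released, tb.ntokens)  # ['PACKET0', 'PACKET2', 'PACKET3'] 0
```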
In each Sampling Interval (SI): (1) the TB of each core runs automatically with its configured parameters, limiting the core's issuing of memory access requests according to the number of remaining tokens; (2) a latency prediction module (LPM) records the access history of the high-priority core (or target process) and predicts the latency tsolo that the core's memory access requests would have when running alone; (3) the token bucket controller (TBM) processes the access requests emitted by the TBs, forwards requests from the different cores to the DRAM controller, obtains the predicted tsolo from the LPM, measures the actual memory access latency tmix, and, using the ratio of tsolo to tmix together with the core's memory access clock cycle counts, estimates the performance loss of the target process relative to running alone, recording it in preset registers.
One Update Interval (UI) consists of multiple sampling intervals. At the end of each UI, the AutoMBA mechanism evaluates the degree of performance loss suffered by the target process during the past UI. When the performance loss of the target process is excessive, the TBM automatically limits the memory access traffic of the other cores, reducing inter-core interference and improving the performance of the target process; when the performance of the target process meets the requirement, the traffic control on the remaining cores can be relaxed.
The control algorithm of AutoMBA is divided into two steps, Observe and Act. Observe is performed at the end of each SI: combining the memory access latency with the memory cycle counts, the hardware calculates the performance loss of the process and sets the corresponding counters. Act is performed at the end of each UI: if the hardware finds that the target process suffered less than 10% performance loss in most SIs, the maximum traffic allowed to the other cores is increased, and the more SIs satisfy this condition, the larger the increase; when the performance loss of the target process exceeds 50% in no fewer than 3 SIs, the traffic allowed to the remaining cores is directly halved; when the performance loss falls in the remaining interval, the corresponding token bucket Inc parameters are adjusted.
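The Act step can be sketched as a policy over the per-SI loss samples of the past UI. The 10%/50% thresholds and the 3-SI rule come from the text; the step sizes, bounds, and the handling of the middle band are illustrative assumptions:

```python
def act(loss_per_si, inc, inc_min=1, inc_max=64):
    """Return the new token-bucket Inc parameter for the non-target cores."""
    good = sum(1 for x in loss_per_si if x < 0.10)  # SIs with <10% loss
    bad = sum(1 for x in loss_per_si if x > 0.50)   # SIs with >50% loss
    if bad >= 3:                         # heavy loss: halve allowed traffic
        return max(inc_min, inc // 2)
    if good > len(loss_per_si) // 2:     # mostly healthy: raise allowance,
        return min(inc_max, inc + good)  # more good SIs -> larger increase
    return inc  # middle band: the text adjusts Inc here (policy unspecified)

print(act([0.05] * 8, inc=8))             # 16: all 8 SIs healthy
print(act([0.6, 0.7, 0.55, 0.1], inc=8))  # 4: 3 SIs above 50% loss
```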
The following are system embodiments corresponding to the above method embodiments, and this embodiment can be implemented in cooperation with the above embodiments. The technical details mentioned in the above embodiments remain valid in this embodiment and, to reduce repetition, are not repeated here. Correspondingly, the technical details mentioned in this embodiment can also be applied to the above embodiments.
The invention also provides a memory resource dynamic regulation and control system based on memory access and performance modeling, which comprises:
the module 1 is used for training a neural network model, in a multi-core system, with the historical memory access request information of a preset process accessing the DRAM alone as training data and the latency corresponding to that historical memory access request information as the training target, to obtain a memory access latency model;
module 2, for recording, when the multi-core system runs multiple processes, a target memory access request of the target process and inputting it into the memory access latency model to obtain the latency tsolo the target memory access request would have without multi-process interference, while measuring the actual latency tmix of the target memory access request, and dividing the latency tmix by the latency tsolo to obtain the memory access latency increase ratio;
a module 3, configured to count the numbers of clock cycles the target process executes inside and outside the core and, combined with the memory access latency increase ratio, obtain the performance loss of the target process when running with multiple processes relative to running alone;
and the module 4 is used for limiting the DRAM access traffic of processes other than the target process when the performance loss exceeds a threshold, so as to dynamically allocate DRAM bandwidth resources in real time and guarantee the quality of service of the target process.
The above memory resource dynamic regulation and control system based on memory access and performance modeling, wherein the historical memory access request information includes the current request information h0 accessing a target Bank and the past k memory access histories hi (i = 1, …, k), where each hi (i = 0, …, k) includes the time ti at which hi was issued, the row address rowi accessed by hi, and the column address coli; the input of the memory access latency model is the differences t0-ti, row0-rowi, and col0-coli between the memory access histories and the current request information; the output of the memory access latency model is the memory access latency g(h0, …, hk) of the current request information h0; and the training of the memory access latency model is completed by fitting the function g.
The memory resource dynamic regulation and control system based on memory access and performance modeling is characterized in that the performance loss is as follows:
Performance_Loss = ((A + B) - (A + B/LatS)) / (A + B/LatS) = (B - B/LatS) / (A + B/LatS)
wherein A is the number of clock cycles executed inside the core, B is the number of clock cycles executed outside the core, and LatS is the memory access latency increase ratio.
The memory resource dynamic regulation and control system based on memory access and performance modeling, wherein the module 4 is configured to limit, using the token bucket technique, the DRAM access traffic of processes other than the target process.
The memory resource dynamic regulation and control system based on memory access and performance modeling is characterized in that
each core of the multi-core system is provided with an independent token bucket; a certain number of tokens is automatically added to the token bucket at a fixed interval, and the token bucket has a maximum token capacity; all memory access requests issued by a core pass through its token bucket; every access request packet is marked when it enters the token bucket and its entry time is recorded; the token bucket judges whether an available token exists: if so, the packet is sent to the lower layer and the number of tokens in the bucket is reduced according to the data amount of the access request; if not, the access request is placed in a wait queue.

Claims (10)

1. A memory resource dynamic regulation and control method based on memory access and performance modeling is characterized by comprising the following steps:
step 1, in a multi-core system, taking the historical memory access request information of a preset process accessing the DRAM alone as training data and the latency corresponding to that historical memory access request information as the training target, training a neural network model to obtain a memory access latency model;
step 2, when the multi-core system runs multiple processes, recording a target memory access request of a target process and inputting it into the memory access latency model to obtain the latency tsolo the target memory access request would have without multi-process interference, while measuring the actual latency tmix of the target memory access request, and dividing the latency tmix by the latency tsolo to obtain the memory access latency increase ratio;
step 3, counting the numbers of clock cycles the target process executes inside and outside the core and, combined with the memory access latency increase ratio, obtaining the performance loss of the target process when running with multiple processes relative to running alone;
and step 4, when the performance loss exceeds a threshold, limiting the DRAM access traffic of processes other than the target process, so as to dynamically allocate DRAM bandwidth resources in real time and guarantee the quality of service of the target process.
2. The memory resource dynamic regulation and control method based on memory access and performance modeling as claimed in claim 1, wherein the historical memory access request information includes the current request information h0 accessing a target Bank and the past k memory access histories hi (i = 1, …, k), where each hi (i = 0, …, k) includes the time ti at which hi was issued, the row address rowi accessed by hi, and the column address coli; the input of the memory access latency model is the differences t0-ti, row0-rowi, and col0-coli between the memory access histories and the current request information; the output of the memory access latency model is the memory access latency g(h0, …, hk) of the current request information h0; and the training of the memory access latency model is completed by fitting the function g.
3. The memory resource dynamic regulation method based on memory access and performance modeling of claim 1, wherein the performance loss is:
Performance_Loss = ((A + B) - (A + B/LatS)) / (A + B/LatS) = (B - B/LatS) / (A + B/LatS)
wherein A is the number of clock cycles executed inside the core, B is the number of clock cycles executed outside the core, and LatS is the memory access latency increase ratio.
4. The memory resource dynamic regulation and control method based on memory access and performance modeling as claimed in claim 1, wherein the step 4 comprises: limiting, using the token bucket technique, the DRAM access traffic of processes other than the target process.
5. The memory resource dynamic regulation and control method based on memory access and performance modeling as claimed in claim 1,
each core of the multi-core system is provided with an independent token bucket; a certain number of tokens is automatically added to the token bucket at a fixed interval, and the token bucket has a maximum token capacity; all memory access requests issued by a core pass through its token bucket; every access request packet is marked when it enters the token bucket and its entry time is recorded; the token bucket judges whether an available token exists: if so, the packet is sent to the lower layer and the number of tokens in the bucket is reduced according to the data amount of the access request; if not, the access request is placed in a wait queue.
6. A memory resource dynamic regulation and control system based on memory access and performance modeling is characterized by comprising:
the module 1 is used for training a neural network model, in a multi-core system, with the historical memory access request information of a preset process accessing the DRAM alone as training data and the latency corresponding to that historical memory access request information as the training target, to obtain a memory access latency model;
module 2, for recording, when the multi-core system runs multiple processes, a target memory access request of a target process and inputting it into the memory access latency model to obtain the latency tsolo the target memory access request would have without multi-process interference, while measuring the actual latency tmix of the target memory access request, and dividing the latency tmix by the latency tsolo to obtain the memory access latency increase ratio;
a module 3, configured to count the numbers of clock cycles the target process executes inside and outside the core and, combined with the memory access latency increase ratio, obtain the performance loss of the target process when running with multiple processes relative to running alone;
and the module 4 is used for limiting the DRAM access traffic of processes other than the target process when the performance loss exceeds a threshold, so as to dynamically allocate DRAM bandwidth resources in real time and guarantee the quality of service of the target process.
7. The memory resource dynamic regulation and control system based on memory access and performance modeling as claimed in claim 6, wherein the historical memory access request information includes the current request information h0 accessing a target Bank and the past k memory access histories hi (i = 1, …, k), where each hi (i = 0, …, k) includes the time ti at which hi was issued, the row address rowi accessed by hi, and the column address coli; the input of the memory access latency model is the differences t0-ti, row0-rowi, and col0-coli between the memory access histories and the current request information; the output of the memory access latency model is the memory access latency g(h0, …, hk) of the current request information h0; and the training of the memory access latency model is completed by fitting the function g.
8. The memory resource dynamic regulation and control system based on memory access and performance modeling as claimed in claim 6, wherein the performance loss is:
Performance_Loss = ((A + B) - (A + B/LatS)) / (A + B/LatS) = (B - B/LatS) / (A + B/LatS)
wherein A is the number of clock cycles executed inside the core, B is the number of clock cycles executed outside the core, and LatS is the memory access latency increase ratio.
9. The memory resource dynamic regulation and control system based on memory access and performance modeling as claimed in claim 6, wherein the module 4 is configured to limit, using the token bucket technique, the DRAM access traffic of processes other than the target process.
10. The memory resource dynamic regulation and control system based on memory access and performance modeling as claimed in claim 6, wherein
each core of the multi-core system is provided with an independent token bucket; a certain number of tokens is automatically added to the token bucket at a fixed interval, and the token bucket has a maximum token capacity; all memory access requests issued by a core pass through its token bucket; every access request packet is marked when it enters the token bucket and its entry time is recorded; the token bucket judges whether an available token exists: if so, the packet is sent to the lower layer and the number of tokens in the bucket is reduced according to the data amount of the access request; if not, the access request is placed in a wait queue.
CN202110702890.0A 2021-06-24 2021-06-24 Memory resource dynamic regulation and control method and system based on memory access and performance modeling Active CN113505084B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110702890.0A CN113505084B (en) 2021-06-24 2021-06-24 Memory resource dynamic regulation and control method and system based on memory access and performance modeling
PCT/CN2022/070519 WO2022267443A1 (en) 2021-06-24 2022-01-06 Memory resource dynamic regulation and control method and system based on memory access and performance modeling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110702890.0A CN113505084B (en) 2021-06-24 2021-06-24 Memory resource dynamic regulation and control method and system based on memory access and performance modeling

Publications (2)

Publication Number Publication Date
CN113505084A true CN113505084A (en) 2021-10-15
CN113505084B CN113505084B (en) 2023-09-12

Family

ID=78010810

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110702890.0A Active CN113505084B (en) 2021-06-24 2021-06-24 Memory resource dynamic regulation and control method and system based on memory access and performance modeling

Country Status (2)

Country Link
CN (1) CN113505084B (en)
WO (1) WO2022267443A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022267443A1 (en) * 2021-06-24 2022-12-29 中国科学院计算技术研究所 Memory resource dynamic regulation and control method and system based on memory access and performance modeling

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117319322B (en) * 2023-12-01 2024-02-27 成都睿众博芯微电子技术有限公司 Bandwidth allocation method, device, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140122801A1 (en) * 2012-10-29 2014-05-01 Advanced Micro Devices, Inc. Memory controller with inter-core interference detection
CN105700946A (en) * 2016-01-15 2016-06-22 华中科技大学 Scheduling system and method for equalizing memory access latency among multiple threads under NUMA architecture
WO2019020028A1 (en) * 2017-07-26 2019-01-31 华为技术有限公司 Method and apparatus for allocating shared resource
CN112083957A (en) * 2020-09-18 2020-12-15 海光信息技术股份有限公司 Bandwidth control device, multithread controller system and memory access bandwidth control method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113505084B (en) * 2021-06-24 2023-09-12 中国科学院计算技术研究所 Memory resource dynamic regulation and control method and system based on memory access and performance modeling

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140122801A1 (en) * 2012-10-29 2014-05-01 Advanced Micro Devices, Inc. Memory controller with inter-core interference detection
CN105700946A (en) * 2016-01-15 2016-06-22 华中科技大学 Scheduling system and method for equalizing memory access latency among multiple threads under NUMA architecture
WO2019020028A1 (en) * 2017-07-26 2019-01-31 华为技术有限公司 Method and apparatus for allocating shared resource
CN112083957A (en) * 2020-09-18 2020-12-15 海光信息技术股份有限公司 Bandwidth control device, multithread controller system and memory access bandwidth control method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XIONG, DONGLIANG 等: "Providing Predictable Performance via a Slowdown Estimation Model", ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, vol. 14, no. 3, pages 1 - 26, XP058673014, DOI: 10.1145/3124451 *


Also Published As

Publication number Publication date
WO2022267443A1 (en) 2022-12-29
CN113505084B (en) 2023-09-12

Similar Documents

Publication Publication Date Title
Ali et al. Batch: Machine learning inference serving on serverless platforms with adaptive batching
WO2021174735A1 (en) Dynamic resource scheduling method for guaranteeing latency slo of latency-sensitive application, and system
US20080059712A1 (en) Method and apparatus for achieving fair cache sharing on multi-threaded chip multiprocessors
US9430277B2 (en) Thread scheduling based on predicted cache occupancies of co-running threads
Jog et al. Exploiting core criticality for enhanced GPU performance
EP3087503B1 (en) Cloud compute scheduling using a heuristic contention model
CN113505084B (en) Memory resource dynamic regulation and control method and system based on memory access and performance modeling
US9632836B2 (en) Scheduling applications in a clustered computer system
WO2019091387A1 (en) Method and system for provisioning resources in cloud computing
US8898674B2 (en) Memory databus utilization management system and computer program product
Li et al. Amoeba: Qos-awareness and reduced resource usage of microservices with serverless computing
CN114730276A (en) Determining an optimal number of threads per core in a multi-core processor complex
CN111190735B (en) On-chip CPU/GPU pipelining calculation method based on Linux and computer system
US20050125797A1 (en) Resource management for a system-on-chip (SoC)
US11907550B2 (en) Method for dynamically assigning memory bandwidth
CN105528250B (en) The evaluation and test of Multi-core computer system certainty and control method
CN108574600B (en) Service quality guarantee method for power consumption and resource competition cooperative control of cloud computing server
CN116820784B (en) GPU real-time scheduling method and system for reasoning task QoS
Niknia et al. An SMDP-based approach to thermal-aware task scheduling in NoC-based MPSoC platforms
CN111625347B (en) Fine-grained cloud resource control system and method based on service component level
Gupta et al. Timecube: A manycore embedded processor with interference-agnostic progress tracking
CN112306628A (en) Virtual network function resource management framework based on multi-core server
Mirosanlou et al. Duomc: Tight DRAM latency bounds with shared banks and near-cots performance
CN101661406A (en) Processing unit dispatching device and method
CN107329813B (en) Global sensing data active prefetching method and system for many-core processor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant