CN112540934B - Method and system for ensuring quality of service when multiple latency-critical programs are executed together - Google Patents
Method and system for ensuring quality of service when multiple latency-critical programs are executed together
- Publication number: CN112540934B (application CN202011465046.2A)
- Authority: CN (China)
- Prior art keywords: program, type, stage, cache space, phase
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/0811—Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Debugging And Monitoring (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
The invention discloses a method and a system for ensuring quality of service when multiple latency-critical programs are executed together. The method starts the latency-critical programs, pinning each one to its own core, with the programs on all cores sharing the last-level cache (LLC) space. Each latency-critical program is divided into program phases, and each program phase into program intervals. While the latency-critical programs run together, one program interval is sampled in each program phase of each program; first, second and third actual performance data are computed for each phase from the sampled data; the phase type and performance of each program phase are classified from these data; and the cache space occupied by each latency-critical program is adjusted dynamically at runtime according to the phase type and performance type of each of its program phases.
Description
Technical Field
The present application relates to the field of parallel and distributed computing technologies, and in particular to a method and system for ensuring quality of service when multiple latency-critical programs are executed together.
Background
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
Data centers have matured from concept to practice. In a data center, a large number of programs are packed onto as few servers as possible to improve resource utilization, so multiple programs execute together on a single server node. Co-execution raises server utilization, but it also degrades program performance. The degree of degradation depends on program characteristics: some programs slow down only slightly when executed alongside others, while others slow down severely.
Meanwhile, data centers run a large number of latency-critical programs. Customers who execute programs in a data center impose quality-of-service requirements on them, for example that a program's performance must not drop below 90% of its performance when executed alone. When a latency-critical program is executed together with other programs, performance interference can easily cause severe degradation, so the customers' quality-of-service requirements cannot be met. A method is therefore needed that guarantees the quality of service of latency-critical programs while improving system resource utilization as much as possible. This is the problem the present application sets out to solve.
Disclosure of Invention
To overcome the deficiencies of the prior art, the present application provides a method and a system for ensuring quality of service when multiple latency-critical programs are executed together.
In a first aspect, the present application provides a method for ensuring quality of service when multiple latency-critical programs are executed together.
A method for ensuring quality of service when multiple latency-critical programs are executed together comprises:
initializing hardware counters and starting the latency-critical programs; each latency-critical program is pinned to its own core, and the programs on all cores share the last-level cache (LLC) space;
dividing each latency-critical program into program phases, and dividing each program phase into program intervals;
while the latency-critical programs run together, sampling one program interval in each program phase of each program with hardware performance counters; computing first, second and third actual performance data for each phase from the sampled data; classifying the phase type of each phase from the first actual performance data; classifying the performance of each phase from the second and third actual performance data;
and dynamically adjusting, at runtime, the cache space occupied by each latency-critical program according to the phase type and performance type of each of its program phases.
In a second aspect, the present application provides a system for ensuring quality of service when multiple latency-critical programs are executed together.
A system for ensuring quality of service when multiple latency-critical programs are executed together comprises:
an initialization module configured to initialize hardware counters and start the latency-critical programs, each pinned to its own core, with the programs on all cores sharing the last-level cache (LLC) space;
a phase-division module configured to divide each latency-critical program into program phases, and each program phase into program intervals;
a classification module configured to sample, while the latency-critical programs run together, one program interval in each program phase of each program with hardware performance counters; compute first, second and third actual performance data for each phase from the sampled data; classify the phase type of each phase from the first actual performance data; and classify the performance of each phase from the second and third actual performance data;
and a dynamic adjustment module configured to dynamically adjust, at runtime, the cache space occupied by each latency-critical program according to the phase type and performance type of each of its program phases.
In a third aspect, the present application further provides an electronic device comprising one or more processors, one or more memories, and one or more computer programs, wherein a processor is connected to the memory and the one or more computer programs are stored in the memory; when the electronic device runs, the processor executes the one or more computer programs stored in the memory, causing the electronic device to perform the method of the first aspect.
In a fourth aspect, the present application also provides a computer-readable storage medium storing computer instructions which, when executed by a processor, perform the method of the first aspect.
In a fifth aspect, the present application also provides a computer program (product) comprising a computer program which, when run on one or more processors, implements the method of any of the preceding first aspects.
Compared with the prior art, the beneficial effects of the present application are:
By monitoring the performance indicators of latency-critical programs in real time and using CAT to dynamically partition LLC resources among latency-critical programs of different types, the application guarantees the performance of co-executing latency-critical programs while improving LLC resource utilization as much as possible.
Intel's technology for last-level cache (LLC) allocation makes better use of the cache through cache partitioning. The present application uses this technology to guarantee users' performance requirements by preventing latency-critical programs from polluting each other's caches. In addition, the application better meets users' performance requirements by allocating more LLC resources to latency-critical programs whose performance benefits from them, and by reducing or stopping allocation to those that do not benefit.
The invention dynamically adjusts the cache space a program occupies according to the performance indicators of the program's runtime phases. While guaranteeing the programs' quality of service, it increases the number of latency-critical programs that can be hosted and the LLC resource utilization as much as possible.
Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application.
FIG. 1 is a flowchart of a resource partitioning method according to a first embodiment;
FIG. 2 is a flowchart of the program phase performance analysis of the first embodiment.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and it should be understood that the terms "comprises" and "comprising", and any variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
Interpretation of terms:
Latency-critical programs: applications with strict requirements on tail latency; tail latency is an important performance indicator for latency-critical programs.
LLC: the Last-Level Cache, i.e., the highest-level cache shared by all functional units on the chip (e.g., CPU cores, the IGP, and the DSP).
CAT: Cache Allocation Technology, whose basic goal is to enable resource allocation based on application priority or class of service (CLOS). The Intel Xeon processor E5 v4 family (and a subset of the Intel Xeon processor E5 v3 family for communications) introduced the ability to configure and use cache allocation technology on the last-level cache.
CLOS: Class of Service. As an abstraction, a CLOS can carry multiple resource-control attributes, thereby reducing software overhead during context switches.
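On Linux, CAT is commonly driven through the resctrl filesystem mounted at /sys/fs/resctrl. As a hedged illustration only (the group name, cache id, and way range below are invented for the example, and actually writing the files requires root plus a CPU and kernel with CAT support), the following sketch builds the contiguous way bitmask and the L3 "schemata" line that resctrl expects:

```python
# Hedged sketch: helpers for resctrl-style CAT configuration. Only the mask
# and schemata-line construction run here; the file writes are illustrative.
def contiguous_way_mask(first_way, num_ways):
    """CAT requires contiguous way masks: num_ways ones starting at first_way."""
    return ((1 << num_ways) - 1) << first_way

def l3_schemata_line(cache_id, mask):
    """Format one L3 allocation line as resctrl expects, e.g. 'L3:0=f0'."""
    return "L3:%d=%x" % (cache_id, mask)

# Illustrative use (not executed): create a CLOS-like group and restrict it
# to ways 4..7 of L3 cache 0.
#   os.makedirs("/sys/fs/resctrl/lc_prog1", exist_ok=True)
#   with open("/sys/fs/resctrl/lc_prog1/schemata", "w") as f:
#       f.write(l3_schemata_line(0, contiguous_way_mask(4, 4)) + "\n")
```

The mask helper encodes the contiguity constraint that the patent relies on when it allocates LLC ways from low to high addresses.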
Example one
The embodiment provides a method for ensuring quality of service when multiple latency-critical programs are executed together.
A method for ensuring quality of service when multiple latency-critical programs are executed together comprises:
S101: initializing hardware counters and starting the latency-critical programs; each latency-critical program is pinned to its own core, and the programs on all cores share the last-level cache (LLC) space;
S102: dividing each latency-critical program into program phases, and dividing each program phase into program intervals;
S103: while the latency-critical programs run together, sampling one program interval in each program phase of each program with hardware performance counters;
computing first, second and third actual performance data for each program phase from the sampled data;
classifying the phase type of each program phase from the first actual performance data;
classifying the performance of each program phase from the second and third actual performance data;
S104: dynamically adjusting, at runtime, the cache space occupied by each latency-critical program according to the phase type and performance type of each of its program phases.
It should be understood that in S101, to ensure the latency-critical programs do not contend for CPU time, the initial state of the system places each latency-critical program on a different core, with the programs on all cores sharing the LLC.
Here, "multiple latency-critical programs" means two or more latency-critical programs.
As one or more embodiments, S101 further comprises:
assuming the LLC has N ways in total, reserving M ways as spare space and distributing the remaining N−M ways evenly among all latency-critical programs, where N and M are positive integers.
Illustratively, each latency-critical program is placed in a different CLOS and isolated by CAT, reducing interference between latency-critical programs.
For example, assuming the system's LLC has N ways, M cache ways are reserved for CLOS#1 as spare space, and the remaining N−M ways are distributed evenly among all latency-critical programs.
Illustratively, with two latency-critical programs, the LLC ways are allocated from low to high addresses, since CAT supports only contiguous partitions of the LLC space. Latency-critical program 1 occupies the space represented by CLOS#0, equal in size to (N−M)/2 ways. The M reserved ways are isolated and defined as CLOS#1, a spare space whose ways begin immediately after the last way of CLOS#0. The remaining LLC ways are allocated to latency-critical program 2, and that space is defined as CLOS#2.
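The initial layout above can be sketched as follows. This is a minimal sketch of the two-program example, assuming an even (N−M)/2-way split per program (the exact per-program way count in the original text was lost to a figure and is reconstructed here from the even-split rule):

```python
# Hedged sketch of the example initial partition: N LLC ways, M reserved as
# spare space (CLOS#1), and the remaining N-M ways split evenly between two
# latency-critical programs (CLOS#0 and CLOS#2), contiguously from way 0 up.
def initial_partition(n_ways, m_spare):
    """Return (first_way, num_ways) for CLOS#0, CLOS#1 and CLOS#2."""
    per_prog = (n_ways - m_spare) // 2          # even split between 2 programs
    clos0 = (0, per_prog)                       # program 1: lowest ways
    clos1 = (per_prog, m_spare)                 # spare space after CLOS#0
    clos2 = (per_prog + m_spare, n_ways - m_spare - per_prog)  # program 2
    return clos0, clos1, clos2
```

For example, with a 20-way LLC and 4 spare ways, each program initially receives 8 contiguous ways.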
As one or more embodiments, S102 divides each latency-critical program into program phases as follows:
a hardware counter counts retired instructions, and a new program phase begins each time a set number of instructions has been executed.
As one or more embodiments, S102 divides each program phase into program intervals as follows:
conditional branch instructions are counted, and an interrupt is triggered after every X conditional branch instructions have executed;
that is, every X conditional branch instructions form one program interval; another hardware counter records the total number of instructions executed during that interval, and X is a positive integer.
Illustratively, each program phase is subdivided into program intervals using different sampling periods.
It should be understood that in S102 each latency-critical program is divided into program phases containing a fixed number of instructions; to capture program performance information better, the present application introduces a two-level phase-detection method.
Phase division: during a program's execution its performance indicators (e.g., IPC) change over time; program segments belonging to the same phase have similar performance indicators, while segments belonging to different phases differ, so a program can be divided into phases according to its performance indicators. The method divides the program into phases of a fixed instruction count and then classifies the runtime phases by their IPC; the fixed instruction count may be, for example, 10 million, 100 million, or 1 billion instructions.
Interval division: to obtain more detailed phase information at runtime, the method subdivides each program phase into intervals. To reduce sampling overhead and information loss, performance data are sampled once every X conditional branch instructions. The sampling period can be chosen to suit the situation, for example 100M or 200M.
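The two-level division can be illustrated with a small software analogue. This is a hedged sketch, not the patent's hardware mechanism: intervals are represented simply by their per-interval instruction counts (one interval per X conditional branches), and a new phase starts once the running instruction total crosses the fixed phase size:

```python
# Hedged sketch: group consecutive sampling intervals into program phases
# of approximately phase_size instructions each.
def split_into_phases(interval_instr_counts, phase_size):
    """Return a list of phases, each a list of per-interval instruction counts."""
    phases, current, total = [], [], 0
    for n in interval_instr_counts:
        current.append(n)
        total += n
        if total >= phase_size:        # phase boundary reached
            phases.append(current)
            current, total = [], 0
    if current:                        # trailing partial phase
        phases.append(current)
    return phases
```

In hardware the same boundaries come from counter-overflow interrupts rather than from post-processing a list, but the grouping logic is the same.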
As one or more embodiments, S103 samples one program interval in each program phase of each latency-critical program with hardware performance counters as follows:
while the latency-critical programs run together, a hardware performance counter samples one program interval in each program phase of each program, obtaining the performance indicators: instructions per cycle (IPC), the number of LLC misses, the number of LLC hits, and the number of LLC references.
The interval indicators MPKI_LLC and HPKI_LLC are then computed from the sampled LLC miss count, LLC hit count, and instruction count:
MPKI_LLC = (Num_Miss / Num_Ins) × 1000
HPKI_LLC = (Num_Hit / Num_Ins) × 1000
where Num_Miss is the number of LLC misses, Num_Ins the number of instructions executed in the interval, and Num_Hit the number of LLC hits.
It should be understood that, given the performance indicators of all intervals in a program phase, their averages are taken as the performance indicators of the phase in which the latency-critical program finds itself, and are used to analyze the phase behavior of the latency-critical program.
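The per-interval indicators and the phase-level averaging can be sketched directly from the formulas above. The counter values in the test are illustrative sample data, not measurements from the patent:

```python
# Hedged sketch of the per-interval indicators and the phase-level averages.
def interval_indicators(instructions, cycles, llc_misses, llc_hits):
    """IPC, MPKI_LLC and HPKI_LLC for one sampled program interval."""
    ipc = instructions / cycles
    mpki = llc_misses / instructions * 1000.0   # misses per kilo-instruction
    hpki = llc_hits / instructions * 1000.0     # hits per kilo-instruction
    return ipc, mpki, hpki

def phase_indicators(samples):
    """Average the indicators of the n intervals making up one phase."""
    n = len(samples)
    ipcs, mpkis, hpkis = zip(*(interval_indicators(*s) for s in samples))
    return sum(ipcs) / n, sum(mpkis) / n, sum(hpkis) / n
```

Each sample is a tuple (instructions, cycles, LLC misses, LLC hits) read from the hardware counters for one interval.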
As one or more embodiments, S103 computes the first, second and third actual performance data of each program phase from the sampled data as follows:
the first actual performance data is computed from the sampled data and refers to the phase's average instructions per cycle (IPC);
the second actual performance data is computed from the sampled data and refers to the phase's average number of LLC misses per thousand instructions, MPKI_LLC;
the third actual performance data is computed from the sampled data and refers to the phase's average number of LLC hits per thousand instructions, HPKI_LLC.
Illustratively, the phase's IPC, MPKI_LLC (misses per thousand instructions at the LLC) and HPKI_LLC (hits per thousand instructions at the LLC) are computed from the data sampled in each interval of the phase. The averages are computed as:
IPC = (IPC_1 + IPC_2 + … + IPC_n) / n
MPKI_LLC = (MPKI_LLC,1 + MPKI_LLC,2 + … + MPKI_LLC,n) / n
HPKI_LLC = (HPKI_LLC,1 + HPKI_LLC,2 + … + HPKI_LLC,n) / n
where IPC_1 is the IPC indicator of the first interval, n the number of intervals in the program phase, MPKI_LLC,1 the MPKI indicator of the first interval, and HPKI_LLC,1 the HPKI indicator of the first interval.
As one or more embodiments, S103 classifies the phase type of each program phase from the first actual performance data; specifically, program phases are divided into three types, A, B and C, according to the phase's average IPC:
where α is a first set threshold and β is a second set threshold.
As one or more embodiments, S103 classifies the performance of each program phase from the second and third actual performance data; specifically, program phases are divided into three performance types, a, b and c, according to MPKI_LLC and HPKI_LLC:
where η is a third set threshold, γ is a fourth set threshold, and η < γ.
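The patent's exact threshold expressions appear only as figures and are not reproduced in this text, so the following sketch is a hypothetical reading, not the published rules: it assumes the A/B/C phase types order phases by increasing IPC, and that the a/b/c performance types order phases by decreasing LLC activity. Every comparison below is an assumption:

```python
# Hypothetical classifier: the actual threshold expressions are figures that
# did not survive extraction, so these comparisons are an assumed
# interpretation built only from the named thresholds.
def classify_phase_type(ipc, alpha, beta):
    """Map a phase's average IPC to type 'A', 'B' or 'C' (assumed ordering)."""
    if ipc < alpha:
        return "A"          # assumed: low-IPC phase
    if ipc < beta:
        return "B"          # assumed: medium-IPC phase
    return "C"              # assumed: high-IPC phase

def classify_performance(mpki, hpki, eta, gamma):
    """Map MPKI_LLC/HPKI_LLC to type 'a', 'b' or 'c' (assumed ordering)."""
    if mpki >= gamma:
        return "a"          # assumed: many LLC misses, may need more cache
    if mpki >= eta or hpki >= eta:
        return "b"          # assumed: moderate LLC activity
    return "c"              # assumed: little LLC activity, cache-insensitive
```

Only the thresholds α, β, η, γ and the type labels come from the patent; the direction of each comparison is a guess for illustration.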
As one or more embodiments, S104 dynamically adjusts, at runtime, the cache space occupied by each latency-critical program according to the phase type and performance type of each of its program phases, as follows:
judge the phase type and performance type of each program phase of each latency-critical program;
if the phase type is A and the performance type is a or b, it is preliminarily judged that the phase needs more cache space; while the cache space is being increased, if the phase type does not change, the increase is stopped immediately and the added cache space is taken back; if the phase type changes to B or C, the modification is kept;
if the phase type is A and the performance type is c, it is preliminarily judged that the phase's cache space should be reduced; if, after one cache way is taken away, the phase type does not change, the reduction continues; otherwise the modification is rolled back;
if the phase type is B and the performance type is b, the cache space occupied by the phase is left unchanged;
if the phase type is B and the performance type is a or c, it is preliminarily judged that the phase's cache space should be reduced; if, after one cache way is taken away, the phase type does not change to A, the reduction continues; otherwise the modification is rolled back;
if the phase type is C and the performance type is a or c, it is judged whether the phase has surplus resources; if, after one cache way is taken away, the phase type does not change to A or B, the reduction continues; otherwise the modification is rolled back;
if the phase type is C and the performance type is b, the cache space occupied by the phase is left unchanged.
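The decision table above can be condensed into a small sketch. This is a hedged illustration, assuming one CAT way is added or removed per step; the rollback condition for the A/c case is partly lost to a figure in the original and is reconstructed by analogy with the B and C cases:

```python
# Hedged sketch of S104's decision table: which action to try for a phase,
# given its (phase_type, perf_type) pair. Rollback checks happen separately,
# after the one-way change, by re-observing the phase type.
def adjust_decision(phase_type, perf_type):
    if phase_type == "A" and perf_type in ("a", "b"):
        return "grow"        # keep growth only if phase type becomes B or C
    if phase_type == "A" and perf_type == "c":
        return "shrink"      # roll back if the phase type changes (assumed)
    if (phase_type, perf_type) in (("B", "b"), ("C", "b")):
        return "keep"        # leave the phase's cache allocation unchanged
    if phase_type == "B":    # performance type a or c
        return "shrink"      # roll back if the phase type degrades to A
    if phase_type == "C":    # performance type a or c: possible surplus
        return "shrink"      # roll back if phase type degrades to A or B
    return "keep"
```

In other words, only low-performing, cache-active phases (A with a or b) are granted more ways; everything else either holds steady or probes downward one way at a time with a rollback guard.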
As one or more embodiments, the method further comprises:
obtaining the LLC's resource usage for dynamic management;
if CLOS#1 has free space, space is taken from CLOS#1; if the cache space in CLOS#1 is fully allocated, it is judged whether the program adjacent in physical address is in a resource-surplus state; if so, the surplus space is redistributed according to the adjacent program's performance status; if there is no resource surplus, the program keeps waiting for free space; and if no free space appears for a long time, data migration is performed.
The phase performance data of each program are recorded in a historical phase & performance table (HPPT), which stores each program's phase information and performance information. The application dynamically adjusts the cache occupied by the current phase according to the running program's phase behavior and its runtime performance information.
The cache space occupied by a latency-critical program is adjusted dynamically according to its phase, MPKI_LLC and HPKI_LLC.
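The patent names the HPPT but does not describe its layout, so the following sketch assumes a simple keyed record store; all field names are invented for illustration:

```python
# Hedged sketch of a historical phase & performance table (HPPT). The patent
# only says it stores each program's phase and performance information, so
# this record layout is an assumption.
class HPPT:
    def __init__(self):
        self.table = {}   # (program_id, phase_id) -> indicator record

    def record(self, program_id, phase_id, ipc, mpki, hpki,
               phase_type, perf_type):
        self.table[(program_id, phase_id)] = {
            "ipc": ipc, "mpki_llc": mpki, "hpki_llc": hpki,
            "phase_type": phase_type, "perf_type": perf_type,
        }

    def lookup(self, program_id, phase_id):
        """Return the stored record for a phase, or None if never seen."""
        return self.table.get((program_id, phase_id))
```

A lookup hit lets the adjuster reuse a previously classified phase instead of re-measuring it from scratch.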
Fig. 1 depicts the resource-partitioning method. For each program to be executed, program performance information is obtained with hardware performance counters.
Fig. 2 depicts the program-phase performance-analysis method.
For each program to be executed, the IPC indicator is used to analyze the program's runtime phase, and the MPKI_LLC and HPKI_LLC indicators are used to analyze the program's performance; the cache space the program occupies is then adjusted dynamically.
Example two
The embodiment provides a system for ensuring quality of service when multiple latency-critical programs are executed together.
A system for ensuring quality of service when multiple latency-critical programs are executed together comprises:
an initialization module configured to initialize hardware counters and start the latency-critical programs, each pinned to its own core, with the programs on all cores sharing the last-level cache (LLC) space;
a phase-division module configured to divide each latency-critical program into program phases, and each program phase into program intervals;
a classification module configured to sample, while the latency-critical programs run together, one program interval in each program phase of each program with hardware performance counters; compute first, second and third actual performance data for each phase from the sampled data; classify the phase type of each phase from the first actual performance data; and classify the performance of each phase from the second and third actual performance data;
a dynamic adjustment module configured to dynamically adjust, at runtime, the cache space occupied by each latency-critical program according to the phase type and performance type of each of its program phases.
It should be noted here that the initialization module, phase-division module, classification module and dynamic adjustment module correspond to steps S101 to S104 of the first embodiment; the modules match the corresponding steps in their implementation examples and application scenarios, but are not limited to the disclosure of the first embodiment. The modules described above, as part of a system, may be implemented in a computer system such as a set of computer-executable instructions.
In the foregoing embodiments, the descriptions of the embodiments have different emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The proposed system can be implemented in other ways. For example, the above-described system embodiments are merely illustrative, and for example, the division of the above-described modules is merely a logical division, and in actual implementation, there may be other divisions, for example, multiple modules may be combined or integrated into another system, or some features may be omitted, or not executed.
EXAMPLE III
The present embodiment also provides an electronic device comprising one or more processors, one or more memories, and one or more computer programs, wherein a processor is connected to the memory and the one or more computer programs are stored in the memory; when the electronic device runs, the processor executes the one or more computer programs stored in the memory, causing the electronic device to perform the method of the first embodiment.
It should be understood that in this embodiment the processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor or any conventional processor.
The memory may include both read-only memory and random-access memory and provides instructions and data to the processor; a portion of the memory may also include non-volatile random-access memory. For example, the memory may also store device-type information.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor or by instructions in the form of software.
The method of the first embodiment may be implemented directly by a hardware processor, or by a combination of hardware and software modules in the processor. The software modules may reside in RAM, flash memory, ROM, PROM, EPROM, registers, or other storage media well known in the art. The storage medium is located in a memory, and the processor reads the information in the memory and completes the steps of the method in combination with its hardware. To avoid repetition, details are not repeated here.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
Example four
The present embodiments also provide a computer-readable storage medium for storing computer instructions, which when executed by a processor, perform the method of the first embodiment.
The above description gives only preferred embodiments of the present application and is not intended to limit the application; various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present application shall fall within its scope of protection.
Claims (8)
1. A method for ensuring quality of service when multiple latency-critical programs are executed together, comprising:
initializing hardware counters and starting the latency-critical programs; each latency-critical program is pinned to its own core, and the programs on all cores share the last-level cache (LLC) space;
dividing each latency-critical program into program phases, and dividing each program phase into program intervals;
sampling a program interval in each program phase of each delay key program by using a hardware performance counter in the process that a plurality of delay key programs are operated together; calculating first, second and third actual performance data for each program phase from the sampled data; classifying the phase types of the corresponding program phases according to the first actual performance data; classifying the performance of the program phase according to the second and third actual performance data; calculating first, second and third actual performance data for each program phase from the sampled data; the method comprises the following specific steps: calculating first actual performance data of each program stage according to the sampling data; the first actual performance data refers to: IPC average of number of instructions per cycle; calculating second actual performance data of each program stage according to the sampling data; second actual performance data, refer to: mean MPKI of missed instruction count per thousand instructions in LLC LLC (ii) a Calculating third actual performance data of each program stage according to the sampling data; the third actual performance data refers to: average HPKI of hit instruction count per thousand instructions on LLC LLC (ii) a Classifying the performance of the program phase according to the second and third actual performance data; the method comprises the following specific steps: according toProgram phase types are divided into 3 classes:
wherein α is the first set threshold and β is the second set threshold;
or, alternatively,
classifying the performance of the program phase according to the second and third actual performance data, specifically [formula omitted in the source]:
wherein η is the third set threshold, γ is the fourth set threshold, and η < γ;
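The two classifications above can be sketched in Python. The threshold formulas appear only as images in the source, so the comparison directions (low IPC yielding type A, and η and γ applying to MPKI_LLC and HPKI_LLC respectively) are assumptions, not the patent's stated rules:

```python
def classify_phase_type(ipc_avg: float, alpha: float, beta: float) -> str:
    """Phase type from average IPC (first actual performance data).

    The source formula is an image; this mapping (low IPC -> "A",
    high IPC -> "C", with alpha < beta) is an assumed reading.
    """
    if ipc_avg < alpha:
        return "A"   # assumed: low IPC, phase may need more cache
    if ipc_avg < beta:
        return "B"   # assumed: intermediate IPC
    return "C"       # assumed: high IPC, possible resource surplus


def classify_performance_type(mpki_llc: float, hpki_llc: float,
                              eta: float, gamma: float) -> str:
    """Performance type from LLC misses/hits per thousand instructions.

    Also an assumed reading: many misses -> "a" (cache-starved),
    many hits with few misses -> "b" (cache well used), low LLC
    activity overall -> "c".  The source only states eta < gamma.
    """
    if mpki_llc >= gamma:
        return "a"
    if hpki_llc >= eta:
        return "b"
    return "c"
```

With thresholds α = 0.8, β = 1.6, η = 2, γ = 10, a phase averaging 1.0 IPC with 12 MPKI_LLC would classify as type B, performance a.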
dynamically adjusting, during execution, the cache space occupied by each latency-critical program according to the phase type and the performance type of each of its program phases, specifically comprising:
determining the phase type and the performance type of each program phase of each latency-critical program;
if the phase type is A and the performance type is a or b, preliminarily determining that the program phase needs more cache space; while increasing the cache space, if the phase type does not change, immediately stopping the increase and returning the added cache space; if the phase type changes to B or C, keeping the modification;
if the phase type is A and the performance type is c, preliminarily determining that the cache space of the program phase can be reduced; if, after reducing one way of cache space, the phase type does not change [condition given as a formula in the source], continuing to reduce the cache space; otherwise, reverting the modification;
if the phase type is B and the performance type is b, keeping the cache space occupied by the program phase unchanged;
if the phase type is B and the performance type is a or c, preliminarily determining that the cache space of the program phase can be reduced; if, after reducing one way of cache space, the phase type does not change to A, continuing to reduce the cache space; otherwise, reverting the modification;
if the phase type is C and the performance type is a or c, determining whether the program phase has surplus resources; if, after reducing one way of cache space, the phase type does not change to A or B, continuing to reduce the cache space; otherwise, reverting the modification;
if the phase type is C and the performance type is b, keeping the cache space occupied by the program phase unchanged;
and dynamically adjusting, during execution, the cache space occupied by each latency-critical program according to the phase type and the performance type of each of its program phases.
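The six adjustment rules above can be condensed into a decision table plus a trial-and-revert step. This is a sketch under our own naming; the `reclassify` callback stands in for re-sampling the hardware counters after a trial change, and the revert condition for the type-A shrink case (a formula image in the source) is left out:

```python
def way_action(phase_type: str, perf_type: str) -> str:
    """Initial decision of claim 1: 'grow', 'shrink' or 'keep' cache ways."""
    if phase_type == "A":
        return "grow" if perf_type in ("a", "b") else "shrink"
    if phase_type == "B":
        return "keep" if perf_type == "b" else "shrink"
    return "keep" if perf_type == "b" else "shrink"   # phase_type == "C"


def try_shrink_one_way(ways: int, phase_type: str, reclassify) -> int:
    """Trial-and-revert step for the B- and C-type shrink rules: drop one
    way, re-run the phase-type classification on the reduced allocation,
    and revert if the phase degraded (B -> A, or C -> A/B)."""
    new_type = reclassify(ways - 1)
    degraded = (phase_type == "B" and new_type == "A") or \
               (phase_type == "C" and new_type in ("A", "B"))
    return ways if degraded else ways - 1
```

In a full implementation the loop would keep calling `try_shrink_one_way` until the revert fires, matching the claim's "continue to reduce ... otherwise revert" wording.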
2. The method of claim 1, wherein initializing the hardware counter and starting the plurality of latency-critical programs further comprises:
assuming the cache space LLC has N ways in total, reserving M ways as spare space and evenly distributing the remaining N-M ways among all latency-critical programs, N and M both being positive integers.
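A minimal sketch of this initialization, assuming (our choice, not stated in the claim) that any remainder of an uneven division stays in the spare pool:

```python
def initial_way_allocation(n_ways: int, m_reserved: int, programs: list):
    """Reserve M of the N LLC ways and split the remaining N - M ways
    evenly among the latency-critical programs (claim 2)."""
    assignable = n_ways - m_reserved
    per_program = assignable // len(programs)
    allocation = {name: per_program for name in programs}
    # ways left over from integer division stay in the spare pool
    spare = n_ways - per_program * len(programs)
    return allocation, spare
```

On real hardware the per-program way counts would be applied as way masks, e.g. through a cache-partitioning facility such as Intel CAT.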
3. The method of claim 1, wherein dividing each latency-critical program into a plurality of program phases specifically comprises:
counting instructions with a counter, and delimiting a program phase each time a set number of instructions has been executed.
4. The method of claim 1, wherein dividing each program phase into a plurality of program intervals specifically comprises:
counting conditional branch instructions and triggering an interrupt after every X conditional branch instructions have been executed;
that is, every X conditional branch instructions form one program interval; another hardware counter records the total number of instructions executed during the interval, X being a positive integer.
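Simulated on an instruction trace (a real implementation would use a hardware counter overflow interrupt rather than software counting), the interval split of claim 4 looks like this; representing the trace as booleans marking conditional branches is our simplification:

```python
def split_into_intervals(instr_stream, x: int):
    """Split an instruction trace into intervals of X conditional branches
    (claim 4).  instr_stream yields True for each conditional branch and
    False otherwise.  Returns the instruction count of each completed
    interval, mirroring the second hardware counter that records the total
    instructions executed per interval."""
    intervals, branches, instrs = [], 0, 0
    for is_cond_branch in instr_stream:
        instrs += 1
        if is_cond_branch:
            branches += 1
            if branches == x:          # the interrupt would fire here
                intervals.append(instrs)
                branches = instrs = 0
    return intervals
```

A trailing partial interval (fewer than X branches at program end) is simply dropped here; the claim does not specify its handling.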
5. The method of claim 1, wherein sampling a program interval in each program phase of each latency-critical program with a hardware performance counter, while the plurality of latency-critical programs run together, specifically comprises:
sampling the program interval of each program phase of each latency-critical program with a hardware performance counter to obtain the performance indicators: instructions per cycle (IPC), number of LLC misses, number of LLC hits, and number of LLC references.
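From those raw counters, the three phase-level metrics of claim 1 follow directly. A sketch with our own field names (the LLC reference count is sampled but not needed for these three metrics):

```python
def phase_metrics(samples):
    """Aggregate per-interval hardware-counter samples into the three
    phase-level metrics of claim 1: average IPC, MPKI_LLC and HPKI_LLC.
    Each sample is a dict of raw counts: instructions, cycles,
    llc_misses, llc_hits (field names are ours)."""
    instr = sum(s["instructions"] for s in samples)
    cycles = sum(s["cycles"] for s in samples)
    misses = sum(s["llc_misses"] for s in samples)
    hits = sum(s["llc_hits"] for s in samples)
    ipc = instr / cycles                 # average instructions per cycle
    mpki = misses * 1000 / instr         # LLC misses per kilo-instruction
    hpki = hits * 1000 / instr           # LLC hits per kilo-instruction
    return ipc, mpki, hpki
```

Summing the raw counts before dividing weights each interval by its instruction count, which is one reasonable reading of "average" in the claim.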
6. A system for ensuring quality of service when a plurality of latency-critical programs are executed together, comprising:
an initialization module configured to: initialize a hardware counter and start a plurality of latency-critical programs, each latency-critical program being pinned to a corresponding core, the latency-critical programs on the cores sharing the last-level cache (LLC) space;
a staging module configured to: divide each latency-critical program into a plurality of program phases, and divide each program phase into a plurality of program intervals;
a classification module configured to: sample a program interval in each program phase of each latency-critical program, using a hardware performance counter, while the plurality of latency-critical programs run together; calculate first, second and third actual performance data for each program phase from the sampled data, the first being the average instructions per cycle (IPC), the second the average number of LLC misses per thousand instructions (MPKI_LLC), and the third the average number of LLC hits per thousand instructions (HPKI_LLC); classify the phase type of the corresponding program phase according to the first actual performance data, dividing program phases into 3 classes [formula omitted in the source], wherein α is the first set threshold and β is the second set threshold;
or, alternatively,
classify the performance of the program phase according to the second and third actual performance data [formula omitted in the source], wherein η is the third set threshold, γ is the fourth set threshold, and η < γ;
dynamically adjust, during execution, the cache space occupied by each latency-critical program according to the phase type and the performance type of each of its program phases, specifically:
determine the phase type and the performance type of each program phase of each latency-critical program;
if the phase type is A and the performance type is a or b, preliminarily determine that the program phase needs more cache space; while increasing the cache space, if the phase type does not change, immediately stop the increase and return the added cache space; if the phase type changes to B or C, keep the modification;
if the phase type is A and the performance type is c, preliminarily determine that the cache space of the program phase can be reduced; if, after reducing one way of cache space, the phase type does not change [condition given as a formula in the source], continue to reduce the cache space; otherwise, revert the modification;
if the phase type is B and the performance type is b, keep the cache space occupied by the program phase unchanged;
if the phase type is B and the performance type is a or c, preliminarily determine that the cache space of the program phase can be reduced; if, after reducing one way of cache space, the phase type does not change to A, continue to reduce the cache space; otherwise, revert the modification;
if the phase type is C and the performance type is a or c, determine whether the program phase has surplus resources; if, after reducing one way of cache space, the phase type does not change to A or B, continue to reduce the cache space; otherwise, revert the modification;
if the phase type is C and the performance type is b, keep the cache space occupied by the program phase unchanged; and
a dynamic adjustment module configured to: dynamically adjust, during execution, the cache space occupied by each latency-critical program according to the phase type and the performance type of each of its program phases.
7. An electronic device, comprising one or more processors, one or more memories, and one or more computer programs, wherein a processor is connected to a memory, the one or more computer programs are stored in the memory, and when the electronic device runs, the processor executes the one or more computer programs stored in the memory to cause the electronic device to perform the method of any one of claims 1-5.
8. A computer-readable storage medium storing computer instructions which, when executed by a processor, perform the method of any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011465046.2A CN112540934B (en) | 2020-12-14 | 2020-12-14 | Method and system for ensuring service quality when multiple delay key programs are executed together |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112540934A CN112540934A (en) | 2021-03-23 |
CN112540934B true CN112540934B (en) | 2022-07-29 |
Family
ID=75018579
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011465046.2A Active CN112540934B (en) | 2020-12-14 | 2020-12-14 | Method and system for ensuring service quality when multiple delay key programs are executed together |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112540934B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113821324B (en) * | 2021-09-17 | 2022-08-09 | 海光信息技术股份有限公司 | Cache system, method, apparatus and computer medium for processor |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9244732B2 (en) * | 2009-08-28 | 2016-01-26 | Vmware, Inc. | Compensating threads for microarchitectural resource contentions by prioritizing scheduling and execution |
CN101916230A (en) * | 2010-08-11 | 2010-12-15 | 中国科学技术大学苏州研究院 | Partitioning and thread-aware based performance optimization method of last level cache (LLC) |
US9401869B1 (en) * | 2012-06-04 | 2016-07-26 | Google Inc. | System and methods for sharing memory subsystem resources among datacenter applications |
US10554505B2 (en) * | 2012-09-28 | 2020-02-04 | Intel Corporation | Managing data center resources to achieve a quality of service |
CN103077128B (en) * | 2012-12-29 | 2015-09-23 | 华中科技大学 | Shared buffer memory method for dynamically partitioning under a kind of multi-core environment |
CN103235764B (en) * | 2013-04-11 | 2016-01-20 | 浙江大学 | Thread aware multinuclear data pre-fetching self-regulated method |
CN104572493A (en) * | 2013-10-23 | 2015-04-29 | 华为技术有限公司 | Memory resource optimization method and device |
US9626295B2 (en) * | 2015-07-23 | 2017-04-18 | Qualcomm Incorporated | Systems and methods for scheduling tasks in a heterogeneous processor cluster architecture using cache demand monitoring |
CN107463510B (en) * | 2017-08-21 | 2020-05-08 | 北京工业大学 | High-performance heterogeneous multi-core shared cache buffer management method |
CN110618872B (en) * | 2019-09-25 | 2022-04-15 | 山东师范大学 | Hybrid memory dynamic scheduling method and system |
CN111258927B (en) * | 2019-11-13 | 2022-05-03 | 北京大学 | Application program CPU last-level cache miss rate curve prediction method based on sampling |
CN112000465B (en) * | 2020-07-21 | 2023-02-03 | 山东师范大学 | Method and system for reducing performance interference of delay sensitive program in data center environment |
- 2020-12-14: CN application CN202011465046.2A filed, granted as patent CN112540934B (en), status Active
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR102456085B1 (en) | Dynamic memory remapping to reduce row buffer collisions | |
US6662272B2 (en) | Dynamic cache partitioning | |
US7725657B2 (en) | Dynamic quality of service (QoS) for a shared cache | |
US7899994B2 (en) | Providing quality of service (QoS) for cache architectures using priority information | |
US8190795B2 (en) | Memory buffer allocation device and computer readable medium having stored thereon memory buffer allocation program | |
US7103735B2 (en) | Methods and apparatus to process cache allocation requests based on priority | |
US9223712B2 (en) | Data cache method, device, and system in a multi-node system | |
CN108845960B (en) | Memory resource optimization method and device | |
US7185167B2 (en) | Heap allocation | |
US20080235487A1 (en) | Applying quality of service (QoS) to a translation lookaside buffer (TLB) | |
US20050125613A1 (en) | Reconfigurable trace cache | |
KR101356033B1 (en) | Hybrid Main Memory System and Task Scheduling Method therefor | |
JP3727887B2 (en) | Shared register file control method in multi-thread processor | |
US20120079494A1 (en) | System And Method For Maximizing Data Processing Throughput Via Application Load Adaptive Scheduling And Content Switching | |
US10725940B2 (en) | Reallocate memory pending queue based on stall | |
US8769201B2 (en) | Technique for controlling computing resources | |
CN106294192B (en) | Memory allocation method, memory allocation device and server | |
CN112540934B (en) | Method and system for ensuring service quality when multiple delay key programs are executed together | |
US9189279B2 (en) | Assignment method and multi-core processor system | |
Li | Orchestrating thread scheduling and cache management to improve memory system throughput in throughput processors | |
CN112579277B (en) | Central processing unit, method, device and storage medium for simultaneous multithreading | |
Ikeda et al. | Application aware DRAM bank partitioning in CMP | |
US20240061780A1 (en) | Systems and methods for memory bandwidth allocation | |
CN112506660A (en) | Method and device for optimizing memory of audio/video codec and storage medium | |
CN114780249A (en) | Cache management method, system, device and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||