CN112540934A - Method and system for ensuring service quality when multiple delay key programs are executed together - Google Patents

Info

Publication number
CN112540934A
CN112540934A (application CN202011465046.2A)
Authority
CN
China
Prior art keywords
program
type
stage
phase
delay key
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011465046.2A
Other languages
Chinese (zh)
Other versions
CN112540934B (en)
Inventor
王琳
李东桦
黄天元
耿世超
周莲莲
季红滨
张昭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Big Data Center
Shandong Normal University
Original Assignee
Shandong Big Data Center
Shandong Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Big Data Center, Shandong Normal University filed Critical Shandong Big Data Center
Priority to CN202011465046.2A priority Critical patent/CN112540934B/en
Publication of CN112540934A publication Critical patent/CN112540934A/en
Application granted granted Critical
Publication of CN112540934B publication Critical patent/CN112540934B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0811Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a method and a system for ensuring quality of service when a plurality of delay-critical programs are executed together. The method starts the plurality of delay-critical programs, with each delay-critical program pinned to a corresponding core and the delay-critical programs on the cores sharing the last-level cache space; divides each delay-critical program into a plurality of program phases and each program phase into a plurality of program intervals; samples a program interval in each program phase of each delay-critical program while the programs run together; calculates first, second and third actual performance data for each program phase from the sampled data; classifies the phase type and performance type of each program phase according to the first, second and third actual performance data; and dynamically adjusts the cache space each delay-critical program occupies at run time according to the phase type and performance type of each of its program phases.

Description

Method and system for ensuring service quality when multiple delay key programs are executed together
Technical Field
The present application relates to the field of parallel and distributed computing technologies, and in particular, to a method and system for ensuring quality of service when a plurality of delay-critical programs are executed together.
Background
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
Data centers have gone from concept to maturity. To improve resource utilization, a data center executes a large number of programs on as few servers as possible, so multiple programs execute together on a single server node. The advantage of executing programs together is higher server utilization; the problem is reduced program performance. The degree of performance degradation depends on program characteristics: for some programs, degradation under co-execution is insignificant, while for others it is severe.
At the same time, a large number of delay-critical programs run in data centers. Clients that execute programs in a data center impose quality-of-service requirements on them, for example that a program's performance must not fall below 90% of its performance when executed alone. When a delay-critical program is executed together with other programs, performance interference easily causes severe degradation, so the clients' quality-of-service requirements cannot be met. A method is therefore needed that guarantees the quality of service of delay-critical programs while improving system resource utilization as much as possible. This is the problem the present application solves.
Disclosure of Invention
In order to solve the defects of the prior art, the application provides a method and a system for ensuring the service quality when a plurality of delay key programs are executed together;
in a first aspect, the present application provides a method for ensuring quality of service when multiple latency critical programs are executed together;
a method for ensuring quality of service when a plurality of delay-critical programs are executed together, comprising:
initializing a hardware counter and starting a plurality of delay-critical programs; each delay-critical program is pinned to a corresponding core, and the delay-critical programs on the cores share the last-level cache (LLC) space;
dividing each delay-critical program into a plurality of program phases; dividing each program phase into a plurality of program intervals;
while the plurality of delay-critical programs run together, sampling a program interval in each program phase of each delay-critical program with a hardware performance counter; calculating first, second and third actual performance data for each program phase from the sampled data; classifying the phase type of the corresponding program phase according to the first actual performance data; classifying the performance of the program phase according to the second and third actual performance data;
and dynamically adjusting the cache space each delay-critical program occupies at run time according to the phase type and performance type of each of its program phases.
In a second aspect, the present application provides a system for ensuring quality of service when multiple latency critical programs are executed together;
a system for ensuring quality of service when a plurality of delay-critical programs are executed together, comprising:
an initialization module configured to: initializing a hardware counter and starting a plurality of delay key programs; each delay key program is preset in a corresponding core, and the delay key programs on each core share the last level cache space LLC;
a staging module configured to: dividing each delay key program into a plurality of program stages; dividing each program stage into a plurality of program intervals;
a classification module configured to: sampling a program interval in each program phase of each delay key program by using a hardware performance counter in the process that a plurality of delay key programs are operated together; calculating first, second and third actual performance data for each program phase from the sampled data; classifying the phase types of the corresponding program phases according to the first actual performance data; classifying the performance of the program phase according to the second and third actual performance data;
a dynamic adjustment module configured to: and dynamically adjusting the cache space occupied by each delay key program in the running process according to the stage type and the performance type of each program stage of each delay key program.
In a third aspect, the present application further provides an electronic device, including: one or more processors, one or more memories, and one or more computer programs; wherein a processor is connected to the memory, the one or more computer programs are stored in the memory, and when the electronic device is running, the processor executes the one or more computer programs stored in the memory, so as to make the electronic device execute the method according to the first aspect.
In a fourth aspect, the present application also provides a computer-readable storage medium for storing computer instructions which, when executed by a processor, perform the method of the first aspect.
In a fifth aspect, the present application also provides a computer program (product) comprising a computer program for implementing the method of any of the preceding first aspects when run on one or more processors.
Compared with the prior art, the beneficial effects of this application are:
by monitoring the performance indexes of the delay key programs in real time and utilizing CAT to dynamically divide LLC resources for the delay key programs of different types, the performance of the delay key programs in common execution is ensured, and the utilization rate of the LLC resources is improved as much as possible.
Intel technology supporting Last Level Cache (LLC) allocation may better utilize the cache through cache partitioning. The present application may utilize this technique to guarantee the performance requirements of users by preventing delay-critical programs from polluting each other's caches. In addition, the present application may better meet the performance requirements of the user by allocating more LLC resources to the performance-benefited delay-critical programs and reducing or stopping allocation to those delay-critical programs that do not benefit.
The invention dynamically adjusts the space occupied by the program by using the performance index of the stage of the program during operation. The invention can promote the quantity of the delay key programs and the resource utilization rate of the LLC as much as possible while ensuring the service quality of the programs.
Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application.
FIG. 1 is a flowchart of a resource partitioning method according to a first embodiment;
FIG. 2 is a flowchart of the program phase performance analysis of the first embodiment.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and it should be understood that the terms "comprises" and "comprising", and any variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
Interpretation of terms:
Delay-critical program: an application with strict requirements on tail latency; tail latency is an important performance indicator for delay-critical programs.
LLC: the Last-Level Cache, the highest-level cache shared by all functional units (e.g. CPU cores, IGP and DSP) on the chip.
CAT: Cache Allocation Technology, whose basic goal is to enable resource allocation based on application priority or class of service (CLOS). The Intel Xeon processor E5 v4 family (and a communications-oriented subset of the Intel Xeon processor E5 v3 family) introduced functionality to configure and use cache allocation on the last-level cache.
CLOS: Class of Service. As an abstraction, a CLOS can carry multiple resource-control attributes, thereby reducing software overhead during context switches.
Example one
The embodiment provides a method for ensuring the service quality when a plurality of delay key programs are executed together;
a method for ensuring quality of service when a plurality of delay-critical programs are executed together, comprising:
s101: initializing a hardware counter and starting a plurality of delay key programs; each delay key program is preset in a corresponding core, and the delay key programs on each core share a last level cache space (LLC);
s102: dividing each delay key program into a plurality of program stages; dividing each program stage into a plurality of program intervals;
s103: sampling a program interval in each program phase of each delay key program by using a hardware performance counter in the process that a plurality of delay key programs are operated together;
calculating first, second and third actual performance data for each program phase from the sampled data;
classifying the phase types of the corresponding program phases according to the first actual performance data;
classifying the performance of the program phase according to the second and third actual performance data;
s104: and dynamically adjusting the cache space occupied by each delay key program in the running process according to the stage type and the performance type of each program stage of each delay key program.
It should be understood that in S101, to ensure that the delay-critical programs do not contend for CPU time, the initial state of the system places each delay-critical program on a different core, with the delay-critical programs on these cores sharing the LLC.
Illustratively, a plurality of delay-critical programs means two or more delay-critical programs.
As one or more embodiments, the S101 further includes:
assuming that the cache space LLC has N ways in total, reserving M ways as spare space, and allocating the remaining N-M ways evenly to all delay-critical programs; N and M are both positive integers.
Illustratively, each delay-critical program is assigned to a different CLOS and isolated by CAT, reducing the interference between delay-critical programs.
For example, assuming that the system's LLC has N ways, M cache ways are reserved for CLOS #1 as spare space, and the remaining N-M ways are allocated evenly to all delay-critical programs.
Illustratively, assume there are two delay-critical programs. Since CAT only supports contiguous partitions of the LLC space, the LLC ways are allocated from low to high addresses. Delay-critical program 1 occupies the space denoted CLOS #0, of size (N-M)/2 ways. The M reserved ways are isolated and defined as CLOS #1, a spare space whose addresses begin immediately after the last address of CLOS #0. The remaining LLC space is allocated to delay-critical program 2, with the space defined as CLOS #2.
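As an illustration of the contiguous-way layout just described, the sketch below computes CAT capacity bitmasks for CLOS #0 (program 1), CLOS #1 (spare) and CLOS #2 (program 2). The function name and the concrete values N = 20, M = 4 are hypothetical; a real system would write such masks through an interface like Linux resctrl rather than print them.

```python
def initial_llc_masks(n_ways: int, m_reserved: int) -> dict:
    """Contiguous LLC way bitmasks for two delay-critical programs plus spare.

    Assumes two programs, as in the example above: CLOS #0 takes the lowest
    (n_ways - m_reserved) // 2 ways, CLOS #1 (spare) the next m_reserved ways,
    and CLOS #2 all remaining ways.
    """
    per_prog = (n_ways - m_reserved) // 2
    clos0 = (1 << per_prog) - 1                       # lowest ways
    clos1 = ((1 << m_reserved) - 1) << per_prog       # spare, right after CLOS #0
    start2 = per_prog + m_reserved
    clos2 = ((1 << (n_ways - start2)) - 1) << start2  # remaining ways
    return {"CLOS#0": clos0, "CLOS#1": clos1, "CLOS#2": clos2}

masks = initial_llc_masks(n_ways=20, m_reserved=4)
print({name: bin(mask) for name, mask in masks.items()})
```

Each mask is a contiguous run of set bits, satisfying the CAT contiguity constraint mentioned above.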
As one or more embodiments, the S102: dividing each delay key program into a plurality of program stages; the method comprises the following specific steps:
the number of instructions is counted by a counter, and the stages of the program are divided by executing a set number of instructions.
As one or more embodiments, the S102: dividing each program stage into a plurality of program intervals; the method comprises the following specific steps:
counting conditional branch instructions and triggering an interrupt after every X conditional branch instructions are executed;
that is, every X conditional branch instructions form one program interval. Another hardware counter records the total number of instructions executed during the interval; X is a positive integer.
Illustratively, each program phase is subdivided into a number of program intervals; the method comprises the following specific steps:
each program phase is divided into a number of program intervals using different sampling periods.
It should be understood that, in S102, each delay-critical program is divided into program phases containing a fixed number of instructions; to capture program performance information better, the present application introduces a two-level phase detection method.
Phase division: during execution, a program's performance indicators (such as IPC) may change; program segments belonging to the same phase have similar performance indicators, while segments belonging to different phases differ, so the program can be divided into phases according to these indicators. The method divides the program into phases of a fixed instruction count and then classifies the runtime phases by the IPC indicator; the fixed instruction count may be, for example, 10 million, 100 million, or 1 billion instructions.
Interval division: to obtain finer-grained phase information at run time, the method subdivides each program phase into intervals. To reduce sampling overhead and information loss, performance data is sampled once every X conditional branch instructions. The sampling period can be chosen to suit the situation, for example 100M or 200M.
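The two-level division above can be sketched as a simple mapping from cumulative counter readings to phase and interval indices. The constants below (a 100M-instruction phase and X = 100M conditional branches per interval) are merely two of the example values mentioned in the text.

```python
PHASE_LEN_INSNS = 100_000_000  # fixed instruction count per program phase
X_BRANCHES = 100_000_000       # conditional branches per program interval

def phase_and_interval(total_insns: int, total_branches: int) -> tuple:
    """Map cumulative instruction / conditional-branch counts to
    (phase index, interval index) under the two-level division."""
    return total_insns // PHASE_LEN_INSNS, total_branches // X_BRANCHES

# 250M instructions and 150M branches executed so far:
print(phase_and_interval(250_000_000, 150_000_000))  # (2, 1)
```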
As one or more embodiments, the S103: sampling a program interval of each program stage of each delay key program by using a hardware performance counter in the process that a plurality of delay key programs are operated together; the method comprises the following specific steps:
while the plurality of delay-critical programs run together, a hardware performance counter samples a program interval in each program phase of each delay-critical program, obtaining the performance indicators: instructions per cycle (IPC), the number of LLC misses, the number of LLC hits, and the number of LLC references.
The interval indicators MPKI_LLC and HPKI_LLC are then computed from the obtained LLC miss, hit, and reference counts. The per-interval MPKI_LLC and HPKI_LLC are calculated as follows:

MPKI_LLC = Num_Miss / (Num_Ins / 1000)

HPKI_LLC = Num_Hit / (Num_Ins / 1000)

where Num_Miss is the number of LLC misses, Num_Ins is the number of instructions executed in the interval, and Num_Hit is the number of LLC hits.
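A minimal sketch of the two per-interval formulas above, with argument names following the symbols in the formulas (the counter values shown are illustrative):

```python
def mpki_llc(num_miss: int, num_ins: int) -> float:
    """LLC misses per thousand executed instructions in one interval."""
    return num_miss / (num_ins / 1000)

def hpki_llc(num_hit: int, num_ins: int) -> float:
    """LLC hits per thousand executed instructions in one interval."""
    return num_hit / (num_ins / 1000)

# Example interval: 10M instructions, 5,000 LLC misses, 50,000 LLC hits.
print(mpki_llc(num_miss=5_000, num_ins=10_000_000))  # 0.5
print(hpki_llc(num_hit=50_000, num_ins=10_000_000))  # 5.0
```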
It should be understood that, given the performance indicators of all intervals in a program phase, their averages are taken as the performance indicators of the phase the delay-critical program is in, and are used to analyze the phase behavior of the delay-critical program.
As one or more embodiments, the S103: calculating first, second and third actual performance data for each program phase from the sampled data; the method comprises the following specific steps:
calculating first actual performance data of each program phase from the sampled data; the first actual performance data is the average IPC (instructions per cycle);
calculating second actual performance data of each program phase from the sampled data; the second actual performance data is the average MPKI_LLC (missed instruction count per thousand instructions in the LLC);
calculating third actual performance data of each program phase from the sampled data; the third actual performance data is the average HPKI_LLC (hit instruction count per thousand instructions in the LLC).
Illustratively, the phase's IPC, MPKI_LLC (misses per thousand instructions in the LLC) and HPKI_LLC (hits per thousand instructions in the LLC) are calculated from the data sampled in each interval of the phase. The averages are computed as follows:

avg_IPC = (IPC_1 + IPC_2 + ... + IPC_n) / n

avg_MPKI_LLC = (MPKI_LLC,1 + MPKI_LLC,2 + ... + MPKI_LLC,n) / n

avg_HPKI_LLC = (HPKI_LLC,1 + HPKI_LLC,2 + ... + HPKI_LLC,n) / n

where IPC_1 is the IPC indicator of the first interval, n is the number of intervals in a program phase, MPKI_LLC,1 is the MPKI indicator of the first interval, and HPKI_LLC,1 is the HPKI indicator of the first interval.
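The phase-level averaging above can be sketched as follows; each interval contributes one (IPC, MPKI_LLC, HPKI_LLC) triple, and the phase metric is the arithmetic mean over the n intervals:

```python
def phase_averages(intervals: list) -> tuple:
    """intervals: list of (ipc, mpki_llc, hpki_llc) tuples, one per interval.
    Returns (avg_IPC, avg_MPKI_LLC, avg_HPKI_LLC) for the phase."""
    n = len(intervals)
    avg_ipc = sum(t[0] for t in intervals) / n
    avg_mpki = sum(t[1] for t in intervals) / n
    avg_hpki = sum(t[2] for t in intervals) / n
    return avg_ipc, avg_mpki, avg_hpki

# Two sampled intervals of one phase (illustrative values):
print(phase_averages([(1.0, 2.0, 10.0), (2.0, 4.0, 20.0)]))  # (1.5, 3.0, 15.0)
```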
As one or more embodiments, the S103: classifying the phase types of the corresponding program phases according to the first actual performance data; the method comprises the following specific steps:
according to the average IPC, avg_IPC, program phase types are divided into 3 classes: type A, type B, and type C, delimited by two thresholds, where α is a first set threshold and β is a second set threshold.
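A hypothetical sketch of this three-way classification. The exact inequalities appear only as formula images in the original, so the mapping below (type A for the lowest average IPC, type C for the highest, with α < β) is an assumption, chosen to be consistent with S104, where type-A phases are the candidates for more cache:

```python
def phase_type(avg_ipc: float, alpha: float, beta: float) -> str:
    """Classify a program phase by its average IPC against thresholds
    alpha < beta. The A/B/C ordering here is an assumption."""
    if avg_ipc < alpha:
        return "A"   # low IPC
    if avg_ipc < beta:
        return "B"   # medium IPC
    return "C"       # high IPC

print(phase_type(0.4, alpha=0.5, beta=1.2))  # A
print(phase_type(0.8, alpha=0.5, beta=1.2))  # B
print(phase_type(1.5, alpha=0.5, beta=1.2))  # C
```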
As one or more embodiments, the S103: classifying the performance of the program phase according to the second and third actual performance data; the method comprises the following specific steps:
according to the average MPKI_LLC and the average HPKI_LLC, program performance types are classified into 3 types: type a, type b, and type c, delimited by two thresholds, where η is a third set threshold, γ is a fourth set threshold, and η < γ.
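Analogously, a hypothetical sketch of the performance classification. The exact inequalities are formula images in the original; the single-metric ordering below (by average MPKI_LLC against η < γ) is an assumption, chosen so that type a marks cache-hungry phases and type c marks phases that gain little from the LLC:

```python
def performance_type(avg_mpki_llc: float, eta: float, gamma: float) -> str:
    """Classify a program phase's performance by its average MPKI_LLC against
    thresholds eta < gamma. The a/b/c ordering here is an assumption."""
    if avg_mpki_llc >= gamma:
        return "a"   # many LLC misses: likely to benefit from more cache
    if avg_mpki_llc >= eta:
        return "b"   # moderate miss rate
    return "c"       # few LLC misses: little benefit from more cache

print(performance_type(6.0, eta=1.0, gamma=5.0))  # a
print(performance_type(2.0, eta=1.0, gamma=5.0))  # b
print(performance_type(0.5, eta=1.0, gamma=5.0))  # c
```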
As one or more embodiments, the S104: dynamically adjusting the cache space occupied by each delay key program in the running process according to the stage type and the performance type of each program stage of each delay key program; the method comprises the following specific steps:
judging the phase type and the performance type of each program phase of each delay-critical program;
if the program phase is of type A and the performance type is a or b, it is preliminarily judged that the phase needs more cache space; while increasing the cache space, if the phase type does not change, immediately stop increasing and give back the added cache space; if the phase type changes to B or C, keep the modification;
if the program phase is of type A and the performance type is c, it is preliminarily judged that the phase's cache space should be reduced; if reducing the cache space by 1 way does not change the phase type, continue reducing the cache space; otherwise, restore the modification;
if the program phase is of type B and the performance type is b, the cache space occupied by the phase is left unchanged;
if the program phase is of type B and the performance type is a or c, it is preliminarily judged that the phase's cache space should be reduced; if after reducing 1 way of cache space the phase type does not change to type A, continue reducing the cache space; otherwise, restore the modification;
if the program phase is of type C and the performance type is a or c, judge whether the phase has surplus resources: if after reducing 1 way of cache space the phase type does not change to type A or B, continue reducing the cache space; otherwise, restore the modification;
if the program phase is of type C and the performance type is b, the cache space occupied by the phase is left unchanged.
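The decision rules of S104 can be condensed into a lookup table giving the tentative action for each (phase type, performance type) pair; the trial-and-rollback logic (undoing a 1-way change that does not help) would wrap this table and is omitted here. The table below is a sketch of the rules as stated:

```python
# Tentative per-phase action keyed by (phase type, performance type),
# following the if-chain above: A+a/b grow, B+b and C+b keep, all others shrink.
ACTIONS = {
    ("A", "a"): "grow",   ("A", "b"): "grow",  ("A", "c"): "shrink",
    ("B", "a"): "shrink", ("B", "b"): "keep",  ("B", "c"): "shrink",
    ("C", "a"): "shrink", ("C", "b"): "keep",  ("C", "c"): "shrink",
}

def tentative_action(phase_type: str, perf_type: str) -> str:
    """Look up the tentative cache-space action for one program phase."""
    return ACTIONS[(phase_type, perf_type)]

print(tentative_action("A", "b"))  # grow
print(tentative_action("C", "b"))  # keep
```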
as one or more embodiments, the method further comprises:
acquiring the LLC resource usage for dynamic management;
if CLOS #1 has free space, the space is acquired from CLOS #1; if the cache space in CLOS #1 has been fully allocated, it is judged whether the programs adjacent in physical address are in a resource-surplus state; if so, the surplus space is distributed according to the performance status of the adjacent programs; if no resource surplus exists, the system continues waiting for free space, and if no free space appears for a long time, data migration is performed.
The phase performance data of each program is recorded in a historical phase & performance table (HPPT), which stores phase information and performance information for each program. The method dynamically adjusts the cache occupied by the current phase according to the running program's phase behavior and its runtime performance information.
The cache space occupied by a delay-critical program is dynamically adjusted according to its phase, MPKI_LLC, and HPKI_LLC.
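A minimal sketch of the HPPT as described, assuming one record per (program, phase index); the field names are illustrative, since the patent does not specify the table's layout:

```python
# Historical phase & performance table: (program, phase index) -> record.
hppt: dict = {}

def record_phase(program: str, phase_idx: int, phase_type: str,
                 perf_type: str, avg_ipc: float) -> None:
    """Store one phase's classification and performance data in the HPPT."""
    hppt[(program, phase_idx)] = {
        "phase_type": phase_type,
        "perf_type": perf_type,
        "avg_ipc": avg_ipc,
    }

record_phase("lc-prog-1", 0, "A", "b", 0.8)
print(hppt[("lc-prog-1", 0)]["phase_type"])  # A
```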
Fig. 1 depicts a resource partitioning method. For each program that needs to be executed, program performance information is obtained using a hardware performance counter.
FIG. 2 depicts a program phase performance analysis method.
For each program to be executed, the avg_IPC indicator is used to analyze the program's runtime phase, and the avg_MPKI_LLC and avg_HPKI_LLC indicators are used to analyze the program's performance and dynamically adjust the cache space the program occupies.
Example two
The embodiment provides a system for ensuring the service quality when a plurality of delay key programs are executed together;
a system for ensuring quality of service when a plurality of delay-critical programs are executed together, comprising:
an initialization module configured to: initializing a hardware counter and starting a plurality of delay key programs; each delay key program is preset in a corresponding core, and the delay key programs on each core share the last level cache space LLC;
a staging module configured to: dividing each delay key program into a plurality of program stages; dividing each program stage into a plurality of program intervals;
a classification module configured to: sampling a program interval in each program phase of each delay key program by using a hardware performance counter in the process that a plurality of delay key programs are operated together; calculating first, second and third actual performance data for each program phase from the sampled data; classifying the phase types of the corresponding program phases according to the first actual performance data; classifying the performance of the program phase according to the second and third actual performance data;
a dynamic adjustment module configured to: and dynamically adjusting the cache space occupied by each delay key program in the running process according to the stage type and the performance type of each program stage of each delay key program.
It should be noted here that the initialization module, the phase division module, the classification module and the dynamic adjustment module correspond to steps S101 to S104 in the first embodiment, and the modules are the same as the corresponding steps in the implementation example and application scenarios, but are not limited to the disclosure in the first embodiment. It should be noted that the modules described above as part of a system may be implemented in a computer system such as a set of computer-executable instructions.
In the foregoing embodiments, the descriptions of the embodiments have different emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The proposed system can be implemented in other ways. For example, the above-described system embodiments are merely illustrative, and for example, the division of the above-described modules is merely a logical division, and in actual implementation, there may be other divisions, for example, multiple modules may be combined or integrated into another system, or some features may be omitted, or not executed.
EXAMPLE III
The present embodiment also provides an electronic device, including: one or more processors, one or more memories, and one or more computer programs; wherein, a processor is connected with the memory, the one or more computer programs are stored in the memory, and when the electronic device runs, the processor executes the one or more computer programs stored in the memory, so as to make the electronic device execute the method according to the first embodiment.
It should be understood that in this embodiment, the processor may be a central processing unit CPU, and the processor may also be other general purpose processors, digital signal processors DSP, application specific integrated circuits ASIC, off-the-shelf programmable gate arrays FPGA or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and so on. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory may include both read-only memory and random access memory, and may provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or instructions in the form of software.
The method in the first embodiment may be implemented directly by a hardware processor, or by a combination of hardware and software modules in the processor. The software modules may be located in RAM, flash memory, ROM, PROM or EPROM, registers, or other storage media well known in the art. The storage medium is located in a memory; the processor reads the information in the memory and completes the steps of the method in combination with its hardware. To avoid repetition, this is not described in detail here.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
Example four
This embodiment also provides a computer-readable storage medium for storing computer instructions which, when executed by a processor, perform the method of the first embodiment.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. A method for ensuring quality of service when a plurality of delay key programs are executed together, characterized by comprising the following steps:
initializing a hardware counter and starting a plurality of delay key programs, wherein each delay key program is pinned to a corresponding core and the delay key programs on the cores share the last-level cache (LLC) space;
dividing each delay key program into a number of program phases, and dividing each program phase into a number of program intervals;
while the plurality of delay key programs run together, sampling the program intervals in each program phase of each delay key program with a hardware performance counter; calculating first, second and third actual performance data for each program phase from the sampled data; classifying the phase type of the corresponding program phase according to the first actual performance data; and classifying the performance type of the program phase according to the second and third actual performance data; and
dynamically adjusting, at run time, the cache space occupied by each delay key program according to the phase type and performance type of each program phase of each delay key program.
2. The method of claim 1, wherein initializing a hardware counter and starting a plurality of delay key programs further comprises:
assuming the cache space LLC has N ways in total, reserving M ways as spare space and evenly distributing the remaining N-M ways among all the delay key programs; N and M are both positive integers.
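As an illustrative sketch of this initial even split (the function name, program names, and way counts are assumptions, not from the patent):

```python
# Hypothetical sketch of claim 2: reserve M of the N LLC ways as spare
# space, then split the remaining N - M ways evenly among the delay key
# programs. Any remainder that cannot be split evenly stays spare.
def initial_allocation(n_ways, m_reserved, programs):
    usable = n_ways - m_reserved
    share = usable // len(programs)      # even share of ways per program
    leftover = usable % len(programs)    # ways left over by integer division
    alloc = {p: share for p in programs}
    return alloc, m_reserved + leftover  # per-program ways, total spare ways

alloc, spare = initial_allocation(n_ways=20, m_reserved=4,
                                  programs=["lc0", "lc1", "lc2"])
# With 20 ways, 4 reserved: 16 usable, 5 per program, 1 extra stays spare.
```

On real hardware this allocation would be applied through a way-partitioning mechanism such as Intel CAT way bitmasks; the sketch only shows the arithmetic.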
3. The method of claim 1, wherein each delay key program is divided into a number of program phases, specifically:
counting the number of executed instructions with a counter; every set number of executed instructions delimits one program phase.
4. The method of claim 1, wherein each program phase is subdivided into a number of program intervals, specifically:
counting conditional branch instructions and triggering an interrupt after every X conditional branch instructions have been executed;
that is, every X conditional branch instructions form one program interval; another hardware counter records the total number of instructions executed during the interval, and X is a positive integer.
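A minimal simulation of the counting scheme in claims 3 and 4 (the class name, parameter values, and callback are assumptions; real counts would come from PMU overflow interrupts):

```python
# Hypothetical sketch of claims 3-4: an interrupt fires every X conditional
# branches and closes one program interval; a second counter supplies the
# instructions retired since the last interrupt, and a phase closes once a
# set number of instructions has been executed.
class PhaseTracker:
    def __init__(self, insns_per_phase):
        self.insns_per_phase = insns_per_phase
        self.insns_in_phase = 0
        self.intervals_in_phase = 0
        self.phases_completed = 0

    def on_interval_end(self, insns_since_last):
        # Called on each X-branch overflow interrupt; insns_since_last is
        # read from the second hardware counter.
        self.insns_in_phase += insns_since_last
        self.intervals_in_phase += 1
        if self.insns_in_phase >= self.insns_per_phase:
            self.phases_completed += 1
            intervals = self.intervals_in_phase
            self.insns_in_phase = 0
            self.intervals_in_phase = 0
            return intervals  # intervals in the phase just closed
        return None

tracker = PhaseTracker(insns_per_phase=1000)
results = [tracker.on_interval_end(300) for _ in range(4)]
# The fourth interval pushes the count past 1000, closing a 4-interval phase.
```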
5. The method of claim 1, wherein the program intervals of each program phase of each delay key program are sampled with a hardware performance counter while the plurality of delay key programs run together, specifically:
while the plurality of delay key programs run together, sampling the program intervals of each program phase of each delay key program with a hardware performance counter to obtain the performance indicators: instructions per cycle (IPC), the number of LLC misses, the number of LLC hits, and the number of LLC references.
6. The method of claim 1, wherein the first, second and third actual performance data of each program phase are calculated from the sampled data, specifically:
calculating the first actual performance data of each program phase from the sampled data, the first actual performance data being the average instructions per cycle, IPC;
calculating the second actual performance data of each program phase from the sampled data, the second actual performance data being the average LLC misses per thousand instructions, MPKI_LLC;
calculating the third actual performance data of each program phase from the sampled data, the third actual performance data being the average LLC hits per thousand instructions, HPKI_LLC.
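The three per-phase performance data of claim 6 can be sketched as follows (the field names of the sample records are illustrative assumptions):

```python
# Hypothetical sketch of claim 6: aggregate the per-interval samples of a
# phase into IPC (first datum), MPKI_LLC (second) and HPKI_LLC (third).
def phase_metrics(samples):
    insns = sum(s["insns"] for s in samples)      # instructions retired
    cycles = sum(s["cycles"] for s in samples)    # CPU cycles elapsed
    misses = sum(s["llc_miss"] for s in samples)  # LLC misses
    hits = sum(s["llc_hit"] for s in samples)     # LLC hits
    return {
        "IPC": insns / cycles,              # instructions per cycle
        "MPKI_LLC": misses * 1000 / insns,  # misses per thousand instructions
        "HPKI_LLC": hits * 1000 / insns,    # hits per thousand instructions
    }

m = phase_metrics([
    {"insns": 1000, "cycles": 2000, "llc_miss": 5, "llc_hit": 20},
    {"insns": 1000, "cycles": 2000, "llc_miss": 5, "llc_hit": 20},
])
```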
7. The method of claim 1, wherein the phase types of the respective program phases are classified according to the first actual performance data, specifically:
according to the average IPC of the program phase [formula shown as image FDA0002833794330000021 in the original],
program phase types are divided into 3 classes:
class A: [image FDA0002833794330000022];
class B: [image FDA0002833794330000023];
class C: [image FDA0002833794330000031];
wherein α is a first set threshold and β is a second set threshold;
or,
the performance of the program phase is classified according to the second and third actual performance data, specifically:
according to MPKI_LLC [image FDA0002833794330000032] and HPKI_LLC [image FDA0002833794330000033],
program performance types are divided into 3 classes:
class a: [image FDA0002833794330000034];
class b: [image FDA0002833794330000035];
class c: [image FDA0002833794330000036];
wherein η is a third set threshold, γ is a fourth set threshold, and η < γ;
dynamically adjusting, at run time, the cache space occupied by each delay key program according to the phase type and performance type of each program phase of each delay key program, specifically:
judging the phase type and performance type of each program phase of each delay key program;
if the phase type is A and the performance type is a or b, preliminarily judging that the program phase needs more cache space; while increasing the cache space, if the phase type does not change, immediately stop increasing and give back the added cache space; if the phase type changes to B or C, keep the modification;
if the phase type is A and the performance type is c, preliminarily judging that the cache space of the program phase should be reduced; after reducing the cache space by 1 way, if the condition [formula shown as image FDA0002833794330000037 in the original] is not met, continue reducing the cache space; otherwise, restore the modification;
if the phase type is B and the performance type is b, leave the cache space occupied by the program phase unchanged;
if the phase type is B and the performance type is a or c, preliminarily judging that the cache space should be reduced; after reducing the cache space by 1 way, if the phase type has not changed to A, continue reducing the cache space; otherwise, restore the modification;
if the phase type is C and the performance type is a or c, judging whether the program phase has surplus resources: after reducing the cache space by 1 way, if the phase type has not changed to A or B, continue reducing the cache space; otherwise, restore the modification;
if the phase type is C and the performance type is b, leave the cache space occupied by the program phase unchanged.
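The decision table at the end of claim 7 can be sketched as the preliminary action taken before the claim's trial-and-revert check (the type labels follow the claim; the function and action names are assumptions):

```python
# Hypothetical sketch of claim 7's adjustment rules: given a phase type
# ("A"/"B"/"C", from IPC) and a performance type ("a"/"b"/"c", from
# MPKI_LLC/HPKI_LLC), pick the preliminary cache-way action. The claim's
# grow/shrink trial with possible rollback happens after this decision.
def adjustment_action(phase_type, perf_type):
    if phase_type == "A":
        # Low-performing phase: grow its ways unless type c says shrink.
        return "grow" if perf_type in ("a", "b") else "shrink"
    # B and C phases keep their ways when the performance type is b;
    # otherwise they are candidates for giving ways back.
    return "keep" if perf_type == "b" else "shrink"

actions = {(p, q): adjustment_action(p, q)
           for p in "ABC" for q in "abc"}
```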
8. A system for ensuring quality of service when a plurality of delay key programs are executed together, characterized by comprising:
an initialization module configured to: initialize a hardware counter and start a plurality of delay key programs, wherein each delay key program is pinned to a corresponding core and the delay key programs on the cores share the last-level cache (LLC) space;
a staging module configured to: divide each delay key program into a number of program phases, and divide each program phase into a number of program intervals;
a classification module configured to: while the plurality of delay key programs run together, sample the program intervals in each program phase of each delay key program with a hardware performance counter; calculate first, second and third actual performance data for each program phase from the sampled data; classify the phase type of the corresponding program phase according to the first actual performance data; and classify the performance type of the program phase according to the second and third actual performance data; and
a dynamic adjustment module configured to: dynamically adjust, at run time, the cache space occupied by each delay key program according to the phase type and performance type of each program phase of each delay key program.
9. An electronic device, comprising: one or more processors, one or more memories, and one or more computer programs; wherein the processor is connected to the memory and the one or more computer programs are stored in the memory; when the electronic device runs, the processor executes the one or more computer programs stored in the memory, so as to cause the electronic device to perform the method of any one of claims 1-7.
10. A computer-readable storage medium storing computer instructions which, when executed by a processor, perform the method of any one of claims 1 to 7.
CN202011465046.2A 2020-12-14 2020-12-14 Method and system for ensuring service quality when multiple delay key programs are executed together Active CN112540934B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011465046.2A CN112540934B (en) 2020-12-14 2020-12-14 Method and system for ensuring service quality when multiple delay key programs are executed together


Publications (2)

Publication Number Publication Date
CN112540934A true CN112540934A (en) 2021-03-23
CN112540934B CN112540934B (en) 2022-07-29

Family

ID=75018579

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011465046.2A Active CN112540934B (en) 2020-12-14 2020-12-14 Method and system for ensuring service quality when multiple delay key programs are executed together

Country Status (1)

Country Link
CN (1) CN112540934B (en)


Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101916230A (en) * 2010-08-11 2010-12-15 中国科学技术大学苏州研究院 Partitioning and thread-aware based performance optimization method of last level cache (LLC)
US20110055479A1 (en) * 2009-08-28 2011-03-03 Vmware, Inc. Thread Compensation For Microarchitectural Contention
CN103077128A (en) * 2012-12-29 2013-05-01 华中科技大学 Method for dynamically partitioning shared cache in multi-core environment
CN103235764A (en) * 2013-04-11 2013-08-07 浙江大学 Thread-aware multi-core data prefetching self-regulation method
US20140095691A1 (en) * 2012-09-28 2014-04-03 Mrittika Ganguli Managing data center resources to achieve a quality of service
US9401869B1 (en) * 2012-06-04 2016-07-26 Google Inc. System and methods for sharing memory subsystem resources among datacenter applications
CN107463510A (en) * 2017-08-21 2017-12-12 北京工业大学 It is a kind of towards high performance heterogeneous polynuclear cache sharing amortization management method
CN107851040A (en) * 2015-07-23 2018-03-27 高通股份有限公司 For the system and method using cache requirements monitoring scheduler task in heterogeneous processor cluster framework
CN108845960A (en) * 2013-10-23 2018-11-20 华为技术有限公司 A kind of memory resource optimization method and device
CN110618872A (en) * 2019-09-25 2019-12-27 山东师范大学 Hybrid memory dynamic scheduling method and system
CN111258927A (en) * 2019-11-13 2020-06-09 北京大学 Application program CPU last-level cache miss rate curve prediction method based on sampling
CN112000465A (en) * 2020-07-21 2020-11-27 山东师范大学 Method and system for reducing performance interference of delay sensitive program in data center environment


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113504977A (en) * 2021-06-18 2021-10-15 山东师范大学 Cache partitioning method and system for ensuring service quality of multiple delay key programs
CN113821324A (en) * 2021-09-17 2021-12-21 海光信息技术股份有限公司 Cache system, method, apparatus and computer medium for processor
CN113821324B (en) * 2021-09-17 2022-08-09 海光信息技术股份有限公司 Cache system, method, apparatus and computer medium for processor

Also Published As

Publication number Publication date
CN112540934B (en) 2022-07-29

Similar Documents

Publication Publication Date Title
KR102456085B1 (en) Dynamic memory remapping to reduce row buffer collisions
US20210374046A1 (en) Performance counters for computer memory
US6865647B2 (en) Dynamic cache partitioning
US7899994B2 (en) Providing quality of service (QoS) for cache architectures using priority information
US7725657B2 (en) Dynamic quality of service (QoS) for a shared cache
US8190795B2 (en) Memory buffer allocation device and computer readable medium having stored thereon memory buffer allocation program
CN108845960B (en) Memory resource optimization method and device
CN112540934B (en) Method and system for ensuring service quality when multiple delay key programs are executed together
US20080235487A1 (en) Applying quality of service (QoS) to a translation lookaside buffer (TLB)
US20110113215A1 (en) Method and apparatus for dynamic resizing of cache partitions based on the execution phase of tasks
US20050125613A1 (en) Reconfigurable trace cache
US8769543B2 (en) System and method for maximizing data processing throughput via application load adaptive scheduling and context switching
CN109308220B (en) Shared resource allocation method and device
KR101356033B1 (en) Hybrid Main Memory System and Task Scheduling Method therefor
US20170371550A1 (en) Frame choosing during storage constraint condition
US20200210340A1 (en) Cache Management Method, Cache and Storage Medium
US9189279B2 (en) Assignment method and multi-core processor system
WO2016202154A1 (en) Gpu resource allocation method and system
US8769201B2 (en) Technique for controlling computing resources
US20190056872A1 (en) Reallocate memory pending queue based on stall
CN106294192B (en) Memory allocation method, memory allocation device and server
CN115421924A (en) Memory allocation method, device and equipment
CN112579277B (en) Central processing unit, method, device and storage medium for simultaneous multithreading
CN116483742A (en) Prefetch address generation method and computer equipment
CN113505087B (en) Cache dynamic dividing method and system considering service quality and utilization rate

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant