CN112540934B - Method and system for ensuring service quality when multiple delay key programs are executed together - Google Patents


Info

Publication number
CN112540934B
CN112540934B (application CN202011465046.2A)
Authority
CN
China
Prior art keywords
program
type
stage
cache space
phase
Prior art date
Legal status: Active
Application number
CN202011465046.2A
Other languages
Chinese (zh)
Other versions
CN112540934A (en)
Inventor
王琳
李东桦
黄天元
耿世超
周莲莲
季红滨
张昭
Current Assignee
Shandong Big Data Center
Shandong Normal University
Original Assignee
Shandong Big Data Center
Shandong Normal University
Priority date
Filing date
Publication date
Application filed by Shandong Big Data Center, Shandong Normal University filed Critical Shandong Big Data Center
Priority to CN202011465046.2A priority Critical patent/CN112540934B/en
Publication of CN112540934A publication Critical patent/CN112540934A/en
Application granted granted Critical
Publication of CN112540934B publication Critical patent/CN112540934B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0811Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention discloses a method and system for ensuring quality of service when a plurality of latency-critical programs are executed together. A plurality of latency-critical programs are started, each pinned to its own core, with the programs on all cores sharing the last-level cache space. Each latency-critical program is divided into a number of program phases, and each program phase into a number of program intervals. While the latency-critical programs run together, one program interval is sampled in each program phase of each program; first, second and third actual performance data are calculated for each program phase from the sampled data; the phase type and the performance of the corresponding program phase are classified from these data; and the cache space each latency-critical program occupies at runtime is adjusted dynamically according to the phase type and performance type of each of its program phases.

Description

Method and system for ensuring service quality when multiple delay key programs are executed together
Technical Field
The present application relates to the field of parallel and distributed computing technologies, and in particular, to a method and system for ensuring quality of service when a plurality of delay-critical programs are executed together.
Background
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
Data centers have matured from a concept into practice. In a data center, a large number of programs are executed on as few servers as possible in order to improve resource utilization, so a single server node runs multiple programs at once. Co-executing programs raises server utilization, but it also degrades program performance. The degree of degradation depends on program characteristics: some programs slow down only slightly when executed with others, while others slow down severely.
At the same time, a large number of latency-critical programs run in data centers. Customers who execute programs in a data center impose quality-of-service requirements on them, for example that a program's performance must not fall below 90% of its performance when executed alone. When a latency-critical program is co-executed with other programs, performance interference can easily cause severe degradation that violates these requirements. A method is therefore needed that guarantees the quality of service of latency-critical programs while raising system resource utilization as far as possible; this is the problem the present application addresses.
Disclosure of Invention
To remedy these defects of the prior art, the present application provides a method and system for ensuring quality of service when a plurality of latency-critical programs are executed together.
In a first aspect, the present application provides a method for ensuring quality of service when multiple latency-critical programs are executed together;
a method for ensuring quality of service when a plurality of latency-critical programs are executed together, comprising:
initializing a hardware counter and starting a plurality of latency-critical programs; each latency-critical program is pinned to its own core, and the programs on all cores share the last-level cache space (LLC);
dividing each latency-critical program into a number of program phases, and each program phase into a number of program intervals;
while the latency-critical programs run together, sampling one program interval in each program phase of each latency-critical program with a hardware performance counter; calculating first, second and third actual performance data for each program phase from the sampled data; classifying the phase type of the corresponding program phase according to the first actual performance data; classifying the performance of the program phase according to the second and third actual performance data;
and dynamically adjusting, at runtime, the cache space occupied by each latency-critical program according to the phase type and performance type of each of its program phases.
In a second aspect, the present application provides a system for ensuring quality of service when multiple latency-critical programs are executed together;
a system for ensuring quality of service when a plurality of latency-critical programs are executed together, comprising:
an initialization module configured to initialize a hardware counter and start a plurality of latency-critical programs, each pinned to its own core, with the latency-critical programs on all cores sharing the last-level cache space (LLC);
a phase division module configured to divide each latency-critical program into a number of program phases, and each program phase into a number of program intervals;
a classification module configured to sample, while the latency-critical programs run together, one program interval in each program phase of each latency-critical program with a hardware performance counter; calculate first, second and third actual performance data for each program phase from the sampled data; classify the phase type of the corresponding program phase according to the first actual performance data; and classify the performance of the program phase according to the second and third actual performance data;
and a dynamic adjustment module configured to dynamically adjust, at runtime, the cache space occupied by each latency-critical program according to the phase type and performance type of each of its program phases.
In a third aspect, the present application further provides an electronic device, comprising one or more processors, one or more memories, and one or more computer programs; the processor is connected to the memory, the one or more computer programs are stored in the memory, and when the electronic device runs, the processor executes the stored computer programs so as to make the electronic device perform the method of the first aspect.
In a fourth aspect, the present application further provides a computer-readable storage medium storing computer instructions which, when executed by a processor, perform the method of the first aspect.
In a fifth aspect, the present application further provides a computer program (product) comprising a computer program which, when run on one or more processors, implements the method of any of the preceding first aspects.
Compared with the prior art, the beneficial effects of the present application are as follows:
by monitoring the performance indices of the latency-critical programs in real time and using CAT to dynamically partition LLC resources among different types of latency-critical programs, the performance of co-executing latency-critical programs is guaranteed while LLC utilization is raised as far as possible.
Intel's technology supporting last-level cache (LLC) allocation makes better use of the cache through cache partitioning. The present application can use this technique to meet users' performance requirements by preventing latency-critical programs from polluting each other's caches, and can meet them further by allocating more LLC resources to latency-critical programs whose performance benefits while reducing or stopping allocation to those that do not benefit.
The invention dynamically adjusts the space a program occupies using the performance indices of its runtime phases, and can raise both the number of co-located latency-critical programs and LLC resource utilization as far as possible while guaranteeing the programs' quality of service.
Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application.
FIG. 1 is a flowchart of a resource partitioning method according to a first embodiment;
FIG. 2 is a flowchart of the program phase performance analysis of the first embodiment.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for describing particular embodiments only and is not intended to limit example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should further be understood that the terms "comprises" and "comprising", and any variations thereof, are intended to cover a non-exclusive inclusion: a process, method, system, article or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to it.
The embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
Interpretation of terms:
Latency-critical programs: applications with strict requirements on tail latency; tail latency is an important performance metric for latency-critical programs.
LLC: Last-Level Cache, the highest-level cache shared by all functional units on the chip (e.g. CPU cores, the IGP and the DSP).
CAT: Cache Allocation Technology, whose basic goal is to enable resource allocation based on application priority or class of service (CLOS). The Intel Xeon processor E5 v4 family (and a communications-oriented subset of the Intel Xeon processor E5 v3 family) introduced the ability to configure and use cache allocation technology on the last-level cache.
CLOS: Class of Service; as an abstraction, a CLOS can carry multiple resource-control attributes, reducing software overhead at context switches.
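On Linux, CAT and CLOS are typically driven through the kernel's resctrl filesystem. The session below is an illustrative sketch only; the group name, way mask and PID are hypothetical examples, and the mask must respect the hardware's contiguity rule:

```shell
# Mount the resctrl filesystem (requires root and a CAT-capable CPU).
mount -t resctrl resctrl /sys/fs/resctrl
# Create one resource group per class of service (CLOS).
mkdir /sys/fs/resctrl/clos0
# Restrict clos0 to the 5 lowest ways of L3 cache id 0 (way mask 0x1f).
echo "L3:0=1f" > /sys/fs/resctrl/clos0/schemata
# Pin a latency-critical program (PID 1234, hypothetical) to the group.
echo 1234 > /sys/fs/resctrl/clos0/tasks
```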
Example one
This embodiment provides a method for ensuring quality of service when a plurality of latency-critical programs are executed together;
a method for ensuring quality of service when a plurality of latency-critical programs are executed together, comprising:
S101: initialize a hardware counter and start a plurality of latency-critical programs; each latency-critical program is pinned to its own core, and the programs on all cores share the last-level cache space (LLC);
S102: divide each latency-critical program into a number of program phases, and each program phase into a number of program intervals;
S103: while the latency-critical programs run together, sample one program interval in each program phase of each latency-critical program with a hardware performance counter;
calculate first, second and third actual performance data for each program phase from the sampled data;
classify the phase type of the corresponding program phase according to the first actual performance data;
classify the performance of the program phase according to the second and third actual performance data;
S104: dynamically adjust, at runtime, the cache space occupied by each latency-critical program according to the phase type and performance type of each of its program phases.
It should be understood that, in S101, to ensure the latency-critical programs do not contend for CPU time, the initial state of the system places each latency-critical program on a different core, with the programs on all cores sharing the LLC.
Here, "a plurality of latency-critical programs" means two or more latency-critical programs.
As one or more embodiments, the S101 further includes:
assuming the cache space LLC has N ways in total, reserve M ways as spare space and distribute the remaining N−M ways evenly among all latency-critical programs; N and M are both positive integers.
Illustratively, each latency-critical program is assigned to a different CLOS and isolated by CAT, reducing interference between latency-critical programs.
For example, if the system's LLC has N ways, M cache ways are reserved for CLOS#1 as the spare space, and the remaining N−M ways are allocated evenly to all latency-critical programs.
Illustratively, with two latency-critical programs, the LLC space is allocated from low to high addresses, since CAT only supports contiguous partitions of the LLC. Latency-critical program 1 occupies the space denoted CLOS#0, of size (N−M)/2 ways. The next M ways are isolated and defined as CLOS#1, the spare space, which begins immediately after the last way of CLOS#0. The remaining LLC space is allocated to latency-critical program 2 and defined as CLOS#2.
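The layout just described (program 1 in the low ways, the spare space in the middle, program 2 in the high ways) can be sketched as contiguous way bitmasks; `initial_partition` is a hypothetical helper for illustration, not code from the patent:

```python
def initial_partition(n_ways, m_spare):
    """Split an N-way LLC between two latency-critical programs with
    M spare ways in between; returns (clos0, clos1, clos2) way masks."""
    p1 = (n_ways - m_spare) // 2                # ways for program 1 (CLOS#0)
    p2 = n_ways - m_spare - p1                  # ways for program 2 (CLOS#2)
    clos0 = (1 << p1) - 1                       # lowest p1 ways
    clos1 = ((1 << m_spare) - 1) << p1          # spare ways, immediately above
    clos2 = ((1 << p2) - 1) << (p1 + m_spare)   # remaining high ways
    return clos0, clos1, clos2
```

Each mask is contiguous, as CAT requires; for an 11-way LLC with one spare way this yields masks 0x01F, 0x020 and 0x7C0.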
As one or more embodiments, in S102 each latency-critical program is divided into a number of program phases; the specific steps are as follows:
a counter tracks the number of instructions, and a new program phase is delimited each time a set number of instructions has executed.
As one or more embodiments, in S102 each program phase is divided into a number of program intervals; the specific steps are as follows:
conditional branch instructions are counted, and an interrupt is triggered after every X conditional branch instructions have executed;
that is, every X conditional branch instructions delimit one program interval; another hardware counter records the total number of instructions executed during the interval, and X is a positive integer.
Illustratively, each program phase may be subdivided into program intervals using different sampling periods.
It should be understood that, in S102, each latency-critical program is divided into program phases containing a fixed number of instructions; to capture program performance information more accurately, the present application introduces a two-level phase detection method.
Phase division: during execution, a program's performance indices (such as IPC) change over time; program segments belonging to the same phase have similar performance indices, while segments belonging to different phases differ, so a program can be divided into phases according to its performance indices. The method delimits phases by a fixed instruction count and then classifies the program's runtime phases by the IPC index; the fixed instruction count may be, for example, 10 million, 100 million, or 1 billion instructions.
Interval division: to obtain finer-grained phase information at runtime, the method subdivides each program phase into intervals. To reduce sampling overhead and information loss, performance data is sampled once every X conditional branch instructions; the sampling period can be chosen to fit the situation, for example 100M or 200M.
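The branch-triggered interval division can be mimicked offline; `interval_boundaries` is an illustrative helper (name and representation assumed) that, given the retired-instruction positions of the conditional branches, returns the positions at which every X-th branch would raise the sampling interrupt:

```python
def interval_boundaries(branch_positions, x):
    """branch_positions: instruction indices at which conditional
    branches retire, in order; x: branches per program interval.
    Returns the instruction index that ends each interval."""
    return [pos for i, pos in enumerate(branch_positions, start=1)
            if i % x == 0]
```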
As one or more embodiments, in S103, while the plurality of latency-critical programs run together, a hardware performance counter samples one program interval in each program phase of each latency-critical program; the specific steps are as follows:
during co-execution, the hardware performance counter samples a program interval in each program phase of each latency-critical program, obtaining the performance indices instructions per cycle (IPC), the number of LLC misses, the number of LLC hits, and the number of LLC references.
From the collected counts, the program-interval indices MPKI_LLC and HPKI_LLC are calculated as:
MPKI_LLC = (Num_Miss / Num_Ins) × 1000
HPKI_LLC = (Num_Hit / Num_Ins) × 1000
where Num_Miss is the number of LLC misses, Num_Hit is the number of LLC hits, and Num_Ins is the number of instructions executed in the interval.
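A direct transcription of the two per-interval formulas (taking Num_Ins to be the number of instructions retired in the interval) might look like:

```python
def mpki_llc(num_miss, num_ins):
    """LLC misses per thousand retired instructions."""
    return num_miss / num_ins * 1000

def hpki_llc(num_hit, num_ins):
    """LLC hits per thousand retired instructions."""
    return num_hit / num_ins * 1000
```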
It should be understood that, given the performance indices of all intervals in a program phase, their average is taken as the performance index of the phase the latency-critical program is in, and is used to analyze the latency-critical program's phase behavior.
As one or more embodiments, in S103 the first, second and third actual performance data of each program phase are calculated from the sampled data; the specific steps are as follows:
the first actual performance data is the phase average of the instructions per cycle, IPC;
the second actual performance data is the phase average of the LLC misses per thousand instructions, MPKI_LLC;
the third actual performance data is the phase average of the LLC hits per thousand instructions, HPKI_LLC.
Illustratively, with n intervals in a phase, the averages are calculated from the data sampled in each interval as:
average IPC = (IPC_1 + IPC_2 + … + IPC_n) / n
average MPKI_LLC = (MPKI_LLC,1 + MPKI_LLC,2 + … + MPKI_LLC,n) / n
average HPKI_LLC = (HPKI_LLC,1 + HPKI_LLC,2 + … + HPKI_LLC,n) / n
where IPC_1, MPKI_LLC,1 and HPKI_LLC,1 are the indices of the first interval and n is the number of intervals in the program phase.
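The per-phase averages can be sketched as a small helper over the interval samples; `phase_averages` and the tuple layout are assumptions for illustration:

```python
def phase_averages(samples):
    """samples: one (ipc, mpki_llc, hpki_llc) tuple per interval.
    Returns the arithmetic means used as the phase's indices."""
    n = len(samples)
    return (sum(s[0] for s in samples) / n,
            sum(s[1] for s in samples) / n,
            sum(s[2] for s in samples) / n)
```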
As one or more embodiments, in S103 the phase type of the corresponding program phase is classified according to the first actual performance data; the specific steps are as follows:
according to the phase average IPC, program phase types are divided into 3 classes:
A-type: average IPC < α;
B-type: α ≤ average IPC < β;
C-type: average IPC ≥ β;
where α is a first set threshold, β is a second set threshold, and α < β.
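Assuming the two thresholds satisfy α < β and the three classes band the phase's average IPC, the classification reduces to two comparisons; `phase_type` is a hypothetical helper:

```python
def phase_type(ipc_avg, alpha, beta):
    """Classify a program phase by its average IPC; assumes alpha < beta."""
    if ipc_avg < alpha:
        return 'A'       # low-IPC phase
    if ipc_avg < beta:
        return 'B'       # medium-IPC phase
    return 'C'           # high-IPC phase
```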
As one or more embodiments, in S103 the performance of the program phase is classified according to the second and third actual performance data, i.e. the phase averages of MPKI_LLC and HPKI_LLC; the specific steps are as follows:
program performance types are divided into 3 classes:
a-type: average MPKI_LLC ≥ γ;
b-type: η ≤ average MPKI_LLC < γ;
c-type: average MPKI_LLC < η;
where η is a third set threshold, γ is a fourth set threshold, and η < γ.
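The original formula images are not legible in this text, so the exact conditions are uncertain; one plausible banding using the phase's average MPKI_LLC with η < γ is sketched below (an assumption, since the patent's conditions also involve HPKI_LLC):

```python
def performance_type(mpki_avg, eta, gamma):
    """Classify a phase's performance; assumes eta < gamma.
    Using MPKI alone is an assumption made for illustration."""
    if mpki_avg >= gamma:
        return 'a'       # miss-intensive phase
    if mpki_avg >= eta:
        return 'b'       # moderate miss rate
    return 'c'           # few LLC misses
```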
As one or more embodiments, in S104 the cache space each latency-critical program occupies at runtime is dynamically adjusted according to the phase type and performance type of each of its program phases; the specific steps are as follows:
determine the phase type and performance type of each program phase of each latency-critical program;
if the phase type is A and the performance type is a or b, tentatively conclude that the program phase needs more cache space; while increasing the cache space, if the phase type does not change, immediately stop increasing and release the added cache space; if the phase type changes to B or C, keep the modification;
if the phase type is A and the performance type is c, tentatively conclude that the program phase's cache space should be reduced; release one cache way, and if the phase's indices do not change, continue reducing the cache space, otherwise restore the modification;
if the phase type is B and the performance type is b, leave the cache space occupied by the program phase unchanged;
if the phase type is B and the performance type is a or c, tentatively conclude that the program phase's cache space should be reduced; if, after releasing one cache way, the phase type has not changed to A, continue reducing the cache space; otherwise restore the modification;
if the phase type is C and the performance type is a or c, check whether the program phase has surplus resources; if, after releasing one cache way, the phase type has not changed to A or B, continue reducing the cache space; otherwise restore the modification;
if the phase type is C and the performance type is b, leave the cache space occupied by the program phase unchanged.
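The case analysis above maps each (phase type, performance type) pair to a tentative action that is then verified against the re-measured phase type and possibly rolled back; `adjust_action` is a sketch of the decision table only, with the trial-and-rollback loop omitted:

```python
def adjust_action(phase_t, perf_t):
    """Return the tentative LLC action for one program phase:
    'grow', 'shrink' or 'keep' one cache way."""
    if phase_t == 'A':
        return 'grow' if perf_t in ('a', 'b') else 'shrink'
    # B- and C-type phases keep their space only for performance type b
    return 'keep' if perf_t == 'b' else 'shrink'
```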
As one or more embodiments, the method further comprises:
monitoring the resource usage of the LLC for dynamic management;
when more space is needed, if CLOS#1 has free ways, acquire them from CLOS#1; if the cache space of CLOS#1 is fully allocated, check whether the program adjacent in the physical address range is in a resource-surplus state; if so, allocate the surplus space according to the adjacent program's performance status; if there is no resource surplus, keep waiting for free space; and if no free space appears for a long time, migrate the data.
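The spare-space policy is a three-way fallback; `acquire_way` sketches where one additional way would come from (the function name and return values are illustrative):

```python
def acquire_way(spare_free_ways, neighbor_has_surplus):
    """Decide the source of one additional cache way: the spare pool
    (CLOS#1) first, then a surplus neighbor in the physical address
    range, else wait (and eventually migrate if nothing frees up)."""
    if spare_free_ways > 0:
        return 'spare'
    if neighbor_has_surplus:
        return 'neighbor'
    return 'wait'
```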
The phase performance data of each program is recorded in a historical phase & performance table (HPPT), which stores the phase information and performance information of every program. According to the phase behavior of the running program and its runtime performance information, the cache occupied by the current phase is adjusted dynamically;
that is, the cache space occupied by each latency-critical program is adjusted dynamically according to its phase, MPKI_LLC and HPKI_LLC.
Fig. 1 depicts the resource partitioning method: for each program to be executed, program performance information is obtained with a hardware performance counter.
Fig. 2 depicts the program phase performance analysis method.
For each program to be executed, the average IPC index is used to analyze the program's runtime phase, and the average MPKI_LLC and HPKI_LLC indices are used to analyze its performance; on this basis the cache space occupied by the program is adjusted dynamically.
Example two
This embodiment provides a system for ensuring quality of service when a plurality of latency-critical programs are executed together;
a system for ensuring quality of service when a plurality of latency-critical programs are executed together, comprising:
an initialization module configured to initialize a hardware counter and start a plurality of latency-critical programs, each pinned to its own core, with the latency-critical programs on all cores sharing the last-level cache space (LLC);
a phase division module configured to divide each latency-critical program into a number of program phases, and each program phase into a number of program intervals;
a classification module configured to sample, while the latency-critical programs run together, one program interval in each program phase of each latency-critical program with a hardware performance counter; calculate first, second and third actual performance data for each program phase from the sampled data; classify the phase type of the corresponding program phase according to the first actual performance data; and classify the performance of the program phase according to the second and third actual performance data;
and a dynamic adjustment module configured to dynamically adjust, at runtime, the cache space occupied by each latency-critical program according to the phase type and performance type of each of its program phases.
It should be noted here that the initialization module, phase division module, classification module and dynamic adjustment module correspond to steps S101 to S104 of the first embodiment; the modules share the implementation examples and application scenarios of the corresponding steps, but are not limited to the disclosure of the first embodiment. The modules described above, as part of a system, may be implemented in a computer system such as a set of computer-executable instructions.
In the foregoing embodiments, the descriptions of the embodiments have different emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The proposed system can also be implemented in other ways. The system embodiments described above are merely illustrative; for example, the division into modules is only a logical division, and in an actual implementation there may be other divisions: multiple modules may be combined or integrated into another system, or some features may be omitted or not executed.
Example three
The present embodiment also provides an electronic device, comprising one or more processors, one or more memories, and one or more computer programs; the processor is connected to the memory, the one or more computer programs are stored in the memory, and when the electronic device runs, the processor executes the stored computer programs so as to make the electronic device perform the method of the first embodiment.
It should be understood that in this embodiment the processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, and so on. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory may include both read-only memory and random access memory, and may provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or instructions in the form of software.
The method of the first embodiment may be implemented directly by a hardware processor, or by a combination of hardware and software modules in the processor. The software modules may reside in RAM, flash memory, ROM, PROM or EPROM, registers, or any other storage medium well known in the art. The storage medium is located in the memory; the processor reads the information in the memory and completes the steps of the method in combination with its hardware. To avoid repetition, details are not repeated here.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
Embodiment Four
The present embodiment also provides a computer-readable storage medium storing computer instructions which, when executed by a processor, perform the method of the first embodiment.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (8)

1. A method for ensuring quality of service when a plurality of delay critical programs are executed together, comprising:
initializing a hardware counter, and starting a plurality of delay-critical programs; each delay-critical program is pinned to a corresponding core, and the delay-critical programs on the cores share the last-level cache (LLC) space;
dividing each delay-critical program into a plurality of program phases, and dividing each program phase into a plurality of program intervals;
sampling the program intervals within each program phase of each delay-critical program with a hardware performance counter while the plurality of delay-critical programs run together; calculating first, second and third actual performance data for each program phase from the sampled data; classifying the phase type of the corresponding program phase according to the first actual performance data; classifying the performance of the program phase according to the second and third actual performance data; wherein calculating the first, second and third actual performance data for each program phase from the sampled data specifically comprises: calculating the first actual performance data of each program phase from the sampled data, the first actual performance data being the average instructions per cycle (IPC); calculating the second actual performance data of each program phase from the sampled data, the second actual performance data being the average number of LLC misses per thousand instructions, MPKI_LLC; and calculating the third actual performance data of each program phase from the sampled data, the third actual performance data being the average number of LLC hits per thousand instructions, HPKI_LLC;
classifying the phase type of the program phase specifically comprises: according to
[formula image FDA0003693549500000011, not reproduced in the source]
dividing the program phase types into 3 classes:
type A:
[formula image FDA0003693549500000012, not reproduced in the source]
type B:
[formula image FDA0003693549500000013, not reproduced in the source]
type C:
[formula image FDA0003693549500000014, not reproduced in the source]
wherein α is a first set threshold and β is a second set threshold;
alternatively,
classifying the performance of the program phase according to the second and third actual performance data specifically comprises:
according to
[formula image FDA0003693549500000021, not reproduced in the source]
and
[formula image FDA0003693549500000022, not reproduced in the source]
dividing the program performance types into 3 classes:
type a:
[formula image FDA0003693549500000023, not reproduced in the source]
type b:
[formula image FDA0003693549500000024, not reproduced in the source]
type c:
[formula image FDA0003693549500000025, not reproduced in the source]
wherein η is a third set threshold, γ is a fourth set threshold, and η < γ;
dynamically adjusting the cache space occupied by each delay-critical program at run time according to the phase type and the performance type of each program phase of each delay-critical program; the method comprises the following specific steps:
determining the phase type and the performance type of each program phase of each delay-critical program;
if the program phase is of type A and the performance type is a or b, preliminarily determining that the program phase needs more cache space; while increasing the cache space, if the phase type does not change, immediately stopping the increase and returning the added cache space; if the phase type changes to B or C, keeping the modification;
if the program phase is of type A and the performance type is c, preliminarily determining that the cache space of the program phase needs to be reduced; if, after the cache space is reduced by 1 way,
[formula image FDA0003693549500000026, not reproduced in the source]
does not hold, continuing to reduce the cache space; otherwise, restoring the modification;
if the program phase is of type B and the performance type is of type B, the cache space occupied by the program phase is not changed;
if the program phase is of type B and the performance type is a or c, preliminarily determining that the cache space of the program phase needs to be reduced; if the cache space is reduced by 1 way and the phase type does not change to type A, continuing to reduce the cache space; otherwise, restoring the modification;
if the program phase is of type C and the performance type is a or c, determining whether the program phase has surplus cache resources: if the cache space is reduced by 1 way and the phase type does not change to type A or B, continuing to reduce the cache space; otherwise, restoring the modification;
if the program phase is C type and the performance type is b type, the cache space occupied by the program phase is not changed;
and dynamically adjusting the cache space occupied by each delay key program in the running process according to the stage type and the performance type of each program stage of each delay key program.
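As an illustration only, the per-phase decision table that claim 1 prescribes can be sketched in Python. The claim's actual classification formulas are figure images not reproduced in the source, so `classify_phase` and its threshold semantics are hypothetical stand-ins (as are all names below); only the grow/keep/shrink mapping in `adjust_action` follows the claim text.

```python
# Hypothetical sketch of the per-phase cache-way decision in claim 1.
# classify_phase is an assumed stand-in: the real formulas behind the
# A/B/C phase types are figure images not reproduced in the source.

def classify_phase(mpki_llc, hpki_llc, alpha, beta):
    """Assumed mapping of a phase's MPKI_LLC/HPKI_LLC onto type A, B or C."""
    if mpki_llc >= alpha:        # many LLC misses -> cache-sensitive phase
        return 'A'
    if hpki_llc >= beta:         # few misses but many hits -> cache-friendly
        return 'B'
    return 'C'                   # barely touches the LLC

def adjust_action(phase_type, perf_type):
    """Grow/keep/shrink decision taken from the claim's case analysis."""
    if phase_type == 'A':
        return 'grow' if perf_type in ('a', 'b') else 'shrink'  # A + c shrinks
    if perf_type == 'b':
        return 'keep'            # B + b and C + b leave the allocation as-is
    return 'shrink'              # B + a/c and C + a/c probe for surplus ways
```

A supervisor loop would apply the returned action one cache way at a time and roll the change back when the phase type shifts, as the claim describes.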
2. The method of claim 1, wherein initializing a hardware counter and starting a plurality of delay-critical programs further comprises:
assuming that the LLC has N ways in total, reserving M ways as spare space, and evenly distributing the remaining N-M ways among all the delay-critical programs; N and M are both positive integers.
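The initial split in claim 2 is simple enough to state as code; the following is a minimal sketch under assumed names (dict-based way bookkeeping, integer ways only):

```python
# Sketch of claim 2's initial allocation: of N LLC ways, M are held back as
# spare space and the remaining N - M ways are split evenly across programs.

def initial_allocation(n_ways, m_reserved, programs):
    """Return {program: way_count}; any remainder stays in the spare pool."""
    usable = n_ways - m_reserved
    per_program = usable // len(programs)   # even split, whole ways only
    return {p: per_program for p in programs}
```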
3. The method of claim 1, wherein dividing each delay-critical program into a plurality of program phases specifically comprises:
counting the number of executed instructions with a counter, and delimiting a program phase after every set number of instructions is executed.
4. The method of claim 1, wherein dividing each program phase into a plurality of program intervals specifically comprises:
counting conditional branch instructions, and triggering an interrupt after every X conditional branch instructions are executed;
that is, every X conditional branch instructions constitute one program interval; another hardware counter records the total number of instructions executed during the interval; X is a positive integer.
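Claims 3 and 4 delimit phases by retired-instruction count and intervals by conditional-branch count. Below is a minimal software simulation of the interval cut, assuming one boolean flag per retired instruction; the real mechanism in the claim is a hardware counter overflow interrupt, and all names here are illustrative.

```python
# Simulates claim 4's interval bookkeeping in software: an interval closes
# every x conditional branches, and the total instructions executed in each
# completed interval are recorded (the second hardware counter's job).

def split_into_intervals(is_branch_stream, x):
    """Return the instruction count of every completed x-branch interval."""
    intervals = []
    branches = insns = 0
    for is_branch in is_branch_stream:   # one flag per retired instruction
        insns += 1
        if is_branch:
            branches += 1
            if branches == x:            # the "interrupt" point in the claim
                intervals.append(insns)
                branches = insns = 0
    return intervals
```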
5. The method of claim 1, wherein sampling the program intervals of each program phase of each delay-critical program with a hardware performance counter while the plurality of delay-critical programs run together specifically comprises:
while the plurality of delay-critical programs run together, sampling the program intervals of each program phase of each delay-critical program with a hardware performance counter to obtain the performance indicators: instructions per cycle (IPC), the number of LLC misses, the number of LLC hits, and the number of LLC references.
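From the raw counts claim 5 samples (instructions, cycles, LLC misses and hits), the three per-interval metrics used by the classifier follow directly. A sketch with assumed argument names:

```python
# Derive the claims' three metrics from one interval's raw counter deltas.

def interval_metrics(insns, cycles, llc_misses, llc_hits):
    """Return (IPC, MPKI_LLC, HPKI_LLC) for one sampled program interval."""
    ipc = insns / cycles                  # instructions per cycle
    mpki = llc_misses * 1000 / insns      # LLC misses per kilo-instruction
    hpki = llc_hits * 1000 / insns        # LLC hits per kilo-instruction
    return ipc, mpki, hpki
```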
6. A system for ensuring quality of service when a plurality of delay-critical programs are executed together, comprising:
an initialization module configured to: initialize a hardware counter and start a plurality of delay-critical programs; each delay-critical program is pinned to a corresponding core, and the delay-critical programs on the cores share the last-level cache (LLC) space;
a staging module configured to: divide each delay-critical program into a plurality of program phases, and divide each program phase into a plurality of program intervals;
a classification module configured to: sample the program intervals within each program phase of each delay-critical program with a hardware performance counter while the plurality of delay-critical programs run together; calculate first, second and third actual performance data for each program phase from the sampled data; classify the phase type of the corresponding program phase according to the first actual performance data; and classify the performance of the program phase according to the second and third actual performance data; wherein calculating the first, second and third actual performance data for each program phase from the sampled data specifically comprises: calculating the first actual performance data of each program phase from the sampled data, the first actual performance data being the average instructions per cycle (IPC); calculating the second actual performance data of each program phase from the sampled data, the second actual performance data being the average number of LLC misses per thousand instructions, MPKI_LLC; and calculating the third actual performance data of each program phase from the sampled data, the third actual performance data being the average number of LLC hits per thousand instructions, HPKI_LLC;
classifying the phase type of the program phase specifically comprises: according to
[formula image FDA0003693549500000051, not reproduced in the source]
dividing the program phase types into 3 classes:
type A:
[formula image FDA0003693549500000052, not reproduced in the source]
type B:
[formula image FDA0003693549500000053, not reproduced in the source]
type C:
[formula image FDA0003693549500000054, not reproduced in the source]
wherein α is a first set threshold and β is a second set threshold;
alternatively,
classifying the performance of the program phase according to the second and third actual performance data specifically comprises:
according to
[formula image FDA0003693549500000055, not reproduced in the source]
and
[formula image FDA0003693549500000056, not reproduced in the source]
dividing the program performance types into 3 classes:
type a:
[formula image FDA0003693549500000057, not reproduced in the source]
type b:
[formula image FDA0003693549500000058, not reproduced in the source]
type c:
[formula image FDA0003693549500000059, not reproduced in the source]
wherein η is a third set threshold, γ is a fourth set threshold, and η < γ;
dynamically adjusting the cache space occupied by each delay-critical program at run time according to the phase type and the performance type of each program phase of each delay-critical program; the method comprises the following specific steps:
determining the phase type and the performance type of each program phase of each delay-critical program;
if the program phase is of type A and the performance type is a or b, preliminarily determining that the program phase needs more cache space; while increasing the cache space, if the phase type does not change, immediately stopping the increase and returning the added cache space; if the phase type changes to B or C, keeping the modification;
if the program phase is of type A and the performance type is c, preliminarily determining that the cache space of the program phase needs to be reduced; if, after the cache space is reduced by 1 way,
[formula image FDA00036935495000000510, not reproduced in the source]
does not hold, continuing to reduce the cache space; otherwise, restoring the modification;
if the program phase is of type B and the performance type is b, the cache space occupied by the program phase is not changed;
if the program phase is of type B and the performance type is a or c, preliminarily determining that the cache space of the program phase needs to be reduced; if the cache space is reduced by 1 way and the phase type does not change to type A, continuing to reduce the cache space; otherwise, restoring the modification;
if the program phase is of type C and the performance type is a or c, determining whether the program phase has surplus cache resources: if the cache space is reduced by 1 way and the phase type does not change to type A or B, continuing to reduce the cache space; otherwise, restoring the modification;
if the program phase is C type and the performance type is b type, the cache space occupied by the program phase is not changed;
a dynamic adjustment module configured to: and dynamically adjusting the cache space occupied by each delay key program in the running process according to the stage type and the performance type of each program stage of each delay key program.
7. An electronic device, comprising: one or more processors, one or more memories, and one or more computer programs; wherein the processor is connected to the memory, the one or more computer programs are stored in the memory, and when the electronic device runs, the processor executes the one or more computer programs stored in the memory to cause the electronic device to perform the method of any one of claims 1-5.
8. A computer-readable storage medium storing computer instructions which, when executed by a processor, perform the method of any one of claims 1 to 5.
CN202011465046.2A 2020-12-14 2020-12-14 Method and system for ensuring service quality when multiple delay key programs are executed together Active CN112540934B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011465046.2A CN112540934B (en) 2020-12-14 2020-12-14 Method and system for ensuring service quality when multiple delay key programs are executed together

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011465046.2A CN112540934B (en) 2020-12-14 2020-12-14 Method and system for ensuring service quality when multiple delay key programs are executed together

Publications (2)

Publication Number Publication Date
CN112540934A CN112540934A (en) 2021-03-23
CN112540934B true CN112540934B (en) 2022-07-29

Family

ID=75018579

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011465046.2A Active CN112540934B (en) 2020-12-14 2020-12-14 Method and system for ensuring service quality when multiple delay key programs are executed together

Country Status (1)

Country Link
CN (1) CN112540934B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113821324B (en) * 2021-09-17 2022-08-09 海光信息技术股份有限公司 Cache system, method, apparatus and computer medium for processor

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9244732B2 (en) * 2009-08-28 2016-01-26 Vmware, Inc. Compensating threads for microarchitectural resource contentions by prioritizing scheduling and execution
CN101916230A (en) * 2010-08-11 2010-12-15 中国科学技术大学苏州研究院 Partitioning and thread-aware based performance optimization method of last level cache (LLC)
US9401869B1 (en) * 2012-06-04 2016-07-26 Google Inc. System and methods for sharing memory subsystem resources among datacenter applications
US10554505B2 (en) * 2012-09-28 2020-02-04 Intel Corporation Managing data center resources to achieve a quality of service
CN103077128B (en) * 2012-12-29 2015-09-23 华中科技大学 Shared buffer memory method for dynamically partitioning under a kind of multi-core environment
CN103235764B (en) * 2013-04-11 2016-01-20 浙江大学 Thread aware multinuclear data pre-fetching self-regulated method
CN104572493A (en) * 2013-10-23 2015-04-29 华为技术有限公司 Memory resource optimization method and device
US9626295B2 (en) * 2015-07-23 2017-04-18 Qualcomm Incorporated Systems and methods for scheduling tasks in a heterogeneous processor cluster architecture using cache demand monitoring
CN107463510B (en) * 2017-08-21 2020-05-08 北京工业大学 High-performance heterogeneous multi-core shared cache buffer management method
CN110618872B (en) * 2019-09-25 2022-04-15 山东师范大学 Hybrid memory dynamic scheduling method and system
CN111258927B (en) * 2019-11-13 2022-05-03 北京大学 Application program CPU last-level cache miss rate curve prediction method based on sampling
CN112000465B (en) * 2020-07-21 2023-02-03 山东师范大学 Method and system for reducing performance interference of delay sensitive program in data center environment

Also Published As

Publication number Publication date
CN112540934A (en) 2021-03-23

Similar Documents

Publication Publication Date Title
KR102456085B1 (en) Dynamic memory remapping to reduce row buffer collisions
US6662272B2 (en) Dynamic cache partitioning
US7725657B2 (en) Dynamic quality of service (QoS) for a shared cache
US7899994B2 (en) Providing quality of service (QoS) for cache architectures using priority information
US8190795B2 (en) Memory buffer allocation device and computer readable medium having stored thereon memory buffer allocation program
US7103735B2 (en) Methods and apparatus to process cache allocation requests based on priority
US9223712B2 (en) Data cache method, device, and system in a multi-node system
CN108845960B (en) Memory resource optimization method and device
US7185167B2 (en) Heap allocation
US20080235487A1 (en) Applying quality of service (QoS) to a translation lookaside buffer (TLB)
US20050125613A1 (en) Reconfigurable trace cache
KR101356033B1 (en) Hybrid Main Memory System and Task Scheduling Method therefor
JP3727887B2 (en) Shared register file control method in multi-thread processor
US20120079494A1 (en) System And Method For Maximizing Data Processing Throughput Via Application Load Adaptive Scheduling And Content Switching
US10725940B2 (en) Reallocate memory pending queue based on stall
US8769201B2 (en) Technique for controlling computing resources
CN106294192B (en) Memory allocation method, memory allocation device and server
CN112540934B (en) Method and system for ensuring service quality when multiple delay key programs are executed together
US9189279B2 (en) Assignment method and multi-core processor system
Li Orchestrating thread scheduling and cache management to improve memory system throughput in throughput processors
CN112579277B (en) Central processing unit, method, device and storage medium for simultaneous multithreading
Ikeda et al. Application aware DRAM bank partitioning in CMP
US20240061780A1 (en) Systems and methods for memory bandwidth allocation
CN112506660A (en) Method and device for optimizing memory of audio/video codec and storage medium
CN114780249A (en) Cache management method, system, device and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant