CN112540934A - Method and system for ensuring service quality when multiple delay key programs are executed together - Google Patents
- Publication number
- CN112540934A (application CN202011465046.2A)
- Authority
- CN
- China
- Prior art keywords
- program
- type
- stage
- phase
- delay key
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/0811—Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Memory System Of A Hierarchy Structure (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention discloses a method and a system for ensuring quality of service when multiple latency-critical programs are executed together. The method starts a plurality of latency-critical programs, each pinned to its own core, with the latency-critical programs on the cores sharing the last-level cache (LLC) space; divides each latency-critical program into a number of program phases, and each program phase into a number of program intervals; samples a program interval in each program phase of each latency-critical program while the programs run together; computes first, second, and third actual performance data for each program phase from the sampled data; classifies the stage type and the performance of the corresponding program phase according to the first, second, and third actual performance data; and dynamically adjusts, at run time, the cache space each latency-critical program occupies according to the stage type and the performance type of each of its program phases.
Description
Technical Field
The present application relates to the field of parallel and distributed computing technologies, and in particular to a method and system for ensuring quality of service when multiple latency-critical programs are executed together.
Background
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
Data centers have matured from concept to common practice. To improve resource utilization, a data center runs a large number of programs on as few servers as possible, so multiple programs execute together on a single server node. Co-execution raises server utilization but degrades program performance, and the degree of degradation depends on program characteristics: some programs slow down only slightly when executed with others, while others suffer severe degradation.
Meanwhile, data centers run a large number of latency-critical programs. Customers who execute programs in a data center impose quality-of-service requirements on them, for example that a program's performance must not fall below 90% of its solo-execution performance. When a latency-critical program is executed together with other programs, performance interference can easily degrade it severely, so the customers' quality-of-service requirements go unmet. A method is therefore needed that guarantees the quality of service of latency-critical programs while improving system resource utilization as much as possible. This is the problem the present application addresses.
Disclosure of Invention
To remedy the deficiencies of the prior art, the present application provides a method and a system for ensuring quality of service when multiple latency-critical programs are executed together.
In a first aspect, the present application provides a method for ensuring quality of service when multiple latency-critical programs are executed together.
A method for ensuring quality of service when multiple latency-critical programs are executed together comprises:
initializing hardware counters and starting a plurality of latency-critical programs; each latency-critical program is pinned to its own core, and the latency-critical programs on the cores share the last-level cache (LLC) space;
dividing each latency-critical program into a number of program phases, and dividing each program phase into a number of program intervals;
sampling a program interval in each program phase of each latency-critical program with hardware performance counters while the latency-critical programs run together; computing first, second, and third actual performance data for each program phase from the sampled data; classifying the stage type of the corresponding program phase according to the first actual performance data; classifying the performance of the program phase according to the second and third actual performance data;
dynamically adjusting, at run time, the cache space each latency-critical program occupies according to the stage type and the performance type of each of its program phases.
In a second aspect, the present application provides a system for ensuring quality of service when multiple latency-critical programs are executed together.
A system for ensuring quality of service when multiple latency-critical programs are executed together comprises:
an initialization module configured to initialize hardware counters and start a plurality of latency-critical programs, each latency-critical program being pinned to its own core, with the latency-critical programs on the cores sharing the last-level cache (LLC) space;
a phase-division module configured to divide each latency-critical program into a number of program phases and each program phase into a number of program intervals;
a classification module configured to sample a program interval in each program phase of each latency-critical program with hardware performance counters while the latency-critical programs run together, compute first, second, and third actual performance data for each program phase from the sampled data, classify the stage type of the corresponding program phase according to the first actual performance data, and classify the performance of the program phase according to the second and third actual performance data;
a dynamic adjustment module configured to dynamically adjust, at run time, the cache space each latency-critical program occupies according to the stage type and the performance type of each of its program phases.
In a third aspect, the present application further provides an electronic device, including: one or more processors, one or more memories, and one or more computer programs; wherein a processor is connected to the memory, the one or more computer programs are stored in the memory, and when the electronic device is running, the processor executes the one or more computer programs stored in the memory, so as to make the electronic device execute the method according to the first aspect.
In a fourth aspect, the present application also provides a computer-readable storage medium for storing computer instructions which, when executed by a processor, perform the method of the first aspect.
In a fifth aspect, the present application also provides a computer program product comprising a computer program that, when run on one or more processors, implements the method of any implementation of the first aspect.
Compared with the prior art, the present application has the following beneficial effects:
By monitoring the performance indicators of latency-critical programs in real time and using CAT to dynamically partition LLC resources among latency-critical programs of different types, the application guarantees the performance of co-executing latency-critical programs while improving LLC resource utilization as much as possible.
Intel's technology supporting Last-Level Cache (LLC) allocation can make better use of the cache through cache partitioning. The present application can use this technology to guarantee users' performance requirements by preventing latency-critical programs from polluting each other's caches. In addition, the application can better meet users' performance requirements by allocating more LLC resources to latency-critical programs whose performance benefits from them, and by reducing or stopping allocation to latency-critical programs that do not benefit.
The invention dynamically adjusts the cache space a program occupies using the performance indicators of the program's run-time phases, increasing the number of co-located latency-critical programs and the LLC resource utilization as much as possible while guaranteeing the programs' quality of service.
Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application.
FIG. 1 is a flowchart of a resource partitioning method according to a first embodiment;
FIG. 2 is a flowchart of the program phase performance analysis of the first embodiment.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and it should be understood that the terms "comprises" and "comprising", and any variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
Interpretation of terms:
Latency-critical program: an application with strict requirements on tail latency; tail latency is an important performance indicator for latency-critical programs.
LLC: the Last-Level Cache, the highest-level cache shared by all functional units (e.g. CPU cores, IGP, and DSP) on the chip.
CAT: Cache Allocation Technology, whose basic goal is to enable resource allocation based on application priority or class of service (CLOS). The Intel Xeon processor E5 v4 family (and a subset of the Intel Xeon processor E5 v3 family aimed at communications) introduced the ability to configure and use cache allocation technology on the last-level cache.
CLOS: Class of Service. As an abstraction, a CLOS can carry multiple resource-control attributes, thereby reducing software overhead during context switches.
Example one
The embodiment provides a method for ensuring quality of service when multiple latency-critical programs are executed together.
A method for ensuring quality of service when multiple latency-critical programs are executed together comprises:
S101: initializing hardware counters and starting a plurality of latency-critical programs; each latency-critical program is pinned to its own core, and the latency-critical programs on the cores share the last-level cache (LLC) space;
S102: dividing each latency-critical program into a number of program phases, and dividing each program phase into a number of program intervals;
S103: sampling a program interval in each program phase of each latency-critical program with hardware performance counters while the latency-critical programs run together;
computing first, second, and third actual performance data for each program phase from the sampled data;
classifying the stage type of the corresponding program phase according to the first actual performance data;
classifying the performance of the program phase according to the second and third actual performance data;
S104: dynamically adjusting, at run time, the cache space each latency-critical program occupies according to the stage type and the performance type of each of its program phases.
It should be understood that in S101, to ensure that the latency-critical programs do not contend for CPU time, the initial state of the system places each latency-critical program on a different core, with the latency-critical programs on the cores sharing the LLC.
Illustratively, a plurality of latency-critical programs means two or more latency-critical programs.
As one or more embodiments, S101 further includes:
assuming the cache space LLC has N ways in total, reserving M ways as spare space and distributing the remaining N-M ways evenly among all latency-critical programs; N and M are both positive integers.
Illustratively, each latency-critical program is assigned to a different CLOS and isolated via CAT, reducing interference between latency-critical programs.
For example, assuming the system's LLC has N ways, M cache ways are reserved for CLOS#1 as spare space, and the remaining N-M ways are distributed evenly among all latency-critical programs.
Illustratively, suppose there are two latency-critical programs. Because CAT supports only contiguous partitions of the LLC space, the LLC ways are allocated from low to high addresses. Latency-critical program 1 occupies the region denoted CLOS#0, of size (N-M)/2 ways. The M spare ways are isolated and defined as CLOS#1, whose region begins immediately after the last way of CLOS#0. The remaining LLC space is allocated to latency-critical program 2 and defined as CLOS#2.
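The initial way split above can be sketched as bitmask arithmetic. This is an illustrative sketch only: the function name and the two-program layout (CLOS#0, then the CLOS#1 spare region, then CLOS#2, low ways first) follow the example in the text, and a real deployment would program the resulting capacity bitmasks through Linux's resctrl filesystem or the CAT MSRs rather than merely compute them.

```python
def initial_masks(n_ways: int, m_spare: int):
    """Contiguous CAT way masks for two latency-critical programs.

    Layout, low bit = lowest way:  CLOS#0 | CLOS#1 (spare) | CLOS#2.
    CAT requires each mask to be a contiguous run of set bits, which
    the shifted all-ones patterns below guarantee.
    """
    per_prog = (n_ways - m_spare) // 2                     # even split of the N - M ways
    clos0 = (1 << per_prog) - 1                            # program 1
    clos1 = ((1 << m_spare) - 1) << per_prog               # spare ways
    clos2 = ((1 << per_prog) - 1) << (per_prog + m_spare)  # program 2
    return clos0, clos1, clos2
```

For a 20-way LLC with 4 spare ways this yields the masks 0xFF, 0xF00, and 0xFF000, which together cover all 20 ways without overlap.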
As one or more embodiments, in S102 each latency-critical program is divided into program phases as follows:
a counter tracks the number of instructions executed, and a new program phase begins each time a set number of instructions has been executed.
As one or more embodiments, in S102 each program phase is divided into program intervals as follows:
a counter tracks conditional branch instructions and triggers an interrupt after every X conditional branch instructions have executed;
that is, every X conditional branch instructions form one program interval; another hardware counter records the total number of instructions executed during the interval, and X is a positive integer.
Illustratively, each program phase is subdivided into a number of program intervals by using different sampling periods.
It should be understood that in S102 each latency-critical program is divided into program phases containing a fixed number of instructions; to capture program performance information more accurately, the present application introduces a two-level phase-detection method.
Phase division: as a program runs, its performance indicators (such as IPC) change. Program segments belonging to the same phase have similar performance indicators, while segments belonging to different phases differ, so a program can be divided into phases according to its performance indicators. The method delimits phases by a fixed instruction count and then classifies the program's run-time phases by their IPC; the fixed instruction count may be, for example, 10 million, 100 million, or 1 billion instructions.
Interval division: to obtain finer-grained phase information at run time, the method subdivides each program phase into intervals. To reduce sampling overhead and information loss, performance data is sampled once every X conditional branch instructions; the sampling period can be chosen to suit the situation, for example 100M or 200M branch instructions.
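The interval mechanism, an interrupt fired after every X conditional branches, can be mimicked offline. The sketch below is hypothetical: `branch_counts` stands in for successive readings of a branch-instruction performance counter, which on real hardware would arrive via counter-overflow interrupts rather than a Python loop.

```python
def interval_boundaries(branch_counts, x):
    """Indices at which an interval of x conditional branches completes.

    Each overflow of the running branch count marks the end of one
    program interval (the point where the sampling interrupt fires).
    """
    boundaries, acc = [], 0
    for i, count in enumerate(branch_counts):
        acc += count
        while acc >= x:        # one boundary per completed interval
            boundaries.append(i)
            acc -= x
    return boundaries
```

For example, readings of [3, 4, 5, 2] branches with x = 5 complete intervals at indices 1 and 2, with 4 branches left over toward the next interval.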
As one or more embodiments, in S103 the sampling of a program interval in each program phase of each latency-critical program proceeds as follows:
while the latency-critical programs run together, hardware performance counters sample one program interval per program phase of each latency-critical program, collecting the performance indicators: the instructions per cycle (IPC), the number of LLC misses, the number of LLC hits, and the number of LLC references.
The collected LLC miss and hit counts yield the per-interval indicators MPKI_LLC and HPKI_LLC, computed as:

MPKI_LLC = (Num_Miss / Num_Ins) × 1000
HPKI_LLC = (Num_Hit / Num_Ins) × 1000

where Num_Miss is the number of LLC misses, Num_Hit is the number of LLC hits, and Num_Ins is the number of instructions executed in the interval.
It should be understood that, given the performance indicators of all intervals in a program phase, their averages are taken as the performance indicators of the phase in which the latency-critical program currently is, and are used to analyze the phase behavior of the latency-critical program.
As one or more embodiments, in S103 the first, second, and third actual performance data of each program phase are computed from the sampled data as follows:
the first actual performance data is the average instructions per cycle (IPC) of the phase;
the second actual performance data is the average number of LLC misses per thousand instructions (MPKI_LLC) of the phase;
the third actual performance data is the average number of LLC hits per thousand instructions (HPKI_LLC) of the phase.
Illustratively, the phase's IPC, MPKI_LLC, and HPKI_LLC are computed from the data sampled in each interval of the phase as arithmetic means over the phase's n intervals:

IPC = (IPC_1 + IPC_2 + ... + IPC_n) / n
MPKI_LLC = (MPKI_LLC,1 + MPKI_LLC,2 + ... + MPKI_LLC,n) / n
HPKI_LLC = (HPKI_LLC,1 + HPKI_LLC,2 + ... + HPKI_LLC,n) / n

where IPC_1, MPKI_LLC,1, and HPKI_LLC,1 are the indicators of the first interval and n is the number of intervals in the program phase.
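The per-interval indicators and their phase averages follow directly from the definitions above. The sketch below is illustrative; the function names and the tuple layout of a sample are assumptions, and on real hardware the raw counts would come from hardware performance counters.

```python
def interval_metrics(insns, cycles, llc_miss, llc_hit):
    """IPC, MPKI_LLC and HPKI_LLC for one sampled program interval."""
    ipc = insns / cycles
    mpki = llc_miss / insns * 1000.0   # misses per thousand instructions
    hpki = llc_hit / insns * 1000.0    # hits per thousand instructions
    return ipc, mpki, hpki

def phase_metrics(samples):
    """Arithmetic means of the indicators over a phase's n intervals.

    `samples` is a list of (insns, cycles, llc_miss, llc_hit) tuples,
    one per sampled interval of the phase.
    """
    n = len(samples)
    ipcs, mpkis, hpkis = zip(*(interval_metrics(*s) for s in samples))
    return sum(ipcs) / n, sum(mpkis) / n, sum(hpkis) / n
```

For instance, an interval with 2000 instructions in 1000 cycles, 4 LLC misses, and 16 LLC hits yields IPC = 2.0, MPKI_LLC = 2.0, and HPKI_LLC = 8.0.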
As one or more embodiments, in S103 the stage type of the corresponding program phase is classified according to the first actual performance data by comparing the phase's average IPC against a first set threshold α and a second set threshold β, which partition the phases into the three stage types A, B, and C.
As one or more embodiments, in S103 the performance of the program phase is classified according to the second and third actual performance data by comparing them against a third set threshold η and a fourth set threshold γ, with η < γ, which partition the phases into the three performance types a, b, and c.
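A sketch of the two classifiers follows. The threshold names (α, β, η, γ) and the three-way outcomes (stage types A/B/C, performance types a/b/c) are from the text, but the concrete comparison rules did not survive extraction, so the direction of every comparison below, including the choice of MPKI_LLC as the deciding indicator for the performance type, is an assumption.

```python
def stage_type(ipc_avg, alpha, beta):
    """Map a phase's average IPC to stage type A, B or C.

    Assumes alpha < beta and that lower IPC indicates a more
    cache-sensitive (type A) phase; both are illustrative guesses.
    """
    if ipc_avg < alpha:
        return "A"
    if ipc_avg < beta:
        return "B"
    return "C"

def performance_type(mpki_llc, hpki_llc, eta, gamma):
    """Map a phase's LLC behaviour to performance type a, b or c.

    The text only states eta < gamma; deciding on MPKI_LLC alone
    (ignoring hpki_llc here) is an assumption of this sketch.
    """
    if mpki_llc >= gamma:
        return "a"
    if mpki_llc >= eta:
        return "b"
    return "c"
```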
As one or more embodiments, in S104 the cache space each latency-critical program occupies is dynamically adjusted at run time according to the stage type and the performance type of each of its program phases, as follows:
judging the stage type and the performance type of each program phase of each latency-critical program;
if the program phase is of stage type A and performance type a or b, preliminarily judging that the phase needs more cache space; while increasing the cache space, if the stage type does not change, immediately stop increasing and reclaim the added cache space; if the stage type changes to B or C, keep the modification;
if the program phase is of stage type A and performance type c, preliminarily judging that the phase's cache space should be reduced; after reducing the cache space by one way, if the stage type does not change, continue reducing the cache space; otherwise, undo the modification;
if the program phase is of stage type B and performance type b, leave the cache space occupied by the phase unchanged;
if the program phase is of stage type B and performance type a or c, preliminarily judging that the phase's cache space should be reduced; after reducing the cache space by one way, if the stage type has not changed to A, continue reducing the cache space; otherwise, undo the modification;
if the program phase is of stage type C and performance type a or c, judging whether the phase has surplus resources; after reducing the cache space by one way, if the stage type has not changed to A or B, continue reducing the cache space; otherwise, undo the modification;
if the program phase is of stage type C and performance type b, leave the cache space occupied by the phase unchanged.
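The case analysis in S104 reduces to a small decision table. The sketch below captures only the preliminary action for each (stage type, performance type) pair; the run-time verify-and-rollback protocol (growing or shrinking by one way and undoing the change when the stage type does or does not shift as required) is left to the caller, and the action names are illustrative.

```python
# Preliminary cache-space action per (stage type, performance type),
# transcribed from the six cases of S104.
_ACTIONS = {
    ("A", "a"): "grow",   ("A", "b"): "grow",   ("A", "c"): "shrink",
    ("B", "a"): "shrink", ("B", "b"): "keep",   ("B", "c"): "shrink",
    ("C", "a"): "shrink", ("C", "b"): "keep",   ("C", "c"): "shrink",
}

def cache_action(stage: str, perf: str) -> str:
    """Look up the preliminary LLC-space adjustment for a program phase."""
    return _ACTIONS[(stage, perf)]
```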
as one or more embodiments, the method further comprises:
acquiring the resource use condition of the LLC to perform dynamic management;
if the CLOS #1 has free space, acquiring the space from the CLOS #1, and if the cache space in the CLOS #1 is completely allocated, judging whether the adjacent programs on the physical address are in a resource surplus state; if yes, distributing redundant space according to the performance status of the adjacent program, and if the resource surplus condition does not exist, continuing to wait for the free space; and if the free space does not appear for a long time, carrying out data migration.
The phase performance data of each program is recorded in a historical phase & performance table (HPPT), which stores each program's phase information and performance information. The application dynamically adjusts the cache occupied in the current phase according to the phase behavior of the running program and its run-time performance information.
The cache space occupied by a latency-critical program is dynamically adjusted according to the program's phase, its MPKI_LLC, and its HPKI_LLC.
FIG. 1 depicts the resource-partitioning method. For each program to be executed, program performance information is obtained with hardware performance counters.
FIG. 2 depicts the program-phase performance-analysis method.
For each program to be executed, the IPC indicator is used to analyze the program's run-time phase, and the MPKI_LLC and HPKI_LLC indicators are used to analyze the program's performance and dynamically adjust the cache space the program occupies.
Example two
The embodiment provides a system for ensuring quality of service when multiple latency-critical programs are executed together.
A system for ensuring quality of service when multiple latency-critical programs are executed together comprises:
an initialization module configured to initialize hardware counters and start a plurality of latency-critical programs, each latency-critical program being pinned to its own core, with the latency-critical programs on the cores sharing the last-level cache (LLC) space;
a phase-division module configured to divide each latency-critical program into a number of program phases and each program phase into a number of program intervals;
a classification module configured to sample a program interval in each program phase of each latency-critical program with hardware performance counters while the latency-critical programs run together, compute first, second, and third actual performance data for each program phase from the sampled data, classify the stage type of the corresponding program phase according to the first actual performance data, and classify the performance of the program phase according to the second and third actual performance data;
a dynamic adjustment module configured to dynamically adjust, at run time, the cache space each latency-critical program occupies according to the stage type and the performance type of each of its program phases.
It should be noted here that the initialization module, the phase-division module, the classification module, and the dynamic adjustment module correspond to steps S101 to S104 of the first embodiment; the modules share the examples and application scenarios of the corresponding steps but are not limited to the disclosure of the first embodiment. The modules described above may, as part of a system, be implemented in a computer system such as a set of computer-executable instructions.
In the foregoing embodiments, the descriptions of the embodiments have different emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The proposed system can also be implemented in other ways. The system embodiment described above is merely illustrative; for example, the division into modules is only a logical division, and other divisions are possible in practice: multiple modules may be combined or integrated into another system, and some features may be omitted or not executed.
Example three
The present embodiment also provides an electronic device, including: one or more processors, one or more memories, and one or more computer programs; wherein, a processor is connected with the memory, the one or more computer programs are stored in the memory, and when the electronic device runs, the processor executes the one or more computer programs stored in the memory, so as to make the electronic device execute the method according to the first embodiment.
It should be understood that in this embodiment the processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), an off-the-shelf field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory may include both read-only memory and random access memory, and may provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or instructions in the form of software.
The method of the first embodiment may be implemented directly by a hardware processor, or by a combination of hardware and software modules in the processor. The software modules may reside in RAM, flash memory, ROM, PROM or EPROM, registers, or other storage media well known in the art. The storage medium is located in the memory; the processor reads the information in the memory and completes the steps of the method in combination with its hardware. To avoid repetition, details are not repeated here.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
Example four
The present embodiment further provides a computer-readable storage medium storing computer instructions which, when executed by a processor, perform the method of the first embodiment.
The above description is only a preferred embodiment of the present application and is not intended to limit it; those skilled in the art may make various modifications and changes. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within its protection scope.
Claims (10)
1. A method for guaranteeing quality of service when a plurality of latency-critical programs are executed together, comprising:
initializing hardware counters and starting a plurality of latency-critical programs, each latency-critical program being pinned to a corresponding core, the latency-critical programs on the cores sharing the last-level cache (LLC) space;
dividing each latency-critical program into a number of program phases, and dividing each program phase into a number of program intervals;
sampling the program intervals of each program phase of each latency-critical program with hardware performance counters while the latency-critical programs run together; calculating first, second and third actual performance data for each program phase from the sampled data; classifying the phase type of the corresponding program phase according to the first actual performance data; and classifying the performance type of the program phase according to the second and third actual performance data; and
dynamically adjusting, at run time, the cache space occupied by each latency-critical program according to the phase type and performance type of each of its program phases.
2. The method of claim 1, wherein initializing hardware counters and starting a plurality of latency-critical programs further comprises:
given that the LLC has N ways in total, reserving M ways as spare space and evenly distributing the remaining N-M ways among all latency-critical programs, N and M both being positive integers.
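The even initial split of claim 2 can be sketched as follows. This is an illustration, not the patented implementation: the contiguous per-program way bitmasks (in the style of Intel CAT capacity masks), the function name, and the divisibility assumption are all my own.

```python
def initial_way_masks(n_ways, m_reserved, n_programs):
    """Evenly split the non-reserved LLC ways among the latency-critical
    programs and return one contiguous way bitmask per program.
    Assumes (n_ways - m_reserved) is divisible by n_programs."""
    usable = n_ways - m_reserved
    per_prog = usable // n_programs
    masks = []
    for i in range(n_programs):
        lo = i * per_prog                       # lowest way index of this slice
        mask = ((1 << per_prog) - 1) << lo      # per_prog consecutive set bits
        masks.append(mask)
    return masks
```

For example, with N = 12 ways, M = 2 reserved and two programs, each program receives a contiguous 5-way mask (0b11111 and 0b1111100000).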
3. The method of claim 1, wherein dividing each latency-critical program into a number of program phases specifically comprises:
counting instructions with a hardware counter and delimiting a program phase each time a set number of instructions has been executed.
4. The method of claim 1, wherein dividing each program phase into a number of program intervals specifically comprises:
counting conditional branch instructions and triggering an interrupt after every X conditional branch instructions have been executed, such that every X conditional branch instructions constitute one program interval; another hardware counter records the total number of instructions executed during the interval, X being a positive integer.
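The interval delimitation above can be sketched in software. In this minimal sketch (my own, not from the patent) the instruction stream is modeled as a list of booleans marking conditional branches, since the real mechanism is a hardware counter overflow interrupt:

```python
def split_into_intervals(branch_flags, x):
    """branch_flags: iterable of booleans, True when the instruction is a
    conditional branch. Returns the total instruction count of each interval,
    closing an interval after every x conditional branches."""
    intervals, count, branches = [], 0, 0
    for is_branch in branch_flags:
        count += 1                      # the second counter: total instructions
        if is_branch:
            branches += 1
            if branches == x:           # in hardware: the interrupt fires here
                intervals.append(count)
                count, branches = 0, 0
    if count:
        intervals.append(count)         # trailing partial interval
    return intervals
```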
5. The method of claim 1, wherein sampling the program intervals of each program phase of each latency-critical program with hardware performance counters while the latency-critical programs run together specifically comprises:
while the latency-critical programs run together, sampling the program intervals of each program phase of each latency-critical program with hardware performance counters to obtain the performance indicators instructions per cycle (IPC), LLC miss count, LLC hit count and LLC reference count.
6. The method of claim 1, wherein calculating the first, second and third actual performance data of each program phase from the sampled data specifically comprises:
calculating the first actual performance data of each program phase from the sampled data, the first actual performance data being the average instructions per cycle (IPC);
calculating the second actual performance data of each program phase from the sampled data, the second actual performance data being the average number of LLC misses per thousand instructions (MPKI_LLC);
calculating the third actual performance data of each program phase from the sampled data, the third actual performance data being the average number of LLC hits per thousand instructions (HPKI_LLC).
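These three metrics reduce to simple ratios over the sampled counter totals. A minimal sketch, with the function and field names assumed:

```python
def phase_metrics(instructions, cycles, llc_misses, llc_hits):
    """Compute the three per-phase metrics from raw counter totals:
    IPC, MPKI_LLC and HPKI_LLC (per-kilo-instruction rates)."""
    kilo_instr = instructions / 1000.0
    return {
        "ipc": instructions / cycles,
        "mpki_llc": llc_misses / kilo_instr,
        "hpki_llc": llc_hits / kilo_instr,
    }
```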
7. The method of claim 1, wherein the phase type of the corresponding program phase is classified according to the first actual performance data as follows:
wherein α is a first set threshold and β is a second set threshold;
the performance type of the program phase is classified according to the second and third actual performance data as follows:
wherein η is a third set threshold, γ is a fourth set threshold, and η < γ;
dynamically adjusting, at run time, the cache space occupied by each latency-critical program according to the phase type and performance type of each of its program phases specifically comprises:
judging the phase type and performance type of each program phase of each latency-critical program;
if the phase type is A and the performance type is a or b, preliminarily judging that the program phase needs more cache space; while increasing the cache space, if the phase type does not change, immediately stopping the increase and releasing the added cache space; if the phase type changes to B or C, keeping the modification;
if the phase type is A and the performance type is c, preliminarily judging that the cache space of the program phase can be reduced; if reducing one way of cache space leaves the phase type unchanged, continuing to reduce the cache space, otherwise restoring the modification;
if the phase type is B and the performance type is b, leaving the cache space occupied by the program phase unchanged;
if the phase type is B and the performance type is a or c, preliminarily judging that the cache space of the program phase can be reduced; if reducing one way of cache space does not change the phase type to A, continuing to reduce the cache space, otherwise restoring the modification;
if the phase type is C and the performance type is a or c, judging whether the program phase has surplus resources; if reducing one way of cache space does not change the phase type to A or B, continuing to reduce the cache space, otherwise restoring the modification;
if the phase type is C and the performance type is b, leaving the cache space occupied by the program phase unchanged.
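The adjustment rules of this claim form a small decision table over (phase type, performance type). The sketch below encodes each rule as a tentative action plus the set of post-change phase types for which the change is kept; this tuple encoding is my own illustration, not the patent's representation:

```python
def adjustment_action(phase_type, perf_type):
    """Map (phase type, performance type) to a tentative cache-way action
    and the set of resulting phase types that justify keeping the change."""
    if phase_type == "A":
        if perf_type in ("a", "b"):
            return ("increase", {"B", "C"})  # grow; keep only if type improves
        return ("decrease", {"A"})           # perf type c: shrink while still A
    if phase_type == "B":
        if perf_type == "b":
            return ("hold", set())           # leave the allocation unchanged
        return ("decrease", {"B", "C"})      # keep while type does not fall to A
    # phase type C
    if perf_type == "b":
        return ("hold", set())
    return ("decrease", {"C"})               # keep while type stays C
```

For a decrease, the controller would remove one way, re-classify the phase, and restore the way if the new phase type is outside the returned set.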
8. A system for guaranteeing quality of service when a plurality of latency-critical programs are executed together, comprising:
an initialization module configured to initialize hardware counters and start a plurality of latency-critical programs, each latency-critical program being pinned to a corresponding core, the latency-critical programs on the cores sharing the last-level cache (LLC) space;
a staging module configured to divide each latency-critical program into a number of program phases and divide each program phase into a number of program intervals;
a classification module configured to sample the program intervals of each program phase of each latency-critical program with hardware performance counters while the latency-critical programs run together, calculate first, second and third actual performance data for each program phase from the sampled data, classify the phase type of the corresponding program phase according to the first actual performance data, and classify the performance type of the program phase according to the second and third actual performance data; and
a dynamic adjustment module configured to dynamically adjust, at run time, the cache space occupied by each latency-critical program according to the phase type and performance type of each of its program phases.
9. An electronic device, comprising one or more processors, one or more memories, and one or more computer programs, wherein a processor is connected to a memory, the one or more computer programs are stored in the memory, and when the electronic device runs, the processor executes the one or more computer programs stored in the memory so as to cause the electronic device to perform the method of any one of claims 1-7.
10. A computer-readable storage medium storing computer instructions which, when executed by a processor, perform the method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011465046.2A CN112540934B (en) | 2020-12-14 | 2020-12-14 | Method and system for ensuring service quality when multiple delay key programs are executed together |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112540934A true CN112540934A (en) | 2021-03-23 |
CN112540934B CN112540934B (en) | 2022-07-29 |
Family
ID=75018579
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011465046.2A Active CN112540934B (en) | 2020-12-14 | 2020-12-14 | Method and system for ensuring service quality when multiple delay key programs are executed together |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112540934B (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101916230A (en) * | 2010-08-11 | 2010-12-15 | 中国科学技术大学苏州研究院 | Partitioning and thread-aware based performance optimization method of last level cache (LLC) |
US20110055479A1 (en) * | 2009-08-28 | 2011-03-03 | Vmware, Inc. | Thread Compensation For Microarchitectural Contention |
CN103077128A (en) * | 2012-12-29 | 2013-05-01 | 华中科技大学 | Method for dynamically partitioning shared cache in multi-core environment |
CN103235764A (en) * | 2013-04-11 | 2013-08-07 | 浙江大学 | Thread-aware multi-core data prefetching self-regulation method |
US20140095691A1 (en) * | 2012-09-28 | 2014-04-03 | Mrittika Ganguli | Managing data center resources to achieve a quality of service |
US9401869B1 (en) * | 2012-06-04 | 2016-07-26 | Google Inc. | System and methods for sharing memory subsystem resources among datacenter applications |
CN107463510A (en) * | 2017-08-21 | 2017-12-12 | 北京工业大学 | It is a kind of towards high performance heterogeneous polynuclear cache sharing amortization management method |
CN107851040A (en) * | 2015-07-23 | 2018-03-27 | 高通股份有限公司 | For the system and method using cache requirements monitoring scheduler task in heterogeneous processor cluster framework |
CN108845960A (en) * | 2013-10-23 | 2018-11-20 | 华为技术有限公司 | A kind of memory resource optimization method and device |
CN110618872A (en) * | 2019-09-25 | 2019-12-27 | 山东师范大学 | Hybrid memory dynamic scheduling method and system |
CN111258927A (en) * | 2019-11-13 | 2020-06-09 | 北京大学 | Application program CPU last-level cache miss rate curve prediction method based on sampling |
CN112000465A (en) * | 2020-07-21 | 2020-11-27 | 山东师范大学 | Method and system for reducing performance interference of delay sensitive program in data center environment |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113504977A (en) * | 2021-06-18 | 2021-10-15 | 山东师范大学 | Cache partitioning method and system for ensuring service quality of multiple delay key programs |
CN113821324A (en) * | 2021-09-17 | 2021-12-21 | 海光信息技术股份有限公司 | Cache system, method, apparatus and computer medium for processor |
CN113821324B (en) * | 2021-09-17 | 2022-08-09 | 海光信息技术股份有限公司 | Cache system, method, apparatus and computer medium for processor |
Also Published As
Publication number | Publication date |
---|---|
CN112540934B (en) | 2022-07-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR102456085B1 (en) | Dynamic memory remapping to reduce row buffer collisions | |
US20210374046A1 (en) | Performance counters for computer memory | |
US6865647B2 (en) | Dynamic cache partitioning | |
US7899994B2 (en) | Providing quality of service (QoS) for cache architectures using priority information | |
US7725657B2 (en) | Dynamic quality of service (QoS) for a shared cache | |
US8190795B2 (en) | Memory buffer allocation device and computer readable medium having stored thereon memory buffer allocation program | |
CN108845960B (en) | Memory resource optimization method and device | |
CN112540934B (en) | Method and system for ensuring service quality when multiple delay key programs are executed together | |
US20080235487A1 (en) | Applying quality of service (QoS) to a translation lookaside buffer (TLB) | |
US20110113215A1 (en) | Method and apparatus for dynamic resizing of cache partitions based on the execution phase of tasks | |
US20050125613A1 (en) | Reconfigurable trace cache | |
US8769543B2 (en) | System and method for maximizing data processing throughput via application load adaptive scheduling and context switching | |
CN109308220B (en) | Shared resource allocation method and device | |
KR101356033B1 (en) | Hybrid Main Memory System and Task Scheduling Method therefor | |
US20170371550A1 (en) | Frame choosing during storage constraint condition | |
US20200210340A1 (en) | Cache Management Method, Cache and Storage Medium | |
US9189279B2 (en) | Assignment method and multi-core processor system | |
WO2016202154A1 (en) | Gpu resource allocation method and system | |
US8769201B2 (en) | Technique for controlling computing resources | |
US20190056872A1 (en) | Reallocate memory pending queue based on stall | |
CN106294192B (en) | Memory allocation method, memory allocation device and server | |
CN115421924A (en) | Memory allocation method, device and equipment | |
CN112579277B (en) | Central processing unit, method, device and storage medium for simultaneous multithreading | |
CN116483742A (en) | Prefetch address generation method and computer equipment | |
CN113505087B (en) | Cache dynamic dividing method and system considering service quality and utilization rate |
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||