CN108712303B

CN108712303B - Tail delay evaluation system and method for cloud platform

Info

Publication number: CN108712303B
Application number: CN201810386680.3A
Authority: CN
Inventors: 李克秋; 张桌箫; 齐恒; 张玉超
Original assignee: Dalian University of Technology
Current assignee: Dalian University of Technology
Priority date: 2018-04-25
Filing date: 2018-04-25
Publication date: 2021-08-20
Anticipated expiration: 2038-04-25
Also published as: CN108712303A

Abstract

The invention belongs to the technical field of computer applications and provides a tail delay evaluation system and method for a cloud platform. The tail delay evaluation system comprises a server, a client, and a logic analysis module; the client comprises a request generation module and a delay statistic module. The request generation module generates random requests at a fixed rate, the client sends the generated requests over TCP/IP to the server deployed on the cloud platform, and the server feeds the processing result back to the client to complete one request. The logic analysis module reads the delay statistical file to obtain a linear regression equation between the number of requests processed in each time period and the time period sequence number. With monitoring at the client only, the invention obtains a tail delay probability distribution graph that accurately characterizes the tail delay performance of cloud platforms. The method characterizes tail delay performance accurately from a small number of tests combined with probability statistics.

Description

Tail delay evaluation system and method for cloud platform

Technical Field

The invention belongs to the technical field of computer application, and particularly relates to a system and a method for evaluating tail delay of a cloud platform.

Background

With the development of cloud computing, more and more companies choose to deploy large-scale, highly complex, high-density computing workloads on leased cloud platforms within a short time. However, due to inherent drawbacks of the von Neumann architecture, when the virtual machines of multiple users are placed on the same physical machine, overall performance becomes highly unstable and degrades significantly, mainly for the following two reasons:

On the one hand, when requests from different users run on the same physical machine, the underlying hardware cannot identify which user an instruction request came from. As a result, when multiple virtual machine users run on the same physical machine there is a large amount of inter-user interference among the virtual machines; because of this interference, a given CPU core may be allocated to any user at random while it runs, which generates a large amount of interrupt overhead.

On the other hand, for a single user, each request is subject during execution to interference at the physical layer from users outside the virtual machine, and this interference is completely random and unpredictable. Even when CPU and memory resources are not fully utilized, delay-sensitive requests under such random interference may show delays several times to tens of thousands of times those of the original single-user mode.

Tail delay refers to the delay at the 90th, 95th, or 99th percentile of all delay-sensitive requests under such performance degradation and random interference. Tail delays are often much higher than the average delay, and for delay-sensitive services they result in a very poor user experience. How to correctly evaluate the tail delay performance of a cloud platform has therefore long been an important concern in industry.

On the one hand, existing testing tools require some modification of both the client and the server to expose the corresponding test interfaces, and also require root privileges on the server, which users often do not have. On the other hand, existing testing tools take only simple measurements when evaluating tail delay: data measured over a short time fluctuates widely, while for long measurements it is difficult to keep the test environments of different platforms stable.

Disclosure of Invention

To solve the above problems, the invention provides a method for evaluating the tail delay performance of a cloud platform. On the one hand, the method confines modifications to the client, which avoids the problem that a user sometimes cannot obtain root privileges on the server and widens the range of software that can be used for evaluation. On the other hand, the method no longer simply measures a tail delay value; instead, the delay data collected by the client is analyzed and sorted, and a probability distribution graph of the tail delay is obtained through a probability-based algorithm, which greatly reduces evaluation time and lowers the difficulty of evaluation.

The technical scheme of the invention is as follows:

A tail delay evaluation system for a cloud platform comprises a server, a client, and a logic analysis module.

The client comprises a request generation module and a delay statistic module. The request generation module generates random requests at a fixed rate, the client sends the generated requests over TCP/IP to the server deployed on the cloud platform, and the server feeds the processing result back to the client to complete one request. The delay statistic module runs alongside the client: it records the current time when the client starts running, the send time whenever the client sends a request, and the receive time whenever the client receives a processing result fed back by the server. It then writes the request generation time, the feedback receive time, and the computed client-side delay to a delay statistical file.

The logic analysis module reads the delay statistical file, sorts the records by the time difference between the moment the client received the feedback and the moment the client started, and counts the number of feedbacks received in each fixed-length time period to obtain a linear regression equation between the number of requests processed in a time period and the time period sequence number. Taking the request count of the time period with the most requests as the reference, it calculates the interference ratio in each time period and modifies all delay data according to the interference ratio and the average delay in that time period, eliminating delay fluctuation caused by interference. After this first modification, linear regression analysis is performed again on the modified data, the expected average delay in each time period is calculated from the linear regression equation, and the delay data in all time periods is modified according to the expected average delay, reducing the effect of network transmission performance fluctuation on the client-side delay.

Finally, from the twice-modified delay data and the previously measured interference ratio of each time period, the delay data at the required percentile and its probability distribution are calculated according to the Poisson distribution; these are the expected tail delay value and its probability distribution.

A tail delay evaluation method of a cloud platform comprises the following steps:

The first step: the delay statistic module records the client start time t1. For each request, it records the time t2 at which the client sends the request to the server and the time t3 at which the client receives the feedback sent back by the server. For each pair of t2 and t3, the three time differences (t2-t1), (t3-t2), and (t3-t1) are recorded; after N tests, a delay statistical file is generated.
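The per-request record of the first step can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: the names `DelayRecord`, `make_record`, and `write_delay_file`, the CSV layout, and the use of seconds as the time unit are all hypothetical, and the timestamps are made up.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class DelayRecord:
    send_offset: float     # t2 - t1: send time relative to client start
    delay: float           # t3 - t2: client-side delay of the request
    receive_offset: float  # t3 - t1: feedback arrival relative to client start


def make_record(t1: float, t2: float, t3: float) -> DelayRecord:
    """Build the three time differences recorded for one request."""
    if not (t1 <= t2 <= t3):
        raise ValueError("expected t1 <= t2 <= t3")
    return DelayRecord(t2 - t1, t3 - t2, t3 - t1)


def write_delay_file(records, out) -> None:
    """Write one CSV row per request (the column layout is an assumption)."""
    writer = csv.writer(out)
    writer.writerow(["t2_minus_t1", "t3_minus_t2", "t3_minus_t1"])
    for r in records:
        writer.writerow([r.send_offset, r.delay, r.receive_offset])


# Made-up timestamps in seconds, written to an in-memory "file".
buf = io.StringIO()
write_delay_file([make_record(100.0, 100.5, 100.75)], buf)
```

In a real client, t1, t2, and t3 would be read from a clock around the send and receive calls; a monotonic clock is a reasonable choice, although the source does not specify one.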

The second step: solve the linear regression equation

2.1 Sort all data groups in the delay statistical file in ascending order of the data item t3-t1.

2.2 Divide the timeline into equal time periods; the time period length u is:

u＝t/(1-c)

wherein c is the required accuracy; t is the minimum value of all data t3-t2 recorded by the client.

2.3 Count the number of feedbacks received in each fixed time period u, that is, the number of requests completed in that time period.

Then fit the linear regression equation between the number of completed requests and the time period sequence number over all time periods:

y＝kx+b

wherein k is a slope value; b is a constant; y is the number of requests completed in a time period; x is the time period number.
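The second step can be sketched in a few lines. The sketch assumes ordinary least squares for the regression, time periods that start at the client start time and are numbered from 1, and 0 <= c < 1 so that u is well defined; the source states none of these details explicitly. The function names are hypothetical and the sample values are made up.

```python
def period_length(min_delay, accuracy):
    """u = t / (1 - c): t is the smallest t3-t2, c the required accuracy."""
    if min_delay <= 0 or not 0 <= accuracy < 1:
        raise ValueError("need t > 0 and 0 <= c < 1")
    return min_delay / (1.0 - accuracy)


def count_per_period(receive_offsets, u):
    """Count completed requests (values of t3-t1) in each period of length u."""
    ordered = sorted(receive_offsets)           # step 2.1: ascending by t3-t1
    counts = [0] * (int(ordered[-1] // u) + 1)  # step 2.2: equal periods
    for offset in ordered:                      # step 2.3: count per period
        counts[int(offset // u)] += 1
    return counts


def fit_line(ys):
    """Least-squares fit of y = k*x + b with x = 1, 2, ..., len(ys)."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two time periods")
    xs = range(1, n + 1)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx
    return k, mean_y - k * mean_x


u = period_length(0.5, 0.5)                                    # u = 1.0
counts = count_per_period([1.9, 0.2, 1.1, 0.7, 2.3, 1.5], u)   # [2, 3, 1]
k, b = fit_line(counts)                                        # k = -0.5, b = 3.0
```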

The third step: calculate the interference ratio P in each time period

The interference ratio P is:

P＝(l-s)/l

where s is the completion quantity value in each time period and l is the maximum value in s.
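As a small worked example of P＝(l-s)/l, the following sketch computes the interference ratio for each time period from the per-period completion counts. The function name and the counts are made up for illustration.

```python
def interference_ratios(counts):
    """P = (l - s) / l for each period, where l is the largest completion count."""
    l = max(counts)
    if l <= 0:
        raise ValueError("at least one period must have completed requests")
    return [(l - s) / l for s in counts]


ratios = interference_ratios([8, 10, 5])  # [0.2, 0.0, 0.5]
```

The busiest period serves as the reference and therefore has P = 0; a period that completed half as many requests has P = 0.5.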

The fourth step: first modification of the delay data

Let a be a delay datum and A be the average delay within the time period. For each time period:

When k ≤ 0, apply the following to all delay data a in the current time period:

a'＝a-(A-A/(1+P))

When k > 0, apply the following to the delay data a in the current time period and in all time periods after it:

a'＝a-(A-A/(1+P))

where P is the interference ratio in each time period, and the delay data are the values t3-t2, of which A is the average.
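The correction a'＝a-(A-A/(1+P)) can be sketched as below. The sketch covers only the variant in which each time period's correction is applied to the delay data of that same period; the variant that also carries the correction into later periods is not shown. The function name and the sample data are hypothetical.

```python
def first_modification(delays_by_period, ratios):
    """Apply a' = a - (A - A/(1+P)) within each period independently."""
    modified = []
    for delays, p in zip(delays_by_period, ratios):
        if not delays:                      # nothing completed in this period
            modified.append([])
            continue
        avg = sum(delays) / len(delays)     # A: average delay in the period
        shift = avg - avg / (1.0 + p)       # A - A/(1+P)
        modified.append([a - shift for a in delays])
    return modified


# Made-up delays (t3-t2) for two periods with P = 1.0 and P = 0.0.
once_modified = first_modification([[1.0, 3.0], [2.0, 2.0]], [1.0, 0.0])
# -> [[0.0, 2.0], [2.0, 2.0]]
```

A period with P = 0 is left unchanged, while in a period with P = 1 every delay is reduced by half of that period's average.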

The fifth step: second modification of the delay data

Calculate the average value A' of a' in each time period from the modified a' obtained in the fourth step, and perform linear regression analysis on A' against the time period sequence number to obtain a new linear regression equation:

y＝k'x+b'

where k' is the slope, b' is a constant, x is the time period sequence number, and y is A'.

Then modify a' a second time according to this linear regression equation to obtain:

a″＝a'-(A'-D)

where D is the expected value given by the linear regression equation for each time period.
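The second modification can be sketched as follows, assuming ordinary least squares for the regression, time periods numbered from 1, and no empty time periods. The function names and the sample data are hypothetical.

```python
def fit_line(ys):
    """Least-squares fit of y = k*x + b with x = 1, 2, ..., len(ys)."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two time periods")
    xs = range(1, n + 1)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx
    return k, mean_y - k * mean_x


def second_modification(modified_by_period):
    """Apply a'' = a' - (A' - D), where D = k'*x + b' for period x."""
    averages = [sum(p) / len(p) for p in modified_by_period]  # A' per period
    k2, b2 = fit_line(averages)
    result = []
    for x, (delays, avg) in enumerate(zip(modified_by_period, averages), start=1):
        expected = k2 * x + b2                                # D for period x
        result.append([a - (avg - expected) for a in delays])
    return k2, b2, result


# Made-up once-modified delays for three periods (averages 2, 5, 5).
k2, b2, twice_modified = second_modification([[1.0, 3.0], [4.0, 6.0], [5.0, 5.0]])
# k2 = 1.5, b2 = 1.0, twice_modified = [[1.5, 3.5], [3.0, 5.0], [5.5, 5.5]]
```

After this step the average of each period lies exactly on the fitted line, while the spread of the delays within a period is preserved.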

The sixth step: expected value of the tail delay T

Case 1: when k' > 0, the expected value of the tail delay T is:

T＝t4+t5*p1/(1-p1)

where t4 is the tail delay at the required percentile, t5 is the value of t3-t1 corresponding to that tail delay, and p1 is the average of the interference ratio P over the time period containing the tail delay and all time periods before it.

Then draw the Poisson distribution with probability p1, where the abscissa is the delay time, the starting point is T1, the highest point is T, and the ordinate is the probability.
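The Case 1 formula can be illustrated as follows. Several details here are assumptions rather than statements from the source: the percentile is taken by the nearest-rank rule, each record is a pair (delay, receive offset), that is (t3-t2, t3-t1), and time periods start at the client start time. The function names and all numbers are made up.

```python
import math


def tail_sample(records, q):
    """Return (t4, t5): the delay at quantile q (nearest rank) and its t3-t1."""
    ordered = sorted(records)               # sorts by delay first
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def mean_ratio_up_to(ratios, t5, u):
    """p1: mean interference ratio over the period containing t5 and all earlier ones."""
    idx = min(int(t5 // u), len(ratios) - 1)
    window = ratios[: idx + 1]
    return sum(window) / len(window)


def expected_tail_delay(t4, t5, p1):
    """T = t4 + t5 * p1 / (1 - p1)."""
    if not 0 <= p1 < 1:
        raise ValueError("p1 must satisfy 0 <= p1 < 1")
    return t4 + t5 * p1 / (1.0 - p1)


records = [(0.1, 0.3), (0.2, 1.2), (0.4, 2.5), (0.3, 1.8)]
t4, t5 = tail_sample(records, 0.9)                    # (0.4, 2.5)
p1 = mean_ratio_up_to([0.0, 0.5, 0.25], t5, u=1.0)    # 0.25
T = expected_tail_delay(t4, t5, p1)                   # 0.4 + 2.5 * 0.25 / 0.75
```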

Case 2: when k' ≤ 0, calculate the Poisson distribution of the delay time for every time period according to its interference ratio P, and multiply the probability at each time point by the number of requests completed in that time period to obtain the expected number of occurrences e of each delay time point, from which the expected value of the tail delay is obtained.

For any delay time point, the probability P' that the tail delay falls at that point is equal to the sum of the probabilities corresponding to all delay time points from that point to the tail delay expectation point, and obeys a Poisson distribution.
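The summation that defines P' can be sketched as below. The sketch takes the per-point probabilities as given input and does not derive them from a Poisson distribution; it also assumes that both end points of the range are included. The function name and the probability values are hypothetical.

```python
def tail_point_probability(point_probs, point, expected_tail):
    """P' for `point`: sum of the probabilities of all delay time points
    lying between `point` and the tail delay expectation point (inclusive)."""
    lo, hi = sorted((point, expected_tail))
    return sum(p for d, p in point_probs.items() if lo <= d <= hi)


# Made-up probabilities for four delay time points; expectation point is 3.
point_probs = {1: 0.125, 2: 0.25, 3: 0.5, 4: 0.125}
p_at_1 = tail_point_probability(point_probs, 1, 3)  # 0.125 + 0.25 + 0.5 = 0.875
p_at_4 = tail_point_probability(point_probs, 4, 3)  # 0.5 + 0.125 = 0.625
```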

The seventh step: collect the probability statistics of all points and plot them to obtain the tail delay probability distribution graph.

The invention has the following beneficial effects:

1. Test time is greatly reduced. With common methods of measuring tail delay, the results fluctuate widely, and many repeated tests are needed to obtain a reasonably accurate tail delay value. The present method characterizes tail delay performance accurately from a small number of tests combined with probability statistics, yielding a tail delay probability distribution graph.

2. The characterization is richer in detail. Common measurement methods yield only a single number after testing, whereas this method digs deeper into the delay statistics during evaluation, characterizes the performance fluctuation of the cloud platform from more angles, and points out bottlenecks more intuitively and effectively, which makes it easier to improve the cloud platform architecture.

Drawings

FIG. 1 is an overall architecture diagram of the present invention.

FIG. 2 is a block diagram of a statistical analysis of the present invention.

FIG. 3 is a flow diagram of a logic analysis module of the present invention.

Detailed Description

The following further describes a specific embodiment of the present invention with reference to the drawings and technical solutions.

Referring to fig. 1 to 3, a method for evaluating tail delay of a cloud platform includes the following steps:

The second step: solve the linear regression equation

u＝t/(1-c)

y＝kx+b

The third step: interference ratio P in each time period:

the interference ratio P is:

P＝(l-s)/l

The fourth step: first modification of the delay data

a'＝a-(A-A/(1+P))

The fifth step: second modification of the delay data

y＝k'x+b'

a″＝a'-(A'-D)

The sixth step: expected value of the tail delay T

Case 1: when k' > 0, the expected value of the tail delay T is:

T＝t4+t5*p1/(1-p1)

Case 2: when k' ≤ 0, calculate the Poisson distribution of the delay time for every time period according to its interference ratio P, and multiply the probability at each time point by the number of requests completed in that time period to obtain the expected number of occurrences e of each delay time point; this yields the delay data at the required percentile, that is, the expected value of the tail delay.

For each delay time point, a weighted average interference ratio p2 corresponding to that point is calculated from the expected values contributed by each time period.

Claims

1. The tail delay evaluation system of the cloud platform is characterized by comprising a server, a client and a logic analysis module;

the client comprises a request generation module and a delay statistic module; the request generation module generates random requests at a fixed rate, the client sends the generated requests over TCP/IP to the server deployed on the cloud platform, and the server feeds the processing result back to the client to complete one request; the delay statistic module runs alongside the client, records the current time when the client starts running, records the send time whenever the client sends a request, records the receive time whenever the client receives a processing result fed back by the server, and writes the request generation time, the feedback receive time, and the computed client-side delay to a delay statistical file;

the logic analysis module reads the delay statistical file, sorts the records by the time difference between the moment the client received the feedback and the moment the client started, and counts the number of feedbacks received in each fixed-length time period to obtain a linear regression equation between the number of requests processed in a time period and the time period sequence number; taking the request count of the time period with the most requests as the reference, it calculates the interference ratio in each time period and modifies all delay data according to the interference ratio and the average delay in that time period, eliminating delay fluctuation caused by interference; after the first modification is completed, linear regression analysis is performed again on the modified data, the expected average delay in each time period is calculated from the linear regression equation, and the delay data in all time periods is modified according to the expected average delay, reducing the effect of network transmission performance fluctuation on the client-side delay;

2. The tail delay evaluation method of the evaluation system of claim 1, characterized by the steps of:

the first step: the delay statistic module records the client start time t1; for each request, it records the time t2 at which the client sends the request to the server and the time t3 at which the client receives the feedback sent back by the server; for each pair of t2 and t3, the corresponding time differences (t2-t1), (t3-t2), and (t3-t1) are recorded, and after N tests a delay statistical file is generated;

the second step: solving a linear regression equation

u＝t/(1-c)

wherein c is the required accuracy; t is the minimum value of all data t3-t2 recorded by the client;

y＝kx+b

wherein k is a slope value; b is a constant; y is the number of requests completed in a time period; x is a time period sequence number;

the third step: interference ratio P in each time period:

the interference ratio P is:

P＝(l-s)/l

wherein s is a completion quantity value in each time period, and l is the maximum value in s;

the fourth step: first modification of the delay data

a'＝a-(A-A/(1+P))

wherein P is the interference ratio in each time period, a is the delay data t3-t2, and A is the average value of the delay data in the time period;

the fifth step: second modification of the delay data

y＝k'x+b'

wherein k' is a slope value, b' is a constant, x is a time period sequence number, and y is A';

a″＝a'-(A'-D)

d is a corresponding expected value in each time period in the linear regression equation;

the sixth step: expected value of the tail delay T

case 1: when k' > 0, the expected value of the tail delay T is:

T＝t4+t5*p1/(1-p1)

wherein t4 is the tail delay at the required percentile, t5 is the value of t3-t1 corresponding to that tail delay, and p1 is the average value of the interference ratio P over the time period containing the tail delay and all time periods before it;

drawing Poisson distribution with probability p1, wherein the abscissa is delay time, the starting point is T1, the highest point is T, and the ordinate is probability;

case 2: when k' is less than or equal to 0, calculating the Poisson distribution of the delay time for all time periods according to the corresponding interference ratio P, and multiplying the probability of each time point by the number of requests completed in the current time period to obtain an expected value e of the number of occurrences of each delay time point, thereby obtaining the expected value of the tail delay;

for any delay time point, the probability P' that the tail delay falls at the point is equal to the sum of the probabilities corresponding to all delay time points from the point to the tail delay expectation value point, and the Poisson distribution is obeyed;