CN108712303B - Tail delay evaluation system and method for cloud platform - Google Patents

Tail delay evaluation system and method for cloud platform

Info

Publication number
CN108712303B
Authority
CN
China
Prior art keywords
delay
time
time period
client
tail
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810386680.3A
Other languages
Chinese (zh)
Other versions
CN108712303A (en)
Inventor
李克秋
张桌箫
齐恒
张玉超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian University of Technology
Priority to CN201810386680.3A
Publication of CN108712303A
Application granted
Publication of CN108712303B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0852 Delays
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Complex Calculations (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention belongs to the technical field of computer applications and provides a tail delay evaluation system and method for a cloud platform. The tail delay evaluation system comprises a server, a client and a logic analysis module. The client comprises a request generation module and a delay statistics module. The request generation module generates random requests at a fixed rate; the client sends each generated request over TCP/IP to a server deployed on the cloud platform, and the server returns the processing result to the client to complete the request. The logic analysis module reads the delay statistics file and obtains a linear regression equation between the number of requests processed in each time period and the time period number. With monitoring at the client only, the invention obtains a tail delay probability distribution graph that accurately characterizes the tail delay performance of cloud platforms. The method characterizes tail delay performance accurately from a small number of tests combined with probability-statistics calculations.

Description

Tail delay evaluation system and method for cloud platform
Technical Field
The invention belongs to the technical field of computer application, and particularly relates to a system and a method for evaluating tail delay of a cloud platform.
Background
With the development of cloud computing, more and more companies choose to deploy large-scale, highly complex, high-density computing jobs on leased cloud platforms at short notice. However, due to inherent drawbacks of the von Neumann architecture, when the virtual machines of multiple users are placed on the same physical machine, overall performance becomes highly unstable and degrades significantly, mainly for the following two reasons:
On the one hand, when requests from different users run on the same physical machine, the underlying hardware cannot identify which user an instruction request came from. This causes a large amount of inter-user interference between virtual machines sharing a physical machine; because of this interference, a given CPU core may be allocated to any user at random while it runs, which generates a large amount of interrupt overhead.
On the other hand, for a single user, each request is subject during execution to interference at the physical layer from users outside its virtual machine, and this interference is completely random and unpredictable. Even when CPU and memory resources are not fully utilized, delay-sensitive requests under such random interference may show delays several times to tens of thousands of times those of the original single-user mode.
Tail delay is the delay at the 90th, 95th, or 99th percentile of all delay-sensitive requests. Under such performance degradation and random interference, tail delays are often much higher than the average delay, and for delay-sensitive services they result in a very poor user experience. How to correctly evaluate the tail delay performance of a cloud platform has therefore long been an important concern in industry.
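As a small illustration of why tail delay differs from average delay, the following Python sketch computes nearest-rank percentiles over a made-up latency sample. The `percentile` helper, the nearest-rank convention and the sample values are assumptions for illustration only and are not part of the invention.

```python
def percentile(samples, q):
    """Nearest-rank percentile for an integer q in 1..100 (assumed convention)."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    # Smallest rank such that at least q% of the samples are at or below it.
    rank = max(1, (q * len(ordered) + 99) // 100)
    return ordered[rank - 1]

# Hypothetical latencies in milliseconds: mostly fast, with a few slow outliers.
latencies_ms = [10] * 90 + [50] * 5 + [200] * 4 + [1000]

mean_ms = sum(latencies_ms) / len(latencies_ms)
p90 = percentile(latencies_ms, 90)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
print(mean_ms, p90, p95, p99)
```

In this sample the 99th-percentile delay is several times the mean, which is the gap the evaluation method is meant to characterize.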
Existing testing tools have two shortcomings. On the one hand, they require some modification of both the client and the server to expose test interfaces, and they also need root privileges on the server, which users often do not have. On the other hand, they only take simple measurements when evaluating tail delay: data measured over a short time fluctuate widely, while with long measurements it is difficult to keep the test environment stable across different platforms.
Disclosure of Invention
To solve these problems, the invention provides a method for evaluating the tail delay performance of a cloud platform. On the one hand, the method confines all modifications to the client, which avoids the problem that a user sometimes cannot obtain root privileges on the server and widens the range of software that can be used for evaluation. On the other hand, rather than simply reading off a tail delay value, the method analyzes and sorts the delay data collected by the client and obtains a probability distribution graph of the tail delay through a probability-based algorithm, which greatly shortens evaluation time and reduces evaluation difficulty.
The technical scheme of the invention is as follows:
A tail delay evaluation system for a cloud platform comprises a server, a client and a logic analysis module.
The client comprises a request generation module and a delay statistics module. The request generation module generates random requests at a fixed rate; the client sends each generated request over TCP/IP to a server deployed on the cloud platform, and the server returns the processing result to the client to complete the request. The delay statistics module runs alongside the client. It records the current time when the client starts running, the send time whenever the client sends a request, and the receive time whenever the client receives a processing result returned by the server. It then writes the request generation time, the feedback receive time and the calculated client-side delay to a delay statistics file.
The logic analysis module reads the delay statistics file, sorts the records by the difference between the time the client received the feedback and the client start time, and counts the number of feedbacks received in each fixed-length time period, obtaining a linear regression equation between the number of requests processed in a time period and the time period number. Taking the request count of the time period with the most requests as the reference, it calculates the interference ratio of each time period and modifies all delay data according to the interference ratio and the average delay of the period, eliminating delay fluctuation caused by interference. After this first modification, it performs linear regression on the modified data again, calculates the expected average delay of each time period from the regression equation, and modifies the delay data of all time periods according to that expected average delay, reducing the effect of network transmission fluctuation on the client-side delay.
Finally, from the twice-modified delay data and the previously measured interference ratio of each time period, it calculates, according to the Poisson distribution, the delay at the target quantile and its probability distribution, that is, the expected tail delay value and its probability distribution.
A tail delay evaluation method for a cloud platform comprises the following steps:
The first step: the delay statistics module records the client start time t1. For each request, it records the time t2 at which the client sends the request to the server and the time t3 at which the client receives the feedback returned by the server. For each pair of t2 and t3, the three time differences (t2-t1), (t3-t2) and (t3-t1) are recorded; after N tests, a delay statistics file is generated.
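The recording in the first step can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `collect_delay_stats`, the CSV column names and the injectable `clock` are hypothetical, and requests are issued one after another with a blocking call, which simplifies the fixed-rate request generation described in the text.

```python
import csv
import io
import time

def collect_delay_stats(send_request, n_requests, out, clock=time.monotonic):
    """Issue n_requests blocking requests and write one row per request.

    Each row holds (t2 - t1, t3 - t2, t3 - t1), where t1 is the client start
    time, t2 the send time and t3 the time the feedback arrived.
    """
    writer = csv.writer(out)
    writer.writerow(["send_offset", "delay", "recv_offset"])  # assumed names
    t1 = clock()
    for _ in range(n_requests):
        t2 = clock()
        send_request()  # hypothetical callable: returns when the server replies
        t3 = clock()
        writer.writerow([t2 - t1, t3 - t2, t3 - t1])

# Deterministic usage: a fake clock and a no-op request stand in for a server.
ticks = iter([0.0, 1.0, 1.5, 2.0, 2.25])
buf = io.StringIO()
collect_delay_stats(lambda: None, 2, buf, clock=lambda: next(ticks))
rows = list(csv.reader(io.StringIO(buf.getvalue())))
print(rows)
```

Passing the clock in as a parameter keeps the measurement logic testable without a live server; in a real run `send_request` would wrap the TCP/IP round trip.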
The second step: solve the linear regression equation
2.1 Sort all data groups in the delay statistics file in ascending order of the data item (t3-t1).
2.2 Divide the data into equal time periods. The time period length u is:
u=t/(1-c)
where c is the required accuracy and t is the minimum of all (t3-t2) values recorded by the client.
2.3 Count the number of feedbacks received in each fixed time period u, that is, the number of requests completed in that time period.
Fit a linear regression equation between the number of completed requests and the time period number over all time periods:
y=kx+b
where k is the slope, b is a constant, y is the number of requests completed in a time period, and x is the time period number.
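A minimal Python sketch of the second step follows. The function names are hypothetical, and two details are assumptions not fixed by the text: periods are taken as half-open intervals starting at offset 0, and period numbers x run from 1. The fit is an ordinary least-squares line.

```python
def period_length(min_delay, accuracy):
    """u = t / (1 - c), with t the smallest (t3 - t2) and c the required accuracy."""
    if not 0 <= accuracy < 1:
        raise ValueError("accuracy c must satisfy 0 <= c < 1")
    return min_delay / (1 - accuracy)

def count_per_period(recv_offsets, u):
    """Count feedbacks whose (t3 - t1) falls in each period [i*u, (i+1)*u)."""
    ordered = sorted(recv_offsets)              # step 2.1: ascending (t3 - t1)
    counts = [0] * (int(ordered[-1] // u) + 1)
    for offset in ordered:
        counts[int(offset // u)] += 1           # step 2.3: completed requests
    return counts

def fit_line(ys):
    """Least-squares fit y = k*x + b with x = 1..len(ys); needs >= 2 periods."""
    n = len(ys)
    xs = range(1, n + 1)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx
    return k, mean_y - k * mean_x

# Made-up data: smallest delay 0.5 s and accuracy 0.5 give u = 1.0 s.
u = period_length(0.5, 0.5)
counts = count_per_period([0.2, 0.4, 0.9, 1.1, 1.7, 2.5], u)
k, b = fit_line(counts)
print(u, counts, k, b)
```

Here the completed-request count falls from 3 to 1 across the three periods, so the fitted slope k is negative.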
The third step: calculate the interference ratio P of each time period
The interference ratio P is:
P=(l-s)/l
where s is the number of requests completed in the time period and l is the maximum of s over all time periods.
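The interference ratio can be computed directly from the per-period counts. The sketch below is illustrative; the function name and the sample counts are assumptions.

```python
def interference_ratios(counts):
    """P = (l - s) / l for each period, where l is the largest count."""
    l = max(counts)
    if l == 0:
        raise ValueError("at least one period must contain a completed request")
    return [(l - s) / l for s in counts]

# Made-up completed-request counts for four periods.
ratios = interference_ratios([4, 2, 3, 4])
print(ratios)
```

The period with the most completed requests gets P = 0, and a period that completed half as many gets P = 0.5.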
The fourth step: first modification of the delay data
Let the delay data be a and the average delay over the time period be A. For each time period:
when k ≤ 0, the following operation is performed on all delay data a in the current time period:
a'=a-(A-A/(1+P))
when k > 0, the following operation is performed on the delay data a in the current time period and in all time periods after it:
a'=a-(A-A/(1+P))
where P is the interference ratio of the time period, the delay data a are the (t3-t2) values, and A is their average.
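The correction formula can be sketched in Python as below. This sketch covers only the branch in which each time period is corrected with its own A and P; how the other branch propagates the correction to later periods is not spelled out in the text and is left out. The function name and sample values are hypothetical.

```python
def first_correction(delays_by_period, ratios):
    """Apply a' = a - (A - A / (1 + P)) within each period.

    delays_by_period: one list of (t3 - t2) delays per period.
    ratios: the interference ratio P of each period, in the same order.
    """
    corrected = []
    for delays, p in zip(delays_by_period, ratios):
        if not delays:                     # an empty period has nothing to fix
            corrected.append([])
            continue
        avg = sum(delays) / len(delays)    # A: average delay in the period
        shift = avg - avg / (1 + p)        # the part attributed to interference
        corrected.append([a - shift for a in delays])
    return corrected

# Made-up delays: the second period has P = 0.5, so its mean of 18 shrinks to 12.
first = first_correction([[10.0, 14.0], [15.0, 21.0]], [0.0, 0.5])
print(first)
```

Every delay in a period moves by the same amount, so the spread within the period is preserved while its mean is scaled down by 1/(1+P).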
The fifth step: second modification of the delay data
From the modified a' obtained in the fourth step, calculate the average A' of a' in each time period, and perform linear regression of A' against the time period number to obtain a new linear regression equation:
y=k'x+b'
where k' is the slope, b' is a constant, x is the time period number, and y is A'.
Modify a' a second time according to this linear regression equation:
a''=a'-(A'-D)
where D is the expected value given by the linear regression equation for the time period.
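A sketch of the fifth step in Python follows, assuming every period is non-empty, there are at least two periods, and period numbers run from 1. The function name and data are hypothetical.

```python
def second_correction(corrected_by_period):
    """Fit A' against the period number, then apply a'' = a' - (A' - D)."""
    n = len(corrected_by_period)
    xs = list(range(1, n + 1))
    means = [sum(d) / len(d) for d in corrected_by_period]   # A' per period
    mean_x = sum(xs) / n
    mean_y = sum(means) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, means))
    k2 = sxy / sxx                     # slope k'
    b2 = mean_y - k2 * mean_x          # intercept b'
    result = []
    for x, avg, delays in zip(xs, means, corrected_by_period):
        expected = k2 * x + b2         # D: value on the fitted line
        result.append([a - (avg - expected) for a in delays])
    return result, k2, b2

# Made-up a' values with period means 10, 16 and 16.
second, k2, b2 = second_correction([[9.0, 11.0], [15.0, 17.0], [15.0, 17.0]])
print(second, k2, b2)
```

After this step each period's mean lies exactly on the fitted line (11, 14 and 17 in the example), which is how the step smooths period-to-period fluctuation.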
The sixth step: expected value T of the tail delay
Case 1: when k' > 0, the expected value T of the tail delay is:
T=t4+t5*p1/(1-p1)
where t4 is the tail delay at the target quantile, t5 is the (t3-t1) value corresponding to that tail delay, and p1 is the average of the interference ratio P over the time period containing the tail delay and all time periods before it.
Draw a Poisson distribution with probability p1, where the abscissa is the delay time, the starting point is T1, the highest point is T, and the ordinate is the probability.
Case 2: when k' ≤ 0, calculate the Poisson distribution of the delay time for every time period according to its interference ratio P, and multiply the probability of each time point by the number of requests completed in the current time period. This gives the expected value e of the number of occurrences of each delay time point, from which the expected value of the tail delay is obtained.
For any delay time point, the probability P' that the tail delay falls at that point equals the sum of the probabilities of all delay time points from that point to the expected tail delay point, and follows a Poisson distribution.
The seventh step: collect the probability statistics of all points and plot them to obtain the tail delay probability distribution graph.
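The Case 1 formula can be evaluated as in the short Python sketch below. The helper names and the numbers are made up, and the sketch takes t4 and t5 at face value in the same time unit; it does not attempt the Poisson plot.

```python
def mean_ratio_up_to(ratios, tail_period_index):
    """p1: average of P over the tail delay's period and all earlier periods."""
    window = ratios[: tail_period_index + 1]
    return sum(window) / len(window)

def expected_tail_delay(t4, t5, p1):
    """T = t4 + t5 * p1 / (1 - p1); requires 0 <= p1 < 1."""
    if not 0 <= p1 < 1:
        raise ValueError("p1 must satisfy 0 <= p1 < 1")
    return t4 + t5 * p1 / (1 - p1)

# Hypothetical inputs: the tail delay sits in the third period (index 2).
p1 = mean_ratio_up_to([0.0, 0.5, 0.25], 2)
T = expected_tail_delay(t4=120.0, t5=3000.0, p1=p1)
print(p1, T)
```

With p1 = 0.25 the factor p1/(1-p1) is one third, so T exceeds t4 by a third of t5.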
The invention has the following beneficial effects:
1. It greatly shortens test time. With common methods of measuring tail delay, results fluctuate widely, and many repeated tests are needed to obtain a reasonably accurate tail delay value. The present method characterizes tail delay performance accurately from a small number of tests combined with probability statistics, yielding a tail delay probability distribution graph.
2. It characterizes performance in richer detail. Common measurement methods yield only a single number after a test. The present method digs deeper into the delay statistics during evaluation, depicts the performance fluctuation of the cloud platform from more angles, and points out bottlenecks more intuitively and effectively, which makes it easier to improve the cloud platform architecture.
Drawings
Fig. 1 is an overall architecture diagram of the present invention.
FIG. 2 is a block diagram of a statistical analysis of the present invention.
FIG. 3 is a flow diagram of a logic analysis module of the present invention.
Detailed Description
The following further describes a specific embodiment of the present invention with reference to the drawings and technical solutions.
Referring to fig. 1 to 3, a method for evaluating tail delay of a cloud platform includes the following steps:
The first step: the delay statistics module records the client start time t1. For each request, it records the time t2 at which the client sends the request to the server and the time t3 at which the client receives the feedback returned by the server. For each pair of t2 and t3, the three time differences (t2-t1), (t3-t2) and (t3-t1) are recorded; after N tests, a delay statistics file is generated.
The second step: solve the linear regression equation
2.1 Sort all data groups in the delay statistics file in ascending order of the data item (t3-t1).
2.2 Divide the data into equal time periods. The time period length u is:
u=t/(1-c)
where c is the required accuracy and t is the minimum of all (t3-t2) values recorded by the client.
2.3 Count the number of feedbacks received in each fixed time period u, that is, the number of requests completed in that time period.
Fit a linear regression equation between the number of completed requests and the time period number over all time periods:
y=kx+b
where k is the slope, b is a constant, y is the number of requests completed in a time period, and x is the time period number.
The third step: calculate the interference ratio P of each time period
The interference ratio P is:
P=(l-s)/l
where s is the number of requests completed in the time period and l is the maximum of s over all time periods.
The fourth step: first modification of the delay data
Let the delay data be a and the average delay over the time period be A. For each time period:
when k ≤ 0, the following operation is performed on all delay data a in the current time period:
a'=a-(A-A/(1+P))
when k > 0, the following operation is performed on the delay data a in the current time period and in all time periods after it:
a'=a-(A-A/(1+P))
where P is the interference ratio of the time period, the delay data a are the (t3-t2) values, and A is their average.
The fifth step: second modification of the delay data
From the modified a' obtained in the fourth step, calculate the average A' of a' in each time period, and perform linear regression of A' against the time period number to obtain a new linear regression equation:
y=k'x+b'
where k' is the slope, b' is a constant, x is the time period number, and y is A'.
Modify a' a second time according to this linear regression equation:
a''=a'-(A'-D)
where D is the expected value given by the linear regression equation for the time period.
The sixth step: expected value T of the tail delay
Case 1: when k' > 0, the expected value T of the tail delay is:
T=t4+t5*p1/(1-p1)
where t4 is the tail delay at the target quantile, t5 is the (t3-t1) value corresponding to that tail delay, and p1 is the average of the interference ratio P over the time period containing the tail delay and all time periods before it.
Draw a Poisson distribution with probability p1, where the abscissa is the delay time, the starting point is T1, the highest point is T, and the ordinate is the probability.
Case 2: when k' ≤ 0, calculate the Poisson distribution of the delay time for every time period according to its interference ratio P, and multiply the probability of each time point by the number of requests completed in the current time period. This gives the expected value e of the number of occurrences of each delay time point, from which the delay at the target quantile, that is, the expected value of the tail delay, is obtained.
For each delay time point, the weighted average interference ratio p2 for that point is calculated from the expected values contributed by each time period.
For any delay time point, the probability P' that the tail delay falls at that point equals the sum of the probabilities of all delay time points from that point to the expected tail delay point, and follows a Poisson distribution.
The seventh step: collect the probability statistics of all points and plot them to obtain the tail delay probability distribution graph.

Claims (2)

1. A tail delay evaluation system for a cloud platform, characterized by comprising a server, a client and a logic analysis module;
the client comprises a request generation module and a delay statistics module; the request generation module generates random requests at a fixed rate, the client sends each generated request over TCP/IP to a server deployed on the cloud platform, and the server returns the processing result to the client to complete the request; the delay statistics module runs alongside the client, records the current time when the client starts running, records the send time whenever the client sends a request, records the receive time whenever the client receives a processing result returned by the server, and writes the request generation time, the feedback receive time and the calculated client-side delay to a delay statistics file;
the logic analysis module reads the delay statistics file, sorts the records by the difference between the time the client received the feedback and the client start time, and counts the number of feedbacks received in each fixed-length time period to obtain a linear regression equation between the number of requests processed in a time period and the time period number; taking the request count of the time period with the most requests as the reference, it calculates the interference ratio of each time period and modifies all delay data according to the interference ratio and the average delay of the period, eliminating delay fluctuation caused by interference; after the first modification, it performs linear regression on the modified data again, calculates the expected average delay of each time period from the regression equation, and modifies the delay data of all time periods according to that expected average delay, reducing the effect of network transmission fluctuation on the client-side delay;
and, from the twice-modified delay data and the previously measured interference ratio of each time period, it calculates, according to the Poisson distribution, the delay at the target quantile and its probability distribution, that is, the expected tail delay value and its probability distribution.
2. A tail delay evaluation method using the evaluation system of claim 1, characterized by the following steps:
the first step: the delay statistics module records the client start time t1; for each request, it records the time t2 at which the client sends the request to the server and the time t3 at which the client receives the feedback returned by the server; for each pair of t2 and t3, the three time differences (t2-t1), (t3-t2) and (t3-t1) are recorded, and a delay statistics file is generated after N tests;
the second step: solve the linear regression equation
2.1 sort all data groups in the delay statistics file in ascending order of the data item (t3-t1);
2.2 divide the data into equal time periods, the time period length u being:
u=t/(1-c)
where c is the required accuracy and t is the minimum of all (t3-t2) values recorded by the client;
2.3 count the number of feedbacks received in each fixed time period u, that is, the number of requests completed in that time period;
fit a linear regression equation between the number of completed requests and the time period number over all time periods:
y=kx+b
where k is the slope, b is a constant, y is the number of requests completed in a time period, and x is the time period number;
the third step: calculate the interference ratio P of each time period
the interference ratio P is:
P=(l-s)/l
where s is the number of requests completed in the time period and l is the maximum of s over all time periods;
the fourth step: first modification of the delay data
let the delay data be a and the average delay over the time period be A; for each time period,
when k ≤ 0, the following operation is performed on all delay data a in the current time period:
a'=a-(A-A/(1+P))
when k > 0, the following operation is performed on the delay data a in the current time period and in all time periods after it:
a'=a-(A-A/(1+P))
where P is the interference ratio of the time period, the delay data a are the (t3-t2) values, and A is their average;
the fifth step: second modification of the delay data
from the modified a' obtained in the fourth step, calculate the average A' of a' in each time period, and perform linear regression of A' against the time period number to obtain a new linear regression equation:
y=k'x+b'
where k' is the slope, b' is a constant, x is the time period number, and y is A';
modify a' a second time according to this linear regression equation:
a''=a'-(A'-D)
where D is the expected value given by the linear regression equation for the time period;
the sixth step: expected value T of the tail delay
case 1: when k' > 0, the expected value T of the tail delay is:
T=t4+t5*p1/(1-p1)
where t4 is the tail delay at the target quantile, t5 is the (t3-t1) value corresponding to that tail delay, and p1 is the average of the interference ratio P over the time period containing the tail delay and all time periods before it;
draw a Poisson distribution with probability p1, where the abscissa is the delay time, the starting point is T1, the highest point is T, and the ordinate is the probability;
case 2: when k' ≤ 0, calculate the Poisson distribution of the delay time for every time period according to its interference ratio P, and multiply the probability of each time point by the number of requests completed in the current time period, which gives the expected value e of the number of occurrences of each delay time point, from which the expected value of the tail delay is obtained;
for any delay time point, the probability P' that the tail delay falls at that point equals the sum of the probabilities of all delay time points from that point to the expected tail delay point, and follows a Poisson distribution;
the seventh step: collect the probability statistics of all points and plot them to obtain the tail delay probability distribution graph.
CN201810386680.3A 2018-04-25 2018-04-25 Tail delay evaluation system and method for cloud platform Active CN108712303B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810386680.3A CN108712303B (en) 2018-04-25 2018-04-25 Tail delay evaluation system and method for cloud platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810386680.3A CN108712303B (en) 2018-04-25 2018-04-25 Tail delay evaluation system and method for cloud platform

Publications (2)

Publication Number Publication Date
CN108712303A CN108712303A (en) 2018-10-26
CN108712303B true CN108712303B (en) 2021-08-20

Family

ID=63867458

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810386680.3A Active CN108712303B (en) 2018-04-25 2018-04-25 Tail delay evaluation system and method for cloud platform

Country Status (1)

Country Link
CN (1) CN108712303B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109586996B (en) * 2018-11-08 2022-03-18 孔欣然 Cloud platform real-time testing system and method based on network message time delay comparison

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102917399B (en) * 2012-11-19 2015-09-16 无锡清华信息科学与技术国家实验室物联网技术中心 A kind of time-delay measuring method of wireless sensor network
CN104486129B (en) * 2014-12-24 2017-11-03 中国科学院计算技术研究所 The method and system of application service quality are ensured under distributed environment
US10346425B2 (en) * 2015-07-02 2019-07-09 Google Llc Distributed storage system with replica location selection
US10474505B2 (en) * 2016-09-02 2019-11-12 Telefonaktiebolaget Lm Ericsson (Publ) Systems and methods of managing computational resources
US10372344B2 (en) * 2016-12-08 2019-08-06 Western Digital Technologies, Inc. Read tail latency reduction

Also Published As

Publication number Publication date
CN108712303A (en) 2018-10-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant