WO2020220438A1 - Method for predicting business concurrency of different types for virtual machines - Google Patents

Method for predicting business concurrency of different types for virtual machines

Info

Publication number
WO2020220438A1
Authority
WO
WIPO (PCT)
Prior art keywords
concurrency
business
value
concurrent
business concurrency
Prior art date
Application number
PCT/CN2019/090872
Other languages
English (en)
French (fr)
Inventor
郭军
王馨悦
张斌
刘晨
侯帅
侯凯
李薇
柳波
王嘉怡
刘文凤
张瀚铎
张娅杰
Original Assignee
东北大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 东北大学
Publication of WO2020220438A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/4557 Distribution of virtual machine instances; Migration and load balancing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45595 Network integration; Enabling network access in virtual machine instances

Definitions

  • The invention relates to the technical field of cloud computing, and in particular to a method for predicting business concurrency of different types for virtual machines.
  • The technical problem to be solved by the present invention is to address the shortcomings of the prior art by providing a method for predicting business concurrency of different types for virtual machines, so that the different types of business concurrency on a virtual machine can be predicted.
  • A method for predicting business concurrency of different types for virtual machines includes the following steps:
  • Step 1: Collect the historical business concurrency of the virtual machine and preprocess it. The specific method is:
  • Step 1.1: Scan the business concurrency of the virtual machine over a period of time and find the missing points.
  • Step 1.2: Process the missing points found by the scan.
  • Step 1.2.1: Where individual sampling points are missing, fill each one with the average of the business concurrency in the previous period and the next period. The missing business concurrency $con(t)$ of the virtual machine in the $t$-th time period is calculated as $con(t) = \frac{con(t-1) + con(t+1)}{2}$ (1).
  • Step 1.2.2: Where 90% or more of the samples are missing, discard all samples and set the business concurrency for that period to zero.
  • Step 1.3: Adjust outliers among the collected business concurrency values, that is, extremely large or extremely small samples that fluctuate abnormally.
  • Step 1.3.1: Use the quartiles to calculate the upper limit $H$ and lower limit $L$ of normal business concurrency values for the virtual machine within time $t$: $H = Q_3 + k\,(Q_3 - Q_1)$ (2) and $L = Q_1 - k\,(Q_3 - Q_1)$ (3).
  • $Q_1$ is the lower quartile, that is, the 25th percentile of the business concurrency values within time $t$ sorted in ascending order.
  • $Q_3$ is the upper quartile, that is, the 75th percentile of the business concurrency values within time $t$ sorted in ascending order.
  • $k$ describes how abnormal an unreasonable sampling point is; it is generally set to 1.5 or 3, representing moderate and extreme outliers respectively.
  • Step 1.3.2: Use Tukey's test to decide whether each sampling point is normal, and adjust the abnormal values.
  • If a sampling point is judged to be an erroneous business concurrency sample, the erroneous value is first discarded and then replaced using mean filling; if it is judged to be normal, no adjustment is made.
  • Step 1.4: Adjust the data interval of the business concurrency and CPU utilization data collected from the log database or the event-tracking logs, merging the collected data in units of seconds, minutes, or hours.
  • Step 1.5: Normalize the data processed in step 1.4 using min-max normalization.
  • Step 2: Determine the type of the virtual machine's business concurrency using an improved 1-Nearest-Neighbor Dynamic Time Warping (1NN-DTW) method. The specific method is:
  • Step 2.1: Classify each business concurrency of the virtual machine as rising, falling, quadratic, random, periodic-fluctuating, periodic-rising, or periodic-falling.
  • Step 2.2: For each type of business concurrency, select labeled business concurrency series in advance as known samples.
  • Step 2.3: For each business concurrency series to be classified, scan all known samples in turn and use the nearest-neighbor algorithm to find the closest known sample; the type of that known sample is the type of the series to be classified.
  • Step 2.4: Group all business concurrency into two broad classes to simplify the 1-nearest-neighbor model: the random, rising, falling, and quadratic types form the class without periodic variation, and the periodic-fluctuating, periodic-rising, and periodic-falling types form the class with periodic variation.
  • Step 2.5: Construct an $n \times m$ matrix to align the business concurrency series to be classified, $\{x_1, x_2, \ldots, x_n\}$, with one known business concurrency series, $\{y_1, y_2, \ldots, y_m\}$, where $n$ is the number of values in the series to be classified and $m$ is the number of values in the known series.
  • Step 2.6: Take the deviation between the $i$-th value to be classified, $x_i$, and the $j$-th known value, $y_j$, as the value $d_{i,j}$ at position $(i, j)$ of the matrix. The deviation combines the Euclidean distance with the squared difference of the derivatives at the two points: $d_{i,j} = (x_i - y_j)^2 + (x'_i - y'_j)^2$ (4).
  • $x'_i$ and $y'_j$ are the derivatives of $x_i$ and $y_j$ respectively; the derivative $x'_i$ is estimated by a formula given in the description.
  • Step 2.7: Starting from position $(1, 1)$ in the matrix, and subject to the constraint that, except at the boundary, each position can only move to the position above it, to its right, or to its upper right, iteratively find the path with the smallest cumulative deviation, ending at position $(n, m)$.
  • Step 3: Predict the business concurrency of the virtual machine for each type of variation. The specific method is:
  • Step 3.1: Use a Classification and Regression Tree (CART) to fit business concurrency without periodic variation.
  • Step 3.1.1: Traverse every value $f$ of each feature $F$ of the sample business concurrency series, split the sample data on the condition $(F, f)$, determine the split position with the smallest squared error, and so select the best cut point in the series.
  • Step 3.1.2: Save the business concurrency value that serves as the cut point, and split the business concurrency series there.
  • Step 3.1.3: Build the subtree with feature $F$ greater than $f$ and the subtree with $F$ less than $f$ in turn, and keep splitting and fitting the series to the left and right of the current split point until no further split is possible, at which point the node is recorded as a leaf.
  • Step 3.1.4: Traverse the sample data again from the bottom up, check every split point, and compare the fitting error of the series before and after the split. If the error decreases after the split, keep the split point; if it increases, cancel the split point and merge the left and right series.
  • Step 3.2: Use a Fourier series (FS) together with CART to fit business concurrency with periodic variation.
  • Step 3.2.1: Use CART to fit the business concurrency at times $\{t_1, t_2, \ldots, t_{n'}\}$, obtaining the fitted values $\{y(0), \ldots, y(n'-1), y(n')\}$, which capture the rising or falling trend of the business concurrency.
  • Step 3.2.2: Compare the fitted business concurrency from step 3.2.1 with the real business concurrency to obtain the residual sequence $\{e(0), e(1), \ldots, e(n')\}$.
  • Step 3.2.3: Use CART to predict the business concurrency at times $\{t_{n'+1}, t_{n'+2}, \ldots, t_{m'}\}$ as $\{y(n'+1), y(n'+2), \ldots, y(m')\}$.
  • Step 3.2.4: Use the Fourier series to fit the residual sequence $\{e(0), e(1), \ldots, e(n')\}$, which captures the periodic trend of the business concurrency, and obtain the residual values $\{e(n'+1), e(n'+2), \ldots, e(m')\}$ at times $\{t_{n'+1}, t_{n'+2}, \ldots, t_{m'}\}$.
  • Step 3.2.4.1: Fit the residual sequence $e(0), e(1), \ldots, e(n')$ with the function $w(t)$ given by a formula in the description.
  • Step 3.2.5: Add the predicted business concurrency at times $\{t_{n'+1}, t_{n'+2}, \ldots, t_{m'}\}$ to the corresponding residual values to obtain the final predicted values, that is, $\{y(n'+1)+e(n'+1), y(n'+2)+e(n'+2), \ldots, y(m')+e(m')\}$.
  • The method provided by the present invention divides business concurrent access into periodic, rising, falling, quadratic, and random types.
  • Different types of business concurrency call for different prediction methods.
  • Classifying each business concurrency before prediction makes it possible to train the business concurrency model in a targeted way, and also allows parameters to be shared when modeling business concurrency of the same type.
  • Predicting the concurrency of each business on a virtual machine with the method of the present invention provides a basis for subsequently adding or removing virtual machines, and helps to estimate the software aging state of the virtual machine accurately, thereby improving the performance and reliability of working virtual machines.
  • Figure 1 is an example topology diagram of the online air-ticket ordering system provided by an embodiment of the present invention.
  • Figure 2 is a flowchart of the method for predicting business concurrency of different types for virtual machines according to an embodiment of the present invention.
  • Figure 3 is a schematic diagram of the prediction result for quadratic business concurrency provided by an embodiment of the present invention.
  • Figure 4 is a schematic diagram of the prediction result for periodic-rising business concurrency provided by an embodiment of the present invention.
  • This embodiment uses an online air-ticket ordering system to simulate a PC-side user application. The service system is built on a Sugon server, real business concurrency scenarios are simulated by load-testing the ticket ordering system, and different business concurrency data are collected; the method of the present invention is then applied to this data to predict business concurrency.
  • The example topology is shown in Figure 1.
  • Client 1 uses LoadRunner to generate concurrent business access; it can simulate a large number of users clicking pages of the ticket ordering system at the same time.
  • After LoadRunner sends the page requests, the load balancer Nginx 2 receives and distributes the business requests. Finally, server 4, on which Tomcat is installed and the online ticket booking system is deployed, reads and writes the business database MySQL 5 and processes the requests sent by LoadRunner.
  • A method for predicting business concurrency of different types for virtual machines, as shown in Figure 2, includes the following steps:
  • Step 1: Collect the historical business concurrency of the virtual machine and preprocess it. The specific method is:
  • Step 1.1: Scan the business concurrency of the virtual machine over a period of time and find the missing points.
  • Step 1.2: Process the missing points found by the scan.
  • Step 1.2.1: Where individual sampling points are missing, fill each one with the average of the business concurrency in the previous period and the next period. The missing business concurrency $con(t)$ of the virtual machine in the $t$-th time period is calculated as $con(t) = \frac{con(t-1) + con(t+1)}{2}$ (1).
  • Step 1.2.2: Where 90% or more of the samples are missing, discard all samples and set the business concurrency for that period to zero. For example, if a business concurrency value was collected in only 2 of 20 consecutive sampling periods, or all of the data are empty, the business concurrency collected during that time can be considered unreliable and cannot be included in the historical series used for prediction.
  • Step 1.3: Adjust outliers among the collected business concurrency values, that is, extremely large or extremely small samples that fluctuate abnormally.
  • Step 1.3.1: Use the quartiles to calculate the upper limit $H$ and lower limit $L$ of normal business concurrency values for the virtual machine within time $t$: $H = Q_3 + k\,(Q_3 - Q_1)$ (2) and $L = Q_1 - k\,(Q_3 - Q_1)$ (3).
  • $Q_1$ is the lower quartile, that is, the 25th percentile of the business concurrency values within time $t$ sorted in ascending order.
  • $Q_3$ is the upper quartile, that is, the 75th percentile of the business concurrency values within time $t$ sorted in ascending order.
  • $k$ describes how abnormal an unreasonable sampling point is; it is generally set to 1.5 or 3, representing moderate and extreme outliers respectively.
  • Step 1.3.2: Use Tukey's test to decide whether each sampling point is normal, and adjust the abnormal values.
  • If a sampling point is judged to be an erroneous business concurrency sample, the erroneous value is first discarded and then replaced using mean filling; if it is judged to be normal, no adjustment is made.
  • Step 1.4: Adjust the data interval of the business concurrency and CPU utilization data collected from the log database or the event-tracking logs, merging the collected data in units of seconds, minutes, or hours.
  • Business concurrency sampled at 1-second intervals fluctuates frequently, its trend is not obvious, and its variation features cannot be mined; over-dense sampling also makes the model more expensive to compute and slower to train. In this embodiment the data are therefore organized by averaging over 15-second intervals, and the other virtual machine data are likewise kept at 15-second intervals.
  • Step 1.5: Normalize the data processed in step 1.4 using min-max normalization.
  • Step 2: Determine the type of the virtual machine's business concurrency using an improved 1-Nearest-Neighbor Dynamic Time Warping (1NN-DTW) method. The specific method is:
  • Step 2.1: Classify each business concurrency of the virtual machine as rising, falling, quadratic, random, periodic-fluctuating, periodic-rising, or periodic-falling.
  • Step 2.2: For each type of business concurrency, select labeled business concurrency series in advance as known samples.
  • Step 2.3: For each business concurrency series to be classified, scan all known samples in turn and use the nearest-neighbor algorithm to find the closest known sample; the type of that known sample is the type of the series to be classified.
  • Step 2.4: Group all business concurrency into two broad classes to simplify the 1-nearest-neighbor model: the random, rising, falling, and quadratic types form the class without periodic variation, and the periodic-fluctuating, periodic-rising, and periodic-falling types form the class with periodic variation.
  • Step 2.5: Construct an $n \times m$ matrix to align the business concurrency series to be classified, $\{x_1, x_2, \ldots, x_n\}$, with one known business concurrency series, $\{y_1, y_2, \ldots, y_m\}$, where $n$ is the number of values in the series to be classified and $m$ is the number of values in the known series.
  • Step 2.6: Take the deviation between the $i$-th value to be classified, $x_i$, and the $j$-th known value, $y_j$, as the value $d_{i,j}$ at position $(i, j)$ of the matrix. The deviation combines the Euclidean distance with the squared difference of the derivatives at the two points: $d_{i,j} = (x_i - y_j)^2 + (x'_i - y'_j)^2$ (4).
  • $x'_i$ and $y'_j$ are the derivatives of $x_i$ and $y_j$ respectively; the derivative $x'_i$ is estimated by a formula given in the description.
  • Step 2.7: Starting from position $(1, 1)$ in the matrix, and subject to the constraint that, except at the boundary, each position can only move to the position above it, to its right, or to its upper right, iteratively find the path with the smallest cumulative deviation, ending at position $(n, m)$.
  • Step 3: Predict the business concurrency of the virtual machine for each type of variation. The specific method is:
  • Step 3.1: Use a Classification and Regression Tree (CART) to fit business concurrency without periodic variation.
  • Step 3.1.1: Traverse every value $f$ of each feature $F$ of the sample business concurrency series, split the sample data on the condition $(F, f)$, determine the split position with the smallest squared error, and so select the best cut point in the series.
  • Step 3.1.2: Save the business concurrency value that serves as the cut point, and split the business concurrency series there.
  • Step 3.1.3: Build the subtree with feature $F$ greater than $f$ and the subtree with $F$ less than $f$ in turn, and keep splitting and fitting the series to the left and right of the current split point until no further split is possible, at which point the node is recorded as a leaf.
  • Step 3.1.4: Traverse the sample data again from the bottom up, check every split point, and compare the fitting error of the series before and after the split. If the error decreases after the split, keep the split point; if it increases, cancel the split point and merge the left and right series.
  • Step 3.2: Use a Fourier series (FS) together with CART to fit business concurrency with periodic variation.
  • Step 3.2.1: Use CART to fit the business concurrency at times $\{t_1, t_2, \ldots, t_{n'}\}$, obtaining the fitted values $\{y(0), \ldots, y(n'-1), y(n')\}$, which capture the rising or falling trend of the business concurrency.
  • Step 3.2.2: Compare the fitted business concurrency from step 3.2.1 with the real business concurrency to obtain the residual sequence $\{e(0), e(1), \ldots, e(n')\}$.
  • Step 3.2.3: Use CART to predict the business concurrency at times $\{t_{n'+1}, t_{n'+2}, \ldots, t_{m'}\}$ as $\{y(n'+1), y(n'+2), \ldots, y(m')\}$.
  • Step 3.2.4: Use the Fourier series to fit the residual sequence $\{e(0), e(1), \ldots, e(n')\}$, which captures the periodic trend of the business concurrency, and obtain the residual values $\{e(n'+1), e(n'+2), \ldots, e(m')\}$ at times $\{t_{n'+1}, t_{n'+2}, \ldots, t_{m'}\}$.
  • Step 3.2.4.1: Fit the residual sequence $e(0), e(1), \ldots, e(n')$ with the function $w(t)$ given by a formula in the description.
  • Step 3.2.5: Add the predicted business concurrency at times $\{t_{n'+1}, t_{n'+2}, \ldots, t_{m'}\}$ to the corresponding residual values to obtain the final predicted values, that is, $\{y(n'+1)+e(n'+1), y(n'+2)+e(n'+2), \ldots, y(m')+e(m')\}$.
  • This embodiment also uses the improved 1NN-DTW algorithm to determine the type of business concurrency and compares it with the algorithm before the improvement, in order to verify the accuracy of the improved 1NN-DTW. Specifically:
  • First, LoadRunner is used to record the access behavior of the various businesses of the server application, such as browsing, querying, and refunding tickets. The server virtual machine is then loaded continuously for one hour while the business concurrency is collected, missing and abnormal business concurrency values are handled with the preprocessing method, and the concurrency data are adjusted to 15-second intervals.
  • The improved 1NN-DTW algorithm is used to determine the type of business concurrent access and is compared with 1NN-DTW and 1NN-DDTW, using Accuracy and F-measure to assess each algorithm.
  • The concurrency obtained in the first step is cut into subsequences of 80, 120, 160, and 200 sampling points each, and each subsequence is given a type label according to the seven load-variation trends listed in Table 1 to form a sample sequence. This yields 700 sample sequences, of which 420 are selected as known samples for type determination and the remaining 280 serve as test samples.
  • Table 2 compares the results of determining the business concurrency type with the improved 1NN-DTW of the present invention and with the existing 1NN-DTW and 1NN-DDTW. The Accuracy and F-measure of the method of the present invention are clearly higher than those of the other two methods, which indicates that considering both the value and the trend of the business concurrency works better than focusing on only one of them. In addition, although the method of the present invention computes both the Euclidean distance and the derivative difference of matching points, the time used does not increase significantly.
  • This embodiment also uses the method of the present invention to predict business concurrency and compares it with traditional methods such as ARIMA. Specifically:
  • First, LoadRunner is used to record the access behavior of the various businesses of the server application, such as browsing, querying, and refunding tickets. The server virtual machine is then loaded continuously for one hour while the business concurrency is collected, missing and abnormal business concurrency values are handled with the method described under preprocessing, and the concurrency data are adjusted to 15-second intervals.
  • The business concurrency prediction results of the three methods are shown in Table 3.
  • Figures 3 and 4 compare the predictions of the three methods with the real concurrency. In both scenarios the method of the present invention fits the real business concurrency series better than ARIMA and Holt-Winters, which shows that it predicts the various types of concurrency more effectively.
  • Compared with ARIMA and Holt-Winters, the method of the present invention has the lowest MSE and MAE in both types of concurrency scenario. In the quadratic scenario the MSE and MAE of the three methods are relatively close, but in the periodic-rising scenario the method of the present invention is clearly better, because ARIMA and Holt-Winters learn such complex concurrency poorly. These results indicate that the method of the present invention is accurate across the scenarios tested.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Complex Calculations (AREA)

Abstract

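The present invention provides a method for predicting business concurrency of different types for virtual machines, and relates to the technical field of cloud computing. The method first collects the historical business concurrency of a virtual machine and preprocesses it. It then determines the type of the virtual machine's business concurrency using an improved 1-nearest-neighbor dynamic time warping (1NN-DTW) method. Finally, it fits business concurrency without periodic variation using a classification and regression tree (CART), and fits business concurrency with periodic variation using a Fourier series (FS) together with CART. Predicting the concurrency of each business on a virtual machine with this method provides a basis for subsequently adding or removing virtual machines, and helps to estimate the software aging state of the virtual machine accurately, thereby improving the performance and reliability of working virtual machines.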

Description

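Method for predicting business concurrency of different types for virtual machines
Technical Field
The present invention relates to the technical field of cloud computing, and in particular to a method for predicting business concurrency of different types for virtual machines.
Background
Software aging is widespread in cloud service systems. As a virtual machine handles concurrent business requests, the operating system and application software continually accumulate errors, so the performance of the working virtual machine gradually degrades, which in turn affects the service quality of the cloud service system. The high scalability and dynamic reconfigurability of cloud platforms provide a technical basis for guaranteeing cloud service quality under different concurrency conditions, but existing methods for dynamically adjusting virtual resources still have many shortcomings.
In general, many different businesses are deployed on a virtual machine, and the concurrency of each business follows a different trend at different times. For example, the concurrency of some businesses keeps increasing during part of the day and keeps decreasing during part of the night, the concurrency of others fluctuates cyclically, and the concurrency of others stays stable. Predicting the concurrency of each business on the cloud platform provides a basis for subsequently adding or removing virtual machines, and helps to estimate the software aging state of the virtual machine accurately, thereby improving the performance and reliability of working virtual machines.
Because user operations, virtual machine businesses, and other uncertain factors change constantly, business concurrent access does not only vary smoothly over time but often also shows rising, falling, and cyclically fluctuating trends. Traditional load models, such as exponential smoothing, can only roughly describe the trend of business concurrency and cannot capture its nonlinear variation well.
Summary of the Invention
The technical problem to be solved by the present invention is to address the above shortcomings of the prior art by providing a method for predicting business concurrency of different types for virtual machines, so that the different types of business concurrency on a virtual machine can be predicted.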
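A method for predicting business concurrency of different types for virtual machines includes the following steps:
Step 1: Collect the historical business concurrency of the virtual machine and preprocess it. The specific method is:
Step 1.1: Scan the business concurrency of the virtual machine over a period of time and find the missing points;
Step 1.2: Process the missing points found by the scan;
Step 1.2.1: Where individual sampling points are missing, fill each one with the average of the business concurrency in the previous period and the next period. The missing business concurrency $con(t)$ of the virtual machine in the $t$-th time period is calculated as follows:
$con(t) = \frac{con(t-1) + con(t+1)}{2} \quad (1)$
Step 1.2.2: Where 90% or more of the samples are missing, discard all samples and set the business concurrency for that period to zero;
Step 1.3: Adjust outliers among the collected business concurrency values, that is, extremely large or extremely small samples that fluctuate abnormally;
Step 1.3.1: Use the quartiles to calculate the upper limit $H$ and lower limit $L$ of normal business concurrency values for the virtual machine within time $t$, as follows:
$H = Q_3 + k\,(Q_3 - Q_1) \quad (2)$
$L = Q_1 - k\,(Q_3 - Q_1) \quad (3)$
where $Q_1$ is the lower quartile, that is, the 25th percentile of the business concurrency values within time $t$ sorted in ascending order; $Q_3$ is the upper quartile, that is, the 75th percentile of the same sorted values; and $k$ describes how abnormal an unreasonable sampling point is, generally set to 1.5 or 3, representing moderate and extreme outliers respectively;
Step 1.3.2: Use Tukey's test to decide whether each sampling point is normal, and adjust the abnormal values;
If a sampling point is judged to be an erroneous business concurrency sample, first discard the erroneous value and then replace it using mean filling;
If a sampling point is judged to be a normal business concurrency sample, make no adjustment;
Step 1.4: Adjust the data interval of the business concurrency and CPU utilization data collected from the log database or the event-tracking logs, merging the collected data in units of seconds, minutes, or hours;
Step 1.5: Normalize the data processed in step 1.4 using min-max normalization;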
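The gap filling and outlier handling of steps 1.2 and 1.3 can be sketched as follows. This is a minimal illustration rather than the patent's implementation: missing samples are assumed to be marked with `None`, the function names are hypothetical, quartiles are computed with the "inclusive" interpolation rule, and "mean filling" is assumed to mean replacing an outlier with the mean of the in-range samples.

```python
from statistics import quantiles


def fill_missing(series, discard_ratio=0.9):
    """Fill isolated gaps (None) with the mean of the two neighbouring samples.

    If the share of missing samples reaches discard_ratio, the whole window is
    treated as unreliable and returned as zeros (step 1.2.2).
    """
    missing = sum(v is None for v in series)
    if series and missing / len(series) >= discard_ratio:
        return [0.0] * len(series)
    filled = list(series)
    for t in range(1, len(series) - 1):
        prev, nxt = series[t - 1], series[t + 1]
        # Only isolated gaps with both neighbours present are filled here;
        # how to treat longer gaps is not specified in the text.
        if series[t] is None and prev is not None and nxt is not None:
            filled[t] = (prev + nxt) / 2  # formula (1)
    return filled


def tukey_fences(values, k=1.5):
    """Return (L, H) from formulas (2) and (3)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def replace_outliers(values, k=1.5):
    """Replace samples outside [L, H] with the mean of the in-range samples
    (an assumed reading of "mean filling")."""
    low, high = tukey_fences(values, k)
    normal = [v for v in values if low <= v <= high]
    mean = sum(normal) / len(normal)
    return [v if low <= v <= high else mean for v in values]
```

With `k=1.5` the fences flag moderate outliers; passing `k=3` restricts the adjustment to extreme ones, matching the two values mentioned in step 1.3.1.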
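Step 2: Determine the type of the virtual machine's business concurrency using an improved 1-Nearest-Neighbor Dynamic Time Warping (1NN-DTW) method. The specific method is:
Step 2.1: Classify each business concurrency of the virtual machine as rising, falling, quadratic, random, periodic-fluctuating, periodic-rising, or periodic-falling;
Step 2.2: For each type of business concurrency, select labeled business concurrency series in advance as known samples;
Step 2.3: For each business concurrency series to be classified, scan all known samples in turn and use the nearest-neighbor algorithm to find the closest known sample; the type of that known sample is the type of the series to be classified;
Step 2.4: Group all business concurrency into two broad classes to simplify the 1-nearest-neighbor model;
The random, rising, falling, and quadratic types are grouped into the class without periodic variation;
The periodic-fluctuating, periodic-rising, and periodic-falling types are grouped into the class with periodic variation;
Step 2.5: Construct an $n \times m$ matrix to align the business concurrency series to be classified, $\{x_1, x_2, \ldots, x_n\}$, with one known business concurrency series, $\{y_1, y_2, \ldots, y_m\}$, where $n$ is the number of values in the series to be classified and $m$ is the number of values in the known series;
Step 2.6: Take the deviation between the $i$-th value to be classified, $x_i$, and the $j$-th known value, $y_j$, as the value $d_{i,j}$ at position $(i, j)$ of the matrix. Using both the Euclidean distance and the squared difference of the derivatives at the two points, compute the deviation $d_{i,j}$ of each pair of points after aligning $\{x_1, x_2, \ldots, x_n\}$ with $\{y_1, y_2, \ldots, y_m\}$, as follows:
$d_{i,j} = (x_i - y_j)^2 + (x'_i - y'_j)^2 \quad (4)$
where $x'_i$ and $y'_j$ are the derivatives of $x_i$ and $y_j$ respectively, and the derivative $x'_i$ of the business concurrency $x_i$ is estimated as follows:
Figure PCTCN2019090872-appb-000002
Step 2.7: Starting from position $(1, 1)$ in the matrix, and subject to the constraint that, except at the boundary, each position can only move to the position above it, to its right, or to its upper right, iteratively find the path with the smallest cumulative deviation, ending at position $(n, m)$;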
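A compact sketch of the classification in steps 2.3 to 2.7 is given below. The point deviation follows formula (4). The derivative estimate is an assumption: the patent's own formula is only available as an image, so the sketch substitutes the estimate commonly used in derivative DTW (the average of the left slope and the slope through both neighbours). All function names are hypothetical, and each series is assumed to have at least three points.

```python
def estimate_derivative(series):
    """Assumed derivative estimate (derivative-DTW style); the endpoints reuse
    the value of their nearest interior point."""
    n = len(series)
    d = [0.0] * n
    for i in range(1, n - 1):
        left_slope = series[i] - series[i - 1]
        wide_slope = (series[i + 1] - series[i - 1]) / 2
        d[i] = (left_slope + wide_slope) / 2
    d[0], d[-1] = d[1], d[-2]
    return d


def dtw_cost(x, y):
    """Smallest cumulative deviation of a warping path from (1,1) to (n,m)."""
    dx, dy = estimate_derivative(x), estimate_derivative(y)
    n, m = len(x), len(y)
    inf = float("inf")
    acc = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            # Formula (4): squared value difference plus squared derivative difference.
            d = (x[i] - y[j]) ** 2 + (dx[i] - dy[j]) ** 2
            if i == 0 and j == 0:
                best = 0.0
            else:
                # A cell can be reached from below, from the left, or diagonally.
                best = min(
                    acc[i - 1][j] if i > 0 else inf,
                    acc[i][j - 1] if j > 0 else inf,
                    acc[i - 1][j - 1] if i > 0 and j > 0 else inf,
                )
            acc[i][j] = d + best
    return acc[n - 1][m - 1]


def classify(series, known_samples):
    """1-nearest-neighbour: return the label of the closest known sample.

    known_samples is a list of (label, series) pairs.
    """
    label, _ = min(known_samples, key=lambda item: dtw_cost(series, item[1]))
    return label
```

For example, `classify([0, 0.2, 0.4, 0.7, 1.0], [("rising", [0, 0.25, 0.5, 0.75, 1]), ("falling", [1, 0.75, 0.5, 0.25, 0])])` selects the rising sample, because both the values and the slopes of the query match it far better.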
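Step 3: Predict the business concurrency of the virtual machine for each type of variation. The specific method is:
Step 3.1: Use a Classification and Regression Tree (CART) to fit business concurrency without periodic variation;
Step 3.1.1: Traverse every value $f$ of each feature $F$ of the sample business concurrency series, split the sample data on the condition $(F, f)$, determine the split position with the smallest squared error, and so select the best cut point in the series;
The squared error is calculated as follows:
Figure PCTCN2019090872-appb-000003
where the symbol shown in Figure PCTCN2019090872-appb-000004 denotes the feature of the $i'$-th business concurrency value in sample $x$, $y_{i'}$ denotes the $i'$-th sequence sample before the split, and the symbol shown in Figure PCTCN2019090872-appb-000005 denotes the fitted result of the $i'$-th subsequence sample after the split;
Step 3.1.2: Save the business concurrency value that serves as the cut point, and split the business concurrency series there;
Step 3.1.3: Build the subtree with feature $F$ greater than $f$ and the subtree with $F$ less than $f$ in turn, and keep splitting and fitting the series to the left and right of the current split point until no further split is possible, at which point the node is recorded as a leaf;
Step 3.1.4: Traverse the sample data again from the bottom up, check every split point, and compare the fitting error of the series before and after the split:
If the fitting error of the series decreases after the split, keep the split point;
If the fitting error of the series increases after the split, cancel the split point and merge the left and right series;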
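The split search of steps 3.1.1 to 3.1.3 can be illustrated with a small regression tree over a single series. The sketch assumes that the only feature is the time index, that each leaf predicts the mean of its segment, and that the squared error is the sum of squared deviations from that mean. The names and the `min_leaf` parameter are hypothetical, and the bottom-up pruning of step 3.1.4 is omitted because it needs held-out data to be meaningful.

```python
def sse(values):
    """Sum of squared deviations from the segment mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(values, min_leaf=2):
    """Return the index that minimises the total squared error of the two
    segments values[:idx] and values[idx:], or None if no split helps."""
    best_idx, best_err = None, sse(values)
    for idx in range(min_leaf, len(values) - min_leaf + 1):
        err = sse(values[:idx]) + sse(values[idx:])
        if err < best_err - 1e-12:
            best_idx, best_err = idx, err
    return best_idx


def build_tree(values, start=0, min_leaf=2):
    """Recursively split the series; leaves store the segment mean."""
    idx = best_split(values, min_leaf)
    if idx is None:
        return {"value": sum(values) / len(values)}
    return {
        "split": start + idx,  # time index of the cut point
        "left": build_tree(values[:idx], start, min_leaf),
        "right": build_tree(values[idx:], start + idx, min_leaf),
    }


def predict(tree, t):
    """Walk the tree for time index t and return the leaf value."""
    while "split" in tree:
        tree = tree["left"] if t < tree["split"] else tree["right"]
    return tree["value"]
```

Note that a tree built this way predicts the value of the last leaf for any time index beyond the training range; richer features than the time index would be needed for it to continue a trend.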
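Step 3.2: Use a Fourier series (FS) together with CART to fit business concurrency with periodic variation;
Step 3.2.1: Use CART to fit the business concurrency at times $\{t_1, t_2, \ldots, t_{n'}\}$, obtaining the fitted values $\{y(0), \ldots, y(n'-1), y(n')\}$, which capture the rising or falling trend of the business concurrency;
Step 3.2.2: Compare the fitted business concurrency from step 3.2.1 with the real business concurrency to obtain the residual sequence $\{e(0), e(1), \ldots, e(n')\}$;
Step 3.2.3: Use CART to predict the business concurrency at times $\{t_{n'+1}, t_{n'+2}, \ldots, t_{m'}\}$ as $\{y(n'+1), y(n'+2), \ldots, y(m')\}$;
Step 3.2.4: Use the Fourier series to fit the residual sequence $\{e(0), e(1), \ldots, e(n')\}$, which captures the periodic trend of the business concurrency, and obtain the residual values $\{e(n'+1), e(n'+2), \ldots, e(m')\}$ at times $\{t_{n'+1}, t_{n'+2}, \ldots, t_{m'}\}$;
Step 3.2.4.1: Fit the residual sequence $e(0), e(1), \ldots, e(n')$ with the function $w(t)$, given by the following formula:
Figure PCTCN2019090872-appb-000006
where $a_0$, $a_{j'}$, and $b_{j'}$ are variables, $P = n'$, the expression shown in Figure PCTCN2019090872-appb-000007 denotes rounding down, and $t = 1, 2, \ldots, n'$;
Step 3.2.4.2: Compute the values of the variables $a_{j'}$ and $b_{j'}$ by the least-squares method, as follows:
Figure PCTCN2019090872-appb-000008
where $w_{j'}$ is the $j'$-th function used to fit the residuals;
Step 3.2.5: Add the predicted business concurrency at times $\{t_{n'+1}, t_{n'+2}, \ldots, t_{m'}\}$ to the corresponding residual values to obtain the final predicted values at those times, that is, $\{y(n'+1)+e(n'+1), y(n'+2)+e(n'+2), \ldots, y(m')+e(m')\}$.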
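The residual fit of steps 3.2.4 and 3.2.5 can be sketched as below. Because the patent's formula for $w(t)$ is only available as an image, the sketch assumes the standard truncated Fourier series $w(t) = a_0 + \sum_j \left(a_j \cos\frac{2\pi j t}{P} + b_j \sin\frac{2\pi j t}{P}\right)$ with period $P$ equal to the number of residuals. With fewer than $P/2$ harmonics sampled at $t = 1, \ldots, P$ the basis functions are orthogonal, so the least-squares coefficients reduce to the projections used here. Function and parameter names are hypothetical.

```python
import math


def fit_fourier(residuals, harmonics):
    """Least-squares fit of an assumed truncated Fourier series to residuals
    sampled at t = 1..P, where P = len(residuals)."""
    period = len(residuals)
    if not 1 <= harmonics < period / 2:
        raise ValueError("harmonics must satisfy 1 <= harmonics < len(residuals) / 2")
    a0 = sum(residuals) / period
    coeffs = []
    for j in range(1, harmonics + 1):
        a = 2 / period * sum(
            e * math.cos(2 * math.pi * j * t / period)
            for t, e in enumerate(residuals, start=1)
        )
        b = 2 / period * sum(
            e * math.sin(2 * math.pi * j * t / period)
            for t, e in enumerate(residuals, start=1)
        )
        coeffs.append((a, b))
    return a0, coeffs, period


def w(t, model):
    """Evaluate the fitted series at time index t (also valid for t > P)."""
    a0, coeffs, period = model
    return a0 + sum(
        a * math.cos(2 * math.pi * j * t / period)
        + b * math.sin(2 * math.pi * j * t / period)
        for j, (a, b) in enumerate(coeffs, start=1)
    )


def combine(trend_forecast, model, first_t):
    """Step 3.2.5: add the extrapolated residual to each trend forecast."""
    return [y + w(first_t + k, model) for k, y in enumerate(trend_forecast)]
```

Here `trend_forecast` stands for the CART predictions of step 3.2.3 and `first_t` for the index of the first forecast time; the periodic component repeats with period $P$ by construction, which is the main simplification of this sketch.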
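The beneficial effects of the above technical solution are as follows. The method provided by the present invention divides business concurrent access into periodic, rising, falling, quadratic, and random types. Different types of business concurrency call for different prediction methods; classifying each business concurrency before prediction makes it possible to train the business concurrency model in a targeted way, and also allows parameters to be shared when modeling business concurrency of the same type. Predicting the concurrency of each business on a virtual machine with the method of the present invention provides a basis for subsequently adding or removing virtual machines, and helps to estimate the software aging state of the virtual machine accurately, thereby improving the performance and reliability of working virtual machines.
Brief Description of the Drawings
Figure 1 is an example topology diagram of the online air-ticket ordering system provided by an embodiment of the present invention;
Figure 2 is a flowchart of the method for predicting business concurrency of different types for virtual machines provided by an embodiment of the present invention;
Figure 3 is a schematic diagram of the prediction result for quadratic business concurrency provided by an embodiment of the present invention;
Figure 4 is a schematic diagram of the prediction result for periodic-rising business concurrency provided by an embodiment of the present invention.
In the figures: 1, client; 2, load balancer Nginx; 3, switch; 4, server; 5, business database MySQL.
Detailed Description
Specific embodiments of the present invention are described in further detail below with reference to the drawings and examples. The following embodiments illustrate the present invention but do not limit its scope.
This embodiment uses an online air-ticket ordering system to simulate a PC-side user application. The service system is built on a Sugon (曙光) server, real business concurrency scenarios are simulated by load-testing the ticket ordering system, and different business concurrency data are collected; the method of the present invention is then applied to this data to predict business concurrency. The example topology is shown in Figure 1. Client 1 uses LoadRunner to generate concurrent business access, which can simulate a large number of users clicking pages of the ticket ordering system at the same time. After LoadRunner sends the page requests, the load balancer Nginx 2 receives and distributes the business requests. Finally, server 4, on which Tomcat is installed and the online ticket booking system is deployed, reads and writes the business database MySQL 5 and processes the requests sent by LoadRunner.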
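A method for predicting business concurrency of different types for virtual machines, as shown in Figure 2, includes the following steps:
Step 1: Collect the historical business concurrency of the virtual machine and preprocess it. The specific method is:
Step 1.1: Scan the business concurrency of the virtual machine over a period of time and find the missing points;
Step 1.2: Process the missing points found by the scan;
Step 1.2.1: Where individual sampling points are missing, fill each one with the average of the business concurrency in the previous period and the next period. The missing business concurrency $con(t)$ of the virtual machine in the $t$-th time period is calculated as follows:
$con(t) = \frac{con(t-1) + con(t+1)}{2} \quad (1)$
Step 1.2.2: Where 90% or more of the samples are missing, discard all samples and set the business concurrency for that period to zero. For example, if a business concurrency value was collected in only 2 of 20 consecutive sampling periods, or all of the data are empty, the business concurrency collected during that time can be considered unreliable and cannot be included in the historical series used for prediction;
Step 1.3: Adjust outliers among the collected business concurrency values, that is, extremely large or extremely small samples that fluctuate abnormally;
Step 1.3.1: Use the quartiles to calculate the upper limit $H$ and lower limit $L$ of normal business concurrency values for the virtual machine within time $t$, as follows:
$H = Q_3 + k\,(Q_3 - Q_1) \quad (2)$
$L = Q_1 - k\,(Q_3 - Q_1) \quad (3)$
where $Q_1$ is the lower quartile, that is, the 25th percentile of the business concurrency values within time $t$ sorted in ascending order; $Q_3$ is the upper quartile, that is, the 75th percentile of the same sorted values; and $k$ describes how abnormal an unreasonable sampling point is, generally set to 1.5 or 3, representing moderate and extreme outliers respectively;
Step 1.3.2: Use Tukey's test to decide whether each sampling point is normal, and adjust the abnormal values;
If a sampling point is judged to be an erroneous business concurrency sample, first discard the erroneous value and then replace it using mean filling;
If a sampling point is judged to be a normal business concurrency sample, make no adjustment;
Step 1.4: Adjust the data interval of the business concurrency and CPU utilization data collected from the log database or the event-tracking logs, merging the collected data in units of seconds, minutes, or hours;
When modeling business concurrent access, business concurrency sampled at 1-second intervals fluctuates frequently, its trend is not obvious, and its variation features cannot be mined; over-dense sampling also increases the model's computational load and slows training. In this embodiment the data are therefore organized by averaging over 15-second intervals, and the other virtual machine data are likewise kept at 15-second intervals;
Step 1.5: Normalize the data processed in step 1.4 using min-max normalization;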
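The interval adjustment of step 1.4 and the normalization of step 1.5 can be sketched as follows, assuming the input is a list of 1-second samples. The function names are hypothetical, a trailing partial window is averaged over the samples it contains, and a constant series is mapped to zeros to avoid division by zero; the text specifies neither of these edge cases.

```python
def aggregate(samples, window=15):
    """Average consecutive windows of `window` samples (15 one-second samples
    give one value per 15-second interval)."""
    result = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        result.append(sum(chunk) / len(chunk))
    return result


def min_max_normalize(values):
    """Scale values linearly into [0, 1] using the series minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

One hour of 1-second samples (3600 values) becomes 240 averaged points, which is the series length the later classification and prediction steps work on in this embodiment.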
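Step 2: Determine the type of the virtual machine's business concurrency using an improved 1-Nearest-Neighbor Dynamic Time Warping (1NN-DTW) method. The specific method is:
Step 2.1: Classify each business concurrency of the virtual machine as rising, falling, quadratic, random, periodic-fluctuating, periodic-rising, or periodic-falling;
Step 2.2: For each type of business concurrency, select labeled business concurrency series in advance as known samples;
Step 2.3: For each business concurrency series to be classified, scan all known samples in turn and use the nearest-neighbor algorithm to find the closest known sample; the type of that known sample is the type of the series to be classified;
Step 2.4: Group all business concurrency into two broad classes to simplify the 1-nearest-neighbor model;
The random, rising, falling, and quadratic types are grouped into the class without periodic variation;
The periodic-fluctuating, periodic-rising, and periodic-falling types are grouped into the class with periodic variation;
Step 2.5: Construct an $n \times m$ matrix to align the business concurrency series to be classified, $\{x_1, x_2, \ldots, x_n\}$, with one known business concurrency series, $\{y_1, y_2, \ldots, y_m\}$, where $n$ is the number of values in the series to be classified and $m$ is the number of values in the known series;
Step 2.6: Take the deviation between the $i$-th value to be classified, $x_i$, and the $j$-th known value, $y_j$, as the value $d_{i,j}$ at position $(i, j)$ of the matrix. Using both the Euclidean distance and the squared difference of the derivatives at the two points, compute the deviation $d_{i,j}$ of each pair of points after aligning $\{x_1, x_2, \ldots, x_n\}$ with $\{y_1, y_2, \ldots, y_m\}$, as follows:
$d_{i,j} = (x_i - y_j)^2 + (x'_i - y'_j)^2 \quad (4)$
where $x'_i$ and $y'_j$ are the derivatives of $x_i$ and $y_j$ respectively, and the derivative $x'_i$ of the business concurrency $x_i$ is estimated as follows:
Figure PCTCN2019090872-appb-000010
Step 2.7: Starting from position $(1, 1)$ in the matrix, and subject to the constraint that, except at the boundary, each position can only move to the position above it, to its right, or to its upper right, iteratively find the path with the smallest cumulative deviation, ending at position $(n, m)$;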
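Step 3: Predict the business concurrency of each variation type of the virtual machine, specifically:
Step 3.1: Fit business concurrency without periodic variation using a Classification And Regression Tree (CART);
Step 3.1.1: Traverse every value f of each feature F of the sample business concurrency series, split the sample data on the condition (F,f), determine the split position with the smallest squared error, and thereby select the best cut point in the business concurrency series;
The squared error, error, is computed as follows:
Figure PCTCN2019090872-appb-000011
where
Figure PCTCN2019090872-appb-000012
denotes the feature of the i'-th business concurrency value in sample x, y_i' denotes the i'-th series sample before the split, and
Figure PCTCN2019090872-appb-000013
denotes the fitted result of the i'-th subseries sample after the split;
Step 3.1.2: Save the business concurrency value that serves as the cut point, and split the business concurrency series there;
Step 3.1.3: Build in turn the subtree for feature F greater than f and the subtree for F less than f, then iterate, splitting and fitting the business concurrency series to the left and right of the current split point, until no further split is possible, at which point the node is recorded as a leaf;
Step 3.1.4: Traverse the sample data again from the bottom up, check every split point for all business concurrency series, and compare the fitting error of the series before and after the split:
If the fitting error of the series decreases after the split, keep the split point;
If the fitting error of the series increases after the split, cancel the split point and merge the left and right series;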
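The split search and recursive construction of steps 3.1.1 to 3.1.3 can be sketched for a single feature (time) as below. This is a simplified illustration under stated assumptions: each leaf predicts the mean of its samples, the error is the sum of squared deviations from that mean, `ts` is sorted in ascending order, and the bottom-up pruning of step 3.1.4 is not shown (a split is simply skipped when it does not reduce the error by more than `min_gain`). All names are hypothetical.

```python
from typing import List, Optional, Tuple

def sse(ys: List[float]) -> float:
    """Sum of squared deviations from the mean (the leaf's fitted value)."""
    if not ys:
        return 0.0
    mu = sum(ys) / len(ys)
    return sum((y - mu) ** 2 for y in ys)

def best_split(ys: List[float], min_leaf: int) -> Optional[Tuple[int, float]]:
    """Step 3.1.1: try every cut position, keep the smallest squared error."""
    best = None
    for k in range(min_leaf, len(ys) - min_leaf + 1):
        err = sse(ys[:k]) + sse(ys[k:])
        if best is None or err < best[1]:
            best = (k, err)
    return best

def build_tree(ts: List[float], ys: List[float],
               min_leaf: int = 2, min_gain: float = 1e-9) -> dict:
    """Steps 3.1.2 and 3.1.3: store the cut point and recurse on both sides."""
    split = best_split(ys, min_leaf)
    if split is None or sse(ys) - split[1] <= min_gain:
        return {"value": sum(ys) / len(ys)}   # leaf node
    k = split[0]
    return {"threshold": (ts[k - 1] + ts[k]) / 2,
            "left": build_tree(ts[:k], ys[:k], min_leaf, min_gain),
            "right": build_tree(ts[k:], ys[k:], min_leaf, min_gain)}

def predict(tree: dict, t: float) -> float:
    """Walk from the root to a leaf and return its fitted value."""
    while "threshold" in tree:
        tree = tree["left"] if t < tree["threshold"] else tree["right"]
    return tree["value"]
```

For a step-shaped series `ys = [1, 1, 1, 1, 5, 5, 5, 5]` over `ts = [0, ..., 7]` the tree places a single cut at 3.5 and predicts 1.0 to its left and 5.0 to its right.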
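Step 3.2: Fit business concurrency with periodic variation using a Fourier series (FS) together with the classification and regression tree CART;
Step 3.2.1: Use CART to fit the business concurrency at times {t_1,t_2,…,t_{n'}} to obtain the fitted values {y(0),…,y(n'-1),y(n')}, which capture the rising or falling trend of the business concurrency;
Step 3.2.2: Compare the business concurrency obtained in step 3.2.1 with the true business concurrency to obtain the residual series {e(0),e(1),…,e(n')};
Step 3.2.3: Use CART to predict the business concurrency at times {t_{n'+1},t_{n'+2},…,t_{m'}} as {y(n'+1),y(n'+2),…,y(m')};
Step 3.2.4: Use the Fourier series FS to fit the residual series {e(0),e(1),…,e(n')}, capturing the periodic trend of the business concurrency, and obtain the residual values {e(n'+1),e(n'+2),…,e(m')} of the business concurrency at times {t_{n'+1},t_{n'+2},…,t_{m'}};
Step 3.2.4.1: Fit the residual series e(0),e(1),…,e(n') with the function w(t), given by:
Figure PCTCN2019090872-appb-000014
where a_0, a_{j'} and b_{j'} are all variables, P=n',
Figure PCTCN2019090872-appb-000015
denotes rounding down, and t=1,2,…,n';
Step 3.2.4.2: Compute the values of the variables a_{j'} and b_{j'} by least squares, as follows:
Figure PCTCN2019090872-appb-000016
where w_{j'} is the j'-th function used to fit the residuals;
Step 3.2.5: Add the business concurrency at times {t_{n'+1},t_{n'+2},…,t_{m'}} to the corresponding residual values to obtain the predicted business concurrency at times {t_{n'+1},t_{n'+2},…,t_{m'}}, namely {y(n'+1)+e(n'+1),y(n'+2)+e(n'+2),…,y(m')+e(m')}.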
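Steps 3.2.4 and 3.2.5 can be sketched as follows. Because the formulas for w(t) and its coefficients appear only as images in this text, the sketch assumes the standard truncated Fourier series w(t) = a_0 + Σ_j [a_j·cos(2πjt/P) + b_j·sin(2πjt/P)] with period P equal to the number of residuals. Under that assumption, and with fewer than P/2 harmonics sampled at t = 1, …, P, the least-squares coefficients reduce to the projections computed below. The trend forecast is passed in as plain numbers standing in for the CART output; all names are hypothetical.

```python
import math
from typing import List, Tuple

Coef = Tuple[float, List[float], List[float]]

def fit_fourier(res: List[float], harmonics: int) -> Coef:
    """Least-squares Fourier coefficients for residuals at t = 1..n, P = n."""
    n = len(res)
    if not 0 < harmonics < n / 2:
        raise ValueError("harmonics must satisfy 0 < harmonics < n/2")
    a0 = sum(res) / n
    a, b = [], []
    for j in range(1, harmonics + 1):
        w = 2 * math.pi * j / n
        a.append(2 / n * sum(e * math.cos(w * t) for t, e in enumerate(res, 1)))
        b.append(2 / n * sum(e * math.sin(w * t) for t, e in enumerate(res, 1)))
    return a0, a, b

def eval_fourier(coef: Coef, t: float, period: int) -> float:
    """Evaluate the assumed w(t) at time t."""
    a0, a, b = coef
    return a0 + sum(a[j - 1] * math.cos(2 * math.pi * j * t / period)
                    + b[j - 1] * math.sin(2 * math.pi * j * t / period)
                    for j in range(1, len(a) + 1))

def forecast(trend_future: List[float], res: List[float],
             harmonics: int) -> List[float]:
    """Step 3.2.5: trend prediction plus extrapolated periodic residual."""
    n = len(res)
    coef = fit_fourier(res, harmonics)
    return [y + eval_fourier(coef, n + h, n)
            for h, y in enumerate(trend_future, start=1)]
```

With residuals that follow one sine cycle over 8 samples, the fit recovers b_1 ≈ 1 and the forecast adds sin(2π·9/8) ≈ 0.707 and sin(2π·10/8) = 1 to the next two trend values.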
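This embodiment also uses the improved 1NN-DTW algorithm to determine the type of business concurrency and compares it with the unimproved algorithms to verify the accuracy of the improved 1NN-DTW, as follows:
First, LoadRunner is used to record the access behaviour of the server-side application's services, such as browsing, querying and ticket refunds. The server-side virtual machine is then loaded continuously for one hour while business concurrency is collected; missing and abnormal business concurrency values are handled using the preprocessing method, and the concurrency data are adjusted to 15-second intervals.
The improved 1NN-DTW algorithm is used to determine the type of business concurrent access and is compared with 1NN-DTW and 1NN-DDTW, with Accuracy and F-measure used to assess each algorithm. The concurrency obtained in the first step is cut into subsequences of 80, 120, 160 and 200 sampling points, and each subsequence is labelled with one of the seven load-variation trends listed in Table 1 to form a sample series. This yields 700 sample series, of which 420 are selected as known samples for type determination and the remaining 280 serve as test samples.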
Table 1 Different types of business concurrent access
Figure PCTCN2019090872-appb-000017
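Table 2 compares the type determination results of three methods: the improved 1NN-DTW of the present invention and the existing 1NN-DTW and 1NN-DDTW. As Table 2 shows, the Accuracy and F-measure of the method of the present invention are clearly higher than those of the other two methods, which indicates that, when determining the type of business concurrency, considering both its values and its trend works better than attending to only one of the two. In addition, although the method of the present invention computes both the Euclidean distance and the derivative difference of similar points, the time used does not increase substantially (1120 ms, against 984 ms for 1NN-DTW and 1097 ms for 1NN-DDTW).
Table 2 Business concurrency classification results of the different methods
Method            Accuracy  F-measure  Time (ms)
Improved 1NN-DTW  0.942     0.867      1120
1NN-DTW           0.873     0.751      984
1NN-DDTW          0.916     0.834      1097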
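The two classification metrics reported in Table 2 can be computed as sketched below. The text does not say how F-measure is averaged over the seven types, so the macro average (unweighted mean of per-class F1 scores) used here is an assumption, as are the function names.

```python
from typing import List

def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Fraction of test series whose predicted type matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f_measure(y_true: List[str], y_pred: List[str]) -> float:
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = []
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores.append(2 * precision * recall / total if total else 0.0)
    return sum(scores) / len(scores)
```

In the experiment described above these functions would be applied to the labels of the 280 test series and the types returned by each classifier.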
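This embodiment also uses the method of the present invention to predict business concurrency and compares it with traditional methods such as ARIMA, as follows:
First, LoadRunner is used to record the access behaviour of the server-side application's services, such as browsing, querying and ticket refunds. The server-side virtual machine is then loaded continuously for one hour while business concurrency is collected; missing and abnormal business concurrency values are handled using the preprocessing method described above, and the concurrency data are adjusted to 15-second intervals.
Two relatively complex types of concurrency, quadratic and periodically rising, are selected for prediction. The business concurrency of the next 5 minutes is estimated by analysing the values of the past 25 minutes. Three evaluation criteria are used: mean squared error (MSE), mean absolute error (MAE) and elapsed time (Time). With the help of Python toolkits, the method is compared with ARIMA and Holt-Winters exponential smoothing to verify the accuracy of the method of the present invention.
The business concurrency prediction results of the three methods are shown in Table 3, and the comparison between the predictions of the three methods and the true concurrency is shown in Figures 3 and 4. The figures show that, in both scenarios, the method of the present invention fits the true business concurrency series better than ARIMA and Holt-Winters, which indicates that the method is effective in predicting the various types of concurrency. Further analysis of the results in Table 3 shows that the method of the present invention has the lowest MSE and MAE in both concurrency scenarios. In the quadratic scenario the MSE and MAE of the three methods are fairly close, but in the periodically rising scenario the present method is clearly better, as ARIMA and Holt-Winters learn this kind of complex concurrency poorly. These results indicate that the method of the present invention achieves considerable accuracy in the scenarios tested.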
Table 3 Business concurrency prediction results of the different methods
Figure PCTCN2019090872-appb-000018
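For reference, the two error measures used in Table 3 can be computed as in the sketch below; the function names are illustrative and the inputs stand for the true and predicted concurrency over the 5-minute forecast window.

```python
from typing import Sequence

def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error between true and predicted concurrency."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error between true and predicted concurrency."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

MSE penalizes large individual errors more heavily than MAE, which is why reporting both gives a fuller picture of how a forecast deviates from the true series.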
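Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope defined by the claims of the present invention.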

Claims (6)

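  1. A method for predicting different types of business concurrency of a virtual machine, characterized by comprising the following steps:
    Step 1: Collect the historical business concurrency of the virtual machine and preprocess it, specifically:
    Step 1.1: Scan the business concurrency of the virtual machine over a period of time and find the points where business concurrency is missing;
    Step 1.2: Process the missing business concurrency points found by the scan;
    Step 1.3: Adjust outliers among the collected business concurrency samples, i.e. abnormally fluctuating extreme maxima and minima;
    Step 1.4: Adjust the data interval of the business concurrency and CPU utilization data collected from the log database or from instrumentation logs, merging the collected data in units of seconds, minutes or hours;
    Step 1.5: Normalize the data processed in step 1.4 using min-max normalization;
    Step 2: Determine the type of the virtual machine's business concurrency using an improved 1-nearest-neighbor dynamic time warping method, 1NN-DTW, specifically:
    Step 2.1: Classify each business concurrency series of the virtual machine as rising, falling, quadratic, random, periodically fluctuating, periodically rising or periodically falling;
    Step 2.2: For each type of business concurrency, select labelled business concurrency series in advance as known samples;
    Step 2.3: For each business concurrency series to be classified, scan all known samples in turn and use the nearest-neighbor algorithm to find the closest known sample; the type of that known sample is taken as the type of the series to be classified;
    Step 2.4: Group all business concurrency into two broad classes to simplify the 1-nearest-neighbor model;
    Random, rising, falling and quadratic business concurrency are grouped into the class without periodic variation;
    Periodically fluctuating, periodically rising and periodically falling business concurrency are grouped into the class with periodic variation;
    Step 2.5: Construct an n×m matrix to align the business concurrency series to be classified {x_1,x_2,...,x_n} with one known business concurrency series {y_1,y_2,...,y_m}, where n is the total number of business concurrency values to be classified and m is the total number of known business concurrency values;
    Step 2.6: Take the deviation between the i-th business concurrency value to be classified, x_i, and the j-th known business concurrency value, y_j, as the value d_{i,j} at position (i,j) of the matrix. Using both the Euclidean distance and the squared difference of the derivatives at the two points, compute the deviation d_{i,j} of each pair of points after aligning the series to be classified {x_1,x_2,...,x_n} with the known series {y_1,y_2,...,y_m}, as follows:
    d_{i,j}=(x_i-y_j)^2+(x'_i-y'_j)^2      (1)
    where x'_i and y'_j are the derivatives of x_i and y_j respectively, and the derivative x'_i of business concurrency x_i is estimated as follows:
    Figure PCTCN2019090872-appb-100001
    Step 2.7: Starting from position (1,1) of the matrix, iteratively search for the path with the smallest cumulative deviation, under the constraint that, except at the boundary, each position can only move to the position above it, to its right, or to its upper right, until position (n,m) is reached;
    Step 3: Predict the business concurrency of each variation type of the virtual machine, specifically:
    Step 3.1: Fit business concurrency without periodic variation using the classification and regression tree CART;
    Step 3.2: Fit business concurrency with periodic variation using the Fourier series FS together with the classification and regression tree CART.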
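  2. The method for predicting different types of business concurrency of a virtual machine according to claim 1, characterized in that step 1.2 is specifically:
    Step 1.2.1: Where individual sampling points are missing, fill each one with the average of the business concurrency in the preceding and following periods. The missing business concurrency con(t) of the virtual machine in the t-th time period is computed as follows: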
    con(t)=(con(t-1)+con(t+1))/2      (3)
    Step 1.2.2: Where 90% or more of the samples are missing, discard all the samples and set the business concurrency for that time span to zero.
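  3. The method for predicting different types of business concurrency of a virtual machine according to claim 1, characterized in that step 1.3 is specifically:
    Step 1.3.1: Using quartiles, compute the upper limit H and the lower limit L of the normal range of virtual machine business concurrency within time t, as follows:
    H=Q3+k*(Q3-Q1)       (4)
    L=Q1-k*(Q3-Q1)       (5)
    where Q1 is the lower quartile, i.e. the 25th percentile of the business concurrency series within time t sorted in ascending order; Q3 is the upper quartile, i.e. the 75th percentile of the same series; and k describes how abnormal an unreasonable sampling point is, typically taking the value 1.5 or 3 for moderate and extreme outliers respectively;
    Step 1.3.2: Use the Tukey test to decide whether each sampling point is normal, and adjust the outliers;
    If a sampling point is judged to be an erroneous business concurrency sample, discard the erroneous value and then replace it using mean imputation;
    If a sampling point is judged to be a normal business concurrency sample, make no adjustment.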
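  4. The method for predicting different types of business concurrency of a virtual machine according to claim 1, characterized in that step 3.1 is specifically:
    Step 3.1.1: Traverse every value f of each feature F of the sample business concurrency series, split the sample data on the condition (F,f), determine the split position with the smallest squared error, and thereby select the best cut point in the business concurrency series;
    The squared error, error, is computed as follows:
    Figure PCTCN2019090872-appb-100003
    where
    Figure PCTCN2019090872-appb-100004
    denotes the feature of the i'-th business concurrency value in sample x, y_i' denotes the i'-th series sample before the split, and
    Figure PCTCN2019090872-appb-100005
    denotes the fitted result of the i'-th subseries sample after the split;
    Step 3.1.2: Save the business concurrency value that serves as the cut point, and split the business concurrency series there;
    Step 3.1.3: Build in turn the subtree for feature F greater than f and the subtree for F less than f, then iterate, splitting and fitting the business concurrency series to the left and right of the current split point, until no further split is possible, at which point the node is recorded as a leaf;
    Step 3.1.4: Traverse the sample data again from the bottom up, check every split point for all business concurrency series, and compare the fitting error of the series before and after the split:
    If the fitting error of the series decreases after the split, keep the split point;
    If the fitting error of the series increases after the split, cancel the split point and merge the left and right series.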
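  5. The method for predicting different types of business concurrency of a virtual machine according to claim 4, characterized in that step 3.2 is specifically:
    Step 3.2.1: Use the classification and regression tree CART to fit the business concurrency at times {t_1,t_2,...,t_{n'}} to obtain the fitted values {y(0),...,y(n'-1),y(n')}, which capture the rising or falling trend of the business concurrency;
    Step 3.2.2: Compare the business concurrency obtained in step 3.2.1 with the true business concurrency to obtain the residual series {e(0),e(1),...,e(n')};
    Step 3.2.3: Use the classification and regression tree CART to predict the business concurrency at times {t_{n'+1},t_{n'+2},...,t_{m'}} as {y(n'+1),y(n'+2),...,y(m')};
    Step 3.2.4: Use the Fourier series FS to fit the residual series {e(0),e(1),...,e(n')}, capturing the periodic trend of the business concurrency, and obtain the residual values {e(n'+1),e(n'+2),...,e(m')} of the business concurrency at times {t_{n'+1},t_{n'+2},...,t_{m'}};
    Step 3.2.5: Add the business concurrency at times {t_{n'+1},t_{n'+2},...,t_{m'}} to the corresponding residual values to obtain the predicted business concurrency at times {t_{n'+1},t_{n'+2},...,t_{m'}}, namely {y(n'+1)+e(n'+1),y(n'+2)+e(n'+2),...,y(m')+e(m')}.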
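  6. The method for predicting different types of business concurrency of a virtual machine according to claim 5, characterized in that step 3.2.4 is specifically:
    Step 3.2.4.1: Fit the residual series e(0),e(1),...,e(n') with the function w(t), given by:
    Figure PCTCN2019090872-appb-100006
    where a_0, a_{j'} and b_{j'} are all variables, P=n',
    Figure PCTCN2019090872-appb-100007
    denotes rounding down, and t=1,2,...,n';
    Step 3.2.4.2: Compute the values of the variables a_{j'} and b_{j'} by least squares, as follows:
    Figure PCTCN2019090872-appb-100008
    where w_{j'} is the j'-th function used to fit the residuals.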
PCT/CN2019/090872 2019-04-29 2019-06-12 Method for predicting different types of business concurrency of a virtual machine WO2020220438A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910355147.5A CN110096335B (zh) 2019-04-29 2019-04-29 Method for predicting different types of business concurrency of a virtual machine
CN201910355147.5 2019-04-29

Publications (1)

Publication Number Publication Date
WO2020220438A1 true WO2020220438A1 (zh) 2020-11-05

Family

ID=67446350

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/090872 WO2020220438A1 (zh) 2019-04-29 2019-06-12 Method for predicting different types of business concurrency of a virtual machine

Country Status (2)

Country Link
CN (1) CN110096335B (zh)
WO (1) WO2020220438A1 (zh)




Also Published As

Publication number Publication date
CN110096335B (zh) 2022-06-21
CN110096335A (zh) 2019-08-06


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19927250

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19927250

Country of ref document: EP

Kind code of ref document: A1