CN116594855A - A Virtual Machine Load Prediction Method Based on Missing Value Filling - Google Patents

A Virtual Machine Load Prediction Method Based on Missing Value Filling

Info

Publication number
CN116594855A
Authority
CN
China
Prior art keywords
virtual machine
vector
data
value
missing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310520971.8A
Other languages
Chinese (zh)
Inventor
高岩
孙汉玺
刘凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeastern University China
Original Assignee
Northeastern University China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeastern University China filed Critical Northeastern University China
Priority to CN202310520971.8A priority Critical patent/CN116594855A/en
Publication of CN116594855A publication Critical patent/CN116594855A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G06F11/3433 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment for load management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/301 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is a virtual computing platform, e.g. logically partitioned systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3055 Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/10 Pre-processing; Data cleansing
    • G06F18/15 Statistical pre-processing, e.g. techniques for normalisation or restoring missing data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0475 Generative networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/094 Adversarial learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2201/00 Indexing scheme relating to error detection, to error correction, and to monitoring
    • G06F2201/815 Virtual
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Quality & Reliability (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a virtual machine load prediction method based on missing value filling, belonging to the technical field of cloud computing. The concurrent access volume and resource quantities of the virtual machine whose load is to be predicted in a given period are treated as known values, and the virtual machine load to be predicted is treated as a missing value; the load is then imputed with a missing value filling method, so that the load prediction task is accomplished and the virtual machine load prediction problem is converted into a missing virtual machine load value filling problem. A GAIN-based virtual machine load prediction model, GAIN-VMLP, is then constructed and trained, and the trained GAIN-VMLP model is used to predict the virtual machine load. The method can effectively solve the virtual machine load prediction problem in cloud computing, thereby providing effective support for the elastic scaling of virtual machine resources in cloud computing.

Description

A Virtual Machine Load Prediction Method Based on Missing Value Filling

Technical Field

The invention belongs to the technical field of cloud computing, and in particular relates to a virtual machine load prediction method based on missing value filling.

Background Art

With the development of cloud computing, more and more software systems are deployed as SaaS in cloud environments to provide services. SaaS-layer applications (i.e., cloud applications) provide services to users under Service Level Agreements (SLAs) signed with the users.

In general, a cloud application provider needs to lease resources (such as virtual machines) from a cloud resource provider, deploy the cloud application, and provide users with services that meet the SLA requirements. The performance of a cloud application deployed on a virtual machine (VM) is closely related to the VM's resources (such as CPU type and number of cores, memory capacity, network bandwidth, system disk type and capacity) and its load (such as CPU utilization and memory occupancy): virtual machines with the same configuration but different loads yield different performance for the cloud applications running on them. The virtual machine load is in turn affected by factors such as the amount of virtual machine resources and the number of concurrent user accesses. Under the same number of concurrent user accesses, a virtual machine with abundant resources may have a low load and good performance, while one with scarce resources may have a high load and poor performance.

Cloud application providers always want to lease resources and deploy cloud applications at the minimum resource cost while still providing services that meet the SLA requirements. To guarantee cloud application performance at minimum resource cost, the virtual machine load must be kept within a certain range, and if it exceeds that range the virtual machine resources need to be adjusted dynamically. To keep the load of the initially requested or adjusted virtual machine within the allowed range as far as possible, a model is needed that predicts the virtual machine load from the amount of virtual machine resources and the number of concurrent user accesses; with such a model one can compute how far the virtual machine resources need to be adjusted under the current concurrent access volume. How to predict the virtual machine load from the virtual machine resources and the concurrent user accesses therefore becomes a key issue in the elastic adjustment of virtual machine resources.

Virtual machine load prediction based on virtual machine resources and concurrent user accesses is a regression problem: its input is the virtual machine's resource quantities and concurrent access volume, and its output is the virtual machine's load.

Summary of the Invention

To address the deficiencies of the prior art, the present invention proposes a virtual machine load prediction method based on missing value filling. The virtual machine configuration information and concurrency are treated as known values, the virtual machine load to be predicted is treated as a missing value, and GAIN (Generative Adversarial Imputation Nets) is used to impute the missing values.

A virtual machine load prediction method based on missing value filling, specifically comprising the following steps:

Step 1: transform the virtual machine load prediction problem into a missing virtual machine load value filling problem.

The virtual machine load prediction predicts the load of a virtual machine in a given period from its concurrent access volume and resource quantities in that period; the indicators used for the concurrent access volume, the resources and the load are chosen according to the actual application requirements.

The concurrent access volume and resource quantities of the virtual machine whose load is to be predicted in the given period are treated as known values, the virtual machine load to be predicted is treated as a missing value, and the load is filled in with a missing value filling method, so that the load prediction task is accomplished and the virtual machine load prediction problem is transformed into a missing virtual machine load value filling problem. Specifically:

When, according to the actual application requirements, the concurrent access volume and n virtual machine resource indicators are used to predict m virtual machine load indicators, the transformation of the load prediction problem into a missing load value filling problem proceeds as follows:

The n virtual machine resource indicators, the concurrent access volume and the m virtual machine load indicators together form a virtual machine state described by k indicators (k = n + 1 + m), in which the first n items are the virtual machine resources, the (n+1)-th item is the concurrent access volume, and the last m items are the virtual machine load to be predicted.

The specific values of these state indicators over a given period form a k-dimensional virtual machine state vector. For a virtual machine whose load is to be predicted, predicting the load from the virtual machine resources and the concurrent user accesses means using the first n+1 items of the state vector to predict the values of the last m items. The first n+1 items are treated as known values and the last m items as missing values; filling in the missing values completes the load prediction, so the virtual machine load prediction problem is converted into a missing virtual machine load value filling problem.

Step 2: build the GAIN-based virtual machine load prediction model GAIN-VMLP.

GAIN-VMLP treats the virtual machine resources and the concurrent access volume (i.e., the first n+1 items of the virtual machine state vector) as known values and the virtual machine load to be predicted (i.e., the last m items of the state vector) as missing values, transforms the load prediction problem into a missing load value filling problem, and uses GAIN to impute the missing load values, thereby completing the load prediction. This yields the GAIN-VMLP model for virtual machine load prediction.

Using GAIN to fill in the missing virtual machine load values works as follows: a generator produces fitted data for the missing load values, and a discriminator then judges whether the data are real, which realizes the adversarial training.

The input of the GAIN-VMLP model is a k-dimensional virtual machine state vector with missing load values (the last m items are the loads to be predicted), and the output is a k-dimensional virtual machine state vector carrying the load predictions (the last m items are the predicted loads).

The GAIN-based virtual machine load prediction model GAIN-VMLP is constructed as follows:

Step S1: design the input data vector.

The missing items in the virtual machine state vector are marked with a special value that lies outside the value range of every indicator, forming the input data vector X. In the k-dimensional state vector, the first n+1 items (the concurrent access volume and the virtual machine resources) are known, while the last m items, which represent the virtual machine load, are missing.

Step S2: design the mask vector.

The positions of the missing data are marked with a mask vector: a k-dimensional vector M = (M1, ..., Mk) whose components take the value 0 or 1, where 1 indicates that the corresponding component of X is not missing and 0 indicates that it is missing. Each X corresponds to one M, whose components are set according to X: when the component Xi (i ∈ [1, k]) of X is 0, the corresponding component Mi of M is 0; when Xi is not 0, Mi is 1. This forms the mask vector M.
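
As an illustration of the mask-vector design, the following minimal sketch (NumPy) builds M from an input vector X; the variable names and the use of -1 as the missing-value marker (as in step 3.2 below) are assumptions of this example, not values fixed by the text.

```python
import numpy as np

MISSING = -1.0  # special marker value outside every indicator's range (assumption, see step 3.2)

def make_mask(x: np.ndarray) -> np.ndarray:
    """Return the mask vector M: 1 where the component of x is observed, 0 where it is missing."""
    return (x != MISSING).astype(np.float32)

# Example: a 5-dimensional state vector whose last 2 components (the loads) are missing
x = np.array([4.0, 8.0, 120.0, MISSING, MISSING], dtype=np.float32)
m = make_mask(x)   # -> [1. 1. 1. 0. 0.]
```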

Step S3: design the random noise vector.

Random noise is used for the initial filling of the missing data: a k-dimensional random noise vector Z = (Z1, ..., Zk) is generated at random, each component taking a value in [0, 1], forming the random noise vector Z.

Step S4: design the hint vector.

The hint vector strengthens the adversarial process between the generator and the discriminator: it reveals to the discriminator part of the missingness information of the original data, so that the discriminator focuses on the components indicated by the hint vector, which in turn forces the generator to produce more realistic data. The hint vector H is generated as follows:

Step S4.1: first generate a random vector B. Each component of the k-dimensional vector B = (B1, ..., Bk) takes the value 0 or 1 and is generated as follows: a number p is drawn at random from {1, ..., k}, the p-th component of B is set to 0, and all the other components are set to 1.

Step S4.2: generate the hint vector H from B. Each component of the k-dimensional vector H = (H1, ..., Hk) takes the value 0, 0.5 or 1, and H is generated according to Equation 1:

H=B⊙M+0.5(1-B) (1)H=B⊙M+0.5(1-B) (1)

where ⊙ denotes the element-wise product of vectors.
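
A minimal sketch of the hint-vector construction of Equation 1 (NumPy), assuming the B-generation rule of step S4.1 (one randomly chosen component of B set to 0, the rest to 1); the function and variable names are illustrative assumptions.

```python
import numpy as np

def make_hint(m: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Build the hint vector H = B * M + 0.5 * (1 - B) from the mask M."""
    k = m.shape[0]
    b = np.ones(k, dtype=np.float32)
    b[rng.integers(k)] = 0.0           # step S4.1: one random component of B set to 0
    return b * m + 0.5 * (1.0 - b)     # Equation 1 (element-wise product)

rng = np.random.default_rng(0)
m = np.array([1, 1, 1, 0, 0], dtype=np.float32)
h = make_hint(m, rng)                  # components are 0, 0.5 or 1
```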

Step S5: design the generator G.

In GAIN-VMLP the generator is used to generate the predicted data. Its input is the k-dimensional input data vector X with missing values, the mask vector M and the random noise vector Z; after three fully connected layers it outputs the data vector X̂ carrying the virtual machine load predictions. Since the generator predicts not only the values at the positions to be predicted but also the values of the originally observed input data, the output vector must both make the predicted values fool the discriminator and keep the outputs at the observed positions close to the true values. The two loss functions of the generator, in the standard GAIN formulation, are therefore given by Equations 2 and 3:

$$\mathcal{L}_G(m,\hat{m},b)=-\sum_{i:\,b_i=0}(1-m_i)\log \hat{m}_i \qquad (2)$$

$$\mathcal{L}_M(x,\hat{x})=\sum_{i=1}^{k} m_i\,(x_i-\hat{x}_i)^2 \qquad (3)$$

where m denotes the mask vector; b denotes the random vector; m̂ denotes the discriminator output, i.e., the estimated probability that each component of the data produced by the generator is observed rather than missing; x denotes the input data vector; x̂ denotes the data vector produced by the generator; and k denotes the dimension of the data vector x.

The objective of the generator G is to minimize the weighted sum of the two loss functions, as shown in Equation 4:

$$\min_{G}\;\frac{1}{K_G}\sum_{j=1}^{K_G}\Big[\mathcal{L}_G\big(m^{(j)},\hat{m}^{(j)},b^{(j)}\big)+\alpha\,\mathcal{L}_M\big(x^{(j)},\hat{x}^{(j)}\big)\Big] \qquad (4)$$

where K_G is the number of training samples per batch when the generator G is trained with gradient descent, and α is a hyperparameter.
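
The following PyTorch sketch illustrates one possible generator with three fully connected layers and the losses of Equations 2 to 4. The hidden width, activation functions and sigmoid output are assumptions of this example rather than values fixed by the text.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Three fully connected layers; input = (X, M, Z) concatenated, output in [0, 1]."""
    def __init__(self, k: int, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 * k, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, k), nn.Sigmoid(),
        )

    def forward(self, x, m, z):
        x_in = m * x + (1 - m) * z                      # missing entries initially filled with noise (step S3)
        return self.net(torch.cat([x_in, m, z], dim=1))

def generator_loss(d_prob, x, x_hat, m, b, alpha: float):
    """Equations 2-4: adversarial term on hinted-unknown positions plus alpha * reconstruction term."""
    eps = 1e-8
    adv = -torch.sum((1 - b) * (1 - m) * torch.log(d_prob + eps), dim=1)   # Eq. 2
    rec = torch.sum(m * (x - x_hat) ** 2, dim=1)                           # Eq. 3
    return torch.mean(adv + alpha * rec)                                   # Eq. 4
```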

Step S6: design the discriminator D.

In GAIN-VMLP the discriminator is used to judge whether data are real data from the data set or fake data produced by the generator. Its input is the generator output X̂ and the hint vector H; after three fully connected layers it outputs the judgment M̂, expressed as probabilities.

The loss function of the discriminator is shown in Equation 5:

$$\mathcal{L}_D(m,\hat{m},b)=\sum_{i:\,b_i=0}\big[m_i\log \hat{m}_i+(1-m_i)\log(1-\hat{m}_i)\big] \qquad (5)$$

where m denotes the mask vector; b denotes the random vector; and m̂ denotes the discriminator output, i.e., the estimated probability that each component of the data produced by the generator is observed rather than missing.

With respect to Equation 5, the discriminator is expected to judge the authenticity of the predicted virtual machine load accurately enough, so the training criterion of the discriminator is given by Equation 6:

$$\min_{D}\;-\frac{1}{K_D}\sum_{j=1}^{K_D}\mathcal{L}_D\big(m^{(j)},\hat{m}^{(j)},b^{(j)}\big) \qquad (6)$$

where K_D is the number of training samples per batch when the discriminator D is trained with gradient descent.
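
A corresponding PyTorch sketch of a discriminator with three fully connected layers and the loss of Equations 5 and 6; as above, the hidden width and activations are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Three fully connected layers; input = (X_hat, H), output = per-component probability of being observed."""
    def __init__(self, k: int, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * k, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, k), nn.Sigmoid(),
        )

    def forward(self, x_hat, h):
        return self.net(torch.cat([x_hat, h], dim=1))

def discriminator_loss(d_prob, m, b):
    """Equations 5-6: cross-entropy on the components not revealed by the hint (b = 0)."""
    eps = 1e-8
    ce = (1 - b) * (m * torch.log(d_prob + eps) + (1 - m) * torch.log(1 - d_prob + eps))
    return -torch.mean(torch.sum(ce, dim=1))
```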

Step S7: standardize the input data.

Every dimension of the GAIN-VMLP input data is normalized so that the data are mapped into the interval [0, 1]; the normalization formula is shown in Equation 7:

$$X_i'=\frac{X_i-\min(X_i)}{\max(X_i)-\min(X_i)} \qquad (7)$$

where min(X_i) and max(X_i) are the minimum and maximum of the i-th dimension over the data set.
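
A minimal sketch of the per-dimension min-max normalization of Equation 7 (NumPy); treating the columns of the data set as the dimensions and normalizing before marking missing entries is an assumption consistent with step 3.2 below.

```python
import numpy as np

def minmax_normalize(data: np.ndarray):
    """Map every dimension (column) of the data set into [0, 1] as in Equation 7."""
    col_min = data.min(axis=0)
    col_max = data.max(axis=0)
    scale = np.where(col_max > col_min, col_max - col_min, 1.0)  # guard against constant columns
    return (data - col_min) / scale, col_min, scale
```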

Step 3: train the GAIN-VMLP model.

The training process of the GAIN-VMLP model is as follows:

Step 3.1: process the training data set. The training data set consists of a number of records obtained by running virtual machines or by benchmarking; each record consists of n virtual machine resource indicator values, the concurrent access volume and m virtual machine load indicator values. A proportion of the records in the data set are selected at random and their virtual machine load indicator values are cleared to represent missing values.

Step 3.2: generate the input data vectors. Following the standardized input design, each record in the data set is normalized with Equation 7 and the items whose load value is empty are marked with -1, forming a data set X consisting of the normalized input data vectors of the k-dimensional virtual machine state vectors.

Step 3.3: set the batch size s. The GAIN-VMLP model is trained with mini-batch gradient descent; the number of virtual machine state vectors fed into GAIN-VMLP in each batch is controlled by the batch size parameter s.

Step 3.4: compute the mask vectors. Following the mask vector design, a mask vector m is generated for every data vector x in the data set X, forming the mask set M of the data set X.

Step 3.5: perform discriminator optimization training. The discriminator optimization training proceeds as follows:

Step 3.5.1: select s data vectors from the data set X and, at the same time, select the mask vectors corresponding to these s data vectors from the mask set M.

Step 3.5.2: generate s independent and identically distributed random noise vectors Z.

Step 3.5.3: generate s independent and identically distributed random vectors B.

Step 3.5.4: use the generator to generate s data vectors X̂ for the s selected data vectors.

Step 3.5.5: update and train the discriminator D with gradient descent based on Equation 5.

Step 3.6: perform generator optimization training. The generator optimization training proceeds as follows:

Step 3.6.1: select s data vectors from the data set X and, at the same time, select the mask vectors corresponding to these s data vectors from the mask set M.

Step 3.6.2: generate s independent and identically distributed random noise vectors Z.

Step 3.6.3: generate s independent and identically distributed random vectors B.

Step 3.6.4: generate s hint vectors H based on Equation 1.

Step 3.6.5: update and train the generator G with gradient descent based on Equation 4.

Step 4: use the trained GAIN-VMLP model to predict the load of the virtual machine.

The virtual machine resource indicators and virtual machine load indicators are set according to the actual application requirements and combined with the given concurrency to form a virtual machine state vector, which is input into the GAIN-VMLP model; the model output is the virtual machine load prediction result, thereby realizing the prediction of the virtual machine load.

Beneficial technical effects of the invention:

1. The invention builds a virtual machine load prediction method based on missing value filling, discusses the model construction problem within the prediction method, and predicts the virtual machine load in cloud computing. The model and its parameters are selected according to the experimental results and the actual application requirements.

2. The invention describes the prediction process and the prediction algorithm of the model used in the virtual machine load prediction method based on missing value filling. Comparative experimental analysis shows that the method achieves good results in both prediction accuracy and stability.

3. The invention treats the virtual machine resources and the concurrent access volume as known values and the virtual machine load to be predicted as missing values, thereby transforming the virtual machine load prediction problem into a missing virtual machine load value filling problem; it then proposes a virtual machine load prediction method based on missing value filling and solves the problem with the missing value imputation algorithm GAIN. This effectively solves the virtual machine load prediction problem in cloud computing and thus provides effective support for the elastic scaling of virtual machine resources in cloud computing.

Brief Description of the Drawings

Fig. 1 is a schematic diagram of the virtual machine state vector of the GAIN-VMLP model according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of the mask vector of the GAIN-VMLP model according to an embodiment of the present invention;

Fig. 3 is an architecture diagram of the GAIN-VMLP model according to an embodiment of the present invention;

Fig. 4 shows the prediction results for the CPU load in an embodiment of the present invention;

Fig. 5 shows the prediction results for the RAM load in an embodiment of the present invention;

Fig. 6 shows the experimental results on the influence of the number of data set samples on the algorithm in an embodiment of the present invention;

Fig. 7 shows the experimental results on the influence of different data set distributions on the accuracy of the algorithm in an embodiment of the present invention;

Fig. 8 shows the experimental results on the influence of different numbers of iterations on the accuracy of the algorithm in an embodiment of the present invention;

Fig. 9 shows the experimental results of different numbers of fully connected layers on test sets of different proportions at 10,000 iterations in an embodiment of the present invention;

Fig. 10 shows the experimental results of different numbers of fully connected layers on test sets of different proportions at 20,000 iterations in an embodiment of the present invention;

Fig. 11 shows the experimental results of different numbers of fully connected layers on test sets of different proportions at 30,000 iterations in an embodiment of the present invention;

Fig. 12 shows the experimental results on the influence of different values of the hyperparameter α on the accuracy of the algorithm in an embodiment of the present invention;

Fig. 13 shows the experimental results on the influence of different prediction positions on the accuracy of the algorithm in an embodiment of the present invention.

Detailed Description of the Embodiments

The present invention is further described below with reference to the accompanying drawings and embodiments.

A virtual machine load prediction method based on missing value filling, specifically comprising the following steps:

Step 1: transform the virtual machine load prediction problem into a missing virtual machine load value filling problem.

The virtual machine load prediction predicts the load of a virtual machine in a given period from its concurrent access volume and resource quantities in that period. The virtual machine resources include indicators such as CPU frequency, number of CPU cores, memory capacity, internal network bandwidth, external network bandwidth, system disk type and system disk capacity; which of these indicators are used depends on the virtual machine resource description provided by the cloud server provider and on the actual application requirements. The virtual machine load includes indicators such as CPU utilization, memory utilization, I/O consumption, internal network bandwidth usage and external network bandwidth usage; which indicators are predicted depends on the actual application requirements.

The concurrent access volume and resource quantities of the virtual machine whose load is to be predicted in the given period are treated as known values, the virtual machine load to be predicted is treated as a missing value, and the load is filled in with a missing value filling method, so that the load prediction task is accomplished and the virtual machine load prediction problem is transformed into a missing virtual machine load value filling problem. Specifically:

When, according to the actual application requirements, the concurrent access volume and n virtual machine resource indicators are used to predict m virtual machine load indicators, the transformation of the load prediction problem into a missing load value filling problem proceeds as follows:

The n virtual machine resource indicators, the concurrent access volume and the m virtual machine load indicators together form a virtual machine state described by k indicators (k = n + 1 + m), in which the first n items are the virtual machine resources, the (n+1)-th item is the concurrent access volume, and the last m items are the virtual machine load to be predicted.

The specific values of these state indicators over a given period form a k-dimensional virtual machine state vector, as shown in Fig. 1. For a virtual machine whose load is to be predicted, predicting the load from the virtual machine resources and the concurrent user accesses means using the first n+1 items of the state vector to predict the values of the last m items. The first n+1 items are treated as known values and the last m items as missing values; filling in the missing values completes the load prediction, so the virtual machine load prediction problem is converted into a missing virtual machine load value filling problem.

The embodiment of the present invention is illustrated with an example in which two virtual machine resource indicators (the number of CPU cores and the memory capacity) together with the concurrent access volume are used to predict two virtual machine load indicators (CPU utilization and memory utilization). The two resource indicators, the concurrent access volume and the two load indicators together form a virtual machine state described by five indicators: the first two items are the number of CPU cores and the memory capacity, the third item is the concurrent access volume, and the last two items are the CPU utilization and the memory utilization. For a virtual machine whose load is to be predicted, the first three items of its state vector are known values and the last two items are missing values; filling in these two missing values completes the load prediction, thereby converting the virtual machine load prediction problem into a missing virtual machine load value filling problem. A minimal sketch of this state-vector construction is given below.
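
A sketch of the five-dimensional state vector of this embodiment (NumPy); the concrete numbers and the -1 marker for the missing load values are illustrative assumptions.

```python
import numpy as np

MISSING = -1.0   # marker for the missing load values (as in step 3.2)

# state vector: [CPU cores, memory capacity (GB), concurrent accesses, CPU utilization, memory utilization]
known = np.array([4.0, 8.0, 150.0], dtype=np.float32)      # the first n+1 = 3 items are known
state = np.concatenate([known, [MISSING, MISSING]])        # the last m = 2 items (the loads) are to be predicted
mask = (state != MISSING).astype(np.float32)               # -> [1. 1. 1. 0. 0.]
```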

Step 2: build the GAIN-based virtual machine load prediction model GAIN-VMLP (GAIN for VM Load Prediction), as shown in Fig. 3.

GAIN-VMLP treats the virtual machine resources and the concurrent access volume (i.e., the first n+1 items of the virtual machine state vector) as known values and the virtual machine load to be predicted (i.e., the last m items of the state vector) as missing values, transforms the load prediction problem into a missing load value filling problem, and uses GAIN to impute the missing load values, thereby completing the load prediction. This yields the GAIN-VMLP model for virtual machine load prediction.

Using GAIN to fill in the missing virtual machine load values works as follows: a generator produces fitted data for the missing load values, and a discriminator then judges whether the data are real, which realizes the adversarial training.

The input of the GAIN-VMLP model is a k-dimensional virtual machine state vector with missing load values (the last m items are the loads to be predicted), and the output is a k-dimensional virtual machine state vector carrying the load predictions (the last m items are the predicted loads).

The GAIN-based virtual machine load prediction model GAIN-VMLP is constructed as follows:

Step S1: design the input data vector.

The missing items in the virtual machine state vector are marked with a special value (a value that lies outside the value range of every indicator), forming the input data vector X. In the k-dimensional state vector, the first n+1 items (the concurrent access volume and the virtual machine resources) are known, while the last m items, which represent the virtual machine load, are missing.

Step S2: design the mask vector; the mask vector is shown in Fig. 2.

The positions of the missing data are marked with a mask vector: a k-dimensional vector M = (M1, ..., Mk) whose components take the value 0 or 1, where 1 indicates that the corresponding component of X is not missing and 0 indicates that it is missing. Each X corresponds to one M, whose components are set according to X: when the component Xi (i ∈ [1, k]) of X is 0, the corresponding component Mi of M is 0; when Xi is not 0, Mi is 1. This forms the mask vector M.

Step S3: design the random noise vector.

Random noise is used for the initial filling of the missing data: a k-dimensional random noise vector Z = (Z1, ..., Zk) is generated at random, each component taking a value in [0, 1], forming the random noise vector Z.

Step S4: design the hint vector.

The hint vector strengthens the adversarial process between the generator and the discriminator: it reveals to the discriminator part of the missingness information of the original data, so that the discriminator focuses on the components indicated by the hint vector, which in turn forces the generator to produce more realistic data. The hint vector H is generated as follows:

Step S4.1: first generate a random vector B. Each component of the k-dimensional vector B = (B1, ..., Bk) takes the value 0 or 1 and is generated as follows: a number p is drawn at random from {1, ..., k}, the p-th component of B is set to 0, and all the other components are set to 1.

Step S4.2: generate the hint vector H from B. Each component of the k-dimensional vector H = (H1, ..., Hk) takes the value 0, 0.5 or 1, and H is generated according to Equation 1:

H=B⊙M+0.5(1-B) (1)

where ⊙ denotes the element-wise product of vectors.

Step S5: design the generator G.

In GAIN-VMLP the generator is used to generate the predicted data. Its input is the k-dimensional input data vector X with missing values, the mask vector M and the random noise vector Z; after three fully connected layers it outputs the data vector X̂ carrying the virtual machine load predictions. Since the generator predicts not only the values at the positions to be predicted but also the values of the originally observed input data, the output vector must both make the predicted values fool the discriminator and keep the outputs at the observed positions close to the true values. The two loss functions of the generator, in the standard GAIN formulation, are therefore given by Equations 2 and 3:

$$\mathcal{L}_G(m,\hat{m},b)=-\sum_{i:\,b_i=0}(1-m_i)\log \hat{m}_i \qquad (2)$$

$$\mathcal{L}_M(x,\hat{x})=\sum_{i=1}^{k} m_i\,(x_i-\hat{x}_i)^2 \qquad (3)$$

where m denotes the mask vector; b denotes the random vector; m̂ denotes the discriminator output, i.e., the estimated probability that each component of the data produced by the generator is observed rather than missing; x denotes the input data vector; x̂ denotes the data vector produced by the generator; and k denotes the dimension of the data vector x.

The objective of the generator G is to minimize the weighted sum of the two loss functions, as shown in Equation 4:

$$\min_{G}\;\frac{1}{K_G}\sum_{j=1}^{K_G}\Big[\mathcal{L}_G\big(m^{(j)},\hat{m}^{(j)},b^{(j)}\big)+\alpha\,\mathcal{L}_M\big(x^{(j)},\hat{x}^{(j)}\big)\Big] \qquad (4)$$

where K_G is the number of training samples per batch when the generator G is trained with gradient descent, and α is a hyperparameter.

Step S6: design the discriminator D.

In GAIN-VMLP the discriminator is used to judge whether data are real data from the data set or fake data produced by the generator. Its input is the generator output X̂ and the hint vector H; after three fully connected layers it outputs the judgment M̂, expressed as probabilities.

The loss function of the discriminator is shown in Equation 5:

$$\mathcal{L}_D(m,\hat{m},b)=\sum_{i:\,b_i=0}\big[m_i\log \hat{m}_i+(1-m_i)\log(1-\hat{m}_i)\big] \qquad (5)$$

where m denotes the mask vector; b denotes the random vector; and m̂ denotes the discriminator output, i.e., the estimated probability that each component of the data produced by the generator is observed rather than missing.

With respect to Equation 5, the discriminator is expected to judge the authenticity of the predicted virtual machine load accurately enough, so the training criterion of the discriminator is given by Equation 6:

$$\min_{D}\;-\frac{1}{K_D}\sum_{j=1}^{K_D}\mathcal{L}_D\big(m^{(j)},\hat{m}^{(j)},b^{(j)}\big) \qquad (6)$$

where K_D is the number of training samples per batch when the discriminator D is trained with gradient descent.

Step S7: standardize the input data.

Every dimension of the GAIN-VMLP input data is normalized so that the data are mapped into the interval [0, 1]; the normalization formula is shown in Equation 7:

$$X_i'=\frac{X_i-\min(X_i)}{\max(X_i)-\min(X_i)} \qquad (7)$$

Step 3: train the GAIN-VMLP model.

The training process of the GAIN-VMLP model is as follows:

Step 3.1: process the training data set. The training data set consists of a number of records obtained by running virtual machines or by benchmarking; each record consists of n virtual machine resource indicator values, the concurrent access volume and m virtual machine load indicator values. A proportion of the records in the data set are selected at random and their virtual machine load indicator values are cleared to represent missing values.

Step 3.2: generate the input data vectors. Following the standardized input design, each record in the data set is normalized with Equation 7 and the items whose load value is empty are marked with -1, forming a data set X consisting of the normalized input data vectors of the k-dimensional virtual machine state vectors.

Step 3.3: set the batch size s. The GAIN-VMLP model is trained with mini-batch gradient descent; the number of virtual machine state vectors fed into GAIN-VMLP in each batch is controlled by the batch size parameter s. For example, if s is set to 4, four virtual machine state vectors are selected in each batch and fed into GAIN-VMLP for processing.

Step 3.4: compute the mask vectors. Following the mask vector design, a mask vector m is generated for every data vector x in the data set X, forming the mask set M of the data set X.

Step 3.5: perform discriminator optimization training (a combined code sketch of the discriminator and generator training steps is given after step 3.6.5). The discriminator optimization training proceeds as follows:

Step 3.5.1: select s data vectors from the data set X and, at the same time, select the mask vectors corresponding to these s data vectors from the mask set M.

Step 3.5.2: generate s independent and identically distributed random noise vectors Z.

Step 3.5.3: generate s independent and identically distributed random vectors B.

Step 3.5.4: use the generator to generate s data vectors X̂ for the s selected data vectors.

Step 3.5.5: update and train the discriminator D with gradient descent based on Equation 5.

Step 3.6: perform generator optimization training. The generator optimization training proceeds as follows:

Step 3.6.1: select s data vectors from the data set X and, at the same time, select the mask vectors corresponding to these s data vectors from the mask set M.

Step 3.6.2: generate s independent and identically distributed random noise vectors Z.

Step 3.6.3: generate s independent and identically distributed random vectors B.

Step 3.6.4: generate s hint vectors H based on Equation 1.

Step 3.6.5: update and train the generator G with gradient descent based on Equation 4.
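
The following PyTorch sketch combines steps 3.5 and 3.6 into one alternating training iteration. It reuses the hypothetical Generator, Discriminator, generator_loss and discriminator_loss sketched above; the batch selection, [0, 1] noise and hint construction follow the text, while the Adam optimizer, the learning rate and the default values of s, α and the iteration count are assumptions of this example.

```python
import numpy as np
import torch

def train_gain_vmlp(X, M, G, D, s=4, alpha=100.0, iters=10000, lr=1e-3, seed=0):
    """Alternating mini-batch training of GAIN-VMLP (steps 3.5 and 3.6)."""
    rng = np.random.default_rng(seed)
    opt_d = torch.optim.Adam(D.parameters(), lr=lr)
    opt_g = torch.optim.Adam(G.parameters(), lr=lr)
    n, k = X.shape
    for _ in range(iters):
        idx = rng.integers(n, size=s)                         # select s data vectors and their masks
        x = torch.tensor(X[idx], dtype=torch.float32)
        m = torch.tensor(M[idx], dtype=torch.float32)
        z = torch.rand(s, k)                                  # i.i.d. random noise in [0, 1]
        b = torch.ones(s, k)
        b[torch.arange(s), torch.randint(k, (s,))] = 0.0      # step S4.1: one component per vector set to 0
        h = b * m + 0.5 * (1 - b)                             # hint vectors, Equation 1

        # Step 3.5: discriminator update (Equations 5 and 6)
        x_hat = G(x, m, z).detach()
        d_prob = D(m * x + (1 - m) * x_hat, h)                # discriminator sees the completed vector
        loss_d = discriminator_loss(d_prob, m, b)
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()

        # Step 3.6: generator update (Equation 4)
        x_hat = G(x, m, z)
        d_prob = D(m * x + (1 - m) * x_hat, h)
        loss_g = generator_loss(d_prob, x, x_hat, m, b, alpha)
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return G, D
```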

Step 4: use the trained GAIN-VMLP model to predict the load of the virtual machine.

The virtual machine resource indicators and virtual machine load indicators are set according to the actual application requirements and combined with the given concurrency to form a virtual machine state vector, which is input into the GAIN-VMLP model; the model output is the virtual machine load prediction result, thereby realizing the prediction of the virtual machine load.
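
A sketch of this prediction step, again reusing the hypothetical helpers above; the de-normalization assumes the column minima and scales returned by the normalization sketch, and the function and argument names are illustrative.

```python
import numpy as np
import torch

def predict_loads(G, known, col_min, scale, m_loads=2):
    """Predict the last m load indicators from the first n+1 known indicators of one state vector."""
    k = known.shape[0] + m_loads
    x = np.zeros(k, dtype=np.float32)
    x[:-m_loads] = (known - col_min[:-m_loads]) / scale[:-m_loads]    # normalize the known part (Eq. 7)
    m = np.zeros(k, dtype=np.float32); m[:-m_loads] = 1.0             # mask: the load positions are missing
    xt = torch.tensor(x).unsqueeze(0)
    mt = torch.tensor(m).unsqueeze(0)
    z = torch.rand(1, k)
    with torch.no_grad():
        x_hat = G(xt, mt, z).squeeze(0).numpy()
    return x_hat[-m_loads:] * scale[-m_loads:] + col_min[-m_loads:]   # de-normalize the predicted loads
```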

The present invention comprises the virtual machine load prediction method based on missing value filling, the prediction process of the model and the prediction algorithm; it transforms the virtual machine load prediction problem into a missing value filling problem and proposes a method that uses the GAIN algorithm to predict the CPU load and the RAM load of a virtual machine.

An example of virtual machine load prediction based on missing value filling according to the present invention is as follows:

Experimental design:

1. Experimental data set. The data were obtained from a cloud platform, assuming that only one service runs on each virtual machine. A total of 16 static virtual machine resource allocation schemes were preset, and nine types of concurrent user access frequency distributions were simulated (Poisson, normal, uniform, chi-square, abrupt-change, exponential, gradually increasing, gradually decreasing and random) in order to form different virtual machine resource load situations (i.e., different virtual machine resource states). For each allocation scheme and each access frequency distribution, 100 records were generated, giving 14,400 experimental records in total; the data were divided into a training set and a test set.

2. Experimental scheme. Based on the above data set and considering the influence of different parameters on the operation and results of the algorithm, seven experiments were designed, with the root mean square error (RMSE) as the evaluation criterion. The smaller the RMSE, the smaller the gap between the predicted results and the actual values. The RMSE is defined in Equation 9:

$$\mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\big(y_i-\hat{y}_i\big)^2} \qquad (9)$$

where y_i is the true load value, ŷ_i the predicted load value and N the number of test samples.
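
A minimal sketch of the RMSE computation of Equation 9 (NumPy):

```python
import numpy as np

def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root mean square error between true and predicted load values (Equation 9)."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```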

Experiment 1: explores the difference in accuracy when the GAIN algorithm predicts the CPU load and the RAM load separately versus predicting both loads simultaneously. Eight groups of comparative experiments were designed, with the test set proportion increasing from 0.1 to 0.8 in steps of 0.1. In each group, the GAIN algorithm was used to predict the CPU load alone, the RAM load alone, and the CPU and RAM loads simultaneously, and the RMSE was recorded.

Experiment 2: explores the influence of the number of data set samples on the GAIN algorithm. With the test set proportion set to 0.2, the sample size was increased from 2,000 to 18,000 in steps of 2,000, and the RMSE of the algorithm was recorded.

Experiment 3: explores the influence of different data set distributions on the accuracy of the GAIN algorithm. Experiments were carried out on nine distribution types: Poisson, normal, uniform, chi-square, abrupt-change, exponential, gradually increasing, gradually decreasing and random.

Experiment 4: explores the influence of different numbers of iterations on the accuracy of the GAIN algorithm. The number of iterations of the GAIN algorithm was increased from 1,000 to 10,000 in steps of 1,000, and the RMSE was recorded.

Experiment 5: explores the influence of different numbers of fully connected layers on the accuracy of the GAIN algorithm. The default GAIN uses three fully connected layers; in this experiment the number of fully connected layers was increased from 3 to 5 in steps of 1, and the RMSE was recorded for each setting.

Experiment 6: explores the influence of different values of the hyperparameter alpha on the accuracy of the GAIN algorithm. The hyperparameter alpha affects the loss function of the generator G; in this experiment alpha was increased from 1 to 512.

Experiment 7: explores the influence of the position of the values to be predicted on the accuracy of the GAIN algorithm. In the GAIN-based virtual machine prediction algorithm the positions to be predicted are the last two columns, i.e., the CPU load and the RAM load are placed at the end. Three control experiments were set up in which the data to be predicted were placed in the first two columns, the middle two columns and the last two columns of the data set; the RMSE was recorded for each setting to explore the influence of the prediction position on the accuracy of the algorithm.

实验结果及分析;Experimental results and analysis;

实验1:GAIN算法单独预测RAM负载的结果与同时预测CPU负载和RAM负载中CPU负载预测的结果对比如图4所示,GAIN算法单独预测RAM负载的结果与同时预测CPU负载和RAM负载中RAM负载预测的结果对比如图5所示。Experiment 1: The results of GAIN algorithm predicting RAM load alone and the results of simultaneously predicting CPU load and RAM load in CPU load prediction are shown in Figure 4. The comparison of load forecasting results is shown in Figure 5.

由图4和图5可以看出,GAIN算法在同时预测CPU负载和RAM负载时,精度与单独预测两个负载时的精度几乎相同。同时预测两个负载能节省一半的训练时间,同时几乎不降低预测精度,大大减少了训练模型时的代价。It can be seen from Figure 4 and Figure 5 that when the GAIN algorithm predicts the CPU load and RAM load at the same time, the accuracy is almost the same as when the two loads are predicted separately. Predicting two workloads at the same time can save half the training time, and hardly reduce the prediction accuracy, which greatly reduces the cost of training the model.

Experiment 2: The test is run with the parameters of Experiment 3 of the previous section, predicting only the CPU load; the results are shown in Figure 6. As the data set grows, the RMSE decreases steadily and the solution accuracy of the algorithm improves.

Experiment 3: Only the CPU load is predicted; the results are shown in Figure 7. Under the different concurrency distributions, the prediction errors of the GAIN and ELM algorithms are both fairly stable, while those of the BP and DBN algorithms fluctuate considerably. The GAIN algorithm has the smallest prediction error and the best overall performance, indicating that the proposed method predicts CPU load with high accuracy and stability.

Experiment 4: Only the CPU load is predicted; the results are shown in Figure 8. As the number of iterations increases, the solution accuracy of the GAIN algorithm gradually improves.

Experiment 5: Only the CPU load is predicted; the results are shown in Figures 9, 10 and 11. With five fully connected layers, the GAIN results fluctuate considerably and are less stable at 10,000 iterations, but are fairly stable at 20,000 and 30,000 iterations. From 20,000 iterations onward the five-layer network is slightly more accurate than the three-layer network, with the largest improvement at 20,000 iterations.

Experiment 6: Only the CPU load is predicted; the results are shown in Figure 12. When alpha is below 128 the accuracy of the algorithm is relatively high; beyond 128 the RMSE rises sharply and the accuracy drops significantly.

Experiment 7: Only the CPU load is predicted; the results are shown in Figure 13. Under the different test-set ratios, the position of the columns to be predicted has almost no effect on accuracy. When the test-set ratio exceeds 0.5, placing the predicted columns in the middle gives a slightly larger error than the other two positions.

Claims (9)

1. The virtual machine load prediction method based on the deficiency value filling is characterized by comprising the following steps of:
step 1: converting the virtual machine load prediction problem into a missing virtual machine load value filling problem;
step 2: constructing a virtual machine load prediction model GAIN-VMLP based on GAIN;
step 3: training a GAIN-VMLP model;
step 4: predicting the load of the virtual machine by using the trained GAIN-VMLP model;
setting a virtual machine resource quantity index and a virtual machine load index according to actual application requirements, combining given concurrency quantity to form a virtual machine state vector, inputting the virtual machine state vector into a GAIN-VMLP model, and obtaining a model output result which is a virtual machine load prediction result, thereby realizing the prediction of the load of the virtual machine.
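The inference flow of claim 1 can be summarized as: assemble the k-dimensional state vector from the n resource indicators, the concurrency and m empty load slots, pass it through the trained GAIN-VMLP, and read off the filled last m components. A minimal sketch follows, assuming the trained model is exposed as a callable `gain_vmlp_impute` (a hypothetical name) that fills the missing components of a state vector; it is illustrative only, not the patent's reference implementation.

```python
import numpy as np

def predict_vm_load(resource_values, concurrency, m, gain_vmlp_impute):
    """Build the k-dimensional state vector (k = n + 1 + m) and ask the trained
    GAIN-VMLP model to fill the last m components, which are the loads to predict."""
    n = len(resource_values)
    state = np.empty(n + 1 + m)
    state[:n] = resource_values       # n virtual machine resource-amount indicators (known)
    state[n] = concurrency            # concurrent access amount (known)
    state[n + 1:] = np.nan            # m load indicators, treated as missing values
    filled = gain_vmlp_impute(state)  # trained model returns the completed state vector
    return filled[n + 1:]             # predicted virtual machine loads
```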
2. The virtual machine load prediction method based on missing value filling according to claim 1, wherein the virtual machine load prediction in step 1 predicts the load of a virtual machine in a certain period according to the concurrent access amount and the resource amount of the virtual machine in the certain period; and setting indexes of the concurrent access quantity, the resource quantity and the load according to actual application requirements.
3. The virtual machine load prediction method based on missing value filling according to claim 1, wherein step 1 specifically comprises:
the concurrent access quantity and the resource quantity of the virtual machine to be subjected to load prediction in a certain period are taken as known values, the virtual machine load to be predicted is taken as a missing value, and the virtual machine load is filled by using a missing value filling method, so that a task of predicting the virtual machine load is realized, and the virtual machine load prediction problem is converted into a missing virtual machine load value filling problem; the method comprises the following steps:
when, according to actual application requirements, m virtual machine loads are to be predicted from the concurrent access quantity and n virtual machine resource quantity indexes, the method for converting the virtual machine load prediction problem into the missing virtual machine load value filling problem comprises the following steps:
n virtual machine resource quantity indexes, virtual machine concurrent access quantity and m virtual machine load indexes of the virtual machine jointly form a virtual machine state described by k (k=n+1+m) indexes; the first n items are virtual machine resource amounts, the n+1th item is concurrent access amount, and the last m items are virtual machine loads to be predicted;
the specific value of each state index of the virtual machine in a certain time period forms a k-dimensional virtual machine state vector; predicting the virtual machine load based on the virtual machine resource amount and the user concurrent access amount aiming at a virtual machine of the load to be predicted, namely predicting the value of the m items by using the first n+1 items of the virtual machine state vector; and taking the first n+1 items of the virtual machine state vector as known values, taking the last m items as missing values, filling the missing values to complete the prediction work of the virtual machine load, and converting the virtual machine load prediction problem into a missing virtual machine load value filling problem.
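For the conversion described above, a small numpy sketch (array names and the use of NaN as the missing marker are assumptions for illustration) shows how a batch of records becomes a matrix whose first n+1 columns are known and whose last m columns are the missing loads to be filled:

```python
import numpy as np

def to_imputation_problem(resources, concurrency, n, m):
    """resources: (num_vms, n) array of resource-amount indicators.
    concurrency: (num_vms,) array of concurrent access amounts.
    Returns a (num_vms, k) matrix, k = n + 1 + m, whose last m columns are missing."""
    k = n + 1 + m
    X = np.full((resources.shape[0], k), np.nan)
    X[:, :n] = resources       # known resource amounts
    X[:, n] = concurrency      # known concurrent access amount
    # X[:, n+1:] stays NaN: the virtual machine loads to be predicted (missing values)
    return X
```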
4. The virtual machine load prediction method based on missing value filling according to claim 1, wherein in step 2, the GAIN-VMLP takes the virtual machine resource amount and the concurrent access amount, i.e. the first n+1 items of the virtual machine state vector, as known values, takes the virtual machine load to be predicted, i.e. the last m items of the virtual machine state vector, as missing values, converts the virtual machine load prediction problem into a missing virtual machine load value filling problem, and fills the missing virtual machine load values by adopting GAIN, thereby completing the prediction of the virtual machine load; a GAIN-VMLP model for virtual machine load prediction is constructed in this way;
the filling of the missing virtual machine load values by adopting GAIN specifically comprises: generating fitted data for the missing virtual machine load values through a generator, and judging whether the data are real through a discriminator, thereby forming an adversarial process;
the input of the GAIN-VMLP model is the k-dimensional virtual machine state vector with missing virtual machine load values, i.e. the last m items are the virtual machine loads to be predicted, and the output is the k-dimensional virtual machine state vector carrying the virtual machine load predicted values, i.e. the last m items are the predicted virtual machine loads.
5. The virtual machine load prediction method based on missing value filling according to claim 1, wherein in the step 2, the virtual machine load prediction model GAIN-VMLP based on GAIN is constructed, and the specific process is as follows:
step S1: designing an input data vector;
the missing items in the virtual machine state vector are marked by a special value which is not in the value range of any index to form an input data vector X; for a k-dimensional virtual machine state vector, the first n+1 terms of concurrent access amount and virtual machine resource amount are known, while the last m terms representing virtual machine load are missing;
step S2: designing a mask vector;
marking the missing data positions through a mask vector; the k-dimensional vector M=(M_1,...,M_k) has components taking the value 0 or 1, where 1 indicates that the value of the corresponding component in X is not missing and 0 indicates that it is missing; each X corresponds to one M, and the components of M are set according to X: when a component X_i (i∈[1,k]) of X is 0, the corresponding component M_i of M is also 0; when X_i (i∈[1,k]) is not 0, the corresponding M_i is 1, thereby forming the mask vector M;
step S3: designing a random noise vector;
initially padding the missing data using random noise; randomly generating a k-dimensional random noise vector Z=(Z_1,…,Z_k) whose components take values in the range [0,1], thereby forming the random noise vector Z;
step S4: designing a hint vector;
the hint vector intensifies the adversarial process between the generator and the discriminator: it reveals to the discriminator part of the missing-value information of the original data, so that the discriminator focuses on the positions indicated by the hint vector, while the generator is forced to generate more realistic data;
step S5: designing a generator G;
a generator is used in the GAIN-VMLP to generate the prediction data; the inputs of the generator are the k-dimensional input data vector X with missing values, the mask vector M and the random noise vector Z, and an output data vector carrying the virtual machine load predicted values is produced through three fully connected layers; the output vector must not only let the predicted values deceive the discriminator, but also make the output at the originally known positions approach the true values; the two loss functions of the generator are therefore shown in equations 2 and 3:
where m represents the mask vector; b represents the random vector; the discriminator output indicates the probability of whether the data generated by the generator are missing data; x represents the input data vector; the generated data vector is the output of the generator; and k represents the dimension of the data vector x;
the goal of generator G is to minimize the weighted sum of the two loss functions, as shown in equation 4:
wherein K_G represents the number of training samples in each batch when the generator G is trained by gradient descent, and alpha is a hyperparameter;
step S6: designing a discriminator D;
the GAIN-VMLP uses a discriminator to determine whether the data are real data from the data set or false data generated by the generator; the inputs of the discriminator are the output of the generator and the hint vector H, and a judgment result expressed in probability form is output through three fully connected layers;
the loss function of the discriminator is shown in equation 5:
where m represents the mask vector; b represents the random vector; and the discriminator output indicates the probability of whether the data generated by the generator are missing data;
based on equation 5, the training criterion by which the discriminator is expected to judge the authenticity of the predicted virtual machine load is shown in equation 6:
wherein K_D represents the number of training samples in each batch when the discriminator D is trained by gradient descent;
step S7: carrying out standardized design on input data;
carrying out standardization processing on each dimension of the GAIN-VMLP input data, mapping the data into the [0,1] interval; the standardization formula is shown in equation 7:
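Equation 7 and the exact layer sizes are not reproduced in this text, so the sketch below should be read as an assumed concretization of steps S1-S7: min-max scaling into [0,1] stands in for the normalization, a sentinel of -1 marks missing entries (following step 3.2 of the training procedure; step S2 uses 0), and the generator G and discriminator D each use three fully connected layers with assumed widths and ReLU/sigmoid activations.

```python
import torch
import torch.nn as nn

def normalize(X, eps=1e-8):
    """Map each column of X into [0,1] (assumed reading of equation 7); also return
    the per-column min/max needed to map predictions back to the original scale."""
    col_min, _ = torch.min(X, dim=0)
    col_max, _ = torch.max(X, dim=0)
    return (X - col_min) / (col_max - col_min + eps), col_min, col_max

def make_mask(X, sentinel=-1.0):
    """Mask vector M (step S2): 1 where a component of X is observed, 0 where missing."""
    return (X != sentinel).float()

def make_noise(shape):
    """Random noise vector Z (step S3) with components drawn uniformly from [0,1]."""
    return torch.rand(shape)

class Generator(nn.Module):
    """Generator G (step S5): noise-fills the missing components of X and maps the
    result, together with the mask M, through three fully connected layers."""
    def __init__(self, k, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * k, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, k), nn.Sigmoid(),   # outputs lie in [0,1] like the data
        )

    def forward(self, x, m, z):
        x_tilde = m * x + (1 - m) * z             # replace missing entries by noise
        return self.net(torch.cat([x_tilde, m], dim=1))

class Discriminator(nn.Module):
    """Discriminator D (step S6): takes the completed vector and the hint vector H and
    outputs, per component, the probability that the component was originally observed."""
    def __init__(self, k, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * k, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, k), nn.Sigmoid(),
        )

    def forward(self, x_hat, h):
        return self.net(torch.cat([x_hat, h], dim=1))
```

Sigmoid outputs are used only because the data are normalized into [0,1]; any output layer consistent with the chosen normalization would serve the same purpose.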
6. The virtual machine load prediction method based on missing value filling according to claim 5, wherein the generating method of the hint vector H in step S4 is as follows:
step S4.1: first, generating a random vector B; the k-dimensional vector B=(B_1,…,B_k) has components taking the value 0 or 1; each component value is generated as follows: a number p is randomly drawn from {1,…,k}, the value of the p-th component of B is set to 0, and the values of the remaining components are set to 1;
step S4.2: generating the hint vector H from B; the k-dimensional vector H=(H_1,…,H_k) has components taking the value 0, 0.5 or 1, and H is generated as shown in equation 1:
H=B⊙M+0.5(1-B) (1)
wherein ⊙ represents the element-wise product of the vectors.
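Equation 1 translates directly into code. The sketch below (batching over rows is an added convenience, not part of the claim) draws one hidden position per row for B and forms H = B⊙M + 0.5(1-B):

```python
import torch

def make_hint(m):
    """m: (batch, k) mask tensor. For each row, one position p drawn from {1,...,k}
    is hidden from the discriminator (B=0 there), so H equals the true mask value
    everywhere except that position, where it equals 0.5."""
    batch, k = m.shape
    b = torch.ones_like(m)
    hidden_pos = torch.randint(0, k, (batch,))   # random p for every row
    b[torch.arange(batch), hidden_pos] = 0.0     # step S4.1
    return b * m + 0.5 * (1.0 - b)               # step S4.2, equation 1
```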
7. The virtual machine load prediction method based on missing value filling according to claim 1, wherein step 3 specifically comprises:
step 3.1: processing the training data set; the training data set consists of a plurality of pieces of data obtained by running or performing benchmark test of the virtual machine, wherein each piece of data consists of n virtual machine resource quantity index values, concurrent access quantity and m virtual machine load index values; randomly selecting a plurality of data in the data set according to a proportion, emptying the load index value of the virtual machine to represent a missing value;
step 3.2: generating input data vectors; according to the standardized design of the input data, carrying out standardization processing on each piece of data in the data set using equation 7, and marking the items whose load values are empty in each piece of data with -1, thereby forming a data set X consisting of a plurality of standardized k-dimensional virtual machine state vectors serving as input data vectors;
step 3.3: setting a batch size s; the GAIN-VMLP model is trained using a mini-batch gradient descent method, and the number of virtual machine state vectors input into the GAIN-VMLP per batch is controlled by the batch-size parameter s;
step 3.4: calculating a mask vector; generating a mask vector M for each data vector X in the data set X according to a mask vector design method, thereby forming a mask set M of the data set X;
step 3.5: performing discriminator optimization training;
step 3.6: performing generator optimization training.
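A sketch of steps 3.1-3.4 under the same assumptions as above (numpy arrays, min-max normalization standing in for equation 7, sentinel -1 for blanked load values); the proportion of blanked records and the column layout are illustrative:

```python
import numpy as np

def prepare_training_set(data, n, m, missing_ratio=0.2, sentinel=-1.0, seed=0):
    """data: (N, k) array of complete records, k = n + 1 + m.
    Steps 3.1-3.2: normalize, then blank the m load columns of a random subset of
    records and mark them with the sentinel. Step 3.4: derive the mask set."""
    rng = np.random.default_rng(seed)
    X = data.astype(float)
    X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0) + 1e-8)
    blanked = rng.random(X.shape[0]) < missing_ratio
    X[blanked, n + 1:] = sentinel
    M = (X != sentinel).astype(float)
    return X, M

def minibatches(X, M, s, rng):
    """Step 3.3: yield batches of s state vectors together with their masks."""
    idx = rng.permutation(X.shape[0])
    for start in range(0, len(idx), s):
        sel = idx[start:start + s]
        yield X[sel], M[sel]
```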
8. The virtual machine load prediction method based on missing value filling according to claim 7, wherein the optimized training process of the discriminator in step 3.5 is as follows:
step 3.5.1: selecting s data vectors from the data set X, and simultaneously selecting the mask vectors corresponding to these s data vectors from the mask set M;
step 3.5.2: generating s independent and identically distributed random noise vectors Z;
step 3.5.3: generating s independent and identically distributed random vectors B;
step 3.5.4: using the generator on the s data vectors to generate s imputed data vectors;
step 3.5.5: updating and training the discriminator D based on equation 5 using a gradient descent method.
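Equation 5 is not reproduced in this text; the cross-entropy below is the standard GAIN discriminator loss and is used here only as an assumed stand-in. `G`, `D`, and `make_hint` are the objects sketched after claims 5 and 6, and `x`, `m` are torch tensors holding one mini-batch of state vectors and masks:

```python
import torch

def discriminator_step(G, D, opt_D, x, m, eps=1e-8):
    """One discriminator update (steps 3.5.1-3.5.5) for a mini-batch (x, m)."""
    z = torch.rand_like(x)                 # step 3.5.2: i.i.d. noise Z
    h = make_hint(m)                       # step 3.5.3: random B folded into the hint H
    with torch.no_grad():
        g = G(x, m, z)                     # step 3.5.4: generator output for the batch
    x_hat = m * x + (1 - m) * g            # keep observed values, take G's values elsewhere
    d = D(x_hat, h)
    # Assumed stand-in for equation 5: D should output 1 on observed, 0 on imputed entries
    loss_D = -torch.mean(m * torch.log(d + eps) + (1 - m) * torch.log(1 - d + eps))
    opt_D.zero_grad()
    loss_D.backward()                      # step 3.5.5: gradient descent on D
    opt_D.step()
    return loss_D.item()
```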
9. The virtual machine load prediction method based on missing value filling according to claim 7, wherein the optimized training process of the generator in step 3.6 is as follows:
step 3.6.1: selecting s data vectors from the data set X, and simultaneously selecting the mask vectors corresponding to these s data vectors from the mask set M;
step 3.6.2: generating s independent and identically distributed random noise vectors Z;
step 3.6.3: generating s independent and identically distributed random vectors B;
step 3.6.4: generating s hint vectors based on equation 1;
step 3.6.5: updating and training the generator G based on equation 4 using a gradient descent method.
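Equations 2-4 are likewise not reproduced; the adversarial term plus an alpha-weighted reconstruction term below follow the standard GAIN generator objective and should be read as assumptions. The alpha value corresponds to the hyperparameter studied in Experiment 6:

```python
import torch

def generator_step(G, D, opt_G, x, m, alpha=10.0, eps=1e-8):
    """One generator update (steps 3.6.1-3.6.5) for a mini-batch (x, m)."""
    z = torch.rand_like(x)                 # step 3.6.2: i.i.d. noise Z
    h = make_hint(m)                       # steps 3.6.3-3.6.4: random B -> hint H
    g = G(x, m, z)
    d = D(m * x + (1 - m) * g, h)
    # Assumed stand-in for equations 2-4: fool D on the imputed entries (adversarial term)
    # and reproduce the known entries (reconstruction term), weighted by alpha
    loss_adv = -torch.mean((1 - m) * torch.log(d + eps))
    loss_rec = torch.mean(m * (x - g) ** 2)
    loss_G = loss_adv + alpha * loss_rec
    opt_G.zero_grad()
    loss_G.backward()                      # step 3.6.5: gradient descent on G
    opt_G.step()
    return loss_G.item()
```

Alternating `discriminator_step` and `generator_step` over the batches produced by `minibatches` for the chosen number of iterations completes the training of step 3.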
CN202310520971.8A 2023-05-10 2023-05-10 A Virtual Machine Load Prediction Method Based on Missing Value Filling Pending CN116594855A (en)

Publications (1)

Publication Number Publication Date
CN116594855A true CN116594855A (en) 2023-08-15



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination