WO2020077682A1 - Training method and device for a service quality evaluation model - Google Patents

Training method and device for a service quality evaluation model

Info

Publication number
WO2020077682A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
neural network
model
training
service
Prior art date
Application number
PCT/CN2018/113031
Other languages
English (en)
French (fr)
Inventor
叶唐陟
黄华俊杰
Original Assignee
网宿科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 网宿科技股份有限公司
Priority to US17/043,148 (US20210027170A1)
Priority to EP18937140.4A (EP3863222A4)
Publication of WO2020077682A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/16 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/50 Testing arrangements
    • H04L43/55 Testing of service level quality, e.g. simulating service usage
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/50 Network service management, e.g. ensuring proper service fulfilment according to agreements
    • H04L41/5003 Managing SLA; Interaction between SLA and QoS
    • H04L41/5009 Determining service level performance parameters or violations of service level contracts, e.g. violations of agreed response time or mean time between failures [MTBF]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/10 Active monitoring, e.g. heartbeat, ping or trace-route
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • H04L43/0817 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking functioning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/10 Active monitoring, e.g. heartbeat, ping or trace-route
    • H04L43/106 Active monitoring, e.g. heartbeat, ping or trace-route using time related information in packets, e.g. by adding timestamps
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/12 Network monitoring probes

Definitions

  • The invention relates to the technical field of CDNs, and in particular to a training method and device for a service quality evaluation model.
  • CDN: Content Delivery Network.
  • One way to evaluate the service quality of a CDN service system is to analyze the access logs of the servers, for example by calculating the lagging rate and other indicators.
  • Traversing the access logs requires a large amount of computing resources, which makes the cost of internal operation and maintenance equipment and bandwidth very high. This method is also tightly coupled to business types: the evaluation indicators of different business types differ greatly, so no unified standard can be formed, which makes internal management very difficult.
  • Another way is to use machine performance and network conditions to assess service quality. This method relies heavily on the experience of operation and maintenance personnel, and its accuracy is not high.
  • Embodiments of the present invention provide a training method and device for a service quality evaluation model.
  • The technical solution is as follows:
  • a training method for a service quality assessment model is provided, which is applied to a model training node.
  • the method includes:
  • each of the service quality assessment models is suitable for quality assessment of a service type
  • the quality monitoring data of the service node is collected according to a fixed period, including:
  • the machine performance data includes CPU utilization, remaining memory, load, iowait value, and ioutil value; and the network characteristic data includes ping data, poll data, and download rate.
  • the method further includes:
  • the monitoring node periodically sends detection signals to the service node and obtains network characteristic data
  • the step of collecting network characteristic data of the service node according to a fixed period includes:
  • the step of determining a characteristic value based on the machine performance data and the network characteristic data includes:
  • the step of replacing the abnormal values in the machine performance data, the network characteristic data, and the quality monitoring data with normal values includes:
  • the characteristic values of the machine performance data include one or more of the mean, maximum value, or variance of the machine performance data of each dimension;
  • the characteristic value of the network characteristic data includes at least one preset quantile value of the network characteristic data of each dimension.
  • the step of determining the label based on the quality monitoring data includes:
  • the deep neural network model is an LSTM neural network model.
  • the LSTM neural network model includes at least one layer of neural networks, and each layer of neural networks includes a forget gate, an input gate, an output gate, a neuron state, and an output result.
  • the expressions are:
  • $i_t = \sigma_g(W_i x_t + U_i c_{t-1} + b_i)$;
  • $o_t = \sigma_g(W_o x_t + U_o c_{t-1} + b_o)$;
  • $f_t$ represents the forget gate
  • $i_t$ denotes the input gate
  • $o_t$ denotes the output gate
  • $c_t$ represents the neuron (cell) state
  • $h_t$ represents the output result
  • $\sigma_g$, $\sigma_c$, and $\sigma_h$ respectively represent the activation functions
  • $x_t$ represents the input data at time t
  • $W_f$, $W_i$, $W_o$, $W_c$, $U_f$, $U_i$, and $U_o$ respectively represent the weight matrices
  • $b_f$, $b_i$, $b_o$, and $b_c$ respectively represent the bias vectors.
  • the settings of the same parameter in each layer of the neural network are different.
  • the step of training the LSTM neural network model using the training set includes:
  • the training set includes multiple training samples, and each of the training samples includes a label and feature values of n time steps, where n is a positive integer;
  • the steps of inputting the feature values in the training set into the first layer of the neural network of the LSTM neural network model to obtain the output result include:
  • the method further includes: using the feature value and the label to establish a verification set;
  • the step of training the deep neural network model using the training set includes:
  • the relationship between the input and output results established by the service quality assessment model is a non-linear relationship.
  • the model training node is a single server or server group.
  • a training device for a service quality evaluation model includes:
  • the collection module is used to collect the machine performance data, network characteristic data and quality monitoring data of the service node according to a fixed period;
  • a processing module configured to determine a characteristic value based on the machine performance data and the network characteristic data
  • the processing module is also used to determine a label based on the quality monitoring data
  • the processing module is also used to establish a training set by using the feature value and the label;
  • the training module is used for training a deep neural network model using the training set to obtain a service quality evaluation model.
  • each of the service quality assessment models is suitable for quality assessment of a service type
  • the collection module is specifically used for:
  • the machine performance data includes CPU utilization, remaining memory, load, iowait value, and ioutil value; and the network characteristic data includes ping data, poll data, and download rate.
  • the collection module is specifically used for:
  • processing module is also used to:
  • processing module is specifically used for:
  • the characteristic values of the machine performance data include one or more of the mean, maximum value, or variance of the machine performance data of each dimension;
  • the characteristic value of the network characteristic data includes at least one preset quantile value of the network characteristic data of each dimension.
  • processing module is specifically used for:
  • the deep neural network model is an LSTM neural network model.
  • the LSTM neural network model includes at least one layer of neural network, and each layer of neural network includes a forget gate, an input gate, an output gate, a neuron state, and an output result.
  • the expressions are:
  • $i_t = \sigma_g(W_i x_t + U_i c_{t-1} + b_i)$;
  • $o_t = \sigma_g(W_o x_t + U_o c_{t-1} + b_o)$;
  • $f_t$ represents the forget gate
  • $i_t$ denotes the input gate
  • $o_t$ denotes the output gate
  • $c_t$ represents the neuron (cell) state
  • $h_t$ represents the output result
  • $\sigma_g$, $\sigma_c$, and $\sigma_h$ respectively represent the activation functions
  • $x_t$ represents the input data at time t
  • $W_f$, $W_i$, $W_o$, $W_c$, $U_f$, $U_i$, and $U_o$ respectively represent the weight matrices
  • $b_f$, $b_i$, $b_o$, and $b_c$ respectively represent the bias vectors.
  • the settings of the same parameter in each layer of the neural network are different.
  • the training module is specifically used to:
  • the training set includes multiple training samples, and each of the training samples includes a label and feature values of n time steps, where n is a positive integer;
  • the training module is specifically used for:
  • An embodiment of the present invention uses machine performance data, network feature data, and quality monitoring data to train a model that learns the nonlinear relationship between machine performance data, network feature data, and service quality. When the model is used to evaluate the service quality of a service system, only the machine performance data and network characteristic data of the service system need to be input. Compared with evaluating service quality through server-side access logs, this reduces the data input and greatly reduces the computing resources and bandwidth required for the evaluation, which improves the efficiency of service quality assessment and reduces operating costs;
  • The embodiments of the present invention use machine performance data and network feature data as model inputs. These data are decoupled from specific services, so a set of universal service quality assessment standards can be formed to facilitate the management of service systems;
  • the embodiments of the present invention do not need to rely on human experience, can use machine learning methods to automatically model, and have higher accuracy.
  • FIG. 1 is a schematic diagram of a network framework provided by an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of another network framework provided by an embodiment of the present invention.
  • FIG. 3 is a flowchart of a training method of a service quality evaluation model provided by an embodiment of the present invention
  • FIG. 4 is a flowchart of a method for evaluating service quality provided by an embodiment of the present invention.
  • FIG. 5 is a structural block diagram of a training device for a service quality evaluation model provided by an embodiment of the present invention.
  • FIG. 6 is a structural block diagram of an apparatus for evaluating service quality provided by an embodiment of the present invention.
  • the embodiment of the present invention provides a training method of a service quality evaluation model, which can be applied to the network framework shown in FIG. 1.
  • the network framework includes service nodes, monitoring nodes and model training nodes.
  • the service node is a node in the CDN service system that provides services for users.
  • the monitoring node is connected to the service node, and is used for detecting the network condition from the monitoring node to each service node by sending a detection signal to the service node, and generating network characteristic data.
  • the monitoring node can also be used to collect machine performance data and quality monitoring data in each service node.
  • the model training node is connected to the monitoring node and is used to collect machine performance data, network feature data and quality monitoring data of each service node from the monitoring node, and then use the collected data to train to obtain a service quality evaluation model.
  • the model training node can be a single server or a server group.
  • the above model training node may include a processor, a memory, and a transceiver.
  • the processor may be used to train the service quality assessment model in the following process, and the memory may be used to store the data required and the data generated during the training process described below.
  • the transceiver can be used to receive and send the relevant data during the training process described below.
  • the embodiment of the present invention also provides a service quality evaluation method, which can be applied to the network framework shown in FIG. 2.
  • the network framework includes service nodes, monitoring nodes and quality evaluation nodes.
  • the monitoring node is used to detect the network situation from the monitoring node to each service node, and generate network characteristic data, as well as to collect machine performance data in each service node.
  • the quality evaluation node is connected to the monitoring node, and is used to collect machine performance data and network characteristic data of each service node from the monitoring node, and then use the collected data and the trained service quality evaluation model to evaluate the service quality of the CDN service system.
  • the quality assessment node can be a single server or a server group.
  • the above-mentioned quality evaluation node may include a processor, a memory, and a transceiver.
  • the processor may be used to evaluate the quality of service in the following process.
  • the memory may be used to store the data required and the data generated during the following evaluation process.
  • the transceiver can be used to receive and send relevant data during the evaluation process described below.
  • the above model training node and quality evaluation node may be the same node, or not the same node.
  • the embodiments of the present invention are not only suitable for evaluating the service quality of a CDN service system, but also for evaluating the service quality of a single server node and the service quality of other service systems or clusters composed of multiple server nodes.
  • the embodiment does not specifically limit the applicable range.
  • Referring to FIG. 3, a flowchart of a training method for a service quality assessment model provided by an embodiment of the present invention is shown.
  • the method can be applied to a model training node, that is, executed by a model training node.
  • the method may specifically include the following steps.
  • Step 301 Collect machine performance data, network characteristic data, and quality monitoring data of the service node according to a fixed period.
  • The process of training the service quality evaluation model is as follows: the feature values are input into the model to obtain an output result, the model parameters are adjusted according to the error between the output result and the real result, and the adjusted model is then trained further. Iterating this loop establishes a nonlinear relationship between the input and the output result, that is, yields the service quality evaluation model.
  • the data for determining the characteristic value may include machine performance data and network characteristic data.
  • the data used to embody the quality of service may also include other data. The embodiment of the present invention does not specifically limit the data used to embody the quality of service.
  • machine performance data, network characteristic data, and quality monitoring data of each service node in the CDN service system may be collected according to a fixed period.
  • Machine performance data can include CPU utilization, remaining memory, iowait value, ioutil value, and so on.
  • The monitoring node periodically sends a detection signal to the service node to detect the network situation from the monitoring node to each service node and obtains network characteristic data, so the network characteristic data of the service node can be collected from the monitoring node.
  • The network characteristic data may include ping (Packet Internet Groper) data, poll data, download rate, and so on.
  • Machine performance data and quality monitoring data need to be obtained from the service node.
  • the machine performance data and quality monitoring data are collected from the service node, and then the model training node obtains the machine performance data, network feature data, and quality monitoring data from the monitoring node according to a fixed period.
  • the service node and the monitoring node may also send the data required for model training to the distributed storage system, and then the model training node obtains these data from the distributed storage system according to a fixed period.
  • the embodiment of the present invention does not specifically limit the manner in which the original data is collected.
  • the quality monitoring data is used to calculate the evaluation index of service quality, that is, the real result used for comparison with the model output.
  • the quality monitoring data may include information such as request response time and request content size.
  • the corresponding quality monitoring data can be obtained from the log information of the service node.
  • The trained service quality model is suitable for the quality assessment of one business type, and each business type includes multiple application services. Model training can therefore use quality monitoring data corresponding to application services that belong to the business type to which the service quality assessment model applies. In this way, the embodiment of the present invention can adopt a general model training method to train service quality models suitable for various service types.
  • Quality monitoring data corresponding to one or more application services in the service node may be collected, where the one or more application services belong to the service type to which the service quality evaluation model applies. That is to say, the quality monitoring data that needs to be collected is the data corresponding to a limited set of application services; there is no need to collect the quality monitoring data corresponding to all the application services included in this type of business.
  • For example, suppose the model to be trained applies to type A business, which includes application service A1, application service A2, ..., application service An. Only the quality monitoring data corresponding to application service A1 may be collected for subsequent model training, which reduces the data transmission pressure during data collection and the subsequent data processing burden.
  • a larger range of data may also be collected, and then quality monitoring data corresponding to the preset application service in the service node may be obtained from the collected large data set to use the obtained data for model training.
  • machine performance data, network feature data, and quality monitoring data of multiple CDN service systems can be collected.
  • The preprocessing process includes: deleting data with repeated time stamps from the original data; filtering out the null values and abnormal values (outliers) in the original data; and either replacing the null values and outliers with normal values or deleting them.
  • Null values can be filtered directly from the original data; for outliers, clustering algorithms or data normalization can be used to set confidence intervals for filtering.
  • One way to replace such values is to use the data collected in an adjacent collection cycle. For example, if the CPU utilization rate of node A collected in the current collection cycle is null or abnormal, it can be replaced with the CPU utilization rate of node A collected in the previous collection cycle.
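The preprocessing described above can be sketched in Python. This is a minimal illustration, not the patent's implementation: the function name `preprocess`, the `(timestamp, value)` tuple layout, and the fixed `[low, high]` validity range are assumptions made for the example (the text itself suggests clustering or normalization-based confidence intervals for outlier detection).

```python
def preprocess(samples, low, high):
    """Clean one metric series of (timestamp, value) pairs.

    - drops records whose timestamp was already seen,
    - treats None and values outside [low, high] as invalid
      (the fixed range is an assumed stand-in for a confidence interval),
    - replaces an invalid value with the last valid value from an
      earlier collection cycle, or drops it if none exists yet.
    """
    seen = set()
    cleaned = []
    last_good = None
    for ts, value in sorted(samples, key=lambda s: s[0]):
        if ts in seen:
            continue  # repeated time stamp: keep only the first record
        seen.add(ts)
        invalid = value is None or not (low <= value <= high)
        if invalid:
            if last_good is None:
                continue  # nothing to substitute, so delete the record
            value = last_good
        else:
            last_good = value
        cleaned.append((ts, value))
    return cleaned


# CPU utilization (percent) of one node, one record per collection cycle.
cpu = [(0, 35.0), (60, None), (60, None), (120, 250.0), (180, 42.0)]
cleaned_cpu = preprocess(cpu, low=0.0, high=100.0)
```

Here the null at cycle 60 and the out-of-range reading at cycle 120 are both replaced by the last valid reading, and the repeated record at cycle 60 is dropped.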
  • The collected original data can be imported into a Kafka queue so that it can be consumed repeatedly; for example, the original data is copied into two copies, one used to train the model offline and the other used to calculate the quality of service in real time.
  • Step 302 Determine a characteristic value based on the machine performance data and the network characteristic data.
  • The characteristic values of the machine performance data may include one or more of the average, maximum value, or variance of the machine performance data of each dimension. Specifically, these are the average, maximum, or variance of the CPU utilization; of the remaining memory; of the load; of the iowait value; and of the ioutil value.
  • The characteristic value of the network characteristic data may include at least one preset quantile value of the network characteristic data of each dimension, for example the 25th, 50th, and 75th quantile values of the ping data and the 25th, 50th, and 75th quantile values of the poll data.
  • For example, the average value of CPU utilization is the average of the CPU utilization of all service nodes in the same CDN service system collected in the same collection cycle.
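As a rough illustration of Step 302, the feature values for one collection cycle could be computed with Python's standard `statistics` module. The function names, the dictionary keys, the choice of population variance, and the "inclusive" quantile method are assumptions of this sketch; the text does not specify them.

```python
import statistics


def machine_features(values):
    """Mean, maximum, and variance of one machine-performance dimension,
    taken over all service nodes in the same collection cycle."""
    return {
        "mean": statistics.fmean(values),
        "max": max(values),
        # Population variance is an assumption; the text only says "variance".
        "variance": statistics.pvariance(values),
    }


def network_features(values):
    """25th, 50th, and 75th quantile values of one network dimension."""
    q25, q50, q75 = statistics.quantiles(values, n=4, method="inclusive")
    return {"q25": q25, "q50": q50, "q75": q75}


# CPU utilization of three service nodes and ping latency of five probes.
cpu_feats = machine_features([20.0, 40.0, 60.0])
ping_feats = network_features([10, 20, 30, 40, 50])
```

Concatenating such dictionaries across all dimensions (CPU, memory, load, ping, poll, and so on) would give the feature vector of one time step.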
  • Step 303 Determine a label based on the quality monitoring data.
  • Labels can intuitively reflect the level of service quality, and for different types of business, the indicators used to assess the quality of service are different. According to the type of business to which the service quality evaluation model applies, the corresponding evaluation index is used to determine the label.
  • The step of determining a label based on the quality monitoring data specifically includes: determining an evaluation indicator of service quality based on the service type to which the service quality evaluation model applies, using the quality monitoring data to calculate the value of the evaluation indicator, and taking that value as the label. For example, for a service quality evaluation model suitable for on-demand services, the lagging rate can be used as the evaluation indicator, and the lagging rate value calculated from the quality monitoring data is used as the label for model training.
  • The embodiment of the present invention does not specifically limit the evaluation indicators used as labels during model training.
  • The collected quality monitoring data is a relatively large amount of original data that does not directly reflect the level of service quality. A series of calculations is therefore required to obtain the value of the service quality evaluation index, and the calculated value is used as the label for model training.
  • Step 304 Use the feature value and the label to establish a training set.
  • Each training sample includes the feature values of n time steps and their corresponding label, where n is a positive integer; each training sample corresponds to one label value.
  • The original data can be aggregated according to the number of time steps in a training sample: the original data collected in n collection cycles is grouped together, the characteristic values of the machine performance data and network characteristic data are calculated for each collection cycle, and the label is calculated from the quality monitoring data obtained over the n collection cycles, which yields a training sample containing n time steps of data.
  • A large number of training samples can be obtained from the collected original data and divided into the training set, the verification set, and the test set according to a preset division ratio, for example 60%, 20%, and 20%.
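The sample construction and split described in Step 304 can be sketched as follows. The helper names `build_samples` and `split_samples`, the use of non-overlapping windows, and the `label_fn` callback (here a plain mean standing in for whatever evaluation indicator applies) are assumptions for illustration only.

```python
import statistics


def build_samples(cycle_features, cycle_quality, n, label_fn):
    """Group every n consecutive collection cycles into one training sample.

    cycle_features: one feature vector per collection cycle
    cycle_quality:  one quality measurement per collection cycle
    label_fn:       hypothetical function mapping n quality values to a label
    """
    samples = []
    for start in range(0, len(cycle_features) - n + 1, n):
        window = cycle_features[start:start + n]
        label = label_fn(cycle_quality[start:start + n])
        samples.append((window, label))
    return samples


def split_samples(samples, ratios=(0.6, 0.2, 0.2)):
    """Split samples into training, verification, and test sets in order."""
    n_train = round(len(samples) * ratios[0])
    n_verify = round(len(samples) * ratios[1])
    train = samples[:n_train]
    verify = samples[n_train:n_train + n_verify]
    test = samples[n_train + n_verify:]
    return train, verify, test


# 20 collection cycles with a single feature each, grouped with n = 2.
features = [[float(i)] for i in range(20)]
quality = [float(i) for i in range(20)]
samples = build_samples(features, quality, n=2, label_fn=statistics.fmean)
train_set, verify_set, test_set = split_samples(samples)
```

With 20 cycles and n = 2 this yields 10 samples, split 6 / 2 / 2. A real pipeline would likely shuffle or split by time range; that choice is left open here.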
  • the training set is used to train the model; the verification set is used to verify the trained model and select the model with higher accuracy; the test set is used to further test and optimize the model selected by the verification set.
  • Step 305 Use the training set to train a deep neural network model to obtain a service quality evaluation model.
  • The deep neural network model can be an LSTM (Long Short-Term Memory) neural network model.
  • The LSTM neural network is a type of recurrent neural network.
  • the LSTM neural network model used in the embodiment of the present invention includes at least one layer of neural network, and each layer of neural network includes forget gate, input gate, output gate, neuron state and output result, and their expressions are:
  • $i_t = \sigma_g(W_i x_t + U_i c_{t-1} + b_i)$;
  • $o_t = \sigma_g(W_o x_t + U_o c_{t-1} + b_o)$;
  • $f_t$ represents the forget gate
  • $i_t$ denotes the input gate
  • $o_t$ denotes the output gate
  • $c_t$ represents the neuron (cell) state
  • $h_t$ represents the output result
  • $\sigma_g$, $\sigma_c$, and $\sigma_h$ represent the activation functions
  • $x_t$ represents the input data at time t
  • $W_f$, $W_i$, $W_o$, $W_c$, $U_f$, $U_i$, and $U_o$ respectively represent the weight matrices
  • $b_f$, $b_i$, $b_o$, and $b_c$ respectively represent the offset vectors.
  • $\sigma_g$ may be a sigmoid function
  • $\sigma_c$ and $\sigma_h$ may be tanh functions.
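Only the input-gate and output-gate expressions survive in this text. For reference, the standard peephole LSTM formulation that matches the listed symbols (three $U$ matrices acting on $c_{t-1}$, no $U_c$, and a bias $b_c$) is given below. The expressions for $f_t$, $c_t$, and $h_t$ are an assumption based on that standard form, not text recovered from the source; $\circ$ denotes element-wise multiplication.

```latex
\begin{aligned}
f_t &= \sigma_g(W_f x_t + U_f c_{t-1} + b_f) \\
i_t &= \sigma_g(W_i x_t + U_i c_{t-1} + b_i) \\
o_t &= \sigma_g(W_o x_t + U_o c_{t-1} + b_o) \\
c_t &= f_t \circ c_{t-1} + i_t \circ \sigma_c(W_c x_t + b_c) \\
h_t &= o_t \circ \sigma_h(c_t)
\end{aligned}
```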
  • The settings of the same parameter may differ between layers of the neural network; for example, $\sigma_g$ in the first layer and $\sigma_g$ in the second layer may be set differently.
  • The process of training a multi-layer LSTM neural network model with the training set is as follows: first, the training set is input into the first layer of the LSTM neural network model and propagated to obtain an output result; the current output result is then input into the next layer and propagated to obtain a new output result. If that layer is the last layer, this step ends; otherwise it is repeated. The error between the output of the last layer and the label is then determined, and the error is back-propagated to optimize the model parameters.
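The layer-by-layer forward propagation described above can be sketched in plain Python. This is a minimal illustration under several assumptions: the cell update follows the standard peephole LSTM form (gates read $c_{t-1}$ through the $U$ matrices), each layer passes its full output sequence to the next layer, the last layer has one unit whose final output is read as the predicted quality value, and all names (`init_params`, `lstm_layer`, `forward`) are hypothetical. Back propagation is omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def gate(W, U, b, x, c_prev):
    """sigma_g(W x_t + U c_{t-1} + b) with sigma_g taken as the sigmoid."""
    return [sigmoid(wx + uc + bk)
            for wx, uc, bk in zip(matvec(W, x), matvec(U, c_prev), b)]


def init_params(in_dim, hidden, rng):
    """Random small weights and zero biases for one layer (illustrative only)."""
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    p = {}
    for g in "fio":
        p["W" + g] = mat(hidden, in_dim)
        p["U" + g] = mat(hidden, hidden)
        p["b" + g] = [0.0] * hidden
    p["Wc"], p["bc"] = mat(hidden, in_dim), [0.0] * hidden
    return p


def lstm_layer(inputs, p):
    """Propagate a sequence x_1..x_n through one layer; return h_1..h_n."""
    c = [0.0] * len(p["bc"])
    outputs = []
    for x in inputs:
        f = gate(p["Wf"], p["Uf"], p["bf"], x, c)
        i = gate(p["Wi"], p["Ui"], p["bi"], x, c)
        o = gate(p["Wo"], p["Uo"], p["bo"], x, c)
        cand = [math.tanh(a + b) for a, b in zip(matvec(p["Wc"], x), p["bc"])]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, cand)]
        outputs.append([ok * math.tanh(ck) for ok, ck in zip(o, c)])
    return outputs


def forward(sample, layers):
    """Feed one training sample (n time steps) through all layers in order."""
    seq = sample
    for p in layers:
        seq = lstm_layer(seq, p)
    return seq[-1]  # output of the last layer at the last time step


rng = random.Random(0)
layers = [init_params(3, 4, rng), init_params(4, 1, rng)]  # two-layer model
sample = [[0.1 * t, 0.2, 0.3] for t in range(10)]          # n = 10 time steps
prediction = forward(sample, layers)
```

The error between `prediction` and the sample's label would then drive the parameter update; in practice a framework such as TensorFlow computes those gradients automatically.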
  • the embodiment of the present invention may use a two-layer structure LSTM neural network model.
  • the following takes the two-layer structure LSTM neural network model as an example to illustrate the training process of the model.
  • For example, n = 10.
  • the value of n can be set according to an experience value, or can be set through self-learning.
  • the embodiment of the present invention does not specifically limit the value of n.
  • the propagation process of h n in the second layer neural network is similar to the propagation process of the training set in the first layer neural network, and will not be repeated here.
  • Finally, the quality of service data is output, and the error is calculated from the output quality of service data and the labels in the training set.
  • This error can be expressed as a loss function and is back-propagated through the model: the partial derivatives of the error with respect to the parameters (the weight matrices and the offset vectors) are computed, and the parameters are adjusted according to these partial derivatives, thereby optimizing the model.
  • The training samples can also be trained in batches, where each batch of training samples constitutes a training set that is trained according to the above method to update the model parameters. For example, the first batch of training samples is used for training and the model parameters are updated, then the second batch is used and the parameters are updated again, and so on for each batch until training with the last batch ends.
  • After the model has been trained with the training set, the verification set can be used to verify the accuracy of the trained model, that is, to calculate the error between the model output and the real result (the label). If the accuracy does not meet the requirements, the hyperparameters are adjusted and the adjusted model is retrained; iterating this loop selects a model with higher accuracy. The test set can then be used to further test and optimize the model selected with the verification set, that is, to calculate the error between the model output and the real result and then back-propagate the loss function to optimize the model parameters.
  • The model training node can be a single server or a server group.
  • When the model training node is a single server, the above training process is performed entirely by that server. If the training process involves a large amount of data processing, it can be performed by a server group.
  • the model training nodes may include big data nodes and deep learning nodes.
  • the big data node is used to preprocess the collected original data, and use the original data to establish a training set.
  • the deep learning node is used to train the model using the training set to obtain a service quality evaluation model.
  • the big data node and the deep learning node can be a single server or a server group.
  • the Hadoop distributed file system can be used to save the training set, and the deep learning node can read the training set from the Hadoop distributed file system for model training.
  • TensorFlow can be used to train the model.
  • The embodiments of the present invention train a model using machine performance data, network feature data, and quality monitoring data, so that the model learns the non-linear relationship between machine performance data, network feature data, and service quality. When the model is used to evaluate the service quality of a service system, only the machine performance data and network feature data of the service system need to be input. Compared with evaluating service quality through server-side access logs, this reduces the input data and greatly reduces the computing resources and bandwidth required for evaluation, which both improves the efficiency of service quality evaluation and lowers operating costs;
  • Compared with evaluating service quality through server-side access logs, the embodiments of the present invention use machine performance data and network feature data as model inputs. These data are decoupled from specific services, so a universal set of service quality evaluation standards can be formed, which facilitates management of the service system;
  • Compared with evaluating service quality through manual analysis, the embodiments of the present invention do not rely on human experience, can build the model automatically using machine learning, and achieve higher accuracy.
  • the trained service quality assessment model may be deployed online for application, and the method of using the service quality assessment model to perform service quality assessment is as follows.
  • Referring to FIG. 4, a flowchart of a service quality evaluation method provided by an embodiment of the present invention is shown.
  • the method may be applied to a quality evaluation node, that is, executed by a quality evaluation node.
  • the method may specifically include the following steps.
  • Step 401 Collect machine performance data and network feature data of the service node.
  • the quality evaluation node may collect machine performance data and network characteristic data in the service system to be evaluated for service quality, for use in service quality evaluation.
  • When the quality evaluation node collects the raw data, it can collect data for a preset number of periods at a fixed period.
  • the preset number of cycles needs to be not less than the number of time steps required by the sample. For example, when training a model, the training samples used include data with 10 time steps, and the preset number of cycles needs to be not less than 10.
  • Step 402 Determine a characteristic value based on the machine performance data and the network characteristic data.
  • This step is similar to the calculation process of the feature value in the above model training process, and will not be described here.
  • Step 403 Input the feature value into the trained service quality evaluation model to obtain a service quality evaluation result.
  • From the calculated feature values, the feature values of a preset number of time steps are selected and input into the trained service quality evaluation model, which outputs the service quality evaluation result after computation.
  • the number of time steps of the feature value used for service quality evaluation is equal to the number of time steps included in the training sample when training the model.
  • the service quality assessment model After the service quality assessment model is deployed in an online application, it can be regularly tested and trained to further optimize model parameters and improve model accuracy.
  • The embodiments of the present invention train a model using machine performance data, network feature data, and quality monitoring data, so that the model learns the non-linear relationship between machine performance data, network feature data, and service quality. When the model is used to evaluate the service quality of a service system, only the machine performance data and network feature data of the service system need to be input. Compared with evaluating service quality through server-side access logs, this reduces the input data and greatly reduces the computing resources and bandwidth required for evaluation, which both improves the efficiency of service quality evaluation and lowers operating costs;
  • Compared with evaluating service quality through server-side access logs, the embodiments of the present invention use machine performance data and network feature data as model inputs. These data are decoupled from specific services, so a universal set of service quality evaluation standards can be formed, which facilitates management of the service system;
  • Compared with evaluating service quality through manual analysis, the embodiments of the present invention do not rely on human experience, can build the model automatically using machine learning, and achieve higher accuracy.
  • FIG. 5 is a structural block diagram of an apparatus for training a service quality evaluation model provided by an embodiment of the present invention.
  • The apparatus may be configured in a model training node or may be the model training node itself.
  • The apparatus may specifically include a collection module 501, a processing module 502, and a training module 503.
  • the collection module is used to collect machine performance data, network characteristic data and quality monitoring data of the service node according to a fixed period;
  • a processing module configured to determine a characteristic value based on the machine performance data and the network characteristic data
  • the processing module is also used to determine a label based on the quality monitoring data
  • the processing module is also used to establish a training set by using the feature value and the label;
  • the training module is used for training a deep neural network model using the training set to obtain a service quality evaluation model.
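  • Preferably, each service quality evaluation model is applicable to quality evaluation of one service type, and the collection module is specifically configured to collect, at a fixed period, quality monitoring data corresponding to one or more application services in the service node, the one or more application services belonging to the service type to which the service quality evaluation model applies.
  • Preferably, the machine performance data includes CPU utilization, remaining memory, load, iowait value, and ioutil value; the network feature data includes ping data, poll data, and download rate.
  • Preferably, the collection module is specifically configured to collect the network feature data of the service node from a monitoring node at a fixed period.
  • Preferably, the processing module is further configured to delete data with duplicate timestamps from the machine performance data, the network feature data, and the quality monitoring data, and to replace null values and abnormal values in those data with normal values or delete the null values and abnormal values.
  • Preferably, the processing module is specifically configured to screen out the null values and abnormal values by using a clustering algorithm or by setting a confidence interval after data standardization, and to replace the abnormal values by using the k-NN method or data collected in an adjacent collection period.
  • Preferably, the feature values of the machine performance data include one or more of the mean, maximum, or variance of each dimension of the machine performance data;
  • the feature values of the network feature data include at least one preset quantile of each dimension of the network feature data.
  • Preferably, the processing module is specifically configured to determine an evaluation metric of service quality based on the service type to which the service quality evaluation model applies, to calculate the value of the evaluation metric from the quality monitoring data, and to determine that value as the label.
  • Preferably, the deep neural network model is an LSTM neural network model.
  • Preferably, the LSTM neural network model includes at least one neural network layer, and each layer includes a forget gate, an input gate, an output gate, a cell state, and an output result, expressed respectively as:
  • $f_t = \sigma_g(W_f x_t + U_f c_{t-1} + b_f)$;
  • $i_t = \sigma_g(W_i x_t + U_i c_{t-1} + b_i)$;
  • $o_t = \sigma_g(W_o x_t + U_o c_{t-1} + b_o)$;
  • $c_t = f_t \circ c_{t-1} + i_t \circ \sigma_c(W_c x_t + b_c)$;
  • $h_t = o_t \circ \sigma_h(c_t)$;
  • where $f_t$ denotes the forget gate, $i_t$ the input gate, $o_t$ the output gate, $c_t$ the cell state, and $h_t$ the output result; $\sigma_g$, $\sigma_c$, and $\sigma_h$ denote activation functions; $x_t$ denotes the input data at time $t$; $W_f$, $W_i$, $W_o$, $W_c$, $U_f$, $U_i$, and $U_o$ denote weight matrices; and $b_f$, $b_i$, $b_o$, and $b_c$ denote bias vectors.
  • Preferably, when the LSTM neural network model includes multiple neural network layers, the settings of the same kind of parameter differ between layers.
  • Preferably, when the LSTM neural network model includes multiple neural network layers, the training module is specifically configured to: input the feature values in the training set into the first layer for propagation to obtain an output result; input the currently obtained output result into the next layer for propagation to obtain a new output result, ending this step if the next layer is the last layer and repeating it otherwise; determine the error between the output result of the last layer and the label; and back-propagate the error to optimize the model parameters.
  • Preferably, the training set includes multiple training samples, and each training sample includes a label and feature values of $n$ time steps, where $n$ is a positive integer;
  • the training module is specifically configured to input $x_t$ ($t = 1, 2, \ldots, n$) in sequence into the first layer of the LSTM neural network model, where $x_t$ is the matrix formed by the feature values of the $t$-th time step of all training samples in the training set, and to obtain the output result $h_n$.
  • Preferably, the processing module is further configured to build a validation set from the feature values and the label;
  • correspondingly, the training module is configured to input the feature values of the validation set into the trained model to obtain an output result, determine the error between the output result and the labels of the validation set, and, if the error does not meet the requirement, adjust the hyperparameters and retrain the adjusted model.
  • Preferably, the relationship between the input and the output result established by the service quality evaluation model is a non-linear relationship.
  • Preferably, the model training node is a single server or a server group.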
  • the model training node is a single server or server group.
  • It should be noted that the training apparatus for the service quality evaluation model provided in the above embodiment is described, for model training, only by way of the above division into functional modules. In practice, the above functions may be assigned to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above.
  • In addition, the training apparatus for the service quality evaluation model and the embodiments of the training method for the service quality evaluation model belong to the same concept; for the specific implementation process, refer to the method embodiments, which are not repeated here.
  • The training apparatus for the service quality evaluation model has the same beneficial effects as the training method for the service quality evaluation model; for the beneficial effects of the apparatus embodiment, refer to those of the method embodiment, which are not repeated here.
  • Referring to FIG. 6, a structural block diagram of a service quality evaluation apparatus provided by an embodiment of the present invention is shown.
  • The apparatus may be configured in a quality evaluation node or may be the quality evaluation node itself.
  • The apparatus may specifically include a collection module 601, a processing module 602, and an evaluation module 603.
  • the collection module 601 is used to collect machine performance data and network characteristic data for evaluating service quality
  • the processing module 602 is configured to determine a characteristic value based on the machine performance data and the network characteristic data;
  • the evaluation module 603 is configured to input the feature value into the trained quality of service evaluation model to obtain quality data.
  • It should be noted that the service quality evaluation apparatus provided in the above embodiment is described, for service quality evaluation, only by way of the above division into functional modules.
  • the above functions can be allocated by different functional modules according to needs. That is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above.
  • the service quality evaluation apparatus and the service quality evaluation method provided in the above embodiments belong to the same concept. For the specific implementation process, refer to the method embodiments, and details are not described here.
  • the service quality evaluation apparatus provided in the above embodiments has the same beneficial effects as the service quality evaluation method.
  • For the beneficial effects of the service quality evaluation apparatus embodiments, refer to the beneficial effects of the service quality evaluation method embodiments, which are not repeated here.
  • A person of ordinary skill in the art can understand that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium.
  • The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Cardiology (AREA)
  • Quality & Reliability (AREA)
  • Environmental & Geological Engineering (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

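The present invention discloses a training method and apparatus for a service quality evaluation model. The method includes: collecting machine performance data, network feature data, and quality monitoring data of service nodes at a fixed period; determining feature values based on the machine performance data and the network feature data; determining a label based on the quality monitoring data; building a training set from the feature values and the label; and training a deep neural network model with the training set to obtain a service quality evaluation model. Evaluating service quality with the service quality evaluation model provided by the present invention can improve the accuracy of the evaluation, reduce the input data, and greatly reduce the computing resources and bandwidth required for the evaluation, which both improves the efficiency of service quality evaluation and lowers operating costs.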

Description

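Training method and apparatus for a service quality evaluation model
Technical Field
The present invention relates to the field of CDN technologies, and in particular to a training method and apparatus for a service quality evaluation model.
Background
As CDN (Content Delivery Network) technology becomes increasingly widespread, CDN services are becoming larger and more complex, and customers place ever higher demands on the service quality of CDN service systems. To guarantee high-quality service, a CDN service system needs to know in real time the quality of the service it provides to customers and to discover and replace faulty nodes promptly, so that machine or network problems do not degrade service quality.
At present, one way to evaluate the service quality of a CDN service system is to analyze server-side access logs, for example by computing metrics such as the stall rate. Evaluating service quality from server-side access logs requires a large amount of computing resources to traverse the logs, which makes the equipment and bandwidth costs of internal operations very high. This approach is also tightly coupled to the service type: the evaluation metrics differ greatly between service types, so no unified standard can be formed, which makes internal management very difficult. Another way is to evaluate service quality from metrics such as machine performance and network conditions; this approach depends heavily on the experience of operations staff and is not very accurate.
Summary
To solve the problems in the prior art, embodiments of the present invention provide a training method and apparatus for a service quality evaluation model. The technical solutions are as follows: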
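In a first aspect, a training method for a service quality evaluation model is provided, applied to a model training node, the method including:
collecting machine performance data, network feature data, and quality monitoring data of a service node at a fixed period;
determining feature values based on the machine performance data and the network feature data;
determining a label based on the quality monitoring data;
building a training set from the feature values and the label;
training a deep neural network model with the training set to obtain a service quality evaluation model.
Optionally, each service quality evaluation model is applicable to quality evaluation of one service type;
correspondingly, collecting the quality monitoring data of the service node at a fixed period includes:
collecting, at a fixed period, quality monitoring data corresponding to one or more application services in the service node, the one or more application services belonging to the service type to which the service quality evaluation model applies.
Optionally, the machine performance data includes CPU utilization, remaining memory, load, iowait value, and ioutil value; the network feature data includes ping data, poll data, and download rate.
Optionally, the method further includes:
a monitoring node periodically sending a detection signal to the service node and obtaining the network feature data;
correspondingly, the step of collecting the network feature data of the service node at a fixed period includes:
collecting the network feature data of the service node from the monitoring node at a fixed period.
Optionally, before determining the feature values based on the machine performance data and the network feature data, the method includes:
deleting data with duplicate timestamps from the machine performance data, the network feature data, and the quality monitoring data;
replacing null values and abnormal values in the machine performance data, the network feature data, and the quality monitoring data with normal values, or deleting the null values and abnormal values.
Optionally, the step of replacing the abnormal values in the machine performance data, the network feature data, and the quality monitoring data with normal values includes:
screening out the null values and abnormal values in the machine performance data, the network feature data, and the quality monitoring data by using a clustering algorithm or by setting a confidence interval after data standardization;
replacing the abnormal values by using the k-NN method or data collected in an adjacent collection period.
Optionally, the feature values of the machine performance data include one or more of the mean, maximum, or variance of each dimension of the machine performance data;
the feature values of the network feature data include at least one preset quantile of each dimension of the network feature data.
Optionally, the step of determining a label based on the quality monitoring data includes:
determining an evaluation metric of service quality based on the service type to which the service quality evaluation model applies;
calculating the value of the evaluation metric from the quality monitoring data, and determining that value as the label.
Optionally, the deep neural network model is an LSTM neural network model.
Optionally, the LSTM neural network model includes at least one neural network layer, and each layer includes a forget gate, an input gate, an output gate, a cell state, and an output result, expressed respectively as:
$f_t = \sigma_g(W_f x_t + U_f c_{t-1} + b_f)$;
$i_t = \sigma_g(W_i x_t + U_i c_{t-1} + b_i)$;
$o_t = \sigma_g(W_o x_t + U_o c_{t-1} + b_o)$;
$c_t = f_t \circ c_{t-1} + i_t \circ \sigma_c(W_c x_t + b_c)$;
$h_t = o_t \circ \sigma_h(c_t)$;
where $f_t$ denotes the forget gate, $i_t$ the input gate, $o_t$ the output gate, $c_t$ the cell state, and $h_t$ the output result; $\sigma_g$, $\sigma_c$, and $\sigma_h$ denote activation functions; $x_t$ denotes the input data at time $t$; $W_f$, $W_i$, $W_o$, $W_c$, $U_f$, $U_i$, and $U_o$ denote weight matrices; and $b_f$, $b_i$, $b_o$, and $b_c$ denote bias vectors.
Optionally, when the LSTM neural network model includes multiple neural network layers, the settings of the same kind of parameter differ between layers.
Optionally, when the LSTM neural network model includes multiple neural network layers, the step of training the LSTM neural network model with the training set includes:
inputting the feature values in the training set into the first layer of the LSTM neural network model for propagation to obtain an output result;
inputting the currently obtained output result into the next layer for propagation to obtain a new output result; if the next layer is the last layer, ending this step, otherwise repeating this step;
determining the error between the output result of the last layer and the label;
back-propagating the error to optimize the model parameters.
Optionally, the training set includes multiple training samples, and each training sample includes a label and feature values of $n$ time steps, where $n$ is a positive integer;
the step of inputting the feature values in the training set into the first layer of the LSTM neural network model for propagation to obtain an output result includes:
inputting $x_t$ ($t = 1, 2, \ldots, n$) in sequence into the first layer of the LSTM neural network model, where $x_t$ is the matrix formed by the feature values of the $t$-th time step of all training samples in the training set, and obtaining the output result $h_n$.
Optionally, the method further includes: building a validation set from the feature values and the label;
after the step of training the deep neural network model with the training set, the method includes:
inputting the feature values of the validation set into the trained model to obtain an output result;
determining the error between the output result and the labels of the validation set;
if the error does not meet the requirement, adjusting the hyperparameters and retraining the adjusted model.
Optionally, the relationship between the input and the output result established by the service quality evaluation model is a non-linear relationship.
Optionally, the model training node is a single server or a server group.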
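In a second aspect, a training apparatus for a service quality evaluation model is provided, the apparatus including:
a collection module, configured to collect machine performance data, network feature data, and quality monitoring data of a service node at a fixed period;
a processing module, configured to determine feature values based on the machine performance data and the network feature data;
the processing module being further configured to determine a label based on the quality monitoring data;
the processing module being further configured to build a training set from the feature values and the label;
a training module, configured to train a deep neural network model with the training set to obtain a service quality evaluation model.
Optionally, each service quality evaluation model is applicable to quality evaluation of one service type;
the collection module is specifically configured to:
collect, at a fixed period, quality monitoring data corresponding to one or more application services in the service node, the one or more application services belonging to the service type to which the service quality evaluation model applies.
Optionally, the machine performance data includes CPU utilization, remaining memory, load, iowait value, and ioutil value; the network feature data includes ping data, poll data, and download rate.
Optionally, the collection module is specifically configured to:
collect the network feature data of the service node from a monitoring node at a fixed period.
Optionally, the processing module is further configured to:
delete data with duplicate timestamps from the machine performance data, the network feature data, and the quality monitoring data;
replace null values and abnormal values in the machine performance data, the network feature data, and the quality monitoring data with normal values, or delete the null values and abnormal values.
Optionally, the processing module is specifically configured to:
screen out the null values and abnormal values in the machine performance data, the network feature data, and the quality monitoring data by using a clustering algorithm or by setting a confidence interval after data standardization;
replace the abnormal values by using the k-NN method or data collected in an adjacent collection period.
Optionally, the feature values of the machine performance data include one or more of the mean, maximum, or variance of each dimension of the machine performance data;
the feature values of the network feature data include at least one preset quantile of each dimension of the network feature data.
Optionally, the processing module is specifically configured to:
determine an evaluation metric of service quality based on the service type to which the service quality evaluation model applies;
calculate the value of the evaluation metric from the quality monitoring data, and determine that value as the label.
Optionally, the deep neural network model is an LSTM neural network model.
Optionally, the LSTM neural network model includes at least one neural network layer, and each layer includes a forget gate, an input gate, an output gate, a cell state, and an output result, expressed respectively as:
$f_t = \sigma_g(W_f x_t + U_f c_{t-1} + b_f)$;
$i_t = \sigma_g(W_i x_t + U_i c_{t-1} + b_i)$;
$o_t = \sigma_g(W_o x_t + U_o c_{t-1} + b_o)$;
$c_t = f_t \circ c_{t-1} + i_t \circ \sigma_c(W_c x_t + b_c)$;
$h_t = o_t \circ \sigma_h(c_t)$;
where $f_t$ denotes the forget gate, $i_t$ the input gate, $o_t$ the output gate, $c_t$ the cell state, and $h_t$ the output result; $\sigma_g$, $\sigma_c$, and $\sigma_h$ denote activation functions; $x_t$ denotes the input data at time $t$; $W_f$, $W_i$, $W_o$, $W_c$, $U_f$, $U_i$, and $U_o$ denote weight matrices; and $b_f$, $b_i$, $b_o$, and $b_c$ denote bias vectors.
Optionally, when the LSTM neural network model includes multiple neural network layers, the settings of the same kind of parameter differ between layers.
Optionally, when the LSTM neural network model includes multiple neural network layers, the training module is specifically configured to:
input the feature values in the training set into the first layer of the LSTM neural network model for propagation to obtain an output result;
input the currently obtained output result into the next layer for propagation to obtain a new output result; if the next layer is the last layer, end this step, otherwise repeat this step;
determine the error between the output result of the last layer and the label;
back-propagate the error to optimize the model parameters.
Optionally, the training set includes multiple training samples, and each training sample includes a label and feature values of $n$ time steps, where $n$ is a positive integer;
the training module is specifically configured to:
input $x_t$ ($t = 1, 2, \ldots, n$) in sequence into the first layer of the LSTM neural network model, where $x_t$ is the matrix formed by the feature values of the $t$-th time step of all training samples in the training set, and obtain the output result $h_n$.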
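The embodiments of the present invention have the following beneficial effects:
(1) The embodiments of the present invention train a model using machine performance data, network feature data, and quality monitoring data, so that the model learns the non-linear relationship between machine performance data, network feature data, and service quality. When the model is used to evaluate the service quality of a service system, only the machine performance data and network feature data of the service system need to be input. Compared with evaluating service quality through server-side access logs, this reduces the input data and greatly reduces the computing resources and bandwidth required for evaluation, which both improves the efficiency of service quality evaluation and lowers operating costs;
(2) Compared with evaluating service quality through server-side access logs, the embodiments of the present invention use machine performance data and network feature data as model inputs. These data are decoupled from specific services, so a universal set of service quality evaluation standards can be formed, which facilitates management of the service system;
(3) Compared with evaluating service quality through manual analysis, the embodiments of the present invention do not rely on human experience, can build the model automatically using machine learning, and achieve higher accuracy.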
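Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for the description of the embodiments are briefly introduced below. The drawings described below are merely some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a network framework provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of another network framework provided by an embodiment of the present invention;
FIG. 3 is a flowchart of a training method for a service quality evaluation model provided by an embodiment of the present invention;
FIG. 4 is a flowchart of a service quality evaluation method provided by an embodiment of the present invention;
FIG. 5 is a structural block diagram of a training apparatus for a service quality evaluation model provided by an embodiment of the present invention;
FIG. 6 is a structural block diagram of a service quality evaluation apparatus provided by an embodiment of the present invention.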
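Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the drawings.
An embodiment of the present invention provides a training method for a service quality evaluation model, which can be applied in the network framework shown in FIG. 1. The network framework includes service nodes, monitoring nodes, and a model training node. A service node is a node in the CDN service system that provides services to users. A monitoring node is connected to the service nodes and sends detection signals to them to probe the network conditions between the monitoring node and each service node, generating network feature data. The monitoring node may also collect machine performance data and quality monitoring data from each service node. The model training node is connected to the monitoring nodes; it collects the machine performance data, network feature data, and quality monitoring data of each service node from the monitoring nodes and then trains a service quality evaluation model with the collected data. There may be one or more monitoring nodes; when there are several, each may be responsible for monitoring a subset of the service nodes. The model training node may be a single server or a server group. The model training node may include a processor, a memory, and a transceiver: the processor may be used to train the service quality evaluation model in the process described below, the memory may store the data required and generated during training, and the transceiver may receive and send the data involved in training.
An embodiment of the present invention further provides a service quality evaluation method, which can be applied in the network framework shown in FIG. 2. The network framework includes service nodes, monitoring nodes, and a quality evaluation node. A monitoring node probes the network conditions between itself and each service node, generates network feature data, and collects machine performance data from each service node. The quality evaluation node is connected to the monitoring nodes; it collects the machine performance data and network feature data of each service node from the monitoring nodes and then evaluates the service quality of the CDN service system using the collected data and the trained service quality evaluation model. The quality evaluation node may be a single server or a server group. The quality evaluation node may include a processor, a memory, and a transceiver: the processor may be used to evaluate service quality in the process described below, the memory may store the data required and generated during evaluation, and the transceiver may receive and send the data involved in evaluation. The model training node and the quality evaluation node may or may not be the same node.
It should be noted that the embodiments of the present invention are applicable not only to evaluating the service quality of a CDN service system but also to evaluating the service quality of a single server node, or of other service systems or clusters composed of multiple server nodes. The embodiments of the present invention do not specifically limit the scope of application.
Referring to FIG. 3, a flowchart of a training method for a service quality evaluation model provided by an embodiment of the present invention is shown. The method may be applied to a model training node, that is, executed by the model training node, and may specifically include the following steps.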
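Step 301: Collect machine performance data, network feature data, and quality monitoring data of the service nodes at a fixed period.
Training a service quality evaluation model consists of inputting feature values into the model to obtain an output result, adjusting the model parameters according to the error between the output result and the true result, and then continuing to train the adjusted model. Iterating this loop establishes the non-linear relationship between input and output, which yields the service quality evaluation model. The data used to determine the feature values may include machine performance data and network feature data. In a specific implementation, the data used to reflect service quality may also include other data; the embodiments of the present invention do not specifically limit this. The embodiments of the present invention may collect the machine performance data, network feature data, and quality monitoring data of each service node in the CDN service system at a fixed period. The machine performance data may include CPU utilization, remaining memory, iowait value, ioutil value, and so on. While the CDN service system is running, the monitoring node periodically sends detection signals to the service nodes to probe the network conditions between the monitoring node and each service node and obtains network feature data, so the network feature data of the service nodes can be collected from the monitoring node. The network feature data may include ping (Packet Internet Groper) data, poll data, download rate, and so on.
The machine performance data and quality monitoring data must come from the service nodes. However, if the model training node collected data directly from the service nodes, it would have to establish a large number of links to the service nodes in the CDN system. To avoid this, the monitoring node can first periodically collect the machine performance data and quality monitoring data from the service nodes, and the model training node can then obtain the machine performance data, network feature data, and quality monitoring data from the monitoring node at a fixed period. In another implementation, the service nodes and monitoring nodes may send the data needed for model training to a distributed storage system, from which the model training node obtains the data at a fixed period. The embodiments of the present invention do not specifically limit how the raw data is collected.
The quality monitoring data is used to calculate the evaluation metric of service quality, that is, the true result against which the model output is compared. The quality monitoring data may include information such as request response time and request content size. The quality monitoring data can be obtained from the log information of the service nodes.
With the method provided by the embodiments of the present invention, a trained service quality model is applicable to quality evaluation of one service type, and each service type includes multiple application services. Therefore, according to the service type to which the service quality evaluation model applies, quality monitoring data corresponding to application services of that type can be selected for model training. The embodiments of the present invention can thus use one general training method to train service quality models for the various service types. When collecting quality monitoring data, the data corresponding to one or more application services in the service node can be collected, the one or more application services belonging to the service type to which the service quality evaluation model applies. In other words, quality monitoring data need only be collected for a limited number of application services, not for all application services of that service type. For example, if the model to be trained applies to type-A services, and type-A services include application service A1, application service A2, ..., application service An, then only the quality monitoring data of application service A1 may be collected for subsequent model training, which reduces the data transmission pressure during collection and the subsequent data processing burden. In a specific implementation, a wider range of data may also be collected, and the quality monitoring data of preset application services in the service node then extracted from the large collected data set for model training.
In implementation, the machine performance data, network feature data, and quality monitoring data of multiple CDN service systems may be collected.
After the raw data is collected, it needs to be preprocessed. Preprocessing includes deleting data with duplicate timestamps from the raw data, screening out null values and abnormal values, and either replacing the null values and abnormal values with normal values or deleting them. Null values can be screened out of the raw data directly; abnormal values are screened out by using a clustering algorithm or by setting a confidence interval after data standardization. Replacement can use the k-NN method or data collected in an adjacent collection period. As an example of the latter, if the CPU utilization of node A collected in the current collection period is null or abnormal, it can be replaced with the CPU utilization of node A collected in the previous collection period.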
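The preprocessing steps above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the `(timestamp, value)` record layout, and the z-score threshold (used here as one simple stand-in for "a confidence interval after data standardization") are all assumptions.

```python
from statistics import mean, pstdev


def drop_duplicate_timestamps(records):
    """Keep the first (timestamp, value) record seen for each timestamp."""
    seen = set()
    unique = []
    for ts, value in records:
        if ts not in seen:
            seen.add(ts)
            unique.append((ts, value))
    return unique


def fill_missing_and_outliers(records, z_limit=3.0):
    """Replace None values and z-score outliers with the previous valid value.

    z_limit is an assumed threshold; records with nothing earlier to copy
    from are dropped instead of replaced.
    """
    valid = [v for _, v in records if v is not None]
    mu = mean(valid)
    sigma = pstdev(valid)
    cleaned = []
    previous = None
    for ts, value in records:
        is_bad = value is None or (sigma > 0 and abs(value - mu) / sigma > z_limit)
        if is_bad:
            if previous is None:
                continue  # no earlier period to borrow from, so delete
            value = previous
        cleaned.append((ts, value))
        previous = value
    return cleaned
```

Note that the outlier itself contributes to the mean and standard deviation in this sketch, so a looser threshold may be needed for short series; a clustering-based screen or k-NN replacement, both named in the text, would be alternatives.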
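In implementation, the collected raw data can be imported into a Kafka queue so that it can be consumed repeatedly; for example, the raw data can be duplicated into two copies, one used for offline model training and the other for real-time calculation of service quality.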
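Step 302: Determine feature values based on the machine performance data and the network feature data.
In implementation, the feature values needed for model training, that is, the feature values relevant to evaluating service quality, can be selected using statistical methods or in combination with human experience. In the embodiments of the present invention, the feature values of the machine performance data may include one or more of the mean, maximum, or variance of each dimension of the machine performance data, specifically: the mean, maximum, or variance of CPU utilization; of remaining memory; of load; of the iowait value; and of the ioutil value. The feature values of the network feature data may include at least one preset quantile of each dimension of the network feature data, for example the 25th, 50th, and 75th percentiles of the ping data and the 25th, 50th, and 75th percentiles of the poll data. Each feature value is calculated at the granularity of the CDN service system and of the collection period; for example, the mean CPU utilization is the mean of the CPU utilization of all service nodes in the same CDN service system collected within the same collection period.
Specifically, Hive SQL can be used to calculate the feature values.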
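As a rough illustration of the statistics named in step 302, the sketch below computes them in plain Python for one metric within one collection period. The text uses Hive SQL for this; the Python version, the function names, and the choice of the "inclusive" quantile method are assumptions made only for the example.

```python
from statistics import mean, pvariance, quantiles


def machine_features(values):
    """Mean, maximum, and variance of one machine metric across service nodes."""
    return {
        "mean": mean(values),
        "max": max(values),
        "variance": pvariance(values),
    }


def network_features(values):
    """25th, 50th, and 75th percentiles of one network metric."""
    q25, q50, q75 = quantiles(values, n=4, method="inclusive")
    return {"p25": q25, "p50": q50, "p75": q75}
```

For instance, `machine_features` would be called once per period with the CPU utilization of every service node in the same CDN service system, matching the granularity described above.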
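Step 303: Determine a label based on the quality monitoring data.
The label directly reflects the level of service quality, and different service types use different metrics to evaluate service quality. The label is determined using the evaluation metric that corresponds to the service type to which the service quality evaluation model applies. The step of determining the label based on the quality monitoring data specifically includes: determining the evaluation metric of service quality based on the service type to which the service quality evaluation model applies, then calculating the value of the evaluation metric from the quality monitoring data and determining that value as the label. For example, for a service quality evaluation model that applies to video-on-demand services, the stall rate can be used as the evaluation metric, and the stall rate calculated from the quality monitoring data is used as the label for model training. The embodiments of the present invention do not specifically limit the evaluation metric used for the label during model training.
The collected quality monitoring data is a large volume of raw data that cannot directly reflect the level of service quality, so a series of calculations is needed to obtain the value of the evaluation metric, and the calculated value is used as the label for model training.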
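Step 304: Build a training set from the feature values and the label.
After the feature values and labels are obtained from the raw data, they can be assembled into training samples. Each training sample includes the feature values of $n$ time steps and the corresponding label, where $n$ is a positive integer, and each training sample may correspond to one label value. Optionally, before the feature values and labels are calculated from the raw data, the raw data can be aggregated according to the number of time steps in a training sample; that is, the raw data collected within $n$ collection periods is gathered together, and then the feature values of the machine performance data and network feature data of each collection period, and the label corresponding to the quality monitoring data of the $n$ collection periods, are calculated, yielding a training sample that contains data for $n$ time steps.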
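A minimal sketch of assembling such samples is shown below. The function name, the `label_fn` callback, and the `stride` parameter are hypothetical; in particular, the text does not say whether consecutive samples overlap, so non-overlapping windows are assumed by default.

```python
def build_samples(feature_rows, monitoring_rows, n, label_fn, stride=None):
    """Group per-period data into samples of n time steps with one label each.

    feature_rows[i] holds the feature values of collection period i and
    monitoring_rows[i] holds that period's quality monitoring data.
    label_fn turns the monitoring data of n periods into a single label.
    """
    if len(feature_rows) != len(monitoring_rows):
        raise ValueError("feature and monitoring series must have equal length")
    stride = n if stride is None else stride
    samples = []
    for start in range(0, len(feature_rows) - n + 1, stride):
        window = feature_rows[start:start + n]
        label = label_fn(monitoring_rows[start:start + n])
        samples.append((window, label))
    return samples
```

Passing the label computation in as a callback keeps the sketch independent of the evaluation metric, which the text says depends on the service type.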
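A large number of training samples can be obtained from the collected raw data. The training samples can be divided into a training set, a validation set, and a test set according to a preset ratio, for example 60%, 20%, and 20%. The training set is used to train the model; the validation set is used to validate the trained model and select a model with higher accuracy; the test set is used to further test and optimize the model selected with the validation set.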
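The 60/20/20 division mentioned above could look like the following sketch. The function name and the seeded shuffle are assumptions; the text does not state whether samples are shuffled before splitting, and for time-ordered data a chronological split may be preferable.

```python
import random


def split_samples(samples, ratios=(0.6, 0.2, 0.2), seed=0):
    """Split samples into training, validation, and test sets by ratio."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # assumed; not specified in the text
    n_train = round(len(shuffled) * ratios[0])
    n_val = round(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, validation, test
```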
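Step 305: Train a deep neural network model with the training set to obtain a service quality evaluation model.
The deep neural network model may be an LSTM (Long Short-Term Memory) neural network model; an LSTM network is a kind of recurrent neural network. The LSTM neural network model used in the embodiments of the present invention includes at least one neural network layer, and each layer includes a forget gate, an input gate, an output gate, a cell state, and an output result, expressed respectively as:
$f_t = \sigma_g(W_f x_t + U_f c_{t-1} + b_f)$;
$i_t = \sigma_g(W_i x_t + U_i c_{t-1} + b_i)$;
$o_t = \sigma_g(W_o x_t + U_o c_{t-1} + b_o)$;
$c_t = f_t \circ c_{t-1} + i_t \circ \sigma_c(W_c x_t + b_c)$;
$h_t = o_t \circ \sigma_h(c_t)$;
where $f_t$ denotes the forget gate, $i_t$ the input gate, $o_t$ the output gate, $c_t$ the cell state, and $h_t$ the output result; $\sigma_g$, $\sigma_c$, and $\sigma_h$ denote activation functions; $x_t$ denotes the input data at time $t$; $W_f$, $W_i$, $W_o$, $W_c$, $U_f$, $U_i$, and $U_o$ denote weight matrices; and $b_f$, $b_i$, $b_o$, and $b_c$ denote bias vectors. Specifically, $\sigma_g$ may be the sigmoid function, and $\sigma_c$ and $\sigma_h$ may be the tanh function.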
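To make the five expressions concrete, the following sketch evaluates a single time step exactly as written above, with the gates driven by $c_{t-1}$ and with sigmoid and tanh as the activations. It is an illustration in plain Python lists, not the TensorFlow model the text refers to; the function names and the parameter dictionary layout are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(M, v):
    """Multiply a list-of-rows matrix by a list vector."""
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]


def gate(W, U, b, x_t, c_prev):
    """sigma_g(W x_t + U c_{t-1} + b), element by element."""
    wx = matvec(W, x_t)
    uc = matvec(U, c_prev)
    return [sigmoid(a + r + bias) for a, r, bias in zip(wx, uc, b)]


def lstm_step(x_t, c_prev, p):
    """One time step of the layer defined by the expressions above."""
    f_t = gate(p["W_f"], p["U_f"], p["b_f"], x_t, c_prev)
    i_t = gate(p["W_i"], p["U_i"], p["b_i"], x_t, c_prev)
    o_t = gate(p["W_o"], p["U_o"], p["b_o"], x_t, c_prev)
    # The candidate term has no recurrent input in the stated formula.
    cand = [math.tanh(a + bias) for a, bias in zip(matvec(p["W_c"], x_t), p["b_c"])]
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, cand)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t
```

Calling `lstm_step` repeatedly for $t = 1, \ldots, n$, feeding each returned `c_t` back in as `c_prev`, yields the final output $h_n$ of one layer.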
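When the LSTM neural network model includes multiple neural network layers, the settings of the same kind of parameter may differ between layers; for example, the $\sigma_g$ parameter in the first layer may be set differently from the $\sigma_g$ parameter in the second layer. The process of training a multi-layer LSTM neural network model with the training set is as follows: first, the training set is input into the first layer of the LSTM neural network model and propagated to obtain an output result; the currently obtained output result is input into the next layer and propagated to obtain a new output result, and if that next layer is the last layer this step ends, otherwise the step is repeated; the error between the output result of the last layer and the label is determined; and the error is back-propagated to optimize the model parameters.
Preferably, the embodiments of the present invention may use a two-layer LSTM neural network model. The following takes the two-layer LSTM neural network model as an example to illustrate the training process.
First, the training set is input into the first layer of the LSTM neural network model for propagation, where $x_t$ denotes the feature values in the training set, $t = 1, 2, \ldots, n$, and $n$ is the number of time steps contained in each training sample; for example, $n$ is 10. The value of $n$ may be set from experience or through self-learning; the embodiments of the present invention do not specifically limit the value of $n$.
The propagation of the training set in the first layer is specifically as follows: $x_t$ ($t = 1, 2, \ldots, n$) is input into the first layer of the LSTM neural network model in sequence, where $x_t$ is the matrix formed by the feature values of the $t$-th time step of all training samples in the training set, and the output result $h_n$ is obtained.
$h_n$ is then input as $x_t$ into the second layer for propagation. The propagation of $h_n$ in the second layer is similar to the propagation of the training set in the first layer and is not repeated here. After $h_n$ has propagated through the second layer, service quality data is output, and the error is calculated from the output service quality data and the labels in the training set. This error can be expressed as a loss function. The error is fed into the model for back propagation: partial derivatives of the error with respect to the model parameters (the weight matrices and bias vectors) are computed, and the parameters are then adjusted according to those partial derivatives, thereby optimizing the model.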
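As a hedged illustration of the parameter adjustment described above, the update can be written as follows. The mean squared error loss and the plain gradient descent step with learning rate $\eta$ are assumptions; the text names neither the loss function nor the optimizer. Here $\hat{y}_k$ is the model output for training sample $k$, $y_k$ is its label, and $m$ is the number of samples.

```latex
L(\theta) = \frac{1}{m} \sum_{k=1}^{m} \left( \hat{y}_k - y_k \right)^2,
\qquad
\theta \leftarrow \theta - \eta \, \frac{\partial L}{\partial \theta},
\qquad
\theta \in \{ W_f, W_i, W_o, W_c, U_f, U_i, U_o, b_f, b_i, b_o, b_c \}
```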
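In implementation, the training samples can also be used for training in batches. Each batch of training samples constitutes a training set, on which training is performed as described above and the model parameters are updated. For example, the first batch is used for training and the model parameters are updated, then the second batch is used and the parameters are updated again, and so on for each batch in turn until training with the last batch ends.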
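The batch-by-batch procedure can be sketched as below. The helper names are hypothetical, and `update_fn` stands in for one full training pass (forward propagation, error, back propagation) on a batch; it is not an API from any library.

```python
def iter_batches(samples, batch_size):
    """Yield consecutive batches; the last batch may be smaller."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


def train_in_batches(samples, batch_size, params, update_fn):
    """Apply update_fn(params, batch) once per batch, in order."""
    for batch in iter_batches(samples, batch_size):
        params = update_fn(params, batch)
    return params
```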
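After the model has been trained with the training set, the validation set can be used to check the accuracy of the trained model, that is, to calculate the error between the model output and the true result (the label). If the accuracy does not meet the requirement, the hyperparameters are adjusted and the adjusted model is retrained; this loop is iterated to select a model with higher accuracy. The test set can then be used to further test and optimize the model selected with the validation set, that is, the error between the model output and the true result is calculated and the loss is back-propagated to optimize the model parameters.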
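The validate-adjust-retrain loop can be sketched as follows. Everything here is illustrative: `train_fn` and `validation_error_fn` are hypothetical callbacks, the candidate list stands in for whatever hyperparameter adjustment strategy is used, and `max_error` represents the unspecified accuracy requirement.

```python
def select_model(candidates, train_fn, validation_error_fn, max_error):
    """Train with each hyperparameter setting until the validation error is acceptable.

    Returns the best setting tried as a dict with keys "hyperparams",
    "model", and "error", or None if candidates is empty.
    """
    best = None
    for hyperparams in candidates:
        model = train_fn(hyperparams)
        error = validation_error_fn(model)
        if best is None or error < best["error"]:
            best = {"hyperparams": hyperparams, "model": model, "error": error}
        if error <= max_error:
            break  # requirement met, so stop adjusting hyperparameters
    return best
```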
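The model training node may be a single server or a server group. When it is a single server, the above training process is performed entirely by that server. If the training process involves a large amount of data processing, it can be performed by a server group. Optionally, the model training node may include a big data node and a deep learning node. The big data node preprocesses the collected raw data and builds the training set from it. The deep learning node trains the model with the training set to obtain the service quality evaluation model. The big data node and the deep learning node may each be a single server or a server group. In implementation, the training set can be stored in the Hadoop Distributed File System, from which the deep learning node reads it for model training. Specifically, TensorFlow can be used to train the model.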
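The embodiments of the present invention have the following beneficial effects:
(1) The embodiments of the present invention train a model using machine performance data, network feature data, and quality monitoring data, so that the model learns the non-linear relationship between machine performance data, network feature data, and service quality. When the model is used to evaluate the service quality of a service system, only the machine performance data and network feature data of the service system need to be input. Compared with evaluating service quality through server-side access logs, this reduces the input data and greatly reduces the computing resources and bandwidth required for evaluation, which both improves the efficiency of service quality evaluation and lowers operating costs;
(2) Compared with evaluating service quality through server-side access logs, the embodiments of the present invention use machine performance data and network feature data as model inputs. These data are decoupled from specific services, so a universal set of service quality evaluation standards can be formed, which facilitates management of the service system;
(3) Compared with evaluating service quality through manual analysis, the embodiments of the present invention do not rely on human experience, can build the model automatically using machine learning, and achieve higher accuracy.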
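After training is complete, the trained service quality evaluation model can be deployed online. The method of evaluating service quality with the model is described below.
Referring to FIG. 4, a flowchart of a service quality evaluation method provided by an embodiment of the present invention is shown. The method may be applied to a quality evaluation node, that is, executed by the quality evaluation node, and may specifically include the following steps.
Step 401: Collect machine performance data and network feature data of the service nodes.
The quality evaluation node can collect the machine performance data and network feature data of the service system whose service quality is to be evaluated. When collecting the raw data, the quality evaluation node can collect data for a preset number of periods at a fixed period. The preset number of periods must be no less than the number of time steps required by a sample. For example, if the training samples used to train the model contain data for 10 time steps, the preset number of periods must be no less than 10.
Step 402: Determine feature values based on the machine performance data and the network feature data.
This step is similar to the calculation of feature values in the model training process described above and is not repeated here.
Step 403: Input the feature values into the trained service quality evaluation model to obtain a service quality evaluation result.
From the calculated feature values, the feature values of a preset number of time steps are selected and input into the trained service quality evaluation model, which outputs the service quality evaluation result after computation. The number of time steps of the feature values used for evaluation equals the number of time steps contained in the training samples used to train the model.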
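A small sketch of the selection in step 403 follows. The function name is hypothetical, and taking the most recent $n$ periods is an assumption; the text only requires that exactly as many time steps be selected as the training samples contained.

```python
def latest_window(feature_rows, n):
    """Return the feature values of the n most recent collection periods."""
    if len(feature_rows) < n:
        raise ValueError("at least n collection periods are required")
    return feature_rows[-n:]
```

The length check mirrors the requirement in step 401 that the preset number of collected periods be no less than the number of time steps in a sample.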
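After the service quality evaluation model is deployed online, it can be tested and retrained periodically to further optimize the model parameters and improve its accuracy.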
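The embodiments of the present invention have the following beneficial effects:
(1) The embodiments of the present invention train a model using machine performance data, network feature data, and quality monitoring data, so that the model learns the non-linear relationship between machine performance data, network feature data, and service quality. When the model is used to evaluate the service quality of a service system, only the machine performance data and network feature data of the service system need to be input. Compared with evaluating service quality through server-side access logs, this reduces the input data and greatly reduces the computing resources and bandwidth required for evaluation, which both improves the efficiency of service quality evaluation and lowers operating costs;
(2) Compared with evaluating service quality through server-side access logs, the embodiments of the present invention use machine performance data and network feature data as model inputs. These data are decoupled from specific services, so a universal set of service quality evaluation standards can be formed, which facilitates management of the service system;
(3) Compared with evaluating service quality through manual analysis, the embodiments of the present invention do not rely on human experience, can build the model automatically using machine learning, and achieve higher accuracy.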
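Referring to FIG. 5, a structural block diagram of a training apparatus for a service quality evaluation model provided by an embodiment of the present invention is shown. The apparatus may be configured in a model training node or may be the model training node itself, and may specifically include a collection module 501, a processing module 502, and a training module 503.
The collection module is configured to collect machine performance data, network feature data, and quality monitoring data of a service node at a fixed period;
the processing module is configured to determine feature values based on the machine performance data and the network feature data;
the processing module is further configured to determine a label based on the quality monitoring data;
the processing module is further configured to build a training set from the feature values and the label;
the training module is configured to train a deep neural network model with the training set to obtain a service quality evaluation model.
Preferably, each service quality evaluation model is applicable to quality evaluation of one service type;
the collection module is specifically configured to:
collect, at a fixed period, quality monitoring data corresponding to one or more application services in the service node, the one or more application services belonging to the service type to which the service quality evaluation model applies.
Preferably, the machine performance data includes CPU utilization, remaining memory, load, iowait value, and ioutil value; the network feature data includes ping data, poll data, and download rate.
Preferably, the collection module is specifically configured to:
collect the network feature data of the service node from a monitoring node at a fixed period.
Preferably, the processing module is further configured to:
delete data with duplicate timestamps from the machine performance data, the network feature data, and the quality monitoring data;
replace null values and abnormal values in the machine performance data, the network feature data, and the quality monitoring data with normal values, or delete the null values and abnormal values.
Preferably, the processing module is specifically configured to:
screen out the null values and abnormal values in the machine performance data, the network feature data, and the quality monitoring data by using a clustering algorithm or by setting a confidence interval after data standardization;
replace the abnormal values by using the k-NN method or data collected in an adjacent collection period.
Preferably, the feature values of the machine performance data include one or more of the mean, maximum, or variance of each dimension of the machine performance data;
the feature values of the network feature data include at least one preset quantile of each dimension of the network feature data.
Preferably, the processing module is specifically configured to:
determine an evaluation metric of service quality based on the service type to which the service quality evaluation model applies;
calculate the value of the evaluation metric from the quality monitoring data, and determine that value as the label.
Preferably, the deep neural network model is an LSTM neural network model.
Preferably, the LSTM neural network model includes at least one neural network layer, and each layer includes a forget gate, an input gate, an output gate, a cell state, and an output result, expressed respectively as:
$f_t = \sigma_g(W_f x_t + U_f c_{t-1} + b_f)$;
$i_t = \sigma_g(W_i x_t + U_i c_{t-1} + b_i)$;
$o_t = \sigma_g(W_o x_t + U_o c_{t-1} + b_o)$;
$c_t = f_t \circ c_{t-1} + i_t \circ \sigma_c(W_c x_t + b_c)$;
$h_t = o_t \circ \sigma_h(c_t)$;
where $f_t$ denotes the forget gate, $i_t$ the input gate, $o_t$ the output gate, $c_t$ the cell state, and $h_t$ the output result; $\sigma_g$, $\sigma_c$, and $\sigma_h$ denote activation functions; $x_t$ denotes the input data at time $t$; $W_f$, $W_i$, $W_o$, $W_c$, $U_f$, $U_i$, and $U_o$ denote weight matrices; and $b_f$, $b_i$, $b_o$, and $b_c$ denote bias vectors.
Preferably, when the LSTM neural network model includes multiple neural network layers, the settings of the same kind of parameter differ between layers.
Preferably, when the LSTM neural network model includes multiple neural network layers, the training module is specifically configured to:
input the feature values in the training set into the first layer of the LSTM neural network model for propagation to obtain an output result;
input the currently obtained output result into the next layer for propagation to obtain a new output result; if the next layer is the last layer, end this step, otherwise repeat this step;
determine the error between the output result of the last layer and the label;
back-propagate the error to optimize the model parameters.
Preferably, the training set includes multiple training samples, and each training sample includes a label and feature values of $n$ time steps, where $n$ is a positive integer;
the training module is specifically configured to:
input $x_t$ ($t = 1, 2, \ldots, n$) in sequence into the first layer of the LSTM neural network model, where $x_t$ is the matrix formed by the feature values of the $t$-th time step of all training samples in the training set, and obtain the output result $h_n$.
Preferably, the processing module is further configured to build a validation set from the feature values and the label;
correspondingly, the training module is configured to:
input the feature values of the validation set into the trained model to obtain an output result;
determine the error between the output result and the labels of the validation set;
if the error does not meet the requirement, adjust the hyperparameters and retrain the adjusted model.
Preferably, the relationship between the input and the output result established by the service quality evaluation model is a non-linear relationship.
Preferably, the model training node is a single server or a server group.
It should be noted that the training apparatus for the service quality evaluation model provided in the above embodiment is described, for model training, only by way of the above division into functional modules. In practice, the above functions may be assigned to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the training apparatus for the service quality evaluation model and the embodiments of the training method for the service quality evaluation model belong to the same concept; for the specific implementation process, refer to the method embodiments, which are not repeated here. The training apparatus also has the same beneficial effects as the training method; for the beneficial effects of the apparatus embodiment, refer to those of the method embodiment, which are likewise not repeated here.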
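Referring to FIG. 6, a structural block diagram of a service quality evaluation apparatus provided by an embodiment of the present invention is shown. The apparatus may be configured in a quality evaluation node or may be the quality evaluation node itself, and may specifically include a collection module 601, a processing module 602, and an evaluation module 603.
The collection module 601 is configured to collect the machine performance data and network feature data used to evaluate service quality;
the processing module 602 is configured to determine feature values based on the machine performance data and the network feature data;
the evaluation module 603 is configured to input the feature values into the trained service quality evaluation model described above to obtain quality data.
It should be noted that the service quality evaluation apparatus provided in the above embodiment is described, for service quality evaluation, only by way of the above division into functional modules. In practice, the above functions may be assigned to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the service quality evaluation apparatus and the embodiments of the service quality evaluation method belong to the same concept; for the specific implementation process, refer to the method embodiments, which are not repeated here. The service quality evaluation apparatus also has the same beneficial effects as the service quality evaluation method; for the beneficial effects of the apparatus embodiment, refer to those of the method embodiment, which are likewise not repeated here.
A person of ordinary skill in the art can understand that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like.
The above are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.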

Claims (29)

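  1. A training method for a service quality evaluation model, characterized in that the method is applied to a model training node and includes:
    collecting machine performance data, network feature data, and quality monitoring data of a service node at a fixed period;
    determining feature values based on the machine performance data and the network feature data;
    determining a label based on the quality monitoring data;
    building a training set from the feature values and the label;
    training a deep neural network model with the training set to obtain a service quality evaluation model.
  2. The method according to claim 1, characterized in that each service quality evaluation model is applicable to quality evaluation of one service type;
    correspondingly, collecting the quality monitoring data of the service node at a fixed period includes:
    collecting, at a fixed period, quality monitoring data corresponding to one or more application services in the service node, the one or more application services belonging to the service type to which the service quality evaluation model applies.
  3. The method according to claim 1, characterized in that the machine performance data includes CPU utilization, remaining memory, load, iowait value, and ioutil value; the network feature data includes ping data, poll data, and download rate.
  4. The method according to claim 1, characterized in that the method further includes:
    a monitoring node periodically sending a detection signal to the service node and obtaining the network feature data;
    correspondingly, the step of collecting the network feature data of the service node at a fixed period includes:
    collecting the network feature data of the service node from the monitoring node at a fixed period.
  5. The method according to claim 1, characterized in that, before determining the feature values based on the machine performance data and the network feature data, the method includes:
    deleting data with duplicate timestamps from the machine performance data, the network feature data, and the quality monitoring data;
    replacing null values and abnormal values in the machine performance data, the network feature data, and the quality monitoring data with normal values, or deleting the null values and abnormal values.
  6. The method according to claim 5, characterized in that the step of replacing the abnormal values in the machine performance data, the network feature data, and the quality monitoring data with normal values includes:
    screening out the null values and abnormal values in the machine performance data, the network feature data, and the quality monitoring data by using a clustering algorithm or by setting a confidence interval after data standardization;
    replacing the abnormal values by using the k-NN method or data collected in an adjacent collection period.
  7. The method according to claim 3, characterized in that
    the feature values of the machine performance data include one or more of the mean, maximum, or variance of each dimension of the machine performance data;
    the feature values of the network feature data include at least one preset quantile of each dimension of the network feature data.
  8. The method according to claim 1, characterized in that the step of determining a label based on the quality monitoring data includes:
    determining an evaluation metric of service quality based on the service type to which the service quality evaluation model applies;
    calculating the value of the evaluation metric from the quality monitoring data, and determining that value as the label.
  9. The method according to claim 1, characterized in that the deep neural network model is an LSTM neural network model.
  10. The method according to claim 9, characterized in that the LSTM neural network model includes at least one neural network layer, and each layer includes a forget gate, an input gate, an output gate, a cell state, and an output result, expressed respectively as:
    $f_t = \sigma_g(W_f x_t + U_f c_{t-1} + b_f)$;
    $i_t = \sigma_g(W_i x_t + U_i c_{t-1} + b_i)$;
    $o_t = \sigma_g(W_o x_t + U_o c_{t-1} + b_o)$;
    $c_t = f_t \circ c_{t-1} + i_t \circ \sigma_c(W_c x_t + b_c)$;
    $h_t = o_t \circ \sigma_h(c_t)$;
    where $f_t$ denotes the forget gate, $i_t$ the input gate, $o_t$ the output gate, $c_t$ the cell state, and $h_t$ the output result; $\sigma_g$, $\sigma_c$, and $\sigma_h$ denote activation functions; $x_t$ denotes the input data at time $t$; $W_f$, $W_i$, $W_o$, $W_c$, $U_f$, $U_i$, and $U_o$ denote weight matrices; and $b_f$, $b_i$, $b_o$, and $b_c$ denote bias vectors.
  11. The method according to claim 10, characterized in that, when the LSTM neural network model includes multiple neural network layers, the settings of the same kind of parameter differ between layers.
  12. The method according to claim 10, characterized in that, when the LSTM neural network model includes multiple neural network layers, the step of training the LSTM neural network model with the training set includes:
    inputting the feature values in the training set into the first layer of the LSTM neural network model for propagation to obtain an output result;
    inputting the currently obtained output result into the next layer for propagation to obtain a new output result; if the next layer is the last layer, ending this step, otherwise repeating this step;
    determining the error between the output result of the last layer and the label;
    back-propagating the error to optimize the model parameters.
  13. The method according to claim 12, characterized in that the training set includes multiple training samples, and each training sample includes a label and feature values of $n$ time steps, where $n$ is a positive integer;
    the step of inputting the feature values in the training set into the first layer of the LSTM neural network model for propagation to obtain an output result includes:
    inputting $x_t$ ($t = 1, 2, \ldots, n$) in sequence into the first layer of the LSTM neural network model, where $x_t$ is the matrix formed by the feature values of the $t$-th time step of all training samples in the training set, and obtaining the output result $h_n$.
  14. The method according to claim 1, characterized in that the method further includes: building a validation set from the feature values and the label;
    after the step of training the deep neural network model with the training set, the method includes:
    inputting the feature values of the validation set into the trained model to obtain an output result;
    determining the error between the output result and the labels of the validation set;
    if the error does not meet the requirement, adjusting the hyperparameters and retraining the adjusted model.
  15. The method according to claim 1, characterized in that the relationship between the input and the output result established by the service quality evaluation model is a non-linear relationship.
  16. The method according to claim 1, characterized in that the model training node is a single server or a server group.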
  17. 一种服务质量评估模型的训练装置,其特征在于,用于:
    采集模块,用于按照固定周期采集服务节点的机器性能数据、网络特征数据以及质量监控数据;
    处理模块,用于基于所述机器性能数据以及所述网络特征数据确定特征值;
    所述处理模块,还用于基于所述质量监控数据确定标签;
    所述处理模块,还用于利用所述特征值以及所述标签建立训练集;
    训练模块,用于利用所述训练集训练深度神经网络模型,得到服务质量评估模型。
  18. 根据权利要求17所述的装置,其特征在于,每个所述服务质量评估模型适用于一种业务类型的质量评估;
    所述采集模块,具体用于:
    按照固定周期采集服务节点中一种或多种应用服务对应的质量监控数据,所述一种或多种应用服务属于所述服务质量评估模型所适用的业务类型。
  19. 根据权利要求17所述的装置,其特征在于,所述机器性能数据用于cpu利用率、内存剩余量、负载、iowait值以及ioutil值;所述网络特征数据用于ping数据、poll数据以及下载速率。
  20. 根据权利要求17所述的装置,其特征在于,所述采集模块,具体用于:
    按照固定周期从监控节点中采集服务节点的网络特征数据。
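  21. The apparatus according to claim 17, wherein the processing module is further configured to:
    delete data with duplicate timestamps from the machine performance data, the network feature data, and the quality monitoring data; and
    replace null values and outliers in the machine performance data, the network feature data, and the quality monitoring data with normal values, or delete the null values and outliers.
  22. The apparatus according to claim 21, wherein the processing module is specifically configured to:
    screen out null values and outliers in the machine performance data, the network feature data, and the quality monitoring data using a clustering algorithm, or by standardizing the data and then setting a confidence interval; and
    replace the outliers using the k-NN method or data collected in adjacent collection periods.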
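    The cleaning steps of claims 21 and 22 can be sketched for one metric series as follows. Several choices here are assumptions rather than anything the claims fix: the first record per timestamp is kept, outliers are flagged by a z-score limit of 2.0 after standardization (standing in for the confidence interval), and a flagged value is replaced by the mean of its valid neighbours from adjacent collection periods. The clustering and k-NN alternatives are not shown.

```python
import statistics


def drop_duplicate_timestamps(records):
    """Keep the first (timestamp, value) record seen for each timestamp."""
    seen, kept = set(), []
    for timestamp, value in records:
        if timestamp not in seen:
            seen.add(timestamp)
            kept.append((timestamp, value))
    return kept


def flag_bad_values(values, z_limit=2.0):
    """Mark null values and values whose standardized distance exceeds z_limit."""
    present = [v for v in values if v is not None]
    mean = statistics.mean(present)
    std = statistics.pstdev(present)
    return [v is None or (std > 0 and abs(v - mean) / std > z_limit) for v in values]


def fill_from_neighbours(values, bad):
    """Replace each flagged value with the mean of its valid adjacent values."""
    cleaned = list(values)
    for i, is_bad in enumerate(bad):
        if is_bad:
            neighbours = [
                values[j] for j in (i - 1, i + 1)
                if 0 <= j < len(values) and not bad[j]
            ]
            cleaned[i] = statistics.mean(neighbours) if neighbours else None
    return cleaned
```

    A single extreme value inflates the standard deviation, so on short series a tighter limit or a robust statistic such as the median may work better.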
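  23. The apparatus according to claim 19, wherein
    the feature values of the machine performance data comprise one or more of the mean, the maximum, or the variance of each dimension of the machine performance data; and
    the feature values of the network feature data comprise at least one preset quantile of each dimension of the network feature data.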
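    The feature values named in claim 23 can be computed per dimension as in the sketch below. The population variance, the linear-interpolation quantile rule, and the default quantiles 0.5 and 0.9 are assumptions; the claim only requires "at least one preset quantile".

```python
import statistics


def machine_features(samples):
    """Mean, maximum, and variance of one machine performance dimension."""
    return {
        "mean": statistics.mean(samples),
        "max": max(samples),
        "variance": statistics.pvariance(samples),
    }


def quantile(samples, q):
    """Quantile q in [0, 1] using linear interpolation between sorted values."""
    ordered = sorted(samples)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def network_features(samples, quantiles=(0.5, 0.9)):
    """Preset quantiles of one network feature dimension (e.g. ping latency)."""
    return [quantile(samples, q) for q in quantiles]
```

    Quantiles suit network measurements such as ping latency because a few slow probes shift a high quantile without being averaged away.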
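  24. The apparatus according to claim 17, wherein the processing module is specifically configured to:
    determine an evaluation index of service quality based on the service type to which the service quality evaluation model is applicable; and
    compute the value of the evaluation index from the quality monitoring data, and take that value as the label.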
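  25. The apparatus according to claim 17, wherein the deep neural network model is an LSTM neural network model.
  26. The apparatus according to claim 25, wherein the LSTM neural network model comprises at least one neural network layer, each layer comprising a forget gate, an input gate, an output gate, a neuron state, and an output result, expressed respectively as:
    $f_t = \sigma_g(W_f x_t + U_f c_{t-1} + b_f)$;
    $i_t = \sigma_g(W_i x_t + U_i c_{t-1} + b_i)$;
    $o_t = \sigma_g(W_o x_t + U_o c_{t-1} + b_o)$;
    Figure PCTCN2018113031-appb-100003 (expression for the neuron state $c_t$)
    Figure PCTCN2018113031-appb-100004 (expression for the output result $h_t$)
    where $f_t$ denotes the forget gate, $i_t$ the input gate, $o_t$ the output gate, $c_t$ the neuron state, and $h_t$ the output result; $\sigma_g$, $\sigma_c$, and $\sigma_h$ denote activation functions; $x_t$ denotes the input data at time t; $W_f$, $W_i$, $W_o$, $W_c$, $U_f$, $U_i$, and $U_o$ denote weight matrices; and $b_f$, $b_i$, $b_o$, and $b_c$ denote bias vectors.
  27. The apparatus according to claim 26, wherein, when the LSTM neural network model comprises multiple neural network layers, parameters of the same type are set differently in each layer.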
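  28. The apparatus according to claim 26, wherein, when the LSTM neural network model comprises multiple neural network layers, the training module is specifically configured to:
    input the feature values in the training set into the first neural network layer of the LSTM neural network model for propagation to obtain an output result;
    input the currently obtained output result into the next neural network layer for propagation to obtain a new output result, and end this step if the next neural network layer is the last neural network layer, otherwise repeat this step;
    determine the error between the output result of the last neural network layer and the label; and
    back-propagate the error to optimize the model parameters.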
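  29. The apparatus according to claim 28, wherein the training set comprises a plurality of training samples, each training sample comprising a label and feature values for n time steps, where n is a positive integer;
    the training module is specifically configured to:
    input $x_t$ (t = 1, 2, ..., n) in sequence into the first neural network layer of the LSTM neural network model, where $x_t$ is the matrix formed by the feature values of the t-th time step of all training samples in the training set, and obtain the output result $h_n$.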
PCT/CN2018/113031 2018-10-17 2018-10-31 Training method and apparatus for service quality evaluation models WO2020077682A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US17/043,148 US20210027170A1 (en) 2018-10-17 2018-10-31 Training method and apparatus for service quality evaluation models
EP18937140.4A EP3863222A4 (en) 2018-10-17 2018-10-31 METHOD AND DEVICE FOR TRAINING A MODEL FOR EVALUATING THE QUALITY OF SERVICE

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811210366.6A CN109347668B (zh) 2018-10-17 2018-10-17 Training method and apparatus for service quality evaluation models
CN201811210366.6 2018-10-17

Publications (1)

Publication Number Publication Date
WO2020077682A1 (zh) 2020-04-23

Family

ID=65309213

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/113031 WO2020077682A1 (zh) 2018-10-17 2018-10-31 Training method and apparatus for service quality evaluation models

Country Status (4)

Country Link
US (1) US20210027170A1 (zh)
EP (1) EP3863222A4 (zh)
CN (1) CN109347668B (zh)
WO (1) WO2020077682A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112527783A (zh) * 2020-11-27 2021-03-19 中科曙光南京研究院有限公司 Hadoop-based data quality exploration system

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
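CN109710636B (zh) * 2018-11-13 2022-10-21 广东工业大学 Unsupervised industrial system anomaly detection method based on deep transfer learning
CN116032781A (zh) 2019-02-27 2023-04-28 华为技术有限公司 Artificial-intelligence-enhanced data sampling
CN111797851A (zh) * 2019-04-09 2020-10-20 Oppo广东移动通信有限公司 Feature extraction method and apparatus, storage medium, and electronic device
CN110058943B (zh) * 2019-04-12 2021-09-21 三星(中国)半导体有限公司 Memory optimization method and device for electronic device
FR3095100B1 (fr) * 2019-04-15 2021-09-03 Continental Automotive Method for predicting a signal and/or service quality and associated device
CN110276380B (zh) * 2019-05-22 2021-08-17 杭州电子科技大学 Real-time online exercise guidance system based on a deep model framework
US11558266B2 (en) * 2019-12-17 2023-01-17 Arbor Networks, Inc. Scoring network traffic service requests using response time metrics
US20210304039A1 (en) * 2020-03-24 2021-09-30 Hitachi, Ltd. Method for calculating the importance of features in iterative multi-label models to improve explainability
CN111211984B (zh) * 2020-04-20 2020-07-10 中国人民解放军国防科技大学 Method and apparatus for optimizing a CDN network, and electronic device
CN111679959A (zh) * 2020-04-23 2020-09-18 平安科技(深圳)有限公司 Computer performance data determination method and apparatus, computer device, and storage medium
CN112819152B (zh) * 2020-08-14 2024-03-01 腾讯科技(深圳)有限公司 Neural network training method and apparatus
CN112671633B (zh) * 2020-12-01 2022-08-23 重庆邮电大学 Binary-search heartbeat interval detection system and method based on BP neural network prediction
CN112817952A (zh) * 2021-01-20 2021-05-18 北京明略软件系统有限公司 Data quality evaluation method and system
CN113206824B (zh) * 2021-03-23 2022-06-24 中国科学院信息工程研究所 Dynamic network abnormal attack detection method and apparatus, electronic device, and storage medium
CN113084388B (zh) * 2021-03-29 2023-05-09 广州明珞装备股份有限公司 Welding quality detection method, system, apparatus, and storage medium
CN113204571B (zh) * 2021-04-23 2022-08-30 新华三大数据技术有限公司 SQL execution method and apparatus involving write operations, and storage medium
CN113592078A (zh) * 2021-08-09 2021-11-02 郑州大学 Artificial-intelligence-based deep learning network training method
CN114629802B (zh) * 2021-11-04 2023-12-08 国网浙江省电力有限公司湖州供电公司 Service-awareness-based quality evaluation method for power communication backbone networks
CN114760191B (zh) * 2022-05-24 2023-09-19 咪咕文化科技有限公司 Data service quality early-warning method, system, device, and readable storage medium
CN115099371B (zh) * 2022-08-24 2022-11-29 广东粤港澳大湾区硬科技创新研究院 LSTM anomaly detection method, apparatus, device, and storage medium
CN116245406B (zh) * 2023-02-09 2023-09-19 江苏省工商行政管理局信息中心 Software operation and maintenance quality evaluation method and system based on an operation and maintenance quality management database
CN116614431B (zh) * 2023-07-19 2023-10-03 中国电信股份有限公司 Data processing method and apparatus, electronic device, and computer-readable storage medium
CN118152693A (zh) * 2024-05-11 2024-06-07 华腾数云(北京)科技有限公司 Big data quality evaluation method, apparatus, device, and storage medium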

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
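CN102647299A (zh) * 2012-04-24 2012-08-22 网宿科技股份有限公司 Hierarchical alarm analysis method and system based on a content delivery network
CN104038392A (zh) * 2014-07-04 2014-09-10 云南电网公司 Cloud computing resource service quality evaluation method
CN105897736A (zh) * 2016-05-17 2016-08-24 北京邮电大学 Quality-of-experience evaluation method and apparatus for TCP video streaming services
CN106131865A (zh) * 2016-07-19 2016-11-16 浪潮软件集团有限公司 Network quality analysis method for high-speed railway lines
CN106651830A (zh) * 2016-09-28 2017-05-10 华南理工大学 Image quality testing method based on parallel convolutional neural networks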

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
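CN100396016C (zh) * 2006-03-01 2008-06-18 华为技术有限公司 System and method for guaranteeing service level in a content delivery network
CN101018164A (zh) * 2007-02-28 2007-08-15 西南科技大学 TCP/IP network performance evaluation and prediction method
CN103177017B (zh) * 2011-12-22 2016-01-06 阿里巴巴集团控股有限公司 Method and apparatus for training a service quality evaluation model
CN104270291B (zh) * 2014-10-22 2018-05-01 网宿科技股份有限公司 CDN network quality monitoring method
CN104702666B (zh) * 2015-01-30 2019-05-28 北京邮电大学 Quality-of-experience determination method and system
KR20180016739A (ko) * 2015-06-09 2018-02-20 올넷 브로커 에스피. 제트.오.오. Method of network traffic management in information and communication systems
CN105244020B (zh) * 2015-09-24 2017-03-22 百度在线网络技术(北京)有限公司 Prosodic hierarchy model training method, speech synthesis method, and apparatus
CN105513591B (zh) * 2015-12-21 2019-09-03 百度在线网络技术(北京)有限公司 Method and apparatus for speech recognition using an LSTM recurrent neural network model
CN106384197A (zh) * 2016-09-13 2017-02-08 北京协力筑成金融信息服务股份有限公司 Big-data-based service quality evaluation method and apparatus
US11018958B2 (en) * 2017-03-14 2021-05-25 Tupl Inc Communication network quality of experience extrapolation and diagnosis
CN108256695A (zh) * 2018-03-14 2018-07-06 北京邮电大学 Web service QoS prediction method based on historical information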

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3863222A4 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112527783A (zh) * 2020-11-27 2021-03-19 中科曙光南京研究院有限公司 Hadoop-based data quality exploration system
CN112527783B (zh) * 2020-11-27 2024-05-24 中科曙光南京研究院有限公司 Hadoop-based data quality exploration system

Also Published As

Publication number Publication date
CN109347668A (zh) 2019-02-15
EP3863222A1 (en) 2021-08-11
EP3863222A4 (en) 2021-12-01
CN109347668B (zh) 2020-11-06
US20210027170A1 (en) 2021-01-28

Similar Documents

Publication Publication Date Title
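WO2020077682A1 (zh) Training method and apparatus for service quality evaluation models
WO2020077672A1 (zh) Training method and apparatus for service quality evaluation models
CN111178456B (zh) Abnormal indicator detection method and apparatus, computer device, and storage medium
WO2021139235A1 (zh) System anomaly detection method, apparatus, device, and storage medium
CN108520357B (zh) Method, apparatus, and server for identifying causes of abnormal line loss
Auld et al. Bayesian neural networks for internet traffic classification
CN110335168B (zh) Method and system for optimizing a fault prediction model for power consumption information collection terminals based on GRU
TWI684139B (zh) System and method for base station anomaly prediction based on automatic learning
WO2021103823A1 (zh) Model update system, model update method, and related device
Wang et al. Dealing with alarms in optical networks using an intelligent system
CN105141446A (zh) Network device health evaluation method based on objective weight determination
CN113225346A (zh) Machine-learning-based network operation and maintenance situation assessment method
CN112433927A (zh) Cloud server aging prediction method based on time-series clustering and LSTM
CN117110748A (zh) Fusion-terminal-based method for detecting abnormal operating states of main substation equipment
Zhang et al. Cause-aware failure detection using an interpretable XGBoost for optical networks
CN115567269A (zh) Internet of Things anomaly detection method and system based on federated learning and deep learning
CN115051929A (zh) Network fault prediction method and apparatus based on a self-supervised target-aware neural network
CN114385403A (zh) Distributed collaborative fault diagnosis method based on a two-layer knowledge graph architecture
CN106789163A (zh) Method, apparatus, and system for monitoring power consumption information of network devices
WO2023093431A1 (zh) Model training method, apparatus, device, storage medium, and program product
CN113572639B (zh) Carrier network fault diagnosis method, system, device, and medium
CN112801815B (zh) Federated-learning-based fault early-warning method for power communication networks
CN115587612A (zh) Network fault prediction method and apparatus based on self-supervised hypergraphs
Ren et al. Triple: the interpretable deep learning anomaly detection framework based on trace-metric-log of microservice
CN110990236A (zh) SaaS software performance problem identification method based on hidden Markov random fields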

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18937140

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2018937140

Country of ref document: EP

Effective date: 20210504