WO2016033969A1 - Method and system for predicting service data amount and/or resource data amount - Google Patents

Method and system for predicting service data amount and/or resource data amount

Info

Publication number
WO2016033969A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
data set
amount
resource
clustering
Prior art date
Application number
PCT/CN2015/075995
Other languages
English (en)
French (fr)
Inventor
顾军
马达
张士蒙
高晶宝
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2016033969A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W16/00 Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W16/22 Traffic simulation tools or models

Definitions

  • The present invention relates to prediction technologies in the field of mobile communications, and in particular to a method and a prediction system for predicting the amount of service data and/or resource data, and to a computer storage medium.
  • LTE: Long Term Evolution.
  • The user behavior data generated by an LTE-based communication system is completely different from that of traditional 2G and 3G systems, and it contains more information about services and resources.
  • The LTE protocol provides faster uplink and downlink peak rates, so the usage of data services has greatly increased, and the amount of data generated on the wireless side and the core network side has grown exponentially. Traditional data analysis tools are therefore no longer applicable to such a large amount of data.
  • Embodiments of the present invention provide a method for predicting a service data amount and/or a resource data amount, a prediction system, and a computer storage medium, which solve the problem that existing data analysis methods cannot be applied to the growing data volume of services and/or resources and therefore cannot analyze and predict the amount of data for the services and/or resources.
  • An embodiment of the invention provides a method for predicting the amount of service data and/or the amount of resource data, including:
  • obtaining an expected amount of data for the service and/or resource.
  • Constructing the original data set of the service data amount and/or the resource data amount comprises:
  • constructing the original data set according to the consumption data of the service and/or the consumption data of the resource.
  • The original data set is subjected to dimensionality reduction preprocessing by principal component analysis to obtain a preprocessed data set.
  • The method further includes: normalizing the original data set before performing the dimensionality reduction preprocessing on it; and/or
  • normalizing the preprocessed data set.
  • Performing the initial clustering process on the preprocessed data set to obtain an initial clustering data set, and performing at least one accurate clustering process on the original data set according to the initial clustering data set to obtain the accurate clustering data set, includes:
  • performing the initial clustering process on the preprocessed data set to obtain an initial clustering data set;
  • performing an accurate clustering process on the original data set to obtain an accurate clustering data set.
  • Determining the prediction model according to the accurate clustering data set comprises:
  • fitting the basic data amount and the data amount to be predicted to determine a fitting function, and using the fitting function as the prediction model.
  • Obtaining an expected data amount of the service and/or resource includes:
  • predicting the service data amount and/or the resource data amount according to the selected fitting function to obtain the expected data amount of the service and/or resource.
  • The method further includes:
  • optimizing the network based on the expected data amount.
  • An embodiment of the invention provides a prediction system for the amount of service data and/or the amount of resource data, including:
  • a building module configured to construct an original data set of the service data amount and/or the resource data amount;
  • a preprocessing module configured to perform dimensionality reduction preprocessing on the original data set constructed by the building module to obtain a preprocessed data set;
  • a clustering module configured to perform an initial clustering process on the preprocessed data set obtained by the preprocessing module to obtain an initial clustering data set, and, according to the initial clustering data set, to perform at least one accurate clustering process on the original data set to obtain an accurate clustering data set;
  • a determining module configured to determine a prediction model according to the accurate clustering data set obtained by the clustering module;
  • a prediction module configured to obtain the expected data amount of the service and/or the resource according to the prediction model determined by the determining module.
  • The system further includes an acquisition module;
  • the determining module is further configured to determine a service and/or a resource to be predicted;
  • the acquisition module is configured to acquire, in at least one historical time period, consumption data of the service and/or consumption data of the resource determined by the determining module, and to use the consumption data of the service as the service data amount and the consumption data of the resource as the resource data amount;
  • the building module is further configured to construct the original data set according to the consumption data of the service and/or the consumption data of the resource acquired by the acquisition module.
  • The preprocessing module is further configured to normalize the original data set before performing the dimensionality reduction preprocessing on the original data set constructed by the building module; and/or to normalize the preprocessed data set after it is obtained.
  • The system further includes a calculation module, and the clustering module includes an initial clustering sub-module and an accurate clustering sub-module;
  • the initial clustering sub-module is configured to perform an initial clustering process on the pre-processed data set obtained by the pre-processing module according to an initial clustering method to obtain an initial clustering data set;
  • the calculating module is configured to calculate an initial clustering center according to the initial clustering data set obtained by the initial clustering sub-module;
  • the accurate clustering sub-module is configured to perform an accurate clustering process on the original data set constructed by the building module, according to an accurate clustering method and the initial clustering center calculated by the calculation module, to obtain an accurate clustering data set.
  • The determining module is further configured to determine a basic item and an item to be predicted in the accurate clustering data set obtained by the accurate clustering sub-module, and to determine the basic data amount and the data amount to be predicted according to the basic item and the item to be predicted;
  • the determining module is further configured to fit the basic data amount and the data amount to be predicted according to a gradient descent method, determine a fitting function, and use the fitting function as the prediction model.
  • The system further comprises:
  • a selection module configured to select different fitting functions according to different basic items;
  • the prediction module is further configured to predict the amount of service data and/or the amount of resource data according to a fitting function selected by the selection module to obtain an expected amount of data of the service and/or resource.
  • the embodiment of the invention provides a computer storage medium, wherein the computer storage medium stores executable instructions, and the executable instructions are used to execute the foregoing method for predicting the amount of service data and/or the amount of resource data.
  • Embodiments of the present invention provide a method for predicting a service data amount and/or a resource data amount, a prediction system, and a computer storage medium, in which the original data is preprocessed and then clustered, so as to implement multi-dimensional prediction of users, services, and resources and thereby provide a reference for the optimization of LTE network resources.
  • The initial clustering result is taken as the initial condition of the accurate clustering, which makes the distribution of the clustering results more scientific and accurate, and more in line with the relationships among data resources of different dimensions.
  • The prediction model of the present invention predicts better than direct fitting of the original data: the prediction error is reduced by more than 10%, and for some resources the reduction can reach 25%.
  • The present invention can reflect the overall characteristics and effects of all the original data through a small amount of data, which saves data analysis cost and reduces the algorithm complexity of the data analysis; the prediction result can provide a reference for the resource planning of the LTE network.
  • The invention is well suited to prediction for the LTE network: it provides a correlation algorithm for LTE data that realizes the prediction of channel resources and the analysis of the group behavior of users.
  • FIG. 1 is a flowchart of a method for predicting a service data amount and/or a resource data amount according to Embodiment 1 of the present invention.
  • FIG. 3 is a schematic structural diagram of a system for predicting a quantity of service data and/or a quantity of resource data according to Embodiment 2 of the present invention
  • FIG. 4 is a flowchart of a method for predicting a quantity of service data and/or a quantity of resource data according to Embodiment 3 of the present invention
  • FIG. 5 is a partial data set selected from sample data according to Embodiment 3 of the present invention.
  • FIG. 6 is a cluster data set obtained by clustering according to Embodiment 3 of the present invention.
  • FIG. 7 is a comparison diagram of a clustering prediction effect and a direct prediction effect of sample data according to Embodiment 3 of the present invention.
  • FIG. 11 is a flowchart of a method for predicting a quantity of service data and/or a quantity of resource data according to Embodiment 4 of the present invention.
  • FIG. 13 is a cluster data set obtained by clustering according to Embodiment 4 of the present invention.
  • FIG. 14 is a comparison diagram of a clustering prediction effect and a direct prediction effect of sample data according to Embodiment 4 of the present invention.
  • FIG. 17 is a comparison diagram of algorithm complexity according to Embodiment 4 of the present invention.
  • Embodiment 1:
  • FIG. 1 is a flowchart of a method for predicting a quantity of service data and/or a quantity of resource data according to Embodiment 1 of the present invention. As shown in FIG. 1, the method for predicting the amount of service data and/or the amount of resource data includes:
  • S101: Construct an original data set of the service data amount and/or the resource data amount.
  • The original data set can be constructed in the following ways:
  • In the first way, the service to be predicted is determined according to the actual prediction requirement. The consumption data of the service to be predicted is then acquired in at least one historical time period, for example by acquiring the consumption data from the LTE network base station through a network device or the like. The consumption data is used as the service data amount, and the original data set is constructed according to the consumption data.
  • In the second way, the resource to be predicted is determined according to the actual prediction requirement. The consumption data of the resource to be predicted is then acquired in at least one historical time period, for example by acquiring the consumption data from the LTE network base station through a network device or the like. The consumption data is used as the resource data amount, and the original data set is constructed according to the consumption data.
  • In the third way, the service and the resource to be predicted are determined according to the actual prediction requirement. The consumption data of the service and the consumption data of the resource to be predicted are then acquired in at least one historical time period, for example by acquiring the consumption data from the LTE network base station through a network device or the like. The consumption data of the service is used as the service data amount, the consumption data of the resource is used as the resource data amount, and the original data set is constructed according to the consumption data.
  • The original data set includes the consumption data acquired in at least one historical time period for the service and/or resource to be predicted. If the number of services and/or resources to be predicted is m and the number of historical time periods is N, the original data set is an N*m matrix, where m and N are positive integers.
  • The granularity of each historical time period is the same; for example, with a granularity of 1 hour, the duration of each historical time period is 1 hour.
  • The historical time periods can be selected according to actual needs, for example multiple historical time periods on the same day of each week, multiple historical time periods on each day of a week, or multiple historical time periods between 8 am and 8 pm every day for three consecutive weeks.
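  • As a minimal sketch of how such an N*m matrix might be assembled, the following assumes a hypothetical record layout of (time period, item name, consumed amount); the function name, the item names, and the sample values are illustrative only and do not come from the patent.

```python
# Hypothetical record layout: (time_period, item_name, consumed_amount).
def build_raw_data_set(records, items):
    """Return (periods, matrix); matrix is N x m with one row per
    historical time period and one column per service/resource in `items`."""
    periods = sorted({period for period, _, _ in records})
    row_of = {period: i for i, period in enumerate(periods)}
    col_of = {item: j for j, item in enumerate(items)}
    matrix = [[0.0] * len(items) for _ in periods]
    for period, item, amount in records:
        if item in col_of:  # ignore items that are not being predicted
            matrix[row_of[period]][col_of[item]] = float(amount)
    return periods, matrix

# Made-up hourly samples for two items over three one-hour periods.
records = [
    ("hour-08", "rrc_users", 120), ("hour-08", "dl_traffic", 3.5),
    ("hour-09", "rrc_users", 150), ("hour-09", "dl_traffic", 4.1),
    ("hour-10", "rrc_users", 90),  ("hour-10", "dl_traffic", 2.8),
]
periods, P = build_raw_data_set(records, ["rrc_users", "dl_traffic"])
```

  • Here N = 3 and m = 2, so `P` has three rows and two columns; missing measurements default to 0.0, which is a simplification a real pipeline would need to revisit.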
  • The original data set is subjected to dimensionality reduction preprocessing to obtain a preprocessed data set.
  • Principal component analysis is used to perform the dimensionality reduction preprocessing on the original data set.
  • Principal component analysis uses the idea of dimensionality reduction to convert a plurality of variables into a few integrated variables.
  • Suppose m variables, denoted Z1, Z2, ..., Zm, are used to describe the research objects, and these m variables form a random vector Z with mean μ and covariance matrix Σ.
  • The coefficient vectors of the linear combination are the eigenvectors corresponding to the eigenvalues of the covariance matrix Σ of the random vector Z, and Y1(x), Y2(x), ..., Yk(x) are the principal components obtained by linearly combining the original variables.
  • After the dimensionality reduction, the preprocessed data set is an N*k matrix; that is, the m-dimensional original data set is changed into a k-dimensional preprocessed data set, and the k integrated variables embody the characteristic information of the original m-dimensional variables, where k and N are positive integers.
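  • The reduction from N*m to N*k can be sketched with NumPy as below. This is a generic principal component analysis, not necessarily the exact procedure of the patent; the function name `pca_reduce` and the choice of k are assumptions made for illustration.

```python
import numpy as np

def pca_reduce(raw, k):
    """Project an N x m data set onto its first k principal components."""
    raw = np.asarray(raw, dtype=float)
    centered = raw - raw.mean(axis=0)          # subtract the column means
    cov = np.cov(centered, rowvar=False)       # m x m covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]      # indices of the k largest
    components = eigvecs[:, order]             # m x k eigenvector matrix
    return centered @ components               # N x k preprocessed data set

# Synthetic data only: 24 hourly rows, 5 services/resources.
rng = np.random.default_rng(0)
raw = rng.normal(size=(24, 5))
reduced = pca_reduce(raw, 2)
```

  • The columns of `reduced` are ordered by decreasing variance, so the first column carries the most information about the original m variables.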
  • Before the dimensionality reduction preprocessing is performed on the original data set to obtain the preprocessed data set, each item of consumption data in the original data set is normalized.
  • The normalization formula is shown in formula (1.2):
  • Here x represents the amount of data consumed by a given service or resource in the original data set in one historical time period, x̄ represents the average value for that service or resource over the historical time periods, N represents the number of historical time periods, and Z(x) represents the normalized consumption data. After normalization, the original data set is still an N*m matrix, where m and N are positive integers.
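  • Formula (1.2) itself is not reproduced in this text. The symbols it defines (a value x, its mean, the count N, and a result Z(x)) are consistent with a standard-score normalization, so the sketch below assumes that form; dividing by the population standard deviation (N rather than N-1) is likewise an assumption.

```python
import numpy as np

def zscore_columns(raw):
    """Standardize each column (service/resource) of an N x m data set."""
    raw = np.asarray(raw, dtype=float)
    mean = raw.mean(axis=0)
    std = raw.std(axis=0)        # population std, divides by N (assumed)
    std[std == 0] = 1.0          # constant columns map to 0 instead of NaN
    return (raw - mean) / std

z = zscore_columns([[1, 10], [2, 20], [3, 30]])
```

  • The result keeps the N*m shape of the input, and each column has zero mean and unit variance.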
  • The data in the preprocessed data set is also normalized, and the normalization formula is shown in formula (1.3):
  • Y(x) represents the data in the preprocessed data set;
  • minY(x) and maxY(x) respectively represent the minimum and maximum values of all data in the preprocessed data set;
  • Z(x) represents the normalized data.
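  • The symbols defined for formula (1.3) match ordinary min-max scaling, Z(x) = (Y(x) - minY(x)) / (maxY(x) - minY(x)). The sketch below assumes that form and, following the wording "all data in the preprocessed data set", takes the minimum and maximum over the whole matrix rather than per column.

```python
import numpy as np

def min_max_normalize(pre):
    """Scale all entries of the preprocessed data set into [0, 1]."""
    pre = np.asarray(pre, dtype=float)
    lo, hi = pre.min(), pre.max()      # global minimum and maximum (assumed)
    if hi == lo:
        return np.zeros_like(pre)      # degenerate case: all values equal
    return (pre - lo) / (hi - lo)

scaled = min_max_normalize([[-1.0, 2.0], [3.0, 0.0]])
```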
  • S103: Perform an initial clustering process on the preprocessed data set to obtain an initial clustering data set, and perform at least one accurate clustering process on the original data set according to the initial clustering data set to obtain an accurate clustering data set.
  • The purpose of the initial clustering process is to perform a preliminary analysis of the data and provide initial conditions for the at least one accurate clustering process, so that the results of the accurate clustering process are more scientific and accurate. For example, the preprocessed data set is clustered a first time, the average value of each class of data after the first clustering is calculated, and these average values are used as the initial clustering centers of the at least one accurate clustering process; the original data is then accurately clustered according to the initial clustering centers.
  • In the following, one initial clustering process and one accurate clustering process are taken as an example for description.
  • An initial clustering process is performed on the preprocessed data set to obtain an initial clustering data set; the preprocessed data set may be either the normalized preprocessed data set or a preprocessed data set that has not been normalized.
  • Any algorithm that can quickly, simply, and roughly cluster the data and yield clustering centers can be used for the initial clustering, including the Canopy algorithm, which is a simple and fast but not very accurate clustering method and can therefore be used as an auxiliary algorithm.
  • The principle of the Canopy algorithm is that each object is represented by a point in a multidimensional feature space, and a fast approximate distance measure and two distance thresholds T1 > T2 (T1 > 0, T2 > 0) are used to cluster the data.
  • The average value of each class of data in the initial clustering data set is calculated, and the average values are taken as the initial clustering centers of the accurate clustering process.
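  • A minimal sketch of Canopy clustering followed by the per-class averaging might look as follows. It uses plain Euclidean distance as a stand-in for the "fast approximate distance measure", and the function names and sample points are hypothetical.

```python
import math

def canopy(points, t1, t2):
    """Canopy clustering with loose threshold t1 and tight threshold t2."""
    assert t1 > t2 > 0
    remaining = list(range(len(points)))   # candidate canopy centers
    canopies = []
    while remaining:
        center = remaining[0]
        # Every point within t1 of the center joins this canopy (overlap allowed).
        members = [i for i in range(len(points))
                   if math.dist(points[center], points[i]) <= t1]
        canopies.append(members)
        # Points within t2 of the center can no longer start a new canopy.
        remaining = [i for i in remaining
                     if math.dist(points[center], points[i]) > t2]
    return canopies

def canopy_centers(points, canopies):
    """Average the members of each canopy to get initial cluster centers."""
    dims = len(points[0])
    return [[sum(points[i][d] for i in members) / len(members)
             for d in range(dims)] for members in canopies]

points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
groups = canopy(points, t1=1.0, t2=0.5)
centers = canopy_centers(points, groups)
```

  • With these sample points the two well-separated pairs form two canopies, and their means serve as the seeds for the subsequent accurate clustering.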
  • FIG. 2 is a flowchart of a K-means algorithm according to Embodiment 1 of the present invention.
  • The K-means algorithm is a hard clustering algorithm and a typical prototype-based objective function clustering method: it takes a distance from the data points to the prototype as the objective function to be optimized, and obtains the adjustment rule of the iterative operation by finding the extremum of that function.
  • The K-means algorithm uses the Euclidean distance as the similarity measure and seeks the optimal classification corresponding to a given set of initial cluster centers, such that the evaluation index is minimized.
  • K-means is a typical distance-based clustering algorithm: distance is used as the evaluation index of similarity, that is, the closer two objects are, the greater their similarity.
  • The K-means algorithm considers a cluster to be composed of objects that are close together, so obtaining compact and independent clusters is the ultimate goal. Through repeated iterations, with the center points corrected in each iteration, the algorithm finally converges and the data is clustered. Note that after clustering with the K-means algorithm, the clustering result is analyzed: if the variation of a certain dimension variable is very small (the variation range does not exceed 5%), this dimension variable has little significance for the clustering result and should be removed.
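  • The 5% rule can be sketched as a filter over the cluster centers. The text does not define how the "variation range" is measured, so this example assumes it is the spread (max minus min) of a dimension across the cluster centers, relative to the largest absolute value in that dimension; the function name and sample centers are hypothetical.

```python
def drop_flat_dimensions(centers, max_relative_range=0.05):
    """Return (kept_column_indices, reduced_centers)."""
    n_dims = len(centers[0])
    kept = []
    for d in range(n_dims):
        column = [row[d] for row in centers]
        scale = max(abs(v) for v in column)
        spread = max(column) - min(column)
        # Assumed definition: variation range = spread / largest magnitude.
        if scale > 0 and spread / scale > max_relative_range:
            kept.append(d)
    reduced = [[row[d] for d in kept] for row in centers]
    return kept, reduced

# Made-up centers: dimension 0 barely changes between clusters.
centers = [[1.00, 0.2, 5.0],
           [1.02, 0.9, 1.0],
           [0.99, 0.5, 3.0]]
kept, reduced = drop_flat_dimensions(centers)
```

  • Dimension 0 varies by about 3% across the three centers and is dropped; the other two dimensions are kept.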
  • After the accurate clustering process, an accurate clustering data set is obtained. If the accurate clustering divides the data of the m services and/or resources in the original data set into h classes, the accurate clustering data set is an h*m matrix, where h and m are positive integers and h does not exceed N.
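  • A minimal K-means sketch seeded with given initial centers is shown below. It assumes the seeds are expressed in the same m-dimensional space as the data being clustered (for example, the per-class means of the original rows grouped by the initial clustering); the function name and the toy data are illustrative.

```python
import numpy as np

def kmeans_from_centers(data, init_centers, max_iter=100, tol=1e-9):
    """K-means with Euclidean distance, started from supplied centers."""
    data = np.asarray(data, dtype=float)
    centers = np.asarray(init_centers, dtype=float).copy()
    for _ in range(max_iter):
        # Distance from every row to every center: shape (N, h).
        dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(len(centers)):
            if np.any(labels == j):              # keep old center if class is empty
                new_centers[j] = data[labels == j].mean(axis=0)
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:                          # centers stopped moving
            break
    return centers, labels

data = [[0, 0], [0, 1], [10, 10], [10, 11]]
centers, labels = kmeans_from_centers(data, init_centers=[[1, 1], [9, 9]])
```

  • The returned `centers` array is the h*m matrix of cluster centers (here h = 2, m = 2), which is what the later fitting step consumes.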
  • According to the accurate clustering data set, the prediction model can be determined.
  • In the accurate clustering data set, the basic item and the item to be predicted are determined; the basic data amount and the data amount to be predicted are determined according to the basic item and the item to be predicted; and the basic data amount and the data amount to be predicted are fitted according to the gradient descent method to determine a fitting function, which is used as the prediction model.
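  • As an illustration of fitting by gradient descent, the sketch below fits a straight line y = a*x + b between the cluster centers of a basic item (x) and those of an item to be predicted (y). The linear form, the mean-squared-error loss, the learning rate, and the sample values are all assumptions; the text does not specify the shape of the fitting function.

```python
def fit_linear_gd(xs, ys, lr=0.1, epochs=2000):
    """Fit y = a*x + b by batch gradient descent on mean squared error."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of (1/n) * sum((a*x + b - y)^2) with respect to a and b.
        grad_a = 2.0 / n * sum((a * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = 2.0 / n * sum((a * x + b - y) for x, y in zip(xs, ys))
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

# Made-up cluster centers of a basic item and of an item to be predicted.
basic = [0.0, 0.5, 1.0, 1.5, 2.0]
to_predict = [1.0, 2.0, 3.0, 4.0, 5.0]
a, b = fit_linear_gd(basic, to_predict)
predict = lambda x: a * x + b    # the fitting function used as the prediction model
```

  • Because the inputs are normalized in the earlier steps, a fixed learning rate of this size is usually stable; unnormalized data would need a smaller step or feature scaling.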
  • The prediction model may be a single-dimensional prediction model, in which one basic item is determined, or a multi-dimensional prediction model, in which at least two basic items are determined.
  • The basic item is a service or a resource, and the basic data amount is the cluster centers of that service or resource in all classes of the accurate clustering data set; the item to be predicted is a service or a resource, and the data amount to be predicted is the cluster centers of that service or resource in all classes of the accurate clustering data set.
  • Case 1: If the original data set is constructed for services, the accurate clustering data set is also for services. The basic item and the item to be predicted are then both services, that is, the basic item is a basic service and the item to be predicted is a service to be predicted. The cluster centers of the basic service in all classes are determined according to the basic service, the cluster centers of the service to be predicted in all classes are determined according to the service to be predicted, the two sets of cluster centers are fitted according to the gradient descent method to determine a fitting function, and the fitting function is used as the prediction model.
  • Case 2: If the original data set is constructed for resources, the basic item and the item to be predicted are both resources, that is, the basic item is a basic resource and the item to be predicted is a resource to be predicted. The cluster centers of the basic resource in all classes are determined according to the basic resource, the cluster centers of the resource to be predicted in all classes are determined according to the resource to be predicted, the two sets of cluster centers are fitted according to the gradient descent method to determine a fitting function, and the fitting function is used as the prediction model.
  • Case 3: If the original data set is constructed for both services and resources, the basic item can be either a service or a resource, and the item to be predicted can be either a service or a resource. That is, when the basic item is a basic service, the item to be predicted may be a service to be predicted or a resource to be predicted; when the basic item is a basic resource, the item to be predicted may likewise be a service to be predicted or a resource to be predicted. The cluster centers of the basic service or basic resource in all classes are determined according to the basic service or basic resource, the cluster centers of the service or resource to be predicted in all classes are determined according to the service or resource to be predicted, the two sets of cluster centers are fitted according to the gradient descent method to determine a fitting function, and the fitting function is used as the prediction model.
  • When different basic items are selected, the determined fitting functions also differ, and the most suitable fitting function may be selected according to the prediction evaluation parameters of the different fitting functions: the smaller the prediction evaluation parameter of a fitting function, the better its prediction result. The basic item of the fitting function with the smallest prediction evaluation parameter is the best basic item, and that fitting function is the most accurate.
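  • Selecting the fitting function with the smallest prediction evaluation parameter can be sketched as below. The text does not define the evaluation parameter, so mean squared error is used here as an assumed stand-in; the item names and the already-fitted (a, b) coefficients are hypothetical.

```python
def mean_squared_error(model, xs, ys):
    """Assumed prediction evaluation parameter for a linear model (a, b)."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def select_fitting_function(candidates, target):
    """candidates maps basic item -> (basic data amounts, fitted (a, b))."""
    scores = {item: mean_squared_error(model, xs, target)
              for item, (xs, model) in candidates.items()}
    best_item = min(scores, key=scores.get)   # smallest parameter wins
    return best_item, scores

target = [1.0, 3.0, 5.0]   # cluster centers of the item to be predicted
candidates = {
    "basic_item_A": ([0.0, 1.0, 2.0], (2.0, 1.0)),   # reproduces target exactly
    "basic_item_B": ([0.0, 1.0, 3.0], (1.5, 1.0)),   # leaves residual error
}
best_item, scores = select_fitting_function(candidates, target)
```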
  • S105: Obtain an expected data amount of the service and/or the resource according to the prediction model.
  • The expected data amount of the service and/or resource within the required prediction period can be obtained according to the prediction model.
  • Different fitting functions are selected according to different basic items, the service data amount and/or the resource data amount are predicted according to the selected fitting function, and the expected data amount of the service and/or resource within the required prediction interval is obtained.
  • The expected data amount can provide guidance for the network, thereby increasing the network's carrying capacity for increasingly rich data services.
  • The initial clustering result is used as the initial condition of the accurate clustering, which makes the distribution of the clustering results more scientific and accurate, and more in line with the relationships among data resources of different dimensions.
  • The prediction model of the present invention predicts better than direct fitting of the original data: the prediction error is reduced by more than 10%, and for some resources the reduction can reach 25%.
  • The present invention can reflect the overall characteristics and effects of all the original data through a small amount of data, which saves data analysis cost and reduces the algorithm complexity of the data analysis; the prediction result can provide a reference for the resource planning of the LTE network.
  • Embodiment 2:
  • FIG. 3 is a schematic structural diagram of a system for predicting a quantity of service data and/or a quantity of resource data according to Embodiment 2 of the present invention.
  • The prediction system includes a building module 1, a preprocessing module 2, a clustering module 3, a determining module 4, and a prediction module 5. The building module 1 is configured to construct an original data set of the service data amount and/or the resource data amount. The preprocessing module 2 is configured to perform dimensionality reduction preprocessing on the original data set constructed by the building module 1 to obtain a preprocessed data set. The clustering module 3 is configured to perform an initial clustering process on the preprocessed data set obtained by the preprocessing module 2 to obtain an initial clustering data set, and then, according to the initial clustering data set, to perform at least one accurate clustering process on the original data set constructed by the building module 1 to obtain an accurate clustering data set. The determining module 4 is configured to determine the prediction model according to the accurate clustering data set obtained by the clustering module 3, and the prediction module 5 is configured to obtain the expected data amount of the service and/or resource according to the prediction model determined by the determining module 4.
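  • The data flow between the five modules can be sketched as a small pipeline object. The class, its field names, and the toy lambda stand-ins are hypothetical and only show how the outputs of one module feed the next; they do not implement the patent's algorithms.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class PredictionSystem:
    build: Callable[[Any], Any]          # building module 1: source -> original data set
    preprocess: Callable[[Any], Any]     # preprocessing module 2: original -> preprocessed
    cluster: Callable[[Any, Any], Any]   # clustering module 3: (preprocessed, original) -> clusters
    determine: Callable[[Any], Any]      # determining module 4: clusters -> prediction model
    predict: Callable[[Any, Any], Any]   # prediction module 5: (model, query) -> expected amount

    def run(self, source, query):
        raw = self.build(source)
        pre = self.preprocess(raw)
        clusters = self.cluster(pre, raw)   # clustering sees both data sets
        model = self.determine(clusters)
        return self.predict(model, query)

# Toy stand-ins only, chosen so the data flow is easy to follow.
system = PredictionSystem(
    build=lambda src: [list(row) for row in src],
    preprocess=lambda raw: [row[:1] for row in raw],
    cluster=lambda pre, raw: raw,
    determine=lambda clusters: sum(r[1] for r in clusters) / sum(r[0] for r in clusters),
    predict=lambda slope, x: slope * x,
)
expected = system.run([(1, 2), (2, 4), (3, 6)], 10)
```

  • Passing both the preprocessed and the original data set to the clustering step mirrors the description: the initial clustering uses the former, while the accurate clustering operates on the latter.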
  • The prediction system further includes an obtaining module 6. The determining module 4 is further configured to determine the service and/or resource to be predicted. The obtaining module 6 is configured to acquire, in at least one historical time period, the consumption data of the service and/or the consumption data of the resource determined by the determining module 4, with the consumption data of the service used as the service data amount and the consumption data of the resource used as the resource data amount. The building module 1 is further configured to construct the original data set according to the consumption data of the service and/or the consumption data of the resource acquired by the obtaining module 6.
  • the pre-processing module 2 performs a dimensionality reduction preprocessing on the original data set by principal component analysis to obtain a pre-processed data set.
  • The preprocessing module 2 is further configured to normalize the original data set before performing the dimensionality reduction preprocessing on the original data set constructed by the building module 1, and/or to normalize the preprocessed data set after the dimensionality reduction preprocessing has been performed and the preprocessed data set has been obtained.
  • The clustering module 3 includes an initial clustering sub-module 31 and an accurate clustering sub-module 32. The initial clustering sub-module 31 is configured to perform, according to an initial clustering method, an initial clustering process on the preprocessed data set obtained by the preprocessing module 2 to obtain an initial clustering data set. The calculating module 7 is configured to calculate the initial clustering centers according to the initial clustering data set obtained by the initial clustering sub-module 31. The accurate clustering sub-module 32 is configured to perform an accurate clustering process on the original data set constructed by the building module 1, according to the accurate clustering method and the initial clustering centers calculated by the calculating module 7, to obtain an accurate clustering data set.
  • The determining module 4 is further configured to determine the basic item and the item to be predicted in the accurate clustering data set obtained by the accurate clustering sub-module 32, and to determine the basic data amount and the data amount to be predicted according to the basic item and the item to be predicted. Specifically, the determining module 4 is configured to fit the basic data amount and the data amount to be predicted according to the gradient descent method, determine the fitting function, and use the fitting function as the prediction model.
  • The prediction system further includes a selection module 8, configured to select different fitting functions according to different basic items. The prediction module 5 is further configured to predict the service data amount and/or the resource data amount according to the fitting function selected by the selection module 8 to obtain the expected data amount of the service and/or resource.
  • The prediction system further includes an optimization module 9. After the prediction module 5 obtains the expected data amount of the service and/or resource according to the prediction model, the optimization module 9 optimizes the network according to the expected data amount, for example by adjusting the services and/or resources of the LTE network.
  • Embodiment 3:
  • FIG. 4 is a flowchart of a method for predicting a quantity of service data and/or a quantity of resource data according to Embodiment 3 of the present invention. As shown in FIG. 4, the method for predicting the amount of service data and/or the amount of resource data includes:
  • S201: Construct an original data set from the service and resource data amounts generated in multiple different time periods within the same day of each week;
  • For example, in this embodiment network equipment collects the service and resource data amounts in LTE base stations. After preliminary screening, the services and resources involved in the prediction are the number of Radio Resource Control (RRC) connected users, the average uplink and downlink traffic, the number of successful incoming and outgoing calls, the uplink and downlink shared channel utilization, the downlink control channel utilization, and so on. The number of RRC connected users comprises the maximum number of active users and the average number of active users.
  • The data is three-dimensional (user, resource, service) data from a certain area of a live LTE network. The time granularity is one hour, that is, each time period lasts one hour, and the sample data was collected on two adjacent Mondays. Specifically, the N records from the Mondays of two adjacent weeks are selected, each record representing the service and resource data amount consumed by one base station in one hour. The m columns of services and resources to be analyzed are filtered out, where m is the number of services and resources to be predicted, and the original data set P, an N*m matrix, is constructed;
  • FIG. 5 shows a partial data set selected from the sample data according to Embodiment 3 of the present invention. Each row in FIG. 5 represents one hour of service and resource usage of one base station in the area, and each column represents the data amount consumed by one service or resource in that hour. Nine services and resources are studied.
  • For example, the data in the original data set P is normalized according to formula (2.1), where x is the value consumed by a given service or resource in the original data set P within one hour, x̄ is the mean of the data that the service or resource consumes per hour, N is the number of time periods, and Z(x) is the normalized consumption data. After normalization the original data set is still an N*m matrix, where m and N are both positive integers.
  • After normalization, the m-dimensional services and resources in the normalized original data set P are reduced in dimension, for which principal component analysis can be used. The result is the preprocessed data set Q, an N*k matrix with k<m, where N and k are natural numbers and y_{N,k} denotes the data after dimension reduction. Each entry of Q is obtained from the normalized original data set P by principal component analysis; taking the first converted column as an example, the conversion formula is formula (2.2):
  • For example, each entry of the preprocessed data set Q is normalized according to formula (2.3), where Y(x) denotes the data in the preprocessed data set Q, minY(x) and maxY(x) are respectively the minimum and maximum of all the data in the preprocessed data set, and Z(x) is the normalized data.
  • After normalization, the normalized preprocessed data set is divided into h classes by a clustering algorithm: the service and resource data first undergoes one pass of Canopy clustering, and the Canopy result is used as the initial clustering centers of a second, K-means clustering. After clustering, the clustering data set Q1 is obtained; Q1 is an h*m matrix with h<<N.
  • FIG. 6 shows the clustering data set obtained by clustering according to Embodiment 3 of the present invention. In FIG. 6, h is 11, that is, there are 11 classes. Each row gives the clustering centers of the services or resources in one class, that is, the mean of the service and resource data contained in that class.
  • For example, a basic item and its data amount and an item to be predicted and its data amount are selected from Q1, and a curve is fitted to the two data amounts by gradient descent. Different basic items give different fitting functions.
  • S205: Predict the service and resource data amounts for the time period to be predicted.
  • For example, the fitting function obtained is used to predict the service and resource data amounts in the required time period, such as the expected data amounts of services and resources on the Monday of some future week.
  • The prediction evaluation parameters include MSE (Mean Squared Error), MAPE (Mean Absolute Percentage Error), and ME (mean error). The calculation formulas for MSE, MAPE, and ME are formulas (2.4), (2.5), and (2.6):
  • FIG. 7 compares the clustering-based prediction with direct prediction from the sample data according to Embodiment 3 of the present invention. For single-dimensional prediction with the maximum number of RRC connected users as the basic item and the average number of RRC connected users as the item to be predicted, the scatter points in FIG. 7 are the sample data, the dark curve is the function fitted after cluster analysis, and the light curve is the function fitted directly to the sample data. For multidimensional joint prediction, the settings can be chosen according to the parameters to be predicted; single-dimensional prediction is taken as the example here.
  • FIG. 8 lists the prediction evaluation parameters of the clustering result according to Embodiment 3 of the present invention. Each row in FIG. 8 gives the value of one evaluation parameter for a given service or resource, and each column gives the prediction evaluation parameters of one item to be predicted.
  • FIG. 9 is a histogram of the MAPE of the channel utilization predictions according to Embodiment 3 of the present invention. PDCCH-UTI is the downlink control channel utilization and PDSCH-UTI is the downlink shared channel utilization. MAPE measures error: the higher its value, the less accurate the prediction. The figure shows that cluster-based prediction performs better than direct prediction from the data.
  • FIG. 10 compares algorithm complexity according to Embodiment 3 of the present invention. The fitting method used in cluster fitting is gradient descent. For single-dimensional prediction the algorithm complexity is N*k*a, where N is the number of sample records, k is the number of service and resource categories studied, and a is the number of gradient descent iterations. For multidimensional prediction the complexity is NW*K, where W is the number of basic items of the multidimensional prediction.
  • Embodiment 4:
  • FIG. 11 is a flowchart of a method for predicting a quantity of service data and/or a quantity of resource data according to Embodiment 4 of the present invention. As shown in FIG. 11, the method for predicting the amount of service data and/or the amount of resource data includes:
  • S301: Construct an original data set from the service and resource data amounts generated in multiple different time periods within the same day of each week;
  • For example, after preliminary screening, the services and resources involved in the prediction are the average number of users, the forward control channel mean, the forward traffic channel mean, the reverse access channel mean, the uplink and downlink traffic, the reverse CE occupation mean, and so on. The data source is three-dimensional (user, resource, service) data from a certain area of a live 3G network. The time granularity is one hour, that is, each time period lasts one hour. The sample data was collected on July 2, 2012, and the comparison data on July 9, 2012. With m the number of services and resources to be predicted, the original data set P, an N*m matrix, is constructed;
  • FIG. 12 shows a partial data set selected from the sample data according to Embodiment 4 of the present invention. Each row in FIG. 12 represents one hour of service and resource usage of one base station in the area, and each column represents the value consumed by one type of service or resource in that hour. Seven services and resources are studied.
  • In the normalization formula (3.1), x is the value consumed by a given service or resource in the original data set P within one hour, x̄ is the mean of the data that the service or resource consumes per hour, N is the number of time periods, and Z(x) is the normalized consumption data. After normalization the original data set is still an N*m matrix, where m and N are both positive integers.
  • After normalization, the m-dimensional services and resources in the normalized original data set P are reduced in dimension, for which principal component analysis can be used. The result is the preprocessed data set Q, an N*k matrix with k<m, where N and k are natural numbers and y_{N,k} denotes the data after dimension reduction. Each entry of Q is obtained from the normalized original data set P by principal component analysis; taking the first converted column as an example, the conversion formula is formula (3.2):
  • For example, each entry of the preprocessed data set Q is normalized according to formula (3.3), where Y(x) denotes the data in the preprocessed data set Q, minY(x) and maxY(x) are respectively the minimum and maximum of all the data in the preprocessed data set, and Z(x) is the normalized data.
  • After normalization, the normalized preprocessed data set is divided into h classes by a clustering algorithm: the service and resource data first undergoes one pass of Canopy clustering, and the Canopy result is used as the initial clustering centers of a second, K-means clustering. After clustering, the clustering data set Q1 is obtained; Q1 is an h*m matrix with h<<N.
  • FIG. 13 shows the clustering data set obtained by clustering according to Embodiment 4 of the present invention. In FIG. 13, h is 10, that is, there are 10 classes. Each row gives the clustering centers of the services or resources in one class, that is, the mean of the service and resource data contained in that class.
  • For example, a basic item and its data amount and an item to be predicted and its data amount are selected from Q1, and a curve is fitted to the two data amounts by gradient descent. Different basic items give different fitting functions.
  • S305: Predict the service and resource data amounts for the time period to be predicted.
  • For example, the fitting function obtained is used to predict the service and resource data amounts in the required time period, such as the expected data amounts of services and resources on the Monday of some future week.
  • The prediction evaluation parameters include MSE (Mean Squared Error), MAPE (Mean Absolute Percentage Error), and ME (mean error). In this embodiment, the optimal basic item for the reverse CE occupation mean is the average number of users, so the fitting function y_reverseCE = f(x1) is the prediction result. The calculation formulas for MSE, MAPE, and ME are formulas (3.4), (3.5), and (3.6):
  • FIG. 14 compares the clustering-based prediction with direct prediction from the sample data according to Embodiment 4 of the present invention. For single-dimensional prediction with the average number of users as the basic item and the reverse CE occupation mean as the item to be predicted, the scatter points in FIG. 14 are the sample data, the dark curve is the function fitted after cluster analysis, and the light curve is the function fitted directly to the sample data. For multidimensional joint prediction, the settings can be chosen according to the parameters to be predicted; single-dimensional prediction is taken as the example here.
  • FIG. 15 lists the prediction evaluation parameters of the clustering result according to Embodiment 4 of the present invention. Each row in FIG. 15 gives the value of one evaluation parameter for a given service or resource, and each column gives the prediction evaluation parameters of one item to be predicted.
  • FIG. 16 is a histogram of the MAPE of the channel utilization predictions according to Embodiment 4 of the present invention. MAPE measures error: the higher its value, the less accurate the prediction. The figure shows that cluster-based prediction performs better than direct prediction from the data.
  • FIG. 17 compares algorithm complexity according to Embodiment 4 of the present invention. The fitting method used in cluster fitting is gradient descent. For single-dimensional prediction the algorithm complexity is N*k*a, where N is the number of sample records, k is the number of service and resource categories studied, and a is the number of gradient descent iterations. For multidimensional prediction the complexity is NW*K, where W is the number of basic items of the multidimensional prediction.
  • Those skilled in the art will appreciate that embodiments of the present invention can be provided as a method, a system, or a computer program product. Accordingly, the present invention can take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware. Moreover, the invention can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) that contain computer-usable program code.
  • The computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising an instruction device, the instruction device implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.


Abstract

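Embodiments of the present invention provide a method and a system for predicting a service data amount and/or a resource data amount, and a computer storage medium. The method includes: constructing an original data set of the service data amount and/or resource data amount; performing dimension-reduction preprocessing on the original data set to obtain a preprocessed data set; first performing one initial clustering on the preprocessed data set to obtain an initial clustering data set, and then, according to the initial clustering data set, performing at least one accurate clustering on the original data set to obtain an accurate clustering data set; determining a prediction model according to the accurate clustering data set; and obtaining the expected data amount of the service and/or resource according to the prediction model.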

Description

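Method and system for predicting a service data amount and/or a resource data amount
Technical Field
The present invention relates to prediction technologies in the field of mobile communications, and in particular to a method and a system for predicting a service data amount and/or a resource data amount, and a computer storage medium.
Background
With the development of Long Term Evolution (LTE) networks and the spread of 4G services, both the variety and the traffic of data services have grown considerably, so analyzing the group behavior of users has become increasingly complex.
The user behavior data generated by LTE-based communication systems differs completely from that of traditional 2G and 3G and contains more information about services and resources. Within the current spectrum bandwidth, LTE provides higher uplink and downlink peak rates, so the use of data services has increased sharply and the data generated on the radio side and the core network side grows exponentially. Traditional data analysis tools are therefore no longer suited to such data volumes.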
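Summary of the Invention
Embodiments of the present invention provide a method and a system for predicting a service data amount and/or a resource data amount, and a computer storage medium, which solve the problem that existing data analysis approaches cannot cope with the ever-growing data amount of services and/or resources and therefore cannot analyze or predict it.
An embodiment of the present invention provides a method for predicting a service data amount and/or a resource data amount, including:
constructing an original data set of the service data amount and/or resource data amount;
performing dimension-reduction preprocessing on the original data set to obtain a preprocessed data set;
first performing one initial clustering on the preprocessed data set to obtain an initial clustering data set, and then, according to the initial clustering data set, performing at least one accurate clustering on the original data set to obtain an accurate clustering data set;
determining a prediction model according to the accurate clustering data set;
obtaining the expected data amount of the service and/or resource according to the prediction model.
Preferably, constructing the original data set of the service data amount and/or resource data amount includes:
determining the service and/or resource to be predicted;
acquiring the consumption data of the service and/or the consumption data of the resource within at least one historical time period, taking the consumption data of the service as the service data amount and the consumption data of the resource as the resource data amount;
constructing the original data set according to the consumption data of the service and/or the consumption data of the resource.
Preferably, the dimension-reduction preprocessing is performed on the original data set by principal component analysis to obtain the preprocessed data set.
Preferably, the method further includes: before performing the dimension-reduction preprocessing on the original data set to obtain the preprocessed data set, normalizing the original data set; and/or,
after performing the dimension-reduction preprocessing on the original data set to obtain the preprocessed data set, normalizing the preprocessed data set.
Preferably, first performing one initial clustering on the preprocessed data set to obtain an initial clustering data set and then, according to the initial clustering data set, performing at least one accurate clustering on the original data set to obtain an accurate clustering data set includes:
first performing one initial clustering on the preprocessed data set according to an initial clustering method to obtain the initial clustering data set;
calculating initial clustering centers according to the initial clustering data set;
then performing one accurate clustering on the original data set according to an accurate clustering method and the initial clustering centers to obtain the accurate clustering data set.
Preferably, determining the prediction model according to the accurate clustering data set includes:
determining a basic item and an item to be predicted in the accurate clustering data set;
determining a basic data amount and a to-be-predicted data amount according to the basic item and the item to be predicted;
fitting the basic data amount and the to-be-predicted data amount by gradient descent, determining a fitting function, and using the fitting function as the prediction model.
Preferably, obtaining the expected data amount of the service and/or resource according to the prediction model includes:
selecting different fitting functions according to different basic items;
predicting the service data amount and/or resource data amount according to the selected fitting function to obtain the expected data amount of the service and/or resource.
Preferably, after obtaining the expected data amount of the service and/or resource according to the prediction model, the method further includes:
optimizing the network according to the expected data amount.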
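An embodiment of the present invention provides a system for predicting a service data amount and/or a resource data amount, including:
a building module, configured to construct an original data set of the service data amount and/or resource data amount;
a preprocessing module, configured to perform dimension-reduction preprocessing on the original data set constructed by the building module to obtain a preprocessed data set;
a clustering module, configured to first perform one initial clustering on the preprocessed data set obtained by the preprocessing module to obtain an initial clustering data set, and then, according to the initial clustering data set, perform at least one accurate clustering on the original data set constructed by the building module to obtain an accurate clustering data set;
a determining module, configured to determine a prediction model according to the accurate clustering data set obtained by the clustering module;
a prediction module, configured to obtain the expected data amount of the service and/or resource according to the prediction model determined by the determining module.
Preferably, the system further includes an acquisition module;
the determining module is further configured to determine the service and/or resource to be predicted;
the acquisition module is configured to acquire, within at least one historical time period, the consumption data of the service and/or the consumption data of the resource determined by the determining module, taking the consumption data of the service as the service data amount and the consumption data of the resource as the resource data amount;
the building module is further configured to construct the original data set according to the consumption data of the service and/or the consumption data of the resource acquired by the acquisition module.
Preferably, the preprocessing module is further configured to normalize the original data set before performing the dimension-reduction preprocessing on the original data set constructed by the building module to obtain the preprocessed data set; and/or,
to normalize the preprocessed data set after performing the dimension-reduction preprocessing on the original data set constructed by the building module to obtain the preprocessed data set.
Preferably, the system further includes a calculating module, and the clustering module includes an initial clustering sub-module and an accurate clustering sub-module;
the initial clustering sub-module is configured to first perform one initial clustering on the preprocessed data set obtained by the preprocessing module according to an initial clustering method, to obtain the initial clustering data set;
the calculating module is configured to calculate initial clustering centers according to the initial clustering data set obtained by the initial clustering sub-module;
the accurate clustering sub-module is configured to then perform one accurate clustering on the original data set constructed by the building module, according to an accurate clustering method and the initial clustering centers calculated by the calculating module, to obtain the accurate clustering data set.
Preferably, the determining module is further configured to determine a basic item and an item to be predicted in the accurate clustering data set obtained by the accurate clustering sub-module, and to determine a basic data amount and a to-be-predicted data amount according to the basic item and the item to be predicted;
the determining module is further configured to fit the basic data amount and the to-be-predicted data amount by gradient descent, determine a fitting function, and use the fitting function as the prediction model.
Preferably, the system further includes:
a selection module, configured to select different fitting functions according to different basic items;
the prediction module is further configured to predict the service data amount and/or resource data amount according to the fitting function selected by the selection module, to obtain the expected data amount of the service and/or resource.
An embodiment of the present invention provides a computer storage medium storing executable instructions, the executable instructions being used to execute the above method for predicting a service data amount and/or a resource data amount.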
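Beneficial effects of the embodiments of the present invention:
Embodiments of the present invention provide a method and a system for predicting a service data amount and/or a resource data amount, and a computer storage medium. The original data is preprocessed and then clustered, enabling multidimensional prediction over users, services, and resources and thereby providing a reference for optimizing LTE network resources. In the clustering, the result of the initial clustering serves as the initial condition of the accurate clustering, which makes the distribution of the clustering result more sound and accurate and better matched to the relationships between data resources of different dimensions. In most cases the prediction model of the present invention predicts better than a direct fit to the original data: the prediction error is reduced by more than 10%, and for some resources by up to 25%. In addition, the present invention lets a small amount of data reflect the overall characteristics of all the original data, which saves data analysis cost and reduces algorithm complexity, and the prediction results can serve as a reference for LTE network resource planning. The present invention is particularly suited to prediction for LTE networks, that is, to algorithms on LTE data, enabling prediction of channel resources and analysis of the group behavior of users.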
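Brief Description of the Drawings
FIG. 1 is a flowchart of the method for predicting a service data amount and/or a resource data amount according to Embodiment 1 of the present invention;
FIG. 2 is a flowchart of the K-means algorithm according to Embodiment 1 of the present invention;
FIG. 3 is a schematic structural diagram of the system for predicting a service data amount and/or a resource data amount according to Embodiment 2 of the present invention;
FIG. 4 is a flowchart of the method for predicting a service data amount and/or a resource data amount according to Embodiment 3 of the present invention;
FIG. 5 is a partial data set selected from the sample data according to Embodiment 3 of the present invention;
FIG. 6 is the clustering data set obtained by clustering according to Embodiment 3 of the present invention;
FIG. 7 compares the clustering-based prediction with direct prediction from the sample data according to Embodiment 3 of the present invention;
FIG. 8 shows the prediction evaluation parameters of the clustering result according to Embodiment 3 of the present invention;
FIG. 9 is a histogram of the MAPE of the channel utilization predictions according to Embodiment 3 of the present invention;
FIG. 10 compares algorithm complexity according to Embodiment 3 of the present invention;
FIG. 11 is a flowchart of the method for predicting a service data amount and/or a resource data amount according to Embodiment 4 of the present invention;
FIG. 12 is a partial data set selected from the sample data according to Embodiment 4 of the present invention;
FIG. 13 is the clustering data set obtained by clustering according to Embodiment 4 of the present invention;
FIG. 14 compares the clustering-based prediction with direct prediction from the sample data according to Embodiment 4 of the present invention;
FIG. 15 shows the prediction evaluation parameters of the clustering result according to Embodiment 4 of the present invention;
FIG. 16 is a histogram of the MAPE of the channel utilization predictions according to Embodiment 4 of the present invention;
FIG. 17 compares algorithm complexity according to Embodiment 4 of the present invention.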
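Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.
The present invention is described in further detail below through specific embodiments with reference to the accompanying drawings.
Embodiment 1:
FIG. 1 is a flowchart of the method for predicting a service data amount and/or a resource data amount according to Embodiment 1 of the present invention. As shown in FIG. 1, the method includes:
S101: Construct an original data set of the service data amount and/or resource data amount;
For example, as LTE networks develop, service and resource data amounts grow sharply. The exponentially growing data needs to be analyzed to predict how the LTE network operates in each area, so that the LTE network can then be optimized.
In this embodiment, to predict the operation of the LTE network, an original data set of the service data amount and/or resource data amount is constructed in one of the following ways:
Way 1: the service to be predicted is determined according to the actual prediction requirement. The consumption data of that service within at least one historical time period is then acquired, for example from LTE base stations through network equipment. The consumption data is taken as the service data amount, and the original data set is constructed from it;
Way 2: the resource to be predicted is determined according to the actual prediction requirement. The consumption data of that resource within at least one historical time period is then acquired, for example from LTE base stations through network equipment. The consumption data is taken as the resource data amount, and the original data set is constructed from it;
Way 3: the service and resource to be predicted are determined according to the actual prediction requirement. The consumption data of the service and of the resource within at least one historical time period is then acquired, for example from LTE base stations through network equipment. The consumption data of the service is taken as the service data amount and that of the resource as the resource data amount, and the original data set is constructed from the consumption data.
In each of these ways, the original data set contains the consumption data acquired for the services and/or resources to be predicted within at least one historical time period. If the number of services and/or resources to be predicted is m and the number of historical time periods is N, the original data set is an N*m matrix, where m and N are both positive integers.
In the above technical solution, the historical time periods preferably all have the same granularity, for example 1 hour, that is, each historical time period lasts 1 hour. The historical time periods can be chosen according to actual needs, for example multiple periods within the same day of each week, multiple periods within each day of one continuous week, or multiple periods between 8 a.m. and 8 p.m. on each day of three consecutive weeks.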
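S102: Perform dimension-reduction preprocessing on the original data set to obtain a preprocessed data set;
For example, once the original data set of the service data amount and/or resource data amount has been constructed, dimension-reduction preprocessing is performed on it to obtain the preprocessed data set.
In this embodiment, the dimension-reduction preprocessing uses principal component analysis. Principal component analysis applies the idea of dimension reduction to convert many variables into a few composite variables, the principal components. Each principal component is a linear combination of the original variables and the principal components are mutually uncorrelated, so they capture most of the feature information of the original variables without overlapping information.
Following the scheme above, if the number of services and/or resources to be predicted is m, these m variables, denoted Z1, Z2, …, Zm, describe the object of study and form the m-dimensional random vector Z=(Z1,Z2,…,Zm)t. Let the mean of Z be μ and its covariance matrix be Σ. Applying the linear transformation of formula (1.1) to Z, that is, considering linear combinations of the original variables, yields the principal components as uncorrelated linear combinations Y1, Y2, …, Yk, where m and k are both positive integers and k<m.
Figure PCTCN2015075995-appb-000001
         Formula (1.1)
In formula (1.1), the mean μ is the eigenvector corresponding to the eigenvalues of the covariance matrix Σ of the random vector Z, and Y1(x), Y2(x), …, Yk(x) are the principal components obtained by linearly combining the original variables. Principal component analysis yields the preprocessed data set, an N*k matrix: the m-dimensional original data set becomes a k-dimensional preprocessed data set whose k composite variables carry the feature information of the m original variables, where k and N are both positive integers.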
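As a rough illustration of the dimension reduction described above, the following sketch extracts only the first principal component by power iteration on the covariance matrix. It is a minimal sketch under stated assumptions, not the patent's implementation: the function name `first_principal_component`, the use of power iteration, and the division by N in the covariance are all choices made here for illustration, and formula (1.1) itself is not reproduced.

```python
import math

def first_principal_component(Z, iterations=200):
    """Return (direction, scores) of the first principal component of Z.

    Z is a list of N rows with m values each. Power iteration is an
    assumption of this sketch; any eigen-solver could be used instead.
    """
    n, m = len(Z), len(Z[0])
    # Centre each column on its mean.
    means = [sum(row[j] for row in Z) / n for j in range(m)]
    X = [[row[j] - means[j] for j in range(m)] for row in Z]
    # Covariance matrix (population form, dividing by N).
    cov = [[sum(X[i][a] * X[i][b] for i in range(n)) / n for b in range(m)]
           for a in range(m)]
    # Power iteration converges to the eigenvector of the largest eigenvalue,
    # provided the start vector is not orthogonal to it.
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    # Project every centred row onto the component (one column of Q).
    scores = [sum(X[i][j] * v[j] for j in range(m)) for i in range(n)]
    return v, scores

direction, scores = first_principal_component([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
```

Repeating the procedure on the residual (deflation) would give further components, so that k of them form the N*k preprocessed data set.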
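In this embodiment, to simplify the calculation and make the prediction more accurate, each consumption value in the original data set is normalized before the dimension-reduction preprocessing is performed. The normalization formula is formula (1.2):
Figure PCTCN2015075995-appb-000002
              Formula (1.2)
where x is the amount of data consumed by a given service or resource in the original data set within one historical time period,
Figure PCTCN2015075995-appb-000003
is the mean for that service or resource over one historical time period, N is the number of historical time periods, and Z(x) is the normalized consumption data. After normalization the original data set is still an N*m matrix, where m and N are both positive integers;
and/or,
after the dimension-reduction preprocessing of the original data set has produced the preprocessed data set, the data in the preprocessed data set is also normalized. The normalization formula is formula (1.3):
Figure PCTCN2015075995-appb-000004
              Formula (1.3)
where Y(x) denotes the data in the preprocessed data set, minY(x) and maxY(x) are respectively the minimum and maximum of all the data in the preprocessed data set, and Z(x) is the normalized data.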
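The symbols defined for formula (1.3) suggest ordinary min-max scaling over the whole preprocessed data set. The sketch below assumes that form, Z(x) = (Y(x) - minY(x)) / (maxY(x) - minY(x)); the formula image is not reproduced here, so this is an inference from the definitions, and the function name `min_max_normalise` is illustrative.

```python
def min_max_normalise(Q):
    """Scale every entry of Q into [0, 1] using the global minimum and maximum.

    Assumes the min-max form inferred from the symbol definitions; a data set
    whose values are all equal is mapped to zeros to avoid division by zero.
    """
    flat = [value for row in Q for value in row]
    lowest, highest = min(flat), max(flat)
    span = highest - lowest
    return [[(value - lowest) / span if span else 0.0 for value in row]
            for row in Q]

normalised = min_max_normalise([[0.0, 5.0], [10.0, 5.0]])
```

Scaling all k columns with one global minimum and maximum follows the wording "all the data in the preprocessed data set"; per-column scaling would be an alternative reading.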
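S103: First perform one initial clustering on the preprocessed data set to obtain an initial clustering data set, and then, according to the initial clustering data set, perform at least one accurate clustering on the original data set to obtain an accurate clustering data set;
For example, once the preprocessed data set has been obtained, one initial clustering is performed on it to obtain the initial clustering data set, and, according to that initial clustering data set, at least one accurate clustering is performed on the original data set to obtain the accurate clustering data set.
In the above technical solution, the purpose of the initial clustering is to analyze the data preliminarily and to supply the initial conditions for the accurate clustering, making its result more sound and accurate. For example, in the initial clustering the preprocessed original data set is clustered a first time and the mean of each resulting class is calculated. These means serve as the initial clustering centers of the accurate clustering, and the original data is then clustered accurately, at least once, starting from those centers.
This embodiment takes one initial clustering and one accurate clustering as the example. One initial clustering is performed on the preprocessed data set according to an initial clustering method to obtain the initial clustering data set. The preprocessed data set may or may not have been normalized.
Any algorithm that clusters the data quickly, simply, and roughly and yields clustering centers may serve as the initial clustering method, including the Canopy algorithm. Canopy is a simple and fast but not very accurate clustering method, so it is suitable as an auxiliary algorithm. Its principle is that each object is represented by a point in a multidimensional feature space, and the data is clustered using a fast approximate distance measure and two distance thresholds T1>T2 (T1>0, T2>0). The algorithm proceeds as follows:
(1) Vectorize the data set to obtain a set of data points held in memory, and choose two distance thresholds T1 and T2 with T1>T2; their values can be determined by cross-validation;
(2) Take any point P from the set of data points and use a low-cost method to quickly compute the distance between P and every Canopy (a Canopy here is simply one class in the clustering process); if no Canopy exists yet, P becomes a Canopy. If P lies within T1 of some Canopy, add P to that Canopy;
(3) If P lies within T2 of some Canopy, delete P from the set of data points: P is now considered close enough to that Canopy that it can no longer become the center of another Canopy;
(4) Repeat steps (2) and (3) until the set of data points is empty.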
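Steps (1) to (4) can be sketched as a single pass over the points. The version below is one common single-pass reading of the steps, not necessarily the variant the patent uses: a point that is not within T2 of any existing Canopy starts a new one, Euclidean distance stands in for the "low-cost" distance measure, and the names `canopy` and `canopy_means` are illustrative.

```python
import math

def canopy(points, t1, t2):
    """Group points into canopies using thresholds t1 > t2 > 0.

    Returns a list of (centre, members) pairs. A point may belong to several
    canopies (every one within t1); it becomes a new centre only if it is not
    within t2 of any existing centre.
    """
    if not t1 > t2 > 0:
        raise ValueError("thresholds must satisfy t1 > t2 > 0")
    canopies = []
    for p in points:
        strongly_bound = False
        for centre, members in canopies:
            d = math.dist(p, centre)
            if d < t1:
                members.append(p)      # step (2): join canopies within T1
            if d < t2:
                strongly_bound = True  # step (3): too close to start a canopy
        if not strongly_bound:
            canopies.append((p, [p]))
    return canopies

def canopy_means(canopies):
    """Mean of each canopy's members, usable as initial clustering centres."""
    return [[sum(col) / len(members) for col in zip(*members)]
            for _, members in canopies]

groups = canopy([(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (5.2, 5.0)], t1=2.0, t2=1.0)
initial_centres = canopy_means(groups)
```

The per-canopy means computed by `canopy_means` correspond to the class averages that the text uses as initial clustering centers for the accurate clustering.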
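Once the initial clustering data set has been obtained by one pass of the initial clustering method, the mean of each class in it is calculated and used as an initial clustering center for the accurate clustering.
One accurate clustering is then performed on the original data set according to an accurate clustering method and the initial clustering centers, to obtain the accurate clustering data set. Any algorithm that clusters the data accurately may serve as the accurate clustering method, including K-means. FIG. 2 is a flowchart of the K-means algorithm according to Embodiment 1 of the present invention. K-means is a hard clustering algorithm and a typical representative of prototype-based objective-function clustering: it takes a distance from the data points to the prototypes as the objective function to optimize and derives the update rule of the iteration by finding the extremum of that function. K-means uses Euclidean distance as the similarity measure and seeks the optimal partition for a given set of initial clustering centers, minimizing the evaluation index. It is a typical distance-based clustering algorithm: distance is the measure of similarity, so the closer two objects are, the more similar they are considered. K-means regards a cluster as composed of objects that lie close to one another, so its final goal is a set of compact and independent clusters. Through many iterations, each of which corrects the center points, the algorithm converges and the data is grouped into classes. Note that after K-means clustering the result should be analyzed: if some dimension varies very little across the result (by no more than 5%), that dimension carries little meaning in the clustering result and should be deleted.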
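The accurate clustering step can be sketched as plain K-means started from supplied centers rather than random ones. This is a minimal sketch assuming Euclidean distance and a fixed iteration cap; the function name `kmeans` and the choice to keep an empty cluster's old center are illustrative assumptions, and the 5% low-variation check mentioned above is not included.

```python
import math

def kmeans(points, centres, max_iter=100):
    """K-means seeded with the given initial centres.

    Returns (centres, labels). Iteration stops when the centres no longer
    change or after max_iter rounds.
    """
    centres = [list(c) for c in centres]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centre.
        labels = [min(range(len(centres)),
                      key=lambda c: math.dist(p, centres[c]))
                  for p in points]
        # Update step: each centre moves to the mean of its members.
        new_centres = []
        for c in range(len(centres)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centres.append([sum(col) / len(members)
                                    for col in zip(*members)])
            else:
                new_centres.append(centres[c])  # keep an empty cluster's centre
        if new_centres == centres:
            break
        centres = new_centres
    return centres, labels

final_centres, final_labels = kmeans(
    [(0, 0), (0, 1), (10, 10), (10, 11)], centres=[(1, 1), (9, 9)])
```

Seeding with centers from a cheap first pass, as the text describes, removes the dependence on random initialization and fixes the number of classes h in advance.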
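After the one initial clustering and one accurate clustering above, the accurate clustering data set is obtained. If it divides the m services and/or resources of the original data set into h classes, the accurate clustering data set is an h*m matrix, where h and m are both positive integers and h<<N.
S104: Determine the prediction model according to the accurate clustering data set;
For example, once the accurate clustering data set has been obtained, the prediction model can be determined.
In this embodiment, a basic item and an item to be predicted are determined in the accurate clustering data set, the basic data amount and the to-be-predicted data amount are determined from them, the two data amounts are fitted by gradient descent to determine a fitting function, and the fitting function is used as the prediction model. Note that the prediction model may be single-dimensional, with one basic item, or multidimensional, with at least two basic items.
In the above technical solution, the basic item is a service or resource, and the basic data amount is the clustering centers of that service or resource across all classes in the accurate clustering data set. Likewise, the item to be predicted is a service or resource, and the to-be-predicted data amount is the clustering centers of that service or resource across all classes. The fitting function is determined in one of the following cases:
Case 1: if the original data set is constructed for services, the accurate clustering data set also concerns services, and both the basic item and the item to be predicted are services, namely a basic service and a service to be predicted. The clustering centers of the basic service across all classes and those of the service to be predicted across all classes are determined, the two sets of clustering centers are fitted by gradient descent, the fitting function is determined, and the fitting function is used as the prediction model;
Case 2: if the original data set is constructed for resources, the accurate clustering data set also concerns resources, and both the basic item and the item to be predicted are resources, namely a basic resource and a resource to be predicted. The clustering centers of the basic resource across all classes and those of the resource to be predicted across all classes are determined, the two sets of clustering centers are fitted by gradient descent, the fitting function is determined, and the fitting function is used as the prediction model;
Case 3: if the original data set is constructed for services and resources, the accurate clustering data set also concerns services and resources. The basic item may be either a service or a resource, and so may the item to be predicted: whether the basic item is a basic service or a basic resource, the item to be predicted may be a service to be predicted or a resource to be predicted. The clustering centers of the basic service or resource across all classes and those of the service or resource to be predicted across all classes are determined, the two sets of clustering centers are fitted by gradient descent, the fitting function is determined, and the fitting function is used as the prediction model.
In the above technical solution, for one and the same item to be predicted, different basic items give different fitting functions. The most suitable fitting function can be chosen by its prediction evaluation parameters: the smaller these are, the better the prediction. The basic item of the fitting function with the smallest prediction evaluation parameters is the optimal basic item, and that fitting function gives the most accurate prediction.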
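To make the fitting step concrete, the sketch below fits the clustering centers of a basic item (x) to those of an item to be predicted (y) by batch gradient descent. The text does not state the functional form of the fitting function, so a straight line y = w*x + b is assumed here; the function name `fit_line_gd`, the learning rate, and the step count are likewise illustrative choices.

```python
def fit_line_gd(xs, ys, learning_rate=0.05, steps=5000):
    """Fit y = w * x + b by batch gradient descent on the mean squared error.

    xs: clustering centres of the basic item, one per class.
    ys: clustering centres of the item to be predicted, one per class.
    The linear form is an assumption of this sketch.
    """
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Hypothetical centres that lie exactly on y = 2x + 1.
w, b = fit_line_gd([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
predict = lambda x: w * x + b
```

Because the fit runs over the h clustering centers rather than all N records, each gradient step touches far fewer points, which is the source of the complexity saving the description claims. Inputs should be on a comparable scale (for example normalized) for a fixed learning rate to converge.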
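S105: Obtain the expected data amount of the service and/or resource according to the prediction model.
For example, once the prediction model has been determined, it is used to obtain the expected data amount of the service and/or resource in the time period to be predicted.
In this embodiment, different fitting functions are selected according to different basic items, and the selected fitting function is used to predict the service data amount and/or resource data amount, giving the expected data amount of the service and/or resource in the time period to be predicted.
In this embodiment, once the expected data amount of the service and/or resource has been obtained from the prediction model, it can also guide network planning, optimization, and capacity expansion, improving the network's capacity to carry increasingly rich data services.
The original data is preprocessed and then clustered, with the result of the initial clustering serving as the initial condition of the accurate clustering, which makes the distribution of the clustering result more sound and accurate and better matched to the relationships between data resources of different dimensions. In most cases the prediction model of the present invention predicts better than a direct fit to the original data: the prediction error is reduced by more than 10%, and for some resources by up to 25%. In addition, the present invention lets a small amount of data reflect the overall characteristics of all the original data, which saves data analysis cost and reduces algorithm complexity, and the prediction results can serve as a reference for LTE network resource planning.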
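Embodiment 2:
FIG. 3 is a schematic structural diagram of the system for predicting a service data amount and/or a resource data amount according to Embodiment 2 of the present invention. As shown in FIG. 3, the prediction system includes a building module 1, a preprocessing module 2, a clustering module 3, a determining module 4, and a prediction module 5. The building module 1 is configured to construct an original data set of the service data amount and/or resource data amount. The preprocessing module 2 is configured to perform dimension-reduction preprocessing on the original data set constructed by the building module 1 to obtain a preprocessed data set. The clustering module 3 is configured to first perform one initial clustering on the preprocessed data set obtained by the preprocessing module 2 to obtain an initial clustering data set, and then, according to the initial clustering data set, perform at least one accurate clustering on the original data set constructed by the building module 1 to obtain an accurate clustering data set. The determining module 4 is configured to determine a prediction model according to the accurate clustering data set obtained by the clustering module 3. The prediction module 5 is configured to obtain the expected data amount of the service and/or resource according to the prediction model determined by the determining module 4.
In the above technical solution, an acquisition module 6 is further included. The determining module 4 is further configured to determine the service and/or resource to be predicted. The acquisition module 6 is configured to acquire, within at least one historical time period, the consumption data of the service and/or the consumption data of the resource determined by the determining module 4, taking the consumption data of the service as the service data amount and the consumption data of the resource as the resource data amount. The building module 1 is further configured to construct the original data set according to the consumption data acquired by the acquisition module 6.
In the above technical solution, the preprocessing module 2 performs the dimension-reduction preprocessing on the original data set by principal component analysis to obtain the preprocessed data set.
In the above technical solution, the preprocessing module 2 is further configured to normalize the original data set before performing the dimension-reduction preprocessing on the original data set constructed by the building module 1, and/or to normalize the preprocessed data set after performing that dimension-reduction preprocessing.
In the above technical solution, a calculating module 7 is further included, and the clustering module 3 includes an initial clustering sub-module 31 and an accurate clustering sub-module 32. The initial clustering sub-module 31 is configured to first perform one initial clustering on the preprocessed data set obtained by the preprocessing module 2 according to an initial clustering method, to obtain the initial clustering data set. The calculating module 7 is configured to calculate initial clustering centers according to the initial clustering data set obtained by the initial clustering sub-module 31. The accurate clustering sub-module 32 is configured to then perform one accurate clustering on the original data set constructed by the building module 1, according to an accurate clustering method and the initial clustering centers calculated by the calculating module 7, to obtain the accurate clustering data set.
In the above technical solution, the determining module 4 is further configured to determine the basic item and the item to be predicted in the accurate clustering data set obtained by the accurate clustering sub-module 32, and to determine the basic data amount and the to-be-predicted data amount from them. Specifically, the determining module 4 fits the basic data amount and the to-be-predicted data amount by gradient descent, determines the fitting function, and uses the fitting function as the prediction model.
In the above technical solution, a selection module 8 is further included, configured to select different fitting functions according to different basic items. The prediction module 5 is further configured to predict the service data amount and/or resource data amount according to the fitting function selected by the selection module 8, obtaining the expected data amount of the service and/or resource.
In the above technical solution, an optimization module 9 is further included. After the prediction module 5 obtains the expected data amount of the service and/or resource according to the prediction model, the optimization module 9 optimizes the network according to that expected data amount, for example by adjusting LTE network services and/or resources.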
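Embodiment 3:
FIG. 4 is a flowchart of the method for predicting a service data amount and/or a resource data amount according to Embodiment 3 of the present invention. As shown in FIG. 4, the method includes:
S201: Construct an original data set from the service and resource data amounts generated in multiple different time periods within the same day of each week;
For example, in this embodiment network equipment collects the service and resource data amounts in LTE base stations. After preliminary screening, the services and resources involved in the prediction are the number of Radio Resource Control (RRC) connected users, the average uplink and downlink traffic, the number of successful incoming and outgoing calls, the uplink and downlink shared channel utilization, the downlink control channel utilization, and so on. The number of RRC connected users comprises the maximum number of active users and the average number of active users.
The data is three-dimensional (user, resource, service) data from a certain area of a live LTE network. The time granularity is one hour, that is, each time period lasts one hour, and the sample data was collected on two adjacent Mondays. Specifically, the N records from the Mondays of two adjacent weeks are selected, each record representing the service and resource data amount consumed by one base station in one hour. The m columns of services and resources to be analyzed are filtered out, where m is the number of services and resources to be predicted, and the original data set P, an N*m matrix, is constructed;
Figure PCTCN2015075995-appb-000005
where N and m are both natural numbers and x_{N,m} denotes the amount of data consumed by a given service or resource in a given hour. FIG. 5 shows a partial data set selected from the sample data according to Embodiment 3 of the present invention. Each row in FIG. 5 represents one hour of service and resource usage of one base station in the area, and each column represents the data amount consumed by one service or resource in that hour. Nine services and resources are studied.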
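The construction of P can be pictured as selecting m named columns from N hourly base-station records. The sketch below is only an illustration of that shape: the record layout, the field names (`station`, `hour`, `rrc_avg`, `dl_traffic`), and the sample values are all hypothetical and do not come from the patent's data.

```python
def build_original_dataset(records, metrics):
    """Turn hourly base-station records into an N x m matrix (list of rows).

    records: list of dicts, one per base station per hour (hypothetical layout).
    metrics: names of the m services/resources to keep, in column order.
    """
    return [[float(record[name]) for name in metrics] for record in records]

# Hypothetical sample: three hourly records, two metrics.
records = [
    {"station": "A", "hour": 8, "rrc_avg": 12, "dl_traffic": 340.0},
    {"station": "A", "hour": 9, "rrc_avg": 20, "dl_traffic": 610.0},
    {"station": "B", "hour": 8, "rrc_avg": 7, "dl_traffic": 150.0},
]
P = build_original_dataset(records, ["rrc_avg", "dl_traffic"])
```

Here N is 3 and m is 2; in Embodiment 3 the same shape holds with nine columns.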
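S202: Preprocess the original data set;
For example, the data in the original data set P is normalized. The normalization formula is formula (2.1):
Figure PCTCN2015075995-appb-000006
             Formula (2.1)
where x is the value consumed by a given service or resource in the original data set P within one hour,
Figure PCTCN2015075995-appb-000007
is the mean of the data that the service or resource consumes per hour, N is the number of time periods, and Z(x) is the normalized consumption data. After normalization the original data set is still an N*m matrix, where m and N are both positive integers.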
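Formula (2.1) is only available as an image, but the symbols it defines (a value x, a per-column mean, and the count N) are those of a standard-score normalization. The sketch below assumes that form, with the standard deviation taken over all N rows (population form); both the form and the function name `zscore_columns` are assumptions, not a transcription of the formula.

```python
import math

def zscore_columns(P):
    """Normalise each column of P to zero mean and unit standard deviation.

    Assumes Z(x) = (x - mean) / std with the population standard deviation
    (dividing by N). A constant column is mapped to zeros.
    """
    n, m = len(P), len(P[0])
    out = [[0.0] * m for _ in range(n)]
    for j in range(m):
        column = [row[j] for row in P]
        mean = sum(column) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
        for i in range(n):
            out[i][j] = (column[i] - mean) / std if std > 0 else 0.0
    return out

Z = zscore_columns([[1.0, 10.0], [3.0, 10.0]])
```

The output keeps the N*m shape of the input, matching the statement that the normalized data set is still an N*m matrix.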
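After normalization, the m-dimensional services and resources in the normalized original data set P are reduced in dimension, for which principal component analysis can be used. The result is the preprocessed data set Q, an N*k matrix with k<m.
Figure PCTCN2015075995-appb-000008
where N and k are both natural numbers and y_{N,k} denotes the data after dimension reduction. Each entry of the preprocessed data set Q is obtained from the normalized original data set P by principal component analysis. Taking the first converted column as an example, the conversion formula is formula (2.2):
Q(i,1)=P1(i,1)*0.338+P1(i,2)*0.333+P1(i,3)*0.340+P1(i,4)*0.329+P1(i,5)*0.176+P1(i,6)*0.326+P1(i,7)*0.319+P1(i,8)*0.317+P1(i,9)*0.320               Formula (2.2)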
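Formula (2.2) is a weighted sum of the nine normalized columns, that is, a dot product of each row of P1 with a fixed weight vector. The sketch below applies exactly the coefficients printed in formula (2.2); only the function and constant names are introduced here for illustration.

```python
# Coefficients of formula (2.2), in column order 1..9.
WEIGHTS_COLUMN_1 = [0.338, 0.333, 0.340, 0.329, 0.176, 0.326, 0.319, 0.317, 0.320]

def project_first_column(P1):
    """Compute Q(i, 1) for every row i of the normalised data set P1."""
    column = []
    for row in P1:
        if len(row) != len(WEIGHTS_COLUMN_1):
            raise ValueError("each row must have nine values")
        column.append(sum(w * v for w, v in zip(WEIGHTS_COLUMN_1, row)))
    return column

first_column = project_first_column([[1.0] * 9])
```

The fifth coefficient (0.176) is about half the size of the others, so the fifth service or resource contributes least to this first component.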
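S203: Perform two clustering passes on the preprocessed data set and the original data set to obtain the clustering data set;
For example, each entry of the preprocessed data set Q is normalized. The normalization formula is formula (2.3):
Figure PCTCN2015075995-appb-000009
           Formula (2.3)
where Y(x) denotes the data in the preprocessed data set Q, minY(x) and maxY(x) are respectively the minimum and maximum of all the data in the preprocessed data set, and Z(x) is the normalized data.
After normalization, the normalized preprocessed data set is divided into h classes by a clustering algorithm: the service and resource data first undergoes one pass of Canopy clustering, and the Canopy result is used as the initial clustering centers of a second, K-means clustering. After clustering, the clustering data set Q1 is obtained; Q1 is an h*m matrix with h<<N. FIG. 6 shows the clustering data set obtained by clustering according to Embodiment 3 of the present invention. In FIG. 6, h is 11, that is, there are 11 classes. Each row gives the clustering centers of the services or resources in one class, that is, the mean of the service and resource data contained in that class.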
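Since each row of Q1 is the mean of the records assigned to one class, Q1 can be computed from the original data and the class labels in one pass. The sketch below assumes the labels are integers 0 to h-1 produced by the clustering step; the function name `cluster_centres` and the use of `None` for an empty class are illustrative choices.

```python
def cluster_centres(P, labels, h):
    """Return the h x m matrix Q1 whose row c is the mean of class c in P.

    P: N rows of m values; labels: class index (0..h-1) of each row.
    An empty class yields None instead of a row.
    """
    m = len(P[0])
    sums = [[0.0] * m for _ in range(h)]
    counts = [0] * h
    for row, c in zip(P, labels):
        counts[c] += 1
        for j, value in enumerate(row):
            sums[c][j] += value
    return [[s / counts[c] for s in sums[c]] if counts[c] else None
            for c in range(h)]

Q1 = cluster_centres([[1.0, 2.0], [3.0, 4.0], [10.0, 20.0]], [0, 0, 1], h=2)
```

With h much smaller than N, the later curve fitting works on these h rows instead of the N original records.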
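S204: Determine the fitting function according to the clustering data set;
For example, a basic item and its data amount and an item to be predicted and its data amount are selected from Q1, and a curve is fitted to the two data amounts by gradient descent. The result is a function y=f(xn), n∈[1,8], where y is the item to be predicted and xn is the basic item, that is, the service or resource used to predict y. Different basic items give different fitting functions. For example, if the average number of RRC connected users is the basic item and the average downlink traffic is the item to be predicted, the fitting function is y_downlink=f(x1), where y is the average downlink traffic and x1 is the average number of RRC connected users.
S205: Predict the service and resource data amounts for the time period to be predicted.
For example, the fitting function obtained is used to predict the service and resource data amounts in the required time period, such as the expected data amounts of services and resources on the Monday of some future week.
With the above prediction method, the consumption trend of the service or resource to be predicted is forecast from how services and resources changed on a given day of the previous week and from how one related service or resource changes.
The effect of this embodiment is further explained as follows:
For each basic item xn, prediction evaluation parameters are calculated for the function y=f(xn), n∈[1,8]. They include MSE (Mean Squared Error), MAPE (Mean Absolute Percentage Error), and ME (mean error). The smaller the prediction evaluation parameters, the better the prediction, and the optimal basic item is chosen by them. In this embodiment, the optimal basic item for the average downlink traffic is the average number of RRC connected users, so the fitting function y_downlink=f(x1) is the prediction result. The calculation formulas for MSE, MAPE, and ME are formulas (2.4), (2.5), and (2.6):
Figure PCTCN2015075995-appb-000010
              Formula (2.4)
Figure PCTCN2015075995-appb-000011
             Formula (2.5)
Figure PCTCN2015075995-appb-000012
           Formula (2.6)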
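Formulas (2.4) to (2.6) are only available as images, so the sketch below uses the textbook definitions of the three evaluation parameters. These are assumptions: in particular, MAPE is expressed in percent, and the sign convention of ME (actual minus predicted) may differ from the patent's.

```python
def mse(actual, predicted):
    """Mean squared error: average of squared differences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be non-zero."""
    return 100.0 * sum(abs(a - p) / abs(a)
                       for a, p in zip(actual, predicted)) / len(actual)

def mean_error(actual, predicted):
    """Mean error: average signed difference (actual minus predicted)."""
    return sum(a - p for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical values for illustration only.
observed = [100.0, 200.0]
forecast = [110.0, 190.0]
scores = (mse(observed, forecast), mape(observed, forecast),
          mean_error(observed, forecast))
```

Evaluating these for the fitting function of each candidate basic item and keeping the candidate with the smallest values mirrors the selection of the optimal basic item described above.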
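FIG. 7 compares the clustering-based prediction with direct prediction from the sample data according to Embodiment 3 of the present invention. For single-dimensional prediction with the maximum number of RRC connected users as the basic item and the average number of RRC connected users as the item to be predicted, the scatter points in FIG. 7 are the sample data, the dark curve is the function fitted after cluster analysis, and the light curve is the function fitted directly to the sample data. For multidimensional joint prediction, the settings can be chosen according to the parameters to be predicted; single-dimensional prediction is taken as the example here.
To make the clustering result easier to read, FIG. 8 lists the prediction evaluation parameters of the clustering result according to Embodiment 3 of the present invention. Each row in FIG. 8 gives the value of one evaluation parameter for a given service or resource, and each column gives the prediction evaluation parameters of one item to be predicted.
For ease of observation, FIG. 9 is a histogram of the MAPE of the channel utilization predictions according to Embodiment 3 of the present invention. PDCCH-UTI is the downlink control channel utilization and PDSCH-UTI is the downlink shared channel utilization. MAPE measures error: the higher its value, the less accurate the prediction. The figure shows that cluster-based prediction performs better than direct prediction from the data.
To show how the algorithm improves data processing, a histogram of the base-10 logarithm of the algorithm complexity is also given: FIG. 10 compares algorithm complexity according to Embodiment 3 of the present invention. The fitting method used in cluster fitting is gradient descent. For single-dimensional prediction the algorithm complexity is N*k*a, where N is the number of sample records, k is the number of service and resource categories studied, and a is the number of gradient descent iterations. For multidimensional prediction the complexity is NW*K, where W is the number of basic items of the multidimensional prediction. FIG. 10 shows that, in terms of algorithm complexity, computation after clustering is clearly better than processing the data directly, and in joint prediction the more input dimensions there are, the larger the complexity saving, which is of great significance for future LTE big data research.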
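Embodiment 4:
FIG. 11 is a flowchart of the method for predicting a service data amount and/or a resource data amount according to Embodiment 4 of the present invention. As shown in FIG. 11, the method includes:
S301: Construct an original data set from the service and resource data amounts generated in multiple different time periods within the same day of each week;
For example, after preliminary screening, the services and resources involved in the prediction are the average number of users, the forward control channel mean, the forward traffic channel mean, the reverse access channel mean, the uplink and downlink traffic, the reverse CE occupation mean, and so on. The data source is three-dimensional (user, resource, service) data from a certain area of a live 3G network. The time granularity is one hour, that is, each time period lasts one hour. The sample data was collected on July 2, 2012, and the comparison data on July 9, 2012. Specifically, the N records from July 2, 2012 are selected, each record representing the service and resource data amount consumed by one base station in one hour. The m columns of services and resources to be analyzed are filtered out, where m is the number of services and resources to be predicted, and the original data set P, an N*m matrix, is constructed;
Figure PCTCN2015075995-appb-000013
where N and m are both natural numbers and x_{N,m} denotes the amount of data consumed by a given service or resource in a given hour. FIG. 12 shows a partial data set selected from the sample data according to Embodiment 4 of the present invention. Each row in FIG. 12 represents one hour of service and resource usage of one base station in the area, and each column represents the value consumed by one type of service or resource in that hour. Seven services and resources are studied.
S302: Preprocess the original data set;
For example, the data in the original data set P is normalized. The normalization formula is formula (3.1):
Figure PCTCN2015075995-appb-000014
             Formula (3.1)
where x is the value consumed by a given service or resource in the original data set P within one hour,
Figure PCTCN2015075995-appb-000015
is the mean of the data that the service or resource consumes per hour, N is the number of time periods, and Z(x) is the normalized consumption data. After normalization the original data set is still an N*m matrix, where m and N are both positive integers.
After normalization, the m-dimensional services and resources in the normalized original data set P are reduced in dimension, for which principal component analysis can be used. The result is the preprocessed data set Q, an N*k matrix with k<m.
Figure PCTCN2015075995-appb-000016
where N and k are both natural numbers and y_{N,k} denotes the data after dimension reduction. Each entry of the preprocessed data set Q is obtained from the normalized original data set P by principal component analysis. Taking the first converted column as an example, the conversion formula is formula (3.2):
Q(i,1)=P1(i,1)*0.338+P1(i,2)*0.333+P1(i,3)*0.340+P1(i,4)*0.329+P1(i,5)*0.176+P1(i,6)*0.326+P1(i,7)*0.319                  Formula (3.2)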
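S303: Perform two clustering passes on the preprocessed data set and the original data set to obtain the clustering data set;
For example, each entry of the preprocessed data set Q is normalized. The normalization formula is formula (3.3):
Figure PCTCN2015075995-appb-000017
               Formula (3.3)
where Y(x) denotes the data in the preprocessed data set Q, minY(x) and maxY(x) are respectively the minimum and maximum of all the data in the preprocessed data set, and Z(x) is the normalized data.
After normalization, the normalized preprocessed data set is divided into h classes by a clustering algorithm: the service and resource data first undergoes one pass of Canopy clustering, and the Canopy result is used as the initial clustering centers of a second, K-means clustering. After clustering, the clustering data set Q1 is obtained; Q1 is an h*m matrix with h<<N. FIG. 13 shows the clustering data set obtained by clustering according to Embodiment 4 of the present invention. In FIG. 13, h is 10, that is, there are 10 classes. Each row gives the clustering centers of the services or resources in one class, that is, the mean of the service and resource data contained in that class.
S304: Determine the fitting function according to the clustering data set;
For example, a basic item and its data amount and an item to be predicted and its data amount are selected from Q1, and a curve is fitted to the two data amounts by gradient descent. The result is a function y=f(xn), n∈[1,6], where y is the item to be predicted and xn is the basic item, that is, the service or resource used to predict y. Different basic items give different fitting functions. For example, if the average number of users is the basic item and the reverse CE occupation mean is the item to be predicted, the fitting function is y_reverseCE=f(x1), where y is the reverse CE occupation mean and x1 is the average number of users.
S305: Predict the service and resource data amounts for the time period to be predicted.
For example, the fitting function obtained is used to predict the service and resource data amounts in the required time period, such as the expected data amounts of services and resources on the Monday of some future week.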
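The effect of this embodiment is further explained as follows:
For each basic item xn, prediction evaluation parameters are calculated for the function y=f(xn), n∈[1,6]. They include MSE (Mean Squared Error), MAPE (Mean Absolute Percentage Error), and ME (mean error). The smaller the prediction evaluation parameters, the better the prediction, and the optimal basic item is chosen by them. In this embodiment, the optimal basic item for the reverse CE occupation mean is the average number of users, so the fitting function y_reverseCE=f(x1) is the prediction result. The calculation formulas for MSE, MAPE, and ME are formulas (3.4), (3.5), and (3.6):
Figure PCTCN2015075995-appb-000018
                 Formula (3.4)
Figure PCTCN2015075995-appb-000019
              Formula (3.5)
Figure PCTCN2015075995-appb-000020
              Formula (3.6)
FIG. 14 compares the clustering-based prediction with direct prediction from the sample data according to Embodiment 4 of the present invention. For single-dimensional prediction with the average number of users as the basic item and the reverse CE occupation mean as the item to be predicted, the scatter points in FIG. 14 are the sample data, the dark curve is the function fitted after cluster analysis, and the light curve is the function fitted directly to the sample data. For multidimensional joint prediction, the settings can be chosen according to the parameters to be predicted; single-dimensional prediction is taken as the example here.
To make the clustering result easier to read, FIG. 15 lists the prediction evaluation parameters of the clustering result according to Embodiment 4 of the present invention. Each row in FIG. 15 gives the value of one evaluation parameter for a given service or resource, and each column gives the prediction evaluation parameters of one item to be predicted.
For ease of observation, FIG. 16 is a histogram of the MAPE of the channel utilization predictions according to Embodiment 4 of the present invention. MAPE measures error: the higher its value, the less accurate the prediction. The figure shows that cluster-based prediction performs better than direct prediction from the data.
To show how the algorithm improves data processing, a histogram of the base-10 logarithm of the algorithm complexity is also given: FIG. 17 compares algorithm complexity according to Embodiment 4 of the present invention. The fitting method used in cluster fitting is gradient descent. For single-dimensional prediction the algorithm complexity is N*k*a, where N is the number of sample records, k is the number of service and resource categories studied, and a is the number of gradient descent iterations. For multidimensional prediction the complexity is NW*K, where W is the number of basic items of the multidimensional prediction. FIG. 17 shows that, in terms of algorithm complexity, computation after clustering is clearly better than processing the data directly, and in joint prediction the more input dimensions there are, the larger the complexity saving, which is of great significance for future LTE big data research.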
本领域内的技术人员应明白,本发明的实施例可提供为方法、系统、或计算机程序产品。因此,本发明可采用硬件实施例、软件实施例、或结合软件和硬件方面的实施例的形式。而且,本发明可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器和光学存储器等)上实施的计算机程序产品的形式。
本发明是参照根据本发明实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程 图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
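These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.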
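These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.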
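The above are merely preferred embodiments of the present invention and are not intended to limit the scope of protection of the present invention.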

Claims (15)

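  1. A method for predicting a service data amount and/or a resource data amount, comprising:
    constructing an original data set of the service data amount and/or the resource data amount;
    performing dimensionality-reduction preprocessing on the original data set to obtain a preprocessed data set;
    performing one initial clustering process on the preprocessed data set to obtain an initial clustered data set, and performing, according to the initial clustered data set, at least one precise clustering process on the original data set to obtain a precise clustered data set;
    determining a prediction model according to the precise clustered data set;
    obtaining an expected data amount of the service and/or resource according to the prediction model.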
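  2. The method for predicting a service data amount and/or a resource data amount according to claim 1, wherein constructing the original data set of the service data amount and/or the resource data amount comprises:
    determining the service and/or resource to be predicted;
    acquiring consumption data of the service and/or consumption data of the resource within at least one historical time period, taking the consumption data of the service as the service data amount, and taking the consumption data of the resource as the resource data amount;
    constructing the original data set according to the consumption data of the service and/or the consumption data of the resource.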
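  3. The method for predicting a service data amount and/or a resource data amount according to claim 1, wherein performing one initial clustering process on the preprocessed data set comprises:
    performing dimensionality-reduction preprocessing on the original data set by principal component analysis to obtain the preprocessed data set.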
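  4. The method for predicting a service data amount and/or a resource data amount according to claim 1, further comprising:
    normalizing the original data set before performing dimensionality-reduction preprocessing on the original data set to obtain the preprocessed data set; and/or,
    normalizing the preprocessed data set after performing dimensionality-reduction preprocessing on the original data set to obtain the preprocessed data set.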
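  5. The method for predicting a service data amount and/or a resource data amount according to any one of claims 1 to 4, wherein performing one initial clustering process on the preprocessed data set to obtain an initial clustered data set, and performing, according to the initial clustered data set, at least one precise clustering process on the original data set to obtain a precise clustered data set comprises:
    performing, according to an initial clustering method, one initial clustering process on the preprocessed data set to obtain the initial clustered data set;
    calculating initial cluster centers according to the initial clustered data set;
    performing, according to a precise clustering method and the initial cluster centers, one precise clustering process on the original data set to obtain the precise clustered data set.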
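  6. The method for predicting a service data amount and/or a resource data amount according to claim 5, wherein determining the prediction model according to the precise clustered data set comprises:
    determining a basic item and an item to be predicted in the precise clustered data set;
    determining a basic data amount and a data amount to be predicted according to the basic item and the item to be predicted;
    fitting the basic data amount and the data amount to be predicted by gradient descent to determine a fitting function, and taking the fitting function as the prediction model.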
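  7. The method for predicting a service data amount and/or a resource data amount according to claim 6, wherein obtaining the expected data amount of the service and/or resource according to the prediction model comprises:
    selecting different fitting functions according to different basic items;
    predicting the service data amount and/or the resource data amount according to the selected fitting function to obtain the expected data amount of the service and/or resource.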
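  8. The method for predicting a service data amount and/or a resource data amount according to any one of claims 1 to 3, further comprising, after obtaining the expected data amount of the service and/or resource according to the prediction model:
    optimizing the network according to the expected data amount.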
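  9. A system for predicting a service data amount and/or a resource data amount, comprising:
    a construction module, configured to construct an original data set of the service data amount and/or the resource data amount;
    a preprocessing module, configured to perform dimensionality-reduction preprocessing on the original data set constructed by the construction module to obtain a preprocessed data set;
    a clustering module, configured to perform one initial clustering process on the preprocessed data set obtained by the preprocessing module to obtain an initial clustered data set, and then to perform, according to the initial clustered data set, at least one precise clustering process on the original data set constructed by the construction module to obtain a precise clustered data set;
    a determination module, configured to determine a prediction model according to the precise clustered data set obtained by the clustering module;
    a prediction module, configured to obtain an expected data amount of the service and/or resource according to the prediction model determined by the determination module.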
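  10. The system for predicting a service data amount and/or a resource data amount according to claim 9, further comprising an acquisition module, wherein:
    the determination module is further configured to determine the service and/or resource to be predicted;
    the acquisition module is configured to acquire, within at least one historical time period, consumption data of the service and/or consumption data of the resource determined by the determination module, to take the consumption data of the service as the service data amount, and to take the consumption data of the resource as the resource data amount;
    the construction module is further configured to construct the original data set according to the consumption data of the service and/or the consumption data of the resource acquired by the acquisition module.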
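  11. The system for predicting a service data amount and/or a resource data amount according to claim 9, wherein:
    the preprocessing module is further configured to normalize the original data set before performing dimensionality-reduction preprocessing on the original data set constructed by the construction module to obtain the preprocessed data set; and/or,
    to normalize the preprocessed data set after performing dimensionality-reduction preprocessing on the original data set constructed by the construction module to obtain the preprocessed data set.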
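  12. The system for predicting a service data amount and/or a resource data amount according to any one of claims 9 to 11, further comprising a calculation module, wherein the clustering module comprises an initial clustering submodule and a precise clustering submodule;
    the initial clustering submodule is configured to perform, according to an initial clustering method, one initial clustering process on the preprocessed data set obtained by the preprocessing module to obtain the initial clustered data set;
    the calculation module is configured to calculate initial cluster centers according to the initial clustered data set obtained by the initial clustering submodule;
    the precise clustering submodule is configured to perform, according to a precise clustering method and the initial cluster centers calculated by the calculation module, one precise clustering process on the original data set constructed by the construction module to obtain the precise clustered data set.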
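  13. The system for predicting a service data amount and/or a resource data amount according to claim 12, wherein the determination module is further configured to determine a basic item and an item to be predicted in the precise clustered data set obtained by the precise clustering submodule, and to determine a basic data amount and a data amount to be predicted according to the basic item and the item to be predicted;
    the determination module is further configured to fit the basic data amount and the data amount to be predicted by gradient descent to determine a fitting function, and to take the fitting function as the prediction model.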
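  14. The system for predicting a service data amount and/or a resource data amount according to claim 13, further comprising:
    a selection module, configured to select different fitting functions according to different basic items;
    wherein the prediction module is specifically configured to predict the service data amount and/or the resource data amount according to the fitting function selected by the selection module to obtain the expected data amount of the service and/or resource.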
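  15. A computer storage medium storing executable instructions, the executable instructions being used to execute the method for predicting a service data amount and/or a resource data amount according to any one of claims 1 to 8.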
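The preprocessing recited in the claims (dimensionality reduction by principal component analysis, with normalization before and/or after it) could look roughly like the sketch below. Min-max scaling as the normalization, the helper names `min_max_normalize` and `pca_reduce`, the random input, and the choice of three retained components are assumptions for illustration; the claims do not fix any of them.

```python
import numpy as np

def min_max_normalize(X):
    """Scale each column to [0, 1]; constant columns map to 0."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span

def pca_reduce(X, n_components):
    """Project centered data onto its leading principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

# Hypothetical original data set: N = 200 records, 8 measured items.
rng = np.random.default_rng(0)
raw = rng.random((200, 8))

# Normalize, reduce to m = 3 dimensions, then normalize again.
pre = min_max_normalize(pca_reduce(min_max_normalize(raw), n_components=3))
```

Normalizing before PCA keeps items with large numeric ranges from dominating the principal components; normalizing afterwards puts the reduced columns on a common scale for distance-based clustering.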
PCT/CN2015/075995 2014-09-02 2015-04-07 Method and system for predicting service data amount and/or resource data amount WO2016033969A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410443753.X 2014-09-02
CN201410443753.XA CN105472631A (zh) 2014-09-02 2014-09-02 Method and system for predicting service data amount and/or resource data amount

Publications (1)

Publication Number Publication Date
WO2016033969A1 (zh) 2016-03-10

Family

ID=55439088

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/075995 WO2016033969A1 (zh) 2014-09-02 2015-04-07 Method and system for predicting service data amount and/or resource data amount

Country Status (2)

Country Link
CN (1) CN105472631A (zh)
WO (1) WO2016033969A1 (zh)

Also Published As

Publication number Publication date
CN105472631A (zh) 2016-04-06

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15838645

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15838645

Country of ref document: EP

Kind code of ref document: A1