WO2019219052A1 - Prediction method, training method, apparatus, and computer storage medium - Google Patents


Info

Publication number
WO2019219052A1
Authority
WO
WIPO (PCT)
Prior art keywords
indicator
data
feature
training
prediction model
Application number
PCT/CN2019/087185
Other languages
English (en)
French (fr)
Inventor
周敏 (Zhou Min)
杨文森 (Yang Wensen)
张建锋 (Zhang Jianfeng)
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2019219052A1 (Chinese)
Priority to US17/020,361 (published as US20200410376A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/27 Regression, e.g. linear or logistic regression
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/22 Arrangements for supervision, monitoring or testing
    • H04M3/36 Statistical metering, e.g. recording occasions when traffic exceeds capacity of trunks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W16/00 Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W16/22 Traffic simulation tools or models
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/04 Arrangements for maintaining operational condition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/08 Testing, supervising or monitoring using real traffic

Definitions

  • the present application relates to the field of communications, and more particularly to a prediction method, training method, apparatus, and computer storage medium.
  • the training data of the device can be obtained, and the prediction model can be obtained according to the training data.
  • in the forecasting process, some future situations can be predicted based on the obtained prediction model and the actual situation.
  • two types of training data of one network device can be obtained directly, and the prediction model can be trained directly on those two types of training data.
  • before conducting a business activity, a communication network operator needs to predict the future resource occupancy of a communication device (also referred to as the resource occupancy indicator) from an assumed value of the number of users of a certain service (also referred to as the number of users).
  • in this way, network devices that may become overloaded can be expanded in advance to ensure smooth operation of the system.
  • in one approach, a functional relationship between the number of users of the target device and the resource occupancy indicator is trained directly, and the resource occupancy is predicted from the assumed number of users using that relationship. Because the number of users of the target device does not change significantly during the data sample collection period, the sample data lacks diversity and does not fully reflect the relationship between the number of users and the resource occupancy. It is therefore difficult to obtain the functional relationship between the number of users and the resource occupancy, and difficult to extrapolate that relationship over a large range for prediction.
  • the present application provides a prediction method, a training method, a device, and a computer storage medium.
  • an accurate prediction model can be obtained from two sets of training data, so that one kind of indicator data can be accurately predicted from another kind of indicator data.
  • a prediction method is provided, including: acquiring first indicator data to be predicted of a target device; inputting the first indicator data to be predicted into a first prediction model to obtain predicted second indicator data of the target device; and inputting the predicted second indicator data into a second prediction model to obtain a prediction result of the target device.
  • the first prediction model may be trained according to the first training data
  • the second prediction model is trained according to the second training data
  • the first training data may include the first indicator data and the second indicator data of the plurality of devices, that is, the first indicator data and the second indicator data may be from a plurality of network devices including the target device.
  • the first indicator data and the second indicator data are not specifically limited in the embodiment of the present application, and may be any two indicator data.
  • the first indicator data may be the number of users, and the second indicator data may be the amount of traffic.
  • the second training data may include second indicator data and third indicator data of the target device.
  • that is, the second indicator data and the third indicator data may be from the target device.
  • the second indicator data and the third indicator data are not specifically limited in the embodiment of the present application, and may be any two indicator data.
  • the second indicator data may be the traffic volume.
  • the third indicator data may be the resource occupancy rate.
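The two-stage prediction described above chains the models: the first maps the first indicator (number of users) to the second (traffic volume), and the second maps that predicted traffic to the third (resource occupancy). The sketch below is a minimal illustration under stated assumptions; the function names and the linear coefficients are hypothetical placeholders, not values from this application.

```python
from typing import Callable

# A "model" here is any callable mapping one indicator value to another (assumption).
Model = Callable[[float], float]


def predict_resource_occupancy(users: float, first_model: Model, second_model: Model) -> float:
    """Chain the two prediction models: users -> traffic -> resource occupancy."""
    predicted_traffic = first_model(users)  # predicted second indicator data
    return second_model(predicted_traffic)  # prediction result (third indicator data)


# Hypothetical fitted models with made-up coefficients, for illustration only.
def first_model(users: float) -> float:
    return 0.5 * users  # traffic volume per user (assumed)


def second_model(traffic: float) -> float:
    return 0.0625 * traffic + 5.0  # CPU peak occupancy in percent (assumed)


print(predict_resource_occupancy(2000, first_model, second_model))  # 67.5
```

Because the two stages are separate functions, the first model can be fitted on data pooled from several devices while the second is fitted only on the target device, matching the split of training data described here.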
  • the present application does not specifically limit the target device and/or network device mentioned above (which may also be referred to as a communication network device); it may include, but is not limited to, any subnet in the network, a network element, a sub-device of a network element (for example, a board), or a functional unit of a network element (for example, a module).
  • communication network devices can include, but are not limited to, network adapters, network transceivers, network media converters, multiplexers, repeaters, hubs, bridges, switches, routers, gateways, and the like.
  • the number-of-users indicator in the embodiments of the present application may be expressed as the number of users who use a certain service on the network communication device.
  • the same network communication device may have multiple number-of-users indicators.
  • for example, the number of users can be expressed as a "2G+3G number of users" indicator.
  • the number of users can also be expressed as a "4G number of users" indicator.
  • the number of users can also be expressed as a "registered users" indicator. This embodiment of the present application does not specifically limit this.
  • the traffic volume indicator in the embodiments of the present application can be understood as the amount of service usage on the network communication device.
  • for example, the traffic volume of a communication network device can be represented by the "total traffic volume" indicator of the network device.
  • the traffic volume can also be represented by the "Gi Interface Packets" indicator of the network device.
  • the traffic volume can also be represented by the "SGi User Plane Packets" indicator of the network device. This embodiment of the present application does not specifically limit this.
  • the resource occupancy indicator in the embodiments of the present application may be expressed as the amount of resources consumed on the network communication device.
  • for example, the resource occupancy indicator can be expressed as "CPU peak occupancy".
  • the resource occupancy indicator can also be expressed as "memory occupancy".
  • the resource occupancy indicator can also be expressed as "License occupancy". This embodiment of the present application does not specifically limit this.
  • the multiple network devices may be network devices whose indicator relationship between the number-of-users indicator and the traffic volume indicator is consistent with that of the target device.
  • that is, the relationship between the number-of-users indicator and the traffic volume indicator of the other network devices is the same as or similar to that of the target device.
  • collecting the number-of-users indicator and the traffic volume indicator of multiple network devices, including the target device and other network devices, expands the diversity of the number-of-users data samples.
  • because the relationship between the number-of-users indicator and the traffic volume indicator of the other network devices is the same as or similar to that of the target device, a prediction model trained on the number-of-users and traffic volume indicators collected from the multiple network devices (including the target device and the other network devices) can be applied to both the target device and the other network devices.
  • the resulting prediction model can be used to accurately predict the resource occupancy of the target device and the other network devices.
  • the network element ATS0 and the network element ATS1 belong to the same type of communication device.
  • the communication device may have a hierarchical decomposition structure: the network element ATS0 may be decomposed into modules VCU0, VCU1, and DPU0 (the traffic of ATS0 may be evenly distributed among the three modules VCU0, VCU1, and DPU0 under it).
  • the network element ATS1 can likewise be decomposed into modules VCU0, VCU1, and DPU0 (the traffic of ATS1 can also be evenly distributed among the three modules under it).
  • the number-of-users indicator (for example, the "registered users" indicator) and the traffic volume indicator (for example, the "total traffic volume" indicator) can correspond to the network elements (ATS0, ATS1), while the resource occupancy indicator (for example, the "CPU peak occupancy" indicator) may correspond to the modules (VCU0, VCU1, DPU0) under each network element (ATS0, ATS1).
  • the network elements ATS0 and ATS1 belong to the same type of communication device, the "registered users" indicator and the "total traffic volume" indicator both correspond to the network element, and the definitions of these two indicators are the same on ATS0 and ATS1.
  • therefore, the correlation between the "registered users" indicator and the "total traffic volume" indicator generalizes across the device combination of ATS0 and ATS1 (the relationship between the number-of-users indicator and the traffic volume indicator is consistent on ATS0 and ATS1).
  • the first prediction model can be trained through the first training data of the collected multiple devices (including the target device and other network devices), and the diversity of the historical data samples of the first indicator data can be expanded.
  • the second prediction model can be trained on the collected second training data of the target device, so that the functional relationship between the second indicator data and the third indicator data of the target device can be reflected more accurately.
  • the method further includes: acquiring first training data; and obtaining a first prediction model according to the first training data.
  • the first prediction model is configured to predict second indicator data of the target device according to the first indicator data of the target device.
  • the embodiment of the present application does not specifically limit the implementation manner of training the first prediction model by using the first training data.
  • for example, regression analysis may be performed on the first training data to obtain the first prediction model.
  • the embodiment of the present application does not specifically limit the implementation manner of training the second prediction model by using the second training data.
  • for example, regression analysis may be performed on the second training data to obtain the second prediction model.
  • before regression analysis is performed on the first indicator data, the second indicator data, and the third indicator data (for example, the number of users, the traffic volume indicator, and the resource occupancy indicator), feature processing may also be performed on the indicator data.
  • the indicator data can be standardized.
  • the indicator data may be normalized.
  • the metric data may be subjected to dimensionality reduction processing.
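Standardization and normalization of indicator data can be sketched as follows. This assumes z-score standardization (with the population standard deviation) and min-max normalization to [0, 1], which are common choices; the application does not fix a particular formula, and the sample values are invented.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def standardize(values: Sequence[float]) -> List[float]:
    """Z-score standardization: (x - mean) / population std."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant indicator: no spread to scale
    return [(v - mu) / sigma for v in values]


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Min-max normalization to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


users = [1200, 1500, 1800, 2100]  # hypothetical number-of-users samples
print(min_max_normalize(users))  # [0.0, 0.3333333333333333, 0.6666666666666666, 1.0]
```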
  • the first indicator data to be predicted may be input into the first prediction model, and the prediction result of the target device may be obtained.
  • principal component analysis is performed on the first indicator data in the first training data to obtain a principal component analysis model; dimensionality reduction is performed on the first training data according to the principal component analysis model to obtain third training data after dimensionality reduction; and the first prediction model is trained according to the third training data.
  • principal component analysis can be performed on the first training data.
  • for example, principal component analysis may be performed on the second indicator data in the first training data to obtain a principal component analysis model, and dimensionality reduction may be performed on the first training data according to that model to obtain the third training data after dimensionality reduction.
  • alternatively, low-variance filtering may be applied to the first training data to reduce its dimensionality.
  • alternatively, backward feature elimination may be applied to the first training data to reduce its dimensionality.
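Low-variance filtering drops features that barely change across samples, since they carry little information for the regression. A minimal sketch, assuming samples are rows of numeric features and a hypothetical variance threshold; variance depends on scale, so in practice the features would usually be normalized first. The feature names and values are illustrative only.

```python
from statistics import pvariance
from typing import List, Sequence, Tuple


def low_variance_filter(
    rows: Sequence[Sequence[float]], names: Sequence[str], threshold: float
) -> Tuple[List[str], List[List[float]]]:
    """Keep only feature columns whose population variance is at least `threshold`."""
    columns = list(zip(*rows))  # transpose: one tuple per feature
    keep = [i for i, col in enumerate(columns) if pvariance(col) >= threshold]
    kept_names = [names[i] for i in keep]
    reduced_rows = [[row[i] for i in keep] for row in rows]
    return kept_names, reduced_rows


names = ["4G users", "2G+3G users", "registered users"]  # hypothetical features
rows = [[1000, 50, 3000], [1400, 50, 3600], [1800, 51, 4100]]
kept, reduced = low_variance_filter(rows, names, threshold=1.0)
print(kept)  # ['4G users', 'registered users']
```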
  • principal component analysis is a statistical method that uses an orthogonal transformation to convert a set of possibly correlated second indicator data into a set of linearly uncorrelated variables; the converted variables may be referred to as the principal components of the second indicator data.
  • reducing the dimensionality of the first indicator data avoids the computational difficulty caused by collinearity of the first indicator data, which may otherwise be encountered when performing regression analysis on the first training data.
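Principal component analysis for dimensionality reduction can be sketched as below. This is a minimal pure-Python sketch that finds the leading eigenvectors of the sample covariance matrix by power iteration with deflation (a library SVD or eigen-decomposition routine would normally be used instead); the function names and sample data are assumptions. Two perfectly collinear traffic indicators collapse onto a single component, which is the collinearity problem the dimensionality reduction is meant to remove.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def pca_fit(rows: Sequence[Sequence[float]], n_components: int,
            iterations: int = 500, seed: int = 0) -> Tuple[List[float], Matrix]:
    """Return (feature means, unit-length principal directions)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components: Matrix = []
    for _ in range(n_components):
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iterations):  # power iteration: v <- Cv / |Cv|
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # Deflation: remove the found direction before searching for the next one.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)] for i in range(d)]
    return means, components


def pca_transform(rows, means, components):
    """Project centered rows onto the principal directions (reduced features)."""
    return [[sum((r[j] - means[j]) * comp[j] for j in range(len(means)))
             for comp in components] for r in rows]


# Two hypothetical traffic indicators that move together (perfectly collinear).
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
means, comps = pca_fit(rows, n_components=1)
reduced = pca_transform(rows, means, comps)  # one feature per sample instead of two
```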
  • the first prediction model is obtained by performing regression analysis on the first training data.
  • for example, the first prediction model may be obtained by ordinary regression analysis of the first training data, or by regression analysis of the first training data constrained through the coordinate origin.
  • training the first prediction model by regression analysis allows the degree of correlation between the factors and the goodness of fit to be measured accurately, and the calculation is simple and easy to implement.
  • for example, if the diversity of the first training data does not satisfy a preset condition, regression analysis constrained through the origin is performed on the first training data; if the diversity satisfies the preset condition, regression analysis not constrained through the origin is performed on the first training data.
  • specifically, the data diversity of the first indicator data may be determined. If the data diversity of the first indicator data in the first training data does not satisfy the preset condition, regression analysis constrained through the coordinate origin may be performed on the first indicator data and the second indicator data in the first training data. If the data diversity satisfies the preset condition, the regression analysis of the first indicator data and the second indicator data in the first training data need not be constrained through the coordinate origin.
  • the preset condition may be a preset threshold: if the data diversity of the first indicator data reaches the preset threshold, the data diversity of the first indicator data satisfies the preset condition.
  • both regression analysis constrained through the coordinate origin and regression analysis not constrained through the origin can be regarded as regression analysis of the data.
  • the model obtained by regression analysis constrained through the coordinate origin has no constant term, whereas the model obtained by regression analysis not constrained through the origin may have a constant term.
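The choice between regression through the origin and ordinary regression can be sketched for a single first indicator as below. The application only says that the diversity is compared with a preset condition such as a threshold; using the coefficient of variation as the diversity measure and 0.3 as the threshold are assumptions made here for illustration, as are the function names.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple


def diversity(values: Sequence[float]) -> float:
    """Hypothetical diversity measure: coefficient of variation (std / mean)."""
    m = mean(values)
    return pstdev(values) / m if m else 0.0


def fit_through_origin(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least squares y ~ b * x with no constant term; returns (0.0, b)."""
    slope = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    return 0.0, slope


def fit_with_intercept(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares y ~ a + b * x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_first_model(users, traffic, threshold: float = 0.3) -> Tuple[float, float]:
    """Constrain the fit through the origin when user-count diversity is too low."""
    if diversity(users) < threshold:
        return fit_through_origin(users, traffic)
    return fit_with_intercept(users, traffic)


# Low diversity: the number of users barely changes, so the fit goes through the origin.
print(fit_first_model([1000, 1010, 990, 1005], [500, 505, 495, 502.5]))  # (0.0, 0.5)
```

Forcing the line through the origin when the samples cluster in a narrow range keeps the extrapolated model from inheriting an intercept that the data cannot actually pin down.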
  • the first indicator data may first be normalized, and the data diversity of the normalized first indicator data may be determined.
  • the first indicator data may also be standardized, and the data diversity of the standardized first indicator data may be determined.
  • the first indicator data may also be subjected to dimensionality reduction, and the data diversity of the dimension-reduced first indicator data may be determined.
  • when the diversity of the first indicator data in the first training data does not satisfy the preset condition, the regression analysis of the first indicator data and the second indicator data may be constrained through the coordinate origin, to avoid inaccurate model extrapolation caused by insufficient diversity of the first indicator data in the data set.
  • when the diversity of the first indicator data in the first training data satisfies the preset condition, regression analysis not constrained through the origin may be used, so that the information provided by the data set is fully utilized to obtain a more accurate model.
  • the second prediction model is obtained by performing regression analysis on the second training data.
  • for example, regression analysis constrained through the origin may be performed on the second training data to train the second prediction model.
  • alternatively, regression analysis not constrained through the coordinate origin may be performed on the second training data to train the second prediction model. This application does not specifically limit this.
  • the second prediction model is obtained by performing a quantile regression analysis on the second training data.
  • quantile regression analysis is one method of regression analysis.
  • a quantile is a numerical point that divides the distribution range of a random variable according to a given probability proportion.
  • quantile regression can predict the upper or lower bound of an indicator.
  • for example, a quantile parameter of 0.1 divides the distribution of the variable into two parts such that the probability that the variable is less than its 0.1 quantile is 0.1.
  • the method for performing quantile regression by using the second indicator data and the third indicator data in the second training data is not specifically limited in the embodiment of the present application.
  • a linear quantile regression method can be used.
  • a non-linear quantile regression method can also be used.
  • establishing the second prediction model by quantile regression analysis makes it possible to predict the upper and lower bounds of the third indicator data rather than its average value, which meets application requirements that focus on boundary values.
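Linear quantile regression minimizes the pinball (check) loss instead of squared error. The sketch below, for a single service usage feature, relies on the standard property that the underlying linear program has an optimal solution passing through at least two data points with distinct x values, so a brute-force search over pairs is exact. It is O(n^3) and only suited to small illustrative data; production code would use a linear-programming solver. The names and data are hypothetical.

```python
from itertools import combinations
from typing import Sequence, Tuple


def pinball_loss(residuals: Sequence[float], tau: float) -> float:
    """Quantile (check) loss: tau * r for r >= 0, (tau - 1) * r for r < 0."""
    return sum(tau * r if r >= 0 else (tau - 1) * r for r in residuals)


def linear_quantile_regression(x: Sequence[float], y: Sequence[float],
                               tau: float) -> Tuple[float, float]:
    """Exact simple linear quantile regression y ~ a + b * x; returns (a, b)."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical pair does not define a candidate line
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        loss = pinball_loss([yk - (a + b * xk) for xk, yk in zip(x, y)], tau)
        if best is None or loss < best[0]:
            best = (loss, a, b)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


traffic = [1, 2, 3, 4, 5, 6]   # hypothetical service usage feature
cpu = [3, 4, 7, 8, 11, 12]     # hypothetical resource consumption feature
upper = linear_quantile_regression(traffic, cpu, tau=0.9)  # upper-bound line
lower = linear_quantile_regression(traffic, cpu, tau=0.1)  # lower-bound line
```

For this data the 0.9 quantile line is y = 1 + 2x and the 0.1 quantile line is y = 2x, bracketing the observations from above and below rather than passing through their mean.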
  • a training method including: acquiring first training data and second training data; obtaining a first prediction model according to the first training data; and obtaining a second prediction model according to the second training data.
  • the first indicator data may be number-of-users indicator data.
  • the second indicator data may be traffic volume indicator data.
  • the third indicator data may be resource occupancy indicator data.
  • the first indicator data to be predicted of the target device is acquired; the first indicator data to be predicted is input into the first prediction model to obtain the predicted second indicator data of the target device; and the predicted second indicator data is input into the second prediction model to obtain a prediction result of the target device.
  • principal component analysis is performed on the second indicator data in the first training data to obtain a principal component analysis model; dimensionality reduction is performed on the first training data according to the principal component analysis model to obtain third training data after dimensionality reduction; and the first prediction model is trained according to the third training data.
  • the first prediction model is obtained by performing regression analysis on the first training data.
  • if the diversity of the first training data does not satisfy a preset condition, regression analysis constrained through the origin is performed on the first training data; if the diversity satisfies the preset condition, regression analysis not constrained through the origin is performed on the first training data.
  • the second training data of the target device is acquired; and the second prediction model of the target device is trained according to the second training data.
  • the second prediction model may be trained according to the acquired second training data.
  • the second prediction model is obtained by performing regression analysis on the second training data.
  • the second prediction model is obtained by performing a quantile regression analysis on the second training data.
  • the indicator relationship between the first indicator data and the second indicator data is consistent between the target device and the other network devices.
  • the first indicator data and the second indicator data of the multiple devices may be acquired, and a prediction model obtained by training on the first indicator data and the second indicator data of the multiple devices can be applied to all of those devices.
  • a method for modeling the numerical relationship between a tenant quantity indicator and a resource consumption indicator is provided, including: performing a first regression analysis on a first data set, which describes the numerical relationship between features of the tenant quantity indicator and features of a service usage indicator, to obtain a first prediction model; and performing a second regression analysis on a second data set, which describes the numerical relationship between features of the service usage indicator and features of the resource consumption indicator, to obtain a second prediction model. Any data sample in the first data set corresponds to the values of the tenant quantity indicator and the service usage indicator under a certain device combination.
  • the original values of the tenant quantity indicators of some devices in the device combination are either used directly as the tenant quantity indicator features of the data sample, or input into first feature processing, whose output values are used as the tenant quantity indicator features of the data sample.
  • the original values of the service usage indicators of some devices in the device combination are either used directly as the service usage indicator features of the data sample, or input into second feature processing, whose output values are used as the service usage indicator features of the data sample.
  • the set of device combinations corresponding to all the data samples is not a singleton: the data set contains at least one pair of data samples such that, for at least one tenant quantity indicator, the original values of that indicator in the two samples are taken from two different devices.
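The requirement that the device combinations are not a singleton can be checked mechanically. A minimal sketch, assuming each sample records which device supplied each tenant quantity indicator; the `Sample` layout and the function name are hypothetical, and only the source-device bookkeeping is modeled (the indicator values themselves are omitted).

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Dict, List


@dataclass
class Sample:
    """One sample of the first data set (hypothetical layout)."""
    # Tenant quantity indicator name -> device that provided its original value.
    tenant_sources: Dict[str, str] = field(default_factory=dict)


def is_not_singular(samples: List[Sample]) -> bool:
    """True if some pair of samples takes a shared tenant indicator from different devices."""
    for s1, s2 in combinations(samples, 2):
        for name in s1.tenant_sources.keys() & s2.tenant_sources.keys():
            if s1.tenant_sources[name] != s2.tenant_sources[name]:
                return True
    return False


pooled = [Sample({"registered users": "ATS0"}), Sample({"registered users": "ATS1"})]
single = [Sample({"registered users": "ATS0"}), Sample({"registered users": "ATS0"})]
print(is_not_singular(pooled), is_not_singular(single))  # True False
```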
  • any data sample in the second data set corresponds to the values of the service usage indicator and the resource consumption indicator under a certain device combination.
  • the original values of the service usage indicators of some devices in the device combination are either used directly as the service usage indicator features of the data sample, or input into second feature processing, whose output values are used as the service usage indicator features of the data sample.
  • the original values of the resource consumption indicators of some devices in the device combination are either used directly as the resource consumption indicator features of the data sample, or input into third feature processing, whose output values are used as the resource consumption indicator features of the data sample.
  • the service usage indicator is determined according to the tenant quantity indicator and the service usage indicator.
  • the load distribution relationship between the device that provides the original value of the tenant quantity indicator and the device that provides the original value of the service usage indicator is similar across different data samples.
  • in the second feature processing, when the input values are all zeros or approximately all zeros, the output values are all zeros or approximately all zeros.
  • the second feature processing includes a first translation transformation, the first translation transformation being determined by the following steps:
  • Partial processing in the second feature processing is performed on all zero or approximately all zero input values, and a first translation transform is determined based on the partially processed output values.
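The translation transformation can be illustrated with a simple case. The sketch below assumes that the partial processing is a z-score standardization fitted on training data (the application does not say which step it is). The translation is the output of that step for an all-zero input; subtracting it makes an all-zero input map to an all-zero output, presumably so that the origin constraint used in the first regression analysis remains meaningful after feature processing. All names and data are hypothetical.

```python
from statistics import mean, pstdev
from typing import Callable, List, Sequence

Vector = List[float]
Transform = Callable[[Sequence[float]], Vector]


def fit_standardizer(rows: Sequence[Sequence[float]]) -> Transform:
    """Partial processing (assumed): per-feature z-score using training statistics."""
    columns = list(zip(*rows))
    mus = [mean(c) for c in columns]
    sigmas = [pstdev(c) or 1.0 for c in columns]  # guard against zero variance
    return lambda x: [(v - m) / s for v, m, s in zip(x, mus, sigmas)]


def with_translation(partial: Transform, dim: int) -> Transform:
    """Append a translation so that an all-zero input maps to an all-zero output."""
    offset = partial([0.0] * dim)  # partial processing applied to an all-zero input
    return lambda x: [a - b for a, b in zip(partial(x), offset)]


train = [[1000.0, 40.0], [1400.0, 60.0], [1800.0, 80.0]]
process = with_translation(fit_standardizer(train), dim=2)
print(process([0.0, 0.0]))  # [0.0, 0.0]
```

After the translation, standardization reduces to pure per-feature scaling (x / sigma), which keeps zero at zero.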
  • the first regression analysis includes: performing, on the first data set, regression analysis constrained through the origin between the tenant quantity indicator features and the service usage indicator features.
  • the first regression analysis includes: when the diversity of the tenant quantity indicator on the first data set does not satisfy a preset condition, performing regression analysis constrained through the origin between the tenant quantity indicator features and the service usage indicator features on the first data set, to obtain the first prediction model.
  • the first regression analysis includes: when the diversity of the tenant quantity indicator on the first data set satisfies the preset condition, performing regression analysis not constrained through the origin between the tenant quantity indicator features and the service usage indicator features on the first data set, to obtain the first prediction model.
  • the second feature processing includes: performing first dimensionality reduction mapping on certain service usage indicators of the devices in the first data set to obtain the service usage indicator features.
  • the first dimensionality reduction mapping process includes: performing feature processing by using a service usage principal component model, where the service usage principal component model is determined by the following steps:
  • the principal component analysis is performed to obtain the service usage principal component model
  • in the first feature processing, when the input values are all zeros or approximately all zeros, the output values are all zeros or approximately all zeros.
  • the first feature processing includes a second translation transformation, where the second translation transformation is determined as follows: part of the first feature processing is performed on all-zero or approximately all-zero input values, and the second translation transformation is determined based on the output values of that partial processing.
  • the first feature processing includes: performing second dimensionality reduction mapping on certain tenant quantity indicators of the devices in the first data set to obtain the tenant quantity indicator features.
  • the second dimensionality reduction mapping process includes: performing feature processing by using a tenant number principal component model, and determining a tenant number principal component model by the following steps:
  • the principal component analysis is performed, and the principal component model describing the tenant quantity index is obtained;
  • the second regression analysis includes: performing quantile regression analysis on the service usage indicator features and the resource consumption indicator features in the second data set to obtain the second prediction model.
  • an apparatus for modeling a numerical relationship between a tenant quantity indicator and a resource consumption indicator including:
  • a first processing module configured to perform a first regression analysis to obtain a first prediction model on a first data set describing a numerical relationship between a feature of the tenant quantity indicator and a feature of the service usage indicator;
  • the second processing module is configured to perform a second regression analysis to obtain a second prediction model on a second data set describing a numerical relationship between a feature of the service usage indicator and a feature of the resource consumption indicator.
  • any data sample in the first data set corresponds to the values of the tenant quantity indicator and the service usage indicator under a certain device combination.
  • the original values of the tenant quantity indicators of some devices in the device combination are either used directly as the tenant quantity indicator features of the data sample, or input into first feature processing, whose output values are used as the tenant quantity indicator features of the data sample.
  • the original values of the service usage indicators of some devices in the device combination are either used directly as the service usage indicator features of the data sample, or input into second feature processing, whose output values are used as the service usage indicator features of the data sample.
  • the set of device combinations corresponding to all the data samples is not a singleton: the data set contains at least one pair of data samples such that, for at least one tenant quantity indicator, the original values of that indicator in the two samples are taken from two different devices.
  • any data sample in the second data set corresponds to the values of the service usage indicator and the resource consumption indicator under a certain device combination.
  • the original values of the service usage indicators of some devices in the device combination are either used directly as the service usage indicator features of the data sample, or input into second feature processing, whose output values are used as the service usage indicator features of the data sample.
  • the original values of the resource consumption indicators of some devices in the device combination are either used directly as the resource consumption indicator features of the data sample, or input into third feature processing, whose output values are used as the resource consumption indicator features of the data sample.
  • the service usage indicator is determined according to the tenant quantity indicator and the service usage indicator.
  • the load distribution relationship between the device that provides the original value of the tenant quantity indicator and the device that provides the original value of the service usage indicator is similar across different data samples.
  • in the second feature processing, when the input values are all zero or approximately all zero, the output values are also all zero or approximately all zero.
  • the second feature processing includes a first translation transformation, which is determined by the following steps:
  • performing part of the second feature processing on all-zero or approximately all-zero input values, and determining the first translation transformation based on the output values of that partial processing.
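One way to read the translation step: run the preceding part of the feature processing on an all-zero input, then subtract that output so that zero input maps to zero output. The sketch assumes the partial processing is a z-score standardization followed by a fixed linear projection; both choices, and the names `make_partial_processing` and `with_translation`, are illustrative assumptions rather than the method of the text.

```python
import numpy as np


def make_partial_processing(X, W):
    """Hypothetical partial feature processing: standardize each column with
    statistics from X, then project with weight matrix W."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns

    def partial(Z):
        return ((np.asarray(Z, dtype=float) - mean) / std) @ W

    return partial


def with_translation(partial, n_features):
    """Determine the translation from the partial output at an all-zero input
    and return a processing that maps all-zero input to all-zero output."""
    offset = partial(np.zeros((1, n_features)))  # output for the zero input

    def processed(Z):
        return partial(Z) - offset

    return processed


X = np.array([[10.0, 200.0], [20.0, 380.0], [30.0, 610.0]])
W = np.array([[0.7], [0.7]])
f = with_translation(make_partial_processing(X, W), n_features=2)
print(f(np.zeros((1, 2))))  # [[0.]]
```

Without the translation, standardization alone would send a zero input to `-mean / std`, which violates the zero-in, zero-out property stated above.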
  • the first regression analysis includes: performing, on the first data set, a regression analysis constrained to pass through the origin between the feature of the tenant quantity indicator and the feature of the service usage indicator.
  • the first processing module is specifically configured to: when the diversity of the tenant quantity indicator on the first data set does not meet a preset condition, perform, on the first data set, the regression analysis between the feature of the tenant quantity indicator and the feature of the service usage indicator to obtain the first prediction model.
  • the first regression analysis includes: when the diversity of the tenant quantity indicator on the first data set meets the preset condition, performing, on the first data set, a regression analysis that is not constrained to pass through the origin between the feature of the tenant quantity indicator and the feature of the service usage indicator, to obtain the first prediction model.
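A minimal sketch of the two regression variants and the switch between them, assuming a single scalar feature on each side and assuming that "diversity" is measured by the coefficient of variation of the tenant quantity feature compared against a threshold. The text fixes neither choice, so `min_cv` and the diversity measure are hypothetical. The branch direction follows the two bullets immediately above (insufficient diversity leads to the origin-constrained fit).

```python
import numpy as np


def fit_through_origin(x, y):
    """Least squares for y = b * x (intercept fixed at zero)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b = float(x @ y / (x @ x))
    return 0.0, b


def fit_with_intercept(x, y):
    """Ordinary least squares for y = a + b * x."""
    x = np.asarray(x, dtype=float)
    A = np.column_stack([np.ones_like(x), x])
    (a, b), *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return float(a), float(b)


def fit_first_model(x, y, min_cv=0.5):
    """Choose the regression variant from the diversity of x.

    Diversity is taken here as the coefficient of variation of x (an assumed
    measure; x is assumed to have a non-zero mean, as counts of tenants do).
    Low diversity -> constrain the fit through the origin;
    sufficient diversity -> fit an intercept as well.
    """
    x = np.asarray(x, dtype=float)
    cv = x.std() / abs(x.mean())
    if cv >= min_cv:
        return fit_with_intercept(x, y)
    return fit_through_origin(x, y)
```

The intuition behind constraining through the origin is that zero tenants should imply (approximately) zero service usage, which anchors the line when the observed samples are clustered in a narrow range.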
  • the second feature processing includes: performing first dimensionality reduction mapping on the service usage indicators of some devices in the first data set to obtain the feature of the service usage indicator.
  • the first dimensionality reduction mapping includes: performing feature processing with a service usage principal component model, where the service usage principal component model is determined by the following step:
  • principal component analysis is performed to obtain the service usage principal component model.
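Principal component analysis of several service usage indicators can be sketched with NumPy's SVD. The example assumes rows are samples and columns are service usage indicators (for example, the Gi and SGi packet counts), centers the data, and keeps the first `k` components; the names `fit_pca` and `pca_transform` are illustrative.

```python
import numpy as np


def fit_pca(X, k=1):
    """Fit a PCA model on X (samples x indicators); return (mean, components)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]


def pca_transform(X, model):
    """Project X onto the retained principal components."""
    mean, components = model
    return (np.asarray(X, dtype=float) - mean) @ components.T


# Two strongly correlated service usage indicators collapse to one feature.
samples = [[1, 2], [2, 4], [3, 6], [4, 8]]
model = fit_pca(samples, k=1)
print(pca_transform(samples, model).shape)  # (4, 1)
```

Because the projection subtracts the mean, an all-zero input does not map to zero here; this is the situation the translation transformation described earlier is meant to correct.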
  • in the first feature processing, when the input values are all zero or approximately all zero, the output values are also all zero or approximately all zero.
  • the first feature processing includes a second translation transformation, which is determined by performing part of the first feature processing on all-zero or approximately all-zero input values and determining the second translation transformation based on the output values of that partial processing.
  • the first feature processing includes: performing second dimensionality reduction mapping on the tenant quantity indicators of some devices in the first data set to obtain the feature of the tenant quantity indicator.
  • the second dimensionality reduction mapping includes: performing feature processing with a tenant quantity principal component model, where the tenant quantity principal component model is determined by the following step:
  • principal component analysis is performed to obtain the principal component model describing the tenant quantity indicator.
  • the second regression analysis includes: performing, on the second data set, a quantile regression analysis between the feature of the service usage indicator and the feature of the resource consumption indicator to obtain the second prediction model.
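Quantile regression minimizes the pinball loss instead of the squared error, so the fitted line tracks a chosen quantile (for capacity planning an upper quantile is a natural choice, although the text does not specify one). The sketch below is an exact but brute-force solver for a single feature, relying on the fact that some optimal line passes through two data points; it is meant for illustration only. A production system would more likely use a linear-programming solver or a library such as statsmodels (`QuantReg`). The function names are hypothetical.

```python
import itertools

import numpy as np


def pinball_loss(residuals, tau):
    """Quantile (pinball) loss summed over residuals r = y - y_hat."""
    return float(np.sum(np.maximum(tau * residuals, (tau - 1.0) * residuals)))


def fit_linear_quantile(x, y, tau=0.95):
    """Exact one-feature linear quantile regression y ~ a + b * x.

    Some optimal line interpolates two points with distinct x, so enumerating
    all such pairs finds a minimizer (O(n^3) overall; small data only).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best = None
    for i, j in itertools.combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line is not a function of x
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        loss = pinball_loss(y - (a + b * x), tau)
        if best is None or loss < best[0]:
            best = (loss, a, b)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


# Hypothetical traffic feature vs. resource consumption feature.
traffic = np.arange(20.0)
cpu = 2 * traffic + (traffic * 7) % 5
a, b = fit_linear_quantile(traffic, cpu, tau=0.9)
```

At the optimum, at most a fraction `1 - tau` of the samples lie strictly above the fitted line, which is what makes an upper-quantile model conservative for resource estimates.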
  • a training apparatus is provided, including: a first acquiring module, configured to acquire first training data and second training data; a first training module, configured to train a first prediction model according to the first training data; and a second training module, configured to train a second prediction model according to the second training data.
  • the first training data includes first indicator data and second indicator data of a plurality of network devices (for example, the target device and other network devices), and the second training data includes second indicator data and third indicator data of the target device.
  • the first prediction model is used to indicate the mapping relationship between the first indicator data and the second indicator data of the target device.
  • the second prediction model is used to indicate the mapping relationship between the second indicator data and the third indicator data of the target device.
  • the apparatus further includes: a second acquiring module, configured to acquire first indicator data to be predicted of the target device; a first determining module, configured to input the first indicator data to be predicted into the first prediction model to obtain predicted second indicator data of the target device; and a second determining module, configured to input the predicted second indicator data into the second prediction model to obtain a prediction result of the target device.
  • the prediction model includes the first prediction model and the second prediction model; the first prediction model is obtained by training according to the first training data, and the second prediction model is obtained by training according to the second training data.
  • the first training module is specifically configured to: perform principal component analysis on the second indicator data in the first training data to obtain a principal component analysis model; perform dimensionality reduction on the first training data by using the principal component analysis model to obtain third training data; and train the first prediction model according to the third training data.
  • the first training module is specifically configured to: obtain the first prediction model by performing regression analysis on the first training data.
  • the first training module is specifically configured to: perform, on the first training data, a regression analysis constrained to pass through the origin if the diversity of the first training data meets a preset condition; or perform, on the first training data, a regression analysis not constrained to pass through the origin if the diversity of the first training data does not meet the preset condition.
  • the second training module is specifically configured to: obtain the second prediction model by performing regression analysis on the second training data.
  • the second training module is specifically configured to: obtain the second prediction model by performing quantile regression analysis on the second training data.
  • the relationship between the first indicator data and the second indicator data of the target device is consistent with that of the other network devices.
  • according to a sixth aspect, a prediction apparatus is provided, including: a first acquiring module, configured to acquire first indicator data to be predicted of the target device; a first determining module, configured to input the first indicator data to be predicted into the first prediction model to obtain predicted second indicator data of the target device; and a second determining module, configured to input the predicted second indicator data into the second prediction model to obtain a prediction result of the target device.
  • the first prediction model is trained according to the first training data
  • the second prediction model is trained according to the second training data, where the first training data includes first indicator data and second indicator data of a plurality of devices.
  • the second training data includes second indicator data and third indicator data of the target device, the plurality of devices including the target device.
  • the apparatus further includes: a second acquiring module, configured to acquire the first training data; and a first training module, configured to train the first prediction model according to the first training data.
  • the first prediction model is configured to predict second indicator data of the target device according to the first indicator data of the target device.
  • the first training module is specifically configured to: perform principal component analysis on the second indicator data in the first training data to obtain a principal component analysis model; perform dimensionality reduction on the first training data by using the principal component analysis model to obtain third training data; and train the first prediction model according to the third training data.
  • the first training module is specifically configured to: obtain the first prediction model by performing regression analysis on the first training data.
  • the first training module is specifically configured to: perform, on the first training data, a regression analysis constrained to pass through the origin if the diversity of the first training data meets a preset condition; or perform, on the first training data, a regression analysis not constrained to pass through the origin if the diversity of the first training data does not meet the preset condition.
  • the apparatus further includes: a third acquiring module, configured to acquire second training data of the target device, where the second training data includes second indicator data and third indicator data of the target device; and a second training module, configured to train a second prediction model of the target device according to the second training data, where the second prediction model is used to indicate the mapping relationship between the second indicator data and the third indicator data of the target device.
  • the second training module is specifically configured to: obtain the second prediction model by performing regression analysis on the second training data.
  • the second training module is specifically configured to: obtain the second prediction model by performing a quantile regression analysis on the second training data.
  • the relationship between the first indicator data and the second indicator data is consistent between the target device and the other network devices.
  • a training apparatus is provided, including a memory configured to store a program and a processor configured to execute the program stored in the memory; when the program is executed, the processor is configured to perform the method in the second aspect or any implementation of the second aspect.
  • a prediction apparatus is provided, including a memory configured to store a program and a processor configured to execute the program stored in the memory; when the program is executed, the processor is configured to perform the method in the first aspect or any implementation of the first aspect.
  • an apparatus for modeling the numerical relationship between a tenant quantity indicator and a resource consumption indicator is provided, including a memory and a processor, where the memory is configured to store a program and the processor is configured to execute the program stored in the memory; when the program is executed, the processor performs the method in the third aspect or any implementation of the third aspect.
  • according to a tenth aspect, a computer-readable storage medium is provided, including computer instructions that, when run on the training apparatus, cause the training apparatus to perform the method in the second aspect or any implementation of the second aspect.
  • a computer-readable storage medium is provided, including computer instructions that, when run on the prediction apparatus, cause the prediction apparatus to perform the method in the first aspect or any implementation of the first aspect.
  • according to a twelfth aspect, a computer-readable storage medium is provided, including computer instructions that, when run on an apparatus for modeling the numerical relationship between a tenant quantity indicator and a resource consumption indicator, cause the apparatus to perform the method in the third aspect or any implementation of the third aspect.
  • according to a thirteenth aspect, a chip is provided, including a memory configured to store a program and a processor configured to execute the program stored in the memory; when the program is executed, the processor performs the method in the first aspect or any implementation of the first aspect.
  • a chip is provided, including a memory configured to store a program and a processor configured to execute the program stored in the memory; when the program is executed, the processor performs the method in the second aspect or any implementation of the second aspect.
  • a computer program product is provided; when the computer program product runs on a computer, the computer is caused to perform the method in the first aspect or any implementation of the first aspect.
  • a computer program product is provided; when the computer program product runs on a computer, the computer is caused to perform the method in the second aspect or any implementation of the second aspect.
  • FIG. 1 is a schematic flowchart of a prediction method provided by an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of a possible scenario in which the relationship between indicators is universal across device combinations, according to an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of a possible scenario in which the relationship between indicators is universal across device combinations, according to another embodiment of the present application.
  • FIG. 4 is a schematic flowchart of training a first prediction model and a second prediction model according to another embodiment of the present application.
  • FIG. 5 is a schematic flowchart of training a first prediction model and a second prediction model according to another embodiment of the present application.
  • FIG. 6 is a schematic flowchart of training a first prediction model and a second prediction model according to another embodiment of the present application.
  • FIG. 7 is a schematic flowchart of training a first prediction model and a second prediction model according to another embodiment of the present application.
  • FIG. 8 is a schematic flowchart of training a first prediction model and a second prediction model according to another embodiment of the present application.
  • FIG. 9 is a schematic flowchart of training a first prediction model and a second prediction model according to another embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a training device 1000 according to an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of a prediction apparatus 1100 according to an embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of a training device 1200 according to an embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of a prediction apparatus 1300 provided by an embodiment of the present application.
  • the present application does not specifically limit the scenarios in which the second indicator data is predicted according to the first indicator data and the prediction model. The method can be applied to various communication network devices and also to various computer devices, for example, computer devices used in a data operation center.
  • the embodiment of the present application provides a prediction method, which can accurately predict a prediction result (third indicator data) of a target device according to the first indicator data to be predicted of the target device.
  • the embodiments of the present application are described in detail below with reference to FIG. 1 .
  • FIG. 1 is a schematic flowchart of a prediction method provided by an embodiment of the present application.
  • the method of FIG. 1 may include steps 110-130, which are described in detail below.
  • step 110: acquire the first indicator data to be predicted of the target device.
  • the target device in the embodiment of the present application may also be referred to as a device to be modeled.
  • the present application does not specifically limit the target device and/or network device (which may also be referred to as a communication network device) mentioned above; they may include, but are not limited to, any subnet or network element in the network, a sub-device of a network element (for example, a board), or a functional unit of a network element (for example, a module).
  • communication network devices can include, but are not limited to, network adapters, network transceivers, network media converters, multiplexers, repeaters, hubs, bridges, switches, routers, gateways, and the like.
  • the type of the device (which may also be referred to as a network communication device) is not specifically limited, and may be any communication network device.
  • the device may be an advanced telephony server (ATS).
  • the device may also be a unified packet gateway (UGW).
  • step 120: input the first indicator data to be predicted into the first prediction model to obtain the predicted second indicator data of the target device.
  • the embodiment of the present application does not specifically limit the prediction model.
  • for example, it may be a single prediction model.
  • the prediction model may include a first prediction model and a second prediction model.
  • the first prediction model is trained according to the first training data.
  • the second prediction model is trained according to the second training data.
  • the first training data in the embodiment of the present application may include first indicator data and second indicator data of the plurality of devices.
  • the first indicator data and the second indicator data may come from a plurality of network devices including the target device.
  • the first indicator data and the second indicator data are not specifically limited in the embodiments of the present application; they may be two indicators that are positively correlated or two indicators that are negatively correlated.
  • step 130: input the predicted second indicator data into the second prediction model to obtain the prediction result of the target device.
  • the second prediction model in the embodiment of the present application may be trained according to the second training data, and the second training data may include second indicator data and third indicator data of the target device.
  • the second indicator data and the third indicator data may be from the target device, or may be from multiple network devices including the target device, which is not specifically limited in this application.
  • the predicted second indicator data output by the first prediction model may be input into the second prediction model to obtain the prediction result of the target device, that is, the predicted third indicator data output by the second prediction model.
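The two-stage chain of steps 110 to 130 amounts to function composition. In the sketch below, the linear model form and the coefficient values are purely illustrative assumptions; any trained first and second prediction models exposing a `predict` method could be substituted. `LinearModel` and `predict_resource_occupancy` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class LinearModel:
    """y = intercept + slope * x (a stand-in for a trained prediction model)."""

    intercept: float
    slope: float

    def predict(self, x: float) -> float:
        return self.intercept + self.slope * x


def predict_resource_occupancy(users, first_model, second_model):
    """Step 120: users -> predicted traffic; step 130: traffic -> occupancy."""
    predicted_traffic = first_model.predict(users)
    return second_model.predict(predicted_traffic)


# Hypothetical coefficients: 0.5 traffic units per user, then a 5 % baseline
# plus 0.002 % CPU per traffic unit.
first = LinearModel(intercept=0.0, slope=0.5)
second = LinearModel(intercept=5.0, slope=0.002)
occupancy = predict_resource_occupancy(60000, first, second)
```

Keeping the two stages separate is what lets the first model be trained on pooled data from many devices while the second model is trained on the target device alone.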
  • the first indicator data may be the number of users
  • the second indicator data may be the traffic volume
  • the third indicator data may be the resource occupancy rate
  • the number of users indicator in the embodiment of the present application may be expressed as the number of users who use a certain service in the network communication device.
  • the same network communication device may have multiple user number indicators.
  • the number of users can be expressed as a "2G+3G" number of users indicator.
  • the number of users can also be expressed as a "4G number of users" indicator.
  • the number of users can also be expressed as a "registered users" indicator. This embodiment of the present application does not specifically limit this.
  • the traffic volume indicator in the embodiments of the present application can be understood as the amount of service used in the network communication device.
  • the traffic volume of a communication network device can be represented by a "total traffic volume" indicator.
  • the traffic volume of a communication network device can also be represented by a "Gi Interface Packets" indicator.
  • the traffic volume of a communication network device can also be represented by an "SGi User Plane Packets" indicator. This embodiment of the present application does not specifically limit this.
  • the "traffic volume" indicator may be related to the "number of users" indicator, the frequency of user communication, and the duration of user communication: the more users there are per unit of time and the longer they communicate, the larger the "traffic volume" indicator.
  • the relationship between the number of users indicator and the traffic volume indicator is consistent across the multiple devices mentioned above; that is, on these devices the number of users and the traffic volume have the same or roughly the same relationship and trend of change.
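One hedged way to test this consistency before pooling data is to estimate the users-to-traffic slope separately on each device and compare the slopes. The through-origin slope estimate, the median reference, and the relative tolerance `rel_tol` are all assumptions made for illustration; the text only requires the same or roughly the same trend.

```python
import numpy as np


def slope_through_origin(users, traffic):
    """Least-squares slope of traffic = slope * users."""
    u = np.asarray(users, dtype=float)
    t = np.asarray(traffic, dtype=float)
    return float(u @ t / (u @ u))


def relationships_consistent(per_device, rel_tol=0.1):
    """per_device maps a device name to (users, traffic) sequences.

    Returns True if every device's slope lies within rel_tol (relative)
    of the median slope across devices.
    """
    slopes = [slope_through_origin(u, t) for u, t in per_device.values()]
    ref = float(np.median(slopes))
    return all(abs(s - ref) <= rel_tol * abs(ref) for s in slopes)


# Hypothetical samples: ATS0 and ATS1 both carry about 0.5 traffic units per user.
devices = {
    "ATS0": ([100, 200, 300], [52, 98, 151]),
    "ATS1": ([400, 500], [205, 248]),
}
print(relationships_consistent(devices))  # True
```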
  • the resource occupancy index in the embodiment of the present application may be expressed as the resource consumption amount in the network communication device.
  • the resource occupancy rate can be a certain resource occupancy rate corresponding to the number of users. For example, a certain number of users corresponds to a CPU usage of 80%.
  • the resource occupancy indicator can be expressed as "CPU peak occupancy".
  • the resource occupancy metric can also be expressed as “memory occupancy.”
  • the resource occupancy rate can also be expressed as “License occupancy”. This embodiment of the present application does not specifically limit this.
  • the first prediction model may be trained by using the first training data of the collected multiple devices (including the target device and other network devices), and the diversity of the historical data samples of the first indicator data may be expanded.
  • the second prediction model can be trained on the collected second training data of the target device, so that the functional relationship between the first indicator data and the third indicator data can be reflected more accurately.
  • the first training data may also be acquired, and the first prediction model is trained according to the first training data.
  • the second training data may also be acquired, and the second prediction model is trained according to the second training data.
  • the first prediction model may be used to predict second indicator data of the target device according to the first indicator data of the target device.
  • the second prediction model is configured to predict third indicator data of the target device according to the second indicator data of the target device obtained by the first prediction model.
  • the following uses the first indicator data as the number of users, the second indicator data as the traffic volume, and the third indicator data as the resource occupancy rate as an example, which is described in detail.
  • the first prediction model may also be referred to as a "number of users-traffic volume" model.
  • the first prediction model may be obtained by training directly on the number of users indicator and/or the traffic volume indicator in the first training data, or by first applying feature processing to these indicators and then training on the processed data. This application does not specifically limit this.
  • feature processing is applied to the indicators collected from the device so that the processed data has certain numerical properties, which makes it easier to apply mathematical tools.
  • for example, the mean, variance, and similar statistics of some dimensions fall within a certain range.
  • standardization and normalization are both data transformations and are commonly used feature processing methods.
  • some information may also be extracted from the indicators collected from the device (for example, the number of users, traffic volume, and resource occupancy indicators) to facilitate subsequent analysis, for example, the sign of a value.
  • the number of users and/or traffic indicators can be standardized.
  • normalization processing may also be performed on the number of users indicator and/or the traffic volume indicator.
  • the user number indicator and/or the traffic volume indicator may also be subjected to dimensionality reduction processing.
  • principal component analysis may be performed on the number of users indicator and/or the traffic volume indicator.
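The two common feature-processing transforms mentioned above can be sketched in a few lines: z-score standardization rescales each column to zero mean and unit variance, and min-max normalization rescales each column into [0, 1]. The layout (one indicator per column, one sample per row) and the function names are assumptions.

```python
import numpy as np


def standardize(X):
    """Z-score each column; constant columns are centered but not scaled."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0
    return (X - X.mean(axis=0)) / std


def min_max_normalize(X):
    """Rescale each column into [0, 1]; constant columns map to 0."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    return (X - lo) / span


# Columns: number of users, traffic volume (hypothetical values).
X = [[1, 10], [2, 20], [3, 30]]
print(min_max_normalize(X))
```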
  • the embodiment of the present application does not specifically limit the implementation manner of training the first prediction model by using the first training data.
  • regression analysis may be performed on the first training data to obtain the first prediction model.
  • the second prediction model may be used to indicate a mapping relationship between the traffic volume indicator and the resource occupancy indicator of the target device.
  • the second predictive model may also be referred to as a "traffic-resource occupancy" model.
  • the second prediction model may be obtained by training directly on the traffic volume indicator and/or the resource occupancy indicator in the second training data, or by first applying feature processing to these indicators and then training on the processed data. This application does not specifically limit this.
  • standardization processing may be performed on the traffic metrics and/or resource occupancy metrics.
  • normalization processing may also be performed on the traffic metrics and/or resource occupancy metrics.
  • the traffic volume indicator and/or the resource occupancy indicator may also be subjected to dimensionality reduction processing.
  • principal component analysis may be performed on the traffic volume indicator and/or the resource occupancy indicator.
  • the embodiment of the present application does not specifically limit the implementation manner of training the second prediction model by using the second training data.
  • regression analysis may be performed on the second training data to obtain the second prediction model.
  • the first prediction model can be trained through the first training data of the collected multiple devices (including the target device and other network devices), and the diversity of the historical data samples of the user number indicator can be expanded.
  • the second prediction model can be trained on the collected second training data of the target device, so that the functional relationship between the number of users indicator and the resource occupancy indicator can be reflected more accurately.
  • the predicted number of users of the target device may be obtained, and the predicted resource occupancy rate of the target device corresponding to that predicted number of users may then be obtained by using the first prediction model and the second prediction model mentioned above.
  • the network device can accurately obtain the predicted resource occupancy rate according to the predicted number of users.
  • the network operator can obtain, before an activity takes place, the resource occupancy rate of the network device corresponding to the predicted number of users, so that network devices that would be overloaded can be expanded in advance.
  • in FIG. 2 and FIG. 3, the number of users corresponds to the first indicator data above, the traffic volume corresponds to the second indicator data, and the resource occupancy rate corresponds to the third indicator data.
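A capacity-planning check built on the predicted occupancy can be as simple as comparing each device's predicted value with a limit. The 80 % threshold, the device names, and the function name `devices_to_expand` are hypothetical; an operator would choose its own limit.

```python
def devices_to_expand(predicted_occupancy, threshold=80.0):
    """Return the sorted names of devices whose predicted resource
    occupancy (in percent) exceeds the threshold."""
    return sorted(
        name for name, occ in predicted_occupancy.items() if occ > threshold
    )


forecast = {"ATS0": 65.0, "ATS1": 91.5}  # hypothetical predicted CPU peaks
print(devices_to_expand(forecast))  # ['ATS1']
```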
  • FIG. 2 is a schematic flowchart of a possible scenario in which the relationship between indicators is universal across device combinations, according to an embodiment of the present application.
  • the network element ATS0 and the network element ATS1 belong to the same type of communication device.
  • the communication device may have a hierarchical decomposition structure, and the network element ATS0 may be decomposed into modules VCU0, VCU1, and DPU0, and the network element ATS1 may be decomposed into modules VCU0, VCU1, and DPU0.
  • the network element ATS can be a general-purpose voice service server.
  • the network element ATS can provide basic call services.
  • the network element ATS can provide basic voice call and video call functions to users.
  • the network element ATS may provide some supplementary services.
  • the network element ATS may provide enhanced supplementary functions such as number display, call restriction, call forwarding, callback, conference, and reminder services.
  • the services of the network element ATS0 can be evenly distributed among its three modules (VCU0, VCU1, DPU0), and the services of the network element ATS1 can likewise be evenly distributed among its three modules (VCU0, VCU1, DPU0).
  • the dispatch process unit (DPU) can be used to execute the control strategy configured by the engineer, and can implement functions such as data acquisition, scale conversion, alarm limit check, operation record, and sequential time record.
  • the number of users indicator and the traffic volume indicator (for example, the "total traffic volume" indicator) may correspond to the network elements (ATS0, ATS1), and the resource occupancy indicator (for example, the "CPU peak occupancy" indicator) may correspond to the modules (VCU0, VCU1, DPU0).
  • the network element ATS0 and the network element ATS1 are communication devices of the same type, the "number of registered users" indicator and the "total occupied traffic" indicator both correspond to the network element, and each of these indicators is defined in the same way on ATS0 and ATS1. Therefore, the correlation between the "number of registered users" indicator and the "total occupied traffic" indicator is versatile across device combinations between ATS0 and ATS1; that is, a model trained on this correlation can be applied to either the network element ATS0 or the network element ATS1.
  • the "total occupied traffic" indicator on the network element ATS0 and the network element ATS1 may be the same or approximately the same.
  • in FIG. 2, the number of users indicator (for example, the "number of registered users" indicator) and the traffic volume indicator (for example, the "total occupied traffic" indicator) may be collected from the target device (for example, the network element ATS0) and other network devices (for example, the network element ATS1).
  • because the relationship between the "number of registered users" indicator and the "total occupied traffic" indicator follows the same or substantially the same trend on ATS0 and ATS1, the number of users and traffic volume indicators can be collected from both network elements, which expands the diversity of the historical data samples of the number of users.
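Pooling samples from network elements whose indicator relationship follows the same trend widens the range of user counts that the first prediction model is trained on. The sketch measures diversity simply as the observed range of the number of users, which is an assumed proxy; the sample values are made up for illustration.

```python
import numpy as np


def user_range(samples):
    """Spread of the number of users across (users, traffic) samples."""
    users = np.asarray([u for u, _ in samples], dtype=float)
    return float(users.max() - users.min())


def pool_samples(*device_samples):
    """Concatenate (users, traffic) samples collected from several devices."""
    pooled = []
    for samples in device_samples:
        pooled.extend(samples)
    return pooled


ats0 = [(1000, 480), (1200, 590), (1100, 540)]    # target device (hypothetical)
ats1 = [(3000, 1510), (3500, 1740), (2800, 1400)]  # other device (hypothetical)
pooled = pool_samples(ats0, ats1)
print(user_range(ats0), user_range(pooled))  # 200.0 2500.0
```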
  • FIG. 3 is a schematic flowchart of a possible scenario in which the relationship between indicators is universal across device combinations, according to another embodiment of the present application.
  • the network elements UGW0, UGW1, and UGW2 belong to the same type of communication device.
  • The communication device may have a hierarchical decomposition structure: UGW0 may be decomposed into the modules SPU instance 0 and SPU instance 1; UGW1 may be decomposed into the modules SPU instance 0 and SPU instance 1; and UGW2 may be decomposed into the modules SPU instance 0, SPU instance 1, and SPU instance 2.
  • The network element UGW may be a unified packet gateway, and the service of UGW0 may be carried by the SPU instance modules under it.
  • A service process unit (SPU) instance may be used to meet service function requirements in network application scenarios such as load balancing and firewalls.
  • The SPU provides an efficient load-balancing solution that can address information technology (IT) problems such as slow responses, excessive application delay, and uneven traffic across devices, thereby ensuring service reliability, improving service response speed, and allowing services to scale flexibly.
  • The service load of UGW0 may be evenly distributed between its two modules (SPU instance 0 and SPU instance 1); the service load of UGW1 may likewise be evenly distributed between its two modules (SPU instance 0 and SPU instance 1); and the service load of UGW2 may be evenly distributed among its three modules (SPU instance 0, SPU instance 1, and SPU instance 2).
  • The number-of-users indicators (for example, the "2G+3G user number" indicator and the "4G user number" indicator) may correspond to the network elements (UGW0, UGW1, UGW2).
  • Traffic volume indicators such as the "Gi interface packet number" indicator and the "SGi interface packet number" indicator may also correspond to the network elements (UGW0, UGW1, UGW2), whereas another traffic volume indicator, for example the "GW receiving user plane packet number" indicator, corresponds to the SPU instance.
  • Because UGW0, UGW1, and UGW2 belong to the same type of communication device, the "2G+3G user number", "4G user number", "Gi interface packet number", and "SGi interface packet number" indicators all correspond to the network element, and the definitions of these four indicators are the same on UGW0, UGW1, and UGW2, the correlation among these indicators has cross-device combination versatility among UGW0, UGW1, and UGW2.
  • As shown in FIG. 3, the number of SPU instances of UGW2 differs from that of UGW0 and UGW1, so the way the "GW receiving user plane packet number" indicator is decomposed across the SPU instances of UGW2 may differ from UGW0 and UGW1. Therefore, the correlation among the "2G+3G user number" indicator, the "4G user number" indicator, and the "GW receiving user plane packet number" indicator has cross-device combination versatility between UGW0 and UGW1, but not among UGW0, UGW1, and UGW2.
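The effect of a different SPU count can be seen with simple arithmetic. The following Python sketch assumes, purely for illustration, that a network element's service load is split evenly across its SPU instances, as described for FIG. 3, so that the packets handled by one instance equal the element's total divided by its instance count. The function name and the packet total are hypothetical.

```python
def per_instance_share(total_packets: float, n_instances: int) -> float:
    """Packets handled by one SPU instance, assuming an even load split."""
    return total_packets / n_instances


# Hypothetical: the three network elements carry the same total traffic.
spu_counts = {"UGW0": 2, "UGW1": 2, "UGW2": 3}
total_packets = 1_200_000.0

shares = {ne: per_instance_share(total_packets, n) for ne, n in spu_counts.items()}
for ne, share in shares.items():
    print(f"{ne}: {share:.0f} packets per SPU instance")
```

Under this assumption, the same network-element-level values map to 600,000 packets per SPU instance on UGW0 and UGW1 but 400,000 on UGW2, so an SPU-level model fitted only on UGW0/UGW1 samples would overestimate UGW2's per-instance traffic by a factor of 1.5.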
  • The correlation between the number-of-users indicators (for example, the "2G+3G user number" and "4G user number" indicators) and the traffic volume indicators (for example, the "Gi interface packet number", "SGi interface packet number", and "GW receiving user plane packet number" indicators) may therefore have cross-device combination versatility between UGW0 and UGW1, so a model trained on this correlation can be applied to UGW0 and also to UGW1.
  • Accordingly, the number-of-users and traffic volume indicator data can be collected from both the target device (for example, UGW0) and other network devices (for example, UGW1), which expands the diversity of the historical data samples of the number-of-users indicator.
  • The first prediction model may be obtained by performing regression analysis on the first training data, and/or the second prediction model may be obtained by performing regression analysis on the second training data.
  • Training the first prediction model and/or the second prediction model by regression analysis allows the degree of correlation and the goodness of fit between the factors to be measured accurately, and the computation is simple and easy to implement.
  • The following takes the number of users as the first indicator data, the traffic volume as the second indicator data, and the resource occupancy rate as the third indicator data as an example for detailed description.
  • With reference to FIG. 4, the following describes in detail an example of training the first prediction model and/or the second prediction model by performing regression analysis on the first training data and/or the second training data.
  • FIG. 4 is a schematic flowchart of training a first prediction model and a second prediction model according to another embodiment of the present application.
  • FIG. 4 includes steps 410 to 450, which are described in detail below.
  • the training scenario shown in FIG. 3 is taken as an example to describe the flow of training the first prediction model and the second prediction model in detail.
  • the device to be modeled shown in FIG. 3 may correspond to the target device in the embodiment of the present application.
  • Step 410: Determine, according to the device to be modeled, the "resource occupancy rate" indicator, the "number of users" indicator, and the "traffic volume" indicator to be predicted.
  • The number-of-users indicators to be predicted are the "2G+3G user number" indicator and the "4G user number" indicator of the UGW network element.
  • The resource occupancy rate indicator to be predicted is the "CPU peak occupancy rate" indicator of the SPU instance.
  • The traffic volume indicators to be predicted are the "Gi interface packet number" indicator and the "SGi user plane packet number" indicator of the UGW network element, and the "GW receiving user plane packet number" indicator of the SPU instance.
  • Step 420: According to the cross-device versatility of the correlation between the "number of users" indicator and the "traffic volume" indicator, select devices that share this versatility with the device to be modeled, to obtain device combination list 1.
  • The first training data, which includes the "number of users" indicator and the "traffic volume" indicator, is acquired according to device combination list 1.
  • For UGW0, UGW1, and UGW2, the "2G+3G user number", "4G user number", "Gi interface packet number", and "SGi user plane packet number" indicators of the UGW network element are defined consistently on the three devices and all correspond to the network element. It can therefore be determined that the correlation among these indicators has cross-device combination versatility among the three devices UGW0, UGW1, and UGW2.
  • The "GW receiving user plane packet number" indicator belongs to the SPU instance and does not correspond to the UGW network element. Moreover, because UGW2 has a different number of SPU instances from UGW0 and UGW1, the decomposition of the service load among the SPU instances of UGW2 differs from that of UGW0 and UGW1. Therefore, the correlation among the "2G+3G user number" and "4G user number" indicators of the UGW network element and the "GW receiving user plane packet number" indicator of the SPU instance has cross-device combination versatility among the four device combinations (UGW0 and SPU instance 0, UGW0 and SPU instance 1, UGW1 and SPU instance 0, UGW1 and SPU instance 1), and also among the three device combinations (UGW2 and SPU instance 0, UGW2 and SPU instance 1, UGW2 and SPU instance 2), but not across all seven device combinations.
  • Because the traffic volume indicators include the "Gi interface packet number" and "SGi user plane packet number" indicators of the UGW network element and the "GW receiving user plane packet number" indicator of the SPU instance, the value samples are determined to come from the four device combinations: UGW0 and SPU instance 0, UGW0 and SPU instance 1, UGW1 and SPU instance 0, and UGW1 and SPU instance 1.
  • The values of the number-of-users indicators and the traffic volume indicators (including the "GW receiving user plane packet number" indicator) of these device combinations are used as data samples to obtain the first training data.
  • Either the group of four device combinations or the group of three device combinations may be selected as the data source of the first training data; in this embodiment, the four device combinations are selected.
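Step 420 can be sketched as a filter over candidate device combinations. The criterion below is an assumption made for illustration: a combination is treated as versatile with the target when its network element has the same device type (so the network-element indicators share definitions) and the same number of SPU instances (so the SPU-level load decomposition matches). The `DeviceCombination` type and the `versatile_with` function are hypothetical names, not part of the described method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceCombination:
    network_element: str   # e.g. "UGW0"
    ne_type: str           # device type of the network element, e.g. "UGW"
    spu_count: int         # number of SPU instances under the network element
    spu_instance: int      # index of the SPU instance in this combination


def versatile_with(target, candidates):
    """Hypothetical rule: same device type and same SPU count as the target."""
    return [c for c in candidates
            if c.ne_type == target.ne_type and c.spu_count == target.spu_count]


topology = {"UGW0": 2, "UGW1": 2, "UGW2": 3}      # SPU instances per network element
candidates = [DeviceCombination(ne, "UGW", n, i)
              for ne, n in topology.items() for i in range(n)]
target = DeviceCombination("UGW0", "UGW", 2, 0)    # SPU instance 0 of UGW0

device_combination_list_1 = versatile_with(target, candidates)
for c in device_combination_list_1:
    print(c.network_element, "SPU instance", c.spu_instance)
```

Applied to the topology of FIG. 3, this filter keeps the four combinations of UGW0 and UGW1 and excludes the three combinations of UGW2, matching the grouping described above.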
  • Step 430: Perform regression analysis on the "number of users" indicator and the "traffic volume" indicator in the first training data to obtain a "user number-traffic model".
  • The data of UGW0 and UGW1 may be acquired, including the number of 2G+3G users, the number of 4G users, the number of Gi interface packets, and the number of SGi user plane packets of the network element, and the number of user plane packets received by the GW in the SPU instance.
  • The acquired data of UGW0 and UGW1 may be filtered, and regression analysis may be performed on the filtered data to obtain the "user number-traffic model" of UGW0 and UGW1 (also referred to as the first prediction model).
  • Specifically, regression analysis may be performed between the "2G+3G user number" and "4G user number" indicators and the "Gi interface packet number", "SGi user plane packet number", and "GW receiving user plane packet number" indicators in the filtered data, to train the "user number-traffic model" (first prediction model) of UGW0 and UGW1.
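A minimal sketch of step 430, assuming ordinary least squares as the regression method (the text does not fix one) and using synthetic data in place of the filtered UGW0/UGW1 samples. It fits the three traffic volume indicators against the two user-number indicators with a constant term and reports the coefficient of determination (R²) per output as one possible goodness-of-fit measure. All coefficients and value ranges are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical pooled samples from device combination list 1.
users = rng.uniform([1e4, 2e4], [5e4, 9e4], size=(n, 2))       # [2G+3G users, 4G users]
true_coef = np.array([[3.0, 2.5, 0.8],                         # hypothetical traffic per user
                      [5.0, 4.0, 1.5]])
traffic = users @ true_coef + rng.normal(0, 1e3, size=(n, 3))  # [Gi, SGi, GW receive]


def fit_linear(X, Y):
    """Ordinary least squares with a constant term; returns (coef, intercept)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return beta[1:], beta[0]


def r_squared(Y, Y_hat):
    """Coefficient of determination, one value per output column."""
    ss_res = ((Y - Y_hat) ** 2).sum(axis=0)
    ss_tot = ((Y - Y.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot


coef, intercept = fit_linear(users, traffic)
r2 = r_squared(traffic, users @ coef + intercept)
print("coefficients:\n", coef)
print("R^2 per traffic indicator:", r2)
```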
  • Step 440: Acquire the second training data according to the device to be modeled; the second training data includes the "traffic volume" indicator and the "resource occupancy" indicator.
  • The resource occupancy rate indicator is the "CPU peak occupancy rate" indicator of the SPU instance, and the data samples are determined to come from the target device, namely UGW0 and SPU instance 0, from which the second training data is acquired.
  • Step 450: Perform regression analysis on the "traffic volume" indicator and the "resource occupancy" indicator in the second training data to train a "traffic-resource occupancy model".
  • Specifically, with the "Gi interface packet number", "SGi user plane packet number", and "GW receiving user plane packet number" indicators as the traffic volume indicators, and the "CPU peak occupancy rate" indicator of UGW0 and SPU instance 0 (the target device) as the resource occupancy indicator, regression analysis between the traffic volume indicators and the resource occupancy indicator trains the "traffic-resource occupancy model" (second prediction model) of UGW0 and SPU instance 0.
  • The prediction process for SPU instance 0 of UGW0 may be as follows:
  • Step 1: The device to be predicted is SPU instance 0 of UGW0, that is, an SPU device under a UGW device. It can therefore be determined that the "number of users" inputs of the first prediction model ("user number-traffic model") are the "2G+3G user number" and "4G user number" indicators of the network element, and the "predicted 2G+3G user number" and "predicted 4G user number" can then be determined.
  • Step 2: Because the device to be predicted is SPU instance 0 of UGW0, the corresponding first prediction model ("user number-traffic model") can be determined. Inputting the predicted user numbers into it yields the "predicted Gi interface packet number", the "predicted SGi user plane packet number", and the "predicted number of kilobytes of user plane packets received by the GW".
  • Step 3: Because the device to be predicted is SPU instance 0 of UGW0, the corresponding second prediction model ("traffic-resource occupancy model") can be determined. Inputting the predicted Gi interface packet number, the predicted SGi user plane packet number, and the predicted kilobytes of user plane packets received by the GW into this model yields the "predicted CPU peak occupancy rate".
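The three prediction steps chain the two models: predicted user numbers go into the first model, and its predicted traffic goes into the second. The sketch below assumes ordinary least squares for both models and uses synthetic training data with hypothetical coefficients; sampling the second training data's traffic independently of user numbers is a simplification for the example, not something the text specifies.

```python
import numpy as np

rng = np.random.default_rng(1)


def fit(X, Y):
    """Ordinary least squares with a constant term; returns a predict function."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ beta


# First training data (hypothetical): user numbers -> traffic, pooled from UGW0/UGW1.
users = rng.uniform(1e4, 9e4, size=(400, 2))
per_user = np.array([[3.0, 2.5, 0.8], [5.0, 4.0, 1.5]])
traffic = users @ per_user + rng.normal(0, 1e3, size=(400, 3))
user_to_traffic = fit(users, traffic)                    # first prediction model

# Second training data (hypothetical): traffic -> CPU peak %, target device only.
traffic_2 = rng.uniform(1e5, 8e5, size=(300, 3))
cpu_peak = traffic_2 @ np.array([2e-5, 1e-5, 3e-5]) + 5.0 + rng.normal(0, 0.5, 300)
traffic_to_cpu = fit(traffic_2, cpu_peak)                # second prediction model

predicted_users = np.array([[6e4, 8e4]])                 # Step 1: predicted 2G+3G and 4G users
predicted_traffic = user_to_traffic(predicted_users)     # Step 2: predicted Gi, SGi, GW receive
predicted_cpu_peak = traffic_to_cpu(predicted_traffic)   # Step 3: predicted CPU peak occupancy
print(predicted_traffic, predicted_cpu_peak)
```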
  • The second prediction model may also be obtained by performing quantile regression analysis on the second training data.
  • Quantile regression is one form of regression analysis.
  • A quantile is a point that divides the range of a random variable's distribution according to a given probability proportion.
  • Quantile regression can predict an upper or lower bound of an indicator.
  • For example, a quantile parameter of 0.1 divides the distribution of the variable into two parts such that the probability that the variable is less than the 0.1 quantile is 0.1.
  • the method for performing quantile regression on the second indicator data and the third indicator data in the second training data is not specifically limited in the embodiment of the present application.
  • a linear quantile regression method can be used.
  • a non-linear quantile regression method can also be used.
  • Training the second prediction model by quantile regression allows the upper and lower bounds of the third indicator data, rather than its mean, to be predicted, which meets application requirements that focus on boundary values.
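Linear quantile regression minimizes the pinball (check) loss instead of the squared error. The Python sketch below is one possible implementation, not the method prescribed by the text: it uses an iteratively reweighted least squares (majorize-minimize) iteration with NumPy only, on synthetic traffic and CPU data whose coefficients are hypothetical. Fitting with tau = 0.9 gives an upper-bound line and tau = 0.1 a lower-bound line for the CPU peak occupancy rate. A production system might instead use a library routine such as statsmodels' `QuantReg`.

```python
import numpy as np


def quantile_regression(X, y, tau, n_iter=300, eps=1e-6):
    """Linear quantile regression for quantile level tau in (0, 1).

    Writes the pinball loss as rho(r) = 0.5*|r| + (tau - 0.5)*r and bounds |r| by a
    quadratic at the current residual, so each iteration is a weighted least squares solve.
    """
    A = np.column_stack([np.ones(len(y)), X])          # constant term + features
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)       # start from ordinary least squares
    for _ in range(n_iter):
        r = y - A @ beta
        w = 1.0 / (4.0 * np.maximum(np.abs(r), eps))   # eps avoids division by zero
        lhs = A.T @ (w[:, None] * A)
        rhs = A.T @ (w * y) + 0.5 * (tau - 0.5) * A.sum(axis=0)
        beta = np.linalg.solve(lhs, rhs)
    return beta                                        # [intercept, slope_1, ...]


# Hypothetical second training data: traffic (units of 1e5 packets) vs CPU peak (%).
rng = np.random.default_rng(2)
traffic = rng.uniform(1, 8, 2000)
cpu_peak = 5 + 4 * traffic + rng.normal(0, 2, 2000)

upper = quantile_regression(traffic, cpu_peak, tau=0.9)
lower = quantile_regression(traffic, cpu_peak, tau=0.1)
share_below_upper = np.mean(cpu_peak <= upper[0] + upper[1] * traffic)
print(upper, lower, share_below_upper)
```

About 90% of the samples should fall on or below the tau = 0.9 line, which is what makes it usable as an upper bound rather than a mean estimate.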
  • The following takes the number of users as the first indicator data, the traffic volume as the second indicator data, and the resource occupancy rate as the third indicator data as an example for detailed description.
  • the second prediction model may be trained by performing quantile regression on the traffic volume indicator and the resource occupancy index in the second training data. This implementation will be described in detail below with reference to FIG. 5.
  • FIG. 5 is a schematic flowchart of training a first prediction model and a second prediction model according to another embodiment of the present application.
  • FIG. 5 includes steps 510 to 550, of which steps 510 to 540 correspond to steps 410 to 440, respectively.
  • the training scenario shown in FIG. 3 is taken as an example to describe the flow of training the first prediction model and the second prediction model in detail.
  • Step 510: Determine, according to the device to be modeled, the "resource occupancy rate" indicator, the "number of users" indicator, and the "traffic volume" indicator to be predicted.
  • Step 520: According to the cross-device versatility of the correlation between the "number of users" indicator and the "traffic volume" indicator, select devices that share this versatility with the device to be modeled, to obtain device combination list 1.
  • The first training data, which includes the "number of users" indicator and the "traffic volume" indicator, is acquired according to device combination list 1.
  • Step 530: Perform regression analysis on the "number of users" indicator and the "traffic volume" indicator in the first training data to obtain a "user number-traffic model".
  • Step 540: Acquire the second training data according to the device to be modeled; the second training data includes the "traffic volume" indicator and the "resource occupancy" indicator.
  • Step 550: Perform quantile regression analysis on the traffic volume indicator and the resource occupancy indicator in the second training data to train a "traffic-resource occupancy model".
  • Specifically, with the "Gi interface packet number", "SGi user plane packet number", and "GW receiving user plane packet number" indicators as the traffic volume indicators and the "CPU peak occupancy rate" indicator as the resource occupancy indicator, quantile regression analysis between the traffic volume indicators and the resource occupancy indicator trains the "traffic-resource occupancy model" (second prediction model) of UGW0 and SPU instance 0 (the target device).
  • Because the second prediction model is established by quantile regression, the upper and lower bounds of the third indicator data, rather than its mean, can be predicted, which meets application requirements that focus on boundary values.
  • A "constrained modeling" feature may be added when the first prediction model is established by performing regression analysis on the first indicator data and the second indicator data in the first training data. That is, the data diversity of the first indicator data is judged. If the data diversity of the first indicator data in the first training data does not satisfy a preset condition, regression analysis constrained to pass through the coordinate origin may be performed on the first indicator data and the second indicator data in the first training data. If the data diversity satisfies the preset condition, the regression analysis need not be constrained to pass through the coordinate origin.
  • When the data diversity satisfies the preset condition, regression analysis without the origin constraint makes full use of the information provided by the data set, yielding a more accurate model.
  • When the data diversity does not satisfy the preset condition, constraining the regression of the first indicator data and the second indicator data to pass through the coordinate origin avoids inaccurate model extrapolation caused by insufficient diversity of the first indicator data in the data set.
  • The preset condition may be a preset threshold: if the data diversity of the first indicator data reaches the preset threshold, the data diversity of the first indicator data satisfies the preset condition.
  • Both regression analysis constrained through the coordinate origin and regression analysis not constrained through the origin can be regarded as regression analysis of the data.
  • A model obtained by regression through the coordinate origin has no constant term, whereas a model obtained by regression without the origin constraint may have a constant term.
  • For a regression model constrained through the coordinate origin, the predicted variable Y is 0 when the variable X is 0.
  • Regression through the coordinate origin is simpler to compute and easier to implement.
  • Regression through a specified point other than the coordinate origin can usually be converted into regression through the coordinate origin by translating the coordinates so that the specified point becomes the origin.
  • For a regression model not constrained through the coordinate origin, the predicted variable Y is not necessarily 0 when the variable X is 0.
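The difference between these regression variants can be shown with NumPy least squares. This is a sketch that assumes ordinary least squares is the regression method; the function names are hypothetical. Regression through the origin simply omits the column of ones, and regression through a specified point (x0, y0) first translates that point to the origin, as described above.

```python
import numpy as np


def fit_through_origin(X, y):
    """Least squares without a constant term: y_hat = X @ b, so y_hat = 0 at X = 0."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b


def fit_with_intercept(X, y):
    """Least squares with a constant term: y_hat = c + X @ b."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[0], beta[1:]


def fit_through_point(X, y, x0, y0):
    """Regression forced through (x0, y0): translate the point to the origin, then fit.

    Predict with y_hat = y0 + (X_new - x0) @ b.
    """
    return fit_through_origin(X - x0, y - y0)


X = np.array([[1.0], [2.0], [3.0]])
print(fit_through_origin(X, np.array([2.0, 4.0, 6.0])))         # slope only
print(fit_with_intercept(X, np.array([3.0, 5.0, 7.0])))         # constant term and slope
print(fit_through_point(X, np.array([3.0, 5.0, 7.0]), 0.0, 1.0))
```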
  • The first indicator data may be normalized, and the data diversity of the normalized first indicator data may then be determined.
  • Alternatively, the first indicator data may be standardized, and the data diversity of the standardized first indicator data may then be determined.
  • Alternatively, the first indicator data may be subjected to dimensionality reduction, and the data diversity of the reduced first indicator data may then be determined.
  • The following takes the number of users as the first indicator data, the traffic volume as the second indicator data, and the resource occupancy rate as the third indicator data as an example for detailed description.
  • A step of judging the data diversity of the number of users may be added. This implementation is described in detail below with reference to FIG. 6.
  • FIG. 6 is a schematic flowchart of training a first prediction model and a second prediction model according to another embodiment of the present application.
  • The method of FIG. 6 includes steps 610 to 670, which are described in detail below.
  • the training scenario shown in FIG. 3 is taken as an example to describe the flow of training the first prediction model and the second prediction model in detail.
  • Step 610: Determine, according to the device to be modeled, the "resource occupancy rate" indicator, the "number of users" indicator, and the "traffic volume" indicator to be predicted.
  • The number-of-users indicators to be predicted are the "2G+3G user number" indicator and the "4G user number" indicator of the UGW network element.
  • The resource occupancy rate indicator to be predicted is the "CPU peak occupancy rate" indicator of the SPU instance.
  • The traffic volume indicators to be predicted are the "Gi interface packet number" indicator and the "SGi user plane packet number" indicator of the UGW network element, and the "GW receiving user plane packet number" indicator of the SPU instance.
  • Step 620: According to the cross-device versatility of the correlation between the "number of users" indicator and the "traffic volume" indicator, select devices that share this versatility with the device to be modeled, to obtain device combination list 1.
  • The first training data, which includes the "number of users" indicator and the "traffic volume" indicator, is acquired according to device combination list 1.
  • For UGW0, UGW1, and UGW2, the "2G+3G user number", "4G user number", "Gi interface packet number", and "SGi user plane packet number" indicators of the UGW network element are defined consistently on the three devices and all correspond to the network element. It can therefore be determined that the correlation among these indicators has cross-device combination versatility among the three devices UGW0, UGW1, and UGW2.
  • The "GW receiving user plane packet number" indicator belongs to the SPU instance and does not correspond to the UGW network element. Moreover, because UGW2 has a different number of SPU instances from UGW0 and UGW1, the decomposition of the service load among the SPU instances of UGW2 differs from that of UGW0 and UGW1. Therefore, the correlation among the "2G+3G user number" and "4G user number" indicators of the UGW network element and the "GW receiving user plane packet number" indicator of the SPU instance has cross-device combination versatility among the four device combinations (UGW0 and SPU instance 0, UGW0 and SPU instance 1, UGW1 and SPU instance 0, UGW1 and SPU instance 1), and also among the three device combinations (UGW2 and SPU instance 0, UGW2 and SPU instance 1, UGW2 and SPU instance 2), but not across all seven device combinations.
  • Because the traffic volume indicators include the "Gi interface packet number" and "SGi user plane packet number" indicators of the UGW network element and the "GW receiving user plane packet number" indicator of the SPU instance, the value samples are determined to come from the four device combinations: UGW0 and SPU instance 0, UGW0 and SPU instance 1, UGW1 and SPU instance 0, and UGW1 and SPU instance 1.
  • The values of the number-of-users indicators and the traffic volume indicators (including the "GW receiving user plane packet number" indicator) of these device combinations are used as data samples to obtain the first training data.
  • Either the group of four device combinations or the group of three device combinations may be selected as the data source of the first training data; in this embodiment, the four device combinations are selected.
  • Step 630: Determine whether the data diversity of the "number of users" indicator in the first training data satisfies the preset condition.
  • If the data diversity of the "number of users" indicator in the first training data does not satisfy the preset condition, the first prediction model (also referred to as the "user number-traffic model") may be established by performing step 640. If the data diversity satisfies the preset condition, the first prediction model may be established by performing step 650.
  • Specifically, the data diversity of the "2G+3G user number" indicator and the "4G user number" indicator in the first training data may be judged.
  • Step 640: Perform regression analysis constrained through the coordinate origin on the "number of users" indicator and the "traffic volume" indicator in the first training data, to obtain the "user number-traffic model".
  • If the data diversity of the number-of-users indicators (the "2G+3G user number" indicator and the "4G user number" indicator) does not satisfy the preset condition, regression analysis constrained through the coordinate origin may be performed between the "2G+3G user number" and "4G user number" indicators and the "Gi interface packet number", "SGi user plane packet number", and "GW receiving user plane packet number" indicators in the first training data, to obtain the "user number-traffic model" (first prediction model).
  • For the specific regression model, refer to the description of step 430 in FIG. 4; details are not repeated here.
  • Step 650: Perform regression analysis without constraining the coordinate origin on the "number of users" indicator and the "traffic volume" indicator in the first training data, to obtain the "user number-traffic model".
  • If the data diversity of the "2G+3G user number" indicator and the "4G user number" indicator satisfies the preset condition, regression analysis without the origin constraint may be performed between the "2G+3G user number" and "4G user number" indicators and the "Gi interface packet number", "SGi user plane packet number", and "GW receiving user plane packet number" indicators, to obtain the "user number-traffic model" (first prediction model).
  • For the specific regression model, refer to the description of step 430 in FIG. 4; details are not repeated here.
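Steps 630 to 650 can be sketched as a branch on a diversity score. The text leaves both the diversity measure and the preset threshold open; the relative spread (max - min) / max and the threshold 0.5 used below are assumptions chosen only for illustration, as are the function names. When user numbers cluster in a narrow band, the data pin down the constant term poorly, which is why the low-diversity branch drops it.

```python
import numpy as np

DIVERSITY_THRESHOLD = 0.5  # hypothetical preset condition


def user_diversity(users):
    """Hypothetical diversity score: smallest relative spread (max - min) / max over columns.

    Assumes user counts are positive, so the column maximum is nonzero.
    """
    users = np.asarray(users, dtype=float)
    spread = (users.max(axis=0) - users.min(axis=0)) / users.max(axis=0)
    return float(spread.min())


def train_user_traffic_model(users, traffic, threshold=DIVERSITY_THRESHOLD):
    """Steps 630 to 650: choose origin-constrained or unconstrained least squares."""
    users = np.asarray(users, dtype=float)
    if user_diversity(users) < threshold:
        # Step 640: low diversity -> no constant term (model passes through the origin).
        coef, *_ = np.linalg.lstsq(users, traffic, rcond=None)
        return "through_origin", coef, np.zeros(np.shape(traffic)[1:])
    # Step 650: sufficient diversity -> fit a constant term as well.
    A = np.column_stack([np.ones(len(users)), users])
    beta, *_ = np.linalg.lstsq(A, traffic, rcond=None)
    return "with_intercept", beta[1:], beta[0]


rng = np.random.default_rng(4)
narrow = rng.uniform([5e4, 8e4], [5.5e4, 8.4e4], size=(50, 2))  # clustered user numbers
wide = rng.uniform([1e3, 2e3], [5e4, 9e4], size=(50, 2))        # well-spread user numbers
per_user = np.array([3.0, 5.0])                                 # hypothetical packets per user
for users in (narrow, wide):
    mode, coef, intercept = train_user_traffic_model(users, users @ per_user)
    print(mode, coef, intercept)
```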
  • Step 660: Acquire the second training data according to the device to be modeled; the second training data includes the "traffic volume" indicator and the "resource occupancy" indicator.
  • Step 660 corresponds to step 440 shown in FIG. 4; for details, refer to the description of FIG. 4, which is not repeated here.
  • Step 670: Perform regression analysis on the traffic volume indicator and the resource occupancy indicator in the second training data to establish a second prediction model describing the numerical relationship between the traffic volume indicator and the resource occupancy indicator.
  • Specifically, with the "Gi interface packet number", "SGi user plane packet number", and "GW receiving user plane packet number" indicators as the traffic volume indicators and the "CPU peak occupancy rate" indicator as the resource occupancy indicator, regression analysis is performed between the traffic volume indicators and the resource occupancy indicator to establish the second prediction model describing their numerical relationship.
  • When the first prediction model is established by performing regression analysis constrained through the coordinate origin on the first indicator data and the second indicator data in the first training data, feature processing may also be performed on the second indicator data in the first training data and the second training data. The feature processing is such that if its input values are all zeros or approximately all zeros, its output values are also all zeros or approximately all zeros.
  • Regression analysis constrained through the coordinate origin may then be performed on the feature-processed second indicator data and the first indicator data to train the first prediction model. Regression analysis constrained through the coordinate origin may also be performed on the feature-processed second indicator data and the third indicator data to establish the second prediction model.
  • Mapping all-zero or approximately all-zero inputs to all-zero or approximately all-zero outputs keeps the feature processing consistent with the origin-constrained regression of the first training data.
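The zero-in, zero-out requirement rules out some common preprocessing steps. In the sketch below (NumPy, with hypothetical function names and data), scaling each traffic column by its maximum absolute value is a purely multiplicative map and therefore sends an all-zero input to an all-zero output, whereas z-score standardization subtracts the column mean and does not. Scaling-only normalization is offered here as one example of compatible feature processing, not as the method the text requires.

```python
import numpy as np


def fit_max_abs_scaling(X):
    """Zero-preserving normalization: divide each column by its largest |value|."""
    scale = np.abs(X).max(axis=0)
    scale[scale == 0] = 1.0                      # leave all-zero columns unchanged
    return lambda Z: Z / scale


def fit_zscore(X):
    """Z-score standardization; the mean shift means 0 is not mapped to 0."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    return lambda Z: (Z - mu) / sd


rng = np.random.default_rng(5)
traffic = rng.uniform(1e5, 8e5, size=(100, 3))   # hypothetical Gi, SGi, GW receive samples
zero = np.zeros((1, 3))
print(fit_max_abs_scaling(traffic)(zero))
print(fit_zscore(traffic)(zero))
```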
  • the method for performing feature processing on the second indicator data in the first training data and the second training data in the embodiment of the present application is not specifically limited.
  • the second indicator data in the first training data and the second training data may be subjected to dimensionality reduction processing.
  • principal component analysis may be performed on the second indicator data.
  • the second indicator data in the first training data and the second training data may also be subjected to normalization processing.
  • the first indicator data and the second indicator data in the second training data may also be normalized.
  • Regression analysis constrained through the coordinate origin is performed on the number of users and the traffic volume, so as to avoid inaccurate model extrapolation caused by insufficient diversity of the first indicator data in the data set.
  • the dimension reduction processing may be performed on the second indicator data in the first training data and the second training data.
  • the dimensionality reduction process can map high-dimensional data into low-dimensional data, reduce data redundancy, and can be regarded as a feature processing.
  • Principal component analysis is a common method of dimensionality reduction.
  • the implementation manner of performing the dimension reduction processing on the second indicator data in the first training data and the second training data is not specifically limited in the embodiment of the present application.
  • Principal component analysis may be performed on the second indicator data in the first training data and the second training data to obtain the principal components of the second indicator data.
  • For example, principal component analysis may be performed on the second indicator data in the first training data to obtain a principal component analysis model.
  • The first training data may then be reduced in dimensionality according to the obtained principal component analysis model, to obtain dimension-reduced third training data.
  • A prediction model may then be trained according to the obtained third training data.
  • the second indicator data for performing principal component analysis may be the second indicator data original value.
  • the second indicator data for performing principal component analysis may be a value obtained by normalizing the original value of the second indicator data.
  • dimensionality reduction is performed on the second indicator data to avoid computational difficulties caused by collinearity among the second indicator data, which may otherwise be encountered when regression analysis is performed on the second training data.
  • the second indicator data in the first training data may be subjected to principal component analysis to achieve the purpose of reducing dimensionality of the second indicator data in the first training data.
  • the output variables of the second indicator data principal component model may be multi-dimensional, but their dimension is not greater than that of the input variables.
  • for example, the second indicator data in the embodiments of the present application may be three-dimensional, and the output second indicator data principal components may have fewer than three dimensions, for example two: second indicator data principal component 1 and second indicator data principal component 2.
  • the term "second indicator data principal components" above denotes the set of variables obtained after principal component analysis is performed on the second indicator data.
  • Principal component analysis is a statistical method.
  • through an orthogonal transformation, a set of possibly correlated second indicator data is transformed into a set of linearly uncorrelated variables.
  • the transformed variables are called the second indicator data principal components.
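The principal component analysis described above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the method of this application: it assumes PCA is computed by singular value decomposition of mean-centred data, and the synthetic, strongly correlated counters (`load`, `traffic`) are hypothetical stand-ins for the three traffic indicators. It reduces three dimensions to two principal components and checks that the components are uncorrelated.

```python
import numpy as np

def fit_pca(X, k):
    """Fit PCA via SVD of the mean-centred data and keep the first k components."""
    mean = X.mean(axis=0)
    # Rows of vt are orthonormal principal directions, ordered by singular value.
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = s**2 / np.sum(s**2)          # share of total variance per component
    return mean, vt[:k], explained[:k]

def pca_transform(mean, components, X):
    """Orthogonal projection of the centred data onto the retained components."""
    return (X - mean) @ components.T

rng = np.random.default_rng(0)
# Hypothetical, strongly correlated counters standing in for the Gi packet,
# SGi user-plane packet and GW received-packet indicators.
load = rng.uniform(1e5, 1e6, size=500)
traffic = np.column_stack([
    1.00 * load + rng.normal(0, 1e4, 500),
    0.95 * load + rng.normal(0, 1e4, 500),
    0.50 * load + rng.normal(0, 1e4, 500),
])

mean, comps, explained = fit_pca(traffic, k=2)   # 3 input dims -> 2 components
pcs = pca_transform(mean, comps, traffic)         # shape (500, 2)
cov = np.cov(pcs, rowvar=False)                   # off-diagonal entries are ~0
print(pcs.shape, explained)
```

Because the inputs are nearly collinear, almost all variance lands on the first component, which is the situation the text describes as hard for a direct regression.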
  • the input variables of the regression analysis may be not the original values of the second indicator data, but the values obtained after feature processing.
  • the following uses the first indicator data as the number of users, the second indicator data as the traffic volume, and the third indicator data as the resource occupancy rate as an example, which is described in detail.
  • FIG. 7 is a schematic flowchart of training a first prediction model and a second prediction model according to another embodiment of the present application.
  • Figure 7 includes steps 710-780, which are described in detail below with respect to steps 710-780, respectively.
  • step 710: according to the device to be modeled, the "resource occupancy rate" indicator to be predicted, the "user number" indicator, and the "traffic volume" indicator are determined.
  • the resource occupancy indicator to be modeled is the "CPU peak occupancy rate" indicator of the SPU instance, and the traffic volume indicators are determined to be the "Gi interface packet number" indicator of the UGW network element, the "SGi user plane packet number" indicator of the UGW network element, and the "GW received user plane packet number" indicator of the SPU instance.
  • step 720: based on the cross-device generality of the correlation between the "number of users" indicator and the "traffic volume" indicator, devices that share this generality with the device to be modeled are selected to obtain device combination list 1.
  • the first training data is obtained according to the device combination list 1, and includes the “number of users” indicator and the “traffic volume” indicator.
  • the method for obtaining the first training data in step 720 corresponds to step 420 shown in FIG. 4.
  • for details, refer to the description of FIG. 4; they are not repeated here.
  • step 730: principal component analysis is performed on the "traffic volume" indicators in the first training data to obtain a "traffic principal component model".
  • obtaining the traffic principal component model achieves dimensionality reduction of the "traffic volume" indicators.
  • the first training data of the network elements UGW0 and UGW1 may be obtained, including: the "2G+3G user number" indicator, the "4G user number" indicator, the "Gi interface packet number" indicator, and the "SGi user plane packet number" indicator of the network element, and the "GW received user plane packet number" indicator of the SPU instance.
  • principal component analysis may be performed on the "traffic volume" indicators in the first training data (for example, the "Gi interface packet number" indicator, the "SGi user plane packet number" indicator, and the "GW received user plane packet number" indicator of the SPU instance), and a traffic principal component model can be obtained.
  • step 740: the first training data is processed using the "traffic principal component model".
  • the "Gi interface packet number" indicator, the "SGi user plane packet number" indicator, and the "GW received user plane packet number" indicator in the first training data are processed with the traffic principal component model obtained in step 730 to obtain the third training data.
  • the data of the network elements UGW0 and UGW1 can be obtained, including: the number of 2G+3G users, the number of 4G users, the number of Gi interface packets, and the number of SGi user plane packets of the network element, and the number of user plane packets received by the GW in the SPU instance.
  • the acquired data of UGW0 and UGW1 can be screened, and the screened data can be processed to obtain the third training data used for the regression analysis.
  • step 750: a regression analysis is performed on the "user number" indicator and the "traffic volume" indicator in the third training data to obtain a "user number-traffic model".
  • the regression analysis can be performed on the "user number" indicator and the "traffic volume" indicator in the third training data to obtain the "user number-traffic model" of the network elements UGW0 and UGW1 (also referred to as the first prediction model). For example, if the data is screened by taking 17:00 each day as the peak time, regression analysis can be performed on the "2G+3G user number" indicator, the "4G user number" indicator, the "Gi interface packet number" indicator, the "SGi user plane packet number" indicator, and the "GW received user plane packet number" indicator in the screened third training data, and the "user number-traffic model" (first prediction model) of the network elements UGW0 and UGW1 can be obtained.
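A sketch of steps 730 to 750, assuming NumPy, PCA by SVD, and ordinary least squares as the regression method (the text fixes neither choice). All data is synthetic and hypothetical: `users` stands in for the 2G+3G and 4G user numbers, `traffic` for the three packet-count indicators, and only the 17:00 samples are kept as the screening step. In this sketch the third training data holds the traffic principal components, which become the regression targets.

```python
import numpy as np

def fit_pca(X, k):
    """Mean-centred PCA via SVD; returns (mean, first k principal directions)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

rng = np.random.default_rng(1)
days = 60
hour = np.tile(np.arange(24), 2 * days)            # hourly samples from UGW0 and UGW1
n = hour.size
# Hypothetical user indicators: [2G+3G users, 4G users]
users = rng.uniform([2e5, 5e5], [4e5, 9e5], size=(n, 2))
# Hypothetical traffic indicators: [Gi packets, SGi user-plane packets, GW received packets]
mix = np.array([[2.0, 1.0, 0.5],
                [5.0, 4.5, 2.5]])
traffic = users @ mix + rng.normal(0, 1e4, size=(n, 3))

peak = hour == 17                                   # screening: keep the 17:00 busy hour
pca_mean, pca_comps = fit_pca(traffic[peak], k=2)             # step 730
third_training = (traffic[peak] - pca_mean) @ pca_comps.T     # step 740
A = np.column_stack([np.ones(peak.sum()), users[peak]])       # intercept + user indicators
model_1, *_ = np.linalg.lstsq(A, third_training, rcond=None)  # step 750, shape (3, 2)
```

`model_1` maps an intercept plus the two user indicators to the two traffic principal components; a multi-output least squares fit handles both components in one call.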
  • step 760: the second training data is obtained according to the device to be modeled, and includes a "traffic volume" indicator and a "resource occupancy rate" indicator.
  • the resource occupancy rate indicator is the "CPU peak occupancy rate" indicator of the SPU instance, and the data samples are determined to come from the target device: UGW0 and SPU instance 0.
  • the second training data is then acquired.
  • the second training data may also come from other device combinations, such as UGW0 and SPU instance 1; this embodiment selects UGW0 and SPU instance 0.
  • step 770: the second training data is processed using the "traffic principal component model".
  • on the second training data, the "traffic principal component model" established in step 730 is used to perform feature processing on the "Gi interface packet number" indicator, the "SGi user plane packet number" indicator, and the "GW received user plane packet number" indicator, to obtain the traffic volume indicators.
  • step 780: regression analysis is performed on the "traffic volume" indicator and the "resource occupancy rate" indicator in the second training data to obtain a "traffic-resource occupancy model".
  • the processed "Gi interface packet number", "SGi user plane packet number", and "GW received user plane packet number" indicators are used as the traffic volume indicators, the "CPU peak occupancy rate" indicator of UGW0 and SPU instance 0 (the target device) is used as the resource occupancy indicator, and regression analysis is performed on the traffic volume indicators and the resource occupancy indicator.
  • the training yields the "traffic-resource occupancy model" (second prediction model).
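The dashed arrow from step 730 to step 770 means the traffic principal component model is fitted once on the pooled first training data and only applied to the second training data. A minimal sketch under stated assumptions (NumPy, PCA by SVD, least squares regression); the data, the mixing matrix, and the occupancy formula are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
mix = np.array([[2.0, 1.0, 0.5],
                [5.0, 4.5, 2.5]])                   # hypothetical users -> packet counters

def simulate_traffic(n):
    """Hypothetical traffic samples: 3 packet counters driven by 2 user counts."""
    return rng.uniform(2e5, 9e5, size=(n, 2)) @ mix + rng.normal(0, 1e4, size=(n, 3))

# Step 730 (pooled first training data): fit the traffic PCA once.
traffic_pool = simulate_traffic(400)
pca_mean = traffic_pool.mean(axis=0)
_, _, vt = np.linalg.svd(traffic_pool - pca_mean, full_matrices=False)
pca_comps = vt[:2]

# Second training data: target device only (UGW0 / SPU instance 0).
traffic_tgt = simulate_traffic(120)
cpu_peak = 5 + 4e-6 * traffic_tgt.sum(axis=1) + rng.normal(0, 0.5, 120)  # hypothetical %

# Step 770: reuse the already fitted PCA model; it is not refitted on this data.
traffic_pcs = (traffic_tgt - pca_mean) @ pca_comps.T
# Step 780: traffic principal components -> CPU peak occupancy rate.
A = np.column_stack([np.ones(len(traffic_pcs)), traffic_pcs])
model_2, *_ = np.linalg.lstsq(A, cpu_peak, rcond=None)
```

Reusing the same fitted model keeps the features produced by the first prediction model and the inputs expected by the second prediction model in the same coordinate system.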
  • dashed arrows in Figure 7 can be used to indicate an indirect effect on another step.
  • a dashed arrow between step 730 and step 770 can be used to indicate the indirect impact of the traffic principal component model established in step 730 on performing step 770.
  • performing principal component analysis on the second indicator data transforms it into mutually uncorrelated principal components, which avoids the computational difficulty caused by collinearity among the traffic indicators.
  • the model obtained by principal component analysis describes a mapping.
  • the points in the original space can be mapped to points in the mapping space, and the origin in the original space is not necessarily at the origin in the mapping space.
  • a translation transformation can move the image of the original-space origin to the origin of the mapping space.
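A small sketch of the translation, assuming the PCA model is mean-centred, as most implementations are (the centring is what moves the original origin away from the mapping-space origin). The data and names are hypothetical; `offset` is the image of the zero vector and plays the role of the translation.

```python
import numpy as np

rng = np.random.default_rng(3)
load = rng.uniform(1e5, 1e6, size=(300, 1))
traffic = load * np.array([1.0, 0.9, 0.5]) + rng.normal(0, 1e4, size=(300, 3))

# Fitted PCA model (mean-centred, two components kept).
mean = traffic.mean(axis=0)
_, _, vt = np.linalg.svd(traffic - mean, full_matrices=False)
comps = vt[:2]

def pca_map(X):
    """Map points of the original space into the principal component space."""
    return (X - mean) @ comps.T

origin = np.zeros((1, 3))
offset = pca_map(origin)             # image of the original origin; generally non-zero

def pca_map_translated(X):
    """Translation: shift every mapped point by -offset."""
    return pca_map(X) - offset

# After the shift, zero traffic maps to the mapping-space origin.
# Algebraically pca_map_translated(X) == X @ comps.T: the centring term cancels.
```

Because the centring term cancels, the translated map is simply the linear projection `X @ comps.T`, which is one way to see why adding the translation is cheap.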
  • a determination of the diversity of the first indicator data may be added on the basis of FIG. 7. If the diversity of the first indicator data satisfies the preset condition mentioned above, a regression analysis that does not constrain the origin may be performed on the first indicator data and the second indicator data in the first training data. If the diversity does not satisfy the preset condition, a regression analysis constrained through the origin may be performed on them. This implementation is described in detail below with reference to FIG. 8.
  • a determination of the diversity of the first indicator data may be added to the flow of FIG. 7 (in which principal component analysis is performed on the second indicator data in the first training data and the second training data).
  • the first indicator data diversity satisfies a preset condition
  • the first indicator data and the second indicator data may be subjected to a regression analysis that does not constrain the origin.
  • the first indicator data diversity does not satisfy the preset condition
  • the first indicator data and the second indicator data may be subjected to a regression analysis that constrains the origin.
  • the origin in the original space is not necessarily at the origin in the mapping space.
  • the image of the original-space origin can be moved to the origin by a translation transformation, so that a regression analysis constrained through the origin can be performed on the first indicator data and the second indicator data (the traffic principal components).
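One way to express this branch, assuming a single scalar traffic feature that already maps zero traffic to zero (for PCA features, after the translation). The text does not specify the preset condition, so the coefficient-of-variation test and the threshold `min_cv=0.2` are hypothetical, as are the function names.

```python
import numpy as np

def diversity_ok(users, min_cv=0.2):
    """Hypothetical preset condition: coefficient of variation of the user count."""
    users = np.asarray(users, dtype=float)
    return users.std() / users.mean() >= min_cv

def fit_first_model(users, traffic, min_cv=0.2):
    """Return (intercept, slope) of a one-feature users -> traffic model."""
    users = np.asarray(users, dtype=float)
    traffic = np.asarray(traffic, dtype=float)
    if diversity_ok(users, min_cv):
        # Enough spread in the user data: least squares with a free intercept.
        A = np.column_stack([np.ones_like(users), users])
        (b0, b1), *_ = np.linalg.lstsq(A, traffic, rcond=None)
        return float(b0), float(b1)
    # Narrow spread: constrain the line through the origin (zero users -> zero
    # traffic) so that extrapolation far outside the observed range stays anchored.
    return 0.0, float(users @ traffic / (users @ users))

rng = np.random.default_rng(4)
narrow = rng.uniform(9.0e5, 1.0e6, 100)    # hypothetical user counts, little spread
wide = rng.uniform(1.0e5, 1.0e6, 100)      # hypothetical user counts, wide spread
fit_narrow = fit_first_model(narrow, 3 * narrow + rng.normal(0, 1e4, 100))
fit_wide = fit_first_model(wide, 3 * wide + rng.normal(0, 1e4, 100))
```

With the narrow sample the origin-constrained branch is taken and the intercept is exactly 0.0; with the wide sample the intercept is estimated freely.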
  • the following uses the first indicator data as the number of users, the second indicator data as the traffic volume, and the third indicator data as the resource occupancy rate as an example, which is described in detail.
  • FIG. 8 is a schematic flowchart of training a first prediction model and a second prediction model according to another embodiment of the present application.
  • Figure 8 includes steps 810-890, which are described in detail below with respect to steps 810-890, respectively.
  • step 810: according to the device to be modeled, the "resource occupancy rate" indicator to be predicted, the "user number" indicator, and the "traffic volume" indicator are determined.
  • the resource occupancy indicator to be modeled is the "CPU peak occupancy rate" indicator of the SPU instance, and the traffic volume indicators are determined to be the "Gi interface packet number" indicator of the UGW network element, the "SGi user plane packet number" indicator of the UGW network element, and the "GW received user plane packet number" indicator of the SPU instance.
  • step 820: based on the cross-device generality of the correlation between the "number of users" indicator and the "traffic volume" indicator, devices that share this generality with the device to be modeled are selected to obtain device combination list 1.
  • the first training data is obtained according to the device combination list 1, and includes a “number of users” indicator and a “traffic volume” indicator.
  • step 820 corresponds to step 720 shown in FIG. 7.
  • for details, refer to the description of FIG. 7; they are not repeated here.
  • step 830: principal component analysis is performed on the "traffic volume" indicators in the first training data to obtain a "traffic principal component model".
  • step 830 corresponds to step 730 shown in FIG. 7.
  • for details, refer to the description of FIG. 7; they are not repeated here.
  • step 840: it is determined whether the data diversity of the "number of users" indicator in the first training data is sufficient.
  • the diversity of the number of users in the first training data may be evaluated. If it meets the preset condition, step 850 may be performed. If it does not, step 860 may be performed first, followed by step 850.
  • step 850: the first training data is processed using the "traffic principal component model".
  • the traffic principal component model established in step 830 is used to process the "Gi interface packet number" indicator, the "SGi user plane packet number" indicator, and the "GW received user plane packet number" indicator, to obtain the third training data.
  • step 860: a translation transformation to be added to the "traffic principal component model" is determined.
  • a regression analysis constrained through the origin needs to be performed on the user number indicator and the traffic volume indicator (the traffic principal components) of the first training data.
  • the model obtained by principal component analysis can map the points in the original space to the points in the mapping space, and the origin in the original space is not necessarily at the origin in the mapping space.
  • if processing an all-zero traffic input does not produce an output of all zeros or approximately all zeros, a translation transformation T can be determined from that output. Applying T to the processed outputs of the first training data and the second training data makes the processed output for an all-zero traffic input all zeros or approximately all zeros.
  • step 870: a regression analysis is performed on the "user number" indicator and the "traffic volume" indicator in the third training data to obtain a "user number-traffic model".
  • the "2G+3G user number" indicator and the "4G user number" indicator are used as the user number indicators, regression analysis is performed on the user number indicators and the traffic volume indicators, and the first prediction model is trained.
  • step 880: the second training data is obtained according to the device to be modeled, and includes a "traffic volume" indicator and a "resource occupancy rate" indicator.
  • step 880 corresponds to step 760 shown in FIG. 7.
  • for details, refer to the description of FIG. 7; they are not repeated here.
  • step 890: the second training data is processed using the "traffic principal component model".
  • step 890 corresponds to step 770 shown in FIG. 7.
  • for details, refer to the description of FIG. 7; they are not repeated here.
  • step 895: a regression analysis is performed on the "traffic volume" indicator and the "resource occupancy rate" indicator in the second training data to obtain a "traffic-resource occupancy model".
  • the "CPU peak occupancy rate" indicator is used as the resource occupancy indicator, regression analysis is performed on the traffic volume indicators and the resource occupancy indicator, and the training yields the "traffic-resource occupancy model" (second prediction model).
  • the dashed arrows in Figure 8 can be used to indicate an indirect effect on another step.
  • the dashed arrow between step 830 and step 860 can be used to indicate the indirect impact of the traffic principal component model established in step 830 on performing step 860.
  • the dashed arrow between step 830 and step 890 can be used to indicate the indirect impact of the traffic principal component model established in step 830 on performing step 890.
  • by adding the translation transformation, the regression that does not pass through the origin in the second indicator data principal component space is converted into a regression through the origin (the original-space origin is mapped to the mapping-space origin), which is simple and easy to implement.
  • feature processing may also be performed on the first indicator data in the first training data before the regression analysis. If the feature processing maps an input of all zeros or approximately all zeros to an output of all zeros or approximately all zeros, a regression analysis constrained through the origin may be performed on the first indicator data and the second indicator data to establish the first prediction model.
  • that is, an input of all zeros or approximately all zeros should be mapped by the feature processing to an output of all zeros or approximately all zeros, consistent with the first training data.
  • the method for performing feature processing on the first indicator data on the first training data in the embodiment of the present application is not specifically limited.
  • the first indicator data may be subjected to a dimensionality reduction process on the first training data.
  • the first indicator data may be subjected to principal component analysis.
  • alternatively, the first indicator data in the first training data may be normalized.
  • if the feature processing of the first indicator data maps an all-zero input to an output that is not all zeros or approximately all zeros, a translation transformation may be determined from that output; applying it makes the processed output for an all-zero input all zeros or approximately all zeros.
  • a regression analysis constrained through the origin may then be performed on the first indicator data and the second indicator data to establish the first prediction model.
  • the method for performing feature processing on the first indicator data in the embodiment of the present application is not specifically limited.
  • the first indicator data may be subjected to dimensionality reduction processing.
  • principal component analysis may be performed on the first indicator data.
  • the first indicator data of the first training data may be dimension-reduced by principal component analysis to obtain dimension-reduced first indicator data features. This implementation is described in detail below with reference to FIG. 9.
  • the first indicator data used for principal component analysis may be the original values of the first indicator data.
  • the first indicator data for performing principal component analysis may be a value obtained by normalizing the original value of the first indicator data.
  • the following uses the first indicator data as the number of users, the second indicator data as the traffic volume, and the third indicator data as the resource occupancy ratio as an example, and details are described.
  • FIG. 9 is a schematic flowchart of training a first prediction model and a second prediction model according to another embodiment of the present application.
  • Figure 9 includes steps 910-970, which are described in detail below with respect to steps 910-970, respectively.
  • step 910: the "resource occupancy rate" indicator to be predicted, the "user number" indicator, and the "traffic volume" indicator are determined according to the device to be modeled.
  • step 910 corresponds to step 410 shown in FIG. 4.
  • for details, refer to the description of FIG. 4; they are not repeated here.
  • step 920: based on the cross-device generality of the correlation between the "number of users" indicator and the "traffic volume" indicator, devices that share this generality with the device to be modeled are selected to obtain device combination list 1.
  • the first training data is obtained according to device combination list 1, and includes a "number of users" indicator and a "traffic volume" indicator.
  • obtaining the first training data in step 920 corresponds to step 420 shown in FIG. 4.
  • for details, refer to the description of FIG. 4; they are not repeated here.
  • step 930: principal component analysis is performed on the "number of users" indicators in the first training data, and a "user number principal component model" is established.
  • performing principal component analysis on the "user number" indicators in the first training data yields the "user number principal component model", which reduces the dimensionality of the "user number" indicators.
  • the principal component analysis can be performed on the "2G+3G user number” indicator and the "4G user number” indicator in the first training data, and the "user number principal component model" can be obtained.
  • step 940: the first training data is processed using the "user number principal component model".
  • the "2G+3G user number” indicator and the "4G user number” indicator in the first training data are processed by the user number principal component model to obtain fourth training data.
  • step 950: a regression analysis is performed on the "user number" indicator and the "traffic volume" indicator in the fourth training data to obtain a "user number-traffic model".
  • the "Gi interface packet number" indicator, the "SGi user plane packet number" indicator, and the "GW received user plane packet number" indicator are used as the traffic volume indicators, and the user number principal component is used as the user number indicator. Regression analysis is performed on the user number indicator and the traffic volume indicators to train the "user number-traffic model" (first prediction model).
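A sketch of the FIG. 9 variant, in which the dimensionality reduction is applied to the user indicators instead of the traffic indicators. It assumes NumPy, PCA by SVD keeping one component, and least squares regression; the data is synthetic, hypothetical, and deliberately collinear in the two user indicators.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
base = rng.uniform(2e5, 1e6, n)          # hypothetical common driver of both user counts
# [2G+3G users, 4G users]: strongly collinear inputs.
users = np.column_stack([0.4 * base, 0.6 * base]) + rng.normal(0, 5e3, size=(n, 2))
# [Gi packets, SGi user-plane packets, GW received packets]
traffic = base[:, None] * np.array([5.0, 4.5, 2.5]) + rng.normal(0, 2e4, size=(n, 3))

# Step 930: user-number principal component model (keep one component).
u_mean = users.mean(axis=0)
_, _, vt = np.linalg.svd(users - u_mean, full_matrices=False)
u_comp = vt[:1]
# Step 940: fourth training data = user-number principal component.
user_pc = (users - u_mean) @ u_comp.T                    # shape (n, 1)
# Step 950: least squares from the user PC to all three traffic indicators at once.
A = np.column_stack([np.ones(n), user_pc])
model_1, *_ = np.linalg.lstsq(A, traffic, rcond=None)    # shape (2, 3)
```

Regressing on the single principal component instead of the two raw, collinear user counts keeps the design matrix well conditioned.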
  • step 960: the second training data is obtained according to the device to be modeled, and includes a "traffic volume" indicator and a "resource occupancy rate" indicator.
  • the resource occupancy rate indicator is the "CPU peak occupancy rate" indicator of the SPU instance, and the data samples are determined to come from the target device: UGW0 and SPU instance 0.
  • the second training data is then acquired.
  • the second training data may also come from other device combinations, such as UGW0 and SPU instance 1; this embodiment selects UGW0 and SPU instance 0.
  • step 970: a regression analysis is performed on the "traffic volume" indicator and the "resource occupancy rate" indicator in the second training data to obtain the second prediction model.
  • the "CPU peak occupancy rate" indicator is used as the resource occupancy indicator, regression analysis is performed on the traffic volume indicators and the resource occupancy indicator, and the second prediction model is trained.
  • performing principal component analysis on the first indicator data transforms it into mutually uncorrelated principal components, which avoids the computational difficulty caused by collinearity among the first indicator data.
  • principal component analysis may be performed on the first indicator data indicator and the second indicator data simultaneously.
  • the embodiments of the present application provide a prediction method, in which the first indicator data to be predicted of the target device is obtained, and the predicted third indicator data is obtained by using the first prediction model and the second prediction model.
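The prediction step chains the two trained models. The sketch below assumes both are linear models stored as (intercept, coefficients) pairs; the coefficients and the function name are hypothetical. If feature processing such as PCA was used during training, the same processing would have to be applied between the two models.

```python
import numpy as np

def predict_occupancy(users_future, model_1, model_2):
    """Chain the models: user indicators -> traffic (model 1) -> occupancy (model 2)."""
    b1, W1 = model_1              # intercept vector (t,), coefficient matrix (u, t)
    b2, w2 = model_2              # scalar intercept, coefficient vector (t,)
    traffic_pred = b1 + np.asarray(users_future, dtype=float) @ W1   # second indicator data
    return b2 + traffic_pred @ w2                                     # third indicator data

# Hypothetical coefficients, for illustration only.
model_1 = (np.zeros(2), np.array([[2.0, 1.0],
                                  [5.0, 4.0]]))
model_2 = (5.0, np.array([1e-5, 2e-5]))
forecast = predict_occupancy([[1e5, 2e5]], model_1, model_2)
```

With these coefficients, 100,000 2G+3G users and 200,000 4G users give predicted traffic of [1.2e6, 9e5] and a predicted occupancy of 5 + 12 + 18 = 35 (percent).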
  • FIG. 10 is a schematic diagram of a training device provided by an embodiment of the present application.
  • the training device 1000 of FIG. 10 can perform the training method described in any of the embodiments of FIGS. 1-9.
  • the training device 1000 of Figure 10 can include:
  • the first obtaining module 1001 is configured to acquire first training data and second training data
  • the first training data includes first indicator data and second indicator data of the plurality of devices
  • the second training data includes second indicator data and third indicator data of the target device
  • the target device is any one of the plurality of devices.
  • a first training module 1002 configured to learn, according to the first training data, a first prediction model
  • the first prediction model is configured to predict second indicator data of the target device according to the first indicator data of the target device.
  • a second training module 1003 configured to learn, according to the second training data, a second prediction model
  • the second prediction model is configured to predict third indicator data of the target device according to the second indicator data of the target device obtained by the first prediction model.
  • the first indicator data is a user quantity
  • the second indicator data is a traffic volume
  • the third indicator data is a resource occupancy rate
  • the apparatus 1000 further includes:
  • the second obtaining module 1004 is configured to acquire the first indicator data to be predicted of the target device.
  • the first determining module 1005 is configured to input the first indicator data to be predicted into the first prediction model to obtain predicted second indicator data of the target device.
  • the second determining module 1006 is configured to input the predicted second indicator data into the second prediction model to obtain a prediction result of the target device.
  • the prediction model includes the first prediction model and the second prediction model
  • the first prediction model is trained according to the first training data
  • the second prediction model is trained according to the second training data.
  • the first training module 1002 is specifically configured to: perform principal component analysis on the second indicator data in the first training data to obtain a principal component analysis model; perform dimensionality reduction on the first training data according to the principal component analysis model to obtain dimension-reduced third training data; and train the first prediction model according to the third training data.
  • the first training module 1002 is specifically configured to: obtain a first prediction model by performing regression analysis on the first training data.
  • the first training module 1002 is specifically configured to: perform, on the first training data, a regression analysis that does not constrain the origin if the diversity of the first training data satisfies a preset condition; and perform a regression analysis constrained through the origin if the diversity of the first training data does not satisfy the preset condition.
  • the second training module 1003 is specifically configured to: obtain a second prediction model by performing regression analysis on the second training data.
  • the second training module 1003 is specifically configured to: obtain a second prediction model by performing a quantile regression analysis on the second training data.
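The text does not say which quantile or solver is used. A common formulation, sketched here with SciPy's `linprog`, writes linear quantile regression as a linear program over the check (pinball) loss. The quantile `tau=0.9`, the function name, and the synthetic traffic and occupancy data are hypothetical choices.

```python
import numpy as np
from scipy.optimize import linprog

def quantile_regression(x, y, tau=0.9):
    """Linear quantile regression y ~ b0 + b1 * x solved as a linear program.

    minimise   tau * sum(u) + (1 - tau) * sum(v)
    subject to A @ beta + u - v = y,  u >= 0,  v >= 0,  beta free,
    where A = [1, x] and u, v are the positive and negative parts of the residuals.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), np.asarray(x, dtype=float).reshape(n, -1)])
    p = A.shape[1]
    c = np.concatenate([np.zeros(p), tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([A, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]

rng = np.random.default_rng(6)
traffic = rng.uniform(1.0, 10.0, 200)                   # hypothetical, millions of packets
cpu_peak = 5 + 4 * traffic + rng.normal(0, 2, 200)       # hypothetical CPU peak occupancy, %
beta = quantile_regression(traffic, cpu_peak, tau=0.9)   # [intercept, slope]
```

At the optimum, roughly a fraction `tau` of the samples lie on or below the fitted line, so the curve tracks the upper part of the occupancy distribution rather than its mean.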
  • the relationship between the first indicator data and the second indicator data is consistent across the plurality of devices.
  • FIG. 11 is a schematic diagram of a prediction apparatus provided by an embodiment of the present application.
  • the prediction apparatus 1100 of FIG. 11 can be used to perform the prediction method of the second aspect or of any possible implementation of the second aspect.
  • the prediction device 1100 in FIG. 11 may include:
  • the first obtaining module 1101 is configured to acquire the first indicator data to be predicted of the target device
  • a first determining module 1102 configured to input the first indicator data to be predicted into the first prediction model, to obtain predicted second indicator data of the target device;
  • the second determining module 1103 is configured to input the predicted second indicator data into the second prediction model to obtain a prediction result of the target device.
  • the prediction model includes a first prediction model and a second prediction model, the first prediction model is trained according to the first training data, and the second prediction model is trained according to the second training data, the first training The data includes first indicator data and second indicator data of the plurality of devices, the second training data including second indicator data and third indicator data of the target device, the plurality of devices including the target device.
  • the first indicator data is a user quantity
  • the second indicator data is a traffic volume
  • the third indicator data is a resource occupancy rate
  • the device 1100 further includes:
  • the second obtaining module 1104 is configured to acquire the first training data.
  • the first training module 1105 is configured to train the first prediction model according to the first training data.
  • the first prediction model is configured to predict second indicator data of the target device according to the first indicator data of the target device.
  • the first training module 1105 is specifically configured to: obtain the first prediction model by performing regression analysis on the first training data.
  • the first training module 1105 is specifically configured to: perform, on the first training data, a regression analysis that does not constrain the origin if the diversity of the first training data meets a preset condition; and perform a regression analysis constrained through the origin if the diversity of the first training data does not meet the preset condition.
  • the device 1100 further includes:
  • the third obtaining module 1106 is configured to acquire the second training data.
  • the second training module 1107 is configured to train the second prediction model according to the second training data.
  • the second prediction model is configured to predict third indicator data of the target device according to the second indicator data of the target device obtained by the first prediction model.
  • the second training module 1107 is specifically configured to: obtain the second prediction model by performing regression analysis on the second training data.
  • the second training module 1107 is specifically configured to: obtain the second prediction model by performing a quantile regression analysis on the second training data.
  • the indicator relationship between the first indicator data and the second indicator data is consistent across the multiple devices.
  • FIG. 12 is a schematic structural diagram of a training apparatus according to an embodiment of the present application.
  • the training device 1200 of FIG. 12 can perform the training method described in any of the embodiments of FIGS. 1 to 9.
  • the training device 1200 of FIG. 12 can include a memory 1201 and a processor 1202.
  • the memory 1201 can be used to store programs, and the processor 1202 can be used to execute programs stored in the memory.
  • the processor 1202 can be used to perform the training method described in any of the above embodiments.
  • the processor 1202 can be, for example, a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It can implement or execute the various illustrative logical blocks, modules, and circuits described in connection with the present disclosure.
  • the processor may also be a combination that implements computing functions, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
  • the memory 1201 can be used to store program code and data of the apparatus for modeling the numerical relationship between the user count indicator and the resource occupancy indicator. The memory 1201 may be a storage unit inside the processor 1202, an external storage unit independent of the processor 1202, or a component that includes both a storage unit inside the processor 1202 and an external storage unit independent of the processor 1202.
  • FIG. 13 is a schematic structural diagram of a prediction apparatus according to an embodiment of the present application.
  • the prediction device 1300 of FIG. 13 can be used to perform the prediction method of the second aspect or any possible implementation of the second aspect.
  • the prediction device 1300 of FIG. 13 may include a memory 1301 and a processor 1302.
  • the memory 1301 can be used to store programs, and the processor 1302 can be used to execute programs stored in the memory.
  • the processor 1302 can be used to perform the prediction method described in any of the above embodiments.
  • the processor 1302 can be, for example, a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It can implement or execute the various illustrative logical blocks, modules, and circuits described in connection with the present disclosure.
  • the processor may also be a combination that implements computing functions, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
  • the memory 1301 can be used to store program code and data of the apparatus for modeling the numerical relationship between the user count indicator and the resource occupancy indicator. The memory 1301 may be a storage unit inside the processor 1302, an external storage unit independent of the processor 1302, or a component that includes both a storage unit inside the processor 1302 and an external storage unit independent of the processor 1302.
  • an embodiment of the present application provides a computer-readable storage medium including computer instructions; when the computer instructions are run on a training device, the training device is caused to perform the training method of the first aspect or any implementation of the first aspect.
  • an embodiment of the present application provides a computer-readable storage medium including computer instructions; when the computer instructions are run on a prediction device, the prediction device is caused to perform the prediction method of the second aspect or any implementation of the second aspect.
  • an embodiment of the present application provides a chip including a memory and a processor; the memory is configured to store a program, and the processor is configured to execute the program stored in the memory; when the program is executed, the processor performs the method of the first aspect or any implementation of the first aspect.
  • an embodiment of the present application provides a chip including a memory and a processor; the memory is configured to store a program, and the processor is configured to execute the program stored in the memory; when the program is executed, the processor performs the method of the second aspect or any implementation of the second aspect.
  • an embodiment of the present application provides a computer program product; when the computer program product is run on a computer, the computer is caused to perform the method of the first aspect or any implementation of the first aspect.
  • an embodiment of the present application provides a computer program product; when the computer program product is run on a computer, the computer is caused to perform the method of the second aspect or any implementation of the second aspect.
  • the term "and/or" merely describes an association relationship between associated objects and indicates that three relationships may exist.
  • for example, A and/or B may indicate three cases: only A exists, both A and B exist, and only B exists.
  • the character "/" in this document generally indicates an "or" relationship between the associated objects.
  • the computer program product includes one or more computer instructions.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
  • the computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave).
  • the computer-readable storage medium can be any usable medium accessible to a computer, or a data storage device, such as a server or data center, that integrates one or more usable media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, a magnetic tape), an optical medium (such as a digital video disc (DVD)), or a semiconductor medium (such as a solid state disk (SSD)).
  • the disclosed systems, devices, and methods may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division of the unit is only a logical function division.
  • in actual implementation there may be other division manners; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the functions may be stored in a computer readable storage medium if implemented in the form of a software functional unit and sold or used as a standalone product.
  • the technical solution of the present application essentially, or the part contributing to the prior art, or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application.
  • the foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Signal Processing (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computational Linguistics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

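The present application provides a prediction method, a training method, an apparatus, and a computer-readable storage medium. The method includes: obtaining first indicator data of a target device for prediction; inputting the first indicator data for prediction into a first prediction model to obtain predicted second indicator data of the target device; and inputting the predicted second indicator data into a second prediction model to obtain a prediction result for the target device, where the second prediction model is trained on second training data that includes second indicator data and third indicator data of the target device. The technical solution provided in the present application reflects the functional relationship between the first indicator data and the third indicator data more accurately, so that the prediction result of the target device (the predicted third indicator data) can be accurately predicted from the first indicator data for prediction.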

Description

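Prediction method, training method, apparatus, and computer storage medium
This application claims priority to Chinese Patent Application No. 201810481548.0, filed with the Chinese Patent Office on May 18, 2018 and entitled "Prediction method, training method, apparatus, and computer storage medium", which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the field of communications, and more specifically, to a prediction method, a training method, an apparatus, and a computer storage medium.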
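Background
In model training, training data of a device can be obtained and a prediction model can be derived from that training data. In prediction, the resulting prediction model and the actual situation can be used to predict future conditions.
In the prior art, two kinds of training data of a single network device are obtained directly during model training, and a prediction model is trained directly on them.
For example, before launching a business campaign, a communication network operator needs to predict the future resource occupancy rate of communication devices (also called the resource occupancy indicator) from an assumed number of users of a service (also called the user count indicator), so that network devices that may become overloaded can be expanded in advance and the system keeps running smoothly.
In the prior art, the functional relationship between the user count indicator and the resource occupancy indicator of the target device is trained directly, and the resource occupancy rate is predicted from the assumed user count and that relationship. Because the user count indicator of the target device may not change much within the data collection period, the sample data lack diversity. The relationship between user count and resource occupancy is then not fully reflected in the data, an accurate functional relationship between them is hard to obtain, and the learned relationship can hardly support extrapolation over a wide range.
Therefore, when one kind of collected training data lacks sample diversity, how to obtain an accurate prediction model from two kinds of training data, so that one kind of data can be accurately predicted from the other, is a problem that urgently needs to be solved.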
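Summary
The present application provides a prediction method, a training method, an apparatus, and a computer storage medium, with which an accurate prediction model can be obtained from two kinds of training data, so that one kind of data can be accurately predicted from the other.
According to a first aspect, a prediction method is provided, including: obtaining first indicator data of a target device for prediction; inputting the first indicator data for prediction into a first prediction model to obtain predicted second indicator data of the target device; and inputting the predicted second indicator data into a second prediction model to obtain a prediction result for the target device.
In the embodiments of the present application, the first prediction model may be trained on first training data, and the second prediction model is trained on second training data.
It should be understood that the first training data may include first indicator data and second indicator data of multiple devices; that is, the first indicator data and the second indicator data may come from multiple network devices including the target device.
The embodiments of the present application do not specifically limit the first indicator data and the second indicator data; they may be any two kinds of indicator data.
Optionally, in some embodiments, the first indicator data may be the user count, and the second indicator data may be the traffic volume.
The second training data may include second indicator data and third indicator data of the target device.
It should be understood that the second indicator data and the third indicator data may come from the target device.
The embodiments of the present application do not specifically limit the second indicator data and the third indicator data; they may be any two kinds of indicator data.
Optionally, in some embodiments, the second indicator data may be the traffic volume, and the third indicator data may be the resource occupancy rate.
The present application does not specifically limit the target device and/or network device (also called a communication network device) mentioned herein; it may include, but is not limited to, any subnet in a network, a network element, a sub-device of a network element (for example, a board), or a functional unit of a network element (for example, a module). For example, the communication network device may include, but is not limited to, a network adapter, a network transceiver, a network media converter, a multiplexer, a repeater, a hub, a bridge, a switch, a router, a gateway, and the like.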
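The following describes the solution in detail using an example in which the first indicator data is the user count and the second indicator data is the traffic volume.
It should be understood that, in the embodiments of the present application, the user count indicator may represent the number of users of a service on a network communication device.
It should be understood that one network communication device may have multiple user count indicators. As one example, the user count may be expressed as a "2G+3G user count" indicator. As another example, it may be a "4G user count" indicator. As yet another example, it may be a "registered user count" indicator. The embodiments of the present application do not specifically limit this.
In the embodiments of the present application, the traffic volume indicator may be understood as the amount of a service used on a network communication device.
It should be understood that one device may have multiple traffic volume indicators. As one example, the traffic volume of a communication network device may represent the "total occupied traffic" indicator of the network device. As another example, it may represent the "Gi interface packet count" indicator of the network device. As yet another example, it may represent the "SGi user-plane packet count" indicator of the network device. The embodiments of the present application do not specifically limit this.
It should be understood that, in the embodiments of the present application, the resource occupancy indicator may represent resource consumption on a network communication device.
It should be understood that different devices may have different resource occupancy indicators. As one example, the resource occupancy indicator may be "CPU peak occupancy". As another example, it may be "memory occupancy". As yet another example, it may be "License occupancy". The embodiments of the present application do not specifically limit this.
In the embodiments of the present application, the multiple network devices may be network devices whose indicator relationship between the user count indicator and the traffic volume indicator is consistent with that of the target device.
It should be understood that this means the relationship between the user count indicator and the traffic volume indicator on the other network devices is the same as, or follows roughly the same trend as, the relationship on the target device.
If this indicator relationship on the other network devices is consistent with that on the target device, the user count and traffic volume indicators of multiple network devices (the target device and the other network devices) can be collected, which widens the diversity of the user count samples. Because the relationship is the same or follows roughly the same trend across these devices, a prediction model trained on the user count and traffic volume indicators collected from all of them applies to the target device and to the other network devices, and the trained model can accurately predict the resource occupancy rates of the target device and the other network devices.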
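For example, network elements ATS0 and ATS1 are communication devices of the same type. A communication device may have a hierarchical structure: network element ATS0 can be decomposed into modules VCU0, VCU1, and DPU0 (the service load of ATS0 may be evenly distributed among these three modules), and network element ATS1 can likewise be decomposed into modules VCU0, VCU1, and DPU0 (with its service load also evenly distributed among them). The user count indicator (for example, "registered user count") and the traffic volume indicator (for example, "total occupied traffic") may correspond to the network elements (ATS0, ATS1), while the resource occupancy indicator (for example, "CPU peak occupancy") may correspond to the modules under the network elements (VCU0, VCU1, DPU0). Because ATS0 and ATS1 are of the same type, both the "registered user count" and "total occupied traffic" indicators correspond to the network elements, and both indicators are defined identically on ATS0 and ATS1, the correlation between "registered user count" and "total occupied traffic" has cross-device-combination generality between ATS0 and ATS1 (that is, the indicator relationship between user count and traffic volume is consistent on ATS0 and ATS1).
In the foregoing technical solution, the first prediction model can be trained on first training data collected from multiple devices (the target device and other network devices), which widens the diversity of the historical samples of the first indicator data. The second prediction model can then be trained on second training data collected from the target device, so that the functional relationship between the first indicator data and the third indicator data is reflected more accurately.
With reference to the first aspect, in a possible implementation, the method further includes: obtaining the first training data; and obtaining the first prediction model from the first training data.
It should be understood that the first prediction model is used to predict the second indicator data of the target device from the first indicator data of the target device.
The embodiments of the present application do not specifically limit how the first prediction model is trained from the first training data. As one example, regression analysis may be performed on the first training data to obtain the first prediction model. As another example, regression analysis through the origin may be performed on the first training data to obtain the first prediction model.
The embodiments of the present application do not specifically limit how the second prediction model is trained from the second training data. As one example, regression analysis may be performed on the second training data to obtain the second prediction model. As another example, regression analysis through the origin may be performed on the second training data. As yet another example, quantile regression analysis may be performed on the second training data to obtain the second prediction model.
In the embodiments of the present application, feature processing may also be performed on the first, second, and third indicator data (for example, the user count, traffic volume, and resource occupancy indicators) before regression analysis. As one example, the indicator data may be standardized. As another example, the indicator data may be normalized. As yet another example, dimensionality reduction may be performed on the indicator data.
In the foregoing technical solution, the first indicator data for prediction can be input into the first prediction model to obtain the prediction result of the target device.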
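With reference to the first aspect, in a possible implementation, principal component analysis is performed on the first indicator data in the first training data to obtain a principal component analysis model; dimensionality reduction is performed on the first training data according to the principal component analysis model to obtain dimension-reduced third training data; and the first prediction model is trained on the third training data.
There are multiple ways to reduce the dimensionality of the first training data in the embodiments of the present application, and the present application does not specifically limit this. As one example, principal component analysis may be performed on the first training data; for example, principal component analysis may be performed on the second indicator data in the first training data to obtain a principal component analysis model, according to which dimensionality reduction is performed on the first training data to obtain dimension-reduced third training data. As another example, a low variance filter may be applied to the first training data to reduce its dimensionality. As yet another example, backward feature elimination may be applied to the first training data to reduce its dimensionality.
It should be understood that principal component analysis is a statistical method that uses an orthogonal transformation to convert a set of possibly correlated second indicator data into a set of linearly uncorrelated variables; the transformed variables may be called the principal components of the second indicator data.
In the foregoing technical solution, dimensionality reduction is performed on the first indicator data, which avoids the computational difficulty, caused by collinearity of the first indicator data, that may be encountered when regression analysis is performed on the second training data.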
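The principal component step can be sketched as follows. This is a minimal, dependency-free illustration, assuming the goal is to collapse several highly correlated traffic indicators (for example, several packet counts) into a single component; the function names and sample values are hypothetical, and a production implementation would more likely rely on a numerical library.

```python
import math
from typing import List, Tuple

def first_principal_component(rows: List[List[float]],
                              iters: int = 200) -> Tuple[List[float], List[float], float]:
    """Return (column means, unit principal axis, explained-variance ratio).

    Uses power iteration on the sample covariance matrix; assumes the rows
    are not all identical (otherwise the covariance matrix is zero).
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0 + 0.1 * j for j in range(d)]  # arbitrary non-zero starting vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    total_variance = sum(cov[i][i] for i in range(d))
    return means, v, eigenvalue / total_variance

def project(row: List[float], means: List[float], axis: List[float]) -> float:
    """Reduce one sample to its score on the principal axis."""
    return sum((x - m) * a for x, m, a in zip(row, means, axis))

# Hypothetical, perfectly collinear traffic indicators for four samples.
traffic_rows = [[t, 2.0 * t, 3.0 * t] for t in (10.0, 20.0, 30.0, 40.0)]
means, axis, ratio = first_principal_component(traffic_rows)
scores = [project(r, means, axis) for r in traffic_rows]  # one feature per sample
```

Here one component explains all of the variance, so the three collinear columns can be replaced by a single score before regression, which is the collinearity problem the paragraph above refers to.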
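With reference to the first aspect, in a possible implementation, the first prediction model is obtained by performing regression analysis on the first training data.
In the embodiments of the present application, there are many ways to train the first prediction model from the first training data, and the present application does not specifically limit this. As one example, the first prediction model may be obtained by performing regression analysis on the first training data. As another example, it may be obtained by performing regression analysis through the origin on the first training data.
In the foregoing technical solution, the first prediction model can be trained by regression analysis, which can accurately measure the degree of correlation between factors and the goodness of fit, and is simple to compute and easy to implement.
With reference to the first aspect, in a possible implementation, when the diversity of the first training data does not meet a preset condition, regression analysis through the origin is performed on the first training data; when the diversity of the first training data meets the preset condition, regression analysis not through the origin is performed on the first training data.
In the embodiments of the present application, the data diversity of the first indicator data may be evaluated. If the data diversity of the first indicator data in the first training data does not meet the preset condition, regression analysis constrained to pass through the coordinate origin may be performed on the first indicator data and the second indicator data in the first training data. If the data diversity of the first indicator data in the first training data meets the preset condition, regression analysis not constrained to pass through the coordinate origin may be performed on the first indicator data and the second indicator data in the first training data.
It should be understood that the preset condition mentioned above may be a preset threshold: if the data diversity of the first indicator data reaches the preset threshold, the data diversity of the first indicator data meets the preset condition.
It should be understood that both regression constrained through the coordinate origin and regression not so constrained are forms of regression analysis. A model obtained by regression constrained through the origin has no constant term, whereas a model obtained by unconstrained regression may have a constant term.
It should also be understood that, in some embodiments, some feature processing may be performed on the first indicator data before its diversity is evaluated; the present application does not specifically limit this. As one example, the first indicator data may be normalized, and the diversity of the normalized first indicator data is then evaluated. As another example, the first indicator data may be standardized, and the diversity of the standardized data is then evaluated. As yet another example, dimensionality reduction may be performed on the first indicator data, and the diversity of the dimension-reduced data is then evaluated.
In the foregoing technical solution, when the diversity of the first indicator data in the first training data does not meet the preset condition, regression constrained through the coordinate origin is performed on the first indicator data and the second indicator data, which avoids inaccurate model extrapolation that may be caused by insufficient diversity of the first indicator data in the data set. When the diversity meets the preset condition, regression not constrained through the origin is used, making full use of the information provided by the data set to obtain a more accurate model.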
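The choice between the two regression forms can be sketched as follows. This is a minimal illustration under stated assumptions: the diversity measure (relative spread, (max - min) / max) and the threshold 0.5 are hypothetical stand-ins for the unspecified "preset condition", and the sample values are invented for the example.

```python
from statistics import mean
from typing import List, Tuple

def fit_with_intercept(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Least squares y = a*x + b, not constrained through the origin."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx

def fit_through_origin(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Least squares y = a*x constrained through the origin (no constant term)."""
    a = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    return a, 0.0

def relative_spread(x: List[float]) -> float:
    """Hypothetical diversity measure of the first indicator: (max - min) / max."""
    return (max(x) - min(x)) / max(x)

def fit_first_model(x: List[float], y: List[float],
                    threshold: float = 0.5) -> Tuple[float, float]:
    """Pick the regression form from the diversity of x (threshold is hypothetical)."""
    if relative_spread(x) >= threshold:   # diverse enough: keep the constant term
        return fit_with_intercept(x, y)
    return fit_through_origin(x, y)        # narrow range: anchor the line at (0, 0)

# Hypothetical peak-hour samples with a narrow user-count range.
users = [950.0, 1000.0, 1050.0]
traffic = [470.0, 500.0, 540.0]
slope, intercept = fit_first_model(users, traffic)   # through-origin fit is chosen
```

On this narrow sample, an unconstrained fit would give a slope of 0.7 and an intercept of about -196.7, that is, negative traffic at zero users, which illustrates the extrapolation risk that the through-origin constraint is meant to avoid.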
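With reference to the first aspect, in a possible implementation, the second prediction model is obtained by performing regression analysis on the second training data.
In the embodiments of the present application, regression analysis through the origin may be performed on the second training data to train the second prediction model, or regression analysis not through the coordinate origin may be performed on the second training data to train the second prediction model. The present application does not specifically limit this.
For the specific method of performing regression analysis on the second training data, refer to the description for the first training data; details are not repeated here.
With reference to the first aspect, in a possible implementation, the second prediction model is obtained by performing quantile regression analysis on the second training data.
It should be understood that quantile regression is one method of regression analysis. A quantile is a value that divides the distribution range of a random variable according to a given probability proportion, and quantile regression can predict the upper or lower bound of an indicator. As an example, a quantile parameter of 0.1 divides the distribution range of a variable into two parts such that the probability that the variable is below the 0.1 quantile is 0.1. For example, to predict a lower bound of the third indicator data, a low quantile such as 0.1 or 0.2 may be chosen.
The embodiments of the present application do not specifically limit the method of performing quantile regression on the second indicator data and the third indicator data in the second training data. As one example, linear quantile regression may be used. As another example, nonlinear quantile regression may be used.
In the foregoing technical solution, the second prediction model is built by quantile regression analysis, so it can predict the upper and lower bounds of the third indicator data rather than its mean, which meets the application's concern with boundary values.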
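Linear quantile regression with a single predictor can be sketched as follows. This minimal sketch uses an exact but brute-force approach (trying every line through two observations, which is valid because the underlying linear program has an optimal vertex of that form) rather than any particular library; the function names and the traffic and CPU values are hypothetical, and the O(n^3) cost limits it to small examples.

```python
from itertools import combinations
from typing import List, Tuple

def pinball_loss(residual: float, tau: float) -> float:
    """Check (pinball) loss for quantile level tau in (0, 1)."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual

def fit_quantile_line(x: List[float], y: List[float], tau: float) -> Tuple[float, float]:
    """Exact linear quantile regression y ~ a*x + b with a single predictor.

    Some optimal line passes through two observations with distinct x values,
    so scanning all such pairs finds a minimiser of the total pinball loss.
    """
    best: Tuple[float, float] = (0.0, 0.0)
    best_loss = float("inf")
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical pair does not define a line y = a*x + b
        a = (y[j] - y[i]) / (x[j] - x[i])
        b = y[i] - a * x[i]
        loss = sum(pinball_loss(yk - (a * xk + b), tau) for xk, yk in zip(x, y))
        if loss < best_loss:
            best, best_loss = (a, b), loss
    return best

# Hypothetical samples: traffic volume (x) vs CPU peak occupancy in % (y).
traffic = [100.0, 200.0, 300.0, 400.0, 500.0, 600.0, 700.0, 800.0, 900.0, 1000.0]
cpu = [12.0, 25.0, 31.0, 44.0, 52.0, 58.0, 71.0, 77.0, 90.0, 96.0]

upper = fit_quantile_line(traffic, cpu, tau=0.9)   # upper-bound model
lower = fit_quantile_line(traffic, cpu, tau=0.1)   # lower-bound model
```

At the optimum, at most a fraction tau of the samples lie strictly below the fitted line, so the tau = 0.9 line behaves as an upper bound and the tau = 0.1 line as a lower bound.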
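According to a second aspect, a training method is provided, including: obtaining first training data and second training data; obtaining a first prediction model from the first training data; and obtaining a second prediction model from the second training data.
Optionally, in some embodiments, the first indicator data may be the user count, the second indicator data may be the traffic volume, and the third indicator data may be the resource occupancy rate.
With reference to the second aspect, in a possible implementation, first indicator data of a target device for prediction is obtained; the first indicator data for prediction is input into the first prediction model to obtain predicted second indicator data of the target device; and the predicted second indicator data is input into the second prediction model to obtain a prediction result for the target device.
For the specific method of training the first prediction model, refer to the description of the prediction method in the first aspect; details are not repeated here.
With reference to the second aspect, in a possible implementation, principal component analysis is performed on the second indicator data in the first training data to obtain a principal component analysis model; dimensionality reduction is performed on the first training data according to the principal component analysis model to obtain dimension-reduced third training data; and the first prediction model is trained on the third training data.
With reference to the second aspect, in a possible implementation, the first prediction model is obtained by performing regression analysis on the first training data.
With reference to the second aspect, in a possible implementation, when the diversity of the first training data does not meet a preset condition, regression analysis through the origin is performed on the first training data; when the diversity of the first training data meets the preset condition, regression analysis not through the origin is performed on the first training data.
With reference to the second aspect, in a possible implementation, second training data of the target device is obtained, and a second prediction model of the target device is trained on the second training data.
In the embodiments of the present application, before the predicted third indicator data is obtained from the first indicator data for prediction through the first prediction model and the second prediction model, the second prediction model may be trained on the obtained second training data.
For the specific method of training the second prediction model, refer to the description of the training method in the first aspect; details are not repeated here.
With reference to the second aspect, in a possible implementation, the second prediction model is obtained by performing regression analysis on the second training data.
With reference to the second aspect, in a possible implementation, the second prediction model is obtained by performing quantile regression analysis on the second training data.
With reference to the second aspect, in a possible implementation, the indicator relationship between the first indicator data and the second indicator data is consistent between the target device and the other network devices.
That is, if the indicator relationship between the first indicator data and the second indicator data is consistent across multiple devices, the first and second indicator data of those devices can be obtained, and a prediction model trained on the first and second indicator data of those devices is applicable to all of them.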
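According to a third aspect, a method for modeling the numerical relationship between a tenant quantity indicator and a resource consumption indicator is provided, including: performing a first regression analysis on a first data set that describes the numerical relationship between features of the tenant quantity indicator and features of a service usage indicator, to obtain a first prediction model; and performing a second regression analysis on a second data set that describes the numerical relationship between features of the service usage indicator and features of the resource consumption indicator, to obtain a second prediction model. Each data sample in the first data set corresponds to the value of the tenant quantity indicator and the value of the service usage indicator of a device combination under a certain condition. The raw values of the tenant quantity indicator of certain devices in the device combination are either used directly as the tenant quantity features of the data sample, or input into a first feature processing whose output is used as those features. The raw values of the service usage indicator of certain devices in the device combination are either used directly as the service usage features of the data sample, or input into a second feature processing whose output is used as those features.
In the first data set, the set of device combinations corresponding to all the data samples is not a singleton: there is at least one pair of data samples and at least one tenant quantity indicator such that the raw values of that indicator for the two samples are taken from two different devices.
Each data sample in the second data set corresponds to the value of the service usage indicator and the value of the resource consumption indicator of a device combination under a certain condition. The raw values of the service usage indicator of certain devices in the device combination are either used directly as the service usage features of the data sample, or input into the second feature processing whose output is used as those features. The raw values of the resource consumption indicator of certain devices in the device combination are either used directly as the resource consumption features of the data sample, or input into a third feature processing whose output is used as those features.
With reference to the third aspect, in a possible implementation, the service usage indicator is determined according to the tenant quantity indicator and the service usage indicator.
With reference to the third aspect, in a possible implementation, in the first data set, the load distribution relationship between the devices that provide the raw values of the tenant quantity indicator and the devices that provide the raw values of the service usage indicator is similar across different data samples.
With reference to the third aspect, in a possible implementation, when the input value of the second feature processing is all zeros or approximately all zeros, its output value is all zeros or approximately all zeros.
With reference to the third aspect, in a possible implementation, the second feature processing includes a first translation transform, and the first translation transform is determined as follows:
performing part of the second feature processing on an all-zero or approximately all-zero input value, and determining the first translation transform according to the output value of that partial processing.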
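The translation transform can be sketched as follows. This minimal illustration assumes, purely for the example, that the "partial processing" is a z-score standardization (the application does not name it) and uses a one-dimensional input for brevity; the helper names and traffic values are hypothetical. Standardization alone maps a zero input to -mean/std, so subtracting that output restores the zero-to-zero property on which a regression constrained through the origin relies.

```python
from statistics import mean, pstdev
from typing import Callable, List

def make_standardizer(samples: List[float]) -> Callable[[float], float]:
    """Partial feature processing (assumed here): z-score with training statistics."""
    mu, sigma = mean(samples), pstdev(samples)
    return lambda v: (v - mu) / sigma

def add_translation(partial: Callable[[float], float]) -> Callable[[float], float]:
    """Append a translation so that a zero input is mapped to a zero feature."""
    offset = partial(0.0)              # output of the partial processing on zero input
    return lambda v: partial(v) - offset

traffic_samples = [200.0, 300.0, 400.0]   # hypothetical traffic values
standardize = make_standardizer(traffic_samples)
feature = add_translation(standardize)
print(standardize(0.0), feature(0.0))  # standardization alone does not map 0 to 0
```

With the translation in place, the feature is simply the input divided by the standard deviation, so "zero traffic" stays at zero in feature space.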
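With reference to the third aspect, in a possible implementation, the first regression analysis includes: performing, on the first data set, regression analysis constrained through the origin between the features of the tenant quantity indicator and the features of the service usage indicator.
With reference to the third aspect, in a possible implementation, when the diversity of the tenant quantity indicator on the first data set does not meet a preset condition, regression analysis constrained through the origin is performed on the first data set between the features of the tenant quantity indicator and the features of the service usage indicator, to obtain the first prediction model.
With reference to the third aspect, in a possible implementation, the first regression analysis includes: when the diversity of the tenant quantity indicator on the first data set meets the preset condition, performing, on the first data set, regression analysis not constrained through the origin between the features of the tenant quantity indicator and the features of the service usage indicator, to obtain the first prediction model.
With reference to the third aspect, in a possible implementation, the second feature processing includes: performing a first dimensionality-reduction mapping on certain service usage indicators of the devices on the first data set, to obtain the features of the service usage indicator.
With reference to the third aspect, in a possible implementation, the first dimensionality-reduction mapping includes: performing feature processing through a service usage principal component model, where the service usage principal component model is determined as follows:
performing principal component analysis on a third data set that describes the numerical relationship between features of certain service usage indicators, to obtain the service usage principal component model.
With reference to the third aspect, in a possible implementation, when the input value of the first feature processing is all zeros or approximately all zeros, its output value is all zeros or approximately all zeros.
With reference to the third aspect, in a possible implementation, the first feature processing includes a second translation transform, and the second translation transform is determined as follows: performing part of the first feature processing on an all-zero or approximately all-zero input value, and determining the second translation transform according to the output value of that partial processing.
With reference to the third aspect, in a possible implementation, the first feature processing includes: performing a second dimensionality-reduction mapping on certain tenant quantity indicators of the devices on the first data set, to obtain the features of the tenant quantity indicator.
With reference to the third aspect, in a possible implementation, the second dimensionality-reduction mapping includes: performing feature processing through a tenant quantity principal component model, where the tenant quantity principal component model is determined as follows:
performing principal component analysis on a fourth data set that describes the numerical relationship between features of tenant quantity indicators, to obtain the tenant quantity principal component model.
With reference to the third aspect, in a possible implementation, the second regression analysis includes: performing, on the second data set, quantile regression analysis between the features of the service usage indicator and the features of the resource consumption indicator, to obtain the second prediction model.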
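According to a fourth aspect, an apparatus for modeling the numerical relationship between a tenant quantity indicator and a resource consumption indicator is provided, including:
a first processing module, configured to perform a first regression analysis on a first data set that describes the numerical relationship between features of the tenant quantity indicator and features of a service usage indicator, to obtain a first prediction model; and
a second processing module, configured to perform a second regression analysis on a second data set that describes the numerical relationship between features of the service usage indicator and features of the resource consumption indicator, to obtain a second prediction model. Each data sample in the first data set corresponds to the value of the tenant quantity indicator and the value of the service usage indicator of a device combination under a certain condition. The raw values of the tenant quantity indicator of certain devices in the device combination are either used directly as the tenant quantity features of the data sample, or input into a first feature processing whose output is used as those features. The raw values of the service usage indicator of certain devices in the device combination are either used directly as the service usage features of the data sample, or input into a second feature processing whose output is used as those features.
In the first data set, the set of device combinations corresponding to all the data samples is not a singleton: there is at least one pair of data samples and at least one tenant quantity indicator such that the raw values of that indicator for the two samples are taken from two different devices.
Each data sample in the second data set corresponds to the value of the service usage indicator and the value of the resource consumption indicator of a device combination under a certain condition. The raw values of the service usage indicator of certain devices in the device combination are either used directly as the service usage features of the data sample, or input into the second feature processing whose output is used as those features. The raw values of the resource consumption indicator of certain devices in the device combination are either used directly as the resource consumption features of the data sample, or input into a third feature processing whose output is used as those features.
With reference to the fourth aspect, in a possible implementation, the service usage indicator is determined according to the tenant quantity indicator and the service usage indicator.
With reference to the fourth aspect, in a possible implementation, in the first data set, the load distribution relationship between the devices that provide the raw values of the tenant quantity indicator and the devices that provide the raw values of the service usage indicator is similar across different data samples.
With reference to the fourth aspect, in a possible implementation, when the input value of the second feature processing is all zeros or approximately all zeros, its output value is all zeros or approximately all zeros.
With reference to the fourth aspect, in a possible implementation, the second feature processing includes a first translation transform, and the first translation transform is determined as follows:
performing part of the second feature processing on an all-zero or approximately all-zero input value, and determining the first translation transform according to the output value of that partial processing.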
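With reference to the fourth aspect, in a possible implementation, the first regression analysis includes: performing, on the first data set, regression analysis constrained through the origin between the features of the tenant quantity indicator and the features of the service usage indicator.
With reference to the fourth aspect, in a possible implementation, the first processing module is specifically configured to: when the diversity of the tenant quantity indicator on the first data set does not meet a preset condition, perform, on the first data set, regression analysis constrained through the origin between the features of the tenant quantity indicator and the features of the service usage indicator, to obtain the first prediction model.
With reference to the fourth aspect, in a possible implementation, the first regression analysis includes: when the diversity of the tenant quantity indicator on the first data set meets the preset condition, performing, on the first data set, regression analysis not constrained through the origin between the features of the tenant quantity indicator and the features of the service usage indicator, to obtain the first prediction model.
With reference to the fourth aspect, in a possible implementation, the second feature processing includes: performing a first dimensionality-reduction mapping on certain service usage indicators of the devices on the first data set, to obtain the features of the service usage indicator.
With reference to the fourth aspect, in a possible implementation, the first dimensionality-reduction mapping includes: performing feature processing through a service usage principal component model, where the service usage principal component model is determined as follows:
performing principal component analysis on a third data set that describes the numerical relationship between features of certain service usage indicators, to obtain the service usage principal component model.
With reference to the fourth aspect, in a possible implementation, when the input value of the first feature processing is all zeros or approximately all zeros, its output value is all zeros or approximately all zeros.
With reference to the fourth aspect, in a possible implementation, the first feature processing includes a second translation transform, and the second translation transform is determined as follows: performing part of the first feature processing on an all-zero or approximately all-zero input value, and determining the second translation transform according to the output value of that partial processing.
With reference to the fourth aspect, in a possible implementation, the first feature processing includes: performing a second dimensionality-reduction mapping on certain tenant quantity indicators of the devices on the first data set, to obtain the features of the tenant quantity indicator.
With reference to the fourth aspect, in a possible implementation, the second dimensionality-reduction mapping includes: performing feature processing through a tenant quantity principal component model, where the tenant quantity principal component model is determined as follows:
performing principal component analysis on a fourth data set that describes the numerical relationship between features of tenant quantity indicators, to obtain the tenant quantity principal component model.
With reference to the fourth aspect, in a possible implementation, the second regression analysis includes: performing, on the second data set, quantile regression analysis between the features of the service usage indicator and the features of the resource consumption indicator, to obtain the second prediction model.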
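According to a fifth aspect, a training apparatus is provided, including: a first obtaining module, configured to obtain first training data and second training data; a first training module, configured to train a first prediction model on the first training data; and a second training module, configured to train a second prediction model on the second training data.
It should be understood that the first training data includes the first indicator data and the second indicator data of multiple network devices (for example, the target device and other network devices), and the second training data includes the second indicator data and the third indicator data of the target device.
The first prediction model indicates the mapping between the first indicator data and the second indicator data of the target device. The second prediction model indicates the mapping between the second indicator data and the third indicator data of the target device.
With reference to the fifth aspect, in a possible implementation, the apparatus further includes: a second obtaining module, configured to obtain first indicator data of the target device for prediction; a first determining module, configured to input the first indicator data for prediction into the prediction model to obtain predicted second indicator data of the target device; and a second determining module, configured to input the predicted second indicator data into the second prediction model to obtain a prediction result for the target device.
It should be understood that the prediction model includes the first prediction model and the second prediction model; the first prediction model is trained on the first training data, and the second prediction model is trained on the second training data.
With reference to the fifth aspect, in a possible implementation, the first training module is specifically configured to: perform principal component analysis on the second indicator data in the first training data to obtain a principal component analysis model; perform dimensionality reduction on the first training data according to the principal component analysis model to obtain dimension-reduced third training data; and train the first prediction model on the third training data.
With reference to the fifth aspect, in a possible implementation, the first training module is specifically configured to obtain the first prediction model by performing regression analysis on the first training data.
With reference to the fifth aspect, in a possible implementation, the first training module is specifically configured to: perform regression analysis through the origin on the first training data when the diversity of the first training data does not meet a preset condition, and perform regression analysis not through the origin on the first training data when the diversity meets the preset condition.
With reference to the fifth aspect, in a possible implementation, the second training module is specifically configured to obtain the second prediction model by performing regression analysis on the second training data.
With reference to the fifth aspect, in a possible implementation, the second training module is specifically configured to obtain the second prediction model by performing quantile regression analysis on the second training data.
With reference to the fifth aspect, in a possible implementation, the indicator relationship between the first indicator data and the second indicator data is consistent between the target device and the other network devices.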
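According to a sixth aspect, a prediction apparatus is provided, including: a first obtaining module, configured to obtain first indicator data of a target device for prediction; a first determining module, configured to input the first indicator data for prediction into a first prediction model to obtain predicted second indicator data of the target device; and a second determining module, configured to input the predicted second indicator data into a second prediction model to obtain a prediction result for the target device.
It should be understood that the first prediction model is trained on first training data and the second prediction model on second training data; the first training data includes first indicator data and second indicator data of multiple devices, the second training data includes second indicator data and third indicator data of the target device, and the multiple devices include the target device.
With reference to the sixth aspect, in a possible implementation, the apparatus further includes: a second obtaining module, configured to obtain the first training data; and a first training module, configured to train the first prediction model on the first training data.
It should be understood that the first prediction model is used to predict the second indicator data of the target device from the first indicator data of the target device.
With reference to the sixth aspect, in a possible implementation, the first training module is specifically configured to: perform principal component analysis on the second indicator data in the first training data to obtain a principal component analysis model; perform dimensionality reduction on the first training data according to the principal component analysis model to obtain dimension-reduced third training data; and train the first prediction model on the third training data.
With reference to the sixth aspect, in a possible implementation, the first training module is specifically configured to obtain the first prediction model by performing regression analysis on the first training data.
With reference to the sixth aspect, in a possible implementation, the first training module is specifically configured to: perform regression analysis through the origin on the first training data when the diversity of the first training data does not meet a preset condition, and perform regression analysis not through the origin on the first training data when the diversity meets the preset condition.
With reference to the sixth aspect, in a possible implementation, the apparatus further includes: a third obtaining module, configured to obtain second training data of the target device, the second training data including the second indicator data and the third indicator data of the target device; and a second training module, configured to train the second prediction model of the target device on the second training data, the second prediction model indicating the mapping between the second indicator data and the third indicator data of the target device.
With reference to the sixth aspect, in a possible implementation, the second training module is specifically configured to obtain the second prediction model by performing regression analysis on the second training data.
With reference to the sixth aspect, in a possible implementation, the second training module is specifically configured to obtain the second prediction model by performing quantile regression analysis on the second training data.
With reference to the sixth aspect, in a possible implementation, the indicator relationship between the first indicator data and the second indicator data is consistent between the target device and the other network devices.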
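According to a seventh aspect, a training apparatus is provided, including a memory and a processor. The memory is configured to store a program, and the processor is configured to execute the program stored in the memory; when the program is executed, the processor performs the method in the second aspect or any implementation of the second aspect.
According to an eighth aspect, a prediction apparatus is provided, including a memory and a processor. The memory is configured to store a program, and the processor is configured to execute the program stored in the memory; when the program is executed, the processor performs the method in the first aspect or any implementation of the first aspect.
According to a ninth aspect, an apparatus for modeling the numerical relationship between a tenant quantity indicator and a resource consumption indicator is provided, including a memory and a processor. The memory is configured to store a program, and the processor is configured to execute the program stored in the memory; when the program is executed, the processor performs the method in the third aspect or any implementation of the third aspect.
According to a tenth aspect, a computer-readable storage medium is provided, including computer instructions; when the computer instructions are run on the training apparatus, the training apparatus is caused to perform the method in the second aspect or any implementation of the second aspect.
According to an eleventh aspect, a computer-readable storage medium is provided, including computer instructions; when the computer instructions are run on the prediction apparatus, the prediction apparatus is caused to perform the method in the first aspect or any implementation of the first aspect.
According to a twelfth aspect, a computer-readable storage medium is provided, including computer instructions; when the computer instructions are run on the apparatus for modeling the numerical relationship between the tenant quantity indicator and the resource consumption indicator, the apparatus is caused to perform the method in the third aspect or any implementation of the third aspect.
According to a thirteenth aspect, a chip is provided, including a memory and a processor. The memory is configured to store a program, and the processor is configured to execute the program stored in the memory; when the program is executed, the processor performs the method in the first aspect or any implementation of the first aspect.
According to a fourteenth aspect, a chip is provided, including a memory and a processor. The memory is configured to store a program, and the processor is configured to execute the program stored in the memory; when the program is executed, the processor performs the method in the second aspect or any implementation of the second aspect.
According to a fifteenth aspect, a computer program product is provided; when the computer program product is run on a computer, the computer is caused to perform the method in the first aspect or any implementation of the first aspect.
According to a sixteenth aspect, a computer program product is provided; when the computer program product is run on a computer, the computer is caused to perform the method in the second aspect or any implementation of the second aspect.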
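Brief Description of Drawings
FIG. 1 is a schematic flowchart of a prediction method according to an embodiment of the present application.
FIG. 2 is a schematic flowchart of a possible scenario of cross-device-combination generality between indicators according to an embodiment of the present application.
FIG. 3 is a schematic flowchart of another possible scenario of cross-device-combination generality between indicators according to another embodiment of the present application.
FIG. 4 is a schematic flowchart of training a first prediction model and a second prediction model according to another embodiment of the present application.
FIG. 5 is a schematic flowchart of training a first prediction model and a second prediction model according to another embodiment of the present application.
FIG. 6 is a schematic flowchart of training a first prediction model and a second prediction model according to another embodiment of the present application.
FIG. 7 is a schematic flowchart of training a first prediction model and a second prediction model according to another embodiment of the present application.
FIG. 8 is a schematic flowchart of training a first prediction model and a second prediction model according to another embodiment of the present application.
FIG. 9 is a schematic flowchart of training a first prediction model and a second prediction model according to another embodiment of the present application.
FIG. 10 is a schematic structural diagram of a training apparatus 1000 according to an embodiment of the present application.
FIG. 11 is a schematic structural diagram of a prediction apparatus 1100 according to an embodiment of the present application.
FIG. 12 is a schematic structural diagram of a training apparatus 1200 according to an embodiment of the present application.
FIG. 13 is a schematic structural diagram of a prediction apparatus 1300 according to an embodiment of the present application.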
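Detailed Description
The following describes the technical solutions of the present application with reference to the accompanying drawings.
The present application does not specifically limit the scenarios in which second indicator data is predicted from first indicator data and a prediction model. The solution can be applied to various communication network devices and to various computer devices, for example, computer devices in a data operations center.
An embodiment of the present application provides a prediction method that can accurately predict the prediction result (third indicator data) of a target device from the first indicator data of the target device for prediction. The following describes this embodiment in detail with reference to FIG. 1.
FIG. 1 is a schematic flowchart of a prediction method according to an embodiment of the present application. The method of FIG. 1 may include steps 110 to 130, which are described in detail below.
In step 110, first indicator data of the target device for prediction is obtained.
In the embodiments of the present application, the target device may also be called the device to be modeled.
The present application does not specifically limit the target device and/or network device (also called a communication network device) mentioned herein; it may include, but is not limited to, any subnet in a network, a network element, a sub-device of a network element (for example, a board), or a functional unit of a network element (for example, a module). For example, the communication network device may include, but is not limited to, a network adapter, a network transceiver, a network media converter, a multiplexer, a repeater, a hub, a bridge, a switch, a router, a gateway, and the like.
The embodiments of the present application do not specifically limit the type of the device (also called a network communication device); it may be any communication network device. For example, the device may be an advanced telephony server (ATS), or it may be a unified packet gateway (UGW).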
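In step 120, the first indicator data for prediction is input into the first prediction model to obtain predicted second indicator data of the target device.
The embodiments of the present application do not specifically limit the prediction model. As one example, it may be a single prediction model. As another example, it may be two prediction models; for example, the prediction model may include a first prediction model and a second prediction model.
It should be understood that the first prediction model is trained on the first training data, and the second prediction model is trained on the second training data.
In the embodiments of the present application, the first training data may include first indicator data and second indicator data of multiple devices.
It should be understood that the first indicator data and the second indicator data may come from multiple network devices including the target device.
The embodiments of the present application do not specifically limit the first indicator data and the second indicator data; they may be two kinds of indicator data that are positively correlated, or two kinds that are negatively correlated.
In step 130, the predicted second indicator data is input into the second prediction model to obtain the prediction result of the target device.
In the embodiments of the present application, the second prediction model may be trained on second training data, which may include second indicator data and third indicator data of the target device.
It should be understood that the second indicator data and the third indicator data may come from the target device, or from multiple network devices including the target device; the present application does not specifically limit this.
In the embodiments of the present application, the predicted second indicator data output by the first prediction model can be input into the second prediction model to obtain the prediction result of the target device, that is, the predicted third indicator data output by the second prediction model.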
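Steps 110 to 130 can be sketched as two chained models. This minimal sketch assumes, for illustration only, that both models are simple linear functions; the class and function names and the coefficients are hypothetical placeholders, not values from the application.

```python
from dataclasses import dataclass

@dataclass
class LinearModel:
    """y = slope * x + intercept (intercept = 0 for a through-origin fit)."""
    slope: float
    intercept: float = 0.0

    def predict(self, x: float) -> float:
        return self.slope * x + self.intercept

def predict_resource_occupancy(users: float,
                               first_model: LinearModel,
                               second_model: LinearModel) -> float:
    """Chain the models: user count -> traffic volume -> resource occupancy."""
    predicted_traffic = first_model.predict(users)     # step 120
    return second_model.predict(predicted_traffic)     # step 130

# Hypothetical coefficients, for illustration only.
users_to_traffic = LinearModel(slope=0.5)                  # trained on pooled devices
traffic_to_cpu = LinearModel(slope=0.002, intercept=0.05)  # trained on the target device

print(predict_resource_occupancy(400.0, users_to_traffic, traffic_to_cpu))
```

Keeping the two stages separate mirrors the training split described above: the first model can use data pooled across devices with a consistent user-traffic relationship, while the second is specific to the target device.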
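Optionally, in some embodiments, the first indicator data may be the user count, the second indicator data may be the traffic volume, and the third indicator data may be the resource occupancy rate.
The following describes the solution in detail using an example in which the first indicator data is the user count and the second indicator data is the traffic volume.
It should be understood that, in the embodiments of the present application, the user count indicator may represent the number of users of a service on a network communication device.
It should be understood that one network communication device may have multiple user count indicators. As one example, the user count may be expressed as a "2G+3G user count" indicator. As another example, it may be a "4G user count" indicator. As yet another example, it may be a "registered user count" indicator. The embodiments of the present application do not specifically limit this.
In the embodiments of the present application, the traffic volume indicator may be understood as the amount of a service used on a network communication device.
It should be understood that one device may have multiple traffic volume indicators. As one example, the traffic volume of a communication network device may represent the "total occupied traffic" indicator of the network device. As another example, it may represent the "Gi interface packet count" indicator of the network device. As yet another example, it may represent the "SGi user-plane packet count" indicator of the network device. The embodiments of the present application do not specifically limit this.
It should be understood that the "traffic volume" indicator may be related to the "user count" indicator (also simply called the user count), how frequently users communicate, and how long their communications last. The more users there are per unit time and the longer their communications last, the larger the "traffic volume" indicator.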
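One common way to make this dependence concrete, offered here only as an illustration and not as the definition the application uses for the "total occupied traffic" indicator, is traffic intensity in Erlangs. With N users, an average of λ call attempts per user per unit time, and a mean holding time h expressed in the same time unit:

```latex
A = N \cdot \lambda \cdot h
```

For example, 1000 users each making 2 calls per hour with a 3-minute (0.05 h) average duration give A = 1000 × 2 × 0.05 = 100 Erlangs; more users or longer calls both raise the traffic volume.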
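The consistency of the indicator relationship between the user count indicator and the traffic volume indicator across multiple devices, mentioned above, means that this relationship is the same, or follows roughly the same trend, across the devices.
This consistency across multiple devices is described in detail below with reference to FIG. 2 and FIG. 3, and is not detailed here.
In the embodiments of the present application, the resource occupancy indicator may represent resource consumption on a network communication device. The resource occupancy rate may be the occupancy corresponding to a given user count; for example, a certain number of users corresponds to a CPU occupancy of 80%.
It should be understood that different devices may have different resource occupancy indicators. As one example, the resource occupancy indicator may be "CPU peak occupancy". As another example, it may be "memory occupancy". As yet another example, it may be "License occupancy". The embodiments of the present application do not specifically limit this.
In the embodiments of the present application, the first prediction model can be trained on first training data collected from multiple devices (the target device and other network devices), which widens the diversity of the historical samples of the first indicator data. The second prediction model can then be trained on second training data collected from the target device, so that the functional relationship between the first indicator data and the third indicator data is reflected more accurately.
Optionally, in some embodiments, the first training data may also be obtained, and the first prediction model trained on it.
Optionally, in some embodiments, the second training data may also be obtained, and the second prediction model trained on it.
In the embodiments of the present application, the first prediction model may be used to predict the second indicator data of the target device from the first indicator data of the target device. The second prediction model is used to predict the third indicator data of the target device from the second indicator data of the target device obtained by the first prediction model.
The following describes the solution in detail using an example in which the first indicator data is the user count, the second indicator data is the traffic volume, and the third indicator data is the resource occupancy rate.
Optionally, in some embodiments, the first prediction model may also be called the "user count-traffic volume" model.
In the embodiments of the present application, the first prediction model may be obtained by training directly on the user count indicator and/or traffic volume indicator in the first training data, or by performing feature processing on the user count indicator and/or traffic volume indicator and training on the processed data. The present application does not specifically limit this.
It should be understood that performing feature processing on indicators collected from devices (for example, the user count, traffic volume, and resource occupancy indicators) gives the processed data certain numerical properties that make mathematical tools easier to use, for example, keeping certain dimensions within a given range or giving them a certain mean and variance. Standardization and normalization, for example, both transform the data in a specific way and are common feature processing methods.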
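The two common feature processing methods named above can be sketched as follows. This minimal sketch assumes a z-score with the population standard deviation for standardization and min-max scaling into [0, 1] for normalization; the application does not fix either formula, and the sample values are hypothetical.

```python
from statistics import mean, pstdev
from typing import List

def standardize(values: List[float]) -> List[float]:
    """Z-score standardization: zero mean, unit population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def normalize(values: List[float]) -> List[float]:
    """Min-max normalization into [0, 1]; assumes max(values) > min(values)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

cpu_peak = [35.0, 50.0, 65.0, 80.0]   # hypothetical "CPU peak occupancy" samples in %
z = standardize(cpu_peak)
scaled = normalize(cpu_peak)
```

In a real pipeline, the mean, standard deviation, minimum, and maximum would be computed on the training data and then reused unchanged when processing new data for prediction.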
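Feature processing of the indicators collected from devices (for example, the user count, traffic volume, and resource occupancy indicators) can also extract certain information from them to ease subsequent analysis, for example, marking the sign of a value.
There are many specific ways to perform feature processing on the user count indicator and/or traffic volume indicator in the embodiments of the present application. As one example, they may be standardized. As another example, they may be normalized. As yet another example, their dimensionality may be reduced, for example, through principal component analysis. This is described in detail later with reference to specific embodiments, and is not detailed here.
The embodiments of the present application do not specifically limit how the first prediction model is trained from the first training data. As one example, regression analysis may be performed on the first training data to obtain the first prediction model. As another example, regression analysis through the origin may be performed on the first training data to obtain the first prediction model. This is described in detail later with reference to specific embodiments, and is not detailed here.
In the embodiments of the present application, the second prediction model may represent the mapping between the traffic volume indicator and the resource occupancy indicator of the target device.
Optionally, in some embodiments, the second prediction model may also be called the "traffic volume-resource occupancy" model.
In the embodiments of the present application, the second prediction model may be obtained by training directly on the traffic volume indicator and/or resource occupancy indicator in the second training data, or by performing feature processing on the traffic volume indicator and/or resource occupancy indicator and training on the processed data. The present application does not specifically limit this.
There are many specific ways to perform feature processing on the traffic volume indicator and/or resource occupancy indicator in the embodiments of the present application. As one example, they may be standardized. As another example, they may be normalized. As yet another example, their dimensionality may be reduced, for example, through principal component analysis. This is described in detail later with reference to specific embodiments, and is not detailed here.
The embodiments of the present application do not specifically limit how the second prediction model is trained from the second training data. As one example, regression analysis may be performed on the second training data to obtain the second prediction model. As another example, regression analysis through the origin may be performed on the second training data to obtain the second prediction model. This is described in detail later with reference to specific embodiments, and is not detailed here.
In the embodiments of the present application, the first prediction model can be trained on first training data collected from multiple devices (the target device and other network devices), which widens the diversity of the historical samples of the user count indicator. The second prediction model can then be trained on second training data collected from the target device, so that the functional relationship between the user count indicator and the resource occupancy indicator is reflected more accurately.
Optionally, in some embodiments, a predicted user count of the target device may be obtained, and the predicted resource occupancy rate corresponding to that user count on the target device may be obtained through the first prediction model and the second prediction model described above.
In the embodiments of the present application, an accurate predicted resource occupancy rate of the network device (target device) can be obtained from the predicted user count. Before launching a campaign, a network operator can obtain the resource occupancy rates of network devices corresponding to the predicted user count and expand in advance the capacity of network devices that would be overloaded.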
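The final capacity-planning step can be sketched as a simple threshold check on the predicted occupancy rates. This minimal sketch assumes occupancy is expressed as a fraction and uses a hypothetical overload threshold of 0.8; the function name, device labels, and forecast values are illustrative only.

```python
from typing import Dict, List

def devices_to_expand(predicted_occupancy: Dict[str, float],
                      threshold: float = 0.8) -> List[str]:
    """Return devices whose predicted occupancy exceeds the threshold, worst first."""
    overloaded = [dev for dev, occ in predicted_occupancy.items() if occ > threshold]
    return sorted(overloaded, key=lambda dev: predicted_occupancy[dev], reverse=True)

# Hypothetical predicted CPU peak occupancy for a planned campaign.
forecast = {"UGW0/SPU0": 0.92, "UGW0/SPU1": 0.61, "UGW1/SPU0": 0.85}
print(devices_to_expand(forecast))  # ['UGW0/SPU0', 'UGW1/SPU0']
```

If the second model was fitted with quantile regression, feeding it its upper-quantile prediction here gives a more conservative expansion list than a mean prediction would.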
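The following describes, with reference to specific examples, how the indicator relationship between the user count indicator and the traffic volume indicator is consistent between the target device and the other network devices. It should be noted that the following examples are merely intended to help a person skilled in the art understand the embodiments of the present application, not to limit them to the specific values or scenarios illustrated. A person skilled in the art can obviously make various equivalent modifications or changes based on the examples given, and such modifications and changes also fall within the scope of the embodiments of the present application.
It should be understood that the user count in FIG. 2 and FIG. 3 corresponds to the first indicator data above, the traffic volume corresponds to the second indicator data, and the resource occupancy rate corresponds to the third indicator data.
FIG. 2 is a schematic flowchart of a possible scenario of cross-device-combination generality between indicators according to an embodiment of the present application. As shown in FIG. 2, network elements ATS0 and ATS1 are communication devices of the same type. A communication device may have a hierarchical structure: network element ATS0 can be decomposed into modules VCU0, VCU1, and DPU0, and network element ATS1 can be decomposed into modules VCU0, VCU1, and DPU0.
A network element ATS may be an advanced telephony server. As one example, an ATS may provide basic call services, such as basic voice calls and video calls for users. As another example, an ATS may provide supplementary services, such as enhanced system functions for caller display, call barring, call forwarding, callback, conferencing, and reminders.
The service load of network element ATS0 may be evenly distributed among its three modules (VCU0, VCU1, DPU0), and the service load of network element ATS1 may likewise be evenly distributed among its three modules (VCU0, VCU1, DPU0). The dispatch process unit (DPU) may be used to execute the control strategy configured by engineers and may implement functions such as data acquisition, scale conversion, alarm limit checking, operation logging, and sequence-of-events recording.
Referring to FIG. 2, the user count indicator (for example, "registered user count") may correspond to the network elements (ATS0, ATS1), the traffic volume indicator (for example, "total occupied traffic") may correspond to the network elements (ATS0, ATS1), and the resource occupancy indicator (for example, "CPU peak occupancy") may correspond to the modules under the network elements (VCU0, VCU1, DPU0).
Network elements ATS0 and ATS1 are communication devices of the same type, the "registered user count" and "total occupied traffic" indicators both correspond to the network elements, and both indicators are defined identically on ATS0 and ATS1. Therefore, the correlation between the "registered user count" and "total occupied traffic" indicators has cross-device-combination generality between ATS0 and ATS1. That is, a model trained on this correlation applies to ATS0 and equally to ATS1. For example, if the "registered user count" of ATS0 and ATS1 is the same, their "total occupied traffic" may be the same or approximately the same.
It should be understood that the user count indicator (for example, "registered user count") and the traffic volume indicator (for example, "total occupied traffic") shown in FIG. 2 may come from the target device (for example, network element ATS0) and from other network devices (for example, network element ATS1). Because the relationship between these two indicators follows the same or roughly the same trend on ATS0 and ATS1, the user count and traffic volume indicators of both can be collected, widening the diversity of the historical samples of the user count indicator.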
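FIG. 3 is a schematic flowchart of another possible scenario of cross-device-combination generality between indicators according to another embodiment of the present application. As shown in FIG. 3, network elements UGW0, UGW1, and UGW2 are communication devices of the same type. A communication device may have a hierarchical structure: network element UGW0 can be decomposed into modules SPU instance 0 and SPU instance 1, network element UGW1 into SPU instance 0 and SPU instance 1, and network element UGW2 into SPU instance 0, SPU instance 1, and SPU instance 2.
A network element UGW may be a unified packet gateway, and the service of network element UGW0 may be carried by the SPU instance modules under it. A service process unit (SPU) instance may be used to provide service functions in network application scenarios such as load balancing and firewalls. The efficient load balancing scheme provided by SPUs can resolve problems in information technology (IT) systems such as slow response, excessive application latency, and uneven device traffic, ensuring service reliability, improving service response speed, and facilitating flexible service expansion.
Referring to FIG. 3, the service load of network element UGW0 may be evenly distributed between its two modules (SPU instance 0, SPU instance 1), that of network element UGW1 between its two modules (SPU instance 0, SPU instance 1), and that of network element UGW2 among its three modules (SPU instance 0, SPU instance 1, SPU instance 2).
The user count indicators (for example, "2G+3G user count" and "4G user count") may correspond to the network elements (UGW0, UGW1, UGW2), the traffic volume indicators (for example, "Gi interface packet count" and "SGi interface packet count") may correspond to the network elements (UGW0, UGW1, UGW2), and a traffic volume indicator (for example, "GW received user-plane packet count") may correspond to the SPU instances.
Network elements UGW0, UGW1, and UGW2 are communication devices of the same type; the "2G+3G user count", "4G user count", "Gi interface packet count", and "SGi interface packet count" indicators all correspond to the network elements, and they are defined identically on UGW0, UGW1, and UGW2. Therefore, the correlations among these indicators have cross-device-combination generality among UGW0, UGW1, and UGW2.
Because the "GW received user-plane packet count" indicator corresponds to the SPU instance modules under a network element, and, as shown in FIG. 3, network element UGW2 has a different number of SPU instances from UGW0 and UGW1, the way the service load is split among the SPU instances of UGW2 may differ from that of UGW0 and UGW1. Therefore, the correlation among the "2G+3G user count", "4G user count", and "GW received user-plane packet count" indicators has cross-device-combination generality between UGW0 and UGW1, but not among UGW0, UGW1, and UGW2.
As shown in FIG. 3, the correlation between the user count indicators (for example, "2G+3G user count" and "4G user count") and the traffic volume indicators (for example, "Gi interface packet count", "SGi interface packet count", and "GW received user-plane packet count") may have cross-device-combination generality between UGW0 and UGW1. That is, a model trained on the correlation among these indicators applies to UGW0 and equally to UGW1.
It should be understood that because this correlation between the user count indicators and the traffic volume indicators may have cross-device-combination generality between UGW0 and UGW1, user count and traffic volume data can be collected from the target device (for example, network element UGW0) and other network devices (for example, network element UGW1), widening the diversity of the historical samples of the user count indicator.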
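Optionally, in some embodiments, the first prediction model and/or the second prediction model may be obtained by performing regression analysis on the first training data and/or the second training data.
In the embodiments of the present application, training the first prediction model and/or the second prediction model by regression analysis can accurately measure the degree of correlation between factors and the goodness of fit, and is simple to compute and easy to implement.
The following describes the solution in detail using an example in which the first indicator data is the user count, the second indicator data is the traffic volume, and the third indicator data is the resource occupancy rate.
The following uses FIG. 4 to give a detailed example in which the first prediction model and/or the second prediction model is trained by performing regression analysis on the first training data and/or the second training data.
FIG. 4 is a schematic flowchart of training the first prediction model and the second prediction model according to another embodiment of the present application. FIG. 4 includes steps 410 to 450, which are described in detail below.
The following uses the scenario shown in FIG. 3 as an example to describe the procedure for training the first prediction model and the second prediction model in detail.
It should be understood that the device to be modeled in FIG. 3 may correspond to the target device in the embodiments of the present application.
In step 410, the "resource occupancy" indicator to be predicted, the "user count" indicators, and the "traffic volume" indicators are determined according to the device to be modeled.
As shown in FIG. 3, according to the device to be modeled, the user count indicators are determined to be the "2G+3G user count" and "4G user count" indicators of the UGW network element; the resource occupancy indicator to be predicted is the "CPU peak occupancy" indicator of the SPU instance; and the traffic volume indicators are the "Gi interface packet count" and "SGi user-plane packet count" indicators of the UGW network element and the "GW received user-plane packet count" indicator of the SPU instance.
In step 420, based on the cross-device generality of the correlation between the "user count" and "traffic volume" indicators, devices that share this generality with the device to be modeled are selected to obtain device combination list 1, and the first training data, containing the "user count" and "traffic volume" indicators, is obtained according to device combination list 1.
Because UGW0, UGW1, and UGW2 are devices of the same type, and the "2G+3G user count", "4G user count", "Gi interface packet count", and "SGi user-plane packet count" indicators of the UGW network element are defined consistently on the three devices and all correspond to the network element, it can be determined that the correlations among these indicators have cross-device-combination generality among UGW0, UGW1, and UGW2. However, the "GW received user-plane packet count" indicator of the SPU instance does not correspond to the UGW network element, and because UGW2 has a different number of SPU instances from UGW0 and UGW1, the way the service load is split among the SPU instances of UGW2 may differ from that of UGW0 and UGW1. Therefore, the "2G+3G user count" and "4G user count" indicators of the UGW network element and the "GW received user-plane packet count" indicator of the SPU instance have cross-device-combination generality among the four device combinations UGW0 and SPU instance 0, UGW0 and SPU instance 1, UGW1 and SPU instance 0, and UGW1 and SPU instance 1, and also among the three device combinations UGW2 and SPU instance 0, UGW2 and SPU instance 1, and UGW2 and SPU instance 2, but not across all seven device combinations.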
根据用户数指标是UGW网元的“2G+3G用户数”指标和UGW网元的“4G用户数”指标,话务量指标是UGW网元的“Gi接口包数”指标、UGW网元的“SGi用户面报文包数”指标、SPU实例的“GW接收用户面报文包数”指标,确定数值样本来自4个设备组合:UGW0和SPU实例0、UGW0和SPU实例1、UGW1和SPU实例0、UGW1和SPU实例1。以1个设备组合在全天峰值时间(例如17:00)上的“2G+3G用户数”指标和“4G用户数”指标、“Gi接口包数”指标、“SGi用户面报文包数”指标、“GW接收用户面报文包数”指标的值做为一个数据样本,获取第一训练数据。
第一训练数据可以选择4个设备组合作为数据来源,也可以选择3个设备组合作为数据来源,本实施例中选择4个设备组合。
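The grouping rule of step 420 can be sketched in Python as below. This is a minimal sketch under stated assumptions: the inventory dictionary, the function names, and the rule that SPU-level indicators generalize only across elements with the same number of SPU instances (each splitting its load evenly, as in FIG. 3) are illustrative and not taken from the patent.

```python
from collections import defaultdict

# Hypothetical inventory mirroring FIG. 3: network element -> number of SPU instances.
SPU_COUNTS = {"UGW0": 2, "UGW1": 2, "UGW2": 3}


def element_level_group(spu_counts):
    """Element-level indicators (user counts, Gi/SGi packet counts) are defined
    identically on every same-type UGW, so all elements form a single group."""
    return sorted(spu_counts)


def spu_level_groups(spu_counts):
    """SPU-level indicators ("GW received user-plane packet count") are assumed to
    generalize only across elements whose load is split over the same number of
    SPU instances. Returns one list of (element, SPU instance) pairs per SPU count."""
    by_count = defaultdict(list)
    for element, n_spu in sorted(spu_counts.items()):
        by_count[n_spu].extend((element, f"SPU{i}") for i in range(n_spu))
    return [combos for _, combos in sorted(by_count.items())]


def combination_list(spu_counts, target_element):
    """Device combination list 1: the SPU-level group that contains the target element."""
    for combos in spu_level_groups(spu_counts):
        if any(element == target_element for element, _ in combos):
            return combos
    raise KeyError(target_element)


print(combination_list(SPU_COUNTS, "UGW0"))
# [('UGW0', 'SPU0'), ('UGW0', 'SPU1'), ('UGW1', 'SPU0'), ('UGW1', 'SPU1')]
```

For the target UGW0 this yields the four device combinations used in this embodiment, while UGW2's three SPU instances end up in a separate group.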
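In step 430, regression analysis is performed on the "user count" and "traffic" indicators in the first training data to obtain the "user count-traffic model".
Data of network elements UGW0 and UGW1 may be obtained, including the "2G+3G user count", "4G user count", "Gi interface packet count", and "SGi user-plane packet count" indicators of the network elements and the "GW received user-plane packet count" indicator of the SPU instances. The obtained data may be filtered, and regression analysis on the filtered data yields the "user count-traffic model" (also called the first prediction model) of UGW0 and UGW1. For example, with 17:00 each day taken as the peak time point for filtering, regression analysis of the "2G+3G user count" and "4G user count" indicators against the "Gi interface packet count", "SGi user-plane packet count", and "GW received user-plane packet count" indicators in the filtered data trains the "user count-traffic model" (first prediction model) of UGW0 and UGW1.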
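A minimal sketch of step 430, assuming plain multi-output least squares with an intercept stands in for the unspecified regression method. The data are synthetic and the function names are hypothetical.

```python
import numpy as np

PEAK_HOUR = 17  # daily peak time point used in the text (17:00)


def peak_samples(hours, users, traffic, peak_hour=PEAK_HOUR):
    """Keep only the samples recorded at the daily peak hour."""
    mask = np.asarray(hours) == peak_hour
    return np.asarray(users, float)[mask], np.asarray(traffic, float)[mask]


def fit_user_traffic_model(users, traffic):
    """Multi-output ordinary least squares with an intercept.

    users:   (n, 2) columns ["2G+3G user count", "4G user count"]
    traffic: (n, 3) columns ["Gi", "SGi", "GW received user-plane" packet counts]
    Returns a (3, 3) coefficient matrix whose first row holds the intercepts.
    """
    X = np.column_stack([np.ones(len(users)), users])
    coef, *_ = np.linalg.lstsq(X, traffic, rcond=None)
    return coef


# Synthetic stand-in for pooled data from the four device combinations
# (4 combinations x 30 days x 24 hourly samples); not real measurements.
rng = np.random.default_rng(0)
n = 4 * 30 * 24
hours = np.tile(np.arange(24), n // 24)
users = rng.uniform(1e4, 5e4, size=(n, 2))
true_slopes = np.array([[3.0, 2.5, 1.2],
                        [5.0, 4.0, 2.0]])
traffic = users @ true_slopes + rng.normal(0, 100, size=(n, 3))

peak_users, peak_traffic = peak_samples(hours, users, traffic)
coef = fit_user_traffic_model(peak_users, peak_traffic)
print(peak_users.shape, np.round(coef[1:], 2))
```

Only the 120 peak-hour rows are used for fitting; pooling several device combinations is what widens the spread of the user-count columns.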
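In step 440, second training data containing the "traffic" and "resource utilization" indicators is obtained according to the device to be modeled.
Given that the traffic indicators are the "Gi interface packet count" and "SGi user-plane packet count" indicators of the UGW network element and the "GW received user-plane packet count" indicator of the SPU instance, and the resource utilization indicator is the "CPU peak utilization" indicator of the SPU instance, the data samples are determined to come from the target device (UGW0, SPU instance 0). The values of the "Gi interface packet count", "SGi user-plane packet count", "GW received user-plane packet count", and "CPU peak utilization" indicators of this device at any time point of the day form one data sample, and these samples make up the second training data.
In step 450, regression analysis is performed on the "traffic" and "resource utilization" indicators in the second training data to train the "traffic-resource utilization model".
With the "Gi interface packet count", "SGi user-plane packet count", and "GW received user-plane packet count" indicators of (UGW0, SPU instance 0) (the target device) as the traffic indicators and its "CPU peak utilization" indicator as the resource utilization indicator, regression analysis of the traffic indicators against the resource utilization indicator trains the "traffic-resource utilization model" (second prediction model) of the target device.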
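Optionally, in some embodiments, the prediction procedure for SPU instance 0 of network element UGW0 may be as follows:
Step 1: Because SPU instance 0 of UGW0 is an SPU device under a UGW device, the "user count" inputs of the first prediction model ("user count-traffic model") are determined to be the "2G+3G user count" and "4G user count" indicators of the network element, and the "predicted 2G+3G user count" and "predicted 4G user count" can then be determined.
Step 2: Because the device to be predicted is SPU instance 0 of UGW0, the corresponding first prediction model ("user count-traffic model") is determined. Feeding it the predicted user counts from step 1 yields the "predicted Gi interface packet count", "predicted SGi user-plane packet count", and "predicted GW received user-plane packet count".
Step 3: Because the device to be predicted is SPU instance 0 of UGW0, the corresponding second prediction model ("traffic-resource utilization model") is determined. Feeding it the "predicted Gi interface packet count", "predicted SGi user-plane packet count", and "predicted GW received user-plane packet count" yields the "predicted CPU peak utilization".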
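Steps 1 to 3 chain the two models. The sketch below assumes both models are linear with the intercept row first; all coefficient values are hypothetical placeholders, not trained results.

```python
import numpy as np


def linear_predict(coef, X):
    """Apply a linear model whose first coefficient row is the intercept."""
    X = np.atleast_2d(np.asarray(X, float))
    return np.column_stack([np.ones(len(X)), X]) @ coef


def predict_cpu_peak(user_traffic_coef, traffic_cpu_coef, predicted_users):
    """Step 2: predicted user counts -> predicted traffic (first prediction model).
    Step 3: predicted traffic -> predicted CPU peak utilization (second prediction model)."""
    predicted_traffic = linear_predict(user_traffic_coef, predicted_users)
    return linear_predict(traffic_cpu_coef, predicted_traffic)


# Hypothetical coefficients, for illustration only.
user_traffic_coef = np.array([[0.0, 0.0, 0.0],    # intercepts (Gi, SGi, GW packet counts)
                              [3.0, 2.5, 1.2],    # per "2G+3G" user
                              [5.0, 4.0, 2.0]])   # per "4G" user
traffic_cpu_coef = np.array([[5.0],    # intercept (% CPU)
                             [1e-4],   # per Gi packet
                             [5e-5],   # per SGi packet
                             [1e-4]])  # per GW received user-plane packet

# Step 1: predicted user counts for the target device, e.g. from a capacity plan.
predicted_users = [[20000, 30000]]
cpu = predict_cpu_peak(user_traffic_coef, traffic_cpu_coef, predicted_users)
print(cpu)  # about 42.9 (% CPU peak utilization)
```

The first model can be shared across devices with the same generality, while the second model stays specific to the target device.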
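Optionally, in some embodiments, the second prediction model may also be obtained by performing quantile regression analysis on the second training data.
It should be understood that quantile regression is one form of regression analysis. A quantile is a value that splits the range of a random variable's distribution according to a given probability, so quantile regression can predict an upper or lower bound of an indicator. For example, a quantile parameter of 0.1 splits the distribution range of the variable into two parts such that the probability that the variable is below the 0.1 quantile is 0.1. If the lower bound of the resource utilization indicator is to be predicted, a low quantile such as 0.1 or 0.2 may be chosen.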
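For reference, the standard definitions behind this paragraph are given below; the symbols $Y$, $x_i$, $y_i$, $\beta$ and $\tau$ are introduced here for illustration and are not the patent's notation.

```latex
q_\tau(Y) = \inf\{\, q : P(Y \le q) \ge \tau \,\}, \qquad 0 < \tau < 1

\rho_\tau(r) = r\,\bigl(\tau - \mathbb{1}[r < 0]\bigr)

\hat{\beta}_\tau = \arg\min_{\beta} \sum_{i=1}^{n} \rho_\tau\bigl(y_i - x_i^{\top}\beta\bigr)
```

With a small $\tau$, points above the fit are penalized lightly and points below it heavily, which pushes the fitted line toward a lower bound; a large $\tau$ does the opposite.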
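The specific method of performing quantile regression on the second indicator data and the third indicator data in the second training data is not limited in the embodiments of this application. For example, linear quantile regression may be used; as another example, nonlinear quantile regression may be used.
In the embodiments of this application, training the second prediction model by quantile regression makes it possible to predict upper and lower bounds of the third indicator data rather than its mean, which meets applications' concern with boundary values.
The following provides a detailed description in which the first indicator data is the user count, the second indicator data is the traffic, and the third indicator data is the resource utilization.
Optionally, building on FIG. 4, the second prediction model may be trained by performing quantile regression on the traffic and resource utilization indicators in the second training data. This implementation is described in detail below with reference to FIG. 5.
FIG. 5 is a schematic flowchart of training the first prediction model and the second prediction model according to another embodiment of this application. FIG. 5 includes steps 510 to 550, of which steps 510 to 540 correspond to steps 410 to 440 respectively; for details, refer to the description of FIG. 4, which is not repeated here.
The procedure for training the first and second prediction models is described in detail below, using the training scenario shown in FIG. 3 as an example.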
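In step 510, the "resource utilization", "user count", and "traffic" indicators to be predicted are determined according to the device to be modeled.
In step 520, devices that share generality with the device to be modeled are selected according to the generality of the correlation between the "user count" and "traffic" indicators, yielding device combination list 1. First training data containing the "user count" and "traffic" indicators is then obtained according to device combination list 1.
In step 530, regression analysis is performed on the "user count" and "traffic" indicators in the first training data to obtain the "user count-traffic model".
In step 540, second training data containing the "traffic" and "resource utilization" indicators is obtained according to the device to be modeled.
In step 550, quantile regression analysis is performed on the traffic and resource utilization indicators in the second training data to train the "traffic-resource utilization model".
With the "Gi interface packet count", "SGi user-plane packet count", and "GW received user-plane packet count" indicators of (UGW0, SPU instance 0) (the target device) as the traffic indicators and the "CPU peak utilization" indicator as the resource utilization indicator, quantile regression of the traffic indicators against the resource utilization indicator trains the "traffic-resource utilization model" (second prediction model) of the target device.
For example, to predict the upper bound of the "CPU peak utilization" indicator, a high quantile such as 0.8 or 0.9 may be chosen; to predict its lower bound, a low quantile such as 0.1 or 0.2 may be chosen.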
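As a hedged illustration of step 550, the sketch below fits linear quantile regression exactly for a single aggregated traffic feature by enumerating candidate lines. A model with the three traffic indicators would more likely use a linear-programming solver or a library routine (for example statsmodels' `QuantReg`). The data are synthetic and the helper names are hypothetical.

```python
import itertools

import numpy as np


def pinball_loss(residuals, tau):
    """Quantile (pinball) loss: tau * r for r >= 0 and (tau - 1) * r for r < 0."""
    r = np.asarray(residuals, float)
    return float(np.sum(r * (tau - (r < 0).astype(float))))


def quantile_fit_1d(x, y, tau):
    """Exact linear quantile regression y ~ a + b * x for a single feature.

    The problem is a linear program whose optimum is (generically) attained by a
    line through two data points, so every pair of points with distinct x is tried
    and the line with the smallest pinball loss is kept (O(n^3) work: sketch only).
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    best = None
    for i, j in itertools.combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        loss = pinball_loss(y - (a + b * x), tau)
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best[1], best[2]


# Synthetic data: one aggregated traffic feature vs. CPU peak utilization (%).
rng = np.random.default_rng(1)
traffic = rng.uniform(0, 100, 60)
cpu = 10 + 0.5 * traffic + rng.normal(0, 3, 60)

a_hi, b_hi = quantile_fit_1d(traffic, cpu, tau=0.9)  # upper-bound model
a_lo, b_lo = quantile_fit_1d(traffic, cpu, tau=0.1)  # lower-bound model
print(f"upper: {a_hi:.2f} + {b_hi:.3f}x, lower: {a_lo:.2f} + {b_lo:.3f}x")
```

At the optimum, roughly 90% of the points lie on or below the tau = 0.9 line and roughly 10% on or below the tau = 0.1 line, which is the upper/lower-bound behaviour described above.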
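In the embodiments of this application, building the second prediction model by quantile regression makes it possible to predict upper and lower bounds of the third indicator data rather than its mean, which meets applications' concern with boundary values.
Optionally, in some embodiments, a "constrained modeling" feature may be added when the first prediction model is built by regression analysis of the first indicator data and the second indicator data in the first training data; that is, the diversity of the first indicator data is evaluated. If the diversity of the first indicator data in the first training data does not meet a preset condition, regression constrained to pass through the coordinate origin may be performed on the first and second indicator data in the first training data. If it does meet the preset condition, regression not constrained to pass through the origin may be performed.
In the embodiments of this application, when the diversity of the first indicator data in the first training data meets the preset condition, unconstrained regression makes full use of the information in the data set and yields a more accurate model.
In the embodiments of this application, when that diversity is insufficient, performing regression constrained through the origin on the first and second indicator data avoids inaccurate model extrapolation that the insufficient diversity of the first indicator data may otherwise cause.
It should be understood that the preset condition mentioned above may be a preset threshold: if the diversity of the first indicator data reaches the threshold, its diversity is considered to meet the preset condition.
It should be understood that regression through the origin and regression not through the origin are both forms of regression analysis. A model obtained by regression constrained through the origin has no constant term, whereas a model obtained without the constraint may have one. For example, for a regression model through the origin, the predicted variable Y is necessarily 0 when the variable X is 0. Regression through the origin is easier to compute and implement, and regression through a specified point other than the origin can usually be converted into regression through the origin. By contrast, for a regression model not through the origin, the predicted Y is not necessarily 0 when X is 0.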
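The diversity check and the two regression variants can be sketched as follows. The patent leaves the diversity measure and the preset threshold open; the coefficient-of-variation test and the 0.2 threshold below are hypothetical choices for illustration, and the data are synthetic.

```python
import numpy as np

MIN_CV = 0.2  # hypothetical threshold standing in for the "preset condition"


def diversity_ok(users, min_cv=MIN_CV):
    """Hypothetical diversity test: every user-count column must have a
    coefficient of variation (std / |mean|) of at least min_cv."""
    users = np.asarray(users, float)
    cv = users.std(axis=0) / np.abs(users.mean(axis=0))
    return bool(np.all(cv >= min_cv))


def fit_first_model(users, traffic, min_cv=MIN_CV):
    """Regression with an intercept when the user counts are diverse enough,
    otherwise regression constrained through the origin (no constant term).
    Returns (coef, through_origin); coef row 0 is the intercept (0 if constrained)."""
    users = np.asarray(users, float)
    traffic = np.asarray(traffic, float).reshape(len(users), -1)
    through_origin = not diversity_ok(users, min_cv)
    X = users if through_origin else np.column_stack([np.ones(len(users)), users])
    coef, *_ = np.linalg.lstsq(X, traffic, rcond=None)
    if through_origin:
        coef = np.vstack([np.zeros((1, coef.shape[1])), coef])
    return coef, through_origin


rng = np.random.default_rng(2)
slopes = np.array([[3.0], [5.0]])  # synthetic Gi packets per 2G+3G / 4G user
narrow = rng.normal([30000, 40000], [500, 500], size=(50, 2))    # low diversity
wide = rng.uniform([5000, 5000], [60000, 80000], size=(50, 2))   # high diversity

coef_n, origin_n = fit_first_model(narrow, narrow @ slopes)
coef_w, origin_w = fit_first_model(wide, wide @ slopes + 200.0)
print(origin_n, coef_n.ravel().round(3))
print(origin_w, coef_w.ravel().round(3))
```

When all samples sit in a narrow band far from zero, the intercept is poorly determined, so forcing the line through the origin anchors extrapolation toward small user counts.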
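It should also be understood that, in some embodiments, feature processing may be applied to the first indicator data before its diversity is evaluated; this is not specifically limited in this application. For example, the first indicator data may be normalized and the diversity of the normalized data evaluated. As another example, it may be standardized and the diversity of the standardized data evaluated. As another example, its dimensionality may be reduced and the diversity of the reduced data evaluated.
The following provides a detailed description in which the first indicator data is the user count, the second indicator data is the traffic, and the third indicator data is the resource utilization.
Optionally, building on FIG. 4, an evaluation of the diversity of the user-count indicators may be added. This implementation is described in detail below with reference to FIG. 6.
FIG. 6 is a schematic flowchart of training the first prediction model and the second prediction model according to another embodiment of this application. The method in FIG. 6 includes steps 610 to 670, each of which is described in detail below.
The procedure for training the first and second prediction models is described in detail below, using the training scenario shown in FIG. 3 as an example.
In step 610, the "resource utilization", "user count", and "traffic" indicators to be predicted are determined according to the device to be modeled.
As shown in FIG. 3, according to the device to be modeled, the user-count indicators to be predicted are the "2G+3G user count" and "4G user count" indicators of the UGW network element. The resource utilization indicator to be predicted is the "CPU peak utilization" indicator of the SPU instance. The traffic indicators to be predicted are the "Gi interface packet count" and "SGi user-plane packet count" indicators of the UGW network element and the "GW received user-plane packet count" indicator of the SPU instance.
In step 620, devices that share generality with the device to be modeled are selected according to the cross-device generality of the correlation between the "user count" and "traffic" indicators, yielding device combination list 1. First training data containing the "user count" and "traffic" indicators is then obtained according to device combination list 1.
Since UGW0, UGW1, and UGW2 are devices of the same type, and the "2G+3G user count", "4G user count", "Gi interface packet count", and "SGi user-plane packet count" indicators of the UGW network element are defined identically on the three devices and all correspond to the network element, it can be determined that the correlations among these indicators have cross-device-combination generality across UGW0, UGW1, and UGW2. However, the "GW received user-plane packet count" indicator of the SPU instance does not correspond to the UGW network element, and because UGW2 has a different number of SPU instances from UGW0 and UGW1, the way the service load is split among UGW2's SPU instances may differ from that of UGW0 and UGW1. Therefore, the correlations among the "2G+3G user count" and "4G user count" indicators of the UGW network element and the "GW received user-plane packet count" indicator of the SPU instance have cross-device-combination generality among the four device combinations (UGW0, SPU instance 0), (UGW0, SPU instance 1), (UGW1, SPU instance 0), and (UGW1, SPU instance 1), and also among the three device combinations (UGW2, SPU instance 0), (UGW2, SPU instance 1), and (UGW2, SPU instance 2), but not across all seven device combinations.
Given that the user-count indicators are the "2G+3G user count" and "4G user count" indicators of the UGW network element, and the traffic indicators are the "Gi interface packet count" and "SGi user-plane packet count" indicators of the UGW network element and the "GW received user-plane packet count" indicator of the SPU instance, the data samples are determined to come from the four device combinations (UGW0, SPU instance 0), (UGW0, SPU instance 1), (UGW1, SPU instance 0), and (UGW1, SPU instance 1). The values of the "2G+3G user count", "4G user count", "Gi interface packet count", "SGi user-plane packet count", and "GW received user-plane packet count" indicators of one device combination at the daily peak time (for example, 17:00) form one data sample, and these samples make up the first training data.
The first training data may use either the four device combinations or the three device combinations as its data source; this embodiment uses the four device combinations.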
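In step 630, it is determined whether the diversity of the "user count" indicators in the first training data meets the preset condition.
If the diversity of the user-count indicators in the first training data meets the preset condition, the first prediction model (also called the "user count-traffic model") may be built by performing step 650. If it does not meet the preset condition, the first prediction model may be built by performing step 640.
The diversity of the "2G+3G user count" and "4G user count" indicators in the first training data may be evaluated.
In step 640, regression constrained to pass through the coordinate origin is performed on the "user count" and "traffic" indicators in the first training data to obtain the "user count-traffic model".
If the diversity of at least one of the user-count indicators ("2G+3G user count" and "4G user count") in the first training data does not meet the preset condition, regression constrained to pass through the origin may be performed on the "2G+3G user count" and "4G user count" indicators against the "Gi interface packet count", "SGi user-plane packet count", and "GW received user-plane packet count" indicators in the first training data, yielding the "user count-traffic model" (first prediction model).
For details of building the first prediction model by regression analysis, refer to the description of step 430 in FIG. 4; they are not repeated here.
In step 650, regression not constrained to pass through the coordinate origin is performed on the user-count and traffic indicators in the first training data to obtain the "user count-traffic model".
If the diversity of the user-count indicators ("2G+3G user count" and "4G user count") in the first training data meets the preset condition, regression not constrained to pass through the origin may be performed on the "2G+3G user count" and "4G user count" indicators against the "Gi interface packet count", "SGi user-plane packet count", and "GW received user-plane packet count" indicators in the first training data, yielding the "user count-traffic model" (first prediction model).
For details of building the first prediction model by regression analysis, refer to the description of step 430 in FIG. 4; they are not repeated here.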
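In step 660, second training data containing the "traffic" and "resource utilization" indicators is obtained according to the device to be modeled.
Step 660 corresponds to step 440 shown in FIG. 4; for details, refer to the description of FIG. 4, which is not repeated here.
In step 670, regression analysis is performed on the traffic and resource utilization indicators in the second training data to build the second prediction model, which describes the numerical relationship between these indicators.
With the "Gi interface packet count", "SGi user-plane packet count", and "GW received user-plane packet count" indicators of (UGW0, SPU instance 0) (the target device) as the traffic indicators and the "CPU peak utilization" indicator as the resource utilization indicator, regression analysis of the traffic indicators against the resource utilization indicator builds the second prediction model describing the numerical relationship between them.
In the embodiments of this application, even after the data sources for training have been expanded, the data may still lack diversity; in that case, this approach provides a fallback that still completes the modeling and meets practical needs.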
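Optionally, in some embodiments, when the first prediction model is built by regression through the origin on the first indicator data and the second indicator data, feature processing may also be applied to the second indicator data in the first training data and the second training data. The feature processing is such that if its input is all zero or nearly all zero, its output is also all zero or nearly all zero. Regression through the origin may then be performed on the processed second indicator data and the first indicator data to train the first prediction model, and on the processed second indicator data and the third indicator data to build the second prediction model.
In the embodiments of this application, the feature processing applied to the second indicator data maps every all-zero or near-all-zero input to an all-zero or near-all-zero output, so that it is compatible with the regression constrained through the origin performed on the first training data.
It should be understood that the method of feature processing of the second indicator data in the first and second training data is not specifically limited in the embodiments of this application. For example, the dimensionality of the second indicator data may be reduced, for instance by principal component analysis. As another example, the second indicator data may be standardized. As another example, it may be normalized.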
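Whether a given feature processing step meets the zero-in, zero-out requirement is easy to check numerically. The sketch assumes "normalization" means min-max scaling and compares it with z-score standardization and with scaling by the standard deviation alone; the function names and synthetic data are illustrative.

```python
import numpy as np


def standardize_fit(X):
    """Z-score standardization: subtract the column mean, divide by the column std."""
    mean, std = X.mean(axis=0), X.std(axis=0)
    return lambda Z: (np.asarray(Z, float) - mean) / std


def minmax_fit(X):
    """Min-max normalization to [0, 1]: (x - min) / (max - min)."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return lambda Z: (np.asarray(Z, float) - lo) / (hi - lo)


def scale_only_fit(X):
    """Divide by the column std without centring: a pure scaling, so 0 stays 0."""
    std = X.std(axis=0)
    return lambda Z: np.asarray(Z, float) / std


def maps_zero_to_zero(transform, n_features, tol=1e-9):
    """Check the requirement: an all-zero input must give an (almost) all-zero output."""
    out = transform(np.zeros((1, n_features)))
    return bool(np.all(np.abs(out) <= tol))


rng = np.random.default_rng(4)
traffic = rng.uniform(1e3, 1e5, size=(100, 3))  # synthetic traffic indicators
for name, fit in [("standardize", standardize_fit), ("min-max", minmax_fit),
                  ("scale only", scale_only_fit)]:
    print(name, maps_zero_to_zero(fit(traffic), traffic.shape[1]))
```

Transforms that subtract a location (the mean or the minimum) fail the check, so they would need a compensating translation before being combined with regression through the origin.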
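In the embodiments of this application, performing regression constrained through the coordinate origin on the tenant count and the service usage (the first and second indicator data, respectively) avoids inaccurate model extrapolation that insufficient diversity of the first indicator data in the data set may otherwise cause.
Optionally, in some embodiments, dimensionality reduction may be applied to the second indicator data in the first training data and the second training data.
It should be understood that dimensionality reduction maps high-dimensional data to lower-dimensional data and reduces data redundancy, and can be regarded as a form of feature processing. Principal component analysis (PCA) is a common dimensionality reduction method.
The implementation of dimensionality reduction of the second indicator data in the first and second training data is not specifically limited in the embodiments of this application. For example, PCA may be performed on the second indicator data over the first and second training data to obtain the principal components of the second indicator data.
It should be understood that PCA may be performed on the second indicator data in the first training data to obtain a PCA model; the first training data may then be reduced in dimensionality with this model to obtain third training data, on which the first prediction model is trained.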
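A minimal sketch of fitting the principal component model on the first training data and using it to build the third training data, assuming a hand-written SVD-based PCA (library implementations such as scikit-learn's `PCA` behave similarly) and synthetic, strongly collinear traffic indicators. The class and variable names are hypothetical.

```python
import numpy as np


class TrafficPCA:
    """Minimal PCA via SVD (a sketch, not a library API), fitted on the
    traffic indicators of the first training data."""

    def __init__(self, n_components):
        self.n_components = n_components

    def fit(self, X):
        X = np.asarray(X, float)
        self.mean_ = X.mean(axis=0)
        # Rows of vt are orthonormal principal directions, ordered by explained variance.
        _, _, vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = vt[: self.n_components]
        return self

    def transform(self, X):
        return (np.asarray(X, float) - self.mean_) @ self.components_.T


# Synthetic first training data: three traffic indicators driven by two user counts,
# so the traffic columns are strongly collinear.
rng = np.random.default_rng(3)
users = rng.uniform(1e4, 5e4, size=(200, 2))
traffic = users @ np.array([[3.0, 2.5, 1.2],
                            [5.0, 4.0, 2.0]]) + rng.normal(0, 50, size=(200, 3))

pca = TrafficPCA(n_components=2).fit(traffic)   # "traffic principal component model"
traffic_pc = pca.transform(traffic)             # third training data = users + traffic_pc
X = np.column_stack([np.ones(len(users)), users])
first_model, *_ = np.linalg.lstsq(X, traffic_pc, rcond=None)

# The same fitted PCA would later be applied to the target device's second training data.
print(traffic_pc.shape, first_model.shape)
```

Fitting the PCA once and reusing it on the second training data keeps both prediction models in the same feature space.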
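It should be understood that the second indicator data fed into PCA may be the raw values of the second indicator data; as another example, the standardized raw values; as another example, the normalized raw values.
In the embodiments of this application, reducing the dimensionality of the second indicator data avoids the computational difficulties caused by collinearity of the second indicator data that may arise when regression is performed on the second training data.
Optionally, in some embodiments, building on FIG. 4, PCA may be applied to the second indicator data in the first training data to reduce its dimensionality. This implementation is described in detail below with reference to FIG. 7.
It should be understood that PCA is only one way of reducing the dimensionality of data; this is not specifically limited in this application.
Performing PCA on the historical data of the second indicator data in the first training data yields a principal component model of the second indicator data.
Optionally, in some embodiments, the output variables of the principal component model of the second indicator data may be multidimensional, but they cannot have more dimensions than the input variables. For example, the second indicator data in the embodiments of this application may have 3 dimensions, and the output principal components may have fewer than 3 dimensions, for example two: second indicator principal component 1 and second indicator principal component 2.
It should be understood that the principal components of the second indicator data mentioned above are the set of variables obtained by applying PCA to the second indicator data. PCA is a statistical method that uses an orthogonal transformation to convert a set of possibly correlated variables, here the second indicator data, into a set of linearly uncorrelated variables, called the principal components of the second indicator data.
Optionally, in some embodiments, the input variables of the regression analysis may be not the raw values of the second indicator data but feature-processed values. For the specific feature processing, refer to the description of feature processing above; it is not repeated here.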
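The following provides a detailed description in which the first indicator data is the user count, the second indicator data is the traffic, and the third indicator data is the resource utilization.
FIG. 7 is a schematic flowchart of training the first prediction model and the second prediction model according to another embodiment of this application. FIG. 7 includes steps 710 to 780, each of which is described in detail below.
In step 710, the "resource utilization", "user count", and "traffic" indicators to be predicted are determined according to the device to be modeled.
As shown in FIG. 3, given that the user-count indicators to be modeled are the "2G+3G user count" and "4G user count" indicators of the UGW network element and the resource utilization indicator to be modeled is the "CPU peak utilization" indicator of the SPU instance, the traffic indicators are determined to be the "Gi interface packet count" and "SGi user-plane packet count" indicators of the UGW network element and the "GW received user-plane packet count" indicator of the SPU instance.
In step 720, devices that share generality with the device to be modeled are selected according to the cross-device generality of the correlation between the "user count" and "traffic" indicators, yielding device combination list 1. First training data containing the "user count" and "traffic" indicators is then obtained according to device combination list 1.
The first training data in step 720 is obtained in the same way as in step 420 shown in FIG. 4; for details, refer to the description of FIG. 4, which is not repeated here.
In step 730, PCA is performed on the "traffic" indicators in the first training data to obtain the "traffic principal component model".
PCA on the "traffic" indicators in the first training data yields the traffic principal component model, which reduces the dimensionality of the "traffic" indicators.
The first training data of network elements UGW0 and UGW1 may be obtained, including the "2G+3G user count", "4G user count", "Gi interface packet count", and "SGi user-plane packet count" indicators of the network elements and the "GW received user-plane packet count" indicator of the SPU instances.
PCA may be performed on the "traffic" indicators in the first training data (for example, the "Gi interface packet count" and "SGi user-plane packet count" indicators and the SPU instance's "GW received user-plane packet count" indicator) to obtain the traffic principal component model.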
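In step 740, the first training data is processed with the "traffic principal component model".
The "traffic principal component model" obtained in step 730 is applied to the "Gi interface packet count", "SGi user-plane packet count", and "GW received user-plane packet count" indicators in the first training data to obtain third training data.
Data of network elements UGW0 and UGW1 may be obtained, including the "2G+3G user count", "4G user count", "Gi interface packet count", and "SGi user-plane packet count" indicators of the network elements and the "GW received user-plane packet count" indicator of the SPU instances. The obtained data may be filtered, and the filtered data processed with the traffic principal component model to obtain the third training data.
In step 750, regression analysis is performed on the "user count" and "traffic" indicators in the third training data to obtain the "user count-traffic model".
Regression analysis on the "user count" and "traffic" indicators in the third training data yields the "user count-traffic model" (also called the first prediction model) of UGW0 and UGW1. For example, with 17:00 each day taken as the peak time point for filtering, regression of the "2G+3G user count" and "4G user count" indicators against the "Gi interface packet count", "SGi user-plane packet count", and "GW received user-plane packet count" indicators (represented by their principal components) in the filtered third training data yields the "user count-traffic model" (first prediction model) of UGW0 and UGW1.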
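In step 760, second training data containing the "traffic" and "resource utilization" indicators is obtained according to the device to be modeled.
Given that the traffic indicators are the "Gi interface packet count" and "SGi user-plane packet count" indicators of the UGW network element and the "GW received user-plane packet count" indicator of the SPU instance, and the resource utilization indicator is the "CPU peak utilization" indicator of the SPU instance, the data samples are determined to come from the target device (UGW0, SPU instance 0). The values of the "Gi interface packet count", "SGi user-plane packet count", "GW received user-plane packet count", and "CPU peak utilization" indicators of this device at any time point of the day form one data sample, and these samples make up the second training data.
The second training data may also come from another device combination, such as (UGW0, SPU instance 1); this embodiment uses (UGW0, SPU instance 0).
In step 770, the second training data is processed with the "traffic principal component model".
On the second training data, the "traffic principal component model" built in step 730 is used to apply feature processing to the "Gi interface packet count", "SGi user-plane packet count", and "GW received user-plane packet count" indicators, yielding the traffic principal components.
In step 780, regression analysis is performed on the "traffic" and "resource utilization" indicators in the second training data to obtain the "traffic-resource utilization model".
With the "Gi interface packet count", "SGi user-plane packet count", and "GW received user-plane packet count" indicators of (UGW0, SPU instance 0) (the target device), as processed by the traffic principal component model in step 770, as the traffic indicators and its "CPU peak utilization" indicator as the resource utilization indicator, regression analysis of the traffic indicators against the resource utilization indicator trains the "traffic-resource utilization model" (second prediction model).
It should be understood that the dashed arrows in FIG. 7 indicate an indirect influence on another step. For example, the dashed arrow between step 730 and step 770 indicates that the traffic principal component model built in step 730 indirectly influences the execution of step 770.
In the embodiments of this application, PCA transforms the second indicator data into mutually uncorrelated principal components, which avoids the computational difficulties caused by collinearity of the traffic indicators.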
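It should also be understood that the model obtained by PCA describes a mapping from points in the original space to points in the mapped space, and the origin of the original space is not necessarily mapped to the origin of the mapped space. A translation in the mapped space can move the image of the original origin back to the origin.
Optionally, in some embodiments, an evaluation of the diversity of the first indicator data may be added on the basis of FIG. 7. If the diversity of the first indicator data meets the preset condition mentioned above, regression not through the origin may be performed on the first and second indicator data in the first training data. If it does not meet the preset condition, regression through the origin may be performed. This implementation is described in detail below with reference to FIG. 8.
It should be understood that an evaluation of the diversity of the first indicator data can be added on the basis of FIG. 7 (PCA of the second indicator data in the first and second training data). For example, if the diversity of the first indicator data meets the preset condition, regression not constrained through the origin may be performed on the first indicator data and the second indicator data (traffic principal components). As another example, if it does not meet the preset condition, regression constrained through the origin may be performed on them. In the constrained case, after PCA of the second indicator data, the origin of the original space is not necessarily mapped to the origin of the mapped space; a translation in the mapped space can bring it to the origin, after which the regression of the first indicator data and the second indicator data (traffic principal components) constrained through the origin can be performed.
The following provides a detailed description in which the first indicator data is the user count, the second indicator data is the traffic, and the third indicator data is the resource utilization.
FIG. 8 is a schematic flowchart of training the first prediction model and the second prediction model according to another embodiment of this application. FIG. 8 includes steps 810 to 895, each of which is described in detail below.
In step 810, the "resource utilization", "user count", and "traffic" indicators to be predicted are determined according to the device to be modeled.
As shown in FIG. 3, given that the user-count indicators to be modeled are the "2G+3G user count" and "4G user count" indicators of the UGW network element and the resource utilization indicator to be modeled is the "CPU peak utilization" indicator of the SPU instance, the traffic indicators are determined to be the "Gi interface packet count" and "SGi user-plane packet count" indicators of the UGW network element and the "GW received user-plane packet count" indicator of the SPU instance.
In step 820, devices that share generality with the device to be modeled are selected according to the cross-device generality of the correlation between the "user count" and "traffic" indicators, yielding device combination list 1. First training data containing the "user count" and "traffic" indicators is then obtained according to device combination list 1.
Step 820 corresponds to step 720 shown in FIG. 7; for details, refer to the description of FIG. 7, which is not repeated here.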
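In step 830, PCA is performed on the "traffic" indicators in the first training data to obtain the "traffic principal component model".
Step 830 corresponds to step 730 shown in FIG. 7; for details, refer to the description of FIG. 7, which is not repeated here.
In step 840, it is determined whether the diversity of the "user count" indicators in the first training data is sufficient.
The diversity of the user-count indicators in the first training data may be evaluated. If it meets the preset condition, step 850 is performed. If it does not, step 860 is performed first, followed by step 850.
In step 850, the first training data is processed with the "traffic principal component model".
If the diversity of the user-count indicators in the first training data meets the preset condition, the traffic principal component model built in step 830 is applied on the first training data to the "Gi interface packet count", "SGi user-plane packet count", and "GW received user-plane packet count" indicators as feature processing, yielding third training data.
In step 860, a translation transform is determined and added to the "traffic principal component model".
If the diversity of the user-count indicators in the first training data does not meet the preset condition, regression constrained to pass through the origin needs to be performed on the user-count indicators and the traffic indicators (traffic principal components) in the first training data. The model obtained by PCA maps points in the original space to points in the mapped space, and the origin of the original space is not necessarily mapped to the origin of the mapped space. If the traffic-indicator processing on the first and second training data produces an output that is not all zero or near all zero for an all-zero or near-all-zero input, a translation transform T may be determined from that output; applying T then makes the processing output all zero or near all zero for such an input.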
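Step 860 can be sketched as below: the PCA part of the processing is run on an all-zero input, and the translation T is set to cancel its output, so the combined processing maps zero to zero. The helper name and the synthetic data are assumptions for illustration.

```python
import numpy as np


def fit_pca_with_translation(X, n_components):
    """Centred PCA (via SVD) followed by a translation T chosen so that an
    all-zero raw input maps to the origin of the principal-component space.
    Returns (transform, T)."""
    X = np.asarray(X, float)
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    W = vt[:n_components].T                      # (n_features, n_components)

    def partial(Z):                              # the PCA part of the processing
        return (np.asarray(Z, float) - mean) @ W

    # Run the partial processing on an all-zero input; its output (-mean @ W)
    # is generally non-zero, so T is chosen to cancel it.
    T = -partial(np.zeros((1, X.shape[1])))

    def transform(Z):
        return partial(Z) + T

    return transform, T


rng = np.random.default_rng(5)
users = rng.uniform(1e4, 5e4, size=(100, 2))
traffic = users @ np.array([[3.0, 2.5, 1.2],
                            [5.0, 4.0, 2.0]])   # synthetic, proportional to users
transform, T = fit_pca_with_translation(traffic, n_components=2)

pc = transform(traffic)
# With zero mapped to zero, a regression constrained through the origin can be fitted.
coef, *_ = np.linalg.lstsq(users, pc, rcond=None)
print(np.abs(transform(np.zeros((1, 3)))).max(), np.abs(users @ coef - pc).max())
```

For centred PCA the translation simply adds back `mean @ W`, so the combined map reduces to the pure projection `Z @ W`.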
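In step 870, regression analysis is performed on the "user count" and "traffic" indicators in the third training data to obtain the "user count-traffic model".
On the third training data, with the "2G+3G user count" and "4G user count" indicators as the user-count indicators, regression analysis of the user-count indicators against the traffic indicators trains the first prediction model.
In step 880, second training data containing the "traffic" and "resource utilization" indicators is obtained according to the device to be modeled.
Step 880 corresponds to step 760 shown in FIG. 7; for details, refer to the description of FIG. 7, which is not repeated here.
In step 890, the second training data is processed with the "traffic principal component model".
Step 890 corresponds to step 770 shown in FIG. 7; for details, refer to the description of FIG. 7, which is not repeated here.
In step 895, regression analysis is performed on the "traffic" and "resource utilization" indicators in the second training data to obtain the "traffic-resource utilization model".
On the second training data, with the "CPU peak utilization" indicator as the resource utilization indicator, regression analysis of the traffic indicators against the resource utilization indicator trains the "traffic-resource utilization model" (second prediction model).
It should be understood that the dashed arrows in FIG. 8 indicate an indirect influence on another step. For example, the dashed arrow between step 830 and step 860 indicates that the traffic principal component model built in step 830 indirectly influences the execution of step 860. Likewise, the dashed arrow between step 830 and step 890 indicates that the traffic principal component model built in step 830 indirectly influences the execution of step 890.
In the embodiments of this application, if the diversity of the data still does not meet the preset condition after PCA of the second indicator data yields its principal components, a translation transform may be added that converts the non-origin regression in the principal component space of the second indicator data into a regression through the origin (by mapping the origin of the original space to the origin of the mapped space); this is computationally simple and easy to implement.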
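Optionally, in some embodiments, when the first prediction model is built by regression through the origin on the first indicator data in the first training data, feature processing may also be applied to the first indicator data in the first training data. The processing is such that if its input is all zero or nearly all zero, its output is also all zero or nearly all zero. Regression through the origin may then be performed on the processed first indicator data and the second indicator data to build the first prediction model.
In the embodiments of this application, the feature processing of the first indicator data should map every all-zero or near-all-zero input to an all-zero or near-all-zero output, so that it is compatible with the regression constrained through the origin on the first training data.
It should be understood that the method of feature processing of the first indicator data in the first training data is not specifically limited in the embodiments of this application. For example, the dimensionality of the first indicator data in the first training data may be reduced, for instance by PCA. As another example, the first indicator data in the first training data may be standardized. As another example, it may be normalized.
Optionally, in some embodiments, when the first prediction model is built by regression through the origin on the first and second indicator data, if an all-zero or near-all-zero input to the processing of the first indicator data yields an output that is not all zero or near all zero, a translation transform may be determined from that output such that, after the translation, the processing outputs an all-zero or near-all-zero value. Regression through the origin may then be performed on the processed first indicator data and the second indicator data to build the first prediction model.
It should be understood that the method of feature processing of the first indicator data is not specifically limited in the embodiments of this application. For example, the dimensionality of the first indicator data may be reduced, for instance by PCA. As another example, the first indicator data may be standardized. As another example, it may be normalized.
In the embodiments of this application, the feature processing of the first indicator data should map every all-zero or near-all-zero input to an all-zero or near-all-zero output, so that it is compatible with the regression constrained through the origin on the first training data. If the feature processing does not satisfy this requirement, adding a translation transform can make it do so.
Optionally, in some embodiments, building on FIG. 4, PCA may be used to reduce the dimensionality of the first indicator data in the first training data, yielding reduced-dimension features of the first indicator data. This implementation is described in detail below with reference to FIG. 9.
It should be understood that the first indicator data fed into PCA may be the raw values of the first indicator data; as another example, the standardized raw values; as another example, the normalized raw values.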
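The following provides a detailed description in which the first indicator data is the user count, the second indicator data is the traffic, and the third indicator data is the resource utilization.
FIG. 9 is a schematic flowchart of training the first prediction model and the second prediction model according to another embodiment of this application. FIG. 9 includes steps 910 to 970, each of which is described in detail below.
In step 910, the "resource utilization", "user count", and "traffic" indicators to be predicted are determined according to the device to be modeled.
Step 910 corresponds to step 410 shown in FIG. 4; for details, refer to the description of FIG. 4, which is not repeated here.
In step 920, devices that share generality with the device to be modeled are selected according to the cross-device generality of the correlation between the "user count" and "traffic" indicators, yielding device combination list 1. First training data containing the "user count" and "traffic" indicators is then obtained according to device combination list 1.
The first training data in step 920 is obtained in the same way as in step 420 shown in FIG. 4; for details, refer to the description of FIG. 4, which is not repeated here.
In step 930, PCA is performed on the "user count" indicators in the first training data to build the "user count principal component model".
PCA on the "user count" indicators in the first training data yields the "user count principal component model", which reduces the dimensionality of the "user count" indicators.
PCA may be performed on the "2G+3G user count" and "4G user count" indicators in the first training data to obtain the "user count principal component model".
In step 940, the first training data is processed with the "user count principal component model".
The user count principal component model is applied to the "2G+3G user count" and "4G user count" indicators in the first training data to obtain fourth training data.
In step 950, regression analysis is performed on the "user count" and "traffic" indicators in the fourth training data to obtain the "user count-traffic model".
On the fourth training data, with the "Gi interface packet count", "SGi user-plane packet count", and "GW received user-plane packet count" indicators as the traffic indicators and the user count principal components as the user-count indicators, regression analysis of the user-count indicators against the traffic indicators trains the "user count-traffic model" (first prediction model).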
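In step 960, second training data containing the "traffic" and "resource utilization" indicators is obtained according to the device to be modeled.
Given that the traffic indicators are the "Gi interface packet count" and "SGi user-plane packet count" indicators of the UGW network element and the "GW received user-plane packet count" indicator of the SPU instance, and the resource utilization indicator is the "CPU peak utilization" indicator of the SPU instance, the data samples are determined to come from the target device (UGW0, SPU instance 0). The values of the "Gi interface packet count", "SGi user-plane packet count", "GW received user-plane packet count", and "CPU peak utilization" indicators of this device at any time point of the day form one data sample, and these samples make up the second training data.
The second training data may also come from another device combination, such as (UGW0, SPU instance 1); this embodiment uses (UGW0, SPU instance 0).
In step 970, regression analysis is performed on the "traffic" and "resource utilization" indicators in the second training data to obtain the second prediction model.
On the second training data, with the "CPU peak utilization" indicator as the resource utilization indicator, regression analysis of the traffic indicators against the resource utilization indicator trains the second prediction model.
In the embodiments of this application, PCA transforms the first indicator data into mutually uncorrelated principal components, which avoids the computational difficulties caused by collinearity of the first indicator data.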
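To illustrate the collinearity point, the sketch below generates two strongly correlated user-count columns (synthetic data) and shows that their principal component scores are uncorrelated. It computes PCA from an eigendecomposition of the covariance matrix, which is one standard way to do it; note that principal components are uncorrelated, not necessarily statistically independent.

```python
import numpy as np

# Synthetic, strongly collinear user counts: 4G users track 2G+3G users closely.
rng = np.random.default_rng(6)
users_2g3g = rng.uniform(1e4, 5e4, 300)
users_4g = 1.5 * users_2g3g + rng.normal(0, 300, 300)
users = np.column_stack([users_2g3g, users_4g])

# PCA via eigendecomposition of the covariance matrix (a sketch; any PCA routine works).
mean = users.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(users.T))   # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
scores = (users - mean) @ eigvecs                     # user count principal components

raw_corr = np.corrcoef(users.T)[0, 1]
pc_corr = np.corrcoef(scores.T)[0, 1]
print(f"raw correlation {raw_corr:.4f}, principal-component correlation {pc_corr:.1e}")
print("variance share of first component:", eigvals[0] / eigvals.sum())
```

Since almost all of the variance sits in the first component, keeping only that component removes the near-singular direction that would otherwise make the regression ill-conditioned.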
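Optionally, in some embodiments, PCA may be performed on both the first indicator data and the second indicator data.
It should be understood that PCA of the first indicator data may be added on the basis of FIG. 7 or FIG. 8.
Optionally, an embodiment of this application provides a prediction method in which the to-be-predicted first indicator data of the target device is obtained, and the predicted third indicator data is then obtained through the first prediction model and the second prediction model.
For the methods of training the first and second prediction models, refer to the training methods described above; they are not repeated here.
The prediction method and the training method provided in the embodiments of this application have been described in detail above with reference to FIG. 1 to FIG. 9. The apparatus embodiments provided in the embodiments of this application are described in detail below with reference to FIG. 10 to FIG. 13.
FIG. 10 is a schematic diagram of a training apparatus according to an embodiment of this application. The training apparatus 1000 in FIG. 10 can perform the training method described in any of the embodiments of FIG. 1 to FIG. 9.
The training apparatus 1000 in FIG. 10 may include: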
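a first obtaining module 1001, configured to obtain first training data and second training data,
where the first training data includes first indicator data and second indicator data of multiple devices, the second training data includes second indicator data and third indicator data of a target device, and the target device is any one of the multiple devices;
a first training module 1002, configured to train a first prediction model based on the first training data,
where the first prediction model is used to predict the second indicator data of the target device from the first indicator data of the target device; and
a second training module 1003, configured to train a second prediction model based on the second training data,
where the second prediction model is used to predict the third indicator data of the target device from the second indicator data of the target device obtained by the first prediction model.
Optionally, in some embodiments, the first indicator data is a user count, the second indicator data is traffic, and the third indicator data is resource utilization.
Optionally, in some embodiments, the apparatus 1000 further includes:
a second obtaining module 1004, configured to obtain to-be-predicted first indicator data of the target device;
a first determining module 1005, configured to input the to-be-predicted first indicator data into the first prediction model to obtain predicted second indicator data of the target device; and
a second determining module 1006, configured to input the predicted second indicator data into the second prediction model to obtain a prediction result of the target device,
where the prediction model includes the first prediction model and the second prediction model, the first prediction model is trained on the first training data, and the second prediction model is trained on the second training data.
Optionally, in some embodiments, the first training module 1002 is specifically configured to: perform PCA on the second indicator data in the first training data to obtain a PCA model; reduce the dimensionality of the first training data according to the PCA model to obtain dimensionality-reduced third training data; and train the first prediction model based on the third training data.
Optionally, in some embodiments, the first training module 1002 is specifically configured to obtain the first prediction model by performing regression analysis on the first training data.
Optionally, in some embodiments, the first training module 1002 is specifically configured to: when the diversity of the first training data meets a preset condition, perform regression not through the origin on the first training data; and when the diversity of the first training data does not meet the preset condition, perform regression through the origin on the first training data.
Optionally, in some embodiments, the second training module 1003 is specifically configured to obtain the second prediction model by performing regression analysis on the second training data.
Optionally, in some embodiments, the second training module 1003 is specifically configured to obtain the second prediction model by performing quantile regression analysis on the second training data.
Optionally, in some embodiments, the indicator relationship between the first indicator data and the second indicator data is consistent across the multiple devices.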
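FIG. 11 is a schematic diagram of a prediction apparatus according to an embodiment of this application. The prediction apparatus 1100 in FIG. 11 may be configured to perform the prediction method in the second aspect or any possible implementation of the second aspect. The prediction apparatus 1100 in FIG. 11 may include:
a first obtaining module 1101, configured to obtain to-be-predicted first indicator data of a target device;
a first determining module 1102, configured to input the to-be-predicted first indicator data into a first prediction model to obtain predicted second indicator data of the target device; and
a second determining module 1103, configured to input the predicted second indicator data into a second prediction model to obtain a prediction result of the target device,
where the prediction model includes the first prediction model and the second prediction model, the first prediction model is trained on first training data, the second prediction model is trained on second training data, the first training data includes first indicator data and second indicator data of multiple devices, the second training data includes second indicator data and third indicator data of the target device, and the multiple devices include the target device.
Optionally, in some embodiments, the first indicator data is a user count, the second indicator data is traffic, and the third indicator data is resource utilization.
Optionally, in some embodiments, the apparatus 1100 further includes:
a second obtaining module 1104, configured to obtain the first training data; and
a first training module 1105, configured to train the first prediction model based on the first training data,
where the first prediction model is used to predict the second indicator data of the target device from the first indicator data of the target device.
Optionally, in some embodiments, the first training module 1105 is specifically configured to:
perform PCA on the second indicator data in the first training data to obtain a PCA model; reduce the dimensionality of the first training data according to the PCA model to obtain dimensionality-reduced third training data; and train the first prediction model based on the third training data.
Optionally, in some embodiments, the first training module 1105 is specifically configured to obtain the first prediction model by performing regression analysis on the first training data.
Optionally, in some embodiments, the first training module 1105 is specifically configured to: when the diversity of the first training data meets a preset condition, perform regression not through the origin on the first training data; and when the diversity of the first training data does not meet the preset condition, perform regression through the origin on the first training data.
Optionally, in some embodiments, the apparatus 1100 further includes:
a third obtaining module 1106, configured to obtain the second training data; and
a second training module 1107, configured to train the second prediction model based on the second training data,
where the second prediction model is used to predict the third indicator data of the target device from the second indicator data of the target device obtained by the first prediction model.
Optionally, in some embodiments, the second training module 1107 is specifically configured to obtain the second prediction model by performing regression analysis on the second training data.
Optionally, in some embodiments, the second training module 1107 is specifically configured to obtain the second prediction model by performing quantile regression analysis on the second training data.
Optionally, in some embodiments, the indicator relationship between the first indicator data and the second indicator data is consistent across the multiple devices.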
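FIG. 12 is a schematic structural diagram of a training apparatus according to an embodiment of this application. The training apparatus 1200 in FIG. 12 can perform the training method described in any of the embodiments of FIG. 1 to FIG. 9. The training apparatus 1200 in FIG. 12 may include a memory 1201 and a processor 1202. The memory 1201 may be configured to store a program, and the processor 1202 may be configured to execute the program stored in the memory. When the program stored in the memory 1201 is executed, the processor 1202 may be configured to perform the training method described in any of the foregoing embodiments.
The processor 1202 may be, for example, a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It can implement or execute the various example logical blocks, modules, and circuits described with reference to the content disclosed in this application. The processor may also be a combination that implements a computing function, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
Correspondingly, the memory 1201 may be configured to store the program code and data of the apparatus for modeling the numerical relationship between the user-count indicator and the resource utilization indicator. The memory 1201 may be a storage unit inside the processor 1202, an external storage unit independent of the processor 1202, or a component that includes both a storage unit inside the processor 1202 and an external storage unit independent of the processor 1202.
FIG. 13 is a schematic structural diagram of a prediction apparatus according to an embodiment of this application. The prediction apparatus 1300 in FIG. 13 may be configured to perform the prediction method in the second aspect or any possible implementation of the second aspect. The prediction apparatus 1300 in FIG. 13 may include a memory 1301 and a processor 1302. The memory 1301 may be configured to store a program, and the processor 1302 may be configured to execute the program stored in the memory. When the program stored in the memory 1301 is executed, the processor 1302 may be configured to perform the prediction method described in any of the foregoing embodiments.
The processor 1302 may be, for example, a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It can implement or execute the various example logical blocks, modules, and circuits described with reference to the content disclosed in this application. The processor may also be a combination that implements a computing function, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
Correspondingly, the memory 1301 may be configured to store the program code and data of the apparatus for modeling the numerical relationship between the user-count indicator and the resource utilization indicator. The memory 1301 may be a storage unit inside the processor 1302, an external storage unit independent of the processor 1302, or a component that includes both a storage unit inside the processor 1302 and an external storage unit independent of the processor 1302.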
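An embodiment of this application provides a computer-readable storage medium including computer instructions. When the computer instructions are run on a training apparatus, the training apparatus is caused to perform the training method according to the first aspect or any implementation of the first aspect.
An embodiment of this application provides a computer-readable storage medium including computer instructions. When the computer instructions are run on a prediction apparatus, the prediction apparatus is caused to perform the prediction method according to the second aspect or any implementation of the second aspect.
An embodiment of this application provides a chip including a memory and a processor. The memory is configured to store a program, and the processor is configured to execute the program stored in the memory; when the program is executed, the processor performs the method according to the first aspect or any implementation of the first aspect.
An embodiment of this application provides a chip including a memory and a processor. The memory is configured to store a program, and the processor is configured to execute the program stored in the memory; when the program is executed, the processor performs the method according to the second aspect or any implementation of the second aspect.
An embodiment of this application provides a computer program product. When the computer program product is run on a computer, the computer is caused to perform the method according to the first aspect or any implementation of the first aspect.
An embodiment of this application provides a computer program product. When the computer program product is run on a computer, the computer is caused to perform the method according to the second aspect or any implementation of the second aspect.
It should be understood that, in the embodiments of this application, the term "and/or" merely describes an association between associated objects and indicates that three relationships may exist. For example, A and/or B may indicate three cases: only A exists, both A and B exist, or only B exists. In addition, the character "/" in this specification generally indicates an "or" relationship between the associated objects.
All or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used, the embodiments may be implemented completely or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of this application are generated completely or partially. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, over a coaxial cable, an optical fiber, or a digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device, such as a server or a data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital video disc (DVD)), a semiconductor medium (for example, a solid state disk (SSD)), or the like.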
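A person of ordinary skill in the art may be aware that the units and algorithm steps in the examples described with reference to the embodiments disclosed in this specification can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the particular application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementation should not be considered to go beyond the scope of this application.
A person skilled in the art may clearly understand that, for convenience and brevity of description, for the detailed working processes of the foregoing system, apparatus, and units, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the described apparatus embodiments are merely examples. The division into units is merely a logical function division, and there may be other divisions in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings, direct couplings, or communication connections may be implemented through some interfaces, and the indirect couplings or communication connections between apparatuses or units may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit.
When the functions are implemented in the form of a software functional unit and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the prior art, or some of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily figured out by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.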

Claims (18)

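  1. A method for modeling a numerical relationship between a tenant count indicator and a resource consumption indicator, comprising:
    performing first regression analysis on a first data set, which describes a numerical relationship between features of the tenant count indicator and features of a service usage indicator, to obtain a first prediction model;
    performing second regression analysis on a second data set, which describes a numerical relationship between features of the service usage indicator and features of the resource consumption indicator, to obtain a second prediction model;
    wherein each data sample in the first data set corresponds to values of the tenant count indicator and the service usage indicator of a device combination under a certain condition; the raw values of the tenant count indicator of certain devices in the device combination are used directly as the features of the tenant count indicator of the data sample, or are input into first feature processing whose output values are used as those features; and the raw values of the service usage indicator of certain devices in the device combination are used directly as the features of the service usage indicator of the data sample, or are input into second feature processing whose output values are used as those features;
    the set of device combinations corresponding to all data samples in the first data set is not a singleton: the data set contains at least one pair of data samples and at least one tenant count indicator for which the raw values of that indicator in the two data samples are taken from two different devices;
    each data sample in the second data set corresponds to values of the service usage indicator and the resource consumption indicator of a device combination under a certain condition; the raw values of the service usage indicator of certain devices in the device combination are used directly as the features of the service usage indicator of the data sample, or are input into the second feature processing whose output values are used as those features; and the raw values of the resource consumption indicator of certain devices in the device combination are used directly as the features of the resource consumption indicator of the data sample, or are input into third feature processing whose output values are used as those features.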
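  2. The method according to claim 1, wherein the service usage indicator is determined based on the tenant count indicator and the service usage indicator.
  3. The method according to claim 1, wherein, in the first data set, the load distribution relationship between the devices providing the raw values of the tenant count indicator and the devices providing the raw values of the service usage indicator is similar across different data samples.
  4. The method according to claim 1 or 2, wherein the second feature processing outputs an all-zero or near-all-zero value when its input value is all zero or near all zero.
  5. The method according to claim 4, wherein the second feature processing comprises a first translation transform, the first translation transform being determined by the following step:
    performing part of the second feature processing on an all-zero or near-all-zero input value, and determining the first translation transform from the output value of the partial processing.
  6. The method according to claim 1, wherein the first regression analysis comprises:
    performing, on the first data set, regression analysis constrained to pass through the origin on the features of the tenant count indicator and the features of the service usage indicator.
  7. The method according to claim 6, wherein the performing, on the first data set, regression analysis constrained to pass through the origin on the features of the tenant count indicator and the features of the service usage indicator comprises:
    when the diversity of the tenant count indicator on the first data set does not meet a preset condition, performing, on the first data set, regression analysis constrained to pass through the origin on the features of the tenant count indicator and the features of the service usage indicator, to obtain the first prediction model.
  8. The method according to claim 1, wherein the first regression analysis comprises:
    when the diversity of the tenant count indicator on the first data set meets a preset condition, performing, on the first data set, regression analysis not constrained to pass through the origin on the features of the tenant count indicator and the features of the service usage indicator, to obtain the first prediction model.
  9. The method according to claim 1, wherein the second feature processing comprises:
    performing first dimensionality-reduction mapping on certain service usage indicators of the devices in the first data set to obtain the features of the service usage indicator.
  10. The method according to claim 9, wherein the first dimensionality-reduction mapping comprises:
    performing feature processing using a service usage principal component model, the service usage principal component model being determined by the following step:
    performing principal component analysis on a third data set describing the numerical relationship among features of certain service usage indicators, to obtain the service usage principal component model.
  11. The method according to claim 1, wherein the first feature processing outputs an all-zero or near-all-zero value when its input value is all zero or near all zero.
  12. The method according to claim 11, wherein the first feature processing comprises a second translation transform, the second translation transform being determined by the following step:
    performing part of the first feature processing on an all-zero or near-all-zero input value, and determining the second translation transform from the output value of the partial processing.
  13. The method according to claim 1, wherein the first feature processing comprises:
    performing second dimensionality-reduction mapping on certain tenant count indicators of the devices in the first data set to obtain the features of the tenant count indicator.
  14. The method according to claim 13, wherein the second dimensionality-reduction mapping comprises:
    performing feature processing using a tenant count principal component model, the tenant count principal component model being determined by the following step:
    performing principal component analysis on a fourth data set describing the numerical relationship among features of the tenant count indicators, to obtain the tenant count principal component model.
  15. The method according to claim 1, wherein the second regression analysis comprises:
    performing, on the second data set, quantile regression analysis on the features of the service usage indicator and the features of the resource consumption indicator, to obtain the second prediction model.
  16. An apparatus for modeling a numerical relationship between a tenant count indicator and a resource consumption indicator, comprising modules configured to perform the method according to any one of claims 1 to 15.
  17. An apparatus for modeling a numerical relationship between a tenant count indicator and a resource consumption indicator, comprising a memory and a processor, wherein
    the memory is configured to store a program; and
    the processor is configured to execute the program stored in the memory, and when the program is executed, the processor performs the method according to any one of claims 1 to 15.
  18. A computer-readable storage medium, comprising computer instructions, wherein when the computer instructions are run on the apparatus for modeling the numerical relationship between the tenant count indicator and the resource consumption indicator, the apparatus is caused to perform the method according to any one of claims 1 to 15.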
PCT/CN2019/087185 2018-05-18 2019-05-16 Prediction method, training method, apparatus, and computer storage medium WO2019219052A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/020,361 US20200410376A1 (en) 2018-05-18 2020-09-14 Prediction method, training method, apparatus, and computer storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810481548.0A CN109905271B (zh) 2018-05-18 2018-05-18 Prediction method, training method, apparatus, and computer storage medium
CN201810481548.0 2018-05-18

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/020,361 Continuation US20200410376A1 (en) 2018-05-18 2020-09-14 Prediction method, training method, apparatus, and computer storage medium

Publications (1)

Publication Number Publication Date
WO2019219052A1 true WO2019219052A1 (zh) 2019-11-21

Family

ID=66943109

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/087185 WO2019219052A1 (zh) 2018-05-18 2019-05-16 一种预测方法、训练方法、装置及计算机存储介质

Country Status (3)

Country Link
US (1) US20200410376A1 (zh)
CN (1) CN109905271B (zh)
WO (1) WO2019219052A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200042937A1 (en) * 2018-08-06 2020-02-06 Walmart Apollo, Llc System and method for item category footage recommendation
WO2020033408A1 (en) 2018-08-06 2020-02-13 Walmart Apollo, Llc System and method for item facing recommendation
CN110798227B (zh) * 2019-09-19 2023-07-25 平安科技(深圳)有限公司 模型预测优化方法、装置、设备及可读存储介质
US20220180244A1 (en) * 2020-12-08 2022-06-09 Vmware, Inc. Inter-Feature Influence in Unlabeled Datasets
CN112766596B (zh) * 2021-01-29 2024-04-16 苏州思萃融合基建技术研究所有限公司 建筑能耗预测模型的构建方法、能耗预测方法及装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6266532B1 (en) * 1999-12-29 2001-07-24 Bellsouth Intellectual Property Management Corporation Method and apparatus for determining the optimal number of analog and digital radios in a dual-mode wireless network
CN101888650A (zh) * 2010-06-28 2010-11-17 中兴通讯股份有限公司 机器对机器业务接入容量的确定方法及系统
CN104901827A (zh) * 2014-03-07 2015-09-09 中国移动通信集团安徽有限公司 一种基于用户业务结构的网络资源评估方法及装置
CN105472631A (zh) * 2014-09-02 2016-04-06 中兴通讯股份有限公司 一种业务数据量和/或资源数据量的预测方法及预测系统
CN107943861A (zh) * 2017-11-09 2018-04-20 北京众荟信息技术股份有限公司 一种基于时间序列的缺失数据补充方法与系统

Family Cites Families (56)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7768496B2 (en) * 2004-12-02 2010-08-03 Sharp Laboratories Of America, Inc. Methods and systems for image tonescale adjustment to compensate for a reduced source light power level
US7536373B2 (en) * 2006-02-14 2009-05-19 International Business Machines Corporation Resource allocation using relational fuzzy modeling
JP4859558B2 (ja) * 2006-06-30 2012-01-25 株式会社日立製作所 コンピュータシステムの制御方法及びコンピュータシステム
CN101689050B (zh) * 2007-03-12 2014-03-12 艾默生过程管理电力和水力解决方案有限公司 发电厂性能监测中统计分析的使用
JP5598464B2 (ja) * 2009-03-25 2014-10-01 日本電気株式会社 基地局、基地局の制御方法、制御プログラム、および移動局
US8560283B2 (en) * 2009-07-10 2013-10-15 Emerson Process Management Power And Water Solutions, Inc. Methods and apparatus to compensate first principle-based simulation models
CN101715197B (zh) * 2009-11-19 2011-12-28 北京邮电大学 一种无线网络中多用户混合业务的容量规划方法
US8539078B2 (en) * 2010-07-08 2013-09-17 International Business Machines Corporation Isolating resources between tenants in a software-as-a-service system using the estimated costs of service requests
US8539018B2 (en) * 2010-11-04 2013-09-17 International Business Machines Corporation Analysis of IT resource performance to business organization
US9060269B2 (en) * 2010-12-15 2015-06-16 At&T Intellectual Property I, L.P. Optimization of cellular network architecture based on device type-specific traffic dynamics
US8209274B1 (en) * 2011-05-09 2012-06-26 Google Inc. Predictive model importation
JP5874292B2 (ja) * 2011-10-12 2016-03-02 ソニー株式会社 情報処理装置、情報処理方法、及びプログラム
CN103491556B (zh) * 2012-06-13 2017-06-20 华为技术服务有限公司 一种网络调整的方法及装置
US20150263925A1 (en) * 2012-10-05 2015-09-17 Telefonaktiebolaget L M Ericsson (Publ) Method and apparatus for ranking users within a network
US10531251B2 (en) * 2012-10-22 2020-01-07 United States Cellular Corporation Detecting and processing anomalous parameter data points by a mobile wireless data network forecasting system
US9288716B2 (en) * 2012-11-30 2016-03-15 At&T Mobility Ii Llc Resource management in a wireless communications network
US10782402B2 (en) * 2013-12-06 2020-09-22 Siemens Mobility GmbH Method for determining a position of at least two sensors, and sensor network
US20150244645A1 (en) * 2014-02-26 2015-08-27 Ca, Inc. Intelligent infrastructure capacity management
US11909616B2 (en) * 2014-04-08 2024-02-20 Eino, Inc. Mobile telecommunications network capacity simulation, prediction and planning
US9451473B2 (en) * 2014-04-08 2016-09-20 Cellco Partnership Analyzing and forecasting network traffic
US9204319B2 (en) * 2014-04-08 2015-12-01 Cellco Partnership Estimating long term evolution network capacity and performance
US9832138B1 (en) * 2014-04-16 2017-11-28 Google Llc Method for automatic management capacity and placement for global services
US20150347940A1 (en) * 2014-05-27 2015-12-03 Universita Degli Studi Di Modena E Reggio Emilia Selection of optimum service providers under uncertainty
US20220188700A1 (en) * 2014-09-26 2022-06-16 Bombora, Inc. Distributed machine learning hyperparameter optimization
CN105574601A (zh) * 2014-10-25 2016-05-11 胡峻源 用于移动话务统计的回归模型建模方法
US9886948B1 (en) * 2015-01-05 2018-02-06 Amazon Technologies, Inc. Neural network processing of multiple feature streams using max pooling and restricted connectivity
JP2016212547A (ja) * 2015-05-01 2016-12-15 富士通株式会社 情報提供プログラム、情報提供装置、及び情報提供方法
US9693355B2 (en) * 2015-07-21 2017-06-27 Verizon Patent And Licensing Inc. Methods and systems for profiling network resource usage by a mobile application
US11423323B2 (en) * 2015-09-02 2022-08-23 Qualcomm Incorporated Generating a sparse feature vector for classification
US10116521B2 (en) * 2015-10-15 2018-10-30 Citrix Systems, Inc. Systems and methods for determining network configurations using historical real-time network metrics data
CN105225020A (zh) * 2015-11-11 2016-01-06 国家电网公司 一种基于bp神经网络算法的运行状态预测方法和系统
US10559044B2 (en) * 2015-11-20 2020-02-11 Opower, Inc. Identification of peak days
US10659542B2 (en) * 2016-04-27 2020-05-19 NetSuite Inc. System and methods for optimal allocation of multi-tenant platform infrastructure resources
US20170316338A1 (en) * 2016-04-29 2017-11-02 Hewlett Packard Enterprise Development Lp Feature vector generation
US9756518B1 (en) * 2016-05-05 2017-09-05 Futurewei Technologies, Inc. Method and apparatus for detecting a traffic suppression turning point in a cellular network
US10217060B2 (en) * 2016-06-09 2019-02-26 The Regents Of The University Of California Capacity augmentation of 3G cellular networks: a deep learning approach
US10114440B2 (en) * 2016-06-22 2018-10-30 Razer (Asia-Pacific) Pte. Ltd. Applying power management based on a target time
US10726356B1 (en) * 2016-08-01 2020-07-28 Amazon Technologies, Inc. Target variable distribution-based acceptance of machine learning test data sets
US20180060458A1 (en) * 2016-08-30 2018-03-01 Futurewei Technologies, Inc. Using the information of a dependent variable to improve the performance in learning the relationship between the dependent variable and independent variables
US10942776B2 (en) * 2016-09-21 2021-03-09 Accenture Global Solutions Limited Dynamic resource allocation for application containers
US10372146B2 (en) * 2016-10-21 2019-08-06 Johnson Controls Technology Company Systems and methods for creating and using combined predictive models to control HVAC equipment
US10409642B1 (en) * 2016-11-22 2019-09-10 Amazon Technologies, Inc. Customer resource monitoring for versatile scaling service scaling policy recommendations
US10250950B2 (en) * 2017-01-03 2019-04-02 Synamedia Limited Method and device for determining redress measures for TV service outages based on impact analysis
US11093838B2 (en) * 2017-05-10 2021-08-17 Microsoft Technology Licensing, Llc Adaptive selection of user to database mapping
US11424993B1 (en) * 2017-05-30 2022-08-23 Amazon Technologies, Inc. Artificial intelligence system for network traffic flow based detection of service usage policy violations
CN107688872A (zh) * 2017-08-20 2018-02-13 平安科技(深圳)有限公司 Prediction model building apparatus and method, and computer-readable storage medium
US20190087389A1 (en) * 2017-09-18 2019-03-21 Elutions IP Holdings S.à.r.l. Systems and methods for configuring display layout
US10430898B2 (en) * 2017-12-06 2019-10-01 NAD Grid Corp Method and system for facilitating electricity services
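CN107992200B (zh) * 2017-12-21 2021-03-26 爱驰汽车有限公司 Picture compensation method and apparatus for a vehicle-mounted display screen, and electronic device
CN108009692A (zh) * 2017-12-26 2018-05-08 东软集团股份有限公司 Equipment maintenance information processing method and apparatus, computer device, and storage medium
CN107992906A (zh) * 2018-01-02 2018-05-04 联想(北京)有限公司 Model processing method and system, terminal device, and server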
US10380753B1 (en) * 2018-05-30 2019-08-13 Aimotive Kft. Method and apparatus for generating a displacement map of an input dataset pair
US11550635B1 (en) * 2019-03-28 2023-01-10 Amazon Technologies, Inc. Using delayed autocorrelation to improve the predictive scaling of computing resources
US20200394455A1 (en) * 2019-06-15 2020-12-17 Paul Lee Data analytics engine for dynamic network-based resource-sharing
JP2021182314A (ja) * 2020-05-20 2021-11-25 富士通株式会社 Determination method and determination program
US11678001B2 (en) * 2020-06-23 2023-06-13 Roku, Inc. Modulating a quality of media content

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6266532B1 (en) * 1999-12-29 2001-07-24 Bellsouth Intellectual Property Management Corporation Method and apparatus for determining the optimal number of analog and digital radios in a dual-mode wireless network
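CN101888650A (zh) * 2010-06-28 2010-11-17 中兴通讯股份有限公司 Method and system for determining machine-to-machine service access capacity
CN104901827A (zh) * 2014-03-07 2015-09-09 中国移动通信集团安徽有限公司 Network resource evaluation method and apparatus based on user service structure
CN105472631A (zh) * 2014-09-02 2016-04-06 中兴通讯股份有限公司 Method and system for predicting service data volume and/or resource data volume
CN107943861A (zh) * 2017-11-09 2018-04-20 北京众荟信息技术股份有限公司 Time-series-based method and system for filling in missing data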

Also Published As

Publication number Publication date
CN109905271B (zh) 2021-01-12
CN109905271A (zh) 2019-06-18
US20200410376A1 (en) 2020-12-31

Similar Documents

Publication Publication Date Title
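WO2019219052A1 (zh) Prediction method, training method, apparatus, and computer storage medium
CN110880984B (zh) Model-based traffic anomaly monitoring method, apparatus, device, and storage medium
WO2020135806A1 (zh) Operation and maintenance method and operation and maintenance device applied to a data center
WO2019062405A1 (zh) Application program processing method and apparatus, storage medium, and electronic device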
US9088540B1 (en) Processing data formatted for efficient communication over a network
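CN111274256B (zh) Resource management and control method, apparatus, device, and storage medium based on a time-series database
WO2017113774A1 (zh) Method and apparatus for determining user priority in a wireless communication system
WO2022021834A1 (zh) Neural network model determination method and apparatus, electronic device, medium, and product
CN111985831A (zh) Cloud computing resource scheduling method and apparatus, computer device, and storage medium
CN113986564A (zh) Application data traffic monitoring method and apparatus, computer device, and medium
CN113485792A (zh) Pod scheduling method in a Kubernetes cluster, terminal device, and storage medium
CN114493376A (zh) Task scheduling management method and system based on work order data
CN110598093A (zh) Business rule management method and apparatus
WO2019062404A1 (zh) Application program processing method and apparatus, storage medium, and electronic device
CN114172819A (zh) Method and system for predicting required resources of NFV network elements, electronic device, and storage medium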
US11250365B2 (en) Systems and methods for utilizing compliance drivers to conserve system resources and reduce compliance violations
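CN113010122A (zh) Image forming apparatus monitoring device, method, system, and storage medium
CN109669836B (zh) Intelligent IT operation and maintenance analysis method, apparatus, device, and readable storage medium
WO2020000724A1 (zh) Method for processing communication load between cloud platform hosts, electronic apparatus, and medium
JP4934660B2 (ja) Communication bandwidth calculation method and apparatus, and traffic management method
CN113609126B (zh) Integrated storage management method and system for crowdsourced spatio-temporal data
CN110888733A (zh) Cluster resource usage processing method and apparatus, and electronic device
CN114493358A (zh) Indicator decomposition method and apparatus, and electronic device
JP4823258B2 (ja) Communication bandwidth calculation method and apparatus, and traffic management method
CN113764109A (zh) Infectious disease transmission scale prediction method and apparatus, medium, and electronic device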

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 19802859
Country of ref document: EP
Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 EP: PCT application non-entry in European phase
Ref document number: 19802859
Country of ref document: EP
Kind code of ref document: A1