WO2021052394A1 - Model training method, device and system - Google Patents

Model training method, device and system

Info

Publication number
WO2021052394A1
WO2021052394A1 PCT/CN2020/115770 CN2020115770W WO2021052394A1 WO 2021052394 A1 WO2021052394 A1 WO 2021052394A1 CN 2020115770 W CN2020115770 W CN 2020115770W WO 2021052394 A1 WO2021052394 A1 WO 2021052394A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
machine learning
learning model
training sample
sample set
Prior art date
Application number
PCT/CN2020/115770
Other languages
English (en)
French (fr)
Inventor
薛莉
张彦芳
张�浩
张亮
李扬
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP20866562.0A (patent EP4024261A4)
Publication of WO2021052394A1
Priority to US17/696,593 (patent US20220207434A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G06F11/3428 Benchmarking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9027 Trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2433 Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection

Definitions

  • This application relates to the field of Artificial Intelligence (AI), and in particular to a model training method, device and system.
  • AI Artificial Intelligence
  • Machine learning refers to having a machine train a machine learning model on training samples, so that the model can predict categories for samples other than the training samples.
  • the data analysis system includes multiple analysis devices for data analysis.
  • the multiple analysis devices may include cloud analysis devices and site analysis devices.
  • the machine learning model is deployed in the system as follows: the cloud analysis device trains the model offline, and the offline-trained model is then deployed directly on the site analysis device.
  • the trained model may not adapt effectively to the requirements of the site analysis device.
  • the embodiments of the present application provide a model training method, device, and system.
  • the technical solution is as follows:
  • a model training method is provided, which is applied to a site analysis device and includes:
  • the first analysis device is a cloud analysis device
  • the machine learning model is incrementally trained based on the first training sample set, where the feature data in the first training sample set is feature data of the site network corresponding to the site analysis device.
  • the feature data in the first training sample set is obtained from the site network corresponding to the site analysis device, so it better fits the application scenario of the site analysis device.
  • training the model on a first training sample set made up of feature data obtained from the corresponding site network makes the trained machine learning model better suited to the needs of the site analysis device itself, which customizes the model and improves the flexibility of its application;
  • the machine learning model is trained through a combination of offline training and incremental training, so it can be incrementally trained when the type or pattern of the feature data obtained by the site analysis device changes, which allows the machine learning model to be adjusted flexibly and ensures that the trained model meets the requirements of the site analysis device.
  • therefore, compared with related technologies, the model training method provided in the embodiments of the present application can effectively adapt to the requirements of the site analysis device.
  • the method further includes:
  • the method further includes: sending prediction information to the evaluation device, the prediction information including a classification result obtained by prediction, so that the evaluation device can evaluate, based on the prediction information, whether the machine learning model has degraded.
  • in one example, the site analysis device may send prediction information to the evaluation device each time the machine learning model predicts a classification result, the prediction information including that classification result; in another example, the site analysis device may send prediction information to the evaluation device periodically, the prediction information including the classification results obtained in the current period; in another example, the site analysis device may send prediction information to the evaluation device once the number of classification results obtained reaches a number threshold, the prediction information including the obtained classification results; in another example, the site analysis device may send prediction information to the evaluation device within a set time period, the prediction information including the classification results obtained so far, which avoids interfering with user services.
  • the incremental training of the machine learning model based on the first training sample set includes:
  • after the training instruction sent by the evaluation device is received, the machine learning model is incrementally trained based on the first training sample set, where the training instruction instructs that the machine learning model be trained.
  • the machine learning model is used to predict the classification result of to-be-predicted data composed of one or more pieces of key performance indicator (KPI) feature data;
  • the KPI feature data is feature data of a KPI time series, or is KPI data;
  • the prediction information also includes the KPI category corresponding to the KPI feature data in the data to be predicted, the identifier of the device to which the data to be predicted belongs, and the collection time of the KPI data corresponding to the data to be predicted.
  • the method further includes:
  • a retraining request is sent to the first analysis device, where the retraining request is used to request the first analysis device to retrain the machine learning model.
  • the machine learning model is a tree model
  • the incremental training of the machine learning model based on the first training sample set includes:
  • the first node is any non-leaf node in the machine learning model.
  • the second node is the parent node or the child node of the first node;
  • the nodes in the subtree of the first node are traversed, each traversed node is determined as the new first node, and the traversal process is performed again until the current split cost of the traversed first node is less than the historical split cost of that first node, or until the traversal reaches the target depth;
  • the current split cost of the first node is the cost of splitting the first node based on the first training sample
  • the first training sample is any training sample in the first training sample set
  • a training sample includes feature data of one or more feature dimensions
  • the feature data is numerical data
  • the historical split cost of the first node is the cost of splitting the first node based on the historical training sample set of the first node, where the historical training sample set of the first node is the set of samples, in the historical training sample set of the machine learning model, that were divided into the first node.
  • the current split cost of the first node is negatively correlated with the size of a first numerical distribution range
  • the first numerical distribution range is a distribution range determined based on the feature values in the first training sample and the second numerical distribution range
  • the second numerical distribution range is the distribution range of feature values in the historical training sample set of the first node
  • the historical split cost of the first node is negatively correlated with the size of the second numerical distribution range
  • the current split cost of the first node is the reciprocal of the sum of the spans of feature values in each feature dimension in the first numerical distribution range
  • the historical split cost of the first node is the reciprocal of the sum of the spans of feature values in each feature dimension in the second numerical distribution range.
  • node splitting is performed based on the numerical distribution range of the training sample set, without extensive access to historical training samples, which effectively reduces memory and computing resource usage and lowers the training cost.
  • the machine learning model can be made lightweight, which makes it easier to deploy and allows the model to generalize effectively.
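The split-cost definitions above can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the text says the first numerical distribution range is determined from the first training sample and the second (historical) range, and the code assumes this means widening the historical range just enough to contain the new sample. The names `extend_range` and `split_cost`, and the handling of a zero-width range, are hypothetical.

```python
from typing import List, Sequence, Tuple

# One (min, max) pair per feature dimension.
Range = List[Tuple[float, float]]


def extend_range(historical: Range, sample: Sequence[float]) -> Range:
    """Assumed construction of the first numerical distribution range:
    the historical range widened to include the sample's feature values."""
    return [(min(lo, x), max(hi, x)) for (lo, hi), x in zip(historical, sample)]


def split_cost(value_range: Range) -> float:
    """Reciprocal of the sum of the per-dimension spans of the range."""
    total_span = sum(hi - lo for lo, hi in value_range)
    if total_span == 0:
        # Assumption: a zero-width range is treated as infinitely costly.
        return float("inf")
    return 1.0 / total_span


historical_range = [(0.0, 2.0), (10.0, 12.0)]  # second numerical distribution range
new_sample = [3.0, 11.0]                        # first training sample
current_range = extend_range(historical_range, new_sample)

historical_cost = split_cost(historical_range)  # 1 / (2 + 2)
current_cost = split_cost(current_range)        # 1 / (3 + 2)
```

In this example the new sample falls outside the historical range, so the range grows and the current split cost drops below the historical split cost, which is the situation in which the traversal described above stops at the first node.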
  • the adding of the associated second node includes:
  • the second node is added based on the first splitting point on the first splitting dimension, where the value range in the first numerical distribution range whose value in the first splitting dimension is not greater than the value of the first splitting point is divided to the left child node of the second node,
  • the value range in the first numerical distribution range whose value in the first splitting dimension is greater than the value of the first splitting point is divided to the right child node of the second node,
  • the first splitting dimension is a splitting dimension selected from the feature dimensions based on the span of feature values in each feature dimension, and the first splitting point is a numerical point for splitting determined on the first splitting dimension within the first numerical distribution range;
  • the second node is the parent node or child node of the first node
  • the second splitting dimension is the historical splitting dimension of the first node in the machine learning model, and the second splitting point is the historical splitting point of the first node in the machine learning model
  • when the first splitting dimension is the same as the second splitting dimension and the first splitting point is located on the right side of the second splitting point, the second node is the parent node of the first node, and the first node is the left child node of the second node;
  • the second node is a left child node of the first node.
  • the first splitting dimension is a feature dimension randomly selected among the feature dimensions of the first numerical distribution range, or the first splitting dimension is the feature dimension with the largest span among the feature dimensions of the first numerical distribution range;
  • the first splitting point is a numerical point randomly selected on the first splitting dimension of the first numerical distribution range.
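A minimal Python sketch of the two selection rules and the range division described above follows. The function names, the use of a seeded `random.Random`, and the choice of a uniform draw for the splitting point are assumptions made for illustration; the text only says the point is selected randomly on the first splitting dimension.

```python
import random
from typing import List, Tuple

Range = List[Tuple[float, float]]  # one (min, max) pair per feature dimension


def choose_split_dimension(value_range: Range, rng: random.Random,
                           largest_span: bool = True) -> int:
    """Pick the dimension with the largest span, or a random dimension."""
    if largest_span:
        spans = [hi - lo for lo, hi in value_range]
        return spans.index(max(spans))
    return rng.randrange(len(value_range))


def choose_split_point(value_range: Range, dim: int, rng: random.Random) -> float:
    """Pick a random numerical point on the chosen dimension (uniform draw assumed)."""
    lo, hi = value_range[dim]
    return rng.uniform(lo, hi)


def divide_range(value_range: Range, dim: int, point: float) -> Tuple[Range, Range]:
    """Values not greater than the point go left; values greater go right."""
    lo, hi = value_range[dim]
    left, right = list(value_range), list(value_range)
    left[dim] = (lo, point)
    right[dim] = (point, hi)
    return left, right


rng = random.Random(0)
first_range = [(0.0, 3.0), (10.0, 12.0)]
dim = choose_split_dimension(first_range, rng)
point = choose_split_point(first_range, dim, rng)
left_range, right_range = divide_range(first_range, dim, point)
```

Because only the range is divided, no stored training samples are needed at this step, which matches the resource argument made earlier in the text.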
  • the adding of the associated second node includes:
  • the method also includes:
  • the incremental training of the machine learning model is stopped.
  • the method further includes:
  • the first non-leaf node and the second non-leaf node in the machine learning model are merged, and the first leaf node and the second leaf node are merged to obtain a streamlined machine learning model.
  • the streamlined machine learning model is used for predicting classification results;
  • a streamlined machine learning model sent by the first analysis device, where the streamlined machine learning model is obtained by the first analysis device merging the first non-leaf node and the second non-leaf node in the machine learning model, and merging the first leaf node and the second leaf node;
  • the first leaf node is a child node of the first non-leaf node
  • the second leaf node is a child node of the second non-leaf node
  • the first leaf node and the second leaf node include the same classification result, and the spans of the feature values of their assigned historical training sample sets on the same feature dimension are adjacent.
  • the streamlined machine learning model has a simpler structure with fewer tree branches, which prevents the tree from becoming too deep.
  • although the model architecture changes, the prediction results are not affected, which saves storage space and improves prediction efficiency. Streamlining also helps prevent the model from overfitting.
  • if the streamlined model is used only for sample analysis, the historical split information need not be recorded in the node information, which further reduces the size of the model and improves prediction efficiency.
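The merge condition for leaf nodes can be sketched in Python as below. This is a simplified illustration, not the patent's procedure: it only checks two leaves and merges their spans, and it assumes that "adjacent" means the two spans touch end to end on the same feature dimension. The `Leaf` class and `try_merge` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Leaf:
    label: str                 # classification result stored in the leaf
    dim: int                   # feature dimension the span refers to
    span: Tuple[float, float]  # (min, max) of assigned feature values on that dimension


def try_merge(a: Leaf, b: Leaf) -> Optional[Leaf]:
    """Merge two leaves with the same classification result and adjacent spans."""
    if a.label != b.label or a.dim != b.dim:
        return None
    first, second = sorted((a, b), key=lambda leaf: leaf.span)
    # Assumption: adjacent spans share an endpoint.
    if first.span[1] != second.span[0]:
        return None
    return Leaf(a.label, a.dim, (first.span[0], second.span[1]))


merged = try_merge(Leaf("normal", 0, (0.0, 1.5)), Leaf("normal", 0, (1.5, 3.0)))
not_merged = try_merge(Leaf("normal", 0, (0.0, 1.5)), Leaf("abnormal", 0, (1.5, 3.0)))
```

Since both leaves predict the same result over contiguous ranges, replacing them with one leaf leaves predictions unchanged while removing a branch.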
  • each node in the machine learning model stores corresponding node information. The node information of any node in the machine learning model includes label distribution information, which reflects the proportion, in the total number of labels, of the labels of each category of the samples in the historical training sample set divided into that node, where the total number of labels is the total number of labels corresponding to the samples in the historical training sample set of that node.
  • the node information of any non-leaf node also includes historical split information, which is the information the corresponding node used for splitting.
  • the historical split information includes: location information of the corresponding node in the machine learning model, split dimension, split point, numerical distribution range of the historical training sample set divided into the corresponding node, and historical split cost;
  • the label distribution information includes: the number of labels in each category of the samples in the historical training sample set of the corresponding node, and the total number of labels; or, the proportion of the labels of each category of the samples in the historical training sample set of the corresponding node in the total number of labels.
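One possible in-memory layout for the node information described above is sketched here in Python. The class and field names are hypothetical, and representing the node's location as a string path is an assumption; the text does not specify an encoding.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class HistoricalSplitInfo:
    position: str                           # location of the node in the tree (encoding assumed)
    split_dimension: int
    split_point: float
    value_range: List[Tuple[float, float]]  # range of the samples divided into the node
    split_cost: float                       # historical split cost


@dataclass
class NodeInfo:
    # Number of labels per category for samples divided into this node.
    label_counts: Dict[str, int] = field(default_factory=dict)
    # Present only for non-leaf nodes.
    split_info: Optional[HistoricalSplitInfo] = None

    def label_proportions(self) -> Dict[str, float]:
        """Share of each label category in the node's total label count."""
        total = sum(self.label_counts.values())
        if total == 0:
            return {}
        return {label: n / total for label, n in self.label_counts.items()}


node = NodeInfo(
    label_counts={"normal": 3, "abnormal": 1},
    split_info=HistoricalSplitInfo("root", 0, 1.5, [(0.0, 3.0), (10.0, 12.0)], 0.2),
)
proportions = node.label_proportions()
```

Storing counts rather than samples is what lets a node be split or extended later without revisiting the original training data.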
  • the first training sample set includes samples that meet a low-discrimination condition selected from samples obtained by the site analysis device, and the low-discrimination condition includes at least one of the following:
  • the absolute value of the difference between any two probabilities in the target probability set obtained by using the machine learning model to predict the sample is less than the first difference threshold,
  • where the target probability set includes the probabilities of the first n classification results in descending order of probability, n is between 1 and m, and m is the total number of probabilities obtained by using the machine learning model to predict the sample;
  • the absolute value of the difference between any two probabilities obtained by using the machine learning model to predict the sample is less than a second difference threshold
  • the absolute value of the difference between the highest probability and the lowest probability among the probabilities of the multiple classification results obtained by using the machine learning model to predict the sample is less than the third difference threshold
  • the absolute value of the difference between any two probabilities among the probabilities of the multiple classification results of the predicted sample by using the machine learning model is less than the fourth difference threshold
  • the probability distribution entropy E of the multiple classification results obtained by using the machine learning model to predict the sample is greater than a specified distribution entropy threshold, and E satisfies:
  • $E = -\sum_{i} P(x_i) \log_b P(x_i)$
  • where $x_i$ represents the i-th classification result,
  • $P(x_i)$ represents the predicted probability of the i-th classification result of the sample,
  • and b is the specified base, with $0 \le P(x_i) \le 1$.
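The low-discrimination conditions listed above can be sketched in Python as follows. The entropy function uses the standard Shannon entropy with base b, which is assumed to be the intended definition of the probability distribution entropy. All function names and threshold values are illustrative assumptions, and only three of the listed conditions are combined here.

```python
import math
from itertools import combinations
from typing import Sequence


def entropy(probs: Sequence[float], base: float = 2.0) -> float:
    """Shannon entropy of a probability distribution (zero terms are skipped)."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)


def top_n_close(probs: Sequence[float], n: int, threshold: float) -> bool:
    """Any two of the n largest probabilities differ by less than the threshold."""
    top = sorted(probs, reverse=True)[:n]
    return all(abs(a - b) < threshold for a, b in combinations(top, 2))


def max_min_close(probs: Sequence[float], threshold: float) -> bool:
    """Highest and lowest probabilities differ by less than the threshold."""
    return max(probs) - min(probs) < threshold


def is_low_discrimination(probs: Sequence[float], n: int = 2,
                          diff_threshold: float = 0.1,
                          entropy_threshold: float = 0.9,
                          base: float = 2.0) -> bool:
    """A sample qualifies if at least one condition holds (thresholds are illustrative)."""
    return (top_n_close(probs, n, diff_threshold)
            or max_min_close(probs, diff_threshold)
            or entropy(probs, base) > entropy_threshold)


uncertain = is_low_discrimination([0.5, 0.5])
confident = is_low_discrimination([0.9, 0.1])
close_top_two = is_low_discrimination([0.4, 0.35, 0.25])
```

Samples flagged this way are the ones the current model cannot separate well, so they are natural candidates for the first training sample set used in incremental training.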
  • a model training method is provided, which is applied to a first analysis device.
  • the first analysis device may be a cloud analysis device.
  • the method includes:
  • the feature data in the training sample set of the machine learning model is the feature data of the site network corresponding to any site analysis device.
  • the first analysis device may distribute the trained machine learning model to each site analysis device, and each site analysis device performs incremental training to ensure the performance of the machine learning model on that site analysis device.
  • the first analysis device does not need to train a separate machine learning model for each site analysis device, which effectively reduces the overall training time of the first analysis device. The model obtained by offline training can serve as the basis for incremental training on each site analysis device, which improves the versatility of the offline-trained model, achieves model generalization, and reduces the overall training cost of the first analysis device.
  • the historical training sample set is a set of training samples sent by multiple site analysis devices.
  • the method further includes:
  • the machine learning model is a tree model
  • the offline training based on the historical training sample set to obtain the machine learning model includes:
  • the offline training process includes:
  • the splitting of the third node to obtain the left child node and the right child node of the third node includes:
  • the third node is split based on the numerical distribution range of the historical training sample set to obtain the left child node and the right child node of the third node, where the numerical distribution range of the historical training sample set is the distribution range of feature values in the historical training sample set.
  • the splitting of the third node based on the numerical distribution range of the historical training sample set to obtain the left child node and the right child node of the third node includes:
  • in the third numerical distribution range, the value range whose value in the third splitting dimension is not greater than the value of the third split point is divided to the left child node, and the value range whose value in the third splitting dimension is greater than the value of the third split point is divided to the right child node, where the third numerical distribution range is the distribution range of feature values in the historical training sample set of the third node.
  • node splitting is performed based on the numerical distribution range of the training sample set, without extensive access to historical training samples, which effectively reduces memory and computing resource usage and lowers the training cost.
  • the machine learning model can be made lightweight, which makes it easier to deploy and allows the model to generalize effectively.
  • since the first analysis device has already obtained the foregoing historical training sample set for training, it may alternatively split nodes using the samples directly. In that case, the splitting of the third node to obtain the left child node and the right child node of the third node may instead include: dividing the samples in the historical training sample set whose feature value in the third splitting dimension is not greater than the value of the third split point to the left child node, and dividing the samples in the historical training sample set whose feature value in the third splitting dimension is greater than the value of the third split point to the right child node.
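The sample-based alternative described above amounts to a simple partition, sketched here in Python. The function name and the sample values are hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Sequence[float]  # one feature value per dimension


def partition_samples(samples: Sequence[Sample], dim: int,
                      split_point: float) -> Tuple[List[Sample], List[Sample]]:
    """Samples not greater than the split point go left; the rest go right."""
    left = [s for s in samples if s[dim] <= split_point]
    right = [s for s in samples if s[dim] > split_point]
    return left, right


samples = [(0.5, 11.0), (1.5, 10.2), (2.8, 11.7), (1.6, 10.9)]
left_child, right_child = partition_samples(samples, dim=0, split_point=1.5)
```

Unlike the range-based split, this variant needs the samples themselves, which is why the text offers it only for the first analysis device that already holds the historical training sample set.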
  • the split cutoff condition includes at least one of the following:
  • the current splitting cost of the third node is greater than the splitting cost threshold, so that excessive splitting of the tree can be avoided and computing overhead can be reduced;
  • the number of samples in the historical training sample set is less than the second sample number threshold. This indicates that the historical training sample set contains too little data to support effective node splitting, so stopping the offline training process at this point reduces computing overhead;
  • the number of splits corresponding to the third node is greater than the threshold of the number of splits. This situation indicates that the current machine learning model has reached the upper limit of the number of splits. At this time, the offline training process is stopped, which can reduce computational overhead;
  • the depth of the third node in the machine learning model is greater than a depth threshold, so that the depth of the machine learning model can be controlled;
  • the labels of the most frequent category among the labels corresponding to the historical training sample set account for more than a specified proportion threshold of the total number of labels corresponding to the historical training sample set. This indicates that the most frequent label has reached the classification condition, so an accurate classification result can be determined from it. Stopping the offline training process at this point avoids unnecessary splits and reduces computing overhead.
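The split cutoff conditions above can be combined into a single check, as in this Python sketch. The `CutoffConfig` class, the `should_stop` function, and every threshold value are hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Sequence


@dataclass
class CutoffConfig:
    # All thresholds below are illustrative values, not taken from the text.
    cost_threshold: float = 1.0
    min_samples: int = 5
    max_splits: int = 100
    max_depth: int = 10
    label_ratio_threshold: float = 0.95


def should_stop(split_cost: float, labels: Sequence[str], split_count: int,
                depth: int, cfg: CutoffConfig = CutoffConfig()) -> bool:
    """Return True if at least one split cutoff condition is met."""
    if labels:
        majority_ratio = max(Counter(labels).values()) / len(labels)
    else:
        majority_ratio = 1.0  # assumption: an empty node is never split
    return (split_cost > cfg.cost_threshold
            or len(labels) < cfg.min_samples
            or split_count > cfg.max_splits
            or depth > cfg.max_depth
            or majority_ratio > cfg.label_ratio_threshold)


mixed_labels = ["normal"] * 6 + ["abnormal"] * 4
keep_splitting = should_stop(0.2, mixed_labels, split_count=3, depth=2)
pure_node = should_stop(0.2, ["normal"] * 10, split_count=3, depth=2)
```

Each condition bounds a different resource: the cost and ratio checks avoid pointless splits, while the sample, split-count, and depth checks cap the size of the tree.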
  • the current split cost of the third node is negatively correlated with the size of the distribution range of feature values in the historical training sample set.
  • the current split cost of the third node is the reciprocal of the sum of the spans of feature values of the historical training sample set in each feature dimension.
  • the method further includes:
  • the first non-leaf node and the second non-leaf node in the machine learning model are merged, and the first leaf node and the second leaf node are merged to obtain a streamlined machine learning model.
  • the streamlined machine learning model is used for predicting classification results, where the first leaf node is a child node of the first non-leaf node, the second leaf node is a child node of the second non-leaf node, the first leaf node and the second leaf node include the same classification result, and the spans of the feature values of their assigned historical training sample sets on the same feature dimension are adjacent;
  • each node in the machine learning model stores corresponding node information. The node information of any node in the machine learning model includes label distribution information, which reflects the proportion, in the total number of labels, of the labels of each category of the samples in the historical training sample set divided into that node, where the total number of labels is the total number of labels corresponding to the samples in the historical training sample set of that node.
  • the node information of any non-leaf node also includes historical split information, which is the information the corresponding node used for splitting.
  • the historical split information includes: location information of the corresponding node in the machine learning model, split dimension, split point, numerical distribution range of the historical training sample set divided into the corresponding node, and historical split cost;
  • the label distribution information includes: the number of labels in each category of the samples in the historical training sample set of the corresponding node, and the total number of labels; or, the proportion of the labels of each category of the samples in the historical training sample set of the corresponding node in the total number of labels.
  • the first training sample set includes samples that meet a low-discrimination condition selected from samples obtained by the site analysis device, and the low-discrimination condition includes at least one of the following:
  • the absolute value of the difference between any two probabilities in the target probability set obtained by using the machine learning model to predict the sample is less than the first difference threshold,
  • where the target probability set includes the probabilities of the first n classification results in descending order of probability, n is between 1 and m, and m is the total number of probabilities obtained by using the machine learning model to predict the sample;
  • the absolute value of the difference between any two probabilities obtained by using the machine learning model to predict the sample is less than a second difference threshold
  • the absolute value of the difference between the highest probability and the lowest probability among the probabilities of the multiple classification results obtained by using the machine learning model to predict the sample is less than the third difference threshold
  • the absolute value of the difference between any two probabilities among the probabilities of the multiple classification results of the predicted sample by using the machine learning model is less than the fourth difference threshold
  • the probability distribution entropy E of the multiple classification results obtained by using the machine learning model to predict the sample is greater than a specified distribution entropy threshold, and E satisfies:
  • $E = -\sum_{i} P(x_i) \log_b P(x_i)$
  • where $x_i$ represents the i-th classification result,
  • $P(x_i)$ represents the predicted probability of the i-th classification result of the sample,
  • and b is the specified base, with $0 \le P(x_i) \le 1$.
  • in a third aspect, a model training device is provided, including a plurality of functional modules that interact to implement the methods in the first aspect and its various embodiments.
  • the multiple functional modules can be implemented based on software, hardware, or a combination of software and hardware, and the multiple functional modules can be combined or divided arbitrarily based on specific implementations.
  • a model training device is provided, including a plurality of functional modules that interact to implement the methods in the second aspect and its various embodiments.
  • the multiple functional modules can be implemented based on software, hardware, or a combination of software and hardware, and the multiple functional modules can be combined or divided arbitrarily based on specific implementations.
  • a model training device including: a processor and a memory;
  • the memory is used to store a computer program, and the computer program includes program instructions
  • the processor is configured to call the computer program to implement the model training method according to any one of the first aspect; or, to implement the model training method according to any one of the second aspect.
  • a computer storage medium is provided, and instructions are stored on the computer storage medium.
  • the model training method according to any one of the first aspect is implemented; or the model training method according to any one of the second aspect is implemented.
  • in a seventh aspect, a chip is provided, including a programmable logic circuit and/or program instructions.
  • the model training method described in the first aspect is implemented; or the model training method described in the second aspect is implemented.
  • a computer program product is provided, in which instructions are stored. When the instructions are executed on a computer, the computer executes the model training method according to any one of the first aspect, or the computer executes the model training method according to any one of the second aspect.
  • the site analysis device receives the machine learning model sent by the first analysis device, and can compare the machine learning model based on the first training sample set obtained from the site network corresponding to the site analysis device. Learning model for incremental training.
  • the feature data in the first training sample set is the feature data obtained from the site network corresponding to the site analysis device, which is more suitable for the application scenario of the site analysis device.
  • the first training sample set of the feature data obtained by the corresponding site network is used for model training, which can make the trained machine learning model more suitable for the needs of the site analysis equipment itself, realize the customization of the model, and improve the application of the model Flexibility; on the other hand, the machine learning model is trained through a combination of offline training and incremental training, which can perform incremental training of the machine learning model when the type or pattern of the feature data obtained by the site analysis device changes , Realize the flexible adjustment of the machine learning model, so as to ensure that the machine learning model obtained by training meets the requirements of the site analysis equipment. Therefore, compared with related technologies, the model training method provided in the embodiments of the present application can effectively adapt to the requirements of site analysis equipment.
  • the first analysis device may distribute the trained machine learning model to each site analysis device, and each site analysis device performs incremental training to ensure the performance of the machine learning model on each site analysis device.
  • The first analysis device does not need to train a separate machine learning model for each site analysis device, which effectively reduces the overall training time of the first analysis device. The model obtained by offline training can serve as the basis for incremental training on each site analysis device, which improves the versatility of the offline-trained model, achieves model generalization, and reduces the overall training cost of the first analysis device.
  • The simplified machine learning model has a simpler structure, with fewer tree branches and a tree that is prevented from becoming too deep.
  • Although the model architecture changes, its prediction results are not affected, which saves storage space and improves prediction efficiency. The simplification also helps prevent the model from overfitting. Further, if the simplified model is only used for sample analysis, the historical split information need not be recorded in the node information, which further reduces the size of the model itself and improves the efficiency of model prediction.
  • Node splitting is performed based on the numerical distribution range of the training sample set, without extensive access to historical training samples. Therefore, the occupation of memory and computing resources is effectively reduced, and the training cost is reduced.
  • A lightweight machine learning model can be achieved, the machine learning model is easier to deploy, and effective generalization of the model can be achieved.
  • FIG. 1 is a schematic diagram of an application scenario involved in a model training method provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of another application scenario involved in the model training method provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of another application scenario involved in the model training method provided by an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a model training method provided by an embodiment of the present application.
  • FIG. 5 is a flow chart of a method for controlling a site analysis device to perform incremental training on a machine learning model based on an evaluation result of a classification result according to an embodiment of the present application;
  • FIG. 6 is a schematic diagram of a tree structure provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of the splitting principle of a tree model provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of the splitting principle of another tree model provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of the splitting principle of another tree model provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of the splitting principle of yet another tree model provided by an embodiment of the present application.
  • FIG. 11 is a schematic diagram of the splitting principle of a tree model provided by another embodiment of the present application.
  • FIG. 12 is a schematic diagram of the splitting principle of another tree model provided by another embodiment of the present application.
  • FIG. 13 is a schematic diagram of the splitting principle of yet another tree model provided by another embodiment of the present application.
  • FIG. 14 is a schematic diagram of the splitting principle of still another tree model provided by another embodiment of the present application.
  • FIG. 15 is a schematic diagram of the splitting principle of a tree model provided by another embodiment of the present application.
  • FIG. 16 is a schematic diagram of the splitting principle of another tree model provided by another embodiment of the present application.
  • FIG. 17 is a schematic diagram of the splitting principle of yet another tree model provided by another embodiment of the present application.
  • FIG. 18 is a schematic diagram of the incremental training effect of a traditional machine learning model.
  • FIG. 19 is a schematic diagram of an incremental training effect of a machine learning model provided by an embodiment of the present application.
  • FIG. 20 is a schematic structural diagram of a model training device provided by an embodiment of the present application.
  • FIG. 21 is a schematic structural diagram of another model training device provided by an embodiment of the present application.
  • FIG. 22 is a schematic structural diagram of another model training device provided by an embodiment of the present application.
  • FIG. 23 is a schematic structural diagram of still another model training device provided by an embodiment of the present application.
  • FIG. 24 is a schematic structural diagram of a model training device provided by another embodiment of the present application.
  • FIG. 25 is a schematic structural diagram of another model training device provided by another embodiment of the present application.
  • FIG. 26 is a schematic structural diagram of yet another model training device provided by another embodiment of the present application.
  • FIG. 27 is a block diagram of an analysis device provided by an embodiment of the present application.
  • machine learning algorithms have been widely used in many fields. From the perspective of learning methods, machine learning algorithms can be divided into supervised learning algorithms, unsupervised learning algorithms, semi-supervised learning algorithms, and reinforcement learning algorithms.
  • A supervised learning algorithm learns an algorithm or establishes a model based on training data, and uses this algorithm or model to infer new instances.
  • Training data, also called training samples, is composed of input data and expected output.
  • The model of a machine learning algorithm is also called a machine learning model. Its expected output is called a label, which can be a predicted classification result (called a classification label).
  • The difference between an unsupervised learning algorithm and a supervised learning algorithm is that the training samples of the unsupervised learning algorithm do not have a given label, and the machine learning algorithm model obtains certain results by analyzing the training samples.
  • In a semi-supervised learning algorithm, one part of the training samples is labeled and the other part is unlabeled, and there is far more unlabeled data than labeled data.
  • Reinforcement learning algorithms try to maximize the expected benefits through continuous attempts in the environment, and through the rewards or punishments given by the environment, generate the choices that can obtain the greatest benefits.
  • each training sample includes one-dimensional or multi-dimensional feature data, that is, feature data including one or more features.
  • the characteristic data may specifically be KPI characteristic data.
  • KPI feature data refers to feature data generated based on KPI data.
  • The KPI feature data can be feature data of a KPI time series, that is, data obtained by extracting features of the KPI time series; the KPI feature data can also be the KPI data itself.
  • the KPI may specifically be a network KPI.
  • The network KPI may include KPIs of various categories, such as central processing unit (CPU) utilization, optical power, network traffic, packet loss rate, delay, and/or number of user accesses.
  • The KPI feature data may specifically be feature data extracted from the time series of KPI data of any of the aforementioned KPI categories.
  • For example, a training sample includes network KPI feature data for two features: the maximum value and the weighted average value of the corresponding network KPI time series.
  • Alternatively, the KPI feature data may specifically be KPI data of any of the aforementioned KPI categories.
  • For example, a training sample includes network KPI feature data for three features: CPU utilization, packet loss rate, and delay.
  • the training sample may also include a label. For example, in the aforementioned scenario of predicting the classification result of KPI data, assuming that the classification result is used to indicate whether the data sequence is abnormal, a training sample also includes a label: "abnormal" or "normal".
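The two kinds of training sample described above can be sketched as a small data structure. This is only an illustrative sketch: the class name `TrainingSample`, its field names, the weights, and all numeric values are assumptions and are not defined in this application.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass
class TrainingSample:
    """One training sample: feature data for one or more features, plus an optional label."""
    sample_id: str
    features: Dict[str, float]
    label: Optional[str] = None  # e.g. "normal" or "abnormal"; None if unlabeled


def weighted_average(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of a series; the weights used below are an illustrative choice."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Case 1: feature data extracted from a KPI time series (hypothetical values).
kpi_series = [10.0, 12.0, 11.0, 30.0]
sample_from_series = TrainingSample(
    sample_id="sample_1",
    features={
        "maximum": max(kpi_series),
        "weighted_average": weighted_average(kpi_series, [1, 2, 3, 4]),
    },
    label="abnormal",
)

# Case 2: the feature data is the KPI data itself (hypothetical values).
sample_from_kpis = TrainingSample(
    sample_id="sample_2",
    features={"cpu_utilization": 0.35, "packet_loss_rate": 0.001, "delay_ms": 12.0},
    label="normal",
)
```

In both cases the number of features equals the number of feature data entries, so a dictionary keyed by feature name is one natural representation.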
  • A time series is a special data series: a set of data arranged in time order.
  • The time order is usually the order in which the data is generated, and the data in a time series are also called data points.
  • The time interval between data points in a time series is a constant value, so the time series can be analyzed and processed as discrete time data.
  • Current training methods for machine learning algorithms are divided into offline learning and online learning.
  • In offline learning, also known as offline training, the samples in the training sample set need to be input into the machine learning model in batches for model training, and the amount of data required for training is relatively large.
  • Offline learning is usually used to train large or complex models, so the training process is often time-consuming and the amount of data processed is large.
  • Incremental learning, also called incremental training, is a special online learning method. It requires the model not only to learn new patterns in real time but also to resist forgetting; that is, the model can remember historically learned patterns while also learning new patterns.
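The contrast between offline training and incremental training can be illustrated with a deliberately simple stand-in model. The sketch below is not the tree-based model of this application; it assumes a toy model, here named `RunningStats`, that keeps only the mean and variance of one feature and updates them with Welford's algorithm. The class name and sample values are hypothetical.

```python
from typing import Iterable


class RunningStats:
    """Toy model that keeps the mean and variance of a single feature."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        """Incremental step (Welford's algorithm): absorb one new sample."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def fit_offline(self, batch: Iterable[float]) -> "RunningStats":
        """Offline training: process a whole batch of historical samples."""
        for x in batch:
            self.update(x)
        return self

    @property
    def variance(self) -> float:
        """Population variance of all samples seen so far."""
        return self.m2 / self.n if self.n else 0.0


# Offline training on a (hypothetical) historical training sample set.
model = RunningStats().fit_offline([10.0, 12.0, 11.0, 13.0])

# Incremental training: new samples are absorbed one by one,
# without revisiting the historical samples.
for new_value in [20.0, 22.0]:
    model.update(new_value)
```

After the incremental updates, the statistics reflect both the historical batch and the new samples, which is the sense in which previously learned information is retained in this toy setting.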
  • the sample with strong correlation with the category is selected as the sample set.
  • the label is used to identify the sample data, such as identifying the type of the sample data.
  • The data used for machine learning model training is all sample data.
  • For ease of description, training data is referred to as training samples, a set of training samples is referred to as a training sample set, and sample data is referred to simply as samples in part of the content.
  • FIG. 1 is a schematic diagram of an application scenario involved in the model training method provided by an embodiment of the present application.
  • the application scenario includes multiple analysis devices, and the multiple analysis devices include an analysis device 101 and a plurality of analysis devices 102. Each analysis device is used to perform a series of data analysis processes such as data mining and/or data modeling.
  • the number of analysis devices 101 and analysis devices 102 in FIG. 1 is only for illustration, and not as a limitation on the application scenarios involved in the model training method provided in the embodiment of the present application.
  • The analysis device 101 may specifically be a cloud analysis device (also called a cloud analysis platform), which may be a computer, a server, a server cluster composed of several servers, or a cloud computing service center, and which is deployed at the back end of the service network.
  • The analysis device 102 may specifically be a site analysis device (also called a site analysis platform), which may be a server, a server cluster composed of several servers, or a cloud computing service center.
  • the model training system involved in the model training method includes multiple site networks.
  • the site network can be a core network or an edge network.
  • The users of each site network can be operators or enterprise customers.
  • Multiple analysis devices 102 can have a one-to-one correspondence with multiple site networks.
  • Each analysis device 102 is used to provide data analysis services for the corresponding site network.
  • Each analysis device 102 can be located inside the corresponding site network, or outside the corresponding site network.
  • Each analysis device 102 and the analysis device 101 are connected through a wired network or a wireless network.
  • The communication network involved in the embodiments of this application is a second-generation (2nd Generation, 2G) communication network, a third-generation (3rd Generation, 3G) communication network, a Long Term Evolution (LTE) communication network, or a fifth-generation (5th Generation, 5G) communication network, etc.
  • the analysis device 101 is also used to manage part or all of the business of the analysis device 102, collect training sample sets, and provide data analysis services for the analysis device 102.
  • The analysis device 101 can train a machine learning model based on the collected training sample sets (this process uses the aforementioned offline learning method), and then deploy the machine learning model on each site analysis device, where the site analysis device performs incremental training (this process uses the aforementioned online learning method).
  • Based on different training samples different machine learning models can be trained, and different machine learning models can achieve different classification functions. For example, functions such as anomaly detection, prediction, network security protection and application identification or user experience evaluation (ie, evaluation of user experience) can be implemented.
  • FIG. 2 is a schematic diagram of another application scenario involved in the model training method provided in an embodiment of the present application.
  • the application scenario also includes a network device 103.
  • Each analysis device 102 can manage a network device 103 in a network (also called a site network), and the analysis device 102 is connected to the network device 103 managed by it through a wired network or a wireless network.
  • the network device 103 may be a router, a switch, a base station, or the like.
  • the network device 103 and the analysis device 102 are connected through a wired network or a wireless network.
  • The network device 103 is used to upload collected data, such as various KPI time series, to the analysis device 102, and the analysis device 102 is used to extract and use data from the network device 103, for example, to determine the labels of the acquired time series.
  • the data uploaded by the network device 103 to the analysis device 102 may also include various types of log data and device status data.
  • FIG. 3 is a schematic diagram of another application scenario involved in the model training method provided in an embodiment of the present application.
  • the application scenario further includes the evaluation device 104 (FIG. 3 draws the application scenario based on FIG. 2, but it is not limited thereto).
  • the evaluation device 104 and the analysis device 102 are connected through a wired network or a wireless network.
  • the evaluation device 104 is configured to evaluate the classification result of the data classification by the analysis device 102 using the machine learning model, and control the site analysis device to perform incremental training of the machine learning model based on the evaluation result.
  • the application scenario may also include a storage device, which is used to store data provided by the network device 103 or the analysis device 102.
  • The storage device may be a distributed storage device, and the analysis device 102 or the analysis device 101 can read and write the data stored in the storage device. In this way, when there is a large amount of data in the application scenario, storing the data on the storage device reduces the load on the analysis device (such as the analysis device 102 or the analysis device 101) and improves the data analysis efficiency of the analysis device.
  • The storage device may be used to store data whose labels have been determined, and this labeled data may be used as samples for model training. It should be noted that when the amount of data in the application scenario is small, the storage device may be omitted.
  • the application scenario also includes a management device, such as a network management device (also called a network management platform) or a third-party management device.
  • the management device is used to provide configuration feedback and sample labeling feedback.
  • The management device is usually operated by operation and maintenance personnel.
  • The management device may be a computer, a server, a server cluster composed of several servers, or a cloud computing service center, and it may be an operations support system (OSS) or other network equipment connected to the analysis device.
  • the aforementioned analysis device can perform feature data selection and model update of each machine learning model, and feed back the selected feature data and model update results to the management device, and the management device decides whether to retrain the model.
  • the model training method provided in the embodiment of the present application can be used in an anomaly detection scenario.
  • Anomaly detection refers to the detection of patterns that do not meet expectations.
  • the data sources of anomaly detection include applications, processes, operating systems, devices, or networks.
  • the object of anomaly detection may be the aforementioned KPI data sequence.
  • the analysis device 102 may be a network analyzer, the machine learning model maintained by the analysis device 102 is an anomaly detection model, and the determined label is an anomaly detection label.
  • The anomaly detection label includes two classification labels, namely "normal" and "abnormal".
  • The aforementioned machine learning model can be a model based on statistics and data distribution algorithms (such as the N-Sigma algorithm), a model based on distance/density algorithms (such as the local outlier factor algorithm), a tree model (such as Isolation Forest, iForest), or a prediction-based algorithm model (such as the Autoregressive Integrated Moving Average (ARIMA) model), etc.
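Of the model families listed above, the N-Sigma approach is the simplest to illustrate. The following is a minimal sketch under stated assumptions: the function name `n_sigma_labels`, the default threshold `n = 3.0`, and the sample values are hypothetical, and the application does not prescribe this particular implementation.

```python
import statistics
from typing import List, Sequence


def n_sigma_labels(
    history: Sequence[float], new_points: Sequence[float], n: float = 3.0
) -> List[str]:
    """Label each new point "abnormal" if it lies more than n standard
    deviations from the mean of the historical data, otherwise "normal"."""
    mean = statistics.fmean(history)
    std = statistics.pstdev(history)
    labels = []
    for x in new_points:
        is_abnormal = std > 0 and abs(x - mean) > n * std
        labels.append("abnormal" if is_abnormal else "normal")
    return labels


# Hypothetical CPU utilization history (percent) and two new observations.
history = [50.0, 52.0, 49.0, 51.0, 50.0, 48.0]
print(n_sigma_labels(history, [51.0, 75.0]))  # prints ['normal', 'abnormal']
```

The two output values correspond to the two classification labels of the anomaly detection scenario, "normal" and "abnormal".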
  • the cloud analysis device performs offline training of the model, and then the offline trained model is directly deployed on the site analysis device.
  • the trained model may not be able to effectively adapt to the requirements of site analysis equipment, such as the predicted performance (such as accuracy or recall) requirements.
  • On the one hand, the training samples in the historical training sample set used by the cloud analysis device are usually pre-configured, fixed training samples, which may not meet the requirements of the site analysis device. On the other hand, even if the trained machine learning model meets the requirements of the site analysis device when it is deployed there, when the type or pattern of the feature data obtained by the site analysis device changes, the trained machine learning model no longer meets the requirements of the site analysis device.
  • the machine learning model obtained by training can only target a single site analysis device.
  • When the cloud analysis device serves multiple site analysis devices, it needs to train a corresponding machine learning model for each site analysis device.
  • the model obtained by training has low versatility, cannot achieve model generalization, and the training cost is high.
  • the embodiment of the application provides a model training method.
  • the subsequent embodiments assume that the aforementioned analysis device 101 is the first analysis device, and the analysis device 102 is the site analysis device.
  • The site analysis device receives the machine learning model sent by the first analysis device, and can incrementally train the machine learning model based on the first training sample set obtained from the site network corresponding to the site analysis device.
  • On the one hand, the feature data in the first training sample set is obtained from the site network corresponding to the site analysis device, and is therefore better suited to the application scenario of the site analysis device.
  • Training the model with the first training sample set, which contains feature data obtained from the corresponding site network, makes the trained machine learning model better adapted to the needs of the site analysis device itself (that is, the needs of the site network corresponding to the site analysis device), which customizes the model and improves its application flexibility. On the other hand, the machine learning model is trained through a combination of offline training and incremental training, so that when the category or pattern of the feature data obtained by the site analysis device changes, the machine learning model can be incrementally trained and flexibly adjusted, ensuring that the trained machine learning model meets the requirements of the site analysis device. Therefore, compared with related technologies, the model training method provided in the embodiments of the present application can effectively adapt to the requirements of the site analysis device.
  • the first analysis device may distribute the trained machine learning model to each site analysis device, and each site analysis device performs incremental training to ensure the performance of the machine learning model on each site analysis device.
  • The first analysis device does not need to train a separate machine learning model for each site analysis device, which effectively reduces the overall training time of the first analysis device. The model obtained by offline training can serve as the basis for incremental training on each site analysis device, which improves the versatility of the offline-trained model, achieves model generalization, and reduces the overall training cost of the first analysis device.
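The overall flow described above (offline training on the first analysis device, distribution to the site analysis devices, and incremental training at each site) can be sketched as follows. This is a toy illustration only: the `CentroidModel` class, which keeps one running-mean centroid per label, is an assumed stand-in for the machine learning model, and the site names and feature values are hypothetical.

```python
import copy
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[List[float], str]  # (feature data, label)


class CentroidModel:
    """Toy classifier: one running-mean centroid per label."""

    def __init__(self) -> None:
        self.centroids: Dict[str, List[float]] = {}
        self.counts: Dict[str, int] = {}

    def train(self, samples: Sequence[Sample]) -> None:
        """Absorb samples; usable for offline and for incremental training."""
        for features, label in samples:
            if label not in self.centroids:
                self.centroids[label] = list(features)
                self.counts[label] = 1
                continue
            self.counts[label] += 1
            k = self.counts[label]
            self.centroids[label] = [
                c + (f - c) / k for c, f in zip(self.centroids[label], features)
            ]

    def predict(self, features: Sequence[float]) -> str:
        """Return the label whose centroid is closest to the feature data."""
        def sq_dist(label: str) -> float:
            return sum((f - c) ** 2 for f, c in zip(features, self.centroids[label]))
        return min(self.centroids, key=sq_dist)


# First analysis device: offline training on the historical training sample set.
base_model = CentroidModel()
base_model.train(
    [([0.2, 0.01], "normal"), ([0.3, 0.02], "normal"), ([0.9, 0.30], "abnormal")]
)

# Distribution: each site analysis device receives its own copy of the model.
site_models = {site: copy.deepcopy(base_model) for site in ("site_a", "site_b")}

# Site A: incremental training on its own first training sample set.
site_models["site_a"].train([([0.6, 0.05], "normal"), ([0.7, 0.06], "normal")])
```

In this sketch, the copy held by `site_a` adapts to that site's data, while the base model and the copy held by `site_b` remain unchanged, so a single offline-trained model serves as the starting point for every site.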
  • The embodiment of the present application provides a model training method, which can be applied to any of the application scenarios shown in FIG. 1 to FIG. 3, and the machine learning model can be used to predict classification results.
  • For example, it can be a binary classification model.
  • The classification result determined manually or by label migration is called a label, and the result predicted by the machine learning model itself is called a classification result. The two are substantially the same: both are used to identify the category of the corresponding sample.
  • the application scenario of the model training method usually includes multiple site analysis devices.
  • The embodiment of this application uses one site analysis device as an example to illustrate the model training method. For the actions of the other site analysis devices, refer to the actions of this site analysis device.
  • the method includes:
  • Step 401: The first analysis device performs offline training based on the historical training sample set to obtain a machine learning model.
  • The first analysis device may continuously collect training samples to obtain a training sample set, and perform offline training based on the collected training sample set (which may be referred to as a historical training sample set), thereby obtaining a machine learning model.
  • the historical training sample set may be a set of training samples sent by multiple site analysis devices.
  • the machine learning model obtained based on this training can be adapted to the needs of the multiple site analysis equipment, and the model obtained by training has high versatility, thereby ensuring the generalization of the model.
  • The training samples can be obtained by the site analysis device from the data collected and uploaded by the network device, and then transmitted by the site analysis device to the first analysis device.
  • the training samples may also be obtained by the first analysis device in other ways, for example, obtained from data stored in the storage device, which is not limited in the embodiment of the present application.
  • the training sample can have multiple forms. Accordingly, the first analysis device can obtain the training sample in multiple ways.
  • the embodiment of the present application takes the following two optional methods as examples for illustration:
  • In the first optional method, the training samples acquired by the aforementioned first analysis device may include data determined based on a time series, for example, data determined based on a KPI time series.
  • Each training sample in the historical training sample set corresponds to a time series, and each training sample may include feature data of one or more features extracted from the corresponding time series.
  • the number of features corresponding to each training sample is the same as the number of feature data of the training sample (that is, the feature corresponds to the feature data one-to-one).
  • the features in the training sample refer to the features of the corresponding time series, which may include data features and/or extracted features.
  • the data characteristics are the characteristics of the data in the time series.
  • the data feature includes data arrangement period, data change trend or data fluctuation, etc.
  • the feature data of the data feature includes: data arrangement period data, data change trend data or data fluctuation data, etc.
  • the data arrangement period refers to the period involved in the data arrangement in the time series if the data in the time series is arranged periodically.
  • the data arrangement period includes the period duration (that is, the time interval between the initiation of two periods) and/or the period.
  • Data change trend data is used to reflect the change trend of the data arrangement in the time series (that is, the data change trend). For example, the data change trend data includes: continuous growth, continuous decline, first rise and then fall, first fall and then rise, or conformity to a normal distribution, etc.
  • Data fluctuation data is used to reflect the fluctuation state of the data in the time series (that is, data fluctuation). For example, the data fluctuation data includes a function that characterizes the fluctuation curve of the time series, or a specified value of the time series, such as the maximum, minimum, or average value.
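The data change trend and data fluctuation features described above can be sketched as follows. The function names, the rule of classifying the trend by counting direction changes, and the extra categories "flat" and "other" are assumptions made for illustration; the application does not specify how the trend is determined.

```python
from typing import Dict, Sequence


def change_trend(series: Sequence[float]) -> str:
    """Classify the data change trend from the signs of consecutive differences."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    signs = [1 if d > 0 else -1 for d in diffs if d != 0]
    if not signs:
        return "flat"  # assumed extra category for constant series
    # Count how many times the direction of change flips.
    turns = sum(1 for s, t in zip(signs, signs[1:]) if s != t)
    if turns == 0:
        return "continuous growth" if signs[0] > 0 else "continuous decline"
    if turns == 1:
        return "first rise and then fall" if signs[0] > 0 else "first fall and then rise"
    return "other"  # assumed extra category for more complex shapes


def fluctuation_data(series: Sequence[float]) -> Dict[str, float]:
    """Specified values of the time series: maximum, minimum, and average."""
    return {
        "maximum": max(series),
        "minimum": min(series),
        "average": sum(series) / len(series),
    }
```

For instance, `change_trend([1, 3, 2])` returns "first rise and then fall", and `fluctuation_data([1.0, 2.0, 3.0])` returns a maximum of 3.0, a minimum of 1.0, and an average of 2.0.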
  • The extracted features are features obtained in the process of extracting the data in the time series.
  • the extracted features include statistical features, fitting features, or frequency domain features, etc.
  • the feature data of the extracted features includes statistical feature data, fitting feature data, or frequency domain feature data.
  • Statistical features refer to the statistical features of the time series. Statistical features are divided into quantitative features and attribute features, and quantitative features are further divided into measurement features and counting features.
  • Quantitative features can be directly represented by numerical values. For example, the consumption values of resources such as CPU, memory, and IO resources are measurement features, while the number of anomalies and the number of devices that are working normally are counting features. Attribute features cannot be directly represented by numerical values, such as whether a device is abnormal or whether a device is down.
  • The features in the statistical features are the indicators that need to be examined in statistics.
  • For example, the statistical feature data includes the moving average (Moving_average), the weighted average (Weighted_mv), etc.
  • The fitting features are the features obtained when the time series is fitted, and the fitting feature data is used to reflect the features of the time series used for fitting.
  • For example, the fitting feature data includes the algorithm used for fitting, such as ARIMA. Frequency domain features are the features of the time series in the frequency domain, and are used to reflect the characteristics of the time series in the frequency domain.
  • the frequency domain feature data includes: data of the law followed by the time series distribution in the frequency domain, such as the proportion of high frequency components in the time series.
  • the frequency domain feature data can be obtained by performing wavelet decomposition on the time series.
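The statistical and frequency domain feature data mentioned above can be sketched in plain Python. The window sizes, the weights, and the use of a single-level Haar wavelet decomposition to estimate the proportion of high-frequency components are all assumptions chosen for brevity; the application only states that wavelet decomposition can be used, not which wavelet or how many levels.

```python
import math
from typing import List, Sequence


def moving_average(series: Sequence[float], window: int) -> List[float]:
    """Simple moving average over a sliding window."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


def weighted_moving_average(
    series: Sequence[float], weights: Sequence[float]
) -> List[float]:
    """Weighted average over a sliding window; the last weight applies to the newest point."""
    w = len(weights)
    total = sum(weights)
    return [
        sum(x * wt for x, wt in zip(series[i - w + 1 : i + 1], weights)) / total
        for i in range(w - 1, len(series))
    ]


def high_frequency_ratio(series: Sequence[float]) -> float:
    """Share of signal energy in the detail (high-frequency) coefficients
    of a one-level Haar wavelet decomposition."""
    pairs = list(zip(series[0::2], series[1::2]))  # a trailing odd element is dropped
    approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    detail = [(a - b) / math.sqrt(2) for a, b in pairs]
    energy_low = sum(c * c for c in approx)
    energy_high = sum(c * c for c in detail)
    total = energy_low + energy_high
    return energy_high / total if total else 0.0
```

A constant series such as `[1, 1, 1, 1]` has no high-frequency energy under this measure, whereas a rapidly alternating series such as `[1, -1, 1, -1]` has essentially all of its energy in the high-frequency part.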
  • The data acquisition process may include: determining the target feature to be extracted, extracting the feature data of the determined target feature from the first time series, and obtaining a training sample composed of the obtained target feature data.
  • the target feature that needs to be extracted is determined based on the application scenario involved in the model training method.
  • the target feature is a pre-configured feature, for example, a feature configured by a user.
  • the target feature is one or more of the specified features, for example, the specified feature is the aforementioned statistical feature.
  • The user can preset the specified features. However, the first time series may not have all of the specified features, so the first analysis device can select, as target features, the features of the first time series that belong to the specified features.
  • The target feature includes statistical features: one or more of time series decompose_seasonal (Tsd_seasonal), moving average, weighted average, time series classification, maximum, minimum, quantile, variance, standard deviation, period year-on-year (year on year, yoy, which refers to a comparison with the same period in history), daily volatility, bucket entropy, sample entropy, exponential moving average, Gaussian distribution feature, or T distribution feature, etc.
  • Correspondingly, the target feature data includes the data of the one or more statistical features;
  • The target feature includes fitting features: one or more of autoregressive fitting error, Gaussian process regression fitting error, or neural network fitting error; correspondingly, the target feature data includes the data of the one or more fitting features;
  • the target feature includes frequency domain features: the proportion of high-frequency components in the time series; correspondingly, the target feature data includes data on the proportion of high-frequency components in the time series, and the data can be obtained by performing wavelet decomposition on the time series.
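The process of determining target features and extracting their feature data can be sketched as a lookup over a registry of extraction functions. The registry `EXTRACTORS`, the function `extract_target_features`, and the choice of which features are available are hypothetical; in particular, a feature is treated here as available for a series simply if an extractor is registered for it, which is a simplifying assumption.

```python
import statistics
from typing import Callable, Dict, Iterable, Sequence

# Hypothetical registry mapping feature names to extraction functions.
EXTRACTORS: Dict[str, Callable[[Sequence[float]], float]] = {
    "maximum": max,
    "minimum": min,
    "variance": statistics.pvariance,
    "standard_deviation": statistics.pstdev,
    "median": statistics.median,  # the 50% quantile
}


def extract_target_features(
    series: Sequence[float], specified_features: Iterable[str]
) -> Dict[str, float]:
    """Select the specified features that can be extracted for this series
    (the target features), then extract their feature data."""
    targets = [name for name in specified_features if name in EXTRACTORS]
    return {name: float(EXTRACTORS[name](series)) for name in targets}


# The user specifies three features; "sample_entropy" has no extractor in
# this sketch, so only the other two become target features.
feature_data = extract_target_features(
    [1.0, 2.0, 3.0, 4.0], ["maximum", "variance", "sample_entropy"]
)
```

The resulting dictionary can serve as the feature data of one training sample, to which a label may then be attached.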
  • Table 1 is a schematic description of a sample in the historical training sample set.
  • Each training sample in the historical training sample set includes feature data of one or more features of a KPI time series, and each training sample corresponds to one KPI time series.
  • The training sample whose identification (ID) is KPI_1 includes the feature data of 4 features: moving average (Moving_average), weighted average (Weighted_mv), time series decomposition periodic component (time series decompose_seasonal, Tsd_seasonal), and period yoy.
  • the KPI time series corresponding to the training sample is (x1, x2,..., xn) (the time series is usually obtained by sampling data of a KPI category), and the corresponding label is "abnormal".
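  • The structure of such a training sample can be sketched as follows. This is only an illustration: the window length, the linear weighting scheme, and the function names are assumptions, and only two of the four features named above are computed.

```python
from statistics import fmean


def moving_average(series, window):
    """Mean of the last `window` points of the series."""
    return fmean(series[-window:])


def weighted_average(series, window):
    """Linearly weighted mean of the last `window` points (newest point weighs most)."""
    tail = series[-window:]
    weights = range(1, len(tail) + 1)
    return sum(w * x for w, x in zip(weights, tail)) / sum(weights)


def build_training_sample(sample_id, series, label, window=3):
    """Turn one KPI time series into one labeled training sample."""
    return {
        "id": sample_id,
        "features": {
            "Moving_average": moving_average(series, window),
            "Weighted_mv": weighted_average(series, window),
        },
        "label": label,
    }


# Hypothetical KPI time series (x1, ..., x4) labeled by operation and maintenance personnel.
sample = build_training_sample("KPI_1", [1.0, 2.0, 3.0, 4.0], "abnormal")
```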
  • the training samples acquired by the aforementioned first analysis device may include data with certain characteristics, which is the acquired data itself.
  • the training sample includes KPI data.
  • each sample can include network KPI data of one or more network KPI categories, that is, the characteristic corresponding to the sample is the KPI category.
  • Table 2 is a schematic description of a sample in the historical training sample set.
  • each training sample in the historical training sample set includes network KPI data of one or more features.
  • each training sample corresponds to multiple network KPI data acquired at the same collection time.
  • the training sample whose identifier (ID) is KPI_2 includes the feature data of 4 features.
  • the 4 features are: network traffic, CPU utilization, packet loss rate, and delay, and the corresponding label is "normal".
  • the feature data corresponding to each feature in the foregoing Table 1 and Table 2 are usually numerical data, that is, each feature has a feature value.
  • Table 1 and Table 2 do not show the feature value.
  • the feature data of the historical training sample set can all be stored in the format of Table 1 or Table 2.
  • the samples in the historical training sample set may also have other forms, which are not limited in the embodiments of the present application.
  • the first analysis device may preprocess the samples in the collected training sample set, and then perform the aforementioned offline training based on the preprocessed training sample set.
  • the preprocessing process is used to process the collected samples into samples that meet preset conditions.
  • the preprocessing process may include one or more of sample deduplication, data cleaning, and data completion.
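  • A minimal sketch of such a preprocessing process is given below. The sample layout (a dict with a feature tuple and a label), the rule of dropping unlabeled samples as "data cleaning", and the column-mean rule for "data completion" are assumptions chosen for illustration, not the method prescribed by the embodiment.

```python
def preprocess(samples):
    """Clean, complete, and deduplicate samples of the form
    {"features": tuple of float or None, "label": str or None}."""
    # Data cleaning: drop samples with no label or with no feature values at all.
    cleaned = [
        s for s in samples
        if s["label"] is not None and any(v is not None for v in s["features"])
    ]
    if not cleaned:
        return []

    # Data completion: replace a missing value with the mean of its feature column.
    width = len(cleaned[0]["features"])
    means = []
    for col in range(width):
        present = [s["features"][col] for s in cleaned if s["features"][col] is not None]
        means.append(sum(present) / len(present) if present else 0.0)
    completed = [
        {
            "features": tuple(means[c] if v is None else v
                              for c, v in enumerate(s["features"])),
            "label": s["label"],
        }
        for s in cleaned
    ]

    # Sample deduplication: keep only the first occurrence of identical samples.
    seen, unique = set(), []
    for s in completed:
        key = (s["features"], s["label"])
        if key not in seen:
            seen.add(key)
            unique.append(s)
    return unique


raw = [
    {"features": (1.0, 2.0), "label": "normal"},
    {"features": (1.0, 2.0), "label": "normal"},     # duplicate
    {"features": (3.0, None), "label": "abnormal"},  # missing value
    {"features": (5.0, 6.0), "label": None},         # unlabeled
]
result = preprocess(raw)
```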
  • the offline training process described in step 401 is also called a model learning process, and is a process in which a machine learning model learns its related classification function.
  • in one alternative, the offline training process is a process of training an initial learning model to obtain the machine learning model; in another alternative, the offline training process is a process of establishing a machine learning model, that is, the machine learning model obtained by offline training is the initial learning model, which is not limited in the embodiment of the present application.
  • the first analysis device may also perform a model evaluation process on the trained machine learning model to evaluate whether the machine learning model meets the performance compliance condition.
  • when the machine learning model meets the performance compliance condition, the following step 402 is executed.
  • when the machine learning model does not meet the performance compliance condition, the machine learning model can be retrained at least once until it meets the performance compliance condition, and then the following step 402 is executed.
  • the first analysis device may set the first performance compliance threshold based on user requirements, and compare the parameter values of the forward performance parameters of the trained machine learning model with the first performance compliance threshold.
  • when the forward performance parameter value is greater than the first performance compliance threshold, it is determined that the machine learning model meets the performance compliance condition; when the forward performance parameter value is not greater than the first performance compliance threshold, it is determined that the machine learning model does not meet the performance compliance condition.
  • the forward performance parameter is positively correlated with the performance of the machine learning model, that is, the larger the parameter value of the forward performance parameter, the better the performance of the machine learning model.
  • the forward performance parameter is an index that characterizes the performance of the model, such as accuracy, recall, precision, or f-score; for example, the first performance compliance threshold is 90%.
  • the accuracy rate = the number of correct predictions / the total number of predictions.
  • the first analysis device may set a first performance degradation threshold based on user requirements, and compare the parameter value of the negative performance parameter of the machine learning model that has been trained with the first performance degradation threshold.
  • when the negative performance parameter value is greater than the first performance degradation threshold, it is determined that the machine learning model does not meet the performance compliance condition; when the negative performance parameter value is not greater than the first performance degradation threshold, it is determined that the machine learning model meets the performance compliance condition.
  • the negative performance parameter is negatively related to the performance of the machine learning model, that is, the larger the parameter value of the negative performance parameter, the worse the performance of the machine learning model.
  • a specified number of test samples are input into the machine learning model to obtain a specified number of classification results.
  • the accuracy or misjudgment rate is computed from the specified number of classification results.
  • the total number of predictions is the aforementioned specified number.
  • the correctness or error of the predicted classification result can be judged by the operation and maintenance personnel based on expert experience.
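  • The model evaluation described above can be sketched as follows. The function name and the list-based inputs are hypothetical; the 90% threshold is the example value given above, and the misjudgment rate is taken to be the share of wrong predictions among all predictions.

```python
def evaluate_model(predicted, expected, compliance_threshold=0.90):
    """Return (accuracy, misjudgment_rate, meets_condition) for one test run.

    `expected` holds the results judged correct by operation and maintenance
    personnel; `predicted` holds the classification results of the model.
    """
    if not predicted or len(predicted) != len(expected):
        raise ValueError("need two non-empty result lists of equal length")
    total = len(predicted)  # the specified number of test samples
    correct = sum(p == e for p, e in zip(predicted, expected))
    accuracy = correct / total
    misjudgment_rate = (total - correct) / total
    # The condition is met only when accuracy is strictly greater than the threshold.
    return accuracy, misjudgment_rate, accuracy > compliance_threshold


predicted = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
expected = [1, 0, 0, 1, 1, 0, 0, 0, 1, 1]
accuracy, misjudgment_rate, meets = evaluate_model(predicted, expected)
```

  • In this run 9 of 10 predictions are correct, so the accuracy equals the threshold but is not greater than it, and the model would be retrained before step 402.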
  • the foregoing retraining process may be an offline training process or an online training process (such as an incremental training process).
  • the training samples used in the retraining process and the training samples used in the previous training process may be the same or different, which is not limited in the embodiment of the present application.
  • Step 402 The first analysis device sends the machine learning model to multiple site analysis devices.
  • the first analysis device can provide the machine learning model to each site analysis device in different ways.
  • the embodiments of this application are described with the following two examples:
  • in one optional example, the first analysis device may send the machine learning model to the site analysis device after receiving a model acquisition request sent by the site analysis device, where the model acquisition request is used to request the machine learning model from the first analysis device; in another optional example, the first analysis device may actively push the machine learning model to the site analysis device after the machine learning model is obtained through training.
  • the first analysis device may include a model deployment module, which establishes a communication connection with each site analysis device, and the machine learning model can be deployed to each site analysis device through the model deployment module.
  • Step 403 The site analysis equipment uses a machine learning model to predict the classification result.
  • different machine learning models can implement different functions. These functions are all realized by predicting the classification results. Different functions correspond to different classification results. After receiving the machine learning model sent by the first analysis device, the site analysis device may use the machine learning model to predict the classification result.
  • the data whose classification result needs to be predicted may include the KPI of the CPU and/or the KPI of the memory.
  • the site analysis device can periodically perform an anomaly detection process.
  • the anomaly detection results output by the machine learning model are shown in Tables 3 and 4.
  • Tables 3 and 4 record the anomaly detection results of the data to be detected that is acquired at different collection times (also called data generation times), where the different collection times include T1 to TN (N is an integer greater than 1), and each anomaly detection result indicates whether the corresponding data to be detected is abnormal.
  • the data to be detected in Table 3 and Table 4 both include one-dimensional feature data.
  • Table 3 records the anomaly detection results of the data to be detected whose feature category is the KPI of the CPU;
  • Table 4 records the anomaly detection results of the data to be detected whose feature category is the KPI of the memory.
  • assume that 0 represents normal and 1 represents abnormal.
  • the interval between every two adjacent collection times from T1 to TN is a preset time period. Take the collection time T1 as an example. At this time, the KPI of the CPU in Table 3 is 0, and the KPI of the memory in Table 4 is 1, which means that the KPI of the CPU collected at the collection time T1 is normal, and the KPI of the memory collected at the collection time T1 is abnormal.
  • Step 404 The site analysis device performs incremental training on the machine learning model based on the first training sample set.
  • the site analysis device can periodically obtain the first training sample set; for another example, the site analysis device obtains the first training sample set after receiving a sample set acquisition instruction sent by the operation and maintenance personnel of the site analysis device, or a sample set acquisition instruction sent by the first analysis device or the aforementioned management device, where the sample set acquisition instruction is used to instruct to acquire the first training sample set; for another example, when the machine learning model deteriorates, the site analysis device obtains the first training sample set.
  • the site analysis equipment only performs incremental training of the machine learning model when the machine learning model is degraded, which can reduce the training time and avoid affecting the user's business.
  • the trigger mechanism of the incremental training (that is, the detection mechanism to detect whether the model is degraded) may include the following two situations:
  • the application scenario of the model training method further includes an evaluation device.
  • the evaluation device can control the site analysis device to perform incremental training of the machine learning model based on the evaluation result of the classification result.
  • the process includes:
  • Step 4041 the site analysis device sends the prediction information to the evaluation device.
  • in one example, the site analysis device may send prediction information to the evaluation device each time the machine learning model is used to predict a classification result, and the prediction information includes the predicted classification result; in another example, the site analysis device may periodically send prediction information to the evaluation device, and the prediction information includes the classification results obtained in the current period; in another example, the site analysis device may send prediction information to the evaluation device after the number of classification results obtained reaches a number threshold, and the prediction information includes the obtained classification results; in another example, the site analysis device may send prediction information to the evaluation device within a set time period, and the prediction information includes the currently obtained classification results.
  • the time period may be a time period set by the user, or a time period in which the frequency of occurrence of user services is lower than a specified frequency threshold, such as 0:00-5:00. In this way, interference with user services can be avoided.
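  • The last sending strategy, in which prediction information is sent only within a set time period, can be sketched as a simple window check. The function name is hypothetical, and treating the window as half-open (start included, end excluded) is an assumption; the 0:00-5:00 default is the example value given above.

```python
from datetime import time


def in_send_window(now, start=time(0, 0), end=time(5, 0)):
    """Return True if `now` falls inside the window [start, end).

    A window whose start is later than its end is treated as wrapping
    around midnight (for example 22:00-02:00).
    """
    if start <= end:
        return start <= now < end
    return now >= start or now < end


send_at_night = in_send_window(time(3, 30))   # inside 0:00-5:00
send_at_noon = in_send_window(time(12, 0))    # outside 0:00-5:00
```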
  • the aforementioned prediction information may also carry other information, so that the evaluation device can effectively evaluate each classification result and ensure the accuracy of the evaluation.
  • the machine learning model is used to predict the classification result of the to-be-predicted data composed of one or more KPI feature data.
  • the KPI characteristic data is the characteristic data of the KPI time series, or KPI data.
  • the prediction information also includes: the identification of the device to which the data to be predicted belongs (that is, the device that generates the KPI data corresponding to the data to be predicted, such as a network device), the KPI category corresponding to the data to be predicted, and the data to be predicted The time when the corresponding KPI data was collected. Based on this information, the device, KPI category, and KPI data collection time corresponding to each classification result can be determined, so as to accurately determine whether the KPI data collected at different collection time is abnormal.
  • when the KPI feature data is the feature data of a KPI time series, the KPI data corresponding to the data to be predicted is the data in the KPI time series, the KPI category corresponding to the data to be predicted is the category of the KPI time series, and the collection time is the collection time of any data in the KPI time series, or the collection time of data at a specified position, for example, the collection time of the last data.
  • for example, the KPI category of the KPI time series is the packet loss rate, and the time series (x1, x2, ..., xn) means that the packet loss rates collected in a collection period are x1, x2, ..., xn.
  • the data to be predicted is (1, 2, 3, 4), which means that the moving average is 1, the weighted average is 2, the time series decomposition period component is 3, and the period yoy is 4. It is assumed that the collection time of the KPI data corresponding to the data to be predicted is the collection time of the last data in the KPI time series. Then the KPI category corresponding to the data to be predicted is the packet loss rate, and the collection time of the KPI data corresponding to the data to be predicted is the collection time of xn.
  • when the KPI feature data is KPI data, the KPI category corresponding to the data to be predicted is the KPI category of the KPI data, the KPI data corresponding to the data to be predicted is the KPI data itself, and the collection time is the collection time of the KPI data.
  • for example, the structure of the data to be predicted is similar to Table 2, and the data to be predicted is (100, 20%, 3, 4), which means that the network traffic is 100, the CPU utilization is 20%, the packet loss rate is 3, and the delay is 4.
  • the KPI categories corresponding to the data to be predicted are network traffic, CPU utilization, packet loss rate, and delay.
  • the collection time of the KPI data corresponding to the data to be predicted is the collection time of (100, 20%, 3, 4), which is usually one and the same collection time.
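  • The contents of the prediction information described above can be sketched as a small record type. The class name, field names, device identifier, and timestamp are hypothetical; only the kinds of information carried (classification result, device identification, KPI category, collection time, and optionally the data to be predicted) come from the description.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PredictionInfo:
    device_id: str                 # device that generated the KPI data
    kpi_categories: tuple          # KPI category of each value in data_to_predict
    collection_time: datetime      # when the KPI data was collected
    data_to_predict: tuple         # the data the model classified
    classification_result: str     # result predicted by the machine learning model


# Example following the Table 2 style sample (100, 20%, 3, 4).
info = PredictionInfo(
    device_id="network-device-1",                 # hypothetical identifier
    kpi_categories=("network traffic", "CPU utilization",
                    "packet loss rate", "delay"),
    collection_time=datetime(2020, 1, 1, 0, 0),   # hypothetical timestamp
    data_to_predict=(100, 0.20, 3, 4),
    classification_result="normal",
)
```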
  • after the evaluation device receives the prediction information, it can present at least the classification result and the data to be predicted in the prediction information, or present all the content in the prediction information, so that the operation and maintenance personnel can mark, based on expert experience, whether the classification result is correct or wrong.
  • Step 4042 The evaluation device evaluates whether the machine learning model is degraded based on the prediction information.
  • the evaluation device may evaluate whether the machine learning model is degraded based on the prediction information when the received classification results reach a specified number threshold; it may also periodically evaluate whether the machine learning model is degraded based on the prediction information.
  • the evaluation period can be one week or one month, etc.
  • the evaluation process in the embodiment of the present application is similar in principle to the model evaluation process in step 401 described above.
  • the evaluation device may set a second performance compliance threshold based on user requirements, and compare the parameter value of the forward performance parameter of the machine learning model that has been trained with the second performance compliance threshold.
  • the forward performance parameter is positively correlated with the performance of the machine learning model, that is, the larger the parameter value of the forward performance parameter, the better the performance of the machine learning model.
  • the forward performance parameter is an index that characterizes model performance such as accuracy, recall, precision, or f-score
  • the second performance compliance threshold is 90%.
  • the second performance compliance threshold and the aforementioned first performance compliance threshold may be the same or different.
  • the calculation method of the correctness rate can refer to the calculation method of the correctness rate provided in the model evaluation process in 401.
  • the evaluation device may set a second performance degradation threshold based on user requirements, and compare the parameter value of the negative performance parameter of the machine learning model that has been trained with the second performance degradation threshold.
  • the negative performance parameter is negatively related to the performance of the machine learning model, that is, the larger the parameter value of the negative performance parameter, the worse the performance of the machine learning model.
  • the negative performance parameter is the classification result error rate (also called the misjudgment rate), and the second performance degradation threshold is 20%.
  • the second performance degradation threshold may be the same as or different from the aforementioned first performance degradation threshold.
  • the calculation method of the misjudgment rate can refer to the calculation method of the misjudgment rate provided in the model evaluation process in 401.
  • the evaluation device obtains multiple classification results.
  • the accuracy or misjudgment rate is computed from the multiple classification results obtained.
  • the total number of predictions in the accuracy rate or the misjudgment rate is the number of classification results obtained.
  • the correctness or error of the predicted classification results can be determined by the operation and maintenance personnel of the evaluation equipment.
  • the misjudgment rate can also be obtained in other ways.
  • the site analysis device also establishes a communication connection with the management device.
  • when the classification result output by the machine learning model of the site analysis device is "abnormal", the site analysis device sends alarm information to the management device; the alarm information is used to indicate that the sample data is abnormal, and carries the sample data whose classification result is "abnormal".
  • the management device identifies the sample data and the classification result in the alarm information. If the classification result is incorrect, the classification result is updated (that is, the classification result "abnormal" is updated to "normal"), indicating that the alarm information is false alarm information.
  • the number of false alarm information is the number of errors in the predicted classification result.
  • the site analysis device or the management device can feed back the number of false alarm information in each evaluation period to the evaluation device, or report the false alarm information to the evaluation device, which then counts the number of false alarm information. Then, based on the number of classification results obtained in the evaluation period, as counted by the evaluation device, the misjudgment rate is calculated by using the aforementioned calculation formula of the misjudgment rate.
  • Step 4043 After determining that the machine learning model is degraded, the evaluation device sends a training instruction to the site analysis device.
  • the training instruction is used to instruct to train the machine learning model.
  • Step 4044 After receiving the training instruction sent by the evaluation device, the site analysis device performs incremental training on the machine learning model based on the first training sample set.
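  • Steps 4042 and 4043 can be sketched as follows for the case in which the misjudgment rate is derived from false alarm information. The function names are hypothetical; the misjudgment rate is assumed to be the number of false alarms divided by the number of classification results in the evaluation period, and 20% is the example value of the second performance degradation threshold.

```python
def misjudgment_rate(num_false_alarms, num_results):
    """Share of classification results in the evaluation period that were wrong."""
    if num_results <= 0:
        raise ValueError("no classification results in this evaluation period")
    return num_false_alarms / num_results


def should_send_training_instruction(num_false_alarms, num_results,
                                     degradation_threshold=0.20):
    """The model counts as degraded when the misjudgment rate exceeds the threshold."""
    return misjudgment_rate(num_false_alarms, num_results) > degradation_threshold


degraded = should_send_training_instruction(30, 100)   # 30% misjudged
healthy = should_send_training_instruction(5, 100)     # 5% misjudged
```

  • When the function returns True, the evaluation device would send the training instruction of step 4043, and the site analysis device would then start the incremental training of step 4044.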
  • the site analysis device itself can evaluate whether the machine learning model has deteriorated based on the prediction information, and after the machine learning model has deteriorated, perform incremental training on the machine learning model based on the first training sample set.
  • the evaluation process can refer to the aforementioned step 4042.
  • the site analysis device can also use other trigger mechanisms for incremental training. For example, incremental training is performed when at least one of the following trigger conditions is met: the incremental training period is reached, a training instruction sent by the operation and maintenance personnel of the site analysis device is received, or a training instruction sent by the first analysis device is received. The training instruction is used to instruct incremental training.
  • the first training sample set may include labeled sample data that the site analysis device extracts directly, based on a set rule, from the data it has obtained.
  • the first training sample set may include time series data obtained by the site analysis device from a network device, or time series feature data.
  • the samples of the first training sample set may be presented to the operation and maintenance personnel by the site analysis device, the foregoing management device, or the foregoing first analysis device, and the operation and maintenance personnel may label them based on expert experience.
  • the training sample can have various forms. Accordingly, the site analysis device can obtain the training sample in various ways.
  • the embodiment of this application uses the following two optional methods as examples for illustration:
  • the training samples in the first training sample set acquired by the site analysis device may include data determined based on a time series, for example, data determined based on a KPI time series.
  • each training sample in the first training sample set corresponds to one time series.
  • each training sample may include feature data obtained by extracting one or more features from the corresponding time series.
  • the number of features corresponding to each training sample is the same as the number of feature data of the training sample (that is, the feature corresponds to the feature data one-to-one).
  • the features in the training sample refer to the features of the corresponding time series, which may include data features and/or extracted features.
  • in one optional example, the site analysis device can receive the time series sent by a network device connected to the site analysis device in the corresponding site network (that is, a network device managed by the site analysis device); in another optional example, the site analysis device has an input/output (I/O) interface, and receives the time series in the corresponding site network through the I/O interface; in another optional example, the site analysis device can read the time series from its corresponding storage device, and the storage device is used to store the time series obtained in advance by the site analysis device in the corresponding site network.
  • the data obtaining process can refer to the aforementioned process of obtaining the training samples in the historical training sample set from the first time series. For example, the target feature to be extracted is determined, the feature data of the determined target feature is extracted from the time series, and a training sample composed of the acquired data of the target feature is obtained. This is not repeated in the embodiment of the application.
  • the training samples acquired by the aforementioned site analysis device may include data with certain characteristics, which is the acquired data itself.
  • the training sample includes KPI data.
  • each sample can include network KPI data of one or more network KPI categories, that is, the characteristic corresponding to the sample is the KPI category.
  • the process of obtaining training samples by the site analysis device can refer to the process of obtaining training samples by the first analysis device in step 401, and the structure of the training samples in the first training sample set obtained can also refer to the structure of the training samples in the historical training sample set. This is not repeated in the embodiment of this application.
  • the site analysis device can screen the sample data it has obtained, select sample data of better quality as training samples, and provide these training samples to the operation and maintenance personnel for labeling, so as to obtain labeled sample data, thereby improving the performance of the trained machine learning model.
  • This application refers to the screening function as an active learning function.
  • the machine learning model predicts the classification results based on probability theory, that is, predicts the probability of the existence of multiple classification results, and uses the classification result with the highest probability as the final classification result.
  • a machine learning model based on the principle of binary classification selects a classification result with a higher probability (such as 0 or 1) as the final classification result output.
  • binary classification means that the machine learning model has two classification results.
  • Table 5 records the probabilities of the different classification results of the KPI of the CPU obtained at different collection times, where the different collection times include T1 to TN (N is an integer greater than 1), and the different classification results include "normal" and "abnormal".
  • 0_prob represents the probability that the prediction is normal, and 1_prob represents the probability that the prediction is abnormal.
  • taking the collection time T1 as an example, the machine learning model determines that the final classification result of the KPI of the CPU collected at T1 is 1, that is, the KPI of the CPU collected at T1 is abnormal.
  • the classification result finally determined by the machine learning model is the classification result with the highest probability, but when the gap between it and the classification result with the second-highest probability is very small, the reliability of the classification result finally determined by the machine learning model is poor.
  • in practical applications of the embodiments of the present application, when the probabilities of the multiple classification results obtained by prediction differ greatly from each other, the reliability of the classification result finally determined by the machine learning model is higher.
  • the probability, predicted by the machine learning model, that the KPI of the CPU acquired at the collection time T1 is 0 and the probability that it is 1 differ by only 0.02. The two are very close, which indicates that the prediction result of the machine learning model for the sample data acquired at the collection time T1 is unreliable.
  • the probability, predicted by the machine learning model, that the KPI of the CPU acquired at the collection time T2 is 0 and the probability that it is 1 are quite different, which indicates that the prediction result of the machine learning model for the sample data acquired at the collection time T2 is reliable.
  • when the degree of discrimination is large, this kind of sample does not need to be used for training; when the degree of discrimination is small, the machine learning model cannot determine an accurate classification result.
  • for this kind of sample, the label can be determined manually or by label migration to give an accurate classification result (which can also be considered an ideal classification result), and training with the sample and its determined label as a training sample can improve the reliability of the classification result of the machine learning model for this kind of sample.
  • the embodiment of the present application uses a low-discrimination condition to screen the first training sample set, that is, the first training sample set includes samples that meet the low-discrimination condition selected from the samples obtained by the site analysis device.
  • the low discrimination condition includes at least one of the following:
  • Condition 1 The absolute value of the difference between any two probabilities in the target probability set obtained by using the machine learning model to predict the sample is less than the first difference threshold.
  • the target probability set includes the probabilities of the first n classification results in descending order of probability, 1 < n < m, where m is the total number of probabilities obtained by using the machine learning model to predict the sample. Under this condition, samples with insufficient discrimination among n classification results can be screened out.
  • Condition 2 The absolute value of the difference between any two probabilities in the target probability set obtained by using the machine learning model to predict the sample is less than the first difference threshold.
  • the target probability set includes the probabilities of the first n classification results in descending order of probability, where n is an integer greater than 1. Under this condition, samples with insufficient discrimination of the classification results can be screened out.
  • Condition 3 The absolute value of the difference between the highest probability and the lowest probability among the probabilities of the various classification results of the sample predicted by the machine learning model is less than the third difference threshold. Under this condition, it is possible to screen samples with insufficient discrimination among multiple classification results.
  • Condition 4 The absolute value of the difference between any two probabilities among the probabilities of the various classification results of the sample predicted by the machine learning model is less than the fourth difference threshold. Under this condition, it is possible to screen samples with insufficient discrimination among multiple classification results.
  • Condition 5 The probability distribution entropy E of the multiple classification results obtained by using the machine learning model to predict the sample is greater than a specified distribution entropy threshold, where E satisfies: $E = -\sum_{i} P(x_i) \log_b P(x_i)$
  • where x_i represents the i-th classification result, P(x_i) represents the predicted probability of the i-th classification result of the sample, b is the specified base, such as 2 or the constant e, 0 ≤ P(x_i) ≤ 1, and Σ denotes summation.
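  • The entropy-based condition can be sketched as follows, assuming E is the standard Shannon entropy of the predicted probabilities with base b. The function names and the threshold value 0.9 are hypothetical, and terms with zero probability are skipped, which follows the usual convention that 0 · log 0 is taken as 0.

```python
import math


def distribution_entropy(probs, base=2.0):
    """Entropy of the predicted probabilities; zero-probability terms contribute 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0.0)


def meets_entropy_condition(probs, entropy_threshold, base=2.0):
    """Low discrimination: the entropy is greater than the specified threshold."""
    return distribution_entropy(probs, base) > entropy_threshold


uncertain = meets_entropy_condition([0.51, 0.49], entropy_threshold=0.9)
confident = meets_entropy_condition([0.95, 0.05], entropy_threshold=0.9)
```

  • The entropy is largest when all classification results are equally likely, so a high value indicates that the model cannot separate the results well for this sample.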
  • the machine learning model can be used to predict the first sample to obtain the probabilities of multiple classification results, each of which ranges from 0 to 1.
  • the probabilities of the multiple classification results are sorted in descending order.
  • the probabilities of the first n classification results are selected from the sorted probabilities to obtain the target probability set, the absolute value of the difference between every two probabilities in the target probability set is calculated, and each calculated absolute value is compared with the first difference threshold. When the absolute value of the difference between every two probabilities is less than the first difference threshold, the first sample is determined as a sample that satisfies the low discrimination condition.
  • the first difference threshold is 0.3
  • the probabilities of the 3 classification results obtained by using the machine learning model to predict the sample X are 0.32, 0.33, and 0.35
  • the target probability set then includes 0.33 and 0.35 (that is, n = 2); the absolute value of the difference between the two is 0.02, which is less than the first difference threshold, so the sample X is a sample that meets the low discrimination condition.
  • the first sample meets the low discrimination condition.
  • the first probability is the probability of using the first tree model to predict the classification result of the first sample as the first classification result
  • the second probability is the probability of using the first tree model to predict the classification result of the first sample as the second classification result.
  • the probabilities are 0.51 and 0.49 respectively, that is, the first probability and the second probability are 0.51 and 0.49 respectively, and the difference is only 0.02.
  • the absolute value of the difference between the two is less than 0.1. It can be determined that the first sample meets the low discrimination condition.
  • the machine learning model can be used to predict the first sample to obtain the probabilities of multiple classification results. The highest probability and the lowest probability are selected from these probabilities, the absolute value of the difference between the two is calculated, and the calculated absolute value is compared with the third difference threshold. When the absolute value of the difference is less than the third difference threshold, the first sample is determined as a sample that satisfies the low discrimination condition.
  • the third difference threshold is 0.2
  • the probabilities of the 3 classification results obtained by using the machine learning model to predict the sample Y are 0.33, 0.33, and 0.34, respectively
  • the maximum probability and the minimum probability are 0.34 and 0.33, respectively; the absolute value of their difference is 0.01, which is less than the third difference threshold, so the sample Y is a sample that meets the low discrimination condition.
  • the machine learning model can be used to predict the first sample to obtain the probabilities of multiple classification results. The absolute value of the difference between every two of these probabilities is calculated and compared with the fourth difference threshold. When the absolute value of the difference between any two probabilities is less than the fourth difference threshold, the first sample is determined as a sample that satisfies the low discrimination condition.
  • the fourth difference threshold is 0.2
  • the probabilities of the 3 classification results obtained by using the machine learning model to predict the sample Z are 0.33, 0.33, and 0.34, respectively.
  • the absolute value of the difference between any two probabilities is less than 0.2, so the sample Z is a sample that satisfies the low discrimination condition.
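  • The checks for conditions 2 to 4 can be sketched as follows. This is a minimal illustration rather than the embodiment's implementation; the function names and argument order are assumptions introduced here, and the probabilities are assumed to be given as a plain list.

```python
from itertools import combinations


def all_pairs_within(probs, threshold):
    """True if every pairwise absolute difference is below the threshold."""
    return all(abs(p - q) < threshold for p, q in combinations(probs, 2))


def meets_condition_2(probs, n, first_threshold):
    """Condition 2: compare only the n highest probabilities (target probability set)."""
    target_set = sorted(probs, reverse=True)[:n]
    return all_pairs_within(target_set, first_threshold)


def meets_condition_3(probs, third_threshold):
    """Condition 3: compare the highest probability with the lowest probability."""
    return abs(max(probs) - min(probs)) < third_threshold


def meets_condition_4(probs, fourth_threshold):
    """Condition 4: compare every pair of probabilities."""
    return all_pairs_within(probs, fourth_threshold)
```

  • With the values from the examples above, `meets_condition_2([0.32, 0.33, 0.35], 2, 0.3)`, `meets_condition_3([0.33, 0.33, 0.34], 0.2)`, and `meets_condition_4([0.33, 0.33, 0.34], 0.2)` all return `True`.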
  • the probability distribution is a description of random variables. Different random variables have the same or different probability distributions.
  • the probability distribution entropy is the description of different probability distributions.
  • the probability distribution entropy is positively correlated with the uncertainty of the probability: the greater the entropy of the probability distribution, the greater the uncertainty of the probability. For example, when a two-class machine learning model predicts that the probability of each of the two classification results of the sample is 50%, the probability distribution entropy takes its maximum value, and it is impossible to select a classification result with a reliable probability as the final classification result.
  • for a two-class machine learning model, the aforementioned formula 1 can be written as: $E = -P(x_1)\log_b P(x_1) - P(x_2)\log_b P(x_2)$
  • $x_1$ represents the first classification result
  • $x_2$ represents the second classification result.
  • for example, $x_1$ indicates that the classification result is "normal"
  • and $x_2$ indicates that the classification result is "abnormal".
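  • The entropy condition can be sketched as follows, assuming that formula 1 is the standard entropy $E = -\sum_i P(x_i)\log_b P(x_i)$, which is consistent with the symbol definitions above. The function names and the handling of zero probabilities (treated as contributing 0) are assumptions of this sketch.

```python
import math


def distribution_entropy(probs, base=2.0):
    """Entropy E = -sum(P(x_i) * log_b P(x_i)); terms with P(x_i) = 0 contribute 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)


def is_high_entropy(probs, entropy_threshold, base=2.0):
    """True if the entropy exceeds the specified distribution entropy threshold."""
    return distribution_entropy(probs, base) > entropy_threshold
```

  • For the two-class case, the entropy is largest when both probabilities are 50% (1 when the base is 2) and is 0 when one classification result has probability 1.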
  • Step 405 When the performance of the machine learning model after incremental training does not meet the performance compliance condition, the site analysis device triggers the first analysis device to retrain the machine learning model.
  • the machine learning model after incremental training may have poor performance due to poor training sample quality or other reasons.
  • the first analysis device is still required to retrain the machine learning model.
  • the first analysis device is an analysis device that supports offline training.
  • the amount of data in the training sample set it obtains is much larger than that of the first training sample set of the site analysis device, the length of time for which the first analysis device can train is much longer than the training time allowed for the site analysis device, and the computing performance of the first analysis device is also higher than that of the site analysis device. Therefore, when the performance of the machine learning model after incremental training does not meet the performance compliance condition, the machine learning model can be retrained by the first analysis device to obtain a machine learning model with better performance.
  • the action of evaluating whether the performance of the machine learning model after incremental training meets the performance compliance condition can be performed by the first analysis device.
  • For this process, refer to the process in step 401 of evaluating whether the machine learning model meets the performance compliance condition. A machine learning model that meets the performance compliance condition has not deteriorated, and a machine learning model that does not meet the performance compliance condition has deteriorated.
  • the action of evaluating whether the performance of the machine learning model after incremental training meets the performance compliance condition can be performed by the first analysis device, or by the evaluation device or the site analysis device. For this process, refer to the process in step 404 of detecting whether the machine learning model has degraded.
  • when a device other than the site analysis device (such as the first analysis device or the evaluation device) performs this evaluation, that device needs to send the evaluation result to the site analysis device after completing the evaluation, so that the site analysis device can determine whether the performance of the machine learning model after incremental training meets the performance compliance condition.
  • the process of the site analysis device triggering the first analysis device to retrain the machine learning model may include: the site analysis device sends a retraining request to the first analysis device, where the retraining request is used to request the first analysis device to retrain the machine learning model;
  • after receiving the retraining request, the first analysis device retrains the machine learning model based on the retraining request.
  • the site analysis device may also send the training sample set obtained from the corresponding site network for the first analysis device to retrain the machine learning model based on the training sample set.
  • the training sample set may be carried in the aforementioned retraining request, or may be sent to the first analysis device through independent information, which is not limited in the embodiment of the present application.
  • the retraining process of the first analysis device may include the following two optional methods:
  • the first analysis device can retrain the machine learning model based on the training sample set sent by the site analysis device that sent the retraining request.
  • the training sample set sent by the site analysis device is a training sample set obtained from the site network corresponding to the site analysis device, and it includes at least the aforementioned first training sample set.
  • after receiving the retraining request sent by the site analysis device, the first analysis device retrains the machine learning model based on the training sample set sent by the site analysis device that sent the retraining request and the training sample sets sent by other site analysis devices.
  • the training sample set used in the retraining includes not only the training sample set obtained by the aforementioned site analysis device from its corresponding site network, but also the training sample sets obtained by other site analysis devices from their corresponding site networks. Therefore, the training sample set used for retraining has a wider range of sample sources and more diverse data types.
  • the machine learning model obtained by retraining is more suitable for the needs of multiple site analysis equipment, which improves the versatility of the model obtained from offline training. In this way, the generalization of the model is realized, and the overall training cost of the first analysis device is reduced.
  • the site analysis device may not send a retraining request, but only send the training sample set obtained from the corresponding site network.
  • in this case, the first analysis device can also use the following optional third method for retraining:
  • the first analysis device receives training sample sets sent by at least two site analysis devices, and retrains the machine learning model based on the received training sample sets.
  • the first analysis device may retrain the machine learning model after receiving training sample sets sent by a specified number of site analysis devices (for example, all site analysis devices that have established a communication connection with the first analysis device), when the training period is reached, or when a sufficient number of training samples have been obtained (that is, when the number of training samples obtained is greater than the training data threshold).
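  • The three triggers for this third method can be sketched as a single predicate. The class name, field names, and the use of seconds for the training period are assumptions made for illustration; the embodiment does not prescribe them.

```python
from dataclasses import dataclass


@dataclass
class RetrainPolicy:
    required_devices: int          # specified number of site analysis devices
    training_period_s: float       # training period, in seconds (assumed unit)
    training_data_threshold: int   # retrain once more samples than this are collected


def should_retrain(policy, devices_reported, seconds_since_last_training, samples_collected):
    """True if any one of the three retraining triggers is satisfied."""
    return (
        devices_reported >= policy.required_devices
        or seconds_since_last_training >= policy.training_period_s
        or samples_collected > policy.training_data_threshold
    )
```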
  • the site analysis device sends the training sample set (for example, the first training sample set) obtained from the corresponding site network to the first analysis device.
  • the acquired training sample set can be uploaded periodically; or, the site analysis device can upload the training sample set after receiving a sample set upload instruction sent by the operation and maintenance personnel or by the first analysis device, where the sample set upload instruction is used to instruct the site analysis device to upload the acquired training sample set to the first analysis device.
  • the first analysis device may perform retraining of the machine learning model based on the collected training sample set.
  • the retraining process may be an offline training process or an incremental training process.
  • the training samples used in the retraining process and the training samples used in the previous training process may be the same or different, which is not limited in the embodiment of the present application.
  • the foregoing steps 401 and 404 may be performed periodically, that is, the application scenario supports periodic offline training or incremental training.
  • after evaluation determines that its performance meets the performance compliance condition, the machine learning model retrained by the first analysis device may be sent to at least one site analysis device in the manner of step 402. For example, it is sent only to the site analysis device that sent the aforementioned retraining request to the first analysis device; or to the site analysis devices that provided training sample sets for the retraining; or to all or designated site analysis devices that have established a communication connection with the first analysis device.
  • for a site analysis device that receives the retrained machine learning model, if the machine learning model trained by the site analysis device itself also meets the performance requirements, the site analysis device can select a target machine learning model from the machine learning models it has obtained, choosing the better machine learning model (for example, the one with the highest performance index) to predict the classification result.
  • the site analysis equipment selects the latest machine learning model as the target machine learning model to adapt to the classification requirements of the current application scenario.
  • the machine learning model can be of multiple types.
  • the tree model is a relatively common machine learning model.
  • the tree model includes multiple associated nodes.
  • each node includes a node element and several branches pointing to subtrees. The subtree on the left of a node is called the left subtree of the node, and the subtree on the right of the node is called the right subtree. The root of a subtree of a node is called a child node of the node. If one node is a child node of another node, the other node is the parent node of that node. The depth (also called height or level) of a node is the number of edges on the longest simple path from the root node to the node; under this definition, for example, the depth of the root node is 0.
  • leaf nodes, also called terminal nodes, are nodes with a degree of 0, where the degree of a node is the number of subtrees of the node. Non-leaf nodes are nodes other than leaf nodes, and include the root node and the nodes between the root node and the leaf nodes.
  • A binary tree is a tree structure in which each node has at most two subtrees, and it is a relatively common tree structure.
  • the machine learning model in the embodiment of the present application may be a binary tree model, for example, an isolation forest (iForest) model.
  • Figure 6 is a schematic tree structure provided by an embodiment of the application.
  • the tree structure includes nodes P1 to P5, where P1 is the root node, P3, P4, and P5 are leaf nodes, P1 and P2 are non-leaf nodes, and the depth of the tree is 2.
  • the machine learning model is formed by two node splits, one at node P1 and one at node P2.
  • Node splitting means that the training sample set corresponding to a node is divided into at most two subsets at a certain splitting point of a certain splitting dimension. It can be considered that the node splits into at most two child nodes, and each child node corresponds to one subset.
  • splitting is thus the method of dividing the training sample set corresponding to a node among its child nodes.
  • FIG. 6 is only a schematic tree structure, which can also be represented in other ways, such as FIG. 8 or FIG. 9. This application does not limit the way in which the tree structure is represented.
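  • As an illustration of the terms above, the tree of Figure 6 can be written down as a small data structure. The class and function names are assumptions of this sketch, and placing P2 as the left child and P3 as the right child of P1 is also an assumption, since the text only states that P4 and P5 are the left and right child nodes of P2.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        """A leaf node has degree 0, that is, no subtrees."""
        return self.left is None and self.right is None


def leaves(node):
    """Names of all leaf nodes below (and including) the given node."""
    if node.is_leaf():
        return [node.name]
    found = []
    for child in (node.left, node.right):
        if child is not None:
            found.extend(leaves(child))
    return found


def depth(node):
    """Number of edges on the longest path from this node down to a leaf."""
    children = [c for c in (node.left, node.right) if c is not None]
    return 0 if not children else 1 + max(depth(c) for c in children)


# Figure 6: P1 splits into P2 and P3; P2 splits into P4 and P5.
p2 = Node("P2", Node("P4"), Node("P5"))
figure6 = Node("P1", p2, Node("P3"))
```

  • Here `leaves(figure6)` lists P4, P5, and P3, and `depth(figure6)` is 2, matching the description of Figure 6.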
  • the training efficiency of the machine learning model can also be improved.
  • the splitting of the tree model is involved, and the main principle is to split the space corresponding to one or more samples (also called sample space).
  • each training sample includes feature data of one or more features.
  • the feature (ie feature category) corresponding to a training sample is the dimension of the feature data of the space corresponding to the training sample.
  • a training sample includes one-dimensional or multi-dimensional feature data, which means that the training sample includes one or more feature data of feature dimensions.
  • a training sample includes feature data of two-dimensional features, which is also called the training sample includes feature data of two feature dimensions, and the space corresponding to the training sample is a two-dimensional space (ie, a plane).
  • a training sample in Table 1 includes feature data of 4 feature dimensions, and the space corresponding to the training sample is a 4-dimensional space.
  • Figure 7 is a schematic diagram of the splitting principle of a tree model provided by an embodiment of the present application.
  • the tree model is split based on the Mondrian process: a random hyperplane cuts the sample space (data space) once to generate two subspaces, a random hyperplane then cuts each subspace in turn, and the cutting loops until there is only one sample point in each subspace.
  • a cluster with a high density can be cut many times before it stops cutting, and a point with a low density can easily stop in a subspace very early, which ultimately corresponds to a leaf node of the tree.
  • Figure 7 assumes that the samples in the training sample set corresponding to the machine learning model in Figure 6 include two-dimensional feature data, that is, feature data that includes two feature dimensions.
  • the feature dimensions are feature dimension x and feature dimension y.
  • the training sample set includes samples (a1, b1), (a1, b2), (a2, b1).
  • the splitting dimension of the first node splitting is the feature dimension x, and the splitting point is a3.
  • the sample space where the training sample set is located is cut into 2 subspaces, corresponding to the left subtree and the right subtree of the P1 node in Figure 6;
  • the splitting dimension of the second node splitting is the feature dimension y, and the splitting point is b3, which corresponds to the left subtree and the right subtree of the P2 node in Figure 6.
  • the sample set {(a1, b1), (a1, b2), (a2, b1)} is thereby divided into three subspaces.
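  • The two splits of Figure 7 can be traced with a short sketch. The numeric values chosen for a1, a2, a3, b1, b2, and b3 are assumptions picked only so that a1 < a3 < a2 and b1 < b3 < b2; the convention that values below the split point go left and all others go right is likewise an assumption.

```python
def split_samples(samples, dimension, split_point):
    """Divide samples into (left, right) by comparing one feature with the split point."""
    left = [s for s in samples if s[dimension] < split_point]
    right = [s for s in samples if s[dimension] >= split_point]
    return left, right


# Assumed numeric values for the symbols used in Figure 7.
a1, a2, a3 = 1.0, 3.0, 2.0
b1, b2, b3 = 1.0, 3.0, 2.0
X, Y = 0, 1  # tuple indices of feature dimension x and feature dimension y

training_set = [(a1, b1), (a1, b2), (a2, b1)]

# First node split (node P1): splitting dimension x, splitting point a3.
p2_samples, p3_samples = split_samples(training_set, X, a3)
# Second node split (node P2): splitting dimension y, splitting point b3.
p4_samples, p5_samples = split_samples(p2_samples, Y, b3)
```

  • After the two splits, each of the three leaf nodes P3, P4, and P5 holds exactly one sample, which corresponds to the three subspaces.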
  • the aforementioned feature dimension x and feature dimension y may be any two feature dimensions of the aforementioned data feature and/or extracted feature (for specific features, refer to the aforementioned embodiment).
  • the feature dimension x is the period duration in the data arrangement period
  • the feature dimension y is the moving average value in the statistical feature.
  • the feature data (a1, b1) refers to the period duration as a1 and the moving average value as b1.
  • the aforementioned feature data is data with certain characteristics, such as network KPI data
  • the aforementioned feature dimension x and feature dimension y may be any two KPI categories in the aforementioned KPI categories (for specific features, refer to the aforementioned embodiment).
  • the characteristic dimension x is network traffic
  • the characteristic dimension y is CPU utilization
  • the characteristic data (a1, b1) means that the network traffic is a1 and the CPU utilization is b1.
  • characteristic data is numerical data
  • each characteristic data has a corresponding value.
  • value of the characteristic data is referred to as a characteristic value in the following text.
  • each node in the machine learning model may correspondingly store node information, so that when the machine learning model is subsequently retrained, the node splitting can be performed based on the node information, and the classification result of the leaf node can be determined .
  • the node information of any node in the machine learning model includes label distribution information, and the label distribution information is used to reflect the percentage of labels of different categories of samples in the historical training sample set divided into the corresponding node in the total number of labels.
  • the total number of labels is the total number of labels corresponding to the samples in the historical training sample set divided into any node.
  • the node information of any non-leaf node also includes historical split information, which is the information used by the corresponding node for splitting.
  • since a leaf node in the machine learning model is a node that has not yet been split and has no subtree, its historical split information is empty. During retraining, if a leaf node is split, it becomes a non-leaf node, and historical split information needs to be added to it.
  • the foregoing historical split information includes one or more of the location information of the corresponding node in the machine learning model, the split dimension, the split point, the numerical distribution range of the historical training sample set divided into the corresponding node, and the historical split cost.
  • the location information of the corresponding node in the machine learning model is used to uniquely locate the node in the machine learning model.
  • the location information includes the number of layers of the node, the identification of the node, and/or the branch relationship of the node. The identification of a node is used to uniquely identify the node in the machine learning model; it can be assigned to the node when the node is generated, and it can consist of numbers and/or characters.
  • the branch relationship of any node includes the identification of the parent node of that node and a description of the relationship with the parent node.
  • for example, the branch relationship of node P2 in Figure 6 includes: (node P1: parent node). When a node has child nodes, its branch relationship also includes the identification of each child node and a description of the relationship with that child node.
  • for example, the branch relationship of node P2 in Figure 6 also includes: (node P4: left child node, node P5: right child node).
  • the splitting dimension of any node is the characteristic dimension of splitting in the historical sample data set divided into the any node, and the splitting point is a numerical point used for splitting.
  • the splitting dimension of node P1 is x and the splitting point is a3; the splitting dimension of node P2 is y, and the splitting point is b3, and each non-leaf node has only one splitting dimension and one splitting point.
  • the numerical distribution range of the historical training sample set divided into the corresponding node is the distribution range of the feature values in the historical training sample set corresponding to the node.
  • the numerical distribution range of the node P3 is [a3, a2], which can also be expressed as a3-a2.
  • the historical split cost is the split cost determined by the corresponding node based on the numerical distribution range of the historical training sample set. For specific explanation, please refer to the following text.
  • the foregoing label distribution information includes: the number of labels of each category of the samples in the historical training sample set of the corresponding node, together with the foregoing total number of labels; or, the proportion of the labels of each category of the samples in the historical training sample set of the corresponding node in the total number of labels.
  • the total number of labels is the total number of labels corresponding to the samples in the historical training sample set divided into the node,
  • and the ratio of the number of labels of one category to the total number of labels is
  • the proportion of labels of that category in the total number of labels. For example, in an anomaly detection scenario, there are 10 labels for the samples divided into node P1, of which 2 are the "normal" label 0 and 8 are the "abnormal" label 1.
  • in the first representation, the label distribution information includes: label 0: 2; label 1: 8; total number of labels: 10 (from which the proportion of the labels of each category in the total number of labels can be determined).
  • the label distribution information includes: label 0: 20%; label 1: 80%. It should be noted that here is only a schematic introduction of the label distribution information representation method, and the label distribution information representation method may also have other methods in actual implementation, which is not limited in the embodiment of the present application.
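  • Both representations of the label distribution information can be derived from the labels of the samples divided into a node, as in the following sketch. The function name and the dictionary layout are assumptions for illustration only.

```python
from collections import Counter


def label_distribution(labels):
    """Return per-category counts, the total number of labels, and per-category proportions."""
    counts = Counter(labels)
    total = sum(counts.values())
    proportions = {label: count / total for label, count in counts.items()}
    return {"counts": dict(counts), "total": total, "proportions": proportions}


# Node P1 in the example: 2 "normal" labels (0) and 8 "abnormal" labels (1).
node_p1_labels = [0] * 2 + [1] * 8
info = label_distribution(node_p1_labels)
```

  • For node P1 this yields counts of 2 and 8, a total of 10, and proportions of 20% and 80%.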
  • the node information of the leaf node may also include the classification result, that is, the finally determined label corresponding to the leaf node.
  • the node information of nodes P4 and P5 as shown in FIG. 6 may include classification results.
  • when the node information includes the numerical distribution range of the historical training sample set, effective model retraining can be performed based only on that numerical distribution range, without obtaining the actual values of the feature data in the historical training sample set, which effectively reduces the training complexity.
  • the training process of the machine learning model in the foregoing step 401 includes:
  • Step A1 Obtain a set of historical training samples with determined labels.
  • in one case, the labels of the samples in the historical training sample set are not yet marked when the first analysis device acquires them; the first analysis device may then present the samples to the operation and maintenance personnel, who mark the labels. In another case, the labels of the samples in the historical training sample set are already marked when the first analysis device acquires them, for example when the first analysis device acquires the samples from the aforementioned storage device, and the first analysis device directly uses the training sample set for model training.
  • Step A2 create a root node.
  • Step A3 the root node is used as the third node, and the offline training process is executed until the split cutoff condition is reached.
  • the offline training process includes:
  • Step A31 Split the third node to obtain the left child node and the right child node of the third node.
  • Step A32 Use the left child node as the updated third node, divide the historical training sample set into the left sample set of the left child node as the updated historical training sample set, and perform the offline training process again.
  • Step A33 Use the right child node as the updated third node, divide the historical training sample set into the right sample set of the right child node as the updated historical training sample set, and perform the offline training process again.
  • Step A4 Determine the classification result for each leaf node to obtain a machine learning model.
  • when the third node in the foregoing steps reaches the split cutoff condition, it has no child nodes and can therefore be used as a leaf node.
  • when a node is a leaf node, its classification result can be determined based on the number of labels of each category of the samples in the historical training sample set divided into that node and the total number of labels in the historical training sample set of the node; or, based on the proportion of the labels of each category of the samples in the historical training sample set in the total number of labels.
  • the classification result is still determined based on the aforementioned probability theory, that is, the label with the highest proportion or the largest number is used as the final classification result.
  • the proportion of any label in the total number of labels is the ratio of the number of that label to the total number of labels. For example, if the number of "abnormal" labels in the leaf node is 7 and the number of "normal" labels is 3, then the "abnormal" labels account for 70% of the total number of labels, the "normal" labels account for 30%, and the final classification result is "abnormal". The classification result is saved in the node information of the leaf node.
  • the traditional iForest model uses the average height of the leaf nodes on each tree to calculate the corresponding classification results.
  • in the embodiment of the present application, the classification result of a leaf node is the label with the highest proportion or the largest number, so the classification result is more accurate and the calculation cost is small.
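  • Steps A2 to A4 can be sketched as a recursive procedure. This is a minimal illustration under stated assumptions: the split cutoff condition used here (pure labels, a depth limit, or no dimension left to split), the random choice of splitting dimension and splitting point, and all names are assumptions of the sketch, and the sample set is assumed to be non-empty.

```python
import random
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class TreeNode:
    label_counts: Counter                    # label distribution information
    split_dimension: Optional[int] = None    # historical split information
    split_point: Optional[float] = None
    left: Optional["TreeNode"] = None
    right: Optional["TreeNode"] = None
    classification: Optional[str] = None     # set only on leaf nodes


def train_offline(samples, labels, rng, depth=0, max_depth=8):
    """Steps A2 to A4 for a non-empty sample set: split recursively, then label leaves."""
    node = TreeNode(label_counts=Counter(labels))
    # Dimensions whose values are not all equal can still be split.
    dims = [d for d in range(len(samples[0])) if len({s[d] for s in samples}) > 1]
    left, right = [], []
    # Assumed split cutoff condition: pure labels, depth limit, or nothing to split.
    if len(node.label_counts) > 1 and depth < max_depth and dims:
        dim = rng.choice(dims)
        values = [s[dim] for s in samples]
        point = rng.uniform(min(values), max(values))
        left = [i for i, v in enumerate(values) if v <= point]
        right = [i for i, v in enumerate(values) if v > point]
    if not left or not right:
        # Step A4: the label with the largest count is the classification result.
        node.classification = node.label_counts.most_common(1)[0][0]
        return node
    node.split_dimension, node.split_point = dim, point
    # Steps A32 and A33: repeat the process on each child with its own subset.
    node.left = train_offline([samples[i] for i in left], [labels[i] for i in left],
                              rng, depth + 1, max_depth)
    node.right = train_offline([samples[i] for i in right], [labels[i] for i in right],
                               rng, depth + 1, max_depth)
    return node


def predict(node, sample):
    """Follow the stored splits from the root down to a leaf."""
    while node.classification is None:
        go_left = sample[node.split_dimension] <= node.split_point
        node = node.left if go_left else node.right
    return node.classification
```

  • A leaf holding 7 "abnormal" labels and 3 "normal" labels is classified as "abnormal", as in the example above. Passing a seeded `random.Random` instance as `rng` makes the splits reproducible.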
  • the third node may be split based on the numerical distribution range of the historical training sample set to obtain the left child node and the right child node of the third node.
  • the numerical distribution range of the historical training sample set reflects the density of the samples in the historical training sample set. When the sample distribution is relatively scattered, the numerical distribution range is larger, and when the sample distribution is relatively concentrated, the numerical distribution range is smaller.
  • the samples of the historical training sample set may include at least one-dimensional feature data
  • the value distribution range of the historical training sample set is the distribution range of the feature values in the historical training sample set; that is, it can be characterized by the minimum and maximum values of the feature values in each feature dimension.
  • the characteristic data included in the samples in the historical training sample set are numerical data, such as decimal values, binary values, or vectors.
  • the samples in the historical training sample set include one-dimensional feature data, and the feature values included in the historical training sample set are 1, 3, ..., 7, 10, where the minimum value is 1 and the maximum value is 10.
  • the value distribution range of the historical training sample set is [1, 10], which can also be expressed as 1-10.
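  • Computing the value distribution range per feature dimension can be sketched as follows. The function name is an assumption, and the example uses only the four feature values that are listed explicitly above.

```python
def value_distribution_range(samples):
    """Return a (min, max) pair for every feature dimension of the sample set."""
    return [(min(column), max(column)) for column in zip(*samples)]


# One-dimensional samples using the explicitly listed feature values.
one_dimensional = [(1,), (3,), (7,), (10,)]
ranges = value_distribution_range(one_dimensional)
```

  • For these samples the result is a single range with minimum 1 and maximum 10.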
  • the characteristic data included in the samples in the historical training sample set may be numerical data originally, or may be obtained by transforming non-numerical data through a specified algorithm.
  • feature data in some feature dimensions, such as data change trends, data fluctuations, statistical features, or fitting features, cannot initially be represented by numerical values, and can be converted to numerical data through a specified algorithm. For example, the feature data "high" can be converted into the numerical data 2, the feature data "medium" into the numerical data 1, and the feature data "low" into the numerical data 0.
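  • One simple form of such a conversion is a lookup table, sketched below with the mapping from the example. The names are assumptions, and the embodiment does not specify which algorithm is used.

```python
LEVEL_TO_VALUE = {"low": 0, "medium": 1, "high": 2}


def encode_levels(values, mapping=LEVEL_TO_VALUE):
    """Convert non-numerical feature data to numerical data; unknown levels raise KeyError."""
    return [mapping[v] for v in values]
```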
  • the use of historical training sample sets including numerical values for node splitting can simplify computational complexity and improve computational efficiency.
  • the process of splitting the third node based on the numerical distribution range of the historical training sample set, and obtaining the left child node and the right child node of the third node may include:
  • Step A311 Determine the third split dimension in each feature dimension of the historical training sample set.
  • the third split dimension is a feature dimension randomly selected from each feature dimension of the historical training sample set.
  • the third split dimension is the feature dimension with the largest span among the feature dimensions of the historical training sample set.
  • the span of the feature value on each feature dimension is the difference between the maximum value and the minimum value of the feature value on the feature dimension.
  • the feature dimensions of the historical training sample set can be sorted in the order of span from largest to smallest, and then the feature dimension corresponding to the span with the highest ranking can be selected as the third split dimension.
  • a feature dimension with a large span has a higher probability of being split. Performing node splitting on this feature dimension can speed up model convergence and avoid splitting on invalid feature dimensions. Therefore, selecting the feature dimension with the largest span as the third split dimension can improve the probability of effective splitting of the machine learning model and save the overhead of node splitting.
  • the embodiment of the present application assumes that the samples in the historical training sample set include two-dimensional feature data, with feature dimensions x1 and x2, which corresponds to a two-dimensional space.
  • the corresponding spans are x1_max - x1_min and x2_max - x2_min. The two spans are compared; assuming x1_max - x1_min > x2_max - x2_min, the feature dimension x1 is selected as the third split dimension.
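  • Selecting the feature dimension with the largest span can be sketched as follows. The name `largest_span_dimension` and the numeric ranges are hypothetical; ties are resolved in favor of the lower index, which is an assumption the text does not address.

```python
def largest_span_dimension(ranges):
    """Return the index of the feature dimension whose (max - min) is largest."""
    spans = [hi - lo for lo, hi in ranges]
    return max(range(len(spans)), key=spans.__getitem__)

# Two-dimensional example: x1 spans 8.0, x2 spans 3.0, so x1 (index 0) is chosen.
example_ranges = [(0.0, 8.0), (2.0, 5.0)]
third_split_dimension = largest_span_dimension(example_ranges)
print(third_split_dimension)
```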
  • Step A312 Determine the third splitting point on the third splitting dimension of the historical training sample set.
  • the third split point is a numerical point randomly selected on the third split dimension of the historical training sample set. In this way, equal probability splitting in the third splitting dimension can be achieved.
  • the third split point may also be selected in other ways, which is not limited in the embodiment of the present application.
  • Step A313 Split the third node based on the third splitting point on the third splitting dimension, where the value range in the third numerical value distribution range whose value on the third splitting dimension is not greater than the value of the third splitting point is divided into the left child node, and the value range in the third numerical value distribution range whose value on the third splitting dimension is greater than the value of the third splitting point is divided into the right child node.
  • the third numerical value distribution range is the distribution range of feature values in the historical training sample set, which is composed of the span ranges of feature values in each feature dimension of the historical training set.
  • for example, with x1_value denoting the third split point on feature dimension x1, the value range on feature dimension x1 that is not greater than x1_value, that is, [x1_min, x1_value], is divided into the left child node P2 of the third node P1, and the value range on feature dimension x1 that is greater than x1_value, that is, (x1_value, x1_max], is divided into the right child node P3 of the third node P1.
  • the first analysis device since the first analysis device has already obtained the aforementioned historical training sample set for training, it is also possible to directly use the samples to split the nodes.
  • in that case, the method of dividing the nodes by using the numerical distribution range in step A313 can be replaced with the following: in the historical training sample set, the samples whose feature value on the third split dimension is not greater than the value of the third split point are divided into the left child node, and the samples whose feature value on the third split dimension is greater than the value of the third split point are divided into the right child node.
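  • Steps A312 and A313, together with the sample-based alternative, can be sketched as below. All function names are hypothetical, and drawing the split point uniformly from the span is one way to read "randomly selected"; the application does not fix the distribution.

```python
import random

def random_split_point(ranges, dim, rng):
    """Pick a split point at random within the span of the chosen dimension."""
    lo, hi = ranges[dim]
    return rng.uniform(lo, hi)

def split_range(ranges, dim, point):
    """Divide a value distribution range into left (<= point) and right (> point)."""
    lo, hi = ranges[dim]
    if not lo <= point <= hi:
        raise ValueError("split point must lie inside the span")
    left, right = list(ranges), list(ranges)
    left[dim] = (lo, point)
    right[dim] = (point, hi)
    return left, right

def split_samples(samples, dim, point):
    """Sample-based alternative: route each sample by its value on `dim`."""
    left = [s for s in samples if s[dim] <= point]
    right = [s for s in samples if s[dim] > point]
    return left, right

ranges = [(0.0, 10.0), (5.0, 10.0)]
point = random_split_point(ranges, 0, random.Random(0))
left_range, right_range = split_range(ranges, 0, 4.0)
left_samples, right_samples = split_samples([(1.0, 6.0), (4.0, 7.0), (9.0, 8.0)], 0, 4.0)
```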
  • the embodiment of the present application can control the depth of the tree by setting the splitting cut-off condition, and avoid excessive splitting of the tree.
  • the split cut-off condition includes at least one of the following:
  • Condition 3 The number of splits corresponding to the third node is greater than the threshold of the number of splits.
  • the embodiment of this application proposes a concept of splitting cost.
  • the current splitting cost of any node is negatively correlated with the size of the numerical distribution range of the training sample set of that node.
  • the training sample set of a node is the set of samples, among those used to train the machine learning model, that are divided into that node.
  • the current splitting cost of a node is the reciprocal of the sum of the spans, in each feature dimension, of the feature values of the samples in the training sample set of that node.
  • the current splitting cost of the third node is negatively related to the distribution range of feature values in the historical training sample set, that is, the larger the numerical distribution range, the smaller the splitting cost.
  • the current split cost of the third node is the reciprocal of the sum of the spans of the feature values of the historical training sample set in each feature dimension.
  • the split cost threshold may be positive infinity.
  • following the definition above, the split cost can be written as cost = 1 / Σ_{j=1..N} (max_j - min_j), where max_j - min_j represents the maximum value minus the minimum value of the feature value on the j-th feature dimension in the numerical distribution range of the historical training sample set, that is, the span of the feature value on that feature dimension, and N is the total number of feature dimensions.
  • the cost of splitting can be calculated for each node splitting and compared with the threshold of splitting cost.
  • for example, the initial value of the splitting cost is 0 and the split cost threshold is positive infinity.
  • the split costs calculated along the depth of the tree are 0, COST1 (the first node split), COST2 (the second node split), COST3 (the third node split), and so on, where the number of splits and the split cost are positively correlated: the more splits, the higher the split cost.
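  • The reciprocal-of-spans cost can be sketched as follows. The name `split_cost` is hypothetical, and returning infinity when every span is zero is an assumption, since the text does not say how that case is handled. The example ranges are made up to show the cost growing as the range narrows with depth.

```python
import math

def split_cost(ranges):
    """Reciprocal of the sum of per-dimension spans (max_j - min_j)."""
    total_span = sum(hi - lo for lo, hi in ranges)
    # Assumption: a degenerate range (all spans zero) is treated as infinite cost.
    return math.inf if total_span == 0 else 1.0 / total_span

root_cost = split_cost([(0.0, 10.0), (0.0, 10.0)])   # spans sum to 20
child_cost = split_cost([(0.0, 4.0), (0.0, 10.0)])   # spans sum to 14
print(root_cost, child_cost)
```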
  • the second sample threshold may be 2 or 3.
  • the number of splits corresponding to the third node is the total number of splits from the first split of the root node to the current split of the third node.
  • when the number of splits corresponding to the third node is greater than the threshold of the number of splits, the machine learning model has reached the upper limit of the number of splits; stopping the offline training process at this time can reduce the computational overhead.
  • when the proportion, in the total number of labels, of the label with the largest proportion among the labels corresponding to the historical training sample set is greater than the specified proportion threshold, the number of labels with the largest proportion has reached the classification condition, and an accurate classification result can be determined on this basis. Stopping the offline training process at this time can avoid unnecessary splits and reduce computational overhead.
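  • A combined cut-off check might look like the sketch below. Only the split-count condition is listed explicitly in this passage; the directions of the cost and sample-count comparisons, the function name, and all default threshold values are assumptions made for illustration.

```python
import math
from collections import Counter

def should_stop_splitting(cost, num_samples, num_splits, labels, *,
                          cost_threshold=math.inf, min_samples=2,
                          max_splits=10, label_ratio_threshold=0.9):
    """Return True if any (assumed) split cut-off condition is met."""
    if cost > cost_threshold:          # assumed: cost exceeds the cost threshold
        return True
    if num_samples < min_samples:      # assumed: too few samples in the node
        return True
    if num_splits > max_splits:        # stated: split count exceeds its threshold
        return True
    if labels:
        top_count = Counter(labels).most_common(1)[0][1]
        if top_count / len(labels) > label_ratio_threshold:
            return True                # dominant label exceeds the ratio threshold
    return False

mixed = ["a", "a", "b", "b", "a"]
print(should_stop_splitting(0.1, 5, 3, mixed))
```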
  • the site analysis device performs incremental training of the machine learning model based on the first training sample set; this is in fact a process of sequentially inputting the multiple training samples in the first training sample set into the machine learning model (that is, one training sample is input at a time). Multiple training processes are performed, each training process is the same, and each training process is actually a node traversal process. This traversal process is performed for any training sample in the first training sample set.
  • This embodiment of the application takes the first training sample as an example. It is assumed that the first training sample is any training sample in the first training sample set, which includes feature data of one or more feature dimensions.
  • step 404 includes:
  • Step B1 When the current split cost of the traversed first node is less than the historical split cost of the first node, add an associated second node, the second node being the parent node or child node of the first node.
  • the current split cost of the first node is the cost of the first node performing node splitting based on the first training sample (that is, the cost of adding an associated second node to the first node; in this case, node splitting of the first node means adding a new branch to the first node).
  • the historical split cost of the first node is the cost of the first node performing node splitting based on the historical training sample set of the first node.
  • the historical training sample set of the first node is the set of samples, within the historical training sample set of the machine learning model, that are divided to the first node.
  • the historical training sample set of the first node is the historical training sample set corresponding to the third node.
  • the current split cost of the first node and the historical split cost of the first node can be compared directly, so that the associated second node is added when the current split cost of the first node is less than the historical split cost of the first node.
  • further, the difference obtained by subtracting the current split cost of the first node from the historical split cost of the first node can be computed first, and it can be determined whether the absolute value of the difference is greater than a specified difference threshold, so that node splitting is performed only when the current split cost of the first node is far less than the historical split cost of the first node, which can save training costs and improve training efficiency.
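  • The comparison, including the optional difference threshold, can be sketched as below. The function name and the default threshold of 0.0 (which reduces to the plain comparison) are assumptions.

```python
def should_add_second_node(current_cost, historical_cost, diff_threshold=0.0):
    """Add a second node only if the current cost is lower by more than the threshold."""
    if current_cost >= historical_cost:
        return False
    return abs(historical_cost - current_cost) > diff_threshold

print(should_add_second_node(0.05, 0.08))
print(should_add_second_node(0.05, 0.08, diff_threshold=0.05))
```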
  • the current split cost of the first node is negatively related to the size of the first numerical distribution range
  • the first numerical distribution range is a distribution range determined based on the feature value in the first training sample and the second numerical distribution range.
  • the second numerical value distribution range is the distribution range of the feature values in the historical training sample set of the first node.
  • the first numerical value distribution range is a distribution range determined based on the union of the first training sample and the second numerical value distribution range.
  • for example, the samples in the historical training sample set of the first node include feature data of two feature dimensions. If the span range of feature values on feature dimension x is [1, 10] and the span range of feature values on feature dimension y is [5, 10], the second numerical distribution range includes the span range [1, 10] on feature dimension x and the span range [5, 10] on feature dimension y. If the feature value of the first training sample is 9 on feature dimension x and 13 on feature dimension y, the union of the first training sample and the second numerical distribution range is taken on each feature dimension.
  • the resulting span ranges of the first numerical distribution range are [1, 10] on feature dimension x and [5, 13] on feature dimension y.
  • the current split cost of the first node is the reciprocal of the sum of the spans of the feature values of the first numerical distribution range in each feature dimension.
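  • The worked example above can be reproduced with the following sketch. The helper names `extend_range` and `split_cost` are hypothetical; the numbers come from the example in the text.

```python
def extend_range(ranges, sample):
    """Union of an existing per-dimension range with one new sample."""
    return [(min(lo, v), max(hi, v)) for (lo, hi), v in zip(ranges, sample)]

def split_cost(ranges):
    """Reciprocal of the sum of per-dimension spans."""
    return 1.0 / sum(hi - lo for lo, hi in ranges)

second_range = [(1, 10), (5, 10)]                 # historical range of the first node
first_range = extend_range(second_range, (9, 13)) # range after adding the new sample

historical_cost = split_cost(second_range)  # 1 / (9 + 5)
current_cost = split_cost(first_range)      # 1 / (9 + 8)
print(first_range, current_cost < historical_cost)
```

  • In this example the new sample widens dimension y, so the current split cost falls below the historical split cost and the condition of step B1 is met.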
  • the current split cost of the first node can be calculated in the same way as the current split cost of the third node, for example by using the aforementioned cost calculation formula (that is, formula 3), with the numerical distribution range of the historical training sample set in the formula replaced by the first numerical distribution range; details are not repeated in this embodiment of the application.
  • the historical split cost of the first node is the reciprocal of the sum of the spans of the characteristic values of the samples of the historical training sample set of the first node in each characteristic dimension.
  • the historical split cost of the first node can be calculated in the same way as the current split cost of the aforementioned third node, for example by using the aforementioned cost calculation formula, with the numerical distribution range of the aforementioned historical training sample set in the formula replaced by the numerical distribution range of the historical training sample set of the first node; details are not repeated in this embodiment of the application.
  • the process of adding the associated second node may include:
  • Step B11 Determine the span range of the characteristic value of the first numerical value distribution range in each characteristic dimension.
  • Step B12 Add a second node based on the first splitting point on the first splitting dimension, where the value range in the first numerical distribution range whose value on the first splitting dimension is not greater than the value of the first splitting point is divided to the left child node of the second node, and the value range in the first numerical distribution range whose value on the first splitting dimension is greater than the value of the first splitting point is divided to the right child node of the second node.
  • for example, the value range on feature dimension y1 in the first numerical distribution range that is less than or equal to y1_value, that is, [y1_min, y1_value], is divided to the left child node P5 of the second node P4, and the value range on feature dimension y1 in the first numerical distribution range that is greater than y1_value, that is, (y1_value, y1_max], is divided to the right child node P6 of the second node P4.
  • the process of splitting the second node may refer to the process of splitting the third node in step A313 described above, which is not repeated in this embodiment of the present application.
  • the aforementioned first splitting dimension is a splitting dimension determined in each feature dimension based on the span range of feature values on each feature dimension, and the first splitting point is determined on the first splitting dimension of the first numerical distribution range The numerical point used for splitting.
  • the first split dimension is a feature dimension randomly selected from the feature dimensions of the first numerical value distribution range.
  • the first split dimension is the feature dimension with the largest span among the feature dimensions of the first value distribution range.
  • the first splitting point is a numerical point randomly selected on the first splitting dimension of the first numerical value distribution range. In this way, an equal probability split in the first split dimension can be achieved.
  • the second node is the parent node or child node of the first node, that is, the second node is located above or below the first node.
  • the second splitting dimension is the historical splitting dimension of the first node in the machine learning model
  • the second splitting point is the historical splitting point of the first node in the machine learning model.
  • the second node is the parent node of the first node, and the first node is the left child node of the second node.
  • the second node is the left child node of the first node.
  • the node information of each non-leaf node may include a splitting dimension and a splitting point.
  • for ease of description, subsequent embodiments of this application use the format "u>v" to indicate that the splitting dimension is u and the splitting point is v.
  • Figure 11 shows a machine learning model before the second node is added, including nodes Q1 and Q3, and Figure 9 shows the machine learning model of Figure 11 after the second node is added.
  • assume node Q1 is the first node, with second splitting dimension x2 and second splitting point 0.2, and Q2 is the second node, with first splitting dimension x1 and first splitting point 0.7. Because the splitting dimension of the first node differs from that of the second node, as shown in Figure 9, the newly added second node serves as the parent node of the first node.
  • Figure 12 shows another machine learning model before the second node is added, including nodes Q1 and Q2, and Figure 9 shows the machine learning model of Figure 12 after the second node is added.
  • assume node Q1 is the first node, with second splitting dimension x2 and second splitting point 0.2, and Q3 is the second node, with first splitting dimension x1 and first splitting point 0.4. Because the splitting dimension of the first node differs from that of the second node, the newly added second node serves as a child node of the first node.
  • Figure 13 shows a machine learning model before the second node is added, including nodes Q4 and Q6, and Figure 14 shows the machine learning model of Figure 13 after the second node is added.
  • assume node Q4 is the first node, with second splitting dimension x1 and second splitting point 0.2, and Q5 is the second node, with first splitting dimension x1 and first splitting point 0.7. Because the splitting dimension of the first node is the same as that of the second node and the first split point is located to the right of the second split point, the newly added second node serves as the parent node of the first node, and the first node is the left child node of the second node.
  • Figure 13 shows a machine learning model before the second node is added, including nodes Q4 and Q6, and Figure 15 shows the machine learning model of Figure 13 after the second node is added.
  • assume node Q4 is the first node, with second splitting dimension x1 and second splitting point 0.2, and Q7 is the second node, with first splitting dimension x1 and first splitting point 0.1. Because the splitting dimension of the first node is the same as that of the second node and the first split point is located to the left of the second split point, the newly added second node serves as the left child node of the first node.
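  • For the case where both nodes share a splitting dimension, the two examples above suggest the placement rule sketched below. The enum, the function name, and the parameter names are hypothetical; the case of different splitting dimensions and the case of equal split points are not covered, because this passage does not state a rule for them.

```python
from enum import Enum

class Placement(Enum):
    PARENT_FIRST_IS_LEFT_CHILD = "second node is parent; first node is its left child"
    LEFT_CHILD_OF_FIRST = "second node is the left child of the first node"

def place_same_dimension(new_split_point, historical_split_point):
    """Placement of the new (second) node when both nodes split on the same dimension.

    new_split_point: first splitting point (belongs to the second node).
    historical_split_point: second splitting point (belongs to the first node).
    """
    if new_split_point > historical_split_point:
        return Placement.PARENT_FIRST_IS_LEFT_CHILD
    if new_split_point < historical_split_point:
        return Placement.LEFT_CHILD_OF_FIRST
    raise ValueError("equal split points are not covered by the examples")

print(place_same_dimension(0.7, 0.2))  # Q5 relative to Q4
print(place_same_dimension(0.1, 0.2))  # Q7 relative to Q4
```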
  • after the second node is added, the child nodes of the second node other than the first node are leaf nodes, and the classification results of these leaf nodes need to be determined. That is, when the second node is the parent node of the first node, the other child node of the second node is a leaf node; when the first node is the parent node of the second node, both child nodes of the second node are leaf nodes.
  • the method for determining the classification result of a leaf node in the incremental training process can refer to the method used in the offline training process: the classification result of the leaf node is determined based on the number of labels of the same category among the samples in the historical training sample set and the total number of labels, where the total number of labels is the total number of labels corresponding to the samples in the historical training sample set of the leaf node; or the classification result of the leaf node is determined based on the proportions of labels of different categories in the total number of labels. Details are not repeated in this embodiment of the application.
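  • One plausible reading of the proportion-based rule is to take the label with the largest share, as sketched below. Choosing the majority label, the function name, and the sample labels are assumptions; the text only says the result is based on label counts or proportions.

```python
from collections import Counter

def leaf_classification(labels):
    """Return (label, proportion) for the most frequent label in a leaf node."""
    if not labels:
        raise ValueError("a leaf needs at least one label")
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)

result_label, result_ratio = leaf_classification(["c", "c", "d"])
print(result_label, result_ratio)
```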
  • each node in the machine learning model stores node information, so that in the incremental training process, the historical split information obtained from the node information of the first node, such as the split dimension, the split point, and the numerical distribution range of the historical training sample set, can be used to determine whether to add a second node to the first node, which enables rapid incremental training; when a node is determined to be a leaf node, the classification result of the leaf node can be quickly determined based on the label distribution information in the node information.
  • the corresponding node information can be stored for each node during training, or it can be stored for each node after the entire training of the machine learning model is completed, for use in subsequent retraining.
  • the node information of the second node needs to be saved correspondingly. Since the purpose of adding a second node is to separate samples of different categories, the branch where the new second node is located is a new branch that does not exist in the original machine learning model, so it does not affect the original branch distribution.
  • for nodes that have a connection relationship with the second node, such as its parent node or child node, the position information in the corresponding node information is updated accordingly, and the other information in the node information remains unchanged. In this way, the incremental training of the machine learning model can be performed while minimizing the impact on other nodes.
  • in step B1, it is also possible to detect whether the sum of the number of samples in the historical training sample set of the first node and the number of first training samples is greater than the first sample number threshold, and to add the second node only when this sum is greater than the first sample number threshold. In each incremental training process, the number of first training samples is 1.
  • when the sum is not greater than the first sample number threshold, the incremental training of the machine learning model is stopped, that is, the step of adding the associated second node is not performed. In this way, the second node is added and node splitting is performed only when the number of samples is large enough, which avoids invalid node splitting and reduces the overhead of computing resources. In addition, splitting nodes that have too few samples may degrade the prediction performance of the machine learning model, so setting the first sample number threshold helps preserve the accuracy of the model.
  • Step B2 When the current splitting cost of the first node is not less than the historical splitting cost of the first node, the nodes in the subtree of the first node are traversed, each traversed node is taken as the new first node, and the traversal process is executed again until the current split cost of the traversed first node is less than the historical split cost of that node, or until the target depth is reached, at which point the traversal process stops. It should be noted that when the current splitting cost of the first node is not less than the historical splitting cost of the first node and the first node has no subtree left to traverse, that is, the first node is a leaf node, the traversal process is also stopped.
  • when the current split cost of the traversed first node is less than the historical split cost of that node, the second node associated with the first node is added.
  • the process of adding the second node can refer to the aforementioned step B1 and is not repeated in this embodiment of this application. Stopping the traversal process after traversing to the target depth can avoid excessive splitting of the tree model and prevent the tree from becoming too deep.
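  • The traversal in steps B1 and B2 can be sketched as below. The `Node` class, the helper names, and in particular the choice to descend only along the branch the sample falls into are assumptions; the text says only that the nodes in the subtree are traversed. The function returns the node at which a second node would be added, or None if traversal stops without a match.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Range = List[Tuple[float, float]]

@dataclass
class Node:
    ranges: Range                        # range of the node's historical samples
    split_dim: Optional[int] = None      # None marks a leaf node
    split_point: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def split_cost(ranges: Range) -> float:
    total = sum(hi - lo for lo, hi in ranges)
    return float("inf") if total == 0 else 1.0 / total

def extend_range(ranges: Range, sample) -> Range:
    return [(min(lo, v), max(hi, v)) for (lo, hi), v in zip(ranges, sample)]

def find_node_to_extend(root: Node, sample, target_depth: int) -> Optional[Node]:
    node, depth = root, 0
    while node is not None and depth <= target_depth:
        current = split_cost(extend_range(node.ranges, sample))
        if current < split_cost(node.ranges):
            return node                  # step B1 would add a second node here
        if node.split_dim is None:
            return None                  # leaf reached: traversal stops
        # Assumption: follow the branch that the sample falls into.
        go_left = sample[node.split_dim] <= node.split_point
        node = node.left if go_left else node.right
        depth += 1
    return None                          # target depth reached

left = Node(ranges=[(0.0, 4.0), (0.0, 10.0)])
right = Node(ranges=[(5.0, 10.0), (0.0, 10.0)])
root = Node(ranges=[(0.0, 10.0), (0.0, 10.0)], split_dim=0, split_point=5.0,
            left=left, right=right)
```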
  • the historical training sample set of the machine learning model refers to the training sample set before the current training process, which is relative to the first training sample currently input.
  • if the incremental training process is the first incremental training process after step 401, the historical training sample set in the incremental training process is the same as the historical training sample set in the foregoing step 401; if the incremental training process is the w-th (w is an integer greater than 1) incremental training process after step 401, the historical training sample set in the incremental training process is the union of the historical training sample set in step 401 and the training samples input during the previous w-1 incremental training processes.
  • the embodiment of the application provides a training method. Through the aforementioned incremental training method, online incremental training of a machine learning model can be realized. Since each node stores its node information, incremental training can be performed without acquiring a large number of samples, thereby achieving a lightweight machine learning model.
  • the analysis equipment that maintains the machine learning model can simplify the machine learning model when the model simplification conditions are reached, making the structure of the streamlined machine learning model simpler, and the calculation efficiency is higher when performing predictions.
  • the principle of model simplification is actually the principle of searching for connected domains, that is, to merge the divided spaces that can belong to the same connected domain in the machine learning model.
  • the model streamlining process includes:
  • the first non-leaf node and the second non-leaf node in the machine learning model are merged, and the first leaf node and the second leaf node are merged, to obtain a simplified machine learning model, which is used for prediction of classification results, where the first leaf node is a child node of the first non-leaf node, the second leaf node is a child node of the second non-leaf node, the first leaf node and the second leaf node have the same classification result, and the span ranges, on the same feature dimension, of the feature values of the historical training sample sets allocated to them are adjacent.
  • Figure 16 assumes that the samples in the training sample set corresponding to the machine learning model include feature data of two feature dimensions.
  • the feature dimensions are feature dimension x and feature dimension y.
  • the training sample set includes: sample M(a1, b1), N(a1, b2), Q(a2, b1), U(a4, b4).
  • the splitting dimension of the first node split is feature dimension x and the splitting point is a3, which cuts the sample space where the two-dimensional samples are located into 2 subspaces, corresponding to the left subtree and the right subtree of node Q8 in Figure 16;
  • the splitting dimension of the second node split is feature dimension y and the splitting point is b3, corresponding to the left and right subtrees of node Q9 in Figure 16;
  • the splitting dimension of the third node split is feature dimension y and the splitting point is b4, corresponding to the left subtree and the right subtree of node Q10 in Figure 16.
  • the space where the samples M (a1, b1), N (a1, b2), Q (a2, b1), and U (a4, b4) are located is divided into subspaces 1-4, a total of 4 subspaces. From the space division diagram on the right side of Figure 16, it can be seen that the classification results of the leaf nodes Q91 and Q101 corresponding to subspaces 3 and 4 are both c, and the feature value spans of the two are adjacent, and the two can be merged. A connected domain is formed, and the merged subspace does not affect the actual classification results of the machine learning model.
  • the final subspace 3 and subspace 4 are merged to form a new subspace 3.
  • the non-leaf node Q9 and the non-leaf node Q10 are merged to form a new non-leaf node Q12, and the leaf node Q91 and the leaf node Q101 are merged to form a new leaf node Q121.
  • the corresponding node information has also been merged.
  • the merging of node information is actually the union of corresponding parameters (that is, parameters of the same type) in the node information.
  • the aforementioned span ranges on the y-axis [b4, b3] and [b3, b2] are merged into [b4, b2].
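  • Merging two leaf nodes that share a classification result and have adjacent span ranges can be sketched as follows. The tuple representation `(label, ranges)`, the function name, and the numeric values standing in for b4, b3, and b2 are illustrative assumptions; the merged range is the per-dimension union, in line with the description of merging node information.

```python
def merge_leaves(leaf_a, leaf_b, dim):
    """Merge two (label, ranges) leaves if labels match and spans touch on `dim`."""
    label_a, ranges_a = leaf_a
    label_b, ranges_b = leaf_b
    if label_a != label_b:
        return None                      # different classification results
    (lo_a, hi_a), (lo_b, hi_b) = ranges_a[dim], ranges_b[dim]
    if hi_a != lo_b and hi_b != lo_a:
        return None                      # spans are not adjacent on this dimension
    merged = [(min(la, lb), max(ha, hb))
              for (la, ha), (lb, hb) in zip(ranges_a, ranges_b)]
    return label_a, merged

# Dimension 1 plays the role of the y-axis; 1.0, 2.0, 3.0 stand in for b4, b3, b2.
q91 = ("c", [(5.0, 9.0), (1.0, 2.0)])
q101 = ("c", [(5.0, 9.0), (2.0, 3.0)])
q121 = merge_leaves(q91, q101, dim=1)
print(q121)
```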
  • the simplified machine learning model has a simpler structure, with fewer tree branches, which prevents the tree from becoming too deep.
  • although the model architecture changes, its prediction results are not affected, which saves storage space and improves prediction efficiency. The simplification process also helps prevent the model from overfitting.
  • the simplification process can be executed periodically, and it needs to be executed bottom-up, starting from the bottom layer of the machine learning model (that is, in order of decreasing depth).
  • the model simplification process can be executed by the first analysis device after the aforementioned step 401 or 405, and the streamlined machine learning model can be sent to the site analysis device for the site analysis device to perform sample analysis based on the machine learning model, namely Prediction of classification results. Since the size of the simplified model itself (that is, the size of the memory occupied by the model itself) becomes smaller, when the model is used to predict the classification results, the prediction speed will be faster than the unreduced model, the prediction efficiency will be higher, and the transmission of the model The overhead is reduced accordingly. Further, if the simplified model is only used for sample analysis, the historical split information may not be recorded in the node information, which can further reduce the size of the model itself and improve the efficiency of model prediction.
  • any feature dimension is the aforementioned data feature and/or extracted feature (for specific features, refer to the previous embodiment).
  • Any characteristic dimension is any of the aforementioned KPI categories.
  • for example, when delay data is used as the feature data, the feature dimension of the feature data is delay.
  • when packet loss rate data is used as the feature data, the feature dimension of the feature data is the packet loss rate.
  • the machine learning model sent to the site analysis device in the foregoing step 402 is the machine learning model directly obtained in step 401 without being streamlined, so as to support the site analysis device to perform incremental training of the machine learning model.
  • the machine learning model sent to the site analysis device in step 402 may also be a simplified machine learning model, but the machine learning model then needs to carry the additional node information that has not been merged, so that the site analysis device can recover the unsimplified machine learning model based on the simplified machine learning model and the unmerged node information, and then perform incremental training of the machine learning model.
  • the model simplification process can also be performed by a site analysis device after the aforementioned step 404, and the streamlined machine learning model can be used for sample analysis, that is, prediction of classification results.
  • the model used is a machine learning model that has not been streamlined.
  • the foregoing embodiment of the present application takes as an example the site analysis device performing incremental training of the machine learning model directly based on the first training sample set that it obtains.
  • the site analysis device can also perform incremental training on the machine learning model indirectly based on the first training sample set that it obtains.
  • in one implementation, the site analysis device sends the current machine learning model and the first training sample set to the first analysis device, and the first analysis device performs incremental training of the machine learning model based on the first training sample set and sends the trained machine learning model to the site analysis device.
  • the incremental training process can refer to the aforementioned step 404 and is not repeated in this embodiment. In another implementation, the site analysis device sends the first training sample set to the first analysis device, and the first analysis device integrates the first training sample set with the historical training samples used to train the machine learning model to obtain new historical training samples, and performs offline training of the initial machine learning model based on the new historical training samples.
  • the training result obtained in this way is the same as the result of incremental training based on the first training sample set.
  • in the traditional model training method, a machine learning model that has been trained offline cannot be incrementally trained once it is deployed on the site analysis device.
  • the machine learning model supports incremental training and adapts well to new training samples. In anomaly detection scenarios in particular, it adapts well to new anomaly patterns and to samples with new labels, and the trained model can accurately detect different anomaly patterns, which achieves generalization of the model, ensures prediction performance, and effectively improves user experience.
  • node splitting is performed based on the numerical distribution range of the training sample set, without extensive access to historical training samples. Therefore, memory and computing resource usage is effectively reduced, and the training cost is lowered.
  • the machine learning model can be made lightweight, which makes it easier to deploy and enables effective generalization of the model.
  • FIG. 18 is a schematic diagram of the incremental training effect of a traditional machine learning model
  • FIG. 19 is a schematic diagram of the incremental training effect of the machine learning model provided by an embodiment of the present application.
  • the horizontal axis represents the percentage of the input training samples to the total number of training samples
  • the vertical axis represents the performance index reflecting the performance of the model. The larger the index value, the better the performance of the model.
  • a training sample obtained by a site analysis device is used as the training sample set M to compare the traditional machine learning model and the machine learning model provided in this embodiment of the application.
  • the incremental training process is executed periodically, and each round of incremental training inputs 10% of the samples in the training sample set M. The performance of the model obtained by incrementally training the traditional machine learning model fluctuates widely and is unstable, whereas the performance of the model trained using the incremental training method provided by the embodiment of the present application increases gradually, then stays at about 90%, and is stable. It can be seen that the model training method provided by the embodiment of the present application can train a machine learning model with relatively stable performance, thereby ensuring the generalization of the model.
  • An embodiment of the present application provides a model training device 50, as shown in FIG. 20, which is applied to a site analysis device, and includes:
  • the receiving module 501 is configured to receive the machine learning model sent by the first analysis device
  • the incremental training module 502 is configured to perform incremental training on the machine learning model based on a first training sample set, where the feature data in the first training sample set is feature data of the site network corresponding to the site analysis device.
  • the embodiment of the application provides a model training device.
  • the receiving module receives the machine learning model sent by the first analysis device.
  • the incremental training module performs incremental training on the machine learning model based on the first training sample set obtained from the site network corresponding to the site analysis device.
  • on the one hand, the feature data in the first training sample set is feature data obtained from the site network corresponding to the site analysis device, which better fits the application scenario of the site analysis device.
  • training the model with a first training sample set that includes feature data obtained by the site analysis device from its corresponding site network makes the trained machine learning model better suited to the needs of the site analysis device itself, which customizes the model and improves the flexibility of its application; on the other hand, training the machine learning model through a combination of offline training and incremental training allows incremental training to be performed when the category or pattern of the feature data obtained by the site analysis device changes, so that the machine learning model can be flexibly adjusted and the trained model meets the requirements of the site analysis device. Therefore, compared with related technologies, the model training device provided by the embodiment of the present application can effectively adapt to the requirements of the site analysis device.
  • the apparatus 50 further includes:
  • the prediction module 503 is configured to use the machine learning model to predict the classification result after the machine learning model sent by the first analysis device is received;
  • the first sending module 504 is configured to send prediction information to the evaluation device, where the prediction information includes a classification result obtained by prediction, so that the evaluation device can evaluate whether the machine learning model is degraded based on the prediction information;
  • the incremental training module 502 is used to:
  • after receiving the training instruction sent by the evaluation device, perform incremental training on the machine learning model based on the first training sample set, where the training instruction is used to instruct training of the machine learning model.
  • the machine learning model is used to predict the classification results of to-be-predicted data composed of one or more pieces of key performance indicator (KPI) feature data;
  • the KPI feature data is the feature data of a KPI time series, or is KPI data;
  • the prediction information also includes the KPI category corresponding to the KPI feature data in the data to be predicted, the identifier of the device to which the data to be predicted belongs, and the collection time of the KPI data corresponding to the data to be predicted.
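The fields of the prediction information listed above can be pictured as a small record. The following is a minimal Python sketch, not the format used by the application: the class name `PredictionInfo`, the field names, the JSON encoding, and the example values are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class PredictionInfo:
    """Hypothetical record sent from a site analysis device to the evaluation device."""
    classification_result: str   # predicted class; the label values are illustrative
    kpi_categories: List[str]    # KPI category of each KPI feature in the predicted data
    device_id: str               # identifier of the device the predicted data belongs to
    collection_time: str         # collection time of the underlying KPI data


def to_message(info: PredictionInfo) -> str:
    """Serialize one record as JSON (the wire format is an assumption)."""
    return json.dumps(asdict(info), sort_keys=True)


info = PredictionInfo("abnormal", ["packet_loss_rate"], "router-01", "2020-09-17T10:00:00Z")
message = to_message(info)
```

Keeping the device identifier and collection time next to the classification result is what would let an evaluation device match each prediction to the KPI data it was made from when it checks for degradation.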
  • the apparatus 50 further includes:
  • the second sending module 505 is configured to send a retraining request to the first analysis device when the performance of the machine learning model after incremental training does not meet the performance compliance condition, where the retraining request is used to request the first analysis device to retrain the machine learning model.
  • the machine learning model is a tree model
  • the incremental training module 502 is configured to:
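  • for any training sample in the first training sample set, traverse the machine learning model starting from the root node and perform the following traversal process:
  • when the current split cost of the traversed first node is less than the historical split cost of the first node, add an associated second node, where the first node is any non-leaf node in the machine learning model and the second node is the parent node or a child node of the first node;
  • when the current split cost of the first node is not less than the historical split cost of the first node, traverse the nodes in the subtree of the first node, determine the traversed node as the new first node, and perform the traversal process again until the current split cost of the traversed first node is less than the historical split cost of the first node, or until the traversal reaches the target depth;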
  • the current split cost of the first node is the cost of splitting the first node based on the first training sample
  • the first training sample is any training sample in the first training sample set
  • the first training sample includes feature data of one or more feature dimensions
  • the feature data is numerical data
  • the historical split cost of the first node is the cost of splitting the first node based on the historical training sample set of the first node
  • the historical training sample set of the first node is the set of samples, in the historical training sample set of the machine learning model, that are divided into the first node.
  • the current split cost of the first node is negatively correlated with the size of a first numerical distribution range
  • the first numerical distribution range is a distribution range determined based on the feature values in the first training sample and a second numerical distribution range
  • the second numerical distribution range is the distribution range of feature values in the historical training sample set of the first node
  • the historical split cost of the first node is negatively correlated with the size of the second numerical distribution range
  • the current split cost of the first node is the reciprocal of the sum of the spans of the feature values in each feature dimension of the first numerical distribution range
  • the historical split cost of the first node is the reciprocal of the sum of the spans of the feature values in each feature dimension of the second numerical distribution range.
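The relationship between the two split costs can be illustrated with a short Python sketch. It assumes that a numerical distribution range is stored as one (min, max) pair per feature dimension and that the first numerical distribution range is the historical range widened just enough to cover the new sample. The names `extend_range` and `split_cost` and the example values are hypothetical.

```python
from typing import List, Tuple

Range = List[Tuple[float, float]]  # one (min, max) pair per feature dimension


def extend_range(hist: Range, sample: List[float]) -> Range:
    """Assumed construction of the first range: widen the historical range to cover the sample."""
    return [(min(lo, x), max(hi, x)) for (lo, hi), x in zip(hist, sample)]


def split_cost(rng: Range) -> float:
    """Reciprocal of the sum of the per-dimension spans (assumes the sum is non-zero)."""
    return 1.0 / sum(hi - lo for lo, hi in rng)


hist_range = [(1.0, 10.0), (1.0, 10.0)]   # second numerical distribution range
sample = [13.0, 5.0]                      # new sample, outside the range on dimension 0
first_range = extend_range(hist_range, sample)

current_cost = split_cost(first_range)    # 1 / (12 + 9)
historical_cost = split_cost(hist_range)  # 1 / (9 + 9)
needs_new_node = current_cost < historical_cost
```

Under these assumptions, a sample that falls inside the historical range leaves the range unchanged, so the two costs are equal and no node is added; only a sample that widens the range lowers the current cost below the historical cost.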
  • the incremental training module 502 is configured to:
  • determine the span range of the feature values in each feature dimension of the first numerical distribution range;
  • add the second node based on a first splitting point on a first splitting dimension, where the value range in the first numerical distribution range whose values on the first splitting dimension are not greater than the value of the first splitting point is divided to the left child node of the second node,
  • and the value range in the first numerical distribution range whose values on the first splitting dimension are greater than the value of the first splitting point is divided to the right child node of the second node,
  • the first splitting dimension is a splitting dimension determined among the feature dimensions based on the span ranges of the feature values in each feature dimension, and the first splitting point is a numerical point used for splitting that is determined on the first splitting dimension of the first numerical distribution range;
  • when the first splitting dimension is different from a second splitting dimension, the second node is the parent node or a child node of the first node
  • the second splitting dimension is the historical splitting dimension of the first node in the machine learning model, and the second splitting point is the historical splitting point of the first node in the machine learning model
  • when the first splitting dimension is the same as the second splitting dimension and the first splitting point is located on the right side of the second splitting point, the second node is the parent node of the first node, and the first node is the left child node of the second node;
  • the second node is a left child node of the first node.
  • the first splitting dimension is a feature dimension randomly selected from the feature dimensions of the first numerical distribution range, or the first splitting dimension is the feature dimension with the largest span among the feature dimensions of the first numerical distribution range;
  • the first splitting point is a numerical point randomly selected on the first splitting dimension of the first numerical distribution range.
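The two ways of choosing the first splitting dimension and the random choice of the first splitting point can be sketched as follows. This is an illustrative Python sketch under the assumption that a range is a list of (min, max) pairs; the function name `choose_split` and the flag `use_largest_span` are hypothetical, and a uniform draw is only one possible reading of "randomly selected".

```python
import random
from typing import List, Tuple

Range = List[Tuple[float, float]]  # one (min, max) pair per feature dimension


def choose_split(rng: Range, rnd: random.Random,
                 use_largest_span: bool = True) -> Tuple[int, float]:
    """Pick a splitting dimension and a splitting point inside the given range."""
    spans = [hi - lo for lo, hi in rng]
    if use_largest_span:
        # Option 1: the feature dimension with the largest span.
        dim = max(range(len(spans)), key=lambda d: spans[d])
    else:
        # Option 2: a feature dimension chosen at random.
        dim = rnd.randrange(len(spans))
    lo, hi = rng[dim]
    # The splitting point is drawn at random on the chosen dimension.
    return dim, rnd.uniform(lo, hi)


first_range = [(1.0, 13.0), (1.0, 10.0)]
dim, point = choose_split(first_range, random.Random(0))
```

Passing an explicit `random.Random` instance keeps the sketch reproducible, which is convenient when comparing trees built from the same samples.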
  • the incremental training module 502 is configured to:
  • the device also includes:
  • the stop module is used to stop the incremental training of the machine learning model when the sum of the number of samples in the historical training sample set of the first node and the number of the first training samples is not greater than the first sample number threshold.
  • the apparatus 50 further includes:
  • the merging module 506 is configured to merge the first non-leaf node and the second non-leaf node in the machine learning model, and merge the first leaf node and the second leaf node, to obtain a simplified machine learning model.
  • the simplified machine learning model is used to predict the classification results.
  • the receiving module 501 is further configured to receive a simplified machine learning model sent by the first analysis device, where the simplified machine learning model is obtained by the first analysis device by merging the first non-leaf node and the second non-leaf node in the machine learning model and merging the first leaf node and the second leaf node;
  • the first leaf node is a child node of the first non-leaf node
  • the second leaf node is a child node of the second non-leaf node
  • the first leaf node and the second leaf node include the same classification result, and the span ranges, on the same feature dimension, of the feature values of the historical training sample sets allocated to them are adjacent.
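The merge condition for two leaf nodes can be sketched in Python as below. The sketch assumes each leaf records a single span on one feature dimension and interprets "adjacent" as two spans that share a boundary value; both are simplifying assumptions, and the names `Leaf` and `merge_leaves` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Leaf:
    label: str   # classification result stored in the leaf
    dim: int     # feature dimension on which the span is compared
    lo: float    # lower bound of the span on that dimension
    hi: float    # upper bound of the span on that dimension


def merge_leaves(a: Leaf, b: Leaf) -> Optional[Leaf]:
    """Return the merged leaf, or None when the merge condition is not met."""
    same_result = a.label == b.label
    # Assumption: spans are adjacent when one ends exactly where the other starts.
    adjacent = a.dim == b.dim and (a.hi == b.lo or b.hi == a.lo)
    if not (same_result and adjacent):
        return None
    return Leaf(a.label, a.dim, min(a.lo, b.lo), max(a.hi, b.hi))


merged = merge_leaves(Leaf("normal", 0, 1.0, 3.0), Leaf("normal", 0, 3.0, 5.0))
rejected = merge_leaves(Leaf("normal", 0, 1.0, 3.0), Leaf("abnormal", 0, 3.0, 5.0))
```

Merging in this way removes a split that no longer changes the predicted class, which is why the simplified model is suited to prediction but not to further incremental training unless the unmerged node information is kept.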
  • each node in the machine learning model stores corresponding node information, and the node information of any node in the machine learning model includes label distribution information, where the label distribution information reflects the proportion, in the total number of labels, of the labels of each category of the samples in the historical training sample set divided into the corresponding node, and the total number of labels is the total number of labels corresponding to the samples in the historical training sample set of that node,
  • the node information of any non-leaf node also includes historical split information, and the historical split information is the information used by the corresponding node for splitting.
  • the historical split information includes: location information of the corresponding node in the machine learning model, split dimension, split point, numerical distribution range of the historical training sample set divided into the corresponding node, and historical split cost;
  • the label distribution information includes: the number of labels of the same category of the samples in the historical training sample set of the corresponding node and the total number of labels; or, the proportion of the labels of each category of the samples in the historical training sample set of the corresponding node in the total number of labels.
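The node information described above can be pictured as a small data structure. The following Python sketch is illustrative only: the class names `HistoricalSplit` and `NodeInfo`, the field names, and the string used for the node location are assumptions, not the storage format of the application.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class HistoricalSplit:
    """Split information kept by a non-leaf node (field names are hypothetical)."""
    position: str                           # location of the node in the tree; format is illustrative
    split_dim: int                          # splitting dimension
    split_point: float                      # splitting point
    value_range: List[Tuple[float, float]]  # range of the historical samples divided into the node
    split_cost: float                       # historical split cost


@dataclass
class NodeInfo:
    label_counts: Dict[str, int]                # per-category label counts
    history: Optional[HistoricalSplit] = None   # None for a leaf node

    def label_proportions(self) -> Dict[str, float]:
        """Proportion of each label category in the total number of labels."""
        total = sum(self.label_counts.values())
        return {label: n / total for label, n in self.label_counts.items()}


leaf = NodeInfo({"normal": 8, "abnormal": 2})
root = NodeInfo({"normal": 80, "abnormal": 20},
                HistoricalSplit("root", 0, 5.0, [(0.0, 10.0)], 0.1))
proportions = leaf.label_proportions()
```

Storing counts rather than the samples themselves is one way a node could carry enough information for later splitting decisions without access to the historical training samples.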
  • the first training sample set includes samples that meet a low-discrimination condition selected from samples obtained by the site analysis device, and the low-discrimination condition includes at least one of the following:
  • the absolute value of the difference between any two probabilities in a target probability set obtained by using the machine learning model to predict the sample is less than a first difference threshold, where
  • the target probability set includes the probabilities of the first n classification results in descending order of probability, 1 < n < m, and m is the total number of probabilities obtained by the machine learning model predicting the sample;
  • the absolute value of the difference between any two probabilities obtained by using the machine learning model to predict the sample is less than a second difference threshold
  • the absolute value of the difference between the highest probability and the lowest probability among the probabilities of the multiple classification results obtained by using the machine learning model to predict the sample is less than a third difference threshold
  • the absolute value of the difference between any two probabilities among the probabilities of the multiple classification results obtained by using the machine learning model to predict the sample is less than a fourth difference threshold
  • the probability distribution entropy E of the multiple classification results obtained by using the machine learning model to predict the sample is greater than a specified distribution entropy threshold, where E satisfies:
  • $E = -\sum_{i} P(x_i) \log_b P(x_i)$
  • $x_i$ represents the i-th classification result
  • $P(x_i)$ represents the predicted probability of the i-th classification result of the sample
  • b is the specified base, and $0 \le P(x_i) \le 1$.
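Two of the low-discrimination conditions above can be sketched in Python. The sketch assumes that the probability distribution entropy is the standard Shannon entropy with base b and treats zero-probability terms as contributing nothing; the function names, default base, and example probabilities are hypothetical.

```python
import math
from typing import Sequence


def distribution_entropy(probs: Sequence[float], base: float = 2.0) -> float:
    """Shannon entropy of the predicted class probabilities (terms with p = 0 are skipped)."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)


def top_n_are_close(probs: Sequence[float], n: int, diff_threshold: float) -> bool:
    """True when every pair among the n largest probabilities differs by less than the threshold."""
    top = sorted(probs, reverse=True)[:n]
    # All pairwise differences are below the threshold exactly when max - min is.
    return top[0] - top[-1] < diff_threshold


def is_low_discrimination(probs: Sequence[float], n: int,
                          diff_threshold: float, entropy_threshold: float) -> bool:
    """A sample qualifies if at least one of the two sketched conditions holds."""
    return (top_n_are_close(probs, n, diff_threshold)
            or distribution_entropy(probs) > entropy_threshold)


uncertain = [0.45, 0.40, 0.15]   # the model cannot separate the top two classes
confident = [0.90, 0.05, 0.05]   # the model clearly prefers one class
```

Samples such as `uncertain` are the ones the model discriminates poorly, so selecting them for the first training sample set concentrates incremental training on the cases where the model is least sure.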
  • An embodiment of the present application provides a model training device 60, as shown in FIG. 24, applied to a first analysis device, including:
  • the offline training module 601 is used to perform offline training based on a collection of historical training samples to obtain a machine learning model
  • the sending module 602 is configured to send the machine learning model to multiple site analysis devices, so that each site analysis device performs incremental training on the machine learning model based on a first training sample set,
  • where the feature data in the training sample set used by any site analysis device to train the machine learning model is feature data of the site network corresponding to that site analysis device.
  • the sending module can distribute the machine learning model trained by the offline training module to each site analysis device, and each site analysis device performs incremental training to ensure the performance of the machine learning model on that site analysis device.
  • the first analysis device does not need to train a separate machine learning model for each site analysis device, which effectively reduces the overall training time of the first analysis device, and the model obtained by offline training can serve as the basis for incremental training on each site analysis device,
  • which improves the versatility of the model obtained by offline training, thereby realizing model generalization and reducing the overall training cost of the first analysis device.
  • the site analysis device receives the machine learning model sent by the first analysis device, and can perform incremental training on the machine learning model based on the first training sample set obtained from the site network corresponding to the site analysis device.
  • on the one hand, the feature data in the first training sample set is feature data obtained from the site network corresponding to the site analysis device, which better fits the application scenario of the site analysis device.
  • training the model with a first training sample set that includes feature data obtained by the site analysis device from its corresponding site network makes the trained machine learning model better suited to the needs of the site analysis device itself, which customizes the model and improves the flexibility of its application; on the other hand, training the machine learning model through a combination of offline training and incremental training allows incremental training to be performed when the category or pattern of the feature data obtained by the site analysis device changes, so that the machine learning model can be flexibly adjusted and the trained model meets the requirements of the site analysis device. Therefore, compared with related technologies, the model training method provided in the embodiments of the present application can effectively adapt to the requirements of the site analysis device.
  • the historical training sample set is a set of training samples sent by multiple site analysis devices.
  • the device 60 further includes:
  • the receiving module 603 is used for:
  • the retraining request sent by the site analysis device is received, and the machine learning model is retrained based on the training sample set sent by the site analysis device that sent the retraining request;
  • the machine learning model is a tree model
  • the offline training module is configured to:
  • the offline training process includes:
  • the offline training module 601 is used to:
  • the third node is split based on the numerical distribution range of the historical training sample set to obtain the left child node and the right child node of the third node, and the numerical distribution range of the historical training sample set is the historical training The distribution range of feature values in the sample set.
  • the offline training module 601 is used to:
  • the value range in a third numerical distribution range whose values on a third splitting dimension are not greater than the value of a third splitting point is divided to the left child node,
  • and the value range in the third numerical distribution range whose values on the third splitting dimension are greater than the value of the third splitting point is divided to the right child node, where the third numerical distribution range is the distribution range of feature values in the historical training sample set of the third node.
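The division of a numerical distribution range between the left and right child nodes can be sketched as follows. This Python sketch assumes that a range is a list of (min, max) pairs; the names `partition_range` and `goes_left` are hypothetical.

```python
from typing import List, Tuple

Range = List[Tuple[float, float]]  # one (min, max) pair per feature dimension


def partition_range(rng: Range, dim: int, point: float) -> Tuple[Range, Range]:
    """Split a range at `point` on dimension `dim` into a left and a right sub-range."""
    lo, hi = rng[dim]
    if not lo <= point <= hi:
        raise ValueError("the splitting point must lie inside the range")
    left, right = list(rng), list(rng)
    left[dim] = (lo, point)    # values not greater than the splitting point
    right[dim] = (point, hi)   # values greater than the splitting point (open at `point`)
    return left, right


def goes_left(sample: List[float], dim: int, point: float) -> bool:
    """A sample is routed to the left child when its value is not greater than the point."""
    return sample[dim] <= point


third_range = [(0.0, 10.0), (2.0, 6.0)]
left_range, right_range = partition_range(third_range, dim=0, point=4.0)
```

Only the splitting dimension is cut; the spans on all other feature dimensions are inherited unchanged by both children.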
  • the split cutoff condition includes at least one of the following:
  • the current splitting cost of the third node is greater than the splitting cost threshold
  • the number of samples in the historical training sample set is less than the second sample number threshold
  • the number of splits corresponding to the third node is greater than a threshold of the number of splits
  • the depth of the third node in the machine learning model is greater than a depth threshold
  • the proportion of the number of labels with the largest proportion among the labels corresponding to the historical training sample set in the total number of labels of the labels corresponding to the historical training sample set is greater than a specified proportion threshold.
  • the current split cost of the third node is negatively related to the size of the distribution range of feature values in the historical training sample set.
  • the current split cost of the third node is the reciprocal of the sum of the spans of feature values of the historical training sample set in each feature dimension.
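The split cutoff conditions listed above can be combined into a single check. The following Python sketch treats the cutoff as met when any one of the listed conditions holds; the class `CutoffConfig`, the function `should_stop`, and all threshold values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CutoffConfig:
    """Hypothetical thresholds for the split cutoff conditions."""
    cost_threshold: float    # splitting cost threshold
    min_samples: int         # second sample number threshold
    max_splits: int          # threshold of the number of splits
    max_depth: int           # depth threshold
    max_label_ratio: float   # specified proportion threshold


def should_stop(cost: float, n_samples: int, n_splits: int, depth: int,
                label_counts: Dict[str, int], cfg: CutoffConfig) -> bool:
    """Return True when at least one cutoff condition is met."""
    total = sum(label_counts.values())
    top_ratio = max(label_counts.values()) / total if total else 0.0
    return (cost > cfg.cost_threshold
            or n_samples < cfg.min_samples
            or n_splits > cfg.max_splits
            or depth > cfg.max_depth
            or top_ratio > cfg.max_label_ratio)


cfg = CutoffConfig(cost_threshold=0.5, min_samples=5, max_splits=10,
                   max_depth=8, max_label_ratio=0.95)
mixed = {"normal": 60, "abnormal": 40}
keep_splitting = not should_stop(0.1, 100, 2, 3, mixed, cfg)
```

Because the split cost is the reciprocal of the summed spans, a cost above the threshold corresponds to a node whose samples already occupy a small range, which is one reason to stop splitting it.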
  • the apparatus 60 further includes:
  • the merging module 604 is configured to merge the first non-leaf node and the second non-leaf node in the machine learning model, and merge the first leaf node and the second leaf node, to obtain a simplified machine learning model.
  • the simplified machine learning model is used to predict the classification result, where the first leaf node is a child node of the first non-leaf node, and the second leaf node is a child node of the second non-leaf node,
  • the first leaf node and the second leaf node include the same classification result, and the span ranges, on the same feature dimension, of the feature values of the historical training sample sets allocated to them are adjacent;
  • the sending module 602 is further configured to send the simplified machine learning model to the site analysis device for the site analysis device to predict the classification result based on the simplified machine learning model.
  • each node in the machine learning model stores corresponding node information, and the node information of any node in the machine learning model includes label distribution information, where the label distribution information reflects the proportion, in the total number of labels, of the labels of each category of the samples in the historical training sample set divided into the corresponding node, and the total number of labels is the total number of labels corresponding to the samples in the historical training sample set of that node,
  • the node information of any non-leaf node also includes historical split information, and the historical split information is the information used by the corresponding node for splitting.
  • the historical split information includes: location information of the corresponding node in the machine learning model, split dimension, split point, numerical distribution range of the historical training sample set divided into the corresponding node, and historical split cost;
  • the label distribution information includes: the number of labels of the same category of the samples in the historical training sample set of the corresponding node and the total number of labels; or, the proportion of the labels of each category of the samples in the historical training sample set of the corresponding node in the total number of labels.
  • the first training sample set includes samples that meet a low-discrimination condition selected from samples obtained by the site analysis device, and the low-discrimination condition includes at least one of the following:
  • the absolute value of the difference between any two probabilities in a target probability set obtained by using the machine learning model to predict the sample is less than a first difference threshold, where
  • the target probability set includes the probabilities of the first n classification results in descending order of probability, 1 < n < m, and m is the total number of probabilities obtained by the machine learning model predicting the sample;
  • the absolute value of the difference between any two probabilities obtained by using the machine learning model to predict the sample is less than a second difference threshold
  • the absolute value of the difference between the highest probability and the lowest probability among the probabilities of the multiple classification results obtained by using the machine learning model to predict the sample is less than a third difference threshold
  • the absolute value of the difference between any two probabilities among the probabilities of the multiple classification results obtained by using the machine learning model to predict the sample is less than a fourth difference threshold
  • the probability distribution entropy E of the multiple classification results obtained by using the machine learning model to predict the sample is greater than a specified distribution entropy threshold, where E satisfies:
  • $E = -\sum_{i} P(x_i) \log_b P(x_i)$
  • $x_i$ represents the i-th classification result
  • $P(x_i)$ represents the predicted probability of the i-th classification result of the sample
  • b is the specified base, and $0 \le P(x_i) \le 1$.
  • Fig. 27 is a block diagram of a model training device provided by an embodiment of the present application.
  • the model training device may be the aforementioned analysis device, such as a site analysis device, or the aforementioned first analysis device.
  • the analysis device 70 includes a processor 701 and a memory 702.
  • the memory 702 is used to store a computer program, and the computer program includes program instructions;
  • the processor 701 is configured to call the computer program to implement the model training method provided in the embodiment of the present application.
  • the analysis device 70 further includes a communication bus 703 and a communication interface 704.
  • the processor 701 includes one or more processing cores, and the processor 701 executes various functional applications and data processing by running a computer program.
  • the memory 702 can be used to store computer programs.
  • the memory may store an operating system and at least one application program unit required by the function.
  • the operating system can be a real-time operating system (Real Time eXecutive, RTX), LINUX, UNIX, WINDOWS, or OS X.
  • the communication interfaces 704 are used to communicate with other storage devices or network devices.
  • the communication interface 704 may be used to receive sample data sent by a network device in a communication network.
  • the memory 702 and the communication interface 704 are respectively connected to the processor 701 through a communication bus 703.
  • the embodiment of the present application provides a computer storage medium on which instructions are stored.
  • when the instructions are executed by a processor, the model training method provided in the embodiments of the present application is implemented.
  • the embodiment of the present application provides a model training system, including: a first analysis device and multiple site analysis devices;
  • the first analysis device includes the model training device described in any of the foregoing embodiments; the site analysis device includes the model training device described in any of the foregoing embodiments.
  • the deployment of each device in the model training system can refer to the deployment of each device in the application scenarios shown in Figs. 1 to 3.
  • the model training system also includes one or more of: network devices, evaluation devices, storage devices, and management devices.
  • for the introduction of related devices, refer to the aforementioned FIG. 1 to FIG. 3, which is not repeated in the embodiment of the present application.
  • A refers to B, which means that A can be the same as B, or can be simply modified on the basis of B.
  • the foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • when implemented by software, they may be implemented in whole or in part in the form of a computer program product, which includes one or more computer instructions.
  • the computer may be a general-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • for example, the computer instructions may be transmitted from a website, computer, server, or data center
  • to another website, computer, server, or data center through wired (such as coaxial cable, optical fiber, digital subscriber line) or wireless (such as infrared, wireless, microwave, etc.) means.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or a data center integrated with one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, and a magnetic tape), an optical medium, or a semiconductor medium (for example, a solid state hard disk).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

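This application discloses a model training method, apparatus, and system, belonging to the AI field. The method includes: receiving a machine learning model sent by a first analysis device; and performing incremental training on the machine learning model based on a first training sample set, where the feature data in the first training sample set is feature data of the site network corresponding to a site analysis device. This application solves the problem that a machine learning model obtained by offline training cannot be effectively adapted to the requirements of a site analysis device. The embodiments of this application are used for predicting classification results.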

Description

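Model Training Method, Apparatus, and System
Technical Field
This application relates to the field of artificial intelligence (AI), and in particular to a model training method, apparatus, and system.
Background
Machine learning means having a machine train a machine learning model based on training samples, so that the machine learning model is able to predict the categories of samples other than the training samples.
At present, a data analysis system includes multiple analysis devices used for data analysis, which may include a cloud analysis device and site analysis devices. A method for deploying a machine learning model in such a system includes: the cloud analysis device performs offline training of the model, and the offline-trained model is then deployed directly on the site analysis device.
However, the trained model may not be effectively adapted to the requirements of the site analysis device.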
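Summary
The embodiments of this application provide a model training method, apparatus, and system. The technical solutions are as follows:
According to a first aspect, a model training method is provided, applied to a site analysis device, and including:
receiving a machine learning model sent by a first analysis device, where, optionally, the first analysis device is a cloud analysis device;
performing incremental training on the machine learning model based on a first training sample set, where the feature data in the first training sample set is feature data of the site network corresponding to the site analysis device.
On the one hand, the feature data in the first training sample set is feature data obtained from the site network corresponding to the site analysis device, which better fits the application scenario of the site analysis device. Training the model with a first training sample set that includes feature data obtained by the site analysis device from its corresponding site network makes the trained machine learning model better suited to the needs of the site analysis device itself, which customizes the model and improves the flexibility of its application. On the other hand, training the machine learning model through a combination of offline training and incremental training allows incremental training to be performed when the category or pattern of the feature data obtained by the site analysis device changes, so that the machine learning model can be flexibly adjusted and the trained model meets the requirements of the site analysis device. Therefore, compared with the related art, the model training method provided in the embodiments of this application can effectively adapt to the requirements of the site analysis device.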
可选地,在所述接收第一分析设备发送的机器学习模型之后,所述方法还包括:
采用所述机器学习模型进行分类结果的预测;
可选地,所述方法还包括:向所述评估设备发送预测信息,所述预测信息包括预测得到的分类结果,以供所述评估设备基于所述预测信息评估所述机器学习模型是否发生劣化。在一种示例中,局点分析设备可以在每次采用机器学习模型进行分类结果的预测之后,向评估设备发送预测信息,该预测信息包括预测得到的分类结果;在另一种示例中,局点分析设备还可以周期性地向评估设备发送预测信息,该预测信息包括当前周期获取的分类结果;在又一种示例中,局点分析设备还可以在获取的分类结果数量达到数量阈值后,向评估设备发送预测信息,该预测信息包括获取的分类结果;在再一种示例中,局点分析设备还可以在设定 的时间段内向评估设备发送预测信息,该预测信息包括当前获取的分类结果,如此可以避免干扰用户业务。
所述基于第一训练样本集合,对所述机器学习模型进行增量训练,包括:
在接收到所述评估设备发送的训练指令后,基于所述第一训练样本集合,对所述机器学习模型进行增量训练,所述训练指令用于指示对所述机器学习模型进行训练。
可选地,所述机器学习模型用于对一个或多个关键绩效指标KPI特征数据组成的待预测数据进行分类结果的预测;所述KPI特征数据为KPI时间序列的特征数据,或者为KPI数据;
所述预测信息还包括所述待预测数据中的KPI特征数据对应的KPI类别,所述待预测数据所属的设备的标识以及所述待预测数据对应的KPI数据的采集时刻。
可选地,所述方法还包括:
当增量训练后的机器学习模型的性能不满足性能达标条件时,向所述第一分析设备发送重训练请求,所述重训练请求用于请求所述第一分析设备对所述机器学习模型进行重训练。
可选地,所述机器学习模型为树模型,所述基于第一训练样本集合,对所述机器学习模型进行增量训练,包括:
对于所述第一训练样本集合中的任一训练样本,从所述机器学习模型的根节点开始遍历,执行如下遍历过程:
当遍历到的第一节点的当前分裂成本小于第一节点的历史分裂成本时,添加关联的第二节点,所述第一节点为所述机器学习模型中的任一非叶子节点,所述第二节点为所述第一节点的父节点或子节点;
当第一节点的当前分裂成本不小于第一节点的历史分裂成本,遍历所述第一节点的子树中的节点,并将遍历到的节点确定为新的第一节点,再次执行所述遍历过程,直至遍历到的第一节点的当前分裂成本小于第一节点的历史分裂成本,或者遍历到目标深度;
其中,第一节点的当前分裂成本为所述第一节点基于第一训练样本进行节点分裂的成本,所述第一训练样本为所述第一训练样本集合中的任一训练样本,所述第一训练样本包括一个或多个特征维度的特征数据,所述特征数据为数值数据,所述第一节点的历史分裂成本为所述第一节点基于所述第一节点的历史训练样本集合进行节点分裂的成本,所述第一节点的历史训练样本集合为所述机器学习模型的历史训练样本集合中划分至所述第一节点的样本的集合。
可选地,所述第一节点的当前分裂成本与第一数值分布范围的大小负相关,所述第一数值分布范围是基于所述第一训练样本中的特征取值与第二数值分布范围确定的分布范围;所述第二数值分布范围为第一节点的历史训练样本集合中的特征取值的分布范围,所述第一节点的历史分裂成本与所述第二数值分布范围的大小负相关。
可选地,所述第一节点的当前分裂成本为所述第一数值分布范围中各特征维度上的特征取值的跨度之和的倒数,所述第一节点的历史分裂成本为所述第二数值分布范围中各特征维度上的特征取值的跨度之和的倒数。
在增量训练过程中,基于训练样本集合的数值分布范围进行节点分裂,无需大量访问历史训练样本,因此,有效减少了内存和计算资源的占用,降低训练代价。并且通过前述节点信息携带各个节点的相关信息,可以实现机器学习模型的轻量化,更便于机器学习模型的部署,实现模型的有效泛化。
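The range-based split cost described above can be illustrated with a short sketch. It assumes that each node stores one (low, high) pair per feature dimension for its historical samples; the names `extend_range` and `split_cost` are hypothetical and do not come from the application.

```python
from typing import List, Tuple

Range = List[Tuple[float, float]]  # one (low, high) pair per feature dimension


def extend_range(history: Range, sample: List[float]) -> Range:
    """First value range: the historical range widened to include the new sample."""
    return [(min(lo, x), max(hi, x)) for (lo, hi), x in zip(history, sample)]


def split_cost(value_range: Range) -> float:
    """Reciprocal of the summed per-dimension spans (infinite for a zero-width range)."""
    total_span = sum(hi - lo for lo, hi in value_range)
    return float("inf") if total_span == 0 else 1.0 / total_span


history = [(0.0, 4.0), (10.0, 12.0)]          # second value range, spans 4 and 2
inside = extend_range(history, [2.0, 11.0])   # new sample falls inside the range
outside = extend_range(history, [8.0, 11.0])  # new sample widens dimension 0 to 8

historical_cost = split_cost(history)
print(split_cost(inside) < historical_cost)   # cost unchanged: keep traversing
print(split_cost(outside) < historical_cost)  # cost dropped: a new node is added
```

Only the stored ranges are read, which is why the incremental step does not need to revisit the historical samples themselves.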
可选地,所述添加关联的第二节点,包括:
确定所述第一数值分布范围在各特征维度上的特征取值的跨度范围;
基于第一分裂维度上的第一分裂点添加所述第二节点,其中,所述第一数值分布范围中在所述第一分裂维度上数值不大于所述第一分裂点数值的数值范围划分至所述第二节点的左子节点,所述第一数值分布范围中在所述第一分裂维度上数值大于所述第一分裂点数值的数值范围划分至所述第二节点的右子节点,所述第一分裂维度为基于所述各特征维度上的特征取值的跨度范围,在所述各特征维度中确定的分裂维度,所述第一分裂点为在所述第一数值分布范围的所述第一分裂维度上确定的用于分裂的数值点;
当所述第一分裂维度与第二分裂维度不同时,所述第二节点为所述第一节点的父节点或子节点,所述第二分裂维度为所述第一节点在所述机器学习模型中的历史分裂维度,所述第二分裂点为所述第一节点在所述机器学习模型中的历史分裂点;
当所述第一分裂维度与所述第二分裂维度相同,且所述第一分裂点位于所述第二分裂点右侧,所述第二节点为所述第一节点的父节点,且所述第一节点为所述第二节点的左子节点;
当所述第一分裂维度与所述第二分裂维度相同,且所述第一分裂点位于所述第二分裂点左侧,所述第二节点为所述第一节点左子节点。
可选地,所述第一分裂维度为在所述第一数值分布范围的各特征维度中随机选择的特征维度,或,所述第一分裂维度为所述第一数值分布范围的各特征维度中跨度最大的特征维度;
和/或,所述第一分裂点为在所述第一数值分布范围的所述第一分裂维度上随机选择的数值点。
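As a rough sketch of the two choices above, the following picks a split dimension (either the widest one or a random one) and then a random split point inside that dimension. The function name `choose_split` and the `widest` flag are assumptions made for illustration only.

```python
import random
from typing import List, Tuple

Range = List[Tuple[float, float]]  # one (low, high) pair per feature dimension


def choose_split(value_range: Range, rng: random.Random,
                 widest: bool = True) -> Tuple[int, float]:
    """Return (split dimension, split point) for the given value range."""
    spans = [hi - lo for lo, hi in value_range]
    if widest:
        # Dimension with the largest span of feature values.
        dim = max(range(len(spans)), key=spans.__getitem__)
    else:
        # Dimension chosen uniformly at random.
        dim = rng.randrange(len(spans))
    lo, hi = value_range[dim]
    return dim, rng.uniform(lo, hi)  # random split point within the chosen dimension


dim, point = choose_split([(0.0, 8.0), (10.0, 12.0)], random.Random(0))
print(dim, point)
```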
可选地,所述添加关联的第二节点,包括:
当所述第一节点的历史训练样本集合中的样本的数量与所述第一训练样本的数量之和大于第一样本数阈值,添加所述第二节点;
所述方法还包括:
当所述第一节点的历史训练样本集合中的样本的数量与所述第一训练样本的数量之和不大于所述第一样本数阈值,停止所述机器学习模型的增量训练。
可选地,所述方法还包括:
将所述机器学习模型中第一非叶子节点和第二非叶子节点进行合并,第一叶子节点和第二叶子节点进行合并,得到精简后的机器学习模型,所述精简后的机器学习模型用于进行分类结果的预测;
或者,接收所述第一分析设备发送的精简后的机器学习模型,所述精简后的机器学习模型是所述第一分析设备将所述机器学习模型中第一非叶子节点和第二非叶子节点进行合并,第一叶子节点和第二叶子节点进行合并得到的;
其中,所述第一叶子节点为所述第一非叶子节点的子节点,所述第二叶子节点为所述第二非叶子节点的子节点,所述第一叶子节点和所述第二叶子节点包括相同的分类结果,且在同一特征维度上分配的历史训练样本集合的特征取值的跨度范围相邻。
精简后的机器学习模型结构更为简单,减少了树的分支层数,防止树层数过深,虽然模型架构产生变化,但不影响其预测结果,可以节约存储空间,提高预测效率。并且通过精简过程可以防止模型的过拟合。进一步的,该精简后的模型若仅用于样本分析,可以不在其节点信息中记录历史分裂信息,这样可以进一步减少模型本身的大小,提高模型预测效率。
可选地,所述机器学习模型中的每个节点对应存储有节点信息,所述机器学习模型中的任一节点的所述节点信息包括标签分布信息,所述标签分布信息用于反映划分至对应节点中的历史训练样本集合中样本的不同类别的标签在标签总数中的占比,所述标签总数是划分至所述任一节点的历史训练样本集合中样本对应的标签的总数量,任一非叶子节点的节点信息还包括历史分裂信息,所述历史分裂信息为对应节点用于分裂的信息。
可选地,所述历史分裂信息包括:对应节点在机器学习模型中的位置信息,分裂维度,分裂点,划分至对应节点的历史训练样本集合的数值分布范围和历史分裂成本;
所述标签分布信息包括:划分至对应节点的历史训练样本集合中样本的同一类别的标签个数和所述标签总数;或,划分至对应节点的历史训练样本集合中样本的不同类别的标签在所述标签总数中的占比。
可选地,所述第一训练样本集合包括在所述局点分析设备获取的样本中筛选得到的满足低区分度条件的样本,所述低区分度条件包括以下至少一者:
采用所述机器学习模型预测样本得到的目标概率集合中任意两个概率的差值的绝对值小于第一差值阈值,所述目标概率集合包括按照概率的大小降序排列的前n个分类结果的概率,1<n<m,m为所述机器学习模型预测样本得到的概率总数;
或者,采用所述机器学习模型预测样本得到的概率中任意两个概率的差值的绝对值小于第二差值阈值;
或者,采用所述机器学习模型预测样本的多种分类结果的概率中最高概率和最低概率的差值的绝对值小于第三差值阈值;
或者,采用所述机器学习模型预测样本的多种分类结果的概率中任意两个概率的差值的绝对值小于第四差值阈值;
或者,采用所述机器学习模型预测样本的多种分类结果的概率分布熵E大于指定分布熵阈值,所述E满足:
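$$E = -\sum_{i} P(x_i)\,\log_b P(x_i)$$
where $x_i$ denotes the i-th classification result, $P(x_i)$ denotes the predicted probability that the sample has the i-th classification result, $b$ is a specified base, and $0 \le P(x_i) \le 1$.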
第二方面,提供一种模型训练方法,应用于第一分析设备,例如该第一分析设备可以为云端分析设备,该方法包括:
基于历史训练样本集合进行离线训练,得到机器学习模型;
向多个局点分析设备发送所述机器学习模型,以供所述局点分析设备基于第一训练样本集合,对所述机器学习模型进行增量训练,任一局点分析设备用于训练所述机器学习模型的训练样本集合中的特征数据是所述任一局点分析设备所对应的局点网络的特征数据。
本申请实施例中,第一分析设备可以将训练得到的机器学习模型分发至每个局点分析设备中,由各个局点分析设备进行增量训练,保证各个局点分析设备上的机器学习模型的性能。如此,第一分析设备无需为每个局点分析设备都训练对应的机器学习模型,有效减少了第一分析设备的整体训练时长,且离线训练得到的模型在各个局点分析设备均可以作为增量训练的基础,提高了离线训练得到的模型的通用性,从而实现模型泛化,降低了第一分析设备的整体训练成本。
可选地,所述历史训练样本集合是多个所述局点分析设备发送的训练样本的集合。
可选地,在向局点分析设备发送所述机器学习模型之后,所述方法还包括:
接收所述局点分析设备发送的重训练请求,基于发送所述重训练请求的局点分析设备所发送的训练样本集合,对所述机器学习模型进行重训练;
或者,接收所述局点分析设备发送的重训练请求,基于发送所述重训练请求的局点分析设备所发送的训练样本集合以及其他局点分析设备所发送的训练样本集合,对所述机器学习模型进行重训练;
或者,接收至少两个所述局点分析设备发送的训练样本集合,并基于接收到的训练样本集合,重训练所述机器学习模型。
可选地,所述机器学习模型为树模型,所述基于历史训练样本集合进行离线训练,得到机器学习模型,包括:
获取已确定标签的历史训练样本集合,所述历史训练样本集合中的训练样本包括一个或多个特征维度的特征数据,所述特征数据为数值数据;
创建根节点;
将所述根节点作为第三节点,执行离线训练过程直至达到分裂截止条件;
为每个叶子节点确定分类结果,得到所述机器学习模型;
其中,所述离线训练过程包括:
进行所述第三节点的分裂,得到所述第三节点的左子节点和右子节点;
将所述左子节点作为更新后的第三节点,将所述历史训练样本集合划分至所述左子节点的左样本集合作为更新后的历史训练样本集合,再次执行所述离线训练过程;
将所述右子节点作为更新后的第三节点,将所述历史训练样本集合划分至所述右子节点的右样本集合作为更新后的历史训练样本集合,再次执行所述离线训练过程。
在一种可选方式中,所述进行所述第三节点的分裂,得到所述第三节点的左子节点和右子节点,包括:
基于所述历史训练样本集合的数值分布范围进行所述第三节点的分裂,得到所述第三节点的左子节点和右子节点,所述历史训练样本集合的数值分布范围为所述历史训练样本集合中的特征取值的分布范围。
可选地,所述基于所述历史训练样本集合的数值分布范围进行所述第三节点的分裂,得到所述第三节点的左子节点和右子节点,包括:
在所述历史训练样本集合的各特征维度中确定第三分裂维度;
在所述历史训练样本集合的所述第三分裂维度上确定第三分裂点;
将第三数值分布范围中在所述第三分裂维度上数值不大于所述第三分裂点数值的数值范围划分至所述左子节点,所述第三数值分布范围中在所述第三分裂维度上数值大于所述第三分裂点数值的数值范围划分至所述右子节点,所述第三数值分布范围为所述第三节点的历史训练样本集中的特征取值的分布范围。
在离线训练过程中,基于训练样本集合的数值分布范围进行节点分裂,无需大量访问历史训练样本,因此,有效减少了内存和计算资源的占用,降低训练代价。并且通过前述节点信息携带各个节点的相关信息,可以实现机器学习模型的轻量化,更便于机器学习模型的部署,实现模型的有效泛化。
在另一种可选的方式中,由于第一分析设备已经获取了前述用于训练的历史训练样本集合,因此也可以直接采用样本进行节点分裂,前述进行所述第三节点的分裂,得到所述第三节点的左子节点和右子节点可以替换包括:将历史训练样本集合中在第三分裂维度上特征取值不大于第三分裂点数值的样本划分至左子节点,历史训练样本集合中在第三分裂维度上特征取值大于第三分裂点数值的样本划分至右子节点。
可选地,所述分裂截止条件包括以下至少一者:
所述第三节点的当前分裂成本大于分裂成本阈值,如此可以避免树的过度分裂,减少运算开销;
或者,所述历史训练样本集合的样本的数量小于第二样本数阈值,这种情况说明,历史训练样本集合的数据量已经较少,已不足以支持有效的节点分裂,此时停止离线训练过程,可以减少运算开销;
或者,所述第三节点对应的分裂次数大于分裂次数阈值,这种情况说明当前的机器学习模型已经达到了分裂次数上限,此时停止离线训练过程,可以减少运算开销;
或者,所述第三节点在所述机器学习模型中的深度大于深度阈值,如此可以实现对机器学习模型的深度的控制;
或者,所述历史训练样本集合所对应的标签中占比最大的标签的数量在所述历史训练样本集合所对应的标签的标签总数中的占比大于指定占比阈值,这种情况说明该占比最大的标签的数量已经达到了分类条件,可以基于此确定准确的分类结果了,此时停止离线训练过程,可以减少不必要的分裂,减少运算开销。
可选地,所述第三节点的当前分裂成本与所述历史训练样本集合中的特征取值的分布范围的大小负相关。
可选地,所述第三节点的当前分裂成本为所述历史训练样本集合在各特征维度上的特征取值的跨度之和的倒数。
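The offline training loop and its split cut-off conditions can be sketched as a small recursive builder. This is an illustration under stated assumptions, not the application's implementation: it uses the sample-based splitting variant, picks the widest dimension and a random split point, and every name and default threshold (`build`, `min_samples`, `max_depth`, `max_cost`, `purity`) is hypothetical.

```python
import random
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple

Sample = Tuple[List[float], int]  # (feature values, label)


@dataclass
class Node:
    label_counts: Counter
    split_dim: Optional[int] = None
    split_point: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(samples: List[Sample], depth: int, rng: random.Random,
          min_samples: int = 2, max_depth: int = 8,
          max_cost: float = 100.0, purity: float = 0.95) -> Node:
    node = Node(Counter(label for _, label in samples))
    dims = range(len(samples[0][0]))
    spans = [max(f[d] for f, _ in samples) - min(f[d] for f, _ in samples) for d in dims]
    total = sum(spans)
    cost = float("inf") if total == 0 else 1.0 / total  # reciprocal of summed spans
    top_share = node.label_counts.most_common(1)[0][1] / len(samples)
    if (len(samples) < min_samples or depth > max_depth
            or cost > max_cost or top_share > purity):
        return node  # a split cut-off condition is met: the node stays a leaf
    dim = max(dims, key=spans.__getitem__)
    low = min(f[dim] for f, _ in samples)
    point = rng.uniform(low, low + spans[dim])
    left = [s for s in samples if s[0][dim] <= point]
    right = [s for s in samples if s[0][dim] > point]
    if not left or not right:
        return node  # degenerate split point: keep the node as a leaf
    node.split_dim, node.split_point = dim, point
    node.left = build(left, depth + 1, rng, min_samples, max_depth, max_cost, purity)
    node.right = build(right, depth + 1, rng, min_samples, max_depth, max_cost, purity)
    return node


def predict(node: Node, features: List[float]) -> int:
    while node.left is not None:
        node = node.left if features[node.split_dim] <= node.split_point else node.right
    return node.label_counts.most_common(1)[0][0]  # majority label of the leaf


data = [([0.1, 5.0], 0), ([0.5, 5.2], 0), ([0.9, 4.8], 0),
        ([9.1, 5.1], 1), ([9.5, 4.9], 1), ([9.9, 5.0], 1)]
tree = build(data, depth=1, rng=random.Random(0))
print([predict(tree, f) for f, _ in data])
```

The root is passed `depth=1` to match the convention that the root node sits at level 1.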
可选地,所述方法还包括:
将所述机器学习模型中第一非叶子节点和第二非叶子节点进行合并,第一叶子节点和第二叶子节点进行合并,得到精简后的机器学习模型,所述精简后的机器学习模型用于进行分类结果的预测,其中,所述第一叶子节点为所述第一非叶子节点的子节点,所述第二叶子节点为所述第二非叶子节点的子节点,所述第一叶子节点和所述第二叶子节点包括相同的分类结果,且在同一特征维度上分配的历史训练样本集合的特征取值的跨度范围相邻;
向所述局点分析设备发送所述精简后的机器学习模型,以供所述局点分析设备基于所述精简后的机器学习模型进行分类结果的预测。
可选地,所述机器学习模型中的每个节点对应存储有节点信息,所述机器学习模型中的任一节点的所述节点信息包括标签分布信息,所述标签分布信息用于反映划分至对应节点中的历史训练样本集合中样本的不同类别的标签在标签总数中的占比,所述标签总数是划分至所述任一节点的历史训练样本集合中样本对应的标签的总数量,任一非叶子节点的节点信息还包括历史分裂信息,所述历史分裂信息为对应节点用于分裂的信息。
可选地,所述历史分裂信息包括:对应节点在机器学习模型中的位置信息,分裂维度,分裂点,划分至对应节点的历史训练样本集合的数值分布范围和历史分裂成本;
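The label distribution information includes: the number of labels of each category among the samples of the historical training sample set assigned to the corresponding node, together with the total number of labels; or the proportion of the labels of each category among those samples in the total number of labels.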
可选地,所述第一训练样本集合包括在所述局点分析设备获取的样本中筛选得到的满足低区分度条件的样本,所述低区分度条件包括以下至少一者:
采用所述机器学习模型预测样本得到的目标概率集合中任意两个概率的差值的绝对值小于第一差值阈值,所述目标概率集合包括按照概率的大小降序排列的前n个分类结果的概率,1<n<m,m为所述机器学习模型预测样本得到的概率总数;
或者,采用所述机器学习模型预测样本得到的概率中任意两个概率的差值的绝对值小于第二差值阈值;
或者,采用所述机器学习模型预测样本的多种分类结果的概率中最高概率和最低概率的差值的绝对值小于第三差值阈值;
或者,采用所述机器学习模型预测样本的多种分类结果的概率中任意两个概率的差值的绝对值小于第四差值阈值;
或者,采用所述机器学习模型预测样本的多种分类结果的概率分布熵E大于指定分布熵阈值,所述E满足:
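$$E = -\sum_{i} P(x_i)\,\log_b P(x_i)$$
where $x_i$ denotes the i-th classification result, $P(x_i)$ denotes the predicted probability that the sample has the i-th classification result, $b$ is a specified base, and $0 \le P(x_i) \le 1$.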
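In a third aspect, a model training apparatus is provided. The apparatus includes multiple functional modules that interact to implement the method of the first aspect and its implementations. The functional modules may be implemented in software, hardware, or a combination of the two, and may be combined or divided in any manner depending on the specific implementation.
In a fourth aspect, a model training apparatus is provided. The apparatus includes multiple functional modules that interact to implement the method of the second aspect and its implementations. The functional modules may be implemented in software, hardware, or a combination of the two, and may be combined or divided in any manner depending on the specific implementation.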
第五方面,提供了一种模型训练装置,包括:处理器和存储器;
所述存储器,用于存储计算机程序,所述计算机程序包括程序指令;
所述处理器,用于调用所述计算机程序,实现如第一方面任一所述的模型训练方法;或者,实现如第二方面任一所述的模型训练方法。
第六方面,提供了一种计算机存储介质,所述计算机存储介质上存储有指令,当所述指令被处理器执行时,实现如第一方面任一所述的模型训练方法;或者,实现如第二方面任一所述的模型训练方法。
第七方面,提供了一种芯片,芯片包括可编程逻辑电路和/或程序指令,当芯片运行时,实现如第一方面任一所述的模型训练方法;或者,实现如第二方面任一所述的模型训练方法。
第八方面,提供了一种计算机程序产品,所述计算机程序产品中存储有指令,当所述指令在计算机上运行时,使得所述计算机执行如第一方面任一所述的模型训练方法;或者,使得所述计算机执行如第二方面任一所述的模型训练方法。
本申请实施例提供的技术方案带来的有益效果是:
本申请实施例提供的模型训练方法,局点分析设备接收第一分析设备发送的机器学习模 型,并可以基于从该局点分析设备所对应的局点网络获取的第一训练样本集合,对机器学习模型进行增量训练。一方面,第一训练样本集合中的特征数据是从该局点分析设备所对应的局点网络获取的特征数据,其更适配于局点分析设备的应用场景,采用包括局点分析设备从对应的局点网络获取的特征数据的第一训练样本集合进行模型训练,可以使得训练后的机器学习模型更适配于该局点分析设备自身的需求,实现模型的定制化,提高模型的应用灵活性;另一方面,通过离线训练和增量训练结合的方式来训练机器学习模型,可以在局点分析设备所获取的特征数据的类别或模式出现变化时,进行机器学习模型的增量训练,实现机器学习模型的灵活调整,从而保证训练得到的机器学习模型符合局点分析设备的需求。因此,本申请实施例提供的模型训练方法,相较于相关技术,能够有效适配于局点分析设备的需求。
进一步的,第一分析设备可以将训练得到的机器学习模型分发至每个局点分析设备中,由各个局点分析设备进行增量训练,保证各个局点分析设备上的机器学习模型的性能。如此,第一分析设备无需为每个局点分析设备都训练对应的机器学习模型,有效减少了第一分析设备的整体训练时长,且离线训练得到的模型在各个局点分析设备均可以作为增量训练的基础,提高了离线训练得到的模型的通用性,从而实现模型泛化,降低了第一分析设备的整体训练成本。
精简后的机器学习模型结构更为简单,减少了树的分支层数,防止树层数过深,虽然模型架构产生变化,但不影响其预测结果,可以节约存储空间,提高预测效率。并且通过精简过程可以防止模型的过拟合。进一步的,该精简后的模型若仅用于样本分析,可以不在其节点信息中记录历史分裂信息,这样可以进一步减少模型本身的大小,提高模型预测效率。
而本申请实施例中,在增量训练或离线训练过程中,基于训练样本集合的数值分布范围进行节点分裂,无需大量访问历史训练样本,因此,有效减少了内存和计算资源的占用,降低训练代价。并且通过前述节点信息携带各个节点的相关信息,可以实现机器学习模型的轻量化,更便于机器学习模型的部署,实现模型的有效泛化。
附图说明
图1是本申请实施例提供的模型训练方法所涉及的一种应用场景示意图;
图2是本申请实施例提供的模型训练方法所涉及的另一种应用场景示意图;
图3是本申请实施例提供的模型训练方法所涉及的又一种应用场景示意图;
图4是本申请实施例提供的一种模型训练方法的流程示意图;
图5是本申请实施例提供的一种基于对分类结果的评估结果,控制局点分析设备对机器学习模型进行增量训练的方法流程图;
图6是本申请实施例提供的一种树结构的示意图;
图7是本申请实施例提供的一种树模型的分裂原理示意图;
图8是本申请实施例提供的另一种树模型的分裂原理示意图;
图9是本申请实施例提供的又一种树模型的分裂原理示意图;
图10是本申请实施例提供的再一种树模型的分裂原理示意图;
图11是本申请另一实施例提供的一种树模型的分裂原理示意图;
图12是本申请另一实施例提供的另一种树模型的分裂原理示意图;
图13是本申请另一实施例提供的又一种树模型的分裂原理示意图;
图14是本申请另一实施例提供的再一种树模型的分裂原理示意图;
图15是本申请又一实施例提供的一种树模型的分裂原理示意图;
图16是本申请又一实施例提供的另一种树模型的分裂原理示意图;
图17是本申请又一实施例提供的又一种树模型的分裂原理示意图;
图18是一种传统的机器学习模型的增量训练效果示意图;
图19是本申请实施例提供的机器学习模型的增量训练效果示意图;
图20是本申请实施例提供的一种模型训练装置的结构示意图;
图21是本申请实施例提供的另一种模型训练装置的结构示意图;
图22是本申请实施例提供的又一种模型训练装置的结构示意图;
图23是本申请实施例提供的再一种模型训练装置的结构示意图;
图24是本申请另一实施例提供的一种模型训练装置的结构示意图;
图25是本申请另一实施例提供的另一种模型训练装置的结构示意图;
图26是本申请另一实施例提供的又一种模型训练装置的结构示意图;
图27是本申请实施例提供的一种分析设备的框图。
具体实施方式
为使本申请的目的、技术方案和优点更加清楚,下面将结合附图对本申请实施方式作进一步地详细描述。
为了便于读者理解,本申请实施例对提供的模型训练方法所涉及的机器学习算法进行简单介绍。
机器学习算法作为AI领域的一个重要分支,在众多领域得到了广泛的应用。从学习方法的角度,机器学习算法可以分为监督式学习算法、非监督式学习算法、半监督式学习算法、强化学习算法几大类。监督式学习算法,是指可以基于训练数据学习一个算法或建立一个模式,并以此算法或模式推测新的实例。训练数据,也称训练样本,是由输入数据和预期输出组成。机器学习算法的模型,也称机器学习模型,其预期输出,称为标签,其可以是一个预测的分类结果(称作分类标签)。非监督式学习算法与监督式学习算法的区别在于,非监督式学习算法的训练样本没有给定标签,机器学习算法模型通过分析训练样本,从而得到一定的成果。半监督学习算法,其训练样本一部分带有标签,另一部分没有标签,而无标签的数据远远多于有标签的数据。强化学习算法通过不断在环境中尝试,以取得最大化的预期利益,通过环境给予的奖励或惩罚,产生能获得最大利益的选择。
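It should be noted that each training sample includes feature data of one or more dimensions, that is, feature data of one or more features. For example, in a scenario where classification results are predicted for key performance indicator (KPI) data, the feature data may be KPI feature data. KPI feature data is feature data generated from KPI data. It may be feature data of a KPI time series, that is, data obtained by extracting features of the KPI time series, or it may be the KPI data itself. The KPI may be a network KPI, and network KPIs may include KPIs of various categories such as central processing unit (CPU) utilization, optical power, network traffic, packet loss rate, latency, and/or number of user accesses. When the KPI feature data is feature data of a KPI time series, it may be feature data extracted from the time series of KPI data of any of the foregoing KPI categories. For example, a training sample includes network KPI feature data of two features: the maximum value and the weighted average of the corresponding network KPI time series. When the KPI feature data is KPI data, it may be KPI data of any of the foregoing KPI categories. For example, a training sample includes network KPI feature data of three features: CPU utilization, packet loss rate, and latency. Further, in a scenario that uses a supervised or semi-supervised learning algorithm, the training sample may also include a label. For example, in the foregoing scenario of predicting classification results for KPI data, if the classification result indicates whether a data sequence is abnormal, a training sample also includes the label "abnormal" or "normal".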
需要说明的是,前述时间序列是一种特殊的数据序列,其为按照时序排列的一组数据的集合,该时序通常为数据产生的先后顺序,时间序列中的数据也称为数据点。通常一个时间序列中各个数据点的时间间隔为一恒定值,因此时间序列可以作为离散时间数据进行分析处理。
Current machine learning algorithms are trained in either an offline learning mode or an online learning mode.
在离线学习(也称离线训练)方式中,需要将训练样本集合中的样本批量输入机器学习模型来进行模型训练,训练所需数据量较大。离线学习,通常是用来训练大的或者复杂的模型,因此训练的过程往往比较耗时,处理数据量大。
在在线学习(也称在线训练)方式中,需要小批量或逐个采用训练样本集合中的样本来进行模型训练,训练所需数据量较小。在线学习往往应用于对即时性要求比较高的场景,增量学习(也称增量训练)方式是一种特殊的在线学习方式,不仅要求模型具备即时的对新模式的学习能力,更要求模型具备抗遗忘能力,也就是要求模型既能记住历史学习过的模式,又能对新的模式进行学习。
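In practical machine learning tasks, representative samples need to be selected to form a sample set for building the machine learning model. Typically, samples that are strongly correlated with the category are selected from labeled sample data to form this set. A label identifies sample data, for example its category. In the embodiments of this application, all data used to train the machine learning model is sample data. In the following, training data is referred to as training samples, a set of training data is referred to as a training sample set, and in some places sample data is simply called samples.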
图1是本申请实施例提供的模型训练方法所涉及的一种应用场景示意图。如图1所示,该应用场景中包括多个分析设备,该多个分析设备包括分析设备101和多个分析设备102。每个分析设备用于执行数据挖掘和/或数据建模等一系列数据分析过程。图1中分析设备101和分析设备102的数量仅用作示意,不作为对本申请实施例提供的模型训练方法所涉及的应用场景的限制。
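The analysis device 101 may be a cloud analysis device (also called a cloud analysis platform). It may be a computer, a server, a server cluster composed of several servers, or a cloud computing service center, and it is deployed at the back end of the service network. The analysis device 102 may be a local analysis device (also called a local analysis platform). It may be a server, a server cluster composed of several servers, or a cloud computing service center. In this application scenario, the model training system involved in the model training method includes multiple local networks. A local network may be a core network or an edge network, and the user of each local network may be an operator or an enterprise customer. The multiple analysis devices 102 may correspond one-to-one to the multiple local networks. Each analysis device 102 provides data analysis services for its corresponding local network and may be located inside or outside that local network. Each analysis device 102 is connected to the analysis device 101 through a wired or wireless network. The communication network involved in the embodiments of this application is a second-generation (2nd Generation, 2G) communication network, a third-generation (3rd Generation, 3G) communication network, a Long Term Evolution (LTE) communication network, a fifth-generation (5th Generation, 5G) communication network, or the like.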
除了进行数据分析,分析设备101还用于管理分析设备102的部分或全部业务,收集训练样本集合,为分析设备102提供数据分析服务等,分析设备101基于收集的训练样本集合可以训练得到机器学习模型(该过程即采用前述离线学习方式),之后将该机器学习模型部署在各个局点分析设备中,由局点分析设备进行增量训练(该过程即采用前述在线学习方式)。基于不同的训练样本,可以训练得到不同的机器学习模型,不同的机器学习模型可以实现不同的分类功能。例如,可以实现异常检测、预测、网络安全防护和应用识别或用户体验评估(即评估用户的体验)等功能。
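Further, FIG. 2 is a schematic diagram of another application scenario involved in the model training method provided by the embodiments of this application. On the basis of FIG. 1, this application scenario also includes network devices 103. Each analysis device 102 may manage the network devices 103 in one network (also called a local network), and is connected to the network devices 103 it manages through a wired or wireless network. A network device 103 may be a router, a switch, a base station, or the like. The network device 103 uploads collected data, such as various KPI time series, to the analysis device 102, and the analysis device 102 extracts and uses data from the network device 103, for example to determine the labels of the acquired time series. Optionally, the data uploaded by the network device 103 to the analysis device 102 may also include various log data, device status data, and the like.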
如图3所示,图3是本申请实施例提供的模型训练方法所涉及的另一种应用场景示意图。在图1或图2的基础上该应用场景还包括评估设备104(图3绘制的是以图2为基础的应用场景,但并不对此进行限定)。评估设备104与分析设备102之间通过有线网络或无线网络连接。评估设备104用于评估分析设备102使用机器学习模型进行数据分类的分类结果,并基于评估结果控制局点分析设备进行机器学习模型的增量训练。
在图1至图3所示的场景的基础上,该应用场景还可以包括存储设备,其用于存储网络设备103或分析设备102提供的数据,该存储设备可以为分布式存储设备,分析设备102或分析设备101可以对该存储设备所存储的数据进行读写。这样在应用场景中数据较多的情况下,由存储设备进行数据存储,可以减轻分析设备(如分析设备102或分析设备101)的负载,提高分析设备的数据分析效率。该存储设备可以用于存储已确定标签的数据,可以将这些已确定标签的数据作为样本,以进行模型训练。需要说明的是,当应用场景中数据量较少时,也可以不设置该存储设备。
可选地,该应用场景还包括管理设备,例如网管设备(也称网管平台)或第三方管理设备,该管理设备用于提供配置反馈,以及样本标注反馈,该管理设备通常由运维人员来管理。示例的,该管理设备可以是一台计算机,或者一台服务器,或者由若干台服务器组成的服务器集群,或者是一个云计算服务中心,其可以是运维支撑系统(operations support system,OSS)或其它的与分析设备连接的网络设备。可选地,前述分析设备可以进行各个机器学习模型的特征数据选择和模型更新,并将选择的特征数据以及模型的更新结果反馈给该管理设备,由管理设备来决策是否进行模型的重训练。
进一步的,本申请实施例提供的模型训练方法可以用于异常检测场景中。异常检测是指对不符合预期的模式进行检测。异常检测的数据来源包括应用、进程、操作系统、设备或者网络,例如,该异常检测的对象可以为前述KPI数据序列。本申请实施例提供的模型训练方 法,应用在异常检测场景中时,分析设备102可以为网络分析器,分析设备102维护的机器学习模型为异常检测模型,确定的标签为异常检测标签,该异常检测标签包括两种分类标签,分别为:“正常”和“异常”。
在异常检测场景中,前述机器学习模型可以为基于统计与数据分布的算法(例如N-Sigma算法)的模型、基于距离/密度的算法(例如局部异常因子算法)的模型、树模型(如孤立森林(Isolation forest,Iforest))或基于预测的算法模型(例如差分整合移动平均自回归模型(Autoregressive Integrated Moving Average model,ARIMA))等。
相关技术中,在数据分析系统中,由云端分析设备进行模型的离线训练,然后将离线训练后的模型直接部署在局点分析设备上。但是,训练得到的模型可能无法有效适配于局点分析设备的需求,如预测的性能(如准确率或查全率)需求。一方面,云端分析设备采用的历史训练样本集合中的训练样本通常是预先配置的固定的训练样本,可能与局点分析设备的需求不符合;另一方面,即使训练得到的机器学习模型在刚部署在局点分析设备上时,符合局点分析设备的需求,但是随着时间的推移,由于局点分析设备所获取的特征数据的类别或模式出现变化,导致训练得到的机器学习模型与局点分析设备的需求不再符合。
并且相关技术中,训练得到的机器学习模型只能针对单个的局点分析设备,当云端分析设备服务于多个局点分析设备时,需要为每个局点分析设备分别训练对应的机器学习模型。训练得到的模型通用性较低,无法实现模型泛化,且训练成本较高。
本申请实施例提供一种模型训练方法,后续实施例假设前述分析设备101为第一分析设备,分析设备102为局点分析设备,局点分析设备接收第一分析设备发送的机器学习模型,并可以基于从该局点分析设备所对应的局点网络获取的第一训练样本集合,对机器学习模型进行增量训练。一方面,第一训练样本集合中的特征数据是从该局点分析设备所对应的局点网络获取的特征数据,其更适配于局点分析设备的应用场景,采用包括局点分析设备从对应的局点网络获取的特征数据的第一训练样本集合进行模型训练,可以使得训练后的机器学习模型更适配于该局点分析设备自身的需求(即该局点分析设备所对应的局点网络的需求),实现模型的定制化,提高模型的应用灵活性;另一方面,通过离线训练和增量训练结合的方式来训练机器学习模型,可以在局点分析设备所获取的特征数据的类别或模式出现变化时,进行机器学习模型的增量训练,实现机器学习模型的灵活调整,从而保证训练得到的机器学习模型符合局点分析设备的需求。因此,本申请实施例提供的模型训练方法,相较于相关技术,能够有效适配于局点分析设备的需求。
进一步的,第一分析设备可以将训练得到的机器学习模型分发至每个局点分析设备中,由各个局点分析设备进行增量训练,保证各个局点分析设备上的机器学习模型的性能。如此,第一分析设备无需为每个局点分析设备都训练对应的机器学习模型,有效减少了第一分析设备的整体训练时长,且离线训练得到的模型在各个局点分析设备均可以作为增量训练的基础,提高了离线训练得到的模型的通用性,从而实现模型泛化,降低了第一分析设备的整体训练成本。
本申请实施例提供一种模型训练方法,该方法可以应用于图1至图3任一所示的应用场景,机器学习模型可以用于进行分类结果的预测,例如其可以为二分类模型,为了便于区分,本申请后续实施例中将人工或标签迁移方式确定的分类结果称之为标签,将机器学习模型自身预测得到的结果称之为分类结果,两者实质相同,均用于标识对应样本的类别。该模型训 练方法的应用场景通常包括多个局点分析设备,本申请实施例以一个局点分析设备为例对模型训练方法进行说明,其他局点分析设备的动作可以参考该局点分析设备的动作。如图4所示,该方法包括:
步骤401、第一分析设备基于历史训练样本集合进行离线训练,得到机器学习模型。
第一分析设备可以不断收集训练样本以得到训练样本集合,并基于收集得到的训练样本集合(可以称为历史训练样本集合)进行离线训练,从而得到机器学习模型。示例的,该历史训练样本集合可以是多个局点分析设备发送的训练样本的集合。基于此训练得到的机器学习模型可以适配于该多个局点分析设备的需求,训练得到的模型的通用性较高,从而能够保证模型的泛化。
请参考前述图2和图3,训练样本可以由局点分析设备在网络设备收集并上传的数据中获取,由局点分析设备传输至第一分析设备。训练样本也可以由第一分析设备采用其他方式获取,例如从存储设备存储的数据中获取,本申请实施例对此不做限定。
其中,训练样本可以有多种形式,相应的,第一分析设备可以采用多种方式获取训练样本,本申请实施例以以下两种可选方式为例进行说明:
在第一种可选方式中,前述第一分析设备获取的训练样本可以包括基于时间序列确定的数据。例如,包括基于KPI时间序列确定的数据。通常情况下,历史训练样本集合中的每个训练样本对应一个时间序列,每个训练样本可以包括从对应时间序列中提取一个或多个特征的特征数据。每个训练样本对应的特征与该训练样本的特征数据的个数相同(即特征与特征数据一一对应)。其中,训练样本中的特征指的是对应时间序列所具有的特征,其可以包括数据特征和/或提取特征。
其中,数据特征是时间序列中的数据的自身特征。例如,数据特征包括数据排列周期、数据变化趋势或数据波动等,相应的,该数据特征的特征数据包括:数据排列周期的数据、数据变化趋势数据或数据波动数据等。数据排列周期是指若时间序列中数据周期性排列,该时间序列中数据排列所涉及的周期,例如,数据排列周期的数据包括周期时长(也即两个周期发起的时间间隔)和/或周期个数;数据变化趋势数据用于反映时间序列中数据排列的变化趋势(即数据变化趋势),例如,该数据变化趋势数据包括:持续增长、持续下降、先升后降,先降后升,或者满足正态分布等等;数据波动数据用于反映时间序列中数据的波动状态(即数据波动),例如该数据波动数据包括表征该时间序列的波动曲线的函数,或者,该时间序列的指定值,如最大值、最小值或平均值。
提取特征是提取该时间序列中的数据的过程中的特征。例如,提取特征包括统计特征、拟合特征或频域特征等,相应的,提取特征的特征数据包括统计特征数据、拟合特征数据或频域特征数据等。统计特征是指时间序列所具有的统计学特征,统计特征有数量特征和属性特征之分,其中数量特征又有计量特征和计数特征之分,数量特征可以直接用数值来表示,例如,CPU、内存、IO资源等多种资源的消耗值为计量特征;而出现异常的次数、正常工作的设备个数是计数特征;属性特征不能直接用数值来表示,如设备是否出现异常、设备是否产生宕机等,统计特征中的特征就是统计时需要考察的指标。例如,该统计特征数据包括移动平均值(Moving_average)、加权平均值(Weighted_mv)等;拟合特征是时间序列拟合时的特征,则拟合特征数据用于反映时间序列用于拟合的特征,例如拟合特征数据包括进行拟合时所采用的算法,如ARIMA;频域特征是时间序列在频域上的特征,则频域特征用于反映 时间序列在频域上的特征。例如,频域特征数据包括:时间序列在频域上分布所遵循的规律的数据,如该时间序列中高频分量的占比。可选地,频域特征数据可以通过对时间序列进行小波分解得到。
假设训练样本中的特征数据从第一时间序列获取,则该数据获取过程可以包括:确定需要提取的目标特征,在第一时间序列中提取确定的目标特征的特征数据,得到由获取的目标特征的数据组成的训练样本。示例的,该需要提取的目标特征是基于模型训练方法所涉及的应用场景确定的。在一种可选示例中,该目标特征为预先配置的特征,例如是由用户配置的特征。在另一种可选示例中,该目标特征为指定特征中的一个或多个,例如该指定特征为前述统计特征。
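The extraction step above can be illustrated with a few simple statistical features. This is only a sketch: the window length, the linear weights used for the weighted average, and the function name `extract_features` are assumptions, since the application does not fix how each feature is computed.

```python
from statistics import mean, pvariance
from typing import Dict, List


def extract_features(series: List[float], window: int = 3) -> Dict[str, float]:
    """Build one training sample (without its label) from a KPI time series."""
    recent = series[-window:]               # most recent `window` data points
    weights = range(1, len(recent) + 1)     # assumed: newer points weigh more
    return {
        "moving_average": mean(recent),
        "weighted_mv": sum(w * x for w, x in zip(weights, recent)) / sum(weights),
        "max": max(series),
        "min": min(series),
        "variance": pvariance(series),
    }


sample = extract_features([1.0, 2.0, 3.0, 4.0])
print(sample)
```

The set of keys plays the role of the configured target features; a deployment would restrict it to whichever features the user has specified.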
值得说明的是,用户可以预先设置指定特征,但是对于第一时间序列,其可能无法具有全部指定特征,第一分析设备可以在第一时间序列中筛选属于该指定特征的特征作为目标特征。例如,该目标特征包括统计特征:时间序列分解_周期分量(time series decompose_seasonal,Tsd_seasonal)、移动平均值、加权平均值、时间序列分类、最大值、最小值、分位数、方差、标准差、周期同比(year on year,yoy,指的是与历史同时期比较)、每天波动率、分桶熵、样本熵、滑动平均、指数滑动平均、高斯分布特征或T分布特征等中的一个或多个,相应的,目标特征数据包括该一个或多个统计特征的数据;
和/或,该目标特征包括拟合特征:自回归拟合误差、高斯过程回归拟合误差或神经网络拟合误差中的一个或多个,相应的,目标特征数据包括该一个或多个拟合特征的数据;
和/或,该目标特征包括频域特征:时间序列中高频分量的占比;相应的,目标特征数据包括时间序列中高频分量的占比的数据,该数据可以对时间序列进行小波分解得到。
表1为历史训练样本集合中的一个样本的示意性说明,表1中,该历史训练样本集合中每个训练样本包括一个或多个特征的KPI时间序列的特征数据,每个训练样本对应一个KPI时间序列。表1中,身份标识(identification,ID)为KPI_1的训练样本,包括4个特征的特征数据,该4个特征的特征数据分别为:移动平均值(Moving_average)、加权平均值(Weighted_mv)、时间序列分解_周期分量(time series decompose_seasonal,Tsd_seasonal)和周期yoy。该训练样本对应的KPI时间序列为(x1,x2,……,xn)(该时间序列通常是对一种KPI类别的数据进行采样得到的),对应的标签为“异常”。
Table 1
[Table 1 appears as an image in the original publication (PCTCN2020115770-appb-000003).]
在第二种可选方式中,前述第一分析设备获取的训练样本可以包括本身具有一定特征的数据,其为获取的数据本身。例如,训练样本包括KPI数据。如前所述,假设KPI为网络KPI,则每个样本可以包括一种或多种网络KPI类别的网络KPI数据,也即是样本对应的特征为KPI类别。
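Table 2 illustrates one sample in the historical training sample set. Each training sample in the set includes network KPI data of one or more features, and each training sample corresponds to multiple network KPI data items acquired at the same collection time. In Table 2, the training sample whose identification (ID) is KPI_2 includes feature data of four features: network traffic, CPU utilization, packet loss rate, and latency. The corresponding label is "normal".
Table 2
[Table 2 appears as an image in the original publication (PCTCN2020115770-appb-000004).]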
前述表1和表2中的每个特征对应的特征数据通常为数值数据,也即是每个特征具有特征取值,为了便于说明,表1和表2未示出该特征取值。假设历史训练样本集合按照固定格式存储特征数据,其对应的特征可以为预先设定的特征,则历史训练样本集合的特征数据均可以按照表1或表2的格式存储。本申请实施例在实际实现时,历史训练样本集合中的样本还可以有其他形式,本申请实施例对此不做限定。
值得说明的是,在进行离线训练之前,第一分析设备可以对收集到的训练样本集合中的样本进行预处理,然后再基于预处理后的训练样本集合,进行前述离线训练。该预处理过程用于将收集的样本处理成符合预设条件的样本,该预处理过程可以包括样本去重,数据清洗和数据补全中的一种或多种处理。
步骤401中所述的离线训练过程也称模型学习过程,是机器学习模型进行其相关分类功能的学习过程。在一种可选方式,该离线训练过程,是对初始学习模型进行训练得到机器学习模型的过程;在另一种可选方式中,该离线训练过程是建立机器学习模型的过程,也即是离线训练后的机器学习模型即为初始学习模型,本申请实施例对此不做限定。在完成离线训练之后,第一分析设备还可以对训练得到的机器学习模型执行模型评估过程,以评估该机器学习模型是否满足性能达标条件。当机器学习模型满足性能达标条件时,再执行下述步骤402,当机器学习模型不满足性能达标条件时,可以进行机器学习模型的至少一次重训练,直至机器学习模型满足性能达标条件时,再执行下述步骤402。
在一种示例中,第一分析设备可以基于用户需求设置第一性能达标阈值,并将训练完成的机器学习模型的正向性能参数的参数值与第一性能达标阈值进行比较,当正向性能参数值大于第一性能达标阈值,确定机器学习模型满足性能达标条件;当正向性能参数值不大于第一性能达标阈值,确定机器学习模型不满足性能达标条件。该正向性能参数与机器学习模型的性能的优劣正相关,也即是该正向性能参数的参数值越大,机器学习模型的性能越好。例如,正向性能参数为准确率、查全率、查准率或者f-score(f-分数)等表征模型性能的指标,又例如,第一性能达标阈值为90%。该准确率=预测正确的次数/总预测次数。
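In another example, the first analysis device may set a first performance degradation threshold based on user requirements and compare the value of a negative performance parameter of the trained machine learning model with that threshold. When the value of the negative performance parameter is greater than the first performance degradation threshold, the machine learning model is determined not to meet the performance target condition; when the value is not greater than the first performance degradation threshold, the machine learning model is determined to meet the performance target condition. The negative performance parameter is negatively correlated with the quality of the model's performance: the larger its value, the worse the model performs. For example, the negative performance parameter is the classification error rate (also called the misjudgment rate), and the first performance degradation threshold is 20%. Misjudgment rate = number of wrong predictions / total number of predictions.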
例如,将指定个数的测试样本输入机器学习模型,得到指定个数个分类结果。基于该指定个数个分类结果统计准确率或误判率。前述公式中,总预测次数即为前述指定个数。预测的分类结果的正确或错误可以由运维人员根据专家经验进行判定。
如,指定个数为100次,其中,预测错误20次,则误判率为20/100=20%。若第一性能劣化阈值为10%,则确定机器学习模型不满足性能达标条件。
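The arithmetic in this example can be written out as a short check. The function names are hypothetical; the rule follows the text, in which the model fails the performance target condition when the misjudgment rate exceeds the degradation threshold.

```python
def misjudgment_rate(wrong: int, total: int) -> float:
    """Misjudgment rate = number of wrong predictions / total number of predictions."""
    if total <= 0:
        raise ValueError("total must be positive")
    return wrong / total


def meets_performance_target(wrong: int, total: int, degradation_threshold: float) -> bool:
    """The target is met only when the rate does not exceed the threshold."""
    return misjudgment_rate(wrong, total) <= degradation_threshold


rate = misjudgment_rate(20, 100)                                    # 20 errors in 100 predictions
ok = meets_performance_target(20, 100, degradation_threshold=0.10)  # threshold of 10%
print(rate, ok)
```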
前述重训练过程可以是离线训练过程,也可以是在线训练过程(如增量训练过程)。该重训练过程使用的训练样本与之前的训练过程所使用的训练样本可以相同也可以不同,本申请实施例对此不做限定。
步骤402、第一分析设备向多个局点分析设备发送机器学习模型。
第一分析设备可以采用不同方式向各个局点分析设备提供机器学习模型。本申请实施例以以下两种示例进行说明:在一种可选示例中,第一分析设备可以在接收到局点分析设备发送的模型获取请求后,向局点分析设备发送机器学习模型,该模型获取请求用于向第一分析设备请求获取机器学习模型;在另一种可选示例中,第一分析设备可以在训练得到机器学习模型后,主动向局点分析设备推送该机器学习模型。
示例的,该第一分析设备可以包括模型部署模块,该模型部署模块与各个局点分析设备均建立有通信连接,可以通过该模型部署模块将机器学习模型部署至各个局点分析设备。
步骤403、局点分析设备采用机器学习模型进行分类结果的预测。
如前所述,不同的机器学习模型,可以分别实现不同功能。这些功能均是通过对分类结果的预测来实现的。不同功能所对应的分类结果不同。局点分析设备在接收第一分析设备发送的机器学习模型之后,可以采用机器学习模型进行分类结果的预测。
示例的,若需要对局点分析设备的在线数据进行分类结果的预测,需要进行分类结果预测的数据可以包括CPU的KPI和/或内存(memory)的KPI。
假设需要对局点分析设备的在线数据进行异常检测,也即预测得到的分类结果指示数据是否异常。则局点分析设备可以周期性执行异常检测过程,在对在线数据进行异常检测后,机器学习模型输出的异常检测结果如表3和表4所示,表3和表4记录了不同采集时刻(也称数据产生时刻)所获取的待检测数据的异常检测结果,其中该不同采集时刻包括T1至TN(N为大于1的整数),异常检测结果指示对应待检测数据是否异常。其中,表3和表4中的待检测数据均包括一维特征数据,表3记录了特征类别为CPU的KPI的待检测数据的异常检测结果;表4记录了特征类别为内存的KPI的待检测数据的异常检测结果。假设0代表正常,1代表异常。T1至TN每两个采集时刻间隔的时长为预设的时间周期。则以采集时刻T1为例,在该时刻,表3中CPU的KPI为0,表4的内存的KPI为1,代表在采集时刻T1采集的CPU的KPI正常,在采集时刻T1采集的内存的KPI异常。
Table 3
Collection time    CPU KPI
T1                 0
T2                 0
...                ...
TN                 0
Table 4
Collection time    Memory KPI
T1                 1
T2                 0
...                ...
TN                 1
步骤404、局点分析设备基于第一训练样本集合,对机器学习模型进行增量训练。
局点分析设备获取第一训练样本集合的时机有多种情况,例如局点分析设备可以周期性获取第一训练样本集合;又例如,局点分析设备在接收到局点分析设备的运维人员发送的样本集合获取指令,或者接收到第一分析设备或者前述管理设备发送的样本集合获取指令后,获取第一训练样本集合,该样本集合获取指令用于指示获取第一训练样本集合;再例如,当局点分析设备在机器学习模型发生劣化时,获取第一训练样本集合。
通常情况下,局点分析设备在机器学习模型发生劣化时,才进行机器学习模型的增量训练,这样可以减少训练时长,避免影响用户业务。该增量训练的触发机制(即检测模型是否发生劣化的检测机制)可以包括以下两种情况:
在第一种情况下,如图3所示,该模型训练方法的应用场景还包括评估设备,评估设备可以基于对分类结果的评估结果,控制局点分析设备对机器学习模型进行增量训练。如图5,该过程包括:
步骤4041、局点分析设备将预测信息发送至评估设备。
在一种示例中,局点分析设备可以在每次采用机器学习模型进行分类结果的预测之后,向评估设备发送预测信息,该预测信息包括预测得到的分类结果;在另一种示例中,局点分析设备还可以周期性地向评估设备发送预测信息,该预测信息包括当前周期获取的分类结果;在又一种示例中,局点分析设备还可以在获取的分类结果数量达到数量阈值后,向评估设备发送预测信息,该预测信息包括获取的分类结果;在再一种示例中,局点分析设备还可以在设定的时间段内向评估设备发送预测信息,该预测信息包括当前获取的分类结果,例如该时间段可以是用户设定的时间段,或者用户业务发生频率低于指定频率阈值的时段,如0:00-5:00。如此可以避免干扰用户业务。
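It should be noted that, in different application scenarios, the prediction information may carry other information so that the evaluation device can evaluate each classification result effectively and accurately.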
示例的,在KPI数据序列进行分类结果预测的场景中,机器学习模型用于对一个或多个KPI特征数据组成的待预测数据进行分类结果的预测。其中,KPI特征数据为KPI时间序列的特征数据,或者为KPI数据。相应的,该预测信息还包括:待预测数据所属的设备(也即是产生该待预测数据所对应的KPI数据的设备,例如网络设备)的标识、待预测数据对应的KPI类别和待预测数据对应的KPI数据的采集时刻。基于这些信息,可以确定每个分类结果所对应的设备、KPI类别、KPI数据的采集时刻,从而准确地判定不同的采集时刻所采集的KPI数据是否出现异常。
其中,当KPI特征数据为KPI时间序列的特征数据时,该待预测数据对应的KPI数据为该KPI时间序列中的数据,则待预测数据对应的KPI类别为该KPI时间序列的类别,采集时 刻为该KPI时间序列中的任一数据的采集时刻,也可以为指定位置的数据的采集时刻,例如最后一个数据的采集时刻。例如,假设KPI时间序列的KPI类别为丢包率,该时间序列:(x1,x2,……,xn),表示在一个采集周期中采集到丢包率分别为x1,x2,……,xn,假设待预测数据假设结构与表1类似,该数据为(1,2,3,4),表示移动平均值为1,加权平均值为2、时间序列分解_周期分量为3,周期yoy为4,假设待预测数据对应的KPI数据的采集时刻为该KPI时间序列中的最后一个数据的采集时刻。则待预测数据对应的KPI类别为丢包率,待预测数据对应的KPI数据的采集时刻为xn的采集时刻。
当KPI特征数据为KPI数据,则待预测数据对应的KPI类别为KPI数据的KPI类别,该待预测数据对应的KPI数据为KPI数据本身,则采集时刻即为该KPI数据的采集时刻。例如,假设待预测数据假设结构与表2类似,该待预测数据为(100,20%,3,4),表示网络流量为100,CPU利用率为20%、丢包率为3,时延为4。则待预测数据对应的KPI类别为网络流量、CPU利用率、丢包率和时延,待预测数据对应的KPI数据的采集时刻为(100,20%,3,4)的采集时刻,该采集时刻通常为同一采集时刻。
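After receiving the prediction information, the evaluation device may present at least the classification result and the data to be predicted, or may present the entire prediction information, so that operations and maintenance personnel can judge from expert experience whether each classification result is right or wrong and mark it accordingly.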
步骤4042、评估设备基于预测信息评估机器学习模型是否发生劣化。
示例的,评估设备可以在接收到的分类结果达到指定数量阈值时,基于预测信息评估机器学习模型是否发生劣化;也可以周期性基于预测信息评估机器学习模型是否发生劣化。相应的,评估周期可以为一周或一个月等。本申请实施例中该评估过程与前述步骤401中的模型评估过程原理类似。
在一种示例中,评估设备可以基于用户需求设置第二性能达标阈值,并将训练完成的机器学习模型的正向性能参数的参数值与第二性能达标阈值进行比较,当正向性能参数值大于第二性能达标阈值,确定机器学习模型未发生劣化;当正向性能参数值不大于第二性能达标阈值,确定机器学习模型发生劣化。该正向性能参数与机器学习模型的性能的优劣正相关,也即是该正向性能参数的参数值越大,机器学习模型的性能越好。例如,正向性能参数为准确率、查全率、查准率或者f-score等表征模型性能的指标,第二性能达标阈值为90%。该第二性能达标阈值和前述第一性能达标阈值可以相同也可以不同。该正确率的计算方式可以参考前述401中的模型评估过程所提供的正确率的计算方式。
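In another example, the evaluation device may set a second performance degradation threshold based on user requirements and compare the value of a negative performance parameter of the trained machine learning model with that threshold. When the value of the negative performance parameter is greater than the second performance degradation threshold, the machine learning model is determined to have degraded; when the value is not greater than the second performance degradation threshold, the machine learning model is determined not to have degraded. The negative performance parameter is negatively correlated with the quality of the model's performance: the larger its value, the worse the model performs. For example, the negative performance parameter is the classification error rate (also called the misjudgment rate), and the second performance degradation threshold is 20%. The second performance degradation threshold may be the same as or different from the first performance degradation threshold. For how the misjudgment rate is calculated, refer to the model evaluation process in step 401.
For example, the evaluation device obtains multiple classification results and computes the accuracy or misjudgment rate from them. The total number of predictions used in the accuracy or misjudgment rate is the number of classification results obtained. As described above, whether a predicted classification result is right or wrong may be determined by the operations and maintenance personnel of the evaluation device.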
在异常检测场景中,误判率还可以采用其他方式获取。在该场景中,局点分析设备还与管理设备建立有通信连接,局点分析设备的机器学习模型输出的分类结果为“异常”时,会向管理设备发送告警信息,该告警信息用于指示样本数据发生异常,其携带了分类结果为“异常”的样本数据。管理设备会对告警信息中的样本数据以及分类结果进行识别,若分类结果有误,会进行分类结果更新(即将分类结果:“异常”更新为“正常”),则说明此次告警信息为一个虚报的告警信息。虚报的告警信息的个数即为预测的分类结果的错误次数。局点分析设备或管理设备可以向评估设备反馈每个评估周期的虚报的告警信息的个数,或者,将虚报的告警信息上报给评估设备,由评估设备统计虚报的告警信息的个数。再基于评估设备统计的评估周期内获取的分类结果的个数,采用前述误判率的计算公式计算得到误判率。
步骤4043、评估设备在确定机器学习模型发生劣化后,向局点分析设备发送训练指令。
该训练指令用于指示对机器学习模型进行训练。
可选地,当评估设备在确定机器学习模型未发生劣化后,不进行动作。
步骤4044、局点分析设备在接收到评估设备发送的训练指令后,基于第一训练样本集合,对机器学习模型进行增量训练。
在第二种情况下,局点分析设备自身可以基于预测信息评估机器学习模型是否发生劣化,并在机器学习模型发生劣化后,基于第一训练样本集合,对机器学习模型进行增量训练。该评估过程可以参考前述步骤4042。
值得说明的是,局点分析设备也可以采用其他触发机制来进行增量训练,例如,在满足以下至少一条触发条件时,进行增量训练:达到增量训练周期,或者接收到局点分析设备的运维人员发送的训练指令,或者接收到第一分析设备发送的训练指令。该训练指令用于指示进行增量训练。
在本申请实施例中,第一训练样本集合可以包括局点分析设备从自身获取的数据中直接基于设定的规则提取,并确定标签的样本数据。例如,该第一训练样本集合可以包括局点分析设备从网络设备获取的时间序列的数据,或者时间序列的特征数据。并且,第一训练样本集合的标签可以由局点分析设备或前述管理设备或前述第一分析设备呈现给运维人员,由运维人员根据专家经验进行标签的标注。
参考前述步骤401,训练样本可以有多种形式,相应的,局点分析设备可以采用多种方式获取训练样本,本申请实施例以以下两种可选方式为例进行说明:
在第一种可选方式中,局点分析设备获取的第一训练样本集合中的训练样本可以包括基于时间序列确定的数据。例如,包括基于KPI时间序列确定的数据。参考前述历史训练样本集合的结构,通常情况下,第一训练样本集合中的每个训练样本对应一个时间序列,每个训练样本可以包括从对应时间序列中提取一个或多个特征的特征数据。每个训练样本对应的特征与该训练样本的特征数据的个数相同(即特征与特征数据一一对应)。其中,训练样本中的特征指的是对应时间序列所具有的特征,其可以包括数据特征和/或提取特征。
在一种可选示例中,局点分析设备可以接收在对应的局点网络中该局点分析设备所连接的网络设备(即其管理的网络设备)发送的时间序列;在另一种可选示例中,局点分析设备具有输入输出(I/O)接口,通过该I/O接口接收对应局点网络中的时间序列;在又一种可选示例中,局点分析设备可以从其对应的存储设备中读取时间序列,该存储设备用于存储局点 分析设备在对应的局点网络中预先获取的时间序列。
假设训练样本中的特征数据从第二时间序列获取,则该数据获取过程可以参考前述从第一时间序列获取历史训练样本集合中的训练样本的过程,例如,确定需要提取的目标特征,在第二时间序列中提取确定的目标特征的特征数据,得到由获取的目标特征的数据组成的第一训练样本。本申请实施例对此不再赘述。
在第二种可选方式中,前述局点分析设备获取的训练样本可以包括具有一定特征的数据,其为获取的数据本身。例如,训练样本包括KPI数据。如前所述,假设KPI为网络KPI,则每个样本可以包括一种或多种网络KPI类别的网络KPI数据,也即是样本对应的特征为KPI类别。
局点分析设备获取训练样本的过程可以参考前述步骤401中第一分析设备获取训练样本的过程,获取的第一训练样本集合中训练样本的结构也可以参考前述历史训练样本集合中训练样本的结构,本申请实施例对此不再赘述。
通常情况下,在对机器学习模型进行增量训练时,采集时刻离当前时间越近的样本数据对机器学习模型的影响越大。则如果对机器学习模型进行增量训练所采用的第一训练样本集合中的样本数据质量较差,最终训练得到机器学习模型就可能覆盖之前训练得到的性能较优的机器学习模型,导致机器学习模型的性能偏差。
因此,局点分析设备可以从自身获取的样本数据中进行一定的筛选,以选择质量较好的样本数据作为训练样本,将这些训练样本提供给运维人员进行标签的标注,得到具有标签的样本数据。从而提高训练后的机器学习模型的性能。本申请将该筛选功能称之为主动学习功能。
机器学习模型是基于概率论来进行分类结果预测的,也即是,预测多种分类结果存在的概率,将概率最高的分类结果作为最终的分类结果。例如,基于二分类原理的机器学习模型就是选择概率较大的分类结果(如0或者1)作为最终的分类结果输出。其中,二分类指的是机器学习模型的分类结果有两种。
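Take anomaly detection on the CPU KPI of online data as an example (that is, the type of sample data input into the machine learning model is the CPU KPI). Table 5 records the probabilities of the different classification results for the CPU KPI acquired at different collection times. The collection times run from T1 to TN (N is an integer greater than 1), and the classification results are "normal" and "abnormal". 0_prob is the predicted probability of normal and 1_prob is the predicted probability of abnormal. For collection time T1, 0_prob is 0.51 and 1_prob is 0.49. Because 0_prob is greater than 1_prob, the machine learning model determines that the final classification result of the CPU KPI collected at T1 is 0, that is, the CPU KPI collected at T1 is normal.
Table 5
Collection time    1_prob    0_prob    Classification result
T1                 0.49      0.51      0
T2                 0.9       0.1       1
...                ...       ...       ...
TN                 0.51      0.49      1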
但是,若预测得到的多种分类结果存在的概率中,取值最靠前的两种概率较为接近时,机器学习模型最终确定的分类结果虽然是概率最高的分类结果,但是与概率次高的分类结果的差距很小,这样导致机器学习模型最终确定的分类结果的可靠性较差。而本申请实施例在实际应用时,预测得到的多种分类结果存在的概率相互差距较大时,机器学习模型最终确定的分类结果的可靠性更高。
仍然以表5为例,机器学习模型预测在采集时刻T1获取的CPU的KPI为0的概率和为1的概率仅相差0.02,两者非常接近,可以说明机器学习模型对于采集时刻T1所获取的样本数据的预测结果是不可靠的。机器学习模型预测在采集时刻T2获取的CPU的KPI为0的概率和为1的概率相差较大,可以说明机器学习模型对于采集时刻T2所获取的样本数据的预测结果是可靠的。
由上可知,当一个样本,其由机器学习模型预测得到的多种分类结果存在的概率相互之间的差距较大,也即是概率的区分度很大时,机器学习模型已经能够确定准确的分类结果,这种样本无需再进行训练;而区分度较小时,机器学习模型无法确定准确的分类结果,这种样本可以以人工或者标签迁移的方式确定其标签,从而给定准确的分类结果(也可以认为是理想的分类结果),将确定了标签的样本作为训练样本进行训练,可以提高机器学习模型对这种样本的分类结果的可靠性。
示例的,本申请实施例采用低区分度条件来筛选第一训练样本集合,也即是该第一训练样本集合包括在局点分析设备获取的样本中筛选得到的满足低区分度条件的样本,该低区分度条件包括以下至少一者:
条件1,采用机器学习模型预测样本得到的目标概率集合中任意两个概率的差值的绝对值小于第一差值阈值,目标概率集合包括按照概率的大小降序排列的前n个分类结果的概率,1<n<m,m为机器学习模型预测样本得到的概率总数。在这种条件下可以筛选出存在n个分类结果的区分度不足的样本。
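Condition 2: the absolute value of the difference between any two of the probabilities obtained by predicting the sample with the machine learning model is less than a second difference threshold. This condition selects samples whose classification results are insufficiently distinguishable.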
条件3,采用机器学习模型预测样本的多种分类结果的概率中最高概率和最低概率的差值的绝对值小于第三差值阈值。在这种条件下可以筛选多种分类结果均区分度不足的样本。
条件4,采用机器学习模型预测样本的多种分类结果的概率中任意两个概率的差值的绝对值小于第四差值阈值。在这种条件下可以筛选多种分类结果均区分度不足的样本。
条件5,采用机器学习模型预测样本的多种分类结果的概率分布熵E大于指定分布熵阈值,所述E满足:
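$$E = -\sum_{i} P(x_i)\,\log_b P(x_i) \qquad \text{(Formula 1)}$$
where $x_i$ denotes the i-th classification result, $P(x_i)$ denotes the predicted probability that the sample has the i-th classification result, $b$ is a specified base, for example 2 or the constant $e$, $0 \le P(x_i) \le 1$, and $\sum$ denotes summation.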
假设第一样本为局点分析设备获取的任一待预测的样本,则对于前述条件1,可以先采用机器学习模型对第一样本进行预测得到多种分类结果的概率,该概率的取值范围为0至1,将该多种分类结果的概率按照概率的大小降序排序,在排序后的概率中筛选前n个分类结果 的概率得到目标概率集合,计算该目标概率集合中每两个概率差值的绝对值,并将计算得到的差值的绝对值与第一差值阈值进行比较,当任意两个概率的差值的绝对值小于第一差值阈值时,将该第一样本确定为满足低区分度条件的样本。
例如,n=2,第一差值阈值为0.3,采用机器学习模型预测样本X得到3个分类结果(即m=3)的概率,分别为0.32,0.33和0.35,则目标概率集合包括:0.33和0.35,两者的差值的绝对值小于第一差值阈值,则样本X为满足低区分度条件的样本。
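For condition 2, the machine learning model may first be used to predict the first sample and obtain the probabilities of multiple classification results, each in the range 0 to 1. The absolute value of the difference between every two probabilities is then calculated and compared with the second difference threshold. When the absolute value of the difference between any two probabilities is less than the second difference threshold, the first sample is determined to satisfy the low-discrimination condition.
For example, in a binary classification scenario, if the absolute value of the difference between the first probability and the second probability corresponding to the first sample is less than the second difference threshold, the first sample satisfies the low-discrimination condition. The first probability is the probability, predicted by the machine learning model, that the classification result of the first sample is the first classification result, and the second probability is the probability, predicted by the machine learning model, that the classification result of the first sample is the second classification result. Referring again to Table 5, assume the first sample is the CPU KPI acquired at collection time TN and the second difference threshold is 0.1. The machine learning model predicts probabilities of 0.49 and 0.51 that the CPU KPI acquired at TN is 0 and 1 respectively, so the first and second probabilities are each one of 0.51 and 0.49. They differ by only 0.02, and the absolute value of the difference is less than 0.1, so the first sample satisfies the low-discrimination condition.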
对于前述条件3,可以先采用机器学习模型对第一样本进行预测得到多种分类结果的概率,将该多种分类结果的概率中筛选得到最高的概率和最低的概率,计算两个概率差值的绝对值,并将计算得到的差值的绝对值与第三差值阈值进行比较,当该差值的绝对值小于第三差值阈值时,将该第一样本确定为满足低区分度条件的样本。
例如,第三差值阈值为0.2,采用机器学习模型预测样本Y得到3个分类结果的概率,分别为0.33,0.33和0.34,则最大概率和最小概率分别为:0.34和0.33,两者的差值的绝对值小于第三差值阈值,则样本Y为满足低区分度条件的样本。
对于前述条件4,可以先采用机器学习模型对第一样本进行预测得到多种分类结果的概率,计算多种分类结果的概率中每两个概率差值的绝对值,并将计算得到的差值的绝对值与第四差值阈值进行比较,当任意两个概率的差值的绝对值小于第四差值阈值时,将该第一样本确定为满足低区分度条件的样本。
例如,第四差值阈值为0.2,采用机器学习模型预测样本Z得到3个分类结果的概率,分别为0.33,0.33和0.34,任意两个概率的差值的绝对值均小于0.2,则样本Z为满足低区分度条件的样本。
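The threshold checks in conditions 1 to 4 can be sketched as follows. The sketch reads "any two probabilities" as "every pair of probabilities", so the pairwise test reduces to comparing the largest and smallest value; that reading and the function names are assumptions.

```python
from typing import List


def top_n_close(probs: List[float], n: int, threshold: float) -> bool:
    """Condition 1: the n largest probabilities all lie within `threshold` of each other."""
    top = sorted(probs, reverse=True)[:n]
    return max(top) - min(top) < threshold


def all_close(probs: List[float], threshold: float) -> bool:
    """Conditions 2 to 4: every pair of probabilities differs by less than `threshold`."""
    return max(probs) - min(probs) < threshold


print(top_n_close([0.32, 0.33, 0.35], n=2, threshold=0.3))  # sample X
print(all_close([0.33, 0.33, 0.34], threshold=0.2))         # samples Y and Z
print(all_close([0.9, 0.1], threshold=0.1))                 # collection time T2 in Table 5
```

Samples for which a check returns true are the ones passed to operations and maintenance personnel for labeling.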
对于前述条件5,概率分布是对随机变量的刻画,不同的随机变量有着相同或不同的概率分布,概率分布熵,就是对不同概率分布的刻画,在本申请实施例中,概率分布熵与概率的不确定性正相关,概率分布熵越大,概率的不确定性越大。例如,二分类的机器学习模型预测样本的两个分类结果的概率均为50%,概率分布熵取最大值,但最终无法选择实际的概率可靠的分类结果作为最终分类结果。
由此可知,当概率分布熵E达到一定程度,如指定分布熵阈值,则无法实现有效的概率区分,因此采用该公式可以有效筛选低区分度的概率。
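When the machine learning model is a binary classification model, Formula 1 can be written as:
$$E = -\bigl[P(x_1)\log_b P(x_1) + P(x_2)\log_b P(x_2)\bigr] \qquad \text{(Formula 2)}$$
where $x_1$ denotes the first classification result and $x_2$ denotes the second classification result. For example, in the anomaly detection scenario, $x_1$ denotes the classification result "normal" and $x_2$ denotes the classification result "abnormal". For the meaning of the other parameters, refer to Formula 1.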
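Condition 5 can be sketched as below, taking the distribution entropy as the negative sum of P(x_i) log_b P(x_i). The function names and the 0.9 threshold are illustrative assumptions; terms with zero probability are skipped, following the usual convention that they contribute nothing.

```python
import math
from typing import List


def distribution_entropy(probs: List[float], base: float = 2.0) -> float:
    """E = -sum(P(x_i) * log_b P(x_i)); zero-probability terms are skipped."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)


def is_low_discrimination(probs: List[float], entropy_threshold: float,
                          base: float = 2.0) -> bool:
    """Condition 5: the entropy exceeds the specified threshold."""
    return distribution_entropy(probs, base) > entropy_threshold


even = distribution_entropy([0.5, 0.5])       # maximum for two classes
confident = distribution_entropy([0.9, 0.1])  # well-separated probabilities
print(even, confident)
print(is_low_discrimination([0.49, 0.51], entropy_threshold=0.9))
```

With base 2, a two-class distribution has entropy between 0 and 1, so the threshold can be chosen on that scale.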
步骤405、当增量训练后的机器学习模型的性能不满足性能达标条件时,局点分析设备触发第一分析设备对机器学习模型进行重训练。
增量训练后的机器学习模型可能由于训练样本质量较差,或者其他原因,导致性能较差,在这种情况下,仍然需要第一分析设备进行机器学习模型的重训练。通常情况下,第一分析设备是支持离线训练的分析设备,其所获取的训练样本集合的数据量远远大于局点分析设备的第一训练样本集合的数据量,第一分析设备可以进行训练的时长也远远大于局点分析设备的允许训练时长,第一分析设备运算性能也大于局点分析设备的运算性能。因此,当增量训练后的机器学习模型的性能不满足性能达标条件时,由第一分析设备对机器学习模型进行重训练可以训练得到性能较优的机器学习模型。
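The evaluation of whether the incrementally trained machine learning model meets the performance target condition may be performed by the first analysis device; for this process, refer to the evaluation in step 401 of whether the machine learning model meets the performance target condition. A model that meets the performance target condition has not degraded, and a model that does not meet it has degraded. The evaluation may also be performed by the evaluation device or the local analysis device; for this process, refer to the detection in step 404 of whether the machine learning model has degraded. When a device other than the local analysis device (such as the first analysis device or the evaluation device) performs the evaluation, that device needs to send the evaluation result to the local analysis device after completing it, so that the local analysis device can determine whether the incrementally trained model meets the performance target condition. Details are not repeated here.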
示例的,局点分析设备触发第一分析设备对机器学习模型进行重训练的过程可以包括:局点分析设备向第一分析设备发送重训练请求,该重训练请求用于请求第一分析设备对机器学习模型进行重训练;第一分析设备接收到该重训练请求后,基于重训练请求,对机器学习模型进行重训练。在这种情况下,局点分析设备也可以发送从对应局点网络中获取的训练样本集合,以供第一分析设备基于训练样本集合对机器学习模型进行重训练。该训练样本集合可以携带在前述重训练请求中,也可以通过独立的信息发送至第一分析设备,本申请实施例对此不做限定。相应的,第一分析设备的重训练过程可以包括以下两种可选方式:
在第一种可选方式中,第一分析设备可以在接收到局点分析设备发送的重训练请求后,基于发送重训练请求的局点分析设备所发送的训练样本集合,对机器学习模型进行重训练。
其中,局点分析设备所发送的训练样本集合是从局点分析设备对应局点网络中获取的训练样本集合,其至少包括前述第一训练样本集合。这样,采用包括局点分析设备获取的特征数据的训练样本集合进行重训练,可以使得训练后的机器学习模型更适配于该局点分析设备自身的需求,实现模型的定制化,提高模型的应用灵活性。
在第二种可选方式中,第一分析设备接收局点分析设备发送的重训练请求,基于发送重训练请求的局点分析设备所发送的训练样本集合以及其他局点分析设备所发送的训练样本集合,对机器学习模型进行重训练。
其中,重训练所采用的训练样本集合不仅包括前述局点分析设备从对应局点网络获取的 训练样本集合,还包括其他局点分析设备在各自对应局点网络获取的训练样本集合,因此重训练所采用的训练样本集合的样本来源更广泛,数据类型更多样,重训练得到的机器学习模型更能适配于多个局点分析设备的需求,提高了离线训练得到的模型的通用性,从而实现模型泛化,降低了第一分析设备的整体训练成本。
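It is worth noting that, in addition to the two optional manners above, the local analysis device may also send only the training sample set acquired from its corresponding local network, without sending a retraining request. Correspondingly, the first analysis device may retrain in the following third optional manner: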
在第三种可选方式中,第一分析设备接收至少两个局点分析设备发送的训练样本集合,并基于接收到的训练样本集合,重训练机器学习模型。
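For example, the first analysis device may retrain the machine learning model after receiving training sample sets from a specified number of local analysis devices (for example, all local analysis devices that have established a communication connection with the first analysis device), when a training period is reached, or when enough training samples have been acquired (that is, the number of acquired training samples is greater than a training data volume threshold).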
在前述三种可选方式中,局点分析设备将从对应局点网络中获取的训练样本集合(例如第一训练样本集合)发送至第一分析设备还可以存在其他时机,例如局点分析设备可以周期性地上传获取的训练样本集合;或者,局点分析设备可以在接收到运维人员发送的样本集合上传指令,或者接收到第一分析设备发送的样本集合上传指令后上传训练样本集合,该样本集合上传指令用于指示向第一分析设备上传获取的训练样本集合。第一分析设备可以基于收集的训练样本集合来进行机器学习模型的重训练。该重训练过程可以是离线训练过程,也可以是增量训练过程。该重训练过程使用的训练样本与之前的训练过程所使用的训练样本可以相同也可以不同,本申请实施例对此不做限定。
值得说明的是,本申请实施例中,前述步骤401和404可以周期性执行,也即是该应用场景支持周期性的离线训练或增量训练。其中,第一分析设备的机器学习模型在经过评估后确定性能满足性能达标条件后,可以采用步骤402的方式发送至至少一个局点分析设备。例如仅发送给向第一分析设备发送前述重训练请求的局点分析设备;或者,发送给提供用于重训练的训练样本集合的局点分析设备;或者,发送给与该第一分析设备建立有通信连接的所有或指定局点分析设备等等。对于接收到重训练后的机器学习模型的局点分析设备,若该局点分析设备自身训练得到的机器学习模型也满足性能达标条件,则局点分析设备可以在获取的机器学习模型中筛选目标机器学习模型,以筛选较优的机器学习模型(例如性能指标最高的机器学习模型)来进行分类结果的预测。通常情况下,局点分析设备选择最新的机器学习模型作为目标机器学习模型,以适配当前应用场景的分类需求。
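As mentioned above, the machine learning model may be of many types. The tree model is one of the more common ones. A tree model includes multiple associated nodes. To help the reader, the embodiments of this application briefly introduce the tree model. In a tree model, each node includes a node element and several branches pointing to subtrees. The subtree on the left of a node is called its left subtree, and the subtree on the right is called its right subtree. The root of a node's subtree is called a child node of that node. If one node is the child node of another node, the other node is the parent node of the first. The depth (or level) of a node is counted along the simple path from the root node to that node, with the root node at depth 1, the child nodes of the root at depth 2, and so on. A leaf node, also called a terminal node, is a node whose degree is 0, where the degree of a node is the number of its subtrees. A non-leaf node is any node other than a leaf node, including the root node and the nodes between the root node and the leaf nodes. A binary tree is a tree structure in which each node has at most two subtrees, and it is a common tree structure. The machine learning model in the embodiments of this application may be a binary tree model, for example an isolation forest model.
FIG. 6 shows an illustrative tree structure provided by the embodiments of this application. The tree structure includes nodes P1 to P5, where P1 is the root node, P3, P4, and P5 are leaf nodes, and P1 and P2 are non-leaf nodes. The tree is two edges deep (three levels when the root is counted as level 1). The machine learning model is formed by two node splits, at nodes P1 and P2. A node split means that the training sample set corresponding to a node is divided into at most two subsets at a split point in a split dimension; this can be viewed as the node splitting into at most two child nodes, each corresponding to one subset. In other words, splitting is the way the training sample set of a node is divided among its child nodes. In practice a tree structure can be represented in many ways. FIG. 6 is only illustrative, and other representations such as those in FIG. 8 or FIG. 9 are possible; this application does not limit the representation of the tree structure.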
目前在采用训练样本集合对机器学习模型进行训练时,需要遍历训练样本集合中的所有特征维度的特征数据,然后确定机器学习模型中非叶子节点的分裂参数,例如分裂维度和分裂点,基于确定的非叶子节点的分裂参数训练得到该机器学习模型。
由于在训练机器学习模型时需要进行训练样本中特征数据的遍历,而通常训练样本的数据量非常大,因此导致机器学习模型的训练效率较低。
本申请实施例中,当前述机器学习模型为树模型时,在支持机器学习模型的增量训练的基础上,还能实现机器学习模型的训练效率的提高。后续实施例中,以机器学习模型为树模型为例,对前述步骤进行解释。在树模型的离线训练或者增量训练过程中,涉及到树模型的分裂,其主要原理是将一个或多个样本所对应的空间(也称样本空间)进行切割(split)。如前所述,每个训练样本包括一个或多个特征的特征数据。在树模型中,一个训练样本对应的特征(即特征类别)为该训练样本所对应空间的特征数据所在的维度。因此,在树模型中,考虑到空间概念,一个训练样本对应的特征也称之为特征维度。本申请实施例中,一个训练样本包括一维或多维的特征数据,指的是该训练样本包括一个或多个特征维度的特征数据。例如,一个训练样本包括二维特征的特征数据,也称为该训练样本包括两个特征维度的特征数据,该训练样本对应的空间为二维空间(即一个平面)。又例如,参考表1,表1中一个训练样本包括4个特征维度的特征数据,该训练样本对应的空间为4维空间。
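FIG. 7 is a schematic diagram of the splitting principle of a tree model provided by the embodiments of this application. The tree model is split based on the Mondrian process: a random hyperplane cuts the sample space (data space), each cut produces two subspaces, and each subspace is then cut again by a random hyperplane. This repeats until every subspace contains only one sample point. Dense clusters are therefore cut many times before cutting stops, whereas points in sparse regions end up alone in a subspace early; each final subspace corresponds to a leaf node of the tree. FIG. 7 assumes that the samples in the training sample set of the machine learning model in FIG. 6 include two-dimensional feature data, that is, feature data in two feature dimensions, x and y, and that the training sample set includes the samples (a1, b1), (a1, b2), and (a2, b1). The first node split uses feature dimension x as the split dimension and a3 as the split point; it cuts the sample space of the training sample set into two subspaces, which correspond to the left and right subtrees of node P1 in FIG. 6. The second node split uses feature dimension y as the split dimension and b3 as the split point, corresponding to the left and right subtrees of node P2 in FIG. 6. The sample set {(a1, b1), (a1, b2), (a2, b1)} is thus divided into three subspaces. When the feature data is feature data of a time series, feature dimensions x and y may be any two of the data features and/or extracted features described above (see the foregoing embodiments for the specific features). For example, if feature dimension x is the period duration of the data arrangement period and feature dimension y is the moving average among the statistical features, the feature data (a1, b1) means a period duration of a1 and a moving average of b1. When the feature data is data that itself has certain features, such as network KPI data, feature dimensions x and y may be any two of the KPI categories described above. For example, if feature dimension x is network traffic and feature dimension y is CPU utilization, the feature data (a1, b1) means network traffic of a1 and CPU utilization of b1.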
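The two splits in this example can be traced with concrete numbers. The values assigned to a1, a2, a3, b1, b2, and b3 below are assumptions chosen only so that a1 < a3 < a2 and b1 < b3 < b2, and the sketch assumes that node P2 receives the samples whose x value does not exceed a3.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def split(samples: List[Point], dim: int, point: float) -> Tuple[List[Point], List[Point]]:
    """Values not greater than the split point go left; larger values go right."""
    left = [s for s in samples if s[dim] <= point]
    right = [s for s in samples if s[dim] > point]
    return left, right


a1, a2, a3 = 1.0, 5.0, 3.0   # assumed values with a1 < a3 < a2
b1, b2, b3 = 1.0, 4.0, 2.0   # assumed values with b1 < b3 < b2
samples = [(a1, b1), (a1, b2), (a2, b1)]

p2_samples, p3_samples = split(samples, dim=0, point=a3)     # split at P1 on x
p4_samples, p5_samples = split(p2_samples, dim=1, point=b3)  # split at P2 on y
print(p3_samples, p4_samples, p5_samples)
```

Each of the three leaf subspaces ends up holding exactly one sample, which is where the cutting stops.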
需要说明的是,由于特征数据为数值数据,因此每个特征数据具有相应的取值,本申请实施例中,后文中将特征数据的取值称为特征取值。
在本申请实施例中,机器学习模型中的每个节点可以对应存储有节点信息,这样后续进行机器学习模型的重训练时,可以基于该节点信息进行节点分裂,以及叶子节点的分类结果的确定。示例的,该机器学习模型中的任一节点的节点信息包括标签分布信息,该标签分布信息用于反映划分至对应节点中的历史训练样本集合中样本的不同类别的标签在标签总数中的占比,该标签总数是划分至该任一节点的历史训练样本集合中样本对应的标签的总数量,任一非叶子节点的节点信息还包括历史分裂信息,该历史分裂信息为对应节点用于分裂的信息。需要说明的是,由于机器学习模型中的叶子节点是当前未进行过分裂的节点,其不存在子树,因此其历史分裂信息为空,在重训练过程中,若其进行了分裂,则该叶子节点变为非叶子节点,则需要为其添加历史分裂信息。
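For example, the historical split information includes one or more of: position information of the corresponding node in the machine learning model, a split dimension, a split point, the value distribution range of the historical training sample set assigned to the corresponding node, and a historical split cost. The position information is used to uniquely locate the node in the machine learning model; for example, it includes the level of the node, the identifier of the node, and/or the branch relationship of the node. The identifier uniquely identifies the node in the machine learning model, may be assigned when the node is generated, and may consist of digits and/or characters. When a node has a parent node, its branch relationship includes the identifier of the parent node and a description of the relationship to that parent node; for example, the branch relationship of node P2 in Figure 6 includes (node P1: parent node). When a node has child nodes, its branch relationship includes the identifiers of the child nodes and a description of the relationship to each; for example, the branch relationship of node P2 in Figure 6 further includes (node P4: left child node, node P5: right child node). The split dimension of a node is the feature dimension on which the historical training sample set assigned to that node is split, and the split point is the numerical point used for the split. For example, in Figure 6, the split dimension of node P1 is x and its split point is a3; the split dimension of node P2 is y and its split point is b3. Each non-leaf node has exactly one split dimension and one split point. The value distribution range of the historical training sample set assigned to a node is the distribution range of the feature values in that set; for example, in Figure 6, the value distribution range of node P3 is [a3,a2], which may also be written as a3-a2. The historical split cost is the split cost that the corresponding node determined from the value distribution range of its historical training sample set; see below for details.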
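The label distribution information includes either the number of labels of each category among the samples of the historical training sample set assigned to the corresponding node together with the total number of labels, or the proportion of each label category in the total number of labels. The total number of labels is the total number of labels corresponding to the samples of the historical training sample set assigned to the node, and the ratio of the number of labels of a category to the total number of labels is the proportion of that category. For example, in an anomaly detection scenario, the samples assigned to node P1 carry 10 labels, of which 2 are the "normal" label 0 and 8 are the "abnormal" label 1. The label distribution information then includes: label 0: 2; label 1: 8; total number of labels: 10 (from which the proportion of each label category can be determined). Alternatively, the label distribution information includes: label 0: 20%; label 1: 80%. Note that this only illustrates how label distribution information may be represented; other representations are possible in practice, and the embodiments of this application do not limit this.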
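Further, the node information of a leaf node may also include a classification result, that is, the label finally determined for that leaf node. For example, the node information of the leaf nodes P4 and P5 in Figure 6 may include a classification result.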
通过为每个节点存储节点信息,可以为后续模型训练提供完整的训练信息,减少模型训练时相关信息获取的复杂度,提高模型训练效率。
尤其在节点信息包括历史训练样本集合的数值分布范围时,在后续重训练过程中,仅基于历史训练样本集合的数值分布范围即可进行有效的模型重训练,无需获取历史训练样本集合中的特征数据的实际数值,有效减少训练复杂度。
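The node information described above can be pictured as one record per node. The following is a minimal sketch, assuming a Python dataclass; every field name (`node_id`, `value_range`, `label_counts`, and so on) is hypothetical and chosen for illustration, not taken from the application.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class NodeInfo:
    """Illustrative per-node record; all field names are assumptions."""
    node_id: str                                   # unique identifier of the node
    depth: int                                     # level of the node in the tree
    parent_id: Optional[str] = None                # branch relationship: parent
    left_child_id: Optional[str] = None            # branch relationship: left child
    right_child_id: Optional[str] = None           # branch relationship: right child
    split_dim: Optional[str] = None                # None for a leaf node
    split_point: Optional[float] = None            # None for a leaf node
    value_range: Dict[str, Tuple[float, float]] = field(default_factory=dict)
    split_cost: Optional[float] = None             # historical split cost
    label_counts: Dict[str, int] = field(default_factory=dict)
    classification: Optional[str] = None           # only set on leaf nodes

    def is_leaf(self) -> bool:
        return self.left_child_id is None and self.right_child_id is None

    def label_ratios(self) -> Dict[str, float]:
        """Proportion of each label category in the total number of labels."""
        total = sum(self.label_counts.values())
        if total == 0:
            return {}
        return {label: n / total for label, n in self.label_counts.items()}


# Node P1 from the anomaly detection example: 2 "normal" (0) and 8 "abnormal" (1).
p1 = NodeInfo("P1", 1, left_child_id="P2", right_child_id="P3",
              split_dim="x", label_counts={"0": 2, "1": 8})
```

Storing counts rather than only proportions keeps the record easy to update when a new sample arrives, since proportions can always be recomputed from counts.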
假设机器学习模型为一个二叉树模型,离线训练过程是建立机器学习模型的过程,则前述步骤401中的对机器学习模型的训练过程包括:
步骤A1、获取已确定标签的历史训练样本集合。
在一种情况下,该历史训练样本集合中样本的标签在第一分析设备获取时可能未标注,则第一分析设备可以将样本呈现给运维人员,由运维人员标注标签;在另外一种情况下,历史训练样本集合中样本的标签在第一分析设备获取时已完成标注,例如第一分析设备从前述存储设备中获取的样本,第一分析设备直接采用该训练样本集合进行模型的训练即可。
步骤A2、创建根节点。
步骤A3、将根节点作为第三节点,执行离线训练过程直至达到分裂截止条件,该离线训练过程包括:
步骤A31、进行第三节点的分裂,得到第三节点的左子节点和右子节点。
步骤A32、将左子节点作为更新后的第三节点,将历史训练样本集合划分至左子节点的左样本集合作为更新后的历史训练样本集合,再次执行离线训练过程。
步骤A33、将右子节点作为更新后的第三节点,将历史训练样本集合划分至右子节点的右样本集合作为更新后的历史训练样本集合,再次执行离线训练过程。
步骤A4、为每个叶子节点确定分类结果,得到机器学习模型。
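Steps A1 to A4 amount to a recursive construction. The sketch below is one possible reading under stated assumptions: nodes are plain dictionaries, the split dimension is the one with the largest span, the split point is drawn uniformly at random, and only a sample-count limit and a depth limit serve as the split cutoff condition. The function and key names are hypothetical.

```python
import random
from typing import Dict, List, Optional, Tuple

Sample = Tuple[Tuple[float, ...], str]  # (feature values, label)


def make_leaf(node: dict) -> dict:
    # Step A4: the most frequent label becomes the classification result.
    counts = node["label_counts"]
    node["classification"] = max(counts, key=counts.get)
    return node


def train_offline(samples: List[Sample], depth: int = 1, max_depth: int = 8,
                  min_samples: int = 2,
                  rng: Optional[random.Random] = None) -> dict:
    rng = rng or random.Random(0)
    dims = len(samples[0][0])
    # Value distribution range: (min, max) per feature dimension.
    ranges = [(min(s[0][j] for s in samples), max(s[0][j] for s in samples))
              for j in range(dims)]
    counts: Dict[str, int] = {}
    for _, label in samples:
        counts[label] = counts.get(label, 0) + 1
    node = {"depth": depth, "ranges": ranges, "label_counts": counts,
            "left": None, "right": None}
    spans = [hi - lo for lo, hi in ranges]
    # Simplified cutoff: too few samples, too deep, or nothing left to split.
    if len(samples) < min_samples or depth >= max_depth or sum(spans) == 0:
        return make_leaf(node)
    dim = max(range(dims), key=lambda j: spans[j])      # largest span
    point = rng.uniform(*ranges[dim])                   # random split point
    left = [s for s in samples if s[0][dim] <= point]   # step A32
    right = [s for s in samples if s[0][dim] > point]   # step A33
    if not left or not right:
        return make_leaf(node)
    node["split_dim"], node["split_point"] = dim, point
    node["left"] = train_offline(left, depth + 1, max_depth, min_samples, rng)
    node["right"] = train_offline(right, depth + 1, max_depth, min_samples, rng)
    return node


def leaves(node: dict) -> List[dict]:
    if node["left"] is None:
        return [node]
    return leaves(node["left"]) + leaves(node["right"])
```

Each recursive call plays the role of "taking the child node as the updated third node" with the corresponding subset as the updated historical training sample set.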
本申请实施例中,当前述步骤中的第三节点达到分裂截止条件时,其不存在子节点,因此可以作为叶子节点,当一个节点是叶子节点时,可以基于划分至该节点的历史训练样本集合中样本的同一类别的标签个数,划分至该节点的历史训练样本集合中标签总数,确定该叶子节点的分类结果;或者,基于历史训练样本集合中样本的不同类别的标签在标签总数中的占比,确定该叶子节点的分类结果。该分类结果的确定方式仍然基于前述概率论原理,即将占比最高或数量最多的标签作为最终的分类结果。其中,任一标签在标签总数中的占比即该任一标签的总数与标签总数的比值。例如,叶子节点中的“异常”标签的个数为7,“正常”标签的个数为3,则“异常”标签在标签总数中的占比为70%,“正常”标签在标签总数中 的占比为30%,最终的分类结果为“异常”。在叶子节点的节点信息中保存该分类结果。
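The majority rule for a leaf node can be written in a few lines. This is a minimal sketch; the function name `classify_leaf` and the dictionary representation of label counts are assumptions made for illustration.

```python
from typing import Dict, Tuple


def classify_leaf(label_counts: Dict[str, int]) -> Tuple[str, float]:
    """Return the most frequent label and its proportion of all labels."""
    total = sum(label_counts.values())
    if total == 0:
        raise ValueError("leaf node has no labels")
    label = max(label_counts, key=label_counts.get)
    return label, label_counts[label] / total


# The example from the text: 7 "abnormal" labels and 3 "normal" labels.
result = classify_leaf({"abnormal": 7, "normal": 3})
```

Here `result` holds the label "abnormal" together with its proportion of 70%.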
传统的iForest模型,是以叶子节点在每棵树上的高度平均值来计算相应的分类结果。而本申请实施例中,叶子节点的分类结果是将占比最高或数量最多的标签作为最终的分类结果,分类结果较为准确,且运算代价较小。
在前述步骤A31中,可以基于历史训练样本集合的数值分布范围进行第三节点的分裂,得到第三节点的左子节点和右子节点。历史训练样本集合的数值分布范围反映了历史训练样本集合中样本的疏密程度,当样本分布较为分散,该数值分布范围较大,当样本分布较为集中,该数值分布范围较小。
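In the embodiments of this application, the samples of the historical training sample set may include feature data in at least one dimension. The value distribution range of the historical training sample set is the distribution range of the feature values in the set, that is, it can be characterized by the minimum and maximum feature values in each feature dimension. The feature data in the samples is numerical data, such as decimal values, binary values, or vectors. For example, if the samples include one-dimensional feature data and the feature values in the set are 1, 3, ..., 7, 10, the minimum is 1 and the maximum is 10, so the value distribution range of the historical training sample set is [1,10], which may also be written as 1-10.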
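Computing the value distribution range is a per-dimension minimum and maximum. The sketch below assumes each sample is a dictionary from a feature-dimension name to a numeric value; the name `value_range` is hypothetical.

```python
from typing import Dict, List, Tuple


def value_range(samples: List[Dict[str, float]]) -> Dict[str, Tuple[float, float]]:
    """Return (min, max) of the feature values in each feature dimension."""
    dims = samples[0].keys()
    return {d: (min(s[d] for s in samples), max(s[d] for s in samples))
            for d in dims}


# One-dimensional example with feature values 1, 3, 7, 10.
r = value_range([{"v": 1}, {"v": 3}, {"v": 7}, {"v": 10}])
```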
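The feature data in the samples of the historical training sample set may be numerical data originally, or may be converted from non-numerical data by a specified algorithm. For example, feature dimensions such as data trend, data fluctuation, statistical features, or fitted features that cannot initially be expressed numerically can be converted to numerical data by a specified algorithm. For instance, the feature data "high" may be converted to the value 2, "medium" to the value 1, and "low" to the value 0. Splitting nodes with a historical training sample set that consists of numerical data reduces computational complexity and improves efficiency.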
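The application does not name the conversion algorithm. As one simple possibility, an ordinal mapping reproduces the high/medium/low example; the mapping table `LEVELS` and the function `encode` are assumptions for illustration.

```python
from typing import Union

# Hypothetical ordinal mapping matching the example in the text.
LEVELS = {"low": 0, "medium": 1, "high": 2}


def encode(value: Union[str, int, float]) -> float:
    """Pass numeric feature data through; map known level names to numbers."""
    if isinstance(value, (int, float)):
        return float(value)
    return float(LEVELS[value])
```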
示例的,基于历史训练样本集合的数值分布范围进行第三节点的分裂,得到第三节点的左子节点和右子节点的过程可以包括:
步骤A311、在历史训练样本集合的各特征维度中确定第三分裂维度。
在第一种可选实现方式中,第三分裂维度为在历史训练样本集合的各特征维度中随机选择的特征维度。
在第二种可选实现方式中,第三分裂维度为历史训练样本集合的各特征维度中跨度最大的特征维度。示例的,每个特征维度上的特征取值的跨度为该特征维度上特征取值的最大值与最小值之差。
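For example, the feature dimensions of the historical training sample set may first be sorted by span in descending order, and the feature dimension with the first-ranked span is then selected as the third split dimension.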
跨度大的特征维度,其可分裂的概率较高,在该特征维度上进行节点分裂可以加快模型收敛速度,避免无效的特征维度的分裂,因此,通过选择跨度最大的特征维度作为第三分裂维度,可以提高机器学习模型的有效分裂的概率,节约节点分裂的开销。
如图8所示,本申请实施例假设历史训练样本集合中的样本包括二维特征数据,其对应二维空间,有x1,x2两个特征维度,各个特征维度上的特征取值的跨度范围分别为[x1_min,x1_max],[x2_min,x2_max]。相应的跨度为x1_max–x1_min,和x2_max–x2_min。比较这两个跨度,假设x1_max–x1_min>x2_max–x2_min,则选择特征维度x1作为第三分裂维度。
基于与前述第二种可选实现方式相同的原理,该第三分裂维度为历史训练样本集合的各特征维度中跨度占比最大的特征维度。任一特征维度上的特征取值的跨度占比d满足占比公式:d=h/z,其中,h为该历史训练样本集合在该任一特征维度上的跨度,z为各个特征维度上的特征取值的跨度之和。
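Taking Figure 8 as an example, z=(x1_max–x1_min)+(x2_max–x2_min), the span proportion of feature dimension x1 is dx1=(x1_max–x1_min)/z, and the span proportion of feature dimension x2 is dx2=(x2_max–x2_min)/z. Assuming dx1>dx2, feature dimension x1 is selected as the third split dimension.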
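Selecting the split dimension by span proportion, d = h/z, can be sketched as follows. The function names and the dictionary layout of the ranges are assumptions; the numeric ranges are made up for the example.

```python
from typing import Dict, Tuple

Ranges = Dict[str, Tuple[float, float]]


def span_ratios(ranges: Ranges) -> Dict[str, float]:
    """d = h / z: span of each dimension divided by the sum of all spans."""
    spans = {d: hi - lo for d, (lo, hi) in ranges.items()}
    z = sum(spans.values())
    if z == 0:
        raise ValueError("all spans are zero; no dimension can be split")
    return {d: h / z for d, h in spans.items()}


def choose_split_dimension(ranges: Ranges) -> str:
    """Pick the feature dimension with the largest span proportion."""
    ratios = span_ratios(ranges)
    return max(ratios, key=ratios.get)


# Two dimensions as in Figure 8, with made-up numbers: spans 10 and 3.
ranges = {"x1": (0.0, 10.0), "x2": (2.0, 5.0)}
chosen = choose_split_dimension(ranges)
```

Because z is the same for every dimension, the dimension with the largest span proportion is also the one with the largest span, so both selection rules give the same result.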
步骤A312、在历史训练样本集合的第三分裂维度上确定第三分裂点。
示例的,第三分裂点为在历史训练样本集合的第三分裂维度上随机选择的数值点。这样可以实现第三分裂维度上的等概率分裂。
In actual implementation, the third split point may also be selected in other ways, which the embodiments of this application do not limit.
步骤A313、基于第三分裂维度的第三分裂点进行第三节点的分裂,其中,将第三数值分布范围中在第三分裂维度上数值不大于第三分裂点数值的数值范围划分至左子节点,第三数值分布范围中在第三分裂维度上数值大于第三分裂点数值的数值范围划分至右子节点。
第三数值分布范围为历史训练样本集中的特征取值的分布范围,其由历史训练集合在各特征维度上的特征取值的跨度范围组成。这样在进行节点分裂时,仅需获取每个特征维度上的数值的最小值和最大值,获取的数据量小,计算简便,模型的训练效率较高。
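Still taking Figure 8 as an example, assume the selected third split point is x1_value∈[x1_min,x1_max]. The value range in feature dimension x1 that is not greater than x1_value, namely [x1_min,x1_value], is assigned to the left child node P2 of the third node P1, and the value range in feature dimension x1 that is greater than x1_value, namely (x1_value,x1_max], is assigned to the right child node P3 of the third node P1.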
由于前述节点的分裂只基于特征数据的数值分布范围,而不基于特征数据本身,因此在进行节点分裂时,仅需获取每个特征维度上的数值的最小值和最大值,获取的数据量小,计算简便,模型的训练效率较高。
值得说明的是,由于第一分析设备已经获取了前述用于训练的历史训练样本集合,因此也可以直接采用样本进行节点分裂,步骤A313中采用数值分布范围进行节点划分的方式可以替换为:将历史训练样本集合中在第三分裂维度上特征取值不大于第三分裂点数值的样本划分至左子节点,历史训练样本集合中在第三分裂维度上特征取值大于第三分裂点数值的样本划分至右子节点。
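Splitting by value range needs only the minimum and maximum per dimension. The sketch below assumes ranges stored as `(low, high)` tuples and a uniformly random split point; the names are hypothetical, and the right-hand range is understood as open at the split point.

```python
import random
from typing import Dict, Tuple

Ranges = Dict[str, Tuple[float, float]]


def random_split_point(ranges: Ranges, dim: str, rng: random.Random) -> float:
    """Draw a split point uniformly from the range of the split dimension."""
    return rng.uniform(*ranges[dim])


def split_range(ranges: Ranges, dim: str, point: float) -> Tuple[Ranges, Ranges]:
    """Assign values <= point to the left child and values > point to the right."""
    lo, hi = ranges[dim]
    if not lo <= point <= hi:
        raise ValueError("split point lies outside the value range")
    left, right = dict(ranges), dict(ranges)
    left[dim] = (lo, point)    # [lo, point]
    right[dim] = (point, hi)   # (point, hi], open at the split point
    return left, right


ranges = {"x1": (0.0, 10.0), "x2": (2.0, 5.0)}
left, right = split_range(ranges, "x1", 4.0)
```

Only the split dimension changes; the ranges of all other dimensions are copied unchanged to both children.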
若对机器学习模型的分裂未添加限制,可能会造成机器学习模型的深度无限制的增大,可能一直迭代到每个叶子节点只有相同标签的样本点,或者只含有一个样本点,机器学习模型的分裂才停止。本申请实施例通过设置分裂截止条件可以控制树的深度,避免树的过度分裂。
可选地,该分裂截止条件包括以下至少一者:
条件1、第三节点的当前分裂成本大于分裂成本阈值。
条件2、历史训练样本集合的样本的数量小于第二样本数阈值。
条件3、第三节点对应的分裂次数大于分裂次数阈值。
条件4、第三节点在机器学习模型中的深度大于深度阈值。
条件5、历史训练样本集合所对应的标签中占比最大的标签的数量在历史训练样本集合所对应的标签总数中的占比大于指定占比阈值。
对于前述条件1,本申请实施例提出一种分裂成本的概念,在机器学习模型的离线训练过程中,任一节点的当前分裂成本与任一节点的训练样本集合的数值分布范围的大小负相关,该任一节点的训练样本集合为用于训练机器学习模型的训练样本集合划分至任一节点的样本的集合。示例的,该任一节点的当前分裂成本为该任一节点的训练样本集合的样本在各特征维度上的特征取值的跨度之和的倒数。则,对于第三节点,该第三节点的当前分裂成本与历史训练样本集合中的特征取值的分布范围的大小负相关,即数值分布范围越大,分裂成本越小。第三节点的当前分裂成本为历史训练样本集合在各特征维度上的特征取值的跨度之和的倒数。示例的,该分裂成本阈值可以为正无穷。
第三节点的当前分裂成本满足成本计算公式:
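$$\mathrm{cost} = \frac{1}{\sum_{j=1}^{N}\left(\max_j - \min_j\right)}$$
where $\max_j - \min_j$ is the maximum minus the minimum of the feature values in the j-th feature dimension of the value distribution range of the historical training sample set, that is, the span of the feature values in that dimension, and N is the total number of feature dimensions.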
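The split cost is the reciprocal of the sum of the spans over all feature dimensions. A minimal sketch follows, in which the function name and the handling of a zero total span (returning infinity) are assumptions rather than something the application specifies.

```python
import math
from typing import Dict, Tuple


def split_cost(ranges: Dict[str, Tuple[float, float]]) -> float:
    """Reciprocal of the sum of (max_j - min_j) over all feature dimensions."""
    total_span = sum(hi - lo for lo, hi in ranges.values())
    # Assumption: a node whose samples all coincide cannot be split.
    return math.inf if total_span == 0 else 1.0 / total_span


# Spans 10 and 3 give a cost of 1/13.
cost = split_cost({"x1": (0.0, 10.0), "x2": (2.0, 5.0)})
```

A wider value distribution range therefore yields a smaller cost, which matches the stated negative correlation.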
则如图8,节点P1的分裂成本为1/z,z=(x1_max–x1_min)+(x2_max–x2_min)。如图9,在进行增量训练的过程中,每进行一次节点分裂可以计算一次分裂成本,并与分裂成本阈值进行比较,图9中假设分裂成本初始值为0,分裂成本阈值为正无穷,计算得到的分裂成本沿树的深度方向分别为0、COST1(第一次节点分裂)、COST2(第二次节点分裂)和COST3(第三次节点分裂)等,其中,分裂次数与分裂成本正相关,分裂次数越多,分裂成本越高。
采用该条件1,在分裂成本达到一定程度时,不再进行节点分裂,可以避免树的过度分裂,减少运算开销。
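For condition 2, when the number of samples in the historical training sample set of the third node is less than the second sample number threshold, the set no longer holds enough data to support an effective node split, so stopping the offline training process reduces computational overhead. For example, the second sample number threshold may be 2 or 3.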
对于前述条件3,第三节点对应的分裂次数是从根节点的首次分裂到该第三节点的当前分裂的总次数,当第三节点对应的分裂次数大于分裂次数阈值,说明当前的机器学习模型已经达到了分裂次数上限,此时停止离线训练过程,可以减少运算开销。
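For condition 4, stopping the offline training process when the depth of the third node in the machine learning model exceeds the depth threshold keeps the depth of the machine learning model under control.
For condition 5, when the proportion of the most frequent label among the labels of the historical training sample set exceeds the specified proportion threshold, that label is already sufficient to determine an accurate classification result, so stopping the offline training process avoids unnecessary splits and reduces computational overhead.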
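The five cutoff conditions can be combined into a single check. The sketch below assumes that meeting any one condition stops splitting; the configuration class, its default thresholds, and the function name are hypothetical values chosen for illustration only.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class StopConfig:
    """Illustrative thresholds; none of these defaults come from the application."""
    cost_threshold: float = math.inf   # condition 1
    min_samples: int = 2               # condition 2
    max_splits: int = 1000             # condition 3
    max_depth: int = 16                # condition 4
    majority_ratio: float = 0.95       # condition 5


def should_stop(cost: float, n_samples: int, n_splits: int, depth: int,
                label_counts: Dict[str, int], cfg: StopConfig) -> bool:
    total = sum(label_counts.values())
    top_ratio = max(label_counts.values()) / total if total else 0.0
    return (cost > cfg.cost_threshold
            or n_samples < cfg.min_samples
            or n_splits > cfg.max_splits
            or depth > cfg.max_depth
            or top_ratio > cfg.majority_ratio)
```

With the cost threshold left at positive infinity, condition 1 never triggers, which corresponds to the example in which the split cost threshold is positive infinity.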
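Optionally, in step 404, the process in which the site analysis device incrementally trains the machine learning model based on the first training sample set is in effect a process of feeding the training samples of the first training sample set into the machine learning model one at a time. The training process is executed multiple times, each execution is identical, and each is in effect a traversal of nodes. The traversal is performed for every training sample in the first training sample set. The embodiments of this application take a first training sample as an example, where the first training sample is any training sample in the first training sample set and includes feature data in one or more feature dimensions. As with the samples of the historical training sample set, the feature data in the first training sample set is numerical data, which may be numerical originally or converted from non-numerical data by a specified algorithm. Assuming the first node is any non-leaf node in the machine learning model and the traversal starts from the root node of the machine learning model, step 404 includes: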
步骤B1、当遍历到的第一节点的当前分裂成本小于第一节点的历史分裂成本时,添加关联的第二节点,第二节点为第一节点的父节点或子节点。
其中,在增量训练过程中,第一节点的当前分裂成本为第一节点基于第一训练样本进行节点分裂的成本(即为该第一节点添加关联的第二节点的成本,在这种情况下,第一节点的节点分裂指的是为第一节点添加新的分支),第一节点的历史分裂成本为第一节点基于第一节点的历史训练样本集合进行节点分裂的成本,第一节点的历史训练样本集合为机器学习模型的历史训练样本集合中划分至第一节点的样本的集合。则参考前述401,若当前增量训练为接收到机器学习模型后的首次增量训练,且第一节点为前述任一第三节点,第一节点的历史训练样本集合即为第三节点对应的历史训练样本集合。
在本申请实施例中,可以通过直接比较第一节点的当前分裂成本和第一节点的历史分裂成本,以在第一节点的当前分裂成本小于第一节点的历史分裂成本时,添加关联的第二节点;进一步的,也可以先获取第一节点的当前分裂成本减去第一节点的历史分裂成本所得到的差值,判断该差值的绝对值是否大于指定差值阈值,这样可以保证在第一节点的当前分裂成本远远小于第一节点的历史分裂成本时,才进行节点分裂,这样可以节约训练成本,提高训练效率。
其中,第一节点的当前分裂成本与第一数值分布范围的大小负相关,该第一数值分布范围是基于第一训练样本中的特征取值与第二数值分布范围确定的分布范围。第二数值分布范围为第一节点的历史训练样本集合中的特征取值的分布范围。可选地,该第一数值分布范围是基于第一训练样本与第二数值分布范围的并集所确定的分布范围。例如,第一节点的历史训练样本集合中的样本包括两个特征维度的特征数据,在特征维度x的特征取值的跨度范围为[1,10],在特征维度y的特征取值的跨度范围为[5,10],则第二数值分布范围包括特征维度x上的特征取值的跨度范围:[1,10]和特征维度y上的特征取值的跨度范围:[5,10];若第一训练样本在特征维度x上的特征取值为9,在特征维度y上的特征取值为13,则分别在不同特征维度上求取第一训练样本与第二数值分布范围的并集,则第一数值分布范围在各特征维度上的特征取值的跨度范围包括:在特征维度x为[1,10],在特征维度y为[5,13]。
示例的,第一节点的当前分裂成本为第一数值分布范围在各特征维度上的特征取值的跨度之和的倒数。第一节点的当前分裂成本的计算方式均可以参考前述第三节点的当前分裂成本的计算方式,如采用前述成本计算公式(即公式三)计算,只是公式中对应的数值分布范围由前述历史训练样本集合的数值分布范围替换为该第一数值分布范围,本申请实施例不再赘述。
示例的,第一节点的历史分裂成本为第一节点的历史训练样本集合的样本在各特征维度上的特征取值的跨度之和的倒数。第一节点的历史分裂成本的计算方式可以参考前述第三节点的当前分裂成本的计算方式,如采用前述成本计算公式计算,只是公式中对应的数值分布 范围由前述历史训练样本集合的数值分布范围替换为该第一节点的历史训练样本集合的数值分布范围,本申请实施例不再赘述。
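The comparison between the current and the historical split cost can be sketched with the numbers from the example above (ranges [1,10] and [5,10], new sample (9, 13)). The function names and the optional `min_gap` parameter, which stands in for the specified difference threshold, are assumptions.

```python
import math
from typing import Dict, Tuple

Ranges = Dict[str, Tuple[float, float]]


def split_cost(ranges: Ranges) -> float:
    total_span = sum(hi - lo for lo, hi in ranges.values())
    return math.inf if total_span == 0 else 1.0 / total_span


def extend_range(ranges: Ranges, sample: Dict[str, float]) -> Ranges:
    """First value range: union of the historical range and the new sample."""
    return {d: (min(lo, sample[d]), max(hi, sample[d]))
            for d, (lo, hi) in ranges.items()}


def should_add_second_node(hist_ranges: Ranges, sample: Dict[str, float],
                           min_gap: float = 0.0) -> bool:
    """True when the current cost is below the historical cost by more than min_gap."""
    current = split_cost(extend_range(hist_ranges, sample))
    historical = split_cost(hist_ranges)
    return current < historical and (historical - current) > min_gap


hist = {"x": (1.0, 10.0), "y": (5.0, 10.0)}
sample = {"x": 9.0, "y": 13.0}
first_range = extend_range(hist, sample)
```

In this example the historical cost is 1/14 and the current cost is 1/17, so the new sample widens the range and a second node would be added. A sample that falls inside the historical range leaves the cost unchanged.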
其中,添加关联的第二节点的过程可以包括:
步骤B11、确定第一数值分布范围在各特征维度上的特征取值的跨度范围。
步骤B12、基于第一分裂维度上的第一分裂点添加第二节点,其中,第一数值分布范围中在第一分裂维度上数值不大于第一分裂点数值的数值范围划分至第二节点的左子节点,第一数值分布范围中在第一分裂维度上数值大于第一分裂点数值的数值范围划分至第二节点的右子节点。
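Taking Figure 10 as an example, assume the selected split point of the second node P4 is y1_value∈[y1_min,y1_max]. The value range of the first value distribution range in feature dimension y1 that is less than or equal to y1_value, namely [y1_min,y1_value], is assigned to the left child node P5 of the second node P4, and the value range in feature dimension y1 that is greater than y1_value, namely (y1_value,y1_max], is assigned to the right child node P6 of the second node P4.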
第二节点分裂的过程可以参考前述步骤A313第三节点分裂的过程,本申请实施例对此不做赘述。
其中,前述第一分裂维度为基于各特征维度上的特征取值的跨度范围,在各特征维度中确定的分裂维度,第一分裂点为在第一数值分布范围的第一分裂维度上确定的用于分裂的数值点。
示例的,在第一种可选方式中,第一分裂维度为在第一数值分布范围的各特征维度中随机选择的特征维度。在第二种可选方式中,第一分裂维度为第一数值分布范围的各特征维度中跨度最大的特征维度。相应原理可以参考前述步骤A311,本申请实施例不再赘述。
可选地,第一分裂点为在第一数值分布范围的第一分裂维度上随机选择的数值点。这样可以实现第一分裂维度上的等概率分裂。
In actual implementation, the first split point may also be selected in other ways, which the embodiments of this application do not limit.
在第一种情况中,当第一分裂维度与第二分裂维度不同时,第二节点为第一节点的父节点或子节点,也即是第二节点位于第一节点的上层或下层。
第二分裂维度为第一节点在机器学习模型中的历史分裂维度,第二分裂点为第一节点在机器学习模型中的历史分裂点,则参考前述步骤A311和A312,当该第一节点为前述任一第三节点,则第二分裂维度即为前述第三分裂维度,第二分裂点即为前述第三分裂点。
在第二种情况中,当第一分裂维度与第二分裂维度相同,且第一分裂点位于第二分裂点右侧,第二节点为第一节点的父节点,且第一节点为第二节点的左子节点。
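In the third case, when the first split dimension is the same as the second split dimension and the first split point lies to the left of the second split point, the second node is the left child node of the first node.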
如前,每个非叶子节点的节点信息可以包括分裂维度和分裂点,为了便于说明,本申请后续实施例以“u>v”的格式表示分裂维度为u,分裂点为v。
对于前述第一种情况,如图9和图11所示,图11为添加第二节点之前的一种机器学习模型,包括节点Q1和Q3,图9为图11所示的机器学习模型添加第二节点之后的机器学习模型。假设节点Q1为第一节点,其第二分裂维度为x 2,第二分裂点为0.2;Q2为第二节点,其第一分裂维度为x 1,第一分裂点为0.7,由于第一节点和第二节点的分裂维度不同,如图9 所示,新添加的第二节点作为了第一节点的父节点。
如图9和图12所示,图12为添加第二节点之前的另一种机器学习模型,包括节点Q1和Q2,图9为图12所示的机器学习模型添加第二节点之后的机器学习模型。假设节点Q1为第一节点,其第二分裂维度为x 2,第二分裂点为0.2;Q3为第二节点,其第一分裂维度为x 1,第一分裂点为0.4,由于第一节点和第二节点的分裂维度不同,新添加的第二节点作为了第一节点的子节点。
对于前述第二种情况,如图13和图14所示,图13为添加第二节点之前的一种机器学习模型,包括节点Q4和Q6,图14为图13所示的机器学习模型添加第二节点之后的机器学习模型。假设节点Q4为第一节点,其第二分裂维度为x 1,第二分裂点为0.2;Q5为第二节点,其第一分裂维度为x 1,第一分裂点为0.7,由于第一节点和第二节点的分裂维度相同,且第一分裂点位于第二分裂点右侧,新添加的第二节点作为了第一节点的父节点,且第一节点为第二节点的左子节点。
对于前述第三种情况,如图13和图15所示,图13为添加第二节点之前的一种机器学习模型,包括节点Q4和Q6,图15为图13所示的机器学习模型添加第二节点之后的机器学习模型。假设节点Q4为第一节点,其第二分裂维度为x 1,第二分裂点为0.2;Q7为第二节点,其第一分裂维度为x 1,第一分裂点为0.1,由于第一节点和第二节点的分裂维度相同,且第一分裂点位于第二分裂点左侧,新添加的第二节点作为了第一节点的左子节点。
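The three placement cases can be summarized as a small decision function. In this sketch, "right of" and "left of" are read as numerically greater and smaller, the returned strings are illustrative labels, and the case of equal split points, which the text does not cover, is reported as unspecified.

```python
def place_second_node(first_split_dim: str, first_split_point: float,
                      second_split_dim: str, second_split_point: float) -> str:
    """Decide where the new second node goes relative to the first node.

    first_split_*  : split dimension and point determined for the new second node
    second_split_* : historical split dimension and point of the first node
    """
    if first_split_dim != second_split_dim:
        return "parent_or_child"                   # case 1
    if first_split_point > second_split_point:
        return "parent_with_first_as_left_child"   # case 2
    if first_split_point < second_split_point:
        return "left_child"                        # case 3
    return "unspecified"                           # equal points: not covered


# Q4 historically splits x1 at 0.2; Q5 splits x1 at 0.7, Q7 splits x1 at 0.1.
case_q5 = place_second_node("x1", 0.7, "x1", 0.2)
case_q7 = place_second_node("x1", 0.1, "x1", 0.2)
```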
值得说明的是,前述第二节点添加后,其子节点中不是第一节点的子节点为叶子节点,需要确定该叶子节点的分类结果。也即是,第二节点是第一节点的父节点时,第二节点的另一子节点是叶子节点;第一节点是第二节点的父节点时,第二节点的两个子节点均为叶子节点。
增量训练过程中的叶子节点的分类结果的确定方式可以参考前述离线训练的过程中叶子节点的分类结果的确定方式,基于历史训练样本集合中样本的同一类别的标签个数,以及标签总数,确定该叶子节点的分类结果,该标签总数为划分至该叶子节点的历史训练样本集合中样本对应的标签的总数量;或者,基于历史训练样本集合中样本的不同类别的标签在标签总数中的占比,确定该叶子节点的分类结果。本申请实施例对此不再赘述。
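As described above, each node in the machine learning model stores node information. During incremental training, the historical split information obtained from the node information of the first node, such as the split dimension, the split point, and the value distribution range of the historical training sample set, is used to decide whether to add a second node for the first node, which enables fast incremental training. When a node is determined to be a leaf node, its classification result can be determined quickly from the label distribution information in the node information.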
进一步的,在离线训练过程中,在确定每个节点后,可以为每个节点存储对应的节点信息,或者在整个机器学习训练完成后,为每个节点存储对应的节点信息,以备后续重训练使用。在增量训练过程中,在添加第二节点后,需要为第二节点对应保存其节点信息。由于新增第二节点的目的是要把不同类别的样本分开,新增的第二节点所在分支是在原机器学习模型的分支里面没有的分支,属于新增分支,所以不影响原有分支分布。因此,与第二节点存在连接关系的节点,如父节点或子节点对应的节点信息中的位置信息对应更新,节点信息中的其他信息保持不变。这样可以在尽量减少对其他节点影响的情况下进行机器学习模型的增量训练。
值得说明的是,在步骤B1的添加关联的第二节点之前,还可以检测第一节点的历史训练样本集合中的样本的数量与第一训练样本的数量之和是否大于第一样本数阈值,当第一节点的历史训练样本集合中的样本的数量与第一训练样本的数量之和大于第一样本数阈值,添加第二节点。在每一次增量训练过程中,第一训练样本的数量为1。
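When the sum of the number of samples in the historical training sample set of the first node and the number of first training samples is not greater than the first sample number threshold, the incremental training of the machine learning model is stopped, that is, the step of adding the associated second node is not performed. In this way, a second node is added and the node is split only when enough samples are available, which avoids ineffective node splits and reduces the consumption of computing resources. Because splitting a node that holds too few samples may degrade the prediction performance of the machine learning model, setting the first sample number threshold also helps preserve model accuracy.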
步骤B2、当第一节点的当前分裂成本不小于第一节点的历史分裂成本,遍历第一节点的子树中的各个节点,并将遍历到的节点确定为新的第一节点,再次执行所述遍历过程,直至遍历到的第一节点的当前分裂成本小于第一节点的历史分裂成本,或者遍历到目标深度。此时停止遍历过程。需要说明的是,当第一节点的当前分裂成本不小于第一节点的历史分裂成本,且遍历完第一节点的子树,也即是第一节点是叶子节点时,也停止执行上述遍历过程。
对于更新后的第一节点,当该第一节点的当前分裂成本小于第一节点的历史分裂成本,则添加与第一节点关联的第二节点,该添加第二节点的过程可以参考前述步骤B1,本申请实施例对此不再赘述。在遍历至目标深度停止遍历过程,可以避免树模型的过度分裂,防止树层数过深。
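Steps B1 and B2 describe a traversal that stops at the first node whose current split cost drops below its historical split cost. The sketch below is one interpretation under stated assumptions: nodes are dictionaries, and the traversal descends only into the child selected by the node's historical split, whereas the text speaks more generally of traversing the nodes of the subtree. All names are hypothetical.

```python
import math
from typing import Dict, Optional, Tuple

Ranges = Dict[str, Tuple[float, float]]


def split_cost(ranges: Ranges) -> float:
    total_span = sum(hi - lo for lo, hi in ranges.values())
    return math.inf if total_span == 0 else 1.0 / total_span


def make_node(depth: int, ranges: Ranges, split_dim: Optional[str] = None,
              split_point: Optional[float] = None,
              left: Optional[dict] = None, right: Optional[dict] = None) -> dict:
    return {"depth": depth, "ranges": ranges, "hist_cost": split_cost(ranges),
            "split_dim": split_dim, "split_point": split_point,
            "left": left, "right": right}


def find_first_node(root: dict, sample: Dict[str, float],
                    target_depth: int) -> Optional[dict]:
    """Return the node at which a second node should be added, or None."""
    node = root
    while node is not None and node["depth"] <= target_depth:
        if node["left"] is None and node["right"] is None:
            return None                       # reached a leaf: stop traversal
        extended = {d: (min(lo, sample[d]), max(hi, sample[d]))
                    for d, (lo, hi) in node["ranges"].items()}
        if split_cost(extended) < node["hist_cost"]:
            return node                       # step B1 applies at this node
        # Step B2 (assumed): descend along the historical split.
        side = "left" if sample[node["split_dim"]] <= node["split_point"] else "right"
        node = node[side]
    return None                               # target depth reached
```

Only the stored value ranges and historical costs are read during the traversal; no historical sample is accessed.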
值得说明的是,在增量训练过程中,机器学习模型的历史训练样本集合指的是当前训练过程之前的训练样本集合,其相对于当前输入的第一训练样本而言的。例如,若该增量训练过程是步骤401之后首次增量训练过程,则该增量训练过程所述的历史训练样本集合与前述步骤401中的历史训练样本集合相同;若该增量训练过程是步骤401之后第w(w为大于1的整数)次增量训练过程,则该增量训练过程所述的历史训练样本集合为前述步骤401中的历史训练样本集合与前w-1次增量训练过程中输入的训练样本的集合。
本申请实施例提供训练方法,通过前述增量训练方法可以实现机器学习模型的在线增量训练,并且由于各个节点对应存储有节点信息,无需获取大量样本即可进行增量训练,从而实现了一种轻量级的机器学习模型。
值得说明的是,维护有机器学习模型的分析设备可以在达到模型精简条件时,进行机器学习模型的精简,使得精简后的机器学习模型的结构更为简单,在执行预测时,运算效率较高。在本申请实施例中,模型精简的原理实际上是寻找连通域的原理,即将机器学习模型中可以属于同一的连通域的被分割的空间进行合并。该模型精简过程包括:
将机器学习模型中第一非叶子节点和第二非叶子节点进行合并,第一叶子节点和第二叶子节点进行合并,得到精简后的机器学习模型,该精简后的机器学习模型用于进行分类结果的预测,其中,第一叶子节点为第一非叶子节点的子节点,第二叶子节点为第二非叶子节点的子节点,第一叶子节点和第二叶子节点包括相同的分类结果,且在同一特征维度上分配的历史训练样本集合的特征取值的跨度范围相邻。
如图16,图16假设机器学习模型对应的训练样本集合中的样本包括两个特征维度的特征数据,特征维度分别为特征维度x和特征维度y,该训练样本集合包括:样本M(a1,b1), N(a1,b2),Q(a2,b1),U(a4,b4)。第一次节点分裂的分裂维度为特征维度x,分裂点为a3,将二维样本所在的样本空间切割成了2个子空间,对应于图16即为Q8节点的左子树和右子树;第二次节点分裂的分裂维度为特征维度y,分裂点为b3,对应于图16即为Q9节点的左子树和右子树;第三次节点分裂的分裂维度为特征维度y,分裂点为b4,对应于图16即为Q10节点的左子树和右子树。由此可知,样本M(a1,b1),N(a1,b2),Q(a2,b1),U(a4,b4)所在空间被划分为子空间1-4共4个子空间。由图16右侧的空间分割示意图可知,子空间3和4对应的叶子节点Q91和Q101的分类结果均为c,两者的特征取值的跨度范围相邻,则可以将两者进行合并,形成一个连通域,而合并后的子空间并不影响机器学习模型的实际分类结果。由图16左侧的机器学习模型可知,非叶子节点Q9的叶子节点Q91和非叶子节点Q10的叶子节点Q101的标签相同,均为c,且在y轴的特征取值的跨度范围分别为相邻的[b4,b3]以及[b3,b2],因此,非叶子节点Q9和非叶子节点Q10分别为前述第一非叶子节点和第二非叶子节点,两者的叶子节点分别为前述的第一叶子节点和第二叶子节点。如图17所示,最终子空间3和子空间4合并形成新的子空间3,非叶子节点Q9和非叶子节点Q10合并形成的新的非叶子节点Q12,叶子节点Q91和叶子节点Q101合并形成新的叶子节点Q121。相应的节点信息也进行了合并。其中,节点信息的合并,实际上是节点信息中对应参数(即相同类型的参数)取并集。例如,前述的在y轴的跨度范围[b4,b3]以及[b3,b2]合并为[b4,b2]。
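The merge condition (same classification result, adjacent span ranges in the same feature dimension) and the union of ranges can be sketched as follows. The dictionary layout and function names are assumptions, and the numeric values 1, 2, 3 merely stand in for b4, b3, b2.

```python
from typing import Dict, Tuple

Leaf = Dict[str, object]


def can_merge(a: Leaf, b: Leaf, dim: str) -> bool:
    """Two leaves merge if they share a label and their ranges touch in dim."""
    if a["classification"] != b["classification"]:
        return False
    (a_lo, a_hi), (b_lo, b_hi) = a["ranges"][dim], b["ranges"][dim]
    return a_hi == b_lo or b_hi == a_lo


def merge_leaves(a: Leaf, b: Leaf, dim: str) -> Leaf:
    """Merge node information by taking the union of the ranges per dimension."""
    if not can_merge(a, b, dim):
        raise ValueError("leaves are not mergeable")
    ranges: Dict[str, Tuple[float, float]] = {
        d: (min(a["ranges"][d][0], b["ranges"][d][0]),
            max(a["ranges"][d][1], b["ranges"][d][1]))
        for d in a["ranges"]
    }
    return {"classification": a["classification"], "ranges": ranges}


# Q91 covers [b4, b3] and Q101 covers [b3, b2] on the y axis, both labelled c.
q91 = {"classification": "c", "ranges": {"y": (1.0, 2.0)}}
q101 = {"classification": "c", "ranges": {"y": (2.0, 3.0)}}
q121 = merge_leaves(q91, q101, "y")
```

The merged leaf covers [b4, b2] on the y axis, matching the example in the text.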
精简后的机器学习模型结构更为简单,减少了树的分支层数,防止树层数过深,虽然模型架构产生变化,但不影响其预测结果,可以节约存储空间,提高预测效率。并且通过精简过程可以防止模型的过拟合。可选地,该精简过程可以周期性执行,并且该精简过程需要从机器学习模型的底层,按照由下至上(也称深度由大到小)的顺序执行。
该模型精简过程可以在前述步骤401之后或405之后由第一分析设备执行,精简后的机器学习模型可以发送至局点分析设备,以供局点分析设备基于该机器学习模型进行样本分析,即分类结果的预测。由于精简后的模型本身的大小(即模型本身占用内存的大小)变小,采用该模型进行分类结果预测时,预测速度较未精简的模型会更快,预测效率更高,并且该模型的传输开销也相应减少。进一步的,该精简后的模型若仅用于样本分析,可以不在其节点信息中记录历史分裂信息,这样可以进一步减少模型本身的大小,提高模型预测效率。
值得说明的是,前述图8至17中,若当前述特征数据为时间序列的特征数据时,任一特征维度为前述数据特征和/或提取特征(其中的具体特征参考前述实施例)中的任意一种特征维度。当前述特征数据为本身具有一定特征的数据,例如网络KPI数据时,任一特征维度为前述任一KPI类别,如,将时延数据作为特征数据,则该特征数据的特征维度为时延,又如,将丢包率数据作为特征数据,则该特征数据的特征维度为丢包率。具体定义可以参考前述实施例以及图7中的解释,本申请实施例对此不再赘述。
由于模型在训练时,需要使用具有完整结构的机器学习模型,而不是精简后的机器学习模型。由此可知,前述步骤402中发送至局点分析设备的机器学习模型为步骤401直接得到的未进行精简的机器学习模型,以支持局点分析设备进行该机器学习模型的增量训练。在另一种可选方式中,前述步骤402中发送至局点分析设备的机器学习模型也可以是精简后的机器学习模型,但是该机器学习模型需要额外携带未进行合并的节点信息,这样局点分析设备可以基于精简后的机器学习模型和未进行合并的节点信息恢复得到未进行精简的机器学习模 型,以进行该机器学习模型的增量训练。
该模型精简过程也可以在前述步骤404之后由局点分析设备执行,精简后的机器学习模型可以用于进行样本分析,即分类结果的预测。后续再次进行增量训练时,采用的模型为未进行精简的机器学习模型。
需要说明的是,本申请前述实施例以局点分析设备直接基于从局点分析设备获取的第一训练样本集合,对机器学习模型进行增量训练为例进行说明。本申请实施例在实际实现时,前述局点分析设备还可以间接基于从局点分析设备获取的第一训练样本集合,对机器学习模型进行增量训练,在一种实现方式中,该局点分析设备可以将当前的机器学习模型和第一训练样本集合发送至第一分析设备,由第一分析设备基于该第一训练样本集合进行机器学习模型的增量训练,并将训练后的机器学习模型发送至局点分析设备。该增量训练过程可以参考前述步骤404,本申请实施例不再赘述;在另一种实现方式中,该局点分析设备可以将第一训练样本集合发送至第一分析设备,由第一分析设备将第一训练样本集合以及用于训练该机器学习模型的历史训练样本进行整合得到新的历史训练样本,并基于该历史训练样本,进行初始机器学习模型的离线训练,其训练结果与基于该第一训练样本集合进行增量训练的结果相同。该离线训练过程可以参考前述步骤401,本申请实施例不再赘述。
传统的模型训练方法,在进行离线训练后,机器学习模型一旦部署在局点分析设备,即无法进行增量训练。而本申请实施例提供的模型训练方法,机器学习模型支持增量训练,可以对新的训练样本有良好的自适应性。尤其对于异常检测场景,可以对新异常模式的出现、具有新标签的样本有很好的自适应性,训练后的模型能够精准检测不同的异常模式。从而实现模型的泛化,保证预测性能,有效提高用户体验。
进一步的,如果将传统的模型训练方法应用于本申请实施例提供的应用场景中,需要在机器学习模型所部署的局点分析设备大量采集样本,并且进行样本批量训练,由于机器学习模型的训练需要能够访问历史训练样本,因此还需要存储大量历史训练样本,导致消耗大量内存和计算资源,训练代价较大。
而本申请实施例中,在增量训练或离线训练过程中,基于训练样本集合的数值分布范围进行节点分裂,无需大量访问历史训练样本,因此,有效减少了内存和计算资源的占用,降低训练代价。并且通过前述节点信息携带各个节点的相关信息,可以实现机器学习模型的轻量化,更便于机器学习模型的部署,实现模型的有效泛化。
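Figure 18 is a schematic diagram of the incremental training effect of a conventional machine learning model, and Figure 19 is a schematic diagram of the incremental training effect of the machine learning model provided by the embodiments of this application. The horizontal axis shows the percentage of the total training samples that has been input, and the vertical axis shows a performance indicator of the model; the larger the indicator value, the better the model performs. Figures 18 and 19 assume a scenario of anomaly detection on router KPIs, in which the training samples obtained by one site analysis device are used as training sample set M to train both the conventional machine learning model and the machine learning model provided by the embodiments of this application. The incremental training is executed periodically, and each round inputs 10% of the samples of training sample set M. The performance of the conventional machine learning model trained incrementally fluctuates considerably and is unstable, whereas the performance of the model trained with the incremental training provided by the embodiments of this application rises gradually and then stays at about 90%, which is stable. The model training method provided by the embodiments of this application can therefore yield a machine learning model with stable performance, which supports generalization of the model.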
本申请实施例提供的模型训练方法的步骤先后顺序可以进行适当调整,步骤也可以根据 情况进行相应增减,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化的方法,都应涵盖在本申请的保护范围之内,因此不再赘述。
本申请实施例提供一种模型训练装置50,如图20所示,应用于局点分析设备,包括:
接收模块501,用于接收第一分析设备发送的机器学习模型;
增量训练模块502,用于基于第一训练样本集合,对所述机器学习模型进行增量训练,所述第一训练样本集合中的特征数据是所述局点分析设备所对应的局点网络的特征数据。
本申请实施例提供一种模型训练装置,接收模块接收第一分析设备发送的机器学习模型,增量训练模块基于从该局点分析设备所对应的局点网络获取的第一训练样本集合,对机器学习模型进行增量训练。一方面,第一训练样本集合中的特征数据是从该局点分析设备所对应的局点网络获取的特征数据,其更适配于局点分析设备的应用场景,采用包括局点分析设备从对应的局点网络获取的特征数据的第一训练样本集合进行模型训练,可以使得训练后的机器学习模型更适配于该局点分析设备自身的需求,实现模型的定制化,提高模型的应用灵活性;另一方面,通过离线训练和增量训练结合的方式来训练机器学习模型,可以在局点分析设备所获取的特征数据的类别或模式出现变化时,进行机器学习模型的增量训练,实现机器学习模型的灵活调整,从而保证训练得到的机器学习模型符合局点分析设备的需求。因此,本申请实施例提供的模型训练装置,相较于相关技术,能够有效适配于局点分析设备的需求。
可选地,如图21所示,所述装置50还包括:
预测模块503,用于在所述接收第一分析设备发送的机器学习模型之后,采用所述机器学习模型进行分类结果的预测;
第一发送模块504,用于向所述评估设备发送预测信息,所述预测信息包括预测得到的分类结果,以供所述评估设备基于所述预测信息评估所述机器学习模型是否发生劣化;
所述增量训练模块502,用于:
在接收到所述评估设备发送的训练指令后,基于所述第一训练样本集合,对所述机器学习模型进行增量训练,所述训练指令用于指示对所述机器学习模型进行训练。
可选地,所述机器学习模型用于对一个或多个关键绩效指标KPI特征数据组成的待预测数据进行分类结果的预测;所述KPI特征数据为KPI时间序列的特征数据,或者为KPI数据;
所述预测信息还包括所述待预测数据中的KPI特征数据对应的KPI类别,所述待预测数据所属的设备的标识以及所述待预测数据对应的KPI数据的采集时刻。
可选地,如图22所示,所述装置50还包括:
第二发送模块505,用于当增量训练后的机器学习模型的性能不满足性能达标条件时,向所述第一分析设备发送重训练请求,所述重训练请求用于请求所述第一分析设备对所述机器学习模型进行重训练。
可选地,所述机器学习模型为树模型,所述增量训练模块502,用于:
对于所述第一训练样本集合中的任一训练样本,从所述机器学习模型的根节点开始遍历,执行如下遍历过程:
当遍历到的第一节点的当前分裂成本小于第一节点的历史分裂成本时,添加关联的第二节点,所述第一节点为所述机器学习模型中的任一非叶子节点,所述第二节点为所述第一节点的父节点或子节点;
当第一节点的当前分裂成本不小于第一节点的历史分裂成本,遍历所述第一节点的子树中的节点,并将遍历到的节点确定为新的第一节点,再次执行所述遍历过程,直至遍历到的第一节点的当前分裂成本小于第一节点的历史分裂成本,或者遍历到目标深度;
其中,第一节点的当前分裂成本为所述第一节点基于第一训练样本进行节点分裂的成本,所述第一训练样本为所述第一训练样本集合中的任一训练样本,所述第一训练样本包括一个或多个特征维度的特征数据,所述特征数据为数值数据,所述第一节点的历史分裂成本为所述第一节点基于所述第一节点的历史训练样本集合进行节点分裂的成本,所述第一节点的历史训练样本集合为所述机器学习模型的历史训练样本集合中划分至所述第一节点的样本的集合。
可选地,所述第一节点的当前分裂成本与第一数值分布范围的大小负相关,所述第一数值分布范围是基于所述第一训练样本中的特征取值与第二数值分布范围确定的分布范围;所述第二数值分布范围为第一节点的历史训练样本集合中的特征取值的分布范围,所述第一节点的历史分裂成本与所述第二数值分布范围的大小负相关。
可选地,所述第一节点的当前分裂成本为所述第一数值分布范围中各特征维度上的特征取值的跨度之和的倒数,所述第一节点的历史分裂成本为所述第二数值分布范围中各特征维度上的特征取值的跨度之和的倒数。
可选地,所述增量训练模块502,用于:
确定所述第一数值分布范围在各特征维度上的特征取值的跨度范围;
基于第一分裂维度上的第一分裂点添加所述第二节点,其中,所述第一数值分布范围中在所述第一分裂维度上数值不大于所述第一分裂点数值的数值范围划分至所述第二节点的左子节点,所述第一数值分布范围中在所述第一分裂维度上数值大于所述第一分裂点数值的数值范围划分至所述第二节点的右子节点,所述第一分裂维度为基于所述各特征维度上的特征取值的跨度范围,在所述各特征维度中确定的分裂维度,所述第一分裂点为在所述第一数值分布范围的所述第一分裂维度上确定的用于分裂的数值点;
当所述第一分裂维度与第二分裂维度不同时,所述第二节点为所述第一节点的父节点或子节点,所述第二分裂维度为所述第一节点在所述机器学习模型中的历史分裂维度,所述第二分裂点为所述第一节点在所述机器学习模型中的历史分裂点;
当所述第一分裂维度与所述第二分裂维度相同,且所述第一分裂点位于所述第二分裂点右侧,所述第二节点为所述第一节点的父节点,且所述第一节点为所述第二节点的左子节点;
当所述第一分裂维度与所述第二分裂维度相同,且所述第一分裂点位于所述第二分裂点左侧,所述第二节点为所述第一节点左子节点。
可选地,所述第一分裂维度为在所述第一数值分布范围的各特征维度中随机选择的特征维度,或,所述第一分裂维度为所述第一数值分布范围的各特征维度中跨度最大的特征维度;
和/或,所述第一分裂点为在所述第一数值分布范围的所述第一分裂维度上随机选择的数值点。
可选地,所述增量训练模块502,用于:
当所述第一节点的历史训练样本集合中的样本的数量与所述第一训练样本的数量之和大于第一样本数阈值,添加所述第二节点;
所述装置还包括:
停止模块,用于当所述第一节点的历史训练样本集合中的样本的数量与所述第一训练样本的数量之和不大于所述第一样本数阈值,停止所述机器学习模型的增量训练。
在一种可选实现方式中,如图23所示,所述装置50还包括:
合并模块506,用于将所述机器学习模型中第一非叶子节点和第二非叶子节点进行合并,第一叶子节点和第二叶子节点进行合并,得到精简后的机器学习模型,所述精简后的机器学习模型用于进行分类结果的预测。
在另一种可选实现方式中,所述接收模块501,还用于接收所述第一分析设备发送的精简后的机器学习模型,所述精简后的机器学习模型是所述第一分析设备将所述机器学习模型中第一非叶子节点和第二非叶子节点进行合并,第一叶子节点和第二叶子节点进行合并得到的;
其中,所述第一叶子节点为所述第一非叶子节点的子节点,所述第二叶子节点为所述第二非叶子节点的子节点,所述第一叶子节点和所述第二叶子节点包括相同的分类结果,且在同一特征维度上分配的历史训练样本集合的特征取值的跨度范围相邻。
可选地,所述机器学习模型中的每个节点对应存储有节点信息,所述机器学习模型中的任一节点的所述节点信息包括标签分布信息,所述标签分布信息用于反映划分至对应节点中的历史训练样本集合中样本的不同类别的标签在标签总数中的占比,所述标签总数是划分至所述任一节点的历史训练样本集合中样本对应的标签的总数量,任一非叶子节点的节点信息还包括历史分裂信息,所述历史分裂信息为对应节点用于分裂的信息。
可选地,所述历史分裂信息包括:对应节点在机器学习模型中的位置信息,分裂维度,分裂点,划分至对应节点的历史训练样本集合的数值分布范围和历史分裂成本;
所述标签分布信息包括:划分至对应节点的历史训练样本集合中样本的同一类别的标签个数和所述标签总数;或,划分至对应节点的历史训练样本集合中样本的不同类别的标签在所述标签总数中的占比。
可选地,所述第一训练样本集合包括在所述局点分析设备获取的样本中筛选得到的满足低区分度条件的样本,所述低区分度条件包括以下至少一者:
采用所述机器学习模型预测样本得到的目标概率集合中任意两个概率的差值的绝对值小于第一差值阈值,所述目标概率集合包括按照概率的大小降序排列的前n个分类结果的概率,1<n<m,m为所述机器学习模型预测样本得到的概率总数;
或者,采用所述机器学习模型预测样本得到的概率中任意两个概率的差值的绝对值小于第二差值阈值;
或者,采用所述机器学习模型预测样本的多种分类结果的概率中最高概率和最低概率的差值的绝对值小于第三差值阈值;
或者,采用所述机器学习模型预测样本的多种分类结果的概率中任意两个概率的差值的绝对值小于第四差值阈值;
或者,采用所述机器学习模型预测样本的多种分类结果的概率分布熵E大于指定分布熵阈值,所述E满足:
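$$E = -\sum_{i} P(x_i)\,\log_b P(x_i)$$
where $x_i$ denotes the i-th classification result, $P(x_i)$ denotes the predicted probability of the i-th classification result of the sample, b is the specified base, and $0 \le P(x_i) \le 1$.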
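The entropy-based low-discrimination condition can be illustrated as follows. The sketch assumes the standard entropy of a probability distribution with base b, treats terms with zero probability as contributing nothing, and uses hypothetical function names and threshold values.

```python
import math
from typing import Sequence


def distribution_entropy(probs: Sequence[float], base: float = 2.0) -> float:
    """E = -sum(P(x_i) * log_b P(x_i)); zero-probability terms are skipped."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)


def is_low_discrimination(probs: Sequence[float], entropy_threshold: float,
                          base: float = 2.0) -> bool:
    """A sample qualifies when its prediction entropy exceeds the threshold."""
    return distribution_entropy(probs, base) > entropy_threshold


uncertain = is_low_discrimination([0.5, 0.5], entropy_threshold=0.9)
confident = is_low_discrimination([0.99, 0.01], entropy_threshold=0.9)
```

A prediction that spreads probability evenly over the classification results has high entropy and is selected for incremental training, while a confident prediction is not.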
本申请实施例提供一种模型训练装置60,如图24所示,应用于第一分析设备,包括:
离线训练模块601,用于基于历史训练样本集合进行离线训练,得到机器学习模型;
发送模块602,用于向多个局点分析设备发送所述机器学习模型,以供所述局点分析设备基于第一训练样本集合,对所述机器学习模型进行增量训练,任一局点分析设备用于训练所述机器学习模型的训练样本集合中的特征数据是所述任一局点分析设备所对应的局点网络的特征数据。
发送模块可以将离线训练模块训练得到的机器学习模型分发至每个局点分析设备中,由各个局点分析设备进行增量训练,保证各个局点分析设备上的机器学习模型的性能。如此,第一分析设备无需为每个局点分析设备都训练对应的机器学习模型,有效减少了第一分析设备的整体训练时长,且离线训练得到的模型在各个局点分析设备均可以作为增量训练的基础,提高了离线训练得到的模型的通用性,从而实现模型泛化,降低了第一分析设备的整体训练成本。
并且,局点分析设备接收第一分析设备发送的机器学习模型,并可以基于从该局点分析设备所对应的局点网络获取的第一训练样本集合,对机器学习模型进行增量训练。一方面,第一训练样本集合中的特征数据是从该局点分析设备所对应的局点网络获取的特征数据,其更适配于局点分析设备的应用场景,采用包括局点分析设备从对应的局点网络获取的特征数据的第一训练样本集合进行模型训练,可以使得训练后的机器学习模型更适配于该局点分析设备自身的需求,实现模型的定制化,提高模型的应用灵活性;另一方面,通过离线训练和增量训练结合的方式来训练机器学习模型,可以在局点分析设备所获取的特征数据的类别或模式出现变化时,进行机器学习模型的增量训练,实现机器学习模型的灵活调整,从而保证训练得到的机器学习模型符合局点分析设备的需求。因此,本申请实施例提供的模型训练方法,相较于相关技术,能够有效适配于局点分析设备的需求。
可选地,所述历史训练样本集合是多个所述局点分析设备发送的训练样本的集合。
如图25所示,所述装置60还包括:
接收模块603,用于:
在向局点分析设备发送所述机器学习模型之后,接收所述局点分析设备发送的重训练请求,基于发送所述重训练请求的局点分析设备所发送的训练样本集合,对所述机器学习模型进行重训练;
或者,接收所述局点分析设备发送的重训练请求,基于发送所述重训练请求的局点分析设备所发送的训练样本集合以及其他局点分析设备所发送的训练样本集合,对所述机器学习模型进行重训练;
或者,接收至少两个所述局点分析设备发送的训练样本集合,并基于接收到的训练样本集合,重训练所述机器学习模型。
可选地,所述机器学习模型为树模型,所述离线训练模块,用于:
获取已确定标签的历史训练样本集合,所述历史训练样本集合中的训练样本包括一个或多个特征维度的特征数据,所述特征数据为数值数据;
创建根节点;
将所述根节点作为第三节点,执行离线训练过程直至达到分裂截止条件;
为每个叶子节点确定分类结果,得到所述机器学习模型;
其中,所述离线训练过程包括:
进行所述第三节点的分裂,得到所述第三节点的左子节点和右子节点;
将所述左子节点作为更新后的第三节点,将所述历史训练样本集合划分至所述左子节点的左样本集合作为更新后的历史训练样本集合,再次执行所述离线训练过程;
将所述右子节点作为更新后的第三节点,将所述历史训练样本集合划分至所述右子节点的右样本集合作为更新后的历史训练样本集合,再次执行所述离线训练过程。
可选地,所述离线训练模块601,用于:
基于所述历史训练样本集合的数值分布范围进行所述第三节点的分裂,得到所述第三节点的左子节点和右子节点,所述历史训练样本集合的数值分布范围为所述历史训练样本集合中的特征取值的分布范围。
可选地,所述离线训练模块601,用于:
在所述历史训练样本集合的各特征维度中确定第三分裂维度;
在所述历史训练样本集合的所述第三分裂维度上确定第三分裂点;
将第三数值分布范围中在所述第三分裂维度上数值不大于所述第三分裂点数值的数值范围划分至所述左子节点,所述第三数值分布范围中在所述第三分裂维度上数值大于所述第三分裂点数值的数值范围划分至所述右子节点,所述第三数值分布范围为所述第三节点的历史训练样本集中的特征取值的分布范围。
可选地,所述分裂截止条件包括以下至少一者:
所述第三节点的当前分裂成本大于分裂成本阈值;
或者,所述历史训练样本集合的样本的数量小于第二样本数阈值;
或者,所述第三节点对应的分裂次数大于分裂次数阈值;
或者,所述第三节点在所述机器学习模型中的深度大于深度阈值;
或者,所述历史训练样本集合所对应的标签中占比最大的标签的数量在所述历史训练样本集合所对应的标签的标签总数中的占比大于指定占比阈值。
可选地,所述第三节点的当前分裂成本与所述历史训练样本集合中的特征取值的分布范围的大小负相关。
可选地,所述第三节点的当前分裂成本为所述历史训练样本集合在各特征维度上的特征取值的跨度之和的倒数。
可选地,如图26所示,所述装置60还包括:
合并模块604,用于将所述机器学习模型中第一非叶子节点和第二非叶子节点进行合并,第一叶子节点和第二叶子节点进行合并,得到精简后的机器学习模型,所述精简后的机器学习模型用于进行分类结果的预测,其中,所述第一叶子节点为所述第一非叶子节点的子节点,所述第二叶子节点为所述第二非叶子节点的子节点,所述第一叶子节点和所述第二叶子节点包括相同的分类结果,且在同一特征维度上分配的历史训练样本集合的特征取值的跨度范围相邻;
所述发送模块602,还用于向所述局点分析设备发送所述精简后的机器学习模型,以供所述局点分析设备基于所述精简后的机器学习模型进行分类结果的预测。
可选地,所述机器学习模型中的每个节点对应存储有节点信息,所述机器学习模型中的任一节点的所述节点信息包括标签分布信息,所述标签分布信息用于反映划分至对应节点中的历史训练样本集合中样本的不同类别的标签在标签总数中的占比,所述标签总数是划分至所述任一节点的历史训练样本集合中样本对应的标签的总数量,任一非叶子节点的节点信息还包括历史分裂信息,所述历史分裂信息为对应节点用于分裂的信息。
可选地,所述历史分裂信息包括:对应节点在机器学习模型中的位置信息,分裂维度,分裂点,划分至对应节点的历史训练样本集合的数值分布范围和历史分裂成本;
所述标签分布信息包括:划分至对应节点的历史训练样本集合中样本的同一类别的标签个数和所述标签总数;或,划分至对应节点的历史训练样本集合中样本的不同类别的标签在所述标签总数中的占比。
可选地,所述第一训练样本集合包括在所述局点分析设备获取的样本中筛选得到的满足低区分度条件的样本,所述低区分度条件包括以下至少一者:
采用所述机器学习模型预测样本得到的目标概率集合中任意两个概率的差值的绝对值小于第一差值阈值,所述目标概率集合包括按照概率的大小降序排列的前n个分类结果的概率,1<n<m,m为所述机器学习模型预测样本得到的概率总数;
或者,采用所述机器学习模型预测样本得到的概率中任意两个概率的差值的绝对值小于第二差值阈值;
或者,采用所述机器学习模型预测样本的多种分类结果的概率中最高概率和最低概率的差值的绝对值小于第三差值阈值;
或者,采用所述机器学习模型预测样本的多种分类结果的概率中任意两个概率的差值的绝对值小于第四差值阈值;
或者,采用所述机器学习模型预测样本的多种分类结果的概率分布熵E大于指定分布熵阈值,所述E满足:
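$$E = -\sum_{i} P(x_i)\,\log_b P(x_i)$$
where $x_i$ denotes the i-th classification result, $P(x_i)$ denotes the predicted probability of the i-th classification result of the sample, b is the specified base, and $0 \le P(x_i) \le 1$.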
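Figure 27 is a block diagram of a model training apparatus provided by the embodiments of this application. The model training apparatus may be the aforementioned analysis device, for example the site analysis device or the first analysis device. As shown in Figure 27, the analysis device 70 includes a processor 701 and a memory 702.
The memory 702 is configured to store a computer program, and the computer program includes program instructions;
The processor 701 is configured to invoke the computer program to implement the model training method provided by the embodiments of this application.
Optionally, the analysis device 70 further includes a communication bus 703 and a communication interface 704.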
其中,处理器701包括一个或者一个以上处理核心,处理器701通过运行计算机程序,从而执行各种功能应用以及数据处理。
存储器702可用于存储计算机程序。可选地,存储器可存储操作系统和至少一个功能所需的应用程序单元。操作系统可以是实时操作系统(Real Time eXecutive,RTX)、LINUX、UNIX、WINDOWS或OS X之类的操作系统。
通信接口704可以为多个,通信接口704用于与其它存储设备或网络设备进行通信。例 如在本申请实施例中,通信接口704可以用于接收通信网络中的网络设备发送的样本数据。
存储器702与通信接口704分别通过通信总线703与处理器701连接。
本申请实施例提供了一种计算机存储介质,计算机存储介质上存储有指令,当指令被处理器执行时,实现本申请实施例提供的模型训练方法。
本申请实施例提供了一种模型训练系统,包括:第一分析设备和多个局点分析设备;
该第一分析设备包括前述实施例任一所述的模型训练装置;所述局点分析设备包括前述实施例任一所述的模型训练装置。示例的,该模型训练系统中各个设备的部署可以参考前述图1至图3所示的应用场景中的各个设备的部署,例如,该模型训练系统还包括:网络设备、评估设备、存储设备和管理设备中的一者或多者,相关设备的介绍参考前述图1至图3,本申请实施例对此不做赘述。
本申请实施例中,A参考B,指的是A可以与B相同,也可以在B的基础上做简单变形。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现,所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机的可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线)或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质(例如,软盘、硬盘、磁带)、光介质,或者半导体介质(例如固态硬盘)等。
以上所述仅为本申请的可选实施例,并不用以限制本申请,凡在本申请的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。

Claims (57)

  1. 一种模型训练方法,其特征在于,应用于局点分析设备,包括:
    接收第一分析设备发送的机器学习模型;
    基于第一训练样本集合,对所述机器学习模型进行增量训练,所述第一训练样本集合中的特征数据是所述局点分析设备所对应的局点网络的特征数据。
  2. 根据权利要求1所述的方法,其特征在于,在所述接收第一分析设备发送的机器学习模型之后,所述方法还包括:
    采用所述机器学习模型进行分类结果的预测;
    向所述评估设备发送预测信息,所述预测信息包括预测得到的分类结果,以供所述评估设备基于所述预测信息评估所述机器学习模型是否发生劣化;
    所述基于第一训练样本集合,对所述机器学习模型进行增量训练,包括:
    在接收到所述评估设备发送的训练指令后,基于所述第一训练样本集合,对所述机器学习模型进行增量训练,所述训练指令用于指示对所述机器学习模型进行训练。
  3. 根据权利要求2所述的方法,其特征在于,所述机器学习模型用于对一个或多个关键绩效指标KPI特征数据组成的待预测数据进行分类结果的预测;所述KPI特征数据为KPI时间序列的特征数据,或者为KPI数据;
    所述预测信息还包括所述待预测数据中的KPI特征数据对应的KPI类别,所述待预测数据所属的设备的标识以及所述待预测数据对应的KPI数据的采集时刻。
  4. 根据权利要求1至3任一所述的方法,其特征在于,所述方法还包括:
    当增量训练后的机器学习模型的性能不满足性能达标条件时,向所述第一分析设备发送重训练请求,所述重训练请求用于请求所述第一分析设备对所述机器学习模型进行重训练。
  5. 根据权利要求1至4任一所述的方法,其特征在于,所述机器学习模型为树模型,所述基于第一训练样本集合,对所述机器学习模型进行增量训练,包括:
    对于所述第一训练样本集合中的任一训练样本,从所述机器学习模型的根节点开始遍历,执行如下遍历过程:
    当遍历到的第一节点的当前分裂成本小于第一节点的历史分裂成本时,添加关联的第二节点,所述第一节点为所述机器学习模型中的任一非叶子节点,所述第二节点为所述第一节点的父节点或子节点;
    当第一节点的当前分裂成本不小于第一节点的历史分裂成本,遍历所述第一节点的子树中的节点,并将遍历到的节点确定为新的第一节点,再次执行所述遍历过程,直至遍历到的第一节点的当前分裂成本小于第一节点的历史分裂成本,或者遍历到目标深度;
    其中,第一节点的当前分裂成本为所述第一节点基于第一训练样本进行节点分裂的成本,所述第一训练样本为所述第一训练样本集合中的任一训练样本,所述第一训练样本包括一个或多个特征维度的特征数据,所述特征数据为数值数据,所述第一节点的历史分裂成本为所 述第一节点基于所述第一节点的历史训练样本集合进行节点分裂的成本,所述第一节点的历史训练样本集合为所述机器学习模型的历史训练样本集合中划分至所述第一节点的样本的集合。
  6. 根据权利要求5所述的方法,其特征在于,所述第一节点的当前分裂成本与第一数值分布范围的大小负相关,所述第一数值分布范围是基于所述第一训练样本中的特征取值与第二数值分布范围确定的分布范围;所述第二数值分布范围为第一节点的历史训练样本集合中的特征取值的分布范围,所述第一节点的历史分裂成本与所述第二数值分布范围的大小负相关。
  7. 根据权利要求6所述的方法,其特征在于,所述第一节点的当前分裂成本为所述第一数值分布范围中各特征维度上的特征取值的跨度之和的倒数,所述第一节点的历史分裂成本为所述第二数值分布范围中各特征维度上的特征取值的跨度之和的倒数。
  8. 根据权利要求6所述的方法,其特征在于,所述添加关联的第二节点,包括:
    确定所述第一数值分布范围在各特征维度上的特征取值的跨度范围;
    基于第一分裂维度上的第一分裂点添加所述第二节点,其中,所述第一数值分布范围中在所述第一分裂维度上数值不大于所述第一分裂点数值的数值范围划分至所述第二节点的左子节点,所述第一数值分布范围中在所述第一分裂维度上数值大于所述第一分裂点数值的数值范围划分至所述第二节点的右子节点,所述第一分裂维度为基于所述各特征维度上的特征取值的跨度范围,在所述各特征维度中确定的分裂维度,所述第一分裂点为在所述第一数值分布范围的所述第一分裂维度上确定的用于分裂的数值点;
    当所述第一分裂维度与第二分裂维度不同时,所述第二节点为所述第一节点的父节点或子节点,所述第二分裂维度为所述第一节点在所述机器学习模型中的历史分裂维度,所述第二分裂点为所述第一节点在所述机器学习模型中的历史分裂点;
    当所述第一分裂维度与所述第二分裂维度相同,且所述第一分裂点位于所述第二分裂点右侧,所述第二节点为所述第一节点的父节点,且所述第一节点为所述第二节点的左子节点;
    当所述第一分裂维度与所述第二分裂维度相同,且所述第一分裂点位于所述第二分裂点左侧,所述第二节点为所述第一节点左子节点。
  9. 根据权利要求6所述的方法,其特征在于,所述第一分裂维度为在所述第一数值分布范围的各特征维度中随机选择的特征维度,或,所述第一分裂维度为所述第一数值分布范围的各特征维度中跨度最大的特征维度;
    和/或,所述第一分裂点为在所述第一数值分布范围的所述第一分裂维度上随机选择的数值点。
  10. 根据权利要求5至9任一所述的方法,其特征在于,所述添加关联的第二节点,包括:
    当所述第一节点的历史训练样本集合中的样本的数量与所述第一训练样本的数量之和大 于第一样本数阈值,添加所述第二节点;
    所述方法还包括:
    当所述第一节点的历史训练样本集合中的样本的数量与所述第一训练样本的数量之和不大于所述第一样本数阈值,停止所述机器学习模型的增量训练。
  11. 根据权利要求5至10任一所述的方法,其特征在于,所述方法还包括:
    将所述机器学习模型中第一非叶子节点和第二非叶子节点进行合并,第一叶子节点和第二叶子节点进行合并,得到精简后的机器学习模型,所述精简后的机器学习模型用于进行分类结果的预测;
    或者,接收所述第一分析设备发送的精简后的机器学习模型,所述精简后的机器学习模型是所述第一分析设备将所述机器学习模型中第一非叶子节点和第二非叶子节点进行合并,第一叶子节点和第二叶子节点进行合并得到的;
    其中,所述第一叶子节点为所述第一非叶子节点的子节点,所述第二叶子节点为所述第二非叶子节点的子节点,所述第一叶子节点和所述第二叶子节点包括相同的分类结果,且在同一特征维度上分配的历史训练样本集合的特征取值的跨度范围相邻。
  12. 根据权利要求5至11任一所述的方法,其特征在于,所述机器学习模型中的每个节点对应存储有节点信息,所述机器学习模型中的任一节点的所述节点信息包括标签分布信息,所述标签分布信息用于反映划分至对应节点中的历史训练样本集合中样本的不同类别的标签在标签总数中的占比,所述标签总数是划分至所述任一节点的历史训练样本集合中样本对应的标签的总数量,任一非叶子节点的节点信息还包括历史分裂信息,所述历史分裂信息为对应节点用于分裂的信息。
  13. 根据权利要求12所述的方法,其特征在于,所述历史分裂信息包括:对应节点在机器学习模型中的位置信息,分裂维度,分裂点,划分至对应节点的历史训练样本集合的数值分布范围和历史分裂成本;
    所述标签分布信息包括:划分至对应节点的历史训练样本集合中样本的同一类别的标签个数和所述标签总数;或,划分至对应节点的历史训练样本集合中样本的不同类别的标签在所述标签总数中的占比。
  14. 根据权利要求1至13任一所述的方法,其特征在于,所述第一训练样本集合包括在所述局点分析设备获取的样本中筛选得到的满足低区分度条件的样本,所述低区分度条件包括以下至少一者:
    采用所述机器学习模型预测样本得到的目标概率集合中任意两个概率的差值的绝对值小于第一差值阈值,所述目标概率集合包括按照概率的大小降序排列的前n个分类结果的概率,1<n<m,m为所述机器学习模型预测样本得到的概率总数;
    或者,采用所述机器学习模型预测样本得到的概率中任意两个概率的差值的绝对值小于第二差值阈值;
    或者,采用所述机器学习模型预测样本的多种分类结果的概率中最高概率和最低概率的 差值的绝对值小于第三差值阈值;
    或者,采用所述机器学习模型预测样本的多种分类结果的概率中任意两个概率的差值的绝对值小于第四差值阈值;
    或者,采用所述机器学习模型预测样本的多种分类结果的概率分布熵E大于指定分布熵阈值,所述E满足:
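    $$E = -\sum_{i} P(x_i)\,\log_b P(x_i)$$
    where $x_i$ denotes the i-th classification result, $P(x_i)$ denotes the predicted probability of the i-th classification result of the sample, b is the specified base, and $0 \le P(x_i) \le 1$.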
  15. 一种模型训练方法,其特征在于,应用于第一分析设备,包括:
    基于历史训练样本集合进行离线训练,得到机器学习模型;
    向多个局点分析设备发送所述机器学习模型,以供所述局点分析设备基于第一训练样本集合,对所述机器学习模型进行增量训练,任一局点分析设备用于训练所述机器学习模型的训练样本集合中的特征数据是所述任一局点分析设备所对应的局点网络的特征数据。
  16. 根据权利要求15所述的方法,其特征在于,所述历史训练样本集合是多个所述局点分析设备发送的训练样本的集合。
  17. 根据权利要求15所述的方法,其特征在于,在向局点分析设备发送所述机器学习模型之后,所述方法还包括:
    接收所述局点分析设备发送的重训练请求,基于发送所述重训练请求的局点分析设备所发送的训练样本集合,对所述机器学习模型进行重训练;
    或者,接收所述局点分析设备发送的重训练请求,基于发送所述重训练请求的局点分析设备所发送的训练样本集合以及其他局点分析设备所发送的训练样本集合,对所述机器学习模型进行重训练;
    或者,接收至少两个所述局点分析设备发送的训练样本集合,并基于接收到的训练样本集合,重训练所述机器学习模型。
  18. 根据权利要求15至17任一所述的方法,其特征在于,所述机器学习模型为树模型,所述基于历史训练样本集合进行离线训练,得到机器学习模型,包括:
    获取已确定标签的历史训练样本集合,所述历史训练样本集合中的训练样本包括一个或多个特征维度的特征数据,所述特征数据为数值数据;
    创建根节点;
    将所述根节点作为第三节点,执行离线训练过程直至达到分裂截止条件;
    为每个叶子节点确定分类结果,得到所述机器学习模型;
    其中,所述离线训练过程包括:
    进行所述第三节点的分裂,得到所述第三节点的左子节点和右子节点;
    将所述左子节点作为更新后的第三节点,将所述历史训练样本集合划分至所述左子节点 的左样本集合作为更新后的历史训练样本集合,再次执行所述离线训练过程;
    将所述右子节点作为更新后的第三节点,将所述历史训练样本集合划分至所述右子节点的右样本集合作为更新后的历史训练样本集合,再次执行所述离线训练过程。
  19. 根据权利要求18所述的方法,其特征在于,所述进行所述第三节点的分裂,得到所述第三节点的左子节点和右子节点,包括:
    基于所述历史训练样本集合的数值分布范围进行所述第三节点的分裂,得到所述第三节点的左子节点和右子节点,所述历史训练样本集合的数值分布范围为所述历史训练样本集合中的特征取值的分布范围。
  20. 根据权利要求19所述的方法,其特征在于,所述基于所述历史训练样本集合的数值分布范围进行所述第三节点的分裂,得到所述第三节点的左子节点和右子节点,包括:
    在所述历史训练样本集合的各特征维度中确定第三分裂维度;
    在所述历史训练样本集合的所述第三分裂维度上确定第三分裂点;
    将第三数值分布范围中在所述第三分裂维度上数值不大于所述第三分裂点数值的数值范围划分至所述左子节点,所述第三数值分布范围中在所述第三分裂维度上数值大于所述第三分裂点数值的数值范围划分至所述右子节点,所述第三数值分布范围为所述第三节点的历史训练样本集中的特征取值的分布范围。
  21. 根据权利要求18至20任一所述的方法,其特征在于,所述分裂截止条件包括以下至少一者:
    所述第三节点的当前分裂成本大于分裂成本阈值;
    或者,所述历史训练样本集合的样本的数量小于第二样本数阈值;
    或者,所述第三节点对应的分裂次数大于分裂次数阈值;
    或者,所述第三节点在所述机器学习模型中的深度大于深度阈值;
    或者,所述历史训练样本集合所对应的标签中占比最大的标签的数量在所述历史训练样本集合所对应的标签的标签总数中的占比大于指定占比阈值。
  22. 根据权利要求21所述的方法,其特征在于,所述第三节点的当前分裂成本与所述历史训练样本集合中的特征取值的分布范围的大小负相关。
  23. 根据权利要求22所述的方法,其特征在于,所述第三节点的当前分裂成本为所述历史训练样本集合在各特征维度上的特征取值的跨度之和的倒数。
  24. 根据权利要求18至23任一所述的方法,其特征在于,所述方法还包括:
    将所述机器学习模型中第一非叶子节点和第二非叶子节点进行合并,第一叶子节点和第二叶子节点进行合并,得到精简后的机器学习模型,所述精简后的机器学习模型用于进行分类结果的预测,其中,所述第一叶子节点为所述第一非叶子节点的子节点,所述第二叶子节点为所述第二非叶子节点的子节点,所述第一叶子节点和所述第二叶子节点包括相同的分类 结果,且在同一特征维度上分配的历史训练样本集合的特征取值的跨度范围相邻;
    向所述局点分析设备发送所述精简后的机器学习模型,以供所述局点分析设备基于所述精简后的机器学习模型进行分类结果的预测。
  25. 根据权利要求18至24任一所述的方法,其特征在于,所述机器学习模型中的每个节点对应存储有节点信息,所述机器学习模型中的任一节点的所述节点信息包括标签分布信息,所述标签分布信息用于反映划分至对应节点中的历史训练样本集合中样本的不同类别的标签在标签总数中的占比,所述标签总数是划分至所述任一节点的历史训练样本集合中样本对应的标签的总数量,任一非叶子节点的节点信息还包括历史分裂信息,所述历史分裂信息为对应节点用于分裂的信息。
  26. 根据权利要求25所述的方法,其特征在于,所述历史分裂信息包括:对应节点在机器学习模型中的位置信息,分裂维度,分裂点,划分至对应节点的历史训练样本集合的数值分布范围和历史分裂成本;
    所述标签分布信息包括:划分至对应节点的历史训练样本集合中样本的同一类别的标签个数和所述标签总数;或,划分至对应节点的历史训练样本集合中样本的不同类别的标签在所述标签总数中的占比。
  27. 根据权利要求15至26任一所述的方法,其特征在于,所述第一训练样本集合包括在所述局点分析设备获取的样本中筛选得到的满足低区分度条件的样本,所述低区分度条件包括以下至少一者:
    采用所述机器学习模型预测样本得到的目标概率集合中任意两个概率的差值的绝对值小于第一差值阈值,所述目标概率集合包括按照概率的大小降序排列的前n个分类结果的概率,1<n<m,m为所述机器学习模型预测样本得到的概率总数;
    或者,采用所述机器学习模型预测样本得到的概率中任意两个概率的差值的绝对值小于第二差值阈值;
    或者,采用所述机器学习模型预测样本的多种分类结果的概率中最高概率和最低概率的差值的绝对值小于第三差值阈值;
    或者,采用所述机器学习模型预测样本的多种分类结果的概率中任意两个概率的差值的绝对值小于第四差值阈值;
    或者,采用所述机器学习模型预测样本的多种分类结果的概率分布熵E大于指定分布熵阈值,所述E满足:
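    $$E = -\sum_{i} P(x_i)\,\log_b P(x_i)$$
    where $x_i$ denotes the i-th classification result, $P(x_i)$ denotes the predicted probability of the i-th classification result of the sample, b is the specified base, and $0 \le P(x_i) \le 1$.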
  28. 一种模型训练装置,其特征在于,应用于局点分析设备,包括:
    接收模块,用于接收第一分析设备发送的机器学习模型;
    增量训练模块,用于基于第一训练样本集合,对所述机器学习模型进行增量训练,所述 第一训练样本集合中的特征数据是所述局点分析设备所对应的局点网络的特征数据。
  29. 根据权利要求28所述的装置,其特征在于,所述装置还包括:
    预测模块,用于在所述接收第一分析设备发送的机器学习模型之后,采用所述机器学习模型进行分类结果的预测;
    第一发送模块,用于向所述评估设备发送预测信息,所述预测信息包括预测得到的分类结果,以供所述评估设备基于所述预测信息评估所述机器学习模型是否发生劣化;
    所述增量训练模块,用于:
    在接收到所述评估设备发送的训练指令后,基于所述第一训练样本集合,对所述机器学习模型进行增量训练,所述训练指令用于指示对所述机器学习模型进行训练。
  30. 根据权利要求29所述的装置,其特征在于,所述机器学习模型用于对一个或多个关键绩效指标KPI特征数据组成的待预测数据进行分类结果的预测;所述KPI特征数据为KPI时间序列的特征数据,或者为KPI数据;
    所述预测信息还包括所述待预测数据中的KPI特征数据对应的KPI类别,所述待预测数据所属的设备的标识以及所述待预测数据对应的KPI数据的采集时刻。
  31. 根据权利要求28至30任一所述的装置,其特征在于,所述装置还包括:
    第二发送模块,用于当增量训练后的机器学习模型的性能不满足性能达标条件时,向所述第一分析设备发送重训练请求,所述重训练请求用于请求所述第一分析设备对所述机器学习模型进行重训练。
  32. 根据权利要求28至31任一所述的装置,其特征在于,所述机器学习模型为树模型,所述增量训练模块,用于:
    对于所述第一训练样本集合中的任一训练样本,从所述机器学习模型的根节点开始遍历,执行如下遍历过程:
    当遍历到的第一节点的当前分裂成本小于第一节点的历史分裂成本时,添加关联的第二节点,所述第一节点为所述机器学习模型中的任一非叶子节点,所述第二节点为所述第一节点的父节点或子节点;
    当第一节点的当前分裂成本不小于第一节点的历史分裂成本,遍历所述第一节点的子树中的节点,并将遍历到的节点确定为新的第一节点,再次执行所述遍历过程,直至遍历到的第一节点的当前分裂成本小于第一节点的历史分裂成本,或者遍历到目标深度;
    其中,第一节点的当前分裂成本为所述第一节点基于第一训练样本进行节点分裂的成本,所述第一训练样本为所述第一训练样本集合中的任一训练样本,所述第一训练样本包括一个或多个特征维度的特征数据,所述特征数据为数值数据,所述第一节点的历史分裂成本为所述第一节点基于所述第一节点的历史训练样本集合进行节点分裂的成本,所述第一节点的历史训练样本集合为所述机器学习模型的历史训练样本集合中划分至所述第一节点的样本的集合。
  33. 根据权利要求32所述的装置,其特征在于,所述第一节点的当前分裂成本与第一数值分布范围的大小负相关,所述第一数值分布范围是基于所述第一训练样本中的特征取值与第二数值分布范围确定的分布范围;所述第二数值分布范围为第一节点的历史训练样本集合中的特征取值的分布范围,所述第一节点的历史分裂成本与所述第二数值分布范围的大小负相关。
  34. 根据权利要求33所述的装置,其特征在于,所述第一节点的当前分裂成本为所述第一数值分布范围中各特征维度上的特征取值的跨度之和的倒数,所述第一节点的历史分裂成本为所述第二数值分布范围中各特征维度上的特征取值的跨度之和的倒数。
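  35. The apparatus according to claim 33, wherein the incremental training module is configured to:
    determine span ranges of feature values in the feature dimensions of the first value distribution range; and
    add the second node based on a first split point in a first split dimension, wherein a value range, in the first value distribution range, whose values in the first split dimension are not greater than the value of the first split point is allocated to a left child node of the second node, a value range, in the first value distribution range, whose values in the first split dimension are greater than the value of the first split point is allocated to a right child node of the second node, the first split dimension is a split dimension determined from the feature dimensions based on the span ranges of the feature values in the feature dimensions, and the first split point is a value point that is determined in the first split dimension of the first value distribution range and that is used for splitting;
    wherein when the first split dimension is different from a second split dimension, the second node is a parent node or a child node of the first node, the second split dimension is a historical split dimension of the first node in the machine learning model, and a second split point is a historical split point of the first node in the machine learning model;
    when the first split dimension is the same as the second split dimension and the first split point is located on the right side of the second split point, the second node is a parent node of the first node, and the first node is a left child node of the second node; and
    when the first split dimension is the same as the second split dimension and the first split point is located on the left side of the second split point, the second node is a left child node of the first node.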
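  36. The apparatus according to claim 33, wherein the first split dimension is a feature dimension randomly selected from the feature dimensions of the first value distribution range, or the first split dimension is the feature dimension with the largest span among the feature dimensions of the first value distribution range;
    and/or the first split point is a value point randomly selected in the first split dimension of the first value distribution range.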
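The two selection options of claim 36 can be illustrated with a short sketch. The function name `choose_split` and its parameters are hypothetical; the split point is drawn uniformly from the chosen dimension's range, which is one possible reading of "randomly selected".

```python
import random
from typing import List, Tuple

ValueRange = List[Tuple[float, float]]  # one (low, high) pair per feature dimension


def choose_split(
    value_range: ValueRange, rng: random.Random, use_largest_span: bool = True
) -> Tuple[int, float]:
    """Pick a split dimension and a random split point inside that dimension's range."""
    spans = [high - low for low, high in value_range]
    if use_largest_span:
        # Option 1: the feature dimension with the largest span.
        dim = max(range(len(spans)), key=spans.__getitem__)
    else:
        # Option 2: a feature dimension chosen at random.
        dim = rng.randrange(len(spans))
    low, high = value_range[dim]
    return dim, rng.uniform(low, high)


dim, point = choose_split([(0.0, 1.0), (0.0, 10.0)], random.Random(0))
print(dim, point)
```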
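  37. The apparatus according to any one of claims 32 to 36, wherein the incremental training module is configured to:
    add the second node when a sum of a quantity of samples in the historical training sample set of the first node and a quantity of first training samples is greater than a first sample quantity threshold; and
    the apparatus further comprises:
    a stopping module, configured to stop the incremental training of the machine learning model when the sum of the quantity of samples in the historical training sample set of the first node and the quantity of first training samples is not greater than the first sample quantity threshold.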
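  38. The apparatus according to any one of claims 32 to 37, further comprising:
    a merging module, configured to merge a first non-leaf node and a second non-leaf node in the machine learning model, and merge a first leaf node and a second leaf node, to obtain a reduced machine learning model, wherein the reduced machine learning model is used to predict a classification result;
    or the receiving module is further configured to receive a reduced machine learning model sent by the first analysis device, wherein the reduced machine learning model is obtained by the first analysis device by merging a first non-leaf node and a second non-leaf node in the machine learning model and merging a first leaf node and a second leaf node;
    wherein the first leaf node is a child node of the first non-leaf node, the second leaf node is a child node of the second non-leaf node, the first leaf node and the second leaf node comprise a same classification result, and the span ranges of the feature values of the historical training sample sets allocated to them in a same feature dimension are adjacent.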
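The leaf-merging condition of claim 38 (same classification result, adjacent spans in the same feature dimension) can be sketched as below. This covers only the leaf part of the merge; merging of the two parent non-leaf nodes is omitted. The `Leaf` class and the functions `can_merge` and `merge_leaves` are hypothetical names, and "adjacent" is assumed to mean that one span ends where the other begins.

```python
from dataclasses import dataclass


@dataclass
class Leaf:
    label: str   # classification result of the leaf
    dim: int     # feature dimension in which the span is compared
    low: float   # lower bound of the span in that dimension
    high: float  # upper bound of the span in that dimension


def can_merge(a: Leaf, b: Leaf) -> bool:
    """Same classification result, same dimension, and touching spans."""
    touching = a.high == b.low or b.high == a.low
    return a.label == b.label and a.dim == b.dim and touching


def merge_leaves(a: Leaf, b: Leaf) -> Leaf:
    """Return one leaf that covers both spans; raise if the condition is not met."""
    if not can_merge(a, b):
        raise ValueError("leaves do not satisfy the merging condition")
    return Leaf(a.label, a.dim, min(a.low, b.low), max(a.high, b.high))


merged = merge_leaves(Leaf("normal", 0, 0.0, 1.0), Leaf("normal", 0, 1.0, 2.5))
print(merged)
```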
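  39. The apparatus according to any one of claims 32 to 38, wherein node information is stored for each node in the machine learning model, the node information of any node in the machine learning model comprises label distribution information, the label distribution information reflects the proportions, in a total label quantity, of labels of different categories of the samples in the historical training sample set allocated to the corresponding node, the total label quantity is the total quantity of labels corresponding to the samples in the historical training sample set allocated to the node, the node information of any non-leaf node further comprises historical split information, and the historical split information is information used by the corresponding node for splitting.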
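  40. The apparatus according to claim 39, wherein the historical split information comprises: location information of the corresponding node in the machine learning model, a split dimension, a split point, a value distribution range of the historical training sample set allocated to the corresponding node, and a historical split cost; and
    the label distribution information comprises: a quantity of labels of a same category of the samples in the historical training sample set allocated to the corresponding node, and the total label quantity; or the proportions, in the total label quantity, of labels of different categories of the samples in the historical training sample set allocated to the corresponding node.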
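One possible in-memory layout for the node information of claims 39 and 40 is sketched below. The class and field names (`SplitInfo`, `NodeInfo`, `position`, and so on) are hypothetical, and the location information is represented here as a path string from the root purely for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class SplitInfo:
    """Historical split information kept by a non-leaf node."""
    position: str                           # hypothetical encoding, e.g. "root/L/R"
    split_dim: int                          # split dimension
    split_point: float                      # split point
    value_range: List[Tuple[float, float]]  # range of the node's historical samples
    split_cost: float                       # historical split cost


@dataclass
class NodeInfo:
    """Node information: label distribution plus optional split information."""
    label_counts: Counter                   # per-category label counts
    split_info: Optional[SplitInfo] = None  # None for leaf nodes

    def total_labels(self) -> int:
        return sum(self.label_counts.values())

    def label_proportions(self) -> Dict[str, float]:
        """Share of each label category in the total label quantity."""
        total = self.total_labels()
        return {label: count / total for label, count in self.label_counts.items()}


info = NodeInfo(Counter({"normal": 3, "abnormal": 1}))
print(info.label_proportions())
```

Storing counts rather than proportions makes it straightforward to update the distribution when new samples reach the node during incremental training; the proportions can be derived on demand.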
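  41. The apparatus according to any one of claims 28 to 40, wherein the first training sample set comprises samples that are selected from the samples obtained by the local analysis device and that meet a low discrimination condition, and the low discrimination condition comprises at least one of the following:
    an absolute value of a difference between any two probabilities in a target probability set obtained by predicting a sample by using the machine learning model is less than a first difference threshold, wherein the target probability set comprises the probabilities of the first n classification results sorted in descending order of probability, 1 < n < m, and m is the total quantity of probabilities obtained by predicting the sample by using the machine learning model;
    or an absolute value of a difference between any two of the probabilities obtained by predicting a sample by using the machine learning model is less than a second difference threshold;
    or an absolute value of a difference between the highest probability and the lowest probability among the probabilities of a plurality of classification results obtained by predicting a sample by using the machine learning model is less than a third difference threshold;
    or an absolute value of a difference between any two of the probabilities of a plurality of classification results obtained by predicting a sample by using the machine learning model is less than a fourth difference threshold;
    or a probability distribution entropy E of a plurality of classification results obtained by predicting a sample by using the machine learning model is greater than a specified distribution entropy threshold, wherein E satisfies: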
    E = -\sum_{i} P(x_i) \log_b P(x_i)
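    where x_i represents the i-th classification result, P(x_i) represents the predicted probability of the i-th classification result of the sample, b is a specified base, and 0 ≤ P(x_i) ≤ 1.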
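A hedged sketch of how low-discrimination samples might be screened follows. It assumes that the probability distribution entropy E is the Shannon entropy with base b, and it implements three of the listed conditions; the function names and default thresholds are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations
from typing import Optional, Sequence


def top_n_is_close(probs: Sequence[float], n: int, threshold: float) -> bool:
    """Every pair among the n largest probabilities differs by less than the threshold."""
    top = sorted(probs, reverse=True)[:n]
    return all(abs(a - b) < threshold for a, b in combinations(top, 2))


def extremes_are_close(probs: Sequence[float], threshold: float) -> bool:
    """The highest and lowest probabilities differ by less than the threshold."""
    return abs(max(probs) - min(probs)) < threshold


def distribution_entropy(probs: Sequence[float], base: float = 2.0) -> float:
    """Assumed form of E: Shannon entropy, skipping zero-probability terms."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)


def is_low_discrimination(
    probs: Sequence[float],
    *,
    n: int = 2,
    diff_threshold: float = 0.1,
    entropy_threshold: Optional[float] = None,
) -> bool:
    """True if the top-n condition or, when enabled, the entropy condition holds."""
    if top_n_is_close(probs, n, diff_threshold):
        return True
    if entropy_threshold is not None:
        return distribution_entropy(probs) > entropy_threshold
    return False


print(is_low_discrimination([0.5, 0.45, 0.05]))  # top two probabilities are close
print(is_low_discrimination([0.9, 0.05, 0.05]))  # one class clearly dominates
```

Samples flagged this way are the ones the model is least certain about, which is why they are natural candidates for the first training sample set.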
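  42. A model training apparatus, applied to a first analysis device, comprising:
    an offline training module, configured to perform offline training based on a historical training sample set to obtain a machine learning model; and
    a sending module, configured to send the machine learning model to a plurality of local analysis devices, so that each local analysis device performs incremental training on the machine learning model based on a first training sample set, wherein feature data in the training sample set used by any local analysis device to train the machine learning model is feature data of the local network corresponding to that local analysis device.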
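  43. The apparatus according to claim 42, wherein the historical training sample set is a set of training samples sent by the plurality of local analysis devices.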
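  44. The apparatus according to claim 42, further comprising:
    a receiving module, configured to:
    after the machine learning model is sent to the local analysis devices, receive a retraining request sent by a local analysis device, and retrain the machine learning model based on a training sample set sent by the local analysis device that sent the retraining request;
    or receive a retraining request sent by a local analysis device, and retrain the machine learning model based on the training sample set sent by the local analysis device that sent the retraining request and training sample sets sent by other local analysis devices;
    or receive training sample sets sent by at least two of the local analysis devices, and retrain the machine learning model based on the received training sample sets.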
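  45. The apparatus according to any one of claims 42 to 44, wherein the machine learning model is a tree model, and the offline training module is configured to:
    obtain a historical training sample set whose labels have been determined, wherein each training sample in the historical training sample set comprises feature data in one or more feature dimensions, and the feature data is numerical data;
    create a root node;
    use the root node as a third node, and perform an offline training process until a split stop condition is met; and
    determine a classification result for each leaf node, to obtain the machine learning model;
    wherein the offline training process comprises:
    splitting the third node to obtain a left child node and a right child node of the third node;
    using the left child node as an updated third node, using a left sample set, which is the part of the historical training sample set allocated to the left child node, as an updated historical training sample set, and performing the offline training process again; and
    using the right child node as an updated third node, using a right sample set, which is the part of the historical training sample set allocated to the right child node, as an updated historical training sample set, and performing the offline training process again.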
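  46. The apparatus according to claim 45, wherein the offline training module is configured to:
    split the third node based on a value distribution range of the historical training sample set to obtain the left child node and the right child node of the third node, wherein the value distribution range of the historical training sample set is a distribution range of feature values in the historical training sample set.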
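  47. The apparatus according to claim 46, wherein the offline training module is configured to:
    determine a third split dimension from the feature dimensions of the historical training sample set;
    determine a third split point in the third split dimension of the historical training sample set; and
    allocate, to the left child node, a value range in a third value distribution range whose values in the third split dimension are not greater than the value of the third split point, and allocate, to the right child node, a value range in the third value distribution range whose values in the third split dimension are greater than the value of the third split point, wherein the third value distribution range is a distribution range of feature values in the historical training sample set of the third node.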
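The recursive offline training of claims 45 to 47 can be sketched as follows. This is a simplified illustration under assumptions that are not in the application: the split dimension is the one with the largest span, the split point is drawn uniformly at random, the stop condition is reduced to a depth limit, a minimum sample count, and label purity, and each leaf takes its majority label. All names (`Node`, `build_tree`, `predict`) are hypothetical.

```python
import random
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature values, label)


@dataclass
class Node:
    label: Optional[str] = None          # set on leaf nodes only
    split_dim: Optional[int] = None
    split_point: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(samples: List[Sample], rng: random.Random,
               depth: int = 0, max_depth: int = 8, min_samples: int = 2) -> Node:
    """Split on the value distribution range of the samples, then recurse on both halves."""
    labels = Counter(label for _, label in samples)
    majority = labels.most_common(1)[0][0]
    n_dims = len(samples[0][0])
    ranges = [(min(x[d] for x, _ in samples), max(x[d] for x, _ in samples))
              for d in range(n_dims)]
    spans = [high - low for low, high in ranges]
    # Simplified split stop condition (assumption).
    if depth >= max_depth or len(samples) < min_samples or len(labels) == 1 or sum(spans) == 0:
        return Node(label=majority)
    dim = max(range(n_dims), key=spans.__getitem__)  # split dimension
    point = rng.uniform(*ranges[dim])                # split point
    left = [s for s in samples if s[0][dim] <= point]
    right = [s for s in samples if s[0][dim] > point]
    if not left or not right:                        # degenerate split: stop here
        return Node(label=majority)
    return Node(split_dim=dim, split_point=point,
                left=build_tree(left, rng, depth + 1, max_depth, min_samples),
                right=build_tree(right, rng, depth + 1, max_depth, min_samples))


def predict(node: Node, features: Sequence[float]) -> str:
    """Walk from the root to a leaf and return the leaf's classification result."""
    while node.label is None:
        node = node.left if features[node.split_dim] <= node.split_point else node.right
    return node.label


data = [([0.0], "a"), ([0.1], "a"), ([1.0], "b"), ([1.1], "b")]
tree = build_tree(data, random.Random(0))
print([predict(tree, x) for x, _ in data])
```

Because the split uses only the value range of the samples and not their labels, the same range statistics can later be compared against new samples during incremental training.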
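  48. The apparatus according to any one of claims 45 to 47, wherein the split stop condition comprises at least one of the following:
    a current split cost of the third node is greater than a split cost threshold;
    or a quantity of samples in the historical training sample set is less than a second sample quantity threshold;
    or a quantity of splits corresponding to the third node is greater than a split quantity threshold;
    or a depth of the third node in the machine learning model is greater than a depth threshold;
    or a proportion of the quantity of the most frequent label among the labels corresponding to the historical training sample set, in the total quantity of labels corresponding to the historical training sample set, is greater than a specified proportion threshold.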
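  49. The apparatus according to claim 48, wherein the current split cost of the third node is negatively correlated with a size of the distribution range of feature values in the historical training sample set.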
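  50. The apparatus according to claim 49, wherein the current split cost of the third node is a reciprocal of a sum of spans of feature values of the historical training sample set in all feature dimensions.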
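The split stop conditions of claims 48 to 50 can be combined into a single check, sketched below. The function name `should_stop` and its parameter names are hypothetical, the threshold values are left to the caller, and the condition on the quantity of splits is omitted for brevity.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def should_stop(
    value_range: List[Tuple[float, float]],
    labels: Sequence[str],
    depth: int,
    *,
    cost_threshold: float,
    min_samples: int,
    max_depth: int,
    purity_threshold: float,
) -> bool:
    """Return True when any of the implemented split stop conditions holds."""
    if not labels:
        return True  # nothing left to split
    total_span = sum(high - low for low, high in value_range)
    # Split cost as in claim 50: reciprocal of the summed spans.
    cost = float("inf") if total_span == 0 else 1.0 / total_span
    if cost > cost_threshold:
        return True
    if len(labels) < min_samples:
        return True
    if depth > max_depth:
        return True
    # Share of the most frequent label in the total label quantity.
    top_count = Counter(labels).most_common(1)[0][1]
    return top_count / len(labels) > purity_threshold


limits = dict(cost_threshold=1.0, min_samples=2, max_depth=8, purity_threshold=0.9)
print(should_stop([(0.0, 10.0)], ["a", "b"] * 5, 1, **limits))
print(should_stop([(0.0, 0.5)], ["a", "b"] * 5, 1, **limits))
```

Since the cost is the reciprocal of the summed spans, a node whose samples occupy a very narrow range has a high cost and is not split further.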
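  51. The apparatus according to any one of claims 45 to 50, further comprising:
    a merging module, configured to merge a first non-leaf node and a second non-leaf node in the machine learning model, and merge a first leaf node and a second leaf node, to obtain a reduced machine learning model, wherein the reduced machine learning model is used to predict a classification result, the first leaf node is a child node of the first non-leaf node, the second leaf node is a child node of the second non-leaf node, the first leaf node and the second leaf node comprise a same classification result, and the span ranges of the feature values of the historical training sample sets allocated to them in a same feature dimension are adjacent;
    wherein the sending module is further configured to send the reduced machine learning model to the local analysis devices, so that the local analysis devices predict classification results based on the reduced machine learning model.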
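  52. The apparatus according to any one of claims 45 to 51, wherein node information is stored for each node in the machine learning model, the node information of any node in the machine learning model comprises label distribution information, the label distribution information reflects the proportions, in a total label quantity, of labels of different categories of the samples in the historical training sample set allocated to the corresponding node, the total label quantity is the total quantity of labels corresponding to the samples in the historical training sample set allocated to the node, the node information of any non-leaf node further comprises historical split information, and the historical split information is information used by the corresponding node for splitting.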
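  53. The apparatus according to claim 52, wherein the historical split information comprises: location information of the corresponding node in the machine learning model, a split dimension, a split point, a value distribution range of the historical training sample set allocated to the corresponding node, and a historical split cost; and
    the label distribution information comprises: a quantity of labels of a same category of the samples in the historical training sample set allocated to the corresponding node, and the total label quantity; or the proportions, in the total label quantity, of labels of different categories of the samples in the historical training sample set allocated to the corresponding node.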
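  54. The apparatus according to any one of claims 42 to 53, wherein the first training sample set comprises samples that are selected from the samples obtained by the local analysis device and that meet a low discrimination condition, and the low discrimination condition comprises at least one of the following:
    an absolute value of a difference between any two probabilities in a target probability set obtained by predicting a sample by using the machine learning model is less than a first difference threshold, wherein the target probability set comprises the probabilities of the first n classification results sorted in descending order of probability, 1 < n < m, and m is the total quantity of probabilities obtained by predicting the sample by using the machine learning model;
    or an absolute value of a difference between any two of the probabilities obtained by predicting a sample by using the machine learning model is less than a second difference threshold;
    or an absolute value of a difference between the highest probability and the lowest probability among the probabilities of a plurality of classification results obtained by predicting a sample by using the machine learning model is less than a third difference threshold;
    or an absolute value of a difference between any two of the probabilities of a plurality of classification results obtained by predicting a sample by using the machine learning model is less than a fourth difference threshold;
    or a probability distribution entropy E of a plurality of classification results obtained by predicting a sample by using the machine learning model is greater than a specified distribution entropy threshold, wherein E satisfies: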
    E = -\sum_{i} P(x_i) \log_b P(x_i)
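    where x_i represents the i-th classification result, P(x_i) represents the predicted probability of the i-th classification result of the sample, b is a specified base, and 0 ≤ P(x_i) ≤ 1.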
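  55. A model training apparatus, comprising a processor and a memory, wherein
    the memory is configured to store a computer program, and the computer program comprises program instructions; and
    the processor is configured to invoke the computer program to implement the model training method according to any one of claims 1 to 14, or to implement the model training method according to any one of claims 15 to 27.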
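  56. A computer storage medium, wherein the computer storage medium stores instructions, and when the instructions are executed by a processor, the model training method according to any one of claims 1 to 14 or the model training method according to any one of claims 15 to 27 is implemented.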
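  57. A model training system, comprising a first analysis device and a plurality of local analysis devices, wherein
    the first analysis device comprises the model training apparatus according to any one of claims 28 to 41, and each local analysis device comprises the model training apparatus according to any one of claims 42 to 54.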
PCT/CN2020/115770 2019-09-17 2020-09-17 Model training method, apparatus, and system WO2021052394A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP20866562.0A EP4024261A4 (en) 2019-09-17 2020-09-17 PATTERN LEARNING METHOD, APPARATUS AND SYSTEM
US17/696,593 US20220207434A1 (en) 2019-09-17 2022-03-16 Model training method, apparatus, and system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910878280.9A CN112529204A (zh) 2019-09-17 2019-09-17 Model training method, apparatus, and system
CN201910878280.9 2019-09-17

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/696,593 Continuation US20220207434A1 (en) 2019-09-17 2022-03-16 Model training method, apparatus, and system

Publications (1)

Publication Number Publication Date
WO2021052394A1 (zh) 2021-03-25

Family

ID=74883382

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/115770 WO2021052394A1 (zh) 2019-09-17 2020-09-17 Model training method, apparatus, and system

Country Status (4)

Country Link
US (1) US20220207434A1 (zh)
EP (1) EP4024261A4 (zh)
CN (1) CN112529204A (zh)
WO (1) WO2021052394A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113780582A (zh) * 2021-09-15 2021-12-10 杭银消费金融股份有限公司 Risk control feature screening method and system based on a machine learning model

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11509674B1 (en) * 2019-09-18 2022-11-22 Rapid7, Inc. Generating machine learning data in salient regions of a feature space
US12088600B1 (en) 2019-09-18 2024-09-10 Rapid7, Inc. Machine learning system for detecting anomalies in hunt data
US11853853B1 (en) 2019-09-18 2023-12-26 Rapid7, Inc. Providing human-interpretable explanation for model-detected anomalies
CN112866175B (zh) * 2019-11-12 2022-08-19 华为技术有限公司 Abnormal traffic type retention method, apparatus, device, and storage medium
CN113328872B (zh) 2020-02-29 2023-03-28 华为技术有限公司 Fault recovery method, apparatus, and storage medium
US20220124110A1 (en) * 2020-10-20 2022-04-21 Amazon Technologies, Inc. Anomaly detection using an ensemble of detection models
US11803459B2 (en) * 2021-10-19 2023-10-31 Hewlett-Packard Development Company, L.P. Latency tolerance reporting value determinations
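CN116192308A (zh) * 2021-11-24 2023-05-30 华为技术有限公司 Data processing method and communication apparatus
CN114219346B (zh) * 2021-12-24 2023-04-14 江苏童能文化科技有限公司 Method and system for improving the quality of service of a network learning environment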
WO2023206249A1 (en) * 2022-04-28 2023-11-02 Qualcomm Incorporated Machine learning model performance monitoring reporting
WO2024087000A1 (en) * 2022-10-25 2024-05-02 Huawei Technologies Co., Ltd. Methods and apparatuses for articifical intelligence or machine learning training
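CN115562948B (zh) * 2022-12-05 2023-04-07 成都索贝数码科技股份有限公司 Large-scale parallelized multi-KPI prediction method and system
CN116415687B (zh) * 2022-12-29 2023-11-21 江苏东蓝信息技术有限公司 Deep-learning-based artificial intelligence network optimization training system and method
CN117132174B (zh) * 2023-10-26 2024-01-30 扬宇光电(深圳)有限公司 Model training method and system applied to quality inspection on an industrial assembly line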

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160300156A1 (en) * 2015-04-10 2016-10-13 Facebook, Inc. Machine learning model tracking platform
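CN108173708A (zh) * 2017-12-18 2018-06-15 北京天融信网络安全技术有限公司 Abnormal traffic detection method and apparatus based on incremental learning, and storage medium
CN109670583A (zh) * 2018-12-27 2019-04-23 浙江省公众信息产业有限公司 Decentralized data analysis method, system, and medium
CN110221558A (zh) * 2019-06-05 2019-09-10 镇江四联机电科技有限公司 Online fault diagnosis gateway for electro-hydraulic servo valves based on edge computing technology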

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
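WO2017127976A1 (zh) * 2016-01-25 2017-08-03 华为技术有限公司 Training and scheduling method for an incremental learning cloud system, and related device
CN109871872A (zh) * 2019-01-17 2019-06-11 西安交通大学 Real-time traffic classification method based on a hull-vector SVM incremental learning model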

Also Published As

Publication number Publication date
CN112529204A (zh) 2021-03-19
EP4024261A4 (en) 2022-12-14
EP4024261A1 (en) 2022-07-06
US20220207434A1 (en) 2022-06-30

Similar Documents

Publication Publication Date Title
WO2021052394A1 (zh) Model training method, apparatus, and system
WO2021042843A1 (zh) Alarm information decision method, apparatus, computer device, and storage medium
EP4020315A1 (en) Method, apparatus and system for determining label
EP3968243A1 (en) Method and apparatus for realizing model training, and computer storage medium
CN109787846A (zh) Method and system for monitoring and predicting 5G network quality-of-service anomalies
US10917419B2 (en) Systems and methods for anomaly detection
Decker et al. Real-time anomaly detection in data centers for log-based predictive maintenance using an evolving fuzzy-rule-based approach
CN108600009B (zh) Network alarm root cause localization method based on alarm data analysis
WO2018011742A1 (en) Early warning and recommendation system for the proactive management of wireless broadband networks
CN111176953B (zh) Anomaly detection method and model training method therefor, computer device, and storage medium
WO2021103823A1 (zh) Model update system, model update method, and related device
CN112769605A (zh) Heterogeneous multi-cloud operation and maintenance management method and hybrid cloud platform
CN112363896A (zh) Log anomaly detection system
Cao et al. Load prediction for data centers based on database service
CN114090393B (zh) Alarm level determination method, apparatus, and device
CN115562940A (zh) Load energy consumption monitoring method, apparatus, medium, and electronic device
US11550691B2 (en) Computing resources schedule recommendation
Mdini Anomaly detection and root cause diagnosis in cellular networks
Gias et al. Samplehst: Efficient on-the-fly selection of distributed traces
CN111984514A (zh) Log anomaly detection method based on Prophet-bLSTM-DTW
Mijumbi et al. MAYOR: machine learning and analytics for automated operations and recovery
Jerome et al. Anomaly detection and classification using a metric for determining the significance of failures: Case study: mobile network management data from LTE network
Sayed-Mouchaweh Learning from Data Streams in Evolving Environments: Methods and Applications
Fichera et al. Artificial Intelligence in virtualized networks: a journey.
US20240333615A1 (en) Network analysis using dataset shift detection

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20866562

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2020866562

Country of ref document: EP

Effective date: 20220331