WO2022100491A1 - Model training method and apparatus, electronic device, and computer-readable storage medium - Google Patents

Model training method and apparatus, electronic device, and computer-readable storage medium

Info

Publication number
WO2022100491A1
Authority
WO
WIPO (PCT)
Prior art keywords
label
label sample
classification model
category
model
Prior art date
Application number
PCT/CN2021/128319
Other languages
English (en)
French (fr)
Inventor
何世明
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Publication of WO2022100491A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 24/00 Supervisory, monitoring or testing arrangements
    • H04W 24/06 Testing, supervising or monitoring using simulated traffic
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/24 Classification techniques
    • G06F 18/243 Classification techniques relating to the number of classes
    • G06F 18/24323 Tree-organised classifiers
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Definitions

  • the embodiments of the present application relate to the field of communications, and in particular, to a model training method and apparatus, an electronic device, and a computer-readable storage medium.
  • an embodiment of the present application provides a model training method, including:
  • the basic classification model is retrained using the first label samples to obtain a final classification model, where the basic classification model is a classification model applicable to the second region and the final classification model is a classification model applicable to the first region.
  • an electronic device, including:
  • at least one processor; and
  • a memory storing at least one program which, when executed by the at least one processor, implements the above model training method.
  • an embodiment of the present application provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the above-mentioned model training method is implemented.
  • FIG. 1 is a flowchart of a model training method provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a model training method according to an embodiment of the present application.
  • FIG. 3 is a block diagram of a model training apparatus provided by an embodiment of the present application.
  • current fault detection technology usually relies on real-time inspection of equipment and on analysis based on expert experience, which requires substantial manual effort and places extremely high demands on wireless network operation and maintenance.
  • in some related technologies, big data analysis combined with artificial intelligence (AI) is used to train the relevant models.
  • a model is often applicable in one place, but when it is moved to another place, changes in geographical location, user habits, networking mode, equipment structure, weather, and many other factors mean that the model applicable in one place cannot be applied in the other. Since label samples are difficult to obtain, it is hard to re-collect label samples in another place, and therefore also hard to retrain the model.
  • Embodiments of the present application provide a model training method, an electronic device, a computer-readable storage medium, and a model training apparatus to at least partially solve the above problems.
  • FIG. 1 is a flowchart of a model training method provided by an embodiment of the present application.
  • an embodiment of the present application provides a model training method, including steps 100 and 101 .
  • Step 100: Acquire the first label samples of the first region and the categories to which the first label samples belong.
  • the first area may be any area and may be preset.
  • any method well known to those skilled in the art may be used to obtain the first label sample of the first region and the category to which the first label sample belongs.
  • the specific acquisition method is not used to limit the protection scope of the embodiments of the present application.
  • since the category to which a first label sample belongs normally has to be determined manually, labeling can be automated to save the manual labeling workload (i.e., the work of marking the category to which a label sample belongs). Specifically, second label samples of the first region can be acquired, some or all of the second label samples can be selected as the first label samples according to the third label samples of the second region and the categories to which the third label samples belong, and the categories to which the first label samples belong can be determined. That is to say, as shown in Figure 2, acquiring the first label samples of the first region and the categories to which the first label samples belong includes:
  • acquiring second label samples of the first region; and
  • selecting some or all of the second label samples as the first label samples according to the third label samples of the second region and the categories to which the third label samples belong, and determining the categories to which the first label samples belong.
  • selecting some or all of the second label samples as the first label samples according to the third label samples of the second region and the categories to which the third label samples belong, and determining the categories to which the first label samples belong, includes:
  • determining the K third label samples with the highest similarity to each second label sample, K being an integer greater than or equal to 2; and
  • when N of the K third label samples belong to the same category and N is greater than or equal to rK, taking the second label sample as a first label sample and determining that the category to which the first label sample belongs is the category to which the N third label samples belong, where r is a value greater than or equal to 0 and less than or equal to 1.
  • when N of the K third label samples belong to the same category and N is less than rK, the model training method further includes: discarding the second label sample.
  • the similarity between a certain second label sample and a certain third label sample may be represented by the distance between the second label sample and the third label sample.
  • the similarity between a certain second label sample and a certain third label sample may also be represented by other parameters, and the specific representation parameters are not used to limit the protection scope of the embodiments of the present application.
  • the distance between a second label sample and a third label sample can be calculated according to the formula
    $$Dis = \Big(\sum_{j=1}^{n} |x_{1j} - x_{2j}|^{p}\Big)^{1/p}$$
    where Dis is the distance between the second label sample and the third label sample, p is a constant, n is the dimension of the label samples, $x_{1j}$ is the j-th dimension variable of the second label sample, and $x_{2j}$ is the j-th dimension variable of the third label sample.
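  • as an illustration of the selection rule above, the following Python sketch (not part of the patent; names such as minkowski_distance and auto_label are illustrative) labels second label samples by a majority vote over the K most similar third label samples, assuming the samples are held in NumPy arrays:

    import numpy as np

    def minkowski_distance(x1, x2, p=2):
        # Dis = (sum_j |x1_j - x2_j|^p)^(1/p)
        return np.sum(np.abs(x1 - x2) ** p) ** (1.0 / p)

    def auto_label(second_samples, third_samples, third_labels, k=20, r=0.8, p=2):
        # Keep a second label sample only if N of its K nearest third label
        # samples share one category and N >= rK; otherwise discard it.
        kept, labels = [], []
        for x in second_samples:
            dists = np.array([minkowski_distance(x, t, p) for t in third_samples])
            nearest = np.argsort(dists)[:k]
            votes = np.bincount(third_labels[nearest])
            n, category = votes.max(), votes.argmax()
            if n >= r * k:
                kept.append(x)
                labels.append(category)
        return np.array(kept), np.array(labels)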
  • the minimum number of first label samples required for each category may be preset; when the number of first label samples of a certain category reaches the corresponding minimum number, acquisition of first label samples of that category is stopped.
  • the third label samples of the second region may be some or all of the label samples used in the model training that produced the basic classification model (i.e., the sixth label samples mentioned below), may be some or all of the label samples whose categories were determined by the basic classification model, or may include both kinds of label samples; which label samples are used to determine the first label samples and their categories does not limit the protection scope of the embodiments of the present application.
  • when deciding whether to take a second label sample as a first label sample, each label sample should be evaluated separately.
  • Step 101: Retrain the basic classification model according to the first label samples and the categories to which they belong to obtain a final classification model.
  • the basic classification model is a classification model applicable to the second region, and the final classification model is a classification model applicable to the first region.
  • the second area may be any area and may be preset.
  • the first region and the second region are different regions; they may be two regions without any overlapping area, or two regions with an overlapping area whose proportion is less than or equal to a preset threshold.
  • the first region and the second region should be set such that the basic classification model applicable to the second region is not applicable to the first region.
  • the basic classification model is a classification model obtained by performing model training according to the sixth label samples and the categories to which the sixth label samples belong.
  • retraining the basic classification model according to the first label samples and the categories to which they belong to obtain the final classification model includes at least one of the following:
  • when the basic classification model is a serially generated, sequential model (such as xgboost, AdaBoost, or a neural network), adding a new layer after the basic classification model, keeping the structural parameters of the basic classification model unchanged, and training the new layer according to the first label samples and the categories to which they belong to obtain the final classification model;
  • when the basic classification model is a parallelized model (such as a random forest), generating new classifiers and training the new classifiers according to the first label samples and the categories to which they belong to obtain the final classification model.
  • the specific type of the classifiers is not limited; they may be weak classifiers, strong classifiers, or other classifiers, and the specific type does not limit the protection scope of the embodiments of the present application.
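  • a minimal sketch of the parallelized-model branch, assuming scikit-learn's RandomForestClassifier: with warm_start=True, increasing n_estimators and calling fit again keeps the existing trees unchanged and fits only the newly added trees on the first-region samples (an approximation of the scheme described here, not the patent's exact implementation; all data below are stand-ins):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    # Stand-in data; in the patent these would be the second-region (sixth)
    # and first-region (first) label samples and their categories.
    X_second, y_second = np.random.rand(600, 10), np.random.randint(0, 6, 600)
    X_first, y_first = np.random.rand(60, 10), np.random.randint(0, 6, 60)

    basic_model = RandomForestClassifier(n_estimators=100, warm_start=True)
    basic_model.fit(X_second, y_second)        # basic classification model

    basic_model.n_estimators += 30             # 30% new weak classifiers
    basic_model.fit(X_first, y_first)          # fits only the 30 new trees
    final_model = basic_model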
  • since each variable in a label sample has a different unit, standardization can be performed on each first label sample to prevent differences in dimension from introducing errors between data magnitudes; the basic classification model is then retrained based on the standardized first label samples and the categories to which the first label samples belong to obtain the final classification model.
  • when the dimension of the original samples is high and the problem space is large, the computational load of the model is significantly affected; therefore, dimensionality reduction can be performed on each first label sample, and the basic classification model is then retrained based on the dimensionality-reduced first label samples and the categories to which the first label samples belong to obtain the final classification model.
  • standardization and dimensionality reduction may also both be performed on each first label sample. That is, before the basic classification model is retrained according to the first label samples and the categories to which they belong to obtain the final classification model, the model training method further includes:
  • standardizing the first label samples to obtain fourth label samples; and
  • performing dimensionality reduction on the fourth label samples to obtain fifth label samples;
  • retraining the basic classification model according to the first label samples and the categories to which they belong to obtain the final classification model then includes: retraining the basic classification model according to the categories to which the first label samples belong and the fifth label samples to obtain the final classification model.
  • standardizing the first label samples includes standardizing the j-th dimension variable of the i-th first label sample according to the formula
    $$\hat{x}1_{ij} = \frac{x1_{ij} - mean1(S)}{std1(S)}$$
    where $\hat{x}1_{ij}$ is the j-th dimension variable of the i-th fourth label sample, $x1_{ij}$ is the j-th dimension variable of the i-th first label sample, mean1(S) is the mean of the j-th dimension variable over all first label samples, and std1(S) is the standard deviation of the j-th dimension variable over all first label samples.
  • since the number of first label samples is generally small, statistics over them alone are of limited value, while the number of sixth label samples used to train the basic classification model is generally large; therefore, mean1(S) may be taken as the mean of the j-th dimension variable over all sixth label samples and std1(S) as the standard deviation of the j-th dimension variable over all sixth label samples; alternatively, mean1(S) may be taken as the mean of the j-th dimension variable over all first label samples and all sixth label samples together, and std1(S) as the corresponding standard deviation.
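  • a short Python sketch of this standardization step (the helper name standardize is illustrative, not from the patent), where the reference set plays the role of the sixth label samples used to estimate mean1(S) and std1(S):

    import numpy as np

    def standardize(first_samples, reference_samples=None):
        # Z-score each dimension variable; by default the statistics come
        # from the samples themselves, otherwise from the reference set.
        ref = first_samples if reference_samples is None else reference_samples
        mean1 = ref.mean(axis=0)
        std1 = ref.std(axis=0)
        return (first_samples - mean1) / std1   # fourth label samples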
  • dimensionality reduction algorithms well known to those skilled in the art may be employed, for example, the Principal Component Analysis (PCA) algorithm, the t-distributed Stochastic Neighbor Embedding (TSNE) algorithm, the Locally Linear Embedding (LLE) algorithm, or the MultiDimensional Scaling (MDS) algorithm.
  • the dimension of the label samples after dimensionality reduction can be preset; for example, if PCA is used, 85% of the principal components can be retained, and if TSNE, LLE, or MDS is used, the target can be set to two dimensions. The dimension of the label samples after dimensionality reduction does not limit the protection scope of the embodiments of the present application.
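  • the 85% figure maps directly onto scikit-learn's PCA, which accepts a float in (0, 1) as the fraction of variance to retain; a sketch under that assumption, with stand-in data:

    import numpy as np
    from sklearn.decomposition import PCA

    fourth_samples = np.random.rand(500, 75)    # stand-in standardized samples
    pca = PCA(n_components=0.85)                # keep 85% of explained variance
    fifth_samples = pca.fit_transform(fourth_samples)
    print(pca.n_components_, fifth_samples.shape)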
  • standardization and dimensionality reduction do not change the category to which a label sample belongs; that is, the category of a first label sample, the category of the standardized first label sample (i.e., the fourth label sample), the category of the dimensionality-reduced first label sample, and the category of the fifth label sample are all the same.
  • retraining the basic classification model according to the categories to which the first label samples belong and the fifth label samples to obtain the final classification model includes at least one of the following:
  • when the basic classification model is a serially generated, sequential model (such as xgboost, AdaBoost, or a neural network), adding a new layer after the basic classification model, keeping the structural parameters of the basic classification model unchanged, and training the new layer according to the categories to which the first label samples belong and the fifth label samples to obtain the final classification model;
  • when the basic classification model is a parallelized model (such as a random forest), keeping the structural parameters of the classifiers in the basic classification model unchanged, generating new classifiers, and training the new classifiers according to the categories to which the first label samples belong and the fifth label samples to obtain the final classification model.
  • before acquiring the first label samples of the first region and the categories to which the first label samples belong, the model training method further includes:
  • obtaining the basic classification model by performing model training according to the sixth label samples of the second region and the categories to which the sixth label samples belong.
  • the classification model may be any classification model well known to those skilled in the art, for example, a random forest, a Gradient Boosting Decision Tree (GBDT), XGBoost (eXtreme Gradient Boosting), or a neural network.
  • for example, all sixth label samples may be divided by stratified sampling into a training set and a test set at a preset ratio (e.g., 0.75:0.25), with L-fold cross-validation used to ensure model accuracy: the training set is divided into L parts, (L-1) parts are used for model training and the remaining part for model validation; cycling L times yields L first classification models. For each first classification model, the model is applied to the corresponding sixth label samples reserved for model validation to obtain the category to which each of those sixth label samples belongs, and the accuracy of the first classification model is then determined by comparing the obtained categories with the true categories.
  • the score of the first classification model may be used to represent its accuracy. For example, when the classification model is a random forest, a confusion matrix can be used to characterize the quality of the first classification model, as shown in Table 1 (rows: true categories; columns: predicted categories).
  • when calculating the score of the first classification model, the true value refers to the true category of a sixth label sample obtained by other means, and the predicted value refers to the category obtained by applying the first classification model to the corresponding sixth label samples used for validation.
  • the score of the first classification model is the average of the scores of all categories. The score of the first classification model for category b can be calculated according to the formula
    $$score_b = \frac{2N_{bb}}{2N_{bb} + \sum_{c \neq b} N_{cb} + \sum_{c \neq b} N_{bc}}$$
    where $N_{bb}$ is the number of sixth label samples used for validation whose true value is b and predicted value is b, $N_{cb}$ is the number whose true value is c and predicted value is b, $N_{bc}$ is the number whose true value is b and predicted value is c, and (m-1) is the number of categories.
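  • read against the $N_{bb}$ / $N_{cb}$ / $N_{bc}$ definitions (true positives, false positives, and false negatives for category b), the score is the per-category F1 measure; a sketch computing it from a confusion matrix (the helper name model_score is illustrative):

    import numpy as np

    def model_score(conf):
        # conf[b, c] = number of samples with true category b predicted as c.
        scores = []
        for b in range(conf.shape[0]):
            tp = conf[b, b]                   # N_bb
            fp = conf[:, b].sum() - tp        # sum over c != b of N_cb
            fn = conf[b, :].sum() - tp        # sum over c != b of N_bc
            scores.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
        return float(np.mean(scores))         # average over all categories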
  • to improve the accuracy of the basic classification model, the grid method can be used to tune the input parameters of the classification model: enumerated values are set for the input parameters, a second classification model is obtained by the model training method for each value combination, and the second classification model with the highest accuracy is selected from all second classification models as the basic classification model.
  • for each second classification model, the second classification model is applied to the corresponding test set to obtain the category to which each sixth label sample belongs, and the accuracy of the second classification model is then determined by comparing the obtained categories with the true categories.
  • the score of the second classification model may be used to represent its accuracy. For example, when the classification model is a random forest, a confusion matrix can be used to characterize the quality of the second classification model, as shown in Table 1.
  • when calculating the score of the second classification model, the true value refers to the true category of a sixth label sample obtained by other means, and the predicted value refers to the category obtained by applying the second classification model to the corresponding sixth label samples used for validation.
  • the score of the second classification model is the average of the scores of all categories.
  • the score of the second classification model for category b can be calculated according to the formula
    $$score_b = \frac{2N_{bb}}{2N_{bb} + \sum_{c \neq b} N_{cb} + \sum_{c \neq b} N_{bc}}$$
    where $N_{bb}$ is the number of sixth label samples in the test set whose true value is b and predicted value is b, $N_{cb}$ is the number whose true value is c and predicted value is b, $N_{bc}$ is the number whose true value is b and predicted value is c, and (m-1) is the number of categories.
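  • a sketch of the grid method with scikit-learn's GridSearchCV; the parameter values match the enumeration in Example 1 below, macro-averaged F1 corresponds to averaging the per-category scores, and the data are stand-ins (max_features = 1.0 is assumed to mean the full feature fraction):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    eighth_samples = np.random.rand(400, 8)        # stand-in reduced samples
    categories = np.random.randint(0, 6, 400)      # stand-in categories

    param_grid = {
        "n_estimators": [50, 100, 150, 200],
        "max_depth": [6, 8, 10, 12],
        "max_features": ["sqrt", 0.7, 0.9, 1.0],
        "criterion": ["gini", "entropy"],
    }
    search = GridSearchCV(RandomForestClassifier(), param_grid,
                          scoring="f1_macro", cv=10)  # 4*4*4*2 = 128 candidates
    search.fit(eighth_samples, categories)
    basic_model = search.best_estimator_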
  • since each variable in a label sample has a different unit, standardization can be performed on each sixth label sample to prevent differences in dimension from introducing errors between data magnitudes; model training is then performed based on the standardized sixth label samples and the categories to which the sixth label samples belong to obtain the basic classification model.
  • when the dimension of the original samples is high and the problem space is large, dimensionality reduction can be performed on each sixth label sample, and model training is then performed based on the dimensionality-reduced sixth label samples and the categories to which the sixth label samples belong to obtain the basic classification model.
  • standardization and dimensionality reduction may also both be performed on each sixth label sample. That is, before model training is performed according to the sixth label samples of the second region and the categories to which they belong to obtain the basic classification model, the model training method further includes:
  • standardizing the sixth label samples to obtain seventh label samples; and
  • performing dimensionality reduction on the seventh label samples to obtain eighth label samples;
  • obtaining the basic classification model by performing model training according to the sixth label samples of the second region and the categories to which they belong then includes: performing model training according to the categories to which the sixth label samples belong and the eighth label samples to obtain the basic classification model.
  • standardizing the sixth label samples includes standardizing the j-th dimension variable of the i-th sixth label sample according to the formula
    $$\hat{x}2_{ij} = \frac{x2_{ij} - mean2(S)}{std2(S)}$$
    where $\hat{x}2_{ij}$ is the j-th dimension variable of the i-th seventh label sample, $x2_{ij}$ is the j-th dimension variable of the i-th sixth label sample, mean2(S) is the mean of the j-th dimension variable over all sixth label samples, and std2(S) is the standard deviation of the j-th dimension variable over all sixth label samples.
  • dimensionality reduction algorithms well known to those skilled in the art (e.g., the PCA, TSNE, LLE, or MDS algorithm) may be employed. The dimension of the label samples after dimensionality reduction can be preset; for example, if PCA is used, 85% of the principal components can be retained, and if TSNE, LLE, or MDS is used, the target can be set to two dimensions. The dimension of the label samples after dimensionality reduction does not limit the protection scope of the embodiments of the present application.
  • standardization and dimensionality reduction do not change the category to which a label sample belongs; that is, the category of a sixth label sample, the category of the standardized sixth label sample (i.e., the seventh label sample), the category of the dimensionality-reduced sixth label sample, and the category of the eighth label sample are all the same.
  • in the model training method provided by the embodiments of the present application, the first label samples of the first region and the categories to which they belong are first acquired, and the basic classification model is then retrained using the first label samples based on those categories to obtain the final classification model. Since the final classification model is obtained by retraining the basic classification model rather than by training a model from scratch, a classification model applicable to the first region can be trained without acquiring a large number of label samples from the first region; that is, the final classification model can be obtained by retraining the basic classification model with a small number of first label samples, which makes model training across different regions simple.
  • the categories to which the first label samples belong are labeled automatically based on the third label samples of the second region, which saves a large amount of manual labeling work and improves the accuracy of model training.
  • Example 1 describes the training method of a fault classification model applied to sleeping cells.
  • the faults of sleeping cells are mainly divided into 5 types; together with the normal cell there are 6 categories in total, as shown in Table 2.
  • the model training method includes the following steps 1 to 9.
  • in step 2, each dimension variable of each sixth label sample is standardized to obtain the seventh label samples according to the formula
    $$\hat{x}2_{ij} = \frac{x2_{ij} - mean2(S)}{std2(S)}$$
    where $\hat{x}2_{ij}$ is the j-th dimension variable of the i-th seventh label sample, $x2_{ij}$ is the j-th dimension variable of the i-th sixth label sample, mean2(S) is the mean of the j-th dimension variable over all sixth label samples, and std2(S) is the standard deviation of the j-th dimension variable over all sixth label samples.
  • this example uses PCA for dimensionality reduction. The main idea of PCA is to map n-dimensional variables onto d dimensions; the d-dimensional variables are brand-new orthogonal features, also known as principal components, reconstructed on the basis of the original n-dimensional variables. The job of PCA is to sequentially find a set of mutually orthogonal coordinate axes in the original n-dimensional space, where the choice of the new axes is closely related to the seventh label samples: the first new axis is the direction with the largest variance of the seventh label samples; the second new axis is the direction that maximizes their variance within the plane orthogonal to the first new axis; the third new axis is the direction that maximizes their variance within the plane orthogonal to the first two axes; and so on, until d such new axes are obtained. Each seventh label sample is then mapped into the new coordinate axes through an axis transformation.
  • principal component analysis reduces the complexity of the model training method without reducing its accuracy; 85% of the contribution rate is retained.
  • this example uses a random forest as the classification model and trains it with cross-validation. The eighth label samples are divided by stratified sampling into a training set and a test set at a ratio of 0.75:0.25, and L-fold cross-validation with L = 10 is used to ensure model accuracy: all eighth label samples in the training set are divided into 10 parts, 9 parts are selected for model training and the remaining part for model validation, yielding one first classification model; cycling 10 times yields 10 first classification models. The first classification model with the highest accuracy is selected as the second classification model, and the second classification model is applied to the test set to obtain the categories of all eighth label samples in the test set.
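  • a sketch of this sampling scheme, with stand-in data in place of the eighth label samples and their categories (the patent scores folds with the per-category F1 average; plain accuracy is used here for brevity):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import StratifiedKFold, train_test_split

    eighth_samples = np.random.rand(400, 8)        # stand-in reduced samples
    categories = np.random.randint(0, 6, 400)      # stand-in categories

    X_train, X_test, y_train, y_test = train_test_split(
        eighth_samples, categories, test_size=0.25, stratify=categories)

    best_model, best_score = None, -1.0
    for fit_idx, val_idx in StratifiedKFold(n_splits=10).split(X_train, y_train):
        model = RandomForestClassifier().fit(X_train[fit_idx], y_train[fit_idx])
        score = model.score(X_train[val_idx], y_train[val_idx])
        if score > best_score:
            best_model, best_score = model, score
    test_predictions = best_model.predict(X_test)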
  • the score of the first classification model can be used to represent the accuracy of the first classification model.
  • the score for the first classification model is the average of the scores for all classes.
  • the score of the first classification model for category b can be calculated according to the formula
    $$score_b = \frac{2N_{bb}}{2N_{bb} + \sum_{c \neq b} N_{cb} + \sum_{c \neq b} N_{bc}}$$
    where $N_{bb}$ is the number of sixth label samples used for validation whose true value is b and predicted value is b, $N_{cb}$ is the number whose true value is c and predicted value is b, $N_{bc}$ is the number whose true value is b and predicted value is c, and (m-1) is the number of categories.
  • to improve model accuracy, this example uses the grid method to tune the input parameters of the model. The tuned random forest parameters are the number of base classifiers n_estimators, the maximum depth of the base classifiers max_depth, the maximum number of features selected by the base classifiers max_features, and the criterion function criterion, four parameters in total, with the following enumerated values:
  • n_estimators = [50, 100, 150, 200];
  • max_depth = [6, 8, 10, 12];
  • max_features = [sqrt, 0.7, 0.9, 1];
  • criterion = [gini, entropy].
  • therefore 4*4*4*2 = 128 cycles are required in total, yielding 128 second classification models, from which the second classification model with the highest accuracy is selected as the basic classification model.
  • the score of the second classification model can be used to represent the accuracy of the second classification model.
  • the score for the second classification model is the average of the scores for all classes.
  • the score of the second classification model for category b can be calculated according to the formula
    $$score_b = \frac{2N_{bb}}{2N_{bb} + \sum_{c \neq b} N_{cb} + \sum_{c \neq b} N_{bc}}$$
    where $N_{bb}$ is the number of sixth label samples in the test set whose true value is b and predicted value is b, $N_{cb}$ is the number whose true value is c and predicted value is b, $N_{bc}$ is the number whose true value is b and predicted value is c, and (m-1) is the number of categories.
  • second label samples are collected in the first region, with the variables shown in Table 3.
  • for each second label sample, the K = 20 sixth label samples closest to it are found; if at least 0.8*20 = 16 of these 20 sixth label samples belong to the same category, the second label sample is taken as a first label sample, and the category to which the first label sample belongs is the category to which those sixth label samples belong.
  • if the number of sixth label samples belonging to the same category is less than 16, the second label sample is discarded.
  • this cycle is repeated until the number of first label samples of each category is greater than or equal to the user-set minimum number of label samples for that category; the minimum numbers for the different categories are shown in Table 4.
  • each dimension variable of each first label sample is standardized to obtain the fourth label samples according to the formula
    $$\hat{x}1_{ij} = \frac{x1_{ij} - mean1(S)}{std1(S)}$$
    where $\hat{x}1_{ij}$ is the j-th dimension variable of the i-th fourth label sample, $x1_{ij}$ is the j-th dimension variable of the i-th first label sample, mean1(S) is the mean of the j-th dimension variable over all first label samples, and std1(S) is the standard deviation of the j-th dimension variable over all first label samples.
  • this example uses PCA for dimensionality reduction. The main idea of PCA is to map n-dimensional variables onto d dimensions; the d-dimensional variables are brand-new orthogonal features, also known as principal components, reconstructed on the basis of the original n-dimensional variables. The job of PCA is to sequentially find a set of mutually orthogonal coordinate axes in the original n-dimensional space, where the choice of the new axes is closely related to the fourth label samples: the first new axis is the direction with the largest variance of the fourth label samples; the second new axis is the direction that maximizes their variance within the plane orthogonal to the first new axis; the third new axis is the direction that maximizes their variance within the plane orthogonal to the first two axes; and so on, until d such new axes are obtained. Each fourth label sample is then mapped into the new coordinate axes through an axis transformation.
  • principal component analysis reduces the complexity of the model training method without reducing its accuracy; 85% of the contribution rate is retained.
  • after the basic random forest model applicable to the second region (i.e., the above basic classification model) is migrated to the first region, new weak classifiers are trained with the small number of fifth label samples of the first region. By default, the number of new weak classifiers is 30% of the number of weak classifiers of the basic classification model; if the basic classification model has 100 weak classifiers, 30 new weak classifiers need to be trained.
  • when training a new weak classifier, the splitting feature and splitting value of its first node need to be determined. Gini(D) is the probability that two fifth label samples randomly drawn from the sample set D belong to different categories:
    $$Gini(D) = 1 - \sum_{k=1}^{y} p_k^2$$
    where $p_k$ is the proportion of fifth label samples of the k-th category in D and y is the total number of categories.
  • for a continuous feature e, its value range is divided at certain intervals; assuming the division points are $\{e_1, e_2, \ldots, e_V\}$, dividing the sample set D by e produces V branches, where the v-th branch node contains all fifth label samples in D whose value on feature e is greater than $e_{v-1}$ and less than $e_v$, denoted $D^v$. The Gini coefficient of feature e is then
    $$Gini\_index(D, e) = \sum_{v=1}^{V} \frac{|D^v|}{|D|} \, Gini(D^v)$$
  • the feature with the smallest Gini_index(D, e) is selected as the splitting feature of the first node of the new weak classifier.
  • by splitting downward layer by layer in this way, a new weak classifier is obtained by training; all new weak classifiers are learned in the same way.
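  • a sketch of the Gini-based split selection following the formulas above (the function names gini and gini_index are illustrative, not from the patent):

    import numpy as np

    def gini(labels):
        # Probability that two samples drawn from D have different categories:
        # Gini(D) = 1 - sum_k p_k^2
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return 1.0 - np.sum(p ** 2)

    def gini_index(feature_values, labels, cut_points):
        # Weighted Gini over the V branches induced by the cut points on e:
        # Gini_index(D, e) = sum_v |D_v|/|D| * Gini(D_v)
        branch = np.digitize(feature_values, cut_points)
        total = len(labels)
        return sum((np.sum(branch == v) / total) * gini(labels[branch == v])
                   for v in np.unique(branch))

    # The feature (with its cut points) minimizing gini_index becomes the
    # splitting feature and splitting value of the first node.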
  • Example 2 describes the training method of a fault classification model applied to coverage and interference cells.
  • the faults of coverage and interference cells are mainly divided into 5 types; together with the normal cell there are 6 categories in total:

    Category code   Category name
    0               normal cell
    1               weak coverage cell
    2               overlapping coverage cell
    3               handover coverage cell
    4               uplink interference cell
    5               downlink interference cell
  • the model training method includes the following steps 1 to 9.
  • each time point corresponds to one sixth label sample, and a sixth label sample consists of the 71 dimension variables shown in Table 6.
  • each dimension variable of each sixth label sample is standardized to obtain the seventh label samples according to the formula
    $$\hat{x}2_{ij} = \frac{x2_{ij} - mean2(S)}{std2(S)}$$
    where $\hat{x}2_{ij}$ is the j-th dimension variable of the i-th seventh label sample, $x2_{ij}$ is the j-th dimension variable of the i-th sixth label sample, mean2(S) is the mean of the j-th dimension variable over all sixth label samples, and std2(S) is the standard deviation of the j-th dimension variable over all sixth label samples.
  • in this example, the TSNE algorithm is used for dimensionality reduction. The TSNE algorithm models the distribution of the neighbors of each seventh label sample, where the neighbors are the set of label samples close to that sample. In the original high-dimensional space, the neighbor distribution is modeled as a Gaussian distribution, while in the low-dimensional output space (i.e., the eighth label samples) it is modeled as a t-distribution. The goal is to find a transformation that maps the high-dimensional space to the low-dimensional space while minimizing the gap between these two distributions over all label samples.
  • the output dimension of the TSNE algorithm can be configured; in this example, the 71 dimensions are reduced to 5.
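  • a sketch of this reduction with scikit-learn's TSNE and stand-in data; note that the default Barnes-Hut implementation only supports fewer than 4 output dimensions, so method="exact" is assumed for a 5-dimensional embedding:

    import numpy as np
    from sklearn.manifold import TSNE

    seventh_samples = np.random.rand(300, 71)    # stand-in for the 71-dim samples
    tsne = TSNE(n_components=5, method="exact")
    eighth_samples = tsne.fit_transform(seventh_samples)   # shape (300, 5)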
  • this example uses GBDT as the classification model and trains it with cross-validation. L-fold cross-validation with L = 10 is used to ensure model accuracy: the training set is divided into 10 parts, 9 parts are used for model training and the remaining part for model validation, yielding one first classification model per cycle; the first classification model with the highest accuracy is selected as the second classification model, and the second classification model is applied to the test set to obtain the categories of all eighth label samples in the test set.
  • the score of the first classification model can be used to represent the accuracy of the first classification model.
  • the score for the first classification model is the average of the scores for all classes.
  • the score of the first classification model for category b can be calculated according to the formula
    $$score_b = \frac{2N_{bb}}{2N_{bb} + \sum_{c \neq b} N_{cb} + \sum_{c \neq b} N_{bc}}$$
    where $N_{bb}$ is the number of sixth label samples used for validation whose true value is b and predicted value is b, $N_{cb}$ is the number whose true value is c and predicted value is b, $N_{bc}$ is the number whose true value is b and predicted value is c, and (m-1) is the number of categories.
  • to improve model accuracy, this example uses the grid method to tune the input parameters of the model. The tuned GBDT parameters are the number of base classifiers n_estimators, the maximum depth of the base classifiers max_depth, the maximum number of features selected by the base classifiers max_features, and the learning rate learning_rate, four parameters in total; for example, n_estimators = [50, 100, 150, 200].
  • the score of the second classification model can be used to represent the accuracy of the second classification model.
  • the score for the second classification model is the average of the scores for all classes.
  • the score of the second classification model for category b can be calculated according to the formula
    $$score_b = \frac{2N_{bb}}{2N_{bb} + \sum_{c \neq b} N_{cb} + \sum_{c \neq b} N_{bc}}$$
    where $N_{bb}$ is the number of sixth label samples in the test set whose true value is b and predicted value is b, $N_{cb}$ is the number whose true value is c and predicted value is b, $N_{bc}$ is the number whose true value is b and predicted value is c, and (m-1) is the number of categories.
  • second label samples are collected in the first region, with the variables shown in Table 6.
  • for each second label sample, the K = 20 sixth label samples closest to it are found; if at least 16 of these 20 sixth label samples belong to the same category, the second label sample is taken as a first label sample, and the category to which the first label sample belongs is the category to which those sixth label samples belong.
  • if the number of sixth label samples belonging to the same category is less than 16, the second label sample is discarded.
  • this cycle is repeated until the number of first label samples of each category is greater than or equal to the user-set minimum number of label samples for that category; the minimum numbers for the different categories are shown in Table 7.
    Category code   Category name                 Minimum number of label samples required in the other region
    0               normal cell                   100
    1               weak coverage cell            300
    2               overlapping coverage cell     200
    3               handover coverage cell        200
    4               uplink interference cell      300
    5               downlink interference cell    400
  • each dimension variable of each first label sample is standardized to obtain the fourth label samples according to the formula
    $$\hat{x}1_{ij} = \frac{x1_{ij} - mean1(S)}{std1(S)}$$
    where $\hat{x}1_{ij}$ is the j-th dimension variable of the i-th fourth label sample, $x1_{ij}$ is the j-th dimension variable of the i-th first label sample, mean1(S) is the mean of the j-th dimension variable over all first label samples, and std1(S) is the standard deviation of the j-th dimension variable over all first label samples.
  • the TSNE algorithm is used for dimensionality reduction. The TSNE algorithm models the distribution of the neighbors of each fourth label sample, where the neighbors are the set of label samples close to that sample. In the high-dimensional space the neighbor distribution is modeled as a Gaussian distribution, and in the low-dimensional output space (i.e., the fifth label samples) it is modeled as a t-distribution. The goal is to find a transformation that maps the high-dimensional space to the low-dimensional space while minimizing the gap between these two distributions over all label samples. In this example, the 71 dimensions are reduced to 5.
  • after the basic GBDT model applicable to the second region is migrated to the first region, new weak classifiers are trained with the small number of fifth label samples of the first region. By default, the number of new weak classifiers is 30% of the number of weak classifiers of the basic classification model; if the basic classification model has 100 weak classifiers, 30 new weak classifiers need to be trained.
  • when training a new weak classifier, the splitting feature and splitting value of its first node need to be determined. For each candidate split, the information gain can be calculated according to the formula
    $$Gain = \frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma$$
    where G denotes the sum of the first derivatives of the loss function with respect to the predicted value y* at the true value y, H denotes the corresponding sum of second derivatives, the subscript L denotes the left subtree after splitting at the candidate classification node, the subscript R denotes the right subtree, and λ and γ are input parameters that default to 0.
  • the Gain values of the candidates are calculated in parallel, and the candidate split point of the candidate feature with the largest Gain is selected as the splitting feature and splitting value of the first node.
  • by splitting downward layer by layer in this way, a new weak classifier is obtained by training; all new weak classifiers are learned in the same way.
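  • a sketch of the gain computation above, where G and H are assumed to be the sums of the first and second derivatives of the loss over the fifth label samples routed to each side of the candidate split (the function name split_gain is illustrative):

    def split_gain(G_L, H_L, G_R, H_R, lam=0.0, gamma=0.0):
        # Gain = 1/2 [ G_L^2/(H_L+lam) + G_R^2/(H_R+lam)
        #              - (G_L+G_R)^2/(H_L+H_R+lam) ] - gamma
        def term(G, H):
            return G * G / (H + lam)
        return 0.5 * (term(G_L, H_L) + term(G_R, H_R)
                      - term(G_L + G_R, H_L + H_R)) - gamma

    # The candidate feature/split value with the largest split_gain becomes
    # the splitting feature and splitting value of the first node.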
  • an embodiment of the present application also provides an electronic device, including:
  • at least one processor; and
  • a memory storing at least one program which, when executed by the at least one processor, implements the above model training method.
  • the processor is a device with data processing capability, including but not limited to a central processing unit (CPU); the memory is a device with data storage capability, including but not limited to random access memory (RAM, more specifically SDRAM, DDR, etc.), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), and flash memory (FLASH).
  • the processor and memory are connected to each other through a bus, which in turn is connected to other components of the computing device.
  • an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the above-mentioned model training method is implemented.
  • FIG. 3 is a block diagram of a model training apparatus provided by an embodiment of the present application.
  • an embodiment of the present application further provides a model training device, including:
  • the obtaining module 301 is configured to obtain the first label sample of the first area and the category to which the first label sample belongs;
  • the model retraining module 302 is configured to retrain the basic classification model according to the first label samples and the categories to which they belong to obtain a final classification model, where the basic classification model is a classification model applicable to the second region and the final classification model is a classification model applicable to the first region.
  • the obtaining module 301 is specifically configured to:
  • Some or all of the second label samples are selected as the first label samples according to the third label samples in the second area and the category to which the third label samples belong, and the category to which the first label samples belong is determined.
  • the obtaining module 301 is specifically configured to select some or all of the second label samples as the first label samples according to the third label samples of the second region and the categories to which the third label samples belong, and to determine the categories to which the first label samples belong, in the following manner:
  • determining the K third label samples with the highest similarity to each second label sample; and
  • when N of the K third label samples belong to the same category and N is greater than or equal to rK, taking the second label sample as a first label sample and determining that the category to which the first label sample belongs is the category to which the N third label samples belong, where r is a value greater than or equal to 0 and less than or equal to 1.
  • the obtaining module 301 is further configured to: discard the second label sample when N of the K third label samples belong to the same category and N is less than rK.
  • the obtaining module 301 is further configured to: standardize the first label samples to obtain fourth label samples, and perform dimensionality reduction on the fourth label samples to obtain fifth label samples;
  • the model retraining module 302 is then specifically configured to: retrain the basic classification model according to the categories to which the first label samples belong and the fifth label samples to obtain the final classification model.
  • the model retraining module 302 is specifically configured to perform at least one of the following:
  • when the basic classification model is a serially generated, sequential model, adding a new layer after the basic classification model, keeping the structural parameters of the basic classification model unchanged, and training the new layer according to the categories to which the first label samples belong and the fifth label samples to obtain the final classification model;
  • when the basic classification model is a parallelized model, keeping the structural parameters of the classifiers in the basic classification model unchanged, generating new classifiers, and training the new classifiers according to the categories to which the first label samples belong and the fifth label samples to obtain the final classification model.
  • the model training apparatus further includes:
  • the model training module 303 is configured to perform model training according to the category to which the sixth label sample of the second area belongs and the sixth label sample to obtain a basic classification model.
  • the obtaining module 301 is further configured to: standardize the sixth label samples to obtain seventh label samples, and perform dimensionality reduction on the seventh label samples to obtain eighth label samples;
  • the model training module 303 is then specifically configured to: perform model training according to the categories to which the sixth label samples belong and the eighth label samples to obtain the basic classification model.
  • such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media).
  • the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for the storage of information, such as computer-readable instructions, data structures, program modules, or other data.
  • computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
  • communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media, as is well known to those of ordinary skill in the art.
  • example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purposes of limitation. In some instances, as would be apparent to those skilled in the art, features, characteristics, and/or elements described in connection with a particular embodiment may be used alone or in combination with features, characteristics, and/or elements described in connection with other embodiments, unless expressly stated otherwise. Accordingly, those of ordinary skill in the art will understand that various changes in form and detail may be made without departing from the scope of the application as set forth in the appended claims.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application provides a model training method, a model training apparatus, an electronic device, and a computer-readable storage medium. The model training method includes: acquiring first label samples of a first region and the categories to which the first label samples belong; and retraining a basic classification model according to the first label samples and the categories to which they belong to obtain a final classification model, where the basic classification model is a classification model applicable to a second region and the final classification model is a classification model applicable to the first region.

Description

Model training method and apparatus, electronic device, and computer-readable storage medium
Cross-reference to related application
This application claims priority to Chinese patent application No. 202011259760.6 filed on November 11, 2020, the contents of which are incorporated herein by reference in their entirety.
Technical field
Embodiments of the present application relate to the field of communications, and in particular to a model training method and apparatus, an electronic device, and a computer-readable storage medium.
Background
With the rapid development of communication technology, user requirements keep rising. A communication system is a highly complex and integrated system; if any part of it fails, the normal operation of the entire system is seriously affected.
Disclosure
In a first aspect, an embodiment of the present application provides a model training method, including:
acquiring first label samples of a first region and the categories to which the first label samples belong; and
retraining a basic classification model using the first label samples according to the categories to which the first label samples belong to obtain a final classification model, where the basic classification model is a classification model applicable to a second region and the final classification model is a classification model applicable to the first region.
In a second aspect, an embodiment of the present application provides an electronic device, including:
at least one processor; and
a memory storing at least one program which, when executed by the at least one processor, implements the above model training method.
In a third aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above model training method.
Brief description of the drawings
FIG. 1 is a flowchart of a model training method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of a model training method according to an embodiment of the present application; and
FIG. 3 is a block diagram of a model training apparatus provided by an embodiment of the present application.
Detailed description
To enable those skilled in the art to better understand the technical solutions of the present application, the model training method and apparatus, electronic device, and computer-readable storage medium provided by the present application are described in detail below with reference to the accompanying drawings.
Example embodiments will be described more fully hereinafter with reference to the accompanying drawings, but they may be embodied in different forms and should not be construed as limited to the embodiments set forth herein. These embodiments are provided so that the present application will be thorough and complete, and will fully convey the scope of the present application to those skilled in the art.
The embodiments of the present application and the features in the embodiments may be combined with each other in the absence of conflict.
As used herein, the term "and/or" includes any and all combinations of at least one of the associated listed items.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the present application. As used herein, the singular forms "a" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the terms "comprise" and/or "made of", when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of at least one other feature, integer, step, operation, element, component, and/or group thereof.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. It will also be understood that terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art and the present application, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
To prevent a failure in one part of a communication system from seriously affecting the normal operation of the entire system, quickly detecting and identifying problem cells to guarantee system stability is a problem that the industry urgently needs to solve.
Current fault detection technology usually relies on real-time inspection of equipment and on analysis based on expert experience, which requires substantial manual effort and places extremely high demands on wireless network operation and maintenance. In some related technologies, big data analysis combined with artificial intelligence (AI) is used to train the relevant models. In the communications field, however, a model is often applicable in one place but, when moved to another place, changes in geographical location, user habits, networking mode, equipment structure, weather, and many other factors mean that the model cannot be applied in the new place. Since label samples are difficult to obtain, it is hard to re-collect label samples in another place, and therefore also hard to retrain the model.
Embodiments of the present application provide a model training method, an electronic device, a computer-readable storage medium, and a model training apparatus to at least partially solve the above problems.
FIG. 1 is a flowchart of a model training method provided by an embodiment of the present application.
In a first aspect, referring to FIG. 1, an embodiment of the present application provides a model training method including steps 100 and 101.
Step 100: Acquire first label samples of a first region and the categories to which the first label samples belong.
In some exemplary implementations, the first region may be any region and may be preset.
In some exemplary implementations, any method well known to those skilled in the art may be used to acquire the first label samples of the first region and the categories to which they belong. The specific acquisition method does not limit the protection scope of the embodiments of the present application.
In some exemplary implementations, since the category to which a first label sample belongs normally has to be determined manually, labeling can be automated to save the workload of manual labeling (i.e., marking the category to which a label sample belongs). Specifically, second label samples of the first region can be acquired, some or all of the second label samples can be selected as the first label samples according to third label samples of a second region and the categories to which the third label samples belong, and the categories to which the first label samples belong can be determined. That is to say, as shown in FIG. 2, acquiring the first label samples of the first region and the categories to which the first label samples belong includes:
acquiring second label samples of the first region; and
selecting some or all of the second label samples as the first label samples according to the third label samples of the second region and the categories to which the third label samples belong, and determining the categories to which the first label samples belong.
In some exemplary implementations, selecting some or all of the second label samples as the first label samples according to the third label samples of the second region and the categories to which the third label samples belong, and determining the categories to which the first label samples belong, includes:
determining the K third label samples with the highest similarity to each second label sample, K being an integer greater than or equal to 2; and
when N of the K third label samples belong to the same category and N is greater than or equal to rK, taking the second label sample as a first label sample and determining that the category to which the first label sample belongs is the category to which the N third label samples belong, where r is a value greater than or equal to 0 and less than or equal to 1.
In some exemplary implementations, when N of the K third label samples belong to the same category and N is less than rK, the model training method further includes: discarding the second label sample.
In some exemplary implementations, the similarity between a second label sample and a third label sample may be represented by the distance between the two samples. Of course, other parameters may also be used to represent this similarity, and the specific parameter does not limit the protection scope of the embodiments of the present application.
In some exemplary implementations, the distance between a second label sample and a third label sample is calculated according to the formula
$$Dis = \Big(\sum_{j=1}^{n} |x_{1j} - x_{2j}|^{p}\Big)^{1/p}$$
where Dis is the distance between the second label sample and the third label sample, p is a constant, n is the dimension of the label samples, $x_{1j}$ (j = 1, 2, 3, ..., n) is the j-th dimension variable of the second label sample, and $x_{2j}$ (j = 1, 2, 3, ..., n) is the j-th dimension variable of the third label sample.
Of course, other methods may also be used to calculate the distance between a second label sample and a third label sample, and the specific calculation method does not limit the protection scope of the embodiments of the present application.
In some exemplary implementations, the minimum number of first label samples required for each category may be preset; when the number of first label samples of a certain category reaches the corresponding minimum number, acquisition of first label samples of that category is stopped.
It should be noted that the third label samples of the second region may be some or all of the label samples used in the model training that produced the basic classification model (i.e., the sixth label samples mentioned below), may be some or all of the label samples whose categories were determined by the basic classification model, or may include both kinds of label samples; which label samples are used to determine the first label samples and their categories does not limit the protection scope of the embodiments of the present application.
It should be noted that, when deciding whether to take a second label sample as a first label sample, each label sample should be evaluated separately.
Step 101: Retrain the basic classification model according to the first label samples and the categories to which they belong to obtain a final classification model, where the basic classification model is a classification model applicable to the second region and the final classification model is a classification model applicable to the first region.
In some exemplary implementations, the second region may be any region and may be preset.
It should be noted that the first region and the second region are different regions; they may be two regions without any overlapping area, or two regions with an overlapping area whose proportion is less than or equal to a preset threshold.
It should be noted that the first region and the second region should be set such that the basic classification model applicable to the second region is not applicable to the first region.
In some exemplary implementations, the basic classification model is a classification model obtained by performing model training according to sixth label samples and the categories to which the sixth label samples belong.
In some exemplary implementations, retraining the basic classification model according to the first label samples and the categories to which they belong to obtain the final classification model includes at least one of the following:
when the basic classification model is a serially generated, sequential model (such as xgboost, AdaBoost, or a neural network), adding a new layer after the basic classification model, keeping the structural parameters of the basic classification model unchanged, and training the new layer according to the first label samples and the categories to which they belong to obtain the final classification model;
when the basic classification model is a parallelized model (such as a random forest), generating new classifiers and training the new classifiers according to the first label samples and the categories to which they belong to obtain the final classification model.
In some exemplary implementations, the specific type of the classifiers is not limited; they may be weak classifiers, strong classifiers, or other classifiers, and the specific type does not limit the protection scope of the embodiments of the present application.
In some exemplary implementations, since each variable in a label sample has a different unit, standardization can be performed on each first label sample to prevent differences in dimension from introducing errors between data magnitudes; the basic classification model is then retrained based on the standardized first label samples and the categories to which the first label samples belong to obtain the final classification model.
In some exemplary implementations, when the dimension of the original samples is high and the problem space is large, the computational load of the model is significantly affected; therefore, dimensionality reduction can be performed on each first label sample, and the basic classification model is then retrained based on the dimensionality-reduced first label samples and the categories to which the first label samples belong to obtain the final classification model.
In some exemplary implementations, standardization and dimensionality reduction may also both be performed on each first label sample. That is, before the basic classification model is retrained according to the first label samples and the categories to which they belong to obtain the final classification model, the model training method further includes:
standardizing the first label samples to obtain fourth label samples; and
performing dimensionality reduction on the fourth label samples to obtain fifth label samples;
and retraining the basic classification model according to the first label samples and the categories to which they belong to obtain the final classification model includes: retraining the basic classification model according to the categories to which the first label samples belong and the fifth label samples to obtain the final classification model.
In some exemplary implementations, standardizing the first label samples includes standardizing the j-th dimension variable of the i-th first label sample according to the formula
$$\hat{x}1_{ij} = \frac{x1_{ij} - mean1(S)}{std1(S)}$$
where $\hat{x}1_{ij}$ is the j-th dimension variable of the i-th fourth label sample, $x1_{ij}$ is the j-th dimension variable of the i-th first label sample, mean1(S) is the mean of the j-th dimension variable over all first label samples, and std1(S) is the standard deviation of the j-th dimension variable over all first label samples.
In some exemplary implementations, since the number of first label samples is generally small, computing the mean and standard deviation of the j-th dimension variable over all first label samples alone is of limited value, while the number of sixth label samples used to train the basic classification model is generally large; therefore, mean1(S) may be taken as the mean of the j-th dimension variable over all sixth label samples and std1(S) as the standard deviation of the j-th dimension variable over all sixth label samples; alternatively, mean1(S) may be taken as the mean of the j-th dimension variable over all first label samples and all sixth label samples together, and std1(S) as the corresponding standard deviation.
In some exemplary implementations, dimensionality reduction algorithms well known to those skilled in the art (for example, the Principal Component Analysis (PCA) algorithm, the t-distributed Stochastic Neighbor Embedding (TSNE) algorithm, the Locally Linear Embedding (LLE) algorithm, or the MultiDimensional Scaling (MDS) algorithm) may be employed; the specific algorithm does not limit the protection scope of the embodiments of the present application and is not described further here.
The dimension of the label samples after dimensionality reduction can be preset. For example, if PCA is used, 85% of the principal components can be retained; if TSNE, LLE, or MDS is used, the target can be set to two dimensions. The dimension of the label samples after dimensionality reduction does not limit the protection scope of the embodiments of the present application.
It should be noted that standardization and dimensionality reduction do not change the category to which a label sample belongs; that is, the category of a first label sample, the category of the standardized first label sample (i.e., the fourth label sample), the category of the dimensionality-reduced first label sample, and the category of the fifth label sample are all the same.
In some exemplary implementations, retraining the basic classification model according to the categories to which the first label samples belong and the fifth label samples to obtain the final classification model includes at least one of the following:
when the basic classification model is a serially generated, sequential model (such as xgboost, AdaBoost, or a neural network), adding a new layer after the basic classification model, keeping the structural parameters of the basic classification model unchanged, and training the new layer according to the categories to which the first label samples belong and the fifth label samples to obtain the final classification model;
when the basic classification model is a parallelized model (such as a random forest), keeping the structural parameters of the classifiers in the basic classification model unchanged, generating new classifiers, and training the new classifiers according to the categories to which the first label samples belong and the fifth label samples to obtain the final classification model.
In some exemplary implementations, as shown in FIG. 2, before acquiring the first label samples of the first region and the categories to which they belong, the model training method further includes:
obtaining the basic classification model by performing model training according to the sixth label samples of the second region and the categories to which the sixth label samples belong.
In some exemplary implementations, the classification model may be a classification model well known to those skilled in the art, for example, a random forest, a Gradient Boosting Decision Tree (GBDT), XGBoost (eXtreme Gradient Boosting), or a neural network.
In some exemplary implementations, model training methods well known to those skilled in the art may be used to obtain the basic classification model. For example, all sixth label samples are divided by stratified sampling into a training set and a test set at a preset ratio (e.g., training set : test set = 0.75 : 0.25), and L-fold cross-validation is used to ensure model accuracy: the training set is further divided into L parts, with (L-1) parts used for model training and the remaining part used for model validation; cycling L times yields L first classification models; the first classification model with the highest accuracy is selected from the L first classification models as the basic classification model; and the basic classification model is applied to the test set to obtain the category of each sixth label sample in the test set.
It should be noted that in each cycle (L-1) parts of the label data are used for model training to obtain one classification model, and the label samples used for model validation differ between the L cycles.
For example, suppose there are 1000 sixth label samples, split 0.75 : 0.25 into a training set of 750 samples and a test set of 250 samples; the 750 training samples are then divided into L = 10 parts of 75 samples each. In each cycle, one part that has not yet served as validation data is used for model validation and the remaining nine parts are used for model training; cycling 10 times in this way yields 10 classification models.
In some exemplary implementations, for each first classification model, the first classification model is applied to the corresponding part of sixth label samples reserved for model validation to obtain the category to which each of those sixth label samples belongs, and the accuracy of the first classification model is then determined by comparing the obtained category of each sixth label sample with its true category.
In some exemplary implementations, the score of the first classification model may be used to represent its accuracy. For example, when the classification model is a random forest, a confusion matrix can be used to characterize the quality of the first classification model, as shown in Table 1.
Table 1 (confusion matrix: rows are the true categories, columns are the predicted categories, and the entry in row b and column c is the number $N_{bc}$ of samples with true category b and predicted category c)
It should be noted that, when calculating the score of the first classification model, the true value refers to the true category of a sixth label sample obtained by other means, and the predicted value refers to the category obtained by applying the first classification model to the corresponding sixth label samples used for validation.
The score of the first classification model is then the average of the scores of all categories.
The score of the first classification model for category b can be calculated according to the formula
$$score_b = \frac{2N_{bb}}{2N_{bb} + \sum_{c \neq b} N_{cb} + \sum_{c \neq b} N_{bc}}$$
where $N_{bb}$ is the number of sixth label samples used for validation whose true value is b and predicted value is b, $N_{cb}$ is the number of sixth label samples used for validation whose true value is c and predicted value is b, $N_{bc}$ is the number of sixth label samples used for validation whose true value is b and predicted value is c, and (m-1) is the number of categories.
In some exemplary implementations, to improve the accuracy of the basic classification model, the grid method can be used to tune the input parameters of the classification model: enumerated values are set for the input parameters of the classification model, a second classification model is obtained by the model training method for each value combination, and the second classification model with the highest accuracy is selected from all second classification models as the basic classification model.
In some exemplary implementations, model training methods well known to those skilled in the art may be used to obtain the second classification models. For example, for each set of input parameters, all sixth label samples are divided by stratified sampling into a training set and a test set at a preset ratio (e.g., training set : test set = 0.75 : 0.25), and L-fold cross-validation is used to ensure model accuracy: the training set is further divided into L parts, with (L-1) parts used for model training and the remaining part for model validation; cycling L times yields L first classification models; the first classification model with the highest accuracy is selected as the second classification model; and the second classification model with the highest accuracy is selected from all second classification models as the basic classification model.
It should be noted that in each cycle (L-1) parts of the label data are used for model training to obtain one classification model, and the label samples used for model validation differ between the L cycles.
In some exemplary implementations, for each second classification model, the second classification model is applied to the corresponding test set to obtain the category to which each sixth label sample belongs, and the accuracy of the second classification model is then determined by comparing the obtained category of each sixth label sample with its true category.
In some exemplary implementations, the score of the second classification model may be used to represent its accuracy. For example, when the classification model is a random forest, a confusion matrix can be used to characterize the quality of the second classification model, as shown in Table 1.
It should be noted that, when calculating the score of the second classification model, the true value refers to the true category of a sixth label sample obtained by other means, and the predicted value refers to the category obtained by applying the second classification model to the corresponding sixth label samples used for validation.
The score of the second classification model is then the average of the scores of all categories.
The score of the second classification model for category b can be calculated according to the formula
$$score_b = \frac{2N_{bb}}{2N_{bb} + \sum_{c \neq b} N_{cb} + \sum_{c \neq b} N_{bc}}$$
where $N_{bb}$ is the number of sixth label samples in the test set whose true value is b and predicted value is b, $N_{cb}$ is the number whose true value is c and predicted value is b, $N_{bc}$ is the number whose true value is b and predicted value is c, and (m-1) is the number of categories.
In some exemplary implementations, since each variable in a label sample has a different unit, standardization can be performed on each sixth label sample to prevent differences in dimension from introducing errors between data magnitudes; model training is then performed based on the standardized sixth label samples and the categories to which the sixth label samples belong to obtain the basic classification model.
In some exemplary implementations, when the dimension of the original samples is high and the problem space is large, the computational load of the model is significantly affected; therefore, dimensionality reduction can be performed on each sixth label sample, and model training is then performed based on the dimensionality-reduced sixth label samples and the categories to which the sixth label samples belong to obtain the basic classification model.
In some exemplary implementations, standardization and dimensionality reduction may also both be performed on each sixth label sample. That is, before model training is performed according to the sixth label samples of the second region and the categories to which they belong to obtain the basic classification model, the model training method further includes:
standardizing the sixth label samples to obtain seventh label samples; and
performing dimensionality reduction on the seventh label samples to obtain eighth label samples;
and performing model training according to the sixth label samples of the second region and the categories to which they belong to obtain the basic classification model includes: performing model training according to the categories to which the sixth label samples belong and the eighth label samples to obtain the basic classification model.
In some exemplary implementations, standardizing the sixth label samples includes standardizing the j-th dimension variable of the i-th sixth label sample according to the formula
$$\hat{x}2_{ij} = \frac{x2_{ij} - mean2(S)}{std2(S)}$$
where $\hat{x}2_{ij}$ is the j-th dimension variable of the i-th seventh label sample, $x2_{ij}$ is the j-th dimension variable of the i-th sixth label sample, mean2(S) is the mean of the j-th dimension variable over all sixth label samples, and std2(S) is the standard deviation of the j-th dimension variable over all sixth label samples.
In some exemplary implementations, dimensionality reduction algorithms well known to those skilled in the art (e.g., the PCA, TSNE, LLE, or MDS algorithm) may be employed; the specific algorithm does not limit the protection scope of the embodiments of the present application and is not described further here.
The dimension of the label samples after dimensionality reduction can be preset. For example, if PCA is used, 85% of the principal components can be retained; if TSNE, LLE, or MDS is used, the target can be set to two dimensions. The dimension of the label samples after dimensionality reduction does not limit the protection scope of the embodiments of the present application.
It should be noted that standardization and dimensionality reduction do not change the category to which a label sample belongs; that is, the category of a sixth label sample, the category of the standardized sixth label sample (i.e., the seventh label sample), the category of the dimensionality-reduced sixth label sample, and the category of the eighth label sample are all the same.
本申请实施例提供的模型训练方法中,先获取第一区域的第一标签样本以及所述第一标签样本所属的类别,然后基于第一标签样本所属的类别,使用第一标签样本对基础分类模型进行模型的再次训练得到最终分类模型,由于最终分类模型是对基础分类模型进行模型的再次训练得到的,而不是进行模型的重新训练得到的,因此,并不需要获取第一区域过多的标签样本就能实现对适用于第一区域的分类模型的训练,也就是,采用少量的第一标签样本对基础分类模型进行模型的再次训练就能得到最终分类模型,简单地实现了不同区域的模型训练。
In some exemplary implementations, the categories of the first label samples are labeled automatically based on the third label samples of the second region, which saves a large amount of manual labeling work and improves the accuracy of model training.
The specific implementation process of the model training method of the above embodiments is described in detail below through two examples. It should be noted that the examples are given merely for convenience of description; they are neither the only implementations of the model training method of the embodiments of the present application nor intended to limit the scope of protection of the embodiments of the present application.
Example 1
This example describes a method for training a fault classification model applied to sleeping cells. As shown in Table 2, sleeping-cell faults fall into 5 main types, which, together with normal cells, gives 6 categories in total.
Table 2 Sleeping-cell category table
[Rendered as an image in the original; the category codes and names can be recovered from Table 4: 0 normal cell, 1 sleeping cell with no user access, 2 sleeping cell with random access, 3 sleeping cell with RRC access or handover-in requests, 4 sleeping cell with RRC but no ERAB, 5 sleeping cell with abnormal PDCP traffic.]
As shown in FIG. 2, the model training method includes the following steps 1 to 9.
1. Obtain sixth label samples of the second region.
In this example, in the time dimension, data at the current moment and at 4 corresponding historical moments are acquired. For example, if it is now Monday 18:00, the data of the current Monday 18:00 and of 18:00 on each of the previous four Mondays (5 moments in total) are acquired, and the indicator variables at each moment include the 15 indicator variables in Table 3; therefore, one sixth label sample has 15*5=75 dimension variables.
Table 3
[The 15 indicator variables; rendered as images in the original and not recoverable from the text.]
2. Standardize each dimension variable of each sixth label sample separately to obtain seventh label samples.
The j-th dimension variable of the i-th sixth label sample may be standardized according to the formula (an image in the original; reconstructed from the definitions that follow):

$$x2'_{ij}=\frac{x2_{ij}-\mathrm{mean2}(S)}{\mathrm{std2}(S)}$$

where x2'_{ij} is the j-th dimension variable of the i-th seventh label sample, x2_{ij} is the j-th dimension variable of the i-th sixth label sample, mean2(S) is the mean of the j-th dimension variable over all sixth label samples, and std2(S) is the standard deviation of the j-th dimension variable over all sixth label samples.
3. Perform dimensionality reduction on each seventh label sample separately to obtain eighth label samples.
This example uses PCA for the dimensionality reduction. The main idea of PCA is to map n-dimensional variables onto d dimensions; these d variables are brand-new orthogonal features, also called principal components, which are reconstructed from the original n-dimensional variables. PCA works by sequentially finding a set of mutually orthogonal coordinate axes in the original n-dimensional space, where the choice of the new axes is closely related to the seventh label samples: the first new axis is the direction of maximum variance of the seventh label samples; the second new axis is the direction of maximum variance within the plane orthogonal to the first axis; the third new axis is the direction of maximum variance within the plane orthogonal to the first two axes; and so on, until d such new axes are obtained. Each seventh label sample is then mapped into the new coordinate system by coordinate transformation.
Principal component analysis reduces the complexity of the model training method without reducing its accuracy; a contribution rate of 85% is retained.
4. Perform model training according to the categories of the sixth label samples and the eighth label samples to obtain a basic classification model applicable to the second region.
This example uses a random forest model as the classification model and trains the random forest model with cross-validation.
The eighth label samples are divided by stratified sampling into training set : test set = 0.75:0.25, and L-fold cross-validation is used to ensure the accuracy of the model. Let L=10; that is, all the eighth label samples in the training set are divided into 10 folds, 9 folds are selected for model training and the remaining fold for model validation to obtain one first classification model; cycling 10 times yields 10 first classification models; the first classification model with the highest accuracy is selected as the second classification model, and the second classification model is then applied to the eighth label samples of the test set to obtain the category to which each eighth label sample in the test set belongs.
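A compact sketch of this split-and-select procedure; the stratified fold splitter, the placeholder data, and picking the best fold model by validation accuracy are illustrative assumptions:

```python
# Stratified 0.75/0.25 split plus 10-fold selection of the best fold model.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, train_test_split

X8 = np.random.randn(1000, 20)            # placeholder eighth label samples
y6 = np.random.randint(0, 6, size=1000)   # placeholder categories

X_tr, X_te, y_tr, y_te = train_test_split(X8, y6, test_size=0.25, stratify=y6)

best_model, best_acc = None, -1.0
for tr_idx, va_idx in StratifiedKFold(n_splits=10).split(X_tr, y_tr):
    m = RandomForestClassifier().fit(X_tr[tr_idx], y_tr[tr_idx])
    acc = m.score(X_tr[va_idx], y_tr[va_idx])
    if acc > best_acc:
        best_model, best_acc = m, acc      # second classification model

test_pred = best_model.predict(X_te)       # categories of the test-set samples
```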
The score of the first classification model may be used to represent the accuracy of the first classification model.
The score of the first classification model is the average of the scores of all categories.
The score of the a-th category of the first classification model may be calculated according to the following formulas (images in the original; reconstructed as in the general description above, with b denoting the a-th category):

$$\mathrm{precision}_a=\frac{N_{bb}}{N_{bb}+\sum_{c\neq b}N_{cb}},\qquad \mathrm{recall}_a=\frac{N_{bb}}{N_{bb}+\sum_{c\neq b}N_{bc}},\qquad \mathrm{score}_a=\frac{2\cdot\mathrm{precision}_a\cdot\mathrm{recall}_a}{\mathrm{precision}_a+\mathrm{recall}_a}$$

where N_bb is the number of sixth label samples used for validation whose true value is b and whose predicted value is b, N_cb is the number of sixth label samples used for validation whose true value is c and whose predicted value is b, N_bc is the number of sixth label samples used for validation whose true value is b and whose predicted value is c, and (m-1) is the number of categories.
To improve the accuracy of the model, this example uses the grid method to tune the input parameters of the model. The tuned random forest input parameters are the number of random forest base classifiers n_estimators, the maximum depth of a base classifier max_depth, the maximum number of features a base classifier may select max_features, and the evaluation criterion function criterion, four parameters in total.
The enumerated values of each input parameter are set as:
n_estimators=[50,100,150,200];
max_depth=[6,8,10,12];
max_features=[sqrt,0.7,0.9,1];
criterion=[gini,entropy].
Therefore, a total of 4*4*4*2=128 cycles are required, 128 second classification models are finally obtained from the 128 cycles, and the second classification model with the highest accuracy is selected from the 128 second classification models as the basic classification model.
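In scikit-learn, GridSearchCV is a close stand-in for this grid method (it refits the best combination on the whole training set rather than keeping a single fold model); the placeholder data and cv setting are assumptions:

```python
# Enumerate the 4*4*4*2 = 128 parameter combinations with 10-fold CV.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X_tr = np.random.randn(1000, 20)          # placeholder training samples
y_tr = np.random.randint(0, 6, 1000)      # placeholder categories

param_grid = {
    "n_estimators": [50, 100, 150, 200],
    "max_depth": [6, 8, 10, 12],
    "max_features": ["sqrt", 0.7, 0.9, 1.0],   # 1.0 means use all features
    "criterion": ["gini", "entropy"],
}
search = GridSearchCV(RandomForestClassifier(), param_grid, cv=10)
search.fit(X_tr, y_tr)                    # 128 combinations x 10 folds
base_model = search.best_estimator_       # basic classification model
```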
The score of the second classification model may be used to represent the accuracy of the second classification model.
The score of the second classification model is the average of the scores of all categories.
The score of the a-th category of the second classification model may be calculated according to the formulas (images in the original; reconstructed as above, with b denoting the a-th category):

$$\mathrm{precision}_a=\frac{N_{bb}}{N_{bb}+\sum_{c\neq b}N_{cb}},\qquad \mathrm{recall}_a=\frac{N_{bb}}{N_{bb}+\sum_{c\neq b}N_{bc}},\qquad \mathrm{score}_a=\frac{2\cdot\mathrm{precision}_a\cdot\mathrm{recall}_a}{\mathrm{precision}_a+\mathrm{recall}_a}$$

where N_bb is the number of sixth label samples in the test set whose true value is b and whose predicted value is b, N_cb is the number of sixth label samples in the test set whose true value is c and whose predicted value is b, N_bc is the number of sixth label samples in the test set whose true value is b and whose predicted value is c, and (m-1) is the number of categories.
5. Obtain second label samples of the first region.
Second label samples are collected in the first region as shown in Table 3.
6. Select some or all of the second label samples as first label samples according to the sixth label samples and the categories to which the sixth label samples belong, and determine the categories to which the first label samples belong.
In this example, for each second label sample, the K=20 sixth label samples closest to the second label sample are found among the sixth label samples. If, among these 20 sixth label samples, at least 0.8*20=16 belong to the same category, the second label sample is taken as a first label sample, and the category of the first label sample is the category to which those 16 sixth label samples belong.
If the number of sixth label samples belonging to the same category is less than 16, the second label sample is discarded.
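The selection rule just described can be sketched as follows; the Euclidean distance metric and the array layout are assumptions:

```python
# Select first label samples from second-region neighbours (K=20, r=0.8).
import numpy as np

def select_first_label_samples(X2, X6, y6, K=20, r=0.8):
    selected, labels = [], []
    for x in X2:
        # Euclidean distance is an assumed similarity measure
        nearest = np.argsort(np.linalg.norm(X6 - x, axis=1))[:K]
        categories, counts = np.unique(y6[nearest], return_counts=True)
        if counts.max() >= r * K:              # e.g. at least 16 of 20 agree
            selected.append(x)
            labels.append(categories[counts.argmax()])
        # otherwise the second label sample is discarded
    return np.array(selected), np.array(labels)
```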
This is repeated until, for every category, the number of first label samples is greater than or equal to the user-set minimum number of label samples for that category; the minimum numbers of label samples for the different categories are shown in Table 4.
Table 4
Category code  Category name  Minimum number of label samples required in the other region
0  Normal cell  100
1  Sleeping cell with no user access  200
2  Sleeping cell with random access  200
3  Sleeping cell with RRC access or handover-in requests  200
4  Sleeping cell with RRC but no ERAB  100
5  Sleeping cell with abnormal PDCP traffic  200
7. Standardize each dimension variable of each first label sample separately to obtain fourth label samples.
The j-th dimension variable of the i-th first label sample may be standardized according to the formula (an image in the original; reconstructed from the definitions that follow):

$$x1'_{ij}=\frac{x1_{ij}-\mathrm{mean1}(S)}{\mathrm{std1}(S)}$$

where x1'_{ij} is the j-th dimension variable of the i-th fourth label sample, x1_{ij} is the j-th dimension variable of the i-th first label sample, mean1(S) is the mean of the j-th dimension variable over all first label samples, and std1(S) is the standard deviation of the j-th dimension variable over all first label samples.
8. Perform dimensionality reduction on each fourth label sample separately to obtain fifth label samples.
This example uses PCA for the dimensionality reduction. The main idea of PCA is to map n-dimensional variables onto d dimensions; these d variables are brand-new orthogonal features, also called principal components, which are reconstructed from the original n-dimensional variables. PCA works by sequentially finding a set of mutually orthogonal coordinate axes in the original n-dimensional space, where the choice of the new axes is closely related to the fourth label samples: the first new axis is the direction of maximum variance of the fourth label samples; the second new axis is the direction of maximum variance within the plane orthogonal to the first axis; the third new axis is the direction of maximum variance within the plane orthogonal to the first two axes; and so on, until d such new axes are obtained. Each fourth label sample is then mapped into the new coordinate system by coordinate transformation.
Principal component analysis reduces the complexity of the model training method without reducing its accuracy; a contribution rate of 85% is retained.
9. Retrain the basic classification model according to the categories of the first label samples and the fifth label samples to obtain the final classification model.
After the basic random forest model applicable to the second region (i.e., the above basic classification model) is migrated to the first region, new weak classifiers are trained with a small number of fifth label samples of the first region. By default, the number of new weak classifiers is 30% of the number of weak classifiers of the basic classification model; if the basic classification model has 100 weak classifiers, 30 new weak classifiers need to be trained.
While keeping the structural parameters of the original 100 weak classifiers of the basic classification model unchanged, 30 new weak classifiers are generated, finally yielding a random forest model with 130 weak classifiers. This model is applied on site in the first region for diagnosis.
The 30 new weak classifiers are trained with the model input parameters corresponding to the basic classification model (for example, max_depth=6, max_features=0.7, criterion=gini).
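In scikit-learn terms, one way to sketch this kind of forest extension is warm_start, which grows additional trees while leaving the already-fitted ones untouched; the 100 and 130 tree counts follow the text, while the placeholder data and the warm_start mechanism itself are assumptions:

```python
# Grow 30 extra trees on first-region data, keeping the original 100 intact.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

X8 = np.random.randn(1000, 20); y6 = np.random.randint(0, 6, 1000)  # 2nd region
X5 = np.random.randn(300, 20);  y1 = np.random.randint(0, 6, 300)   # 1st region

base_model = RandomForestClassifier(n_estimators=100, max_depth=6,
                                    max_features=0.7, criterion="gini")
base_model.fit(X8, y6)                        # basic classification model

base_model.set_params(warm_start=True, n_estimators=130)
base_model.fit(X5, y1)                        # only the 30 new trees are fit
# base_model now holds 130 weak classifiers: the final classification model
```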
For each new weak classifier, 15*0.7≈10 features are first randomly selected from all the features of the new weak classifier (as shown in Table 3) as the features of that new weak classifier.
The split feature and split value of the first node need to be determined first.
The purity of the sample set D containing the fifth label samples can be measured by the following formula (an image in the original; the reconstruction below is the standard Gini impurity implied by the definitions that follow):

$$\mathrm{Gini}(D)=1-\sum_{k=1}^{y}p_k^2$$

where Gini(D) is the probability that two fifth label samples drawn at random from the sample set D belong to different categories (the smaller Gini(D), the higher the purity of D), p_k is the proportion of fifth label samples of the k-th category, and y is the total number of categories.
Assume feature e is a continuous attribute and its values are partitioned into intervals with division points {e_1, e_2, ..., e_V}. If e is used to partition the sample set D, V branches are produced, and the v-th branch node contains all fifth label samples in D whose value on feature e is greater than e_{v-1} and less than e_v, denoted D_v.
Among the 10 features of the new weak classifier, for each feature e, the Gini index of feature e is calculated (an image in the original; the reconstruction below is the standard size-weighted form consistent with the surrounding definitions):

$$\mathrm{Gini\_index}(D,e)=\sum_{v=1}^{V}\frac{|D_v|}{|D|}\,\mathrm{Gini}(D_v)$$

where Gini_index(D,e) is the Gini index of feature e.
Among the 10 features, the feature with the smallest Gini_index(D,e) is chosen as the split feature of the first node of the new weak classifier.
All division points {e_1, e_2, ..., e_V} are then traversed, and the sample set D is divided into D_1 and D_2 according to each division point, computing: Gini(D, e_v) = Gini(D_1) + Gini(D_2).
Gini(D, e_v) is calculated for every division point, and the division point with the smallest value is selected as the best split value of the first node; thus the optimal split attribute and optimal split value of the first node of the new weak classifier are obtained.
This is repeated, computing the optimal split attribute and optimal split value for the left child node and right child node of each node, with the depth of the new weak classifier not exceeding max_depth=6.
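A sketch of this two-stage split search under the formulas above; the candidate division points and the data layout are assumptions:

```python
# Pick a node's split: feature by smallest Gini_index, value by smallest
# Gini(D1) + Gini(D2), following the formulas above.
import numpy as np

def gini(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def pick_split(X, y, features, edges):
    # edges[e]: assumed candidate division points {e_1, ..., e_V} of feature e
    def gini_index(e):
        bins = np.digitize(X[:, e], edges[e])
        return sum((bins == b).mean() * gini(y[bins == b])   # |D_v|/|D| weight
                   for b in np.unique(bins))
    e_best = min(features, key=gini_index)

    def split_score(v):
        left, right = y[X[:, e_best] <= v], y[X[:, e_best] > v]
        if len(left) == 0 or len(right) == 0:
            return np.inf
        return gini(left) + gini(right)        # Gini(D, e_v) as defined above
    v_best = min(edges[e_best], key=split_score)
    return e_best, v_best
```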
A new weak classifier can thus be trained; likewise, all the new weak classifiers are learned according to the above method.
Example 2
This example describes a method for training a fault classification model applied to coverage and interference cells. As shown in Table 5, the faults of coverage and interference cells fall into 5 main types, which, together with normal cells, gives 6 categories in total.
Table 5
Category code  Category name
0  Normal cell
1  Weak coverage cell
2  Overlapping coverage cell
3  Overshooting coverage cell
4  Uplink interference cell
5  Downlink interference cell
The model training method includes the following steps 1 to 9.
1. Obtain sixth label samples of the second region.
In this example, in the time dimension, each time point corresponds to one sixth label sample. In the space dimension, one sixth label sample includes the 71 dimension variables shown in Table 6.
Table 6
[The 71 dimension variables; rendered as images in the original and not recoverable from the text.]
2. Standardize each dimension variable of each sixth label sample separately to obtain seventh label samples.
The j-th dimension variable of the i-th sixth label sample may be standardized according to the formula (an image in the original; reconstructed from the definitions that follow):

$$x2'_{ij}=\frac{x2_{ij}-\mathrm{mean2}(S)}{\mathrm{std2}(S)}$$

where x2'_{ij} is the j-th dimension variable of the i-th seventh label sample, x2_{ij} is the j-th dimension variable of the i-th sixth label sample, mean2(S) is the mean of the j-th dimension variable over all sixth label samples, and std2(S) is the standard deviation of the j-th dimension variable over all sixth label samples.
3. Perform dimensionality reduction on each seventh label sample separately to obtain eighth label samples.
This example uses the TSNE algorithm for the dimensionality reduction. The TSNE algorithm models the distribution of the neighbors of each seventh label sample, the neighbors being the set of label samples close to that seventh label sample. In the high-dimensional space of the seventh label samples, the space is modeled as a Gaussian distribution, while in the low-dimensional output space (i.e., the eighth label samples) it can be modeled as a t-distribution; the goal of the process is to find the transformation that maps the high-dimensional space to the low-dimensional space while minimizing the gap between these two distributions over all label samples.
The target dimensionality of the TSNE algorithm can be set freely; this example reduces the 71 dimensions to 5 dimensions.
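A sketch with scikit-learn's TSNE; note that for more than 3 output dimensions the exact algorithm has to be selected, since the default Barnes-Hut approximation supports at most 3. The input array is a placeholder:

```python
# Reduce the 71-dimensional standardized samples to 5 dimensions with t-SNE.
import numpy as np
from sklearn.manifold import TSNE

X7 = np.random.randn(500, 71)                # placeholder seventh label samples
tsne = TSNE(n_components=5, method="exact")  # barnes_hut only supports < 4 dims
X8 = tsne.fit_transform(X7)                  # eighth label samples, 5 dimensions
```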
4. Perform model training according to the categories of the sixth label samples and the eighth label samples to obtain a basic classification model applicable to the second region.
This example uses GBDT as the classification model and trains the GBDT with cross-validation.
The eighth label samples are divided by stratified sampling into training set : test set = 0.75:0.25, and L-fold cross-validation is used to ensure the accuracy of the model. Let L=10; that is, all the eighth label samples in the training set are divided into 10 folds, 9 folds are selected for model training and the remaining fold for model validation to obtain one first classification model; cycling 10 times yields 10 first classification models; the first classification model with the highest accuracy is selected as the second classification model, and the second classification model is then applied to the eighth label samples of the test set to obtain the category to which each eighth label sample in the test set belongs.
The score of the first classification model may be used to represent the accuracy of the first classification model.
The score of the first classification model is the average of the scores of all categories.
The score of the a-th category of the first classification model may be calculated according to the formulas (images in the original; reconstructed as above, with b denoting the a-th category):

$$\mathrm{precision}_a=\frac{N_{bb}}{N_{bb}+\sum_{c\neq b}N_{cb}},\qquad \mathrm{recall}_a=\frac{N_{bb}}{N_{bb}+\sum_{c\neq b}N_{bc}},\qquad \mathrm{score}_a=\frac{2\cdot\mathrm{precision}_a\cdot\mathrm{recall}_a}{\mathrm{precision}_a+\mathrm{recall}_a}$$

where N_bb is the number of sixth label samples used for validation whose true value is b and whose predicted value is b, N_cb is the number of sixth label samples used for validation whose true value is c and whose predicted value is b, N_bc is the number of sixth label samples used for validation whose true value is b and whose predicted value is c, and (m-1) is the number of categories.
To improve the accuracy of the model, this example uses the grid method to tune the input parameters of the model. The input parameters are the number of GBDT base classifiers n_estimators, the maximum depth of a base classifier max_depth, the maximum number of features a base classifier may select max_features, and the learning rate learning_rate, four parameters in total.
The enumerated values of each parameter are set as:
n_estimators=[50,100,150,200];
max_depth=[6,8,10,12];
max_features=[sqrt,0.7,0.9,1];
learning_rate=[0.1,0.2,0.4,0.8].
Therefore, a total of 4*4*4*4=256 cycles are required, 256 second classification models are finally obtained from the 256 cycles, and the second classification model with the highest accuracy is selected from the 256 second classification models as the basic classification model.
The score of the second classification model may be used to represent the accuracy of the second classification model.
The score of the second classification model is the average of the scores of all categories.
The score of the a-th category of the second classification model may be calculated according to the formulas (images in the original; reconstructed as above, with b denoting the a-th category):

$$\mathrm{precision}_a=\frac{N_{bb}}{N_{bb}+\sum_{c\neq b}N_{cb}},\qquad \mathrm{recall}_a=\frac{N_{bb}}{N_{bb}+\sum_{c\neq b}N_{bc}},\qquad \mathrm{score}_a=\frac{2\cdot\mathrm{precision}_a\cdot\mathrm{recall}_a}{\mathrm{precision}_a+\mathrm{recall}_a}$$

where N_bb is the number of sixth label samples in the test set whose true value is b and whose predicted value is b, N_cb is the number of sixth label samples in the test set whose true value is c and whose predicted value is b, N_bc is the number of sixth label samples in the test set whose true value is b and whose predicted value is c, and (m-1) is the number of categories.
5. Obtain second label samples of the first region.
Second label samples are collected in the first region as shown in Table 6.
6. Select some or all of the second label samples as first label samples according to the sixth label samples and the categories to which the sixth label samples belong, and determine the categories to which the first label samples belong.
In this example, for each second label sample, the K=20 sixth label samples closest to the second label sample are found among the sixth label samples. If, among these 20 sixth label samples, at least 0.8*20=16 belong to the same category, the second label sample is taken as a first label sample, and the category of the first label sample is the category to which those 16 sixth label samples belong.
If the number of sixth label samples belonging to the same category is less than 16, the second label sample is discarded.
This is repeated until, for every category, the number of first label samples is greater than or equal to the user-set minimum number of label samples for that category; the minimum numbers of label samples for the different categories are shown in Table 7.
Table 7
Category code  Category name  Minimum number of label samples required in the other region
0  Normal cell  100
1  Weak coverage cell  300
2  Overlapping coverage cell  200
3  Overshooting coverage cell  200
4  Uplink interference cell  300
5  Downlink interference cell  400
7. Standardize each dimension variable of each first label sample separately to obtain fourth label samples.
The j-th dimension variable of the i-th first label sample may be standardized according to the formula (an image in the original; reconstructed from the definitions that follow):

$$x1'_{ij}=\frac{x1_{ij}-\mathrm{mean1}(S)}{\mathrm{std1}(S)}$$

where x1'_{ij} is the j-th dimension variable of the i-th fourth label sample, x1_{ij} is the j-th dimension variable of the i-th first label sample, mean1(S) is the mean of the j-th dimension variable over all first label samples, and std1(S) is the standard deviation of the j-th dimension variable over all first label samples.
8. Perform dimensionality reduction on each fourth label sample separately to obtain fifth label samples.
This example uses the TSNE algorithm for the dimensionality reduction. The TSNE algorithm models the distribution of the neighbors of each fourth label sample, the neighbors being the set of label samples close to that fourth label sample. In the high-dimensional space of the fourth label samples, the space is modeled as a Gaussian distribution, while in the low-dimensional output space (i.e., the fifth label samples) it can be modeled as a t-distribution; the goal of the process is to find the transformation that maps the high-dimensional space to the low-dimensional space while minimizing the gap between these two distributions over all label samples.
The target dimensionality of the TSNE algorithm can be set freely; this example reduces the 71 dimensions to 5 dimensions.
9. Retrain the basic classification model according to the categories of the first label samples and the fifth label samples to obtain the final classification model.
After the basic GBDT model applicable to the second region (i.e., the above basic classification model) is migrated to the first region, new weak classifiers are trained with a small number of fifth label samples of the first region. By default, the number of new weak classifiers is 30% of the number of weak classifiers of the basic classification model; if the basic classification model has 100 weak classifiers, 30 new weak classifiers need to be trained.
While keeping the structural parameters of the original 100 weak classifiers of the basic classification model unchanged, 30 new weak classifiers are generated, finally yielding a GBDT model with 130 weak classifiers. This model is applied on site in the first region for diagnosis.
The 30 new weak classifiers are trained with the model input parameters corresponding to the basic classification model (for example, max_depth=6, max_features=0.5).
For each new weak classifier, 71*0.5≈35 features are first randomly selected from all the features of the new weak classifier (as shown in Table 6) as the features of that new weak classifier.
The split feature and split value of the first node need to be determined first.
The information gain can be calculated according to the following formula (an image in the original; the reconstruction below is the standard split-gain expression consistent with the definitions that follow):

$$\mathrm{Gain}=\frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda}+\frac{G_R^2}{H_R+\lambda}-\frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right]-\gamma$$

where G denotes the first derivative of the loss function and H denotes its second derivative, with the loss function defined as L = 1/2*(y-y*)^2, y being the true value and y* the predicted value (the classification result obtained by partitioning the sample set D at each candidate split point); the subscript L denotes the left tree after splitting at the classification node and the subscript R the right tree; and γ and λ are input parameters, 0 by default.
The Gain value is computed in parallel for every candidate split point of every feature, and the candidate split point of the candidate feature with the largest Gain value is selected as the split feature and split value of the first node.
This is repeated, computing the optimal split feature and optimal split value for the left child node and right child node of each node, with the depth of the new weak classifier not exceeding max_depth=6.
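A sketch of the gain evaluation for one candidate split under the squared loss above; the per-sample derivative arrays, the candidate points, and the mask layout are assumptions:

```python
# Split gain for one candidate partition, following the formula above.
import numpy as np

def split_gain(g, h, left_mask, lam=0.0, gamma=0.0):
    # For L = 1/2 * (y - y*)^2: g = y* - y (first derivative), h = 1 (second)
    GL, HL = g[left_mask].sum(), h[left_mask].sum()
    GR, HR = g[~left_mask].sum(), h[~left_mask].sum()
    return 0.5 * (GL ** 2 / (HL + lam) + GR ** 2 / (HR + lam)
                  - (GL + GR) ** 2 / (HL + HR + lam)) - gamma

# Example: pick the candidate split with the largest gain on one feature
# (x, g, h are placeholder per-sample arrays; e_points are candidate points)
x = np.random.randn(100); g = np.random.randn(100); h = np.ones(100)
e_points = np.quantile(x, [0.25, 0.5, 0.75])
best_point = max(e_points, key=lambda v: split_gain(g, h, x <= v))
```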
A new weak classifier can thus be trained; likewise, all the new weak classifiers are learned according to the above method.
In a second aspect, an embodiment of the present application further provides an electronic device, including:
at least one processor; and
a memory, where at least one program is stored, and when the at least one program is executed by the at least one processor, the above model training method is implemented.
The processor is a device having data processing capability, including but not limited to a central processing unit (CPU); the memory is a device having data storage capability, including but not limited to random access memory (RAM, more specifically SDRAM, DDR, etc.), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), and flash memory (FLASH).
In some implementations, the processor and the memory are connected to each other via a bus, and are further connected to the other components of the computing device.
In a third aspect, an embodiment of the present application further provides a computer-readable storage medium having a computer program stored thereon, where the computer program, when executed by a processor, implements the above model training method.
FIG. 3 is a block diagram of a model training apparatus provided by an embodiment of the present application.
In a fourth aspect, referring to FIG. 3, an embodiment of the present application further provides a model training apparatus, including:
an acquisition module 301 configured to acquire first label samples of a first region and the categories to which the first label samples belong; and
a model retraining module 302 configured to retrain a basic classification model according to the categories of the first label samples and the first label samples to obtain a final classification model, where the basic classification model is a classification model applicable to a second region, and the final classification model is a classification model applicable to the first region.
In some exemplary implementations, the acquisition module 301 is specifically configured to:
acquire second label samples of the first region; and
select some or all of the second label samples as the first label samples according to third label samples of the second region and the categories to which the third label samples belong, and determine the categories to which the first label samples belong.
In some exemplary implementations, the acquisition module 301 is specifically configured to select some or all of the second label samples as the first label samples according to the third label samples of the second region and the categories of the third label samples, and to determine the categories of the first label samples, in the following manner:
determining the K third label samples with the highest similarity to a second label sample, K being an integer greater than or equal to 2; and
in a case where N of the K third label samples belong to the same category and N is greater than or equal to rK, taking the second label sample as a first label sample and determining the category of the first label sample to be the category of the N third label samples, r being a number greater than or equal to 0 and less than or equal to 1.
In some exemplary implementations, the acquisition module 301 is further configured to:
discard the second label sample in a case where N of the K third label samples belong to the same category and N is less than rK.
In some exemplary implementations, the acquisition module 301 is further configured to:
standardize the first label samples to obtain fourth label samples; and
perform dimensionality reduction on the fourth label samples to obtain fifth label samples;
the model retraining module 302 is then specifically configured to retrain the basic classification model according to the categories of the first label samples and the fifth label samples to obtain the final classification model.
In some exemplary implementations, the model retraining module 302 is specifically configured to perform at least one of the following:
in a case where the basic classification model is a serially generated sequential model, adding a new layer after the basic classification model, keeping the structural parameters of the basic classification model unchanged, and training the new layer according to the categories of the first label samples and the fifth label samples to obtain the final classification model; and
in a case where the basic classification model is a parallelized model, keeping the structural parameters of the classifiers in the basic classification model unchanged, generating new classifiers, and training the new classifiers according to the categories of the first label samples and the fifth label samples to obtain the final classification model.
In some exemplary implementations, the model training apparatus further includes:
a model training module 303 configured to perform model training according to the categories of sixth label samples of the second region and the sixth label samples to obtain the basic classification model.
In some exemplary implementations, the acquisition module 301 is further configured to:
standardize the sixth label samples to obtain seventh label samples; and
perform dimensionality reduction on the seventh label samples to obtain eighth label samples;
the model training module 303 is then specifically configured to perform model training according to the categories of the sixth label samples and the eighth label samples to obtain the basic classification model.
Those of ordinary skill in the art will appreciate that all or some of the steps in the methods disclosed above and the functional modules/units in the systems and apparatuses may be implemented as software, firmware, hardware, or suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components. Some or all physical components may be implemented as software executed by a processor (such as a central processing unit, a digital signal processor, or a microprocessor), as hardware, or as an integrated circuit such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tapes, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. Furthermore, it is well known to those of ordinary skill in the art that communication media typically contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
Example embodiments have been disclosed herein, and although specific terms are employed, they are used and are to be interpreted in a generic and descriptive sense only and not for purposes of limitation. In some instances, it will be apparent to those skilled in the art that, unless expressly stated otherwise, features, characteristics, and/or elements described in connection with a particular embodiment may be used alone or in combination with features, characteristics, and/or elements described in connection with other embodiments. Accordingly, those skilled in the art will understand that various changes in form and detail may be made without departing from the scope of the present application as set forth in the appended claims.

Claims (10)

  1. A model training method, comprising:
    acquiring first label samples of a first region and categories to which the first label samples belong; and
    retraining a basic classification model according to the categories to which the first label samples belong and the first label samples to obtain a final classification model; wherein the basic classification model is a classification model applicable to a second region, and the final classification model is a classification model applicable to the first region.
  2. The model training method of claim 1, wherein acquiring the first label samples of the first region and the categories to which the first label samples belong comprises:
    acquiring second label samples of the first region; and
    selecting some or all of the second label samples as the first label samples according to third label samples of the second region and categories to which the third label samples belong, and determining the categories to which the first label samples belong.
  3. The model training method of claim 2, wherein selecting some or all of the second label samples as the first label samples according to the third label samples of the second region and the categories to which the third label samples belong, and determining the categories to which the first label samples belong, comprises:
    determining K third label samples with the highest similarity to a second label sample, wherein K is an integer greater than or equal to 2; and
    in a case where N of the K third label samples belong to a same category and N is greater than or equal to rK, taking the second label sample as a first label sample and determining the category to which the first label sample belongs to be the category to which the N third label samples belong, wherein r is a number greater than or equal to 0 and less than or equal to 1.
  4. The model training method of claim 3, wherein, in a case where N of the K third label samples belong to a same category and N is less than rK, the model training method further comprises:
    discarding the second label sample.
  5. The model training method of any one of claims 1 to 4, wherein, before retraining the basic classification model according to the categories to which the first label samples belong and the first label samples to obtain the final classification model, the model training method further comprises:
    standardizing the first label samples to obtain fourth label samples; and
    performing dimensionality reduction on the fourth label samples to obtain fifth label samples;
    wherein retraining the basic classification model according to the categories to which the first label samples belong and the first label samples to obtain the final classification model comprises: retraining the basic classification model according to the categories to which the first label samples belong and the fifth label samples to obtain the final classification model.
  6. The model training method of claim 5, wherein retraining the basic classification model according to the categories to which the first label samples belong and the fifth label samples to obtain the final classification model comprises at least one of:
    in a case where the basic classification model is a serially generated sequential model, adding a new layer after the basic classification model, keeping structural parameters of the basic classification model unchanged, and training the new layer according to the categories to which the first label samples belong and the fifth label samples to obtain the final classification model; and
    in a case where the basic classification model is a parallelized model, keeping structural parameters of classifiers in the basic classification model unchanged, generating new classifiers, and training the new classifiers according to the categories to which the first label samples belong and the fifth label samples to obtain the final classification model.
  7. The model training method of any one of claims 1 to 4, wherein, before acquiring the first label samples of the first region and the categories to which the first label samples belong, the model training method further comprises:
    performing model training according to categories to which sixth label samples of the second region belong and the sixth label samples to obtain the basic classification model.
  8. The model training method of claim 7, wherein, before performing model training according to the categories to which the sixth label samples of the second region belong and the sixth label samples to obtain the basic classification model, the model training method further comprises:
    standardizing the sixth label samples to obtain seventh label samples; and
    performing dimensionality reduction on the seventh label samples to obtain eighth label samples;
    wherein performing model training according to the categories to which the sixth label samples of the second region belong and the sixth label samples to obtain the basic classification model comprises: performing model training according to the categories to which the sixth label samples belong and the eighth label samples to obtain the basic classification model.
  9. An electronic device, comprising:
    at least one processor; and
    a memory having at least one program stored thereon, wherein, when the at least one program is executed by the at least one processor, the model training method of any one of claims 1 to 8 is implemented.
  10. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the model training method of any one of claims 1 to 8.
PCT/CN2021/128319 2020-11-11 2021-11-03 Model training method and apparatus, electronic device, and computer-readable storage medium WO2022100491A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011259760.6 2020-11-11
CN202011259760.6A CN114501515A (zh) 2020-11-11 Model training method and apparatus, electronic device, computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2022100491A1 true WO2022100491A1 (zh) 2022-05-19

Family

ID=81489741

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/128319 WO2022100491A1 (zh) 2020-11-11 2021-11-03 Model training method and apparatus, electronic device, computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN114501515A (zh)
WO (1) WO2022100491A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170147944A1 (en) * 2015-11-24 2017-05-25 Xerox Corporation Adapted domain specific class means classifier
CN110210625A (zh) * 2019-05-20 2019-09-06 平安科技(深圳)有限公司 基于迁移学习的建模方法、装置、计算机设备和存储介质
WO2020091871A1 (en) * 2018-10-29 2020-05-07 Hrl Laboratories, Llc Systems and methods for few-shot transfer learning
CN111401454A (zh) * 2020-03-19 2020-07-10 创新奇智(重庆)科技有限公司 一种基于迁移学习的少样本目标识别方法
CN111444952A (zh) * 2020-03-24 2020-07-24 腾讯科技(深圳)有限公司 样本识别模型的生成方法、装置、计算机设备和存储介质

Also Published As

Publication number Publication date
CN114501515A (zh) 2022-05-13


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21891018

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 04/10/2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21891018

Country of ref document: EP

Kind code of ref document: A1