CN114239720A - Method and device for reducing feature weight and computer-readable storage medium - Google Patents


Info

Publication number
CN114239720A
CN114239720A (application CN202111547646.8A)
Authority
CN
China
Prior art keywords
feature
sample
model
subset
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111547646.8A
Other languages
Chinese (zh)
Inventor
顾凌云
周轩
王妍
乔韵如
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai IceKredit Inc
Original Assignee
Shanghai IceKredit Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai IceKredit Inc filed Critical Shanghai IceKredit Inc
Priority to CN202111547646.8A
Publication of CN114239720A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning
    • G06N 20/20: Ensemble learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application provide a method and an apparatus for reducing feature weight and a computer-readable storage medium, and relate to the field of computer technologies. First, a sample feature set is acquired; next, the feature weight ranks of the different sample features are obtained; then, the sample feature set is divided into a first sample feature subset consisting of sample features whose feature weight ranking needs to be reduced and a second sample feature subset consisting of sample features whose feature weight ranking does not need to be reduced; finally, a base model is constructed from the second sample feature subset, and incremental models are constructed sequentially on the basis of the base model in the order of the feature weight ranks of the sample features in the first sample feature subset. The method lowers the weight ranking of the sample features in the first sample feature subset by reducing the number of trees whose construction those features participate in, thereby reducing their feature weight while avoiding the information loss and marked drop in model performance caused by deleting the features outright.

Description

Method and device for reducing feature weight and computer-readable storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for reducing feature weights, and a computer-readable storage medium.
Background
Features are essential components of a model: feature selection is a key step in model construction, and the features that enter the model determine its performance. In practice, the following situation is often encountered: a feature performs well on the modeling data set, but because the feature is unstable, or its strong performance is not expected to hold up over time, keeping it in the model may reduce the model's predictive power on future data. To ensure model stability, modelers usually delete such a feature, and the resulting loss of feature information noticeably degrades the model's performance.
Disclosure of Invention
To overcome at least the above-mentioned shortcomings in the prior art, an object of the present application is to provide a method, an apparatus, and a computer-readable storage medium for reducing feature weights that solve the above technical problems.
In a first aspect, an embodiment of the present application provides a method for reducing a feature weight, which is applied to a computer device, and the method includes:
obtaining a training sample set used for model training and a sample feature set formed by features of samples in the training sample set;
constructing a feature model from the sample feature set based on the training sample set, and obtaining a feature weight ranking of each sample feature in the sample feature set in the feature model, wherein the feature model is an ensemble model that supports incremental learning;
dividing the sample feature set into a first sample feature subset consisting of sample features whose feature weight ranking needs to be reduced and a second sample feature subset consisting of sample features whose feature weight ranking does not need to be reduced;
and constructing a base model according to the second sample feature subset, and sequentially constructing incremental models based on the base model in the order of the feature weight ranks of the sample features in the first sample feature subset in the feature model, so as to obtain a feature model with reduced feature weights.
In a possible implementation manner, the step of obtaining a feature weight ranking of each sample feature in the sample feature set in the feature model by using the sample feature set to construct the feature model based on the training sample set includes:
constructing a feature model consisting of trees by using the sample feature set;
and obtaining the feature weight ranking of each sample feature in the sample feature set in the feature model, using the number of times the sample features in the sample feature set are used for split nodes in the feature model as the measure of feature weight.
In a possible implementation manner, the step of dividing the sample feature set into a first sample feature subset composed of sample features requiring a reduced feature weight rank and a second sample feature subset composed of sample features not requiring a reduced feature weight rank includes:
selecting, from the sample feature set based on expert experience, a first sample feature subset consisting of sample features whose feature weight ranking needs to be reduced, and sorting the sample features in the first sample feature subset according to their feature weights;
and taking the set difference between the sample feature set and the first sample feature subset to obtain a second sample feature subset consisting of sample features whose feature weight ranking does not need to be reduced.
In a possible implementation manner, the step of constructing a base model according to the second sample feature subset, and sequentially performing incremental model construction according to the feature weight ranking order of the sample features in the first sample feature subset in the feature model based on the base model to obtain the feature model with reduced feature weights includes:
constructing a base model by adopting the second sample feature subset, and taking the sample feature with the maximum feature weight ranking in the first sample feature subset as a target sample feature;
constructing an incremental model on the basis of the base model by using the target sample feature from the first sample feature subset together with the second sample feature subset, and obtaining the feature weight ranking of the target sample feature in the incremental model;
judging whether the base model needs to be determined again or not based on the feature weight ranking of the target sample features in the feature model, the feature weight ranking of the target sample features in the incremental model, a preset feature weight ranking reduction threshold and the number of sample features in the first sample feature subset;
and if the base model does not need to be determined again, adding the target sample features into the second sample feature subset, removing the target sample features from the first sample feature subset, repeating the steps until the number of the sample features in the first sample feature subset is zero, and taking the finally obtained incremental model as a feature model.
In a possible implementation manner, the step of determining whether to update the base model based on the feature weight ranking of the target sample feature in the feature model, the feature weight ranking of the target sample feature in the incremental model, a preset feature weight ranking reduction threshold, and the number of sample features in the first sample feature subset includes:
calculating the sum of the feature weight ranking of the target sample feature in the incremental model and the number of other sample features except the target sample feature in the first sample feature subset to obtain a first sum;
calculating the sum of the feature weight ranking of the target sample feature in the feature model and the preset feature weight ranking reduction threshold to obtain a second sum value;
comparing the first sum value with the second sum value, re-determining the base model when the first sum value is less than or equal to the second sum value, and performing incremental model construction based on the re-determined base model; and when the first sum value is larger than the second sum value, updating the number of determined trees in the base model, and resetting the number of trees to be determined in the base model and the number of trees constructed with the target sample feature.
In a possible implementation manner, if it is determined that the base model does not need to be determined again, the step of adding the target sample features into the second sample feature subset, removing the target sample features from the first sample feature subset, repeating the above steps until the number of sample features in the first sample feature subset is zero, and using the finally obtained incremental model as a feature model includes:
adding the target sample features into the second subset of sample features and removing the target sample features from the first subset of sample features;
detecting the number of sample features in the first sample feature subset; when it is detected that the number of sample features in the first sample feature subset is not zero, taking the sample feature with the largest feature weight ranking in the updated first sample feature subset as the target sample feature, and repeating the above steps; and when it is detected that the number of sample features in the first sample feature subset is zero, taking the finally obtained incremental model as the feature model.
In a second aspect, an embodiment of the present application further provides an apparatus for reducing a feature weight, where the apparatus is applied to a computer device, and the apparatus includes:
the acquisition module is used for acquiring a training sample set used for model training and a sample feature set formed by features of samples in the training sample set;
the first construction module is used for constructing a feature model from the sample feature set based on the training sample set to obtain a feature weight ranking of each sample feature in the sample feature set in the feature model, wherein the feature model is capable of incremental learning;
the dividing module is used for dividing the sample feature set into a first sample feature subset consisting of sample features whose feature weight ranking needs to be reduced and a second sample feature subset consisting of sample features whose feature weight ranking does not need to be reduced;
and the second construction module is used for constructing a base model according to the second sample feature subset, and sequentially constructing incremental models based on the base model in the order of the feature weight ranks of the sample features in the first sample feature subset in the feature model, so as to obtain a feature model with reduced feature weights.
In a possible implementation manner, the feature model is a gradient boosting model, and the first building module is specifically configured to:
construct a feature model consisting of trees by using the sample feature set;
and obtain the feature weight ranking of each sample feature in the sample feature set in the feature model, using the number of times the sample features in the sample feature set are used for split nodes in the feature model as the measure of feature weight.
In a third aspect, an embodiment of the present application further provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are executed, the computer is caused to perform the method for reducing the feature weight in the first aspect or any one of the possible implementation manners of the first aspect.
In a fourth aspect, an embodiment of the present application further provides a computer device, where the computer device includes a processor and a computer-readable storage medium, where the processor and the computer-readable storage medium are connected through a bus system, and the computer-readable storage medium is used to store a program, an instruction, or a code, and the processor is used to execute the program, the instruction, or the code in the computer-readable storage medium, so as to implement the method for reducing a feature weight in the first aspect or any one of the possible implementation manners in the first aspect.
Based on any one of the above aspects, first, a sample feature set is obtained; second, a feature model is constructed from the sample feature set to obtain the feature weight ranks of the different sample features; then, the sample feature set is divided into a first sample feature subset consisting of sample features whose feature weight ranking needs to be reduced and a second sample feature subset consisting of sample features whose feature weight ranking does not need to be reduced; finally, a base model is constructed from the second sample feature subset, and incremental models are constructed sequentially on the basis of the base model in the order of the feature weight ranks of the sample features in the first sample feature subset, yielding a feature model with reduced feature weights. The method lowers the weight ranking of the sample features in the first sample feature subset by reducing the number of trees whose construction those features participate in, thereby reducing their feature weight while avoiding the information loss and marked drop in model performance caused by deleting the features outright.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required in the embodiments are briefly described below. It should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope; for those skilled in the art, other related drawings can be obtained from these drawings without inventive effort.
Fig. 1 is a schematic flowchart of a method for reducing feature weight according to an embodiment of the present disclosure;
FIG. 2 is a schematic flow chart illustrating the sub-steps of step S104 in FIG. 1;
FIG. 3 is a block diagram illustrating functional blocks of an apparatus for reducing feature weights according to an embodiment of the present disclosure;
fig. 4 is a schematic hardware structure diagram of a computer device according to an embodiment of the present application.
Detailed Description
The present application will now be described in detail with reference to the drawings, and the specific operations in the method embodiments may also be applied to the apparatus embodiments or the system embodiments.
In order to overcome the defects of feature screening methods in the prior art, an embodiment of the present application provides a method for reducing feature weight. Referring to fig. 1, fig. 1 is a schematic flowchart of the method for reducing feature weight provided in the embodiment of the present application. The method may be executed by a computer device; for convenience of explaining the technical scheme of the present application, the flow steps of the method are described in detail below with reference to fig. 1.
Step S101, a training sample set for model training and a sample feature set composed of features of samples in the training sample set are obtained.
In this step, the sample feature set is composed of features of samples in the training sample set; for example, when the samples in the training sample set are fans of a certain sports item (e.g., marathon), the features of the samples may be age, income, gender, and the like. The sample feature set may be represented as the set X = {x1, x2, ..., xn}, where xi (1 ≤ i ≤ n) denotes a sample feature.
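For illustration, a minimal Python sketch of such a training sample set and sample feature set is given below; the column names, values, and the label column are invented for this example and are not specified by the description.

```python
# Hypothetical illustration of step S101: a tiny training sample set of marathon
# fans and the sample feature set X = {x1, ..., xn} derived from its columns.
import pandas as pd

train_df = pd.DataFrame({
    "age":    [24, 35, 41, 29, 52],
    "income": [3.2, 8.5, 6.1, 4.7, 9.0],   # invented values, arbitrary units
    "gender": [0, 1, 1, 0, 1],
    "label":  [1, 0, 1, 0, 1],             # training target (assumed binary)
})

feature_set_X = [c for c in train_df.columns if c != "label"]  # X = {age, income, gender}
```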
Step S102, constructing a feature model by using the sample feature set based on the training sample set to obtain the feature weight ranking of each sample feature in the sample feature set in the feature model.
Feature weight, also called feature importance, measures how much a feature affects the performance of the model. In a linear model, the coefficient of a feature is its feature weight: the larger the coefficient, the more important the feature. In a tree model, feature importance is measured by how much each feature's split points improve the performance metric, which may be the information gain when the feature is selected for a split node, or the number of times the feature is selected to split a node.
In the embodiment of the present application, the feature model is an ensemble model that supports incremental learning; specifically, the feature model may be a gradient boosting model (such as an xgboost model), a LightGBM model, or the like. For convenience of description, a gradient boosting model is taken as an example.
Specifically, step S102 may be implemented in the following manner.
First, a feature model composed of trees is constructed by using the sample feature set, where the number of trees in the feature model is denoted T0.
Then, the feature weight ranking of each sample feature in the sample feature set in the feature model is obtained, using the number of times each sample feature is used for split nodes in the feature model as the measure of feature weight.
The feature weight ranking result of the sample feature set X may be recorded as D = {r1, r2, ..., rn}, where ri is the feature weight rank of feature xi; the higher the feature weight, the smaller the rank value and the more important the feature.
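The sketch below shows how such a split-count ranking could be obtained with the xgboost Python API, continuing the toy data above; the choice of xgboost, the parameter values, and the variable names are assumptions of the example rather than requirements of the method.

```python
# Sketch of step S102: split-count feature weights map to importance_type="weight"
# in xgboost (number of times a feature is used to split a node).
import xgboost as xgb

T0 = 100  # total number of trees in the feature model (trial value)

dtrain = xgb.DMatrix(train_df[feature_set_X], label=train_df["label"])
params = {"objective": "binary:logistic", "max_depth": 3, "eta": 0.1}
feature_model = xgb.train(params, dtrain, num_boost_round=T0)

# Split counts per feature; features never used for a split get no entry, so default to 0.
split_counts = feature_model.get_score(importance_type="weight")
ordered = sorted(feature_set_X, key=lambda f: split_counts.get(f, 0), reverse=True)

# D = {r1, ..., rn}: rank 1 is the most important feature (largest split count).
rank_D = {f: i + 1 for i, f in enumerate(ordered)}
```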
Step S103, dividing the sample feature set into a first sample feature subset consisting of sample features whose feature weight ranking needs to be reduced and a second sample feature subset consisting of sample features whose feature weight ranking does not need to be reduced.
In this step, the sample feature set X may be divided into a first sample feature subset A and a second sample feature subset B based on expert experience. Specifically, m sample features whose feature weight needs to be reduced may be selected from the sample feature set X to form the first sample feature subset A = {a1, a2, ..., am}, which is sorted by feature weight to obtain the sorted rank set D_A = {r_a1, r_a2, ..., r_am}, where r_ai ∈ D is the feature weight rank of feature ai in the sample feature set X. The remaining n - m sample features in the sample feature set X form the second sample feature subset B; specifically, B is obtained as the set difference between the sample feature set X and the first sample feature subset A.
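Continuing the sketch above, the division into the subsets A and B and the sorted rank set D_A might look as follows; which features go into A is a manual (expert) choice, and the concrete list here is only illustrative.

```python
# Sketch of step S103: pick the demotion candidates by hand, derive B by set difference.
subset_A = ["income"]                                       # features to demote (example)
subset_B = [f for f in feature_set_X if f not in subset_A]  # B = X \ A

# D_A: (feature, rank) pairs of the demotion candidates in the full feature model,
# sorted so the least important candidate (largest rank value) comes first.
D_A = sorted(((f, rank_D[f]) for f in subset_A), key=lambda t: t[1], reverse=True)
```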
Step S104, constructing a base model according to the second sample feature subset, and sequentially constructing incremental models based on the base model in the order of the feature weight ranks of the sample features in the first sample feature subset in the feature model, so as to obtain a feature model with reduced feature weights.
Referring to fig. 2, fig. 2 shows a flowchart of the sub-steps of step S104, and a specific implementation of step S104 is described below with reference to fig. 2.
Sub-step S1041, constructing a base model using the second sample feature subset, and taking the sample feature with the largest feature weight rank in the first sample feature subset as the target sample feature.
When the initial base model is constructed, the model parameters need to be initialized. Specifically, t may be taken as the number of determined trees, with initial value t = 0, and T as the number of trees to be determined, with initial value T = T0; a gradient boosting model of t trees is constructed from the second sample feature subset B as the base model, denoted f(t). From the sorted set D_A = {r_a1, r_a2, ..., r_am} corresponding to the first sample feature subset A, the sample feature a whose feature weight rank r is the largest (i.e., the least important sample feature) is selected, and a preset feature weight rank reduction threshold δ is set.
Sub-step S1042, constructing an incremental model on the basis of the base model using the target sample feature from the first sample feature subset together with the second sample feature subset, and obtaining the feature weight rank of the target sample feature in the incremental model.
Specifically, on the basis of the base model f(t), trees t + T + 1 through T0 are constructed using the second sample feature subset B together with the target sample feature a, yielding an incremental model g(t), and the feature weight rank r' of the target sample feature a in the incremental model g(t) is obtained.
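A minimal sketch of sub-steps S1041 and S1042 is given below, continuing the example above. It assumes xgboost and expresses the continuation of the base model through base_margin (the additional trees are fitted on the residual of the base model), since a single xgboost booster cannot be extended with a changed feature set; this is an implementation choice of the sketch rather than something the description prescribes, and the tree counts are trial values.

```python
# Sketch of S1041/S1042: base model f(t) trained on B only, incremental trees
# trained on B plus the target feature a on top of the base model's margin.
target_a = D_A[0][0]            # least important demotion candidate
base_rounds = 40                # trees built without feature a (trial value)

d_base = xgb.DMatrix(train_df[subset_B], label=train_df["label"])
base_model = xgb.train(params, d_base, num_boost_round=base_rounds)

cols_Ba = subset_B + [target_a]
d_incr = xgb.DMatrix(
    train_df[cols_Ba],
    label=train_df["label"],
    base_margin=base_model.predict(d_base, output_margin=True),
)
incr_model = xgb.train(params, d_incr, num_boost_round=T0 - base_rounds)

# Rank r' of the target feature over base trees plus incremental trees, again by
# split count; the base trees contribute no splits on target_a by construction.
combined = {f: 0 for f in cols_Ba}
for booster in (base_model, incr_model):
    for f, c in booster.get_score(importance_type="weight").items():
        combined[f] = combined.get(f, 0) + c
order = sorted(combined, key=combined.get, reverse=True)
r_prime = order.index(target_a) + 1
```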
Sub-step S1043, judging whether the base model needs to be determined again based on the feature weight rank of the target sample feature in the feature model, the feature weight rank of the target sample feature in the incremental model, a preset feature weight rank reduction threshold, and the number of sample features in the first sample feature subset.
In the embodiment of the present application, the sub-step S1043 may be implemented in the following manner.
First, the sum of the feature weight rank of the target sample feature in the incremental model and the number of sample features other than the target sample feature in the first sample feature subset is calculated to obtain a first sum value T1, where T1 = r' + (number of elements in D_A - 1).
Next, the sum of the feature weight rank of the target sample feature in the feature model and the preset feature weight rank reduction threshold is calculated to obtain a second sum value T2, where T2 = r + δ.
Then, the first sum value is compared with the second sum value; when the first sum value is less than or equal to the second sum value, the base model is re-determined and incremental model construction is performed on the basis of the re-determined base model; when the first sum value is larger than the second sum value, the number of determined trees in the base model is updated, and the number of trees to be determined in the base model and the number of trees constructed with the target sample feature are reset.
Specifically, when T1 ≤ T2, let T = T + step_T (step_T is a positive integer), re-determine the base model, and return to sub-step S1042. When T1 > T2, the feature weight rank of the target sample feature has been reduced below the target value; in this case, the number of determined trees is updated to t = t + T, the number of trees to be determined is reset to T = 1, the base model is updated to consist of the trees already constructed by the current base model together with trees t + 1 through t + T constructed with the target sample feature a, and sub-step S1044 is performed.
Sub-step S1044, if it is determined that the base model does not need to be determined again, adding the target sample feature to the second sample feature subset, removing it from the first sample feature subset, and repeating the above steps until the number of sample features in the first sample feature subset is zero, with the finally obtained incremental model taken as the feature model.
In this sub-step, when it is determined that the base model does not need to be re-determined, the target sample feature a is added to the second sample feature subset B and removed from the first sample feature subset A, so that both subsets are updated. Whether the first sample feature subset A is empty is then detected; if it is not empty, the process returns to sub-step S1041, until the first sample feature subset A is empty, and the finally obtained incremental model is used as the feature model.
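Putting the sub-steps together, one possible end-to-end sketch of step S104 is shown below. It assumes xgboost, retrains the base and incremental stages from scratch in every trial (a simplification of the incremental reuse of already determined trees described above), computes the rank r' only over the incremental stage, and uses invented helper names (fit_trees, rank_in) and parameter values (step_T, delta).

```python
import xgboost as xgb

PARAMS = {"objective": "binary:logistic", "max_depth": 3, "eta": 0.1}

def fit_trees(df, features, rounds, base=None, base_feats=None):
    """Train `rounds` boosting rounds on `features`; if `base` is given, the new
    trees are fitted on its residual via base_margin, i.e. they extend the base."""
    kwargs = {}
    if base is not None:
        d_prev = xgb.DMatrix(df[base_feats], label=df["label"])
        kwargs["base_margin"] = base.predict(d_prev, output_margin=True)
    dmat = xgb.DMatrix(df[features], label=df["label"], **kwargs)
    return xgb.train(PARAMS, dmat, num_boost_round=rounds)

def rank_in(model, features, feat):
    """1-based split-count rank of `feat` among `features` (1 = most important)."""
    counts = model.get_score(importance_type="weight")
    order = sorted(features, key=lambda f: counts.get(f, 0), reverse=True)
    return order.index(feat) + 1

def demote_features(df, A, B, rank_D, T0, delta, step_T=5):
    """Lower the weight rank of every feature in A by at least `delta` places."""
    A = sorted(A, key=lambda f: rank_D[f], reverse=True)   # weakest candidate first
    B = list(B)
    t, incr = 0, None          # t: determined trees; incr: current incremental model
    while A:
        a, r = A[0], rank_D[A[0]]      # target feature and its rank in the full model
        T = 1                          # trees to be determined in this trial
        while True:
            base = fit_trees(df, B, rounds=t + T)                  # f(t): without a
            incr = fit_trees(df, B + [a], rounds=T0 - (t + T),
                             base=base, base_feats=B)              # g(t): with a
            T1 = rank_in(incr, B + [a], a) + (len(A) - 1)          # first sum value
            T2 = r + delta                                         # second sum value
            if T1 > T2:        # demotion achievable: finalize the determined trees
                t += T
                break
            T += step_T        # not demoted enough: enlarge the base model and retry
        B.append(a)            # a's weight ranking no longer needs reducing
        A.remove(a)
    return incr
```

With the toy data above it could be called as demote_features(train_df, subset_A, subset_B, rank_D, T0=100, delta=3); in practice the threshold δ and the step size step_T would be tuned for the data at hand.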
According to the scheme provided by the embodiment of the application, first, a sample feature set is obtained; second, a feature model is constructed from the sample feature set to obtain the feature weight ranks of the different sample features; then, the sample feature set is divided into a first sample feature subset consisting of sample features whose feature weight ranking needs to be reduced and a second sample feature subset consisting of sample features whose feature weight ranking does not need to be reduced; finally, a base model is constructed from the second sample feature subset, and incremental models are constructed sequentially on the basis of the base model in the order of the feature weight ranks of the sample features in the first sample feature subset, yielding a feature model with reduced feature weights. The method lowers the weight ranking of the sample features in the first sample feature subset by reducing the number of trees whose construction those features participate in, thereby reducing their feature weight while avoiding the information loss and marked drop in model performance caused by deleting the features outright.
Referring to fig. 3, fig. 3 is a schematic diagram of the functional modules of an apparatus 200 for reducing feature weight according to an embodiment of the present disclosure. In this embodiment, the apparatus 200 may be divided into functional modules according to the method embodiment executed by the computer device; that is, the following functional modules of the apparatus 200 may be used to execute the method embodiments executed by the computer device. The apparatus 200 for reducing feature weight may include an obtaining module 210, a first constructing module 220, a dividing module 230, and a second constructing module 240, and the functions of these modules are described in detail below.
The obtaining module 210 is configured to obtain a training sample set used for model training and a sample feature set formed by features of samples in the training sample set.
In this step, the sample feature set is composed of features of samples in the training sample set; for example, when the samples in the training sample set are fans of a certain sports item (e.g., marathon), the features of the samples may be age, income, gender, and the like. The sample feature set may be represented as the set X = {x1, x2, ..., xn}, where xi (1 ≤ i ≤ n) denotes a sample feature.
The first constructing module 220 is configured to construct a feature model by using the sample feature set based on the training sample set, and obtain a feature weight ranking of each sample feature in the sample feature set in the feature model.
Feature weight, also called feature importance, measures how much a feature affects the performance of the model. In a linear model, the coefficient of a feature is its feature weight: the larger the coefficient, the more important the feature. In a tree model, feature importance is measured by how much each feature's split points improve the performance metric, which may be the information gain when the feature is selected for a split node, or the number of times the feature is selected to split a node.
In the embodiment of the present application, the feature model is an ensemble model that supports incremental learning; specifically, the feature model may be a gradient boosting model (such as an xgboost model), a LightGBM model, or the like. For convenience of description, a gradient boosting model is taken as an example.
Specifically, the first building module 220 may be implemented in the following manner.
First, a feature model composed of trees is constructed by using the sample feature set, where the number of trees in the feature model is denoted T0.
Then, the feature weight ranking of each sample feature in the sample feature set in the feature model is obtained, using the number of times each sample feature is used for split nodes in the feature model as the measure of feature weight.
The feature weight ranking result of the sample feature set X may be recorded as D = {r1, r2, ..., rn}, where ri is the feature weight rank of feature xi; the higher the feature weight, the smaller the rank value and the more important the feature.
The dividing module 230 is configured to divide the sample feature set into a first sample feature subset composed of sample features whose feature weight rank needs to be reduced and a second sample feature subset composed of sample features whose feature weight rank does not need to be reduced.
The partitioning module 230 may divide the sample feature set X into a first sample feature subset A and a second sample feature subset B based on expert experience. Specifically, m sample features whose feature weight needs to be reduced may be selected from the sample feature set X to form the first sample feature subset A = {a1, a2, ..., am}, which is sorted by feature weight to obtain the sorted rank set D_A = {r_a1, r_a2, ..., r_am}, where r_ai ∈ D is the feature weight rank of feature ai in the sample feature set X. The remaining n - m sample features in the sample feature set X form the second sample feature subset B; specifically, B is obtained as the set difference between the sample feature set X and the first sample feature subset A.
And a second construction module 240, configured to construct a base model according to the second sample feature subset, and based on the base model, sequentially perform incremental model construction according to the feature weight ranking order of the sample features in the first sample feature subset in the feature model, so as to obtain the feature model with reduced feature weights.
In the embodiment of the present application, the second building module 240 may be implemented in the following manner.
Firstly, a base model is constructed by adopting a second sample feature subset, and the sample feature with the largest feature weight ranking in the first sample feature subset is used as the target sample feature.
When the initial base model is constructed, the model parameters need to be initialized. Specifically, t may be taken as the number of determined trees, with initial value t = 0, and T as the number of trees to be determined, with initial value T = T0; a gradient boosting model of t trees is constructed from the second sample feature subset B as the base model, denoted f(t). From the sorted set D_A = {r_a1, r_a2, ..., r_am} corresponding to the first sample feature subset A, the sample feature a whose feature weight rank r is the largest (i.e., the least important sample feature) is selected, and a preset feature weight rank reduction threshold δ is set.
And then, constructing an incremental model by adopting the target sample characteristics in the first sample characteristic subset and the second sample characteristic subset on the basis of the base model, and obtaining the characteristic weight ranking of the target sample characteristics in the incremental model.
Specifically, on the basis of the base model f(t), trees t + T + 1 through T0 are constructed using the second sample feature subset B together with the target sample feature a, yielding an incremental model g(t), and the feature weight rank r' of the target sample feature a in the incremental model g(t) is obtained.
Then, whether the base model needs to be determined again is judged based on the feature weight ranking of the target sample features in the feature model, the feature weight ranking of the target sample features in the incremental model, a preset feature weight ranking reduction threshold value and the number of the sample features in the first sample feature subset.
Specifically, the sum of the feature weight rank of the target sample feature in the incremental model and the number of sample features other than the target sample feature in the first sample feature subset is calculated to obtain a first sum value T1, where T1 = r' + (number of elements in D_A - 1). The sum of the feature weight rank of the target sample feature in the feature model and the preset feature weight rank reduction threshold is calculated to obtain a second sum value T2, where T2 = r + δ. The first sum value is compared with the second sum value; when the first sum value is less than or equal to the second sum value, the base model is re-determined and incremental model construction is performed on the basis of the re-determined base model; when the first sum value is larger than the second sum value, the number of determined trees in the base model is updated, and the number of trees to be determined in the base model and the number of trees constructed with the target sample feature are reset.
In detail, when T1 ≤ T2, let T = T + step_T (step_T is a positive integer), re-determine the base model, and re-construct the incremental model; when T1 > T2, the feature weight rank of the target sample feature has been reduced below the target value, the number of determined trees is updated to t = t + T, the number of trees to be determined is reset to T = 1, and the base model is updated to consist of the trees already constructed by the current base model together with trees t + 1 through t + T constructed with the target sample feature a.
And finally, if the base model is judged not to be determined again, adding the target sample features into the second sample feature subset, removing the target sample features from the first sample feature subset, repeating the steps until the number of the sample features in the first sample feature subset is zero, and taking the finally obtained incremental model as the feature model.
Referring to fig. 4, fig. 4 is a schematic diagram illustrating a hardware structure of a computer device 10 for implementing the method for reducing the feature weight according to the embodiment of the present disclosure, where the computer device 10 may be implemented on a cloud server. As shown in fig. 4, computer device 10 may include a processor 11, a computer-readable storage medium 12, and a bus 13.
In particular implementation, at least one processor 11 executes computer-executable instructions (e.g., modules included in the apparatus for reducing feature weight 200 shown in fig. 3) stored by the computer-readable storage medium 12, so that the processor 11 can execute the method for reducing feature weight according to the above method embodiment, wherein the processor 11 and the computer-readable storage medium 12 are connected through the bus 13.
For the specific implementation process of the processor 11, reference may be made to the above method embodiments executed by the computer device 10; the implementation principle and technical effects are similar, and details are not repeated here.
Computer-readable storage medium 12 may include random access memory and may also include non-volatile storage, such as at least one disk storage.
The bus 13 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, the figures of the present application do not limit the bus to only one bus or one type of bus.
In addition, the embodiment of the present application further provides a readable storage medium, in which a computer executing instruction is stored, and when a processor executes the computer executing instruction, the method for reducing the feature weight as above is implemented.
To sum up, in the method, the apparatus, and the computer-readable storage medium for reducing feature weight provided in the embodiments of the present application, first, a sample feature set is obtained; second, a feature model is constructed from the sample feature set to obtain the feature weight ranks of the different sample features; then, the sample feature set is divided into a first sample feature subset consisting of sample features whose feature weight ranking needs to be reduced and a second sample feature subset consisting of sample features whose feature weight ranking does not need to be reduced; finally, a base model is constructed from the second sample feature subset, and incremental models are constructed sequentially on the basis of the base model in the order of the feature weight ranks of the sample features in the first sample feature subset, yielding a feature model with reduced feature weights. The method lowers the weight ranking of the sample features in the first sample feature subset by reducing the number of trees whose construction those features participate in, thereby reducing their feature weight while avoiding the information loss and marked drop in model performance caused by deleting the features outright.
The embodiments described above are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, as generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the detailed description of the embodiments of the present application provided in the accompanying drawings is not intended to limit the scope of the application, but is merely representative of selected embodiments of the application. Based on this, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method for reducing feature weights, applied to a computer device, the method comprising:
obtaining a training sample set used for model training and a sample feature set formed by features of samples in the training sample set;
constructing a feature model from the sample feature set based on the training sample set, and obtaining a feature weight ranking of each sample feature in the sample feature set in the feature model, wherein the feature model is an ensemble model that supports incremental learning;
dividing the sample feature set into a first sample feature subset consisting of sample features whose feature weight ranking needs to be reduced and a second sample feature subset consisting of sample features whose feature weight ranking does not need to be reduced;
and constructing a base model according to the second sample feature subset, and sequentially constructing incremental models based on the base model in the order of the feature weight ranks of the sample features in the first sample feature subset in the feature model, so as to obtain a feature model with reduced feature weights.
2. The method of claim 1, wherein the feature model is a gradient boosting model, and the step of constructing the feature model based on the training sample set by using the sample feature set to obtain a feature weight ranking of each sample feature in the sample feature set in the feature model comprises:
constructing a feature model consisting of trees by using the sample feature set;
and obtaining the feature weight ranking of each sample feature in the sample feature set in the feature model, using the number of times the sample features in the sample feature set are used for split nodes in the feature model as the measure of feature weight.
3. The method for reducing feature weights according to claim 2, wherein the step of dividing the sample feature set into a first sample feature subset consisting of sample features requiring a reduced feature weight ranking and a second sample feature subset consisting of sample features not requiring a reduced feature weight ranking comprises:
selecting, from the sample feature set based on expert experience, a first sample feature subset consisting of sample features whose feature weight ranking needs to be reduced, and sorting the sample features in the first sample feature subset according to their feature weights;
and taking the set difference between the sample feature set and the first sample feature subset to obtain a second sample feature subset consisting of sample features whose feature weight ranking does not need to be reduced.
4. The method for reducing the feature weight according to claim 2 or 3, wherein the step of constructing a base model according to the second sample feature subset, and sequentially performing incremental model construction according to the feature weight ranking order of the sample features in the first sample feature subset in the feature model based on the base model to obtain the feature model with reduced feature weight includes:
constructing a base model by adopting the second sample feature subset, and taking the sample feature with the maximum feature weight ranking in the first sample feature subset as a target sample feature;
constructing an incremental model on the basis of the base model by using the target sample feature from the first sample feature subset together with the second sample feature subset, and obtaining the feature weight ranking of the target sample feature in the incremental model;
judging whether the base model needs to be determined again or not based on the feature weight ranking of the target sample features in the feature model, the feature weight ranking of the target sample features in the incremental model, a preset feature weight ranking reduction threshold and the number of sample features in the first sample feature subset;
and if the base model does not need to be determined again, adding the target sample features into the second sample feature subset, removing the target sample features from the first sample feature subset, repeating the steps until the number of the sample features in the first sample feature subset is zero, and taking the finally obtained incremental model as a feature model.
5. The method of reducing feature weights according to claim 4, wherein the step of determining whether to update the base model based on the feature weight ranking of the target sample feature in the feature model, the feature weight ranking of the target sample feature in the incremental model, a pre-determined feature weight ranking reduction threshold, and the number of sample features in the first subset of sample features comprises:
calculating the sum of the feature weight ranking of the target sample feature in the incremental model and the number of other sample features except the target sample feature in the first sample feature subset to obtain a first sum;
calculating the sum of the feature weight ranking of the target sample feature in the feature model and the preset feature weight ranking reduction threshold to obtain a second sum value;
comparing the first sum value with the second sum value, re-determining the base model when the first sum value is less than or equal to the second sum value, and performing incremental model construction based on the re-determined base model; and when the first sum value is larger than the second sum value, updating the number of determined trees in the base model, and resetting the number of trees to be determined in the base model and the number of trees constructed with the target sample feature.
6. The method of claim 5, wherein if it is determined that the base model does not need to be re-determined, adding the target sample features to the second subset of sample features and removing the target sample features from the first subset of sample features, repeating the above steps until the number of sample features in the first subset of sample features is zero, and using the resulting incremental model as a feature model, comprises:
adding the target sample features into the second subset of sample features and removing the target sample features from the first subset of sample features;
detecting the number of sample features in the first sample feature subset; when it is detected that the number of sample features in the first sample feature subset is not zero, taking the sample feature with the largest feature weight ranking in the updated first sample feature subset as the target sample feature, and repeating the above steps; and when it is detected that the number of sample features in the first sample feature subset is zero, taking the finally obtained incremental model as the feature model.
7. An apparatus for reducing feature weights, applied to a computer device, the apparatus comprising:
the acquisition module is used for acquiring a training sample set used for model training and a sample feature set formed by features of samples in the training sample set;
the first construction module is used for constructing a feature model from the sample feature set based on the training sample set to obtain a feature weight ranking of each sample feature in the sample feature set in the feature model, wherein the feature model is capable of incremental learning;
the dividing module is used for dividing the sample feature set into a first sample feature subset consisting of sample features whose feature weight ranking needs to be reduced and a second sample feature subset consisting of sample features whose feature weight ranking does not need to be reduced;
and the second construction module is used for constructing a base model according to the second sample feature subset, and sequentially constructing incremental models based on the base model in the order of the feature weight ranks of the sample features in the first sample feature subset in the feature model, so as to obtain a feature model with reduced feature weights.
8. The apparatus for reducing feature weights according to claim 7, wherein the feature model is a gradient boosting model, and the first building module is specifically configured to:
construct a feature model consisting of trees by using the sample feature set;
and obtain the feature weight ranking of each sample feature in the sample feature set in the feature model, using the number of times the sample features in the sample feature set are used for split nodes in the feature model as the measure of feature weight.
9. A computer device, comprising a processor and a computer-readable storage medium connected by a bus system, wherein the computer-readable storage medium is used for storing a program, instructions or codes, and the processor is used for executing the program, instructions or codes in the computer-readable storage medium to realize the method for reducing feature weight according to any one of claims 1 to 6.
10. A computer-readable storage medium having stored therein instructions that, when executed, cause a computer device to perform the method of reducing feature weight of any of claims 1-6.
CN202111547646.8A 2021-12-16 2021-12-16 Method and device for reducing feature weight and computer-readable storage medium Pending CN114239720A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111547646.8A CN114239720A (en) 2021-12-16 2021-12-16 Method and device for reducing feature weight and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111547646.8A CN114239720A (en) 2021-12-16 2021-12-16 Method and device for reducing feature weight and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN114239720A true CN114239720A (en) 2022-03-25

Family

ID=80757548

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111547646.8A Pending CN114239720A (en) 2021-12-16 2021-12-16 Method and device for reducing feature weight and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN114239720A (en)

Similar Documents

Publication Publication Date Title
JP7169369B2 (en) Method, system for generating data for machine learning algorithms
US8713489B2 (en) Simulation parameter correction technique
CN111414987A (en) Training method and training device for neural network and electronic equipment
CN112232495B (en) Prediction model training method, device, medium and computing equipment
CN115394358B (en) Single-cell sequencing gene expression data interpolation method and system based on deep learning
US11100072B2 (en) Data amount compressing method, apparatus, program, and IC chip
US8813009B1 (en) Computing device mismatch variation contributions
US20190129918A1 (en) Method and apparatus for automatically determining optimal statistical model
CN114757112A (en) Motor parameter design method and system based on Hui wolf algorithm
CN114168318A (en) Training method of storage release model, storage release method and equipment
US8056045B2 (en) System and method for circuit simulation
CN114154615A (en) Neural architecture searching method and device based on hardware performance
CN109918237B (en) Abnormal network layer determining method and related product
CN114239720A (en) Method and device for reducing feature weight and computer-readable storage medium
US20230222385A1 (en) Evaluation method, evaluation apparatus, and non-transitory computer-readable recording medium storing evaluation program
CN116737373A (en) Load balancing method, device, computer equipment and storage medium
CN115688853A (en) Process mining method and system
CN110908599B (en) Data writing method and system
CN111079390B (en) Method and device for determining selection state of check box list
US10482157B2 (en) Data compression apparatus and data compression method and storage medium
US7483819B2 (en) Representing data having multi-dimensional input vectors and corresponding output element by piece-wise polynomials
JP7283645B1 (en) PREDICTION VALUE CORRECTION DEVICE, PREDICTION VALUE CORRECTION METHOD AND PROGRAM
US11790984B1 (en) Clustering for read thresholds history table compression in NAND storage systems
EP3792837A1 (en) Learning program and learning method
WO2023203769A1 (en) Weight coefficient calculation device and weight coefficient calculation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination