CN115859169A - Feature extraction method, device, equipment, readable storage medium and program product - Google Patents


Info

Publication number
CN115859169A
Authority
CN
China
Prior art keywords
interpretation
feature
model
importance degree
features
Prior art date
Legal status
Pending
Application number
CN202211573429.0A
Other languages
Chinese (zh)
Inventor
田天
郭向
李元锋
景昕
王静
孙知洋
Current Assignee
China Mobile Communications Group Co Ltd
China Mobile Information Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and China Mobile Information Technology Co Ltd
Priority to CN202211573429.0A
Publication of CN115859169A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a feature extraction method, apparatus, device, readable storage medium, and program product. The method includes: acquiring a training sample set and an initial classification model; inputting the training sample set and the initial classification model into N interpretation models to obtain first interpretation information output by each interpretation model, where the first interpretation information output by each interpretation model includes an importance degree value of each feature in a first feature set, and the importance degree value indicates the degree of influence of each feature on a prediction result of the initial classification model; fusing the N pieces of first interpretation information output by the N interpretation models to obtain fused second interpretation information, where the second interpretation information includes fusion importance degree values of the features in a second feature set, and the second feature set includes the features common to the N first feature sets; and extracting a target feature set according to the fusion importance degree value of each feature in the second feature set. According to the embodiments of the application, the accuracy of feature extraction can be improved.

Description

Feature extraction method, device, equipment, readable storage medium and program product
Technical Field
The present application belongs to the technical field of model interpretation, and in particular relates to a feature extraction method, apparatus, device, readable storage medium, and program product.
Background
In practical applications of classification models, users want not only the prediction result of a classification model but also an explanation of how the model arrived at it; to this end, a classification model can be interpreted by a model interpretation algorithm. In general, interpreting a classification model in terms of feature influence makes its features easier to understand and analyze, and an interpretable feature set can then be extracted to train a more accurate classification model. However, interpretable feature sets currently extracted based on model interpretation suffer from low accuracy.
Disclosure of Invention
The embodiment of the application provides a feature extraction method, a device, equipment, a readable storage medium and a program product, so as to improve the accuracy of an interpretable feature set extracted based on model interpretation.
In a first aspect, an embodiment of the present application provides a feature extraction method, where the method includes:
acquiring a training sample set and an initial classification model, wherein the initial classification model is obtained by training the training sample set;
inputting the training sample set and the initial classification model into N interpretation models to obtain first interpretation information output by each interpretation model, wherein the first interpretation information output by each interpretation model comprises an importance degree value of each feature in the first feature set, the importance degree value is used for indicating the influence degree of each feature on the prediction result of the initial classification model, and N is an integer greater than 1;
fusing N pieces of first interpretation information output by the N interpretation models to obtain fused second interpretation information, wherein the second interpretation information comprises fusion importance degree values of all features in a second feature set, and the second feature set comprises the same features in the first feature set corresponding to the N interpretation models;
and extracting a target feature set according to the fusion importance degree value of each feature in the second feature set, wherein the target feature set comprises target features of which the fusion importance degree values meet preset conditions.
In a second aspect, an embodiment of the present application provides a feature extraction apparatus, including:
the acquisition module is used for acquiring a training sample set and an initial classification model, and the initial classification model is obtained by training the training sample set;
the output module is used for inputting the training sample set and the initial classification model into N interpretation models to obtain first interpretation information output by each interpretation model, the first interpretation information output by each interpretation model comprises an importance degree value of each feature in the first feature set, the importance degree value is used for indicating the influence degree of each feature on the prediction result of the initial classification model, and N is an integer greater than 1;
the fusion module is used for carrying out fusion processing on the N pieces of first interpretation information output by the N interpretation models to obtain fused second interpretation information, wherein the second interpretation information comprises fusion importance degree values of all features in a second feature set, and the second feature set comprises the same features in the first feature set corresponding to the N interpretation models;
and the extraction module is used for extracting a target feature set according to the fusion importance degree value of each feature in the second feature set, wherein the target feature set comprises target features of which the fusion importance degree values meet preset conditions.
In a third aspect, an embodiment of the present application provides an electronic device, where the device includes:
a processor and a memory storing programs or instructions;
the processor, when executing the program or instructions, implements the method described above.
In a fourth aspect, the present application provides a machine-readable storage medium, on which a program or instructions are stored, and when the program or instructions are executed by a processor, the method described above is implemented.
In a fifth aspect, the present application provides a computer program product, and instructions in the computer program product, when executed by a processor of an electronic device, cause the electronic device to perform the above method.
The feature extraction method, the device, the equipment, the readable storage medium and the program product can acquire a training sample set and an initial classification model, wherein the initial classification model is obtained by training the training sample set; inputting the training sample set and the initial classification model into N interpretation models to obtain first interpretation information output by each interpretation model, wherein the first interpretation information output by each interpretation model comprises an importance degree value of each feature in the first feature set; fusing N pieces of first interpretation information output by the N interpretation models to obtain fused second interpretation information, wherein the second interpretation information comprises fusion importance degree values of all features in a second feature set, and the second feature set comprises the same features in the first feature set corresponding to the N interpretation models; and extracting a target feature set according to the fusion importance degree value of each feature in the second feature set, wherein the target feature set comprises target features of which the fusion importance degree values meet preset conditions.
Therefore, the first interpretation information output by a plurality of different interpretation models can be considered in a combined manner, the second interpretation information which is more comprehensive and has higher objectivity can be obtained through fusion, the target feature set can be extracted based on the fusion importance degree value of each feature in the second interpretation information, the risk that the extraction accuracy of the target feature set is influenced due to the fact that the importance degree value of each feature is wrong due to the interpretation limitation of a single interpretation model is reduced, and the accuracy of the target feature set is effectively guaranteed.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the embodiments are briefly described below; those skilled in the art can derive other drawings from these drawings without creative effort.
Fig. 1 is a schematic flow chart of a feature extraction method according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating an embodiment of a scenario of a feature extraction method according to an embodiment of the present application;
fig. 3 is a flowchart of an embodiment of a scenario of a feature extraction method according to another embodiment of the present application;
fig. 4 is a schematic structural diagram of a feature extraction device according to another embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to still another embodiment of the present application.
Detailed Description
Features and exemplary embodiments of various aspects of the present application will be described in detail below, and in order to make objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are intended to be illustrative only and are not intended to be limiting. It will be apparent to one skilled in the art that the present application may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present application by illustrating examples thereof.
It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
In order to solve the problems of the prior art, embodiments of the present application provide a feature extraction method, apparatus, device, readable storage medium, and program product. First, a feature extraction method provided in an embodiment of the present application is described below.
Fig. 1 shows a schematic flowchart of a feature extraction method according to an embodiment of the present application. As shown in fig. 1, the method includes:
s110, a training sample set and an initial classification model are obtained.
And S120, inputting the training sample set and the initial classification model into the N interpretation models to obtain first interpretation information output by each interpretation model.
S130, fusing the N pieces of first interpretation information output by the N pieces of interpretation models to obtain fused second interpretation information.
And S140, extracting a target feature set according to the fusion importance degree value of each feature in the second feature set.
Specific implementations of the above steps will be described in detail below.
In the embodiment of the application, the first interpretation information output by a plurality of different interpretation models can be considered in a combined manner to obtain more comprehensive and higher-objectivity second interpretation information, the target feature set can be extracted based on the fusion importance degree value of each feature in the second interpretation information, the risk that the importance degree value of each feature is wrong and the extraction accuracy of the target feature set is influenced due to the interpretation limitation of a single interpretation model is reduced, and the accuracy of the target feature set is effectively ensured.
In S110, the training sample set may be historical sample data of the object to be classified, and the initial classification model may be a machine learning model, such as a decision tree model, a logistic regression model, a random forest model, or a neural network model. The initial classification model may be derived from a training sample set through machine learning training. The set of training samples and the initial classification model may be obtained.
In S120, the N interpretation models may be any of a variety of existing interpretation models, such as models corresponding to interpretation algorithms including the Partial Dependence Plot (PDP), SHapley Additive exPlanations (SHAP), Feature Importance, and Local Interpretable Model-agnostic Explanations (LIME).
The acquired training sample set and the initial classification model may be input into the N interpretation models, each interpretation model may modify at least one feature in the training sample set, the initial classification model may output prediction results based on the features before and after modification, and the first interpretation information may be determined and output by analyzing and comparing the prediction results corresponding to the features before and after modification.
As an example, as shown in fig. 2, the classification model and the original training sample are respectively input into the PDP model, the SHAP model and the LIME model to obtain a classification result interpretation, that is, first interpretation information, where the first interpretation information includes each feature impact value, each feature impact graph and a model rule.
The first interpretation information output by each interpretation model can be obtained based on the output result of each interpretation model, and the first interpretation information can include an importance degree value of each feature in the first feature set, wherein the importance degree value is used for indicating the influence degree of each feature on the prediction result of the initial classification model, and the greater the importance degree value, the greater the influence of the feature on the prediction result can be understood. For example, if the N interpretation models include a PDP model, a SHAP model, and a LIME model, the first interpretation information output by each interpretation model may be as shown in table 1:
table 1 example table of first interpretation information
(Table 1 is reproduced as an image in the original publication.)
Each first feature set may be all the features output by the corresponding interpretation model, or the set of features whose influence degree satisfies a threshold, or the top-ranked features obtained by sorting all features according to their importance degree values.
In S130, the N pieces of first interpretation information output by the N interpretation models may be fused to obtain the fused second interpretation information. The second interpretation information may include fusion importance degree values of the features in the second feature set, and the second feature set may include the features common to the first feature sets corresponding to the N interpretation models, that is, the features that appear in every one of the N first feature sets.
For example, for each feature in the second feature set, a fusion importance degree value of the feature may be obtained after the fusion processing based on an importance degree value corresponding to the feature in the first interpretation information output by each interpretation model.
For example, the importance degree values corresponding to feature A may be a1, a2, and a3, and the fusion importance degree value of feature A may be the average of these importance degree values, that is, (a1 + a2 + a3)/3. The fusion importance degree value of feature A may also be the sum, over the interpretation models, of the product of each model's weight value and the importance degree value it outputs; the weight value corresponding to each interpretation model may be preset according to empirical values, or may be determined based on the first interpretation information output by each interpretation model. The fusion importance degree value of feature A may also be calculated with other existing fusion algorithms, which are not specifically limited herein.
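As a sketch of the average-based fusion just described (function and variable names are illustrative, not from the patent), the second feature set is taken as the features common to all N first feature sets, and each shared feature's fused importance is the mean of its per-model importance values:

```python
from functools import reduce

def fuse_by_average(first_interpretations):
    """first_interpretations: list of dicts mapping feature name -> importance value."""
    # Second feature set: features present in every model's first feature set.
    shared = reduce(lambda a, b: a & b, (set(d) for d in first_interpretations))
    # Fused importance: average of the N per-model importance values.
    return {
        feat: sum(d[feat] for d in first_interpretations) / len(first_interpretations)
        for feat in shared
    }

# Toy importance values standing in for the PDP, SHAP and LIME outputs.
pdp  = {"A": 0.30, "B": 0.20, "C": 0.10}
shap = {"A": 0.40, "B": 0.10, "D": 0.05}
lime = {"A": 0.20, "B": 0.30, "C": 0.15}
fused = fuse_by_average([pdp, shap, lime])
# Only A and B appear in all three sets; A -> (0.30 + 0.40 + 0.20) / 3 = 0.30
```

The weighted variant simply replaces the average with a weighted sum, as discussed later for S130.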
In S140, a target feature set may be extracted according to the fusion importance degree value of each feature in the second feature set. For example, all features with fused importance values greater than or equal to a preset importance value threshold may be extracted as the target feature set. The fusion importance degree values can also be ranked from large to small, and the features ranked in advance in the preset number are used as a target feature set. And randomly extracting a preset number of features from all the features of which the fusion importance degree value is greater than or equal to the preset importance degree value threshold value to serve as a target feature set. The specific extraction method can be set according to actual requirements, and is not specifically limited herein.
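The extraction strategies for S140 listed above can be sketched as follows (a hedged illustration; function names and the sample values are hypothetical, not from the patent):

```python
import random

def by_threshold(fused, threshold):
    """All features whose fused importance is >= a preset threshold."""
    return {f for f, v in fused.items() if v >= threshold}

def top_n(fused, n):
    """The n features with the largest fused importance values."""
    return set(sorted(fused, key=fused.get, reverse=True)[:n])

def random_n_above_threshold(fused, threshold, n, seed=0):
    """n features drawn at random from those above the threshold."""
    candidates = sorted(by_threshold(fused, threshold))
    return set(random.Random(seed).sample(candidates, min(n, len(candidates))))

fused = {"A": 0.30, "B": 0.20, "C": 0.05, "D": 0.45}
# by_threshold(fused, 0.2) -> {"A", "B", "D"}; top_n(fused, 2) -> {"A", "D"}
```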
It can be understood that the extracted target feature set can be used for training a classification model, so that the classification model with more accurate prediction result and lower computational power consumption can be obtained, and the purpose of optimizing the initial classification model is achieved.
In order to filter some features that have a small influence on the interpretation model, so as to reduce the amount of computation in the fusion process, an embodiment of the present application provides a specific implementation manner of S120, where the specific implementation manner may include:
and inputting the training sample set and the initial classification model into a first interpretation model, executing a first operation by the first interpretation model, and outputting the importance degree value of each feature in the first feature set.
Wherein the first operation comprises:
determining importance degree values of all features in the training sample set based on the training sample set and the initial classification model;
sorting the importance degree values of all the features from large to small to obtain a feature sequence;
and determining the first K features in the feature sequence as a first feature set, and acquiring the importance degree value of each feature in the first feature set.
In the above specific implementation manner, the training sample set and the initial classification model corresponding to the first interpretation model may be input into the first interpretation model, the first interpretation model determines importance degree values of all features in the training sample set, ranks the importance degree values of all features from large to small to obtain a feature sequence, and selects the top K features in the feature sequence as the first feature set. The first feature sets of all the first interpretation models can be determined according to the above method, so as to obtain N first feature sets. The value of K may be preset according to actual conditions, for example, the value of K may be 20 to 50.
As an example, firstly, a training sample set and an initial classification model corresponding to N first interpretation models are input into the corresponding first interpretation models, the interpretation models determine importance degree values of all features, all the features of each interpretation model are respectively arranged according to the importance degree values to obtain a feature sequence, and the first 25 features of each interpretation model are respectively extracted as a first feature set corresponding to the interpretation model.
Therefore, all the features corresponding to the importance degree value determined by the interpretation model are sorted, the features with small influence on the interpretation model can be screened out by selecting the features with the previous preset values, and therefore the calculated amount of the subsequent fusion process is reduced, and the calculation power is saved.
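The top-K selection in the first operation can be sketched as below (an illustrative snippet, assuming per-model importance values are available as a plain mapping; names are not from the patent):

```python
def first_feature_set(importances, k=25):
    """importances: dict feature -> importance value from one interpretation model.
    Sorts features by importance in descending order and keeps the top k
    as that model's first feature set."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:k])

# Toy importance values: f0 is most important, f99 least.
scores = {f"f{i}": 1.0 / (i + 1) for i in range(100)}
top = first_feature_set(scores, k=25)
# keeps f0..f24, the 25 features with the largest importance values
```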
Similarly, in some examples, when the N pieces of first interpretation information output by the N interpretation models are fused, the fusion importance degree values of all the features common to the N first feature sets may be obtained first, and these common features may be sorted by fusion importance degree value to obtain a fused feature sequence, from which the first P features are selected as the second feature set. It can be understood that P is less than K; for example, when K is 25, P may be 20.
For example, as shown in fig. 3, taking N interpretation models including a PDP model, a SHAP model, and a LIME model as an example, the features common to the 25 feature interpretations of the PDP model (i.e., the importance degree values of the 25 features output by the PDP model), the 25 feature interpretations of the SHAP model, and the 25 feature interpretations of the LIME model may be extracted, and the extracted common features may be subjected to fusion interpretation (i.e., the importance degree values of the same feature in each interpretation model are fused) to obtain the fusion importance degree value of each common feature. Based on the size of the fusion importance degree values, the first 20 features with the largest fusion importance degree values may be retained, and their fusion importance degree values may be output.

In order to obtain the fused second interpretation information more accurately, an embodiment of the present application further provides a specific implementation manner of S130, where the specific implementation manner may include:
determining a second feature set according to the same features in the N first feature sets;
determining N importance degree values corresponding to the features in the second feature set according to N first interpretation information output by the N interpretation models;
and determining a fusion importance degree value of each feature in the second feature set according to the N weight values corresponding to the N interpretation models one to one and the N importance degree values corresponding to each feature in the second feature set.
In the above specific implementation manner, all the same features may be determined as a second feature set, and the N importance degree values corresponding to each feature in the second feature set may be importance degree values corresponding to the feature in the N first feature sets, respectively. For example, the N importance degree values corresponding to feature a may be a1, a2, and a3, respectively.
The N weight values corresponding one-to-one to the N interpretation models may be used to compute a weighted sum of the N importance degree values corresponding to each feature in the second feature set, and the weighted sum is used as the fusion importance degree value of that feature. For example, the weight value of the PDP model may be w1, the weight value of the SHAP model may be w2, and the weight value of the LIME model may be w3. The importance degree value of feature A in the PDP model is a1, in the SHAP model is a2, and in the LIME model is a3. The fusion importance degree value of feature A may then be equal to a1*w1 + a2*w2 + a3*w3.
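The weighted sum above can be written in a few lines (a hedged sketch; the sample weights are illustrative, not values from the patent):

```python
def weighted_fusion(importances, weights):
    """Fused importance of one feature: sum over models of
    (model weight x that model's importance value)."""
    return sum(a * w for a, w in zip(importances, weights))

# Feature A's importance in the PDP, SHAP and LIME models, and example weights.
a = [0.30, 0.40, 0.20]   # a1, a2, a3
w = [0.25, 0.50, 0.25]   # w1, w2, w3 (illustrative)
fused_a = weighted_fusion(a, w)  # 0.30*0.25 + 0.40*0.50 + 0.20*0.25 = 0.325
```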
It can be understood that the N weight values corresponding to the N interpretation models one to one may be preset weight values according to empirical values, or may be determined based on the first interpretation information output by each interpretation model.
Therefore, the N importance degree values corresponding to the features in the second feature set are weighted and summed through different weight values of different interpretation models, so that the fusion importance degree value of the features in the second interpretation information is more reasonable, and the second interpretation information is more accurately obtained.
In order to more accurately determine the weight value of each interpretation model, the first interpretation information output by each interpretation model may further include a prediction rule of the initial classification model. The prediction rule may be determined based on how changes to each feature in the first feature set change the prediction result of the initial classification model. Before the fusion importance degree value of each feature in the second feature set is determined from the N weight values corresponding one-to-one to the N interpretation models and the N importance degree values corresponding to each feature in the second feature set, the specific implementation manner may further include:
respectively inputting the initial classification model and each sample in the training sample set into N interpretation models to obtain interpretation information of each sample output by each interpretation model;
determining the interpretation consistency of each interpretation model according to the interpretation information of each sample;
predicting the training sample set according to the prediction rule of the initial classification model to obtain the prediction result corresponding to each interpretation model;
determining the rule hit rate of each interpretation model according to the prediction result;
and determining the weight value of each interpretation model according to the interpretation consistency and the rule hit rate.
In the above specific implementation manner, the first interpretation information may further include a prediction rule of the initial classification model, the prediction rule may be used to predict the training sample set to obtain a prediction result corresponding to each interpretation model, and the prediction rule may be determined based on a change condition of a change of each feature in the first feature set to the prediction result of the initial classification model.
In some examples, a change influence graph may be further generated based on a change condition of a change of each feature in the first feature set to a prediction result of the initial classification model, and the first interpretation information may further include the change influence graph of each feature in the first feature set, so that the first interpretation information output by each interpretation model may be more intuitively reflected, and the interpretability of each interpretation model to the initial classification model is increased.
As shown in fig. 3, taking N interpretation models including a PDP model, a SHAP model, and a LIME model as an example, the initial classification model and the training sample set may be respectively input into the PDP model, the SHAP model, and the LIME model, each interpretation model may output first interpretation information, and the first interpretation information may include an importance value of each feature, a change influence graph, and a prediction rule of the initial classification model.
In the above specific implementation manner, the initial classification model and each sample in the training sample set may be respectively input into the N interpretation models, so as to obtain interpretation information of each sample output by each interpretation model.
Interpretation consistency means that for two similar samples, the interpretation information predicted by the interpretation model should be approximately equal. Based on this, the interpretation consistency of the interpretation models can be determined by comparing the interpretation information of any two similar samples in each interpretation model. It can be understood that the closer the interpretation information of any two similar samples is, the stronger the interpretation capability of the corresponding interpretation model can be explained.
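One possible way to quantify this notion of interpretation consistency (the patent does not publish the exact measure, so this is an assumption: consistency is taken here as the mean absolute difference between the importance vectors a model assigns to pairs of similar samples, smaller meaning more consistent):

```python
def interpretation_consistency(pairs):
    """pairs: list of (interp_a, interp_b) for similar sample pairs,
    each a list of per-feature importance values from one interpretation model.
    Returns the mean absolute difference; smaller = more consistent."""
    diffs = [
        sum(abs(x - y) for x, y in zip(a, b)) / len(a)
        for a, b in pairs
    ]
    return sum(diffs) / len(diffs)

# Two similar sample pairs whose interpretations are nearly identical.
pairs = [([0.3, 0.2], [0.28, 0.22]), ([0.5, 0.1], [0.52, 0.08])]
sam = interpretation_consistency(pairs)  # 0.02: very consistent interpretations
```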
The training sample set may be predicted according to the prediction rule of the initial classification model to obtain the prediction result corresponding to each interpretation model, where the prediction result may indicate which samples in the training sample set belong to the target object and which do not. The rule hit rate of each interpretation model may be determined from this prediction result. In other words, the training sample set may be predicted according to the prediction rule corresponding to a given interpretation model, and the probability that the prediction rule predicts correctly may be determined based on the prediction result; this probability is the rule hit rate of the interpretation model.
It can be understood that a higher rule hit rate indicates a more accurate prediction rule and therefore a stronger interpretation capability of the corresponding interpretation model.
The weight value of each interpretation model can be determined according to the interpretation consistency and the rule hit rate of each interpretation model. It can be understood that the stronger the interpretation capability of the interpretation model is, the larger the weight value corresponding to the interpretation model can be, so that the accuracy of the fused second interpretation information can be ensured.
For example, taking N interpretation models including a PDP model, a SHAP model, and a LIME model as an example, according to the interpretation consistency and the rule hit rate, the weight value of each interpretation model may be determined, which may include the following steps:
a. Calculate the interpretation consistency and the rule hit rate of the PDP model, the SHAP model, and the LIME model, respectively.
b. The interpretation consistency is normalized using the following equation (1):
SAM* = |SAM - 1| / max(|SAM_1 - 1|, ..., |SAM_N - 1|) (1)
wherein SAM* is the interpretation consistency of each interpretation model after normalization, and SAM is the interpretation consistency of that interpretation model before normalization.
c. The interpretability of each interpretation model is calculated using the following formula (2):
Ability = 0.5*(1 - SAM*) + 0.5*COR (2)
wherein, ability represents the interpretation Ability of each interpretation model, SAM * For the interpretation consistency after normalization of each interpretation model, COR represents the rule hit rate of each interpretation model.
d. Calculating the weight value of each interpretation model by using the following formula (3), taking the SHAP model as an example:
W_SHAP = Ability_SHAP / (Ability_SHAP + Ability_PDP + Ability_LIME) (3)
wherein W_SHAP represents the weight value of the SHAP model, Ability_SHAP represents the interpretation ability of the SHAP model, Ability_PDP represents the interpretation ability of the PDP model, and Ability_LIME represents the interpretation ability of the LIME model.
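Steps a–d above can be sketched in Python as follows. This is a minimal illustration only: the example SAM*/COR inputs are made-up values, and the helper names `interpretation_ability` and `model_weights` are assumptions, not names from the method.

```python
# Illustrative sketch of steps a-d; inputs are hypothetical values.
def interpretation_ability(sam_norm, cor):
    # Formula (2): equal weighting of normalized consistency and hit rate.
    return 0.5 * (1 - sam_norm) + 0.5 * cor

def model_weights(abilities):
    # Formula (3): each weight is the model's ability over the total ability.
    total = sum(abilities.values())
    return {name: a / total for name, a in abilities.items()}

abilities = {
    "SHAP": interpretation_ability(sam_norm=0.1, cor=0.9),
    "PDP":  interpretation_ability(sam_norm=0.3, cor=0.8),
    "LIME": interpretation_ability(sam_norm=0.2, cor=0.7),
}
weights = model_weights(abilities)  # weights sum to 1
```

A model with a smaller normalized SAM* (better consistency) and a larger COR thus receives a larger share of the fused weight.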
As an example, determining the fusion importance degree value of each feature in the second feature set according to N weight values corresponding to N interpretation models one to one and N importance degree values corresponding to each feature in the first feature set may be as shown in formula (4):
Importance_i = Σ_(j=1..N) W_j * I_(i,j) (4)
wherein Importance_i is the fused importance degree value of the feature i, I_(i,j) is the importance degree value of the feature i in the interpretation model j, W_j is the weight value of the interpretation model j, and j may be any one of the above-mentioned interpretation models.
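Formula (4) can be sketched as a simple weighted sum. The weight and importance numbers below are made up for illustration; only the fusion rule itself comes from the text.

```python
# Sketch of formula (4): fused importance is the weighted sum of each
# feature's per-model importance values (numbers are hypothetical).
weights = {"PDP": 0.3, "SHAP": 0.4, "LIME": 0.3}   # W_j, summing to 1
importance = {                                      # I_(i,j)
    "feat_a": {"PDP": 0.8, "SHAP": 0.9, "LIME": 0.7},
    "feat_b": {"PDP": 0.2, "SHAP": 0.1, "LIME": 0.3},
}

def fuse(importance, weights):
    # For each feature i: Importance_i = sum_j W_j * I_(i,j).
    return {
        feat: sum(weights[m] * v for m, v in per_model.items())
        for feat, per_model in importance.items()
    }

fused = fuse(importance, weights)
```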
Therefore, the interpretation consistency and the rule hit rate of each interpretation model are determined according to the interpretation information of each sample, and the higher the interpretation consistency and the rule hit rate are, the better the interpretation capability of the model is; the weight values are calculated on this basis, so that the accuracy of the weight value of each interpretation model can be ensured. In some examples, the first interpretation information output by each interpretation model may further include a prediction rule of the initial classification model, and the fusion processing of the N pieces of first interpretation information output by the N interpretation models to obtain the fused second interpretation information may further include fusing the prediction rules corresponding to the N interpretation models, where the fusion of the prediction rules corresponding to the N interpretation models may follow the following principles:
(1) Principle of interpretation capability priority: the reliability of the interpretation result of each interpretation model is determined according to the first interpretation information output by that interpretation model, wherein the greater the interpretation capability, the higher the reliability, and the interpretation capability can be determined based on the interpretation consistency and the rule hit rate. In the case of an interpretation conflict, that is, when a plurality of interpretation models give different interpretation information for the same classification model, the interpretation information of the interpretation model with the higher reliability is preferentially selected.
(2) Principle of abnormality adjustment: on the premise of the interpretation capability priority principle, if the first interpretation information given by any interpretation model for any feature is obviously abnormal or unreasonable, the fusion of the first interpretation information is adjusted accordingly.
Therefore, model interpretation can be carried out on the initial classification model based on the fused prediction rule and the fusion importance degree value of each feature, so that a user can more intuitively see a more reasonable model interpretation result with higher interpretability, and the reliability and universality of model interpretation are effectively improved.
In order to more accurately obtain the interpretation consistency of each interpretation model, the embodiment of the present application provides a specific implementation manner that the initial classification model and each sample in the training sample set are respectively input into N interpretation models to obtain the interpretation information of each sample output by each interpretation model, where the specific implementation manner may include:
acquiring any two similar samples in a training sample set and difference values of all characteristics in any two similar samples;
respectively inputting the initial classification model and any two similar samples into N interpretation models to obtain the importance degree value of each feature in any two similar samples output by each interpretation model;
determining the interpretation consistency of each interpretation model according to the interpretation information of each sample, wherein the method comprises the following steps:
determining the difference value of the importance degree of each feature in each interpretation model according to the importance degree value of each feature in any two similar samples;
determining the interpretation consistency of any two similar samples in each interpretation model according to the difference value of each characteristic and the difference value of the importance degree of each characteristic;
and determining the interpretation consistency of each interpretation model according to the interpretation consistency of any two similar samples in each interpretation model.
In the above specific implementation manner, the similar samples may refer to samples with relatively close features, and any two similar samples in the training sample set and a difference value between the features in the any two similar samples may be obtained.
For example, any two similar samples x and x 'in the training sample set and the difference value of each feature in the samples x and x' may be obtained, where the difference value of each feature may be as shown in formula (5):
dif_i(x, x') = |x_i - x'_i| / (|x_i| + |x'_i|) (5)
wherein dif_i(x, x') represents the difference value of the feature i, and x_i and x'_i represent the values of the feature i in the sample x and the sample x', respectively.
And then inputting the initial classification model and one of the samples in any two similar samples into the N interpretation models to obtain the importance degree value of each feature in one of the samples output by each interpretation model. And inputting the initial classification model and the other sample of any two similar samples into the N interpretation models to obtain the importance degree value of each feature in the other sample output by each interpretation model. And determining the difference value of the importance degree of each feature in each interpretation model according to the importance degree value of each feature in any two similar samples.
For example, the initial classification model and the samples x and x 'may be respectively input into any one interpretation model, to obtain the importance degree value of each feature in the samples x and x' output by the interpretation model, and to determine the importance degree difference value of each feature. Wherein, the importance degree difference value for the feature i can be as shown in formula (6):
dif_i(M(x, x')) = |M(x) - M(x')| / (|M(x)| + |M(x')|) (6)
wherein dif_i(M(x, x')) represents the importance degree difference value of the feature i, and M(x) and M(x') represent the importance degree values of the feature i in the sample x and the sample x', respectively.
The interpretation consistency of any two similar samples in each interpretation model can be determined according to the difference value of each feature and the difference value of the importance degree of each feature, wherein the interpretation consistency of any two similar samples x and x' can be shown in formula (7):
Sam(x, x') = (1/n) * Σ_(i=1..n) dif_i(M(x, x')) / dif_i(x, x') (7)
where Sam (x, x ') denotes the consistency of interpretation of any two similar samples x and x', and n denotes the number of all features in the first feature set.
The interpretation consistency of each interpretation model can then be determined according to the interpretation consistency of any two similar samples in that interpretation model, wherein the interpretation consistency of an interpretation model can be shown as formula (8):
Sam = (2 / (S*(S-1))) * Σ_(j&lt;k) Sam(x_j, x_k) (8)
wherein Sam represents the interpretation consistency of the interpretation model, x_j and x_k represent the j-th and k-th samples in the training sample set, respectively, and S represents the total number of samples in the training sample set.
It can be understood that the higher the interpretation consistency of the interpretation model is, the closer the Sam value should be to 1; the farther the Sam value deviates from 1 in either direction, the worse the interpretation consistency, that is, the worse the interpretation capability of the interpretation model.
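The per-pair consistency of formulas (5)–(7) can be sketched as below. The exact form of the difference values in the original is not fully recoverable, so a symmetric relative difference is assumed here; `pair_consistency`, `imp_fn`, and `eps` are illustrative names, not names from the method.

```python
# Sketch of formulas (5)-(7) under assumed symmetric relative differences.
# imp_fn maps a sample (list of feature values) to that sample's
# per-feature importance values for one interpretation model.
def pair_consistency(x, x_prime, imp_fn, eps=1e-12):
    mx, mxp = imp_fn(x), imp_fn(x_prime)
    ratios = []
    for i in range(len(x)):
        # Assumed forms of (5) and (6); eps guards against division by zero.
        dif_x = abs(x[i] - x_prime[i]) / (abs(x[i]) + abs(x_prime[i]) + eps)
        dif_m = abs(mx[i] - mxp[i]) / (abs(mx[i]) + abs(mxp[i]) + eps)
        ratios.append(dif_m / (dif_x + eps))
    return sum(ratios) / len(ratios)  # formula (7): Sam(x, x')

# Toy check: if importance tracks the feature values exactly, every
# ratio is ~1, i.e. perfect consistency (Sam close to 1).
sam = pair_consistency([1.0, 2.0], [1.1, 2.2], imp_fn=lambda s: s)
```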
Therefore, by obtaining any two similar samples in the training sample set, the difference value of each feature between the two samples, and the importance degree difference value of each feature, the difference in how the same interpretation model treats the two similar samples is evaluated, and the interpretation consistency of the whole model is obtained by calculation, so that the interpretation consistency of each interpretation model can be determined more accurately.
In order to obtain the rule hit rate more accurately, an embodiment of the present application further provides another implementation manner for determining the rule hit rate of each interpretation model according to the prediction result, where the another implementation manner specifically includes:
acquiring a first number of first samples in a training sample set and a second number of all samples in the training sample set, wherein the first samples comprise samples of which prediction results indicate that the samples belong to a target object and actually belong to the target object, and samples of which prediction results indicate that the samples do not belong to the target object and actually do not belong to the target object;
and obtaining the rule hit rate of each interpretation model according to the ratio of the first quantity to the second quantity.
In the above specific implementation manner, the first samples are the correctly predicted samples, that is, the samples whose prediction result indicates that they belong to the target object and which actually belong to the target object, together with the samples whose prediction result indicates that they do not belong to the target object and which actually do not belong to the target object. The first number is the sum of the numbers of these two kinds of samples.
As an example, the rule hit rate of each interpretation model may be obtained according to a ratio of the first number to the second number, where the rule hit rate of each interpretation model may be as shown in formula (9):
COR = (N_(y,y) + N_(n,n)) / S (9)
wherein COR represents the rule hit rate of the interpretation model, N_(y,y) represents the number of samples whose prediction result indicates that they belong to the target object and which actually belong to the target object, N_(n,n) represents the number of samples whose prediction result indicates that they do not belong to the target object and which actually do not belong to the target object, and S represents the total number of samples in the training sample set.
Therefore, the rule hit rate is calculated by obtaining the first number of first samples in the training sample set and the second number of all samples in the training sample set and substituting the numbers of correctly predicted samples into the formula, so that the accuracy of the rule hit rate is ensured.
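Formula (9) is simply the fraction of correctly predicted samples. A minimal sketch, with hypothetical labels:

```python
# Sketch of formula (9): COR = (N_(y,y) + N_(n,n)) / S, i.e. the
# fraction of samples the prediction rule classifies correctly.
def rule_hit_rate(predicted, actual):
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

predicted = [1, 1, 0, 0, 1]   # rule output (1 = target object)
actual    = [1, 0, 0, 0, 1]   # ground-truth labels
cor = rule_hit_rate(predicted, actual)  # 4 of 5 samples correct
```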
In order to obtain a target feature set more accurately, an embodiment of the present application provides a specific implementation manner of S140, where the specific implementation manner may include:
according to the fusion importance degree value of each feature in the second feature set, determining the initially selected feature of which the fusion importance degree value is greater than or equal to a preset importance degree value threshold;
and under the condition that the number of the initially selected features is less than or equal to a preset number threshold, extracting the initially selected features into a target feature set.
In the above specific implementation manner, the preset importance value threshold may be any value set according to actual conditions, and is not specifically limited herein.
The features in the second feature set can be sorted according to the fusion importance degree value, the features whose fusion importance degree value is less than the preset importance degree value threshold are removed, and the remaining initially selected features are obtained. The preset number threshold may be set according to an empirical value in combination with the actual situation; for example, the preset number threshold may be 0.3b, where b is the total number of features. Assuming that the number of initially selected features is a, if a is less than or equal to 0.3b, all the initially selected features are extracted as the target feature set.
Therefore, the initially selected features whose fusion importance degree value is greater than or equal to the preset importance degree value threshold are extracted as the target feature set under the condition that their number is less than or equal to the preset number threshold. In this way, all the initially selected features can be used as target features when their number does not exceed the preset number threshold, the number of features in the target feature set is ensured not to be too large, the calculation amount of subsequently optimizing the classification model with the target feature set can be reduced, and computing power is effectively saved.
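The primary selection step can be sketched as follows. The threshold values (0.4 for importance, 0.3b for the count cap) and the function names are illustrative examples, not fixed by the method.

```python
# Sketch of primary selection: threshold on fused importance, then
# accept all remaining features if they fit under the number cap.
def select_primary_features(fused, importance_threshold):
    # Keep features whose fused importance meets the preset threshold,
    # ranked from most to least important.
    kept = {f: v for f, v in fused.items() if v >= importance_threshold}
    return sorted(kept, key=kept.get, reverse=True)

def as_target_set(primary, total_features, ratio=0.3):
    # If the primary features fit under the preset number threshold
    # (0.3 * b in the example above), take them all as the target set.
    if len(primary) <= ratio * total_features:
        return primary
    return None  # too many features remain: extract several smaller sets

fused = {"f1": 0.9, "f2": 0.5, "f3": 0.1, "f4": 0.05, "f5": 0.02,
         "f6": 0.01, "f7": 0.008, "f8": 0.005, "f9": 0.004, "f10": 0.003}
primary = select_primary_features(fused, importance_threshold=0.4)
target = as_target_set(primary, total_features=10)
```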
In order to obtain the target feature set more accurately, after determining the initially selected feature having the fusion importance degree value greater than or equal to the preset importance degree value threshold according to the fusion importance degree value of each feature in the second feature set, the specific implementation manner may further include:
under the condition that the number of the initially selected features is greater than a preset number threshold, extracting M target feature sets from the initially selected features, wherein the numbers of features corresponding to the M target feature sets are different from one another, the number of features corresponding to each of the M target feature sets is less than or equal to the preset number threshold, and M is an integer greater than 1.
In the above specific implementation manner, when the number of the initially selected features is greater than the preset number threshold, it may be considered that the number of the currently remaining features is excessive, and at this time, M target feature sets with different feature numbers may be extracted from the initially selected features. It is understood that the number of features corresponding to each target feature set is less than or equal to the preset number threshold.
It can be understood that the M target feature sets extracted from the initially selected features may be randomly extracted, or the initially selected features may be ranked from large to small according to the fusion importance degree value of each feature in the second feature set, and then different numbers of features are extracted from the top ranked features in sequence as different target feature sets.
For example, assuming that the number of remaining features is a, the preset number threshold may be 0.3b, where b is the total number of features. If a > 0.3b, target feature sets containing 0.1b, 0.2b, and 0.3b features, respectively, can be extracted from the initially selected features.
Therefore, under the condition that the number of the initially selected features is greater than the preset number threshold, M target feature sets are extracted from the initially selected features, and the classification model is optimized by training with the different target feature sets respectively to obtain a classification model with higher precision. In this way, the accuracy of the optimized classification model can be effectively ensured while computing power is saved.
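The multi-set extraction can be sketched as below, using the ranked variant described above (top-ranked features taken in increasing counts). The 0.1b/0.2b/0.3b size ratios follow the example in the text; they are illustrative, not mandated.

```python
# Sketch of extracting M target feature sets of increasing size from
# the initially selected features, ranked by fused importance.
def extract_target_sets(primary_ranked, total_features, ratios=(0.1, 0.2, 0.3)):
    sets = []
    for r in ratios:
        k = max(1, int(r * total_features))  # set size, capped by each ratio
        sets.append(primary_ranked[:k])      # top-k features per set
    return sets

ranked = ["f1", "f2", "f3", "f4", "f5", "f6"]  # ranked by fused importance
sets = extract_target_sets(ranked, total_features=20)  # sizes 2, 4, 6
```

Each set can then be used to train a candidate classifier, and the best-performing candidate kept.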
Based on the feature extraction method provided by the above embodiment, the present application also provides an embodiment of a feature extraction device.
Fig. 4 is a schematic structural diagram of a feature extraction apparatus according to another embodiment of the present application, and only a part related to the embodiment of the present application is shown for convenience of description.
Referring to fig. 4, the feature extraction apparatus 400 includes:
an obtaining module 401, configured to obtain a training sample set and an initial classification model, where the initial classification model is obtained by training the training sample set;
an output module 402, configured to input the training sample set and the initial classification model into N interpretation models to obtain first interpretation information output by each interpretation model, where the first interpretation information output by each interpretation model includes an importance level value of each feature in the first feature set, the importance level value is used to indicate an influence degree of each feature on a prediction result of the initial classification model, and N is an integer greater than 1;
the fusion module 403 is configured to perform fusion processing on the N pieces of first interpretation information output by the N interpretation models to obtain fused second interpretation information, where the second interpretation information includes a fusion importance degree value of each feature in the second feature set, and the second feature set includes the same feature in the first feature set corresponding to the N interpretation models;
an extracting module 404, configured to extract a target feature set according to the fusion importance degree value of each feature in the second feature set, where the target feature set includes target features whose fusion importance degree values satisfy a preset condition.
Therefore, the first interpretation information output by a plurality of different interpretation models can be considered jointly, and more comprehensive and more objective second interpretation information can be obtained through fusion. The target feature set can then be extracted based on the fusion importance degree value of each feature in the second interpretation information, which reduces the risk that the interpretation limitations of a single interpretation model lead to erroneous importance degree values and thereby affect the extraction accuracy of the target feature set, and effectively ensures the accuracy of the target feature set.
In some embodiments, in order to obtain the fused second interpretation information more accurately, the fusion module 403 may include the following units:
the first determining unit is used for determining a second feature set according to the same features in the N first feature sets;
the second determining unit is used for determining N importance degree values corresponding to each feature in the second feature set according to N pieces of first interpretation information output by the N interpretation models;
and the third determining unit is used for determining the fusion importance degree value of each feature in the second feature set according to the N weight values corresponding to the N interpretation models one to one and the N importance degree values corresponding to each feature in the second feature set.
In some embodiments, the first interpretation information output by each interpretation model further includes a prediction rule of the initial classification model, the prediction rule is determined based on a change of each feature in the first feature set to a predicted result of the initial classification model, and in order to more accurately determine the weight value of each interpretation model, the fusion module 403 may further include the following units:
the input unit is used for respectively inputting the initial classification model and each sample in the training sample set into the N interpretation models to obtain the interpretation information of each sample output by each interpretation model;
the fourth determining unit is used for determining the interpretation consistency of each interpretation model according to the interpretation information of each sample;
the prediction unit is used for predicting the training sample set according to the prediction rule of the initial classification model to obtain the prediction result corresponding to each interpretation model;
a fifth determining unit, configured to determine a rule hit rate of each interpretation model according to the prediction result;
and the sixth determining unit is used for determining the weight value of each interpretation model according to the interpretation consistency and the rule hit rate.
In some embodiments, in order to obtain the interpretation consistency of each interpretation model more accurately, the input unit may include the following sub-units:
the first acquisition subunit is used for acquiring any two similar samples in the training sample set and difference values of each feature in any two similar samples;
and the output subunit is used for respectively inputting the initial classification model and any two similar samples into the N interpretation models to obtain the importance degree value of each feature in any two similar samples output by each interpretation model.
The fourth confirmation unit may include the following sub-units:
the first determining subunit is used for determining the importance degree difference value of each feature in each interpretation model according to the importance degree value of each feature in any two similar samples;
the second determining subunit is used for determining the interpretation consistency of any two similar samples in each interpretation model according to the difference value of each feature and the difference value of the importance degree of each feature;
and the third determining subunit is used for determining the interpretation consistency of each interpretation model according to the interpretation consistency of any two similar samples in each interpretation model.
In some embodiments, in order to obtain the rule hit rate more accurately, the fifth validation unit may include the following sub-units:
a second obtaining subunit, configured to obtain a first number of first samples in the training sample set and a second number of all samples in the training sample set, where the first samples include samples whose prediction results indicate that the samples belong to the target object and actually belong to the target object, and samples whose prediction results indicate that the samples do not belong to the target object and actually do not belong to the target object;
and the calculating subunit is used for obtaining the rule hit rate of each interpretation model according to the ratio of the first quantity to the second quantity.
In some embodiments, in order to obtain the target feature set more accurately, the above-mentioned extraction module 404 may include the following units:
a seventh determining unit, configured to determine, according to the fusion importance degree value of each feature in the second feature set, a primary selected feature having a fusion importance degree value greater than or equal to a preset importance degree value threshold;
and the first extraction unit is used for extracting the primary selected features as the target feature set under the condition that the number of the primary selected features is less than or equal to a preset number threshold.
In some embodiments, in order to obtain the target feature set more accurately, the extraction module 404 may further include the following units:
and the second extraction unit is used for extracting M target feature sets from the primary selected features under the condition that the number of the primary selected features is greater than a preset number threshold, wherein the numbers of features corresponding to the M target feature sets are different from one another, the number of features corresponding to each of the M target feature sets is less than or equal to the preset number threshold, and M is an integer greater than 1.
In some embodiments, in order to make the first interpretation information output by each interpretation model more accurate, the output module 402 may further be configured to:
inputting the training sample set and the initial classification model into a first interpretation model, wherein the first interpretation model executes a first operation and outputs an importance degree value of each feature in the first feature set, the first interpretation model is any one of the N interpretation models, and the first operation comprises:
determining importance degree values of all features in the training sample set based on the training sample set and the initial classification model;
sorting the importance degree values of all the features from large to small to obtain a feature sequence;
determining the first K features in the feature sequence as a first feature set, and acquiring the importance degree value of each feature in the first feature set, wherein K is an integer greater than 1.
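The first operation above (ranking all features by importance in descending order and keeping the top K as the first feature set) can be sketched as follows; the feature names and importance values are hypothetical.

```python
# Sketch of the first operation: sort features by importance degree
# value from large to small, then keep the first K as the first
# feature set together with their importance values.
def top_k_features(importances, k):
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:k])

importances = {"age": 0.31, "income": 0.45, "tenure": 0.12, "region": 0.04}
first_feature_set = top_k_features(importances, k=2)
```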
It should be noted that, the contents of information interaction, execution processes, and the like between the above devices/units are based on the same concept as that of the embodiment of the method of the present application and are devices corresponding to the above feature extraction method, and all implementation manners in the embodiment of the method are applicable to the embodiment of the device, and specific functions and technical effects thereof may be specifically referred to a part of the embodiment of the method, and are not described herein again.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules, so as to perform all or part of the functions described above. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. For the specific working processes of the units and modules in the system, reference may be made to the corresponding processes in the foregoing method embodiments, which are not described herein again.
Fig. 5 shows a hardware structure diagram of an electronic device according to still another embodiment of the present application.
The device may include a processor 501 and a memory 502 storing programs or instructions.
The steps in any of the various method embodiments described above are implemented when the processor 501 executes a program.
Illustratively, the programs may be partitioned into one or more modules/units, which are stored in the memory 502 and executed by the processor 501 to accomplish the present application. One or more modules/units may be a series of program instruction segments capable of performing specific functions, the instruction segments describing the execution of the program in the device.
Specifically, the processor 501 may include a central processing unit (CPU) or an application-specific integrated circuit (ASIC), or may be configured as one or more integrated circuits implementing the embodiments of the present application.
Memory 502 may include mass storage for data or instructions. By way of example, and not limitation, memory 502 may include a hard disk drive (HDD), a floppy disk drive, flash memory, an optical disk, a magneto-optical disk, magnetic tape, a Universal Serial Bus (USB) drive, or a combination of two or more of these. Memory 502 may include removable or non-removable (or fixed) media, where appropriate. The memory 502 may be internal or external to the integrated gateway disaster recovery device, where appropriate. In a particular embodiment, the memory 502 is non-volatile solid-state memory.
The memory may include read-only memory (ROM), random access memory (RAM), magnetic disk storage media devices, optical storage media devices, flash memory devices, or electrical, optical, or other physical/tangible memory storage devices. Thus, in general, the memory includes one or more tangible (non-transitory) machine-readable storage media (e.g., a memory device) encoded with software comprising computer-executable instructions, and when the software is executed (e.g., by one or more processors), it is operable to perform the operations described with reference to the methods according to aspects of the present disclosure.
The processor 501 reads and executes the program or instructions stored in the memory 502 to implement any one of the methods in the above embodiments.
In one example, the electronic device can also include a communication interface 503 and a bus 504. The processor 501, the memory 502, and the communication interface 503 are connected via a bus 504 to complete communication therebetween.
The communication interface 503 is mainly used for implementing communication between modules, apparatuses, units and/or devices in the embodiments of the present application.
Bus 504 comprises hardware, software, or both coupling the components of the online data traffic billing device to each other. By way of example, and not limitation, the bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a front-side bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association local bus (VLB), another suitable bus, or a combination of two or more of these. Bus 504 may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the present application, any suitable bus or interconnect is contemplated by the present application.
In addition, in combination with the methods in the foregoing embodiments, the embodiments of the present application may be implemented by providing a machine-readable storage medium. A program or instructions are stored on the machine-readable storage medium; when executed by a processor, they implement any one of the methods in the above embodiments. The machine-readable storage medium is readable by a machine such as a computer.
An embodiment of the present application further provides a chip. The chip includes a processor and a communication interface, the communication interface being coupled to the processor; the processor is configured to execute a program or instructions to implement each process of the foregoing method embodiments and achieve the same technical effect. To avoid repetition, the details are not repeated here.
It should be understood that the chip mentioned in the embodiments of the present application may also be referred to as a system-on-chip, a system chip, or a chip system.
Embodiments of the present application further provide a computer program product. The program product is stored in a machine-readable storage medium and is executed by at least one processor to implement the processes of the foregoing method embodiments and achieve the same technical effect. To avoid repetition, the details are not repeated here.
It is to be understood that the present application is not limited to the particular arrangements and instrumentality described above and shown in the attached drawings. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present application are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications, and additions or change the order between the steps after comprehending the spirit of the present application.
The functional blocks shown in the above structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, a block may be, for example, an electronic circuit, an application-specific integrated circuit (ASIC), suitable firmware, a plug-in, a function card, or the like. When implemented in software, the elements of the present application are the programs or code segments used to perform the required tasks. The programs or code segments can be stored in a machine-readable medium or transmitted over a transmission medium or communication link by a data signal carried in a carrier wave. A "machine-readable medium" may include any medium that can store or transfer information. Examples of machine-readable media include electronic circuits, semiconductor memory devices, ROM, flash memory, erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber-optic media, radio-frequency (RF) links, and so forth. The code segments may be downloaded via a computer network such as the Internet or an intranet.
It should also be noted that, in the exemplary embodiments mentioned in the present application, some methods or systems are described in terms of a series of steps or devices. However, the present application is not limited to the order of the steps described above; that is, the steps may be performed in the order mentioned in the embodiments, in an order different from that in the embodiments, or simultaneously.
Aspects of the present disclosure are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer programs or instructions. These programs or instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions/acts specified in the flowchart and/or block diagram block or blocks. Such a processor may be, but is not limited to, a general purpose processor, a special purpose processor, an application specific processor, or a field programmable logic circuit. It will also be understood that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware for performing the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The above are only specific embodiments of the present application. Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, modules, and units described above may refer to the corresponding processes in the foregoing method embodiments and are not described here again. It should be understood that the scope of protection of the present application is not limited thereto; any person skilled in the art can readily conceive of various equivalent modifications or substitutions within the technical scope disclosed in the present application, and such modifications or substitutions shall fall within the scope of protection of the present application.

Claims (12)

1. A method of feature extraction, comprising:
acquiring a training sample set and an initial classification model, wherein the initial classification model is obtained by training the training sample set;
inputting the training sample set and the initial classification model into N interpretation models to obtain first interpretation information output by each interpretation model, wherein the first interpretation information output by each interpretation model comprises an importance degree value of each feature in a first feature set, the importance degree value is used for indicating the influence degree of each feature on a prediction result of the initial classification model, and N is an integer greater than 1;
fusing the N pieces of first interpretation information output by the N interpretation models to obtain fused second interpretation information, wherein the second interpretation information comprises fusion importance degree values of all features in a second feature set, and the second feature set comprises the same features in the first feature sets corresponding to the N interpretation models;
and extracting a target feature set according to the fusion importance degree value of each feature in the second feature set, wherein the target feature set comprises target features of which the fusion importance degree values meet preset conditions.
2. The method according to claim 1, wherein the fusing the N pieces of first interpretation information output by the N interpretation models to obtain fused second interpretation information includes:
determining a second feature set according to the same features in the N first feature sets;
determining N importance degree values corresponding to the features in the second feature set according to N pieces of first interpretation information output by the N interpretation models;
and determining the fusion importance degree value of each feature in the second feature set according to the N weight values corresponding to the N interpretation models one to one and the N importance degree values corresponding to each feature in the second feature set.
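As a non-authoritative illustration of the fusion in claims 1-2: the claims do not fix a formula, but one natural reading is that the second feature set is the intersection of the N first feature sets, and each shared feature's fused importance is a weighted sum of its N per-model importance values. The function name and data layout below are hypothetical.

```python
def fuse_importance(first_infos, weights):
    """Sketch of claims 1-2 (hypothetical). first_infos: list of N dicts
    mapping feature name -> importance degree value, one dict per
    interpretation model; weights: N weight values, one per model."""
    # Second feature set: features present in every one of the N first sets.
    shared = set(first_infos[0])
    for info in first_infos[1:]:
        shared &= set(info)
    # Fused importance: weighted sum of the N per-model importance values.
    return {
        f: sum(w * info[f] for w, info in zip(weights, first_infos))
        for f in shared
    }
```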
3. The method according to claim 2, wherein the first interpretation information output by each interpretation model further includes a prediction rule of the initial classification model, the prediction rule is determined based on a variation of each feature in the first feature set on a result of prediction of the initial classification model,
before determining the fusion importance degree value of each feature in the second feature set according to the N weight values corresponding to the N interpretation models one to one and the N importance degree values corresponding to each feature in the second feature set, the method further includes:
respectively inputting the initial classification model and each sample in the training sample set into the N interpretation models to obtain interpretation information of each sample output by each interpretation model;
determining the interpretation consistency of each interpretation model according to the interpretation information of each sample;
predicting the training sample set according to the prediction rule of the initial classification model to obtain a prediction result corresponding to each interpretation model;
determining the rule hit rate of each interpretation model according to the prediction result;
and determining the weight value of each interpretation model according to the interpretation consistency and the rule hit rate.
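Claim 3 states only that each model's weight is determined from its interpretation consistency and rule hit rate, without fixing the combination. A minimal hedged sketch, assuming the raw score is the mean of the two quantities, normalized so the N weights sum to 1:

```python
def model_weights(consistencies, hit_rates):
    """Hypothetical weighting per claim 3: raw score = mean of a model's
    interpretation consistency and rule hit rate, then normalized so the
    N weights sum to 1."""
    raw = [(c + h) / 2.0 for c, h in zip(consistencies, hit_rates)]
    total = sum(raw)
    return [r / total for r in raw]
```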
4. The method according to claim 3, wherein the inputting the initial classification model and each sample in the training sample set into the N interpretation models respectively to obtain the interpretation information of each sample output by each interpretation model comprises:
acquiring any two similar samples in the training sample set and the difference value of each feature in the any two similar samples;
inputting the initial classification model and the any two similar samples into the N interpretation models respectively to obtain an importance degree value of each feature in the any two similar samples output by each interpretation model;
the determining the interpretation consistency of each interpretation model according to the interpretation information of each sample comprises the following steps:
determining the importance degree difference value of each feature in each interpretation model according to the importance degree values of the features in the any two similar samples;
determining the interpretation consistency of the any two similar samples in each interpretation model according to the difference value of each feature and the importance degree difference value of each feature;
and determining the interpretation consistency of each interpretation model according to the interpretation consistency of any two similar samples in each interpretation model.
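Claim 4 compares, for two similar samples, the per-feature value differences with the per-feature importance differences. The scoring below is a hypothetical realization (the claim fixes no formula): a model is scored as consistent when small feature changes produce correspondingly small importance changes, yielding a value in (0, 1].

```python
def pair_consistency(feat_a, feat_b, imp_a, imp_b):
    """Hypothetical sketch of claim 4 for one pair of similar samples.
    feat_*: feature value dicts; imp_*: per-feature importance dicts
    output by one interpretation model for each sample."""
    score = 0.0
    for f in feat_a:
        d_feat = abs(feat_a[f] - feat_b[f])   # feature difference value
        d_imp = abs(imp_a[f] - imp_b[f])      # importance difference value
        # Reward agreement between the two differences; 1.0 means perfect.
        score += 1.0 / (1.0 + abs(d_imp - d_feat))
    return score / len(feat_a)
```

The per-model interpretation consistency of claim 4's last step could then be, e.g., the average of this score over all similar pairs.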
5. The method of claim 3, wherein determining a rule hit rate for each interpretation model based on the predicted outcome comprises:
obtaining a first number of first samples in the training sample set and a second number of all samples in the training sample set, wherein the first samples comprise samples for which the prediction result indicates that the sample belongs to a target object and the sample actually belongs to the target object, and samples for which the prediction result indicates that the sample does not belong to the target object and the sample actually does not belong to the target object;
and obtaining the rule hit rate of each interpretation model according to the ratio of the first quantity to the second quantity.
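Claim 5's rule hit rate reduces to the fraction of samples whose rule-based prediction matches the actual label, counting both true positives and true negatives as hits. A minimal sketch (function name assumed):

```python
def rule_hit_rate(predictions, labels):
    """Rule hit rate per claim 5: (correctly predicted positives +
    correctly predicted negatives) / total number of samples."""
    hits = sum(1 for p, y in zip(predictions, labels) if p == y)
    return hits / len(labels)
```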
6. The method according to claim 1, wherein extracting a target feature set according to the fused importance degree value of each feature in the second feature set comprises:
determining initially selected features whose fusion importance degree values are greater than or equal to a preset importance degree value threshold according to the fusion importance degree values of the features in the second feature set;
and under the condition that the number of the initially selected features is less than or equal to a preset number threshold, extracting the initially selected features into a target feature set.
7. The method of claim 6, wherein after determining the initially selected features having a fused importance value greater than or equal to a predetermined importance value threshold based on the fused importance value of each feature in the second feature set, the method further comprises:
and under the condition that the number of the initially selected features is larger than a preset number threshold, extracting M target feature sets from the initially selected features, wherein the number of the features corresponding to the M target feature sets is different, the number of the features corresponding to the M target feature sets is smaller than or equal to the preset number threshold, and M is an integer larger than 1.
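Claims 6-7 together describe a threshold-then-cap selection. A hedged sketch, assuming (since the claims do not say how the M sets are formed) that when too many features pass the threshold, M nested prefixes of the importance-ranked list are returned, each of a distinct size no larger than the cap:

```python
def extract_target_sets(fused, threshold, max_count, m=2):
    """Hypothetical sketch of claims 6-7. fused: dict mapping feature ->
    fused importance; features meeting the threshold are the initially
    selected features. Returns a list of one or M target feature sets."""
    picked = sorted((f for f, v in fused.items() if v >= threshold),
                    key=lambda f: fused[f], reverse=True)
    if len(picked) <= max_count:
        return [picked]  # claim 6: single target set
    # Claim 7: M candidate sets with distinct sizes, each <= max_count.
    sizes = [k for k in range(max_count, max_count - m, -1) if k > 0]
    return [picked[:k] for k in sizes]
```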
8. The method according to claim 1, wherein the inputting the training sample set and the initial classification model into N interpretation models to obtain first interpretation information output by each interpretation model includes:
inputting the training sample set and the initial classification model into a first interpretation model, wherein the first interpretation model performs a first operation and outputs an importance degree value of each feature in the first feature set, the first interpretation model is any one of the N interpretation models, and the first operation includes:
determining importance degree values of all features in the training sample set based on the training sample set and the initial classification model;
sorting the importance degree values of all the features in descending order to obtain a feature sequence;
determining the first K features in the feature sequence as a first feature set, and acquiring the importance degree value of each feature in the first feature set, wherein K is an integer greater than 1.
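The first operation of claim 8 is a straightforward top-K selection over the importance ranking. A minimal sketch (function name assumed):

```python
def top_k_features(importances, k):
    """Claim 8's first operation: sort all features by importance degree
    value in descending order and keep the first K as the first feature set."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:k])
```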
9. A feature extraction apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring a training sample set and an initial classification model, wherein the initial classification model is obtained by training the training sample set;
the output module is used for inputting the training sample set and the initial classification model into N interpretation models to obtain first interpretation information output by each interpretation model, the first interpretation information output by each interpretation model comprises an importance degree value of each feature in a first feature set, the importance degree value is used for indicating the influence degree of each feature on the prediction result of the initial classification model, and N is an integer greater than 1;
the fusion module is used for performing fusion processing on the N pieces of first interpretation information output by the N interpretation models to obtain fused second interpretation information, wherein the second interpretation information comprises fusion importance degree values of all features in a second feature set, and the second feature set comprises the same features in the first feature set corresponding to the N interpretation models;
and the extraction module is used for extracting a target feature set according to the fusion importance degree value of each feature in the second feature set, wherein the target feature set comprises target features of which the fusion importance degree values meet preset conditions.
10. An electronic device, characterized in that the device comprises: a processor and a memory storing programs or instructions;
the processor, when executing the program or instructions, implements the method of any of claims 1-8.
11. A machine readable storage medium, having stored thereon a program or instructions which, when executed by a processor, implement the method of any one of claims 1-8.
12. A computer program product, wherein instructions in the computer program product, when executed by a processor of an electronic device, cause the electronic device to perform the method of any of claims 1-8.
CN202211573429.0A 2022-12-08 2022-12-08 Feature extraction method, device, equipment, readable storage medium and program product Pending CN115859169A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211573429.0A CN115859169A (en) 2022-12-08 2022-12-08 Feature extraction method, device, equipment, readable storage medium and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211573429.0A CN115859169A (en) 2022-12-08 2022-12-08 Feature extraction method, device, equipment, readable storage medium and program product

Publications (1)

Publication Number Publication Date
CN115859169A true CN115859169A (en) 2023-03-28

Family

ID=85671235

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211573429.0A Pending CN115859169A (en) 2022-12-08 2022-12-08 Feature extraction method, device, equipment, readable storage medium and program product

Country Status (1)

Country Link
CN (1) CN115859169A (en)

Similar Documents

Publication Publication Date Title
CN113543117B (en) Prediction method and device for number portability user and computing equipment
CN114662602A (en) Outlier detection method and device, electronic equipment and storage medium
CN114564370B (en) Method, device and equipment for determining alarm threshold value and computer storage medium
CN114612011A (en) Risk prevention and control decision method and device
CN116664335B (en) Intelligent monitoring-based operation analysis method and system for semiconductor production system
CN116310713B (en) Infrared image recognition method and device, electronic equipment and storage medium
CN115859169A (en) Feature extraction method, device, equipment, readable storage medium and program product
CN111353015B (en) Crowd-sourced question recommendation method, device, equipment and storage medium
CN112367222A (en) Network anomaly detection method and device
CN114417830A (en) Risk evaluation method, device, equipment and computer readable storage medium
CN115757900A (en) User demand analysis method and system applying artificial intelligence model
CN111026851B (en) Model prediction capability optimization method, device, equipment and readable storage medium
CN114090643A (en) Recruitment information recommendation method, device, equipment and storage medium
CN115392787A (en) Enterprise risk assessment method, device, equipment, storage medium and program product
CN112597699A (en) Social network rumor source identification method integrated with objective weighting method
CN112733669A (en) Artificial intelligence AI (artificial intelligence) identification method, device, equipment and computer storage medium
CN112214675A (en) Method, device and equipment for determining user machine purchasing and computer storage medium
CN115359330A (en) Data processing method, device, equipment and storage medium
CN113239236B (en) Video processing method and device, electronic equipment and storage medium
CN109905340B (en) Feature optimization function selection method and device and electronic equipment
CN117273932A (en) Method, device, equipment, medium and product for processing remittance transaction
CN115391620A (en) Model operation method, device, equipment, storage medium and program product
CN116192525A (en) Equipment identification method and device, electronic equipment and readable storage medium
US20240054369A1 (en) Ai-based selection using cascaded model explanations
CN113627489A (en) Demand-based power consumption prediction method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination