CN109670976B - Feature factor determination method and device - Google Patents

Info

Publication number
CN109670976B
Authority
CN
China
Prior art keywords
auc
characteristic
value
factor
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811549933.0A
Other languages
Chinese (zh)
Other versions
CN109670976A (en
Inventor
崔蓝艺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taikang Insurance Group Co Ltd
Taikang Online Property Insurance Co Ltd
Original Assignee
Taikang Insurance Group Co Ltd
Taikang Online Property Insurance Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Taikang Insurance Group Co Ltd, Taikang Online Property Insurance Co Ltd filed Critical Taikang Insurance Group Co Ltd
Priority to CN201811549933.0A priority Critical patent/CN109670976B/en
Publication of CN109670976A publication Critical patent/CN109670976A/en
Application granted granted Critical
Publication of CN109670976B publication Critical patent/CN109670976B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 Insurance

Landscapes

  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Engineering & Computer Science (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment provides a method and a device for determining characteristic factors. The method comprises the following steps: obtaining a reference AUC value according to N candidate characteristic factors, wherein each candidate characteristic factor describes one type of risk control feature, and the type of the risk control feature comprises at least one of the following: an application feature, an underwriting feature, or a claim feature; obtaining an AUC critical value and an AUC fitting curve according to the importance of each candidate characteristic factor; and determining a target characteristic factor among the N candidate characteristic factors according to the reference AUC value, the AUC critical value and the AUC fitting curve. This solves the problem that screening characteristic factors with a single-round cutting method cannot evaluate how many characteristic factors should enter the model, so that the number can only be estimated by a worker, which introduces deviation and prevents the optimal characteristic factors from being reasonably selected.

Description

Feature factor determination method and device
Technical Field
The invention relates to the technical field of insurance risk control, and in particular to a characteristic factor determination method and device.
Background
At present, fraud occurs in accident insurance and health insurance. For example, some applicants enter falsely high income information on the application form in order to obtain a high insured amount, which raises strong suspicion of fraud. To counter such behavior, a comprehensive and systematic set of mathematical models fitted to the insurance business scenario can be built; combined with a business-scenario rule engine, these models screen false information along multiple dimensions and are applied to the underwriting rules to prevent fraud.
Current mathematical models are mainly optimized in three ways: optimization of the algorithm; optimization of the samples, that is, selecting a subset of good-quality samples; and optimization of the characteristic factors. Existing characteristic factor optimization mainly uses a single-round cutting method, whose principle is as follows: the effect of each characteristic factor on the model is evaluated, and the characteristic factors with little influence on the model are eliminated, thereby completing the attribute reduction of the sample data corresponding to the characteristic factors, that is, selecting a more reasonable set of characteristic factors to input into the model.
However, screening characteristic factors with the single-round cutting method cannot evaluate how many characteristic factors should be input into the model. The number can only be estimated by a worker, so deviation exists and the optimal characteristic factors cannot be reasonably selected.
Disclosure of Invention
The embodiment of the invention provides a method and a device for determining characteristic factors, which solve the problem that screening characteristic factors with a single-round cutting method cannot evaluate how many characteristic factors should enter the model, so that the number can only be estimated by a worker, deviation exists, and the optimal characteristic factors cannot be reasonably selected.
In a first aspect, an embodiment of the present invention provides a method for determining a characteristic factor, including:
obtaining a reference AUC value according to N candidate characteristic factors, wherein each candidate characteristic factor describes one type of risk control feature, and the type of the risk control feature comprises at least one of the following: an application feature, an underwriting feature, or a claim feature;
acquiring an AUC critical value and an AUC fitting curve according to the importance of each candidate characteristic factor;
determining a target feature factor among the N candidate feature factors according to the reference AUC value, the AUC critical value, and the AUC fitting curve.
In one possible design, the obtaining an AUC critical value and an AUC fitting curve according to the importance of each candidate feature factor includes:
deleting the feature factor with the lowest importance from the N candidate feature factors, and inputting the remaining N-1 candidate feature factors into a training model to obtain an AUC value output by the training model;
deleting the feature factor with the lowest importance from the N-1 candidate feature factors, and inputting the remaining N-2 candidate feature factors into the training model to obtain an AUC value output by the training model;
repeatedly executing the operation of deleting the characteristic factor with the lowest importance until the characteristic factor with the lowest importance is deleted from the remaining 2 candidate characteristic factors, and inputting 1 candidate characteristic factor into the training model to obtain an AUC value output by the training model;
and obtaining the AUC critical value and the AUC fitting curve according to the obtained N-1 AUC values.
In one possible design, the obtaining the AUC critical value and the AUC fitting curve according to the obtained N-1 AUC values comprises:
taking the maximum of the N-1 AUC values as the AUC critical value;
and fitting the N-1 AUC values to obtain the AUC fitting curve.
In one possible design, the determining a target feature factor among the N candidate feature factors according to the reference AUC value, the AUC critical value, and the AUC fitting curve includes:
storing the candidate characteristic factors corresponding to the AUC critical value into a first characteristic factor group;
obtaining an AUC extreme value according to the reference AUC value and the AUC fitting curve, wherein the AUC extreme value is a maximum value or a minimum value which is larger than the reference AUC value and smaller than the AUC critical value;
and determining a target characteristic factor according to the first characteristic factor group and the candidate characteristic factors corresponding to the AUC extreme values.
In one possible design, the determining a target feature factor according to the first characteristic factor group and the candidate characteristic factors corresponding to the AUC extreme values includes:
for each AUC extreme value, obtaining M candidate characteristic factors corresponding to the AUC extreme value;
for each candidate characteristic factor in the M candidate characteristic factors, storing the candidate characteristic factor into the first characteristic factor group to obtain a second characteristic factor group;
inputting the characteristic factors in the second characteristic factor group into the training model to obtain a plurality of first AUC values;
determining a target characteristic factor from the plurality of first AUC values and the critical AUC value.
In one possible design, the determining a target feature factor from the plurality of first AUC values and the critical AUC value includes:
if there is both a first AUC value larger than the critical AUC value and a first AUC value smaller than the critical AUC value, taking the candidate characteristic factors corresponding to the first AUC values larger than the critical AUC value, together with the characteristic factors in the first characteristic factor group, as the target characteristic factors.
In one possible design, the determining a target feature factor from the plurality of first AUC values and the critical AUC value includes:
if all the first AUC values are larger than the critical AUC value, storing the M candidate characteristic factors into the first characteristic factor group to obtain a third characteristic factor group;
inputting the characteristic factors in the third characteristic factor group into the training model to obtain a second AUC value;
and determining a target characteristic factor according to the second AUC value and the critical AUC value.
In a second aspect, an embodiment of the present invention provides a feature factor determining device, including:
a reference AUC value obtaining module, configured to obtain a reference AUC value according to N candidate feature factors, where each candidate feature factor describes one type of risk control feature, and the type of the risk control feature includes at least one of the following: an application feature, an underwriting feature, or a claim feature;
an AUC critical value obtaining module, configured to obtain an AUC critical value and an AUC fitting curve according to the importance of each candidate feature factor;
and a target characteristic factor determination module, configured to determine a target characteristic factor among the N candidate characteristic factors according to the reference AUC value, the AUC critical value, and the AUC fitted curve.
In one possible design, the AUC critical value obtaining module is specifically configured to:
deleting the feature factor with the lowest importance from the N candidate feature factors, and inputting the remaining N-1 candidate feature factors into a training model to obtain an AUC value output by the training model;
deleting the feature factor with the lowest importance from the N-1 candidate feature factors, and inputting the remaining N-2 candidate feature factors into the training model to obtain an AUC value output by the training model;
repeatedly executing the operation of deleting the characteristic factor with the lowest importance until the characteristic factor with the lowest importance is deleted from the remaining 2 candidate characteristic factors, and inputting 1 candidate characteristic factor into the training model to obtain an AUC value output by the training model;
and obtaining the AUC critical value and the AUC fitting curve according to the obtained N-1 AUC values.
In one possible design, the AUC critical value obtaining module is further specifically configured to:
taking the maximum of the N-1 AUC values as the AUC critical value;
and fitting the N-1 AUC values to obtain the AUC fitting curve.
In one possible design, the target feature factor determination module is specifically configured to:
storing the candidate characteristic factors corresponding to the AUC critical value into a first characteristic factor group;
obtaining an AUC extreme value according to the reference AUC value and the AUC fitting curve, wherein the AUC extreme value is a maximum value or a minimum value which is larger than the reference AUC value and smaller than the AUC critical value;
and determining a target characteristic factor according to the first characteristic factor group and the candidate characteristic factors corresponding to the AUC extreme values.
In one possible design, the target feature factor determination module is further specifically configured to:
for each AUC extreme value, obtaining M candidate characteristic factors corresponding to the AUC extreme value;
for each candidate characteristic factor in the M candidate characteristic factors, storing the candidate characteristic factor into the first characteristic factor group to obtain a second characteristic factor group;
inputting the characteristic factors in the second characteristic factor group into the training model to obtain a plurality of first AUC values;
determining a target characteristic factor from the plurality of first AUC values and the critical AUC value.
In one possible design, the target feature factor determination module is further specifically configured to:
if there is both a first AUC value larger than the critical AUC value and a first AUC value smaller than the critical AUC value, taking the candidate characteristic factors corresponding to the first AUC values larger than the critical AUC value, together with the characteristic factors in the first characteristic factor group, as the target characteristic factors.
In one possible design, the target feature factor determination module is further specifically configured to:
if all the first AUC values are larger than the critical AUC value, storing the M candidate characteristic factors into the first characteristic factor group to obtain a third characteristic factor group;
inputting the characteristic factors in the third characteristic factor group into the training model to obtain a second AUC value;
and determining a target characteristic factor according to the second AUC value and the critical AUC value.
In a third aspect, an embodiment of the present invention provides a device for determining a feature factor, including: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executing the computer-executable instructions stored by the memory causes the at least one processor to perform the method of determining a characteristic factor as set forth in any one of the first aspects.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where computer-executable instructions are stored, and when a processor executes the computer-executable instructions, the method for determining a feature factor according to any one of the first aspect is implemented.
According to the feature factor determining method and device provided by the embodiment, a reference AUC (area under the characteristic curve) value is obtained according to N candidate feature factors, where each candidate feature factor describes one type of risk control feature, and the type of the risk control feature includes at least one of the following: an application feature, an underwriting feature, or a claim feature; an AUC critical value and an AUC fitting curve are obtained according to the importance of each candidate characteristic factor; and a target characteristic factor is determined among the N candidate characteristic factors according to the reference AUC value, the AUC critical value and the AUC fitting curve. This solves the problem that screening characteristic factors with a single-round cutting method cannot evaluate how many characteristic factors should enter the model, so that the number can only be estimated by a worker, which introduces deviation and prevents the optimal characteristic factors from being reasonably selected.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a first schematic flow chart of the characteristic factor determination method according to an embodiment of the present invention;
Fig. 2 is a second schematic flow chart of the characteristic factor determination method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the characteristic factor deleting process according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of the AUC fitting curve according to an embodiment of the present invention;
Fig. 5 is a third schematic flow chart of the characteristic factor determination method according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of the characteristic factor determining device according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of the hardware structure of the characteristic factor determining device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a first schematic flow chart of the method for determining a feature factor according to an embodiment of the present invention. As shown in Fig. 1, the method includes:
S101, obtaining a reference AUC value according to N candidate characteristic factors, wherein each candidate characteristic factor describes one type of risk control feature, and the type of the risk control feature comprises at least one of the following: an application feature, an underwriting feature, or a claim feature.
In a specific implementation process, N candidate feature factors are obtained from sample data, where N is an integer greater than or equal to 2, and an algorithm model that outputs the area under the characteristic curve (AUC) value is determined.
For example, each candidate feature factor describes one type of risk control feature. The type of the risk control feature comprises at least one of the following: an application feature, an underwriting feature, or a claim feature. The application features are derived from application data, such as the applicant, the actual identity of the insured person and the property condition; the claim features are derived from claim data, such as the claim amount of a claim event, the claim (or incident) frequency, or the number of persons involved in the claim (or incident); and the underwriting features are derived from underwriting data, such as the underwriting time and the underwriting amount. Thus, the N candidate feature factors may be, for example, the applicant, the actual identity of the insured person, the claim amount, the number of persons involved in the claim (or incident), and so on.
Specifically, the N candidate feature factors may be input into a training model, and the area under the reference characteristic curve, that is, the reference AUC value, is obtained through the training model. The training model may be, for example, a model such as a gradient boosting decision tree (GBDT), logistic regression (LR), random forest (RF), or support vector machine (SVM); this embodiment is not limited in this respect.
AUC is defined as the area under the ROC curve enclosed with the coordinate axes, so its value cannot exceed 1. Since the ROC curve generally lies above the line y = x, the AUC ranges between 0.5 and 1. ROC stands for receiver operating characteristic curve, also called the sensitivity curve. It is a graph with the false positive rate on the horizontal axis and the true positive rate on the vertical axis, drawn from the different results obtained under a given stimulus condition when different decision criteria are applied.
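As an illustration of the AUC definition above, the following is a minimal Python sketch, not part of the original disclosure. It relies on the standard equivalence between the area under the ROC curve and the fraction of (positive, negative) sample pairs in which the positive sample receives the higher score. The function name `auc_from_scores` and the toy labels and scores are assumptions made only for this example.

```python
def auc_from_scores(labels, scores):
    """Return the ROC AUC as the fraction of (positive, negative) pairs
    in which the positive sample is scored higher; ties count as 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 marks a risky customer, 0 a normal one.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.3, 0.6, 0.7, 0.8]
print(auc_from_scores(labels, scores))
```

A model that ranks every positive sample above every negative sample yields 1.0, and a model that gives all samples the same score yields 0.5, which matches the range stated above.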
Taking the payout analysis of insurance products as an example, the underwriting and claim policy data of hospitalization insurance products are screened using the indexes of the user portrait system, a part of the data is extracted, 11 thousand records are obtained after data preprocessing, and characteristic factors of 30 dimensions are selected. The characteristic factors are independent at the business level but are not fully decoupled at the mathematical level; that is, the characteristic factors are still redundant and strongly correlated.
Specifically, in this embodiment, the N candidate feature factors are input into the GBDT (gradient boosting decision tree) model, which outputs a characteristic curve (ROC curve). The area enclosed by the ROC curve and the coordinate axes is the reference AUC, denoted AUC_total, and is used in predicting risk customers. The higher the AUC value, the more accurate the prediction and the lower the risk.
S102, acquiring an AUC critical value and an AUC fitting curve according to the importance of each candidate characteristic factor.
Optionally, the N candidate feature factors are ranked by importance from low to high. The specific ranking method is as follows:
the N candidate characteristic factors are ranked by importance from low to high based on the Gini index, and after ranking, the importance of the characteristic factors is standardized according to the following formula:
Importance=C*feature_importance/Max(feature_importance)
wherein Importance is the standardized feature factor importance, C is a constant, feature_importance is the feature factor importance, and Max(feature_importance) is the maximum value of the feature factor importance. C can be set according to actual conditions.
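The standardization formula above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `normalize_importance`, the choice C = 100 and the raw importance values are assumptions for the example and are not taken from the original text.

```python
def normalize_importance(raw_importance, c=100.0):
    """Apply Importance = C * feature_importance / Max(feature_importance)
    to a mapping of feature name -> raw importance."""
    max_value = max(raw_importance.values())
    if max_value <= 0:
        raise ValueError("the largest raw importance must be positive")
    return {name: c * value / max_value for name, value in raw_importance.items()}


# Hypothetical raw importances (for example, Gini-based values from a tree model).
raw = {"claim_amount": 0.40, "underwriting_time": 0.10, "applicant_income": 0.25}
normalized = normalize_importance(raw, c=100.0)

# Rank the factors from least to most important, as the ranking step requires.
ranked_low_to_high = sorted(normalized, key=normalized.get)
print(normalized)
print(ranked_low_to_high)
```

After standardization the most important factor always has the value C, so importances obtained from different models or data sets become comparable on the same scale.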
After the N characteristic factors are sorted by importance, they are screened by elimination according to importance. The characteristic factors remaining after each round of screening are input into the training model, so that a plurality of AUC values are finally obtained; the maximum of these AUC values is taken as the AUC critical value and recorded as AUC_max. The plurality of AUC values are fitted into a curve to obtain the AUC fitting curve.
S103, determining a target characteristic factor in the N candidate characteristic factors according to the reference AUC value, the AUC critical value and the AUC fitting curve.
Specifically, the extreme points on the AUC fitting curve whose AUC value is greater than AUC_total are found and recorded as AUC_j, and the target feature factor is determined from the candidate feature factors corresponding to the AUC critical value and the candidate feature factors corresponding to AUC_j among the N candidate feature factors.
In the feature factor determining method provided in this embodiment, a reference AUC (area under the characteristic curve) value is obtained according to N candidate feature factors, where each candidate feature factor describes one type of risk control feature, and the type of the risk control feature includes at least one of the following: an application feature, an underwriting feature, or a claim feature; an AUC critical value and an AUC fitting curve are obtained according to the importance of each candidate characteristic factor; and a target characteristic factor is determined among the N candidate characteristic factors according to the reference AUC value, the AUC critical value and the AUC fitting curve. This solves the problem that screening characteristic factors with a single-round cutting method cannot evaluate how many characteristic factors should enter the model, so that the number can only be estimated by a worker, which introduces deviation and prevents the optimal characteristic factors from being reasonably selected.
S102 in the embodiment of Fig. 1 is described in further detail below with reference to a specific embodiment. Fig. 2 is a second schematic flow chart of the method for determining a feature factor according to an embodiment of the present invention. As shown in Fig. 2, the method includes:
S201, deleting the feature factor with the lowest importance from the N candidate feature factors, and inputting the remaining N-1 candidate feature factors into a training model to obtain an AUC value output by the training model;
S202, deleting the feature factor with the lowest importance from the N-1 candidate feature factors, and inputting the remaining N-2 candidate feature factors into the training model to obtain an AUC value output by the training model;
S203, repeatedly executing the operation of deleting the feature factor with the lowest importance until 2 candidate feature factors remain, and inputting the 2 candidate feature factors into the training model to obtain the AUC value output by the training model, thereby obtaining N-1 AUC values;
Specifically, as shown in Fig. 3, in the first round, the feature factor with the lowest importance is deleted from the N candidate feature factors, and the N-1 candidate feature factors are input into the training model to obtain an AUC value output by the training model, which is recorded as AUC_1. In the second round, the feature factor with the lowest importance is deleted from the N-1 candidate feature factors, and the N-2 candidate feature factors are input into the training model to obtain an AUC value output by the training model, which is recorded as AUC_2. The operation of deleting the feature factor with the lowest importance is repeated until 2 candidate feature factors are left, and the 2 candidate feature factors are input into the training model to obtain an AUC value output by the training model, which is recorded as AUC_N-1. Fig. 3 is a schematic diagram of the characteristic factor deleting process according to an embodiment of the present invention.
S204, taking the maximum value of the N-1 AUC values as the AUC critical value;
S205, fitting the N-1 AUC values to obtain the AUC fitting curve.
Specifically, the maximum of the N-1 AUC values AUC_1, AUC_2, ..., AUC_N-1 is taken as the AUC critical value and recorded as AUC_max, and the N-1 AUC values AUC_1, AUC_2, ..., AUC_N-1 are fitted to obtain the AUC fitting curve, as shown in Fig. 4. Fig. 4 is a schematic diagram of the AUC fitting curve provided in this embodiment.
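The elimination rounds of S201 to S204 can be sketched as follows. This is a hedged illustration rather than the patented implementation: `backward_elimination` and `evaluate_auc` are hypothetical names, `evaluate_auc` stands in for training the model and reading its AUC, the importance ranking is assumed to be fixed across rounds, and the loop stops at one remaining factor (as in the first-aspect description) so that exactly N-1 AUC values are produced. The curve fitting of S205 is not shown.

```python
def backward_elimination(ranked_low_to_high, evaluate_auc):
    """Drop the least important factor each round and record the AUC.

    ranked_low_to_high: factor names sorted from least to most important.
    evaluate_auc: callable taking a list of factor names and returning an AUC
                  (a stand-in for training and scoring the model).
    Returns (history, best_subset, auc_max), where history holds N-1 entries.
    """
    remaining = list(ranked_low_to_high)
    history = []
    while len(remaining) > 1:
        remaining = remaining[1:]  # delete the factor with the lowest importance
        history.append((tuple(remaining), evaluate_auc(remaining)))
    best_subset, auc_max = max(history, key=lambda item: item[1])
    return history, list(best_subset), auc_max


# Hypothetical toy setup: 5 factors and made-up AUC values keyed by subset size.
toy_auc_by_size = {4: 0.80, 3: 0.84, 2: 0.82, 1: 0.70}
ranked = ["e", "d", "c", "b", "a"]  # "e" is the least important factor
history, first_group, auc_max = backward_elimination(
    ranked, lambda factors: toy_auc_by_size[len(factors)]
)
print(len(history), first_group, auc_max)
```

In this toy run `first_group` plays the role of the characteristic factors corresponding to the AUC critical value AUC_max.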
Specifically, in this embodiment, the 30 candidate feature factors are subjected to multiple rounds of screening to obtain 13 high-quality feature factors, and the AUC value obtained by inputting the 13 high-quality feature factors into the training model is AUC_max. Optionally, the recall and the precision can also be obtained by inputting the characteristic factors into the training model. Table 1 gives the AUC, recall and precision values obtained by inputting into the training model, respectively, the 30 candidate feature factors, the 13 feature factors obtained through single-round cutting, and the 13 high-quality feature factors obtained through the multiple rounds of screening in this embodiment.
TABLE 1
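[Table 1 is provided as an image in the original publication (Figure GDA0002866964030000091); its values are not reproduced here.]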
As can be seen from Table 1, the AUC value obtained by inputting the 13 high-quality feature factors from multiple rounds of screening into the training model is greater than both the AUC value obtained by inputting the 30 candidate feature factors and the AUC value obtained by inputting the 13 feature factors from single-round cutting.
In the feature factor determining method provided in this embodiment, the feature factor with the lowest importance is deleted from the N candidate feature factors, and the remaining N-1 candidate feature factors are input into the training model to obtain an AUC value output by the training model; the feature factor with the lowest importance is deleted from the N-1 candidate feature factors, and the N-2 candidate feature factors are input into the training model to obtain an AUC value output by the training model; the operation of deleting the feature factor with the lowest importance is repeated until 2 candidate feature factors remain, and the remaining 2 candidate feature factors are input into the training model to obtain the AUC value output by the training model, thereby obtaining N-1 AUC values; the maximum of the N-1 AUC values is taken as the AUC critical value; and the N-1 AUC values are fitted to obtain the AUC fitting curve, so that the high-quality characteristic factors are obtained according to the AUC critical value.
S103 in the embodiment of Fig. 1 is described in further detail below with reference to a specific embodiment. Fig. 5 is a third schematic flow chart of the method for determining a feature factor according to an embodiment of the present invention. As shown in Fig. 5, the method includes:
S501, storing the candidate characteristic factors corresponding to the AUC critical value into a first characteristic factor group;
Specifically, the first characteristic factor group includes the 13 high-quality characteristic factors a, b, c, ..., m corresponding to the AUC critical value.
S502, obtaining an AUC extreme value according to the reference AUC value and the AUC fitting curve, wherein the AUC extreme value is a maximum value or a minimum value which is larger than the reference AUC value and smaller than the AUC critical value;
Specifically, as shown in Fig. 3, points P and Q are AUC extreme points whose values are greater than the reference AUC value and less than the AUC critical value.
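Step S502 can be illustrated with a short sketch. As a simplifying assumption, the sketch looks for local maxima and minima directly in the discrete sequence of AUC values rather than on a fitted curve; the function name `interior_extrema` and its parameters are hypothetical.

```python
def interior_extrema(aucs, auc_reference, auc_critical):
    """Return the indices of local maxima or minima of `aucs` whose
    value lies strictly between the reference AUC value and the AUC
    critical value.

    aucs: AUC values in elimination order (a discrete stand-in for
          the AUC fitted curve, which is an assumption of this sketch)
    """
    found = []
    for i in range(1, len(aucs) - 1):
        prev, cur, nxt = aucs[i - 1], aucs[i], aucs[i + 1]
        is_max = cur > prev and cur > nxt
        is_min = cur < prev and cur < nxt
        if (is_max or is_min) and auc_reference < cur < auc_critical:
            found.append(i)
    return found
```

Each returned index identifies one extreme point, such as P or Q, and therefore one set of candidate characteristic factors to be examined in the following steps.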
S503, for each AUC extreme value, obtaining M candidate characteristic factors corresponding to the AUC extreme value;
S504, for each candidate characteristic factor in the M candidate characteristic factors, storing the candidate characteristic factor into the first characteristic factor group to obtain a second characteristic factor group;
S505, inputting the characteristic factors in the second characteristic factor group into the training model to obtain a plurality of first AUC values;
S506, if there are both a first AUC value greater than the AUC critical value and a first AUC value smaller than the AUC critical value, taking the candidate characteristic factor corresponding to the first AUC value greater than the AUC critical value, together with the characteristic factors in the first characteristic factor group, as the target characteristic factors;
Specifically, the AUC extreme value at point P is taken as an example. Two candidate characteristic factors, n and l, correspond to the AUC extreme value at point P. n and l are separately stored into the first characteristic factor group, giving two second characteristic factor groups, and the characteristic factors in each second characteristic factor group are input into the training model to obtain a first AUC value. If the first AUC value obtained by adding n to the first characteristic factor group is greater than the AUC critical value and the first AUC value obtained by adding l to the first characteristic factor group is smaller than the AUC critical value, the second characteristic factor group obtained by adding n to the first characteristic factor group is taken as the latest high-quality characteristic factors, namely the target characteristic factors. The target characteristic factors then include a, b, c, ..., m and n, 14 characteristic factors in total.
S507, if all the first AUC values are greater than the AUC critical value, storing the M candidate characteristic factors into the first characteristic factor group to obtain a third characteristic factor group;
S508, inputting the characteristic factors in the third characteristic factor group into the training model to obtain a second AUC value;
S509, determining a target characteristic factor according to the second AUC value and the AUC critical value.
Specifically, if the first AUC value obtained by adding n to the first characteristic factor group is greater than the AUC critical value, and the first AUC value obtained by adding l to the first characteristic factor group is also greater than the AUC critical value, then n and l are both stored into the first characteristic factor group to obtain a third characteristic factor group, and the characteristic factors in the third characteristic factor group are input into the training model to obtain a second AUC value. If the second AUC value is greater than the AUC critical value, the third characteristic factor group is taken as the target characteristic factors, namely the latest high-quality characteristic factors. If the second AUC value is smaller than the AUC critical value, n together with the first characteristic factor group is taken as the target characteristic factors.
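The branching in steps S504 to S509 can be sketched as follows. This is a hedged illustration rather than the definitive procedure: `evaluate` is a hypothetical callable that trains the model on a list of factors and returns its AUC. Two behaviours are assumptions not fixed by the text: when no candidate improves on the critical value the first group is returned unchanged, and when the combined third group fails the single candidate with the highest first AUC value is kept.

```python
from typing import Callable, List, Sequence


def determine_target(
    first_group: Sequence[str],
    candidates: Sequence[str],
    evaluate: Callable[[List[str]], float],
    auc_critical: float,
) -> List[str]:
    """Decide which candidates of one AUC extreme value join the target."""
    # S504/S505: add each candidate on its own and record the first AUC.
    first_aucs = {c: evaluate(list(first_group) + [c]) for c in candidates}
    better = [c for c in candidates if first_aucs[c] > auc_critical]

    if not better:
        # Assumption: nothing improves, so keep the first group as is.
        return list(first_group)

    if len(better) < len(candidates):
        # S506: mixed result, keep only the candidates that improve.
        return list(first_group) + better

    # S507/S508: every candidate improves, so try them all together.
    third_group = list(first_group) + list(candidates)
    second_auc = evaluate(third_group)
    if second_auc > auc_critical:
        return third_group          # S509: the combined group wins

    # Assumption: fall back to the single best-scoring candidate.
    best = max(better, key=first_aucs.__getitem__)
    return list(first_group) + [best]
```

In the worked example above, `first_group` would hold a to m, `candidates` would be n and l, and the three outcomes correspond to the mixed case, the case where the third group exceeds the critical value, and the case where it does not.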
Similarly, after determining whether the characteristic factors corresponding to the AUC extreme value at point P include a target characteristic factor, it is determined whether the characteristic factors corresponding to the AUC extreme value at point Q include a target characteristic factor. The specific process is as follows:
If the characteristic factors corresponding to the AUC extreme value at point P include a target characteristic factor, that is, the target characteristic factors are a, b, c, ..., m and n (14 characteristic factors in total), the characteristic factor corresponding to the AUC extreme value at point Q is merged into the target characteristic factors. For example, in this embodiment the characteristic factor corresponding to the AUC extreme value at point Q is o; o is merged into the target characteristic factors, the merged characteristic factors are input into the training model, and the output third AUC value is compared with AUC_max. If the third AUC value is greater than AUC_max, the characteristic factor o corresponding to the AUC extreme value at point Q is a target characteristic factor, and the new target characteristic factors are a, b, c, ..., m, n, o.
If the characteristic factors corresponding to the AUC extreme value at point P do not include a target characteristic factor, that is, the target characteristic factors are a, b, c, ..., m (13 characteristic factors in total), the characteristic factor o corresponding to the AUC extreme value at point Q is merged into the target characteristic factors, the merged characteristic factors are input into the training model, and the output third AUC value is compared with AUC_max. If the third AUC value is greater than AUC_max, the characteristic factor o corresponding to the AUC extreme value at point Q is a target characteristic factor, and the new target characteristic factors are a, b, c, ..., m, o.
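Both cases above apply the same check to point Q, whatever the outcome for point P. A minimal sketch of that check follows; the names `merge_if_improves` and `evaluate` are hypothetical, and leaving the target unchanged when the third AUC value does not exceed AUC_max is an assumption, since the text only describes the improving case.

```python
def merge_if_improves(target, extra_factors, evaluate, auc_max):
    """Merge the factors of a further extreme point into the current
    target factors and keep the merge only if the AUC rises above auc_max.

    evaluate: hypothetical callable, list of factors -> AUC
    """
    merged = list(target) + [f for f in extra_factors if f not in target]
    third_auc = evaluate(merged)
    if third_auc > auc_max:
        return merged
    # Assumption: otherwise the target factors stay as they were.
    return list(target)
```

Calling the function once per remaining extreme point, each time passing the latest target factors, reproduces the sequential treatment of P and then Q.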
Specifically, in this embodiment, the number of target feature factors is finally determined to be 14. Table 2 gives the AUC, recall ratio and table lookup ratio obtained by inputting into the training model the 30 candidate feature factors, the 13 high-quality feature factors, and the 14 target feature factors, respectively.
TABLE 2
Figure GDA0002866964030000121
As can be seen from Table 2, the AUC value obtained by inputting the 14 target feature factors into the training model is greater than both the AUC value obtained with the 30 candidate feature factors and the AUC value obtained with the 13 high-quality feature factors.
In summary, the method for determining characteristic factors provided by the embodiment of the invention effectively avoids selecting characteristic factors by mistake or selecting too many characteristic factors, and reduces the number of characteristic factors to an optimal state.
Fig. 6 is a schematic structural diagram of a characteristic factor determining device according to an embodiment of the present invention, and as shown in fig. 6, the characteristic factor determining device 60 includes: a reference AUC value acquisition module 601, an AUC threshold acquisition module 602, and a target feature factor determination module 603.
A reference AUC value obtaining module 601, configured to obtain a reference AUC value according to N candidate feature factors, where each candidate feature factor describes one type of risk control (wind control) feature, and the types of risk control feature include at least one of the following: an application feature, an underwriting feature, or a claim settlement feature;
an AUC threshold obtaining module 602, configured to obtain an AUC threshold and an AUC fitting curve according to the importance of each candidate feature factor;
a target feature factor determining module 603, configured to determine a target feature factor among the N candidate feature factors according to the reference AUC value, the AUC critical value, and the AUC fitted curve.
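The cooperation of the three modules can be pictured with a small structural sketch. The class and field names below are hypothetical and chosen only to mirror modules 601, 602 and 603; each module is modelled as an injected callable so that the sketch makes no claim about how any module is implemented internally.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class FeatureFactorDeterminer:
    # Module 601: candidate factors -> reference AUC value
    get_reference_auc: Callable[[Sequence[str]], float]
    # Module 602: candidate factors -> (AUC critical value, fitted curve points)
    get_critical_and_curve: Callable[[Sequence[str]], Tuple[float, List[float]]]
    # Module 603: (candidates, reference, critical, curve) -> target factors
    determine_target: Callable[
        [Sequence[str], float, float, List[float]], List[str]
    ]

    def run(self, candidates: Sequence[str]) -> List[str]:
        """Chain the three modules in the order described for device 60."""
        reference = self.get_reference_auc(candidates)
        critical, curve = self.get_critical_and_curve(candidates)
        return self.determine_target(candidates, reference, critical, curve)
```

Keeping the modules as separate callables reflects the point made later in the description that the module division is only a logical one and may be combined or split differently in practice.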
Optionally, the AUC threshold obtaining module 602 is specifically configured to:
deleting the feature factor with the lowest importance from the N candidate feature factors, and inputting the remaining N-1 candidate feature factors into a training model to obtain an AUC value output by the training model;
deleting the feature factor with the lowest importance from the N-1 candidate feature factors, and inputting the remaining N-2 candidate feature factors into the training model to obtain an AUC value output by the training model;
repeatedly executing the operation of deleting the characteristic factor with the lowest importance until the characteristic factor with the lowest importance is deleted from the remaining 2 candidate characteristic factors, and inputting 1 candidate characteristic factor into the training model to obtain an AUC value output by the training model;
and obtaining the AUC critical value and the AUC fitting curve according to the obtained N-1 AUC values.
The AUC threshold acquisition module 602 is further specifically configured to:
taking the maximum of the N-1 AUC values as the AUC critical value;
and fitting the N-1 AUC values to obtain the AUC fitting curve.
Optionally, the target characteristic factor determining module 603 is specifically configured to:
storing the candidate characteristic factors corresponding to the AUC critical value into a first characteristic factor group;
obtaining an AUC extreme value according to the reference AUC value and the AUC fitting curve, wherein the AUC extreme value is a maximum value or a minimum value which is larger than the reference AUC value and smaller than the AUC critical value;
and determining a target characteristic factor according to the first characteristic factor group and the candidate characteristic factors corresponding to the AUC extreme values.
Optionally, the target characteristic factor determining module 603 is further specifically configured to:
for each AUC extreme value, obtaining M candidate characteristic factors corresponding to the AUC extreme value;
for each candidate characteristic factor in the M candidate characteristic factors, storing the candidate characteristic factor into the first characteristic factor group to obtain a second characteristic factor group;
inputting the characteristic factors in the second characteristic factor group into the training model to obtain a plurality of first AUC values;
determining a target characteristic factor according to the plurality of first AUC values and the AUC critical value.
Optionally, the target characteristic factor determining module 603 is further specifically configured to:
if there are both a first AUC value greater than the AUC critical value and a first AUC value smaller than the AUC critical value, taking the candidate characteristic factor corresponding to the first AUC value greater than the AUC critical value, together with the characteristic factors in the first characteristic factor group, as the target characteristic factors.
Optionally, the target characteristic factor determining module 603 is further specifically configured to:
if all the first AUC values are greater than the AUC critical value, storing the M candidate characteristic factors into the first characteristic factor group to obtain a third characteristic factor group;
inputting the characteristic factors in the third characteristic factor group into the training model to obtain a second AUC value;
and determining a target characteristic factor according to the second AUC value and the AUC critical value.
The apparatus provided in this embodiment may be used to implement the technical solutions of the method embodiments shown in fig. 1 to fig. 5, and the implementation principles and technical effects are similar, which are not described herein again.
Fig. 7 is a schematic diagram of a hardware structure of the characteristic factor determining device according to the embodiment of the present invention. As illustrated in fig. 7, the characteristic factor determination device 70 provided in the present embodiment includes:
a processor 701, a memory 702; wherein
Memory 702 for storing computer-executable instructions.
A processor 701 for executing computer-executable instructions stored by the memory.
The processor 701 implements the various steps performed by the characteristic factor determination device in the above embodiments by executing computer-executable instructions stored by the memory. Reference may be made in particular to the description relating to the method embodiments described above.
Optionally, the memory 702 may be independent of or integrated with the processor 701, which is not particularly limited in this embodiment.
When the memory 702 is provided separately, the characteristic factor determining device further includes a bus 703 for connecting the memory 702 and the processor 701.
The embodiment of the present invention further provides a computer-readable storage medium in which computer-executable instructions are stored; when a processor executes the computer-executable instructions, the characteristic factor determination method described above is implemented.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, and for example, the division of the modules is only one logical division, and other divisions may be realized in practice, for example, a plurality of modules may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each module may exist alone physically, or two or more modules are integrated into one unit. The unit formed by the modules can be realized in a hardware form, and can also be realized in a form of hardware and a software functional unit.
The integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) or a processor (processor) to execute some steps of the methods according to the embodiments of the present application.
It should be understood that the Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor, or in a combination of the hardware and software modules within the processor.
The memory may comprise a high-speed RAM memory, and may further comprise a non-volatile storage NVM, such as at least one disk memory, and may also be a usb disk, a removable hard disk, a read-only memory, a magnetic or optical disk, etc.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the buses in the figures of the present application are not limited to only one bus or one type of bus.
The storage medium may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks. A storage medium may be any available medium that can be accessed by a general purpose or special purpose computer.
An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an Application Specific Integrated Circuit (ASIC). Of course, the processor and the storage medium may also reside as discrete components in an electronic device or host device.
Those of ordinary skill in the art will understand that all or a portion of the steps of the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the method embodiments described above; and the aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. A method for determining a feature factor, comprising:
acquiring a reference AUC value under a reference characteristic curve according to N candidate characteristic factors, wherein each candidate characteristic factor is respectively used for describing one type of risk control (wind control) characteristic, and the type of the risk control characteristic comprises at least one of the following types: an application characteristic, an underwriting characteristic or a claim settlement characteristic, wherein N is an integer greater than or equal to 2;
acquiring an AUC critical value and an AUC fitting curve according to the importance of each candidate characteristic factor;
determining a target characteristic factor among the N candidate characteristic factors according to the reference AUC value, the AUC critical value, and the AUC fitting curve;
wherein the determining a target characteristic factor among the N candidate characteristic factors according to the reference AUC value, the AUC critical value, and the AUC fitting curve comprises:
storing the candidate characteristic factors corresponding to the AUC critical value into a first characteristic factor group;
obtaining an AUC extreme value according to the reference AUC value and the AUC fitting curve, wherein the AUC extreme value is a maximum value or a minimum value which is larger than the reference AUC value and smaller than the AUC critical value;
for each AUC extreme value, obtaining M candidate characteristic factors corresponding to the AUC extreme value;
for each candidate characteristic factor in the M candidate characteristic factors, storing the candidate characteristic factor into the first characteristic factor group to obtain a second characteristic factor group;
inputting the characteristic factors in the second characteristic factor group into a training model to obtain a plurality of first AUC values;
determining a target characteristic factor according to the plurality of first AUC values and the AUC critical value;
wherein the acquiring a reference AUC value under a reference characteristic curve according to the N candidate characteristic factors comprises:
inputting the N candidate characteristic factors into a training model, and acquiring a reference AUC value under a reference characteristic curve, wherein the reference AUC value is the area enclosed by the reference characteristic curve and the coordinate axis, and the AUC is defined as the area enclosed by the ROC curve and the coordinate axis.
2. The method of claim 1, wherein the acquiring an AUC critical value and an AUC fitting curve according to the importance of each candidate characteristic factor comprises:
deleting the feature factor with the lowest importance from the N candidate feature factors, and inputting the remaining N-1 candidate feature factors into a training model to obtain an AUC value output by the training model;
deleting the feature factor with the lowest importance from the N-1 candidate feature factors, and inputting the remaining N-2 candidate feature factors into the training model to obtain an AUC value output by the training model;
repeatedly executing the operation of deleting the characteristic factor with the lowest importance until the characteristic factor with the lowest importance is deleted from the remaining 2 candidate characteristic factors, and inputting the remaining 1 candidate characteristic factor into the training model to obtain an AUC value output by the training model;
and obtaining the AUC critical value and the AUC fitting curve according to the obtained N-1 AUC values.
3. The method of claim 2, wherein the obtaining the AUC critical value and the AUC fitting curve according to the obtained N-1 AUC values comprises:
taking the maximum of the N-1 AUC values as the AUC critical value;
and fitting the N-1 AUC values to obtain the AUC fitting curve.
4. The method of claim 1, wherein the determining a target characteristic factor according to the plurality of first AUC values and the AUC critical value comprises:
and if a first AUC value larger than the AUC critical value and a first AUC value smaller than the AUC critical value exist, taking the candidate characteristic factor corresponding to the first AUC value larger than the AUC critical value and the characteristic factor in the first characteristic factor group as the target characteristic factor.
5. The method of claim 1, wherein the determining a target characteristic factor according to the plurality of first AUC values and the AUC critical value comprises:
if all the first AUC values are larger than the AUC critical value, storing the M candidate characteristic factors into the first characteristic factor group to obtain a third characteristic factor group;
inputting the characteristic factors in the third characteristic factor group into the training model to obtain a second AUC value;
and determining a target characteristic factor according to the second AUC value and the AUC critical value.
6. A characteristic factor determination device characterized by comprising:
a reference AUC value obtaining module, configured to obtain a reference AUC value under a reference characteristic curve according to N candidate characteristic factors, where each candidate characteristic factor is respectively used to describe one type of risk control (wind control) characteristic, and the type of the risk control characteristic includes at least one of the following types: an application characteristic, an underwriting characteristic or a claim settlement characteristic, wherein N is an integer greater than or equal to 2;
an AUC critical value obtaining module, configured to obtain an AUC critical value and an AUC fitting curve according to the importance of each candidate characteristic factor;
a target characteristic factor determination module, configured to determine a target characteristic factor among the N candidate characteristic factors according to the reference AUC value, the AUC critical value, and the AUC fitting curve;
the target characteristic factor determination module is specifically configured to store the candidate characteristic factor corresponding to the AUC critical value into a first characteristic factor group;
obtaining an AUC extreme value according to the reference AUC value and the AUC fitting curve, wherein the AUC extreme value is a maximum value or a minimum value which is larger than the reference AUC value and smaller than the AUC critical value;
for each AUC extreme value, obtaining M candidate characteristic factors corresponding to the AUC extreme value;
for each candidate characteristic factor in the M candidate characteristic factors, storing the candidate characteristic factor into the first characteristic factor group to obtain a second characteristic factor group;
inputting the characteristic factors in the second characteristic factor group into a training model to obtain a plurality of first AUC values;
determining a target characteristic factor according to the plurality of first AUC values and the AUC critical value;
the reference AUC value obtaining module is specifically configured to input the N candidate characteristic factors into a training model, and obtain a reference AUC value under a reference characteristic curve, where the reference AUC value is the area enclosed by the reference characteristic curve and the coordinate axis, and the AUC is defined as the area enclosed by the ROC curve and the coordinate axis.
7. A characteristic factor determination device characterized by comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executing computer-executable instructions stored by the memory causes the at least one processor to perform a method of determining a characteristic factor as claimed in any one of claims 1 to 5.
8. A computer-readable storage medium having stored thereon computer-executable instructions which, when executed by a processor, implement the feature factor determination method of any one of claims 1 to 5.
CN201811549933.0A 2018-12-18 2018-12-18 Feature factor determination method and device Active CN109670976B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811549933.0A CN109670976B (en) 2018-12-18 2018-12-18 Feature factor determination method and device


Publications (2)

Publication Number Publication Date
CN109670976A CN109670976A (en) 2019-04-23
CN109670976B true CN109670976B (en) 2021-02-26

Family

ID=66143956

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811549933.0A Active CN109670976B (en) 2018-12-18 2018-12-18 Feature factor determination method and device

Country Status (1)

Country Link
CN (1) CN109670976B (en)




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: Floor 36, Zheshang Building, No. 718 Jianshe Avenue, Jiang'an District, Wuhan, Hubei 430019

Patentee after: TK.CN INSURANCE Co.,Ltd.

Patentee after: TAIKANG INSURANCE GROUP Co.,Ltd.

Address before: Taikang Life Building, 156 fuxingmennei street, Xicheng District, Beijing 100031

Patentee before: TAIKANG INSURANCE GROUP Co.,Ltd.

Patentee before: TK.CN INSURANCE Co.,Ltd.