CN109670976A - Feature factor determination method and device - Google Patents
- Publication number
- CN109670976A CN201811549933.0A
- Authority
- CN
- China
- Prior art keywords
- auc
- factor
- value
- feature
- candidate feature
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/08—Insurance
Abstract
This embodiment provides a feature factor determination method and device. The method comprises: first, obtaining a baseline AUC value according to N candidate feature factors, where each candidate feature factor describes one type of risk-control feature, and the types of risk-control feature include at least one of the following: insurance-application features, underwriting features, or claims-settlement features; then obtaining an AUC critical value and an AUC fitted curve according to the importance of each candidate feature factor; and then determining the target feature factors among the N candidate feature factors according to the baseline AUC value, the AUC critical value, and the AUC fitted curve. This overcomes the problem that screening feature factors with the single-round jackknife method cannot assess the number of feature factors entering the model, so that this number can only be estimated by staff themselves, which introduces bias and prevents the best feature factors from being selected reasonably.
Description
Technical field
The present invention relates to the technical field of insurance risk control, and in particular to a feature factor determination method and device.
Background
At present, some fraud exists in accident insurance and health insurance. For example, some applicants enter falsely inflated income information on the application form in order to obtain a high insured amount, which raises strong suspicion of fraud. To counter such behavior, a comprehensive, systematic mathematical model that fits the business scenario can be built for the insurance business and combined with a business-scenario rule engine to screen false information along multiple dimensions; the result is applied in the underwriting rules to prevent fraud.
At present there are roughly three ways to optimize a mathematical model: optimizing the algorithm, optimizing the samples (screening out a portion of good samples), and optimizing the feature factors. The main existing method of feature factor optimization is the single-round jackknife method. Its principle is to assess the effect of each feature factor on the model so as to exclude the feature factors that have little influence on the model; this completes the processing of the sample data corresponding to the feature factors and screens out more reasonable feature factors to input into the model.
However, screening feature factors with the single-round jackknife method cannot assess how many feature factors should be input into the model. That number can only be estimated by staff themselves, which introduces bias, so the best feature factors cannot be selected reasonably.
Summary of the invention
Embodiments of the present invention provide a feature factor determination method and device that overcome the problem that screening feature factors with the single-round jackknife method cannot assess the number of feature factors entering the model, so that this number can only be estimated by staff themselves, which introduces bias and prevents the best feature factors from being selected reasonably.
In a first aspect, an embodiment of the present invention provides a feature factor determination method, comprising:
obtaining a baseline AUC value according to N candidate feature factors, where each candidate feature factor describes one type of risk-control feature, and the types of risk-control feature include at least one of the following: insurance-application features, underwriting features, or claims-settlement features;
obtaining an AUC critical value and an AUC fitted curve according to the importance of each candidate feature factor;
determining the target feature factors among the N candidate feature factors according to the baseline AUC value, the AUC critical value, and the AUC fitted curve.
In a possible design, obtaining the AUC critical value and the AUC fitted curve according to the importance of each candidate feature factor comprises:
deleting the least important feature factor from the N candidate feature factors, inputting the remaining N-1 candidate feature factors into a training model, and obtaining the AUC value output by the training model;
deleting the least important feature factor from the N-1 candidate feature factors, inputting the remaining N-2 candidate feature factors into the training model, and obtaining the AUC value output by the training model;
repeating the operation of deleting the least important feature factor until the least important feature factor has been deleted from the remaining 2 candidate feature factors, inputting the 1 remaining candidate feature factor into the training model, and obtaining the AUC value output by the training model;
obtaining the AUC critical value and the AUC fitted curve according to the N-1 AUC values obtained.
In a possible design, obtaining the AUC critical value and the AUC fitted curve according to the N-1 AUC values obtained comprises:
taking the maximum of the N-1 AUC values as the AUC critical value;
fitting the N-1 AUC values to obtain the AUC fitted curve.
In a possible design, determining the target feature factors among the N candidate feature factors according to the baseline AUC value, the AUC critical value, and the AUC fitted curve comprises:
storing the candidate feature factors corresponding to the AUC critical value in a first feature factor group;
obtaining AUC extrema according to the baseline AUC value and the AUC fitted curve, where each AUC extremum is a local maximum or local minimum that is greater than the baseline AUC value and less than the AUC critical value;
determining the target feature factors according to the first feature factor group and the candidate feature factors corresponding to each AUC extremum.
In a possible design, determining the target feature factors according to the first feature factor group and the candidate feature factors corresponding to each AUC extremum comprises:
for each AUC extremum, obtaining the M candidate feature factors corresponding to that AUC extremum;
for each of the M candidate feature factors, storing the candidate feature factor in the first feature factor group to obtain a second feature factor group;
inputting the feature factors in the second feature factor group into the training model to obtain multiple first AUC values;
determining the target feature factors according to the multiple first AUC values and the critical AUC value.
In a possible design, determining the target feature factors according to the multiple first AUC values and the critical AUC value comprises:
if there is a first AUC value greater than the critical AUC value and also a first AUC value less than the critical AUC value, taking the candidate feature factors corresponding to the first AUC values greater than the critical AUC value, together with the feature factors in the first feature factor group, as the target feature factors.
In a possible design, determining the target feature factors according to the multiple first AUC values and the critical AUC value comprises:
if all the first AUC values are greater than the critical AUC value, storing the M candidate feature factors in the first feature factor group to obtain a third feature factor group;
inputting the feature factors in the third feature factor group into the training model to obtain a second AUC value;
determining the target feature factors according to the second AUC value and the critical AUC value.
In a second aspect, an embodiment of the present invention provides a feature factor determination device, comprising:
a baseline AUC value obtaining module, configured to obtain a baseline AUC value according to N candidate feature factors, where each candidate feature factor describes one type of risk-control feature, and the types of risk-control feature include at least one of the following: insurance-application features, underwriting features, or claims-settlement features;
an AUC critical value obtaining module, configured to obtain an AUC critical value and an AUC fitted curve according to the importance of each candidate feature factor;
a target feature factor determining module, configured to determine the target feature factors among the N candidate feature factors according to the baseline AUC value, the AUC critical value, and the AUC fitted curve.
In a possible design, the AUC critical value obtaining module is specifically configured to:
delete the least important feature factor from the N candidate feature factors, input the remaining N-1 candidate feature factors into a training model, and obtain the AUC value output by the training model;
delete the least important feature factor from the N-1 candidate feature factors, input the remaining N-2 candidate feature factors into the training model, and obtain the AUC value output by the training model;
repeat the operation of deleting the least important feature factor until the least important feature factor has been deleted from the remaining 2 candidate feature factors, input the 1 remaining candidate feature factor into the training model, and obtain the AUC value output by the training model;
obtain the AUC critical value and the AUC fitted curve according to the N-1 AUC values obtained.
In a possible design, the AUC critical value obtaining module is further specifically configured to:
take the maximum of the N-1 AUC values as the AUC critical value;
fit the N-1 AUC values to obtain the AUC fitted curve.
In a possible design, the target feature factor determining module is specifically configured to:
store the candidate feature factors corresponding to the AUC critical value in a first feature factor group;
obtain AUC extrema according to the baseline AUC value and the AUC fitted curve, where each AUC extremum is a local maximum or local minimum that is greater than the baseline AUC value and less than the AUC critical value;
determine the target feature factors according to the first feature factor group and the candidate feature factors corresponding to each AUC extremum.
In a possible design, the target feature factor determining module is further specifically configured to:
for each AUC extremum, obtain the M candidate feature factors corresponding to that AUC extremum;
for each of the M candidate feature factors, store the candidate feature factor in the first feature factor group to obtain a second feature factor group;
input the feature factors in the second feature factor group into the training model to obtain multiple first AUC values;
determine the target feature factors according to the multiple first AUC values and the critical AUC value.
In a possible design, the target feature factor determining module is further specifically configured to:
if there is a first AUC value greater than the critical AUC value and also a first AUC value less than the critical AUC value, take the candidate feature factors corresponding to the first AUC values greater than the critical AUC value, together with the feature factors in the first feature factor group, as the target feature factors.
In a possible design, the target feature factor determining module is further specifically configured to:
if all the first AUC values are greater than the critical AUC value, store the M candidate feature factors in the first feature factor group to obtain a third feature factor group;
input the feature factors in the third feature factor group into the training model to obtain a second AUC value;
determine the target feature factors according to the second AUC value and the critical AUC value.
In a third aspect, an embodiment of the present invention provides a feature factor determination device, comprising: at least one processor and a memory;
the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored in the memory, so that the at least one processor performs the feature factor determination method according to any one of the designs of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing computer-executable instructions; when a processor executes the computer-executable instructions, the feature factor determination method according to any one of the designs of the first aspect is implemented.
In the feature factor determination method and device provided by this embodiment, a baseline area under the curve (AUC) value is first obtained according to N candidate feature factors, where each candidate feature factor describes one type of risk-control feature, and the types of risk-control feature include at least one of the following: insurance-application features, underwriting features, or claims-settlement features; an AUC critical value and an AUC fitted curve are then obtained according to the importance of each candidate feature factor; and the target feature factors are then determined among the N candidate feature factors according to the baseline AUC value, the AUC critical value, and the AUC fitted curve. This overcomes the problem that screening feature factors with the single-round jackknife method cannot assess the number of feature factors entering the model, so that this number can only be estimated by staff themselves, which introduces bias and prevents the best feature factors from being selected reasonably.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is the first schematic flowchart of the feature factor determination method provided by an embodiment of the present invention;
Fig. 2 is the second schematic flowchart of the feature factor determination method provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of the feature factor deletion process provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of the AUC fitted curve provided by this embodiment;
Fig. 5 is the third schematic flowchart of the feature factor determination method provided by an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of the feature factor determination device provided by an embodiment of the present invention;
Fig. 7 is a schematic hardware structure diagram of the feature factor determination device provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Fig. 1 is the first schematic flowchart of the feature factor determination method provided by an embodiment of the present invention. As shown in the figure, the method includes:
S101, obtain a baseline AUC value according to N candidate feature factors, where each candidate feature factor describes one type of risk-control feature, and the types of risk-control feature include at least one of the following: insurance-application features, underwriting features, or claims-settlement features.
In a specific implementation, N candidate feature factors are first obtained from the sample data, where N is an integer greater than or equal to 2, and an algorithm model for outputting the area under the curve (AUC) value is determined.
For example, each candidate feature factor describes one type of risk-control feature. The types of risk-control feature include at least one of the following: insurance-application features, underwriting features, or claims-settlement features. Insurance-application features can be obtained from insurance-application data, which includes, for example, the true identity and financial situation of the applicant and the insured. Claims-settlement features can be obtained from claims data, which includes, for example, the settled loss amount of a claim event, the claim (or incident) frequency, and the number of claim (or incident) person-times. Underwriting features can be obtained from underwriting data, for example the underwriting time and the underwritten amount. The N candidate feature factors may thus be, for example, the true identity of the applicant and the insured, the settled loss amount, the number of claim (or incident) person-times, and so on.
Specifically, the N candidate feature factors can be input into a training model, which yields the baseline area under the curve, that is, the baseline AUC value. The training model may be, for example, a gradient boosting decision tree (Gradient Boosting Decision Tree, GBDT), logistic regression (Logistic Regression, LR), random forest (Random Forest, RF), or support vector machine (Support Vector Machine, SVM) model; this embodiment places no restriction on it.
Here, AUC is defined as the area enclosed by the ROC curve and the coordinate axes, so its value cannot exceed 1. Because the ROC curve generally lies above the line y = x, the AUC value ranges between 0.5 and 1. The ROC curve (receiver operating characteristic curve), also known as the sensitivity curve, is plotted with the false positive rate on the horizontal axis and the true positive rate on the vertical axis; it is the curve drawn from the different results that a subject obtains under a given stimulus condition when different decision criteria are applied.
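As an illustration of the AUC definition above, the following minimal sketch computes AUC with the pairwise ranking formulation (the fraction of positive/negative sample pairs that the scores order correctly), which is equivalent to the area under the ROC curve. The function name `auc_score` and the toy labels and scores are assumptions made for this example; they do not come from the patent.

```python
def auc_score(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    labels: 0/1 ground-truth values (1 = positive, e.g. a risky policy).
    scores: model scores, higher meaning "more likely positive".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")

    # Count positive/negative pairs that are ranked correctly; ties count as half.
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical): two negatives and two positives.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
example_auc = auc_score(labels, scores)  # 3 of 4 pairs are ordered correctly
```

The double loop is quadratic in the number of samples, which is fine for illustration; for data on the scale described in this embodiment, a rank-based computation or a library routine would be the more practical choice.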
Take loss ratio analysis of insurance products as an example. Underwriting data and settled-claim policy data for hospitalization insurance products are screened using the indicators of the user profiling system, and part of the data is extracted. After data preprocessing, 110,000 records are obtained and feature factors of 30 dimensions are screened out. These feature factors remain independent at the business level but are not fully decoupled at the mathematical level; redundant and strongly correlated feature factors still exist.
Specifically, in this embodiment, the N candidate feature factors are fed into the GBDT (gradient boosting decision tree) model, which outputs the ROC curve (receiver operating characteristic curve). The area enclosed by the ROC curve and the coordinate axes is the baseline area under the curve, denoted AUC_total, which is used to predict risky customers. The higher the AUC value, the more accurate the prediction and the lower the risk.
S102, obtain an AUC critical value and an AUC fitted curve according to the importance of each candidate feature factor.
Optionally, the N candidate feature factors are first sorted by importance from low to high. The specific sorting method is: rank the importance of the N candidate feature factors from low to high based on the Gini index and, after ranking, normalize the importance of the feature factors using the following formula:
Importance = C * feature_importance / Max(feature_importance)
where Importance is the normalized feature factor importance, C is a constant, feature_importance is the feature factor importance, and Max(feature_importance) is the maximum feature factor importance. C can be set according to the actual situation.
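The normalization formula can be sketched as follows. This is a minimal illustration, assuming the raw importances are already available as a mapping from feature name to a non-negative number; the function names, the feature names, and the choice C = 100 are hypothetical.

```python
def normalize_importance(raw_importance, c=100.0):
    """Importance = C * feature_importance / Max(feature_importance)."""
    max_value = max(raw_importance.values())
    if max_value <= 0:
        raise ValueError("at least one feature factor must have positive importance")
    return {name: c * value / max_value for name, value in raw_importance.items()}


def rank_low_to_high(importance):
    """Return the feature names ordered from least to most important."""
    return sorted(importance, key=importance.get)


# Hypothetical raw importances, e.g. Gini-based values from a tree model.
raw = {"insured_amount": 0.40, "claim_count": 0.10, "underwriting_age": 0.25}
normalized = normalize_importance(raw, c=100.0)
ranking = rank_low_to_high(normalized)
```

Because the scaling is a positive constant factor, normalization does not change the ranking; it only puts the importances on a fixed scale whose maximum equals C.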
After the N feature factors are sorted by importance, they are screened over multiple rounds according to importance, and the feature factors remaining after each round are fed into the training model, finally yielding multiple AUC values. The maximum of these AUC values is taken as the AUC critical value, denoted AUC_max. The AUC values are then fitted to a curve to obtain the AUC fitted curve.
S103, determine the target feature factors among the N candidate feature factors according to the baseline AUC value, the AUC critical value, and the AUC fitted curve.
Specifically, the extreme points on the AUC fitted curve whose AUC value is greater than AUC_total are found and denoted AUC_j; among the N candidate feature factors, the target feature factors are determined from the candidate feature factors corresponding to the AUC critical value and the candidate feature factors corresponding to AUC_j.
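The search for extreme points can be sketched as below. The patent does not specify how the curve is fitted or how extrema are located, so this example makes a simplifying assumption: the fitted curve is sampled once per screening round, and an extremum is any interior sample that is strictly higher or strictly lower than both neighbours. The function name and the sample values are hypothetical.

```python
def interior_extrema(auc_values, auc_total, auc_max):
    """Return (index, value) pairs for local maxima/minima that lie strictly
    between the baseline AUC (auc_total) and the critical AUC (auc_max)."""
    found = []
    for i in range(1, len(auc_values) - 1):
        prev, cur, nxt = auc_values[i - 1], auc_values[i], auc_values[i + 1]
        is_peak = cur > prev and cur > nxt
        is_valley = cur < prev and cur < nxt
        if (is_peak or is_valley) and auc_total < cur < auc_max:
            found.append((i, cur))
    return found


# Hypothetical samples of the fitted curve, one per screening round.
curve = [0.70, 0.74, 0.72, 0.78, 0.73, 0.75, 0.69]
extrema = interior_extrema(curve, auc_total=0.71, auc_max=0.78)
```

The global maximum at index 3 is excluded because it equals the critical value; it corresponds to the feature set that is handled separately as the starting group.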
In the feature factor determination method provided by this embodiment, a baseline area under the curve (AUC) value is first obtained according to N candidate feature factors, where each candidate feature factor describes one type of risk-control feature, and the types of risk-control feature include at least one of the following: insurance-application features, underwriting features, or claims-settlement features; an AUC critical value and an AUC fitted curve are then obtained according to the importance of each candidate feature factor; and the target feature factors are then determined among the N candidate feature factors according to the baseline AUC value, the AUC critical value, and the AUC fitted curve. This overcomes the problem that screening feature factors with the single-round jackknife method cannot assess the number of feature factors entering the model, so that this number can only be estimated by staff themselves, which introduces bias and prevents the best feature factors from being selected reasonably.
S102 in the embodiment of Fig. 1 is described in further detail below with reference to a specific embodiment. Fig. 2 is the second schematic flowchart of the feature factor determination method provided by an embodiment of the present invention. As shown in Fig. 2, the method includes:
S201, delete the least important feature factor from the N candidate feature factors, input the remaining N-1 candidate feature factors into the training model, and obtain the AUC value output by the training model;
S202, delete the least important feature factor from the N-1 candidate feature factors, input the remaining N-2 candidate feature factors into the training model, and obtain the AUC value output by the training model;
S203, repeat the operation of deleting the least important feature factor until 2 candidate feature factors remain, input the 2 candidate feature factors into the training model, and obtain the AUC value output by the training model, giving N-1 AUC values;
Specifically, as shown in Fig. 3, the first round deletes the least important feature factor from the N candidate feature factors, inputs the N-1 candidate feature factors into the training model, and obtains the AUC value output by the training model, denoted AUC_1. The second round deletes the least important feature factor from the N-1 candidate feature factors, inputs the N-2 candidate feature factors into the training model, and obtains the AUC value output by the training model, denoted AUC_2. The operation of deleting the least important feature factor is repeated over multiple rounds until 2 candidate feature factors remain; the 2 candidate feature factors are input into the training model, and the AUC value output by the training model is denoted AUC_n-1. Fig. 3 is a schematic diagram of the feature factor deletion process provided by an embodiment of the present invention.
S204, take the maximum of the N-1 AUC values as the AUC critical value;
S205, fit the N-1 AUC values to obtain the AUC fitted curve.
Specifically, the maximum of the N-1 AUC values AUC_1, AUC_2, ..., AUC_n-1 is taken as the AUC critical value, denoted AUC_max, and the N-1 AUC values AUC_1, AUC_2, ..., AUC_n-1 are fitted to obtain the AUC fitted curve, as shown in Fig. 4. Fig. 4 is a schematic diagram of the AUC fitted curve provided by this embodiment.
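The multi-round deletion and the choice of the critical value can be sketched as follows. This is a hedged illustration, not the patent's implementation: `train_and_score` is a hypothetical callable that trains the model on a feature subset and returns its AUC, the importance ranking is assumed to be computed once rather than re-estimated each round, and the loop stops when one feature remains (as in the first-aspect description), which yields N-1 AUC values.

```python
def backward_elimination(features_low_to_high, train_and_score):
    """Drop the least important feature each round and record the AUC.

    features_low_to_high: feature names sorted from least to most important.
    train_and_score: hypothetical callable mapping a feature list to an AUC.
    Returns a list of (remaining_features, auc) tuples, one per round.
    """
    remaining = list(features_low_to_high)
    history = []
    while len(remaining) > 1:
        remaining = remaining[1:]  # delete the least important feature
        history.append((tuple(remaining), train_and_score(remaining)))
    return history


def critical_point(history):
    """Return the (features, auc) round with the largest AUC, i.e. AUC_max."""
    return max(history, key=lambda item: item[1])


# Toy stand-in for model training: AUC peaks when three features remain.
toy_auc = {4: 0.71, 3: 0.78, 2: 0.74, 1: 0.66}
history = backward_elimination(
    ["e", "d", "c", "b", "a"], lambda subset: toy_auc[len(subset)]
)
best_features, auc_max = critical_point(history)
```

In this toy run, five candidates produce four recorded rounds, and the features retained in the best round play the role of the high-quality feature factors associated with AUC_max.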
Specifically, in this embodiment, multi-round screening of the 30 candidate feature factors yields 13 high-quality feature factors; the AUC value obtained by inputting these 13 high-quality feature factors into the training model is AUC_max. Optionally, inputting feature factors into the training model can also yield recall and precision. Table 1 gives the AUC, recall, and precision values obtained by inputting into the training model the 30 candidate feature factors, the 13 feature factors obtained by the single-round jackknife method, and the 13 high-quality feature factors obtained by multi-round screening in this embodiment.
Table 1
As Table 1 shows, the AUC value obtained by inputting the 13 high-quality feature factors from multi-round screening into the training model is greater than the AUC values obtained by inputting either the 30 candidate feature factors or the 13 feature factors from the single-round jackknife method into the training model.
In the feature factor determination method provided by this embodiment, the least important feature factor is deleted from the N candidate feature factors, the remaining N-1 candidate feature factors are input into the training model, and the AUC value output by the training model is obtained; the least important feature factor is deleted from the N-1 candidate feature factors, the N-2 candidate feature factors are input into the training model, and the AUC value output by the training model is obtained; the operation of deleting the least important feature factor is repeated until 2 candidate feature factors remain, the remaining 2 candidate feature factors are input into the training model, and the AUC value output by the training model is obtained, giving N-1 AUC values; the maximum of the N-1 AUC values is taken as the AUC critical value; and the N-1 AUC values are fitted to obtain the AUC fitted curve. The high-quality feature factors can be obtained from the AUC critical value.
S103 in the embodiment of Fig. 1 is described in further detail below with reference to a specific embodiment. Fig. 5 is the third schematic flowchart of the feature factor determination method provided by an embodiment of the present invention. As shown in Fig. 5, the method includes:
S501, store the candidate feature factors corresponding to the AUC critical value in a first feature factor group;
Specifically, the first feature factor group includes the 13 high-quality feature factors a, b, c, ..., m corresponding to the AUC critical value.
S502, obtain AUC extrema according to the baseline AUC value and the AUC fitted curve, where each AUC extremum is a local maximum or local minimum that is greater than the baseline AUC value and less than the AUC critical value;
Specifically, as shown in Fig. 3, P and Q are AUC extreme points that are greater than the baseline AUC value and less than the AUC critical value.
S503, for each AUC extremum, obtain the M candidate feature factors corresponding to that AUC extremum;
S504, for each of the M candidate feature factors, store the candidate feature factor in the first feature factor group to obtain a second feature factor group;
S505, input the feature factors in the second feature factor group into the training model to obtain multiple first AUC values;
S506, if there is a first AUC value greater than the critical AUC value and also a first AUC value less than the critical AUC value, take the candidate feature factors corresponding to the first AUC values greater than the critical AUC value, together with the feature factors in the first feature factor group, as the target feature factors;
Specifically, take the AUC extremum at point P as an example. The 2 candidate feature factors corresponding to the AUC extremum at point P are obtained; they are n and l. n and l are each stored in the first feature factor group to obtain second feature factor groups, and the feature factors in each second feature factor group are input into the training model to obtain a first AUC value. If the first AUC value obtained by adding n to the first feature factor group is greater than the critical AUC value, and the first AUC value obtained by adding l to the first feature factor group is less than the critical AUC value, then the second feature factor group obtained by adding n to the first feature factor group is taken as the latest set of high-quality feature factors, that is, the target feature factors. The target feature factors are then a, b, c, ..., m, n, 14 feature factors in total.
S507: If all of the first AUC values are greater than the AUC critical value, store the M candidate feature factors in the first feature factor set to obtain a third feature factor set.
S508: Input the feature factors in the third feature factor set into the training model to obtain a second AUC value.
S509: Determine the target feature factors according to the second AUC value and the AUC critical value.
Specifically, if the first AUC value obtained by adding n to the first feature factor set is greater than the AUC critical value, and the first AUC value obtained by adding l to the first feature factor set is likewise greater than the AUC critical value, then n and l are both stored in the first feature factor set to obtain the third feature factor set, and the feature factors in the third feature factor set are input into the training model to obtain the second AUC value. If the second AUC value is greater than the AUC critical value, the third feature factor set is taken as the target feature factors, that is, the latest high-quality feature factors. If the second AUC value is less than the AUC critical value, n together with the first feature factor set is taken as the target feature factors.
Likewise, after it has been determined whether the feature factors corresponding to the AUC extreme value at point P include a target feature factor, it is determined whether the feature factors corresponding to the AUC extreme value at point Q include a target feature factor. The detailed process is as follows:
If the feature factors corresponding to the AUC extreme value at point P include a target feature factor, that is, the target feature factors comprise a, b, c ... m, n, a total of 14 feature factors, the feature factor corresponding to the AUC extreme value at point Q is merged into the target feature factors. In this embodiment, for example, the feature factor corresponding to the AUC extreme value at point Q is o. o is merged into the target feature factors, the result is input into the training model, and the output third AUC value is compared with AUC_max. If the third AUC value is greater than AUC_max, the feature factor o corresponding to the AUC extreme value at point Q is a target feature factor, and the new target feature factors comprise a, b, c ... m, n, o.
If the feature factors corresponding to the AUC extreme value at point P include no target feature factor, that is, the target feature factors comprise a, b, c ... m, a total of 13 feature factors, the feature factor o corresponding to the AUC extreme value at point Q is merged into the target feature factors, the result is input into the training model, and the output third AUC value is compared with AUC_max. If the third AUC value is greater than AUC_max, the feature factor o corresponding to the AUC extreme value at point Q is a target feature factor, and the new target feature factors comprise a, b, c ... m, o.
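The handling of point Q reduces to a single merge-and-compare step, sketched below. The sketch assumes AUC_max is a fixed threshold supplied by the caller; this section does not say whether it is updated after a merge is accepted. The names merge_if_better and toy_score and the numbers are hypothetical.

```python
from typing import Callable, List, Sequence


def merge_if_better(target: Sequence[str],
                    extremum_factors: Sequence[str],
                    auc_max: float,
                    score: Callable[[List[str]], float]) -> List[str]:
    """Merge the factors of one extreme point into the current target factors
    and keep the merged set only if its AUC (the third AUC value) exceeds auc_max."""
    trial = list(target) + [f for f in extremum_factors if f not in target]
    third_auc = score(trial)
    return trial if third_auc > auc_max else list(target)


def toy_score(factors: List[str]) -> float:
    """Hypothetical AUC in which both n and o add a small gain."""
    return 0.90 + (0.02 if "n" in factors else 0.0) + (0.01 if "o" in factors else 0.0)


# Case 1: point P contributed n, so the target already holds 14 factors.
after_q_with_n = merge_if_better(list("abcdefghijklmn"), ["o"], 0.90, toy_score)
# Case 2: point P contributed nothing, so the target holds the 13 factors a..m.
after_q_without_n = merge_if_better(list("abcdefghijklm"), ["o"], 0.90, toy_score)
print(len(after_q_with_n), len(after_q_without_n))
```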
Specifically, in this embodiment, 14 target feature factors are finally determined. Table 2 gives the AUC, recall, and precision obtained by inputting the 30 candidate feature factors, the 13 high-quality feature factors, and the 14 target feature factors into the training model.
Table 2
As Table 2 shows, the AUC value obtained by inputting the 14 target feature factors into the training model is greater than the AUC values obtained by inputting the 30 candidate feature factors or the 13 high-quality feature factors into the training model.
In summary, the feature factor determination method provided by the embodiments of the present invention effectively avoids wrongly selecting or over-selecting feature factors, and reduces the number of feature factors to an optimal level.
Fig. 6 is a schematic structural diagram of a feature factor determination device provided by an embodiment of the present invention. As shown in Fig. 6, the feature factor determination device 60 includes a benchmark AUC value obtaining module 601, an AUC critical value obtaining module 602, and a target feature factor determination module 603.
The benchmark AUC value obtaining module 601 is configured to obtain a benchmark AUC value according to N candidate feature factors, where each candidate feature factor describes one type of risk control feature, and the types of risk control feature include at least one of the following: insurance application features, underwriting features, or claim settlement features.
The AUC critical value obtaining module 602 is configured to obtain an AUC critical value and an AUC fitted curve according to the importance of each candidate feature factor.
The target feature factor determination module 603 is configured to determine the target feature factors among the N candidate feature factors according to the benchmark AUC value, the AUC critical value, and the AUC fitted curve.
Optionally, the AUC critical value obtaining module 602 is specifically configured to:
delete the feature factor of lowest importance from the N candidate feature factors, input the remaining N-1 candidate feature factors into the training model, and obtain the AUC value output by the training model;
delete the feature factor of lowest importance from the N-1 candidate feature factors, input the remaining N-2 candidate feature factors into the training model, and obtain the AUC value output by the training model;
repeat the operation of deleting the feature factor of lowest importance until the feature factor of lowest importance has been deleted from the remaining 2 candidate feature factors, input the remaining 1 candidate feature factor into the training model, and obtain the AUC value output by the training model; and
obtain the AUC critical value and the AUC fitted curve according to the N-1 AUC values obtained.
The AUC critical value obtaining module 602 is further specifically configured to:
take the maximum of the N-1 AUC values as the AUC critical value; and
fit the N-1 AUC values to obtain the AUC fitted curve.
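The elimination procedure performed by module 602 can be sketched as follows. This is a minimal illustration under assumptions: importance and score are caller-supplied callables standing in for the model's importance ranking and its AUC output, neither of which is specified here, and the curve-fitting step is left out because the fitting method is not stated. All names and numbers are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

History = List[Tuple[List[str], float]]


def backward_elimination(factors: List[str],
                         importance: Callable[[List[str]], Dict[str, float]],
                         score: Callable[[List[str]], float]
                         ) -> Tuple[History, float, List[str]]:
    """Repeatedly drop the least important factor and record the AUC of the rest.

    Returns the N-1 (remaining factors, AUC) records, the AUC critical value
    (their maximum), and the factor set that produced the critical value.
    """
    remaining = list(factors)
    history: History = []
    while len(remaining) > 1:
        ranks = importance(remaining)
        weakest = min(remaining, key=lambda f: ranks[f])
        remaining = [f for f in remaining if f != weakest]
        history.append((list(remaining), score(remaining)))
    critical_set, critical_auc = max(history, key=lambda rec: rec[1])
    return history, critical_auc, critical_set


# Toy inputs: fixed importances and an AUC that depends only on the set size.
toy_importance = lambda rem: {"a": 4.0, "b": 3.0, "c": 2.0, "d": 1.0}
toy_auc_by_size = {3: 0.85, 2: 0.90, 1: 0.70}
toy_score = lambda rem: toy_auc_by_size[len(rem)]

history, critical_auc, critical_set = backward_elimination(
    ["a", "b", "c", "d"], toy_importance, toy_score)
print(len(history), critical_auc, critical_set)
```

The factor set returned alongside the critical value corresponds to the candidate feature factors that module 603 stores in the first feature factor set.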
Optionally, the target feature factor determination module 603 is specifically configured to:
store the candidate feature factors corresponding to the AUC critical value in a first feature factor set;
obtain AUC extreme values according to the benchmark AUC value and the AUC fitted curve, where each AUC extreme value is a maximum or minimum that is greater than the benchmark AUC value and less than the AUC critical value; and
determine the target feature factors according to the first feature factor set and the candidate feature factors corresponding to each AUC extreme value.
Optionally, the target feature factor determination module 603 is further specifically configured to:
for each AUC extreme value, obtain the M candidate feature factors corresponding to that AUC extreme value;
for each of the M candidate feature factors, store the candidate feature factor in the first feature factor set to obtain a second feature factor set;
input the feature factors in each second feature factor set into the training model to obtain multiple first AUC values; and
determine the target feature factors according to the multiple first AUC values and the AUC critical value.
Optionally, the target feature factor determination module 603 is further specifically configured to:
if some of the first AUC values are greater than the AUC critical value and others are less than the AUC critical value, take the candidate feature factors whose first AUC values are greater than the AUC critical value, together with the feature factors in the first feature factor set, as the target feature factors.
Optionally, the target feature factor determination module 603 is further specifically configured to:
if all of the first AUC values are greater than the AUC critical value, store the M candidate feature factors in the first feature factor set to obtain a third feature factor set;
input the feature factors in the third feature factor set into the training model to obtain a second AUC value; and
determine the target feature factors according to the second AUC value and the AUC critical value.
The device provided in this embodiment can be used to execute the technical solutions of the method embodiments shown in Fig. 1 to Fig. 5. Its implementation principle and technical effects are similar and are not repeated here.
Fig. 7 is a schematic diagram of the hardware structure of a feature factor determination device provided by an embodiment of the present invention. As shown in Fig. 7, the feature factor determination device 70 provided by this embodiment includes:
a processor 701 and a memory 702, wherein
the memory 702 is configured to store computer-executable instructions; and
the processor 701 is configured to execute the computer-executable instructions stored in the memory.
By executing the computer-executable instructions stored in the memory, the processor 701 implements the steps performed by the feature factor determination device in the above embodiments. For details, see the related description in the above method embodiments.
Optionally, the memory 702 may be either independent of the processor 701 or integrated with it; this embodiment places no specific limitation on this.
When the memory 702 is arranged independently, the feature factor determination device further includes a bus 703 for connecting the memory 702 and the processor 701.
An embodiment of the present invention further provides a computer-readable storage medium storing computer-executable instructions. When a processor executes the computer-executable instructions, the feature factor determination method described above is implemented.
In the several embodiments provided by the present invention, it should be understood that the disclosed devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into modules is only a division by logical function, and other divisions are possible in actual implementation; for example, multiple modules may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or modules, and may be electrical, mechanical, or in other forms.
The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present invention may be integrated into one processing unit, each module may exist physically on its own, or two or more modules may be integrated into one unit. A unit formed from the above modules may be implemented in the form of hardware, or in the form of hardware plus software functional units.
The above integrated module, when implemented in the form of a software functional module, may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute some of the steps of the methods described in the embodiments of this application.
It should be understood that the above processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in connection with the invention may be executed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
The memory may include high-speed RAM and may also include non-volatile memory (NVM), for example at least one magnetic disk memory; it may also be a USB flash drive, a removable hard disk, a read-only memory, a magnetic disk, an optical disc, or the like.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. Buses can be divided into address buses, data buses, control buses, and so on. For ease of illustration, the bus in the drawings is not limited to only one bus or one type of bus.
The above storage medium may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc. The storage medium may be any available medium that a general-purpose or special-purpose computer can access.
An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium. The storage medium may also be a component of the processor. The processor and the storage medium may be located in an application-specific integrated circuit (ASIC). The processor and the storage medium may also exist as discrete components in an electronic device or a main control device.
Those of ordinary skill in the art will appreciate that all or some of the steps of the above method embodiments may be completed by hardware driven by program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the above method embodiments. The storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical discs.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A feature factor determination method, characterized by comprising:
obtaining a benchmark area under the curve (AUC) value according to N candidate feature factors, wherein each candidate feature factor is used to describe one type of risk control feature, and the types of risk control feature include at least one of the following: insurance application features, underwriting features, or claim settlement features;
obtaining an AUC critical value and an AUC fitted curve according to the importance of each candidate feature factor; and
determining target feature factors among the N candidate feature factors according to the benchmark AUC value, the AUC critical value, and the AUC fitted curve.
2. The method according to claim 1, characterized in that obtaining the AUC critical value and the AUC fitted curve according to the importance of each candidate feature factor comprises:
deleting the feature factor of lowest importance from the N candidate feature factors, inputting the remaining N-1 candidate feature factors into a training model, and obtaining the AUC value output by the training model;
deleting the feature factor of lowest importance from the N-1 candidate feature factors, inputting the remaining N-2 candidate feature factors into the training model, and obtaining the AUC value output by the training model;
repeating the operation of deleting the feature factor of lowest importance until the feature factor of lowest importance has been deleted from the remaining 2 candidate feature factors, inputting the remaining 1 candidate feature factor into the training model, and obtaining the AUC value output by the training model; and
obtaining the AUC critical value and the AUC fitted curve according to the N-1 AUC values obtained.
3. The method according to claim 2, characterized in that obtaining the AUC critical value and the AUC fitted curve according to the N-1 AUC values obtained comprises:
taking the maximum of the N-1 AUC values as the AUC critical value; and
fitting the N-1 AUC values to obtain the AUC fitted curve.
4. The method according to claim 3, characterized in that determining the target feature factors among the N candidate feature factors according to the benchmark AUC value, the AUC critical value, and the AUC fitted curve comprises:
storing the candidate feature factors corresponding to the AUC critical value in a first feature factor set;
obtaining AUC extreme values according to the benchmark AUC value and the AUC fitted curve, wherein each AUC extreme value is a maximum or minimum that is greater than the benchmark AUC value and less than the AUC critical value; and
determining the target feature factors according to the first feature factor set and the candidate feature factors corresponding to each AUC extreme value.
5. The method according to claim 4, characterized in that determining the target feature factors according to the first feature factor set and the candidate feature factors corresponding to each AUC extreme value comprises:
for each AUC extreme value, obtaining the M candidate feature factors corresponding to that AUC extreme value;
for each of the M candidate feature factors, storing the candidate feature factor in the first feature factor set to obtain a second feature factor set;
inputting the feature factors in each second feature factor set into the training model to obtain multiple first AUC values; and
determining the target feature factors according to the multiple first AUC values and the AUC critical value.
6. The method according to claim 5, characterized in that determining the target feature factors according to the multiple first AUC values and the AUC critical value comprises:
if some of the first AUC values are greater than the AUC critical value and others are less than the AUC critical value, taking the candidate feature factors whose first AUC values are greater than the AUC critical value, together with the feature factors in the first feature factor set, as the target feature factors.
7. The method according to claim 5, characterized in that determining the target feature factors according to the multiple first AUC values and the AUC critical value comprises:
if all of the first AUC values are greater than the AUC critical value, storing the M candidate feature factors in the first feature factor set to obtain a third feature factor set;
inputting the feature factors in the third feature factor set into the training model to obtain a second AUC value; and
determining the target feature factors according to the second AUC value and the AUC critical value.
8. A feature factor determination device, characterized by comprising:
a benchmark AUC value obtaining module, configured to obtain a benchmark area under the curve (AUC) value according to N candidate feature factors, wherein each candidate feature factor is used to describe one type of risk control feature, and the types of risk control feature include at least one of the following: insurance application features, underwriting features, or claim settlement features;
an AUC critical value obtaining module, configured to obtain an AUC critical value and an AUC fitted curve according to the importance of each candidate feature factor; and
a target feature factor determination module, configured to determine target feature factors among the N candidate feature factors according to the benchmark AUC value, the AUC critical value, and the AUC fitted curve.
9. A feature factor determination device, characterized by comprising at least one processor and a memory, wherein
the memory stores computer-executable instructions; and
the at least one processor executes the computer-executable instructions stored in the memory, so that the at least one processor performs the feature factor determination method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-executable instructions, and when a processor executes the computer-executable instructions, the feature factor determination method according to any one of claims 1 to 7 is implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811549933.0A CN109670976B (en) | 2018-12-18 | 2018-12-18 | Feature factor determination method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109670976A true CN109670976A (en) | 2019-04-23 |
CN109670976B CN109670976B (en) | 2021-02-26 |
Family
ID=66143956
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811549933.0A Active CN109670976B (en) | 2018-12-18 | 2018-12-18 | Feature factor determination method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109670976B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102567477A (en) * | 2011-06-16 | 2012-07-11 | 北京亿赞普网络技术有限公司 | Website value evaluation method and device |
CN103761451A (en) * | 2014-01-02 | 2014-04-30 | 中国科学院数学与系统科学研究院 | Biomarker combination identification method and system based on biomedical big data |
CN104615790A (en) * | 2015-03-09 | 2015-05-13 | 百度在线网络技术(北京)有限公司 | Characteristic recommendation method and device |
CN105469263A (en) * | 2014-09-24 | 2016-04-06 | 阿里巴巴集团控股有限公司 | Commodity recommendation method and device |
CN107194137A (en) * | 2016-01-31 | 2017-09-22 | 青岛睿帮信息技术有限公司 | A kind of necrotizing enterocolitis classification Forecasting Methodology modeled based on medical data |
CN108876487A (en) * | 2018-08-29 | 2018-11-23 | 盈盈(杭州)网络技术有限公司 | A kind of industrial plot estimation method based on big data and intelligent decision mechanism |
Non-Patent Citations (2)
Title |
---|
张红飞: "候选食管癌相关抗原的筛选鉴定及其对食管鳞癌的诊断价值", 《中国博士学位论文全文数据库(电子期刊)医药卫生科技辑》 * |
马箐: "MicroRNA-320c、MicroRNA-451a、MicroRNA-486在食管鳞癌及癌前病变患者血清中的表达及其作为初筛标志物的可行性研究", 《中国优秀硕士学位论文全文数据库(电子期刊)医药卫生科技辑》 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110503566A (en) * | 2019-07-08 | 2019-11-26 | 中国平安人寿保险股份有限公司 | Air control method for establishing model, device, computer equipment and storage medium |
CN110503566B (en) * | 2019-07-08 | 2024-02-09 | 中国平安人寿保险股份有限公司 | Wind control model building method and device, computer equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109670976B (en) | 2021-02-26 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CP03 | Change of name, title or address | Address after: Floor 36, Zheshang Building, No. 718 Jianshe Avenue, Jiang'an District, Wuhan, Hubei 430019. Patentee after: TK.CN INSURANCE Co.,Ltd.; TAIKANG INSURANCE GROUP Co.,Ltd. Address before: Taikang Life Building, 156 Fuxingmennei Street, Xicheng District, Beijing 100031. Patentee before: TAIKANG INSURANCE GROUP Co.,Ltd.; TK.CN INSURANCE Co.,Ltd. |