CN109670976A - Feature factor determination method and device - Google Patents

Info

Publication number
CN109670976A
Authority
CN
China
Prior art keywords
auc
factor
value
feature
candidate feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811549933.0A
Other languages
Chinese (zh)
Other versions
CN109670976B (en)
Inventor
崔蓝艺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taikang Insurance Group Co Ltd
Taikang Online Property Insurance Co Ltd
Original Assignee
Taikang Insurance Group Co Ltd
Taikang Online Property Insurance Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Taikang Insurance Group Co Ltd and Taikang Online Property Insurance Co Ltd
Priority to CN201811549933.0A
Publication of CN109670976A
Application granted
Publication of CN109670976B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00: Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08: Insurance

Abstract

This embodiment provides a feature factor determination method and device. The method comprises: first, obtaining a benchmark AUC value according to N candidate feature factors, where each candidate feature factor describes one type of risk control feature, and the types of risk control feature include at least one of the following: an insurance application feature, an underwriting feature, or a claim settlement feature; next, obtaining an AUC critical value and an AUC fitted curve according to the importance of each candidate feature factor; and then determining target feature factors among the N candidate feature factors according to the benchmark AUC value, the AUC critical value, and the AUC fitted curve. This overcomes the problem that screening feature factors with the single-round jackknife method cannot assess how many feature factors should enter the model, so that the number can only be estimated by staff, which introduces bias and prevents a reasonable selection of the best feature factors.

Description

Feature factor determination method and device
Technical field
The present invention relates to the technical field of insurance risk control, and in particular to a feature factor determination method and device.
Background art
Currently, some fraud exists in accident insurance and health insurance. For example, some applicants enter false high-income information on the application form in order to obtain a high insured amount, which raises strong suspicion of fraud. To address such behavior, a comprehensive, systematic mathematical model that fits the insurance business scenario can be built and combined with a business-scenario rule engine to screen false information along multiple dimensions. The result is applied to the underwriting rules so that fraud is avoided.
At present there are broadly three ways to optimize a mathematical model: optimizing the algorithm, optimizing the samples (that is, screening out a subset of good samples), and optimizing the feature factors. The main existing method for feature factor optimization is the single-round jackknife method. Its principle is to assess the effect of each feature factor on the model and exclude the feature factors that have little influence on it, thereby screening the sample data corresponding to the feature factors so that more reasonable feature factors are input into the model.
However, screening feature factors with the single-round jackknife method cannot assess how many feature factors should be input into the model. That number can only be estimated by staff, which introduces bias and prevents a reasonable selection of the best feature factors.
Summary of the invention
Embodiments of the present invention provide a feature factor determination method and device. They overcome the problem that screening feature factors with the single-round jackknife method cannot assess how many feature factors should enter the model, so that the number can only be estimated by staff, which introduces bias and prevents a reasonable selection of the best feature factors.
In a first aspect, an embodiment of the present invention provides a feature factor determination method, comprising:
obtaining a benchmark AUC value according to N candidate feature factors, where each candidate feature factor describes one type of risk control feature, and the types of risk control feature include at least one of the following: an insurance application feature, an underwriting feature, or a claim settlement feature;
obtaining an AUC critical value and an AUC fitted curve according to the importance of each candidate feature factor;
determining target feature factors among the N candidate feature factors according to the benchmark AUC value, the AUC critical value, and the AUC fitted curve.
In one possible design, obtaining the AUC critical value and the AUC fitted curve according to the importance of each candidate feature factor comprises:
deleting the least important feature factor from the N candidate feature factors, inputting the remaining N-1 candidate feature factors into a training model, and obtaining the AUC value output by the training model;
deleting the least important feature factor from the N-1 candidate feature factors, inputting the remaining N-2 candidate feature factors into the training model, and obtaining the AUC value output by the training model;
repeating the operation of deleting the least important feature factor until the least important feature factor has been deleted from the remaining 2 candidate feature factors, inputting the 1 remaining candidate feature factor into the training model, and obtaining the AUC value output by the training model;
obtaining the AUC critical value and the AUC fitted curve according to the N-1 AUC values obtained.
In one possible design, obtaining the AUC critical value and the AUC fitted curve according to the N-1 AUC values obtained comprises:
taking the maximum of the N-1 AUC values as the AUC critical value;
fitting the N-1 AUC values to obtain the AUC fitted curve.
In one possible design, determining the target feature factors among the N candidate feature factors according to the benchmark AUC value, the AUC critical value, and the AUC fitted curve comprises:
storing the candidate feature factors corresponding to the AUC critical value in a first feature factor set;
obtaining AUC extrema according to the benchmark AUC value and the AUC fitted curve, where each AUC extremum is a local maximum or local minimum that is greater than the benchmark AUC value and less than the AUC critical value;
determining the target feature factors according to the first feature factor set and the candidate feature factors corresponding to each AUC extremum.
In one possible design, determining the target feature factors according to the first feature factor set and the candidate feature factors corresponding to each AUC extremum comprises:
for each AUC extremum, obtaining the M candidate feature factors corresponding to that AUC extremum;
for each of the M candidate feature factors, storing that candidate feature factor in the first feature factor set to obtain a second feature factor set;
inputting the feature factors in the second feature factor set into the training model to obtain multiple first AUC values;
determining the target feature factors according to the multiple first AUC values and the AUC critical value.
In one possible design, determining the target feature factors according to the multiple first AUC values and the AUC critical value comprises:
if there are both a first AUC value greater than the AUC critical value and a first AUC value less than the AUC critical value, taking the candidate feature factors corresponding to the first AUC values greater than the AUC critical value, together with the feature factors in the first feature factor set, as the target feature factors.
In one possible design, determining the target feature factors according to the multiple first AUC values and the AUC critical value comprises:
if all first AUC values are greater than the AUC critical value, storing the M candidate feature factors in the first feature factor set to obtain a third feature factor set;
inputting the feature factors in the third feature factor set into the training model to obtain a second AUC value;
determining the target feature factors according to the second AUC value and the AUC critical value.
In a second aspect, an embodiment of the present invention provides a feature factor determination device, comprising:
a benchmark AUC value obtaining module, configured to obtain a benchmark AUC value according to N candidate feature factors, where each candidate feature factor describes one type of risk control feature, and the types of risk control feature include at least one of the following: an insurance application feature, an underwriting feature, or a claim settlement feature;
an AUC critical value obtaining module, configured to obtain an AUC critical value and an AUC fitted curve according to the importance of each candidate feature factor;
a target feature factor determining module, configured to determine target feature factors among the N candidate feature factors according to the benchmark AUC value, the AUC critical value, and the AUC fitted curve.
In one possible design, the AUC critical value obtaining module is specifically configured to:
delete the least important feature factor from the N candidate feature factors, input the remaining N-1 candidate feature factors into a training model, and obtain the AUC value output by the training model;
delete the least important feature factor from the N-1 candidate feature factors, input the remaining N-2 candidate feature factors into the training model, and obtain the AUC value output by the training model;
repeat the operation of deleting the least important feature factor until the least important feature factor has been deleted from the remaining 2 candidate feature factors, input the 1 remaining candidate feature factor into the training model, and obtain the AUC value output by the training model;
obtain the AUC critical value and the AUC fitted curve according to the N-1 AUC values obtained.
In one possible design, the AUC critical value obtaining module is further specifically configured to:
take the maximum of the N-1 AUC values as the AUC critical value;
fit the N-1 AUC values to obtain the AUC fitted curve.
In one possible design, the target feature factor determining module is specifically configured to:
store the candidate feature factors corresponding to the AUC critical value in a first feature factor set;
obtain AUC extrema according to the benchmark AUC value and the AUC fitted curve, where each AUC extremum is a local maximum or local minimum that is greater than the benchmark AUC value and less than the AUC critical value;
determine the target feature factors according to the first feature factor set and the candidate feature factors corresponding to each AUC extremum.
In one possible design, the target feature factor determining module is further specifically configured to:
for each AUC extremum, obtain the M candidate feature factors corresponding to that AUC extremum;
for each of the M candidate feature factors, store that candidate feature factor in the first feature factor set to obtain a second feature factor set;
input the feature factors in the second feature factor set into the training model to obtain multiple first AUC values;
determine the target feature factors according to the multiple first AUC values and the AUC critical value.
In one possible design, the target feature factor determining module is further specifically configured to:
if there are both a first AUC value greater than the AUC critical value and a first AUC value less than the AUC critical value, take the candidate feature factors corresponding to the first AUC values greater than the AUC critical value, together with the feature factors in the first feature factor set, as the target feature factors.
In one possible design, the target feature factor determining module is further specifically configured to:
if all first AUC values are greater than the AUC critical value, store the M candidate feature factors in the first feature factor set to obtain a third feature factor set;
input the feature factors in the third feature factor set into the training model to obtain a second AUC value;
determine the target feature factors according to the second AUC value and the AUC critical value.
In a third aspect, an embodiment of the present invention provides a feature factor determination device, comprising: at least one processor and a memory;
the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored in the memory, so that the at least one processor performs the feature factor determination method according to any design of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing computer-executable instructions; when a processor executes the computer-executable instructions, the feature factor determination method according to any design of the first aspect is implemented.
With the feature factor determination method and device provided in this embodiment, a benchmark area under the curve (AUC) value is first obtained according to N candidate feature factors, where each candidate feature factor describes one type of risk control feature, and the types of risk control feature include at least one of the following: an insurance application feature, an underwriting feature, or a claim settlement feature. An AUC critical value and an AUC fitted curve are then obtained according to the importance of each candidate feature factor. Finally, target feature factors are determined among the N candidate feature factors according to the benchmark AUC value, the AUC critical value, and the AUC fitted curve. This overcomes the problem that screening feature factors with the single-round jackknife method cannot assess how many feature factors should enter the model, so that the number can only be estimated by staff, which introduces bias and prevents a reasonable selection of the best feature factors.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is flow diagram one of the feature factor determination method provided by an embodiment of the present invention;
Fig. 2 is flow diagram two of the feature factor determination method provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of the feature factor deletion process provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of the AUC fitted curve provided by this embodiment;
Fig. 5 is flow diagram three of the feature factor determination method provided by an embodiment of the present invention;
Fig. 6 is a structural schematic diagram of the feature factor determination device provided by an embodiment of the present invention;
Fig. 7 is a hardware structural diagram of the feature factor determination device provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Evidently, the described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Fig. 1 is flow diagram one of the feature factor determination method provided by an embodiment of the present invention. As shown in Fig. 1, the method includes:
S101: Obtain a benchmark AUC value according to N candidate feature factors, where each candidate feature factor describes one type of risk control feature, and the types of risk control feature include at least one of the following: an insurance application feature, an underwriting feature, or a claim settlement feature.
In a specific implementation, N candidate feature factors are first obtained from sample data, where N is an integer greater than or equal to 2, and an algorithm model for outputting the area under the curve (AUC) value is determined.
For example, each candidate feature factor describes one type of risk control feature. The types of risk control feature include at least one of the following: an insurance application feature, an underwriting feature, or a claim settlement feature. Insurance application features can be obtained from application data, which includes, for example, the true identity and property status of the applicant and the insured. Claim settlement features can be obtained from claim data, which includes, for example, the claim amount of a claim event, the claim (or loss) frequency, and the number of claimants (or persons suffering a loss). Underwriting features can be obtained from underwriting data, such as the underwriting time and the underwriting amount. The N candidate feature factors can therefore be, for example, the true identity of the applicant and the insured, the claim amount, the number of claimants (or persons suffering a loss), and so on.
Specifically, the N candidate feature factors can be input into a training model, which outputs the benchmark area under the curve, that is, the benchmark AUC value. The training model may be, for example, a gradient boosting decision tree (Gradient Boosting Decision Tree, GBDT), logistic regression (Logistic Regression, LR), random forest (Random Forest, RF), or support vector machine (Support Vector Machine, SVM) model; this embodiment imposes no restriction here.
AUC is defined as the area enclosed by the ROC curve and the coordinate axes, so its value cannot exceed 1. Because the ROC curve generally lies above the line y = x, the AUC ranges between 0.5 and 1. ROC, the receiver operating characteristic curve, is also known as the sensitivity curve. It is a plot with the false positive rate on the horizontal axis and the true positive rate on the vertical axis, drawn from the different results that a subject obtains under different decision criteria for a given stimulus.
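The AUC definition above can be illustrated with a small, self-contained sketch. It does not reproduce the training model of the embodiment; it only uses the standard fact that the area under the ROC curve equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The function name `auc_score` and the toy labels and scores are assumptions made for illustration.

```python
def auc_score(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the positive
    sample has the higher score; ties count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = risky customer, 0 = normal customer.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.3, 0.6, 0.7, 0.8]
print(auc_score(labels, scores))  # 5 of the 6 positive/negative pairs are ordered correctly
```

The pairwise form is quadratic in the number of samples, so it is only suitable for small illustrations; a production system would normally rely on the AUC routine of its modeling library.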
Take the loss-ratio analysis of insurance products as an example. Underwriting and claim settlement data of expired policies for a hospitalization insurance product are screened through the indicators of a user profiling system, and part of the data is extracted. After data preprocessing, 110,000 records are obtained and feature factors of 30 dimensions are selected. These feature factors are independent at the business level but are not fully decoupled at the mathematical level, so redundant and strongly correlated feature factors remain.
Specifically, in this embodiment, the N candidate feature factors are fed into a GBDT model, which outputs the receiver operating characteristic (ROC) curve. The area enclosed by the ROC curve and the coordinate axes is the benchmark AUC, denoted AUC_total, which is used to predict risky customers. The higher the AUC value, the more accurate the prediction and the lower the risk.
S102: Obtain an AUC critical value and an AUC fitted curve according to the importance of each candidate feature factor.
Optionally, the N candidate feature factors are first sorted by importance from low to high. The specific sorting method is as follows:
The N candidate feature factors are sorted by importance from low to high based on the Gini index. After sorting, the importance of the feature factors is normalized with the following formula:
Importance = C * feature_importance / Max(feature_importance)
where Importance is the normalized feature factor importance, C is a constant, feature_importance is the feature factor importance, and Max(feature_importance) is the maximum feature factor importance. C can be chosen according to the actual situation.
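As a minimal sketch of the normalization formula above, the following code scales raw importance scores so that the largest becomes C and then orders the feature factors from low to high. The feature names, the raw scores, and the choice C = 100 are assumptions made for illustration; in practice the raw scores would come from the Gini-based importance of the trained model.

```python
def normalize_importance(raw_importance, c=100.0):
    """Apply Importance = C * feature_importance / Max(feature_importance)."""
    top = max(raw_importance.values())
    if top <= 0:
        raise ValueError("at least one importance must be positive")
    return {name: c * value / top for name, value in raw_importance.items()}


# Hypothetical raw importance scores for three candidate feature factors.
raw = {"claim_amount": 0.40, "claim_count": 0.10, "underwriting_amount": 0.25}
normalized = normalize_importance(raw, c=100.0)

# Order the feature factors from least to most important.
ranked_low_to_high = sorted(normalized, key=normalized.get)
print(ranked_low_to_high)
```

Because the formula divides every score by the same maximum, it changes only the scale of the scores and never their order.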
After the N feature factors are ranked by importance, they are screened over multiple rounds according to importance, and the feature factors remaining after each round are fed into the training model, which finally yields multiple AUC values. The maximum of these AUC values is taken as the AUC critical value, denoted AUC_max. The multiple AUC values are then fitted to a curve to obtain the AUC fitted curve.
S103: Determine target feature factors among the N candidate feature factors according to the benchmark AUC value, the AUC critical value, and the AUC fitted curve.
Specifically, the extreme points on the AUC fitted curve whose AUC values are greater than AUC_total are found and denoted AUC_j. The target feature factors are then determined, among the N candidate feature factors, from the candidate feature factors corresponding to the AUC critical value and the candidate feature factors corresponding to AUC_j.
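The search for extreme points can be sketched as follows. As a simplification, this example looks for local maxima and minima directly in the sequence of per-round AUC values instead of on a fitted curve, and it keeps only those that lie strictly between the benchmark AUC value and the AUC critical value. The function name `find_auc_extrema` and the sample values are assumptions, not part of the original description.

```python
def find_auc_extrema(auc_values, benchmark_auc, critical_auc):
    """Return indices of interior local maxima or minima whose value lies
    strictly between benchmark_auc and critical_auc."""
    hits = []
    for i in range(1, len(auc_values) - 1):
        prev, cur, nxt = auc_values[i - 1], auc_values[i], auc_values[i + 1]
        is_peak = cur > prev and cur > nxt
        is_valley = cur < prev and cur < nxt
        if (is_peak or is_valley) and benchmark_auc < cur < critical_auc:
            hits.append(i)
    return hits


# Hypothetical per-round AUC values; index i is the round number minus one.
auc_values = [0.80, 0.84, 0.82, 0.88, 0.85, 0.86, 0.78]
benchmark_auc = 0.79            # plays the role of AUC_total
critical_auc = max(auc_values)  # plays the role of AUC_max
extrema = find_auc_extrema(auc_values, benchmark_auc, critical_auc)
print(extrema)
```

The global maximum at index 3 is excluded because it equals the critical value; it is handled separately as the source of the first group of high-quality feature factors.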
With the feature factor determination method provided in this embodiment, a benchmark area under the curve (AUC) value is first obtained according to N candidate feature factors, where each candidate feature factor describes one type of risk control feature, and the types of risk control feature include at least one of the following: an insurance application feature, an underwriting feature, or a claim settlement feature. An AUC critical value and an AUC fitted curve are then obtained according to the importance of each candidate feature factor. Finally, target feature factors are determined among the N candidate feature factors according to the benchmark AUC value, the AUC critical value, and the AUC fitted curve. This overcomes the problem that screening feature factors with the single-round jackknife method cannot assess how many feature factors should enter the model, so that the number can only be estimated by staff, which introduces bias and prevents a reasonable selection of the best feature factors.
S102 in the embodiment of Fig. 1 is described in further detail below with reference to a specific embodiment. Fig. 2 is flow diagram two of the feature factor determination method provided by an embodiment of the present invention. As shown in Fig. 2, the method includes:
S201: Delete the least important feature factor from the N candidate feature factors, input the remaining N-1 candidate feature factors into the training model, and obtain the AUC value output by the training model;
S202: Delete the least important feature factor from the N-1 candidate feature factors, input the remaining N-2 candidate feature factors into the training model, and obtain the AUC value output by the training model;
S203: Repeat the operation of deleting the least important feature factor until 2 candidate feature factors remain, input the 2 candidate feature factors into the training model, and obtain the AUC value output by the training model, so that N-1 AUC values are obtained;
Specifically, as shown in Fig. 3, the first round deletes the least important feature factor from the N candidate feature factors and inputs the N-1 candidate feature factors into the training model to obtain the AUC value output by the training model, denoted AUC_1. The second round deletes the least important feature factor from the N-1 candidate feature factors and inputs the N-2 candidate feature factors into the training model to obtain the AUC value output by the training model, denoted AUC_2. The operation of deleting the least important feature factor is repeated over multiple rounds until 2 candidate feature factors remain; these 2 candidate feature factors are input into the training model to obtain the AUC value output by the training model, denoted AUC_n-1. Fig. 3 is a schematic diagram of the feature factor deletion process provided by an embodiment of the present invention.
S204: Take the maximum of the N-1 AUC values as the AUC critical value;
S205: Fit the N-1 AUC values to obtain the AUC fitted curve.
Specifically, the maximum of the N-1 AUC values AUC_1, AUC_2, ..., AUC_n-1 is taken as the AUC critical value, denoted AUC_max, and the N-1 AUC values AUC_1, AUC_2, ..., AUC_n-1 are fitted to obtain the AUC fitted curve, as shown in Fig. 4. Fig. 4 is a schematic diagram of the AUC fitted curve provided by this embodiment.
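The multi-round deletion in S201 to S204 can be sketched as a simple loop. The sketch below is illustrative only: `evaluate_auc` is a hypothetical callback standing in for "train the model on this feature subset and return its AUC", the feature names and AUC values are invented, and the importance ranking is computed once up front. The loop runs until a single feature factor remains, which yields exactly N-1 AUC values; stopping one round earlier is a one-line change if the last round should keep two feature factors. Curve fitting (S205) is not shown.

```python
def multi_round_elimination(features, importance, evaluate_auc):
    """Drop the least important feature each round and record the AUC.

    features: list of feature names (N >= 2).
    importance: dict mapping feature name -> importance score.
    evaluate_auc: hypothetical callback that trains a model on a feature
        subset and returns its AUC.
    Returns (rounds, critical_auc, critical_subset), where rounds is a list
    of (subset, auc) pairs, one per round.
    """
    remaining = sorted(features, key=lambda name: importance[name])  # low -> high
    rounds = []
    while len(remaining) > 1:
        remaining = remaining[1:]  # delete the least important feature factor
        rounds.append((list(remaining), evaluate_auc(remaining)))
    # The largest AUC over all rounds plays the role of AUC_max.
    critical_subset, critical_auc = max(rounds, key=lambda item: item[1])
    return rounds, critical_auc, critical_subset


# Hypothetical stand-in for model training: AUC depends only on subset size.
toy_auc = {4: 0.80, 3: 0.86, 2: 0.83, 1: 0.70}


def evaluate_auc(subset):
    return toy_auc[len(subset)]


features = ["a", "b", "c", "d", "e"]
importance = {"a": 5, "b": 4, "c": 3, "d": 2, "e": 1}
rounds, critical_auc, critical_subset = multi_round_elimination(
    features, importance, evaluate_auc
)
print(len(rounds), critical_auc, critical_subset)
```

Keeping the subset alongside each AUC value makes it straightforward to recover the feature factors that correspond to the critical value or to any later extremum.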
Specifically, in this embodiment, multi-round screening of the 30 candidate feature factors yields 13 high-quality feature factors, and the AUC value obtained by inputting these 13 high-quality feature factors into the training model is AUC_max. Optionally, inputting feature factors into the training model can also yield the recall and the precision. Table 1 gives the AUC, recall, and precision obtained by inputting into the training model the 30 candidate feature factors, the 13 feature factors obtained by the single-round jackknife method, and the 13 high-quality feature factors obtained by multi-round screening in this embodiment.
Table 1
As shown in Table 1, the AUC value obtained by inputting the 13 high-quality feature factors from multi-round screening into the training model is greater than the AUC values obtained by inputting either the 30 candidate feature factors or the 13 feature factors from the single-round jackknife method into the training model.
With the feature factor determination method provided in this embodiment, the least important feature factor is deleted from the N candidate feature factors, the remaining N-1 candidate feature factors are input into the training model, and the AUC value output by the training model is obtained; the least important feature factor is deleted from the N-1 candidate feature factors, the N-2 candidate feature factors are input into the training model, and the AUC value output by the training model is obtained; the operation of deleting the least important feature factor is repeated until 2 candidate feature factors remain, the remaining 2 candidate feature factors are input into the training model, and the AUC value output by the training model is obtained, so that N-1 AUC values are obtained; the maximum of the N-1 AUC values is taken as the AUC critical value; and the N-1 AUC values are fitted to obtain the AUC fitted curve. The high-quality feature factors can then be obtained from the AUC critical value.
Below with reference to specific embodiment, the S103 in Fig. 1 embodiment is further elaborated.Fig. 5 is this hair Determining method the flow diagram three of the characterization factor that bright embodiment provides, as shown in figure 5, this method comprises:
S501, the corresponding candidate feature factor of the AUC critical value is stored in fisrt feature factor set;
Specifically, fisrt feature factor set includes the corresponding 13 quality features factor a, b, c ... the m of AUC critical value.
S502, according to the benchmark AUC value and the AUC matched curve, obtain AUC extreme value, the AUC extreme value be greater than The benchmark AUC value, and it is less than the maximum value or minimum value of the AUC critical value;
Specifically, as shown in figure 3, P, Q are greater than benchmark AUC value and less than the AUC extreme point of AUC critical value.
S503: for each AUC extreme value, obtain the M candidate feature factors corresponding to that AUC extreme value;
S504: for each of the M candidate feature factors, store the candidate feature factor in the first feature factor set to obtain a second feature factor set;
S505: input the feature factors in the second feature factor set into the training model to obtain multiple first AUC values;
S506: if there is both a first AUC value greater than the critical AUC value and a first AUC value less than the critical AUC value, take the candidate feature factor corresponding to the first AUC value greater than the critical AUC value, together with the feature factors in the first feature factor set, as the target feature factors;
Specifically, take the AUC extreme value at point P as an example. The 2 candidate feature factors corresponding to the AUC extreme value at point P are obtained; they are n and l. n and l are each stored separately in the first feature factor set, giving second feature factor sets, and the feature factors in each second feature factor set are input into the training model to obtain the first AUC values. If the first AUC value obtained by adding n to the first feature factor set is greater than the critical AUC value, and the first AUC value obtained by adding l to the first feature factor set is less than the critical AUC value, the second feature factor set obtained by adding n to the first feature factor set is taken as the latest high-quality feature factors, i.e., the target feature factors. The target feature factors are a, b, c, ..., m, n, 14 feature factors in total.
S507: if all first AUC values are greater than the critical AUC value, store the M candidate feature factors in the first feature factor set to obtain a third feature factor set;
S508: input the feature factors in the third feature factor set into the training model to obtain a second AUC value;
S509: determine the target feature factors according to the second AUC value and the critical AUC value.
Specifically, if the first AUC value obtained by adding n to the first feature factor set is greater than the critical AUC value, and the first AUC value obtained by adding l to the first feature factor set is likewise greater than the critical AUC value, then n and l are stored in the first feature factor set together to obtain the third feature factor set, and the feature factors in the third feature factor set are input into the training model to obtain the second AUC value. If the second AUC value is greater than the critical AUC value, the third feature factor set is taken as the target feature factors, i.e., the latest high-quality feature factors. If the second AUC value is less than the critical AUC value, n together with the first feature factor set is taken as the target feature factors.
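Steps S502 to S509 can be sketched as follows. This is a minimal illustration under stated assumptions: the fitted curve is sampled at discrete points, `train_and_score` is a hypothetical callback that trains the model on a feature list and returns its AUC, and the function names are not from the source. Two choices go beyond the text and are labeled in the comments: in the mixed case every candidate that beats the critical value is kept, and when the combined set fails the S509 check only the single best candidate is kept.

```python
from typing import Callable, List, Sequence, Tuple


def interior_extrema(
    auc_values: Sequence[float], benchmark: float, critical: float
) -> List[Tuple[int, float]]:
    """S502: local maxima/minima strictly between benchmark and critical.

    Works on sampled curve values; the two endpoints are ignored.
    """
    found = []
    for i in range(1, len(auc_values) - 1):
        prev, cur, nxt = auc_values[i - 1], auc_values[i], auc_values[i + 1]
        is_peak = cur > prev and cur > nxt
        is_valley = cur < prev and cur < nxt
        if (is_peak or is_valley) and benchmark < cur < critical:
            found.append((i, cur))
    return found


def select_from_extremum(
    first_set: Sequence[str],
    candidates: Sequence[str],
    critical_auc: float,
    train_and_score: Callable[[Sequence[str]], float],
) -> List[str]:
    """S503-S509 for the candidates of one AUC extreme value."""
    base = list(first_set)
    # S504-S505: add each candidate on its own and score the result.
    first_aucs = {c: train_and_score(base + [c]) for c in candidates}
    winners = [c for c in candidates if first_aucs[c] > critical_auc]
    if not winners:
        # Assumption: with no improvement the first set is kept unchanged.
        return base
    if len(winners) < len(candidates):
        # S506 (mixed case). Assumption: keep every improving candidate.
        return base + winners
    # S507-S508: every candidate improved, so test them together.
    third_set = base + winners
    if train_and_score(third_set) > critical_auc:
        return third_set  # S509: the combined set also improves.
    # Assumption: otherwise keep only the single best candidate.
    best = max(winners, key=lambda c: first_aucs[c])
    return base + [best]
```

With the first set a, b, ..., m and candidates n and l, the mixed case returns the first set plus n, which matches the 14-factor result described for point P.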
Likewise, after it has been determined whether a target feature factor exists among the feature factors corresponding to the AUC extreme value at point P, it is determined whether a target feature factor exists among the feature factors corresponding to the AUC extreme value at point Q. The detailed process is as follows:
If a target feature factor exists among the feature factors corresponding to the AUC extreme value at point P, i.e., the target feature factors are a, b, c, ..., m, n, 14 feature factors in total, the feature factor corresponding to the AUC extreme value at point Q is merged into the target feature factors. In this embodiment, for example, the feature factor corresponding to the AUC extreme value at point Q is o; o is merged into the target feature factors, the result is input into the training model, and the output third AUC value is compared with AUC_max. If the third AUC value is determined to be greater than AUC_max, the feature factor o corresponding to the AUC extreme value at point Q is a target feature factor, and the new target feature factors are a, b, c, ..., m, n, o.
If no target feature factor exists among the feature factors corresponding to the AUC extreme value at point P, i.e., the target feature factors are a, b, c, ..., m, 13 feature factors in total, the feature factor o corresponding to the AUC extreme value at point Q is merged into the target feature factors, the result is input into the training model, and the output third AUC value is compared with AUC_max. If the third AUC value is determined to be greater than AUC_max, the feature factor o corresponding to the AUC extreme value at point Q is a target feature factor, and the new target feature factors are a, b, c, ..., m, o.
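The way the result for point P carries over to point Q can be sketched as a simple loop. This is an illustrative reading only: `merge_later_points` and `train_and_score` are hypothetical names, and the sketch assumes each later feature factor is tested one at a time against a fixed `auc_max`.

```python
from typing import Callable, Iterable, List, Sequence


def merge_later_points(
    target: Sequence[str],
    later_candidates: Iterable[str],
    auc_max: float,
    train_and_score: Callable[[Sequence[str]], float],
) -> List[str]:
    """Try each later feature factor on top of the current target set."""
    current = list(target)
    for factor in later_candidates:
        trial = current + [factor]
        # Keep the factor only if the third AUC value exceeds AUC_max.
        if train_and_score(trial) > auc_max:
            current = trial
    return current
```

Whether point P contributed a factor or not, the same call applies; only the starting `target` list differs (14 factors in the first case, 13 in the second).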
Specifically, in this embodiment, the number of target feature factors is finally determined to be 14. Table 2 gives the AUC, recall, and precision values obtained by inputting the 30 candidate feature factors, the 13 high-quality feature factors, and the 14 target feature factors, respectively, into the training model.
Table 2
As Table 2 shows, the AUC value obtained by inputting the 14 target feature factors into the training model is greater than the AUC values obtained by inputting the 30 candidate feature factors or the 13 high-quality feature factors into the training model.
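For reference, the metrics compared in Table 2 can be computed as in the sketch below. It uses the standard pairwise definition of AUC and the usual definitions of recall and precision; it assumes binary labels (1 for positive, 0 for negative) and assumes that the third metric in Table 2 is precision. The source does not say how the values were computed, and the function names are illustrative.

```python
from typing import Sequence, Tuple


def auc_score(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def recall_precision(
    labels: Sequence[int], predictions: Sequence[int]
) -> Tuple[float, float]:
    """Recall = TP / (TP + FN); precision = TP / (TP + FP)."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision
```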
In summary, the feature factor determination method provided by the embodiments of the present invention effectively avoids wrongly selecting or over-selecting feature factors, and reduces the number of feature factors to an optimal level.
Fig. 6 is a schematic structural diagram of the feature factor determination device provided by an embodiment of the present invention. As shown in Fig. 6, the feature factor determination device 60 includes a benchmark AUC value obtaining module 601, an AUC critical value obtaining module 602, and a target feature factor determining module 603.
The benchmark AUC value obtaining module 601 is configured to obtain a benchmark AUC value according to N candidate feature factors, where each candidate feature factor describes one type of risk control feature, and the types of risk control feature include at least one of the following: insurance application features, underwriting features, or claims settlement features;
The AUC critical value obtaining module 602 is configured to obtain an AUC critical value and an AUC fitted curve according to the importance of each candidate feature factor;
The target feature factor determining module 603 is configured to determine target feature factors among the N candidate feature factors according to the benchmark AUC value, the AUC critical value, and the AUC fitted curve.
Optionally, the AUC critical value obtaining module 602 is specifically configured to:
Delete the least important feature factor from the N candidate feature factors, input the remaining N-1 candidate feature factors into the training model, and obtain the AUC value output by the training model;
Delete the least important feature factor from the N-1 candidate feature factors, input the remaining N-2 candidate feature factors into the training model, and obtain the AUC value output by the training model;
Repeat the operation of deleting the least important feature factor until the least important feature factor has been deleted from the remaining 2 candidate feature factors, input the 1 remaining candidate feature factor into the training model, and obtain the AUC value output by the training model;
Obtain the AUC critical value and the AUC fitted curve according to the N-1 AUC values obtained.
The AUC critical value obtaining module 602 is further specifically configured to:
Take the maximum of the N-1 AUC values as the AUC critical value;
Fit the N-1 AUC values to obtain the AUC fitted curve.
Optionally, the target feature factor determining module 603 is specifically configured to:
Store the candidate feature factors corresponding to the AUC critical value in a first feature factor set;
Obtain AUC extreme values according to the benchmark AUC value and the AUC fitted curve, each AUC extreme value being a maximum or minimum that is greater than the benchmark AUC value and less than the AUC critical value;
Determine the target feature factors according to the first feature factor set and the candidate feature factors corresponding to each AUC extreme value.
Optionally, the target feature factor determining module 603 is further specifically configured to:
For each AUC extreme value, obtain the M candidate feature factors corresponding to that AUC extreme value;
For each of the M candidate feature factors, store the candidate feature factor in the first feature factor set to obtain a second feature factor set;
Input the feature factors in the second feature factor set into the training model to obtain multiple first AUC values;
Determine the target feature factors according to the multiple first AUC values and the critical AUC value.
Optionally, the target feature factor determining module 603 is further specifically configured to:
If there is both a first AUC value greater than the critical AUC value and a first AUC value less than the critical AUC value, take the candidate feature factor corresponding to the first AUC value greater than the critical AUC value, together with the feature factors in the first feature factor set, as the target feature factors.
Optionally, the target feature factor determining module 603 is further specifically configured to:
If all first AUC values are greater than the critical AUC value, store the M candidate feature factors in the first feature factor set to obtain a third feature factor set;
Input the feature factors in the third feature factor set into the training model to obtain a second AUC value;
Determine the target feature factors according to the second AUC value and the critical AUC value.
The device provided by this embodiment can be used to execute the technical solutions of the method embodiments shown in Fig. 1 to Fig. 5; its implementation principle and technical effect are similar and are not repeated here.
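The data flow between modules 601, 602, and 603 can be pictured with a small composition sketch. The class name, field names, and callable signatures below are all assumptions made for illustration; the source only specifies what each module takes in and produces.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class FeatureFactorDeterminer:
    # Module 601: candidates -> benchmark AUC value.
    benchmark_module: Callable[[Sequence[str]], float]
    # Module 602: candidates -> (AUC critical value, sampled fitted curve).
    critical_module: Callable[[Sequence[str]], Tuple[float, List[float]]]
    # Module 603: (candidates, benchmark, critical, curve) -> target factors.
    target_module: Callable[
        [Sequence[str], float, float, List[float]], List[str]
    ]

    def determine(self, candidates: Sequence[str]) -> List[str]:
        """Run the three modules in the order described for Fig. 6."""
        benchmark = self.benchmark_module(candidates)
        critical, curve = self.critical_module(candidates)
        return self.target_module(candidates, benchmark, critical, curve)
```

Passing the modules in as callables keeps the sketch independent of any particular model or fitting routine.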
Fig. 7 is a schematic hardware structure diagram of the feature factor determination device provided by an embodiment of the present invention. As shown in Fig. 7, the feature factor determination device 70 provided by this embodiment includes:
A processor 701 and a memory 702, wherein
The memory 702 is configured to store computer-executable instructions.
The processor 701 is configured to execute the computer-executable instructions stored in the memory.
By executing the computer-executable instructions stored in the memory, the processor 701 implements the steps performed by the feature factor determination device in the above embodiments. For details, refer to the related description in the foregoing method embodiments.
Optionally, the memory 702 may be independent or may be integrated with the processor 701; this embodiment imposes no specific limitation.
When the memory 702 is provided independently, the feature factor determination device further includes a bus 703 for connecting the memory 702 and the processor 701.
An embodiment of the present invention also provides a computer-readable storage medium in which computer-executable instructions are stored; when a processor executes the computer-executable instructions, the feature factor determination method described above is implemented.
In the several embodiments provided by the present invention, it should be understood that the disclosed device and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into modules is only a division by logical function; in actual implementation there may be other ways of dividing, for example multiple modules may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or modules, and may be electrical, mechanical, or in other forms.
The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present invention may be integrated in one processing unit, each module may exist physically on its own, or two or more modules may be integrated in one unit. The unit formed from the above modules may be implemented in the form of hardware, or in the form of hardware plus software functional units.
The above integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute some of the steps of the methods of the embodiments of this application.
It should be understood that the above processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or the like. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the invention may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
The memory may include high-speed RAM and may also include non-volatile memory (NVM), for example at least one magnetic disk memory; it may also be a USB flash drive, a removable hard disk, a read-only memory, a magnetic disk, an optical disc, or the like.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. Buses may be divided into address buses, data buses, control buses, and so on. For ease of representation, the bus in the drawings is not limited to only one bus or one type of bus.
The above storage medium may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc. The storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.
An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium. The storage medium may of course also be a component of the processor. The processor and the storage medium may be located in an application-specific integrated circuit (ASIC). The processor and the storage medium may of course also exist as discrete components in an electronic device or a master control device.
Those of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be completed by hardware related to program instructions. The aforementioned program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the above method embodiments; the aforementioned storage medium includes various media that can store program code, such as ROM, RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements for some or all of the technical features; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A feature factor determination method, characterized by comprising:
Obtaining a benchmark area under the curve (AUC) value according to N candidate feature factors, where each candidate feature factor describes one type of risk control feature, and the types of risk control feature include at least one of the following: insurance application features, underwriting features, or claims settlement features;
Obtaining an AUC critical value and an AUC fitted curve according to the importance of each candidate feature factor;
Determining target feature factors among the N candidate feature factors according to the benchmark AUC value, the AUC critical value, and the AUC fitted curve.
2. The method according to claim 1, characterized in that obtaining the AUC critical value and the AUC fitted curve according to the importance of each candidate feature factor comprises:
Deleting the least important feature factor from the N candidate feature factors, inputting the remaining N-1 candidate feature factors into a training model, and obtaining the AUC value output by the training model;
Deleting the least important feature factor from the N-1 candidate feature factors, inputting the remaining N-2 candidate feature factors into the training model, and obtaining the AUC value output by the training model;
Repeating the operation of deleting the least important feature factor until the least important feature factor has been deleted from the remaining 2 candidate feature factors, inputting the 1 remaining candidate feature factor into the training model, and obtaining the AUC value output by the training model;
Obtaining the AUC critical value and the AUC fitted curve according to the N-1 AUC values obtained.
3. The method according to claim 2, characterized in that obtaining the AUC critical value and the AUC fitted curve according to the N-1 AUC values obtained comprises:
Taking the maximum of the N-1 AUC values as the AUC critical value;
Fitting the N-1 AUC values to obtain the AUC fitted curve.
4. The method according to claim 3, characterized in that determining the target feature factors among the N candidate feature factors according to the benchmark AUC value, the AUC critical value, and the AUC fitted curve comprises:
Storing the candidate feature factors corresponding to the AUC critical value in a first feature factor set;
Obtaining AUC extreme values according to the benchmark AUC value and the AUC fitted curve, each AUC extreme value being a maximum or minimum that is greater than the benchmark AUC value and less than the AUC critical value;
Determining the target feature factors according to the first feature factor set and the candidate feature factors corresponding to each AUC extreme value.
5. The method according to claim 4, characterized in that determining the target feature factors according to the first feature factor set and the candidate feature factors corresponding to each AUC extreme value comprises:
For each AUC extreme value, obtaining the M candidate feature factors corresponding to that AUC extreme value;
For each of the M candidate feature factors, storing the candidate feature factor in the first feature factor set to obtain a second feature factor set;
Inputting the feature factors in the second feature factor set into the training model to obtain multiple first AUC values;
Determining the target feature factors according to the multiple first AUC values and the critical AUC value.
6. The method according to claim 5, characterized in that determining the target feature factors according to the multiple first AUC values and the critical AUC value comprises:
If there is both a first AUC value greater than the critical AUC value and a first AUC value less than the critical AUC value, taking the candidate feature factor corresponding to the first AUC value greater than the critical AUC value, together with the feature factors in the first feature factor set, as the target feature factors.
7. The method according to claim 5, characterized in that determining the target feature factors according to the multiple first AUC values and the critical AUC value comprises:
If all first AUC values are greater than the critical AUC value, storing the M candidate feature factors in the first feature factor set to obtain a third feature factor set;
Inputting the feature factors in the third feature factor set into the training model to obtain a second AUC value;
Determining the target feature factors according to the second AUC value and the critical AUC value.
8. A feature factor determination device, characterized by comprising:
A benchmark AUC value obtaining module, configured to obtain a benchmark area under the curve (AUC) value according to N candidate feature factors, where each candidate feature factor describes one type of risk control feature, and the types of risk control feature include at least one of the following: insurance application features, underwriting features, or claims settlement features;
An AUC critical value obtaining module, configured to obtain an AUC critical value and an AUC fitted curve according to the importance of each candidate feature factor;
A target feature factor determining module, configured to determine target feature factors among the N candidate feature factors according to the benchmark AUC value, the AUC critical value, and the AUC fitted curve.
9. A feature factor determination device, characterized by comprising: at least one processor and a memory;
The memory stores computer-executable instructions;
The at least one processor executes the computer-executable instructions stored in the memory, so that the at least one processor performs the feature factor determination method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that computer-executable instructions are stored in the computer-readable storage medium, and when a processor executes the computer-executable instructions, the feature factor determination method according to any one of claims 1 to 7 is implemented.
CN201811549933.0A 2018-12-18 2018-12-18 Feature factor determination method and device Active CN109670976B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811549933.0A CN109670976B (en) 2018-12-18 2018-12-18 Feature factor determination method and device

Publications (2)

Publication Number Publication Date
CN109670976A true CN109670976A (en) 2019-04-23
CN109670976B CN109670976B (en) 2021-02-26

Family

ID=66143956

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811549933.0A Active CN109670976B (en) 2018-12-18 2018-12-18 Feature factor determination method and device

Country Status (1)

Country Link
CN (1) CN109670976B (en)

Also Published As

Publication number Publication date
CN109670976B (en) 2021-02-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: Floor 36, Zheshang Building, No. 718 Jianshe Avenue, Jiang'an District, Wuhan, Hubei 430019

Patentee after: TK.CN INSURANCE Co.,Ltd.

Patentee after: TAIKANG INSURANCE GROUP Co.,Ltd.

Address before: Taikang Life Building, 156 fuxingmennei street, Xicheng District, Beijing 100031

Patentee before: TAIKANG INSURANCE GROUP Co.,Ltd.

Patentee before: TK.CN INSURANCE Co.,Ltd.
