CN110348993A - Method for determining a risk assessment model label, determining device, and electronic equipment - Google Patents
Method for determining a risk assessment model label, determining device, and electronic equipment
- Publication number
- CN110348993A, CN201910578914.9A
- Authority
- CN
- China
- Prior art keywords
- sample data
- overdue
- rate
- label
- recall rate
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/03—Credit; Loans; Processing thereof
Abstract
The present invention provides a method for determining a risk assessment model label, a determining device, and electronic equipment. The method comprises: obtaining sample data to be labeled; obtaining the accuracy and recall of the sample data under different overdue performance periods; based on the accuracy and the recall, selecting a target overdue performance period from the different overdue performance periods and using it as the label; and labeling the sample data according to the selected label. With this technical solution, sample data can be classified and labeled without waiting for the samples to perform completely, so the sample data can be predicted promptly and accurately, which improves its timeliness.
Description
Technical field
The present invention relates to the field of internet financial technology, and in particular to a method for determining a risk assessment model label, a device for determining a risk assessment model label, electronic equipment, and a computer-readable storage medium.
Background technique
With rapid economic development, consumer credit is drawing increasing attention. Personal consumer loans of all kinds, such as credit card purchases, personal car loans, student loans, and small consumer loans, are growing in number and at a fast rate. The rapid growth of consumer credit requires each credit guarantor to have a more complete credit risk management system, so each credit guarantor uses a risk assessment model to predict and control business risk. The validity of the risk assessment model directly affects the accuracy of the risk assessment result.
At present, the sample data used to build a risk assessment model generally has poor timeliness. This is especially true for samples with long loan terms, such as 12 months: at least a year must pass before these samples have fully performed and can be labeled as "good" or "bad" customers to determine the label, and only then can the risk assessment model be built from them. Because the samples used to build the model are old, the resulting risk assessment model is not very good, and its assessment results are not accurate enough.
Summary of the invention
The present invention aims to solve the problem that the sample data used in existing risk assessment modeling cannot be classified and labeled in time and therefore has poor timeliness.
To solve the above technical problem, a first aspect of the present invention provides a method for determining a risk assessment model label, comprising: obtaining sample data to be labeled; obtaining the accuracy and recall of the sample data under different overdue performance periods; based on the accuracy and the recall, selecting a target overdue performance period from the different overdue performance periods and using it as the label; and labeling the sample data according to the selected label.
In this technical solution, the sample data to be labeled is obtained together with its accuracy and recall under different overdue performance periods, and a suitable label is selected from the different overdue performance periods according to the accuracy and recall in order to label the sample data. There is no need to wait until the sample data has performed completely before labeling it, so the sample data can be predicted promptly and accurately, which improves its timeliness.
In the above technical solution, preferably, the step of selecting the target overdue performance period from the different overdue performance periods based on the accuracy and the recall specifically comprises: building a statistics table that contains each of the different overdue performance periods together with its corresponding accuracy and recall; and selecting the target overdue performance period according to the statistics table.
In this technical solution, building the statistics table makes it easy to compare accuracy and recall under the different conditions, which helps ensure that the target overdue performance period is selected accurately.
In any of the above technical solutions, preferably, the step of selecting the target overdue performance period according to the statistics table specifically comprises: selecting from the statistics table the overdue performance period whose accuracy is greater than a first threshold and whose recall is greater than a second threshold, and using it as the target overdue performance period.
In this technical solution, recall is generally lower when accuracy is higher, and accuracy is lower when recall is higher. Therefore, by comparing the accuracy and recall in the statistics table, the overdue performance period for which accuracy and recall are both relatively high is selected as the label. This keeps the label selection reasonable and, as far as possible, ensures the accuracy of the subsequent classification and labeling of the sample data.
In any of the above technical solutions, preferably, the step of calculating the accuracy of the sample data under the different overdue performance periods specifically comprises: calculating the accuracy according to the following formula: a = (TP + TN) / (TP + FP + TN + FN), where a denotes the accuracy, TP denotes samples predicted to be good that are actually good, FP denotes samples predicted to be good that are actually bad, TN denotes samples predicted to be bad that are actually bad, and FN denotes samples predicted to be bad that are actually good.
In any of the above technical solutions, preferably, the step of obtaining the recall of the sample data under the different overdue performance periods specifically comprises: calculating the recall according to the following formula: b = TP / (TP + FN), where b denotes the recall, and TP, FP, TN, and FN are defined as above.
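For reference, the two formulas can be written in one consistent notation, with "good" as the positive class as defined in the text. The numeric check that follows uses hypothetical counts (TP = 70, FP = 10, TN = 15, FN = 5) chosen only for illustration; they do not come from the patent.

```latex
\begin{align}
a &= \frac{TP + TN}{TP + FP + TN + FN} \\
b &= \frac{TP}{TP + FN}
\end{align}
% Hypothetical check: TP = 70, FP = 10, TN = 15, FN = 5
% a = (70 + 15) / (70 + 10 + 15 + 5) = 85 / 100 = 0.85
% b = 70 / (70 + 5) = 70 / 75 \approx 0.933
```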
In any of the above technical solutions, preferably, each of the different overdue performance periods comprises an installment number and a number of days overdue.
In any of the above technical solutions, preferably, the step of obtaining the sample data to be labeled specifically comprises: retrieving the sample data from a database; and/or retrieving the sample data from a third-party loan platform.
In any of the above technical solutions, preferably, the method further comprises: performing machine learning training on the labeled sample data to build the risk assessment model.
In this technical solution, the risk assessment model is built from sample data that is timely and accurately classified, which as far as possible ensures that a good model is obtained and in turn improves the accuracy of the model's assessment results.
To solve the above technical problem, a second aspect of the present invention provides a device for determining a risk assessment model label, comprising: a first acquisition unit for obtaining sample data to be labeled; a second acquisition unit for obtaining the accuracy and recall of the sample data under different overdue performance periods; a processing unit for selecting, based on the accuracy and the recall, a target overdue performance period from the different overdue performance periods and using it as the label; and a labeling unit for labeling the sample data according to the selected label.
In this technical solution, the sample data to be labeled is obtained together with its accuracy and recall under different overdue performance periods, and a suitable label is selected from the different overdue performance periods according to the accuracy and recall in order to label the sample data. There is no need to wait until the sample data has performed completely before labeling it, so the sample data can be predicted promptly and accurately, which improves its timeliness.
In any of the above technical solutions, preferably, the processing unit comprises: a statistics table construction unit for building a statistics table that contains each of the different overdue performance periods together with its corresponding accuracy and recall; and a screening unit for selecting the target overdue performance period according to the statistics table.
In this technical solution, building the statistics table makes it easy to compare accuracy and recall under the different conditions, which helps ensure that the target overdue performance period is selected accurately.
In any of the above technical solutions, preferably, the screening unit is specifically configured to: select from the statistics table the overdue performance period whose accuracy is greater than a first threshold and whose recall is greater than a second threshold, and use it as the target overdue performance period.
In this technical solution, recall is generally lower when accuracy is higher, and accuracy is lower when recall is higher. Therefore, by comparing the accuracy and recall in the statistics table, the overdue performance period for which accuracy and recall are both relatively high is selected as the label. This keeps the label selection reasonable and, as far as possible, ensures the accuracy of the subsequent classification and labeling of the sample data.
In any of the above technical solutions, preferably, the second acquisition unit is specifically configured to: calculate the accuracy of the sample data under the different overdue performance periods according to the following formula: a = (TP + TN) / (TP + FP + TN + FN), where a denotes the accuracy, TP denotes samples predicted to be good that are actually good, FP denotes samples predicted to be good that are actually bad, TN denotes samples predicted to be bad that are actually bad, and FN denotes samples predicted to be bad that are actually good.
In any of the above technical solutions, preferably, the second acquisition unit is specifically configured to: calculate the recall of the sample data under the different overdue performance periods according to the following formula: b = TP / (TP + FN), where b denotes the recall, and TP, FP, TN, and FN are defined as above.
In any of the above technical solutions, preferably, each of the different overdue performance periods comprises an installment number and a number of days overdue.
In any of the above technical solutions, preferably, the first acquisition unit is specifically configured to: retrieve the sample data from a database; and/or retrieve the sample data from a third-party loan platform.
In any of the above technical solutions, preferably, the device further comprises: a model construction unit for performing machine learning training on the labeled sample data to build the risk assessment model.
In this technical solution, the risk assessment model is built from sample data that is timely and accurately classified, which as far as possible ensures that a good model is obtained and in turn improves the accuracy of the model's assessment results.
To solve the above technical problem, a third aspect of the present invention provides electronic equipment, comprising: a processor and a memory storing computer-executable instructions which, when executed, cause the processor to perform the method according to any of the above technical solutions.
To solve the above technical problem, a fourth aspect of the present invention provides a computer-readable storage medium, wherein the computer-readable storage medium stores one or more programs which, when executed by a processor, implement the method according to any of the above technical solutions.
Because the present invention calculates the accuracy and recall of the sample data under different overdue performance periods and, according to the accuracy and recall, selects a suitable label from the different overdue performance periods in order to label the sample data, the sample data can be classified and labeled without waiting for it to perform completely. The sample data can therefore be predicted promptly and accurately, which improves its timeliness.
Brief description of the drawings
To make the technical problem solved by the present invention, the technical means adopted, and the technical effects achieved clearer, specific embodiments of the present invention are described in detail below with reference to the accompanying drawings. It should be noted that the drawings described below are only drawings of exemplary embodiments of the present invention; those skilled in the art can obtain drawings of other embodiments from them without creative effort.
Fig. 1 shows a schematic flow chart of the method for determining a risk assessment model label according to an embodiment of the present invention;
Fig. 2 shows a schematic block diagram of the device for determining a risk assessment model label according to an embodiment of the present invention;
Fig. 3 shows a schematic block diagram of the electronic equipment according to an embodiment of the present invention;
Fig. 4 shows a schematic block diagram of the computer-readable storage medium according to an embodiment of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present invention are now described more fully with reference to the accompanying drawings. The exemplary embodiments can, however, be implemented in many forms, and the present invention should not be understood as limited to the embodiments set forth here. Rather, these exemplary embodiments are provided so that the present invention is more thorough and complete and so that the inventive concept is fully conveyed to those skilled in the art. The same reference numerals in the figures denote the same or similar elements, components, or parts, so repeated descriptions of them are omitted.
Provided that the technical concept of the present invention is respected, the features, structures, characteristics, or other details described in one specific embodiment may be combined in any suitable manner in one or more other embodiments.
In the description of the specific embodiments, the features, structures, characteristics, or other details are described so that those skilled in the art can fully understand the embodiments. This does not exclude that those skilled in the art may practice the technical solution of the present invention without one or more of the specific features, structures, characteristics, or other details.
The flow charts shown in the drawings are merely illustrative. They need not include all content and operations/steps, and the steps need not be performed in the order described. For example, some operations/steps may be split, and others may be merged or partly merged, so the order actually followed may change according to the actual situation.
The block diagrams shown in the drawings are merely functional entities and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The same reference numerals in the drawings denote the same or similar elements, components, or parts, so repeated descriptions of them may be omitted below. It should also be understood that although ordinal terms such as first, second, and third may be used herein to describe various devices, elements, components, or parts, these should not be limited by those terms. The terms serve only to distinguish one from another. For example, a first device could also be called a second device without departing from the essence of the technical solution of the present invention. In addition, the term "and/or" includes any one of, and all combinations of one or more of, the associated listed items.
When making labels for a model, the following problem commonly arises. When the loan term is long, for example 12 months, waiting for the sample data to perform completely so that all the "bad" customers have emerged takes at least one year. This means the model must be built from samples that are at least a year old, and a model obtained this way is clearly poor. To obtain a better model, it must be trained on timely sample data. Because the true quality of such sample data has not yet shown itself, a label is needed to predict its classification. To obtain a suitable label, accuracy and recall are introduced. Accuracy is the proportion of all data that is predicted correctly; recall is the proportion of actually "good" data that is predicted correctly. The relationship between the two is that the higher the accuracy, the lower the recall, and the higher the recall, the lower the accuracy. The accuracy and recall under different overdue performance periods are used to select a suitable label, and the sample data is then classified by prediction, which improves its timeliness. The label determination process, as shown in Fig. 1, comprises:
Step S102: obtain sample data to be labeled.
The sample data may be retrieved from a database or from a third-party loan platform (for example, information in a lending app on the client). The sample data in the database and in the third-party loan platform may also be combined to ensure that the sample data is comprehensive.
Step S104: obtain the accuracy and recall of the sample data under different overdue performance periods. An overdue performance period may comprise an installment number and a number of days overdue, for example 1 day overdue at installment 6.
Specifically, in formula (1) and formula (2), TP denotes samples predicted to be good that are actually good, FP denotes samples predicted to be good that are actually bad, TN denotes samples predicted to be bad that are actually bad, and FN denotes samples predicted to be bad that are actually good.
The accuracy a of the sample data under the different overdue performance periods is calculated according to formula (1):
a = (TP + TN) / (TP + FP + TN + FN) (1)
The recall b of the sample data under the different overdue performance periods is calculated according to formula (2):
b = TP / (TP + FN) (2)
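Formulas (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the `ConfusionCounts` class, the function names, and the example counts are all hypothetical. The positive class is "good", as in the definitions of TP, FP, TN, and FN above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for one overdue performance period; positive class is "good"."""
    tp: int  # predicted good, actually good
    fp: int  # predicted good, actually bad
    tn: int  # predicted bad, actually bad
    fn: int  # predicted bad, actually good


def accuracy(c: ConfusionCounts) -> float:
    """Formula (1): a = (TP + TN) / (TP + FP + TN + FN)."""
    total = c.tp + c.fp + c.tn + c.fn
    if total == 0:
        raise ValueError("no samples")
    return (c.tp + c.tn) / total


def recall(c: ConfusionCounts) -> float:
    """Formula (2): b = TP / (TP + FN)."""
    actual_good = c.tp + c.fn
    if actual_good == 0:
        raise ValueError("no actually-good samples")
    return c.tp / actual_good


# Hypothetical counts for a single overdue performance period.
counts = ConfusionCounts(tp=80, fp=5, tn=10, fn=5)
a = accuracy(counts)
b = recall(counts)
```

Both functions raise on empty input rather than returning a default, since a silent 0.0 could be mistaken for a genuinely poor period in the later comparison.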
Step S106: based on the accuracy and recall, select the target overdue performance period from the different overdue performance periods and use it as the label.
The specific selection process is as follows: build a statistics table that contains each of the different overdue performance periods together with its corresponding accuracy and recall, then select from the statistics table the overdue performance period whose accuracy is greater than a first threshold and whose recall is greater than a second threshold, and use it as the target overdue performance period. In general, recall is lower when accuracy is higher, and accuracy is lower when recall is higher. Therefore, by comparing the accuracy and recall in the statistics table, the overdue performance period for which accuracy and recall are both relatively high is selected as the label. This keeps the label selection reasonable and, as far as possible, ensures the accuracy of the subsequent classification and labeling of the sample data.
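The selection in step S106 can be sketched as a filter over the statistics table. Everything below is an assumption made for illustration: the `PeriodStats` row type, the threshold values, and the table numbers are invented, and the tie-break (highest accuracy plus recall when several rows pass) is one possible choice that the text does not specify.

```python
from typing import NamedTuple


class PeriodStats(NamedTuple):
    installment: int    # installment number, e.g. 6
    days_overdue: int   # days overdue at that installment
    accuracy: float
    recall: float


def screen_periods(table, min_accuracy, min_recall):
    """Keep rows whose accuracy and recall both exceed their thresholds."""
    return [
        row for row in table
        if row.accuracy > min_accuracy and row.recall > min_recall
    ]


# Hypothetical statistics table; the numbers are made up for this sketch.
table = [
    PeriodStats(6, 1, 0.62, 0.97),
    PeriodStats(6, 7, 0.74, 0.93),
    PeriodStats(6, 15, 0.83, 0.90),
    PeriodStats(6, 30, 0.91, 0.88),
]

# Hypothetical first and second thresholds.
candidates = screen_periods(table, min_accuracy=0.80, min_recall=0.85)

# Assumed tie-break: if several periods pass, take the highest combined score.
target = max(candidates, key=lambda r: r.accuracy + r.recall) if candidates else None
```

If no row passes both thresholds, `target` is `None`; in that case the thresholds or the candidate periods would need to be revisited.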
The following takes the number of days overdue at installment 6 as an example.
The accuracy and recall corresponding to the different overdue performance periods (1 day overdue at installment 6, 7 days overdue at installment 6, 15 days overdue at installment 6, and 30 days overdue at installment 6) are shown in Table 1:
Overdue performance period | Accuracy | Recall |
1 day overdue at installment 6 | a1 | b1 |
7 days overdue at installment 6 | a2 | b2 |
15 days overdue at installment 6 | a3 | b3 |
30 days overdue at installment 6 | a4 | b4 |
Table 1
By comparing the accuracy and recall under each overdue performance period in Table 1, if the accuracy a4 and recall b4 for 30 days overdue at installment 6 are both relatively high, then 30 days overdue at installment 6 can be chosen as the label, and sample data that is 30 days overdue at installment 6 is defined as "bad". Note that installment 6 is used here only as an example and does not limit the overdue performance period.
Step S108: label the sample data according to the selected label.
In this embodiment, the sample data to be labeled is obtained together with its accuracy and recall under different overdue performance periods, and a suitable label is selected from the different overdue performance periods according to the accuracy and recall in order to label the sample data. There is no need to wait until the sample data has performed completely before labeling it, so the sample data can be predicted promptly and accurately, which improves its timeliness.
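Step S108 can be sketched as applying the selected overdue performance period to each sample. The data layout (a mapping from installment number to days overdue), the loan identifiers, and the use of "greater than or equal to" for the day threshold are assumptions for this illustration; the text only states that samples 30 days overdue at installment 6 are defined as "bad".

```python
def label_sample(days_overdue_by_installment, installment, days_threshold):
    """Label one sample with the selected overdue performance period.

    days_overdue_by_installment maps an installment number to the number
    of days the sample was overdue at that installment (assumed layout).
    Returns "bad" when the threshold is reached, otherwise "good".
    """
    days = days_overdue_by_installment.get(installment, 0)
    return "bad" if days >= days_threshold else "good"


# Hypothetical samples keyed by a made-up loan identifier.
samples = {
    "loan-001": {6: 0},
    "loan-002": {6: 12},
    "loan-003": {6: 45},
}

# Selected label from the example: 30 days overdue at installment 6.
labels = {
    loan_id: label_sample(history, installment=6, days_threshold=30)
    for loan_id, history in samples.items()
}
```

A sample with no record for the chosen installment is treated as 0 days overdue here, which is itself a design assumption; a real pipeline might instead exclude such samples.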
Further, the method also comprises: performing machine learning training on the labeled sample data to build the risk assessment model. Building the risk assessment model from sample data that is timely and accurately classified ensures, as far as possible, that a good model is obtained, which in turn improves the accuracy of the model's assessment results.
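The text does not name a learning algorithm, so the sketch below swaps in plain logistic regression trained by batch gradient descent as a stand-in. The features, their values, the learning rate, and the number of epochs are all hypothetical; the only link to the method above is that the training targets are the labels produced in step S108 (1 for "bad", 0 for "good").

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(features, targets, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, targets):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_bad_probability(weights, bias, x):
    """Estimated probability that a sample with features x is "bad"."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Hypothetical, already scaled features: [debt ratio, past late payments].
X = [[0.1, 0.0], [0.2, 0.1], [0.3, 0.0], [0.7, 0.8], [0.8, 0.6], [0.9, 0.9]]
y = [0, 0, 0, 1, 1, 1]  # 1 = labeled "bad", 0 = labeled "good"

weights, bias = train_logistic(X, y)
risky = predict_bad_probability(weights, bias, [0.85, 0.8])
safe = predict_bad_probability(weights, bias, [0.15, 0.05])
```

Any classifier could take the place of this one; the point is only that the labels come from the selected overdue performance period rather than from fully matured loans.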
Those skilled in the art will understand that all or some of the steps of the above embodiment may be implemented as a program (computer program) executed by a computer data processing device. When the computer program is executed, the above method provided by the present invention can be carried out. The computer program may be stored in a computer-readable storage medium, which may be a readable storage medium such as a magnetic disk, optical disc, ROM, or RAM, or a storage array composed of multiple storage media, such as a disk or tape storage array. The storage medium is not limited to centralized storage; it may also be distributed storage, such as cloud storage based on cloud computing.
Device embodiments of the present invention are described below; the device can be used to carry out the method embodiments of the present invention. Details described in the device embodiments should be regarded as a supplement to the method embodiments above; for details not disclosed in the device embodiments, refer to the method embodiments above.
When making labels for a model, the following problem commonly arises. When the loan term is long, for example 12 months, waiting for the sample data to perform completely so that all the "bad" customers have emerged takes at least one year. This means the model must be built from samples that are at least a year old, and a model obtained this way is clearly poor. To obtain a better model, it must be trained on timely sample data. Because the true quality of such sample data has not yet shown itself, a label is needed to predict its classification. To obtain a suitable label, accuracy and recall are introduced. Accuracy is the proportion of all data that is predicted correctly; recall is the proportion of actually "good" data that is predicted correctly. The relationship between the two is that the higher the accuracy, the lower the recall, and the higher the recall, the lower the accuracy. The accuracy and recall under different overdue performance periods are used to select a suitable label, and the sample data is then classified by prediction, which improves its timeliness. The device that implements label determination is shown in Fig. 2: the risk assessment model label determining device 200 comprises a first acquisition unit 202, a second acquisition unit 204, a processing unit 206, and a labeling unit 208.
The first acquisition unit 202 is used to obtain sample data to be labeled. The sample data may be retrieved from a database or from a third-party loan platform (for example, information in a lending app on the client). The sample data in the database and in the third-party loan platform may also be combined to ensure that the sample data is comprehensive.
The second acquisition unit 204 is used to obtain the accuracy and recall of the sample data under different overdue performance periods. An overdue performance period may comprise an installment number and a number of days overdue, for example 1 day overdue at installment 6.
In formulas (1) and (2) below, TP indicates that a sample is predicted to be a good customer and is actually a good customer; FP indicates that a sample is predicted to be a good customer but is actually a bad customer; TN indicates that a sample is predicted to be a bad customer and is actually a bad customer; FN indicates that a sample is predicted to be a bad customer but is actually a good customer.
Specifically, the second acquisition unit 204 calculates the accuracy rate a of the sample data under different overdue performance periods according to formula (1):
a = (TP + TN) / (TP + FP + TN + FN) (1).
The second acquisition unit 204 calculates the recall rate b of the sample data under different overdue performance periods according to formula (2):
b = TP / (TP + FN) (2).
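Formulas (1) and (2) can be sketched in a few lines. This is only an illustration: the function names, the string class names "good" and "bad", and the example predictions are assumptions introduced here, with "good" treated as the positive class as in the definitions of TP, FP, TN and FN above.

```python
from typing import Sequence, Tuple


def confusion_counts(predicted: Sequence[str],
                     actual: Sequence[str]) -> Tuple[int, int, int, int]:
    """Count (TP, FP, TN, FN), treating "good" as the positive class."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    tp = fp = tn = fn = 0
    for p, a in zip(predicted, actual):
        if p == "good" and a == "good":
            tp += 1
        elif p == "good" and a == "bad":
            fp += 1
        elif p == "bad" and a == "bad":
            tn += 1
        elif p == "bad" and a == "good":
            fn += 1
        else:
            raise ValueError(f"unexpected class pair: {p!r}, {a!r}")
    return tp, fp, tn, fn


def accuracy_rate(tp: int, fp: int, tn: int, fn: int) -> float:
    """Formula (1): a = (TP + TN) / (TP + FP + TN + FN)."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("no samples")
    return (tp + tn) / total


def recall_rate(tp: int, fn: int) -> float:
    """Formula (2): b = TP / (TP + FN)."""
    actually_good = tp + fn
    if actually_good == 0:
        raise ValueError("no actually-good samples")
    return tp / actually_good


# Hypothetical predictions for one overdue performance period.
predicted = ["good", "good", "bad", "bad", "good"]
actual = ["good", "bad", "bad", "good", "good"]
tp, fp, tn, fn = confusion_counts(predicted, actual)
a = accuracy_rate(tp, fp, tn, fn)
b = recall_rate(tp, fn)
```

Repeating this calculation once per candidate overdue performance period yields the pairs of rates that the processing unit compares.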
The processing unit 206 is configured to screen out a target overdue performance period from the different overdue performance periods based on the accuracy rate and the recall rate, and to use it as the label.
Specifically, the processing unit 206 includes a statistical table construction unit 2062 and a screening unit 2064. The statistical table construction unit 2062 builds a statistical table containing each of the different overdue performance periods together with its corresponding accuracy rate and recall rate. The screening unit 2064 screens out from the statistical table the overdue performance period whose accuracy rate is greater than a first threshold and whose recall rate is greater than a second threshold, as the target overdue performance period. In general, a higher accuracy rate comes with a lower recall rate, and a higher recall rate comes with a lower accuracy rate. Therefore, by comparing the accuracy rates and recall rates in the statistical table, the overdue performance period whose accuracy rate and recall rate are both relatively high is screened out as the label, which keeps the screened label as reasonable as possible and in turn supports the accuracy of the subsequent classification and calibration of the sample data.
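The two-threshold screening can be sketched as below. The names `PeriodStats`, `screen_periods` and `pick_target`, the numeric rates and the thresholds are hypothetical. The text does not say how to choose when several periods pass both thresholds, so taking the largest sum of the two rates is an assumption made only for this sketch.

```python
from typing import List, NamedTuple, Optional


class PeriodStats(NamedTuple):
    """One row of the statistical table."""
    period: str      # overdue performance period
    accuracy: float  # accuracy rate a
    recall: float    # recall rate b


def screen_periods(table: List[PeriodStats],
                   first_threshold: float,
                   second_threshold: float) -> List[PeriodStats]:
    """Keep rows with accuracy > first threshold and recall > second threshold."""
    return [row for row in table
            if row.accuracy > first_threshold and row.recall > second_threshold]


def pick_target(candidates: List[PeriodStats]) -> Optional[PeriodStats]:
    """Tie-break (assumption): prefer the largest accuracy + recall."""
    if not candidates:
        return None
    return max(candidates, key=lambda row: row.accuracy + row.recall)


# Hypothetical statistical table; the rates are made up for illustration.
table = [
    PeriodStats("period 6, 1 day overdue", 0.55, 0.97),
    PeriodStats("period 6, 7 days overdue", 0.68, 0.93),
    PeriodStats("period 6, 15 days overdue", 0.81, 0.90),
    PeriodStats("period 6, 30 days overdue", 0.86, 0.88),
]
candidates = screen_periods(table, first_threshold=0.80, second_threshold=0.85)
target = pick_target(candidates)
```

Returning `None` when no row passes both thresholds leaves the caller free to relax a threshold rather than silently choosing a weak label.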
An example is given below using the number of days overdue at period 6 of the sample data.
The accuracy rates and recall rates corresponding to different overdue performance periods (1 day, 7 days, 15 days and 30 days overdue at period 6) are shown in Table 1:
Overdue performance period | Accuracy rate | Recall rate |
---|---|---|
1 day overdue at period 6 | a1 | b1 |
7 days overdue at period 6 | a2 | b2 |
15 days overdue at period 6 | a3 | b3 |
30 days overdue at period 6 | a4 | b4 |
Table 1
By comparing the accuracy rate and recall rate under each overdue performance period in Table 1, if the accuracy rate a4 and the recall rate b4 at 30 days overdue at period 6 are both relatively high, then 30 days overdue at period 6 can be chosen as the label, and sample data that are 30 days overdue at period 6 are defined as bad customers. Note that period 6 is used here only as an example and does not limit the overdue performance period.
The calibration unit 208 is configured to calibrate the sample data according to the screened-out label.
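A minimal sketch of this calibration step follows, assuming the label is a pair of period number and overdue-day threshold such as "30 days overdue at period 6". The record layout, the field names and the rule that a sample at or above the threshold is marked bad are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class OverdueLabel:
    """Screened-out label: an overdue performance period."""
    period: int        # installment period number, e.g. 6
    days_overdue: int  # overdue-day threshold, e.g. 30


def calibrate(samples: List[dict], label: OverdueLabel) -> Dict[str, str]:
    """Mark each sample "bad" or "good" according to the label.

    Each sample carries a hypothetical mapping from period number to the
    number of days overdue observed at that period.
    """
    calibrated: Dict[str, str] = {}
    for sample in samples:
        days = sample["days_overdue_at_period"].get(label.period, 0)
        calibrated[sample["sample_id"]] = (
            "bad" if days >= label.days_overdue else "good"
        )
    return calibrated


# Hypothetical samples, for illustration only.
label = OverdueLabel(period=6, days_overdue=30)
samples = [
    {"sample_id": "u1", "days_overdue_at_period": {6: 0}},
    {"sample_id": "u2", "days_overdue_at_period": {6: 30}},
    {"sample_id": "u3", "days_overdue_at_period": {6: 12}},
]
result = calibrate(samples, label)
```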
In this embodiment, the sample data to be calibrated and the accuracy rate and recall rate of the sample data under different overdue performance periods are obtained, a suitable label is screened out from the different overdue performance periods according to the accuracy rate and recall rate, and the sample data are calibrated with it. There is no need to wait until the sample data have fully performed before calibrating them, so the sample data can be predicted in a timely and accurate manner, which enhances their timeliness.
Further, the determining device 200 for a label for a risk assessment model also includes a model construction unit 210, configured to carry out machine learning training on the calibrated sample data to build the risk assessment model. Building the risk assessment model from sample data that are timely and accurately classified helps as far as possible to obtain a better model, and in turn improves the accuracy of the model's evaluation results.
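The text does not name a learning algorithm. Purely as an illustration, the sketch below trains a logistic regression by stochastic gradient descent on calibrated samples; the choice of algorithm, the single standardized feature, the learning rate and the number of epochs are all assumptions and are not taken from the specification.

```python
import math
from typing import List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(features: Sequence[Sequence[float]],
                   labels: Sequence[int],
                   lr: float = 0.1,
                   epochs: int = 200) -> Tuple[List[float], float]:
    """Fit weights and bias by per-sample gradient descent on the log loss.

    labels use 1 for a calibrated "bad" sample and 0 for "good".
    """
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            err = _sigmoid(z) - y  # gradient of the log loss w.r.t. z
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict_bad_probability(weights: Sequence[float], bias: float,
                            x: Sequence[float]) -> float:
    """Predicted probability that a sample is bad."""
    return _sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Hypothetical calibrated data: one standardized feature per sample.
features = [[-1.0], [-0.5], [0.5], [1.0]]
labels = [0, 0, 1, 1]  # 0 = good, 1 = bad
weights, bias = train_logistic(features, labels)
p_low = predict_bad_probability(weights, bias, [-1.0])
p_high = predict_bad_probability(weights, bias, [1.0])
```

Any other supervised classifier could take the place of this one; the point is only that the labels produced by the calibration step serve as the training targets.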
Those skilled in the art will understand that the modules in the above device embodiment may be distributed in the device as described, or may be changed accordingly and distributed in one or more devices different from the above embodiment. The modules of the above embodiment may be merged into one module or further split into multiple sub-modules.
An electronic device embodiment of the present invention is described below; it may be regarded as a concrete physical implementation of the method and device embodiments of the present invention described above. Details described in the electronic device embodiment should be regarded as a supplement to the above method or device embodiments; for details not disclosed in the electronic device embodiment, reference may be made to the above method or device embodiments.
Fig. 3 is a structural block diagram of an exemplary embodiment of an electronic device according to the present invention. The electronic device 300 according to this embodiment of the present invention is described below with reference to Fig. 3. The electronic device 300 shown in Fig. 3 is only an example and should not impose any limitation on the function or scope of use of the embodiments of the present invention.
As shown in Fig. 3, the electronic device 300 takes the form of a general-purpose computing device. Its components may include, but are not limited to: at least one processing unit 310, at least one storage unit 320, a bus 330 connecting different system components (including the storage unit 320 and the processing unit 310), a display unit 340, and so on.
The storage unit stores program code that can be executed by the processing unit 310, so that the processing unit 310 performs the steps of the various exemplary embodiments of the present invention described in the method section of this specification above. For example, the processing unit 310 may perform the steps shown in Fig. 1.
The storage unit 320 may include readable media in the form of volatile storage units, such as a random access memory unit (RAM) 3201 and/or a cache memory unit 3202, and may further include a read-only memory unit (ROM) 3203.
The storage unit 320 may also include a program/utility 3204 having a set of (at least one) program modules 3205. Such program modules 3205 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
The bus 330 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.
The electronic device 300 may also communicate with one or more external devices 400 (such as a keyboard, a pointing device, a Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 300, and/or with any device (such as a router, a modem, etc.) that enables the electronic device 300 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 350. The electronic device 300 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through a network adapter 360. The network adapter 360 may communicate with the other modules of the electronic device 300 through the bus 330. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 300, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
From the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described in the present invention may be implemented in software, or in software combined with the necessary hardware. Therefore, the technical solution according to the embodiments of the present invention may be embodied in the form of a software product, which may be stored in a computer-readable storage medium (such as a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which includes instructions that cause a computing device (such as a personal computer, a server, or a network device) to perform the above method according to the present invention. When the computer program is executed by a data processing device, the computer-readable medium is enabled to implement the above method of the present invention, namely: obtaining sample data to be calibrated; obtaining the accuracy rate and recall rate of the sample data under different overdue performance periods; screening out a target overdue performance period from the different overdue performance periods based on the accuracy rate and recall rate, and using it as the label; and calibrating the sample data according to the screened-out label.
Fig. 4 is a schematic diagram of a computer-readable storage medium of the present invention. As shown in Fig. 4, the computer program may be stored on one or more computer-readable media. A computer-readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
The computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code contained on a readable medium may be transmitted over any suitable medium, including but not limited to wireless, wired, optical cable, RF, etc., or any suitable combination of the above.
Program code for carrying out the operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computing device (for example, through the Internet using an Internet service provider).
In summary, the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination of them. Those skilled in the art will understand that a general-purpose data processing device such as a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components according to embodiments of the present invention. The present invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the present invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
The specific embodiments described above further explain the purpose, technical solutions and beneficial effects of the present invention in detail. It should be understood that the present invention is not inherently related to any particular computer, virtual device or electronic device, and that various general-purpose devices may also implement the present invention. The above are only specific embodiments of the present invention and are not intended to limit it; any modification, equivalent substitution, improvement and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (10)
- 1. A method for determining a label for a risk assessment model, characterized by comprising: obtaining sample data to be calibrated; obtaining an accuracy rate and a recall rate of the sample data under different overdue performance periods; based on the accuracy rate and the recall rate, screening out a target overdue performance period from the different overdue performance periods and using it as a label; and calibrating the sample data according to the screened-out label.
- 2. The method for determining a label for a risk assessment model according to claim 1, characterized in that the step of screening out a target overdue performance period from the different overdue performance periods based on the accuracy rate and the recall rate specifically comprises: building a statistical table containing each of the different overdue performance periods together with its corresponding accuracy rate and recall rate; and screening the target overdue performance period according to the statistical table.
- 3. The method for determining a label for a risk assessment model according to any one of claims 1-2, characterized in that the step of screening the target overdue performance period according to the statistical table specifically comprises: screening out from the statistical table the overdue performance period whose accuracy rate is greater than a first threshold and whose recall rate is greater than a second threshold, as the target overdue performance period.
- 4. The method for determining a label for a risk assessment model according to any one of claims 1-3, characterized in that the step of calculating the accuracy rate of the sample data under different overdue performance periods specifically comprises: calculating the accuracy rate of the sample data under different overdue performance periods according to the following formula: a = (TP + TN) / (TP + FP + TN + FN); wherein a denotes the accuracy rate, TP indicates that a sample is predicted to be a good customer and is actually a good customer, FP indicates that a sample is predicted to be a good customer but is actually a bad customer, TN indicates that a sample is predicted to be a bad customer and is actually a bad customer, and FN indicates that a sample is predicted to be a bad customer but is actually a good customer.
- 5. The method for determining a label for a risk assessment model according to any one of claims 1-4, characterized in that the step of obtaining the recall rate of the sample data under different overdue performance periods specifically comprises: calculating the recall rate of the sample data under different overdue performance periods according to the following formula: b = TP / (TP + FN); wherein b denotes the recall rate, TP indicates that a sample is predicted to be a good customer and is actually a good customer, FP indicates that a sample is predicted to be a good customer but is actually a bad customer, TN indicates that a sample is predicted to be a bad customer and is actually a bad customer, and FN indicates that a sample is predicted to be a bad customer but is actually a good customer.
- 6. The method for determining a label for a risk assessment model according to any one of claims 1-5, characterized in that the different overdue performance periods comprise an installment period number and a number of days overdue.
- 7. The method for determining a label for a risk assessment model according to any one of claims 1-6, characterized in that the step of obtaining sample data to be calibrated specifically comprises: retrieving the sample data from a database; and/or retrieving the sample data from a third-party lending platform.
- 8. A device for determining a label for a risk assessment model, characterized by comprising: a first acquisition unit, configured to obtain sample data to be calibrated; a second acquisition unit, configured to obtain an accuracy rate and a recall rate of the sample data under different overdue performance periods; a processing unit, configured to screen out a target overdue performance period from the different overdue performance periods based on the accuracy rate and the recall rate, and to use it as a label; and a calibration unit, configured to calibrate the sample data according to the screened-out label.
- 9. An electronic device, comprising: a processor; and a memory storing computer-executable instructions which, when executed, cause the processor to perform the method according to any one of claims 1-7.
- 10. A computer-readable storage medium, wherein the computer-readable storage medium stores one or more programs which, when executed by a processor, implement the method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910578914.9A CN110348993B (en) | 2019-06-28 | 2019-06-28 | Determination method and determination device for label for wind assessment model and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110348993A true CN110348993A (en) | 2019-10-18 |
CN110348993B CN110348993B (en) | 2023-12-22 |
Family
ID=68177378
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910578914.9A Active CN110348993B (en) | 2019-06-28 | 2019-06-28 | Determination method and determination device for label for wind assessment model and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110348993B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||