CN110348993B - Determination method and determination device for label for wind assessment model and electronic equipment - Google Patents


Info

Publication number
CN110348993B
Authority
CN
China
Prior art keywords
overdue
sample data
presentation
accuracy
recall
Prior art date
Legal status
Active
Application number
CN201910578914.9A
Other languages
Chinese (zh)
Other versions
CN110348993A (en)
Inventor
熊庄
苏绥绥
常富洋
Current Assignee
Beijing Qiyu Information Technology Co Ltd
Original Assignee
Beijing Qiyu Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Qiyu Information Technology Co Ltd
Priority to CN201910578914.9A
Publication of CN110348993A
Application granted
Publication of CN110348993B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Technology Law (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a method and a device for determining a label for a risk assessment model, and an electronic device. The method comprises the following steps: acquiring sample data to be labeled; obtaining the accuracy and recall of the sample data under different overdue performance periods; selecting a target overdue performance period from the different overdue performance periods based on the accuracy and recall, and using it as the label; and labeling the sample data according to the selected label. With this technical solution, the sample data can be classified and labeled promptly and accurately without waiting for them to fully perform, which improves the timeliness of the sample data.

Description

Determination method and determination device for label for wind assessment model and electronic equipment
Technical Field
The present invention relates to the field of internet finance, and in particular to a method for determining a label for a risk assessment model, a device for determining a label for a risk assessment model, an electronic device, and a computer-readable storage medium.
Background
With rapid economic development, consumer credit is attracting more and more attention. Credit card spending, personal car loans, student loans, small consumer loans and other forms of personal consumer lending are all growing, and growing fast. This rapid growth requires every credit provider to have a sound credit risk management system. To this end, credit providers use risk assessment models to predict and control business risk, and the validity of the risk assessment model directly affects the accuracy of the risk assessment results.
At present, the sample data used to build a risk assessment model generally lag in timeliness. This is especially true for loans with a long term, such as 12 months: one must wait more than a year for the samples to fully perform before their labels can be determined, that is, before each sample can be marked as good or bad and the risk assessment model can be built from them.
Disclosure of Invention
The invention aims to solve the problem that sample data used in risk assessment modeling cannot be classified and labeled in time, so that their timeliness is poor.
To solve the above technical problem, a first aspect of the present invention provides a method for determining a label for a risk assessment model. The method comprises the following steps: acquiring sample data to be labeled; obtaining the accuracy and recall of the sample data under different overdue performance periods; selecting a target overdue performance period from the different overdue performance periods based on the accuracy and recall, and using it as the label; and labeling the sample data according to the selected label.
In this technical solution, the method acquires the sample data to be labeled and their accuracy and recall under different overdue performance periods. It then selects a suitable label from those periods according to the accuracy and recall, and uses that label to label the sample data. The sample data therefore no longer need to fully perform before they can be labeled. They can be classified promptly and accurately, which improves their timeliness.
In the above technical solution, preferably, selecting the target overdue performance period from the different overdue performance periods based on the accuracy and recall specifically includes two steps. First, construct a statistical table containing each of the different overdue performance periods together with its accuracy and recall. Second, select the target overdue performance period according to the statistical table.
In this technical solution, the statistical table makes it easy to compare the accuracy and recall of the different periods, so that the target overdue performance period can be selected accurately.
In any of the above technical solutions, preferably, selecting the target overdue performance period according to the statistical table includes selecting from the table the overdue performance period whose accuracy is greater than a first threshold and whose recall is greater than a second threshold, and using it as the target overdue performance period.
In this technical solution, a higher accuracy generally comes with a lower recall, and a higher recall with a lower accuracy. Comparing the accuracy and recall in the statistical table therefore allows an overdue performance period with relatively high accuracy and relatively high recall to be selected as the label. This keeps the selected label reasonable and maximizes the accuracy of the subsequent classification and labeling of the sample data.
In any of the above technical solutions, preferably, obtaining the accuracy of the sample data under different overdue performance periods specifically includes calculating the accuracy according to the formula a = (TP + TN) / (TP + FP + TN + FN). Here a is the accuracy, and TP, FP, TN and FN are defined as follows:
TP: the sample is predicted to be good and is actually good.
FP: the sample is predicted to be good but is actually bad.
TN: the sample is predicted to be bad and is actually bad.
FN: the sample is predicted to be bad but is actually good.
In any of the above technical solutions, preferably, obtaining the recall of the sample data under different overdue performance periods specifically includes calculating the recall according to the formula b = TP / (TP + FN). Here b is the recall, and TP, FP, TN and FN are as defined above.
In any of the above technical solutions, preferably, each overdue performance period is specified by an installment number and a number of days overdue, for example 30 days overdue at the 6th installment.
In any of the above technical solutions, preferably, acquiring the sample data to be labeled specifically includes retrieving the sample data from a database and/or from a third-party lending platform.
In any of the above technical solutions, preferably, the method further includes performing machine-learning training on the labeled sample data to construct a risk assessment model.
In this technical solution, the risk assessment model is built from sample data that are timely and accurately classified. This makes it possible to obtain a better model and improves the accuracy of the model's assessment results.
To solve the above technical problem, a second aspect of the present invention provides a device for determining a label for a risk assessment model. The device includes the following units:
a first acquisition unit for acquiring sample data to be labeled;
a second acquisition unit for obtaining the accuracy and recall of the sample data under different overdue performance periods;
a processing unit for selecting a target overdue performance period from the different overdue performance periods based on the accuracy and recall, and using it as the label;
a calibration unit for labeling the sample data according to the selected label.
In this technical solution, the device acquires the sample data to be labeled and their accuracy and recall under different overdue performance periods. It then selects a suitable label from those periods according to the accuracy and recall, and uses that label to label the sample data. The sample data therefore no longer need to fully perform before they can be labeled. They can be classified promptly and accurately, which improves their timeliness.
In any of the above technical solutions, preferably, the processing unit includes two sub-units. A statistical table construction unit constructs a statistical table containing each of the different overdue performance periods together with its accuracy and recall. A screening unit selects the target overdue performance period according to the statistical table.
In this technical solution, the statistical table makes it easy to compare the accuracy and recall of the different periods, so that the target overdue performance period can be selected accurately.
In any of the above technical solutions, preferably, the screening unit is specifically configured to select from the statistical table the overdue performance period whose accuracy is greater than the first threshold and whose recall is greater than the second threshold, and to use it as the target overdue performance period.
In this technical solution, a higher accuracy generally comes with a lower recall, and a higher recall with a lower accuracy. Comparing the accuracy and recall in the statistical table therefore allows an overdue performance period with relatively high accuracy and relatively high recall to be selected as the label. This keeps the selected label reasonable and maximizes the accuracy of the subsequent classification and labeling of the sample data.
In any of the above technical solutions, preferably, the second acquisition unit is specifically configured to calculate the accuracy of the sample data under different overdue performance periods according to the formula a = (TP + TN) / (TP + FP + TN + FN). Here a is the accuracy, and TP, FP, TN and FN are defined as follows:
TP: the sample is predicted to be good and is actually good.
FP: the sample is predicted to be good but is actually bad.
TN: the sample is predicted to be bad and is actually bad.
FN: the sample is predicted to be bad but is actually good.
In any of the above technical solutions, preferably, the second acquisition unit is specifically configured to calculate the recall of the sample data under different overdue performance periods according to the formula b = TP / (TP + FN). Here b is the recall, and TP, FP, TN and FN are as defined above.
In any of the above technical solutions, preferably, each overdue performance period is specified by an installment number and a number of days overdue, for example 30 days overdue at the 6th installment.
In any of the above technical solutions, preferably, the first acquisition unit is specifically configured to retrieve the sample data from a database and/or from a third-party lending platform.
In any of the above technical solutions, preferably, the device further includes a model construction unit for performing machine-learning training on the labeled sample data to construct a risk assessment model.
In this technical solution, the risk assessment model is built from sample data that are timely and accurately classified. This makes it possible to obtain a better model and improves the accuracy of the model's assessment results.
To solve the above technical problem, a third aspect of the present invention provides an electronic device that includes a processor and a memory. The memory stores computer-executable instructions that, when executed, cause the processor to perform the method of any of the above technical solutions.
To solve the above technical problem, a fourth aspect of the present invention provides a computer-readable storage medium storing one or more programs. When executed by a processor, the programs implement the method of any of the above technical solutions.
The invention calculates the accuracy and recall of the sample data under different overdue performance periods. It then selects a suitable label from those periods according to the accuracy and recall, and uses that label to label the sample data.
Drawings
Specific embodiments of the present invention are described in detail below with reference to the accompanying drawings, to make clearer the technical problems solved, the technical means adopted and the technical effects achieved. Note, however, that the drawings described below merely illustrate exemplary embodiments of the present invention. Those skilled in the art may derive other embodiments from these drawings without inventive effort.
FIG. 1 shows a schematic flowchart of a method for determining a label for a risk assessment model according to an embodiment of the present invention;
FIG. 2 shows a schematic block diagram of a device for determining a label for a risk assessment model according to an embodiment of the present invention;
FIG. 3 shows a schematic block diagram of an electronic device according to an embodiment of the present invention;
FIG. 4 shows a schematic block diagram of a computer-readable storage medium according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments can be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these exemplary embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the invention to those skilled in the art. The same reference numerals in the drawings denote the same or similar elements, components or portions, and thus a repetitive description thereof will be omitted.
Features, structures, characteristics or other details described in a particular embodiment may be combined in any suitable manner in one or more other embodiments without departing from the technical idea of the invention.
In the description of specific embodiments, features, structures, characteristics and other details are given so that those skilled in the art can fully understand the embodiments. However, those skilled in the art may also practice the present invention without one or more of these specific features, structures, characteristics or other details.
The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The same reference numerals in the drawings denote the same or similar elements, components or portions, so repeated descriptions of them may be omitted below. Although the terms first, second, third, etc. may be used herein to describe various devices, elements, components or portions, these devices, elements, components or portions should not be limited by these terms. The terms serve only to distinguish one item from another. For example, a first device may also be called a second device without departing from the spirit of the invention. Furthermore, the term "and/or" includes any and all combinations of one or more of the associated listed items.
Labeling samples for modeling usually runs into the following problem. When the loan term is long, for example 12 months, the sample data need at least a year to fully perform, that is, for all the "bad" borrowers to reveal themselves. The model would therefore have to be built on samples that are at least a year old, and such a model is clearly worse. A better model requires training on timely sample data. Because it is not yet known whether those samples are truly good or bad, a label is needed to predict and classify them. Accuracy and recall are introduced to find a suitable label:
accuracy is the proportion of correctly predicted samples among all samples;
recall is the proportion of actually good samples that are correctly predicted as good.
The two generally trade off: the higher the accuracy, the lower the recall, and vice versa. A suitable label is therefore selected using the accuracy and recall under different overdue performance periods, and the sample data are predicted and classified with it to improve their timeliness. The specific label-determination process is shown in FIG. 1 and includes the following steps.
Step S102: acquire the sample data to be labeled.
The sample data may be retrieved from a database or from a third-party lending platform, for example information in a lending app on a client. The sample data from the database and from the third-party lending platform may also be combined so that the sample data are comprehensive.
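A minimal Python sketch of the "and/or" data-source step follows. Several names here are hypothetical: the `LoanSample` record (a loan id plus days overdue per installment) and the function name `collect_samples`. The rule that the database record wins when both sources contain the same loan is also an assumption. The source only says that the two sources may be used separately or combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoanSample:
    """Hypothetical sample record."""
    loan_id: str
    days_overdue: tuple  # days_overdue[i] = days the (i+1)-th installment was overdue


def collect_samples(db_rows=None, platform_rows=None):
    """Combine samples from a database and/or a third-party lending platform.

    Either source may be None or empty. When a loan_id appears in both,
    the database record is kept (assumed conflict rule).
    """
    merged = {}
    for row in platform_rows or []:
        merged[row.loan_id] = row
    for row in db_rows or []:
        merged[row.loan_id] = row  # database record overrides the platform record
    return list(merged.values())


db = [LoanSample("A", (0, 0)), LoanSample("B", (5,))]
platform = [LoanSample("B", (0,)), LoanSample("C", (1,))]
samples = collect_samples(db, platform)
print(sorted(s.loan_id for s in samples))  # ['A', 'B', 'C']
```

Keying on the loan id means that combining both sources never double-counts a loan. Without that, a duplicated loan would be double-counted and would skew the accuracy and recall computed in later steps.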
Step S104: obtain the accuracy and recall of the sample data under different overdue performance periods. Each overdue performance period may be specified by an installment number and a number of days overdue, for example a given number of days overdue at the 6th installment.
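The sketch below labels a single sample under one overdue performance period, to make the two-part definition concrete. It assumes a per-installment list of days overdue. A sample counts as bad if any of its first k installments reached d days overdue. The source does not say whether only the k-th installment is checked or every installment up to k, so this rule is an assumption. The names `OverduePeriod` and `is_bad` are also assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class OverduePeriod:
    """One candidate period, e.g. OverduePeriod(6, 30) = 30 days overdue at the 6th installment."""
    installment: int  # number of installments that must have come due
    min_days: int     # days overdue at or above which the sample counts as bad


def is_bad(days_overdue: Sequence[int], period: OverduePeriod) -> Optional[bool]:
    """Label one sample under `period` (assumed rule, see lead-in).

    days_overdue[i] is the number of days the (i+1)-th installment was overdue.
    Returns None if the sample has not yet reached `period.installment`.
    """
    if len(days_overdue) < period.installment:
        return None  # not yet observable under this period
    return any(d >= period.min_days for d in days_overdue[: period.installment])


history = [0, 0, 0, 0, 3, 10]  # hypothetical loan with 6 installments due so far
print([is_bad(history, OverduePeriod(6, d)) for d in (1, 7, 15, 30)])  # [True, True, False, False]
```

Take a 6-installment history whose worst delinquency is 10 days. Only the 1-day and 7-day definitions mark it as bad. Stricter day thresholds label fewer samples as bad, and this is why each candidate period gets its own accuracy and recall.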
In formulas (1) and (2), TP, FP, TN and FN are defined as follows:
TP: the sample is predicted to be good and is actually good.
FP: the sample is predicted to be good but is actually bad.
TN: the sample is predicted to be bad and is actually bad.
FN: the sample is predicted to be bad but is actually good.
Formula (1) gives the accuracy a of the sample data under each overdue performance period:
a = (TP + TN) / (TP + FP + TN + FN)    (1)
Formula (2) gives the recall b of the sample data under each overdue performance period:
b = TP / (TP + FN)    (2)
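Formulas (1) and (2) can be computed directly from paired good/bad judgments. The source does not say what serves as the "actual" outcome. This sketch assumes one plausible reading: take historical loans that have already fully performed, treat the early label from a candidate overdue performance period as the prediction, and treat the final outcome as the truth. Following the document's definitions, "positive" means good. Recall is therefore the share of actually good samples that the candidate period also classifies as good. The function name and the sample values are hypothetical.

```python
from typing import Sequence, Tuple


def accuracy_and_recall(predicted_good: Sequence[bool], actual_good: Sequence[bool]) -> Tuple[float, float]:
    """Return (a, b) per formulas (1) and (2), with "positive" meaning good.

    TP = predicted good & actually good, FP = predicted good & actually bad,
    TN = predicted bad & actually bad,   FN = predicted bad & actually good.
    """
    if len(predicted_good) != len(actual_good):
        raise ValueError("predicted_good and actual_good must have the same length")
    tp = fp = tn = fn = 0
    for pred, actual in zip(predicted_good, actual_good):
        if pred and actual:
            tp += 1
        elif pred:          # predicted good, actually bad
            fp += 1
        elif not actual:    # predicted bad, actually bad
            tn += 1
        else:               # predicted bad, actually good
            fn += 1
    total = tp + fp + tn + fn
    a = (tp + tn) / total if total else 0.0  # formula (1)
    b = tp / (tp + fn) if tp + fn else 0.0   # formula (2)
    return a, b


predicted = [True, True, False, False, True]  # made-up early-label judgments (True = good)
actual = [True, False, False, True, True]     # made-up final outcomes (True = good)
a, b = accuracy_and_recall(predicted, actual)
print(a, b)  # 0.6 0.6666666666666666
```

The five made-up samples give 2 TP, 1 FP, 1 TN and 1 FN. So a = 3/5 and b = 2/3.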
Step S106: select a target overdue performance period from the different overdue performance periods based on the accuracy and recall, and use it as the label.
The selection proceeds in two steps:
construct a statistical table containing each of the different overdue performance periods together with its accuracy and recall;
select from the table the overdue performance period whose accuracy is greater than a first threshold and whose recall is greater than a second threshold, and use it as the target overdue performance period.
A higher accuracy generally comes with a lower recall, and vice versa. Comparing the accuracy and recall in the table therefore allows a period with relatively high values of both to be selected as the label. This keeps the selected label reasonable and maximizes the accuracy of the subsequent classification and labeling of the sample data.
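The statistical table and the two-threshold screen can be sketched as follows. The names `TableRow`, `build_table` and `screen` are made up for illustration, and so are the accuracy/recall figures in the usage lines. Only the strict "accuracy greater than the first threshold, recall greater than the second threshold" rule comes from the text. The sketch returns every qualifying period in table order. The choice among several qualifying periods is left to the caller, because the source does not specify a tie-break.

```python
from typing import Callable, Iterable, List, NamedTuple, Tuple


class TableRow(NamedTuple):
    period: str      # overdue performance period, e.g. "30 days" at the 6th installment
    accuracy: float  # a, formula (1)
    recall: float    # b, formula (2)


def build_table(periods: Iterable[str], evaluate: Callable[[str], Tuple[float, float]]) -> List[TableRow]:
    """Statistical table: one row per candidate period with its (accuracy, recall)."""
    return [TableRow(p, *evaluate(p)) for p in periods]


def screen(table: List[TableRow], first_threshold: float, second_threshold: float) -> List[str]:
    """Periods with accuracy > first_threshold and recall > second_threshold, in table order."""
    return [r.period for r in table if r.accuracy > first_threshold and r.recall > second_threshold]


stats = {  # made-up (accuracy, recall) per period at the 6th installment
    "1 day": (0.60, 0.95),
    "7 days": (0.72, 0.91),
    "15 days": (0.80, 0.88),
    "30 days": (0.88, 0.86),
}
table = build_table(stats, lambda p: stats[p])
print(screen(table, 0.85, 0.85))  # ['30 days']
```

The comparisons are strict, so a period whose accuracy exactly equals the first threshold is rejected. Passing `evaluate` in as a function keeps the table construction independent of how accuracy and recall are actually measured.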
The following example uses different numbers of days overdue at the 6th installment. Table 1 shows the accuracy and recall for the overdue performance periods of 1, 7, 15 and 30 days overdue at the 6th installment:
Overdue performance period | Accuracy | Recall
1 day overdue at the 6th installment | a1 | b1
7 days overdue at the 6th installment | a2 | b2
15 days overdue at the 6th installment | a3 | b3
30 days overdue at the 6th installment | a4 | b4
Table 1
Compare the accuracy and recall of each overdue performance period in Table 1. Suppose that for 30 days overdue at the 6th installment both the accuracy a4 and the recall b4 are high. Then 30 days overdue at the 6th installment may be selected as the label, and samples that are 30 days overdue at the 6th installment are defined as bad. The 6th installment is used here only as an example; the overdue performance period is not limited to it.
Step S108: label the sample data according to the selected label.
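Step S108 can be sketched as a single pass over the samples. The following are assumptions: the dictionary layout (a hypothetical loan id mapped to days overdue per installment), the function name `label_samples`, and the rule that a sample is bad if any of its first `installment` installments reached `min_days` days overdue. Samples that have not yet reached the selected installment are simply left unlabeled here.

```python
from typing import Dict, Sequence


def label_samples(samples: Dict[str, Sequence[int]], installment: int, min_days: int) -> Dict[str, int]:
    """Label each sample 1 (bad) or 0 (good) under the selected overdue performance period.

    samples maps a hypothetical loan id to days overdue per installment.
    Samples with fewer than `installment` installments due are omitted.
    """
    labels = {}
    for loan_id, days_overdue in samples.items():
        if len(days_overdue) < installment:
            continue  # too recent to be labeled under this period
        labels[loan_id] = int(any(d >= min_days for d in days_overdue[:installment]))
    return labels


samples = {
    "A": [0, 0, 0, 0, 0, 45],  # 6 installments due, 45 days overdue on the 6th
    "B": [0, 0, 0, 0, 0, 0],
    "C": [0, 0, 2],            # only 3 installments due so far
}
print(label_samples(samples, installment=6, min_days=30))  # {'A': 1, 'B': 0}
```

Loans "A" and "B" can already be labeled once their 6th installment has come due, even if the loan term is 12 months. That is the timeliness gain the method aims at. Loan "C" has not reached the 6th installment and is skipped.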
In this embodiment, the method acquires the sample data to be labeled and their accuracy and recall under different overdue performance periods. It then selects a suitable label from those periods according to the accuracy and recall, and uses that label to label the sample data. The sample data therefore no longer need to fully perform before they can be labeled. They can be classified promptly and accurately, which improves their timeliness.
Further, the method also includes performing machine-learning training on the labeled sample data to construct a risk assessment model. The model is built from sample data that are timely and accurately classified. This makes it possible to obtain a better model and improves the accuracy of the model's assessment results.
Those skilled in the art will appreciate that all or some of the steps of the above embodiments may be implemented as a program (computer program) executed by a computer data processing device. Executing the computer program carries out the above method provided by the present invention. The computer program may be stored in a computer-readable storage medium, for example:
a magnetic disk, an optical disk, a ROM or a RAM;
a storage array composed of multiple storage media, such as a magnetic disk or tape storage array.
The storage medium is not limited to centralized storage. It may also be distributed storage, such as cloud storage based on cloud computing.
Device embodiments of the invention, which may be used to perform the method embodiments, are described below. Details described in the device embodiments should be regarded as supplements to the method embodiments above. For details not disclosed in the device embodiments, refer to the method embodiments above.
As with the method, when the loan term is long, for example 12 months, the samples need at least a year to fully perform. A model built only on fully performed samples therefore relies on data that are at least a year old. To train on timely samples whose true good/bad status is not yet known, a suitable label is selected using the accuracy and recall under different overdue performance periods. Here, accuracy is the proportion of correctly predicted samples among all samples, and recall is the proportion of actually good samples that are correctly predicted as good. The two generally trade off against each other. As shown in FIG. 2, the device 200 for determining a label for a risk assessment model includes a first acquisition unit 202, a second acquisition unit 204, a processing unit 206 and a calibration unit 208.
The first acquisition unit 202 is configured to acquire the sample data to be labeled. The sample data may be retrieved from a database or from a third-party lending platform, for example information in a lending app on a client. The data from both sources may also be combined so that the sample data are comprehensive.
The second acquisition unit 204 is configured to obtain the accuracy and recall of the sample data under different overdue performance periods. Each overdue performance period may be specified by an installment number and a number of days overdue, for example a given number of days overdue at the 6th installment.
In formulas (1) and (2) below, TP, FP, TN and FN are defined as follows:
TP: the sample is predicted to be good and is actually good.
FP: the sample is predicted to be good but is actually bad.
TN: the sample is predicted to be bad and is actually bad.
FN: the sample is predicted to be bad but is actually good.
The second acquisition unit 204 calculates the accuracy a of the sample data under each overdue performance period according to formula (1):
a = (TP + TN) / (TP + FP + TN + FN)    (1)
The second acquisition unit 204 calculates the recall b of the sample data under each overdue performance period according to formula (2):
b = TP / (TP + FN)    (2)
The processing unit 206 is configured to select a target overdue performance period from the different overdue performance periods based on the accuracy and recall, and to use it as the label.
The processing unit 206 includes a statistical table construction unit 2062 and a screening unit 2064:
the statistical table construction unit 2062 constructs a statistical table containing each of the different overdue performance periods together with its accuracy and recall;
the screening unit 2064 selects from the table the overdue performance period whose accuracy is greater than the first threshold and whose recall is greater than the second threshold, and uses it as the target overdue performance period.
A higher accuracy generally comes with a lower recall, and vice versa. Comparing the accuracy and recall in the table therefore allows a period with relatively high values of both to be selected as the label. This keeps the selected label reasonable and maximizes the accuracy of the subsequent classification and labeling of the sample data.
The following example uses different numbers of days overdue at the 6th installment. Table 1 shows the accuracy and recall for the overdue performance periods of 1, 7, 15 and 30 days overdue at the 6th installment:
Overdue presentation period           Accuracy   Recall
6th installment, 1 day overdue        a1         b1
6th installment, 7 days overdue       a2         b2
6th installment, 15 days overdue      a3         b3
6th installment, 30 days overdue      a4         b4
Table 1
The accuracy and recall of each overdue presentation period in Table 1 are compared. If the accuracy a4 and recall b4 for 30 days overdue at the 6th installment are both high, "30 days overdue at the 6th installment" may be selected as the label, and sample data that are 30 days overdue at the 6th installment are defined as bad persons. The 6th installment is used here only as an example; the overdue presentation period is not limited to it.
The calibration unit 208 is configured to calibrate the sample data according to the selected label.
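Once a label such as "30 days overdue at the 6th installment" has been selected, calibration reduces to a per-sample rule. The sketch below assumes a hypothetical data layout in which each sample records the maximum number of days it was overdue on each installment reached so far. It also assumes that "30 days overdue" means at least 30 days and that samples not yet at the selected installment are left unlabeled. The function `calibrate` and the sample data are illustrative only.

```python
from typing import Dict, Optional


def calibrate(history: Dict[int, int], installment: int, days_overdue: int) -> Optional[int]:
    """Label one sample against the selected overdue presentation period.

    history: hypothetical mapping from installment number to the maximum number
    of days the sample was overdue on that installment.
    Returns 1 (bad person), 0 (good person), or None when the sample has not yet
    reached the selected installment and so cannot be labeled.
    """
    if installment not in history:
        return None
    # Assumption: "30 days overdue" means at least 30 days overdue.
    return 1 if history[installment] >= days_overdue else 0


samples = {
    "s1": {1: 0, 6: 35},  # 35 days overdue on installment 6
    "s2": {1: 3, 6: 10},  # only 10 days overdue on installment 6
    "s3": {1: 0, 2: 0},  # has not reached installment 6 yet
}
labels = {sid: calibrate(h, installment=6, days_overdue=30) for sid, h in samples.items()}
print(labels)
```

A sample only needs to have reached the 6th installment to be labeled, not to have completed its full loan term, which is the timeliness benefit the text describes.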
In this embodiment, the sample data to be calibrated are acquired together with their accuracy and recall under different overdue presentation periods, and a suitable label is selected from those periods according to the accuracy and recall and used to calibrate the sample data. Because the sample data no longer need to complete their full presentation period before being calibrated, they can be labeled in a timely and accurate manner, which improves the timeliness of the sample data.
Further, the label determination device 200 for the risk assessment ("wind assessment") model also includes a model construction unit 210, configured to perform machine learning training on the calibrated sample data to construct the risk assessment model. Because the model is built from timely and accurately classified sample data, a better model can be obtained and the accuracy of its assessment results is improved.
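The text does not say which learning algorithm the model construction unit 210 uses. Purely as an illustration, the sketch below swaps in plain logistic regression, a common choice for credit scoring, trained by batch gradient descent on the calibrated labels (1 = bad person). It is not the patent's method. The function names, learning rate, epoch count and the single-feature data are all hypothetical.

```python
import math
from typing import List


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_bad_probability(w: List[float], x: List[float]) -> float:
    """Probability that a sample with features x is a bad person (w[0] is the bias)."""
    return _sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))


def train_logistic(
    features: List[List[float]], labels: List[int], lr: float = 0.1, epochs: int = 500
) -> List[float]:
    """Fit logistic-regression weights (bias first) by batch gradient descent on log-loss.

    labels: calibrated sample labels, 1 = bad person, 0 = good person.
    """
    n = len(features)
    w = [0.0] * (len(features[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(features, labels):
            err = predict_bad_probability(w, x) - y  # d(log-loss)/dz
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        # Average the gradient over the batch and take one descent step.
        w = [wi - lr * gi / n for wi, gi in zip(w, grad)]
    return w


# Made-up single feature and calibrated labels.
X = [[0.0], [0.2], [0.8], [1.0]]
y = [0, 0, 1, 1]
weights = train_logistic(X, y)
print([round(predict_bad_probability(weights, x), 3) for x in X])
```

In practice a library implementation with regularization and proper validation would replace this loop; the point is only that the calibrated labels feed directly into supervised training.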
It will be appreciated by those skilled in the art that the modules in the embodiments of the apparatus described above may be distributed in an apparatus as described, or may be distributed in one or more apparatuses different from the embodiments described above with corresponding changes. The modules of the above embodiments may be combined into one module, or may be further split into a plurality of sub-modules.
The following describes an embodiment of an electronic device according to the present invention, which may be regarded as a specific physical implementation of the above-described embodiment of the method and apparatus according to the present invention. Details described in relation to the embodiments of the electronic device of the present invention should be considered as additions to the embodiments of the method or apparatus described above; for details not disclosed in the embodiments of the electronic device of the present invention, reference may be made to the above-described method or apparatus embodiments.
Fig. 3 is a block diagram of an exemplary embodiment of an electronic device according to the present invention. An electronic device 300 according to this embodiment of the present invention is described below with reference to fig. 3. The electronic device 300 shown in fig. 3 is merely an example and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.
As shown in fig. 3, the electronic device 300 is embodied in the form of a general purpose computing device. Components of electronic device 300 may include, but are not limited to: at least one processing unit 310, at least one memory unit 320, a bus 330 connecting the different system components (including the memory unit 320 and the processing unit 310), a display unit 340, and the like.
The memory unit 320 stores program code that can be executed by the processing unit 310, so that the processing unit 310 performs the steps according to the various exemplary embodiments of the present invention described in the method section above. For example, the processing unit 310 may perform the steps shown in Fig. 1.
The memory unit 320 may include readable media in the form of volatile memory units, such as Random Access Memory (RAM) 3201 and/or cache memory 3202, and may further include Read Only Memory (ROM) 3203.
The storage unit 320 may also include a program/utility 3204 having a set (at least one) of program modules 3205, such program modules 3205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
Bus 330 may be one or more of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 300 may also communicate with one or more external devices 400 (e.g., a keyboard, a pointing device, or a Bluetooth device), with one or more devices that enable a user to interact with the electronic device 300, and/or with any device (e.g., a router or a modem) that enables the electronic device 300 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 350. The electronic device 300 may also communicate with one or more networks, such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet, through a network adapter 360. The network adapter 360 may communicate with the other modules of the electronic device 300 via the bus 330. It should be appreciated that, although not shown, other hardware and/or software modules may be used in connection with the electronic device 300, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
From the above description of the embodiments, those skilled in the art will readily appreciate that the exemplary embodiments described herein may be implemented in software, or in software combined with the necessary hardware. Thus, the technical solution according to the embodiments of the present invention may be embodied in the form of a software product, which may be stored on a computer readable storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and which includes several instructions that cause a computing device (such as a personal computer, a server, or a network device) to perform the above-described method according to the present invention. When the computer program is executed by a data processing device, the computer readable medium enables the device to carry out the above-described method of the present invention, namely: acquiring sample data to be calibrated; obtaining the accuracy and recall of the sample data under different overdue presentation periods; screening out a target overdue presentation period from the different overdue presentation periods based on the accuracy and recall and using it as the label; and calibrating the sample data according to the selected label.
Fig. 4 is a schematic diagram of a computer readable storage medium of the present invention. As shown in Fig. 4, the computer program may be stored on one or more computer readable media. The computer readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable storage medium may include a data signal propagated in baseband or as part of a carrier wave, with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable storage medium may also be any readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable storage medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).
In summary, the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that some or all of the functionality of some or all of the components in accordance with embodiments of the present invention may be implemented in practice using a general purpose data processing device such as a microprocessor or Digital Signal Processor (DSP). The present invention can also be implemented as an apparatus or device program (e.g., a computer program and a computer program product) for performing a portion or all of the methods described herein. Such a program embodying the present invention may be stored on a computer readable medium, or may have the form of one or more signals. Such signals may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.
The above-described specific embodiments further describe the objects, technical solutions and advantageous effects of the present invention in detail, and it should be understood that the present invention is not inherently related to any particular computer, virtual device or electronic apparatus, and various general-purpose devices may also implement the present invention. The foregoing description of the embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims (7)

1. A method for determining a label for a risk assessment model, comprising the following steps:
acquiring sample data to be calibrated;
obtaining the accuracy and recall of the sample data under different overdue presentation periods, comprising:
calculating the accuracy of the sample data under each overdue presentation period according to the following formula:
a=(TP+TN)/(TP+FP+TN+FN)
wherein a represents the accuracy, TP represents the number of samples predicted to be good persons that are actually good persons, FP represents the number predicted to be good persons that are actually bad persons, TN represents the number predicted to be bad persons that are actually bad persons, and FN represents the number predicted to be bad persons that are actually good persons;
calculating the recall of the sample data under each overdue presentation period according to the following formula:
b=TP/(TP+FN);
wherein b represents the recall, and TP, FP, TN and FN have the meanings given above;
constructing a statistical table containing each of the different overdue presentation periods and its corresponding accuracy and recall;
screening out from the statistical table, as target overdue presentation periods, the overdue presentation periods whose accuracy is greater than a first threshold and whose recall is greater than a second threshold;
comparing the accuracy and recall of each target overdue presentation period in the statistical table, and selecting the target overdue presentation period with relatively high accuracy and recall as the label;
and calibrating the sample data according to the selected label without waiting for the sample data to complete their presentation.
2. The method for determining a label for a risk assessment model according to claim 1, further comprising:
performing machine learning training on the calibrated sample data to construct the risk assessment model.
3. The method of claim 1, wherein each of the different overdue presentation periods is defined by an installment number and a number of days overdue.
4. The method for determining a label for a risk assessment model according to any one of claims 1 to 3, wherein acquiring the sample data to be calibrated comprises:
retrieving the sample data from a database; and/or
retrieving the sample data from a third-party lending platform.
5. A label determination device for a risk assessment model, comprising:
a first acquisition unit, configured to acquire sample data to be calibrated;
a second obtaining unit, configured to obtain the accuracy and recall of the sample data under different overdue presentation periods by:
calculating the accuracy of the sample data under each overdue presentation period according to the following formula:
a=(TP+TN)/(TP+FP+TN+FN);
wherein a represents the accuracy, TP represents the number of samples predicted to be good persons that are actually good persons, FP represents the number predicted to be good persons that are actually bad persons, TN represents the number predicted to be bad persons that are actually bad persons, and FN represents the number predicted to be bad persons that are actually good persons;
calculating the recall of the sample data under each overdue presentation period according to the following formula:
b=TP/(TP+FN);
wherein b represents the recall, and TP, FP, TN and FN have the meanings given above;
a processing unit, configured to screen out a target overdue presentation period from the different overdue presentation periods based on the accuracy and recall and to use it as the label, comprising: a statistical table construction unit, configured to construct a statistical table containing each of the different overdue presentation periods and its corresponding accuracy and recall; and a screening unit, configured to screen out from the statistical table, as target overdue presentation periods, the overdue presentation periods whose accuracy is greater than a first threshold and whose recall is greater than a second threshold, to compare the accuracy and recall of each target overdue presentation period in the statistical table, and to select the target overdue presentation period with relatively high accuracy and recall as the label;
and a calibration unit, configured to calibrate the sample data according to the selected label without waiting for the sample data to complete their presentation.
6. An electronic device, wherein the electronic device comprises:
a processor; and
a memory storing computer executable instructions that, when executed, cause the processor to perform the method of any of claims 1-4.
7. A computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement the method of any of claims 1-4.
CN201910578914.9A 2019-06-28 2019-06-28 Determination method and determination device for label for wind assessment model and electronic equipment Active CN110348993B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910578914.9A CN110348993B (en) 2019-06-28 2019-06-28 Determination method and determination device for label for wind assessment model and electronic equipment

Publications (2)

Publication Number Publication Date
CN110348993A CN110348993A (en) 2019-10-18
CN110348993B true CN110348993B (en) 2023-12-22

Family

ID=68177378

Country Status (1)

Country Link
CN (1) CN110348993B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103699628A (en) * 2013-12-20 2014-04-02 北京百度网讯科技有限公司 Multiple tag obtaining method and device
CN107909097A (en) * 2017-11-08 2018-04-13 阿里巴巴集团控股有限公司 The update method and device of sample in sample storehouse
CN108595497A (en) * 2018-03-16 2018-09-28 北京达佳互联信息技术有限公司 Data screening method, apparatus and terminal
CN109242499A (en) * 2018-09-19 2019-01-18 中国银行股份有限公司 A kind of processing method of transaction risk prediction, apparatus and system
CN109388760A (en) * 2017-08-03 2019-02-26 腾讯科技(北京)有限公司 Recommend label acquisition method, media content recommendations method, apparatus and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130138554A1 (en) * 2011-11-30 2013-05-30 Rawllin International Inc. Dynamic risk assessment and credit standards generation

Also Published As

Publication number Publication date
CN110348993A (en) 2019-10-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant