CN109447461B - User credit evaluation method and device, electronic equipment and storage medium - Google Patents
- Publication number
- CN109447461B (application CN201811260889.1A)
- Authority
- CN
- China
- Prior art keywords
- parameters
- type
- target
- parameter
- class
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0635—Risk analysis of enterprise or organisation activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0609—Buyer or seller confidence or verification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/04—Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/06—Asset management; Financial planning or analysis
Landscapes
- Business, Economics & Management (AREA)
- Engineering & Computer Science (AREA)
- Accounting & Taxation (AREA)
- Finance (AREA)
- Economics (AREA)
- Development Economics (AREA)
- Strategic Management (AREA)
- Human Resources & Organizations (AREA)
- Marketing (AREA)
- General Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Entrepreneurship & Innovation (AREA)
- Technology Law (AREA)
- Operations Research (AREA)
- Game Theory and Decision Science (AREA)
- Educational Administration (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
Abstract
The disclosure relates to a user credit assessment method and device, electronic equipment and a storage medium, and relates to the technical field of internet, wherein the method comprises the following steps: acquiring a plurality of feature information of a target user, wherein the feature information comprises a first type of parameter and a second type of parameter; preprocessing the first type parameters and the second type parameters; converting the preprocessed first-class parameters to generate target parameters; inputting the preprocessed second type parameters and the preprocessed target parameters into a machine learning model to obtain a credit evaluation result of the target user; the characteristic information with the IV value lower than a preset threshold is the first type of parameters, and the characteristic information with the IV value higher than the preset threshold is the second type of parameters. The method and the device can more accurately determine the credit evaluation result of the user and accurately identify the credit risk.
Description
Technical Field
The present disclosure relates to the field of internet technologies, and in particular, to a user credit evaluation method, a user credit evaluation apparatus, an electronic device, and a computer-readable storage medium.
Background
The credit scorecard model is the most common risk scoring model in the financial field, as it balances interpretability and algorithm complexity.
In the related art, the default probability of the user is generally predicted by using parameters with strong financial attributes. However, in most cases, the acquired user data does not have strong financial attributes. Therefore, the amount of data of the strong financial attribute parameters that can be used is small, which may cause inaccurate predicted credit assessment results and limited application range, and may not accurately measure the user risk.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The present disclosure is directed to a user credit assessment method and apparatus, an electronic device, and a storage medium, which overcome, at least to some extent, the problem that a user risk cannot be accurately measured due to limitations and disadvantages of related art.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to an aspect of the present disclosure, there is provided a user credit evaluation method, including: acquiring a plurality of feature information of a target user, wherein the feature information comprises a first type of parameter and a second type of parameter; preprocessing the first type parameters and the second type parameters; converting the preprocessed first-class parameters to generate target parameters; inputting the preprocessed second type parameters and the preprocessed target parameters into a machine learning model to obtain a credit evaluation result of the target user; the characteristic information with the IV value lower than a preset threshold is the first type of parameters, and the characteristic information with the IV value higher than the preset threshold is the second type of parameters.
In an exemplary embodiment of the present disclosure, preprocessing the first class of parameters and the second class of parameters includes: performing binning on the first class of parameters and the second class of parameters respectively through evidence weight values, to obtain binned first-class parameters and binned second-class parameters.
In an exemplary embodiment of the present disclosure, converting the preprocessed first-class parameters to generate the target parameters includes: performing feature combination on the binned first-class parameters associated with each theme by using a linear discriminant algorithm, to generate the target parameters.
In an exemplary embodiment of the present disclosure, the method further comprises: performing secondary binning on the target parameters, and putting the re-binned target parameters into a candidate variable pool; and putting the binned second class of parameters into the candidate variable pool.
In an exemplary embodiment of the disclosure, inputting the preprocessed second class of parameters and the target parameters into a machine learning model includes: removing multicollinearity between the binned second-class parameters and the re-binned target parameters in the candidate variable pool to obtain remaining parameters; and inputting the remaining parameters into the machine learning model.
In an exemplary embodiment of the present disclosure, removing multicollinearity between the binned second-class parameters and the re-binned target parameters in the candidate variable pool to obtain the remaining parameters includes: removing, from the candidate variable pool, the binned second-class parameters and the re-binned target parameters whose evidence weight values are smaller than a preset value, to obtain the remaining parameters.
In an exemplary embodiment of the present disclosure, removing, from the candidate variable pool, the binned second-class parameters and the re-binned target parameters whose evidence weight values are smaller than the preset value includes: removing, in ascending order of evidence weight values, the binned second-class parameters and the re-binned target parameters whose evidence weight values are smaller than the preset value; recalculating the evidence weight values of the remaining binned second-class parameters and re-binned target parameters; and removing again, in ascending order of the recalculated evidence weight values, the parameters whose recalculated evidence weight values are smaller than the preset value, until no second-class parameter or target parameter with an evidence weight value smaller than the preset value remains.
According to an aspect of the present disclosure, there is provided a user credit evaluation apparatus including: the system comprises a characteristic acquisition module, a characteristic acquisition module and a characteristic analysis module, wherein the characteristic acquisition module is used for acquiring a plurality of characteristic information of a target user, and the plurality of characteristic information comprises a first type parameter and a second type parameter; the parameter preprocessing module is used for preprocessing the first type of parameters and the second type of parameters; the target parameter generation module is used for converting the preprocessed first-class parameters to generate target parameters; the evaluation result determining module is used for inputting the preprocessed second type parameters and the preprocessed target parameters into a machine learning model to obtain a credit evaluation result of the target user; the characteristic information with the IV value lower than a preset threshold is the first type of parameters, and the characteristic information with the IV value higher than the preset threshold is the second type of parameters.
According to an aspect of the present disclosure, there is provided an electronic device including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform any of the above described user credit assessment methods via execution of the executable instructions.
According to an aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a user credit assessment method as described in any one of the above.
In the user credit evaluation method, the user credit evaluation apparatus, the electronic device, and the computer-readable storage medium provided in the exemplary embodiments of the present disclosure, on one hand, the preprocessed first class of parameters is converted to generate target parameters, so that the target parameters converted from the first class of parameters can also be used for credit evaluation; this avoids the problems of insufficient data volume and narrow application range caused by using only the second class of parameters for credit evaluation in the related art, and increases both the data volume and the application range. On the other hand, by inputting the preprocessed second class of parameters together with the converted target parameters into a machine learning model, the data volume is increased, so that an accurate credit evaluation result can be obtained based on these parameters and the credit risk of the user can be accurately measured.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.
FIG. 1 is a schematic diagram illustrating a user credit assessment method in an exemplary embodiment of the disclosure;
FIG. 2 schematically illustrates a detailed flow chart of user credit evaluation in an exemplary embodiment of the present disclosure;
FIG. 3 schematically illustrates a block diagram of a user credit evaluation device in an exemplary embodiment of the disclosure;
FIG. 4 schematically illustrates a block diagram of an electronic device in an exemplary embodiment of the disclosure;
fig. 5 schematically illustrates a program product in an exemplary embodiment of the disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
First, a user credit evaluation method is provided in the present exemplary embodiment, and the user credit evaluation method is described in detail with reference to fig. 1.
In step S110, a plurality of feature information of the target user is obtained, where the plurality of feature information includes a first type parameter and a second type parameter.
In this exemplary embodiment, the feature information refers to a data feature corresponding to the historical data of the target user, and may specifically include a first type of parameter and a second type of parameter. The characteristic information with the IV value lower than a preset threshold is the first type of parameters, and the characteristic information with the IV value higher than the preset threshold is the second type of parameters. The feature information may be a parameter corresponding to each topic, and each topic may include a plurality of feature information. For example, characteristic information includes, but is not limited to, age, income, consumption data, number of views, time of view, and the like.
After the plurality of feature information is obtained, the feature information may be divided into a first class of parameters and a second class of parameters, specifically according to the IV value. The IV (Information Value) is an index that measures how well a piece of feature information distinguishes good customers from bad customers when a model is built with methods such as logistic regression or decision trees, and all feature information may be screened and classified using IV values. In general, the larger the IV value, the more information the feature information carries; therefore, feature information with a large IV value may be put into a model for fitting and training.
For example, assume that a classification problem has two categories: Y1 and Y2. For an individual A to be predicted, certain information is required to judge whether A belongs to Y1 or Y2. Assuming that the total amount of information is I, and that the required information is contained in the feature information C1, C2, C3, ..., Cn, then the more information a feature Ci contains, the greater its contribution to judging whether A belongs to Y1 or Y2, the greater the information value and the IV value of Ci, and the better its distinguishing capability, so the model can be built using the feature information Ci.
The first class of parameters includes weak parameters, such as weak financial attribute parameters; the second class of parameters includes strong parameters, such as strong financial attribute parameters. In the exemplary embodiment, the preset threshold may be set to 0.2: feature information with an information value lower than 0.2 may be determined as the first class of parameters, and feature information with an information value greater than 0.2 may be determined as the second class of parameters. The preset threshold is not limited to this value and may be set according to actual requirements. In addition, feature information with an information value lower than 0.05 may be treated as an extremely weak parameter. Because the distinguishing capability of such feature information is poor, the extremely weak parameters can be filtered out directly, so that they do not affect the overall evaluation result.
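For illustration, the IV screening described above can be sketched as follows. This is a minimal sketch only: the equal-frequency pre-binning, the smoothing constant, and the helper names (information_value, screen_by_iv) are assumptions made for the example; only the 0.05 and 0.2 cut-offs come from the description.

```python
import numpy as np
import pandas as pd

def information_value(feature: pd.Series, target: pd.Series, bins: int = 10) -> float:
    """Estimate the IV of one feature against a binary target (1 = responding client)."""
    # Discretize numeric features with equal-frequency bins; treat others as categorical.
    grouped = pd.qcut(feature, q=bins, duplicates="drop") if feature.dtype.kind in "if" else feature
    counts = pd.crosstab(grouped, target).reindex(columns=[0, 1], fill_value=0)
    good = (counts[1] + 1e-6) / (counts[1].sum() + 1e-6)   # responders' share per bin
    bad = (counts[0] + 1e-6) / (counts[0].sum() + 1e-6)    # non-responders' share per bin
    return float(((good - bad) * np.log(good / bad)).sum())

def screen_by_iv(features: pd.DataFrame, target: pd.Series, weak_cut: float = 0.05, strong_cut: float = 0.2):
    """Split features into first-class (weak), second-class (strong) and dropped (extremely weak) sets."""
    ivs = {name: information_value(features[name], target) for name in features.columns}
    weak = [n for n, iv in ivs.items() if weak_cut <= iv < strong_cut]   # first class of parameters
    strong = [n for n, iv in ivs.items() if iv >= strong_cut]            # second class of parameters
    return weak, strong, ivs
```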
In step S120, the first type parameters and the second type parameters are preprocessed.
In the present exemplary embodiment, the preprocessing refers to binning processing. The binning processing, i.e., grouping processing, refers to discretizing a continuous parameter or combining discrete parameters of multiple states into discrete parameters of fewer states. In the exemplary embodiment, the WOE algorithm may be specifically used for binning processing. WOE (Weight of Evidence) refers to a form of encoding of the original parameters. Before performing WOE encoding on a parameter, the parameter needs to be subjected to binning processing, which specifically includes equidistant binning, equal-depth binning, optimal binning and other modes. After binning, the evidence weight value WOE for the ith group can be calculated by equation (1):
WOE_i = ln(py_i / pn_i) = ln( (#y_i / #y_T) / (#n_i / #n_T) )    (1)
where py_i is the proportion of responding clients in the i-th group to all responding clients in the sample, pn_i is the proportion of non-responding clients in the i-th group to all non-responding clients in the sample, #y_i is the number of responding clients in the i-th group, #n_i is the number of non-responding clients in the i-th group, #y_T is the number of all responding clients in the sample, and #n_T is the number of all non-responding clients in the sample. A responding client refers to an individual with a parameter value of 1 in the model.
It can be seen that the evidence weight value WOE represents the difference between the "proportion of responding clients to all responding clients in the current group" and the "proportion of non-responding clients to all non-responding clients in the current group". Transforming equation (1) yields equation (2):
WOE_i = ln(#y_i / #n_i) − ln(#y_T / #n_T)    (2)
The larger the evidence weight value WOE, the larger this difference, and the more likely the samples in this group are to respond.
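As a concrete illustration of equations (1) and (2), the sketch below computes the per-bin evidence weight for an already discretized parameter. The quantile binning in the usage comment, the smoothing constant, and the helper name woe_table are assumptions for illustration, not part of the disclosure.

```python
import numpy as np
import pandas as pd

def woe_table(binned: pd.Series, target: pd.Series, eps: float = 1e-6) -> pd.DataFrame:
    """Per-bin evidence weight as in equation (1): ln((#y_i/#y_T) / (#n_i/#n_T))."""
    counts = pd.crosstab(binned, target).reindex(columns=[0, 1], fill_value=0)
    py = (counts[1] + eps) / (counts[1].sum() + eps)   # responders' share per bin
    pn = (counts[0] + eps) / (counts[0].sum() + eps)   # non-responders' share per bin
    return pd.DataFrame({"woe": np.log(py / pn), "count": counts.sum(axis=1)})

# Usage sketch: discretize a raw parameter, then map each value to the WOE of its bin.
# bins = pd.qcut(raw_income, q=5, duplicates="drop")
# encoded = bins.map(woe_table(bins, default_flag)["woe"])
```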
By performing binning on the first class of parameters and the second class of parameters, the influence of random errors or abnormal parameter values can be reduced, so that the parameters are denoised and the processing speed and efficiency are improved.
In step S130, the preprocessed first type parameters are transformed to generate target parameters.
In the exemplary embodiment, because the first class of parameters cannot be used directly for credit evaluation, part of the feature information would otherwise go unused; and when no second class parameter with strong financial attributes exists in the feature information, credit evaluation could not be performed at all. To avoid these problems, all the first class of parameters may be converted, so that the original first class of parameters is transformed into target parameters. The conversion is specifically a feature combination, and a target parameter is a parameter associated with the second class of parameters, that is, a parameter with strong financial attributes. A parameter associated with the second class of parameters means a parameter of the same type as the second class of parameters; for example, a plurality of original weak parameters are combined into one strong parameter, so that credit evaluation can be performed according to the original strong parameters together with the strong parameters converted from the weak parameters. It should be noted that a plurality of first class parameters may be obtained through step S110, and each theme may include a plurality of first class parameters, for example 10 or 50, which is not particularly limited in this exemplary embodiment; however, through the feature combination in step S130, only one target parameter is obtained for each theme. For example, weak parameter 1, weak parameter 2, weak parameter 3, and weak parameter 4 corresponding to theme 1 are feature-combined to obtain strong parameter 1 of theme 1.
When performing the feature combination, a linear discriminant algorithm may specifically be used to combine all the binned first class parameters associated with each theme, so as to obtain a parameter associated with the second class of parameters. Themes may include, for example, transactions, browsing, forecasting, and so on, and the first class and second class parameters included may differ from theme to theme. In order for all weak parameters to remain interpretable from a business perspective, the feature combination is performed on all the first class parameters associated with each theme: for example, all the first class parameters corresponding to the transaction theme are combined, and all the first class parameters corresponding to the browsing theme are combined. Combining the first class of parameters theme by theme avoids interference between parameters of different themes and improves the efficiency and accuracy of the feature combination.
The linear discriminant algorithm (LDA) linearly combines a plurality of binned weak parameters in the classification process to form a linear expression. Specifically, a linear expression containing each weak parameter is obtained by linearly combining the weak parameters; in the classification process, a candidate parameter is rotated to different angles in the feature space where the linear expression lies, the linear discriminant algorithm finds the optimal angle during the rotation, at which the classification potential of the candidate parameter is maximal, and the strong parameter can be obtained from the candidate parameter with the maximal classification potential. Equivalently, an optimal linear combination is obtained according to the optimal angle, so that the classification potential of the parameter corresponding to the optimal linear combination is maximal, and the strong parameter can be obtained accordingly. Here, a candidate parameter refers to any one of the weak parameters, and the classification potential refers to the potential used for classification.
For example, for a transaction theme whose weak parameters include the browsing times x1 and the evaluation y1, the browsing times x1 and the evaluation y1 may be linearly combined to obtain a linear expression A·x1 + B·y1; in the feature space where this linear expression lies, the optimal angle that maximizes the classification potential can be obtained with the linear discriminant algorithm, and the browsing times x1 and the evaluation y1 are thus combined into one strong parameter associated with browsing and evaluation. The weak parameters under each theme can be combined into strong parameters in this way, so that the converted strong parameters are used for model construction. Compared with the related art, this makes it possible to build an interpretable risk scoring model entirely from weak parameters, while including a larger number and wider variety of parameters.
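A rough sketch of the per-theme feature combination is shown below, using scikit-learn's LinearDiscriminantAnalysis as one possible realization of the linear discriminant algorithm; the column names, the helper name combine_theme, and the choice of this particular library are assumptions for illustration only.

```python
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def combine_theme(woe_features: pd.DataFrame, weak_cols: list, target: pd.Series) -> pd.Series:
    """Linearly combine the WOE-binned weak parameters of one theme into a single score."""
    lda = LinearDiscriminantAnalysis(n_components=1)
    score = lda.fit_transform(woe_features[weak_cols], target)  # one A*x1 + B*y1 + ... value per row
    return pd.Series(score.ravel(), index=woe_features.index, name="theme_score")

# e.g. strong_browse = combine_theme(df_woe, ["view_count_woe", "view_time_woe"], default_flag)
# The resulting theme score is then binned again before entering the candidate variable pool.
```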
After the target parameters are generated, the target parameters and the second class parameters may be placed into a candidate variable pool. The parameters placed into the model's candidate variable pool are required to be binned parameters. To meet this requirement, after the first class parameters are converted into the target parameters, the target parameters need to be binned again, and the re-binned target parameters are then put into the candidate variable pool. The candidate variable pool may include, for example, binned strong parameter 1, a binned strong parameter 4 composed of weak parameter 2 and weak parameter 3, and so on.
Next, in step S140, inputting the preprocessed second-class parameters and the target parameters into a machine learning model to obtain a credit evaluation result of the target user.
In the present exemplary embodiment, the machine learning model may be a trained machine learning model, such as a convolutional neural network algorithm, a deep learning algorithm, and the like, and the convolutional neural network model is taken as an example in the present exemplary embodiment for explanation. The convolutional neural network model generally includes an input layer, a mapping layer, and an output layer.
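The disclosure leaves the concrete model open (convolutional neural networks and deep learning are only given as examples). As a stand-in for illustration, the sketch below fits a small feed-forward network with an input layer, one hidden (mapping) layer, and an output layer; the layer size and the use of scikit-learn are assumptions, not the disclosed architecture.

```python
from sklearn.neural_network import MLPClassifier

def train_credit_model(train_features, train_labels):
    """Fit a small feed-forward network: input layer -> one hidden (mapping) layer -> output layer."""
    model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
    model.fit(train_features, train_labels)
    return model  # exposes predict_proba, used to obtain the interval probabilities discussed below
```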
When the preprocessed second-class parameters and the target parameters are input into the machine learning model, the target parameters and the binned second-class parameters in the candidate variable pool may first be screened in order to ensure the accuracy of the result. That is, all the parameters in the candidate variable pool may be filtered to obtain the remaining parameters, and the remaining parameters are input into the machine learning model as input parameters. The output of the model's output layer may be the probability that the user's credit belongs to a certain interval, and the credit evaluation result is determined according to these probabilities; for example, when the probability that the user's credit belongs to the interval representing good credit is the largest, the credit evaluation result is determined to be good.
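The mapping from interval probabilities to an evaluation result can be sketched as follows; the band labels and the predict_proba call are illustrative assumptions rather than the disclosed implementation.

```python
import numpy as np

CREDIT_BANDS = ["poor", "fair", "good"]  # illustrative band labels, not taken from the disclosure

def credit_band(model, remaining_params: np.ndarray) -> str:
    """Return the credit band whose predicted probability is the largest for one user."""
    proba = model.predict_proba(remaining_params.reshape(1, -1))[0]
    return CREDIT_BANDS[int(np.argmax(proba))]
```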
Specifically, in order to ensure the accuracy of the result, multicollinearity between the second-class parameters and the target parameters in the candidate variable pool can be eliminated. Multicollinearity means that the model estimates are distorted or difficult to estimate accurately because precise or highly correlated relationships exist between parameters in the linear regression model. The evidence weight values should generally all be positive; if a calculated evidence weight value is negative, the influence of possible multicollinearity among the parameters can be considered. On this basis, whether multicollinearity exists between parameters can be determined from the relation between the evidence weight value and a preset value. The preset value may be, for example, 0: if an evidence weight value is smaller than 0 (i.e., negative), multicollinearity is considered to exist between the parameters, so the parameters whose evidence weight values are smaller than 0 can be eliminated in turn to obtain the remaining parameters.
When eliminating multicollinearity among the parameters, the binned second-class parameters and the re-binned target parameters whose evidence weight values are smaller than the preset value are removed in ascending order of their evidence weight values; the evidence weight values of the remaining binned second-class parameters and re-binned target parameters are then recalculated; and the parameters whose recalculated evidence weight values are smaller than the preset value are removed again in ascending order, until no second-class parameter or target parameter with an evidence weight value smaller than the preset value remains.
For example, if the evidence weight value of parameter 1 is -3, that of parameter 2 is -1, and that of parameter 3 is 1, then parameter 1 with the evidence weight value of -3 can be removed first; parameter 2 and parameter 3 are then linearly combined again and the evidence weight value of each parameter is recalculated, after which the parameter with the smallest negative recalculated evidence weight value is removed next. Multicollinearity can also be eliminated according to other algorithms. By removing the parameters whose evidence weight values are smaller than the preset value one by one and recalculating the evidence weight values of all remaining parameters, all parameters with evidence weight values below the preset value can be eliminated more accurately, so that more accurate remaining parameters are obtained; inputting these remaining parameters into the trained machine learning model then yields a more accurate credit evaluation result and accurately measures the user risk.
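One possible reading of this elimination loop, sketched below, applies the sign check to the coefficients obtained when the candidate WOE-encoded variables are fitted with a logistic regression; this interpretation, the scikit-learn model, and the stopping rule are assumptions made for illustration only.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

def drop_negative_coefficients(candidates: pd.DataFrame, target: pd.Series) -> list:
    """Iteratively drop the variable with the most negative coefficient until all are non-negative."""
    kept = list(candidates.columns)
    while kept:
        model = LogisticRegression(max_iter=1000).fit(candidates[kept], target)
        coefs = pd.Series(model.coef_[0], index=kept)
        worst = coefs.idxmin()
        if coefs[worst] >= 0:      # nothing below the preset value (0): stop
            break
        kept.remove(worst)         # remove the smallest (most negative) one, then refit
    return kept
```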
A specific flow chart for determining the credit evaluation result is shown in fig. 2. Wherein:
In step S201, feature extraction is performed on the data at the modeling layer to obtain a wide feature summary table, which includes a plurality of feature information.
In step S202, feature screening is performed based on the modeling layer and the information value IV, the plurality of feature information are divided into strong parameters, weak parameters, and very weak parameters according to the IV value, and the very weak parameters are directly filtered out.
In step S203, a WOE algorithm is used to perform binning processing on the strong parameter and the weak parameter, so as to obtain a strong parameter WOE bin and a weak parameter WOE bin.
In step S204, feature combination is performed on the weak parameters; specifically, the plurality of weak parameters corresponding to each theme are linearly combined according to the LDA linear discriminant algorithm to obtain one strong parameter per theme, for example an LDA combination for the browsing theme, an LDA combination for the trading theme, and a combination for the trading credit card theme. The strong parameter converted from the weak parameters of each theme is then binned a second time to obtain the WOE bins after LDA combination.
In step S205, multicollinearity is eliminated between the WOE-binned strong parameters and the strong parameters generated by the LDA combination of the WOE-binned weak parameters to obtain the remaining parameters, and the remaining parameters are input into the machine learning model to obtain the user credit evaluation result.
Next, the credit evaluation results may be monitored, for example by means of an ROC curve, the accuracy ratio (AR) value, the Kolmogorov–Smirnov (KS) statistic as a measure of distinguishing capability, or a Lorenz curve; in addition, the population stability index (PSI) can be used for monitoring, so as to track the stability and accuracy of the evaluation results.
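The KS and PSI monitoring metrics mentioned above can be estimated with short routines such as the ones below; the bucket count, smoothing constants, and function names are assumptions for illustration.

```python
import numpy as np

def ks_statistic(scores: np.ndarray, target: np.ndarray) -> float:
    """KS: maximum gap between the cumulative score distributions of bad and good samples."""
    order = np.argsort(scores)
    bad = np.cumsum(target[order] == 1) / max(int((target == 1).sum()), 1)
    good = np.cumsum(target[order] == 0) / max(int((target == 0).sum()), 1)
    return float(np.max(np.abs(good - bad)))

def psi(expected: np.ndarray, actual: np.ndarray, buckets: int = 10) -> float:
    """PSI: shift of the score distribution between a reference sample and a monitored sample."""
    edges = np.unique(np.percentile(expected, np.linspace(0, 100, buckets + 1)))
    edges[0], edges[-1] = -np.inf, np.inf
    e = np.histogram(expected, bins=edges)[0] / len(expected) + 1e-6
    a = np.histogram(actual, bins=edges)[0] / len(actual) + 1e-6
    return float(np.sum((a - e) * np.log(a / e)))
```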
Through the steps in fig. 2, the weak parameter features of each topic may be combined into a strong parameter, thereby performing a credit evaluation based on the strong parameter. By the method, the weak parameters can be completely utilized to carry out the risk scoring with stronger interpretability, and more parameters with more quantity and more types can be included, so that the parameters are more accurate and more comprehensive, the interpretability of the credit evaluation result is stronger, and the user risk can be more accurately and timely measured.
The disclosure also provides a user credit evaluation device. Referring to fig. 3, the user credit evaluation apparatus 300 may include:
a feature obtaining module 301, configured to obtain a plurality of feature information of a target user, where the plurality of feature information includes a first type of parameter and a second type of parameter;
a parameter preprocessing module 302, configured to preprocess the first type of parameter and the second type of parameter;
a target parameter generating module 303, configured to convert the preprocessed first type of parameters to generate target parameters;
an evaluation result determining module 304, configured to input the preprocessed second type parameters and the preprocessed target parameters into a machine learning model, so as to obtain a credit evaluation result of the target user;
the characteristic information with the IV value lower than a preset threshold is the first type of parameters, and the characteristic information with the IV value higher than the preset threshold is the second type of parameters.
In an exemplary embodiment of the present disclosure, the parameter preprocessing module includes: a binning module, configured to perform binning on the first class of parameters and the second class of parameters respectively through evidence weight values, to obtain binned first-class parameters and binned second-class parameters.
In an exemplary embodiment of the present disclosure, the target parameter generation module includes: a feature combination module, configured to perform feature combination on the binned first-class parameters associated with each theme by using a linear discriminant algorithm, to generate the target parameters.
In an exemplary embodiment of the present disclosure, the apparatus further includes: a first storage module, configured to perform secondary binning on the target parameters and put the re-binned target parameters into a candidate variable pool; and a second storage module, configured to put the binned second-class parameters into the candidate variable pool.
In an exemplary embodiment of the present disclosure, the evaluation result determining module includes: a parameter elimination module, configured to remove multicollinearity between the binned second-class parameters and the re-binned target parameters in the candidate variable pool to obtain remaining parameters; and an input control module, configured to input the remaining parameters into the machine learning model.
In an exemplary embodiment of the present disclosure, the parameter elimination module includes: an elimination control module, configured to remove, from the candidate variable pool, the binned second-class parameters and the re-binned target parameters whose evidence weight values are smaller than a preset value, to obtain the remaining parameters.
In an exemplary embodiment of the present disclosure, the elimination control module includes: a first elimination module, configured to remove, in ascending order of evidence weight values, the binned second-class parameters and the re-binned target parameters whose evidence weight values are smaller than the preset value; an evidence weight value calculation module, configured to recalculate the evidence weight values of the remaining binned second-class parameters and re-binned target parameters; and a secondary elimination module, configured to remove again, in ascending order of the recalculated evidence weight values, the binned second-class parameters and the re-binned target parameters whose recalculated evidence weight values are smaller than the preset value, until no second-class parameter or target parameter with an evidence weight value smaller than the preset value remains.
It should be noted that, the details of each module in the user credit evaluation apparatus have been described in detail in the corresponding user credit evaluation method, and therefore are not described herein again.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
Moreover, although the steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.
In an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or program product. Thus, various aspects of the invention may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module," or "system."
An electronic device 400 according to this embodiment of the invention is described below with reference to fig. 4. The electronic device 400 shown in fig. 4 is only an example and should not bring any limitation to the function and the scope of use of the embodiments of the present invention.
As shown in fig. 4, electronic device 400 is embodied in the form of a general purpose computing device. The components of electronic device 400 may include, but are not limited to: the at least one processing unit 410, the at least one memory unit 420, and a bus 430 that couples various system components including the memory unit 420 and the processing unit 410.
Wherein the storage unit stores program code that is executable by the processing unit 410 to cause the processing unit 410 to perform steps according to various exemplary embodiments of the present invention as described in the above section "exemplary methods" of the present specification. For example, the processing unit 410 may perform the steps as shown in fig. 1.
The storage unit 420 may include readable media in the form of volatile storage units, such as a random access memory unit (RAM) 4201 and/or a cache memory unit 4202, and may further include a read-only memory unit (ROM) 4203.
The storage unit 420 may also include a program/utility 4204 having a set (at least one) of program modules 4205, such program modules 4205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The display unit 440 may be a display having a display function, configured to show, through the display, the processing result obtained by the processing unit 410 performing the method in the present exemplary embodiment. The display includes, but is not limited to, a liquid crystal display or other displays.
The electronic device 400 may also communicate with one or more external devices 600 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 400, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 400 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 450. Also, the electronic device 400 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 460. As shown, the network adapter 460 communicates with the other modules of the electronic device 400 over the bus 430. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with electronic device 400, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
In an exemplary embodiment of the present disclosure, there is also provided a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, aspects of the invention may also be implemented in the form of a program product comprising program code means for causing a terminal device to carry out the steps according to various exemplary embodiments of the invention described in the above section "exemplary methods" of the present description, when said program product is run on the terminal device.
Referring to fig. 5, a program product 500 for implementing the above method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
Furthermore, the above-described figures are merely schematic illustrations of processes involved in methods according to exemplary embodiments of the invention, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
Claims (9)
1. A method for assessing user credit, comprising:
acquiring a plurality of feature information of a target user, wherein the feature information comprises a first type of parameter and a second type of parameter;
preprocessing the first type parameters and the second type parameters;
converting the preprocessed first-class parameters to generate target parameters;
inputting the preprocessed second type parameters and the preprocessed target parameters into a machine learning model to obtain a credit evaluation result of the target user;
converting the preprocessed first-class parameters to generate target parameters, wherein the converting comprises the following steps:
performing feature combination on the binned first-class parameters associated with each theme by adopting a linear discriminant algorithm to generate the target parameters;
the characteristic information with the IV value lower than a preset threshold is the first type of parameters, and the characteristic information with the IV value higher than the preset threshold is the second type of parameters.
2. The method of claim 1, wherein pre-processing the first type of parameters and the second type of parameters comprises:
and performing box separation treatment on the first type of parameters and the second type of parameters respectively through the evidence weight values to obtain the first type of parameters after box separation and the second type of parameters after box separation.
3. The method of claim 1, further comprising:
performing secondary binning on the target parameters, and putting the re-binned target parameters into a candidate variable pool;
and putting the binned second type of parameters into the candidate variable pool.
4. The method of claim 3, wherein inputting the preprocessed second-class parameters and the target parameters into a machine learning model comprises:
removing multicollinearity between the binned second-class parameters and the re-binned target parameters in the candidate variable pool to obtain remaining parameters;
inputting the remaining parameters into the machine learning model.
5. The user credit evaluation method of claim 4, wherein removing multicollinearity between the binned second-class parameters and the re-binned target parameters in the candidate variable pool to obtain the remaining parameters comprises:
removing, from the candidate variable pool, the binned second-class parameters and the re-binned target parameters whose evidence weight values are smaller than a preset value, to obtain the remaining parameters.
6. The user credit evaluation method of claim 5, wherein removing, from the candidate variable pool, the binned second-class parameters and the re-binned target parameters whose evidence weight values are smaller than the preset value comprises:
removing, in ascending order of evidence weight values, the binned second-class parameters and the re-binned target parameters whose evidence weight values are smaller than the preset value;
recalculating the evidence weight values of the remaining binned second-class parameters and re-binned target parameters;
and removing again, in ascending order of the recalculated evidence weight values, the binned second-class parameters and the re-binned target parameters whose recalculated evidence weight values are smaller than the preset value, until no second-class parameter or target parameter with an evidence weight value smaller than the preset value remains.
7. A user credit evaluation apparatus, comprising:
the system comprises a characteristic acquisition module, a characteristic acquisition module and a characteristic analysis module, wherein the characteristic acquisition module is used for acquiring a plurality of characteristic information of a target user, and the plurality of characteristic information comprises a first type parameter and a second type parameter;
the parameter preprocessing module is used for preprocessing the first type of parameters and the second type of parameters;
the target parameter generation module is used for converting the preprocessed first-class parameters to generate target parameters;
the evaluation result determining module is used for inputting the preprocessed second type parameters and the preprocessed target parameters into a machine learning model to obtain a credit evaluation result of the target user;
converting the preprocessed first-class parameters to generate target parameters, wherein the converting comprises the following steps:
performing feature combination on the binned first-class parameters associated with each theme by adopting a linear discriminant algorithm to generate the target parameters;
the characteristic information with the IV value lower than a preset threshold is the first type of parameters, and the characteristic information with the IV value higher than the preset threshold is the second type of parameters.
8. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the user credit assessment method of any of claims 1-6 via execution of the executable instructions.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method for user credit assessment according to any one of claims 1 to 6.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811260889.1A CN109447461B (en) | 2018-10-26 | 2018-10-26 | User credit evaluation method and device, electronic equipment and storage medium |
CA3059937A CA3059937A1 (en) | 2018-10-26 | 2019-10-24 | User credit evaluation method and device, electronic device, storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811260889.1A CN109447461B (en) | 2018-10-26 | 2018-10-26 | User credit evaluation method and device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109447461A CN109447461A (en) | 2019-03-08 |
CN109447461B true CN109447461B (en) | 2022-05-03 |
Family
ID=65548550
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811260889.1A Active CN109447461B (en) | 2018-10-26 | 2018-10-26 | User credit evaluation method and device, electronic equipment and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109447461B (en) |
CA (1) | CA3059937A1 (en) |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110727510A (en) * | 2019-09-25 | 2020-01-24 | 浙江大搜车软件技术有限公司 | User data processing method and device, computer equipment and storage medium |
CN110782349A (en) * | 2019-10-25 | 2020-02-11 | 支付宝(杭州)信息技术有限公司 | Model training method and system |
CN110866696B (en) * | 2019-11-15 | 2023-05-26 | 成都数联铭品科技有限公司 | Training method and device for risk assessment model of shop drop |
CN111709826A (en) * | 2020-06-11 | 2020-09-25 | 中国建设银行股份有限公司 | Target information determination method and device |
CN111815457A (en) * | 2020-07-01 | 2020-10-23 | 北京金堤征信服务有限公司 | Target object evaluation method and device |
CN111950889A (en) * | 2020-08-10 | 2020-11-17 | 中国平安人寿保险股份有限公司 | Client risk assessment method and device, readable storage medium and terminal equipment |
CN112734433A (en) * | 2020-12-10 | 2021-04-30 | 深圳市欢太科技有限公司 | Abnormal user detection method and device, electronic equipment and storage medium |
CN112529477A (en) * | 2020-12-29 | 2021-03-19 | 平安普惠企业管理有限公司 | Credit evaluation variable screening method, device, computer equipment and storage medium |
CN112734568B (en) * | 2021-01-29 | 2024-01-12 | 深圳前海微众银行股份有限公司 | Credit scoring card model construction method, device, equipment and readable storage medium |
CN113570066B (en) * | 2021-07-23 | 2024-03-29 | 中国恩菲工程技术有限公司 | Data processing method, system, electronic device and storage medium |
CN115880053B (en) * | 2022-12-05 | 2024-05-31 | 中电金信软件有限公司 | Training method and device for scoring card model |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2010266983A (en) * | 2009-05-13 | 2010-11-25 | Sony Corp | Information processing apparatus and method, learning device and method, program, and information processing system |
US20150178754A1 (en) * | 2013-12-19 | 2015-06-25 | Microsoft Corporation | Incentive system for interactive content consumption |
CN104866484B (en) * | 2014-02-21 | 2018-12-07 | 阿里巴巴集团控股有限公司 | A kind of data processing method and device |
US11521106B2 (en) * | 2014-10-24 | 2022-12-06 | National Ict Australia Limited | Learning with transformed data |
CN108230067A (en) * | 2016-12-14 | 2018-06-29 | 阿里巴巴集团控股有限公司 | The appraisal procedure and device of user credit |
CN108399255A (en) * | 2018-03-06 | 2018-08-14 | 中国银行股份有限公司 | A kind of input data processing method and device of Classification Data Mining model |
- 2018-10-26: CN CN201811260889.1A patent/CN109447461B/en active Active
- 2019-10-24: CA CA3059937A patent/CA3059937A1/en active Pending
Also Published As
Publication number | Publication date |
---|---|
CN109447461A (en) | 2019-03-08 |
CA3059937A1 (en) | 2020-04-26 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||