WO2019214248A1 - Risk assessment method, apparatus, terminal device, and storage medium - Google Patents

Risk assessment method, apparatus, terminal device, and storage medium

Info

Publication number
WO2019214248A1
Authority
WO
WIPO (PCT)
Prior art keywords
financial
user
financial risk
category
risk
Prior art date
Application number
PCT/CN2018/122992
Other languages
English (en)
French (fr)
Inventor
刘顺
Original Assignee
深圳壹账通智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳壹账通智能科技有限公司
Publication of WO2019214248A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes

Definitions

  • The present application relates to the field of financial service technologies, and in particular to a risk assessment method, apparatus, terminal device, and storage medium.
  • Risk tolerance level refers to how much risk a person is able to bear. It depends on personal assets, family situation, work situation, and similar factors, which are weighed together to assess how much loss the user can absorb without affecting normal life.
  • A user's risk tolerance level is generally assessed by questionnaire.
  • Before purchasing a wealth management product, fund, or stock, the user fills out a financial risk assessment questionnaire, and the user's risk tolerance level is derived from the questionnaire answers combined with the user's personal information.
  • However, the answers users give are often subjective, or the user information obtained from the questionnaire is incomplete and cannot objectively reflect the user's real financial situation, so the assessed financial risk tolerance level is often inaccurate, which lowers the accuracy of financial risk assessment.
  • The embodiments of the present application provide a risk assessment method to solve the prior-art problem of low accuracy in assessing a user's financial risk tolerance level.
  • an embodiment of the present application provides a risk assessment method, including:
  • constructing decision trees using a random forest algorithm to obtain a financial risk assessment model, wherein the financial risk assessment model includes K decision trees, and K is a positive integer;
  • if the left average is greater than or equal to the right average, determining the difference between the initial aversion coefficient of the base category and the left average as the financial risk aversion coefficient of the user to be evaluated; otherwise, determining the sum of the initial aversion coefficient of the base category and the right average as the financial risk aversion coefficient;
  • the embodiment of the present application provides a risk assessment apparatus, including:
  • a user history financial information acquiring module configured to acquire historical financial information of the sample user
  • a training set building module configured to construct a training set according to the historical financial information
  • a financial risk assessment model building module configured to construct decision trees on the training set using a random forest algorithm to obtain a financial risk assessment model, wherein the financial risk assessment model includes K decision trees, and K is a positive integer;
  • a financial risk assessment model prediction module configured to perform model prediction on the financial information of the user to be evaluated using the financial risk assessment model, and to obtain the prediction result of each decision tree in the financial risk assessment model for the user to be evaluated;
  • a vote rate statistics module configured to vote on preset financial risk categories according to the prediction results and to count the vote rate of each financial risk category, wherein the financial risk categories include a plurality of preset risk levels and an initial aversion coefficient corresponding to each risk level;
  • a base category determining module configured to determine the financial risk category with the highest vote rate as the base category, and to calculate the left average of the vote rates of the financial risk categories below the base category and the right average of the vote rates of the financial risk categories above the base category;
  • a financial risk aversion coefficient calculation module configured to determine, if the left average is greater than or equal to the right average, the difference between the initial aversion coefficient of the base category and the left average as the financial risk aversion coefficient of the user to be evaluated, and otherwise to determine the sum of the initial aversion coefficient of the base category and the right average as the financial risk aversion coefficient;
  • a financial risk tolerance level determining module configured to determine the financial risk tolerance level of the user to be evaluated according to the financial risk aversion coefficient.
  • An embodiment of the present application provides a terminal device, including a memory, a processor, and computer readable instructions stored in the memory and executable on the processor, where the processor implements the steps of the risk assessment method when executing the computer readable instructions.
  • Embodiments of the present application provide one or more non-transitory computer readable storage media storing computer readable instructions which, when executed by one or more processors, cause the one or more processors to execute the steps of the risk assessment method.
  • FIG. 1 is a flowchart of a risk assessment method provided in an embodiment of the present application.
  • FIG. 2 is a flowchart of an implementation of step S20 in the risk assessment method provided in an embodiment of the present application;
  • FIG. 3 is a flowchart of an implementation of normalizing the financial risk feature vectors in the risk assessment method provided in an embodiment of the present application;
  • FIG. 4 is a flowchart of an implementation of step S30 in the risk assessment method provided in an embodiment of the present application;
  • FIG. 5 is a flowchart showing an implementation of optimizing a user financial risk aversion coefficient when a reference category is the highest level of a financial risk category in the risk assessment method provided in the embodiment of the present application;
  • FIG. 6 is a schematic diagram of a risk assessment apparatus provided in an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a terminal device provided in an embodiment of the present application.
  • FIG. 1 shows an implementation process of a risk assessment method provided by an embodiment of the present application.
  • the risk assessment method can collect historical financial information of the user from the user database, so as to perform financial risk assessment model training based on the collected historical financial information.
  • the risk assessment method can be specifically applied to the user financial risk assessment system of the financial service industry to evaluate the user's financial risk tolerance level, which can effectively improve the accuracy of the user's financial risk assessment.
  • the risk assessment method includes steps S10 to S80, which are detailed as follows:
  • The historical financial information of the sample users may be collected from the user database. The data stored in the user database includes, but is not limited to, the user's registration information, the user's questionnaires, the user's historical financial consumption information, and bank card information.
  • The historical financial information includes the user's basic attribute information and the user's financial attribute information. The basic attribute information includes the user's ID card, age, gender, education level, and family members. The financial attribute information includes the monthly consumption level, income level, investment experience, investment time in wealth management products, and asset distribution.
  • S20 Construct a training set based on historical financial information.
  • the financial risk feature vector is constructed according to the obtained historical financial information of the sample user, and the financial risk feature vector includes the basic attribute information of the user and the financial attribute information of the user.
  • the financial risk feature vector is used to construct a training set, and the training set is used as training data for machine model training, wherein the training set includes M financial risk feature vectors, and M is a positive integer.
  • The financial risk assessment model includes K decision trees, and K is a positive integer.
  • Specifically, financial risk feature vectors are drawn at random from the training set. The sampling is random sampling with replacement: K rounds of extraction are performed on the training set, and the result of each round is used as a sub-training set, giving K sub-training sets.
  • The K sub-training sets are independent of each other, and a sub-training set may contain repeated financial risk feature vectors.
  • The number of financial risk feature vectors to draw can be chosen from historical experience or according to specific business needs. More training samples generally give a more accurate model but also raise the training cost and the implementation difficulty, so the specific number can be set according to the needs of the actual application and is not limited here.
  • A random forest algorithm is then used to construct the decision trees.
  • A decision tree is constructed for each sub-training set, giving K decision trees. A random forest is then constructed from the K decision trees to obtain the financial risk assessment model.
  • The financial information of the user to be evaluated is obtained, and the financial risk assessment model obtained in step S30 is used to perform prediction on it.
  • The financial risk assessment model judges the financial information of the user to be evaluated through each decision tree, evaluates the user's financial risk tolerance level, and outputs the corresponding prediction result.
  • S50 Vote on the preset financial risk categories according to the prediction results, and count the vote rate of each financial risk category, wherein the financial risk categories include a plurality of preset risk levels and an initial aversion coefficient corresponding to each risk level.
  • The financial risk categories of the financial risk assessment model are preset, and a corresponding initial aversion coefficient is set for each financial risk category.
  • The financial risk categories and their initial aversion coefficients can be set from historical experience or according to the characteristics of the financial risk model. The specifics can be set according to the needs of the actual application and are not limited here.
  • For example, the financial risk categories can be divided into five: low risk level, lower risk level, medium risk level, higher risk level, and high risk level.
  • The initial aversion coefficients corresponding to the five financial risk categories are 1, 3, 5, 7, and 9 respectively. A smaller aversion coefficient means the user is more risk-averse, that is, the user's financial risk tolerance level is weaker; a larger aversion coefficient means the user is more able to bear risk, that is, the user's financial risk tolerance level is stronger.
  • Each decision tree in the financial risk assessment model judges the financial information of the user to be evaluated and votes, and the vote rate of each financial risk category is calculated according to formula (1):
  • $\mathrm{Rate} = \dfrac{T}{K}$  (1)
  • where Rate is the vote rate, T is the number of decision trees that vote for the financial risk category, and K is the total number of decision trees.
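  • The vote-rate calculation can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function name `vote_rates`, the category labels, and the ten sample predictions are assumptions made for the example.

```python
from collections import Counter

# Category labels follow the five-level example in the text (names are illustrative).
CATEGORIES = ["low", "lower", "medium", "higher", "high"]


def vote_rates(tree_predictions, categories=CATEGORIES):
    """Return Rate = T / K for each category.

    T is the number of trees that voted for the category and
    K is the total number of trees (one prediction per tree).
    """
    k = len(tree_predictions)
    if k == 0:
        raise ValueError("at least one decision tree prediction is required")
    counts = Counter(tree_predictions)
    return {c: counts.get(c, 0) / k for c in categories}


# Hypothetical predictions from K = 10 decision trees.
predictions = ["medium"] * 5 + ["higher"] * 3 + ["lower"] + ["low"]
rates = vote_rates(predictions)
```

Categories that receive no votes get a rate of 0, so the rates always sum to 1.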
  • S60 Determine the financial risk category with the highest vote rate as the base category, and calculate the left average of the vote rates of the financial risk categories below the base category and the right average of the vote rates of the financial risk categories above the base category.
  • The financial risk category with the highest vote rate among all financial risk categories is determined as the base category. The financial risk categories below the base category are the left financial risk categories, and those above the base category are the right financial risk categories.
  • For example, with the five categories of low, lower, medium, higher, and high risk level, if the base category is the lower risk level, the left financial risk categories comprise the low risk level, and the right financial risk categories comprise the medium, higher, and high risk levels.
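  • The left average of the vote rates of the left financial risk categories is calculated according to formula (2):
  • $\mathrm{LeftMean} = \dfrac{\sum \mathrm{LeftRate}}{\mathrm{LeftNum}}$  (2)
  • where LeftMean is the left average, ΣLeftRate is the sum of the vote rates of the left financial risk categories, and LeftNum is the number of left financial risk categories.
  • The right average of the vote rates of the right financial risk categories is calculated according to formula (3):
  • $\mathrm{RightMean} = \dfrac{\sum \mathrm{RightRate}}{\mathrm{RightNum}}$  (3)
  • where RightMean is the right average, ΣRightRate is the sum of the vote rates of the right financial risk categories, and RightNum is the number of right financial risk categories.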
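  • A minimal Python sketch of the base-category selection and the left and right averages follows. Two behaviours are assumptions of this sketch because the text does not specify them: a tie for the highest vote rate is resolved in favour of the lower category, and an empty side (base category at either end) yields an average of 0.

```python
def left_right_means(rates):
    """Return (base_index, left_mean, right_mean).

    `rates` lists the vote rates ordered from the lowest to the highest
    financial risk category.
    """
    # Base category: highest vote rate (first one wins on ties; assumption).
    base = max(range(len(rates)), key=lambda i: rates[i])
    left, right = rates[:base], rates[base + 1:]
    # Empty sides are treated as 0.0 (assumption, not stated in the text).
    left_mean = sum(left) / len(left) if left else 0.0
    right_mean = sum(right) / len(right) if right else 0.0
    return base, left_mean, right_mean


# Hypothetical vote rates for low, lower, medium, higher, high.
base, left_mean, right_mean = left_right_means([0.1, 0.1, 0.5, 0.3, 0.0])
```

With these hypothetical rates the base category is the medium risk level (index 2), the left average is the mean of the two lower categories, and the right average is the mean of the two higher ones.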
  • The financial risk aversion coefficient of the user to be evaluated is calculated by analyzing the vote rate of each financial risk category.
  • If the left average calculated in step S60 is greater than or equal to the right average, the financial risk aversion coefficient of the user to be evaluated is calculated according to formula (4):
  • $\mathrm{FinalScore} = \mathrm{InitScore} - \mathrm{LeftMean}$  (4)
  • where FinalScore is the financial risk aversion coefficient of the user to be evaluated, InitScore is the initial aversion coefficient of the base category, and LeftMean is the left average.
  • If the left average calculated in step S60 is smaller than the right average, the financial risk aversion coefficient of the user to be evaluated is calculated according to formula (5):
  • $\mathrm{FinalScore} = \mathrm{InitScore} + \mathrm{RightMean}$  (5)
  • where RightMean is the right average.
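  • The choice between formula (4) and formula (5) can be sketched as a single function. The function name `final_score` and the sample inputs are illustrative assumptions; the branch condition follows the rule stated for the left and right averages.

```python
def final_score(init_score, left_mean, right_mean):
    """Financial risk aversion coefficient of the user to be evaluated.

    Formula (4): InitScore - LeftMean when LeftMean >= RightMean.
    Formula (5): InitScore + RightMean otherwise.
    """
    if left_mean >= right_mean:
        return init_score - left_mean
    return init_score + right_mean


# Hypothetical case: base category "medium" (initial coefficient 5),
# left average 0.1, right average 0.15, so formula (5) applies.
score = final_score(5, 0.1, 0.15)
```

Because the vote rates are at most 1, the adjustment moves the coefficient by less than one unit away from the initial value of the base category.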
  • S80 Determine the financial risk tolerance level of the user to be evaluated according to the financial risk aversion coefficient.
  • the financial risk tolerance level of the user to be evaluated is analyzed and determined.
  • For example, suppose the initial aversion coefficients corresponding to the five financial risk categories are 1, 3, 5, 7, and 9 respectively.
  • If the highest vote rate is Rate(3), the base category is determined to be the medium risk level,
  • and the initial aversion coefficient of the base category is 5. The left average and the right average are calculated according to formula (2) and formula (3).
  • When the left average is smaller than the right average, the financial risk aversion coefficient of the user to be evaluated is calculated using formula (5).
  • From the financial risk aversion coefficient, the financial risk tolerance level of the user to be evaluated can be further determined.
  • If the financial risk category with the highest vote rate were selected directly as the financial risk tolerance level of the user to be evaluated, the prediction results of the other decision trees in the financial risk assessment model would be ignored, and the user's financial risk assessment would carry a certain error.
  • By also taking the vote rates of the other financial risk categories into account, the financial risk aversion coefficient of the user to be evaluated can be calculated more precisely and the user's financial risk tolerance level determined from it, so that the prediction results of the whole financial risk assessment model are considered and the accuracy of the user's financial risk assessment is improved.
  • In this embodiment, the historical financial information of the sample users is acquired and a training set is built for machine model training. Decision trees are constructed on the training set using a random forest algorithm, and a random forest is constructed from the generated decision trees to obtain the financial risk assessment model.
  • The user's financial risk tolerance level is then evaluated with this model, which avoids the subjective factors present in questionnaire-based assessment and improves the accuracy of the user's financial risk assessment, making it easier to offer the user suitable financial products.
  • The specific implementation of constructing the training set according to the historical financial information in step S20 is described in detail below through a specific embodiment.
  • FIG. 2 shows a specific implementation process of step S20 provided by the embodiment of the present application, which is described in detail as follows:
  • S201 Determine n user financial features according to historical financial information, construct a financial risk feature vector based on the user financial feature, and use the financial risk feature vector as a training sample, where n is a positive integer.
  • The n user financial features are determined from the historical financial information acquired in step S10, and the machine model is trained on the data corresponding to these n user financial features, which avoids training on too many features and producing a model that is overly complex or poorly targeted.
  • n is a positive integer
  • the n user financial features may be determined according to historical experience, or may be determined according to the characteristics of the machine model, and may be determined according to actual application requirements, and are not limited herein.
  • the financial risk feature vector Y is constructed based on the determined user financial feature X, and the financial risk feature vector Y is used as a training sample.
  • S202 Screen the financial risk feature vectors. If multiple training samples with identical financial risk feature vectors are detected, retain one of them and delete the rest.
  • The constructed financial risk feature vectors are screened to find training samples whose financial risk feature vectors are completely identical.
  • Any one of the identical training samples is retained and the others are deleted, which improves the data quality of the training samples.
  • $Y_1 = (X_{11}, X_{12}, X_{13}, X_{14}, X_{15}, X_{16}, X_{17}, X_{18})$
  • $Y_2 = (X_{21}, X_{22}, X_{23}, X_{24}, X_{25}, X_{26}, X_{27}, X_{28})$
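  • The screening in step S202 amounts to removing duplicate feature vectors. A minimal Python sketch follows; the function name `deduplicate` and the sample vectors are assumptions for illustration, and keeping the first occurrence is one arbitrary choice, since the text allows any one of the identical samples to be retained.

```python
def deduplicate(samples):
    """Keep one training sample per distinct financial risk feature vector.

    The first occurrence is retained and the original order is preserved.
    """
    seen = set()
    unique = []
    for vector in samples:
        key = tuple(vector)  # tuples are hashable, lists are not
        if key not in seen:
            seen.add(key)
            unique.append(vector)
    return unique


# Hypothetical training samples with n = 8 features; the first and third are identical.
samples = [
    (23, 5, 1, 1, 0, 2, 1, 1),
    (30, 3, 0, 2, 1, 1, 2, 1),
    (23, 5, 1, 1, 0, 2, 1, 1),
]
filtered = deduplicate(samples)
```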
  • S203 Construct a training set according to the filtered financial risk feature vector.
  • the user historical financial information record table is established according to the filtered financial risk feature vector, and the user history financial information record table is used as the training set.
  • the constructed user history financial information record table has the specific form as shown in Table 1.
  • In this embodiment, the user financial features are determined from the historical financial information and redundant features are excluded.
  • Financial risk feature vectors are constructed from the user financial features and used as training samples, which avoids training on too many features and producing a model that is overly complex or poorly targeted. The financial risk feature vectors are then screened.
  • A user historical financial information record table is built from the screened financial risk feature vectors, and the resulting training set is used for machine model training, which improves the quality of the data used to train the machine learning model.
  • the risk assessment method further includes:
  • S21 Mark the identification information of the financial risk feature vector in the training set according to the preset classification condition.
  • the preset classification condition is a feature value interval preset for each user financial feature, and identification information corresponding to each feature value interval, and the financial risk feature vector is performed according to the preset classification condition. Marking, and further determining identification information corresponding to the user financial feature in each financial risk feature vector.
  • the preset classification condition may be set according to the historical experience, or may be set according to the data distribution of the specific user financial feature, and may be specifically set according to the needs of the actual application, and is not limited herein.
  • Table 2 shows the feature value intervals of each user financial feature and the corresponding identification information.
  • For example, for the user financial feature gender, the feature values are male and female. If the gender in the user's financial risk feature vector is male, the corresponding identification information is 1; if it is female, the corresponding identification information is 0.
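  • Marking a feature value with its identification information can be sketched in Python as below. The gender mapping follows the example just given; the monthly consumption intervals (0, 1000], (1000, 3000], (3000, 5000], (5000, 10000], and above 10000 with identification values 1 to 5 are taken from the monthly consumption example later in this document. The function names, and rejecting non-positive amounts, are assumptions of this sketch.

```python
from bisect import bisect_left

# Gender: male -> 1, female -> 0 (as in the example above).
GENDER_IDS = {"male": 1, "female": 0}

# Upper bounds of the right-closed intervals (0, 1000], (1000, 3000],
# (3000, 5000], (5000, 10000]; anything above 10000 falls in the fifth class.
CONSUMPTION_BOUNDS = [1000, 3000, 5000, 10000]


def label_gender(gender):
    """Return the identification information for a gender value."""
    return GENDER_IDS[gender]


def label_consumption(amount):
    """Return the identification information (1 to 5) for a monthly amount."""
    if amount <= 0:
        # The first interval is open at 0; rejecting such input is an assumption.
        raise ValueError("monthly consumption must be positive")
    # bisect_left keeps a value equal to an upper bound in the lower interval.
    return bisect_left(CONSUMPTION_BOUNDS, amount) + 1
```

Using `bisect_left` rather than `bisect_right` is what makes the intervals closed on the right, so an amount of exactly 1000 is marked 1 and 1001 is marked 2.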
  • The asset distribution covers the amounts of bonds, equity, gold, and cash held by the user. A reference table of feature value intervals and corresponding identification information is preset for the asset distribution, dividing it into five classes.
  • The corresponding identification information is 1, 2, 3, 4, and 5, where a larger identification value indicates a wider asset distribution and a stronger ability to bear risk. The reference table of the asset distribution is shown in Table 3.
  • Table 3. Reference table of the asset distribution
      Bond      Cash      Equity    Gold      Identification information
      0.309611  0.575552  0.06194   0.052896  1
      0.688191  0.086436  0.122431  0.102942  2
      0.644879  0         0.194244  0.160877  3
      0.515787  0         0.265836  0.218377  4
      0.310197  0         0.379852  0.309951  5
  • An asset distribution vector (bond, cash, equity, gold) is constructed for the user.
  • S22 Normalize the financial risk feature vector in the training set according to the result of the identification information tag.
  • The normalization may divide the identification value of each user financial feature by the maximum identification value of that feature in the training set, or by the average identification value of that feature in the training set. The method can be chosen according to the needs of the actual application and is not limited herein.
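  • Both normalization options can be sketched in a few lines of Python. The function names and the second training vector are assumptions made for illustration; mapping a feature whose maximum or mean is 0 to 0.0 is also an assumption, since the text does not cover that case.

```python
def normalize_by_max(training_set):
    """Divide each identification value by the column maximum in the training set."""
    maxima = [max(column) for column in zip(*training_set)]
    return [[v / m if m else 0.0 for v, m in zip(row, maxima)]
            for row in training_set]


def normalize_by_mean(training_set):
    """Divide each identification value by the column mean in the training set."""
    means = [sum(column) / len(column) for column in zip(*training_set)]
    return [[v / m if m else 0.0 for v, m in zip(row, means)]
            for row in training_set]


# Hypothetical marked training set with n = 8 user financial features.
training_set = [
    [23, 5, 1, 1, 0, 2, 1, 1],
    [46, 2, 0, 1, 1, 4, 2, 1],
]
by_max = normalize_by_max(training_set)
by_mean = normalize_by_mean(training_set)
```

Dividing by the maximum confines every value to the interval [0, 1], while dividing by the mean centres each feature around 1 without bounding it.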
  • For example, the identification information of a financial risk feature vector is marked according to Table 2, giving (23, 5, 1, 1, 0, 2, 1, 1), and the vector is then normalized according to the marking result to obtain the normalized financial risk feature vector.
  • In this embodiment, the financial risk feature vectors in the training set are marked with identification information according to the preset classification conditions, so that the user's historical financial information is quantized into specific numerical values for machine model training.
  • The financial risk feature vectors in the training set are then normalized according to the marking result, which brings the data into a specific interval, simplifies data processing, and improves the efficiency of constructing the financial risk assessment model.
  • The specific implementation of step S30, in which a random forest algorithm is used to construct decision trees on the training set to obtain the financial risk assessment model, is described in detail below through a specific embodiment.
  • FIG. 4 shows a specific implementation process of step S30 provided by the embodiment of the present application, which is described in detail as follows:
  • S301 Extract training samples from the training set by random sampling, and construct K sub-training sets.
  • Training samples are drawn from the training set by random sampling, for which a resampling technique may be used.
  • Resampling draws samples from the training set with replacement, so that every training sample has an equal probability of being drawn on each draw.
  • K rounds of extraction are performed on the training set, and the result of each round is used as a sub-training set, giving K sub-training sets, where the number of training samples in a sub-training set is less than or equal to the number of training samples in the training set.
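  • Sampling with replacement to build the K sub-training sets can be sketched with Python's standard library. The function name `bootstrap_subsets`, the `subset_size` and `seed` parameters, and the toy data are assumptions of this sketch.

```python
import random


def bootstrap_subsets(training_set, k, subset_size=None, seed=None):
    """Draw K sub-training sets from the training set with replacement.

    Every training sample has the same probability of being drawn on each
    draw, so a sub-training set may contain repeated samples.
    """
    size = len(training_set) if subset_size is None else subset_size
    if size > len(training_set):
        raise ValueError("a sub-training set may not exceed the training set")
    rng = random.Random(seed)  # seeded generator for reproducible sketches
    # random.Random.choices samples uniformly with replacement.
    return [rng.choices(training_set, k=size) for _ in range(k)]


# Hypothetical training set of 6 samples, K = 3 sub-training sets of 4 samples.
data = ["s1", "s2", "s3", "s4", "s5", "s6"]
subsets = bootstrap_subsets(data, k=3, subset_size=4, seed=0)
```

Each round is independent of the others, which is what lets the K decision trees see different views of the same data.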
  • H(c) is the information entropy before splitting according to the user financial feature X
  • H(c|X) is the information entropy after splitting according to the user financial feature X.
  • IntI is the penalty factor of the user financial feature
  • D is the total amount of the training samples in the sub-training set
  • W X is the number of training samples of each identification information of the user financial feature
  • gr is the information gain ratio of the user financial feature.
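  • The formulas these symbols belong to are missing from this text. Assuming they follow the standard C4.5 definitions (the text names C4.5 as the construction algorithm), the information gain ratio of formula (9) and the penalty factor of formula (10) would take the form below. The summation index x, which ranges over the identification values of feature X, and the base-2 logarithm are assumptions of this sketch.

```latex
\mathrm{IntI}(X) = -\sum_{x} \frac{W_x}{D}\,\log_2 \frac{W_x}{D},
\qquad
gr(X) = \frac{H(c) - H(c \mid X)}{\mathrm{IntI}(X)}
```

Here the numerator of gr(X) is the information gain, that is, the entropy before the split minus the entropy after it, and the penalty factor in the denominator grows with the number of distinct values the feature takes.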
  • For example, suppose the feature value intervals of the monthly consumption level are set to (0, 1000], (1000, 3000], (3000, 5000], (5000, 10000], and 10000+, the identification information corresponding to these intervals is 1, 2, 3, 4, and 5, and the numbers of training samples corresponding to each identification value are 40, 30, 10, 10, and 10. The penalty factor of the user's monthly consumption level is then calculated using formula (10).
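  • As a numerical check of the monthly consumption example, the sketch below evaluates the penalty factor for the sample counts 40, 30, 10, 10, and 10. Formula (10) itself is missing from this text, so the sketch assumes the standard C4.5 split information with a base-2 logarithm; the function name `penalty_factor` is illustrative.

```python
import math


def penalty_factor(counts):
    """Split information: -sum((W / D) * log2(W / D)) over identification values.

    `counts` holds W, the number of training samples per identification value;
    D is their total. Assumes the standard C4.5 definition.
    """
    total = sum(counts)
    return -sum((w / total) * math.log2(w / total) for w in counts if w > 0)


# Sample counts from the monthly consumption example (D = 100).
value = penalty_factor([40, 30, 10, 10, 10])  # approximately 2.046
```

A feature whose samples all share one identification value has a penalty factor of 0, and the factor is largest when the samples are spread evenly over many values.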
  • S305 Select the user financial feature with the maximum information gain ratio as the split node and split on it.
  • The C4.5 algorithm is used to construct the decision tree: the penalty factor of each user financial feature is calculated according to formula (10), the information gain ratio of each user financial feature is calculated using formula (9), and the user financial feature with the maximum information gain ratio is split as the split node.
  • If information gain alone were used, decision tree construction would tend to select user financial features with many distinct values as split nodes, such as the user's ID card number, credit card number, or a timestamp,
  • because their information gain is relatively large. When the training set contains such many-valued user financial features, the trained decision tree predicts less accurately. Calculating the information gain ratio with the penalty factor and splitting on the user financial feature with the maximum information gain ratio effectively avoids the adverse effect of such uniformly distributed attributes on decision tree splitting and improves the quality of decision tree construction.
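  • The effect of the penalty factor can be illustrated with a small Python sketch that selects the split feature by information gain ratio. It assumes the standard C4.5 definitions; the function names, the feature names `user_id` and `income`, and the four toy samples are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (base 2) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Information gain of the feature divided by its penalty factor."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Entropy after splitting: weighted entropy of each branch.
    after = sum(len(g) / total * entropy(g) for g in groups.values())
    penalty = entropy(feature_values)  # split information of the feature
    if penalty == 0:
        return 0.0  # a single-valued feature cannot split the samples
    return (entropy(labels) - after) / penalty


def best_split(samples, labels):
    """Return the feature name with the maximum information gain ratio."""
    return max(samples[0], key=lambda f: gain_ratio([s[f] for s in samples], labels))


# user_id is unique per sample; income separates the labels equally well.
samples = [
    {"user_id": 1, "income": 1},
    {"user_id": 2, "income": 1},
    {"user_id": 3, "income": 2},
    {"user_id": 4, "income": 2},
]
labels = ["low", "low", "high", "high"]
chosen = best_split(samples, labels)
```

Both features have the same information gain here, but `user_id` takes four distinct values and is penalised twice as heavily, so `income` is chosen as the split node.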
  • S306 Return to step S302 with the remaining user financial features and continue until all n user financial features have been used as split points, obtaining a decision tree.
  • For the remaining user financial features, the process returns to the sub-training set of step S302 and continues from the calculation of the information entropy of the user financial features until all n user financial features have been used as split points. The splits form the branches of the decision tree, which is thus built recursively.
  • S307 Construct a random forest from the generated K decision trees to obtain the financial risk assessment model.
  • The K decision trees are combined into a random forest, giving a financial risk assessment model for evaluating the user's financial risk tolerance level.
  • In this embodiment, training samples are drawn from the training set by random sampling with replacement to construct multiple sub-training sets for machine model training, which increases the randomness of the data used for model training and improves the quality of financial risk assessment.
  • For each sub-training set, the information gain ratio of each user financial feature is calculated, and each time the user financial feature with the maximum information gain ratio is split as the split node, until all user financial features have been used as split points and the corresponding decision tree is obtained.
  • A random forest is then constructed from the generated decision trees to obtain the financial risk assessment model.
  • Splitting on the maximum information gain ratio effectively avoids the adverse effect of uniformly distributed user financial features on decision tree splitting and improves the quality of decision tree construction.
  • Building the random forest from multiple decision trees strengthens the classification and prediction ability of the machine model and improves the accuracy of the financial risk assessment model.
  • After step S60 determines the financial risk category with the highest vote rate as the base category and calculates the left average of the vote rates of the financial risk categories below the base category and the right average of the vote rates of the financial risk categories above it, if the base category is the highest level of the financial risk categories, the calculation of the user's financial risk aversion coefficient can be further optimized.
  • FIG. 5 shows an implementation flow of optimizing the calculation of the user's financial risk aversion coefficient when the base category is the highest level of the financial risk categories, as detailed below:
  • The vote rate corresponding to the base category is compared with a preset first probability value. The first probability value can be set according to the actual application, for example 0.5, and is not limited herein.
  • The difference between the initial aversion coefficient of the base category and the left average is calculated according to formula (4), and the result is determined as the financial risk aversion coefficient of the user to be evaluated.
  • The preset second probability value can be set according to the actual application, for example 0.8, and the preset first adjustment parameter can likewise be set according to the actual application, for example 0.1; neither is limited herein.
  • the financial risk aversion coefficient of the user to be evaluated is calculated according to formula (11):
  • FinalScore is the financial risk aversion coefficient of the user to be evaluated
  • InitScore is the initial aversion coefficient of the reference category
  • rate is the ticketing rate of the reference category
  • ⁇ 1 is the preset first adjustment parameter.
  • S63 Determine the sum of the initial aversion coefficient of the reference category and the preset second adjustment parameter as the financial risk aversion coefficient if the vote rate corresponding to the reference category is greater than the second probability value.
  • the preset second adjustment parameter may be specifically set according to an actual application, for example, the second adjustment parameter may be specifically 1, and is not limited herein.
  • the financial risk aversion coefficient of the user to be evaluated is calculated according to formula (12):
  • ⁇ 2 is a preset second adjustment parameter.
  • For example, the initial aversion coefficients of the five financial risk categories are 1, 3, 5, 7 and 9, respectively;
  • the preset first probability value is 0.5;
  • the second probability value is 0.8;
  • the first adjustment parameter is 0.1;
  • the second adjustment parameter is 1.
  • If the highest vote share is Rate(5) = 0.57077, the base category is the high risk level, and the initial aversion coefficient of the base category is 9.
  • Because Rate(5) is greater than the first probability value and less than the second probability value, formula (11) is used, which gives FinalScore = 9 + 0.57077 - 0.1 = 9.47077.
  • In the embodiment corresponding to FIG. 5, the vote share obtained when the base category is the highest financial risk category is compared with the preset probability values, and fine-tuning such as attenuation or rounding is applied according to the comparison result.
  • FIG. 6 shows the risk assessment apparatus corresponding to the risk assessment method provided in the foregoing embodiments. For ease of description, only the parts related to the embodiments of the present application are shown.
  • The risk assessment apparatus includes a user historical financial information acquisition module 10, a training set construction module 20, a financial risk assessment model construction module 30, a financial risk assessment model prediction module 40, a vote share statistics module 50, a base category determination module 60, a financial risk aversion coefficient calculation module 70 and a financial risk tolerance level determination module 80.
  • Each functional module is described in detail as follows:
  • the user historical financial information acquisition module 10 is configured to acquire historical financial information of sample users;
  • the training set construction module 20 is configured to construct a training set from the historical financial information;
  • the financial risk assessment model construction module 30 is configured to construct decision trees on the training set using a random forest algorithm to obtain a financial risk assessment model, where the financial risk assessment model includes K decision trees and K is a positive integer;
  • the financial risk assessment model prediction module 40 is configured to use the financial risk assessment model to make model predictions on the financial information of the user under evaluation, and to obtain the prediction result of each decision tree in the financial risk assessment model for that user;
  • the vote share statistics module 50 is configured to vote on preset financial risk categories according to the prediction results and to count the vote share of each financial risk category, where the financial risk categories include multiple preset risk levels and an initial aversion coefficient corresponding to each risk level;
  • the base category determination module 60 is configured to determine the financial risk category with the highest vote share as the base category, and to calculate the left mean of the vote shares of the financial risk categories below the base category and the right mean of the vote shares of the financial risk categories above the base category;
  • the financial risk aversion coefficient calculation module 70 is configured to determine the difference between the initial aversion coefficient of the base category and the left mean as the financial risk aversion coefficient of the user under evaluation if the left mean is greater than or equal to the right mean, and otherwise to determine the sum of the initial aversion coefficient of the base category and the right mean as the financial risk aversion coefficient;
  • the financial risk tolerance level determination module 80 is configured to determine the financial risk tolerance level of the user under evaluation according to the financial risk aversion coefficient.
  • Further, the training set construction module 20 includes:
  • the financial risk feature vector construction unit 201, configured to determine n user financial features from the historical financial information, construct financial risk feature vectors based on the user financial features, and use the financial risk feature vectors as training samples, where n is a positive integer;
  • the financial risk feature vector screening unit 202, configured to screen the financial risk feature vectors: if multiple training samples with identical financial risk feature vectors are detected, any one of them is retained and the rest are deleted;
  • the training set construction unit 203, configured to construct the training set from the screened financial risk feature vectors.
  • Further, the risk assessment apparatus also includes:
  • the label marking module 21, configured to mark labels on the financial risk feature vectors in the training set according to preset classification conditions;
  • the normalization processing module 22, configured to normalize the financial risk feature vectors in the training set according to the marked labels.
  • Further, the financial risk assessment model construction module 30 includes:
  • the sub-training set construction unit 301, configured to draw training samples from the training set by random sampling and construct K sub-training sets;
  • the information entropy calculation unit 302, configured to calculate, for each sub-training set, the information entropy of each user financial feature as H(X) = -∑ p(x_i) log2 p(x_i);
  • the information gain calculation unit 303, configured to calculate the information gain of each user financial feature from the information entropy as gain = H(c) - H(c|X), where
  • H(c) is the information entropy before splitting on the user financial feature X, and
  • H(c|X) is the information entropy after splitting on the user financial feature X;
  • the information gain ratio calculation unit 304, configured to calculate the information gain ratio of each user financial feature from the information gain as gr = gain / IntI, with IntI = -∑ (W_X / D) log2(W_X / D), where
  • IntI is the penalty factor of the user financial feature,
  • D is the total number of training samples in the sub-training set,
  • W_X is the number of training samples for each label of the user financial feature, and
  • gr is the information gain ratio of the user financial feature;
  • the split node selection unit 305, configured to select the user financial feature with the largest information gain ratio as the split node;
  • the decision tree generation unit 306, configured to return, for the remaining user financial features, to the step of calculating the information entropy of each user financial feature for each sub-training set, until all n user financial features have been used as split points, thereby obtaining a decision tree;
  • the financial risk assessment model construction unit 307, configured to construct a random forest from the K generated decision trees and obtain the financial risk assessment model.
  • Further, the risk assessment apparatus also includes:
  • the first calculation module 61, configured to determine, when the base category is the highest financial risk category and the vote share of the base category is less than the preset first probability value, the difference between the initial aversion coefficient of the base category and the left mean as the financial risk aversion coefficient;
  • the second calculation module 62, configured to determine, if the vote share of the base category is greater than the first probability value and less than the preset second probability value, the sum of the initial aversion coefficient of the base category and that vote share minus the preset first adjustment parameter as the financial risk aversion coefficient;
  • the third calculation module 63, configured to determine, if the vote share of the base category is greater than the second probability value, the sum of the initial aversion coefficient of the base category and the preset second adjustment parameter as the financial risk aversion coefficient.
  • Embodiments of the present application also provide one or more non-volatile computer readable storage media storing computer readable instructions. When executed by one or more processors, the instructions cause the processors to perform the risk assessment method of the foregoing embodiments, or to implement the functions of the modules/units of the risk assessment apparatus of the foregoing embodiments. To avoid repetition, details are not described again here.
  • The non-volatile computer readable storage media storing computer readable instructions may include any entity or device capable of carrying the computer readable instruction code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunication signals, and the like.
  • FIG. 7 is a schematic diagram of a terminal device according to an embodiment of the present application.
  • The terminal device 7 of this embodiment includes a processor 71, a memory 72, and computer readable instructions 73 stored in the memory 72 and executable on the processor 71.
  • When executing the computer readable instructions 73, the processor 71 implements the steps of the risk assessment method of the above embodiments, such as steps S10 to S80 shown in FIG. 1.
  • Alternatively, when executing the computer readable instructions 73, the processor 71 implements the functions of the modules/units of the risk assessment apparatus of the above embodiments, such as the functions of modules 10 to 80 shown in FIG. 6.
  • For example, the computer readable instructions 73 may be divided into one or more modules/units, which are stored in the memory 72 and executed by the processor 71 to carry out the present application.
  • The one or more modules/units may be a series of instruction segments of computer readable instructions capable of performing specific functions; the instruction segments describe the execution of the computer readable instructions 73 in the terminal device 7.
  • For example, the computer readable instructions 73 may be divided into a user historical financial information acquisition module, a training set construction module, a financial risk assessment model construction module, a financial risk assessment model prediction module, a vote share statistics module, a base category determination module, a financial risk aversion coefficient calculation module and a financial risk tolerance level determination module.
  • The specific functions of these modules are as described in the foregoing embodiments. To avoid repetition, details are not described again here.
  • The terminal device 7 may be a computing device such as a desktop computer, a notebook, a palmtop computer or a cloud server.
  • The terminal device 7 may include, but is not limited to, the processor 71, the memory 72 and the computer program 73. Those skilled in the art will understand that FIG. 7 is only an example of the terminal device 7 and does not limit it; the terminal device may include more or fewer components than illustrated, combine certain components, or use different components.
  • For example, the terminal device 7 may further include input/output devices, network access devices, a bus, and the like.
  • The processor 71 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on.
  • The general-purpose processor may be a microprocessor or any conventional processor.
  • The memory 72 may be an internal storage unit of the terminal device 7, such as a hard disk or main memory of the terminal device 7.
  • The memory 72 may also be an external storage device of the terminal device 7, such as a plug-in hard disk fitted to the terminal device 7, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card.
  • The memory 72 may also include both an internal storage unit of the terminal device 7 and an external storage device.
  • The memory 72 is used to store the computer program and other programs and data required by the terminal device 7.
  • The memory 72 may also be used to temporarily store data that has been output or is about to be output.

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Theoretical Computer Science (AREA)
  • Marketing (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Technology Law (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

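The present application discloses a risk assessment method, apparatus, terminal device and storage medium. The risk assessment method includes: acquiring historical financial information of sample users to construct a training set; constructing decision trees on the training set using a random forest algorithm; building a random forest from the generated decision trees to obtain a financial risk assessment model; using the financial risk assessment model to make predictions on the financial information of a user under evaluation; collecting the prediction result of each decision tree in the model; and making full use of the vote of every decision tree to further calculate the financial risk aversion coefficient of the user under evaluation. By building a financial risk assessment model to make predictions on the user's financial information, and by performing further calculations on the collected prediction results, the technical solution obtains the user's financial risk tolerance level and improves the precision of the user's financial risk assessment.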

Description

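Risk assessment method, apparatus, terminal device and storage medium
This application is based on, and claims priority to, Chinese invention patent application No. 201810435813.1, filed on May 9, 2018 and entitled "Risk assessment method, apparatus, terminal device and storage medium".
Technical Field
The present application relates to the technical field of financial services, and in particular to a risk assessment method, apparatus, terminal device and storage medium.
Background
Promoting many financial services requires a clear understanding of a user's financial risk tolerance level. Risk tolerance is a person's capacity to bear risk. It depends on personal assets, family circumstances, employment and other factors, which must be weighed together to estimate how large an investment loss the user can bear without affecting normal life.
At present, a user's risk tolerance level is generally assessed by questionnaire. For example, before a user buys a wealth management product, fund or stock, the user completes a risk assessment questionnaire, and the user's risk tolerance level is derived from the answers combined with the user's personal information. However, the answers given in such questionnaires are often highly subjective, or the user information obtained from the questionnaire is incomplete, so the user's real financial situation is not reflected objectively. The resulting financial risk tolerance level is therefore often inaccurate, and the accuracy of the financial risk assessment is low.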
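Summary
Embodiments of the present application provide a risk assessment method to solve the prior-art problem that financial risk assessment of a user's financial risk tolerance level has low accuracy.
In a first aspect, an embodiment of the present application provides a risk assessment method, including:
acquiring historical financial information of sample users;
constructing a training set from the historical financial information;
constructing decision trees on the training set using a random forest algorithm to obtain a financial risk assessment model, where the financial risk assessment model includes K decision trees and K is a positive integer;
using the financial risk assessment model to make model predictions on the financial information of a user under evaluation, to obtain the prediction result of each decision tree in the financial risk assessment model for the user under evaluation;
voting on preset financial risk categories according to the prediction results and counting the vote share of each financial risk category, where the financial risk categories include multiple preset risk levels and an initial aversion coefficient corresponding to each risk level;
determining the financial risk category with the highest vote share as the base category, and calculating a left mean of the vote shares of the financial risk categories below the base category and a right mean of the vote shares of the financial risk categories above the base category;
if the left mean is greater than or equal to the right mean, determining the difference between the initial aversion coefficient of the base category and the left mean as the financial risk aversion coefficient of the user under evaluation; otherwise, determining the sum of the initial aversion coefficient of the base category and the right mean as the financial risk aversion coefficient;
determining the financial risk tolerance level of the user under evaluation according to the financial risk aversion coefficient.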
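In a second aspect, an embodiment of the present application provides a risk assessment apparatus, including:
a user historical financial information acquisition module, configured to acquire historical financial information of sample users;
a training set construction module, configured to construct a training set from the historical financial information;
a financial risk assessment model construction module, configured to construct decision trees on the training set using a random forest algorithm to obtain a financial risk assessment model, where the financial risk assessment model includes K decision trees and K is a positive integer;
a financial risk assessment model prediction module, configured to use the financial risk assessment model to make model predictions on the financial information of a user under evaluation, to obtain the prediction result of each decision tree in the financial risk assessment model for the user under evaluation;
a vote share statistics module, configured to vote on preset financial risk categories according to the prediction results and to count the vote share of each financial risk category, where the financial risk categories include multiple preset risk levels and an initial aversion coefficient corresponding to each risk level;
a base category determination module, configured to determine the financial risk category with the highest vote share as the base category, and to calculate a left mean of the vote shares of the financial risk categories below the base category and a right mean of the vote shares of the financial risk categories above the base category;
a financial risk aversion coefficient calculation module, configured to determine the difference between the initial aversion coefficient of the base category and the left mean as the financial risk aversion coefficient of the user under evaluation if the left mean is greater than or equal to the right mean, and otherwise to determine the sum of the initial aversion coefficient of the base category and the right mean as the financial risk aversion coefficient;
a financial risk tolerance level determination module, configured to determine the financial risk tolerance level of the user under evaluation according to the financial risk aversion coefficient.
In a third aspect, an embodiment of the present application provides a terminal device including a memory, a processor, and computer readable instructions stored in the memory and executable on the processor, where the processor implements the steps of the risk assessment method when executing the computer readable instructions.
In a fourth aspect, an embodiment of the present application provides one or more non-volatile computer readable storage media storing computer readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the risk assessment method.
Details of one or more embodiments of the present application are set forth in the drawings and description below. Other features and advantages of the present application will become apparent from the specification, the drawings and the claims.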
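Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of the risk assessment method provided in an embodiment of the present application;
FIG. 2 is an implementation flowchart of step S20 of the risk assessment method provided in an embodiment of the present application;
FIG. 3 is an implementation flowchart of normalizing the financial risk feature vectors in the risk assessment method provided in an embodiment of the present application;
FIG. 4 is an implementation flowchart of step S30 of the risk assessment method provided in an embodiment of the present application;
FIG. 5 is an implementation flowchart of refining the calculation of the user's financial risk aversion coefficient when the base category is the highest financial risk category, in the risk assessment method provided in an embodiment of the present application;
FIG. 6 is a schematic diagram of the risk assessment apparatus provided in an embodiment of the present application;
FIG. 7 is a schematic diagram of the terminal device provided in an embodiment of the present application.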
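Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present application.
Referring to FIG. 1, FIG. 1 shows the implementation flow of the risk assessment method provided in an embodiment of the present application. The method can collect users' historical financial information from a user database and train a financial risk assessment model on it. It can be applied in a user financial risk assessment system in the financial services industry to evaluate users' financial risk tolerance levels, and can effectively improve the precision of that assessment. As shown in FIG. 1, the risk assessment method includes steps S10 to S80, detailed as follows:
S10: Acquire historical financial information of sample users.
In this embodiment, the historical financial information of sample users may be collected from a user database. The data stored in the user database includes, but is not limited to, users' registration information, questionnaires, historical financial consumption information and bank card information.
Specifically, the historical financial information includes the user's basic attribute information and financial attribute information. The basic attribute information includes the user's ID card, age, gender, education level, educational attainment and family members. The financial attribute information includes monthly consumption level, income level, investment experience, investment term of wealth management products and asset distribution.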
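S20: Construct a training set from the historical financial information.
In this embodiment, financial risk feature vectors are constructed from the acquired historical financial information of the sample users. A financial risk feature vector includes the user's basic attribute information and financial attribute information.
Specifically, the financial risk feature vector is defined as Y = (X_1, X_2, X_3, ..., X_n), where Y is the financial risk feature vector and X_1, X_2, X_3, ..., X_n are the n user financial features.
Further, the financial risk feature vectors are used to build the training set, which serves as the training data for model training. The training set includes M financial risk feature vectors, where M is a positive integer.
S30: Construct decision trees on the training set using a random forest algorithm to obtain a financial risk assessment model, where the financial risk assessment model includes K decision trees and K is a positive integer.
In this embodiment, multiple financial risk feature vectors are drawn at random from the training set. The sampling is random sampling with replacement and is repeated for K rounds; the result of each round forms one sub-training set, giving K sub-training sets. The K sub-training sets are mutually independent, and a sub-training set may contain repeated financial risk feature vectors.
Note that the number of financial risk feature vectors drawn for each sub-training set may be chosen from past experience or according to specific business needs. More training samples generally give a more accurate model, but they also raise the training cost and make implementation harder. The number may be chosen according to the actual application and is not limited herein.
Further, a decision tree is built for each sub-training set using the random forest algorithm, giving K decision trees. A random forest is then constructed from the K generated decision trees to obtain the financial risk assessment model.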
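S40: Use the financial risk assessment model to make model predictions on the financial information of the user under evaluation, and obtain the prediction result of each decision tree in the financial risk assessment model for that user.
In this embodiment, the financial information of the user under evaluation is acquired, and the financial risk assessment model obtained in step S30 is used to make predictions on it. Specifically, each decision tree in the financial risk assessment model judges the user's financial information, assesses the user's financial risk tolerance level, and outputs the corresponding prediction result.
S50: Vote on the preset financial risk categories according to the prediction results, and count the vote share of each financial risk category, where the financial risk categories include multiple preset risk levels and an initial aversion coefficient corresponding to each risk level.
In this embodiment, the financial risk categories of the financial risk assessment model are set in advance, and a corresponding initial aversion coefficient is set for each financial risk category.
Note that the financial risk categories and their initial aversion coefficients may be set from past experience or according to the characteristics of the financial risk model. They may be set according to the actual application and are not limited herein.
For example, the financial risk categories may be divided into five: low risk level, relatively low risk level, medium risk level, relatively high risk level and high risk level, with initial aversion coefficients 1, 3, 5, 7 and 9 respectively. A smaller aversion coefficient means the user is more averse to risk, that is, has a weaker financial risk tolerance; a larger aversion coefficient means the user can bear more risk, that is, has a stronger financial risk tolerance.
Further, according to the prediction results obtained in step S40 and the preset financial risk categories, every decision tree in the financial risk assessment model casts a vote on the financial information of the user under evaluation, and the vote share of each financial risk category is calculated according to formula (1):
$$\mathrm{Rate} = \frac{T}{K} \tag{1}$$
where Rate is the vote share, T is the number of votes the decision trees cast for the financial risk category, and K is the total number of decision trees.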
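As an illustration of formula (1), the vote count can be sketched in a few lines of Python. The function name `vote_shares`, the list of ten tree predictions and the category encoding 1 to 5 are assumptions made for this example; they do not come from the application.

```python
from collections import Counter

def vote_shares(tree_predictions, categories):
    """Return the vote share T / K of every category, in category order."""
    k = len(tree_predictions)  # K: total number of decision trees
    if k == 0:
        raise ValueError("at least one tree prediction is required")
    counts = Counter(tree_predictions)  # T for each category that received votes
    return [counts.get(c, 0) / k for c in categories]

# Ten hypothetical trees voting over risk levels 1 (low) to 5 (high).
predictions = [3, 3, 2, 5, 3, 4, 1, 3, 2, 5]
shares = vote_shares(predictions, categories=[1, 2, 3, 4, 5])
print(shares)
```

Categories that receive no votes get a share of 0, so the shares always sum to 1 over the full category list.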
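S60: Determine the financial risk category with the highest vote share as the base category, and calculate the left mean of the vote shares of the financial risk categories below the base category and the right mean of the vote shares of the financial risk categories above the base category.
In this embodiment, the financial risk category with the highest vote share is determined as the base category. The financial risk categories below the base category are the left-side financial risk categories, and those above it are the right-side financial risk categories.
For example, suppose the financial risk categories are low risk level, relatively low risk level, medium risk level, relatively high risk level and high risk level. If the base category is the relatively low risk level, the left-side categories consist of the low risk level, and the right-side categories consist of the medium, relatively high and high risk levels.
Specifically, from the vote share of each financial risk category, the left mean of the vote shares of the left-side financial risk categories is calculated according to formula (2):
$$\mathrm{LeftMean} = \frac{\sum \mathrm{LeftRate}}{\mathrm{LeftNum}} \tag{2}$$
where LeftMean is the left mean, ∑LeftRate is the sum of the vote shares of the left-side financial risk categories, and LeftNum is the number of left-side financial risk categories.
The right mean of the vote shares of the right-side financial risk categories is calculated according to formula (3):
$$\mathrm{RightMean} = \frac{\sum \mathrm{RightRate}}{\mathrm{RightNum}} \tag{3}$$
where RightMean is the right mean, ∑RightRate is the sum of the vote shares of the right-side financial risk categories, and RightNum is the number of right-side financial risk categories.
S70: If the left mean is greater than or equal to the right mean, determine the difference between the initial aversion coefficient of the base category and the left mean as the financial risk aversion coefficient of the user under evaluation; otherwise, determine the sum of the initial aversion coefficient of the base category and the right mean as the financial risk aversion coefficient.
Specifically, the financial risk aversion coefficient of the user under evaluation is calculated by analysing the vote share of each financial risk category.
If the left mean calculated in step S60 is greater than or equal to the right mean, the financial risk aversion coefficient of the user under evaluation is calculated according to formula (4):
$$\mathrm{FinalScore} = \mathrm{InitScore} - \mathrm{LeftMean} \tag{4}$$
where FinalScore is the financial risk aversion coefficient of the user under evaluation, InitScore is the initial aversion coefficient of the base category, and LeftMean is the left mean.
If the left mean calculated in step S60 is less than the right mean, the financial risk aversion coefficient of the user under evaluation is calculated according to formula (5):
$$\mathrm{FinalScore} = \mathrm{InitScore} + \mathrm{RightMean} \tag{5}$$
where RightMean is the right mean.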
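S80: Determine the financial risk tolerance level of the user under evaluation according to the financial risk aversion coefficient.
Specifically, the financial risk tolerance level of the user under evaluation is determined by analysing the financial risk aversion coefficient calculated in step S70.
For a better understanding of this embodiment, an example follows:
Suppose the financial risk categories are the low, relatively low, medium, relatively high and high risk levels, with initial aversion coefficients 1, 3, 5, 7 and 9 respectively.
The financial risk assessment model is used to make predictions on the financial information of one user under evaluation, and formula (1) gives the vote shares Rate(1) = 0.15735, Rate(2) = 0.19358, Rate(3) = 0.27222, Rate(4) = 0.17111 and Rate(5) = 0.20572.
The highest vote share is Rate(3), so the base category is the medium risk level and its initial aversion coefficient is 5. Formulas (2) and (3) give the left mean and right mean:
LeftMean = (Rate(1) + Rate(2)) / 2 = (0.15735 + 0.19358) / 2 = 0.175465
RightMean = (Rate(4) + Rate(5)) / 2 = (0.17111 + 0.20572) / 2 = 0.188415
Because the left mean is less than the right mean, formula (5) is used to calculate the financial risk aversion coefficient of the user under evaluation:
FinalScore = 5 + 0.188415 = 5.188415
The financial risk tolerance level of the user under evaluation can then be determined from this financial risk aversion coefficient.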
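The worked example above can be reproduced with a short Python sketch of steps S60 and S70. The function name `risk_aversion_coefficient` is hypothetical, and two behaviours are assumptions of this sketch rather than statements of the application: ties for the highest vote share go to the lowest-ranked category, and a side with no categories is treated as having no mean (so formula (5) is used when the base category is the lowest level and formula (4) when it is the highest).

```python
def risk_aversion_coefficient(shares, init_scores):
    """Apply formulas (2)-(5) to vote shares ordered from lowest to highest risk level."""
    # Base category: index of the highest vote share (first maximum wins ties, by assumption).
    base = max(range(len(shares)), key=lambda i: shares[i])
    left, right = shares[:base], shares[base + 1:]
    left_mean = sum(left) / len(left) if left else 0.0      # formula (2)
    right_mean = sum(right) / len(right) if right else 0.0  # formula (3)
    # Formula (4) needs a left side; it also applies when there is no right side.
    if left and (not right or left_mean >= right_mean):
        return init_scores[base] - left_mean                # formula (4)
    return init_scores[base] + right_mean                   # formula (5)

shares = [0.15735, 0.19358, 0.27222, 0.17111, 0.20572]
score = risk_aversion_coefficient(shares, init_scores=[1, 3, 5, 7, 9])
print(round(score, 6))  # 5.188415
```

The refinement that steps S61 to S63 describe for a base category at the highest level is not included here.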
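Note that if the financial risk category with the highest vote share were simply taken as the financial risk tolerance level of the user under evaluation, the predictions of the other decision trees in the financial risk assessment model would be ignored, which introduces some error into the assessment. Fine-tuning around the financial risk categories, by assigning each an initial aversion coefficient and making full use of the vote of every decision tree, gives a more precise financial risk aversion coefficient and tolerance level for the user under evaluation. The prediction results of the model are thus considered as a whole, which improves the precision of the user's financial risk assessment.
In the embodiment corresponding to FIG. 1, historical financial information of sample users is acquired and a training set is built for model training. Decision trees are constructed on the training set using a random forest algorithm, and a random forest is built from the generated trees to obtain the financial risk assessment model used to evaluate a user's financial risk tolerance level, which improves the efficiency of the assessment. After the model makes predictions on the financial information of the user under evaluation, the vote share of each preset financial risk category is counted from the prediction results of the individual decision trees, and the vote of every tree is used to calculate the user's financial risk aversion coefficient more precisely and determine the user's financial risk tolerance level. This avoids the subjective factors of predictions based on a questionnaire completed by the user, improves the accuracy and precision of the user's financial risk assessment, and helps offer the user financial products with suitable risk.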
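Next, on the basis of the embodiment corresponding to FIG. 1, a specific embodiment is used to describe in detail how step S20 constructs the training set from the historical financial information.
Referring to FIG. 2, FIG. 2 shows the specific implementation flow of step S20 provided in an embodiment of the present application, detailed as follows:
S201: Determine n user financial features from the historical financial information, construct financial risk feature vectors based on the user financial features, and use the financial risk feature vectors as training samples, where n is a positive integer.
In this embodiment, n user financial features are determined from the historical financial information acquired in step S10, and the data corresponding to these n features is used for model training. This avoids using too many features, which would make the model overly complex or poorly targeted.
Here n is a positive integer. The n user financial features may be determined from past experience or according to the characteristics of the model; they may be determined according to the actual application and are not limited herein.
Further, the financial risk feature vector Y is constructed from the determined user financial features X and used as a training sample.
For example, the financial risk feature vector Y is defined as Y = (X_1, X_2, X_3, X_4, X_5, X_6, X_7, X_8), where X_1 is the user's age, X_2 the user's education level, X_3 the user's gender, X_4 the user's income level, X_5 the user's asset distribution, X_6 the user's monthly consumption level, X_7 the user's investment experience, and X_8 the user's product investment term.
S202: Screen the financial risk feature vectors. If multiple training samples with identical financial risk feature vectors are detected, retain any one of them and delete the rest.
In this embodiment, the constructed financial risk feature vectors are screened to remove training samples whose financial risk feature vectors are exactly identical.
Specifically, if multiple training samples with identical financial risk feature vectors are detected, any one of them is retained and the rest are deleted, which improves the data quality of the training samples.
For example, the financial risk feature vectors Y of two users are obtained, where
Y_1 = (X_11, X_12, X_13, X_14, X_15, X_16, X_17, X_18)
Y_2 = (X_21, X_22, X_23, X_24, X_25, X_26, X_27, X_28)
If Y_1 and Y_2 contain exactly the same data, only one of the two financial risk feature vectors needs to be retained, and the other is deleted.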
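A minimal Python sketch of the screening in step S202 follows. The description allows any one of the identical samples to be kept; keeping the first occurrence is a choice made here, and the function name `deduplicate` and the sample values are hypothetical.

```python
def deduplicate(samples):
    """Keep the first occurrence of each distinct feature vector, preserving order."""
    seen = set()
    unique = []
    for vector in samples:
        key = tuple(vector)  # tuples are hashable, so they can be stored in a set
        if key not in seen:
            seen.add(key)
            unique.append(vector)
    return unique

# Hypothetical feature vectors (X_1 ... X_8); the third repeats the first exactly.
samples = [
    (23, 5, 1, 1, 0, 2, 1, 1),
    (35, 4, 0, 3, 2, 3, 2, 2),
    (23, 5, 1, 1, 0, 2, 1, 1),
]
training_set = deduplicate(samples)
print(len(training_set))  # 2
```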
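S203: Construct the training set from the screened financial risk feature vectors.
In this embodiment, a user historical financial information record table is built from the screened financial risk feature vectors and used as the training set.
For example, the user historical financial information record table takes the form shown in Table 1.
Table 1
[Table 1, the user historical financial information record table, appears only as image PCTCN2018122992-appb-000004 in the source.]
In the embodiment corresponding to FIG. 2, user financial features are determined from the historical financial information and redundant features are excluded. Financial risk feature vectors built from these features are used as training samples, which avoids using too many features and making the model overly complex or poorly targeted. The financial risk feature vectors are then screened, and the user historical financial information record table is built from the screened vectors to obtain the training set for model training, which improves the quality of the data used to train the machine learning model.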
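On the basis of the embodiment corresponding to FIG. 2, after step S20 constructs the training set from the historical financial information and before step S30 constructs decision trees on the training set using the random forest algorithm to obtain the financial risk assessment model, the financial risk feature vectors in the training set may also be normalized. As shown in FIG. 3, the risk assessment method further includes:
S21: Mark labels on the financial risk feature vectors in the training set according to preset classification conditions.
In this embodiment, the preset classification conditions are feature value intervals set in advance for each user financial feature, together with the label corresponding to each interval. The financial risk feature vectors are marked according to these conditions, which determines the label of each user financial feature in each financial risk feature vector.
Note that the preset classification conditions may be set from past experience or according to the data distribution of the specific user financial feature. They may be set according to the actual application and are not limited herein.
For a better understanding of this step, a specific set of feature value intervals and corresponding labels is used as an example. Table 2 shows the standard feature value intervals and corresponding labels of each user financial feature.
Table 2
[Table 2, the feature value intervals and labels of each user financial feature, appears only as image PCTCN2018122992-appb-000005 in the source.]
For example, the feature value interval of the user financial feature "gender" is set to male and female. If the gender in the user's financial risk feature vector is male, the corresponding label is 1; if it is female, the corresponding label is 0.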
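As a hedged sketch of step S21, interval-based labelling can be written with a sorted list of interval bounds. Table 2 is not reproduced in the source, so the bounds below borrow the monthly consumption intervals quoted later in the description, with the first interval read as (0, 1000]; the function names and the handling of non-positive amounts are assumptions.

```python
from bisect import bisect_left

# Upper bounds of the right-closed intervals (0,1000], (1000,3000], (3000,5000],
# (5000,10000]; anything above 10000 falls in the last class. Labels run from 1 to 5.
MONTHLY_SPEND_BOUNDS = [1000, 3000, 5000, 10000]

def label_monthly_spend(amount):
    """Map a monthly consumption amount to its interval label."""
    if amount <= 0:
        # The intervals start above 0; rejecting other values is an assumption.
        raise ValueError("amount must be positive")
    return bisect_left(MONTHLY_SPEND_BOUNDS, amount) + 1

def label_gender(gender):
    """Male is labelled 1 and female 0, as in the gender example."""
    return {"male": 1, "female": 0}[gender]

print(label_monthly_spend(2000), label_gender("male"))  # 2 1
```

`bisect_left` returns the index of the first bound that is not smaller than the amount, which matches right-closed intervals: 1000 maps to label 1 and 1000.01 to label 2.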
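Specifically, the asset distribution covers the amounts of bonds, equity and gold the user has bought and the cash the user holds. A baseline table of feature value intervals and corresponding labels is set in advance for the asset distribution. The intervals are divided into five classes with labels 1, 2, 3, 4 and 5; a larger label means the user's assets are more widely spread and the user's capacity to bear risk is stronger. The baseline table is shown in Table 3.
Table 3
Bonds      Cash       Equity     Gold       Label
0.309611   0.575552   0.06194    0.052896   1
0.688191   0.086436   0.122431   0.102942   2
0.644879   0          0.194244   0.160877   3
0.515787   0          0.265836   0.218377   4
0.310197   0          0.379852   0.309951   5
An asset distribution vector is constructed from the user's asset distribution, where asset distribution vector = (bonds, cash, equity, gold). The user's asset distribution vector is A = (A_1, A_2, A_3, A_4), and an asset distribution vector in the baseline table is B = (B_1, B_2, B_3, B_4).
For each asset distribution vector in the baseline table, the cosine between the user's asset distribution vector and the baseline vector is calculated according to formula (6):
$$\cos(A, B) = \frac{\sum_{i=1}^{4} A_i B_i}{\sqrt{\sum_{i=1}^{4} A_i^2}\,\sqrt{\sum_{i=1}^{4} B_i^2}} \tag{6}$$
The baseline vector with the largest cosine is selected, and its label in the baseline table is used as the asset distribution label in the financial risk feature vector. A larger cosine means the user's asset distribution vector is more similar to that baseline vector.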
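The label lookup by cosine similarity can be sketched as follows. The baseline rows are copied from Table 3; the function names, the example user vector and the decision to reject an all-zero vector (for which the cosine is undefined) are assumptions of this sketch.

```python
import math

BASELINE = [  # ((bonds, cash, equity, gold), label), copied from Table 3
    ((0.309611, 0.575552, 0.06194, 0.052896), 1),
    ((0.688191, 0.086436, 0.122431, 0.102942), 2),
    ((0.644879, 0.0, 0.194244, 0.160877), 3),
    ((0.515787, 0.0, 0.265836, 0.218377), 4),
    ((0.310197, 0.0, 0.379852, 0.309951), 5),
]

def cosine(a, b):
    """Cosine of the angle between two vectors of equal length, as in formula (6)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for an all-zero vector")
    return dot / norm

def asset_label(user_vector):
    """Return the label of the baseline row most similar to the user's vector."""
    _, label = max(BASELINE, key=lambda row: cosine(user_vector, row[0]))
    return label

# A hypothetical user holding mostly cash and some bonds.
print(asset_label((0.3, 0.6, 0.05, 0.05)))  # 1
```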
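S22: Normalize the financial risk feature vectors in the training set according to the marked labels.
In this embodiment, normalization may divide the label value of each user financial feature by the maximum label value of that feature in the training set, or divide it by the mean label value of that feature in the training set. The processing may be chosen according to the actual application and is not limited herein.
For example, if a user's financial risk feature vector is (23 years old, bachelor's degree, male, 8000, 0, 2000, 0, 0), marking its labels according to Table 2 gives (23, 5, 1, 1, 0, 2, 1, 1). Normalizing according to the marked labels then gives the normalized financial risk feature vector:
[The normalized vector appears only as image PCTCN2018122992-appb-000007 in the source.]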
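The first normalization option, dividing each label by the largest label of the same feature in the training set, might look like the sketch below. The three-feature vectors are hypothetical (the maxima of the real training set are not given in the source), and mapping a feature whose maximum is 0 to 0.0 is an assumption.

```python
def normalize_by_max(labelled_vectors):
    """Divide every label by the largest label observed for the same feature."""
    columns = list(zip(*labelled_vectors))      # one tuple of labels per feature
    maxima = [max(column) for column in columns]
    return [
        tuple(value / m if m else 0.0 for value, m in zip(vector, maxima))
        for vector in labelled_vectors
    ]

# Hypothetical (age, education, gender) labels for three users.
labelled = [(23, 5, 1), (46, 4, 0), (30, 2, 1)]
normalized = normalize_by_max(labelled)
print(normalized[0])  # (0.5, 1.0, 1.0)
```

Dividing by the per-feature mean, the second option in the description, would only change how `maxima` is computed.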
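In the embodiment corresponding to FIG. 3, labels are marked on the financial risk feature vectors in the training set according to preset classification conditions, so that the users' historical financial information is quantified into concrete values for model training. The vectors are then normalized according to the marked labels, which brings the data into a fixed range, simplifies data processing, and improves the efficiency of building the financial assessment model.
On the basis of the embodiment corresponding to FIG. 3, a specific embodiment is used below to describe in detail how step S30 constructs decision trees on the training set using the random forest algorithm to obtain the financial risk assessment model.
Referring to FIG. 4, FIG. 4 shows the specific implementation flow of step S30 provided in an embodiment of the present application, detailed as follows:
S301: Draw training samples from the training set by random sampling and construct K sub-training sets.
In this embodiment, training samples are drawn from the training set by random sampling. The sampling may use a resampling technique, that is, sampling with replacement in which every sample in the training set is equally likely to be drawn each time. K rounds of sampling are performed; the result of each round forms one sub-training set, giving K sub-training sets. The number of training samples in a sub-training set is less than or equal to the number of training samples in the training set.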
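Step S301 corresponds to bootstrap sampling, which can be sketched with the standard library alone. The function name `bootstrap_subsets`, its parameters and the integer stand-ins for training samples are hypothetical; `random.Random.choices` draws uniformly with replacement.

```python
import random

def bootstrap_subsets(training_set, k, subset_size=None, seed=None):
    """Draw k sub-training sets by uniform random sampling with replacement."""
    rng = random.Random(seed)
    size = len(training_set) if subset_size is None else subset_size
    if size > len(training_set):
        # The description keeps each sub-training set no larger than the training set.
        raise ValueError("a sub-training set may not exceed the training set size")
    return [rng.choices(training_set, k=size) for _ in range(k)]

# Ten stand-in samples, three rounds of eight draws each.
subsets = bootstrap_subsets(list(range(10)), k=3, subset_size=8, seed=0)
print([len(s) for s in subsets])  # [8, 8, 8]
```

Because each draw is independent, a sample can appear more than once in a sub-training set, as the description allows.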
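S302: For each sub-training set, calculate the information entropy of each user financial feature according to formula (7):
$$H(X) = -\sum_{i} p(x_i)\,\log_2 p(x_i) \tag{7}$$
where X is a user financial feature, H(X) is the information entropy of the user financial feature, i = 1, 2, ..., n, x_i is the i-th user financial feature, and p(x_i) is the probability of the feature value of the i-th user financial feature.
S303: From the information entropy given by formula (7), calculate the information gain of each user financial feature according to formula (8):
$$\mathrm{gain} = H(c) - H(c \mid X) \tag{8}$$
where gain is the information gain of the user financial feature, H(c) is the information entropy before splitting on the user financial feature X, and H(c|X) is the information entropy after splitting on the user financial feature X.
S304: From the information gain given by formula (8), calculate the information gain ratio of each user financial feature according to formulas (9) and (10):
$$gr = \frac{\mathrm{gain}}{\mathrm{IntI}} \tag{9}$$
$$\mathrm{IntI} = -\sum \frac{W_X}{D}\,\log_2 \frac{W_X}{D} \tag{10}$$
where IntI is the penalty factor of the user financial feature, D is the total number of training samples in the sub-training set, W_X is the number of training samples for each label of the user financial feature, and gr is the information gain ratio of the user financial feature.
For example, suppose the user financial feature X is the monthly consumption level, with feature value intervals (0,1000], (1000,3000], (3000,5000], (5000,10000] and 10000+, corresponding labels 1, 2, 3, 4 and 5, and 40, 30, 10, 10 and 10 training samples per label respectively. Formula (10) then gives the penalty factor of the monthly consumption level:
$$\mathrm{IntI} = -\left(\tfrac{40}{100}\log_2\tfrac{40}{100} + \tfrac{30}{100}\log_2\tfrac{30}{100} + 3 \times \tfrac{10}{100}\log_2\tfrac{10}{100}\right) \approx 2.0464$$
Further, formula (9) gives the information gain ratio of the monthly consumption level: information gain ratio = information gain of the monthly consumption level / penalty factor of the monthly consumption level.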
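Formulas (7) to (10) can be checked numerically with a small Python sketch. The function names are hypothetical, the class labels used for the gain are whatever labels the training samples carry, and returning 0.0 when the penalty factor is zero (a feature with a single value) is an assumption; the description does not cover that case.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (base 2) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())

def gain_ratio(feature_labels, classes):
    """Information gain ratio of one feature with respect to the class labels."""
    total = len(classes)
    groups = {}
    for label, cls in zip(feature_labels, classes):
        groups.setdefault(label, []).append(cls)
    # H(c|X): entropy of the classes after splitting, weighted by group size.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(classes) - conditional       # formula (8)
    penalty = entropy(feature_labels)           # formula (10), the penalty factor IntI
    return gain / penalty if penalty else 0.0   # formula (9)

# Penalty factor of the monthly consumption example: 40, 30, 10, 10, 10 samples per label.
spend_labels = [1] * 40 + [2] * 30 + [3] * 10 + [4] * 10 + [5] * 10
penalty = entropy(spend_labels)
print(round(penalty, 4))  # 2.0464
```

The penalty factor is simply the entropy of the feature's own label distribution, which is why a feature with many evenly populated values is penalized most.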
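S305: Select the user financial feature with the largest information gain ratio as the split node.
In this embodiment, the decision trees are built with the C4.5 algorithm. The penalty factor of each user financial feature is calculated according to formula (10), the information gain ratio of each user financial feature is calculated according to formula (9), and the tree is split on the user financial feature with the largest information gain ratio.
Note that if the information gain itself were used as the splitting criterion, tree construction would tend to choose user financial features with a large information gain, such as the user's ID card number, credit card number or a timestamp. When the training set contains user financial features with many distinct values, the resulting decision tree predicts poorly. Calculating the information gain ratio with the penalty factor and splitting on the feature with the largest ratio effectively avoids the adverse effect of uniformly distributed attributes on decision tree splitting and improves the quality of the decision trees.
S306: For the remaining user financial features, return to step S302 and continue until all n user financial features have been used as split points, which yields a decision tree.
In this embodiment, for the remaining user financial features, execution returns to step S302, which calculates the information entropy of each user financial feature for each sub-training set, and continues until all n user financial features have been used as split points. The splits form the branches of the decision tree, which is built recursively.
S307: Construct a random forest from the K generated decision trees to obtain the financial risk assessment model.
Specifically, the K decision trees generated by steps S302 to S306 are combined into a random forest to obtain the financial risk assessment model, which is used to evaluate a user's financial risk tolerance level.
In the embodiment corresponding to FIG. 4, training samples are drawn from the training set by random sampling with replacement to build multiple sub-training sets for model training, which increases the randomness of the training data and improves the quality of the financial risk assessment. For each sub-training set, the information gain ratio of each user financial feature is calculated, and the feature with the largest ratio is chosen as the split node each time, until all user financial features have been used as split points; this yields the corresponding decision tree. A random forest is built from the generated trees to obtain the financial risk assessment model. Splitting on the largest information gain ratio effectively avoids the adverse effect of uniformly distributed user financial features on decision tree splitting and improves tree quality, and building the random forest from multiple trees strengthens the classification ability of the model and improves the accuracy of the financial risk assessment model.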
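On the basis of the above embodiments, after step S60 determines the financial risk category with the highest vote share as the base category and calculates the left mean of the vote shares of the categories below the base category and the right mean of the vote shares of the categories above it, the user's financial risk aversion coefficient can be further refined if the base category is the highest financial risk category.
Referring to FIG. 5, FIG. 5 shows the implementation flow for refining the calculation of the user's financial risk aversion coefficient when the base category is the highest financial risk category, detailed as follows:
S61: When the base category is the highest financial risk category, if the vote share of the base category is less than a preset first probability value, determine the difference between the initial aversion coefficient of the base category and the left mean as the financial risk aversion coefficient.
In this embodiment, when the highest vote share belongs to the highest risk level of the financial risk categories, there is no right mean.
Specifically, the vote share of the base category is compared with the preset first probability value. The first probability value may be set according to the actual application, for example 0.5, and is not limited herein.
If the vote share of the base category is less than the preset first probability value, the difference between the initial aversion coefficient of the base category and the left mean is calculated according to formula (4), and the result is determined as the financial risk aversion coefficient of the user under evaluation.
S62: If the vote share of the base category is greater than the first probability value and less than a preset second probability value, determine the sum of the initial aversion coefficient of the base category and that vote share, minus a preset first adjustment parameter, as the financial risk aversion coefficient.
In this embodiment, the preset second probability value may be set according to the actual application, for example 0.8, and the preset first adjustment parameter may be set according to the actual application, for example 0.1; neither is limited herein.
For the base category determined in step S61, if its vote share is greater than the first probability value and less than the second probability value, the financial risk aversion coefficient of the user under evaluation is calculated according to formula (11):
$$\mathrm{FinalScore} = \mathrm{InitScore} + \mathrm{rate} - \theta_1 \tag{11}$$
where FinalScore is the financial risk aversion coefficient of the user under evaluation, InitScore is the initial aversion coefficient of the base category, rate is the vote share of the base category, and θ_1 is the preset first adjustment parameter.
S63: If the vote share of the base category is greater than the second probability value, determine the sum of the initial aversion coefficient of the base category and a preset second adjustment parameter as the financial risk aversion coefficient.
In this embodiment, the preset second adjustment parameter may be set according to the actual application, for example 1, and is not limited herein.
For the base category determined in step S61, if its vote share is greater than the second probability value, the financial risk aversion coefficient of the user under evaluation is calculated according to formula (12):
$$\mathrm{FinalScore} = \mathrm{InitScore} + \theta_2 \tag{12}$$
where θ_2 is the preset second adjustment parameter.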
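For a better understanding of this embodiment, an example follows:
Suppose the financial risk categories are the low, relatively low, medium, relatively high and high risk levels, with initial aversion coefficients 1, 3, 5, 7 and 9 respectively. The preset first probability value is 0.5, the second probability value is 0.8, the first adjustment parameter is 0.1, and the second adjustment parameter is 1.
The financial risk assessment model is used to make predictions on the financial information of one user under evaluation, and formula (1) gives the vote shares Rate(1) = 0.01826, Rate(2) = 0.06849, Rate(3) = 0.10273, Rate(4) = 0.23972 and Rate(5) = 0.57077.
The highest vote share is Rate(5), so the base category is the high risk level and its initial aversion coefficient is 9.
Because Rate(5) is greater than the first probability value and less than the second probability value, formula (11) is used to calculate the financial risk aversion coefficient of the user under evaluation:
FinalScore = 9 + 0.57077 - 0.1 = 9.47077
Note that when the highest vote share belongs to the lowest risk level of the financial risk categories, there is no left mean, and formula (5) can be used to calculate the financial risk aversion coefficient of the user under evaluation.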
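Steps S61 to S63 amount to a three-way branch on the vote share of the top category, sketched below with the example's parameter values as defaults. The function name `refine_top_category` is hypothetical. The description does not say what happens when the vote share exactly equals a threshold; assigning equality to the upper branch is an assumption of this sketch.

```python
def refine_top_category(init_score, rate, left_mean,
                        p1=0.5, p2=0.8, theta1=0.1, theta2=1.0):
    """Steps S61-S63 for a base category that is the highest risk level."""
    if rate < p1:
        return init_score - left_mean       # S61, formula (4)
    if rate < p2:
        return init_score + rate - theta1   # S62, formula (11)
    return init_score + theta2              # S63, formula (12)

shares = [0.01826, 0.06849, 0.10273, 0.23972, 0.57077]
left_mean = sum(shares[:-1]) / len(shares[:-1])
score = refine_top_category(init_score=9, rate=shares[-1], left_mean=left_mean)
print(round(score, 5))  # 9.47077
```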
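In the embodiment corresponding to FIG. 5, the vote share obtained when the base category is the highest financial risk category is compared with the preset probability values, and fine-tuning such as attenuation or rounding is applied according to the comparison result. The classification result of every decision tree in the financial risk assessment model and the vote share of every financial risk category are thereby fully used and mapped to a concrete financial risk aversion coefficient, from which the user's financial risk tolerance level can be obtained precisely, improving the precision of the assessment.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution. The order of execution of each process is determined by its function and internal logic, and does not limit the implementation of the embodiments of the present application in any way.
Corresponding to the risk assessment method in the foregoing embodiments, FIG. 6 shows a risk assessment apparatus in one-to-one correspondence with that method. For ease of description, only the parts related to the embodiments of the present application are shown.
As shown in FIG. 6, the risk assessment apparatus includes a user historical financial information acquisition module 10, a training set construction module 20, a financial risk assessment model construction module 30, a financial risk assessment model prediction module 40, a vote share statistics module 50, a base category determination module 60, a financial risk aversion coefficient calculation module 70 and a financial risk tolerance level determination module 80. Each functional module is described in detail as follows:
the user historical financial information acquisition module 10, configured to acquire historical financial information of sample users;
the training set construction module 20, configured to construct a training set from the historical financial information;
the financial risk assessment model construction module 30, configured to construct decision trees on the training set using a random forest algorithm to obtain a financial risk assessment model, where the financial risk assessment model includes K decision trees and K is a positive integer;
the financial risk assessment model prediction module 40, configured to use the financial risk assessment model to make model predictions on the financial information of the user under evaluation, to obtain the prediction result of each decision tree in the financial risk assessment model for that user;
the vote share statistics module 50, configured to vote on preset financial risk categories according to the prediction results and to count the vote share of each financial risk category, where the financial risk categories include multiple preset risk levels and an initial aversion coefficient corresponding to each risk level;
the base category determination module 60, configured to determine the financial risk category with the highest vote share as the base category, and to calculate the left mean of the vote shares of the financial risk categories below the base category and the right mean of the vote shares of the financial risk categories above the base category;
the financial risk aversion coefficient calculation module 70, configured to determine the difference between the initial aversion coefficient of the base category and the left mean as the financial risk aversion coefficient of the user under evaluation if the left mean is greater than or equal to the right mean, and otherwise to determine the sum of the initial aversion coefficient of the base category and the right mean as the financial risk aversion coefficient;
the financial risk tolerance level determination module 80, configured to determine the financial risk tolerance level of the user under evaluation according to the financial risk aversion coefficient.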
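Further, the training set construction module 20 includes:
the financial risk feature vector construction unit 201, configured to determine n user financial features from the historical financial information, construct financial risk feature vectors based on the user financial features, and use the financial risk feature vectors as training samples, where n is a positive integer;
the financial risk feature vector screening unit 202, configured to screen the financial risk feature vectors and, if multiple training samples with identical financial risk feature vectors are detected, retain any one of them and delete the rest;
the training set construction unit 203, configured to construct the training set from the screened financial risk feature vectors.
Further, the risk assessment apparatus also includes:
the label marking module 21, configured to mark labels on the financial risk feature vectors in the training set according to preset classification conditions;
the normalization processing module 22, configured to normalize the financial risk feature vectors in the training set according to the marked labels.
Further, the financial risk assessment model construction module 30 includes:
the sub-training set construction unit 301, configured to draw training samples from the training set by random sampling and construct K sub-training sets;
the information entropy calculation unit 302, configured to calculate, for each sub-training set, the information entropy of each user financial feature according to the following formula:
$$H(X) = -\sum_{i} p(x_i)\,\log_2 p(x_i)$$
where X is a user financial feature, H(X) is the information entropy of the user financial feature, i = 1, 2, ..., n, x_i is the i-th user financial feature, and p(x_i) is the probability of the feature value of the i-th user financial feature;
the information gain calculation unit 303, configured to calculate the information gain of each user financial feature from the information entropy according to the following formula:
$$\mathrm{gain} = H(c) - H(c \mid X)$$
where gain is the information gain of the user financial feature, H(c) is the information entropy before splitting on the user financial feature X, and H(c|X) is the information entropy after splitting on the user financial feature X;
the information gain ratio calculation unit 304, configured to calculate the information gain ratio of each user financial feature from the information gain according to the following formulas:
$$gr = \frac{\mathrm{gain}}{\mathrm{IntI}}$$
$$\mathrm{IntI} = -\sum \frac{W_X}{D}\,\log_2 \frac{W_X}{D}$$
where IntI is the penalty factor of the user financial feature, D is the total number of training samples in the sub-training set, W_X is the number of training samples for each label of the user financial feature, and gr is the information gain ratio of the user financial feature;
the split node selection unit 305, configured to select the user financial feature with the largest information gain ratio as the split node;
the decision tree generation unit 306, configured to return, for the remaining user financial features, to the step of calculating the information entropy of each user financial feature for each sub-training set, and to continue until all n user financial features have been used as split points, thereby obtaining a decision tree;
the financial risk assessment model construction unit 307, configured to construct a random forest from the K generated decision trees and obtain the financial risk assessment model.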
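Further, the risk assessment apparatus also includes:
the first calculation module 61, configured to determine, when the base category is the highest financial risk category and the vote share of the base category is less than the preset first probability value, the difference between the initial aversion coefficient of the base category and the left mean as the financial risk aversion coefficient;
the second calculation module 62, configured to determine, if the vote share of the base category is greater than the first probability value and less than the preset second probability value, the sum of the initial aversion coefficient of the base category and that vote share, minus the preset first adjustment parameter, as the financial risk aversion coefficient;
the third calculation module 63, configured to determine, if the vote share of the base category is greater than the second probability value, the sum of the initial aversion coefficient of the base category and the preset second adjustment parameter as the financial risk aversion coefficient.
For the process by which each module of the risk assessment apparatus provided in this embodiment implements its function, refer to the description of the foregoing embodiments; details are not repeated here.
Embodiments of the present application also provide one or more non-volatile computer readable storage media storing computer readable instructions. When executed by one or more processors, the computer readable instructions cause the one or more processors to perform the risk assessment method of the foregoing embodiments, or to implement the functions of the modules/units of the risk assessment apparatus of the foregoing embodiments. To avoid repetition, details are not repeated here.
It can be understood that the one or more non-volatile computer readable storage media storing computer readable instructions may include: any entity or device capable of carrying the computer readable instruction code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunication signals, and the like.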
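FIG. 7 is a schematic diagram of a terminal device provided by an embodiment of this application. As shown in FIG. 7, the terminal device 7 of this embodiment includes a processor 71, a memory 72, and computer-readable instructions 73 stored in the memory 72 and executable on the processor 71. When executing the computer-readable instructions 73, the processor 71 implements the steps of the risk assessment method in the foregoing embodiments, for example steps S10 to S80 shown in FIG. 1. Alternatively, when executing the computer-readable instructions 73, the processor 71 implements the functions of the modules/units of the risk assessment apparatus in the foregoing embodiments, for example the functions of modules 10 to 80 shown in FIG. 6.
By way of example, the computer-readable instructions 73 may be divided into one or more modules/units, which are stored in the memory 72 and executed by the processor 71 to carry out this application. The one or more modules/units may be a series of computer-readable instruction segments capable of performing specific functions, and the instruction segments describe the execution of the computer-readable instructions 73 in the terminal device 7. For example, the computer-readable instructions 73 may be divided into a user historical financial information acquisition module, a training set construction module, a financial risk assessment model construction module, a financial risk assessment model prediction module, a vote share statistics module, a benchmark category determination module, a financial risk aversion coefficient calculation module, and a financial risk tolerance level determination module. The specific functions of the modules are as described in the foregoing embodiments and, to avoid repetition, are not repeated here.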
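The terminal device 7 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. The terminal device 7 may include, but is not limited to, the processor 71, the memory 72, and the computer-readable instructions 73. Those skilled in the art will understand that FIG. 7 is merely an example of the terminal device 7 and does not limit it; the terminal device 7 may include more or fewer components than shown, combine certain components, or use different components. For example, the terminal device 7 may further include input/output devices, network access devices, buses, and the like.
The processor 71 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
The memory 72 may be an internal storage unit of the terminal device 7, such as a hard disk or internal memory of the terminal device 7. The memory 72 may also be an external storage device of the terminal device 7, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the terminal device 7. Further, the memory 72 may include both an internal storage unit and an external storage device of the terminal device 7. The memory 72 is used to store the computer-readable instructions and other programs and data required by the terminal device 7, and may also be used to temporarily store data that has been output or is about to be output.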
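Those skilled in the art will clearly understand that, for convenience and brevity of description, the division into the functional units and modules above is given only as an example. In practice, the functions may be assigned to different functional units or modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above.
The embodiments above are intended only to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application, and shall all fall within the scope of protection of this application.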

Claims (20)

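  1. A risk assessment method, comprising:
    obtaining historical financial information of sample users;
    constructing a training set according to the historical financial information;
    constructing decision trees for the training set using a random forest algorithm to obtain a financial risk assessment model, wherein the financial risk assessment model comprises K decision trees and K is a positive integer;
    performing model prediction on financial information of a user to be evaluated using the financial risk assessment model, to obtain a prediction result for the user to be evaluated from each decision tree in the financial risk assessment model;
    voting on preset financial risk categories according to the prediction results, and computing the vote share of each financial risk category, wherein the financial risk categories comprise a plurality of preset risk levels and an initial aversion coefficient corresponding to each risk level;
    determining the financial risk category with the highest vote share as a benchmark category, and calculating a left average of the vote shares of the financial risk categories below the benchmark category and a right average of the vote shares of the financial risk categories above the benchmark category;
    if the left average is greater than or equal to the right average, determining the difference between the initial aversion coefficient of the benchmark category and the left average as a financial risk aversion coefficient of the user to be evaluated; otherwise, determining the sum of the initial aversion coefficient of the benchmark category and the right average as the financial risk aversion coefficient; and
    determining a financial risk tolerance level of the user to be evaluated according to the financial risk aversion coefficient.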
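  2. The risk assessment method according to claim 1, wherein constructing the training set according to the historical financial information comprises:
    determining n user financial features according to the historical financial information, constructing financial risk feature vectors based on the user financial features, and using the financial risk feature vectors as training samples, wherein n is a positive integer;
    screening the financial risk feature vectors and, if a plurality of training samples having identical financial risk feature vectors are detected, retaining any one of those training samples and deleting the rest; and
    constructing the training set from the screened financial risk feature vectors.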
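  3. The risk assessment method according to claim 2, wherein, after constructing the training set according to the historical financial information and before constructing decision trees for the training set using the random forest algorithm to obtain the financial risk assessment model, the risk assessment method further comprises:
    labeling the financial risk feature vectors in the training set with identification information according to preset classification conditions; and
    normalizing the financial risk feature vectors in the training set according to the result of the identification information labeling.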
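  4. The risk assessment method according to claim 3, wherein constructing decision trees for the training set using the random forest algorithm to obtain the financial risk assessment model comprises:
    drawing training samples from the training set by random sampling to construct K sub-training sets;
    for each sub-training set, calculating the information entropy of each user financial feature according to the following formula:
    $H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)$
    wherein X is the user financial feature, H(X) is the information entropy of the user financial feature, i = 1, 2, ..., n, x_i is the i-th user financial feature, and p(x_i) is the feature value probability of the i-th user financial feature;
    calculating the information gain of each user financial feature from the information entropy according to the following formula:
    $\mathrm{gain} = H(c) - H(c \mid X)$
    wherein gain is the information gain of the user financial feature, H(c) is the information entropy before splitting on the user financial feature X, and H(c|X) is the information entropy after splitting on the user financial feature X;
    calculating the information gain ratio of each user financial feature from the information gain according to the following formulas:
    [Formula image: PCTCN2018122992-appb-100001]
    [Formula image: PCTCN2018122992-appb-100002]
    wherein IntI is the penalty factor of the user financial feature, D is the total number of training samples in the sub-training set, W_X is the number of training samples for each piece of identification information of the user financial feature, and gr is the information gain ratio of the user financial feature;
    selecting the user financial feature with the largest information gain ratio as the split node and splitting on it;
    for the remaining user financial features, returning to the step of calculating, for each sub-training set, the information entropy of each user financial feature and continuing until all n user financial features have served as split nodes and completed splitting, thereby obtaining the decision tree; and
    constructing a random forest from the K generated decision trees to obtain the financial risk assessment model.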
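  5. The risk assessment method according to any one of claims 1 to 4, wherein, after determining the financial risk category with the highest vote share as the benchmark category and calculating the left average of the vote shares of the financial risk categories below the benchmark category and the right average of the vote shares of the financial risk categories above the benchmark category, the risk assessment method further comprises:
    when the benchmark category is the highest level of the financial risk categories, if the vote share of the benchmark category is less than a preset first probability value, determining the difference between the initial aversion coefficient of the benchmark category and the left average as the financial risk aversion coefficient;
    if the vote share of the benchmark category is greater than the first probability value and less than a preset second probability value, determining the value obtained by subtracting a preset first adjustment parameter from the sum of the initial aversion coefficient of the benchmark category and that vote share as the financial risk aversion coefficient; and
    if the vote share of the benchmark category is greater than the second probability value, determining the sum of the initial aversion coefficient of the benchmark category and a preset second adjustment parameter as the financial risk aversion coefficient.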
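  6. A risk assessment apparatus, comprising:
    a user historical financial information acquisition module, configured to obtain historical financial information of sample users;
    a training set construction module, configured to construct a training set according to the historical financial information;
    a financial risk assessment model construction module, configured to construct decision trees for the training set using a random forest algorithm to obtain a financial risk assessment model, wherein the financial risk assessment model comprises K decision trees and K is a positive integer;
    a financial risk assessment model prediction module, configured to perform model prediction on financial information of a user to be evaluated using the financial risk assessment model, to obtain a prediction result for the user to be evaluated from each decision tree in the financial risk assessment model;
    a vote share statistics module, configured to vote on preset financial risk categories according to the prediction results and compute the vote share of each financial risk category, wherein the financial risk categories comprise a plurality of preset risk levels and an initial aversion coefficient corresponding to each risk level;
    a benchmark category determination module, configured to determine the financial risk category with the highest vote share as a benchmark category, and calculate a left average of the vote shares of the financial risk categories below the benchmark category and a right average of the vote shares of the financial risk categories above the benchmark category;
    a financial risk aversion coefficient calculation module, configured to: if the left average is greater than or equal to the right average, determine the difference between the initial aversion coefficient of the benchmark category and the left average as a financial risk aversion coefficient of the user to be evaluated; otherwise, determine the sum of the initial aversion coefficient of the benchmark category and the right average as the financial risk aversion coefficient; and
    a financial risk tolerance level determination module, configured to determine a financial risk tolerance level of the user to be evaluated according to the financial risk aversion coefficient.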
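  7. The risk assessment apparatus according to claim 6, wherein the training set construction module comprises:
    a financial risk feature vector construction unit, configured to determine n user financial features according to the historical financial information, construct financial risk feature vectors based on the user financial features, and use the financial risk feature vectors as training samples, wherein n is a positive integer;
    a financial risk feature vector screening unit, configured to screen the financial risk feature vectors and, if a plurality of training samples having identical financial risk feature vectors are detected, retain any one of those training samples and delete the rest; and
    a training set construction unit, configured to construct the training set from the screened financial risk feature vectors.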
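  8. The risk assessment apparatus according to claim 7, further comprising:
    an identification information labeling module, configured to label the financial risk feature vectors in the training set with identification information according to preset classification conditions; and
    a normalization module, configured to normalize the financial risk feature vectors in the training set according to the result of the identification information labeling.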
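  9. The risk assessment apparatus according to claim 8, wherein the financial risk assessment model construction module comprises:
    a sub-training-set construction unit, configured to draw training samples from the training set by random sampling to construct K sub-training sets;
    an information entropy calculation unit, configured to calculate, for each sub-training set, the information entropy of each user financial feature according to the following formula:
    $H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)$
    wherein X is the user financial feature, H(X) is the information entropy of the user financial feature, i = 1, 2, ..., n, x_i is the i-th user financial feature, and p(x_i) is the feature value probability of the i-th user financial feature;
    an information gain calculation unit, configured to calculate the information gain of each user financial feature from the information entropy according to the following formula:
    $\mathrm{gain} = H(c) - H(c \mid X)$
    wherein gain is the information gain of the user financial feature, H(c) is the information entropy before splitting on the user financial feature X, and H(c|X) is the information entropy after splitting on the user financial feature X;
    an information gain ratio calculation unit, configured to calculate the information gain ratio of each user financial feature from the information gain according to the following formulas:
    [Formula image: PCTCN2018122992-appb-100003]
    [Formula image: PCTCN2018122992-appb-100004]
    wherein IntI is the penalty factor of the user financial feature, D is the total number of training samples in the sub-training set, W_X is the number of training samples for each piece of identification information of the user financial feature, and gr is the information gain ratio of the user financial feature;
    a split node selection unit, configured to select the user financial feature with the largest information gain ratio as the split node and split on it;
    a decision tree generation unit, configured to return, for the remaining user financial features, to the step of calculating, for each sub-training set, the information entropy of each user financial feature and continue execution until all n user financial features have served as split nodes and completed splitting, thereby obtaining the decision tree; and
    a financial risk assessment model construction unit, configured to construct a random forest from the K generated decision trees to obtain the financial risk assessment model.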
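  10. The risk assessment apparatus according to any one of claims 6 to 9, further comprising:
    a first calculation module, configured to: when the benchmark category is the highest level of the financial risk categories, if the vote share of the benchmark category is less than a preset first probability value, determine the difference between the initial aversion coefficient of the benchmark category and the left average as the financial risk aversion coefficient;
    a second calculation module, configured to: if the vote share of the benchmark category is greater than the first probability value and less than a preset second probability value, determine the value obtained by subtracting a preset first adjustment parameter from the sum of the initial aversion coefficient of the benchmark category and that vote share as the financial risk aversion coefficient; and
    a third calculation module, configured to: if the vote share of the benchmark category is greater than the second probability value, determine the sum of the initial aversion coefficient of the benchmark category and a preset second adjustment parameter as the financial risk aversion coefficient.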
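  11. A terminal device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following steps:
    obtaining historical financial information of sample users;
    constructing a training set according to the historical financial information;
    constructing decision trees for the training set using a random forest algorithm to obtain a financial risk assessment model, wherein the financial risk assessment model comprises K decision trees and K is a positive integer;
    performing model prediction on financial information of a user to be evaluated using the financial risk assessment model, to obtain a prediction result for the user to be evaluated from each decision tree in the financial risk assessment model;
    voting on preset financial risk categories according to the prediction results, and computing the vote share of each financial risk category, wherein the financial risk categories comprise a plurality of preset risk levels and an initial aversion coefficient corresponding to each risk level;
    determining the financial risk category with the highest vote share as a benchmark category, and calculating a left average of the vote shares of the financial risk categories below the benchmark category and a right average of the vote shares of the financial risk categories above the benchmark category;
    if the left average is greater than or equal to the right average, determining the difference between the initial aversion coefficient of the benchmark category and the left average as a financial risk aversion coefficient of the user to be evaluated; otherwise, determining the sum of the initial aversion coefficient of the benchmark category and the right average as the financial risk aversion coefficient; and
    determining a financial risk tolerance level of the user to be evaluated according to the financial risk aversion coefficient.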
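  12. The terminal device according to claim 11, wherein constructing the training set according to the historical financial information comprises:
    determining n user financial features according to the historical financial information, constructing financial risk feature vectors based on the user financial features, and using the financial risk feature vectors as training samples, wherein n is a positive integer;
    screening the financial risk feature vectors and, if a plurality of training samples having identical financial risk feature vectors are detected, retaining any one of those training samples and deleting the rest; and
    constructing the training set from the screened financial risk feature vectors.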
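  13. The terminal device according to claim 12, wherein, after constructing the training set according to the historical financial information and before constructing decision trees for the training set using the random forest algorithm to obtain the financial risk assessment model, the processor, when executing the computer-readable instructions, further implements the following steps:
    labeling the financial risk feature vectors in the training set with identification information according to preset classification conditions; and
    normalizing the financial risk feature vectors in the training set according to the result of the identification information labeling.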
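  14. The terminal device according to claim 13, wherein constructing decision trees for the training set using the random forest algorithm to obtain the financial risk assessment model comprises:
    drawing training samples from the training set by random sampling to construct K sub-training sets;
    for each sub-training set, calculating the information entropy of each user financial feature according to the following formula:
    $H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)$
    wherein X is the user financial feature, H(X) is the information entropy of the user financial feature, i = 1, 2, ..., n, x_i is the i-th user financial feature, and p(x_i) is the feature value probability of the i-th user financial feature;
    calculating the information gain of each user financial feature from the information entropy according to the following formula:
    $\mathrm{gain} = H(c) - H(c \mid X)$
    wherein gain is the information gain of the user financial feature, H(c) is the information entropy before splitting on the user financial feature X, and H(c|X) is the information entropy after splitting on the user financial feature X;
    calculating the information gain ratio of each user financial feature from the information gain according to the following formulas:
    [Formula image: PCTCN2018122992-appb-100005]
    [Formula image: PCTCN2018122992-appb-100006]
    wherein IntI is the penalty factor of the user financial feature, D is the total number of training samples in the sub-training set, W_X is the number of training samples for each piece of identification information of the user financial feature, and gr is the information gain ratio of the user financial feature;
    selecting the user financial feature with the largest information gain ratio as the split node and splitting on it;
    for the remaining user financial features, returning to the step of calculating, for each sub-training set, the information entropy of each user financial feature and continuing until all n user financial features have served as split nodes and completed splitting, thereby obtaining the decision tree; and
    constructing a random forest from the K generated decision trees to obtain the financial risk assessment model.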
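  15. The terminal device according to any one of claims 11 to 14, wherein, after determining the financial risk category with the highest vote share as the benchmark category and calculating the left average of the vote shares of the financial risk categories below the benchmark category and the right average of the vote shares of the financial risk categories above the benchmark category, the processor, when executing the computer-readable instructions, further implements the following steps:
    when the benchmark category is the highest level of the financial risk categories, if the vote share of the benchmark category is less than a preset first probability value, determining the difference between the initial aversion coefficient of the benchmark category and the left average as the financial risk aversion coefficient;
    if the vote share of the benchmark category is greater than the first probability value and less than a preset second probability value, determining the value obtained by subtracting a preset first adjustment parameter from the sum of the initial aversion coefficient of the benchmark category and that vote share as the financial risk aversion coefficient; and
    if the vote share of the benchmark category is greater than the second probability value, determining the sum of the initial aversion coefficient of the benchmark category and a preset second adjustment parameter as the financial risk aversion coefficient.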
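  16. One or more non-volatile computer-readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
    obtaining historical financial information of sample users;
    constructing a training set according to the historical financial information;
    constructing decision trees for the training set using a random forest algorithm to obtain a financial risk assessment model, wherein the financial risk assessment model comprises K decision trees and K is a positive integer;
    performing model prediction on financial information of a user to be evaluated using the financial risk assessment model, to obtain a prediction result for the user to be evaluated from each decision tree in the financial risk assessment model;
    voting on preset financial risk categories according to the prediction results, and computing the vote share of each financial risk category, wherein the financial risk categories comprise a plurality of preset risk levels and an initial aversion coefficient corresponding to each risk level;
    determining the financial risk category with the highest vote share as a benchmark category, and calculating a left average of the vote shares of the financial risk categories below the benchmark category and a right average of the vote shares of the financial risk categories above the benchmark category;
    if the left average is greater than or equal to the right average, determining the difference between the initial aversion coefficient of the benchmark category and the left average as a financial risk aversion coefficient of the user to be evaluated; otherwise, determining the sum of the initial aversion coefficient of the benchmark category and the right average as the financial risk aversion coefficient; and
    determining a financial risk tolerance level of the user to be evaluated according to the financial risk aversion coefficient.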
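  17. The non-volatile computer-readable storage media according to claim 16, wherein constructing the training set according to the historical financial information comprises:
    determining n user financial features according to the historical financial information, constructing financial risk feature vectors based on the user financial features, and using the financial risk feature vectors as training samples, wherein n is a positive integer;
    screening the financial risk feature vectors and, if a plurality of training samples having identical financial risk feature vectors are detected, retaining any one of those training samples and deleting the rest; and
    constructing the training set from the screened financial risk feature vectors.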
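  18. The non-volatile computer-readable storage media according to claim 17, wherein, after constructing the training set according to the historical financial information and before constructing decision trees for the training set using the random forest algorithm to obtain the financial risk assessment model, the computer-readable instructions, when executed by the one or more processors, cause the one or more processors to further perform the following steps:
    labeling the financial risk feature vectors in the training set with identification information according to preset classification conditions; and
    normalizing the financial risk feature vectors in the training set according to the result of the identification information labeling.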
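  19. The non-volatile computer-readable storage media according to claim 18, wherein constructing decision trees for the training set using the random forest algorithm to obtain the financial risk assessment model comprises:
    drawing training samples from the training set by random sampling to construct K sub-training sets;
    for each sub-training set, calculating the information entropy of each user financial feature according to the following formula:
    $H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)$
    wherein X is the user financial feature, H(X) is the information entropy of the user financial feature, i = 1, 2, ..., n, x_i is the i-th user financial feature, and p(x_i) is the feature value probability of the i-th user financial feature;
    calculating the information gain of each user financial feature from the information entropy according to the following formula:
    $\mathrm{gain} = H(c) - H(c \mid X)$
    wherein gain is the information gain of the user financial feature, H(c) is the information entropy before splitting on the user financial feature X, and H(c|X) is the information entropy after splitting on the user financial feature X;
    calculating the information gain ratio of each user financial feature from the information gain according to the following formulas:
    [Formula image: PCTCN2018122992-appb-100007]
    [Formula image: PCTCN2018122992-appb-100008]
    wherein IntI is the penalty factor of the user financial feature, D is the total number of training samples in the sub-training set, W_X is the number of training samples for each piece of identification information of the user financial feature, and gr is the information gain ratio of the user financial feature;
    selecting the user financial feature with the largest information gain ratio as the split node and splitting on it;
    for the remaining user financial features, returning to the step of calculating, for each sub-training set, the information entropy of each user financial feature and continuing until all n user financial features have served as split nodes and completed splitting, thereby obtaining the decision tree; and
    constructing a random forest from the K generated decision trees to obtain the financial risk assessment model.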
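  20. The non-volatile computer-readable storage media according to any one of claims 16 to 19, wherein, after determining the financial risk category with the highest vote share as the benchmark category and calculating the left average of the vote shares of the financial risk categories below the benchmark category and the right average of the vote shares of the financial risk categories above the benchmark category, the computer-readable instructions, when executed by the one or more processors, cause the one or more processors to further perform the following steps:
    when the benchmark category is the highest level of the financial risk categories, if the vote share of the benchmark category is less than a preset first probability value, determining the difference between the initial aversion coefficient of the benchmark category and the left average as the financial risk aversion coefficient;
    if the vote share of the benchmark category is greater than the first probability value and less than a preset second probability value, determining the value obtained by subtracting a preset first adjustment parameter from the sum of the initial aversion coefficient of the benchmark category and that vote share as the financial risk aversion coefficient; and
    if the vote share of the benchmark category is greater than the second probability value, determining the sum of the initial aversion coefficient of the benchmark category and a preset second adjustment parameter as the financial risk aversion coefficient.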
PCT/CN2018/122992 2018-05-09 2018-12-24 Risk assessment method and apparatus, terminal device, and storage medium WO2019214248A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810435813.1 2018-05-09
CN201810435813.1A CN108665159A (zh) 2018-05-09 2018-05-09 Risk assessment method and apparatus, terminal device, and storage medium

Publications (1)

Publication Number Publication Date
WO2019214248A1 true WO2019214248A1 (zh) 2019-11-14

Family

ID=63778756

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/122992 WO2019214248A1 (zh) 2018-05-09 2018-12-24 Risk assessment method and apparatus, terminal device, and storage medium

Country Status (2)

Country Link
CN (1) CN108665159A (zh)
WO (1) WO2019214248A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113849760A (zh) * 2021-12-02 2021-12-28 云账户技术(天津)有限公司 敏感信息风险评估方法、系统和存储介质

Families Citing this family (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108665159A (zh) * 2018-05-09 2018-10-16 深圳壹账通智能科技有限公司 一种风险评估方法、装置、终端设备及存储介质
CN109657696B (zh) * 2018-11-05 2023-06-30 创新先进技术有限公司 多任务监督学习模型训练、预测方法和装置
CN109711665A (zh) * 2018-11-20 2019-05-03 深圳壹账通智能科技有限公司 一种基于金融风控数据的预测模型构建方法及相关设备
CN109657978A (zh) * 2018-12-19 2019-04-19 重庆誉存大数据科技有限公司 一种风险识别方法和系统
CN109858970B (zh) * 2019-02-02 2021-07-02 中国银行股份有限公司 一种用户行为预测方法、装置及存储介质
CN110134862A (zh) * 2019-04-17 2019-08-16 深圳壹账通智能科技有限公司 产品信息展示方法、装置、计算机设备和存储介质
CN110223155A (zh) * 2019-04-25 2019-09-10 深圳壹账通智能科技有限公司 投资推荐信息的推送方法、装置及计算机设备
CN110289098B (zh) * 2019-05-17 2022-11-25 天津科技大学 一种基于临床检验和用药干预数据的风险预测方法
CN110334737B (zh) * 2019-06-04 2023-04-07 创新先进技术有限公司 一种基于随机森林的客户风险指标筛选的方法和系统
CN110503459B (zh) * 2019-07-19 2023-09-15 平安科技(深圳)有限公司 基于大数据的用户信用度评估方法、装置及存储介质
CN110752942B (zh) * 2019-09-06 2021-09-17 平安科技(深圳)有限公司 告警信息的决策方法、装置、计算机设备及存储介质
CN111353784A (zh) * 2020-02-25 2020-06-30 支付宝(杭州)信息技术有限公司 一种转账处理方法、系统、装置和设备
CN111459828A (zh) * 2020-04-07 2020-07-28 中国建设银行股份有限公司 一种软件版本的非功能性测试评估方法及装置
CN111583014A (zh) * 2020-04-09 2020-08-25 上海淇毓信息科技有限公司 一种基于gbst的金融风险管理方法、装置和电子设备
CN111783830A (zh) * 2020-05-29 2020-10-16 平安科技(深圳)有限公司 基于oct的视网膜分类方法、装置、计算机设备及存储介质
CN112116441B (zh) * 2020-10-13 2024-03-12 腾讯科技(深圳)有限公司 金融风险分类模型的训练方法、分类方法、装置及设备
TWI776370B (zh) * 2021-01-25 2022-09-01 第一商業銀行股份有限公司 對於基金商品的投資風險評分方法及系統
CN112950383B (zh) * 2021-04-15 2023-09-26 平安直通咨询有限公司上海分公司 基于人工智能的金融风险监控方法及相关设备
CN113112343A (zh) * 2021-04-16 2021-07-13 上海同态信息科技有限责任公司 基于Random Forest神经网络的金融风险评估方法
CN113052684A (zh) * 2021-04-30 2021-06-29 中国银行股份有限公司 理财产品的风险预测方法、相关装置及计算机存储介质
CN113240509B (zh) * 2021-05-18 2022-04-22 重庆邮电大学 一种基于多源数据联邦学习的贷款风险评估方法
CN113298185B (zh) * 2021-06-21 2024-05-28 深信服科技股份有限公司 模型训练方法、异常文件检测方法、装置、设备及介质
CN113628748A (zh) * 2021-08-16 2021-11-09 未鲲(上海)科技服务有限公司 用户风险承受倾向的评估方法、装置、设备及存储介质
CN114663219B (zh) * 2022-03-28 2023-09-12 南通电力设计院有限公司 一种基于能源互联电力市场的主体征信评估方法及系统
CN116306958A (zh) * 2022-09-13 2023-06-23 中债金科信息技术有限公司 违约风险预测模型训练方法、违约风险预测方法及设备
CN115409613A (zh) * 2022-09-13 2022-11-29 中债金科信息技术有限公司 债券风险检测模型训练方法和债券风险检测方法
CN117572808A (zh) * 2024-01-15 2024-02-20 埃睿迪信息技术(北京)有限公司 一种设备监测方法、装置及设备

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100317420A1 (en) * 2003-02-05 2010-12-16 Hoffberg Steven M System and method
US20160086185A1 (en) * 2014-10-15 2016-03-24 Brighterion, Inc. Method of alerting all financial channels about risk in real-time
CN106022508A (zh) * 2016-05-06 2016-10-12 陈丛威 预测线上理财平台的用户邀请好友行为的方法和装置
CN106897918A (zh) * 2017-02-24 2017-06-27 上海易贷网金融信息服务有限公司 一种混合式机器学习信用评分模型构建方法
CN108665159A (zh) * 2018-05-09 2018-10-16 深圳壹账通智能科技有限公司 一种风险评估方法、装置、终端设备及存储介质

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150235222A1 (en) * 2014-02-18 2015-08-20 Mastercard International Incorporated Investment Risk Modeling Method and Apparatus
CN105279691A (zh) * 2014-07-25 2016-01-27 中国银联股份有限公司 基于随机森林模型的金融交易检测方法和设备
CN106991611A (zh) * 2017-03-27 2017-07-28 北京贝塔智投科技有限公司 一种智能理财投资顾问机器人系统及其工作方法
CN107766883A (zh) * 2017-10-13 2018-03-06 华中师范大学 一种基于加权决策树的优化随机森林分类方法及系统

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113849760A (zh) * 2021-12-02 2021-12-28 云账户技术(天津)有限公司 敏感信息风险评估方法、系统和存储介质
CN113849760B (zh) * 2021-12-02 2022-07-22 云账户技术(天津)有限公司 敏感信息风险评估方法、系统和存储介质

Also Published As

Publication number Publication date
CN108665159A (zh) 2018-10-16

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18917968

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 29/03/2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18917968

Country of ref document: EP

Kind code of ref document: A1