WO2019218748A1 - Processing method, apparatus and processing device for insurance business risk prediction - Google Patents

Processing method, apparatus and processing device for insurance business risk prediction

Info

Publication number
WO2019218748A1
Authority
WO
WIPO (PCT)
Prior art keywords
risk
user
data
predicted
training
Prior art date
Application number
PCT/CN2019/076024
Other languages
English (en)
French (fr)
Inventor
吴龙凤
石秋慧
张泰玮
陈诗奕
Original Assignee
阿里巴巴集团控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司
Publication of WO2019218748A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 Insurance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Definitions

  • The embodiments of the present specification belong to the technical field of computer data processing for insurance business risk prediction, and in particular relate to a processing method, apparatus and processing device for insurance business risk prediction.
  • Motor vehicle insurance, that is, automobile insurance (or car insurance), refers to a type of commercial insurance that covers liability for personal injury or property damage caused to motor vehicles by natural disasters or accidents. With economic development, the number of motor vehicles keeps increasing. At present, auto insurance has become one of the largest lines in China's property insurance business.
  • The generalized linear model (GLM) mainly handles linearly related data objects. For example, suppose each 1% reduction in time spent online corresponds to a 1-year increase in age.
  • The GLM can then be built on the linear relationship between the online-time data and the age data.
  • The purpose of the embodiments of the present specification is to provide a processing method, apparatus and processing device for insurance business risk prediction that introduce a calculation gradient boosting decision tree into insurance business risk prediction. The approach is not only compatible with risk prediction on insurance business data that has non-linear relationships,
  • but can also output the relative risk size relationship after risk prediction.
  • The sorted risk prediction results represent the relative magnitude of risk between different users, and can provide another, more reliable risk prediction scheme for insurance business.
  • The processing method, apparatus and processing device for insurance business risk prediction provided by the embodiments of the present specification are implemented as follows:
  • A processing method for insurance business risk prediction, comprising:
  • processing the target risk association data by using a constructed risk ranking model, and outputting a relative risk value of the user to be predicted, where the risk ranking model includes a ranking model determined by training a calculation gradient boosting decision tree with labeled risk association data.
  • A processing method for insurance business risk prediction, comprising:
  • acquiring target risk association data of a user to be predicted;
  • processing the target risk association data by using a constructed risk ranking model, and outputting a relative risk value of the user to be predicted, where the relative risk value represents the relative risk size relationship between users in a specified user set.
  • An insurance business risk prediction processing apparatus, comprising:
  • a prediction data acquisition module, configured to acquire target risk association data of the user to be predicted;
  • a risk prediction module, configured to process the target risk association data by using a constructed risk ranking model and to output a relative risk value of the user to be predicted, where the risk ranking model includes a ranking model determined by training a calculation gradient boosting decision tree with labeled risk association data.
  • An insurance business risk prediction processing device, comprising a processor and a memory for storing processor-executable instructions, where the processor, when executing the instructions, implements:
  • processing the target risk association data by using a constructed risk ranking model, and outputting a relative risk value of the user to be predicted, where the risk ranking model includes a ranking model determined by training a calculation gradient boosting decision tree with labeled risk association data.
  • An insurance business risk prediction processing device, comprising a processor and a memory for storing processor-executable instructions, where the processor, when executing the instructions, implements:
  • processing the target risk association data by using a constructed risk ranking model, and outputting a relative risk value of the user to be predicted, where the relative risk value represents the relative risk size relationship between users in a specified user set.
  • The processing method, apparatus and processing device for insurance business risk prediction provided by the embodiments of the present specification can acquire target risk association data of the user to be predicted, process that data by using the constructed risk ranking model, and output the relative risk value of the user to be predicted, where the risk ranking model includes a ranking model determined by training a calculation gradient boosting decision tree with labeled risk association data.
  • By introducing the calculation gradient boosting decision tree into insurance business risk prediction, the approach is not only compatible with risk prediction on insurance business data that has non-linear relationships, but can also output the relative risk size relationship after risk prediction.
  • The sorted risk prediction results represent the relative magnitude of risk between different users, and can provide another, more reliable scheme for insurance risk prediction.
  • FIG. 1 is a schematic flow chart of an embodiment of an insurance business risk prediction processing method provided by the present specification
  • FIG. 2 is a schematic diagram of the LambdaMART model processing procedure in the method provided by the present specification
  • FIG. 3 is a schematic flowchart of another embodiment of the insurance business risk prediction processing method provided by the present specification
  • FIG. 4 is a hardware structural block diagram of a server for applying an insurance business risk prediction processing method provided by the present specification.
  • FIG. 5 is a block diagram showing the structure of an insurance business risk prediction processing apparatus provided by the present specification.
  • The data features used in insurance business risk prediction are being divided into ever more dimensions and ever finer categories.
  • The impact of many variables on screening and classification is non-linear. For example, time spent online and age are correlated, but the correlation can take various forms. It can be a simple linear relationship: each 1% reduction in online time corresponds to a 1-year increase in age. It can also be a more complicated relationship, such as an exponential one: a 4% reduction in online time corresponds to a 2-year increase in age. Such a relationship can be converted into a linear one by a mathematical transformation and then handled by a generalized linear model. In real life, besides variables with basically linear relationships, there are a large number of non-linear variables.
  • For example, "age" may not change along with online time, yet at the same time be related to a group's shopping and consumption habits, with different consumption habits changing the age distribution in a non-linear way. If predicting "user age" is one of the goals and a linear prediction model fails to identify such non-linear relationships, the model's prediction performance drops considerably. Existing solutions can segment and summarize variables by binning, but this loses precision in many variables and degrades the prediction results.
  • A risk quantification model can thus be constructed reasonably and effectively for risk prediction.
  • The model is compatible with both linear and non-linear variables, and can take the overall risk relationship into account.
  • The model directly outputs the relative risk relationship of users in the same user set, which brings the result as close as possible to the real situation and significantly improves the reliability of the prediction results.
  • In this specification, LambdaMART may be referred to as a calculation gradient boosting decision tree.
  • The relative risk value described herein may express the risk of one user relative to the other users in the same user set; users can subsequently be sorted according to the specific values of their relative risk values.
  • For example, the relative risk value of user A predicted using the LambdaMART model is 0.6
  • and the risk of A is greater than the risk of B.
  • Even so, the actual loss ratios of A and B, or the absolute values of their predicted risks, may both be very low, so that on the basis of the actual loss ratio both A and B may still meet the risk assessment requirements.
  • The user set described above may be divided according to the data being processed or the prediction needs of the actual application scenario.
  • Users belonging to the same insurance company may form a user set, or users of the same insurance type may form a user set.
  • During model training, a user set may consist of the users of the same insurance company or of the same insurance service.
  • Online prediction need not be limited in this way. For example, the relative risk relationship between multiple users of different insurance companies can be output.
  • FIG. 1 is a schematic flowchart of an embodiment of the insurance business risk prediction processing method provided by the present specification.
  • Although the present specification provides method operation steps or apparatus structures as shown in the following embodiments or figures, the method or apparatus may include more operation steps or module units, or fewer after partial merging, based on conventional practice or without inventive effort.
  • The execution order of the steps or the module structure of the apparatus is not limited to the execution order or module structure shown in the embodiments or the drawings.
  • When the method or module structure is applied in an actual apparatus, server or terminal product, it may be executed sequentially or in parallel according to the method or module structure shown in the embodiments or drawings (for example, in a parallel-processor or multi-threaded environment, or even in a distributed processing or server cluster environment).
  • An insurance business risk prediction processing method provided by the present specification may include:
  • S2: processing the target risk association data by using the constructed risk ranking model, and outputting the relative risk value of the user to be predicted, where the risk ranking model includes a ranking model determined by training a calculation gradient boosting decision tree with labeled risk association data.
  • MART is another name for GBDT (Gradient Boosting Decision Tree). It is an iterative decision tree algorithm composed of multiple decision trees, and the conclusions of all the trees are added up to produce the final answer.
  • the trees in GBDT are all regression trees and can be used for regression prediction.
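  • The additive structure described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiments: the one-split trees, the feature names `age` and `historical_claims`, and all numeric values are hypothetical.

```python
from typing import Callable, Dict, List

Features = Dict[str, float]
Tree = Callable[[Features], float]


def stump(feature: str, threshold: float, left: float, right: float) -> Tree:
    """Build a one-split regression tree: `left` if feature <= threshold, else `right`."""
    def predict(x: Features) -> float:
        return left if x[feature] <= threshold else right
    return predict


def ensemble_predict(trees: List[Tree], x: Features) -> float:
    """Add up the outputs of all regression trees to form the final answer."""
    return sum(tree(x) for tree in trees)


# Hypothetical trees and feature names, for illustration only.
trees = [
    stump("age", 25.0, 0.30, 0.10),
    stump("historical_claims", 1.0, -0.05, 0.20),
]
score = ensemble_predict(trees, {"age": 22.0, "historical_claims": 3.0})
```

  • Each tree contributes a real-valued correction, which is why the trees have to be regression trees rather than classification trees.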
  • The underlying training model can be the GBDT non-linear algorithm model. The decision tree model can be constructed in advance from labeled risk association data, and the parameters of the decision trees are gradually adjusted and optimized through regression-based (stepwise iterative) machine learning.
  • When the model's prediction results meet the accuracy requirements of insurance business risk prediction, the model can be used online to predict the relative risk value of the user to be predicted.
  • Lambda is the gradient used in the MART solving process. Physically, it represents the direction (up or down) and the strength with which a document to be ranked should move in the next iteration.
  • The core idea of GBDT is that, over successive iterations, the regression tree generated in each new round fits the gradient of the loss function, and finally all the regression trees are added together to obtain the final model.
  • LambdaMART uses a special Lambda value instead of the gradient, which is to add the LambdaRank algorithm to the MART algorithm.
  • the combination of MART and Lambda can be used as the LambdaMART used in some embodiments of the present specification.
  • the principle of MART is to solve the function directly in the function space.
  • the model result can be composed of many trees, and the fitting target of each tree is the gradient of the loss function.
  • LambdaMART's framework (the underlying model) is MART; the main difference is that the gradient calculated in the intermediate step is Lambda.
  • The parameters set for MART may include the number of trees M, the number of leaf nodes L, the learning rate v, and so on. These parameters can be tuned on a validation set to obtain their optimal values.
  • The training of each tree first traverses all the training data (pairs of documents with different labels), calculates the change in the ranking metric caused by swapping the positions of each pair together with the corresponding Lambda, then calculates the Lambda of each document, and then calculates the derivative wi of each Lambda, which the subsequent Newton step uses to solve for the leaf node values.
  • The criterion for splitting a tree node is the MSE (mean squared error), and a regression tree with L leaf nodes is generated.
  • the algorithm process of LambdaMART can be as shown in FIG. 2, and mainly includes the following steps:
  • If an underlying model has been initialized, the model can be updated iteratively starting from it; if there is no underlying model, the initial values are all 0;
  • Using the MSE as the criterion for splitting tree nodes, a regression tree R with L leaf nodes is generated.
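  • The per-tree computation of Lambda, the derivative wi and the Newton step can be sketched as below. This is a simplified sketch under stated assumptions, not the formulas of the embodiments: it uses NDCG as the ranking metric, the usual logistic pairwise term, and a sign convention in which a positive Lambda pushes a score up. The function names and the `sigma` parameter are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def gain(label: int) -> float:
    """NDCG gain of a relevance (here: risk) label."""
    return 2.0 ** label - 1.0


def discount(position: int) -> float:
    """NDCG position discount; position 0 is the top of the list."""
    return 1.0 / math.log2(position + 2)


def lambdas_and_weights(scores: Sequence[float], labels: Sequence[int],
                        sigma: float = 1.0) -> Tuple[List[float], List[float]]:
    """Return (lambda_i, w_i) for every item of one ranked list."""
    n = len(scores)
    order = sorted(range(n), key=lambda k: -scores[k])
    position = [0] * n
    for pos, k in enumerate(order):
        position[k] = pos
    ideal = sum(gain(l) * discount(p)
                for p, l in enumerate(sorted(labels, reverse=True)))
    lam = [0.0] * n
    w = [0.0] * n
    if ideal == 0.0:
        return lam, w
    for i in range(n):
        for j in range(n):
            if labels[i] <= labels[j]:
                continue  # only pairs where i is labeled above j
            # Metric change if items i and j swapped positions.
            delta = abs((gain(labels[i]) - gain(labels[j]))
                        * (discount(position[i]) - discount(position[j]))) / ideal
            rho = 1.0 / (1.0 + math.exp(sigma * (scores[i] - scores[j])))
            lam[i] += sigma * rho * delta   # push the higher-labeled item up
            lam[j] -= sigma * rho * delta   # push the lower-labeled item down
            curvature = sigma * sigma * rho * (1.0 - rho) * delta
            w[i] += curvature
            w[j] += curvature
    return lam, w


def newton_leaf_value(lam: Sequence[float], w: Sequence[float],
                      members: Sequence[int]) -> float:
    """Newton step for one leaf: sum of lambdas over sum of derivatives."""
    num = sum(lam[k] for k in members)
    den = sum(w[k] for k in members)
    return num / den if den > 0.0 else 0.0


lam, w = lambdas_and_weights(scores=[0.0, 0.0, 0.0], labels=[2, 1, 0])
```

  • In a full implementation, a regression tree with L leaves would be fitted to the Lambda values and `newton_leaf_value` would then be applied to the items that fall into each leaf.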
  • The embodiment uses lambdaMart. Unlike a traditional regression model, which considers the absolute correlation of a single user, lambdaMart considers the overall risk relationship of all users under the given conditions and solves for it directly, so the result is more comprehensive. Pricing generally differs somewhat between insurance companies, so when lambdaMart constructs training samples it builds the order relationship among users of the same insurance company. Model training is based on the relative relationship between data rather than on absolute values, so it is not sensitive to imbalance in the ratio of positive to negative samples. Compared with the absolute value produced by a regression model, this order relationship characterizes a user's risk level more accurately.
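  • A minimal sketch of constructing such order relationships is given below. It assumes, for illustration, that the loss ratio is the labeled risk signal and that a training pair is formed only between users of the same insurance company; the record layout and all values are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical records: (user_id, insurer, loss_ratio).
records = [
    ("u1", "A", 0.20),
    ("u2", "A", 0.75),
    ("u3", "B", 0.40),
    ("u4", "A", 0.20),
    ("u5", "B", 0.10),
]


def build_pairs(rows: List[Tuple[str, str, float]]) -> List[Tuple[str, str]]:
    """Return (riskier_user, safer_user) pairs, formed only within one insurer."""
    by_insurer: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for user, insurer, loss_ratio in rows:
        by_insurer[insurer].append((user, loss_ratio))
    pairs: List[Tuple[str, str]] = []
    for insurer in sorted(by_insurer):
        for (ua, la), (ub, lb) in combinations(by_insurer[insurer], 2):
            if la > lb:
                pairs.append((ua, ub))
            elif lb > la:
                pairs.append((ub, ua))
            # Equal loss ratios carry no ordering information and are skipped.
    return pairs


pairs = build_pairs(records)
```

  • Because no pair crosses insurers, differences in pricing between companies never enter the training signal.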
  • The lambdaMart in the embodiments of the present specification uses the GBDT regression model and can perform risk prediction on user feature data with non-linear relationships that is input to the model in the insurance business. It is highly applicable to risk prediction for auto insurance users, and its risk prediction results are more accurate and reliable.
  • a LambdaMART-based risk ranking model may be pre-built.
  • The training and construction of the underlying GBDT model can use whatever model structure and parameter settings suit the actual business scenario and data. For example, trees can be trained one at a time, with the training residual of one tree passed on as the training input of the next; or multiple trees can be connected in multiple levels for training, with the training residual serving as the input of another multi-level group.
  • Any LambdaMART implementation based on the GBDT algorithm that performs risk prediction on insurance business data with non-linear relationships may be applied; the process of constructing the LambdaMART model is not described item by item in this specification.
  • The training data of the risk ranking model may be determined in advance from collected historical auto insurance policy data, and labeled according to the risk classification or the configured requirements.
  • The training data may be referred to as risk association data; it is usually data associated with the insurance business and used for sample training of the risk ranking model.
  • The risk association data may be user feature data covering multiple dimensions. The user feature data associated with one user forms one group of training data, and each group of risk association data may be labeled with a corresponding risk score.
  • The risk association data may include user feature data of at least one category, the user feature data including data with non-linear relationships associated with the insurance business.
  • User A's risk associated data may include (A1, A2, A3..., A9) 9 dimensions of user profile data.
  • User characteristics data of different dimensions may be selected according to the demand of the automobile insurance prediction.
  • The nine dimensions in the above example may include age, gender, occupation, annual income, number of historical claims, average monthly consumption, credit rating, marital status, and assets and liabilities.
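  • One group of risk association data with these nine dimensions might be represented as below. This is only an illustrative data structure: the field names, the numeric encoding of categorical fields and the sample values are assumptions and are not specified by the embodiments.

```python
from dataclasses import astuple, dataclass
from typing import Tuple


@dataclass(frozen=True)
class RiskFeatures:
    """Nine-dimensional user feature record (A1..A9); encodings are hypothetical."""
    age: float
    gender: float                   # categorical, assumed numerically encoded
    occupation: float               # categorical, assumed numerically encoded
    annual_income: float
    historical_claims: float
    monthly_avg_consumption: float
    credit_rating: float
    marital_status: float           # categorical, assumed numerically encoded
    assets_and_liabilities: float

    def to_vector(self) -> Tuple[float, ...]:
        """Flatten the record into the feature vector fed to the model."""
        return astuple(self)


# Hypothetical values for user A.
user_a = RiskFeatures(35, 1, 4, 120000, 2, 3500, 720, 1, 0.4)
vector = user_a.to_vector()
```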
  • user feature data of 10 or more dimensions may be acquired in advance, and user feature data requiring model training may be selected from user feature data of multiple dimensions when determining risk association data.
  • specific risk-related data can include the following Table 1:
  • The risk association data may further include artificial data generated according to predetermined rules. For example, operators can customize risk association data for the model according to the situations that the expected risk may include,
  • or the required risk association data can be generated automatically by a computer according to configured data generation rules.
  • The artificial data generated in this way fits the expected risk prediction situations better, while historical auto insurance case data is closer to real risk situations.
  • Either kind may be used alone, or artificial data and historical auto insurance case data may be combined, to train the risk ranking model and improve the accuracy of the prediction results.
  • the acquired risk-related data can be trained as training data in the GBDT model.
  • After training, the thresholds of the decision features (all or some of the thresholds) on which the decision trees in the risk ranking model branch should satisfy the requirements of the model.
  • The output relationship should reflect the actual risk of each user, and the output usually also needs to be continuous and stable.
  • The GBDT used in the embodiments of the present specification is an iterative decision tree algorithm that can be understood as two parts: the regression decision tree (DT) and gradient boosting (GB).
  • Decision trees are mainly divided into two categories: classification trees and regression trees. Classification trees are often used to solve classification problems, such as user gender, whether the web page is a spam page, and whether the user is cheating.
  • the regression tree is generally used to predict real values, such as the age of the user, the probability of the user clicking, the degree of relevance of the web page, and so on.
  • The former is used to predict class labels and the latter is used to predict real values.
  • The general process of a regression tree is similar to that of a classification tree. The difference is that each node of a regression tree gets a predicted value. Taking age as an example, the predicted value equals the average age of all the people belonging to that node.
  • At each branch, every feature is searched exhaustively to find the optimal split variable and the optimal split point.
  • The criterion used in this embodiment is no longer the Gini coefficient of the classification tree but the minimization of the squared error. That is, the more people are mispredicted and the further off the predictions are, the larger the squared error, and the most reliable branching basis is found by minimizing the squared error. Branching continues until the age in each leaf node is unique or a preset termination condition is reached (such as an upper limit on the number of leaves). If the ages in a leaf node are not unique, the average age of everyone in that node is the predicted result of the leaf node.
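  • The split search described above can be sketched as follows for a single feature. This is an illustrative sketch: the feature (weekly online hours), the target (age) and the sample values are hypothetical, and a full tree would repeat the search over every feature and recurse on each side.

```python
from typing import List, Optional, Sequence, Tuple


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def sse(values: Sequence[float]) -> float:
    """Sum of squared errors around the mean, i.e. the leaf's prediction."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(xs: Sequence[float], ys: Sequence[float]
               ) -> Optional[Tuple[float, float, float, float]]:
    """Try every midpoint between distinct feature values; keep the lowest SSE.

    Returns (threshold, total_sse, left_mean, right_mean), or None if the
    feature has a single distinct value and cannot be split.
    """
    best: Optional[Tuple[float, float, float, float]] = None
    candidates: List[float] = sorted(set(xs))
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2.0
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        error = sse(left) + sse(right)
        if best is None or error < best[1]:
            best = (threshold, error, mean(left), mean(right))
    return best


# Hypothetical data: weekly online hours and the ages to be predicted.
online_hours = [2.0, 3.0, 10.0, 12.0]
ages = [40.0, 44.0, 20.0, 24.0]
split = best_split(online_hours, ages)
```

  • Here the chosen threshold separates the two older users from the two younger ones, and each leaf predicts the average age of its members.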
  • Gradient boosting is a machine learning technique for regression, classification, and sorting tasks that is part of the Boosting family of algorithms.
  • Boosting is a family of algorithms that can promote weak learners to strong learners, and belongs to the category of ensemble learning.
  • The Boosting method is based on the idea that, for a complex task, an appropriate combination of the judgments of multiple experts is better than the judgment of any single expert. In plain terms, it is the idea behind the proverb "three cobblers with their wits combined equal Zhuge Liang".
  • Gradient boosting, like other boosting methods, builds the final predictive model by combining multiple weak learners, usually decision trees.
  • The boosting method builds the model step by step. The weak learner built at each iteration is designed to compensate for the deficiencies of the existing model.
  • The number of trees can be set for training. Training can stop once the number of trees reaches the specified value (such as 80 trees), or once the residual is small enough to meet the stopping condition. When either of these two conditions is satisfied, training can stop.
  • The residual left at a node by the N-th tree, computed against the corresponding original value, is passed to the (N+1)-th tree for learning.
  • The predicted value corresponding to the current leaf node is output. Specifically, the values learned by all the trees can be accumulated as the predicted value.
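  • The two stopping conditions and the hand-over of residuals from one tree to the next can be sketched with a toy boosting loop. This sketch makes several assumptions that the embodiments do not specify: each weak learner is a one-split stump on a single feature, no learning rate is applied, and the tolerance `tol` and the sample data are hypothetical.

```python
from typing import List, Sequence, Tuple

Stump = Tuple[float, float, float]  # (threshold, left_value, right_value)


def fit_stump(xs: Sequence[float], residuals: Sequence[float]) -> Stump:
    """Fit a one-split regression tree to the residuals by minimizing squared error.

    Assumes xs contains at least two distinct values.
    """
    best_error = None
    best_stump = None
    points = sorted(set(xs))
    for lo, hi in zip(points, points[1:]):
        threshold = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best_error is None or error < best_error:
            best_error, best_stump = error, (threshold, left_mean, right_mean)
    return best_stump


def predict(stumps: Sequence[Stump], x: float) -> float:
    """Accumulate the outputs of all trees as the predicted value."""
    return sum(left if x <= t else right for t, left, right in stumps)


def boost(xs: Sequence[float], ys: Sequence[float],
          max_trees: int = 80, tol: float = 1e-6) -> List[Stump]:
    """Stop when the tree budget is used up or every residual is within tol."""
    stumps: List[Stump] = []
    residuals = list(ys)
    while len(stumps) < max_trees and max(abs(r) for r in residuals) > tol:
        stumps.append(fit_stump(xs, residuals))
        residuals = [y - predict(stumps, x) for x, y in zip(xs, ys)]
    return stumps


xs = [1.0, 2.0, 3.0, 4.0]
ys = [10.0, 10.0, 30.0, 50.0]
stumps = boost(xs, ys)
```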
  • The number of decision trees used for training may be determined in advance, and the thresholds of the decision features on which each tree branches are determined by gradient iteration. For example, 80 decision trees may be used, with each tree learning the residual of the combined conclusions of all the previous trees. The initial thresholds can be set from empirical values. Suppose A's true score (the labeled score) is 80, but the first tree, whose decision feature is age, predicts 60: the difference, that is, the residual, is 20. In the second tree (whose decision feature is the user's occupation), A's score to be learned is therefore set to 20.
  • If the second tree does place A in a leaf node worth 20 points, the sum of the two trees' conclusions is A's true score (predicted score of 60 points plus residual of 20 points). If the conclusion of the second tree is 18 points, A still has a residual of 2 points,
  • so in the third tree (with yet another decision feature) A's score to be learned becomes 2 points and learning continues.
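  • The arithmetic of this example can be traced with a few lines of code. The function name is illustrative; the numbers are the ones used in the example above.

```python
from typing import List, Sequence, Tuple


def residual_targets(true_score: float, tree_outputs: Sequence[float]
                     ) -> Tuple[List[float], float, float]:
    """Return the target each tree was asked to learn, the accumulated
    prediction, and the residual left for the next tree."""
    targets: List[float] = []
    prediction = 0.0
    for output in tree_outputs:
        targets.append(true_score - prediction)  # what this tree must fit
        prediction += output                     # add this tree's conclusion
    return targets, prediction, true_score - prediction


# A's labeled score is 80; the first tree predicts 60 and the second 18.
targets, prediction, remaining = residual_targets(80.0, [60.0, 18.0])
```

  • The first tree is asked to fit 80 and the second 20; after both trees the accumulated prediction is 78, so the third tree is asked to fit the remaining 2 points.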
  • The residual computed at each step in effect increases the weight of the instances that were predicted wrongly, while the weight of instances already predicted correctly tends to zero. For example, the risk is greater when the age is too high or too low, and the higher the income, the smaller the risk.
  • The thresholds of the decision features of the decision trees may be adjusted until the adjusted thresholds satisfy the prediction output requirements of the risk ranking model.
  • For example, suppose the initially set risk scores are 60 and 80 and the threshold is whether the age is greater than 20. After training and optimization on a large amount of data, the decision feature that assesses risk from the age dimension may be adjusted to whether the age is greater than 24, so that it matches the true results in most cases.
  • The number of decision trees used by the risk ranking model may be determined from the number of categories of user feature data. For example, if user feature data of 80 dimensions is selected and each dimension serves as the decision feature of one tree, 80 decision trees can be used to construct the non-linear risk ranking model.
  • the total number of specific decision trees may be determined according to the collection data, the number of branches of the tree, and the connection relationship between the upper and lower levels of the tree.
  • The risk association data used to train the risk ranking model is user feature data belonging to the same user set and includes user feature data of at least one category, the user feature data including data with non-linear relationships associated with the insurance business.
  • During training, the ranking model used in this embodiment uses the loss ratios of users of the same insurance company to construct the relative risk relationship between those users. Because user data within the same insurance company is fully comparable, the training data is guaranteed to be true and reliable. Compared with a traditional regression model, this embodiment therefore provides another risk prediction approach, based on a ranking model, whose results are more authentic and credible. At the same time, multi-dimensional variables with non-linear relationships in the insurance business can be used reasonably and effectively with the method provided in the embodiments of the present specification.
  • The non-linear risk ranking model based on gradient boosting decision trees is compatible with both linear and non-linear variables. Compared with a traditional linear model, the accuracy of the prediction results is significantly improved, which makes up for the shortcomings of the traditional linear model and improves the insurance service experience.
  • the risk correlation data used by the risk ranking model in the training process includes:
  • Performing the training comprises: inputting the feature data of a first user in a first sample training set, and outputting the relative risk value of the first user within a first user set, where the first user set consists of the users included in the first sample training set.
  • The user attribution label may indicate the distribution or attribution category of a user. For example, the users of an insurance company carry a corresponding user attribution label, which may mark a user as belonging to insurance company A.
  • the first sample training set, the first user set, and the first user described above are mainly to distinguish the currently processed training set from other training sets, and do not specifically refer to a certain set.
  • the training set formed by the feature data of the user of another insurance company B may be referred to as a second sample training set, and the set of users belonging to the insurance company B may be referred to as a second user set.
  • During model training, the training set formed for each insurance company can be input, and the relative risk of the users is output.
  • the risk ranking model can predict the relative risk value of each user individually, and can directly optimize the relative risk relationship of the users in the same user set outputted by the model. Therefore, in another embodiment of the method, the method may further include:
  • the data information of the relative risk size relationship is output.
  • The feature data of users A, B, C and D can be processed separately using the constructed risk ranking model, and their relative risk values are 0.48, 0.56, 0.81 and 0.62 respectively, where B, C and D belong to the same insurance company.
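  • Turning these values into a relative risk size relationship within each user set can be sketched as below. The relative risk values are those of the example; the insurer labels are hypothetical, and since the example does not state which company A belongs to, A is placed in a set of its own purely for illustration.

```python
from collections import defaultdict
from typing import Dict, List

relative_risk = {"A": 0.48, "B": 0.56, "C": 0.81, "D": 0.62}
# B, C and D share an insurer per the example; A's insurer is an assumption.
insurer_of = {"A": "insurer_1", "B": "insurer_2", "C": "insurer_2", "D": "insurer_2"}


def rank_within_sets(values: Dict[str, float],
                     set_of: Dict[str, str]) -> Dict[str, List[str]]:
    """Sort the users of each user set from highest to lowest relative risk."""
    groups: Dict[str, list] = defaultdict(list)
    for user, value in values.items():
        groups[set_of[user]].append((user, value))
    return {
        name: [user for user, _ in sorted(members, key=lambda m: m[1], reverse=True)]
        for name, members in groups.items()
    }


ranking = rank_within_sets(relative_risk, insurer_of)
```

  • Within the shared insurer the order is C, D, B. A's value of 0.48 is not compared against them in this sketch because it comes from a different user set.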
  • Because the sample training set is constructed from the feature data of users of the same insurance company, the trained model can output more accurate and reliable risk prediction results. Specifically, the output in this example may be, for instance:
  • With LambdaMART, inputting a user's feature data outputs a value that reflects the order relationship of the users' loss ratios, whereas existing regression models are fitted directly to the loss ratios of different insurance companies,
  • so the models obtained in that way have poor reliability.
  • LambdaMART's ranking model characterizes the relative level of risk between users. The value it outputs has no actual physical meaning and serves only as a basis for comparing order relationships.
  • The embodiments provided in this specification can be applied not only to auto insurance business risk prediction, but also to scenarios such as fund risk ranking and medical insurance risk ranking.
  • Specifically, in the application scenario of auto insurance business risk prediction:
  • the risk ranking model is an auto insurance risk ranking model obtained by training on risk association data related to the auto insurance business;
  • the relative risk value includes the relative risk size of the loss ratio corresponding to the user to be predicted.
  • loss ratio and vehicle risk score are only one output representation of one or more embodiments for the nonlinear relationship risk ranking model.
  • The present specification does not exclude other embodiments in which other representations, or representations derived from the loss ratio and the risk score, are used. For example, a linear transformation of the loss ratio may yield an auto insurance score, where a higher score means a smaller risk (the opposite of the risk score, where a higher score means a higher risk).
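  • Such a linear transformation might look like the sketch below. The coefficients `base` and `slope` are hypothetical; the specification does not give concrete values, only that the result is a linear function of the loss ratio in which a higher score means a smaller risk.

```python
def auto_insurance_score(loss_ratio: float,
                         base: float = 100.0,
                         slope: float = 50.0) -> float:
    """Linear transform of the loss ratio: higher loss ratio, lower score.

    `base` and `slope` are illustrative values, not taken from the source.
    """
    return base - slope * loss_ratio


scores = [auto_insurance_score(r) for r in (0.2, 0.6, 1.1)]
```

  • Because the transform is monotonically decreasing, ranking users by descending loss ratio and by ascending score gives the same order.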
  • A linear relationship generally refers to a first-degree function relating two variables.
  • In a broader sense, a linear relationship may mean that the relationship between two variables is clear and fixed; in some cases it can be expressed by a straight line, or converted into a linear relationship by some mathematical transformation (with the information lost in the conversion kept within a certain range).
  • A non-linear relationship mainly means that the relationship between variables keeps changing and cannot be described by a single formula; in some cases it can only be represented by curves, surfaces or irregular lines. Examples are risk score and occupation, or risk score and gender.
  • the risk ranking model may be constructed offline in advance: training data containing non-linear relationships is selected beforehand for learning and training of the GBDT decision trees, and the model is used online once training is complete.
  • this specification does not exclude constructing or updating/maintaining the risk ranking model online. For example, given sufficient computing capacity, the risk ranking model can be constructed online and used online at the same time to process the target risk-associated data of the user to be predicted.
  • the present specification further provides another processing method for insurance business risk prediction, the method including:
  • S30: target risk-associated data of the user to be predicted is acquired.
  • S32: the target risk-associated data is processed using the constructed risk ranking model, and the relative risk value of the user to be predicted is output, where the relative risk value represents the relative risk ordering among users in a specified user set.
  • the output relative risk value may be the value of one user or the values of multiple users. When the values of multiple users are output, the output may be sorted by relative risk or unsorted.
  • the processing method for insurance business risk prediction can acquire the target risk-associated data of the user to be predicted, process it with the constructed risk ranking model, and output the relative risk value of the user to be predicted, where the risk ranking model includes a ranking model determined by training LambdaMART (a lambda gradient boosting decision tree) with labeled risk-associated data.
  • by introducing LambdaMART into insurance business risk prediction, the method not only accommodates risk prediction on insurance business data with non-linear relationships, but can also output the relative risk ordering after prediction. The ranked prediction results represent the relative magnitude of risk between different users, providing another, more reliable implementation of insurance risk prediction.
  • the method described above can be used for risk identification on the client side, such as risk assessment of insurance services provided in the payment application of the mobile terminal.
  • the client can be a PC (personal computer), a server, an industrial control computer, a smartphone, a tablet device, a portable computer (such as a laptop), a personal digital assistant (PDA), a desktop computer, a smart wearable device, or the like, as well as a mobile communication terminal, handheld device, in-vehicle device, wearable device, television device, or computing device.
  • the method can also be applied in a system server of an insurance company or a third-party insurance service provider, which may include a standalone server, a server cluster, a distributed system server, or a combination of a server that processes device request data with the system servers of other associated data processing.
  • for example, one implementation may be built on the Alibaba Cloud Open Data Processing Service (ODPS) platform.
  • a unified programming interface and user interface can be provided for data processing tasks arising from different user needs.
  • with system performance backed by ODPS, a system implementing the method of the embodiments of this specification can process massive data in parallel and reach its best computing performance.
  • FIG. 4 is a hardware structural block diagram of a server for applying the insurance business risk prediction processing method provided by the present specification.
  • server 10 may include one or more (only one shown) processors 102 (a processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission module 106 for communication functions. Those skilled in the art will understand that the structure shown in FIG. 4 is merely illustrative and does not limit the structure of the electronic device described above.
  • for example, server 10 may include more or fewer components than shown in FIG. 4, may include other processing hardware such as a database or multi-level cache, or may have a configuration different from that shown in FIG. 4.
  • the memory 104 can be used to store software programs and modules of application software, such as the program instructions/modules corresponding to the insurance business risk prediction processing method in the embodiments of this specification. By running the software programs and modules stored in the memory 104, the processor 102 executes various functional applications and data processing, that is, implements the processing method for insurance business risk prediction described above.
  • Memory 104 may include high speed random access memory, and may also include non-volatile memory such as one or more magnetic storage devices, flash memory, or other non-volatile solid state memory.
  • memory 104 may further include memory remotely located relative to processor 102, which may be coupled to computer terminal 10 via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • the transmission module 106 is configured to receive or transmit data via a network.
  • the network specific examples described above may include a wireless network provided by a communication provider of the computer terminal 10.
  • in one example, the transmission module 106 includes a Network Interface Controller (NIC) that can connect to other network devices through a base station and thereby communicate with the Internet.
  • in another example, the transmission module 106 can be a Radio Frequency (RF) module for communicating with the Internet wirelessly.
  • the present specification also provides an insurance business risk prediction processing apparatus.
  • the apparatus may include systems (including distributed systems), software (applications), modules, components, servers, clients, and the like that use the methods described in the embodiments of this specification, combined with the hardware needed to implement them.
  • the processing apparatus in one embodiment provided by this specification is as described in the following embodiments.
  • although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
  • FIG. 5 is a schematic structural diagram of a module of an insurance service risk prediction processing apparatus provided in this specification, which may include:
  • the prediction data acquisition module 201 may be configured to acquire target risk-associated data of the user to be predicted;
  • the risk prediction module 202 may be configured to process the target risk-associated data using the constructed risk ranking model and output a relative risk value of the user to be predicted, where the risk ranking model includes a ranking model determined by training LambdaMART (a lambda gradient boosting decision tree) with labeled risk-associated data.
  • the server or client provided by the embodiments of this specification may be implemented by a processor executing the corresponding program instructions in a computer, for example on a PC or server in C++ on the Windows operating system, or in the application design language corresponding to another system such as Linux, combined with the necessary hardware, or as processing logic based on a quantum computer.
  • the foregoing processing device may specifically be a risk prediction server of an insurer or a third-party service organization. The server may be a standalone server, a server cluster, a distributed system server, or a combination of a server that processes device request data with the system servers of other associated data processing.
  • the present specification also provides an insurance business risk prediction processing device, which may include a processor and a memory for storing processor-executable instructions. When executing the instructions, the processor implements the following:
  • the target risk-associated data is processed using the constructed risk ranking model, and the relative risk value of the user to be predicted is output, where the risk ranking model includes a ranking model determined by training LambdaMART (a lambda gradient boosting decision tree) with labeled risk-associated data.
  • user feature data with the same user affiliation label is assembled into one sample training set, where the user feature data includes data with non-linear relationships associated with insurance business.
  • the training includes: inputting the feature data of a first user in a first sample training set, and outputting the relative risk value of the first user within a first user set, where the first user set consists of the users included in the first sample training set.
  • when executing the instructions, the processor further implements:
  • outputting data describing the relative risk ordering.
  • the risk ranking model is an auto insurance risk ranking model obtained by training on risk-associated data associated with auto insurance business;
  • the relative risk value includes the relative risk magnitude of the loss ratio corresponding to the user to be predicted.
  • the ranking model used in the risk prediction processing device is not limited to the LambdaMART model; other algorithm models whose output indicates the relative magnitude of user risk are equally applicable. Accordingly, this specification also provides another insurance business risk prediction processing device, which may include a processor and a memory for storing processor-executable instructions. When executing the instructions, the processor implements the following:
  • the target risk-associated data is processed using the constructed risk ranking model, and the relative risk value of the user to be predicted is output, where the relative risk value represents the relative risk ordering among users in a specified user set.
  • the above instructions may be stored in a variety of computer readable storage media.
  • the computer readable storage medium may include physical means for storing information, which may be digitized and stored in a medium utilizing electrical, magnetic or optical means.
  • the computer readable storage medium of this embodiment may include: devices that store information electrically, such as the various types of memory (RAM, ROM, and so on); devices that store information magnetically, such as hard disks, floppy disks, magnetic tape, magnetic core memory, bubble memory, and USB flash drives; and devices that store information optically, such as CDs or DVDs.
  • other kinds of readable storage media also exist, such as quantum memories and graphene memories.
  • the processing method, apparatus, and processing device for insurance business risk prediction provided by the embodiments of this specification can acquire the target risk-associated data of the user to be predicted, process it with the constructed risk ranking model, and output the relative risk value of the user to be predicted, where the risk ranking model includes a ranking model determined by training LambdaMART (a lambda gradient boosting decision tree) with labeled risk-associated data.
  • by introducing LambdaMART into insurance business risk prediction, the approach not only accommodates risk prediction on insurance business data with non-linear relationships, but can also output the relative risk ordering after prediction. The ranked prediction results represent the relative magnitude of risk between different users, providing another, more reliable implementation of insurance risk prediction.
  • although the embodiments of this specification mention the definition of linear/non-linear relationships, the construction of the underlying GBDT model in LambdaMART, the processing of the GBDT algorithm, and descriptions of data acquisition, storage, interaction, calculation, and judgment, the embodiments are not limited to cases that conform to industry communication standards, standard GBDT algorithm processing, communication protocols, standard data models/templates, or the situations described in the embodiments. Implementations that slightly modify certain industry standards, or that customize or build on the described embodiments, can achieve effects that are the same as, equivalent to, similar to, or predictable variations of those of the embodiments above. Embodiments obtained by applying such modified or varied ways of acquiring, storing, judging, and processing data may still fall within the scope of the optional embodiments of this specification.
  • PLD Programmable Logic Device
  • FPGA Field Programmable Gate Array
  • HDL Hardware Description Language
  • the controller can be implemented in any suitable manner. For example, it can take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor.
  • examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller can also be implemented as part of the memory's control logic.
  • the controller can implement the same functions through logic programming, in the form of logic gates, switches, ASICs, programmable logic controllers, embedded microcontrollers, and the like.
  • Such a controller can therefore be considered a hardware component, and the means for implementing various functions included therein can also be considered as a structure within the hardware component.
  • alternatively, the means for implementing various functions can be regarded both as software modules that implement the method and as structures within the hardware component.
  • the processing device, device, module or unit set forth in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function.
  • a typical implementation device is a computer.
  • the computer can be, for example, a personal computer, a laptop computer, an in-vehicle human-machine interaction device, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
  • the above devices are described as being separately divided into various modules by function.
  • the functions of the modules may be implemented in the same one or more pieces of software and/or hardware, or a module that implements one function may be implemented by a combination of multiple sub-modules or sub-units.
  • the device embodiments described above are merely illustrative.
  • for example, the division into units is only a division by logical function.
  • in actual implementation there may be other ways of dividing; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus.
  • the instruction apparatus implements the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
  • these computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on it to produce computer-implemented processing.
  • the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • the memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory.
  • Memory is an example of a computer readable medium.
  • Computer readable media includes both permanent and non-persistent, removable and non-removable media.
  • Information storage can be implemented by any method or technology.
  • the information can be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape cartridges, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • as defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
  • embodiments of the present specification can be provided as a method, system, or computer program product.
  • embodiments of the present specification can take the form of an entirely hardware embodiment, an entirely software embodiment or a combination of software and hardware.
  • embodiments of the present specification can take the form of a computer program product embodied on one or more computer usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.
  • Embodiments of the present description can be described in the general context of computer-executable instructions executed by a computer, such as a program module.
  • program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types.
  • Embodiments of the present specification can also be practiced in distributed computing environments where tasks are performed by remote processing devices that are connected through a communication network.
  • program modules can be located in both local and remote computer storage media including storage devices.

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • Technology Law (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

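A processing method, apparatus, and processing device for insurance business risk prediction. By introducing LambdaMART (a lambda gradient boosting decision tree) into insurance business risk prediction, the approach not only accommodates risk prediction on insurance business data with non-linear relationships, but can also output the relative risk ordering after prediction. The ranked prediction results represent the relative magnitude of risk between different users, providing another, more reliable implementation of risk prediction for insurance business.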

Description

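Processing method, apparatus, and processing device for insurance business risk prediction
Technical Field
The embodiments of this specification belong to the technical field of computer data processing for insurance business risk prediction, and in particular relate to a processing method, apparatus, and processing device for insurance business risk prediction.
Background
Motor vehicle insurance, that is, automobile insurance (auto insurance for short), is a type of commercial insurance that covers liability for personal injury or property loss involving motor vehicles caused by natural disasters or accidents. With economic development, the number of motor vehicles keeps growing, and auto insurance has become one of the largest lines of property insurance business in China.
When a user insures a vehicle, the insurance company usually performs a risk assessment of the user, and the result directly affects the user's insured amount, preferential treatment, and so on. Through risk assessment of users, insurers can handle insurance business more accurately and reasonably, and effectively avoid or reduce business risk. At present, in the field of auto insurance risk prediction, risk prediction based on the generalized linear model (GLM) has become the mainstream technical system in the industry. A GLM mainly handles linearly correlated data objects. For example, if time spent online decreases by 1 percentage point for each additional year of age, a GLM can be built on the linear relationship between the online-usage data and the age data.
However, as auto insurance business keeps growing and internet data grows in variety and volume, the traditional GLM system is increasingly constrained. For example, "age" may not vary simply with time spent online, but may also be related to people's shopping and habits, with different consumption habits affecting the age distribution non-linearly. A GLM can bin non-linear variables and summarize them by segment, but this loses much of the variables' precision and struggles to meet current requirements for big-data, multi-dimensional risk prediction. The industry therefore urgently needs a way of processing auto insurance risk prediction more effectively and efficiently on multi-dimensional data.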
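Summary of the Invention
The embodiments of this specification aim to provide a processing method, apparatus, and processing device for insurance business risk prediction. By introducing LambdaMART (a lambda gradient boosting decision tree) into insurance business risk prediction, they not only accommodate risk prediction on insurance business data with non-linear relationships, but can also output the relative risk ordering after prediction. The ranked prediction results represent the relative magnitude of risk between different users, providing another, more reliable implementation of risk prediction for insurance business.
The processing method, apparatus, and processing device for insurance business risk prediction provided by the embodiments of this specification are implemented as follows:
A processing method for insurance business risk prediction, the method comprising:
acquiring target risk-associated data of a user to be predicted;
processing the target risk-associated data using a constructed risk ranking model, and outputting a relative risk value of the user to be predicted, wherein the risk ranking model comprises a ranking model determined by training LambdaMART with labeled risk-associated data.
A processing method for insurance business risk prediction, comprising:
acquiring target risk-associated data of a user to be predicted;
processing the target risk-associated data using a constructed risk ranking model, and outputting a relative risk value of the user to be predicted, wherein the relative risk value represents the relative risk ordering among users in a specified user set.
An insurance business risk prediction processing apparatus, comprising:
a prediction data acquisition module, configured to acquire target risk-associated data of a user to be predicted;
a risk prediction module, configured to process the target risk-associated data using a constructed risk ranking model and output a relative risk value of the user to be predicted, wherein the risk ranking model comprises a ranking model determined by training LambdaMART with labeled risk-associated data.
An insurance business risk prediction processing device, comprising a processor and a memory for storing processor-executable instructions, wherein the processor, when executing the instructions, implements:
acquiring target risk-associated data of a user to be predicted;
processing the target risk-associated data using a constructed risk ranking model, and outputting a relative risk value of the user to be predicted, wherein the risk ranking model comprises a ranking model determined by training LambdaMART with labeled risk-associated data.
An insurance business risk prediction processing device, comprising a processor and a memory for storing processor-executable instructions, wherein the processor, when executing the instructions, implements:
acquiring target risk-associated data of a user to be predicted;
processing the target risk-associated data using a constructed risk ranking model, and outputting a relative risk value of the user to be predicted, wherein the relative risk value represents the relative risk ordering among users in a specified user set.
The processing method, apparatus, and processing device for insurance business risk prediction provided by the embodiments of this specification can acquire the target risk-associated data of the user to be predicted, process it with the constructed risk ranking model, and output the relative risk value of the user to be predicted, where the risk ranking model includes a ranking model determined by training LambdaMART with labeled risk-associated data. By introducing LambdaMART into insurance business risk prediction, the method not only accommodates risk prediction on insurance business data with non-linear relationships, but can also output the relative risk ordering after prediction. The ranked prediction results represent the relative magnitude of risk between different users, providing another, more reliable implementation of risk prediction for insurance business.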
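Brief Description of the Drawings
To explain the technical solutions in the embodiments of this specification or in the prior art more clearly, the drawings needed for describing them are briefly introduced below. The drawings described below are only some of the embodiments recorded in this specification; those of ordinary skill in the art can derive other drawings from them without inventive effort.
FIG. 1 is a schematic flowchart of an embodiment of an insurance business risk prediction processing method provided by this specification;
FIG. 2 is a schematic diagram of the processing flow of a LambdaMART model in the method provided by this specification;
FIG. 3 is a schematic flowchart of another embodiment of the insurance business risk prediction processing method provided by this specification;
FIG. 4 is a block diagram of the hardware structure of a server that applies the insurance business risk prediction processing method provided by this specification;
FIG. 5 is a schematic diagram of the module structure of an insurance business risk prediction processing apparatus provided by this specification.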
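Detailed Description
To help those skilled in the art better understand the technical solutions in this specification, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of this specification. All other embodiments obtained by those of ordinary skill in the art from one or more embodiments herein without inventive effort shall fall within the scope of protection of the embodiments of this specification.
With the development of computer and internet technology, data volume is growing rapidly. The classification of data features in insurance risk prediction is also becoming increasingly multi-dimensional and fine-grained. Many variables affect screening and classification non-linearly. For example, time spent online and age are correlated, but the correlation can take many forms. It may be a simple linear relationship, for example time online decreases by 1 percentage point for each additional year of age; or it may be a more complex relationship, such as an exponential one in which time online decreases by 4 percentage points as age increases by 2 years. Relationships that can be converted to linear form through some mathematical transformation can all be handled by a generalized linear model. In real life, apart from variables with basically linear relationships, there are also many non-linear variables. For example, when predicting age, "age" may not vary simply with time spent online, but may also be related to people's shopping and habits, with different consumption habits affecting the age distribution non-linearly. Because predicting "user age" is one of the objectives, a linear prediction model that cannot recognize non-linear relationships will see its predictive performance drop substantially. In existing solutions, variables can be binned and summarized by segment, but this loses much of the variables' precision and degrades the prediction results. The embodiments of this specification provide another implementation of risk prediction in insurance business that differs from existing conventional solutions. It introduces LambdaMART (Lambda Multiple Additive Regression Tree, a lambda gradient boosting decision tree), which allows non-linear variables to be used reasonably and effectively in building a risk ranking model. The model is compatible with both linear and non-linear variables, takes the combined risk relationships into account, and directly outputs the relative risk ordering of users in the same user set so that it is as close as possible to the real situation, which markedly improves the reliability of the predictions. Note that, for ease of description, this specification may refer to LambdaMART as the lambda gradient boosting decision tree.
The relative risk value here may include an output of how high or low a user's risk is relative to the other users in a user set; the users can subsequently be ranked by their relative risk values. For example, in some embodiments the LambdaMART model predicts a relative risk value of 0.6 for user A, with values in the range [0, 1]. This can indicate that the risk of user A (0.6) is greater than that of user B (0.58). The values 0.6 and 0.58 do not represent a specific loss ratio or a predicted absolute risk; they express the relative magnitude of risk among the users in a given user set. In some application scenarios, although A's risk is greater than B's, both may have a very low actual loss ratio or predicted absolute risk, so both A and B may still meet the risk assessment requirements on the basis of their actual loss ratios.
The user set mentioned above can be defined according to the data being processed or the prediction needs of the actual application. For example, users of the same insurance company may form a user set, users of the same type of insurance may form a user set, all users served by the constructed risk ranking model may form a user set, or a specified group of users may form a user set. Typically, a user set during model training consists of users of the same insurance company or insurance service provider, while no such restriction applies during online prediction; for example, the relative risk ordering among users of different insurers can be output.
The embodiments of this specification are described below using a specific application scenario of auto insurance business risk prediction. Specifically, FIG. 1 is a schematic flowchart of an embodiment of the processing method for insurance business risk prediction provided by this specification. Although this specification provides method steps or apparatus structures as shown in the following embodiments or drawings, the method or apparatus may include more steps or module units, or fewer after partial merging, based on conventional or non-inventive effort. For steps or structures with no logically necessary causal relationship, the execution order of the steps or the module structure of the apparatus is not limited to that shown in the embodiments or drawings. When applied in an actual apparatus, server, or terminal product, the method or module structure may be executed sequentially or in parallel according to the embodiments or drawings (for example, in an environment with parallel processors or multi-threading, or even in a distributed processing or server cluster environment).
Of course, the following description of the auto insurance embodiment does not limit other technical solutions to which this specification can be extended. For example, the embodiments provided here can equally be applied to scenarios such as fund risk ranking and medical insurance risk ranking; applications in other scenarios follow the description of the auto insurance embodiment and are not described again. In one specific embodiment, as shown in FIG. 1, a processing method for insurance business risk prediction provided by this specification may include: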
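S0: acquiring target risk-associated data of the user to be predicted;
S2: processing the target risk-associated data using the constructed risk ranking model, and outputting the relative risk value of the user to be predicted, where the risk ranking model includes a ranking model determined by training LambdaMART with labeled risk-associated data.
MART is another name for GBDT (Gradient Boosting Decision Tree), an iterative decision tree algorithm composed of multiple decision trees whose outputs are summed to give the final answer. The trees in GBDT are all regression trees and can be used for regression prediction. In the processing method provided by this specification, the underlying training model may use the GBDT non-linear algorithm model. A decision tree model can be built in advance from labeled risk-associated data, and the parameters in the decision trees are progressively adjusted and optimized through regression-based machine learning (stage-wise iteration). When the model's predictions meet the accuracy requirements of insurance business risk prediction, the model can be used online to predict the relative risk value of the user to be predicted.
Lambda is the gradient used in the MART solution process; its physical meaning can be understood as the direction (up or down) and strength with which a document to be ranked should move in the next iteration. The core idea of GBDT is that, over successive iterations, the regression tree produced in each new round fits the gradient of the loss function, and all regression trees are finally summed to give the final model. LambdaMART uses a special Lambda value in place of that gradient, that is, it combines the LambdaRank algorithm with the MART algorithm. The combination of MART and Lambda serves as the LambdaMART used in some embodiments of this specification. The principle of MART is to solve for the function directly in function space; the resulting model may consist of many trees, each fitted to the gradient of the loss function. The framework (underlying model) of LambdaMART is based on MART, the main difference being that the gradient computed in the intermediate step is Lambda. In LambdaMART, the parameters set for MART may include the number of trees M, the number of leaf nodes L, and the learning rate v; these can be tuned on a validation set to obtain optimal values.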
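MART supports "warm start": training can continue from an already trained model, which is simply loaded at initialization. The usual execution steps of LambdaMART are briefly as follows:
1. Training each tree first traverses all training data (pairs of documents with different labels) and computes the metric change caused by swapping the positions of each pair, together with the pair's Lambda (formula not reproduced in this text). The Lambda of each document is then computed (formula not reproduced in this text), followed by the derivative w_i of each document's Lambda, which is used later in the Newton step to solve for the leaf node values.
2. A regression tree is created to fit the per-document Lambda values generated in step 1. Nodes are split by MSE (mean squared error), producing a regression tree with L leaf nodes.
3. For the regression tree generated in step 2, the value of each leaf node is computed with a Newton step; that is, the leaf's output value is calculated by formula (not reproduced in this text) over the set of documents falling into that leaf.
4. The model is updated: the newly learned regression tree is added to the existing model, with the learning rate v (also called the shrinkage coefficient) acting as regularization.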
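The formulas referenced in steps 1 to 3 are not reproduced in this text. As a hedged reconstruction, the block below gives the commonly published LambdaMART formulation; it is an assumption that the specification's own formulas match it, and the notation may differ. Here s_i is the current model score of document i, σ is a shape parameter, |ΔZ_ij| is the metric change when documents i and j swap positions, I is the set of pairs (i, j) in which i is labeled above j, R_l is the set of documents falling into leaf l, and v is the learning rate named in step 4.

```latex
\begin{aligned}
\rho_{ij} &= \frac{1}{1 + e^{\sigma (s_i - s_j)}}, \qquad
\lambda_{ij} = \sigma \,\lvert \Delta Z_{ij} \rvert\, \rho_{ij}, \\
\lambda_i &= \sum_{j:(i,j)\in I} \lambda_{ij} \;-\; \sum_{j:(j,i)\in I} \lambda_{ji}, \\
w_i &= \sum_{j:(i,j)\in I} \sigma^2 \lvert \Delta Z_{ij} \rvert\, \rho_{ij}(1-\rho_{ij})
     \;+\; \sum_{j:(j,i)\in I} \sigma^2 \lvert \Delta Z_{ji} \rvert\, \rho_{ji}(1-\rho_{ji}), \\
\gamma_{l} &= \frac{\sum_{x_i \in R_{l}} \lambda_i}{\sum_{x_i \in R_{l}} w_i}, \qquad
F_m(x) = F_{m-1}(x) + v \sum_{l=1}^{L} \gamma_{l}\, \mathbf{1}[x \in R_{l}].
\end{aligned}
```

With this sign convention a positive λ_i pushes document i up the ranking, w_i is the magnitude of the derivative of λ_i with respect to s_i, and γ_l is the Newton-step leaf value of step 3.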
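In a specific example, the LambdaMART algorithm process may be as shown in FIG. 2 and mainly includes the following steps:
1) Determine the initial values. The model can be updated iteratively from an initialized underlying model; if there is no underlying model, all initial values are 0;
2) Traverse the training set and compute the lambda gradient (λ) of each document, together with the partial derivative used later when the leaf nodes are solved by Newton's method;
3) Use this gradient information to generate a decision tree. Nodes can be split with reference to MSE, producing a regression tree R with L leaf nodes;
4) Compute the values of the leaf nodes by Newton's method;
5) Update the model, updating each document's score according to the learning rate η.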
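Steps 2) and 4) above can be sketched as follows. This is a minimal illustration under stated assumptions, not the specification's implementation: the users of one user set play the role of documents, the metric is NDCG, integer labels stand for labeled risk grades, and the names `lambda_gradients`, `newton_leaf_value`, and `sigma` are hypothetical.

```python
import math


def dcg_gain(label, rank):
    """DCG contribution of one item with the given label at a 0-based rank."""
    return (2 ** label - 1) / math.log2(rank + 2)


def lambda_gradients(scores, labels, sigma=1.0):
    """Return (lambdas, weights) for one user set.

    lambdas[i] > 0 means user i should move up (be ranked as riskier);
    weights[i] is the second-derivative magnitude used in the Newton step.
    """
    n = len(scores)
    # Current ranking: position of each user when sorted by descending score.
    order = sorted(range(n), key=lambda i: -scores[i])
    rank = {user: pos for pos, user in enumerate(order)}
    # The ideal DCG normalises the metric change to |delta NDCG|.
    ideal = sum(dcg_gain(l, r) for r, l in enumerate(sorted(labels, reverse=True)))
    lambdas = [0.0] * n
    weights = [0.0] * n
    for i in range(n):
        for j in range(n):
            if labels[i] <= labels[j]:
                continue  # only pairs in which i is labeled riskier than j
            # Metric change if users i and j swapped positions.
            delta = abs(
                dcg_gain(labels[i], rank[i]) + dcg_gain(labels[j], rank[j])
                - dcg_gain(labels[i], rank[j]) - dcg_gain(labels[j], rank[i])
            ) / ideal
            rho = 1.0 / (1.0 + math.exp(sigma * (scores[i] - scores[j])))
            lam = sigma * delta * rho
            lambdas[i] += lam  # push the riskier user up
            lambdas[j] -= lam  # push the safer user down
            w = sigma * sigma * delta * rho * (1.0 - rho)
            weights[i] += w
            weights[j] += w
    return lambdas, weights


def newton_leaf_value(lambdas, weights, members):
    """Newton-step output of one leaf: sum of lambdas over sum of weights."""
    numerator = sum(lambdas[i] for i in members)
    denominator = sum(weights[i] for i in members)
    return numerator / denominator if denominator else 0.0


# Three users with initial scores of 0 (step 1) and labeled risk grades 2 > 1 > 0.
lambdas, weights = lambda_gradients([0.0, 0.0, 0.0], [2, 1, 0])
leaf_value = newton_leaf_value(lambdas, weights, [0])
```

In a full training loop, a regression tree would be fitted to `lambdas` (step 3), each leaf would receive its `newton_leaf_value` (step 4), and each user's score would be increased by the learning rate times the value of the leaf the user falls into (step 5).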
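The LambdaMART used in the embodiments of this specification differs from the single-user absolute relevance of traditional regression models: LambdaMART considers the combined risk relationship of all users under given conditions and solves for it directly, so the result is more comprehensive. In general, pricing differs somewhat between insurance companies. When training samples are constructed for LambdaMART, order relations are built within a single insurance company; training is based on the relative relationships between data rather than absolute values, so it is insensitive to imbalance between positive and negative samples. Compared with the absolute values of a regression model, this order relation can characterize users' risk levels more accurately. In addition, the LambdaMART in the embodiments uses GBDT's regression model, so it can perform risk prediction on user feature data with non-linear relationships that is input to the model in insurance business. It is well suited to auto insurance user risk prediction, and its results are more accurate and reliable.
In specific implementations, in one or more embodiments of this specification, a LambdaMART-based risk ranking model can be constructed in advance. The training and construction of the underlying GBDT model can be configured, in model structure and parameters, according to the needs and data of the actual business scenario. For example, single trees may be trained individually, with the residual from one tree used as the input for training the next; or multiple trees may be connected in several levels for training, with the training residual again used as the input to another multi-level group of trees. Of course, other embodiments may use LambdaMART variants obtained by modifying, transforming, or improving the GBDT algorithm to perform risk prediction on insurance business data with non-linear relationships; this specification does not describe each way of constructing the LambdaMART model in detail.
In this embodiment, the training data for the risk ranking model can be collected in advance from historical auto insurance policy data and labeled according to the risk classification or configuration requirements. In the risk prediction scenario of this embodiment, the training data may be called risk-associated data; it is usually associated with insurance business and is used as training samples for the risk ranking model. For example, the risk-associated data may be user feature data in multiple dimensions; the user feature data associated with one user forms one group of training data, and each group of risk-associated data can be labeled with a corresponding risk score. Specifically, in one embodiment of the method, the risk-associated data may include user feature data of at least one category, where the user feature data includes data with non-linear relationships associated with insurance business. In one example, user A's risk-associated data may include user feature data in nine dimensions (A1, A2, A3, …, A9). User feature data in different dimensions can be selected according to the needs of auto insurance prediction; for example, the nine dimensions above may be age, gender, occupation, annual income, number of historical claims, average monthly spending, credit rating, marital status, and liabilities and assets. Alternatively, user feature data in 10 or more dimensions may be collected in advance, and the features needed for model training are selected from these dimensions when the risk-associated data is determined. For example, the risk-associated data may be as shown in Table 1 below:
Table 1. Illustrative risk-associated data for model training
[Table 1 is provided as image PCTCN2019076024-appb-000001 in the original publication and is not reproduced in this text.]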
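Of course, in other embodiments, the risk-associated data may also include artificial data generated according to predetermined rules. For example, an operator may define custom risk-associated data for model training according to the situations that the expected risks may cover, or a computer may generate the required risk-associated data automatically once data generation rules are set. Such generated data matches the expected risk prediction situations more closely, whereas historical auto insurance case data is closer to real risk. In some application scenarios, either may be used alone, or artificial data and historical auto insurance case data may be combined to train the risk ranking model and improve the accuracy of the predictions.
The acquired risk-associated data can be used as training data for the GBDT model. After training, the thresholds of the decision features at the branches of the decision trees in the risk ranking model (all of the thresholds, or some of them) can satisfy the actually labeled relative risk ordering of the users (continuous, stable output may usually also be required). The GBDT used in the embodiments of this specification is an iterative decision tree algorithm that can be divided into a decision tree part (Regression Decision Tree, DT) and a gradient boosting part (Gradient Boosting, GB). Decision trees fall into two main classes: classification trees and regression trees. Classification trees are commonly used for classification problems, such as a user's gender, whether a web page is spam, or whether a user is cheating. Regression trees are generally used to predict real values, such as a user's age, the probability that a user clicks, or the relevance of a web page. The former predicts class labels; the latter predicts real numbers. It should be emphasized that adding and subtracting the outputs of regression trees is meaningful, for example 10 years + 5 years - 3 years = 12 years, whereas the outputs of classification trees cannot be summed, or the sum is meaningless: male + male + female has no definite answer. The overall flow of a regression tree resembles that of a classification tree; the difference is that every node of a regression tree yields a predicted value. Taking age as an example, the predicted value equals the mean age of everyone belonging to the node. At each branch, every feature is enumerated exhaustively to find the optimal split variable and split point. The criterion in this embodiment is no longer the Gini coefficient used in classification trees but minimization of the squared error: the more people are mispredicted, the larger the squared error, so minimizing the squared error finds the most reliable basis for branching. Branching continues until the age on every leaf node is unique or a preset termination condition is reached (such as an upper limit on the number of leaves). If the ages on a final leaf node are not unique, the mean age of everyone on that node is used as the leaf's prediction.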
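The squared-error split criterion described above can be sketched for a single feature as follows. This is a minimal illustration under assumptions: the function name `best_split` and the sample ages and labeled risk scores are hypothetical, and a full regression tree would repeat the search over every feature and recurse into the two resulting groups.

```python
def best_split(values, targets):
    """Find the threshold on one feature that minimises the summed squared error."""

    def sse(ys):
        # Squared error of a group around its own mean (the node's prediction).
        if not ys:
            return 0.0
        mean = sum(ys) / len(ys)
        return sum((y - mean) ** 2 for y in ys)

    best_threshold, best_error = None, float("inf")
    candidates = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(candidates, candidates[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(values, targets) if x <= threshold]
        right = [y for x, y in zip(values, targets) if x > threshold]
        error = sse(left) + sse(right)
        if error < best_error:
            best_threshold, best_error = threshold, error
    return best_threshold, best_error


ages = [18, 22, 35, 60]          # hypothetical feature values
risk_scores = [80, 80, 60, 60]   # hypothetical labeled risk scores
threshold, error = best_split(ages, risk_scores)
```

For this toy data the search selects the midpoint between ages 22 and 35, because that split leaves both groups with zero squared error.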
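Gradient boosting is a machine learning technique for regression, classification, and ranking tasks and is part of the boosting family of algorithms. Boosting is a family of algorithms that can turn weak learners into strong learners and belongs to ensemble learning. Boosting methods rest on the idea that, for a complex task, a judgment obtained by suitably combining the judgments of several experts is better than the judgment of any single expert; in everyday terms, many heads are better than one. Like other boosting methods, gradient boosting builds the final prediction model by combining (ensembling) multiple weak learners, usually decision trees. Boosting methods build the model stage-wise, and the weak learner built at each iteration compensates for the shortcomings of the existing model.
For example, in one process, the number of trees can be set for training. Training can stop once the number of trees reaches the specified value (for example, eighty), or when the residual is very small (meeting the stop condition); training stops as soon as either condition is satisfied.
If at the N-th tree the residuals are not all 0 or the stop condition is not met, the residuals at the nodes of the N-th tree replace the corresponding original values and are fed into the (N+1)-th tree for learning;
this continues until, at the (N+K)-th tree, the residual at the leaf node equals the predicted value or is below the threshold, at which point the predicted value for the current leaf node is output. Specifically, the fitted residuals may all be summed to form the predicted value.
In this embodiment, the number of decision trees used for training can be determined in advance, and the thresholds of the decision features at which a tree branches are progressively optimized through gradient iteration. For example, 80 decision trees may be used, each tree learning the residual of the sum of the conclusions of all previous trees. Initial thresholds may be set from empirical values. Suppose A's true score is 80 (the labeled score), but the first tree, whose decision feature is age, predicts 60: the difference, and hence the residual, is 20. In the second tree (decision feature: the user's occupation), A's target is therefore set to 20. If the second tree does place A in a leaf with value 20, the sum of the two trees is A's true score (prediction 60 + residual 20). If the second tree's conclusion is 18, A still has a residual of 2, so in the third tree (decision feature: annual income) A's target becomes 2, and learning continues. Each residual computation in effect increases the weight of misclassified cases, while cases already classified correctly tend toward 0. For example, suppose risk increases when age is too high or too low, and decreases as income rises. If a 60-year-old user is placed in the lower-risk branch L1, where the average age is between 20 and 40, the resulting residual grows accordingly, and later features such as income, marital status, and driving experience gradually move the user toward the leaf node closest to the actual risk.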
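The arithmetic of the example above (labeled score 80, tree outputs 60, 18, and 2) can be replayed with a short sketch. The tree outputs are supplied as plain numbers standing in for fitted trees, and the name `boost_residuals` and its parameters are hypothetical; the two stop conditions mirror the tree-count limit and the residual threshold described earlier.

```python
def boost_residuals(label, tree_outputs, stop_threshold=0.0, max_trees=80):
    """Accumulate per-tree outputs, each tree being fitted to the remaining residual."""
    prediction = 0.0
    history = []
    for index, output in enumerate(tree_outputs[:max_trees], start=1):
        prediction += output             # add this tree's conclusion
        residual = label - prediction    # what the next tree has to learn
        history.append((index, prediction, residual))
        if abs(residual) <= stop_threshold:
            break  # residual small enough: stop training
    return prediction, history


# User A: labeled score 80; the three trees output 60, 18, and 2.
prediction, history = boost_residuals(80, [60, 18, 2])
for index, running_total, residual in history:
    print(f"tree {index}: prediction={running_total}, residual={residual}")
```

The running prediction is the sum of the tree outputs, and the residual after each tree is exactly the target handed to the next one.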
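Once the number of trained decision trees reaches the predetermined value (for example, all 10 trees from the root node through to the leaf nodes have been trained once), or the parameters of the current tree meet the stop condition (for example, the residual is 0 or reaches another residual stop threshold), training on this group of data can stop. When the best split point, or a split point that meets the training requirements, has been found for each threshold, the thresholds of the decision features can be fixed; the risk ranking model is determined once the adjusted thresholds meet the model's output requirements. For example, the initial threshold separating risk scores 60 and 80 may be whether age exceeds 20. After optimization on a large amount of data, the decision feature for assessing risk by age may finally be adjusted to whether age exceeds 24, to match the true outcome in most cases.
In another embodiment, the number of decision trees used by the risk ranking model may be determined from the number of categories of the user feature data. For example, if user feature data in 80 dimensions is selected, each dimension can serve as the decision feature of one tree, so 80 decision trees can be used to build the non-linear risk ranking model. Of course, in other embodiments of this specification, the total number of decision trees can be determined from the collected data, the number of branches per tree, the connections between tree levels, and so on.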
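In one embodiment provided by this specification, the risk-associated data used to train the risk ranking model is user feature data belonging to the same user set and includes user feature data of at least one category, where the user feature data includes data with non-linear relationships associated with insurance business.
The ranking model used in this embodiment is trained by using the loss ratios of users of the same insurance company to construct the relative risk ordering among those users. Because user data within one insurance company is fully comparable, the training data is genuine and reliable. Compared with traditional regression models, this embodiment therefore provides another way of identifying risk based on a ranking model, whose results are more credible. At the same time, the method provided by the embodiments can make reasonable and effective use of multi-dimensional non-linear variables in insurance business. A non-linear risk ranking model based on gradient boosting decision trees is compatible with both linear and non-linear variables; compared with traditional linear models, it markedly improves prediction accuracy, compensates for the shortcomings of traditional linear models, and improves the insurance service experience.
In another embodiment of this specification, specifically, the risk-associated data used in training the risk ranking model includes:
S20: assembling user feature data with the same user affiliation label into one sample training set, where the user feature data includes data with non-linear relationships associated with insurance business;
Correspondingly, the training includes: inputting the feature data of a first user in a first sample training set, and outputting the relative risk value of the first user within a first user set, where the first user set consists of the users included in the first sample training set.
The user affiliation label can indicate how a user is distributed or classified. For example, users of one insurance company share the corresponding label; a user can be marked with the affiliation label of insurance company A.
The terms first sample training set, first user set, and first user serve mainly to distinguish the training set currently being processed from other training sets and do not refer to any particular set. By analogy, the training set formed from the feature data of users of another insurance company B may be called the second sample training set, and the set of users of insurance company B the second user set. In model training, the collection of training sets constructed separately for each insurance company can be input, and the relative risk magnitude of the user to be predicted is output.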
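Step S20 and the within-insurer order relations can be sketched as follows. This is a minimal illustration under assumptions: the record fields `user`, `insurer`, and `loss_ratio`, the sample values, and both function names are hypothetical. The text states only that samples sharing an affiliation label form one training set and that order relations are built from loss ratios within the same insurance company.

```python
from collections import defaultdict
from itertools import combinations


def build_training_sets(records):
    """Group user records by their affiliation label (here, the insurer)."""
    groups = defaultdict(list)
    for record in records:
        groups[record["insurer"]].append(record)
    return dict(groups)


def ordered_pairs(group):
    """Return (riskier_user, safer_user) pairs inside one sample training set."""
    pairs = []
    for a, b in combinations(group, 2):
        if a["loss_ratio"] == b["loss_ratio"]:
            continue  # equal loss ratios define no order relation
        hi, lo = (a, b) if a["loss_ratio"] > b["loss_ratio"] else (b, a)
        pairs.append((hi["user"], lo["user"]))
    return pairs


records = [
    {"user": "u1", "insurer": "A", "loss_ratio": 0.30},
    {"user": "u2", "insurer": "A", "loss_ratio": 0.55},
    {"user": "u3", "insurer": "B", "loss_ratio": 0.90},
    {"user": "u4", "insurer": "B", "loss_ratio": 0.10},
    {"user": "u5", "insurer": "B", "loss_ratio": 0.40},
]
training_sets = build_training_sets(records)
pairs_a = ordered_pairs(training_sets["A"])
pairs_b = ordered_pairs(training_sets["B"])
```

No pair ever crosses insurers, which reflects the point that only user data within one insurance company is treated as directly comparable during training.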
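The risk ranking model can predict each user's relative risk value individually, and the relative risk ordering of users in the same user set output by the model can be optimized directly. Therefore, another embodiment of the method may further include:
determining, based on the relative risk values, the relative risk ordering among users in a specified user set;
outputting data describing the relative risk ordering.
In one example, the constructed risk ranking model may process the feature data of users A, B, C, and D, yielding relative risk values of 0.48, 0.56, 0.81, and 0.62 respectively, where B, C, and D belong to the same insurance company. Because, in some embodiments of this specification, the feature data of users of the same insurance company forms one sample training set during the training phase, learning and training on such samples can yield more accurate and reliable risk predictions. In this example the output may be:
C(0.81)>D(0.62)>B(0.56)>A(0.48).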
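Turning individual relative risk values into the ordering shown above can be sketched as follows; the function names are hypothetical and the values are those of the example.

```python
def rank_by_relative_risk(relative_values):
    """Return (user, value) pairs ordered from highest to lowest relative risk."""
    return sorted(relative_values.items(), key=lambda item: item[1], reverse=True)


def format_order(ranked):
    """Render the ordering in the 'C(0.81)>D(0.62)>...' style used above."""
    return ">".join(f"{user}({value})" for user, value in ranked)


values = {"A": 0.48, "B": 0.56, "C": 0.81, "D": 0.62}
ranked = rank_by_relative_risk(values)
print(format_order(ranked))
```

The values only order the users relative to one another; as noted earlier, they are not loss ratios or absolute risk levels.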
在本说明书实施例提供方法中,利用LambdaMART的排序模型,输入用户的特征数据,可以输出用户赔付率的数值排序后序关系编号,解决了现有回归模型对利用不同保险公司的赔付率进行建模得到的模型输出结果可靠性较差的问题。LambdaMART的排序模型学的就是用户之间的风险高低,排序模型的数值无实际物理含义,是作为序关系的比较依据。
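As described above, the embodiments provided by this specification can be used not only for vehicle insurance risk prediction but also in scenarios such as fund risk ranking and medical insurance risk ranking. Specifically, in the application scenario of vehicle insurance risk prediction:
S22: the risk ranking model is a vehicle insurance risk ranking model obtained by training on risk-associated data associated with the vehicle insurance business;
the relative risk value includes the relative risk level of the loss ratio corresponding to the user to be predicted.
Of course, the loss ratio and vehicle insurance risk score mentioned above are merely output representations of the non-linear risk ranking model in one or more embodiments. This specification does not exclude other representations in other embodiments, or representations obtained by deforming or transforming the loss ratio or the vehicle insurance risk score. For example, a linear transformation of the loss ratio can yield a vehicle insurance score, where a larger vehicle insurance score means lower risk (the vehicle insurance risk score is the opposite: a larger risk score means higher risk).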
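A decreasing linear transformation is one way to obtain a score that grows as risk falls. In the sketch below the intercept and slope are purely hypothetical constants; the specification does not give the coefficients of the transformation.

```python
def vehicle_insurance_score(loss_ratio, intercept=100.0, slope=50.0):
    """Map a loss ratio to a score with a decreasing linear transform (assumed constants)."""
    return intercept - slope * loss_ratio


# A lower loss ratio gives a higher score, that is, lower risk.
for ratio in (0.2, 0.8):
    print(ratio, vehicle_insurance_score(ratio))
```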
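It should be noted that a linear relationship usually means that two variables are related by a first-degree function. The linear relationship between variables in insurance or vehicle insurance described in the embodiments of this specification may take the form y=ax+b, where x is the independent variable and y is the dependent variable. In specific insurance or vehicle insurance application scenarios, the linear relationship may be understood broadly to mean that the relationship between two variables is definite and fixed, and in some cases can be expressed as a straight line or converted into a linear relationship through a certain mathematical transformation (with the information loss of the conversion kept within a certain range). A non-linear relationship mainly means that the relationship between variables keeps changing and cannot be described by a formula, and in some cases can only be represented by a curve, a curved surface, or an irregular line, such as the relationship between risk score and occupation or between risk score and gender.
In one or more embodiments of this specification, the risk ranking model may be built offline in advance: training data containing non-linear relationships is selected beforehand for the learning and training of the GBDT decision trees, and the model is used online after training is complete. This specification does not exclude building or updating/maintaining the risk ranking model online. For example, given sufficient computing capability, the risk ranking model may be built online and used online at the same time to process the target risk-associated data of the user to be predicted.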
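Although the above embodiments provide an implementation in which the risk ranking model is realized with a LambdaMART model, this specification does not exclude other embodiments that use other data models to output the relative risk relationship among users, such as listwise methods (document-list methods) including LambdaRank (a ranking algorithm) and ListNet (a ranking algorithm). Specifically, as shown in FIG. 3, this specification further provides another processing method for insurance business risk prediction, the method including:
S30: obtaining target risk-associated data of a user to be predicted;
S32: processing the target risk-associated data with a constructed risk ranking model, and outputting a relative risk value of the user to be predicted, the relative risk value representing the relative risk relationship among users in a specified user set.
The output relative risk value may be the value of one user or the values of multiple users. When the values of multiple users are output, the output may be sorted by the relative risk among users or may be unsorted.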
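Steps S30 and S32 can be sketched with a toy additive model in which each tree is a single split and a user's relative risk value is the sum of the selected leaf values. Everything here is an assumption for illustration: the tuple layout of a tree, the feature names, the thresholds, and the leaf values are invented and do not come from a trained model.

```python
def score_user(trees, features):
    """Sum the leaf value chosen by each single-split tree."""
    total = 0.0
    for feature, threshold, low_value, high_value in trees:
        total += high_value if features[feature] > threshold else low_value
    return total


def predict_relative_risk(trees, users, sort_output=False):
    """Return (user, relative risk value) pairs, optionally sorted highest first."""
    results = [(name, score_user(trees, feats)) for name, feats in users.items()]
    if sort_output:
        results.sort(key=lambda item: item[1], reverse=True)
    return results


# Hypothetical model and users.
trees = [("age", 24, 0.25, 0.5), ("annual_income", 100000, 0.25, 0.0)]
users = {
    "A": {"age": 30, "annual_income": 50000},
    "B": {"age": 22, "annual_income": 200000},
    "C": {"age": 22, "annual_income": 50000},
}
print(predict_relative_risk(trees, users))
print(predict_relative_risk(trees, users, sort_output=True))
```

The `sort_output` flag mirrors the two output forms described above: the same values can be returned in input order or ordered by relative risk.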
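The processing method for insurance business risk prediction provided by the embodiments of this specification can obtain target risk-associated data of a user to be predicted, then process the target risk-associated data with a constructed risk ranking model, and output the relative risk value of the user to be predicted, the risk ranking model including a ranking model determined by training a lambda gradient boosting decision tree with labelled risk-associated data. By introducing the lambda gradient boosting decision tree into insurance business risk prediction, the method provided by the embodiments of this specification not only accommodates risk prediction over insurance business data with non-linear relationships, but can also output the relative risk relationship after risk prediction. The sorted risk prediction result represents the relative level of risk among different users, which provides another, more reliable implementation of risk prediction for the insurance business.
The method described above may be used for risk identification on the client side, for example risk assessment of the insurance business offered in a payment application on a mobile terminal. The client may be a PC (personal computer), a server, an industrial control computer, a smartphone, a tablet device, a portable computer (such as a laptop), a personal digital assistant (PDA), a desktop computer, a smart wearable device, or another mobile communication terminal, handheld device, in-vehicle device, wearable device, television device, or computing device. The method may also be applied in the system server of an insurance company or a third-party insurance service organization. The system server may be a single server, a server cluster, a distributed system server, or a combination of a server that handles device request data and other associated data-processing system servers. For example, one implementation may be built on the Alibaba Cloud Open Data Processing Service (ODPS) platform, which provides a unified programming interface for data processing tasks arising from different user needs. With ODPS underpinning system performance, a system implementing the method of the embodiments of this specification can process massive data in parallel and achieve high computing performance.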
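As described above, the method embodiments provided in this specification may be executed in a mobile terminal, a computer terminal, a server, or a similar computing device. Taking execution on a server as an example, FIG. 4 is a block diagram of the hardware structure of a server that applies the insurance business risk prediction processing method provided by this specification. As shown in FIG. 4, the server 10 may include one or more processors 102 (only one is shown in the figure; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission module 106 for communication functions. A person of ordinary skill in the art will understand that the structure shown in FIG. 4 is only illustrative and does not limit the structure of the above electronic device. For example, the server 10 may include more or fewer components than shown in FIG. 4, may include other processing hardware such as a database or a multi-level cache, or may have a configuration different from that shown in FIG. 4.
The memory 104 may be used to store software programs and modules of application software, such as the program instructions/modules corresponding to the insurance business risk prediction processing method in the embodiments of this specification. By running the software programs and modules stored in the memory 104, the processor 102 executes various functional applications and data processing, that is, implements the processing method for insurance business risk prediction described above. The memory 104 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, and such remote memory may be connected to the server 10 through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The transmission module 106 is used to receive or send data over a network. A specific example of the network may include a wireless network provided by the communication provider of the server 10. In one example, the transmission module 106 includes a network adapter (Network Interface Controller, NIC), which can connect to other network devices through a base station and thereby communicate with the Internet. In one example, the transmission module 106 may be a radio frequency (RF) module, which communicates with the Internet wirelessly.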
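Based on the insurance business risk prediction processing method described above, this specification further provides an insurance business risk prediction processing apparatus. The apparatus may include systems (including distributed systems), software (applications), modules, components, servers, clients, and the like that use the method described in the embodiments of this specification, combined with the necessary implementation hardware. Based on the same inventive concept, the processing apparatus in one embodiment provided by this specification is described in the following embodiments. Because the apparatus solves the problem in a manner similar to the method, the implementation of the specific processing apparatus may refer to the implementation of the foregoing method, and repeated details are omitted. Although the apparatus described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated. Specifically, FIG. 5 is a schematic diagram of the module structure of an embodiment of the insurance business risk prediction processing apparatus provided by this specification. As shown in FIG. 5, the apparatus may include:
a prediction data acquisition module 201, which may be used to obtain target risk-associated data of a user to be predicted;
a risk prediction module 202, which may be used to process the target risk-associated data with a constructed risk ranking model and output a relative risk value of the user to be predicted, the risk ranking model including a ranking model determined by training a lambda gradient boosting decision tree with labelled risk-associated data.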
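The two-module split can be sketched as two small classes in software. This is an illustrative assumption about one possible software form: the class and method names are hypothetical, the data source is a plain dictionary, and the ranking model is any callable that returns a relative risk value.

```python
class PredictionDataAcquisitionModule:
    """Module 201: obtains the target risk-associated data of a user to be predicted."""

    def __init__(self, data_source):
        self._data_source = data_source

    def acquire(self, user_id):
        return self._data_source[user_id]


class RiskPredictionModule:
    """Module 202: applies a previously built risk ranking model to the acquired data."""

    def __init__(self, ranking_model):
        self._ranking_model = ranking_model

    def predict(self, target_data):
        return self._ranking_model(target_data)


# Hypothetical data source and stand-in model, for illustration only.
acquisition = PredictionDataAcquisitionModule({"u1": {"claims": 2}})
prediction = RiskPredictionModule(lambda data: 0.25 * data["claims"])
print(prediction.predict(acquisition.acquire("u1")))
```

Passing the model in as a callable keeps module 202 independent of how the ranking model was trained, so a different ranking model could be substituted without touching the acquisition module.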
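It should be noted that the apparatus described above may also include other implementations according to the description of the related method embodiments. For specific implementations, refer to the description of the method embodiments; details are not repeated here.
The server or client provided by the embodiments of this specification may be implemented by a processor executing the corresponding program instructions in a computer, for example implemented on a PC or server in C++ on the Windows operating system, or implemented with the application design language corresponding to another system such as Linux combined with the necessary hardware, or implemented with processing logic based on a quantum computer. The processing device may specifically be an insurance server or a server with which a third-party service organization provides risk prediction; the server may be a single server, a server cluster, a distributed system server, or a combination of a server that handles device request data and other associated data-processing system servers. This specification further provides an insurance business risk prediction processing device, which may include a processor and a memory for storing processor-executable instructions, the processor implementing the following when executing the instructions:
obtaining target risk-associated data of a user to be predicted;
processing the target risk-associated data with a constructed risk ranking model, and outputting a relative risk value of the user to be predicted, the risk ranking model including a ranking model determined by training a lambda gradient boosting decision tree with labelled risk-associated data.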
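Based on the foregoing method embodiments, in another embodiment of the processing device provided by this specification, user feature data having the same user affiliation label is constructed into one sample training set, the user feature data including data information that has a non-linear relationship with the insurance business;
Correspondingly, the training includes: inputting the feature data of a first user in a first sample training set, and outputting the relative risk value of the first user within a first user set, the first user set being the users included in the first sample training set.
Based on the foregoing method embodiments, in another embodiment of the processing device provided by this specification, the processor further implements the following when executing the instructions:
determining, based on the relative risk values, the relative risk relationship among users in a specified user set;
outputting data information of the relative risk relationship.
Based on the foregoing method embodiments, in another embodiment of the processing device provided by this specification, the risk ranking model is a vehicle insurance risk ranking model obtained by training on risk-associated data associated with the vehicle insurance business;
the relative risk value includes the relative risk level of the loss ratio corresponding to the user to be predicted.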
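Of course, the ranking model used in the risk prediction processing device provided by this specification is not limited to the LambdaMART model; other algorithm models whose output represents the relative level of user risk are equally applicable. Accordingly, this specification further provides another insurance business risk prediction processing device, which may include a processor and a memory for storing processor-executable instructions, the processor implementing the following when executing the instructions:
obtaining target risk-associated data of a user to be predicted;
processing the target risk-associated data with a constructed risk ranking model, and outputting a relative risk value of the user to be predicted, the relative risk value representing the relative risk relationship among users in a specified user set.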
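The above instructions may be stored in a variety of computer-readable storage media. A computer-readable storage medium may include a physical device for storing information, which may digitize the information and then store it in a medium that uses electrical, magnetic, or optical means. The computer-readable storage media described in this embodiment may include: devices that store information electrically, such as various memories like RAM and ROM and USB flash drives; devices that store information magnetically, such as hard disks, floppy disks, magnetic tapes, magnetic core memories, and bubble memories; and devices that store information optically, such as CDs and DVDs. Of course, there are also other kinds of readable storage media, such as quantum memories and graphene memories. The instructions involved in the apparatus, server, client, or processing device described above are as described above.
It should be noted that the apparatus and processing device described above may also include other implementations according to the description of the related method embodiments. For specific implementations, refer to the description of the method embodiments; details are not repeated here.
The embodiments in this specification are described progressively; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the hardware-plus-program embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the corresponding description of the method embodiments.
Specific embodiments of this specification have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In certain implementations, multitasking and parallel processing are also possible or may be advantageous.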
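The processing method, apparatus, and processing device for insurance business risk prediction provided by the embodiments of this specification can obtain target risk-associated data of a user to be predicted, then process the target risk-associated data with a constructed risk ranking model, and output the relative risk value of the user to be predicted, the risk ranking model including a ranking model determined by training a lambda gradient boosting decision tree with labelled risk-associated data. By introducing the lambda gradient boosting decision tree into insurance business risk prediction, the method provided by the embodiments of this specification not only accommodates risk prediction over insurance business data with non-linear relationships, but can also output the relative risk relationship after risk prediction. The sorted risk prediction result represents the relative level of risk among different users, which provides another, more reliable implementation of risk prediction for the insurance business.
Although this application provides the method operation steps described in the embodiments or flowcharts, more or fewer operation steps may be included based on conventional or non-inventive effort. The order of steps listed in the embodiments is only one of many possible execution orders and does not represent the only one. When an actual apparatus or system server product executes, the steps may be executed sequentially or in parallel according to the method shown in the embodiments or drawings (for example, in a parallel-processor or multi-threaded environment).
Although the embodiments of this specification mention operations and data descriptions such as data acquisition, storage, interaction, computation, and judgment, for example the definition of linear/non-linear relationships, the construction of the underlying GBDT model in LambdaMART, and the processing of the GBDT model algorithm, the embodiments are not limited to cases that must conform to industry communication standards, standard GBDT model algorithm processing, communication protocols, standard data models/templates, or the situations described in the embodiments. Implementations slightly modified from certain industry standards, from custom approaches, or from the implementations described in the embodiments can also achieve effects that are the same as, equivalent to, or similar to those of the above embodiments, or that are predictable after the variation. Embodiments obtained by applying such modified or varied data acquisition, storage, judgment, and processing methods still fall within the scope of the optional implementations of this specification.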
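In the 1990s, an improvement to a technology could be clearly distinguished as either a hardware improvement (for example, an improvement to a circuit structure such as a diode, transistor, or switch) or a software improvement (an improvement to a method flow). With the development of technology, however, many of today's improvements to method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. It therefore cannot be said that an improvement to a method flow cannot be implemented with a hardware entity module. For example, a programmable logic device (PLD), such as a field-programmable gate array (FPGA), is an integrated circuit whose logic function is determined by the user programming the device. Designers program a digital system onto a single PLD themselves, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of making integrated circuit chips by hand, this programming is nowadays mostly done with logic compiler software, which is similar to the software compilers used in program development; the source code to be compiled must also be written in a specific programming language, called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); the most commonly used at present are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. A person skilled in the art will also understand that a hardware circuit implementing a logical method flow can readily be obtained simply by describing the method flow in one of the above hardware description languages and programming it into an integrated circuit.
A controller may be implemented in any suitable manner. For example, a controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. A person skilled in the art also knows that, besides implementing a controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller can therefore be regarded as a hardware component, and the means included in it for implementing various functions can also be regarded as structures within the hardware component. Or, the means for implementing various functions can even be regarded as both software modules implementing a method and structures within a hardware component.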
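The processing devices, apparatuses, modules, or units described in the above embodiments may be implemented by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer. Specifically, the computer may be, for example, a personal computer, a laptop computer, an in-vehicle human-machine interaction device, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Although the embodiments of this specification provide the method operation steps described in the embodiments or flowcharts, more or fewer operation steps may be included based on conventional or non-inventive means. The order of steps listed in the embodiments is only one of many possible execution orders and does not represent the only one. When an actual apparatus or terminal product executes, the steps may be executed sequentially or in parallel according to the method shown in the embodiments or drawings (for example, in a parallel-processor or multi-threaded environment, or even a distributed data processing environment). The terms "include", "comprise", and any other variants are intended to cover non-exclusive inclusion, so that a process, method, product, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, product, or device. Without further limitation, the presence of additional identical or equivalent elements in the process, method, product, or device that includes the stated elements is not excluded.
For convenience of description, the above apparatus is described with its functions divided into separate modules. Of course, when implementing the embodiments of this specification, the functions of the modules may be implemented in one or more pieces of software and/or hardware, and a module implementing one function may also be implemented as a combination of multiple sub-modules or sub-units. The apparatus embodiments described above are only illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or units, and may be electrical, mechanical, or of other forms.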
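The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture that includes instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps is executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include non-persistent storage in computer-readable media, in forms such as random access memory (RAM) and/or non-volatile memory, for example read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include persistent and non-persistent, removable and non-removable media, which can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape and magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.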
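A person skilled in the art should understand that the embodiments of this specification may be provided as a method, a system, or a computer program product. Accordingly, the embodiments of this specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) that contain computer-usable program code.
The embodiments of this specification may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The embodiments of this specification may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
The embodiments in this specification are described progressively; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the corresponding description of the method embodiments. In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of this specification. In this specification, such references do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, provided they do not contradict one another, a person skilled in the art may combine the different embodiments or examples described in this specification and the features of those embodiments or examples.
The above are merely embodiments of this specification and are not intended to limit them. For a person skilled in the art, various modifications and changes to the embodiments of this specification are possible. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the embodiments of this specification shall fall within the scope of the claims of the embodiments of this specification.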

Claims (11)

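  1. A processing method for insurance business risk prediction, the method comprising:
    obtaining target risk-associated data of a user to be predicted;
    processing the target risk-associated data with a constructed risk ranking model, and outputting a relative risk value of the user to be predicted, the risk ranking model comprising: a ranking model determined by training a lambda gradient boosting decision tree with labelled risk-associated data.
  2. The method according to claim 1, wherein the risk-associated data used by the risk ranking model during training comprises:
    user feature data having the same user affiliation label constructed into one sample training set, the user feature data including data information that has a non-linear relationship with the insurance business;
    correspondingly, the training comprises: inputting the feature data of a first user in a first sample training set, and outputting the relative risk value of the first user within a first user set, the first user set being the users included in the first sample training set.
  3. The method according to claim 2, further comprising:
    determining, based on the relative risk values, the relative risk relationship among users in a specified user set;
    outputting data information of the relative risk relationship.
  4. The method according to claim 3, wherein the risk ranking model is a vehicle insurance risk ranking model obtained by training on risk-associated data associated with the vehicle insurance business;
    the relative risk value includes the relative risk level of the loss ratio corresponding to the user to be predicted.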
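  5. A processing method for insurance business risk prediction, comprising:
    obtaining target risk-associated data of a user to be predicted;
    processing the target risk-associated data with a constructed risk ranking model, and outputting a relative risk value of the user to be predicted, the relative risk value representing the relative risk relationship among users in a specified user set.
  6. An insurance business risk prediction processing apparatus, comprising:
    a prediction data acquisition module, configured to obtain target risk-associated data of a user to be predicted;
    a risk prediction module, configured to process the target risk-associated data with a constructed risk ranking model and output a relative risk value of the user to be predicted, the risk ranking model comprising: a ranking model determined by training a lambda gradient boosting decision tree with labelled risk-associated data.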
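  7. An insurance business risk prediction processing device, comprising a processor and a memory for storing processor-executable instructions, the processor implementing the following when executing the instructions:
    obtaining target risk-associated data of a user to be predicted;
    processing the target risk-associated data with a constructed risk ranking model, and outputting a relative risk value of the user to be predicted, the risk ranking model comprising: a ranking model determined by training a lambda gradient boosting decision tree with labelled risk-associated data.
  8. The processing device according to claim 7, wherein the risk-associated data used by the risk ranking model during training comprises:
    user feature data having the same user affiliation label constructed into one sample training set, the user feature data including data information that has a non-linear relationship with the insurance business;
    correspondingly, the training comprises: inputting the feature data of a first user in a first sample training set, and outputting the relative risk value of the first user within a first user set, the first user set being the users included in the first sample training set.
  9. The processing device according to claim 8, wherein the processor further implements the following when executing the instructions:
    determining, based on the relative risk values, the relative risk relationship among users in a specified user set;
    outputting data information of the relative risk relationship.
  10. The processing device according to claim 9, wherein the risk ranking model is a vehicle insurance risk ranking model obtained by training on risk-associated data associated with the vehicle insurance business;
    the relative risk value includes the relative risk level of the loss ratio corresponding to the user to be predicted.
  11. An insurance business risk prediction processing device, comprising a processor and a memory for storing processor-executable instructions, the processor implementing the following when executing the instructions:
    obtaining target risk-associated data of a user to be predicted;
    processing the target risk-associated data with a constructed risk ranking model, and outputting a relative risk value of the user to be predicted, the relative risk value representing the relative risk relationship among users in a specified user set.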
PCT/CN2019/076024 2018-05-16 2019-02-25 Processing method, apparatus, and processing device for insurance business risk prediction WO2019218748A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810468154.1A CN108694673A (zh) 2018-05-16 2018-05-16 Processing method, apparatus, and processing device for insurance business risk prediction
CN201810468154.1 2018-05-16

Publications (1)

Publication Number Publication Date
WO2019218748A1 true WO2019218748A1 (zh) 2019-11-21

Family

ID=63846360

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/076024 WO2019218748A1 (zh) 2018-05-16 2019-02-25 Processing method, apparatus, and processing device for insurance business risk prediction

Country Status (3)

Country Link
CN (1) CN108694673A (zh)
TW (1) TW201947510A (zh)
WO (1) WO2019218748A1 (zh)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
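CN108694673A (zh) * 2018-05-16 2018-10-23 阿里巴巴集团控股有限公司 Processing method, apparatus, and processing device for insurance business risk prediction
CN109657696B (zh) * 2018-11-05 2023-06-30 创新先进技术有限公司 Multi-task supervised learning model training and prediction method and apparatus
US10977738B2 (en) * 2018-12-27 2021-04-13 Futurity Group, Inc. Systems, methods, and platforms for automated quality management and identification of errors, omissions and/or deviations in coordinating services and/or payments responsive to requests for coverage under a policy
CN109919783A (zh) * 2019-01-31 2019-06-21 德联易控科技(北京)有限公司 Risk identification method, apparatus, device, and storage medium for vehicle insurance claim cases
CN110110906B (zh) * 2019-04-19 2023-04-07 电子科技大学 Survival risk modeling method based on Efron approximation optimization
CN110119784B (zh) * 2019-05-16 2020-08-04 重庆天蓬网络有限公司 Order recommendation method and apparatus
CN110503565B (zh) * 2019-07-05 2024-02-06 中国平安人寿保险股份有限公司 Behavioral risk identification method, system, device, and readable storage medium
CN110428091B (zh) * 2019-07-10 2022-12-27 平安科技(深圳)有限公司 Risk identification method based on data analysis and related device
CN111126628B (zh) * 2019-11-21 2021-03-02 支付宝(杭州)信息技术有限公司 Method, apparatus, and device for training a GBDT model in a trusted execution environment
CN111553800B (zh) * 2020-04-30 2023-08-25 上海商汤智能科技有限公司 Data processing method and apparatus, electronic device, and storage medium
CN113628026A (zh) * 2021-06-30 2021-11-09 重庆度小满优扬科技有限公司 Method and apparatus for predicting overdue risk ranking
CN113610354A (zh) * 2021-07-15 2021-11-05 北京淇瑀信息科技有限公司 Policy allocation method and apparatus for third-party platform users, and electronic device
CN113592371B (zh) * 2021-10-08 2022-01-18 北京市科学技术研究院城市安全与环境科学研究所 Comprehensive risk analysis system, method, and device based on a multi-dimensional risk matrix
CN116051296B (zh) * 2022-12-28 2023-09-29 中国银行保险信息技术管理有限公司 Customer evaluation and analysis method and system based on standardized insurance data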

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
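CN104462611A (zh) * 2015-01-05 2015-03-25 五八同城信息技术有限公司 Modeling method, ranking method, modeling apparatus, and ranking apparatus for an information ranking model
CN106779272A (zh) * 2015-11-24 2017-05-31 阿里巴巴集团控股有限公司 Risk prediction method and device
CN107292528A (zh) * 2017-06-30 2017-10-24 阿里巴巴集团控股有限公司 Vehicle insurance risk prediction method, apparatus, and server
CN107832581A (zh) * 2017-12-15 2018-03-23 百度在线网络技术(北京)有限公司 State prediction method and apparatus
CN108694673A (zh) * 2018-05-16 2018-10-23 阿里巴巴集团控股有限公司 Processing method, apparatus, and processing device for insurance business risk prediction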

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
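CN106971343A (zh) * 2016-01-13 2017-07-21 平安科技(深圳)有限公司 Risk analysis method and system for insurance data
CN111784348A (zh) * 2016-04-26 2020-10-16 阿里巴巴集团控股有限公司 Account risk identification method and apparatus
CN107798390B (zh) * 2017-11-22 2023-03-21 创新先进技术有限公司 Training method and apparatus for a machine learning model, and electronic device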

Also Published As

Publication number Publication date
TW201947510A (zh) 2019-12-16
CN108694673A (zh) 2018-10-23

Similar Documents

Publication Publication Date Title
WO2019218751A1 (zh) 一种保险业务风险预测的处理方法、装置及处理设备
WO2019218748A1 (zh) 一种保险业务风险预测的处理方法、装置及处理设备
WO2019196545A1 (zh) 保险欺诈识别的数据处理方法、装置、设备及服务器
AU2021232839B2 (en) Updating Attribute Data Structures to Indicate Trends in Attribute Data Provided to Automated Modelling Systems
US11030415B2 (en) Learning document embeddings with convolutional neural network architectures
Liu et al. Simulating land-use dynamics under planning policies by integrating artificial immune systems with cellular automata
KR102455325B1 (ko) 대량의 구조화되지 않은 데이터 필드에서 기술적 및 의미론적 신호 처리
Liu et al. Stock price movement prediction from financial news with deep learning and knowledge graph embedding
CN113610239B (zh) 针对机器学习的特征处理方法及特征处理系统
TW201905773A (zh) 車險風險預測方法、裝置及伺服器
CN111831826B (zh) 跨领域的文本分类模型的训练方法、分类方法以及装置
CN114647741A (zh) 工艺自动决策和推理方法、装置、计算机设备及存储介质
CN113326852A (zh) 模型训练方法、装置、设备、存储介质及程序产品
CN111178533B (zh) 实现自动半监督机器学习的方法及装置
CN114707041B (zh) 消息推荐方法、装置、计算机可读介质及电子设备
Li et al. Piecewise convolutional neural networks with position attention and similar bag attention for distant supervision relation extraction
CN113723077B (zh) 基于双向表征模型的句向量生成方法、装置及计算机设备
Chen et al. Futures price prediction modeling and decision-making based on DBN deep learning
CN113837307A (zh) 数据相似度计算方法、装置、可读介质及电子设备
CN111353728A (zh) 一种风险分析方法和系统
CN117009621A (zh) 信息搜索方法、装置、电子设备、存储介质及程序产品
Zhang et al. Intelligent travelling visitor estimation model with big data mining
Padhi et al. Feature Enhancement-Based Stock Prediction Strategy to Forecast the Fiscal Market
Li et al. Time series classification with deep neural networks based on Hurst exponent analysis
CN114066278B (zh) 物品召回的评估方法、装置、介质及程序产品

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19803185

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19803185

Country of ref document: EP

Kind code of ref document: A1