CN110866832A

CN110866832A - Risk control method, system, storage medium and computing device

Info

Publication number: CN110866832A
Application number: CN201910943137.3A
Authority: CN
Inventors: 赵春亮
Original assignee: Beijing Absolute Health Ltd
Current assignee: Beijing Absolute Health Ltd
Priority date: 2019-09-30
Filing date: 2019-09-30
Publication date: 2020-03-06

Abstract

The invention discloses a risk control method, system, storage medium, and computing device. The method uses a constructed user risk scoring target model to predict the risk of a user's report information and executes a corresponding risk control operation based on the prediction result. It can efficiently and accurately identify the authenticity of the user's report information, and it addresses the low efficiency of prior-art rule engines, which depend on manual analysis, induction, and rule extraction, thereby effectively achieving the purpose of risk control.

Description

Risk control method, system, storage medium and computing device

Technical Field

The invention relates to the technical fields of machine learning and the Internet, and in particular to a risk control method, a risk control system, a storage medium, and a computing device.

Background

In recent years, as the mobile Internet has matured, a new form of mutual-aid community insurance based on community relationships has become increasingly well known and accepted by the public, and mutual-aid fulfillment and compensation services similar to those of the traditional insurance industry have gradually improved. How to effectively verify the authenticity of the information provided by users who file mutual-aid fulfillment reports has become a key factor in the healthy and sustainable development of mutual-aid insurance.

In the related art, insurance fraud is generally countered by controlling the risk of insurance claims with manually defined rules: after a fraud is discovered, an analyst extracts the key points of the case, abstracts them into a rule, and adds the rule to a rule engine. When a new report arrives, the system first queries the rule engine and returns a risk control result according to which rules are hit.

The above scheme has certain defects. First, the maintenance cost of the rule engine is high: once the scale of fraud grows, a large number of analysts are needed to analyze the causes of the fraud and work out fraud rules, and the rule production cycle is long. Second, rule weights cannot be learned automatically: when a reported case hits multiple rules at the same time, it is impossible to determine which rule is more important, and the rule weights cannot be maintained automatically.

Therefore, how to efficiently and accurately identify the authenticity of the user report information so as to realize risk control becomes a technical problem to be solved urgently at present.

Disclosure of Invention

In view of this, embodiments of the present invention provide a risk control method, system, storage medium, and computing device, which can efficiently and accurately identify the authenticity of user report information, so as to achieve the purpose of risk control.

According to an aspect of an embodiment of the present invention, there is provided a risk control method, including:

receiving the report information of a user, and performing text processing on the report information of the user to obtain report information after the text processing;

predicting the risk corresponding to the report information after the text processing by using the constructed user risk scoring target model to obtain a prediction result;

and executing corresponding risk control operation based on the prediction result.

Optionally, constructing a user risk scoring target model by:

analyzing the collected user portrait data to obtain a user sample data set;

constructing an initial model of user risk score;

selecting training set data from the user sample data set, and training the user risk score initial model by using the training set data to obtain a trained user risk score model;

selecting test set data from the user sample data set, evaluating the trained user risk scoring model by using the test set data, and determining whether to optimize and adjust the trained user risk scoring model according to an evaluation result so as to obtain a user risk scoring target model.

Optionally, the user portrait data includes one or more of user basic attribute information, user behavior information, user service information, user report information, and user social information.

Optionally, the analyzing the collected user portrait data to obtain a user sample data set includes:

analyzing the collected user portrait data to determine a user risk score value;

extracting one or more user features from the collected user portrait data, and determining a feature value of each of the one or more user features according to feature information of the one or more user features;

generating a user sample data set based on the user risk score value, the one or more user characteristics, and respective characteristic values of the one or more user characteristics.

Optionally, the analyzing the collected user portrait data to determine a user risk score value includes:

acquiring historical report information of a user from the collected user portrait data;

and determining the risk score value of the user according to the historical report information of the user.

Optionally, determining a feature value of each of the one or more user features according to the feature information of the one or more user features includes:

carrying out specific value filling processing on the characteristic information of the one or more user characteristics to determine the characteristic value of each of the one or more user characteristics; and/or

And carrying out quantitative conversion processing on the characteristic information of the one or more user characteristics, and determining the characteristic value of each of the one or more user characteristics.

Optionally, training the initial user risk score model by using the training set data to obtain a trained user risk score model, including:

and taking the characteristic values of one or more user characteristics in the training set data as the input of the user risk scoring initial model, taking the user risk scoring values in the training set data as the output of the user risk scoring initial model, training the user risk scoring initial model, and determining the weight of one or more user characteristics so as to obtain the trained user risk scoring model.

Optionally, the evaluating the trained user risk score model by using the test set data, and determining whether to optimize and adjust the trained user risk score model according to an evaluation result, so as to obtain a user risk score target model, including:

inputting the characteristic values of one or more user characteristics in the test set data into the trained user risk scoring model to obtain the user risk scoring value output by the trained user risk scoring model;

comparing the user risk score value in the test set data with an output user risk score value, and calculating the accuracy of an output result of the trained user risk score model according to a comparison result;

if the accuracy of the output result is greater than or equal to a preset threshold value, determining that the trained user risk scoring model is not subjected to optimization adjustment, and taking the trained user risk scoring model as a user risk scoring target model;

and if the accuracy of the output result is smaller than a preset threshold value, determining to perform optimization adjustment on the trained user risk scoring model, and taking the model after optimization adjustment as a user risk scoring target model.

Optionally, the receiving the report information of the user includes:

providing an interface for reporting cases for a user;

and acquiring the report information submitted by the user according to the report interface of the case.

Optionally, predicting the risk corresponding to the report information after the text processing by using the constructed user risk scoring target model includes:

acquiring characteristic values of one or more user characteristics according to the report information after the text processing;

inputting the obtained characteristic values of one or more user characteristics into the constructed user risk score target model, and calculating a user risk score value corresponding to the report information after text processing;

and predicting the risk corresponding to the report information after the text processing according to the user risk score value corresponding to the report information after the text processing.

Optionally, predicting the risk corresponding to the report information after the text processing according to the user risk score value corresponding to the report information after the text processing includes:

if the user risk score value corresponding to the report information after the text processing is smaller than a preset score threshold value, obtaining a prediction result that the report information of the user corresponds to a first risk case;

and if the user risk score value corresponding to the report information after the text processing is greater than or equal to the preset score threshold value, obtaining a prediction result that the report information of the user corresponds to a second risk case.

Optionally, performing a corresponding risk control operation based on the prediction result, including:

if the prediction result is that the report information of the user corresponds to a first risk case, generating alarm prompt information which represents that the report information of the user corresponds to the first risk case;

and if the prediction result is that the report information of the user corresponds to a second risk case, generating general prompt information which represents that the report information of the user corresponds to the second risk case.

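Optionally, performing a corresponding risk control operation based on the prediction result includes:

according to a predefined rule, judging the risk corresponding to the report information after the text processing to obtain a judgment result;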

and executing corresponding risk control operation by combining the prediction result and the judgment result.

Optionally, the performing, in combination with the prediction result and the determination result, a corresponding risk control operation includes:

if the prediction result and the judgment result are both the case report information of the user corresponding to a first risk case, generating alarm prompt information;

and if the prediction result and the judgment result are both the case report information of the user corresponding to a second risk case, generating general prompt information.

According to another aspect of the embodiments of the present invention, there is also provided a risk control system, including:

the text processing module is suitable for receiving the report information of the user and performing text processing on the report information of the user to obtain the report information after the text processing;

the prediction module is suitable for predicting the risk corresponding to the report information after the text processing by utilizing the constructed user risk scoring target model to obtain a prediction result;

and the risk control module is suitable for executing corresponding risk control operation based on the prediction result.

Optionally, the system further comprises a construction module adapted to:

analyzing the collected user portrait data to obtain a user sample data set;

constructing an initial model of user risk score;

Optionally, the system further comprises a data collection module adapted to:

collecting user portrait data;

the user portrait data comprises one or more items of user basic attribute information, user behavior information, user service information, user report information and user social information.

Optionally, the building module is further adapted to:

Optionally, the text processing module is further adapted to:

providing an interface for reporting cases for a user;

Optionally, the prediction module is further adapted to:

Optionally, the risk control module is further adapted to:

According to yet another aspect of embodiments of the present invention, there is also provided a computer-readable storage medium storing computer program code which, when run on a computing device, causes the computing device to perform any of the above described risk control methods.

According to still another aspect of the embodiments of the present invention, there is also provided a computing device including: a processor; a memory storing computer program code; the computer program code, when executed by the processor, causes the computing device to perform any of the risk control methods described above.

By means of the technical scheme, the embodiment of the invention predicts the risk of the case report information of the user by utilizing the constructed user risk scoring target model, executes corresponding risk control operation based on the prediction result, can efficiently and accurately identify the authenticity of the case report information of the user, solves the problem of low efficiency caused by manual analysis and induction and extraction of a rule engine in the related technology, and effectively realizes the purpose of risk control.

Furthermore, the user risk scoring target model is constructed based on the multi-dimensional user characteristics and the user risk labels, so that the result of predicting the risk of the report information of the user by utilizing the constructed user risk scoring target model is more comprehensive and accurate, and the model has strong applicability.

In addition, by structuring the user portrait data and converting it into features, the embodiment of the invention realizes automatic rule definition and automatic feature weight learning, which greatly simplifies the traditional processing flow based on a rule engine. The risk control purpose of the traditional rule engine can be achieved with only a small amount of professional maintenance, the efficiency of mutual-aid fulfillment review is improved, the probability of default claim payment is reduced, and the healthy and long-term development of mutual-aid insurance is further guaranteed.

The above description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.

The above and other objects, advantages and features of the present invention will become more apparent to those skilled in the art from the following detailed description of specific embodiments thereof, taken in conjunction with the accompanying drawings.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.

The invention will be more clearly understood from the following detailed description, taken with reference to the accompanying drawings, in which:

FIG. 1 shows a flow diagram of a risk control method of an embodiment of the invention;

FIG. 2 illustrates a flow diagram of a method of building a user risk scoring model according to an embodiment of the invention;

FIG. 3 is a flow diagram illustrating a risk control architecture according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of an offline model training process according to an embodiment of the present invention;

FIG. 5 is a schematic diagram illustrating online model prediction according to an embodiment of the present invention;

FIG. 6 is a diagram illustrating the distribution of claim rejection scores according to one embodiment of the invention;

FIG. 7 illustrates a block diagram of a risk control system according to an embodiment of the present invention;

FIG. 8 shows a block diagram of a risk control system according to another embodiment of the invention;

FIG. 9 illustrates a hierarchical architecture diagram of a risk control system according to an embodiment of the present invention.

Detailed Description

Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless it is specifically stated otherwise.

Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.

The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.

Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.

Embodiments of the invention are applicable to computer systems/servers operable with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use with the computer system/server include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computer systems, distributed cloud computing environments that include any of the above systems, and the like.

The computer system/server may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.

In the related art, a risk control system based on a rule engine is simple to understand and to implement. However, on the one hand, because the rule engine relies heavily on manual analysis, induction, and extraction, the purely manual processing flow becomes more complicated as the data volume keeps growing, and the maintenance cost increases exponentially; on the other hand, because rule weights cannot be learned automatically, the rule weight update mechanism is difficult to handle when multiple rules are hit, which further reduces the usability and accuracy of the rule engine.

In order to solve the above technical problems, embodiments of the present invention provide a risk control method, a risk control system, a computer-readable storage medium, and a computing device, which can efficiently and accurately identify authenticity of user report information, thereby achieving a purpose of risk control.

Fig. 1 shows a flow chart of a risk control method according to an embodiment of the invention. As shown in fig. 1, the method may include the following steps S101 to S103:

S101, receiving the report information of a user, and performing text processing on the report information of the user to obtain the report information after the text processing;

S102, predicting the risk corresponding to the report information after the text processing by using the constructed user risk scoring target model to obtain a prediction result;

S103, executing a corresponding risk control operation based on the prediction result.

The embodiment of the invention predicts the risk of the report information of the user by using the constructed user risk scoring target model, executes corresponding risk control operation based on the prediction result, can efficiently and accurately identify the authenticity of the report information of the user, solves the problem of low efficiency caused by manual analysis and induction and extraction of the rule engine in the related technology, and effectively realizes the purpose of risk control.

For receiving the report information of the user in step S101, the embodiment of the present invention provides an optional scheme: an interface for reporting a case may be provided to the user, and the report information submitted by the user is then obtained through that interface.

In a specific application scenario, the report information of the user may include text, pictures, and the like. The embodiment of the invention can extract the picture information by OCR (Optical Character Recognition) and convert it into text; then preprocess the converted text, including whitespace filtering, label extraction using a label dictionary, and the like; and then process the labels using NLP (Natural Language Processing) techniques, including synonym merging, word segmentation, prefix and suffix handling, entity alignment, and label grading. Pictures therefore do not need to be reviewed manually, which greatly improves the efficiency of reviewing the reported materials.
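The preprocessing described above might be sketched as follows. This is only a minimal illustration that assumes the OCR step has already produced a plain string; the label dictionary `TAG_DICT`, the synonym table `SYNONYMS`, and the function `extract_tags` are hypothetical names introduced here, and word segmentation, entity alignment, and label grading are not shown.

```python
import re
from typing import List

# Hypothetical label dictionary: surface phrase -> label (illustrative only).
TAG_DICT = {
    "lung cancer": "diagnosis:lung_cancer",
    "pulmonary carcinoma": "diagnosis:pulmonary_carcinoma",
    "surgery": "treatment:surgery",
}

# Hypothetical synonym table used to merge equivalent labels.
SYNONYMS = {"diagnosis:pulmonary_carcinoma": "diagnosis:lung_cancer"}


def extract_tags(ocr_text: str) -> List[str]:
    """Filter whitespace, look up dictionary phrases, and merge synonyms."""
    # Collapse every run of whitespace (spaces, tabs, newlines) to one space.
    text = re.sub(r"\s+", " ", ocr_text).strip().lower()
    tags: List[str] = []
    for phrase, tag in TAG_DICT.items():
        if phrase in text:
            canonical = SYNONYMS.get(tag, tag)  # merge synonymous labels
            if canonical not in tags:
                tags.append(canonical)
    return tags
```

A real system would likely replace the substring lookup with a proper tokenizer, since substring matching can produce false hits inside longer words.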

In an alternative embodiment of the present invention, as shown in fig. 2, a user risk scoring target model may be constructed through steps S201 to S204:

S201, analyzing and processing the collected user portrait data to obtain a user sample data set;

S202, constructing an initial model of user risk score;

S203, selecting training set data from the user sample data set, and training the user risk scoring initial model by using the training set data to obtain a trained user risk scoring model;

S204, selecting test set data from the user sample data set, evaluating the trained user risk scoring model by using the test set data, and determining whether to optimize and adjust the trained user risk scoring model according to an evaluation result so as to obtain a user risk scoring target model.

The user portrait data mentioned in step S201 above may be user basic attribute information, user behavior information, user service information, user report information, user social information, and the like, which is not limited in this embodiment of the present invention.

Step S201 analyzes and processes the collected user portrait data to obtain a user sample data set. An optional scheme provided in the embodiment of the present invention may include the following steps A1 to A3.

A1, analyzing the collected user portrait data to determine a user risk score value.

In this step, historical report information of the user may be acquired from the collected user portrait data, and the user risk score value may be determined according to it. The historical report information of the user already records or marks whether each case is an insurance-fraud case or a normal claim case, so the user risk score value can be determined to be 1 if the case is an insurance-fraud case and 0 if it is a normal claim case. It should be noted that the user risk score values 1 and 0 are only an illustration; the user risk score value may be represented by a number, a letter, a word, or another symbol, selected according to actual needs, which is not limited in this embodiment of the present invention.

A2, extracting one or more user characteristics from the collected user portrait data, and determining the characteristic value of each of the one or more user characteristics according to the characteristic information of the one or more user characteristics.

In this step, a specific value filling process may be performed on the feature information of the one or more user features to determine a feature value of each of the one or more user features; or quantitative conversion processing can be carried out on the characteristic information of one or more user characteristics to determine the characteristic value of each of the one or more user characteristics.

Furthermore, the respective characteristic values of one or more user characteristics can be normalized to unify the measurement units, so that subsequent calculation and analysis are facilitated.
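The three operations in step A2 (specific value filling, quantitative conversion, and normalization) could look roughly like the sketch below. The description does not say which normalization is used, so min-max scaling is an assumption here, and `CHANNEL_CODES` and the function names are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical code table for a categorical feature (quantitative conversion).
CHANNEL_CODES: Dict[str, int] = {"app": 0, "web": 1, "offline": 2}


def fill_missing(values: List[Optional[float]], fill_value: float) -> List[float]:
    """Specific value filling: replace missing entries (None) with fill_value."""
    return [fill_value if v is None else v for v in values]


def encode_category(values: List[str], codes: Dict[str, int]) -> List[int]:
    """Quantitative conversion: map categories to codes; unknown values map to -1."""
    return [codes.get(v, -1) for v in values]


def min_max_normalize(values: List[float]) -> List[float]:
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

For example, `min_max_normalize(fill_missing([20, None, 60], 40.0))` fills the missing age with 40 and rescales the column to `[0.0, 0.5, 1.0]`.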

A3, generating a user sample data set based on the user risk score value, the one or more user characteristics and the characteristic value of each of the one or more user characteristics.

In this step, one or more user features may be screened to generate a user sample data set, and specifically, the influence of the user features to be selected on the user risk score value may be measured, the user features meeting the preset influence condition may be selected to generate the user sample data set, and the user features not meeting the preset influence condition may be removed. The measurement method may adopt methods such as information gain, correlation, and the like, which is not limited in this embodiment of the present invention.
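As one way to picture the screening in step A3, the sketch below scores each discrete feature by its information gain with respect to the user risk score value and keeps the features whose gain reaches a threshold. The threshold `min_gain` stands in for the "preset influence condition", which the description leaves open; all names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a label sequence."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature: Sequence[int], labels: Sequence[int]) -> float:
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += len(subset) / total * entropy(subset)
    return entropy(labels) - conditional


def select_features(columns: Dict[str, Sequence[int]],
                    labels: Sequence[int],
                    min_gain: float) -> List[str]:
    """Keep the features whose information gain is at least min_gain."""
    return [name for name, col in columns.items()
            if information_gain(col, labels) >= min_gain]
```

Continuous features would need to be binned before this measure applies; a correlation coefficient, which the description also mentions, avoids that step.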

For the generated user sample data set, the amounts of data for insurance-fraud cases and normal claim cases differ greatly. If insurance-fraud cases are taken as negative samples and normal claim cases as positive samples, the imbalance between positive and negative samples leads to a relatively large final model error. Therefore, when training set data or test set data is selected from the user sample data set, up-sampling and down-sampling can be combined: on the one hand, the negative samples are copied multiple times; on the other hand, a certain number of positive samples are randomly drawn, so that the ratio of positive to negative samples is kept within an effective range. The effective range can be determined according to actual requirements, which is not limited in this embodiment of the present invention.
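A minimal sketch of the combined up-sampling and down-sampling follows. It assumes the labels from step A1 (1 for an insurance-fraud case, 0 for a normal claim case); the function name `rebalance` and the parameters `copies` and `normal_keep` are hypothetical, and the resulting ratio is whatever the caller chooses.

```python
import random
from typing import List, Tuple

Sample = Tuple[List[float], int]  # (feature values, user risk score value)

FRAUD, NORMAL = 1, 0  # labels as in step A1


def rebalance(samples: List[Sample], copies: int, normal_keep: int,
              seed: int = 0) -> List[Sample]:
    """Copy fraud samples `copies` times and randomly keep `normal_keep` normal ones."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    fraud = [s for s in samples if s[1] == FRAUD]
    normal = [s for s in samples if s[1] == NORMAL]
    upsampled = fraud * copies                                       # up-sampling by duplication
    downsampled = rng.sample(normal, min(normal_keep, len(normal)))  # down-sampling
    combined = upsampled + downsampled
    rng.shuffle(combined)
    return combined
```

If duplicated samples are split across the training and test sets, the evaluation can look better than it is, so it is generally safer to split first and resample only the training portion.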

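In step S203, the training set data is used to train the user risk scoring initial model to obtain the trained user risk scoring model. The embodiment of the present invention provides an optional scheme: the feature values of one or more user features in the training set data are taken as the input of the user risk scoring initial model, the user risk score values in the training set data are taken as its output, the user risk scoring initial model is trained, and the weights of the one or more user features are determined, so as to obtain the trained user risk scoring model.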
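The description does not name a model family. Purely as an assumption, the sketch below uses logistic regression trained by batch gradient descent, because it yields one weight per user feature; `train_logistic`, `predict_proba`, the learning rate, and the epoch count are all hypothetical choices.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights: Sequence[float], row: Sequence[float]) -> float:
    """Probability of label 1; weights[0] is the bias term."""
    return sigmoid(weights[0] + sum(w * x for w, x in zip(weights[1:], row)))


def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Return [bias, w1, ..., wn] learned by batch gradient descent."""
    n_features = len(X[0])
    weights = [0.0] * (n_features + 1)
    m = len(X)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for row, label in zip(X, y):
            err = predict_proba(weights, row) - label  # gradient of the log loss
            grad[0] += err
            for j, x in enumerate(row):
                grad[j + 1] += err * x
        weights = [w - lr * g / m for w, g in zip(weights, grad)]
    return weights
```

On normalized features, the magnitude of each learned weight gives a rough indication of how strongly that feature drives the score, which is the role the description assigns to the feature weights.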

In step S204, the trained user risk scoring model is evaluated by using the test set data, and whether to optimize and adjust the trained user risk scoring model is determined according to the evaluation result, so as to obtain the user risk scoring target model. The embodiment of the present invention provides an optional scheme, which may include the following steps B1 to B4.

B1, inputting the characteristic values of one or more user characteristics in the test set data into the trained user risk scoring model to obtain the user risk scoring value output by the trained user risk scoring model.

B2, comparing the user risk score value in the test set data with the output user risk score value, and calculating the accuracy of the output result of the trained user risk score model according to the comparison result.

B3, if the accuracy of the output result is greater than or equal to a preset threshold, determining that the trained user risk scoring model is not optimized and adjusted, and taking the trained user risk scoring model as a user risk scoring target model.

In this step, the preset threshold may be 90% or 95%, and the like, and may be specifically set according to actual needs, which is not limited in this embodiment of the present invention.

B4, if the accuracy of the output result is smaller than a preset threshold, determining to optimize and adjust the trained user risk scoring model, and taking the optimized and adjusted model as a user risk scoring target model.

In this step, optimizing and adjusting the trained user risk scoring model may consist of returning to step S203 and training the model again.
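Steps B1 to B4 amount to an evaluate-and-retrain loop, which might be sketched as below. The callbacks `train_fn` and `predict_fn` are hypothetical placeholders for whatever training and prediction routines are used, and the cap `max_rounds` is an added assumption so that the loop always terminates; the description itself sets no such limit.

```python
from typing import Any, Callable, Sequence, Tuple


def accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """B2: fraction of test samples whose predicted score value matches the recorded one."""
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / len(actual)


def select_target_model(train_fn: Callable[[int], Any],
                        predict_fn: Callable[[Any, Any], int],
                        test_X: Sequence[Any],
                        test_y: Sequence[int],
                        threshold: float = 0.9,
                        max_rounds: int = 5) -> Tuple[Any, float]:
    """Retrain until test accuracy reaches the threshold or max_rounds is used up."""
    model = train_fn(0)
    acc = accuracy([predict_fn(model, x) for x in test_X], test_y)      # B1 + B2
    rounds = 0
    while acc < threshold and rounds < max_rounds:                      # B4: adjust
        rounds += 1
        model = train_fn(rounds)                                        # back to S203
        acc = accuracy([predict_fn(model, x) for x in test_X], test_y)
    return model, acc                                                   # B3: accept
```

Repeatedly tuning against the same test set tends to overstate accuracy, so a separate validation split is a common refinement.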

After the user risk scoring target model is constructed, step S102 predicts the risk corresponding to the report information after the text processing by using the constructed user risk scoring target model, and an optional scheme is provided in the embodiment of the present invention, and the scheme may include the following steps C1 to C3.

C1, acquiring characteristic values of one or more user characteristics according to the report information after the text processing.

C2, inputting the obtained characteristic values of one or more user characteristics into the constructed user risk score target model, and calculating the user risk score value corresponding to the report information after text processing.

C3, predicting the risk corresponding to the report information after the text processing according to the user risk score value corresponding to the report information after the text processing.

In this step, if the user risk score value corresponding to the report information after the text processing is smaller than a preset score threshold, the prediction result is that the report information of the user corresponds to the first risk case; if the user risk score value corresponding to the report information after the text processing is greater than or equal to the preset score threshold, the prediction result is that the report information of the user corresponds to the second risk case. The preset score threshold may be 10 or 20, and may be set according to actual requirements, which is not limited in this embodiment of the present invention.

After predicting the risk corresponding to the report information after the text processing by using the constructed user risk scoring target model in step S102 to obtain a prediction result, step S103 may execute a corresponding risk control operation based on the prediction result, and specifically, if the prediction result is that the report information of the user corresponds to the first risk case, generate alarm prompt information indicating that the report information of the user corresponds to the first risk case; and if the prediction result is that the report information of the user corresponds to the second risk case, generating general prompt information which indicates that the report information of the user corresponds to the second risk case. The first risk case is a high risk case, so that the alarm prompt message can be generated; the second risk case is a low risk case, so general hints may be generated.
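The threshold comparison of step C3 and the prompt generation of step S103 can be illustrated with the short sketch below. It follows the description as written: a score below the threshold maps to the first (high) risk case, and a score at or above it maps to the second (low) risk case. The `Prompt` type, the function names, and the default threshold of 10 (one of the example values in the description) are assumptions for illustration.

```python
from dataclasses import dataclass

FIRST_RISK_CASE = "first risk case"    # high risk per the description
SECOND_RISK_CASE = "second risk case"  # low risk per the description


@dataclass(frozen=True)
class Prompt:
    level: str    # "alarm" or "general"
    message: str


def classify_score(score: float, score_threshold: float = 10.0) -> str:
    """Step C3: a score below the threshold maps to the first risk case."""
    return FIRST_RISK_CASE if score < score_threshold else SECOND_RISK_CASE


def risk_control(score: float, score_threshold: float = 10.0) -> Prompt:
    """Step S103: turn the predicted risk case into prompt information."""
    case = classify_score(score, score_threshold)
    level = "alarm" if case == FIRST_RISK_CASE else "general"
    return Prompt(level, f"The user's report information corresponds to the {case}.")
```

Note that a score exactly equal to the threshold falls into the second risk case, because the comparison for the first risk case is strict.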

In an optional embodiment of the present invention, when step S103 executes a corresponding risk control operation based on the prediction result, the risk corresponding to the report information after the text processing may also be judged according to a predefined rule to obtain a judgment result, and the corresponding risk control operation is then executed by combining the prediction result and the judgment result. The predefined rule may be, for example, a performance rule, and the embodiment of the present invention is not limited thereto.

Further, if the prediction result and the judgment result both indicate that the user's report information corresponds to the first risk case, alarm prompt information is generated; if both indicate that the user's report information corresponds to the second risk case, general prompt information is generated. The first risk case is a high-risk case, so alarm prompt information is generated; the second risk case is a low-risk case, so general prompt information is generated.

If the prediction result is that the user's report information corresponds to the first risk case and the judgment result is that it corresponds to the third risk case, prompt information stating that the prediction result is the first risk case and prompt information stating that the judgment result is the third risk case are both output.

If the prediction result is that the user's report information corresponds to the second risk case and the judgment result is that it corresponds to the third risk case, prompt information stating that the prediction result is the second risk case and prompt information stating that the judgment result is the third risk case are both output.
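The combination logic described above can be sketched as a small decision function. This is only an illustrative sketch: the label strings, the function name `risk_control_prompts` and the prompt wording are hypothetical, and the fallback for combinations the description does not mention (for example a first-risk prediction with a second-risk judgment) is an assumption that simply reports both results.

```python
from typing import List

# Hypothetical labels for the three risk cases named in the description.
FIRST, SECOND, THIRD = "first_risk", "second_risk", "third_risk"


def risk_control_prompts(prediction: str, judgment: str) -> List[str]:
    """Combine the model prediction with the rule-based judgment."""
    if prediction == FIRST and judgment == FIRST:
        # Both agree on the high-risk case: raise an alarm.
        return ["ALARM: report corresponds to the first risk case"]
    if prediction == SECOND and judgment == SECOND:
        # Both agree on the low-risk case: general prompt only.
        return ["INFO: report corresponds to the second risk case"]
    # The two results differ (e.g. the judgment is the third risk case):
    # output one prompt per result so the auditor sees both.
    # Combinations not covered by the description also land here (assumption).
    return [
        f"prediction result: {prediction}",
        f"judgment result: {judgment}",
    ]


if __name__ == "__main__":
    print(risk_control_prompts(FIRST, THIRD))
```

Returning a list keeps the single-prompt and two-prompt cases uniform for whatever component displays them to the auditor.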

In the above, various implementation manners of each link in the embodiments shown in fig. 1 and fig. 2 are introduced, and the risk control method provided by the embodiment of the present invention is further described below by using specific embodiments.

In the embodiment of the invention, the risk control method based on the data mining technology mainly comprises a user portrait data collection stage, an off-line model training stage and an on-line model prediction stage. Fig. 3 is a schematic diagram illustrating a flow architecture of risk control according to an embodiment of the present invention, as shown in fig. 3, including a data warehouse 310, feature information 320, model training 330, model prediction 340, and risk control 350, and the following describes each part of the flow architecture of risk control in detail.

First, user portrait data collection phase

At this stage, user portrait data is extracted from the data warehouse 310 to construct user characteristic information, which may cover the following five aspects:

(1) User basic attribute information

This may include information such as the user's name, gender, age, occupation, registration information and identity card.

(2) User behavior information

This may include the user's access activity in the application program, such as access time, location, IP address, number of visits and registration channel.

(3) User service information

This may include the time of joining the mutual-aid plan, the type of plan joined, the joining channel, the joining amount, the recharge status, whether the user has raised funds, whether the user has bought insurance, whether the user follows the official account, whether the user has unfollowed the official account, and the like.

(4) User case information

This may include the report time, report location, report history, time of treatment, disease diagnosed, treatment records and the like.

(5) User social information

This may include friend information, social comment information and the like.

Second, off-line model training stage

This stage may include reporting user risk annotation, feature analysis processing, model training and model evaluation, and fig. 4 shows a schematic diagram of an offline model training process according to an embodiment of the present invention. As shown in FIG. 4, including user representation 410, feature analysis 420, model training 430, model evaluation 440, determination 450 of whether the algorithmic model effect is significant, and model storage 460, portions of which are described in detail below.

1. Reporting user risk annotation

The embodiment of the invention adopts a supervised machine learning technology, needs to manually mark the classification category of the target, gives a blacklist mark isBlack of 1 to the identified fraudulent user, and gives a common mark isBlack of 0 to the common user.

2. Stage of feature analysis processing

The stage mainly comprises specific value filling, normalization processing, characteristic quantitative conversion, information entropy (information gain) calculation, data equalization processing and the like. The specific value filling here may be performed by filling data that is not provided by the user with a specific value, where the specific value may be a null value or a value that does not affect the model result.
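As a rough illustration of specific value filling, the sketch below replaces fields the user did not provide with a sentinel. The sentinel `-1`, the function name `fill_missing` and the sample field names are assumptions chosen for the example; the description only requires a null value or a value that does not affect the model result.

```python
from typing import Dict, Iterable, Optional

FILL_VALUE = -1.0  # hypothetical "specific value" for data the user did not provide


def fill_missing(
    record: Dict[str, Optional[float]],
    fields: Iterable[str],
    fill: float = FILL_VALUE,
) -> Dict[str, float]:
    """Return a copy of `record` restricted to `fields`, with gaps filled."""
    filled = {}
    for name in fields:
        value = record.get(name)  # None if the key is absent or explicitly empty
        filled[name] = fill if value is None else value
    return filled


if __name__ == "__main__":
    raw = {"age": 30.0, "charge_amount": None}
    print(fill_missing(raw, ["age", "charge_amount", "order_cnt"]))
```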

In one specific embodiment, the feature analysis process is specifically as follows:

(1) data normalization processing

The normalization formula (for a particular index) is:

$$x' = \frac{x - \min}{\max - \min}$$

where x is the value of the index for one person, max is the maximum value of the index over all persons, and min is the minimum value of the index over all persons.
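A minimal sketch of this min-max normalization for a single index is given below. The function name `min_max_normalize` is hypothetical, and mapping a constant column to 0.0 is an assumption, since the description does not say how the case max = min is handled.

```python
from typing import List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Scale one index to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant index carries no information, map it to 0.0.
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


if __name__ == "__main__":
    ages = [20, 30, 60]
    print(min_max_normalize(ages))  # [0.0, 0.25, 1.0]
```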

(2) Quantitative conversion of features

Since the algorithm model requires all inputs to be of numeric type, text-type data must be converted quantitatively. Taking the relationship under which a member joins the mutual-aid plan as an example, the conversion is as follows:

Relationship	Self	Spouse	Father	Mother	Brother	Brother	…
Code	1	2	3	4	5	6	…
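The conversion above amounts to a lookup table from category text to an integer code. The sketch below assumes English labels for the first four categories and reuses the value 99999 for a missing relationship, which is inferred from the feature name `join_rel_99999` later in this description rather than stated explicitly; the function name and the handling of unlisted text are also assumptions.

```python
from typing import Optional

# Hypothetical English labels for the first four codes of the table.
RELATION_CODES = {"self": 1, "spouse": 2, "father": 3, "mother": 4}
NULL_RELATION_CODE = 99999  # assumed code for a missing joining relationship


def encode_relation(relation: Optional[str]) -> int:
    """Map a joining-relationship string to its numeric code."""
    if relation is None or not relation.strip():
        return NULL_RELATION_CODE
    # Assumption: text that is not in the table is treated like a missing value.
    return RELATION_CODES.get(relation.strip().lower(), NULL_RELATION_CODE)


if __name__ == "__main__":
    print([encode_relation(r) for r in ["Self", "Mother", None]])  # [1, 4, 99999]
```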

(3) Information gain (K-L divergence) calculation

The K-L divergence measures the information lost when one distribution is used to approximate another, and can be used to quantify how relevant a feature is to the overall classification.

For example, let X be a variable with n possible values, the i-th of which occurs with probability P_i. Then the entropy of X is defined as:

$$H(X) = -\sum_{i=1}^{n} P_i \log P_i$$

The conditional entropy of X given Y is:

$$H(X|Y) = \sum_{y} P(Y = y)\, H(X|Y = y)$$

The information gain (K-L divergence) is:

$$g(D, A) = H(D) - H(D|A)$$

The information gain represents how much the disorder (uncertainty) of the data set D is reduced after D is partitioned using feature A. The larger the information gain, the stronger the discriminating power of the feature, i.e. the more relevant the feature is.
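As a worked illustration of g(D, A) = H(D) - H(D|A), the sketch below computes the information gain of one discrete feature with respect to the class labels. The function names are hypothetical, and the base-2 logarithm is an assumption, since the description does not fix the base.

```python
import math
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """H(D): entropy of the label distribution (base 2, by assumption)."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def conditional_entropy(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """H(D|A): entropy of the labels within each feature value, weighted by frequency."""
    groups = defaultdict(list)
    for a, d in zip(feature, labels):
        groups[a].append(d)
    total = len(labels)
    return sum(len(g) / total * entropy(g) for g in groups.values())


def information_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """g(D, A) = H(D) - H(D|A)."""
    return entropy(labels) - conditional_entropy(feature, labels)


if __name__ == "__main__":
    is_black = [1, 1, 0, 0]
    print(information_gain(["a", "a", "b", "b"], is_black))  # feature separates the classes
    print(information_gain(["a", "b", "a", "b"], is_black))  # feature is uninformative
```

In the first call the feature splits the labels perfectly, so the gain equals the full label entropy; in the second it tells nothing about the label, so the gain is zero.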

(4) Data equalization processing

Because the number of insurance fraud cases differs greatly from the number of normal claim cases, if fraud cases are used as negative samples and normal claim cases as positive samples, the imbalance between positive and negative samples leads to a large final model error.

The method adopted here combines up-sampling and down-sampling: on the one hand, the negative samples are copied multiple times; on the other hand, a certain number of positive samples are randomly drawn, so that the ratio of positive to negative samples is kept within an effective range. The effective range can be determined according to actual requirements, which is not limited in the embodiment of the invention.
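The combined up-sampling and down-sampling can be sketched as follows. The function name `rebalance`, its parameters and the fixed random seed are assumptions for illustration; the number of copies and the number of positives to keep would be chosen so that the resulting ratio falls in whatever range is considered effective.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def rebalance(
    positives: Sequence[T],
    negatives: Sequence[T],
    neg_copies: int,
    pos_keep: int,
    seed: int = 0,
) -> List[T]:
    """Up-sample negatives by copying, down-sample positives by random draw."""
    rng = random.Random(seed)
    upsampled = list(negatives) * neg_copies  # each negative sample repeated
    downsampled = rng.sample(list(positives), min(pos_keep, len(positives)))
    data = upsampled + downsampled
    rng.shuffle(data)  # avoid blocks of identical classes during training
    return data


if __name__ == "__main__":
    normal = [("normal", i) for i in range(100)]  # positive samples
    fraud = [("fraud", i) for i in range(5)]      # negative samples
    balanced = rebalance(normal, fraud, neg_copies=4, pos_keep=40)
    print(len(balanced))  # 60 samples: 20 fraud copies and 40 normal cases
```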

3. Model training and model assessment

At this stage, the user sample data set is divided into training set data and test set data, and the LightGBM algorithm is applied to the training set data to train the constructed user risk score initial model. LightGBM is an open-source gradient boosting framework based on decision tree algorithms, and is characterized by high running efficiency, automatic feature ranking, strong interpretability and the like.

After the model has been trained, it is evaluated on the test set data. If the effect of the algorithm model is significant, the model is stored; if not, the process returns to model training. The training parameters are continuously optimized and adjusted so that the model effect becomes optimal.

In one embodiment, the model training and model evaluation may include the following steps:

(1) First, training set data and test set data are selected from the user sample data set: 70% of the data is used as training set data and 30% as test set data.
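A minimal sketch of the 70/30 division is shown below. Shuffling before the cut and the fixed seed are assumptions; the description only gives the proportions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(
    samples: Sequence[T], train_ratio: float = 0.7, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle the sample set and cut it into training and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    train, test = split_train_test(range(10))
    print(len(train), len(test))  # 7 3
```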

(2) Model training is carried out on the training set data.

Several main parameters of the LightGBM algorithm are adjusted, such as max_depth, num_leaves, bagging_fraction and bagging_freq. max_depth takes an integer value from 3 to 5 inclusive and limits the maximum depth of a tree to prevent overfitting; num_leaves takes an integer value from 5 to 10 inclusive, represents the number of leaf nodes and controls the complexity of the tree model; bagging_fraction takes a value between 0 and 1 and represents the proportion of samples drawn for building a tree; bagging_freq takes an integer value from 3 to 7 inclusive and represents the bagging frequency, i.e. the number of iterations between successive re-sampling rounds. The output of the algorithm model is a score between 0 and 1.
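The parameter ranges above define a small search space, which can be enumerated as in the sketch below. Only the four parameter names and their integer ranges come from the description; the candidate `bagging_fraction` values, the `"objective": "binary"` entry and the function name are assumptions. Each dictionary has the shape that LightGBM accepts as its `params` argument, but no training is performed here.

```python
from itertools import product
from typing import Dict, List, Sequence, Union

Param = Union[int, float, str]


def lightgbm_param_grid(
    bagging_fractions: Sequence[float] = (0.6, 0.8, 1.0),  # assumed candidates in (0, 1]
) -> List[Dict[str, Param]]:
    """Enumerate candidate parameter sets within the ranges of the description."""
    grid = []
    for max_depth, num_leaves, bagging_freq, bagging_fraction in product(
        range(3, 6),    # max_depth: 3..5 inclusive
        range(5, 11),   # num_leaves: 5..10 inclusive
        range(3, 8),    # bagging_freq: 3..7 inclusive
        bagging_fractions,
    ):
        grid.append(
            {
                "objective": "binary",  # assumption: two-class fraud/normal target
                "max_depth": max_depth,
                "num_leaves": num_leaves,
                "bagging_fraction": bagging_fraction,
                "bagging_freq": bagging_freq,
            }
        )
    return grid


if __name__ == "__main__":
    grid = lightgbm_param_grid()
    print(len(grid))  # 3 * 6 * 5 * 3 = 270 candidate settings
```

Each candidate would then be trained on the training set and compared on the test set, keeping the setting with the best evaluation result.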

By continuously optimizing the algorithm parameters, the weight of each user feature, i.e. the feature importance value, is obtained as shown below. It should be noted that this is only one case in the optimization process.

wait_days: number of days in the waiting period, feature importance value 100;

eff_time_create: interval between the policy effective date and the report time, feature importance value 66;

ill_create: interval between the diagnosis date and the report time, feature importance value 60;

lst_charge_create: interval between the last recharge time and the report time, feature importance value 48;

total_bal_amt: mutual-aid order balance, feature importance value 47;

total_share_amt: cumulative apportioned amount, feature importance value 45;

ill_days: number of days from the diagnosis or accident to the time of the mutual-aid plan, feature importance value 42;

total_frozen_cnt: cumulative number of account freezes, feature importance value 60;

hz_inst_20_cnt: number of times of the million mutual-aid plan, feature importance value 30;

join_rel: joining relationship, feature importance value 26;

pay_amount: total mutual-aid payment amount, feature importance value 19;

hz_user_advance_flag: whether the user is an upgraded user, feature importance value 17;

wait_flag: whether the waiting period has elapsed, feature importance value 16;

join_rel_99999: joining relationship is null, feature importance value 15;

user_cnt_from_fst: order count of the ordering user, feature importance value 13;

ill_croedfunding: interval between the fundraising start time and the diagnosis time, feature importance value 12;

order_cnt: cumulative number of mutual-aid orders, feature importance value 11;

age: age, feature importance value 9;

charge_amount: recharge amount, feature importance value 9;

instance_id: insurance type, feature importance value 6.

(3) Algorithm effect evaluation

ROC (Receiver Operating Characteristic) curves are used here to evaluate the algorithm. The ROC curve is a useful visualization tool for comparing classification models; the AUC (Area Under the ROC Curve) of a perfect classifier is 1.0, and that of random guessing is 0.5.

The user risk scoring target model is clearly effective at finding potential reporting risk: measured by AUC, the model output reaches about 96%.
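For reference, AUC can be computed directly from its pairwise interpretation: the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative one, with ties counted as one half. The sketch below is a plain-Python illustration with a hypothetical function name; it is quadratic in the number of samples, and which class is treated as positive is left to the caller.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]))  # 1.0, perfect ranking
    print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.2, 0.1]))  # 0.75
```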

Third, on-line model prediction stage

FIG. 5 shows a schematic diagram of online model prediction according to an embodiment of the invention. As shown in fig. 5, the following steps S501 to S510 may be included.

Step S501, receiving the report material of the user.

Step S502, processing the report material of the user.

Processing the user's report material may include:

(1) processing text information such as report description and the like by using an NLP technology, and extracting effective text features from the processed text information;

(2) the materials provided by the user are processed into texts by utilizing technologies such as OCR and the like, and effective information is extracted.

In a specific embodiment, taking OCR text feature extraction as an example, the specific steps are as follows:

1.1) uploading a picture of the certification material by a user;

1.2) extracting text from the picture by using a third-party OCR technology;

1.3) preprocessing the extracted text, including space filtering, and extracting labels by using a label dictionary, wherein the labels include useful information such as names, ages, diagnosis records, case numbers and the like;

1.4) processing the labels by using NLP technology, including synonym merging, word segmentation, prefix and suffix handling, entity alignment and label equivalence processing, and outputting the result, for example converting the label "patient" into "name";

1.5) outputting the required result data.
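Steps 1.3) and 1.4) can be sketched on already-recognized text as below. The OCR call itself is omitted because the description only refers to a third-party technology. The label dictionary, the `label: value` line layout and the function name `extract_labels` are assumptions for illustration; whitespace is collapsed rather than removed because the sample text is English.

```python
import re
from typing import Dict

# Hypothetical label dictionary: raw label text -> canonical label.
# Synonyms such as "patient" are merged into the canonical label "name".
LABEL_SYNONYMS = {
    "patient": "name",
    "name": "name",
    "age": "age",
    "diagnosis": "diagnosis",
    "case no": "case_number",
}


def extract_labels(ocr_text: str) -> Dict[str, str]:
    """Extract canonical label/value pairs from OCR output lines."""
    result: Dict[str, str] = {}
    for line in ocr_text.splitlines():
        line = re.sub(r"\s+", " ", line).strip()  # whitespace filtering
        if ":" not in line:
            continue  # no recognizable "label: value" structure
        raw_label, value = line.split(":", 1)
        canonical = LABEL_SYNONYMS.get(raw_label.strip().lower())
        if canonical and value.strip():
            result[canonical] = value.strip()
    return result


if __name__ == "__main__":
    text = "Patient:  Zhang San\nAge : 45\nDiagnosis: pneumonia\nHospital stamp"
    print(extract_labels(text))
```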

The picture information is extracted through an OCR technology, and the pictures are automatically marked by utilizing an artificial intelligence technology, an NLP natural language processing technology and the like, so that manual picture auditing is not needed, and the efficiency of auditing the reported materials is greatly improved.

In one embodiment, the entry information may include user information such as user name, case, hospital visit, time of visit, original name of entry, etc.

In step S503, a user portrait is acquired.

Step S504, merging the user characteristic data.

The user characteristic information obtained in step S502 is combined with the user portrait information obtained in step S503 to construct the user features, and the effective user characteristic information required by the user risk scoring target model is extracted for model calculation.
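One simple way to picture this merging step is a dictionary merge followed by selection of the features the model expects. The feature subset, the default for missing values, the function name and the choice to let report-derived values override portrait values are all assumptions made for this sketch.

```python
from typing import Dict, List, Sequence

# Hypothetical subset of the model's input features, in a fixed order.
MODEL_FEATURES = ["wait_days", "age", "join_rel", "charge_amount"]


def build_feature_vector(
    report_features: Dict[str, float],
    portrait_features: Dict[str, float],
    feature_names: Sequence[str] = MODEL_FEATURES,
    missing: float = 0.0,
) -> List[float]:
    """Merge both feature sources and keep only what the model needs."""
    # Assumption: values extracted from the report override the stored portrait.
    merged = {**portrait_features, **report_features}
    return [float(merged.get(name, missing)) for name in feature_names]


if __name__ == "__main__":
    report = {"age": 45}
    portrait = {"wait_days": 180, "age": 44, "join_rel": 1}
    print(build_feature_vector(report, portrait))  # [180.0, 45.0, 1.0, 0.0]
```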

Step S505, calculating based on the user risk scoring target model obtained by offline training.

The user's report information and the user's features are input into the user risk scoring target model obtained by offline training, risk prediction is performed on the user's report information, and the risk prediction value of the user's report is determined.

Step S506, determining a model prediction result.

According to the risk prediction value obtained from the user risk scoring target model and the risk prevention and control rule, it is judged whether the user is a reporting user of the first risk.

In practice, the risk prediction value is a decimal between 0 and 1 and is amplified by 100 times. The distribution of claim rejection scores obtained from big data is shown in fig. 6, where the ordinate is the risk prediction value amplified by 100 times and the abscissa is the number of sample points. It can be seen that users scoring below 10 points account for most of the rejected claims; to improve auditing, however, a threshold of 20 to 30 points may be set.

In one embodiment, a user with a predicted value of no more than 10 points may be set as a first-risk (claim rejection) user, and a user with a predicted value above 10 points may be set as a second-risk user.
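The scaling and thresholding can be sketched as below. The function name and label strings are hypothetical. The comparison uses "smaller than the threshold" for the first risk case, following the method step described earlier; whether a score exactly equal to the threshold belongs to the first or second case is an assumption here.

```python
from typing import Tuple


def classify_report(raw_score: float, threshold: float = 10.0) -> Tuple[float, str]:
    """Scale a model output in [0, 1] to points and map it to a risk case."""
    if not 0.0 <= raw_score <= 1.0:
        raise ValueError("the model output is expected to lie between 0 and 1")
    points = raw_score * 100.0  # amplify by 100 times
    # Low scores indicate high risk: below the threshold is the first risk case.
    label = "first_risk" if points < threshold else "second_risk"
    return points, label


if __name__ == "__main__":
    print(classify_report(0.25))                  # (25.0, 'second_risk')
    print(classify_report(0.25, threshold=30.0))  # (25.0, 'first_risk')
```

Raising the threshold from 10 to the 20 to 30 range flags more reports for the auditor, which is the trade-off the paragraph above describes.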

Step S507, according to a predefined rule, determining a risk corresponding to the report information after the text processing.

In step S508, a rule determination result is determined.

In step S509, the rule engine processing result is combined with the result output by the model engine. For example, the embodiment of the invention obtains the characteristics of the user by structuring, standardizing and codifying the performance rules, and judges whether the user complies with the performance rules by means of natural language processing technology, which not only improves prediction accuracy but also automates rule application and reduces the cost of manual judgment.

Step S510, auditing is carried out according to the review condition, and an auditing result is obtained.

The case risk is further determined according to the auditing rule process, so that the auditing efficiency is improved, the manual auditing time and the configuration of human resources are reduced, and the human cost can be saved.

At present, the overall risk control auditing efficiency is improved by over 60 percent, and the auditing efficiency can be further improved in gradual optimization.

The user risk scoring target model provided by the embodiment of the invention can be used for reporting a case for a user in a water drop mutual aid protection system. Of course, it should be noted that the risk control for the application of the mutual aid security system according to the embodiment of the present invention is only an example. Those skilled in the art should understand that the actual reported information may also be reported information of insurance companies and other mutual support systems, and as long as there is case information of other related systems, the user risk scoring target model can be obtained through offline training of data of the related insurance companies and the mutual support systems, and is applied to risk control of the related systems to realize risk control.

Based on the risk control method provided by each embodiment, the embodiment of the invention also provides a risk control system based on the same inventive concept.

FIG. 7 illustrates a block diagram of a risk control system according to an embodiment of the present invention. As shown in fig. 7, the risk control system may include a text processing module 710, a prediction module 720, and a risk control module 730, specifically:

the text processing module 710 is adapted to receive the report information of the user, perform text processing on the report information of the user, and obtain report information after the text processing;

the prediction module 720 is adapted to predict the risk corresponding to the report information after the text processing by using the constructed user risk scoring target model to obtain a prediction result;

and a risk control module 730 adapted to perform a corresponding risk control operation based on the prediction result.

In an alternative embodiment of the present invention, as shown in fig. 8, the system illustrated in fig. 7 above may further comprise a building module 740 adapted to:

analyzing the collected user portrait data to obtain a user sample data set;

constructing an initial model of user risk score;

In an alternative embodiment of the present invention, as shown in fig. 8, the system illustrated in fig. 7 above may further comprise a data collection module 750 adapted to:

collecting user portrait data;

In an alternative embodiment of the present invention, the construction module 740 is further adapted to:

extracting one or more user features from the collected user portrait data, and determining respective feature values of the one or more user features according to feature information of the one or more user features;

a user sample data set is generated based on the user risk score value, the one or more user characteristics, and the characteristic values of each of the one or more user characteristics.

carrying out specific value filling processing on the characteristic information of one or more user characteristics to determine the characteristic value of each of the one or more user characteristics; and/or

And carrying out quantitative conversion processing on the characteristic information of the one or more user characteristics to determine the characteristic value of each of the one or more user characteristics.

comparing the user risk score value in the test set data with the output user risk score value, and calculating the accuracy rate of the output result of the trained user risk score model according to the comparison result;

and if the accuracy of the output result is smaller than a preset threshold, determining to perform optimization adjustment on the trained user risk scoring model, and taking the model after optimization adjustment as a user risk scoring target model.

In an alternative embodiment of the invention, the text processing module 710 is further adapted to:

providing an interface for reporting cases for a user;

In an alternative embodiment of the invention, the prediction module 720 is further adapted to:

inputting the obtained characteristic values of one or more user characteristics into a constructed user risk scoring target model, and calculating a user risk scoring value corresponding to the report information after text processing;

if the user risk score value corresponding to the report information after the text processing is smaller than a preset score threshold value, predicting to obtain a prediction result of the report information of the user corresponding to the first risk case;

and if the user risk score value corresponding to the report information after the text processing is greater than or equal to the preset score threshold value, predicting to obtain a prediction result of the second risk case corresponding to the report information of the user.

In an alternative embodiment of the invention, the risk control module 730 is further adapted to:

if the prediction result is that the report information of the user corresponds to the first risk case, generating alarm prompt information which represents that the report information of the user corresponds to the first risk case;

and if the prediction result is that the report information of the user corresponds to the second risk case, generating general prompt information which indicates that the report information of the user corresponds to the second risk case.

if the prediction result and the judgment result are both the case reporting information of the user corresponding to the first risk case, generating alarm prompt information;

and if the prediction result and the judgment result are both the case report information of the user corresponding to the second risk case, generating general prompt information.

In an alternative embodiment of the present invention, the risk control system provided in the embodiment of the present invention integrates an OCR system, data mining, user profiling, artificial intelligence machine learning, and a regularization engine to perform risk control. Fig. 9 shows a hierarchical architecture diagram of a risk control system according to an embodiment of the present invention, and specifically, the risk control system may be divided into a data layer 910, a service layer 920 and an application layer 930, which will be described in detail below.

First, data layer 910

Big data technology is used to store the underlying data, which can be stored in MySQL 911 or HBase 912, so that user information can be accessed conveniently and efficiently.

Second, service layer 920

(1) The rule engine 921 uses rule engine technology to construct a rule engine service, and implements the risk control method flow based on the rule engine.

(2) The model engine 922 constructs a user risk scoring model by using artificial intelligence technology such as machine learning, and implements the risk control method flow based on machine learning.

(3) The user portrait 923 covers the user portrait and data processing flow: the user portrait is constructed and updated in real time by using big data technology, and includes user access conditions, mutual-aid joining conditions, recharge conditions, access device information, IP geographic information and the like.

Third, the application layer 930

(1) A rule engine interface 931 is provided externally so that a business party can conveniently call the rule engine to judge the user's risk; if a specific rule is triggered, the data corresponding to that rule is returned, and prompt information indicating the rule that was hit is shown to the auditor.

(2) A model engine interface 932 is provided externally so that a business party can conveniently call the artificial intelligence user risk scoring model; if the score of the input user is lower than a threshold value, a special prompt is output to the auditor when the case is audited.

The embodiment of the invention constructs an intelligent risk control system based on big data technology and machine learning technology, realizes automatic rule definition and automatic feature weight learning by structured and characterized processing of data, and greatly simplifies the traditional processing flow based on a rule engine; the purpose of traditional risk control based on the rule engine can be realized only by a small amount of professional maintenance, the efficiency of mutual aid performance auditing is accelerated, the probability of default claim payment is reduced, and the healthy and long-term development of mutual aid insurance is further guaranteed.

Based on the same inventive concept, embodiments of the present invention further provide a computer storage medium storing computer program code, which, when run on a computing device, causes the computing device to perform the steps of the risk control method according to any of the above embodiments.

Based on the same inventive concept, an embodiment of the present invention further provides a computing device, including: a processor; a memory storing computer program code; the computer program code, when executed by a processor, causes a computing device to perform the steps of the risk control method of any of the embodiments described above.

In one embodiment, the risk control system provided by the embodiments of the present invention may be implemented in the form of a computer program that is executable on a computing device. The memory of the computing device may store various program modules that make up the risk control system, such as the text processing module, prediction module, and risk control module shown in fig. 7 or 8. The computer program constituted by the respective program modules causes the processor to perform the steps of the risk control method as described in any of the embodiments above.

In the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts in the embodiments are referred to each other. For the system embodiment, since it basically corresponds to the method embodiment, the description is relatively simple, and for the relevant points, refer to the partial description of the method embodiment.

The method and system of the present invention may be implemented in a number of ways. For example, the methods and systems of the present invention may be implemented in software, hardware, firmware, or any combination of software, hardware, and firmware. The above order of the steps for the method is for illustrative purposes only, and the steps of the method of the present invention are not limited to the order specifically described above unless specifically stated otherwise. Furthermore, in some embodiments, the present invention may also be embodied as a program recorded in a recording medium, the program including machine-readable instructions for implementing a method according to the present invention. Thus, the present invention also covers a recording medium storing a program for executing the method according to the present invention.

The description of the present invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to practitioners skilled in this art. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.

Claims

1. A risk control method, comprising:

2. The method of claim 1, wherein the user risk scoring goal model is constructed by:

analyzing the collected user portrait data to obtain a user sample data set;

constructing an initial model of user risk score;

3. The method of claim 2, wherein the user representation data includes one or more of user basic attribute information, user behavior information, user business information, user application information, and user social information.

4. The method of claim 2, wherein analyzing the collected user representation data to obtain a user sample data set comprises:

5. The method of claim 4, wherein analyzing the collected user representation data to determine a user risk score value comprises:

6. The method of claim 4, wherein determining the feature value of each of the one or more user features according to the feature information of the one or more user features comprises:

7. The method of claim 4, wherein training the initial user risk score model using the training set data to obtain a trained user risk score model comprises:

8. The method according to claim 4, wherein the evaluating the trained user risk scoring model by using the test set data, and determining whether to optimize and adjust the trained user risk scoring model according to an evaluation result to obtain a user risk scoring target model comprises:

9. The method according to any one of claims 1-8, wherein the receiving the user's entry information comprises:

providing an interface for reporting cases for a user;

10. The method according to any one of claims 1 to 8, wherein predicting the risk corresponding to the report information after text processing by using the constructed user risk scoring target model comprises:

inputting the obtained characteristic values of one or more user characteristics into the constructed user risk scoring target model, and calculating a user risk scoring value corresponding to the report information after text processing;

11. The method of claim 10, wherein predicting the risk corresponding to the text-processed application information according to the user risk score value corresponding to the text-processed application information comprises:

12. The method of claim 11, wherein performing a corresponding risk control operation based on the prediction comprises:

13. The method of claim 11, wherein performing a corresponding risk control operation based on the prediction comprises:

14. The method of claim 13, wherein combining the predicted outcome and the determined outcome to perform a corresponding risk control operation comprises:

15. A risk control system, comprising:

16. The system of claim 15, further comprising a construction module adapted to:

analyzing the collected user portrait data to obtain a user sample data set;

constructing an initial model of user risk score;

17. The system of claim 16, further comprising a data collection module adapted to:

collecting user portrait data;

18. The system of claim 16, wherein the build module is further adapted to:

19. The system of claim 18, wherein the build module is further adapted to:

20. The system of claim 18, wherein the build module is further adapted to:

21. The system of claim 18, wherein the build module is further adapted to:

22. The system of claim 18, wherein the build module is further adapted to:

23. The system according to any of claims 15-22, wherein the text processing module is further adapted to:

providing an interface for reporting cases for a user;

24. The system according to any one of claims 15-22, wherein the prediction module is further adapted to:

25. The system of claim 24, wherein the prediction module is further adapted to:

26. The system of claim 25, wherein the risk control module is further adapted to:

27. The system of claim 25, wherein the risk control module is further adapted to:

28. The system of claim 27, wherein the risk control module is further adapted to:

29. A computer readable storage medium storing computer program code which, when run on a computing device, causes the computing device to perform the risk control method of any of claims 1-14.

30. A computing device, comprising: a processor; a memory storing computer program code; the computer program code, when executed by the processor, causes the computing device to perform the risk control method of any of claims 1-14.