CN113379530A - User risk determination method and device and server

Info

Publication number: CN113379530A
Application number: CN202110643176.9A
Authority: CN
Inventors: 单升起; 吴垠; 蔡海嘉; 池纪锋
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2021-06-09
Filing date: 2021-06-09
Publication date: 2021-09-10

Abstract

The specification provides a user risk determination method, a user risk determination device and a server. Before the method is applied, preset clustering processing is performed on a plurality of data tables containing the full business data, yielding a plurality of sample data sets of a plurality of sample users in which business data are clustered together by business relevance. The sample data sets are then used to train a high-accuracy preset user risk prediction model with a double-layer structure comprising a first-layer model and a second-layer model, where the first-layer model comprises a plurality of sub-models respectively corresponding to a plurality of sub-service scenarios. In application, after the business data of a target user is obtained, the preset user risk prediction model is called to process that data comprehensively, so that the risk level of the target user can be determined comprehensively and accurately, errors in user risk determination are effectively reduced, and risk prediction accuracy is improved.

Description

User risk determination method and device and server

Technical Field

The specification belongs to the technical field of artificial intelligence, and particularly relates to a user risk determination method, a user risk determination device and a server.

Background

When a business handling organization handles business related to user risks, such as credit business, credit card business and the like, for a user, the handling organization often needs to evaluate the risk condition of the user and then determine whether to handle corresponding business for the user based on the risk condition of the user.

However, existing user risk determination methods often produce large errors when predicting user risk, and cannot determine the risk condition of a user comprehensively and accurately.

In view of the above problems, no effective solution has been proposed.

Disclosure of Invention

The specification provides a user risk determination method, a user risk determination device and a server, which can comprehensively and accurately determine the comprehensive risk level of a target user, effectively reduce errors in user risk determination and improve risk prediction accuracy.

An embodiment of the present specification provides a method for determining a user risk, including:

acquiring service data of a target user;

calling a preset user risk prediction model to process the business data of the target user to obtain a corresponding target processing result; the preset user risk prediction model comprises a first layer model and a second layer model, wherein the first layer model comprises a plurality of sub-models, and the sub-models respectively correspond to one sub-service scene;

and determining the risk level of the target user according to the target processing result.

In some embodiments, the method further comprises:

acquiring a plurality of data tables of a service system; wherein each of the plurality of data tables respectively contains a plurality of service data of a sample user;

performing preset clustering processing on the data tables to obtain a plurality of sample data sets of sample users; each sample data set in the plurality of sample data sets respectively comprises service data which corresponds to one sub-service scene and has service correlation;

and training an initial model by using a plurality of sample data sets of the sample user to obtain the preset user risk prediction model.

In some embodiments, performing preset clustering processing on the plurality of data tables to obtain a plurality of sample data sets of a sample user includes:

based on a K-means clustering algorithm, clustering the plurality of data tables according to the identity of the sample user to obtain a plurality of aggregation tables; the aggregation table comprises an identity corresponding to a sample user and service data of a sub-service scene;

and constructing a plurality of sample data sets of the sample user according to the aggregation tables.

In some embodiments, the initial model is constructed as follows:

constructing a plurality of initial sub-models aiming at a plurality of sub-business scenes based on a random forest algorithm; combining the plurality of initial sub-models to obtain an initial first-layer model;

constructing an initial second-layer model based on a decision tree algorithm;

and connecting a plurality of initial sub-models in the initial first-layer model with the initial second-layer model to obtain the initial model.

In some embodiments, after performing a preset clustering process on the plurality of data tables to obtain a plurality of sample data sets of a sample user, the method further includes:

and respectively carrying out data cleaning treatment on a plurality of sample data sets of the sample user according to a preset check rule so as to filter invalid data.

In some embodiments, the invalid data comprises at least one of: service data whose generation time exceeds a preset time threshold, service data whose data format does not meet a preset requirement, and service data whose data value is empty.
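The three check rules can be illustrated with a minimal sketch. The record layout (`generated_at` as an ISO-8601 string, `value` as the data value) and the three-year default threshold are assumptions made for illustration; the specification does not fix either.

```python
from datetime import datetime, timedelta

# Hypothetical record layout: {"generated_at": "<ISO-8601 string>", "value": <any>}
def is_valid(record, now, max_age):
    """Return True if the record passes all three preset check rules."""
    value = record.get("value")
    if value is None or value == "":
        return False  # empty data value
    try:
        generated_at = datetime.fromisoformat(record["generated_at"])
    except (KeyError, TypeError, ValueError):
        return False  # data format does not meet the preset requirement
    if now - generated_at > max_age:
        return False  # generated earlier than the preset time threshold allows
    return True

def clean(sample_data_set, now, max_age=timedelta(days=3 * 365)):
    """Filter invalid records out of one sample data set."""
    return [record for record in sample_data_set if is_valid(record, now, max_age)]
```

Calling `clean` once per sample data set keeps the rules in one place, so a changed threshold or format requirement only touches `is_valid`.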

In some embodiments, the business data comprises banking business data; correspondingly, the sub-service scenarios include: a deposit and loan business scenario, a credit card business scenario and a financial management business scenario.

In some embodiments, training an initial model using a plurality of sample data sets of the sample user to obtain the preset user risk prediction model includes:

respectively training corresponding initial sub-models in the initial first-layer model by using a plurality of sample data sets of a sample user to obtain a first-layer model meeting requirements;

calling the first layer model to process a plurality of sample data sets of a sample user to obtain a plurality of intermediate processing results;

and training the initial second-layer model by using the plurality of intermediate processing results to obtain a second-layer model meeting the requirements.

In some embodiments, while training the initial second-tier model using the plurality of intermediate processing results, the method further comprises:

connecting to a credit reporting system to obtain credit risk parameters of the sample user;

and correcting the second-layer model by using the credit risk parameters of the sample user.

The embodiment of the present specification further provides a preset user risk prediction model training method, including:

training an initial model by utilizing a plurality of sample data sets of the sample user to obtain the preset user risk prediction model; the initial model comprises an initial first-layer model and an initial second-layer model, the initial first-layer model comprises a plurality of initial sub-models, and the initial sub-models respectively correspond to one sub-service scene.

An embodiment of the present specification further provides a device for determining a user risk, including:

the acquisition module is used for acquiring the service data of the target user;

the calling module is used for calling a preset user risk prediction model to process the service data of the target user to obtain a corresponding target processing result; the preset user risk prediction model comprises a first layer model and a second layer model, wherein the first layer model comprises a plurality of sub-models, and the sub-models respectively correspond to one sub-service scene;

and the determining module is used for determining the risk level of the target user according to the target processing result.

Embodiments of the present specification also provide a server, including a processor and a memory for storing processor-executable instructions, where the processor executes the instructions to implement the following: acquiring service data of a target user; calling a preset user risk prediction model to process the business data of the target user to obtain a corresponding target processing result; the preset user risk prediction model comprises a first layer model and a second layer model, wherein the first layer model comprises a plurality of sub-models, and the sub-models respectively correspond to one sub-service scene; and determining the risk level of the target user according to the target processing result.

Embodiments of the present specification also provide a computer-readable storage medium having stored thereon computer instructions that, when executed, implement: acquiring service data of a target user; calling a preset user risk prediction model to process the business data of the target user to obtain a corresponding target processing result; the preset user risk prediction model comprises a first layer model and a second layer model, wherein the first layer model comprises a plurality of sub-models, and the sub-models respectively correspond to one sub-service scene; and determining the risk level of the target user according to the target processing result.

Before specific implementation, preset clustering processing is performed on a plurality of data tables containing the full service data to obtain a plurality of sample data sets of a plurality of sample users, clustered together by service relevance; these sample data sets are used to train a high-accuracy preset user risk prediction model with a double-layer structure comprising a first-layer model and a second-layer model, where the first-layer model comprises a plurality of sub-models respectively corresponding to a plurality of sub-service scenarios; during specific implementation, after the business data of the target user is obtained, the preset user risk prediction model is called to process that data comprehensively, so that the risk level of the target user can be determined comprehensively and accurately, errors in user risk determination are effectively reduced, and risk prediction accuracy is improved.

Drawings

In order to more clearly illustrate the embodiments of the present specification, the drawings needed to be used in the embodiments will be briefly described below, and the drawings in the following description are only some of the embodiments described in the present specification, and it is obvious to those skilled in the art that other drawings can be obtained according to the drawings without any creative effort.

FIG. 1 is a schematic diagram of an embodiment of a system structure composition to which a method for determining a user risk provided by an embodiment of the present specification is applied;

FIG. 2 is a diagram illustrating an embodiment of a method for determining user risk provided by an embodiment of the present specification, in an example scenario;

FIG. 3 is a flow chart illustrating a method for determining user risk provided by an embodiment of the present description;

FIG. 4 is a schematic flow chart diagram illustrating a method for training a predictive model of risk of a user according to an embodiment of the present disclosure;

FIG. 5 is a schematic diagram of a server according to an embodiment of the present disclosure;

FIG. 6 is a schematic structural component diagram of a user risk determination device provided in an embodiment of the present specification;

FIG. 7 is a diagram illustrating an embodiment of a method for determining user risk provided by an embodiment of the present specification, in one example scenario;

FIG. 8 is a diagram illustrating an embodiment of a method for determining user risk provided by an embodiment of the present specification, in one example scenario;

FIG. 9 is a diagram illustrating an embodiment of a method for determining user risk provided by an embodiment of the present specification, in one example scenario;

FIG. 10 is a schematic diagram of an embodiment of a method for determining user risk provided by an embodiment of the present specification, in an example scenario.

Detailed Description

In order to make those skilled in the art better understand the technical solutions in the present specification, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only a part of the embodiments of the present specification, and not all of the embodiments. All other embodiments obtained by a person skilled in the art based on the embodiments in the present specification without any inventive step should fall within the scope of protection of the present specification.

The embodiment of the specification provides a method for determining user risk, which can be particularly applied to a system comprising a server and a terminal device. Specifically, as shown in fig. 1, the terminal device and the server may be connected in a wired or wireless manner to perform specific data interaction.

In this embodiment, the server may specifically include a background server deployed on the data processing system side of the data center of bank A and capable of functions such as data transmission and data processing. Specifically, the server may be, for example, an electronic device with data computation, storage, and network interaction functions. Alternatively, the server may be a software program that runs in such an electronic device and supports data processing, storage, and network interaction. In this embodiment, the number of servers is not specifically limited: the server may be a single server, several servers, or a server cluster formed by several servers.

In this embodiment, the terminal device may specifically include a front-end electronic device deployed at a counter of bank A and capable of functions such as data acquisition and data transmission. Specifically, the terminal device may be, for example, a desktop computer, a tablet computer, a notebook computer, or a smartphone. Alternatively, the terminal device may be a software application that runs in such an electronic device, for example an app running on a desktop computer.

In this scenario, a user B wants to apply for the credit loan service of bank A at the service hall of bank A. A staff member C responsible for handling the business may first use the terminal device to interact with the server to determine the risk level of user B, and then determine, according to that risk level, whether and how to handle the credit loan business for user B.

Specifically, staff member C may use the terminal device to generate a service data query request carrying an identity of user B (for example, the name, account name, or identification number of user B), and send the request through the terminal device to the database of the data center of bank A to query and obtain the service data of user B. The service data obtained may specifically include a plurality of service data items corresponding to the identity of user B, extracted from a plurality of data tables that the service systems of bank A store in the database, for example, user B's current financial management transactions, user B's historical credit card repayment records, user B's house loan repayment records, and so on.

Further, staff member C may use the terminal device to generate a user risk prediction request carrying the service data of user B, and send the request to the server of the data center of bank A.

Correspondingly, the server receives the user risk prediction request and analyzes and extracts the service data of the user B from the user risk prediction request.
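As a rough illustration of this parsing step, the sketch below assumes the request arrives as a JSON body. The field names `user_id` and `service_data` are hypothetical; the specification does not define a wire format.

```python
import json
from typing import Dict, List, Tuple

def parse_risk_request(raw: str) -> Tuple[str, List[Dict]]:
    """Extract the user identity and service data from a JSON request body."""
    request = json.loads(raw)
    user_id = request["user_id"]            # hypothetical field name
    service_data = request["service_data"]  # hypothetical field name
    if not isinstance(service_data, list):
        raise ValueError("service_data must be a list of records")
    return user_id, service_data
```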

Then, the server may use the service data of user B as the model input, feed it into the preset user risk prediction model, and run the model to process the service data of user B.

Specifically, as shown in FIG. 2, the service data of user B is first grouped according to business relevance, and the groups are respectively input into the corresponding sub-models of the first-layer model in the preset user risk prediction model. Each sub-model corresponds to a specific sub-service scenario. For example, in FIG. 2, sub-model No. 1 corresponds to the credit and loan business scenario, sub-model No. 2 to the credit card business scenario, sub-model No. 3 to the financial management business scenario, and sub-model No. 4 to the bank VIP service scenario.

Then, the sub-models are controlled to process their respective groups of input service data and output risk probability values for the sub-service scenarios as intermediate processing results; the first-layer model outputs these intermediate processing results, which are input into the second-layer model.

Furthermore, the second layer model can be controlled to calculate a corresponding target processing result by integrating a plurality of intermediate processing results, and the target processing result is used as the final model output of the preset user risk prediction model.

The server can determine a more comprehensive and accurate risk level aiming at the target user according to the target processing result; and transmitting the risk level to the terminal device.

The terminal device may present a risk level for user B to staff C.

Staff member C may compare the risk level of user B with a preset first level threshold. When the risk level of user B is high, for example higher than the preset first level threshold, staff member C may determine that handling the credit loan business for the user carries a relatively high risk, and may refuse to handle it so that bank A avoids taking on bad assets.

When the risk level of user B is low, for example lower than the preset first level threshold, staff member C may determine that the risk of handling the credit loan business for the user is low, and may handle the requested loan business for the user.

Further, staff member C may also compare the risk level of user B with a preset second level threshold. When the risk level of user B is higher than the preset second level threshold, staff member C may decide to handle a small-amount credit loan for the user.

When the risk level of user B is lower than the preset second level threshold, staff member C may decide to handle a large-amount credit loan for the user.

Through the embodiment, various service data of the user can be integrated by utilizing the preset user risk prediction model, so that the risk level of the user can be determined comprehensively and accurately, and further, corresponding services can be handled for the user finely according to the risk level of the user.

Referring to fig. 3, an embodiment of the present disclosure provides a method for determining a user risk, where the method is specifically applied to a server side. In specific implementation, the method may include the following:

S301: acquiring service data of a target user;

S302: calling a preset user risk prediction model to process the business data of the target user to obtain a corresponding target processing result; the preset user risk prediction model comprises a first layer model and a second layer model, wherein the first layer model comprises a plurality of sub-models, and the sub-models respectively correspond to one sub-service scene;

S303: and determining the risk level of the target user according to the target processing result.

Through the embodiment, the business data of the target user can be comprehensively processed by utilizing the pre-trained preset user risk prediction model with the double-layer structure, so that the risk grade of the target user can be determined comprehensively and accurately, the error in user risk determination is effectively reduced, and the risk prediction precision is improved.

In some embodiments, the target user may be specifically understood as a user object whose business risk is to be determined. According to specific situations and processing requirements, the business risk may be transaction risk, credit risk, health risk and other different types of risks.

In some embodiments, the business data of the target user may be specifically one or more parameter data related to the target user, which can reflect the risk condition of the target user in a direct or indirect manner. The service data may include various different types of data corresponding to different application scenarios.

Specifically, taking a banking business handling scenario as an example, the business data includes banking business data. Further, the service data may specifically include: record data of the historical transacted financing service of the target user, historical repayment records of the credit card of the target user, house loan repayment records of the target user and the like. Of course, it should be noted that the above listed service data is only an exemplary illustration. In specific implementation, the service data may also include other types of data according to specific application scenarios and processing requirements. The present specification is not limited to these.

Through the embodiment, the method for determining the user risk provided by the embodiment of the specification can be effectively applied to process banking business data of the target user so as to accurately predict the risk level of the target user in a banking business handling scene.

In some embodiments, in specific implementation, data matched with the identity of the target user may be extracted from a plurality of data tables stored in the database by querying the database according to the identity of the target user, and the extracted data is used as the service data of the target user.
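The lookup can be sketched as follows, assuming each data table is available in memory as a list of row dictionaries keyed by a hypothetical `user_id` column. A production system would issue database queries instead; the in-memory form is used here only to keep the example self-contained.

```python
def extract_service_data(tables, user_id):
    """Collect, per data table, every row whose identity matches user_id."""
    matched = {}
    for table_name, rows in tables.items():
        hits = [row for row in rows if row.get("user_id") == user_id]
        if hits:
            matched[table_name] = hits  # tables without a match are omitted
    return matched
```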

In some embodiments, the preset user risk prediction model may be specifically understood as a classification model that is trained in advance and can comprehensively and accurately determine the risk level of the user by comprehensively processing various different types of service data of the user.

In some embodiments, the predetermined user risk prediction model may be a classification model with a two-layer structure including a first-layer model and a second-layer model. The first layer model may specifically include a plurality of sub-models. Each sub-model corresponds to a specific sub-service scene. Each sub-model is used for predicting a risk probability value under the corresponding sub-business scene as an intermediate processing result based on the corresponding business data of the user. The second layer model is used for integrating a plurality of intermediate processing results, and predicting the overall risk level of the user as a final target processing result. How the preset user risk prediction model is specifically trained and established will be described in detail later.

Specifically, in a banking transaction scenario, the multiple sub-service scenarios may include: a loan deposit business scenario, a credit card business scenario, a financial management business scenario, and so on. Of course, it should be noted that the above listed sub-service scenarios are only schematic illustrations. In specific implementation, the sub-service scenarios may further include other types of service scenarios according to specific application scenarios and processing requirements. The present specification is not limited to these.

In some embodiments, in specific implementation, the business data of the target user can be used as the model input and fed into the preset user risk prediction model; the model is then run to process the business data of the target user and output a corresponding target processing result.

During specific operation, the service data of the target user is first divided into a plurality of groups based on service correlation, where each group corresponds to one sub-service scenario. The groups are then respectively input into the corresponding sub-models in the first-layer model for processing, and each sub-model outputs the risk probability value under its sub-service scenario as an intermediate processing result. Next, the intermediate processing results output by the sub-models are collected by the first-layer model and input into the second-layer model for processing. Finally, the second-layer model integrates the plurality of intermediate processing results and outputs a risk level for the overall situation of the target user as the target processing result. Correspondingly, the risk level of the target user can be determined comprehensively and at a fine granularity according to the target processing result.
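The data flow just described can be sketched as below. The sub-models and the second layer are stand-in callables chosen for illustration, not the trained models of the specification, and the `scenario`/`value` record fields are assumed.

```python
from typing import Callable, Dict, List

def group_by_scenario(records: List[dict]) -> Dict[str, List[float]]:
    """Group service data by sub-service scenario (assumes a 'scenario' tag on each record)."""
    grouped: Dict[str, List[float]] = {}
    for record in records:
        grouped.setdefault(record["scenario"], []).append(record["value"])
    return grouped

def predict_risk_level(grouped: Dict[str, List[float]],
                       sub_models: Dict[str, Callable[[List[float]], float]],
                       second_layer: Callable[[List[float]], int]) -> int:
    # First layer: each sub-model outputs a risk probability for its own scenario.
    scenarios = sorted(sub_models)  # fixed order keeps the second-layer input stable
    intermediate = [sub_models[s](grouped.get(s, [])) for s in scenarios]
    # Second layer: integrate the intermediate results into one overall risk level.
    return second_layer(intermediate)

# Stand-in models for illustration only.
demo_sub_models = {
    "credit_card": lambda values: min(1.0, sum(values) / 10.0),
    "deposit_loan": lambda values: min(1.0, sum(values) / 5.0),
    "wealth_management": lambda values: 0.1,
}

def demo_second_layer(probabilities: List[float]) -> int:
    """Grade the worst scenario into levels 1 (low) to 3 (high)."""
    worst = max(probabilities)
    return 1 + int(worst >= 0.3) + int(worst >= 0.6)
```

Sorting the scenario names is one simple way to guarantee that the second layer always receives the probabilities in the same order at training and prediction time.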

In some embodiments, after the risk level of the target user is determined, the method may further include the following: marking the target user with a corresponding risk label according to the risk level of the target user; and/or determining, according to the risk level of the target user, whether to respond to the target user's application for a target service and handle that service for the target user, and, if so, handling the target service for the target user in a matching manner.

The target service may specifically include at least one of the following: financial transactions, VIP service transactions, credit card transactions, and the like.

In some embodiments, in specific implementation, the risk level of the target user may be compared with a preset first level threshold. When the risk level of the target user is greater than or equal to the preset first level threshold, the overall risk of the target user is determined to be high, and it may be decided to refuse to handle the target service for the target user. When the risk level of the target user is smaller than the preset first level threshold, the overall risk of the target user is determined to be low, and it may be decided to handle the target service for the target user.

In some embodiments, after it is decided to handle the target service for the target user, the method may further include the following: determining whether the risk level of the target user is smaller than a preset second level threshold, where the preset second level threshold is smaller than the preset first level threshold.

When the risk level of the target user is smaller than the preset second level threshold, the credibility of the target user can be judged to be high, and a target service with a relatively high permission level may be handled for the target user, for example a credit service with a higher credit limit.

Conversely, when the risk level of the target user is greater than or equal to the preset second level threshold, the credibility of the target user can be judged to be insufficient, and a target service with a relatively low permission level may be handled for the target user, for example a credit service with a lower credit limit.
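The two-threshold rule can be written as one small function. The returned labels are hypothetical names for the three outcomes; the comparisons follow the text (greater than or equal to the first threshold means refusal, smaller than the second threshold means the higher permission level).

```python
def decide(risk_level, first_threshold, second_threshold):
    """Map a risk level to a handling decision using the two preset thresholds."""
    if second_threshold >= first_threshold:
        raise ValueError("the second threshold must be smaller than the first")
    if risk_level >= first_threshold:
        return "reject"                 # overall risk too high
    if risk_level < second_threshold:
        return "approve_high_limit"     # high credibility
    return "approve_low_limit"          # accepted, but with a lower permission level
```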

In some embodiments, after the target service is handled for the target user, the method may further include the following: acquiring, at every preset interval (for example, every other week), the service data of the target user within the preset time period; calling the preset user risk prediction model to process that service data, so as to track and analyze changes in the risk level of the target user in real time according to the target processing result corresponding to each preset time period; and adjusting, in time, the target service handled for the target user according to the changes in the risk level of the target user.

Specifically, for example, when tracking analysis shows that the risk level of the target user keeps rising, the permission level of the target user for the target service may be gradually withdrawn and restricted until the target service of the target user is suspended, so that risks can be identified and avoided in time and the risk losses of the service handling organization are reduced.
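A minimal sketch of this gradual tightening, assuming one risk level per preset time period and a hypothetical four-step permission ladder. It only ever lowers the tier, since the text describes withdrawal but not restoration.

```python
PERMISSION_TIERS = ["suspended", "low", "medium", "high"]  # hypothetical ladder

def adjust_permission(current_tier, previous_level, new_level):
    """Lower the permission tier by one step when the risk level has risen."""
    index = PERMISSION_TIERS.index(current_tier)
    if new_level > previous_level and index > 0:
        index -= 1
    return PERMISSION_TIERS[index]

def track(initial_tier, risk_levels):
    """Replay a sequence of periodic risk levels and return the tier after each period."""
    tier = initial_tier
    history = [tier]
    for previous, new in zip(risk_levels, risk_levels[1:]):
        tier = adjust_permission(tier, previous, new)
        history.append(tier)
    return history
```

For example, `track("high", [1, 2, 3, 3, 4])` steps the tier down on each of the three rises and leaves it unchanged in the period where the level stays at 3.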

In some embodiments, the method, when implemented, may further include:

S1: acquiring a plurality of data tables of a service system; wherein each of the plurality of data tables respectively contains a plurality of service data of a sample user;

S2: performing preset clustering processing on the data tables to obtain a plurality of sample data sets of sample users; each sample data set in the plurality of sample data sets respectively comprises service data which corresponds to one sub-service scene and has service correlation;

S3: and training an initial model by using a plurality of sample data sets of the sample user to obtain the preset user risk prediction model.

Through this embodiment, preset clustering processing can first be performed on the plurality of data tables to cluster out a plurality of feature-rich sample data sets of the sample users whose data have service correlation; the initial model can then be trained comprehensively with these sample data sets, yielding a preset user risk prediction model with higher accuracy and better performance.

In some embodiments, a single data table obtained directly from the service system may contain service data for several different sub-service scenarios, and service data for the same sub-service scenario may be spread across different data tables. Clustering brings together the highly correlated service data of the same sample user for the same sub-service scenario, yielding a sample data set of that sample user that is relatively rich in data for that sub-service scenario.

In some embodiments, the preset clustering process is performed on the multiple data tables to obtain multiple sample data sets of the sample user, and the specific implementation may include the following contents: based on a K-means clustering algorithm, clustering the plurality of data tables according to the identity of the sample user to obtain a plurality of aggregation tables; the aggregation table comprises an identity corresponding to a sample user and service data of a sub-service scene; and constructing a plurality of sample data sets of the sample user according to the aggregation tables.

Through the embodiment, the preset clustering processing can be efficiently completed by utilizing the K-means clustering algorithm, and a plurality of sample data sets which are rich in characteristics and more suitable for the sample users for model training are obtained.

In some embodiments, the aggregation table may specifically correspond to an identity of a sample user and to a specific sub-service scenario. Specifically, one aggregation table may include one or more service data of the corresponding sample user with respect to a certain sub-service scenario, where the service data has service correlation.

In some embodiments, when clustering is performed on a plurality of data tables according to the identity of a sample user specifically based on a K-means clustering algorithm, the plurality of data tables may be retrieved to find out a plurality of service data corresponding to the identity of the same sample user according to the identity of the sample user; calculating field values of a plurality of service data according to the semantic processing rule; further, the value of the field of the plurality of service data can be clustered by using a K-means clustering algorithm; and according to the clustering result, combining a plurality of clustered service data into an aggregation table corresponding to the identity of the sample user.
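The grouping described above can be sketched in a few lines. This is a minimal illustration only: it assumes each service record has already been reduced to one numeric field value by the semantic processing rule (which the specification does not detail), and the names `group_by_user`, `kmeans_1d`, `build_aggregation_tables` and the record layout are hypothetical.

```python
from collections import defaultdict


def group_by_user(records):
    """Collect service records from all data tables by user identity."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["user_id"]].append(rec)
    return dict(grouped)


def kmeans_1d(values, k, iterations=20):
    """Plain K-means (Lloyd's algorithm) on scalars; returns one cluster index per value."""
    # Deterministic initialisation: spread the initial centroids over the sorted values.
    ordered = sorted(values)
    centroids = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = sum(members) / len(members)
    return labels


def build_aggregation_tables(records, k):
    """Return {(user_id, cluster): [records]} as stand-ins for aggregation tables."""
    tables = defaultdict(list)
    for user_id, recs in group_by_user(records).items():
        n_clusters = min(k, len(recs))  # never ask for more clusters than records
        labels = kmeans_1d([r["field_value"] for r in recs], n_clusters)
        for rec, lab in zip(recs, labels):
            tables[(user_id, lab)].append(rec)
    return dict(tables)
```

Each key of the returned dictionary pairs one user identity with one cluster, which plays the role of an aggregation table for one sub-service scenario of that user.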

In some embodiments, the initial model may be specifically constructed in the following manner:

s1: constructing a plurality of initial sub-models aiming at a plurality of sub-business scenes based on a random forest algorithm; combining the plurality of initial sub-models to obtain an initial first-layer model;

s2: constructing an initial second-layer model based on a decision tree algorithm;

s3: and connecting a plurality of initial sub-models in the initial first-layer model with the initial second-layer model to obtain the initial model.

By the embodiment, an initial model that works well and has a double-layer structure, comprising the initial first-layer model and the initial second-layer model, can be constructed.
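The double-layer structure can be pictured as a small container type. The sketch below shows only the wiring between the layers, not the random forest or decision tree themselves; the class `TwoLayerModel`, the scene names and the placeholder rules are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# A sub-model maps one sub-service scene's feature vector to a risk parameter.
SubModel = Callable[[Sequence[float]], float]
# The second layer maps the ordered risk parameters to an overall risk level.
SecondLayer = Callable[[Sequence[float]], str]


@dataclass
class TwoLayerModel:
    scene_order: List[str]           # fixed order in which sub-model outputs are combined
    sub_models: Dict[str, SubModel]  # first layer: one sub-model per sub-service scene
    second_layer: SecondLayer        # second layer: combines the intermediate results

    def predict(self, data_sets: Dict[str, Sequence[float]]) -> str:
        intermediate = [self.sub_models[s](data_sets[s]) for s in self.scene_order]
        return self.second_layer(intermediate)


# Placeholder rules standing in for trained models (assumptions, not from the source).
model = TwoLayerModel(
    scene_order=["loan_saving", "credit_card"],
    sub_models={
        "loan_saving": lambda x: 1.0 if x[0] > 0 else 0.0,    # x[0]: overdue amount
        "credit_card": lambda x: 1.0 if x[0] > 500 else 0.0,  # x[0]: overdraft amount
    },
    second_layer=lambda r: "high" if sum(r) >= 2 else ("low" if sum(r) == 1 else "none"),
)
```

Keeping the sub-models in a dictionary keyed by scene makes it straightforward to train or replace one sub-service scene without touching the others.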

In some embodiments, after performing preset clustering processing on the plurality of data tables to obtain a plurality of sample data sets of a sample user, when the method is implemented, the method may further include the following steps: and respectively carrying out data cleaning treatment on a plurality of sample data sets of the sample user according to a preset check rule so as to filter invalid data.

The invalid data may be specifically understood as data that has a small influence on training of a preset user risk prediction model or is prone to introduce data errors.

Through the embodiment, after the plurality of sample data sets of the sample user are obtained and before the model is specifically trained by the plurality of sample data sets of the sample user, data cleaning processing can be firstly carried out on the plurality of sample data sets, invalid data are filtered, model errors caused by the invalid data are avoided in the subsequent model training process, or overfitting of training is avoided, and therefore the preset user risk prediction model with relatively higher precision can be trained.

In some embodiments, in specific implementation, the corresponding invalid data may be identified and determined according to a preset check rule. The preset check rule may be obtained in advance by analyzing and summarizing historical service data in the corresponding application scenario.

In some embodiments, the invalid data may include at least one of the following: service data whose generation time is older than a preset time threshold (for example, service data from ten years ago), service data whose format does not meet the preset requirements (for example, wrongly formatted service data), service data whose value is empty, and so on. Of course, the invalid data listed above are only illustrative examples. In specific implementation, depending on the application scenario and processing requirements, the invalid data may also include other types of data, for example, service data that contradicts existing service data or service data that overlaps with existing service data.

Through the implementation, various invalid data can be identified and filtered from the sample data set according to the preset check rule, so that the sample data set with high precision and good effect can be obtained.
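A check rule of this kind can be sketched as a simple filter. The record fields (`user_id`, `amount`, `created`), the ten-year default and the function names are assumptions made for the example; an actual preset check rule would be derived from historical service data as described above.

```python
from datetime import date, timedelta


def is_valid(record, today, max_age_days=3650, required=("user_id", "amount", "created")):
    """Apply three illustrative check rules: empty values, wrong format, stale data."""
    # Rule 1: reject records with empty values.
    for name in required:
        if record.get(name) in (None, ""):
            return False
    # Rule 2: reject records whose format does not meet the expected types.
    if not isinstance(record["amount"], (int, float)):
        return False
    if not isinstance(record["created"], date):
        return False
    # Rule 3: reject records older than the preset time threshold.
    if today - record["created"] > timedelta(days=max_age_days):
        return False
    return True


def clean(sample_set, today):
    """Return only the records of a sample data set that pass every check rule."""
    return [rec for rec in sample_set if is_valid(rec, today)]
```

Passing `today` explicitly, rather than reading the system clock inside the function, keeps the filter reproducible when the same sample data set is cleaned again later.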

In some embodiments, the training of the initial model by using a plurality of sample data sets of the sample user to obtain the preset user risk prediction model may include the following steps:

s1: respectively training corresponding initial sub-models in the initial first-layer model by using a plurality of sample data sets of a sample user to obtain a first-layer model meeting requirements;

s2: calling the first layer model to process a plurality of sample data sets of a sample user to obtain a plurality of intermediate processing results;

s3: and training the initial second-layer model by using the plurality of intermediate processing results to obtain a second-layer model meeting the requirements.

Through the embodiment, the first-layer model and the second-layer model which meet the requirements can be obtained through training in sequence, and then the preset user risk prediction model which meets the requirements can be obtained.
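The three training steps can be sketched as follows. To stay self-contained, the example uses a toy nearest-centroid learner in place of both the random forest sub-models and the decision tree second layer; that learner, the function names and the integer label encoding are assumptions for illustration, not the learners named in this specification.

```python
def fit_nearest_centroid(X, y):
    """Toy stand-in learner: predict the label whose mean feature vector is closest."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    centroids = {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

    def predict(x):
        return min(centroids,
                   key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))

    return predict


def train_two_layer(sample_sets, sub_labels, risk_labels, fit_sub_model, fit_second_layer):
    """sample_sets / sub_labels: {scene: one entry per sample user}; risk_labels: one per user."""
    scenes = sorted(sample_sets)
    # S1: train one sub-model per sub-service scene on that scene's sample data set.
    sub_models = {s: fit_sub_model(sample_sets[s], sub_labels[s]) for s in scenes}
    # S2: run the trained first layer to get the intermediate results for every sample user.
    intermediate = [[sub_models[s](sample_sets[s][i]) for s in scenes]
                    for i in range(len(risk_labels))]
    # S3: train the second layer on the intermediate results and the overall risk labels.
    second_layer = fit_second_layer(intermediate, risk_labels)
    return scenes, sub_models, second_layer
```

Because the second layer only ever sees the outputs of the first layer, the two training callables can be swapped for real learners without changing `train_two_layer`.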

In some embodiments, when the first-layer model is trained, taking the training of a current initial sub-model among the plurality of initial sub-models in the first-layer model as an example, the overall risk label of each sample user and the sub-risk label of that sample user's sample data set for the specific sub-service scenario may first be marked, to obtain a plurality of labeled sample data sets of the sample user; the current initial sub-model is then trained continuously with the labeled sample data set that corresponds to it, so as to determine a plurality of CART trees, where each CART tree may correspond to one type of feature extraction and processing structure in the sub-service scenario; and the plurality of CART trees are combined to obtain the corresponding random forest model as the current sub-model.

Through the embodiment, the plurality of initial submodels can be subjected to model training respectively by using a plurality of sample data sets of a sample user to obtain a plurality of corresponding submodels, so that the training of the first-layer model is completed.
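The idea of one CART tree per feature class, combined by voting, can be sketched as below. For brevity each tree is limited to a single Gini-based split (a depth-one CART), and the bootstrap sampling and random feature selection of a full random forest are omitted; the function names and the `feature_classes` layout are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def fit_stump(X, y, columns):
    """Depth-one CART: pick the (column, threshold) with the lowest weighted Gini impurity."""
    best = None
    for col in columns:
        for threshold in sorted({row[col] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[col] <= threshold]
            right = [lab for row, lab in zip(X, y) if row[col] > threshold]
            if not left or not right:
                continue  # a split must send samples to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, col, threshold,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # no usable split: always predict the majority label
        majority = Counter(y).most_common(1)[0][0]
        return lambda row: majority
    _, col, threshold, left_label, right_label = best
    return lambda row: left_label if row[col] <= threshold else right_label


def fit_forest(X, y, feature_classes):
    """Train one tree per feature class ({name: [column indices]}) and vote."""
    trees = [fit_stump(X, y, cols) for cols in feature_classes.values()]

    def predict(row):
        return Counter(tree(row) for tree in trees).most_common(1)[0][0]

    return predict
```

With three feature classes, as in a credit card sub-model built from credit, card issuing and post-credit features, the vote is decided by at least two of the three trees.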

In some embodiments, when the second layer model is specifically trained, the trained first layer model may be called first to process a plurality of sample data sets of the sample user, and the risk parameters under a plurality of sub-service scenarios are obtained as a plurality of intermediate processing results. Combining the intermediate processing results with the risk labels of the sample users to obtain a plurality of combined sample data sets; wherein, the combined sample data group contains a plurality of intermediate results corresponding to a sample user; dividing the combined sample data groups into a training set and a test set; and then, the training set and the test set can be utilized to train and test the initial second-layer model to obtain the second-layer model meeting the requirements.

In some embodiments, for a banking transaction scenario, when the initial second-layer model is trained by using the plurality of intermediate processing results, the method may further include the following steps: associating the credit investigation system to obtain credit investigation risk parameters of the sample user; and correcting the second layer model by utilizing credit investigation risk parameters of the sample user.

Through the embodiment, in the process of training the second-layer model, the second-layer model can be subjected to targeted optimization and correction by introducing and utilizing the credit investigation system, so that the second-layer model with higher precision and better effect can be obtained.

In some embodiments, modifying the second layer model by using the credit investigation risk parameter of the sample user may include: calling the second layer model to determine the target risk level of the sample user; comparing the target risk level of the sample user with the credit investigation risk parameter to obtain a corresponding comparison result; and correcting the second layer model according to the comparison result.

Specifically, when the comparison result shows that the difference between the target risk level and the credit investigation risk parameter is larger than a preset difference value, the sample user can be marked as a forward training sample; the forward training sample is then used for targeted training of the second layer model, so that the training of the second layer model moves toward the correct result, thereby optimizing and correcting the second layer model.

In some embodiments, when the method is implemented while training the initial second-layer model using the plurality of intermediate processing results, the method may further include: associating the business system to obtain a business risk label of the sample user; and correcting the second layer model by using the business risk label of the sample user.

In some embodiments, modifying the second layer model by using the business risk label of the sample user may include: calling the second layer model to determine the target risk level of the sample user; comparing the target risk level of the sample user with the business risk label to obtain a corresponding comparison result; and correcting the second layer model according to the comparison result.

Specifically, when the comparison result shows that the difference between the target risk level and the business risk label is larger than a preset difference value, the sample user can be marked as a forward training sample; the forward training sample is then used for targeted training of the second layer model, so that the training of the second layer model moves toward the correct result, thereby optimizing and correcting the second layer model.

In some embodiments, in specific implementation, the two optimization manners above may also be used together to correct and optimize the second layer model, so as to obtain a second layer model with relatively higher accuracy.
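The selection of forward training samples can be sketched as a comparison loop. The example assumes that risk levels, credit investigation risk parameters and business risk labels are all encoded as integers on one common scale, and that the reference value is used as the corrected label of the forward sample; the specification fixes neither point, and the function name is hypothetical.

```python
def select_forward_samples(samples, predict_level, reference_levels, max_diff):
    """Return (features, reference level) pairs where the model is off by more than max_diff."""
    forward = []
    for features, reference in zip(samples, reference_levels):
        predicted = predict_level(features)
        # A gap above the preset difference value marks the user as a forward training sample.
        if abs(predicted - reference) > max_diff:
            forward.append((features, reference))
    return forward
```

The same function can be called once with credit investigation risk parameters and once with business risk labels as `reference_levels`, and the two resulting lists concatenated before the second layer model is retrained.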

In some embodiments, the invoking of the preset user risk prediction model to process the service data of the target user may include the following steps: performing preset clustering processing on the service data of the target user to obtain a plurality of data sets of the target user; inputting the multiple data sets of the target user into a preset user risk prediction model according to a preset input rule; and operating the preset user risk prediction model to obtain a corresponding risk grade as a target processing result.

In some embodiments, the data sets may be combined according to a preset order according to a preset input rule, and then input into a preset user risk prediction model. The preset sequence may be an arrangement sequence of a plurality of sub-models in the first layer model. The service data of the target user may specifically be full service data of the target user.

Through the embodiment, the total business data of the target user can be synthesized by using the preset user risk prediction model, so that the risk level of the target user based on the integral dimensionality can be determined comprehensively and accurately.

In some embodiments, under the condition that only the risk level of a target user in a certain target sub-service scene or a plurality of target sub-service scenes needs to be predicted, after preset clustering processing is performed on the service data of the target user to obtain a plurality of data sets of the target user, a data set corresponding to the target sub-service scene can be screened from the plurality of data sets and recorded as a target data set;

inputting the target data set into a preset user risk prediction model according to a preset input rule (for example, other data sets except the target data set may be set to be empty, and then the data sets are combined according to a preset sequence and then input into the preset user risk prediction model); and operating the preset user risk prediction model to obtain a risk grade aiming at the target sub-service scene as a target processing result.
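One way to read the preset input rule is as an ordering step with empty placeholders. In the sketch below the function name, the scene names and the use of `None` for an empty data set are assumptions.

```python
def assemble_input(data_sets, scene_order, target_scenes=None):
    """Combine per-scene data sets in the preset order of the first-layer sub-models.

    If target_scenes is given, every other scene is set to empty (None) so that
    only the target sub-service scenes contribute to the prediction.
    """
    ordered = []
    for scene in scene_order:
        if target_scenes is not None and scene not in target_scenes:
            ordered.append(None)
        else:
            ordered.append(data_sets.get(scene))
    return ordered
```

A model that accepts such input has to tolerate empty positions, for example by skipping the corresponding sub-model.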

In some embodiments, after the target service is transacted for the target user, the method may further include the following when implemented: acquiring, at intervals of a preset time period, the service data of the target user within that preset time period; calling the preset user risk prediction model to process the service data of the target user within the preset time period, so as to track and analyze changes in the risk level of the target user in real time; and adjusting the execution of the target service of the target user according to the changes in the risk level of the target user.

As can be seen from the above, before specific implementation, the method for determining user risk provided in the embodiments of the present specification may perform preset clustering processing on a plurality of data tables including full service data, to obtain a plurality of sample data sets of a plurality of sample users clustered together based on service correlation; training by using a plurality of sample data sets of a plurality of sample users to obtain a preset user risk prediction model which has high accuracy and simultaneously comprises a first layer model and a second layer model double-layer structure; the first layer model also comprises a plurality of sub-models respectively corresponding to a plurality of sub-service scenes; during specific implementation, after the business data of the target user is obtained, the business data of the target user can be comprehensively processed by calling the preset user risk prediction model, so that the risk level of the target user can be determined comprehensively and accurately, errors in determining the user risk can be effectively reduced, and the risk prediction precision is improved.

Referring to fig. 4, an embodiment of the present disclosure further provides a method for training a preset user risk prediction model. When the method is implemented, the following contents can be included:

s401: acquiring a plurality of data tables of a service system; wherein each of the plurality of data tables respectively contains a plurality of service data of a sample user;

s402: performing preset clustering processing on the data tables to obtain a plurality of sample data sets of sample users; each sample data set in the plurality of sample data sets respectively comprises service data which corresponds to one sub-service scene and has service correlation;

s403: training an initial model by utilizing a plurality of sample data sets of the sample user to obtain the preset user risk prediction model; the initial model comprises an initial first-layer model and an initial second-layer model, the initial first-layer model comprises a plurality of initial sub-models, and the initial sub-models respectively correspond to one sub-service scene.

In some embodiments, the plurality of service data of the sample user included in the plurality of data tables may constitute the total service data of the sample user.

As can be seen from the above, based on the preset user risk prediction model training method provided in the embodiments of the present specification, the full service data of the sample user can be fully utilized, and a comprehensive and accurate preset user risk prediction model with a wide application range is obtained through training.

Embodiments of the present specification further provide a server, including a processor and a memory for storing processor-executable instructions, where the processor, when implemented, may perform the following steps according to the instructions: acquiring service data of a target user; calling a preset user risk prediction model to process the business data of the target user to obtain a corresponding target processing result; the preset user risk prediction model comprises a first layer model and a second layer model, wherein the first layer model comprises a plurality of sub-models, and the sub-models respectively correspond to one sub-service scene; and determining the risk level of the target user according to the target processing result.

In order to more accurately complete the above instructions, referring to fig. 5, another specific server is provided in the embodiments of the present specification, wherein the server includes a network communication port 501, a processor 502 and a memory 503, and the above structures are connected by an internal cable, so that the structures can perform specific data interaction.

The network communication port 501 may be specifically configured to obtain service data of a target user.

The processor 502 may be specifically configured to invoke a preset user risk prediction model to process the service data of the target user, so as to obtain a corresponding target processing result; the preset user risk prediction model comprises a first layer model and a second layer model, wherein the first layer model comprises a plurality of sub-models, and the sub-models respectively correspond to one sub-service scene; and determining the risk level of the target user according to the target processing result.

The memory 503 may be specifically configured to store a corresponding instruction program.

In this embodiment, the network communication port 501 may be a virtual port that is bound to different communication protocols, so that different data can be sent or received. For example, the network communication port may be a port responsible for web data communication, a port responsible for FTP data communication, or a port responsible for mail data communication. In addition, the network communication port can also be a physical communication interface or communication chip. For example, it may be a wireless mobile network communication chip, such as GSM or CDMA; it may also be a Wi-Fi chip; it may also be a Bluetooth chip.

In this embodiment, the processor 502 may be implemented in any suitable manner. For example, the processor may take the form of, for example, a microprocessor or processor and a computer-readable medium that stores computer-readable program code (e.g., software or firmware) executable by the (micro) processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, an embedded microcontroller, and so forth. The description is not intended to be limiting.

In this embodiment, the memory 503 may include multiple levels. In a digital system, anything capable of storing binary data can serve as a memory; in an integrated circuit, a circuit that has a storage function but no physical form is also called a memory, such as a RAM or a FIFO; in a system, a storage device in physical form is also called a memory, such as a memory module or a TF card.

The present specification further provides a computer-readable storage medium based on the above-mentioned user risk determination method, where the computer-readable storage medium stores computer program instructions, and when the computer program instructions are executed, the computer program instructions implement: acquiring service data of a target user; calling a preset user risk prediction model to process the business data of the target user to obtain a corresponding target processing result; the preset user risk prediction model comprises a first layer model and a second layer model, wherein the first layer model comprises a plurality of sub-models, and the sub-models respectively correspond to one sub-service scene; and determining the risk level of the target user according to the target processing result.

In this embodiment, the storage medium includes, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Cache (Cache), a Hard Disk Drive (HDD), or a Memory Card (Memory Card). The memory may be used to store computer program instructions. The network communication unit may be an interface for performing network connection communication, which is set in accordance with a standard prescribed by a communication protocol.

In this embodiment, the functions and effects specifically realized by the program instructions stored in the computer-readable storage medium can be explained in comparison with other embodiments, and are not described herein again.

Referring to fig. 6, in a software level, an embodiment of the present specification further provides an apparatus for determining a user risk, where the apparatus may specifically include the following structural modules:

the obtaining module 601 may be specifically configured to obtain service data of a target user;

the invoking module 602 may be specifically configured to invoke a preset user risk prediction model to process the service data of the target user, so as to obtain a corresponding target processing result; the preset user risk prediction model comprises a first layer model and a second layer model, wherein the first layer model comprises a plurality of sub-models, and the sub-models respectively correspond to one sub-service scene;

the determining module 603 may be specifically configured to determine a risk level of the target user according to the target processing result.

It should be noted that, the units, devices, modules, etc. illustrated in the above embodiments may be implemented by a computer chip or an entity, or implemented by a product with certain functions. For convenience of description, the above devices are described as being divided into various modules by functions, and are described separately. It is to be understood that, in implementing the present specification, functions of each module may be implemented in one or more pieces of software and/or hardware, or a module that implements the same function may be implemented by a combination of a plurality of sub-modules or sub-units, or the like. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

As can be seen from the above, the device for determining a user risk provided in the embodiment of the present specification comprehensively processes the service data of the target user by invoking the preset user risk prediction model with higher precision and wider application range, and can determine the risk level of the target user relatively comprehensively and accurately, thereby effectively reducing an error when determining the user risk and improving the risk prediction precision.

In a specific scenario example, the user risk determination method provided in this specification may be applied as follows. First, based on the full business data of the monitoring system, the data is clustered according to relevance to form a plurality of data modules (e.g., a plurality of sub-business scenarios) with strong business relevance, such as a loan data module, a customer information module, a financial management module, and a credit card module. A machine learning model (e.g., a sub-model in the first-layer model) can be trained for each data module, so that an initial analysis can be performed on a client (e.g., a target user) from a multidimensional business perspective; each initial analysis result (e.g., an intermediate processing result) is then used as an input of a client comprehensive analysis model (e.g., the second-layer model), which finally decides the risk level of the client. Meanwhile, the credit investigation system and the service system can be linked: data are corrected according to the results of comparison with the credit investigation system and the service system to form forward training samples, which are input into the model to optimize the model algorithm, so that real-time, comprehensive and accurate risk analysis of bank customers can be realized.

The specific implementation process can be executed by referring to the following contents.

Based on the user risk determination method provided by the embodiments of this specification, a corresponding client risk analysis system based on a supervision system can first be constructed, as shown in fig. 7. The system may specifically comprise the following structure: a data preparation device 1, a risk level model device 2 and a model adjusting device 3.

When the system is implemented, the data preparation device 1 can perform clustering, cleaning and feature extraction on the data (for example, a plurality of data tables) collected by the monitoring system to complete data preparation. The prepared data is then input into the risk level model device 2, which performs the risk analysis on the client. Finally, the model adjusting device 3 is used to adjust the model algorithm.

Referring to fig. 8, the data preparation apparatus 1 may specifically include the following structure: the device comprises a clustering module, a cleaning module and a feature extraction module. The whole device realizes data clustering and data cleaning according to business relevance through a clustering algorithm and a check rule preset by a supervisory system. And then realizing feature extraction according to the input requirement of the risk level model device 2, wherein the core of the device is to perform grouping feature extraction on data according to business relevance.

When the data preparation apparatus 1 is implemented, a clustering algorithm (e.g., the K-means algorithm) may be used to group data with strong business correlation. The aim is to extract feature data with rich data dimensions. For example, the approval information and the issuing information of customer A's credit card may originally sit in different data tables; the clustering algorithm can aggregate the two parts of information, so that feature extraction can then yield feature data with rich dimensions for that business class. This safeguards the comprehensiveness of the data, improves the accuracy of subsequent model training, and gives the subsequent model algorithm a solid data foundation. The data preparation process may specifically include the following:

1) Acquiring the full-service aggregation table data of the supervision system, clustering the aggregated data according to service correlation through a clustering algorithm, and dividing the data into a plurality of large groups (corresponding to a plurality of sample data sets). The clustering can be based on field information in the aggregation tables such as the customer number, the accounting subject number and the customer account number. For example, for customer Zhang San listed below, the customer number "001" denotes an individual customer and the accounting subject number "010102" denotes a deposit subject, and the related data of the customer is listed in aggregation table A; Zhang San also appears in aggregation table B with the same account number "2347658389", where the subject "10102" denotes a financing service, indicating that the customer also has a financing service under the account number "2347658389". The two pieces of data in aggregation tables A and B are therefore related in service and can be clustered into the loan-saving data module. Similarly, the data of the monitoring system can be clustered into other large service classes such as a financial management module and a credit card module, so that the related information of each customer is obtained as comprehensively as possible and classified according to a set strategy. See Table 1 for details.

TABLE 1

Name	Customer number	Accounting subject number	Customer account number	Aggregation table	Business module
Zhang San	001	010102	2347658389	A	Loan-saving data module
Zhang San	001	010103	2347658389	B	Loan-saving data module
Li Si	003	010104	2347658391	C	Credit card module

2) And inputting the clustered data into a cleaning module, and removing junk data, error data and null data (for example, filtering invalid data) by combining with a check rule preset by a supervisory system, so as to ensure that the training data required by the model is complete and accurate as far as possible.

3) Inputting the cleaned data into a feature selection module, and performing feature selection according to the module category after clustering. For example, the loan-saving data module extracts information such as the customer number, deposit amount, monthly transaction amount, loan amount and overdue amount as input to the loan-saving data model; the credit card module extracts information such as the customer number, the credit limit approved by this bank, the credit limit approved by other banks, the overdraft limit and the card status as input to the credit card model.
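Steps 1) and 3) above can be sketched together as follows (the cleaning in step 2) is omitted). Note that the example uses simple key-based grouping on the customer number, customer account number and accounting subject number rather than K-means. The subject-to-module mapping is read off Table 1 for illustration only, and the feature field names and function names are hypothetical.

```python
from collections import defaultdict

# Illustrative mapping read off Table 1; real subject codes are system-specific.
SUBJECT_TO_MODULE = {
    "010102": "loan-saving",
    "010103": "loan-saving",
    "010104": "credit card",
}

# Hypothetical field names for the per-module features listed in step 3).
MODULE_FEATURES = {
    "loan-saving": ["deposit_amount", "monthly_transaction_amount",
                    "loan_amount", "overdue_amount"],
    "credit card": ["approved_limit", "overdraft_limit", "card_status"],
}


def cluster_rows(rows):
    """Merge rows from different aggregation tables that share customer, account and module."""
    modules = defaultdict(dict)
    for row in rows:
        module = SUBJECT_TO_MODULE[row["subject_no"]]
        key = (row["customer_no"], row["account_no"], module)
        modules[key].update(row["data"])  # combine the fields of related rows
    return dict(modules)


def select_features(module, merged):
    """Project a merged record onto the feature list of its business module."""
    return [merged.get(name) for name in MODULE_FEATURES[module]]
```

Features that are absent from the merged record come back as `None`, which leaves the decision on how to treat missing values to the cleaning module or to the model.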

Referring to fig. 9, a risk classification model apparatus 2 is shown that may specifically be composed of multiple sets of machine learning models. The risk level model device can be divided into a 2-layer structure, the first layer (for example, the first layer model) is formed by a plurality of sets of initial analysis models in parallel to form an initial analysis model group, each set of model (for example, a sub-model) corresponds to one business large-class module (for example, a sub-business scene), so that each model analyzes the risk level of a customer from different business dimensions, and the processing mode improves the stability and the analysis accuracy of the whole risk level model by dividing the responsibility of a machine learning model according to the business dimensions. And constructing an initial analysis model of the first layer by adopting a random forest algorithm.

Specifically, for example, a credit card service class corresponds to the credit card model in the first layer. The service broad class data is extracted according to 3 characteristic broad classes (credit class characteristic, card issuing class characteristic and post-credit class characteristic). Each feature type corresponds to one CART tree, and an analysis example of each CART tree can be shown in table 2, table 3, and table 4.

TABLE 2 example table of credit type characteristics

TABLE 3 example table of card issuing class characteristics

Name	Customer number	Overdraft amount	Card status	Industry invested in	Number of growth stages	Risk rating
Zhang San	001	50 yuan	Normal	General consumption	0	No risk
Li Si	002	100 yuan	Overdue	General consumption	1	Low risk

TABLE 4 exemplary table of characteristics of after-credit category

| Name | Customer number | Assets transferred | Transfer amount | Whether to check | Amount of money to be checked | Risk rating |
| --- | --- | --- | --- | --- | --- | --- |
| Zhang San | 001 | No | 0 | | | No risk |
| Li Si | 002 | No | 0 | Yes | 100 | Low risk |
| Wang Wu | 003 | Yes | 100 | Yes | 100 | High risk |

The other initial analysis models work in the same way as the credit card model: each selects suitable feature data by feature class and completes the preliminary risk analysis of its own business class.
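One way to picture a first-layer model is as a vote over its per-feature-class trees, in the manner a random forest combines its trees. The sketch below is an assumption-laden illustration: the two "trees" are hand-written stand-in rules that merely agree with the example rows of Table 3 and Table 4 (they are not trained CART trees), no tree is given for the credit class features because Table 2 is not reproduced in the text, and breaking ties toward the higher risk level is a design assumption.

```python
from collections import Counter

# Risk levels ordered from lowest to highest (names taken from the tables).
RISK_LEVELS = ["no risk", "low risk", "high risk"]


def card_issuing_tree(row):
    # Stand-in rule consistent with Table 3: an overdue card is low risk.
    return "low risk" if row["card_status"] == "overdue" else "no risk"


def post_credit_tree(row):
    # Stand-in rule consistent with Table 4.
    if row["asset_transferred"] and row["checked"]:
        return "high risk"
    if row["asset_transferred"] or row["checked"]:
        return "low risk"
    return "no risk"


def credit_card_model(row, trees=(card_issuing_tree, post_credit_tree)):
    """Majority vote over the trees; ties go to the higher level (assumption)."""
    votes = Counter(tree(row) for tree in trees)
    top = max(votes.values())
    tied = [level for level, count in votes.items() if count == top]
    return max(tied, key=RISK_LEVELS.index)


# Field names are hypothetical; values follow the Zhang San and Li Si rows.
zhang_san = {"card_status": "normal", "asset_transferred": False, "checked": False}
li_si = {"card_status": "overdue", "asset_transferred": False, "checked": True}
zhang_san_level = credit_card_model(zhang_san)
li_si_level = credit_card_model(li_si)
```

Each other business module would supply its own set of trees to the same voting function, which keeps the first-layer models independent of one another.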

The second layer (for example, the second layer model) is a comprehensive risk analysis model implemented with a decision tree algorithm. The outputs of the first layer's initial analysis models serve as the input data of the second layer, which makes the final decision on the customer's comprehensive risk level. The comprehensive risk analysis model can be linked with a credit investigation system and a business system: by comparing its results with the customer credit data in the credit investigation system and with the correction data from the business system, forward training samples are formed and input into the decision tree model, thereby tuning the algorithm.
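The second layer can be sketched as a small decision tree trained on the first-layer outputs. The text does not say which decision tree variant is used, so the code below assumes a simple tree with multiway splits on categorical inputs chosen by Gini impurity. The module names, the training rows and the helpers `gini`, `build_tree` and `predict` are all hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels, features):
    """Grow a tree: a leaf is a label, a node is (feature, branches, default)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority

    def weighted_gini(feature):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[feature], []).append(label)
        return sum(len(g) / len(labels) * gini(g) for g in groups.values())

    best = min(features, key=weighted_gini)
    rest = [f for f in features if f != best]
    branches = {}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], rest)
    return (best, branches, majority)


def predict(tree, row):
    """Walk the tree; unseen values fall back to the node's majority label."""
    while isinstance(tree, tuple):
        feature, branches, default = tree
        tree = branches.get(row[feature], default)
    return tree


# Hypothetical first-layer outputs per module and the overall level to learn.
MODULES = ["deposit_loan", "credit_card", "wealth_mgmt"]
first_layer_outputs = [
    {"deposit_loan": "no risk", "credit_card": "no risk", "wealth_mgmt": "no risk"},
    {"deposit_loan": "no risk", "credit_card": "low risk", "wealth_mgmt": "no risk"},
    {"deposit_loan": "low risk", "credit_card": "low risk", "wealth_mgmt": "no risk"},
    {"deposit_loan": "high risk", "credit_card": "low risk", "wealth_mgmt": "no risk"},
    {"deposit_loan": "high risk", "credit_card": "high risk", "wealth_mgmt": "low risk"},
]
overall_levels = ["no risk", "low risk", "low risk", "high risk", "high risk"]
second_layer = build_tree(first_layer_outputs, overall_levels, MODULES)
```

Because the second layer sees only the risk levels emitted by the first layer, a sub-model can be retrained or replaced without changing the input format of the comprehensive model.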

A specific optimization process can be seen in fig. 10. The model adjusting device 3 can realize algorithm self-optimization of the comprehensive risk analysis model in the following two ways.

1) Linking with the customer credit investigation system: the model analysis result is compared with the risk level that the credit investigation system holds for the same customer. If the two lie at opposite extremes of the risk scale (that is, the model analysis result is far from the credit investigation system result), the corresponding feature vector is matched to form a forward training sample, which is input into the comprehensive risk analysis model to optimize the decision algorithm.

2) Linking the customer risk analysis system with the business system: business personnel can query a customer's risk level in the business system and mark the customer if that level is far from the actual situation. The system then matches the corresponding feature vector to form a forward training sample, which is input into the comprehensive risk analysis model to optimize the decision algorithm.
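Both ways share one pattern: compare the model's level with a reference level, and turn large disagreements into new training samples. The sketch below illustrates that pattern under stated assumptions: the level names come from the example tables, the gap threshold `min_gap` is hypothetical, and using the reference level (from the credit investigation system or from staff marking) as the sample label is an assumption, since the text does not spell out how the sample is labeled.

```python
# Ordinal position of each risk level (names taken from the example tables).
RISK_ORDER = {"no risk": 0, "low risk": 1, "high risk": 2}


def collect_corrections(predictions, reference_levels, feature_vectors, min_gap=2):
    """Return (feature_vector, reference_level) pairs for large disagreements.

    min_gap is a hypothetical threshold: with three levels, a gap of 2 means
    the model and the reference sit at opposite extremes of the scale.
    """
    samples = []
    for customer, predicted in predictions.items():
        reference = reference_levels.get(customer)
        if reference is None:
            continue  # no reference level available for this customer
        if abs(RISK_ORDER[predicted] - RISK_ORDER[reference]) >= min_gap:
            samples.append((feature_vectors[customer], reference))
    return samples


# Made-up example: customer 001 is rated far too low by the model.
predictions = {"001": "no risk", "002": "high risk", "003": "low risk"}
reference_levels = {"001": "high risk", "002": "high risk"}
feature_vectors = {
    "001": {"deposit_loan": "no risk", "credit_card": "no risk"},
    "002": {"deposit_loan": "high risk", "credit_card": "low risk"},
    "003": {"deposit_loan": "low risk", "credit_card": "no risk"},
}
new_samples = collect_corrections(predictions, reference_levels, feature_vectors)
```

The collected samples would then be appended to the training data of the comprehensive risk analysis model before it is retrained.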

As the scenario example shows, based on the full business data provided by the monitoring system, the system can cluster the full business data by relevance to form multiple data modules with strong business relevance, such as a deposit and loan data module, a customer information module, a financial management module and a credit card module. Each data module corresponds to one machine learning model, so that the customer can be initially analyzed from multiple business dimensions, and each initial analysis result is used as an input of the customer comprehensive analysis model, which makes the final decision on the customer's risk level. Specifically, the following advantages may be obtained:

1. A clustering algorithm groups the full business data precisely by business relevance and enriches the dimensions of the feature data, which prepares the data needed to improve the analysis precision of the model;

2. Risk analysis is performed jointly by multiple sets of machine learning models, so that each model analyzes the customer's risk level from a different business angle, which improves the stability and analysis accuracy of the models;

3. Model parameters are dynamically adjusted through linkage with the credit investigation system and the business system, which tunes the algorithm and yields a model with higher precision.

Although the present specification provides method steps as described in the examples or flowcharts, more or fewer steps may be included based on conventional or non-inventive means. The order of steps recited in the embodiments is merely one of many possible orders of execution and does not represent the only order of execution. When an actual apparatus or client product executes, the steps may be performed sequentially or in parallel (e.g., in a parallel processor or multithreaded processing environment, or even in a distributed data processing environment) according to the embodiments or methods shown in the figures. The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the presence of additional identical or equivalent elements in a process, method, article, or apparatus that comprises the recited elements is not excluded. The terms first, second, etc. are used to denote names and do not denote any particular order.

Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer readable program code, the same functionality can be implemented by logically programming method steps such that the controller is in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller may therefore be considered as a hardware component, and the means included therein for performing the various functions may also be considered as a structure within the hardware component. Or even means for performing the functions may be regarded as being both a software module for performing the method and a structure within a hardware component.

This description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, classes, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer-readable storage media including memory storage devices.

From the above description of the embodiments, it is clear to those skilled in the art that the present specification can be implemented by software plus necessary general hardware platform. With this understanding, the technical solutions in the present specification may be essentially embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a mobile terminal, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments in the present specification.

The embodiments in the present specification are described in a progressive manner, and the same or similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. The description is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable electronic devices, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.

While the specification has been described with examples, those skilled in the art will appreciate that there are numerous variations and permutations of the specification that do not depart from the spirit of the specification, and it is intended that the appended claims include such variations and modifications that do not depart from the spirit of the specification.

Claims

1. A method for determining user risk, comprising:

acquiring service data of a target user;

2. The method of claim 1, further comprising:

3. The method of claim 2, wherein performing a predetermined clustering process on the plurality of data tables to obtain a plurality of sample data sets of a sample user comprises:

4. The method of claim 2, wherein the initial model is constructed as follows:

constructing an initial second-layer model based on a decision tree algorithm;

5. The method of claim 2, wherein after performing a predetermined clustering process on the plurality of data tables to obtain a plurality of sample data sets of sample users, the method further comprises:

6. The method of claim 5, wherein the invalid data comprises at least one of: business data whose generation time exceeds a preset time threshold, business data whose data format does not meet preset requirements, and business data whose data value is empty.

7. The method of claim 2, wherein the business data comprises banking data; correspondingly, the sub-service scenarios include: a deposit and loan business scenario, a credit card business scenario and a financial management business scenario.

8. The method of claim 7, wherein training an initial model to obtain the predetermined user risk prediction model using a plurality of sample data sets of the sample user comprises:

9. The method of claim 8, wherein while training the initial second-tier model using the plurality of intermediate processing results, the method further comprises:

10. A training method for a preset user risk prediction model, comprising:

11. An apparatus for determining a risk of a user, comprising:

12. A server comprising a processor and a memory for storing processor-executable instructions which, when executed by the processor, implement the steps of the method of any one of claims 1 to 9.

13. A computer-readable storage medium having stored thereon computer instructions which, when executed, implement the steps of the method of any one of claims 1 to 9.