CN117455234A

CN117455234A - Method, device, computer equipment and storage medium for determining risk classification model

Info

Publication number: CN117455234A
Application number: CN202311420545.3A
Authority: CN
Inventors: 汪凤君
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2023-10-30
Filing date: 2023-10-30
Publication date: 2024-01-26

Abstract

The present application relates to the field of big data technology, and in particular to a method, an apparatus, a computer device, and a storage medium for determining a risk classification model. The method comprises: acquiring new user data of a new product and historical user data of a plurality of historical products, and identifying new product domain information of the new product and product domain information of each historical product; calculating a divergence value between each piece of product domain information and the new product domain information, and screening auxiliary domain information whose divergence value is lower than a preset divergence threshold; assimilating the historical user data of the auxiliary domain information with the new user data to obtain a migration transformation matrix, and training each risk classification model to obtain a sub-target risk classification model corresponding to each piece of auxiliary domain information; and determining a weight value of each sub-target risk classification model based on the divergence value of each piece of auxiliary domain information to obtain a target prediction model of the new product. The method improves the time efficiency of constructing a risk classification model for a new product.

Description

Method, device, computer equipment and storage medium for determining risk classification model

Technical Field

The present disclosure relates to the field of big data technologies, and in particular, to a method and apparatus for determining a risk classification model, a computer device, and a storage medium.

Background

When a bank promotes inclusive-finance products to expand its coverage, risk management must be considered. As the number of inclusive financial products grows, the bank must apply corresponding risk control to these products in order to prevent high rejection rates and bad-debt rates. In general, a bank performs risk admission control on the inclusive products it offers, that is, it assigns a risk score to each client applying for such a product. Constructing a risk classification model of users is therefore a current research focus.

A traditional risk classification model is constructed from a large amount of data on existing users. When an inclusive product is first promoted, however, its applicable users are new users and the amount of new user data is small, so a risk classification model for the new product cannot be obtained directly, and the time efficiency of constructing such a model is low.

Disclosure of Invention

In view of the foregoing, it is desirable to provide a method, apparatus, computer device, computer-readable storage medium, and computer program product for determining a risk classification model.

In a first aspect, the present application provides a method for determining a risk classification model. The method comprises the following steps:

acquiring new user data of a new product, historical user data of a plurality of historical products and a risk classification model of each historical product, and identifying new product domain information of the new product and product domain information of each historical product;

calculating, based on the new user data of the new product and the historical user data of each historical product, a divergence value between the product domain information of each historical product and the new product domain information of the new product, and screening the product domain information whose divergence value is lower than a preset divergence threshold as auxiliary domain information of the new product;

assimilating the historical user data of the target historical product corresponding to each piece of auxiliary domain information with the new user data of the new product to obtain a migration transformation matrix for each target historical product and the new product, and training the risk classification model of each target historical product based on the migration transformation matrix to obtain a sub-target risk classification model corresponding to each piece of auxiliary domain information;

and determining a weight value of each sub-target risk classification model based on the divergence value between each piece of auxiliary domain information and the new product domain information of the new product, and determining a target prediction model of the new product based on each sub-target risk classification model and its weight value.

Optionally, identifying the new product domain information of the new product and the product domain information of each historical product includes:

identifying the product category of the new product and the product category of each historical product, and acquiring the correspondence between product categories and product domain information from a product database;

determining, based on the product category of the new product and the product category of each historical product, the new product domain information of the new product and the product domain information of each historical product through the correspondence between product categories and product domain information.

Optionally, the calculating, based on the new user data of the new product and the historical user data of each historical product, a divergence value between the product domain information of each historical product and the new product domain information of the new product includes:

identifying user distribution information corresponding to the new user data of the new product and user distribution information corresponding to the historical user data of each historical product;

and, for each historical product, calculating through a divergence algorithm the divergence value between the user distribution information corresponding to the new product and the user distribution information corresponding to the historical product, and taking this value as the divergence value between the product domain information of the historical product and the new product domain information of the new product.

Optionally, the assimilating the historical user data of the target historical product corresponding to each auxiliary field information and the new user data of the new product to obtain a migration transformation matrix of each target historical product and the new product includes:

acquiring risk classification information corresponding to historical user data of each target historical product, and respectively predicting first risk classification information of each new user data based on a risk classification model of each target historical product;

taking the new user data and the first risk classification information of the new user data as new data sets of the new products, and taking the historical user data of the target historical products and the risk classification information of the historical user data of the target historical products as historical data sets of the target historical products;

and performing data balance processing on the new data set and each historical data set through a data balance program to obtain balance vectors corresponding to each target historical product and the new product, and matrixing the balance vectors to obtain the migration transformation matrix corresponding to each target historical product and the new product.
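
The assembly of the new data set and the historical data sets described in the steps above can be sketched as follows. This is a minimal illustration rather than the applicant's implementation: the function name `build_data_sets`, the dictionary layout, and the representation of each risk classification model as a plain Python callable are all assumptions made for the example, and the data balance program itself is not shown.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Features = Sequence[float]
Model = Callable[[Features], int]           # hypothetical: maps one user to a risk level
DataSet = Tuple[List[Features], List[int]]  # (user data, risk classification information)


def build_data_sets(
    new_users: List[Features],
    history: Dict[str, DataSet],
    models: Dict[str, Model],
) -> Tuple[Dict[str, DataSet], Dict[str, DataSet]]:
    """Pair each target historical product with a pseudo-labelled new data set."""
    new_sets: Dict[str, DataSet] = {}
    hist_sets: Dict[str, DataSet] = {}
    for product, (users, labels) in history.items():
        # First risk classification information: the historical product's
        # model predicts a risk level for every new user.
        first_labels = [models[product](u) for u in new_users]
        new_sets[product] = (list(new_users), first_labels)
        hist_sets[product] = (list(users), list(labels))
    return new_sets, hist_sets
```

Each target historical product thus receives its own pair of data sets, which would then be passed to the data balance program.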

Optionally, training a risk classification model of each target historical product based on the migration transformation matrix to obtain a sub-target risk classification model corresponding to each auxiliary field information, where the training includes:

identifying each conversion user data of the migration transformation matrix and risk classification information of each conversion user data, and respectively training a risk classification model of each target historical product based on each conversion user data and the risk classification information of each conversion user data to obtain an initial sub-target risk classification model corresponding to each auxiliary field information;

based on the initial sub-target risk classification model, predicting new risk classification information of the new user data respectively, and calculating an average value of the new risk classification information of each new user data to obtain second risk classification information of each new user data;

identifying a deviation value between the first risk classification information and the second risk classification information of each new user data; when the deviation value is greater than a deviation threshold, replacing the first risk classification information of the new user data with its second risk classification information, and returning to the step of performing data balance processing on the new data set and each historical data set through the data balance program to obtain the balance vectors corresponding to each target historical product and the new product;

and, once no deviation value greater than the deviation threshold exists, taking the initial sub-target risk classification model corresponding to each piece of auxiliary domain information obtained in the last iteration as the sub-target risk classification model corresponding to that auxiliary domain information.
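
The iterative refinement in the steps above can be sketched as a loop. The sketch rests on several assumptions that are not in the source: risk classification information is treated as a numeric score so that an average and a deviation are defined, a single shared list of labels stands in for the per-product first risk classification information, the hypothetical callback `train_round` hides the data balancing, matrix construction and training, and `max_rounds` is a safeguard added only for the example.

```python
from typing import Callable, List, Sequence, Tuple

Features = Sequence[float]
Model = Callable[[Features], float]  # hypothetical: maps one user to a numeric risk score


def refine_labels(
    new_users: List[Features],
    first_labels: List[float],
    train_round: Callable[[List[float]], List[Model]],
    deviation_threshold: float,
    max_rounds: int = 20,  # safeguard added for the sketch, not part of the source
) -> Tuple[List[Model], List[float]]:
    labels = list(first_labels)
    models: List[Model] = []
    for _ in range(max_rounds):
        # Balance the data, build the transformation matrices and retrain
        # (all hidden behind the hypothetical train_round callback, which
        # is assumed to return at least one model).
        models = train_round(labels)
        # Second risk classification information: mean prediction per new user.
        second = [sum(m(u) for m in models) / len(models) for u in new_users]
        deviations = [abs(a - b) for a, b in zip(labels, second)]
        if all(d <= deviation_threshold for d in deviations):
            break  # no deviation exceeds the threshold: keep these models
        # Replace only the labels whose deviation is too large, then iterate.
        labels = [s if d > deviation_threshold else a
                  for a, s, d in zip(labels, second, deviations)]
    return models, labels
```

The models returned by the final round play the role of the sub-target risk classification models.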

Optionally, determining the weight value of each sub-target risk classification model based on the divergence value between each piece of auxiliary domain information and the new product domain information of the new product includes:

calculating the reciprocal of the divergence value between each piece of auxiliary domain information and the new product domain information of the new product to obtain the reciprocal value corresponding to each piece of auxiliary domain information;

for each piece of auxiliary domain information, dividing its reciprocal value by the sum of the reciprocal values of all pieces of auxiliary domain information to obtain its proportion, and taking that proportion as the weight value of the sub-target risk classification model corresponding to the auxiliary domain information.
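
The reciprocal-and-normalize rule above amounts to w_k = (1/d_k) / Σ_j (1/d_j), where d_k is the divergence value of the k-th auxiliary domain. A minimal sketch follows; the function name and the dictionary keyed by domain name are illustrative assumptions, and the check for non-positive divergence values is added for the example only, since the source does not say how a zero divergence is handled.

```python
from typing import Dict


def inverse_divergence_weights(divergences: Dict[str, float]) -> Dict[str, float]:
    """Turn per-domain divergence values into weights that sum to 1."""
    if any(d <= 0 for d in divergences.values()):
        # Assumption of this sketch: a reciprocal is only taken for positive values.
        raise ValueError("divergence values must be positive to take reciprocals")
    reciprocals = {name: 1.0 / d for name, d in divergences.items()}
    total = sum(reciprocals.values())
    return {name: r / total for name, r in reciprocals.items()}
```

A domain whose divergence from the new product is smaller therefore receives a larger weight.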

Optionally, the method further comprises:

acquiring target user data of the new product, and predicting, based on each sub-target risk classification model in the target prediction model, the sub-target risk classification information corresponding to the target user data;

and respectively carrying out weighted summation processing on the sub-target risk classification information corresponding to each sub-target risk classification model based on the weight value of each sub-target risk classification model to obtain target risk classification information corresponding to the target user data.
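
The weighted summation can be sketched as below. The sketch assumes that each sub-target risk classification model returns a numeric risk score for one user and that models and weights share the same keys; both the function name and these conventions are illustrative, not taken from the source.

```python
from typing import Callable, Dict, Sequence

Features = Sequence[float]
Model = Callable[[Features], float]  # hypothetical: maps one user to a numeric risk score


def predict_target_risk(
    user: Features,
    sub_models: Dict[str, Model],
    weights: Dict[str, float],
) -> float:
    """Combine the sub-model predictions for one user by weighted summation."""
    return sum(weights[name] * model(user) for name, model in sub_models.items())
```

If the sub-models output discrete risk levels instead of scores, the weighted sum would still have to be mapped back to a level, a step the source does not describe.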

In a second aspect, the application further provides a device for determining a risk classification model. The device comprises:

an acquisition module, configured to acquire new user data of a new product, historical user data of a plurality of historical products, and a risk classification model of each historical product, and to identify new product domain information of the new product and product domain information of each historical product;

a screening module, configured to calculate, based on the new user data of the new product and the historical user data of each historical product, a divergence value between the product domain information of each historical product and the new product domain information of the new product, and to screen the product domain information whose divergence value is lower than a preset divergence threshold as auxiliary domain information of the new product;

an assimilation module, configured to assimilate the historical user data of the target historical product corresponding to each piece of auxiliary domain information with the new user data of the new product to obtain a migration transformation matrix for each target historical product and the new product, and to train the risk classification model of each target historical product based on the migration transformation matrix to obtain a sub-target risk classification model corresponding to each piece of auxiliary domain information;

a determining module, configured to determine a weight value of each sub-target risk classification model based on the divergence value between each piece of auxiliary domain information and the new product domain information of the new product, and to determine a target prediction model of the new product based on each sub-target risk classification model and its weight value.

Optionally, the acquiring module is specifically configured to:

Optionally, the screening module is specifically configured to:

Optionally, the assimilation module is specifically configured to:

Optionally, the determining module is specifically configured to:

Optionally, the apparatus further includes:

a target acquisition module, configured to acquire target user data of the new product and to predict, based on each sub-target risk classification model in the target prediction model, the sub-target risk classification information corresponding to the target user data;

and a processing module, configured to perform weighted summation on the sub-target risk classification information corresponding to each sub-target risk classification model based on the weight value of each sub-target risk classification model, to obtain the target risk classification information corresponding to the target user data.

In a third aspect, the present application provides a computer device. The computer device comprises a memory and a processor; the memory stores a computer program, and the processor implements the steps of the method of any one of the first aspect when executing the computer program.

In a fourth aspect, the present application provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the method of any one of the first aspect.

In a fifth aspect, the present application provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of the method of any one of the first aspect.

According to the above method, apparatus, computer device, and storage medium for determining a risk classification model, new user data of a new product, historical user data of a plurality of historical products, and a risk classification model of each historical product are acquired, and new product domain information of the new product and product domain information of each historical product are identified; a divergence value between the product domain information of each historical product and the new product domain information of the new product is calculated based on the new user data of the new product and the historical user data of each historical product, and the product domain information whose divergence value is lower than a preset divergence threshold is screened as auxiliary domain information of the new product; the historical user data of the target historical product corresponding to each piece of auxiliary domain information is assimilated with the new user data of the new product to obtain a migration transformation matrix for each target historical product and the new product, and the risk classification model of each target historical product is trained based on the migration transformation matrix to obtain a sub-target risk classification model corresponding to each piece of auxiliary domain information; and a weight value of each sub-target risk classification model is determined based on the divergence value between each piece of auxiliary domain information and the new product domain information of the new product, and a target prediction model of the new product is determined based on each sub-target risk classification model and its weight value.
In this way, the product domain information of historical products with smaller divergence values is screened as auxiliary domain information of the new product, and the historical user data corresponding to each piece of auxiliary domain information is assimilated with the new user data, which weakens the differences in marginal distribution and conditional distribution between the two domains and yields a migration transformation matrix that makes a large amount of user data applicable to the new product. The risk classification model corresponding to each piece of auxiliary domain information is then trained with the migration transformation matrix to obtain the corresponding sub-target risk classification model, and the weight value of each sub-target risk classification model is determined from the divergence value of each piece of auxiliary domain information, thereby determining the target prediction model of the new product. Because the distribution difference between the new product and highly similar historical products is weakened before the risk classification models of the historical products are reused for the new product, the problem that a risk classification model cannot be obtained directly owing to insufficient training data for the new product is avoided, which improves the time efficiency of constructing the risk classification model of the new product.

Drawings

FIG. 1 is an application environment diagram of a method of determining a risk classification model in one embodiment;

FIG. 2 is a flow chart of a method for determining a risk classification model according to one embodiment;

FIG. 3 is a flow diagram of an example of determining a risk classification model in one embodiment;

FIG. 4 is a block diagram of an apparatus for determining a risk classification model in one embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.

The method for determining a risk classification model provided in the embodiments of the present application can be applied in the early stage of product promotion. The method may be applied to a terminal, to a server, or to a system comprising a terminal and a server, in which case it is implemented through interaction between the terminal and the server. The terminal may be, but is not limited to, a personal computer, a notebook computer, a smartphone, a tablet computer, or the like. The terminal screens the product domain information of historical products with smaller divergence values as auxiliary domain information of the new product, and assimilates the historical user data corresponding to each piece of auxiliary domain information with the new user data, which weakens the differences in marginal distribution and conditional distribution between the two domains and yields a migration transformation matrix that makes a large amount of user data applicable to the new product. Finally, the terminal trains the risk classification model corresponding to each piece of auxiliary domain information with the migration transformation matrix to obtain the corresponding sub-target risk classification model, and determines the weight value of each sub-target risk classification model from the divergence value of each piece of auxiliary domain information, thereby determining the target prediction model of the new product. Because the distribution difference between the new product and highly similar historical products is weakened before the risk classification models of the historical products are reused for the new product, the problem that a risk classification model cannot be obtained directly owing to insufficient training data for the new product is avoided, which improves the time efficiency of constructing the risk classification model of the new product.

In one embodiment, as shown in FIG. 1, a method for determining a risk classification model is provided. The method is described by taking its application to a terminal as an example, and includes the following steps:

Step S101, acquiring new user data of a new product, historical user data of a plurality of historical products, and a risk classification model of each historical product, and identifying new product domain information of the new product and product domain information of each historical product.

In this embodiment, the terminal obtains the existing user information in the product database of the new product as the new user data of the new product, and obtains the user information of each historical product from a historical database according to the category of each historical product, thereby obtaining the historical user data of each historical product. The terminal then queries a model database for the risk classification model corresponding to each historical product. The user information includes the user's resource holdings, resource flow information, asset and liability information, and user identity information. Where the user is an enterprise, the user information further includes subsidiary information, branch information, partner information, and affiliated company information. The terminal then identifies the product domain information of each product to obtain the new product domain information of the new product and the product domain information of each historical product. The product domain information is used to characterize the category to which a product belongs, and includes, but is not limited to, financial domain information, entertainment domain information, antiques domain information, fast-moving consumer goods domain information, life domain information, service domain information, and the like.

Step S102, calculating, based on the new user data of the new product and the historical user data of each historical product, a divergence value between the product domain information of each historical product and the new product domain information of the new product, and screening the product domain information whose divergence value is lower than a preset divergence threshold as the auxiliary domain information of the new product.

In this embodiment, for each historical product, the terminal calculates the divergence value between the product domain information of the historical product and the new product domain information of the new product based on the distribution information of the user data of the two products. The divergence value is calculated with the JS (Jensen-Shannon) divergence algorithm. The terminal then presets a divergence threshold and, from the divergence values of the historical products, screens the product domain information whose divergence value is lower than the preset divergence threshold as the auxiliary domain information of the new product.
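
The screening step can be sketched as a simple filter over already computed divergence values. The function name, the dictionary keyed by domain name, and the idea of returning the retained divergence values for later weighting are assumptions of this example; the source does not prescribe a data layout or a threshold value.

```python
from typing import Dict


def screen_auxiliary_domains(
    divergences: Dict[str, float],
    threshold: float,
) -> Dict[str, float]:
    """Keep only the domains whose divergence is strictly below the threshold."""
    return {domain: d for domain, d in divergences.items() if d < threshold}
```

Keeping the divergence value alongside each retained domain is convenient because the same values are needed again when the sub-model weights are computed.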

Step S103, assimilating the historical user data of the target historical products corresponding to the auxiliary field information and the new user data of the new products to obtain migration transformation matrixes of the target historical products and the new products, and training the risk classification model of each target historical product based on the migration transformation matrixes to obtain the sub-target risk classification model corresponding to the auxiliary field information.

In this embodiment, the terminal assimilates the historical user data of the target historical product corresponding to each piece of auxiliary domain information with the new user data of the new product to obtain a migration transformation matrix for each target historical product and the new product, and trains the risk classification model of each target historical product based on the migration transformation matrix to obtain the sub-target risk classification model corresponding to each piece of auxiliary domain information. The data assimilation is performed with the BDA dynamic domain adaptation method. Specifically, the terminal fuses the data of each auxiliary domain with the data of the target domain; the BDA dynamic domain adaptation method maps the auxiliary domain data and the target domain data into the same space and obtains the migration transformation matrix iteratively, which reduces the differences in marginal distribution and conditional distribution between the two domains. Finally, a model is constructed from the transformed auxiliary domain data and the existing labeled data, and the final model is used to predict the risk admission result for the target domain. The specific assimilation process is described in detail later. The risk classification model is used to predict the risk classification information of a user for a product. The risk classification information consists of risk levels, each of which represents the credibility of a user with respect to the product. For example, the risk classification information corresponding to a quick-loan product represents the credit rating that the quick-loan product assigns to a user, which helps avoid high rejection rates and bad-debt rates at a later stage.
Each level of the credit rating is preset by staff in the risk classification model corresponding to each historical product, and the classification standard of each level is obtained by training the risk classification model corresponding to each historical product on a large amount of user data.
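
To make the notion of reducing both the marginal (edge) and the conditional distribution difference concrete, the following sketch computes a balanced distance between an auxiliary domain and the target domain. It is heavily simplified and rests on assumptions: the source names the method only as "BDA", and reading it as balanced distribution adaptation is an assumption; the distance used here is the squared difference of feature means (a linear-kernel mean discrepancy); the balance factor `mu` and all function names are illustrative. The sketch only evaluates the quantity that such a method would try to make small and does not learn a migration transformation matrix.

```python
from typing import Hashable, List, Sequence

Vector = Sequence[float]


def mean_vector(rows: List[Vector]) -> List[float]:
    """Column-wise mean of a non-empty list of equally sized vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def balanced_distance(
    xs: List[Vector], ys: List[Hashable],   # auxiliary-domain users and labels
    xt: List[Vector], yt: List[Hashable],   # target-domain users and (pseudo) labels
    mu: float = 0.5,                        # hypothetical balance factor in [0, 1]
) -> float:
    # Marginal term: compare the overall feature means of the two domains.
    marginal = squared_distance(mean_vector(xs), mean_vector(xt))
    # Conditional term: compare the feature means class by class,
    # using only the risk levels present in both domains.
    conditional = 0.0
    for label in set(ys) & set(yt):
        src = [x for x, y in zip(xs, ys) if y == label]
        tgt = [x for x, y in zip(xt, yt) if y == label]
        conditional += squared_distance(mean_vector(src), mean_vector(tgt))
    return (1.0 - mu) * marginal + mu * conditional
```

With `mu` close to 0 the marginal difference dominates, and with `mu` close to 1 the per-class (conditional) difference dominates.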

Step S104, determining the weight value of each sub-target risk classification model based on the divergence value between the auxiliary field information and the product field information of the new product, and determining the target prediction model of the new product based on each sub-target risk classification model and the weight value of each sub-target risk classification model.

In this embodiment, the terminal determines the weight value of each sub-target risk classification model based on the divergence value between each auxiliary field information and the product field information of the new product, and determines the target prediction model of the new product based on each sub-target risk classification model and the weight value of each sub-target risk classification model.

Based on the above scheme, the product domain information of historical products with smaller divergence values is screened as auxiliary domain information of the new product, and the historical user data corresponding to each piece of auxiliary domain information is assimilated with the new user data, which weakens the differences in marginal distribution and conditional distribution between the two domains and yields a migration transformation matrix that makes a large amount of user data applicable to the new product. Finally, the risk classification model corresponding to each piece of auxiliary domain information is trained with the migration transformation matrix to obtain the corresponding sub-target risk classification model, and the weight value of each sub-target risk classification model is determined from the divergence value of each piece of auxiliary domain information, thereby determining the target prediction model of the new product. Because the distribution difference between the new product and highly similar historical products is weakened before the risk classification models of the historical products are reused for the new product, the problem that a risk classification model cannot be obtained directly owing to insufficient training data for the new product is avoided, which improves the time efficiency of constructing the risk classification model of the new product.

Optionally, identifying the new product domain information of the new product and the product domain information of each historical product includes: identifying the product category of the new product and the product category of each historical product, and acquiring the correspondence between product categories and product domain information from a product database; and determining, based on the product category of the new product and the product category of each historical product, the new product domain information of the new product and the product domain information of each historical product through the correspondence between product categories and product domain information.

In this embodiment, the terminal identifies the product category of the new product and the product category of each historical product, and obtains the correspondence between product categories and product domain information from the product database. Based on the product category of the new product, the terminal then determines the new product domain information of the new product through this correspondence. Likewise, based on the product category of each historical product, the terminal determines the product domain information of each historical product through the same correspondence. The product category characterizes the kind of product; for example, the categories corresponding to financial domain information include, but are not limited to, inauguration investment products, financial products, warranty products, and the like.

Based on the scheme, the product types of the products are identified, so that the product field information of the products is determined, and the accuracy of determining the product field information is improved.
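
The category-to-domain lookup can be sketched as a dictionary query. Everything named here is hypothetical: the source states only that a correspondence between product categories and product domain information is stored in a product database, so the in-memory dictionary, the function name, and the error raised for an unknown category are assumptions of the example.

```python
from typing import Dict


def identify_domains(
    product_categories: Dict[str, str],   # product name -> product category
    category_to_domain: Dict[str, str],   # correspondence loaded from the product database
) -> Dict[str, str]:
    """Resolve the product domain information of every product from its category."""
    domains: Dict[str, str] = {}
    for product, category in product_categories.items():
        if category not in category_to_domain:
            raise KeyError(f"no domain recorded for product category {category!r}")
        domains[product] = category_to_domain[category]
    return domains
```

The same call serves the new product and the historical products, since both are resolved through one correspondence.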

Optionally, calculating the divergence value between the product domain information of each historical product and the new product domain information of the new product based on the new user data of the new product and the historical user data of each historical product includes: identifying user distribution information corresponding to the new user data of the new product and user distribution information corresponding to the historical user data of each historical product; and, for each historical product, calculating through a divergence algorithm the divergence value between the user distribution information corresponding to the new product and the user distribution information corresponding to the historical product, and taking this value as the divergence value between the product domain information of the historical product and the new product domain information of the new product.

In this embodiment, the terminal identifies user distribution information corresponding to new user data of a new product and user distribution information corresponding to historical user data of each historical product. Wherein the user distribution information is the distribution information of each resource in the user information. The resource distribution information is used for representing the quantity distribution of resource information such as different resource possession, resource flow information, asset liability information and the like. The terminal calculates the divergence value between the user information corresponding to the new product and the user distribution information corresponding to the historical product by a divergence algorithm according to each historical product, and takes the divergence value as the product field information of the historical product and the divergence value between the product field information of the new product.

Wherein the divergence calculation formula is:

In the above formula, p represents the user distribution information of the new product, q represents the user distribution information of the historical product, and KL denotes the KL divergence.
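The formula itself does not appear in this text. Because the weighting step later refers to JS divergence values, and the symbol list above names p, q and KL, one plausible reading is the Jensen-Shannon divergence JS(p‖q) = ½·KL(p‖m) + ½·KL(q‖m) with m = (p + q)/2. The sketch below computes it for discrete (binned) user distributions. That reading, the binning, and the function names are assumptions made for illustration, not details confirmed by the application.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions over the same bins (natural log).

    Terms with p_i == 0 contribute nothing; q_i must be > 0 wherever p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """JS(p || q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), with m = (p + q) / 2."""
    if len(p) != len(q):
        raise ValueError("p and q must be defined over the same bins")
    # m_i > 0 whenever p_i > 0 or q_i > 0, so both KL terms are well defined.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical binned user distributions (e.g. share of users per asset band).
new_product = [0.20, 0.50, 0.30]
historical_product = [0.25, 0.45, 0.30]
divergence = js_divergence(new_product, historical_product)
```

With the natural logarithm the value lies between 0 (identical distributions) and ln 2 (disjoint support), which gives a fixed scale for choosing the preset divergence threshold.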

Based on the scheme, the similarity information of the product field information of each historical product and the product field information of the new product is identified through the divergence value, so that the accuracy of identifying the similarity information is improved.

Optionally, assimilating the historical user data of the target historical product corresponding to the auxiliary field information and the new user data of the new product to obtain a migration transformation matrix of each target historical product and the new product, including: acquiring risk classification information corresponding to historical user data of each target historical product, and respectively predicting first risk classification information of each new user data based on a risk classification model of each target historical product; taking the new user data and the first risk classification information of the new user data as new data sets of new products, and taking the historical user data of each target historical product and the risk classification information of the historical user data of each target historical product as historical data sets of each target historical product; and carrying out data balance processing on the new data set and each historical data set through a data balance program to obtain balance vectors corresponding to each target historical product and the new product, and carrying out matrixing processing on the balance vectors to obtain migration transformation matrixes corresponding to each target historical product and the new product.

In this embodiment, the terminal obtains risk classification information corresponding to historical user data of each target historical product, and predicts first risk classification information of each new user data based on a risk classification model of each target historical product. Then, the terminal uses the new user data and the first risk classification information of the new user data as new data sets of new products, and uses the historical user data of the target historical products and the risk classification information of the historical user data of the target historical products as historical data sets of the target historical products.

Specifically, suppose that the target domain information (i.e., the product domain information of the new product) is $D_t$ with data set $X_t$, and that the auxiliary domain information is $D_s$ with data set $X_s$ and labels $y_s$. The dimension of the reduced features in the mapping space is set to $d$, the regularization parameter to $\lambda$, and the balance factor to $\mu$. The balance factor weighs the marginal distribution against the conditional distribution: the larger $\mu$ is, the more similar the two data sets are, so the conditional distribution is more important; conversely, the marginal distribution is more important. BDA aims to minimize $D(D_s, D_t)$, and the balance formula is as follows:

$$D(D_s, D_t) \approx (1-\mu)\, D\big(P(x_s), P(x_t)\big) + \mu\, D\big(P(y_s \mid x_s), P(y_t \mid x_t)\big)$$

Here the target domain information has no labels $y_t$, so BDA estimates them with a model trained on the auxiliary domain information.

The terminal performs data balance processing on the new data set and each historical data set through a data balance program to obtain balance vectors corresponding to each target historical product and the new product, and performs matrixing processing on the balance vectors to obtain migration transformation matrixes corresponding to each target historical product and the new product.

Specifically, through matrix transformation and regularization, the above minimization of $D(D_s, D_t)$ is converted into the following constrained optimization problem:

$$\text{s.t.}\quad A^{\top} X H X^{\top} A = I,\qquad 0 \le \mu \le 1$$

where $\Phi = (\Phi_1, \Phi_2, \ldots, \Phi_d)$ denotes the Lagrange multipliers. The optimization problem is converted into a generalized eigendecomposition problem, which is solved to obtain the migration transformation matrix $A$.

In the above formula, $X = [X_s, X_t]$, $I$ denotes the identity matrix, $H$ denotes the centering matrix, and $M_0$ and $M_c$ denote the MMD distance of the marginal distributions and the MMD distance of the conditional distributions of the two domains, respectively; $c \in \{1, 2, \ldots, C\}$ denotes the label of the data, $n$ is the number of user data items in the historical data set of the auxiliary domain information, and $m$ is the number of user data items in the new data set of the target domain information.
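The objective to which the surviving constraint belongs is not reproduced in this text. As a reference point only, the standard Balanced Distribution Adaptation formulation, written with the same symbols as defined above, is sketched below. It is an assumption that the application's formula matches this one term for term.

```latex
% Objective (standard BDA form, assumed here)
\min_{A}\ \operatorname{tr}\!\Big(A^{\top} X \Big((1-\mu)\,M_0 + \mu \sum_{c=1}^{C} M_c\Big) X^{\top} A\Big)
  + \lambda \lVert A \rVert_F^{2}
\quad \text{s.t.}\quad A^{\top} X H X^{\top} A = I,\ \ 0 \le \mu \le 1

% Lagrangian with multipliers \Phi = \operatorname{diag}(\Phi_1, \ldots, \Phi_d)
L = \operatorname{tr}\!\Big(A^{\top}\Big(X\Big((1-\mu)\,M_0 + \mu \sum_{c=1}^{C} M_c\Big) X^{\top} + \lambda I\Big) A\Big)
  + \operatorname{tr}\!\Big(\big(I - A^{\top} X H X^{\top} A\big)\,\Phi\Big)

% Setting \partial L / \partial A = 0 yields the generalized eigenproblem
\Big(X\Big((1-\mu)\,M_0 + \mu \sum_{c=1}^{C} M_c\Big) X^{\top} + \lambda I\Big) A = X H X^{\top} A\,\Phi
```

Under this formulation, the columns of $A$ are the eigenvectors belonging to the $d$ smallest generalized eigenvalues, and the transformed data used for training is $A^{\top} X$.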

Based on this scheme, BDA processing is performed on the user data of the historical products and the user data of the new product to reduce the differences in marginal distribution and conditional distribution between the two domains, so that the user data of the historical products can be used as training data for the new product.

Optionally, training the risk classification model of each target historical product based on the migration transformation matrix to obtain the sub-target risk classification model corresponding to each auxiliary domain information includes: identifying each conversion user data of the migration transformation matrix and the risk classification information of each conversion user data, and training the risk classification model of each target historical product based on each conversion user data and its risk classification information to obtain an initial sub-target risk classification model corresponding to each auxiliary domain information; predicting new risk classification information of each new user data based on each initial sub-target risk classification model, and calculating the average of the new risk classification information of each new user data to obtain second risk classification information of each new user data; identifying the deviation value between the first risk classification information and the second risk classification information of each new user data and, when the deviation value is greater than a deviation threshold, replacing the first risk classification information of each new user data with its second risk classification information and returning to the step of performing data balance processing on the new data set and each historical data set through the data balance program to obtain the balance vectors corresponding to each target historical product and the new product; and, once no deviation value greater than the deviation threshold exists, taking the initial sub-target risk classification model corresponding to each auxiliary domain information obtained in the last iteration as the sub-target risk classification model corresponding to that auxiliary domain information.

In this embodiment, the terminal identifies each conversion user data of the migration transformation matrix and risk classification information of each conversion user data, and respectively trains a risk classification model of each target historical product based on each conversion user data and risk classification information of each conversion user data to obtain an initial sub-target risk classification model corresponding to each auxiliary field information. And then, the terminal respectively predicts the new risk classification information of each new user data based on each initial sub-target risk classification model, calculates the average value of the new risk classification information of each new user data, and obtains the second risk classification information of each new user data.

The terminal identifies the deviation value between the first risk classification information and the second risk classification information of each new user data. When the deviation value is greater than the deviation threshold, the terminal replaces the first risk classification information of each new user data with its second risk classification information and returns to the step of performing data balance processing on the new data set and each historical data set through the data balance program to obtain the balance vectors corresponding to each target historical product and the new product. Once no deviation value greater than the deviation threshold exists, the terminal takes the initial sub-target risk classification model corresponding to each auxiliary domain information obtained in the last iteration as the sub-target risk classification model corresponding to that auxiliary domain information.

Specifically, the terminal trains the risk classification model f corresponding to each auxiliary domain information on the transformed auxiliary domain data $\{A^{\top} X_s, y_s\}$ to obtain the sub-target risk classification model corresponding to each auxiliary domain information.

Three different base classifiers clf are set in a single risk classification model, and the three classifiers are each trained on the auxiliary domain to obtain a strong classifier $f = \{clf_1, clf_2, clf_3\}$. The terminal then predicts the risk classification information of the target domain information with the classifiers in f and averages the results to obtain the prediction result of the target domain information. Finally, the matrix $M_c$ is updated and iterative training continues until the model converges.

Based on the scheme, the classification parameters of the sub-target risk classification models are adjusted by training in a plurality of auxiliary field information and taking convergence as a training standard, so that the variability of the sub-target risk classification models is further reduced, and the prediction accuracy of risk classification information of users of target field information corresponding to new products is improved.
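The iteration described above (balance, retrain, predict, average, compare, repeat) can be sketched as a small control loop. The callable `train_and_predict` is a hypothetical stand-in for one full round of data balancing and sub-model training, and the `max_rounds` safety cap is an added assumption; neither name nor signature comes from the application.

```python
from typing import Callable, List, Sequence

# One round: given the current risk labels of the new users, rebuild the
# transformation, retrain every sub-model, and return each sub-model's
# predicted risk score per new user. Hypothetical signature for illustration.
TrainAndPredict = Callable[[Sequence[float]], List[List[float]]]


def refine_pseudo_labels(
    first_labels: Sequence[float],
    train_and_predict: TrainAndPredict,
    deviation_threshold: float,
    max_rounds: int = 20,
) -> List[float]:
    """Iterate until no label deviates from the averaged prediction by more than the threshold."""
    labels = list(first_labels)
    for _ in range(max_rounds):
        per_model = train_and_predict(labels)
        if not per_model:
            raise ValueError("at least one sub-model prediction is required")
        # Second risk classification information: mean over sub-models, per user.
        averaged = [sum(column) / len(column) for column in zip(*per_model)]
        deviations = [abs(old - new) for old, new in zip(labels, averaged)]
        if max(deviations) <= deviation_threshold:
            return labels  # converged: the models from this round are kept
        labels = averaged  # replace first information with second and repeat
    return labels
```

A real implementation would also return the sub-models trained in the final round, since those are what the method keeps once the deviation falls within the threshold.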

Optionally, determining the weight value of each sub-target risk classification model based on the divergence value between each auxiliary domain information and the product domain information of the new product includes: calculating the reciprocal of the divergence value between each auxiliary domain information and the product domain information of the new product to obtain the reciprocal value corresponding to each auxiliary domain information; and, for each auxiliary domain information, dividing the reciprocal value of the auxiliary domain information by the sum of the reciprocal values of all auxiliary domain information to obtain the proportion of the auxiliary domain information, and taking the proportion as the weight value of the sub-target risk classification model corresponding to the auxiliary domain information.

In this embodiment, the terminal calculates the reciprocal of the divergence value between each auxiliary domain information and the product domain information of the new product to obtain the reciprocal value corresponding to each auxiliary domain information. Then, for each auxiliary domain information, the terminal divides the reciprocal value of the auxiliary domain information by the sum of the reciprocal values of all auxiliary domain information to obtain the proportion of the auxiliary domain information, and takes the proportion as the weight value of the sub-target risk classification model corresponding to the auxiliary domain information.

The terminal obtains the weights $\lambda_1, \lambda_2, \ldots, \lambda_i$ of the models constructed from the auxiliary domain information (taking i auxiliary domains as an example) according to the following formula, where $j_1, j_2, \ldots, j_i$ denote the JS divergence values between the i pieces of auxiliary domain information and the target domain information, respectively:

$$\lambda_k = \frac{1/j_k}{\sum_{l=1}^{i} 1/j_l},\qquad k = 1, 2, \ldots, i$$

Based on the scheme, the accuracy of calculating the target risk classification information of the user of the new product is improved by determining the weight value of the information of each auxiliary field.
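The reciprocal-and-normalize weighting described above can be sketched as follows. The domain names and divergence values are hypothetical, and rejecting non-positive divergences is an added assumption, since the text does not say how a zero divergence would be handled.

```python
from typing import Dict


def inverse_divergence_weights(divergences: Dict[str, float]) -> Dict[str, float]:
    """Weight each auxiliary domain by (1 / j_k) / sum_l (1 / j_l)."""
    if any(j <= 0 for j in divergences.values()):
        raise ValueError("divergence values must be positive to take reciprocals")
    reciprocals = {name: 1.0 / j for name, j in divergences.items()}
    total = sum(reciprocals.values())
    return {name: r / total for name, r in reciprocals.items()}


# Hypothetical JS divergence values for three auxiliary domains.
weights = inverse_divergence_weights({"A": 0.1, "B": 0.2, "C": 0.4})
```

The weights sum to 1, and a domain with half the divergence of another receives twice its weight, so the historical products closest to the new product dominate the combined prediction.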

Optionally, the method further comprises: acquiring target user data of a new product, and respectively predicting sub-target risk classification information corresponding to the target user data based on each sub-target risk classification model in the target risk classification model; and respectively carrying out weighted summation processing on the sub-target risk classification information corresponding to each sub-target risk classification model based on the weight value of each sub-target risk classification model to obtain target risk classification information corresponding to the target user data.

In this embodiment, the terminal obtains target user data of a new product, and predicts sub-target risk classification information corresponding to the target user data based on each sub-target risk classification model in the target risk classification model. And then, the terminal respectively carries out weighted summation processing on the sub-target risk classification information corresponding to each sub-target risk classification model based on the weight value of each sub-target risk classification model to obtain target risk classification information corresponding to the target user data.

Specifically, the terminal uses each sub-target risk classification model $f_i$ to predict on the data set $X_t$ corresponding to the target user data of the target domain information, obtaining a prediction result $p_i$. The terminal then multiplies the prediction result $p_i$ of each sub-target risk classification model by the weight $\lambda_i$ of that model and sums the results over all sub-target risk classification models to obtain the target risk classification information $P$ corresponding to the target user data, namely $P = \lambda_1 p_1 + \lambda_2 p_2 + \cdots + \lambda_i p_i$.

Based on this scheme, the risk classification information of the new product is predicted by combining the sub-target risk classification models of the plurality of auxiliary domain information with the weights of those models. This avoids the problem that the risk classification information of users of a new product cannot be predicted because a risk classification model of the new product cannot be obtained directly, and thereby improves the efficiency of predicting the risk classification information of users of the new product.
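A minimal sketch of the weighted summation P = λ₁p₁ + λ₂p₂ + ... + λᵢpᵢ for a single user follows. The sub-models are represented by plain callables returning a risk score; these stand-ins, their fixed outputs, and the weights are hypothetical.

```python
from typing import Callable, Sequence

# Maps one user's feature vector to a risk score (hypothetical interface).
Model = Callable[[Sequence[float]], float]


def predict_target_risk(
    features: Sequence[float],
    models: Sequence[Model],
    weights: Sequence[float],
) -> float:
    """Return the weighted sum of the sub-target model predictions for one user."""
    if len(models) != len(weights):
        raise ValueError("one weight is required per sub-target model")
    return sum(w * model(features) for model, w in zip(models, weights))


# Hypothetical stand-ins for trained sub-target risk classification models.
models = [lambda x: 0.2, lambda x: 0.6, lambda x: 0.9]
weights = [0.5, 0.3, 0.2]
score = predict_target_risk([1.0, 2.0], models, weights)
```

Because the weights are normalized, the combined score stays within the range spanned by the individual model outputs.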

The application also provides an example of determining the risk classification model. As shown in fig. 2, the specific processing procedure includes the following steps:

Step S201, acquiring new user data of a new product, historical user data of a plurality of historical products, and risk classification models of the historical products.

Step S202, identifying the product types of the new products and the product types of the historical products, and acquiring the corresponding relation between the product types and the product field information in a product database.

Step S203, based on the product types of the new product and the product types of the history products, the new product domain information of the new product and the product domain information of the history products are determined by the correspondence relationship between the product types and the product domain information.

Step S204, user distribution information corresponding to new user data of a new product and user distribution information corresponding to historical user data of each historical product are identified.

Step S205, for each historical product, calculating the divergence value between the user distribution information corresponding to the new product and the user distribution information corresponding to the historical product by a divergence algorithm, and taking the divergence value as the divergence value between the product domain information of the historical product and the new product domain information of the new product.

Step S206, screening the product domain information corresponding to the divergence value lower than the preset divergence threshold value as the auxiliary domain information of the new product.

Step S207, acquiring risk classification information corresponding to the historical user data of each target historical product, and respectively predicting first risk classification information of each new user data based on the risk classification model of each target historical product.

Step S208, the new user data and the first risk classification information of the new user data are used as new data sets of new products, and the historical user data of the target historical products and the risk classification information of the historical user data of the target historical products are used as historical data sets of the target historical products.

Step S209, performing data balance processing on the new data set and each history data set through a data balance program to obtain balance vectors corresponding to each target history product and the new product, and performing matrixing processing on the balance vectors to obtain migration transformation matrixes corresponding to each target history product and the new product.

Step S210, identifying each conversion user data of the migration transformation matrix and risk classification information of each conversion user data, and training a risk classification model of each target historical product based on each conversion user data and the risk classification information of each conversion user data to obtain an initial sub-target risk classification model corresponding to each auxiliary field information.

Step S211, based on the initial sub-target risk classification models, predicting new risk classification information of each new user data respectively, and calculating an average value of the new risk classification information of each new user data to obtain second risk classification information of each new user data.

Step S212, identifying the deviation value between the first risk classification information and the second risk classification information of each new user data and, when the deviation value is greater than the deviation threshold, replacing the first risk classification information of each new user data with its second risk classification information and returning to the step of performing data balance processing on the new data set and each historical data set through the data balance program to obtain the balance vectors corresponding to each target historical product and the new product.

Step S213, once no deviation value greater than the deviation threshold exists, taking the initial sub-target risk classification model corresponding to each auxiliary domain information obtained in the last iteration as the sub-target risk classification model corresponding to that auxiliary domain information.

Step S214, calculating the reciprocal of the divergence value between each auxiliary field information and the product field information of the new product, and obtaining the reciprocal value corresponding to each auxiliary field information.

Step S215, for each auxiliary domain information, dividing the reciprocal value of the auxiliary domain information by the sum of the reciprocal values of all auxiliary domain information to obtain the proportion of the auxiliary domain information, and taking the proportion as the weight value of the sub-target risk classification model corresponding to the auxiliary domain information.
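As an illustration of the screening in step S206, the sketch below keeps only the product domains whose divergence from the new product is below the preset threshold. The domain names, divergence values and threshold are hypothetical.

```python
from typing import Dict


def screen_auxiliary_domains(
    divergences: Dict[str, float], divergence_threshold: float
) -> Dict[str, float]:
    """Keep product domains whose divergence from the new product is below the threshold."""
    return {
        domain: value
        for domain, value in divergences.items()
        if value < divergence_threshold
    }


# Hypothetical divergence values per historical product domain.
candidates = {"domain_a": 0.05, "domain_b": 0.30, "domain_c": 0.12}
auxiliary = screen_auxiliary_domains(candidates, divergence_threshold=0.15)
```

The retained divergence values are kept alongside the domain names because steps S214 and S215 reuse them to compute the model weights.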

It should be understood that, although the steps in the flowcharts related to the embodiments described above are shown sequentially as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the steps are not strictly limited to this order of execution and may be executed in other orders. Moreover, at least some of the steps in the flowcharts described in the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

Based on the same inventive concept, the embodiments of the application also provide a determining device of the risk classification model for implementing the above method for determining a risk classification model. The solution provided by the device is implemented in a manner similar to that described for the method, so for the specific limitations in the one or more embodiments of the determining device provided below, reference may be made to the limitations of the method for determining a risk classification model above, which are not repeated here.

In one embodiment, as shown in fig. 3, there is provided a determining apparatus of a risk classification model, including: acquisition module 310, screening module 320, assimilation module 330, and determination module 340, wherein:

an acquisition module 310 for acquiring new user data of a new product, historical user data of a plurality of historical products, and a risk classification model of each of the historical products, and identifying new product domain information of the new product, and product domain information of each of the historical products;

a screening module 320, configured to calculate, based on the new user data of the new product and the historical user data of each of the historical products, a divergence value between the product domain information of each of the historical products and the new product domain information of the new product, and to screen the product domain information whose divergence value is below a preset divergence threshold as auxiliary domain information of the new product;

an assimilation module 330, configured to assimilate historical user data of a target historical product corresponding to each auxiliary domain information and new user data of the new product to obtain a migration transformation matrix of each target historical product and the new product, and train a risk classification model of each target historical product based on the migration transformation matrix to obtain a sub-target risk classification model corresponding to each auxiliary domain information;

a determining module 340, configured to determine a weight value of each of the sub-target risk classification models based on a divergence value between each of the auxiliary domain information and the product domain information of the new product, and determine a target prediction model of the new product based on each of the sub-target risk classification models and the weight value of each of the sub-target risk classification models.

Optionally, the acquiring module 310 is specifically configured to:

Optionally, the screening module 320 is specifically configured to:

Optionally, the assimilation module 330 is specifically configured to:

Optionally, the determining module 340 is specifically configured to:

Optionally, the apparatus further includes:

The respective modules in the above-described determination means of the risk classification model may be implemented in whole or in part by software, hardware, and a combination thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.

In one embodiment, a computer device is provided, which may be a terminal, and the internal structure of which may be as shown in fig. 4. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless mode can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a method of determining a risk classification model. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be keys, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.

Those skilled in the art will appreciate that the structures shown in FIG. 4 are block diagrams only, and are not intended to limit the scope of the computer devices on which the present teachings are applied, and that a particular computer device may include more or less elements than those shown, or may combine some elements, or have a different arrangement of elements.

In an embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method of any of the first aspects when the computer program is executed.

In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, implements the steps of the method of any of the first aspects.

In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method of any of the first aspects.

It should be noted that, user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party.

Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the various embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, high density embedded nonvolatile Memory, resistive random access Memory (ReRAM), magnetic random access Memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric Memory (Ferroelectric Random Access Memory, FRAM), phase change Memory (Phase Change Memory, PCM), graphene Memory, and the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can be in the form of a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), and the like. The databases referred to in the various embodiments provided herein may include at least one of relational databases and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processors referred to in the embodiments provided herein may be general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, quantum computing-based data processing logic units, etc., without being limited thereto.

The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.

The above examples only represent a few embodiments of the present application, which are described in more detail and are not to be construed as limiting the scope of the present application. It should be noted that it will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit of the present application, which is within the scope of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims

1. A method of determining a risk classification model, the method comprising:

2. The method of claim 1, wherein the identifying new product domain information for the new product, and product domain information for each of the historical products, comprises:

3. The method of claim 1, wherein the calculating a divergence value between the product domain information of each historical product and the new product domain information of the new product, respectively, based on the new user data of the new product and the historical user data of each of the historical products, comprises:

4. The method of claim 3, wherein assimilating the historical user data of the target historical product corresponding to each auxiliary domain information with the new user data of the new product to obtain the migration transformation matrix of each target historical product and the new product comprises:

5. The method according to claim 1, wherein training the risk classification model of each target historical product based on the migration transformation matrix to obtain the sub-target risk classification model corresponding to each auxiliary domain information comprises:

6. The method of claim 5, wherein the determining a weight value for each of the sub-objective risk classification models based on a divergence value between each of the ancillary domain information and the product domain information for the new product comprises:

7. The method according to claim 1, wherein the method further comprises:

8. A device for determining a risk classification model, the device comprising:

9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 7 when the computer program is executed.

10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.

11. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.