CN117634871A - Risk assessment model training method and device, storage medium and electronic equipment - Google Patents


Info

Publication number
CN117634871A
Authority
CN
China
Prior art keywords
sample
guest group
transaction
group data
risk assessment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311484048.XA
Other languages
Chinese (zh)
Inventor
柴宝玥
陈惊雷
郝正鸿
韩冰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang eCommerce Bank Co Ltd
Original Assignee
Zhejiang eCommerce Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang eCommerce Bank Co Ltd filed Critical Zhejiang eCommerce Bank Co Ltd
Priority to CN202311484048.XA priority Critical patent/CN117634871A/en
Publication of CN117634871A publication Critical patent/CN117634871A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The specification discloses a risk assessment model training method and device, a storage medium, and electronic equipment. The method comprises: obtaining first sample business guest group data and first sample general guest group data, and performing feature extraction to obtain sample general guest group features and sample business guest group features; calculating sample characterization parameters based on the sample general guest group features and the sample business guest group features; obtaining sample evaluation scores, training a first risk evaluation model based on the sample general guest group features, the sample business guest group features, the sample characterization parameters and the sample evaluation scores to obtain initial parameters, and obtaining a second risk evaluation model based on the initial parameters; and calculating an evaluation loss function and a migration loss function, and determining the training state of the second risk evaluation model based on the evaluation loss function and the migration loss function until the training state indicates that the second risk evaluation model has converged, thereby obtaining a trained risk evaluation model. The specification thereby enables risk evaluation of guest groups with insufficient data volume.

Description

Risk assessment model training method and device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a risk assessment model training method and apparatus, a storage medium, and an electronic device.
Background
Nowadays, small and micro enterprises play an increasingly important role in the market, and to operate smoothly they often need financing. However, because such enterprises are regarded as high-risk during financing, they find it difficult to obtain financing.
Disclosure of Invention
The embodiments of this specification provide a risk assessment model training method and device, a storage medium, and an electronic device. Using sample general guest group data with a sufficient data volume, the risk assessment model can calculate the assessment score corresponding to sample business guest group data, so that risk assessment can be performed on sample business guest group data whose data volume is insufficient.
In a first aspect, embodiments of the present disclosure provide a risk assessment model training method, the method including:
acquiring first sample transaction guest group data and first sample general guest group data, and extracting features of the first sample general guest group data and the first sample transaction guest group data to obtain sample general guest group features corresponding to the first sample general guest group data and sample transaction guest group features corresponding to the first sample transaction guest group data;
calculating sample characterization parameters of a cluster migration network based on the sample general guest group features and the sample transaction guest group features, wherein the sample characterization parameters serve as an evaluation standard that associates the sample general guest group features with the sample transaction guest group features, and the cluster migration network is contained in a first risk evaluation model;
acquiring a sample evaluation score for representing the risk state of a transaction guest group, training the first risk evaluation model based on the sample general guest group features, the sample transaction guest group features, the sample characterization parameters and the sample evaluation score to obtain initial parameters of the first risk evaluation model, and obtaining a second risk evaluation model based on the initial parameters;
and calculating an evaluation loss function corresponding to the second risk evaluation model and a migration loss function corresponding to the cluster migration network, and determining a training state of the second risk evaluation model based on the evaluation loss function and the migration loss function until the training state indicates that the second risk evaluation model converges, so as to obtain a trained risk evaluation model.
In a second aspect, embodiments of the present disclosure provide a risk assessment method, the method including:
acquiring transaction guest group data corresponding to a transaction guest group and general guest group data corresponding to the transaction guest group data, and inputting the transaction guest group data and the general guest group data into a trained risk assessment model obtained by the above risk assessment model training method;
performing high-dimensional mapping on the transaction guest group data and the general guest group data based on the risk assessment model to obtain characterization parameters corresponding to the transaction guest group data and the general guest group data;
and performing risk assessment on the transaction guest group data based on the characterization parameters and the general guest group data to obtain a risk assessment result corresponding to the transaction guest group.
In a third aspect, embodiments of the present disclosure provide a risk assessment model training apparatus, the apparatus including:
the feature acquisition unit is used for acquiring first sample business guest group data and first sample general guest group data, and carrying out feature extraction on the first sample general guest group data and the first sample business guest group data to obtain sample general guest group features corresponding to the first sample general guest group data and sample business guest group features corresponding to the first sample business guest group data;
the parameter acquisition unit is used for calculating sample characterization parameters of a cluster migration network based on the sample general guest group features and the sample transaction guest group features, wherein the sample characterization parameters serve as an evaluation standard that associates the sample general guest group features with the sample transaction guest group features, and the cluster migration network is contained in a first risk evaluation model;
the model training unit is used for obtaining a sample evaluation score for representing the risk state of the business guest group, training the first risk evaluation model based on the sample general guest group characteristics, the sample business guest group characteristics, the sample characterization parameters and the sample evaluation score to obtain initial parameters of the first risk evaluation model, and obtaining a second risk evaluation model based on the initial parameters;
the model completion unit is configured to calculate an evaluation loss function corresponding to the second risk evaluation model and a migration loss function corresponding to the clustered migration network, determine a training state of the second risk evaluation model based on the evaluation loss function and the migration loss function, until the training state indicates that the second risk evaluation model converges, and obtain a trained risk evaluation model.
In a fourth aspect, embodiments of the present disclosure provide a risk assessment apparatus, the apparatus comprising:
the data acquisition unit is used for acquiring transaction guest group data corresponding to the transaction guest group and general guest group data corresponding to the transaction guest group data, and inputting the transaction guest group data and the general guest group data into a trained risk assessment model obtained by the risk assessment model training method;
the parameter calculation unit is used for carrying out high-dimensional mapping on the transaction guest group data and the general guest group data based on the risk assessment model to obtain characterization parameters corresponding to the transaction guest group data and the general guest group data;
and the result acquisition unit is used for carrying out risk assessment on the transaction guest group data based on the characterization parameters and the general guest group data to obtain a risk assessment result corresponding to the transaction guest group.
In a fifth aspect, the present description embodiments provide a computer program product storing at least one instruction adapted to be loaded by a processor and to perform the above-described method steps.
In a sixth aspect, the present description provides a computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the steps of the method described above.
In a seventh aspect, embodiments of the present disclosure provide an electronic device, including: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the steps of the method described above.
In the embodiments of this specification, first sample business guest group data and first sample general guest group data are obtained, and features are extracted from them to obtain sample business guest group features and sample general guest group features. Sample characterization parameters are then calculated and a sample evaluation score is obtained. Based on the sample characterization parameters, the sample evaluation score, the sample business guest group features and the sample general guest group features, the first risk evaluation model is trained to obtain the second risk evaluation model. The evaluation loss function and the migration loss function are calculated to judge whether the second risk evaluation model has converged, which completes the training and yields the trained risk evaluation model. Using sample general guest group data with a sufficient data volume, the trained model can calculate the evaluation score corresponding to sample business guest group data, so that risk evaluation is achieved for sample business guest group data with an insufficient data volume.
Drawings
In order to more clearly illustrate the technical solutions of the present specification or the prior art, the following description will briefly introduce the drawings that are required to be used in the embodiments or the prior art descriptions, it is obvious that the drawings in the following description are only some embodiments of the present specification, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a system architecture diagram of a risk assessment model training method according to an embodiment of the present disclosure;
FIG. 2 is a schematic flow chart of a risk assessment model training method according to an embodiment of the present disclosure;
FIG. 3 is an exemplary schematic diagram of a feature classification provided by embodiments of the present disclosure;
FIG. 4 is an exemplary schematic diagram of a high-dimensional stretch provided in an embodiment of the present disclosure;
FIG. 5 is a schematic flow chart of a risk assessment model training method according to an embodiment of the present disclosure;
FIG. 6 is an exemplary schematic diagram of another high-dimensional stretch provided by embodiments of the present disclosure;
FIG. 7 is a schematic flow chart of a risk assessment method according to an embodiment of the present disclosure;
FIG. 8 is a schematic structural diagram of a risk assessment model training apparatus according to an embodiment of the present disclosure;
FIG. 9 is a schematic structural diagram of a feature acquisition unit according to an embodiment of the present disclosure;
FIG. 10 is a schematic structural diagram of a parameter acquisition unit according to an embodiment of the present disclosure;
FIG. 11 is a schematic structural diagram of a risk assessment device according to an embodiment of the present disclosure;
FIG. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure;
FIG. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
So that the manner in which the features and advantages of the present specification can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to the embodiments, some of which are illustrated in the appended drawings. All other embodiments, which can be made by those skilled in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present disclosure.
In the prior art, models trained to evaluate the credit risk of small and micro enterprises suffer from problems such as insufficient data and low accuracy, so the credit risk of such enterprises cannot be evaluated quickly and accurately.
Based on the above, the embodiments of this specification provide a risk assessment model training method. First sample business guest group data and first sample general guest group data are acquired, and features are extracted from them to obtain sample business guest group features and sample general guest group features. Sample characterization parameters are calculated and sample assessment scores are acquired. Based on the sample characterization parameters, the sample assessment scores, the sample business guest group features and the sample general guest group features, a first risk assessment model is trained to obtain a second risk assessment model. The evaluation loss function and the migration loss function are calculated to judge whether the second risk assessment model has converged, which determines whether the training is complete and yields a trained risk assessment model. The trained model can then calculate the assessment score corresponding to the acquired sample business guest group data, thereby realizing risk assessment of the sample business guest group data.
Referring to fig. 1, a system architecture diagram for risk assessment model training is provided for an embodiment of the present disclosure. As shown in fig. 1, the risk assessment model training method provided in the embodiment of the present disclosure may be applied to a terminal device or a server to implement a risk assessment model training process, and the system structure provided in the embodiment of the present disclosure mainly includes a model training server 10 and a database server 20. The model training server 10 may be a large-scale integrated server used by an enterprise, or may be a micro-computer, such as a personal computer; the database server 20 may be a large-scale integrated server used by an enterprise, or may be a micro-computer, such as a personal computer.
In the embodiment of the present disclosure, the model training server 10 obtains the sample transaction guest group data, the sample general guest group data and the sample evaluation score sent by the database server 20. It obtains sample transaction guest group features and sample general guest group features from the sample transaction guest group data and the sample general guest group data, obtains sample characterization parameters from those features, and trains the risk evaluation model according to the sample characterization parameters, the sample evaluation score, the sample transaction guest group features and the sample general guest group features to obtain a trained risk evaluation model.
In the embodiments of this specification, first sample business guest group data and first sample general guest group data are obtained, and features are extracted from them to obtain sample business guest group features and sample general guest group features. Sample characterization parameters are then calculated and a sample evaluation score is obtained. Based on the sample characterization parameters, the sample evaluation score, the sample business guest group features and the sample general guest group features, the first risk evaluation model is trained to obtain the second risk evaluation model. The evaluation loss function and the migration loss function are calculated to judge whether the second risk evaluation model has converged, which completes the training and yields the trained risk evaluation model. Using sample general guest group data with a sufficient data volume, the trained model can calculate the evaluation score corresponding to sample business guest group data, so that risk evaluation is achieved for sample business guest group data with an insufficient data volume.
Based on the system architecture shown in fig. 1, the risk assessment model training method provided in the embodiment of the present disclosure will be described in detail below with reference to fig. 2 to 4.
Referring to fig. 2, a flowchart of a risk assessment model training method is provided for the embodiment of the present disclosure. As shown in fig. 2, the method may include the following steps S102-S108.
S102, acquiring first sample business guest group data and first sample general guest group data, and extracting features of the first sample general guest group data and the first sample business guest group data to obtain sample general guest group features corresponding to the first sample general guest group data and sample business guest group features corresponding to the first sample business guest group data;
in one embodiment, first sample transaction guest group data and first sample general guest group data are obtained, data screening is conducted on the first sample transaction guest group data and the first sample general guest group data to obtain second sample transaction guest group data and second sample general guest group data, feature extraction is conducted on the second sample transaction guest group data and the second sample general guest group data to obtain sample general guest group features corresponding to the second sample general guest group data and sample transaction guest group features corresponding to the second sample transaction guest group data.
The first sample transaction guest group data may be data of a guest group of a relatively new type, for example a guest group in an emerging industry or a guest group in a refined industry. It should be noted that such a guest group is characterized by having too little collectable data for model training. For example, when the live-streaming industry was emerging, live-streaming entertainment companies could be regarded as a guest group of a relatively new type at that time. The first sample transaction guest group data may include the transaction party name, transaction party type, profit information, tax information, borrowing and repayment information, employee information, and the like.
A "relatively new" guest group type may mean that the guest group belongs to a new industry, or that little data has been collected for it within an existing industry.
Refinement may mean collecting data specifically for each sub-category of an existing industry.
The first sample general guest group data may be guest group data of an existing industry. It is correlated with the first sample transaction guest group data, which makes it possible to obtain an evaluation result for the first sample transaction guest group data.
For example, data screening may sort the data by data type and screen out the data types that are irrelevant to risk review, such as part of the employee information of a transaction party in the guest group, thereby determining the data types and label columns required for feature extraction.
The data screening step may be performed in advance, or after the data is input into the model to be trained, in which case the screening is performed by the model; this may be set according to actual needs.
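As an illustrative sketch only, the screening step described above could look like the following. The field names (`party_name`, `employee_names`, and so on) and the choice of which fields count as irrelevant are hypothetical; the specification only states that data types irrelevant to risk review are screened out.

```python
# Hypothetical set of fields treated as irrelevant to risk review.
IRRELEVANT_FIELDS = {"employee_names"}


def screen_records(records, irrelevant=IRRELEVANT_FIELDS):
    """Return copies of the records without the fields irrelevant to risk review."""
    return [
        {key: value for key, value in record.items() if key not in irrelevant}
        for record in records
    ]


# First sample data (made-up values, for illustration only).
first_sample = [
    {"party_name": "A", "party_type": "live streaming",
     "annual_profit": 120.0, "employee_names": ["x", "y"]},
    {"party_name": "B", "party_type": "live streaming",
     "annual_profit": 80.0, "employee_names": ["z"]},
]

# Second sample data: the screened records used for feature extraction.
second_sample = screen_records(first_sample)
```

The input records are left unchanged, so the same first sample data can be screened again with a different field set if the screening criteria are adjusted.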
S104, calculating sample characterization parameters of the cluster migration network based on the sample general guest group features and the sample transaction guest group features;
In one embodiment, feature classification is performed on the sample general guest group features and the sample transaction guest group features to obtain sub-sample general guest group features and sub-sample transaction guest group features for each classification. High-dimensional characterization is performed on the sub-sample general guest group features and the sub-sample transaction guest group features of each classification to obtain the feature distribution of each classification in the high-dimensional representation. For each classification, the distribution distance between the feature distributions of the sub-sample general guest group features and the sub-sample transaction guest group features is calculated and optimized so that the two sets of features reach the same dimension; the sample characterization parameters corresponding to each classification in the cluster migration network are obtained when the dimensions are the same.
The sample characterization parameter may be a parameter for evaluating the sample general guest group features and the sample transaction guest group features; it serves as an evaluation standard that associates the two. It can be understood that the corresponding data of different guest groups are difficult to measure or compare with exact numbers. The sample characterization parameter is therefore used as the measurement standard that links the sample transaction guest group with the sample general guest group, so that the sample transaction guest group data can be evaluated according to the sample general guest group data.
The cluster migration network is included in the first risk assessment model, and may be an algorithm with a transfer learning capability.
For example, feature classification may group features with higher relevance into the same class, yielding the sub-sample general guest group features corresponding to the sample general guest group features and the sub-sample transaction guest group features corresponding to the sample transaction guest group features. For example, names, guest group types and the like are classified as basic information, and annual profit, monthly profit, daily profit and the like are classified as profit. As shown in fig. 3, there are a plurality of classifications such as "classification 1" and "classification 2", each containing a plurality of information features; within the same classification, the information of the sample transaction guest group features corresponds to that of the sample general guest group features, for example "information 1" in "classification 1" corresponds to "information 3", and so on.
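The grouping of related features into classifications can be sketched as follows. This is a minimal illustration under stated assumptions: the feature names and the fixed name-to-classification table `FEATURE_CLASS` are hypothetical, whereas the specification only says that features with higher relevance are placed in the same class.

```python
from collections import defaultdict

# Hypothetical lookup from feature name to classification.
FEATURE_CLASS = {
    "name": "basic_information",
    "group_type": "basic_information",
    "annual_profit": "profit",
    "monthly_profit": "profit",
    "daily_profit": "profit",
}


def classify_features(features, feature_class=FEATURE_CLASS):
    """Split one flat feature dict into sub-sample features per classification."""
    grouped = defaultdict(dict)
    for feature_name, value in features.items():
        # Features without a known classification are collected under "other".
        grouped[feature_class.get(feature_name, "other")][feature_name] = value
    return dict(grouped)


sub_features = classify_features(
    {"name": "A", "annual_profit": 1.0, "daily_profit": 0.1, "tax": 3}
)
```

Applying the same function to the general guest group features and to the transaction guest group features yields matching classifications on both sides, which is what the per-classification comparison relies on.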
It should be noted that the distribution distance is obtained separately for each classification, between the sub-sample general guest group features and the sub-sample transaction guest group features of that classification; the calculations for different classifications do not interfere with each other.
For example, the distribution distance may be optimized by mapping each feature of the same classification to a high dimension, which increases the correlation between the features and thereby shortens the distribution distance between the high-dimensional features in that classification; the distribution distance is reduced until the features reach the same dimension.
Illustratively, as shown in fig. 4, "information 1" and "information 3" of "classification 1" are mapped to a high dimension, and their dimensions are optimized until both reach the same dimension, "dimension x".
After the high-dimensional mapping, features that have reached the same, large dimension may be further subdivided by classification and dimension-stretched separately, or all features of the sample business guest group and the sample general guest group in the same classification may be dimension-stretched jointly; this may be set according to actual needs.
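The per-classification distribution distance can be illustrated with a small sketch. The specification does not name a particular distance or mapping, so everything concrete here is an assumption: `phi` is a hypothetical fixed feature map (raw values plus their squares), and the distance is the squared Euclidean distance between the mean embeddings of the two groups, which is a simple form of maximum mean discrepancy. The sketch also assumes both groups already share the same raw feature dimension within a classification.

```python
def phi(sample):
    """Hypothetical high-dimensional mapping: raw values followed by their squares."""
    return list(sample) + [value * value for value in sample]


def mean_embedding(samples):
    """Average the mapped samples component by component."""
    mapped = [phi(sample) for sample in samples]
    return [sum(column) / len(mapped) for column in zip(*mapped)]


def distribution_distance(general, transaction):
    """Squared distance between the mean embeddings of the two guest groups."""
    mu_general = mean_embedding(general)
    mu_transaction = mean_embedding(transaction)
    return sum((a - b) ** 2 for a, b in zip(mu_general, mu_transaction))


def distances_by_classification(general_by_class, transaction_by_class):
    """Compute the distance independently for every classification."""
    return {
        name: distribution_distance(general_by_class[name], transaction_by_class[name])
        for name in general_by_class
    }


distances = distances_by_classification(
    {"profit": [[0.0], [2.0]], "basic": [[1.0], [3.0]]},
    {"profit": [[1.0], [1.0]], "basic": [[1.0], [3.0]]},
)
```

Each classification is handled in its own call, mirroring the statement that the distance calculations for different classifications do not interfere with each other. In a trainable network the mapping would be learned so that these distances shrink, rather than fixed as it is here.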
S106, obtaining a sample evaluation score for representing the risk state of the transaction guest group, training the first risk evaluation model based on the sample general guest group characteristics, the sample transaction guest group characteristics, the sample characterization parameters and the sample evaluation score to obtain initial parameters of the first risk evaluation model, and obtaining a second risk evaluation model based on the initial parameters;
In one embodiment, the sample evaluation score may be data for characterizing the risk state of a transaction guest group, and may specifically be a number or a graded descriptive term. For example, it may be 90 points on a hundred-point scale, or "excellent" on an excellent/medium/poor scale.
The first risk assessment model may be a model for performing risk assessment on a transaction guest group, where the first risk assessment model includes a cluster migration network, and parameters in the first risk assessment model are all set as unknown parameters. Wherein the cluster migration network may be a module of automated training in the first risk assessment model.
The initial parameters may be the parameters with which the first risk assessment model computes, from the obtained sample features, the assessment score corresponding to the sample transaction guest group features. It should be noted that the initial parameters may include a plurality of parameters, and are obtained by training the first risk assessment model.
The second risk assessment model may be obtained as follows: the initial parameters are obtained by training the first risk assessment model, and the unknown parameters in the first risk assessment model are replaced with the initial parameters to obtain the second risk assessment model, which is then used to assess the risk state of the acquired sample transaction guest group data.
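The replacement of unknown parameters by trained initial parameters can be sketched as below. The representation is an assumption made for illustration: a model is reduced to a dict of named parameters in which `None` marks an unknown value, and the parameter names are hypothetical.

```python
def build_second_model(first_model_params, initial_params):
    """Replace the unknown parameters of the first model with the initial parameters."""
    second_model_params = dict(first_model_params)
    for name, value in initial_params.items():
        if name not in second_model_params:
            raise KeyError(f"unexpected parameter: {name}")
        second_model_params[name] = value
    # Every parameter must be known before the second model can be used.
    still_unknown = [n for n, v in second_model_params.items() if v is None]
    if still_unknown:
        raise ValueError(f"parameters still unknown: {still_unknown}")
    return second_model_params


# First model: all parameters unknown (hypothetical names).
first_model = {"w_profit": None, "w_basic": None, "bias": None}
# Initial parameters as they might come out of training the first model.
initial = {"w_profit": 0.6, "w_basic": 0.2, "bias": 5.0}

second_model = build_second_model(first_model, initial)
```

The first model's dict is copied rather than modified, so the untrained structure stays available if training has to be repeated with an optimized cluster migration network.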
S108, calculating an evaluation loss function corresponding to the second risk evaluation model and a migration loss function corresponding to the clustering migration network, and determining a training state of the second risk evaluation model based on the evaluation loss function and the migration loss function until the training state indicates that the second risk evaluation model converges to obtain a trained risk evaluation model;
In one embodiment, after the second risk assessment model is obtained, a training evaluation score is calculated from the sample transaction guest group features, the sample general guest group features and the sample characterization parameters. The training evaluation score is compared with the sample evaluation score to obtain the evaluation loss function and the model regularization term corresponding to the second risk assessment model. The sum of the distribution distances corresponding to all classifications in the cluster migration network is calculated and used as the migration loss function of the cluster migration network. The training state of the second risk assessment model is determined based on the evaluation loss function, the model regularization term and the migration loss function, and the second risk assessment model is judged according to the training state until the training state indicates that the second risk assessment model has converged, yielding the trained risk assessment model.
The evaluation loss function may be a loss function indicating the difference between the training evaluation score obtained from the second risk assessment model and the sample evaluation score.
The model regularization term may be a parameter indicating the degree of overfitting of the second risk assessment model.
The migration loss function may be a loss function indicating the accuracy of the characterization parameters calculated by the cluster migration network.
The training state may be data indicating the degree of training of the second risk assessment model, and may be the result of comparing the training evaluation score with the sample evaluation score. When the difference between the training evaluation score and the sample evaluation score is less than or equal to a preset difference range, the training state is determined to indicate that the second risk assessment model has converged.
Further, when the training state indicates that the second risk assessment model has not converged, the cluster migration network is optimized to obtain an optimized cluster migration network, and steps S106 and S108 are executed again based on the optimized cluster migration network until the training state indicates that the second risk assessment model has converged.
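A minimal sketch of how the three quantities and the convergence check could fit together is given below. Only two points come from the text: the migration loss is the sum of the per-classification distribution distances, and convergence is declared when the difference between the training evaluation score and the sample evaluation score is within a preset range. The mean squared error, the L2 form of the regularization term, the additive combination and all default values are assumptions for illustration.

```python
def evaluation_loss(training_scores, sample_scores):
    """Assumed form: mean squared error between training and sample scores."""
    return sum(
        (t - s) ** 2 for t, s in zip(training_scores, sample_scores)
    ) / len(sample_scores)


def regularization_term(weights, strength=0.01):
    """Assumed form: L2 penalty on the model weights."""
    return strength * sum(w * w for w in weights)


def migration_loss(distances_by_class):
    """Sum of the distribution distances of all classifications."""
    return sum(distances_by_class.values())


def total_loss(training_scores, sample_scores, weights, distances_by_class):
    """Assumed combination: plain sum of the three terms."""
    return (
        evaluation_loss(training_scores, sample_scores)
        + regularization_term(weights)
        + migration_loss(distances_by_class)
    )


def has_converged(training_scores, sample_scores, preset_range=1.0):
    """True when every score difference is within the preset difference range."""
    return all(
        abs(t - s) <= preset_range for t, s in zip(training_scores, sample_scores)
    )


loss = total_loss([90.0, 70.0], [88.0, 74.0], [1.0, 2.0], {"profit": 1.0, "basic": 0.5})
converged = has_converged([90.0, 70.0], [88.0, 74.0], preset_range=4.0)
```

When `has_converged` returns false, the procedure described above would optimize the cluster migration network, which lowers the migration term, and then repeat steps S106 and S108.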
In the embodiments of this specification, first sample business guest group data and first sample general guest group data are obtained, and features are extracted from them to obtain sample business guest group features and sample general guest group features. Sample characterization parameters are then calculated and a sample evaluation score is obtained. Based on the sample characterization parameters, the sample evaluation score, the sample business guest group features and the sample general guest group features, the first risk evaluation model is trained to obtain the second risk evaluation model. The evaluation loss function and the migration loss function are calculated to judge whether the second risk evaluation model has converged, which completes the training and yields the trained risk evaluation model. Using sample general guest group data with a sufficient data volume, the trained model can calculate the evaluation score corresponding to sample business guest group data, so that risk evaluation is achieved for sample business guest group data with an insufficient data volume.
Referring to fig. 5, a schematic flowchart of a risk assessment model training method is provided for an embodiment of this specification. As shown in fig. 5, the method may include the following steps S202 to S224.
S202, acquiring first sample transaction guest group data and first sample general guest group data, and carrying out data screening on the first sample general guest group data and the first sample transaction guest group data to obtain second sample general guest group data and second sample transaction guest group data;
In one embodiment, the first sample transaction guest group data may be data of a guest group of a relatively new type, for example a guest group in an emerging industry or a guest group in a refined industry. It should be noted that such a guest group is characterized by having too little collected data for model training. For example, when the live-streaming industry was just emerging, entertainment companies related to live streaming could be regarded as a guest group of a relatively new type at that time. The first sample transaction guest group data may include the transaction name, transaction type, profit information, tax information, borrowing and repayment information, employee information, and the like.
A guest group type may count as relatively new either because the guest group belongs to an emerging industry or because little data has been collected for it within an existing industry.
Refinement may mean collecting data in a targeted manner for each sub-category of an existing industry guest group. For example, the catering industry guest group has a fairly long development history, but as more and more novel catering shops appear, those novel shops can form a refined industry guest group.
The first sample general guest group data may be guest group data of an existing industry, and it is correlated with the first sample transaction guest group data, which makes it easier to obtain an evaluation result for the first sample transaction guest group data. It should be noted that the amount of risk-assessment-related data in the first sample general guest group data is sufficient to form a data set for training a risk assessment model.
The second sample transaction guest group data and the second sample general guest group data may be data obtained by respectively performing data screening on the first sample transaction guest group data and the first sample general guest group data.
For example, data screening may sort out the data types present in the data and remove those irrelevant to risk audit, such as part of the employee information of a transaction party in the guest group, thereby determining the data types and label columns required for feature extraction.
The data screening step may be performed in advance, or the data may be input into the model to be trained and screened by the model; this may be set according to actual needs.
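The screening step can be illustrated with a minimal sketch. The field names and the `screen_records` helper below are hypothetical, since the specification does not define a data schema; the sketch only shows the idea of keeping the data types needed for risk audit and dropping the rest.

```python
# Hypothetical set of fields considered relevant to risk audit.
# The specification does not fix a schema; these names are assumptions.
RISK_RELEVANT_FIELDS = {"name", "group_type", "profit", "tax", "repayment"}


def screen_records(records, keep_fields=RISK_RELEVANT_FIELDS):
    """Return copies of the records that keep only the fields in keep_fields."""
    return [
        {key: value for key, value in record.items() if key in keep_fields}
        for record in records
    ]


# First sample data: one record that carries an irrelevant employee field.
first_sample = [
    {"name": "A", "group_type": "live-streaming", "profit": 120.0,
     "tax": 18.0, "repayment": "on_time", "employee_hobby": "chess"},
]

# Second sample data: the same record after screening.
second_sample = screen_records(first_sample)
```

The same helper could be applied to both the transaction guest group data and the general guest group data, so that both sides expose the same columns to feature extraction.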
S204, extracting the characteristics of the second sample general guest group data and the second sample transaction guest group data to obtain sample general guest group characteristics corresponding to the second sample general guest group data and sample transaction guest group characteristics corresponding to the second sample transaction guest group data;
In one embodiment, feature extraction on the second sample transaction guest group data and the second sample general guest group data may be performed by the model to be trained after the data are input into it, or may be performed in advance; this may be set according to actual needs.
It will be appreciated that the process of feature extraction of data is a process of converting an expression form of data into an expression form that can be understood and processed by the model.
S206, carrying out feature classification on the sample general guest group features and the sample transaction guest group features to obtain sub-sample general guest group features and sub-sample transaction guest group features of each classification;
In one embodiment, the method for classifying the features may be classifying the features with larger relevance into the same class, so as to obtain sub-sample generic-guest-group features corresponding to the sample generic-guest-group features and sub-sample transaction-guest-group features corresponding to the sample transaction-guest-group features. For example, the names, group types, and the like in the features are classified as basic information, and the annual profit, month profit, day profit, and the like are classified as profit. The sub-sample transaction guest group features may be information features corresponding to each category.
Illustratively, as shown in fig. 3, the features are divided into a plurality of classifications such as "classification 1" and "classification 2", and each classification includes a plurality of information features. Within the same classification, the information of the sample transaction guest group features corresponds to the information of the sample general guest group features; for example, "information 1" in "classification 1" corresponds to "information 3", and so on.
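The grouping of related features into classifications can be sketched as follows. The feature names, the `FEATURE_CLASS` mapping and the `classify_features` helper are assumptions made for illustration; the specification only states that strongly related features fall into the same classification.

```python
from collections import defaultdict

# Hypothetical mapping from feature name to classification.
FEATURE_CLASS = {
    "name": "basic_info",
    "group_type": "basic_info",
    "annual_profit": "profit",
    "monthly_profit": "profit",
    "daily_profit": "profit",
}


def classify_features(features, feature_class=FEATURE_CLASS):
    """Group a flat feature dict into one sub-feature dict per classification."""
    groups = defaultdict(dict)
    for feature_name, value in features.items():
        # Features without a known classification are collected under "other".
        groups[feature_class.get(feature_name, "other")][feature_name] = value
    return dict(groups)


transaction_features = {"name": "shop-1", "annual_profit": 12.0, "monthly_profit": 1.0}
general_features = {"name": "shop-9", "annual_profit": 95.0, "daily_profit": 0.3}

sub_transaction = classify_features(transaction_features)
sub_general = classify_features(general_features)
```

Because both guest groups are classified with the same mapping, the sub-sample features of one classification can later be compared pairwise.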
S208, respectively carrying out high-dimensional characterization on the sub-sample general-guest group characteristics and the sub-sample transaction guest group characteristics of each category to obtain characteristic distribution of each category mapped in the high-dimensional characterization;
In one embodiment, the sub-sample general guest group features and the sub-sample transaction guest group features of each classification may be characterized in high dimensions by mapping each classification in turn to higher dimensions, starting from the initial dimensions of its sub-sample transaction guest group features and sub-sample general guest group features, so as to obtain the feature distribution of each classification in the high-dimensional characterization.
In the high-dimensional mapping, the dimension span of each mapping may be set according to actual needs; for example, it may be five dimensions or one dimension.
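The specification does not say which mapping is used for the high-dimensional characterization. As a minimal sketch, assume a fixed random linear projection followed by a tanh non-linearity, where `span` is the number of dimensions added to the initial dimension; `make_mapper` and its parameters are hypothetical names.

```python
import math
import random


def make_mapper(in_dim, span, seed=0):
    """Build a function that maps in_dim-vectors to (in_dim + span)-vectors.

    Assumption: a seeded random projection with tanh stands in for whatever
    learned mapping the cluster migration network actually uses.
    """
    rng = random.Random(seed)
    out_dim = in_dim + span
    weights = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]

    def mapper(vector):
        if len(vector) != in_dim:
            raise ValueError("unexpected input dimension")
        return [math.tanh(sum(w * x for w, x in zip(row, vector))) for row in weights]

    return mapper


# Map a 3-dimensional sub-sample feature with a dimension span of five.
mapper = make_mapper(in_dim=3, span=5)
high = mapper([0.2, -1.0, 0.5])
```

Using the same mapper for the transaction and general sub-sample features of one classification places both in the same space, which is what the later distance calculation needs.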
S210, calculating the distribution distance of the feature distribution between the sub-sample general guest group features and the sub-sample transaction guest group features in each classification, and optimizing the distribution distance so that the dimensions of the sub-sample general guest group features and the sub-sample transaction guest group features in each classification are the same;
In one embodiment, the distribution distance of the feature distributions is calculated between the sub-sample general guest group features and the sub-sample transaction guest group features of the same classification, and optimization is performed on the basis of that distance: the distribution distance is narrowed until the high-dimensional characterizations of the sub-sample transaction guest group features and the sub-sample general guest group features of each classification are in the same dimension.
It will be appreciated that the optimizations of the distribution distances of different classifications do not interfere with one another, so the specific dimension at which the sub-sample transaction guest group features and the sub-sample general guest group features reach the same dimension may differ between classifications.
Illustratively, as shown in fig. 4, "information 1" and "information 3" of "classification 1" are mapped in high dimensions, and their dimensions are optimized until both reach the same dimension, "dimension x".
Further, as can be seen from fig. 6, "information 1" corresponds to "information 3" and "information 2" corresponds to "information 4", and the highest dimensions at which the two pairs respectively reach the same dimension are "dimension x" and "dimension y". The values x and y may be the same or different, depending on the actual situation.
When the high-dimensional mapping is performed, the features of a classification may be further subdivided and each subdivision raised in dimension separately, or all features of the sample transaction guest group and the sample general guest group in the same classification may be raised in dimension jointly; this may be set according to actual needs.
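The specification does not name a particular distribution distance. As a minimal sketch, assume the distance is the squared Euclidean distance between the mean feature vectors of the two groups, which corresponds to a maximum mean discrepancy with a linear kernel. The function names are hypothetical, and the check that both groups share one dimension mirrors the requirement that the dimensions be the same.

```python
def mean_vector(rows):
    """Column-wise mean of a non-empty list of equal-length vectors."""
    count = len(rows)
    return [sum(column) / count for column in zip(*rows)]


def distribution_distance(general_rows, transaction_rows):
    """Squared distance between the feature means of the two guest groups.

    Assumption: this simple statistic stands in for the unspecified
    distribution distance of the cluster migration network.
    """
    general_mean = mean_vector(general_rows)
    transaction_mean = mean_vector(transaction_rows)
    if len(general_mean) != len(transaction_mean):
        raise ValueError("both groups must be mapped to the same dimension")
    return sum((g - t) ** 2 for g, t in zip(general_mean, transaction_mean))


# Mapped sub-sample features of one classification (toy values).
general = [[1.0, 2.0], [3.0, 4.0]]
transaction = [[2.0, 2.0], [2.0, 2.0]]
distance = distribution_distance(general, transaction)
```

Here the group means are (2, 3) and (2, 2), so the distance is 1.0; identical distributions give a distance of zero, which is the direction in which the optimization pushes.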
S212, obtaining sample characterization parameters corresponding to each category in the cluster migration network when the dimensions are the same;
In one embodiment, the sample characterization parameters may be parameters for evaluating the sample general guest group features and the sample transaction guest group features; using the sample characterization parameters as the evaluation criterion, the sample general guest group features and the sample transaction guest group features are associated with each other.
It can be understood that, since the data volume of the sample general guest group features is sufficient and the data volume of the sample transaction guest group features is insufficient, the evaluation standard of the sample general guest group features can be transferred to the evaluation of the sample transaction guest group through the sample characterization parameters; the more accurate the sample characterization parameters, the more accurate the transferred evaluation standard.
S214, obtaining a sample evaluation score for representing the risk state of the transaction guest group;
In one embodiment, the sample evaluation score may be data characterizing the risk state of a transaction guest group, and may specifically be a number or a graded descriptive term. For example, it may be ninety points on a hundred-point scale, or "excellent" on an excellent/medium/poor scale.
The risk status may be information indicating the credit of the transaction group, for example, a good-quality group with good credit, a bad-quality group with bad credit, or the like.
S216, training the first risk assessment model based on the sample evaluation score, the sample characterization parameters, the sample general guest group features and the sample transaction guest group features corresponding to each classification, so as to obtain initial parameters corresponding to the first risk assessment model, wherein the sample characterization parameters are used for evaluating the sub-sample general guest group features and the sub-sample transaction guest group features corresponding to each classification in the sample general guest group features and the sample transaction guest group features;
In one embodiment, the first risk assessment model may be a model for performing risk assessment on a transaction guest group. The first risk assessment model includes a cluster migration network, and all parameters in the first risk assessment model are initially set as unknown parameters. The cluster migration network may be an automatically trained module within the first risk assessment model.
The initial parameters may be the parameters with which the first risk assessment model computes, from the obtained sample features, the evaluation score corresponding to the sample transaction guest group features. It should be noted that the initial parameters may include a plurality of parameters, and that they are obtained by training the first risk assessment model.
S218, obtaining a second risk assessment model based on the initial parameters;
In one embodiment, the second risk assessment model may be obtained by replacing the unknown parameters in the first risk assessment model with the initial parameters obtained from training the first risk assessment model, so that the second risk assessment model can be used to assess the risk state of the acquired sample transaction guest group data.
S220, calculating an evaluation loss function and a model regularization corresponding to the second risk assessment model;
In one embodiment, after the second risk assessment model is obtained, a training evaluation score is calculated from the sample transaction guest group features, the sample general guest group features and the sample characterization parameters, and the training evaluation score is compared with the sample evaluation score to obtain the evaluation loss function and the model regularization corresponding to the second risk assessment model.
The evaluation loss function may be a loss function indicating the difference between the training evaluation score obtained from the second risk assessment model and the sample evaluation score.
The model regularization may be a parameter indicating the degree of overfitting of the second risk assessment model.
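The specification does not give formulas for these two terms. A minimal sketch, assuming the evaluation loss is a mean squared error between training and sample scores and the regularization is an L2 penalty on the model parameters (both are common choices, not confirmed by the source):

```python
def evaluation_loss(training_scores, sample_scores):
    """Mean squared difference between training scores and sample scores."""
    if len(training_scores) != len(sample_scores):
        raise ValueError("score lists must have the same length")
    count = len(sample_scores)
    return sum((t - s) ** 2 for t, s in zip(training_scores, sample_scores)) / count


def l2_regularization(parameters):
    """Sum of squared parameters; larger values suggest a more complex model."""
    return sum(p * p for p in parameters)


# Toy values: scores are normalized to [0, 1], parameters are arbitrary.
loss = evaluation_loss([0.5, 1.0], [1.0, 1.0])
penalty = l2_regularization([3.0, 4.0])
```

With these toy inputs the loss is 0.125 and the penalty is 25.0.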
S222, calculating the sum of distribution distances corresponding to the classifications in the cluster migration network, and taking the sum of the distribution distances as a migration loss function of the cluster migration network;
In one embodiment, the migration loss function may be a loss function indicating the accuracy of the characterization parameters calculated by the cluster migration network.
Illustratively, suppose the cluster migration network includes five classifications whose distribution distances are "distance 1", "distance 2", "distance 3", "distance 4" and "distance 5". The five distribution distances are added to obtain "distance 6", and "distance 6" is the migration loss function corresponding to the cluster migration network, so that the cluster migration network can evaluate the transaction guest group features according to the migration loss function. It should be noted that "distance 6" is not a specific distance; it characterizes how accurately the sample characterization parameters obtained by the cluster migration network evaluate the transaction guest group features.
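The summation in step S222 can be sketched directly. The classification names and distance values below are made up for illustration.

```python
def migration_loss(distances_by_class):
    """Sum the per-classification distribution distances (step S222)."""
    return sum(distances_by_class.values())


# Hypothetical distribution distances for five classifications.
distances = {
    "class_1": 0.5,
    "class_2": 0.25,
    "class_3": 1.0,
    "class_4": 0.125,
    "class_5": 0.125,
}
total_distance = migration_loss(distances)
```

Because every term is non-negative, the sum can only reach zero when the distributions of all classifications coincide, so a smaller value indicates a better alignment between the two guest groups.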
S224, determining a training state of the second risk assessment model based on the assessment loss function, the model regularization and the migration loss function, until the training state indicates that the second risk assessment model converges, and obtaining a trained risk assessment model;
In one embodiment, the training state may be data indicating the degree to which the second risk assessment model has been trained, and may be the result of comparing the training evaluation score with the sample evaluation score. When the difference between the training evaluation score and the sample evaluation score is less than or equal to a preset difference range, the training state is determined to indicate that the second risk assessment model has converged.
Further, when the training state indicates that the second risk assessment model has not converged, the cluster migration network is optimized to obtain an optimized cluster migration network, and step S216 is executed again until the training state indicates that the second risk assessment model has converged.
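The repeat-until-converged structure of step S224 can be sketched as a loop. How the three terms are combined is not stated in the specification; the weighted sum, the weights, the threshold test on the total and the `toy_step` stand-in for one optimization round are all assumptions.

```python
def total_loss(eval_loss, regularization, migration, reg_weight=0.01, mig_weight=1.0):
    """Assumed combination: evaluation loss plus weighted penalty terms."""
    return eval_loss + reg_weight * regularization + mig_weight * migration


def train_until_converged(step, threshold, max_rounds=100):
    """Run step() until the combined loss is within threshold.

    step() performs one optimization round and returns the tuple
    (evaluation loss, regularization, migration loss).
    Returns the round at which convergence was reached, or None.
    """
    for round_index in range(1, max_rounds + 1):
        eval_loss, regularization, migration = step()
        if total_loss(eval_loss, regularization, migration) <= threshold:
            return round_index
    return None


# Toy stand-in for one optimization round: every round halves all terms.
state = {"scale": 1.0}


def toy_step():
    state["scale"] /= 2.0
    scale = state["scale"]
    return 0.8 * scale, 1.0 * scale, 0.4 * scale


rounds = train_until_converged(toy_step, threshold=0.1)
```

In a real implementation `step` would re-optimize the cluster migration network and retrain the second risk assessment model; the loop only captures the control flow.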
In the embodiments of this specification, first sample transaction guest group data and first sample general guest group data are acquired, and features are extracted from them to obtain sample transaction guest group features and sample general guest group features. Sample characterization parameters are calculated and a sample evaluation score is obtained. The first risk assessment model is trained on the sample characterization parameters, the sample evaluation score, the sample transaction guest group features and the sample general guest group features to obtain the second risk assessment model, and the evaluation loss function and the migration loss function are calculated to judge whether the second risk assessment model has converged. Once it has converged, training is complete and the trained risk assessment model is obtained. The trained model can then use sample general guest group data, whose volume is sufficient, to calculate the evaluation score corresponding to sample transaction guest group data, whose volume is insufficient, thereby achieving risk assessment of the latter.
Referring to fig. 7, a schematic flowchart of a risk assessment method is provided for an embodiment of this specification. As shown in fig. 7, the method may include the following steps S302 to S306.
S302, transaction guest group data corresponding to the transaction guest group and general guest group data corresponding to the transaction guest group data are obtained, and the transaction guest group data and the general guest group data are input into a trained risk assessment model;
In one embodiment, the transaction guest group data may be data of a guest group of a relatively new type that needs risk assessment, for example a guest group in an emerging industry or a guest group in a refined industry. It should be noted that such a guest group is characterized by having too little collected data for model training.
For example, when the live-streaming industry was just emerging, entertainment companies related to live streaming could be regarded as a guest group of a relatively new type at that time. The transaction guest group data may include the transaction name, transaction type, profit information, tax information, borrowing and repayment information, employee information, and the like.
A guest group type may count as relatively new either because the guest group belongs to an emerging industry or because little data has been collected for it within an existing industry.
Refinement may mean collecting data in a targeted manner for each sub-category of an existing industry.
The general guest group data can be guest group data of the existing industry, and the general guest group data has correlation with the business guest group data, so that the corresponding evaluation result of the business guest group data can be conveniently obtained.
The risk assessment model may be a pre-trained model for risk assessment of a business guest group, and the specific training method of the risk assessment model may refer to the above embodiments, which are not described herein.
It should be noted that, in order to improve the efficiency with which the risk assessment model produces its result, the risk assessment model performs data screening on the input transaction guest group data and general guest group data to obtain the data required for calculation.
Alternatively, data screening may be performed on the transaction guest group data and the general guest group data before they are input into the risk assessment model, so that the data screening step inside the risk assessment model is omitted; this may be set according to the actual situation.
S304, performing high-dimensional mapping on the transaction guest group data and the general guest group data based on the risk assessment model to obtain characterization parameters corresponding to the transaction guest group data and the general guest group data;
In one embodiment, a cluster migration network in a risk assessment model is used for performing high-dimensional mapping on the transaction guest group data and the general guest group data to obtain characterization parameters corresponding to the transaction guest group data and the general guest group data.
The cluster migration network may be a module encapsulated and integrated in the risk assessment model. It uses the calculated characterization parameters to evaluate the general guest group features and the transaction guest group features, and uses the characterization parameters as the evaluation standard to associate the general guest group features with the transaction guest group features.
It can be understood that corresponding data in different guest groups is difficult to measure or compare with exact numbers. The characterization parameters therefore serve as the measurement standard that associates the transaction guest group with the general guest group, so that the sufficient data volume of the general guest group can support the evaluation of the transaction guest group, and evaluation and other calculations on the transaction data can be carried out on the basis of the general guest group data.
S306, performing risk assessment on the transaction guest group data based on the characterization parameters and the general guest group data to obtain a risk assessment result corresponding to the transaction guest group;
In one embodiment, the risk assessment result may be the result output after the risk assessment model performs risk assessment on the transaction guest group data, and may specifically be a number or a graded descriptive term. For example, it may be ninety points on a hundred-point scale, or "excellent" on an excellent/medium/poor scale. The risk assessment result characterizes the risk state of the transaction guest group and thus expresses that state in an intuitive form.
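The risk assessment result may be reported either as a number or as a graded term. A minimal sketch of converting one into the other, assuming a score between 0 and 100 and three grade labels; the labels and the cut-off values of 80 and 60 are hypothetical and do not come from the specification.

```python
def grade(score):
    """Convert a 0-100 risk assessment score into a graded term."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 80:  # assumed cut-off
        return "excellent"
    if score >= 60:  # assumed cut-off
        return "medium"
    return "poor"


result = grade(90)
```

A score of ninety therefore maps to "excellent" under these assumed thresholds.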
It can be understood that the risk state of the transaction guest group corresponding to the transaction guest group data can be determined from the risk assessment result. Risk assessment of a transaction guest group with insufficient data is thus completed by means of the general guest group data, whose volume is sufficient; the risk assessment result can in turn provide data support for transactions involving the transaction guest group and improve the efficiency with which they are handled.
In the embodiments of this specification, transaction guest group data and the general guest group data corresponding to it are acquired. After the characterization parameters corresponding to the transaction guest group data are calculated based on the risk assessment model, risk assessment is performed according to the characterization parameters and the general guest group data, and the trained risk assessment model outputs the risk assessment result corresponding to the transaction guest group data. The sufficient data volume of the general guest group thus supports the evaluation of the transaction guest group: the evaluation score for the transaction data is calculated from general guest group data of sufficient volume, achieving risk assessment of a transaction guest group whose data is insufficient.
Based on the system architecture shown in fig. 1, the risk assessment model training apparatus provided in the embodiments of this specification is described in detail below with reference to figs. 8 to 10. It should be noted that the risk assessment model training apparatus in figs. 8 to 10 is used to perform the methods of the embodiments shown in figs. 2 to 6. For convenience of explanation, only the parts relevant to the embodiments of this specification are shown; for specific technical details that are not disclosed, please refer to the embodiments shown in figs. 2 to 6.
Referring to fig. 8, a schematic structural diagram of a risk assessment model training apparatus is provided in the embodiment of the present disclosure. As shown in fig. 8, the risk assessment model training apparatus 1 of the embodiment of the present specification may include: a feature acquisition unit 11, a parameter acquisition unit 12, a model training unit 13, and a model completion unit 14.
The feature acquiring unit 11 is configured to acquire first sample transaction guest group data and first sample general guest group data, perform feature extraction on the first sample general guest group data and the first sample transaction guest group data, and obtain a sample general guest group feature corresponding to the first sample general guest group data and a sample transaction guest group feature corresponding to the first sample transaction guest group data;
A parameter obtaining unit 12, configured to calculate, based on the sample generic-guest-group feature and the sample transaction-guest-group feature, a sample characterization parameter of a cluster migration network, where the sample characterization parameter is used as an evaluation criterion to correlate the sample generic-guest-group feature and the sample transaction-guest-group feature, and the cluster migration network is included in a first risk evaluation model;
a model training unit 13, configured to obtain a sample evaluation score for characterizing a risk state of a transaction guest group, train the first risk evaluation model based on the sample general guest group feature, the sample transaction guest group feature, the sample characterization parameter and the sample evaluation score, obtain an initial parameter of the first risk evaluation model, and obtain a second risk evaluation model based on the initial parameter;
the model completion unit 14 is configured to calculate an evaluation loss function corresponding to the second risk assessment model and a migration loss function corresponding to the cluster migration network, and determine a training state of the second risk assessment model based on the evaluation loss function and the migration loss function, until the training state indicates that the second risk assessment model converges, so as to obtain a trained risk assessment model.
Alternatively, as shown in fig. 9, the feature acquisition unit 11 includes:
a data screening subunit 111, configured to obtain first sample transaction guest group data and first sample general guest group data, and perform data screening on the first sample general guest group data and the first sample transaction guest group data to obtain second sample general guest group data and second sample transaction guest group data;
and the feature obtaining subunit 112 is configured to perform feature extraction on the second sample generic-guest group data and the second sample transaction guest group data, so as to obtain a sample generic-guest group feature corresponding to the second sample generic-guest group data and a sample transaction guest group feature corresponding to the second sample transaction guest group data.
Alternatively, as shown in fig. 10, the parameter acquiring unit 12 includes:
a feature classification subunit 121, configured to perform feature classification on the sample generic-guest group feature and the sample transaction guest group feature, so as to obtain a sub-sample generic-guest group feature and a sub-sample transaction guest group feature of each classification;
a distribution obtaining subunit 122, configured to perform high-dimensional characterization on the sub-sample generic-guest group features and the sub-sample transaction guest group features of each class, so as to obtain feature distribution of each class mapped in the high-dimensional characterization;
an optimizing subunit 123, configured to calculate a distribution distance of the feature distribution between the sub-sample general guest group features and the sub-sample transaction guest group features in each classification, and optimize the distribution distance so that the dimensions of the sub-sample general guest group features and the sub-sample transaction guest group features in each classification are the same;
and the parameter obtaining subunit 124 is configured to obtain, when the dimensions are the same, a sample characterization parameter corresponding to each of the classifications in the cluster migration network.
Optionally, the model training unit 13 is configured to:
obtaining a sample evaluation score for characterizing a risk state of a transaction guest group;
training the first risk assessment model based on the sample evaluation score, the sample characterization parameters, the sample general guest group features and the sample transaction guest group features corresponding to each classification, so as to obtain initial parameters corresponding to the first risk assessment model, wherein the sample characterization parameters are used for evaluating the sub-sample general guest group features and the sub-sample transaction guest group features corresponding to each classification in the sample general guest group features and the sample transaction guest group features;
and obtaining a second risk assessment model based on the initial parameters.
Optionally, the model completion unit 14 is further configured to:
calculating an evaluation loss function and a model regularization corresponding to the second risk assessment model;
calculating the sum of distribution distances corresponding to the classifications in the cluster migration network, and taking the sum of the distribution distances as a migration loss function of the cluster migration network;
and determining the training state of the second risk assessment model based on the assessment loss function, the model regularization and the migration loss function until the training state indicates the convergence of the second risk assessment model, so as to obtain a risk assessment model after training.
Optionally, the model completion unit 14 is further configured to:
when the training state indicates that the second risk assessment model is not converged, optimizing the cluster migration network to obtain an optimized cluster migration network, wherein the distribution distance of each classification in the optimized cluster migration network is reduced when the dimensions of the classification are the same;
training the second risk assessment model based on the optimized cluster migration network, and performing the step of calculating an assessment loss function corresponding to the second risk assessment model and a migration loss function corresponding to the cluster migration network.
In the embodiments of this specification, first sample transaction guest group data and first sample general guest group data are acquired, and features are extracted from them to obtain sample transaction guest group features and sample general guest group features. Sample characterization parameters are calculated and a sample evaluation score is obtained. The first risk assessment model is trained on the sample characterization parameters, the sample evaluation score, the sample transaction guest group features and the sample general guest group features to obtain the second risk assessment model, and the evaluation loss function and the migration loss function are calculated to judge whether the second risk assessment model has converged. Once it has converged, training is complete and the trained risk assessment model is obtained. The trained model can then use sample general guest group data, whose volume is sufficient, to calculate the evaluation score corresponding to sample transaction guest group data, whose volume is insufficient, thereby achieving risk assessment of the latter.
Based on the system architecture shown in fig. 1, the risk assessment apparatus provided in the embodiments of this specification is described in detail below with reference to fig. 11. It should be noted that the risk assessment apparatus in fig. 11 is used to perform the method of the embodiment shown in fig. 7. For convenience of description, only the parts relevant to the embodiments of this specification are shown; for specific technical details that are not disclosed here, please refer to the embodiment shown in fig. 7.
Referring to fig. 11, a schematic structural diagram of a risk assessment apparatus is provided in the embodiment of the present disclosure. As shown in fig. 11, the risk assessment apparatus 1 of the embodiment of the present specification may include: a data acquisition unit 21, a parameter calculation unit 22, and a result acquisition unit 23.
a data acquisition unit 21, configured to acquire transaction guest group data corresponding to a transaction guest group and general guest group data corresponding to the transaction guest group data, and to input the transaction guest group data and the general guest group data into a trained risk assessment model obtained by the risk assessment model training method;
a parameter calculation unit 22, configured to perform high-dimensional mapping on the transaction guest group data and the general guest group data based on the risk assessment model, so as to obtain characterization parameters corresponding to the transaction guest group data and the general guest group data;
a result acquisition unit 23, configured to perform risk assessment on the transaction guest group data based on the characterization parameters and the general guest group data, so as to obtain a risk assessment result corresponding to the transaction guest group.
In the embodiments of this specification, transaction guest group data and the general guest group data corresponding to it are obtained. After the characterization parameters corresponding to the transaction guest group data are calculated based on the risk assessment model, risk assessment is performed according to those characterization parameters and the corresponding general guest group data, yielding the risk assessment result for the transaction guest group data output by the trained risk assessment model. The sufficient data volume of the general guest group thus supports the assessment of the transaction guest group: the evaluation score for the transaction guest group data is calculated with the help of the general guest group data, so that risk assessment is realized for a transaction guest group whose own data is insufficient.
The embodiments of the present disclosure further provide a computer storage medium, where a plurality of program instructions may be stored, where the program instructions are adapted to be loaded by a processor and execute the method steps of the embodiments shown in fig. 1 to fig. 7, and the specific execution process may refer to the specific description of the embodiments shown in fig. 1 to fig. 7, which is not repeated herein.
The embodiments of the present disclosure further provide a computer program product storing at least one instruction, where the at least one instruction is loaded and executed by a processor to implement the risk assessment model training method and the risk assessment method described in the embodiments of fig. 1 to fig. 7. For the specific implementation process, refer to the description of those embodiments, which is not repeated here.
Referring to fig. 12, a schematic structural diagram of an electronic device is provided in an embodiment of the present disclosure. As shown in fig. 12, the electronic device 1000 may include: at least one processor 1001 (such as a CPU), at least one network interface 1004, an input/output interface 1003, a memory 1005, and at least one communication bus 1002. The communication bus 1002 is used to enable connection and communication between these components. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM or a non-volatile memory, such as at least one disk memory. The memory 1005 may also optionally be at least one storage device located remotely from the processor 1001. As shown in fig. 12, the memory 1005, which is one type of computer storage medium, may include an operating system, a network communication module, an input/output interface module, and a risk assessment model training application.
In the electronic device 1000 shown in fig. 12, the input/output interface 1003 mainly serves as the interface through which a user provides input, and acquires the data input by the user.
In one embodiment, the processor 1001 may be configured to invoke the risk assessment model training application stored in the memory 1005 and specifically perform the following operations:
acquiring first sample transaction guest group data and first sample general guest group data, and extracting features of the first sample general guest group data and the first sample transaction guest group data to obtain sample general guest group features corresponding to the first sample general guest group data and sample transaction guest group features corresponding to the first sample transaction guest group data;
based on the sample general guest group features and the sample transaction guest group features, calculating sample characterization parameters of a cluster migration network, wherein the sample characterization parameters are used for correlating the sample general guest group features and the sample transaction guest group features as an evaluation standard, and the cluster migration network is contained in a first risk evaluation model;
acquiring a sample evaluation score for representing the risk state of a transaction guest group, training the first risk evaluation model based on the sample general guest group characteristics, the sample transaction guest group characteristics, the sample characterization parameters and the sample evaluation score to obtain initial parameters of the first risk evaluation model, and obtaining a second risk evaluation model based on the initial parameters;
and calculating an evaluation loss function corresponding to the second risk evaluation model and a migration loss function corresponding to the cluster migration network, and determining a training state of the second risk evaluation model based on the evaluation loss function and the migration loss function until the training state indicates that the second risk evaluation model converges, so as to obtain a trained risk evaluation model.
Optionally, when acquiring the first sample transaction guest group data and the first sample general guest group data and performing feature extraction on them to obtain the sample general guest group features corresponding to the first sample general guest group data and the sample transaction guest group features corresponding to the first sample transaction guest group data, the processor 1001 specifically performs the following operations:
acquiring first sample transaction guest group data and first sample general guest group data, and carrying out data screening on the first sample general guest group data and the first sample transaction guest group data to obtain second sample general guest group data and second sample transaction guest group data;
and extracting the characteristics of the second sample general guest group data and the second sample transaction guest group data to obtain sample general guest group characteristics corresponding to the second sample general guest group data and sample transaction guest group characteristics corresponding to the second sample transaction guest group data.
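The two operations above (screening, then feature extraction) can be sketched as follows. This is a minimal illustration only: the specification does not state the screening rule or the feature extractor here, so dropping rows with missing values and z-score standardization are assumptions, and every function and variable name is hypothetical.

```python
import numpy as np

def screen_rows(data: np.ndarray) -> np.ndarray:
    """Data screening (assumed rule): drop every row that contains a NaN."""
    keep = ~np.isnan(data).any(axis=1)
    return data[keep]

def extract_features(data: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Feature extraction (assumed): z-score standardization with shared statistics."""
    safe_std = np.where(std == 0, 1.0, std)  # avoid division by zero
    return (data - mean) / safe_std

# "First sample" data for both guest groups (toy values).
first_general = np.array([[1.0, 2.0], [3.0, np.nan], [5.0, 6.0]])
first_transaction = np.array([[2.0, 4.0], [np.nan, 1.0]])

# Screening yields the "second sample" data.
second_general = screen_rows(first_general)
second_transaction = screen_rows(first_transaction)

# Statistics come from the data-rich general group and are reused for the
# transaction group so both feature sets share one scale (an assumption).
mean = second_general.mean(axis=0)
std = second_general.std(axis=0)
general_features = extract_features(second_general, mean, std)
transaction_features = extract_features(second_transaction, mean, std)
```

Reusing the general-group statistics for the transaction group is one way to keep the two feature sets comparable when the transaction group is too small to give stable statistics of its own.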
Optionally, when executing the calculation to obtain the sample characterization parameter of the cluster migration network based on the sample generic-guest group feature and the sample transaction guest group feature, the processor 1001 specifically performs the following operations:
performing feature classification on the sample general guest group features and the sample transaction guest group features to obtain sub-sample general guest group features and sub-sample transaction guest group features of each classification;
respectively carrying out high-dimensional characterization on the sub-sample general guest group features and the sub-sample transaction guest group features of each classification to obtain the feature distribution of each classification mapped in the high-dimensional characterization;
calculating the distribution distance of the feature distribution between the sub-sample general guest group features and the sub-sample transaction guest group features in each classification, and optimizing the distribution distance so that the dimensions of the sub-sample general guest group features and the sub-sample transaction guest group features in each classification are the same;
and obtaining sample characterization parameters corresponding to each category in the cluster migration network when the dimensions are the same.
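A per-classification distribution distance of the kind described above can be sketched as follows. The specification does not name the distance, so the squared maximum mean discrepancy (MMD) with an RBF kernel is an assumption chosen because the kernel acts as an implicit high-dimensional mapping; the class labels, `gamma`, and all function names are hypothetical.

```python
import numpy as np

def rbf_kernel(a: np.ndarray, b: np.ndarray, gamma: float = 1.0) -> np.ndarray:
    """Pairwise RBF kernel values between the rows of a and the rows of b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dist)

def mmd2(x: np.ndarray, y: np.ndarray, gamma: float = 1.0) -> float:
    """Biased estimate of the squared MMD between samples x and y."""
    k_xx = rbf_kernel(x, x, gamma).mean()
    k_yy = rbf_kernel(y, y, gamma).mean()
    k_xy = rbf_kernel(x, y, gamma).mean()
    return float(k_xx + k_yy - 2.0 * k_xy)

def per_class_distances(general, general_labels, transaction, transaction_labels, gamma=1.0):
    """Distribution distance for every classification present in both groups."""
    distances = {}
    for c in np.intersect1d(general_labels, transaction_labels):
        sub_general = general[general_labels == c]              # sub-sample general features
        sub_transaction = transaction[transaction_labels == c]  # sub-sample transaction features
        distances[int(c)] = mmd2(sub_general, sub_transaction, gamma)
    return distances

# Toy data: two classifications, transaction features shifted by 0.5.
general = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]])
labels = np.array([0, 0, 1, 1])
transaction = general + 0.5

distances = per_class_distances(general, labels, transaction, labels)
```

The dictionary holds one distance per classification; shrinking these values is what aligning the two groups within each classification would mean under this assumed distance.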
Optionally, when obtaining a sample evaluation score for characterizing a risk state of a transaction guest group, training the first risk assessment model based on the sample general guest group features, the sample transaction guest group features, the sample characterization parameters and the sample evaluation score to obtain initial parameters of the first risk assessment model, and obtaining a second risk assessment model based on the initial parameters, the processor 1001 specifically performs the following operations:
obtaining a sample evaluation score for characterizing a risk state of a transaction guest group;
training the first risk assessment model based on the sample evaluation score, the sample characterization parameters, the sample general guest group features and the sample transaction guest group features corresponding to each classification, so as to obtain initial parameters corresponding to the first risk assessment model, wherein the sample characterization parameters are used for assessing the sub-sample general guest group features and the sub-sample transaction guest group features corresponding to each classification in the sample general guest group features and the sample transaction guest group features;
and obtaining a second risk assessment model based on the initial parameters.
Optionally, when calculating an evaluation loss function corresponding to the second risk assessment model and a migration loss function corresponding to the cluster migration network, and determining a training state of the second risk assessment model based on the evaluation loss function and the migration loss function until the training state indicates that the second risk assessment model converges, so as to obtain a trained risk assessment model, the processor 1001 specifically performs the following operations:
calculating an evaluation loss function and a model regularization term corresponding to the second risk assessment model;
calculating the sum of the distribution distances corresponding to the classifications in the cluster migration network, and taking the sum of the distribution distances as the migration loss function of the cluster migration network;
and determining the training state of the second risk assessment model based on the assessment loss function, the model regularization and the migration loss function until the training state indicates the convergence of the second risk assessment model, so as to obtain a risk assessment model after training.
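The three quantities named above can be combined into a single training objective, sketched below. The text only says that the training state is determined from the evaluation loss, the model regularization and the migration loss; the mean squared error form, the L2 penalty, the weights `lam` and `mu`, the additive combination, and the tolerance-based convergence test are all assumptions, and every name is hypothetical.

```python
import numpy as np

def evaluation_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Evaluation loss (assumed form): MSE between predicted and sample evaluation scores."""
    return float(np.mean((pred - target) ** 2))

def l2_regularization(weights) -> float:
    """Model regularization (assumed form): sum of squared model weights."""
    return float(sum(np.sum(w ** 2) for w in weights))

def migration_loss(class_distances: dict) -> float:
    """Migration loss: sum of the per-classification distribution distances."""
    return float(sum(class_distances.values()))

def total_loss(pred, target, weights, class_distances, lam=1e-3, mu=1.0) -> float:
    """Weighted sum of the three terms; lam and mu are hypothetical trade-off weights."""
    return (evaluation_loss(pred, target)
            + lam * l2_regularization(weights)
            + mu * migration_loss(class_distances))

def has_converged(previous: float, current: float, tol: float = 1e-4) -> bool:
    """Training state (assumed rule): converged when the loss change falls below tol."""
    return abs(previous - current) < tol

# Toy values: two predictions, one weight vector, two classifications.
pred = np.array([0.5, 0.5])
target = np.array([1.0, 0.0])
weights = [np.array([1.0, 2.0])]
class_distances = {0: 0.1, 1: 0.2}

loss = total_loss(pred, target, weights, class_distances, lam=0.1, mu=1.0)
```

Here the evaluation term is 0.25, the regularization term is 0.1 × 5 = 0.5 and the migration term is 0.3, so the toy objective evaluates to about 1.05.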
Optionally, when determining a training state of the second risk assessment model based on the evaluation loss function and the migration loss function until the training state indicates that the second risk assessment model converges, so as to obtain a trained risk assessment model, the processor 1001 specifically performs the following operations:
when the training state indicates that the second risk assessment model is not converged, optimizing the cluster migration network to obtain an optimized cluster migration network, wherein the distribution distance of each classification in the optimized cluster migration network is reduced when the dimensions of the classification are the same;
training the second risk assessment model based on the optimized cluster migration network, and performing the step of calculating an assessment loss function corresponding to the second risk assessment model and a migration loss function corresponding to the cluster migration network.
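The iterate-until-convergence behaviour described above can be illustrated with a deliberately small toy. Everything here is an assumption made to keep the sketch short: the scoring model is linear, the cluster migration network is reduced to a single learnable shift `b` applied to the transaction features, the distribution distance is the squared distance between class means, and the evaluation scores are attached to the general samples. None of these choices is taken from the specification.

```python
import numpy as np

def train(general, g_labels, transaction, t_labels, scores, lr=0.1, tol=1e-8, max_iter=5000):
    """Alternate loss evaluation and parameter updates until the loss stops changing."""
    n_feat = general.shape[1]
    w = np.zeros(n_feat)  # weights of the toy scoring model
    b = np.zeros(n_feat)  # shift standing in for the migration network
    classes = np.intersect1d(g_labels, t_labels)
    previous = np.inf
    loss = np.inf
    for step in range(max_iter):
        # Evaluation loss: MSE of the linear scores.
        err = general @ w - scores
        eval_loss = np.mean(err ** 2)
        grad_w = 2.0 * general.T @ err / len(err)

        # Migration loss: sum over classifications of squared class-mean gaps.
        mig_loss = 0.0
        grad_b = np.zeros(n_feat)
        for c in classes:
            gap = (transaction[t_labels == c].mean(axis=0) + b
                   - general[g_labels == c].mean(axis=0))
            mig_loss += gap @ gap
            grad_b += 2.0 * gap

        loss = eval_loss + mig_loss
        if abs(previous - loss) < tol:  # training state: converged
            return w, b, loss, step
        previous = loss

        # Not converged: optimize the migration shift and retrain the scorer.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, loss, max_iter

general = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
labels = np.array([0, 0, 1, 1])
transaction = general + np.array([2.0, -1.0])  # same classes, shifted distribution
scores = general @ np.array([1.0, 2.0])        # toy evaluation scores

w, b, final_loss, steps = train(general, labels, transaction, labels, scores)
```

In this toy the shift should settle near the negative of the offset between the two groups, which is the sense in which each pass reduces the per-classification distribution distance.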
In the embodiments of this specification, first sample transaction guest group data and first sample general guest group data are obtained, and features are extracted to obtain sample transaction guest group features and sample general guest group features. Sample characterization parameters are then calculated and a sample evaluation score is obtained. Based on the sample characterization parameters, the sample evaluation score, the sample transaction guest group features and the sample general guest group features, the first risk evaluation model is trained to obtain the second risk evaluation model. The evaluation loss function and the migration loss function are calculated to judge whether the second risk evaluation model has converged, which completes the training and yields the trained risk evaluation model. With this model, the evaluation score corresponding to the sample transaction guest group data can be calculated from the sample general guest group data, whose data volume is sufficient, so that risk evaluation is realized for sample transaction guest group data whose data volume is insufficient.
Referring to fig. 13, a schematic structural diagram of an electronic device is provided in an embodiment of the present disclosure. As shown in fig. 13, the electronic device 2000 may include: at least one processor 2001 (such as a CPU), at least one network interface 2004, an input/output interface 2003, a memory 2005, and at least one communication bus 2002. The communication bus 2002 is used to enable connection and communication between these components. The network interface 2004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 2005 may be a high-speed RAM or a non-volatile memory, such as at least one disk memory. The memory 2005 may also optionally be at least one storage device located remotely from the processor 2001. As shown in fig. 13, the memory 2005, which is one type of computer storage medium, may include an operating system, a network communication module, an input/output interface module, and a risk assessment application.
In the electronic device 2000 shown in fig. 13, the input/output interface 2003 mainly serves as the interface through which a user provides input, and acquires the data input by the user.
In one embodiment, the processor 2001 may be configured to invoke the risk assessment application stored in the memory 2005 and specifically perform the following operations:
acquiring transaction guest group data corresponding to the transaction guest group and general guest group data corresponding to the transaction guest group data, and inputting the transaction guest group data and the general guest group data into a trained risk assessment model obtained by the risk assessment model training method;
performing high-dimensional mapping on the transaction guest group data and the general guest group data based on the risk assessment model to obtain characterization parameters corresponding to the transaction guest group data and the general guest group data;
and performing risk assessment on the transaction guest group data based on the characterization parameters and the general guest group data to obtain a risk assessment result corresponding to the transaction guest group.
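The three inference operations above can be sketched end to end as follows. The specification does not give the form of the high-dimensional mapping, the characterization parameters, or the scoring function, so this sketch assumes a tanh of a linear projection, an offset between the mapped group means, and a logistic score. `projection`, `weights`, `threshold` and all function names are hypothetical stand-ins for whatever a trained model would provide.

```python
import numpy as np

def high_dim_map(x: np.ndarray, projection: np.ndarray) -> np.ndarray:
    """High-dimensional mapping (assumed form): tanh of a linear projection."""
    return np.tanh(x @ projection)

def assess_risk(transaction, general, projection, weights, threshold=0.5):
    """Score transaction samples with support from the general guest group data."""
    mapped_transaction = high_dim_map(transaction, projection)
    mapped_general = high_dim_map(general, projection)
    # Characterization parameter (assumed): offset that moves the transaction
    # group's mapped mean onto the general group's mapped mean.
    offset = mapped_general.mean(axis=0) - mapped_transaction.mean(axis=0)
    logits = (mapped_transaction + offset) @ weights
    scores = 1.0 / (1.0 + np.exp(-logits))  # logistic score in (0, 1)
    return scores, scores >= threshold

# Toy inputs: 2 raw features mapped to 4 dimensions.
transaction = np.array([[0.2, 0.1], [0.9, 0.8]])
general = np.array([[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]])
projection = np.array([[0.5, -0.5, 1.0, 0.0],
                       [0.5, 0.5, 0.0, 1.0]])
weights = np.array([1.0, -1.0, 0.5, 0.5])

scores, high_risk = assess_risk(transaction, general, projection, weights)
```

Each transaction sample receives one score and one boolean flag; how a real deployment would turn scores into a risk assessment result is not specified in the text.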
In the embodiments of this specification, transaction guest group data and the general guest group data corresponding to it are obtained. After the characterization parameters corresponding to the transaction guest group data are calculated based on the risk assessment model, risk assessment is performed according to those characterization parameters and the corresponding general guest group data, yielding the risk assessment result for the transaction guest group data output by the trained risk assessment model. The sufficient data volume of the general guest group thus supports the assessment of the transaction guest group: the evaluation score for the transaction guest group data is calculated with the help of the general guest group data, so that risk assessment is realized for a transaction guest group whose own data is insufficient.
Those skilled in the art will appreciate that all or part of the methods of the above embodiments may be implemented by a computer program stored on a computer-readable storage medium; when executed, the program may carry out the steps of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
The foregoing discloses only preferred embodiments of this specification and is not to be construed as limiting the scope of the claims; equivalent variations made in accordance with the claims remain within the scope covered by this specification.

Claims (12)

1. A risk assessment model training method, the method comprising:
acquiring first sample transaction guest group data and first sample general guest group data, and extracting features of the first sample general guest group data and the first sample transaction guest group data to obtain sample general guest group features corresponding to the first sample general guest group data and sample transaction guest group features corresponding to the first sample transaction guest group data;
based on the sample general guest group features and the sample transaction guest group features, calculating sample characterization parameters of a cluster migration network, wherein the sample characterization parameters are used for correlating the sample general guest group features and the sample transaction guest group features as an evaluation standard, and the cluster migration network is contained in a first risk evaluation model;
acquiring a sample evaluation score for representing the risk state of a transaction guest group, training the first risk evaluation model based on the sample general guest group characteristics, the sample transaction guest group characteristics, the sample characterization parameters and the sample evaluation score to obtain initial parameters of the first risk evaluation model, and obtaining a second risk evaluation model based on the initial parameters;
and calculating an evaluation loss function corresponding to the second risk evaluation model and a migration loss function corresponding to the clustering migration network, and determining a training state of the second risk evaluation model based on the evaluation loss function and the migration loss function until the training state indicates that the second risk evaluation model converges, so as to obtain a trained risk evaluation model.
2. The method of claim 1, wherein the acquiring the first sample transaction guest group data and the first sample general guest group data, and extracting features of the first sample general guest group data and the first sample transaction guest group data to obtain the sample general guest group features corresponding to the first sample general guest group data and the sample transaction guest group features corresponding to the first sample transaction guest group data, comprises:
acquiring first sample transaction guest group data and first sample general guest group data, and carrying out data screening on the first sample general guest group data and the first sample transaction guest group data to obtain second sample general guest group data and second sample transaction guest group data;
and extracting the characteristics of the second sample general guest group data and the second sample transaction guest group data to obtain sample general guest group characteristics corresponding to the second sample general guest group data and sample transaction guest group characteristics corresponding to the second sample transaction guest group data.
3. The method of claim 1, wherein the calculating, based on the sample general guest group features and the sample transaction guest group features, sample characterization parameters of a cluster migration network comprises:
performing feature classification on the sample general guest group features and the sample transaction guest group features to obtain sub-sample general guest group features and sub-sample transaction guest group features of each classification;
respectively carrying out high-dimensional characterization on the sub-sample general guest group features and the sub-sample transaction guest group features of each classification to obtain the feature distribution of each classification mapped in the high-dimensional characterization;
calculating the distribution distance of the feature distribution between the sub-sample general guest group features and the sub-sample transaction guest group features in each classification, and optimizing the distribution distance so that the dimensions of the sub-sample general guest group features and the sub-sample transaction guest group features in each classification are the same;
and obtaining sample characterization parameters corresponding to each classification in the cluster migration network when the dimensions are the same.
4. The method of claim 1, wherein the obtaining a sample evaluation score for characterizing a risk state of a transaction guest group, training the first risk assessment model based on the sample general guest group features, the sample transaction guest group features, the sample characterization parameters, and the sample evaluation score to obtain initial parameters of the first risk assessment model, and obtaining a second risk assessment model based on the initial parameters, comprises:
obtaining a sample evaluation score for characterizing a risk state of a transaction guest group;
training the first risk assessment model based on the sample evaluation score, the sample characterization parameters, the sample general guest group features and the sample transaction guest group features corresponding to each classification, so as to obtain initial parameters corresponding to the first risk assessment model, wherein the sample characterization parameters are used for assessing the sub-sample general guest group features and the sub-sample transaction guest group features corresponding to each classification in the sample general guest group features and the sample transaction guest group features;
and obtaining a second risk assessment model based on the initial parameters.
5. The method according to claim 3, wherein the calculating an evaluation loss function corresponding to the second risk assessment model and a migration loss function corresponding to the cluster migration network, and determining a training state of the second risk assessment model based on the evaluation loss function and the migration loss function until the training state indicates that the second risk assessment model converges, so as to obtain a trained risk assessment model, comprises:
calculating an evaluation loss function and a model regularization term corresponding to the second risk assessment model;
calculating the sum of distribution distances corresponding to the classifications in the cluster migration network, and taking the sum of the distribution distances as a migration loss function of the cluster migration network;
and determining the training state of the second risk assessment model based on the assessment loss function, the model regularization and the migration loss function until the training state indicates the convergence of the second risk assessment model, so as to obtain a risk assessment model after training.
6. The method according to claim 3, wherein the determining a training state of the second risk assessment model based on the evaluation loss function and the migration loss function until the training state indicates that the second risk assessment model converges, so as to obtain a trained risk assessment model, comprises:
when the training state indicates that the second risk assessment model is not converged, optimizing the cluster migration network to obtain an optimized cluster migration network, wherein the distribution distance of each classification in the optimized cluster migration network is reduced when the dimensions of the classification are the same;
training the second risk assessment model based on the optimized cluster migration network, and performing the step of calculating an assessment loss function corresponding to the second risk assessment model and a migration loss function corresponding to the cluster migration network.
7. A risk assessment method, the method comprising:
acquiring transaction guest group data corresponding to a transaction guest group and general guest group data corresponding to the transaction guest group data, and inputting the transaction guest group data and the general guest group data into a trained risk assessment model obtained by the risk assessment model training method according to any one of claims 1 to 6;
performing high-dimensional mapping on the transaction guest group data and the general guest group data based on the risk assessment model to obtain characterization parameters corresponding to the transaction guest group data and the general guest group data;
and performing risk assessment on the transaction guest group data based on the characterization parameters and the general guest group data to obtain a risk assessment result corresponding to the transaction guest group.
8. A risk assessment model training apparatus, the apparatus comprising:
the feature acquisition unit is used for acquiring first sample transaction guest group data and first sample general guest group data, and carrying out feature extraction on the first sample general guest group data and the first sample transaction guest group data to obtain sample general guest group features corresponding to the first sample general guest group data and sample transaction guest group features corresponding to the first sample transaction guest group data;
the parameter acquisition unit is used for calculating sample characterization parameters of a cluster migration network based on the sample general guest group features and the sample transaction guest group features, wherein the sample characterization parameters are used for correlating the sample general guest group features and the sample transaction guest group features as an evaluation standard, and the cluster migration network is contained in a first risk evaluation model;
the model training unit is used for obtaining a sample evaluation score for representing the risk state of the transaction guest group, training the first risk evaluation model based on the sample general guest group features, the sample transaction guest group features, the sample characterization parameters and the sample evaluation score to obtain initial parameters of the first risk evaluation model, and obtaining a second risk evaluation model based on the initial parameters;
the model completion unit is used for calculating an evaluation loss function corresponding to the second risk evaluation model and a migration loss function corresponding to the cluster migration network, and determining a training state of the second risk evaluation model based on the evaluation loss function and the migration loss function until the training state indicates that the second risk evaluation model converges, so as to obtain a trained risk evaluation model.
9. A risk assessment apparatus, the apparatus comprising:
a data acquisition unit, configured to acquire transaction guest group data corresponding to a transaction guest group and general guest group data corresponding to the transaction guest group data, and input the transaction guest group data and the general guest group data into a trained risk assessment model obtained by the risk assessment model training method according to any one of claims 1 to 6;
the parameter calculation unit is used for carrying out high-dimensional mapping on the transaction guest group data and the general guest group data based on the risk assessment model to obtain characterization parameters corresponding to the transaction guest group data and the general guest group data;
and the result acquisition unit is used for carrying out risk assessment on the transaction guest group data based on the characterization parameters and the general guest group data to obtain a risk assessment result corresponding to the transaction guest group.
10. A computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the steps of the method according to any one of claims 1 to 7.
11. An electronic device, comprising: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the steps of the method according to any one of claims 1-7.
12. A computer program product having stored thereon at least one instruction which when executed by a processor implements the steps of the method of any of claims 1 to 7.
CN202311484048.XA 2023-11-08 2023-11-08 Risk assessment model training method and device, storage medium and electronic equipment Pending CN117634871A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311484048.XA CN117634871A (en) 2023-11-08 2023-11-08 Risk assessment model training method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311484048.XA CN117634871A (en) 2023-11-08 2023-11-08 Risk assessment model training method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN117634871A true CN117634871A (en) 2024-03-01

Family

ID=90034779

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311484048.XA Pending CN117634871A (en) 2023-11-08 2023-11-08 Risk assessment model training method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN117634871A (en)

CN117235366A (en) Collaborative recommendation method and system based on content relevance
CN115578202A (en) Money amount recommendation method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination