CN113095408A - Risk determination method and device and server - Google Patents

Risk determination method and device and server

Info

Publication number
CN113095408A
Authority
CN
China
Prior art keywords
preset
data
feature
classifiers
subspace
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110398902.5A
Other languages
Chinese (zh)
Inventor
陈李龙
王娜
强锋
刘华杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202110398902.5A
Publication of CN113095408A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2155 Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2431 Multiple classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The specification provides a risk determination method, a risk determination device, and a server. Before deployment, a preset processing model with good effect and high accuracy is trained by introducing a subspace consistency constraint and a dynamic pairwise constraint, so that sample data carrying preset labels and sample data not carrying preset labels are used at the same time. At run time, after a plurality of feature data of a target object is obtained, the feature data is mapped into a plurality of preset feature subspaces according to a preset mapping rule, yielding a plurality of lower-dimensional feature groups for the target object. The preset processing model is then called to process the feature groups and obtain a corresponding processing result, and whether the target object has the preset risk is determined efficiently and accurately from that result. The method is therefore well suited to complex data prediction scenarios with high feature dimensionality, and can accurately predict whether the target object has the preset risk.

Description

Risk determination method and device and server
Technical Field
The specification belongs to the technical field of artificial intelligence, and particularly relates to a risk determination method, a risk determination device and a risk determination server.
Background
In some complex data prediction scenarios (e.g., overdue risk prediction scenarios), the collected feature data of the data object whose risk is to be predicted is often of a large variety (e.g., may include more than 100 different kinds of feature data), and the feature dimension is relatively high.
For such complex data prediction scenarios with high feature dimensionality, it is often difficult to train a risk prediction model with good performance and small error using existing methods. Existing methods are therefore hard to apply to these scenarios, and it is difficult to determine accurately whether a data object in such a scenario has the preset risk.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The present specification provides a risk determination method, apparatus, and server, which may be better adapted to a data prediction scenario with higher feature dimension and complexity by using a preset processing model obtained in advance based on subspace consistency constraint and dynamic pairwise constraint training, and accurately predict whether a target object has a preset risk.
The present specification provides a method of determining risk, comprising:
acquiring a plurality of characteristic data of a target object;
mapping the plurality of feature data to a plurality of preset feature subspaces according to a preset mapping rule to obtain a plurality of feature groups of the target object; wherein the feature groups respectively correspond to a preset feature subspace;
calling a preset processing model to process the plurality of feature groups to obtain corresponding processing results; the preset processing model at least comprises a plurality of preset classifiers, and the preset classifiers correspond to a preset feature subspace respectively; the preset processing model is obtained by training based on subspace consistency constraint and dynamic pairwise constraint in advance;
and determining whether the target object has a preset risk or not according to the processing result.
In one embodiment, the target object comprises a transaction account; correspondingly, the preset risk comprises the overdue risk of the transaction data.
In one embodiment, the feature data comprises: identity class characteristic data of the transaction account, historical transaction behavior class characteristic data of the transaction account, current transaction behavior class characteristic data of the transaction account and associated behavior class characteristic data of the transaction account.
In one embodiment, the preset processing model further comprises a discrimination structure; the discrimination structure is connected with the plurality of preset classifiers; and the discrimination structure is used for generating the processing result according to the classification results output by the plurality of preset classifiers.
In one embodiment, the preset processing model is established as follows:
acquiring a plurality of sample data; the sample data corresponds to a sample object, and the sample data comprises a plurality of characteristic data of the corresponding sample object; the sample objects comprise a first type sample object carrying a preset label and a second type sample object not carrying the preset label;
mapping a plurality of feature data of the first type sample object to a plurality of preset feature subspaces according to a preset mapping rule to obtain first type training data; mapping a plurality of feature data of the second type sample object to a plurality of preset feature subspaces to obtain second type training data;
training a plurality of initial classifiers by using the first type of training data to obtain a plurality of corresponding intermediate classifiers; wherein the initial classifiers correspond to a preset feature subspace respectively;
constructing an objective function for the plurality of intermediate classifiers using the plurality of intermediate classifiers, the first type of training data, and the second type of training data; wherein the objective function comprises a subspace consistency constraint equation and a dynamic pairwise constraint equation;
and determining a plurality of preset classifiers meeting the requirements according to the objective function so as to construct and obtain a preset processing model.
In one embodiment, constructing an objective function for the plurality of intermediate classifiers using the plurality of intermediate classifiers, the first type of training data, and the second type of training data, wherein the objective function comprises a subspace consistency constraint equation and a dynamic pairwise constraint equation, includes:
calling a plurality of intermediate classifiers to process the second class of training data so as to obtain dynamic labels aiming at the second class of sample objects;
constructing a dynamic pairwise constraint equation according to the preset labels of the first type sample objects and the dynamic labels of the second type sample objects;
calling a plurality of intermediate classifiers to process the first type of training data to obtain a plurality of classification results of the first type of sample objects;
constructing a subspace consistency constraint equation according to a plurality of classification results of the first type sample objects;
and constructing the objective function according to the dynamic pairwise constraint equation and the subspace consistency constraint equation.
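As a reading aid only, one plausible shape for such an objective is written out below. This passage does not state the formula, so every symbol is an assumption made for illustration: K preset feature subspaces with classifiers f_k, a labeled set \mathcal{L} with preset labels y_i, a per-classifier loss \ell, a combined score F (for instance the mean of the f_k outputs), a dynamic pairwise matrix M with M_{ij} = +1 for a connected pair, -1 for a non-connected pair and 0 otherwise, and trade-off weights \lambda_c and \lambda_p.

```latex
J(f_1,\dots,f_K)
  = \sum_{k=1}^{K} \sum_{i \in \mathcal{L}} \ell\bigl(f_k(x_i^{(k)}),\, y_i\bigr)
  + \lambda_c \sum_{i \in \mathcal{L}} \sum_{1 \le k < l \le K}
      \bigl(f_k(x_i^{(k)}) - f_l(x_i^{(l)})\bigr)^2
  + \lambda_p \sum_{i,j} M_{ij}\,\bigl(F(x_i) - F(x_j)\bigr)^2
```

Under these assumptions, the second term penalizes disagreement between subspace classifiers on the same labeled sample (the subspace consistency constraint), and the third term pulls connected pairs together and pushes non-connected pairs apart (the dynamic pairwise constraint).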
In one embodiment, constructing a dynamic pairwise constraint equation according to the preset labels of the first type sample objects and the dynamic labels of the second type sample objects includes:
establishing a connected set and a non-connected set according to the preset labels of the first type sample objects and the dynamic labels of the second type sample objects; wherein the connected set comprises a plurality of connected groups, and each connected group comprises two sample objects with the same label; the non-connected set comprises a plurality of non-connected groups, and each non-connected group comprises two sample objects with different labels;
constructing a dynamic pairing matrix according to the connected set and the non-connected set;
and constructing the dynamic pairwise constraint equation according to the dynamic pairwise matrix and the plurality of intermediate classifiers.
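The construction of the connected set, the non-connected set, and the dynamic pairwise matrix can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `build_pairwise_matrix`, the sample ids, and the +1 / -1 / 0 matrix encoding are all hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]

def build_pairwise_matrix(
    labels: Dict[str, int],
) -> Tuple[Set[Pair], Set[Pair], List[List[int]]]:
    """Build the connected set, non-connected set and pairwise matrix.

    `labels` merges preset labels (first type sample objects) with the
    current dynamic labels (second type sample objects).
    Assumed encoding: +1 connected, -1 non-connected, 0 on the diagonal.
    """
    ids = sorted(labels)
    index = {sample_id: i for i, sample_id in enumerate(ids)}
    connected: Set[Pair] = set()
    non_connected: Set[Pair] = set()
    for a, b in combinations(ids, 2):
        if labels[a] == labels[b]:
            connected.add((a, b))       # two sample objects, same label
        else:
            non_connected.add((a, b))   # two sample objects, different labels

    n = len(ids)
    matrix = [[0] * n for _ in range(n)]
    for a, b in connected:
        matrix[index[a]][index[b]] = matrix[index[b]][index[a]] = 1
    for a, b in non_connected:
        matrix[index[a]][index[b]] = matrix[index[b]][index[a]] = -1
    return connected, non_connected, matrix

preset_labels = {"a": 1, "b": 0}   # labeled sample objects (hypothetical)
dynamic_labels = {"c": 1}          # label predicted by intermediate classifiers
connected, non_connected, matrix = build_pairwise_matrix(
    {**preset_labels, **dynamic_labels}
)
```

Because the dynamic labels depend on the current classifiers, the matrix would be rebuilt whenever those labels are refreshed, which is presumably what makes the constraint "dynamic".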
In one embodiment, constructing a subspace consistency constraint equation according to a plurality of classification results of the first type sample objects comprises:
dividing a plurality of comparison groups according to a plurality of classification results of the first type sample objects; wherein each comparison group comprises two classification results of the same first type sample object;
and constructing the subspace consistency constraint equation according to the plurality of comparison groups.
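A minimal sketch of how comparison groups might feed a consistency term is given below. The squared-difference penalty and the name `consistency_penalty` are assumptions chosen for illustration; this passage does not specify the form of the constraint equation.

```python
from itertools import combinations
from typing import Dict, List

def consistency_penalty(results: Dict[str, List[float]]) -> float:
    """Sum squared disagreement over all comparison groups.

    `results` maps each first type sample object to the scores produced by
    the subspace classifiers. Every pair of scores for the same sample
    object forms one comparison group.
    """
    penalty = 0.0
    for scores in results.values():
        for s1, s2 in combinations(scores, 2):
            penalty += (s1 - s2) ** 2
    return penalty

# Hypothetical scores from three subspace classifiers.
results = {
    "a": [0.9, 0.8, 0.7],  # classifiers disagree slightly
    "b": [0.2, 0.2, 0.2],  # classifiers agree, so no penalty
}
penalty = consistency_penalty(results)
```

The penalty is zero only when all classifiers give the same result for every labeled sample object, which matches the intent of requiring consistency across subspaces.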
In one embodiment, determining a plurality of preset classifiers meeting requirements according to the objective function to construct and obtain a preset processing model, includes:
solving the optimal solution of the objective function by using a gradient descent algorithm to determine a plurality of preset classifiers which meet the requirements;
and connecting a plurality of preset classifiers with the discrimination structure to obtain a preset processing model.
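The gradient descent step can be illustrated with a toy stand-in for the objective function. Everything here is an assumption for illustration: the quadratic `toy_objective` is not the patent's objective, and the central-difference gradient replaces whatever analytic gradient a real implementation would use.

```python
from typing import Callable, List, Sequence

def gradient_descent(
    objective: Callable[[List[float]], float],
    params: Sequence[float],
    lr: float = 0.1,
    steps: int = 200,
    eps: float = 1e-6,
) -> List[float]:
    """Minimize `objective` by plain gradient descent with numeric gradients."""
    current = list(params)
    for _ in range(steps):
        grads = []
        for i in range(len(current)):
            up = current[:]
            up[i] += eps
            down = current[:]
            down[i] -= eps
            # Central difference approximation of the partial derivative.
            grads.append((objective(up) - objective(down)) / (2 * eps))
        current = [p - lr * g for p, g in zip(current, grads)]
    return current

def toy_objective(w: List[float]) -> float:
    # Two per-classifier loss terms plus a coupling term that plays the
    # role of a consistency penalty between the two parameters.
    return (w[0] - 1.0) ** 2 + (w[1] + 2.0) ** 2 + 0.5 * (w[0] - w[1]) ** 2

solution = gradient_descent(toy_objective, [0.0, 0.0])
```

Setting the gradient of this toy objective to zero gives w0 = 0.25 and w1 = -1.25, which the loop approaches with the default step size.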
In one embodiment, after obtaining the plurality of sample data, the method further comprises:
detecting whether the missing quantity of the characteristic data of the sample data is smaller than a preset missing quantity threshold value or not;
and under the condition that the missing quantity of the characteristic data of the sample data is determined to be smaller than a preset missing quantity threshold value, zero filling processing is carried out on the characteristic data with missing.
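The missing-value check can be sketched as below. The name `clean_samples` and the use of `None` to mark a missing feature are hypothetical. The text only states what happens when the missing quantity is below the threshold; dropping the other samples is an assumption made here.

```python
from typing import Dict, List, Optional

Sample = Dict[str, Optional[float]]

def clean_samples(
    samples: List[Sample],
    feature_names: List[str],
    max_missing: int,
) -> List[Dict[str, float]]:
    """Zero-fill samples whose missing-feature count is below the threshold."""
    cleaned = []
    for sample in samples:
        missing = [name for name in feature_names if sample.get(name) is None]
        if len(missing) < max_missing:
            filled = {name: sample.get(name) for name in feature_names}
            for name in missing:
                filled[name] = 0.0  # zero filling of the missing feature data
            cleaned.append(filled)
        # Assumption: samples at or above the threshold are discarded.
    return cleaned

feature_names = ["age", "income", "overdue"]
samples = [
    {"age": 30.0, "income": None, "overdue": 1.0},   # one missing value
    {"age": None, "income": None, "overdue": None},  # three missing values
]
cleaned = clean_samples(samples, feature_names, max_missing=2)
```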
The present specification also provides a method for determining risk, comprising:
acquiring a plurality of characteristic data of a target object;
mapping the plurality of feature data to a plurality of preset feature subspaces according to a preset mapping rule to obtain a plurality of feature groups of the target object; wherein the feature groups respectively correspond to a preset feature subspace;
calling a preset processing model to process the plurality of feature groups to obtain corresponding processing results; the preset processing model at least comprises a plurality of preset classifiers, and the preset classifiers correspond to a preset feature subspace respectively; the preset processing model is obtained by training based on subspace consistency constraint or dynamic pairwise constraint in advance;
and determining whether the target object has a preset risk or not according to the processing result.
The present specification also provides a risk determination apparatus comprising:
the acquisition module is used for acquiring a plurality of characteristic data of the target object;
the mapping module is used for mapping the plurality of feature data into a plurality of preset feature subspaces according to a preset mapping rule to obtain a plurality of feature groups of the target object; wherein the feature groups respectively correspond to a preset feature subspace;
the calling module is used for calling a preset processing model to process the plurality of feature groups so as to obtain corresponding processing results; the preset processing model at least comprises a plurality of preset classifiers, and the preset classifiers correspond to a preset feature subspace respectively; the preset processing model is obtained by training based on subspace consistency constraint and dynamic pairwise constraint in advance;
and the determining module is used for determining whether the target object has a preset risk or not according to the processing result.
The present specification also provides a server comprising a processor and a memory for storing processor-executable instructions which, when executed by the processor, carry out the steps associated with the method of risk determination.
The present specification also provides a computer readable storage medium having stored thereon computer instructions which, when executed, carry out the steps associated with the method of risk determination.
Based on the method, for a complex data prediction scenario with high feature dimensionality, the multi-dimensional features can first be divided among a plurality of preset feature subspaces according to a preset mapping rule before deployment. This reduces the dimensionality of the sample objects' feature data, and with it the complexity and data processing load of subsequent model training. The dimension-reduced feature data is then used in model training together with an objective function, built from the subspace consistency constraint and the dynamic pairwise constraint, that accounts for both the interaction between different feature subspaces and the interaction between different sample objects. In this way, sample data carrying preset labels and sample data not carrying preset labels can both be fully used, and a preset processing model with better effect and higher accuracy is obtained through training. At run time, after the plurality of feature data of the target object is obtained, the higher-dimensional feature data can be mapped into the plurality of preset feature subspaces according to the preset mapping rule to obtain a plurality of lower-dimensional feature groups for the target object; the preset processing model is then called to process the feature groups and obtain the corresponding processing result relatively quickly; and whether the target object has the preset risk can then be determined efficiently and accurately from that result. The method is therefore well suited to complex data prediction scenarios with high feature dimensionality, and can accurately predict whether the target object has the preset risk.
Drawings
In order to more clearly illustrate the embodiments of the present specification, the drawings needed to be used in the embodiments will be briefly described below, and the drawings in the following description are only some of the embodiments described in the present specification, and it is obvious to those skilled in the art that other drawings can be obtained according to the drawings without any creative effort.
FIG. 1 is a schematic diagram of an embodiment of a structural composition of a system to which a risk determination method provided by an embodiment of the present specification is applied;
FIG. 2 is a schematic diagram illustrating an embodiment of a method for risk determination provided by an embodiment of the present specification, in one example scenario;
FIG. 3 is a schematic flow chart diagram of a method of risk determination provided by one embodiment of the present description;
FIG. 4 is a schematic structural component diagram of a server provided in an embodiment of the present description;
FIG. 5 is a schematic structural component diagram of a risk determination device provided by an embodiment of the present specification;
FIG. 6 is a diagram illustrating an embodiment of a method for risk determination provided by an embodiment of the present specification, in an example scenario;
FIG. 7 is a schematic diagram of an embodiment of a risk determination method provided by an embodiment of the present specification, in an example scenario.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the present specification, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only a part of the embodiments of the present specification, and not all of the embodiments. All other embodiments obtained by a person skilled in the art based on the embodiments in the present specification without any inventive step should fall within the scope of protection of the present specification.
The embodiment of the specification provides a risk determination method, which can be particularly applied to a system comprising a server and a terminal device. Specifically, as shown in fig. 1, the server and the terminal device may be connected in a wired or wireless manner to perform specific data interaction.
In this embodiment, the server may specifically include a background server that is deployed on the network platform side and is capable of functions such as data transmission and data processing. Specifically, the server may be, for example, an electronic device with data computation, storage, and network interaction functions. Alternatively, the server may be a software program that runs on such an electronic device and supports data processing, storage, and network interaction. In this embodiment, the number of servers is not particularly limited: the server may be a single server, several servers, or a server cluster formed by several servers.
In this embodiment, the terminal device may specifically include a front-end electronic device that is disposed on the user side and is capable of functions such as data acquisition and data transmission. Specifically, the terminal device may be, for example, a desktop computer, a tablet computer, a notebook computer, a smart phone, a self-service machine, a counter terminal, and the like. Alternatively, the terminal device may be a software application capable of running on such an electronic device, for example a related APP running on a smartphone.
Specifically, the user may use a kiosk deployed in a bank's business hall as the terminal device to handle business involving transaction data (e.g., a housing provident fund loan).
The user may log into his or her transaction account (i.e., the target object) and, following the guide interface presented by the kiosk, initiate a transaction request for transaction data.
Correspondingly, the self-service machine can respond to the operation of the user, and generate and display an information input interface for the user. Through the information input interface, the kiosk may receive user input of first user data, such as a user's name, a user's address, a user's phone number, and the like.
The kiosk may then send a business transaction request carrying the first user data to a server.
Correspondingly, the server receives the service transaction request, and determines a transaction account and first user data which are required to transact related services related to the transaction data by analyzing the service transaction request.
Further, the server may query the own user database and the partner user database according to the transaction account in response to the transaction request, and obtain second user data of the transaction account, such as historical transaction records, credit service records, integrity data, and the like.
Further, the server may extract a plurality of related feature data from the large amount of first user data and second user data, including: identity class feature data of the transaction account (e.g., the name, work unit, and educational background of the user of the transaction account); historical transaction behavior class feature data of the transaction account (e.g., shopping transaction records of the transaction account over the past year, total income data for the past year, total expenditure data for the past year); current transaction behavior class feature data of the transaction account (e.g., current credit business data of the transaction account, current housing provident fund contribution data of the transaction account, and data on insurance business the transaction account currently participates in); and associated behavior class feature data of the transaction account (e.g., the credit rating of the transaction account at other institutions and the overdue transaction records of the transaction account at other institutions).
Therefore, the server can obtain a plurality of characteristic data with more types and higher dimensionality aiming at the transaction account.
Further, the server may map a plurality of feature data of the transaction account to a plurality of preset feature subspaces according to a preset mapping rule, so as to obtain a plurality of feature groups. Wherein each of the plurality of feature groups corresponds to a predetermined feature subspace.
In addition, each feature group may include one or more feature data of the transaction account, and each of the plurality of feature data of the transaction account is assigned to at least one feature group.
Specifically, for example, the first feature group, corresponding to preset feature subspace No. 1, includes three feature data: the user name and user age of the transaction account, and the current housing provident fund payment data of the transaction account. The second feature group, corresponding to preset feature subspace No. 2, includes five feature data in total: the user's work unit, educational background, and age, the credit rating of the transaction account at other institutions, and the overdue transaction record of the transaction account at other institutions; and so on.
By the mode, the plurality of feature data are mapped to the plurality of preset feature subspaces according to the preset mapping rule, the plurality of feature data with high dimensionality can be converted into the plurality of feature groups with low dimensionality, the data dimension reduction of the feature data with high dimensionality is realized, and the subsequent data processing is facilitated.
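The mapping described above can be sketched as a lookup from subspace number to feature names. This is a minimal illustration: `MAPPING_RULE`, `map_to_subspaces`, and the feature names are hypothetical, and the two groups simply mirror the example of feature subspaces No. 1 and No. 2 given in the text.

```python
from typing import Dict, List

# Hypothetical preset mapping rule: subspace number -> feature names.
# A feature may be assigned to more than one subspace ("user_age" is in both).
MAPPING_RULE: Dict[int, List[str]] = {
    1: ["user_name", "user_age", "provident_fund_payment"],
    2: ["work_unit", "education", "user_age",
        "external_credit_rating", "external_overdue_record"],
}

def map_to_subspaces(
    features: Dict[str, object],
    rule: Dict[int, List[str]],
) -> Dict[int, Dict[str, object]]:
    """Split one high-dimensional record into lower-dimensional feature groups."""
    return {
        subspace_id: {name: features[name] for name in names}
        for subspace_id, names in rule.items()
    }

# Hypothetical feature data for one transaction account.
account = {
    "user_name": "user-001",
    "user_age": 35,
    "provident_fund_payment": 1200.0,
    "work_unit": "unit-A",
    "education": "bachelor",
    "external_credit_rating": "AA",
    "external_overdue_record": 0,
}
groups = map_to_subspaces(account, MAPPING_RULE)
```

Each resulting group holds only the features of its subspace, so a seven-feature record becomes one three-feature group and one five-feature group.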
Then, the server may feed the plurality of feature groups as model input into the preset processing model for processing, to determine whether the transaction account has a preset risk (e.g., a risk of overdue repayment on a housing provident fund loan).
The preset processing model can be understood as a model obtained in advance through semi-supervised learning on dimension-reduced sample data, based on the subspace consistency constraint and the dynamic pairwise constraint, that can predict from a plurality of input feature groups whether a target object has a preset risk.
Referring to FIG. 2, the preset processing model includes at least a plurality of preset classifiers. Each preset classifier corresponds to one preset feature subspace and processes the corresponding feature group. For example, preset classifier No. 1 corresponds to preset feature subspace No. 1 and receives and processes the first feature group; correspondingly, preset classifier No. 2 corresponds to preset feature subspace No. 2 and receives and processes the second feature group.
In addition, the preset processing model also includes a discrimination structure. The discrimination structure is connected to the plurality of preset classifiers; it receives the classification results output by those classifiers and generates and outputs the final processing result from them.
In specific implementation, after the preset processing model receives the plurality of feature groups as input, each preset classifier processes the feature group it is responsible for and outputs a corresponding classification result. After obtaining the classification results output by the plurality of preset classifiers, the discrimination structure may generate the final processing result, for example by a weighted sum, and output it from the model.
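The weighted-sum option mentioned for the discrimination structure can be sketched as follows. The weights, the 0.5 threshold, and the name `discriminate` are assumptions for illustration; the text names a weighted sum only as one example and does not give concrete values.

```python
from typing import Sequence, Tuple

def discriminate(
    scores: Sequence[float],
    weights: Sequence[float],
    threshold: float = 0.5,
) -> Tuple[float, bool]:
    """Combine per-classifier risk scores by a normalized weighted sum."""
    if len(scores) != len(weights):
        raise ValueError("one weight is required per classifier score")
    total_weight = sum(weights)
    combined = sum(s * w for s, w in zip(scores, weights)) / total_weight
    # The flag is True when the combined score reaches the threshold.
    return combined, combined >= threshold

# Hypothetical outputs of three preset classifiers and their weights.
combined, at_risk = discriminate([0.9, 0.4, 0.7], [0.5, 0.25, 0.25])
```

Normalizing by the total weight keeps the combined score on the same scale as the individual classification results, so a single threshold can be applied regardless of how the weights are chosen.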
The server can determine whether the transaction account has a preset risk according to the processing result.
Specifically, if the server determines from the processing result that the transaction account has a preset risk, it can stop handling the related service for the user and generate a first type of prompt information: 'the user has an overdue risk and cannot transact the related service'.
The server can send the first type prompt message to the self-service machine. Correspondingly, the self-service machine can display the first type of prompt information to the user to prompt the user that the business transaction fails.
If the server determines that the transaction account has no preset risk according to the processing result, related services can be continuously transacted for the user, and after the transaction is completed, second-class prompt information of 'successful service transaction' is generated.
The server can send the second type of prompt information to the self-service machine. Correspondingly, the self-service machine can display the second type of prompt information to the user to indicate that the business has been handled successfully.
In this way, the server can use the well-performing preset processing model, trained in advance based on the subspace consistency constraint and the dynamic pairwise constraint, to handle complex data prediction scenarios with high feature dimensionality and accurately predict whether a user requesting a related service has the preset risk. The server can then decide accurately, according to the user's risk status, whether to handle the related service for the user, thereby reducing the risk borne by the service provider.
Referring to fig. 3, an embodiment of the present disclosure provides a method for determining a risk. The method is particularly applied to the server side. In specific implementation, the method may include the following:
S301: acquiring a plurality of characteristic data of a target object;
S302: mapping the plurality of feature data to a plurality of preset feature subspaces according to a preset mapping rule to obtain a plurality of feature groups of the target object; wherein the feature groups respectively correspond to a preset feature subspace;
S303: calling a preset processing model to process the plurality of feature groups to obtain corresponding processing results; the preset processing model at least comprises a plurality of preset classifiers, and the preset classifiers correspond to a preset feature subspace respectively; the preset processing model is obtained by training based on subspace consistency constraint and dynamic pairwise constraint in advance;
S304: and determining whether the target object has a preset risk or not according to the processing result.
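Steps S301 to S304 can be strung together in a short end-to-end sketch. All names, the toy classifiers, the weights, and the threshold are hypothetical stand-ins; in particular, the lambdas below are placeholders for trained preset classifiers, not the patent's model.

```python
from typing import Callable, Dict, List

FeatureGroup = Dict[str, float]
Classifier = Callable[[FeatureGroup], float]

def determine_risk(
    features: Dict[str, float],           # S301: feature data of the target object
    mapping_rule: Dict[int, List[str]],
    classifiers: Dict[int, Classifier],
    weights: Dict[int, float],
    threshold: float = 0.5,
) -> bool:
    # S302: map the feature data into the preset feature subspaces.
    groups = {
        sid: {name: features[name] for name in names}
        for sid, names in mapping_rule.items()
    }
    # S303: each preset classifier scores its own feature group, then a
    # weighted sum (assumed discrimination structure) combines the scores.
    scores = {sid: classifiers[sid](group) for sid, group in groups.items()}
    total_weight = sum(weights[sid] for sid in scores)
    result = sum(weights[sid] * scores[sid] for sid in scores) / total_weight
    # S304: decide from the processing result.
    return result >= threshold

rule = {1: ["age", "income"], 2: ["overdue_count"]}
toy_classifiers: Dict[int, Classifier] = {
    1: lambda g: 0.2,                                  # placeholder score
    2: lambda g: min(1.0, g["overdue_count"] / 3.0),   # more overdue, higher risk
}
weights = {1: 1.0, 2: 1.0}

risky = determine_risk(
    {"age": 30.0, "income": 5.0, "overdue_count": 3.0},
    rule, toy_classifiers, weights,
)
safe = determine_risk(
    {"age": 30.0, "income": 5.0, "overdue_count": 0.0},
    rule, toy_classifiers, weights,
)
```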
By the embodiment, the preset processing model with better effect obtained by training based on subspace consistency constraint and dynamic pairwise constraint in advance can be better suitable for the data prediction scene with higher feature dimension and more complexity, whether the target object has the preset risk or not can be accurately predicted, and the prediction error is reduced.
In some embodiments, the target object may be specifically understood as a data object to be predicted whether a preset risk exists. The target objects may be different types of data objects corresponding to different application scenarios and different processing requirements.
In some embodiments, the target object may include a transaction account; accordingly, the preset risk may include a transaction data overdue risk.
In particular, the transaction account may be an account used by a user applying for a related service involving transaction data (e.g., credit, a housing provident fund loan, a small and micro enterprise loan, etc.).
By the embodiment, whether the transaction account applying for handling the related business related to the transaction data has the overdue risk of the transaction data can be predicted by applying the risk determining method provided by the specification, so that a reference basis with higher value can be provided for a business handling party, the business handling party can reasonably and accurately judge whether the related business is handled for the transaction account according to the predicted overdue risk condition of the transaction account, and the risk to be borne by the business handling party is reduced.
Of course, it should be noted that the above listed target objects and the preset risks are only schematic illustrations. For different application scenarios and processing requirements, the target object may further include other types of data objects, and correspondingly, the preset risk may further include other types of risks. The present specification is not limited to these.
In some embodiments, the plurality of feature data may be specifically understood as data related to the target object, which is capable of characterizing certain attribute characteristics of the target object of interest. Whether the target object has a preset risk can be generally predicted based on the characteristic data.
In some embodiments, for a data prediction scene with higher feature dimension and more complexity, in order to more accurately predict whether a preset risk exists in a target object, the number of types of feature data of the target object is often larger and the feature dimension is relatively higher.
In some embodiments, the feature data may specifically include: identity feature data of the transaction account (e.g., the name, employer, and educational background of the user of the transaction account), historical transaction behavior feature data of the transaction account (e.g., the shopping amount of the transaction account over the past year, total income data for the past year, total expenditure data for the past year, etc.), current transaction behavior feature data of the transaction account (e.g., current credit business data, current housing provident fund payment data, and current insurance business data of the transaction account), etc. In addition, the feature data may also include associated behavior feature data of the transaction account (e.g., the credit rating of the transaction account at another institution, the overdue transaction records of the transaction account at another institution, etc.). Of course, it should be noted that the feature data listed above is only illustrative.
Through the embodiment, the risk determination method provided by the specification can be applied to a data prediction scene with high feature dimension and complexity, and the high-dimension feature data of the target object is fully and comprehensively utilized, so that whether the target object has the preset risk or not can be predicted more accurately.
In some embodiments, the obtaining of the plurality of feature data of the target object may specifically include: receiving first user data actively input by the target object, and extracting related feature data from the first user data; and/or receiving an identifier provided by the target object (such as a user name or a transaction account name), retrieving, according to the identifier, data associated with the target object from the database held by the party itself and the databases held by cooperating parties as second user data, and extracting related feature data from the second user data.
In some embodiments, in many data prediction scenarios, especially the data prediction scenarios with higher feature dimension and complexity, the kinds of feature data of the target object obtained and used are often more, and the feature dimension is relatively higher. In this case, if the high-dimensional feature data is directly processed, the processing process is inevitably complex and cumbersome, the processing amount of the related data is relatively large, the processing cost is relatively high, and errors are more likely to occur. Therefore, in this embodiment, the obtained feature data of the target object are not directly used, but the feature data are mapped into the feature subspaces according to the preset mapping rule to obtain the feature groups of the target object. Each feature group corresponds to a preset feature subspace. And each feature set may specifically include one or more feature data with a relatively small number of types and relatively low feature dimensions.
Therefore, a plurality of feature groups with lower feature dimensionality can be used for replacing a plurality of original feature data with higher feature dimensionality to perform subsequent data processing, and data dimension reduction of high-dimensional feature data is realized, so that the feature data can be processed more efficiently and accurately.
In some embodiments, the preset mapping rule may specifically be a rule set including mapping relationships between each feature data and a preset feature subspace. Specifically, based on the preset mapping rule, each feature data in the plurality of feature data may be mapped to at least one preset feature subspace; meanwhile, the obtained feature group corresponding to each preset feature subspace can be ensured to at least comprise one feature data. It should be noted that, based on a preset mapping rule, the same feature data may be mapped to a plurality of different preset feature subspaces simultaneously.
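The mapping described above can be illustrated with a minimal Python sketch. It assumes the preset mapping rule is stored as a dictionary from feature name to subspace names; the feature names, subspace names, and the helper `map_to_feature_groups` are hypothetical and are not taken from the specification.

```python
from typing import Any, Dict, List

# Hypothetical preset mapping rule: feature name -> preset feature subspaces.
# All names below are illustrative assumptions.
PRESET_MAPPING_RULE: Dict[str, List[str]] = {
    "employer": ["identity"],
    "education": ["identity"],
    "annual_income": ["history", "current"],  # one feature, two subspaces
    "annual_spending": ["history"],
    "current_loan_balance": ["current"],
}


def map_to_feature_groups(
    features: Dict[str, Any], rule: Dict[str, List[str]]
) -> Dict[str, Dict[str, Any]]:
    """Split one object's feature data into one feature group per subspace."""
    subspaces = sorted({s for targets in rule.values() for s in targets})
    groups: Dict[str, Dict[str, Any]] = {s: {} for s in subspaces}
    for name, value in features.items():
        targets = rule.get(name)
        if not targets:
            # Every feature must be mapped to at least one subspace.
            raise KeyError(f"no preset feature subspace for feature {name!r}")
        for subspace in targets:
            groups[subspace][name] = value
    # Every feature group must contain at least one feature.
    empty = [s for s, g in groups.items() if not g]
    if empty:
        raise ValueError(f"feature groups with no feature data: {empty}")
    return groups


account_features = {
    "employer": "ACME",
    "education": "bachelor",
    "annual_income": 120000,
    "annual_spending": 80000,
    "current_loan_balance": 30000,
}
feature_groups = map_to_feature_groups(account_features, PRESET_MAPPING_RULE)
```

In this sketch `annual_income` appears in both the `history` and `current` groups, which corresponds to the case where the same feature data is mapped to several preset feature subspaces at once.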
In some embodiments, the preset processing model may be specifically understood as a model that is trained in advance through semi-supervised learning on dimension-reduced sample data, based on the subspace consistency constraint and the dynamic pairwise constraint, and that is capable of predicting whether the target object has the preset risk according to the plurality of input feature groups.
It should be added that, in the conventional model for predicting risk, only the correlation between the feature data of a single sample object and the label is considered, the interaction between different sample objects is ignored, and the interaction between different feature subspaces is not considered. The preset processing model provided by the specification is trained in advance by introducing and based on subspace consistency constraints related to different feature subspaces and dynamic pairwise constraints related to different sample objects, and has better coverage and higher model accuracy. The specific training mode of the preset processing model will be described later.
In some embodiments, specifically, referring to fig. 2, the predetermined processing model at least includes a plurality of predetermined classifiers. And the preset classifiers respectively correspond to a preset feature subspace. Accordingly, the preset classifiers correspond to one feature group, respectively.
Specifically, the plurality of feature groups input into the preset processing model are routed to their corresponding preset classifiers for processing, and each preset classifier outputs a classification result obtained from its corresponding feature group.
In some embodiments, the preset processing model may further include a discrimination structure. The discrimination structure is connected to the plurality of preset classifiers and is used to generate the processing result according to the classification results output by the plurality of preset classifiers.
Specifically, the plurality of classification results output by the plurality of preset classifiers are input to the discrimination structure, which combines them using a built-in discriminant function to compute the final processing result; this result serves as the model output of the preset processing model.
By the embodiment, a plurality of feature groups corresponding to a plurality of preset feature subspaces can be processed in parallel by using a preset processing model to obtain a plurality of classification results; further, a final processing result can be obtained from the plurality of classification results.
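The data flow through the preset processing model can be sketched as follows. This is a minimal illustration under stated assumptions: each preset classifier is modelled as a plain callable that returns a score in [0, 1], and the combining step is a simple mean, which is only a stand-in for the discriminant function of the specification. The names `run_preset_model`, `mean_score`, and the toy classifiers are hypothetical.

```python
from typing import Callable, Dict, Mapping

Classifier = Callable[[Mapping[str, float]], float]


def run_preset_model(
    feature_groups: Mapping[str, Mapping[str, float]],
    classifiers: Mapping[str, Classifier],
    combine: Callable[[Dict[str, float]], float],
) -> float:
    """Route each feature group to its classifier, then combine the results."""
    missing = set(classifiers) - set(feature_groups)
    if missing:
        raise KeyError(f"feature groups missing for subspaces: {sorted(missing)}")
    # One classification result per preset feature subspace.
    results = {name: clf(feature_groups[name]) for name, clf in classifiers.items()}
    return combine(results)


def mean_score(results: Dict[str, float]) -> float:
    """Assumed combining step: unweighted mean of the classifier scores."""
    return sum(results.values()) / len(results)


# Toy classifiers, purely illustrative.
toy_classifiers: Dict[str, Classifier] = {
    "history": lambda g: 1.0 if g["overdue_count"] > 0 else 0.0,
    "current": lambda g: min(1.0, g["loan_balance"] / 100000.0),
}
toy_groups = {
    "history": {"overdue_count": 2},
    "current": {"loan_balance": 50000.0},
}
processing_result = run_preset_model(toy_groups, toy_classifiers, mean_score)
```

The classifiers are evaluated one after another here for simplicity; since each one only reads its own feature group, they could equally be evaluated in parallel.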
In some embodiments, the processing result may be a prediction tag, which is used to characterize whether the target object has a predetermined risk.
In some embodiments, the processing result may be a predicted probability for representing that the target object has the preset risk.
In some embodiments, determining whether the target object has a preset risk according to the processing result may include: according to the processing result, comparing the prediction probability with a preset probability threshold; determining that a preset risk exists in the target object under the condition that the prediction probability is determined to be greater than or equal to a preset probability threshold; and under the condition that the prediction probability is determined to be smaller than a preset probability threshold value, determining that the target object has no preset risk.
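The threshold comparison can be sketched in a few lines of Python. The function name `has_preset_risk` and the default threshold of 0.5 are assumptions made for illustration; the specification does not fix a threshold value.

```python
def has_preset_risk(prediction_probability: float, threshold: float = 0.5) -> bool:
    """Return True when the predicted probability reaches the preset threshold."""
    if not 0.0 <= prediction_probability <= 1.0:
        raise ValueError("prediction probability must lie in [0, 1]")
    # Greater than or equal to the threshold: preset risk exists.
    # Smaller than the threshold: no preset risk.
    return prediction_probability >= threshold
```

Note that a probability exactly equal to the threshold is treated as risky, matching the "greater than or equal to" condition above.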
In some embodiments, when it is determined that the target object has the preset risk, a corresponding risk flag may be set for the target object, so that the target object may be subsequently subjected to matching business data processing according to the risk flag of the target object.
In some embodiments, before implementation, the preset processing model may be constructed and trained as follows:
S1: acquiring a plurality of sample data; the sample data corresponds to a sample object, and the sample data comprises a plurality of characteristic data of the corresponding sample object; the sample objects comprise a first type sample object carrying a preset label and a second type sample object not carrying the preset label;
S2: mapping a plurality of feature data of the first type sample object to a plurality of preset feature subspaces according to a preset mapping rule to obtain first type training data; mapping a plurality of feature data of the second type sample object to a plurality of preset feature subspaces to obtain second type training data;
S3: training a plurality of initial classifiers by using the first type of training data to obtain a plurality of corresponding intermediate classifiers; wherein the initial classifiers correspond to a preset feature subspace respectively;
S4: constructing an objective function for a plurality of intermediate classifiers using the plurality of intermediate classifiers, the first class of training data, the second class of training data; wherein, the objective function comprises a subspace consistency constraint formula and a dynamic pairwise constraint formula;
S5: and determining a plurality of preset classifiers meeting the requirements according to the objective function so as to construct and obtain a preset processing model.
Through the embodiment, the preset processing model which meets the requirements and has a good effect can be established and trained aiming at the data prediction scene with high feature dimensionality and complex structure.
In some embodiments, for some data prediction scenarios with higher feature dimensions and more complexity, all sample data carrying a preset tag often cannot be directly obtained. In most cases, only part of the obtained sample data carries preset tags, or the corresponding preset tags can be easily determined; and the rest other sample data do not carry the preset label or are not easy to determine the corresponding preset label.
In this embodiment, the sample object carrying the preset label may be denoted as a first type sample object. And marking the sample object which does not carry the preset label as a second type sample object.
In some embodiments, in specific implementation, after a plurality of sample data are obtained, the sample objects for which the preset labels are easily determined may also be labeled according to a preset labeling rule, so as to determine and label a plurality of first type sample objects from the second type sample objects.
Specifically, taking the prediction scenario of transaction data overdue risk as an example, the sample object may be labeled according to the repayment records of its historical business. For example, if it is determined from the repayment records that the sample object has an overdue record in its historical business, a preset label with a value of "1" (corresponding to the presence of the preset risk) may be set for the sample object, and the sample object may be marked as a first type sample object. If it is determined from the repayment records that the sample object has no overdue record in any historical business and currently has no other business that is not yet settled, a preset label with a value of "0" (corresponding to the absence of the preset risk) may be set for the sample object, and the sample object may be marked as a first type sample object. If it is determined from the repayment records that the sample object has no overdue record in any historical business but currently has other business that is not yet settled, a preset label cannot simply be set for the sample object, and the sample object may be marked as a second type sample object.
In some embodiments, during training, a plurality of initial classifiers, each corresponding to one preset feature subspace, may be constructed first. The feature groups in the first type of training data are then input into their corresponding initial classifiers so that each initial classifier is trained separately, yielding a plurality of intermediate classifiers trained on the feature data of the first type sample objects carrying preset labels. Each intermediate classifier corresponds to one preset feature subspace.
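A minimal sketch of such a labeling rule is given below. It assumes that a sample object with no overdue record and no outstanding (unsettled) business receives label 0, and that a sample object with no overdue record but with outstanding business stays unlabeled. The function name `preset_label` and its two boolean inputs are hypothetical simplifications of the repayment records.

```python
from typing import Optional


def preset_label(has_overdue_record: bool, has_outstanding_business: bool) -> Optional[int]:
    """Return 1 or 0 for first type sample objects, None for second type ones."""
    if has_overdue_record:
        return 1  # preset risk exists
    if not has_outstanding_business:
        return 0  # everything settled without any overdue record
    return None  # outcome still open: no preset label can be set yet
```

Sample objects for which the function returns `None` would be treated as second type sample objects and receive a dynamic label later in training.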
In some embodiments, the constructing an objective function for a plurality of intermediate classifiers by using the plurality of intermediate classifiers, the first class of training data, and the second class of training data may include the following steps:
S1: calling a plurality of intermediate classifiers to process the second class of training data so as to obtain dynamic labels aiming at the second class of sample objects;
S2: constructing a dynamic pairwise constraint equation according to the preset labels of the first type sample objects and the dynamic labels of the second type sample objects;
S3: calling a plurality of intermediate classifiers to process the first type of training data to obtain a plurality of classification results of the first type of sample objects;
S4: constructing a subspace consistency constraint formula according to a plurality of classification results of the first type sample objects;
S5: and constructing the objective function according to the dynamic pairwise constraint equation and the subspace consistency constraint equation.
By this embodiment, on the basis of the intermediate classifiers trained on the feature data of the first type sample objects, the feature data of the second type sample objects is further introduced, and the subspace consistency constraint and the dynamic pairwise constraint are used to construct an objective function that meets the requirements, so that a better preset processing model can subsequently be trained based on the objective function.
In some embodiments, in specific implementation, a plurality of intermediate classifiers may be invoked to process the corresponding feature groups of the second type sample objects in the second type training data, respectively, to obtain corresponding classification results; and then, integrating classification results output by the plurality of intermediate classifiers in modes of voting and the like to obtain a dynamic label for the second-class sample object.
The dynamic label may be specifically understood as a pseudo label determined by integrating classification results output by a plurality of intermediate classifiers, which is different from a preset label.
In some embodiments, when implemented, the classification results output by the plurality of intermediate classifiers may be combined to determine the corresponding dynamic label according to the following formula:
Figure BDA0003019524060000121
where K is the number of preset feature subspaces (i.e., the number of intermediate classifiers), $f_k(\cdot)$ is the intermediate classifier corresponding to the preset feature subspace numbered k, $x_u$ is the feature data of the second type sample object numbered u, and $l_u$ is the dynamic label of the second type sample object numbered u.
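The voting idea behind the dynamic label can be sketched as follows. This is a simple majority vote under stated assumptions: each intermediate classifier outputs a hard label in {0, 1}, and a tie is resolved toward label 1. Neither the tie-breaking rule nor the function name `dynamic_label` comes from the specification, whose exact formula is given only as a figure.

```python
from typing import Callable, Sequence


def dynamic_label(
    feature_groups: Sequence[object],
    classifiers: Sequence[Callable[[object], int]],
) -> int:
    """Majority vote of K intermediate classifiers over K feature groups."""
    if len(feature_groups) != len(classifiers):
        raise ValueError("one feature group is required per classifier")
    votes = [clf(group) for clf, group in zip(classifiers, feature_groups)]
    positive = sum(1 for v in votes if v == 1)
    # Assumption: ties are resolved toward label 1 (risk present).
    return 1 if 2 * positive >= len(votes) else 0


# Toy classifiers: each thresholds a single number standing in for a feature group.
toy_classifiers = [lambda g: int(g > 0.5)] * 3
label_a = dynamic_label([0.9, 0.2, 0.7], toy_classifiers)
label_b = dynamic_label([0.1, 0.2, 0.7], toy_classifiers)
```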
In some embodiments, the constructing a dynamic pairwise constraint equation according to the preset label of the first type sample object and the dynamic label of the second type sample object may include: establishing a connected set and a non-connected set according to the preset labels of the first type sample objects and the dynamic labels of the second type sample objects, wherein the connected set comprises a plurality of connected groups, each comprising two sample objects with the same label, and the non-connected set comprises a plurality of non-connected groups, each comprising two sample objects with different labels; constructing a dynamic pairwise matrix according to the connected set and the non-connected set; and constructing the dynamic pairwise constraint equation according to the dynamic pairwise matrix and the plurality of intermediate classifiers.
Through this embodiment, the dynamic pairwise matrix can be constructed from the connected set and the non-connected set, which introduces the relationships among different sample objects, so that the corresponding dynamic pairwise constraint equation can then be established.
In some embodiments, when implemented, the connected set may be established according to the following equation: $ML = \{(x_i, x_j) \mid l_i = l_j\}$, where $x_i$ represents the feature data of the sample object numbered i and $x_j$ represents the feature data of the sample object numbered j. A sample object may be a first type sample object or a second type sample object, and i and j may be the same or different. $l_i$ represents the label of the sample object numbered i and $l_j$ represents the label of the sample object numbered j; a label may be a preset label or a dynamic label. Similarly, the non-connected set may be established according to the following equation: $CL = \{(x_i, x_j) \mid l_i \neq l_j\}$.
Then, according to the connected set and the non-connected set, a dynamic pairwise matrix S capturing the interaction between every two sample objects may be constructed according to the following formula:
Figure BDA0003019524060000131
where $s_{i,j}$ is the element in the i-th row and j-th column of the dynamic pairwise matrix S and represents the interaction between the sample object numbered i and the sample object numbered j.
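The construction of the two sets and of the matrix can be sketched in Python. The set definitions follow the text (same label: connected; different label: non-connected). The matrix entries of +1 and -1 are an assumption made for illustration, since the actual values are given only in the figure; `build_pairwise_matrix` is a hypothetical helper name.

```python
from itertools import combinations_with_replacement
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]


def build_pairwise_matrix(
    labels: Sequence[int],
) -> Tuple[List[Pair], List[Pair], List[List[int]]]:
    """Build the connected set, the non-connected set, and a matrix S.

    labels[i] is the preset label or the dynamic label of sample object i.
    """
    n = len(labels)
    connected: List[Pair] = []
    non_connected: List[Pair] = []
    s = [[0] * n for _ in range(n)]
    # i and j may be equal, so pairs (i, i) are included as well.
    for i, j in combinations_with_replacement(range(n), 2):
        if labels[i] == labels[j]:
            connected.append((i, j))
            s[i][j] = s[j][i] = 1   # assumed value for a connected pair
        else:
            non_connected.append((i, j))
            s[i][j] = s[j][i] = -1  # assumed value for a non-connected pair
    return connected, non_connected, s


ml, cl, s_matrix = build_pairwise_matrix([1, 0, 1])
```

Because the dynamic labels of the second type sample objects change as the classifiers are updated, the sets and the matrix would be rebuilt whenever those labels are recomputed.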
In some embodiments, when implemented, the dynamic pairwise constraint equation may be constructed from the dynamic pairwise matrix and the plurality of intermediate classifiers according to the following equation:
Figure BDA0003019524060000132
where $R_1$ represents the dynamic pairwise constraint, S represents the dynamic pairwise matrix, $f_k(\cdot)$ is the intermediate classifier corresponding to the preset feature subspace numbered k, and X and X' represent the feature data of any two sample objects (which may be the same sample object or different sample objects).
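One possible form of such a pairwise penalty is sketched below. It is an assumption, not the formula of the figure: for each classifier k it sums $s_{i,j} \cdot (f_k(x_i) - f_k(x_j))^2$ over all pairs, so that pairs with a positive entry are pulled together and pairs with a negative entry are pushed apart. The function name and the representation of classifier outputs as lists of scores are hypothetical.

```python
from typing import List, Sequence


def dynamic_pairwise_penalty(
    scores_per_classifier: Sequence[Sequence[float]],
    s: Sequence[Sequence[float]],
) -> float:
    """Assumed form: sum over k, i, j of s[i][j] * (f_k(x_i) - f_k(x_j)) ** 2.

    scores_per_classifier[k][i] is the output of classifier k for sample i.
    """
    total = 0.0
    for scores in scores_per_classifier:
        n = len(scores)
        if len(s) != n or any(len(row) != n for row in s):
            raise ValueError("matrix S must be n x n for n sample objects")
        for i in range(n):
            for j in range(n):
                total += s[i][j] * (scores[i] - scores[j]) ** 2
    return total


toy_s: List[List[int]] = [[1, -1, 1], [-1, 1, -1], [1, -1, 1]]
penalty_separated = dynamic_pairwise_penalty([[1.0, 0.0, 1.0]], toy_s)
penalty_collapsed = dynamic_pairwise_penalty([[0.5, 0.5, 0.5]], toy_s)
```

Under this assumed form a lower value is better: scores that separate differently labeled sample objects give a smaller penalty than scores that treat all sample objects alike.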
In some embodiments, the constructing a subspace consistency constraint equation according to the plurality of classification results of the first type sample objects may include: dividing a plurality of comparison groups according to a plurality of classification results of the first type sample objects; wherein the comparison set comprises two classification results of the same first type sample object; and constructing the subspace consistency constraint equation according to the plurality of comparison groups.
By the embodiment, the interaction of different feature subspaces based on the same sample object can be introduced by comparing and utilizing the classification results of the same first type sample object obtained by the intermediate classifiers based on the corresponding different feature subspaces, and then the corresponding subspace consistency constraint formula can be established.
In some embodiments, when implemented, the subspace consistency constraint equation may be constructed from the plurality of comparison groups according to the following equation:
Figure BDA0003019524060000141
where $R_2$ represents the subspace consistency constraint, $f_p(\cdot)$ and $f_q(\cdot)$ represent two intermediate classifiers corresponding to different feature subspaces, and $X_L$ represents the feature data of the first type sample objects carrying preset labels.
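A subspace consistency penalty of this kind can be sketched as the squared disagreement between every pair of classifiers on the same labeled sample objects. The squared-difference form is an assumption chosen for illustration; the exact expression is given only in the figure. The function name `subspace_consistency_penalty` is hypothetical.

```python
from itertools import combinations
from typing import Sequence


def subspace_consistency_penalty(
    scores_per_classifier: Sequence[Sequence[float]],
) -> float:
    """Assumed form: sum over classifier pairs (p, q) and labeled samples i
    of (f_p(x_i) - f_q(x_i)) ** 2.

    scores_per_classifier[k][i] is the output of classifier k for the
    first type sample object i.
    """
    total = 0.0
    for scores_p, scores_q in combinations(scores_per_classifier, 2):
        if len(scores_p) != len(scores_q):
            raise ValueError("all classifiers must score the same sample objects")
        # Each (scores_p, scores_q) pair plays the role of one comparison group set.
        total += sum((a - b) ** 2 for a, b in zip(scores_p, scores_q))
    return total


consistency = subspace_consistency_penalty([[1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
```

The penalty is zero only when all classifiers agree on every labeled sample object, which is the behaviour the consistency constraint is meant to encourage.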
In some embodiments, when implemented, the objective function may be constructed according to the following equation based on the dynamic pairwise constraint equation and the subspace consistency constraint equation:
$$L = R_{emp} + \alpha \cdot R_1 + \beta \cdot R_2$$
where L represents the function value (also called the loss value) of the objective function, $R_{emp}$ represents the empirical loss, and $\alpha$ and $\beta$ are hyperparameters used to adjust the weights of the two constraint terms. The empirical loss may be determined according to the preset labels of the first type sample objects.
The above objective function can further be expanded into the following form:
Figure BDA0003019524060000142
where Y represents the preset labels of the first type sample objects $X_L$.
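The way the three terms combine can be sketched as follows. The weighted sum follows the objective function stated above; the squared-error form of the empirical loss is an assumption, since the text only says that the empirical loss is determined from the preset labels. Both function names are hypothetical.

```python
from typing import Sequence


def empirical_loss(
    scores_per_classifier: Sequence[Sequence[float]],
    labels: Sequence[int],
) -> float:
    """Assumed empirical loss: squared error of every classifier on the
    first type sample objects against their preset labels."""
    total = 0.0
    for scores in scores_per_classifier:
        if len(scores) != len(labels):
            raise ValueError("one score per labeled sample object is required")
        total += sum((s - y) ** 2 for s, y in zip(scores, labels))
    return total


def objective_value(r_emp: float, r1: float, r2: float, alpha: float, beta: float) -> float:
    """L = R_emp + alpha * R_1 + beta * R_2."""
    return r_emp + alpha * r1 + beta * r2


r_emp = empirical_loss([[1.0, 0.0], [0.0, 0.0]], [1, 0])
loss = objective_value(r_emp, r1=-4.0, r2=4.0, alpha=0.5, beta=0.25)
```

Setting `alpha` or `beta` to zero removes the corresponding constraint term from the objective, which is one way to train with only one of the two constraints.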
In some embodiments, the determining, according to the objective function, a plurality of preset classifiers meeting requirements to construct and obtain a preset processing model may include: determining a plurality of preset classifiers which meet the requirement by solving an optimal solution (for example, a minimum loss value) of the objective function by using a gradient descent algorithm; and connecting a plurality of preset classifiers with the discrimination structure respectively to obtain a preset processing model.
By the embodiment, the model training process can be converted into the process of solving the optimal solution of the objective function, and the corresponding multiple preset classifiers are determined by finding and determining the optimal solution of the objective function, so that the preset processing model which meets the requirements and has a good effect can be constructed.
In some embodiments, the above-mentioned discrimination structure may specifically have a discriminant function as shown below:
Figure BDA0003019524060000143
where F(x) represents the processing result output by the discrimination structure, $x \in \omega_1$ indicates that the data object has the preset risk, and $x \in \omega_2$ indicates that the data object does not have the preset risk.
In some embodiments, in the specific training process, a corresponding weight parameter may be set in the discriminant function for each preset classifier according to the contribution of the plurality of preset classifiers to the processing result, so that the processing result output by the discriminant function is relatively more accurate.
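A weighted discriminant step of this kind can be sketched as a weighted average of the classifier outputs followed by a class decision. The weighted-average form, the decision threshold of 0.5, and the function name `weighted_discriminant` are assumptions; the actual discriminant function is given only in the figure.

```python
from typing import Sequence


def weighted_discriminant(
    results: Sequence[float],
    weights: Sequence[float],
    threshold: float = 0.5,
) -> str:
    """Combine classifier outputs with per-classifier weights.

    Returns "omega_1" (preset risk) or "omega_2" (no preset risk).
    """
    if len(results) != len(weights):
        raise ValueError("one weight is required per classifier result")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(w * r for w, r in zip(weights, results)) / total_weight
    return "omega_1" if score >= threshold else "omega_2"


weighted = weighted_discriminant([1.0, 0.0, 0.0], [2.0, 1.0, 1.0])
unweighted = weighted_discriminant([1.0, 0.0, 0.0], [1.0, 1.0, 1.0])
```

In the example the first classifier alone flags the object as risky; with a weight of 2.0 its vote is enough to reach the threshold, while with equal weights it is outvoted.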
In some embodiments, after obtaining a plurality of sample data, when the method is implemented, the following may be further included: detecting whether the missing quantity of the characteristic data of the sample data is smaller than a preset missing quantity threshold value or not; and under the condition that the missing quantity of the characteristic data of the sample data is determined to be smaller than a preset missing quantity threshold value, zero filling processing is carried out on the characteristic data with missing.
By the embodiment, the sample data can be preprocessed before the preset processing model is trained by using the sample data, so that data errors are reduced, the sample data with a good training effect is obtained, and the subsequent model training is participated.
In some embodiments, when it is determined that the missing amount of the feature data of the sample data is smaller than a preset missing amount threshold and the missing degree is not serious, the missing feature data may be padded by using a value 0 or by using a character "unknown". When the missing quantity of the characteristic data of the sample data is determined to be larger than or equal to the preset missing quantity threshold value and the missing degree is serious, the sample data can be deleted, the data error brought by the fact that the sample data is subsequently used to participate in model training is avoided, and the model training precision is improved.
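The preprocessing rule can be sketched in Python as follows. The sketch assumes that a missing feature is represented by `None` and that each sample is a dictionary; the function name `preprocess_samples` is hypothetical.

```python
from typing import Any, Dict, List


def preprocess_samples(
    samples: List[Dict[str, Any]], missing_threshold: int
) -> List[Dict[str, Any]]:
    """Zero-fill lightly incomplete samples and drop heavily incomplete ones."""
    kept: List[Dict[str, Any]] = []
    for sample in samples:
        missing = sum(1 for value in sample.values() if value is None)
        if missing >= missing_threshold:
            continue  # too many missing features: delete the sample
        # Fewer missing features than the threshold: pad with the value 0.
        kept.append({k: (0 if v is None else v) for k, v in sample.items()})
    return kept


raw_samples = [
    {"income": 10, "age": None, "job": "a"},
    {"income": None, "age": None, "job": None},
]
clean_samples = preprocess_samples(raw_samples, missing_threshold=2)
```

A variant could pad missing categorical features with the string "unknown" instead of 0, as mentioned above; that would require knowing which features are categorical.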
In some embodiments, after obtaining a plurality of sample data, when implementing, the method may further include: performing multivariate derived variable exploration on the sample data to obtain richer data characteristics; and then the preset processing model can be trained by combining the richer data characteristics.
Specifically, derived features may be generated from the feature data included in the sample data. For example: statistical features of numerical features (including the maximum, minimum, mean, variance, and the like) computed after grouping by categorical features; deviation features of numerical features (including the differences between the original feature and the minimum, maximum, and mean of its column, and the like); and cross features between numerical features (new columns obtained by addition, subtraction, multiplication, and division between numerical features). In this way, richer data features can be obtained.
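The three kinds of derived features can be illustrated with a small standard-library sketch. It computes only a subset of the statistics listed above (group maximum and mean, deviation from the column mean, and one product as a cross feature). The column names and the helper `add_derived_features` are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Dict, List


def add_derived_features(
    rows: List[Dict[str, Any]], category_key: str, numeric_key: str, other_key: str
) -> List[Dict[str, Any]]:
    """Append grouped statistics, a deviation feature, and a cross feature."""
    by_category: Dict[Any, List[float]] = defaultdict(list)
    for row in rows:
        by_category[row[category_key]].append(row[numeric_key])
    column_mean = mean(row[numeric_key] for row in rows)

    derived = []
    for row in rows:
        group_values = by_category[row[category_key]]
        new_row = dict(row)
        # Statistical features of the numeric column, grouped by category.
        new_row[f"{numeric_key}_group_max"] = max(group_values)
        new_row[f"{numeric_key}_group_mean"] = mean(group_values)
        # Deviation feature: difference from the mean of the whole column.
        new_row[f"{numeric_key}_minus_mean"] = row[numeric_key] - column_mean
        # Cross feature: product of two numeric columns.
        new_row[f"{numeric_key}_times_{other_key}"] = row[numeric_key] * row[other_key]
        derived.append(new_row)
    return derived


rows = [
    {"job": "a", "income": 10.0, "spend": 2.0},
    {"job": "a", "income": 30.0, "spend": 3.0},
    {"job": "b", "income": 20.0, "spend": 4.0},
]
derived_rows = add_derived_features(rows, "job", "income", "spend")
```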
As can be seen from the above, before specific implementation, for a data prediction scene with higher feature dimensionality and more complexity, the method provided in the embodiment of the present specification may first divide multidimensional features into a plurality of preset feature subspaces according to a preset mapping rule, perform dimension reduction on feature data of a sample object, and reduce complexity and related data processing amount during subsequent model training; in addition, by combining the feature data after dimensionality reduction, and by introducing and utilizing subspace consistency constraint and dynamic pairwise constraint to construct an objective function to participate in model training, sample data carrying preset labels and sample data not carrying the preset labels can be effectively utilized, and a preset processing model with good training effect and high accuracy is obtained; in specific implementation, after the plurality of feature data of the target object are obtained, the plurality of feature data may be mapped into a plurality of preset feature subspaces according to a preset mapping rule to obtain a plurality of feature groups of the corresponding target object; then, calling a preset processing model to process the plurality of feature groups of the target object to obtain corresponding processing results; and then, according to the processing result, whether the target object has the preset risk can be determined efficiently and accurately. Therefore, the method can be well suitable for a data prediction scene with high characteristic dimensionality and complexity, and whether the target object has the preset risk or not can be accurately predicted.
The embodiment of the specification also provides another risk determination method. When the method is implemented, the following contents can be included:
S1: acquiring a plurality of characteristic data of a target object;
S2: mapping the plurality of feature data to a plurality of preset feature subspaces according to a preset mapping rule to obtain a plurality of feature groups of the target object; wherein the feature groups respectively correspond to a preset feature subspace;
S3: calling a preset processing model to process the plurality of feature groups to obtain corresponding processing results; the preset processing model at least comprises a plurality of preset classifiers, and the preset classifiers correspond to a preset feature subspace respectively; the preset processing model is obtained by training based on subspace consistency constraint or dynamic pairwise constraint in advance;
S4: and determining whether the target object has a preset risk or not according to the processing result.
Through this embodiment, by using a preset processing model trained based on the subspace consistency constraint alone or the dynamic pairwise constraint alone, processing cost and processing efficiency can both be taken into account while still determining relatively accurately whether the target object has the preset risk.
In some embodiments, the objective function used for training the preset processing model may specifically be an objective function including only the subspace consistency constraint equation, or an objective function including only the dynamic pairwise constraint equation.
Embodiments of the present specification further provide a server, including a processor and a memory for storing processor-executable instructions, where the processor, when implemented, may perform the following steps according to the instructions: acquiring a plurality of characteristic data of a target object; mapping the plurality of feature data to a plurality of preset feature subspaces according to a preset mapping rule to obtain a plurality of feature groups of the target object; wherein the feature groups respectively correspond to a preset feature subspace; calling a preset processing model to process the plurality of feature groups to obtain corresponding processing results; the preset processing model at least comprises a plurality of preset classifiers, and the preset classifiers correspond to a preset feature subspace respectively; the preset processing model is obtained by training based on subspace consistency constraint and dynamic pairwise constraint in advance; and determining whether the target object has a preset risk or not according to the processing result.
In order to more accurately complete the above instructions, referring to fig. 4, another specific server is provided in the embodiments of the present specification, wherein the server includes a network communication port 401, a processor 402, and a memory 403, and the above structures are connected by an internal cable, so that the structures may perform specific data interaction.
The network communication port 401 may be specifically configured to acquire a plurality of feature data of a target object.
The processor 402 may be specifically configured to map the plurality of feature data to a plurality of preset feature subspaces according to a preset mapping rule, so as to obtain a plurality of feature groups of the target object; wherein the feature groups respectively correspond to a preset feature subspace; calling a preset processing model to process the plurality of feature groups to obtain corresponding processing results; the preset processing model at least comprises a plurality of preset classifiers, and the preset classifiers correspond to a preset feature subspace respectively; the preset processing model is obtained by training based on subspace consistency constraint and dynamic pairwise constraint in advance; and determining whether the target object has a preset risk or not according to the processing result.
The memory 403 may be specifically configured to store a corresponding instruction program.
In this embodiment, the network communication port 401 may be a virtual port that is bound to different communication protocols, so that different data can be sent or received. For example, the network communication port may be a port responsible for web data communication, a port responsible for FTP data communication, or a port responsible for mail data communication. In addition, the network communication port can also be a communication interface or a communication chip of an entity. For example, it may be a wireless mobile network communication chip, such as GSM, CDMA, etc.; it can also be a Wifi chip; it may also be a bluetooth chip.
In this embodiment, the processor 402 may be implemented in any suitable manner. For example, the processor may take the form of a microprocessor or processor and a computer-readable medium that stores computer-readable program code (e.g., software or firmware) executable by the (micro) processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, an embedded microcontroller, and so forth. This specification is not limited in this respect.
In this embodiment, the memory 403 may take several forms. In a digital system, any device capable of storing binary data may serve as a memory; in an integrated circuit, a circuit that has a storage function but no separate physical form is also called a memory, such as a RAM or a FIFO; in a system, a storage device in physical form is also called a memory, such as a memory module or a TF card.
The present specification further provides a computer-readable storage medium based on the above risk determination method, where the computer-readable storage medium stores computer program instructions, and when the computer program instructions are executed, the computer program instructions implement: acquiring a plurality of characteristic data of a target object; mapping the plurality of feature data to a plurality of preset feature subspaces according to a preset mapping rule to obtain a plurality of feature groups of the target object; wherein the feature groups respectively correspond to a preset feature subspace; calling a preset processing model to process the plurality of feature groups to obtain corresponding processing results; the preset processing model at least comprises a plurality of preset classifiers, and the preset classifiers correspond to a preset feature subspace respectively; the preset processing model is obtained by training based on subspace consistency constraint and dynamic pairwise constraint in advance; and determining whether the target object has a preset risk or not according to the processing result.
In this embodiment, the storage medium includes, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Cache (Cache), a Hard Disk Drive (HDD), or a Memory Card (Memory Card). The memory may be used to store computer program instructions. The network communication unit may be an interface for performing network connection communication, which is set in accordance with a standard prescribed by a communication protocol.
In this embodiment, the functions and effects specifically realized by the program instructions stored in the computer-readable storage medium can be explained in comparison with other embodiments, and are not described herein again.
Referring to fig. 5, in a software level, an embodiment of the present specification further provides a risk determining apparatus, which may specifically include the following structural modules:
the obtaining module 501 may be specifically configured to obtain a plurality of feature data of a target object;
the mapping module 502 may be specifically configured to map the plurality of feature data to a plurality of preset feature subspaces according to a preset mapping rule, so as to obtain a plurality of feature groups of the target object; wherein the feature groups respectively correspond to a preset feature subspace;
the invoking module 503 is specifically configured to invoke a preset processing model to process the plurality of feature groups to obtain corresponding processing results; the preset processing model at least comprises a plurality of preset classifiers, and the preset classifiers correspond to a preset feature subspace respectively; the preset processing model is obtained by training based on subspace consistency constraint and dynamic pairwise constraint in advance;
the determining module 504 may be specifically configured to determine whether the target object has a preset risk according to the processing result.
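The four modules above can be read as a simple pipeline. The following is a minimal sketch of that flow, not the implementation of this specification: the feature names, the `SUBSPACES` mapping rule, the toy classifiers, the averaging of classifier scores, and the zero threshold are all assumptions chosen for illustration.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical preset mapping rule: each preset feature subspace is a list of feature names.
SUBSPACES: List[List[str]] = [
    ["age", "region_code"],          # identity-type features
    ["monthly_deposit", "balance"],  # deposit-type features
    ["loan_amount", "loan_rate"],    # loan-type features
]

Classifier = Callable[[Sequence[float]], float]


def map_to_subspaces(features: Dict[str, float]) -> List[List[float]]:
    """Mapping module: split one feature record into one feature group per subspace."""
    return [[features[name] for name in names] for names in SUBSPACES]


def process(groups: List[List[float]], classifiers: List[Classifier]) -> float:
    """Invoking module: each classifier scores its own group; scores are averaged (assumed)."""
    scores = [clf(group) for clf, group in zip(classifiers, groups)]
    return sum(scores) / len(scores)


def has_risk(result: float, threshold: float = 0.0) -> bool:
    """Determining module: an aggregate score above the assumed threshold means 'risk'."""
    return result > threshold


# Toy stand-ins for trained preset classifiers, one per subspace (illustrative only).
toy_classifiers: List[Classifier] = [
    lambda g: 1.0 if g[0] < 25 else -1.0,
    lambda g: 1.0 if g[1] < 1000 else -1.0,
    lambda g: 1.0 if g[0] > 500000 else -1.0,
]

# Obtaining module: here the record is simply given as a dictionary.
record = {
    "age": 23, "region_code": 3,
    "monthly_deposit": 800, "balance": 500,
    "loan_amount": 300000, "loan_rate": 3.25,
}
groups = map_to_subspaces(record)
result = process(groups, toy_classifiers)
risky = has_risk(result)
```

Keeping the mapping rule as data rather than code means the same rule can be reused unchanged for training samples and for the target object at prediction time.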
It should be noted that, the units, devices, modules, etc. illustrated in the above embodiments may be implemented by a computer chip or an entity, or implemented by a product with certain functions. For convenience of description, the above devices are described as being divided into various modules by functions, and are described separately. It is to be understood that, in implementing the present specification, functions of each module may be implemented in one or more pieces of software and/or hardware, or a module that implements the same function may be implemented by a combination of a plurality of sub-modules or sub-units, or the like. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
Therefore, the risk determining device provided by the embodiment of the specification can be better suitable for a data prediction scene with higher feature dimension and more complexity, and accurately predicts whether the target object has the preset risk.
In a specific scenario example, the risk determination method provided in this specification may be applied to construct and use a semi-supervised public accumulation fund loan overdue prediction model (a preset processing model) based on subspace consistency and dynamic pairwise constraints, so as to improve the accuracy of predicting whether a public accumulation fund loan becomes overdue (a preset risk). The specific implementation process is described below.
In a specific application, as shown in fig. 6, the method may include the following steps. First, feature information related to overdue prediction of the public accumulation fund loan (for example, a plurality of feature data of a target object) is acquired from a data warehouse. Data preprocessing and feature engineering are then performed on the sample. A test sample is constructed from the features of the data to be predicted. Finally, the test sample is input into the semi-supervised public accumulation fund loan overdue prediction model based on subspace consistency and dynamic pairwise constraints to obtain a prediction result.
The training process of the semi-supervised public accumulation fund loan overdue prediction model based on subspace consistency and dynamic pairwise constraints is shown in fig. 7. Training samples are obtained through data preprocessing and feature engineering, and comprise a small number of labeled samples and a large number of unlabeled samples.
First, the features of the samples (e.g., a plurality of sample data) are mapped into K random subspaces (e.g., a plurality of preset feature subspaces), namely random subspace 1, random subspace 2, ..., random subspace K. The features of the labeled samples (e.g., first-type sample objects carrying preset labels) are used for training in each subspace to obtain K sub-classifiers (e.g., a plurality of intermediate classifiers), namely classifier 1, classifier 2, ..., classifier K. The voting results of the K sub-classifiers are then used to assign a corresponding label (e.g., a dynamic label) to each unlabeled sample (e.g., a second-type sample object that does not carry a preset label), and the labels of the unlabeled samples are updated during model optimization.
Second, using the dynamic labels of the unlabeled samples during model optimization, the pairwise constraints can be propagated from the labeled data points to the unlabeled data points, and the dynamic pairwise constraints among all samples are calculated, so that constraint information is obtained for the whole data set. The classifiers are then updated iteratively using the dynamic pairwise constraints, so that samples of the same class are as close as possible in the output space and samples of different classes are as far apart as possible.
Meanwhile, by designing a subspace consistency constraint, different subspaces can optimize one another, which improves the subspace learning effect and enhances robustness. In this way, the semi-supervised public accumulation fund loan overdue prediction model based on subspace consistency and dynamic pairwise constraints is obtained.
In particular implementations, the classifier can be iteratively optimized by minimizing empirical losses, dynamic pairwise constraints, and subspace consistency constraints.
In this scenario example, constructing the semi-supervised public accumulation fund loan overdue prediction model based on subspace consistency and dynamic pairwise constraints may include the following three parts: data preprocessing, feature engineering, and model construction and training. Each part is explained below.
1. About "data preprocessing"
1.1 Data selection. The data used for modeling comprises the basic identity information of individual users and personal data such as housing public accumulation fund payments and loans. The features related to overdue prediction of the public accumulation fund loan are divided into three types: the first type is basic information such as age, gender, and region; the second type is housing public accumulation fund payment information such as the current personal payment base, the personal account balance, and the personal monthly payment amount; the third type is loan information such as the loan issuance amount, loan balance, and loan interest rate. The data range, and thus the data tables involved, can be determined by category.
1.2 Constructing label information. Among users who have finished repaying their public accumulation fund loans, a user with any overdue repayment is defined as an overdue user, is labeled 1, and represents the first class of samples $\omega_1$; a user who repaid the entire loan on time is defined as a non-overdue user, is labeled -1, and represents the second class of samples $\omega_2$. Users still in the repayment period are treated as unlabeled samples, and no label needs to be constructed for them.
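The labeling rule above can be sketched as a small function. The argument names `loan_settled` and `ever_overdue` are hypothetical; the specification does not name the underlying data fields.

```python
from typing import Optional


def build_label(loan_settled: bool, ever_overdue: bool) -> Optional[int]:
    """Return 1 (overdue), -1 (non-overdue), or None (unlabeled sample).

    loan_settled: True if the user has finished repaying the loan (assumed field).
    ever_overdue: True if any repayment was overdue (assumed field).
    """
    if not loan_settled:
        # Still in the repayment period: no label is constructed.
        return None
    return 1 if ever_overdue else -1


# Three illustrative users: overdue, on time, still repaying.
labels = [build_label(True, True), build_label(True, False), build_label(False, False)]
```

Returning `None` instead of a third numeric value keeps unlabeled samples clearly separate from the two labeled classes when the training set is later split.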
2. About "feature engineering"
2.1 Missing value processing. Inspect the data columns in the data table and fill in the columns that contain missing values: missing values of numerical features are filled with the value 0, missing values of non-numerical features are filled with 'un', and columns whose values are missing to a particularly severe degree are deleted outright.
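A minimal sketch of this missing-value step is shown below, using plain Python dictionaries of columns. The `drop_ratio` cutoff of 0.8 is an assumption; the specification only says that columns with particularly severe missingness are deleted, without giving a threshold.

```python
from typing import Dict, List, Optional, Set

Column = List[Optional[object]]


def fill_missing(table: Dict[str, Column], numeric: Set[str],
                 drop_ratio: float = 0.8) -> Dict[str, Column]:
    """Fill missing values (None) column by column.

    numeric: names of numerical columns, filled with 0; other columns get 'un'.
    drop_ratio: assumed cutoff; columns with a higher missing fraction are dropped.
    """
    cleaned: Dict[str, Column] = {}
    for name, values in table.items():
        missing = sum(v is None for v in values)
        if values and missing / len(values) > drop_ratio:
            continue  # missingness is too severe: delete the field
        filler = 0 if name in numeric else "un"
        cleaned[name] = [filler if v is None else v for v in values]
    return cleaned


raw = {
    "balance": [1200.0, None, 300.0, 50.0, 75.0],
    "region": ["north", None, "south", "east", "west"],
    "fax_number": [None, None, None, None, None],
}
clean = fill_missing(raw, numeric={"balance"})
```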
2.2 Derived variable exploration. Derive new features from the existing ones, for example: grouped statistics of numerical features by category feature (maximum, minimum, mean, variance, and the like); deviation features of numerical features (differences between the original feature and the minimum, maximum, and mean of its column, and the like); and cross features between numerical features (new columns obtained by adding, subtracting, multiplying, or dividing numerical features).
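The three kinds of derived variables can be illustrated with the standard library alone. This is a sketch under assumptions: the function names and output keys are hypothetical, population variance is used, and division by zero is mapped to 0.0, none of which is specified in the text.

```python
from collections import defaultdict
from statistics import mean, pvariance
from typing import Dict, Hashable, List, Sequence


def group_stats(categories: Sequence[Hashable],
                values: Sequence[float]) -> Dict[Hashable, Dict[str, float]]:
    """Grouped statistics of a numerical feature by a category feature."""
    groups: Dict[Hashable, List[float]] = defaultdict(list)
    for c, v in zip(categories, values):
        groups[c].append(v)
    return {c: {"max": max(vs), "min": min(vs), "mean": mean(vs), "var": pvariance(vs)}
            for c, vs in groups.items()}


def deviation_features(values: Sequence[float]) -> List[Dict[str, float]]:
    """Differences between each value and the column minimum, maximum, and mean."""
    lo, hi, avg = min(values), max(values), mean(values)
    return [{"minus_min": v - lo, "minus_max": v - hi, "minus_mean": v - avg}
            for v in values]


def cross_features(a: Sequence[float], b: Sequence[float]) -> List[Dict[str, float]]:
    """New columns from arithmetic between two numerical features."""
    return [{"add": x + y, "sub": x - y, "mul": x * y,
             "div": x / y if y != 0 else 0.0}  # assumed convention for zero divisors
            for x, y in zip(a, b)]


stats = group_stats(["A", "A", "B"], [1.0, 3.0, 5.0])
devs = deviation_features([1.0, 3.0, 5.0])
cross = cross_features([2.0], [4.0])
```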
3. About "model construction and training"
3.1 Constructing label information for unlabeled samples. Map the features of the samples to K random subspaces, train K sub-classifiers on the labeled samples in the respective subspaces, assign a label to each unlabeled sample using the voting result of the K sub-classifiers, and update the labels of the unlabeled samples during model optimization. The calculation is as follows:
Figure BDA0003019524060000201
where K is the number of subspaces, $f_k(\cdot)$ is the base classifier in the k-th subspace, $x_u$ is an unlabeled sample, and $l_u$ is the label assigned to the unlabeled sample $x_u$ by the voting result of the K sub-classifiers.
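The voting step can be sketched as follows. Because the formula itself is only available as an image, the exact voting rule here is an assumption: the dynamic label is taken as the sign of the summed outputs of the K sub-classifiers, with ties resolved to +1. The helper names and the random choice of feature indices are likewise illustrative.

```python
import random
from typing import Callable, List, Sequence

Classifier = Callable[[Sequence[float]], float]


def random_subspaces(n_features: int, k: int, size: int, seed: int = 0) -> List[List[int]]:
    """Draw K random subspaces, each a sorted list of `size` distinct feature indices."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_features), size)) for _ in range(k)]


def project(x: Sequence[float], indices: Sequence[int]) -> List[float]:
    """Map a full feature vector into one subspace."""
    return [x[i] for i in indices]


def vote_label(x_u: Sequence[float], subspaces: Sequence[Sequence[int]],
               classifiers: Sequence[Classifier]) -> int:
    """Assumed voting rule: sign of the summed outputs f_k(x_u), ties go to +1."""
    total = sum(clf(project(x_u, idx)) for clf, idx in zip(classifiers, subspaces))
    return 1 if total >= 0 else -1


# Fixed subspaces and toy sub-classifiers that return the sign of their single feature.
fixed_subspaces = [[0], [1], [2]]
sign_classifiers: List[Classifier] = [lambda g: 1.0 if g[0] >= 0 else -1.0] * 3

label_a = vote_label([1.0, -2.0, 3.0], fixed_subspaces, sign_classifiers)
label_b = vote_label([-1.0, -1.0, 1.0], fixed_subspaces, sign_classifiers)
drawn = random_subspaces(n_features=6, k=3, size=2)
```

Since the labels depend on the current classifiers, calling `vote_label` again after each optimization round is what makes the labels dynamic.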
3.2 Constructing dynamic pairwise constraints. Using the dynamic labels of the unlabeled samples during model optimization, compute the connected set between samples, $ML = \{(x_i, x_j) \mid l_i = l_j\}$, and the non-connected set, $CL = \{(x_i, x_j) \mid l_i \neq l_j\}$, and then compute the dynamic pairwise matrix S between all samples. The calculation is as follows:
Figure BDA0003019524060000202
The dynamic pairwise constraint is then calculated from the dynamic pairwise matrix as follows:
Figure BDA0003019524060000203
where K is the number of random subspaces, and X and X' are any two training samples (labeled or unlabeled).
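The following sketch shows one plausible form of the dynamic pairwise matrix and constraint. The formulas in this specification are only available as images, so both definitions below are assumptions: S is +1 for pairs in the connected set and -1 for pairs in the non-connected set, and the constraint averages, over the K subspaces, the S-weighted squared differences of classifier outputs.

```python
from typing import List, Sequence


def pairwise_matrix(labels: Sequence[int]) -> List[List[int]]:
    """Assumed form of S: +1 if two samples share a label (ML), -1 otherwise (CL)."""
    n = len(labels)
    return [[1 if labels[i] == labels[j] else -1 for j in range(n)] for i in range(n)]


def pairwise_constraint(S: Sequence[Sequence[int]],
                        outputs: Sequence[Sequence[float]]) -> float:
    """Assumed form of the constraint; outputs[k][i] is f_k(x_i).

    Same-label pairs are penalized for being far apart in the output space,
    different-label pairs are rewarded for being far apart.
    """
    n = len(S)
    total = 0.0
    for f in outputs:
        for i in range(n):
            for j in range(n):
                total += S[i][j] * (f[i] - f[j]) ** 2
    return total / len(outputs)


labels = [1, 1, -1]              # preset and dynamic labels combined
S = pairwise_matrix(labels)
r1_good = pairwise_constraint(S, [[0.9, 0.8, -0.7]])   # outputs agree with labels
r1_bad = pairwise_constraint(S, [[0.9, -0.7, 0.8]])    # outputs contradict labels
```

Under these assumptions, outputs that separate the two classes yield a smaller constraint value, which is the behavior the text describes for minimization.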
3.3 Constructing the subspace consistency constraint. By designing a subspace consistency constraint, different subspaces can optimize one another, which improves the subspace learning effect and enhances robustness. The subspace consistency constraint is calculated as follows:
Figure BDA0003019524060000211
where $X_L$ is the set of labeled samples in the training samples.
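A sketch of a subspace consistency term over the labeled set $X_L$ is given below. The exact formula is only available as an image, so the squared-difference form is an assumption; it follows the idea of comparing two classification results of the same labeled sample taken from two different subspaces.

```python
from itertools import combinations
from typing import Sequence


def consistency_constraint(outputs_labeled: Sequence[Sequence[float]]) -> float:
    """Assumed form: sum over subspace pairs of squared output differences.

    outputs_labeled[k][i] is f_k(x_i) for the i-th sample of the labeled set X_L.
    """
    total = 0.0
    for f_a, f_b in combinations(outputs_labeled, 2):
        total += sum((a - b) ** 2 for a, b in zip(f_a, f_b))
    return total


r2_agree = consistency_constraint([[1.0, -1.0], [1.0, -1.0]])
r2_mixed = consistency_constraint([[1.0, -1.0], [0.5, -1.0], [1.0, 0.0]])
```

The term is zero only when every sub-classifier gives the same output for every labeled sample, so minimizing it pushes the subspaces toward agreement.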
3.4 Designing the objective function. The classifiers are iteratively optimized by minimizing the empirical loss, the dynamic pairwise constraint, and the subspace consistency constraint. The objective function is as follows: $L = R_{emp} + \alpha \cdot R_1 + \beta \cdot R_2$
Further, the objective function is expanded as follows:
Figure BDA0003019524060000212
where Y is the set of labels of the labeled samples, $R_{emp}$ is the empirical loss, and $\alpha$ and $\beta$ are hyperparameters used to adjust the weights of the corresponding terms.
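The weighted sum in the objective can be sketched directly. The combination $R_{emp} + \alpha \cdot R_1 + \beta \cdot R_2$ comes from the text; the squared-error form of the empirical loss is an assumption, because the expanded formula is only available as an image.

```python
from typing import Sequence


def empirical_loss(outputs_labeled: Sequence[Sequence[float]],
                   y: Sequence[int]) -> float:
    """Assumed empirical loss: squared error on labeled samples, averaged over subspaces.

    outputs_labeled[k][i] is f_k(x_i) for labeled sample i; y[i] is its label in Y.
    """
    k = len(outputs_labeled)
    return sum((f_i - y_i) ** 2
               for f in outputs_labeled
               for f_i, y_i in zip(f, y)) / k


def objective(r_emp: float, r1: float, r2: float, alpha: float, beta: float) -> float:
    """L = R_emp + alpha * R_1 + beta * R_2, as stated in the text."""
    return r_emp + alpha * r1 + beta * r2


r_emp = empirical_loss([[1.0, -1.0], [0.0, -1.0]], [1, -1])
total = objective(r_emp, r1=2.0, r2=3.0, alpha=0.5, beta=0.1)
```

Setting `alpha` or `beta` to zero switches the corresponding constraint off, which is a convenient way to check how much each term contributes.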
3.5 Model optimization. The optimization problem is solved by gradient descent, minimizing the objective function of the model until a preset number of iterations is reached or the difference between the loss values of two successive iterations is less than a preset threshold, which yields the final classification model. The specific discriminant function is as follows:
Figure BDA0003019524060000213
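The two stopping conditions of step 3.5 can be sketched with a generic gradient descent loop. This is an illustration on a toy quadratic loss, not the model's actual objective; the learning rate, iteration cap, and tolerance are assumed values.

```python
from typing import Callable, List, Sequence, Tuple


def gradient_descent(grad: Callable[[Sequence[float]], Sequence[float]],
                     loss: Callable[[Sequence[float]], float],
                     w0: Sequence[float],
                     lr: float = 0.1,
                     max_iters: int = 1000,
                     tol: float = 1e-8) -> Tuple[List[float], int]:
    """Minimize `loss` and return (parameters, iterations used).

    Stops when the preset iteration count is reached, or when the loss values
    of two successive iterations differ by less than the preset threshold.
    """
    w = list(w0)
    prev = loss(w)
    for it in range(1, max_iters + 1):
        g = grad(w)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
        cur = loss(w)
        if abs(prev - cur) < tol:
            return w, it
        prev = cur
    return w, max_iters


# Toy objective (w - 3)^2 with gradient 2 * (w - 3); its minimum is at w = 3.
w_opt, iters = gradient_descent(
    grad=lambda w: [2.0 * (w[0] - 3.0)],
    loss=lambda w: (w[0] - 3.0) ** 2,
    w0=[0.0],
)
```

Checking the change in loss rather than the gradient norm matches the stopping rule described in the text and avoids an extra gradient evaluation.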
3.6 Model testing. The test sample x is input into the discriminant function of the classifier to obtain the discrimination result of the model.
In this scenario example, a semi-supervised public accumulation fund loan overdue prediction model based on subspace consistency and dynamic pairwise constraints is built through the training described above, using training samples that comprise a small number of labeled samples and a large number of unlabeled samples. First, subspace learning is used instead of concatenating all categories of features, which prevents an excessive number of features from causing the "curse of dimensionality" during training. Second, by designing a subspace consistency constraint, different subspaces can optimize one another, which improves the subspace learning effect and enhances robustness. Third, during iterative optimization of the model, candidate dynamic labels are assigned to the unlabeled samples according to the voting results of the models in the multiple subspaces, so that the spatial structure information contained in the unlabeled samples is fully used. Finally, a dynamic pairwise constraint is designed so that samples of the same class are as close as possible in the output space and samples of different classes are as far apart as possible; using the dynamic labels of the unlabeled samples during model optimization, the pairwise constraints are propagated from the labeled data points to the unlabeled data points, so that constraint information is obtained for the whole data set and the generalization of the model improves.
In this scenario example, compared with a conventional model based on a semi-supervised learning algorithm, the trained model performs better on the accuracy, recall, and comprehensive evaluation value of overdue prediction for public accumulation fund loans, and can predict overdue users more accurately. When the model is applied at financial institutions such as banks, an accurate risk control model can be built from personal basic identity information and personal data such as housing public accumulation fund payments and loans, so that the safety of credit assets and the repayment capability of credit subjects can be evaluated scientifically and the overdue risk of public accumulation fund loans can be guarded against as far as possible.
Although the present specification provides method steps as described in the examples or flowcharts, additional or fewer steps may be included based on conventional or non-inventive means. The order of steps recited in the embodiments is merely one manner of performing the steps in a multitude of orders and does not represent the only order of execution. When an apparatus or client product in practice executes, it may execute sequentially or in parallel (e.g., in a parallel processor or multithreaded processing environment, or even in a distributed data processing environment) according to the embodiments or methods shown in the figures. The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the presence of additional identical or equivalent elements in a process, method, article, or apparatus that comprises the recited elements is not excluded. The terms first, second, etc. are used to denote names, but not any particular order.
Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer readable program code, the same functionality can be implemented by logically programming method steps such that the controller is in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller may therefore be considered as a hardware component, and the means included therein for performing the various functions may also be considered as a structure within the hardware component. Or even means for performing the functions may be regarded as being both a software module for performing the method and a structure within a hardware component.
This description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, classes, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer-readable storage media including memory storage devices.
From the above description of the embodiments, it is clear to those skilled in the art that the present specification can be implemented by software plus necessary general hardware platform. With this understanding, the technical solutions in the present specification may be essentially embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a mobile terminal, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments in the present specification.
The embodiments in the present specification are described in a progressive manner, and the same or similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. The description is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable electronic devices, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
While the specification has been described with examples, those skilled in the art will appreciate that there are numerous variations and permutations of the specification that do not depart from the spirit of the specification, and it is intended that the appended claims include such variations and modifications that do not depart from the spirit of the specification.

Claims (14)

1. A method for determining risk, comprising:
acquiring a plurality of characteristic data of a target object;
mapping the plurality of feature data to a plurality of preset feature subspaces according to a preset mapping rule to obtain a plurality of feature groups of the target object; wherein the feature groups respectively correspond to a preset feature subspace;
calling a preset processing model to process the plurality of feature groups to obtain corresponding processing results; the preset processing model at least comprises a plurality of preset classifiers, and the preset classifiers correspond to a preset feature subspace respectively; the preset processing model is obtained by training based on subspace consistency constraint and dynamic pairwise constraint in advance;
and determining whether the target object has a preset risk or not according to the processing result.
2. The method of claim 1, wherein the target object comprises a transaction account; correspondingly, the preset risk comprises the overdue risk of the transaction data.
3. The method of claim 2, wherein the characterization data comprises: identity class characteristic data of the transaction account, historical transaction behavior class characteristic data of the transaction account and current transaction behavior class characteristic data of the transaction account.
4. The method of claim 1, wherein the pre-defined processing model further comprises a discriminant structure; the judging structure is connected with the plurality of preset classifiers; and the judging structure is used for generating the processing result according to the classification result output by the plurality of preset classifiers.
5. The method of claim 4, wherein the pre-set treatment model is established as follows:
acquiring a plurality of sample data; the sample data corresponds to a sample object, and the sample data comprises a plurality of characteristic data of the corresponding sample object; the sample objects comprise a first type sample object carrying a preset label and a second type sample object not carrying the preset label;
mapping a plurality of feature data of the first type sample object to a plurality of preset feature subspaces according to a preset mapping rule to obtain first type training data; mapping a plurality of feature data of the second type sample object to a plurality of preset feature subspaces to obtain second type training data;
training a plurality of initial classifiers by using the first type of training data to obtain a plurality of corresponding intermediate classifiers; wherein the initial classifiers correspond to a preset feature subspace respectively;
constructing an objective function for a plurality of intermediate classifiers using the plurality of intermediate classifiers, the first class of training data, the second class of training data; wherein, the objective function comprises a subspace consistency constraint formula and a dynamic pairwise constraint formula;
and determining a plurality of preset classifiers meeting the requirements according to the objective function so as to construct and obtain a preset processing model.
6. The method of claim 5, wherein constructing an objective function for a plurality of intermediate classifiers using the plurality of intermediate classifiers, the first class of training data, and the second class of training data comprises:
calling a plurality of intermediate classifiers to process the second class of training data so as to obtain dynamic labels aiming at the second class of sample objects;
constructing a dynamic pairwise constraint equation according to the preset labels of the first type sample objects and the dynamic labels of the second type sample objects;
calling a plurality of intermediate classifiers to process the first type of training data to obtain a plurality of classification results of the first type of sample objects;
constructing a subspace consistency constraint formula according to a plurality of classification results of the first type sample objects;
and constructing the objective function according to the dynamic pairwise constraint equation and the subspace consistency constraint equation.
7. The method of claim 6, wherein constructing a dynamic pairwise constraint equation according to the preset labels of the first type sample objects and the dynamic labels of the second type sample objects comprises:
establishing a connection set and a non-connection set according to the preset label of the first type sample object and the dynamic label of the second type sample object; wherein the connected set comprises a plurality of connected groups, and the connected groups comprise two sample objects with the same label; the non-connected set comprises a plurality of non-connected groups, and the non-connected groups comprise two sample objects with different labels;
constructing a dynamic pairing matrix according to the connected set and the non-connected set;
and constructing the dynamic pairwise constraint equation according to the dynamic pairwise matrix and the plurality of intermediate classifiers.
8. The method of claim 6, wherein constructing a subspace consistency constraint equation based on the plurality of classification results for the first class of sample objects comprises:
dividing a plurality of comparison groups according to a plurality of classification results of the first type sample objects; wherein the comparison set comprises two classification results of the same first type sample object;
and constructing the subspace consistency constraint equation according to the plurality of comparison groups.
9. The method of claim 5, wherein determining a plurality of required pre-set classifiers according to the objective function to construct a pre-set processing model comprises:
solving the optimal solution of the objective function by using a gradient descent algorithm to determine a plurality of preset classifiers which meet the requirements;
and connecting a plurality of preset classifiers with the discrimination structure to obtain a preset processing model.
10. The method of claim 5, wherein after obtaining a plurality of sample data, the method further comprises:
detecting whether the missing quantity of the characteristic data of the sample data is smaller than a preset missing quantity threshold value or not;
and under the condition that the missing quantity of the characteristic data of the sample data is determined to be smaller than a preset missing quantity threshold value, zero filling processing is carried out on the characteristic data with missing.
11. A method for determining risk, comprising:
acquiring a plurality of characteristic data of a target object;
mapping the plurality of feature data to a plurality of preset feature subspaces according to a preset mapping rule to obtain a plurality of feature groups of the target object; wherein the feature groups respectively correspond to a preset feature subspace;
calling a preset processing model to process the plurality of feature groups to obtain corresponding processing results; the preset processing model at least comprises a plurality of preset classifiers, and the preset classifiers correspond to a preset feature subspace respectively; the preset processing model is obtained by training based on subspace consistency constraint or dynamic pairwise constraint in advance;
and determining whether the target object has a preset risk or not according to the processing result.
12. An apparatus for risk determination, comprising:
the acquisition module is used for acquiring a plurality of characteristic data of the target object;
the mapping module is used for mapping the plurality of feature data into a plurality of preset feature subspaces according to a preset mapping rule to obtain a plurality of feature groups of the target object; wherein the feature groups respectively correspond to a preset feature subspace;
the calling module is used for calling a preset processing model to process the plurality of feature groups so as to obtain corresponding processing results; the preset processing model at least comprises a plurality of preset classifiers, and the preset classifiers correspond to a preset feature subspace respectively; the preset processing model is obtained by training based on subspace consistency constraint and dynamic pairwise constraint in advance;
and the determining module is used for determining whether the target object has a preset risk or not according to the processing result.
13. A server comprising a processor and a memory for storing processor-executable instructions which, when executed by the processor, implement the steps of the method of any one of claims 1 to 10.
14. A computer-readable storage medium having stored thereon computer instructions which, when executed, implement the steps of the method of any one of claims 1 to 10.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110398902.5A CN113095408A (en) 2021-04-14 2021-04-14 Risk determination method and device and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110398902.5A CN113095408A (en) 2021-04-14 2021-04-14 Risk determination method and device and server

Publications (1)

Publication Number Publication Date
CN113095408A true CN113095408A (en) 2021-07-09

Family

ID=76677251

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110398902.5A Pending CN113095408A (en) 2021-04-14 2021-04-14 Risk determination method and device and server

Country Status (1)

Country Link
CN (1) CN113095408A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106529508A (en) * 2016-12-07 2017-03-22 西安电子科技大学 Local and non-local multi-feature semantics-based hyperspectral image classification method
CN107563445A (en) * 2017-09-06 2018-01-09 苏州大学 Method and apparatus for extracting image features based on semi-supervised learning
CN110442712A (en) * 2019-07-05 2019-11-12 阿里巴巴集团控股有限公司 Determination method, apparatus, server and the text of risk try system
CN111027582A (en) * 2019-09-20 2020-04-17 哈尔滨理工大学 Semi-supervised feature subspace learning method and device based on low-rank graph learning
CN111353516A (en) * 2018-12-21 2020-06-30 华为技术有限公司 Sample classification method and model updating method for online learning
CN111681091A (en) * 2020-08-12 2020-09-18 腾讯科技(深圳)有限公司 Financial risk prediction method and device based on time domain information and storage medium
CN112257808A (en) * 2020-11-02 2021-01-22 郑州大学 Integrated collaborative training method and device for zero sample classification and terminal equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
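征察; 吉立新; 高超; 李邵梅; 吴翼腾: "Disambiguation algorithm for partial-label data based on pairwise constraints", Acta Automatica Sinica, no. 07, 21 November 2018 (2018-11-21), pages 1367-1377 *
王娜 et al.: "Active semi-supervised spectral clustering algorithm based on the characteristics of supervised information", Acta Electronica Sinica, vol. 38, no. 1, 31 January 2010 (2010-01-31), pages 172-176 *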

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113435900A (en) * 2021-07-12 2021-09-24 中国工商银行股份有限公司 Transaction risk determination method and device and server
CN113688802A (en) * 2021-10-22 2021-11-23 季华实验室 Gesture recognition method, device and equipment based on electromyographic signals and storage medium
CN114819614A (en) * 2022-04-22 2022-07-29 支付宝(杭州)信息技术有限公司 Data processing method, device, system and equipment
CN116226744A (en) * 2023-03-16 2023-06-06 中金同盛数字科技有限公司 User classification method, device and equipment
CN117011063A (en) * 2023-09-25 2023-11-07 中国建设银行股份有限公司 Customer transaction risk prediction processing method and device
CN117011063B (en) * 2023-09-25 2023-12-29 中国建设银行股份有限公司 Customer transaction risk prediction processing method and device
CN117787727A (en) * 2024-02-26 2024-03-29 百融云创科技股份有限公司 Service risk prediction method, device, equipment and storage medium
CN117787727B (en) * 2024-02-26 2024-05-31 百融云创科技股份有限公司 Service risk prediction method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
US20240281727A1 (en) Machine learning artificial intelligence system for predicting hours of operation
EP3985578A1 (en) Method and system for automatically training machine learning model
CN113095408A (en) Risk determination method and device and server
US10504120B2 (en) Determining a temporary transaction limit
US9576248B2 (en) Record linkage sharing using labeled comparison vectors and a machine learning domain classification trainer
CN107729519B (en) Multi-source multi-dimensional data-based evaluation method and device, and terminal
CN108334625B (en) User information processing method and device, computer equipment and storage medium
CN110942392A (en) Service data processing method, device, equipment and medium
CA3169417A1 (en) Method of and system for appraising risk
CN115545886A (en) Overdue risk identification method, overdue risk identification device, overdue risk identification equipment and storage medium
CN113435900A (en) Transaction risk determination method and device and server
CN116800831B (en) Service data pushing method, device, storage medium and processor
US11295325B2 (en) Benefit surrender prediction
CN115204881A (en) Data processing method, device, equipment and storage medium
CN117114901A (en) Method, device, equipment and medium for processing insurance data based on artificial intelligence
CN111047336A (en) User label pushing method, user label display method, device and computer equipment
CN114066513A (en) User classification method and device
CN114549174A (en) User behavior prediction method and device, computer equipment and storage medium
CN114170000A (en) Credit card user risk category identification method, device, computer equipment and medium
CN113094595A (en) Object recognition method, device, computer system and readable storage medium
CN113781235B (en) Data processing method, device, computer equipment and storage medium
EP4310755A1 (en) Self learning machine learning transaction scores adjustment via normalization thereof
CN116151986A (en) Fund recommendation method, device and equipment based on user risk type
CN114255101A (en) Product information recommendation method and device, computer equipment and storage medium
CN113989012A (en) Method, device, medium and equipment for classifying borrowing object crowd of bad assets

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination