CN115618008A

CN115618008A - Account state model construction method and device, computer equipment and storage medium

Info

Publication number: CN115618008A
Application number: CN202211122877.9A
Authority: CN
Inventors: 吴嘉婧; 章杨清; 黄宝莹; 赵山河; 尹川学; 郭海旭; 郑子彬
Original assignee: Merchants Union Consumer Finance Co Ltd; Sun Yat Sen University
Current assignee: Merchants Union Consumer Finance Co Ltd; Sun Yat Sen University
Priority date: 2022-09-15
Filing date: 2022-09-15
Publication date: 2023-01-17

Abstract

The application relates to an account state model construction method and device, computer equipment and a storage medium. The method comprises the following steps: acquiring historical user characteristic data; constructing a user characteristic map according to historical user characteristic data; generating a corresponding node initial characteristic vector based on the node attribute characteristics of each node in the user characteristic map and the initial value of the model training parameter, wherein the initial value of the model training parameter is the initial value of the parameter to be trained in the account state model; obtaining an incidence matrix based on the fusion of the initial feature vectors of the nodes, wherein the incidence matrix is used for representing the incidence relation of each node and the importance of each node; obtaining feature input sample data based on the user feature map, the incidence matrix and the node initial feature vector; and training the characteristic input sample data to obtain an account state model, wherein the account state model is used for evaluating the operation state of the target account. By adopting the method, the accuracy of the account state model prediction can be effectively improved.

Description

Account state model construction method and device, computer equipment and storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for constructing an account status model, a computer device, and a storage medium.

Background

With the development of computer technology, more and more users perform operations online through channels such as mobile phone applications and websites. In this process, the feature data submitted by users is analyzed and mined to determine the current operation state of their accounts.

In the traditional technology, a user device relationship graph is constructed, the state scores of all relevant device nodes connected to each medium node are obtained, and these scores are used as confidence scores to analyze the operation state of a user's target account. Models built in this way have low prediction accuracy.

Disclosure of Invention

Therefore, it is necessary to provide an account status model construction method, an account status model construction device, a computer device, and a computer-readable storage medium, which can effectively improve the accuracy of the account status model prediction.

An account state model building method, the method comprising:

acquiring historical user characteristic data;

constructing a user characteristic map according to historical user characteristic data;

generating a corresponding node initial characteristic vector based on the node attribute characteristics of each node in the user characteristic map and the initial value of the model training parameter, wherein the initial value of the model training parameter is the initial value of the parameter to be trained in the account state model;

obtaining an incidence matrix based on the fusion of the initial feature vectors of the nodes, wherein the incidence matrix is used for representing the incidence relation of each node and the importance of each node;

obtaining feature input sample data based on the user feature map, the incidence matrix and the node initial feature vector;

and training the characteristic input sample data to obtain an account state model, wherein the account state model is used for evaluating the operation state of the target account.

In one embodiment, generating a corresponding node initial feature vector based on the node attribute feature of each node in the user feature map and the initial value of the model training parameter includes:

acquiring a nonlinear activation function;

determining the original node characteristics of each node according to the node attributes of each node;

and generating a node initial characteristic vector corresponding to each node based on the nonlinear activation function, the original node characteristics and the model training parameter initial value.

In one embodiment, generating a corresponding node initial feature vector based on the node attribute feature of each node in the user feature map and the initial value of the model training parameter includes:

when the node is a contact information node, acquiring the node initial feature vectors of the adjacent nodes of the contact information node, wherein the contact information node is a node whose node attribute is contact information;

and determining the node initial feature vector corresponding to the contact information node according to the mean of the node initial feature vectors of the adjacent nodes.

In one embodiment, obtaining the incidence matrix of each node based on the initial feature vector fusion of the nodes includes:

acquiring a user identification node in each node;

obtaining a user initial characteristic vector according to account channel information, geographic information and application state information of the user identification node;

and obtaining the incidence matrix based on the fusion of the initial feature vectors of the user and the initial feature vectors of the nodes corresponding to other nodes.

In one embodiment, obtaining feature input sample data based on the user feature map, the incidence matrix and the node initial feature vector comprises:

determining a sub-graph in the user characteristic graph according to user identification nodes in all nodes;

obtaining a user characteristic map weight matrix according to the sub-map;

fusing the user characteristic map weight matrix and the incidence matrix to obtain a node degree matrix;

obtaining a sub-map degree matrix based on sub-map nodes and incidence matrixes in the sub-map;

and obtaining feature input sample data based on the user feature map weight matrix, the incidence matrix, the node degree matrix, the sub-map degree matrix and the node initial feature vector.

In one embodiment, training to obtain an account status model based on feature input sample data includes:

normalizing the characteristic input sample data to obtain standard input sample data;

and inputting standard input sample data into the graph convolution neural network model for training to obtain an account state model.

In one embodiment, inputting standard input sample data into a graph convolution neural network model for training to obtain an account state model, including:

inputting standard input sample data into a graph convolution neural network model to obtain a model actual output value;

classifying the actual output value of the model according to a preset classification algorithm to obtain an actual account state estimation result corresponding to input sample data;

when the error between the actual account state estimation result and the standard account state estimation result preset in the standard input sample data is greater than a preset error threshold, returning to the step of inputting the standard input sample data into the graph convolution neural network model to obtain the model actual output value;

and when the error between the actual estimation result of the account state and the standard estimation result of the account state preset in the standard input sample data is less than or equal to a preset error threshold, finishing training to obtain an account state model.

An account state model building apparatus, the apparatus comprising:

the data acquisition module is used for acquiring historical user characteristic data;

the sample data generation module is used for constructing a user characteristic map according to historical user characteristic data; generating a corresponding node initial characteristic vector based on the node attribute characteristics of each node in the user characteristic map and the initial value of the model training parameter, wherein the initial value of the model training parameter is the initial value of the parameter to be trained in the account state model; obtaining an incidence matrix based on the fusion of the initial feature vectors of the nodes, wherein the incidence matrix is used for representing the incidence relation of each node and the importance of each node; obtaining characteristic input sample data based on the user characteristic map, the incidence matrix and the node initial characteristic vector;

and the training module is used for training to obtain an account state model based on the characteristic input sample data, and the account state model is used for evaluating the operation state of the target account.

A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:

acquiring historical user characteristic data;

obtaining an incidence matrix based on the node initial feature vector fusion, wherein the incidence matrix is used for representing the incidence relation of each node and the importance of each node;

A computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of:

acquiring historical user characteristic data;

According to the account state model construction method and device, the computer equipment and the storage medium described above, historical user feature data is acquired and a user feature map is constructed from it. A node initial feature vector is generated for each node based on its node attribute features and the initial values of the model training parameters, which are the initial values of the parameters to be trained in the account state model. An incidence matrix, which represents the association relationship and the importance of each node, is obtained by fusing the node initial feature vectors. Feature input sample data is obtained based on the user feature map, the incidence matrix and the node initial feature vectors, and the account state model, which is used for evaluating the operation state of a target account, is obtained by training on the feature input sample data. Because the initial values of the model training parameters participate in generating the node initial feature vectors, new training parameters are produced as the model iterates and the node feature vectors are regenerated from the new parameters of each iteration, which accelerates the convergence of the model; the incidence matrix is in turn obtained by fusing the node initial feature vectors.

Drawings

FIG. 1 is a diagram of an application environment of a method for building an account state model in one embodiment;

FIG. 2 is a schematic flow chart diagram of a method for constructing an account state model according to one embodiment;

FIG. 3 is a schematic flow chart illustrating the generation of initial feature vectors for nodes in one embodiment;

FIG. 4 is a schematic flow chart of generating the node initial feature vector of a contact information node in another embodiment;

FIG. 5 is a schematic flow chart diagram for generating a correlation matrix in one embodiment;

FIG. 6 is a schematic diagram of a process for generating feature input sample data in one embodiment;

FIG. 7 is a schematic flow chart diagram illustrating the process of obtaining an account status model in one embodiment;

FIG. 8 is a schematic flow chart diagram that illustrates training of an account state model in one embodiment;

FIG. 9 is a block diagram of an apparatus for constructing an account state model according to an embodiment;

FIG. 10 is a diagram showing an internal structure of a computer device in one embodiment;

FIG. 11 is a diagram illustrating all nodes associated with a user in one embodiment;

FIG. 12 is a diagram of a user profile in one embodiment;

FIG. 13 is a flow diagram illustrating the prediction of a risk of overdue payment for a user account, according to one embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

The account state model construction method provided by the embodiment of the application can be applied to the application environment shown in fig. 1. The computer device 102 acquires historical user feature data, constructs a user feature map according to the historical user feature data, generates corresponding node initial feature vectors based on node attribute features of each node in the user feature map and initial values of model training parameters, wherein the initial values of the model training parameters are initial values of parameters to be trained in an account state model, obtains an incidence matrix based on the fusion of the node initial feature vectors, the incidence matrix is used for representing incidence relations of the nodes and importance of the nodes, obtains feature input sample data based on the user feature map, the incidence matrix and the node initial feature vectors, obtains the account state model based on feature input sample data training, and the account state model is used for evaluating the operation state of a target account. The computer device 102 may specifically include, but is not limited to, various personal computers, laptops, servers, smartphones, tablets, smart cameras, portable wearable devices, and the like.

In one embodiment, as shown in fig. 2, there is provided an account status model building method, which is described by taking the method as an example applied to the computer device 102 in fig. 1, and includes the following steps:

Step S202, acquiring historical user feature data.

The historical user feature data is personal data of the user, including date of birth (year and month), education level, associated enterprises, contact persons, inviter information and the like; this is the information a user must fill in when applying for related activities online. The server can also obtain information about the device used when the user logs in to the client, such as the device identification code, device brand and physical address. After desensitization and the handling of missing and abnormal values, the user data is divided into six categories: user identification, identity features, unit features, bank card features, device features and contact information. Each user corresponds to state label information indicating whether the account is in an abnormal state or a normal state, and this state label information is used as the standard output sample data for subsequent model training.

Step S204, constructing a user feature map according to the historical user feature data.

The user identification node comprises node attributes such as the account creation channel, geographic position and application state; the identity feature node comprises node attributes such as identity state, education level and credit investigation status; the unit node comprises node attributes such as enterprise type, geographic position and registered capital; the bank card node comprises node attributes such as account type and account opening city; and the device node comprises node attributes such as the device identification code and device system.

Specifically, the computer device takes the user identification, identity features, unit features, bank card features, device features and contact information in the historical user feature data as nodes, and connects them with edges representing the association relationships between the nodes, thereby constructing the user feature map.
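
The construction in step S204 can be sketched as follows. This is a minimal illustration, not the implementation of the application: the record layout, the category keys and the function name `build_user_feature_map` are assumptions, and only edges between a user identification node and that user's attribute nodes are created, which is a simplification of the association relationships described above.

```python
from collections import defaultdict

# Hypothetical keys for the five non-user categories named in the text.
CATEGORIES = ("identity", "unit", "bank_card", "device", "contact")


def build_user_feature_map(records):
    """Return an undirected adjacency map {node: set of neighbouring nodes}.

    Each record is a dict with a "user_id" key plus one value per category.
    Nodes are (category, value) tuples, so two users who share a value
    (for example one phone number) are linked to the same node.
    """
    graph = defaultdict(set)
    for record in records:
        user_node = ("user", record["user_id"])
        graph[user_node]  # ensure the user node exists even without attributes
        for category in CATEGORIES:
            value = record.get(category)
            if value is None:
                continue  # missing attribute: no node and no edge
            attr_node = (category, value)
            graph[user_node].add(attr_node)
            graph[attr_node].add(user_node)
    return dict(graph)


records = [
    {"user_id": "u1", "identity": "id-1", "device": "dev-A", "contact": "tel-9"},
    {"user_id": "u2", "identity": "id-2", "device": "dev-B", "contact": "tel-9"},
]
feature_map = build_user_feature_map(records)
shared = feature_map[("contact", "tel-9")]  # both users share this contact node
```

Keying nodes by category and value is what lets shared attributes, such as a common phone number, join otherwise separate users in one map.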

Step S206, generating a corresponding node initial feature vector based on the node attribute features of each node in the user feature map and the initial values of the model training parameters, wherein the initial values of the model training parameters are the initial values of the parameters to be trained in the account state model.

The node initial feature vector is a vector that represents the node attributes of the corresponding node, so the node initial feature vectors can also represent the association relationships between nodes: the more closely two nodes are associated, the smaller the distance between their node initial feature vectors. The node initial feature vectors of all nodes have the same dimension. The initial values of the model training parameters are the initial values of the parameters to be trained in the account state model, and new model training parameters are produced at each iteration of model training. These initial values can be specified from experience or determined by an intelligent optimization algorithm such as a genetic algorithm or a particle swarm algorithm.

Specifically, the computer device generates the node initial feature vectors according to the node attribute features of each node in the user feature map and the initial values of the model training parameters, and normalizes them. At each iteration of the model, the current values of the training parameters are used as the new model training parameters to regenerate the initial feature vector of each node. In other words, the node initial feature vectors are updated continually as the model iterates, until a set of node initial feature vectors that best represents the current user feature map is obtained.

Step S208, obtaining an incidence matrix based on the fusion of the node initial feature vectors, wherein the incidence matrix is used for representing the association relationship of each node and the importance of each node.

In the incidence matrix, the row and column position of each matrix element represents the position of the corresponding node within the user feature map, and the value of the matrix element represents the importance of that node in the user feature map; that is, the larger the value of a matrix element, the more important the corresponding node.

Specifically, the computer device divides the user feature map into sub-maps according to the user identification nodes, determines the row and column position of the matrix element corresponding to a target node according to the sub-map in which the target node is located, and determines the value of the matrix element corresponding to each node according to the values of its node initial feature vector.

Step S210, obtaining feature input sample data based on the user feature map, the incidence matrix and the node initial feature vectors.

Specifically, the computer device divides the user feature map into sub-maps according to the user identification nodes and assigns a weight value to each sub-map to obtain a diagonal weight matrix for the whole user feature map. It then calculates the node degree matrix and the sub-map degree matrix according to the weight values of the sub-maps, and finally uses the diagonal weight matrix, the incidence matrix, the node degree matrix and the sub-map degree matrix of the whole user feature map as the input sample matrices of the model.

Step S212, training based on the feature input sample data to obtain an account state model, wherein the account state model is used for evaluating the operation state of the target account.

The account state model is used to evaluate the operation state of the target account, for example to determine whether the target account is at risk. The account state model may be, but is not limited to, any of various deep learning or neural network models, such as a wavelet neural network, a convolutional neural network or a recurrent neural network.

Specifically, the computer device inputs the feature input sample data determined in the previous step into the account state model for training, and stops training when the error between the model output and the standard output sample data is less than or equal to a preset error threshold, or when the number of iterations exceeds a preset number of iterations.
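
The two stopping conditions of step S212 can be illustrated with a small, model-independent loop. The sketch below is an assumption-laden toy: `train_until_converged` and `make_toy_model` are hypothetical names, and a one-parameter model fitted by gradient descent stands in for the account state model. The iteration cap is implemented here as "stop once the budget is used up".

```python
def train_until_converged(step_fn, error_fn, error_threshold, max_iterations):
    """Run step_fn until the error is small enough or the budget is spent.

    Returns (iterations_run, final_error).
    """
    iterations = 0
    error = error_fn()
    while error > error_threshold and iterations < max_iterations:
        step_fn()
        iterations += 1
        error = error_fn()
    return iterations, error


def make_toy_model(target=3.0, learning_rate=0.2):
    """Hypothetical one-parameter model; stands in for the real network."""
    state = {"theta": 0.0}

    def step():
        gradient = 2.0 * (state["theta"] - target)
        state["theta"] -= learning_rate * gradient

    def error():
        return (state["theta"] - target) ** 2

    return step, error


# Stops because the error falls to or below the threshold.
step, error = make_toy_model()
iterations, final_error = train_until_converged(step, error, 1e-3, 100)

# Stops because the iteration budget of 3 is exhausted first.
step2, error2 = make_toy_model()
capped_iterations, capped_error = train_until_converged(step2, error2, 1e-3, 3)
```

Keeping the loop separate from the model makes it easy to see which of the two conditions ended training.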

In the embodiment, the initial values of the model training parameters are involved in the process of generating the initial feature vectors of the nodes, so that new training parameters are continuously iterated along with the iteration of the model, the feature vectors of the nodes are regenerated according to the new parameters generated by each iteration, the convergence of the model is accelerated, and the incidence matrix is obtained by the fusion of the initial feature vectors of the nodes.

In one embodiment, as shown in fig. 3, generating a corresponding node initial feature vector based on the node attribute feature of each node in the user feature map and the initial value of the model training parameter includes:

Step S302, acquiring a nonlinear activation function.

The nonlinear activation function, for example a sigmoid function, is a utility function used to map the node attribute features in the user feature map into a space of the same dimension. The node attribute features are unique, non-repeating feature vectors that are set manually for each node attribute and may differ in dimension.

Step S304, determining the original node features of each node according to the node attributes of each node.

Specifically, the computer device sets the original node features of each node from its node attributes according to a preset empirical rule. The original node features of nodes of the same type have the same dimension, while those of nodes of different types may have different dimensions.

Step S306, generating the node initial feature vector corresponding to each node based on the nonlinear activation function, the original node features and the initial values of the model training parameters.

Specifically, the computer device fuses the original node features with the model training parameters to generate a fusion factor, and then calculates the corresponding node initial feature vector by using the fusion factor as the input of the nonlinear activation function.

For example, for a node $v_i \in V_{card}$ of a particular attribute type, formula 1 transforms the original node feature $x_i^{card}$ into a new node initial feature vector $h_i$ of uniform dimension $1 \times d_N$:

$$h_i = \sigma\left(x_i^{card} W_{card} + b_{card}\right) \quad (1)$$

where $\sigma$ is the nonlinear activation function, $V_{card}$ is the set of all such nodes in the user feature map, $x_i^{card}$ is a vector of dimension $1 \times d_{card}$, and $W_{card}$ and $b_{card}$ are parameters that need to be learned during the model training process.
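
A minimal sketch of this projection is given below. It assumes that the fusion of the original node feature with the trainable parameters is the affine map x W + b and that the activation is a sigmoid; the function names, the random initialisation and the sample features are all hypothetical. Each node type has its own W and b, so raw features of different dimensions end up in the same 1 x d_N space.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_params(d_in, d_out, seed=0):
    """Hypothetical initial values for one node type: W is d_in x d_out, b is 1 x d_out."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(d_out)] for _ in range(d_in)]
    bias = [0.0] * d_out
    return weights, bias


def project(x, weights, bias):
    """Map a 1 x d_in raw feature to a 1 x d_out vector as sigmoid(x W + b)."""
    return [
        sigmoid(sum(x[r] * weights[r][c] for r in range(len(x))) + bias[c])
        for c in range(len(bias))
    ]


D_N = 4                      # shared output dimension
card_x = [1.0, 0.0, 3.0]     # hypothetical 1 x 3 bank card feature
device_x = [0.5, 2.0]        # hypothetical 1 x 2 device feature
W_card, b_card = init_params(3, D_N, seed=1)
W_dev, b_dev = init_params(2, D_N, seed=2)
h_card = project(card_x, W_card, b_card)
h_device = project(device_x, W_dev, b_dev)
```

During training, W and b would be updated at each iteration and `project` called again, which corresponds to regenerating the node initial feature vectors from the new parameters.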

In this embodiment, a nonlinear activation function is acquired, the original node features of each node are determined according to its node attributes, the original node features and the model training parameters are fused to generate a fusion factor, and the fusion factor is used as the input of the nonlinear activation function to calculate node initial feature vectors of the same dimension for all node attributes. As the model iterates, the generated node initial feature vectors reflect the attributes of the nodes increasingly well, which improves the accuracy of the generated node initial feature vectors.

In one embodiment, as shown in fig. 4, generating a corresponding node initial feature vector based on the node attribute feature of each node in the user feature map and the initial value of the model training parameter includes:

Step S402, when the node is a contact information node, acquiring the node initial feature vectors of the adjacent nodes of the contact information node, wherein the contact information node is a node whose node attribute is contact information.

The contact information node is a node whose node attribute is contact information and contains the contact information of the corresponding user, for example the user's contact telephone number or mailbox. When several users share the same contact information, those users are connected to the same contact information node in the user feature map. A contact information node is necessarily associated with the identity feature node of the corresponding user; that is, the identity feature node of any sub-map in the user feature map is connected to a contact information node, and the user identification node in the same sub-map is connected to the corresponding contact information node. The nodes directly connected to a contact information node in the user feature map are its adjacent nodes.

Step S404, determining the node initial feature vector corresponding to the contact information node according to the mean of the node initial feature vectors of the adjacent nodes.

Specifically, the computer device averages the node initial feature vectors of the adjacent nodes of the current contact information node obtained in the previous step, and takes the mean vector as the node initial feature vector of the current contact information node.

For example, the computer device calculates the node initial feature vector $h_c$ of the current contact information node according to formula 2:

$$h_c = \frac{1}{k} \sum_{j=1}^{k} h_j \quad (2)$$

where $k$ is the number of nodes adjacent to the current contact information node and $h_j$ is the node initial feature vector of the $j$-th adjacent node.
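
The averaging step can be written in a few lines. In this sketch the function name `mean_vector` and the sample vectors are hypothetical; the only operation performed is the element-wise mean over the k adjacent nodes described above.

```python
def mean_vector(vectors):
    """Element-wise mean of k equally sized vectors."""
    if not vectors:
        raise ValueError("a contact node needs at least one adjacent node")
    k = len(vectors)
    dim = len(vectors[0])
    return [sum(vec[d] for vec in vectors) / k for d in range(dim)]


# Hypothetical node initial feature vectors of two adjacent nodes.
neighbours = [[0.2, 0.4, 0.6], [0.4, 0.8, 0.0]]
h_contact = mean_vector(neighbours)  # approximately [0.3, 0.6, 0.3]
```

Because the result depends only on the neighbours, a contact information node needs no raw feature or projection parameters of its own.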

In this embodiment, when the node is a contact information node, the node initial feature vectors of its adjacent nodes are acquired and the node initial feature vector of the contact information node is determined as their mean, which simplifies the generation of the node initial feature vector for contact information nodes and improves the efficiency of calculating node initial feature vectors.

In one embodiment, as shown in fig. 5, obtaining the incidence matrix of each node based on the initial feature vector fusion of the nodes includes:

Step S502, acquiring the user identification nodes among the nodes.

Step S504, obtaining a user initial feature vector according to the account channel information, geographic information and application state information of the user identification node.

Specifically, the computer device classifies the account channel information, geographic information and application state information of each user identification node separately: it defines several common account creation channels, divides the geographic information into large regions or provinces and cities, and divides the application state information into labels such as to be applied, applied, and approved after review. Different labels within each attribute are assigned different values according to a preset rule, which yields the user initial feature vector corresponding to each user identification node.
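
The label-to-value assignment in step S504 amounts to a lookup per attribute. The sketch below is illustrative only: the label names, the numeric codes and the fallback value for unknown labels are assumptions, since the text only states that different labels receive different values under a preset rule.

```python
# Hypothetical preset rules: one code table per attribute of the user node.
CHANNEL_CODES = {"app": 1.0, "web": 2.0, "offline": 3.0}
REGION_CODES = {"north": 1.0, "east": 2.0, "south": 3.0, "west": 4.0}
STATUS_CODES = {"to_be_applied": 1.0, "applied": 2.0, "approved": 3.0}
UNKNOWN = 0.0  # assumed fallback for labels outside the preset rule


def user_initial_vector(channel, region, status):
    """Encode the three attributes of a user identification node as a vector."""
    return [
        CHANNEL_CODES.get(channel, UNKNOWN),
        REGION_CODES.get(region, UNKNOWN),
        STATUS_CODES.get(status, UNKNOWN),
    ]


u_known = user_initial_vector("web", "south", "approved")
u_partial = user_initial_vector("kiosk", "east", "applied")
```

A one-hot encoding per attribute would be a natural alternative when the labels have no meaningful order; the source does not say which scheme is used.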

Step S506, obtaining the incidence matrix based on the fusion of the user initial feature vector and the node initial feature vectors corresponding to the other nodes.

Specifically, the computer device calculates each matrix element of the incidence matrix according to formula 3, and then assembles the incidence matrix from the matrix elements.

In formula 3, $\alpha_{ij}$ denotes the matrix element in the $i$-th row and $j$-th column of the incidence matrix. Its inputs are the user initial feature vector and the node initial feature vectors corresponding to the other nodes; tanh is the hyperbolic tangent function; $N_j$ is the number of all nodes contained in the $j$-th sub-map; and $v$, $W_{att}$ and $U$ are parameters learned during model training.
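
The symbols listed for formula 3 (a tanh, a vector v, matrices W_att and U, and a node count per sub-map) match the shape of additive attention with a softmax over the nodes of one sub-map. The sketch below assumes exactly that form. It illustrates how such importance weights could be computed and is not a statement of the formula used in the application; all parameter values and the function name `attention_weights` are hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def attention_weights(user_vec, node_vecs, v, W_att, U):
    """Assumed form: softmax over nodes of v . tanh(W_att h + U u)."""
    user_part = matvec(U, user_vec)
    scores = []
    for h in node_vecs:
        hidden = [math.tanh(a + b) for a, b in zip(matvec(W_att, h), user_part)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical parameters: hidden size 2, node vectors of size 2, user vector of size 3.
v = [0.5, -0.3]
W_att = [[0.1, 0.2], [0.0, -0.1]]
U = [[0.3, 0.0, 0.1], [0.2, 0.1, 0.0]]
user_vec = [1.0, 2.0, 3.0]
node_vecs = [[0.2, 0.4], [0.9, 0.1], [0.2, 0.4]]

# One column of the incidence matrix: importance of each node in this sub-map.
alpha_column = attention_weights(user_vec, node_vecs, v, W_att, U)
```

Under this assumption the weights within a sub-map are positive and sum to one, so a larger entry directly indicates a more important node, and the entries change whenever v, W_att or U are updated during training.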

In this embodiment, the user identification nodes among the nodes are acquired, and their account channel information, geographic information and application state information are classified separately: several common account creation channels are defined, the geographic information is divided into large regions or provinces and cities, and the application state information is divided into labels such as to be applied, applied, and approved after review. Different labels within each attribute are assigned different values according to a preset rule, which yields the user initial feature vector corresponding to each user identification node. Each matrix element of the incidence matrix is then determined from the model training parameters, the user initial feature vectors of the user identification nodes and the node initial feature vectors of the other nodes, and the incidence matrix is assembled from these elements. The incidence matrix is therefore updated dynamically, and suitable matrix elements, that is, the importance of each node, are obtained automatically through iteration, which improves the accuracy of the constructed incidence matrix.

In one embodiment, as shown in fig. 6, obtaining feature input sample data based on the user feature map, the association matrix, and the node initial feature vector includes:

Step S602, determining the sub-maps in the user feature map according to the user identification nodes among the nodes.

Specifically, the computer device takes the topological structure formed by the six types of nodes corresponding to the same user in the historical user feature data, namely user identification, identity features, unit features, bank card features, device features and contact information, as one sub-map; starting from a user identification node, it determines the other five nodes corresponding to that user identification node to form the sub-map.

Step S604, obtaining a user feature map weight matrix according to the sub-maps.

Specifically, the computer device specifies the weight value of each sub-map according to a preset empirical rule, and the weight values of the sub-maps in the user feature map form the user feature map weight matrix, which is a diagonal matrix.

Step S606, fusing the user feature map weight matrix and the incidence matrix to obtain a node degree matrix.

Specifically, the computer device calculates the matrix element $D_{v,ii}$ in the i-th row and i-th column of the node degree matrix $D_v$ according to the following formula 4:

$D_{v,ii} = \sum_{\epsilon=1}^{M} W_{\epsilon\epsilon} H_{i\epsilon}$ (Formula 4)

where M is the number of sub-maps in the user characteristic map, $W_{\epsilon\epsilon}$ is the matrix element in the ε-th row and ε-th column of the user characteristic map weight matrix, and $H_{i\epsilon}$ is the matrix element in the i-th row and ε-th column of the incidence matrix H.

Step S608, a sub-map degree matrix is obtained based on the sub-map nodes and the incidence matrix in the sub-map.

Specifically, the computer device calculates the matrix element $D_{e,\epsilon\epsilon}$ in the ε-th row and ε-th column of the sub-map degree matrix $D_e$ according to the following formula 5, and then determines the sub-map degree matrix $D_e$ from these elements:

$D_{e,\epsilon\epsilon} = \sum_{i=1}^{N} H_{i\epsilon}$ (Formula 5)

where $D_{e,\epsilon\epsilon}$ is the matrix element in the ε-th row and ε-th column of the sub-map degree matrix $D_e$, N is the number of nodes in the entire user characteristic map, and $H_{i\epsilon}$ is the matrix element in the i-th row and ε-th column of the incidence matrix H.

Step S610, obtaining feature input sample data based on the user characteristic map weight matrix, the incidence matrix, the node degree matrix, the sub-map degree matrix and the node initial feature vector.

In the embodiment, a sub-map in a user characteristic map is determined according to user identification nodes in each node, a user characteristic map weight matrix is obtained according to the sub-map, a node degree matrix is obtained based on fusion of the user characteristic map weight matrix and an incidence matrix, a sub-map degree matrix is obtained based on sub-map nodes and the incidence matrix in the sub-map, and characteristic input sample data is obtained based on the user characteristic map weight matrix, the incidence matrix, the node degree matrix, the sub-map degree matrix and a node initial characteristic vector, so that nodes and a topological structure of the user characteristic map are fully mined, and the accuracy of constructing the characteristic input sample data according to the user characteristic map is improved.
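The degree matrices of steps S606 and S608 can be sketched as follows. This is a minimal illustration under the definitions given for formulas 4 and 5, storing only the diagonals; the toy incidence matrix and weight values are made up for the example.

```python
from typing import List

Matrix = List[List[float]]


def node_degree_diagonal(H: Matrix, w: List[float]) -> List[float]:
    """Diagonal of D_v: for node i, the sum over sub-maps e of W_ee * H_ie."""
    return [sum(w_e * h_ie for w_e, h_ie in zip(w, row)) for row in H]


def submap_degree_diagonal(H: Matrix) -> List[float]:
    """Diagonal of D_e: for sub-map e, the sum over nodes i of H_ie."""
    return [sum(column) for column in zip(*H)]


# Toy incidence matrix: 4 nodes (rows) and 2 sub-maps (columns).
H = [
    [1.0, 0.0],
    [1.0, 1.0],
    [0.0, 1.0],
    [1.0, 0.0],
]
w = [0.5, 2.0]  # diagonal of the sub-map weight matrix W (illustrative values)

d_v = node_degree_diagonal(H, w)
d_e = submap_degree_diagonal(H)
```

Keeping only the diagonals avoids building full N×N and M×M matrices, which matters when N is much larger than M.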

In one embodiment, as shown in fig. 7, training to obtain an account status model based on feature input sample data includes:

step S702, carrying out normalization processing on the characteristic input sample data to obtain standard input sample data;

step S704, inputting standard input sample data into the graph convolution neural network model for training to obtain an account state model.

Specifically, the computer device fuses the standard input sample data with the convolution kernels of the graph convolutional neural network to obtain the node feature propagation rule of hypergraph convolution:

where $X^{(l)}$ and $X^{(l+1)}$ denote the node features at the l-th and (l+1)-th layers; W is the user characteristic map weight matrix determined in step S604; $\Theta^{(l)}$ denotes the convolution kernel of the l-th layer, which transforms the dimension of the node features and whose weights are trainable; $H^T$ denotes the transpose of H, the incidence matrix determined in step S506; and $D_v$ (the node degree matrix) and $D_e$ (the sub-map degree matrix) perform normalization. By the definition of hypergraph convolution, this layer-by-layer propagation rule realizes a node-edge-node transformation: first, the node features pass through the learnable $\Theta^{(l)}$ to extract node features of a new dimension; then, after multiplication by $H^T$, the sum of the features of the nodes contained in each hyperedge, that is, the hyperedge feature, is calculated. Finally, multiplication by H aggregates the features of all hyperedges in which a node lies, forming node features with high-order correlation.

In the embodiment, the account state model is obtained by inputting the characteristic input sample data into the graph convolution neural network model for training, so that the high-order relation and the local clustering structure between the nodes can be effectively captured, and the efficient information propagation between the nodes is realized.
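The node-edge-node propagation described above can be sketched in plain Python. The sketch assumes the commonly used hypergraph convolution form $X^{(l+1)} = \sigma(D_v^{-1/2} H W D_e^{-1} H^T D_v^{-1/2} X^{(l)} \Theta^{(l)})$ with a sigmoid activation; this form matches the description but is an assumption here, and the function and variable names are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


def hypergraph_conv(X: Matrix, H: Matrix, w: List[float], theta: Matrix) -> Matrix:
    """One layer; assumes every node lies in a hyperedge and no hyperedge is empty."""
    d_v = [sum(w_e * h for w_e, h in zip(w, row)) for row in H]  # node degrees
    d_e = [sum(col) for col in zip(*H)]                          # hyperedge degrees
    # Node step: dimension transform by theta, then D_v^{-1/2} scaling.
    Z = matmul(X, theta)
    Z = [[z / math.sqrt(d) for z in row] for row, d in zip(Z, d_v)]
    # Edge step: sum member-node features per hyperedge, scale by W D_e^{-1}.
    E = matmul(transpose(H), Z)
    E = [[w_e * e / d for e in row] for row, w_e, d in zip(E, w, d_e)]
    # Node step: aggregate the hyperedges each node lies in, scale, activate.
    Y = matmul(H, E)
    Y = [[y / math.sqrt(d) for y in row] for row, d in zip(Y, d_v)]
    return [[1.0 / (1.0 + math.exp(-y)) for y in row] for row in Y]


# Toy case: two nodes sharing one hyperedge of weight 1.
X = [[1.0], [3.0]]
H = [[1.0], [1.0]]
w = [1.0]
theta = [[1.0]]
out = hypergraph_conv(X, H, w, theta)
```

Because both toy nodes lie in the same single hyperedge, they receive the same aggregated feature, which illustrates how information is shared through a hyperedge.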

In one embodiment, as shown in fig. 8, inputting standard input sample data into a graph convolutional neural network model for training to obtain an account status model, including:

step S802, inputting standard input sample data into the graph convolution neural network model to obtain the actual output value of the model.

The initial values of the training parameters in the untrained graph convolutional neural network model may be specified randomly, set according to experience, or determined by an intelligent optimization algorithm such as a particle swarm algorithm or a genetic algorithm; no specific limitation is imposed here.

Step S804, classifying the actual output value of the model according to a preset classification algorithm to obtain an actual account state estimation result corresponding to the input sample data.

Specifically, the computer device classifies user accounts into two classes, accounts with overdue risk and accounts without overdue risk. An XGBoost classifier is adopted: the actual output values of the graph convolutional neural network are input into the XGBoost classifier, and the actual account state estimation result $Y_p$ is calculated according to the following formula 7:

$Y_p = \mathrm{XGBoost}(X_K)$ (Formula 7)

where $X_K$ denotes the K actual output values of the graph convolutional neural network determined according to the preceding steps.

Step S806, when the error between the actual account state estimation result and the standard account state estimation result preset in the standard input sample data is greater than a preset error threshold, returning to the step of inputting the standard input sample data into the graph convolutional neural network model to obtain the actual output value of the model.

Specifically, the computer device compares the actual account state estimation result obtained in the previous step with the standard account state estimation result preset in the standard input sample data to obtain an estimation result error value, and compares this error value with the preset error threshold. When the error value is greater than the preset error threshold, the computer device returns to the step of inputting the standard input sample data into the graph convolutional neural network model to obtain the actual output value of the model, and continues training. The computer device calculates the estimation result error value according to the following formula 8:

wherein Loss is the error value of the estimation result,

representing the actual account state estimation result of the i-th node, and $y^i$ being the standard account state estimation result preset in the standard input sample data.

Step S808, when the error between the actual estimation result of the account state and the standard estimation result of the account state preset in the standard input sample data is less than or equal to a preset error threshold, finishing training and obtaining an account state model.

In the embodiment, standard input sample data is input into the graph convolution neural network model to obtain a model actual output value, the model actual output value is classified according to a preset classification algorithm to obtain an account state actual estimation result corresponding to the input sample data, when an error between the account state actual estimation result and an account state standard estimation result preset in the standard input sample data is larger than a preset error threshold value, the step of inputting the standard input sample data into the graph convolution neural network model to obtain the model actual output value is returned, and when the error between the account state actual estimation result and the account state standard estimation result preset in the standard input sample data is judged to be smaller than or equal to the preset error threshold value, training is finished to obtain the account state model, and through continuous iteration, the training of the model is finished until an iteration stop condition is met, so that the accuracy of model training is improved.
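The threshold-controlled loop of steps S802 to S808 can be sketched as follows. To keep the example self-contained, a plain logistic-regression stand-in replaces both the graph convolutional neural network and the XGBoost classifier, and the mean binary cross-entropy is assumed as the error value; the function names, learning rate and round limit are illustrative assumptions.

```python
import math
from typing import List, Tuple


def predict(weights: List[float], bias: float, x: List[float]) -> float:
    """Stand-in model: logistic score in (0, 1) for one sample."""
    z = sum(w * v for w, v in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def mean_error(probs: List[float], labels: List[int]) -> float:
    """Mean binary cross-entropy between estimates and preset labels."""
    eps = 1e-12  # guards against log(0)
    total = sum(
        y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps)
        for p, y in zip(probs, labels)
    )
    return -total / len(labels)


def train_until_threshold(
    samples: List[List[float]],
    labels: List[int],
    error_threshold: float,
    learning_rate: float = 0.5,
    max_rounds: int = 10000,
) -> Tuple[List[float], float, float]:
    weights = [0.0] * len(samples[0])
    bias = 0.0
    error = float("inf")
    n = len(labels)
    for _ in range(max_rounds):
        # S802/S804: compute the model output and the estimation result.
        probs = [predict(weights, bias, x) for x in samples]
        error = mean_error(probs, labels)
        # S808: stop once the error is within the preset threshold.
        if error <= error_threshold:
            break
        # S806: error too large, so update parameters and go back to S802.
        residuals = [p - y for p, y in zip(probs, labels)]
        for j in range(len(weights)):
            grad = sum(r * x[j] for r, x in zip(residuals, samples)) / n
            weights[j] -= learning_rate * grad
        bias -= learning_rate * sum(residuals) / n
    return weights, bias, error


samples = [[0.0], [1.0], [2.0], [3.0]]
labels = [0, 0, 1, 1]
weights, bias, final_error = train_until_threshold(samples, labels, error_threshold=0.3)
```

The `max_rounds` cap is an addition of this sketch so that the loop terminates even if the threshold is never reached.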

The present application also provides an application scenario to which the above account state model construction method is applied, namely predicting the overdue risk of user accounts. Specifically, the method is applied in this scenario as follows:

The computer device obtains the user's personal data, including date of birth, educational level, associated enterprises, contacts, inviter information and the like. This data is the required information that the user fills in when applying through the lending client. When the user operates on the client, the system also assigns the user a corresponding ID number; these mutually distinct ID numbers serve as the unique identifiers of users in the subsequent map construction. In addition, the server can obtain information about the device used when the user logs in to the client, such as the device identification code, device brand and physical address. Finally, after desensitization and the handling of missing and abnormal values, the user data can be divided into six categories: user characteristics, identity characteristics, unit characteristics, bank card characteristics, equipment characteristics and contact ways. The user data acquired in the previous steps is stored in a graph database whose data structure comprises a node set, an edge set and node attributes, and nodes with the same characteristic field are connected with one another. The user ID, identity, unit, bank card, device and contact way serve as nodes; as shown in fig. 11, all nodes associated with a user are also regarded as one hyperedge belonging to that user, so the graph is also a heterogeneous graph.
The user ID node has node attributes such as account channel, geographic position and application state; the identity node has node attributes such as identity state, educational level and credit investigation condition; the unit node has node attributes such as enterprise type, geographic position and registered capital; the bank card node has node attributes such as account type and account opening city; and the device node has node attributes such as device identification code and device system.

Because there are relationships such as inviters and contacts among users, a plurality of users belong to the same unit, and a plurality of users use the same device or contact way, different user nodes can be associated through other nodes. As shown in fig. 12, user a and user B work in the same unit, and user a and user C share the same device and contact means.
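How users become associated through shared nodes can be illustrated with a short sketch. The node identifiers below are hypothetical and simply mirror the example of users A, B and C.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def shared_node_links(user_nodes: Dict[str, Set[str]]) -> Dict[Tuple[str, str], Set[str]]:
    """Return, for each pair of users, the non-user nodes they have in common."""
    links: Dict[Tuple[str, str], Set[str]] = {}
    for a, b in combinations(sorted(user_nodes), 2):
        common = user_nodes[a] & user_nodes[b]
        if common:
            links[(a, b)] = common
    return links


# Hypothetical identifiers: A and B share a unit; A and C share a device and a contact way.
user_nodes = {
    "A": {"unit_1", "device_1", "contact_1"},
    "B": {"unit_1", "device_2", "contact_2"},
    "C": {"unit_2", "device_1", "contact_1"},
}
links = shared_node_links(user_nodes)
```

Users B and C share no node directly, but both are reachable from A, which is the kind of indirect association the map is meant to expose.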

The hypergraph composed of all users and their related nodes is denoted G = (V, E) and contains N nodes and M hyperedges. Evidently, the number of user nodes is also M, and N is much larger than M. The input part of the model is discussed next. Each hyperedge $e_j \in E$ is assigned a weight

Therefore, a trainable hyperedge weight diagonal matrix $W \in R^{M \times M}$ can be obtained. By the definition of a hypergraph, the hypergraph can also be represented by the incidence matrix $H \in R^{N \times M}$: if node $v_i$ belongs to hyperedge $e_j$, then $H_{ij} = 1$; otherwise $H_{ij} = 0$.
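The binary incidence matrix defined above can be built as in the following sketch, where each user contributes one hyperedge containing the user node and its associated nodes. The identifiers and the dictionary layout are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def build_incidence_matrix(
    hyperedges: Dict[str, List[str]]
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Return node ids (rows), hyperedge ids (columns) and the N x M matrix H."""
    node_ids = sorted({node for members in hyperedges.values() for node in members})
    row_of = {node: i for i, node in enumerate(node_ids)}
    edge_ids = list(hyperedges)
    H = [[0] * len(edge_ids) for _ in node_ids]
    for j, edge in enumerate(edge_ids):
        for node in hyperedges[edge]:
            H[row_of[node]][j] = 1  # node belongs to this hyperedge
    return node_ids, edge_ids, H


# One hyperedge per user (hypothetical identifiers).
hyperedges = {
    "user_A": ["user_A", "unit_1", "device_1", "contact_1"],
    "user_B": ["user_B", "unit_1", "device_2"],
    "user_C": ["user_C", "device_1", "contact_1"],
}
node_ids, edge_ids, H = build_incidence_matrix(hyperedges)
```

A row with more than one nonzero entry, such as the row of `unit_1`, marks a node shared by several users.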

The degree of each node, that is, the sum of the weights of all hyperedges connected to it, is calculated to obtain the node degree matrix $D_v \in R^{N \times N}$:

$D_{v,ii} = \sum_{j=1}^{M} W_{jj} H_{ij}$

The degree of each hyperedge, that is, the number of nodes it contains, is calculated to obtain the hyperedge degree matrix $D_e \in R^{M \times M}$:

$D_{e,jj} = \sum_{i=1}^{N} H_{ij}$

After the node attributes are normalized and feature-processed, a feature vector is obtained for each node; these feature vectors, together with the hyperedge weight diagonal matrix W, the incidence matrix H, the node degree matrix $D_v$ and the hyperedge degree matrix $D_e$ of the hypergraph, serve as the input of the model. Because different types of nodes have features of different dimensions, these different feature vectors need to be mapped into a common $d_N$-dimensional space by learning a function. For example, for a bank card node $v_i \in V_{card}$, the original node feature

is transformed into a new initial feature vector

where σ is a nonlinear activation function; unless otherwise specified, the nonlinear activation functions used in the present invention are sigmoid functions. The transformation function can be rewritten as:

It can be seen that $W_{card}$ and $b_{card}$ are both parameters to be learned during model training. The features of device, identity, user and unit nodes are processed in the same way as those of bank card nodes. Note that a "contact way" node itself has no node attributes, so its initial feature vector is the average of the initial feature vectors of the k nodes connected to it:

At this point, the initial feature vectors of all nodes are represented as
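The type-specific mapping and the averaging for contact way nodes can be sketched as follows. The sketch assumes the transformation has the form σ(Wx + b) with a sigmoid σ, as the description of $W_{card}$ and $b_{card}$ suggests; all parameter values and dimensions are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def project(x: Vector, W: Matrix, b: Vector) -> Vector:
    """Type-specific map sigma(W x + b) into the common feature space."""
    return [sigmoid(sum(w * v for w, v in zip(row, x)) + b_k) for row, b_k in zip(W, b)]


def contact_vector(neighbor_vectors: List[Vector]) -> Vector:
    """Average the initial feature vectors of the k connected nodes."""
    k = len(neighbor_vectors)
    return [sum(column) / k for column in zip(*neighbor_vectors)]


# Hypothetical parameters: 3 raw bank card features and 2 raw device features,
# both mapped into a common 2-dimensional space (d_N = 2).
W_card = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1]]
b_card = [0.0, 0.1]
W_device = [[0.4, 0.4], [-0.3, 0.2]]
b_device = [0.2, 0.0]

card_vec = project([1.0, 2.0, 3.0], W_card, b_card)
device_vec = project([4.0, 5.0], W_device, b_device)
contact_vec = contact_vector([card_vec, device_vec])
```

Each node type keeps its own weight matrix and bias, so raw features of different lengths end up in vectors of the same length.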

In graph structures arising in practical applications, the relationships between nodes are usually not merely pairwise but of higher order. To capture the high-order relations and the local clustering structure between nodes, the present invention uses hypergraph convolution to propagate information efficiently between nodes. We define a hypergraph convolutional neural network with L layers whose first-layer node features are $X^{(0)}$; the layer-by-layer propagation rule for the node features is defined as follows:

where $X^{(l)}$ and $X^{(l+1)}$ denote the node features at the l-th and (l+1)-th layers; $\Theta^{(l)}$ denotes the convolution kernel of the l-th layer, which transforms the dimension of the node features and whose weights are trainable; $H^T$ denotes the transpose of the incidence matrix H; and the node degree matrix $D_v$ and the hyperedge degree matrix $D_e$ perform normalization. By the definition of hypergraph convolution, this layer-by-layer propagation rule realizes a node-edge-node transformation: first, the node features pass through the learnable $\Theta^{(l)}$ to extract node features of a new dimension; then, after multiplication by $H^T$, the sum of the features of the nodes contained in each hyperedge, that is, the hyperedge feature, is calculated. Finally, multiplication by H aggregates the features of all hyperedges in which a node lies, forming node features with high-order correlation.

However, in the original incidence matrix H, all nodes within the same hyperedge influence the hyperedge feature equally, which does not suit the hypergraph in a credit scenario, because different types of nodes have different features and also different importance. According to manual review experience, bank card nodes carry more financial attributes, so the features of bank card nodes are more important than those of device nodes when judging whether a user has overdue risk. An attention mechanism can assign different weights to different nodes when features are aggregated. The present invention adds an attention mechanism to the calculation of the hyperedge feature and establishes a new incidence matrix $H_{new}$: if node $v_i$ belongs to hyperedge $e_j$, then $H_{ij} = \alpha_{ij}$; otherwise $H_{ij} = 0$. Here $\alpha_{ij}$ is the attention coefficient of node $v_i$ in hyperedge $e_j$. We define the original feature vector of hyperedge $e_j$

to be equal to the feature of the user node in that hyperedge

$\alpha_{ij}$ is calculated as follows:

where v, $W_{att}$ and U are trainable parameters, tanh is the hyperbolic tangent function, and $N_j$ denotes all nodes contained in hyperedge $e_j$. The node feature propagation rule in hypergraph convolution is thus updated to the following formula:
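The attention coefficients within one hyperedge can be sketched as follows. The exact formula is not reproduced in this text, so the sketch assumes a common additive-attention form, a score $v^T \tanh(W_{att} x_i + U e_j)$ normalized by a softmax over the nodes in $N_j$; the parameter values are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(A: Matrix, x: Vector) -> Vector:
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def attention_coefficients(
    node_feats: List[Vector], edge_feat: Vector, v: Vector, W_att: Matrix, U: Matrix
) -> Vector:
    """Assumed additive attention over the nodes of one hyperedge."""
    u_e = matvec(U, edge_feat)
    scores = []
    for x in node_feats:
        hidden = [math.tanh(a + b) for a, b in zip(matvec(W_att, x), u_e)]
        scores.append(sum(v_k * h_k for v_k, h_k in zip(v, hidden)))
    # Softmax over the nodes in N_j (shifted by the max score for stability).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy hyperedge with three nodes; the hyperedge feature equals the user node feature.
node_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
edge_feat = [1.0, 0.0]
v = [1.0, -1.0]
W_att = [[1.0, 0.0], [0.0, 1.0]]
U = [[0.5, 0.0], [0.0, 0.5]]
alphas = attention_coefficients(node_feats, edge_feat, v, W_att, U)
```

The resulting coefficients would replace the ones in the column of H that corresponds to this hyperedge.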

Finally, the node representation vectors of the last layer of the hypergraph convolutional neural network are obtained as

In the overdue risk prediction task, users need to be classified into two classes, customers with overdue risk and customers without overdue risk, which is essentially a binary classification problem. The model adopts an XGBoost classifier and takes as the classifier input the vectors $X_K$ of the K user nodes with known overdue or non-overdue labels, taken from the $X_{final}$ obtained in the previous step:

$Y_p = \mathrm{XGBoost}(X_K)$ (Formula 17)

After the predicted class probabilities $Y_p \in R^{K \times 1}$ of the users are obtained, the cross-entropy loss function commonly used for binary classification problems is taken as the objective function. The training goal of the model is to make the objective function as small as possible. The objective function Loss is shown below, where $y^i$ is the true overdue label of the i-th node, equal to 1 if the node is overdue and 0 otherwise;

representing the probability that the ith node is predicted to be overdue.
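For reference, the binary cross-entropy commonly used for such two-class problems can be written as below. The symbol $\hat{y}^i$ for the predicted overdue probability and the plain sum over the K labeled nodes are assumptions of this sketch; an averaged variant differs only by the constant factor 1/K.

```latex
\mathrm{Loss} = -\sum_{i=1}^{K} \left[ y^{i} \log \hat{y}^{i} + \left(1 - y^{i}\right) \log\left(1 - \hat{y}^{i}\right) \right]
```

Each term is zero when the predicted probability matches the label exactly and grows without bound as the prediction approaches the wrong class.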

After repeated training, once the loss function reaches its optimal value, a model with optimal parameters is obtained; current user data is then input into the model to obtain a prediction of whether the current user has overdue risk, as shown in fig. 13.

It should be understood that, although the steps in the flowcharts of the above embodiments are displayed in sequence as indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; the execution order of these sub-steps or stages is not necessarily sequential either, and they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

In one embodiment, as shown in fig. 9, there is provided an account status model building apparatus, which may be a part of a computer device using a software module or a hardware module, or a combination of the two modules, and specifically includes: a data acquisition module 902, a sample data generation module 904, and a training module 906, wherein:

a data obtaining module 902, configured to obtain historical user feature data;

a sample data generating module 904, configured to construct a user feature map according to historical user feature data; generating a corresponding node initial characteristic vector based on the node attribute characteristics of each node in the user characteristic map and the initial value of the model training parameter, wherein the initial value of the model training parameter is the initial value of the parameter to be trained in the account state model; obtaining an incidence matrix based on the fusion of the initial feature vectors of the nodes, wherein the incidence matrix is used for representing the incidence relation of each node and the importance of each node; obtaining feature input sample data based on the user feature map, the incidence matrix and the node initial feature vector;

a training module 906, configured to train to obtain an account status model based on the feature input sample data, where the account status model is used to evaluate an operation status of the target account.

The account state model building device builds the user characteristic map according to the historical user characteristic data by acquiring the historical user characteristic data, generates the corresponding node initial characteristic vector based on the node attribute characteristics of each node in the user characteristic map and the initial values of the model training parameters, wherein the initial values of the model training parameters are the initial values of the parameters to be trained in the account state model, obtains the incidence matrix based on the fusion of the node initial characteristic vectors, the incidence matrix is used for representing the incidence relation of each node and the importance of each node, obtains the characteristic input sample data based on the user characteristic map, the incidence matrix and the node initial characteristic vectors, obtains the account state model based on the characteristic input sample data training, and the account state model is used for evaluating the operation state of the target account. Therefore, the initial values of the training parameters of the model are involved in the process of generating the initial characteristic vectors of the nodes, so that new training parameters are continuously iterated along with the iteration of the model, the characteristic vectors of the nodes are regenerated according to the new parameters generated by each iteration, the convergence of the model is accelerated, the incidence matrix is obtained through the fusion of the initial characteristic vectors of the nodes, the importance weights given to the nodes are dynamically adjusted according to the importance of different nodes in the training of the model, the importance of different nodes can be better mined according to a specific application scene, the trained model can better mine data characteristics, and the accuracy of the prediction of the account state model can be effectively improved.

In one embodiment, the sample data generation module 904 is further configured to obtain a non-linear activation function; determining the original node characteristics of each node according to the node attributes of each node; and generating a node initial characteristic vector corresponding to each node based on the nonlinear activation function, the original node characteristics and the model training parameter initial value.

In one embodiment, the sample data generating module 904 is further configured to, when the node is a contact manner node, obtain an initial feature vector of an adjacent node corresponding to the adjacent node of the contact manner node, where the contact manner node is a node whose node attribute is a contact manner; and determining the node initial characteristic vector corresponding to the contact mode node according to the average value of the adjacent node initial characteristic vectors.

In one embodiment, the sample data generating module 904 is further configured to obtain a user identification node in each node; obtaining a user initial characteristic vector according to account channel information, geographic information and application state information of the user identification node; and obtaining the incidence matrix based on the fusion of the initial feature vectors of the user and the initial feature vectors of the nodes corresponding to other nodes.

In one embodiment, the sample data generation module 904 is further configured to determine a sub-graph in the user feature graph according to a user identification node in each node; obtaining a user characteristic map weight matrix according to the sub-map; fusing the user characteristic map weight matrix and the incidence matrix to obtain a node degree matrix; obtaining a sub-map degree matrix based on sub-map nodes and incidence matrixes in the sub-map; and obtaining feature input sample data based on the user feature map weight matrix, the incidence matrix, the node degree matrix, the sub-graph spectral degree matrix and the node initial feature vector.

In one embodiment, the training module 906 is further configured to perform normalization processing on the feature input sample data to obtain standard input sample data; and inputting standard input sample data into the graph convolution neural network model for training to obtain an account state model.

In one embodiment, the training module 906 is further configured to input standard input sample data into the graph convolution neural network model to obtain a model actual output value; classifying the actual output value of the model according to a preset classification algorithm to obtain an actual estimation result of the account state corresponding to the input sample data; when the error between the actual account state estimation result and the standard account state estimation result preset in the standard input sample data is larger than a preset error threshold value, returning to the step of inputting the standard input sample data into the convolutional neural network model to obtain the actual output value of the model; and when the error between the actual estimation result of the account state and the standard estimation result of the account state preset in the standard input sample data is less than or equal to a preset error threshold, finishing training to obtain an account state model.

For the specific definition of the account status model building device, reference may be made to the above definition of the account status model building method, which is not described herein again. The modules in the account state model building device can be wholly or partially implemented by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.

In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 10. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless communication can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a method of account state model construction. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on a shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.

Those skilled in the art will appreciate that the architecture shown in fig. 10 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.

In one embodiment, a computer device is further provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.

In an embodiment, a computer-readable storage medium is provided, in which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.

In one embodiment, a computer program product or computer program is provided that includes computer instructions stored in a computer readable storage medium. The computer instructions are read by a processor of the computer device from a computer-readable storage medium, and the computer instructions are executed by the processor to cause the computer device to perform the steps of the above-described method embodiments.

It should be noted that, the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party.

It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the above method embodiments. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.

The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

The above embodiments express only several implementations of the present application; their description is relatively specific and detailed, but should not be construed as limiting the scope of the invention patent. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. An account state model building method, the method comprising:

acquiring historical user characteristic data;

constructing a user characteristic map according to the historical user characteristic data;

generating a corresponding node initial characteristic vector based on the node attribute characteristics of each node in the user characteristic map and an initial value of a model training parameter, wherein the initial value of the model training parameter is an initial value of a parameter to be trained in the account state model;

obtaining an incidence matrix based on the initial feature vector fusion of the nodes, wherein the incidence matrix is used for representing the incidence relation of each node and the importance of each node;

and training based on the feature input sample data to obtain an account state model, wherein the account state model is used for evaluating the operation state of the target account.

2. The method according to claim 1, wherein generating a corresponding node initial feature vector based on the node attribute features of each node in the user feature map and model training parameter initial values comprises:

acquiring a nonlinear activation function;

and generating a node initial feature vector corresponding to each node based on the nonlinear activation function, the original node feature and the model training parameter initial value.
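
As a non-limiting illustration of the step recited in claim 2, the following Python sketch applies a nonlinear activation function to a linear transform of one node's feature vector. The choice of ReLU, the bias term `b`, the function name `node_initial_vector`, and all numeric values are assumptions made for this example; the claim does not fix the activation function or the shape of the trainable parameters.

```python
def relu(z):
    """One common nonlinear activation (assumed here; the claim names none)."""
    return max(0.0, z)


def node_initial_vector(x, W, b, activation=relu):
    """Return activation(W x + b) for a single node.

    x: original node feature (list of floats)
    W: initial values of the trainable weights (one row per output dimension)
    b: initial values of the trainable bias (hypothetical, may be all zeros)
    """
    return [
        activation(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i)
        for row, b_i in zip(W, b)
    ]


# Hypothetical 3-dimensional node feature projected to 2 dimensions.
x = [1.0, 0.0, 2.0]
W = [[0.5, -1.0, 0.25],
     [-1.0, 0.0, 0.0]]
b = [0.0, 0.0]
h = node_initial_vector(x, W, b)
print(h)
```

Any other nonlinear activation (for example a sigmoid) can be passed through the `activation` argument without changing the rest of the sketch.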

3. The method according to claim 1, wherein generating a corresponding node initial feature vector based on the node attribute features of each node in the user feature map and model training parameter initial values comprises:

when the node is a contact information node, acquiring the adjacent node initial feature vectors corresponding to the nodes adjacent to the contact information node, wherein the contact information node is a node whose node attribute is contact information;

and determining the node initial feature vector corresponding to the contact information node according to the average value of the adjacent node initial feature vectors.
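
As a non-limiting illustration of the averaging step recited in claim 3, the following Python sketch assigns to a node whose attribute is a contact method the element-wise mean of its neighbors' initial feature vectors. The adjacency dictionary, the node names (`phone_1`, `user_a`, `user_b`), and the vector values are hypothetical and serve only to make the example runnable.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("a contact node needs at least one adjacent node")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def contact_node_vector(node, adjacency, init_vectors):
    """Average the initial feature vectors of the nodes adjacent to `node`."""
    return mean_vector([init_vectors[v] for v in adjacency[node]])


# Hypothetical graph fragment: one contact node shared by two user nodes.
adjacency = {"phone_1": ["user_a", "user_b"]}
init_vectors = {"user_a": [1.0, 0.0], "user_b": [3.0, 2.0]}

init_vectors["phone_1"] = contact_node_vector("phone_1", adjacency, init_vectors)
print(init_vectors["phone_1"])
```

Averaging gives the contact node a vector of the same dimension as its neighbors, so it can be used alongside the other node initial feature vectors in the later fusion step.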

4. The method of claim 1, wherein obtaining an incidence matrix based on fusion of the node initial feature vectors comprises:

acquiring a user identification node among the nodes;

and obtaining the incidence matrix based on fusion of the user initial feature vector corresponding to the user identification node and the node initial feature vectors corresponding to the other nodes.

5. The method according to claim 1, wherein obtaining feature input sample data based on the user feature map, the incidence matrix, and the node initial feature vector comprises:

determining a sub-map in the user characteristic map according to a user identification node among the nodes;

obtaining a user characteristic map weight matrix according to the sub-maps;

obtaining a node degree matrix based on fusion of the user characteristic map weight matrix and the incidence matrix;

obtaining a sub-map degree matrix based on sub-map nodes in the sub-map and the incidence matrix;

and obtaining feature input sample data based on the user feature map weight matrix, the incidence matrix, the node degree matrix, the sub-map degree matrix and the node initial feature vector.
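
As a non-limiting illustration of the two degree matrices recited in claim 5, the following Python sketch derives them from an incidence matrix and a per-sub-map weight vector. The formulas follow a common hypergraph convention (node degree as the weighted row sum of the incidence matrix, sub-map degree as its column sum); that convention, the function name `degree_matrices`, and the sample values are assumptions, since the claim does not state how the matrices are computed.

```python
def diagonal(values):
    """Build a square matrix with `values` on the diagonal and 0.0 elsewhere."""
    size = len(values)
    return [[values[i] if i == j else 0.0 for j in range(size)]
            for i in range(size)]


def degree_matrices(H, w):
    """Return (node degree matrix, sub-map degree matrix).

    H[v][e]: incidence entry of node v in sub-map e (0 when v is not in e)
    w[e]:    weight of sub-map e (diagonal of the weight matrix)
    """
    n_nodes, n_subs = len(H), len(w)
    node_deg = [sum(w[e] * H[v][e] for e in range(n_subs))
                for v in range(n_nodes)]
    sub_deg = [sum(H[v][e] for v in range(n_nodes))
               for e in range(n_subs)]
    return diagonal(node_deg), diagonal(sub_deg)


# Hypothetical example: 3 nodes, 2 sub-maps, the second weighted twice as much.
H = [[1, 0],
     [1, 1],
     [0, 1]]
w = [1.0, 2.0]
Dv, De = degree_matrices(H, w)
print(Dv)
print(De)
```

Both results are diagonal, so they can be inverted element by element when the feature input sample data is normalized.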

6. The method of claim 1, wherein training based on the feature input sample data to obtain an account state model comprises:

inputting the standard input sample data into a graph convolutional neural network model for training to obtain an account state model.

7. The method of claim 6, wherein inputting the standard input sample data into a graph convolutional neural network model for training to obtain an account state model comprises:

inputting the standard input sample data into the graph convolutional neural network model to obtain a model actual output value;

classifying the actual output value of the model according to a preset classification algorithm to obtain an actual estimation result of the account state corresponding to the input sample data;

when the error between the actual account state estimation result and the standard account state estimation result preset in the standard input sample data is larger than a preset error threshold, returning to the step of inputting the standard input sample data into the graph convolutional neural network model to obtain a model actual output value;

and when the error between the actual account state estimation result and the preset account state standard estimation result in the standard input sample data is less than or equal to a preset error threshold, finishing training to obtain an account state model.
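
As a non-limiting illustration of the threshold-controlled loop recited in claim 7, the following Python sketch repeats the forward pass until the classification error is no greater than a preset threshold. To stay self-contained it substitutes a single logistic layer for the graph convolutional neural network and a 0.5 cut-off for the preset classification algorithm. The parameter update between rounds, the learning rate, the `max_rounds` guard, and the sample data are likewise assumptions that the claim does not specify.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_until_threshold(samples, labels, error_threshold=0.1,
                          lr=0.5, max_rounds=10_000):
    """Repeat forward pass and update until error <= error_threshold."""
    weights = [0.0] * len(samples[0])
    for round_no in range(1, max_rounds + 1):
        # Forward pass: model actual output values (stand-in for the GCN).
        outputs = [sigmoid(sum(w * x for w, x in zip(weights, s)))
                   for s in samples]
        # Classification step (stand-in for the preset classification algorithm).
        predicted = [1 if o >= 0.5 else 0 for o in outputs]
        # Error between actual and standard account state estimation results.
        error = sum(p != y for p, y in zip(predicted, labels)) / len(labels)
        if error <= error_threshold:
            return weights, error, round_no  # training finished
        # Otherwise adjust the parameters (batch gradient step) and repeat.
        for s, o, y in zip(samples, outputs, labels):
            for j, x in enumerate(s):
                weights[j] -= lr * (o - y) * x / len(samples)
    raise RuntimeError("error threshold not reached within max_rounds")


# Hypothetical samples: [bias term, one feature]; labels are account states.
samples = [[1.0, 0.0], [1.0, 1.0], [1.0, 3.0], [1.0, 4.0]]
labels = [0, 0, 1, 1]
weights, error, rounds = train_until_threshold(samples, labels)
print(error, rounds)
```

The `max_rounds` guard is a practical addition so that the loop terminates even when the threshold cannot be reached on a given data set.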

8. An account state model building apparatus, the apparatus comprising:

a sample data generation module, configured to construct a user characteristic map according to the historical user characteristic data; generate a corresponding node initial feature vector based on the node attribute feature of each node in the user characteristic map and a model training parameter initial value, wherein the model training parameter initial value is an initial value of a parameter to be trained in the account state model; obtain an incidence matrix based on fusion of the node initial feature vectors, wherein the incidence matrix is used for representing the incidence relation of each node and the importance of each node; and obtain feature input sample data based on the user characteristic map, the incidence matrix and the node initial feature vector;

9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 7.

10. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.