CN112200666A - Feature vector processing method and related device - Google Patents

Feature vector processing method and related device

Info

Publication number
CN112200666A
Authority
CN
China
Prior art keywords
feature
user
vectors
neural network
network model
Legal status
Pending
Application number
CN202011289894.2A
Other languages
Chinese (zh)
Inventor
鲁海生
严澄
杨青
杨志谋
谭大坤
Current Assignee
Du Xiaoman Technology Beijing Co Ltd
Original Assignee
Shanghai Youyang New Media Information Technology Co ltd
Application filed by Shanghai Youyang New Media Information Technology Co ltd
Priority to CN202011289894.2A
Publication of CN112200666A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods

Abstract

The application provides a feature vector processing method and a related device. The method comprises: determining feature vectors of a user to be estimated and of at least one target neighbor user; classifying the feature vectors according to the categories of the feature information they correspond to, to obtain feature vector sets; and inputting each feature vector set into a neural network model to obtain a credit risk value of the user to be estimated. The neural network model calculates a merged vector for each feature vector set and determines the credit risk value from the merged vectors. To calculate the merged vector of any feature vector set, the neural network model calculates a correlation coefficient of the feature vector of each target neighbor user in the set relative to the feature vector of the user to be estimated and, using the correlation coefficients as weights, calculates the weighted sum of the feature vectors of the target neighbor users in the set. The method and device address the problem of low accuracy in the determined credit risk value.

Description

Feature vector processing method and related device
Technical Field
The present application relates to the field of data processing, and in particular to a feature vector processing method and a related device.
Background
In the era of "Internet +" and big data, governments and financial institutions have paid increasing attention to credit records in recent years. A sound credit reporting system is an important component of a market economy, and widespread fraud or default can do immeasurable harm to economic development. Whether credit is assessed for individuals or for enterprises, effectively identifying high-risk applications, especially those carrying fraud risk, can greatly reduce the bad-debt rate of financial institutions and spare them losses. Meanwhile, each credit reporting institution holds a large amount of information such as application records, fraud blacklists, and overdue-payment blacklists, and how to identify high-risk applications from this existing big data is an important research topic.
At present, an association network can be constructed from the information of all historically acquired users; the feature information of the neighbor users of the user to be estimated in the association network is processed by a neural network model, and the result is used to evaluate the credit risk of the user to be estimated. The accuracy of this feature-information processing result therefore directly affects the accuracy of the evaluation.
Disclosure of Invention
The application provides a feature vector processing method and a related device, aiming to solve the problem of low accuracy in the determined credit risk value.
In order to achieve the above object, the present application provides the following technical solutions:
The application provides a feature vector processing method, comprising:
determining feature vectors of a user to be estimated and of at least one target neighbor user, respectively, wherein the feature vectors comprise vectors corresponding to at least one type of feature information;
classifying the feature vectors according to the categories of the feature information to which they correspond, to obtain a feature vector set for each feature information category; and
inputting each feature vector set into a preset neural network model to obtain a credit risk value of the user to be estimated, wherein the neural network model calculates a merged vector of each feature vector set and determines the credit risk value according to the merged vectors, and wherein calculating the merged vector of any feature vector set comprises: calculating a correlation coefficient of the feature vector of each target neighbor user in the feature vector set relative to the feature vector of the user to be estimated, and calculating the weighted sum of the feature vectors of the target neighbor users in the feature vector set, using the correlation coefficients as weights.
Optionally, the at least one type of feature information comprises node feature information, edge feature information, and structural feature information; correspondingly, the vectors corresponding to the at least one type of feature information comprise a node feature vector, an edge feature vector, and a structural feature vector.
Optionally, before inputting each feature vector set into the preset neural network model, the method further comprises:
concatenating every pair of the feature vectors of the user to be estimated to obtain concatenated vectors of the user to be estimated;
concatenating every pair of the feature vectors of each target neighbor user to obtain concatenated vectors for each target neighbor user;
among the concatenated vectors of the user to be estimated and of each target neighbor user, grouping those obtained by concatenating the same two types of feature vectors into one feature vector set, thereby obtaining a plurality of feature vector sets; and
forming one feature vector set from the vector obtained by concatenating all the feature vectors of the user to be estimated and the vectors obtained by concatenating all the feature vectors of each target neighbor user.
Optionally, the neural network model comprises a first neural network model and a second neural network model, the first neural network model being connected to the second neural network model;
the neural network model calculating the merged vectors of the feature vector sets and determining the credit risk value according to the merged vectors comprises:
the first neural network model calculating the merged vector of each feature vector set; and
the second neural network model determining the credit risk value according to the merged vectors.
Optionally, the first neural network model is a graph attention network and the second neural network model is a graph convolutional network;
the second neural network model determining the credit risk value according to the merged vectors specifically comprises:
the second neural network model performing a convolution operation on the merged vectors to obtain the credit risk value.
Optionally, the first neural network model is a graph attention network and the second neural network model is also a graph attention network;
the second neural network model determining the credit risk value according to the merged vectors specifically comprises:
the second neural network model calculating a correlation coefficient for each merged vector and determining the credit risk value according to the weighted sum of the merged vectors, using these correlation coefficients as weights.
Optionally, determining the feature vectors of the user to be estimated and of the at least one target neighbor user, wherein the feature vectors comprise vectors corresponding to at least one type of feature information, comprises:
acquiring a pre-established association network, the association network being obtained by taking historically acquired users as nodes and establishing association relationships between the nodes according to the users' information;
establishing association relationships between the user to be estimated and nodes in the association network according to the information of the user to be estimated, to obtain an updated association network;
determining the subgraph of the user to be estimated in the updated association network;
taking the neighbor nodes at all levels of the user to be estimated in the subgraph as the target neighbor users; and
mapping the at least one type of feature information of the user to be estimated and of each target neighbor user into feature vectors, to obtain the feature vectors of the user to be estimated and of the at least one target neighbor user.
The present application further provides a feature vector processing apparatus, comprising:
a determining module, configured to determine feature vectors of a user to be estimated and of at least one target neighbor user, respectively, wherein the feature vectors comprise vectors corresponding to at least one type of feature information;
a classification module, configured to classify the feature vectors according to the categories of the feature information to which they correspond, to obtain a feature vector set for each feature information category; and
an execution module, configured to input each feature vector set into a preset neural network model to obtain a credit risk value of the user to be estimated, wherein the neural network model calculates a merged vector of each feature vector set and determines the credit risk value according to the merged vectors, and wherein calculating the merged vector of any feature vector set comprises: calculating a correlation coefficient of the feature vector of each target neighbor user in the feature vector set relative to the feature vector of the user to be estimated, and calculating the weighted sum of the feature vectors of the target neighbor users in the feature vector set, using the correlation coefficients as weights.
The present application further provides a storage medium comprising a stored program, wherein the program, when run, executes any of the feature vector processing methods described above.
The present application further provides a device comprising at least one processor, at least one memory connected to the processor, and a bus; the processor and the memory communicate with each other through the bus, and the processor is configured to call program instructions in the memory to execute the feature vector processing method described above.
In the feature vector processing method and related device, the feature vectors of a user to be estimated and of at least one target neighbor user are determined and classified according to the categories of their corresponding feature information, yielding one feature vector set per feature information category. Each feature vector set thus contains the feature vectors of the same type of feature information for the user to be estimated and for each target neighbor user. Because each feature vector set, that is, each type of feature information, is subsequently processed separately, the processing is more targeted, which helps improve the accuracy of the result.
After the feature vector sets are obtained, the neural network model calculates the correlation coefficient of the feature vector of each target neighbor user in a set relative to the feature vector of the user to be estimated. The correlation coefficient reflects how closely the target neighbor user is related to the user to be estimated: the higher the coefficient, the closer the relationship. The weighted sum of the feature vectors of the target neighbor users in the set is then calculated with the correlation coefficients as weights, so the resulting merged vector makes more accurate use of the target neighbors' feature vectors, and using the merged vector to evaluate credit risk in turn improves the accuracy of the evaluation.
Drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a feature vector processing method disclosed in an embodiment of the present application;
Fig. 2 is a flowchart of another feature vector processing method disclosed in an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a feature vector processing apparatus disclosed in an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a device disclosed in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present application.
Fig. 1 shows a feature vector processing method according to an embodiment of the present application, which comprises the following steps:
S101: Determine the feature vectors of the user to be estimated and of at least one target neighbor user, respectively.
In this embodiment, a target neighbor user is a historically acquired user who has some association with the user to be estimated.
In this step, the feature vectors comprise vectors corresponding to at least one type of feature information; that is, vectors corresponding to at least one type of feature information are determined for the user to be estimated and for each target neighbor user.
In this embodiment, a user's feature information may be divided into three categories: node feature information, edge feature information, and structural feature information.
The node feature information of a user represents the user's attributes, such as age, gender, province, and income.
The edge feature information of a user represents association information between the user and an associated user (another existing user associated with the user). The association information may include the association type and the association strength. For example, suppose user A is associated with user B through a mobile phone number, and user A's phone stores user B's number under the note "girlfriend". The association type between A and B is then a phone-number connection, and the association strength is the one implied by "girlfriend", i.e., relatively high.
The structural feature information of a user represents the user's position in the association network and how closely the user is connected to other users in the network. The association network is a network, built from the information of the user and of other users, that represents the association relationships between different users. For example, if node A sits at the center of the association network and is closely connected to other nodes while node B sits at its edge, the structural feature information of nodes A and B differs.
In this embodiment, the structural feature information may be extracted from the association network using methods such as structure2vec, node2vec, or random walks (RandomWalk).
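As a rough illustration of the random-walk family named above, the sketch below generates DeepWalk-style uniform random walks over a small association network. It is only the first half of such a method: DeepWalk feeds these walks to a skip-gram model to learn node embeddings, and node2vec replaces the uniform choice with a biased second-order walk controlled by its p and q parameters; both of those steps are omitted. The adjacency-dict layout, the function name `random_walks`, and the toy network are hypothetical.

```python
import random

def random_walks(adj, walk_length, walks_per_node, seed=0):
    """Generate uniform random walks (DeepWalk-style) over an adjacency dict.

    adj maps each node to a list of neighbor nodes. Each walk starts at a node
    and repeatedly moves to a uniformly chosen neighbor; it stops early if the
    current node has no neighbors.
    """
    rng = random.Random(seed)  # fixed seed so the walks are reproducible
    walks = []
    for _ in range(walks_per_node):
        for start in sorted(adj):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adj[walk[-1]]
                if not neighbors:
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

# Hypothetical association network: "A" is central, "D" sits at the edge.
adj = {"A": ["B", "C", "D"], "B": ["A", "C"], "C": ["A", "B"], "D": ["A"]}
walks = random_walks(adj, walk_length=5, walks_per_node=2)
print(len(walks))  # 8
```

Central nodes such as "A" appear in far more walks than peripheral ones such as "D", which is the kind of positional signal the structural feature information is meant to capture once the walks are turned into embeddings.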
Optionally, this step may be implemented through steps A1 to A5:
A1: Acquire the pre-established association network.
In this embodiment, the association network is obtained by taking historically acquired users as nodes and establishing association relationships between the nodes according to the users' information. The historically acquired user information includes, but is not limited to, address books, call records, social circles, GPS locations, and Wi-Fi devices.
The specific process of establishing the association network belongs to the prior art and is not described here.
A2: Establish association relationships between the user to be estimated and nodes in the association network according to the information of the user to be estimated, to obtain an updated association network.
In this step, the user to be estimated is added as a node of the association network, and association relationships between this node and the existing nodes are established according to the information of the user to be estimated, yielding the updated association network. The specific process belongs to the prior art and is not described here.
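Steps A1 and A2 leave the network construction to the prior art. As one hypothetical way to picture it, the sketch below links users who share an identifier value (for example a phone number, device ID, or Wi-Fi network), records the shared attribute names on each edge as a stand-in for the association type, and adds the user to be estimated by rebuilding the network. The data layout, the shared-identifier rule, and the names `build_association_network` and `add_user` are assumptions for illustration, not the application's method; a production system would more likely update the network incrementally than rebuild it.

```python
from collections import defaultdict

def build_association_network(users):
    """Link users that share an identifier value (hypothetical rule).

    users maps user_id -> {attribute_name: set_of_values}. Returns an adjacency
    dict user_id -> {neighbor_id: set of attribute names they share}.
    """
    index = defaultdict(set)  # (attribute, value) -> users holding that value
    for uid, info in users.items():
        for attr, values in info.items():
            for v in values:
                index[(attr, v)].add(uid)
    graph = {uid: {} for uid in users}
    for (attr, _), holders in index.items():
        for u in holders:
            for w in holders:
                if u != w:
                    graph[u].setdefault(w, set()).add(attr)
    return graph

def add_user(users, new_id, new_info):
    """Step A2 sketch: add the user to be estimated and rebuild the links."""
    updated = dict(users)  # leave the historical user table untouched
    updated[new_id] = new_info
    return build_association_network(updated)

# Hypothetical historical users and a new user "x" to be estimated.
history = {
    "u1": {"phone": {"138"}, "device": {"d1"}},
    "u2": {"phone": {"139"}, "device": {"d1"}},
    "u3": {"phone": {"137"}, "wifi": {"w9"}},
}
net = add_user(history, "x", {"phone": {"137"}, "wifi": {"w9"}, "device": {"d2"}})
print(net["x"])  # u3 linked via both phone and wifi
```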
A3: Determine the subgraph of the user to be estimated in the updated association network.
In this step, the subgraph of the user to be estimated contains the user to be estimated and its neighbor users up to level n.
A4: Take the neighbor nodes at all levels of the user to be estimated in the subgraph as target neighbor users.
In this step, every neighbor user of the user to be estimated in the subgraph, at any level, is taken as a target neighbor user; that is, all users in the subgraph other than the user to be estimated are target neighbor users.
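Steps A3 and A4 amount to collecting every node within n hops of the user to be estimated. A minimal breadth-first-search sketch, assuming the updated association network is given as a hypothetical adjacency dict, returns each target neighbor user together with its neighbor level:

```python
from collections import deque

def target_neighbors(adj, user, n):
    """Return {node: level} for all nodes within n hops of user, excluding user."""
    level = {user: 0}
    queue = deque([user])
    while queue:
        node = queue.popleft()
        if level[node] == n:  # do not expand past the n-th level
            continue
        for nb in adj.get(node, ()):
            if nb not in level:  # first visit gives the shortest hop count
                level[nb] = level[node] + 1
                queue.append(nb)
    del level[user]  # the user to be estimated is not its own neighbor
    return level

adj = {"x": ["a", "b"], "a": ["x", "c"], "b": ["x"], "c": ["a", "d"], "d": ["c"]}
neighbors = target_neighbors(adj, "x", 2)
print(neighbors)  # {'a': 1, 'b': 1, 'c': 2}
```

Node "d" is three hops away, so it falls outside the 2-level subgraph and is not a target neighbor user.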
A5: Map the at least one type of feature information of the user to be estimated and of each target neighbor user into feature vectors, to obtain the feature vectors of the user to be estimated and of the at least one target neighbor user.
In this step, the feature information may include node feature information, edge feature information, and structural feature information, whose meanings are described above.
In this step, each type of feature information of the user to be estimated is mapped into a feature vector, and likewise for each target neighbor user: for any target neighbor user, each of its types of feature information is mapped into a feature vector, giving that user's feature vectors. The specific mapping process belongs to the prior art and is not described here.
S102: Classify the feature vectors according to the categories of their corresponding feature information, to obtain a feature vector set for each feature information category.
In this step, among the feature vectors of the user to be estimated and of the at least one target neighbor user, those mapped from the same type of feature information form one feature vector set. In other words, the feature vectors of the user to be estimated and of the target neighbor users that correspond to the same type of feature information are grouped together, yielding one feature vector set per feature information category.
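The grouping in S102 is essentially a transposition from per-user storage to per-category storage. The sketch below assumes each user's feature vectors are held in a dict keyed by hypothetical category names; the function name and the vector values are made up for illustration.

```python
def group_by_category(features):
    """features: user_id -> {category: vector}. Returns category -> {user_id: vector}."""
    sets = {}
    for uid, per_cat in features.items():
        for cat, vec in per_cat.items():
            sets.setdefault(cat, {})[uid] = vec
    return sets

features = {
    "x": {"node": [0.3, 1.0], "edge": [0.7], "structure": [0.1, 0.2]},  # user to be estimated
    "a": {"node": [0.5, 0.0], "edge": [0.2], "structure": [0.4, 0.9]},  # target neighbor user
    "b": {"node": [0.9, 1.0], "edge": [0.1], "structure": [0.0, 0.3]},  # target neighbor user
}
sets = group_by_category(features)
print(sorted(sets))  # ['edge', 'node', 'structure']
print(sets["edge"])  # {'x': [0.7], 'a': [0.2], 'b': [0.1]}
```

Keeping the user to be estimated inside each set (keyed here by "x") lets the model later compare every neighbor's vector against it within the same category.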
S103: Input each feature vector set into a preset neural network model to obtain the credit risk value of the user to be estimated.
In this embodiment, after each feature vector set is input into the neural network model, the model calculates a merged vector for each feature vector set and determines the credit risk value from the calculated merged vectors.
The neural network model calculates the merged vector of every feature vector set in the same way. Taking any one feature vector set as an example, the process comprises: calculating the correlation coefficient of the feature vector of each target neighbor user in the set relative to the feature vector of the user to be estimated, and, using the correlation coefficients as weights, calculating the weighted sum of the feature vectors of the target neighbor users in the set.
Optionally, in this embodiment, the neural network model may comprise a first neural network model and a second neural network model, the first being connected to the second.
With this structure, the neural network model calculating the merged vectors and determining the credit risk value may comprise: the first neural network model calculating the merged vector of each feature vector set, and the second neural network model determining the credit risk value according to the merged vectors.
In this embodiment, the first neural network model may be a graph attention network, and the second neural network model may be either a graph attention network or a graph convolutional network; this embodiment does not limit the specific form of the second neural network model.
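When the second neural network model is a graph attention network, it weights the merged vectors by their correlation coefficients and derives the credit risk value from the weighted sum. The text does not say how those coefficients are scored or how the weighted sum becomes a risk value, so the sketch below assumes a dot product with a learnable query vector `q`, a softmax over the merged vectors, and a logistic output layer (`w_out`, `b_out`) that maps the fused vector to a value in (0, 1). All of these names and choices are hypothetical; the toy parameters stand in for trained weights.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def credit_risk(merged, q, w_out, b_out):
    """Attention over merged vectors followed by a logistic output (assumed design).

    merged: list of merged vectors, one per feature vector set.
    q: hypothetical learnable query vector that scores each merged vector.
    w_out, b_out: hypothetical output-layer weights and bias.
    """
    beta = softmax([dot(q, z) for z in merged])  # correlation coefficients of merged vectors
    dim = len(merged[0])
    fused = [sum(b * z[d] for b, z in zip(beta, merged)) for d in range(dim)]  # weighted sum
    return 1.0 / (1.0 + math.exp(-(dot(w_out, fused) + b_out)))  # risk value in (0, 1)

merged = [[0.2, 0.1], [0.5, -0.3], [0.0, 0.4]]
risk = credit_risk(merged, q=[1.0, 0.5], w_out=[1.0, -1.0], b_out=0.0)
print(0.0 < risk < 1.0)  # True
```

A graph convolutional second model, the other option named above, would replace the attention weights with a fixed aggregation followed by a convolution; that variant is not sketched here.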
In this embodiment, the neural network model is a trained model; it may be trained with any existing deep-learning training method, which is not detailed here.
To describe the graph attention network more clearly, the process by which it computes the merged vector from an input feature vector set may comprise the following.
First, a correlation coefficient is generated for each target neighbor, and then the feature vectors of the target neighbors are summed with these coefficients as weights. Specifically, the attention coefficient of each target neighbor is calculated as shown in formula (1):
$$e_{ij} = a\left(W\vec{h}_i,\; W\vec{h}_j\right) \qquad (1)$$
where $\vec{h}_i$ is the feature vector of the user (node) $i$ to be estimated and $\vec{h}_j$ is the feature vector of a target neighbor user (node) $j$ of node $i$. The feature vectors of nodes $i$ and $j$ are each first linearly transformed by left-multiplication with a matrix $W$. $a(\cdot,\cdot)$ is the attention function, which maps the two vectors $W\vec{h}_i$ and $W\vec{h}_j$ to a single value, the attention coefficient. It is typically implemented by concatenating $W\vec{h}_i$ and $W\vec{h}_j$, transposing the result, and right-multiplying it by a learnable weight vector $\vec{a}$, i.e., $e_{ij} = [W\vec{h}_i \,\|\, W\vec{h}_j]^{T}\vec{a}$.
Formula (1) gives the attention coefficient $e_{ij}$ of each target neighbor user. The attention coefficients are then normalized to obtain the final correlation coefficients; the normalization usually uses the softmax function, as shown in formula (2), where $N_i$ is the set of target neighbor users of node $i$:
$$\alpha_{ij} = \mathrm{softmax}_j\left(e_{ij}\right) = \frac{\exp\left(e_{ij}\right)}{\sum_{k \in N_i} \exp\left(e_{ik}\right)} \qquad (2)$$
Finally, for the user to be estimated, the feature vectors of the target neighbor users are summed with their correlation coefficients as weights to obtain the merged vector, as shown in formula (3):
$$\vec{h}_i' = \sigma\Big(\sum_{j \in N_i} \alpha_{ij} W\vec{h}_j\Big) \qquad (3)$$
where $\sigma$ is an activation function and its argument is the weighted sum of the (transformed) feature vectors of the target neighbor users, weighted by their correlation coefficients.
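Formulas (1) to (3) can be sketched in plain Python for a single feature vector set. The sketch assumes the concatenation form of the attention function, e_ij = a^T [W h_i || W h_j], and uses tanh as σ, which is an arbitrary choice. The original graph attention network formulation also applies a LeakyReLU to e_ij before the softmax; the text here does not mention it, so it is left out. `W` and `a` would be learned during training; the values below are toy stand-ins, and the function names are hypothetical.

```python
import math

def matvec(W, x):
    """Left-multiply vector x by matrix W (given as a list of rows)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def merge_vector(h_i, neighbors, W, a, sigma=math.tanh):
    """Merged vector of one feature vector set per formulas (1)-(3).

    h_i: feature vector of the user to be estimated.
    neighbors: feature vectors h_j of the target neighbor users in the set.
    W: shared linear transform; a: attention vector of length 2 * len(W).
    Returns (merged vector, correlation coefficients alpha_ij).
    """
    Wh_i = matvec(W, h_i)
    Wh = [matvec(W, h_j) for h_j in neighbors]
    # Formula (1): e_ij = a^T [W h_i || W h_j]  (list + list concatenates)
    e = [sum(ak * xk for ak, xk in zip(a, Wh_i + Wh_j)) for Wh_j in Wh]
    # Formula (2): softmax over the target neighbors of i
    m = max(e)
    exps = [math.exp(x - m) for x in e]
    total = sum(exps)
    alpha = [x / total for x in exps]
    # Formula (3): h_i' = sigma(sum_j alpha_ij * W h_j)
    out = [sum(al * v[d] for al, v in zip(alpha, Wh)) for d in range(len(W))]
    return [sigma(x) for x in out], alpha

W = [[1.0, 0.0], [0.0, 1.0]]   # identity transform for readability
a = [0.0, 0.0, 1.0, 0.0]       # scores a neighbor by the first component of W h_j
merged, alpha = merge_vector([1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]], W, a)
print([round(x, 3) for x in alpha])  # [0.731, 0.269]
```

With this toy `a`, the first neighbor scores higher and therefore dominates the merged vector, which is exactly the "closer neighbor, larger weight" behavior the description attributes to the correlation coefficient.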
In the embodiments of the present application, to further improve the accuracy of the credit risk value of the user to be estimated, the diversity of the feature vector sets can be increased so that the neural network model processes more kinds of feature vector sets. Specifically, the diversity may be increased as follows: concatenate the feature vectors of the user to be estimated to obtain its concatenated vectors; concatenate the feature vectors of each target neighbor user to obtain the concatenated vectors of each target neighbor user; and generate additional feature vector sets from the concatenated vectors of the user to be estimated and of the target neighbor users.
The process of determining the credit risk value of the user to be estimated with more diverse feature vector sets is shown in Fig. 2, which is another feature vector processing method according to an embodiment of the present application and comprises the following steps:
S201: Determine the feature vectors of the user to be estimated and of at least one target neighbor user, respectively.
For the meaning and optional implementation of this step, refer to S101; details are not repeated here.
S202: Classify the feature vectors according to the categories of their corresponding feature information, to obtain a feature vector set for each feature information category.
For the meaning of this step, refer to S102; details are not repeated here.
S203: Concatenate every pair of the feature vectors of the user to be estimated to obtain the concatenated vectors of the user to be estimated.
In this embodiment, the user to be estimated has one feature vector for each type of feature information, i.e., several feature vectors. In this step, every pair of these feature vectors is concatenated. The specific concatenation process belongs to the prior art and is not described here.
For example, if the feature vectors of the user to be estimated are a node feature vector, an edge feature vector, and a structural feature vector, this step concatenates the node and edge feature vectors, the node and structural feature vectors, and the edge and structural feature vectors, yielding three concatenated vectors.
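The pairwise concatenation in S203 can be sketched with `itertools.combinations`, which yields every unordered pair exactly once. The category keys "n", "e", and "s" (node, edge, structural), the "+" naming scheme, and the vector values are hypothetical:

```python
from itertools import combinations

def pairwise_concat(vectors):
    """vectors: {category: list}. Returns one concatenated vector per unordered pair."""
    names = list(vectors)
    return {f"{p}+{q}": vectors[p] + vectors[q] for p, q in combinations(names, 2)}

user = {"n": [0.1, 0.2], "e": [0.3], "s": [0.4, 0.5]}  # node, edge, structural vectors
pairs = pairwise_concat(user)
print(sorted(pairs))  # ['e+s', 'n+e', 'n+s']
print(pairs["n+e"])   # [0.1, 0.2, 0.3]
```

With k feature vectors this yields k(k-1)/2 concatenated vectors, so three feature vectors give the three concatenated vectors described in the example.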
S204, splicing every two eigenvectors in the eigenvectors of each target neighbor user respectively to obtain spliced vectors corresponding to each target neighbor user respectively.
In this step, the feature vectors of each target neighbor user are processed respectively, the specific processing modes are the same, and for convenience of introduction, a target neighbor user is taken as an example for introduction. Specifically, every two feature vectors in the feature vectors of the target neighbor user are spliced, wherein the specific implementation manner of splicing is the prior art, and is not described herein again. If the feature vector of the target neighbor user includes three, then in this step, three spliced vectors of the target neighbor user can be obtained.
S205: among the spliced vectors of the user to be estimated and of each target neighbor user, the spliced vectors obtained by splicing the same two feature vectors form one feature vector set, yielding a plurality of feature vector sets.
In this step, the spliced vectors of the user to be estimated and of each target neighbor user that come from splicing the same two feature vectors are grouped into one feature vector set.
If the user to be estimated and each target neighbor user each have three spliced vectors, three feature vector sets are obtained in this step.
S206: the spliced vector obtained by splicing all feature vectors of the user to be estimated, together with the spliced vectors obtained by splicing all feature vectors of each target neighbor user, forms one feature vector set.
In this step, one spliced vector is obtained for the user to be estimated and one for each target neighbor user, so a single feature vector set is obtained.
If the user to be estimated and each target neighbor user each have three feature vectors (node, edge and structural feature vectors), 7 feature vector sets are obtained through S202 to S206: three per-category sets, three pairwise sets from S205 and one set from S206.
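A minimal sketch of how S202 to S206 could assemble the 7 feature vector sets, under the same assumptions as before (concatenation as splicing, hypothetical dict keys, and a hypothetical function name `build_feature_vector_sets`). Index 0 of each set holds the user to be estimated and the remaining entries hold the target neighbor users.

```python
from __future__ import annotations

from itertools import chain, combinations


def build_feature_vector_sets(
    target: dict[str, list[float]],
    neighbors: list[dict[str, list[float]]],
) -> dict[tuple[str, ...], list[list[float]]]:
    """Group vectors of the user to be estimated and its target neighbors."""
    users = [target, *neighbors]
    names = sorted(target)  # same splicing order for every user
    keys = [(n,) for n in names]  # one set per feature category
    keys += combinations(names, 2)  # S205: one set per pair of categories
    keys.append(tuple(names))  # S206: all categories spliced together
    return {
        key: [list(chain.from_iterable(u[n] for n in key)) for u in users]
        for key in keys
    }
```

With three feature types this yields 3 per-category sets, 3 pairwise sets and 1 all-type set, matching the count of 7. Sorting the type names keeps the splicing order identical for every user, so the vectors within one set share the same layout.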
S207: input each feature vector set into the preset neural network model to obtain the credit risk value of the user to be estimated.
For the meaning of this step, refer to S103; it is not repeated here.
Taking the 7 feature vector sets obtained in S202 to S206 as an example, in this step the neural network model processes the 7 feature vector sets separately to obtain 7 merged vectors, denoted Wn, Ws, We, Wne, Wse, Wns and Wsen, and determines the credit risk value of the user to be estimated from these 7 merged vectors.
The neural network model processes each input feature vector set separately according to formula (4) below. Taking any one feature vector set as an example, the merged vector is generated from it as shown in formula (4):
$$e_{ij}^{\Phi} = \mathrm{att}_{1st}\left(h_i, h_j; \Phi\right)$$
$$\alpha_{ij}^{\Phi} = \mathrm{softmax}_j\left(e_{ij}^{\Phi}\right) = \frac{\exp\left(e_{ij}^{\Phi}\right)}{\sum_{k \in N_i^{\Phi}} \exp\left(e_{ik}^{\Phi}\right)}$$
$$z_i^{\Phi} = \sum_{j \in N_i^{\Phi}} \alpha_{ij}^{\Phi} h_j \qquad (4)$$
where $z_i^{\Phi}$ is the merged vector generated for the feature vector set, $h_i$ and $h_j$ are the feature vectors of user $i$ and target neighbor node $j$ in that set, and $\Phi$ denotes the local attention direction function.
In the formula, $e_{ij}^{\Phi}$ represents the attention coefficient between target neighbor node $j$ and user $i$ to be estimated; $\mathrm{att}_{1st}$ represents the first-layer attention calculation, i.e., the operation given by formula (1); $\alpha_{ij}^{\Phi}$ is the normalized $e_{ij}^{\Phi}$, i.e., the correlation coefficient of target neighbor node $j$ relative to user $i$ to be estimated; and $N_i^{\Phi}$ represents the set of target neighbor nodes of user $i$ to be estimated.
It should be noted that, in this embodiment, each feature vector set corresponds to a local attention direction function, and the local attention direction functions corresponding to different feature vector sets may be different. For 7 feature vector sets, 7 merged vectors can be obtained by the calculation of formula (4), which can be represented as Wn, Ws, We, Wne, Wse, Wns, Wsen.
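Formula (4) can be illustrated with the following pure-Python sketch for one feature vector set. Formula (1), which defines att_1st, is not reproduced in this section, so the scoring function here is an assumption: a GAT-style score LeakyReLU(a · [h_i ‖ h_j]) with a hypothetical weight vector `a` and no linear projection. The names `merge_vector` and `leaky_relu` are illustrative.

```python
from __future__ import annotations

import math


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def merge_vector(
    h_i: list[float], neighbors: list[list[float]], a: list[float]
) -> tuple[list[float], list[float]]:
    """Return (z_i, alphas) for one feature vector set, following formula (4).

    Hypothetical att_1st: e_ij = LeakyReLU(a . [h_i || h_j]);
    `a` must have len(h_i) + len(h_j) entries.
    """
    # e_ij^Phi: unnormalized attention coefficient for each target neighbor j
    scores = [
        leaky_relu(sum(w * x for w, x in zip(a, h_i + h_j))) for h_j in neighbors
    ]
    # alpha_ij^Phi: softmax over N_i^Phi; subtracting the max avoids overflow
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # z_i^Phi: weighted sum of the neighbor vectors with alpha as weights
    dim = len(neighbors[0])
    z_i = [sum(al * h_j[k] for al, h_j in zip(alphas, neighbors)) for k in range(dim)]
    return z_i, alphas
```

With an all-zero `a` every neighbor receives the same weight, so the merged vector is the plain mean of the neighbor vectors; a non-zero `a` shifts weight toward neighbors with higher scores.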
In this step, if the graph attention neural network model is used to generate the credit risk value from the merged vectors, the processing is as shown in formula (5):
$$\left(\beta_{\Phi_1}, \ldots, \beta_{\Phi_n}\right) = \mathrm{att}_{2nd}\left(z_i^{\Phi_1}, \ldots, z_i^{\Phi_n}\right), \qquad Z = \sum_{p=1}^{n} \beta_{\Phi_p} z_i^{\Phi_p} \qquad (5)$$
where $z_i^{\Phi_1}, \ldots, z_i^{\Phi_n}$ represent the 7 merged vectors, $\beta_{\Phi_1}, \ldots, \beta_{\Phi_n}$ represent the correlation coefficients corresponding to the 7 merged vectors, and $\mathrm{att}_{2nd}$ represents the second-layer attention calculation, i.e., the operation given by formula (1). Here $n = 7$. $Z$ is the final output attention vector; it is generally fed into a fully connected network, which outputs a single value as the risk score.
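Formula (5) and the fully connected output can be sketched in the same style. The form of att_2nd is again an assumption (a HAN-style score q · tanh(z_p) with a hypothetical vector `q`), as is the single-unit fully connected layer with hypothetical weights `w`, bias `b`, and a sigmoid that maps the output into the range 0 to 1. The function name `risk_score` is illustrative.

```python
from __future__ import annotations

import math


def softmax(xs: list[float]) -> list[float]:
    top = max(xs)  # shift by the max for numerical stability
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def risk_score(
    merged: list[list[float]], q: list[float], w: list[float], b: float
) -> tuple[float, list[float]]:
    """Combine n merged vectors per formula (5), then score with one FC unit."""
    # Hypothetical att_2nd: one scalar score per merged vector z_p
    scores = [sum(qk * math.tanh(zk) for qk, zk in zip(q, z)) for z in merged]
    betas = softmax(scores)  # beta_{Phi_p}: correlation coefficients
    dim = len(merged[0])
    # Z: weighted sum of the merged vectors with beta as weights
    z_final = [sum(bt * z[k] for bt, z in zip(betas, merged)) for k in range(dim)]
    # Hypothetical fully connected layer; the sigmoid maps the logit into (0, 1)
    logit = sum(wk * zk for wk, zk in zip(w, z_final)) + b
    return 1.0 / (1.0 + math.exp(-logit)), betas
```

In a trained model `q`, `w` and `b` would be learned parameters; here they are passed in explicitly so the data flow of formula (5) stays visible.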
In the prior art, whether a user to be estimated is a high-risk user is judged by rules: a threshold representing high risk is determined from, for example, the overdue rate or the number of blacklist entries, and the user is classified against that threshold. The prior art can therefore only identify users whose risk value exceeds the threshold and cannot identify medium- or low-risk users. In the embodiment of the present application, a neural network model performs the calculation, and the output risk value can be any value in the range 0 to 1, so that low-, medium- and high-risk users can all be estimated.
Fig. 3 shows a feature vector processing device according to an embodiment of the present application. The device includes a determining module 301, a classification module 302 and an execution module 303, wherein:
the determining module 301 is configured to determine feature vectors of a user to be estimated and at least one target neighbor user, where the feature vectors include vectors corresponding to at least one type of feature information;
the classification module 302 is configured to classify the feature vectors according to the categories of feature information to which they correspond, so as to obtain a feature vector set for each feature information category;
the execution module 303 is configured to input each feature vector set into a preset neural network model to obtain a credit risk value of the user to be estimated; the neural network model calculates a merged vector for each feature vector set and determines the credit risk value from the merged vectors; calculating the merged vector of any feature vector set comprises: calculating the correlation coefficient of the feature vector of each target neighbor user in the set relative to the feature vector of the user to be estimated, and calculating the weighted sum of the feature vectors of the target neighbor users in the set, using the correlation coefficients as weights.
Optionally, the at least one type of feature information includes: node characteristic information, edge characteristic information and structural characteristic information; the vector corresponding to the at least one type of feature information comprises: a node feature vector, an edge feature vector, and a structural feature vector.
Optionally, the apparatus may further include:
the feature vector augmentation module is configured to, before each feature vector set is input into the preset neural network model: splice every two of the feature vectors of the user to be estimated to obtain spliced vectors of the user to be estimated; splice every two of the feature vectors of each target neighbor user to obtain spliced vectors corresponding to each target neighbor user; group the spliced vectors of the user to be estimated and of each target neighbor user that are obtained by splicing the same two feature vectors into one feature vector set, obtaining a plurality of feature vector sets; and form one feature vector set from the spliced vector obtained by splicing all feature vectors of the user to be estimated and the spliced vectors obtained by splicing all feature vectors of each target neighbor user.
Optionally, the neural network model includes: a first neural network model and a second neural network model; the first neural network model is connected with the second neural network model;
the neural network model calculating the merged vector of each feature vector set and determining the credit risk value from the merged vectors comprises: the first neural network model calculates the merged vector of each feature vector set; and the second neural network model determines the credit risk value from the merged vectors.
Optionally, the first neural network model is a graph attention neural network and the second neural network model is a graph convolutional neural network; the second neural network model determining the credit risk value from the merged vectors specifically includes: the second neural network model performs a convolution operation on the merged vectors to obtain the credit risk value.
Optionally, the first neural network model is a graph attention neural network and the second neural network model is also a graph attention neural network; the second neural network model determining the credit risk value from the merged vectors specifically includes: the second neural network model calculates a correlation coefficient for each merged vector and, using these correlation coefficients as weights, determines the credit risk value from the weighted sum of the merged vectors.
Optionally, the determining module 301 being configured to determine the feature vectors of the user to be estimated and at least one target neighbor user, the feature vectors including vectors corresponding to at least one type of feature information, includes:
the determining module 301 is specifically configured to: acquire a pre-established association network, which is obtained by taking historically acquired users as nodes and establishing association relations between the nodes according to the users' information; establish association relations between the user to be estimated and nodes in the association network according to the information of the user to be estimated, to obtain an updated association network; determine the subgraph of the user to be estimated in the updated association network; take the neighbor nodes at all levels of the user to be estimated in the subgraph as the target neighbor users; and map the at least one type of feature information of the user to be estimated and of each target neighbor user into feature vectors, to obtain the feature vectors of the user to be estimated and the at least one target neighbor user.
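The subgraph step can be sketched as a breadth-first search. This sketch assumes the association network is an undirected adjacency dict, that an association relation is created when two users share any attribute value (for example a phone number or device ID), and that the subgraph is bounded by a hypothetical hop limit `max_hops`; `add_user` and `target_neighbors` are illustrative names. The returned dict maps each target neighbor user to its level (hop distance).

```python
from __future__ import annotations

from collections import deque


def add_user(
    graph: dict[str, set[str]],
    user: str,
    info: set[str],
    user_info: dict[str, set[str]],
) -> None:
    """Link `user` to every existing node that shares an attribute value."""
    graph.setdefault(user, set())
    for other, other_info in user_info.items():
        if other != user and info & other_info:  # non-empty intersection
            graph[user].add(other)
            graph.setdefault(other, set()).add(user)
    user_info[user] = info  # record the new user's info after linking


def target_neighbors(
    graph: dict[str, set[str]], user: str, max_hops: int = 2
) -> dict[str, int]:
    """Breadth-first search: every node within `max_hops`, mapped to its level."""
    levels = {user: 0}
    queue = deque([user])
    while queue:
        node = queue.popleft()
        if levels[node] == max_hops:
            continue  # do not expand beyond the subgraph boundary
        for nb in graph.get(node, ()):
            if nb not in levels:
                levels[nb] = levels[node] + 1
                queue.append(nb)
    del levels[user]  # the user to be estimated is not its own neighbor
    return levels
```

The feature vectors of the user to be estimated and of every key in the returned dict would then be mapped from their feature information as the final step describes.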
The feature vector processing device comprises a processor and a memory, wherein the determining module 301, the classifying module 302, the executing module 303, and the like are stored in the memory as program units, and the processor executes the program units stored in the memory to realize corresponding functions.
The processor includes a kernel, which calls the corresponding program units from the memory. One or more kernels may be provided, and the problem of low accuracy of the determined credit risk value is addressed by adjusting the kernel parameters.
An embodiment of the present invention provides a storage medium on which a program is stored, the program implementing the processing method of the feature vector when executed by a processor.
The embodiment of the invention provides a processor, which is used for running a program, wherein the processing method of the feature vector is executed when the program runs.
An embodiment of the present invention provides an apparatus, as shown in Fig. 4, which includes at least one processor, and at least one memory and a bus connected to the processor; the processor and the memory communicate with each other through the bus; the processor is configured to call program instructions in the memory to execute the feature vector processing method. The device herein may be a server, a PC, a PAD, a mobile phone, etc.
The present application further provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps:
respectively determining feature vectors of a user to be estimated and at least one target neighbor user, wherein the feature vectors comprise vectors corresponding to at least one type of feature information;
classifying the feature vectors according to the categories of the feature information corresponding to the feature vectors to respectively obtain a feature vector set corresponding to each feature information category;
inputting each feature vector set into a preset neural network model to obtain a credit risk value of the user to be estimated; wherein the neural network model calculates a merged vector for each feature vector set and determines the credit risk value from the merged vectors; and calculating, by the neural network model, the merged vector of any feature vector set comprises: calculating the correlation coefficient of the feature vector of each target neighbor user in the feature vector set relative to the feature vector of the user to be estimated, and calculating the weighted sum of the feature vectors of the target neighbor users in the feature vector set using the correlation coefficients as weights.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a device includes one or more processors (CPUs), memory, and a bus. The device may also include input/output interfaces, network interfaces, and the like.
The memory may include volatile memory in a computer readable medium, Random Access Memory (RAM) and/or nonvolatile memory such as Read Only Memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip. The memory is an example of a computer-readable medium.
Computer-readable media, including both persistent and non-persistent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer readable medium does not include a transitory computer readable medium such as a modulated data signal and a carrier wave.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.
The functions described in the method of the embodiment of the present application, if implemented in the form of software functional units and sold or used as independent products, may be stored in a storage medium readable by a computing device. Based on such understanding, part of the contribution to the prior art of the embodiments of the present application or part of the technical solution may be embodied in the form of a software product stored in a storage medium and including several instructions for causing a computing device (which may be a personal computer, a server, a mobile computing device or a network device) to execute all or part of the steps of the method described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Features described in the embodiments of the present specification may be replaced with or combined with each other, each embodiment is described with a focus on differences from other embodiments, and the same or similar portions among the embodiments may be referred to each other.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for processing a feature vector, comprising:
respectively determining feature vectors of a user to be estimated and at least one target neighbor user, wherein the feature vectors comprise vectors corresponding to at least one type of feature information;
classifying the feature vectors according to the categories of the feature information corresponding to the feature vectors to respectively obtain a feature vector set corresponding to each feature information category;
inputting each feature vector set into a preset neural network model to obtain a credit risk value of the user to be estimated; wherein the neural network model calculates a merged vector for each feature vector set and determines the credit risk value from the merged vectors; and calculating, by the neural network model, the merged vector of any feature vector set comprises: calculating the correlation coefficient of the feature vector of each target neighbor user in the feature vector set relative to the feature vector of the user to be estimated, and calculating the weighted sum of the feature vectors of the target neighbor users in the feature vector set using the correlation coefficients as weights.
2. The method of claim 1, wherein the at least one type of feature information comprises: node characteristic information, edge characteristic information and structural characteristic information; the vector corresponding to the at least one type of feature information comprises: a node feature vector, an edge feature vector, and a structural feature vector.
3. The method of claim 2, wherein before inputting each set of feature vectors into the preset neural network model, further comprising:
splicing every two feature vectors in the feature vectors of the user to be estimated to obtain a spliced vector of the user to be estimated;
splicing every two of the feature vectors of each target neighbor user to obtain spliced vectors corresponding to each target neighbor user;
grouping the spliced vectors of the user to be estimated and of each target neighbor user that are obtained by splicing the same two feature vectors into one feature vector set, to obtain a plurality of feature vector sets;
and forming one feature vector set from the spliced vector obtained by splicing all feature vectors of the user to be estimated and the spliced vectors obtained by splicing all feature vectors of each target neighbor user.
4. The method of claim 1, wherein the neural network model comprises: a first neural network model and a second neural network model; the first neural network model is connected with the second neural network model;
wherein the neural network model calculating the merged vector of each feature vector set and determining the credit risk value from the merged vectors comprises:
the first neural network model calculating the merged vector of each feature vector set;
the second neural network model determining the credit risk value from the merged vectors.
5. The method of claim 4, wherein the first neural network model is a graph attention neural network; the second neural network model is a graph convolution neural network;
wherein the second neural network model determining the credit risk value from the merged vectors specifically comprises:
the second neural network model performing a convolution operation on the merged vectors to obtain the credit risk value.
6. The method of claim 4, wherein the first neural network model is a graph attention neural network; the second neural network model is a graph attention neural network;
wherein the second neural network model determining the credit risk value from the merged vectors specifically comprises:
the second neural network model calculating a correlation coefficient for each merged vector, taking these correlation coefficients as weights, and determining the credit risk value from the weighted sum of the merged vectors.
7. The method according to claim 1, wherein determining the feature vectors of the user to be estimated and the at least one target neighbor user respectively, the feature vectors including vectors corresponding to at least one type of feature information, comprises:
acquiring a pre-established association network, the association network being obtained by taking historically acquired users as nodes and establishing association relations between the nodes according to the users' information;
establishing association relations between the user to be estimated and nodes in the association network according to the information of the user to be estimated, to obtain an updated association network;
determining the subgraph of the user to be estimated in the updated association network;
taking the neighbor nodes at all levels of the user to be estimated in the subgraph as the target neighbor users;
and mapping the at least one type of feature information of the user to be estimated and of each target neighbor user into feature vectors, to obtain the feature vectors of the user to be estimated and the at least one target neighbor user.
8. An apparatus for processing feature vectors, comprising:
the determining module is configured to respectively determine feature vectors of a user to be estimated and at least one target neighbor user, wherein the feature vectors comprise vectors corresponding to at least one type of feature information;
the classification module is configured to classify the feature vectors according to the categories of the feature information corresponding to the feature vectors to respectively obtain a feature vector set corresponding to each feature information category;
the execution module is configured to input each feature vector set into a preset neural network model to obtain a credit risk value of the user to be estimated; wherein the neural network model calculates a merged vector for each feature vector set and determines the credit risk value from the merged vectors; and calculating, by the neural network model, the merged vector of any feature vector set comprises: calculating the correlation coefficient of the feature vector of each target neighbor user in the feature vector set relative to the feature vector of the user to be estimated, and calculating the weighted sum of the feature vectors of the target neighbor users in the feature vector set using the correlation coefficients as weights.
9. A storage medium comprising a stored program, wherein the program executes the method of processing a feature vector according to any one of claims 1 to 7.
10. An apparatus comprising at least one processor, and at least one memory and a bus connected to the processor; the processor and the memory communicate with each other through the bus; the processor is configured to call program instructions in the memory to execute the feature vector processing method according to any one of claims 1-7.
CN202011289894.2A 2020-11-17 2020-11-17 Feature vector processing method and related device Pending CN112200666A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011289894.2A CN112200666A (en) 2020-11-17 2020-11-17 Feature vector processing method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011289894.2A CN112200666A (en) 2020-11-17 2020-11-17 Feature vector processing method and related device

Publications (1)

Publication Number Publication Date
CN112200666A (en) 2021-01-08

Family

ID=74033555

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011289894.2A Pending CN112200666A (en) 2020-11-17 2020-11-17 Feature vector processing method and related device

Country Status (1)

Country Link
CN (1) CN112200666A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113191565A (en) * 2021-05-18 2021-07-30 同盾科技有限公司 Security prediction method, security prediction device, security prediction medium, and security prediction apparatus
CN114596097A (en) * 2022-05-10 2022-06-07 富算科技(上海)有限公司 User identification method, device, electronic equipment and computer readable storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113191565A (en) * 2021-05-18 2021-07-30 同盾科技有限公司 Security prediction method, security prediction device, security prediction medium, and security prediction apparatus
CN113191565B (en) * 2021-05-18 2023-04-07 同盾科技有限公司 Security prediction method, security prediction device, security prediction medium, and security prediction apparatus
CN114596097A (en) * 2022-05-10 2022-06-07 富算科技(上海)有限公司 User identification method, device, electronic equipment and computer readable storage medium

Similar Documents

Publication Publication Date Title
WO2021114974A1 (en) User risk assessment method and apparatus, electronic device, and storage medium
TWI728292B (en) Method and device for identifying suspicious money laundering gang
EP3703332B1 (en) Graph structure model training and junk account identification
WO2020024716A1 (en) Method and device for training prediction model for new scenario
JP6484730B2 (en) Collaborative filtering method, apparatus, server, and storage medium for fusing time factors
WO2019149059A1 (en) Method and apparatus for determining decision strategy corresponding to service and electronic device
CN110009474B (en) Credit risk assessment method and device and electronic equipment
CN107451854B (en) Method and device for determining user type and electronic equipment
TW201734893A (en) Method and apparatus for acquiring score credit and outputting feature vector value
CN110009486B (en) Method, system, equipment and computer readable storage medium for fraud detection
CN112200666A (en) Feature vector processing method and related device
CN114612743A (en) Deep learning model training method, target object identification method and device
CN114187112A (en) Training method of account risk model and determination method of risk user group
CN115035347A (en) Picture identification method and device and electronic equipment
CN113743678A (en) User credit score prediction method and related equipment
CN109597851B (en) Feature extraction method and device based on incidence relation
CN112232417A (en) Classification method and device, storage medium and terminal
CN112733743A (en) Model training method, data, image quality evaluation method and related device
CN111461892A (en) Method and device for selecting derived variables of risk identification model
JP5652250B2 (en) Image processing program and image processing apparatus
US20230137864A1 (en) Information processing method and device, and storage medium
CN116432106A (en) Data processing method, device, equipment and medium based on model self-distillation
CN117787995A (en) Suspicious group partner identification method and device and electronic equipment
CN110727861A (en) Method and equipment for microblog water army identification
CN111143552A (en) Text information category prediction method and device and server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: B7-7-2, Yuxing Plaza, No.5, Huangyang Road, Yubei District, Chongqing

Applicant after: Chongqing duxiaoman Youyang Technology Co.,Ltd.

Address before: 201800 room 307, 3 / F, building 8, 55 Huiyuan Road, Jiading District, Shanghai

Applicant before: SHANGHAI YOUYANG NEW MEDIA INFORMATION TECHNOLOGY Co.,Ltd.

TA01 Transfer of patent application right

Effective date of registration: 20211213

Address after: 100193 Room 606, 6 / F, building 4, West District, courtyard 10, northwest Wangdong Road, Haidian District, Beijing

Applicant after: Du Xiaoman Technology (Beijing) Co.,Ltd.

Address before: B7-7-2, Yuxing Plaza, No.5, Huangyang Road, Yubei District, Chongqing

Applicant before: Chongqing duxiaoman Youyang Technology Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20210108