CN110706092B - Risk user identification method and device, storage medium and electronic equipment - Google Patents


Info

Publication number
CN110706092B
Authority
CN
China
Prior art keywords
matrix
user
user set
users
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910901456.8A
Other languages
Chinese (zh)
Other versions
CN110706092A (en)
Inventor
何曲棠
罗广锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Feisuanzhi Technology (Shenzhen) Co.,Ltd.
Original Assignee
Qianhai Feisuan Technology Shenzhen Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qianhai Feisuan Technology Shenzhen Co ltd filed Critical Qianhai Feisuan Technology Shenzhen Co ltd
Priority to CN201910901456.8A priority Critical patent/CN110706092B/en
Publication of CN110706092A publication Critical patent/CN110706092A/en
Application granted granted Critical
Publication of CN110706092B publication Critical patent/CN110706092B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification

Abstract

The disclosure relates to a method and a device for identifying risk users, a storage medium and an electronic device. The method comprises: acquiring feature data of each user in a user set, the user set comprising risk sample users and a plurality of users to be identified; determining the similarity between users in the user set with the goal of minimizing the information entropy of the user set; clustering the user set, based on a spectral clustering algorithm and according to the similarity between users in the user set, so as to divide the user set into a plurality of clusters; and determining the risk users from the plurality of users to be identified according to the distribution information of the risk sample users in the plurality of clusters. With this technical scheme, the similarity between users in the user set is determined automatically, which improves the efficiency of the whole risk user identification process and the accuracy of the identification result, and reduces the labor cost of the whole process.

Description

Risk user identification method and device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for identifying a risky user, a storage medium, and an electronic device.
Background
In the business fields of credit, finance anti-fraud and the like, identification of risky users is generally required to reduce business risks.
In the related art, risk users are mostly identified by manually determining the similarity between users from the feature data of a plurality of users to be identified, and then identifying the risk users according to that similarity and the historical feature data of known risk users.
Disclosure of Invention
In order to overcome the problems in the related art, the present disclosure provides a method and apparatus for identifying a risky user, a storage medium, and an electronic device.
In order to achieve the above object, according to a first aspect of the embodiments of the present disclosure, there is provided a method for identifying a risky user, including:
acquiring characteristic data of each user in a user set, wherein the user set comprises risk sample users and a plurality of users to be identified;
determining the similarity between users in the user set with the goal of minimizing the information entropy of the user set, wherein the information entropy characterizes the uncertainty of the clustering result obtained by clustering the user set, and, in the information entropy, the probability that each user belongs to a cluster is the ratio of the sum of the similarities between that user and the other users in the user set to the sum of the similarities among all users in the user set;
based on a spectral clustering algorithm, clustering a user set according to the similarity between users in the user set so as to divide the user set into a plurality of clusters;
and determining the risk users from the plurality of users to be identified according to the distribution information of the risk sample users in the plurality of clusters.
Optionally, the information entropy is:
$$H(X) = -\sum_{i=1}^{n} p(x_i) \log p(x_i), \qquad p(x_i) = \frac{\sum_{j=1}^{n} W_{ij}}{\sum_{i=1}^{n} \sum_{j=1}^{n} W_{ij}}$$
wherein $H(X)$ is the information entropy; $p(x_i)$ is the probability that user $x_i$ in the user set belongs to a cluster; $n$ is the number of users in the user set; and $W_{ij}$ is the similarity between user $x_i$ and user $x_j$ in the user set.
Optionally, the clustering, based on a spectral clustering algorithm, the user set according to the similarity between users in the user set to divide the user set into a plurality of clusters includes:
respectively constructing a similarity matrix and a degree matrix according to the similarity between users in the user set, wherein elements in the similarity matrix are used for representing the similarity between two users in the user set, and elements in the degree matrix are used for representing the sum of the similarities between one user and other users in the user set;
constructing a target matrix according to at least the similarity matrix and the degree matrix, wherein each row vector of the target matrix represents a coordinate of one user in the user set in a feature space;
and clustering the dimension-reduced row vectors of the target matrix to divide the user set into a plurality of clusters.
Optionally, the constructing a target matrix according to at least the similarity matrix and the degree matrix includes:
constructing a Laplace matrix according to the similarity matrix and the degree matrix;
performing eigen-mapping on the Laplace matrix, and selecting as many eigenvalues as there are clusters;
constructing an eigenvector matrix from the eigenvectors corresponding to the selected eigenvalues;
and normalizing the row vectors of the eigenvector matrix to obtain the target matrix.
Optionally, the feature data of each user in the user set includes features of the user in multiple dimensions;
constructing a target matrix according to at least the similarity matrix and the degree matrix, including:
constructing a Laplace matrix according to the similarity matrix and the degree matrix;
determining at least one candidate dimension from the multiple dimensions, and combining features of each user in the user set under each candidate dimension to obtain a feature combination;
constructing a diagonal matrix, a first intermediate matrix and a second intermediate matrix from the feature combinations and the Laplace matrix, respectively, based on the following formulas:
Figure BDA0002211956470000031
where $U(j, j)$ is the diagonal matrix and $P_j$ is the $j$-th row of the projection matrix; $\alpha$, $\beta$ and $\gamma$ are preset adjustment parameters; $R$ is a correlation matrix characterizing the degree of correlation between the features in the feature combination $X$, with $R_{ij} = I(f_i, f_j)$ and $r_{ij} \in R$, where $r_{ij}$ is the mutual information between the feature in dimension $i$ and the feature in dimension $j$ of the feature combination $X$, and $r_{ij} \in [0, 1]$; $A$ is the first intermediate matrix; $H$ is the second intermediate matrix; $D$ is the degree matrix; and $L$ is the Laplace matrix;
selecting, from all the eigenvalues of the second intermediate matrix, as many eigenvalues as there are clusters;
constructing an eigenvector matrix, and a projection matrix corresponding to the eigenvector matrix, from the eigenvectors corresponding to the selected eigenvalues, the first intermediate matrix and the feature combination:
Figure BDA0002211956470000032
wherein $T$ is the eigenvector matrix; $c$ is the number of clusters; $v_1, v_2, \ldots, v_c$ are the eigenvectors corresponding to the selected eigenvalues; and $P$ is the projection matrix;
if the constructed projection matrix has not converged, repeating the steps from determining at least one candidate dimension from the multiple dimensions and combining the features of each user in the user set under each candidate dimension, through constructing the eigenvector matrix and the projection matrix corresponding to the eigenvector matrix, until the constructed projection matrix converges; and
when the constructed projection matrix has converged, normalizing the row vectors of the eigenvector matrix corresponding to the projection matrix to obtain the target matrix.
According to a second aspect of the embodiments of the present disclosure, there is provided an apparatus for identifying a risky user, including:
an acquisition module, configured to acquire feature data of each user in a user set, where the user set includes risk sample users and a plurality of users to be identified;
a first determining module, configured to determine the similarity between users in the user set with the goal of minimizing the information entropy of the user set, where the information entropy characterizes the uncertainty of the clustering result obtained by clustering the user set, and, in the information entropy, the probability that each user belongs to a cluster is the ratio of the sum of the similarities between that user and the other users in the user set to the sum of the similarities among all users in the user set;
the clustering module is used for clustering the user set according to the similarity among the users in the user set based on a spectral clustering algorithm so as to divide the user set into a plurality of clusters;
and the second determining module is used for determining the risk users from the plurality of users to be identified according to the distribution information of the risk sample users in the plurality of clusters.
Optionally, the information entropy is:
$$H(X) = -\sum_{i=1}^{n} p(x_i) \log p(x_i), \qquad p(x_i) = \frac{\sum_{j=1}^{n} W_{ij}}{\sum_{i=1}^{n} \sum_{j=1}^{n} W_{ij}}$$
wherein $H(X)$ is the information entropy; $p(x_i)$ is the probability that user $x_i$ in the user set belongs to a cluster; $n$ is the number of users in the user set; and $W_{ij}$ is the similarity between user $x_i$ and user $x_j$ in the user set.
Optionally, the clustering module comprises:
a first constructing submodule, configured to construct a similarity matrix and a degree matrix according to the similarity between users in the user set, where each element of the similarity matrix characterizes the similarity between two users in the user set, and each element of the degree matrix characterizes the sum of the similarities between one user in the user set and the other users;
a second constructing submodule, configured to construct a target matrix according to at least the similarity matrix and the degree matrix, where each row vector of the target matrix represents a coordinate of one user in the user set in a feature space;
and a clustering submodule, configured to cluster the dimension-reduced row vectors of the target matrix so as to divide the user set into a plurality of clusters.
Optionally, the second constructing submodule is configured to construct the target matrix as follows:
constructing a Laplace matrix according to the similarity matrix and the degree matrix;
performing eigen-mapping on the Laplace matrix, and selecting as many eigenvalues as there are clusters;
constructing an eigenvector matrix from the eigenvectors corresponding to the selected eigenvalues;
and normalizing the row vectors of the eigenvector matrix to obtain the target matrix.
Optionally, the feature data of each user in the user set includes features of the user in multiple dimensions, and the second constructing sub-module is configured to construct the target matrix according to the following manner:
constructing a Laplace matrix according to the similarity matrix and the degree matrix;
determining at least one candidate dimension from the multiple dimensions, and combining features of each user in the user set under each candidate dimension to obtain a feature combination;
constructing a diagonal matrix, a first intermediate matrix and a second intermediate matrix from the feature combinations and the Laplace matrix, respectively, based on the following formulas:
Figure BDA0002211956470000061
where $U(j, j)$ is the diagonal matrix and $P_j$ is the $j$-th row of the projection matrix; $\alpha$, $\beta$ and $\gamma$ are preset adjustment parameters; $R$ is a correlation matrix characterizing the degree of correlation between the features in the feature combination $X$, with $R_{ij} = I(f_i, f_j)$ and $r_{ij} \in R$, where $r_{ij}$ is the mutual information between the feature in dimension $i$ and the feature in dimension $j$ of the feature combination $X$, and $r_{ij} \in [0, 1]$; $A$ is the first intermediate matrix; $H$ is the second intermediate matrix; $D$ is the degree matrix; and $L$ is the Laplace matrix;
selecting, from all the eigenvalues of the second intermediate matrix, as many eigenvalues as there are clusters;
constructing an eigenvector matrix, and a projection matrix corresponding to the eigenvector matrix, from the eigenvectors corresponding to the selected eigenvalues, the first intermediate matrix and the feature combination:
Figure BDA0002211956470000062
wherein $T$ is the eigenvector matrix; $c$ is the number of clusters; $v_1, v_2, \ldots, v_c$ are the eigenvectors corresponding to the selected eigenvalues; and $P$ is the projection matrix;
if the constructed projection matrix has not converged, repeating the steps from determining at least one candidate dimension from the multiple dimensions and combining the features of each user in the user set under each candidate dimension, through constructing the eigenvector matrix and the projection matrix corresponding to the eigenvector matrix, until the constructed projection matrix converges; and
when the constructed projection matrix has converged, normalizing the row vectors of the eigenvector matrix corresponding to the projection matrix to obtain the target matrix.
According to a third aspect of embodiments of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of the first aspect.
According to a fourth aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including: a memory having a computer program stored thereon; a processor for executing the computer program in the memory to implement the steps of the method of the first aspect.
The above technical scheme achieves at least the following technical effects: by taking the minimization of the information entropy of the user set as the goal, the similarity between users in the user set can be determined automatically, which is more efficient and more accurate than determining the similarity manually and saves labor cost. This in turn improves the efficiency of the whole risk user identification process and the accuracy of the identification result, and reduces the labor cost of the whole process.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure. In the drawings:
FIG. 1 is a flow chart illustrating a method of risky user identification according to an exemplary embodiment of the present disclosure;
FIG. 2 is a flow chart illustrating a method of clustering a set of users according to an exemplary embodiment of the present disclosure;
FIG. 3 is a block diagram illustrating an at risk user identification device according to an exemplary embodiment of the present disclosure;
FIG. 4 is a block diagram illustrating an at risk user identification device according to another exemplary embodiment of the present disclosure;
FIG. 5 illustrates a block diagram of an electronic device in accordance with an exemplary embodiment of the present disclosure.
Detailed Description
The following detailed description of specific embodiments of the present disclosure is provided in connection with the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present disclosure, are given by way of illustration and explanation only, not limitation.
It is worth noting that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described figures are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
In the related art, risk users are mostly identified by manually determining the similarity between users from the feature data of a plurality of users to be identified, and then identifying the risk users according to that similarity and the historical feature data of known risk users.
However, since the similarity between users depends on the experience and efficiency of the operator, the calculation efficiency and accuracy of the similarity between users are affected, and further, the efficiency of the identification process of the risk users and the accuracy of the identification result are affected. Moreover, the whole process requires manual work, so that the labor cost is high.
In view of this, the present disclosure provides a method and an apparatus for identifying a risky user, a storage medium, and an electronic device, so as to automatically identify the risky user based on respective feature data of a plurality of users to be identified, improve efficiency and accuracy of identifying the risky user, and reduce labor cost.
Fig. 1 is a flowchart illustrating a method of risk user identification according to an exemplary embodiment of the present disclosure. Referring to fig. 1, the method includes the steps of:
S101, acquiring feature data of each user in the user set.
The user set comprises risk sample users and a plurality of users to be identified.
Specifically, the feature data of each user can be customized according to the business scenario in which the method is applied. For example, in a credit business scenario, the feature data of each user may include features of the user in different dimensions, such as age, educational background, income, credit information, and the like.
S102, determining the similarity between users in the user set with the goal of minimizing the information entropy of the user set.
The information entropy of the user set is used for representing the uncertainty of a clustering result obtained by clustering the user set. The clustering result includes a plurality of clusters obtained by dividing the user set and user information included in each cluster, including but not limited to a young couple cluster, a parent-child cluster, a brother cluster, and the like.
In the information entropy, the probability that each user belongs to a cluster is the ratio of the sum of the similarities between that user and the other users to the sum of the similarities among all users in the user set.
For example, the information entropy of the user set may be given by the following formula (1).
$$H(X) = -\sum_{i=1}^{n} p(x_i) \log p(x_i), \qquad p(x_i) = \frac{\sum_{j=1}^{n} W_{ij}}{\sum_{i=1}^{n} \sum_{j=1}^{n} W_{ij}} \quad (1)$$
where $H(X)$ is the information entropy of the user set; $p(x_i)$ is the probability that user $x_i$ in the user set belongs to a cluster; $n$ is the number of users contained in the user set; and $W_{ij}$ is the similarity between user $x_i$ and user $x_j$ in the user set, with $W = (W_{ij}) \in R^{n \times n}$ and
Figure BDA0002211956470000092
where $d(x_i, x_j)$ is the Euclidean distance between user $x_i$ and user $x_j$ in the user set, and $\sigma$ is the width parameter.
In a specific implementation, the width parameter σ may be calculated according to the above formula (1), and the similarity between users may be further calculated according to the width parameter σ.
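As a minimal sketch of step S102, the following Python code evaluates the information entropy for several candidate width parameters and keeps the one with the smallest entropy. Several points are assumptions that the text above does not state: the similarity is taken to be a Gaussian kernel of the form exp(-d²/(2σ²)), self-similarities are set to zero so that each row sum runs over the other users, and σ is chosen by a simple grid search over hypothetical candidate values. The function names are illustrative only.

```python
import numpy as np

def pairwise_sq_dists(X):
    # Squared Euclidean distance between every pair of rows (users) in X.
    diff = X[:, None, :] - X[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def similarity_matrix(X, sigma):
    # ASSUMPTION: Gaussian kernel exp(-d^2 / (2 sigma^2)); the exact kernel
    # is not recoverable from the text. Self-similarity is zeroed out.
    W = np.exp(-pairwise_sq_dists(X) / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def information_entropy(W):
    # p(x_i) = sum_j W_ij / sum_i sum_j W_ij ; H(X) = -sum_i p(x_i) log p(x_i)
    total = W.sum()
    if total <= 0:
        return float("inf")  # all similarities underflowed to zero
    p = W.sum(axis=1) / total
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def choose_sigma(X, candidates):
    # ASSUMPTION: grid search; the text only says sigma follows from formula (1).
    entropies = [information_entropy(similarity_matrix(X, s)) for s in candidates]
    best = int(np.argmin(entropies))
    return candidates[best], entropies[best]

# Hypothetical feature data: four users described by two features each.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
sigma, entropy = choose_sigma(X, [0.5, 1.0, 2.0])
W = similarity_matrix(X, sigma)
```

A grid search is used here only because it is easy to follow; any one-dimensional optimizer over σ could be substituted.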
S103, based on a spectral clustering algorithm, clustering is carried out on the user set according to the similarity among the users in the user set, so that the user set is divided into a plurality of clusters.
Specifically, a user relationship graph may be constructed according to the similarity between users in the user set. For example, the user relationship graph may be represented by an undirected weighted graph G = (V, E), where V is the user set (including the risk sample users and the plurality of users to be identified) and E is the edge set of the user relationship graph G; the weight of each edge characterizes the similarity between the two users connected by that edge.
Further, after the user relationship graph is constructed, the user relationship graph can be cut based on a spectral clustering algorithm, and then the users in the user set are divided into a plurality of clusters. In the specific implementation, as shown in fig. 2, the following steps can be performed:
S131, constructing a similarity matrix and a degree matrix according to the similarity between users in the user set.
The elements in the similarity matrix are used for representing the similarity between two users in the user set, and the elements in the degree matrix are used for representing the sum of the similarities between one user in the user set and all other users.
S132, constructing a target matrix at least according to the similarity matrix and the degree matrix.
Wherein each row vector of the target matrix characterizes coordinates of one user in the set of users in the feature space.
In an alternative implementation, the Laplacian matrix may first be constructed from the similarity matrix and the degree matrix. Then, eigen-mapping is performed on the Laplacian matrix: all eigenvalues of the Laplacian matrix and the eigenvector corresponding to each eigenvalue are calculated, and as many eigenvalues as there are clusters are selected from them. Finally, an eigenvector matrix is constructed from the eigenvectors corresponding to the selected eigenvalues, and the eigenvector matrix is normalized to obtain the target matrix. For example, the Laplacian matrix $L$ may first be constructed according to formula (2); the $k$ smallest eigenvalues are then selected from all eigenvalues of $L$, the corresponding eigenvectors $v_1, v_2, \ldots, v_k$ are used as column vectors to construct the eigenvector matrix $V = [v_1, v_2, \ldots, v_k] \in R^{n \times k}$, and the eigenvector matrix is normalized according to formula (3) to obtain the target matrix $Y$.
$$L = D^{-1/2} W D^{-1/2} \quad (2)$$
Figure BDA0002211956470000101
where $k$ is the number of clusters; $L$ is the Laplacian matrix; $D$ is the degree matrix; $W$ is the similarity matrix; and $Y_{ij}$ are the elements of the target matrix $Y$.
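Steps S131 and S132 of this first implementation can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented formulas: the sketch works with the symmetric normalized Laplacian I - D^{-1/2} W D^{-1/2}, whose k smallest eigenvalues correspond to the k largest eigenvalues of D^{-1/2} W D^{-1/2}, and it assumes that the normalization of formula (3) scales each row of the eigenvector matrix to unit Euclidean length. The function name is hypothetical.

```python
import numpy as np

def spectral_embedding(W, k):
    # Degree of each user: sum of similarities to the other users
    # (these values are the diagonal of the degree matrix D).
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)  # assumes every user has positive degree
    # D^{-1/2} W D^{-1/2}, computed without forming D explicitly.
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # ASSUMPTION: use I - M so that the k smallest eigenvalues are the useful ones.
    L_sym = np.eye(len(W)) - M
    # eigh returns eigenvalues of a symmetric matrix in ascending order.
    eigvals, eigvecs = np.linalg.eigh(L_sym)
    V = eigvecs[:, :k]  # eigenvectors of the k smallest eigenvalues as columns
    # ASSUMPTION: formula (3) is taken to be row-wise unit-length normalization.
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    return V / np.where(norms > 0, norms, 1.0)

# Hypothetical similarity matrix with two disconnected pairs of users.
W = np.array([[0.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])
Y = spectral_embedding(W, k=2)
```

Each row of `Y` is the coordinate of one user in the reduced feature space; users in the same connected group end up with the same row.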
In another alternative implementation, the feature data of each user in the user set includes features of the user in multiple dimensions. Accordingly, the Laplacian matrix may first be constructed from the similarity matrix and the degree matrix. Then, at least one candidate dimension is determined from the multiple dimensions, and the features of each user in the user set under each candidate dimension are combined to obtain a feature combination. Next, a diagonal matrix, a first intermediate matrix and a second intermediate matrix are constructed from the feature combination and the Laplacian matrix. Next, as many eigenvalues as there are clusters are selected from all eigenvalues of the second intermediate matrix, and an eigenvector matrix and a projection matrix corresponding to the eigenvector matrix are constructed from the eigenvectors corresponding to the selected eigenvalues, the first intermediate matrix and the feature combination. Finally, it is judged whether the constructed projection matrix has converged. If it has not converged, the features in the selected feature combination are weakly correlated and redundant features exist, so the steps from determining at least one candidate dimension through constructing the eigenvector matrix and its corresponding projection matrix are repeated until the constructed projection matrix converges. Convergence indicates that the features in the selected feature combination are correlated and that the combination is a good one; in that case, the row vectors of the eigenvector matrix corresponding to the projection matrix are normalized to obtain the target matrix.
In this implementation, for example, the Laplacian matrix may be constructed according to formula (2) above, the diagonal matrix, the first intermediate matrix and the second intermediate matrix may be constructed according to formula (4) below, and the eigenvector matrix and its corresponding projection matrix may be constructed according to formula (5) below. As for the selection of eigenvalues of the second intermediate matrix, the $c$ smallest eigenvalues may be selected from all of its eigenvalues, and the corresponding eigenvectors $v_1, v_2, \ldots, v_c$ used as column vectors to construct the eigenvector matrix $V = [v_1, v_2, \ldots, v_c] \in R^{n \times c}$.
Figure BDA0002211956470000111
Figure BDA0002211956470000112
where $U(j, j)$ is the diagonal matrix and $P_j$ is the $j$-th row of the projection matrix; $\alpha$, $\beta$ and $\gamma$ are preset adjustment parameters; $R$ is a correlation matrix characterizing the degree of correlation between the features in the feature combination $X$, with $R_{ij} = I(f_i, f_j)$ and $r_{ij} \in R$, where $r_{ij}$ is the mutual information between the feature in dimension $i$ and the feature in dimension $j$ of the feature combination $X$, and $r_{ij} \in [0, 1]$; $A$ is the first intermediate matrix; $H$ is the second intermediate matrix; $D$ is the degree matrix; $L$ is the Laplacian matrix; $T$ is the eigenvector matrix; $c$ is the number of clusters; $v_1, v_2, \ldots, v_c$ are the eigenvectors corresponding to the selected eigenvalues; and $P$ is the projection matrix.
With this implementation, the features of the users in the user set can be selected automatically across different dimensions, which avoids the dimension explosion caused by an excessively high feature dimension. In addition, because the feature selection process uses both the correlation matrix, which characterizes the degree of correlation between the features in the feature combination, and the projection matrix, the correlation and the projection relationship between features are taken into account. The selected feature combination is therefore better, the clustering result obtained by clustering the users according to it is more accurate, and the risk users identified from that clustering result are more accurate as well.
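The correlation matrix R used in this implementation can be illustrated with a small Python sketch that computes pairwise mutual information between discrete features. Two assumptions are made that the text does not confirm: features are already discretized into categories, and the mutual information is scaled into [0, 1] by dividing by the geometric mean of the two feature entropies. The diagonal, first intermediate and second intermediate matrices of formula (4) are not reproduced here, and all function names are hypothetical.

```python
import numpy as np

def _entropy(counts):
    # Shannon entropy (natural log) of a discrete distribution given by counts.
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))

def mutual_information(a, b):
    # I(a; b) = H(a) + H(b) - H(a, b) for two discrete feature columns.
    _, ca = np.unique(a, return_counts=True)
    _, cb = np.unique(b, return_counts=True)
    _, cab = np.unique(np.stack([a, b], axis=1), axis=0, return_counts=True)
    return _entropy(ca) + _entropy(cb) - _entropy(cab)

def correlation_matrix(F):
    # F has one row per user and one column per (discrete) feature dimension.
    m = F.shape[1]
    _, counts = zip(*(np.unique(F[:, i], return_counts=True) for i in range(m)))
    h = [_entropy(c) for c in counts]
    R = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            denom = np.sqrt(h[i] * h[j])
            # ASSUMPTION: normalize by sqrt(H_i * H_j) so that r_ij lies in [0, 1].
            if denom > 0:
                R[i, j] = mutual_information(F[:, i], F[:, j]) / denom
    return np.clip(R, 0.0, 1.0)

# Hypothetical data: features 0 and 1 are identical, feature 2 is independent of both.
F = np.array([[0, 0, 0],
              [0, 0, 1],
              [1, 1, 0],
              [1, 1, 1]])
R = correlation_matrix(F)
```

A value of `R[i, j]` close to 1 marks feature j as redundant given feature i, which is the kind of information the selection procedure relies on.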
S133, clustering the dimension-reduced row vectors of the target matrix so as to divide the user set into a plurality of clusters.
Specifically, the dimension-reduced row vectors of the target matrix may be clustered with the k-means algorithm or another clustering algorithm known in the art; if the $i$-th row of the target matrix is assigned to class $a$, the corresponding user $x_i$ in the user set is assigned to class $a$.
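Step S133 can be sketched with a plain k-means loop over the rows of the target matrix. The sketch below is an assumption-laden illustration: it uses a deterministic farthest-point initialization (chosen here only to keep the example reproducible) and a hypothetical function name; any standard k-means implementation could be used instead.

```python
import numpy as np

def kmeans_rows(Y, k, n_iter=100):
    Y = np.asarray(Y, dtype=float)
    # ASSUMPTION: deterministic farthest-point initialization for reproducibility.
    centers = [Y[0]]
    for _ in range(1, k):
        d = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[int(np.argmax(d))])
    centers = np.array(centers)
    labels = None
    for _ in range(n_iter):
        # Assign every row to its nearest center.
        dists = np.sum((Y[:, None, :] - centers[None, :, :]) ** 2, axis=2)
        new_labels = np.argmin(dists, axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        # Move each center to the mean of the rows assigned to it.
        for c in range(k):
            members = Y[labels == c]
            if len(members) > 0:
                centers[c] = members.mean(axis=0)
    return labels

# Hypothetical target matrix: rows 0-1 and rows 2-3 coincide pairwise.
Y = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
labels = kmeans_rows(Y, k=2)
```

The label of row i is the cluster assigned to user x_i.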
S104, determining the risk users from the plurality of users to be identified according to the distribution information of the risk sample users in the plurality of clusters obtained by the division.
The similarity of users in the same cluster is higher, while the similarity of users in different clusters is lower. Therefore, the risk user can be determined from the users to be identified according to the distribution information of the risk sample users in the plurality of clusters obtained by division.
In an alternative implementation manner, one risk sample user may be included in the user set, in which case, a cluster to which the risk sample user belongs may be used as a risk cluster, and a user to be identified in the risk cluster may be determined as a risk user.
In another alternative implementation manner, the user set may include a plurality of risk sample users, a proportion of the risk sample users in each cluster to total users in the cluster may be calculated according to a distribution situation of each risk sample user in a plurality of clusters, a cluster with a highest calculated proportion is used as a risk cluster, and a user to be identified in the cluster is determined as a risk user.
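The two alternatives of step S104 can be sketched together in Python: the cluster with the highest proportion of risk sample users is treated as the risk cluster, which reduces to the cluster containing the sample when there is only one. The data structures (a dictionary from user identifier to cluster label) and the function name are assumptions made for illustration; ties between clusters are broken by whichever cluster is encountered first.

```python
from collections import Counter

def find_risk_users(labels, risk_sample_ids):
    # labels: dict mapping user id -> cluster id for every user in the user set.
    cluster_sizes = Counter(labels.values())
    risk_counts = Counter(labels[u] for u in risk_sample_ids)
    # Proportion of risk sample users among all users of each cluster.
    ratios = {c: risk_counts.get(c, 0) / size for c, size in cluster_sizes.items()}
    risk_cluster = max(ratios, key=ratios.get)
    risk_set = set(risk_sample_ids)
    # Users to be identified that fall into the risk cluster.
    flagged = sorted(u for u, c in labels.items()
                     if c == risk_cluster and u not in risk_set)
    return risk_cluster, flagged

# Hypothetical clustering result with three risk sample users.
labels = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
risk_cluster, flagged = find_risk_users(labels, ["a", "b", "d"])
```

In this example cluster 0 holds two of its three users as risk samples, so its remaining user is the one flagged.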
With the above risk user identification method, the similarity between the users in the user set can be determined automatically by taking minimization of the information entropy of the user set as the target. Compared with determining the similarity between users manually, this is more efficient and more accurate and saves labor cost. In turn, the efficiency of the whole risk user identification process and the accuracy of the identification result are improved, and the labor cost of the whole process is reduced.
Fig. 3 is a block diagram illustrating an apparatus for identifying a risk user according to an exemplary embodiment of the present disclosure. Referring to fig. 3, the apparatus 300 includes:
an obtaining module 301, configured to obtain feature data of each user in a user set, where the user set includes a risk sample user and multiple users to be identified;
a first determining module 302, configured to determine similarity between users in the user set with a goal of minimizing an information entropy of the user set, where the information entropy is used to represent an uncertainty of a clustering result obtained by clustering the user set, and a probability that each user in the information entropy belongs to a cluster is a ratio of a sum of similarities between the user and other users in the user set to a sum of similarities between the users in the user set;
a clustering module 303, configured to perform clustering processing on a user set according to similarity between users in the user set based on a spectral clustering algorithm, so as to divide the user set into multiple clusters;
a second determining module 304, configured to determine a risk user from the multiple users to be identified according to distribution information of the risk sample user in the multiple clusters.
Optionally, the information entropy is:
Figure BDA0002211956470000141
where $H(X)$ is the information entropy; $p(x_i)$ is the probability that user $x_i$ in the user set belongs to a cluster; $n$ is the number of users in the user set; and $w_{ij}$ is the similarity between user $x_i$ and user $x_j$ in the user set.
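The formula itself is an image that is not reproduced in this text. As a hedged illustration only, the sketch below assumes the standard Shannon form $H(X) = -\sum_{i=1}^{n} p(x_i)\log p(x_i)$ with the natural logarithm, and takes $p(x_i)$ as defined in the text: the sum of the similarities between $x_i$ and the other users, divided by the sum of the similarities over the user set. The function names and the exclusion of self-similarity from both sums are assumptions of this sketch.

```python
import math

def cluster_probabilities(W):
    """p(x_i) = sum_{j != i} w_ij / sum_i sum_{j != i} w_ij."""
    n = len(W)
    row_sums = [sum(W[i][j] for j in range(n) if j != i) for i in range(n)]
    total = sum(row_sums)
    return [r / total for r in row_sums]

def user_set_entropy(W):
    """Shannon entropy of the probabilities above (assumed form)."""
    # Terms with p = 0 contribute nothing, by the usual 0 * log 0 = 0 rule.
    return -sum(p * math.log(p) for p in cluster_probabilities(W) if p > 0)
```

Under these assumptions the entropy is largest when every user is equally similar to every other user, and it decreases as the similarity mass concentrates on fewer users, which is consistent with using it as a measure of clustering uncertainty.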
Optionally, as shown in fig. 4, the clustering module 303 includes:
a first constructing submodule 331, configured to respectively construct a similarity matrix and a degree matrix according to the similarities between users in the user set, where an element in the similarity matrix characterizes the similarity between two users in the user set, and an element in the degree matrix characterizes the sum of the similarities between one user in the user set and the other users;
a second constructing submodule 332, configured to construct a target matrix according to at least the similarity matrix and the degree matrix, where each row vector of the target matrix represents a coordinate of one user in the user set in a feature space;
a clustering submodule 333, configured to perform clustering processing on the dimension-reduced row vectors of the target matrix, so as to divide the user set into a plurality of clusters.
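The construction performed by the first constructing submodule can be sketched as below. The input format (a dict of pairwise similarities keyed by user index pairs) and the function name `similarity_and_degree` are hypothetical; how the similarities themselves are obtained is covered by the entropy-minimization step and is not repeated here.

```python
def similarity_and_degree(n, pair_sim):
    """Build the similarity matrix W and the degree matrix D.

    n:        number of users in the user set
    pair_sim: dict mapping an index pair (i, j), i != j, to the similarity w_ij
    """
    W = [[0.0] * n for _ in range(n)]
    for (i, j), w in pair_sim.items():
        W[i][j] = W[j][i] = w  # similarity is treated as symmetric
    # D is diagonal: d_ii is the sum of similarities between user i and the others.
    D = [[sum(W[i]) if i == j else 0.0 for j in range(n)] for i in range(n)]
    return W, D
```

Each off-diagonal entry of `W` corresponds to one pair of users, and each diagonal entry of `D` is the row sum of `W` for that user.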
Optionally, the second construction sub-module 332 is configured to construct the target matrix according to the following manner:
constructing a Laplace matrix according to the similarity matrix and the degree matrix;
performing eigendecomposition on the Laplace matrix, and selecting a number of eigenvalues equal to the number of clusters;
constructing an eigenvector matrix from the eigenvectors corresponding to the selected eigenvalues;
and normalizing the row vectors of the eigenvector matrix to obtain the target matrix.
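The four steps above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the disclosed implementation: it assumes the unnormalized Laplacian $L = D - W$ and assumes that the selected eigenvalues are the $c$ smallest ones, neither of which is stated in this passage. The function name `spectral_embedding` is hypothetical.

```python
import numpy as np

def spectral_embedding(W, c):
    """Return the target matrix: one unit-length row per user.

    W: symmetric similarity matrix, shape (n, n)
    c: number of clusters
    """
    W = np.asarray(W, dtype=float)
    D = np.diag(W.sum(axis=1))          # degree matrix
    L = D - W                           # Laplace matrix (assumed unnormalized)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    _, vecs = np.linalg.eigh(L)
    T = vecs[:, :c]                     # eigenvectors of the c smallest eigenvalues
    norms = np.linalg.norm(T, axis=1, keepdims=True)
    norms[norms == 0] = 1.0             # avoid dividing an all-zero row by zero
    return T / norms                    # row normalization gives the target matrix
```

The rows of the returned matrix are the coordinates that the clustering submodule would then cluster, for example with k-means.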
Optionally, the feature data of each user in the user set includes features of the user in multiple dimensions, and the second constructing sub-module 332 is configured to construct the target matrix according to the following manner:
constructing a Laplace matrix according to the similarity matrix and the degree matrix;
determining at least one candidate dimension from the multiple dimensions, and combining features of each user in the user set under each candidate dimension to obtain a feature combination;
constructing a diagonal matrix, a first intermediate matrix and a second intermediate matrix from the feature combinations and the Laplace matrix, respectively, based on the following formulas:
Figure BDA0002211956470000151
where $U(j, j)$ is the diagonal matrix and $P_j$ is the $j$-th row of the projection matrix; $\alpha$, $\beta$ and $\gamma$ are preset tuning parameters; $R$ is a correlation matrix characterizing the degree of correlation between the features in the feature combination $X$, with $R_{ij} = I(f_i, f_j)$ and $r_{ij} \in R$, where $r_{ij}$ denotes the mutual information between the feature in dimension $i$ and the feature in dimension $j$ of the feature combination $X$, and $r_{ij} \in [0, 1]$; $A$ is the first intermediate matrix; $H$ is the second intermediate matrix; $D$ is the degree matrix; and $L$ is the Laplace matrix;
selecting, from all the eigenvalues of the second intermediate matrix, a number of eigenvalues equal to the number of clusters;
constructing an eigenvector matrix and a corresponding projection matrix, respectively, according to the eigenvectors corresponding to the selected eigenvalues, the first intermediate matrix and the feature combination:
Figure BDA0002211956470000152
where $T$ is the eigenvector matrix and $c$ is the number of clusters; $v_1, v_2, \ldots, v_c$ are the eigenvectors corresponding to the selected eigenvalues; and $P$ is the projection matrix;
if the constructed projection matrix has not converged, repeating the steps from determining at least one candidate dimension from the multiple dimensions and combining the features of each user in the user set under each candidate dimension, through constructing the eigenvector matrix and the corresponding projection matrix, until the constructed projection matrix converges; and
once the constructed projection matrix has converged, normalizing the row vectors of the eigenvector matrix corresponding to that projection matrix to obtain the target matrix.
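The formulas for the diagonal matrix, the intermediate matrices and the projection matrix are images that are not reproduced in this text, so they are not sketched here. The correlation matrix $R$, however, is described in words: its entries are mutual information values in $[0, 1]$ between pairs of features. The sketch below shows one way such a matrix could be computed for discrete features. The normalization by $\sqrt{H(f_i)\,H(f_j)}$ is an assumption made here to keep the values in $[0, 1]$; the disclosure does not state which normalization, if any, is used, and all function names are hypothetical.

```python
import math
from collections import Counter

def entropy(xs):
    """Shannon entropy of a discrete feature column."""
    n = len(xs)
    return -sum((c / n) * math.log(c / n) for c in Counter(xs).values())

def mutual_information(xs, ys):
    """I(X; Y) = sum over (x, y) of p(x, y) * log(p(x, y) / (p(x) p(y)))."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def correlation_matrix(features):
    """features: list of columns, one per candidate dimension.

    Returns R with r_ij = I(f_i, f_j) / sqrt(H(f_i) * H(f_j)) (assumed scaling).
    """
    m = len(features)
    h = [entropy(f) for f in features]
    R = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            if h[i] > 0 and h[j] > 0:  # constant features carry no information
                mi = mutual_information(features[i], features[j])
                R[i][j] = mi / math.sqrt(h[i] * h[j])
    return R
```

With this scaling, two identical features give an entry of 1 and two statistically independent features give an entry of 0.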
It is obvious to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional modules is merely used as an example, and in practical applications, the above function distribution may be performed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to perform all or part of the above described functions.
In addition, with regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated herein.
With the above apparatus, the similarity between the users in the user set can be determined automatically by taking minimization of the information entropy of the user set as the target. Compared with determining the similarity between users manually, this is more efficient and more accurate and saves labor cost. In turn, the efficiency of the whole risk user identification process and the accuracy of the identification result are improved, and the labor cost of the whole process is reduced.
Fig. 5 is a block diagram illustrating an electronic device 500 according to an exemplary embodiment of the present disclosure. For example, the electronic device 500 may be provided as a server. Referring to fig. 5, the electronic device 500 comprises a processor 522, which may be one or more in number, and a memory 532 for storing computer programs executable by the processor 522. The computer programs stored in memory 532 may include one or more modules that each correspond to a set of instructions. Further, the processor 522 may be configured to execute the computer program to perform the above-described risk user identification method.
Additionally, the electronic device 500 may also include a power component 526 and a communication component 550. The power component 526 may be configured to perform power management of the electronic device 500, and the communication component 550 may be configured to enable communication, e.g., wired or wireless communication, of the electronic device 500. In addition, the electronic device 500 may also include input/output (I/O) interfaces 558. The electronic device 500 may operate based on an operating system stored in memory 532, such as Windows Server, Mac OS X™, Unix™, Linux, and the like.
In another exemplary embodiment, a computer readable storage medium comprising program instructions which, when executed by a processor, implement the steps of the above-described risk user identification method is also provided. For example, the computer readable storage medium may be the memory 532 described above including program instructions that are executable by the processor 522 of the electronic device 500 to perform the risk user identification method described above.
The preferred embodiments of the present disclosure are described in detail above with reference to the accompanying drawings. However, the present disclosure is not limited to the specific details of the above embodiments. Various simple modifications may be made to the technical solution of the present disclosure within its technical idea, and these simple modifications all fall within the protection scope of the present disclosure.
It should be noted that the various features described in the above embodiments may be combined in any suitable manner without departing from the scope of the invention. In order to avoid unnecessary repetition, various possible combinations will not be separately described in this disclosure.
In addition, any combination of various embodiments of the present disclosure may be made, and the same should be considered as the disclosure of the present disclosure, as long as it does not depart from the spirit of the present disclosure.

Claims (6)

1. A method for identifying an at-risk user, comprising:
acquiring characteristic data of each user in a user set, wherein the user set comprises risk sample users and a plurality of users to be identified;
determining similarity among users in the user set by taking the minimized information entropy of the user set as a target, wherein the information entropy is used for representing the uncertainty of a clustering result obtained by clustering the user set, and the probability that each user in the information entropy belongs to one cluster is the proportion of the sum of the similarities between the user and other users in the user set to the sum of the similarities among the users in the user set;
based on a spectral clustering algorithm, clustering a user set according to the similarity between users in the user set so as to divide the user set into a plurality of clusters;
determining a risk user from the plurality of users to be identified according to the distribution information of the risk sample user in the plurality of clusters;
wherein the clustering processing performed on the user set according to the similarity between the users in the user set based on the spectral clustering algorithm, so as to divide the user set into a plurality of clusters, comprises the following steps:
respectively constructing a similarity matrix and a degree matrix according to the similarity between users in the user set, wherein elements in the similarity matrix are used for representing the similarity between two users in the user set, and elements in the degree matrix are used for representing the sum of the similarities between one user and other users in the user set;
constructing a target matrix according to at least the similarity matrix and the degree matrix, wherein each row vector of the target matrix represents a coordinate of one user in the user set in a feature space;
performing clustering processing on the row vectors subjected to dimensionality reduction on the target matrix so as to divide the user set into a plurality of clusters;
the feature data of each user in the user set comprises features of the user in multiple dimensions;
constructing a target matrix according to at least the similarity matrix and the degree matrix, including:
constructing a Laplace matrix according to the similarity matrix and the degree matrix;
determining at least one candidate dimension from the multiple dimensions, and combining features of each user in the user set under each candidate dimension to obtain a feature combination;
constructing a diagonal matrix, a first intermediate matrix and a second intermediate matrix from the feature combinations and the Laplace matrix, respectively, based on the following formulas:
Figure FDA0002945925090000021
where $U(j, j)$ is the diagonal matrix and $P_j$ is the $j$-th row of the projection matrix; $\alpha$, $\beta$ and $\gamma$ are preset tuning parameters; $R$ is a correlation matrix characterizing the degree of correlation between the features in the feature combination $X$, with $R_{ij} = I(f_i, f_j)$ and $r_{ij} \in R$, where $r_{ij}$ denotes the mutual information between the feature in dimension $i$ and the feature in dimension $j$ of the feature combination $X$, and $r_{ij} \in [0, 1]$; $A$ is the first intermediate matrix; $H$ is the second intermediate matrix; $D$ is the degree matrix; and $L$ is the Laplace matrix;
selecting, from all the eigenvalues of the second intermediate matrix, a number of eigenvalues equal to the number of clusters;
constructing an eigenvector matrix and a corresponding projection matrix, respectively, according to the eigenvectors corresponding to the selected eigenvalues, the first intermediate matrix and the feature combination:
Figure FDA0002945925090000022
where $T$ is the eigenvector matrix and $c$ is the number of clusters; $v_1, v_2, \ldots, v_c$ are the eigenvectors corresponding to the selected eigenvalues; and $P$ is the projection matrix;
if the constructed projection matrix has not converged, repeating the steps from determining at least one candidate dimension from the multiple dimensions and combining the features of each user in the user set under each candidate dimension, through constructing the eigenvector matrix and the corresponding projection matrix, until the constructed projection matrix converges; and
once the constructed projection matrix has converged, normalizing the row vectors of the eigenvector matrix corresponding to that projection matrix to obtain the target matrix.
2. The method of claim 1, wherein the information entropy is:
Figure FDA0002945925090000031
where $H(X)$ is the information entropy; $p(x_i)$ is the probability that user $x_i$ in the user set belongs to a cluster; $n$ is the number of users in the user set; and $w_{ij}$ is the similarity between user $x_i$ and user $x_j$ in the user set.
3. The method of claim 1, wherein constructing a target matrix based on at least the similarity matrix and the degree matrix comprises:
constructing a Laplace matrix according to the similarity matrix and the degree matrix;
performing eigendecomposition on the Laplace matrix, and selecting a number of eigenvalues equal to the number of clusters;
constructing an eigenvector matrix from the eigenvectors corresponding to the selected eigenvalues;
and normalizing the row vectors of the eigenvector matrix to obtain the target matrix.
4. An apparatus for identifying an at-risk user, comprising:
an acquisition module, configured to acquire feature data of each user in a user set, wherein the user set comprises risk sample users and a plurality of users to be identified;
a first determining module, configured to determine similarity between users in the user set with a goal of minimizing an information entropy of the user set, where the information entropy is used to represent an uncertainty of a clustering result obtained by clustering the user set, and a probability that each user in the information entropy belongs to a cluster is a proportion of a sum of similarities between the user and other users in the user set to a sum of similarities between the users in the user set;
a clustering module, configured to perform clustering processing on the user set according to the similarity between the users in the user set based on a spectral clustering algorithm, so as to divide the user set into a plurality of clusters;
a second determining module, configured to determine the risk users from the plurality of users to be identified according to the distribution information of the risk sample users in the plurality of clusters;
wherein the clustering module comprises:
a first construction submodule, configured to respectively construct a similarity matrix and a degree matrix according to the similarity between users in the user set, where an element in the similarity matrix characterizes the similarity between two users in the user set, and an element in the degree matrix characterizes the sum of the similarities between one user in the user set and the other users;
a second constructing submodule, configured to construct a target matrix according to at least the similarity matrix and the degree matrix, where each row vector of the target matrix represents a coordinate of one user in the user set in a feature space;
a clustering submodule, configured to perform clustering processing on the dimension-reduced row vectors of the target matrix so as to divide the user set into a plurality of clusters;
the feature data of each user in the user set comprises features of the user in multiple dimensions, and the second construction submodule is used for constructing the target matrix according to the following modes:
constructing a Laplace matrix according to the similarity matrix and the degree matrix;
determining at least one candidate dimension from the multiple dimensions, and combining features of each user in the user set under each candidate dimension to obtain a feature combination;
constructing a diagonal matrix, a first intermediate matrix and a second intermediate matrix from the feature combinations and the Laplace matrix, respectively, based on the following formulas:
Figure FDA0002945925090000041
where $U(j, j)$ is the diagonal matrix and $P_j$ is the $j$-th row of the projection matrix; $\alpha$, $\beta$ and $\gamma$ are preset tuning parameters; $R$ is a correlation matrix characterizing the degree of correlation between the features in the feature combination $X$, with $R_{ij} = I(f_i, f_j)$ and $r_{ij} \in R$, where $r_{ij}$ denotes the mutual information between the feature in dimension $i$ and the feature in dimension $j$ of the feature combination $X$, and $r_{ij} \in [0, 1]$; $A$ is the first intermediate matrix; $H$ is the second intermediate matrix; $D$ is the degree matrix; and $L$ is the Laplace matrix;
selecting, from all the eigenvalues of the second intermediate matrix, a number of eigenvalues equal to the number of clusters;
constructing an eigenvector matrix and a corresponding projection matrix, respectively, according to the eigenvectors corresponding to the selected eigenvalues, the first intermediate matrix and the feature combination:
Figure FDA0002945925090000051
where $T$ is the eigenvector matrix and $c$ is the number of clusters; $v_1, v_2, \ldots, v_c$ are the eigenvectors corresponding to the selected eigenvalues; and $P$ is the projection matrix;
if the constructed projection matrix has not converged, repeating the steps from determining at least one candidate dimension from the multiple dimensions and combining the features of each user in the user set under each candidate dimension, through constructing the eigenvector matrix and the corresponding projection matrix, until the constructed projection matrix converges; and
once the constructed projection matrix has converged, normalizing the row vectors of the eigenvector matrix corresponding to that projection matrix to obtain the target matrix.
5. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 3.
6. An electronic device, comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to carry out the steps of the method of any one of claims 1 to 3.
CN201910901456.8A 2019-09-23 2019-09-23 Risk user identification method and device, storage medium and electronic equipment Active CN110706092B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910901456.8A CN110706092B (en) 2019-09-23 2019-09-23 Risk user identification method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910901456.8A CN110706092B (en) 2019-09-23 2019-09-23 Risk user identification method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN110706092A CN110706092A (en) 2020-01-17
CN110706092B true CN110706092B (en) 2021-05-18

Family

ID=69195872

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910901456.8A Active CN110706092B (en) 2019-09-23 2019-09-23 Risk user identification method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN110706092B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111612041B (en) * 2020-04-24 2023-10-13 平安直通咨询有限公司上海分公司 Abnormal user identification method and device, storage medium and electronic equipment
CN111586001B (en) * 2020-04-28 2022-11-22 咪咕文化科技有限公司 Abnormal user identification method and device, electronic equipment and storage medium
CN113627454A (en) * 2020-05-09 2021-11-09 北京沃东天骏信息技术有限公司 Article information clustering method, pushing method and pushing device
CN112259210B (en) * 2020-11-18 2021-05-11 云南财经大学 Medical big data access control method and device and computer readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107169500A (en) * 2017-03-09 2017-09-15 中国矿业大学 A kind of Spectral Clustering about subtracted based on neighborhood rough set and system
CN107563399A (en) * 2016-06-30 2018-01-09 中国矿业大学 The characteristic weighing Spectral Clustering and system of a kind of knowledge based entropy
CN109086822A (en) * 2018-08-01 2018-12-25 广州虎牙信息科技有限公司 A kind of main broadcaster's user classification method, device, equipment and storage medium
CN109784388A (en) * 2018-12-29 2019-05-21 北京中电普华信息技术有限公司 Stealing user identification method and device
CN109784636A (en) * 2018-12-13 2019-05-21 中国平安财产保险股份有限公司 Fraudulent user recognition methods, device, computer equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009058915A1 (en) * 2007-10-29 2009-05-07 The Trustees Of The University Of Pennsylvania Computer assisted diagnosis (cad) of cancer using multi-functional, multi-modal in-vivo magnetic resonance spectroscopy (mrs) and imaging (mri)

Also Published As

Publication number Publication date
CN110706092A (en) 2020-01-17


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20210218
Address after: 518000 Room 201, building A, No. 1, Qian Wan Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong (Shenzhen Qianhai business secretary Co., Ltd.)
Applicant after: Qianhai feisuan Technology (Shenzhen) Co.,Ltd.
Address before: 518000 unit a, B, C, D, 20 / F, 22 / F, unit a, B, C, D, block a, financial technology building, 11 Keyuan Road, Yuehai street, Nanshan District, Shenzhen City, Guangdong Province
Applicant before: SHENZHEN ZHONG XING CREDEX FINANCE TECHNOLOGY Co.,Ltd.
GR01 Patent grant
CP01 Change in the name or title of a patent holder
Address after: 518000 Room 201, building A, No. 1, Qian Wan Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong (Shenzhen Qianhai business secretary Co., Ltd.)
Patentee after: Feisuanzhi Technology (Shenzhen) Co.,Ltd.
Address before: 518000 Room 201, building A, No. 1, Qian Wan Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong (Shenzhen Qianhai business secretary Co., Ltd.)
Patentee before: Qianhai feisuan Technology (Shenzhen) Co.,Ltd.