CN111309822B - User identity recognition method and device - Google Patents
User identity recognition method and device
- Publication number
- CN111309822B CN202010087184.5A
- Authority
- CN
- China
- Prior art keywords
- community
- user
- risk
- node
- sub
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0635—Risk analysis of enterprise or organisation activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Human Resources & Organizations (AREA)
- Theoretical Computer Science (AREA)
- Economics (AREA)
- Databases & Information Systems (AREA)
- Strategic Management (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Entrepreneurship & Innovation (AREA)
- Marketing (AREA)
- General Business, Economics & Management (AREA)
- Tourism & Hospitality (AREA)
- Game Theory and Decision Science (AREA)
- Data Mining & Analysis (AREA)
- Development Economics (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- General Engineering & Computer Science (AREA)
- Educational Administration (AREA)
- Computing Systems (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
Embodiments of the disclosure provide a user identity recognition method and device. The method comprises the following steps: performing primary recognition on the overall knowledge graph to obtain a first data set of at least one high-risk user community, and establishing a sub-graph for each such user community; acquiring a second data set for each node in the sub-graph, and updating the sub-graph; and constructing a propagation matrix within the user community from the updated sub-graph, performing secondary recognition based on the propagation matrix, and updating the user risk confidence of each node in the sub-graph.
Description
Technical Field
The disclosure relates to the technical field of data mining, and in particular relates to a user identity recognition method, a device, electronic equipment and a storage medium.
Background
With the development of big data and artificial intelligence technology, and especially the recent breakthroughs in cognitive intelligence, knowledge graph technology based on relational databases can provide more specialized and more accurate intelligent analysis services in many application fields. Typically, a knowledge graph can support a variety of artificial intelligence models that identify information based on relationships, for applications such as personalized recommendation, associated information search, map data processing, social networking services, specialized knowledge bases, user authentication, and internet finance.
In an artificial intelligence model based on a knowledge graph, the relationship graph constructed from the knowledge graph is used, and a label propagation algorithm (LPA) is applied to propagate labels from seed data (white lists and black lists), yielding a probability/confidence for every node in the network. In applications that recognize user identity or reliability, recognizing user organizations/communities has particular practical significance. Besides conventional recognition of users' social and organizational relationships, recognizing fraud groups is a necessary but more difficult task within anti-fraud recognition. In a common method, a label propagation algorithm first propagates the representation values of labeled nodes to other nodes through a relation matrix; a community detection algorithm, such as the classical Girvan-Newman algorithm, is then used to identify potential communities, and the reliability of each community is assessed to confirm whether it is a fraud group, helping the system or other users improve the security of internet applications.
However, the prior art propagates labels only according to the degree of association between nodes (typically people), while fraudsters and fraud groups may also have normal social relationships, for example with unwitting relatives or frequently contacted service personnel. In such cases the existing approach tends to propagate the fraud label to normal users, causing the system to produce false positives, which are unacceptable to both normal users and service providers. To address this, some correction schemes have appeared in the prior art. For example, based on the number or weight of a node's association relationships (edges in the graph), nodes with lower weights or fewer edges to other nodes are judged to be weak nodes and removed directly from the identified fraud group. However, such direct correction is merely an empirically chosen threshold cut that ignores the real logic of the nodes' social relationships, so it does not truly solve the misjudgment problem and instead introduces unnecessary errors of its own.
Disclosure of Invention
In view of the technical problems in the prior art, embodiments of the disclosure provide a user identity recognition method, a device, electronic equipment and a computer-readable storage medium, so as to solve the problems of blurred community recognition boundaries and a high misjudgment rate in the prior art.
A first aspect of an embodiment of the present disclosure provides a method for identifying a user identity, including:
performing primary recognition on the overall knowledge graph to obtain a first data set of at least one high-risk user community, and establishing a sub-graph for each such user community;
acquiring a second data set for each node in the sub-graph, and updating the sub-graph;
and constructing a propagation matrix within the user community from the updated sub-graph, performing secondary recognition based on the propagation matrix, and updating the user risk confidence of each node in the sub-graph.
In some embodiments, the primary recognition comprises:
using a label propagation algorithm to propagate the representation value of each node to its associated nodes through a propagation matrix;
after the label propagation algorithm converges, identifying at least one user community by using a community discovery algorithm;
and identifying the risk confidence of each user community, and determining the risk degree of the community and the user.
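The propagation step listed above can be illustrated with a minimal sketch. It assumes a weighted-average update rule, seed nodes that stay clamped to their black/white-list values, and an uninformative starting pair of (0.5, 0.5) for unlabeled nodes; none of these details are specified by the disclosure, and the function and variable names are hypothetical.

```python
def propagate_labels(weights, seeds, max_iter=100, tol=1e-6):
    """Spread (fraud, normal) probability pairs over weighted edges until stable.

    weights: dict mapping node -> {neighbor: edge weight} (symmetric graph)
    seeds:   dict mapping node -> (fraud_prob, normal_prob) for listed nodes
    """
    # Assumption of this sketch: unlabeled nodes start from (0.5, 0.5).
    values = {n: seeds.get(n, (0.5, 0.5)) for n in weights}
    for _ in range(max_iter):
        updated = {}
        for node, nbrs in weights.items():
            total = sum(nbrs.values())
            if node in seeds or total == 0:
                # Seeds stay clamped; isolated nodes keep their value.
                updated[node] = values[node]
                continue
            fraud = sum(w * values[m][0] for m, w in nbrs.items()) / total
            normal = sum(w * values[m][1] for m, w in nbrs.items()) / total
            updated[node] = (fraud, normal)
        delta = max(abs(updated[n][0] - values[n][0]) for n in weights)
        values = updated
        if delta < tol:  # treat a negligible change as convergence
            break
    return values


# Toy chain a - b - c: "a" is black-listed, "c" is white-listed.
graph = {
    "a": {"b": 3.0},
    "b": {"a": 3.0, "c": 1.0},
    "c": {"b": 1.0},
}
seeds = {"a": (1.0, 0.0), "c": (0.0, 1.0)}
result = propagate_labels(graph, seeds)
print(result["b"])
```

In this toy graph the unlabeled node `b` ends up closer to the black-listed seed because its edge to `a` carries three times the weight of its edge to `c`.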
In some embodiments, the secondary recognition comprises:
using a label propagation algorithm to propagate the representation value of each node to its associated nodes through a propagation matrix;
after the label propagation algorithm converges, identifying real members by using a community discovery algorithm, or screening out nodes whose risk confidence values change little before and after updating, or normalizing the updated risk confidence values;
and identifying the risk confidence of each user node, and determining the risk degree of communities and users.
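The screening and normalization options in the list above might look like the following sketch. It takes one possible reading of the text: nodes whose risk confidence barely changes after the sub-graph update are treated as stable members, and normalization is done by min-max scaling. The threshold `max_change`, the scaling choice, and all names are assumptions, not details from the disclosure.

```python
def stable_members(before, after, max_change=0.1):
    """Return nodes whose risk confidence changed by at most max_change."""
    return {n for n in after if abs(after[n] - before[n]) <= max_change}


def normalize_confidence(confidence):
    """Min-max scale risk confidence values into [0, 1]."""
    low, high = min(confidence.values()), max(confidence.values())
    if high == low:
        return dict(confidence)  # nothing to scale; return a copy unchanged
    return {n: (v - low) / (high - low) for n, v in confidence.items()}


# Hypothetical risk confidence before and after secondary recognition.
before = {"u1": 0.90, "u2": 0.85, "u3": 0.80}
after = {"u1": 0.92, "u2": 0.80, "u3": 0.30}

members = stable_members(before, after)
scaled = normalize_confidence(after)
print(sorted(members), scaled["u3"])
```

Here `u3` drops sharply once more relationship data is considered, so under this reading it would be a candidate for removal from the community boundary.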
In some embodiments, acquiring the second data set for each node in the sub-graph comprises at least one of the following:
comprehensively calculating the second data set from the data in the graph, obtaining the second data set from a third-party data source, or obtaining the second data set by manual injection.
In some embodiments, the comprehensive calculation includes:
performing at least one of potential association degree calculation, voiceprint data similarity calculation, position association degree calculation, biological feature similarity calculation and information sending frequency calculation, so as to find new edges between nodes or update the weight values of existing edges.
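As an illustration of how a similarity calculation could yield new edges or updated weights, the sketch below uses cosine similarity over per-node feature vectors (for example, voiceprint embeddings). The similarity measure, the threshold of 0.8, and the rule of keeping the larger of the old and new weight are all assumptions of this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def update_edges(edges, features, threshold=0.8):
    """Add an edge, or raise its weight, for every sufficiently similar pair.

    edges:    dict mapping frozenset({u, v}) -> weight, modified in place
    features: dict mapping node -> feature vector
    """
    nodes = sorted(features)
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            sim = cosine_similarity(features[u], features[v])
            if sim >= threshold:
                key = frozenset((u, v))
                edges[key] = max(edges.get(key, 0.0), sim)
    return edges


# Hypothetical feature vectors for three nodes of a sub-graph.
features = {"u1": [1.0, 0.0], "u2": [1.0, 0.0], "u3": [0.0, 1.0]}
edges = {frozenset(("u1", "u3")): 0.4}
update_edges(edges, features)
print(edges[frozenset(("u1", "u2"))])
```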
A second aspect of an embodiment of the present disclosure provides a user identity recognition apparatus, including:
a community identification module, configured to perform primary recognition on the overall knowledge graph to obtain a first data set of at least one high-risk user community, and to establish a sub-graph for each such user community;
a graph updating module, configured to acquire a second data set for each node in the sub-graph and to update the sub-graph;
and a secondary identification module, configured to construct a propagation matrix within the user community from the updated sub-graph, perform secondary recognition based on the propagation matrix, and update the user risk confidence of each node in the sub-graph.
In some embodiments, the community identification module includes:
a first label propagation module, configured to propagate the representation value of each node to its associated nodes through the propagation matrix by using a label propagation algorithm;
a first community discovery module, configured to identify at least one user community by using a community discovery algorithm after the label propagation algorithm converges;
and a first risk identification module, configured to identify the risk confidence of each user community and determine the risk degree of the community and the user.
In some embodiments, the secondary identification module comprises:
a second label propagation module, configured to propagate the representation value of each node to its associated nodes through the propagation matrix by using a label propagation algorithm;
a comprehensive processing module, configured to identify real members by using a community discovery algorithm after the label propagation algorithm converges, or to screen out nodes whose risk confidence values change little before and after updating, and to normalize the updated risk confidence values;
and a second risk identification module, configured to identify the risk confidence of each user node and determine the risk degree of communities and users.
In some embodiments, the graph updating module comprises at least one of the following:
a comprehensive calculation module, configured to comprehensively calculate the second data set from the data in the graph;
a third-party acquisition module, configured to acquire the second data set from a third-party data source;
and an injection module, configured to obtain the second data set by manual injection.
In some embodiments, the comprehensive calculation module comprises:
an edge updating module, configured to perform at least one of potential association degree calculation, voiceprint data similarity calculation, position association degree calculation, biological feature similarity calculation and information sending frequency calculation, so as to find new edges between nodes or update the weight values of existing edges.
A third aspect of the disclosed embodiments provides an electronic device, comprising:
a memory and one or more processors;
wherein the memory is communicatively coupled to the one or more processors, and instructions executable by the one or more processors are stored in the memory, which when executed by the one or more processors, are operable to implement the methods as described in the previous embodiments.
A fourth aspect of the disclosed embodiments provides a computer-readable storage medium having stored thereon computer-executable instructions which, when executed by a computing device, are operable to implement the methods of the previous embodiments.
A fifth aspect of the disclosed embodiments provides a computer program product comprising a computer program stored on a computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, are operable to implement a method as described in the previous embodiments.
According to embodiments of the disclosure, the hidden data relationships among members of the identified communities are further mined, and community boundaries are determined through secondary recognition. This improves the accuracy of user identification and effectively addresses the system's misjudgment problem while preserving system reliability, data security and processing efficiency.
Drawings
The features and advantages of the present disclosure will be more clearly understood by reference to the accompanying drawings, which are schematic and should not be construed as limiting the disclosure in any way, in which:
FIG. 1 is a diagram illustrating a typical knowledge-graph, in accordance with some embodiments of the present disclosure;
FIG. 2 is a schematic diagram of a risk control system based on knowledge graph and artificial intelligence, in accordance with some embodiments of the disclosure;
FIG. 3 is a flow chart of a user identification method according to some embodiments of the present disclosure;
FIG. 4 is a representation of a graph node according to some embodiments of the disclosure;
FIGS. 5A-C are schematic illustrations of changes in the identification of fraud group members according to some embodiments of the present disclosure;
FIG. 6 is a schematic block diagram of a user identification device according to some embodiments of the present disclosure;
FIG. 7 is a schematic structural diagram of an electronic device, shown in accordance with some embodiments of the present disclosure.
Detailed Description
In the following detailed description, numerous specific details are set forth by way of examples in order to provide a thorough understanding of the relevant disclosure. However, it will be apparent to one of ordinary skill in the art that the present disclosure may be practiced without these specific details. It should be appreciated that the use of "system," "apparatus," "unit," and/or "module" terms in this disclosure is one method for distinguishing between different parts, elements, portions, or components at different levels in a sequential arrangement. However, these terms may be replaced with other expressions if the other expressions can achieve the same purpose.
It will be understood that when a device, unit, or module is referred to as being "on," "connected to," or "coupled to" another device, unit, or module, it can be directly on, connected to, or coupled to, or in communication with the other device, unit, or module, or intervening devices, units, or modules may be present unless the context clearly indicates an exception. The term "and/or" as used in this disclosure includes any and all combinations of one or more of the associated listed items.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to limit the scope of the present disclosure. As used in the specification and the claims, the terms "a," "an," and "the" do not refer specifically to the singular and may include the plural, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" indicate only that the explicitly identified features, integers, steps, operations, elements, and/or components are included; they do not constitute an exclusive list, and other features, integers, steps, operations, elements, and/or components may also be included.
These and other features and characteristics of the present disclosure, as well as the methods of operation, functions of the related elements of structure, combinations of parts and economies of manufacture, may be better understood with reference to the following description and the accompanying drawings, all of which form a part of this specification. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended as a definition of the limits of the disclosure. It will be understood that the figures are not drawn to scale.
Various block diagrams are used in the present disclosure to illustrate various modifications of the embodiments according to the present disclosure. It should be understood that the foregoing or following structures are not intended to limit the present disclosure. The protection scope of the present disclosure is subject to the claims.
A knowledge graph helps identify and understand the associations among things in the real world, so that hidden characteristics of those things can be discovered. In the prior art, combining the knowledge graph with artificial intelligence technology lets a machine recognize real-world things automatically and complete complex business processing automatically. One typical prior-art application applies a label propagation algorithm to classify/cluster people in a knowledge graph. However, because social relations are complex, when the label propagation algorithm runs over the whole graph it cannot accurately distinguish and recognize the boundaries of a group, which often causes the system to misjudge. Some prior-art schemes reject certain nodes directly by threshold, but this ignores the real logic of the group's social relationships, and the thresholds are set purely empirically. Such schemes therefore do not truly solve the misjudgment problem and introduce unnecessary errors of their own (for example, nodes that should belong to a group are often rejected).
In view of this, embodiments of the disclosure provide a user identity recognition method that performs secondary recognition on the boundary of a group/community in the knowledge graph, so that the true identity of an edge node can be judged accurately and the adverse effects of system misjudgment on normal users and service providers can be avoided.
Typically, a graph database (knowledge graph database) stores relationship data in terms of real-world entities and relationships: different entities correspond to different nodes, entities are connected through relationships, and both nodes and relationships carry attributes that define the entity type and the relationship type. As shown in FIG. 1, in one exemplary graph database, a knowledge graph illustrates a network of user relationships constructed from personal information, in which different entities form nodes of different shapes and the relationships between entities form links between the nodes. For example, "Zhang" and "Li Jiang" are two person entities, each connected to other entities such as "cell phone number" or "company" by relationships such as "work on" or "own phone".
As further shown in FIG. 2, an embodiment of the disclosure provides a schematic diagram of a risk control system based on a knowledge graph and artificial intelligence, such as the intelligent risk control system of a company. A user submits a financial application through an internet front-end system, such as an SDK (Software Development Kit), an H5 page or an internet finance app. The application is then delivered to a task matching server over a wired or wireless communication network, and the task matching server automatically matches each application to one of several financial service providers; typically, the matching server is owned by a third-party financial institution. Application data entering the financial service system is preprocessed and then stored in a graph database; the graph database may be a Neo4j graph database storing a large amount of knowledge graph data about financial transactions.
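The entity-and-relationship structure described for FIG. 1 can be sketched as a small in-memory graph. This is only an illustration of the data model, not the Neo4j storage the text mentions; the class, the identifiers `phone_1` and `company_1`, and the relation names `work_at` and `own_phone` are hypothetical.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Typed entities (nodes) connected by typed relationships (edges)."""

    def __init__(self):
        self.entity_types = {}              # entity id -> entity type
        self.relations = []                 # (source, relation type, target)
        self._adjacent = defaultdict(set)   # entity id -> linked entity ids

    def add_entity(self, entity_id, entity_type):
        self.entity_types[entity_id] = entity_type

    def add_relation(self, source, relation_type, target):
        if source not in self.entity_types or target not in self.entity_types:
            raise KeyError("both entities must be added before relating them")
        self.relations.append((source, relation_type, target))
        self._adjacent[source].add(target)
        self._adjacent[target].add(source)

    def neighbors(self, entity_id):
        return set(self._adjacent.get(entity_id, set()))

    def persons_sharing(self, entity_id):
        """Person entities linked to the same phone, company, and so on."""
        return {n for n in self.neighbors(entity_id)
                if self.entity_types[n] == "person"}


kg = KnowledgeGraph()
kg.add_entity("Zhang", "person")
kg.add_entity("Li Jiang", "person")
kg.add_entity("phone_1", "cell phone number")
kg.add_entity("company_1", "company")
kg.add_relation("Zhang", "own_phone", "phone_1")
kg.add_relation("Zhang", "work_at", "company_1")
kg.add_relation("Li Jiang", "work_at", "company_1")
print(sorted(kg.persons_sharing("company_1")))
```

Shared non-person entities such as a company or a phone number are what link two people indirectly, which is the kind of association later propagation steps rely on.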
Further, the financial application generates a risk control analysis task, which obtains relationship-based data from the graph database by means of a graph query, yielding the relationship data associated with the application. The relationship data is input to a variable computing module to obtain the corresponding evaluation variables. The evaluation variables are then input to an anti-fraud evaluation model to complete anti-fraud recognition; the anti-fraud evaluation model may be a machine-learning-based model, for example a decision-tree-based GBDT model or a neural-network-based deep model. Finally, the anti-fraud recognition result and the variable data are input to an anti-fraud and risk control system module, which completes the evaluation of the application based on the corresponding decision flow and, optionally, manual intervention.
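A variable computing step of this kind might be sketched as follows. The disclosure does not name the evaluation variables, so the three computed here (neighbor count, black-listed neighbor count and ratio) are purely illustrative assumptions, as are the identifiers and relation names in the sample data.

```python
def compute_variables(applicant, relations, blacklist):
    """Derive simple evaluation variables from graph-query results.

    relations: iterable of (source, relation_type, target) triples
    blacklist: collection of entity ids already known to be high risk
    """
    neighbors = set()
    for source, _relation, target in relations:
        if source == applicant:
            neighbors.add(target)
        elif target == applicant:
            neighbors.add(source)
    flagged = neighbors & set(blacklist)
    return {
        "neighbor_count": len(neighbors),
        "blacklisted_neighbor_count": len(flagged),
        "blacklisted_ratio": len(flagged) / len(neighbors) if neighbors else 0.0,
    }


# Hypothetical relationship data returned for one application.
relations = [
    ("applicant_1", "own_phone", "phone_1"),
    ("user_2", "own_phone", "phone_1"),
    ("applicant_1", "contact", "user_3"),
    ("user_4", "contact", "applicant_1"),
]
variables = compute_variables("applicant_1", relations, blacklist={"user_3"})
print(variables)
```

A dictionary like this could then serve as the feature input to whatever evaluation model is used downstream.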
As a specific task in anti-fraud recognition, fraud group recognition is the process of recognizing groups/communities that may exist in the graph and judging whether each one is a fraud group. The prior art mainly uses a label propagation algorithm and a community discovery algorithm to identify potential communities, and computes a fraud probability for each community to conclude whether it is a fraud group. In this way, fraud groups can be identified through members with a stronger degree of association, but edge nodes with weaker associations are often misjudged, which brings many adverse effects to normal users and service providers. To avoid such misjudgment, in one embodiment of the present disclosure the boundary of the group/community in the knowledge graph is secondarily recognized. As shown in FIG. 3, the user identity recognition method includes the steps of:
S301, performing primary recognition on the overall knowledge graph to obtain a first data set of at least one high-risk user community, and establishing a sub-graph for each such user community;
S302, acquiring a second data set for each node in the sub-graph, and updating the sub-graph;
S303, constructing a propagation matrix within the user community from the updated sub-graph, performing secondary recognition based on the propagation matrix, and updating the user risk confidence of each node in the sub-graph.
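For step S303, one common way to turn a weighted sub-graph into a propagation matrix is to row-normalize its adjacency matrix, so that each row gives the share of influence a node takes from each neighbor. The disclosure does not state how the matrix is built, so the normalization below is an assumption and the names are hypothetical.

```python
def build_propagation_matrix(nodes, edges):
    """Row-normalized weighted adjacency matrix for an undirected sub-graph.

    nodes: ordered list of node ids (defines row/column order)
    edges: dict mapping (u, v) -> weight
    """
    index = {node: i for i, node in enumerate(nodes)}
    size = len(nodes)
    matrix = [[0.0] * size for _ in range(size)]
    for (u, v), weight in edges.items():
        i, j = index[u], index[v]
        matrix[i][j] += weight
        matrix[j][i] += weight  # undirected: mirror the weight
    for row in matrix:
        total = sum(row)
        if total > 0:
            for j in range(size):
                row[j] /= total  # each non-empty row now sums to 1
    return matrix


nodes = ["a", "b", "c"]
edges = {("a", "b"): 3.0, ("b", "c"): 1.0}
matrix = build_propagation_matrix(nodes, edges)
for row in matrix:
    print(row)
```

Because the matrix is built only from the updated sub-graph, new or re-weighted edges from the second data set change the propagation result without touching the overall graph.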
In an embodiment of the present disclosure, the primary recognition includes: using a label propagation algorithm to propagate the representation value of each node to its associated nodes through a propagation matrix; after the label propagation algorithm converges, identifying at least one user community by using a community discovery algorithm; and identifying the risk confidence of each user community, and determining the risk degree of the community and the user. Label propagation algorithms, community discovery algorithms and risk identification algorithms have been studied extensively in the prior art, and embodiments of the disclosure may adopt existing basic algorithms or improved variants; for example, the label propagation algorithm may be LPA, COPRA, SLPA or the like. Their specific implementations are therefore not described one by one and do not limit the disclosure.
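The community discovery and community-level risk steps can be illustrated with a deliberately simple stand-in: connected components over edges at or above a weight threshold, with the community risk confidence taken as the mean fraud probability of its members. This is not Girvan-Newman or any algorithm named in the disclosure; the threshold values and the averaging rule are assumptions of this sketch.

```python
def find_communities(weights, min_weight=0.5):
    """Group nodes connected by edges whose weight is at least min_weight."""
    seen = set()
    communities = []
    for start in weights:
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            members.add(node)
            for neighbor, weight in weights[node].items():
                if weight >= min_weight and neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        communities.append(members)
    return communities


def community_risk(members, fraud_prob):
    """Mean fraud probability of a community's members."""
    return sum(fraud_prob[n] for n in members) / len(members)


weights = {
    "a": {"b": 0.9},
    "b": {"a": 0.9, "c": 0.8},
    "c": {"b": 0.8, "d": 0.1},
    "d": {"c": 0.1, "e": 0.7},
    "e": {"d": 0.7},
}
# Hypothetical fraud probabilities after label propagation has converged.
fraud_prob = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1, "e": 0.2}

communities = find_communities(weights)
high_risk = [c for c in communities if community_risk(c, fraud_prob) >= 0.6]
print(communities, high_risk)
```

Only the communities that pass the risk threshold would be carried forward for sub-graph construction and secondary recognition.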
In one embodiment of the present disclosure, the first data set is a data set obtained directly from the overall knowledge graph. It includes, but is not limited to, the original information of nodes and edges (such as node names, contents, attributes and representation values, and edge weights and attributes) and the information produced by the primary recognition (such as the propagated representation values of nodes, community labels and risk confidence). The representation value is typically a set of values giving the probability that the node carries a black or white label, for example a pair (fraud probability, normal probability) in which the black label represents fraud and the white label represents normal. FIG. 4 illustrates a top node with a representation value whose pair gives the probability of fraud and the probability of non-fraud; in an embodiment of the present disclosure, a label propagation algorithm propagates the representation value of the top node to the three vertices below it according to the degree of association of each edge. Further, FIG. 5A illustrates a fraud group found by community identification in a relational knowledge graph, in which the nodes of the fraud group are drawn as black nodes to indicate a higher probability of fraud (or a higher confidence of high risk). As can be seen from FIG. 5A, a community can be regarded as a sub-graph with higher connectivity; to simplify subsequent processing, in a preferred embodiment of the present disclosure a corresponding sub-graph is constructed from the extracted first data set of each community.
In the real world, because social relationships are widely connected, a person associated with a fraud group or a fraudster may be a normal user. The label propagation algorithm propagates the fraud probability through the relation matrix to every associated node, so a normal user with some degree of association with a group is likely to be misjudged as a black node. However, the fraud probability of each fraud group node is the result obtained after the label propagation algorithm converges, so the relationship and representation value data in the graph have already been fully used, and it is difficult to improve recognition accuracy further on the basis of the current data. In one embodiment of the present disclosure, after a fraud group is obtained, additional relationship data is acquired for the nodes inside the group, and the nodes of the group are secondarily recognized so that the boundary of the group becomes more accurate. Establishing the sub-graph confines data adjustments for the group to a limited range. This prevents the adjusted data from affecting the data consistency of the overall knowledge graph and reduces the data processing range and computation scale, thereby preserving system reliability, data security and processing efficiency.
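The isolation property described above can be shown with a small sketch that copies a community out of the overall graph as an induced sub-graph, so later edits to the sub-graph leave the overall graph untouched. The adjacency-dictionary representation and the function name are assumptions of this example.

```python
def extract_subgraph(graph, members):
    """Copy the nodes in members, and the edges among them, into a new graph.

    graph: dict mapping node -> {neighbor: edge weight}
    """
    members = set(members)
    return {
        node: {nbr: w for nbr, w in graph[node].items() if nbr in members}
        for node in graph
        if node in members
    }


overall = {
    "a": {"b": 0.9, "x": 0.2},
    "b": {"a": 0.9, "c": 0.8},
    "c": {"b": 0.8},
    "x": {"a": 0.2},
}
sub = extract_subgraph(overall, {"a", "b", "c"})

# Adjust an edge inside the sub-graph only (for example, after new data arrives).
sub["a"]["b"] = 0.1
sub["b"]["a"] = 0.1
print(sub["a"], overall["a"])
```

The edge to the outside node `x` is dropped from the sub-graph, and the weight change on `a`-`b` is visible only in the copy.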
Specifically, the sub-graph, and in particular the relationship data of its internal nodes, is updated with the acquired second data set of the nodes. The second data set is different from the first data set; it can be obtained by comprehensive calculation over data in the graph, from a third-party data source, by manual injection, or by any combination of these modes. In the calculation mode, for example, potential degrees of association can be computed from the data of multiple nodes in the graph, so that a new edge is generated or the weight value of an existing edge is updated; alternatively, the weight value between two nodes can be obtained through similarity calculation on voiceprint data in the graph, position association calculation, biometric similarity calculation, information sending frequency calculation, and the like. In the third-party data source mode, query data for each node is sent to the third-party data source and the query result fed back by it is obtained, so that the data of any two related nodes can be used to calculate a new edge or an updated edge weight value. For example, interaction data between the users represented by two nodes can be obtained by accessing a social network website, and a weight value calculated from it; data from other reliable sources, such as call data, financial data, credit data, or personal profile data, are equally applicable to embodiments of the present disclosure as long as they can be retrieved.
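As an illustration of the calculation mode, the sketch below derives edge weights from the similarity of per-node feature vectors (standing in for voiceprint or biometric features). Cosine similarity, the threshold of 0.8, the rule of keeping the larger of the old and new weight, and the names `cosine_similarity` and `update_edges` are assumptions; the disclosure does not fix any of them.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def update_edges(edges, features, threshold=0.8):
    """Add or strengthen an edge for every node pair that is similar enough.

    Edges are keyed by the sorted pair of node IDs. Keeping the larger of the
    existing weight and the similarity is an assumed update rule.
    """
    nodes = sorted(features)
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            similarity = cosine_similarity(features[u], features[v])
            if similarity >= threshold:
                edges[(u, v)] = max(edges.get((u, v), 0.0), similarity)
    return edges


# Hypothetical feature vectors for three nodes of a sub-graph.
features = {"u1": [1.0, 0.0], "u2": [0.9, 0.1], "u3": [0.0, 1.0]}
edges = update_edges({}, features)
print(edges)
```

With these sample vectors only the pair (u1, u2) is similar enough to receive an edge; the other pairs stay unconnected.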
In the manual injection mode, the result of a manual investigation can be entered into the graph through a predefined interface; for example, the identity IDs of two nodes and the relationship between them are entered, and the system uses this input to add an edge to the graph or update the weight value of an existing edge.
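A predefined injection interface of this kind might look like the following sketch. The function name `inject_relation`, the representation of the relationship as a numeric weight in [0, 1], and the dictionary-based edge store are all assumptions made for illustration.

```python
def inject_relation(edges, id_a, id_b, weight):
    """Add an edge between two identity IDs, or overwrite its weight if present.

    Returns True when a new edge was created and False when an existing edge
    was updated. The [0, 1] weight range is an assumption of this sketch.
    """
    if id_a == id_b:
        raise ValueError("an edge needs two distinct identity IDs")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight is assumed to lie in [0, 1]")
    key = tuple(sorted((id_a, id_b)))  # undirected edge, order-independent key
    created = key not in edges
    edges[key] = weight
    return created


# Hypothetical sub-graph with one existing edge.
sub_edges = {("u1", "u2"): 0.3}
updated_existing = inject_relation(sub_edges, "u2", "u1", 0.9)  # same pair, reversed order
added_new = inject_relation(sub_edges, "u3", "u1", 0.5)
print(sub_edges)
```

Sorting the two IDs before the lookup means an investigator can enter the pair in either order and still update the same edge.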
With the calculation results of the second data set, the connections or weight values of the relationships (edges) inside the fraud group change, which means that the system has more data with which to verify the specific members of the group, so the accuracy of identifying the community boundary can be further improved. Specifically, in one embodiment of the present disclosure, a propagation matrix within the group is constructed from the updated data. Because this new propagation matrix incorporates the newly added edges and the updated edge weight values, it differs from the propagation matrix used in the initial community identification. After the new propagation matrix is obtained, community recognition is performed again; the secondary community identification preferably follows the same process as the primary identification described above, and its description is not repeated here. The label propagation algorithm is rerun on the updated data until it converges. Taking a label propagation algorithm based on matrix multiplication as an example, in each round the propagation matrix holds the representation values of the community nodes and their surrounding associated nodes, and the relationship matrix is formed from the updated weight values of the edges between the nodes; the two matrices are multiplied so that the representation values resulting from the previous round are propagated further within the community.
Optionally, for data that does not exist in the graph, for example when some nodes have no representation value or two nodes have no edge between them, the corresponding entry in the matrix may be set to zero or another specified value. Each round of computation spreads the representation values one level further among the nodes inside the community. When the difference between the results of the two most recent rounds is small (the difference can be evaluated quantitatively, for example by a difference value or a variance), the label propagation algorithm is considered to have converged, and all the representation values at that point are output.
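The matrix-multiplication rounds and the convergence test described above can be sketched in plain Python. Several choices here are assumptions rather than details from the disclosure: nodes with known labels (`seeds`) are held fixed in every round, each propagated row is renormalised so that the two probabilities sum to one, and convergence is measured by the largest absolute change between rounds. The function name and sample graph are hypothetical.

```python
def label_propagation(weights, values, seeds, tol=1e-6, max_rounds=1000):
    """Propagate (fraud, normal) values over a weighted graph until they stabilise.

    weights: n x n symmetric relation matrix, 0.0 where no edge exists.
    values:  n x 2 matrix of [fraud, normal]; [0.0, 0.0] where a node has no value.
    seeds:   indices whose values are held fixed in every round (an assumption).
    """
    n = len(weights)
    current = [row[:] for row in values]
    for round_no in range(1, max_rounds + 1):
        updated = []
        for i in range(n):
            if i in seeds:
                updated.append(current[i][:])
                continue
            # Row i of the relation matrix multiplied by the value matrix.
            fraud = sum(weights[i][j] * current[j][0] for j in range(n))
            normal = sum(weights[i][j] * current[j][1] for j in range(n))
            total = fraud + normal
            # Keep the old value when no neighbour carries information yet.
            updated.append([fraud / total, normal / total] if total else current[i][:])
        # Largest absolute change between the two most recent rounds.
        delta = max(
            abs(u - c)
            for new_row, old_row in zip(updated, current)
            for u, c in zip(new_row, old_row)
        )
        current = updated
        if delta < tol:
            return current, round_no
    return current, max_rounds


# Hypothetical chain 0 - 1 - 2 - 3: node 0 is known fraud, node 3 known normal,
# and nodes 1 and 2 start with no representation value (zeros).
weights = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
values = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
result, rounds = label_propagation(weights, values, seeds={0, 3})
print(rounds, [round(row[0], 3) for row in result])
```

In this example the fraud probability of node 1 settles near 2/3 and that of node 2 near 1/3, reflecting their distance from the fraud seed. Without fixed seeds, repeated averaging on a connected graph tends to pull every node toward the same value, which is why the sketch clamps the known labels.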
After the label propagation algorithm converges again, the membership relationships inside the true fraud group have been strengthened, so the fraud probability of true fraud-group members increases substantially, whereas the fraud probability of normal applicants changes little because they lack strong connections with true fraud-group members in the second data set. Therefore, when community recognition is performed again, the real members of the community can be identified in various ways. For example, the community discovery algorithm and the risk identification algorithm may be run again to obtain updated fraud groups and/or to determine high-risk members; nodes whose fraud probability value (high-risk confidence value) changes little between the two rounds may be removed, so that nodes with a large fraud probability value or a large increase are retained as core members of the fraud group; or the fraud probability values may be normalized so that their average remains consistent before and after the update, in which case the fraud probability of normal users decreases and they can be excluded from the fraud group. FIGS. 5A-5C illustrate how the identification of fraud-group members changes in a preferred embodiment of the present disclosure. FIG. 5A is the graph of the fraud group after the initial identification. FIG. 5B is the graph updated with the second data set, in which a single dashed line represents a new edge and a dashed line alongside a solid line represents an edge whose weight value was updated, both calculated from the second data set. FIG. 5C is the graph of the fraud group after secondary identification, in which the fraud probability values of the core members are strengthened, and some nodes at the edge are identified as non-members and removed from the group because their values are not significantly strengthened by the second data set.
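Two of the screening options described above, removing nodes whose fraud probability barely changes and normalizing the updated values to the earlier average, can be sketched as follows. The thresholds `min_gain` and `min_prob`, the mean-matching rescale, the cap at 1.0, and the sample probabilities are illustrative assumptions, not values from the disclosure.

```python
def retain_core_members(before, after, min_gain=0.05, min_prob=0.8):
    """Keep nodes whose fraud probability is high or rose noticeably."""
    return {
        node
        for node, new in after.items()
        if new >= min_prob or new - before.get(node, 0.0) >= min_gain
    }


def rescale_to_mean(before, after):
    """Scale updated values so their mean matches the mean before the update.

    Values are capped at 1.0, so the match is exact only when no value is capped.
    """
    target = sum(before.values()) / len(before)
    current = sum(after.values()) / len(after)
    factor = target / current if current else 1.0
    return {node: min(1.0, value * factor) for node, value in after.items()}


# Hypothetical fraud probabilities before and after secondary identification.
before = {"u1": 0.70, "u2": 0.72, "u3": 0.60, "u4": 0.58}
after = {"u1": 0.95, "u2": 0.93, "u3": 0.61, "u4": 0.59}

core = retain_core_members(before, after)
rescaled = rescale_to_mean(before, after)
print(sorted(core))
print({node: round(value, 3) for node, value in rescaled.items()})
```

In this sample, u1 and u2 are retained as core members, while the rescaled probabilities of u3 and u4 fall below their values from the first round, which is the effect the normalization option relies on to exclude normal users.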
According to the user identity recognition method provided by the embodiment of the disclosure, the hidden data relationship of the members in the recognized communities is further mined, and the boundaries of the communities are secondarily recognized and judged, so that the accuracy of user identity recognition is improved, and the problem of system misjudgment is effectively solved under the condition that the reliability, the data safety and the processing efficiency of the system are ensured.
FIG. 6 is a schematic diagram of a user identification device according to some embodiments of the present disclosure. As shown in FIG. 6, the user identification device 600 includes a community identification module 601, a map updating module 602, and a secondary identification module 603, wherein:
the community identification module 601 is configured to perform primary identification on the overall knowledge graph to obtain a first data set of at least one high-risk user community, and respectively establish sub-graphs of the user community;
the map updating module 602 is configured to obtain a second data set of each node in the sub-graph, and update the sub-graph;
the secondary recognition module 603 is configured to construct a propagation matrix inside the user community according to the updated sub-graph, perform secondary recognition based on the propagation matrix, and update the user risk confidence coefficient of each node in the sub-graph.
In some embodiments, the community identification module includes:
the first label propagation module is used for propagating the representation value of the node to the associated node through the propagation matrix by using a label propagation algorithm;
the first community discovery module is used for identifying at least one user community by using a community discovery algorithm after the tag propagation algorithm converges;
the first risk identification module is used for identifying the risk confidence of each user community and determining the risk degree of the community and the user.
In some embodiments, the secondary identification module comprises:
the second tag propagation module is used for propagating the representation value of the node to the associated node through the propagation matrix by using a tag propagation algorithm;
the comprehensive processing module is used for identifying real members by using a community discovery algorithm after the label propagation algorithm converges, or screening out nodes with small change of risk confidence values before and after updating, and normalizing the updated risk confidence values;
and the second risk identification module is used for identifying the risk confidence degree of each user node and determining the risk degree of communities and users.
In some embodiments, the map updating module comprises at least one of the following:
the comprehensive calculation module is used for comprehensively calculating the second data set from the data in the map;
the third party acquisition module is used for acquiring the second data set through a third party data source;
and the injection module is used for obtaining the second data set by means of manual injection.
In some embodiments, the comprehensive computation module comprises:
and the edge updating module is used for adopting at least one of potential association degree calculation, voiceprint data similarity calculation, position association degree calculation, biological feature similarity calculation and information sending frequency calculation to calculate and find out new edges between nodes or update the weight values of the existing edges.
Referring to fig. 7, a schematic diagram of an electronic device according to an embodiment of the present application is provided. As shown in fig. 7, the electronic device 700 includes:
memory 730, and one or more processors 710;
wherein the memory 730 is communicatively coupled to the one or more processors 710, and instructions 732 executable by the one or more processors are stored in the memory 730, the instructions 732 being executable by the one or more processors 710 to cause the one or more processors 710 to perform the methods of the foregoing embodiments of the present application.
In particular, processor 710 and memory 730 may be connected by a bus or otherwise, as exemplified in FIG. 7 by bus 740. The processor 710 may be a central processing unit (Central Processing Unit, CPU). The processor 710 may also be a chip such as other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or a combination thereof.
Memory 730 acts as a non-transitory computer readable storage medium that may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as a cascading progressive network in embodiments of the present application, and the like. The processor 710 executes various functional applications of the processor and data processing by running non-transitory software programs, instructions, and functional modules 732 stored in the memory 730.
Memory 730 may include a program storage area and a data storage area; the program storage area may store an operating system and at least one application program required for a function, and the data storage area may store data created by the processor 710, etc. In addition, memory 730 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 730 may optionally include memory located remotely from processor 710, which may be connected to processor 710 via a network, such as through communication interface 720. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
An embodiment of the present application further provides a computer-readable storage medium having stored therein computer-executable instructions that, when executed, perform the method of the previous embodiments of the present application.
The foregoing computer-readable storage media include physical volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer-readable storage media include, but are not limited to, USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other solid-state memory technology, CD-ROM, digital versatile disks (DVD), HD-DVD, Blu-ray or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can store the desired information and be accessed by a computer.
While the subject matter described herein is provided in the general context of operating systems and application programs that execute in conjunction with the execution of a computer system, those skilled in the art will recognize that other implementations may also be performed in combination with other types of program modules. Generally, program modules include routines, programs, components, data structures, and other types of structures that perform particular tasks or implement particular abstract data types. Those skilled in the art will appreciate that the subject matter described herein may be practiced with other computer system configurations, including hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like, as well as distributed computing environments that have tasks performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.
Those of ordinary skill in the art will appreciate that the elements and method steps of the examples described in connection with the embodiments of the application herein may be implemented as electronic hardware, or as a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part of it that contributes over the prior art, or a part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the method described in the embodiments of the present application.
In summary, the disclosure provides a user identity recognition method, a device, an electronic device and a computer readable storage medium thereof. According to the embodiment of the disclosure, the hidden data relationship of the members in the identified communities is further mined, and the boundaries of the communities are judged through secondary identification, so that the accuracy of user identification is improved, and the problem of misjudgment of the system is effectively solved under the condition that the reliability, the data safety and the processing efficiency of the system are ensured.
It is to be understood that the above-described embodiments of the present disclosure are merely illustrative or explanatory of the principles of the disclosure and are not restrictive of the disclosure. Accordingly, any modifications, equivalent substitutions, improvements, or the like, which do not depart from the spirit and scope of the present disclosure, are intended to be included within the scope of the present disclosure. Furthermore, the appended claims of this disclosure are intended to cover all such changes and modifications that fall within the scope and boundary of the appended claims, or the equivalents of such scope and boundary.
Claims (4)
1. A method for identifying a user, comprising:
performing primary recognition on the overall knowledge graph to obtain a first data set of at least one high-risk user community, and respectively establishing sub-graphs of the user community;
acquiring a second data set of each node in the sub-map, and updating the sub-map;
constructing a propagation matrix in the user community according to the updated sub-graph, performing secondary identification based on the propagation matrix, and updating user risk confidence degrees of all nodes in the sub-graph;
the primary identifying includes:
using a label propagation algorithm to propagate the expression value possessed by the node to the associated node through a propagation matrix;
after the label propagation algorithm converges, identifying at least one user community by using a community discovery algorithm;
identifying the risk confidence of each user community, and determining the risk degree of the community and the user;
the secondary identifying includes:
using a label propagation algorithm to propagate the expression value possessed by the node to the associated node through a propagation matrix;
after the label propagation algorithm converges, identifying real members by using a community discovery algorithm, or screening out nodes with small change of risk confidence values before and after updating, or normalizing the updated risk confidence values;
identifying risk confidence of each user node, and determining risk degrees of communities and users;
the acquiring the second data set of each node in the sub-map comprises at least one of the following:
and comprehensively calculating the second data set by the data in the map, or obtaining the second data set by a third-party data source, or obtaining the second data set by a manual injection mode.
2. The method of claim 1, wherein the comprehensive calculation comprises:
and calculating by adopting at least one of potential association calculation, voiceprint data similarity calculation, position association calculation, biological feature similarity calculation and information sending frequency calculation, and finding new edges between nodes or updating weight values of the existing edges.
3. A user identification device, comprising:
the community identification module is used for carrying out primary identification on the overall knowledge graph to obtain a first data set of at least one high-risk user community, and respectively establishing sub-graph of the user community;
the map updating module is used for acquiring a second data set of each node in the sub-map and updating the sub-map;
the secondary identification module is used for constructing a propagation matrix in the user community according to the updated sub-graph, carrying out secondary identification based on the propagation matrix and updating the user risk confidence coefficient of each node in the sub-graph;
the community identification module includes:
the first label propagation module is used for propagating the representation value of the node to the associated node through the propagation matrix by using a label propagation algorithm;
the first community discovery module is used for identifying at least one user community by using a community discovery algorithm after the tag propagation algorithm converges;
the first risk identification module is used for identifying the risk confidence of each user community and determining the risk degree of the community and the user;
the secondary identification module includes:
the second tag propagation module is used for propagating the representation value of the node to the associated node through the propagation matrix by using a tag propagation algorithm;
the comprehensive processing module is used for identifying real members by using a community discovery algorithm after the label propagation algorithm converges, or screening out nodes with small change of risk confidence values before and after updating, and normalizing the updated risk confidence values;
the second risk identification module is used for identifying the risk confidence coefficient of each user node and determining the risk degree of communities and users;
the map updating module comprises at least one of the following modules:
the comprehensive calculation module is used for comprehensively calculating the second data set from the data in the map;
the third party acquisition module is used for acquiring the second data set through a third party data source;
and the injection module is used for obtaining the second data set by means of manual injection.
4. The apparatus of claim 3, wherein the comprehensive computation module comprises:
and the edge updating module is used for adopting at least one of potential association degree calculation, voiceprint data similarity calculation, position association degree calculation, biological feature similarity calculation and information sending frequency calculation to calculate and find out new edges between nodes or update the weight values of the existing edges.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010087184.5A CN111309822B (en) | 2020-02-11 | 2020-02-11 | User identity recognition method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111309822A (en) | 2020-06-19 |
CN111309822B (en) | 2023-05-09 |
Family
ID=71150973
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010087184.5A Active CN111309822B (en) | 2020-02-11 | 2020-02-11 | User identity recognition method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111309822B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112102093B (en) * | 2020-08-05 | 2024-08-13 | 中证征信(深圳)有限公司 | Principal identity and association relationship identification method, device, equipment and medium |
CN112069416B (en) * | 2020-08-21 | 2022-09-02 | 河南科技大学 | Cross-social network user identity recognition method based on community discovery |
CN112348659B (en) * | 2020-10-21 | 2024-03-19 | 上海淇玥信息技术有限公司 | User identification policy distribution method and device and electronic equipment |
CN112507312B (en) * | 2020-12-08 | 2022-10-14 | 电子科技大学 | Digital fingerprint-based verification and tracking method in deep learning system |
US20220230238A1 (en) * | 2021-01-19 | 2022-07-21 | PayU Credit B.V. | System and method for assessing risk |
CN114997869A (en) * | 2021-02-26 | 2022-09-02 | 北京字节跳动网络技术有限公司 | Risk node identification method and device, electronic equipment and computer readable storage medium |
CN113033966B (en) * | 2021-03-03 | 2024-07-12 | 携程旅游信息技术(上海)有限公司 | Risk target identification method, risk target identification device, electronic equipment and storage medium |
CN113434587B (en) * | 2021-06-30 | 2023-08-18 | 青岛海尔科技有限公司 | Data storage and data query method and system |
CN113409139B (en) * | 2021-07-27 | 2024-05-28 | 深圳前海微众银行股份有限公司 | Credit risk identification method, apparatus, device and program |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109784636A (en) * | 2018-12-13 | 2019-05-21 | 中国平安财产保险股份有限公司 | Fraudulent user recognition methods, device, computer equipment and storage medium |
CN109949046A (en) * | 2018-11-02 | 2019-06-28 | 阿里巴巴集团控股有限公司 | The recognition methods of risk clique and device |
WO2019137050A1 (en) * | 2018-01-12 | 2019-07-18 | 阳光财产保险股份有限公司 | Real-time fraud detection method and device under internet credit scene, and server |
CN110110093A (en) * | 2019-04-08 | 2019-08-09 | 深圳众赢维融科技有限公司 | A kind of recognition methods, device, electronic equipment and the storage medium of knowledge based map |
CN110223168A (en) * | 2019-06-24 | 2019-09-10 | 浪潮卓数大数据产业发展有限公司 | A kind of anti-fraud detection method of label propagation and system based on business connection map |
Non-Patent Citations (1)
Title |
---|
宾晟; 孙更新. Collaborative filtering recommendation algorithm based on multi-relational social networks. 计算机科学 (Computer Science), (12), full text. * |
Also Published As
Publication number | Publication date |
---|---|
CN111309822A (en) | 2020-06-19 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | Effective date of registration: 20220608. Address after: 510000 floor 7, building S6, poly Yuzhu port, No. 848, Huangpu Avenue East, Huangpu District, Guangzhou, Guangdong. Applicant after: Jianlian Technology (Guangdong) Co.,Ltd. Address before: 510623 Room 201, building a, No. 1, Qianwan 1st Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong. Applicant before: SHENZHEN ZHONGYING WEIRONG TECHNOLOGY Co.,Ltd. |
GR01 | Patent grant | ||