CN110766091A - Method and system for identifying road loan partner - Google Patents

Method and system for identifying road loan partner

Info

Publication number
CN110766091A
CN110766091A CN201911049749.4A CN201911049749A CN110766091A CN 110766091 A CN110766091 A CN 110766091A CN 201911049749 A CN201911049749 A CN 201911049749A CN 110766091 A CN110766091 A CN 110766091A
Authority
CN
China
Prior art keywords
node
relationship
nodes
loan
road
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911049749.4A
Other languages
Chinese (zh)
Other versions
CN110766091B (en)
Inventor
刘胜
梁淑云
马影
陶景龙
王启凡
魏国富
徐�明
殷钱安
余贤喆
周晓勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information and Data Security Solutions Co Ltd
Original Assignee
Information and Data Security Solutions Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information and Data Security Solutions Co Ltd
Priority to CN201911049749.4A
Publication of CN110766091A
Application granted
Publication of CN110766091B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof

Abstract

An embodiment of the invention provides a method and a system for identifying a road loan gang. The method comprises: 1) acquiring feature data involved in the road loan operation process; 2) taking keywords contained in the feature data as nodes, and constructing a relationship graph comprising the nodes according to the relationships among the nodes; 3) contracting the non-person nodes in the relationship graph into the person nodes to which they correspond; 4) determining the weights of the edges according to the types of the edges between person nodes, and dividing the relationship graph into a plurality of node sets; and 5) for each node set, obtaining the probability that the nodes in the node set are road loan gang members according to the degree of overlap between the node set and predetermined data on known road loan offenders, and taking the persons corresponding to node sets whose probability exceeds a preset threshold as road loan gang members. By applying the embodiment, road loan gangs can be identified from the data of already known road loan offenders.

Description

Method and system for identifying road loan partner
Technical Field
The invention relates to an identification method and system, and in particular to a method and system for identifying a road loan gang.
Background
A road loan is nominally a private loan but is in essence an illegal act. Offenders fabricate creditor's rights and debts by inducing victims to sign loan-related agreements, falsely inflating loan amounts, maliciously engineering defaults, arbitrarily declaring defaults, and destroying or concealing evidence of repayment, and then illegally take possession of the victims' property through litigation, arbitration, notarization, or violence, threats and other means. Road loans are beginning to take the form of a knowledge-based crime: in some cases individual legal practitioners become accomplices and give the offenders professional legal guidance, which raises the success rate of sham litigation and leads to large-value crimes. Road loan activity is highly concealed, quick and highly profitable, easy to copy and spread, and very harmful. It seriously infringes the lawful rights and interests of borrowers, disturbs normal financial order, gives rise to various criminal offences and affects social stability. Some road loan schemes have spread from offline to online via network platforms, turning a traditional contact crime into a new non-contact crime, so that more people over a wider area are victimized and the social harm is great. How to identify road loan gangs promptly and accurately, and thereby contribute to social safety and stability, is therefore a technical problem that urgently needs to be solved.
Patent application No. 201810562975.1 discloses a method for monitoring the transfer of creditor's rights, the method comprising: acquiring creditor's-rights transfer information and creditor's-rights information, wherein the creditor's-rights information at least comprises a creditor's-rights grade, and the transfer information comprises the transferor, the transferee and the transferred amount; updating the creditor's-rights information according to the transfer information; and establishing and displaying a creditor's-rights transfer relationship diagram according to the transfer information and the creditor's-rights information.
The prior art can only identify the transfer of creditor's rights and cannot identify a road loan gang.
Disclosure of Invention
The invention aims to provide a method and a system for identifying a road loan gang.
The invention solves the technical problems through the following technical means:
An embodiment of the invention provides a method for identifying a road loan gang, the method comprising:
1) acquiring feature data involved in the road loan operation process, wherein the feature data comprises: communication data, transaction records and personal information of the persons involved in the road loan process;
2) taking keywords contained in the feature data as nodes, and constructing a relationship graph comprising the nodes according to the relationships among the nodes;
3) contracting the non-person nodes in the relationship graph into the person nodes to which they correspond;
4) determining the weights of the edges according to the types of the edges between person nodes, and dividing the relationship graph into a plurality of node sets; wherein the types of the edges include one or a combination of: employment relationship, colleague relationship, transfer relationship, charging relationship, payment relationship, conversation relationship, investment relationship, reporting relationship, title relationship, job relationship, behavior relationship and intimacy relationship;
5) for each node set, obtaining the probability that the nodes in the node set are road loan gang members according to the degree of overlap between the node set and predetermined data on known road loan offenders, and taking the persons corresponding to node sets whose probability exceeds a preset threshold as road loan gang members.
By applying the embodiment of the invention, a relationship graph is built from the feature data involved in the road loan operation process, and a relationship graph containing only relationships between persons is derived from it. The person-only relationship graph is divided into a plurality of node sets by iterating over the edge weights, the probability that each node set is a road loan gang is judged from the number of known road loan offenders appearing in that node set, and the corresponding road loan gangs are thereby identified from the data of already known road loan offenders.
Optionally, the step 2) includes:
extracting the keywords contained in the feature data by using a natural language processing algorithm, wherein the keywords comprise one or a combination of: a person's name, place name, company name, identification number, telephone number, bank card number, QQ number, email address, IP address, the home location of a number, and the company to which a number belongs.
Optionally, the process of obtaining the relationship between the nodes in step 2) includes:
for structured data, obtaining the relationships between nodes by direct query, wherein the structured data comprises bank transaction records, and the relationships between the nodes comprise one or a combination of: transfer relationship, charging relationship, payment relationship, conversation relationship and investment relationship;
for unstructured data, extracting the relationships between nodes by using a syntactic analysis algorithm, wherein the unstructured data comprises conversation content and chat records, and the relationships between the nodes further comprise: reporting relationship, title relationship, job relationship, behavior relationship and intimacy relationship.
Optionally, the step 4) includes:
41) randomly assigning a unique ID to each node in the relationship graph after the node contraction operation has been performed, and assigning a preset weight to each edge according to the type of the edge between adjacent nodes;
42) for each node, calculating the summarized weight of the node by using the formula $W_{ab} = \sum w_{ab} + \sum w_{ba}$, wherein $W_{ab}$ is the summarized weight between node a and node b; $w_{ab}$ is the weight of an edge pointing from node a to node b; and $w_{ba}$ is the weight of an edge pointing from node b to node a;
43) updating the ID of the node to the ID of the neighbor node with the largest summarized weight, and returning to step 42) until the ID of no node changes any more;
44) dividing the nodes with the same ID into one node set, thereby obtaining a plurality of node sets.
Optionally, obtaining the probability that the nodes in the node set are road loan gang members includes:
for each node set, calculating, by using a formula (the formula is an image that is not reproduced in this text), the probability that the persons corresponding to the nodes in the node set are road loan gang members, wherein
S is the probability that the persons corresponding to the nodes in the node set are road loan gang members; m is the number of nodes in the node set; and n is the number of nodes in the node set whose corresponding persons appear in the predetermined data on known road loan offenders.
An embodiment of the invention also provides a system for identifying a road loan gang, the system comprising:
a first acquisition module, configured to acquire feature data involved in the road loan operation process, wherein the feature data comprises: communication data, transaction records and personal information of the persons involved in the road loan process;
a construction module, configured to take the keywords contained in the feature data as nodes and to construct a relationship graph comprising the nodes according to the relationships among the nodes;
a contraction module, configured to contract the non-person nodes in the relationship graph into the person nodes to which they correspond;
a dividing module, configured to determine the weights of the edges according to the types of the edges between person nodes and to divide the relationship graph into a plurality of node sets; wherein the types of the edges include one or a combination of: employment relationship, colleague relationship, transfer relationship, charging relationship, payment relationship, conversation relationship, investment relationship, reporting relationship, title relationship, job relationship, behavior relationship and intimacy relationship;
and a second acquisition module, configured to obtain, for each node set, the probability that the nodes in the node set are road loan gang members according to the degree of overlap between the node set and predetermined data on known road loan offenders, and to take the persons corresponding to node sets whose probability exceeds a preset threshold as road loan gang members.
Optionally, the construction module is configured to:
extract the keywords contained in the feature data by using a natural language processing algorithm, wherein the keywords comprise one or a combination of: a person's name, place name, company name, identification number, telephone number, bank card number, QQ number, email address, IP address, the home location of a number, and the company to which a number belongs.
Optionally, the construction module is configured to:
for structured data, obtain the relationships between nodes by direct query, wherein the structured data comprises bank transaction records, and the relationships between the nodes comprise one or a combination of: transfer relationship, charging relationship, payment relationship, conversation relationship and investment relationship;
for unstructured data, extract the relationships between nodes by using a syntactic analysis algorithm, wherein the unstructured data comprises conversation content and chat records, and the relationships between the nodes further comprise: reporting relationship, title relationship, job relationship, behavior relationship and intimacy relationship.
Optionally, the dividing module is configured to:
41) randomly assign a unique ID to each node in the relationship graph after the node contraction operation has been performed, and assign a preset weight to each edge according to the type of the edge between adjacent nodes;
42) for each node, calculate the summarized weight of the node by using the formula $W_{ab} = \sum w_{ab} + \sum w_{ba}$, wherein $W_{ab}$ is the summarized weight between node a and node b; $w_{ab}$ is the weight of an edge pointing from node a to node b; and $w_{ba}$ is the weight of an edge pointing from node b to node a;
43) update the ID of the node to the ID of the neighbor node with the largest summarized weight, and return to step 42) until the ID of no node changes any more;
44) divide the nodes with the same ID into one node set, thereby obtaining a plurality of node sets.
Optionally, the second obtaining module is configured to:
for each node set, calculate, by using the formula
[formula image BDA0002255034590000061, not reproduced in this text]
the probability that the persons corresponding to the nodes in the node set are road loan gang members, wherein
S is the probability that the persons corresponding to the nodes in the node set are road loan gang members; m is the number of nodes in the node set; and n is the number of nodes in the node set whose corresponding persons appear in the predetermined data on known road loan offenders.
The invention has the advantages that:
By applying the embodiment of the invention, a relationship graph is built from the feature data involved in the road loan operation process, and a relationship graph containing only relationships between persons is derived from it. The person-only relationship graph is divided into a plurality of node sets by iterating over the edge weights, the probability that each node set is a road loan gang is judged from the number of known road loan offenders appearing in that node set, and the corresponding road loan gangs are thereby identified from the data of already known road loan offenders.
Drawings
Fig. 1 is a schematic flow chart of a method for identifying a road loan gang according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a system for identifying a road loan gang according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
Fig. 1 is a schematic flow chart of a method for identifying a road loan gang according to an embodiment of the present invention. As shown in Fig. 1, the method includes:
S101: acquiring feature data involved in the road loan operation process, wherein the feature data comprises: communication data, transaction records, and personal information of the persons involved in the road loan process.
For example, the data needed for road loan gang identification can be pulled month by month from the various relevant databases and stored in the corresponding directory on a local server. The data should cover the call information, bank transaction records, website access records and basic personal information involved in carrying out the road loan, together with the data on those offenders who are already known.
In practical applications, the feature data needs to contain certain fields. Call information needs to include: calling number (call_phone), called number (called_phone), call time (call_time), call duration (call_dur), call content (call_content), and the like. A bank transaction record needs to contain: sender (u_transfer), recipient (u_receive), transaction time (transfer_time) and transaction account number (transfer_acc). A website access record should contain: user IP (user_IP), website (v_url), access time (t_time) and operation content (user_opt). Basic personal information should contain: name (user_name), certificate number (identi_num), phone number (phone_num), QQ number (QQ_num), WeChat ID (wechat_num), and the like.
It should be emphasized that the feature data involved in the road loan operation process refers to any data that may be involved over the whole course of the road loan.
S102: taking the keywords contained in the feature data as nodes, and constructing a relationship graph comprising the nodes according to the relationships among the nodes.
Illustratively, in the first step, a natural language processing algorithm may be used to extract the keywords contained in the feature data. The keywords include: person name, place name, company name, location, key event, numerals, action words, identification number, telephone number, bank card number, QQ number, email address, IP address, domain name, the company to which a number belongs, and the like. As required, the data can be further enriched with, for example, the issuing bank of a bank card, the operator of a telephone number and the place of issue of an identity card, and the keyword names can be normalized. Normalization is implemented through coreference resolution from natural language processing: the coreference relations among the phrases in a sentence are extracted and the keyword they jointly refer to is determined, which facilitates subsequent processing and analysis.
For example, named entity recognition techniques from the field of natural language processing can be used to extract entities from the original text of the feature data acquired in step S101: person names, place names, key events (or actions) and so on are identified and extracted. Information such as a person's name, a place name or a company name is called an entity, or named entity, in natural language processing. These entities are the key keywords scattered through a large volume of data, and they also serve as very important input to the subsequent person-relationship network analysis.
In addition, the feature data acquired in step S101 includes another very important type of information, including but not limited to: identification numbers, telephone numbers, bank card numbers, QQ numbers, email addresses and IP addresses. A person reading the feature data manually cannot make an immediate judgment on encountering such information. It can therefore be extracted and stored centrally by a keyword extraction algorithm, for example a model such as word2vec, and then further sorted and analyzed so that it is put to better use. For example, identification numbers, mobile phone numbers and QQ numbers can be extracted and, through the relationships of those numbers, associated directly with a particular person, which provides more complete information when analyzing that person's relationships and interactions.
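As a rough illustration of pulling number-like keywords out of free text, the sketch below swaps in plain regular expressions rather than the keyword extraction models (such as word2vec) named above. The patterns, the `PATTERNS` table and the function name `extract_identifiers` are all assumptions made for this example: the phone pattern assumes 11-digit mainland China mobile numbers, and the ID pattern assumes 18-character resident identity numbers.

```python
import re

# Hypothetical patterns; none of these are specified in the patent text.
PATTERNS = {
    # 11-digit mobile number starting with 1, not embedded in a longer digit run
    "phone_num": re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"),
    # 18-character identity number whose last character may be X
    "identi_num": re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)"),
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    # Dotted-quad shape only; octet ranges are not validated
    "ip": re.compile(r"(?<!\d)(?:\d{1,3}\.){3}\d{1,3}(?!\d)"),
}


def extract_identifiers(text):
    """Return every match of each pattern, keyed by identifier kind."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}


sample = ("Zhang called 13812345678, wrote to zhang@example.com "
          "and logged in from 192.168.1.10.")
found = extract_identifiers(sample)
```

Each extracted value could then become a node that is linked to the person it belongs to, which is the association step the paragraph above describes.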
Further, in the feature data obtained in step S101, especially in colloquial and informal text, aliases or nicknames for persons or other keywords often appear, so that the same entity may be identified under several different names. Such keywords are subsequently placed in the relationship graph as nodes and connected to other nodes by edges.
In addition, for information such as bank card numbers and mobile phone numbers, the home location and the company to which the number belongs can be looked up. The number and any related information found, such as geographic location and group information, are each taken as nodes and connected by edges, so that the pieces of information corroborate and authenticate one another.
In the second step, with the keywords as the nodes of the relationship graph, the relationships among the nodes are extracted.
After the keywords have been extracted from the road loan data, the next task is to extract the association relationships between those keywords from the feature data acquired in step S101. The extraction falls into two cases. For structured data, the relationships between nodes are obtained by direct query; the structured data comprises bank transaction records, and the relationships comprise one or a combination of: transfer relationship, charging relationship, payment relationship, conversation relationship and investment relationship. For unstructured data, the relationships between nodes are extracted by using a syntactic analysis algorithm; the unstructured data comprises conversation content and chat records, and the relationships further comprise: reporting relationship, title relationship, job relationship, behavior relationship and intimacy relationship.
Illustratively, for structured feature data acquired in step S101, such as bank transaction records, the remittance relationship between the sender and the recipient can be queried directly from the data. By analogy, the association relationships among the keywords (such as person names, company names, locations and key events) in all the structured data can be obtained. The relationship types to be extracted include: transfer relationship, charging relationship, payment relationship, call relationship, investment relationship, and so on.
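The direct-query case for structured data can be sketched as follows. The record fields reuse the field names listed for S101 (`u_transfer`, `u_receive`, `call_phone`, `called_phone`); representing records as dictionaries, the edge tuple layout and the function name `relation_edges` are assumptions for illustration only.

```python
def relation_edges(bank_records, call_records):
    """Turn structured records into (source, target, relation_type) edges."""
    edges = []
    for rec in bank_records:
        # A bank transaction directly yields a transfer relationship.
        edges.append((rec["u_transfer"], rec["u_receive"], "transfer"))
    for rec in call_records:
        # A call record directly yields a call relationship.
        edges.append((rec["call_phone"], rec["called_phone"], "call"))
    return edges


bank = [{"u_transfer": "person_a", "u_receive": "person_b",
         "transfer_time": "2019-01-05", "transfer_acc": "acc_1"}]
calls = [{"call_phone": "13800000001", "called_phone": "13800000002",
          "call_time": "2019-01-06", "call_dur": 42}]
edges = relation_edges(bank, calls)
```

No parsing is needed here because the relationship is already explicit in the columns of each record; the unstructured case requires syntactic analysis instead.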
Illustratively, for unstructured feature data acquired in step S101, such as call audio and chat records, keyword relationships can be extracted by syntactic analysis. Syntactic analysis is one of the key technologies of natural language processing: it analyzes an input sentence to obtain its syntactic structure, and relationships are derived from syntactic information such as the subject, predicate and object of the sentence. The relationships to be extracted here include: reporting relationship, title relationship, job relationship, behavior relationship, intimacy relationship, and so on.
It should be noted that both keyword recognition and syntactic analysis are existing algorithms; the innovation of the embodiment of the present invention lies mainly in the overall technical idea.
S103: contracting the non-person nodes in the relationship graph into the person nodes to which they correspond.
Illustratively, based on the keywords and the association relationships extracted in S102, a multi-element relationship network is constructed with the keywords as nodes and the association relationships between them as edges. The feature data acquired in step S101 contains much important relationship information, the most important being the relationships between people: which people are in direct contact and interact with one another, and which are indirectly associated through others, is probably among the most important information in the data. In addition, relationships between people and groups (including companies, organizations, etc.) form another large category that can be extracted from the text, such as which people directly belong to a group or work for it.
Based on the multi-element relationship network constructed above, the method further refines it to obtain a relationship graph containing only person nodes. The constructed network contains many kinds of keyword nodes: person name, company name, location, key event, numerals, action words, bank card number, telephone number, identification number, IP address, domain name, email address, and so on. To better analyze the relationships between people in the network, the non-person nodes need to be simplified and merged into attributes of the edges between person nodes. For example, if in the multi-element relationship network person a has an employment relationship with company c and person b also has an employment relationship with company c, then in the simplified person relationship graph the edge between person a and person b is a co-worker relationship.
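The employment example above can be sketched as a small contraction routine. This is one possible reading, not the patent's confirmed procedure: the edge tuple layout, the `DERIVED_TYPE` table, the `shared_` fallback naming and the function name `contract` are assumptions, and only the employment-to-coworker rule comes from the text.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical rule table: a relation shared via one non-person node
# becomes this person-to-person edge type.
DERIVED_TYPE = {"employment": "coworker"}


def contract(edges, persons):
    """Fold non-person nodes into edges between the persons attached to them.

    edges: iterable of (u, v, relation); persons: set of person node names.
    Edges between two non-person nodes are dropped in this sketch.
    """
    person_edges = []
    attached = defaultdict(list)  # (non-person node, relation) -> persons
    for u, v, rel in edges:
        if u in persons and v in persons:
            person_edges.append((u, v, rel))
        elif u in persons:
            attached[(v, rel)].append(u)
        elif v in persons:
            attached[(u, rel)].append(v)
    for (_node, rel), people in attached.items():
        derived = DERIVED_TYPE.get(rel, "shared_" + rel)
        for p, q in combinations(sorted(set(people)), 2):
            person_edges.append((p, q, derived))
    return person_edges


graph = [("a", "company_c", "employment"),
         ("b", "company_c", "employment"),
         ("a", "b", "transfer")]
person_graph = contract(graph, persons={"a", "b"})
```

After contraction the company node is gone and its information survives only as the type of the new edge between a and b.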
S104: determining the weights of the edges according to the types of the edges between person nodes, and dividing the relationship graph into a plurality of node sets; wherein the types of the edges include one or a combination of: employment relationship, co-worker relationship, transfer relationship, charging relationship, payment relationship, call relationship, investment relationship, reporting relationship, title relationship, job relationship, behavior relationship and intimacy relationship.
Specifically, the step may include the following:
41) randomly assigning a unique ID to each node in the relationship graph after the node contraction operation has been performed, and assigning a preset weight to each edge according to the type of the edge between adjacent nodes;
42) for each node, calculating the summarized weight of the node by using the formula $W_{ab} = \sum w_{ab} + \sum w_{ba}$, wherein $W_{ab}$ is the summarized weight between node a and node b; $w_{ab}$ is the weight of an edge pointing from node a to node b; and $w_{ba}$ is the weight of an edge pointing from node b to node a;
43) updating the ID of the node to the ID of the neighbor node with the largest summarized weight, and returning to step 42) until the ID of no node changes any more;
44) dividing the nodes with the same ID into one node set, thereby obtaining a plurality of node sets.
Illustratively, the ID of each node is updated with reference to the IDs of its neighbor nodes: among all neighbors, the ID of the node on the edge with the largest summarized weight calculated in step 42) is taken as the node's new ID. All nodes are updated simultaneously; the IDs used in the calculation are the IDs from before the update, and the updated IDs do not take part in the calculation of the current round.
For example, suppose the ID of node A is 1 and the ID of node B is 2, there is an edge between node A and node B, and node B is the largest-weight neighbor of node A; then after the update the ID of node A is replaced with 2. If node A was also the largest-weight neighbor of node B before the update, the ID of node B is replaced with 1. After one round of calculation is complete, new iterations continue until the IDs of all nodes no longer change. At that point, nodes with the same ID are very strongly associated and belong to the same node set.
It should be emphasized that, within one round of updating, the ID that a node adopts is the ID that its neighbor held at the end of the previous round.
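A minimal sketch of weighted ID propagation follows. It deliberately departs from the text in three labeled ways, all of which are assumptions of this example: updates are applied one node at a time (asynchronously) rather than simultaneously, because in the two-node example above A and B would simply keep swapping IDs under a simultaneous update; summarized weights are added up per candidate ID rather than per single neighbor; and initial IDs are assigned in sorted order instead of randomly so the run is reproducible. The weight values in `EDGE_WEIGHTS` are hypothetical, since the text gives no concrete presets.

```python
from collections import defaultdict

# Hypothetical preset weight per edge type.
EDGE_WEIGHTS = {"transfer": 3.0, "coworker": 2.0, "call": 1.0}


def summarize_weights(edges):
    """W_ab: sum of weights of all edges a->b plus all edges b->a."""
    W = defaultdict(lambda: defaultdict(float))
    for src, dst, edge_type in edges:
        w = EDGE_WEIGHTS[edge_type]
        W[src][dst] += w
        W[dst][src] += w
    return W


def partition(edges, max_rounds=100):
    """Split person nodes into node sets by propagating IDs along heavy edges."""
    W = summarize_weights(edges)
    nodes = sorted(W)
    ids = {node: i for i, node in enumerate(nodes)}  # unique starting IDs
    for _ in range(max_rounds):  # cap added as a safety net
        changed = False
        for node in nodes:
            score = defaultdict(float)
            for neighbor, w in W[node].items():
                score[ids[neighbor]] += w
            best = max(score.values())
            candidates = [label for label, s in score.items() if s == best]
            # Keep the current ID on ties so the labels can settle.
            new_id = ids[node] if ids[node] in candidates else min(candidates)
            if new_id != ids[node]:
                ids[node] = new_id
                changed = True
        if not changed:
            break
    groups = defaultdict(set)
    for node, label in ids.items():
        groups[label].add(node)
    return sorted(groups.values(), key=sorted)


edges = [("a", "b", "transfer"), ("b", "c", "transfer"), ("a", "c", "coworker"),
         ("d", "e", "transfer"), ("c", "d", "call")]
node_sets = partition(edges)
```

In this toy graph the weak call edge between c and d is outweighed by the transfer and coworker edges on either side, so the nodes settle into two sets.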
S105: for each node set, obtaining the probability that the nodes in the node set are road loan gang members according to the degree of overlap between the node set and predetermined data on known road loan offenders, and taking the persons corresponding to node sets whose probability exceeds a preset threshold as road loan gang members.
For each node set, a formula (an image that is not reproduced in this text) may be used to calculate the probability that the persons corresponding to the nodes in the node set are road loan gang members, where S is that probability; m is the number of nodes in the node set; and n is the number of nodes in the node set whose corresponding persons appear in the predetermined data on known road loan offenders.
It is understood that the offenders already known are those against whom there is already clear evidence of road loan conduct.
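Because the formula for S is not reproduced in this text, the sketch below assumes the simplest ratio consistent with the definitions of m and n, namely S = n / m. That ratio, the threshold value of 0.5 and the function names are assumptions for illustration, not the patent's confirmed formula.

```python
def gang_probability(node_set, known_offenders):
    """Assumed score S = n / m: share of the set made up of known offenders."""
    m = len(node_set)                    # nodes in the set
    n = len(node_set & known_offenders)  # of which are known offenders
    return n / m if m else 0.0


def flag_gangs(node_sets, known_offenders, threshold=0.5):
    """Keep the node sets whose score exceeds the preset threshold."""
    return [s for s in node_sets
            if gang_probability(s, known_offenders) > threshold]


sets_found = [{"a", "b", "c"}, {"d", "e"}]
offenders = {"a", "b"}
flagged = flag_gangs(sets_found, offenders)
```

Under this assumption every person in a flagged set, including those not yet in the offender data (c in the example), would be treated as a suspected gang member.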
By applying the embodiment of the invention, a relationship graph is built from the feature data involved in the road loan operation process, and a relationship graph containing only relationships between persons is derived from it. The person-only relationship graph is divided into a plurality of node sets by iterating over the edge weights, the probability that each node set is a road loan gang is judged from the number of known road loan offenders appearing in that node set, and the corresponding road loan gangs are thereby identified from the data of already known road loan offenders.
In addition, traditional road loan partner identification mainly relies on reading and analyzing written records from different sources, sorting out the relationships among people, and identifying key persons and clues. During the analysis, personnel at different levels need to study the same case repeatedly, and important clues hidden in the multiple differing statements of the same person and in summaries of hundreds of statement files are found through comparative analysis. After the entity relationships in the road loan case are sorted out, statistical analysis of the data already held, such as the criminals' call information, transfer information and chat records, is used to determine whether a criminal belongs to a group. The identified group members are then analyzed in the same way in turn until all group members are found.
As the forms of road loan multiply, the information that can be used for road loan partner identification keeps growing. The traditional working mode of manually reading, understanding and analyzing data clues faces a new challenge: the types and volume of information related to a single road loan case exceed what a human can comprehend, the investigative clues hidden in the information are not easily found, a large amount of manpower is usually consumed in screening redundant information, and truly valuable clues may be identified only at the very end or even overlooked.
The invention provides a knowledge-graph-based analysis method that aims to solve the technical problems of the prior art, namely that multiple types of information are difficult to analyze comprehensively, effective clues are difficult to locate accurately, and data visualization is poor. The method can effectively display information from multiple channels and analyze the relationships among the persons involved in a road loan in multiple dimensions, making road loan partner analysis more comprehensive and accurate.
Compared with the prior art, the invention has the following beneficial effects: being based on knowledge graph technology, the invention can extract and integrate the feature data obtained from different sources in step S101 and gather it into a single relationship graph, which not only greatly improves data visualization but also makes it easier to uncover the deep relationships hidden behind a complex network. Compared with traditional statistical analysis, partner identification based on graph analysis offers higher accuracy and interpretability.
Example 2
Corresponding to Embodiment 1 of the invention, this embodiment of the invention also provides a system for identifying a road loan partner.
Fig. 2 is a schematic structural diagram of a system for identifying a road loan partner according to an embodiment of the present invention. As shown in Fig. 2, the system includes:
a first obtaining module 201, configured to obtain feature data involved in a road loan operation process, where the feature data includes: communication data, transaction records and personal information of persons involved in the road loan process;
a building module 202, configured to use the keywords included in the feature data as nodes, and build a relationship graph including the nodes according to relationships among the nodes;
a contracting module 203, configured to contract non-person nodes in the relationship graph into person nodes corresponding to the non-person nodes;
a dividing module 204, configured to determine the weight of each edge according to the type of the edge between person nodes, and divide the relationship graph into a plurality of node sets; wherein the types of the edges include: one or a combination of employment relationship, colleague relationship, transfer relationship, charging relationship, payment relationship, conversation relationship, investment relationship, reporting relationship, title relationship, job relationship, behavior relationship and intimacy relationship;
a second obtaining module 205, configured to, for each node set, obtain the probability that the nodes in the node set are road loan partner members according to the degree of coincidence between the node set and the data of predetermined road loan criminals, and take the persons corresponding to the node sets whose probability is greater than a preset threshold as road loan partner members.
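The contraction performed by the contracting module 203 can be pictured with a small Python sketch. It assumes, hypothetically, that each non-person node (a phone number, a bank card, and so on) has already been linked to the person it belongs to through an `owner` mapping, and that edges are `(source, target, edge_type)` triples. Dropping edges that collapse onto a single person is also an assumption of this sketch; none of these names or choices come from the original text.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (source node, target node, edge type)


def contract_to_persons(edges: List[Edge], owner: Dict[str, str]) -> List[Edge]:
    """Rewrite each edge endpoint to the person node that owns it.

    Nodes missing from `owner` are treated as person nodes already.
    Edges whose two endpoints map to the same person are dropped (assumption).
    """
    contracted = []
    for src, dst, kind in edges:
        p_src = owner.get(src, src)
        p_dst = owner.get(dst, dst)
        if p_src != p_dst:
            contracted.append((p_src, p_dst, kind))
    return contracted
```

Under these assumptions, a transfer edge between two bank card nodes becomes a transfer edge between the two card holders, so the resulting graph contains only person nodes.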
By applying the embodiment of the invention, a relationship graph is built from the feature data involved in the road loan operation process, and a relationship graph containing only the relationships between persons is derived from it. The person-only relationship graph is divided into a plurality of node sets by iterating over the edge weights, the probability that each node set is a road loan partner is judged from the number of known road loan criminals appearing in the node set, and the corresponding road loan partner is thereby identified from the data of existing road loan criminals.
In a specific implementation manner of the embodiment of the present invention, the building module 202 is configured to:
extract the keywords contained in the feature data by using a natural language processing algorithm, wherein the keywords include: one or a combination of a person's name, a place name, a company name, an identification number, a telephone number, a bank card number, a QQ number, an email address, an IP address, the home location of a number, and the company to which a number belongs.
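The text does not name a specific natural language processing algorithm. As a stand-in, the Python sketch below uses plain regular expressions, which can only cover the pattern-like keyword types: email addresses, IPv4 addresses, and an 11-digit mobile number format assumed here. Names of persons, places and companies would need a real named-entity recognizer. The `PATTERNS` table and `extract_keywords` are hypothetical names.

```python
import re
from typing import Dict, List

# Illustrative patterns only; the phone format is an assumption of this sketch.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ipv4": re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?![\d.])"),
    "phone": re.compile(r"(?<!\d)1\d{10}(?!\d)"),
}


def extract_keywords(text: str) -> Dict[str, List[str]]:
    """Return every match of each pattern, keyed by keyword type."""
    return {kind: pattern.findall(text) for kind, pattern in PATTERNS.items()}
```

Each extracted keyword would then become a node of the relationship graph, with its keyword type recorded so that non-person nodes can later be told apart from person nodes.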
In a specific implementation manner of the embodiment of the present invention, the building module 202 is further configured to:
for structured data, obtain the relationships between nodes by direct query, wherein the structured data includes: bank transaction records; and the relationships between the nodes include: one or a combination of a transfer relationship, a charging relationship, a payment relationship, a conversation relationship and an investment relationship;
for unstructured data, extract the relationships between nodes by using a syntactic analysis algorithm, wherein the unstructured data includes: conversation content and chat records; and the relationships between the nodes further include: a reporting relationship, a title relationship, a job relationship, a behavior relationship and an intimacy relationship.
In a specific implementation manner of the embodiment of the present invention, the dividing module 204 is configured to:
41) randomly assign a unique ID to each node in the relationship graph after the node contraction operation, and assign a preset weight to each edge according to the type of the edge between adjacent nodes;
42) for each node, calculate the weight summary of the node by using the formula $W_{ab} = \sum w_{ab} + \sum w_{ba}$, wherein $W_{ab}$ is the weight summary between node a and node b; $w_{ab}$ is the weight of an edge pointing from node a to node b; and $w_{ba}$ is the weight of an edge pointing from node b to node a;
43) update the ID of the node to the ID of the neighbor node with the largest weight summary, and return to step 42) until the ID of every node no longer changes;
44) divide the nodes with the same ID into one node set, thereby obtaining a plurality of node sets.
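The weight summary of step 42) and the grouping of step 44) can be sketched in Python as follows. The preset weights per edge type are invented for illustration, since the text does not give actual values, and `weight_summary` and `group_by_id` are hypothetical helper names.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source, target, edge type), directed

# Hypothetical preset weights; the original text does not list actual values.
PRESET_WEIGHT = {"transfer": 3.0, "conversation": 1.0, "intimacy": 5.0}


def weight_summary(edges: List[Edge]) -> Dict[FrozenSet[str], float]:
    """Compute W_ab = sum of w_ab + sum of w_ba, keyed by the unordered pair {a, b}."""
    totals: Dict[FrozenSet[str], float] = defaultdict(float)
    for src, dst, kind in edges:
        # Both directions between a and b accumulate into the same pair.
        totals[frozenset((src, dst))] += PRESET_WEIGHT[kind]
    return dict(totals)


def group_by_id(ids: Dict[str, int]) -> List[Set[str]]:
    """Step 44): nodes that share an ID form one node set."""
    groups: Dict[int, Set[str]] = defaultdict(set)
    for node, node_id in ids.items():
        groups[node_id].add(node)
    return list(groups.values())
```

Keying the summary by an unordered pair reflects that $W_{ab}$ adds up the edges in both directions, so a transfer from a to b and a call from b to a strengthen the same tie.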
In a specific implementation manner of the embodiment of the present invention, the second obtaining module 205 is configured to:
for each node set, use the formula
[Formula image: Figure BDA0002255034590000161]
to calculate the probability that the persons corresponding to the nodes in the node set are road loan partner members, wherein
S is the probability that the persons corresponding to the nodes in the node set are road loan partner members; m is the number of nodes in the node set; and n is the number of nodes in the node set whose corresponding persons appear in the data of predetermined road loan criminals.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for identifying a road loan partner, the method comprising:
1) acquiring feature data involved in a road loan operation process, wherein the feature data comprises: communication data, transaction records and personal information of persons involved in the road loan process;
2) taking keywords contained in the feature data as nodes, and constructing a relationship graph comprising the nodes according to the relationships among the nodes;
3) contracting non-person nodes in the relationship graph into the person nodes corresponding to the non-person nodes;
4) determining the weight of each edge according to the type of the edge between person nodes, and dividing the relationship graph into a plurality of node sets; wherein the types of the edges include: one or a combination of employment relationship, colleague relationship, transfer relationship, charging relationship, payment relationship, conversation relationship, investment relationship, reporting relationship, title relationship, job relationship, behavior relationship and intimacy relationship;
5) for each node set, obtaining the probability that the nodes in the node set are road loan partner members according to the degree of coincidence between the node set and the data of predetermined road loan criminals, and taking the persons corresponding to the node sets whose probability is greater than a preset threshold as road loan partner members.
2. The method for identifying a road loan partner as claimed in claim 1, wherein the step 2) comprises:
extracting the keywords contained in the feature data by using a natural language processing algorithm, wherein the keywords comprise: one or a combination of a person's name, a place name, a company name, an identification number, a telephone number, a bank card number, a QQ number, an email address, an IP address, the home location of a number, and the company to which a number belongs.
3. The method for identifying a road loan partner as claimed in claim 1, wherein obtaining the relationships between the nodes in step 2) comprises:
for structured data, obtaining the relationships between nodes by direct query, wherein the structured data comprises: bank transaction records; and the relationships between the nodes comprise: one or a combination of a transfer relationship, a charging relationship, a payment relationship, a conversation relationship and an investment relationship;
for unstructured data, extracting the relationships between nodes by using a syntactic analysis algorithm, wherein the unstructured data comprises: conversation content and chat records; and the relationships between the nodes further comprise: a reporting relationship, a title relationship, a job relationship, a behavior relationship and an intimacy relationship.
4. The method for identifying a road loan partner as claimed in claim 1, wherein the step 4) comprises:
41) randomly assigning a unique ID to each node in the relationship graph after the node contraction operation, and assigning a preset weight to each edge according to the type of the edge between adjacent nodes;
42) for each node, calculating the weight summary of the node by using the formula $W_{ab} = \sum w_{ab} + \sum w_{ba}$, wherein $W_{ab}$ is the weight summary between node a and node b; $w_{ab}$ is the weight of an edge pointing from node a to node b; and $w_{ba}$ is the weight of an edge pointing from node b to node a;
43) updating the ID of the node to the ID of the neighbor node with the largest weight summary, and returning to step 42) until the ID of every node no longer changes;
44) dividing the nodes with the same ID into one node set, thereby obtaining a plurality of node sets.
5. The method for identifying a road loan partner as claimed in claim 1, wherein obtaining the probability that the nodes in the node set are road loan partner members comprises:
for each node set, using the formula
[Formula image: Figure FDA0002255034580000021]
to calculate the probability that the persons corresponding to the nodes in the node set are road loan partner members, wherein
S is the probability that the persons corresponding to the nodes in the node set are road loan partner members; m is the number of nodes in the node set; and n is the number of nodes in the node set whose corresponding persons appear in the data of predetermined road loan criminals.
6. A system for identifying a road loan partner, the system comprising:
a first acquisition module, configured to acquire feature data involved in a road loan operation process, wherein the feature data comprises: communication data, transaction records and personal information of persons involved in the road loan process;
a construction module, configured to take the keywords contained in the feature data as nodes and construct a relationship graph comprising the nodes according to the relationships among the nodes;
a contraction module, configured to contract non-person nodes in the relationship graph into the person nodes corresponding to the non-person nodes;
a dividing module, configured to determine the weight of each edge according to the type of the edge between person nodes and divide the relationship graph into a plurality of node sets; wherein the types of the edges include: one or a combination of employment relationship, colleague relationship, transfer relationship, charging relationship, payment relationship, conversation relationship, investment relationship, reporting relationship, title relationship, job relationship, behavior relationship and intimacy relationship;
and a second acquisition module, configured to obtain, for each node set, the probability that the nodes in the node set are road loan partner members according to the degree of coincidence between the node set and the data of predetermined road loan criminals, and take the persons corresponding to the node sets whose probability is greater than a preset threshold as road loan partner members.
7. The system for identifying a road loan partner as claimed in claim 6, wherein the construction module is configured to:
extract the keywords contained in the feature data by using a natural language processing algorithm, wherein the keywords comprise: one or a combination of a person's name, a place name, a company name, an identification number, a telephone number, a bank card number, a QQ number, an email address, an IP address, the home location of a number, and the company to which a number belongs.
8. The system for identifying a road loan partner as claimed in claim 6, wherein the construction module is configured to:
for structured data, obtain the relationships between nodes by direct query, wherein the structured data comprises: bank transaction records; and the relationships between the nodes comprise: one or a combination of a transfer relationship, a charging relationship, a payment relationship, a conversation relationship and an investment relationship;
for unstructured data, extract the relationships between nodes by using a syntactic analysis algorithm, wherein the unstructured data comprises: conversation content and chat records; and the relationships between the nodes further comprise: a reporting relationship, a title relationship, a job relationship, a behavior relationship and an intimacy relationship.
9. The system for identifying a road loan partner as claimed in claim 6, wherein the dividing module is configured to:
41) randomly assign a unique ID to each node in the relationship graph after the node contraction operation, and assign a preset weight to each edge according to the type of the edge between adjacent nodes;
42) for each node, calculate the weight summary of the node by using the formula $W_{ab} = \sum w_{ab} + \sum w_{ba}$, wherein $W_{ab}$ is the weight summary between node a and node b; $w_{ab}$ is the weight of an edge pointing from node a to node b; and $w_{ba}$ is the weight of an edge pointing from node b to node a;
43) update the ID of the node to the ID of the neighbor node with the largest weight summary, and return to step 42) until the ID of every node no longer changes;
44) divide the nodes with the same ID into one node set, thereby obtaining a plurality of node sets.
10. The system for identifying a road loan partner as claimed in claim 6, wherein the second acquisition module is configured to:
for each node set, use the formula
[Formula image: Figure FDA0002255034580000051]
to calculate the probability that the persons corresponding to the nodes in the node set are road loan partner members, wherein
S is the probability that the persons corresponding to the nodes in the node set are road loan partner members; m is the number of nodes in the node set; and n is the number of nodes in the node set whose corresponding persons appear in the data of predetermined road loan criminals.
CN201911049749.4A 2019-10-31 2019-10-31 Method and system for identifying trepanning loan group partner Active CN110766091B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911049749.4A CN110766091B (en) 2019-10-31 2019-10-31 Method and system for identifying trepanning loan group partner

Publications (2)

Publication Number Publication Date
CN110766091A true CN110766091A (en) 2020-02-07
CN110766091B CN110766091B (en) 2024-02-27

Family

ID=69334905

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant