CN110223168B - Label propagation anti-fraud detection method and system based on enterprise relationship map - Google Patents
- Publication number: CN110223168B
- Application number: CN201910546944.1A
- Authority: CN (China)
- Prior art keywords: enterprise, blacklist, graph, node, fraud
- Prior art date
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/03—Credit; Loans; Processing thereof
Abstract
The invention discloses a label propagation anti-fraud detection method and system based on an enterprise relationship map, belonging to the field of financial credit. The technical problem addressed is how to analyze complex network data effectively so as to find valuable information and further mine the fraud risk embodied in complex network relationships. The method comprises the following steps: S1, establishing an enterprise blacklist library; S2, constructing a relationship map: screening the tables and fields in a relational database that are to be included in the relationship map, and extracting the entities and entity relationships from the relational database; S3, performing anti-fraud detection on enterprises based on the self-built blacklist library and the enterprise relationship map: identifying the blacklist nodes of the relationship map based on the blacklist library, extracting the connected subgraphs that contain blacklist nodes, identifying fraudulent enterprise nodes in each connected subgraph with a label propagation algorithm, and estimating the probability that an enterprise is fraudulent. The system comprises an enterprise blacklist library establishing unit, a relationship map construction unit and an anti-fraud detection unit.
Description
Technical Field
The invention relates to the field of financial credit, in particular to a label propagation anti-fraud detection method and a label propagation anti-fraud detection system based on an enterprise relationship map.
Background
In the current financial market, online fraud risk changes very frequently, and what used to be fraud by single individuals has rapidly evolved into organized, large-scale group fraud with associated risks. Traditional anti-fraud measures, however, mainly identify individual risk through identity authentication, logical verification of client information, comparison against external information, blacklist filtering and the like; they cannot mine potential group fraud from the countless interwoven relationships among entities, so this risk gap needs to be covered by a network-based, global risk identification capability. Because the relationships of many large enterprises are intricate, traditional graphical representations are no longer adequate. To address this, patent document CN107229756A discloses a design method and system for visually displaying an enterprise relationship map, comprising the following steps: capturing the main information of the enterprise to be queried from the National Enterprise Credit Information Publicity System with a web crawler, and storing the first-layer enterprise relationship data in a graph database; taking the enterprise's shareholders and the companies it invests in as keywords, capturing their main enterprise information from the same system with the web crawler again, and storing it in an enterprise main information database; storing the second-layer enterprise relationship data in the graph database; repeating until the last layer of enterprise relationship data contains no further shareholders or invested companies; and generating the enterprise relationship map from the graph database. That technical scheme lets the system user quickly understand enterprise relationships and grasp a company's development trend.
However, in that scheme the relationships between enterprises are not presented intuitively, and its processing efficiency on large data samples remains to be verified.
An enterprise relationship map takes enterprises as points so that the relationship between two enterprises can be understood intuitively. Enterprise relationships are complex, however, and handling the data of thousands of enterprises manually consumes a large amount of manpower and material resources, so the efficiency of constructing the enterprise relationship map is low.
Therefore, how to effectively analyze complex network data to find valuable information and further mine the fraud risk embodied in complex network relationships is a technical problem that urgently needs to be solved.
The label propagation methods common in the prior art have low time complexity and are suitable for complex networks, but their label updating is unstable and the number of communities they find depends on specific parameters.
Patent document CN109583620A discloses a method, apparatus, computer device and storage medium for early warning of potential enterprise risk. The method comprises: acquiring an enterprise association graph and extracting the node association relationships; acquiring the risk parameter label carried by a propagation start node in the enterprise association graph; obtaining the propagation path of the risk parameter label from the label and the node association relationships; obtaining the propagation coefficients between the nodes on the propagation path; and performing label propagation on the risk parameter label according to the propagation path and the propagation coefficients to obtain potential-risk early-warning information for the nodes. That scheme applies a label propagation algorithm starting from the initial risk nodes, searches for propagation paths, feeds back potential risk data along those paths and issues early warnings, its application scenario being early warning of potential enterprise risk; however, it cannot effectively analyze complex network data to find valuable information and further mine the fraud risk embodied in complex network relationships.
Patent document CN108038700A discloses an anti-fraud data analysis method and system: a data mart and a graph database are obtained from a back-end database and sent to an analysis model, the data mart and the graph database being generated from pre-collected basic data about fraud; the analysis model analyzes the data mart and the graph database to obtain an analysis result; and the analysis result is output to the front end and displayed. That scheme is a modeling method combining individual fraud risk identification with group fraud risk; it aims to enrich the relationship map visualization with fraud detection results, and the data it analyzes are personal data. It likewise cannot effectively analyze complex network data to find valuable information and further mine the fraud risk embodied in complex network relationships.
Disclosure of Invention
The invention provides a label propagation anti-fraud detection method and a label propagation anti-fraud detection system based on an enterprise relationship map, and aims to solve the problem of how to effectively analyze complex network data to find valuable information and further mine fraud risks embodied by complex network relationships.
The technical task of the invention is achieved as follows. A label propagation anti-fraud detection method based on an enterprise relationship map comprises the following steps:
S1, establishing an enterprise blacklist library: collecting original network data with a data acquisition technology and storing it in a relational database; screening the tables and fields in the relational database that can be included in the anti-fraud blacklist library; preprocessing the relevant data by extraction, fusion and deduplication; and establishing the enterprise anti-fraud blacklist library;
S2, constructing a relationship map: screening the tables and fields in the relational database that are to be included in the relationship map, extracting the entities and entity relationships from the relational database, and constructing the relationship map;
S3, performing anti-fraud detection on enterprises based on the self-built blacklist library and the enterprise relationship map: identifying the blacklist nodes of the relationship map based on the blacklist library, extracting the connected subgraphs that contain blacklist nodes, identifying fraudulent enterprise nodes in each connected subgraph with a label propagation algorithm, and estimating the probability that an enterprise is fraudulent.
Preferably, the specific steps of establishing the enterprise blacklist library in step S1 are as follows:
S101, data acquisition and storage: acquiring, with a data acquisition technology, data covering nationwide enterprise information, blacklist information and the information of each loss-of-credit enterprise, and storing the acquired data in a relational database;
S102, screening the objects to be stored in the blacklist library: based on the business goal of establishing an anti-fraud blacklist library, screening the relevant tables and fields in the relational database that are to be stored in the blacklist library;
S103, deduplicating the data to be stored: deduplicating the selected data, with the unified social credit code uniquely identifying each enterprise;
S104, data updating: regularly updating the relevant table data in the relational database and synchronously updating the enterprise information in the enterprise blacklist library.
Preferably, the enterprise blacklist library comprises an illegal fundraising enterprise list, a loss-of-credit enterprise list, a business and/or customs loss-of-credit enterprise list, a Credit China loss-of-credit financial enterprise list, a loss-of-credit logistics enterprise list and a judicial-risk-related enterprise list.
Preferably, the enterprise information in step S101 includes an enterprise name, a social credit code, and a blacklisting time.
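As an illustration of the deduplication in step S103, the following minimal Python sketch merges raw blacklist rows into one entry per unified social credit code. The field names, the sample enterprises and codes, and the choice to keep the earliest blacklisting time are assumptions made for illustration; the invention does not specify them.

```python
from datetime import date


def build_blacklist(records):
    """Merge raw rows into one blacklist entry per unified social credit code."""
    merged = {}
    for rec in records:
        # Normalise the code so that formatting differences do not create duplicates.
        code = rec["social_credit_code"].strip().upper()
        entry = merged.setdefault(code, {
            "name": rec["name"],
            "social_credit_code": code,
            "blacklisted_on": rec["blacklisted_on"],
            "sources": set(),
        })
        # Assumption: when an enterprise appears on several lists, keep the earliest date.
        entry["blacklisted_on"] = min(entry["blacklisted_on"], rec["blacklisted_on"])
        entry["sources"].add(rec["source"])
    return merged


# Made-up sample rows; the codes below are not real enterprises.
raw = [
    {"name": "Alpha Trading Co.", "social_credit_code": "91370100MA3C0000X1",
     "blacklisted_on": date(2018, 5, 1), "source": "loss_of_credit"},
    {"name": "Alpha Trading Co., Ltd.", "social_credit_code": " 91370100ma3c0000x1",
     "blacklisted_on": date(2017, 3, 9), "source": "customs"},
    {"name": "Beta Logistics", "social_credit_code": "91370200MA3D1111Y2",
     "blacklisted_on": date(2019, 1, 15), "source": "logistics"},
]
blacklist = build_blacklist(raw)
```

Keying the result by the normalised code means that a periodic refresh, as in step S104, can reuse the same function on the updated rows.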
Preferably, the specific steps of constructing the relationship map in step S2 are as follows:
S201, screening the tables relevant to the relationship map: enterprise data covering the whole country is collected in the relational database, including basic enterprise information, enterprise branches, enterprise changes, contact details, external guarantees, external investments, chattel mortgages, shareholders and loan information;
S202, extracting nodes and node relationships: extracting two types of entities, companies (enterprises) and individuals (legal persons and key personnel), as the nodes of the relationship map, and extracting the triples describing investment, being invested in, guarantee, legal person and position held among enterprises, legal persons and key personnel as the node relationships of the relationship map;
S203, weight assignment: assigning different weights to the node relationships of the relationship map according to how much each entity relationship contributes to anti-fraud detection, and establishing the relationship map of enterprise social relationships based on neo4j.
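Steps S202 and S203 can be sketched in plain Python as follows. This is only an illustrative in-memory structure, not the neo4j model used by the invention; the relation names and the weight values in `RELATION_WEIGHTS` are hypothetical, since the text does not state which weights are assigned.

```python
from collections import defaultdict

# Hypothetical weights per relation type; the source gives no concrete values.
RELATION_WEIGHTS = {
    "INVESTS_IN": 1.0,
    "GUARANTEES": 0.8,
    "LEGAL_PERSON_OF": 0.6,
    "HOLDS_POSITION_AT": 0.4,
}


def build_graph(triples):
    """Turn (head, relation, tail) triples into typed nodes and weighted directed edges."""
    nodes = {}                 # node name -> entity type ("company" or "person")
    edges = defaultdict(dict)  # head -> {tail: weight}
    for (head, head_type), relation, (tail, tail_type) in triples:
        nodes[head] = head_type
        nodes[tail] = tail_type
        weight = RELATION_WEIGHTS[relation]
        # Assumption: if two nodes are linked by several relations, keep the strongest.
        edges[head][tail] = max(weight, edges[head].get(tail, 0.0))
    return nodes, edges


triples = [
    (("A Co.", "company"), "INVESTS_IN", ("B Co.", "company")),
    (("A Co.", "company"), "GUARANTEES", ("B Co.", "company")),
    (("Zhang", "person"), "LEGAL_PERSON_OF", ("A Co.", "company")),
]
nodes, edges = build_graph(triples)
```

The "being invested in" relation is the reverse direction of `INVESTS_IN`, so the sketch stores a single directed edge rather than two.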
Preferably, the specific steps of performing anti-fraud detection on enterprises based on the self-built blacklist library and the enterprise relationship map in step S3 are as follows:
S301, labeling the blacklist nodes of the relationship map: extracting the enterprise blacklist data from the established blacklist library, searching for the blacklist enterprises that appear in the relationship map, and setting the seed_label attribute of the blacklist nodes in the relationship map;
S302, extracting the blacklist connected subgraphs: for the relationship map labeled from the blacklist library, extracting the subgraph weakly connected to each blacklist enterprise based on the Connected Components algorithm in the neo4j graph library;
S303, performing label propagation anti-fraud detection on the blacklist connected subgraphs: for each extracted blacklist connected subgraph, applying the Label Propagation algorithm in the neo4j graph library, setting the algorithm's node, node relationship, relationship weight, iteration count and seed node parameters, tuning the Label Propagation algorithm iteratively to obtain the community to which each enterprise belongs, and calculating the probability that the enterprise is prejudged to be fraudulent.
Preferably, the specific steps of labeling the blacklist nodes of the relationship map in step S301 are as follows:
S30101, reading each enterprise in the external blacklist library in turn and checking whether a node for that enterprise exists in the relationship map:
(1) if the enterprise node exists, assigning a value to the seed_label attribute of the enterprise node and executing step S30102;
(2) if not, continuing with the next blacklist enterprise;
S30102, performing algorithm modeling with the nodes that have been assigned a value in the relationship map as the seed nodes for label propagation.
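The lookup in steps S30101 and S30102 amounts to the following minimal sketch. It assumes, for illustration only, that graph nodes are held in a dictionary keyed by social credit code and that the seed value for a blacklisted enterprise is 1; the node keys shown are placeholders.

```python
def label_seed_nodes(graph_nodes, blacklist_codes, fraud_label=1):
    """Set seed_label on every blacklist enterprise found in the graph; return the seeds."""
    seeds = []
    for code in blacklist_codes:
        node = graph_nodes.get(code)
        if node is None:
            # The blacklist enterprise has no node in the relationship map: skip it.
            continue
        node["seed_label"] = fraud_label
        seeds.append(code)
    return seeds


# Placeholder node keys standing in for unified social credit codes.
graph_nodes = {
    "CODE-A": {"name": "A Co."},
    "CODE-B": {"name": "B Co."},
}
seeds = label_seed_nodes(graph_nodes, ["CODE-A", "CODE-X"])
```

Nodes without a `seed_label` attribute are the unlabeled enterprises whose labels are later inferred by propagation.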
Preferably, the specific steps of extracting the blacklist connected subgraphs in step S302 are as follows:
S30201, searching for the subgraph connected to any given node based on the Connected Components algorithm in neo4j; the specific steps are as follows:
(1) performing data modeling based on the Label Propagation algorithm in neo4j and setting the initial seed node information;
(2) setting the nodes used for label propagation, the node relationships, the relationship weights used and the iteration parameters;
(3) adjusting the parameters and running the label propagation algorithm iteratively to obtain a satisfactory partition of the nodes;
S30202, each node thus has a corresponding subgraph SG, and for any two nodes u and v in SG there is a path u -> v or a path v -> u;
S30203, storing the blacklist connected-subgraph information as an attribute on each node of the relationship map.
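The extraction of blacklist connected subgraphs can be sketched without a graph database as follows. This is a plain union-find stand-in for the Connected Components algorithm, not the neo4j procedure itself; it treats every edge as undirected, which is the usual meaning of weak connectivity, and the function names are hypothetical.

```python
def weakly_connected_components(nodes, edges):
    """Map each node to a component representative, ignoring edge direction."""
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_v] = root_u
    return {n: find(n) for n in nodes}


def blacklist_subgraphs(nodes, edges, seeds):
    """Return only the components that contain at least one blacklist (seed) node."""
    component_of = weakly_connected_components(nodes, edges)
    seeded = {component_of[s] for s in seeds}
    groups = {}
    for node, comp in component_of.items():
        if comp in seeded:
            groups.setdefault(comp, set()).add(node)
    return list(groups.values())


nodes = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "B"), ("C", "B"), ("D", "E")]
subgraphs = blacklist_subgraphs(nodes, edges, seeds={"A"})
```

The `component_of` mapping corresponds to the per-node attribute of step S30203; components without a seed, such as the one formed by D and E, are left out of later processing.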
Preferably, the specific steps of performing label propagation anti-fraud detection on the blacklist connected subgraphs in step S303 are as follows:
S30301, establishing a complete graph over the enterprises, both labeled and unlabeled, with each enterprise as a node;
S30302, initializing: calculating the weight of the edge between every two enterprises with a weight formula to obtain the similarity between the enterprises;
S30303, propagating the label of each labeled enterprise to all enterprises along the edges, an enterprise influencing an adjacent enterprise more easily the heavier the edge between them; the calculation formula of the edge weight between enterprises is as follows:
S30304, defining an (l + u) × (l + u) probability propagation matrix T, where l and u are the numbers of labeled and unlabeled enterprises, and from it obtaining the probability that the label of enterprise j propagates to enterprise i;
S30305, through probability propagation, concentrating each enterprise's probability distribution on a given category: enterprise labels are propagated according to the edge weights, that is, each enterprise sums the label values passed on by its neighboring enterprises, weighted by the propagation probabilities, and updates its own probability distribution;
S30306, clamping the labeled enterprises: resetting the probability distribution of each labeled enterprise to its initial value, then returning to step S30304 and repeating until the iteration ends; the probability distributions of similar enterprises converge toward one another, and the iteration can finish once similar enterprises have been grouped into one class.
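Steps S30301 to S30306 can be illustrated with a small pure-Python sketch. Several things here are assumptions rather than the invention's own definitions: the edge weights are supplied directly as a matrix because the weight formula is not reproduced in this text; the propagation probability is taken as the column-normalised weight, T[i][j] = w[i][j] / sum over k of w[k][j]; and a second seed class (a known non-fraudulent enterprise, label 0) is added so that the two-class example is not trivial.

```python
def propagate_labels(weights, seed_labels, n_classes=2, max_iter=1000, tol=1e-9):
    """Iterative label propagation with clamped seeds.

    weights     -- square matrix (list of lists) of non-negative edge weights
    seed_labels -- dict mapping node index to its known class
    Returns one probability distribution over classes per node.
    """
    n = len(weights)
    # T[i][j]: probability that the label of node j propagates to node i (assumed form).
    col_sums = [sum(weights[k][j] for k in range(n)) for j in range(n)]
    T = [[weights[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]

    def initial_row(i):
        if i in seed_labels:
            return [1.0 if c == seed_labels[i] else 0.0 for c in range(n_classes)]
        return [1.0 / n_classes] * n_classes  # unlabeled nodes start uniform

    Y = [initial_row(i) for i in range(n)]
    for _ in range(max_iter):
        new_Y = []
        for i in range(n):
            if i in seed_labels:
                new_Y.append(initial_row(i))  # clamp labeled nodes to their initial value
                continue
            row = [sum(T[i][j] * Y[j][c] for j in range(n)) for c in range(n_classes)]
            total = sum(row)
            new_Y.append([v / total for v in row] if total else Y[i][:])
        delta = max(abs(a - b) for r1, r2 in zip(new_Y, Y) for a, b in zip(r1, r2))
        Y = new_Y
        if delta < tol:
            break
    return Y


# Chain of four enterprises 0 - 1 - 2 - 3 with unit weights (illustrative values).
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
# Node 0 is blacklisted (class 1); node 3 is a hypothetical known-good enterprise (class 0).
probabilities = propagate_labels(W, seed_labels={0: 1, 3: 0})
fraud_probability = [row[1] for row in probabilities]
```

For this chain the fraud probabilities of the two unlabeled enterprises converge to 0.75 for the neighbor of the blacklisted node and 0.25 for the neighbor of the known-good node, while the seeds keep their clamped values.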
A label propagation anti-fraud detection system based on an enterprise relationship map comprises:
an enterprise blacklist library establishing unit, used for establishing the enterprise anti-fraud blacklist library by preprocessing (extraction, fusion and deduplication) the original network data collected with a data acquisition technology;
a relationship map construction unit, used for screening the tables and fields in the relational database that are relevant to the relationship map and extracting the entities and entity relationships from the relational database to construct the relationship map;
an anti-fraud detection unit, used for performing anti-fraud detection on enterprises based on the self-built blacklist library and the enterprise relationship map and estimating the probability that an enterprise is fraudulent.
The label propagation anti-fraud detection method and system based on the enterprise relationship map have the following advantages:
The invention is a research method that integrates a relationship map with a label propagation algorithm: on a relationship map built from the rich social relationships of enterprises, a self-built blacklist library and a graph-library algorithm are used to prejudge the fraud risk of enterprises and identify fraudulent enterprises. Compared with the prior art, it has the following beneficial effects:
(1) compared with traditional anti-fraud methods based on a rule engine, which distinguish fraudulent behavior from normal operation with a rule model, the invention overcomes the drawback that a rule engine requires expert knowledge distilled from a large number of historical cases;
(2) compared with machine-learning methods, which describe the characteristics of fraudulent behavior, build a classification model on historical data by data mining and then identify fraud, the invention overcomes the drawback that a large amount of user behavior data must be collected;
(3) the method is better suited to the scenario in which an enterprise submits a loan application for the first time and no loan application behavior can yet be collected, so the method has richer application scenarios;
(4) the relationship map is built from enterprise social relationships, mainly using enterprise data, and the application scenario is pre-loan anti-fraud detection in the credit field;
(5) discriminating fraud with a graph algorithm over the complex social relationship network of enterprises has a stronger theoretical basis and can effectively identify group fraud;
(6) by later introducing real-time big-data processing, continuously enriching the enterprise social relationships and fusing multiple algorithms, the method can identify more potential fraudulent enterprises with greater accuracy, and has a very broad application prospect;
(7) the invention provides an enterprise anti-fraud detection method based on a label propagation algorithm over an enterprise relationship map; it applies to enterprises as credit subjects in the credit field, performs anti-fraud detection when an enterprise initiates a credit application, and enriches the implementation methods and application scenarios of enterprise anti-fraud detection;
(8) the blacklist nodes in the relationship map are labeled from the self-built blacklist library, the connected subgraphs of the blacklist nodes are found, and label propagation is performed on those subgraphs to prejudge whether an enterprise is fraudulent; the objects processed are all blacklist connected subgraphs, and the application scenario is pre-loan anti-fraud detection in the credit process.
Drawings
The invention is further described below with reference to the accompanying drawings.
FIG. 1 is a flow chart of relational graph building;
FIG. 2 is a block diagram of a process for anti-fraud detection for an enterprise based on a self-built blacklist repository and an enterprise relationship graph;
FIG. 3 is a block flow diagram of example 3.
Detailed Description
The label propagation anti-fraud detection method and system based on the enterprise relationship map are described in detail below with reference to the drawings and specific embodiments.
Example 1:
The label propagation anti-fraud detection method based on an enterprise relationship map of the invention comprises the following steps:
S1, establishing an enterprise blacklist library: collecting original network data with a data acquisition technology and storing it in a relational database; screening the tables and fields in the relational database that can be included in the anti-fraud blacklist library; preprocessing the relevant data by extraction, fusion and deduplication; and establishing the enterprise anti-fraud blacklist library. The specific steps are as follows:
S101, data acquisition and storage: acquiring, with a data acquisition technology, data covering nationwide enterprise information, blacklist information and the information of each loss-of-credit enterprise, and storing the acquired data in a relational database; the enterprise information includes an enterprise name, a social credit code and a blacklisting time.
S102, screening the objects to be stored in the blacklist library: based on the business goal of establishing an anti-fraud blacklist library, screening the relevant tables and fields in the relational database that are to be stored in the blacklist library;
S103, deduplicating the data to be stored: deduplicating the selected data, with the unified social credit code uniquely identifying each enterprise;
S104, data updating: regularly updating the relevant table data in the relational database and synchronously updating the enterprise information in the enterprise blacklist library. The enterprise blacklist library comprises an illegal fundraising enterprise list, a loss-of-credit enterprise list, a business and/or customs loss-of-credit enterprise list, a Credit China loss-of-credit financial enterprise list, a loss-of-credit logistics enterprise list and a judicial-risk-related enterprise list.
S2, constructing a relationship map: screening the tables and fields in the relational database that are to be included in the relationship map, extracting the entities and entity relationships from the relational database, and constructing the relationship map. A relationship map is a graph-based data structure consisting of nodes and edges: each node represents an entity, and each edge represents a relationship between two entities. A relationship map links different entities together through their relationships and so makes it possible to analyze problems from the perspective of relationships. Its structure depends on how the relationships between entities are defined; in practical problems these definitions must follow the business requirements and are often very complex. Here the relationship map takes enterprises, legal persons and key personnel as entity nodes and extracts the social relationships among these entities as the entity relationships. As shown in FIG. 1, the specific steps are as follows:
S201, screening the tables relevant to the relationship map: enterprise data covering the whole country is collected in the relational database, including basic enterprise information, enterprise branches, enterprise changes, contact details, external guarantees, external investments, chattel mortgages, shareholders and loan information;
S202, extracting nodes and node relationships: extracting two types of entities, companies (enterprises) and individuals (legal persons and key personnel), as the nodes of the relationship map, and extracting the triples describing investment, being invested in, guarantee, legal person and position held among enterprises, legal persons and key personnel as the node relationships of the relationship map;
S203, weight assignment: assigning different weights to the node relationships of the relationship map according to how much each entity relationship contributes to anti-fraud detection, and establishing the relationship map of enterprise social relationships based on neo4j.
S3, performing anti-fraud detection on enterprises based on the self-built blacklist library and the enterprise relationship map: identifying the blacklist nodes of the relationship map based on the blacklist library, extracting the connected subgraphs that contain blacklist nodes, identifying fraudulent enterprise nodes in each connected subgraph with a label propagation algorithm, and estimating the probability that an enterprise is fraudulent. Anti-fraud detection based on a relationship map mainly falls into two cases, supervised models and unsupervised models, and the better modeling method is chosen according to whether samples can be labeled in the actual modeling scenario. The self-built blacklist library can label the blacklist nodes in the relationship map well, and these nodes take part in modeling as labeled training samples; but the number of labeled blacklist nodes is very small compared with the huge number of nodes in the relationship map, so a semi-supervised modeling method is mainly adopted. As shown in FIG. 2, the specific steps are as follows:
S301, labeling the blacklist nodes of the relationship map: extracting the enterprise blacklist data from the established blacklist library, searching for the blacklist enterprises that appear in the relationship map, and setting the seed_label attribute of the blacklist nodes in the relationship map. When an enterprise has few social relationships, as with a small or micro enterprise, it often appears in the constructed relationship map as an isolated node or as part of a group of size two formed with another enterprise or entity. When an enterprise's social relationship information is rich, larger groups appear, with more than three or even more than ten closely related nodes; in that case, if one enterprise in the group is marked as a blacklist enterprise, the other enterprises in the group carry a considerable fraud risk. On this basis, after labeling the blacklist nodes in the relationship map from the blacklist library, the invention searches for the connected subgraph containing each such node and performs anti-fraud detection within each blacklist-associated connected subgraph using the semi-supervised label propagation method. The specific steps are as follows:
S30101, reading each enterprise in the external blacklist library in turn and checking whether a node for that enterprise exists in the relationship map:
(1) if the enterprise node exists, assigning a value to the seed_label attribute of the enterprise node and executing step S30102;
(2) if not, continuing with the next blacklist enterprise;
S30102, performing algorithm modeling with the nodes that have been assigned a value in the relationship map as the seed nodes for label propagation.
S302, extracting a blacklist connection subgraph: extracting a connection subgraph which is weakly communicated with each blacklist enterprise based on a Connected Components algorithm in a neo4j graph library aiming at the relation graph labeled based on the blacklist library; the Label Propagation Algorithm (LPA) is a fast algorithm for finding communities in graphs, using only the network structure as a guide to monitor communities in relational graphs, and does not require a predefined objective function or any a priori information about communities. In the LPA, initial labels can be assigned to nodes to narrow the scope of generating the final solution, i.e. a semi-supervised modeling approach is adopted to find the communities of initial communities that we pick in person. The principle of the label propagation algorithm is to predict the enterprise without the label based on the existing labeled enterprise, the existing enterprise and the label category, wherein the input is x unlabeled enterprises, y labeled enterprises and labels thereof, and the output is x unlabeled enterprises labels. The method comprises the following specific steps:
S30201, find the subgraph connected to any given node using the Connected Components algorithm in neo4j; the specific steps are as follows:
firstly, perform data modeling based on the Label Propagation algorithm in neo4j and set the initial seed node information;
secondly, set the nodes, the node relationships, the relationship weights to use, and the iteration parameters for label propagation;
thirdly, tune the parameters and run the label propagation algorithm iteratively until a satisfactory partition of the nodes is obtained;
S30202, each node belongs to one subgraph SG, and for any two nodes u and v in SG there is a path u -> v or a path v -> u;
S30203, store the blacklist connected subgraph information as an attribute on each node of the relationship graph.
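As an illustration of S302, the sketch below extracts the connected components that contain at least one blacklist seed. It uses a plain union-find in Python instead of the neo4j Connected Components procedure named in the text, and it ignores edge direction, which is how weak connectivity is usually defined. The function name `blacklist_components` and the attribute name `component_id` are assumptions, and every edge endpoint is assumed to appear in `nodes`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def blacklist_components(nodes: List[str],
                         edges: Iterable[Tuple[str, str]],
                         seeds: Iterable[str]) -> List[Set[str]]:
    """Return the weakly connected components that contain at least one seed."""
    parent: Dict[str, str] = {n: n for n in nodes}

    def find(x: str) -> str:
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Edge direction is ignored: u -> v and v -> u both merge the two groups.
    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v

    groups: Dict[str, Set[str]] = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)

    seed_roots = {find(s) for s in seeds if s in parent}
    return sorted((groups[r] for r in seed_roots), key=min)


nodes = ["A", "B", "C", "D", "E", "F"]
edges = [("A", "B"), ("C", "B"), ("D", "E")]   # F is an isolated node
components = blacklist_components(nodes, edges, seeds=["B"])

# S30203: store the component id on each node as an attribute
# (component_id is a hypothetical attribute name).
component_id = {n: i for i, comp in enumerate(components) for n in comp}
print(components, component_id)
```

Only the group {A, B, C} is kept because it contains the seed B; the pair D-E and the isolated node F have no blacklist member and are left out of the later propagation step.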
S303, performing label propagation anti-fraud detection on the blacklist connected subgraphs: for each extracted blacklist connected subgraph, apply the Label Propagation algorithm in the neo4j graph database, set the algorithm's nodes, node relationships, relationship weights, number of iterations and seed nodes, tune the algorithm iteratively to obtain the community each enterprise belongs to, and calculate the probability that each enterprise is predicted to be fraudulent. The specific steps are as follows:
S30301, build a complete graph of the enterprises, labeled and unlabeled, with each enterprise as a node;
S30302, initialize: compute the weight of the edge between every two enterprises with the weight formula to obtain the similarity between enterprises;
S30303, each labeled enterprise propagates its label to all enterprises along the edges, and an enterprise influences a neighbor more strongly the larger the weight of the edge between them; the calculation formula of the edge weight among the enterprises is as follows:
S30304, define an (l + u) × (l + u) probability propagation matrix T, where l and u are the numbers of labeled and unlabeled enterprises and the entry T_ij is the probability that the label of enterprise j propagates to enterprise i;
S30305, propagate the enterprise labels according to the edge weights so that the probability distribution of each enterprise concentrates on a given category: each enterprise sums the label values propagated from its surrounding enterprises, weighted by the propagation probabilities, and updates its own probability distribution;
S30306, clamp the labeled enterprises by resetting their probability distributions to the initial values, then return to step S30304; the iteration ends when the probability distributions of similar enterprises have converged to similar values, at which point those enterprises are assigned to the same class.
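The iteration in S30304 to S30306 can be sketched in pure Python as follows. Several points are assumptions rather than the patent's method: the edge-weight formula is not reproduced in this text, so the weights are passed in directly; the propagation matrix is row-normalized, T[i][j] = w_ij / Σ_k w_ik, which is one common variant; and a second seed class of known-normal enterprises (class 0) is introduced so that the two-class example is meaningful, whereas the text only describes blacklist seeds (class 1 here). The function name `propagate_labels` is hypothetical.

```python
from typing import Dict, List, Tuple


def propagate_labels(nodes: List[str],
                     weights: Dict[Tuple[str, str], float],
                     seeds: Dict[str, int],
                     n_classes: int = 2,
                     max_iter: int = 1000,
                     tol: float = 1e-9) -> Dict[str, List[float]]:
    """Clamped label propagation; returns a class distribution per node."""
    idx = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)

    # Symmetric edge-weight matrix; pairs that are not listed keep weight 0.
    W = [[0.0] * n for _ in range(n)]
    for (a, b), w in weights.items():
        W[idx[a]][idx[b]] = W[idx[b]][idx[a]] = float(w)

    # T[i][j]: share of node i's update that comes from node j (rows sum to 1).
    T = []
    for row in W:
        total = sum(row)
        T.append([w / total for w in row] if total > 0 else [0.0] * n)

    def clamp(Y: List[List[float]]) -> None:
        # S30306: reset labeled nodes to their initial one-hot distribution.
        for name, cls in seeds.items():
            Y[idx[name]] = [1.0 if c == cls else 0.0 for c in range(n_classes)]

    Y = [[1.0 / n_classes] * n_classes for _ in range(n)]
    clamp(Y)

    for _ in range(max_iter):
        new = []
        for i in range(n):
            # S30305: weighted sum of the neighbors' current distributions.
            row = [sum(T[i][j] * Y[j][c] for j in range(n)) for c in range(n_classes)]
            new.append(row if sum(row) > 0 else Y[i][:])  # isolated node: keep as is
        clamp(new)
        delta = max(abs(new[i][c] - Y[i][c]) for i in range(n) for c in range(n_classes))
        Y = new
        if delta < tol:
            break
    return {name: Y[idx[name]] for name in nodes}


nodes = ["A", "B", "C", "D"]
weights = {("A", "B"): 3.0, ("B", "C"): 1.0, ("C", "D"): 3.0}
seeds = {"A": 1, "D": 0}          # A: blacklisted seed, D: assumed known-normal seed
result = propagate_labels(nodes, weights, seeds)
fraud_probability = {name: dist[1] for name, dist in result.items()}
print(fraud_probability)  # B is close to 0.8 and C close to 0.2
```

B is tied to the blacklisted seed A by a heavy edge and ends up with a fraud probability near 0.8, while C, which is tied more strongly to D, ends up near 0.2. This is the "heavier edges influence neighbors more" behavior described in S30303.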
Example 2:
The label propagation anti-fraud detection system based on an enterprise relationship graph according to the invention comprises:
an enterprise blacklist library establishing unit, used for establishing the enterprise anti-fraud blacklist library by preprocessing (extracting, fusing and deduplicating) the original network data collected by a data acquisition technology;
a relationship graph construction unit, used for screening the tables and fields of the relational database that are relevant to the relationship graph, extracting the entities and entity relationships, and constructing the relationship graph;
and an anti-fraud detection unit, used for performing anti-fraud detection on enterprises based on the self-built blacklist library and the enterprise relationship graph and estimating the probability that an enterprise is fraudulent.
Example 3:
This example applies the method to a bank's online loan review:
As shown in fig. 3, the specific steps are as follows:
(I) model training, with the following specific steps:
(1) collect original network data through a data acquisition technology and store it in a relational database;
(2) based on enterprise data covering the whole country, screen the tables that can be included in the anti-fraud blacklist library, extract and fuse the table data, and build the enterprise anti-fraud blacklist library; that is, the blacklist library is updated regularly from the relational database;
(3) extract entities such as enterprises, legal persons and key personnel from the database, together with the social relationships among them, such as investing, being invested in, guaranteeing and acting as legal person, and construct the enterprise relationship graph with neo4j graph database technology; the node relationships in the graph are given different weights according to how strongly each social relationship bears on fraud; that is, the neo4j graph database is updated regularly from the blacklist library;
(4) based on the self-built blacklist library and the enterprise relationship graph, extract the maximal connected subgraphs of the blacklisted enterprises with the Connected Components algorithm in the neo4j graph database, identify the enterprises at risk of fraud within those subgraphs with the label propagation algorithm, and estimate the probability that each enterprise is fraudulent; that is, anti-fraud detection is rerun regularly using the blacklist library and the neo4j graph database;
(II) the online loan review unit completes the online loan review process using the data obtained from model training, with the following specific steps:
(1) the enterprise initiates a loan review;
(2) before the loan, authenticity verification and fraud detection are performed, and whether the enterprise meets the requirements is judged:
if yes, execute step (3);
if not, skip to step (6);
(3) during the loan, credit rating is performed, and whether the enterprise meets the requirements is judged:
if yes, execute step (4);
if not, skip to step (6);
(4) after the loan, risk monitoring and risk early warning are performed, and whether the enterprise meets the requirements is judged:
if yes, execute step (5);
if not, skip to step (6);
(5) the bank obtains the risk-control service data;
(6) the online loan review process ends.
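The staged review in part (II) amounts to a short-circuiting pipeline: each stage either passes the enterprise on to the next stage or ends the process. The Python sketch below illustrates that control flow only. The stage names, the field names (`fraud_probability`, `credit_score`, `risk_alerts`) and all thresholds are hypothetical and do not come from the patent.

```python
from typing import Callable, Dict, List, Optional, Tuple

Enterprise = Dict[str, float]
Stage = Tuple[str, Callable[[Enterprise], bool]]

# Hypothetical stage checks; the thresholds and field names are illustrative only.
STAGES: List[Stage] = [
    ("pre_loan_fraud_check", lambda e: e["fraud_probability"] < 0.5),   # step (2)
    ("in_loan_credit_rating", lambda e: e["credit_score"] >= 600),      # step (3)
    ("post_loan_risk_monitoring", lambda e: e["risk_alerts"] == 0),     # step (4)
]


def review_loan(enterprise: Enterprise,
                stages: List[Stage] = STAGES) -> Dict[str, object]:
    """Run the stages in order and stop at the first one that fails."""
    passed: List[str] = []
    failed: Optional[str] = None
    for name, check in stages:
        if not check(enterprise):
            failed = name          # requirement not met: skip to step (6)
            break
        passed.append(name)
    # Step (5) is reached only when every stage passed; step (6) ends the process.
    return {"completed": failed is None, "failed_stage": failed, "passed": passed}


risky = {"fraud_probability": 0.8, "credit_score": 700, "risk_alerts": 0}
sound = {"fraud_probability": 0.1, "credit_score": 720, "risk_alerts": 0}
print(review_loan(risky)["failed_stage"])  # pre_loan_fraud_check
print(review_loan(sound)["completed"])     # True
```

In this sketch the fraud probability produced by model training feeds the pre-loan check, which is where the label propagation result enters the review process.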
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (7)
1. A label propagation anti-fraud detection method based on an enterprise relationship map is characterized by comprising the following steps:
S1, establishing an enterprise blacklist library: original network data is collected by a data acquisition technology and stored in a relational database; the tables and fields in the relational database that can be included in the anti-fraud blacklist library are screened; the relevant data is preprocessed, the preprocessing comprising extraction, fusion and deduplication; and the enterprise anti-fraud blacklist library is established;
S2, constructing a relationship graph: screening the tables and fields in the relational database that are relevant to the relationship graph, extracting the entities and entity relationships from the relational database, and constructing the relationship graph; the specific steps are as follows:
S201, screening the tables relevant to the relationship graph: enterprise data covering the whole country is collected in the relational database, the enterprise data comprising basic enterprise information, enterprise branches, enterprise changes, contact details, external guarantees, external investments, property mortgages, shareholders and loan information;
S202, extracting from the relational database entities such as enterprises, legal persons and key personnel, together with the social relationships among the entities, such as investing, being invested in, guaranteeing and acting as legal person; constructing the enterprise relationship graph using neo4j graph database technology; and giving the node relationships in the enterprise relationship graph different weights according to how strongly each social relationship bears on fraud, that is, regularly updating the neo4j graph database using the blacklist library;
S3, performing anti-fraud detection on enterprises based on the self-built blacklist library and the enterprise relationship graph: identifying the blacklist nodes of the relationship graph based on the blacklist library, extracting the connected subgraphs of the blacklist nodes, identifying fraudulent enterprise nodes in each connected subgraph using a label propagation algorithm, and estimating the probability that an enterprise is fraudulent; the specific steps are as follows:
S301, labeling blacklist nodes in the relationship graph: extracting the enterprise blacklist data from the established blacklist library, finding the blacklisted enterprises that appear in the relationship graph, and setting the seed_label attribute on the blacklist nodes in the relationship graph;
S302, extracting the blacklist connected subgraphs: for the relationship graph labeled from the blacklist library, extracting the subgraph weakly connected to each blacklisted enterprise using the Connected Components algorithm in the neo4j graph database; the specific steps are as follows:
S30201, finding the subgraph connected to any given node using the Connected Components algorithm in neo4j; the specific steps are as follows:
firstly, performing data modeling based on the Label Propagation algorithm in neo4j and setting the initial seed node information;
secondly, setting the nodes, the node relationships, the relationship weights to use, and the iteration parameters for label propagation;
thirdly, tuning the parameters and running the label propagation algorithm iteratively until a satisfactory partition of the nodes is obtained;
S30202, each node belongs to one subgraph SG, and for any two nodes u and v in SG there is a path u -> v or a path v -> u;
S30203, storing the blacklist connected subgraph information as an attribute on each node of the relationship graph;
S303, performing label propagation anti-fraud detection on the blacklist connected subgraphs: for each extracted blacklist connected subgraph, applying the Label Propagation algorithm in the neo4j graph database, setting the algorithm's nodes, node relationships, relationship weights, number of iterations and seed nodes, tuning the algorithm iteratively to obtain the community each enterprise belongs to, and calculating the probability that each enterprise is predicted to be fraudulent.
2. The label propagation anti-fraud detection method based on the enterprise relationship graph as claimed in claim 1, wherein the specific steps of establishing the enterprise blacklist library in the step S1 are as follows:
S101, data acquisition and storage: acquiring, by a data acquisition technology, data covering nationwide enterprise information, blacklist information and information on credit-defaulting enterprises, and storing the acquired data in a relational database;
S102, screening the objects to be stored in the blacklist library: based on the business goal of establishing an anti-fraud blacklist library, screening the relevant tables and the fields to be stored in the blacklist library from the relational database;
S103, deduplicating the stored data: deduplicating the data selected for storage and uniquely identifying each enterprise by its unified social credit code;
S104, data updating: regularly updating the relevant table data in the relational database and synchronously updating the enterprise information in the enterprise blacklist library.
3. The label propagation anti-fraud detection method based on the enterprise relationship graph as claimed in claim 1 or 2, characterized in that the enterprise blacklist library comprises an illegal fundraising enterprise list, a credit-default enterprise list, a wage and/or customs credit-default enterprise list, a Credit China credit-default financial enterprise list, a credit-default logistics enterprise list and a judicial-risk-related enterprise list.
4. The label propagation anti-fraud detection method according to claim 3, wherein the enterprise information in step S101 includes an enterprise name, a social credit code, and a blacklisting time.
5. The label propagation anti-fraud detection method based on enterprise relationship graph of claim 1, wherein the specific steps of labeling relationship graph blacklist nodes in step S301 are as follows:
S30101, reading each enterprise in the external blacklist library in turn and checking whether its node exists in the relationship graph:
firstly, if it does, assigning a value to the seed_label attribute of the enterprise node and executing step S30102;
secondly, if it does not, continuing with the next blacklisted enterprise;
S30102, using the nodes assigned in the relationship graph as seed nodes for label propagation in the algorithm modeling.
6. The method for detecting label propagation anti-fraud based on the enterprise relationship graph according to claim 1, wherein the specific steps of performing label propagation anti-fraud on the blacklist connection subgraph in step S303 are as follows:
S30301, building a complete graph of the enterprises, with each enterprise as a node;
S30302, initializing: computing the weight of the edge between every two enterprises with the weight formula to obtain the similarity between enterprises;
S30303, each labeled enterprise propagating its label to all enterprises along the edges, an enterprise influencing a neighbor more strongly the larger the weight of the edge between them; the calculation formula of the edge weight among the enterprises is as follows:
S30304, defining an (l + u) × (l + u) probability propagation matrix T, where l and u are the numbers of labeled and unlabeled enterprises and the entry T_ij is the probability that the label of enterprise j propagates to enterprise i;
S30305, propagating the enterprise labels according to the edge weights so that the probability distribution concentrates on a given category, namely, each enterprise sums the label values propagated from its surrounding enterprises, weighted by the propagation probabilities, and updates its own probability distribution;
S30306, clamping the labeled enterprises by resetting their probability distributions to the initial values, and returning to step S30304; the iteration ends when the probability distributions of similar enterprises have converged to similar values, at which point those enterprises are assigned to the same class.
7. A label propagation anti-fraud detection system based on an enterprise relationship map is characterized by comprising,
an enterprise blacklist library establishing unit, used for establishing the enterprise anti-fraud blacklist library by preprocessing the original network data collected by a data acquisition technology, wherein the preprocessing comprises extraction, fusion and deduplication;
a relationship graph construction unit, used for screening the tables and fields of the relational database that are relevant to the relationship graph, extracting the entities and entity relationships from the relational database, and constructing the relationship graph; the working process of the relationship graph construction unit is as follows:
S201, screening the tables relevant to the relationship graph: enterprise data covering the whole country is collected in the relational database, the enterprise data comprising basic enterprise information, enterprise branches, enterprise changes, contact details, external guarantees, external investments, property mortgages, shareholders and loan information;
S202, extracting from the relational database entities such as enterprises, legal persons and key personnel, together with the social relationships among the entities, such as investing, being invested in, guaranteeing and acting as legal person; constructing the enterprise relationship graph using neo4j graph database technology; and giving the node relationships in the enterprise relationship graph different weights according to how strongly each social relationship bears on fraud, that is, regularly updating the neo4j graph database using the blacklist library;
an anti-fraud detection unit, used for performing anti-fraud detection on enterprises based on the self-built blacklist library and the enterprise relationship graph and estimating the probability that an enterprise is fraudulent;
the working process of the anti-fraud detection unit is as follows:
S301, labeling blacklist nodes in the relationship graph: extracting the enterprise blacklist data from the established blacklist library, finding the blacklisted enterprises that appear in the relationship graph, and setting the seed_label attribute on the blacklist nodes in the relationship graph;
S302, extracting the blacklist connected subgraphs: for the relationship graph labeled from the blacklist library, extracting the subgraph weakly connected to each blacklisted enterprise using the Connected Components algorithm in the neo4j graph database; the specific steps are as follows:
S30201, finding the subgraph connected to any given node using the Connected Components algorithm in neo4j; the specific steps are as follows:
firstly, performing data modeling based on the Label Propagation algorithm in neo4j and setting the initial seed node information;
secondly, setting the nodes, the node relationships, the relationship weights to use, and the iteration parameters for label propagation;
thirdly, tuning the parameters and running the label propagation algorithm iteratively until a satisfactory partition of the nodes is obtained;
S30202, each node belongs to one subgraph SG, and for any two nodes u and v in SG there is a path u -> v or a path v -> u;
S30203, storing the blacklist connected subgraph information as an attribute on each node of the relationship graph;
S303, performing label propagation anti-fraud detection on the blacklist connected subgraphs: for each extracted blacklist connected subgraph, applying the Label Propagation algorithm in the neo4j graph database, setting the algorithm's nodes, node relationships, relationship weights, number of iterations and seed nodes, tuning the algorithm iteratively to obtain the community each enterprise belongs to, and calculating the probability that each enterprise is predicted to be fraudulent.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910546944.1A CN110223168B (en) | 2019-06-24 | 2019-06-24 | Label propagation anti-fraud detection method and system based on enterprise relationship map |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110223168A CN110223168A (en) | 2019-09-10 |
CN110223168B true CN110223168B (en) | 2022-06-28 |
Family
ID=67814376
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910546944.1A Active CN110223168B (en) | 2019-06-24 | 2019-06-24 | Label propagation anti-fraud detection method and system based on enterprise relationship map |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110223168B (en) |
Families Citing this family (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110717823B (en) * | 2019-09-29 | 2022-08-02 | 支付宝(杭州)信息技术有限公司 | Credit overdue risk identification method and system |
CN110909986A (en) * | 2019-11-04 | 2020-03-24 | 苏宁金融科技(南京)有限公司 | Suspected actual controller risk identification method and system based on knowledge graph |
CN110990587B (en) * | 2019-12-04 | 2023-04-18 | 电子科技大学 | Enterprise relation discovery method and system based on topic model |
CN111131626B (en) * | 2019-12-20 | 2022-01-14 | 珠海高凌信息科技股份有限公司 | Group harmful call detection method and device based on stream data atlas and readable medium |
CN111178615B (en) * | 2019-12-24 | 2023-10-27 | 成都数联铭品科技有限公司 | Method and system for constructing enterprise risk identification model |
CN111031068B (en) * | 2019-12-27 | 2022-04-26 | 杭州安恒信息技术股份有限公司 | DNS analysis method based on complex network |
CN111309822B (en) * | 2020-02-11 | 2023-05-09 | 简链科技(广东)有限公司 | User identity recognition method and device |
CN111414485B (en) * | 2020-03-17 | 2022-09-30 | 北京恒通慧源大数据技术有限公司 | Enterprise customer association relationship map construction method and device, storage and computer |
CN111798092B (en) * | 2020-05-27 | 2024-03-12 | 深圳奇迹智慧网络有限公司 | Customs inspection monitoring method, customs inspection monitoring device, computer equipment and storage medium |
CN111814064B (en) * | 2020-06-24 | 2024-09-13 | 平安科技(深圳)有限公司 | Neo4 j-based abnormal user processing method, neo4 j-based abnormal user processing device, computer equipment and medium |
CN111932174B (en) * | 2020-07-28 | 2024-05-28 | 中华人民共和国深圳海关 | Freight supervision abnormal information acquisition method, device, server and storage medium |
CN112053221A (en) * | 2020-08-14 | 2020-12-08 | 百维金科(上海)信息科技有限公司 | Knowledge graph-based internet financial group fraud detection method |
CN112084343A (en) * | 2020-09-10 | 2020-12-15 | 杭州安恒信息安全技术有限公司 | Method, device and medium for quantifying social relationship graph |
CN112115174A (en) * | 2020-09-15 | 2020-12-22 | 北京通付盾人工智能技术有限公司 | KYC method and system based on graph computing technology |
CN112131275B (en) * | 2020-09-23 | 2023-07-25 | 长三角信息智能创新研究院 | Enterprise portrait construction method of holographic city big data model and knowledge graph |
CN112199450B (en) * | 2020-09-30 | 2024-06-14 | 支付宝(杭州)信息技术有限公司 | Relation map construction method and device and electronic equipment |
CN112200583B (en) * | 2020-10-28 | 2023-12-19 | 交通银行股份有限公司 | Knowledge graph-based fraudulent client identification method |
CN112613763B (en) * | 2020-12-25 | 2024-04-16 | 北京知因智慧科技有限公司 | Data transmission method and device |
CN112767136A (en) * | 2021-01-26 | 2021-05-07 | 天元大数据信用管理有限公司 | Credit anti-fraud identification method, credit anti-fraud identification device, credit anti-fraud identification equipment and credit anti-fraud identification medium based on big data |
CN112785423A (en) * | 2021-02-07 | 2021-05-11 | 撼地数智(重庆)科技有限公司 | Method, device, equipment and storage medium for mining fraud risk node |
CN112966099B (en) * | 2021-02-26 | 2024-06-25 | 北京金堤征信服务有限公司 | Relationship graph display method and device and computer readable storage medium |
CN112989374B (en) * | 2021-03-09 | 2021-11-26 | 闪捷信息科技有限公司 | Data security risk identification method and device based on complex network analysis |
CN113222737B (en) * | 2021-05-25 | 2022-06-14 | 天津大学 | Risk visualization graph layout method for financial network |
CN113516553A (en) * | 2021-07-28 | 2021-10-19 | 中国建设银行股份有限公司 | Credit risk early warning method and device |
CN113988886A (en) * | 2021-10-28 | 2022-01-28 | 深圳永安在线科技有限公司 | Fraud behavior tracking method, device and related equipment based on safety information |
CN115426206B (en) * | 2022-11-07 | 2023-03-24 | 中邮消费金融有限公司 | Graph anti-fraud capability enabling method and system based on homomorphic encryption technology |
CN115983636B (en) * | 2022-12-26 | 2023-11-17 | 深圳市中政汇智管理咨询有限公司 | Risk assessment method, apparatus, device and storage medium |
CN115774793B (en) * | 2023-01-29 | 2023-05-30 | 上海蜜度信息技术有限公司 | Mechanism timeliness detection method, system, electronic equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106127038A (en) * | 2016-06-22 | 2016-11-16 | 中国建设银行股份有限公司 | The processing method of a kind of blacklist and system |
CN109583620A (en) * | 2018-10-11 | 2019-04-05 | 平安科技(深圳)有限公司 | Enterprise's potential risk method for early warning, device, computer equipment and storage medium |
CN109685647A (en) * | 2018-12-27 | 2019-04-26 | 阳光财产保险股份有限公司 | The training method of credit fraud detection method and its model, device and server |
CN109800335A (en) * | 2019-01-23 | 2019-05-24 | 平安科技(深圳)有限公司 | Generation method, device, computer equipment and the storage medium of enterprise's map |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150161622A1 (en) * | 2013-12-10 | 2015-06-11 | Florian Hoffmann | Fraud detection using network analysis |
Non-Patent Citations (1)
Title |
---|
What Important Role Does the Relationship Graph Play in the Anti-Fraud Tug-of-War; 倪伟渊; 《https://www.secrss.com/articles/503》; 20180125; pp. 1-3 * |
Also Published As
Publication number | Publication date |
---|---|
CN110223168A (en) | 2019-09-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110223168B (en) | Label propagation anti-fraud detection method and system based on enterprise relationship map | |
CN110704572B (en) | Suspected illegal fundraising risk early warning method, device, equipment and storage medium | |
CN112053221A (en) | Knowledge graph-based internet financial group fraud detection method | |
Nguyen et al. | Vasabi: Hierarchical user profiles for interactive visual user behaviour analytics | |
CN112132233A (en) | Criminal personnel dangerous behavior prediction method and system based on effective influence factors | |
CN110990718B (en) | Social network model building module of company image lifting system | |
CN105574544A (en) | Data processing method and device | |
CN112053222A (en) | Knowledge graph-based internet financial group fraud detection method | |
CN115794803B (en) | Engineering audit problem monitoring method and system based on big data AI technology | |
CN112417176A (en) | Graph feature-based method, device and medium for mining implicit association relation between enterprises | |
CN116402512B (en) | Account security check management method based on artificial intelligence | |
CN116384889A (en) | Intelligent analysis method for information big data based on natural language processing technology | |
CN110716957B (en) | Intelligent mining and analyzing method for class case suspicious objects | |
CN117829994A (en) | Money laundering risk analysis method based on graph calculation | |
Rabbi et al. | An Approximation For Monitoring The Efficiency Of Cooperative Across Diverse Network Aspects | |
CN107493275A (en) | The extracted in self-adaptive and analysis method and system of heterogeneous network security log information | |
Yu et al. | Predicting nft classification with gnn: A recommender system for web3 assets | |
CN117436729A (en) | Government system based data management and data analysis method | |
CN113408207A (en) | Data mining method based on social network analysis technology | |
CN111833073A (en) | Airline customer segmentation method based on K-Means + + algorithm | |
CN116228402A (en) | Financial credit investigation feature warehouse technical support system | |
CN112506930B (en) | Data insight system based on machine learning technology | |
CN109828995A (en) | A kind of diagram data detection method, the system of view-based access control model feature | |
Zhao et al. | Detecting fake reviews via dynamic multimode network | |
CN114529383A (en) | Method and system for realizing tax payment tracking and tax loss early warning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||