CN111143430A - Guarantee data mining method and system - Google Patents


Info

Publication number
CN111143430A
Authority
CN
China
Prior art keywords
guarantee
entity
graph
identified
chain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911241360.XA
Other languages
Chinese (zh)
Inventor
刘鹏飞
耿少华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Mininglamp Software System Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Mininglamp Software System Co ltd
Priority to CN201911241360.XA
Publication of CN111143430A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof

Abstract

The embodiments of the invention disclose a guarantee data mining method and system. The method comprises: determining a guarantee chain circle candidate set from a pre-stored guarantee network according to the identifier of an entity to be identified; and determining the guarantee chain circles of the entity to be identified from the candidate set according to a predefined guarantee chain circle identification model. Guarantee chain circles can thus be identified from massive data, and the efficiency of guarantee circle identification is improved.

Description

Guarantee data mining method and system
Technical Field
The embodiments of the present invention relate to data mining technologies, and in particular to a guarantee data mining method and system.
Background
In credit business, a bank usually cannot accurately evaluate the repayment capacity of a borrower by analyzing the borrower's operating condition alone, and its risk control measures are limited. To protect creditors' rights and mitigate default risk, the bank therefore requires borrowers to provide mortgages and pledges that meet legal and policy requirements, and adopts guarantee measures when collateral is insufficient. Under this mechanism, multiple enterprises become connected, either mutually or in a chain, and form a guarantee chain circle dominated by guarantee relationships. Although guarantees can mitigate risk, a guarantee chain circle is complex and unstable, and risk spreads through it readily: credit risk arising at one enterprise may spread into regional, industry-wide, or even systemic risk. Actively exploring the identification of guarantee chain circles among credit enterprises is therefore important for preventing the further spread of such risk and for protecting the credit funds of commercial banks.
At present, to identify guarantee chain circles, banks rely on existing systems and data and adopt a scheme built around a structured database, in which the identification algorithm is implemented as SQL (Structured Query Language) stored procedures. The main idea is to traverse the graph generated from the guarantee relationships with a depth-first search (DFS) algorithm and to search all guarantee relationships exhaustively until every guarantee circle is closed and no two circles intersect. That is, enterprises within a guarantee circle are linked to one another by guarantee relationships, enterprises in different guarantee circles have no guarantee relationship with each other, and each enterprise is marked with the identifier of its guarantee circle. The specific steps of this traversal are as follows.
First, the guarantee relationships are extracted and a graph is generated. The guarantee relationships of all loans (excluding duplicates) are exported from the database, and each guarantee relationship is represented as (x, y), where x is the borrower and y is the guarantor. With guarantee relationships as edges and enterprises as nodes, a graph is formed. Since a guarantee relationship has a direction, the graph is a directed graph.
Second, the graph is represented as an adjacency matrix (X, Y). The adjacency matrix is a two-dimensional array in which each dimension ranges over all nodes (i.e., enterprise names) in the graph. When an edge exists between node i and node j (i.e., a guarantee relationship exists), the element in row i, column j is 1; otherwise it is 0. The adjacency matrix turns the complex guarantee relationships into a clear two-dimensional matrix, which lets the DFS quickly find all adjacent nodes of any node in the graph and keeps the search efficient and accurate.
Third, the DFS algorithm is used to identify the guarantee circles: the adjacency matrix (X, Y) is searched and traversed with the DFS algorithm to obtain the guarantee circle identification result.
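The three prior-art steps above can be sketched in Python as follows. This is a minimal illustration rather than any bank's actual stored procedure: the function names are hypothetical, rows are assumed to index the borrower x and columns the guarantor y, and the traversal ignores edge direction so that every enterprise in a connected group receives the same circle identifier.

```python
from typing import Dict, List, Tuple


def build_adjacency_matrix(pairs: List[Tuple[str, str]]):
    """Steps 1 and 2: deduplicate (borrower, guarantor) pairs and build the matrix."""
    names = sorted({name for pair in pairs for name in pair})
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for borrower, guarantor in set(pairs):  # set() drops duplicate relationships
        # Assumed convention: row = borrower x, column = guarantor y.
        matrix[index[borrower]][index[guarantor]] = 1
    return names, matrix


def label_guarantee_circles(names: List[str], matrix: List[List[int]]) -> Dict[str, int]:
    """Step 3: stack-based depth-first traversal that labels each enterprise."""
    labels: Dict[str, int] = {}
    circle_id = 0
    for start in range(len(names)):
        if names[start] in labels:
            continue
        circle_id += 1
        labels[names[start]] = circle_id
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(len(names)):
                # Follow the edge in either direction between i and j.
                if (matrix[i][j] or matrix[j][i]) and names[j] not in labels:
                    labels[names[j]] = circle_id
                    stack.append(j)
    return labels


pairs = [("A", "B"), ("B", "A"), ("B", "C"), ("D", "E"), ("A", "B")]
names, matrix = build_adjacency_matrix(pairs)
labels = label_guarantee_circles(names, matrix)
```

Scanning a full matrix row for every visited node costs time quadratic in the number of enterprises, which is consistent with the scaling concern raised for this scheme.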
With the rapid growth of bank credit business, the volume of credit data is large and guarantee relationships change constantly, so the guarantee graph to be constructed has grown dramatically. Given the complexity of the guarantee relationship network, the guarantee chain circle identification scheme described above can hardly meet the requirements of mining massive data.
Disclosure of Invention
In view of this, an embodiment of the present invention provides a guarantee data mining method, including:
determining a guarantee chain circle candidate set from a pre-stored guarantee network according to the identifier of an entity to be identified;
and determining the guarantee chain circles of the entity to be identified from the guarantee chain circle candidate set according to a predefined guarantee chain circle identification model.
An embodiment of the present invention further provides a guarantee data mining system, including:
a first determining unit, configured to determine a guarantee chain circle candidate set from a pre-stored guarantee network according to the identifier of an entity to be identified;
and a second determining unit, configured to determine the guarantee chain circles of the entity to be identified from the guarantee chain circle candidate set according to a predefined guarantee chain circle identification model.
An embodiment of the present invention further provides a guarantee data mining system, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, where the computer program, when executed by the processor, implements the guarantee data mining method described above.
An embodiment of the present invention further provides a computer-readable storage medium storing an information processing program, where the information processing program, when executed by a processor, implements the steps of the guarantee data mining method described above.
According to the technical solution provided by the embodiments of the present invention, guarantee chain circles can be identified from massive data, and the efficiency of guarantee circle identification is improved.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. Other advantages of the present application may be realized and attained by the instrumentalities and combinations particularly pointed out in the specification and the drawings.
Drawings
The accompanying drawings are included to provide an understanding of the present disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the examples serve to explain the principles of the disclosure and not to limit the disclosure.
FIG. 1 is a schematic flowchart of a guarantee data mining method according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a guarantee data mining method according to another embodiment of the present invention;
FIG. 3 is a schematic flowchart of a guarantee data mining method according to another embodiment of the present invention;
FIG. 4 is a schematic diagram of a guarantee data mining system according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a guarantee data mining system according to another embodiment of the present invention;
FIG. 6 is a schematic illustration of an identified guarantee chain circle according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a guarantee data mining system according to another embodiment of the present invention.
Detailed Description
The present application describes embodiments, but the description is illustrative rather than limiting and it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the embodiments described herein. Although many possible combinations of features are shown in the drawings and discussed in the detailed description, many other combinations of the disclosed features are possible. Any feature or element of any embodiment may be used in combination with or instead of any other feature or element in any other embodiment, unless expressly limited otherwise.
The present application includes and contemplates combinations of features and elements known to those of ordinary skill in the art. The embodiments, features and elements disclosed in this application may also be combined with any conventional features or elements to form a unique inventive concept as defined by the claims. Any feature or element of any embodiment may also be combined with features or elements from other inventive aspects to form yet another unique inventive aspect, as defined by the claims. Thus, it should be understood that any of the features shown and/or discussed in this application may be implemented alone or in any suitable combination. Accordingly, the embodiments are not limited except as by the appended claims and their equivalents. Furthermore, various modifications and changes may be made within the scope of the appended claims.
Further, in describing representative embodiments, the specification may have presented the method and/or process as a particular sequence of steps. However, to the extent that the method or process does not rely on the particular order of steps set forth herein, the method or process should not be limited to the particular sequence of steps described. Other orders of steps are possible as will be understood by those of ordinary skill in the art. Therefore, the particular order of the steps set forth in the specification should not be construed as limitations on the claims. Further, the claims directed to the method and/or process should not be limited to the performance of their steps in the order written, and one skilled in the art can readily appreciate that the sequences may be varied and still remain within the spirit and scope of the embodiments of the present application.
FIG. 1 is a schematic flowchart of a guarantee data mining method according to an embodiment of the present invention. As shown in FIG. 1, the method includes:
Step 101, determining a guarantee chain circle candidate set from a pre-stored guarantee network according to the identifier of an entity to be identified;
Step 102, determining the guarantee chain circles of the entity to be identified from the guarantee chain circle candidate set according to a predefined guarantee chain circle identification model.
Optionally, the guarantee network is a point-edge relationship graph formed by directed connections between points, where a point relationship stores the attributes of the corresponding entity, and an edge relationship connecting two points stores the attributes of the associated entities and the attributes of their association;
the types of guarantee chain circle in the guarantee chain circle identification model include at least one of the following:
simple mutual guarantee, joint guarantee, circular guarantee, guarantee chain, intra-group guarantee circle, and mixed guarantee circle.
Optionally, determining the guarantee chain circle candidate set from the pre-stored guarantee network according to the identifier of the entity to be identified includes:
loading the pre-stored guarantee network by using a graph computation engine;
and identifying, by using a maximal connected subgraph algorithm, the guarantee sub-network associated with the identifier of the entity to be identified from the guarantee network, as the guarantee chain circle candidate set of the entity to be identified.
Optionally, determining the guarantee chain circles of the entity to be identified from the guarantee chain circle candidate set according to the predefined guarantee chain circle identification model includes:
traversing the guarantee chain circle candidate set according to the guarantee chain circle identification model by using a graph computation algorithm, and identifying all guarantee chain circles corresponding to the entity to be identified.
Optionally, before determining the guarantee chain circle candidate set from the pre-stored guarantee network according to the identifier of the entity to be identified, the method further includes:
extracting, by using a graph extraction tool, the point-edge relationships defined in advance from credit system data to form the guarantee network, and storing the guarantee network in a database.
Optionally, the graph computation engine is the Spark GraphX graph computation engine, the maximal connected subgraph algorithm is a depth-first graph search algorithm, the graph computation algorithm is Pregel based on Spark GraphX, the graph extraction tool is a Hive SQL graph extraction tool, and the database is a Hive database.
Optionally, the method further includes:
displaying the guarantee chain circles of the entity to be identified through a graph display tool.
According to the technical solution provided by the embodiments of the present invention, guarantee chain circles can be identified from massive data, and the efficiency of guarantee circle identification is improved.
FIG. 2 is a schematic flowchart of a guarantee data mining method according to another embodiment of the present invention. As shown in FIG. 2, the method includes:
Step 201, extracting, by using a graph extraction tool, the point-edge relationships defined in advance from credit system data to form the guarantee network, and storing the guarantee network in a database;
the guarantee network is a point-edge relationship graph formed by directed connections between points, where a point relationship stores the attributes of the corresponding entity (an entity may also be called a node of the graph), and an edge relationship connecting two points stores the attributes of the associated entities and the attributes of their association.
Optionally, the graph extraction tool is any graph extraction tool in the prior art, such as a Hive SQL graph extraction tool, and the database is any database in the prior art, such as a Hive database. For example, based on the point-edge relationships defined in advance, the point-edge relationships are extracted from the credit system data with the Hive SQL graph extraction tool to form a guarantee graph, which is stored in Hive.
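As a rough illustration of Step 201, the sketch below extracts a point table and an edge table with plain SQL. It uses Python's built-in sqlite3 module as a stand-in for Hive, and the `loan_guarantee` table, its columns, and the sample rows are all hypothetical; the patent does not specify a schema. Edges are assumed to point from the guarantor to the guaranteed borrower.

```python
import sqlite3


def demo_connection() -> sqlite3.Connection:
    """Create an in-memory table of made-up credit records (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE loan_guarantee (
               loan_id TEXT, borrower_id TEXT, borrower_name TEXT,
               guarantor_id TEXT, guarantor_name TEXT, guarantee_type TEXT)"""
    )
    conn.executemany(
        "INSERT INTO loan_guarantee VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("L1", "E1", "Enterprise 1", "E2", "Enterprise 2", "guarantee"),
            ("L2", "E1", "Enterprise 1", "E2", "Enterprise 2", "guarantee"),
            ("L3", "E2", "Enterprise 2", "E3", "Enterprise 3", "guarantee"),
        ],
    )
    return conn


def extract_guarantee_graph(conn: sqlite3.Connection):
    """Return (points, edges): one row per entity and one row per distinct guarantee."""
    points = conn.execute(
        """SELECT borrower_id, borrower_name FROM loan_guarantee
           UNION
           SELECT guarantor_id, guarantor_name FROM loan_guarantee
           ORDER BY 1"""
    ).fetchall()
    edges = conn.execute(
        """SELECT DISTINCT guarantor_id AS src, borrower_id AS dst, guarantee_type
           FROM loan_guarantee
           ORDER BY src, dst"""
    ).fetchall()
    return points, edges


points, edges = extract_guarantee_graph(demo_connection())
```

Keeping points and edges in separate tables matches the point-edge layout described above and is the shape a graph engine typically expects when loading vertices and edges.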
Step 202, loading the pre-stored guarantee network by using a graph computation engine;
optionally, the graph computation engine is any graph computation engine in the prior art, such as the Spark GraphX graph computation engine. For example, the point-edge relationship data stored in Hive is loaded with Spark GraphX as the graph computation engine.
Step 203, identifying, by using a maximal connected subgraph algorithm, the guarantee sub-network associated with the identifier of the entity to be identified from the guarantee network, and using the guarantee sub-network as the guarantee chain circle candidate set of the entity to be identified;
optionally, the maximal connected subgraph algorithm is any such algorithm in the prior art, for example a depth-first graph search algorithm. For example, the guarantee relationship subgraphs (minimum guarantee graphs) of the guarantee graph are identified by a maximal connected subgraph algorithm such as depth-first graph search, and each subgraph is recorded in the node attributes with a node id as the subgraph identifier, so that unrelated enterprises and relationships are removed and the guarantee chain circle candidate set is obtained.
The entity to be identified is the customer to be identified, such as a customer applying for a loan. The identifier of the entity to be identified is an identifier representing that customer's identity, such as the customer id or name.
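Step 203 can be illustrated with a small depth-first search that keeps only the part of the guarantee network connected to the entity to be identified. This is a plain-Python sketch, not the Spark GraphX implementation; the function name is hypothetical, and using the smallest node id as the subgraph identifier is an assumption made here for concreteness.

```python
from collections import defaultdict
from typing import List, Set, Tuple


def candidate_set(edges: List[Tuple[str, str]], target_id: str):
    """Return (nodes, edges, subgraph_id) of the sub-network containing target_id."""
    # Connectivity ignores edge direction, so index neighbours both ways.
    neighbours = defaultdict(set)
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    seen: Set[str] = {target_id}
    stack = [target_id]
    while stack:  # iterative depth-first search from the target entity
        node = stack.pop()
        for nxt in neighbours[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)

    # Keep the original directed edges whose endpoints are in the component.
    sub_edges = [(s, d) for s, d in edges if s in seen and d in seen]
    subgraph_id = min(seen)  # assumed convention: smallest node id names the subgraph
    return seen, sub_edges, subgraph_id


all_edges = [("A", "B"), ("B", "C"), ("C", "A"), ("D", "E")]
nodes, sub_edges, subgraph_id = candidate_set(all_edges, "B")
```

Enterprises D and E are unrelated to B and drop out, which is the pruning effect the step relies on before the more expensive circle traversal.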
Step 204, traversing the guarantee chain circle candidate set according to the guarantee chain circle identification model by using a graph computation algorithm, and identifying all guarantee chain circles corresponding to the entity to be identified;
optionally, the types of guarantee chain circle in the guarantee chain circle identification model include at least one of the following:
simple mutual guarantee, joint guarantee, circular guarantee, guarantee chain, intra-group guarantee circle, and mixed guarantee circle.
Simple mutual guarantee: two enterprises guarantee each other, that is, enterprise A guarantees enterprise B, and enterprise B in turn guarantees enterprise A.
Joint guarantee: three or more customers form a joint guarantee group and provide guarantees for the members of the group.
Circular guarantee: the guarantees among three or more customers form a closed loop.
Guarantee chain: at least three customers and at least two guarantees are involved, and the guarantees are connected but do not form a closed loop.
Intra-group guarantee circle: a guarantee circle formed by guarantees between the parent company and the subsidiaries, and among the members, of a single group customer.
Mixed guarantee circle: a complex guarantee network composed of two or more guarantee circles.
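Two of the definitions above translate directly into graph patterns, as the following sketch shows: a simple mutual guarantee is a pair of opposite edges, and a circular guarantee is a directed cycle through three or more nodes. The function names are hypothetical, the remaining types are not covered, and the cycle enumeration is a brute-force search meant for small candidate sets only.

```python
from collections import defaultdict
from typing import List, Tuple

Edge = Tuple[str, str]  # (guarantor, guaranteed), an assumed direction


def mutual_guarantees(edges: List[Edge]) -> List[Tuple[str, str]]:
    """Pairs (A, B) where A guarantees B and B guarantees A."""
    edge_set = set(edges)
    pairs = {
        tuple(sorted((s, d)))
        for s, d in edge_set
        if s != d and (d, s) in edge_set
    }
    return sorted(pairs)


def circular_guarantees(edges: List[Edge]) -> List[Tuple[str, ...]]:
    """Directed cycles through 3+ nodes, each reported once from its smallest node."""
    successors = defaultdict(set)
    for s, d in edges:
        successors[s].add(d)
    cycles: List[Tuple[str, ...]] = []

    def extend(start: str, path: List[str]) -> None:
        for nxt in sorted(successors.get(path[-1], ())):
            if nxt == start and len(path) >= 3:
                cycles.append(tuple(path))  # closed loop back to the start
            elif nxt > start and nxt not in path:
                extend(start, path + [nxt])  # only visit nodes larger than start

    for start in sorted(successors):
        extend(start, [start])
    return cycles


sample = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "D"), ("D", "B"),
          ("E", "F"), ("F", "G")]
mutual = mutual_guarantees(sample)
circular = circular_guarantees(sample)
```

In the sample, E, F and G are connected by two guarantees without closing a loop, so they match neither pattern; under the definitions above they would fall under the guarantee chain type instead.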
Optionally, the graph computation algorithm is any graph computation algorithm in the prior art, such as Pregel based on Spark GraphX. For example, starting from the guarantee chain circle candidate set obtained in step 203 and using the guarantee chain circle identification model, the candidate set is traversed with Pregel and filtered to obtain all guarantee chain circles of the entity to be identified.
Optionally, traversing the guarantee chain circle candidate set with Pregel includes the following steps:
Step 1, traversing all nodes in the guarantee chain circle candidate set, assigning an initial chain circle identifier id to the target node (i.e., the node corresponding to the entity to be identified), and setting all other nodes to "NULL"; assigning to each edge associated with the target node an attribute that indicates whether the edge has been traversed;
the attributes of a node in the guarantee relationship graph (i.e., the point relationship attributes) include at least one of the following: the identifier id, the corresponding entity name, whether the node is "NULL", and so on. An edge relationship between points in the guarantee relationship graph stores the attributes of the associated entities and of their association, such as corporate client and guarantee, or retail client and guarantee. The edge relationship is directed: for example, an edge pointing from node A to node B indicates that the entity of node A is a corporate client and is the guarantor of the entity of node B.
Step 2, if the attribute of the destination node is "NULL", the source node sends a message to the destination node; if the attribute of the source node is "NULL", the destination node sends a message to the source node; if the nodes at both ends are "NULL", no message is sent; if the nodes at both ends have an id and the edge has not been traversed, a guarantee circle has appeared, and the edge is marked with a guarantee circle identifier based on the guarantee chain circle identification model;
by repeating this iteration, all guarantee chain circles of the target node can be obtained.
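The message-passing rules of Steps 1 and 2 can be simulated in plain Python, one superstep per loop iteration. This is a sketch of the idea only, not the Spark GraphX Pregel API: the function name is hypothetical, edges are assumed to be distinct, and the rule that a node accepts only the first message it receives in a superstep is an assumption added here so that a second edge reaching the same node stays untraversed and is flagged in the next superstep. Classifying the flagged edges against the identification model is left out.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str]


def find_circle_edges(nodes: List[str], edges: List[Edge], target: str):
    """Propagate the target's id and flag untraversed edges whose ends both have an id."""
    # Step 1: the target node gets an id, all other nodes start as NULL (None).
    labels: Dict[str, Optional[str]] = {n: None for n in nodes}
    labels[target] = target
    traversed = {e: False for e in edges}
    circle_edges: List[Edge] = []

    while True:  # one iteration corresponds to one superstep
        inbox: Dict[str, Tuple[str, Edge]] = {}
        flagged: List[Edge] = []
        for edge in edges:
            if traversed[edge]:
                continue
            src, dst = edge
            s, d = labels[src], labels[dst]
            if s is None and d is None:
                continue  # both ends NULL: no message
            if d is None:
                inbox.setdefault(dst, (s, edge))  # source sends to destination
            elif s is None:
                inbox.setdefault(src, (d, edge))  # destination sends to source
            else:
                flagged.append(edge)  # both ends have an id: a circle closes here
        if not inbox and not flagged:
            break
        for edge in flagged:
            traversed[edge] = True
            circle_edges.append(edge)
        for node, (label, edge) in inbox.items():
            labels[node] = label
            traversed[edge] = True  # only the edge that delivered the label
    return labels, circle_edges


graph_nodes = ["A", "B", "C", "D"]
graph_edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
labels, circle_edges = find_circle_edges(graph_nodes, graph_edges, "A")
```

With A as the target, B and C receive the id in the first superstep, and the edge between B and C is flagged in the second because both of its ends already carry an id. The flagged edge closes a loop when direction is ignored; deciding which guarantee type it belongs to would be a separate check.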
Step 205, displaying the guarantee chain circles of the entity to be identified through a graph display tool.
Optionally, the graph display tool may be any graph display tool in the prior art, such as ECharts.
According to the technical solution provided by the embodiments of the present invention, guarantee chain circles can be identified from massive data, and the efficiency of guarantee circle identification is improved.
FIG. 3 is a schematic flowchart of a guarantee data mining method according to another embodiment of the present invention. As shown in FIG. 3, the method includes:
Step 301, based on the point-edge relationships defined in advance, extracting the point-edge relationships from credit system data with the Hive SQL graph extraction tool to form a guarantee graph, and storing the guarantee graph in Hive;
the guarantee graph is the guarantee network of the previous embodiment.
Step 302, loading the guarantee graph stored in Hive with Spark GraphX as the graph computation engine, and identifying the minimum guarantee graph of the entity to be identified through a maximal connected subgraph algorithm;
the minimum guarantee graph is the guarantee sub-network that serves as the guarantee chain circle candidate set in the previous embodiment.
In this step, unrelated enterprises and relationships are removed from the guarantee network, and the guarantee chain circle candidate set is obtained.
Step 303, traversing the minimum guarantee graph with Pregel according to the guarantee chain circle identification model, and identifying all guarantee chain circles corresponding to the entity to be identified;
Step 304, displaying all guarantee chain circles corresponding to the entity to be identified through a graph display tool.
Optionally, the graph display tool is any existing graph display tool, such as ECharts.
According to the technical solution provided by the embodiments of the present invention, the Hive database addresses the storage and representation of massive graph data, the Spark GraphX graph computation engine addresses the performance problems of traversing a complex network with traditional SQL, and guarantee chain circles can be identified from massive graph data. Identifying guarantee chain circles helps a commercial bank sort out how the guarantee chain circles among its customers operate and analyze their effect on risk management. In risk management, this helps the bank identify the structure and risks of guarantee circles, better grasp the operating condition of enterprises, and manage enterprise loan risk dynamically.
FIG. 4 is a schematic diagram of a guarantee data mining system according to an embodiment of the present invention. As shown in FIG. 4, the system includes:
a first determining unit, configured to determine a guarantee chain circle candidate set from a pre-stored guarantee network according to the identifier of an entity to be identified;
and a second determining unit, configured to determine the guarantee chain circles of the entity to be identified from the guarantee chain circle candidate set according to a predefined guarantee chain circle identification model.
Optionally, the guarantee network is a point-edge relationship graph formed by directed connections between points, where a point relationship stores the attributes of the corresponding entity, and an edge relationship connecting two points stores the attributes of the associated entities and the attributes of their association;
the types of guarantee chain circle in the guarantee chain circle identification model include at least one of the following:
simple mutual guarantee, joint guarantee, circular guarantee, guarantee chain, intra-group guarantee circle, and mixed guarantee circle.
Optionally, the first determining unit is specifically configured to load the pre-stored guarantee network by using a graph computation engine,
and to identify, by using a maximal connected subgraph algorithm, the guarantee sub-network associated with the identifier of the entity to be identified from the guarantee network, as the guarantee chain circle candidate set of the entity to be identified.
Optionally, the second determining unit is specifically configured to traverse the guarantee chain circle candidate set according to the guarantee chain circle identification model by using a graph computation algorithm, and to identify all guarantee chain circles corresponding to the entity to be identified.
Optionally, the system further includes:
a third determining unit, configured to, before the guarantee chain circle candidate set is determined from the pre-stored guarantee network according to the identifier of the entity to be identified, extract the point-edge relationships defined in advance from credit system data by using a graph extraction tool to form the guarantee network, and store the guarantee network in a database.
Optionally, the graph computation engine is the Spark GraphX graph computation engine, the maximal connected subgraph algorithm is a depth-first graph search algorithm, the graph computation algorithm is Pregel based on Spark GraphX, the graph extraction tool is a Hive SQL graph extraction tool, and the database is a Hive database.
Optionally, the system further includes: a display unit, configured to display the guarantee chain circles of the entity to be identified through a graph display tool.
According to the technical solution provided by the embodiments of the present invention, guarantee chain circles can be identified from massive data, and the efficiency of guarantee circle identification is improved.
FIG. 5 is a schematic diagram of a guarantee data mining system according to another embodiment of the present invention. As shown in FIG. 5, the system includes:
a connected graph API (Application Programming Interface) and a guarantee chain circle filtering API;
the connected graph API corresponds to the first determining unit in the above embodiment, and the guarantee chain circle filtering API corresponds to the second determining unit in the above embodiment.
The connected graph API is used for determining a guarantee chain circle candidate set from a pre-stored guarantee network according to the identifier of an entity to be identified;
optionally, the guarantee network is a point-edge relationship graph formed by directed connections between points, where a point relationship stores the attributes of the corresponding entity, and an edge relationship connecting two points stores the attributes of the associated entities and the attributes of their association;
the types of guarantee chain circle in the guarantee chain circle identification model include at least one of the following:
simple mutual guarantee, joint guarantee, circular guarantee, guarantee chain, intra-group guarantee circle, and mixed guarantee circle.
Optionally, the connected graph API is specifically configured to load the pre-stored guarantee network by using a graph computation engine,
and to identify, by using a maximal connected subgraph algorithm, the guarantee sub-network associated with the identifier of the entity to be identified from the guarantee network, as the guarantee chain circle candidate set of the entity to be identified.
Optionally, the graph computation engine is any existing graph computation engine, such as the Spark GraphX graph computation engine, and the maximal connected subgraph algorithm is any existing algorithm of that kind, such as a depth-first graph search algorithm.
The system further includes:
a third determining unit, configured to extract the point-edge relationships defined in advance from credit system data by using a graph extraction tool to form the guarantee network, and to store the guarantee network in a database.
Optionally, the graph extraction tool is any existing graph extraction tool, such as a Hive SQL graph extraction tool, and the database is any existing database, such as a Hive database.
For example, this embodiment is described using the guarantee data of a commercial bank. Based on the point-edge relationships defined in advance, the point-edge relationships in the guarantee data are extracted with the Hive SQL tool, and a guarantee network is constructed in which the point relationships and the edge relationships are stored separately. A point relationship stores an enterprise entity and its related attributes, and an edge relationship stores information keyed by the ids of the associated enterprises together with the association attributes. The guarantee network in the Hive database is then used as the input of the connected graph API, which computes the guarantee chain circle candidate set in the guarantee network.
The guarantee chain ring filtering API is used for determining the guarantee chain ring of the entity to be identified from the guarantee chain ring candidate set according to a predefined guarantee chain ring identification model.
Optionally, the guarantee chain ring filtering API is specifically configured to traverse the guarantee chain ring candidate set according to the guarantee chain ring identification model by using a graph computation algorithm, and to identify all guarantee chain rings corresponding to the entity to be identified.
Optionally, the graph computation algorithm may be any existing graph computation algorithm, such as Pregel based on Spark GraphX.
For example, in this embodiment, the client to be identified (i.e., the entity to be identified) and the guarantee chain ring candidate set are used as input, the guarantee chain ring filtering API is called, and all guarantee chain rings in which the client to be identified is located are filtered out.
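The filtering step can be pictured with the sketch below, which enumerates the simple directed cycles passing through the entity by depth-first search. The source does not spell out the guarantee chain ring identification model, so the labelling rule in `classify` (two entities means mutual guarantee, more means circular guarantee) is only an assumed toy rule. The function names, the `max_len` cut-off, and the sample edges are likewise hypothetical, and duplicate edges are assumed to have been removed beforehand.

```python
from collections import defaultdict


def rings_through(edges, entity_id, max_len=6):
    """Enumerate simple directed cycles that pass through entity_id.

    edges: list of (guarantor_id, guaranteed_id) pairs without duplicates.
    max_len: assumed upper bound on the number of entities in one ring.
    """
    successors = defaultdict(list)
    for src, dst in edges:
        successors[src].append(dst)

    rings = []

    def dfs(node, path):
        for nxt in successors[node]:
            if nxt == entity_id:
                rings.append(list(path))  # the path closes back on the entity
            elif nxt not in path and len(path) < max_len:
                dfs(nxt, path + [nxt])

    dfs(entity_id, [entity_id])
    return rings


def classify(ring):
    """Toy labelling rule (an assumption, not the model from the source)."""
    if len(ring) < 2:
        return "other"
    return "mutual guarantee" if len(ring) == 2 else "circular guarantee"


# Hypothetical candidate set for entity A.
edges = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "A"), ("C", "D")]
rings = rings_through(edges, "A")
for ring in rings:
    print(ring, classify(ring))
```

Because the candidate set has already been reduced to the sub-network around the entity, an exhaustive search of this kind stays bounded by the size of that sub-network rather than by the whole guarantee network.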
The system further includes:
a display unit, configured to display all identified guarantee chain rings through a graph display tool.
Optionally, the graph display tool may be any existing graph display tool, such as ECharts. For example, Fig. 6 shows an exemplary guarantee chain ring rendered by the graph display tool. A, B, C, D, E and F denote the nodes of the graph; each node may represent a different entity, and each entity may represent a different client. An edge between two nodes represents the association between them; for example, the edge between A and B (both enterprises) represents a corporate-client guarantee relationship between A and B.
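One possible way to hand an identified ring to ECharts is to serialise it as the option object of a `graph` series, as in the Python sketch below. The helper name `to_echarts_option` and the sample nodes are hypothetical; the option keys used (`type`, `layout`, `edgeSymbol`, `label`, `data`, `links`) are those of the ECharts graph series, but the layout choices are assumptions rather than anything stated in the source.

```python
import json


def to_echarts_option(nodes, edges):
    """Build an ECharts option dict that draws a directed guarantee ring."""
    return {
        "series": [{
            "type": "graph",
            "layout": "force",                # let ECharts position the nodes
            "edgeSymbol": ["none", "arrow"],  # arrow head at the target end
            "label": {"show": True},          # print the node names
            "data": [{"name": n} for n in nodes],
            "links": [{"source": s, "target": t} for s, t in edges],
        }]
    }


# Hypothetical ring A -> B -> C -> A.
option = to_echarts_option(["A", "B", "C"], [("A", "B"), ("B", "C"), ("C", "A")])
payload = json.dumps(option)  # JSON string that a web page can pass to setOption
```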
According to the technical solution provided by this embodiment of the invention, guarantee chain rings can be identified from massive data, and the efficiency of guarantee chain ring identification is improved.
Fig. 7 is a schematic flowchart of a guarantee data mining system according to another embodiment of the present invention. As shown in Fig. 7, the system includes:
a guarantee graph API, a connected graph API, a guarantee chain ring filtering API, and a display unit,
wherein the guarantee graph API corresponds to the third determining unit in the above embodiments.
The guarantee graph API is used for extracting the point-edge relationships from the credit system data with a Hive SQL graph extraction tool, based on the point-edge relationships sorted out in advance, to form a guarantee graph, and for storing the guarantee graph in Hive.
The guarantee graph corresponds to the guarantee network in the above embodiments.
The connected graph API is used for loading the guarantee graph stored in Hive, with Spark GraphX as the graph computation engine, and for identifying the minimal guarantee graph of the entity to be identified by means of a maximum connected graph algorithm.
The minimal guarantee graph corresponds to the guarantee sub-network in the above embodiments and serves as the guarantee chain ring candidate set.
In this step, irrelevant enterprises and their association relationships are removed from the guarantee network, yielding the guarantee chain ring candidate set.
The guarantee chain ring filtering API is used for traversing the minimal guarantee graph with Pregel according to the guarantee chain ring identification model, and identifying all guarantee chain rings corresponding to the entity to be identified.
The display unit is used for displaying all guarantee chain rings corresponding to the entity to be identified through a graph display tool.
Optionally, the graph display tool may be any existing graph display tool, such as ECharts.
According to the technical solution provided by this embodiment of the invention, a corporate-client guarantee graph is constructed, with Hive as the graph storage medium. On the basis of the constructed guarantee graph, graph search algorithms such as maximum connected subgraph and depth-first search are implemented on Pregel of Spark GraphX to search the guarantee graph exhaustively, and a guarantee chain ring identification model is designed in combination with business rules, thereby completing the identification of corporate-client guarantee chain rings. In this way, the data from the credit system is processed to extract the entities and related attributes involved in the credit business, and the guarantee relationships of clients are sorted out and completed.
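To make the Pregel-style computation mentioned above more concrete, the following Python sketch imitates the superstep model on a single machine: in each superstep the active vertices send their current label to their neighbours, and a vertex that receives a smaller label adopts it and stays active. When no vertex changes, every vertex carries the smallest ID of its connected component. This is a generic illustration of the vertex-centric pattern, not the embodiment's Spark GraphX code; the function name, the `max_supersteps` limit, and the sample graph are assumptions.

```python
def pregel_connected_components(vertices, edges, max_supersteps=50):
    """Label each vertex with the smallest vertex id in its component.

    vertices: list of comparable vertex ids.
    edges: list of (src, dst) pairs; direction is ignored.
    """
    neighbours = {v: set() for v in vertices}
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    labels = {v: v for v in vertices}
    active = set(vertices)  # vertices whose label changed in the last superstep
    for _ in range(max_supersteps):
        if not active:
            break
        # Message phase: each active vertex sends its label to its neighbours;
        # messages to the same vertex are merged by taking the minimum.
        inbox = {}
        for v in active:
            for n in neighbours[v]:
                inbox[n] = min(inbox.get(n, labels[v]), labels[v])
        # Compute phase: a vertex adopts a smaller label and remains active.
        active = set()
        for v, msg in inbox.items():
            if msg < labels[v]:
                labels[v] = msg
                active.add(v)
    return labels


# Hypothetical graph with two components: {1, 2, 3} and {4, 5}.
labels = pregel_connected_components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)])
print(labels)  # {1: 1, 2: 1, 3: 1, 4: 4, 5: 4}
```

Vertices that share a label with the entity to be identified form its connected sub-network, which is the role the guarantee chain ring candidate set plays in the text.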
An embodiment of the present invention further provides a guarantee data mining system, including: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements any of the guarantee data mining methods described above.
An embodiment of the present invention further provides a computer-readable storage medium on which an information processing program is stored, wherein the information processing program, when executed by a processor, implements the steps of any of the guarantee data mining methods described above.
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media, as known to those skilled in the art.

Claims (10)

1. A guarantee data mining method, comprising:
determining a guarantee chain ring candidate set from a pre-saved guarantee network according to the identity of the entity to be identified;
and determining the guarantee chain ring of the entity to be identified from the guarantee chain ring candidate set according to a predefined guarantee chain ring identification model.
2. The method of claim 1, wherein
the guarantee network is a point-edge relationship graph formed by directed connections between points, wherein the point relation stores the attributes of the corresponding entity, and the edge relation connecting the points stores the attributes of the corresponding associated entities and the attributes of the association;
the types of guarantee chain ring in the guarantee chain ring identification model comprise at least one of the following:
simple mutual guarantee, joint guarantee, circular guarantee, guarantee chain, intra-group guarantee ring, and mixed guarantee ring.
3. The method of claim 1, wherein determining a guarantee chain ring candidate set from a pre-saved guarantee network according to the identity of the entity to be identified comprises:
loading the pre-saved guarantee network by using a graph computation engine;
and identifying, by means of a maximum connected graph algorithm, the guarantee sub-network associated with the identity of the entity to be identified from the guarantee network, as the guarantee chain ring candidate set of the entity to be identified.
4. The method of claim 1, wherein determining the guarantee chain ring of the entity to be identified from the guarantee chain ring candidate set according to a predefined guarantee chain ring identification model comprises:
traversing the guarantee chain ring candidate set according to the guarantee chain ring identification model by using a graph computation algorithm, and identifying all guarantee chain rings corresponding to the entity to be identified.
5. The method of claim 1, wherein before determining the guarantee chain ring candidate set from a pre-saved guarantee network according to the identity of the entity to be identified, the method further comprises:
extracting the point-edge relationships sorted out in advance from credit system data by using a graph extraction tool to form the guarantee network, and storing the guarantee network in a database.
6. The method according to any one of claims 3 to 5, wherein
the graph computation engine is a Spark GraphX graph computation engine, the maximum connected graph algorithm is a depth-first graph search algorithm, the graph computation algorithm is Pregel based on Spark GraphX, the graph extraction tool is a Hive SQL graph extraction tool, and the database is a Hive database.
7. The method of claim 1, further comprising:
and displaying the guarantee chain ring of the entity to be identified through a graph display tool.
8. A guarantee data mining system, comprising:
a first determining unit, configured to determine a guarantee chain ring candidate set from a pre-saved guarantee network according to the identity of the entity to be identified;
and a second determining unit, configured to determine the guarantee chain ring of the entity to be identified from the guarantee chain ring candidate set according to a predefined guarantee chain ring identification model.
9. A guarantee data mining system, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the guarantee data mining method according to any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon an information processing program which, when executed by a processor, implements the steps of the guarantee data mining method according to any one of claims 1 to 7.
CN201911241360.XA 2019-12-06 2019-12-06 Guarantee data mining method and system Pending CN111143430A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911241360.XA CN111143430A (en) 2019-12-06 2019-12-06 Guarantee data mining method and system

Publications (1)

Publication Number Publication Date
CN111143430A true CN111143430A (en) 2020-05-12

Family

ID=70517730

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911241360.XA Pending CN111143430A (en) 2019-12-06 2019-12-06 Guarantee data mining method and system

Country Status (1)

Country Link
CN (1) CN111143430A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111784495A (en) * 2020-06-04 2020-10-16 江苏常熟农村商业银行股份有限公司 Guarantee ring identification method and device, computer equipment and storage medium
CN112256769A (en) * 2020-11-13 2021-01-22 北京海致星图科技有限公司 Pregel-based method for realizing fund circle distribution for mining commercial bank transaction data
CN113468382A (en) * 2021-07-01 2021-10-01 同盾控股有限公司 Multi-party loop detection method, device and related equipment based on knowledge federation

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102411499A (en) * 2011-08-11 2012-04-11 浙江大学 Directed-graph-structure-based system information extraction method for single electronic control unit (ECU)
CN105468702A (en) * 2015-11-18 2016-04-06 中国科学院计算机网络信息中心 Large-scale RDF data association path discovery method
US20160147817A1 (en) * 2014-11-25 2016-05-26 International Business Machines Corporation Data credibility vouching system
CN109685647A (en) * 2018-12-27 2019-04-26 阳光财产保险股份有限公司 The training method of credit fraud detection method and its model, device and server
CN110209826A (en) * 2018-02-06 2019-09-06 武汉观图信息科技有限公司 A kind of financial map construction and analysis method towards bank risk control

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
田宇: ""商业银行担保圈风险识别与防范研究"", 《中国优秀博硕士学位论文全文数据库(硕士)经济与管理科学辑》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111784495A (en) * 2020-06-04 2020-10-16 江苏常熟农村商业银行股份有限公司 Guarantee ring identification method and device, computer equipment and storage medium
CN111784495B (en) * 2020-06-04 2021-05-04 江苏常熟农村商业银行股份有限公司 Guarantee ring identification method and device, computer equipment and storage medium
CN112256769A (en) * 2020-11-13 2021-01-22 北京海致星图科技有限公司 Pregel-based method for realizing fund circle distribution for mining commercial bank transaction data
CN112256769B (en) * 2020-11-13 2024-04-12 北京海致星图科技有限公司 Pregel-based method for realizing fund circle distribution of mining business banking transaction data
CN113468382A (en) * 2021-07-01 2021-10-01 同盾控股有限公司 Multi-party loop detection method, device and related equipment based on knowledge federation
CN113468382B (en) * 2021-07-01 2024-04-02 同盾控股有限公司 Knowledge federation-based multiparty loop detection method, device and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200512