CN117390480A - Information extraction method, device, equipment and storage medium - Google Patents

Information extraction method, device, equipment and storage medium

Info

Publication number
CN117390480A
Authority
CN
China
Prior art keywords
node
core
transaction
nodes
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311472149.5A
Other languages
Chinese (zh)
Inventor
潘婧
汤韬
顾河建
高鹏飞
杨燕明
郑建宾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Unionpay Co Ltd
Original Assignee
China Unionpay Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Unionpay Co Ltd
Priority to CN202311472149.5A
Publication of CN117390480A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2323 Non-hierarchical techniques based on graph theory, e.g. minimum spanning trees [MST] or graph cuts
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiments of the present application provide an information extraction method, device, equipment and storage medium, relating to the technical field of computers. The information extraction method comprises: extracting a plurality of core nodes from the transaction nodes in a transaction connectivity graph; extracting a representation vector of each of the plurality of core nodes through a graph embedding model; clustering the plurality of core nodes based on their representation vectors to obtain a plurality of node groups; for each node group in the plurality of node groups, determining, through a shortest path model, a core node path corresponding to the node group from the transaction connectivity graph based on the core nodes in the node group; and determining a core graph structure in the transaction connectivity graph based on the obtained plurality of core node paths. This reduces the computational complexity of partitioning a large-scale transaction connectivity graph, allows the core nodes to be clustered and partitioned effectively, allows the association relations between nodes to be mined quickly and effectively, and further improves the compression of the transaction connectivity graph.

Description

Information extraction method, device, equipment and storage medium
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to an information extraction method, an information extraction device, information extraction equipment and a storage medium.
Background
With the rapid development of internet technology, online payment is increasingly used in daily life. Because transaction users are numerous and their relationships complex, ultra-large-scale transaction connectivity graphs often arise.
In the related art, ultra-large-scale transaction graph data and knowledge graphs are processed as follows. The importance and the sparse vector of each node are obtained from the acquired target knowledge graph; a starting point and possible end points are obtained from the sparse vectors; adjusted sparse vectors are obtained from the starting point, the possible end points and a category sequence; the recognition degree of the target knowledge graph is obtained from the sparse vectors and the adjusted sparse vectors; the compression loss of each possible end point is obtained from the recognition degree and the importance of each node; the end point for the starting point is selected according to the compression loss of the possible end points; and the starting point and the end point are input into a compression module for compressed storage, which reduces the amount of data stored for the target knowledge graph. However, this approach focuses on how to compress the nodes of the target knowledge graph effectively without affecting its recognition degree, and does not sufficiently mine the association relations between nodes in the knowledge graph.
Therefore, how to effectively mine the association relations between nodes in an ultra-large-scale transaction connectivity graph is a technical problem that remains to be solved.
Disclosure of Invention
The embodiments of the present application provide an information extraction method, device, equipment and storage medium that can quickly and effectively mine node association relations in a transaction connectivity graph.
In a first aspect, an embodiment of the present application provides an information extraction method, including:
extracting a plurality of core nodes from transaction nodes in a transaction connectivity graph;
extracting respective representation vectors of the plurality of core nodes through a graph embedding model;
clustering the plurality of core nodes based on the respective representation vectors of the plurality of core nodes to obtain a plurality of node groups;
determining a core node path corresponding to each node group in the plurality of node groups from the transaction connectivity graph based on the core nodes in the node groups through a shortest path model;
and determining a core graph structure in the transaction connectivity graph based on the obtained plurality of core node paths.
In the embodiments of the present application, a plurality of core nodes are extracted from the transaction connectivity graph, and a graph embedding model is used to extract the representation vector of each core node. The core nodes are clustered into a plurality of node groups according to their representation vectors. For each node group, the corresponding core node path is then determined from the transaction connectivity graph through a shortest path model. Finally, the core graph structure in the transaction connectivity graph is determined from the obtained core node paths. This reduces the computational complexity of partitioning a large-scale transaction connectivity graph, effectively mines the local topology information of the core nodes, allows the core nodes to be clustered and partitioned effectively, mines node association relations quickly and effectively, and further improves the compression of the transaction connectivity graph.
In an alternative embodiment, the extracting, by using a graph embedding model, a representation vector of each of the plurality of core nodes includes:
for the plurality of core nodes, the following operations are respectively executed:
taking one core node as an initial endpoint, and performing a random walk in the transaction connectivity graph to obtain a walk sequence corresponding to the one core node;
extracting an embedded representation vector of the walk sequence of the one core node through a graph embedding model;
and taking the embedded representation vector of the walk sequence as the representation vector of the one core node.
In an optional implementation manner, the taking one core node as an initial endpoint and performing a random walk in the transaction connectivity graph to obtain a walk sequence corresponding to the one core node includes:
for each transaction node in the transaction connectivity graph, calculating the weight from the transaction node to each of its adjacent nodes; summing the weights from the transaction node to its adjacent nodes to obtain a cumulative weight; and taking the ratio of the weight from the transaction node to each adjacent node to the cumulative weight as the walk probability of walking from the transaction node to that adjacent node;
and taking one core node as an initial endpoint, and performing a random walk in the transaction connectivity graph based on the obtained walk probabilities to obtain the walk sequence corresponding to the one core node.
In an alternative embodiment, the weight from the transaction node to an adjacent node characterizes the transaction strength between the transaction node and that adjacent node.
In an alternative embodiment, the determining, for each node group in the plurality of node groups, a core node path corresponding to the node group from the transaction connectivity graph based on the core nodes in the node group through a shortest path model includes:
for each node group in the plurality of node groups, selecting any core node in the node group as an initial endpoint, performing breadth-first path expansion from the initial endpoint, and, on reaching another core node in the node group, continuing the breadth-first path expansion with that core node as the initial endpoint, until the obtained breadth-first search path contains all core nodes in the node group;
and taking the breadth-first search path that contains all core nodes of the node group as the core node path corresponding to the node group.
In the above embodiment, for each node group, any core node in the node group is selected as an initial endpoint and breadth-first path expansion is performed from it. Whenever the expansion reaches another core node in the node group, the expansion continues with that core node as the initial endpoint, and it stops once the obtained breadth-first search path contains all core nodes in the node group. The breadth-first search path containing all core nodes of the node group is then taken as the core node path corresponding to the node group. This shortest path identification method, based on a breadth-first path search algorithm, extends effectively from core nodes to core links, has low computational complexity, and produces intermediate results that are easy to interpret.
In an alternative embodiment, the determining, based on the obtained multiple core node paths, a core graph structure in the transaction connectivity graph includes:
performing second-degree association expansion on the obtained plurality of core node paths to determine the core graph structure in the transaction connectivity graph.
In an optional implementation manner, each transaction node in the transaction connectivity graph corresponds to a user identifier or a merchant identifier, and the connection relationship between any two transaction nodes at least represents whether a transaction has occurred between those two transaction nodes.
In a second aspect, an embodiment of the present application provides an information extraction apparatus, including:
a node extraction module, configured to extract a plurality of core nodes from the transaction nodes in a transaction connectivity graph;
a vector representation module, configured to extract a representation vector of each of the plurality of core nodes through a graph embedding model;
a clustering division module, configured to cluster the plurality of core nodes based on their representation vectors to obtain a plurality of node groups;
a core path extraction module, configured to determine, for each node group of the plurality of node groups, a core node path corresponding to the node group from the transaction connectivity graph based on the core nodes in the node group through a shortest path model;
and an information extraction module, configured to determine a core graph structure in the transaction connectivity graph based on the obtained plurality of core node paths.
In an alternative embodiment, the vector representation module is specifically configured to:
for the plurality of core nodes, the following operations are respectively executed:
taking one core node as an initial endpoint, and performing a random walk in the transaction connectivity graph to obtain a walk sequence corresponding to the one core node;
extracting an embedded representation vector of the walk sequence of the one core node through a graph embedding model;
and taking the embedded representation vector of the walk sequence as the representation vector of the one core node.
In an alternative embodiment, the vector representation module is specifically configured to:
for each transaction node in the transaction connectivity graph, calculating the weight from the transaction node to each of its adjacent nodes; summing the weights from the transaction node to its adjacent nodes to obtain a cumulative weight; and taking the ratio of the weight from the transaction node to each adjacent node to the cumulative weight as the walk probability of walking from the transaction node to that adjacent node;
and taking one core node as an initial endpoint, and performing a random walk in the transaction connectivity graph based on the obtained walk probabilities to obtain the walk sequence corresponding to the one core node.
In an alternative embodiment, the weight from the transaction node to an adjacent node characterizes the transaction strength between the transaction node and that adjacent node.
In an alternative embodiment, the core path extraction module is specifically configured to:
for each node group in the plurality of node groups, selecting any core node in the node group as an initial endpoint, performing breadth-first path expansion from the initial endpoint, and, on reaching another core node in the node group, continuing the breadth-first path expansion with that core node as the initial endpoint, until the obtained breadth-first search path contains all core nodes in the node group;
and taking the breadth-first search path that contains all core nodes of the node group as the core node path corresponding to the node group.
In an alternative embodiment, the information extraction module is specifically configured to:
performing second-degree association expansion on the obtained plurality of core node paths to determine the core graph structure in the transaction connectivity graph.
In a third aspect, embodiments of the present application provide a computer device including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the above information extraction method when executing the program.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program executable by a computer device, which when run on the computer device, causes the computer device to perform the steps of the above-described information extraction method.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and a person skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic structural diagram of a system architecture according to an embodiment of the present application;
Fig. 2 is a flow chart of an information extraction method according to an embodiment of the present application;
Fig. 3 is a flow chart of an information extraction method according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an information extraction device according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantageous effects of the present invention more apparent, the present invention will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
To better explain the embodiments of the present application, the following term is defined:
Random walk: a mathematical statistical model consisting of a sequence of steps, each of which is random; it can be used to represent irregular patterns of variation.
Fig. 1 is a system architecture diagram to which the embodiments of the present application are applicable. The system architecture includes at least a terminal device 101 and an information extraction system 102. There may be one or more terminal devices 101 and one or more information extraction systems 102; the present application does not specifically limit their number.
An application is pre-installed in the terminal device 101, wherein the application is a client application, a web page application, an applet application, or the like. The terminal device 101 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart home appliance, a smart voice interaction device, a smart car-mounted device, and the like.
The information extraction system 102 is the background server of the application. It may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a content delivery network (CDN), and big data and artificial intelligence platforms. The terminal device 101 and the information extraction system 102 may be connected directly or indirectly through wired or wireless communication, which is not limited herein.
The information extraction method in the embodiment of the present application may be executed by the terminal device 101, may be executed by the information extraction system 102, or may be executed interactively by the terminal device 101 and the information extraction system 102.
Based on the system architecture shown in Fig. 1, an embodiment of the present application provides an information extraction method whose flow is shown in Fig. 2. The method is executed by a computer device, which may be the terminal device 101 and/or the information extraction system 102 shown in Fig. 1, and includes the following steps:
Step 201, extracting a plurality of core nodes from transaction nodes in a transaction connectivity graph.
In an alternative embodiment, each transaction node in the transaction connectivity graph corresponds to a user identifier or a merchant identifier, and the connection relationship between any two transaction nodes at least represents whether a transaction has occurred between those two transaction nodes.
In a large-scale transaction connectivity graph, the core nodes may be associated with one another through multi-hop relationships; however, the core nodes are not necessarily all closely related, and may instead be subdivided into smaller-scale local communities.
A batch of core nodes is preliminarily determined from the transaction nodes in the transaction connectivity graph, for example by introducing external information or by classifying individual nodes with a model; these nodes are marked in the large-scale transaction connectivity graph and denoted $N_{cores}$. Alternatively, the extracted core nodes may be nodes that were already marked as core nodes historically.
Step 202, extracting respective representation vectors of the plurality of core nodes through a graph embedding model.
In an alternative embodiment, the following operations are performed for each of the plurality of core nodes:
taking the core node as an initial endpoint, and performing a random walk in the transaction connectivity graph to obtain the walk sequence corresponding to the core node; extracting an embedded representation vector of the walk sequence of the core node through a graph embedding model; and taking the embedded representation vector of the walk sequence as the representation vector of the core node.
In some embodiments, the walk sequence corresponding to the core node is obtained at least in the following manner:
for each transaction node in the transaction connectivity graph, calculating the weight from the transaction node to each of its adjacent nodes; summing the weights from the transaction node to its adjacent nodes to obtain a cumulative weight; taking the ratio of the weight from the transaction node to each adjacent node to the cumulative weight as the walk probability of walking from the transaction node to that adjacent node; and taking a core node as an initial endpoint, and performing a random walk in the transaction connectivity graph based on the obtained walk probabilities to obtain the walk sequence corresponding to the core node.
Specifically, for any node $u$ in the transaction connectivity graph, define $\mathcal{N}(u)$ as the set of all neighbor nodes of $u$. The walk sequence and representation vector of each core node are generated with a biased random walk strategy, which distinguishes two cases:
First case: if $\mathcal{N}(u) = \emptyset$, i.e. there are no adjacent nodes around transaction node $u$, then transaction node $u$ is skipped and the random walk for the next node begins.
Second case: if $\mathcal{N}(u) \neq \emptyset$, i.e. there is at least one adjacent node around transaction node $u$, then the weight $e_{uv}$ from $u$ to each adjacent node $v$ is calculated, and the weights $e_{uv}$ are normalized to obtain the walk probability $p_{uv}$, as given by formula (1):
$$p_{uv} = \frac{e_{uv}}{\sum_{w \in \mathcal{N}(u)} e_{uw}} \tag{1}$$
Then, for any node $u$ in the transaction connectivity graph, the weight $e_{uv}$ to each adjacent node $v$ around it yields the corresponding walk probability $p_{uv}$. Starting from a core node, a biased walk to adjacent nodes is performed according to the walk probabilities $p_{uv}$ until the walk sequence of the core node is formed.
Here, the weight from a transaction node to an adjacent node characterizes the transaction strength between the transaction node and that adjacent node; it may be, for example, the frequency of transactions between the two nodes or the transaction amount.
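The weight normalization and biased walk described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names `walk_probabilities` and `biased_walk`, the nested-dictionary graph layout, the fixed walk length and the sample weights are all assumptions introduced for the example.

```python
import random


def walk_probabilities(weights):
    """Normalize edge weights e_uv into walk probabilities p_uv.

    `weights` maps each node u to a dict {v: e_uv} over its adjacent nodes
    (a hypothetical layout chosen for this sketch).
    """
    probs = {}
    for u, nbrs in weights.items():
        total = sum(nbrs.values())  # cumulative weight of node u
        probs[u] = {v: w / total for v, w in nbrs.items()} if total > 0 else {}
    return probs


def biased_walk(probs, start, length, rng):
    """Walk from `start`, choosing each next node with probability p_uv."""
    walk = [start]
    while len(walk) < length:
        nbrs = probs.get(walk[-1], {})
        if not nbrs:  # first case: no adjacent nodes, stop this walk
            break
        nodes = list(nbrs)
        step = rng.choices(nodes, weights=[nbrs[v] for v in nodes], k=1)[0]
        walk.append(step)
    return walk


# Hypothetical weights, e.g. transaction counts between accounts.
weights = {"A": {"B": 3.0, "C": 1.0}, "B": {"A": 3.0}, "C": {"A": 1.0}, "D": {}}
probs = walk_probabilities(weights)
print(probs["A"])
print(biased_walk(probs, "A", 5, random.Random(0)))
```

Passing an explicit `random.Random` instance keeps the walks reproducible, which helps when the stored walk sequences need to be regenerated.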
For the generated walk sequences, a trained graph embedding model is used to obtain the embedded representation vector of each core node, which is equivalent to mapping each core node to a vector in $\mathbb{R}^{k}$, where $k$ is the dimension of the node vector. Through this operation, the walk sequences $S_{cores}$ are converted into the corresponding embedded representation vectors $V_{cores}$, and the walk sequence and embedded representation vector of each core node are stored together. Finally, the embedded representation vectors $V_{cores}$ are used as the representation vectors of the core nodes. The graph embedding model may be, for example, a Skip-gram model or a node2vec model (a node embedding algorithm).
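As a rough illustration of how a Skip-gram style model consumes a walk sequence, the sketch below enumerates the (center, context) node pairs on which such a model is typically trained. The helper name `skipgram_pairs` and the window size are assumptions made for this example; the source does not specify the training setup, and the embedding training itself is omitted.

```python
def skipgram_pairs(walk, window):
    """Return (center, context) pairs for every node in a walk sequence.

    Nodes that occur within `window` positions of each other in the walk
    are treated as context for one another, as in Skip-gram training.
    """
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs


# A hypothetical walk sequence that starts at core node "A".
print(skipgram_pairs(["A", "C", "2", "D"], window=1))
```

Nodes that co-occur in many walks contribute many shared pairs, which is what pulls their learned vectors toward each other.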
Step 203, clustering the plurality of core nodes based on the respective representation vectors of the plurality of core nodes to obtain a plurality of node groups.
Specifically, clustering is computed on the embedded representation vectors $V_{cores}$ to partition the core nodes, so that $N_{cores}$ is divided into a plurality of node groups; node group $i$ is denoted $N_{cores-i}$. The clustering method may be K-means clustering, kernel clustering, the mean shift algorithm, or the like.
With the node clustering method provided by the embodiments of the present application, nodes with high-order similarity can be placed in the same walk sequence, so that the model can represent high-order relations.
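The clustering step can be sketched with a small pure-Python K-means. This is an illustrative sketch only: the function names, the deterministic farthest-point initialization (chosen here for reproducibility, not taken from the source) and the two-dimensional sample vectors are assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=20):
    """Cluster `vectors` into k groups; returns one label per vector."""
    # Assumed initialization: first vector, then repeatedly the vector
    # farthest from the centers chosen so far.
    centers = [list(vectors[0])]
    while len(centers) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector joins its nearest center.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centers[c])) for v in vectors
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, new_labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels


# Hypothetical 2-dimensional representation vectors of four core nodes.
core_vectors = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
print(kmeans(core_vectors, k=2))
```

Core nodes that share a label would form one node group $N_{cores-i}$.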
Step 204, for each node group in the plurality of node groups, determining a core node path corresponding to the node group from the transaction connectivity graph based on the core nodes in the node group through the shortest path model.
In an optional implementation manner, for each node group in the plurality of node groups, any core node in the node group is selected as an initial endpoint and breadth-first path expansion is performed from it; on reaching another core node in the node group, the breadth-first path expansion continues with that core node as the initial endpoint, until the obtained breadth-first search path contains all core nodes in the node group;
and the breadth-first search path that contains all core nodes of the node group is taken as the core node path corresponding to the node group.
Specifically, for each node group, the core nodes in the node group are placed back into the transaction connectivity graph, and any core node in the node group is selected as the initial endpoint for breadth-first path expansion. When the k-th hop of the expansion reaches another core node in the node group, breadth-first path expansion restarts from that newly reached core node, and this is repeated until all core nodes in the node group are covered. The breadth-first search path covering these core nodes is then extracted to form the shortest association path, that is, the core node path corresponding to the node group. Through these steps, each node group $N_{cores-i}$ yields one core node path.
Referring to Fig. 3, assume a node group contains four core nodes in total: node A, node B, node C and node D; all other nodes are non-core nodes.
Taking core node A as the initial endpoint, breadth-first path expansion from node A forms the paths AB, A1 and AC. Breadth-first path expansion then continues with the newly reached node B, node 1 and node C as initial endpoints, forming the new paths C2, C3 and C4. Expansion then continues with node 2, node 3 and node 4 as initial endpoints, forming the new paths 2D and 25. Finally, the breadth-first search path that includes core nodes A, B, C and D is extracted as the core node path of the node group; that is, path BAC2D is the core node path of the node group.
In the above manner, for each node group, any core node in the node group is selected as an initial endpoint and breadth-first path expansion is performed from it. Whenever the expansion reaches another core node in the node group, the expansion continues with that core node as the initial endpoint, and it stops once the obtained breadth-first search path contains all core nodes in the node group. The breadth-first search path containing all core nodes of the node group is then taken as the core node path corresponding to the node group. This shortest path identification method, based on a breadth-first path search algorithm, extends effectively from core nodes to core links, has low computational complexity, and produces intermediate results that are easy to interpret.
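The path expansion for one node group can be sketched as below. This is one possible reading of the procedure, not a definitive implementation: the search restarts from every core node reached so far (a multi-source breadth-first search), and the function name `core_node_path`, the adjacency-dictionary layout and the edge representation are assumptions. The sample graph follows the Fig. 3 example as described in the text.

```python
from collections import deque


def core_node_path(adj, group, start):
    """Return the set of edges that links the core nodes in `group`.

    `adj` maps each node to a list of adjacent nodes; each returned edge is
    a frozenset holding its two endpoints.
    """
    covered = {start}
    remaining = set(group) - covered
    edges = set()
    while remaining:
        # Breadth-first expansion from all core nodes covered so far.
        parent = {c: None for c in covered}
        queue = deque(sorted(covered))
        hit = None
        while queue and hit is None:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v in parent:
                    continue
                parent[v] = u
                if v in remaining:  # reached a new core node of the group
                    hit = v
                    break
                queue.append(v)
        if hit is None:
            break  # the remaining core nodes cannot be reached
        # Trace the path back to the covered core node it grew from.
        node = hit
        while parent[node] is not None:
            edges.add(frozenset((node, parent[node])))
            node = parent[node]
        covered.add(hit)
        remaining.discard(hit)
    return edges


# Graph of the Fig. 3 example: core nodes A, B, C, D; the rest are non-core.
adj = {
    "A": ["B", "1", "C"], "B": ["A"], "1": ["A"],
    "C": ["A", "2", "3", "4"], "2": ["C", "D", "5"],
    "3": ["C"], "4": ["C"], "D": ["2"], "5": ["2"],
}
path_edges = core_node_path(adj, {"A", "B", "C", "D"}, "A")
print(sorted(tuple(sorted(e)) for e in path_edges))
```

On this graph the returned edges are A-B, A-C, C-2 and 2-D, which together make up the path BAC2D named in the example.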
Step 205, determining a core graph structure in the transaction connectivity graph based on the obtained plurality of core node paths.
In an alternative implementation manner, second-degree association expansion is performed on the obtained plurality of core node paths to determine the core graph structure in the transaction connectivity graph.
Specifically, second-degree association expansion builds a second-degree association graph. A first-degree association graph is the graph formed by starting from a node in one of two mutually disjoint subsets and directly associating it with other nodes through connecting lines (edges), together with those nodes and connecting lines. A second-degree association graph is obtained by additionally performing first-degree association from each newly associated node. For example, as shown in Fig. 3, starting from node A, node B, node 1 and node C can be reached directly through connecting lines; node A, node B, node 1, node C and the connecting lines between them form a first-degree association graph.
The second-degree association graph is built on the first-degree association graph: starting from node B, node 1 and node C, the nodes connected to them are reached through connecting lines, and the first-degree association graph together with these newly reached nodes and their connecting lines forms the second-degree association graph. In the transaction connectivity graph, second-degree association expansion is performed on the plurality of core node paths in this way, and the resulting structure is the core graph structure of the transaction connectivity graph.
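One way to read the expansion of the core node paths is to keep every node within two hops of a path node, together with the edges among the kept nodes. The sketch below follows that reading; the function name `expand_hops`, the `hops` parameter and the chain-shaped sample graph are assumptions made for illustration.

```python
def expand_hops(adj, seeds, hops=2):
    """Expand `seeds` by `hops` association steps in the graph `adj`.

    Returns the kept nodes and the edges (as frozensets) among them.
    """
    kept = set(seeds)
    frontier = set(seeds)
    for _ in range(hops):
        # One association step: nodes directly connected to the frontier.
        frontier = {v for u in frontier for v in adj.get(u, ())} - kept
        kept |= frontier
    edges = {
        frozenset((u, v)) for u in kept for v in adj.get(u, ()) if v in kept
    }
    return kept, edges


# Hypothetical chain A-B-C-D-E; the core node path here is just node "A".
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
nodes, edges = expand_hops(chain, {"A"}, hops=2)
print(sorted(nodes))
```

With `hops=1` the same function yields the first-degree association graph, so the two definitions differ only in the number of expansion rounds.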
In the embodiments of the present application, a plurality of core nodes are extracted from the transaction connectivity graph, and a graph embedding model is used to extract the representation vector of each core node. The core nodes are clustered into a plurality of node groups according to their representation vectors. For each node group, the corresponding core node path is then determined from the transaction connectivity graph through a shortest path model. Finally, the core graph structure in the transaction connectivity graph is determined from the obtained core node paths. This reduces the computational complexity of partitioning a large-scale transaction connectivity graph, effectively mines the local topology information of the core nodes, allows the core nodes to be clustered and partitioned effectively, mines node association relations quickly and effectively, and further improves the compression of the transaction connectivity graph.
To better explain the embodiments of the present application, take an experiment in a specific application scenario as an example. For a large-scale connected community in a graph structure (more than 60,000 nodes), it is difficult with conventional methods to compress the graph quickly and to locate the core association community from the existing core node labels. With the information extraction method of the present application, a plurality of core nodes are extracted from the more than 60,000 nodes, a graph embedding model is used to extract a representation vector for each core node, and the core nodes are clustered into a plurality of node groups according to their representation vectors. For each node group, the corresponding core node path is determined from the transaction connectivity graph through a shortest path model, and the core graph structure in the transaction connectivity graph is determined from the obtained core node paths. In this way the graph structure is compressed from more than 60,000 nodes to a scale of more than 3,000 nodes. This not only quickly reveals the close associations between core nodes but also effectively mines the core transaction structure, which can be used to quickly locate bank cards already labeled as having abnormal core transactions and associated cards with a high degree of suspicion, and to obtain the corresponding suspected abnormal transaction structure.
Based on the same technical concept, referring to fig. 4, an embodiment of the present application provides an information extraction apparatus, including:
a node extraction module 401, configured to extract a plurality of core nodes from the transaction nodes in the transaction connectivity graph;
a vector representation module 402, configured to extract respective representation vectors of the plurality of core nodes through a graph embedding model;
a cluster division module 403, configured to cluster the plurality of core nodes based on the respective representation vectors of the plurality of core nodes, to obtain a plurality of node groups;
a core path extraction module 404, configured to determine, for each of the plurality of node groups, a core node path corresponding to the node group from the transaction connectivity graph based on core nodes in the node group through a shortest path model;
an information extraction module 405, configured to determine a core graph structure in the transaction connectivity graph based on the obtained plurality of core node paths.
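As an illustration of what the cluster division module 403 does, the sketch below groups core nodes by their representation vectors. This section does not name a clustering algorithm, so the use of k-means here is an assumption, as are the function name `kmeans`, its parameters, and the two-dimensional example vectors.

```python
import random


def kmeans(vectors, k, iterations=20, seed=0):
    """Cluster named vectors into at most k groups with plain k-means.

    vectors: {node_name: tuple_of_floats}
    Returns a list of sets of node names, one set per non-empty cluster.
    """
    rng = random.Random(seed)
    names = sorted(vectors)
    # Initialise the centroids from k distinct, randomly chosen nodes.
    centroids = [list(vectors[n]) for n in rng.sample(names, k)]
    assignment = {}
    for _ in range(iterations):
        # Assignment step: attach each node to its nearest centroid.
        new_assignment = {}
        for n in names:
            dists = [
                sum((a - b) ** 2 for a, b in zip(vectors[n], c))
                for c in centroids
            ]
            new_assignment[n] = dists.index(min(dists))
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [vectors[n] for n in names if assignment[n] == i]
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    groups = {}
    for n, i in assignment.items():
        groups.setdefault(i, set()).add(n)
    return list(groups.values())


# Hypothetical representation vectors for four core nodes.
core_vectors = {"a": (0.0, 0.0), "b": (0.1, 0.0), "c": (5.0, 5.0), "d": (5.1, 5.0)}
node_groups = kmeans(core_vectors, k=2)
```

Each returned set plays the role of one node group that is then handed to the core path extraction module 404.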
In an alternative embodiment, the vector representation module 402 is specifically configured to:
for a plurality of core nodes, the following operations are respectively executed:
taking one core node as an initial endpoint, and performing a random walk in the transaction connectivity graph to obtain a walk sequence corresponding to the core node;
extracting an embedded representation vector of the walk sequence of the core node through a graph embedding model;
taking the embedded representation vector of the walk sequence as the representation vector of the core node.
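The walk-then-embed procedure above can be sketched as follows. The sketch assumes a uniform random walk and a skip-gram-style embedding model in the manner of DeepWalk, which this section does not specify; it only produces the walk sequence and the (center, context) training pairs such a model would consume, and it does not train the model. The names `random_walk`, `skipgram_pairs`, `length`, and `window` are hypothetical.

```python
import random


def random_walk(adj, start, length, rng):
    """Uniform random walk of at most `length` nodes, starting at `start`."""
    walk = [start]
    while len(walk) < length:
        neighbors = sorted(adj.get(walk[-1], ()))
        if not neighbors:
            break  # dead end: stop the walk early
        walk.append(rng.choice(neighbors))
    return walk


def skipgram_pairs(walk, window=2):
    """(center, context) pairs within `window` positions of each walk entry."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        pairs.extend((center, walk[j]) for j in range(lo, hi) if j != i)
    return pairs


# Hypothetical toy graph: core node "A" linked to "B" and "C".
adj = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}
walk = random_walk(adj, "A", length=5, rng=random.Random(0))
pairs = skipgram_pairs(walk, window=2)
```

Several walks would normally be started from each core node, and the vector the embedding model learns for that node would then serve as its representation vector.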
In an alternative embodiment, the vector representation module 402 is specifically configured to:
for each transaction node in the transaction connectivity graph, calculating the weight from the transaction node to each of its adjacent nodes; summing the weights from the transaction node to its adjacent nodes to obtain a cumulative weight; and taking the ratio of the weight from the transaction node to each adjacent node to the cumulative weight as the walk probability of the transaction node walking to that adjacent node;
taking one core node as an initial endpoint, and performing a random walk in the transaction connectivity graph based on the obtained walk probabilities to obtain the walk sequence corresponding to the core node.
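The walk probability described above is the weight of an edge divided by the sum of the weights of all edges leaving the same node. A minimal sketch follows; the names `walk_probabilities` and `weighted_walk` and the example weights are assumptions made for illustration.

```python
import random


def walk_probabilities(weights):
    """Normalise per-node edge weights into walk probabilities.

    weights: {node: {neighbor: weight}}
    Returns {node: {neighbor: weight / cumulative_weight_of_node}}.
    """
    probs = {}
    for node, nbrs in weights.items():
        total = sum(nbrs.values())  # cumulative weight of the node
        probs[node] = {n: w / total for n, w in nbrs.items()} if total > 0 else {}
    return probs


def weighted_walk(probs, start, length, rng):
    """Random walk in which each step is drawn from the walk probabilities."""
    walk = [start]
    while len(walk) < length:
        nbrs = probs.get(walk[-1], {})
        if not nbrs:
            break  # no outgoing edge with positive weight
        nodes = list(nbrs)
        step = rng.choices(nodes, weights=[nbrs[n] for n in nodes])[0]
        walk.append(step)
    return walk


# Hypothetical weights: "A" transacts three times as strongly with "B" as with "C".
weights = {"A": {"B": 3.0, "C": 1.0}, "B": {"A": 3.0}, "C": {"A": 1.0}}
probs = walk_probabilities(weights)
walk = weighted_walk(probs, "A", length=6, rng=random.Random(0))
```

With these weights the walk leaves node A toward node B with probability 0.75 and toward node C with probability 0.25, so stronger transaction relationships are visited more often.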
In an alternative embodiment, the weight of a transaction node to an adjacent node characterizes: transaction strength between a transaction node and an adjacent node.
In an alternative embodiment, the core path extraction module 404 is specifically configured to:
for each node group in the plurality of node groups, selecting any core node in the node group as an initial endpoint and performing breadth-first path expansion from the initial endpoint; when the expansion reaches another core node in the node group, continuing the breadth-first path expansion with that core node as the new initial endpoint, until the obtained breadth search path contains all the core nodes in the node group;
taking the breadth search path that contains all the core nodes of the node group as the core node path corresponding to the node group.
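One way to read the expansion above is as a chain of breadth-first searches, each of which stops at the nearest core node not yet covered and then restarts from it. The sketch below follows that reading, which is an interpretation rather than a confirmed detail of the shortest path model; the names `bfs_to_nearest` and `core_node_path` and the toy graph are hypothetical.

```python
from collections import deque


def bfs_to_nearest(adj, source, targets):
    """Fewest-edge path from `source` to the nearest node in `targets`, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u in targets:
            # Rebuild the path by following parent links back to the source.
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in sorted(adj.get(u, ())):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None


def core_node_path(adj, group):
    """Chain breadth-first expansions until every core node in `group` is covered."""
    remaining = set(group)
    current = min(remaining)  # "any" core node; min() keeps the sketch deterministic
    remaining.discard(current)
    path_nodes = [current]
    while remaining:
        path = bfs_to_nearest(adj, current, remaining)
        if path is None:
            break  # the remaining core nodes are unreachable from here
        path_nodes.extend(path[1:])
        current = path[-1]  # the newly reached core node becomes the new endpoint
        remaining.discard(current)
    return path_nodes


# Hypothetical chain: core nodes c1, c2, c3 joined through ordinary nodes x and y.
adj = {
    "c1": {"x"}, "x": {"c1", "c2"}, "c2": {"x", "y"},
    "y": {"c2", "c3"}, "c3": {"y"},
}
path = core_node_path(adj, {"c1", "c2", "c3"})
```

Because breadth-first search on an unweighted graph finds a path with the fewest edges, each segment between consecutive core nodes is a shortest path in that sense.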
In an alternative embodiment, the information extraction module 405 is specifically configured to:
performing second-degree association expansion on the obtained plurality of core node paths to determine the core graph structure in the transaction connectivity graph.
In the embodiments of the present application, a plurality of core nodes are extracted from the transaction connectivity graph, and a graph embedding model is used to extract a representation vector for each core node. The core nodes are clustered into a plurality of node groups according to their representation vectors. For each node group, the core node path corresponding to that group is then determined from the transaction connectivity graph through a shortest path model, and the core graph structure in the transaction connectivity graph is finally determined from the obtained core node paths. This reduces the computational complexity of partitioning a large-scale transaction connectivity graph, allows the core nodes to be clustered effectively, and allows node association relationships to be mined effectively.
Based on the same technical concept, an embodiment of the present application provides a computer device, which may be the terminal device and/or the information extraction system shown in fig. 1. As shown in fig. 5, the computer device includes at least one processor 501 and a memory 502 connected to the at least one processor. The specific connection medium between the processor 501 and the memory 502 is not limited in the embodiments of the present application; in fig. 5, the processor 501 and the memory 502 are connected by a bus as an example. Buses may be divided into address buses, data buses, control buses, and so on.
In the embodiment of the present application, the memory 502 stores instructions executable by the at least one processor 501, and the at least one processor 501 may perform the steps of the information extraction method described above by executing the instructions stored in the memory 502.
The processor 501 is the control center of the computer device. It may use various interfaces and lines to connect the various parts of the computer device, and it effects information extraction by running or executing the instructions stored in the memory 502 and invoking the data stored in the memory 502. Optionally, the processor 501 may include one or more processing units, and the processor 501 may integrate an application processor and a modem processor, wherein the application processor primarily handles the operating system, user interfaces, application programs, and the like, and the modem processor primarily handles wireless communications. It will be appreciated that the modem processor may also not be integrated into the processor 501. In some embodiments, the processor 501 and the memory 502 may be implemented on the same chip; in other embodiments, they may be implemented separately on separate chips.
The processor 501 may be a general purpose processor such as a Central Processing Unit (CPU), digital signal processor, application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, and may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the present application. The general purpose processor may be a microprocessor or any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be embodied directly in a hardware processor for execution, or in a combination of hardware and software modules in the processor for execution.
The memory 502, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 502 may include at least one type of storage medium, for example flash memory, a hard disk, a multimedia card, card-type memory, random access memory (Random Access Memory, RAM), static random access memory (Static Random Access Memory, SRAM), programmable read-only memory (Programmable Read Only Memory, PROM), read-only memory (Read-Only Memory, ROM), electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), magnetic memory, a magnetic disk, an optical disk, and the like. The memory 502 may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer device, but is not limited thereto. The memory 502 in the embodiments of the present application may also be a circuit or any other device capable of implementing a storage function, for storing program instructions and/or data.
Based on the same inventive concept, the embodiments of the present application provide a computer-readable storage medium storing a computer program executable by a computer device, which when run on the computer device, causes the computer device to perform the steps of the above-described information extraction method.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, or as a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer device or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer device or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer device or other programmable apparatus to produce a computer device implemented process such that the instructions which execute on the computer device or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (10)

1. An information extraction method, characterized by comprising:
extracting a plurality of core nodes from transaction nodes in a transaction connectivity graph;
extracting respective representation vectors of the plurality of core nodes through a graph embedding model;
clustering the plurality of core nodes based on the respective representation vectors of the plurality of core nodes to obtain a plurality of node groups;
determining a core node path corresponding to each node group in the plurality of node groups from the transaction connectivity graph based on the core nodes in the node groups through a shortest path model;
determining a core graph structure in the transaction connectivity graph based on the obtained plurality of core node paths.
2. The method of claim 1, wherein extracting the respective representation vectors of the plurality of core nodes by a graph embedding model comprises:
for the plurality of core nodes, the following operations are respectively executed:
taking one core node as an initial endpoint, and performing a random walk in the transaction connectivity graph to obtain a walk sequence corresponding to the one core node;
extracting an embedded representation vector of the walk sequence of the one core node through a graph embedding model;
taking the embedded representation vector of the walk sequence as the representation vector of the one core node.
3. The method of claim 2, wherein the taking one core node as an initial endpoint and performing a random walk in the transaction connectivity graph to obtain a walk sequence corresponding to the one core node comprises:
for each transaction node in the transaction connectivity graph, calculating the weight from the transaction node to each of its adjacent nodes; summing the weights from the transaction node to its adjacent nodes to obtain a cumulative weight; and taking the ratio of the weight from the transaction node to each adjacent node to the cumulative weight as the walk probability of the transaction node walking to that adjacent node;
taking one core node as an initial endpoint, and performing a random walk in the transaction connectivity graph based on the obtained walk probabilities to obtain the walk sequence corresponding to the one core node.
4. The method of claim 3, wherein the weight from the transaction node to an adjacent node characterizes: the transaction strength between the transaction node and that adjacent node.
5. The method of claim 1, wherein the determining, for each node group of the plurality of node groups, a core node path corresponding to the node group from the transaction connectivity graph based on core nodes in the node group by a shortest path model, comprises:
for each node group in the plurality of node groups, selecting any core node in the node group as an initial endpoint and performing breadth-first path expansion from the initial endpoint; when the expansion reaches another core node in the node group, continuing the breadth-first path expansion with that core node as the new initial endpoint, until the obtained breadth search path contains all the core nodes in the node group;
taking the breadth search path that contains all the core nodes of the node group as the core node path corresponding to the node group.
6. The method of claim 1, wherein the determining a core graph structure in the transaction connectivity graph based on the obtained plurality of core node paths comprises:
performing second-degree association expansion on the obtained plurality of core node paths, and determining the core graph structure in the transaction connectivity graph.
7. The method according to any one of claims 1-6, wherein each transaction node in the transaction connectivity graph corresponds to a user identifier or a merchant identifier, and the connection relationship between any two transaction nodes is at least used to characterize whether a transaction occurs between any two transaction nodes.
8. An information extraction apparatus, characterized by comprising:
a node extraction module, configured to extract a plurality of core nodes from transaction nodes in a transaction connectivity graph;
a vector representation module, configured to extract respective representation vectors of the plurality of core nodes through a graph embedding model;
a cluster division module, configured to cluster the plurality of core nodes based on the respective representation vectors of the plurality of core nodes to obtain a plurality of node groups;
a core path extraction module, configured to determine, for each node group of the plurality of node groups, a core node path corresponding to the node group from the transaction connectivity graph based on core nodes in the node group through a shortest path model;
an information extraction module, configured to determine a core graph structure in the transaction connectivity graph based on the obtained plurality of core node paths.
9. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method of any one of claims 1-7 when executing the program.
10. A computer readable storage medium, characterized in that it stores a computer program executable by a computer device, which program, when run on the computer device, causes the computer device to perform the steps of the method according to any one of claims 1-7.
CN202311472149.5A 2023-11-06 2023-11-06 Information extraction method, device, equipment and storage medium Pending CN117390480A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311472149.5A CN117390480A (en) 2023-11-06 2023-11-06 Information extraction method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311472149.5A CN117390480A (en) 2023-11-06 2023-11-06 Information extraction method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117390480A true CN117390480A (en) 2024-01-12

Family

ID=89437191

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311472149.5A Pending CN117390480A (en) 2023-11-06 2023-11-06 Information extraction method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117390480A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117788174A (en) * 2024-02-26 2024-03-29 山东华创远智信息科技有限公司 Financial user data security protection method based on blockchain

Similar Documents

Publication Publication Date Title
TWI360754B (en) Web page analysis using multiple graphs
CN111523010A (en) Recommendation method and device, terminal equipment and computer storage medium
CN110009486B (en) Method, system, equipment and computer readable storage medium for fraud detection
CN110738577B (en) Community discovery method, device, computer equipment and storage medium
CN106844407B (en) Tag network generation method and system based on data set correlation
CN111698247A (en) Abnormal account detection method, device, equipment and storage medium
CN109325118B (en) Unbalanced sample data preprocessing method and device and computer equipment
CN117390480A (en) Information extraction method, device, equipment and storage medium
CN108363686A (en) A kind of character string segmenting method, device, terminal device and storage medium
CN111461164B (en) Sample data set capacity expansion method and model training method
CN115577858B (en) Block chain-based carbon emission prediction method and device and electronic equipment
CN111859986A (en) Semantic matching method, device, equipment and medium based on multitask twin network
CN112116436A (en) Intelligent recommendation method and device, computer equipment and readable storage medium
CN112328657A (en) Feature derivation method, feature derivation device, computer equipment and medium
CN113609345B (en) Target object association method and device, computing equipment and storage medium
CN113506113B (en) Credit card cash-registering group-partner mining method and system based on associated network
CN108460038A (en) Rule matching method and its equipment
CN114239083A (en) Efficient state register identification method based on graph neural network
CN109885708A (en) The searching method and device of certificate picture
CN106844338B (en) method for detecting entity column of network table based on dependency relationship between attributes
CN111325578B (en) Sample determination method and device of prediction model, medium and equipment
CN111028092A (en) Community discovery method based on Louvain algorithm, computer equipment and readable storage medium thereof
CN109885651A (en) A kind of question pushing method and device
CN112541357B (en) Entity identification method and device and intelligent equipment
CN113869398A (en) Unbalanced text classification method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination