CN115994244B - Directed graph data processing method and device based on big data and computer equipment - Google Patents


Info

Publication number
CN115994244B
Authority
CN
China
Prior art keywords
node
target
closed loop
information
pointer array
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111211788.7A
Other languages
Chinese (zh)
Other versions
CN115994244A (en)
Inventor
廖承龙
周宓
贾克典
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yunnan Nantian Electronics Information Corp ltd
GUANGZHOU NANTIAN COMPUTER SYSTEM CO Ltd
Original Assignee
Yunnan Nantian Electronics Information Corp ltd
GUANGZHOU NANTIAN COMPUTER SYSTEM CO Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yunnan Nantian Electronics Information Corp ltd, GUANGZHOU NANTIAN COMPUTER SYSTEM CO Ltd
Priority to CN202111211788.7A
Publication of CN115994244A
Application granted
Publication of CN115994244B
Legal status: Active
Anticipated expiration


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a directed graph data processing method and device based on big data, and to computer equipment. The method comprises the following steps: acquiring node identification characteristic values corresponding to a plurality of candidate nodes in a mesh graph; performing node screening processing according to the node identification characteristic values corresponding to the candidate nodes respectively, and determining a plurality of target nodes meeting closed loop link searching conditions; generating a target two-dimensional pointer array based on the node identification characteristic values and directed path information corresponding to each of the plurality of target nodes; and, according to preset path matching information, performing recursive traversal on each target node in the target two-dimensional pointer array by adopting a multithreading processing mode to obtain directed closed loop link information. The method optimizes the search for closed loop links in a directed graph: target nodes are screened according to the node identification characteristic values, stored in the target two-dimensional pointer array, and recursively traversed in a multithreading processing mode, which saves memory space, speeds up traversal, and improves operation efficiency.

Description

Directed graph data processing method and device based on big data and computer equipment
Technical Field
The present invention relates to the field of data processing, and in particular, to a method, an apparatus, a computer device, and a storage medium for processing directed graph data based on big data.
Background
In many real-world scenarios, the data or information involved can be merged into a mesh graph composed of nodes and directed paths; one example is the large number of transfer transactions that a bank generates each day. Each node of the mesh graph records its related directed path information.
In order to find closed loops in such a directed mesh graph, conventional methods often occupy a large amount of memory when processing big data, and their operation efficiency is low.
Therefore, the related art has a problem of low processing efficiency when searching for a closed loop based on a directed graph.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a directed graph data processing method, apparatus, computer device, and storage medium that can solve the foregoing problems.
A directed graph data processing method based on big data, the method comprising:
acquiring node identification characteristic values corresponding to a plurality of candidate nodes in the mesh map;
performing node screening processing according to the node identification characteristic values corresponding to the candidate nodes respectively, and determining a plurality of target nodes meeting closed loop link searching conditions;
generating a target two-dimensional pointer array based on node identification characteristic values and directed path information corresponding to each of the plurality of target nodes;
and according to preset path matching information, performing recursive traversal on each target node in the target two-dimensional pointer array by adopting a multithreading processing mode to obtain directed closed loop link information.
In one embodiment, before the step of obtaining the node identification feature values corresponding to each of the plurality of candidate nodes in the mesh map, the method further includes:
acquiring a mesh graph in which closed loop link information is to be searched; the mesh graph includes a plurality of nodes;
determining a plurality of processing devices and candidate nodes corresponding to the processing devices according to preset distributed configuration information;
and acquiring node identifications of the candidate nodes from the mesh map based on each processing device, and converting the node identifications into node identification characteristic values.
In one embodiment, performing node screening processing according to the node identification characteristic values corresponding to the candidate nodes respectively, and determining a plurality of target nodes meeting the closed loop link searching condition, includes:
generating an outbound node pointer array and an inbound node pointer array according to the node identification characteristic values corresponding to the candidate nodes respectively;
performing characteristic value screening processing on the outbound node pointer array and the inbound node pointer array respectively to obtain an outbound node characteristic value screening result and an inbound node characteristic value screening result;
and obtaining the plurality of target nodes based on the outbound node characteristic value screening result and the inbound node characteristic value screening result according to a preset closed loop link searching condition.
In one embodiment, performing characteristic value screening processing on the outbound node pointer array and the inbound node pointer array to obtain an outbound node characteristic value screening result and an inbound node characteristic value screening result includes:
performing characteristic value sorting and characteristic value de-duplication on a plurality of node identification characteristic values in the outbound node pointer array to obtain the outbound node characteristic value screening result;
and performing characteristic value sorting and characteristic value de-duplication on the plurality of node identification characteristic values in the inbound node pointer array to obtain the inbound node characteristic value screening result.
In one embodiment, performing recursive traversal on each target node in the target two-dimensional pointer array by adopting a multithreading processing mode according to preset path matching information to obtain directed closed loop link information includes:
taking each target node in the target two-dimensional pointer array as a processing head node of a closed loop link to be searched;
for each processing head node, adopting a single thread to perform recursive traversal according to the preset path matching information, and obtaining a directed closed loop link corresponding to the processing head node;
and taking the directed closed loop links corresponding to the processing head nodes as the directed closed loop link information.
In one embodiment, the directed path information includes one or more pieces of outbound path information, and for each processing head node, adopting a single thread to perform recursive traversal according to the preset path matching information to obtain a directed closed loop link corresponding to the processing head node includes:
for each piece of outbound path information of the processing head node, if the outbound path information matches the preset path matching information, determining the inbound node corresponding to the processing head node;
if the outbound path information of the current inbound node matches the preset path matching information, determining the next inbound node of the current inbound node, and so on until the traversal yields a closed loop result;
taking the closed loop results corresponding to the one or more pieces of outbound path information as the directed closed loop links corresponding to the processing head node; there may be one or more such directed closed loop links.
In one embodiment, further comprising:
if the outbound path information of the current inbound node does not match the preset path matching information, returning to the previous inbound node of the current inbound node;
and performing the path information matching judgment again for the previous inbound node, and determining an updated inbound node of the previous inbound node, so as to continue the traversal from the updated inbound node.
A directed graph data processing apparatus based on big data, the apparatus comprising:
the node acquisition module is used for acquiring node identification characteristic values corresponding to a plurality of candidate nodes in the mesh map respectively;
the node screening module is used for carrying out node screening processing according to the node identification characteristic values corresponding to the candidate nodes respectively and determining a plurality of target nodes which meet the closed loop link searching condition;
the target two-dimensional pointer array generation module is used for generating a target two-dimensional pointer array based on the node identification characteristic values and the directed path information corresponding to the target nodes respectively;
and the recursive traversal module is used for recursively traversing each target node in the target two-dimensional pointer array in a multithreading processing mode according to preset path matching information to obtain directed closed loop link information.
A computer device comprising a memory storing a computer program and a processor implementing the steps of the big data based directed graph data processing method as described above when the computer program is executed.
A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of a big data based directed graph data processing method as described above.
According to the directed graph data processing method, the device, the computer equipment and the storage medium based on big data, node identification characteristic values corresponding to a plurality of candidate nodes in the mesh graph are obtained, node screening processing is carried out according to these node identification characteristic values, and a plurality of target nodes meeting the closed loop link searching condition are determined. A target two-dimensional pointer array is then generated based on the node identification characteristic values and the directed path information corresponding to the target nodes, and each target node in the target two-dimensional pointer array is recursively traversed in a multithreading processing mode according to preset path matching information to obtain directed closed loop link information. The search for closed loop links based on the directed graph is thereby optimized: the target nodes are obtained by screening according to the node identification characteristic values and are recursively traversed in a multithreading processing mode, so that memory space is saved, repeated traversal is avoided, the traversal speed is high, and the operation efficiency is improved.
Drawings
FIG. 1 is an application environment diagram of a directed graph data processing method based on big data in one embodiment;
FIG. 2 is a flow chart of a method for processing directed graph data based on big data in one embodiment;
FIG. 3 is a flow chart of a node screening step in one embodiment;
FIG. 4 is a flow chart of a recursive traversal process in one embodiment;
FIG. 5 is a schematic diagram of a process flow for finding a closed loop link in one embodiment;
FIG. 6 is a block diagram of a directed graph data processing device based on big data in one embodiment;
FIG. 7 is an internal block diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for presentation, analyzed data, etc.) related in the present application are both information and data authorized by the user or sufficiently authorized by each party; correspondingly, the application also provides a corresponding user authorization entry for the user to select authorization or select rejection.
The directed graph data processing method based on big data can be applied to the application environment shown in FIG. 1. A plurality of terminals 101 (e.g., a plurality of physical devices) communicate with the server 102 through a network, so that, based on the distributed configuration, the terminals 101 can perform closed loop link search processing on the data in the mesh graph, and the server 102 can collect the directed closed loop link information obtained by the terminals 101 to generate a closed loop link search result for the mesh graph. In practical applications, the terminal 101 may be, but is not limited to, a personal computer or a notebook computer, and the server 102 may be implemented as a stand-alone server or as a server cluster composed of a plurality of servers.
In one embodiment, as shown in FIG. 2, a directed graph data processing method based on big data is provided. The method is described as applied to the terminal 101 in FIG. 1, and includes the following steps:
Step 201, obtaining node identification characteristic values corresponding to a plurality of candidate nodes in a mesh graph respectively;
The candidate nodes may be a subset of the nodes in the mesh graph. For example, based on the distributed configuration, the subset of nodes to be processed by physical device A may be determined, and physical device A may then perform closed loop link search processing for that subset.
As an example, the node identification feature value may be a hash value obtained by converting a node number, and if each node in the mesh map has a unique number, the node number of each node may be converted into the hash value.
In practical application, for a plurality of candidate nodes in the mesh map, node numbers corresponding to the candidate nodes can be obtained, and then each node number can be converted into a hash value.
Specifically, each node in the mesh graph has recorded directed path information, such as information relating an outbound node to an inbound node, so that the number of the outbound node and the number of the inbound node can each be extracted from the directed path information; the number of each outbound node can then be converted into a corresponding hash value, and the number of each inbound node can be converted into a corresponding hash value. Performing node screening on hash values rather than on character strings increases the processing speed.
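By way of illustration only, the conversion of node numbers into hash values might be sketched as follows in Python. The function name `node_hash`, the choice of an 8-byte BLAKE2b digest, and the sample account numbers are assumptions made for this sketch; the source does not specify a hash function.

```python
import hashlib

def node_hash(node_number: str) -> int:
    """Map a node number to a 64-bit integer characteristic value (assumed hash choice)."""
    digest = hashlib.blake2b(node_number.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")

# Each directed path record is (outbound node number, inbound node number).
# The account numbers below are made-up sample data.
paths = [("ACC-001", "ACC-002"), ("ACC-002", "ACC-003"), ("ACC-003", "ACC-001")]

out_hashes = [node_hash(src) for src, _ in paths]   # hash values of outbound nodes
in_hashes = [node_hash(dst) for _, dst in paths]    # hash values of inbound nodes
```

Comparing fixed-width integers is generally cheaper than comparing variable-length strings, which is the effect the paragraph above relies on.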
Step 202, performing node screening processing according to node identification characteristic values corresponding to the candidate nodes, and determining a plurality of target nodes meeting closed loop link searching conditions;
After the node identification characteristic values corresponding to the candidate nodes are obtained, node screening processing can be performed according to the node identification characteristic values, and a plurality of target nodes meeting the preset closed loop link searching conditions can then be obtained.
In an example, since the mesh graph is a directed graph, the nodes in the graph are connected by one or more arrows, each of which points from one node to another. The number of arrows pointing into a node is the in-degree of the node, and the number of arrows pointing out of a node is the out-degree of the node. To find closed loop links, nodes that cannot form a closed loop link may be removed; for example, the hash values corresponding to nodes that have only out-degree or only in-degree may be deleted.
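The degree-based removal described above can be sketched as follows. This is a minimal Python illustration under assumed names (`prune_by_degree`, the edge list `edges`); it performs a single pass, as the text describes, and a node that survives is only a candidate and is not guaranteed to lie on a closed loop.

```python
from collections import Counter

def prune_by_degree(edges):
    """Keep only nodes whose in-degree and out-degree are both at least 1."""
    out_degree = Counter(src for src, _ in edges)   # arrows pointing out of each node
    in_degree = Counter(dst for _, dst in edges)    # arrows pointing into each node
    return {node for node in out_degree if in_degree[node] > 0}

# Sample data: S only sends, T only receives, so neither can be on a loop.
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("S", "A"), ("C", "T")]
kept = prune_by_degree(edges)
```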
Step 203, generating a target two-dimensional pointer array based on node identification characteristic values and directional path information corresponding to each of the plurality of target nodes;
As an example, the target node may be an outbound node.
The target two-dimensional pointer array may be an asymmetric two-dimensional structure pointer array whose number of row arrays is allocated dynamically according to the number of valid nodes, so that memory space can be saved to a greater extent.
In a specific implementation, the directed path information corresponding to each target node may be determined for the plurality of target nodes, where the directed path information may include one or more pieces of outbound path information; a target two-dimensional pointer array may then be generated according to the node identification characteristic values corresponding to the plurality of target nodes and the one or more pieces of outbound path information of each target node.
In an example, the directed path data (i.e., the one or more pieces of outbound path information) of the selected target nodes may be stored in a dynamically allocated asymmetric two-dimensional structure pointer array (i.e., the target two-dimensional pointer array), where the number of outbound nodes is used as the number of row arrays, the number of outbound paths of each outbound node is used as the number of columns of its row, and each row array pointer points to the first element address of a one-dimensional structure array.
For example, for a plurality of target nodes, the hash values corresponding to the node numbers can be sorted, and the sorted hash values are used as the row arrays of the asymmetric two-dimensional structure pointer array. Each target node occupies one row, and the columns of that row are configured according to the number of pieces of outbound path information of the target node, so that rows belonging to different target nodes can have different numbers of columns. By using the asymmetric two-dimensional structure pointer array, memory can be saved, and the advantages of both the dynamic pointer linked list and the array can be fully utilized.
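As a rough illustration of this layout, the following Python sketch builds a jagged (asymmetric) structure in which each row is only as long as the out-degree of its target node. Python lists of lists stand in for the pointer array of the source; the names `build_jagged`, `row_keys`, and `rows`, the integer hash values, and the `amount` field are all assumptions for the sketch.

```python
from bisect import bisect_left

def build_jagged(edges, target_hashes):
    """Return (row_keys, rows), where rows[i] holds the outbound paths of row_keys[i]."""
    row_keys = sorted(set(target_hashes))        # sorted hash values, one row each
    rows = [[] for _ in row_keys]                # rows start empty and grow on demand
    for src, dst, info in edges:
        i = bisect_left(row_keys, src)           # binary search on the sorted keys
        if i < len(row_keys) and row_keys[i] == src:
            rows[i].append((dst, info))          # row length equals the node's out-degree
    return row_keys, rows

# Sample data: (outbound hash, inbound hash, path info); node 40 is not a target.
edges = [
    (10, 20, {"amount": 500}),
    (10, 30, {"amount": 70}),
    (20, 10, {"amount": 480}),
    (30, 10, {"amount": 60}),
    (40, 10, {"amount": 5}),
]
row_keys, rows = build_jagged(edges, [10, 20, 30])
```

Compared with a full N x N matrix, this structure stores one entry per outbound path, which is where the memory saving comes from.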
Step 204, performing recursive traversal on each target node in the target two-dimensional pointer array by adopting a multithreading processing mode according to preset path matching information to obtain directed closed loop link information.
The preset path matching information may be one or more path parameters configured in advance, which may be used to determine whether outbound path information meets the matching condition. Taking a bank account transfer transaction as an example, the transfer amount, transfer time, transfer proportion, and the like may be configured in advance, which is not limited in this embodiment.
After the target two-dimensional pointer array is obtained, each target node in the target two-dimensional pointer array can be recursively traversed in a multithreading processing mode according to the preset path matching information, and the directed closed loop link information can then be obtained based on the recursive traversal results corresponding to the target nodes.
In one example, after recursive traversal based on the target two-dimensional pointer array, a plurality of directed closed loop links may be obtained, together with all nodes and the recorded directed path data in each directed closed loop link.
For example, taking bank account transfer transactions as an example, a bank generates a large number of account transfer transactions every day, and a complex directed mesh graph formed from the transfer transaction data can be obtained based on big data. To evade tracking by the police, offenders may carry out illegal acts by transferring funds among a plurality of accounts: account A transfers to B, B transfers to C, C transfers to D, and D transfers back to A, forming a 4-layer transfer closed loop. In this embodiment, by configuring the transfer amount, time span, transfer proportion, number of transfer layers, and the like, all transfer information forming a closed loop can be found efficiently. The transfer information may include the account numbers, transfer amounts, transfer times, and the like of the transfers in the closed loop, and may be provided to the relevant department as suspicious illegal transfer records for decision reference.
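The A, B, C, D example can be reproduced with the following minimal Python sketch. The function `find_loops`, the predicate `matches`, the `max_layers` parameter, the amounts, and the extra account X are assumptions introduced for illustration; only the transfer amount is checked here, whereas the source also mentions time span and transfer proportion.

```python
def find_loops(graph, head, matches, max_layers):
    """Return every simple directed loop that starts and ends at `head`."""
    loops = []

    def visit(node, path):
        for nxt, info in graph.get(node, []):
            if not matches(info):
                continue                       # path fails the preset matching information
            if nxt == head:
                loops.append(path + [head])    # closed loop result
            elif nxt not in path and len(path) < max_layers:
                visit(nxt, path + [nxt])       # descend one layer
        # falling out of the loop returns to the previous node (backtracking)

    visit(head, [head])
    return loops

# Sample transfers: account -> [(receiving account, path information)].
graph = {
    "A": [("B", {"amount": 10000})],
    "B": [("C", {"amount": 9800}), ("X", {"amount": 50})],
    "C": [("D", {"amount": 9700})],
    "D": [("A", {"amount": 9500})],
    "X": [("A", {"amount": 40})],
}
loops = find_loops(graph, "A", lambda info: info["amount"] >= 1000, max_layers=4)
# loops == [["A", "B", "C", "D", "A"]]
```

With the amount threshold removed, the small-value route through X would also be reported, which shows how the preset path matching information narrows the result.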
In the embodiment of the application, node screening processing is performed according to the node identification characteristic values corresponding to a plurality of candidate nodes in a mesh graph, and a plurality of target nodes meeting the closed loop link searching condition are determined. A target two-dimensional pointer array is then generated based on the node identification characteristic values and the directed path information corresponding to the plurality of target nodes, and each target node in the target two-dimensional pointer array is recursively traversed in a multithreading processing mode according to preset path matching information to obtain directed closed loop link information. The search for closed loop links based on the directed graph is thereby optimized: the target nodes are obtained by screening according to the node identification characteristic values and are recursively traversed in a multithreading processing mode, so that memory space is saved, repeated traversal is avoided, the traversal speed is high, and the operation efficiency is improved.
In one embodiment, before the step of obtaining the node identification feature values corresponding to each of the plurality of candidate nodes in the mesh map, the method may include the following steps:
acquiring a mesh graph in which closed loop link information is to be searched, the mesh graph including a plurality of nodes; and determining a plurality of processing devices and a plurality of candidate nodes corresponding to each processing device according to preset distributed configuration information, so that each processing device processes its corresponding candidate nodes to obtain directed closed loop link information.
The mesh graph may be one or more directed mesh graphs acquired based on big data, and each directed mesh graph may include a plurality of nodes.
In practical application, based on the preset distributed configuration information, the nodes in the mesh graph in which closed loop link information is to be searched can be divided into a plurality of subsets of nodes to be processed. A plurality of processing devices, and the subset of nodes to be processed by each processing device (i.e., its plurality of candidate nodes), can then be determined, and each processing device can process its corresponding candidate nodes to obtain the directed closed loop link information.
For example, when two physical devices (i.e. processing devices) are adopted for distributed processing, the nodes in the mesh map can be divided into two partial nodes to be processed according to a preset configuration proportion, for example, physical device A processes 40% of the nodes and physical device B processes 60% of the nodes, so that a plurality of physical devices can be adopted for searching closed loop links simultaneously through distributed processing, the execution speed is improved, and the operation efficiency is improved.
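A proportional split such as the 40% / 60% example might look like the following Python sketch. The function `partition_nodes` and the use of consecutive slices are assumptions; the source does not state how nodes are assigned to devices beyond the configured proportion, nor how the graph data itself is shared between devices.

```python
def partition_nodes(nodes, ratios):
    """Split `nodes` into consecutive slices whose sizes follow `ratios`."""
    total = sum(ratios)
    parts, start, cumulative = [], 0, 0
    for ratio in ratios:
        cumulative += ratio
        end = round(len(nodes) * cumulative / total)   # cumulative boundary avoids gaps
        parts.append(nodes[start:end])
        start = end
    return parts

nodes = list(range(10))                                # ten sample node identifiers
device_a, device_b = partition_nodes(nodes, [40, 60])  # physical device A and B
```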
In an example, when distributed processing is performed by using multiple physical devices (i.e., processing devices), each physical device may use a multithreading manner to process a plurality of candidate nodes corresponding to the physical device, so as to obtain directional closed-loop link information.
According to this embodiment, the mesh graph in which closed loop link information is to be searched is obtained, and the plurality of processing devices and the candidate nodes corresponding to each processing device are determined according to the preset distributed configuration information. Each processing device then obtains the node identifications of its candidate nodes from the mesh graph and converts the node identifications into node identification characteristic values. Through distributed processing, a plurality of physical devices can search for closed loop links simultaneously, which improves the operation efficiency.
In one embodiment, as shown in FIG. 3, performing node screening processing according to the node identification characteristic values corresponding to each of the plurality of candidate nodes, and determining a plurality of target nodes, may include the following steps:
Step 301, generating an outbound node pointer array and an inbound node pointer array according to the node identification characteristic values corresponding to the candidate nodes respectively;
In a specific implementation, for a plurality of candidate nodes, the numbers of the outbound nodes and the numbers of the inbound nodes can each be extracted from the directed path information of the mesh graph. The number of each outbound node can then be converted into a corresponding hash value, the number of each inbound node can be converted into a corresponding hash value, and the hash values can be stored in a dynamically allocated outbound node pointer array and a dynamically allocated inbound node pointer array respectively.
In an example, the array of outbound node pointers and the array of inbound node pointers may be one-dimensional structure arrays.
Step 302, performing characteristic value screening processing on the outbound node pointer array and the inbound node pointer array respectively to obtain an outbound node characteristic value screening result and an inbound node characteristic value screening result;
In practical application, the characteristic value screening processing may adopt characteristic value sorting and characteristic value de-duplication. The hash values in the outbound node pointer array and in the inbound node pointer array are sorted respectively so as to solve the problem of hash collision, i.e., so that node numbers and hash values are in a unique corresponding relationship. Duplicate hash values can then be removed from the outbound node pointer array and the inbound node pointer array respectively; for example, if outbound node A has a plurality of outbound paths, the hash value of node A's number is recorded only once, and the repeatedly recorded hash values of node A's number are removed.
Step 303, obtaining the plurality of target nodes based on the outbound node characteristic value screening result and the inbound node characteristic value screening result according to the preset closed loop link searching condition.
In an example, in order to find closed loop links, the node characteristic values corresponding to nodes that cannot form a closed loop link may be removed according to the outbound node characteristic value screening result and the inbound node characteristic value screening result. For example, the hash values corresponding to nodes that have only out-degree or only in-degree may be deleted, and a plurality of outbound nodes meeting the closed loop link searching condition can thus be obtained as the plurality of target nodes.
According to this embodiment, the outbound node pointer array and the inbound node pointer array are generated according to the node identification characteristic values corresponding to the candidate nodes, and characteristic value screening processing is performed on each array to obtain an outbound node characteristic value screening result and an inbound node characteristic value screening result. A plurality of target nodes are then obtained from these two screening results according to the preset closed loop link searching condition. Because node screening is performed on hash values, the processing speed is increased.
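Steps 301 to 303 can be sketched in Python as follows. The helper names `sort_and_dedup` and `intersect_sorted` and the small integer hash values are assumptions for illustration; the two-pointer intersection is one plausible way to keep only nodes that appear in both arrays and is not stated in the source.

```python
def sort_and_dedup(values):
    """Sort values and drop adjacent duplicates, as one would on a plain array."""
    result = []
    for value in sorted(values):
        if not result or result[-1] != value:
            result.append(value)
    return result

def intersect_sorted(a, b):
    """Two-pointer intersection of two sorted, duplicate-free lists."""
    i = j = 0
    common = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            common.append(a[i])      # node has both out-degree and in-degree
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1                   # a[i] appears only as an outbound node
        else:
            j += 1                   # b[j] appears only as an inbound node
    return common

out_array = [30, 10, 10, 20, 40]     # outbound node hash values, one per path
in_array = [20, 10, 30, 10, 50]      # inbound node hash values, one per path
targets = intersect_sorted(sort_and_dedup(out_array), sort_and_dedup(in_array))
# targets == [10, 20, 30]
```

Because both arrays are sorted first, the intersection needs only one linear pass over each array.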
In one embodiment, performing the feature value screening on the outbound node pointer array and the inbound node pointer array respectively, to obtain the outbound node feature value screening result and the inbound node feature value screening result, may include the following steps:
performing feature value sorting and feature value de-duplication on the plurality of node identification feature values in the outbound node pointer array to obtain the outbound node feature value screening result; and performing feature value sorting and feature value de-duplication on the plurality of node identification feature values in the inbound node pointer array to obtain the inbound node feature value screening result.
In an example, feature value sorting sorts the hash values in the outbound node pointer array or the inbound node pointer array and is used to resolve hash collisions, so that each node number corresponds to a unique hash value; feature value de-duplication removes repeated hash values from the outbound node pointer array or the inbound node pointer array.
Through this embodiment, feature value sorting and feature value de-duplication are performed on the plurality of node identification feature values in the outbound node pointer array to obtain the outbound node feature value screening result, and on the plurality of node identification feature values in the inbound node pointer array to obtain the inbound node feature value screening result. Sorting and de-duplication resolve hash collisions and remove repeated hash values, providing data support for subsequent processing.
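The screening described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the hash function (BLAKE2b truncated to 64 bits), the function names, and the representation of directed paths as (source, destination) pairs are all assumptions made for the example. It sorts and de-duplicates the two hash arrays, then keeps only hashes that appear in both, which drops nodes that have only an out-degree or only an in-degree.

```python
import hashlib


def node_hash(node_id: str) -> int:
    """Map a node number to a 64-bit integer feature value (hash choice is an assumption)."""
    digest = hashlib.blake2b(node_id.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sort_and_dedupe(values):
    """Sort the hash values, then drop adjacent duplicates in a single pass."""
    result = []
    for value in sorted(values):
        if not result or result[-1] != value:
            result.append(value)
    return result


def screen_target_hashes(edges):
    """Return the sorted hashes of nodes that have both an out-degree and an in-degree."""
    out_array = sort_and_dedupe(node_hash(src) for src, _ in edges)
    in_array = sort_and_dedupe(node_hash(dst) for _, dst in edges)
    # Two-pointer intersection of the two sorted arrays.
    targets = []
    i = j = 0
    while i < len(out_array) and j < len(in_array):
        if out_array[i] == in_array[j]:
            targets.append(out_array[i])
            i += 1
            j += 1
        elif out_array[i] < in_array[j]:
            i += 1
        else:
            j += 1
    return targets
```

For the paths A to B, B to C, C to A, C to D and E to A, nodes D (in-degree only) and E (out-degree only) would be removed and A, B and C kept.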
In one embodiment, as shown in fig. 4, recursively traversing each target node in the target two-dimensional pointer array in a multithreaded manner according to the preset path matching information, to obtain the directed closed-loop link information, may include the following steps:
Step 401, taking each target node in the target two-dimensional pointer array as a processing head node of a closed loop link to be searched;
In a specific implementation, each target node in the target two-dimensional pointer array can be used as a processing head node of a closed loop link to be searched, so that closed loops can be traversed with each target node as the head node.
Step 402, for each processing head node, performing recursive traversal by adopting a single thread according to preset path matching information to obtain a directed closed loop link corresponding to the processing head node;
In practical application, a plurality of threads may be started, and each thread traverses the closed loops that take a given node as the head node. That is, within a single thread, recursive traversal is performed according to the preset path matching information to obtain the directed closed loop link corresponding to that processing head node.
In an alternative embodiment, when the current processing device has N CPU hardware threads, N-1 threads can be started. Each thread traverses the closed loops that take a given node as the head node and, on completion, applies for the next node to traverse. Starting N-1 traversal threads meets the multithreading requirement and makes full use of a multi-core, multi-thread CPU, while leaving capacity for the machine to keep operating normally without stalling.
In an example, taking a single physical device as an example, the device may have a master process that controls the progress of the multiple threads. Node coordination by the master process prevents threads from reprocessing nodes that have already been processed. For example, each thread processes the closed loops that take a given node as the head node; when it finishes, the thread applies to the master process for the next node. The master process then assigns one of the unprocessed nodes to the applying thread as the next head node to be processed, allocating nodes in ascending order of node number until the threads have processed all nodes.
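The dispatch scheme above might look like the following sketch. It is an illustration under stated assumptions: the names `traverse_all` and `traverse_one` are hypothetical, the master process is modeled as a lock-protected iterator over the sorted head nodes, and node numbers are assumed to be sortable values other than `None`. In CPython the global interpreter lock limits CPU parallelism for pure-Python work, so the example shows only the coordination pattern, not the performance benefit.

```python
import os
import threading


def traverse_all(head_nodes, traverse_one, worker_count=None):
    """Run traverse_one(head) for every head node using N-1 worker threads.

    Head nodes are handed out in ascending order, one at a time, so no node
    is processed twice.
    """
    if worker_count is None:
        # N-1 threads, where N is the number of logical CPUs (at least one worker).
        worker_count = max(1, (os.cpu_count() or 2) - 1)

    pending = iter(sorted(head_nodes))  # coordinator state: unprocessed nodes
    lock = threading.Lock()
    results = {}

    def worker():
        while True:
            with lock:
                head = next(pending, None)  # apply for the next head node
            if head is None:
                return  # nothing left to process
            loops = traverse_one(head)
            with lock:
                results[head] = loops

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

A call such as `traverse_all([3, 1, 2], find_loops, worker_count=2)` would process nodes 1, 2 and 3 exactly once each, where `find_loops` stands for whatever per-head traversal is used.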
Step 403, taking the directed closed loop links corresponding to the processing head nodes as the directed closed loop link information.
In an example, the directed closed loop links obtained by traversal may be written into a file, and the directed closed loop link information may then be obtained based on the directed closed loop links corresponding to each of the plurality of processing head nodes.
According to this embodiment, each target node in the target two-dimensional pointer array is taken as a processing head node of a closed loop link to be searched; for each processing head node, a single thread performs recursive traversal according to the preset path matching information to obtain the directed closed loop link corresponding to that processing head node; and the directed closed loop links corresponding to the processing head nodes are taken as the directed closed loop link information. The multithreaded processing makes full use of multi-core, multi-thread CPU performance and improves operating efficiency.
In an embodiment, the directed path information may include one or more pieces of outbound path information, and performing, for each processing head node, recursive traversal in a single thread according to the preset path matching information to obtain the directed closed loop link corresponding to the processing head node may include the following steps:
for each outbound path information of the processing head node, if the outbound path information is matched with the preset path matching information, determining an inbound node corresponding to the processing head node;
In practical application, for each piece of outbound path information of the processing head node, it can be judged whether the outbound path information matches the preset path matching information. When it matches, the information about the inbound node can be extracted from the outbound path information to determine the inbound node corresponding to the processing head node.
In an example, each node may have one or more outbound paths, each represented by an arrow pointing away from the node, and the outbound path information of a node records information about the path from that node to its inbound node. Taking bank transfer transactions as an example, each account may be a node. When there is a transfer from account A to account B, node A has outbound path information pointing to node B, which may include the account information corresponding to node B, the transfer amount between account A and account B, the transfer time, the transfer proportion, and so on. According to the preset path matching information, it can then be determined whether the outbound path information of node A pointing to node B meets the matching condition.
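To make the bank transfer example concrete, the sketch below models one piece of outbound path information and a matching rule. The field names, the `MatchRule` type and the specific conditions (a minimum amount and a time window) are hypothetical; the source does not specify what the preset path matching information contains.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class OutPath:
    """One outbound path: a transfer from `source` to `target` (fields are assumed)."""
    source: str
    target: str
    amount: float
    time: datetime


@dataclass(frozen=True)
class MatchRule:
    """Hypothetical preset path matching information."""
    min_amount: float
    start: datetime
    end: datetime


def path_matches(path: OutPath, rule: MatchRule) -> bool:
    """Return True when the transfer is large enough and falls inside the time window."""
    return path.amount >= rule.min_amount and rule.start <= path.time <= rule.end
```

With a rule of at least 1000.0 during 2021, a transfer of 5000.0 from A to B on 2021-06-01 would match, and node B would be taken as the inbound node of A.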
If the outbound path information of the current inbound node is matched with the preset path matching information, determining the next inbound node of the current inbound node until traversing to obtain a closed-loop result;
In a specific implementation, the next inbound node is determined from the processing head node and taken as the current inbound node; the same path matching judgment is then applied to determine its next inbound node. The closed loop link search proceeds in this way until a closed-loop result is obtained through traversal.
For example, if the processing head node is node A, node B pointed to by node A can be determined according to the path matching information. It can then be judged whether the outbound path information between node B and node C matches the path matching information; when it matches, node C pointed to by node B can be determined. The traversal continues in this way until a path points back to node A, forming a closed loop link.
Taking the closed loop results corresponding to the one or more pieces of outbound path information as the directed closed loop links corresponding to the processing head node; there are one or more directed closed loop links.
In an example, the processing head node may have one or more outbound path information, and then a closed loop result corresponding to each of the one or more outbound path information may be used as a directed closed loop link corresponding to the processing head node, e.g., one or more directed closed loop links may be obtained for the processing head node a through traversal.
According to this embodiment, for each piece of outbound path information of the processing head node, if the outbound path information matches the preset path matching information, the inbound node corresponding to the processing head node is determined; if the outbound path information of the current inbound node matches the preset path matching information, the next inbound node of the current inbound node is determined, until a closed-loop result is obtained through traversal; and the closed loop results corresponding to the one or more pieces of outbound path information are taken as the directed closed loop links corresponding to the processing head node, of which there are one or more. Each target node can thus be traversed as a processing head node to obtain one or more directed closed loop links, which improves the traversal effect for closed loop links.
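The traversal for a single processing head node can be sketched as a recursive search. This is a minimal illustration with several assumptions: `adjacency` is a plain dictionary from a node to a list of (inbound node, path info) pairs, `matches` is a caller-supplied predicate standing in for the preset path matching information, and a node is not revisited while it is already on the current path (the source does not say how revisits are handled).

```python
def find_loops_from(head, adjacency, matches):
    """Return every closed loop that starts and ends at `head`.

    Each loop is a list of nodes such as ["A", "B", "C", "A"].
    """
    loops = []
    path = [head]
    on_path = {head}

    def visit(node):
        for target, info in adjacency.get(node, []):
            if not matches(info):
                continue  # this outbound path does not meet the matching condition
            if target == head:
                loops.append(path + [head])  # the path points back to the head node
            elif target not in on_path:
                path.append(target)
                on_path.add(target)
                visit(target)
                on_path.discard(target)
                path.pop()

    visit(head)
    return loops
```

Because every target node serves as a head node in turn, the same loop may be reported once per node on it (as rotations); whether and how such duplicates are merged is not stated in the source.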
In one embodiment, the method may further comprise the steps of:
if the outbound path information of the current inbound node does not match the preset path matching information, returning to the previous inbound node of the current inbound node; and performing the path information matching judgment again for the previous inbound node, and determining an updated inbound node of the previous inbound node, so as to continue traversing according to the updated inbound node.
In practical application, with a recursive algorithm, when the outbound path information of the current inbound node does not match the preset path matching information, the traversal returns to the previous inbound node of the current inbound node. The path information matching judgment is performed again for the previous inbound node to determine its updated inbound node, and the traversal continues from the updated inbound node.
For example, with node A as the head node, suppose the link from node A to node B to node C to node D has been obtained. If node D has no next inbound node that conforms to the preset path matching information, the traversal returns to node C and performs the path information matching judgment again, obtaining an updated path from node C to node E. The updated link is then node A to node B to node C to node E. There is no need to return to head node A and repeat the traversal from node A to node B to node C; the recursive algorithm keeps the earlier traversal result, so repeated traversal is avoided.
In one example, the key algorithm used in the recursive traversal has a time complexity of O(n × log₂ n), where n is the number of directed paths and log₂ n denotes the base-2 logarithm, which gives excellent traversal speed when searching for closed loop links in a directed graph.
Through this embodiment, if the outbound path information of the current inbound node does not match the preset path matching information, the traversal returns to the previous inbound node of the current inbound node, the path information matching judgment is performed again for the previous inbound node, and its updated inbound node is determined so that the traversal continues from the updated inbound node. The recursive algorithm keeps the earlier traversal result, which avoids repeated traversal and improves the traversal speed.
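The "return to the previous node and continue" behavior can be made explicit by writing the traversal with its own stack instead of the call stack. The sketch below is an assumption-laden illustration, not the patented code: `adjacency` maps a node to a list of (inbound node, path info) pairs, `matches` is a hypothetical predicate, and `cursors` stores, for each node on the current path, the index of the next outbound path to try. That saved index is what lets the search resume at node C without re-walking A to B to C.

```python
def find_loops_with_backtracking(head, adjacency, matches):
    """Explicit-stack search for closed loops through `head`."""
    loops = []
    path = [head]
    cursors = [0]  # cursors[i]: next outbound path index to try for path[i]
    on_path = {head}

    while path:
        node = path[-1]
        out_paths = adjacency.get(node, [])
        if cursors[-1] >= len(out_paths):
            # No outbound path left to try: return to the previous node.
            on_path.discard(path.pop())
            cursors.pop()
            continue
        target, info = out_paths[cursors[-1]]
        cursors[-1] += 1  # remember where to resume for this node
        if not matches(info):
            continue
        if target == head:
            loops.append(path + [head])
        elif target not in on_path:
            path.append(target)
            cursors.append(0)
            on_path.add(target)
    return loops
```

For the example in the text (A to B to C to D with D a dead end, then C to E and E back to A), the search pops D, resumes at C with its second outbound path, and reports the loop A, B, C, E, A.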
In order to enable those skilled in the art to better understand the above steps, an embodiment of the present application will be exemplarily described below by way of an example with reference to fig. 5, but it should be understood that the embodiment of the present application is not limited thereto.
1. Converting the node number (i.e., the node identification of the candidate node) into a hash value (i.e., the node identification characteristic value) and storing the hash value into the out-degree array (i.e., the outbound node pointer array) and the in-degree array (i.e., the inbound node pointer array) respectively;
2. Sorting the hash values in the outbound node pointer array and the inbound node pointer array (i.e., performing characteristic value sorting on the plurality of node identification characteristic values in the outbound node pointer array and on those in the inbound node pointer array);
3. Removing repeated hash values from the outbound node pointer array and the inbound node pointer array respectively (i.e., performing characteristic value de-duplication on the plurality of node identification characteristic values in the outbound node pointer array and on those in the inbound node pointer array);
4. Deleting the hash values of nodes that have only an out-degree or only an in-degree to obtain a plurality of target nodes that meet the closed loop link searching condition;
5. Storing the directed path data (i.e., the node identification characteristic values and directed path information corresponding to the plurality of target nodes) in an asymmetric two-dimensional structure pointer array (i.e., the target two-dimensional pointer array);
6. Each thread recursively traverses from one head node (i.e., a processing head node) and applies for the next head node after completion;
7. Multiple physical devices (i.e., multiple processing devices) may be used to traverse simultaneously through distributed processing;
8. Writing the directed closed loops formed during the traversal (i.e., the directed closed loop links obtained by traversing from each processing head node) to a file.
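Step 5 above can be pictured as building a jagged (row lengths differ) array in which each row belongs to one target node and holds that node's outbound paths. The sketch below is only one possible reading of "asymmetric two-dimensional structure pointer array": the function name, the tuple layout and the use of binary search over the sorted target identifiers are assumptions, and Python lists stand in for pointer arrays.

```python
from bisect import bisect_left


def build_jagged_array(target_ids, edges):
    """Build rows[i] = list of (row index of inbound node, path info) for target_ids[i].

    target_ids must be sorted and free of duplicates. Paths whose source or
    destination is not a target node are dropped, since they cannot lie on a loop.
    """
    rows = [[] for _ in target_ids]

    def row_of(node_id):
        # Binary search in the sorted identifier array; -1 when absent.
        i = bisect_left(target_ids, node_id)
        return i if i < len(target_ids) and target_ids[i] == node_id else -1

    for src, dst, info in edges:
        s, d = row_of(src), row_of(dst)
        if s >= 0 and d >= 0:
            rows[s].append((d, info))
    return rows
```

Storing row indexes rather than node numbers means a traversal can follow an outbound path to the next row without another lookup.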
It should be understood that, although the steps in the flowcharts of fig. 2-4 are shown in sequence as indicated by the arrows, these steps are not necessarily performed in that sequence. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and the steps may be executed in other orders. Moreover, at least some of the steps in fig. 2-4 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 6, there is provided a directed graph data processing apparatus based on big data, including:
a node obtaining module 601, configured to obtain node identification feature values corresponding to a plurality of candidate nodes in the mesh map;
the node screening module 602 is configured to perform node screening according to node identifier feature values corresponding to the plurality of candidate nodes, and determine a plurality of target nodes that meet a closed loop link search condition;
a target two-dimensional pointer array generating module 603, configured to generate a target two-dimensional pointer array based on node identification feature values and directional path information corresponding to each of the plurality of target nodes;
and a recursive traversing module 604, configured to recursively traverse each target node in the target two-dimensional pointer array by using a multithreading processing manner according to the preset path matching information, so as to obtain directional closed loop link information.
In one embodiment, the apparatus further comprises:
the mesh graph acquisition module is used for acquiring a mesh graph in which closed loop link information is to be searched; the mesh graph includes a plurality of nodes;
the distributed configuration module is used for determining a plurality of processing devices and candidate nodes corresponding to the processing devices according to preset distributed configuration information;
And the node identification conversion module is used for acquiring the node identifications of the candidate nodes from the mesh map based on each processing device and converting the node identifications into node identification characteristic values.
In one embodiment, the node screening module 602 includes:
the one-dimensional pointer array generation submodule is used for generating an outbound node pointer array and an inbound node pointer array according to the node identification characteristic values corresponding to the candidate nodes respectively;
the characteristic value screening submodule is used for respectively performing characteristic value screening on the outbound node pointer array and the inbound node pointer array to obtain an outbound node characteristic value screening result and an inbound node characteristic value screening result;
the target node obtaining submodule is used for obtaining the plurality of target nodes based on the outbound node characteristic value screening result and the inbound node characteristic value screening result according to a preset closed loop link searching condition.
In one embodiment, the characteristic value screening submodule includes:
the outbound pointer array processing unit is used for performing feature value sorting and feature value de-duplication on a plurality of node identification feature values in the outbound node pointer array to obtain the outbound node feature value screening result;
and the inbound pointer array processing unit is used for performing feature value sorting and feature value de-duplication on a plurality of node identification feature values in the inbound node pointer array to obtain the inbound node feature value screening result.
In one embodiment, the recursive traversal module 604 comprises:
the processing head node determining submodule is used for taking each target node in the target two-dimensional pointer array as a processing head node of a closed loop link to be searched;
the single-thread processing sub-module is used for recursively traversing each processing head node by adopting a single thread according to preset path matching information to obtain a directed closed loop link corresponding to the processing head node;
and the directed closed loop link information obtaining submodule is used for taking the directed closed loop links corresponding to the processing head nodes as directed closed loop link information.
In one embodiment, the directed path information includes one or more outbound path information, and the single-threaded processing submodule includes:
the path matching information matching unit is used for determining, for each piece of outbound path information of the processing head node, an inbound node corresponding to the processing head node if the outbound path information matches the preset path matching information;
the traversal unit is used for determining the next inbound node of the current inbound node if the outbound path information of the current inbound node matches the preset path matching information, until a closed-loop result is obtained through traversal;
a directional closed loop link obtaining unit, configured to use the closed loop results corresponding to the one or more outbound path information as a directional closed loop link corresponding to the processing head node; the directed closed loop link has one or more.
In one embodiment, the apparatus further comprises:
the first recursion processing module is used for returning to the previous inbound node of the current inbound node if the outbound path information of the current inbound node does not match the preset path matching information;
and the second recursion processing module is used for performing the path information matching judgment again for the previous inbound node and determining an updated inbound node of the previous inbound node, so as to continue traversing according to the updated inbound node.
Specific limitations on the big data based directed graph data processing apparatus can be found in the above description of the big data based directed graph data processing method and are not repeated here. The modules in the above apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a terminal, and the internal structure of which may be as shown in fig. 7. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used for storing directed graph data processing data based on big data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements a directed graph data processing method based on big data.
It will be appreciated by those skilled in the art that the structure shown in fig. 7 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of:
acquiring node identification characteristic values corresponding to a plurality of candidate nodes in the mesh map;
performing node screening processing according to node identification characteristic values corresponding to the candidate nodes respectively, and determining a plurality of target nodes meeting closed loop link searching conditions;
generating a target two-dimensional pointer array based on node identification characteristic values and directed path information corresponding to each of the plurality of target nodes;
and according to preset path matching information, performing recursion traversal on each target node in the target two-dimensional pointer array by adopting a multithreading processing mode to obtain directional closed loop link information.
In one embodiment, the processor, when executing the computer program, further implements the steps of the big data based directed graph data processing method in the other embodiments described above.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:
acquiring node identification characteristic values corresponding to a plurality of candidate nodes in the mesh map;
Performing node screening processing according to node identification characteristic values corresponding to the candidate nodes respectively, and determining a plurality of target nodes meeting closed loop link searching conditions;
generating a target two-dimensional pointer array based on node identification characteristic values and directed path information corresponding to each of the plurality of target nodes;
and according to preset path matching information, performing recursion traversal on each target node in the target two-dimensional pointer array by adopting a multithreading processing mode to obtain directional closed loop link information.
In an embodiment, the computer program when executed by a processor also implements the steps of the big data based directed graph data processing method in the other embodiments described above.
Those skilled in the art will appreciate that all or part of the above described methods may be implemented by a computer program stored on a non-transitory computer readable storage medium, and that the program, when executed, may include the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, or the like. Volatile memory may include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to be within the scope of this description.
The above examples represent only a few embodiments of the present application. Although they are described in relative detail, they are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art could make various modifications and improvements without departing from the spirit of the present application, and these would fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims (9)

1. A directed graph data processing method based on big data, the method comprising:
acquiring node identification characteristic values corresponding to a plurality of candidate nodes in the mesh map;
performing node screening processing according to node identification characteristic values corresponding to the candidate nodes respectively, and determining a plurality of target nodes meeting closed loop link searching conditions;
Generating a target two-dimensional pointer array based on node identification characteristic values and directed path information corresponding to each of the plurality of target nodes;
according to preset path matching information, performing recursion traversal on each target node in the target two-dimensional pointer array by adopting a multithreading processing mode to obtain directed closed loop link information; each target node in the target two-dimensional pointer array is used as a processing head node of a closed loop link to be searched; for each processing head node, a single thread is adopted, recursive traversal is carried out according to preset path matching information, and a directed closed loop link corresponding to the processing head node is obtained; and taking the directional closed loop links corresponding to the processing head nodes as directional closed loop link information.
2. The method of claim 1, further comprising, prior to the step of obtaining node identification feature values for each of a plurality of candidate nodes in the mesh map:
acquiring a mesh map of closed loop link information to be searched; the mesh graph includes a plurality of nodes;
and determining a plurality of processing devices and a plurality of candidate nodes corresponding to the processing devices according to preset distributed configuration information, so as to process the corresponding plurality of candidate nodes by adopting the processing devices to obtain directional closed-loop link information.
3. The method of claim 1, wherein the performing node screening according to the node identification feature values corresponding to the candidate nodes, and determining a plurality of target nodes that meet the closed-loop link search condition, includes:
generating an outbound node pointer array and an inbound node pointer array according to the node identification characteristic values corresponding to the candidate nodes respectively;
performing characteristic value screening on the outbound node pointer array and the inbound node pointer array respectively to obtain an outbound node characteristic value screening result and an inbound node characteristic value screening result;
and obtaining the plurality of target nodes based on the outbound node characteristic value screening result and the inbound node characteristic value screening result according to a preset closed loop link searching condition.
4. The method of claim 3, wherein the performing characteristic value screening on the outbound node pointer array and the inbound node pointer array respectively to obtain an outbound node characteristic value screening result and an inbound node characteristic value screening result includes:
performing feature value sorting and feature value de-duplication on a plurality of node identification feature values in the outbound node pointer array to obtain the outbound node feature value screening result;
and performing feature value sorting and feature value de-duplication on the plurality of node identification feature values in the inbound node pointer array to obtain the inbound node feature value screening result.
5. The method according to claim 1, wherein the directional path information includes one or more outbound path information, and the performing recursion traversal according to preset path matching information by using a single thread for each processing head node to obtain a directional closed loop link corresponding to the processing head node includes:
for each outbound path information of the processing head node, if the outbound path information is matched with the preset path matching information, determining an inbound node corresponding to the processing head node;
if the outbound path information of the current inbound node is matched with the preset path matching information, determining the next inbound node of the current inbound node until traversing to obtain a closed-loop result;
taking the closed loop results corresponding to the one or more outbound path information as the directed closed loop links corresponding to the processing head node; the directed closed loop link has one or more.
6. The method as recited in claim 5, further comprising:
if the outbound path information of the current inbound node does not match the preset path matching information, returning to the previous inbound node of the current inbound node;
and performing the path information matching judgment again for the previous inbound node, and determining an updated inbound node of the previous inbound node, so as to continue traversing according to the updated inbound node.
7. A directed graph data processing apparatus based on big data, the apparatus comprising:
the node acquisition module is used for acquiring node identification characteristic values corresponding to a plurality of candidate nodes in the mesh map respectively;
the node screening module is used for carrying out node screening processing according to the node identification characteristic values corresponding to the candidate nodes respectively and determining a plurality of target nodes which meet the closed loop link searching condition;
the target two-dimensional pointer array generation module is used for generating a target two-dimensional pointer array based on the node identification characteristic values and the directional path information corresponding to the target nodes respectively;
the recursive traversal module is used for recursively traversing each target node in the target two-dimensional pointer array in a multithreaded manner according to preset path matching information to obtain directed closed loop link information; wherein each target node in the target two-dimensional pointer array is taken as a processing head node of a closed loop link to be searched; for each processing head node, a single thread performs recursive traversal according to the preset path matching information to obtain the directed closed loop link corresponding to that processing head node; and the directed closed loop links corresponding to the processing head nodes are taken as the directed closed loop link information.
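The array generation and multithreaded traversal modules described above might be sketched as follows. This is an illustrative approximation under stated assumptions: Python has no pointers, so the "target two-dimensional pointer array" is modeled as a list of rows indexed by node position; the names `build_array`, `loops_from`, and `find_all_loops`, the edge tuple layout, and the `matches` predicate are hypothetical. The node screening step is omitted because its condition is not specified in this section.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Entry = Tuple[str, int]  # (path information, row index of the inbound node)


def build_array(
    node_ids: List[str], edges: List[Tuple[str, str, str]]
) -> List[List[Entry]]:
    """Row i holds the outbound entries of node_ids[i] (assumed layout)."""
    index = {node_id: i for i, node_id in enumerate(node_ids)}
    rows: List[List[Entry]] = [[] for _ in node_ids]
    for src, info, dst in edges:
        if src in index and dst in index:
            rows[index[src]].append((info, index[dst]))
    return rows


def loops_from(
    rows: List[List[Entry]], head: int, matches: Callable[[str], bool]
) -> List[List[int]]:
    """Recursive traversal for one processing head node."""
    found: List[List[int]] = []

    def visit(node: int, path: List[int]) -> None:
        for info, nxt in rows[node]:
            if not matches(info):
                continue
            if nxt == head:
                found.append(path + [head])
            elif nxt not in path:
                visit(nxt, path + [nxt])

    visit(head, [head])
    return found


def find_all_loops(
    rows: List[List[Entry]], matches: Callable[[str], bool], max_workers: int = 4
) -> Dict[int, List[List[int]]]:
    """Submit one task per head node; each task runs on a single worker thread."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda h: loops_from(rows, h, matches), range(len(rows)))
        return dict(enumerate(results))


# Hypothetical example data.
rows = build_array(
    ["A", "B", "C"],
    [("A", "t", "B"), ("B", "t", "C"), ("C", "t", "A"), ("B", "x", "A")],
)
all_loops = find_all_loops(rows, lambda info: info == "t")
```

The rows are only read during traversal, so the worker threads need no locking. Note that in CPython the global interpreter lock limits CPU parallelism for pure-Python code, so this sketch shows the task structure rather than an expected speedup.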
8. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the big data based directed graph data processing method of any one of claims 1 to 6.
9. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the big data based directed graph data processing method of any one of claims 1 to 6.
CN202111211788.7A 2021-10-18 2021-10-18 Directed graph data processing method and device based on big data and computer equipment Active CN115994244B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111211788.7A CN115994244B (en) 2021-10-18 2021-10-18 Directed graph data processing method and device based on big data and computer equipment

Publications (2)

Publication Number Publication Date
CN115994244A (en) 2023-04-21
CN115994244B (en) 2024-03-19

Family

ID=85992809

Country Status (1)

Country Link
CN (1) CN115994244B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101339501A (en) * 2008-08-12 2009-01-07 北京航空航天大学 WS-BPEL control loop detection method based on directed graph
CN110598054A (en) * 2019-08-19 2019-12-20 桂林长海发展有限责任公司 Multithreading linked list processing method and device and computer readable storage medium
CN110955608A (en) * 2019-12-23 2020-04-03 金蝶软件(中国)有限公司 Test data processing method and device, computer equipment and storage medium
CN110968429A (en) * 2019-12-20 2020-04-07 北京百度网讯科技有限公司 Method, device, equipment and storage medium for loop detection in directed graph
CN111814002A (en) * 2019-04-12 2020-10-23 阿里巴巴集团控股有限公司 Directed graph identification method and system and server
CN112241474A (en) * 2020-11-24 2021-01-19 深圳前海微众银行股份有限公司 Information processing method, device and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6738777B2 (en) * 2000-12-20 2004-05-18 Microsoft Corporation Chaining actions for a directed graph

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
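Multithreaded parallel algorithm for finding cycles in large-scale sparse directed graphs; 牛健; 崔焕庆; 成曦; 傅游; Journal of Shandong University of Science and Technology (Natural Science Edition), (02); full text *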

Similar Documents

Publication Publication Date Title
US11430261B2 (en) Target re-identification
CN112287182B (en) Graph data storage and processing method and device and computer storage medium
CN106682215B (en) Data processing method and management node
US20120143844A1 (en) Multi-level coverage for crawling selection
US20170139913A1 (en) Method and system for data assignment in a distributed system
US20230056760A1 (en) Method and apparatus for processing graph data, device, storage medium, and program product
Mestre et al. Adaptive sorted neighborhood blocking for entity matching with mapreduce
CN112287339A (en) APT intrusion detection method and device and computer equipment
CN107920067B (en) Intrusion detection method on active object storage system
CN115994244B (en) Directed graph data processing method and device based on big data and computer equipment
Aryal et al. SparkSNN: a density-based clustering algorithm on spark
CN116719646A (en) Hot spot data processing method, device, electronic device and storage medium
WO2023083241A1 (en) Graph data division
CN115883172A (en) Anomaly monitoring method and device, computer equipment and storage medium
CN116225314A (en) Data writing method, device, computer equipment and storage medium
CN110555158A (en) mutually exclusive data processing method and system, and computer readable storage medium
CN115454781A (en) Data visualization display method and system based on enterprise architecture system
CN115858322A (en) Log data processing method and device and computer equipment
CN111191082B (en) Data management method, device, computer equipment and storage medium
Krishna et al. Mugram: An approach for multi-labelled graph matching
US7929774B2 (en) Method of inferential analysis of low resolution images
Li et al. Memory effect in DBSCAN algorithm
CN112486615B (en) Method, device, equipment and storage medium for executing decision flow based on topological path
CN110968454B (en) Method and apparatus for determining recovery data for lost data blocks
CN118034801A (en) Code loading method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant