CN116340559A - Graph data processing method - Google Patents

Info

Publication number
CN116340559A
Authority
CN
China
Prior art keywords
node
target
graph
candidate
graph data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310576057.5A
Other languages
Chinese (zh)
Other versions
CN116340559B (en
Inventor
孟轲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Damo Institute Hangzhou Technology Co Ltd
Original Assignee
Alibaba Damo Institute Hangzhou Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Damo Institute Hangzhou Technology Co Ltd filed Critical Alibaba Damo Institute Hangzhou Technology Co Ltd
Priority to CN202310576057.5A priority Critical patent/CN116340559B/en
Publication of CN116340559A publication Critical patent/CN116340559A/en
Application granted granted Critical
Publication of CN116340559B publication Critical patent/CN116340559B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53Querying
    • G06F16/535Filtering based on additional data, e.g. user or group profiles

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of this specification provide a graph data processing method, comprising: acquiring graph data to be processed and a target graph mode; screening a candidate node set in the graph data to be processed based on a designated initial node and the target graph mode, and counting the recurrence frequency of each node in the candidate node set; screening target node pairs from a target node set, and counting the number of target node pairs related to each node in the candidate node set, wherein the target node set is determined from the graph data to be processed based on the designated initial node and the target graph mode, and the target node pairs are determined based on the node connection relations in the target graph mode; and determining, according to the recurrence frequency and the number, sub-graph data corresponding to the target graph mode in the graph data to be processed.

Description

Graph data processing method
Technical Field
The embodiment of the specification relates to the technical field of computers, in particular to a graph data processing method.
Background
With the continuous development of computer technology, the method for processing graph data is also continuously optimized.
Currently, in order to mine out a subgraph conforming to a graph mode in a graph, pairing check is generally required to be performed on nodes in the graph to determine connection relations between the nodes.
However, in the process of pairing and checking the nodes in the graph, repeated or redundant node matching exists, so that the waste of computing resources is caused, and the efficiency of mining graph data conforming to the graph mode in the graph data is affected.
Therefore, there is a need to provide a more reliable solution to achieve efficient processing of graph data.
Disclosure of Invention
In view of this, the present embodiment provides a graph data processing method. One or more embodiments of the present specification relate to a graph data processing apparatus, a computing device, a computer-readable storage medium, and a computer program that solve the technical drawbacks existing in the prior art.
According to a first aspect of embodiments of the present specification, there is provided a graph data processing method, including:
acquiring graph data to be processed and a target graph mode;
screening a candidate node set in the graph data to be processed based on the designated initial node and the target graph mode, and counting the recurrence frequency of each node in the candidate node set;
screening target node pairs from a target node set, and counting the number of target node pairs related to each node in the candidate node set, wherein the target node set is determined from the graph data to be processed based on the designated initial node and the target graph mode, and the target node pairs are determined based on the node connection relations in the target graph mode;
and determining, according to the recurrence frequency and the number, sub-graph data corresponding to the target graph mode in the graph data to be processed.
According to a second aspect of embodiments of the present specification, there is provided a graph data processing apparatus comprising:
the acquisition module is configured to acquire the to-be-processed graph data and the target graph mode;
the statistics module is configured to screen a candidate node set in the graph data to be processed based on a designated initial node and the target graph mode, and count the recurrence frequency of each node in the candidate node set;
a screening module configured to screen target node pairs from a target node set and count the number of target node pairs related to each node in the candidate node set, wherein the target node set is determined from the graph data to be processed based on the designated initial node and the target graph mode, and the target node pairs are determined based on the node connection relations in the target graph mode;
and the determining module is configured to determine sub-graph data corresponding to the target graph mode in the graph data to be processed according to the recurrence frequency and the number.
According to a third aspect of embodiments of the present specification, there is provided a computing device comprising:
A memory and a processor;
the memory is configured to store computer executable instructions that, when executed by the processor, perform the steps of the graph data processing method described above.
According to a fourth aspect of embodiments of the present specification, there is provided a computer readable storage medium storing computer executable instructions which, when executed by a processor, implement the steps of the graph data processing method described above.
According to a fifth aspect of embodiments of the present specification, there is provided a computer program, wherein the computer program, when executed in a computer, causes the computer to perform the steps of the graph data processing method described above.
One embodiment of the present specification acquires graph data to be processed and a target graph mode; screens a candidate node set in the graph data to be processed based on a designated initial node and the target graph mode, and counts the recurrence frequency of each node in the candidate node set; screens target node pairs from a target node set, and counts the number of target node pairs related to each node in the candidate node set, wherein the target node set is determined from the graph data to be processed based on the designated initial node and the target graph mode, and the target node pairs are determined based on the node connection relations in the target graph mode; and determines, according to the recurrence frequency and the number, sub-graph data corresponding to the target graph mode in the graph data to be processed.
Acquiring the graph data to be processed and the target graph mode makes it possible to process the graph data based on the target graph mode. A candidate node set is screened based on the designated initial node and the target graph mode, and the recurrence frequency of the nodes in the candidate node set is counted; target node pairs are screened from the target node set, and the number of target node pairs corresponding to each node is counted; the sub-graph data in the graph data to be processed is then determined from the number of target node pairs and the recurrence frequency of the nodes. In this way not every node has to be computed, which saves computing resources and improves the efficiency of processing the graph data.
Drawings
FIG. 1 is a schematic view of a scenario of a graph data processing method according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of a graph data processing method provided in one embodiment of the present disclosure;
FIG. 3 is a process flow diagram of a graph data processing method according to one embodiment of the present disclosure;
FIG. 4 is a process flow diagram of another graph data processing method provided by one embodiment of the present disclosure;
FIG. 5 is a flow chart of a commodity graph data processing method for an electronic commerce provided in one embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a data processing apparatus according to one embodiment of the present disclosure;
FIG. 7 is a block diagram of a computing device provided in one embodiment of the present description.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present description. This description may be embodied in many other forms than described herein and similarly generalized by those skilled in the art to whom this disclosure pertains without departing from the spirit of the disclosure and, therefore, this disclosure is not limited by the specific implementations disclosed below.
The terminology used in the one or more embodiments of the specification is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the specification. As used in this specification, one or more embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that, although the terms first, second, etc. may be used in one or more embodiments of this specification to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first may also be referred to as a second, and similarly, a second may also be referred to as a first, without departing from the scope of one or more embodiments of the present description. The word "if" as used herein may be interpreted as "when", "upon" or "in response to determining", depending on the context.
Furthermore, it should be noted that, user information (including, but not limited to, user equipment information, user personal information, etc.) and data (including, but not limited to, data for analysis, stored data, presented data, etc.) according to one or more embodiments of the present disclosure are information and data authorized by a user or sufficiently authorized by each party, and the collection, use, and processing of relevant data is required to comply with relevant laws and regulations and standards of relevant countries and regions, and is provided with corresponding operation entries for the user to select authorization or denial.
First, terms related to one or more embodiments of the present specification will be explained.
Graph: an irregular data structure consisting of vertices and edges.
Graph pattern mining (GPM): finding the sub-graphs of a graph structure that are isomorphic to a given pattern.
Graph partitioning: dividing a large graph into sub-graphs whose vertex sets or edge sets do not overlap.
GPU: graphics processing unit, a microprocessor dedicated to image- and graphics-related operations.
Matching order: the order in which vertices are considered when searching for the pattern graph.
Automorphism: a symmetry of a graph in which the vertices and edges can be relabeled as other vertices and edges without changing the structure of the graph.
False candidates: candidate vertices encountered during the search that do not form part of the final result.
Graph pattern mining has made significant progress in recent years, enabling a series of big data applications. However, GPM applications require a large number of pairing checks, which enumerate all possible vertex pairs and check their connectivity or compute the common neighbors of each pair. Because of its massive parallelism, a graphics processing unit (GPU) has the potential to accelerate GPM applications. However, although many approaches to reducing false candidates have been proposed, such as optimizing the matching order, early termination, and automorphism filtering, many existing frameworks still suffer from serious false candidate problems. These false candidates waste significant computational effort and exacerbate load imbalance, resulting in inefficient execution on the GPU.
Graph mining algorithms are applied in a growing number of scenarios, such as structured e-commerce based on a knowledge graph, querying places that meet given requirements in a road network, mining user relationships in a social media network for analysis, or searching for drugs in a drug molecule database. Unlike traditional graph computing applications, a graph mining algorithm not only updates node states but also focuses on the topology of the sub-graphs. Because a graph mining algorithm often has to store a large amount of sub-graph topology information during computation, it puts great pressure on memory capacity and also places higher demands on data access speed. A sub-graph matching algorithm takes a labeled pattern graph as input and outputs all sub-graphs of the data graph that are isomorphic to the pattern graph. To cope with increasingly rich graph mining scenarios and with the need for low-latency, efficient queries, the GPU is used to compute graph mining applications. Specifically, the following technical problems need to be solved:
(1) GPU-friendly graph mining operators. Graph mining tasks can be implemented using set operations alone, so a set of operators needs to be designed and optimized for the high concurrency and hierarchical storage of the GPU, making full use of the GPU's coalesced memory access capability while avoiding redundant data loading, so as to support upper-layer graph mining algorithms.
(2) Redundant computation. During a graph mining task, the number of edges checked grows exponentially relative to the original edge data, while only a very small proportion of the checked edges form solutions, so the redundant computation in the graph mining task must be specifically optimized to improve the utilization efficiency of the system.
(3) Generality. Based on the above optimization of the set operation operators and of redundant computation, a system supporting multiple graph mining algorithms needs to be constructed: common problems are extracted from the various graph mining algorithms, and optimizing them allows the system to achieve high performance in more general scenarios.
The method of the present specification uses the characteristics of the data graph and the pattern graph to reduce redundant searches by avoiding checks of point pairs that have already been checked, and to reduce useless searches by skipping checks of two vertices that are inferred to be unconnected from pre-generated connectivity samples, thereby optimizing the graph pattern mining (GPM) problem.
In the present specification, a graph data processing method is provided, and the present specification relates to a graph data processing apparatus, a computing device, and a computer-readable storage medium, which are described in detail in the following embodiments one by one.
Referring to fig. 1, fig. 1 shows a schematic view of a scenario of a graph data processing method according to an embodiment of the present disclosure, which specifically includes:
receiving a graph data processing task, and determining, based on the graph data processing task, the target graph mode to be queried and the graph data to be processed shown in FIG. 1; after the target graph mode and the graph data to be processed are determined, method 1 or method 2 in FIG. 1 can be selected as required to mine the sub-graphs, that is, to obtain from the graph data to be processed the sub-graphs that conform to the target graph mode.
Specifically, method 1 reduces redundant searching by avoiding checks of node pairs that have already been checked, which improves the efficiency of determining the sub-graphs. In practical applications, the candidate node sets corresponding to different node pairs are very similar. For example, when searching for candidates for nodes 2 and 3 in the 3-star mode, the two nodes must not be connected to each other, while both are connected to 0 and both are not connected to 1. When enumerating the candidate nodes for 1, the search spaces of nodes 2 and 3 are found to differ only slightly, which results in redundant computation for the same nodes.
To eliminate redundancy occurring when checking node pairs one by one in the candidate node set, method 1 performs subgraph mining based on checking the complement of the corresponding candidate node set. By quickly estimating the number of possible sub-graphs and then eliminating erroneous edge pairs through a low overhead process, a large number of duplicate edge-connectivity checks are avoided.
Specifically, the candidate node set S = {1,2,3,4,5,6,7} is determined based on the graph data to be processed.
Step 1: Calculate the recurrence frequency of the nodes. Assume id(v0) = 0, i.e., node 0 in the graph data to be processed is taken as v0. Then S(v1) = N(0) = {3,4,5,6,7}; that is, when v0 is 0, v1 can only be one of the nodes {3,4,5,6,7} in order to satisfy the condition id(v1) > id(v2) > id(v3). The candidate node sets corresponding to v2 are then screened from the graph data to be processed based on the candidate node set corresponding to v1. The candidate sets for v2 are: Cand[7] = {1,2,3,6}, i.e., when v1 is 7, v2 may be 1, 2, 3 or 6; similarly, Cand[6] = {1,2,3,5}, Cand[5] = {1,2,3}, Cand[4] = {1,2,3}, and Cand[3] = {1,2}. Summing the number of occurrences of each node over all v2 candidate sets gives frequencies {5,5,4,1,1} for nodes {1,2,3,5,6}.
Step 2: Calculate the required number of point pairs. For the given candidate set S = {1,2,3,4,5,6,7}, all possible vertex pairs without edges are listed, e.g., <1,6>, <2,6>, …, <1,2>. The numbers of pairs associated with vertices {1,2,3,5,6} are then {0,1,2,3,4}.
Step 3: Estimate the number of sub-graphs. Based on Freq(i) and Pair(i) (i ∈ {1,2,3,5,6}), the number of possible candidate sets can be estimated as Freq(i) ∗ Pair(i). However, this estimate contains repeated counts and includes invalid candidates such as [0,7,6,5], because the point pair <5,6> is not actually valid in that case, so the estimate is inaccurate.
Step 4: Correct the estimate. To obtain a more accurate result, the number of invalid point pairs needs to be calculated. For each Cand[k], the number of point pairs <i, j> = 0 with i ∈ Comp[k] (the complement of Cand[k]) and j ∈ Cand[k] is calculated. These pairs are then removed from the previous estimate.
It should be noted that a point pair <i, j> with i, j ∈ Comp[k] should not be considered part of the correction, because it was not counted in step 3.
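Steps 1 to 3 can be illustrated with a minimal Python sketch. The candidate sets and pair counts are copied from the example above; the pair counts are not derived, because the full edge list of the data graph is not given in the text, and combining the per-node products Freq(i) ∗ Pair(i) by summation is an assumption of this sketch. The correction of step 4 is omitted for the same reason. All variable names are illustrative only.

```python
from collections import Counter

# Candidate sets for v2, keyed by the node chosen as v1 (values from the example).
cand = {
    7: {1, 2, 3, 6},
    6: {1, 2, 3, 5},
    5: {1, 2, 3},
    4: {1, 2, 3},
    3: {1, 2},
}

# Step 1: recurrence frequency = number of candidate sets that contain each node.
freq = Counter(node for nodes in cand.values() for node in nodes)

# Step 2: pair counts per node, copied from the example (not derived here,
# since the edge list of the data graph is not available).
pair = {1: 0, 2: 1, 3: 2, 5: 3, 6: 4}

# Step 3: uncorrected estimate. Summing Freq(i) * Pair(i) over the nodes is an
# assumption of this sketch; the text only names the per-node product.
estimate = sum(freq[i] * pair[i] for i in pair)

print(dict(sorted(freq.items())))
print(estimate)
```

With these inputs the frequencies for nodes {1,2,3,5,6} come out as {5,5,4,1,1}, matching step 1, and the uncorrected estimate still includes invalid candidates such as [0,7,6,5].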
Specifically, method 2 reduces ineffective searching by skipping point pairs that have been inferred to be unconnected from pre-generated connectivity samples. In practical applications, the sparsity of the data graph, i.e., the graph data to be processed, makes it more likely that two nodes are not connected. The 4-path mode requires that 2 and 3 are not connected to each other and that they have opposite connection states with respect to 0 and 1. Candidate points for 2 and 3 are therefore likely to be unconnected anyway, so checking their edge connectivity one pair at a time results in many unnecessary edge connectivity checks.
To reduce invalid searches, method 2 implements a batch node pair check, rather than enumerating all node pairs one by one. The erroneous candidates are terminated in advance by storing node connection relations within the divided node blocks in advance.
Specifically, based on the 4-path mode and the data graph in FIG. 1, the matching order is determined to be <v0, v1, v2, v3>, and unconnected node pairs <v2, v3> need to be determined, where v2 ∈ (N(v0) − N(v1)) and v3 ∈ (N(v1) − N(v0)). Currently, S(v0) = {0} and S(v1) = {7,12,13}, so it is necessary to check whether the three vertices 7, 12 and 13 in S(v1) are unconnected to the vertices in S(v0). The sampling granularity may be set to b = 3, so that the vertices in the data graph are divided into five blocks B0, B1, B2, B3 and B4. For each vertex, a bitmap may be used to store its neighbor information. For example, if the neighboring blocks of vertex 0 are B0, B1 and B2, its bitmap is Nblock(0) = 11100.
Step 1: Group the candidates by block (labeled 1). The vertices of S(v1) are divided into the block lists L0 = {7} and L1 = {12,13}; clearly g(L0) = B2 and g(L1) = B4. Connectivity can therefore be checked in batches for vertices from the same block.
Step 2: Check the inter-block connections using the bitmap (labeled 2). The connectivity between S(v0) = {0} and each list L is checked. For L0, Nblock(0) is looked up for B2 and the bit is found to be 1, which means that 0 must be connected to some vertex in B2 = {6,7,8}, but it is uncertain whether it is connected to 7; N(0) therefore has to be loaded in the range (6,7), and a binary search is performed in {6,7} to find 7. For L1, Nblock(0) is looked up for B4 and the bit is found to be 0, so the unconnected point pairs <0,12> and <0,13> are obtained directly.
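The batch check of method 2 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the neighbor list of vertex 0 is hypothetical (chosen so that its block bitmap is 11100 as in the example), blocks are assumed to be consecutive ranges of b = 3 vertex ids, and the binary search runs over the whole sorted neighbor list rather than only the range loaded for one block. The function names are illustrative.

```python
from bisect import bisect_left

BLOCK_SIZE = 3   # sampling granularity b from the example
NUM_BLOCKS = 5   # blocks B0..B4, assumed to cover vertex ids 0..14

def block_of(v):
    """Index of the block that holds vertex v (assumes consecutive id ranges)."""
    return v // BLOCK_SIZE

def block_bitmap(neighbors):
    """Bit string with a 1 for every block containing at least one neighbor."""
    bits = ["0"] * NUM_BLOCKS
    for u in neighbors:
        bits[block_of(u)] = "1"
    return "".join(bits)

def check_pairs(v, neighbors, candidates):
    """Split candidates into pairs connected / unconnected with respect to v."""
    bitmap = block_bitmap(neighbors)
    groups = {}
    for u in candidates:                      # step 1: group candidates by block
        groups.setdefault(block_of(u), []).append(u)
    connected, unconnected = [], []
    for block, members in sorted(groups.items()):
        if bitmap[block] == "0":              # step 2: whole block ruled out at once
            unconnected.extend((v, u) for u in members)
            continue
        for u in members:                     # bit is 1: an exact lookup is still needed
            i = bisect_left(neighbors, u)
            if i < len(neighbors) and neighbors[i] == u:
                connected.append((v, u))
            else:
                unconnected.append((v, u))
    return bitmap, connected, unconnected

n0 = [1, 3, 6, 7]  # hypothetical sorted neighbor list of vertex 0
bitmap, connected, unconnected = check_pairs(0, n0, [7, 12, 13])
print(bitmap, connected, unconnected)
```

With this hypothetical neighbor list, candidates 12 and 13 are rejected from the bitmap alone, and only candidate 7 requires a search in the neighbor list.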
Current graph data processing methods do not consider the redundant computation problem, so each edge in the graph is processed repeatedly. They also do not consider the invalid computation problem: in the application scenario most points in the graph data are unconnected, and the connectivity of nodes can be determined in advance. According to the graph data processing method of this specification, all candidate point sets are deduplicated once, which reduces repeated edge checks, and a correction operation is added to correct the estimated number of sub-graphs and further improve accuracy, thereby solving the redundant computation problem. Meanwhile, because the data graph in the application scenario, i.e., the graph data to be processed, is sparse and has locality, the method of this specification stores the connection relations between nodes and node blocks in advance; if a node is determined to be unconnected with a node block, it is unconnected with every node in that block. This reduces the number of node pairs that need to be processed, solves the invalid computation problem, and further improves the efficiency of obtaining sub-graphs.
Referring to fig. 2, fig. 2 shows a flowchart of a graph data processing method according to an embodiment of the present disclosure, which specifically includes the following steps.
Step 202: Acquire the graph data to be processed and the target graph mode.
The to-be-processed graph data refers to a graph formed by nodes and relations among the nodes, for example, the to-be-processed graph data can be a commodity relation graph representing relations among commodities in shopping software and the like; the target graph mode refers to a graph mode of a sub graph to be mined in the graph data to be processed, for example, the target graph mode is a 3-star mode, a 4-path mode, or the like.
Specifically, a terminal or platform capable of performing sub-graph mining on graph data receives a graph data processing request; parses the received request to obtain a graph data identifier and a graph mode identifier; acquires the graph data to be processed based on the graph data identifier; and determines the target graph mode required by the graph data processing task based on the graph mode identifier.
In one embodiment of the present disclosure, a graph data processing platform receives a graph data processing request and parses it; the graph data to be processed is obtained as a personnel relationship graph, and the target graph mode is obtained as the 3-star mode.
By receiving the graph data processing request and obtaining the graph data to be processed and the target graph mode from it, sub-graphs can subsequently be mined from the graph data to be processed based on the target graph mode.
Step 204: Screen a candidate node set in the graph data to be processed based on the designated initial node and the target graph mode, and count the recurrence frequency of each node in the candidate node set.
The designated initial node refers to any one of the nodes of the graph data to be processed; for example, if the personnel relationship graph contains node 1, node 2 and node 3, node 1 can be used as the designated initial node. The candidate node set refers to the nodes that are determined in the graph data to be processed and can be used to construct sub-graphs of the graph data to be processed. The recurrence frequency refers to the number of times a node appears in the candidate node sets. For example, when node A and node B are used as sub-graph nodes, node C may be used as a sub-graph node, and when node A and node G are used as sub-graph nodes, node C may still be used as a sub-graph node; both ways of constructing a sub-graph are recorded in the candidate node sets, so the current recurrence frequency of node C is 2.
Specifically, a designated initial node is determined among the nodes of the graph data to be processed; nodes that can be used to form sub-graphs of the target graph mode are screened from the graph data to be processed based on the designated initial node and the target graph mode corresponding to the graph data processing task; the screened nodes form a candidate node set; and the nodes contained in the candidate node set are determined and the recurrence frequency of each node in the candidate node set is counted.
In a specific embodiment of the present disclosure, node 3 is determined as the designated initial node in the personnel relationship graph; based on the designated initial node and the target graph mode 3-star, one or more nodes that can form the target graph mode 3-star together with node 3 are screened from the graph data to be processed, i.e., the personnel relationship graph, for example node 4 and node 5; node 4 and node 5 form a candidate node set; and the recurrence frequency of node 4 and the recurrence frequency of node 5 in the candidate node set are counted.
Selecting a candidate node set from the to-be-processed graph data through the designated initial node and the target graph mode, so that the node irrelevant to the designated initial node in the to-be-processed graph data is eliminated, subsequent calculation of irrelevant nodes is avoided, and calculation resources are saved; and counting the recurrence frequency of each node in the candidate node set so as to determine the subgraph of the data of the graph to be processed based on the recurrence frequency.
In practical application, the method for screening the candidate node set in the to-be-processed graph data based on the designated initial node and the target graph mode may include:
determining an initial candidate node set connected with the designated initial node in the graph data to be processed according to the designated initial node and the target graph mode;
and screening the candidate node set in the graph data to be processed based on the initial candidate node set and the target graph mode.
The initial candidate node set refers to the set of nodes that have a connection relation with the designated initial node.
In practical applications, every target graph mode includes nodes that have connection relations; for example, a target graph mode includes three nodes v1, v2 and v3, where v1 is connected with v2 and v1 is connected with v3. Therefore, once the designated initial node is determined, the nodes connected with it can be determined from the node connection relations in the graph data to be processed, forming the initial candidate node set; based on the initial candidate node set and the target graph mode, it is then determined which nodes of the graph data to be processed can serve as the next node in the target graph mode, and the screened nodes form the candidate node set.
Specifically, according to the designated initial node and the target graph mode, the nodes that have a connection relation with the designated initial node are screened from the graph data to be processed to form the initial candidate node set; once the initial candidate nodes are obtained, the candidate node set is screened from the graph data to be processed based on the initial candidate nodes and the target graph mode. That is, once the designated initial node and a target node among the initial candidate nodes are determined, it is determined, based on the target graph mode, which nodes can serve as the next node of a sub-graph that contains the connected designated initial node and target node, and these nodes form the candidate node set.
In one embodiment of the present disclosure, the target graph mode is determined as follows: the graph comprises four nodes v0, v1, v2 and v3, where v0 is connected with each of v1, v2 and v3, and v1, v2 and v3 are not connected with each other. Node 1 in the graph data to be processed is determined as the designated initial node; it serves as v0 in the target graph mode, and its node attribute information is 1. Based on the designated initial node and the target graph mode, node 2, node 3 and node 4, which have a connection relation with node 1 in the graph data to be processed, are determined; each of them can serve as v1, and together they form the initial candidate node set. Based on the initial candidate node set and the target graph mode, the nodes that can serve as v2 and v3 are screened from the graph data to be processed to form the candidate node set.
Screening the initial candidate node set from the graph data to be processed, and then screening the candidate node set based on the initial candidate node set, reduces the amount of computation per node and improves the efficiency of node determination.
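The two screening stages described above can be illustrated with a minimal Python sketch. The adjacency map `graph` and the helper names `initial_candidates` and `next_candidates` are hypothetical and are not taken from the specification; the sketch also assumes a star-shaped target graph mode in which leaf nodes must not be connected to one another.

```python
# Hypothetical undirected adjacency map standing in for the graph data to be processed.
graph = {
    1: {2, 3, 4},
    2: {1},
    3: {1},
    4: {1, 5},
    5: {4},
}

def initial_candidates(graph, start):
    """Nodes connected to the designated initial node; each may serve as v1."""
    return sorted(graph.get(start, set()))

def next_candidates(graph, start, chosen):
    """Candidates for the next leaf of a star centred on `start`.

    Assumption: a leaf must be adjacent to the centre, must not already be
    chosen, and must not be adjacent to any leaf that is already chosen.
    """
    result = []
    for node in sorted(graph.get(start, set())):
        if node in chosen:
            continue
        if any(node in graph.get(leaf, set()) for leaf in chosen):
            continue
        result.append(node)
    return result

init = initial_candidates(graph, 1)
# One candidate node set per choice of v1.
candidate_sets = {v1: next_candidates(graph, 1, [v1]) for v1 in init}
```

With node 1 as the designated initial node, the initial candidate node set is node 2, node 3 and node 4, and one candidate node set is produced for each choice of v1.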
In practical applications, there are multiple candidate node sets;
the method for counting the recurrence frequency of each node in the candidate node sets may include:
for any node, counting the number of occurrences of the node in each candidate node set;
and determining the recurrence frequency of each node based on the number of occurrences of that node in each candidate node set.
The number of occurrences refers to how many times a node appears in a single candidate node set; the recurrence frequency refers to the total number of occurrences of the node across all candidate node sets.
Specifically, any one of the multiple candidate node sets is selected, and any node in it is selected; the total number of times the node appears across all candidate node sets is counted and taken as the recurrence frequency of the node; the recurrence frequency of every node in the candidate node sets is determined in the same way.
In one embodiment of the present disclosure, the initial candidate node set corresponding to v0 and v1 in the target graph mode is determined, and the candidate sets of v2 are determined based on node 3, node 4 and node 5 in the initial candidate node set and node 0, which corresponds to v0. Specifically: when v1 is node 3, the candidate node set corresponding to v2 is {node 1, node 2}; when v1 is node 4, the candidate node set corresponding to v2 is {node 1, node 2, node 3}; when v1 is node 5, the candidate node set corresponding to v2 is {node 1, node 2, node 3, node 4}. Once the three candidate node sets are determined, the nodes they contain are node 1, node 2, node 3 and node 4. Counting across the candidate node sets, the recurrence frequency of node 1 is 3, that of node 2 is 3, that of node 3 is 2, and that of node 4 is 1. It should be noted that the specific method for determining the candidate node sets is not described in detail here; the candidate nodes are obtained by screening based on the connection relationships in the graph data to be processed and the target graph mode.
By determining the recurrence frequency of each node in the candidate node sets, the nodes used to construct the subgraph can subsequently be determined based on the recurrence frequency.
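The counting in the embodiment above can be reproduced with a short Python sketch. The function name `recurrence_frequency` and the dictionary layout (candidate node sets keyed by the node chosen as v1) are illustrative assumptions, not part of the specification.

```python
from collections import Counter

# Candidate node sets for v2 from the embodiment, keyed by the node chosen as v1.
candidate_sets = {
    3: [1, 2],
    4: [1, 2, 3],
    5: [1, 2, 3, 4],
}

def recurrence_frequency(candidate_sets):
    """Sum each node's occurrences over all candidate node sets."""
    freq = Counter()
    for nodes in candidate_sets.values():
        # Counter.update adds one occurrence per listed node.
        freq.update(nodes)
    return dict(freq)

freq = recurrence_frequency(candidate_sets)
```

For the three candidate node sets of the embodiment, this yields a recurrence frequency of 3 for node 1 and node 2, 2 for node 3, and 1 for node 4.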
Step 206: screening target node pairs from a target node set, and counting the number of target node pairs associated with each node in the candidate node set, wherein the target node set is determined from the graph data to be processed based on the designated initial node and the target graph mode, and the target node pairs are determined based on the node connection relationships in the target graph mode.
The target node set refers to the set of nodes screened from the graph data to be processed based on the designated initial node and the target graph mode; a target node pair refers to a pair of nodes contained in the target node set; the number of target node pairs refers to the total number of node pairs that include a given node in the candidate node set.
Specifically, based on the designated initial node and the target graph mode, one or more nodes that can be used, together with the designated initial node, to construct a subgraph of the target graph mode are obtained from the graph data to be processed, and these nodes form the target node set; target node pairs are screened from the target node set based on the node connection relationships in the target graph mode; each node in the candidate node set is then determined, and the number of target node pairs associated with each node is counted.
In a specific embodiment of the present disclosure, the designated initial node is determined to be node 0 in the graph data to be processed; according to node 0 and the target graph mode, the nodes in the graph data to be processed that can be used to generate a subgraph of the target graph mode are determined to be node 1, node 2 and node 3, which form the target node set; the target node pairs {node 1, node 2} and {node 2, node 3} are screened from the target node set based on the connection relationships in the target graph mode; the node in the candidate node set is determined to be node 3, and the number of target node pairs associated with node 3 is determined to be 1.
By screening the target node pairs and determining the number of target node pairs associated with each node, subgraphs consistent with the target graph mode can subsequently be divided from the graph data to be processed based on these numbers.
In practical applications, the method for screening target node pairs from the target node set may include:
determining a target connection relationship among nodes in the target node set according to the target graph mode;
and determining a target node pair in the target node set according to the target connection relation and the to-be-processed graph data.
The target connection relationship refers to a connection relationship when nodes in the target node set are connected based on a target graph mode.
Specifically, according to the target graph mode, determining nodes which can be connected in the target node set and corresponding target connection relations; and determining a target node pair in the target node set based on the target connection relationship and the actual connection relationship in the graph data to be processed.
In a specific embodiment of the present disclosure, node 1, node 2 and node 3 are determined to be the nodes in the target node set; based on node 1, node 2 and node 3, the possible target node pairs are {node 1, node 2}, {node 1, node 3} and {node 2, node 3}.
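A minimal sketch of this screening step follows. It assumes one particular reading of the text: every pair of nodes in the target node set is a possible pair, and a pair is kept as a target node pair only if the two nodes are actually connected in the graph data to be processed. The adjacency map `graph` and the function name `screen_target_pairs` are hypothetical.

```python
from itertools import combinations

# Hypothetical undirected adjacency map: node 0 is the designated initial node.
graph = {
    0: {1, 2, 3},
    1: {0, 2},
    2: {0, 1, 3},
    3: {0, 2},
}

def screen_target_pairs(target_nodes, graph):
    """Keep the pairs of target-set nodes that are connected in the graph."""
    pairs = []
    for a, b in combinations(sorted(target_nodes), 2):
        # Assumption: the actual connection in the graph decides the pair.
        if b in graph.get(a, set()):
            pairs.append((a, b))
    return pairs

pairs = screen_target_pairs({1, 2, 3}, graph)
```

With this example graph, the possible pair {node 1, node 3} is dropped because the two nodes are not connected, leaving {node 1, node 2} and {node 2, node 3}.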
Further, the method for counting the number of target node pairs respectively associated with each node in the candidate node set may include:
determining a target node pair comprising nodes in the candidate node set;
ordering all nodes in the candidate node set based on a preset arrangement sequence to obtain a node sequence;
and counting the number of target node pairs corresponding to each node in the candidate node set according to the node sequence.
The preset arrangement sequence refers to an arrangement sequence of nodes in the candidate node set, for example, the nodes are ordered according to the node numbers; the node sequence is a sequence obtained by sequencing the nodes based on a preset arrangement sequence.
Specifically, each node contained in the candidate node set is determined, and the nodes are sorted based on the preset arrangement order to obtain a node sequence; nodes are then taken one by one from the node sequence, and the number of target node pairs corresponding to each node is counted. It should be noted that if a target node pair has already been counted as associated with one of its nodes, it cannot be counted again as a target node pair associated with its other node.
In one embodiment of the present disclosure, the nodes in the candidate node set are determined to be node 2, node 4 and node 3; the nodes are sorted by node number to obtain the node sequence node 4, node 3, node 2. The obtained target node pairs are {node 7, node 4}, {node 4, node 3}, {node 3, node 2} and {node 4, node 2}. Because the pair {node 7, node 4} contains node 7, which is not in the candidate node set, this pair is determined to be unrelated to the nodes in the candidate node set. The number of target node pairs for each node is then counted over the remaining pairs. First, node 4 is taken from the node sequence, and 2 target node pairs containing node 4 are counted, namely {node 4, node 3} and {node 4, node 2}. Next, node 3 is taken from the node sequence; the pairs containing node 3 are {node 4, node 3} and {node 3, node 2}, but {node 4, node 3} has already been counted for node 4, so the number of target node pairs associated with node 3 is 1. Finally, node 2 is taken from the node sequence; the pairs containing node 2 are {node 3, node 2} and {node 4, node 2}, but both have already been counted for node 3 and node 4, so the number of target node pairs for node 2 is 0.
Counting the number of target node pairs associated with the nodes in the candidate node set allows the subgraphs corresponding to the target graph mode to be subsequently divided from the graph data to be processed based on these numbers.
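The ordered counting in the embodiment above can be sketched in Python as follows. The function name `count_pairs_per_node` is hypothetical, and the preset arrangement order is assumed to be descending node number, which matches the node sequence node 4, node 3, node 2 of the embodiment.

```python
def count_pairs_per_node(candidate_nodes, target_pairs):
    """Count target node pairs per candidate node, using each pair only once."""
    candidates = set(candidate_nodes)
    # Discard pairs that contain a node outside the candidate node set.
    remaining = [frozenset(p) for p in target_pairs if set(p) <= candidates]
    counts = {}
    # Assumed preset order: descending node number.
    for node in sorted(candidates, reverse=True):
        counts[node] = sum(1 for p in remaining if node in p)
        # A pair counted for this node is not counted again for its other node.
        remaining = [p for p in remaining if node not in p]
    return counts

counts = count_pairs_per_node(
    [2, 4, 3],
    [(7, 4), (4, 3), (3, 2), (4, 2)],
)
```

The result assigns 2 pairs to node 4, 1 pair to node 3 and 0 pairs to node 2, as in the embodiment.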
Step 208: determining sub-graph data corresponding to the target graph mode in the graph data to be processed according to the recurrence frequency and the number.
The sub-graph data refers to graph data which can be used for forming a sub-graph conforming to a target graph mode in the graph data to be processed.
Specifically, according to the recurrence frequency and the number, the method for determining sub-graph data corresponding to the target graph mode in the graph data to be processed may include:
determining a sub-graph node set in the graph data to be processed according to the recurrence frequency and the number;
dividing sub-graph data in the graph data to be processed based on the sub-graph node set.
Wherein, the sub-graph node set refers to a set formed by nodes which can form a sub-graph.
Specifically, based on the recurrence frequency and the number, searching nodes which can form a subgraph in the graph data to be processed; generating a sub-graph node set corresponding to the sub-graph based on the searched nodes; and dividing subgraphs conforming to the target graph mode in the graph data to be processed according to the subgraph node set.
Further, after counting the number of target node pairs respectively related to each node in the candidate node set, the method further includes:
and calculating the number of the estimated subgraphs corresponding to the graph data to be processed according to the recurrence frequency and the number.
The estimated sub-graph number refers to the total number of sub-graphs contained in the to-be-processed graph data estimated according to the recurrence frequency and the number.
Specifically, the estimated number of subgraphs corresponding to a node can be calculated from the recurrence frequency and the number of target node pairs of that node in the candidate node set; the estimated numbers of subgraphs of all nodes are then summed to obtain the estimated number of subgraphs corresponding to the graph data to be processed.
In one embodiment of the present disclosure, the candidate node set is determined to include node 2 and node 3; the recurrence frequency corresponding to node 2 is determined to be 3 and the corresponding number of target node pairs to be 2, so the estimated number of subgraphs for node 2 is 2 × 3 = 6; similarly, the estimated number of subgraphs for node 3 is determined to be 4; the estimated number of subgraphs corresponding to the graph data to be processed is the sum of the estimate 4 for node 3 and the estimate 6 for node 2, that is, estimated number of subgraphs = 6 + 4 = 10.
Calculating the estimated number of subgraphs corresponding to the graph data to be processed from the recurrence frequency and the number provides an estimate that facilitates the computation of downstream tasks.
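The estimate can be written as a sum of per-node products, as in the following Python sketch. The function name `estimate_subgraph_count` is hypothetical. The values for node 2 come from the embodiment; the embodiment only states that the estimate for node 3 is 4, so the recurrence frequency 2 and pair count 2 used for node 3 here are assumed for illustration.

```python
def estimate_subgraph_count(freq, pair_counts):
    """Sum, over candidate nodes, recurrence frequency times pair count."""
    return sum(freq[node] * pair_counts.get(node, 0) for node in freq)

# Node 2: frequency 3, pair count 2 (from the embodiment).
# Node 3: frequency 2, pair count 2 (assumed values giving the stated estimate 4).
freq = {2: 3, 3: 2}
pair_counts = {2: 2, 3: 2}

estimate = estimate_subgraph_count(freq, pair_counts)
```

Under these inputs, the per-node estimates are 6 and 4, and the overall estimate is 10.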
Further, after calculating the number of estimated subgraphs corresponding to the graph data to be processed according to the recurrence frequency and the number, the method further includes:
determining a complement set corresponding to the candidate node set;
and correcting the number of the estimated subgraphs based on the connection relation between the nodes in the complement set and the nodes in the candidate node set.
The complement refers to a set formed by nodes in the target node set except for nodes in the candidate node set.
Specifically, the complement corresponding to each candidate node set is determined; it is then determined whether nodes in the complement and nodes in the candidate node set have connection relationships in the graph data to be processed that are inconsistent with the target graph mode. If so, the incorrect node connection cases are subtracted from the estimated number of subgraphs, thereby correcting the estimate and improving its accuracy.
In a specific embodiment of the present disclosure, the estimated number of subgraphs corresponding to the graph data to be processed is determined to be 10; the complement corresponding to the candidate node set is determined; node 3 in the candidate node set and node 4 in the complement are determined to have a connection relationship in the graph data to be processed, so they cannot generate a subgraph conforming to the target graph mode; 1 is therefore subtracted from the estimated number of subgraphs to obtain the corrected estimate of 9.
By correcting the number of the estimated subgraphs, the accuracy of the number of the estimated subgraphs is improved.
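The correction can be sketched as follows, under a simplifying assumption that is not stated in the specification: each connected pair consisting of one candidate node and one complement node invalidates exactly one estimated subgraph. The adjacency map `graph` and the function name `correct_estimate` are hypothetical.

```python
# Hypothetical undirected adjacency map; node 3 and node 4 are connected.
graph = {
    1: {2, 3},
    2: {1},
    3: {1, 4},
    4: {3},
}

def correct_estimate(estimate, candidate_nodes, target_nodes, graph):
    """Subtract one per connected (candidate node, complement node) pair."""
    # Complement: nodes of the target node set that are not candidates.
    complement = set(target_nodes) - set(candidate_nodes)
    invalid = sum(
        1
        for cand in candidate_nodes
        for other in complement
        if other in graph.get(cand, set())
    )
    return estimate - invalid

corrected = correct_estimate(10, [2, 3], [2, 3, 4], graph)
```

With the candidate node set {node 2, node 3}, the complement {node 4} and the connection between node 3 and node 4, the estimate of 10 is corrected to 9.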
In practical applications, useless searches exist, that is, the relationships among some nodes do not need to be computed, which reduces the efficiency of determining subgraphs. This specification therefore also provides a method that partitions nodes into blocks for subgraph mining, which specifically includes:
based on a blocking threshold, carrying out blocking processing on a target node sequence to obtain a plurality of node blocks, wherein the target node sequence is obtained by arranging nodes in the graph data to be processed by taking the designated initial node as a starting point;
determining an adjacent relation table corresponding to the appointed initial node according to the adjacent relation between the appointed initial node and each node block, wherein elements in the adjacent relation table represent whether the appointed initial node is adjacent to each node block or not;
screening candidate nodes from the candidate node set according to the adjacent relation table;
and determining sub-graph data corresponding to the target graph mode in the graph data to be processed based on the candidate nodes.
The blocking threshold refers to the threshold on the number of nodes used to partition the nodes in the graph data to be processed into blocks; for example, with a blocking threshold of 3, the nodes are partitioned in groups of 3. In practical applications, the blocking threshold can be set based on the graph data to be processed, and this specification does not specifically limit how it is determined. The target node sequence is the sequence obtained by arranging the nodes in the graph data to be processed with the designated initial node as the starting point; a node block refers to a group of nodes; the adjacent relation refers to the connection relationship between the designated initial node and the nodes in a node block; the adjacent relation table is a data table that records the relation between the designated initial node and each node block, and it contains one element per node block indicating whether the designated initial node is connected to that node block; candidate nodes refer to nodes that can be used to generate a subgraph of the target graph mode.
Specifically, a blocking threshold is determined, and the target node sequence is partitioned sequentially based on the blocking threshold to obtain a plurality of node blocks; the connection relationships between the designated initial node and the nodes in each node block are determined based on the graph data to be processed, thereby determining the adjacent relation between the designated initial node and each node block; the adjacent relations are filled into the adjacent relation table, and candidate nodes are screened from the candidate node set based on the table; sub-graph data of the target graph mode is then determined from the graph data to be processed based on the candidate nodes.
In a specific embodiment of the present disclosure, the blocking threshold is determined to be 3, and node 2, node 3, node 4 and node 5 in the target node sequence are partitioned based on the blocking threshold, that is, node 2, node 3 and node 4 are grouped into one node block, and node 5 is grouped into another node block; the connection relationships between the designated initial node 1 and the nodes in the target node sequence are determined based on the graph data to be processed, the element corresponding to each node block is determined, and the elements are added to the adjacent relation table; candidate nodes are screened from the candidate node set based on the adjacent relation table, and sub-graph data is determined from the graph data to be processed based on the screened candidate nodes.
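The blocking step can be sketched in Python as below, assuming that the target node sequence is cut into consecutive, non-overlapping blocks of at most `threshold` nodes. The function name `split_into_blocks` is hypothetical.

```python
def split_into_blocks(sequence, threshold):
    """Cut the target node sequence into consecutive blocks of size <= threshold."""
    if threshold <= 0:
        raise ValueError("blocking threshold must be positive")
    return [
        sequence[i:i + threshold]
        for i in range(0, len(sequence), threshold)
    ]

# Target node sequence from the embodiment, blocking threshold 3.
blocks = split_into_blocks([2, 3, 4, 5], 3)
```

Under this assumption, a sequence of four nodes and a blocking threshold of 3 gives one block of three nodes followed by one block holding the remaining node.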
Further, the method for determining the adjacent relation table corresponding to the designated initial node according to the adjacent relation between the designated initial node and each node block may include:
recording that the target node block is in a connection state in the adjacent relation table under the condition that the designated initial node is connected with any node in the target node block;
and under the condition that the designated initial node is not connected with each node in the target node block, recording that the target node block is in a non-connection state in the adjacent relation table.
The connection state means that a connection relationship exists between the designated initial node and at least one node in the node block; the non-connection state means that no connection relationship exists between the designated initial node and any node in the node block.
Specifically, determining whether a connection relationship exists between a designated initial node and a node in a node block according to the graph data to be processed; setting the element of the node block to a connection state in the case of a connection relationship; setting the node block to a non-connected state in the absence of a connection; for example, an element "0" may be set for the node block in the case where the non-connection state is determined, and an element "1" may be set for the node block in the case where the connection state is determined.
In a specific embodiment of the present specification, a designated initial node 1 and a node block containing node 2 and node 3 are determined; if it is determined from the graph data to be processed that a connection relationship exists between node 1 and node 2, the element of the node block is set to the connection state.
Continuing the above example, if it is determined from the graph data to be processed that neither node 2 nor node 3 has a connection relationship with the designated initial node 1, the element of the node block is set to the non-connection state.
By determining the adjacent relation between the designated initial node and each node block and filling the adjacent relation table accordingly, candidate nodes can subsequently be determined from the candidate node set based on the adjacent relation table corresponding to the designated initial node.
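A minimal sketch of building the adjacent relation table follows, using the elements "1" for the connection state and "0" for the non-connection state. The adjacency map `graph`, the second node block and the function name `build_adjacent_table` are hypothetical.

```python
# Hypothetical undirected adjacency map: node 1 is connected only to node 2.
graph = {
    1: {2},
    2: {1},
    3: set(),
    4: set(),
    5: set(),
}

def build_adjacent_table(start, blocks, graph):
    """One element per node block: 1 if `start` is connected to any node in it."""
    neighbours = graph.get(start, set())
    return [
        1 if any(node in neighbours for node in block) else 0
        for block in blocks
    ]

table = build_adjacent_table(1, [[2, 3], [4, 5]], graph)
```

Because node 1 is connected to node 2, the block containing node 2 and node 3 is recorded as connected, while the second block is recorded as not connected.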
Further, the method for screening candidate nodes from the candidate node set according to the adjacent relation table may include:
determining each target node block containing each node in the adjacent relation table based on each node in the candidate node set;
determining elements corresponding to each target node block recorded in the adjacent relation table;
and screening candidate nodes from the nodes according to the elements.
The target node block is a node block that contains a node in the candidate node set; for example, for node 1 in the candidate node set, the node block containing node 1 is determined to be the target node block. The element refers to the state of the target node block recorded in the adjacent relation table.
Specifically, determining any node in the candidate node set; determining a target node block containing the node in the node blocks corresponding to the designated initial node; determining the elements of the target node block recorded in the adjacent relation table; candidate nodes are selected among the nodes based on the element.
In one embodiment of the present disclosure, node 2 is determined from the candidate node set; the target node block containing node 2 is determined; if the element of the target node block is recorded in the adjacent relation table as the connection state, it is determined that a connection relationship exists between a node in the target node block and the designated initial node; candidate nodes are then further screened from the nodes based on the element.
In practical application, the method for screening candidate nodes in the nodes according to the elements can include:
determining the connection relation between each node in a target node block and the target node under the condition that an element corresponding to the target node block where the target node is located is in a connection state, wherein the target node is any node in the candidate node set;
and screening candidate nodes in the target node block based on the connection relation.
Specifically, the target node refers to any node in the candidate node set. It is determined whether the element recorded in the adjacent relation table for the target node block containing the target node is the connection state, and if so, it is further determined whether a connection relationship exists between the target node and the nodes in the target node block. If the target graph mode requires querying connected nodes, the candidate nodes can be determined from the connection relationships between the target node and the nodes in the target node block.
In one embodiment of the present disclosure, node 7 is determined from the candidate node set; if node 7 is determined to be in a target node block whose element is the connection state, the nodes connected to node 7 can be screened within the target node block by binary search and used as candidate nodes.
By locating the target node block corresponding to the target node and screening candidate nodes based on the element of that block, connection computation for every node is avoided, computing resources are saved, and the efficiency of determining sub-graph data is improved.
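The block-based screening with binary search can be sketched as follows. The sketch assumes that each node's neighbours are stored as a sorted list so that `bisect` can be used; the data layout, the example blocks and the function names `is_adjacent` and `screen_in_block` are hypothetical.

```python
from bisect import bisect_left

def is_adjacent(sorted_neighbours, node):
    """Binary search for `node` in a sorted neighbour list."""
    i = bisect_left(sorted_neighbours, node)
    return i < len(sorted_neighbours) and sorted_neighbours[i] == node

def screen_in_block(target, blocks, table, sorted_adj):
    """Return nodes in the target node's block that are connected to it.

    Blocks whose table element is 0 (non-connection state) are skipped
    without any per-node connection check.
    """
    for idx, block in enumerate(blocks):
        if target in block:
            if table[idx] == 0:
                return []
            neighbours = sorted_adj.get(target, [])
            return [n for n in block if n != target and is_adjacent(neighbours, n)]
    return []

# Hypothetical data: two node blocks, their table elements, sorted neighbour lists.
blocks = [[5, 6, 7], [8, 9]]
table = [1, 0]
sorted_adj = {7: [1, 5, 9], 8: [9]}

candidates_for_7 = screen_in_block(7, blocks, table, sorted_adj)
candidates_for_8 = screen_in_block(8, blocks, table, sorted_adj)
```

Node 7 lies in a block marked as connected, so its block is searched and node 5 is returned; node 8 lies in a block marked as not connected, so no per-node check is performed for it.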
In one embodiment of this specification, graph data to be processed and a target graph mode are acquired; a candidate node set is screened from the graph data to be processed based on the designated initial node and the target graph mode, and the recurrence frequency of each node in the candidate node set is counted; target node pairs are screened from a target node set, and the number of target node pairs associated with each node in the candidate node set is counted, wherein the target node set is determined from the graph data to be processed based on the designated initial node and the target graph mode, and the target node pairs are determined based on the node connection relationships in the target graph mode; and sub-graph data corresponding to the target graph mode is determined in the graph data to be processed according to the recurrence frequency and the number.
Acquiring the graph data to be processed and the target graph mode makes it possible to process the graph data based on the target graph mode. A candidate node set is screened based on the designated initial node and the target graph mode, the recurrence frequency of the nodes in the candidate node set is counted, target node pairs are screened from the target node set, the number of target node pairs corresponding to each node is counted, and sub-graph data is determined from the graph data to be processed based on the number of target node pairs and the recurrence frequency of the nodes. This avoids computing over every node, saves computing resources, and improves the processing efficiency of the graph data to be processed.
The graph data processing method provided in the present specification will be further described with reference to fig. 3 by taking an application of the graph data processing method in 3-star graph mode acquisition as an example. Fig. 3 is a flowchart of a processing procedure of a graph data processing method according to an embodiment of the present disclosure, which specifically includes the following steps.
Step 302: acquiring the graph data to be processed and the 3-star graph mode.
Step 304: determining, according to the designated initial node and the 3-star graph mode, an initial candidate node set connected to the designated initial node in the graph data to be processed.
Step 306: screening the candidate node set from the graph data to be processed based on the initial candidate node set and the 3-star graph mode.
Step 308: for any node, counting the number of occurrences of the node in each candidate node set.
Step 310: determining the recurrence frequency of each node based on the number of occurrences of that node in each candidate node set.
Step 312: determining the target connection relationship among the nodes in the target node set according to the 3-star graph mode.
Step 314: determining target node pairs in the target node set according to the target connection relationship and the graph data to be processed.
Step 316: counting the number of target node pairs associated with each node in the candidate node set.
Specifically, the target node pairs that include nodes in the candidate node set are determined; the nodes in the candidate node set are sorted based on a preset arrangement order to obtain a node sequence; and the number of target node pairs corresponding to each node in the candidate node set is counted according to the node sequence.
Step 318: determining a sub-graph node set in the graph data to be processed according to the recurrence frequency and the number.
Step 320: dividing sub-graph data from the graph data to be processed based on the sub-graph node set.
In the graph data processing method applied to 3-star graph mode acquisition, acquiring the graph data to be processed and the target graph mode makes it possible to process the graph data based on the target graph mode. A candidate node set is screened based on the designated initial node and the target graph mode, the recurrence frequency of the nodes in the candidate node set is counted, target node pairs are screened from the target node set, the number of target node pairs corresponding to each node is counted, and sub-graph data is determined from the graph data to be processed based on the number of target node pairs and the recurrence frequency of the nodes. This avoids computing over every node, saves computing resources, and improves the processing efficiency of the graph data to be processed.
The graph data processing method provided in the present specification will be further described with reference to fig. 4 by taking an application of the graph data processing method in the 4-path graph mode acquisition as an example. Fig. 4 is a flowchart of a processing procedure of another graph data processing method according to an embodiment of the present disclosure, which specifically includes the following steps.
Step 402: acquiring the graph data to be processed and the 4-path graph mode, and screening the candidate node set from the graph data to be processed based on the designated initial node and the 4-path graph mode.
Step 404: partitioning a target node sequence into blocks based on a blocking threshold to obtain a plurality of node blocks, wherein the target node sequence is obtained by arranging the nodes in the graph data to be processed with the designated initial node as the starting point.
Step 406: determining an adjacent relation table corresponding to the designated initial node according to the adjacent relation between the designated initial node and each node block, wherein the elements in the adjacent relation table indicate whether the designated initial node is adjacent to each node block.
Specifically, when the designated initial node is connected to any node in a target node block, the target node block is recorded in the adjacent relation table as being in the connection state; when the designated initial node is not connected to any node in the target node block, the target node block is recorded in the adjacent relation table as being in the non-connection state.
Step 408: determining, based on each node in the candidate node set, the target node block in the adjacent relation table that contains that node.
Step 410: determining the element recorded in the adjacent relation table for each target node block, and screening candidate nodes from the nodes according to the elements.
Step 412: determining sub-graph data corresponding to the 4-path graph mode in the graph data to be processed based on the candidate nodes.
The target node sequence is partitioned to obtain a plurality of node blocks, the adjacent relation between the designated initial node and each node block is determined, and the adjacent relation table corresponding to the designated initial node is built from it. Candidate nodes are then screened using the adjacent relation table, so that the pre-stored relations among nodes avoid computation on invalid options and improve the efficiency of determining subgraphs.
Referring to fig. 5, fig. 5 shows a flowchart of a commodity graph data processing method for an electronic commerce according to an embodiment of the present disclosure, which specifically includes the following steps:
step 502: and acquiring commodity graph data, and determining a target graph mode according to a sub-graph acquisition task corresponding to the commodity graph data, wherein the commodity graph data is constructed based on the user attribute information and commodity attribute information among users.
Specifically, under the electronic market scene, the commodity graph can be constructed by taking the attribute information of the users as nodes and the commodity attribute information among the users as the relation among the nodes, wherein the commodity graph is composed of commodity graph data; under the condition that a recommending task for recommending goods to a user is received, namely, a sub-graph acquisition task of commodity graph data is received, analyzing the sub-graph acquisition task to obtain a target graph mode carried by the sub-graph acquisition task; the target graph schema is used for mining sub-graph data in the commodity graph data later.
Step 504: screening a candidate node set in the commodity graph data based on the designated initial node and the target graph mode, and counting the recurrence frequency of each node in the candidate node set.
Specifically, after the commodity graph data is determined, one node is arbitrarily selected from the commodity graph data as the designated initial node; in this embodiment, node 1, which contains the user attribute information of user A, is selected. Nodes that can be used to generate a subgraph of the target graph mode are screened from the commodity graph data according to the designated initial node, and a candidate node set is generated. The recurrence frequency of each node in the candidate node set, that is, the number of times the node occurs in the candidate node set, is then counted. For example, node C can serve as a sub-graph node when node A and node B are sub-graph nodes, and node C can still serve as a sub-graph node when node A and node G are sub-graph nodes; both sub-graph construction modes are recorded in the candidate node set, so the current recurrence frequency of node C is 2.
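The node C example can be reproduced with a short Python sketch. It assumes that the candidate node set is recorded as a mapping from each partial sub-graph (the nodes already chosen) to the nodes that may extend it; this representation and the name `recurrence_frequency` are assumptions made for illustration. Node H is added only to show a node with a different frequency.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple


def recurrence_frequency(
    candidate_sets: Dict[Tuple[Hashable, ...], Iterable[Hashable]]
) -> Counter:
    """Count in how many recorded sub-graph construction modes each node appears."""
    frequency: Counter = Counter()
    for candidates in candidate_sets.values():
        # A node is counted once per construction mode, even if listed twice.
        frequency.update(set(candidates))
    return frequency


candidate_sets = {
    ("A", "B"): ["C"],       # with A and B chosen, C can be a sub-graph node
    ("A", "G"): ["C", "H"],  # with A and G chosen, C (and H) can be sub-graph nodes
}
frequency = recurrence_frequency(candidate_sets)
print(frequency["C"])  # 2
print(frequency["H"])  # 1
```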
Step 506: screening target node pairs from a target node set, and counting the number of target node pairs respectively related to each node in the candidate node set, wherein the target node set is determined from the commodity graph data based on the designated initial node and the target graph mode, and the target node pairs are determined based on node connection relations in the target graph mode.
Specifically, nodes that can be used to generate subgraphs of the target graph mode are screened from the commodity graph data based on the designated initial node 1 and the target graph mode, and a target node set is constructed from the screened nodes. Target node pairs are then screened from the target node set, that is, each target node pair is formed from two nodes in the target node set that have a connection relation. Finally, the screened target node pairs are compared with the nodes in the candidate node set, and for each node that matches, the number of corresponding target node pairs is counted.
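Forming target node pairs from connected nodes of the target node set might look like the following Python sketch. It assumes an undirected graph stored as adjacency sets and reports each pair once with the smaller node first; the name `target_node_pairs` and the toy data are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def target_node_pairs(target_nodes: Set[int],
                      adjacency: Dict[int, Set[int]]) -> List[Tuple[int, int]]:
    """Return every pair of nodes in the target node set that share an edge."""
    pairs = []
    for u, v in combinations(sorted(target_nodes), 2):
        if v in adjacency.get(u, set()):
            pairs.append((u, v))
    return pairs


# Assumed toy graph; node 1 plays the role of the designated initial node.
adjacency = {
    1: {2, 3, 4},
    2: {1, 3},
    3: {1, 2, 4},
    4: {1, 3},
}
pairs = target_node_pairs({2, 3, 4}, adjacency)
print(pairs)  # [(2, 3), (3, 4)]
```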
Step 508: determining sub-graph data corresponding to the target graph mode in the commodity graph data according to the recurrence frequency and the number.
Specifically, the number of subgraphs in the commodity graph data that conform to the target graph mode is calculated according to the recurrence frequency and the number; the sub-graph data corresponding to each subgraph represents commodity content to be recommended to the user.
Step 510: sending the sub-graph data to a client for display.
Specifically, the screened sub-graph data is sent to the client, so that the client can generate, based on the sub-graph data, the commodity information displayed on the user's recommendation page.
According to the commodity graph data processing method, acquiring the commodity graph data and the target graph mode allows the commodity graph data to be processed based on the target graph mode. A candidate node set is screened based on the designated initial node and the target graph mode, and the recurrence frequency of the nodes in the candidate node set is counted; target node pairs are screened from the target node set, and the number of target node pairs corresponding to each node is counted; sub-graph data in the commodity graph data is then determined based on the number of target node pairs and the recurrence frequency of the nodes. This avoids performing computation for every node, saves computing resources, and improves the processing efficiency of the commodity graph data.
Corresponding to the method embodiments described above, the present disclosure further provides an embodiment of a graph data processing apparatus, and fig. 6 shows a schematic structural diagram of a graph data processing apparatus provided in one embodiment of the present disclosure. As shown in fig. 6, the apparatus includes:
an acquisition module 602 configured to acquire graph data to be processed and a target graph mode;
a statistics module 604 configured to screen a candidate node set from the graph data to be processed based on a designated initial node and the target graph mode, and to count a recurrence frequency of each node in the candidate node set;
a screening module 606 configured to screen target node pairs from a target node set, and to count the number of target node pairs respectively related to each node in the candidate node set, wherein the target node set is determined from the graph data to be processed based on the designated initial node and the target graph mode, and the target node pairs are determined based on node connection relations in the target graph mode;
a determining module 608 configured to determine sub-graph data corresponding to the target graph mode in the graph data to be processed according to the recurrence frequency and the number.
Optionally, the statistics module 604 is further configured to:
determining an initial candidate node set connected with the designated initial node in the graph data to be processed according to the designated initial node and the target graph mode;
and screening the candidate node set in the graph data to be processed based on the initial candidate node set and the target graph mode.
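The two-stage screening above can be illustrated with a Python sketch for a path-shaped target graph mode. This is one plausible reading, not the method of the specification: it assumes the initial candidate node set is the neighbour set of the designated initial node, and that each candidate node set holds the further neighbours of one initial candidate. The names `initial_candidates` and `expand_candidates` are hypothetical.

```python
from typing import Dict, Set


def initial_candidates(initial: int, adjacency: Dict[int, Set[int]]) -> Set[int]:
    """Initial candidate node set: nodes directly connected to the designated initial node."""
    return set(adjacency.get(initial, set()))


def expand_candidates(initial: int,
                      adjacency: Dict[int, Set[int]]) -> Dict[int, Set[int]]:
    """For each initial candidate, collect the nodes that could extend the path one step."""
    result = {}
    for middle in sorted(initial_candidates(initial, adjacency)):
        # Exclude nodes already on the path so that no node is used twice.
        result[middle] = adjacency.get(middle, set()) - {initial, middle}
    return result


adjacency = {1: {2, 3}, 2: {1, 4, 5}, 3: {1, 4}, 4: {2, 3}, 5: {2}}
candidate_sets = expand_candidates(1, adjacency)
print(candidate_sets)  # {2: {4, 5}, 3: {4}}
```

In this toy graph node 4 appears in two candidate node sets, so its recurrence frequency would be 2.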
Optionally, there are a plurality of candidate node sets; the statistics module 604 is further configured to:
for any node, counting the number of occurrences of the node in each candidate node set;
and determining the recurrence frequency of each node based on the number of occurrences of each node in each candidate node set.
Optionally, the screening module 606 is further configured to:
determining a target connection relationship among nodes in the target node set according to the target graph mode;
and determining a target node pair in the target node set according to the target connection relation and the to-be-processed graph data.
Optionally, the screening module 606 is further configured to:
determining a target node pair comprising nodes in the candidate node set;
ordering all nodes in the candidate node set based on a preset arrangement sequence to obtain a node sequence;
and counting the number of target node pairs corresponding to each node in the candidate node set according to the node sequence.
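A minimal Python sketch of the ordered counting follows. It assumes that the preset arrangement sequence is ascending node identifier and that a target node pair "corresponds to" a node when the pair contains that node; both are assumptions, as is the name `count_pairs_per_node`.

```python
from typing import Dict, Iterable, List, Set, Tuple


def count_pairs_per_node(
    candidate_nodes: Set[int],
    pairs: Iterable[Tuple[int, int]],
) -> Tuple[List[int], Dict[int, int]]:
    """Order the candidate nodes, then count the target node pairs containing each one."""
    node_sequence = sorted(candidate_nodes)          # assumed preset order: ascending id
    counts = {node: 0 for node in node_sequence}     # dict keeps the sequence order
    for u, v in pairs:
        if u in counts:
            counts[u] += 1
        if v in counts and v != u:
            counts[v] += 1
    return node_sequence, counts


node_sequence, counts = count_pairs_per_node({4, 2, 3}, [(2, 3), (3, 4), (3, 7)])
print(node_sequence)  # [2, 3, 4]
print(counts)         # {2: 1, 3: 3, 4: 1}
```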
Optionally, the determining module 608 is further configured to:
determining a sub-graph node set in the graph data to be processed according to the recurrence frequency and the number;
dividing sub-graph data in the graph data to be processed based on the sub-graph node set.
Optionally, the apparatus further comprises a computing module configured to:
and calculating the number of the estimated subgraphs corresponding to the graph data to be processed according to the recurrence frequency and the number.
Optionally, the apparatus further comprises a correction module configured to:
determining a complement set corresponding to the candidate node set;
and correcting the number of the estimated subgraphs based on the connection relation between the nodes in the complement set and the nodes in the candidate node set.
Optionally, the apparatus further comprises a blocking module configured to:
based on a blocking threshold, carrying out blocking processing on a target node sequence to obtain a plurality of node blocks, wherein the target node sequence is obtained by arranging nodes in the graph data to be processed by taking the designated initial node as a starting point;
determining an adjacent relation table corresponding to the designated initial node according to the adjacent relation between the designated initial node and each node block, wherein elements in the adjacent relation table represent whether the designated initial node is adjacent to each node block or not;
screening candidate nodes from the candidate node set according to the adjacent relation table;
and determining sub-graph data corresponding to the target graph mode in the graph data to be processed based on the candidate nodes.
Optionally, the blocking module is further configured to:
recording that the target node block is in a connection state in the adjacent relation table under the condition that the designated initial node is connected with any node in the target node block;
and under the condition that the designated initial node is not connected with any node in the target node block, recording that the target node block is in a non-connection state in the adjacent relation table.
Optionally, the blocking module is further configured to:
determining, based on each node in the candidate node set, the target node block in the adjacent relation table that contains the node;
determining elements corresponding to each target node block recorded in the adjacent relation table;
and screening candidate nodes from the nodes according to the elements.
Optionally, the blocking module is further configured to:
determining the connection relation between each node in a target node block and the target node under the condition that an element corresponding to the target node block where the target node is located is in a connection state, wherein the target node is any node in the candidate node set;
and screening candidate nodes in the target node block based on the connection relation.
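The screening performed by the blocking module can be sketched in Python as a two-level check. This is a hedged reading of the passage: a node whose block is recorded as non-connected is discarded without a further lookup, and a node whose block is recorded as connected is then checked individually against the designated initial node. The function name `screen_candidates`, the list-of-booleans table, and the toy data are assumptions.

```python
from typing import Dict, List, Set


def screen_candidates(initial: int,
                      candidates: List[int],
                      blocks: List[List[int]],
                      table: List[bool],
                      adjacency: Dict[int, Set[int]]) -> List[int]:
    """Keep the candidates that pass the block-level check and then the node-level check."""
    block_of = {node: idx for idx, block in enumerate(blocks) for node in block}
    neighbours = adjacency.get(initial, set())
    kept = []
    for node in candidates:
        idx = block_of.get(node)
        if idx is None or not table[idx]:
            continue  # block recorded as non-connected: skip without a node-level lookup
        if node in neighbours:  # block is connected, so confirm this particular node
            kept.append(node)
    return kept


blocks = [[2, 3], [4, 5], [6, 7], [8, 9]]
table = [True, False, False, True]   # adjacent relation table of initial node 1
adjacency = {1: {2, 3, 8}}
print(screen_candidates(1, [3, 4, 6, 9], blocks, table, adjacency))  # [3]
```

Node 9 shows why the second check is needed: its block is connected (through node 8), but node 9 itself is not adjacent to the initial node.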
The graph data processing apparatus of the present specification acquires graph data to be processed and a target graph mode; screens a candidate node set in the graph data to be processed based on the designated initial node and the target graph mode, and counts the recurrence frequency of each node in the candidate node set; screens target node pairs from a target node set, and counts the number of target node pairs respectively related to each node in the candidate node set, wherein the target node set is determined from the graph data to be processed based on the designated initial node and the target graph mode, and the target node pairs are determined based on node connection relations in the target graph mode; and determines sub-graph data corresponding to the target graph mode in the graph data to be processed according to the recurrence frequency and the number.
Acquiring the graph data to be processed and the target graph mode allows the graph data to be processed based on the target graph mode. A candidate node set is screened based on the designated initial node and the target graph mode, and the recurrence frequency of the nodes in the candidate node set is counted; target node pairs are screened from the target node set, and the number of target node pairs corresponding to each node is counted; sub-graph data in the graph data to be processed is then determined based on the number of target node pairs and the recurrence frequency of the nodes. This avoids performing computation for every node, saves computing resources, and improves the processing efficiency of the graph data to be processed.
The above is a schematic scheme of a graph data processing apparatus of the present embodiment. It should be noted that, the technical solution of the graph data processing apparatus and the technical solution of the graph data processing method belong to the same conception, and details of the technical solution of the graph data processing apparatus, which are not described in detail, can be referred to the description of the technical solution of the graph data processing method.
Fig. 7 illustrates a block diagram of a computing device 700 provided in accordance with one embodiment of the present description. The components of computing device 700 include, but are not limited to, memory 710 and processor 720. Processor 720 is coupled to memory 710 via bus 730, and database 750 is used to store data.
Computing device 700 also includes access device 740, which enables computing device 700 to communicate via one or more networks 760. Examples of such networks include the public switched telephone network (PSTN), a local area network (LAN), a wide area network (WAN), a personal area network (PAN), or a combination of communication networks such as the Internet. The access device 740 may include one or more of any type of wired or wireless network interface, for example a network interface card (NIC), an IEEE 802.11 wireless local area network (WLAN) wireless interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a universal serial bus (USB) interface, a cellular network interface, a Bluetooth interface, or a near field communication (NFC) interface.
In one embodiment of the present description, the above-described components of computing device 700, as well as other components not shown in FIG. 7, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device illustrated in FIG. 7 is for exemplary purposes only and is not intended to limit the scope of the present description. Those skilled in the art may add or replace other components as desired.
Computing device 700 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smart phone), wearable computing device (e.g., smart watch, smart glasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or personal computer (PC). Computing device 700 may also be a mobile or stationary server.
Wherein the processor 720 is configured to execute computer-executable instructions that, when executed by the processor, perform the steps of the graph data processing method described above.
The foregoing is a schematic illustration of a computing device of this embodiment. It should be noted that, the technical solution of the computing device and the technical solution of the graph data processing method belong to the same concept, and details of the technical solution of the computing device, which are not described in detail, can be referred to the description of the technical solution of the graph data processing method.
An embodiment of the present disclosure also provides a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the graph data processing method described above.
The above is an exemplary version of a computer-readable storage medium of the present embodiment. It should be noted that, the technical solution of the storage medium and the technical solution of the graph data processing method belong to the same conception, and details of the technical solution of the storage medium which are not described in detail can be referred to the description of the technical solution of the graph data processing method.
An embodiment of the present disclosure also provides a computer program, where the computer program, when executed in a computer, causes the computer to perform the steps of the graph data processing method described above.
The above is an exemplary version of a computer program of the present embodiment. It should be noted that, the technical solution of the computer program and the technical solution of the graph data processing method belong to the same conception, and details of the technical solution of the computer program, which are not described in detail, can be referred to the description of the technical solution of the graph data processing method.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The computer instructions include computer program code, which may be in source code form, object code form, an executable file, some intermediate form, and so on. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content covered by the computer readable medium can be increased or decreased as appropriate according to the requirements of patent practice; for example, in some jurisdictions, according to patent practice, the computer readable medium does not include electrical carrier signals and telecommunications signals.
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of combinations of actions, but it should be understood by those skilled in the art that the embodiments are not limited by the order of actions described, as some steps may be performed in other order or simultaneously according to the embodiments of the present disclosure. Further, those skilled in the art will appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily all required for the embodiments described in the specification.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
The preferred embodiments of the present specification disclosed above are merely used to help clarify the present specification. Alternative embodiments are not intended to be exhaustive or to limit the invention to the precise form disclosed. Obviously, many modifications and variations are possible in light of the teaching of the embodiments. The embodiments were chosen and described in order to best explain the principles of the embodiments and the practical application, to thereby enable others skilled in the art to best understand and utilize the invention. This specification is to be limited only by the claims and the full scope and equivalents thereof.

Claims (14)

1. A graph data processing method, comprising:
acquiring graph data to be processed and a target graph mode;
screening a candidate node set in the graph data to be processed based on a designated initial node and the target graph mode, and counting the recurrence frequency of each node in the candidate node set;
screening target node pairs from a target node set, and counting the number of target node pairs respectively related to each node in the candidate node set, wherein the target node set is determined from the graph data to be processed based on the designated initial node and the target graph mode, and the target node pairs are determined based on node connection relations in the target graph mode;
and determining sub-graph data corresponding to the target graph mode in the graph data to be processed according to the recurrence frequency and the number.
2. The method of claim 1, screening the candidate node set in the graph data to be processed based on the designated initial node and the target graph mode, comprising:
determining an initial candidate node set connected with the designated initial node in the graph data to be processed according to the designated initial node and the target graph mode;
and screening the candidate node set in the graph data to be processed based on the initial candidate node set and the target graph mode.
3. The method of claim 1, the number of candidate node sets being a plurality;
counting the recurrence frequency of each node in the candidate node set, including:
for any node, counting the number of occurrences of the node in each candidate node set;
and determining the recurrence frequency of each node based on the occurrence times of each node in each candidate node set.
4. The method of claim 1, screening target node pairs from a target node set, comprising:
determining a target connection relationship among nodes in the target node set according to the target graph mode;
and determining a target node pair in the target node set according to the target connection relation and the graph data to be processed.
5. The method of claim 1 or 4, counting the number of target node pairs respectively associated with each node in the candidate node set, comprising:
determining a target node pair comprising nodes in the candidate node set;
ordering all nodes in the candidate node set based on a preset arrangement sequence to obtain a node sequence;
and counting the number of target node pairs corresponding to each node in the candidate node set according to the node sequence.
6. The method of claim 1, determining sub-graph data corresponding to the target graph mode in the pending graph data according to the recurrence frequency and the number, comprising:
determining a sub-graph node set in the graph data to be processed according to the recurrence frequency and the number;
dividing sub-graph data in the graph data to be processed based on the sub-graph node set.
7. The method of claim 1, further comprising, after counting the number of target node pairs respectively associated with each node in the candidate node set:
and calculating the number of the estimated subgraphs corresponding to the graph data to be processed according to the recurrence frequency and the number.
8. The method of claim 7, further comprising, after calculating the number of pre-estimated subgraphs corresponding to the graph data to be processed according to the recurrence frequency and the number:
determining a complement set corresponding to the candidate node set;
and correcting the number of the estimated subgraphs based on the connection relation between the nodes in the complement set and the nodes in the candidate node set.
9. The method of claim 1, further comprising:
based on a blocking threshold, carrying out blocking processing on a target node sequence to obtain a plurality of node blocks, wherein the target node sequence is obtained by arranging nodes in the graph data to be processed by taking the designated initial node as a starting point;
determining an adjacent relation table corresponding to the designated initial node according to the adjacent relation between the designated initial node and each node block, wherein elements in the adjacent relation table represent whether the designated initial node is adjacent to each node block or not;
screening candidate nodes from the candidate node set according to the adjacent relation table;
and determining sub-graph data corresponding to the target graph mode in the graph data to be processed based on the candidate nodes.
10. The method of claim 9, determining a neighbor relation table corresponding to the designated initial node according to the neighbor relation between the designated initial node and each node block, comprising:
recording that the target node block is in a connection state in the adjacent relation table under the condition that the designated initial node is connected with any node in the target node block;
and under the condition that the designated initial node is not connected with any node in the target node block, recording that the target node block is in a non-connection state in the adjacent relation table.
11. The method of claim 9, filtering candidate nodes from the set of candidate nodes according to the neighbor relation table, comprising:
determining, based on each node in the candidate node set, the target node block in the adjacent relation table that contains the node;
determining elements corresponding to each target node block recorded in the adjacent relation table;
and screening candidate nodes from the nodes according to the elements.
12. A commodity graph data processing method for electronic commerce comprises the following steps:
acquiring commodity graph data, and determining a target graph mode according to a sub-graph acquisition task corresponding to the commodity graph data, wherein the commodity graph data is constructed based on user attribute information and commodity attribute information among users;
screening a candidate node set in the commodity graph data based on a designated initial node and the target graph mode, and counting the recurrence frequency of each node in the candidate node set;
screening target node pairs from a target node set, and counting the number of target node pairs respectively related to each node in the candidate node set, wherein the target node set is determined from the commodity graph data based on the designated initial node and the target graph mode, and the target node pairs are determined based on node connection relations in the target graph mode;
determining sub-graph data corresponding to the target graph mode in the commodity graph data according to the recurrence frequency and the number;
and sending the sub-graph data to a client for display.
13. A computing device, comprising:
a memory and a processor;
the memory is configured to store computer executable instructions, the processor being configured to execute the computer executable instructions, which when executed by the processor, implement the steps of the method of any one of claims 1 to 12.
14. A computer readable storage medium storing computer executable instructions which when executed by a processor implement the steps of the method of any one of claims 1 to 12.
CN202310576057.5A 2023-05-17 2023-05-17 Graph data processing method Active CN116340559B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310576057.5A CN116340559B (en) 2023-05-17 2023-05-17 Graph data processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310576057.5A CN116340559B (en) 2023-05-17 2023-05-17 Graph data processing method

Publications (2)

Publication Number Publication Date
CN116340559A true CN116340559A (en) 2023-06-27
CN116340559B CN116340559B (en) 2023-10-20

Family

ID=86886145

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310576057.5A Active CN116340559B (en) 2023-05-17 2023-05-17 Graph data processing method

Country Status (1)

Country Link
CN (1) CN116340559B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006228162A (en) * 2005-02-21 2006-08-31 Advanced Telecommunication Research Institute International Information processor and program
CN108388642A (en) * 2018-02-27 2018-08-10 中南民族大学 A kind of subgraph query method, device and computer readable storage medium
CN109614521A (en) * 2018-11-09 2019-04-12 复旦大学 A kind of efficient secret protection subgraph inquiry processing method
CN111510454A (en) * 2020-04-15 2020-08-07 中国人民解放军国防科技大学 Pattern graph change-oriented continuous subgraph matching method, system and equipment
CN112767186A (en) * 2021-01-26 2021-05-07 东南大学 Social network link prediction method based on 7-subgraph topological structure
US20210319329A1 (en) * 2020-03-30 2021-10-14 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for generating knowledge graph, method for relation mining
CN113779085A (en) * 2021-07-26 2021-12-10 北京大学 Method and device for acquiring isomorphic subgraph, computer equipment and readable storage medium
CN114282073A (en) * 2022-03-02 2022-04-05 支付宝(杭州)信息技术有限公司 Data storage method and device and data reading method and device
US20220129766A1 (en) * 2018-12-24 2022-04-28 Parexel International, Llc Data storage and retrieval system including a knowledge graph employing multiple subgraphs and a linking layer including multiple linking nodes, and methods, apparatus and systems for constructing and using same
CN114490799A (en) * 2020-11-11 2022-05-13 电科云(北京)科技有限公司 Method and device for mining frequent subgraphs of single graph
CN114600097A (en) * 2020-08-27 2022-06-07 清华大学 Subgraph matching strategy determination method, subgraph matching method, subgraph counting method and computing device
US20220391365A1 (en) * 2021-06-08 2022-12-08 International Business Machines Corporation Duplicate determination in a graph

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
FENGCAI QIAO ETC.: "A Parallel Algorithm for Graph Transaction Based Frequent Subgraph Mining", 《 2020 IEEE FIFTH INTERNATIONAL CONFERENCE ON DATA SCIENCE IN CYBERSPACE (DSC)》, pages 351 - 355 *
ZHAO SUN ETC.: "Efficient Subgraph Matching on Billion Node Graphs", 《PROCEEDINGS OF THE VLDB ENDOWMENT 》, pages 788 - 799 *
刘古刘: "多约束图模式匹配模型及算法研究", 《中国博士学位论文全文数据库(基础科学辑)》, pages 002 - 3 *
郭聪敏: "图集的子图查询算法研究", 《中国优秀硕士学位论文全文数据库(信息科技辑)》, pages 138 - 159 *

Also Published As

Publication number Publication date
CN116340559B (en) 2023-10-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant