CN110019253B - Distributed graph data sequence sampling method and device - Google Patents

Publication number
CN110019253B
Authority
CN
China
Prior art keywords
sampling
edge
path
edges
path length
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910313368.6A
Other languages
Chinese (zh)
Other versions
CN110019253A (en)
Inventor
张熙
雷鸣涛
杨金翠
方滨兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN201910313368.6A
Publication of CN110019253A
Application granted
Publication of CN110019253B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries

Abstract

Embodiments of the invention provide a distributed graph data sequence sampling method and device applied to distributed computing nodes comprising two or more computing nodes. The method comprises: acquiring preset graph data, sampling times and a sampling path length; equally dividing the sampling times to obtain the respective sampling times of each computing node as sampling distribution times; determining, from the path set processed by each computing node and according to the sampling path length, a target path whose path length equals the sampling path length, wherein the target path is formed by as many edges of the graph data as the sampling path length, and each edge of the graph data comprises at least one element; and, for the target path of each computing node, extracting one element from the at least one element included in each edge forming the target path, based on a predetermined weight, to obtain a sampling element sequence.

Description

Distributed graph data sequence sampling method and device
Technical Field
The invention relates to the technical field of data analysis, in particular to a distributed graph data sequence sampling method and device.
Background
With the development of communication technology, large amounts of data are generated, and how to obtain valuable information from that data has become a concern. Graph data is one kind of such data. In the related art, a single computing node samples graph data sequences; although this achieves graph data sequence sampling, the whole sampling process is completed by that single computing node, so the sampling efficiency is low.
Disclosure of Invention
The embodiment of the invention aims to provide a distributed graph data sequence sampling method and a distributed graph data sequence sampling device, which are used for solving the technical problem that the whole sampling process is completed through a single computing node and the sampling efficiency is low in the prior art. The specific technical scheme is as follows:
in a first aspect, the present invention provides a distributed graph data sequence sampling method, which is applied to distributed computing nodes, where the distributed computing nodes include: two or more computing nodes, the method comprising:
acquiring preset graph data, sampling times and sampling path length;
the sampling times are equally divided to obtain the respective sampling times of each computing node as sampling distribution times;
determining a target path with a path length having the same value as the sampling path length from the path set processed by each computing node according to the sampling path length, wherein the target path is formed by edges of the graph data with the same number of edges as the sampling path length, and each edge of the graph data comprises at least one element;
for a target path of each computation node, extracting an element from at least one element included in each edge forming the target path respectively based on a predetermined weight to obtain a sampling element sequence, wherein the weight is used for indicating the proportion of the sampling element sequence in a full-scale sampling element sequence set, and the full-scale sampling element sequence in the full-scale sampling element sequence set is obtained by sampling the distributed computation nodes respectively according to the sampling distribution times.
Further, before determining, from the path set processed by each computing node and according to the sampling path length, a target path whose path length equals the sampling path length, the method further includes:
according to the number of the distributed computing nodes, partitioning the edge sets of the graph data to obtain a plurality of partitioned edge sets;
assigning a set of block edges to a compute node;
determining each path formed by each block edge set;
forming, from the paths formed by each block edge set, the path set processed by the corresponding computing node;
determining, from the path set processed by each computing node and according to the sampling path length, a target path whose path length equals the sampling path length comprises:
based on the path set processed by each computing node, taking a starting edge in the block edge set allocated to the computing node as the starting point and, along the direction in which further edges exist from that starting point in the block edge set, extending the search to find a number of other edges, besides the starting edge, equal to the sampling path length minus one;
and forming the target path from the edges found in the block edge set by extending along the starting edge, the number of which equals the sampling path length.
Further, the weight is determined by the following steps:
for each computing node, determining as the weight the product of the total number of full-scale sampling element sequences on all edges forming the target path of the computing node and the total number of target paths included in the computing node.
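The weight computation can be sketched as follows. This is only one possible reading of the step above: it assumes that the "total number of full-scale sampling element sequences" on a target path is the product of the element counts of its edges. The function name `sequence_weight` and the edge and element names are hypothetical.

```python
from math import prod


def sequence_weight(path, elements_by_edge, num_target_paths):
    """Weight for a target path on one computing node (assumed reading).

    Assumption: the number of element sequences on a path is the product
    of the number of elements carried by each of its edges.
    """
    sequences_on_path = prod(len(elements_by_edge[edge]) for edge in path)
    return sequences_on_path * num_target_paths


# Hypothetical data: edge e1 carries 2 elements, edge e3 carries 3.
elements_by_edge = {"e1": ["a", "b"], "e3": ["c", "d", "e"]}
weight = sequence_weight(["e1", "e3"], elements_by_edge, num_target_paths=4)
```

Under this reading the weight is the reciprocal of the probability of drawing one particular sequence when a path and then one element per edge are chosen uniformly.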
Further, the extracting, for the target path of each computation node, one element from at least one element included in each edge forming the target path based on a predetermined weight includes:
for each computing node, determining the reciprocal of the total number of target paths included in the computing node as the occurrence probability of a target path among all target paths of that computing node;
for the target path of each computing node, determining the element sampling probability of the edge at the k-th position in the target path based on the occurrence probability and the reciprocal of the total number of elements of the edge at the k-th position on the target path, where k takes each non-negative integer value in { k | 0 ≤ k ≤ L } and L is the sampling path length;
extracting an element from at least one element included in the edge at the k-th position according to the element sampling probability.
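The extraction steps above can be sketched as follows, as an illustration rather than the patented implementation. The sketch assumes a target path is chosen with probability equal to the reciprocal of the number of target paths, and that each edge's element is then chosen uniformly from that edge's elements. The names `sample_sequence` and `sequence_probability` and the sample data are hypothetical.

```python
import random


def sample_sequence(target_paths, elements_by_edge, rng):
    """Pick one target path uniformly, then one element per edge uniformly."""
    path = rng.choice(target_paths)  # occurrence probability 1 / len(target_paths)
    return path, [rng.choice(elements_by_edge[edge]) for edge in path]


def sequence_probability(path, target_paths, elements_by_edge):
    """Probability of one specific element sequence on `path` under this sketch."""
    probability = 1.0 / len(target_paths)
    for edge in path:
        probability *= 1.0 / len(elements_by_edge[edge])
    return probability


# Hypothetical data: two target paths of length 2 on one computing node.
elements_by_edge = {"e1": ["a", "b"], "e3": ["c", "d", "e"], "e4": ["f"]}
target_paths = [["e1", "e3"], ["e1", "e4"]]

rng = random.Random(0)  # seeded only to make the sketch repeatable
path, sequence = sample_sequence(target_paths, elements_by_edge, rng)
p = sequence_probability(["e1", "e3"], target_paths, elements_by_edge)
```

Each computing node would repeat `sample_sequence` as many times as its sampling distribution times.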
In a second aspect, the present invention provides a distributed graph data sequence sampling apparatus, which is applied to a distributed computing node, where the distributed computing node includes: two or more computing nodes, the apparatus comprising:
the first acquisition module is used for acquiring preset graph data, sampling times and sampling path length;
the first processing module is used for equally dividing the sampling times to obtain the respective sampling times of each computing node as sampling distribution times;
a second processing module, configured to determine, from a path set processed by each compute node, a target path having a path length that is the same as the sampling path length according to the sampling path length, where the target path is formed by edges of the graph data having the same number of edges as the sampling path length, and each edge of the graph data includes at least one element;
and a third processing module, configured to extract, for a target path of each computation node, one element from at least one element included in each edge forming the target path, respectively, based on a predetermined weight, to obtain a sample element sequence, where the weight is used to indicate a proportion of the sample element sequence in a full-scale sample element sequence set, and the full-scale sample element sequence in the full-scale sample element sequence set is obtained by sampling the distributed computation nodes individually according to the sampling allocation times.
Further, the apparatus further comprises:
a partitioning module, configured to partition the edge set of the graph data according to the number of the distributed computing nodes to obtain multiple block edge sets, before the target path whose path length equals the sampling path length is determined from the path set processed by each computing node according to the sampling path length;
an allocation module for allocating a block edge set to a compute node;
the fourth processing module is used for determining each path formed by the edges in each block edge set;
the composition module is used for forming the path set processed by each computing node from the paths formed by each block edge set;
the second processing module is configured to:
based on the path set processed by each computing node, taking a starting edge in the block edge set allocated to the computing node as the starting point and, along the direction in which further edges exist from that starting point in the block edge set, extending the search to find a number of other edges, besides the starting edge, equal to the sampling path length minus one;
and forming the target path from the edges found in the block edge set by extending along the starting edge, the number of which equals the sampling path length.
Further, the apparatus further comprises: a fifth processing module to:
for each computing node, determining as the weight the product of the total number of full-scale sampling element sequences on all edges forming the target path of the computing node and the total number of target paths included in the computing node.
Further, the third processing module is configured to:
for each computing node, determining the reciprocal of the total number of target paths included in the computing node as the occurrence probability of a target path among all target paths of that computing node;
for the target path of each computing node, determining the element sampling probability of the edge at the k-th position in the target path based on the occurrence probability and the reciprocal of the total number of elements of the edge at the k-th position on the target path, where k takes each non-negative integer value in { k | 0 ≤ k ≤ L } and L is the sampling path length;
extracting an element from at least one element included in the edge at the k-th position according to the element sampling probability.
The embodiment of the invention provides a distributed graph data sequence sampling method and a distributed graph data sequence sampling device, wherein preset graph data, sampling times and sampling path length are obtained; the sampling times are equally divided to obtain the respective sampling times of each computing node as sampling distribution times; determining a target path with the path length being the same as the sampling path length according to the sampling path length from the path set processed by each computing node, wherein the target path is formed by edges of graph data with the edge number being the same as the sampling path length, and each edge of the graph data comprises at least one element; for the target path of each computing node, respectively extracting an element from at least one element included in each edge forming the target path based on a predetermined weight to obtain a sampling element sequence, wherein the weight is used for indicating the proportion of the sampling element sequence in a full-scale sampling element sequence set, and the full-scale sampling element sequence in the full-scale sampling element sequence set is obtained by distributing the sampling times of the respective distributed computing nodes.
Because the sampling times are divided equally, every computing node receives the same number of samples, so the task of sampling element sequences is distributed uniformly across the distributed computing nodes and each computing node carries an equal share of the required sampling times. In addition, a target path whose path length equals the sampling path length is determined from the path set processed by each computing node according to the sampling path length; then, for the target path of each computing node, one element is extracted from the at least one element included in each edge of the target path based on the predetermined weight to obtain a sampling element sequence. Each computing node thus samples its own sampling element sequences, and together the nodes complete the sampling of the graph data sequence. Compared with the prior art, in which a single computing node completes the whole sampling process, this reduces the amount of data each computing node processes and improves the efficiency of obtaining the required sampled graph data.
Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the invention.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a distributed graph data sequence sampling method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a distributed computing node according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating application of graph data to a social network, according to an embodiment of the present invention;
FIG. 4 is a first diagram of a block edge set according to an embodiment of the present invention;
FIG. 5 is a second diagram of a block edge set according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating an exemplary application scenario according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a directed path according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a distributed graph data sequence sampling apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Aiming at the problem that the whole sampling process is completed through a single computing node in the prior art and the sampling efficiency is low, the embodiment of the invention provides a distributed graph data sequence sampling method and a distributed graph data sequence sampling device, which are used for acquiring preset graph data, sampling times and sampling path length; the sampling times are equally divided to obtain the respective sampling times of each computing node as sampling distribution times; determining a target path with the path length being the same as the sampling path length according to the sampling path length from the path set processed by each computing node, wherein the target path is formed by edges of graph data with the edge number being the same as the sampling path length, and each edge of the graph data comprises at least one element; and for the target path of each computing node, respectively extracting an element from at least one element included in each edge forming the target path based on a predetermined weight to obtain a sampling element sequence, wherein the weight is used for indicating the proportion of the sampling element sequence in a full-scale sampling element sequence set, and the full-scale sampling element sequence in the full-scale sampling element sequence set is obtained by sampling the distributed computing nodes respectively according to the sampling distribution times.
Because the sampling times are divided equally, every computing node receives the same number of samples, so the task of sampling element sequences is distributed uniformly across the distributed computing nodes and each computing node carries an equal share of the required sampling times. In addition, a target path whose path length equals the sampling path length is determined from the path set processed by each computing node according to the sampling path length; then, for the target path of each computing node, one element is extracted from the at least one element included in each edge of the target path based on the predetermined weight to obtain a sampling element sequence. Each computing node thus samples its own sampling element sequences, and together the nodes complete the sampling of the graph data sequence. Compared with the prior art, in which a single computing node completes the whole sampling process, this reduces the amount of data each computing node processes and improves the efficiency of obtaining the required sampled graph data.
First, a distributed graph data sequence sampling method provided by an embodiment of the present invention is described below.
The distributed graph data sequence sampling method provided by the embodiment of the invention applies to sampling sequences from graph data whose edges form a directed graph data structure, for example sampling social network friend relationship sequences, fault log sequences, or reference relationship sequences. Once the method is implemented, the sampled sequences can be used, respectively, to analyze the most frequently occurring fault log sequence or to mine reference relationships in a network public opinion system.
As shown in FIG. 1 and FIG. 2, the distributed graph data sequence sampling method provided in an embodiment of the present invention is applied to distributed computing nodes, which include two or more computing nodes 20; only one computing node is labeled in FIG. 2. A computing node in the embodiment of the present invention may be a terminal device, such as a desktop computer, or a server, which is not limited herein. The method may comprise the following steps:
Step 110, obtaining preset graph data, sampling times and sampling path length.
The preset graph data is the basis for implementing the embodiment of the present invention, and for convenience of the following description, the preset graph data is referred to as the graph data for short.
The graph data may be, but is not limited to, a directed graph data structure, and may include, but is not limited to, vertices, edges and elements. The connection between two vertices is called an edge; a vertex may have zero or more neighboring elements; and elements are carried on edges, that is, each edge of the graph data includes at least one element. For example, as shown in FIG. 3, the directed graph data structure may be a social network. The vertices may be individual users, such as user 1 through user 7. The edges make up directed paths, which are formed by the social network friend relationships between users and represent the direction in which data is forwarded between users. Specifically, the directed paths may include: a directed path of length 3 composed of user 1, user 3, user 6 and user 7; a directed path of length 2 composed of user 1, user 2 and user 5; a directed path of length 2 composed of user 1, user 2 and user 4; a directed path of length 1 composed of user 2 and user 5; and so on. The elements on an edge may be the message elements propagated by a user; if each user propagates multiple message elements, those message elements may reach multiple users through the social network friend relationships between users. User i propagates message elements, where i takes each positive integer from 1 to 7 in turn.
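As a rough illustration of this structure, the sketch below models directed edges that each carry at least one element, using the users named in the example. The `Edge` class, the `path_length` helper and the message identifiers are hypothetical and not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    elements: tuple  # message elements carried on this edge

    def __post_init__(self):
        # Each edge of the graph data includes at least one element.
        if not self.elements:
            raise ValueError("each edge must carry at least one element")


# Edges inferred from the directed paths listed in the example;
# the message identifiers are invented for illustration.
edges = [
    Edge("user1", "user3", ("msg_a", "msg_b")),
    Edge("user3", "user6", ("msg_a",)),
    Edge("user6", "user7", ("msg_a", "msg_c")),
    Edge("user1", "user2", ("msg_d",)),
    Edge("user2", "user5", ("msg_d", "msg_e")),
    Edge("user2", "user4", ("msg_d",)),
]


def path_length(path):
    """Path length is the number of edges; consecutive edges must connect."""
    for prev, nxt in zip(path, path[1:]):
        if prev.target != nxt.source:
            raise ValueError("edges do not form a directed path")
    return len(path)


length = path_length(edges[0:3])  # user1 -> user3 -> user6 -> user7
```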
In order to limit the number of times that a sequence formed by specified elements in the graph data can be sampled, the number of times of sampling may be obtained in this step 110. The sampling times can be set according to the requirements of users. The sequence formed by the designated elements refers to a sampling element sequence obtained by extracting one element from at least one element included in each edge in the target path based on a predetermined weight for the target path of each computing node in the embodiment of the present invention. This allows sampling the desired number of sample element sequences in terms of sample times.
Since the sampling element sequence is not randomly obtained, the sampling path length can be obtained in step 110, where the sampling path length is used to determine the length of the path to be sampled of the sampling element sequence, so that the paths to be sampled are equal in length, and the obtained sampling element sequences are also equal in length. In the embodiment of the present invention, a path to be sampled for determining the sequence of sampling elements is referred to as a target path.
Step 120, equally dividing the sampling times to obtain the respective sampling times of each computing node as sampling distribution times.
In step 120, the sampling times may be equally divided as follows: first, obtaining the number m of computing nodes and the sampling times X; second, dividing the sampling times X by the number m of computing nodes to obtain X/m sampling times for each computing node; third, taking each computing node's X/m sampling times as its sampling distribution times. In this way, every computing node carries the same share of the required sampling times, so the workload is distributed fairly across the computing nodes.
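The equal division can be sketched as below. The text only states X/m; how a remainder is handled when X is not divisible by m is not specified, so spreading the remainder over the first nodes is an assumption of this sketch. The function name `allocate_sampling_times` is hypothetical.

```python
def allocate_sampling_times(total_samples, num_nodes):
    """Split X sampling times over m computing nodes as evenly as possible."""
    if num_nodes < 2:
        raise ValueError("the method assumes two or more computing nodes")
    base, remainder = divmod(total_samples, num_nodes)
    # Assumption: any remainder is handed out one sample at a time
    # to the first `remainder` nodes.
    return [base + (1 if i < remainder else 0) for i in range(num_nodes)]


even_split = allocate_sampling_times(100, 4)
uneven_split = allocate_sampling_times(10, 3)
```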
Step 130, determining a target path with the path length being the same as the sampling path length according to the sampling path length from the path set processed by each computing node, where the target path is formed by edges of the graph data with the edge number being the same as the sampling path length, and each edge of the graph data includes at least one element.
The set of paths that each compute node processes may refer to the set of paths that each compute node is assigned to be able to process in the graph data. Before this step 130, the method may obtain the path set by adopting the following steps:
Step 1, partitioning the edge set of the graph data according to the number of the distributed computing nodes to obtain a plurality of block edge sets;
Step 2, assigning one block edge set to each computing node;
Step 3, determining each path formed by each block edge set, where a path may be a directed path or an undirected path;
Step 4, forming the path set processed by each computing node from the paths formed by each block edge set.
It should be noted that, for the graph data, the set of all edges existing in the graph data is referred to as the original edge set. The original edge set contains at least one edge, and at least one directed path can be formed from at least one edge. Specifically, not every pair of edges in the original edge set is connected by a directed path: as shown in FIG. 6, there is a directed path between edge e1 and edge e3, whereas there is no directed path between edge e2 and edge e3.
Generally, the target path of each computing node can be determined according to the path set processed by each computing node. To avoid omissions and errors, however, the target path whose path length equals the sampling path length may be determined by the following steps:
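The partitioning and assignment steps can be sketched as follows. The text does not say how edges are assigned to blocks, so the round-robin scheme here is an assumption chosen only for illustration; it does not reproduce the block edge sets shown in the patent's figures. The function name `partition_edges` is hypothetical.

```python
def partition_edges(edges, num_nodes):
    """Split the edge set into one block edge set per computing node."""
    blocks = [[] for _ in range(num_nodes)]
    for index, edge in enumerate(edges):
        # Assumption: round-robin assignment; the partitioning rule
        # is not specified in the text.
        blocks[index % num_nodes].append(edge)
    return blocks


edge_set = ["e1", "e2", "e3", "e4", "e5", "e6"]
block_edge_sets = partition_edges(edge_set, num_nodes=2)
# block_edge_sets[i] is the block edge set assigned to computing node i.
```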
Step 5, based on the path set processed by each computing node, taking a starting edge in the block edge set allocated to the computing node as the starting point and, along the direction in which further edges exist from that starting point in the block edge set, extending the search to find a number of other edges, besides the starting edge, equal to the sampling path length minus one. The starting edge may be the edge on which the leftmost vertex of the block edge set lies, or the edge on which the initial vertex of the directed graph data lies. There may be one starting edge or two or more, depending on the actual situation.
To avoid missing any of the other edges during the extended search, step 5 may specifically include: comparing the value of the sampling path length with the maximum path length in the path set processed by each computing node and, according to the comparison result and based on the path set processed by each computing node, taking a starting edge in the block edge set allocated to the computing node as the starting point and, along the direction in which further edges exist from that starting point in the block edge set, extending the search to find a number of other edges, besides the starting edge, equal to the sampling path length minus one. As above, the starting edge may be the edge on which the leftmost vertex of the block edge set lies. The further implementation process is as follows:
Step 5 may be implemented in several ways, each of which, based on the path set processed by each computing node, takes a starting edge in the block edge set allocated to the computing node as the starting point and extends the search along the direction in which further edges exist, to find a number of other edges, besides the starting edge, equal to the sampling path length minus one:
in one implementation, when the comparison shows that the value of the sampling path length is smaller than the maximum path length in the path set processed by each computing node, then, based on the path set processed by each computing node and taking a starting edge in the block edge set allocated to the computing node as the starting point, determining whether the block edge set contains an edge in the direction in which further edges exist from the starting point;
if the block edge set contains such an edge, extending the search to it and taking it, as an edge other than the starting edge, as the current other edge; if the block edge set contains no such edge, stopping the extended search in that direction.
Judging whether the total number of other edges found by the extended search has reached the value of the sampling path length minus one; if not, updating the current other edge to be the starting point and returning to the step of determining whether the block edge set contains an edge in the direction in which further edges exist from the starting point, until the extended search has found a number of other edges, besides the starting edge, equal to the sampling path length minus one. In this way, all other edges can be found within the block edge set, avoiding omissions. Taking a sampling path length of 2 and 2 computing nodes as an example, as shown to the left of the arrows in FIG. 4 and FIG. 5, the first block edge set is E1 = {e1, e5, e6} and the second block edge set is E2 = {e2, e3, e4}, where Ei denotes the block edge set stored by the i-th of the m computing nodes, m denotes the total number of computing nodes, and 1 ≤ i ≤ m. The "first" in the first block edge set and the "second" in the second block edge set are only used to distinguish the two block edge sets and do not imply any order.
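The extended search within one block edge set can be sketched as a depth-first extension from each starting edge. This is an illustrative reading, not the patented procedure: the function name `find_target_paths` is hypothetical, and the vertex names attached to e1, e5 and e6 are invented because the topology in FIG. 4 is not reproduced in the text.

```python
def find_target_paths(block_edges, start_edges, path_length):
    """Extend each starting edge until the path holds `path_length` edges.

    `block_edges` maps an edge name to its (source, target) vertices.
    """
    by_source = {}
    for name, (source, _target) in block_edges.items():
        by_source.setdefault(source, []).append(name)

    target_paths = []

    def extend(path):
        if len(path) == path_length:
            target_paths.append(list(path))
            return
        _source, tail = block_edges[path[-1]]
        # Continue only in directions where a further edge exists.
        for next_edge in by_source.get(tail, []):
            if next_edge not in path:
                path.append(next_edge)
                extend(path)
                path.pop()

    for start in start_edges:
        extend([start])
    return target_paths


# Hypothetical topology for the first block edge set E1 = {e1, e5, e6}.
block_e1 = {"e1": ("v1", "v2"), "e5": ("v2", "v3"), "e6": ("v2", "v4")}
paths_len2 = find_target_paths(block_e1, start_edges=["e1"], path_length=2)
paths_len3 = find_target_paths(block_e1, start_edges=["e1"], path_length=3)
```

With this topology no path of length 3 exists inside the block, which corresponds to the case where the sampling path length exceeds the maximum path length available in the block edge set.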
Because the number of edges in the blocking edge set changes, when the value of the sampling path length is smaller than the maximum path length in the path set processed by each computing node, the above implementation may be adopted to extend and find the value of the sampling path length minus one edge other than the starting edge, but, if the value of the sampling path length is larger than the maximum path length in the path set processed by each computing node, in order to avoid that the edge set of the graph data is blocked into multiple blocking edge sets and the number of edges is truncated or omitted, in the embodiment of the present invention, in the above step 5, another implementation may be adopted to extend and find the value of the sampling path length minus one edge other than the starting edge:
when comparing the value of the sampling path length with the maximum path length in the path set processed by each computing node shows that the sampling path length is greater, then, based on the path set processed by each computing node and taking the starting edge in the block edge set allocated to the computing node as the starting point, the edge identical to the starting edge is located in the edge set of the graph data, and it is determined whether, from that edge, an edge exists in the direction of the remaining edges in the edge set of the graph data;
if it is determined that such an edge exists, one other edge besides the starting edge is found by expanded search and taken as the current other edge; if it is determined that no such edge exists, the outward search in that direction is stopped.
If the total number of other edges found by the expanded search is judged to be less than the value of the sampling path length minus one, the current other edge is updated to be the starting point, and the process returns to the step of determining whether, from the edge identical to the starting edge in the edge set of the graph data, an edge exists in the direction of the remaining edges, until the expanded search has found (sampling path length minus one) edges other than the starting edge. In this way, all the other edges can be found from the edge set of the graph data without omission. Taking a sampling path length of 2 and 2 computing nodes as an example, as shown on the right of the arrow in FIG. 4 and FIG. 5, a block edge set after expansion may be called an expanded block edge set. The first expanded block edge set is
Figure BDA0002032262000000111
and the second expanded block edge set is
Figure BDA0002032262000000112
Here the expanded block edge set denotes the block edge set obtained by expanding the edge set stored by each of the m computing nodes, m denotes the total number of computing nodes, and 1 ≤ i ≤ m;
Figure BDA0002032262000000113
contains E_i, and
Figure BDA0002032262000000114
is the set of edges that contains E_i and contains the edges forming the target path. The "first" in the first expanded block edge set and the "second" in the second expanded block edge set only distinguish the two expanded block edge sets and do not imply an order.
The method of the embodiment of the present invention further includes determining, from the expanded block edge set
Figure BDA0002032262000000115
the edge at each position in the target path, specifically as follows:
the sampling probability of each of the (n-k) edges being at the k-th position of the target path is determined, where n is the total number of edges in the expanded block edge set
Figure BDA0002032262000000116
and the (n-k) edges are the edges of the original edge set excluding the edges already determined for the first k positions, with k taking each non-negative integer value in {k | 0 ≤ k ≤ L}.
In the embodiment of the present invention, the sampling probability of each of the (n-k) edges being at the k-th position of the target path may be determined by the following formula:
Figure BDA0002032262000000117
where
Figure BDA0002032262000000118
is the expanded block edge set. E_(n-k) is the set of (n-k) edges and does not include the edges already determined for the first k positions. D_L(e_q) is the number of edges in
Figure BDA0002032262000000119
at distance L from edge e_q, where e_q ranges over every edge in
Figure BDA00020322620000001110
D_(L-k+1)(e_(k-1)) is the number of edges in
Figure BDA00020322620000001111
at distance (L-k+1) from the edge at the (k-1)-th position, and D_(L-k)(e_j) is the number of edges in
Figure BDA00020322620000001112
at distance (L-k) from edge e_j, where e_j ranges over every edge in E_(n-k).
It should be noted that the number of edges in
Figure BDA00020322620000001113
at distance L from edge e_q means, specifically, the number of edges in
Figure BDA00020322620000001114
that lie at distance L from edge e_q when e_q is taken as the starting edge. Taking FIG. 6 as an example, the edges at distance 1 from edge e3 are e1, e2 and e4; in the embodiment of the present invention, however, the edges at distance 1 from e3 do not include e1 and e2 and include only e4, so the number of edges at distance 1 from edge e3 is 1. This simplifies the calculation of the sampling probability.
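The forward-only counting rule can be illustrated with a minimal sketch. The edge-to-vertex layout below is an assumption inferred from the paths that this description lists for FIG. 6, and the names `EDGES` and `forward_count` are hypothetical, not part of the disclosed method.

```python
# Assumed layout of FIG. 6, inferred from the paths listed in this description.
# Each edge is a directed pair (tail vertex, head vertex).
EDGES = {
    "e1": ("v1", "v2"),
    "e2": ("v2", "v5"),
    "e3": ("v2", "v4"),
    "e4": ("v4", "v7"),
    "e5": ("v1", "v3"),
    "e6": ("v3", "v6"),
}

def forward_count(edges, start, distance):
    """Count the edges reached from `start` after `distance` forward steps."""
    frontier = {start}
    for _ in range(distance):
        # Follow the edge direction only: the next edge must begin at the
        # vertex where the current edge ends.
        frontier = {
            name
            for current in frontier
            for name, (tail, _head) in edges.items()
            if tail == edges[current][1]
        }
    return len(frontier)

# e1 and e2 touch e3 at v2 but do not follow it, so only e4 is counted.
print(forward_count(EDGES, "e3", 1))
```

Under this reading, e1 and e2 share vertex v2 with e3 but are not counted because they do not lie in the forward direction of e3.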
Step 6: the block edge set is searched along the starting edge, by expanded search, for all edges whose number equals the value of the sampling path length, and these edges form the target path. Because the starting edge and the other edges found along it together number exactly the value of the sampling path length, the path length of the resulting target path is the same as the sampling path length. In this way, all edges whose number equals the sampling path length can be found by expanded search and the target path determined.
Because the number of computing nodes is limited, it cannot be guaranteed that every edge has its own computing node. The number of edges may be equal to, greater than or less than the number of computing nodes, depending on the actual situation. The case in which the number of edges is greater than the number of computing nodes is taken as an example here and is not limiting. To aid understanding of steps 5 and 6 above, an exemplary description follows:
Step 1: the edge set of the graph data, as shown in FIG. 6, is partitioned, that is, divided into different block edge sets. Referring to FIG. 4 and FIG. 5, with 2 computing nodes as an example, partitioning the edge set of the graph data yields two block edge sets. The specific partitioning manner is not limited here, provided that the same edge is not placed in more than one block and a vertex is not placed in more than one block, so that the integrity of each block edge set is ensured. Different block edge sets can then be processed separately.
Step 2: after the two block edge sets are obtained, one block edge set is allocated to each computing node so that the computing node can process the block edge set allocated to it.
Step 3: each path formed by the edges in each block edge set is determined. For example, the first block edge set is {e1, e5, e6} and the second block edge set is {e2, e3, e4}. The paths of length 1 in the first block edge set are (v1, v2), (v1, v3) and (v3, v6); the paths of length 1 in the second block edge set are (v2, v4), (v4, v7) and (v2, v5); the path of length 2 in the first block edge set is (v1, v3, v6); and the path of length 2 in the second block edge set is (v2, v4, v7).
Step 4: the paths formed by each block edge set make up the path set processed by the corresponding computing node. For example, the paths formed by the first block edge set, namely (v1, v2), (v1, v3), (v3, v6) and (v1, v3, v6), make up the path set processed by the first computing node; the paths formed by the second block edge set, namely (v2, v4), (v4, v7), (v2, v5) and (v2, v4, v7), make up the path set processed by the second computing node. The "first" in the first computing node and the "second" in the second computing node only distinguish the two computing nodes and do not imply an order.
Step 5: based on the path set processed by each computing node, the starting edge in the block edge set allocated to the computing node is taken as the starting edge and, along the direction in which remaining edges exist from the starting edge in the block edge set, (sampling path length minus one) edges other than the starting edge are found by expanded search.
Step 6: the block edge set is searched along the starting edge, by expanded search, for all edges whose number equals the value of the sampling path length, and these edges form the target path.
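Steps 1 to 4 can be sketched as follows for the two-node example. This is a minimal illustration only: the edge-to-vertex layout is an assumption inferred from the paths listed above, and `EDGES`, `BLOCKS`, `paths_in_block`, `as_vertices` and `PATH_SETS` are hypothetical names.

```python
# Assumed layout of FIG. 6: each edge is a directed pair (tail, head).
EDGES = {
    "e1": ("v1", "v2"),
    "e2": ("v2", "v5"),
    "e3": ("v2", "v4"),
    "e4": ("v4", "v7"),
    "e5": ("v1", "v3"),
    "e6": ("v3", "v6"),
}

# Steps 1 and 2: the block edge sets and the node each one is allocated to.
BLOCKS = {"node1": ["e1", "e5", "e6"], "node2": ["e2", "e3", "e4"]}

def paths_in_block(edges, block):
    """Step 3: every directed path (tuple of edge names) inside one block."""
    paths = [(name,) for name in block]
    frontier = list(paths)
    while frontier:  # terminates because the example graph has no cycles
        frontier = [
            path + (name,)
            for path in frontier
            for name in block
            if edges[name][0] == edges[path[-1]][1]
        ]
        paths.extend(frontier)
    return paths

def as_vertices(edges, path):
    """Write an edge path as the vertex tuple used in the description."""
    return (edges[path[0]][0],) + tuple(edges[name][1] for name in path)

# Step 4: the path set processed by each computing node.
PATH_SETS = {
    node: [as_vertices(EDGES, p) for p in paths_in_block(EDGES, block)]
    for node, block in BLOCKS.items()
}

for node, paths in PATH_SETS.items():
    print(node, paths)
```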
A specific application example of step 5, based on the path set processed by the first computing node and referring to FIG. 4, is as follows. Taking a sampling path length of 2 as an example, the path set processed by the first computing node is used directly; it comprises (v1, v2), (v1, v3), (v3, v6) and (v1, v3, v6). With the starting edges e1 and e5 in the first block edge set allocated to the computing node as starting edges, the search proceeds in the direction in which remaining edges exist in the first block edge set, that is, toward the remaining edge e6 (rightward in the figure), to find the edges other than the starting edges e1 and e5. The number of such edges is the sampling path length 2 minus one, that is, one edge. Because no other edge follows the starting edge e1, only the other edge e6 among the remaining edges is determined.
In the specific application example of step 6, the first block edge set is searched along the starting edge e5 for all edges whose number equals the sampling path length 2, namely the starting edge e5 and the other edge e6, which form the target path.
A specific application example of step 5, based on the path set processed by the second computing node and referring to FIG. 5, is as follows. Taking a sampling path length of 2 as an example, the path set processed by the second computing node is used directly; it comprises (v2, v4), (v4, v7), (v2, v5) and (v2, v4, v7). With the starting edges e2 and e3 in the second block edge set allocated to the computing node as starting edges, the search proceeds in the direction in which remaining edges exist in the second block edge set, that is, toward the remaining edge e4 (rightward in the figure), to find the edges other than the starting edges e2 and e3. The number of such edges is the sampling path length 2 minus one, that is, one edge. Because no other edge follows the starting edge e2, only the other edge e4 among the remaining edges is determined.
In the specific application example of step 6, the second block edge set is searched along the starting edge e3 for all edges whose number equals the sampling path length 2, namely the starting edge e3 and the other edge e4, which form the target path.
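The expanded search of steps 5 and 6 can be sketched as below. This is one possible reading, not the definitive procedure: the edge layout is inferred from the paths listed for FIG. 6, and `EDGES` and `target_paths` are hypothetical names. Passing the block itself as `search_edges` corresponds to the first implementation; passing the whole edge set of the graph data corresponds to the second implementation described earlier.

```python
# Assumed layout of FIG. 6: each edge is a directed pair (tail, head).
EDGES = {
    "e1": ("v1", "v2"),
    "e2": ("v2", "v5"),
    "e3": ("v2", "v4"),
    "e4": ("v4", "v7"),
    "e5": ("v1", "v3"),
    "e6": ("v3", "v6"),
}

def target_paths(edges, start_edges, search_edges, length):
    """Paths of exactly `length` edges that begin at one of `start_edges`
    and are extended using edges taken from `search_edges`."""
    paths = [(name,) for name in start_edges]  # every block edge may start a path
    for _ in range(length - 1):                # find (length - 1) further edges
        paths = [
            path + (name,)
            for path in paths
            for name in search_edges
            if edges[name][0] == edges[path[-1]][1]  # follow the edge direction
        ]
    return paths

block1 = ["e1", "e5", "e6"]
block2 = ["e2", "e3", "e4"]

# First implementation: search only inside the block edge set.
print(target_paths(EDGES, block1, block1, 2))
print(target_paths(EDGES, block2, block2, 2))

# Second implementation: extend through the whole edge set of the graph data.
print(target_paths(EDGES, block1, list(EDGES), 2))
```

Starting edges from which no further edge can be reached, such as e1 inside the first block, simply drop out of the result.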
Step 140: for the target path of each computing node, one element is extracted, based on a predetermined weight, from the at least one element included in each edge forming the target path, to obtain a sampling element sequence. The weight indicates the proportion of the sampling element sequence in the set of full-scale sampling element sequences, and the full-scale sampling element sequences in that set are obtained by the distributed computing nodes each sampling according to its sampling allocation times.
When the embodiment of the present invention is applied to a scenario in which the original vertex set contains at least two vertices and at least one edge forms at least one directed path, a full-scale element sequence may consist of Q elements that come from Q edges of the original edge set, respectively, where the Q edges form a directed path of length Q. That is, the full-scale sampling element sequences are all the possible element sequences formed from the elements on the edges of each path, and the position of each element in a sequence is the same as the position in the path of the edge it comes from. Taking FIG. 6 as an example, edges e1 and e2 form a path of length 2. Assume that edge e1 includes the elements Y1 = (1, (a, b, c)), Y2 = (2, (a, d)) and Y3 = (3, (a, c)), and that edge e2 includes the elements Y11 = (11, (b, c)) and Y12 = (12, (a, c)). One full-scale element sequence is then (Y1, Y11), where Y1 comes from edge e1 and Y11 from edge e2, and the order of the elements in the sequence is the same as the order of the edges along the path. With a sampling path length of 2, the weight of this full-scale element sequence (Y1, Y11) is 6 = 3 × 2, where 3 is the number of elements on edge e1 and 2 is the number of elements on edge e2.
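The set of full-scale element sequences for a single path is the Cartesian product of the element lists on its edges. The short sketch below enumerates it for the path (e1, e2) of the example; the dictionary `ELEMENTS` and the variable names are illustrative assumptions, and the element payloads are reduced to their labels.

```python
from itertools import product

# Element labels on each edge, taken from the example above.
ELEMENTS = {
    "e1": ["Y1", "Y2", "Y3"],
    "e2": ["Y11", "Y12"],
}

path = ["e1", "e2"]  # the order of the edges along the directed path

# One element per edge, kept in the same order as the edges in the path.
sequences = list(product(*(ELEMENTS[edge] for edge in path)))

print(len(sequences))  # 3 * 2 sequences
print(sequences[0])    # the sequence (Y1, Y11) from the example
```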
To obtain the predetermined weight, the method of the embodiment of the present invention further includes determining the weight by the following step:
for each computing node, the product of the total number of full-scale sampling element sequences on all edges forming the target path of the computing node and the total number of all target paths included in the computing node is determined as the weight.
The full-scale sampling element sequences are all the element sequences that can be formed, over the target paths of all computing nodes, by extracting one element from the at least one element included in each edge of each target path; their length is the same as that of a sampling element sequence. A sampling element sequence is the sequence formed, for each computing node, by extracting one element, based on the predetermined weight, from the at least one element included in each edge forming the target path. From the target paths of each computing node, the number of sampling element sequences obtained equals the sampling allocation times of that node; over the target paths of all computing nodes, the number of sampling element sequences obtained equals the sampling times.
To obtain the sampling element sequence, step 140 may extract one element from the at least one element included in each edge forming the target path by the following steps 141 to 143:
Step 141: the reciprocal of the total number of all target paths included in each computing node is determined as the occurrence probability of a target path among all target paths of that computing node;
Step 142: for the target path of each computing node, the element sampling probability of the edge at the k-th position of the target path is determined based on the occurrence probability and the reciprocal of the total number of elements on the edge at the k-th position of the target path. Specifically, the product of the occurrence probability and the reciprocal of the total number of elements on the edge at the k-th position is determined as the element sampling probability of the edge at the k-th position of the target path;
Step 143: one element is extracted from the at least one element included in the edge at the k-th position according to the element sampling probability;
where the element sampling probability of the edge at the k-th position is 1/(N_k × X), the weight is N_k × X, 1/X is the occurrence probability, 1/N_k is the reciprocal of the total number of elements on the edge at the k-th position, N_k is the total number of elements on the edge at the k-th position, k takes each non-negative integer value in {k | 0 ≤ k ≤ L}, X is the total number of all target paths included in the computing node, and L is the sampling path length.
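The relationship between the occurrence probability, the element sampling probability and the weight in steps 141 to 143 can be checked with exact fractions. The function names below are hypothetical, and the values N_k = 3 and X = 3 are chosen only for illustration.

```python
from fractions import Fraction

def occurrence_probability(x):
    """Step 141: 1/X, where X is the number of target paths on the node."""
    return Fraction(1, x)

def element_sampling_probability(n_k, x):
    """Step 142: the product of 1/X and 1/N_k."""
    return occurrence_probability(x) * Fraction(1, n_k)

def weight(n_k, x):
    """The weight N_k * X stated alongside the element sampling probability."""
    return n_k * x

n_k, x = 3, 3  # illustrative values: 3 elements on the edge, 3 target paths
p = element_sampling_probability(n_k, x)
print(p, weight(n_k, x))
```

With these definitions the weight is the reciprocal of the element sampling probability.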
Specific examples of the applications of the steps 141 to 143 are as follows:
Assume that the sampling path length L is 2 and that the edges are e1, e2, e3, e4, e5 and e6. Taking FIG. 6 as an example, the present application determines the 2 edges of one target path from the edges e1, e2, e3, e4, e5 and e6 as follows:
First, the edge at position 0 of the path with L1 = 1 is determined. Specifically, to ensure correlation between the elements of the sampled element sequence, the probability of each of the edges e1, e2, e3, e4, e5 and e6 being the edge at position 0 of the path with L1 = 1 is determined first, and one of these edges is then selected as the edge at position 0 on the basis of those probabilities. In the embodiment of the present invention, the following formula is used:
Figure BDA0002032262000000161
with e_j ∈ {e1, e2, e3, e4, e5, e6}, to determine the probability of an edge being at position 0 of the path with L1 = 1, where Pr(e_j) is the probability of edge e_j being at position 0 of the path with L1 = 1 and D_L1(e_j) is the number of edges in
Figure BDA0002032262000000162
at distance 1 from edge e_j, specifically the number of edges at distance 1 from e_j when e_j is taken as the starting edge.
Each of the edges e1, e2, e3, e4, e5 and e6 may be regarded in turn as the edge at position 0 of the path with L1 = 1; the sampling probability of the edge at position 0 of the path with L1 = 1 is the probability of each of these edges being at position 0.
Next, the edge at position 1 of the path with L1 = 1 is determined. Taking FIG. 6 as an example, along the direction of the directed graph there are 2 edges at distance 1 from edge e1, namely e2 and e3, so D_L1(e1) = 2; there are 0 edges at distance 1 from edge e2, so D_L1(e2) = 0; similarly, D_L1(e3) = 1, D_L1(e4) = 0, D_L1(e5) = 1 and D_L1(e6) = 0.
Therefore,
Figure BDA0002032262000000163
Similarly, Pr(e2) = 0,
Figure BDA0002032262000000164
Pr(e4) = 0,
Figure BDA0002032262000000165
Pr(e6) = 0.
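The position-0 probabilities can be reproduced numerically under one assumption that is not confirmed by the text, because the formula itself survives only as an image: that Pr(e_j) is D_L1(e_j) divided by the sum of D_L1 over all edges. This reading agrees with the stated zero values for e2, e4 and e6. The names `D_L1` and `pr` are illustrative.

```python
from fractions import Fraction

# D_L1 values stated in the description for FIG. 6.
D_L1 = {"e1": 2, "e2": 0, "e3": 1, "e4": 0, "e5": 1, "e6": 0}

# Assumption: the position-0 probability normalises each count by the total.
total = sum(D_L1.values())
pr = {edge: Fraction(count, total) for edge, count in D_L1.items()}

for edge, probability in pr.items():
    print(edge, probability)
```

Under this assumption an edge from which no further edge can be reached is never chosen for position 0, so every sampled path can be completed to the required length.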
Assume that, according to the sampling probabilities of the edges for position 0 of the path with L1 = 1, the edge at position 0 is determined to be edge e1. The edge at position 1 of the path with L1 = 1 is then determined from the edges e2, e3, e4, e5 and e6.
In the embodiment of the present invention, the sampling probability of each of the edges e2, e3, e4, e5 and e6 being at position 1 of the path with L1 = 1 may be determined first, and one of these edges is then selected as the edge at position 1 on the basis of those probabilities. It should be understood that the sampling probability of the edge at position 1 of the path with L1 = 1 is the probability of each of the edges e2, e3, e4, e5 and e6 being at position 1.
In the embodiment of the present invention, the following formula is used:
Figure BDA0002032262000000171
with e_j ∈ {e2, e3, e4, e5, e6}, to determine the sampling probability of the edge at position 1 of the path with L1 = 1. Specifically,
Figure BDA0002032262000000172
Pr(e4) = 0, Pr(e5) = 0, Pr(e6) = 0.
Then one edge is selected from edges e2 and e3. Assuming the selected edge is e2, in the path with L1 = 1 shown in FIG. 7 the edge at position 0 is e1 and the edge at position 1 is e2.
Finally, one element is extracted from the at least one element included in edge e1 and one from the at least one element included in edge e2 to obtain the sampling element sequence:
Extraction of an element from edge e1 is taken as an example; an element is extracted from the elements included in edge e2 in a similar way, which is not described again here.
Because edge e1 includes 3 elements, the element sampling probability of edge e1 is 1/3; because edge e2 includes 2 elements, the element sampling probability of edge e2 is 1/2.
According to the element sampling probability 1/3, one element is extracted from the elements Y1, Y2 and Y3 included in edge e1. Assume that, by the above method, the elements extracted from edge e1 and edge e2 are Y1 and Y11, respectively. Since the edge at position 0 of the target path is e1 and the edge at position 1 is e2, the sampling element sequence is (Y1, Y11). The weight determined for (Y1, Y11) is the product of the total number, 6, of full-scale sampling element sequences on all edges of the target path and the total number of all target paths included by the computing node in
Figure BDA0002032262000000173
which is 3, giving a product of 18.
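The final extraction and the weight of 18 can be sketched as follows. The uniform draw with `random.Random.choice` stands in for sampling with probabilities 1/3 and 1/2, the seed is arbitrary, and the names `ELEMENTS`, `sample_sequence` and `sequence_weight` are hypothetical.

```python
import math
import random

# Element labels on the two edges of the target path (from the example).
ELEMENTS = {"e1": ["Y1", "Y2", "Y3"], "e2": ["Y11", "Y12"]}
TARGET_PATH = ["e1", "e2"]
TOTAL_TARGET_PATHS = 3  # value stated in the example for this computing node

def sample_sequence(path, elements, rng):
    """Draw one element uniformly from each edge, in path order."""
    return tuple(rng.choice(elements[edge]) for edge in path)

def sequence_weight(path, elements, n_paths):
    """Number of element sequences on the path times the number of target paths."""
    sequences_on_path = math.prod(len(elements[edge]) for edge in path)
    return sequences_on_path * n_paths

rng = random.Random(0)  # arbitrary seed, only to make the sketch repeatable
print(sample_sequence(TARGET_PATH, ELEMENTS, rng))
print(sequence_weight(TARGET_PATH, ELEMENTS, TOTAL_TARGET_PATHS))  # 6 * 3
```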
The following proceeds to describe the distributed graph data sequence sampling apparatus provided in the embodiment of the present invention.
As shown in fig. 8, an embodiment of the present invention further provides a distributed graph data sequence sampling apparatus, which is applied to distributed computing nodes, where the distributed computing nodes include: two or more computing nodes, the apparatus comprising:
a first obtaining module 31, configured to obtain preset graph data, sampling times, and a sampling path length;
the first processing module 32 is configured to equally divide the sampling times to obtain respective sampling times of each computing node, which are used as sampling distribution times;
a second processing module 33, configured to determine, from the path set processed by each compute node, a target path having a path length that is the same as the sampling path length according to the sampling path length, where the target path is formed by edges of the graph data having the same number of edges as the sampling path length, and each edge of the graph data includes at least one element;
a third processing module 34, configured to, for a target path of each computation node, extract one element from at least one element included in each edge forming the target path, respectively, based on a predetermined weight, to obtain a sample element sequence, where the weight is used to indicate a proportion of the sample element sequence in a full-scale sample element sequence set, and the full-scale sample element sequence in the full-scale sample element sequence set is obtained by sampling the respective distributed computation nodes according to the sampling allocation times.
In the embodiment of the present invention, the sampling times are divided equally so that each computing node is allocated the same number of sampling times. The task of sampling element sequences is thereby distributed evenly over the distributed computing nodes, and each computing node takes on only its own share of the required sampling times. From the path set processed by each computing node, a target path whose path length has the same value as the sampling path length is determined according to the sampling path length; then, for the target path of each computing node, one element is extracted, based on a predetermined weight, from the at least one element included in each edge of the target path to obtain a sampling element sequence. Each of the distributed computing nodes thus samples its own sampling element sequences, which completes the sampling of the graph data sequence. Compared with the prior art, in which a single computing node completes the whole sampling process, this reduces the amount of data each computing node processes and improves the efficiency of acquiring the required sampled graph data.
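The equal division of the sampling times can be sketched as below. The description does not say what happens when the sampling times are not an exact multiple of the number of computing nodes; giving the remainder to the first nodes, one each, is an assumption made only for this illustration, and `split_sampling_times` is a hypothetical name.

```python
def split_sampling_times(sampling_times, node_count):
    """Return the sampling allocation times for each computing node."""
    if node_count < 2:
        raise ValueError("the distributed setting needs two or more nodes")
    base, remainder = divmod(sampling_times, node_count)
    # Assumption: any remainder is spread over the first nodes, one each.
    return [base + (1 if i < remainder else 0) for i in range(node_count)]

print(split_sampling_times(1000, 2))  # exact split
print(split_sampling_times(10, 3))    # remainder handled by assumption
```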
In one possible implementation, the apparatus further includes:
a partitioning module, configured to partition the edge set of the graph data according to the number of the distributed computing nodes to obtain a plurality of block edge sets, before the target path whose path length has the same value as the sampling path length is determined, according to the sampling path length, from the path set processed by each computing node;
an allocation module for allocating a block edge set to a compute node;
the fourth processing module is used for determining each path formed by the edges in each block edge set;
the composition module is used for forming each path formed by each block edge set to form a path set processed by each computing node;
the second processing module is configured to:
based on the path set processed by each computing node, taking the starting edge in the block edge set allocated to the computing node as the starting edge and, along the direction in which remaining edges exist from the starting edge in the block edge set, finding by expanded search (sampling path length minus one) edges other than the starting edge;
and finding, by expanded search along the starting edge in the block edge set, all edges whose number equals the value of the sampling path length, to form the target path.
In one possible implementation, the apparatus further includes: a fifth processing module to:
for each computation node, determining the product of the total number of the full-scale sampling element sequences on all edges of the target path forming the computation node and the total number of all target paths included in the computation node as the weight.
In a possible implementation manner, the third processing module is configured to:
for each computing node, determining the reciprocal of the total number of all target paths included in the computing node as the occurrence probability of a target path among all target paths of that computing node;
for the target path of each computing node, determining the element sampling probability of the edge at the k-th position of the target path based on the occurrence probability and the reciprocal of the total number of elements on the edge at the k-th position of the target path, where k takes each non-negative integer value in {k | 0 ≤ k ≤ L} and L is the sampling path length;
extracting an element from at least one element included in the edge at the k-th position according to the element sampling probability.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (8)

1. A distributed graph data sequence sampling method is applied to distributed computing nodes, and the distributed computing nodes comprise: two or more computing nodes, the method comprising:
acquiring preset graph data, sampling times and a sampling path length;
the sampling times are equally divided to obtain the respective sampling times of each computing node as sampling distribution times;
determining a target path with a path length having the same value as the sampling path length from the path set processed by each computing node according to the sampling path length, wherein the target path is formed by edges of the graph data with the same number of edges as the sampling path length, and each edge of the graph data comprises at least one element;
for a target path of each computation node, extracting an element from at least one element included in each edge forming the target path respectively based on a predetermined weight to obtain a sampling element sequence, wherein the weight is used for indicating the proportion of the sampling element sequence in a full-scale sampling element sequence set, and the full-scale sampling element sequence in the full-scale sampling element sequence set is obtained by sampling the distributed computation nodes respectively according to the sampling distribution times;
before determining, from the path set processed by each computing node and according to the sampling path length, the target path whose path length has the same value as the sampling path length, the method further includes:
according to the number of the distributed computing nodes, partitioning the edge sets of the graph data to obtain a plurality of partitioned edge sets;
assigning a set of block edges to a compute node;
determining each path formed by each block edge set;
and forming a path set processed by each computing node by the paths formed by each block edge set.
2. The method of claim 1, wherein determining, from the path set processed by each computing node and according to the sampling path length, a target path whose path length has the same value as the sampling path length comprises:
based on the path set processed by each computing node, taking the starting edge in the block edge set allocated to the computing node as the starting edge and, along the direction in which remaining edges exist from the starting edge in the block edge set, finding by expanded search (sampling path length minus one) edges other than the starting edge;
and finding, by expanded search along the starting edge in the block edge set, all edges whose number equals the value of the sampling path length, to form the target path.
3. The method of claim 1, wherein the weights are determined by:
for each computing node, determining, as the weight, the product of the total number of full-scale sampling element sequences over all edges forming the target path of the computing node and the total number of all target paths included in the computing node.
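The wording of this claim admits more than one reading. The sketch below assumes that the number of full-scale sampling element sequences on a target path is the product of the element counts of its edges, and that the weight is this count multiplied by the number of target paths on the computing node. The names `sequence_count` and `path_weight` and the sample data are hypothetical.

```python
from math import prod
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str]   # hypothetical: (source vertex, destination vertex)
Path = Tuple[Edge, ...]


def sequence_count(path: Path, elements: Dict[Edge, Sequence[str]]) -> int:
    """Number of element sequences obtainable from one path (assumed reading)."""
    return prod(len(elements[edge]) for edge in path)


def path_weight(path: Path, all_paths: List[Path],
                elements: Dict[Edge, Sequence[str]]) -> int:
    """Weight = sequences on the path * number of target paths on the node."""
    return sequence_count(path, elements) * len(all_paths)


elements = {
    ("a", "b"): ["x", "y"],
    ("b", "c"): ["p", "q", "r"],
    ("c", "d"): ["z"],
}
paths = [
    (("a", "b"), ("b", "c")),
    (("b", "c"), ("c", "d")),
]
weights = [path_weight(p, paths, elements) for p in paths]
```

Under this reading the first path carries 2 * 3 = 6 possible sequences and the second 3 * 1 = 3, each multiplied by the two target paths on the node.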
4. The method according to claim 1 or 3, wherein extracting, for the target path of each computing node and based on a predetermined weight, one element from the at least one element included in each edge forming the target path comprises:
for each computing node, determining the reciprocal of the total number of all target paths included in the computing node as the occurrence probability of the target path among all target paths of that computing node;
for the target path of each computing node, determining the element sampling probability of the edge at the k-th position in the target path based on the occurrence probability and the reciprocal of the total number of elements of the edge at the k-th position on the target path, wherein k takes each non-negative integer value in {k | 0 ≤ k ≤ L}, and L is the sampling path length;
and extracting one element from the at least one element included in the edge at the k-th position according to the element sampling probability.
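A minimal sketch of these probabilities, assuming that a target path is chosen uniformly (probability equal to the reciprocal of the number of target paths) and that one element is then chosen uniformly from each edge (probability equal to the reciprocal of that edge's element count). The sketch simply iterates over the edges of the path rather than over an explicit index k, and the function names and data are hypothetical.

```python
import random
from fractions import Fraction
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str]   # hypothetical: (source vertex, destination vertex)
Path = Tuple[Edge, ...]


def sequence_probability(path: Path, num_paths: int,
                         elements: Dict[Edge, Sequence[str]]) -> Fraction:
    """Probability of one particular element sequence on the given path."""
    probability = Fraction(1, num_paths)            # occurrence probability
    for edge in path:
        probability *= Fraction(1, len(elements[edge]))  # per-edge choice
    return probability


def sample_sequence(paths: List[Path], elements: Dict[Edge, Sequence[str]],
                    rng: random.Random) -> Tuple[Path, List[str]]:
    """Pick a target path uniformly, then one element from each of its edges."""
    path = rng.choice(paths)
    return path, [rng.choice(elements[edge]) for edge in path]


elements = {
    ("a", "b"): ["x", "y"],
    ("b", "c"): ["p", "q", "r"],
    ("c", "d"): ["z"],
}
paths = [
    (("a", "b"), ("b", "c")),
    (("b", "c"), ("c", "d")),
]
p = sequence_probability(paths[0], len(paths), elements)
chosen_path, sequence = sample_sequence(paths, elements, random.Random(0))
```

For the first path, `p` is 1/2 * 1/2 * 1/3 = 1/12; exact fractions are used so that the value can be checked without rounding.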
5. A distributed graph data sequence sampling apparatus, applied to distributed computing nodes, the distributed computing nodes comprising: two or more computing nodes, the apparatus comprising:
a first acquisition module, configured to acquire preset graph data, sampling times and a sampling path length;
a first processing module, configured to equally divide the sampling times to obtain the sampling times of each computing node as its sampling distribution times;
a second processing module, configured to determine, according to the sampling path length, a target path whose path length equals the sampling path length from the path set processed by each computing node, wherein the target path is formed by a number of edges of the graph data equal to the sampling path length, and each edge of the graph data comprises at least one element;
a third processing module, configured to extract, for the target path of each computing node and based on a predetermined weight, one element from the at least one element included in each edge forming the target path to obtain a sampling element sequence, wherein the weight indicates the proportion of the sampling element sequence in a full-scale sampling element sequence set, and the full-scale sampling element sequences in that set are obtained by each of the distributed computing nodes sampling according to its sampling distribution times;
the apparatus further comprises:
a partitioning module, configured to, before the target path whose path length equals the sampling path length is determined from the path set processed by each computing node according to the sampling path length, partition the edge set of the graph data according to the number of the distributed computing nodes to obtain a plurality of block edge sets;
an allocation module, configured to assign one block edge set to each computing node;
a fourth processing module, configured to determine each path formed by the edges in each block edge set;
and a composition module, configured to form the path set processed by each computing node from the paths formed by the block edge set assigned to that computing node.
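The equal division performed by the first processing module can be illustrated with a short sketch. The claim does not say what happens when the sampling times are not divisible by the number of computing nodes, so giving the remainder to the first nodes is an assumption, as is the name `allocate_samples`.

```python
from typing import List


def allocate_samples(total_samples: int, num_nodes: int) -> List[int]:
    """Divide the sampling times as evenly as possible across computing nodes."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    if total_samples < 0:
        raise ValueError("total_samples must be non-negative")
    base, remainder = divmod(total_samples, num_nodes)
    # Assumption: the first `remainder` nodes each take one extra sample.
    return [base + (1 if node_id < remainder else 0) for node_id in range(num_nodes)]


even_split = allocate_samples(9, 3)
uneven_split = allocate_samples(10, 3)
```

Each entry of the returned list is the sampling distribution times of one computing node, and the entries always sum to the requested total.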
6. The apparatus of claim 5, wherein
the second processing module is configured to: based on the path set processed by each computing node, take an edge in the block edge set assigned to the computing node as an initial edge, and expand the search from the initial edge, in a direction in which remaining edges exist in the block edge set, for a number of further edges equal to the sampling path length minus one;
and form the target path from the initial edge and the edges found by the expanded search in the block edge set, such that the number of edges in the target path equals the sampling path length.
7. The apparatus of claim 5, wherein the apparatus further comprises a fifth processing module configured to:
for each computing node, determine, as the weight, the product of the total number of full-scale sampling element sequences over all edges forming the target path of the computing node and the total number of all target paths included in the computing node.
8. The apparatus of claim 5 or 7, wherein the third processing module is configured to:
for each computing node, determine the reciprocal of the total number of all target paths included in the computing node as the occurrence probability of the target path among all target paths of that computing node;
for the target path of each computing node, determine the element sampling probability of the edge at the k-th position in the target path based on the occurrence probability and the reciprocal of the total number of elements of the edge at the k-th position on the target path, wherein k takes each non-negative integer value in {k | 0 ≤ k ≤ L}, and L is the sampling path length;
and extract one element from the at least one element included in the edge at the k-th position according to the element sampling probability.
CN201910313368.6A 2019-04-18 2019-04-18 Distributed graph data sequence sampling method and device Active CN110019253B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910313368.6A CN110019253B (en) 2019-04-18 2019-04-18 Distributed graph data sequence sampling method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910313368.6A CN110019253B (en) 2019-04-18 2019-04-18 Distributed graph data sequence sampling method and device

Publications (2)

Publication Number Publication Date
CN110019253A (en) 2019-07-16
CN110019253B (en) 2021-10-12

Family

ID=67191800

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910313368.6A Active CN110019253B (en) 2019-04-18 2019-04-18 Distributed graph data sequence sampling method and device

Country Status (1)

Country Link
CN (1) CN110019253B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114742691B (en) * 2022-05-19 2023-08-18 支付宝(杭州)信息技术有限公司 Graph data sampling method and system

Citations (1)

Publication number Priority date Publication date Assignee Title
US10162868B1 (en) * 2015-03-13 2018-12-25 Amazon Technologies, Inc. Data mining system for assessing pairwise item similarity

Family Cites Families (11)

Publication number Priority date Publication date Assignee Title
US8265778B2 (en) * 2010-06-17 2012-09-11 Microsoft Corporation Event prediction using hierarchical event features
US9189217B2 (en) * 2011-10-03 2015-11-17 Telefonaktiebolaget L M Ericsson (Publ) Method for exploiting massive parallelism
US9251277B2 (en) * 2012-12-07 2016-02-02 International Business Machines Corporation Mining trajectory for spatial temporal analytics
CN104317801B (en) * 2014-09-19 2017-07-18 东北大学 A kind of Data clean system and method towards big data
CN105631752A (en) * 2016-03-22 2016-06-01 南京信息工程大学 Social network sampling generation algorithm for improving Dijkstra weight
CN106874083B (en) * 2017-01-03 2019-06-28 杭州医学院 A kind of data actuation man-machine interface method for scheduling task
CN106815080B (en) * 2017-01-09 2020-01-14 北京航空航天大学 Distributed graph data processing method and device
CN108319600B (en) * 2017-01-16 2021-01-08 华为技术有限公司 Data mining method and device
CN109344295B (en) * 2018-08-24 2020-05-05 阿里巴巴集团控股有限公司 Distributed graph embedding method, device, equipment and system
CN109151042B (en) * 2018-09-06 2019-11-29 褚战星 Internet of Things perception data Intelligent planning method
CN109299373B (en) * 2018-10-20 2021-10-29 上海交通大学 Recommendation system based on graph convolution technology

Also Published As

Publication number Publication date
CN110019253A (en) 2019-07-16

Similar Documents

Publication Publication Date Title
CN111460311A (en) Search processing method, device and equipment based on dictionary tree and storage medium
CN107952243B (en) Path determining method and device
US8527503B2 (en) Processing search queries in a network of interconnected nodes
US9159030B1 (en) Refining location detection from a query stream
US10360263B2 (en) Parallel edge scan for single-source earliest-arrival in temporal graphs
US20130103678A1 (en) Processing Search Queries Using A Data Structure
CN109783589B (en) Method, device and storage medium for resolving address of electronic map
CN110019253B (en) Distributed graph data sequence sampling method and device
CN108572958B (en) Data processing method and device
WO2017026999A1 (en) Identifying shortest paths
KR20170032366A (en) Method and apparatus for obtaining candidate address information in map
CN102999558B (en) Data structure is used to process search inquiry
CN114547439A (en) Service optimization method based on big data and artificial intelligence and electronic commerce AI system
CN108696418B (en) Privacy protection method and device in social network
CN112256957A (en) Information sorting method and device, electronic equipment and storage medium
CN105025013A (en) A dynamic IP coupling model based on a priority Trie tree
US9600468B2 (en) Dictionary creation device, word gathering method and recording medium
CN111340623A (en) Data storage method and device
CN110956553A (en) Community structure division method based on social network node dual-label propagation algorithm
CN108011735B (en) Community discovery method and device
CN113326430A (en) Information pushing method and system based on live social big data
JP6169471B2 (en) Residence position estimation device and residence position estimation method
KR101591595B1 (en) Method for predicting link in big database
CN114547440A (en) User portrait mining method based on internet big data and artificial intelligence cloud system
KR102323424B1 (en) Rating Prediction Method for Recommendation Algorithm Based on Observed Ratings and Similarity Graphs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant