CN113779322B - Method, apparatus, device and computer readable storage medium for graph retrieval - Google Patents


Info

Publication number
CN113779322B
Authority
CN
China
Legal status
Active
Application number
CN202111064232.XA
Other languages
Chinese (zh)
Other versions
CN113779322A (en
Inventor
汪洋
袁鹏程
陈曦
李方明
杨仁凯
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111064232.XA
Publication of CN113779322A
Application granted
Publication of CN113779322B
Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

According to example embodiments of the present disclosure, a method, apparatus, device, and computer-readable storage medium for graph retrieval are provided. The method includes determining the number of adjacent vertices (i.e., the degree) of each vertex in the graph data, and then sorting the plurality of vertices in the graph data according to the number of adjacent vertices. The method further includes segmenting the graph data based on the ordering of the plurality of vertices, and performing a graph retrieval operation for the graph data using the segmented graph data. Embodiments of the disclosure sort all vertices in the graph data by degree and divide the vertices based on that ordering, so that fewer edges are cut during graph partitioning and graph retrieval efficiency is improved.

Description

Method, apparatus, device and computer readable storage medium for graph retrieval
The present application is a divisional application of the Chinese patent application filed on August 27, 2018, with application number 201810983673.1 and entitled "Method, apparatus, device, and computer-readable storage medium for segmenting graph data".
Technical Field
Embodiments of the present disclosure relate generally to the field of data processing and, more particularly, relate to a method, apparatus, electronic device, and computer-readable storage medium for graph retrieval.
Background
A graph is a complex nonlinear data structure consisting of a set of vertices and a set of edges between vertices. Any two vertices in the graph may be associated, and such an association is represented by an edge; both vertices and edges may carry various configurable properties. Depending on whether edges have directions, graphs are divided into undirected graphs, in which every edge is undirected, and directed graphs, in which every edge is directed. Graphs can further be divided into unweighted graphs and weighted graphs according to whether edges carry weights.
An advantage of the graph structure is that it describes associations between data: each vertex in the graph may represent an "entity" in the real or virtual world, and each edge may represent a "relationship" between entities. For example, each vertex may represent a person and each edge a relationship between persons (such as a parent-child, colleague, or friend relationship); such a graph may also be referred to as a person graph. In the big data era, with the explosive growth of information, graph data has reached a massive scale; for example, the person graph of a large social network can have hundreds of millions of vertices and edges.
Disclosure of Invention
According to example embodiments of the present disclosure, a method, apparatus, electronic device, computer-readable storage medium, and computer program product for graph retrieval are provided.
In a first aspect of the present disclosure, a method for graph retrieval is provided. The method comprises: determining the number of adjacent vertices of each vertex in the graph data; sorting the plurality of vertices in the graph data according to the number of adjacent vertices; segmenting the graph data based on the ordering of the plurality of vertices; and performing a graph retrieval operation for the graph data using the segmented graph data.
In a second aspect of the present disclosure, an apparatus for segmenting graph data is provided. The apparatus comprises: a determination module configured to determine the number of adjacent vertices of each vertex in the graph data; a sorting module configured to sort the plurality of vertices in the graph data according to the number of adjacent vertices; a segmentation module configured to segment the graph data based on the ordering of the plurality of vertices; and a retrieval module configured to perform a graph retrieval operation for the graph data using the segmented graph data.
In a third aspect of the present disclosure, an electronic device is provided that includes one or more processors and a storage device for storing one or more programs. The one or more programs, when executed by the one or more processors, cause the electronic device to implement methods or processes in accordance with embodiments of the present disclosure.
In a fourth aspect of the present disclosure, there is provided a computer readable medium having stored thereon a computer program which when executed by a processor implements a method or process according to an embodiment of the present disclosure.
In a fifth aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method or process according to embodiments of the present disclosure.
It should be understood that what is described in this summary is not intended to limit the critical or essential features of the embodiments of the disclosure nor to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings, in which like or similar reference numerals denote like or similar elements. In the drawings:
FIG. 1 shows a schematic diagram for segmenting graph data according to an embodiment of the present disclosure;
FIG. 2 illustrates a flow chart of a method for segmenting graph data, according to an embodiment of the present disclosure;
FIG. 3 illustrates a flow chart of a method for assigning vertices and edges in graph data to multiple search engine machines, according to an embodiment of the disclosure;
FIGS. 4A-4E illustrate schematic diagrams of a process for assigning vertices and edges in graph data to multiple search engine machines, according to an embodiment of the disclosure;
FIGS. 5A-5B illustrate diagrams for adjusting the assignment of a particular vertex according to embodiments of the present disclosure;
FIG. 6 illustrates a block diagram of an apparatus for segmenting graph data, according to an embodiment of the present disclosure; and
FIG. 7 illustrates a block diagram of an electronic device capable of implementing various embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure have been illustrated in the accompanying drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather, these embodiments are provided so that this disclosure will be more thorough and complete. It should be understood that the drawings and embodiments of the present disclosure are for illustration purposes only and are not intended to limit the scope of the present disclosure.
In describing embodiments of the present disclosure, the term "comprising" and its variants should be read as open-ended, i.e., "including but not limited to". The term "based on" should be understood as "based at least in part on". The term "one embodiment" or "the embodiment" should be understood as "at least one embodiment". The term "some embodiments" should be understood as "at least some embodiments". Other explicit and implicit definitions may also appear below.
The inventors of the present application have noted that when the volume of graph data is very large (for example, the person graph of a large social network may have hundreds of millions of vertices and edges), a single graph search engine machine may be unable to handle retrieval over the entire graph. The graph data therefore needs to be split into multiple subgraphs, with each search engine machine handling retrieval for its own subgraph; each search engine machine may be, for example, a physical device, also referred to as a shard.
Conventionally, graph data is sliced randomly, the only requirement being that each subgraph has roughly the same number of vertices; that is, vertices are randomly assigned to the search engine machines. Conventional graph partitioning methods fall into two categories, vertex cut and edge cut. In a vertex cut, some vertices are cut, and each cut vertex must keep copies on two machines; in an edge cut, some edges are cut, each cut edge must keep copies on both machines, and traversing a cut edge requires cross-machine communication. Random slicing, however, may cut too many edges, and the resulting excess of cross-machine communication sharply degrades the overall performance of the retrieval system. The conventional approach is therefore inefficient because it cuts too many edges.
Embodiments of the present disclosure provide a scheme for segmenting graph data. They sort all vertices in the graph data by degree and divide the vertices according to that ordering, which reduces the number of edges cut during graph partitioning. Tightly connected portions of the graph are thus kept together in the same subgraph as far as possible, reducing the probability that edges are cut. Some example embodiments of the present disclosure are described in detail below with reference to FIGS. 1-7.
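For contrast with the scheme below, the random baseline can be sketched in a few lines of Python. This is a minimal sketch under our own assumptions: vertices are shuffled and then dealt round-robin so that per-machine vertex counts stay equal, and the names `random_partition` and `count_cut_edges` as well as the 12-edge example graph are illustrative, not taken from the patent.

```python
import random
from collections import Counter


def random_partition(vertices, num_machines, seed=0):
    """Baseline: shuffle the vertices, then deal them out round-robin so every
    machine receives a nearly equal number of vertices, ignoring the edges."""
    rng = random.Random(seed)
    order = list(vertices)
    rng.shuffle(order)
    return {v: i % num_machines for i, v in enumerate(order)}


def count_cut_edges(edges, assignment):
    """Count edges whose two endpoints landed on different machines."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])


# Hypothetical 12-edge example graph (vertex names are illustrative).
edges = [("A", x) for x in "BCDEFG"] + [
    ("G", "H"), ("H", "I"), ("H", "J"), ("H", "K"), ("K", "L"), ("L", "M"),
]
vertices = sorted({v for edge in edges for v in edge})
assignment = random_partition(vertices, num_machines=3)
print(Counter(assignment.values()))  # vertex count per machine
print(count_cut_edges(edges, assignment), "of", len(edges), "edges cut")
```

Because placement ignores edges, an edge survives only when both endpoints happen to share a machine; with P machines and independent uniform placement, about 1 - 1/P of the edges (roughly two thirds for P = 3) would be expected to be cut on average.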
FIG. 1 shows a schematic diagram 100 for segmenting graph data according to an embodiment of the present disclosure. As shown in FIG. 1, graph data 110 (also referred to simply as a graph) includes a plurality of vertices and directed edges between them. According to an embodiment of the present disclosure, the graph data 110 can be segmented into 3 pieces of sub-graph data 121, 122, and 123 by sorting all vertices by their number of adjacent vertices and dividing the vertices according to that ordering. As shown in FIG. 1, segmenting into sub-graph data 121-123 in this way cuts only two edges, so the number of cut edges is reduced. An example implementation of splitting the graph data 110 into the sub-graph data 121, 122, and 123 is described below with reference to FIGS. 2-7.
FIG. 2 illustrates a flow chart of a method 200 for segmenting graph data, according to an embodiment of the present disclosure. At block 202, the number of adjacent vertices of each vertex in the graph data is determined; this number, i.e., how many vertices are adjacent to a given vertex, is also referred to as the degree of the vertex. In some embodiments, the degree of every vertex in the graph data may be calculated; the graph data may store, for example, person relationship data, business relationship data, or financial entity relationship data.
In some embodiments, the number of adjacent vertices of the vertices in the graph data may be determined if the number of vertices and/or edges in the graph data exceeds a predetermined number. That is, if the graph data is large enough that it must be segmented before processing (e.g., before distributed retrieval), the graph segmentation method or process according to embodiments of the present disclosure may be performed on it. It should be appreciated that, besides distributed retrieval, the segmented graph data can be used for other purposes, such as distributed storage.
At block 204, the plurality of vertices in the graph data are sorted according to the number of adjacent vertices. For example, after the degree of each vertex is calculated, the vertices are sorted by degree. Vertices with the same degree may be ordered randomly or, alternatively, ranked by other vertex attributes so that relatively important vertices rank higher.
At block 206, the graph data is segmented based on the ordering of the plurality of vertices. For example, following the ordering of all vertices, the highest-ranked vertex and its adjacent vertices may be assigned together, and this assignment repeated until all vertices are assigned, so that high-degree vertices and their neighbors are placed on the same machine as far as possible, reducing the number of edges cut. In some embodiments, an ordered vertex set may be generated based on the ordering of the plurality of vertices, and the vertices in the set are assigned to a plurality of search engine machines or devices based on that ordered set. Example implementations of assigning vertices to multiple machines are described below with reference to FIGS. 3, 4A-4E, and 5A-5B.
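Block 202 can be sketched as follows, assuming the graph arrives as a list of (u, v) edge pairs. The helper names `adjacency` and `degrees` are illustrative. Directed edges are treated as undirected for adjacency, and each vertex's degree is counted as its number of distinct neighbors, so a pair of opposite edges (A→C and C→A) contributes one adjacent vertex, not two; ignoring self-loops is likewise our assumption.

```python
from collections import defaultdict


def adjacency(edges, vertices=()):
    """Undirected adjacency sets: a directed edge u->v makes u and v adjacent.
    `vertices` lets callers register isolated vertices that have no edges."""
    adj = defaultdict(set)
    for v in vertices:
        adj.setdefault(v, set())
    for u, v in edges:
        if u != v:  # self-loops are ignored (an assumption)
            adj[u].add(v)
            adj[v].add(u)
    return adj


def degrees(edges, vertices=()):
    """Number of distinct adjacent vertices of each vertex, i.e. its degree."""
    return {v: len(nbrs) for v, nbrs in adjacency(edges, vertices).items()}


print(degrees([("B", "A"), ("C", "A"), ("A", "C")], vertices=["Z"]))
```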
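The trigger condition ("the number of at least one of the vertices and edges exceeds a predetermined number") amounts to a simple OR test. In this sketch, separate thresholds for vertices and edges are an assumption; `needs_segmentation`, `max_vertices`, and `max_edges` are hypothetical names for whatever capacity limits a deployment uses.

```python
def needs_segmentation(num_vertices, num_edges, max_vertices, max_edges):
    """Return True if either the vertex count or the edge count exceeds its
    (hypothetical) threshold, e.g. the capacity of one search engine machine."""
    return num_vertices > max_vertices or num_edges > max_edges


print(needs_segmentation(10, 500, max_vertices=100, max_edges=100))
```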
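Block 204 can be sketched with a single sort key. The per-vertex `importance` score stands in for the "other attributes" mentioned for tie-breaking and is hypothetical, as is using the vertex id as a final tie-breaker to make the order deterministic (the text also allows a random order among equal degrees).

```python
def order_vertices(degree, importance=None):
    """Sort vertex ids by descending degree (block 204).

    Ties are broken first by a hypothetical per-vertex `importance` score
    (higher ranks first) and then by vertex id, so the order is deterministic.
    """
    importance = importance or {}
    return sorted(degree, key=lambda v: (-degree[v], -importance.get(v, 0), v))


degree = {"a": 1, "b": 3, "c": 1}
print(order_vertices(degree))            # ['b', 'a', 'c']
print(order_vertices(degree, {"c": 5}))  # ['b', 'c', 'a']
```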
FIG. 3 illustrates a flow chart of a method 300 for assigning vertices and edges in graph data to multiple search engine machines, according to an embodiment of the disclosure. It should be appreciated that method 300 may be an example implementation of block 206 of method 200 described above with reference to FIG. 2. For clarity, method 300 is described below with reference to the schematic diagrams 400, 440, 450, 460, and 470 shown in FIGS. 4A-4E.
At block 302, all vertices in the graph data are sorted by degree to generate a vertex set. For example, as shown in FIG. 4A, the graph data 410 needs to be split into 3 parts to be distributed to 3 search engine machines 431, 432, and 433. The graph data 410 includes 13 vertices, namely vertices A, B, C, D, E, F, G, H, I, J, K, L, and M, and a total of 12 directed edges between them. For example, a unidirectional edge from vertex B to vertex A indicates that vertex B has some one-way relationship to vertex A. Although unidirectional edges are used as the example, the graph segmentation method according to embodiments of the present disclosure is equally applicable to graph data with undirected edges.
With continued reference to FIG. 4A, after the degrees of all vertices in graph data 410 are calculated, the vertices are sorted by degree to generate vertex set 420. In vertex set 420, the number above each vertex indicates its degree; e.g., vertex A has degree 6, meaning it has 6 adjacent vertices and 6 edges connected to it. Next, all vertices in vertex set 420 are assigned to the 3 machines 431, 432, and 433 according to their order in vertex set 420, such that the number of edges cut is minimal or small.
At block 304, the vertex with the highest current degree in the vertex set and the machine with the lowest current load are selected. For example, in FIG. 4B, vertex A, which has the highest current degree (6), is selected, as is the machine with the lowest current load (i.e., the machine with the fewest vertices assigned so far). In the initial phase no machine has been assigned any vertices, so one machine (such as machine 431) may be selected at random. The two selections may be made in either order or simultaneously.
At block 306, the vertex with the highest current degree and its unassigned adjacent vertices are assigned to the machine with the lowest current load. For example, as shown in FIG. 4B, vertex A may first be assigned to machine 431; its unassigned adjacent vertices, namely vertices B, C, D, E, F, and G, are then determined and also assigned to machine 431. That is, any adjacent vertices of the highest-degree vertex that have not yet been assigned are assigned along with it. In some embodiments, the unassigned adjacent vertices may be determined first, and the highest-degree vertex and those adjacent vertices then assigned together to the least-loaded machine. When a vertex is assigned, the edges connected to it are assigned to the same machine at the same time.
At block 308, the vertex set is dynamically updated. As shown in FIGS. 4B and 4C, whenever vertices are assigned from vertex set 420 to a machine, vertex set 420 is updated, so it always holds exactly the currently unassigned vertices. As shown in FIG. 4C, the unassigned vertices in vertex set 420 are then vertices H, K, L, I, J, and M.
At block 310, it is determined whether unassigned vertices remain in the vertex set; if so, blocks 304-308 are performed again. That is, each round of vertex assignment (blocks 304-308) is repeated until all vertices in the vertex set have been assigned.
For example, as shown in FIG. 4C, after the first round of assignment, the vertex with the highest current degree is vertex H, and the machines with the lowest current load are machines 432 and 433. In FIG. 4D, vertex H and its unassigned adjacent vertices I, J, and K are assigned to machine 432 (vertex G is also adjacent to vertex H but was already assigned in the first round), and vertex set 420 is dynamically updated.
As shown in FIG. 4D, after the second round of assignment, the vertex with the highest current degree is vertex L, and the machine with the lowest current load is machine 433. In FIG. 4E, vertex L and its unassigned adjacent vertex M are assigned to machine 433 (vertex K is also adjacent to vertex L but was already assigned in the second round), and vertex set 420 is dynamically updated.
Referring back to FIG. 3, if it is determined at block 310 that no unassigned vertices remain in the vertex set (e.g., all vertices in vertex set 420 shown in FIG. 4E have been assigned), a distributed search for the graph data is performed at block 312. For example, machine 431 handles searches involving vertices A, B, C, D, E, F, G and their edges, machine 432 handles vertices H, I, J, K and their edges, and machine 433 handles vertices L, M and their edges. Traversing the edge between vertices G and H or the edge between vertices K and L requires cross-machine communication. However, because the graph segmentation method according to embodiments of the present disclosure cuts few or a minimal number of edges (in the example of FIGS. 4A-4E only 2 of the 12 edges are cut, a cut rate of about 0.167), such cross-machine communication is needed less often, improving the overall retrieval performance of the graph retrieval system.
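Vertex set 420 can be reproduced from the adjacency the walk-through describes (A adjacent to B-G; H adjacent to G, I, J, K; L adjacent to K and M), which gives exactly the stated 12 edges. The edge list below is inferred from the text rather than read off the figure, edge directions are ignored, and equal degrees are ordered alphabetically, which may differ from the order drawn in FIG. 4A.

```python
from collections import defaultdict

# Edges inferred from the FIG. 4A walk-through; directions are ignored.
EDGES_410 = [("A", x) for x in "BCDEFG"] + [
    ("G", "H"), ("H", "I"), ("H", "J"), ("H", "K"), ("K", "L"), ("L", "M"),
]


def build_vertex_set(edges):
    """Return (vertices sorted by descending degree, degree map).
    Equal degrees are ordered alphabetically (an assumption)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    return sorted(degree, key=lambda v: (-degree[v], v)), degree


vertex_set_420, degree = build_vertex_set(EDGES_410)
print([(v, degree[v]) for v in vertex_set_420])
```

With this tie order, removing A and its neighbors leaves H, K, L, I, J, M, the same remaining order the text gives for FIG. 4C.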
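Block 304 reduces to two independent selections. In this sketch `load` counts assigned vertices per machine, matching the text's definition of load; breaking load ties by the smallest machine id is a deterministic stand-in for the random choice the text permits, and `select_vertex_and_machine` is a hypothetical name.

```python
def select_vertex_and_machine(vertex_set, load):
    """Block 304 sketch: pick the first (highest-degree) vertex of the ordered,
    unassigned vertex set and the machine with the fewest assigned vertices.

    `load` maps machine id -> number of vertices assigned so far. Ties in load
    go to the smallest machine id instead of a random pick.
    """
    if not vertex_set:
        raise ValueError("no unassigned vertices left")
    vertex = vertex_set[0]
    machine = min(load, key=lambda m: (load[m], m))
    return vertex, machine


print(select_vertex_and_machine(["A", "H", "G"], {431: 0, 432: 0, 433: 0}))  # ('A', 431)
```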
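One assignment round (block 306), including placing edges together with their vertices, might look like the sketch below. `assign_round` and its arguments are hypothetical; neighbors are placed in sorted order only for reproducibility. Because every edge incident to a newly placed vertex is recorded on that vertex's machine, a cut edge ends up stored on both machines, which matches the edge-cut behavior described earlier.

```python
def assign_round(vertex, machine, adj, assignment, machine_edges):
    """Block 306 sketch: put `vertex` and its still-unassigned adjacent vertices
    on `machine`, and store every edge touching a newly placed vertex there.

    adj: vertex -> set of adjacent vertices (undirected view)
    assignment: vertex -> machine, updated in place
    machine_edges: machine -> set of edges (as frozensets), updated in place
    Returns the list of newly assigned vertices.
    """
    group = [vertex] + sorted(n for n in adj[vertex] if n not in assignment)
    for v in group:
        assignment[v] = machine
    edges_here = machine_edges.setdefault(machine, set())
    for v in group:
        for n in adj[v]:
            # Edges to vertices on other machines are kept here too, so a cut
            # edge is recorded on both of its endpoints' machines.
            edges_here.add(frozenset((v, n)))
    return group


adj = {"A": {"B", "C"}, "B": {"A"}, "C": {"A", "D"}, "D": {"C"}}
assignment, machine_edges = {}, {}
print(assign_round("A", 431, adj, assignment, machine_edges))  # ['A', 'B', 'C']
```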
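Putting blocks 302-310 together gives a compact greedy loop. This is a sketch under stated assumptions, not the patent's implementation: degrees are computed once on the full graph (so "highest current degree" means the highest original degree among the remaining vertices), load ties go to the earliest machine in the list, and `greedy_partition` is a hypothetical name.

```python
from collections import defaultdict


def greedy_partition(edges, machines, vertices=()):
    """Sketch of blocks 302-310 of method 300: degree-ordered greedy assignment.

    edges: iterable of (u, v) pairs, treated as undirected for adjacency
    machines: machine ids; on load ties the earliest id in this list wins
    vertices: optional extra (e.g. isolated) vertices
    Returns a dict vertex -> machine.
    """
    adj = defaultdict(set)
    for v in vertices:
        adj.setdefault(v, set())
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    degree = {v: len(nbrs) for v, nbrs in adj.items()}

    # Block 302: vertex set ordered by descending degree (ties alphabetical).
    vertex_set = sorted(adj, key=lambda v: (-degree[v], v))
    load = {m: 0 for m in machines}
    assignment = {}

    while vertex_set:                                   # block 310
        vertex = vertex_set[0]                          # block 304
        machine = min(machines, key=lambda m: load[m])
        group = [vertex] + sorted(                      # block 306
            n for n in adj[vertex] if n not in assignment)
        for v in group:
            assignment[v] = machine
        load[machine] += len(group)
        placed = set(group)
        vertex_set = [v for v in vertex_set if v not in placed]  # block 308
    return assignment


edges = [("A", x) for x in "BCDEFG"] + [
    ("G", "H"), ("H", "I"), ("H", "J"), ("H", "K"), ("K", "L"), ("L", "M"),
]
parts = greedy_partition(edges, [431, 432, 433])
print(parts)
```

On the edge list inferred from FIG. 4A this reproduces the three rounds described in the text: A-G on machine 431, H-K on 432, and L-M on 433. Isolated vertices (degree 0) end up one per round on the least-loaded machine, which spreads them evenly.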
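The cut rate for FIGS. 4A-4E can be checked directly from the final assignment. The edge list is inferred from the walk-through (vertex L's neighbors are K and M, vertex H's are G, I, J, and K), so the two cut edges come out as G-H and K-L; `cut_edges` is an illustrative helper.

```python
EDGES_410 = [("A", x) for x in "BCDEFG"] + [
    ("G", "H"), ("H", "I"), ("H", "J"), ("H", "K"), ("K", "L"), ("L", "M"),
]
# Final assignment described for FIG. 4E.
ASSIGNMENT = {**dict.fromkeys("ABCDEFG", 431),
              **dict.fromkeys("HIJK", 432),
              **dict.fromkeys("LM", 433)}


def cut_edges(edges, assignment):
    """Edges whose endpoints sit on different machines; traversing any of them
    needs cross-machine communication."""
    return [(u, v) for u, v in edges if assignment[u] != assignment[v]]


cut = cut_edges(EDGES_410, ASSIGNMENT)
print(cut, round(len(cut) / len(EDGES_410), 3))  # [('G', 'H'), ('K', 'L')] 0.167
```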
In some embodiments, a maximum number of vertices to be assigned to each machine may be set; once a machine has been assigned that maximum number of vertices, no further vertices are assigned to it. For example, the maximum number P_max may be determined by the following formula (1):
$$P_{\max} = \frac{a \times V}{P} \qquad (1)$$
where V is the total number of vertices in the graph data, P is the number of search engine machines among which the vertices are to be distributed, and a is a preset parameter greater than 1, such as 1.5.
In some embodiments, the graph segmentation principles according to embodiments of the present disclosure may include: ensuring that the adjacent edges of high-degree vertices are assigned to the same machine; distributing isolated vertices evenly across the machines; and keeping the number of vertices assigned to each machine no greater than the maximum number P_max.
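Formula (1) as a worked computation. Rounding down to a whole number of vertices is our assumption, since the text does not say how a fractional P_max is handled; `max_vertices_per_machine` is a hypothetical name.

```python
import math


def max_vertices_per_machine(num_vertices, num_machines, a=1.5):
    """Formula (1): P_max = a * V / P, with a > 1.
    Rounding down is an assumption; the text does not specify it."""
    if a <= 1:
        raise ValueError("a must be greater than 1")
    return math.floor(a * num_vertices / num_machines)


print(max_vertices_per_machine(1_000_000, 10))  # 150000
```

With a = 1.5, each machine may hold up to 50% more vertices than an exactly even share (V / P), which leaves room to keep a high-degree vertex together with its neighbors.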
In some embodiments, an upper limit on the number of adjacent vertices (i.e., a high-degree warning line) may be received; if a particular vertex whose number of adjacent vertices reaches this upper limit would be assigned as an adjacent vertex of another vertex, the assignment of that particular vertex is adjusted. For example, FIGS. 5A-5B illustrate schematic diagrams 500 and 550 for adjusting the assignment of a particular vertex according to embodiments of the present disclosure.
As shown in FIG. 5A, the graph data 510 includes 9 vertices, namely vertices O, P, Q, R, S, T, U, V, and W, where vertex S has degree 5, vertex T has degree 4, and all other vertices have degree 1. Following the method described above, the highest-degree vertex S and its adjacent vertices O, P, Q, R, and T would be assigned to machine 521, and the remaining 3 vertices U, V, and W would then be assigned to machine 522.
However, as shown in FIG. 5A, this assignment cuts 3 edges of the high-degree vertex T, so it is not optimal. Embodiments of the present disclosure may therefore handle high-degree vertices specially so that they are not assigned as adjacent vertices of other vertices. Referring to FIG. 5B, an upper limit on the number of adjacent vertices (e.g., 4) may be set; since the degree of vertex T is 4 or more, the assignment of FIG. 5A is adjusted so that vertex T is not assigned as an adjacent vertex of vertex S. As a result, the highest-degree vertex S (degree 5) and its adjacent vertices O, P, Q, and R are assigned to machine 521, while vertex T does not follow vertex S to machine 521. In the second round of assignment, the highest-degree remaining vertex T (degree 4) and its adjacent vertices U, V, and W are assigned to machine 522. Compared with the assignment of FIG. 5A, the adjusted assignment of FIG. 5B cuts only one edge, reducing the number of edges cut during graph segmentation.
Embodiments of the present disclosure can thus reduce the number of edges cut during segmentation. For example, with a conventional random segmentation method, about seventy percent of the edges are cut in graph data with millions of vertices and tens of millions of edges. With the segmentation method according to embodiments of the present disclosure, the proportion of edges actually cut is greatly reduced, in some cases to below one percent, effectively reducing the performance loss from cross-machine communication in distributed retrieval.
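The high-degree warning line can be added to the greedy loop as an extra filter on the neighbors pulled along with a vertex. In this sketch, the `high_degree_limit` parameter is hypothetical, the rule applies only to vertices pulled in as neighbors (S itself exceeds the limit but is processed in its own round), and the FIG. 5A edge list is inferred from the text (S adjacent to O, P, Q, R, T; T adjacent to U, V, W).

```python
from collections import defaultdict


def greedy_partition(edges, machines, high_degree_limit=None):
    """Degree-ordered greedy assignment with an optional high-degree warning
    line: a neighbor whose degree is >= high_degree_limit is not pulled along
    with another vertex and instead waits for its own round."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    degree = {v: len(n) for v, n in adj.items()}
    vertex_set = sorted(adj, key=lambda v: (-degree[v], v))
    load = {m: 0 for m in machines}
    assignment = {}
    while vertex_set:
        vertex = vertex_set[0]
        machine = min(machines, key=lambda m: load[m])
        group = [vertex] + sorted(
            n for n in adj[vertex]
            if n not in assignment
            and (high_degree_limit is None or degree[n] < high_degree_limit))
        for v in group:
            assignment[v] = machine
        load[machine] += len(group)
        vertex_set = [v for v in vertex_set if v not in assignment]
    return assignment


def count_cut_edges(edges, assignment):
    return sum(1 for u, v in edges if assignment[u] != assignment[v])


# FIG. 5A/5B example as described in the text.
EDGES_510 = [("S", x) for x in "OPQRT"] + [("T", x) for x in "UVW"]
for limit in (None, 4):
    parts = greedy_partition(EDGES_510, [521, 522], high_degree_limit=limit)
    print(limit, count_cut_edges(EDGES_510, parts))  # "None 3", then "4 1"
```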
FIG. 6 shows a block diagram of an apparatus 600 for segmenting graph data, according to an embodiment of the disclosure. As shown in FIG. 6, the apparatus 600 includes a determination module 610, a sorting module 620, and a segmentation module 630. The determination module 610 is configured to determine the number of adjacent vertices of the vertices in the graph data. The sorting module 620 is configured to sort the plurality of vertices in the graph data according to the number of adjacent vertices. The segmentation module 630 is configured to segment the graph data based on the ordering of the plurality of vertices.
In some embodiments, the segmentation module 630 comprises: a generation module configured to generate an ordered vertex set based on the ordering of the plurality of vertices; and an assignment module configured to assign the vertices in the vertex set to a plurality of devices based on the ordered vertex set.
In some embodiments, the assignment module comprises a second assignment module configured to assign a first vertex of the vertex set and the adjacent vertices of the first vertex to a first device of the plurality of devices.
In some embodiments, the assignment module comprises an iteration module configured to iteratively perform the following steps until all vertices in the vertex set have been assigned: selecting the currently highest-ranked vertex in the vertex set and the device with the fewest currently assigned vertices among the plurality of devices; assigning the currently highest-ranked vertex and its unassigned adjacent vertices to that device; and updating the vertex set by removing from it the currently highest-ranked vertex and its unassigned adjacent vertices.
In some embodiments, the assignment module comprises: a setting module configured to set a maximum number of vertices to be assigned to each of the plurality of devices; and a stopping module configured to stop assigning vertices to a device of the plurality of devices in response to the number of vertices assigned to that device reaching the maximum number.
In some embodiments, the assignment module comprises: a receiving module configured to receive a setting of an upper limit on the number of adjacent vertices; and an adjustment module configured to adjust the assignment of a particular vertex, whose number of adjacent vertices reaches the upper limit, in response to that vertex being assigned as an adjacent vertex of another vertex.
In some embodiments, the determination module 610 comprises a second determination module configured to determine the number of adjacent vertices of the vertices in the graph data in response to the number of vertices and/or edges in the graph data exceeding a predetermined number.
In some embodiments, the apparatus 600 further comprises a retrieval module configured to perform distributed retrieval for the graph data using the segmented graph data.
It should be appreciated that the determination module 610, the sorting module 620, and the segmentation module 630 shown in FIG. 6 may be included in an electronic device (such as a server). Moreover, the modules illustrated in FIG. 6 may perform steps or actions of the methods or processes according to embodiments of the present disclosure.
FIG. 7 shows a schematic block diagram of an example device 700 that may be used to implement embodiments of the present disclosure. It should be appreciated that the device 700 may be used to implement the apparatus 600 for segmenting graph data described in this disclosure. As shown, the device 700 includes a Central Processing Unit (CPU) 701 that can perform various suitable actions and processes in accordance with computer program instructions stored in a Read Only Memory (ROM) 702 or loaded from a storage unit 708 into a Random Access Memory (RAM) 703. The RAM 703 may also store various programs and data required for the operation of the device 700. The CPU 701, ROM 702, and RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
Various components in device 700 are connected to I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, etc.; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, an optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The CPU 701 performs the various methods and processes described above, such as methods 200 and 300. For example, in some embodiments, these methods may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 700 via the ROM 702 and/or the communication unit 709. When the computer program is loaded into the RAM 703 and executed by the CPU 701, one or more actions or steps of the methods described above may be performed. Alternatively, in other embodiments, the CPU 701 may be configured to perform the methods in any other suitable manner (e.g., by means of firmware).
The functions described herein may be performed, at least in part, by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that may be used include Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and so forth.
Program code for carrying out the methods of the present disclosure may be written in any combination of one or more programming languages. This program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Moreover, although the acts or steps are depicted in a particular order, this should not be understood as requiring that such acts or steps be performed in the particular order shown or in sequential order, or that all illustrated acts or steps be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable subcombination.
Although embodiments of the disclosure have been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are example forms of implementing the claims.

Claims (14)

1. A method for graph retrieval, comprising:
determining a number of adjacent vertices of the vertices in graph data;
sorting a plurality of vertices in the graph data according to the number of adjacent vertices;
segmenting the graph data based on the ordering of the plurality of vertices; and
performing, using the segmented graph data, a graph retrieval operation for the graph data,
wherein segmenting the graph data comprises:
generating an ordered vertex set based on the ordering of the plurality of vertices; and
assigning a plurality of vertices in the vertex set to a plurality of devices based on the ordered vertex set,
wherein assigning the plurality of vertices in the vertex set to the plurality of devices comprises:
iteratively performing the following steps until all vertices in the vertex set have been assigned:
selecting the highest-ranked vertex in the vertex set and the device with the fewest currently assigned vertices among the plurality of devices;
assigning the currently highest-ranked vertex and the unassigned adjacent vertices of the currently highest-ranked vertex to the device with the fewest currently assigned vertices; and
updating the vertex set by removing the currently highest-ranked vertex and its unassigned adjacent vertices from the vertex set, and
wherein the graph retrieval operation is a distributed retrieval, and performing the graph retrieval operation comprises:
performing, using one of the devices, a search over the vertices assigned to that device and their associated edges.
2. The method of claim 1, wherein assigning a plurality of vertices in the set of vertices to a plurality of devices comprises:
assigning a first vertex of the set of vertices and the adjacent vertices of the first vertex to a first device of the plurality of devices.
3. The method of claim 1, wherein assigning a plurality of vertices in the set of vertices to a plurality of devices comprises:
setting a maximum number of vertices to be assigned for each device of the plurality of devices; and
stopping assigning vertices to a device of the plurality of devices in response to the number of vertices assigned to that device reaching the maximum number.
4. The method of claim 1, wherein assigning a plurality of vertices in the set of vertices to a plurality of devices comprises:
receiving a setting of an upper limit value for the number of adjacent vertices; and
adjusting the assignment of a particular vertex that has the upper-limit number of adjacent vertices in response to the particular vertex being assigned as an adjacent vertex of another vertex.
5. The method of claim 1, wherein determining the number of contiguous vertices of vertices in the graph data comprises:
determining the number of adjacent vertices of the vertices in the graph data in response to the number of vertices, the number of edges, or both in the graph data exceeding a predetermined number.
6. The method of claim 2, wherein the graph retrieval operation is a distributed retrieval, and performing the graph retrieval operation comprises:
performing, using the first device, a search over the first vertex, the adjacent vertices of the first vertex, and their related edges.
7. An apparatus for graph retrieval, comprising:
a determination module configured to determine a number of adjacent vertices of the vertices in graph data;
a ranking module configured to rank a plurality of vertices in the graph data according to the number of adjacent vertices;
a segmentation module configured to segment the graph data based on the ordering of the plurality of vertices; and
a retrieval module configured to perform a graph retrieval operation for the graph data using the segmented graph data,
wherein the segmentation module comprises:
a generation module configured to generate an ordered vertex set based on the ordering of the plurality of vertices; and
an allocation module configured to allocate a plurality of vertices in the vertex set to a plurality of devices based on the ordered vertex set,
wherein the allocation module comprises:
an iteration module configured to iteratively perform the following steps until all vertices in the vertex set have been allocated:
selecting the highest-ranked vertex in the vertex set and the device with the fewest currently allocated vertices among the plurality of devices;
allocating the currently highest-ranked vertex and the unallocated adjacent vertices of the currently highest-ranked vertex to the device with the fewest currently allocated vertices; and
updating the vertex set by removing the currently highest-ranked vertex and its unallocated adjacent vertices from the vertex set, and
wherein the graph retrieval operation is a distributed retrieval, and the retrieval module is further configured to:
perform, using one of the devices, a search over the vertices allocated to that device and their associated edges.
8. The apparatus of claim 7, wherein the allocation module comprises:
a second allocation module configured to allocate a first vertex of the set of vertices and the adjacent vertices of the first vertex to a first device of the plurality of devices.
9. The apparatus of claim 7, wherein the allocation module comprises:
a setting module configured to set a maximum number of vertices to be assigned for each of the plurality of devices; and
a stopping module configured to stop assigning vertices to a device of the plurality of devices in response to the number of assigned vertices of the device reaching the maximum number.
10. The apparatus of claim 7, wherein the allocation module comprises:
a receiving module configured to receive a setting of an upper limit value for the number of adjacent vertices; and
an adjustment module configured to adjust the allocation of a particular vertex that has the upper-limit number of adjacent vertices in response to the particular vertex being allocated as an adjacent vertex of another vertex.
11. The apparatus of claim 7, wherein the determination module comprises:
a second determination module configured to determine the number of adjacent vertices of the vertices in the graph data in response to the number of vertices, the number of edges, or both in the graph data exceeding a predetermined number.
12. The apparatus of claim 8, wherein the graph retrieval operation is a distributed retrieval, and the retrieval module is further configured to:
perform, using the first device, a search over the first vertex, the adjacent vertices of the first vertex, and their related edges.
13. An electronic device, the electronic device comprising:
one or more processors; and
storage means for storing one or more programs that when executed by the one or more processors cause the electronic device to implement the method of any of claims 1-6.
14. A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method according to any of claims 1-6.
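The iterative assignment recited in claim 1 can be sketched as follows, together with the optional per-device cap of claim 3. This is a minimal sketch that rests on several assumptions. Ties in adjacent-vertex count are broken by vertex identifier. Ties between equally loaded devices go to the lowest device index. Neighbors that do not fit under the cap stay in the vertex set for a later iteration. The names `assign_vertices` and `max_per_device` are hypothetical, and the adjustment in claim 4 is not modeled.

```python
def assign_vertices(vertices, edges, num_devices, max_per_device=None):
    """Greedy assignment of ranked vertices to devices (sketch of claims 1 and 3)."""
    # Adjacency sets; self-loops ignored, duplicate edges collapse.
    adj = {v: set() for v in vertices}
    for u, v in edges:
        if u != v:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)

    # Ordered vertex set: most adjacent vertices first, ties by vertex id (assumed).
    ordered = sorted(adj, key=lambda v: (-len(adj[v]), v))

    assigned = {}              # vertex -> device index
    loads = [0] * num_devices  # number of vertices assigned per device
    for vertex in ordered:
        # Skipping assigned vertices is equivalent to having removed them from the set.
        if vertex in assigned:
            continue
        # Devices still below the cap (claim 3); with no cap, every device is open.
        open_devices = [d for d in range(num_devices)
                        if max_per_device is None or loads[d] < max_per_device]
        if not open_devices:
            raise ValueError("all devices have reached max_per_device")
        device = min(open_devices, key=lambda d: loads[d])  # least-loaded device
        # Current highest-ranked vertex plus its still-unassigned adjacent vertices.
        group = [vertex] + [u for u in sorted(adj[vertex]) if u not in assigned]
        if max_per_device is not None:
            group = group[: max_per_device - loads[device]]
        for u in group:
            assigned[u] = device
        loads[device] += len(group)
    return assigned


if __name__ == "__main__":
    edges = [(0, 1), (0, 2), (0, 3), (1, 2), (4, 5)]
    print(assign_vertices(range(6), edges, num_devices=2))
    # {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1}
    print(assign_vertices(range(6), edges, num_devices=2, max_per_device=3))
    # {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
```

Walking the pre-sorted list once and skipping vertices that are already assigned gives the same result as repeatedly picking the highest-ranked remaining vertex. This is because any vertex ranked above the current one has already been processed. The sketch relies on that property to avoid re-sorting on every iteration.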
CN202111064232.XA 2018-08-27 2018-08-27 Method, apparatus, device and computer readable storage medium for graph retrieval Active CN113779322B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111064232.XA CN113779322B (en) 2018-08-27 2018-08-27 Method, apparatus, device and computer readable storage medium for graph retrieval

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111064232.XA CN113779322B (en) 2018-08-27 2018-08-27 Method, apparatus, device and computer readable storage medium for graph retrieval
CN201810983673.1A CN109165325B (en) 2018-08-27 2018-08-27 Method, apparatus, device and computer-readable storage medium for segmenting graph data

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201810983673.1A Division CN109165325B (en) 2018-08-27 2018-08-27 Method, apparatus, device and computer-readable storage medium for segmenting graph data

Publications (2)

Publication Number Publication Date
CN113779322A CN113779322A (en) 2021-12-10
CN113779322B true CN113779322B (en) 2023-08-01

Family

ID=64896866

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202111064232.XA Active CN113779322B (en) 2018-08-27 2018-08-27 Method, apparatus, device and computer readable storage medium for graph retrieval
CN201810983673.1A Active CN109165325B (en) 2018-08-27 2018-08-27 Method, apparatus, device and computer-readable storage medium for segmenting graph data

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201810983673.1A Active CN109165325B (en) 2018-08-27 2018-08-27 Method, apparatus, device and computer-readable storage medium for segmenting graph data

Country Status (1)

Country Link
CN (2) CN113779322B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111538867B (en) * 2020-04-15 2021-06-15 Shenzhen Institute of Computing Sciences Method and system for dividing bounded incremental graph
CN113792170B (en) * 2021-11-15 2022-03-15 Alipay (Hangzhou) Information Technology Co., Ltd. Graph data dividing method and device and computer equipment

Citations (5)

Publication number Priority date Publication date Assignee Title
CN103970902A (en) * 2014-05-27 2014-08-06 Chongqing University Method and system for reliable and instant retrieval over large quantities of data
CN104281664A (en) * 2014-09-24 2015-01-14 Beihang University Data segmenting method and system of distributed graph calculating system
CN106649391A (en) * 2015-11-03 2017-05-10 Huawei Technologies Co., Ltd. Graph data processing method and apparatus
CN107193896A (en) * 2017-05-09 2017-09-22 Huazhong University of Science and Technology Clustering-based graph data partitioning method
CN108073583A (en) * 2016-11-08 2018-05-25 Huawei Technologies Co., Ltd. Component method and device

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US9934325B2 (en) * 2014-10-20 2018-04-03 Korean Institute Of Science And Technology Information Method and apparatus for distributing graph data in distributed computing environment
US9699205B2 (en) * 2015-08-31 2017-07-04 Splunk Inc. Network security system
CN107016092B (en) * 2017-04-06 2019-12-03 Xiangtan University Text search method based on a flattening algorithm

Non-Patent Citations (2)

Title
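Multi-Manifold Ranking: Using Multiple Features for Better Image Retrieval; Wang Yang et al.; Pacific-Asia Conference on Knowledge Discovery and Data Mining; full text *
Neighborhood-based dynamic partitioning algorithm for large-scale graph data; Zhang Xiaoyuan; Zhang Heng; Zhai Jian; Computer Systems & Applications (09); full text *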

Also Published As

Publication number Publication date
CN109165325B (en) 2021-08-17
CN109165325A (en) 2019-01-08
CN113779322A (en) 2021-12-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant