WO2023138505A1

WO2023138505A1 - Methods, systems, and devices for data query

Info

Publication number: WO2023138505A1
Application number: PCT/CN2023/072091
Authority: WO
Inventors: Yi Yu; Mingwei Zhou; Cong Li
Original assignee: Zhejiang Dahua Technology Co., Ltd.
Priority date: 2022-01-20
Filing date: 2023-01-13
Publication date: 2023-07-27
Also published as: CN114741570A

Abstract

The present disclosure provides methods, systems, and devices for data query. The methods may include obtaining one or more query conditions. Each of the one or more query conditions may include at least one of schema information of an edge of interest to be queried or property information of one or more nodes associated with the edge of interest to be queried. The methods may include determining, based on the one or more query conditions, at least one record in a target index. Each record of the at least one record in the target index may be generated based on schema information of a target edge of at least one target edge that satisfies one or more target edge conditions and property information of one or more nodes associated with the target edge. Further, the methods may include determining, based on the at least one record, a query result. The query result may include the edge of interest and the one or more nodes associated with the edge of interest.

Description

METHODS, SYSTEMS, AND DEVICES FOR DATA QUERY

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority of Chinese Patent Application No. 202210068412.3 filed on January 20, 2022, the contents of which are hereby incorporated by reference.

TECHNICAL FIELD

The present disclosure generally relates to graph databases, and more particularly, relates to methods, systems, and devices for data query.

BACKGROUND

With the rapid development of computer technology and the growing demand for data query, graph data and graph databases, which can intuitively represent relationships between data, are becoming increasingly popular. However, if data is queried directly from a graph database, a full-graph query needs to be performed on the graph database, which is time-consuming and inefficient. Indexes can be established to improve the query performance of the graph database, thereby improving the efficiency of data query from the graph database. However, such indexes are applicable only to certain scenarios (e.g., a graph database including a small amount of graph data, a graph database including data with simple relationships, etc.), and they cannot simplify the traversal processes performed during the data query.

Therefore, it is desirable to provide methods, systems, and devices for data query, which can simplify or avoid the traversal processes during the data query and improve the efficiency of the data query from the graph database.

SUMMARY

An aspect of the present disclosure relates to a method for data structure generation. The method may be implemented on a computing device having at least one processor and at least one storage device. The method may include determining one or more target edge conditions; and generating, based on at least one target edge that satisfies the one or more target edge conditions, a target index including at least one record by: generating each record of the at least one record in the target index based on schema information of a target edge of the at least one target edge and property information of one or more nodes associated with the target edge.
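The generation step can be pictured with a minimal Python sketch. It assumes, for illustration only, that a target edge condition is a predicate on an edge (here, a match on the edge label), that the schema information of an edge is its identifier, label, and head and tail node identifiers, and that a record stores this schema information together with copies of the head and tail node properties. The classes `Node` and `Edge`, the function `build_target_index`, and the record keys are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    node_id: str
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class Edge:
    edge_id: str
    label: str    # schema information: the edge type
    head_id: str  # schema information: head node of the directed edge
    tail_id: str  # schema information: tail node of the directed edge


# Hypothetical model of a target edge condition: a predicate on an edge.
EdgeCondition = Callable[[Edge], bool]


def build_target_index(
    nodes: Dict[str, Node],
    edges: List[Edge],
    conditions: List[EdgeCondition],
) -> List[dict]:
    """Generate one record per edge that satisfies every target edge condition."""
    index: List[dict] = []
    for edge in edges:
        if not all(cond(edge) for cond in conditions):
            continue  # not a target edge, so no record is generated
        head, tail = nodes[edge.head_id], nodes[edge.tail_id]
        index.append({
            # schema information of the target edge
            "edge_id": edge.edge_id,
            "label": edge.label,
            "head_id": edge.head_id,
            "tail_id": edge.tail_id,
            # property information of the associated nodes, copied into the record
            "head_props": dict(head.properties),
            "tail_props": dict(tail.properties),
        })
    return index


nodes = {
    "n1": Node("n1", {"name": "Jack", "type": "person"}),
    "n2": Node("n2", {"name": "walk", "type": "event"}),
    "n3": Node("n3", {"name": "Tom", "type": "person"}),
}
edges = [
    Edge("e1", "participates_in", "n1", "n2"),
    Edge("e2", "friend", "n1", "n3"),
]
index = build_target_index(nodes, edges, [lambda e: e.label == "participates_in"])
```

Here only edge `e1` satisfies the example condition, so the index holds a single record, and the record already carries the names of both associated nodes.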

Another aspect of the present disclosure relates to a method for data query. The method may be implemented on a computing device having at least one processor and at least one storage device. The method may include obtaining one or more query conditions. Each of the one or more query conditions may include at least one of schema information of an edge of interest to be queried or property information of one or more nodes associated with the edge of interest to be queried. The method may include determining, based on the one or more query conditions, at least one record in a target index. Each record of the at least one record in the target index may be generated based on schema information of a target edge of at least one target edge that satisfies one or more target edge conditions and property information of one or more nodes associated with the target edge. Further, the method may include determining, based on the at least one record, a query result. The query result may include the edge of interest and the one or more nodes associated with the edge of interest.
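A matching sketch of the query side, under the same illustrative assumptions: each record of the target index is a dictionary holding the schema information of a target edge together with the properties of its head and tail nodes, and a query condition is an optional edge label plus required property values on either node. The pre-built `TARGET_INDEX`, its record layout, and the function `query_target_index` are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical pre-built target index: each record combines the schema
# information of a target edge with properties of its head and tail nodes.
TARGET_INDEX: List[dict] = [
    {"edge_id": "e1", "label": "participates_in", "head_id": "n1", "tail_id": "n2",
     "head_props": {"name": "Jack"}, "tail_props": {"name": "walk", "location": "place A"}},
    {"edge_id": "e3", "label": "participates_in", "head_id": "n3", "tail_id": "n2",
     "head_props": {"name": "Tom"}, "tail_props": {"name": "walk", "location": "place A"}},
]


def _matches(actual: Dict[str, str], required: Optional[Dict[str, str]]) -> bool:
    """True if every required property/value pair is present in `actual`."""
    return all(actual.get(k) == v for k, v in (required or {}).items())


def query_target_index(
    index: List[dict],
    label: Optional[str] = None,
    head_props: Optional[Dict[str, str]] = None,
    tail_props: Optional[Dict[str, str]] = None,
) -> List[dict]:
    """Return edge/head/tail results assembled directly from matching records."""
    results = []
    for rec in index:
        if label is not None and rec["label"] != label:
            continue
        if not _matches(rec["head_props"], head_props):
            continue
        if not _matches(rec["tail_props"], tail_props):
            continue
        # The result is built from the record alone; no traversal from a start node.
        results.append({
            "edge": {"id": rec["edge_id"], "label": rec["label"]},
            "head": {"id": rec["head_id"], **rec["head_props"]},
            "tail": {"id": rec["tail_id"], **rec["tail_props"]},
        })
    return results


hits = query_target_index(TARGET_INDEX, label="participates_in", head_props={"name": "Jack"})
```

The linear scan keeps the sketch short; a real store would more likely look records up by key, but the point illustrated is that both the edge of interest and its nodes come back from the index records themselves.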

Still another aspect of the present disclosure relates to a system for data structure generation. The system may include at least one storage device including a set of instructions; and at least one processor in communication with the at least one storage device. When executing the set of instructions, the at least one processor may be directed to perform operations. The operations may include determining one or more target edge conditions; and generating, based on at least one target edge that satisfies the one or more target edge conditions, a target index including at least one record by: generating each record of the at least one record in the target index based on schema information of a target edge of the at least one target edge and property information of one or more nodes associated with the target edge.

Still another aspect of the present disclosure relates to a system for data query. The system may include at least one storage device including a set of instructions; and at least one processor in communication with the at least one storage device. When executing the set of instructions, the at least one processor may be directed to perform operations. The operations may include obtaining one or more query conditions. Each of the one or more query conditions may include at least one of schema information of an edge of interest to be queried or property information of one or more nodes associated with the edge of interest to be queried. The operations may include determining, based on the one or more query conditions, at least one record in a target index. Each record of the at least one record in the target index may be generated based on schema information of a target edge of at least one target edge that satisfies one or more target edge conditions and property information of one or more nodes associated with the target edge. Further, the operations may include determining, based on the at least one record, a query result. The query result may include the edge of interest and the one or more nodes associated with the edge of interest.

Additional features will be set forth in part in the description which follows, and in part will become apparent to those skilled in the art upon examination of the following and the accompanying drawings or may be learned by production or operation of the examples. The features of the present disclosure may be realized and attained by practice or use of various aspects of the methodologies, instrumentalities, and combinations set forth in the detailed examples discussed below.

BRIEF DESCRIPTION OF THE DRAWINGS

The present disclosure is further described in terms of exemplary embodiments. These exemplary embodiments are described in detail with reference to the drawings. These embodiments are non-limiting exemplary embodiments, in which like reference numerals represent similar structures throughout the several views of the drawings, and wherein:

FIG. 1 is a schematic diagram illustrating an exemplary data query system according to some embodiments of the present disclosure;

FIG. 2 is a schematic diagram illustrating an exemplary graph structure according to some embodiments of the present disclosure;

FIG. 3 is a schematic diagram illustrating an exemplary node stored in a row of a Hadoop database according to some embodiments of the present disclosure;

FIG. 4A is a schematic diagram illustrating an exemplary graph structure according to some embodiments of the present disclosure;

FIG. 4B is a schematic diagram illustrating an exemplary process for data query according to some embodiments of the present disclosure;

FIG. 5 is a block diagram illustrating an exemplary processing device according to some embodiments of the present disclosure;

FIG. 6 is a flowchart illustrating an exemplary process for data structure generation according to some embodiments of the present disclosure;

FIG. 7 is a flowchart illustrating an exemplary process for storing a record in a target index according to some embodiments of the present disclosure;

FIG. 8 is a flowchart illustrating an exemplary process for generating a record in a target index according to some embodiments of the present disclosure;

FIG. 9 is a flowchart illustrating an exemplary process for data query according to some embodiments of the present disclosure;

FIG. 10 is a schematic diagram illustrating an exemplary electronic device for resource management according to some embodiments of the present disclosure; and

FIG. 11 is a schematic diagram illustrating an exemplary computer-readable storage medium for resource management according to some embodiments of the present disclosure.

DETAILED DESCRIPTION

In the following detailed description, numerous specific details are set forth by way of examples in order to provide a thorough understanding of the relevant disclosure. However, it should be apparent to those skilled in the art that the present disclosure may be practiced without such details. In other instances, well-known methods, procedures, systems, components, and/or circuitry have been described at a relatively high level, without detail, in order to avoid unnecessarily obscuring aspects of the present disclosure. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present disclosure. Thus, the present disclosure is not limited to the embodiments shown, but to be accorded the widest scope consistent with the claims.

The terminology used herein is for the purpose of describing particular example embodiments only and is not intended to be limiting. As used herein, the singular forms “a,” “an,” and “the” may be intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprise,” “comprises,” and/or “comprising,” “include,” “includes,” and/or “including,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It will be understood that when a unit, engine, module, or block is referred to as being “on,” “connected to,” or “coupled to” another unit, engine, module, or block, it may be directly on, connected or coupled to, or communicate with the other unit, engine, module, or block, or an intervening unit, engine, module, or block may be present, unless the context clearly indicates otherwise. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.

These and other features, and characteristics of the present disclosure, as well as the methods of operation and functions of the related elements of structure and the combination of parts and economies of manufacture, may become more apparent upon consideration of the following description with reference to the accompanying drawings, all of which form a part of this disclosure. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended to limit the scope of the present disclosure. It is understood that the drawings are not to scale.

The present disclosure relates to methods and systems for data structure generation. The methods may include determining one or more target edge conditions, and generating, based on at least one target edge that satisfies the one or more target edge conditions, a target index including at least one record by generating each record of the at least one record in the target index based on schema information of a target edge of the at least one target edge and property information of one or more nodes associated with the target edge. The target index may be generated for the at least one target edge that satisfies the one or more target edge conditions, which requires only simple operations and thereby enlarges the application scope of the target index. In addition, the target index may be generated by adding the property information of the one or more nodes associated with the target edge to the schema information of the target edge, which requires a small amount of storage space and does not affect the storage device of the graph database. Further, the target index may be used for data query, which can simplify or avoid a traversal process during the data query, thereby improving the efficiency of the data query.
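One way such index records could be looked up without traversal in a store that keeps keys in lexicographic order (as HBase does for row keys) is to encode each record as a composite key and answer a query with a prefix range scan. The sketch below simulates this with a sorted Python list and the standard `bisect` module. The key layout `label|head name|tail name|edge id`, the separator, and the helper names are assumptions for illustration, not the format used in the disclosure.

```python
import bisect
from typing import List

SEP = "|"  # hypothetical field separator, assumed not to occur in field values


def make_key(label: str, head_name: str, tail_name: str, edge_id: str) -> str:
    """Compose an index key from edge schema information and node property values."""
    return SEP.join([label, head_name, tail_name, edge_id])


def prefix_scan(sorted_keys: List[str], *fields: str) -> List[str]:
    """Return all keys whose leading fields equal `fields` (a range scan)."""
    prefix = SEP.join(fields) + SEP
    start = bisect.bisect_left(sorted_keys, prefix)
    out = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so no later key can share the prefix
        out.append(key)
    return out


keys = sorted([
    make_key("friend", "Jack", "Tom", "e2"),
    make_key("participates_in", "Jack", "walk", "e1"),
    make_key("participates_in", "Tom", "walk", "e3"),
])
print(prefix_scan(keys, "participates_in", "Jack"))  # ['participates_in|Jack|walk|e1']
```

The cost of the scan depends on the number of matching keys plus a binary search, not on the size of the graph, which is the kind of traversal avoidance the paragraph above describes.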

FIG. 1 is a schematic diagram illustrating an exemplary data query system 100 according to some embodiments of the present disclosure. As shown in FIG. 1, the data query system 100 may include a processing device 110, a network 120, a database 130, and a terminal device 140. In some embodiments, the processing device 110, the network 120, the database 130, and/or the terminal device 140 may be connected to and/or communicate with each other via a wireless connection, a wired connection, or a combination thereof. The connection among the components of the data query system 100 may be variable. Merely by way of example, the database 130 may be connected to the processing device 110 through the network 120, as illustrated in FIG. 1. As another example, the database 130 may be connected to the processing device 110 directly.

The processing device 110 may process data and/or information obtained from one or more components (e.g., the database 130, the terminal device 140, etc.) of the data query system 100. For example, the processing device 110 may generate a data structure (e.g., a target index) for data (e.g., graph data, edges, etc.) in the database 130. For instance, the processing device 110 may determine one or more target edge conditions, and generate, based on at least one target edge that satisfies the one or more target edge conditions, a target index including at least one record by generating each record of the at least one record in the target index based on schema information of a target edge of the at least one target edge and property information of one or more nodes associated with the target edge. As another example, the processing device 110 may perform a data query in the database 130. For instance, the processing device 110 may obtain one or more query conditions. Each of the one or more query conditions may include at least one of schema information of an edge of interest to be queried or property information of one or more nodes associated with the edge of interest to be queried. The processing device 110 may determine, based on the one or more query conditions, at least one record in a target index. Each of the at least one record in the target index may be generated based on schema information of a target edge and property information of one or more nodes associated with the target edge. Further, the processing device 110 may determine, based on the at least one record, a query result. The query result may include at least one edge of interest and the one or more nodes associated with each edge of interest.

In some embodiments, the processing device 110 may be in communication with a computer-readable storage medium (e.g., a storage device in the database 130, an external storage device, etc.) and may execute programs and/or instructions stored in the computer-readable storage medium.

In some embodiments, the processing device 110 may be a single server or a server group. The server group may be centralized or distributed. In some embodiments, the processing device 110 may be local or remote. For example, the processing device 110 may access information and/or data stored in the database 130 and/or the terminal device 140 via the network 120. As another example, the processing device 110 may be directly connected to the database 130 and/or the terminal device 140 to access stored information and/or data. In some embodiments, the processing device 110 may be implemented on a cloud platform. Merely by way of example, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an inter-cloud, a multi-cloud, or the like, or any combination thereof.

In some embodiments, the processing device 110 may be implemented by a computing device. For example, the computing device may include a processor, a storage, an input/output (I/O), and a communication port. The processor may execute computer instructions (e.g., program codes) and perform functions of the processing device 110 in accordance with the techniques described herein. The computer instructions may include, for example, routines, programs, objects, components, data structures, procedures, modules, and functions, which perform particular functions described herein. In some embodiments, the processing device 110, or a portion of the processing device 110, may be implemented by a portion of the terminal device 140.

In some embodiments, the processing device 110 may include multiple processing devices. Thus, operations and/or method steps that are performed by one processing device as described in the present disclosure may also be jointly or separately performed by the multiple processing devices. For example, if, in the present disclosure, the data query system 100 executes both operation A and operation B, it should be understood that operation A and operation B may also be performed by two or more different processing devices jointly or separately (e.g., a first processing device executes operation A and a second processing device executes operation B, or the first and second processing devices jointly execute operations A and B).

The network 120 may include any suitable network that can facilitate the exchange of information and/or data for the data query system 100. In some embodiments, one or more components (e.g., the database 130, the terminal device 140, etc. ) of the data query system 100 may communicate information and/or data with one or more other components of the data query system 100 via the network 120. In some embodiments, the network 120 may include one or more network access points. For example, the network 120 may include wired and/or wireless network access points such as base stations and/or internet exchange points through which one or more components of the data query system 100 may be connected to the network 120 to exchange data and/or information.

The database 130 may refer to an organized collection of data stored and accessed electronically. In some embodiments, the database 130 may be configured to store and/or manage the data. For example, the database 130 may be configured to perform operations (e.g., data query, data manipulation (insertion, updating, and/or deletion), data definition (schema creation and/or modification), data access control, etc.) on the data stored in the database 130. Exemplary databases may include a relational database and a non-relational database. The relational database may refer to a database generated based on a relational model of data. For example, the relational database may be a structured query language (SQL) database, such as, a MySQL database, an Oracle database, a PostgreSQL database, a MariaDB database, a Snowflake database, a Teradata Vantage database, or the like, or any combination thereof. The non-relational database may refer to a database using a mechanism for storage and retrieval of data that is modeled in means other than the relational model of data. For example, the non-relational database may be a not only SQL (NoSQL) database, such as, a key-value database, a document-oriented database, a graph database, or the like, or any combination thereof. The graph database may refer to a database that uses graph structures to represent and store the data. A graph structure may include nodes, edges, and properties. Exemplary graph databases may include a Neo4j database, a JanusGraph database, an ArangoDB database, an AllegroGraph database, a Sparksee database, a FlockDB database, or the like, or any combination thereof. More descriptions regarding the graph database and the graph structure may be found elsewhere in the present disclosure (e.g., FIG. 2 and the descriptions thereof).

In some embodiments, the database 130 may include a database management system (DBMS) that interacts with users, applications, etc., and the database 130 to store and/or manage the data. In some embodiments, the database 130 may store and/or manage the data by using the DBMS to store and/or manage the data. For illustration purposes, in the present disclosure, “storing and/or managing the data by using the DBMS to store and/or manage the data” may be referred to as “storing and/or managing the data” for brevity.

In some embodiments, the database 130 may be connected to or include a storage device. Therefore, the database 130 may be stored in the storage device. The storage device may include a mass storage, a removable storage, a volatile read-and-write memory, a read-only memory (ROM), or the like, or any combination thereof. For example, the mass storage may include a magnetic disk, an optical disk, a solid-state drive, etc. The removable storage may include a flash drive, a floppy disk, an optical disk, a memory card, a zip disk, a magnetic tape, etc.

In some embodiments, the storage device may store data/information obtained from the processing device 110, the terminal device 140, and/or any other component of the data query system 100. For example, the processing device 110 may store the data structure (e.g., the target index) in the storage device corresponding to the database 130. In some embodiments, the storage device may further store one or more programs and/or instructions to perform exemplary methods described in the present disclosure. In some embodiments, the storage device may be part of the processing device 110.

In some embodiments, the terminal device 140 may provide a user interface via which a user may view information and/or input data and/or instructions to the data query system 100. In some embodiments, the terminal device 140 may include a mobile device 140-1, a tablet computer 140-2, a laptop computer 140-3, or the like, or any combination thereof. In some embodiments, the mobile device 140-1 may include a smart home device, a wearable device, a mobile device, a virtual reality device, an augmented reality device, or the like, or any combination thereof. In some embodiments, the terminal device 140 may include a display that can display information in a human-readable form, such as text, image, audio, video, graph, animation, or the like, or any combination thereof. The display of the terminal device 140 may include a cathode ray tube (CRT) display, a liquid crystal display (LCD), a light-emitting diode (LED) display, a plasma display panel (PDP), a three-dimensional (3D) display, or the like, or a combination thereof. In some embodiments, the terminal device 140 may be part of the processing device 110.

It should be noted that the above description regarding the data query system 100 is merely provided for the purposes of illustration, and is not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations and modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure. In some embodiments, the data query system 100 may include one or more additional components, and/or one or more components of the data query system 100 described above may be omitted. In some embodiments, a component of the data query system 100 may be implemented on two or more sub-components. Two or more components of the data query system 100 may be integrated into a single component.

FIG. 2 is a schematic diagram illustrating an exemplary graph structure according to some embodiments of the present disclosure. A graph structure 200 may be an embodiment of a graph structure stored in the database 130 described in FIG. 1. Correspondingly, the database 130 may be a graph database (e.g., a JanusGraph database). It should be noted that the graph database and the graph structure are merely provided for illustration, and are not intended to limit the scope of the present disclosure. The database 130 may be any database that is capable of storing and/or managing data, such as, a relational database, a key-value database, a document-oriented database, etc. The graph structure may be any graph structure that is capable of representing data to be stored.

In some embodiments, a graph database may include a plurality of graph structures, and each of the plurality of graph structures may include one or more nodes and/or one or more edges. For example, as shown in FIG. 2, the graph structure 200 may include nodes 202 and 204, and edges 212 and 214. A node (also referred to as a vertex) may indicate an entity, such as, an object, an item, a position, an event, a category, etc. For example, the node 202 may indicate a person “A,” and the node 204 may indicate a person “B” different from the person “A.” As another example, the node 202 may indicate a person, and the node 204 may indicate an event.

In some embodiments, the node may include a static node or a dynamic node. The static node may refer to a node that does not change with time. For example, the static node may include file data, such as, a personnel file, a vehicle file, a case file, etc. The dynamic node may refer to a node that changes with time. For example, the dynamic node may be spatio-temporal event data, such as, data related to a violation time, data related to a travel trajectory, etc.

An edge (also referred to as a relationship) may be used to associate two or more nodes. The edge may indicate a corresponding relationship between the two or more nodes. For example, if the person “A” indicated by the node 202 and the person “B” indicated by the node 204 are friends, the edge (e.g., the edge 212 or the edge 214) associating the node 202 with the node 204 may indicate a friend relationship. In some embodiments, the edge may be a directed edge or an undirected edge. In some embodiments, two nodes may be connected by two directed edges. The directed edges connecting two nodes may have different corresponding relationships depending on the direction(s) of the directed edges. For example, as shown in FIG. 2, a corresponding relationship from the node 202 to the node 204 indicated by the edge 212 may be different from a corresponding relationship from the node 204 to the node 202 indicated by the edge 214. In some embodiments, the two nodes connected by a directed edge may be designated as a head node and a tail node based on a direction of the directed edge. The direction of the directed edge may be from the head node to the tail node. For example, for the edge 212, the node 202 may be designated as the head node, and the node 204 may be designated as the tail node. As another example, for the edge 214, the node 204 may be designated as the head node, and the node 202 may be designated as the tail node. In some embodiments, two nodes may be connected by an undirected edge. The undirected edge connecting two nodes may represent a single corresponding relationship between the two nodes. For example, if the nodes 202 and 204 are connected by an undirected edge, a corresponding relationship from the node 202 to the node 204 may be the same as a corresponding relationship from the node 204 to the node 202.
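The head/tail convention for directed edges, and how it differs from an undirected edge, can be sketched in a few lines of Python. The relationship labels `employs`, `works_for`, and `friend` are hypothetical, chosen so that the two directed edges between the nodes 202 and 204 carry different relationships, as with the edges 212 and 214 in FIG. 2; the helper `relationships` is likewise illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Edge:
    label: str
    head: str  # for a directed edge, the direction runs from head to tail
    tail: str
    directed: bool = True


def relationships(edges: List[Edge], src: str, dst: str) -> List[str]:
    """Labels of the relationships that hold from node `src` to node `dst`."""
    found = []
    for e in edges:
        if e.head == src and e.tail == dst:
            found.append(e.label)
        elif not e.directed and e.head == dst and e.tail == src:
            # An undirected edge represents the same relationship in both directions.
            found.append(e.label)
    return found


edges = [
    Edge("employs", head="202", tail="204"),    # like edge 212: 202 -> 204
    Edge("works_for", head="204", tail="202"),  # like edge 214: 204 -> 202
    Edge("friend", head="202", tail="204", directed=False),
]

print(relationships(edges, "202", "204"))  # ['employs', 'friend']
print(relationships(edges, "204", "202"))  # ['works_for', 'friend']
```

The directed edges answer differently depending on the direction asked about, while the undirected `friend` edge appears in both answers.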

In some embodiments, based on a temporal correlation, a type of the edge may include a static edge and a dynamic edge. The static edge may refer to an edge indicating a relationship that does not change with time. For example, the static edge may include an edge indicating a kinship relationship (e.g., a father-child relationship, a mother-child relationship, etc.), an edge indicating an owner relationship (e.g., a car owner relationship), etc. The dynamic edge may refer to an edge indicating a relationship that changes with time. For example, the dynamic edge may include a person-event relationship (e.g., a peer relationship), a car-event relationship (e.g., a violation relationship), etc.

In some embodiments, based on associated nodes, the type of the edge may include a one-to-one relationship, a one-to-many relationship, and a many-to-many relationship. For example, a relationship between a person and a personnel file of the person may belong to the one-to-one relationship, a relationship between a car and drivers of the car may belong to the one-to-many relationship, and a relationship between persons and events may be the many-to-many relationship.
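The cardinality types above can be determined mechanically from the (head, tail) pairs of one edge type by counting how many nodes each side links to. This is a minimal sketch, not a procedure from the disclosure; the function name `cardinality` is hypothetical, and because the disclosure lists no many-to-one type, the sketch assumes a many-to-one pattern is reported as one-to-many seen from the other side.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def cardinality(pairs: Iterable[Tuple[str, str]]) -> str:
    """Classify (head, tail) pairs of one edge type by fan-out on each side."""
    tails_per_head = defaultdict(set)
    heads_per_tail = defaultdict(set)
    for head, tail in pairs:
        tails_per_head[head].add(tail)
        heads_per_tail[tail].add(head)
    head_fans_out = any(len(t) > 1 for t in tails_per_head.values())
    tail_fans_in = any(len(h) > 1 for h in heads_per_tail.values())
    if head_fans_out and tail_fans_in:
        return "many-to-many"
    if head_fans_out or tail_fans_in:
        # Assumption: many-to-one is reported as one-to-many from the other side.
        return "one-to-many"
    return "one-to-one"


print(cardinality([("person1", "file1"), ("person2", "file2")]))   # one-to-one
print(cardinality([("car1", "driver1"), ("car1", "driver2")]))     # one-to-many
print(cardinality([("p1", "ev1"), ("p1", "ev2"), ("p2", "ev1")]))  # many-to-many
```

The three calls mirror the person/personnel file, car/driver, and person/event examples given above.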

In some embodiments, the graph structure may further include property information. The property information may indicate characteristic(s) corresponding to the node(s) or the edge(s). For example, as shown in FIG. 2, the node 202 may include property information 222, the node 204 may include property information 224, the edge 212 may include property information 232, and the edge 214 may include property information 234. In some embodiments, the property information may include a plurality of properties and corresponding property values. For example, if a node is a person, property information of the node may include a name property (including a corresponding property value “Jack”), a gender property (including a corresponding property value “male”), an age property (including a corresponding property value “20”), etc. As another example, if a node is an event, property information of the node may include a name property (including a corresponding property value “walk”), a time property (including a corresponding property value “18:00”), a location property (including a corresponding property value “place A”), etc.

In some embodiments, the graph structure may further include one or more labels. A label may be configured to determine a type (or a group) of the graph structure. For example, graph structures with a same label may be determined as a same type (or group) .
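Property information and labels can be illustrated together: each element below carries a label and a map of properties to property values, elements are grouped by label, and a lookup filters one label by property values. The element identifiers and the second person ("Rose") are hypothetical sample data; the other values mirror the examples above, and the helpers `group_by_label` and `find` are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical elements: an id, a label, and property -> property value pairs.
elements = [
    {"id": "n1", "label": "person", "props": {"name": "Jack", "gender": "male", "age": "20"}},
    {"id": "n2", "label": "event", "props": {"name": "walk", "time": "18:00", "location": "place A"}},
    {"id": "n3", "label": "person", "props": {"name": "Rose", "gender": "female", "age": "22"}},
]


def group_by_label(items: List[dict]) -> Dict[str, List[str]]:
    """Elements sharing a label are treated as the same type (group)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for item in items:
        groups[item["label"]].append(item["id"])
    return dict(groups)


def find(items: List[dict], label: str, **conditions: str) -> List[str]:
    """Ids of elements with `label` whose property values equal all `conditions`."""
    return [
        item["id"]
        for item in items
        if item["label"] == label
        and all(item["props"].get(k) == v for k, v in conditions.items())
    ]


print(group_by_label(elements))                # {'person': ['n1', 'n3'], 'event': ['n2']}
print(find(elements, "person", gender="male"))  # ['n1']
```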

In some embodiments, the plurality of graph structures may be stored in the graph database (e.g., the database 130) directly. In some embodiments, the plurality of graph structures may be stored in the graph database through a key-value database or a document-oriented database. For example, the graph structure 200 may be stored in the database 130 through a Hadoop database (HBase), a Cassandra database, etc.

Merely by way of example, the graph database may be the JanusGraph database, and the HBase may be used as a storage terminal of the JanusGraph database.

The JanusGraph database may refer to an open-source and distributed graph database. The JanusGraph database may include a plurality of graph structures (e.g., the graph structure 200) established based on the property graph model. The plurality of graph structures in the JanusGraph database may be stored in the HBase.

The HBase may refer to an open-source and distributed key-value database. In some embodiments, the HBase may be used to store data and/or information corresponding to the plurality of graph structures based on one or more column families. Each of the one or more column families may include one or more columns. A column may represent a type of data and/or information. Each of the one or more columns may include one or more storage cells. Each of the one or more storage cells may be used to store one piece of data and/or information. For example, a storage cell may be used to store a key-value pair indicating one piece of data and/or information, wherein a key of the key-value pair corresponds to a column including the storage cell, and a value of the key-value pair corresponds to the piece of data and/or information. In some embodiments, each of one or more rows may include a plurality of storage cells that store data and/or information related to a same node. That is, data and/or information related to one node may be stored in a same row. For example, property information of a node, edges associated with the node, property information of the edges associated with the node, etc., may be stored in a plurality of storage cells of a same row. In some embodiments, each storage cell may be determined based on the rows and columns of the HBase. Merely by way of example, a storage structure of the HBase may be shown in Table 1.

Table 1. Storage Structure of HBase

In Table 1, a column (or a column family) “RowKey” may correspond to nodes. A column family “ColumnFamily: CF1” may represent a first column family including a column “Column: Name” and a column “Column: Alias.” The column “Column: Name” may correspond to names of the nodes, and the column “Column: Alias” may correspond to aliases of the nodes. A column family “ColumnFamily: CF2” may represent a second column family including a column “Column: Age” and a column “Column: Sex.” The column “Column: Age” may correspond to ages of the nodes, and the column “Column: Sex” may correspond to genders of the nodes. A column (or a column family) “TimeStamp” may correspond to versions of data and/or information related to the nodes. For example, a row corresponding to a node “rk001” may be used to store information related to the node “rk001,” such as a name “Jacky,” an alias “Jack,” an age “66,” a gender “male (M),” a version “T1,” etc., of the node “rk001.” As another example, a row corresponding to a node “rk002” may be used to store data and/or information related to the node “rk002,” such as a name “Johnson,” an alias “John,” an age “25,” a gender “M,” a version “T2,” etc., of the node “rk002.”
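The row/column-family/cell layout of Table 1 can be pictured with a minimal in-memory sketch. This is not the HBase client API: plain Python dicts stand in for an HBase table, the helpers `put` and `get` are hypothetical, and integer timestamps 1 and 2 stand in for the versions T1 and T2 (real HBase keeps versions per cell, which is how the sketch treats them).

```python
def put(table, row_key, family, column, value, ts):
    """Write one storage cell, keeping earlier versions keyed by timestamp."""
    versions = table.setdefault(row_key, {}).setdefault(family, {}).setdefault(column, {})
    versions[ts] = value


def get(table, row_key, family, column):
    """Read the newest version of one storage cell (largest timestamp)."""
    versions = table[row_key][family][column]
    return versions[max(versions)]


# Rows of Table 1: (CF1 columns, CF2 columns, timestamp standing in for T1/T2).
ROWS = {
    "rk001": ({"Name": "Jacky", "Alias": "Jack"}, {"Age": "66", "Sex": "M"}, 1),
    "rk002": ({"Name": "Johnson", "Alias": "John"}, {"Age": "25", "Sex": "M"}, 2),
}

table = {}
for row_key, (cf1, cf2, ts) in ROWS.items():
    for column, value in cf1.items():
        put(table, row_key, "CF1", column, value, ts)
    for column, value in cf2.items():
        put(table, row_key, "CF2", column, value, ts)

print(get(table, "rk001", "CF1", "Name"))  # Jacky
```

All data of one node lives under one row key, so reading a node's name, alias, age, and gender touches a single row.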

In some embodiments, when data and/or information related to a node is stored in the graph database, edges associated with the node and/or property information of those edges may be stored together with it. For example, as shown in FIG. 3, when a node (e.g., a node identity) 310 is stored (e.g., written) in a row of the HBase, information 320 related to the node 310 may be stored in the same row, wherein the information 320 may include property information 322 of the node 310, an edge 324 associated with the node 310, an edge 326 associated with the node 310, property information 328 of the edge 326, etc.

In some embodiments, when data and/or information related to an edge is stored in the graph database, the nodes associated with the edge may be queried, and the edge may be added to the data and/or information related to those nodes. For example, as shown in FIG. 3, when the edge 324 is stored in the graph database, the node 310 may be queried, and the edge 324 may be added to the information 320. In other words, when an edge is stored in the graph database, the corresponding nodes may be updated.
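The write path described above (a node row that carries its own edges, and an edge write that updates both endpoint rows) can be sketched as follows. This is a simplified illustration, not JanusGraph's actual storage format: `store_node`, `store_edge`, the second node "311", the label "knows", and the property "since" are all hypothetical.

```python
def store_node(db, node_id, properties):
    """Create the row for a node: its properties plus an (initially empty) edge map."""
    db[node_id] = {"properties": dict(properties), "edges": {}}


def store_edge(db, edge_id, head, tail, label, properties=None):
    """Store an edge by looking up both endpoint rows and adding the edge to each."""
    for node_id, direction, other in ((head, "out", tail), (tail, "in", head)):
        row = db[node_id]  # raises KeyError if the endpoint node was never stored
        row["edges"][edge_id] = {
            "label": label,
            "direction": direction,
            "other": other,
            "properties": dict(properties or {}),
        }


db = {}
store_node(db, "310", {"name": "Jacky"})
store_node(db, "311", {"name": "Johnson"})  # hypothetical second endpoint
store_edge(db, "326", "310", "311", "knows", {"since": "2020"})
```

Because each endpoint row receives a copy of the edge, later traversals can start from either node without a separate edge lookup.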

In some embodiments, the plurality of graph structures may be managed through a graph computing framework. For example, the plurality of graph structures may be managed through the TinkerPop framework, which may refer to a management framework for graph structures. The TinkerPop framework may be used for online transaction processing (OLTP), online analytical processing (OLAP), etc., of the graph structures. In some embodiments, the TinkerPop framework may include the Gremlin language, which may refer to the graph traversal language of the TinkerPop framework. In some embodiments, the Gremlin language may be used to manage the plurality of graph structures in the graph database. For example, a user may add a graph structure into a graph database through the Gremlin language. As another example, a user may delete, alter, query, etc., a graph structure in a graph database through the Gremlin language.

In some embodiments, the graph database may be connected to a search system (e.g., an Elasticsearch system (ES), a Solr system, etc.), which may improve the efficiency of data query on the graph database. For example, fuzzy query, geographic coordinate query, full-image query, etc., may be performed on the graph database through the Elasticsearch system.

In some embodiments, the graph database may further include indexes. An index may refer to a separate storage structure that is established to improve the efficiency of the data query. For example, an index (also referred to as a single index) corresponding to nodes or edges may be established in the graph database. When a graph database includes no single index, the data query performed on the graph database may be a full-image query, which is inefficient. When a graph database includes a single index, original nodes or original edges may be determined based on the single index, and then a traversal query may be performed starting from the original nodes or the original edges to obtain a target query result, which can improve the efficiency of the data query.

However, a single index only corresponds to one type of nodes or edges, so the original nodes or the original edges can only be determined based on that type. One query condition may correspond to one node or one edge. Therefore, when the data query includes one or two query conditions, the single index can improve the efficiency of the data query. When the data query includes a plurality of query conditions (e.g., more than two query conditions), the single index can only be used to determine the nodes or edges that satisfy one of the query conditions, so it barely improves the efficiency of the data query.

Merely by way of example, a data query for a target graph structure representing “a man walks in Park A” may be performed on a graph database. The target graph structure 400 is shown in FIG. 4A. As shown in FIG. 4A, a node 402 may indicate a person, property information 404 of the node 402 may indicate a gender “male,” an edge 406 (also referred to as a walking edge) may indicate that a person walks, a node 408 may indicate an event “walk,” and property information 410 of the node 408 may indicate a location “Park A.” Correspondingly, the data query may include a first query condition that a node “person” includes a gender property “male,” a second query condition that an edge indicates that a person walks, and a third query condition that a node “walk” includes a location property “Park A.” To perform the data query, a single index may be obtained, and then a plurality of candidate subjects (e.g., nodes or edges) that satisfy one of the three query conditions may be determined. Further, the target graph structure may be determined by identifying, in a traversal manner, the one or more candidate subjects that also satisfy the two other query conditions. For instance, as shown in FIG. 4B, a single index 420 may be established for nodes with the gender property. Nodes 430, …, 450, …, 470, etc., may be determined as the plurality of candidate subjects based on the first query condition that a node “person” includes a gender property “male.” That is, the nodes 430, …, 450, …, 470, etc., may be nodes with the gender property “male.” Then, whether each of the nodes 430, …, 450, …, 470, etc., satisfies the second query condition that an edge indicates that a person walks may be determined in the traversal manner. That is, whether the node 430 satisfies the second query condition may be determined first.
If the node 430 satisfies the second query condition (e.g., the node 430 is associated with a walking edge), whether the other node associated with the walking edge satisfies the third query condition may be determined. If the node 430 does not satisfy the second query condition (e.g., the node 430 is associated with no walking edge), whether a next node (e.g., the node 450) satisfies the second query condition may be determined. As shown in FIG. 4B, since the edges 432, 434, etc., are not walking edges, the node 430 does not satisfy the second query condition, and whether the node 450 satisfies the second query condition may then be determined. Since an edge 452 is a walking edge, the node 450 satisfies the second query condition, and whether a node 4522 satisfies the third query condition may then be determined. If the node 4522 satisfies the third query condition (e.g., property information 4524 includes a location property “Park A”), a graph structure corresponding to the node 450 may be determined as the target graph structure. If the node 4522 does not satisfy the third query condition (e.g., the property information 4524 does not include the location property “Park A”), whether a next node (e.g., the node 470) satisfies the second query condition may be determined, and the second query condition and the third query condition are checked in the same manner. Since the single index only corresponds to one query condition (e.g., one type of nodes or edges), the one or more candidate subjects that satisfy the two other query conditions have to be determined from the plurality of candidate subjects in the traversal manner. Therefore, when data is queried from the graph database based on the single index, one or more traversal processes may be performed, which reduces the efficiency of the data query and limits the application scope of the single index.
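The single-index query above can be made concrete with a small sketch. It assumes a hypothetical toy graph that loosely mirrors FIG. 4B (node and edge IDs reused only for orientation; node "436", node "4722", and the labels "knows" and "follows" are invented), and it counts how many edges the traversal has to inspect after the index has answered only the first query condition.

```python
# Hypothetical toy graph loosely mirroring FIG. 4B (IDs reused for orientation only).
NODES = {
    "430": {"label": "person", "gender": "male"},
    "436": {"label": "person", "gender": "female"},
    "450": {"label": "person", "gender": "male"},
    "470": {"label": "person", "gender": "male"},
    "4522": {"label": "walk", "location": "Park A"},
    "4722": {"label": "walk", "location": "Park B"},
}
EDGES = [  # (edge_id, head, tail, label)
    ("432", "430", "436", "knows"),
    ("434", "430", "436", "follows"),
    ("452", "450", "4522", "walks"),
    ("472", "470", "4722", "walks"),
]


def build_single_index(nodes, prop):
    """Single index: value of one node property -> list of node IDs."""
    index = {}
    for node_id, props in nodes.items():
        if prop in props:
            index.setdefault(props[prop], []).append(node_id)
    return index


def query_with_single_index(nodes, edges, gender_index):
    """Condition 1 comes from the index; conditions 2 and 3 need a traversal."""
    adjacency = {}  # outgoing edges per node, as kept in each node's row
    for edge_id, head, tail, label in edges:
        adjacency.setdefault(head, []).append((edge_id, tail, label))
    matches, edges_visited = [], 0
    for node_id in gender_index.get("male", []):  # first query condition
        for edge_id, tail, label in adjacency.get(node_id, []):
            edges_visited += 1  # every candidate edge is inspected one by one
            is_walk = label == "walks" and nodes[tail].get("label") == "walk"
            if is_walk and nodes[tail].get("location") == "Park A":
                matches.append((node_id, edge_id, tail))  # conditions 2 and 3 hold
    return matches, edges_visited


matches, visited = query_with_single_index(NODES, EDGES, build_single_index(NODES, "gender"))
print(matches, visited)  # [('450', '452', '4522')] 4
```

Even in this tiny graph, four edges are inspected to find one match; the cost grows with the number of candidates returned by the single index and their degree.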

In some embodiments, the graph database may include a target index (also referred to as a union index). The target index may refer to a mixed index for target edge(s). In some embodiments, the target index may be generated for at least one target edge that satisfies one or more target edge conditions. For example, if the one or more target edge conditions include a type condition (e.g., a many-to-many relationship and a dynamic edge), an edge whose type is both the many-to-many relationship and the dynamic edge may be designated as one of the at least one target edge. In some embodiments, the target index may include at least one record. Each record of the at least one record in the target index may be generated based on schema information of a target edge of the at least one target edge and property information of one or more nodes associated with the target edge. More descriptions regarding the generation of the target index may be found elsewhere in the present disclosure (e.g., FIGs. 6-8 and the descriptions thereof). In some embodiments, when the data query is performed on a graph database including the target index, at least one record in the target index may be determined based on one or more query conditions, and a query result may be determined based on the at least one record. The query result may include an edge of interest and one or more nodes associated with the edge of interest. More descriptions regarding the data query may be found elsewhere in the present disclosure (e.g., FIG. 9 and the descriptions thereof). By establishing the target index, the data query may be performed without the traversal process, which can improve the efficiency of the data query.
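To contrast with the single-index traversal, the following sketch shows how one record per target edge lets all three query conditions of the “a man walks in Park A” example be answered by index lookups alone. It assumes the target index behaves like an inverted index from (field, value) pairs to edge IDs, which is roughly how external index backends (e.g., Elasticsearch) serve mixed indexes; the record field names and the edges "472" and "482" are hypothetical.

```python
def build_union_index(records):
    """Inverted index over record fields: (field, value) -> set of edge IDs."""
    index = {}
    for edge_id, fields in records.items():
        for item in fields.items():
            index.setdefault(item, set()).add(edge_id)
    return index


def query_union_index(index, conditions):
    """Edges whose record matches every condition: set intersections, no traversal."""
    postings = [index.get(item, set()) for item in conditions.items()]
    return set.intersection(*postings) if postings else set()


# Hypothetical records: one per target edge, combining edge schema and node properties.
RECORDS = {
    "452": {"edge": "walks", "from_gender": "male", "to_location": "Park A"},
    "472": {"edge": "walks", "from_gender": "male", "to_location": "Park B"},
    "482": {"edge": "walks", "from_gender": "female", "to_location": "Park A"},
}

index = build_union_index(RECORDS)
hits = query_union_index(index, {"from_gender": "male", "edge": "walks", "to_location": "Park A"})
print(hits)  # {'452'}
```

The returned edge ID then identifies the edge of interest and, through its schema, the head and tail nodes, so no per-node traversal is needed.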

FIG. 5 is a block diagram illustrating an exemplary processing device 110 according to some embodiments of the present disclosure. In some embodiments, the modules illustrated in FIG. 5 may be implemented on the processing device 110. In some embodiments, the processing device 110 may be in communication with a computer-readable storage medium (e.g., the storage device in the database 130) and may execute instructions stored in the computer-readable storage medium. The processing device 110 may include a generation module 510 and a query module 550.

The generation module 510 may be configured to generate a data structure (e.g., a target index) for data (e.g., graph data, edges, etc.) in a graph database. In some embodiments, the generation module 510 may include a determination unit 512 and a generation unit 514.

The determination unit 512 may be configured to determine one or more target edge conditions. More descriptions regarding the determination of the one or more target edge conditions may be found elsewhere in the present disclosure. See, e.g., operation 602 and relevant descriptions thereof.

The generation unit 514 may be configured to generate, based on at least one target edge that satisfies the one or more target edge conditions, a target index including at least one record by generating each record of the at least one record in the target index based on schema information of a target edge of the at least one target edge and property information of one or more nodes associated with the target edge. More descriptions regarding the generation of the target index may be found elsewhere in the present disclosure. See, e.g., operation 604 and relevant descriptions thereof.

The query module 550 may be configured to perform a data query in the database. In some embodiments, the query module 550 may include an obtaining unit 552 and a determination unit 554.

The obtaining unit 552 may be configured to obtain one or more query conditions. Each of the one or more query conditions may include at least one of schema information of an edge of interest to be queried or property information of one or more nodes associated with the edge of interest to be queried. More descriptions regarding the obtaining of the one or more query conditions may be found elsewhere in the present disclosure. See, e.g., operation 902 and relevant descriptions thereof.

The determination unit 554 may be configured to determine, based on the one or more query conditions, at least one record in a target index. Each of the at least one record in the target index may be generated based on schema information of a target edge and property information of one or more nodes associated with the target edge. More descriptions regarding the determination of the at least one record may be found elsewhere in the present disclosure. See, e.g., operation 904 and relevant descriptions thereof.

In some embodiments, the determination unit 554 may be further configured to determine, based on the at least one record, a query result. The query result may include at least one edge of interest and the one or more nodes associated with each edge of interest. More descriptions regarding the determination of the query result may be found elsewhere in the present disclosure. See, e.g., operation 906 and relevant descriptions thereof.

The modules in the processing device 110 may be connected to or communicate with each other via a wired connection or a wireless connection. The wired connection may include a metal cable, an optical cable, a hybrid cable, or the like, or any combination thereof. The wireless connection may include a Local Area Network (LAN), a Wide Area Network (WAN), Bluetooth, ZigBee, Near Field Communication (NFC), or the like, or any combination thereof.

It should be noted that the above descriptions of the processing device 110 are provided for the purposes of illustration, and are not intended to limit the scope of the present disclosure. For persons having ordinary skill in the art, various variations and modifications may be conducted under the guidance of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure. In some embodiments, the processing device 110 may include one or more other modules. For example, the processing device 110 may include a storage module to store data generated by the modules in the processing device 110. In some embodiments, any two of the modules may be combined as a single module, and any one of the modules may be divided into two or more units. In some embodiments, the generation module 510 and the query module 550 may be set in different processing devices.

FIG. 6 is a flowchart illustrating an exemplary process 600 for data structure generation according to some embodiments of the present disclosure. In some embodiments, the process 600 may be implemented in the data query system 100 illustrated in FIG. 1. For example, the process 600 may be stored in a storage device (e.g., the storage device in the database 130, an external storage device) in the form of instructions (e.g., an application), and invoked and/or executed by the processing device 110. The operations of the process 600 presented below are intended to be illustrative. In some embodiments, the process 600 may be accomplished with one or more additional operations not described, and/or without one or more of the operations discussed. Additionally, the order in which the operations of the process 600 are illustrated in FIG. 6 and described below is not intended to be limiting.

A graph database can be used to store a plurality of graph structures. Each of the plurality of graph structures may include nodes and edges, so that the graph database can intuitively represent relationships between the nodes and respond quickly to relationship queries. However, if the data is directly queried from the graph database, a full-image query needs to be performed on the graph database, which is time-consuming and inefficient.

In some embodiments, the efficiency of the data query may be improved in various ways. For example, a single index for nodes or edges may be established for the graph database. When the data query is performed, original nodes or original edges may be determined based on the single index, and then a traversal process may be performed on the original nodes or the original edges to obtain a target query result. However, when the data query includes a plurality of query conditions, one or more traversal processes may be performed, which reduces the efficiency of the data query and the application scope of the single index.

As another example, nodes or edges in the graph database may be divided into a plurality of sub-regions, and indexes corresponding to the nodes or edges may be established in the corresponding sub-regions. The data query may then be performed on the plurality of sub-regions based on the indexes to obtain a target query result. However, this approach improves the efficiency of the data query only by improving the performance of the indexes; it cannot simplify or avoid the traversal process during the data query.

As yet another example, candidate query results may be predetermined for the graph database based on candidate query condition(s) that can be used to query the graph database, and stored in a storage device or an Elasticsearch system. When the data query is performed, a target query result may be obtained based on the candidate query results and the candidate query condition(s). However, this substantially increases the workload when the graph database is established or modified. Further, the candidate query results may occupy a large amount of storage space, which burdens the storage device of the graph database. In addition, when the data query includes a plurality of query conditions, the efficiency and accuracy of the data query can be reduced.

Therefore, to improve the efficiency of the data query and to simplify or avoid the traversal process during the data query, the process 600 may be performed to generate a target index (also referred to as a union index).

In 602, the processing device 110 (e.g., the generation module 510, or the determination unit 512) may determine one or more target edge conditions.

The one or more target edge conditions may be configured to determine whether an edge to be processed is one of at least one target edge. In some embodiments, the one or more target edge conditions may include a type condition (also referred to as a first condition), a query condition (also referred to as a second condition), or the like, or any combination thereof.

The first condition may relate to an edge type. For example, the first condition may include a static edge, a dynamic edge, or a combination thereof. As another example, the first condition may include a one-to-one relationship, a one-to-many relationship, a many-to-many relationship, or the like, or any combination thereof.

The second condition may relate to data query on the graph database. For example, the second condition may relate to a count of query times, a query frequency, a query mode, a query efficiency, etc. Merely by way of example, the second condition may include a count threshold (e.g., 100 times, 200 times, 300 times, 500 times, 800 times, 1000 times, etc.). The count threshold may be a system default or may be set manually by a user (e.g., a programmer, a manager, an operator, etc., of the graph database).

In some embodiments, the processing device 110 may determine the one or more target edge conditions based on an instruction input by the user. For example, the user may input an instruction related to the one or more target edge conditions through an input device, and the processing device 110 may determine the one or more target edge conditions based on the instruction.

In some embodiments, the processing device 110 may automatically determine the one or more target edge conditions. For example, the processing device 110 may determine the one or more target edge conditions based on reference graph database(s) that include target indexes. For instance, the processing device 110 may select, from the reference graph database(s), the reference graph database that is most similar to the graph database, and determine the one or more target edge conditions based on reference condition(s) corresponding to that reference graph database. The similarity between the graph database and a reference graph database may be determined based on the types, volumes, etc., of graph structures stored in the graph database and those stored in the reference graph database.
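The disclosure names types and volumes of graph structures as inputs to the similarity but does not fix a metric. The sketch below assumes, purely for illustration, that each database is summarized as a map from graph-structure type to volume and that cosine similarity over those maps is used; the type names, volumes, and reference conditions are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two sparse count vectors given as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def conditions_from_reference(profile, references):
    """Return the target edge conditions of the most similar reference database.

    profile: graph-structure type -> volume for the current graph database.
    references: list of (profile, conditions) pairs for reference databases.
    """
    _, conditions = max(references, key=lambda ref: cosine_similarity(profile, ref[0]))
    return conditions


current = {"many_to_many": 900, "one_to_one": 100}
references = [
    ({"many_to_many": 800, "one_to_one": 200}, {"relationships": {"many-to-many"}}),
    ({"many_to_many": 50, "one_to_one": 950}, {"relationships": {"one-to-one"}}),
]
print(conditions_from_reference(current, references))  # {'relationships': {'many-to-many'}}
```

Cosine similarity compares the mix of structure types but ignores absolute size; if total volume matters, a size term would have to be added to the score.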

As another example, the processing device 110 may determine the one or more target edge conditions according to a system default setting. For instance, since a large amount of data corresponds to the many-to-many relationship and/or the dynamic edge, the processing device 110 may use the first condition including the many-to-many relationship and/or the dynamic edge as a default target edge condition. That is, when no instruction is input by the user, the processing device 110 may determine the first condition including the many-to-many relationship and/or the dynamic edge as the one or more target edge conditions.

In 604, the processing device 110 (e.g., the generation module 510, or the generation unit 514) may generate, based on the at least one target edge that satisfies the one or more target edge conditions, the target index including at least one record by generating each of the at least one record in the target index based on schema information of a target edge of the at least one target edge and property information of one or more nodes associated with the target edge.

A target edge may refer to an edge to be processed that satisfies the one or more target edge conditions. An edge to be processed may refer to an edge that needs to be stored in the graph database, or an edge for which it needs to be determined whether a corresponding record should be generated.

The target index may refer to a mixed index for the at least one target edge. For example, the target index may be a data structure including the at least one record generated based on the at least one target edge. In some embodiments, the target index may be configured to determine, based on one or more query conditions, an edge of interest to be queried. The edge of interest may refer to an edge to be queried according to the one or more query conditions of the user. Each of the one or more query conditions may include at least one of schema information of the edge of interest to be queried or property information of one or more nodes associated with the edge of interest to be queried. More descriptions regarding the data query may be found elsewhere in the present disclosure (e.g., FIG. 9 and the descriptions thereof).

In some embodiments, the processing device 110 may determine whether the target index needs to be generated. For example, the processing device 110 may obtain information (or data) of edge(s) to be processed (e.g., edge(s) stored in the graph database or edge(s) to be stored in the graph database), and determine whether each of the edge(s) satisfies the one or more target edge conditions. If one or more edges satisfy the one or more target edge conditions, the processing device 110 may generate the target index based on the one or more edges. If no edge satisfies the one or more target edge conditions, the processing device 110 may not generate the target index. Alternatively, the processing device 110 may generate an edge index (e.g., a single index) for an edge that does not satisfy the one or more target edge conditions.

In some embodiments, the one or more target edge conditions may include the first condition and/or the second condition. Accordingly, the processing device 110 may determine whether the edge to be processed satisfies the first condition and/or the second condition. If the edge to be processed satisfies the first condition and/or the second condition, the processing device 110 may designate the edge to be processed as one of the at least one target edge. Merely by way of example, the one or more target edge conditions may include the first condition that an edge type is the dynamic edge. The processing device 110 may determine whether the edge type of an edge to be processed is the dynamic edge. If it is, the processing device 110 may designate the edge to be processed as one of the at least one target edge. As another example, the one or more target edge conditions may include the second condition that a count of query times of the edge to be processed exceeds a count threshold (e.g., 100 times, 200 times, 300 times, 500 times, 800 times, 1000 times, etc.). The processing device 110 may obtain historical query data relating to the edge to be processed, such as the count of query times of the edge to be processed. The historical query data may be obtained from a query log in the graph database or an external storage device. The processing device 110 may determine whether the historical query data relating to the edge to be processed satisfies the second condition (e.g., whether the count of query times of the edge to be processed exceeds the count threshold). If the count of query times of the edge to be processed exceeds the count threshold, the processing device 110 may designate the edge to be processed as one of the at least one target edge.

In some embodiments, if no target edge condition is specified, the processing device 110 may generate the target index based on all edges to be processed.
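The selection of target edges described above can be sketched as a predicate over each edge to be processed. Several choices here are assumptions rather than the disclosed method: all configured conditions are combined with a logical AND (the text allows “and/or”), `None` means “no user instruction, use the system default” while an empty dict means “no condition, accept every edge,” and the names `EdgeToProcess`, `DEFAULT_CONDITIONS`, and the condition keys are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EdgeToProcess:
    edge_id: str
    dynamic: bool        # first condition: dynamic vs. static edge
    relationship: str    # "one-to-one", "one-to-many", or "many-to-many"
    query_count: int     # second condition: read from a query log


# Assumed system default: dynamic, many-to-many edges.
DEFAULT_CONDITIONS = {"dynamic": True, "relationships": {"many-to-many"}}


def is_target_edge(edge, conditions):
    """True if the edge meets every configured condition (an empty dict accepts all)."""
    if "dynamic" in conditions and edge.dynamic != conditions["dynamic"]:
        return False
    if "relationships" in conditions and edge.relationship not in conditions["relationships"]:
        return False
    if "count_threshold" in conditions and edge.query_count <= conditions["count_threshold"]:
        return False  # the count must exceed the threshold
    return True


def select_target_edges(edges, conditions=None):
    """Return IDs of target edges; None falls back to the assumed default conditions."""
    conditions = DEFAULT_CONDITIONS if conditions is None else conditions
    return [e.edge_id for e in edges if is_target_edge(e, conditions)]


edges = [
    EdgeToProcess("e1", True, "many-to-many", 50),
    EdgeToProcess("e2", False, "one-to-one", 900),
    EdgeToProcess("e3", True, "many-to-many", 300),
]
print(select_target_edges(edges))                            # ['e1', 'e3']
print(select_target_edges(edges, {"count_threshold": 100}))  # ['e2', 'e3']
```

Edges rejected here would, in the alternative described above, receive an ordinary single index instead.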

In some embodiments, in response to determining that an edge to be processed satisfies the one or more target edge conditions (i.e., that it is a target edge), the processing device 110 may generate a record of the at least one record in the target index. For example, for each of the at least one target edge, the processing device 110 may generate a record corresponding to the target edge based on schema information of the target edge and property information of one or more nodes associated with the target edge. For illustration purposes, in the present disclosure, generating a record may be regarded as generating the target index.

In some embodiments, schema information of an edge may refer to information for describing the edge. Exemplary schema information may include name information, identity information, structure information, union information, property information, or the like, or any combination thereof. The name information may refer to a name of the edge. The identity information may refer to an identity of the edge in the graph database. In some embodiments, the identity of the edge may be determined when the edge is stored in the graph database. For example, the identity of the edge may be determined according to a default system setting. For instance, the identity of the edge may be a number corresponding to the order in which the edge is stored in the graph database, such as 1, 2, 3, ..., N (N is a positive integer). The structure information may refer to a connection structure of the edge. For example, the structure information may include the nodes (e.g., a head node and a tail node) connected by the edge. The property information may refer to one or more properties of the edge, such as an address property indicating where the edge is stored, a time property indicating when the edge is stored, etc.

In some embodiments, the processing device 110 may obtain the schema information of the edge from a graph database (e.g., the database 130, the storage device in the database 130). In some embodiments, the processing device 110 may obtain the schema information of the edge from an external storage device or an external database (e.g., an HBase) connected to the graph database.

In some embodiments, the schema information of the edge may further include union information. The union information may indicate whether a record is to be generated in the target index based on the edge. For example, the union information may take a value “True” or a value “False,” wherein the value “True” indicates that a record needs to be generated in the target index based on the edge, and the value “False” indicates that no record needs to be generated based on the edge.

In some embodiments, the processing device 110 may determine the union information of the edge based on whether the edge satisfies the one or more target edge conditions. If the edge satisfies the one or more target edge conditions, the union information may be determined as the value “True.” If the edge does not satisfy the one or more target edge conditions, the union information may be determined as the value “False.” Accordingly, the processing device 110 may determine, based on the union information of the edge, whether a record in the target index needs to be generated for the edge.
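The schema information fields and the derivation of the union information can be gathered into one small record type. This is a sketch under stated assumptions: the class `EdgeSchema`, the helper `register_edges`, the predicate passed to it, and the sample edges and property value are hypothetical; identities follow the storage-order numbering (1, 2, ..., N) given above as one example.

```python
from dataclasses import dataclass, field


@dataclass
class EdgeSchema:
    name: str                 # name information, e.g. "hasEvent"
    identity: int             # identity information: storage order 1, 2, ..., N
    head: str                 # structure information: head node
    tail: str                 # structure information: tail node
    properties: dict = field(default_factory=dict)  # property information
    union: bool = False       # union information: True -> generate a record


def register_edges(raw_edges, satisfies_conditions):
    """Assign identities in storage order and derive the union information of each edge."""
    schemas = []
    for identity, (name, head, tail, props) in enumerate(raw_edges, start=1):
        schema = EdgeSchema(name, identity, head, tail, dict(props))
        schema.union = bool(satisfies_conditions(schema))
        schemas.append(schema)
    return schemas


raw = [("hasEvent", "person", "event", {"relationTime": "t0"}),
       ("knows", "person", "person", {})]
schemas = register_edges(raw, lambda s: s.name == "hasEvent")  # hypothetical predicate
print([(s.identity, s.union) for s in schemas])  # [(1, True), (2, False)]
```

Storing the flag on the schema means the record-generation step only has to read `union` rather than re-evaluate the target edge conditions.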

In some embodiments, the target index may include data structure(s) used to generate each record. For example, the processing device 110 may generate each record of the at least one record in the target index by updating the data structure based on the schema information of the target edge of the at least one target edge and the property information of the one or more nodes associated with the target edge.

Merely by way of example, an exemplary general data structure is shown below:

{
    Name of target index,
    Property list of target index
    [
        Property information of edge,
        Property information of nodes
    ]
}.

For each record, the processing device 110 may update the “Property information of edge” in the data structure based on the schema information of the target edge, and update the “Property information of nodes” in the data structure based on the property information of the one or more nodes associated with the target edge.
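The per-record update of the general structure can be sketched as copying a template and filling its two slots. The template layout, the helper `fill_record`, and the index name "union_idx" are hypothetical; the sketch only mirrors the shape of the structure shown above.

```python
import copy

# Hypothetical template mirroring the general structure: a name plus a property
# list whose two slots hold edge information and node information.
TEMPLATE = {"name": None, "property_list": [{}, {}]}


def fill_record(index_name, edge_info, node_info):
    """Copy the template and update its two slots for one target edge."""
    record = copy.deepcopy(TEMPLATE)  # deep copy so the shared template is never mutated
    record["name"] = index_name
    record["property_list"][0].update(edge_info)  # "Property information of edge"
    record["property_list"][1].update(node_info)  # "Property information of nodes"
    return record


record = fill_record("union_idx", {"name": "walks"}, {"from_gender": "male", "to_location": "Park A"})
```

The deep copy matters: a shallow copy would share the inner dicts, so every record would write into the same template slots.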

In some embodiments, the processing device 110 may adjust the data structure for different edge types of the target edge. For example, the processing device 110 may establish a plurality of candidate data structures, and determine a target data structure for the target edge based on a configuration rule related to, for example, a count of query times, a frequency of data query, an edge type of the target edge, etc.

Each record may include a pair of a key and a value (also referred to as a key-value pair). The key may include the schema information of the target edge and the property information of the one or more nodes associated with the target edge, and the value may identify the target edge. For example, the value may be the identity information of the target edge. In some embodiments, the processing device 110 may filter the schema information of the target edge and the property information of the one or more nodes associated with the target edge in the record. For example, the processing device 110 may obtain filtered information by filtering the schema information of the target edge and the property information of the one or more nodes associated with the target edge, and generate the record in the target index based on the filtered information. More descriptions regarding the generation of the record may be found elsewhere in the present disclosure (e.g., FIG. 8 and the descriptions thereof).
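A key-value record with optional filtering might look like the sketch below. The helper `make_key_value`, the `keep` whitelist used as the filtering step, and the sample identity and property values are hypothetical; properties are sorted into tuples only so that the key is hashable and deterministic.

```python
def make_key_value(edge, head_props, tail_props, keep=None):
    """Build one record: key = edge schema + node properties, value = edge identity.

    keep: optional set of property names to retain (the filtering step); None keeps all.
    """
    def filtered(props):
        return tuple(sorted((k, v) for k, v in props.items() if keep is None or k in keep))

    key = (
        edge["name"],
        filtered(edge["properties"]),
        filtered(head_props),
        filtered(tail_props),
    )
    return key, edge["identity"]


edge = {"name": "hasEvent", "identity": 7,
        "properties": {"relationTime": "t0", "relationAddress": "addr"}}
key, value = make_key_value(edge, {"num": 1, "name": "Jacky"}, {"num": 2, "tag": "walk"},
                            keep={"relationTime", "name", "tag"})
target_index = {key: value}  # the record as one entry of a dict-based index
```

Filtering before building the key keeps the index small: only properties that queries actually use end up in the record.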

In some embodiments, the key may include one or more property prefixes configured to identify the property information of the one or more nodes. The one or more property prefixes may be preset characters or strings, for example, “str,” “prefix_,” etc. For example, the processing device 110 may prepend a property prefix to the property information of the one or more nodes in the key. For instance, a property prefix “from_” may be prepended to the property information of the head node, and a property prefix “to_” may be prepended to the property information of the tail node, so that the property information of the head node can be distinguished from that of the tail node even if the two nodes have properties with the same name. Merely by way of example, the name information of the head node may be marked as “from_name,” and the name information of the tail node may be marked as “to_name.”
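The prefixing rule can be sketched in Python as below. Both nodes in the Table 2 example carry a “num” property, so without prefixes one value would overwrite the other. The function name and the property values are hypothetical.

```python
def prefix_node_properties(head_props, tail_props, head_prefix="from_", tail_prefix="to_"):
    """Merge head- and tail-node properties without name collisions."""
    merged = {}
    for name, value in head_props.items():
        merged[head_prefix + name] = value
    for name, value in tail_props.items():
        merged[tail_prefix + name] = value
    return merged


# Hypothetical values; both nodes share the property name "num".
head = {"num": 1, "sex": "male"}
tail = {"num": 7, "tag": "walk"}
merged = prefix_node_properties(head, tail)
print(merged)
# {'from_num': 1, 'from_sex': 'male', 'to_num': 7, 'to_tag': 'walk'}
```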

Merely by way of example, referring to Tables 2 and 3, a record corresponding to a target edge in a target index may be generated. Two nodes associated with the target edge may be a head node “person” and a tail node “event, ” wherein the node “person” includes property information “num, ” “name, ” and “sex, ” and the node “event” includes property information “num” and “tag. ” The two nodes and the corresponding property information may be shown in Table 2.

Table 2. Nodes and Information of Nodes

In Table 2, a column “propertyKeys” may indicate the property information of the two nodes.

The target edge may include a relationship name “hasEvent” that associates the two nodes. The target edge may include union information with a value “True. ” Further, the target edge may include address property information “relationAddress” and time property information “relationTime. ” The information of the target edge may be shown in Table 3.

Table 3. Information of Edge

In Table 3, a column “relationName” may indicate name information of the target edge, a column “fromNode” and a column “toNode” may indicate node information of the target edge, a column “Union” may indicate the union information of the edge, and a column “PropertyKey” may indicate the property information of the target edge.

The record corresponding to the target edge in the target index may be generated as follows:

{
  "name": "hasEvent",
  "propertyKeys": [
    "relationAddress",
    "relationTime",
    "from_num",
    "from_name",
    "from_sex",
    "to_num",
    "to_tag"
  ]
}
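The record can be derived mechanically from the node properties listed for Table 2 and the edge information described for Table 3. A sketch in Python, assuming the table contents are available as dictionaries whose keys reuse the column names “relationName,” “fromNode,” “toNode,” “Union,” and “PropertyKey”; `build_index_record` is a hypothetical helper. Because the tail node “event” carries “num” and “tag,” its prefixed keys come out as “to_num” and “to_tag.”

```python
# Node property information as described for Table 2.
NODES = {
    "person": ["num", "name", "sex"],
    "event": ["num", "tag"],
}
# Edge information as described for Table 3.
EDGE = {
    "relationName": "hasEvent",
    "fromNode": "person",
    "toNode": "event",
    "Union": True,
    "PropertyKey": ["relationAddress", "relationTime"],
}


def build_index_record(edge, nodes):
    """Combine edge properties with prefixed head/tail node properties."""
    keys = list(edge["PropertyKey"])
    keys += ["from_" + p for p in nodes[edge["fromNode"]]]
    keys += ["to_" + p for p in nodes[edge["toNode"]]]
    return {"name": edge["relationName"], "propertyKeys": keys}


print(build_index_record(EDGE, NODES))
```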

In some embodiments, the processing device 110 may store the record corresponding to the edge in the Elasticsearch system. For example, the processing device 110 may input the record through an application program interface (API) of the graph database, and store the record corresponding to the edge in the Elasticsearch system.
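One way a record could reach Elasticsearch is the `_bulk` REST endpoint, whose body is newline-delimited JSON: an action line followed by a document line, with a trailing newline. The sketch below only builds such a body with the standard library and does not send it. The index name `target_index`, the document id, and the document fields are assumptions, not values given in the disclosure, and the disclosure itself routes the record through the graph database's API.

```python
import json


def bulk_body(index_name, records):
    """Build an NDJSON body for the Elasticsearch _bulk endpoint.

    `records` is an iterable of (doc_id, document) pairs; each pair becomes an
    "index" action line followed by the document source line.
    """
    lines = []
    for doc_id, document in records:
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(document))
    # The _bulk API expects every line, including the last, to end with "\n".
    return "\n".join(lines) + "\n"


# Hypothetical record for edge "e1": edge name plus prefixed node properties.
doc = {"name": "hasEvent", "from_sex": "male", "to_tag": "walk", "edge_id": "e1"}
body = bulk_body("target_index", [("e1", doc)])
print(body, end="")
```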

In some embodiments, if the edge to be processed does not satisfy the one or more target edge conditions, the processing device 110 may store the edge (e.g., information of the edge) in the graph database (e.g., the database 130) or an external database (e.g., HBase). In some embodiments, the processing device 110 may generate an edge index (e.g., a single index) for the edge, and store the edge index in a database (e.g., the database 130) or an external database (e.g., HBase).

Merely by way of example, FIG. 7 is a flowchart illustrating an exemplary process 700 for storing an edge in a graph database according to some embodiments of the present disclosure.

In 702, the processing device 110 may determine union information of an edge to be processed. For example, the processing device 110 may determine the union information of the edge to be processed based on a determination result of whether the edge to be processed satisfies one or more target edge conditions.

In 704, the processing device 110 may determine whether the union information of the edge satisfies a third condition. The third condition may be that the value of the union information is “True.”

If the union information of the edge does not satisfy the third condition (e.g., the value of the union information is “False”), the process 700 may proceed to operation 706.

In 706, the processing device 110 may store information of the edge in HBase.

If the union information of the edge satisfies the third condition (e.g., the value of the union information is “True”), the process 700 may proceed to operation 708.

In 708, the processing device 110 may generate a record in a target index for the edge. For example, the processing device 110 may generate a record including schema information of the edge and property information of one or more nodes associated with the edge.

In 710, the processing device 110 may store the edge in the HBase and store the record in an Elasticsearch system (ES) .
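Process 700 can be sketched as the routing function below. It is an illustration only: plain dictionaries stand in for HBase and the Elasticsearch system, and the edge fields `id`, `label`, and `union` as well as the `make_record` callback are hypothetical.

```python
def store_edge(edge, hbase, es_index, make_record):
    """Route one edge following operations 702-710 of process 700.

    `hbase` and `es_index` are plain dicts standing in for HBase and the
    Elasticsearch system; `make_record` builds the target-index record.
    """
    # 702/704: read the union information and test the third condition.
    if edge.get("union") is True:
        # 708: generate a record in the target index for the edge.
        record = make_record(edge)
        # 710: store the edge in HBase and the record in Elasticsearch.
        hbase[edge["id"]] = edge
        es_index[edge["id"]] = record
    else:
        # 706: store only the information of the edge in HBase.
        hbase[edge["id"]] = edge


def make_record(edge):
    """Hypothetical record builder: keeps only the edge name."""
    return {"name": edge["label"]}


hbase, es_index = {}, {}
store_edge({"id": "e1", "label": "hasEvent", "union": True}, hbase, es_index, make_record)
store_edge({"id": "e2", "label": "knows", "union": False}, hbase, es_index, make_record)
print(sorted(hbase), es_index)  # ['e1', 'e2'] {'e1': {'name': 'hasEvent'}}
```

Every edge lands in HBase either way; the union information only decides whether an extra target-index record is written.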

According to some embodiments of the present disclosure, the one or more target edge conditions may be determined, and the target index may be generated from the at least one target edge that satisfies the one or more target edge conditions by generating each record of the at least one record based on the schema information of the target edge and the property information of the one or more nodes associated with the target edge, which requires only simple operations. In addition, because the target index is generated by adding the property information of the one or more nodes associated with the target edge to the schema information of the target edge, it needs only a small amount of storage space and does not affect the storage device of the graph database, thereby ensuring the normal operation of the graph database. Further, when the target index is used for data query, the query language may be the same as that of the graph database. For example, if the graph database uses the Gremlin language, the data query through the target index can also use the Gremlin language, so no additional query language is needed, which reduces adjustments to the graph database and simplifies the application of the target index. Moreover, by introducing the target index, the traversal process during the data query can be reduced or avoided, which can improve the efficiency of the data query.

It should be noted that the description of the process 600 is provided for the purposes of illustration, and is not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, various variations and modifications may be conducted under the teaching of the present disclosure. However, those variations and modifications may not depart from the protection of the present disclosure. For example, the data structure of the target index may be determined before operation 602. As another example, the union information of each edge may be determined between operation 602 and operation 604.

FIG. 8 is a flowchart illustrating an exemplary process 800 for generating a record in a target index according to some embodiments of the present disclosure. In some embodiments, the process 800 may be performed to achieve at least part of operation 604 as described in connection with FIG. 6.

In 802, the processing device 110 (e.g., the generation module 510, or the generation unit 514) may obtain filtered information by filtering schema information of a target edge and property information of one or more nodes associated with the target edge.

The filtered information may refer to information for configuring a record in a target index after the schema information of the edge and/or the property information of the nodes are filtered based on actual requirement (s) of a user. In some embodiments, the filtered information may include filtered information of the target edge and filtered information of the one or more nodes associated with the target edge. For example, the schema information of the target edge and/or the property information of the one or more nodes associated with the target edge may be filtered based on a graph data scenario, a query requirement (e.g., a potential query time and/or a potential query frequency) , an instruction input by the user, etc., to determine the filtered information.

In some embodiments, the processing device 110 may determine the filtered information based on historical query data. The historical query data may include a count of query times of each property information, a query frequency of each property information, etc. For example, the processing device 110 may determine a count threshold, and determine property information whose count of query times exceeds the count threshold as the filtered information. As another example, the processing device 110 may determine a frequency threshold, and determine property information whose query frequency exceeds the frequency threshold as the filtered information. In some embodiments, the processing device 110 may obtain the historical query data from a graph database (e.g., a storage device of the graph database) .
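The threshold-based selection can be sketched in Python as follows, assuming the historical query data is available as a mapping from property name to a count and a frequency. The function name, the threshold values, and the statistics are made-up illustrations.

```python
def select_properties(history, count_threshold=None, frequency_threshold=None):
    """Return property names whose historical query statistics exceed a threshold.

    `history` maps a property name to {"count": int, "frequency": float}.
    """
    selected = []
    for name, stats in history.items():
        if count_threshold is not None and stats["count"] > count_threshold:
            selected.append(name)
        elif frequency_threshold is not None and stats["frequency"] > frequency_threshold:
            selected.append(name)
    return selected


# Made-up historical query statistics for illustration only.
history = {
    "from_num": {"count": 120, "frequency": 4.0},
    "from_name": {"count": 95, "frequency": 3.1},
    "from_sex": {"count": 3, "frequency": 0.1},
    "relationTime": {"count": 8, "frequency": 0.2},
}
print(select_properties(history, count_threshold=50))  # ['from_num', 'from_name']
```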

In 804, the processing device 110 (e.g., the generation module 510, or the generation unit 514) may generate, based on the filtered information, a record in the target index.

For example, the processing device 110 may update a data structure based on the filtered information to generate the record in the target index.

Merely by way of example, referring to Tables 2 and 3, if the filtered information includes the property information “num” and “name” of the head node “person, ” the property information “num” and “tag” of the tail node “event, ” and the address property “relationAddress” of the edge, the record corresponding to the edge in the target index may be generated as follows:

{
  "name": "hasEvent",
  "propertyKeys": [
    "relationAddress",
    "from_num",
    "from_name",
    "to_num",
    "to_tag"
  ]
}

That is, the property information “sex” of the head node “person” and the time property “relationTime” of the edge may be removed.
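Applying the filtered information to the full record amounts to keeping only the selected keys. A minimal Python sketch, assuming the tail node's “num” property is keyed as “to_num” following the “to_” prefix convention; `FULL_RECORD`, `filter_record`, and `keep` are hypothetical names.

```python
FULL_RECORD = {
    "name": "hasEvent",
    "propertyKeys": [
        "relationAddress", "relationTime",
        "from_num", "from_name", "from_sex",
        "to_num", "to_tag",
    ],
}


def filter_record(record, keep):
    """Keep only the property keys listed in `keep`, preserving their order."""
    kept = [key for key in record["propertyKeys"] if key in keep]
    return {"name": record["name"], "propertyKeys": kept}


# Filtered information: num/name of "person", num/tag of "event", edge address.
keep = {"relationAddress", "from_num", "from_name", "to_num", "to_tag"}
filtered = filter_record(FULL_RECORD, keep)
print(filtered)
# {'name': 'hasEvent', 'propertyKeys': ['relationAddress', 'from_num', 'from_name', 'to_num', 'to_tag']}
```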

According to some embodiments of the present disclosure, the schema information of the target edge and the property information of the one or more nodes associated with the target edge can be filtered before the generation of the record, which can reduce the storage space occupied by the target index (e.g., the record (s) in the target index) while meeting the query requirements of the user.

It should be noted that the description of the process 800 is provided for the purposes of illustration, and is not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, various variations and modifications may be conducted under the teaching of the present disclosure. However, those variations and modifications may not depart from the protection of the present disclosure.

FIG. 9 is a flowchart illustrating an exemplary process 900 for data query according to some embodiments of the present disclosure. In some embodiments, the process 900 may be implemented in the data query system 100 illustrated in FIG. 1. For example, the process 900 may be stored in a storage device (e.g., the storage device in the database 130, an external storage device) in the form of instructions (e.g., an application) , and invoked and/or executed by the processing device 110. The operations of the process 900 presented below are intended to be illustrative. In some embodiments, the process 900 may be accomplished with one or more additional operations not described, and/or without one or more of the operations discussed. Additionally, the order in which the operations of the process 900 as illustrated in FIG. 9 and described is not intended to be limiting.

In 902, the processing device 110 (e.g., the query module 550, or the obtaining unit 552) may obtain one or more query conditions. Each of the one or more query conditions may include at least one of schema information of an edge of interest to be queried or property information of one or more nodes associated with the edge of interest to be queried.

The one or more query conditions may be used to query the edge of interest. Each of the one or more query conditions may correspond to a node or an edge. In some embodiments, different query conditions may correspond to different nodes or different edges. For example, to query a target graph structure “a man walks in Park A” in a graph database, the one or more query conditions may include a first query condition that a node “person” includes a gender property “male,” a second query condition that an edge indicates that a person walks, and a third query condition that a node “walk” includes a location property “Park A.”

In 904, the processing device 110 (e.g., the query module 550, or the determination unit 554) may determine, based on the one or more query conditions, at least one record in a target index. Each record of the at least one record in the target index may be generated based on schema information of a target edge of at least one target edge that satisfies one or more target edge conditions and property information of one or more nodes associated with the target edge.

In some embodiments, each record may include a pair of a key and a value. The key may include the schema information of the target edge and the property information of the one or more nodes associated with the target edge, and the value may correspond to the target edge. More descriptions regarding the target index and the record may be found elsewhere in the present disclosure (e.g., FIGs. 6-8 and the descriptions thereof).

In some embodiments, the processing device 110 may determine the at least one record in the target index that satisfies the one or more query conditions. For example, if a record includes the one or more query conditions, the processing device 110 may determine the record as one of the at least one record.

Merely by way of example, when the one or more query conditions include the first query condition that a node “person” includes a gender property “male, ” the second query condition that an edge indicates a person walks, and the third query condition that a node “walk” includes a location property “Park A, ” the processing device 110 may determine the record including a person with the gender property “male, ” an edge indicating a person walks, and an event “walk” with the location property “Park A” as one of the at least one record.
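The matching rule can be sketched in Python as below, assuming each indexed record is a flat document keyed by edge identity information and each query condition is an exact-match property constraint. The documents and property names such as `to_location` are hypothetical.

```python
def matches(document, conditions):
    """Return True if an index document satisfies every key/value condition."""
    return all(document.get(key) == value for key, value in conditions.items())


def find_records(index, conditions):
    """Return (edge_id, document) pairs whose documents satisfy all conditions."""
    return [(eid, doc) for eid, doc in index.items() if matches(doc, conditions)]


# Hypothetical index documents keyed by edge identity information.
index = {
    "e1": {"name": "hasEvent", "from_sex": "male", "to_tag": "walk", "to_location": "Park A"},
    "e2": {"name": "hasEvent", "from_sex": "female", "to_tag": "walk", "to_location": "Park A"},
    "e3": {"name": "hasEvent", "from_sex": "male", "to_tag": "walk", "to_location": "Park B"},
}
# The first, second, and third query conditions expressed as one constraint set.
conditions = {"from_sex": "male", "name": "hasEvent", "to_tag": "walk", "to_location": "Park A"}
hits = find_records(index, conditions)
print([eid for eid, _ in hits])  # ['e1']
```

A real search engine such as Elasticsearch would answer this with inverted-index lookups rather than a linear scan; the scan here only illustrates which records qualify.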

In 906, the processing device 110 (e.g., the query module 550, or the determination unit 554) may determine, based on the at least one record, a query result. The query result may include the edge of interest and the one or more nodes associated with the edge of interest.

In some embodiments, once the at least one record is determined, the processing device 110 may obtain the value of each of the at least one record. Since each value includes identity information of the corresponding target edge, the processing device 110 may locate the corresponding target edge in the graph database. Accordingly, the processing device 110 may obtain the edge of interest and the one or more nodes associated with the edge of interest as the query result.
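Resolving record values into a query result can be sketched in Python as follows. Dictionaries stand in for the graph database's edge and vertex storage, and the identifiers `e1`, `v1`, `v2` and the field names `out` and `in` are hypothetical.

```python
# Hypothetical stand-in for the graph database: edges and vertices by identity.
EDGES = {"e1": {"label": "hasEvent", "out": "v1", "in": "v2"}}
VERTICES = {
    "v1": {"label": "person", "sex": "male"},
    "v2": {"label": "event", "tag": "walk", "location": "Park A"},
}


def resolve(edge_ids, edges, vertices):
    """Map record values (edge ids) to the edge and its head and tail nodes."""
    result = []
    for edge_id in edge_ids:
        edge = edges[edge_id]
        result.append({
            "edge": edge,
            "head": vertices[edge["out"]],  # node the edge starts from
            "tail": vertices[edge["in"]],   # node the edge points to
        })
    return result


result = resolve(["e1"], EDGES, VERTICES)
print(result[0]["head"]["label"], result[0]["tail"]["label"])  # person event
```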

Merely by way of example, a data query for a man who performs a certain event may be performed on a graph database. The data query may include three query conditions, namely, a first query condition including a node “person” with a gender property “male,” a second query condition including an edge indicating that a person performs the certain event, and a third query condition including a node “certain event.”

A query process based on a single index may include determining one or more persons that satisfy the first query condition (i.e., persons with the gender property “male”), and then determining, in a traversal manner, a target person that satisfies the second query condition and the third query condition from the one or more persons. The target person may be determined as the query result. More descriptions regarding the data query based on the single index may be found elsewhere in the present disclosure (e.g., FIG. 2 and the descriptions thereof).

A Gremlin query statement corresponding to the single index may be as follows:

g.V().has('person', 'sex', 'male').as('person1')
  .outE('hasEvent').as('hasEvent1')
  .inV().has('event', 'tag', 'certain event').as('event1')
  .select('person1', 'hasEvent1', 'event1')

A query process based on a target index may include determining a record including property information of a gender property “male,” property information of a tag property “certain event,” and schema information of an edge “hasEvent.” Therefore, a query result may be determined based on the record without a traversal process. A Gremlin query statement corresponding to the target index may be as follows:

g.E().has('hasEvent', 'from_sex', 'male').has('hasEvent', 'to_tag', 'certain event').as('hasEvent1')
  .outV().hasLabel('person').as('person1')
  .select('hasEvent1').inV().hasLabel('event').as('event1')
  .select('hasEvent1', 'event1', 'person1')

Therefore, the data query based on the target index may have a higher efficiency than the data query based on the single index.
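The contrast between the two query processes can be sketched in Python with a toy in-memory graph. The data are invented, and the nested loop only stands in for the traversal step; it is not a model of any particular graph database's execution engine.

```python
# Illustrative graph: vertices by id and edges as (edge_id, head, tail) tuples.
VERTICES = {
    "p1": {"label": "person", "sex": "male"},
    "p2": {"label": "person", "sex": "female"},
    "p3": {"label": "person", "sex": "male"},
    "x1": {"label": "event", "tag": "certain event"},
    "x2": {"label": "event", "tag": "other event"},
}
EDGES = [("e1", "p1", "x1"), ("e2", "p2", "x1"), ("e3", "p3", "x2")]


def query_by_traversal(vertices, edges):
    """Single-index style: filter persons first, then walk their edges."""
    males = [
        vid for vid, props in vertices.items()
        if props["label"] == "person" and props["sex"] == "male"
    ]
    hits = []
    for person in males:
        for edge_id, head, tail in edges:  # traversal step per candidate person
            if head == person and vertices[tail]["tag"] == "certain event":
                hits.append(edge_id)
    return hits


def build_target_index(vertices, edges):
    """Target-index style: one flat record per edge with prefixed node properties."""
    return {
        edge_id: {"from_sex": vertices[h]["sex"], "to_tag": vertices[t]["tag"]}
        for edge_id, h, t in edges
    }


def query_by_index(index):
    """Answer the query with a single filter over edge records, no traversal."""
    return [
        eid for eid, rec in index.items()
        if rec["from_sex"] == "male" and rec["to_tag"] == "certain event"
    ]


print(query_by_traversal(VERTICES, EDGES))  # ['e1']
print(query_by_index(build_target_index(VERTICES, EDGES)))  # ['e1']
```

Both approaches return the same edge, but the index version checks each edge record once instead of revisiting the edge list for every candidate person.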

According to some embodiments of the present disclosure, the data query may be performed on the graph database including the target index, so that the at least one record satisfying the one or more query conditions can be determined directly and the traversal process can be simplified or avoided, thereby improving the efficiency of the data query.

It should be noted that the description of the process 900 is provided for the purposes of illustration, and is not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, various variations and modifications may be conducted under the teaching of the present disclosure. However, those variations and modifications may not depart from the protection of the present disclosure.

FIG. 10 is a schematic diagram illustrating an exemplary electronic device 1000 for data query according to some embodiments of the present disclosure.

As shown in FIG. 10, the electronic device 1000 may include a processor 1010 and a memory 1020 coupled to the processor 1010.

The memory 1020 may store programs and/or instructions for implementing the processes in the above embodiments of the present disclosure. The processor 1010 may be configured to execute the programs and/or instructions stored in the memory 1020 to implement operations of the processes in the above embodiments of the present disclosure. The processor 1010 may include a central processing unit (CPU). The processor 1010 may be an integrated circuit chip capable of processing signals. The processor 1010 may include a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor or any conventional processor.

FIG. 11 is a schematic diagram illustrating an exemplary computer-readable storage medium 1100 for data query according to some embodiments of the present disclosure.

As shown in FIG. 11, the computer-readable storage medium 1100 may store programs and/or instructions 1110. When the programs and/or instructions 1110 are executed, the processes in the above embodiments of the present disclosure may be implemented. The programs and/or instructions 1110 may form program files and be stored in the computer-readable storage medium 1100 in the form of software products, so as to cause a computer device (e.g., a personal computer, a server, or a network device) or a processor to execute all or part of the operations of the processes in the above embodiments of the present disclosure. The computer-readable storage medium 1100 may include a storage medium that can store programs and/or instructions, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, an optical disk, or the like, or any combination thereof. Alternatively, the computer-readable storage medium 1100 may include a terminal device, such as a computer, a server, a mobile phone, a tablet computer, or the like, or any combination thereof.

Having thus described the basic concepts, it may be rather apparent to those skilled in the art after reading this detailed disclosure that the foregoing detailed disclosure is intended to be presented by way of example only and is not limiting. Various alterations, improvements, and modifications may occur and are intended for those skilled in the art, though not expressly stated herein. These alterations, improvements, and modifications are intended to be suggested by this disclosure, and are within the spirit and scope of the exemplary embodiments of this disclosure.

Moreover, certain terminology has been used to describe embodiments of the present disclosure. For example, the terms “one embodiment, ” “an embodiment, ” and/or “some embodiments” mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Therefore, it is emphasized and should be appreciated that two or more references to “an embodiment” or “one embodiment” or “an alternative embodiment” in various portions of this disclosure are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined as suitable in one or more embodiments of the present disclosure.

Furthermore, the recited order of processing elements or sequences, or the use of numbers, letters, or other designations therefor, is not intended to limit the claimed processes and methods to any order except as may be specified in the claims. Although the above disclosure discusses through various examples what is currently considered to be a variety of useful embodiments of the disclosure, it is to be understood that such detail is solely for that purpose, and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover modifications and equivalent arrangements that are within the spirit and scope of the disclosed embodiments. For example, although the implementation of various components described above may be embodied in a hardware device, it may also be implemented as a software only solution, e.g., an installation on an existing server or mobile device.

Similarly, it should be appreciated that in the foregoing description of embodiments of the present disclosure, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive embodiments. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed subject matter requires more features than are expressly recited in each claim. Rather, inventive embodiments lie in less than all features of a single foregoing disclosed embodiment.

In some embodiments, the numbers expressing quantities or properties used to describe and claim certain embodiments of the application are to be understood as being modified in some instances by the term “about,” “approximate,” or “substantially.” For example, “about,” “approximate,” or “substantially” may indicate ±20% variation of the value it describes, unless otherwise stated. Accordingly, in some embodiments, the numerical parameters set forth in the written description and attached claims are approximations that may vary depending upon the desired properties sought to be obtained by a particular embodiment. In some embodiments, the numerical parameters should be construed in light of the number of reported significant digits and by applying ordinary rounding techniques. Notwithstanding that the numerical ranges and parameters setting forth the broad scope of some embodiments of the application are approximations, the numerical values set forth in the specific examples are reported as precisely as practicable.

Each of the patents, patent applications, publications of patent applications, and other material, such as articles, books, specifications, publications, documents, things, and/or the like, referenced herein is hereby incorporated herein by this reference in its entirety for all purposes, excepting any prosecution file history associated with same, any of same that is inconsistent with or in conflict with the present document, or any of same that may have a limiting effect as to the broadest scope of the claims now or later associated with the present document. By way of example, should there be any inconsistency or conflict between the description, definition, and/or the use of a term associated with any of the incorporated material and that associated with the present document, the description, definition, and/or the use of the term in the present document shall prevail.

In closing, it is to be understood that the embodiments of the application disclosed herein are illustrative of the principles of the embodiments of the application. Other modifications that may be employed may be within the scope of the application. Thus, by way of example, but not of limitation, alternative configurations of the embodiments of the application may be utilized in accordance with the teachings herein. Accordingly, embodiments of the present application are not limited to that precisely as shown and described.

Claims

1. A method for data structure generation implemented on a computing device having at least one processor and at least one storage device, the method comprising:

determining one or more target edge conditions; and

generating, based on at least one target edge that satisfies the one or more target edge conditions, a target index including at least one record by:

generating each record of the at least one record in the target index based on schema information of a target edge of the at least one target edge and property information of one or more nodes associated with the target edge.
2. The method of claim 1, wherein

the each record includes a pair of a key and a value,

the key includes the schema information of the target edge and the property information of the one or more nodes associated with the target edge, and

the value corresponds to the target edge.
3. The method of claim 2, wherein the key further includes one or more property prefixes configured to recognize the property information of the one or more nodes.
4. The method of claim 2 or claim 3, wherein the pair of the key and the value is stored in an Elasticsearch system (ES).
5. The method of any one of claims 1-4, wherein

the target index is configured to determine, based on one or more query conditions, an edge of interest to be queried, and

each of the one or more query conditions includes at least one of schema information of the edge of interest to be queried or property information of one or more nodes associated with the edge of interest to be queried.
6. The method of claim 1, wherein the one or more target edge conditions include a first condition, and the method further comprises:

determining whether an edge to be processed satisfies the first condition;

in response to that the edge to be processed satisfies the first condition, designating the edge to be processed as one of the at least one target edge.
7. The method of claim 6, wherein the first condition relates to an edge type.
8. The method of claim 1, wherein the one or more target edge conditions include a second condition, and the method further comprises:

obtaining historical query data relating to an edge to be processed;

determining whether the historical query data relating to the edge to be processed satisfies the second condition; and

in response to that the edge to be processed satisfies the second condition, designating the edge to be processed as one of the at least one target edge.
9. The method of claim 8, wherein the second condition relates to at least one of a count of query times, a query frequency, a query mode, or a query efficiency.
10. The method of claim 1, wherein the generating each record of the at least one record in the target index includes:

obtaining filtered information by filtering the schema information of the target edge and the property information of the one or more nodes associated with the target edge; and

generating, based on the filtered information, the each record in the target index.
11. A method for data query implemented on a computing device having at least one processor and at least one storage device, the method comprising:

obtaining one or more query conditions, each of the one or more query conditions including at least one of schema information of an edge of interest to be queried or property information of one or more nodes associated with the edge of interest to be queried;

determining, based on the one or more query conditions, at least one record in a target index, wherein each record of the at least one record in the target index is generated based on schema information of a target edge of at least one target edge that satisfies one or more target edge conditions and property information of one or more nodes associated with the target edge; and

determining, based on the at least one record, a query result, the query result including the edge of interest and the one or more nodes associated with the edge of interest.
12. The method of claim 11, wherein

the each record includes a pair of a key and a value,

the key includes the schema information of the target edge and the property information of the one or more nodes associated with the target edge, and

the value corresponds to the target edge.
13. A system for data structure generation, comprising:

at least one storage device including a set of instructions; and

at least one processor in communication with the at least one storage device, wherein when executing the set of instructions, the at least one processor is directed to perform operations including:

determining one or more target edge conditions; and

generating, based on at least one target edge that satisfies the one or more target edge conditions, a target index including at least one record by:

generating each record of the at least one record in the target index based on schema information of a target edge of the at least one target edge and property information of one or more nodes associated with the target edge.
14. The system of claim 13, wherein

the each record includes a pair of a key and a value,

the key includes the schema information of the target edge and the property information of the one or more nodes associated with the target edge, and

the value corresponds to the target edge.
15. The system of claim 14, wherein the key further includes one or more property prefixes configured to recognize the property information of the one or more nodes.
16. The system of claim 14 or claim 15, wherein the pair of the key and the value is stored in an Elasticsearch system (ES).
17. The system of any one of claims 13-16, wherein

the target index is configured to determine, based on one or more query conditions, an edge of interest to be queried, and

each of the one or more query conditions includes at least one of schema information of the edge of interest to be queried or property information of one or more nodes associated with the edge of interest to be queried.
18. The system of claim 13, wherein the one or more target edge conditions include a first condition, and the operations further comprise:

determining whether an edge to be processed satisfies the first condition;

in response to that the edge to be processed satisfies the first condition, designating the edge to be processed as one of the at least one target edge.
The system of claim 18, wherein the first condition relates to an edge type.
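The following minimal sketch illustrates the first condition of claims 18 and 19. It assumes the condition is membership of the edge type in a configured set of edge types. INDEXED_EDGE_TYPES, satisfies_first_condition, and select_target_edges are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Edge:
    edge_id: str
    edge_type: str

# Hypothetical configuration: the edge types chosen for indexing.
INDEXED_EDGE_TYPES = frozenset({"transfer", "call"})

def satisfies_first_condition(edge: Edge, indexed_types=INDEXED_EDGE_TYPES) -> bool:
    """First condition (assumed form): the edge type is one of the configured types."""
    return edge.edge_type in indexed_types

def select_target_edges(edges, indexed_types=INDEXED_EDGE_TYPES) -> list:
    """Designate every edge to be processed that satisfies the first condition as a target edge."""
    return [e for e in edges if satisfies_first_condition(e, indexed_types)]

edges = [Edge("e1", "transfer"), Edge("e2", "follows"), Edge("e3", "call")]
print([e.edge_id for e in select_target_edges(edges)])  # ['e1', 'e3']
```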
The system of claim 13, wherein the one or more target edge conditions include a second condition, and the operations further comprise:

obtaining historical query data relating to an edge to be processed;

determining whether the historical query data relating to the edge to be processed satisfies the second condition; and

in response to determining that the historical query data relating to the edge to be processed satisfies the second condition, designating the edge to be processed as one of the at least one target edge.
The system of claim 20, wherein the second condition relates to at least one of a count of query times, a query frequency, a query mode, or a query efficiency.
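The second condition of claims 20 and 21 can be sketched as a threshold rule over historical query data. The QueryStats fields, the threshold values, and the choice to require all three criteria together are assumptions made for illustration. Query mode is omitted for brevity. Under this rule, an edge is selected when it is queried often and frequently and its queries are slow, which is where an index is most likely to help.

```python
from dataclasses import dataclass

@dataclass
class QueryStats:
    """Historical query data for one edge (hypothetical fields)."""
    query_count: int       # count of query times
    window_days: float     # observation window used to derive query frequency
    avg_latency_ms: float  # proxy for query efficiency (higher = less efficient)

def satisfies_second_condition(stats: QueryStats,
                               min_count: int = 100,
                               min_per_day: float = 10.0,
                               max_latency_ms: float = 50.0) -> bool:
    """Assumed rule: enough queries, a high enough frequency, and slow enough queries."""
    frequency = stats.query_count / stats.window_days if stats.window_days > 0 else 0.0
    return (stats.query_count >= min_count
            and frequency >= min_per_day
            and stats.avg_latency_ms > max_latency_ms)

print(satisfies_second_condition(QueryStats(300, 7, 120.0)))  # True
```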
The system of claim 13, wherein the generating each record of the at least one record in the target index includes:

obtaining filtered information by filtering the schema information of the target edge and the property information of the one or more nodes associated with the target edge; and

generating, based on the filtered information, each record in the target index.
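Claim 22 filters the schema information and the node property information before a record is generated. The minimal sketch below assumes that filtering means keeping only whitelisted, non-empty fields. The whitelists ALLOWED_SCHEMA_FIELDS and ALLOWED_NODE_PROPS, the "n0." and "n1." node prefixes, and the function names are hypothetical.

```python
ALLOWED_SCHEMA_FIELDS = {"type"}          # hypothetical whitelist of schema fields
ALLOWED_NODE_PROPS = {"name", "account"}  # hypothetical whitelist of node properties

def _keep(info: dict, allowed: set) -> dict:
    """Keep only whitelisted fields whose values are not empty."""
    return {k: v for k, v in info.items() if k in allowed and v not in (None, "")}

def record_from_filtered(schema: dict, node_props: list, edge_id: str) -> tuple:
    """Filter the edge's schema info and its nodes' properties, then build a (key, value) record."""
    f_schema = _keep(schema, ALLOWED_SCHEMA_FIELDS)
    f_nodes = [_keep(p, ALLOWED_NODE_PROPS) for p in node_props]
    parts = [f"{k}={f_schema[k]}" for k in sorted(f_schema)]
    for i, props in enumerate(f_nodes):  # "n0.", "n1." act as node prefixes
        parts += [f"n{i}.{k}={props[k]}" for k in sorted(props)]
    return "|".join(parts), edge_id

print(record_from_filtered({"type": "transfer", "created_by": "etl"},
                           [{"name": "Alice", "note": "vip"}, {"name": "Bob", "account": ""}],
                           "e7"))
# ('type=transfer|n0.name=Alice|n1.name=Bob', 'e7')
```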
A system for data query, comprising:

at least one storage device including a set of instructions; and

at least one processor in communication with the at least one storage device, wherein when executing the set of instructions, the at least one processor is directed to perform operations including:

obtaining one or more query conditions, each of the one or more query conditions including at least one of schema information of an edge of interest to be queried or property information of one or more nodes associated with the edge of interest to be queried;

determining, based on the one or more query conditions, at least one record in a target index, wherein each record of the at least one record in the target index is generated based on schema information of a target edge of at least one target edge that satisfies one or more target edge conditions and property information of one or more nodes associated with the target edge; and

determining, based on the at least one record, a query result, the query result including the edge of interest and the one or more nodes associated with the edge of interest.
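The query flow of claim 23 can be sketched against an in-memory target index. The index keys are assumed to use the hypothetical layout field=value|field=value (for example "type=transfer|src.name=Alice|dst.name=Bob"), and the values are lists of edge identifiers. Query conditions are a dictionary of field/value pairs. A record matches when its key contains all of them, and each matched edge is then resolved to the edge and its two nodes. The linear scan and the key layout are simplifications assumed for illustration; values must not contain "|" or "=". An Elasticsearch-backed index would run a search instead of scanning. The names parse_key, match_records, and query are hypothetical.

```python
def parse_key(key: str) -> dict:
    """Split a 'field=value|field=value' key back into a dict (assumed key layout)."""
    return dict(part.split("=", 1) for part in key.split("|"))

def match_records(index: dict, conditions: dict) -> list:
    """Return the edge ids of every record whose key satisfies all query conditions."""
    hits = []
    for key, edge_ids in index.items():
        fields = parse_key(key)
        if all(fields.get(k) == str(v) for k, v in conditions.items()):
            hits.extend(edge_ids)
    return hits

def query(index: dict, edges: dict, nodes: dict, conditions: dict) -> list:
    """Build the query result: (edge, source node, destination node) for each matched edge."""
    result = []
    for eid in match_records(index, conditions):
        edge = edges[eid]
        result.append((edge, nodes[edge["src"]], nodes[edge["dst"]]))
    return result

index = {
    "type=transfer|src.name=Alice|dst.name=Bob": ["e1"],
    "type=transfer|src.name=Carol|dst.name=Bob": ["e2"],
    "type=call|src.name=Alice|dst.name=Dave": ["e3"],
}
edges = {"e1": {"src": "n1", "dst": "n2"}, "e2": {"src": "n3", "dst": "n2"},
         "e3": {"src": "n1", "dst": "n4"}}
nodes = {"n1": {"name": "Alice"}, "n2": {"name": "Bob"},
         "n3": {"name": "Carol"}, "n4": {"name": "Dave"}}
print(query(index, edges, nodes, {"type": "call"}))
```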
The system of claim 23, wherein

each record includes a pair of a key and a value,

the key includes the schema information of the target edge and the property information of the one or more nodes associated with the target edge, and

the value corresponds to the target edge.