CN114706846A - Method, device and equipment for querying data and storage medium - Google Patents

Publication number
CN114706846A
Authority
CN
China
Prior art keywords
node
query
data
nodes
tree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111673409.6A
Other languages
Chinese (zh)
Inventor
邹磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University filed Critical Peking University
Priority to CN202111673409.6A priority Critical patent/CN114706846A/en
Publication of CN114706846A publication Critical patent/CN114706846A/en
Priority to PCT/CN2022/135606 priority patent/WO2023124729A1/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2246Trees, e.g. B+trees
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24534Query rewriting; Transformation
    • G06F16/24539Query rewriting; Transformation using cached or materialised query results
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9024Graphs; Linked lists

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a method, a device, equipment and a storage medium for querying data, and belongs to the technical field of graph databases. The method comprises the following steps: receiving a data query instruction sent by a data query application program, wherein the data query instruction carries a data query statement; establishing a first query tree corresponding to the data query statement based on the structure of the data query statement; simplifying the first query tree based on the types of the nodes in the first query tree to obtain a second query tree; based on a preset execution sequence, sequentially executing query operations corresponding to all nodes in the second query tree in a graph database to obtain a data query result; and returning the data query result to the data query application program. By the method and the device, the efficiency of inquiring data in the graph database can be improved.

Description

Method, device and equipment for querying data and storage medium
Technical Field
The present application relates to the field of graph database technologies, and in particular, to a method, an apparatus, a device, and a storage medium for querying data.
Background
RDF (Resource Description Framework) is a factual data model of a knowledge-graph, where each edge in the knowledge-graph is represented in the form of RDF triples like "subject, predicate, object", which represent a named relationship between a pair of entities or a named attribute value owned by an entity.
SPARQL (SPARQL Protocol and RDF Query Language) is the standard query language for accessing RDF datasets; UNION, OPTIONAL, and FILTER are commonly used query expressions in SPARQL data query statements.
At present, when a computer device executes a query operation corresponding to a data query statement, query processing corresponding to each query expression is only executed in sequence, and the efficiency of querying data is low.
Disclosure of Invention
The embodiment of the application provides a method, a device, equipment and a storage medium for querying data, which can improve the efficiency of querying data. The technical scheme is as follows:
in a first aspect, a method for querying data is provided, the method including:
receiving a data query instruction sent by a data query application program, wherein the data query instruction carries a data query statement;
establishing a first query tree corresponding to the data query statement based on the structure of the data query statement;
simplifying the first query tree based on the types of the nodes in the first query tree to obtain a second query tree;
based on a preset execution sequence, sequentially executing query operations corresponding to all nodes in the second query tree in a graph database to obtain a data query result;
and returning the data query result to the data query application program.
Optionally, the types of the nodes in the first query tree include a merge node and a query node, where the merge node is used to represent the data query statement or a sub-query statement in the data query statement, the query node is used to represent a query term in the data query statement or in a sub-query statement of the data query statement, and the query node includes at least one of a BGP node, a UNION node, an OPTIONAL node, and a FILTER node.
Optionally, the simplifying the first query tree based on the type of each node in the first query tree to obtain a second query tree includes:
determining the first query tree as a third query tree to be simplified, and determining the depth of each node in the third query tree;
for a first merge node with a depth of 1 in the third query tree, if child nodes of the first merge node include multiple BGP nodes, merging the multiple BGP nodes to obtain a merged first BGP node, deleting the first merge node, and adding the first BGP node to the location of the first merge node;
for a second merge node with a depth of 2 in the third query tree, if child nodes of the second merge node include at least one BGP node and at least one UNION node, merging the at least one BGP node to obtain a merged second BGP node, and merging the at least one UNION node to obtain a merged third UNION node; merging the second BGP node into a child node of the third UNION node to obtain a fourth UNION node, deleting the second merged node, and adding the fourth UNION node to the position of the second merged node;
and for a fifth UNION node with the depth of 2 in the third query tree, adding a descendant node of the fifth UNION node into a child node of the fifth UNION node, and deleting a descendant node of the fifth UNION node and a parent node of the descendant node to obtain a simplified third query tree.
Optionally, before determining the first query tree as a third query tree to be simplified, the method further includes:
determining, among the OPTIONAL nodes, a first OPTIONAL node that has no OPTIONAL node among its ancestor nodes in the first query tree;
and converting the sub-query tree whose root node is the parent node of the first OPTIONAL node into a third BGP node.
Optionally, the sequentially executing query operations corresponding to nodes in the second query tree in the graph database includes:
when the first query operation corresponding to the third BGP node is executed, determining a sub query tree corresponding to the third BGP node;
executing the query operation corresponding to the sibling node of the first OPTIONAL node in the sub-query tree to obtain a first query result; determining the first query result as the data query range of the descendant nodes of the first OPTIONAL node; and executing the query operation corresponding to the descendant nodes of the first OPTIONAL node based on the data query range.
Optionally, the sequentially executing query operations corresponding to nodes in the second query tree in the graph database includes:
when the first query operation corresponding to the third BGP node is executed, if the descendant nodes of the first OPTIONAL node are determined to comprise at least one second OPTIONAL node;
sequentially executing query operations corresponding to a first sibling node of the first OPTIONAL node and a second sibling node of the at least one second OPTIONAL node according to the depths of the first OPTIONAL node and the at least one second OPTIONAL node, wherein the data query range corresponding to the second sibling node of each second OPTIONAL node is the query result corresponding to the sibling node of the previous OPTIONAL node;
and sequentially executing query operations corresponding to the child nodes of the first OPTIONAL node and the child nodes of the at least one second OPTIONAL node according to the depths of the first OPTIONAL node and the at least one second OPTIONAL node, wherein the data query range corresponding to the child nodes of any OPTIONAL node is the query result corresponding to the sibling node of that OPTIONAL node.
Optionally, the simplifying the first query tree based on the type of each node in the first query tree to obtain a second query tree includes:
for the FILTER node in the first query tree, if the FILTER condition corresponding to the FILTER node meets a preset conversion condition, converting the FILTER condition into a disjunctive normal form;
and converting the FILTER node into a UNION node based on the disjunctive normal form.
Optionally, the conversion condition is that the FILTER condition corresponding to the FILTER node is composed only of variables, constants, and three operators: AND, OR, and equality.
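As an illustration only, the conversion of such a FILTER condition into a disjunctive normal form can be sketched in Python as follows. The tuple encoding of conditions (`("eq", variable, constant)`, `("and", left, right)`, `("or", left, right)`) and the function name `to_dnf` are assumptions introduced for this sketch and are not taken from the application. Each inner list of the result is one conjunction, which would correspond to one branch of the resulting UNION node; how the equalities are bound into each branch is not shown.

```python
from typing import List, Tuple

# Hypothetical encoding: ("eq", "?x", "a"), ("and", c1, c2), ("or", c1, c2).
Cond = Tuple
Atom = Tuple[str, str, str]


def to_dnf(cond: Cond) -> List[List[Atom]]:
    """Return the condition as a list of conjunctions (lists of equalities)."""
    op = cond[0]
    if op == "eq":
        # A single equality is a DNF with one conjunction of one atom.
        return [[cond]]
    left, right = to_dnf(cond[1]), to_dnf(cond[2])
    if op == "or":
        # OR concatenates the disjuncts of both sides.
        return left + right
    if op == "and":
        # AND distributes over OR: pair every left disjunct with every right one.
        return [l + r for l in left for r in right]
    raise ValueError(f"operator not covered by the conversion condition: {op}")


# (?x = a OR ?x = b) AND ?y = c
condition = ("and",
             ("or", ("eq", "?x", "a"), ("eq", "?x", "b")),
             ("eq", "?y", "c"))
branches = to_dnf(condition)
for branch in branches:
    print(" AND ".join(f"{v} = {c}" for _, v, c in branch))
```

In this sketch the example condition yields two conjunctions, so the FILTER node would be replaced by a UNION node with two branches.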
Optionally, the sequentially executing query operations corresponding to nodes in the second query tree in the graph database includes:
if a plurality of BGP nodes that can be executed in parallel exist, determining the common triple patterns corresponding to the plurality of BGP nodes;
determining, based on a greedy algorithm, the partial common triple patterns with the lowest corresponding query cost among the common triple patterns;
querying the graph data for data corresponding to the partial common triple patterns;
and querying the graph data for data corresponding to the triple patterns in the plurality of BGP nodes other than the partial common triple patterns.
In a second aspect, an apparatus for querying data is provided, the apparatus comprising:
the receiving module is used for receiving a data query instruction sent by a data query application program, wherein the data query instruction carries a data query statement;
the establishing module is used for establishing a first query tree corresponding to the data query statement based on the structure of the data query statement;
the processing module is used for simplifying the first query tree based on the types of all nodes in the first query tree to obtain a second query tree;
the query module is used for sequentially executing query operations corresponding to all nodes in the second query tree in the graph database based on a preset execution sequence to obtain a data query result;
and the return module is used for returning the data query result to the data query application program.
Optionally, the types of the nodes in the first query tree include a merge node and a query node, where the merge node is used to represent the data query statement or a sub-query statement in the data query statement, the query node is used to represent a query term in the data query statement or in a sub-query statement of the data query statement, and the query node includes at least one of a BGP node, a UNION node, an OPTIONAL node, and a FILTER node.
Optionally, the processing module is configured to:
determining the first query tree as a third query tree to be simplified, and determining the depth of each node in the third query tree;
for a first merge node with a depth of 1 in the third query tree, if child nodes of the first merge node include multiple BGP nodes, merging the multiple BGP nodes to obtain a merged first BGP node, deleting the first merge node, and adding the first BGP node to the location of the first merge node;
for a second merge node with a depth of 2 in the third query tree, if child nodes of the second merge node include at least one BGP node and at least one UNION node, merging the at least one BGP node to obtain a merged second BGP node, and merging the at least one UNION node to obtain a merged third UNION node; merging the second BGP node into a child node of the third UNION node to obtain a fourth UNION node, deleting the second merged node, and adding the fourth UNION node to the position of the second merged node;
and for a fifth UNION node with the depth of 2 in the third query tree, adding a descendant node of the fifth UNION node into a child node of the fifth UNION node, and deleting a descendant node of the fifth UNION node and a parent node of the descendant node to obtain a simplified third query tree.
Optionally, the processing module is further configured to:
determining, among the OPTIONAL nodes, a first OPTIONAL node that has no OPTIONAL node among its ancestor nodes in the first query tree;
and converting the sub-query tree whose root node is the parent node of the first OPTIONAL node into a third BGP node.
Optionally, the query module is configured to:
when a first query operation corresponding to the third BGP node is executed, determining a sub-query tree corresponding to the third BGP node;
executing the query operation corresponding to the sibling node of the first OPTIONAL node in the sub-query tree to obtain a first query result; determining the first query result as the data query range of the descendant nodes of the first OPTIONAL node; and executing the query operation corresponding to the descendant nodes of the first OPTIONAL node based on the data query range.
Optionally, the query module is configured to:
when the first query operation corresponding to the third BGP node is executed, if the descendant nodes of the first OPTIONAL node are determined to comprise at least one second OPTIONAL node;
sequentially executing query operations corresponding to a first sibling node of the first OPTIONAL node and a second sibling node of the at least one second OPTIONAL node according to the depths of the first OPTIONAL node and the at least one second OPTIONAL node, wherein the data query range corresponding to the second sibling node of each second OPTIONAL node is the query result corresponding to the sibling node of the previous OPTIONAL node;
and sequentially executing query operations corresponding to the child nodes of the first OPTIONAL node and the child nodes of the at least one second OPTIONAL node according to the depths of the first OPTIONAL node and the at least one second OPTIONAL node, wherein the data query range corresponding to the child nodes of any OPTIONAL node is the query result corresponding to the sibling node of that OPTIONAL node.
Optionally, the processing module is configured to:
for the FILTER node in the first query tree, if the FILTER condition corresponding to the FILTER node meets a preset conversion condition, converting the FILTER condition into a disjunctive normal form;
and converting the FILTER node into a UNION node based on the disjunctive normal form.
Optionally, the conversion condition is that the FILTER condition corresponding to the FILTER node is composed only of variables, constants, and three operators: AND, OR, and equality.
Optionally, the query module is configured to:
if a plurality of BGP nodes that can be executed in parallel exist, determining the common triple patterns corresponding to the plurality of BGP nodes;
determining, based on a greedy algorithm, the partial common triple patterns with the lowest corresponding query cost among the common triple patterns;
querying the graph data for data corresponding to the partial common triple patterns;
and querying the graph data for data corresponding to the triple patterns in the plurality of BGP nodes other than the partial common triple patterns.
In a third aspect, a computer device is provided, which includes a processor and a memory, where the memory stores at least one instruction, and the at least one instruction is loaded and executed by the processor to implement the operations performed in the method of the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, having stored therein at least one instruction that is loaded and executed by a processor to implement the operations performed in the method of the first aspect.
In a fifth aspect, a computer program product is provided, where the computer program product includes at least one instruction, and the at least one instruction is loaded and executed by a processor to implement the operations performed in the first aspect.
The technical scheme provided by the embodiment of the application has the following beneficial effects:
according to the embodiment of the application, the query of the data query statement is converted into the first query tree according to the structure of the data query statement, and then the query tree can be simplified by using the query logic corresponding to each query node in the first query tree to obtain the second query tree. Therefore, the query operation of the data query statement can be simplified by simplifying the query tree, and the efficiency of data query can be improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a flowchart of a method for querying data according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a method for querying data according to an embodiment of the present application;
FIG. 3 is a flowchart of a method for querying data according to an embodiment of the present disclosure;
FIG. 4 is a diagram illustrating a method for querying data according to an embodiment of the present disclosure;
FIG. 5 is a diagram illustrating a method for querying data according to an embodiment of the present disclosure;
FIG. 6 is a diagram illustrating a method for querying data according to an embodiment of the present disclosure;
FIG. 7 is a flowchart of a method for querying data according to an embodiment of the present disclosure;
FIG. 8 is a diagram illustrating a method for querying data according to an embodiment of the present disclosure;
FIG. 9 is a flowchart of a method for querying data according to an embodiment of the present disclosure;
FIG. 10 is a flowchart of a method for querying data according to an embodiment of the present disclosure;
FIG. 11 is a schematic structural diagram of an apparatus for querying data according to an embodiment of the present disclosure;
FIG. 12 is a schematic diagram of a computer device for querying data according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The method for querying data provided by the embodiments of the present application can be implemented by a computer device. An application for querying data (e.g., a graph data query application) may run on the computer device. The computer device comprises at least a processor and a memory, where the memory stores data related to executing the method for querying data; for example, the memory can hold a graph database and the program code corresponding to the method for querying data. The processor can execute the program code stored in the memory and, in response to a data query request from the application, carry out the method for querying data provided by the embodiments of the present application.
The computer equipment can be a terminal or a server, and when the computer equipment is the terminal, the terminal can be a mobile phone, a tablet computer, intelligent wearable equipment, a desktop computer, a notebook computer and the like. When the computer device is a server, the server may establish communication with the terminal, the server may be a single server or a server group, if the server is a single server, the server may be responsible for all processing in the following scheme, if the server is a server group, different servers in the server group may be respectively responsible for different processing in the following scheme, and the specific processing allocation condition may be arbitrarily set by a technician according to actual requirements, which is not described herein again.
The following describes concepts related to embodiments of the present application:
RDF (Resource Description Framework) is a factual data model of a knowledge-graph, where each edge in the knowledge-graph is represented in the form of RDF triples like "subject, predicate, object", which represent a named relationship between a pair of entities or a named attribute value owned by an entity.
SPARQL (SPARQL Protocol and RDF Query Language, Query Language and data acquisition Protocol) is a standard Query Language for accessing RDF datasets.
The RDF dataset includes a plurality of RDF triples, and arbitrary graph data can be stored. There is a unique RDF triple corresponding to each edge in the graph data.
RDF triple: let the pairwise disjoint infinite sets $I$, $B$, and $L$ denote Internationalized Resource Identifiers (IRIs), blank nodes, and literals, respectively. An RDF triple has the form $t = \langle \text{subject}, \text{predicate}, \text{object} \rangle \in (I \cup B) \times I \times (I \cup B \cup L)$.
RDF data set: an RDF dataset is a collection of RDF triples.
Triple pattern: let the infinite set $V$, disjoint from the sets $I$, $B$, and $L$ above, denote variables. A triple pattern has the form $t = \langle \text{subject}, \text{predicate}, \text{object} \rangle \in (V \cup I) \times (V \cup I) \times (V \cup I \cup L)$. Because the subject, predicate, or object of a triple pattern may be a variable, a single triple pattern can match multiple RDF triples in the RDF dataset. RDF triples can therefore be queried in an RDF dataset through a triple pattern. And because an RDF dataset can represent graph data, data in the graph data can also be searched through triple patterns.
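For illustration, matching one triple pattern against a small RDF dataset can be sketched in Python as follows. This is a minimal sketch under assumptions: terms are plain strings, a leading `?` marks a variable, and the function name `match_triple_pattern` and the sample data are hypothetical. A real graph database would use indexes rather than a linear scan.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def is_var(term: str) -> bool:
    # Assumption of this sketch: variables are written with a leading "?".
    return term.startswith("?")


def match_triple_pattern(pattern: Triple,
                         dataset: Iterable[Triple]) -> List[Dict[str, str]]:
    """Return one variable binding per RDF triple matched by the pattern."""
    results = []
    for triple in dataset:
        binding: Dict[str, str] = {}
        ok = True
        for p_term, d_term in zip(pattern, triple):
            if is_var(p_term):
                # A variable used twice must be bound to the same value.
                if binding.setdefault(p_term, d_term) != d_term:
                    ok = False
                    break
            elif p_term != d_term:
                # A constant must match the triple exactly.
                ok = False
                break
        if ok:
            results.append(binding)
    return results


dataset = [
    ("ex:TX", "ex:name", '"TX"'),
    ("ex:TX", "ex:holds", "ex:AB"),
    ("ex:AB", "ex:name", '"AB"'),
]
names = match_triple_pattern(("?c", "ex:name", "?n"), dataset)
print(names)
```

Here the pattern contains two variables, so it matches both `ex:name` triples and yields one binding for each.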
BGP (basic graph pattern): comprises at least one triple pattern, where a single triple pattern $t$ is a BGP; if $P_1$ and $P_2$ are both BGPs, then $P_1$ AND $P_2$ is also a BGP.
A BGP is the basic unit for searching data in graph data; data matching each triple pattern in a BGP can be found in the RDF dataset through an existing BGP matching algorithm.
Graph pattern:
(1) if $P$ is a BGP, then $P$ is a graph pattern;
(2) if $P_1$ and $P_2$ are both graph patterns, then $P_1$ AND $P_2$ is also a graph pattern;
(3) if $P_1$ and $P_2$ are graph patterns, then $\{P_1\}$ UNION $\{P_2\}$ and $P_1$ OPTIONAL $\{P_2\}$ are also graph patterns, where $\{P_i\}$ denotes a group graph pattern;
(4) if $P$ is a graph pattern and $C$ is a built-in condition (built from elements of $I \cup L \cup V$ and constants, and possibly containing logical operators ($\neg$, $\wedge$, $\vee$), comparison operators ($<$, $\le$, $>$, $\ge$, $=$), unary functions such as isBlank (tests whether a term is a blank node) and isIRI (tests whether a term is an IRI), etc.), then $P$ FILTER $C$ is a graph pattern.
UNION, OPTIONAL, and FILTER are commonly used query expressions in SPARQL data query statements, where:
UNION denotes a merged search over multiple graph patterns; for example, $P_1$ UNION $P_2$ means searching the RDF dataset separately for results that satisfy pattern $P_1$ and for results that satisfy pattern $P_2$, and taking the union of the search results.
OPTIONAL denotes optional matching of a graph pattern; for example, $P_1$ OPTIONAL $\{P_2\}$ means retaining the results in the RDF dataset that satisfy graph pattern $P_1$ while adding the compatible results that satisfy graph pattern $P_2$.
FILTER denotes conditional screening of search results; for example, $P_1$ FILTER $C$ means keeping, from the search results corresponding to pattern $P_1$, only the data that satisfies condition $C$.
Group graph pattern: a group graph pattern is defined recursively as follows:
(1) if $P$ is a graph pattern, then $\{P\}$ is a group graph pattern;
(2) if $P$ is a group graph pattern, then $P$ is also a graph pattern.
UNION graph pattern:
(1) if $P_1$ is a group graph pattern or a UNION graph pattern, and $P_2$ is a group graph pattern, then $P_1$ UNION $P_2$ is a UNION graph pattern;
(2) if $P_1$ UNION $P_2$ is a UNION graph pattern, then it is also a graph pattern.
Well-defined SPARQL queries: a graph pattern $P$ is said to be well defined if and only if the following conditions are met:
(1) for each sub-pattern of $P$ of the form $P'$ FILTER $C$, all variables that occur in the built-in condition $C$ also occur in the graph pattern $P'$;
(2) for each sub-pattern of $P$ of the form $P' = P_1$ OPTIONAL $\{P_2\}$, all variables that occur both in $P_2$ and outside $P'$ also occur in the graph pattern $P_1$.
The principles of the present invention are based on the semantics of the selection query in SPARQL. A selection query has the form "SELECT $v_1\ v_2 \ldots v_k$ WHERE { … }", where the SELECT clause is the query head and the WHERE clause is the query body. The SELECT clause determines the projection variables, namely the variables that need to appear in the query result; the WHERE clause gives the group graph pattern to be matched against the RDF dataset, that is, the WHERE clause gives the data query statement.
Matching a graph pattern $P$ against an RDF dataset $D$ generates a series of mappings $[[P]]_D = \{\mu_1, \mu_2, \ldots, \mu_n\}$. Note that duplicate elements are allowed, i.e., the mappings form a bag rather than a set. Each mapping $\mu$ is a function from a set of variables to result values. The set of variables that occur in a mapping $\mu$ is denoted $\mathrm{dom}(\mu)$.
Two mappings $\mu_1$ and $\mu_2$ are said to be compatible, denoted $\mu_1 \sim \mu_2$, if and only if every variable $v \in \mathrm{dom}(\mu_1) \cap \mathrm{dom}(\mu_2)$ satisfies $\mu_1(v) = \mu_2(v)$; in that case $\mu_1 \cup \mu_2$ is also a mapping. If $\mu_1$ and $\mu_2$ are incompatible, this is denoted $\mu_1 \nsim \mu_2$.
Given two bags of mappings $\Omega_1$ and $\Omega_2$, the following operations may be performed:
(1) $\Omega_1 \bowtie \Omega_2 = \{\mu_1 \cup \mu_2 \mid \mu_1 \in \Omega_1 \wedge \mu_2 \in \Omega_2 \wedge \mu_1 \sim \mu_2\}$;
(2) $\Omega_1 \cup_{bag} \Omega_2 = \{\mu_1 \mid \mu_1 \in \Omega_1\} \cup_{bag} \{\mu_2 \mid \mu_2 \in \Omega_2\}$;
(3) $\Omega_1 \setminus \Omega_2 = \{\mu_1 \in \Omega_1 \mid \forall \mu_2 \in \Omega_2 : \mu_1 \nsim \mu_2\}$.
The bag of mappings generated by matching a graph pattern $P$ against the RDF dataset $D$ (denoted $[[P]]_D$) is defined recursively as follows:
(1) if $P$ is a triple pattern $t$, then $[[P]]_D = \{\mu \mid \mathrm{dom}(\mu) = \mathrm{var}(t) \wedge \mu(t) \in D\}$, where $\mathrm{var}(t)$ denotes all variables appearing in $t$, and $\mu(t)$ denotes the RDF triple obtained by replacing every variable appearing in $t$ according to $\mu$;
(2) if $P = (P_1 \text{ AND } P_2)$, then $[[P]]_D = [[P_1]]_D \bowtie [[P_2]]_D$;
(3) if $P = (P_1 \text{ UNION } P_2)$, then $[[P]]_D = [[P_1]]_D \cup_{bag} [[P_2]]_D$;
(4) if $P = (P_1 \text{ OPTIONAL } P_2)$, then $[[P]]_D = ([[P_1]]_D \bowtie [[P_2]]_D) \cup_{bag} ([[P_1]]_D \setminus [[P_2]]_D)$;
(5) if $P = (P_1 \text{ FILTER } C)$, then $[[P]]_D = \{\mu \mid \mu \in [[P_1]]_D \wedge \mu(C)\}$ (i.e., when all variables appearing in $C$ are replaced according to $\mu$, denoted $\mu(C)$, the value of $\mu(C)$ is true).
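The bag operations behind AND, UNION, OPTIONAL, and FILTER can be illustrated with a short Python sketch. It assumes the standard SPARQL semantics (join of compatible mappings, bag union, difference, and OPTIONAL as join plus difference); mappings are modelled as dictionaries and bags as lists, and all function names and sample values are hypothetical.

```python
from typing import Callable, Dict, List

Mapping = Dict[str, str]   # variable name -> bound value
Bag = List[Mapping]        # a list keeps duplicates, like a bag


def compatible(m1: Mapping, m2: Mapping) -> bool:
    # Compatible: the mappings agree on every variable they share.
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())


def join(o1: Bag, o2: Bag) -> Bag:
    # Used for P1 AND P2: merge every compatible pair of mappings.
    return [{**m1, **m2} for m1 in o1 for m2 in o2 if compatible(m1, m2)]


def bag_union(o1: Bag, o2: Bag) -> Bag:
    # Used for P1 UNION P2: duplicates are preserved.
    return o1 + o2


def difference(o1: Bag, o2: Bag) -> Bag:
    # Mappings of o1 that are incompatible with every mapping of o2.
    return [m1 for m1 in o1 if not any(compatible(m1, m2) for m2 in o2)]


def optional(o1: Bag, o2: Bag) -> Bag:
    # Used for P1 OPTIONAL P2: extended matches plus unextended leftovers.
    return bag_union(join(o1, o2), difference(o1, o2))


def filter_bag(o: Bag, cond: Callable[[Mapping], bool]) -> Bag:
    # Used for P1 FILTER C: keep the mappings for which the condition holds.
    return [m for m in o if cond(m)]


companies = [{"?c": "TX"}, {"?c": "AB"}]
sizes = [{"?c": "TX", "?s": "large"}]
with_sizes = optional(companies, sizes)
print(with_sizes)
```

In this example the company without a known size is retained with its size variable left unbound, which is the behaviour OPTIONAL is meant to provide.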
The invention aims to provide an optimization algorithm for SPARQL query execution plan generation aiming at UNION, OPTIONAL and FILTER expressions in a graph database, which is used for solving the problem of low query efficiency in the existing graph database system.
The following describes a method for querying data provided by the present application in detail with reference to embodiments:
FIG. 1 is a flowchart of a method for querying data according to an embodiment of the present application. The method may be implemented by the computer device described above. Referring to FIG. 1, the embodiment includes:
Step 101, receiving a data query instruction sent by a data query application program.
Here, the data query application may be used to query graph data stored in a graph database, and the graph data may be stored in the form of RDF datasets. The graph data may describe, for example, the equity relationships between different enterprises, where the nodes in the graph data may include the name, size, and establishment time of an enterprise, and the edges in the graph data may represent relationships between enterprises, such as shareholding and share ratio. For example, an RDF triple in an RDF dataset may be: <http://example.com/TX> <http://example.com/name> "TX computer systems ltd, beijing, inc."; this RDF triple indicates that the company TX is named "TX computer systems ltd, beijing, inc.".
The data query application program can run in the computer device, a user can input a data query statement in the data query application program according to business requirements, and a data query instruction can be triggered after the data query statement is input. After receiving the data query instruction, the processor in the computer device may execute the following processing according to the data query statement carried in the data query instruction and input by the user.
Step 102, establishing a first query tree corresponding to the data query statement based on the structure of the data query statement.
After the data query statement is received, a query tree (also referred to as a BE tree) corresponding to the data query statement may be established according to the structure of the data query statement and of each query term in it.
The types of nodes in the query tree may include a merge node and a query node. A merge node may represent the data query statement or a sub-query statement in the data query statement; both the data query statement and its sub-query statements are graph patterns. When a merge node represents the data query statement itself, it is the root node of the first query tree. A query node represents a query term appearing in the data query statement or in a sub-query statement, such as BGP, UNION, OPTIONAL, or FILTER. A query node representing a BGP may be referred to as a BGP node, one representing UNION as a UNION node, one representing OPTIONAL as an OPTIONAL node, and one representing FILTER as a FILTER node.
The query tree built directly from the data query statements in this application may be referred to as the first query tree. It should be understood that, establishing the first query tree corresponding to the data query statement is to actually represent the data query statement and each sub-query statement by a merge node, represent query terms in the data query statement by a query node, and then establish a tree having the same structure as that between each sub-query statement and each query term in the data query statement.
For example, suppose the data query statement has the structure ((b1 AND (b2 UNION b3)) OPTIONAL (b4 UNION b5)) FILTER c1, where b1 to b5 and c1 may be represented as BGP nodes. (b2 UNION b3), (b1 AND (b2 UNION b3)), (b4 UNION b5), and (b1 AND (b2 UNION b3)) OPTIONAL (b4 UNION b5) are different sub-query statements and may each be represented by a merge node. The first query tree corresponding to the data query statement may be as shown in FIG. 2.
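The node types described above can be sketched as a small data structure. The following is only an illustrative sketch: the `Node` class, the `outline` helper, and the layout chosen for the sub-query statement (b1 AND (b2 UNION b3)) are assumptions made for illustration and are not taken from FIG. 2.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # kind is one of "MERGE", "BGP", "UNION", "OPTIONAL", "FILTER" (hypothetical encoding)
    kind: str
    label: str = ""
    children: List["Node"] = field(default_factory=list)


def outline(node: Node, indent: int = 0) -> List[str]:
    """Return one text line per node, indented by tree level."""
    line = " " * indent + node.kind + (f" {node.label}" if node.label else "")
    lines = [line]
    for child in node.children:
        lines.extend(outline(child, indent + 2))
    return lines


# Assumed layout for the sub-query statement (b1 AND (b2 UNION b3)):
# a merge node whose children are a BGP node and a UNION node.
tree = Node("MERGE", children=[
    Node("BGP", "b1"),
    Node("UNION", children=[Node("BGP", "b2"), Node("BGP", "b3")]),
])
print("\n".join(outline(tree)))
```

Keeping the node type as a plain string field keeps the sketch short; an implementation could equally use one class per node type.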
Step 103, simplify the first query tree based on the types of the nodes in the first query tree to obtain a second query tree.
After the first query tree corresponding to the query statement is obtained, simplification processing can be performed on the first query tree, and then the query operation corresponding to the data query statement can be executed according to the simplified query tree, so that the query operation corresponding to the data query statement can be simplified, and the efficiency of data query is improved.
It should be understood that simplifying the first query tree means merging, converting, and deleting nodes according to the query logic corresponding to the nodes in the first query tree, without changing the query result corresponding to the data query statement. The simplification differs according to the types of the nodes in the first query tree and is not described in detail here. The query tree obtained by simplifying the first query tree may be referred to as a second query tree.
Step 104, sequentially execute, in the graph database and based on a preset execution order, the query operation corresponding to each node in the second query tree to obtain a data query result.
After the second query tree is obtained, the query operations corresponding to the nodes may be sequentially executed according to the depths corresponding to the nodes in the second query tree, and finally, the query result of the node with the largest depth (i.e., the root node) may be obtained, where the query result is the query result of the data query statement in the data query instruction.
It should be noted that, among query nodes with the same depth, BGP nodes may be executed first, then the corresponding UNION nodes or OPTIONAL nodes, and finally FILTER nodes.
Step 105, returning the data query result to the data query application program.
After the data query result is obtained, it can be sent to the data query application program, which can display the data query result in a query result display interface for the user to view.
For example, suppose the data query statement includes the following triple pattern: ?x <http://example.com/stock holding> "Beijing TX Computer Systems Ltd.". After the above processing, all individuals, companies, and other entities holding stock in Beijing TX Computer Systems Ltd. can be displayed in the query result display interface.
According to the embodiment of the application, the data query statement is converted into the first query tree according to its structure, and the first query tree is then simplified using the query logic corresponding to each query node to obtain the second query tree. Simplifying the query tree simplifies the query operations of the data query statement and thereby improves the efficiency of data query.
The following describes in detail the simplification processing of the first query tree in step 103. The processing differs according to the node types in the first query tree, as follows:
as shown in fig. 3, fig. 3 is a method for simplifying processing provided by the present application, and the method includes:
step 301, the first query tree is determined as a third query tree to be simplified, and the depth of each node in the third query tree is determined.
Here, the depth of a leaf node a in the third query tree is $d(a) = 0$. For a node b other than a leaf node, the depth is $d(b) = \max\{ d(b_i) \mid b_i \text{ is a child node of } b \} + 1$.
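The depth definition above can be computed recursively. The following is a minimal sketch; the `Node` class and the example tree are hypothetical and only serve to exercise the definition.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    kind: str
    children: List["Node"] = field(default_factory=list)


def depth(node: Node) -> int:
    """d(leaf) = 0; otherwise d(b) = max(d(child)) + 1."""
    if not node.children:
        return 0
    return max(depth(child) for child in node.children) + 1


# A merge node with one BGP child and one UNION child that has two BGP children.
tree = Node("MERGE", [Node("BGP"), Node("UNION", [Node("BGP"), Node("BGP")])])
print(depth(tree))
```

With this definition the root node always has the largest depth, which is why the query result of the root is obtained last.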
Step 302, for the first merge node with the depth of 1 in the third query tree, if the child nodes of the first merge node include multiple BGP nodes, merge the multiple BGP nodes to obtain a merged first BGP node, delete the first merge node, and add the first BGP node to the location of the first merge node.
If a first merge node with a depth of 1 exists in the third query tree and the child nodes of the first merge node include multiple BGP nodes, the multiple BGP nodes are merged into one BGP node; the BGP node obtained after merging is the first BGP node. The mappings corresponding to the BGP nodes $P_1$ to $P_n$ before merging are $[[P_1]]_D$ to $[[P_n]]_D$, where n is the number of BGP nodes before merging, and the mapping corresponding to the merged BGP node $P_m$ is the natural join of these mappings:
$$[[P_m]]_D = [[P_1]]_D \bowtie [[P_2]]_D \bowtie \cdots \bowtie [[P_n]]_D$$
After the first BGP node is obtained, the first merge node with the original depth of 1 may be deleted and the first BGP node added at its position; that is, the first BGP node replaces the first merge node. Fig. 4 is a schematic diagram of step 302. After the first merge node with a depth of 1 is replaced by the corresponding first BGP node, the node depth is reduced, the data volume of intermediate results is reduced, and the efficiency of querying data can be improved.
No processing is performed for UNION nodes with a depth of 1. If the child nodes of the first merge node include a FILTER node in addition to multiple BGP nodes, then after the first BGP node is obtained by merging, the FILTER node may serve as a sibling node of the first BGP node and replace the first merge node together with it. The scope of the FILTER node is recorded, that is, the correspondence between the FILTER node and the first BGP node is recorded. Before each FILTER node is executed, the nodes included in the scope of the current FILTER node may be determined according to the correspondence recorded for it, and the query operation corresponding to the FILTER node is executed on the basis of the query results of the nodes in that scope.
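Step 302 can be sketched as follows, under the assumption that a BGP node is stored as a list of triple patterns, so that the natural join of the BGP mappings corresponds to concatenating their triple patterns. The `Node` class, the `scope` field, and the function name `collapse_depth1_merge` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Node:
    kind: str                                              # "MERGE", "BGP", "FILTER", ...
    triples: List[Triple] = field(default_factory=list)    # used by BGP nodes only
    children: List["Node"] = field(default_factory=list)
    scope: List["Node"] = field(default_factory=list)      # used by FILTER nodes only


def collapse_depth1_merge(merge: Node) -> List[Node]:
    """Return the nodes that take the place of a depth-1 merge node."""
    bgps = [c for c in merge.children if c.kind == "BGP"]
    filters = [c for c in merge.children if c.kind == "FILTER"]
    if len(bgps) < 2:
        return [merge]                      # nothing to merge, keep the node as is
    merged = Node("BGP", triples=[t for b in bgps for t in b.triples])
    for f in filters:
        f.scope = [merged]                  # record which node the FILTER applies to
    return [merged] + filters


m = Node("MERGE", children=[
    Node("BGP", [("?x", "p", "?y")]),
    Node("BGP", [("?y", "q", "?z")]),
    Node("FILTER"),
])
replacement = collapse_depth1_merge(m)
print([n.kind for n in replacement])
```

The caller would splice the returned nodes into the parent's child list at the position of the removed merge node.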
Step 303, for a second merge node with a depth of 2 in the third query tree, if child nodes of the second merge node include at least one BGP node and at least one UNION node, merging the at least one BGP node to obtain a merged second BGP node, and merging the at least one UNION node to obtain a merged third UNION node; and merging the second BGP node into a child node of the third UNION node to obtain a fourth UNION node, deleting the second merge node, and adding the fourth UNION node to the position of the second merge node.
If a second merge node with a depth of 2 exists in the third query tree and its child nodes include multiple BGP nodes, the BGP nodes are merged to obtain a second BGP node. For this process, refer to the process of obtaining the first BGP node in step 302, which is not repeated here.
If the child nodes of the second merge node further include multiple UNION nodes, the UNION nodes may be merged to obtain a third UNION node. For example, suppose two UNION nodes u1 and u2 need to be merged, and the mappings of the two child nodes (BGP nodes) of each UNION node are p1, p2 and q1, q2, respectively. The merged third UNION node then has four child nodes, whose mappings are, respectively, $p_1 \bowtie q_1$, $p_1 \bowtie q_2$, $p_2 \bowtie q_1$, and $p_2 \bowtie q_2$.
In general, if the child nodes of the second merge node include m UNION nodes, where the i-th UNION node $u_i$ has $n_i$ child nodes (BGP nodes), the third UNION node obtained by merging has $N = \prod_{i=1}^{m} n_i$ child nodes. Specifically, N different mapping sets may be determined from the BGP child nodes of the m UNION nodes. Each mapping set contains m BGP node mappings, and the BGP nodes to which the different mappings in a set belong are child nodes of different UNION nodes. The mapping of each child node of the third UNION node is obtained by taking the natural join of the mappings in one mapping set.
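The merging of several UNION nodes described above amounts to taking one BGP child from each UNION node in every possible combination. The sketch below assumes that a UNION node is a list of BGPs and a BGP is a list of triple patterns, so that joining the mappings of a combination corresponds to concatenating its triple patterns; the function name `merge_unions` is hypothetical.

```python
from itertools import product


def merge_unions(unions):
    """unions: list of UNION nodes, each given as a list of BGPs.

    Returns the children of the merged UNION node: one BGP per combination
    that picks exactly one child from every input UNION node.
    """
    merged = []
    for combo in product(*unions):
        merged.append([triple for bgp in combo for triple in bgp])
    return merged


u1 = [[("?x", "p1", "?y")], [("?x", "p2", "?y")]]   # children p1, p2
u2 = [[("?y", "q1", "?z")], [("?y", "q2", "?z")]]   # children q1, q2
u3 = merge_unions([u1, u2])
print(len(u3))
```

The number of children grows as the product of the input sizes, so an implementation may want to bound this step when many large UNION nodes are involved.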
After the BGP nodes and the UNION nodes have each been merged, the second BGP node may be merged into the child nodes of the third UNION node u3. For example, suppose the second BGP node has the mapping px, and the third UNION node has four child nodes whose mappings are, respectively, $p_1 \bowtie q_1$, $p_1 \bowtie q_2$, $p_2 \bowtie q_1$, and $p_2 \bowtie q_2$.
The second BGP node is merged into the child nodes of the third UNION node, and the resulting fourth UNION node u4 still has four child nodes, whose mappings are, respectively, $p_x \bowtie p_1 \bowtie q_1$, $p_x \bowtie p_1 \bowtie q_2$, $p_x \bowtie p_2 \bowtie q_1$, and $p_x \bowtie p_2 \bowtie q_2$.
After the fourth UNION node is obtained, the second merge node with the depth of 2 may be deleted and the fourth UNION node added at its position; that is, the fourth UNION node replaces the second merge node. Fig. 5 is a schematic diagram of step 303. After the second merge node with the depth of 2 is replaced by the corresponding fourth UNION node, the node depth is reduced, the data volume of intermediate results is reduced, and the efficiency of querying data can be improved.
It should be noted that if the child nodes of a merge node with a depth of 2 include only a UNION node, the UNION node may directly replace the merge node to reduce the node depth. If the child nodes include exactly one BGP node and one UNION node, the BGP node may be merged directly into the child nodes of the UNION node. If the child nodes include one BGP node and multiple UNION nodes, the UNION nodes may be merged first, and the BGP node then merged into the child nodes of the merged UNION node. If the child nodes include multiple BGP nodes and one UNION node, the BGP nodes may be merged first, and the merged BGP node then merged into the child nodes of the UNION node. In each case the node depth is reduced.
In addition, if the child nodes of the second merge node further include a FILTER node, then after the fourth UNION node is obtained by merging, the FILTER node may serve as a sibling node of the fourth UNION node and replace the second merge node together with it. The scope of the FILTER node is recorded, that is, the correspondence between the FILTER node and the fourth UNION node is recorded.
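The cases listed above can be handled by one routine. The following sketch again assumes that a BGP is a list of triple patterns and a UNION node is a list of BGPs; the type aliases and the function name `simplify_depth2_merge` are hypothetical, and FILTER handling is left out.

```python
from itertools import product
from typing import List, Tuple

Triple = Tuple[str, str, str]
BGP = List[Triple]          # a BGP as a list of triple patterns
UnionNode = List[BGP]       # a UNION node as the list of its BGP children


def simplify_depth2_merge(bgps: List[BGP], unions: List[UnionNode]) -> UnionNode:
    """Collapse the children of a depth-2 merge node into a single UNION node."""
    if not unions:
        raise ValueError("expected at least one UNION child")
    merged_bgp = [t for b in bgps for t in b]                 # the "second BGP node"
    merged_union = [[t for b in combo for t in b]             # the "third UNION node"
                    for combo in product(*unions)]
    return [merged_bgp + child for child in merged_union]     # the "fourth UNION node"


px = [("?x", "a", "?y")]
u1 = [[("?x", "p1", "?z")], [("?x", "p2", "?z")]]
u2 = [[("?y", "q1", "?w")], [("?y", "q2", "?w")]]
u4 = simplify_depth2_merge([px], [u1, u2])
print(len(u4))
```

When the merge node has no BGP child, the merged BGP is empty and the routine simply returns the merged UNION node, which matches the first special case above.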
Step 304, for a fifth UNION node with a depth of 2 in the third query tree, add the descendant (grandchild) nodes of the fifth UNION node to the child nodes of the fifth UNION node, and delete the descendant nodes at their original positions together with the parent nodes of those descendant nodes, to obtain the simplified third query tree.
If a fifth UNION node with a depth of 2 exists in the third query tree and its child nodes also include UNION nodes, the child nodes of those UNION nodes (that is, grandchild nodes of the fifth UNION node) may be merged into the child nodes of the fifth UNION node, and the grandchild nodes at their original positions and their parent nodes are deleted, so that the simplified query tree is obtained.
Referring to fig. 6, fig. 6 is a schematic diagram of step 304, so that after the descendant node of the fifth UNION node with the depth of 2 is deleted, the node depth can be reduced, the data amount of the intermediate result can be reduced, and the efficiency of querying data can be improved.
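Step 304 flattens a UNION node whose children are themselves UNION nodes. A minimal sketch, with a hypothetical `Node` class and function name:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    kind: str
    label: str = ""
    children: List["Node"] = field(default_factory=list)


def flatten_union(union: Node) -> Node:
    """Move the children of nested UNION nodes up one level and drop the nested nodes."""
    new_children = []
    for child in union.children:
        if child.kind == "UNION":
            new_children.extend(child.children)   # grandchildren become children
        else:
            new_children.append(child)
    union.children = new_children
    return union


u5 = Node("UNION", children=[
    Node("UNION", children=[Node("BGP", "b1"), Node("BGP", "b2")]),
    Node("UNION", children=[Node("BGP", "b3")]),
    Node("BGP", "b4"),
])
flatten_union(u5)
print([c.label for c in u5.children])
```

This relies on UNION being associative, so nesting does not affect the result set.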
It should be noted that steps 302 to 304 above merely describe different kinds of processing; no execution order among them is implied. After the processing of steps 302 to 304, a simplified query tree is obtained.
Step 305, determine the depth of each node in the simplified query tree.
Step 306, if there are still query nodes satisfying the simplification processing in steps 302-304 in the simplified query tree, determining the simplified query tree as a third query tree to be simplified, and jumping to the corresponding step to continue the simplification processing until there are no nodes capable of performing the simplification processing in the simplified query tree.
Because the processing in steps 302-304 changes the depth corresponding to each node in the query tree, the depth corresponding to each node in the simplified query tree can be determined after the simplified query tree is obtained. If there are still query nodes capable of performing the corresponding simplification processing in steps 302-304 in the simplified query tree, the corresponding simplification processing may be continued on the query nodes, and the depth corresponding to each node in the query tree is further reduced until it is determined that there are no nodes capable of performing simplification in the simplified query tree.
Therefore, after the cyclic processing of the steps, the depth of the query tree is obviously reduced, the data volume of the intermediate query result can be reduced, and the efficiency of querying data can be improved.
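The loop of steps 305 and 306 can be sketched as a fixed-point iteration: recompute depths, apply a rule wherever it matches, and stop when nothing changes. For brevity the sketch below uses only the nested-UNION rule of step 304; the `Node` class and the helper names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    kind: str
    label: str = ""
    children: List["Node"] = field(default_factory=list)


def depth(node: Node) -> int:
    return 0 if not node.children else max(depth(c) for c in node.children) + 1


def flatten_once(root: Node) -> bool:
    """Apply the nested-UNION rule to one depth-2 UNION node; report whether the tree changed."""
    stack = [root]
    while stack:
        n = stack.pop()
        if n.kind == "UNION" and depth(n) == 2 and any(c.kind == "UNION" for c in n.children):
            n.children = [g for c in n.children
                          for g in (c.children if c.kind == "UNION" else [c])]
            return True
        stack.extend(n.children)
    return False


def simplify(root: Node) -> int:
    """Repeat until no rule applies; depths are recomputed on every pass."""
    rounds = 0
    while flatten_once(root):
        rounds += 1
    return rounds


def b(name: str) -> Node:
    return Node("BGP", name)


root = Node("UNION", children=[
    Node("UNION", children=[Node("UNION", children=[b("b1"), b("b2")]), b("b3")]),
    b("b4"),
])
rounds = simplify(root)
print(rounds, [c.label for c in root.children])
```

In the example the root initially has depth 3, so the rule first fires on the inner depth-2 node; only after that pass does the root itself reach depth 2 and become eligible.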
As shown in fig. 7, fig. 7 is a method for simplifying processing provided by the present application, and the method includes:
step 701, determine a first OPTIONAL node in the first query tree, the first OPTIONAL node being an OPTIONAL node none of whose ancestor nodes is an OPTIONAL node.
Step 702, convert the sub query tree whose root node is the parent node of the first OPTIONAL node into a third BGP node.
The process of fig. 7 may be implemented in combination with the process of fig. 3; that is, step 701 may be executed before step 301. If it is determined in step 701 that the first query tree contains a first OPTIONAL node, the sub query tree whose root node is the parent node of the first OPTIONAL node may be treated as a single BGP node. That is, this sub query tree is deleted from the first query tree and the third BGP node is added at the position of the parent node of the first OPTIONAL node. As shown in fig. 8, the sub query tree whose root node is merge node 2 may be converted into the node BGP3.
After the processing of step 702 is completed, the processing of fig. 3 may be performed, and after the processing of fig. 3 is completed, a query operation may be performed on each node in the query tree obtained through the processing of fig. 3. The query operation related to the third BGP node may be executed in either of the following two processing modes:
the first processing mode is as follows:
when the first query operation corresponding to the third BGP node is executed, the sub query tree corresponding to the third BGP node may be determined, where the sub query tree is the sub query tree whose root node is the parent node of the first OPTIONAL node in step 702.
When the query operations corresponding to the nodes of the sub query tree are executed, the query operation corresponding to the sibling node of the first OPTIONAL node may be executed first. The first query result obtained, that is, the subgraph found in the graph data, is used as the data query range of the descendant nodes of the first OPTIONAL node, and the query operations corresponding to the descendant nodes of the first OPTIONAL node are then executed within that range. In this way, the amount of data queried when executing the query operations of the descendant nodes of the first OPTIONAL node is reduced, and the execution efficiency of the query operations can be improved.
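The idea of evaluating the sibling of the OPTIONAL node first and restricting the optional part to that result can be sketched as follows. This is only an illustration under stated assumptions: the graph is a small in-memory list of triples, variables are strings starting with "?", and the restriction is implemented by seeding the optional part with the sibling's variable bindings rather than by materializing a subgraph. All function names are hypothetical.

```python
def match(pattern, graph, binding):
    """Yield extensions of `binding` that match one triple pattern against `graph`."""
    for triple in graph:
        candidate = dict(binding)
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield candidate


def eval_bgp(bgp, graph, seeds=None):
    """Evaluate a list of triple patterns, starting from the given bindings."""
    results = [dict()] if seeds is None else list(seeds)
    for pattern in bgp:
        results = [ext for b in results for ext in match(pattern, graph, b)]
    return results


def eval_optional(required, optional, graph):
    """Evaluate `required` first, then the optional part only from those results."""
    out = []
    for binding in eval_bgp(required, graph):
        extended = eval_bgp(optional, graph, seeds=[binding])
        out.extend(extended if extended else [binding])   # keep binding if optional part fails
    return out


graph = [("a", "knows", "b"), ("b", "email", "b@x"), ("c", "knows", "d")]
rows = eval_optional([("?x", "knows", "?y")], [("?y", "email", "?e")], graph)
print(rows)
```

Because the optional part is only ever evaluated for bindings produced by the required part, it never scans for matches that could not appear in the final result.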
The second processing mode:
after the sub query tree corresponding to the third BGP node is determined, it may be determined whether the descendant nodes of the first OPTIONAL node further include a second OPTIONAL node.
If it is determined that the descendant nodes of the first OPTIONAL node include at least one second OPTIONAL node, the query operations corresponding to the first sibling node of the first OPTIONAL node and the second sibling node of each second OPTIONAL node may be executed sequentially according to the depths of the first OPTIONAL node and the at least one second OPTIONAL node.
The data query range of the sibling node corresponding to each second OPTIONAL node is the query result of the sibling node corresponding to the previous OPTIONAL node.
Therefore, when the query operation of the sibling node corresponding to each OPTIONAL node is executed, it can be executed on the basis of the query result of the sibling node corresponding to the previous OPTIONAL node, which reduces the amount of data queried by each query operation and further improves the query efficiency.
After the query results of the sibling nodes corresponding to the OPTIONAL nodes are obtained, the query operations corresponding to the child nodes of the first OPTIONAL node and of the at least one second OPTIONAL node may be executed sequentially, again according to the depths of the first OPTIONAL node and the at least one second OPTIONAL node.
The data query range corresponding to the child nodes of any OPTIONAL node is the query result corresponding to the sibling node of that OPTIONAL node. Therefore, when the query operation of the child nodes of each OPTIONAL node is executed, it can be executed on the basis of the query result of the sibling node, which reduces the amount of data queried by each query operation and further improves the query efficiency.
As shown in fig. 9, fig. 9 is a method for simplifying processing provided by the present application, and the method includes:
step 901, for the FILTER node in the first query tree, if the FILTER condition corresponding to the FILTER node satisfies the preset conversion condition, converting the FILTER condition into a disjunctive normal form.
Before the above step 301 is executed, if it is determined that a FILTER node exists in the first query tree, it may be determined whether the FILTER condition corresponding to the FILTER node satisfies the preset conversion condition. The conversion condition is that the FILTER condition corresponding to the FILTER node consists only of variables, constants, and the three operators AND, OR, and equals.
Step 902, based on the disjunctive normal form, transform the FILTER node into the UNION node.
If it is determined that the FILTER condition corresponding to the FILTER node satisfies the preset conversion condition, the FILTER condition may be converted into a disjunctive normal form f1 || f2 || ... || fm, where each fi is a constraint on variables; a constraint can be regarded as a correspondence between a variable and its constraint value. After the disjunctive normal form f1 || f2 || ... || fm is obtained, if every pair fi and fj (i ≠ j) is determined to be incompatible, that is, no mapping satisfies fi ∧ fj for any pair, the variables that are constrained by the FILTER condition and appear in query nodes of the first query tree are assigned the constraint values in turn using BIND clauses, and the FILTER node is then converted into a UNION node. The specific processing is as follows:
A UNION node may be added as a sibling of the FILTER node, and m child nodes, all of them merge nodes, are added to this UNION node (m is the number of constraints in the disjunctive normal form). For each constraint fi, the variables in the query nodes constrained by the FILTER condition (that is, the sibling nodes of the FILTER node other than the newly added UNION node) are assigned the constraint values in fi using BIND clauses, and the resulting nodes are attached as child nodes to the i-th child node of the newly added UNION node. Finally, all nodes in this layer other than the newly added UNION node are deleted. After step 902 is completed, the process proceeds to step 301.
Thus, the FILTER node can be converted into a UNION node to participate in the simplification process of fig. 3, and the query tree can be simplified to a certain extent, thereby improving the data query efficiency.
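The conversion of a FILTER condition in disjunctive normal form into UNION branches can be sketched as follows. The sketch makes several simplifying assumptions: each constraint fi is given as a dictionary from variable to constant, the constrained sibling is a single BGP given as a list of triple patterns, and a BIND assignment is modeled by substituting the constant for the variable while keeping the assignment alongside the branch. The function names are hypothetical.

```python
from itertools import combinations


def incompatible(f1, f2):
    """True if no mapping can satisfy both constraints (they disagree on a shared variable)."""
    return any(var in f2 and f2[var] != const for var, const in f1.items())


def filter_to_union(bgp, dnf):
    """Build one UNION branch per constraint f_i of the disjunctive normal form."""
    if not all(incompatible(a, b) for a, b in combinations(dnf, 2)):
        raise ValueError("constraints must be pairwise incompatible")
    branches = []
    for constraint in dnf:
        # Substitute the constrained variables; other terms are left untouched.
        bound = [tuple(constraint.get(term, term) for term in pattern) for pattern in bgp]
        branches.append({"bind": dict(constraint), "bgp": bound})
    return branches


bgp = [("?x", "type", "?t"), ("?x", "name", "?n")]
dnf = [{"?t": "Person"}, {"?t": "Company"}]      # FILTER(?t = Person || ?t = Company)
branches = filter_to_union(bgp, dnf)
print(branches[0]["bgp"])
```

Each branch is now an ordinary BGP with fewer free variables, which is what allows it to take part in the UNION merging rules of fig. 3.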
As shown in fig. 10, fig. 10 is a method for simplifying processing provided by the present application, and the method includes:
step 1001, if there are multiple BGP nodes that can be executed in parallel, determining a common triplet pattern corresponding to the multiple BGP nodes.
Two triple patterns $t_1 = \langle s_1, p_1, o_1 \rangle$ and $t_2 = \langle s_2, p_2, o_2 \rangle$ are said to be equivalent (denoted $t_1 \simeq t_2$) if and only if the following conditions hold: 1. s1 and s2 are both variables; 2. p1 and p2 are the same predicate; 3. o1 and o2 are both variables or are the same constant.
If $t_1 \simeq t_2$, then $\mu(t_1, t_2)$ is a bijection from $Var(t_1)$ to $Var(t_2)$, where $Var(t_1)$ denotes the set of variables that occur in $t_1$.
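The equivalence test for two triple patterns and the associated variable mapping can be sketched directly from the three conditions. The sketch assumes that terms are strings and that variables start with "?"; the function names are hypothetical. The mapping helper additionally returns `None` when the positional pairing of variables is not one-to-one, which is a defensive check added here.

```python
def is_var(term):
    return term.startswith("?")


def equivalent(t1, t2):
    """Check the three conditions for triple pattern equivalence."""
    (s1, p1, o1), (s2, p2, o2) = t1, t2
    subjects_ok = is_var(s1) and is_var(s2)
    predicates_ok = p1 == p2
    objects_ok = (is_var(o1) and is_var(o2)) or (
        not is_var(o1) and not is_var(o2) and o1 == o2)
    return subjects_ok and predicates_ok and objects_ok


def variable_mapping(t1, t2):
    """Return the variable mapping from Var(t1) to Var(t2), or None if there is none."""
    if not equivalent(t1, t2):
        return None
    mapping = {}
    for a, b in ((t1[0], t2[0]), (t1[2], t2[2])):
        if is_var(a) and mapping.setdefault(a, b) != b:
            return None                                   # not a function
    if len(set(mapping.values())) != len(mapping):
        return None                                       # not injective
    return mapping


print(variable_mapping(("?x", "knows", "?y"), ("?a", "knows", "?b")))
```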
Given two BGPs $b_i$ and $b_j$ whose triple pattern sequences are $S_i = (t_{i1}, \ldots, t_{ik})$ and $S_j = (t_{j1}, \ldots, t_{jk'})$, respectively, $S_i$ and $S_j$ are said to be equivalent (denoted $S_i \simeq S_j$) if and only if the following conditions hold: 1. the sequences $S_i$ and $S_j$ have the same length, namely $k = k'$; 2. $t_{il} \simeq t_{jl}$ holds for every $1 \le l \le k$; 3. $\mu(t_{i1}, t_{j1}) \cup \cdots \cup \mu(t_{ik}, t_{jk})$ is still a bijection from $Var(S_i)$ to $Var(S_j)$, where $Var(S_i)$ denotes the set of variables occurring in $S_i$.
Based on the above definition, if there are multiple BGPs that can be executed in parallel in the query graph, a frequent subgraph mining algorithm may be used to find the common sub-queries C = {c1, …, cn} among the BGPs, where each ci is an equivalent triple pattern subsequence in these BGPs. The common sub-queries C are the common triple patterns corresponding to the multiple BGP nodes.
Step 1002, determine, based on a greedy algorithm, the partial common triple patterns with the lowest corresponding query cost among the common triple patterns.
Selecting common sub-queries with high selectivity: given a BGP set $B = \{b_1, \ldots, b_m\}$ and the set of common sub-queries $C = \{c_1, \ldots, c_n\}$ among them, a subset of common sub-queries $C_S \subseteq C$ is selected to minimize the cost:
$$Cost(B, C_S) = \sum_{c_i \in C_S} Cost(c_i) + \sum_{b_j \in B} Cost(b_j \mid C_S)$$
where $Cost(B, C_S)$ is the matching cost of the BGP set, $Cost(c_i)$ is the matching cost of a common sub-query $c_i$ selected into the subset $C_S$, and $Cost(b_j \mid C_S)$ is the matching cost of querying the remaining triple patterns of $b_j$ given the results of all common sub-queries selected into $C_S$. The matching cost may be the computing resources consumed in querying the corresponding triple patterns, the time taken, or the like. The matching cost may be determined according to the variables in each triple pattern, and the matching cost corresponding to each variable may be set in advance by a technician.
$Cost(c_i)$ and $Cost(b_j \mid C_S)$ are defined in terms of the selectivity $sel(t)$ of a triple pattern t as follows:
$$Cost(c_i) = \min\{ sel(t) \mid t \in c_i \} \times |c_i|$$
$$Cost(b_j \mid C_S) = \min\{ sel(t) \mid t \in b'_j \} \times |b'_j|$$
where $b'_j = b_j \setminus \bigcup_{c_i \in C_S} c_i$, that is, the part of the BGP $b_j$ that is not covered by the common sub-query subset $C_S$.
Minimizing the above objective is an NP-hard problem, so a greedy algorithm is used to select $C_S$. $C_S$ is initialized to the empty set. In each step, the common sub-query $c_i \in C$ that maximizes $\Delta = Cost(B, C_S) - Cost(B, C_S \cup \{c_i\})$ is selected and added to $C_S$; this is repeated until no more common sub-queries can be added to $C_S$. The $C_S$ finally obtained is the set of partial common triple patterns with the lowest query cost.
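The greedy selection can be sketched as follows. The sketch rests on several assumptions that are not stated in the text: triple patterns are represented by hashable identifiers, a BGP and a common sub-query are both sets of such identifiers, a sub-query covers part of a BGP only when it is a subset of that BGP, the cost of an empty remainder is 0, and the loop stops as soon as the best candidate no longer gives a positive gain. The selectivity table `sel` and all function names are hypothetical.

```python
def cost_of(triples, sel):
    """min{sel(t)} * |triples|, or 0 for an empty set (assumption)."""
    return min(sel[t] for t in triples) * len(triples) if triples else 0


def total_cost(bgps, chosen, sel):
    """Cost(B, C_S): cost of the chosen sub-queries plus cost of each BGP's remainder."""
    cost = sum(cost_of(c, sel) for c in chosen)
    for b in bgps:
        covered = set().union(*[c for c in chosen if c <= b])
        cost += cost_of(b - covered, sel)
    return cost


def greedy_select(bgps, candidates, sel):
    """Repeatedly add the candidate with the largest positive cost reduction."""
    chosen, remaining = [], list(candidates)
    while remaining:
        base = total_cost(bgps, chosen, sel)
        best = max(remaining, key=lambda c: base - total_cost(bgps, chosen + [c], sel))
        if base - total_cost(bgps, chosen + [best], sel) <= 0:
            break                                   # stopping rule is an assumption
        chosen.append(best)
        remaining.remove(best)
    return chosen


sel = {"t1": 1, "t2": 1, "t3": 1, "t4": 1}
b1 = frozenset({"t1", "t2", "t3"})
b2 = frozenset({"t1", "t2", "t4"})
c1 = frozenset({"t1", "t2"})
c2 = frozenset({"t2"})
selected = greedy_select([b1, b2], [c1, c2], sel)
print(selected == [c1])
```

In the example, evaluating c1 once and reusing it in both BGPs lowers the total cost from 6 to 4, while adding c2 afterwards would raise it again, so only c1 is selected.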
Step 1003, query the graph data for the data corresponding to the partial common triple patterns.
Step 1004, query the graph data for the data corresponding to the triple patterns in the plurality of BGP nodes other than the partial common triple patterns.
When the query operations corresponding to the plurality of BGP nodes are executed, matching may first be performed for all selected common sub-queries $c_i \in C_S$, and the intermediate results $[[c_i]]$ are cached. When matching is then performed for each BGP node $b_j$, the result set may be computed as follows:
$$[[b_j]] = [[c_{j_1}]] \bowtie \cdots \bowtie [[c_{j_l}]] \bowtie [[b'_j]]$$
where each $c_{j_k} \in C_S$ is a common sub-query contained in the triple pattern sequence of $b_j$, and $b'_j$, as defined above, is the part of $b_j$ not covered by the common sub-query subset $C_S$.
Therefore, when the query processing corresponding to the multiple BGP nodes is executed, the partial common triple patterns corresponding to the multiple BGP nodes may be queried first, and the query processing corresponding to the remaining triple patterns in the multiple BGP nodes is then executed, so that the amount of data queried is reduced and the query speed is improved.
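Reusing cached results of common sub-queries can be sketched as a chain of natural joins. The sketch assumes that query results are lists of variable bindings (dictionaries) and that the cached results already use the variable names of the BGP being evaluated, so the variable renaming between equivalent sub-queries is omitted. The cache layout and function names are hypothetical.

```python
def natural_join(left, right):
    """Combine bindings that agree on all shared variables."""
    out = []
    for a in left:
        for b in right:
            if all(a[k] == b[k] for k in a.keys() & b.keys()):
                out.append({**a, **b})
    return out


def eval_with_cache(common_ids, remainder_result, cache):
    """Join the result of the uncovered part with each cached common sub-query result."""
    result = remainder_result
    for cid in common_ids:
        result = natural_join(result, cache[cid])
    return result


# Cached result of a common sub-query, computed once and shared by several BGPs.
cache = {"c1": [{"?x": "a", "?y": "b"}, {"?x": "c", "?y": "d"}]}
remainder = [{"?y": "b", "?z": "e"}]            # result of the part not covered by C_S
rows = eval_with_cache(["c1"], remainder, cache)
print(rows)
```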
All the above optional technical solutions may be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
Fig. 11 is a schematic structural diagram of an apparatus for querying data according to an embodiment of the present application, where the apparatus may be a computer device in the foregoing embodiment, and referring to fig. 11, the apparatus includes:
a receiving module 1110, configured to receive a data query instruction sent by a data query application, where the data query instruction carries a data query statement;
an establishing module 1120, configured to establish a first query tree corresponding to the data query statement based on the structure of the data query statement;
a processing module 1130, configured to perform simplification processing on the first query tree based on the type of each node in the first query tree, to obtain a second query tree;
the query module 1140 is configured to sequentially execute query operations corresponding to nodes in the second query tree in the graph database based on a preset execution sequence to obtain a data query result;
a returning module 1150, configured to return the data query result to the data query application.
Optionally, the types of the nodes in the first query tree include a merge node and a query node, where the merge node is used to represent the data query statement or a sub-query statement in the data query statement, the query node is used to represent a query term in the data query statement or in a sub-query statement of the data query statement, and the query node includes at least one of a BGP node, a UNION node, an OPTIONAL node, and a FILTER node.
Optionally, the processing module 1130 is configured to:
determining the first query tree as a third query tree to be simplified, and determining the depth of each node in the third query tree;
for a first merge node with a depth of 1 in the third query tree, if child nodes of the first merge node include multiple BGP nodes, merging the multiple BGP nodes to obtain a merged first BGP node, deleting the first merge node, and adding the first BGP node to the location of the first merge node;
for a second merge node with a depth of 2 in the third query tree, if child nodes of the second merge node include at least one BGP node and at least one UNION node, merging the at least one BGP node to obtain a merged second BGP node, and merging the at least one UNION node to obtain a merged third UNION node; merging the second BGP node into a child node of the third UNION node to obtain a fourth UNION node, deleting the second merge node, and adding the fourth UNION node to the position of the second merge node;
and for a fifth UNION node with a depth of 2 in the third query tree, adding the descendant (grandchild) nodes of the fifth UNION node to the child nodes of the fifth UNION node, and deleting the descendant nodes at their original positions together with the parent nodes of those descendant nodes, to obtain a simplified third query tree.
Optionally, the processing module 1130 is further configured to:
determining a first OPTIONAL node in the first query tree, the first OPTIONAL node being an OPTIONAL node none of whose ancestor nodes is an OPTIONAL node;
and converting the sub query tree whose root node is the parent node of the first OPTIONAL node into a third BGP node.
Optionally, the query module 1140 is configured to:
when the first query operation corresponding to the third BGP node is executed, determining a sub query tree corresponding to the third BGP node;
executing the query operation corresponding to the sibling node of the first OPTIONAL node in the sub query tree to obtain a first query result; determining the first query result as the data query range of the descendant nodes of the first OPTIONAL node; and executing the query operations corresponding to the descendant nodes of the first OPTIONAL node based on the data query range.
Optionally, the query module 1140 is configured to:
when the first query operation corresponding to the third BGP node is executed, if it is determined that the descendant nodes of the first OPTIONAL node include at least one second OPTIONAL node:
sequentially executing query operations corresponding to a first sibling node of the first OPTIONAL node and a second sibling node of each of the at least one second OPTIONAL node according to the depths of the first OPTIONAL node and the at least one second OPTIONAL node, wherein the data query range corresponding to the second sibling node of each second OPTIONAL node is the query result corresponding to the sibling node of the previous OPTIONAL node;
and sequentially executing query operations corresponding to the child nodes of the first OPTIONAL node and the child nodes of the at least one second OPTIONAL node according to the depths of the first OPTIONAL node and the at least one second OPTIONAL node, wherein the data query range corresponding to the child nodes of any OPTIONAL node is the query result corresponding to the sibling node of that OPTIONAL node.
Optionally, the processing module 1130 is configured to:
for the FILTER node in the first query tree, if the FILTER condition corresponding to the FILTER node meets a preset conversion condition, converting the FILTER condition into a disjunctive normal form;
and converting the FILTER node into a UNION node based on the disjunctive normal form.
Optionally, the conversion condition is that the FILTER condition corresponding to the FILTER node consists only of variables, constants, and the three operators AND, OR, and equals.
Optionally, the query module 1140 is configured to:
if a plurality of BGP nodes which can be executed in parallel exist, determining the common triple patterns corresponding to the plurality of BGP nodes;
determining, based on a greedy algorithm, the partial common triple patterns with the lowest corresponding query cost among the common triple patterns;
querying the graph data for the data corresponding to the partial common triple patterns;
and querying the graph data for the data corresponding to the triple patterns in the plurality of BGP nodes other than the partial common triple patterns.
It should be noted that, when the apparatus for querying data provided in the foregoing embodiment queries data, the division into the functional modules described above is only an example. In practical applications, the functions may be allocated to different functional modules as needed; that is, the internal structure of the computer device may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus for querying data and the method for querying data provided in the foregoing embodiments belong to the same concept; for the specific implementation process, refer to the method embodiments, which are not repeated here.
Fig. 12 shows a block diagram of a computer device 1200 according to an exemplary embodiment of the present application. The computer device 1200 may be a portable mobile terminal, such as: a smart phone, a tablet computer, an MP3 player (moving picture experts group audio layer III, motion picture experts group audio layer 3), an MP4 player (moving picture experts group audio layer IV, motion picture experts group audio layer 4), a notebook computer, or a desktop computer. Computer device 1200 may also be referred to by other names such as user equipment, portable terminals, laptop terminals, desktop terminals, and the like.
Generally, computer device 1200 includes: a processor 1201 and a memory 1202.
The processor 1201 may include one or more processing cores, such as a 4-core processor, an 8-core processor, or the like. The processor 1201 may be implemented in at least one hardware form of a DSP (digital signal processing), an FPGA (field-programmable gate array), and a PLA (programmable logic array). The processor 1201 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 1201 may be integrated with a GPU (graphics processing unit) for rendering and drawing content that the display screen needs to display. In some embodiments, the processor 1201 may further include an AI (artificial intelligence) processor for processing a calculation operation related to machine learning.
Memory 1202 may include one or more computer-readable storage media, which may be non-transitory. Memory 1202 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 1202 is used to store at least one instruction for execution by processor 1201 to implement a method of querying data as provided by method embodiments herein.
In some embodiments, the computer device 1200 may further optionally include: a peripheral interface 1203 and at least one peripheral. The processor 1201, memory 1202, and peripheral interface 1203 may be connected by a bus or signal line. Various peripheral devices may be connected to peripheral interface 1203 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 1204, display 1205, camera assembly 1206, audio circuitry 1207, positioning assembly 1208, and power supply 1209.
The peripheral interface 1203 may be used to connect at least one peripheral associated with I/O (input/output) to the processor 1201 and the memory 1202. In some embodiments, the processor 1201, memory 1202, and peripheral interface 1203 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1201, the memory 1202 and the peripheral device interface 1203 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 1204 is used for receiving and transmitting RF (radio frequency) signals, also called electromagnetic signals. The radio frequency circuit 1204 communicates with a communication network and other communication devices by electromagnetic signals. The radio frequency circuit 1204 converts an electric signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electric signal. Optionally, the radio frequency circuit 1204 comprises: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 1204 may communicate with other terminals through at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: the world wide web, metropolitan area networks, intranets, generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (wireless fidelity) networks. In some embodiments, the rf circuit 1204 may further include NFC (near field communication) related circuits, which are not limited in this application.
The display screen 1205 is used to display a UI (user interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 1205 is a touch display screen, the display screen 1205 also has the ability to acquire touch signals on or over its surface. The touch signal may be input to the processor 1201 as a control signal for processing. In this case, the display screen 1205 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 1205, disposed on the front panel of the computer device 1200; in other embodiments, there may be at least two display screens 1205, each disposed on a different surface of the computer device 1200 or in a folded design; in still other embodiments, the display screen 1205 may be a flexible display screen disposed on a curved surface or a folded surface of the computer device 1200. The display screen 1205 may even be arranged in a non-rectangular irregular shape, i.e., a shaped screen. The display screen 1205 may be made of materials such as an LCD (liquid crystal display) or an OLED (organic light-emitting diode).
Camera assembly 1206 is used to capture images or video. Optionally, camera assembly 1206 includes a front camera and a rear camera. Generally, the front camera is disposed on the front panel of the terminal, and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting and VR (virtual reality) shooting functions or other fused shooting functions. In some embodiments, camera assembly 1206 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation at different color temperatures.
The audio circuitry 1207 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals into the processor 1201 for processing or inputting the electric signals into the radio frequency circuit 1204 to achieve voice communication. For stereo capture or noise reduction purposes, the microphones may be multiple and located at different locations on the computer device 1200. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 1201 or the radio frequency circuit 1204 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, the audio circuitry 1207 may also include a headphone jack.
The positioning component 1208 is used to locate the current geographic location of the computer device 1200 for navigation or LBS (location based service). The positioning component 1208 may be a positioning component based on the United States' GPS (global positioning system), China's BeiDou system, Russia's GLONASS system, or the European Union's Galileo system.
The power supply 1209 is used to power the various components in the computer device 1200. The power source 1209 may be alternating current, direct current, disposable or rechargeable. When the power source 1209 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, the computer device 1200 also includes one or more sensors 1210. The one or more sensors 1210 include, but are not limited to: acceleration sensor 1211, gyro sensor 1212, pressure sensor 1213, fingerprint sensor 1214, optical sensor 1215, and proximity sensor 1216.
The acceleration sensor 1211 may detect magnitudes of accelerations on three coordinate axes of a coordinate system established with the computer apparatus 1200. For example, the acceleration sensor 1211 may be used to detect components of the gravitational acceleration in three coordinate axes. The processor 1201 may control the display screen 1205 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 1211. The acceleration sensor 1211 may also be used for acquisition of motion data of a game or a user.
The gyro sensor 1212 may detect a body direction and a rotation angle of the computer device 1200, and the gyro sensor 1212 may collect a 3D motion of the user on the computer device 1200 in cooperation with the acceleration sensor 1211. The processor 1201 can implement the following functions according to the data collected by the gyro sensor 1212: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
Pressure sensors 1213 may be disposed on the side bezel of computer device 1200 and/or underlying display 1205. When the pressure sensor 1213 is disposed on the side frame of the computer device 1200, the holding signal of the user to the computer device 1200 can be detected, and the processor 1201 performs left-right hand recognition or quick operation according to the holding signal acquired by the pressure sensor 1213. When the pressure sensor 1213 is disposed at a lower layer of the display screen 1205, the processor 1201 controls the operability control on the UI interface according to the pressure operation of the user on the display screen 1205. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 1214 is used for collecting a fingerprint of the user, and the processor 1201 identifies the user according to the fingerprint collected by the fingerprint sensor 1214, or the fingerprint sensor 1214 identifies the user according to the collected fingerprint. When the user identity is identified as a trusted identity, the processor 1201 authorizes the user to perform relevant sensitive operations, including unlocking a screen, viewing encrypted information, downloading software, paying, changing settings, and the like. The fingerprint sensor 1214 may be disposed on the front, back, or side of the computer device 1200. When a physical key or vendor Logo is provided on the computer device 1200, the fingerprint sensor 1214 may be integrated with the physical key or vendor Logo.
Optical sensor 1215 is used to collect ambient light intensity. In one embodiment, the processor 1201 may control the display brightness of the display screen 1205 according to the ambient light intensity collected by the optical sensor 1215. Specifically, when the ambient light intensity is high, the display brightness of the display screen 1205 is increased; when the ambient light intensity is low, the display brightness of the display screen 1205 is decreased. In another embodiment, the processor 1201 may also dynamically adjust the shooting parameters of the camera assembly 1206 based on the ambient light intensity collected by the optical sensor 1215.
A proximity sensor 1216, also called a distance sensor, is generally provided on the front panel of the computer device 1200. The proximity sensor 1216 is used to collect the distance between the user and the front of the computer device 1200. In one embodiment, when the proximity sensor 1216 detects that the distance between the user and the front of the computer device 1200 is gradually decreasing, the processor 1201 controls the display screen 1205 to switch from the bright screen state to the dark screen state; when the proximity sensor 1216 detects that the distance between the user and the front of the computer device 1200 is gradually increasing, the processor 1201 controls the display screen 1205 to switch from the dark screen state to the bright screen state.
Those skilled in the art will appreciate that the architecture illustrated in FIG. 12 is not intended to be limiting of the computer device 1200, and may include more or fewer components than those illustrated, or some components may be combined, or a different arrangement of components may be used.
In an exemplary embodiment, a computer-readable storage medium, such as a memory, including instructions executable by a processor in a terminal to perform the method of querying data in the above embodiments is also provided. The computer readable storage medium may be non-transitory. For example, the computer-readable storage medium may be a ROM (read-only memory), a RAM (random access memory), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program product is also provided, which includes at least one instruction that is loaded and executed by a processor to implement the method for querying data in the above embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The terms "first," "second," and the like, in this application, are used for distinguishing between similar items and items that have substantially the same function or similar functionality, and it is to be understood that "first" and "second" do not have a logical or temporal dependency, nor do they define a quantity or order of execution. It will be further understood that, although the following description uses the terms first, second, etc. to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another.
The term "at least one" in this application means one or more, and the term "plurality" in this application means two or more.
The above description is only a specific embodiment of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think of various equivalent modifications or substitutions within the technical scope of the present application, and these modifications or substitutions should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (12)

1. A method of querying data, the method comprising:
receiving a data query instruction sent by a data query application program, wherein the data query instruction carries a data query statement;
establishing a first query tree corresponding to the data query statement based on the structure of the data query statement;
simplifying the first query tree based on the types of the nodes in the first query tree to obtain a second query tree;
based on a preset execution sequence, sequentially executing query operations corresponding to all nodes in the second query tree in a graph database to obtain a data query result;
and returning the data query result to the data query application program.
2. The method of claim 1, wherein the types of nodes in the first query tree include a merge node and a query node, wherein the merge node is used for representing the data query statement or a sub-query statement in the data query statement, the query node is used for representing a query term in the data query statement or in a sub-query statement of the data query statement, and the query node includes at least one of a basic graph pattern (BGP) node, a union (UNION) node, an optional-match (OPTIONAL) node, and a filter (FILTER) node.
3. The method of claim 2, wherein the simplifying the first query tree based on the type of each node in the first query tree to obtain a second query tree comprises:
determining the first query tree as a third query tree to be simplified, and determining the depth of each node in the third query tree;
for a first merge node with a depth of 1 in the third query tree, if child nodes of the first merge node include multiple BGP nodes, merging the multiple BGP nodes to obtain a merged first BGP node, deleting the first merge node, and adding the first BGP node to the location of the first merge node;
for a second merge node with a depth of 2 in the third query tree, if child nodes of the second merge node include at least one BGP node and at least one UNION node, merging the at least one BGP node to obtain a merged second BGP node, and merging the at least one UNION node to obtain a merged third UNION node; merging the second BGP node into a child node of the third UNION node to obtain a fourth UNION node, deleting the second merging node, and adding the fourth UNION node to the position of the second merging node;
and for a fifth UNION node with the depth of 2 in the third query tree, adding a descendant node of the fifth UNION node into a child node of the fifth UNION node, and deleting a descendant node of the fifth UNION node and a parent node of the descendant node to obtain a simplified third query tree.
4. The method of claim 3, wherein prior to determining the first query tree as a third query tree to be simplified, the method further comprises:
determining a first OPTIONAL node among the OPTIONAL nodes in the first query tree, wherein no OPTIONAL node exists among the ancestor nodes of the first OPTIONAL node in the first query tree;
and converting the child query tree taking the parent node of the first OPTIONAL node as a root node into a third BGP node.
5. The method according to claim 4, wherein said sequentially performing query operations corresponding to nodes in said second query tree in a graph database comprises:
when the first query operation corresponding to the third BGP node is executed, determining a sub query tree corresponding to the third BGP node;
executing the query operation corresponding to the sibling node of the first OPTIONAL node in the sub query tree to obtain a first query result; determining the first query result as the data query range of the descendant nodes of the first OPTIONAL node; and executing the query operation corresponding to the descendant nodes of the first OPTIONAL node based on the data query range.
6. The method according to claim 4, wherein said sequentially performing query operations corresponding to nodes in said second query tree in a graph database comprises:
when the first query operation corresponding to the third BGP node is executed, if it is determined that the descendant nodes of the first OPTIONAL node comprise at least one second OPTIONAL node:
sequentially executing query operations corresponding to a first sibling node of the first OPTIONAL node and a second sibling node of the at least one second OPTIONAL node according to the depths of the first OPTIONAL node and the at least one second OPTIONAL node, wherein the data query range corresponding to the second sibling node of each second OPTIONAL node is the query result corresponding to the sibling node of the preceding OPTIONAL node;
and sequentially executing query operations corresponding to the child nodes of the first OPTIONAL node and the child nodes of the at least one second OPTIONAL node according to the depths of the first OPTIONAL node and the at least one second OPTIONAL node, wherein the data query range corresponding to the child nodes of any OPTIONAL node is the query result corresponding to the sibling node of that OPTIONAL node.
7. The method of claim 2, wherein the simplifying the first query tree based on the type of each node in the first query tree to obtain a second query tree comprises:
for the FILTER node in the first query tree, if the FILTER condition corresponding to the FILTER node meets a preset conversion condition, converting the FILTER condition into a disjunctive normal form;
and converting the FILTER node into a UNION node based on the disjunctive normal form.
8. The method according to claim 7, wherein the conversion condition is that the FILTER condition corresponding to the FILTER node consists only of variables, constants, and the three operators AND, OR, and equals.
9. The method according to claim 2, wherein said sequentially performing query operations corresponding to nodes in said second query tree in a graph database comprises:
if there are a plurality of BGP nodes that can be executed in parallel, determining the common triple patterns corresponding to the plurality of BGP nodes;
determining, based on a greedy algorithm, the partial common triple patterns with the lowest query cost among the common triple patterns;
querying, in the graph data, the data corresponding to the partial common triple patterns;
and querying, in the graph data, the data corresponding to the triple patterns in the plurality of BGP nodes other than the partial common triple patterns.
10. An apparatus for querying data, the apparatus comprising:
the receiving module is used for receiving a data query instruction sent by a data query application program, wherein the data query instruction carries a data query statement;
the establishing module is used for establishing a first query tree corresponding to the data query statement based on the structure of the data query statement;
the processing module is used for simplifying the first query tree based on the type of each node in the first query tree to obtain a second query tree;
the query module is used for sequentially executing query operations corresponding to all nodes in the second query tree in the graph database based on a preset execution sequence to obtain a data query result;
and the return module is used for returning the data query result to the data query application program.
11. A computer device comprising a processor and a memory, the memory having stored therein at least one instruction that is loaded and executed by the processor to perform operations performed by a method of querying data according to any one of claims 1 to 9.
12. A computer-readable storage medium having stored therein at least one instruction which is loaded and executed by a processor to perform operations performed by a method of querying data according to any one of claims 1 to 9.
CN202111673409.6A 2021-12-31 2021-12-31 Method, device and equipment for querying data and storage medium Pending CN114706846A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111673409.6A CN114706846A (en) 2021-12-31 2021-12-31 Method, device and equipment for querying data and storage medium
PCT/CN2022/135606 WO2023124729A1 (en) 2021-12-31 2022-11-30 Data query method and apparatus, and device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111673409.6A CN114706846A (en) 2021-12-31 2021-12-31 Method, device and equipment for querying data and storage medium

Publications (1)

Publication Number Publication Date
CN114706846A true CN114706846A (en) 2022-07-05

Family

ID=82166982

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111673409.6A Pending CN114706846A (en) 2021-12-31 2021-12-31 Method, device and equipment for querying data and storage medium

Country Status (2)

Country Link
CN (1) CN114706846A (en)
WO (1) WO2023124729A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023124729A1 (en) * 2021-12-31 2023-07-06 北京大学 Data query method and apparatus, and device and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117271576A (en) * 2023-10-19 2023-12-22 北京人大金仓信息技术股份有限公司 Query optimization method, storage medium and computer equipment

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9256639B2 (en) * 2012-08-31 2016-02-09 Infotech Soft, Inc. Query optimization for SPARQL
CN103116625A (en) * 2013-01-31 2013-05-22 重庆大学 Volume radio direction finde (RDF) data distribution type query processing method based on Hadoop
US9031933B2 (en) * 2013-04-03 2015-05-12 International Business Machines Corporation Method and apparatus for optimizing the evaluation of semantic web queries
CN111241127B (en) * 2020-01-16 2023-01-31 华南师范大学 Predicate combination-based SPARQL query optimization method, system, storage medium and equipment
CN114706846A (en) * 2021-12-31 2022-07-05 北京大学 Method, device and equipment for querying data and storage medium

Also Published As

Publication number Publication date
WO2023124729A1 (en) 2023-07-06

Similar Documents

Publication Publication Date Title
WO2023124729A1 (en) Data query method and apparatus, and device and storage medium
CN108717432B (en) Resource query method and device
WO2022100221A1 (en) Retrieval processing method and apparatus, and storage medium
CN109189282A (en) A kind of application recommended method, device and mobile terminal
CN111694834A (en) Method, device and equipment for putting picture data into storage and readable storage medium
CN109902089B (en) Query method and device using heterogeneous index, electronic equipment and medium
CN114244595B (en) Authority information acquisition method and device, computer equipment and storage medium
CN112287234B (en) Information retrieval method, device and storage medium
WO2021151320A1 (en) Holding posture detection method and electronic device
CN113742366A (en) Data processing method and device, computer equipment and storage medium
CN111061803A (en) Task processing method, device, equipment and storage medium
CN110597801B (en) Database system and establishing method and device thereof
CN110471614B (en) Method for storing data, method and device for detecting terminal
CN112561084B (en) Feature extraction method and device, computer equipment and storage medium
CN114741256B (en) Sensor monitoring method and device and terminal equipment
CN110737692A (en) data retrieval method, index database establishment method and device
CN110149408B (en) Service data display method and device, terminal and server
CN111651693A (en) Data display method, data sorting method, device, equipment and medium
CN111125095B (en) Method, device, electronic equipment and medium for adding data prefix
CN114329292A (en) Resource information configuration method and device, electronic equipment and storage medium
CN113900920A (en) Data processing method and device, electronic equipment and computer readable storage medium
CN110928867B (en) Data fusion method and device
CN112416356A (en) JSON character string processing method, device, equipment and storage medium
CN112711636A (en) Data synchronization method, device, equipment and medium
CN111680039A (en) Storage method, query method, device, equipment and storage medium of order information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination