WO2023125718A1 - Knowledge graph-based data query method, system, device, and storage medium - Google Patents

Knowledge graph-based data query method, system, device, and storage medium

Info

Publication number
WO2023125718A1
Authority
WO
WIPO (PCT)
Prior art keywords
query
statement
additional information
queried
data
Prior art date
Application number
PCT/CN2022/143004
Other languages
English (en)
French (fr)
Inventor
刘丰
刘东方
程东碧
杨旭
李潇洋
王云飞
胡晓
Original Assignee
ZTE Corporation (中兴通讯股份有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corporation (中兴通讯股份有限公司)
Publication of WO2023125718A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The embodiments of the present application relate to the field of communications, and in particular to a knowledge graph-based data query method, system, device, and storage medium.
  • The main purpose of the embodiments of the present application is to propose a knowledge graph-based data query method, system, device, and storage medium that obtain data query results quickly and in real time.
  • An embodiment of the present application provides a knowledge graph-based data query method, including: obtaining a statement to be queried, performing intent parsing on the statement to be queried, and determining the query type to which it belongs;
  • when the statement to be queried belongs to the statistical query type, mapping the statement to be queried to a preset path index to obtain a target index corresponding to it, and obtaining a query result from the target index, wherein the query result includes data results and a statistical value corresponding to the data results, the statistical value representing the number of data results corresponding to the statement to be queried;
  • when the statement to be queried belongs to the traversal query type, traversing a preset basic index according to the statement to be queried to obtain the query result corresponding to it.
  • An embodiment of the present application also provides a knowledge graph-based data query system, including:
  • an intent parsing module configured to obtain a statement to be queried, perform intent parsing on it, and determine the query type to which it belongs;
  • a statistical query module configured to, when the statement to be queried belongs to the statistical query type, map the statement to be queried to a preset path index, obtain a target index corresponding to it, and obtain a query result from the target index, wherein the query result includes data results and a statistical value corresponding to the data results, the statistical value representing the number of data results corresponding to the statement to be queried; and
  • a traversal query module configured to, when the statement to be queried belongs to the traversal query type, traverse a preset basic index according to the statement to be queried to obtain the corresponding query result.
  • An embodiment of the present application also provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the knowledge graph-based data query method described in the above embodiments.
  • An embodiment of the present application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the knowledge graph-based data query method described in the above embodiments.
  • In the knowledge graph-based data query method, system, device, and storage medium proposed in this application, intent parsing of the statement to be queried determines the query type to which it belongs. A different query method is then used for each type, which improves data query efficiency.
  • When the statement to be queried belongs to the statistical query type, it is mapped to the path index to determine the target index. The data results corresponding to the statement and their statistical value can then be determined from the target index. In other words, a statistical query obtains its result directly from the target index. There is no need to traverse all data sources, and no need to wait until a traversal finishes to obtain the statistical value, so queries are fast and real-time.
  • When the statement to be queried requires a traversal query, all basic indexes are traversed to obtain the query result.
  • FIG. 1 is a first flowchart of the knowledge graph-based data query method provided by an embodiment of the present application.
  • FIG. 2 is a second flowchart of the knowledge graph-based data query method provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of the knowledge graph-based data query system provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of the electronic device provided by an embodiment of the present application.
  • An embodiment of the present application relates to a knowledge graph-based data query method which, as shown in FIG. 1, includes:
  • Step 101: obtaining the statement to be queried, and performing intent parsing on it to determine the query type to which it belongs.
  • Step 101 specifically includes performing intent parsing on the statement to be queried to obtain the atomic operation set corresponding to it. If the atomic operation set contains no statistical operation, the statement to be queried belongs to the traversal query type. If the atomic operation set contains a statistical operation, it belongs to the statistical query type.
  • An intent parser may be used for the intent parsing. For example, when the statement to be queried is written in the Gremlin language, a Gremlin parser is used.
  • An atomic operation is an operation that is not interrupted by the thread scheduling mechanism. Once it starts, it runs to completion without being interrupted by any other task or event.
  • An atomic operation may consist of one step or several steps. Their order cannot be changed, and the operation cannot be split so that only part of it executes. Treating the entire operation as a single whole is the core property of atomicity.
  • Step 102: when the statement to be queried belongs to the statistical query type, mapping it to a preset path index to obtain the corresponding target index, and obtaining a query result from the target index, wherein the query result includes data results and a statistical value corresponding to the data results, the statistical value representing the number of data results corresponding to the statement to be queried.
  • Once the target index has been determined from the path index, the data results and their statistical value can be read directly from it.
  • The data results that match the statement to be queried, and their count, are therefore obtained directly at query time. Massive data can be queried in real time without traversing the entire data source. There is also no need to wait for a traversal to finish before counting the data results.
  • Step 103: when the statement to be queried belongs to the traversal query type, traversing the preset basic index according to the statement to be queried to obtain the corresponding query result.
  • A query engine can be used directly to traverse each basic index and obtain the query result.
  • For example, the statement to be queried may be written in the Gremlin language.
  • The traversal query type means that the entire data source is searched and traversed according to the statement to be queried. An accurate and complete query result is available only after the traversal finishes. The statistical query type means that the number of query results must be counted in addition to obtaining the query result for the statement to be queried.
  • The knowledge graph-based data query method proposed in this application parses the intent of the statement to be queried and determines its query type. It then uses a different query method for each type, which improves data query efficiency.
  • For a statistical query, the statement to be queried is mapped to the path index to determine the target index. The data results corresponding to the statement, and their statistical value, are then determined from the target index.
  • The query result is obtained directly from the target index. All data sources do not need to be traversed, and the statistical value does not depend on a traversal finishing, so queries are fast and real-time.
  • When the statement to be queried requires a traversal query, all basic indexes are traversed to obtain the query result.
  • An embodiment of the present application relates to a knowledge graph-based data query method which, as shown in FIG. 2, includes:
  • Step 201: obtaining the ontology of the knowledge graph, and parsing it and converting it into a graph structure.
  • The acquired knowledge graph ontology is built from data related to the scenario in which the data query method is applied.
  • For example, when the data query method of the present application is applied to a company's sales system, the company's sales data is obtained and the ontology of a sales knowledge graph is extracted from it.
  • When the method is applied to a company's cargo transportation process, the company's cargo transfer data is obtained and the ontology of a logistics knowledge graph is extracted from it.
  • When the method is applied to public opinion analysis of Internet information, Internet data is obtained and analyzed to extract the ontology of an Internet data knowledge graph.
  • These are only specific examples. The data query method of the present application can be applied to any scenario, which is not detailed here.
  • The ontology of the knowledge graph includes entities, relationships, and attributes, where the attributes include entity attributes and relationship attributes. Step 201 specifically includes the following mappings:
    • each entity is mapped to a vertex of the graph structure, with its entity attributes as additional information of the vertex;
    • each relationship is mapped to an edge of the graph structure, with its relationship attributes as additional information of the edge.
  The vertices and edges together constitute the graph structure.
  • A knowledge graph is a graph-based data structure. It is mainly used to describe entities and concepts in the real world and the relationships between them.
  • The ontology of a knowledge graph contains entities, relationships, and attributes.
  • An entity can be any object in the real world, such as a person, place, company, book, or animal. A relationship expresses a connection between different entities.
  • Attributes include entity attributes and relationship attributes, which describe specific information about the entity or relationship itself.
  • For example, entity A is a specific person whose attributes include age, height, blood type, and student status.
  • Entity B is another specific person whose attributes include age, height, blood type, and teacher status.
  • The relationship between entity A and entity B is a teacher-student relationship. Its attributes may include the start time, end time, and location of the teacher-student relationship.
  • Each entity has an entity tag that identifies its entity type.
  • Each relationship has a relationship tag that identifies its relationship type.
  • Each entity is mapped to a vertex.
  • Each relationship is mapped to an edge.
  • The entity attributes are used as additional information of the vertex.
  • Each vertex is linked to its additional information by an empty edge.
  • An empty edge is an edge without a label. Its physical meaning is that the entity has that attribute.
  • Likewise, the relationship attributes are used as additional information of the edge, and the edge is linked to its additional information by an empty edge.
  • Entities also carry subject-predicate information, which indicates an entity's role in a relationship with another entity. For example, suppose there is a relationship between entity A and entity B. Entity A is the initiator of that relationship, and entity B is its recipient.
  • Using the subject-predicate information, each relationship is mapped to a directed edge with additional information.
  • Step 202: building basic indexes from the graph structure.
  • Step 202 specifically includes:
    • obtaining all combinations of the additional information of each vertex;
    • obtaining all combinations of the additional information of each edge;
    • merging these combinations into a combination set;
    • using each combination in the combination set as a basic index.
  • Basic indexes are built per vertex or per edge. A vertex or edge may carry several pieces of additional information, so all combinations of its additional information are obtained and each combination is used as a basic index. For example, consider a graph structure containing the following elements:
    • vertex A, with additional information a1 and a2;
    • vertex B, with b1, b2, and b3;
    • vertex C, with c1 and c2;
    • edge M1 between vertex A and vertex B, with m11 and m12;
    • edge M2 between vertex A and vertex C, with m21, m22, and m23.
  The combinations for these elements are then:
  • the combinations of the additional information of vertex A are {a1, a2, a1a2};
  • the combinations of the additional information of vertex B are {b1, b2, b3, b1b2, b1b3, b2b3, b1b2b3};
  • the combinations of the additional information of vertex C are {c1, c2, c1c2}. The combinations for edges M1 and M2 are obtained in the same way.
  • The ontology of the knowledge graph is converted into a graph structure, which makes the connections between data easy to obtain.
  • Extracting basic indexes from the graph structure makes the information of each vertex (entity) and each edge (relationship) easy to obtain. It also avoids mixing up or confusing vertices and edges.
  • The whole basic index construction process is automated. This saves cost and is less prone to errors or omissions than manual construction.
  • A priority or weight is set for each piece of additional information according to its type.
  • When the amount of additional information exceeds a preset upper limit, additional information is eliminated according to priority or weight.
  • Additional information with low priority or low weight is eliminated first. For example, suppose the data query method of the present application is applied to a sales system and a vertex (entity) is a salesperson. Vertex additional information (attributes) such as age and height is not important for sales analysis, so its priority or weight can be set low. Additional information such as working time and personality may be more important for sales analysis, so its priority or weight can be set high.
  • Step 203: extracting query patterns from the graph structure and building path indexes based on them.
  • Step 203 specifically includes:
    • traversing the graph structure with a preset graph traversal algorithm to obtain all paths between its vertices;
    • using each path as a query pattern and obtaining the multiple query conditions in it;
    • obtaining the statistical value corresponding to each query condition;
    • using each query pattern, with its query conditions and statistical values, as a path index.
  The count for each query condition is therefore computed and stored while the path index is being built. Later queries only need to read the data and statistical values directly, which enables real-time querying.
  • A path index is built for each path in the graph structure. All paths between vertices are obtained with the graph traversal algorithm, and each path is used as a query pattern. The multiple query conditions of that pattern are then obtained, and the number of occurrences of each query condition in the graph structure is taken as its statistical value.
  • For example, the data query method of this application is applied to a logistics system whose graph structure contains 20 vertices. These 20 vertices represent different items, but all of the items have the same delivery place. When the query condition is the delivery place, its statistical value is therefore 20.
  • For each path used as a query pattern, its query conditions are obtained as follows:
    • computing all combinations of the additional information of all vertices on the path;
    • separately computing all combinations of the additional information of all edges on the path;
    • combining the vertex combinations with the edge combinations to obtain a set of path combinations;
    • using each combination in the set of path combinations as a query condition.
  • For example, path A-B-C in the graph structure contains three vertices: A (with 3 pieces of additional information), B (with 5), and C (with 2). It also contains two edges: M1 (with 1 piece of additional information) and M2 (with 3). The combinations of the 10 pieces of vertex additional information on this path are computed, as are the combinations of the 4 pieces of edge additional information.
  • The vertex combinations and the edge combinations are then combined into a set of path combinations, and each combination in this set is a query condition.
  • After the set of path combinations is obtained, the combinations in the set are also sorted by string value. Sorting the set of path combinations in this way improves search and query speed.
  • The method also covers updates after each query pattern, with its query conditions and statistical values, has been used as a path index. When a new entity or a new relationship is added, the following steps are performed:
    • obtaining the entity attributes of the new entity or the relationship attributes of the new relationship;
    • comparing these attributes with each path index to determine the path indexes that match them;
    • updating, according to these attributes, the query conditions of the matching path indexes and the statistical values corresponding to those query conditions.
  • That is, the entity attributes or relationship attributes are obtained and compared with each path index. For example, when a new relationship is added, it is determined which two entities its relationship attributes relate. The matching path index is the one that contains additional information describing these two entities.
  • Similarly, when a new entity is added, it is determined which path indexes its entity attributes match. For example, a path index contains multiple combinations of additional information. This additional information consists of the entity attributes of multiple entities and the relationship attributes of the relationships between those entities.
  • The method may further include: determining the priority of the additional information of all vertices on the path according to a preset correspondence between vertex additional information types and priorities;
  • determining the priority of the additional information of all edges on the path according to a preset correspondence between edge additional information types and priorities;
  • when the amount of vertex additional information exceeds a preset upper limit for vertex additional information, removing N pieces of additional information from the additional information of all vertices according to their priority, where N is the difference between the amount of vertex additional information and that upper limit;
  • when the amount of edge additional information exceeds a preset upper limit for edge additional information, removing M pieces of additional information from the additional information of all edges according to their priority, where M is the difference between the amount of edge additional information and that upper limit.
  • Priorities are set according to factors such as the importance of the additional information to the business and how often it is used in queries.
  • The additional information is sorted by priority from high to low, and the lowest-priority additional information is removed to improve query speed.
  • Step 204: obtaining the statement to be queried and performing intent parsing on it to determine the query type to which it belongs.
  • Step 205: when the statement to be queried belongs to the statistical query type, mapping it to the preset path index to obtain the corresponding target index, and obtaining the query result from the target index, wherein the query result includes data results and a statistical value corresponding to the data results, the statistical value representing the number of data results corresponding to the statement to be queried.
  • Step 206: when the statement to be queried belongs to the traversal query type, traversing the preset basic index according to the statement to be queried to obtain the corresponding query result.
  • Steps 204 to 206 are essentially the same as steps 101 to 103 and are not repeated here.
  • The knowledge graph-based data query method of this embodiment parses the intent of the statement to be queried and determines its query type. It then uses a different query method for each type, which improves data query efficiency.
  • For a statistical query, the statement to be queried is mapped to the path index to determine the target index. The data results and their statistical value are then determined from the target index.
  • The query result is thus obtained directly from the target index. All data sources do not need to be traversed, and the statistical value does not depend on a traversal finishing, so queries are fast and real-time.
  • When the statement to be queried requires a traversal query, all basic indexes are traversed to obtain the query result.
  • An embodiment of the present application relates to a knowledge graph-based data query system which, as shown in FIG. 3, includes:
  • an intent parsing module 301, configured to obtain the statement to be queried, perform intent parsing on it, and determine the query type to which it belongs;
  • a statistical query module 302, configured to, when the statement to be queried belongs to the statistical query type, map it to a preset path index, obtain the corresponding target index, and obtain a query result from the target index, wherein the query result includes data results and a statistical value corresponding to the data results, the statistical value representing the number of data results corresponding to the statement to be queried; and
  • a traversal query module 303, configured to, when the statement to be queried belongs to the traversal query type, traverse a preset basic index according to the statement to be queried to obtain the corresponding query result.
  • The modules involved in this embodiment are logical modules. A logical unit may be a physical unit or part of a physical unit, or it may be implemented by a combination of multiple physical units.
  • Units not closely related to solving the technical problem raised in the present application are not introduced in this embodiment. This does not mean that no other units exist in this embodiment.
  • This embodiment is the system embodiment corresponding to the knowledge graph-based data query method embodiments, and it can be implemented in cooperation with them.
  • The relevant technical details mentioned in the foregoing embodiments remain valid in this embodiment and are not repeated here.
  • Likewise, the relevant technical details mentioned in this embodiment also apply to the above method embodiments.
  • An embodiment of the present application relates to an electronic device which, as shown in FIG. 4, includes at least one processor 401 and a memory communicatively connected to the at least one processor 401. The memory stores instructions executable by the at least one processor 401. When executed, the instructions enable the at least one processor 401 to perform the knowledge graph-based data query method of the above embodiments.
  • The memory and the processor are connected by a bus.
  • The bus may include any number of interconnected buses and bridges, and it links the circuits of the one or more processors and the memory together.
  • The bus may also link together various other circuits, such as peripherals, voltage regulators, and power management circuits. These are well known in the art and are not described further here.
  • A bus interface provides an interface between the bus and a transceiver.
  • The transceiver may be a single element or multiple elements, such as multiple receivers and transmitters, and it provides a means for communicating with various other systems over a transmission medium.
  • Data processed by the processor is transmitted over a wireless medium through an antenna. The antenna also receives data and passes it to the processor.
  • The processor is responsible for managing the bus and for general processing. It can also provide various functions, including timing, peripheral interfaces, voltage regulation, power management, and other control functions. The memory can be used to store data that the processor uses when performing operations.
  • An embodiment of the present application relates to a computer-readable storage medium storing a computer program.
  • When the computer program is executed by a processor, it implements the knowledge graph-based data query method described above.
  • The program is stored in a storage medium and includes several instructions that cause a device (such as a single-chip microcomputer or a chip) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application.
  • The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
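The ontology-to-graph mapping of step 201 can be sketched in Python as follows. This is a minimal illustration built on several assumptions, not the applicant's implementation. The class and function names (`Vertex`, `Edge`, `Graph`, `ontology_to_graph`) are hypothetical, and the attribute values are invented. Attributes are stored directly on each vertex or edge rather than linked through explicit unlabeled "empty edges".

```python
from dataclasses import dataclass, field

@dataclass
class Vertex:
    vid: str
    tag: str                                      # entity tag (entity type)
    info: dict = field(default_factory=dict)      # entity attributes = vertex additional information

@dataclass
class Edge:
    subject: str                                  # initiator of the relationship
    obj: str                                      # recipient of the relationship
    tag: str                                      # relationship tag (relationship type)
    info: dict = field(default_factory=dict)      # relationship attributes = edge additional information

@dataclass
class Graph:
    vertices: dict = field(default_factory=dict)  # vertex id -> Vertex
    edges: list = field(default_factory=list)     # directed edges, subject -> object

def ontology_to_graph(entities, relations):
    """Map an ontology to a graph structure (hypothetical helper).

    entities:  iterable of (entity_id, entity_tag, attributes)
    relations: iterable of (subject_id, relationship_tag, object_id, attributes)
    Storing attributes on the element itself stands in for the unlabeled
    "empty edges" that link each element to its additional information.
    """
    graph = Graph()
    for eid, tag, attrs in entities:
        graph.vertices[eid] = Vertex(eid, tag, dict(attrs))
    for subj, tag, obj, attrs in relations:
        if subj not in graph.vertices or obj not in graph.vertices:
            raise KeyError(f"relationship {tag!r} refers to an unknown entity")
        # Subject-predicate information fixes the edge direction.
        graph.edges.append(Edge(subj, obj, tag, dict(attrs)))
    return graph

# Teacher-student example from step 201 (attribute values are made up).
g = ontology_to_graph(
    entities=[
        ("A", "Person", {"age": 20, "role": "student"}),
        ("B", "Person", {"age": 45, "role": "teacher"}),
    ],
    relations=[("B", "teaches", "A", {"start_time": "2020-09"})],
)
```

Here the teacher is chosen as the subject of the `teaches` relationship. The text only requires that the subject-predicate information determine one direction per relationship.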
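The basic-index construction of step 202 amounts to taking every non-empty combination of each vertex's or edge's additional information. The sketch below is hypothetical: `info_combinations` and `build_basic_index` are invented names. It assumes a basic index can be modeled as a dictionary from a combination to the elements that carry it. The data reproduces the vertex A/B/C and edge M1/M2 example.

```python
from itertools import combinations

def info_combinations(items):
    """All non-empty combinations of one element's additional information (2**n - 1 of them)."""
    items = sorted(items)
    return [combo for r in range(1, len(items) + 1) for combo in combinations(items, r)]

def build_basic_index(element_info):
    """Map each combination to the set of vertices/edges that carry it.

    element_info maps a vertex or edge id to the list of its additional information.
    """
    index = {}
    for element, items in element_info.items():
        for combo in info_combinations(items):
            index.setdefault(combo, set()).add(element)
    return index

# The example graph from step 202.
element_info = {
    "A": ["a1", "a2"], "B": ["b1", "b2", "b3"], "C": ["c1", "c2"],
    "M1": ["m11", "m12"], "M2": ["m21", "m22", "m23"],
}
basic_index = build_basic_index(element_info)
print(info_combinations(["a1", "a2"]))   # [('a1',), ('a2',), ('a1', 'a2')]
print(len(basic_index))                  # 3 + 7 + 3 + 3 + 7 = 23
```

Combinations are kept as tuples rather than concatenated strings such as `a1a2`, which avoids ambiguous keys. An element with n pieces of additional information yields 2**n - 1 combinations, so the index grows quickly. This growth is one motivation for the priority-based upper limits.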
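Step 203 enumerates the paths of the graph structure and turns each path into a query pattern with many query conditions. The sketch below makes several assumptions that the text does not state. It uses depth-first search over directed edges to list simple paths. It reads "combining" the vertex and edge combinations as a set union. All function names are hypothetical. The example is the A-B-C path with 10 pieces of vertex additional information and 4 pieces of edge additional information, and its conditions are sorted by their string form.

```python
from itertools import combinations

def simple_paths(adj, start):
    """Depth-first enumeration of every simple path (at least one edge) leaving `start`.

    adj maps a vertex id to a list of (edge_id, neighbor_id) pairs.
    Each path is returned as (vertex_ids, edge_ids).
    """
    paths = []

    def dfs(vertices, edges):
        for edge_id, nxt in adj.get(vertices[-1], []):
            if nxt in vertices:              # skip cycles: keep the path simple
                continue
            path = (vertices + [nxt], edges + [edge_id])
            paths.append(path)
            dfs(*path)

    dfs([start], [])
    return paths

def all_paths(adj):
    return [p for start in adj for p in simple_paths(adj, start)]

def nonempty_combos(items):
    items = sorted(items)
    return [c for r in range(1, len(items) + 1) for c in combinations(items, r)]

def query_conditions(path, vertex_info, edge_info):
    """Query conditions of one query pattern (path), sorted by their string form.

    Assumption: "combining" vertex and edge combinations is read as a set union.
    """
    vertices, edges = path
    v_items = [i for v in vertices for i in vertex_info[v]]
    e_items = [i for e in edges for i in edge_info[e]]
    conditions = set(nonempty_combos(v_items)) | set(nonempty_combos(e_items))
    return sorted(conditions, key=",".join)

# Path A-B-C from step 203: 3 + 5 + 2 vertex items and 1 + 3 edge items.
adj = {"A": [("M1", "B")], "B": [("M2", "C")]}
vertex_info = {
    "A": ["a1", "a2", "a3"],
    "B": ["b1", "b2", "b3", "b4", "b5"],
    "C": ["c1", "c2"],
}
edge_info = {"M1": ["m11"], "M2": ["m21", "m22", "m23"]}
paths = all_paths(adj)                   # A-B, A-B-C, B-C
abc = paths[1]
conditions = query_conditions(abc, vertex_info, edge_info)
print(len(conditions))                   # (2**10 - 1) + (2**4 - 1) = 1038
```

Even this three-vertex path produces 1038 query conditions. Full path enumeration is also exponential in general, so a real system would need the upper limits on additional information, or bounds on path length.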
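The statistical values of a path index, and their incremental update when new entities are added, can be illustrated with a small counter. This is a hypothetical sketch: `PathIndex` and `update_indexes` are invented names, and the warehouse name `W1` is made-up data. A condition is a tuple of `key=value` items. Its statistical value is assumed to be the number of entities whose attributes contain all of those items, which matches the 20-item logistics example.

```python
class PathIndex:
    """Hypothetical path index for one query pattern: query condition -> statistical value."""

    def __init__(self, conditions):
        self.counts = {tuple(sorted(c)): 0 for c in conditions}

    def matching(self, attrs):
        """Conditions fully contained in an entity's attributes."""
        items = {f"{k}={v}" for k, v in attrs.items()}
        return [c for c in self.counts if set(c) <= items]

    def add_entity(self, attrs):
        """Incremental update: bump the statistical value of every matching condition."""
        hits = self.matching(attrs)
        for c in hits:
            self.counts[c] += 1
        return hits

def update_indexes(indexes, attrs):
    """Compare a new entity's attributes with every path index; return the names that matched."""
    return [name for name, index in indexes.items() if index.add_entity(attrs)]

conditions = [
    ("delivery_place=W1",),
    ("category=toys",),
    ("category=toys", "delivery_place=W1"),
]
logistics = PathIndex(conditions)
for i in range(20):                      # 20 different items, same delivery place
    logistics.add_entity({"name": f"item{i}", "delivery_place": "W1"})
print(logistics.counts[("delivery_place=W1",)])   # 20

indexes = {"logistics": logistics, "colors": PathIndex([("color=red",)])}
print(update_indexes(indexes, {"name": "item20", "delivery_place": "W1", "category": "toys"}))
# ['logistics']
```

New relationships could be handled the same way using their relationship attributes. The sketch only updates the counts of existing conditions. It does not add new query conditions for attribute values that have not been seen before.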
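The priority-based trimming of additional information (the sales example and the N/M removal rule) can be sketched as a stable sort followed by a cut. Several details are assumptions made for illustration: the priority numbers, the default of 0 for unknown types, and the function names `prune_by_priority` and `prune_path`. The text only states that priorities reflect business importance and query frequency.

```python
def prune_by_priority(items, priority, limit):
    """Keep at most `limit` items, removing the lowest-priority ones first.

    priority maps an additional-information type to a number (higher = more important);
    unknown types default to 0 (an assumption). Returns (kept, removed), where
    len(removed) is N (or M) = max(0, len(items) - limit).
    """
    if len(items) <= limit:
        return list(items), []
    # sorted() is stable, so equal priorities keep their original order.
    ranked = sorted(items, key=lambda name: priority.get(name, 0), reverse=True)
    return ranked[:limit], ranked[limit:]

def prune_path(vertex_items, edge_items, vertex_priority, edge_priority,
               vertex_limit, edge_limit):
    """Apply the vertex upper limit and the edge upper limit to a path separately."""
    v_kept, v_removed = prune_by_priority(vertex_items, vertex_priority, vertex_limit)
    e_kept, e_removed = prune_by_priority(edge_items, edge_priority, edge_limit)
    return v_kept, e_kept, v_removed, e_removed

# Sales example: working time and personality matter more than age or height.
vertex_priority = {"working_time": 3, "personality": 3, "age": 1, "height": 1}
kept, removed = prune_by_priority(
    ["age", "height", "working_time", "personality"], vertex_priority, limit=2)
print(kept, removed)   # ['working_time', 'personality'] ['age', 'height']
```

Separate limits and priority tables for vertices and edges follow the text, which defines N and M independently.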

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

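The present application provides a knowledge-graph-based data query method, system, device, and storage medium, relating to the field of communications. The knowledge-graph-based data query method includes: acquiring a query statement, performing intent parsing on the query statement, and determining the query type to which it belongs; when the query statement belongs to a statistical query type, mapping the query statement to preset path indexes to obtain a target index corresponding to the query statement, and obtaining a query result from the target index, where the query result includes data results and a statistical value corresponding to the data results, the statistical value indicating the number of data results corresponding to the query statement; and when the query statement belongs to a traversal query type, traversing preset basic indexes according to the query statement to obtain a query result corresponding to the query statement.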

Description

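Knowledge-Graph-Based Data Query Method, System, Device, and Storage Medium
Related Application
This application claims priority to Chinese patent application No. 202111643082.8, filed on December 29, 2021.
Technical Field
Embodiments of the present application relate to the field of communications, and in particular to a knowledge-graph-based data query method, system, device, and storage medium.
Background
At present, there are two main approaches to knowledge-graph-based data query. The first traverses the entire data source, which is essentially unusable for large-scale data and cannot deliver results in real time. The second introduces external offline tasks, i.e., external distributed computing components that assist with the query; this requires additional computing resources, is costly, makes the system complex, and likewise cannot achieve real-time query.
Summary
The main purpose of the embodiments of the present application is to provide a knowledge-graph-based data query method, system, device, and storage medium that obtain data query results quickly and in real time.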
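To achieve the above purpose, an embodiment of the present application provides a knowledge-graph-based data query method, including: acquiring a query statement, performing intent parsing on it, and determining the query type to which it belongs; when the query statement belongs to the statistical query type, mapping it to preset path indexes to obtain the corresponding target index, and obtaining the query result from the target index, where the query result includes data results and a statistical value corresponding to the data results, the statistical value indicating the number of data results corresponding to the query statement; and when the query statement belongs to the traversal query type, traversing preset basic indexes according to the query statement to obtain the corresponding query result.
To achieve the above purpose, an embodiment of the present application further provides a knowledge-graph-based data query system, including:
an intent parsing module, configured to acquire a query statement, perform intent parsing on it, and determine the query type to which it belongs;
a statistical query module, configured to, when the query statement belongs to the statistical query type, map the query statement to preset path indexes, obtain the target index corresponding to the query statement, and obtain the query result from the target index, where the query result includes data results and a statistical value corresponding to the data results, the statistical value indicating the number of data results corresponding to the query statement;
a traversal query module, configured to, when the query statement belongs to the traversal query type, traverse preset basic indexes according to the query statement to obtain the corresponding query result.
To achieve the above purpose, an embodiment of the present application further provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the knowledge-graph-based data query method described in the above embodiments.
To achieve the above purpose, an embodiment of the present application further provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the knowledge-graph-based data query method described in the above embodiments.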
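In the knowledge-graph-based data query method, system, device, and storage medium provided by the present application, intent parsing is performed on the query statement to determine its query type, and a different query method is used for each type, which improves query efficiency. When the query statement belongs to the statistical query type, it is mapped to the path indexes to determine the target index, from which both the data results and their statistical value can be read directly. In other words, for statistical queries the present application obtains the query result directly from the target index, with no need to traverse the whole data source or to wait for a traversal to finish before the statistical value is available, thereby achieving fast, real-time query. When the query statement requires a traversal query, all basic indexes are traversed to obtain the query result.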
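Brief Description of the Drawings
One or more embodiments are illustrated by way of example with reference to the figures in the corresponding drawings; these illustrations do not limit the embodiments.
FIG. 1 is a first flowchart of the knowledge-graph-based data query method provided by an embodiment of the present application;
FIG. 2 is a second flowchart of the knowledge-graph-based data query method provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram of the knowledge-graph-based data query system provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of the electronic device provided by an implementation of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, the embodiments are described in detail below with reference to the accompanying drawings. However, those of ordinary skill in the art will understand that many technical details are given in the embodiments to help the reader better understand the present application; the claimed technical solution can nevertheless be implemented without these details and with various changes and modifications based on the following embodiments. The division into embodiments below is for convenience of description and does not limit the specific implementation of the present application; the embodiments may be combined with and refer to one another where they do not conflict.
An embodiment of the present application relates to a knowledge-graph-based data query method, which, as shown in FIG. 1, includes: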
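Step 101: acquire a query statement, perform intent parsing on it, and determine the query type to which it belongs.
In one embodiment, step 101 specifically includes: performing intent parsing on the query statement to obtain the set of atomic operations corresponding to it; if the set contains no statistical operation, the query statement is of the traversal query type; if the set contains a statistical operation, the query statement is of the statistical query type. Specifically, intent parsing may be performed by an intent parser; for example, when the query statement is written in the Gremlin language, a Gremlin parser is used. After the set of atomic operations is obtained, the query statement is of the traversal query type if the set contains no statistical operation, and of the statistical query type if the set contains a statistical operation.
It should be noted that an atomic operation is an operation that cannot be interrupted by the thread scheduling mechanism: once started, it runs to completion without being interrupted by any other task or event. An atomic operation may consist of a single step or of multiple steps, but their order cannot be changed, and they cannot be split so that only part of them is executed. Treating the whole operation as an indivisible unit is the core characteristic of atomicity.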
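As a rough illustration of the classification in step 101, the sketch below splits a Gremlin-style statement into its step names and treats the statement as a statistical query if any step belongs to a set of statistical operations. The regular-expression "parser", the function names, and the exact contents of `STATISTICAL_OPS` are assumptions made for this example; the application does not specify them.

```python
import re

# Gremlin steps treated here as "statistical" atomic operations. count(), sum(),
# mean(), max(), min() and groupCount() are real Gremlin steps; choosing exactly
# this set is an assumption of the sketch.
STATISTICAL_OPS = {"count", "sum", "mean", "max", "min", "groupCount"}

# A step name immediately followed by "(", e.g. "hasLabel(" or "count(".
STEP_PATTERN = re.compile(r"([A-Za-z_]\w*)\s*\(")


def parse_atomic_ops(query: str) -> list[str]:
    """Crude stand-in for an intent parser: the step names of the query, in order."""
    return STEP_PATTERN.findall(query)


def classify_query(query: str) -> str:
    """'statistical' if any atomic operation is statistical, otherwise 'traversal'."""
    ops = parse_atomic_ops(query)
    return "statistical" if any(op in STATISTICAL_OPS for op in ops) else "traversal"


print(classify_query("g.V().hasLabel('salesperson').count()"))  # statistical
print(classify_query("g.V().has('name', 'A').out('knows')"))    # traversal
```

A real intent parser would work on a proper Gremlin grammar rather than a regular expression; with the regex, for instance, a string literal containing `count(` would be misread as a step.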
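Step 102: when the query statement belongs to the statistical query type, map it to the preset path indexes to obtain the corresponding target index, and obtain the query result from the target index, where the query result includes data results and a statistical value corresponding to the data results, the statistical value indicating the number of data results corresponding to the query statement.
In this embodiment, after the type of the query statement is determined, the query statement is converted into the format of the path indexes; the converted statement is then mapped against the path indexes to identify the target index among them, from which the data results and their corresponding statistical value are obtained.
In other words, the path indexes allow the data results that match the query statement, together with their count, to be obtained directly at query time. This enables real-time query over massive data without traversing the entire data source and without waiting for a full traversal to finish before the count is known.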
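A minimal sketch of the statistical lookup in step 102, assuming the path indexes are held in memory as nested dictionaries whose leaves store both the matching data results and their precomputed count. The key format (`"key=value"` strings sorted into a tuple), the pattern name, and all identifiers are hypothetical; the point is only that answering the query is a direct lookup rather than a traversal.

```python
from typing import NamedTuple


class QueryResult(NamedTuple):
    data: list[str]  # data results matching the query statement
    count: int       # statistical value, stored when the index was built


# Hypothetical layout: query pattern -> {query condition -> precomputed result}.
# A condition is a sorted tuple of "key=value" strings, so the same set of
# attribute constraints always produces the same key.
PathIndex = dict[str, dict[tuple[str, ...], QueryResult]]


def to_index_key(conditions: dict[str, str]) -> tuple[str, ...]:
    """Convert parsed query conditions into the (assumed) key format of the path index."""
    return tuple(sorted(f"{k}={v}" for k, v in conditions.items()))


def statistical_query(index: PathIndex, pattern: str, conditions: dict[str, str]) -> QueryResult:
    """Map the query onto its target index and read data and count directly; no traversal."""
    target = index.get(pattern, {})
    return target.get(to_index_key(conditions), QueryResult([], 0))


# Toy index with made-up names: 20 items shipped from the same origin.
items = [f"item{i}" for i in range(20)]
index: PathIndex = {"Item-shippedFrom-Origin": {("origin=W1",): QueryResult(items, len(items))}}
print(statistical_query(index, "Item-shippedFrom-Origin", {"origin": "W1"}).count)  # 20
```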
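Step 103: when the query statement belongs to the traversal query type, traverse the preset basic indexes according to the query statement to obtain the corresponding query result.
In this embodiment, when the query statement belongs to the traversal query type, a query engine can be used directly to traverse every basic index to obtain the query result. For example, when the query statement is written in Gremlin, the Gremlin query engine traverses every basic index directly and then filters out duplicate or invalid results to obtain the query result.
It should be noted that a traversal query is one for which, according to the query statement, the entire data source must be searched, and an accurate and complete query result is available only after the traversal finishes; a statistical query is one for which, according to the query statement, the number of query results must be counted in addition to obtaining the results themselves.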
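For contrast, a traversal query in step 103 has to visit every basic index. The sketch below assumes each basic index is a pair of an element id and one combination of that element's additional information (a hypothetical layout), and mimics the "filter out duplicate or invalid results" step by dropping non-matching entries and repeated element ids. Its cost grows linearly with the number of basic indexes, which is what the path indexes avoid for statistical queries.

```python
# Hypothetical layout: one basic index = (element id, combination of additional info).
BasicIndex = list[tuple[str, tuple[str, ...]]]


def traversal_query(basic_indexes: BasicIndex, required: set[str]) -> list[str]:
    """Scan every basic index; keep elements whose additional info covers `required`,
    dropping non-matching (invalid) and duplicate hits, in first-seen order."""
    seen: set[str] = set()
    results: list[str] = []
    for element_id, info in basic_indexes:
        if not required.issubset(info):
            continue  # this combination does not satisfy the query
        if element_id in seen:
            continue  # same element already returned via another combination
        seen.add(element_id)
        results.append(element_id)
    return results


basic = [("A", ("a1",)), ("A", ("a1", "a2")), ("B", ("b1",)), ("C", ("a1", "c1"))]
print(traversal_query(basic, {"a1"}))  # ['A', 'C']
```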
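The knowledge-graph-based data query method provided by the present application performs intent parsing on the query statement to determine its query type and applies a different query method for each type, which improves query efficiency. When the query statement belongs to the statistical query type, it is mapped to the path indexes to determine the target index, from which the data results and their statistical value can be read directly. In other words, for statistical queries the query result is obtained directly from the target index, with no need to traverse the whole data source or to wait for a traversal to finish before the statistical value is available, achieving fast, real-time query. When the query statement requires a traversal query, all basic indexes are traversed to obtain the query result.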
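An embodiment of the present application relates to a knowledge-graph-based data query method, which, as shown in FIG. 2, includes:
Step 201: acquire the ontology of the knowledge graph, parse it, and convert it into a graph structure.
In this embodiment, the acquired knowledge graph ontology is built from data relevant to the scenario in which the data query method is applied. For example, when the method is applied to a company's sales system, the company's sales data is acquired and the ontology of a sales knowledge graph is extracted from it. When the method is applied to a company's goods transportation process, the company's goods flow data is acquired and the ontology of a logistics knowledge graph is extracted from it. When the method is applied to public-opinion analysis of Internet information, Internet data is acquired and analyzed to extract the ontology of an Internet-data knowledge graph. These are only specific examples; the data query method of the present application can be applied to any scenario, which is not elaborated here.
In one embodiment, the ontology of the knowledge graph includes entities, relationships, and attributes, where the attributes include entity attributes and relationship attributes. Step 201 specifically includes: mapping the entities to vertices of the graph structure, with entity attributes serving as additional information of the vertices; mapping the relationships to edges of the graph structure, with relationship attributes serving as additional information of the edges; the vertices and edges together form the graph structure.
It should be noted that a knowledge graph is a graph-based data structure mainly used to describe the entities and concepts that exist in the real world and the relationships between them. The ontology of a knowledge graph contains entities, relationships, and attributes. Specifically, an entity can be anything in the real world, such as a person, a place, a company, a book, or an animal, while a relationship expresses a connection between different entities. Attributes include entity attributes and relationship attributes, which carry specific information about an entity or a relationship itself. For example, entity A is a specific person whose attributes include age, height, blood type, and student status; entity B is another specific person whose attributes include age, height, blood type, and teacher status. The relationship between entity A and entity B is a teacher-student relationship, whose attributes may include its start time, end time, and location. In addition, each entity has an entity label identifying its entity type, and each relationship has a relationship label identifying its relationship type.
Specifically, entities are mapped to vertices and relationships to edges. Entity attributes serve as additional information of the vertices, and each vertex is connected to its additional information by an empty edge, i.e., an edge without a label, whose physical meaning is that the entity has that attribute. Similarly, relationship attributes serve as additional information of the edges, and each edge is connected to its additional information by an empty edge. In addition, an entity carries subject/object information indicating its role with respect to another entity it is related to. For example, if there is a relationship between entity A and entity B, entity A is the initiator of that relationship and entity B is its receiver. The subject/object information turns every relationship into a directed edge carrying additional information.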
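The mapping of step 201 can be sketched with plain data classes: entities become vertices, relationships become directed edges from the initiator to the receiver, and attributes are kept as additional information. This is a simplification under stated assumptions: the application connects each piece of additional information through an unlabeled "empty" edge, whereas the sketch stores attributes inline on the vertex or edge. All class and function names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    vid: str
    label: str                                          # entity label (entity type)
    info: dict[str, str] = field(default_factory=dict)  # entity attributes = additional info


@dataclass
class Edge:
    source: str                                         # initiator (subject) of the relationship
    target: str                                         # receiver (object) of the relationship
    label: str                                          # relationship label (relationship type)
    info: dict[str, str] = field(default_factory=dict)  # relationship attributes = additional info


@dataclass
class Graph:
    vertices: dict[str, Vertex] = field(default_factory=dict)
    edges: list[Edge] = field(default_factory=list)


def build_graph(entities, relations) -> Graph:
    """entities: (id, label, attrs) triples; relations: (subject, object, label, attrs) tuples.
    Attributes are stored inline instead of as separate nodes joined by unlabeled edges."""
    graph = Graph()
    for eid, label, attrs in entities:
        graph.vertices[eid] = Vertex(eid, label, dict(attrs))
    for subj, obj, label, attrs in relations:
        if subj not in graph.vertices or obj not in graph.vertices:
            raise ValueError(f"relationship {label!r} refers to an unknown entity")
        graph.edges.append(Edge(subj, obj, label, dict(attrs)))  # directed: subject -> object
    return graph


# Teacher-student example with invented attribute values.
g = build_graph(
    [("A", "Person", {"age": "20", "role": "student"}), ("B", "Person", {"role": "teacher"})],
    [("B", "A", "teaches", {"start": "2020-09"})],
)
print(g.edges[0])
```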
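Step 202: construct the basic indexes according to the graph structure.
In one embodiment, step 202 specifically includes: obtaining, for each vertex, all combinations of its additional information and, for each edge, all combinations of its additional information; merging the vertex combinations and edge combinations into a combination set; and taking each combination in the combination set as one basic index.
It is worth mentioning that most current knowledge-graph query methods require indexes to be designed and created manually first. With massive data, manually constructed indexes are highly prone to errors and omissions, and they cannot handle statistical queries, which makes them unsuitable for real-time scenarios. The present application instead constructs the basic indexes automatically from the graph structure, avoiding the problems of manual index construction, and also constructs path indexes from the graph structure, which solves the problem of real-time query.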
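In this embodiment, basic indexes are constructed per vertex or per edge. Since a vertex or an edge may have several pieces of additional information, all combinations of its additional information are obtained, and each combination serves as one basic index. For example, consider a graph structure with vertex A (additional information a1, a2), vertex B (b1, b2, b3), and vertex C (c1, c2), an edge M1 between vertices A and B (m11, m12), and an edge M2 between vertices A and C (m21, m22, m23). The combinations for vertex A are {a1, a2, a1a2}; for vertex B, {b1, b2, b3, b1b2, b1b3, b2b3, b1b2b3}; for vertex C, {c1, c2, c1c2}; for edge M1, {m11, m12, m11m12}; and for edge M2, {m21, m22, m23, m21m22, m21m23, m22m23, m21m22m23}. The combination set of the graph structure is therefore {a1, a2, a1a2, b1, b2, b3, b1b2, b1b3, b2b3, b1b2b3, c1, c2, c1c2, m11, m12, m11m12, m21, m22, m23, m21m22, m21m23, m22m23, m21m22m23}, 23 combinations in total, each of which is one basic index.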
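The basic-index construction of step 202 amounts to enumerating, for every vertex and edge, all non-empty combinations of its additional information. The sketch below reproduces the A/B/C/M1/M2 example, reading "all combinations" as every non-empty subset (an interpretation, not stated explicitly in the text). It joins items without a separator only to match the `a1a2` notation of the text; a real index would avoid that ambiguity, for example by keying on tuples.

```python
from itertools import combinations


def all_combinations(info: list[str]) -> list[str]:
    """Every non-empty combination of one element's additional info, e.g. a1, a2, a1a2."""
    return ["".join(c) for size in range(1, len(info) + 1) for c in combinations(info, size)]


def build_basic_indexes(elements: dict[str, list[str]]) -> list[str]:
    """Merge the combinations of every vertex and edge; each entry is one basic index.
    dict.fromkeys removes duplicates while keeping insertion order."""
    merged: list[str] = []
    for info in elements.values():
        merged.extend(all_combinations(info))
    return list(dict.fromkeys(merged))


example = {
    "A": ["a1", "a2"], "B": ["b1", "b2", "b3"], "C": ["c1", "c2"],  # vertices
    "M1": ["m11", "m12"], "M2": ["m21", "m22", "m23"],             # edges
}
basic_indexes = build_basic_indexes(example)
print(len(basic_indexes))  # 23
```

An element with k pieces of additional information yields 2^k - 1 combinations, which is why the priority-based pruning of additional information matters as the number of attributes grows.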
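It should be noted that converting the knowledge graph ontology into a graph structure makes the connections between data easy to obtain. Extracting the basic indexes from the graph structure makes it easy to obtain information about vertices (entities) and edges (relationships) separately, avoiding cases where vertices and edges are mixed up or confused. In addition, the whole basic-index construction process is automated, which, compared with manual construction, saves cost and is less prone to errors or omissions.
In addition, during construction of the basic indexes, a priority or weight may be set for each piece of additional information according to its type; when the amount of additional information exceeds a preset upper limit, low-priority or low-weight additional information is removed according to the priority or weight. For example, when the data query method is applied to a sales system and a vertex (entity) is a salesperson, vertex additional information (attributes) such as age and height is unimportant for sales analysis and can be given a lower priority or weight, whereas information such as working hours and personality may be more important for sales analysis and can be given a higher priority or weight.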
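Step 203: extract query patterns from the graph structure and construct the path indexes based on the query patterns.
In one embodiment, step 203 specifically includes: traversing the graph structure with a preset graph traversal algorithm to obtain all paths between vertices in the graph structure; taking each path as one query pattern and obtaining multiple query conditions under that pattern; and obtaining the statistical value corresponding to each query condition, with each query pattern containing its query conditions and statistical values serving as one path index. In other words, the count for every query condition is computed and stored while the path indexes are being built, so subsequent queries only need to extract the data and the statistical values directly, which achieves real-time query.
In this embodiment, a path index is built per path in the graph structure. A graph traversal algorithm is therefore used to obtain all paths between every pair of vertices in the graph structure; each path is taken as a query pattern, the query conditions under that pattern are determined, and the number of matches of each query condition in the graph structure is obtained as that condition's statistical value. For example, suppose the data query method is applied to a logistics system and the graph structure has 20 vertices, each representing a different item, all shipped from the same origin; then, when the query condition is the shipping origin, its statistical value is 20.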
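A sketch of the two ingredients of step 203, under stated assumptions: a depth-first search that enumerates every simple directed path (the application only says "a preset graph traversal algorithm"), and a precomputed count per single-attribute query condition, reproducing the logistics example of 20 items shipped from one origin. Names such as `W1` are invented.

```python
from collections import Counter


def all_simple_paths(adjacency: dict[str, list[str]]) -> list[list[str]]:
    """Depth-first enumeration of every simple directed path with at least one edge."""
    paths: list[list[str]] = []

    def dfs(path: list[str]) -> None:
        for nxt in adjacency.get(path[-1], []):
            if nxt in path:  # no repeated vertices, so cycles cannot loop forever
                continue
            path.append(nxt)
            paths.append(list(path))  # record the path from the start vertex to nxt
            dfs(path)
            path.pop()

    for start in adjacency:
        dfs([start])
    return paths


def condition_counts(vertex_info: dict[str, dict[str, str]]) -> Counter:
    """Statistical value of each single-attribute condition: how many vertices satisfy it."""
    return Counter(f"{k}={v}" for info in vertex_info.values() for k, v in info.items())


print(all_simple_paths({"A": ["B", "C"], "B": ["C"]}))
# [['A', 'B'], ['A', 'B', 'C'], ['A', 'C'], ['B', 'C']]
items = {f"item{i}": {"origin": "W1"} for i in range(20)}
print(condition_counts(items)["origin=W1"])  # 20
```

The number of simple paths can grow exponentially with graph size, so a practical implementation would presumably bound the path length; the application does not say how.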
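In one implementation, taking each path as a query pattern and obtaining the query conditions under it specifically includes: computing, for each path, all combinations of the additional information of all vertices in the path and all combinations of the additional information of all edges in the path; for each path, merging the vertex combinations and edge combinations into a path combination set; and taking each combination in the path combination set as one query condition.
In this embodiment, take the path A-B-C in the graph structure as an example. The path has three vertices, A (3 pieces of additional information), B (5), and C (2), and two edges, M1 (1) and M2 (3). The combinations of the 10 pieces of additional information of the three vertices are computed, as are the combinations of the 4 pieces of additional information of the two edges. In one implementation, the vertex combinations and edge combinations are merged into a path combination set, and each combination in this set is one query condition.
In one embodiment, after the vertex combinations and edge combinations are merged into the path combination set, the method further includes: sorting the combinations in the path combination set by string value. Sorting the path combination set in this way speeds up searching at query time.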
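The per-path query conditions and their string ordering can be sketched as follows. The sketch assumes, as in the A-B-C example, that the vertex information of the whole path is pooled before combining (likewise for edges), and uses `+` as a separator so that combined strings stay unambiguous. Sorting is what allows the binary search in `has_condition`; both helper names and the item names are hypothetical.

```python
from bisect import bisect_left
from itertools import combinations


def nonempty_combinations(items: list[str]) -> list[tuple[str, ...]]:
    return [c for size in range(1, len(items) + 1) for c in combinations(items, size)]


def path_query_conditions(vertex_info: list[list[str]], edge_info: list[list[str]]) -> list[str]:
    """Query conditions of one path: combinations of the pooled vertex info plus combinations
    of the pooled edge info, merged into one set and sorted by string value."""
    pooled_vertex = [x for info in vertex_info for x in info]
    pooled_edge = [x for info in edge_info for x in info]
    merged = {"+".join(c) for c in nonempty_combinations(pooled_vertex)}
    merged |= {"+".join(c) for c in nonempty_combinations(pooled_edge)}
    return sorted(merged)


def has_condition(sorted_conditions: list[str], condition: str) -> bool:
    """Binary search, which the string ordering makes possible."""
    i = bisect_left(sorted_conditions, condition)
    return i < len(sorted_conditions) and sorted_conditions[i] == condition


# Path A-B-C: vertices with 3, 5 and 2 pieces of info, edges with 1 and 3 (hypothetical names).
vertices = [["a1", "a2", "a3"], ["b1", "b2", "b3", "b4", "b5"], ["c1", "c2"]]
edges = [["m11"], ["m21", "m22", "m23"]]
conditions = path_query_conditions(vertices, edges)
print(len(conditions))                     # 1038
print(has_condition(conditions, "a1+b2"))  # True
```

For the example path, the vertex information yields 2^10 - 1 = 1023 combinations and the edge information 2^4 - 1 = 15, so the path contributes 1038 query conditions.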
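In addition, after the statistical value corresponding to each query condition is obtained and each query pattern containing multiple query conditions and statistical values is taken as a path index, the method further includes: when a new entity or new relationship is added to the path indexes, obtaining the entity attributes of the new entity or the relationship attributes of the new relationship; comparing the new entity attributes or new relationship attributes with each path index to determine the path indexes that match them; and updating, according to the new entity attributes or new relationship attributes, the query conditions and their statistical values in the matching path indexes.
In this embodiment, when a new entity or new relationship is added to the path indexes, its entity attributes or relationship attributes are obtained and compared with each path index. For example, when a new relationship is added, it is determined which two entities the relationship's attributes describe, and the matching path indexes are identified, i.e., those containing the additional information describing these two entities. When a new entity is added, it is determined which path indexes its entity attributes match. For example, suppose a path index contains multiple combinations of additional information that are all entity attributes of several entities and relationship attributes of the relationships between them, where the entities are all salespeople at the same company who are colleagues of one another. If a new entity is added that is also a salesperson and a colleague of these salespeople, the new entity attributes match this path index.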
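A simplified sketch of the incremental update, assuming each path index stores its query conditions as sets of `"key=value"` attributes with a count, and that a new element "matches" a condition when it carries all of that condition's attributes. The matching rule actually used by the application (for instance, checking which two entities a new relationship connects) is richer; this only shows counts being updated in place instead of rebuilding the index. All names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PathIndexEntry:
    pattern: str  # query pattern, e.g. "Salesperson-colleagueOf-Salesperson" (made-up)
    counts: dict[frozenset[str], int] = field(default_factory=dict)  # condition -> statistical value


def apply_new_element(indexes: list[PathIndexEntry], attributes: set[str]) -> list[str]:
    """Compare a new entity's (or relationship's) attributes with every path index and
    increment the statistical value of each condition the new element satisfies.
    Returns the patterns of the path indexes that were updated."""
    updated: list[str] = []
    for entry in indexes:
        matched = False
        for condition in entry.counts:
            if condition <= attributes:  # all attributes of the condition are present
                entry.counts[condition] += 1
                matched = True
        if matched:
            updated.append(entry.pattern)
    return updated


colleague = frozenset({"role=salesperson", "company=X"})
entry = PathIndexEntry("Salesperson-colleagueOf-Salesperson", {colleague: 5, frozenset({"role=manager"}): 2})
print(apply_new_element([entry], {"role=salesperson", "company=X", "age=30"}))
print(entry.counts[colleague])  # 6
```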
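In one embodiment, after all paths between vertices in the graph structure are obtained, the method further includes: determining priorities for the additional information of all vertices in a path according to a preset correspondence between vertex additional-information types and priorities; determining priorities for the additional information of all edges in a path according to a preset correspondence between edge additional-information types and priorities; when the amount of vertex additional information exceeds a preset vertex upper limit, removing N pieces of additional information from all the vertex additional information according to priority, where N is the difference between the amount of vertex additional information and the vertex upper limit; and when the amount of edge additional information exceeds a preset edge upper limit, removing M pieces of additional information from all the edge additional information according to priority, where M is the difference between the amount of edge additional information and the edge upper limit.
In this embodiment, priorities are set according to factors such as the importance of the additional information to the business and how frequently it is used in queries. When the amount of vertex (or edge) additional information exceeds the preset vertex (or edge) upper limit, the additional information is sorted by priority from high to low, and the lower-priority items beyond the upper limit are removed, which improves query speed.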
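Priority-based pruning is straightforward to express. The sketch below keeps the `limit` highest-priority items and drops the remaining N = len(info) - limit; applied separately to vertex information (with the vertex upper limit) and to edge information (with the edge upper limit), it yields the N and M of the text. The priority values are invented, following the salesperson example, and item names are assumed to be distinct.

```python
def prune_by_priority(info: list[str], priority: dict[str, int], limit: int) -> list[str]:
    """If there are more than `limit` items, drop the N = len(info) - limit lowest-priority
    ones (larger number = more important; unknown types count as 0). Surviving items keep
    their original order; ties favour earlier items because sorted() is stable.
    Assumes item names are distinct."""
    if len(info) <= limit:
        return list(info)
    ranked = sorted(info, key=lambda item: priority.get(item, 0), reverse=True)
    keep = set(ranked[:limit])
    return [item for item in info if item in keep]


# Salesperson vertex from the sales example; priorities are invented for illustration.
priority = {"working_hours": 3, "personality": 2, "age": 1, "height": 0}
info = ["age", "height", "working_hours", "personality"]
print(prune_by_priority(info, priority, limit=2))  # ['working_hours', 'personality']
```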
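Step 204: acquire a query statement, perform intent parsing on it, and determine the query type to which it belongs.
Step 205: when the query statement belongs to the statistical query type, map it to the preset path indexes to obtain the corresponding target index, and obtain the query result from the target index, where the query result includes data results and a statistical value corresponding to the data results, the statistical value indicating the number of data results corresponding to the query statement.
Step 206: when the query statement belongs to the traversal query type, traverse the preset basic indexes according to the query statement to obtain the corresponding query result.
In this embodiment, the implementation details of steps 204 to 206 are substantially the same as those of steps 101 to 103 and are not repeated here.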
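The knowledge-graph-based data query method provided by the present application performs intent parsing on the query statement to determine its query type and applies a different query method for each type, which improves query efficiency. When the query statement belongs to the statistical query type, it is mapped to the path indexes to determine the target index, from which the data results and their statistical value can be read directly. In other words, for statistical queries the query result is obtained directly from the target index, with no need to traverse the whole data source or to wait for a traversal to finish before the statistical value is available, achieving fast, real-time query. When the query statement requires a traversal query, all basic indexes are traversed to obtain the query result.
Furthermore, it should be understood that the division of the above methods into steps is only for clarity of description. In implementation, steps may be combined into one, or a step may be split into several; as long as the same logical relationship is retained, such variants fall within the protection scope of the present application. Adding insignificant modifications to the flow or introducing insignificant designs without changing its core design also falls within the protection scope of the present application.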
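An embodiment of the present application relates to a knowledge-graph-based data query system, which, as shown in FIG. 3, includes:
an intent parsing module 301, configured to acquire a query statement, perform intent parsing on it, and determine the query type to which it belongs;
a statistical query module 302, configured to, when the query statement belongs to the statistical query type, map the query statement to preset path indexes, obtain the target index corresponding to the query statement, and obtain the query result from the target index, where the query result includes data results and a statistical value corresponding to the data results, the statistical value indicating the number of data results corresponding to the query statement;
a traversal query module 303, configured to, when the query statement belongs to the traversal query type, traverse preset basic indexes according to the query statement to obtain the corresponding query result.
It is worth mentioning that all the modules in this embodiment are logical modules. In practice, a logical unit may be one physical unit, part of a physical unit, or a combination of multiple physical units. In addition, to highlight the innovative part of the present application, units not closely related to solving the technical problem addressed by the present application are not introduced in this embodiment, but this does not mean that no other units exist in this embodiment.
It is easy to see that this embodiment is a system embodiment corresponding to the embodiment of the knowledge-graph-based data query method, and the two can be implemented in cooperation. The relevant technical details mentioned in the method embodiment remain valid in this embodiment and, to avoid repetition, are not described again here. Correspondingly, the relevant technical details mentioned in this embodiment can also be applied in the method embodiment above.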
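An implementation of the present application relates to an electronic device, which, as shown in FIG. 4, includes: at least one processor 401; and a memory 402 communicatively connected to the at least one processor 401, where the memory 402 stores instructions executable by the at least one processor 401, and the instructions are executed by the at least one processor 401 to enable the at least one processor 401 to perform the knowledge-graph-based data query method of the above implementations.
The memory and the processor are connected by a bus. The bus may include any number of interconnected buses and bridges and links together the various circuits of one or more processors and the memory. The bus may also link together various other circuits, such as peripherals, voltage regulators, and power management circuits, which are well known in the art and are therefore not described further herein. A bus interface provides an interface between the bus and a transceiver. The transceiver may be a single element or multiple elements, such as multiple receivers and transmitters, providing a unit for communicating with various other systems over a transmission medium. Data processed by the processor is transmitted over a wireless medium through an antenna; the antenna also receives data and passes it to the processor.
The processor is responsible for managing the bus and for general processing, and can also provide various functions, including timing, peripheral interfacing, voltage regulation, power management, and other control functions. The memory can be used to store data used by the processor when performing operations.
An implementation of the present application relates to a computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements the knowledge-graph-based data query method described above.
That is, those skilled in the art will understand that all or part of the steps of the methods in the above implementations can be completed by a program instructing the relevant hardware. The program is stored in a storage medium and includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, etc.) or a processor to execute all or part of the steps of the methods described in the implementations of the present application. The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Those of ordinary skill in the art will understand that the above embodiments are specific embodiments for implementing the present application, and that in practical applications various changes in form and detail can be made to them without departing from the spirit and scope of the present application.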

Claims (13)

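  1. A knowledge-graph-based data query method, comprising:
    acquiring a query statement, performing intent parsing on the query statement, and determining the query type to which the query statement belongs;
    when the query statement belongs to a statistical query type, mapping the query statement to preset path indexes to obtain a target index corresponding to the query statement, and obtaining a query result from the target index, wherein the query result comprises data results and a statistical value corresponding to the data results, the statistical value representing the number of data results corresponding to the query statement;
    when the query statement belongs to a traversal query type, traversing preset basic indexes according to the query statement to obtain a query result corresponding to the query statement.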
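  2. The knowledge-graph-based data query method according to claim 1, wherein, before the acquiring a query statement, the method further comprises:
    acquiring an ontology of a knowledge graph, parsing the ontology of the knowledge graph, and converting it into a graph structure;
    constructing the basic indexes according to the graph structure;
    extracting query patterns from the graph structure, and constructing the path indexes based on the query patterns.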
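  3. The knowledge-graph-based data query method according to claim 2, wherein the ontology of the knowledge graph comprises entities, relationships, and attributes, the attributes comprising entity attributes and relationship attributes;
    the parsing the ontology of the knowledge graph and converting it into a graph structure comprises:
    mapping the entities to vertices of the graph structure, wherein the entity attributes are additional information of the vertices;
    mapping the relationships to edges of the graph structure, wherein the relationship attributes are additional information of the edges;
    the vertices and the edges constituting the graph structure.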
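  4. The knowledge-graph-based data query method according to claim 3, wherein the constructing the basic indexes according to the graph structure comprises:
    obtaining, respectively, all combinations of the additional information of each vertex and all combinations of the additional information of each edge;
    merging the combinations of all additional information of the vertices and the combinations of all additional information of the edges to obtain a combination set;
    taking each combination in the combination set as one basic index.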
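  5. The knowledge-graph-based data query method according to claim 3, wherein the extracting query patterns from the graph structure and constructing the path indexes based on the query patterns comprises:
    traversing the graph structure using a preset graph traversal algorithm to obtain all paths between vertices in the graph structure;
    taking each path as one query pattern, and obtaining multiple query conditions under the query pattern;
    obtaining a statistical value corresponding to each query condition, and taking each query pattern comprising multiple query conditions and multiple statistical values as one path index.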
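  6. The knowledge-graph-based data query method according to claim 5, wherein the taking each path as one query pattern and obtaining multiple query conditions under the query pattern comprises:
    computing, respectively, all combinations of all additional information of all vertices in each path and all combinations of all additional information of all edges in each path;
    for each path, merging the combinations of the additional information of the vertices and the combinations of the additional information of the edges to obtain a path combination set;
    taking each combination in the path combination set as one query condition.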
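  7. The knowledge-graph-based data query method according to claim 5, wherein, after the obtaining a statistical value corresponding to each query condition and taking each query pattern comprising multiple query conditions and multiple statistical values as one path index, the method further comprises:
    when a new entity or a new relationship is added to the path indexes, obtaining entity attributes of the new entity or relationship attributes of the new relationship;
    comparing the new entity attributes or the new relationship attributes with each path index to determine the path indexes that match the new entity attributes or the new relationship attributes;
    updating, according to the new entity attributes or the new relationship attributes, the query conditions and the statistical values corresponding to the query conditions in the path indexes that match the new entity attributes or the new relationship attributes.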
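  8. The knowledge-graph-based data query method according to claim 5, wherein, after the obtaining all paths between vertices in the graph structure, the method further comprises:
    determining priorities for the additional information of all vertices in the paths according to a preset correspondence between vertex additional-information types and priorities;
    determining priorities for the additional information of all edges in the paths according to a preset correspondence between edge additional-information types and priorities;
    when the amount of additional information of the vertices exceeds a preset vertex additional-information upper limit, removing N pieces of additional information from all the additional information of the vertices according to the priorities of the additional information of the vertices, N being the difference between the amount of additional information of the vertices and the vertex additional-information upper limit;
    when the amount of additional information of the edges exceeds a preset edge additional-information upper limit, removing M pieces of additional information from all the additional information of the edges according to the priorities of the additional information of the edges, M being the difference between the amount of additional information of the edges and the edge additional-information upper limit.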
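  9. The knowledge-graph-based data query method according to claim 6, wherein, after the merging the combinations of the additional information of the vertices and the combinations of the additional information of the edges to obtain a path combination set, the method further comprises:
    sorting the combinations in the path combination set by string value.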
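  10. The knowledge-graph-based data query method according to claim 1, wherein the performing intent parsing on the query statement and determining the query type to which the query statement belongs comprises:
    performing intent parsing on the query statement to obtain a set of atomic operations corresponding to the query statement;
    when the set of atomic operations contains no statistical operation, determining that the query statement is of the traversal query type;
    when the set of atomic operations contains a statistical operation, determining that the query statement is of the statistical query type.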
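  11. A knowledge-graph-based data query system, comprising:
    an intent parsing module, configured to acquire a query statement, perform intent parsing on the query statement, and determine the query type to which the query statement belongs;
    a statistical query module, configured to, when the query statement belongs to a statistical query type, map the query statement to preset path indexes, obtain a target index corresponding to the query statement, and obtain a query result from the target index, wherein the query result comprises data results and a statistical value corresponding to the data results, the statistical value representing the number of data results corresponding to the query statement;
    a traversal query module, configured to, when the query statement belongs to a traversal query type, traverse preset basic indexes according to the query statement to obtain a query result corresponding to the query statement.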
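  12. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the knowledge-graph-based data query method according to any one of claims 1 to 10.
  13. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the knowledge-graph-based data query method according to any one of claims 1 to 10.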
PCT/CN2022/143004 2021-12-29 2022-12-28 Knowledge-graph-based data query method, system, device, and storage medium WO2023125718A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111643082.8A CN116414878A (zh) 2021-12-29 2021-12-29 Knowledge-graph-based data query method, system, device, and storage medium
CN202111643082.8 2021-12-29

Publications (1)

Publication Number Publication Date
WO2023125718A1 (zh) 2023-07-06

Family

ID=86998111

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/143004 WO2023125718A1 (zh) 2021-12-29 2022-12-28 Knowledge-graph-based data query method, system, device, and storage medium

Country Status (2)

Country Link
CN (1) CN116414878A (zh)
WO (1) WO2023125718A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180129690A1 (en) * 2016-11-04 2018-05-10 International Business Machines Corporation Schema-Free In-Graph Indexing
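CN110019694A (zh) * 2017-07-26 2019-07-16 凡普互金有限公司 Method, apparatus, and computer-readable storage medium for knowledge graphs
CN111897971A (zh) * 2020-07-29 2020-11-06 中国电力科学研究院有限公司 Knowledge graph management method and system suitable for the field of power grid dispatching and control
WO2021208703A1 (zh) * 2020-11-19 2021-10-21 平安科技(深圳)有限公司 Question parsing method and apparatus, electronic device, and storage medium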

Non-Patent Citations (1)

Title
YANG, RONG ET AL.: "Design and Implementation of Information Query System Based on Knowledge Graph", COMPUTER & DIGITAL ENGINEERING, no. 04, 20 April 2020 (2020-04-20), XP009547272 *

Also Published As

Publication number Publication date
CN116414878A (zh) 2023-07-11

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22914968

Country of ref document: EP

Kind code of ref document: A1