CN109241078B - Knowledge graph organization query method based on mixed database - Google Patents
- Publication number
- CN109241078B (application number CN201811005179.4A)
- Authority
- CN
- China
- Prior art keywords
- entity
- entities
- query
- knowledge base
- database
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
The invention relates to a knowledge graph organization query method based on a mixed database, which comprises the following steps: acquiring a triple set in a preset data set; distinguishing an entity triple set and a relation triple set from the triple set; storing the entity triple set on Neo4j to obtain a knowledge base with entities; constructing an index for the knowledge base with the entity to obtain the knowledge base with the index and the entity; storing the relation triple set on Neo4j to obtain a knowledge base with indexes, entities and relations; storing entity ambiguity information on MySQL to construct an entity ambiguity word list; and storing the constructed entity ambiguity word list into a knowledge base with indexes, entities and relations to obtain a complete knowledge base. The invention provides a knowledge graph organization method based on a mixed database by combining the advantages of a relational database and a graph database, is suitable for a general knowledge graph in a large-scale open field, and improves the query efficiency of the knowledge graph while optimizing the storage structure of the knowledge graph.
Description
Technical Field
The invention particularly relates to a knowledge graph organization query method based on a mixed database.
Background
As an efficient way to organize and retrieve information, the knowledge graph has attracted a surge of research interest since Google introduced it in 2012. Entity extraction, attribute extraction, relation extraction between entities, knowledge reasoning, and knowledge representation learning are popular research topics, but few documents discuss how to store the graph at the underlying level or how to design the query interface in combination with that storage; where these topics are mentioned, the description is incomplete and scattered. Storage and query usually go together: efficient queries need a good storage structure to support them, and storage needs to be continuously optimized according to the characteristics of the queries.
Conventional databases, such as relational databases, can cluster and store data well according to the Schema-layer information of the knowledge graph and are efficient when accessing a given class of data. However, the Schema hierarchy of the data must be known before storage, and once the Schema is fixed it is difficult to change substantially. For a knowledge graph in a large-scale open domain, the types of entities and relations are usually numerous and complex, so the Schema hierarchy of the graph is hard to determine. In addition, relational databases perform poorly on multi-table join queries (typically with a join depth greater than 2), yet such queries are a very basic requirement of a knowledge graph.
NoSQL databases include key-value databases, column-family stores, document-oriented databases, graph databases, and so on. Among them, the data structure of a graph database is closest to the knowledge graph: it is a huge graph structure model consisting of a large number of entity nodes and the associations among entities, which can represent the relations between concrete or abstract things well and also meets the local-access characteristics of a graph. However, how to store information that does not fit the graph data structure, such as ambiguity information between entities, remains a problem to be solved.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a knowledge graph organization query method based on a hybrid database that addresses the above shortcomings of conventional relational database and graph database technology.
A knowledge graph organization query method based on a mixed database comprises the following steps:
step 1, acquiring a triple set in a preset data set;
step 2, distinguishing an entity triple set and a relation triple set from the triple set obtained in step 1;
step 3, storing the entity triple set on Neo4j to obtain a knowledge base with entities;
step 4, constructing indexes for the entity nodes stored in the knowledge base with entities to obtain a knowledge base with indexes and entities;
step 5, storing the relation triple set on Neo4j to obtain a knowledge base with indexes, entities and relations;
step 6, storing entity ambiguity information on MySQL to construct an entity ambiguity word list;
step 7, storing the entity ambiguity word list constructed in step 6 into the knowledge base with indexes, entities and relations obtained in step 5 to obtain a complete knowledge base;
step 8, inputting an entity to be queried, and querying the complete knowledge base obtained in step 7 with a two-stage MySQL + Neo4j query method to obtain complete entity information.
Further, the preset data set in step 2 refers to general descriptions of entities and relations, and is any one or combination of structured data, unstructured data and semi-structured data.
Further, the specific storage method in step 3 is as follows: distinguishing the different entity nodes in the entity triple set and storing them.
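The separation in step 2 and the node storage in step 3 can be sketched as follows. The patent does not state the rule used to tell the two kinds of triple apart, so this sketch assumes one: a triple is a relation triple when its object is itself the subject of some triple, and an entity (attribute) triple otherwise. The function names and the in-memory dictionary standing in for Neo4j are hypothetical.

```python
from collections import defaultdict


def split_triples(triples):
    """Split (subject, predicate, object) triples into two sets.

    Assumed rule (not from the patent): the object of a relation triple
    is itself an entity, i.e. it appears as the subject of some triple.
    """
    subjects = {s for s, _, _ in triples}
    entity_triples = [t for t in triples if t[2] not in subjects]
    relation_triples = [t for t in triples if t[2] in subjects]
    return entity_triples, relation_triples


def build_entity_nodes(entity_triples):
    """Group attribute triples by subject: one node per distinct entity."""
    nodes = defaultdict(dict)
    for subject, predicate, obj in entity_triples:
        nodes[subject][predicate] = obj
    return dict(nodes)


triples = [
    ("Qilixiang (song)", "release time", "2004"),
    ("Qilixiang (song)", "original singer", "Zhou Jielun"),
    ("Zhou Jielun", "occupation", "singer"),
]
entity_triples, relation_triples = split_triples(triples)
nodes = build_entity_nodes(entity_triples)
```

Under the assumed rule, only the "original singer" triple is treated as a relation, and each of the two subjects becomes one node carrying its attributes.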
Further, the specific storage method in step 5 is as follows: distinguishing head and tail entity nodes from the relation triple set, then querying the head and tail entities in the knowledge base with indexes and entities obtained in step 4; if the head and tail entities are hit, constructing a relation between the head and tail nodes; if they are not hit, discarding the relation.
Further, the entity ambiguity in step 6 refers to polysemy (one word with multiple meanings) and synonymy between entities.
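A minimal sketch of this hit-or-discard rule is given below. It reads "the head and tail entities are hit" as requiring both to be present, which is an interpretation; `entity_index` is a hypothetical dictionary standing in for the index built in step 4.

```python
def store_relations(relation_triples, entity_index):
    """Create an edge only when both head and tail hit the entity index.

    `entity_index` maps an entity name to a node identifier and stands in
    for the Neo4j index of step 4 (an assumption of this sketch).
    """
    edges, discarded = [], []
    for head, relation, tail in relation_triples:
        if head in entity_index and tail in entity_index:
            edges.append((entity_index[head], relation, entity_index[tail]))
        else:
            discarded.append((head, relation, tail))
    return edges, discarded


entity_index = {"Qilixiang (song)": 0, "Zhou Jielun": 1}
relation_triples = [
    ("Qilixiang (song)", "original singer", "Zhou Jielun"),
    ("Qilixiang (song)", "lyricist", "Fang Wenshan"),  # tail was never stored
]
edges, discarded = store_relations(relation_triples, entity_index)
```

Looking up endpoints through the index is what makes step 4 a prerequisite for step 5: without it, every relation insert would need a scan over all entity nodes.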
Further, the two-level query structure of MySQL + Neo4j specifically includes:
(1) inputting an entity to be queried and first performing an SQL query in the MySQL database to judge whether the query hits: if the SQL query hits, the entity to be queried is judged to be ambiguous, all ambiguous entities corresponding to it are returned to the user, the entity is disambiguated, and the disambiguated entity is input into the Neo4j database for a CQL query; if the SQL query misses, the entity to be queried is judged to be unambiguous and is passed directly to the Neo4j database for a CQL query;
(2) taking the entity to be queried or the disambiguated entity as the input of the Neo4j database, performing the CQL query, and obtaining complete entity information as the final output.
Further, the method for judging whether the SQL query hits is as follows: comparing the entity to be queried with the entity ambiguity word list obtained in step 6; if a match exists, the query hits; otherwise the query misses.
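The control flow of the two-level query can be sketched with plain dictionaries standing in for both databases. Everything named here (`two_level_query`, the `disambiguate` callback, the sample data) is hypothetical; the patent leaves the disambiguation strategy to the application.

```python
def two_level_query(name, ambiguity_vocabulary, knowledge_base, disambiguate):
    """Stage 1: vocabulary lookup; stage 2: entity lookup in the graph."""
    candidates = ambiguity_vocabulary.get(name)
    if candidates:                      # hit: the name is ambiguous
        entity = disambiguate(name, candidates)
    else:                               # miss: use the name as given
        entity = name
    return knowledge_base.get(entity)   # None if the entity is not stored


ambiguity_vocabulary = {"S1": ["E1", "E2"]}
knowledge_base = {"E1": {"kind": "album"}, "E2": {"kind": "song"}}


def pick_last(name, candidates):
    """Toy disambiguation: always choose the last candidate."""
    return candidates[-1]


hit = two_level_query("S1", ambiguity_vocabulary, knowledge_base, pick_last)
miss = two_level_query("E1", ambiguity_vocabulary, knowledge_base, pick_last)
```

Both branches end in the same second-stage lookup, which mirrors item (2): the graph query is identical whether or not the first stage hit.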
The invention has the advantages that: the knowledge graph organization method based on the mixed database is provided by combining the advantages of the relational database and the graph database, is suitable for the knowledge graph in the general large-scale open field, optimizes the storage structure of the knowledge graph and improves the query efficiency of the knowledge graph.
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a flow chart of a knowledge-graph organization query method based on a hybrid database according to the present invention;
FIG. 2 is a two-level query structure diagram of MySQL + Neo4j of the present invention.
Detailed Description
For a more clear understanding of the technical features, objects and effects of the present invention, embodiments of the present invention will now be described in detail with reference to the accompanying drawings.
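As shown in fig. 1, a knowledge graph organization query method based on a hybrid database includes:
step 1, acquiring a triple set in a preset data set;
step 2, distinguishing an entity triple set and a relation triple set from the triple set obtained in step 1;
step 3, storing the entity triple set on Neo4j to obtain a knowledge base with entities;
step 4, constructing indexes for the entity nodes stored in the knowledge base with entities to obtain a knowledge base with indexes and entities;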
step 5, storing the relation triple set on Neo4j: distinguishing head and tail entity nodes from the relation triple set, then querying the head and tail entities in the knowledge base with indexes and entities obtained in step 4; if the head and tail entities are hit, constructing a relation between the head and tail nodes, otherwise discarding the relation, thereby obtaining a knowledge base with indexes, entities and relations;
step 6, storing entity ambiguity information on MySQL to construct an entity ambiguity word list, where entity ambiguity refers to polysemy (one word with multiple meanings) and synonymy between entities;
step 7, storing the entity ambiguity word list constructed in step 6 into the knowledge base with indexes, entities and relations obtained in step 5 to obtain a complete knowledge base;
step 8, inputting an entity to be queried, and querying the complete knowledge base obtained in step 7 with a two-stage MySQL + Neo4j query method to obtain complete entity information.
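Steps 3 to 5 could be expressed in Cypher roughly as below. The `Entity` label, the `name` property, the generic `RELATED` relationship type and the index name `entity_name` are all assumptions, since the patent does not give its graph schema. The `CREATE INDEX ... FOR ... ON` form is the syntax of Neo4j 4.x and later; the Neo4j 3.x releases current when the patent was filed used `CREATE INDEX ON :Entity(name)` instead. The statements are held as strings so the sketch runs without a database server.

```python
# Assumed schema: nodes labelled Entity, identified by a `name` property.
CREATE_ENTITY = "MERGE (e:Entity {name: $name}) SET e += $properties"
CREATE_INDEX = "CREATE INDEX entity_name FOR (e:Entity) ON (e.name)"
# If either endpoint is missing, MATCH yields no rows and nothing is
# created, which corresponds to discarding the relation in step 5.
CREATE_RELATION = (
    "MATCH (h:Entity {name: $head}), (t:Entity {name: $tail}) "
    "MERGE (h)-[:RELATED {type: $relation}]->(t)"
)


def build_statements(nodes, relation_triples):
    """Return (cypher, parameters) pairs in the order of steps 3, 4, 5."""
    statements = [
        (CREATE_ENTITY, {"name": name, "properties": props})
        for name, props in nodes.items()
    ]
    statements.append((CREATE_INDEX, {}))
    statements += [
        (CREATE_RELATION, {"head": h, "relation": r, "tail": t})
        for h, r, t in relation_triples
    ]
    return statements


statements = build_statements(
    {
        "Zhou Jielun": {"occupation": "singer"},
        "Qilixiang (song)": {"release time": "2004"},
    },
    [("Qilixiang (song)", "original singer", "Zhou Jielun")],
)
```

The relation name is stored as a property because Cypher does not allow a relationship type to be supplied as a query parameter; a real schema might instead generate one statement per relationship type.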
The two-stage query method of MySQL + Neo4j is specifically as follows: first, MySQL is queried to determine whether entity ambiguity information exists for the entity; if so, the entity is disambiguated and then queried in Neo4j; otherwise, the entity is queried in Neo4j directly. As shown in fig. 2, the query process is as follows:
1. SQL query (as shown by the number 1 in FIG. 2)
Since it is not known in advance whether the input entity name is ambiguous, the input entity name first needs to be queried with SQL in the MySQL database; that is, the input entity name is matched against the first column of the ambiguity vocabulary in fig. 2. The first column of the ambiguity vocabulary is the entity name and the second column holds the ambiguous entities; for example, the key-value pair <S1, <E1, E2>> indicates that the entity name S1 is ambiguous and that the ambiguous entities E1 and E2 point to the same string S1. Depending on whether the query hits, the following two cases are handled:
1) SQL query hit:
That is, the input entity name is ambiguous (as shown in fig. 2, the input entity name Sm is ambiguous, so the ambiguous entities Ek to Ek+n pointing to the same character string Sm are returned after the query hits). All ambiguous entities Ek to Ek+n corresponding to the input are returned to the user and the entity is disambiguated (as shown by reference numeral 2 in fig. 2; the specific disambiguation mode is determined by the specific application scenario). The disambiguated entity (Ek+i) is then input into the Neo4j database for a CQL query (as shown by reference numeral 3 in fig. 2).
2) SQL query miss:
That is, the input entity name is not ambiguous, and the CQL query is carried out directly.
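The SQL stage can be sketched with a two-column table. Python's built-in `sqlite3` is used here only so that the example runs anywhere; the patent stores the vocabulary in MySQL. The table and column names are hypothetical, and storing one row per (name, ambiguous entity) pair is an assumed layout for the key-value pairs of fig. 2.

```python
import sqlite3

# sqlite3 stands in for MySQL so the sketch needs no server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ambiguity_vocabulary ("
    " entity_name TEXT NOT NULL,"       # first column: the input string
    " ambiguous_entity TEXT NOT NULL)"  # second column: one entity it may denote
)
conn.execute(
    "CREATE INDEX idx_entity_name ON ambiguity_vocabulary (entity_name)"
)
# The pair <S1, <E1, E2>> becomes two rows sharing the same entity_name.
conn.executemany(
    "INSERT INTO ambiguity_vocabulary VALUES (?, ?)",
    [("S1", "E1"), ("S1", "E2")],
)


def lookup_ambiguous_entities(conn, name):
    """Return all ambiguous entities for `name`; an empty list is a miss."""
    rows = conn.execute(
        "SELECT ambiguous_entity FROM ambiguity_vocabulary"
        " WHERE entity_name = ? ORDER BY ambiguous_entity",
        (name,),
    )
    return [row[0] for row in rows]


hit = lookup_ambiguous_entities(conn, "S1")
miss = lookup_ambiguous_entities(conn, "E1")
```

A non-empty result corresponds to case 1) and an empty result to case 2). With a MySQL driver the placeholder is typically `%s` rather than `?`.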
2. CQL query (i.e., query to knowledge base in FIG. 2)
Whether the SQL query is hit or not, only the entity name is finally obtained. In order to obtain the complete information of the entity, the obtained entity name is required to be used as the input of the Neo4j database to perform the CQL query, so as to obtain the complete entity information as the final response to the input of the user.
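A CQL (Cypher) query for this stage might look like the sketch below. The `Entity` label and `name` property are assumptions about the graph schema, which the patent does not specify, and `entity_info_query` is a hypothetical helper. The query text and its parameters are built without contacting a server.

```python
# Assumed schema: nodes labelled Entity, identified by a `name` property.
ENTITY_QUERY = "MATCH (e:Entity {name: $name}) RETURN properties(e) AS info"


def entity_info_query(entity_name):
    """Return the Cypher text and parameters for one entity lookup."""
    return ENTITY_QUERY, {"name": entity_name}


query, params = entity_info_query("E1")
```

Passing the entity name as a parameter rather than splicing it into the query text keeps names that contain parentheses or quotes from breaking the query. With the official Neo4j Python driver, the pair would typically be passed to a session as `session.run(query, params)`.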
Specific query examples are as follows:
Query example 1: the input entity name is ambiguous
1) Input entity: Qilixiang
2) SQL query: look up the ambiguity vocabulary in MySQL
3) The SQL query hits (the input entity name "Qilixiang" is ambiguous) and returns the ambiguous entities pointing to "Qilixiang":
Qilixiang (2004 album by Zhou Jielun)
Qilixiang (Murraya plant of Rutaceae)
Qilixiang (song sung by Zhou Jielun)
Qilixiang (title of a poem and of a poetry collection)
Qilixiang (Thai TV series)
Qilixiang (Chinese medicine)
Qilixiang (novel)
………………
4) Entity disambiguation:
Assume that entity disambiguation is performed according to context.
The context is: "Zhou Jielun's Qilixiang is a song that I like."
The entity obtained by disambiguation according to context is therefore: Qilixiang (song sung by Zhou Jielun)
5) CQL query:
The disambiguated entity "Qilixiang (song sung by Zhou Jielun)" is queried for entity information in Neo4j, giving the final output:
Qilixiang (song sung by Zhou Jielun)
Baidu tag: musical work / single
Chinese name: Qilixiang
Release time: 2004
Original singer: Zhou Jielun
Lyricist: Fang Wenshan
Album: Qilixiang (2004 album by Zhou Jielun)
Song duration: 4:56
Song language: Mandarin Chinese
Arranger: Zhong Xingmin
Composer: Zhou Jielun
Music style: Chinese style
………………
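The context-based disambiguation of query example 1 can be imitated with a toy word-overlap score. This is only an illustrative stand-in: the patent states that the disambiguation mode depends on the application scenario, and the function names and scoring rule here are hypothetical.

```python
import re


def tokens(text):
    """Lowercase word tokens of a string."""
    return set(re.findall(r"\w+", text.lower()))


def disambiguate_by_context(candidates, context):
    """Pick the candidate sharing the most words with the context.

    On a tie, max() keeps the earliest candidate in the list.
    """
    context_tokens = tokens(context)
    return max(candidates, key=lambda c: len(tokens(c) & context_tokens))


candidates = [
    "Qilixiang (2004 album by Zhou Jielun)",
    "Qilixiang (Murraya plant of Rutaceae)",
    "Qilixiang (song sung by Zhou Jielun)",
    "Qilixiang (Thai TV series)",
]
context = "Zhou Jielun's Qilixiang is a song that I like."
chosen = disambiguate_by_context(candidates, context)
```

The album and the song both share the singer's name with the context, but only the song entry also shares the word "song", so it scores highest. Real systems would need something more robust than word overlap.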
Query example 2: the input entity name is not ambiguous
1) Input entity: Qilixiang (song sung by Zhou Jielun)
2) SQL query: look up the ambiguity vocabulary in MySQL
3) The SQL query misses (the input entity name is not ambiguous)
4) CQL query:
The entity is queried for entity information in Neo4j, giving the final output:
Qilixiang (song sung by Zhou Jielun)
Baidu tag: musical work / single
Chinese name: Qilixiang
Release time: 2004
Original singer: Zhou Jielun
Lyricist: Fang Wenshan
Album: Qilixiang (2004 album by Zhou Jielun)
Song duration: 4:56
Song language: Mandarin Chinese
Arranger: Zhong Xingmin
Composer: Zhou Jielun
Music style: Chinese style
………………
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.
Claims (7)
1. A knowledge graph organization query method based on a mixed database is characterized by comprising the following steps:
step 1, acquiring a triple set in a preset data set;
step 2, distinguishing an entity triple set and a relation triple set from the triple set obtained in the step 1;
step 3, storing the entity triple set on Neo4j to obtain a knowledge base with the entity;
step 4, constructing indexes aiming at entity nodes stored in the knowledge base with the entities to obtain the knowledge base with the indexes and the entities;
step 5, storing the relation triple set on Neo4j to obtain a knowledge base with indexes, entities and relations;
step 6, storing entity ambiguity on MySQL to construct an entity ambiguity word list;
step 7, storing the entity ambiguity word list constructed in the step 6 into the knowledge base with indexes, entities and relations obtained in the step 5 to obtain a complete knowledge base;
step 8, inputting an entity to be queried, and querying the complete knowledge base obtained in step 7 with a two-stage MySQL + Neo4j query method to obtain complete entity information.
2. The method of claim 1, wherein the preset data set in step 2 refers to general description of entities and relationships, and is a combination of any one or more of structured data, unstructured data and semi-structured data.
3. The knowledge-graph organization query method based on the hybrid database according to claim 1, wherein the specific storage method in step 3 is: distinguishing the different entity nodes in the entity triple set and storing them.
4. The knowledge-graph organization query method based on the hybrid database according to claim 1, wherein the specific storage method in step 5 is: distinguishing head and tail entity nodes from the relation triple set, then querying the head and tail entities in the knowledge base with indexes and entities obtained in step 4; if the head and tail entities are hit, constructing a relation between the head and tail nodes; if they are not hit, discarding the relation.
5. The method of claim 1, wherein the entity ambiguity in step 6 refers to polysemy (one word with multiple meanings) and synonymy between entities.
6. The knowledge-graph organization query method based on the hybrid database according to claim 1, wherein the two-stage query structure of MySQL + Neo4j specifically comprises:
(1) inputting an entity to be queried and first performing an SQL query in the MySQL database to judge whether the query hits: if the SQL query hits, the entity to be queried is judged to be ambiguous, all ambiguous entities corresponding to it are returned to the user, the entity is disambiguated, and the disambiguated entity is input into the Neo4j database for a CQL query; if the SQL query misses, the entity to be queried is judged to be unambiguous and is passed directly to the Neo4j database for a CQL query;
(2) taking the entity to be queried or the disambiguated entity as the input of the Neo4j database, performing the CQL query, and obtaining complete entity information as the final output.
7. The method of claim 6, wherein the method of judging whether the SQL query hits is: comparing the entity to be queried with the entity ambiguity word list obtained in step 6; if a match exists, the query hits; otherwise the query misses.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811005179.4A CN109241078B (en) | 2018-08-30 | 2018-08-30 | Knowledge graph organization query method based on mixed database |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811005179.4A CN109241078B (en) | 2018-08-30 | 2018-08-30 | Knowledge graph organization query method based on mixed database |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109241078A (en) | 2019-01-18 |
CN109241078B (en) | 2021-07-20 |
Family
ID=65067986
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811005179.4A Active CN109241078B (en) | 2018-08-30 | 2018-08-30 | Knowledge graph organization query method based on mixed database |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109241078B (en) |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110019687B (en) * | 2019-04-11 | 2021-03-23 | 宁波深擎信息科技有限公司 | Multi-intention recognition system, method, equipment and medium based on knowledge graph |
CN111859974A (en) * | 2019-04-22 | 2020-10-30 | 广东小天才科技有限公司 | Semantic disambiguation method and device combined with knowledge graph and intelligent learning equipment |
CN110489610B (en) * | 2019-08-14 | 2022-02-08 | 北京海致星图科技有限公司 | Knowledge graph real-time query solution |
CN110597927B (en) * | 2019-10-14 | 2022-08-16 | 上海依图网络科技有限公司 | Storage query method and device based on heterogeneous database |
CN110928960B (en) * | 2019-10-28 | 2023-08-11 | 华中科技大学 | Data storage system, method, equipment and storage medium |
CN111160841A (en) * | 2019-11-29 | 2020-05-15 | 广东轩辕网络科技股份有限公司 | Organization architecture construction method and device based on knowledge graph |
CN113761213A (en) * | 2020-06-01 | 2021-12-07 | Tcl科技集团股份有限公司 | Data query system and method based on knowledge graph and terminal equipment |
CN113342807A (en) * | 2021-05-20 | 2021-09-03 | 电子科技大学 | Knowledge graph based on mixed database and construction method thereof |
CN113297089B (en) * | 2021-06-09 | 2023-06-20 | 南京大学 | Knowledge graph-based mass measurement assistant implementation method |
CN114238268B (en) * | 2021-11-29 | 2022-09-30 | 武汉达梦数据技术有限公司 | Data storage method and device |
CN114398492B (en) * | 2021-12-24 | 2022-08-30 | 森纵艾数(北京)科技有限公司 | Knowledge graph construction method, terminal and medium in digital field |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105224630A (en) * | 2015-09-24 | 2016-01-06 | 中国科学院自动化研究所 | Based on the integrated approach of Ontology on Semantic Web data |
CN106815293A (en) * | 2016-12-08 | 2017-06-09 | 中国电子科技集团公司第三十二研究所 | System and method for constructing knowledge graph for information analysis |
CN107330125A (en) * | 2017-07-20 | 2017-11-07 | 云南电网有限责任公司电力科学研究院 | The unstructured distribution data integrated approach of magnanimity of knowledge based graphical spectrum technology |
CN107633075A (en) * | 2017-09-22 | 2018-01-26 | 吉林大学 | A kind of multi-source heterogeneous data fusion platform and fusion method |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9946739B2 (en) * | 2013-03-15 | 2018-04-17 | Neura Labs Corp. | Intelligent internet system with adaptive user interface providing one-step access to knowledge |
KR20140145018A (en) * | 2013-06-12 | 2014-12-22 | 한국전자통신연구원 | Knowledge index system and method thereof |
US20180137424A1 (en) * | 2016-11-17 | 2018-05-17 | General Electric Company | Methods and systems for identifying gaps in predictive model ontology |
Also Published As
Publication number | Publication date |
---|---|
CN109241078A (en) | 2019-01-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109241078B (en) | Knowledge graph organization query method based on mixed database | |
CN107993724B (en) | Medical intelligent question and answer data processing method and device | |
KR101732342B1 (en) | Trusted query system and method | |
CN111291161A (en) | Legal case knowledge graph query method, device, equipment and storage medium | |
Sarawagi et al. | Open-domain quantity queries on web tables: annotation, response, and consensus models | |
US20120166414A1 (en) | Systems and methods for relevance scoring | |
US20170116260A1 (en) | Using a dimensional data model for transforming a natural language query to a structured language query | |
CN101763402B (en) | Integrated retrieval method for multi-language information retrieval | |
WO2007143899A1 (en) | System and method for intelligent retrieval and treating of information | |
CN103646032A (en) | Database query method based on body and restricted natural language processing | |
JP2005251115A (en) | System and method of associative retrieval | |
KR20160007040A (en) | Method and system for searching by using natural language query | |
RU2010107150A (en) | IDENTIFICATION OF SEMANTIC RELATIONS IN INDIRECT SPEECH | |
CN109597895B (en) | Knowledge graph-based official document searching method | |
CN112231321B (en) | Oracle secondary index and index real-time synchronization method | |
KR101095866B1 (en) | Triple indexing and searching scheme for efficient information retrieval | |
WO2020074017A1 (en) | Deep learning-based method and device for screening for keywords in medical document | |
CN101751420A (en) | Semantics vein document searching method | |
CN115563313A (en) | Knowledge graph-based document book semantic retrieval system | |
Hu et al. | Scalable aggregate keyword query over knowledge graph | |
CN106649879A (en) | Method for intelligent recommendation of professional book in library | |
TWI605353B (en) | File classification system, method and computer program product based on lexical statistics | |
Hovy et al. | Data Acquisition and Integration in the DGRC's Energy Data Collection Project | |
Chakrabarti et al. | Enhancing search with structure | |
Li et al. | Ontology-based query system design and implementation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
EE01 | Entry into force of recordation of patent licensing contract | ||
Application publication date: 20190118; Assignee: WUHAN TIMES GEOSMART TECHNOLOGY Co.,Ltd.; Assignor: CHINA University OF GEOSCIENCES (WUHAN CITY); Contract record no.: X2022420000021; Denomination of invention: An organization and query method of knowledge map based on hybrid database; Granted publication date: 20210720; License type: Common License; Record date: 20220302 |