CN109241078B - Knowledge graph organization query method based on mixed database - Google Patents

Publication number
CN109241078B
Authority
CN
China
Prior art keywords
entity
entities
query
knowledge base
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811005179.4A
Other languages
Chinese (zh)
Other versions
CN109241078A (en)
Inventor
李新川
姚宏
陈仁谣
李圣文
梁庆中
郑坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Geosciences
Original Assignee
China University of Geosciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-08-30
Filing date
2018-08-30
Publication date
2021-07-20
Application filed by China University of Geosciences
Priority to CN201811005179.4A
Publication of CN109241078A
Application granted
Publication of CN109241078B
Legal status: Active

Abstract

The invention relates to a knowledge graph organization query method based on a mixed database, which comprises the following steps: acquiring a triple set in a preset data set; distinguishing an entity triple set and a relation triple set from the triple set; storing the entity triple set on Neo4j to obtain a knowledge base with entities; constructing an index for the knowledge base with the entity to obtain the knowledge base with the index and the entity; storing the relation triple set on Neo4j to obtain a knowledge base with indexes, entities and relations; storing entity ambiguity information on MySQL to construct an entity ambiguity word list; and storing the constructed entity ambiguity word list into a knowledge base with indexes, entities and relations to obtain a complete knowledge base. The invention provides a knowledge graph organization method based on a mixed database by combining the advantages of a relational database and a graph database, is suitable for a general knowledge graph in a large-scale open field, and improves the query efficiency of the knowledge graph while optimizing the storage structure of the knowledge graph.

Description

Knowledge graph organization query method based on mixed database
Technical Field
The invention particularly relates to a knowledge graph organization query method based on a mixed database.
Background
As an efficient way of organizing and retrieving information, the knowledge graph has attracted intense research interest since Google introduced it in 2012. Entity extraction, attribute extraction, extraction of relations between entities, knowledge reasoning, and knowledge representation learning are all active research topics. However, few publications discuss how to implement the underlying storage of the graph or how to design the storage together with the query interface, and those that do describe these aspects incompletely and in a scattered way. Storage and querying usually have to be considered as a whole: efficient querying needs the support of a good storage structure, and storage must be continuously optimized according to the characteristics of the queries.
Conventional databases such as relational databases can cluster and store data well according to the Schema-layer information of the knowledge graph, and are efficient when accessing data of a particular class. However, the Schema hierarchy of the data must be known before storage, and once the Schema is fixed it is difficult to change substantially. For a knowledge graph in a large-scale open domain, the types of entities and relations are usually numerous and complex, and the Schema hierarchy of the graph is difficult to determine. In addition, a relational database performs poorly on multi-table join queries (typically with a join depth greater than 2), yet such queries are a very basic requirement of a knowledge graph.
NoSQL databases include key-value databases, column-family stores, document-oriented databases, graph databases, and so on. Among them, the data structure of the graph database is closest to the knowledge graph: it is a huge graph-structured model consisting of a large number of entity nodes and the associations among entities, which can represent the relations among concrete or abstract things well and also satisfies the local-access characteristics of a graph. However, how to store information that does not fit the graph data structure, such as ambiguity information between entities, remains a problem to be solved.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a knowledge graph organization query method based on a hybrid database that addresses the above shortcomings of conventional relational database and graph database technology.
A knowledge graph organization query method based on a mixed database comprises the following steps:
step 1, acquiring a triple set in a preset data set;
step 2, distinguishing an entity triple set and a relation triple set from the triple set obtained in the step 1;
step 3, storing the entity triple set on Neo4j to obtain a knowledge base with the entity;
step 4, constructing indexes aiming at entity nodes stored in the knowledge base with the entities to obtain the knowledge base with the indexes and the entities;
step 5, storing the relation triple set on Neo4j to obtain a knowledge base with indexes, entities and relations;
step 6, storing entity ambiguity on MySQL to construct an entity ambiguity word list;
step 7, storing the entity ambiguity word list constructed in the step 6 into the knowledge base with indexes, entities and relations obtained in the step 5 to obtain a complete knowledge base;
step 8, inputting an entity to be queried, and querying the complete knowledge base obtained in step 7 using a two-stage MySQL + Neo4j query method to obtain complete entity information.
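Step 2 does not spell out how entity triples are told apart from relation triples. The following is a minimal sketch under one possible assumption: a triple counts as a relation triple when its object also occurs as the subject of some triple (so it links two entities), and as an entity (attribute) triple otherwise. The function name `split_triples` and the sample data are illustrative and do not come from the source.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def split_triples(triples: Iterable[Triple]) -> Tuple[List[Triple], List[Triple]]:
    """Split triples into (entity_triples, relation_triples).

    Assumption (not stated in the source): an object that also occurs
    as a subject is itself an entity, so the triple links two entities.
    """
    triples = list(triples)
    subjects: Set[str] = {s for s, _, _ in triples}
    entity_triples: List[Triple] = []
    relation_triples: List[Triple] = []
    for s, p, o in triples:
        if o in subjects:
            relation_triples.append((s, p, o))  # entity -> entity
        else:
            entity_triples.append((s, p, o))    # entity -> attribute value
    return entity_triples, relation_triples


# Hypothetical sample data.
sample = [
    ("E1", "release_year", "2004"),
    ("E2", "occupation", "singer"),
    ("E1", "performed_by", "E2"),
]
entity_triples, relation_triples = split_triples(sample)
```

Other criteria (for example, a fixed list of relation predicates) would fit the method equally well; the choice depends on the data set.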
Further, the preset data set in step 2 refers to general descriptions of entities and relations, and is any one or combination of structured data, unstructured data and semi-structured data.
Further, the specific storage method in step 3 is as follows: distinguishing the different entity nodes in the entity triple set and storing them.
Further, the specific storage method in step 5 is as follows: distinguishing the head and tail entity nodes in the relation triple set, then querying the head and tail entities in the knowledge base with indexes and entities obtained in step 4; if the head and tail entities are hit, constructing a relation between the head and tail nodes; if they are not hit, discarding the relation.
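The hit-or-discard rule of step 5 can be illustrated without a database. In this sketch a plain dictionary stands in for the indexed entity store in Neo4j; the names `store_relations` and `entity_index` and the node ids are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]    # (head, relation, tail)
Edge = Tuple[int, str, int]      # (head node id, relation, tail node id)


def store_relations(
    entity_index: Dict[str, int], relation_triples: List[Triple]
) -> Tuple[List[Edge], List[Triple]]:
    """Create an edge only when both head and tail are found in the index."""
    edges: List[Edge] = []
    discarded: List[Triple] = []
    for head, rel, tail in relation_triples:
        if head in entity_index and tail in entity_index:
            # Both endpoints hit: build the relation between the two nodes.
            edges.append((entity_index[head], rel, entity_index[tail]))
        else:
            # At least one endpoint is missing: the relation is dropped.
            discarded.append((head, rel, tail))
    return edges, discarded


# Hypothetical index built in step 4: entity name -> node id.
entity_index = {"E1": 0, "E2": 1}
edges, discarded = store_relations(
    entity_index,
    [("E1", "performed_by", "E2"), ("E1", "covered_by", "E9")],
)
```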
Further, the entity ambiguity in step 6 refers to polysemy (one word with multiple meanings) and synonymy among entities.
Further, the two-level query structure of MySQL + Neo4j specifically includes:
(1) inputting an entity to be queried, first performing an SQL query in the MySQL database, and judging whether the query hits: if the SQL query hits, judging that the entity to be queried is ambiguous, returning all ambiguous entities corresponding to it to the user, disambiguating, and inputting the disambiguated entity into the Neo4j database for a CQL (Cypher Query Language) query; if the SQL query misses, judging that the entity to be queried is not ambiguous, and passing it directly to the Neo4j database for a CQL query;
(2) taking the entity to be queried or the disambiguated entity as the input of the Neo4j database, performing the CQL query, and obtaining complete entity information as the final output.
Further, whether the SQL query hits is judged as follows: the entity to be queried is compared with the entity ambiguity word list obtained in step 6; if there is a match, the query hits; otherwise, the query misses.
The invention has the advantages that: the knowledge graph organization method based on the mixed database is provided by combining the advantages of the relational database and the graph database, is suitable for the knowledge graph in the general large-scale open field, optimizes the storage structure of the knowledge graph and improves the query efficiency of the knowledge graph.
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a flow chart of a knowledge-graph organization query method based on a hybrid database according to the present invention;
FIG. 2 is a two-level query structure diagram of MySQL + Neo4j of the present invention.
Detailed Description
For a more clear understanding of the technical features, objects and effects of the present invention, embodiments of the present invention will now be described in detail with reference to the accompanying drawings.
As shown in fig. 1, a method for querying a knowledge-graph organization based on a hybrid database includes:
step 1, acquiring a triple set in a preset data set, wherein the preset data set refers to general description of entities and relations and comprises structured data, unstructured data and semi-structured data;
step 2, distinguishing an entity triple set and a relation triple set from the triple set obtained in the step 1;
step 3, storing the entity triple set on Neo4j, distinguishing different entity nodes from the entity triple set, and storing to obtain a knowledge base with an entity;
step 4, constructing indexes for the entity nodes stored in the knowledge base with entities to obtain the knowledge base with indexes and entities;
step 5, storing the relation triple set on Neo4j: distinguishing the head and tail entity nodes in the relation triple set, then querying the head and tail entities in the knowledge base with indexes and entities obtained in step 4; if the head and tail entities are hit, constructing a relation between the head and tail nodes, otherwise discarding the relation, to obtain the knowledge base with indexes, entities, and relations;
step 6, storing entity ambiguity information on MySQL to construct an entity ambiguity word list, wherein entity ambiguity refers to polysemy (one word with multiple meanings) and synonymy among entities;
step 7, storing the entity ambiguity word list constructed in step 6 into the knowledge base with indexes, entities, and relations obtained in step 5 to obtain a complete knowledge base;
step 8, inputting an entity to be queried, and querying the complete knowledge base obtained in step 7 using a two-stage MySQL + Neo4j query method to obtain complete entity information.
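Steps 3 to 5 could map onto Cypher statements such as the ones below. This is only a sketch: the `Entity` label, the `name` property, the `RELATED` relationship type, and the helper names are assumptions, not part of the source. The index statement uses the `CREATE INDEX ... FOR ... ON` form of recent Neo4j releases (older releases use `CREATE INDEX ON :Entity(name)`). The functions only build query text and parameters and never connect to a database.

```python
from typing import Any, Dict, Tuple

Statement = Tuple[str, Dict[str, Any]]  # (Cypher text, parameters)

# Step 4: index entity nodes by name so head/tail lookups in step 5 are fast.
INDEX_STATEMENT = (
    "CREATE INDEX entity_name IF NOT EXISTS FOR (e:Entity) ON (e.name)"
)


def entity_statement(name: str, props: Dict[str, Any]) -> Statement:
    """Step 3: one node per entity; MERGE avoids creating duplicates."""
    query = "MERGE (e:Entity {name: $name}) SET e += $props"
    return query, {"name": name, "props": props}


def relation_statement(head: str, rel: str, tail: str) -> Statement:
    """Step 5: MATCH yields no rows when either endpoint is missing,
    so in that case no relationship is created (the relation is dropped)."""
    query = (
        "MATCH (h:Entity {name: $head}), (t:Entity {name: $tail}) "
        "MERGE (h)-[:RELATED {type: $rel}]->(t)"
    )
    return query, {"head": head, "rel": rel, "tail": tail}


node_query, node_params = entity_statement("E1", {"release_year": 2004})
rel_query, rel_params = relation_statement("E1", "performed_by", "E2")
```

The relation name is stored as a property because Cypher does not allow a relationship type to be passed as a query parameter; building the type into the query text is an alternative if the names are validated first.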
The two-stage MySQL + Neo4j query method is specifically as follows: first, MySQL is queried to check whether the entity has entity ambiguity information; if so, the entity is disambiguated and then queried in Neo4j; otherwise, it is queried in Neo4j directly. As shown in fig. 2, the query process is as follows:
1. SQL query (as shown by the number 1 in FIG. 2)
Since it is not known in advance whether the input entity name is ambiguous, an SQL query must first be performed on it in the MySQL database; that is, the input entity name is matched against the first column of the ambiguity word list in fig. 2. The first column of the ambiguity word list is the entity name and the second column is the ambiguous entities; for example, the key-value pair <S1, <E1, E2>> indicates that the entity name S1 is ambiguous and that the ambiguous entities E1 and E2 point to the same string S1. Depending on whether the query hits, one of the following two cases applies:
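One way to lay out the ambiguity word list as a relational table is one row per (entity name, ambiguous entity) pair. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in for MySQL so that it runs without a server; the table name `ambiguity` and the column names are assumptions. With a MySQL driver the SQL would be essentially the same, although the placeholder syntax is typically `%s` rather than `?`.

```python
import sqlite3
from typing import List

# In-memory SQLite database standing in for the MySQL ambiguity store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ambiguity (surface_name TEXT NOT NULL, entity TEXT NOT NULL)"
)
conn.execute("CREATE INDEX idx_surface_name ON ambiguity (surface_name)")
# The key-value pair <S1, <E1, E2>> becomes two rows.
conn.executemany(
    "INSERT INTO ambiguity (surface_name, entity) VALUES (?, ?)",
    [("S1", "E1"), ("S1", "E2")],
)


def ambiguous_entities(db: sqlite3.Connection, name: str) -> List[str]:
    """Return every entity the name may refer to; an empty list is a miss."""
    rows = db.execute(
        "SELECT entity FROM ambiguity WHERE surface_name = ? ORDER BY entity",
        (name,),
    ).fetchall()
    return [row[0] for row in rows]


hit = ambiguous_entities(conn, "S1")    # query hit: the name is ambiguous
miss = ambiguous_entities(conn, "S2")   # query miss: the name is not ambiguous
```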
1) SQL query hit:
That is, the input entity name is ambiguous (as shown in fig. 2, the input entity name Sm is ambiguous, so the ambiguous entities Ek to Ek+n pointing to the same string Sm are returned after the query hits). All ambiguous entities Ek to Ek+n corresponding to the input are returned to the user and disambiguated (as shown by number 2 in fig. 2; the specific disambiguation method depends on the application scenario), and the disambiguated entity (Ek+i) is input into the Neo4j database for a CQL query (as shown by number 3 in fig. 2).
2) SQL query miss:
That is, the input entity name is not ambiguous, and the CQL query is performed directly.
2. CQL query (i.e., query to knowledge base in FIG. 2)
Whether or not the SQL query hits, the result so far is only an entity name. To obtain the complete information of the entity, the entity name must be used as the input of the Neo4j database for a CQL query, whose result, the complete entity information, is the final response to the user's input.
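The control flow of the two-stage query can be summarized in a few lines. In this sketch the two databases are replaced by plain dictionaries and the disambiguation strategy is passed in as a function, since the source leaves it to the application; `two_stage_query`, `vocabulary`, `graph`, and `pick_last` are hypothetical names.

```python
from typing import Callable, Dict, List

EntityInfo = Dict[str, str]


def two_stage_query(
    name: str,
    lookup_ambiguity: Callable[[str], List[str]],
    disambiguate: Callable[[str, List[str]], str],
    fetch_entity: Callable[[str], EntityInfo],
) -> EntityInfo:
    """Stage 1: ambiguity lookup (SQL). Stage 2: entity lookup (CQL)."""
    candidates = lookup_ambiguity(name)
    if candidates:
        # SQL query hit: resolve the name to one concrete entity first.
        name = disambiguate(name, candidates)
    # Hit or miss, the graph store is always queried with a single entity name.
    return fetch_entity(name)


# Dictionary stand-ins for the MySQL word list and the Neo4j graph.
vocabulary = {"S1": ["E1", "E2"]}
graph = {"E1": {"kind": "album"}, "E2": {"kind": "song"}}


def pick_last(name: str, candidates: List[str]) -> str:
    """Toy strategy; a real system would use context or ask the user."""
    return candidates[-1]


def lookup(name: str) -> List[str]:
    return vocabulary.get(name, [])


def fetch(name: str) -> EntityInfo:
    return graph.get(name, {})


ambiguous_result = two_stage_query("S1", lookup, pick_last, fetch)
direct_result = two_stage_query("E1", lookup, pick_last, fetch)
```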
Specific query examples are as follows:
Query example 1: the input entity name is ambiguous
1) Input entity: Qilixiang
2) SQL query: query the ambiguity word list in MySQL
3) The SQL query hits (meaning the input entity name "Qilixiang" is ambiguous), and the ambiguous entities pointing to "Qilixiang" are returned:
Qilixiang (Jay Chou's 2004 album)
Qilixiang (plant of the genus Murraya, family Rutaceae)
Qilixiang (song performed by Jay Chou)
Qilixiang (poem title, poetry collection title)
Qilixiang (Thai TV series)
Qilixiang (Chinese medicine)
Qilixiang (novel)
………………
4) Entity disambiguation:
Assume that entity disambiguation is performed according to context.
The context is: "Jay Chou's Qilixiang is a song that I like."
So the entity obtained by context-based disambiguation is: Qilixiang (song performed by Jay Chou)
5) CQL query:
The disambiguated entity "Qilixiang (song performed by Jay Chou)" is queried in Neo4j for its entity information, giving the final output:
Qilixiang (song performed by Jay Chou)
BaiduTAG: musical work/single
Chinese name: Qilixiang
Release date: 2004
Original singer: Jay Chou
Lyricist: Fang Wenshan
Album: Qilixiang (Jay Chou's 2004 album)
Song duration: 4:56
Song language: Mandarin Chinese
Arranger: Zhong Xingmin
Composer: Jay Chou
Music style: Chinese style
………………
Query example 2: the input entity name is not ambiguous
1) Input entity: Qilixiang (song performed by Jay Chou)
2) SQL query: query the ambiguity word list in MySQL
3) The SQL query misses (meaning the input entity name is not ambiguous)
4) CQL query:
The entity information query is performed in Neo4j, giving the final output:
Qilixiang (song performed by Jay Chou)
BaiduTAG: musical work/single
Chinese name: Qilixiang
Release date: 2004
Original singer: Jay Chou
Lyricist: Fang Wenshan
Album: Qilixiang (Jay Chou's 2004 album)
Song duration: 4:56
Song language: Mandarin Chinese
Arranger: Zhong Xingmin
Composer: Jay Chou
Music style: Chinese style
………………
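The source leaves the disambiguation method to the application. As one deliberately simple illustration of the context-based disambiguation assumed in query example 1, the sketch below picks the candidate whose text shares the most words with the context sentence. The word-overlap heuristic, the function names, and the English paraphrases of the candidate entities are all assumptions made for illustration, not the method of the patent.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercase alphanumeric word set of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def disambiguate_by_context(context: str, candidates: List[str]) -> str:
    """Return the candidate sharing the most words with the context.

    Ties are resolved in favour of the earliest candidate.
    """
    context_words = tokens(context)
    return max(candidates, key=lambda c: len(tokens(c) & context_words))


candidates = [
    "Qilixiang (Jay Chou 2004 album)",
    "Qilixiang (plant of the genus Murraya, family Rutaceae)",
    "Qilixiang (song performed by Jay Chou)",
    "Qilixiang (Thai TV series)",
]
chosen = disambiguate_by_context(
    "Jay Chou's Qilixiang is a song that I like.", candidates
)
```

Here the song entity wins because it matches "song" in addition to the singer's name; a production system would more likely rely on entity linking models or on user selection.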
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (7)

1. A knowledge graph organization query method based on a mixed database is characterized by comprising the following steps:
step 1, acquiring a triple set in a preset data set;
step 2, distinguishing an entity triple set and a relation triple set from the triple set obtained in the step 1;
step 3, storing the entity triple set on Neo4j to obtain a knowledge base with the entity;
step 4, constructing indexes aiming at entity nodes stored in the knowledge base with the entities to obtain the knowledge base with the indexes and the entities;
step 5, storing the relation triple set on Neo4j to obtain a knowledge base with indexes, entities and relations;
step 6, storing entity ambiguity on MySQL to construct an entity ambiguity word list;
step 7, storing the entity ambiguity word list constructed in the step 6 into the knowledge base with indexes, entities and relations obtained in the step 5 to obtain a complete knowledge base;
step 8, inputting an entity to be queried, and querying the complete knowledge base obtained in step 7 using a two-stage MySQL + Neo4j query method to obtain complete entity information.
2. The method of claim 1, wherein the preset data set in step 2 refers to general description of entities and relationships, and is a combination of any one or more of structured data, unstructured data and semi-structured data.
3. The knowledge-graph organization query method based on the hybrid database according to claim 1, wherein the specific storage method in step 3 is: distinguishing the different entity nodes in the entity triple set and storing them.
4. The knowledge-graph organization query method based on the hybrid database according to claim 1, wherein the specific storage method in step 5 is: distinguishing the head and tail entity nodes in the relation triple set, then querying the head and tail entities in the knowledge base with indexes and entities obtained in step 4; if the head and tail entities are hit, constructing a relation between the head and tail nodes; if they are not hit, discarding the relation.
5. The method of claim 1, wherein the entity ambiguity in step 6 refers to polysemy (one word with multiple meanings) and synonymy among entities.
6. The knowledge-graph organization query method based on the hybrid database according to claim 1, wherein the two-stage query structure of MySQL + Neo4j specifically comprises:
(1) inputting an entity to be queried, first performing an SQL query in the MySQL database, and judging whether the query hits: if the SQL query hits, judging that the entity to be queried is ambiguous, returning all ambiguous entities corresponding to it to the user, disambiguating, and inputting the disambiguated entity into the Neo4j database for a CQL query; if the SQL query misses, judging that the entity to be queried is not ambiguous, and passing it directly to the Neo4j database for a CQL query;
(2) taking the entity to be queried or the disambiguated entity as the input of the Neo4j database, performing the CQL query, and obtaining complete entity information as the final output.
7. The method of claim 6, wherein whether the SQL query hits is judged as follows: the entity to be queried is compared with the entity ambiguity word list obtained in step 6; if there is a match, the query hits; otherwise, the query misses.
CN201811005179.4A 2018-08-30 2018-08-30 Knowledge graph organization query method based on mixed database Active CN109241078B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811005179.4A CN109241078B (en) 2018-08-30 2018-08-30 Knowledge graph organization query method based on mixed database

Publications (2)

Publication Number Publication Date
CN109241078A CN109241078A (en) 2019-01-18
CN109241078B true CN109241078B (en) 2021-07-20

Family

ID=65067986

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811005179.4A Active CN109241078B (en) 2018-08-30 2018-08-30 Knowledge graph organization query method based on mixed database

Country Status (1)

Country Link
CN (1) CN109241078B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110019687B (en) * 2019-04-11 2021-03-23 宁波深擎信息科技有限公司 Multi-intention recognition system, method, equipment and medium based on knowledge graph
CN111859974A (en) * 2019-04-22 2020-10-30 广东小天才科技有限公司 Semantic disambiguation method and device combined with knowledge graph and intelligent learning equipment
CN110489610B (en) * 2019-08-14 2022-02-08 北京海致星图科技有限公司 Knowledge graph real-time query solution
CN110597927B (en) * 2019-10-14 2022-08-16 上海依图网络科技有限公司 Storage query method and device based on heterogeneous database
CN110928960B (en) * 2019-10-28 2023-08-11 华中科技大学 Data storage system, method, equipment and storage medium
CN111160841A (en) * 2019-11-29 2020-05-15 广东轩辕网络科技股份有限公司 Organization architecture construction method and device based on knowledge graph
CN113761213A (en) * 2020-06-01 2021-12-07 Tcl科技集团股份有限公司 Data query system and method based on knowledge graph and terminal equipment
CN113342807A (en) * 2021-05-20 2021-09-03 电子科技大学 Knowledge graph based on mixed database and construction method thereof
CN113297089B (en) * 2021-06-09 2023-06-20 南京大学 Knowledge graph-based mass measurement assistant implementation method
CN114238268B (en) * 2021-11-29 2022-09-30 武汉达梦数据技术有限公司 Data storage method and device
CN114398492B (en) * 2021-12-24 2022-08-30 森纵艾数(北京)科技有限公司 Knowledge graph construction method, terminal and medium in digital field

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105224630A (en) * 2015-09-24 2016-01-06 中国科学院自动化研究所 Based on the integrated approach of Ontology on Semantic Web data
CN106815293A (en) * 2016-12-08 2017-06-09 中国电子科技集团公司第三十二研究所 System and method for constructing knowledge graph for information analysis
CN107330125A (en) * 2017-07-20 2017-11-07 云南电网有限责任公司电力科学研究院 The unstructured distribution data integrated approach of magnanimity of knowledge based graphical spectrum technology
CN107633075A (en) * 2017-09-22 2018-01-26 吉林大学 A kind of multi-source heterogeneous data fusion platform and fusion method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9946739B2 (en) * 2013-03-15 2018-04-17 Neura Labs Corp. Intelligent internet system with adaptive user interface providing one-step access to knowledge
KR20140145018A (en) * 2013-06-12 2014-12-22 한국전자통신연구원 Knowledge index system and method thereof
US20180137424A1 (en) * 2016-11-17 2018-05-17 General Electric Company Methods and systems for identifying gaps in predictive model ontology

Also Published As

Publication number Publication date
CN109241078A (en) 2019-01-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20190118

Assignee: WUHAN TIMES GEOSMART TECHNOLOGY Co.,Ltd.

Assignor: CHINA University OF GEOSCIENCES (WUHAN CITY)

Contract record no.: X2022420000021

Denomination of invention: An organization and query method of knowledge map based on hybrid database

Granted publication date: 20210720

License type: Common License

Record date: 20220302