CN111241293A

CN111241293A - Knowledge graph algorithm constructed based on academic literature

Info

Publication number: CN111241293A
Application number: CN201911383312.4A
Authority: CN
Inventors: 贾新志
Original assignee: Shanghai Jihao Network Co Ltd
Current assignee: Shanghai Jihao Network Co Ltd
Priority date: 2019-12-28
Filing date: 2019-12-28
Publication date: 2020-06-05

Abstract

The invention discloses a knowledge graph algorithm constructed based on academic documents, comprising the following steps: S1, acquiring original data information; S2, storing and processing the original data information; S3, establishing a primary framework of the knowledge graph according to the processed original data information; S4, updating knowledge through the primary framework of the knowledge graph; S5, repeating S4 multiple times so that the primary framework of the knowledge graph is updated multiple times; S6, completing the establishment of the knowledge graph. The invention efficiently realizes the construction and updating of the knowledge graph, improves its initial accuracy and completeness, and facilitates quick information retrieval from academic documents, thereby supporting academic research.

Description

Knowledge graph algorithm constructed based on academic literature

Technical Field

The invention relates to the field of knowledge graphs, in particular to a knowledge graph algorithm constructed based on academic documents.

Background

Knowledge graphs provide high-quality structured data and are now widely used in many areas of artificial intelligence, such as automated question answering, search engines, and information extraction. A typical knowledge graph is represented as triples (head entity, relation, tail entity); for example, (Yao Ming, nationality, China) expresses the fact that Yao Ming is a national of China. However, most existing knowledge graphs go long periods without being updated, their updating efficiency is low, and errors are easily introduced during updating. As a result, the graphs are incomplete, extend poorly, and cannot be updated correctly.
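As a minimal illustration of the triple form described above, the following Python sketch stores facts as (head, relation, tail) records. The `Triple` type and the `tails_of` helper are hypothetical names chosen for this example; they are not part of the invention.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One fact in the graph: (head entity, relation, tail entity)."""
    head: str
    relation: str
    tail: str


def tails_of(triples, head, relation):
    """Return every tail entity linked to `head` by `relation`."""
    return [t.tail for t in triples if t.head == head and t.relation == relation]


# The example fact from the text above.
facts = [Triple("Yao Ming", "nationality", "China")]

print(tails_of(facts, "Yao Ming", "nationality"))
```

Keeping each fact as an immutable record makes it straightforward to de-duplicate facts or to index them by head entity later.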

Disclosure of Invention

The invention aims to overcome the defects of the prior art by providing a knowledge graph algorithm constructed based on academic documents. The resulting knowledge graph is complete, can be updated effectively and efficiently, and keeps its data correct, which helps users retrieve information from academic documents quickly and facilitates academic research.

In order to solve the technical problems, the invention provides the following technical scheme:

A knowledge graph algorithm constructed based on academic documents comprises the following steps:

S1, acquiring original data information;

S2, storing and processing the original data information;

S3, establishing a primary framework of the knowledge graph according to the processed original data information;

S4, updating knowledge through the primary framework of the knowledge graph;

S5, repeating S4 multiple times so that the primary framework of the knowledge graph is updated multiple times;

S6, completing the establishment of the knowledge graph;

wherein, in S4, the knowledge update comprises the steps of:

S101, acquiring the latest data information from a website as a reference entity;

S102, extracting an entity already in the graph as the existing entity;

S103, comparing the reference entity with the existing entity;

S104, if the comparison in S103 shows that the two entities agree, taking the reference entity as the final standard entity; if the comparison shows that they are partly the same, likewise taking the reference entity as the final standard entity; if the comparison shows that they are completely different, sending both the reference entity and the existing entity to the server for manual review, after which the final standard entity is selected;

S105, updating the data information of the primary framework of the knowledge graph according to the final standard entity selected in S104, thereby completing the knowledge update.
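The update procedure of S101 to S105 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: entities are modelled as attribute dictionaries, the comparison criterion in `compare` (equal dictionaries, or at least one shared attribute value) is an assumption because the text does not define it, and `manual_review` is a hypothetical callback standing in for the reviewer on the server.

```python
from enum import Enum


class Match(Enum):
    SAME = "same"
    PARTIAL = "partial"
    DIFFERENT = "different"


def compare(reference: dict, existing: dict) -> Match:
    """S103: compare two entities attribute by attribute (assumed criterion)."""
    if reference == existing:
        return Match.SAME
    shared = [k for k in reference if k in existing and reference[k] == existing[k]]
    return Match.PARTIAL if shared else Match.DIFFERENT


def select_final(reference: dict, existing: dict, manual_review) -> dict:
    """S104: choose the final standard entity."""
    if compare(reference, existing) in (Match.SAME, Match.PARTIAL):
        return reference
    # Completely different: both entities go to a human reviewer.
    return manual_review(reference, existing)


def update_graph(graph: dict, name: str, reference: dict, manual_review) -> dict:
    """S102 and S105: read the existing entity, then write back the final one."""
    # Assumption: an entity missing from the graph counts as completely different.
    existing = graph.get(name, {})
    graph[name] = select_final(reference, existing, manual_review)
    return graph


graph = {"Paper A": {"title": "Paper A", "citations": 10}}
keep_reference = lambda ref, old: ref  # stand-in for the manual reviewer
update_graph(graph, "Paper A", {"title": "Paper A", "citations": 12}, keep_reference)
print(graph["Paper A"]["citations"])
```

In the usage above the two entities share the same title, so the comparison is a partial match and the reference entity replaces the existing one without manual review.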

As a preferred embodiment of the present invention, in S1, the original data includes: data obtained using periodicals, papers, patents, encyclopedias and dictionaries as corpus sources; data obtained by searching with trending topics on social networking sites and trending search terms on search engines as starting points; information obtained from official government websites, official enterprise websites and the official websites of other legitimate organizations; and authoritative information obtained from each specialty and industry.

As a preferred technical solution of the present invention, in S2, the original data information is stored and processed by a data storage module, a model editing module, a concurrency control module, an authority control module, a data verification module, and an automatic construction module;

the data storage module stores structured data, semi-structured data and unstructured data;

the model editing module edits the concepts, entities, attributes, hierarchical relationships and concept-entity relationships of the knowledge model;

the concurrency control module governs concurrent editing of the data in the database system according to the transaction isolation level;

and the authority control module verifies the login information of the user so as to control permissions at the different editing layers.
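The behaviour of the authority control module can be illustrated with the short Python sketch below. The role names, the editing layers, the user table and the `can_edit` function are all hypothetical, and the unsalted SHA-256 digest is used only to keep the example short; a deployed system would use a dedicated password-hashing scheme.

```python
import hashlib
import hmac

# Hypothetical roles and the editing layers each role may modify.
PERMISSIONS = {
    "viewer": set(),
    "editor": {"entity", "attribute"},
    "admin": {"concept", "entity", "attribute", "relationship"},
}


def _digest(password: str) -> str:
    """Illustrative only: hash a password with unsalted SHA-256."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Hypothetical login table: user name -> (password digest, role).
USERS = {"alice": (_digest("s3cret"), "editor")}


def can_edit(user: str, password: str, layer: str) -> bool:
    """Verify the login first, then check the role against the editing layer."""
    record = USERS.get(user)
    if record is None or not hmac.compare_digest(record[0], _digest(password)):
        return False
    return layer in PERMISSIONS[record[1]]


print(can_edit("alice", "s3cret", "entity"))
print(can_edit("alice", "s3cret", "concept"))
```

Separating login verification from the per-layer permission table means new editing layers can be added without touching the login check.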

As a preferred embodiment of the present invention, in S101, entities are extracted as follows:

S201, applying named entity recognition to the captured title and extracting the named entities;

S202, using word segmentation to obtain from the title a list of candidate entity words that were not recognized as named entities;

S203, performing part-of-speech tagging on the candidate entity words, screening out candidates that carry no substantive meaning, then verifying on an encyclopedia website whether the remaining candidates are entity words, and taking the verified entity words together with the extracted named entities as the reference entities.
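Steps S201 to S203 can be sketched in Python as below. The sketch uses toy lookup tables in place of real components: `NAMED_ENTITIES` stands in for a named entity recognizer, whitespace splitting stands in for word segmentation, `POS_LEXICON` stands in for a part-of-speech tagger, and `ENCYCLOPEDIA` stands in for a lookup on an encyclopedia website. All of these names, and the sample title, are assumptions made for illustration.

```python
# Toy resources standing in for real NLP components (all hypothetical).
NAMED_ENTITIES = {"Yao Ming"}
POS_LEXICON = {"the": "DET", "of": "ADP", "wins": "VERB",
               "championship": "NOUN", "basketball": "NOUN"}
ENCYCLOPEDIA = {"basketball", "championship"}
CONTENT_TAGS = {"NOUN", "PROPN"}


def recognize_named_entities(title: str) -> list:
    """S201: return the named entities found in the title."""
    return [e for e in sorted(NAMED_ENTITIES) if e in title]


def segment_candidates(title: str, named: list) -> list:
    """S202: segment the remaining text into candidate entity words."""
    for entity in named:
        title = title.replace(entity, " ")
    return title.lower().split()


def filter_and_verify(candidates: list) -> list:
    """S203: keep content words, then verify them against the encyclopedia."""
    content = [w for w in candidates if POS_LEXICON.get(w) in CONTENT_TAGS]
    return [w for w in content if w in ENCYCLOPEDIA]


def extract_reference_entities(title: str) -> list:
    """Run S201 to S203 and merge named entities with verified entity words."""
    named = recognize_named_entities(title)
    return named + filter_and_verify(segment_candidates(title, named))


print(extract_reference_entities("Yao Ming wins the basketball championship"))
```

For the sample title, the verb and the article are screened out by their part-of-speech tags, and the two remaining nouns pass the encyclopedia check, so the reference entities are the named entity plus those two words.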

As a preferred technical solution of the present invention, in S103, a data verification processing module is included; and the data verification processing module is used for verifying the integrity and consistency of the entity, backing up and exporting data, and realizing entity identification and entity disambiguation.
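The integrity and consistency checks performed by the data verification processing module might look like the following Python sketch. The required-attribute schema and both function names are assumptions made for illustration; the text does not specify what the module checks in detail, and entity disambiguation is not shown.

```python
# Assumed schema: attributes every entity must carry (not from the source).
REQUIRED_ATTRIBUTES = {"name", "type"}


def check_integrity(entity: dict) -> list:
    """Return the required attributes that the entity is missing."""
    return sorted(REQUIRED_ATTRIBUTES - set(entity))


def check_consistency(entities: dict, triples: list) -> list:
    """Return the triples whose head or tail is not a known entity."""
    return [t for t in triples if t[0] not in entities or t[2] not in entities]


entities = {
    "Paper A": {"name": "Paper A", "type": "paper"},
    "Author B": {"name": "Author B"},
}
triples = [
    ("Paper A", "written_by", "Author B"),
    ("Paper A", "cites", "Paper C"),
]

print(check_integrity(entities["Author B"]))
print(check_consistency(entities, triples))
```

Running such checks before the comparison in S103 means that an incomplete entity or a dangling relation is reported rather than silently compared.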

As a preferable embodiment of the present invention, in S2, the data information is stored using a graph database.

Compared with the prior art, the invention has the following beneficial effects:

The invention efficiently realizes the construction and updating of the knowledge graph: data are first acquired, stored and processed, and a primary framework of the knowledge graph is established from the existing data; the primary framework is then updated with new knowledge, which improves the initial accuracy and completeness of the knowledge graph.
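The overall flow of S1 to S6 can be sketched in Python as follows. This is a simplified sketch under stated assumptions: raw records are already (head, relation, tail) tuples, the graph is a nested dictionary, and the update in `update_once` simply overwrites values instead of performing the comparison and review of S103 and S104. All function names are hypothetical.

```python
def acquire(sources):
    """S1: collect raw records from every source."""
    return [record for source in sources for record in source]


def store_and_process(records):
    """S2: strip whitespace and drop duplicate records."""
    seen, cleaned = set(), []
    for head, relation, tail in records:
        triple = (head.strip(), relation.strip(), tail.strip())
        if triple not in seen:
            seen.add(triple)
            cleaned.append(triple)
    return cleaned


def build_framework(triples):
    """S3: build the primary framework as {head: {relation: tail}}."""
    graph = {}
    for head, relation, tail in triples:
        graph.setdefault(head, {})[relation] = tail
    return graph


def update_once(graph, latest):
    """S4: one knowledge update (simplified to overwriting values)."""
    for head, relation, tail in latest:
        graph.setdefault(head, {})[relation] = tail
    return graph


def build_knowledge_graph(sources, update_batches):
    graph = build_framework(store_and_process(acquire(sources)))
    for batch in update_batches:  # S5: repeat S4 for each batch
        update_once(graph, batch)
    return graph                  # S6: the completed knowledge graph


sources = [[("Paper A", "year", "2018 ")], [("Paper A", "venue", "Journal X")]]
updates = [[("Paper A", "citations", "10")], [("Paper A", "citations", "12")]]
result = build_knowledge_graph(sources, updates)
print(result)
```

Each update batch is applied to the same framework in turn, so the value from the last batch is the one that remains in the finished graph.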

According to the invention, the latest information and the entities that change on each website are picked up promptly, so that the data in the knowledge graph are updated efficiently and in real time and their lag is reduced. Meanwhile, during updating, the data are compared automatically, and automated review is combined with manual review, which improves both the accuracy of the data and the efficiency of updating.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:

FIG. 1 is a schematic diagram of the construction process of the knowledge graph construction algorithm provided by the invention.

FIG. 2 is a schematic diagram of the graph update process in the knowledge graph construction algorithm provided by the invention.

Detailed Description

The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.

Example 1

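As shown in FIGS. 1-2, the invention provides a knowledge graph algorithm constructed based on academic documents, which comprises the following steps:

S1, acquiring original data information;

S2, storing and processing the original data information;

S3, establishing a primary framework of the knowledge graph according to the processed original data information;

S4, updating knowledge through the primary framework of the knowledge graph;

S5, repeating S4 multiple times so that the primary framework of the knowledge graph is updated multiple times;

S6, completing the establishment of the knowledge graph;

wherein, in S4, the knowledge update comprises the steps of:

S101, acquiring the latest data information from a website as a reference entity;

S102, extracting an entity already in the graph as the existing entity;

S103, comparing the reference entity with the existing entity;

S104, if the comparison in S103 shows that the two entities agree, taking the reference entity as the final standard entity; if the comparison shows that they are partly the same, likewise taking the reference entity as the final standard entity; if the comparison shows that they are completely different, sending both the reference entity and the existing entity to the server for manual review, after which the final standard entity is selected;

S105, updating the data information of the primary framework of the knowledge graph according to the final standard entity selected in S104, thereby completing the knowledge update.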

In an alternative embodiment, in S1, the original data includes: data obtained using periodicals, papers, patents, encyclopedias and dictionaries as corpus sources; data obtained by searching with trending topics on social networking sites and trending search terms on search engines as starting points; information obtained from official government websites, official enterprise websites and the official websites of other legitimate organizations; and authoritative information obtained from each specialty and industry.

It should be noted that the sources of the acquired data are reliable, which ensures the accuracy of the data, and the acquisition channels are broad, which avoids omissions; together these improve the accuracy of the knowledge graph and widen its coverage.

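In an alternative embodiment, in S2, the original data information is stored and processed by a data storage module, a model editing module, a concurrency control module, an authority control module, a data verification module, and an automatic construction module;

the data storage module stores structured data, semi-structured data and unstructured data;

the model editing module edits the concepts, entities, attributes, hierarchical relationships and concept-entity relationships of the knowledge model;

the concurrency control module governs concurrent editing of the data in the database system according to the transaction isolation level;

and the authority control module verifies the login information of the user so as to control permissions at the different editing layers.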

It should be noted that storing and processing the data in this way facilitates subsequent processing of the data information and improves the efficiency of constructing the knowledge graph.

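In an alternative embodiment, in S101, entities are extracted as follows:

S201, applying named entity recognition to the captured title and extracting the named entities;

S202, using word segmentation to obtain from the title a list of candidate entity words that were not recognized as named entities;

S203, performing part-of-speech tagging on the candidate entity words, screening out candidates that carry no substantive meaning, then verifying on an encyclopedia website whether the remaining candidates are entity words, and taking the verified entity words together with the extracted named entities as the reference entities.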

It should be noted that, during entity extraction, feature selection is used to mine the entities most likely to have been updated, so that the knowledge graph is updated efficiently, unnecessary updates are reduced, the network bandwidth wasted by existing methods is avoided, and the time lag of the data in the knowledge graph is greatly reduced.

In an optional embodiment, in S103, a data verification processing module is included; the data verification processing module is used for verifying the integrity and consistency of the entity, backing up and exporting data, and realizing entity identification and entity disambiguation, which helps to obtain an accurate comparison result between the reference entity and the existing entity and improves the updating efficiency.

In an alternative embodiment, in S2, the data information is stored in a graph database, which is significantly more efficient for association queries than conventional relational storage; for 2-degree and 3-degree association queries, the query efficiency based on the knowledge graph can be thousands or even millions of times higher.
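To make the notion of a 2-degree or 3-degree association query concrete, the Python sketch below performs a bounded breadth-first traversal over an adjacency map. The in-memory dictionary merely stands in for a graph database, and the `within_degrees` function and the sample entities are hypothetical. In a graph store such a query follows stored adjacency directly, whereas a relational store typically needs one additional self-join per degree.

```python
from collections import deque


def within_degrees(adjacency: dict, start: str, max_degree: int) -> dict:
    """Return {entity: degree} for entities reachable within max_degree hops."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_degree:
            continue  # do not expand beyond the requested degree
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    del distance[start]
    return distance


adjacency = {
    "Paper A": ["Author B"],
    "Author B": ["Paper C"],
    "Paper C": ["Author D"],
    "Author D": ["Paper E"],
}

print(within_degrees(adjacency, "Paper A", 2))
print(within_degrees(adjacency, "Paper A", 3))
```

The traversal visits each reachable entity once and stops expanding at the requested degree, so its cost depends on the size of the neighbourhood rather than on the size of the whole graph.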

Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A knowledge graph algorithm constructed based on academic documents is characterized by comprising the following steps:

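S1, acquiring original data information;

S2, storing and processing the original data information;

S3, establishing a primary framework of the knowledge graph according to the processed original data information;

S4, updating knowledge through the primary framework of the knowledge graph;

S5, repeating S4 multiple times so that the primary framework of the knowledge graph is updated multiple times;

S6, completing the establishment of the knowledge graph;

wherein, in S4, the knowledge update comprises the steps of:

S101, acquiring the latest data information from a website as a reference entity;

S102, extracting an entity already in the graph as the existing entity;

S103, comparing the reference entity with the existing entity;

S104, if the comparison in S103 shows that the two entities agree, taking the reference entity as the final standard entity; if the comparison shows that they are partly the same, likewise taking the reference entity as the final standard entity; if the comparison shows that they are completely different, sending both the reference entity and the existing entity to the server for manual review, after which the final standard entity is selected;

S105, updating the data information of the primary framework of the knowledge graph according to the final standard entity selected in S104, thereby completing the knowledge update.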

2. The knowledge-graph constructing algorithm according to claim 1, wherein in S1, the original data includes: data obtained using periodicals, papers, patents, encyclopedias and dictionaries as corpus sources; data obtained by searching with trending topics on social networking sites and trending search terms on search engines as starting points; information obtained from official government websites, official enterprise websites and the official websites of other legitimate organizations; and authoritative information obtained from each specialty and industry.

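3. The knowledge-graph constructing algorithm according to claim 1, wherein in S2, the original data information is stored and processed by a data storage module, a model editing module, a concurrency control module, an authority control module, a data verification module and an automatic construction module;

the data storage module stores structured data, semi-structured data and unstructured data;

the model editing module edits the concepts, entities, attributes, hierarchical relationships and concept-entity relationships of the knowledge model;

the concurrency control module governs concurrent editing of the data in the database system according to the transaction isolation level;

and the authority control module verifies the login information of the user so as to control permissions at the different editing layers.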

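4. The knowledge-graph constructing algorithm according to claim 1, wherein in S101, the entities are extracted as follows:

S201, applying named entity recognition to the captured title and extracting the named entities;

S202, using word segmentation to obtain from the title a list of candidate entity words that were not recognized as named entities;

S203, performing part-of-speech tagging on the candidate entity words, screening out candidates that carry no substantive meaning, then verifying on an encyclopedia website whether the remaining candidates are entity words, and taking the verified entity words together with the extracted named entities as the reference entities.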

5. The knowledge-graph constructing algorithm according to claim 1, wherein in S103, a data verification processing module is included; and the data verification processing module is used for verifying the integrity and consistency of the entity, backing up and exporting data, and realizing entity identification and entity disambiguation.

6. The knowledge-graph constructing algorithm according to claim 1, wherein in S2, the data information is stored using a graph database.