CN111241293A - Knowledge graph algorithm constructed based on academic literature - Google Patents
Knowledge graph algorithm constructed based on academic literature
- Publication number
- CN111241293A (application CN201911383312.4A)
- Authority
- CN
- China
- Prior art keywords
- entity
- data
- knowledge
- knowledge graph
- data information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a knowledge graph algorithm constructed based on academic documents, which comprises the following steps: S1, acquiring original data information; S2, storing and processing the original data information; S3, establishing a primary framework of the knowledge graph according to the processed original data information; S4, updating knowledge through the primary framework of the knowledge graph; S5, repeating S4 multiple times to update the primary framework of the knowledge graph multiple times; S6, completing the establishment of the knowledge graph. The invention efficiently realizes the construction and updating of the knowledge graph, improves its initial accuracy and completeness, and facilitates quick information retrieval from academic documents and academic research work.
Description
Technical Field
The invention relates to the field of knowledge graphs, in particular to a knowledge graph algorithm constructed based on academic documents.
Background
Knowledge graphs provide high-quality structured data and are now widely used in many areas of artificial intelligence, such as automated question answering, search engines, and information extraction. A typical knowledge graph is usually represented as triples (head entity, relation, tail entity); for example, (Yao Ming, nationality, China) reflects the fact that Yao Ming's nationality is China. However, most existing knowledge graphs go without updates for long periods, are updated inefficiently, and are prone to introducing erroneous information during updates, so they suffer from incompleteness, poor extensibility, and an inability to be updated correctly.
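The triple form described above can be illustrated with a minimal Python sketch. The names `Triple`, `graph` and `tails` are illustrative assumptions and do not come from the patent; the only fact encoded is the example given in the text.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    head: str      # head entity
    relation: str  # relation linking the head entity to the tail entity
    tail: str      # tail entity


# The example fact from the text: Yao Ming's nationality is China.
fact = Triple(head="Yao Ming", relation="nationality", tail="China")

# A knowledge graph can then be held as a set of such triples.
graph = {fact}


def tails(graph, head, relation):
    """Return all tail entities recorded for the given (head, relation) pair."""
    return sorted(t.tail for t in graph if t.head == head and t.relation == relation)
```

Under these assumptions, `tails(graph, "Yao Ming", "nationality")` looks up the stored fact by its head entity and relation.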
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a knowledge graph algorithm constructed based on academic documents. The knowledge graph information is complete, the knowledge graph can be updated effectively and efficiently, and the correctness of the data information is ensured, which helps retrieve information from academic documents quickly and facilitates academic research work.
To solve the above technical problems, the invention provides the following technical solution:
A knowledge graph algorithm constructed based on academic documents comprises the following steps:
S1, acquiring original data information;
S2, storing and processing the original data information;
S3, establishing a primary framework of the knowledge graph according to the processed original data information;
S4, updating knowledge through the primary framework of the knowledge graph;
S5, repeating S4 multiple times to update the primary framework of the knowledge graph multiple times;
S6, completing the establishment of the knowledge graph;
wherein, in S4, the knowledge update comprises the following steps:
S101, acquiring the latest data information from a website as a reference entity;
S102, extracting an entity already present in the knowledge graph as an existing entity;
S103, comparing the reference entity with the existing entity;
S104, if the comparison in S103 shows that the result is correct, taking the reference entity as the final standard entity; if the comparison in S103 shows that the two are partially the same, taking the reference entity as the final standard entity; if the comparison in S103 shows that the two are completely different, sending both the reference entity and the existing entity to the server for manual judgment and review, and selecting the final standard entity after the manual review;
S105, updating the data information of the primary framework of the knowledge graph according to the final standard entity selected in S104, thereby completing the knowledge updating process.
As a preferred technical solution of the present invention, in S1, the original data includes: data information obtained by using periodicals, papers, patents, encyclopedias and dictionaries as corpus sources; data information obtained by searching with trending topics on social networking sites and trending search terms on search engines as starting points; information obtained from official government websites, official enterprise websites and the official websites of other legitimate organizations; and authoritative information obtained in each specialty and profession.
As a preferred technical solution of the present invention, in S2, the storing and processing of the original data information involves a data storage module, a model editing module, a concurrency control module, an authority control module, a data verification module, and an automatic construction module;
the data storage module is used for storing structured data, semi-structured data and unstructured data;
the model editing module is used for editing the concepts, entities, attributes, hierarchical relationships and concept-entity relationships of the knowledge model;
the concurrency control module is used for controlling concurrent editing of the data in the database system according to the transaction isolation level;
the authority control module is used for verifying the login information of the user so as to control the permissions of different editing levels.
As a preferred technical solution of the present invention, in S101, entities are extracted as follows:
S201, applying named entity recognition to the captured title and extracting the named entities;
S202, obtaining from the title, by using a word segmentation technique, a list of candidate entity words that have not yet been recognized;
S203, performing part-of-speech tagging on the candidate entity words, screening out candidate words that carry no practical meaning, then verifying on an encyclopedia website whether the remaining candidate words are entity words, and taking the entity words together with the extracted named entities as reference entities.
As a preferred technical solution of the present invention, S103 involves a data verification processing module; the data verification processing module is used for verifying the integrity and consistency of entities, backing up and exporting data, and performing entity recognition and entity disambiguation.
As a preferred technical solution of the present invention, in S2, the data information is stored using a graph database.
Compared with the prior art, the invention has the following beneficial effects:
The invention efficiently realizes the construction and updating of the knowledge graph: data is first acquired, stored and processed, and a primary framework of the knowledge graph is established from the existing data; knowledge in the primary framework is then updated, which improves the initial accuracy and completeness of the knowledge graph.
The invention promptly updates the latest information and the entities that change on each website, so that the data in the knowledge graph is updated efficiently and in real time and the lag of the knowledge graph data is reduced. Meanwhile, the data updating process compares data intelligently and combines intelligent review with manual review, which improves both the accuracy of the data and the efficiency of data updating.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a schematic diagram of the construction process of the knowledge graph construction algorithm provided by the invention.
FIG. 2 is a schematic diagram of the knowledge graph update process in the knowledge graph construction algorithm provided by the invention.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
Example 1
As shown in FIGS. 1-2, the invention provides a knowledge graph algorithm constructed based on academic documents, which comprises the following steps:
S1, acquiring original data information;
S2, storing and processing the original data information;
S3, establishing a primary framework of the knowledge graph according to the processed original data information;
S4, updating knowledge through the primary framework of the knowledge graph;
S5, repeating S4 multiple times to update the primary framework of the knowledge graph multiple times;
S6, completing the establishment of the knowledge graph;
wherein, in S4, the knowledge update comprises the following steps:
S101, acquiring the latest data information from a website as a reference entity;
S102, extracting an entity already present in the knowledge graph as an existing entity;
S103, comparing the reference entity with the existing entity;
S104, if the comparison in S103 shows that the result is correct, taking the reference entity as the final standard entity; if the comparison in S103 shows that the two are partially the same, taking the reference entity as the final standard entity; if the comparison in S103 shows that the two are completely different, sending both the reference entity and the existing entity to the server for manual judgment and review, and selecting the final standard entity after the manual review;
S105, updating the data information of the primary framework of the knowledge graph according to the final standard entity selected in S104, thereby completing the knowledge updating process.
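The update steps S102 to S105 can be sketched in Python as follows. This is only an illustration under stated assumptions: the patent does not specify how entities are represented or compared, so here an entity is assumed to be a dictionary of attributes, the comparison rule (counting attributes on which both entities agree) is hypothetical, and reading the "correct" outcome as a full match is likewise an assumption. The names `Match`, `compare`, `update_entity` and `review_queue` are not from the source.

```python
from enum import Enum


class Match(Enum):
    SAME = "same"            # assumed reading of "the result is correct"
    PARTIAL = "partial"      # the two entities are partially the same
    DIFFERENT = "different"  # the two entities are completely different


def compare(reference: dict, existing: dict) -> Match:
    """S103: compare two entities attribute by attribute (an assumed rule)."""
    keys = set(reference) | set(existing)
    agree = sum(
        1 for k in keys
        if k in reference and k in existing and reference[k] == existing[k]
    )
    if agree == len(keys):
        return Match.SAME
    if agree > 0:
        return Match.PARTIAL
    return Match.DIFFERENT


def update_entity(graph: dict, name: str, reference: dict, review_queue: list) -> Match:
    """Run S102 to S105 for one entity; `graph` maps entity name to attributes."""
    existing = graph.get(name, {})           # S102: existing entity in the graph
    result = compare(reference, existing)    # S103: comparison
    if result in (Match.SAME, Match.PARTIAL):
        # S104 and S105: the reference entity becomes the final standard entity.
        graph[name] = dict(reference)
    else:
        # S104: completely different, so both versions go to manual review.
        # Under this assumed rule, an entity missing from the graph also lands here.
        review_queue.append((name, reference, existing))
    return result
```

In this sketch the graph is only written when the automatic comparison accepts the reference entity; everything else is deferred to the review queue, which stands in for the server-side manual review described in S104.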
The invention efficiently realizes the construction and updating of the knowledge graph: data is first acquired, stored and processed, and a primary framework of the knowledge graph is established from the existing data; knowledge in the primary framework is then updated, which improves the initial accuracy and completeness of the knowledge graph.
The invention promptly updates the latest information and the entities that change on each website, so that the data in the knowledge graph is updated efficiently and in real time and the lag of the knowledge graph data is reduced. Meanwhile, the data updating process compares data intelligently and combines intelligent review with manual review, which improves both the accuracy of the data and the efficiency of data updating.
In an alternative embodiment, in S1, the original data includes: data information obtained by using periodicals, papers, patents, encyclopedias and dictionaries as corpus sources; data information obtained by searching with trending topics on social networking sites and trending search terms on search engines as starting points; information obtained from official government websites, official enterprise websites and the official websites of other legitimate organizations; and authoritative information obtained in each specialty and profession.
It should be noted that the sources of the acquired data information are reliable, which ensures the accuracy of the data; the channels for acquiring data information are broad, which avoids omitting information, so that the accuracy of the knowledge graph is improved and its coverage is wider.
In an alternative embodiment, in S2, the storing and processing of the original data information involves a data storage module, a model editing module, a concurrency control module, an authority control module, a data verification module, and an automatic construction module;
the data storage module is used for storing structured data, semi-structured data and unstructured data;
the model editing module is used for editing the concepts, entities, attributes, hierarchical relationships and concept-entity relationships of the knowledge model;
the concurrency control module is used for controlling concurrent editing of the data in the database system according to the transaction isolation level;
the authority control module is used for verifying the login information of the user so as to control the permissions of different editing levels.
It should be noted that this realizes the storage and processing of data, facilitates subsequent processing of the data information, and improves the construction efficiency of the knowledge graph.
In an alternative embodiment, in S101, entities are extracted as follows:
S201, applying named entity recognition to the captured title and extracting the named entities;
S202, obtaining from the title, by using a word segmentation technique, a list of candidate entity words that have not yet been recognized;
S203, performing part-of-speech tagging on the candidate entity words, screening out candidate words that carry no practical meaning, then verifying on an encyclopedia website whether the remaining candidate words are entity words, and taking the entity words together with the extracted named entities as reference entities.
It should be noted that, when entities are extracted, the entities most likely to need updating are mined through feature selection, so that the knowledge graph is updated efficiently, unnecessary updates during the updating process are reduced, the waste of network bandwidth in existing methods is avoided, and the time lag of the data in the knowledge graph is greatly reduced.
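The extraction steps S201 to S203 can be sketched in Python as follows. The patent names no specific recognizer, segmenter, tagger or encyclopedia, so every component here is a hypothetical stand-in: a small gazetteer replaces the named entity recognition model, whitespace splitting replaces the word segmentation technique, a tiny lookup table replaces the part-of-speech tagger, and a fixed set replaces the encyclopedia website.

```python
# Hypothetical stand-ins; a real system would use an NER model, a word
# segmenter, a part-of-speech tagger and an encyclopedia lookup instead.
KNOWN_NAMED_ENTITIES = {"Yao Ming"}
POS_LEXICON = {"the": "DET", "of": "ADP", "in": "ADP", "and": "CONJ", "is": "AUX"}
FUNCTION_TAGS = {"DET", "ADP", "CONJ", "AUX"}   # tags treated as carrying no practical meaning
ENCYCLOPEDIA_ENTRIES = {"basketball", "China"}


def extract_named_entities(title: str) -> list[str]:
    """S201: named entity recognition, approximated by gazetteer matching."""
    return [name for name in sorted(KNOWN_NAMED_ENTITIES) if name in title]


def candidate_words(title: str, recognized: list[str]) -> list[str]:
    """S202: segment the part of the title not covered by recognized entities."""
    remainder = title
    for name in recognized:
        remainder = remainder.replace(name, " ")
    return remainder.split()


def reference_entities(title: str) -> list[str]:
    """S201 to S203: return the named entities plus the verified entity words."""
    recognized = extract_named_entities(title)
    candidates = candidate_words(title, recognized)
    # S203: tag each candidate and drop function words (unknown words default to NOUN).
    meaningful = [
        w for w in candidates
        if POS_LEXICON.get(w.lower(), "NOUN") not in FUNCTION_TAGS
    ]
    # S203: keep only candidates confirmed by the encyclopedia stand-in.
    verified = [w for w in meaningful if w in ENCYCLOPEDIA_ENTRIES]
    return recognized + verified
```

For the title "Yao Ming and the history of basketball in China", this sketch keeps the recognized name and the two words confirmed by the encyclopedia stand-in, and drops "history" because it is not confirmed there.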
In an alternative embodiment, S103 involves a data verification processing module; the data verification processing module is used for verifying the integrity and consistency of entities, backing up and exporting data, and performing entity recognition and entity disambiguation, which helps obtain an accurate comparison result between the reference entity and the existing entity and improves the updating efficiency.
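What an integrity and consistency check might look like can be sketched in Python. The patent does not define either check, so both rules below are assumptions: integrity is taken to mean that required attributes are present and non-empty, and consistency is taken to mean that a single-valued relation never carries two different tail entities for the same head entity. The function names and the sample relations are hypothetical.

```python
def check_integrity(entity: dict, required: set) -> list:
    """Return the required attributes that are missing or empty (assumed rule)."""
    return sorted(k for k in required if not entity.get(k))


def check_consistency(triples, single_valued_relations) -> list:
    """Return (head, relation) pairs with conflicting tails (assumed rule)."""
    seen = {}
    for head, relation, tail in triples:
        if relation in single_valued_relations:
            seen.setdefault((head, relation), set()).add(tail)
    return sorted(key for key, tails in seen.items() if len(tails) > 1)
```

An entity that passes both checks could then be handed to the comparison in S103; one that fails could be flagged before any comparison is attempted.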
In an alternative embodiment, in S2, the data information is stored using a graph database, which is significantly more efficient for association queries than conventional relational data storage; for 2-degree and 3-degree association queries, queries based on the knowledge graph are said to be thousands or even millions of times more efficient.
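A 2-degree or 3-degree association query can be illustrated with a small in-memory Python sketch. It is not a graph database and makes no claim about the speed-up quoted above; it only shows that, once triples are indexed as adjacency lists, finding associated entities is a breadth-first traversal rather than a chain of table joins. The function names and the sample triples are hypothetical.

```python
from collections import defaultdict, deque


def build_adjacency(triples):
    """Index (head, relation, tail) triples as an undirected adjacency map."""
    adj = defaultdict(set)
    for head, _relation, tail in triples:
        adj[head].add(tail)
        adj[tail].add(head)  # associations are treated as undirected here
    return adj


def within_degrees(adj, start, max_degree):
    """Breadth-first search: map each entity within max_degree hops to its degree."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_degree:
            continue  # do not expand beyond the requested degree
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    del dist[start]
    return dist


triples = [
    ("paper A", "cites", "paper B"),
    ("paper B", "written_by", "author X"),
    ("author X", "affiliated_with", "institute Y"),
]
adj = build_adjacency(triples)
```

With these sample triples, `within_degrees(adj, "paper A", 2)` reaches "paper B" at degree 1 and "author X" at degree 2, and raising the limit to 3 also reaches "institute Y".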
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (6)
1. A knowledge graph algorithm constructed based on academic documents, characterized by comprising the following steps:
S1, acquiring original data information;
S2, storing and processing the original data information;
S3, establishing a primary framework of the knowledge graph according to the processed original data information;
S4, updating knowledge through the primary framework of the knowledge graph;
S5, repeating S4 multiple times to update the primary framework of the knowledge graph multiple times;
S6, completing the establishment of the knowledge graph;
wherein, in S4, the knowledge update comprises the following steps:
S101, acquiring the latest data information from a website as a reference entity;
S102, extracting an entity already present in the knowledge graph as an existing entity;
S103, comparing the reference entity with the existing entity;
S104, if the comparison in S103 shows that the result is correct, taking the reference entity as the final standard entity; if the comparison in S103 shows that the two are partially the same, taking the reference entity as the final standard entity; if the comparison in S103 shows that the two are completely different, sending both the reference entity and the existing entity to the server for manual judgment and review, and selecting the final standard entity after the manual review;
S105, updating the data information of the primary framework of the knowledge graph according to the final standard entity selected in S104, thereby completing the knowledge updating process.
2. The knowledge graph algorithm according to claim 1, wherein in S1, the original data includes: data information obtained by using periodicals, papers, patents, encyclopedias and dictionaries as corpus sources; data information obtained by searching with trending topics on social networking sites and trending search terms on search engines as starting points; information obtained from official government websites, official enterprise websites and the official websites of other legitimate organizations; and authoritative information obtained in each specialty and profession.
3. The knowledge graph algorithm according to claim 1, wherein in S2, the storing and processing of the original data information involves a data storage module, a model editing module, a concurrency control module, an authority control module, a data verification module and an automatic construction module;
the data storage module is used for storing structured data, semi-structured data and unstructured data;
the model editing module is used for editing the concepts, entities, attributes, hierarchical relationships and concept-entity relationships of the knowledge model;
the concurrency control module is used for controlling concurrent editing of the data in the database system according to the transaction isolation level;
the authority control module is used for verifying the login information of the user so as to control the permissions of different editing levels.
4. The knowledge graph algorithm according to claim 1, wherein in S101, entities are extracted as follows:
S201, applying named entity recognition to the captured title and extracting the named entities;
S202, obtaining from the title, by using a word segmentation technique, a list of candidate entity words that have not yet been recognized;
S203, performing part-of-speech tagging on the candidate entity words, screening out candidate words that carry no practical meaning, then verifying on an encyclopedia website whether the remaining candidate words are entity words, and taking the entity words together with the extracted named entities as reference entities.
5. The knowledge graph algorithm according to claim 1, wherein S103 involves a data verification processing module; the data verification processing module is used for verifying the integrity and consistency of entities, backing up and exporting data, and performing entity recognition and entity disambiguation.
6. The knowledge graph algorithm according to claim 1, wherein in S2, the data information is stored using a graph database.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911383312.4A CN111241293A (en) | 2019-12-28 | 2019-12-28 | Knowledge graph algorithm constructed based on academic literature |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911383312.4A CN111241293A (en) | 2019-12-28 | 2019-12-28 | Knowledge graph algorithm constructed based on academic literature |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111241293A (en) | 2020-06-05 |
Family
ID=70871721
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911383312.4A Withdrawn CN111241293A (en) | 2019-12-28 | 2019-12-28 | Knowledge graph algorithm constructed based on academic literature |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111241293A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112487789A (en) * | 2020-11-27 | 2021-03-12 | 贵州电网有限责任公司 | Operation order scheduling logic validity verification method based on knowledge graph |
CN112487789B (en) * | 2020-11-27 | 2023-12-01 | 贵州电网有限责任公司 | Operation ticket scheduling logic validity verification method based on knowledge graph |
CN117808085A (en) * | 2024-02-29 | 2024-04-02 | 南京师范大学 | Automatic discipline knowledge framework construction method, device, equipment and storage medium |
CN117808085B (en) * | 2024-02-29 | 2024-05-07 | 南京师范大学 | Automatic discipline knowledge framework construction method, device, equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021103492A1 (en) | Risk prediction method and system for business operations | |
CN109977110A (en) | Data cleaning method, device and equipment | |
CN109657074B (en) | News knowledge graph construction method based on address tree | |
US8140495B2 (en) | Asynchronous database index maintenance | |
US8924365B2 (en) | System and method for range search over distributive storage systems | |
US7324998B2 (en) | Document search methods and systems | |
US20140222793A1 (en) | System and Method for Automatically Importing, Refreshing, Maintaining, and Merging Contact Sets | |
CN111597347B (en) | Knowledge embedding defect report reconstruction method and device | |
CN110008353A (en) | A kind of construction method of dynamic knowledge map | |
CN104252507B (en) | A kind of business data matching process and device | |
CN113254630B (en) | Domain knowledge map recommendation method for global comprehensive observation results | |
US20210334292A1 (en) | System and method for reconciliation of data in multiple systems using permutation matching | |
CN111241293A (en) | Knowledge graph algorithm constructed based on academic literature | |
CN107169003B (en) | Data association method and device | |
CN113722600A (en) | Data query method, device, equipment and product applied to big data | |
CN110502529B (en) | Data processing method, device, server and storage medium | |
CN109460467B (en) | Method for constructing network information classification system | |
CN113742498B (en) | Knowledge graph construction and updating method | |
CN115544181A (en) | Ontology-based automatic data loading method for power grid data mart | |
CN111061853B (en) | Method for rapidly acquiring FAQ model training corpus | |
US11775757B2 (en) | Automated machine-learning dataset preparation | |
CN114218277A (en) | Efficient query method and device for relational database | |
CN113254725A (en) | Data management and retrieval enhancement method for graph database | |
US8706769B1 (en) | Processing insert with normalize statements | |
KR101083425B1 (en) | Database detecting system and detecting method using the same |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | Application publication date: 20200605 |