CN111241293A - Knowledge graph algorithm constructed based on academic literature - Google Patents

Knowledge graph algorithm constructed based on academic literature

Info

Publication number
CN111241293A
Authority
CN
China
Prior art keywords
entity
data
knowledge
knowledge graph
data information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201911383312.4A
Other languages
Chinese (zh)
Inventor
贾新志
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jihao Network Co Ltd
Original Assignee
Shanghai Jihao Network Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-12-28
Filing date
2019-12-28
Publication date
2020-06-05
Application filed by Shanghai Jihao Network Co Ltd filed Critical Shanghai Jihao Network Co Ltd
Priority to CN201911383312.4A priority Critical patent/CN111241293A/en
Publication of CN111241293A publication Critical patent/CN111241293A/en
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a knowledge graph algorithm constructed based on academic documents, comprising the following steps: S1, acquiring original data information; S2, storing and processing the original data information; S3, establishing a primary framework of the knowledge graph from the processed original data information; S4, updating knowledge through the primary framework of the knowledge graph; S5, repeating S4 multiple times so that the primary framework of the knowledge graph is updated multiple times; S6, completing the establishment of the knowledge graph. The invention efficiently constructs and updates the knowledge graph, improves its initial accuracy and completeness, and facilitates rapid retrieval of information from academic documents, thereby supporting academic research.

Description

Knowledge graph algorithm constructed based on academic literature
Technical Field
The invention relates to the field of knowledge graphs, in particular to a knowledge graph algorithm constructed based on academic documents.
Background
Knowledge graphs provide high-quality structured data and are now widely used in many areas of artificial intelligence, such as automated question answering, search engines, and information extraction. A typical knowledge graph is usually represented as triples of the form (head entity, relation, tail entity); for example, (Yao Ming, nationality, China) reflects the fact that Yao Ming's nationality is China. However, most existing knowledge graphs go long periods without being updated, their updating efficiency is low, and erroneous information is easily introduced during updates. As a result, such graphs are incomplete, extend poorly, and cannot be updated correctly.
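The triple form described above can be illustrated with a minimal Python sketch. The `Triple` type and the `tails` helper are hypothetical names introduced here for illustration only; they are not part of the patent.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One fact in (head entity, relation, tail entity) form."""
    head: str
    relation: str
    tail: str


# The example fact given in the background section.
triples = [Triple("Yao Ming", "nationality", "China")]


def tails(facts: List[Triple], head: str, relation: str) -> List[str]:
    """Return every tail entity recorded for the given head and relation."""
    return [t.tail for t in facts if t.head == head and t.relation == relation]


print(tails(triples, "Yao Ming", "nationality"))
```

A flat list is enough to show the data shape; a real graph store would index triples by head and relation instead of scanning them.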
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a knowledge graph algorithm constructed based on academic documents. The resulting knowledge graph information is complete, the graph can be updated effectively and efficiently, and the correctness of the data information is ensured. The invention helps retrieve information from academic documents quickly and facilitates academic research work.
To solve the above technical problems, the invention provides the following technical solution:
A knowledge graph algorithm constructed based on academic documents comprises the following steps:
S1, acquiring original data information;
S2, storing and processing the original data information;
S3, establishing a primary framework of the knowledge graph from the processed original data information;
S4, updating knowledge through the primary framework of the knowledge graph;
S5, repeating S4 multiple times so that the primary framework of the knowledge graph is updated multiple times;
S6, completing the establishment of the knowledge graph;
wherein, in S4, the knowledge update comprises the following steps:
S101, acquiring the latest data information from a website as the reference entity;
S102, extracting an entity already present in the knowledge graph as the existing entity;
S103, comparing the reference entity with the existing entity;
S104, if the comparison in S103 shows that the result is correct, taking the reference entity as the final standard entity; if the comparison in S103 shows that the two entities are partially the same, likewise taking the reference entity as the final standard entity; if the comparison in S103 shows that the two entities are completely different, sending both the reference entity and the existing entity to the server for manual judgment and review, after which the final standard entity is selected;
S105, updating the data information of the primary framework of the knowledge graph according to the final standard entity selected in S104, which completes the knowledge updating process.
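The decision logic of S102 to S105 can be sketched as follows. This is a minimal illustration under stated assumptions: entities are modelled as attribute dictionaries, and the comparison rule in `compare` (equal dictionaries count as "correct", at least one agreeing shared attribute counts as "partially the same") is an assumption, since the text does not define how the comparison is computed. All function names are hypothetical.

```python
from enum import Enum
from typing import Callable, Dict

Entity = Dict[str, object]


class Comparison(Enum):
    CORRECT = "correct"
    PARTIALLY_SAME = "partially the same"
    COMPLETELY_DIFFERENT = "completely different"


def compare(reference: Entity, existing: Entity) -> Comparison:
    """S103: compare the two entities attribute by attribute (assumed rule)."""
    if reference == existing:
        return Comparison.CORRECT
    shared = set(reference) & set(existing)
    if any(reference[key] == existing[key] for key in shared):
        return Comparison.PARTIALLY_SAME
    return Comparison.COMPLETELY_DIFFERENT


def select_final(reference: Entity, existing: Entity,
                 manual_review: Callable[[Entity, Entity], Entity]) -> Entity:
    """S104: keep the reference entity unless the two differ completely."""
    if compare(reference, existing) is Comparison.COMPLETELY_DIFFERENT:
        # Both entities go to a human reviewer, who picks the final one.
        return manual_review(reference, existing)
    return reference


def update_graph(graph: Dict[str, Entity], entity_id: str, reference: Entity,
                 manual_review: Callable[[Entity, Entity], Entity]) -> Entity:
    """S102 and S105: read the existing entity, then store the final one."""
    existing = graph.get(entity_id, {})
    graph[entity_id] = select_final(reference, existing, manual_review)
    return graph[entity_id]
```

Passing `manual_review` as a callback keeps the human step outside the automated path, which mirrors the split between automatic comparison and manual review described in the text.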
As a preferred technical solution of the invention, in S1 the original data includes: data information obtained using periodicals, papers, patents, encyclopedias and dictionaries as corpus sources; data information obtained by searching with trending topics on social networking sites and trending search terms on search engines as starting points; information obtained from the official websites of government bodies, enterprises and other legitimate organizations; and authoritative information obtained from each discipline and profession.
As a preferred technical solution of the invention, in S2 the storage and processing of the original data information involve a data storage module, a model editing module, a concurrency control module, an authority control module, a data verification module and an automatic construction module;
the data storage module is used for storing structured, semi-structured and unstructured data;
the model editing module is used for editing the concepts, entities, attributes, hierarchical relationships and concept-entity relationships of the knowledge model;
the concurrency control module is used for concurrent editing of the data in the database system according to the transaction isolation level;
and the authority control module is used for verifying user login information so as to control the permissions of the different editing levels.
As a preferred technical solution of the invention, in S101 entities are extracted as follows:
S201, applying named entity recognition to the captured title and extracting the named entities;
S202, using word segmentation to obtain from the title a list of candidate entity words that were not recognized;
S203, performing part-of-speech tagging on the candidate entity words, filtering out candidate words that carry no practical meaning, then verifying on an encyclopedia website whether the remaining candidate words are entity words, and taking the verified entity words together with the extracted named entities as the reference entities.
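Steps S201 to S203 can be sketched as a small pipeline. The text does not name a recognizer, segmenter, tagger or encyclopedia service, so each is passed in as a function; the `toy_*` stand-ins, the tag names and the sample title below are all hypothetical and exist only to make the sketch runnable.

```python
from typing import Callable, Iterable, List


def extract_reference_entities(
    title: str,
    recognize_entities: Callable[[str], Iterable[str]],
    segment: Callable[[str], List[str]],
    pos_tag: Callable[[str], str],
    is_encyclopedia_entry: Callable[[str], bool],
) -> List[str]:
    # S201: named entity recognition on the captured title.
    named = list(recognize_entities(title))
    named_words = {word for entity in named for word in segment(entity)}
    # S202: word segmentation yields the candidates that were not recognized.
    candidates = [w for w in segment(title) if w not in named_words]
    # S203: keep content words only (assumed tag set), then verify each one.
    content_tags = {"NOUN", "PROPN"}
    meaningful = [w for w in candidates if pos_tag(w) in content_tags]
    verified = [w for w in meaningful if is_encyclopedia_entry(w)]
    # Merge both sources, dropping duplicates while preserving order.
    return list(dict.fromkeys(named + verified))


# Toy stand-ins; a real system would call an NLP library and a web lookup.
KNOWN_NAMES = ["Yao Ming"]
POS = {"joins": "VERB", "the": "DET", "graph": "NOUN", "committee": "NOUN"}
ENCYCLOPEDIA = {"committee"}


def toy_ner(title: str) -> List[str]:
    return [name for name in KNOWN_NAMES if name in title]


def toy_segment(text: str) -> List[str]:
    return text.split()


def toy_pos(word: str) -> str:
    return POS.get(word.lower(), "X")


def toy_lookup(word: str) -> bool:
    return word.lower() in ENCYCLOPEDIA


result = extract_reference_entities(
    "Yao Ming joins the graph committee", toy_ner, toy_segment, toy_pos, toy_lookup
)
print(result)  # ['Yao Ming', 'committee']
```

Whitespace splitting is used here only because the sample title is in English; Chinese titles would need a real word segmenter.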
As a preferred technical solution of the invention, S103 involves a data verification processing module, which is used for verifying the integrity and consistency of entities, backing up and exporting data, and performing entity recognition and entity disambiguation.
As a preferred technical solution of the invention, in S2 the data information is stored in a graph database.
Compared with the prior art, the invention has the following beneficial effects:
The invention efficiently constructs and updates the knowledge graph: data is first acquired, stored and processed, and a primary framework of the knowledge graph is established from the existing data; the knowledge in this primary framework is then updated, which improves the initial accuracy and completeness of the knowledge graph.
The invention promptly picks up the latest information and the changed entities on each website, so that the data in the knowledge graph is updated efficiently and in real time and the lag of the knowledge graph data is reduced. In addition, the data is compared intelligently during updating, and intelligent review is combined with manual review, which improves both the accuracy of the data and the efficiency of data updating.
Drawings
The accompanying drawings provide a further understanding of the invention and constitute a part of this specification; together with the embodiments, they serve to explain the invention and do not limit it. In the drawings:
FIG. 1 is a schematic diagram of the construction process in the knowledge graph construction algorithm provided by the invention.
FIG. 2 is a schematic diagram of the graph update process in the knowledge graph construction algorithm provided by the invention.
Detailed Description
The preferred embodiments of the invention are described below with reference to the accompanying drawings. It should be understood that they are described for the purpose of illustration and explanation only, and not to limit the invention.
Example 1
As shown in FIGS. 1 and 2, the invention provides a knowledge graph algorithm constructed based on academic documents, which comprises the following steps:
S1, acquiring original data information;
S2, storing and processing the original data information;
S3, establishing a primary framework of the knowledge graph from the processed original data information;
S4, updating knowledge through the primary framework of the knowledge graph;
S5, repeating S4 multiple times so that the primary framework of the knowledge graph is updated multiple times;
S6, completing the establishment of the knowledge graph;
wherein, in S4, the knowledge update comprises the following steps:
S101, acquiring the latest data information from a website as the reference entity;
S102, extracting an entity already present in the knowledge graph as the existing entity;
S103, comparing the reference entity with the existing entity;
S104, if the comparison in S103 shows that the result is correct, taking the reference entity as the final standard entity; if the comparison in S103 shows that the two entities are partially the same, likewise taking the reference entity as the final standard entity; if the comparison in S103 shows that the two entities are completely different, sending both the reference entity and the existing entity to the server for manual judgment and review, after which the final standard entity is selected;
S105, updating the data information of the primary framework of the knowledge graph according to the final standard entity selected in S104, which completes the knowledge updating process.
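The overall flow S1 to S6 can be read as a build step followed by a repeated update loop. The sketch below shows only that control flow; every stage is a caller-supplied function, and the names `acquire`, `process`, `build_framework`, `update_once` and `rounds` are hypothetical, since the text does not specify how each stage is implemented or how many update rounds are run.

```python
from typing import Any, Callable


def build_knowledge_graph(
    acquire: Callable[[], Any],
    process: Callable[[Any], Any],
    build_framework: Callable[[Any], Any],
    update_once: Callable[[Any], Any],
    rounds: int,
) -> Any:
    raw = acquire()                     # S1: acquire original data information
    processed = process(raw)            # S2: store and process it
    graph = build_framework(processed)  # S3: build the primary framework
    for _ in range(rounds):             # S5: repeat the update step
        graph = update_once(graph)      # S4: one knowledge update
    return graph                        # S6: the established knowledge graph


# Toy usage: deduplicate raw triples, then count three update rounds.
graph = build_knowledge_graph(
    acquire=lambda: [("a", "cites", "b"), ("a", "cites", "b")],
    process=lambda raw: sorted(set(raw)),
    build_framework=lambda triples: {"triples": list(triples), "version": 0},
    update_once=lambda g: {**g, "version": g["version"] + 1},
    rounds=3,
)
print(graph)
```

Keeping the stages as injected functions makes the loop independent of the storage and extraction choices described in the optional embodiments.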
The invention efficiently constructs and updates the knowledge graph: data is first acquired, stored and processed, and a primary framework of the knowledge graph is established from the existing data; the knowledge in this primary framework is then updated, which improves the initial accuracy and completeness of the knowledge graph.
The invention promptly picks up the latest information and the changed entities on each website, so that the data in the knowledge graph is updated efficiently and in real time and the lag of the knowledge graph data is reduced. In addition, the data is compared intelligently during updating, and intelligent review is combined with manual review, which improves both the accuracy of the data and the efficiency of data updating.
In an optional embodiment, in S1 the original data includes: data information obtained using periodicals, papers, patents, encyclopedias and dictionaries as corpus sources; data information obtained by searching with trending topics on social networking sites and trending search terms on search engines as starting points; information obtained from the official websites of government bodies, enterprises and other legitimate organizations; and authoritative information obtained from each discipline and profession.
It should be noted that the sources of the acquired data information are reliable, which ensures the accuracy of the data, and the channels for acquiring data information are broad, which avoids omissions; the accuracy of the knowledge graph is thereby improved and its coverage widened.
In an optional embodiment, in S2 the storage and processing of the original data information involve a data storage module, a model editing module, a concurrency control module, an authority control module, a data verification module and an automatic construction module;
the data storage module is used for storing structured, semi-structured and unstructured data;
the model editing module is used for editing the concepts, entities, attributes, hierarchical relationships and concept-entity relationships of the knowledge model;
the concurrency control module is used for concurrent editing of the data in the database system according to the transaction isolation level;
and the authority control module is used for verifying user login information so as to control the permissions of the different editing levels.
It should be noted that storing and processing the data in this way facilitates subsequent processing of the data information and improves the efficiency of knowledge graph construction.
In an optional embodiment, in S101 entities are extracted as follows:
S201, applying named entity recognition to the captured title and extracting the named entities;
S202, using word segmentation to obtain from the title a list of candidate entity words that were not recognized;
S203, performing part-of-speech tagging on the candidate entity words, filtering out candidate words that carry no practical meaning, then verifying on an encyclopedia website whether the remaining candidate words are entity words, and taking the verified entity words together with the extracted named entities as the reference entities.
It should be noted that, during entity extraction, selected features are used to mine the entities most likely to have been updated. The knowledge graph is thus updated efficiently, unnecessary updates are reduced, the waste of network bandwidth incurred by existing methods is avoided, and the time lag of the data in the knowledge graph is greatly reduced.
In an optional embodiment, S103 involves a data verification processing module, which is used for verifying the integrity and consistency of entities, backing up and exporting data, and performing entity recognition and entity disambiguation. This helps obtain an accurate comparison result between the reference entity and the existing entity and improves updating efficiency.
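The checks attributed to the data verification processing module can be illustrated with a minimal sketch. The text does not define what integrity, consistency or disambiguation mean concretely, so the rules below are assumptions: integrity means that required fields are present, consistency means that a triple's tail refers to a known entity, and disambiguation matches on normalized name plus entity type. The field names and function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

REQUIRED_FIELDS = ("id", "name", "type")  # hypothetical schema


def check_integrity(entity: Dict[str, str]) -> List[str]:
    """Return the required fields that are missing or empty."""
    return [field for field in REQUIRED_FIELDS if not entity.get(field)]


def check_consistency(entity: Dict[str, str],
                      triples: List[Tuple[str, str, str]],
                      known_ids: Set[str]) -> List[Tuple[str, str, str]]:
    """Return this entity's triples whose tail id is not a known entity."""
    return [t for t in triples
            if t[0] == entity.get("id") and t[2] not in known_ids]


def normalize_name(name: str) -> str:
    """Lower-case the name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def disambiguate(name: str, expected_type: str,
                 candidates: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep candidates that share the normalized name and the expected type."""
    key = normalize_name(name)
    return [c for c in candidates
            if normalize_name(c["name"]) == key and c["type"] == expected_type]
```

Running these checks before the comparison in S103 means that only well-formed, unambiguous entities are compared, which is one plausible reading of why the module is placed at that step.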
In an optional embodiment, in S2 the data information is stored in a graph database. For association queries, a graph database is significantly more efficient than conventional relational storage; for second- and third-degree association queries, the query efficiency based on the knowledge graph can be thousands or even millions of times higher.
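A second- or third-degree association query amounts to finding everything within two or three hops of a starting node. The sketch below shows this with a breadth-first search over an in-memory adjacency map; it illustrates the query shape only and makes no claim about the speed-up figures quoted above. The function name and the sample nodes are hypothetical.

```python
from collections import deque
from typing import Dict, List


def neighbors_within(adjacency: Dict[str, List[str]], start: str,
                     max_hops: int) -> Dict[str, int]:
    """Return {node: hop count} for nodes reachable within max_hops of start."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == max_hops:
            continue  # do not expand beyond the requested degree
        for nxt in adjacency.get(node, ()):
            if nxt not in distances:
                distances[nxt] = distances[node] + 1
                queue.append(nxt)
    del distances[start]
    return distances


# Toy graph: papers linked to authors, authors linked to further papers.
adjacency = {
    "paper1": ["authorA"],
    "authorA": ["paper2"],
    "paper2": ["authorB"],
    "authorB": ["paper3"],
}
print(neighbors_within(adjacency, "paper1", 2))  # {'authorA': 1, 'paper2': 2}
```

Each hop here is a direct lookup of a node's neighbours, whereas a relational store would typically express the same query as one join per hop.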
Finally, it should be noted that although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the embodiments or replace some of their technical features with equivalents without departing from the spirit and scope of the invention. Any modification, equivalent replacement or improvement made within the spirit and principle of the invention shall fall within its protection scope.

Claims (6)

1. A knowledge graph algorithm constructed based on academic documents, characterized by comprising the following steps:
S1, acquiring original data information;
S2, storing and processing the original data information;
S3, establishing a primary framework of the knowledge graph from the processed original data information;
S4, updating knowledge through the primary framework of the knowledge graph;
S5, repeating S4 multiple times so that the primary framework of the knowledge graph is updated multiple times;
S6, completing the establishment of the knowledge graph;
wherein, in S4, the knowledge update comprises the following steps:
S101, acquiring the latest data information from a website as the reference entity;
S102, extracting an entity already present in the knowledge graph as the existing entity;
S103, comparing the reference entity with the existing entity;
S104, if the comparison in S103 shows that the result is correct, taking the reference entity as the final standard entity; if the comparison in S103 shows that the two entities are partially the same, likewise taking the reference entity as the final standard entity; if the comparison in S103 shows that the two entities are completely different, sending both the reference entity and the existing entity to the server for manual judgment and review, after which the final standard entity is selected;
S105, updating the data information of the primary framework of the knowledge graph according to the final standard entity selected in S104, which completes the knowledge updating process.
2. The knowledge graph algorithm according to claim 1, wherein in S1 the original data includes: data information obtained using periodicals, papers, patents, encyclopedias and dictionaries as corpus sources; data information obtained by searching with trending topics on social networking sites and trending search terms on search engines as starting points; information obtained from the official websites of government bodies, enterprises and other legitimate organizations; and authoritative information obtained from each discipline and profession.
3. The knowledge graph algorithm according to claim 1, wherein in S2 the storage and processing of the original data information involve a data storage module, a model editing module, a concurrency control module, an authority control module, a data verification module and an automatic construction module;
the data storage module is used for storing structured, semi-structured and unstructured data;
the model editing module is used for editing the concepts, entities, attributes, hierarchical relationships and concept-entity relationships of the knowledge model;
the concurrency control module is used for concurrent editing of the data in the database system according to the transaction isolation level;
and the authority control module is used for verifying user login information so as to control the permissions of the different editing levels.
4. The knowledge graph algorithm according to claim 1, wherein in S101 the entities are extracted as follows:
S201, applying named entity recognition to the captured title and extracting the named entities;
S202, using word segmentation to obtain from the title a list of candidate entity words that were not recognized;
S203, performing part-of-speech tagging on the candidate entity words, filtering out candidate words that carry no practical meaning, then verifying on an encyclopedia website whether the remaining candidate words are entity words, and taking the verified entity words together with the extracted named entities as the reference entities.
5. The knowledge graph algorithm according to claim 1, wherein S103 involves a data verification processing module, which is used for verifying the integrity and consistency of entities, backing up and exporting data, and performing entity recognition and entity disambiguation.
6. The knowledge graph algorithm according to claim 1, wherein in S2 the data information is stored in a graph database.
CN201911383312.4A 2019-12-28 2019-12-28 Knowledge graph algorithm constructed based on academic literature Withdrawn CN111241293A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911383312.4A CN111241293A (en) 2019-12-28 2019-12-28 Knowledge graph algorithm constructed based on academic literature

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911383312.4A CN111241293A (en) 2019-12-28 2019-12-28 Knowledge graph algorithm constructed based on academic literature

Publications (1)

Publication Number Publication Date
CN111241293A CN111241293A (en) 2020-06-05

Family

ID=70871721

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911383312.4A Withdrawn CN111241293A (en) 2019-12-28 2019-12-28 Knowledge graph algorithm constructed based on academic literature

Country Status (1)

Country Link
CN (1) CN111241293A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112487789A (en) * 2020-11-27 2021-03-12 贵州电网有限责任公司 Operation order scheduling logic validity verification method based on knowledge graph
CN117808085A (en) * 2024-02-29 2024-04-02 南京师范大学 Automatic discipline knowledge framework construction method, device, equipment and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112487789A (en) * 2020-11-27 2021-03-12 贵州电网有限责任公司 Operation order scheduling logic validity verification method based on knowledge graph
CN112487789B (en) * 2020-11-27 2023-12-01 贵州电网有限责任公司 Operation ticket scheduling logic validity verification method based on knowledge graph
CN117808085A (en) * 2024-02-29 2024-04-02 南京师范大学 Automatic discipline knowledge framework construction method, device, equipment and storage medium
CN117808085B (en) * 2024-02-29 2024-05-07 南京师范大学 Automatic discipline knowledge framework construction method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
WO2021103492A1 (en) Risk prediction method and system for business operations
CN109977110A (en) Data cleaning method, device and equipment
CN109657074B (en) News knowledge graph construction method based on address tree
US8140495B2 (en) Asynchronous database index maintenance
US8924365B2 (en) System and method for range search over distributive storage systems
US7324998B2 (en) Document search methods and systems
US20140222793A1 (en) System and Method for Automatically Importing, Refreshing, Maintaining, and Merging Contact Sets
CN111597347B (en) Knowledge embedding defect report reconstruction method and device
CN110008353A (en) A kind of construction method of dynamic knowledge map
CN104252507B (en) A kind of business data matching process and device
CN113254630B (en) Domain knowledge map recommendation method for global comprehensive observation results
US20210334292A1 (en) System and method for reconciliation of data in multiple systems using permutation matching
CN111241293A (en) Knowledge graph algorithm constructed based on academic literature
CN107169003B (en) Data association method and device
CN113722600A (en) Data query method, device, equipment and product applied to big data
CN110502529B (en) Data processing method, device, server and storage medium
CN109460467B (en) Method for constructing network information classification system
CN113742498B (en) Knowledge graph construction and updating method
CN115544181A (en) Ontology-based automatic data loading method for power grid data mart
CN111061853B (en) Method for rapidly acquiring FAQ model training corpus
US11775757B2 (en) Automated machine-learning dataset preparation
CN114218277A (en) Efficient query method and device for relational database
CN113254725A (en) Data management and retrieval enhancement method for graph database
US8706769B1 (en) Processing insert with normalize statements
KR101083425B1 (en) Database detecting system and detecting method using the same

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200605