CN117610654A - Knowledge graph construction method and system - Google Patents

Publication number
CN117610654A
Authority
CN
China
Prior art keywords
data
knowledge
knowledge graph
module
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311750463.5A
Other languages
Chinese (zh)
Inventor
田兆俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Focustar Technology Co ltd
Original Assignee
Guangzhou Focustar Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2023-12-19
Filing date
2023-12-19
Publication date
2024-02-27
Application filed by Guangzhou Focustar Technology Co ltd
Priority to CN202311750463.5A
Publication of CN117610654A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a knowledge graph construction method and system and relates to the technical field of knowledge graphs. The method comprises the following steps: S1, graph relation data acquisition: acquiring data by means of a downloadable template and batch upload, and storing the data in a database; S2, information extraction: extracting the data in the database to obtain a plurality of data sets; S3, knowledge fusion: merging the plurality of data sets into one data set to form a knowledge graph; S4, graph visualization: carrying out automatic knowledge acquisition using 3D visualization technology. Word segmentation and part-of-speech tagging require no manual annotation, which saves considerable manpower and cost; the knowledge graph can be constructed quickly through automation without excessive manual effort; and the method reveals the dynamic development patterns of a knowledge domain, provides a practical and valuable reference for discipline research, and enriches and enhances the semantic expressiveness of the knowledge graph.

Description

Knowledge graph construction method and system
Technical Field
The invention relates to the technical field of knowledge graphs, and in particular to a knowledge graph construction method and system.
Background
The knowledge graph is an important branch of artificial intelligence technology and was proposed by Google in 2012. It is a structured semantic knowledge base that describes concepts in the physical world and their interrelationships in symbolic form. Its basic units are 'entity-relationship-entity' triples, together with entities and their associated attribute-value pairs; entities are linked to one another through relationships to form a networked knowledge structure.
Knowledge graphs can be divided into general knowledge graphs and domain knowledge graphs according to function and application scenario. A general knowledge graph targets the general domain and emphasizes breadth of knowledge; it usually consists of structured encyclopedic knowledge, and its users are mainly ordinary users. A domain knowledge graph targets a specific domain and emphasizes depth of knowledge; it is usually built from an industry database, and its users are industry practitioners, prospective practitioners, and the like.
Existing knowledge graphs can express only simple associated facts, yet the application requirements of many fields go far beyond the simple associated facts that triples can express, and such graphs are of limited use to non-academic staff doing related work. How to enrich and enhance the semantic expressiveness of knowledge graphs with more diverse knowledge representations therefore remains an open problem.
Providing a knowledge graph construction method and system that overcomes these difficulties of the prior art is therefore a problem that those skilled in the art need to solve.
Disclosure of Invention
In view of the above, the invention provides a knowledge graph construction method and system that reveal the dynamic development patterns of a knowledge domain, provide a practical and valuable reference for discipline research, and enrich and enhance the semantic expressiveness of the knowledge graph.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
A knowledge graph construction method comprises the following steps:
S1, graph relation data acquisition: acquiring data by means of a downloadable template and batch upload, and storing the data in a database;
S2, information extraction: extracting the data in the database to obtain a plurality of data sets;
S3, knowledge fusion: merging the plurality of data sets into one data set to form a knowledge graph;
S4, graph visualization: carrying out automatic knowledge acquisition using 3D visualization technology.
Optionally, the data acquired in S1 includes structured data, unstructured data, and semi-structured data.
Optionally, extracting the data in the database in S2 specifically comprises:
performing word segmentation and part-of-speech tagging on content segments of the data to obtain the keywords in the content segments; and matching the keywords against a domain ontology according to preset rules to obtain the knowledge element instances in the content segments, the attributes of the knowledge element instances, and the association relations among the knowledge element instances, thereby obtaining a plurality of data sets.
Optionally, the knowledge fusion in S3 specifically comprises: performing similarity calculation on ambiguous entities in the plurality of data sets, then eliminating the ambiguity by a clustering method, and merging the plurality of data sets into one data set to complete the knowledge fusion.
Optionally, in S4, the knowledge graph is displayed on a 3D visualization large screen through a Web application framework.
Optionally, the method further comprises storing the knowledge graph in a HugeGraph graph database.
A knowledge graph construction system, which applies any one of the knowledge graph construction methods described above, comprises: a graph relation data acquisition module, an information extraction module, a knowledge fusion module, and a graph visualization module;
the graph relation data acquisition module is connected to the input end of the information extraction module and is used for storing data by means of a downloadable template and batch upload;
the information extraction module is connected to the input end of the knowledge fusion module and is used for extracting the data in the database to obtain a plurality of data sets;
the knowledge fusion module is connected to the input end of the graph visualization module and is used for merging the plurality of data sets into one data set to form a knowledge graph;
and the graph visualization module is connected to the output end of the knowledge fusion module and is used for carrying out automatic knowledge acquisition using 3D visualization technology.
Compared with the prior art, the knowledge graph construction method and system provided by the invention have the following beneficial effects: word segmentation and part-of-speech tagging require no manual annotation, which saves considerable manpower and cost; the knowledge graph can be constructed quickly through automation without excessive manual effort; and the method reveals the dynamic development patterns of a knowledge domain, provides a practical and valuable reference for discipline research, and enriches and enhances the semantic expressiveness of the knowledge graph.
Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely embodiments of the present invention, and a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a knowledge graph construction method provided by the invention;
FIG. 2 is a block diagram of a knowledge graph construction system provided by the invention;
FIG. 3 is a schematic diagram of a constructed knowledge graph according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to FIG. 1, the invention discloses a knowledge graph construction method comprising the following steps:
S1, graph relation data acquisition: acquiring data by means of a downloadable template and batch upload, and storing the data in a database;
S2, information extraction: extracting the data in the database to obtain a plurality of data sets;
S3, knowledge fusion: merging the plurality of data sets into one data set to form a knowledge graph;
S4, graph visualization: carrying out automatic knowledge acquisition using 3D visualization technology.
Further, the data acquired in S1 includes structured data, unstructured data, and semi-structured data.
Further, extracting the data in the database in S2 specifically comprises:
performing word segmentation and part-of-speech tagging on content segments of the data to obtain the keywords in the content segments; and matching the keywords against a domain ontology according to preset rules to obtain the knowledge element instances in the content segments, the attributes of the knowledge element instances, and the association relations among the knowledge element instances, thereby obtaining a plurality of data sets.
Further, the knowledge fusion in S3 specifically comprises: performing similarity calculation on ambiguous entities in the plurality of data sets, then eliminating the ambiguity by a clustering method, and merging the plurality of data sets into one data set to complete the knowledge fusion.
Further, in S4, the knowledge graph is displayed on a 3D visualization large screen through a Web application framework.
Further, the method also comprises storing the knowledge graph in a HugeGraph graph database.
In one embodiment, as shown in FIG. 3, building a knowledge graph begins with acquiring data. The data are the source of knowledge and may be tables, text, databases, and so on. By type, data can be classified as structured, unstructured, or semi-structured. Structured data are data represented in a fixed format, such as tables and databases, and can be used directly to construct a knowledge graph. Unstructured data are text, audio, video, pictures, and the like, and require information extraction before a knowledge graph can be built. Semi-structured data lie between structured and unstructured data and likewise require information extraction before a knowledge graph can be built.
Once data from different sources have been obtained, knowledge fusion is required: entities that represent the same concept are merged, and the data sets from multiple sources are merged into one data set. This yields the final data, on which the corresponding knowledge graph is built.
(I) S1, graph relation data acquisition
The data sources of a domain knowledge graph fall into three types, structured, semi-structured, and unstructured, and each is acquired differently. Structured data are stored in relational databases and come from the open-source resources of domain knowledge bases or from data providers that offer domain consulting services; semi-structured data have no fixed structure and are acquired with a web crawler; unstructured data contain a large amount of complex information and are plain-text content, from which knowledge is hardest to extract.
Crawler tasks are carried out with the Scrapy framework as follows:
first, define a container (Item) to hold the crawled data; second, write a crawler (Spider) for the target website and extract information with selectors (Selectors); finally, write a Pipeline to store the extracted data.
A text template is downloaded, data are acquired through batch upload, and the data are stored in the database.
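The template-and-batch-upload step can be sketched as follows. This is a minimal illustration under stated assumptions: the patent names neither the template format nor the database engine, so the CSV template, its columns (`head`, `relation`, `tail`), the table name `relation_data`, and the use of SQLite are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical template columns; the patent does not specify them.
TEMPLATE_COLUMNS = ["head", "relation", "tail"]


def make_template() -> str:
    """Return the empty CSV template that a user would download and fill in."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(TEMPLATE_COLUMNS)
    return buf.getvalue()


def batch_upload(conn: sqlite3.Connection, filled_csv: str) -> int:
    """Check the header of a filled-in template and insert all rows at once."""
    reader = csv.DictReader(io.StringIO(filled_csv))
    if reader.fieldnames != TEMPLATE_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows = [tuple(r[c].strip() for c in TEMPLATE_COLUMNS) for r in reader]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS relation_data (head TEXT, relation TEXT, tail TEXT)"
    )
    with conn:  # one transaction: commits on success, rolls back on error
        conn.executemany("INSERT INTO relation_data VALUES (?, ?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
uploaded = make_template() + "BERT,is_a,language model\nHugeGraph,is_a,graph database\n"
count = batch_upload(conn, uploaded)
stored = conn.execute("SELECT head, relation, tail FROM relation_data").fetchall()
```

Validating the header before inserting keeps a malformed upload from partially populating the table, which is one plausible reason to distribute a fixed template in the first place.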
(II) S2, information extraction: word segmentation and part-of-speech tagging are performed on content segments of the data to obtain the keywords in the content segments; the keywords are then matched against a domain ontology according to preset rules to obtain the knowledge element instances in the content segments, the attributes of the knowledge element instances, and the association relations among the knowledge element instances, yielding a plurality of data sets.
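The keyword-to-ontology matching described above might look like the sketch below. Everything concrete here is an assumption made for illustration: the tiny part-of-speech lexicon, the ontology dictionaries, and the single "noun, verb, noun" rule are hypothetical, and a real system would use a proper segmenter and tagger (for Chinese text in particular). Attribute extraction is omitted to keep the sketch short.

```python
import re

# Hypothetical part-of-speech lexicon and domain ontology, for illustration only.
POS_LEXICON = {"hugegraph": "NOUN", "stores": "VERB", "triples": "NOUN", "the": "DET"}
ONTOLOGY_CLASSES = {"hugegraph": "GraphDatabase", "triples": "DataUnit"}
ONTOLOGY_RELATIONS = {"stores": "stores"}


def segment_and_tag(text: str) -> list[tuple[str, str]]:
    """Split text into lowercase word tokens and tag each one from the lexicon."""
    tokens = re.findall(r"[A-Za-z]+", text.lower())
    return [(tok, POS_LEXICON.get(tok, "X")) for tok in tokens]


def extract(text: str) -> dict:
    """Match tagged keywords against the ontology with one preset rule:
    instance NOUN, relation VERB, instance NOUN yields a relation triple."""
    tagged = segment_and_tag(text)
    # Keywords are the nouns and verbs; tokens with other tags are dropped.
    keywords = [(t, p) for t, p in tagged if p in ("NOUN", "VERB")]
    instances = {
        t: ONTOLOGY_CLASSES[t]
        for t, p in keywords
        if p == "NOUN" and t in ONTOLOGY_CLASSES
    }
    relations = []
    for (a, pa), (b, pb), (c, pc) in zip(keywords, keywords[1:], keywords[2:]):
        if (
            (pa, pb, pc) == ("NOUN", "VERB", "NOUN")
            and b in ONTOLOGY_RELATIONS
            and a in instances
            and c in instances
        ):
            relations.append((a, ONTOLOGY_RELATIONS[b], c))
    return {"instances": instances, "relations": relations}


result = extract("HugeGraph stores the triples")
```

Each processed content segment produces one such result, and the collection of results per source corresponds to the "plurality of data sets" that the fusion step later merges.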
1. Knowledge extraction of structured data
Structured data are represented and stored in two-dimensional form, with clear correspondences among data items. Knowledge acquisition from relational data therefore usually consists of converting the data to RDF form with the open-source tools D2R MAP and D2RQ. Because mapping RDF data onto the knowledge ontology model is difficult, an RDF-graph-to-property-graph conversion tool is also needed to assist.
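To make the relational-to-RDF idea concrete, the sketch below hand-rolls the common mapping convention (table to class, row to subject, column to predicate, cell to literal) and emits N-Triples-style lines. It does not use or reproduce D2R MAP or D2RQ; the `http://example.org/` namespace, the `company` table, and the function name are hypothetical, and literal escaping is simplified.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def rows_to_ntriples(conn: sqlite3.Connection, table: str, key: str) -> list[str]:
    """Map each row to a subject URI and each non-key column to a predicate.

    The table name is interpolated into the SQL, so it must come from
    trusted code, never from user input.
    """
    cur = conn.execute(f'SELECT * FROM "{table}"')
    cols = [d[0] for d in cur.description]
    triples = []
    for row in cur:
        record = dict(zip(cols, row))
        subject = f"<{BASE}{table}/{record[key]}>"
        triples.append(f"{subject} <{RDF_TYPE}> <{BASE}{table}> .")
        for col, value in record.items():
            if col == key or value is None:
                continue
            # Simplified escaping: backslashes and double quotes only.
            literal = str(value).replace("\\", "\\\\").replace('"', '\\"')
            triples.append(f'{subject} <{BASE}{table}#{col}> "{literal}" .')
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE company (id INTEGER, name TEXT, city TEXT)")
conn.execute("INSERT INTO company VALUES (1, 'Focustar', 'Guangzhou')")
triples = rows_to_ntriples(conn, "company", key="id")
for line in triples:
    print(line)
```

Because every row becomes a typed subject with one statement per column, the "clear correspondences among data items" in the table carry over directly into the triple form.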
2. Knowledge extraction of semi-structured data
Semi-structured data have some structure of their own, are highly varied, and are not subject to strict schema constraints, so they need further processing before domain knowledge can be extracted. A wrapper tool is used to extract the data from HTML web pages and restore them to structured form for knowledge extraction.
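A wrapper in this sense can be as small as the sketch below, which turns an HTML table into a list of records using only Python's standard `html.parser`. The patent does not name a wrapper tool, so the `TableWrapper` class, the `wrap` helper, and the sample page are hypothetical, and the sketch assumes a single table whose first row is the header.

```python
from html.parser import HTMLParser


class TableWrapper(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def wrap(html: str) -> list[dict]:
    """Restore the table to structured records keyed by the header row."""
    parser = TableWrapper()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


page = """
<table>
  <tr><th>name</th><th>type</th></tr>
  <tr><td>HugeGraph</td><td>graph database</td></tr>
  <tr><td>Scrapy</td><td>crawler framework</td></tr>
</table>
"""
records = wrap(page)
```

Once the page is reduced to records like these, it can be handled by the same path as structured data.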
3. Knowledge extraction of unstructured data
Natural language processing tasks based on the BERT model are implemented in two stages, pre-training and fine-tuning. In the first stage, pre-training, self-supervised training on a large-scale unlabeled text corpus lets the model fully learn the linguistic features of text, yielding deep text vector representations and a pre-trained model. In the second stage, fine-tuning, the network parameters that converged in the previous stage are used directly as the initial model; a labeled data set for the specific processing task is fed in, and the model is further fitted until it converges, finally yielding a deep learning model for that specific natural language processing task.
(III) S3, knowledge fusion: similarity calculation is performed on ambiguous entities in the plurality of data sets, the ambiguity is eliminated by a clustering method, and the plurality of data sets are merged into one data set, completing the knowledge fusion.
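The similarity-then-clustering idea can be sketched as below. The patent does not say which similarity measure or clustering method is used, so this example assumes case-insensitive string similarity from Python's `difflib` and single-link clustering via union-find; the 0.8 threshold, the function names, and the sample triples are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cluster_entities(names, threshold=0.8):
    """Single-link clustering: names at or above the threshold share a cluster.

    Returns a mapping from every name to its cluster representative.
    """
    names = sorted(set(names))
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if similarity(a, b) >= threshold:
                parent[find(b)] = find(a)
    return {n: find(n) for n in names}


def fuse(datasets, threshold=0.8):
    """Merge several triple sets into one, rewriting entity names to their
    cluster representative so that duplicates collapse."""
    names = [e for ds in datasets for (h, _, t) in ds for e in (h, t)]
    canon = cluster_entities(names, threshold)
    return {(canon[h], r, canon[t]) for ds in datasets for (h, r, t) in ds}


dataset_a = [("HugeGraph", "is_a", "graph database")]
dataset_b = [("Hugegraph", "stores", "knowledge graph")]
fused = fuse([dataset_a, dataset_b])
```

Here the two spellings of the same entity end up under one representative, so the triples from both sources attach to a single node in the merged data set. The pairwise comparison is quadratic in the number of names, which is acceptable for a sketch but would need blocking or indexing at scale.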
The constructed knowledge graph is stored in a HugeGraph graph database.
(IV) S4, graph visualization: the knowledge graph is displayed on a 3D visualization large screen through a Web application framework, as follows:
the client's HTTP request is received through the browser; the HTTP request is sent to the web server gateway interface; the location of the information is specified by a uniform resource locator (URL) and passed to a view function; the view function requests data from the data storage layer using the HttpRequest object; the data storage layer queries the database and returns the data required by the objects in the view function; after the view function has processed the data, it passes them to the presentation layer through the template language, and an HTTP response is returned to the browser and presented to the user.
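The request flow above can be traced end to end with the standard library alone. The sketch assumes the gateway is Python's WSGI and stands in for a full Web framework: the `/graph` route, the `triple` table, the `graph_view` function, and the page template are all hypothetical, and the 3D rendering itself is left to whatever front-end library consumes the JSON.

```python
import json
import sqlite3
from string import Template
from wsgiref.util import setup_testing_defaults

# Data storage layer: an in-memory table of triples (hypothetical schema).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE triple (head TEXT, relation TEXT, tail TEXT)")
db.execute("INSERT INTO triple VALUES ('HugeGraph', 'stores', 'knowledge graph')")

# Presentation layer: the template embeds the graph data as JSON for a renderer.
PAGE = Template("<html><body><script>const graph = $graph;</script></body></html>")


def graph_view(environ):
    """View function: fetch triples, reshape them into nodes and links, render."""
    rows = db.execute("SELECT head, relation, tail FROM triple").fetchall()
    nodes = sorted({name for h, _, t in rows for name in (h, t)})
    links = [{"source": h, "label": r, "target": t} for h, r, t in rows]
    return PAGE.substitute(graph=json.dumps({"nodes": nodes, "links": links}))


ROUTES = {"/graph": graph_view}  # URL-to-view mapping


def application(environ, start_response):
    """WSGI entry point: dispatch on the URL path and return the response body."""
    view = ROUTES.get(environ.get("PATH_INFO", ""))
    if view is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    body = view(environ).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [body]


# Simulate one browser request without starting a server.
environ = {"PATH_INFO": "/graph"}
setup_testing_defaults(environ)
statuses = []
response = b"".join(application(environ, lambda status, headers: statuses.append(status)))
```

Keeping the node-and-link reshaping inside the view function means the template only has to drop a ready-made JSON object into the page, which mirrors the division between view, data storage layer, and presentation layer in the description.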
In another specific embodiment, the knowledge graph is constructed in a specific application as follows:
the information is converted into structured data; the structured data are extracted and segmented to obtain a plurality of text content segments; entity information for a plurality of key entities is obtained; the entity information of the key entities is fused with the data in the database according to entity relations to generate a new data structure; a corresponding knowledge graph is generated from the new data structure and stored; and the generated knowledge graph is displayed using 3D visualization technology.
Corresponding to the method shown in FIG. 1, an embodiment of the invention further provides a knowledge graph construction system for implementing that method. Its structure is shown schematically in FIG. 2, and it specifically comprises:
a graph relation data acquisition module, an information extraction module, a knowledge fusion module, and a graph visualization module;
the graph relation data acquisition module is connected to the input end of the information extraction module and is used for storing data by means of a downloadable template and batch upload;
the information extraction module is connected to the input end of the knowledge fusion module and is used for extracting the data in the database to obtain a plurality of data sets;
the knowledge fusion module is connected to the input end of the graph visualization module and is used for merging the plurality of data sets into one data set to form a knowledge graph;
and the graph visualization module is connected to the output end of the knowledge fusion module and is used for carrying out automatic knowledge acquisition using 3D visualization technology.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (7)

1. A knowledge graph construction method, characterized by comprising the following steps:
S1, graph relation data acquisition: acquiring data by means of a downloadable template and batch upload, and storing the data in a database;
S2, information extraction: extracting the data in the database to obtain a plurality of data sets;
S3, knowledge fusion: merging the plurality of data sets into one data set to form a knowledge graph;
S4, graph visualization: carrying out automatic knowledge acquisition using 3D visualization technology.
2. The method for constructing a knowledge graph according to claim 1, wherein,
the data acquired in S1 includes structured data, unstructured data, and semi-structured data.
3. The method for constructing a knowledge graph according to claim 1, wherein,
extracting the data in the database in S2 specifically comprises:
performing word segmentation and part-of-speech tagging on content segments of the data to obtain the keywords in the content segments; and matching the keywords against a domain ontology according to preset rules to obtain the knowledge element instances in the content segments, the attributes of the knowledge element instances, and the association relations among the knowledge element instances, thereby obtaining a plurality of data sets.
4. The method for constructing a knowledge graph according to claim 1, wherein,
the knowledge fusion in S3 specifically comprises: performing similarity calculation on ambiguous entities in the plurality of data sets, then eliminating the ambiguity by a clustering method, and merging the plurality of data sets into one data set to complete the knowledge fusion.
5. The method for constructing a knowledge graph according to claim 1, wherein,
in S4, the knowledge graph is displayed on a 3D visualization large screen through a Web application framework.
6. The method for constructing a knowledge graph according to claim 1, wherein,
the method further comprises storing the knowledge graph in a HugeGraph graph database.
7. A knowledge graph construction system, characterized in that it applies the knowledge graph construction method according to any one of claims 1-6 and comprises: a graph relation data acquisition module, an information extraction module, a knowledge fusion module, and a graph visualization module;
the graph relation data acquisition module is connected to the input end of the information extraction module and is used for storing data by means of a downloadable template and batch upload;
the information extraction module is connected to the input end of the knowledge fusion module and is used for extracting the data in the database to obtain a plurality of data sets;
the knowledge fusion module is connected to the input end of the graph visualization module and is used for merging the plurality of data sets into one data set to form a knowledge graph;
and the graph visualization module is connected to the output end of the knowledge fusion module and is used for carrying out automatic knowledge acquisition using 3D visualization technology.
CN202311750463.5A 2023-12-19 2023-12-19 Knowledge graph construction method and system Pending CN117610654A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311750463.5A CN117610654A (en) 2023-12-19 2023-12-19 Knowledge graph construction method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311750463.5A CN117610654A (en) 2023-12-19 2023-12-19 Knowledge graph construction method and system

Publications (1)

Publication Number Publication Date
CN117610654A 2024-02-27

Family

ID=89944336

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311750463.5A Pending CN117610654A (en) 2023-12-19 2023-12-19 Knowledge graph construction method and system

Country Status (1)

Country Link
CN (1) CN117610654A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination