CN114637766B - Intelligent question-answering method and system based on natural resource industrial chain knowledge graph - Google Patents

Intelligent question-answering method and system based on natural resource industrial chain knowledge graph

Info

Publication number
CN114637766B
Authority
CN
China
Prior art keywords
natural resource
data
chain
natural
industry
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210536817.5A
Other languages
Chinese (zh)
Other versions
CN114637766A (en
Inventor
闫伟
王超越
张亮
王吉华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Normal University
Original Assignee
Shandong Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Normal University
Priority to CN202210536817.5A
Publication of CN114637766A
Application granted
Publication of CN114637766B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/243 Natural language query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/84 Mapping; Conversion
    • G06F16/86 Mapping to a database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Electricity, gas or water supply

Abstract

The invention relates to the field of data processing suitable for prediction purposes and discloses an intelligent question-answering method and system based on a natural resource industrial chain knowledge graph. The method comprises: acquiring natural resource industry data; converting the semi-structured data into new structured data; storing all structured data in a relational database; analyzing all the structured data to construct a domain ontology of the natural resource industrial chain; taking the domain ontology as the schema layer of the natural resource industrial chain knowledge graph; configuring a mapping between the domain ontology and the structured natural resource industrial chain data in the relational database; exporting the structured data stored in the database as triple data based on the mapping, and taking the triple data as the data layer of the knowledge graph; and acquiring a natural language question about the natural resource industry, searching the knowledge graph for an answer to the question, and outputting the answer.

Description

Intelligent question-answering method and system based on natural resource industrial chain knowledge graph
Technical Field
The invention relates to the field of data processing suitable for prediction purposes, in particular to an intelligent question-answering method and system based on a knowledge graph of a natural resource industrial chain.
Background
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
In recent years, as enterprises related to natural resources have grown in scale, industrial innovation systems have been continuously optimized, and the comprehensive management of natural resources has deepened, the earlier resource-economy research organized around a single department or industry can no longer meet the needs of natural resource management. Systematic research on the natural resource economy and on the industry as a whole is urgently needed.
In the prior art, users who want to understand natural resources generally sort and analyze the information manually and from a single perspective.
Disclosure of Invention
To overcome the deficiencies of the prior art, the invention provides an intelligent question-answering method and system based on a natural resource industrial chain knowledge graph. The reasoning function of the knowledge graph can be used to reason over the natural resource industry knowledge stored in the industrial chain knowledge graph, so that new, not yet mined knowledge is derived from known natural resource industry knowledge and facts.
In a first aspect, an intelligent question-answering method based on a natural resource industrial chain knowledge graph is provided.
The intelligent question-answering method based on a natural resource industrial chain knowledge graph comprises the following steps:
acquiring natural resource industry data, the natural resource industry data comprising semi-structured data and structured data; converting the semi-structured data into new structured data; storing the new structured data and the original structured data in a relational database;
analyzing all the structured data and constructing a domain ontology of the natural resource industrial chain; taking the domain ontology as the schema layer of the natural resource industrial chain knowledge graph;
configuring a mapping between the domain ontology and the structured natural resource industrial chain data in the relational database; exporting the structured data stored in the database as triple data based on the mapping, and taking the triple data as the data layer of the knowledge graph, thereby constructing the natural resource industrial chain knowledge graph;
acquiring a natural language question about the natural resource industry, searching the natural resource industrial chain knowledge graph for an answer to the question, and outputting the answer corresponding to the question.
In a second aspect, an intelligent question-answering system based on a natural resource industrial chain knowledge graph is provided.
The intelligent question-answering system based on a natural resource industrial chain knowledge graph comprises:
an acquisition module configured to acquire natural resource industry data, the natural resource industry data comprising semi-structured data and structured data; to convert the semi-structured data into new structured data; and to store the new structured data and the original structured data in a relational database;
an ontology construction module configured to analyze all the structured data and construct a domain ontology of the natural resource industrial chain, and to take the domain ontology as the schema layer of the natural resource industrial chain knowledge graph;
a mapping configuration module configured to configure a mapping between the domain ontology and the structured natural resource industrial chain data in the relational database; to export the structured data stored in the database as triple data based on the mapping; and to take the triple data as the data layer of the knowledge graph, thereby constructing the natural resource industrial chain knowledge graph;
and an answer output module configured to acquire a natural language question about the natural resource industry, search the natural resource industrial chain knowledge graph for an answer to the question, and output the answer corresponding to the question.
Compared with the prior art, the invention has the beneficial effects that:
First, the described methods for designing, constructing and applying the industrial chain knowledge graph and the intelligent question-answering system make it possible to answer user questions about the natural resource industry comprehensively and accurately.
Second, the collected industrial chain data is classified by data type into structured and semi-structured data; after the data from the various sources has been cleaned and its knowledge fused, the accuracy of the knowledge stored in the knowledge graph is significantly improved.
Finally, using predefined question templates narrows the search range for a question and improves search efficiency and performance.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention. They illustrate exemplary embodiments of the invention and, together with the description, serve to explain the invention and not to limit it.
FIG. 1 is a flowchart of a method according to a first embodiment.
Detailed Description
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and it should be understood that the terms "comprises" and "comprising", and any variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
All data in this embodiment are obtained and used lawfully, in compliance with laws and regulations and with user consent.
In question-and-answer form, the system can provide information spanning natural resource survey through natural resource price evaluation, covering the supply chain, information chain, technology chain, service chain, talent chain, capital chain and other chains of the natural resource industry, as well as related industrial information ranging from ecological restoration to the ecological economy. In addition, an upstream link in the industrial chain delivers products or services to a downstream link, and a downstream link feeds information back to the upstream link; that is, the industrial chain contains a large number of upstream and downstream relationships and mutual exchanges of valuable information. The question-answering system can therefore use the reasoning function of the knowledge graph to reason over the natural resource industry knowledge stored in the industrial chain knowledge graph and derive new, not yet mined knowledge from known knowledge and facts. This clarifies the upstream and downstream industrial chain relationships of the natural resource industry, identifies the competitive relationships at each link through those relationships, and helps an enterprise determine the main industrial direction of its natural resource business segment. Ultimately, the layout of the natural resource industry is optimized, the industrial chain is improved, and the real economy is strengthened.
At present, knowledge graph data is mostly stored in graph databases, which store data in the form of a graph. A graph database offers fast query and search; its entity nodes can carry attributes, so an entity can retain more information; like a relational database, it has a complete query language; and it supports most graph mining algorithms. The most widely used graph database at present is Neo4j, so the industrial chain knowledge graph is stored in a graph database.
Embodiment one
This embodiment provides an intelligent question-answering method based on a natural resource industrial chain knowledge graph.
As shown in FIG. 1, the intelligent question-answering method based on the natural resource industrial chain knowledge graph includes:
S101: acquiring natural resource industry data, the natural resource industry data comprising semi-structured data and structured data; converting the semi-structured data into new structured data; storing the new structured data and the original structured data in a relational database;
S102: analyzing all the structured data to construct a domain ontology of the natural resource industrial chain; taking the domain ontology as the schema layer of the natural resource industrial chain knowledge graph;
S103: configuring a mapping between the domain ontology and the structured natural resource industrial chain data in the relational database; exporting the structured data stored in the database as triple data based on the mapping, and taking the triple data as the data layer of the knowledge graph, thereby constructing the natural resource industrial chain knowledge graph;
S104: acquiring a natural language question about the natural resource industry, searching the natural resource industrial chain knowledge graph for an answer to the question, and outputting the answer corresponding to the question.
Further, in step S101, the natural resource industry data is acquired in two ways: by web crawler and by manual collection.
Further, the structured data includes natural resource industry data in various databases, Excel files and comma-separated value (CSV) files.
Further, the semi-structured data includes natural resource industrial chain data in hypertext markup language (HTML) web pages.
Further, the natural resource industrial chain data in HTML web pages comes from: web pages of the agriculture and natural resources departments of provinces, cities and local governments; official websites of enterprises and organizations related to the natural resource industry; and other web pages on the Internet related to the natural resource industry.
Further, converting the semi-structured data into new structured data specifically comprises:
removing noise from the unstructured data in the web page content with a web crawler tool;
and importing the structured data obtained after noise removal into the relational database MySQL, thereby converting the semi-structured data into structured data.
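The conversion described above can be sketched as follows. This is a minimal illustration and not the patented implementation: it assumes the crawled page carries its data in an HTML table, treats script and style content as the noise to remove, and uses an in-memory SQLite database as a stand-in for MySQL so that the example runs without a server. The table name `cases`, its column names and the sample page are hypothetical.

```python
import sqlite3
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of table cells and ignore script/style noise."""

    NOISE_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows, each a list of cell strings
        self._row = None       # row being built, or None outside <tr>
        self._cell = None      # text pieces of the current cell, or None
        self._noise_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.NOISE_TAGS:
            self._noise_depth += 1
        elif tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in self.NOISE_TAGS and self._noise_depth:
            self._noise_depth -= 1
        elif tag in ("td", "th") and self._cell is not None:
            # Join the text pieces and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None and not self._noise_depth:
            self._cell.append(data)


def html_to_rows(html):
    """Return the table rows found in an HTML page as lists of strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


def store_rows(conn, rows):
    """Store the rows (first row is the header) in a hypothetical table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cases "
        "(serial_number INTEGER, case_name TEXT, declaration_unit TEXT)"
    )
    conn.executemany("INSERT INTO cases VALUES (?, ?, ?)", rows[1:])
    conn.commit()


# Hypothetical crawled page with noise in the head, the body and a cell.
SAMPLE_HTML = (
    "<html><head><style>td {color: red}</style></head><body>"
    "<script>var tracker = 1;</script><table>"
    "<tr><th>serial number</th><th>case name</th><th>declaration unit</th></tr>"
    "<tr><td>1</td><td> Smart  orchard <script>ad()</script></td><td>Unit A</td></tr>"
    "<tr><td>2</td><td>Grain monitoring</td><td>Unit B</td></tr>"
    "</table></body></html>"
)

if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    store_rows(connection, html_to_rows(SAMPLE_HTML))
    print(connection.execute("SELECT * FROM cases").fetchall())
```

With a real MySQL server, the same rows would be inserted through a MySQL client library instead of `sqlite3`; the parsing step would be unchanged.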
Further, in S102, analyzing all the structured data and constructing a domain ontology of the natural resource industrial chain specifically comprises:
manually extracting, analyzing and summarizing the structured data to obtain the terms, basic concepts and relationships among concepts in the natural resource industry field, and finally constructing the natural resource industrial chain domain ontology, which serves as the schema layer of the natural resource industrial chain knowledge graph and defines the concepts and relationships of the ontology.
Further, in S102, analyzing all the structured data to construct a domain ontology of the natural resource industrial chain specifically comprises:
S1021: constructing the natural resource industrial chain ontology in a top-down manner;
S1022: analyzing the natural resource industrial chain data stored in the relational database, and determining the common concepts and terms of the fields that the ontology needs to cover;
S1023: analyzing the natural resource industrial chain data stored in the relational database, and dividing the industry into a resource pedigree, an asset pedigree, a capital pedigree, an industry pedigree and a technology pedigree;
S1024: creating the classes of the natural resource industrial chain ontology, together with the hierarchy, relationships, attributes and ontology axioms of the classes (the axioms are used to implement deductive reasoning).
Further, in S1023, analyzing the natural resource industrial chain data stored in the relational database and dividing the industry into five major categories (resource pedigree, asset pedigree, capital pedigree, industry pedigree and technology pedigree) specifically comprises the following.
The resource pedigree covers the material resources of the natural resource industry that can be obtained directly and used for production and daily life, including land resources, mine resources, soil resources, ecological resources, inefficiently used and idle resources, and the subordinate resources contained in the resources of each field.
The asset pedigree covers the assets derived from natural resources in the natural resource industry, including characteristic towns, industrial parks, farmland infrastructure, platform systems, and the subordinate assets contained in each type of asset.
The industry pedigree covers the industries that depend on natural resources, including the land development industry, the modern agriculture industry, the soil restoration industry, the mine restoration and treatment industry, industrial park construction and operation, the natural resource related service industry, and their subordinate and downstream industries.
The capital pedigree mainly refers to industrial capital related to natural resources, including capital financing, land finance, asset finance, industrial financing, supply chain finance, private equity funds, government subsidies, and the subordinate content contained in each type of capital.
The technology pedigree covers the resource development, resource restoration and resource production technologies of the various natural resource industries; the related technologies applied to the natural resource industry in different fields and at different stages serve as subclasses of the development, restoration and production technologies.
Further, in S1024, creating the classes of the natural resource industrial chain ontology and the hierarchy, relationships, attributes and ontology axioms of the classes specifically comprises the following.
For class definition, five abstract classes are defined in the natural resource domain ontology: resource pedigree, technology pedigree, capital pedigree, asset pedigree and industry pedigree. Each abstract class defines subclasses at successive levels according to the different fields and classifications.
For relationship definition, the industrial chain contains a large number of upstream and downstream relationships and mutual information exchanges, so relationships such as "belonging", "forward driving" and "backward driving" exist between parent classes and child classes.
An ontology development tool (its name appears only as an image in the original publication) is used to create the determined classes and their hierarchies, relationships, attributes, instances and ontology axioms. The ontology development tool hides the specific ontology description language, so the user only needs to construct the domain ontology model at the conceptual level.
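The class hierarchy and the kind of deductive reasoning it enables can be sketched in a few lines. This is only an illustration under stated assumptions, not the ontology produced with the development tool: the five top-level pedigrees and the second-level classes come from the description above, while the third-level class "cultivated land" is hypothetical and is included only to show that "belonging" is transitive.

```python
# Parent of each class; the five abstract pedigrees have no parent.
SUBCLASS_OF = {
    "resource pedigree": None,
    "asset pedigree": None,
    "capital pedigree": None,
    "industry pedigree": None,
    "technology pedigree": None,
    "land resources": "resource pedigree",
    "mine resources": "resource pedigree",
    "industrial parks": "asset pedigree",
    "supply chain finance": "capital pedigree",
    "soil restoration industry": "industry pedigree",
    "resource restoration technology": "technology pedigree",
    # Hypothetical third-level class, added only to show transitivity.
    "cultivated land": "land resources",
}


def ancestors(cls):
    """Return every class that `cls` belongs to, nearest parent first."""
    chain = []
    parent = SUBCLASS_OF[cls]
    while parent is not None:
        chain.append(parent)
        parent = SUBCLASS_OF[parent]
    return chain


def belongs_to(cls, candidate):
    """Deduce membership through the transitive subclass axiom."""
    return candidate in ancestors(cls)


if __name__ == "__main__":
    print(ancestors("cultivated land"))
    print(belongs_to("cultivated land", "resource pedigree"))
```

The fact that "cultivated land" belongs to the resource pedigree is never stated directly in the table; it follows from two stored facts, which is the sense in which ontology axioms yield knowledge that was not entered explicitly.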
Further, in step S103, configuring the mapping between the domain ontology and the structured natural resource industrial chain data in the relational database specifically comprises the following.
The mapping tool OnTop is used to define the mapping rules between the two kinds of data. With OnTop, the relational database is accessed directly as if it were a knowledge graph: the structured data stored in MySQL is converted into RDF data and mapped, through a mapping file constructed in advance, onto the ontology defined in the ontology development tool. OnTop translates SPARQL queries expressed over the knowledge graph into SQL queries executed by the relational source, so the industrial chain database mapped to the ontology can be queried while the data stays in the relational database instead of being moved to another database. In this way, structured data is mapped through OnTop onto the ontology edited in the ontology development tool.
Further, the triple data is Resource Description Framework (RDF) triple data.
As an example, the data of the agricultural big data industry field stored in one table of the MySQL relational database is mapped onto the established natural resource industrial chain ontology as follows:
the field "serial number" in the agricultural and rural big data table stored in the MySQL database is mapped to the class "agricultural big data industry" in the ontology;
the field "case name" is mapped to the data attribute "Name_of_the_case" defined in the ontology;
the field "declaration unit" is mapped to the data attribute "The_notification" defined in the ontology.
The instance mapping can be implemented by installing the OnTop plug-in in the ontology development software and adding the following mapping statement in the mapping editor of OnTop Mappings:
:agricultural_rural_big_data/station{serial_number} a :agricultural_big_data_industry ;
    :Name_of_the_case {case_name}^^xsd:string ;
    :The_notification {declaration_unit}^^xsd:string .
Further, exporting the structured data stored in the database as triple data based on the mapping, and taking the triple data as the data layer of the natural resource industrial chain knowledge graph, specifically comprises the following.
The structured data in the relational database is exported as data in RDF triple format using the materialize command of OnTop.
The materialize command provides a materialization procedure in which the mapping rules generate RDF data from the database.
The materialize command takes all triples that the mapping can generate from the data source and outputs them. The user chooses among three output formats: Turtle, N-Triples or RDF/XML. For very large datasets, generating the output may take some time.
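What materialization does can be illustrated with a small, self-contained sketch. It does not call OnTop and is not its implementation; it only shows how a mapping rule turns database rows into N-Triples lines. The ontology IRI is the one used in this description, the attribute names Name_of_the_case and The_notification are those named above, and the subject template, the class identifier and the column names (written with underscores) are assumptions made for the example.

```python
from urllib.parse import quote

BASE = "http://www.semanticweb.org/dell/ontologies/2022/1/untitled-ontology-2#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

# Hypothetical mapping rule: which column feeds which part of the triples.
MAPPING = {
    "subject_template": "agricultural_rural_big_data/station{serial_number}",
    "class": "agricultural_big_data_industry",
    "data_properties": {
        "Name_of_the_case": "case_name",
        "The_notification": "declaration_unit",
    },
}


def escape_literal(value):
    """Escape the characters that N-Triples requires to be escaped."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def materialize(rows, mapping=MAPPING, base=BASE):
    """Generate N-Triples lines for every row that the mapping covers."""
    triples = []
    for row in rows:
        # Percent-encode column values before placing them in the IRI.
        safe_values = {key: quote(str(val), safe="") for key, val in row.items()}
        subject = "<" + base + mapping["subject_template"].format(**safe_values) + ">"
        class_iri = "<" + base + mapping["class"] + ">"
        triples.append(f"{subject} <{RDF_TYPE}> {class_iri} .")
        for prop, column in mapping["data_properties"].items():
            value = row.get(column)
            if value is None:  # a NULL column produces no triple
                continue
            literal = escape_literal(str(value))
            triples.append(
                f'{subject} <{base}{prop}> "{literal}"^^<{XSD_STRING}> .'
            )
    return triples


if __name__ == "__main__":
    sample_rows = [
        {"serial_number": 1, "case_name": "Smart orchard", "declaration_unit": "Unit A"},
    ]
    print("\n".join(materialize(sample_rows)))
```

Each row yields one type triple plus one triple per non-NULL mapped column, which is why the output of a materialization grows with both the number of rows and the number of mapped attributes.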
After a custom mapping statement has been added in OnTop Mappings and the OnTop plug-in has been added to the ontology editing software, the "materialize triples" command is located. The new structured data in the database is mapped onto the ontology developed in the ontology editing software and converted into corresponding data in RDF triple format, and an external file is created to store the RDF triples.
After the structured data stored in the database has been mapped by OnTop onto the developed ontology and converted into data in RDF triple format, the data serves as the data layer (ABox) of the knowledge graph. The data layer contains the extensional knowledge and describes the specified individuals in the domain of discourse. The natural resource industrial chain knowledge graph is built by combining the constructed ontology, as the schema layer of the knowledge graph, with the RDF triples, as its data layer.
With the custom mapping statement added in OnTop Mappings, OnTop SPARQL in the OnTop plug-in of the ontology editing software can run SPARQL queries against the structured data stored in the relational database, and the mapping result can be verified with a SPARQL query.
As an example, based on the mapping statement added in the mapping editor of OnTop Mappings in the example of S103, after the OnTop plug-in has been added to the ontology editing software, a SPARQL query statement is edited in the SPARQL query editor of OnTop SPARQL to query the data in the relational database. The SPARQL query verifies which of the structured data stored in the relational database is mapped to the class "agricultural big data industry" in the ontology as instances of that class, which data is mapped to the data attribute "Name_of_the_case" defined in the ontology as instances of that attribute, and which data is mapped to the data attribute "The_notification" defined in the ontology as instances of that attribute.
The SPARQL statement that queries the mapping result is:
PREFIX : <http://www.semanticweb.org/dell/ontologies/2022/1/untitled-ontology-2#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xml: <http://www.w3.org/XML/1998/namespace>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?serial_number ?case_name ?declaration_unit
WHERE {
    ?serial_number a :agricultural_big_data_industry ;
        :Name_of_the_case ?case_name ;
        :The_notification ?declaration_unit ;
}
Further, in S104, acquiring a natural language question about the natural resource industry, searching the natural resource industrial chain knowledge graph for an answer to the question, and outputting the answer corresponding to the question specifically comprises:
S1041: acquiring a natural language question about the natural resource industry;
S1042: performing word segmentation, part-of-speech tagging and keyword extraction on the natural language question;
S1043: matching the keywords against the predefined question templates, and selecting the question template with the highest similarity;
S1044: looking up the query statement corresponding to the question template with the highest similarity, querying the natural resource industrial chain knowledge graph with that query statement, and outputting the final query result through a response function.
Further, in S1042, the word segmentation, part-of-speech tagging and keyword extraction on the natural language question are performed with the Chinese natural language processing tool HanLP.
Further, in step S1043, the keywords are matched against the predefined question templates with a similarity matching algorithm, and the question template with the highest similarity is selected. The similarity matching algorithm is the cosine similarity algorithm.
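The template matching of S1043 can be sketched as follows. This is a minimal illustration, assuming the keywords have already been extracted (the HanLP step of S1042 is not shown) and that each template is represented by a bag of keywords. The template names and keyword lists are hypothetical.

```python
import math
from collections import Counter

# Hypothetical question templates, each described by a bag of keywords.
TEMPLATES = {
    "cases_in_industry": ["industry", "typical", "case"],
    "unit_of_case": ["case", "declaration", "unit"],
    "upstream_of_industry": ["industry", "upstream"],
}


def cosine_similarity(words_a, words_b):
    """Cosine of the angle between two term-frequency vectors."""
    vec_a, vec_b = Counter(words_a), Counter(words_b)
    dot = sum(vec_a[term] * vec_b[term] for term in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(count * count for count in vec_a.values()))
    norm_b = math.sqrt(sum(count * count for count in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty keyword list matches nothing
    return dot / (norm_a * norm_b)


def best_template(keywords, templates=TEMPLATES):
    """Return the name and score of the most similar question template."""
    scores = {
        name: cosine_similarity(keywords, words)
        for name, words in templates.items()
    }
    name = max(scores, key=scores.get)
    return name, scores[name]


if __name__ == "__main__":
    question_keywords = ["agricultural", "industry", "typical", "case"]
    print(best_template(question_keywords))
```

In S1044 the selected template name would then be used to look up its stored query statement; restricting the search to a fixed set of templates is what narrows the search range.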
Further, the method further comprises S105: visualizing and analyzing the natural resource industry data based on the structured data.
Further, the triple data is stored in an Apache Jena Fuseki (apache-jena-fuseki) server, which serves as the data source of the question-answering system. The server also receives query requests; after receiving a request, it runs the corresponding SPARQL query on the RDF triples it stores and returns the query result.
The Apache Jena Fuseki server is configured as follows: the two packages apache-jena and apache-jena-fuseki are downloaded, their paths are written into the system environment variables, and fuseki-server is run. The web service can then be viewed in a browser at localhost:3030.
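A client of the question-answering system could send its SPARQL queries to the server over HTTP, as in the following sketch. It follows the standard SPARQL protocol (a GET request with a `query` parameter and a JSON results format); the dataset name `industry_chain` and the service path are assumptions that depend on how the Fuseki dataset is configured, and `run_query` only works while a server is actually running.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical dataset name; the host and port are those given above.
FUSEKI_ENDPOINT = "http://localhost:3030/industry_chain/sparql"


def build_sparql_request(query, endpoint=FUSEKI_ENDPOINT):
    """Build a SPARQL-protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def bindings_to_rows(result):
    """Flatten a SPARQL JSON result into a list of plain dictionaries."""
    variables = result["head"]["vars"]
    return [
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in result["results"]["bindings"]
    ]


def run_query(query, endpoint=FUSEKI_ENDPOINT):
    """Send the query to the server and return the flattened rows."""
    request = build_sparql_request(query, endpoint)
    with urllib.request.urlopen(request) as response:
        return bindings_to_rows(json.load(response))


if __name__ == "__main__":
    request = build_sparql_request("SELECT * WHERE { ?s ?p ?o } LIMIT 5")
    print(request.full_url)  # the URL that would be requested
```

Separating request construction from sending keeps the query-building logic testable without a server, and `bindings_to_rows` gives the response function a simple structure to format as an answer.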
Further, in step S105, visualizing and analyzing the natural resource industry data based on the structured data specifically comprises the following.
The natural resource industrial chain knowledge graph is stored, retrieved and queried in visual form.
The industrial chain knowledge graph is stored in a database, and the industrial chain knowledge can be updated and visualized remotely through a browser.
Illustratively, S105, visualizing and analyzing the natural resource industry data based on the structured data, specifically comprises:
uploading, through Python and in TXT or CSV data format, the structured and semi-structured natural resource industry chain data obtained in the data collection and extraction stage to a Neo4j graph database deployed on an Alibaba Cloud server;
the visualized natural resource industry chain network graph displays nodes and relations; the back end abstracts the industry chain information into nodes and relations to construct a visual network, which goes beyond traditional forms of knowledge by organizing industry data into a rich, multi-type and multi-dimensional form and displaying the expressed knowledge as a drawn graph, thereby helping people analyze the industry chain data.
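The CSV upload step above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the column names `head`, `relation` and `tail`, the node label `Entity`, and the idea of generating Cypher `MERGE` statements are all assumptions. The sketch only prepares statements and parameters; executing them would require a Neo4j client, which is not shown.

```python
import csv
import io
import re


def rows_to_cypher(csv_text):
    """Turn head,relation,tail rows into (cypher, params) pairs.

    Column names and the `Entity` label are hypothetical. Relationship types
    cannot be passed as Cypher parameters, so the type is sanitized and
    interpolated, while node names are passed as parameters.
    """
    statements = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Keep only word characters so the relationship type is a valid identifier.
        rel_type = re.sub(r"\W", "_", row["relation"].strip()).upper()
        cypher = (
            "MERGE (a:Entity {name: $head}) "
            "MERGE (b:Entity {name: $tail}) "
            f"MERGE (a)-[:{rel_type}]->(b)"
        )
        statements.append((cypher, {"head": row["head"], "tail": row["tail"]}))
    return statements
```

Using `MERGE` rather than `CREATE` keeps repeated uploads from duplicating nodes, which suits a graph that is updated remotely over time.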
A link is placed in the question-answering system; clicking it automatically opens the remote access server website in a browser installed on the computer, so that the user can enter the Neo4j database in which the industry chain knowledge graph is stored.
The user can enter the user name (Username) neo4j and the password (Password) neo4j to access the database and query or update it. For example, a query can be run from the node "road renovation industry pedigree in the field" to the node "agricultural preparation industry pedigree".
It should be understood that the natural resource industry chain intelligent question answering accurately locates the knowledge the user needs through questions and answers, and provides personalized information services through interaction with the website user.
The present invention is implemented on the Python 3.9 platform and uses MySQL 5.1.22, Navicat Premium 15, [tool name shown as an image in the original] 3.4.3, apache-jena-4.4.0, apache-jena-fuseki-4.4.0, neo4j-community-4.2.7 and ontop-cli-4.2.0. The technical scheme is illustrated with a user entering, on the interactive interface, the question "What are the typical cases of the domestic agricultural and rural big data industry?"
For example, a user first logs in to the natural resource industry chain question-answering system interface at the client and enters the question in the dialog box: "What are the typical cases of the domestic agricultural and rural big data industry?"
After the system receives the natural language question, it uses HanLP to perform word segmentation and part-of-speech tagging on the question and matches the obtained keywords "agricultural and rural big data" and "typical case" against the predefined question templates using the matching algorithm. The SPARQL query statement corresponding to the question template with the highest matching degree is sent as a query request to the apache-jena-fuseki server, and after the server receives the query request, the query result is fed back to the front-end query interface.
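The step from a matched question template to a SPARQL query statement can be sketched as below. The `ex:` namespace, the property names `ex:name` and `ex:typicalCase`, and the template name are hypothetical; the patent does not disclose the actual ontology vocabulary or query text.

```python
from string import Template

# Hypothetical namespace; the real ontology IRI is not given in the source.
PREFIX = "PREFIX ex: <http://example.org/industry#>\n"

# Hypothetical mapping: question template name -> parameterized SPARQL query.
SPARQL_TEMPLATES = {
    "typical_cases": Template(
        PREFIX
        + 'SELECT ?case WHERE { ?industry ex:name "$entity" ; ex:typicalCase ?case . }'
    ),
}


def escape_literal(text):
    """Escape backslashes and double quotes so the text is safe inside a SPARQL string."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_query(template_name, entity):
    """Fill the entity keyword into the SPARQL query of the matched template."""
    return SPARQL_TEMPLATES[template_name].substitute(entity=escape_literal(entity))
```

Calling `build_query("typical_cases", "agricultural and rural big data")` yields a query string that could then be sent to the SPARQL endpoint; escaping the entity guards against malformed queries when the user's keyword contains quotation marks.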
For the upstream and downstream industry chain information of the natural resource industry collated by data mining technology, the invention uses knowledge graphs from the field of artificial intelligence to describe and visualize industry chain knowledge resources, analyzes and mines the relationships among the knowledge in the industry chain knowledge graph, and constructs an intelligent question-answering system for the natural resource industry chain field. This helps enterprises determine the competitive relations in each link of the natural resource industry, analyze the classification of important natural resources, the upstream and downstream industry chain relations and the industrial ecosystem, and improve their natural resource investment models and investment layout strategies.
An industry chain is a chain-type form of association: industries are linked on the basis of certain technical and economic relations, according to specific logical relations and spatio-temporal layout relations. An industry chain also contains extensive upstream and downstream relations and exchanges of mutual value: an upstream link delivers products or services to a downstream link, and the downstream link feeds information back to the upstream link. In order to quickly retrieve targeted answers to different industry chain questions, a natural resource industry chain intelligent question-answering system is developed on the basis of the natural resource industry chain knowledge graph established so far. Natural language processing techniques are applied to extract knowledge from industry chain data and construct the industry chain knowledge graph, and semantic search and question-answering techniques are applied on top of the knowledge graph to provide industry chain semantic search and intelligent question-answering services. On the basis of accurately identifying the intent of the user's question, the system can find the corresponding answer in the knowledge graph, supports question answering over several kinds of data such as entities, attributes and relations, and can trace industrial knowledge back to its source. A structured industry chain knowledge graph can provide more accurate answers for an industry chain question-answering system, and related answers can be expanded conveniently by relying on the associations between industry chain entities in the knowledge graph; question answering based on the industry chain knowledge graph is therefore a standard component of an industry chain intelligent question-answering system.
The invention also comprises a function of visualizing the knowledge stored in the knowledge graph of the natural resource industry chain. By displaying the knowledge of the structured representation in a graphical mode, the industrial knowledge is expressed in a mode closer to human understanding and cognition, and the capability of better organizing and displaying mass information is provided, so that more complex relationships and more unique and valuable information can be known from the knowledge.
In a second aspect, an intelligent question-answering system based on a knowledge graph of a natural resource industry chain is provided;
The intelligent question-answering system based on the natural resource industry chain knowledge graph comprises:
an acquisition module configured to: acquiring natural resource industrial data; the natural resource industry data comprises: semi-structured data and structured data; converting the semi-structured data into new structured data; storing the new structured data and the original structured data in a relational database;
an ontology building module configured to: analyzing all the structured data to construct a domain ontology of the natural resource industry chain; taking the domain ontology as the schema layer of the natural resource industry chain knowledge graph;
a mapping configuration module configured to: configuring a mapping relation between the domain ontology and the structured natural resource industry chain data in the relational database; exporting the structured data stored in the database as triple data based on the mapping relation, and taking the triple data as the data layer of the natural resource industry chain knowledge graph; thereby constructing the natural resource industry chain knowledge graph;
an answer output module configured to: acquiring a natural language question concerning the natural resource industry, searching for an answer to the natural language question based on the natural resource industry chain knowledge graph, and outputting the answer corresponding to the natural language question.
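To make the data layer concrete, the following Python sketch shows the general shape of exporting one relational row as RDF triples in N-Triples-like syntax. It is only an illustration of the output format: the base IRI, the `id` key column and the sample column names are hypothetical, and the system described here obtains its triples through the mapping relation rather than through code like this.

```python
def row_to_triples(row, base="http://example.org/industry/"):
    """Convert one relational row (a dict) into N-Triples-style lines.

    Assumes a hypothetical `id` column as the primary key; every other
    non-null column becomes a predicate with a plain string literal as object.
    """
    subject = f"<{base}{row['id']}>"
    triples = []
    for column, value in row.items():
        if column == "id" or value is None:
            continue  # the key forms the subject; NULL columns yield no triple
        literal = str(value).replace("\\", "\\\\").replace('"', '\\"')
        triples.append(f'{subject} <{base}{column}> "{literal}" .')
    return triples
```

Each row thus contributes one subject with one triple per populated column, which is the form the triple store later serves to SPARQL queries.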
Further, acquiring a natural language question concerning the natural resource industry, searching for an answer to the natural language question based on the natural resource industry chain knowledge graph, and outputting the corresponding answer specifically comprises:
acquiring a natural language question concerning the natural resource industry;
performing word segmentation, part-of-speech tagging and keyword extraction on the natural language question;
matching the keywords against predefined question templates, and selecting the question template with the highest similarity;
retrieving the query statement corresponding to the question template with the highest similarity, querying the natural resource industry chain knowledge graph with that query statement, and outputting the final query result through a response function.
In the intelligent question-answering system based on the natural resource industry chain knowledge graph, the functional architecture is divided into a user side and an administrator side. The technical architecture of the system is divided into three parts: front end, back end and algorithm.
The front end is developed with Vue, a current mainstream framework. Vue is a progressive framework for building user interfaces; it provides reactive data binding and composable view components through an API that is as simple as possible, which can greatly improve front-end development efficiency. The interface provides functions such as intelligent question answering and intelligent topic recommendation; the page design is simple and attractive and responds quickly, giving users a good experience, and it can resolve most of the users' questions about the natural resource industry chain.
The back end is the developers' management page. It provides administrators with functions such as document uploading, visual charts of the database contents, and viewing of user questions, making it convenient for staff to analyze the questions raised by all users as a whole. The main back-end framework is Flask, a lightweight Web application framework written in Python. Flask is flexible, light and simple, and is well suited to low-cost development. Flask is essentially a core: almost all other functions are provided through third-party extensions, which makes the framework highly customizable. The invention implements some basic function modules in Flask, such as knowledge question answering between the user and the website robot, and functions such as automatic completion and accurate questioning can be realized in the question-answering system by using an existing chat robot API.
In terms of the algorithm, a similarity matching algorithm is used to analyze the degree of matching between the question and the predefined templates, and the question template with the highest similarity is selected. The approach is as follows:
a text matching algorithm from the field of natural language processing is selected to perform similarity matching between the natural language question and the predefined questions. Such algorithms are mainly used in search engines, question-answering systems and the like, and aim to find the text most relevant to a target text. The query statement is matched according to the keywords; after a successful match, the corresponding SPARQL query for the question template with the highest matching degree is executed and the query result is returned.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (3)

1. An intelligent question-answering method based on a natural resource industry chain knowledge graph is characterized by comprising the following steps:
acquiring natural resource industrial data; wherein the natural resource industry data comprises semi-structured data and structured data; converting the semi-structured data into new structured data; storing the new structured data and the original structured data in a relational database;
analyzing all the structured data to construct a domain ontology of the natural resource industry chain; taking the domain ontology as the schema layer of the natural resource industry chain knowledge graph;
configuring a mapping relation between the domain ontology and the structured natural resource industry chain data in the relational database; exporting the structured data stored in the database as triple data based on the mapping relation, and taking the triple data as the data layer of the natural resource industry chain knowledge graph; thereby constructing the natural resource industry chain knowledge graph;
acquiring a natural language question concerning the natural resource industry, searching for an answer to the natural language question based on the natural resource industry chain knowledge graph, and outputting the answer corresponding to the natural language question;
converting the semi-structured data into new structured data; the method specifically comprises the following steps:
removing noise from the unstructured data in web page content by means of a web crawler tool;
importing the structured data obtained after noise removal into the relational database MySQL, thereby converting the semi-structured data into structured data;
analyzing all the structured data, and constructing a domain ontology of a natural resource industrial chain; the method specifically comprises the following steps:
manually extracting, analyzing and summarizing the structured data to obtain the terms, basic concepts and relations among concepts in the natural resource industry field, and finally constructing a natural resource industry chain domain ontology, which serves as the schema layer of the natural resource industry chain knowledge graph and defines the concepts and relations of the ontology;
analyzing all the structured data to construct a domain ontology of a natural resource industrial chain; the method specifically comprises the following steps:
constructing the natural resource industry chain ontology in a top-down manner;
analyzing natural resource industrial chain data stored in a relational database, and determining common concepts and terms of fields required to be covered by ontology construction;
analyzing natural resource industry chain data stored in a relational database; the industry is divided into resource pedigree, asset pedigree, capital pedigree, industry pedigree and technology pedigree;
creating the classes of the natural resource industry chain ontology and the hierarchy, relations, attributes and ontology axioms of the classes;
creating the classes of the natural resource industry chain ontology and the hierarchy, relations, attributes and ontology axioms of the classes specifically comprises:
for the definition of the classes, a resource pedigree, a technology pedigree, a capital pedigree, an asset pedigree and an industry pedigree are defined in the natural resource domain ontology, and each abstract class defines various subclasses according to different fields and different classifications;
for the definition of the relations, since the industry chain contains extensive upstream and downstream relations and exchanges of mutually valuable information, parent classes and child classes have the relations "belonging", "forward driving" and "backward driving";
configuring a mapping relation between the domain ontology and the structured natural resource industrial chain data in the relational database; the method specifically comprises the following steps:
using the mapping tool OnTop to define the mapping rules between the two kinds of data; using the mapping tool OnTop to access the relational database directly in a knowledge graph access mode, converting the structured data stored in MySQL into RDF data, and mapping the RDF data, through a mapping file constructed in advance, onto the ontology defined in [tool name shown as an image in the original]; the mapping tool OnTop can directly translate a SPARQL statement expressed over the knowledge graph into an SQL query executed on the relational source, and thereby query the information in the industry chain database mapped to the ontology, while the data remains in the relational database instead of being moved to another database; the structured data is mapped by the mapping tool OnTop onto the ontology edited in the ontology development tool [tool name shown as an image in the original];
exporting the structured data stored in the database as triple data based on the mapping relation, and taking the triple data as the data layer of the natural resource industry chain knowledge graph; specifically comprising:
exporting the structured data in the relational database as data in RDF triple format by using the materialize command of OnTop;
acquiring a natural language question concerning the natural resource industry, searching for an answer to the natural language question based on the natural resource industry chain knowledge graph, and outputting the answer corresponding to the natural language question; specifically comprising:
acquiring a natural language question concerning the natural resource industry;
performing word segmentation, part-of-speech tagging and keyword extraction on the natural language question;
matching the keywords against predefined question templates, and selecting the question template with the highest similarity;
matching the keywords against predefined question templates, and selecting the question template with the highest similarity; specifically, the similarity is calculated with a similarity matching algorithm; the similarity matching algorithm is the cosine similarity algorithm;
retrieving the query statement corresponding to the question template with the highest similarity, querying the natural resource industry chain knowledge graph with that query statement, and outputting the final query result through a response function;
the method further comprises the following steps: based on the structured data, performing visualization and analysis on the natural resource industrial data;
the triple data is stored in an Apache-jena-fuseki server as the data source of the question-answering system; the server is also responsible for receiving query requests, and after receiving a request, it executes the corresponding SPARQL query on the RDF triples it stores; the server is likewise responsible for returning the query result;
based on the structured data, visualizing and analyzing the natural resource industry data; specifically comprising:
realizing visual storage, retrieval and querying of the knowledge graph of the natural resource industry knowledge chain;
storing the industry chain knowledge graph in a database, and updating and visualizing the industry chain knowledge remotely through a browser;
an upstream link in the industry chain delivers products or services to a downstream link, and the downstream link feeds information back to the upstream link, that is, upstream and downstream relations and exchanges of mutually valuable information exist in the industry chain; the inference function of the knowledge graph is used to reason over the natural resource industry knowledge stored in the industry chain knowledge graph, and new, not yet mined knowledge is obtained from the known natural resource industry knowledge and facts, so that the upstream and downstream industry chain relations of the natural resource industry are sorted out clearly, the competitive relations in each link of the natural resource industry are made clear through the natural resource industry chain relations, and the dominant industrial direction of the natural resource industry sector is made clear; finally, the natural resource industrial layout is optimized and the industry chain is improved.
2. The intelligent question-answering system based on the knowledge graph of the natural resource industry chain, which adopts the intelligent question-answering method based on the knowledge graph of the natural resource industry chain according to claim 1, is characterized by comprising:
an acquisition module configured to acquire natural resource industry data; wherein the natural resource industry data comprises semi-structured data and structured data; converting the semi-structured data into new structured data; storing the new structured data and the original structured data in a relational database;
an ontology construction module configured to analyze all the structured data and construct a domain ontology of the natural resource industry chain; taking the domain ontology as the schema layer of the natural resource industry chain knowledge graph;
a mapping relation configuration module configured to configure a mapping relation between the domain ontology and the structured natural resource industry chain data in the relational database; exporting the structured data stored in the database as triple data based on the mapping relation, and taking the triple data as the data layer of the natural resource industry chain knowledge graph; thereby constructing the natural resource industry chain knowledge graph;
and an answer output module configured to acquire a natural language question concerning the natural resource industry, search for an answer to the natural language question based on the natural resource industry chain knowledge graph, and output the answer corresponding to the natural language question.
3. The intelligent question-answering system based on the natural resource industry chain knowledge graph as claimed in claim 2, wherein acquiring a natural language question concerning the natural resource industry, searching for an answer to the natural language question based on the natural resource industry chain knowledge graph, and outputting the answer corresponding to the natural language question specifically comprises:
acquiring a natural language question concerning the natural resource industry;
performing word segmentation, part-of-speech tagging and keyword extraction on the natural language question;
matching the keywords against predefined question templates, and selecting the question template with the highest similarity;
and retrieving the query statement corresponding to the question template with the highest similarity, querying the natural resource industry chain knowledge graph with that query statement, and outputting the final query result through a response function.
CN202210536817.5A 2022-05-18 2022-05-18 Intelligent question-answering method and system based on natural resource industrial chain knowledge graph Active CN114637766B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210536817.5A CN114637766B (en) 2022-05-18 2022-05-18 Intelligent question-answering method and system based on natural resource industrial chain knowledge graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210536817.5A CN114637766B (en) 2022-05-18 2022-05-18 Intelligent question-answering method and system based on natural resource industrial chain knowledge graph

Publications (2)

Publication Number Publication Date
CN114637766A CN114637766A (en) 2022-06-17
CN114637766B true CN114637766B (en) 2022-08-26

Family

ID=81953194

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210536817.5A Active CN114637766B (en) 2022-05-18 2022-05-18 Intelligent question-answering method and system based on natural resource industrial chain knowledge graph

Country Status (1)

Country Link
CN (1) CN114637766B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004127003A (en) * 2002-10-03 2004-04-22 Nippon Telegr & Teleph Corp <Ntt> Question-answering method, question-answering device, question-answering program, and storage medium
WO2016156995A1 (en) * 2015-03-30 2016-10-06 Yokogawa Electric Corporation Methods, systems and computer program products for machine based processing of natural language input
CN109598384A (en) * 2018-12-06 2019-04-09 同方知网(北京)技术有限公司 A kind of agricultural industry innovation service map construction system
CN111444351A (en) * 2020-03-24 2020-07-24 清华苏州环境创新研究院 Method and device for constructing knowledge graph in industrial process field
CN112507691A (en) * 2020-12-07 2021-03-16 数地科技(北京)有限公司 Interpretable financial subject matter generating method and device fusing emotion, industrial chain and case logic
CN113806513A (en) * 2021-09-30 2021-12-17 中国人民解放军国防科技大学 Question-answering system construction method and system based on knowledge graph in military field
CN113918728A (en) * 2021-09-28 2022-01-11 安徽国科信通科技有限公司 Industrial Internet post-service knowledge map analysis platform
WO2022051996A1 (en) * 2020-09-10 2022-03-17 西门子(中国)有限公司 Method and apparatus for constructing knowledge graph
CN114328949A (en) * 2021-11-30 2022-04-12 德邦证券股份有限公司 Enterprise risk conduction analysis method and device based on knowledge graph

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9785671B2 (en) * 2013-07-15 2017-10-10 Capricorn Holdings Pte. Ltd. Template-driven structured query generation
US10599644B2 (en) * 2016-09-14 2020-03-24 International Business Machines Corporation System and method for managing artificial conversational entities enhanced by social knowledge
CN109492077B (en) * 2018-09-29 2020-09-29 北京智通云联科技有限公司 Knowledge graph-based petrochemical field question-answering method and system
CN109766417B (en) * 2018-11-30 2020-11-24 浙江大学 Knowledge graph-based literature dating history question-answering system construction method
CN111694968B (en) * 2020-06-15 2024-02-09 北京工商大学 Fresh food supply chain knowledge graph construction method based on semi-structured data
CN113407688B (en) * 2021-06-15 2022-09-16 西安理工大学 Method for establishing knowledge graph-based survey standard intelligent question-answering system
CN114490964A (en) * 2021-12-22 2022-05-13 安徽省农业科学院农业经济与信息研究所 Soil fertility knowledge question-answering method, system, equipment and medium based on knowledge map

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Graph-based reasoning in collaborative knowledge management for industrial maintenance;Bernard Kamsu-Foguem等;《Computers in Industry》;20130717;第998-1013页 *
基于本体的油茶中文知识图谱构建与应用;丁浩宸等;《世界林业研究》;20200601(第04期);第50-55页 *

Also Published As

Publication number Publication date
CN114637766A (en) 2022-06-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant