CN113407688A

CN113407688A - Method for establishing knowledge graph-based survey standard intelligent question-answering system

Info

Publication number: CN113407688A
Application number: CN202110658780.9A
Authority: CN
Inventors: 何敏; 赵立洁; 姚旭豪; 赵钦
Original assignee: Xian University of Technology
Current assignee: Xian University of Technology
Priority date: 2021-06-15
Filing date: 2021-06-15
Publication date: 2021-09-17
Anticipated expiration: 2041-06-15
Also published as: CN113407688B

Abstract

The invention discloses a method for establishing a knowledge graph-based intelligent question-answering system for survey specifications. The method comprises: first designing the survey specification data, including the entity sets, the attributes of the entities and an entity-relationship (ER) diagram of the relationships among the entities for a relational database, and collecting and storing the survey specification data in Excel tables; combining the survey specification data with the ER diagram to construct a survey specification relational database; constructing an ontology of the survey specification domain by a top-down method; creating a mapping file; converting the mapping file into RDF format data using the dump-rdf conversion tool provided by D2RQ, and then storing, managing and reasoning over the resulting data; using regular expressions to segment, combine and match preset questions at the string level, thereby realizing combined string queries, that is, question input; and finally realizing visual display. The invention solves the prior-art problem that manually looking up specifications is inefficient.

Description

Method for establishing knowledge graph-based survey standard intelligent question-answering system

Technical Field

The invention belongs to the technical field of industry knowledge graph question-answering systems, and particularly relates to a method for establishing a knowledge graph-based survey standard intelligent question-answering system.

Background

A question answering system (QA system) is an advanced form of information retrieval system that answers questions posed by users in natural language with accurate and concise natural language. Different applications require different forms of question-answering system, which in turn use different corpora and techniques. By knowledge domain, question-answering systems can be classified as "domain-specific" or "open-domain". Domain-specific systems focus on answering questions in a particular domain, such as medicine, sports or government affairs. Open-domain systems place no limit on the scope of questions, which may concern any subject, from astronomy to geography.

A knowledge graph is a structured semantic knowledge base, usually represented in the form of triples, that symbolically describes concepts in the physical world and their interrelationships. A question-answering system built on a knowledge graph is also called KBQA (Knowledge Base Question Answering) and is one kind of domain-specific question-answering system. Given a natural language question, such a system performs semantic understanding and analysis of the question and then queries and reasons over the knowledge graph to obtain the answer. Compared with an open-domain question-answering system, a knowledge graph-based system can store the related entities of one or more domains, reason and deduce over the knowledge graph, and answer user questions in depth. Constructing a knowledge graph requires ontology modeling; an ontology model can be built top-down or bottom-up, and the top-down approach is usually chosen for domain-specific knowledge bases.

In recent years, the demand for KBQA has been increasing, and knowledge graph-based question-answering systems have been widely used in various fields, particularly the medical field. For example, the patent with application number CN202011067875.5 relates to the comprehensive self-management of hypertensive patients, and the patent with application number CN202010047420.0 relates to question answering over electronic medical records. However, little research has addressed building codes and specifications. With the expansion of construction projects in loess areas, survey work inevitably encounters a large number of problems that require the relevant specifications to be consulted. Conventionally this is done manually, which creates a heavy workload and hinders the survey work.

Disclosure of Invention

The invention aims to provide a method for establishing a knowledge graph-based intelligent question-answering system for survey specifications, which solves the prior-art problem that manually looking up specifications is inefficient.

The technical scheme adopted by the invention is a method for establishing a knowledge graph-based intelligent question-answering system for survey specifications, implemented according to the following steps:

step 1, establishing an ER relationship: designing survey specification data according to the survey specifications, wherein the survey specification data comprises the entity sets, the attributes of the entities and an entity-relationship (ER) diagram of the relationships among the entities for a relational database, and collecting and storing the survey specification data in Excel tables;

step 2, database construction: combining the survey specification data collected in the step 1 with the entity-relationship (ER) diagram of the relational database to construct a survey specification relational database;

step 3, ontology modeling: constructing an ontology of the survey specification domain by a top-down method based on the survey specification relational database obtained in the step 2;

step 4, creating mapping: using the D2RQ tool to generate a mapping.ttl mapping file from the survey specification relational database obtained in the step 2, according to the mapping standard established by the World Wide Web Consortium (W3C);

step 5, converting the mapping file obtained in the step 4 into RDF format data using the dump-rdf conversion tool provided by D2RQ, and storing the data in the default 'N-TRIPLE' format;

step 6, data storage and management: importing the RDF format data through the network interface of the Jena Fuseki component, persisting the RDF format data as TDB format files, running fuseki-server.bat, and then exiting;

step 7, rule reasoning: performing knowledge reasoning on the RDF format data of the step 6 using the Jena OWLFBRuleReasoner tool in combination with the survey specification domain ontology file constructed in the step 3;

step 8, regular semantic analysis: using regular expressions to segment, combine and match preset questions at the string level, thereby realizing combined string queries, that is, question input;

step 9, building the question-answering system in Python: using the Python programming language to interact with the RDF format data, and visually displaying the results of RDF data retrieval and reasoning.
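To make steps 4 and 5 more concrete, the following minimal sketch shows how one relational row could be turned into N-Triples lines. It is only an illustration of the row-to-triple idea, not the D2RQ implementation; the namespace is taken from the IRI mentioned in step 4.2, while the table name, column names and IRI layout are hypothetical.

```python
# Namespace from step 4.2; the IRI layout below is an assumption for illustration.
BASE = "http://www.kancha.com#"


def escape_literal(text: str) -> str:
    """Escape the characters that must be escaped inside an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def row_to_ntriples(table: str, row: dict) -> list[str]:
    """Emit one N-Triples line per non-key column of a relational row."""
    subject = f"<{BASE}{table}/{row['id']}>"
    lines = []
    for column, value in row.items():
        if column == "id":  # the key only identifies the subject
            continue
        predicate = f"<{BASE}{column}>"
        lines.append(f'{subject} {predicate} "{escape_literal(str(value))}" .')
    return lines


if __name__ == "__main__":
    # Hypothetical row of a hypothetical "criterion" table.
    for line in row_to_ntriples("criterion", {"id": 1, "name": "Site survey"}):
        print(line)
```

Each output line is a complete subject, predicate, object statement terminated by a period, which is what makes the N-Triples format easy to load in bulk in step 6.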

The present invention is also characterized in that,

the step 1 is as follows:

step 1.1, determining related standard fields and ranges;

step 1.2, organizing the content into three parts, namely standard type, standard content and standard rule, and at the same time unifying the correspondence between standard type and standard content and the correspondence between standard content and standard rule;

step 1.3, establishing a standard type data table, a standard content data table, a standard rule data table, a standard content and standard type corresponding relation data table and a standard rule and standard content corresponding relation data table in the Excel table.

The step 2 is as follows:

step 2.1, opening Navicat Premium 15, a visual operation platform for the open-source MySQL database, connecting Navicat Premium 15 to the MySQL database, and creating an empty survey specification relational database;

step 2.2, importing the standard type data table, the standard content data table, the standard rule data table, the standard content and standard type corresponding relation data table and the standard rule and standard content corresponding relation data table constructed in the step 1 into the survey specification relational database newly created in the step 2.1;

step 2.3, setting the field types and field lengths and adding primary keys according to the relational data imported in the step 2.2; using the index and foreign key commands in Navicat Premium 15 to set the index relation of the standard content and standard type corresponding relation data table and the index relation of the standard rule and standard content corresponding relation data table, thereby completing the construction of the survey specification relational database;

step 2.4, saving the survey specification relational database constructed in the step 2.3 as an .sql script file.
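The table layout described in steps 1.3 and 2.1 to 2.3 can be sketched as follows. This is a minimal sketch that substitutes Python's built-in sqlite3 module for MySQL and Navicat so that it runs without a server; all table and column names are hypothetical, chosen to mirror the five data tables, their primary keys, foreign keys and indexes.

```python
import sqlite3

# Three entity tables plus two correspondence tables (names are assumptions).
SCHEMA = """
CREATE TABLE standard_type    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE standard_content (id INTEGER PRIMARY KEY, name TEXT NOT NULL, requirement TEXT);
CREATE TABLE standard_rule    (id INTEGER PRIMARY KEY, name TEXT NOT NULL, content TEXT);
CREATE TABLE content_to_type (
    content_id INTEGER NOT NULL REFERENCES standard_content(id),
    type_id    INTEGER NOT NULL REFERENCES standard_type(id),
    PRIMARY KEY (content_id, type_id)
);
CREATE TABLE rule_to_content (
    rule_id    INTEGER NOT NULL REFERENCES standard_rule(id),
    content_id INTEGER NOT NULL REFERENCES standard_content(id),
    PRIMARY KEY (rule_id, content_id)
);
CREATE INDEX idx_content_to_type_type ON content_to_type(type_id);
CREATE INDEX idx_rule_to_content_content ON rule_to_content(content_id);
"""


def build_database() -> sqlite3.Connection:
    """Create an in-memory database with the sketched schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


if __name__ == "__main__":
    conn = build_database()
    conn.execute("INSERT INTO standard_type VALUES (1, 'Geotechnical survey')")
    conn.execute("INSERT INTO standard_content VALUES (1, 'Loess sites', NULL)")
    conn.execute("INSERT INTO content_to_type VALUES (1, 1)")
    print(conn.execute("SELECT COUNT(*) FROM content_to_type").fetchone()[0])
```

The foreign keys on the correspondence tables are what later allow the mapping step to turn table links into relations between entities.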

The step 3 is as follows:

step 3.1, determining the three entities (survey specification type, survey specification content and survey specification detailed rule), their attribute relations and data relations, setting the types of the attributes, and at the same time specifying the characteristics of the attributes so that knowledge reasoning can later be carried out with the Jena tool, and then completing the construction of the survey specification domain ontology using the open-source software Protégé;

step 3.2, formally storing the constructed survey specification domain ontology, using the RDF/XML description language for storage.
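As a rough picture of what the ontology of step 3 contains, the sketch below generates a small ontology as Turtle text. It is not the output of Protégé; the class and property names are hypothetical, and Turtle is used here only because it is compact to read (step 3.2 stores the ontology as RDF/XML).

```python
# Prefix declarations; the default namespace is the IRI mentioned in step 4.2.
PREFIXES = """@prefix : <http://www.kancha.com#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
"""


def owl_class(name: str) -> str:
    """Declare one ontology class."""
    return f":{name} a owl:Class ."


def object_property(name: str, domain: str, range_: str) -> str:
    """Declare a relation between two classes."""
    return (f":{name} a owl:ObjectProperty ;\n"
            f"    rdfs:domain :{domain} ;\n"
            f"    rdfs:range :{range_} .")


def datatype_property(name: str, domain: str) -> str:
    """Declare a string-valued attribute of a class."""
    return (f":{name} a owl:DatatypeProperty ;\n"
            f"    rdfs:domain :{domain} ;\n"
            f"    rdfs:range xsd:string .")


def build_ontology() -> str:
    """Assemble three classes, two relations and one attribute (all hypothetical)."""
    parts = [PREFIXES]
    parts += [owl_class(n) for n in ("StandardType", "StandardContent", "StandardRule")]
    parts.append(object_property("belongsToType", "StandardContent", "StandardType"))
    parts.append(object_property("belongsToContent", "StandardRule", "StandardContent"))
    parts.append(datatype_property("ruleName", "StandardRule"))
    return "\n".join(parts) + "\n"


if __name__ == "__main__":
    print(build_ontology())
```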

The step 4 is as follows:

step 4.1, using the generate-mapping script in the third-party open-source software package D2RQ to call the survey specification relational database obtained in the step 2, namely the .sql file, and generating a predefined mapping file, namely the mapping.ttl file.

Step 4.2, modifying the mapping file obtained in the step 4.1 according to the survey specification domain ontology file constructed in the step 3:

first, a prefix is set for the IRI of the ontology, for example using http://www.kancha.com#criterion to denote criterion, with other terms handled in the same way; then the mapping vocabulary generated by default is changed to the vocabulary of the ontology, yielding the modified mapping.ttl mapping file;

step 4.3, starting the D2R Server with a script command to verify the mapping file modified in the step 4.2: first calling the script file to run the mapping file of the survey specification; after a successful start, opening the interactive page at http://localhost:2020/ in a browser to access the data and check the integrity of the related data.
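The D2RQ command lines behind steps 4.1, 4.3 and 5 can be assembled as in the sketch below. It only builds the argument lists for the generate-mapping, d2r-server and dump-rdf scripts and does not run them; the JDBC URL, user name and output file names are assumptions, and on Windows the scripts would carry a .bat suffix.

```python
import shlex


def d2rq_commands(jdbc_url: str, user: str,
                  mapping: str = "mapping.ttl",
                  dump: str = "kancha.nt") -> dict:
    """Return the three D2RQ invocations as argument lists (nothing is executed)."""
    return {
        # Step 4.1: generate a default mapping file from the database schema.
        "generate": ["generate-mapping", "-u", user, "-o", mapping, jdbc_url],
        # Step 4.3: serve the mapped data for inspection in a browser.
        "serve": ["d2r-server", mapping],
        # Step 5: dump the mapped data as N-Triples.
        "dump": ["dump-rdf", "-f", "N-TRIPLE", "-o", dump, mapping],
    }


if __name__ == "__main__":
    # Hypothetical connection details for a local MySQL database named "kancha".
    for name, argv in d2rq_commands("jdbc:mysql:///kancha", "root").items():
        print(name, "->", shlex.join(argv))
```

A password option is deliberately left out of the sketch; passing the lists to subprocess.run would execute them once D2RQ is on the PATH.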

Step 7 is specifically as follows:

step 7.1, entering the 'apache-jena-fuseki-3.17.0' folder, running 'fuseki-server.bat' and then exiting, whereupon a 'run' folder is automatically created under the current directory; moving the survey specification domain ontology file obtained in the step 3 into the databases folder under the run folder, and changing the suffix of the ontology file from owl to ttl;

step 7.2, configuring a rules file under the databases folder to define the inference rules: the rules file is configured on the basis of RDFS, OWL and the general rule reasoner in Jena; the relations between the entities of the constructed ontology are formalized and instantiated according to the grammar defined in the official documentation, and the suffix of the file is ttl;

step 7.3, after the rules file is configured, configuring a fuseki_conf.ttl file in the configuration directory under the run folder to set up the inference engine, referencing the addresses of the ontology.ttl and rules.ttl files respectively according to the official Jena inference rule language, and stating the scope of inference; after the files are configured, starting the fuseki-server.bat service again, accessing http://localhost:3030/ through a browser, and accessing and reasoning over the RDF data through query statements.
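The effect of the inference rules configured in step 7 can be illustrated with a toy forward-chaining loop. This is a plain Python stand-in, not the Jena rule engine or its rule syntax; the property names and the single rule are hypothetical.

```python
def rule_under_type(known):
    """Hypothetical rule: a detailed rule that belongs to a content item
    falls under the type that this content item belongs to."""
    for (rule, p1, content) in known:
        if p1 != "belongsToContent":
            continue
        for (content2, p2, type_) in known:
            if p2 == "belongsToType" and content2 == content:
                yield (rule, "underType", type_)


def forward_chain(triples, rules):
    """Apply every rule repeatedly until no new triple can be derived."""
    known = set(triples)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            # Materialize the derivations first so the set is not
            # modified while the rule is still iterating over it.
            for triple in list(rule(known)):
                if triple not in known:
                    known.add(triple)
                    changed = True
    return known


if __name__ == "__main__":
    facts = {
        ("rule1", "belongsToContent", "content1"),
        ("content1", "belongsToType", "typeA"),
    }
    for triple in sorted(forward_chain(facts, [rule_under_type])):
        print(triple)
```

The derived triple links a detailed rule directly to its specification type, which is the kind of implicit relation a query can then retrieve in one step.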

The step 8 is as follows:

step 8.1, word segmentation and entity recognition of the input question: word segmentation and entity recognition are completed using the jieba tool, with the relevant survey professional vocabulary used as an external dictionary; loading the external dictionary when using jieba solves the entity recognition problem.

Step 8.2, the regular expression method is as follows: each word in the user query is treated as an object with two basic attributes, the word itself and its part of speech, and matching rules are defined with the open-source tool REFO; when a combination containing survey professional vocabulary appears, a rule is matched successfully and a preset function is executed. For each user query, the open-source word segmentation tool jieba first performs word segmentation and part-of-speech tagging to obtain an object list; the object list is then matched against the REFO-defined rules one by one, and if a match succeeds the corresponding function is executed.
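The rule matching of step 8.2 can be sketched with Python's built-in re module standing in for REFO. The question templates, the entity dictionary and the SPARQL property names below are all hypothetical; the sketch only shows how a matched template selects a query and fills in the recognized entity.

```python
import re

PREFIX = "PREFIX : <http://www.kancha.com#>\n"

# Hypothetical dictionary of survey terms recognized as entities.
ENTITIES = ["loess site survey", "foundation pit"]

# Each template pairs a question pattern with a SPARQL skeleton.
TEMPLATES = [
    (re.compile(r"^what are the (?:detailed )?rules (?:for|of) (?P<entity>.+?)\??$", re.I),
     'SELECT ?rule WHERE {{ ?c :contentName "{entity}" . '
     '?r :belongsToContent ?c . ?r :ruleContent ?rule . }}'),
    (re.compile(r"^which type does (?P<entity>.+?) belong to\??$", re.I),
     'SELECT ?type WHERE {{ ?c :contentName "{entity}" . '
     '?c :belongsToType ?t . ?t :typeName ?type . }}'),
]


def question_to_sparql(question: str):
    """Return a SPARQL query for the first matching template, or None."""
    text = question.strip()
    for pattern, skeleton in TEMPLATES:
        match = pattern.match(text)
        if match:
            entity = match.group("entity").strip().lower()
            # Only dictionary terms are inserted, so free text never
            # reaches the query string.
            if entity in ENTITIES:
                return PREFIX + skeleton.format(entity=entity)
    return None


if __name__ == "__main__":
    print(question_to_sparql("What are the rules for loess site survey?"))
```

Restricting the inserted entity to dictionary terms mirrors the role of the professional vocabulary in step 8.1 and keeps malformed input out of the generated query.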

The invention has the following beneficial effects. 1. In the traditional building industry, knowledge management is inefficient, and looking up specifications manually is time-consuming and laborious and adds redundant cost. In view of this, the invention provides a knowledge graph-based intelligent question-answering system for survey specifications and a method for establishing it; by constructing a knowledge graph of survey specifications and implementing a survey specification KBQA, it provides an extensible intelligent question-answering service for the building field, which benefits survey work. 2. The system has a certain reasoning capability, which helps link specifications to one another; by defining inference rules, the knowledge graph can perform knowledge reasoning and deduction, so that user questions are answered in depth, while the accuracy and efficiency with which users obtain knowledge are improved.

Drawings

FIG. 1 is a flow chart of a survey specification question-answering system of the present invention;

FIG. 2 is a technical roadmap of the survey specification question-answering system of the present invention.

Detailed Description

The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.

The invention relates to a method for establishing a knowledge graph-based survey standard intelligent question-answering system, which, with reference to FIG. 1 and FIG. 2, is implemented according to the following steps:

the step 1 is as follows:

step 1.1, determining related standard fields and ranges;

step 1.2, by organizing the survey report, arranging the content into three parts, namely standard type, standard content and standard rule, and at the same time unifying the correspondence between standard type and standard content and the correspondence between standard content and standard rule;

the step 2 is as follows:

the step 3 is as follows:

the step 4 is as follows:

step 7 is specifically as follows:

the step 8 is as follows:

step 8.1, word segmentation and entity recognition of the input question: word segmentation and entity recognition are completed using the jieba tool; because jieba cannot accurately segment the relevant professional vocabulary, the relevant survey professional vocabulary is used as an external dictionary, and loading the external dictionary when using jieba solves the entity recognition problem.
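Why an external dictionary helps can be seen in a small forward maximum matching segmenter. This is a deliberately simple stand-in for jieba, which uses a different algorithm; the dictionary entries and the sample phrase are hypothetical.

```python
def segment(text: str, dictionary: set) -> list:
    """Greedy forward maximum matching: at each position take the longest
    dictionary word, falling back to a single character."""
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


if __name__ == "__main__":
    general = {"黄土", "勘察", "规范"}   # everyday vocabulary only
    domain = general | {"湿陷性黄土"}    # plus one professional term
    phrase = "湿陷性黄土勘察规范"
    print(segment(phrase, general))  # the professional term is split apart
    print(segment(phrase, domain))   # the professional term stays whole
```

With only the general vocabulary the professional term is broken into single characters, so no entity can be recognized; adding the term to the dictionary keeps it intact as one token.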

The method takes the structured data in the survey specifications as its basis: a survey specification ontology model is constructed top-down with Protégé; data storage is realized with a MySQL database by establishing foreign keys and indexes between data tables; the data are converted into RDF (Resource Description Framework) using D2RQ, realizing data transfer; rules are established in Apache Jena to realize knowledge inference; SPARQL query statements are generated through regular semantic analysis to match data and realize data search; and the final web page display is realized through a Python component. The invention can convert natural language into a computer query language and answer user questions in depth, improves the efficiency and accuracy of the returned results, and helps promote the progress and efficiency of survey work in the building field.

Examples

The specific steps for implementing the example are detailed below in conjunction with the above technical scheme:

1. The ER diagram designed according to the survey specifications comprises specification types, specifications and detailed rules; the specification type table contains id and name; the specification table contains id, name and specific requirements; the detailed rule table contains id, name and content.

2. Based on the designed ER diagram, ontology modeling is performed with Protégé to construct three classes, namely specification types, specifications and detailed rules; two relations and five data properties, such as the detailed rule name, are designed, and the ontology file is exported in Turtle syntax.

3. A database named 'kancha' is created in MySQL; the 3 data tables created in Excel are imported into MySQL; the tables are designed by setting the field types, lengths and primary keys; new tables are created to relate the tables to one another, for example, for the specification type table and the specification table, 2 fields are added, namely the id in the specification type table and the id of the specification, and an index and a foreign key are added to produce the corresponding reference relation between the specification type table and the specification table.

4. The 'kancha' database is called through a script using the generate-mapping command in D2RQ to generate a default mapping file; the mapping file is opened and the default mapping vocabulary is changed to the vocabulary of the ontology; the data can then be viewed on a web page through the D2R Server command.

5. The modified mapping file is converted into RDF using the dump-rdf command in D2RQ, in N-TRIPLE format.

6. A "tdb" directory is created for storing TDB data; in the bat directory of the "apache-jena-3.5.0" folder, the previously generated RDF data are stored in TDB form using "tdblob."; then the "apache-jena-fuseki-3.5.0" folder is entered and "fuseki-server.bat" is run.

7. The program automatically creates a "run" folder in the current directory. The ontology file "ontology.owl" is moved into the "databases" folder under the "run" folder, and the "owl" suffix is changed to "ttl". In the "configuration" folder under the "run" folder, a text file named "fuseki_conf.ttl" is created and the configuration script is added; fuseki-server.bat is run again, after which the service can be accessed with a browser (http://localhost:3030/). A text file "rules.ttl" is created under the "databases" folder and the rule script is added; the rules are defined and the configuration file "fuseki_conf.ttl" is modified, thereby realizing reasoning.

8. A series of question templates is formulated according to the data relations in the knowledge base to model the forms that questions take; regular expressions are used to perform string operations on the question and match the relations of the entities in it; the result is input into the query console to construct a query subgraph, converting the natural language into a computer query language; and the database is searched for the answer.

9. A script is written using the SPARQLWrapper package in Python to send the query to the D2RQ endpoint and obtain the data query result.

10. The whole implementation is packaged using the streamlit package in Python to complete the interactive question-answering system.
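The query round trip of item 9 can be sketched with the Python standard library in place of SPARQLWrapper, so that the request and the result handling are visible. The endpoint URL is an assumption (a D2R Server on its default port with a /sparql path), and whether a given server honours the Accept header for JSON results is likewise assumed.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://localhost:2020/sparql"  # assumed endpoint address


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a SPARQL protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})


def extract_values(results_json: str, variable: str) -> list:
    """Pull the values bound to one variable out of a SPARQL JSON result."""
    data = json.loads(results_json)
    return [binding[variable]["value"]
            for binding in data["results"]["bindings"]
            if variable in binding]


def ask(query: str, variable: str) -> list:
    """Send the query and return the bound values (requires a running endpoint)."""
    with urllib.request.urlopen(build_request(query)) as response:
        return extract_values(response.read().decode("utf-8"), variable)


if __name__ == "__main__":
    # Offline demonstration with a hand-written result document.
    sample = ('{"head": {"vars": ["rule"]}, "results": {"bindings": '
              '[{"rule": {"type": "literal", "value": "example rule text"}}]}}')
    print(extract_values(sample, "rule"))
```

The list returned by extract_values is what a front end such as the streamlit page of item 10 would display to the user.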

Claims

1. A method for establishing a knowledge graph-based survey standard intelligent question-answering system is characterized by comprising the following steps:

step 1, establishing an ER relationship: designing survey standard data, including an entity set, attributes among entities and a relational database ER graph of the relationships among the entities, and collecting and storing the survey standard data into an Excel table;

2. The method for establishing a knowledge-graph-based survey-specification intelligent question-answering system according to claim 1, wherein the step 1 is as follows:

step 1.1, determining related standard fields and ranges;

3. The method for establishing a knowledge-graph-based survey-specification intelligent question-answering system according to claim 2, wherein the step 2 is as follows:

4. The method for establishing a knowledge-graph-based survey-specification intelligent question-answering system according to claim 3, wherein the step 3 is as follows:

5. The method for establishing a knowledge-graph-based survey-specification intelligent question-answering system according to claim 4, wherein the step 4 is as follows:

6. The method for establishing a knowledge-graph-based survey-specification intelligent question-answering system according to claim 5, wherein the step 7 is as follows:

7. The method for establishing a knowledge-graph-based survey-specification intelligent question-answering system according to claim 6, wherein the step 8 is as follows: