CN113407688A - Method for establishing knowledge graph-based survey standard intelligent question-answering system - Google Patents


Info

Publication number
CN113407688A
Authority
CN
China
Prior art keywords
standard
survey
file
data
mapping
Legal status
Granted
Application number
CN202110658780.9A
Other languages
Chinese (zh)
Other versions
CN113407688B (en)
Inventor
何敏
赵立洁
姚旭豪
赵钦
Current Assignee
Xian University of Technology
Original Assignee
Xian University of Technology
Application filed by Xian University of Technology
Priority to CN202110658780.9A
Publication of CN113407688A
Application granted
Publication of CN113407688B
Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/211 Schema design and management
    • G06F16/212 Schema design and management with details for data modelling support
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G06F16/24558 Binary matching operations
    • G06F16/2456 Join operations
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G06F40/279 Recognition of textual entities

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for establishing a knowledge graph-based intelligent question-answering system for survey specifications. The method proceeds as follows.

First, the survey specification data are designed. This covers the entity sets, the attributes of the entities, and an entity-relationship (ER) diagram of the relationships among entities for a relational database. The survey specification data are collected and stored in an Excel table.

The survey specification data are then combined with the ER diagram to construct a survey specification relational database. An ontology of the survey specification domain is constructed top-down, and a mapping file is created. The mapping file is converted into RDF data with the dump-rdf conversion tool provided by D2RQ, and the resulting data are stored, managed, and used for knowledge reasoning.

Regular expressions are used to segment, combine, and match preset questions at the string level. This enables combined string queries, that is, question input. Finally, the results are displayed visually.

The invention addresses the low efficiency of searching specifications manually in the prior art.

Description

Method for establishing knowledge graph-based survey standard intelligent question-answering system
Technical Field
The invention belongs to the technical field of industrial knowledge graph question-answering systems. In particular, it relates to a method for establishing a knowledge graph-based intelligent question-answering system for survey specifications.
Background
A question answering system (QA system) is an advanced form of information retrieval system. It answers questions posed by users in natural language with accurate and concise natural-language answers. Different applications require different forms of question-answering system, which in turn use different corpora and techniques.

By knowledge domain, question-answering systems can be classified as domain-specific or open-domain. Domain-specific systems focus on answering questions in particular domains, such as medicine, sports, or government affairs. Open-domain systems place no restriction on the scope of questions; users may ask about anything, from astronomy to geography.

A knowledge graph is a structured semantic knowledge base, usually represented as triples, that symbolically describes concepts in the physical world and their interrelationships. A question-answering system built on a knowledge graph is also called KBQA (knowledge base question answering), and it is a kind of domain-specific question-answering system. Given a natural-language question, it performs semantic understanding and analysis of the question, then queries and reasons over the knowledge graph to obtain the answer.

Compared with open-domain question-answering systems, a knowledge graph-based system can store the related entities of one or more domains. It can also reason and deduce over the knowledge graph and answer users' questions in depth. Building a knowledge graph requires ontology modeling. An ontology model can be constructed top-down or bottom-up, and for domain-specific knowledge bases the top-down approach is the more common choice.

In recent years, demand for KBQA has grown in many fields, and knowledge graph-based question-answering systems have been widely applied, particularly in medicine. For example, the patent with application number CN202011067875.5 relates to comprehensive self-management of hypertensive patients. The patent with application number CN202010047420.0 relates to question answering over electronic medical records.

However, there is little research on the field of building codes. As construction projects in loess areas expand, survey work inevitably raises many problems that require consulting the relevant codes. Conventionally this consultation is done manually, which entails a large workload and hinders survey work.
Disclosure of Invention
The invention aims to provide a method for establishing a knowledge graph-based intelligent question-answering system for survey specifications. The method solves the low efficiency of searching specifications manually in the prior art.
The technical scheme adopted by the invention is a method for establishing a knowledge graph-based intelligent question-answering system for survey specifications, implemented according to the following steps:
Step 1, establishing the ER relationships: design the survey specification data according to the survey specifications. The data comprise the entity sets, the attributes of the entities, and an entity-relationship (ER) diagram of the relationships among entities for a relational database. Collect the survey specification data and store them in an Excel table;
Step 2, database construction: combine the survey specification data collected in step 1 with the relational-database ER diagram to construct a survey specification relational database;
Step 3, ontology modeling: based on the survey specification relational database obtained in step 2, construct an ontology of the survey specification domain top-down;
Step 4, creating the mapping: using the D2RQ tool, generate a mapping.ttl mapping file from the survey specification relational database obtained in step 2, following a mapping standard established by the World Wide Web Consortium (W3C);
Step 5, convert the mapping file obtained in step 4 into RDF data with the dump-rdf conversion tool provided by D2RQ, storing the data with "N-TRIPLE" (N-Triples) as the default format;
Step 6, data storage and management: import the RDF data through the web interface of the Jena Fuseki component and persist it as TDB files. Then run fuseki-server.bat and exit;
Step 7, rule reasoning: perform knowledge reasoning on the RDF data from step 6 with the Jena OWLFBRuleReasoner, together with the survey specification domain ontology file constructed in step 3;
Step 8, regular-expression semantic analysis: use regular expressions to segment, combine, and match preset questions at the string level. This enables combined string queries, that is, question input;
Step 9, building the question-answering system in Python: implement interaction with the RDF data in the Python programming language, and display the results of RDF data retrieval and reasoning visually.
The present invention is also characterized in that,
step 1 is specifically as follows:
Step 1.1, determine the relevant specification domains and their scope;
Step 1.2, the organized content comprises three parts: the specification type, the specification content, and the specification rules. At the same time, unify the correspondences between specification types and specification contents, and between specification contents and specification rules;
Step 1.3, in Excel, create five tables:
- a specification type table;
- a specification content table;
- a specification rule table;
- a content-type correspondence table;
- a rule-content correspondence table.
Step 2 is specifically as follows:
Step 2.1, open Navicat Premium 15, a visual management tool for the open-source MySQL database. Connect it to MySQL and create an empty survey specification relational database;
Step 2.2, import the five tables constructed in step 1 into the database created in step 2.1. These are the specification type table, specification content table, specification rule table, content-type correspondence table, and rule-content correspondence table;
Step 2.3, for the data imported in step 2.2, set field types and field lengths and add primary keys. Using the index and foreign key commands in Navicat Premium 15, set up the index relations of the content-type correspondence table and of the rule-content correspondence table. This completes the survey specification relational database;
Step 2.4, save the survey specification relational database constructed in step 2.3 as an SQL script file.
Step 3 is specifically as follows:
Step 3.1, determine three entities, together with their relationship and data properties:
- survey specification type;
- survey specification content;
- survey specification detailed rules.
Set the property types, and specify the property characteristics needed for the knowledge reasoning later performed with Jena. Then build the survey specification domain ontology with the open-source software Protégé;
Step 3.2, save the constructed survey specification domain ontology in formal form, serialized in the RDF/XML description language.
Step 4 is specifically as follows:
Step 4.1, use the generic mapping script in the third-party open-source package D2RQ to read the survey specification relational database obtained in step 2 (saved as the sql file in step 2.4). This generates the default mapping file, mapping.ttl;
Step 4.2, modify the mapping file obtained in step 4.1 according to the survey specification domain ontology file constructed in step 3:
first, declare a prefix for the ontology IRI; for example, http://www.kancha.com#criterion denotes criterion, and likewise for the other terms. Then replace the default mapping vocabulary with the vocabulary of the ontology to obtain the modified mapping.ttl file;
Step 4.3, start D2R Server with a script command to verify the mapping file modified in step 4.2. Run the script on the survey specification mapping file. Once the server has started, open the interactive page at http://localhost:2020/ in a browser to access the data and check its completeness.
Step 7 is specifically as follows:
Step 7.1, enter the 'apache-jena-fuseki-3.17.0' folder, run 'fuseki-server.bat', and then exit; a 'run' folder is created automatically in the current directory. Move the survey specification domain ontology file obtained in step 3 into the 'databases' folder under 'run', and change its suffix from owl to ttl;
Step 7.2, create a rules file in the 'databases' folder and define the inference rules. The rules file builds on RDFS, OWL, and the general-purpose rule reasoner in Jena. The relations between the constructed ontology entities are formalized and instantiated according to the rule syntax defined in the official documentation, and the file suffix is ttl;
Step 7.3, after the rules file is configured, configure the fuseki_conf.ttl file in the 'configuration' directory under 'run' to assemble the inference engine. Following Jena's official inference rule language, reference the paths of the ontology .ttl file and the rules.ttl file, and declare the scope of inference. Once the files are configured, start the fuseki-server.bat service again and open http://localhost:3030/ in a browser. The RDF data can then be accessed and reasoned over with query statements.
Step 8 is specifically as follows:
Step 8.1, word segmentation and entity recognition of the input question: both are performed with the jieba tool, using the relevant survey terminology as an external dictionary. Loading this dictionary when running jieba resolves the entity-recognition problem.
Step 8.2, the regular-expression method is as follows. Each word in the user query is treated as an object with two basic attributes: the word itself and its part of speech. Matching rules are defined with the open-source tool REFO; when a combination involving survey terminology appears, a rule matches and a preset function is executed. Each user query is processed in three stages:
- first, segment and part-of-speech tag the query with the open-source segmentation tool jieba to obtain a list of objects;
- second, match the list against the REFO rules one by one;
- finally, if a match succeeds, execute the corresponding function.
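A minimal sketch can illustrate why the external dictionary of step 8.1 matters. jieba itself builds a word graph from a prefix dictionary, picks the most probable path, and falls back to an HMM for unknown words; user terms are loaded with `jieba.load_userdict`. The sketch below deliberately swaps in the simpler forward maximum matching algorithm, in standard-library Python, only to show the effect of adding a domain term. Both dictionaries and the example question are hypothetical.

```python
def fmm_segment(text: str, dictionary: set[str]) -> list[str]:
    """Forward maximum matching: at each position take the longest dictionary word.

    Characters not covered by any dictionary word are emitted one at a time.
    This is a simplified stand-in for jieba, not jieba's algorithm.
    """
    max_len = max((len(w) for w in dictionary), default=1)
    tokens: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary hit
        # or a single character remains.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical general dictionary: knows the pieces but not the compound term.
GENERAL_DICT = {"湿陷", "性", "黄土", "是", "什么"}
# Hypothetical survey-term dictionary: "collapsible loess" as one domain term.
SURVEY_TERMS = {"湿陷性黄土"}

question = "湿陷性黄土是什么"  # "What is collapsible loess?"
print(fmm_segment(question, GENERAL_DICT))                 # ['湿陷', '性', '黄土', '是', '什么']
print(fmm_segment(question, GENERAL_DICT | SURVEY_TERMS))  # ['湿陷性黄土', '是', '什么']
```

Without the domain term, the entity is split into fragments and cannot be matched against the knowledge graph. With the term added, the entity survives segmentation as a single token.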
The beneficial effects of the invention are as follows.

1. In the traditional construction industry, knowledge management is inefficient, and querying specifications manually is time-consuming and laborious and adds redundant cost. The invention provides a knowledge graph-based intelligent question-answering system for survey specifications and a method for establishing it. It constructs a knowledge graph of survey specifications and implements a survey specification KBQA. This provides an extensible intelligent question-answering service for the construction field, which benefits survey work.

2. The system has a degree of reasoning capability, which helps link specifications to one another. By defining inference rules, the knowledge graph can perform knowledge reasoning and deduction, so that users' questions are answered in depth. At the same time, users obtain knowledge more accurately and efficiently.
Drawings
FIG. 1 is a flow chart of the survey specification question-answering system of the present invention;
FIG. 2 is a technical roadmap of the survey specification question-answering system of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The method of the invention for establishing a knowledge graph-based survey standard intelligent question-answering system is implemented, with reference to FIG. 1 and FIG. 2, according to the following steps:
Step 1, establishing the ER relationships: design the survey specification data according to the survey specifications. The data comprise the entity sets, the attributes of the entities, and an entity-relationship (ER) diagram of the relationships among entities for a relational database. Collect the survey specification data and store them in an Excel table;
Step 1 is specifically as follows:
Step 1.1, determine the relevant specification domains and their scope;
Step 1.2, organize the survey reports. The organized content comprises three parts: the specification type, the specification content, and the specification rules. At the same time, unify the correspondences between specification types and specification contents, and between specification contents and specification rules;
Step 1.3, in Excel, create five tables:
- a specification type table;
- a specification content table;
- a specification rule table;
- a content-type correspondence table;
- a rule-content correspondence table.
Step 2, database construction: combine the survey specification data collected in step 1 with the relational-database ER diagram to construct a survey specification relational database;
Step 2 is specifically as follows:
Step 2.1, open Navicat Premium 15, a visual management tool for the open-source MySQL database. Connect it to MySQL and create an empty survey specification relational database;
Step 2.2, import the five tables constructed in step 1 into the database created in step 2.1. These are the specification type table, specification content table, specification rule table, content-type correspondence table, and rule-content correspondence table;
Step 2.3, for the data imported in step 2.2, set field types and field lengths and add primary keys. Using the index and foreign key commands in Navicat Premium 15, set up the index relations of the content-type correspondence table and of the rule-content correspondence table. This completes the survey specification relational database;
Step 2.4, save the survey specification relational database constructed in step 2.3 as an SQL script file.
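The five tables of steps 1.3 and 2.2, and the keys and indexes of step 2.3, can be sketched as follows. The patent uses MySQL through Navicat; this sketch substitutes Python's built-in sqlite3 so that it runs without a server. All table and column names are hypothetical. The columns follow the example embodiment:
- id and name for types;
- id, name, and requirement for contents;
- id, name, and content for rules.

```python
import sqlite3

# Hypothetical schema; columns follow the example embodiment.
SCHEMA = """
CREATE TABLE spec_type    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE spec_content (id INTEGER PRIMARY KEY, name TEXT NOT NULL, requirement TEXT);
CREATE TABLE spec_rule    (id INTEGER PRIMARY KEY, name TEXT NOT NULL, content TEXT);

-- Correspondence tables: each row links two entities through foreign keys.
CREATE TABLE content_type (
    content_id INTEGER NOT NULL REFERENCES spec_content(id),
    type_id    INTEGER NOT NULL REFERENCES spec_type(id),
    PRIMARY KEY (content_id, type_id)
);
CREATE TABLE rule_content (
    rule_id    INTEGER NOT NULL REFERENCES spec_rule(id),
    content_id INTEGER NOT NULL REFERENCES spec_content(id),
    PRIMARY KEY (rule_id, content_id)
);

-- The composite primary keys index the first column; these cover the second.
CREATE INDEX idx_content_type_type ON content_type(type_id);
CREATE INDEX idx_rule_content_content ON rule_content(content_id);
"""


def build_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript(SCHEMA)
    return conn


def rules_for_type(conn: sqlite3.Connection, type_name: str) -> list:
    """Follow type -> content -> rule through both correspondence tables."""
    sql = """
        SELECT r.name
        FROM spec_type t
        JOIN content_type ct ON ct.type_id = t.id
        JOIN rule_content rc ON rc.content_id = ct.content_id
        JOIN spec_rule r ON r.id = rc.rule_id
        WHERE t.name = ?
        ORDER BY r.id
    """
    return [row[0] for row in conn.execute(sql, (type_name,))]
```

In MySQL, the same constraints would be declared with `FOREIGN KEY ... REFERENCES` clauses, which the InnoDB engine enforces. With the foreign keys in place, a correspondence row that points at a missing rule or content is rejected at insert time rather than surfacing later as a dangling link in the RDF dump.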
Step 3, ontology modeling: based on the survey specification relational database obtained in step 2, construct an ontology of the survey specification domain top-down;
Step 3 is specifically as follows:
Step 3.1, determine three entities, together with their relationship and data properties:
- survey specification type;
- survey specification content;
- survey specification detailed rules.
Set the property types, and specify the property characteristics needed for the knowledge reasoning later performed with Jena. Then build the survey specification domain ontology with the open-source software Protégé;
Step 3.2, save the constructed survey specification domain ontology in formal form, serialized in the RDF/XML description language.
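A top-down ontology skeleton for the three entities of step 3.1 might look like the Turtle generated below. This is a hypothetical sketch. The class and property names are invented for illustration, and only the namespace http://www.kancha.com# comes from step 4.2. Turtle is used for readability (the example embodiment exports Turtle), even though step 3.2 stores the ontology as RDF/XML.

```python
NS = "http://www.kancha.com#"  # ontology namespace used in step 4.2

# Hypothetical vocabulary for the three entities of step 3.1.
CLASSES = ["SpecType", "SpecContent", "SpecRule"]
OBJECT_PROPERTIES = {  # property -> (domain, range)
    "belongsToType": ("SpecContent", "SpecType"),
    "belongsToContent": ("SpecRule", "SpecContent"),
}
DATA_PROPERTIES = {  # property -> domain; every range is xsd:string
    "typeName": "SpecType",
    "contentName": "SpecContent",
    "requirement": "SpecContent",
    "ruleName": "SpecRule",
    "ruleContent": "SpecRule",
}


def ontology_turtle() -> str:
    """Emit OWL class and property declarations as a Turtle document."""
    lines = [
        f"@prefix kc: <{NS}> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "",
    ]
    lines += [f"kc:{c} a owl:Class ." for c in CLASSES]
    for prop, (dom, rng) in OBJECT_PROPERTIES.items():
        lines.append(
            f"kc:{prop} a owl:ObjectProperty ; rdfs:domain kc:{dom} ; rdfs:range kc:{rng} ."
        )
    for prop, dom in DATA_PROPERTIES.items():
        lines.append(
            f"kc:{prop} a owl:DatatypeProperty ; rdfs:domain kc:{dom} ; rdfs:range xsd:string ."
        )
    return "\n".join(lines) + "\n"


print(ontology_turtle())
```

Declaring domains and ranges up front is what lets an RDFS/OWL reasoner later infer class membership from property usage alone.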
Step 4, creating the mapping: using the D2RQ tool, generate a mapping.ttl mapping file from the survey specification relational database obtained in step 2, following a mapping standard established by the World Wide Web Consortium (W3C);
Step 4 is specifically as follows:
Step 4.1, use the generic mapping script in the third-party open-source package D2RQ to read the survey specification relational database obtained in step 2 (saved as the sql file in step 2.4). This generates the default mapping file, mapping.ttl;
Step 4.2, modify the mapping file obtained in step 4.1 according to the survey specification domain ontology file constructed in step 3:
first, declare a prefix for the ontology IRI; for example, http://www.kancha.com#criterion denotes criterion, and likewise for the other terms. Then replace the default mapping vocabulary with the vocabulary of the ontology to obtain the modified mapping.ttl file;
Step 4.3, start D2R Server with a script command to verify the mapping file modified in step 4.2. Run the script on the survey specification mapping file. Once the server has started, open the interactive page at http://localhost:2020/ in a browser to access the data and check its completeness.
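Step 4.2 amounts to a vocabulary substitution over the generated mapping file. The minimal sketch below assumes that the default mapping names classes and properties with a `vocab:` prefix (for example `d2rq:class vocab:spec_type`), as D2RQ's generated mappings usually do. The table names and target ontology terms are hypothetical.

```python
import re

KC_PREFIX = "@prefix kc: <http://www.kancha.com#> ."  # ontology IRI from step 4.2
VOCAB_TERM = re.compile(r"\bvocab:[A-Za-z0-9_]+")      # one whole default term


def rewrite_mapping(text: str, vocab_map: dict) -> str:
    """Replace default vocab: terms with ontology terms and declare the kc: prefix.

    Whole terms are matched, so vocab:spec_type is not rewritten inside
    vocab:spec_type_name. Terms missing from vocab_map are left unchanged.
    """
    body = VOCAB_TERM.sub(lambda m: vocab_map.get(m.group(0), m.group(0)), text)
    if KC_PREFIX not in body:
        body = KC_PREFIX + "\n" + body
    return body


# Hypothetical correspondence between default terms and ontology terms.
VOCAB_MAP = {
    "vocab:spec_type": "kc:SpecType",
    "vocab:spec_type_name": "kc:typeName",
    "vocab:spec_rule": "kc:SpecRule",
    "vocab:spec_rule_name": "kc:ruleName",
}

# Usage: read mapping.ttl, rewrite it, and write it back.
# with open("mapping.ttl", encoding="utf-8") as f:
#     text = f.read()
# with open("mapping.ttl", "w", encoding="utf-8") as f:
#     f.write(rewrite_mapping(text, VOCAB_MAP))
```

Editing by whole-term lookup, rather than by plain string replacement, avoids corrupting longer property names that share a table-name prefix.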
Step 5, convert the mapping file obtained in step 4 into RDF data with the dump-rdf conversion tool provided by D2RQ, storing the data with "N-TRIPLE" (N-Triples) as the default format;
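In the N-Triples output of step 5, each line holds one `subject predicate object .` statement. The sketch below imitates what such a dump looks like for one table row and one correspondence-table row. It does not call D2RQ. The resource URI pattern http://www.kancha.com/resource/... and the property names are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
KC = "http://www.kancha.com#"             # ontology namespace (step 4.2)
BASE = "http://www.kancha.com/resource/"  # hypothetical instance namespace


def escape_literal(value: str) -> str:
    """Escape a string for an N-Triples literal (backslash, quote, LF, CR)."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def row_uri(table: str, row_id) -> str:
    return f"<{BASE}{table}/{row_id}>"


def row_to_ntriples(table: str, cls: str, row: dict, columns: dict) -> list:
    """One rdf:type triple plus one literal triple per non-null mapped column."""
    subject = row_uri(table, row["id"])
    lines = [f"{subject} <{RDF_TYPE}> <{KC}{cls}> ."]
    for column, prop in columns.items():
        if row.get(column) is not None:
            lines.append(f'{subject} <{KC}{prop}> "{escape_literal(str(row[column]))}" .')
    return lines


def link_to_ntriple(s_table: str, s_id, prop: str, o_table: str, o_id) -> str:
    """A correspondence-table row becomes a single resource-to-resource triple."""
    return f"{row_uri(s_table, s_id)} <{KC}{prop}> {row_uri(o_table, o_id)} ."
```

Because the format is one statement per line, the dump can be bulk-loaded into TDB in step 6 without any parsing context.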
Step 6, data storage and management: import the RDF data through the web interface of the Jena Fuseki component and persist it as TDB files. Then run fuseki-server.bat and exit;
Step 7, rule reasoning: perform knowledge reasoning on the RDF data from step 6 with the Jena OWLFBRuleReasoner, together with the survey specification domain ontology file constructed in step 3;
Step 7 is specifically as follows:
Step 7.1, enter the 'apache-jena-fuseki-3.17.0' folder, run 'fuseki-server.bat', and then exit; a 'run' folder is created automatically in the current directory. Move the survey specification domain ontology file obtained in step 3 into the 'databases' folder under 'run', and change its suffix from owl to ttl;
Step 7.2, create a rules file in the 'databases' folder and define the inference rules. The rules file builds on RDFS, OWL, and the general-purpose rule reasoner in Jena. The relations between the constructed ontology entities are formalized and instantiated according to the rule syntax defined in the official documentation, and the file suffix is ttl;
Step 7.3, after the rules file is configured, configure the fuseki_conf.ttl file in the 'configuration' directory under 'run' to assemble the inference engine. Following Jena's official inference rule language, reference the paths of the ontology .ttl file and the rules.ttl file, and declare the scope of inference. Once the files are configured, start the fuseki-server.bat service again and open http://localhost:3030/ in a browser. The RDF data can then be accessed and reasoned over with query statements.
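One kind of rule that step 7.2 could configure is a transitive link from a detailed rule, through its specification content, to its specification type. In Jena's rule syntax such a rule would read roughly `[ruleType: (?r kc:belongsToContent ?c) (?c kc:belongsToType ?t) -> (?r kc:belongsToType ?t)]`; the property names are hypothetical. The sketch below is a naive forward-chaining loop in plain Python that applies rules of that shape until no new triples appear. It only illustrates what such a reasoner derives; it is not Jena's engine.

```python
from typing import Optional

Triple = tuple[str, str, str]
Rule = tuple[list[Triple], Triple]  # (body patterns, head pattern); "?x" terms are variables


def unify(pattern: Triple, fact: Triple, binding: dict) -> Optional[dict]:
    """Extend binding so that pattern matches fact, or return None."""
    result = dict(binding)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if result.setdefault(p, f) != f:  # variable already bound to another value
                return None
        elif p != f:                          # constant mismatch
            return None
    return result


def forward_chain(facts: set, rules: list) -> set:
    """Apply rules repeatedly until a fixpoint; return the closed set of triples."""
    known = set(facts)
    while True:
        derived = set()
        for body, head in rules:
            bindings = [{}]
            for pattern in body:  # join the body patterns left to right
                next_bindings = []
                for b in bindings:
                    for fact in known:
                        nb = unify(pattern, fact, b)
                        if nb is not None:
                            next_bindings.append(nb)
                bindings = next_bindings
            for b in bindings:
                triple = tuple(b.get(term, term) for term in head)  # substitute variables
                if triple not in known:
                    derived.add(triple)
        if not derived:
            return known
        known |= derived


# Hypothetical rule mirroring the Jena rule quoted above.
RULES = [([("?r", "kc:belongsToContent", "?c"), ("?c", "kc:belongsToType", "?t")],
          ("?r", "kc:belongsToType", "?t"))]
```

Deriving the rule-to-type link lets a question about a specification type return its detailed rules directly, without the query having to spell out the intermediate content node.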
Step 8, regular-expression semantic analysis: use regular expressions to segment, combine, and match preset questions at the string level. This enables combined string queries, that is, question input;
Step 8 is specifically as follows:
Step 8.1, word segmentation and entity recognition of the input question: both are performed with the jieba tool. Because jieba cannot segment the relevant professional vocabulary accurately, the relevant survey terminology is used as an external dictionary. Loading this dictionary when running jieba resolves the entity-recognition problem.
Step 8.2, the regular-expression method is as follows. Each word in the user query is treated as an object with two basic attributes: the word itself and its part of speech. Matching rules are defined with the open-source tool REFO; when a combination involving survey terminology appears, a rule matches and a preset function is executed. Each user query is processed in three stages:
- first, segment and part-of-speech tag the query with the open-source segmentation tool jieba to obtain a list of objects;
- second, match the list against the REFO rules one by one;
- finally, if a match succeeds, execute the corresponding function.
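Step 8.2, together with the question templates of the example embodiment, can be sketched in plain Python:
- each word is an object with a token and a part-of-speech tag;
- a rule is a sequence of per-word predicates;
- a matched rule calls a function that fills a SPARQL template.

REFO provides richer regular-expression operators over such objects, such as repetition and alternation. This fixed-length matcher is a simplified stand-in. The `nspec` tag for survey terms, the keyword rule, and the ontology property names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Word:
    token: str  # the word itself
    pos: str    # its part-of-speech tag


Predicate = Callable[[Word], bool]


def W(token: Optional[str] = None, pos: Optional[str] = None) -> Predicate:
    """Predicate matching a word by token and/or part of speech (None means any)."""
    return lambda w: (token is None or w.token == token) and (pos is None or w.pos == pos)


def find(rule: list, words: list) -> Optional[list]:
    """Return the first contiguous window of words matching every predicate in rule."""
    for start in range(len(words) - len(rule) + 1):
        window = words[start:start + len(rule)]
        if all(pred(w) for pred, w in zip(rule, window)):
            return window
    return None


def escape_sparql(text: str) -> str:
    return text.replace("\\", "\\\\").replace('"', '\\"')


def rules_of_content(matched: list) -> str:
    """SPARQL template: detailed rules attached to a named specification content."""
    term = escape_sparql(matched[0].token)
    return (
        "PREFIX kc: <http://www.kancha.com#>\n"
        "SELECT ?ruleName WHERE {\n"
        f'  ?content kc:contentName "{term}" .\n'
        "  ?rule kc:belongsToContent ?content ;\n"
        "        kc:ruleName ?ruleName .\n"
        "}"
    )


# Hypothetical rule: <survey term> 的 细则 ("the detailed rules of <term>").
RULES = [
    ([W(pos="nspec"), W(token="的"), W(token="细则")], rules_of_content),
]


def question_to_sparql(words: list) -> Optional[str]:
    """Try each rule in order and run the action of the first one that matches."""
    for rule, action in RULES:
        matched = find(rule, words)
        if matched is not None:
            return action(matched)
    return None


# Tagged words as a segmenter might return them (tags are hypothetical).
question = [Word("湿陷性黄土", "nspec"), Word("的", "uj"), Word("细则", "n")]
print(question_to_sparql(question))
```

Keeping each template as a small function makes it straightforward to add a question pattern: write one predicate list and one template, then append them to the rule table.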
Step 9, building the question-answering system in Python: implement interaction with the RDF data in the Python programming language, and display the results of RDF data retrieval and reasoning visually.
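The sketch below covers the Python side of step 9 using only the standard library. It sends a query over the SPARQL protocol (HTTP GET with a `query` parameter) and flattens the standard SPARQL JSON results format into plain dictionaries. The example embodiment uses SPARQLWrapper instead, where the equivalent calls are `SPARQLWrapper(endpoint)`, `setQuery(...)`, `setReturnFormat(JSON)`, and `query().convert()`. The endpoint URL assumes a Fuseki dataset named `kancha` on the port given in step 7.3; the dataset name is hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical Fuseki dataset "kancha" on the port used in step 7.3.
ENDPOINT = "http://localhost:3030/kancha/query"


def run_query(query: str, endpoint: str = ENDPOINT, timeout: float = 10.0) -> dict:
    """Send a SPARQL SELECT via HTTP GET and return the parsed JSON result."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def bindings_to_rows(result: dict) -> list:
    """Flatten SPARQL JSON results into {variable: value} dicts.

    Variables left unbound in a solution (e.g. by OPTIONAL) are omitted.
    """
    variables = result["head"]["vars"]
    return [
        {v: binding[v]["value"] for v in variables if v in binding}
        for binding in result["results"]["bindings"]
    ]
```

With a server running, `bindings_to_rows(run_query(sparql_text))` yields one dictionary per answer, where `sparql_text` is any SELECT query. A front end such as the streamlit page of the example embodiment can render that list directly as a table.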
The method takes the structured data of the survey specifications as its basis and proceeds as follows:
- a survey specification ontology model is constructed top-down with Protégé;
- data storage is implemented in a MySQL database by establishing foreign keys and indexes between the data tables;
- D2RQ converts the data to RDF (Resource Description Framework), enabling data transfer;
- rules established in Apache Jena implement knowledge reasoning;
- regular-expression semantic analysis generates SPARQL query statements for data matching, implementing data search;
- a Python component provides the final web page display.

The invention can convert natural language into computer queries and answer users' questions in depth. It improves the efficiency and accuracy of the returned results, which helps advance the progress and efficiency of survey work in the construction field.
Examples
The specific steps for implementing the example are detailed below in conjunction with the above technical scheme:
1. the ER graph designed according to the survey field specification comprises specification types, specifications and detailed rules; the standard category table contains id and name; the specification table contains id, name and specific requirements; the rule table contains id, name and content.
2. Combining the designed ER diagram, and utilizing the Prot é g to perform ontology modeling to construct 3 classes of standard types, standards and detailed rules; 5 data attributes including 2 relations and detailed names are designed, and a Turtle Syntam format body file is exported.
3. Establishing a database named as 'kancha' in a MySQL database; importing the 3 created data tables in the Excel into MySQL; designing a table setting field type, length and a data table main key; creating a new table to generate a relation between tables, such as a standard type table and a standard table, adding 2 fields, namely id in the standard type table and id of the standard type, and adding an index and an external key to generate a corresponding reference relation between the standard type table and the standard table.
4. Use D2RQ's generic mapping script to connect to the 'kancha' database and generate a default mapping file. Open the mapping file and replace the default mapping vocabulary with the vocabulary of the ontology. The data can then be browsed in a web page by starting D2R Server.
5. Convert the modified mapping file into RDF with D2RQ's dump-rdf command, using the N-Triples format.
6. Create a 'TDB' directory to hold the TDB data. Enter the bat directory of the 'apache-jena-3.5.0' folder and store the RDF data produced above in TDB form with 'tdbloader.bat'. Then enter the 'apache-jena-fuseki-3.5.0' folder and run 'fuseki-server.bat'.
7. The program automatically creates a 'run' folder in the current directory. Move the ontology file 'ontology.owl' into the 'databases' folder under 'run' and change its suffix from 'owl' to 'ttl'. In the 'configuration' folder under 'run', create a text file named 'fuseki_conf.ttl' and add the configuration script. Run 'fuseki-server.bat' again, after which the server can be accessed in a browser at http://localhost:3030/. Finally, create a text file 'rules.ttl' under the 'databases' folder, add a script defining the rules, and modify the configuration file 'fuseki_conf.ttl' to enable reasoning.
8. Formulate a series of question templates according to the data relations in the knowledge base to model the expected question patterns. Process each question at the string level with regular expressions, match the entities in the question and their relations, feed them to the query console, and construct a query subgraph. This converts the natural-language question into a machine-executable query, and the database is then searched for the answer.
9. Write a script with the Python wrapper SPARQLWrapper that sends the query to the D2RQ endpoint and obtains the query result.
10. Package the whole process with the Python package streamlit to complete the interactive question-answering system.
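The table layout of steps 1 to 3 can be illustrated with a small, self-contained sketch. It uses Python's built-in sqlite3 module in place of MySQL so that it runs without a database server. The table and column names (category, specification, detail_rule, spec_category, rule_spec) are hypothetical stand-ins for the tables described in those steps. The two link tables carry the foreign keys and indexes that tie specification types to specifications and specifications to detailed rules.

```python
import sqlite3

# Hypothetical schema mirroring the ER design: specification types, specifications,
# detailed rules, and two link tables holding the foreign keys between them.
SCHEMA = """
CREATE TABLE category (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE specification (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    requirement TEXT
);
CREATE TABLE detail_rule (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    content TEXT
);
-- which type a specification belongs to
CREATE TABLE spec_category (
    spec_id     INTEGER NOT NULL REFERENCES specification(id),
    category_id INTEGER NOT NULL REFERENCES category(id),
    PRIMARY KEY (spec_id, category_id)
);
-- which specification a detailed rule belongs to
CREATE TABLE rule_spec (
    rule_id INTEGER NOT NULL REFERENCES detail_rule(id),
    spec_id INTEGER NOT NULL REFERENCES specification(id),
    PRIMARY KEY (rule_id, spec_id)
);
-- indexes on the foreign-key columns used for reverse lookups
CREATE INDEX idx_spec_category_category ON spec_category(category_id);
CREATE INDEX idx_rule_spec_spec ON rule_spec(spec_id);
"""

def build_database(path=":memory:"):
    """Create the schema; SQLite enforces foreign keys only once this pragma is on."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn

def rules_for_category(conn, category_name):
    """Follow the references: type -> specification -> detailed rule."""
    sql = """
        SELECT r.name
        FROM detail_rule r
        JOIN rule_spec rs     ON rs.rule_id = r.id
        JOIN spec_category sc ON sc.spec_id = rs.spec_id
        JOIN category c       ON c.id = sc.category_id
        WHERE c.name = ?
        ORDER BY r.id
    """
    return [row[0] for row in conn.execute(sql, (category_name,))]
```

With the foreign keys enforced, a link row that points at a missing specification is rejected by the database. The lookup itself is simply a chain of joins across the link tables.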
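Steps 4 and 5 rely on D2RQ and its dump-rdf command to turn database rows into N-Triples. The sketch below is not D2RQ. It is a hand-written illustration of the kind of output involved: one rdf:type triple per row, one literal triple per column, and one resource-to-resource triple per row of a link table. The namespace http://www.kancha.com# is the one named in step 4.2 of the claims. The table/id URI pattern and the class and property names are assumptions made for this example.

```python
# Namespace taken from step 4.2 of the claims; everything else is illustrative.
KANCHA = "http://www.kancha.com#"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

def escape_literal(value):
    """Escape backslashes, double quotes and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))

def row_to_ntriples(table, row, class_name, key="id"):
    """One row -> N-Triples: a subject IRI from a hypothetical table/key URI
    pattern, an rdf:type triple, and one literal triple per non-key column."""
    subject = f"<{KANCHA}{table}/{row[key]}>"
    lines = [f"{subject} {RDF_TYPE} <{KANCHA}{class_name}> ."]
    for column, value in row.items():
        if column == key or value is None:  # NULL columns produce no triple
            continue
        lines.append(f'{subject} <{KANCHA}{column}> "{escape_literal(str(value))}" .')
    return lines

def link_to_ntriples(subject_table, subject_id, prop, object_table, object_id):
    """One row of a link table -> a triple connecting two resources."""
    return (f"<{KANCHA}{subject_table}/{subject_id}> <{KANCHA}{prop}> "
            f"<{KANCHA}{object_table}/{object_id}> .")
```

In the real workflow, the edited mapping.ttl file determines the IRIs and properties, so renaming the default vocabulary to the ontology vocabulary (step 4) changes exactly the IRIs this sketch hard-codes.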
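The rules written into rules.ttl in step 7 are evaluated by Jena's rule engine. As a rough illustration of forward-chaining inference, the sketch below applies rules of the form premises -> conclusion to a set of triples until no new triple can be derived. It is a naive stand-in, not Jena's implementation. The property names belongsToSpec, belongsToType and hasRule are hypothetical. The first rule corresponds to a Jena rule such as [r1: (?r kc:belongsToSpec ?s) (?s kc:belongsToType ?c) -> (?r kc:belongsToType ?c)].

```python
def match(pattern, triple, binding):
    """Unify one triple pattern (variables start with '?') with a triple."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if new.setdefault(p, t) != t:  # variable already bound to something else
                return None
        elif p != t:
            return None
    return new

def forward_chain(facts, rules):
    """Apply (premises, conclusion) rules until a fixpoint is reached."""
    facts = set(facts)  # copy so the caller's set is not modified
    while True:
        derived = set()
        for premises, conclusion in rules:
            bindings = [{}]
            for pattern in premises:  # join the premises left to right
                bindings = [b2 for b in bindings for fact in facts
                            if (b2 := match(pattern, fact, b)) is not None]
            for b in bindings:
                derived.add(tuple(b.get(term, term) for term in conclusion))
        if derived <= facts:  # nothing new: done
            return facts
        facts |= derived

# Hypothetical rules over hypothetical properties.
RULES = [
    # a rule of a specification inherits the specification's type
    ((("?r", "belongsToSpec", "?s"), ("?s", "belongsToType", "?c")),
     ("?r", "belongsToType", "?c")),
    # inverse relation: a specification has each rule that belongs to it
    ((("?r", "belongsToSpec", "?s"),), ("?s", "hasRule", "?r")),
]
```

Jena's engines are far more efficient (the forward engine is RETE-based and a backward engine is also available), but the derived triples for simple rules like these are the same kind of additions a query against the inference-enabled dataset would see.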
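Step 8 can be sketched as a list of question templates, each pairing a regular expression with a SPARQL skeleton. The English templates, the kc: prefix and the property names below are hypothetical. A deployed system would use Chinese templates derived from the actual knowledge-base relations. Captured entities are escaped before they are inserted into a SPARQL string literal.

```python
import re

PREFIX = "PREFIX kc: <http://www.kancha.com#>\n"

# Hypothetical templates: a question regex and a SPARQL skeleton whose
# {slots} are filled from the regex's named groups ({{ }} are literal braces).
TEMPLATES = [
    (re.compile(r"^what are the detailed rules of (?P<spec>.+?)\??$", re.IGNORECASE),
     'SELECT ?rule WHERE {{ ?s kc:name "{spec}" . '
     '?r kc:belongsToSpec ?s . ?r kc:name ?rule }}'),
    (re.compile(r"^which specifications belong to (?P<type>.+?)\??$", re.IGNORECASE),
     'SELECT ?spec WHERE {{ ?t kc:name "{type}" . '
     '?s kc:belongsToType ?t . ?s kc:name ?spec }}'),
]

def escape_sparql_string(text):
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')

def to_sparql(question):
    """Return a SPARQL query for the first matching template, or None."""
    q = question.strip()
    for pattern, template in TEMPLATES:
        m = pattern.match(q)
        if m:
            slots = {k: escape_sparql_string(v) for k, v in m.groupdict().items()}
            return PREFIX + template.format(**slots)
    return None
```

Returning None for unmatched questions lets the caller fall back to a "question not understood" message rather than sending a malformed query.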
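The following is a minimal sketch of step 9 with SPARQLWrapper. The endpoint URL is an assumption: it is D2R Server's usual SPARQL endpoint, whereas the Fuseki service started in step 7 would instead expose a dataset-specific URL under http://localhost:3030/. The SPARQLWrapper import is placed inside the function so that the result-parsing helper can be used without the package installed.

```python
# Assumed endpoint; adjust to the actual D2R Server or Fuseki dataset URL.
ENDPOINT = "http://localhost:2020/sparql"

def bindings_to_rows(result):
    """Flatten SPARQL 1.1 JSON results into a list of {variable: value} dicts.
    Variables left unbound in a solution are simply absent from that row."""
    variables = result["head"]["vars"]
    return [{v: b[v]["value"] for v in variables if v in b}
            for b in result["results"]["bindings"]]

def run_query(query, endpoint=ENDPOINT):
    """Send a SELECT query and return its rows."""
    from SPARQLWrapper import SPARQLWrapper, JSON  # third-party: pip install SPARQLWrapper

    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)
    return bindings_to_rows(sparql.query().convert())
```

Converting to plain dictionaries early keeps the display code independent of the SPARQL JSON result structure.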
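Step 10 could look roughly like the Streamlit page below. answer_fn is a hypothetical callable standing in for the template-matching and SPARQL pipeline of steps 8 and 9. It is assumed to return the result rows and the name of the variable to display. Only st.title, st.text_input and st.markdown are used from Streamlit.

```python
def format_answer(rows, column):
    """Render query rows as a Markdown bullet list, or a fallback message."""
    values = [row[column] for row in rows if column in row]
    if not values:
        return "No matching answer was found."
    return "\n".join(f"- {value}" for value in values)

def main(answer_fn):
    """Streamlit page. answer_fn(question) -> (rows, column) is a hypothetical
    hook for the regex-template, SPARQL and endpoint pipeline of steps 8 and 9."""
    import streamlit as st  # third-party; imported here so format_answer works without it

    st.title("Survey Specification Q&A")
    question = st.text_input("Enter a question about the survey specifications")
    if question:
        rows, column = answer_fn(question)
        st.markdown(format_answer(rows, column))
```

If this is saved as app.py with a call such as main(my_answer_fn) at the bottom, the page starts with streamlit run app.py. Streamlit reruns the script on each input change, so the answer is recomputed whenever the question changes.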

Claims (7)

1. A method for establishing a knowledge graph-based survey standard intelligent question-answering system is characterized by comprising the following steps:
step 1, establishing the ER relationships: designing the survey specification data, including the entity sets, the attributes of the entities and a relational-database ER diagram of the relationships between the entities, and collecting and storing the survey specification data in Excel tables;
step 2, database construction: combining the survey specification data collected in step 1 with the relational-database ER (entity-relationship) diagram to construct a survey specification relational database;
step 3, ontology modeling: constructing the survey specification domain ontology top-down based on the survey specification relational database obtained in step 2;
step 4, creating the mapping: using the D2RQ tool, according to the mapping standard established by the World Wide Web Consortium (W3C), to generate a mapping.ttl mapping file from the survey specification relational database obtained in step 2;
step 5, converting the mapping file obtained in step 4 into RDF data with dump-rdf, the conversion tool provided by D2RQ, and storing the data in N-Triples, the default format;
step 6, storing and managing the data: importing the RDF data through the network interface of the Jena Fuseki component, persisting it as TDB files, running fuseki-server.bat, and then exiting;
step 7, rule reasoning: performing knowledge reasoning over the RDF data from step 6 with Jena's OWLFBRuleReasoner, combined with the survey specification domain ontology file constructed in step 3;
step 8, regular-expression semantic parsing: using regular expressions to split, combine and match preset questions at the string level, thereby realizing combined string queries, i.e. the input of questions;
step 9, building the question-answering system in Python: implementing interaction with the RDF data in the Python programming language and visually displaying the results of RDF data retrieval and reasoning.
2. The method for establishing a knowledge-graph-based survey-specification intelligent question-answering system according to claim 1, wherein the step 1 is as follows:
step 1.1, determining related standard fields and ranges;
step 1.2, organizing the collected content into three parts, namely the standard type, the standard content and the standard rule, while unifying the correspondence between standard types and standard contents and the correspondence between standard contents and standard rules;
step 1.3, establishing a standard type data table, a standard content data table, a standard rule data table, a standard content and standard type corresponding relation data table and a standard rule and standard content corresponding relation data table in the Excel table.
3. The method for establishing a knowledge-graph-based survey-specification intelligent question-answering system according to claim 2, wherein the step 2 is as follows:
step 2.1, opening Navicat Premium 15, a visual administration tool for MySQL databases, connecting it to the MySQL database, and creating an empty survey specification relational database;
step 2.2, importing the standard type data table, the standard content data table, the standard rule data table, the standard content and standard type corresponding relation data table and the standard rule and standard content corresponding relation data table which are constructed in the step 1 into the newly-built survey standard relation database in the step 2.1;
step 2.3, setting the field types and field lengths and adding primary keys for the relational data imported in step 2.2; using the index and foreign-key commands in Navicat Premium 15 to set the index relations of the standard content and standard type corresponding relation data table and of the standard rule and standard content corresponding relation data table, thereby constructing the survey standard relational database;
step 2.4, saving the survey standard relational database constructed in step 2.3 as an .sql script file.
4. The method for establishing a knowledge-graph-based survey-specification intelligent question-answering system according to claim 3, wherein the step 3 is as follows:
step 3.1, determining the three entities of survey specification type, survey specification content and survey specification detailed rule, together with their attribute relations and data relations; setting the attribute types while specifying the attribute characteristics needed for subsequent knowledge reasoning with the Jena tool; and then completing the construction of the survey specification domain ontology with the open-source software Protégé;
step 3.2, formally storing the constructed survey specification domain ontology, using the RDF/XML description language for storage.
5. The method for establishing a knowledge-graph-based survey-specification intelligent question-answering system according to claim 4, wherein the step 4 is as follows:
step 4.1, using the generic mapping script in the third-party open-source package D2RQ to connect to the survey specification relational database obtained in step 2, i.e. the .sql file, and generating a predefined mapping file, i.e. mapping.ttl;
step 4.2, modifying the mapping file obtained in step 4.1 according to the survey specification domain ontology file constructed in step 3:
first, prefixes are declared for the ontology IRIs, e.g. criterion stands for http://www.kancha.com#criterion, and likewise for the other terms; then the default generated mapping vocabulary is replaced with the ontology vocabulary, giving the modified mapping.ttl mapping file;
step 4.3, starting D2R Server with a script command to verify the mapping file modified in step 4.2: first calling the script file to run the survey specification mapping file; once the server has started successfully, opening the interactive page at http://localhost:2020/ in a browser to access the data and check the completeness of the related data.
6. The method for establishing a knowledge-graph-based survey-specification intelligent question-answering system according to claim 5, wherein the step 7 is as follows:
step 7.1, entering the 'apache-jena-fuseki-3.17.0' folder, running 'fuseki-server.bat' and then exiting, whereupon a 'run' folder is created automatically in the current directory; moving the survey specification domain ontology file obtained in step 3 into the 'databases' folder under the run folder, and changing its suffix from owl to ttl;
step 7.2, configuring a rules file under the databases folder to define the inference rules: the rules file is based on the RDFS, OWL and generic rule reasoners in Jena, formalizes and instantiates the relations between the constructed ontology entities according to the rule syntax defined in the official documentation, and has the suffix ttl;
step 7.3, after the rules file is configured, configuring the fuseki_conf.ttl file in the configuration directory under the run folder to assemble the inference engine: following Jena's official inference rule language, referencing the locations of the ontology .ttl file and the rules.ttl file and stating the inference scope; after these files are configured, starting the fuseki-server.bat service again, visiting http://localhost:3030/ in a browser, and accessing and reasoning over the RDF data through query statements.
7. The method for establishing a knowledge-graph-based survey-specification intelligent question-answering system according to claim 6, wherein the step 8 is as follows:
step 8.1, word segmentation and entity recognition of the input question: segmentation and entity recognition are performed with the jieba tool, using the relevant survey domain vocabulary as an external dictionary; loading this external dictionary into jieba addresses the entity recognition problem.
step 8.2, the regular-expression matching works as follows: each word in a user query is treated as an object with two basic attributes, its surface form and its part of speech, and matching rules are defined with the open-source tool REFO; when a combination containing survey domain vocabulary appears, a rule matches and its preset function is executed. For each user query, the open-source segmentation tool jieba first performs word segmentation and part-of-speech tagging to obtain an object list; the object list is then matched one by one against the rules defined with REFO, and if a match succeeds, the corresponding function is executed.
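The REFO-style matching of step 8.2 treats a query as a list of (word, part-of-speech) objects and matches rules over that list. The sketch below is a simplified, dependency-free stand-in for REFO, not its API. Predicates are matched in order with gaps allowed, like the regular expression A.*B. In the real pipeline the word list would come from jieba.posseg.cut after the survey dictionary has been loaded with jieba.load_userdict. Here the list is written out by hand. The custom part-of-speech tags spec and type and the action names are hypothetical.

```python
from typing import Callable, List, NamedTuple, Optional, Sequence

class Word(NamedTuple):
    token: str  # surface form, e.g. pair.word from jieba.posseg
    pos: str    # part-of-speech tag, e.g. pair.flag from jieba.posseg

Predicate = Callable[[Word], bool]

def pred(token: Optional[str] = None, pos: Optional[str] = None) -> Predicate:
    """Build a predicate that tests a word's surface form and/or POS tag."""
    return lambda w: (token is None or w.token == token) and (pos is None or w.pos == pos)

def find(pattern: Sequence[Predicate], words: Sequence[Word]) -> Optional[List[Word]]:
    """Match the predicates in order, allowing gaps (like the regex A.*B).
    Taking the leftmost word for each predicate finds a match whenever one exists."""
    hit: List[Word] = []
    i = 0
    for p in pattern:
        while i < len(words) and not p(words[i]):
            i += 1
        if i == len(words):
            return None
        hit.append(words[i])
        i += 1
    return hit

# Hypothetical rules: (pattern, action). 'spec' and 'type' are custom POS tags
# assumed to be assigned to survey terms through the user dictionary.
RULES = [
    ([pred(pos="spec"), pred(token="细则")],
     lambda hit: ("rules_of_spec", hit[0].token)),
    ([pred(pos="type"), pred(token="规范")],
     lambda hit: ("specs_of_type", hit[0].token)),
]

def dispatch(rules, words):
    """Run the action of the first rule that matches the word list, else None."""
    for pattern, action in rules:
        hit = find(pattern, words)
        if hit is not None:
            return action(hit)
    return None
```

Because the domain terms carry their own part-of-speech tags, a rule can refer to "any specification name" without listing the names themselves. The action's return value can then drive SPARQL template selection.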
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110658780.9A CN113407688B (en) 2021-06-15 2021-06-15 Method for establishing knowledge graph-based survey standard intelligent question-answering system

Publications (2)

Publication Number Publication Date
CN113407688A 2021-09-17
CN113407688B 2022-09-16

Family

ID=77683776

Country Status (1)

Country Link
CN (1) CN113407688B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114637766A (en) * 2022-05-18 2022-06-17 山东师范大学 Intelligent question-answering method and system based on natural resource industrial chain knowledge graph

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant