CN115982322A - Water conservancy industry design field knowledge graph retrieval method and retrieval system - Google Patents

Publication number
CN115982322A
Authority
CN
China
Prior art keywords
search
knowledge
module
question
water conservancy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211684902.2A
Other languages
Chinese (zh)
Inventor
冯燕青
徐朝辉
庞纪明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Water Planning And Designing Institute Co ltd
Original Assignee
Nanjing Water Planning And Designing Institute Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Water Planning And Designing Institute Co ltd filed Critical Nanjing Water Planning And Designing Institute Co ltd
Priority to CN202211684902.2A priority Critical patent/CN115982322A/en
Publication of CN115982322A publication Critical patent/CN115982322A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for retrieving a knowledge graph in the water conservancy industry design field, comprising the following steps: S1, a search sentence processing module processes an unstructured natural language question; S2, a search distribution module distributes the search request to different search modules of a core service layer according to the search intent; S3, after the query sentence entered by the user has been processed by the search sentence processing module and the search distribution module, the core service module passes it to the corresponding knowledge service sub-module for processing according to the search intent; and S4, a search result display module displays the data retrieved by the core service module. The retrieval system mainly comprises the search sentence processing module, the search distribution module, the core service module and the search result display module, and combines the semantic search function with template-based recognition and semantic-analysis-based recognition. The invention addresses the difficulty of extracting and using knowledge in the water conservancy design industry.

Description

Water conservancy industry design field knowledge graph retrieval method and retrieval system
Technical Field
The invention relates to a method for retrieving a knowledge graph, in particular to a method and a system for retrieving the knowledge graph in the design field of water conservancy industry.
Background
In library and information science, a knowledge graph is also called knowledge domain visualization or knowledge domain mapping. It is a family of graphs that show how knowledge develops and how it is structured: it uses visualization techniques to describe knowledge resources and their carriers, and to mine, analyze, construct, draw and display knowledge and the relationships among these resources and carriers. Through data mining, information processing, knowledge measurement and graph drawing, it presents a complex knowledge domain, reveals how that domain evolves, and provides a practical and valuable reference for scientific research. With the arrival of the digital era, information and knowledge in the water conservancy design industry are growing richer, and professional designers increasingly need to access this knowledge. Existing applications in this field, however, are few and limited in function.
Disclosure of Invention
The purpose of the invention is as follows: to provide a method and a system for retrieving a knowledge graph in the water conservancy industry design field that recognize queries accurately and handle a wide range of queries.
The technical scheme is as follows: the retrieval system of the present invention includes:
an input module, which accepts the search terms;
a search sentence processing module, which processes the search sentence and converts it into structured semantic triples;
a search distribution processing module, which judges the search intent and selects the core service;
a core service module, which provides the core data retrieval capability;
a search result display module, which displays the retrieved results as lists and graphs;
an output module, which outputs results as voice, text or pictures;
wherein the search distribution module, based on its judgment of the search intent, selects the sub-module of the core service module that provides the retrieval service.
The retrieval method of the invention is implemented with the retrieval system and comprises the following steps:
S1, the search sentence processing module performs dependency parsing on an unstructured natural language question to generate a syntactic dependency tree, and extracts structured semantic triples according to the structural features of that tree;
S2, the search distribution module identifies the user's search intent and distributes the search request to different search modules of the core service layer accordingly: common-knowledge queries about water conservancy design are distributed to a knowledge-graph-based search service, and knowledge question-and-answer queries are distributed to a question-and-answer-based search service;
S3, after the query sentence entered by the user has been processed by the search sentence processing module and the search distribution module, the core service module passes it to the corresponding knowledge service sub-module for processing according to the search intent;
S4, the search result display module displays the data retrieved by the core service module as graphs or lists.
Further, in step S1, the search sentence processing module generates a syntactic dependency tree from the unstructured natural language question through dependency parsing and then extracts structured semantic triples according to the structural features of that tree; the detailed implementation comprises the following steps:
S101, Chinese word segmentation, part-of-speech tagging and named entity recognition are performed with natural language processing models and algorithms, and dependency parsing and semantic dependency analysis are applied to the unstructured natural language question to generate the syntactic dependency tree;
S102, structured semantic triples are extracted according to the structural features of the syntactic dependency tree;
S103, syntactic and semantic dependency analysis is performed on the search sentence to identify the user's search intent.
Further, in step S2, common-knowledge queries about water conservancy design are distributed to a knowledge-graph-based search service and knowledge question-and-answer queries are distributed to a question-and-answer-based search service; the detailed implementation of identifying the user's search intent comprises the following steps:
S201, the user's search intent is identified from the keywords in the search text;
S202, keywords are mined from water conservancy design text data with the TextRank algorithm, and corresponding knowledge templates for the water conservancy design field are defined from these keywords;
S203, the user's query is matched against the knowledge templates; if a matching template is found, the query is passed to the knowledge graph search service; otherwise, it is passed to the knowledge question-and-answer search service.
Further, in step S3, the detailed implementation by which the core service module passes the query to the corresponding knowledge service sub-module according to the search intent comprises the following steps:
S301, the user's unstructured natural language query is converted into a structured knowledge graph query;
S302, each segmented word is classified as an entity, an attribute or a concept according to its part of speech, and the result is matched against the predefined knowledge templates of the water conservancy design field;
S303, if no template matching the natural language query is found among the predefined knowledge templates, a structured query is generated by semantic-based extraction;
S304, the question-and-answer retrieval service uses the question-and-answer dataset index stored in index documents as its underlying data support; keywords are extracted from the user's natural language query, and a full-text index of the question-and-answer data is built with Lucene, so that the question-and-answer data relevant to the user's search can be queried.
Further, in step S4, the detailed implementation of displaying the data retrieved by the core service module as graphs or lists comprises the following steps:
S401, retrieval and display of the extracted relationship graph;
S402, retrieval and display of the knowledge question-and-answer list;
S403, the returned retrieval result data is provided as a service through interfaces, message queues and files.
Compared with the prior art, the invention has the following remarkable effects:
1. In the retrieval system, the semantic search function combines two strategies, template-based recognition and recognition based on semantic analysis, with semantic analysis supplementing pattern recognition. The system first performs word segmentation and entity recognition on the user's search sentence, judging the part of speech and entity type of each word, that is, whether the word is an entity, an attribute or a concept. It then matches the query against the predefined knowledge templates of the water conservancy design field. A predefined knowledge template is a subgraph of the constructed knowledge graph and is used to query the information of a node or an edge in that graph. Once a suitable knowledge template has been matched, the system therefore searches for entity information that can form a subgraph together with the template;
2. The semantic search in the retrieval method follows the connections of the graph to find knowledge, so it recognizes well the knowledge that can only be obtained through reasoning or associated information. It achieves high precision on test sentences, recognizes queries more accurately and handles a wide range of queries, and it assists professional designers with knowledge query and intelligent recommendation during their work.
Drawings
FIG. 1 is a block diagram of a retrieval system according to the present invention;
FIG. 2 is a flow chart of a retrieval method of the present invention;
FIG. 3 (a) is a first example of syntactic dependency tree analysis according to the present invention;
FIG. 3 (b) is a second example of syntactic dependency tree analysis according to the present invention;
FIG. 4 is a graph of the precision of test statements of the present invention;
FIG. 5 is a flow chart of Lucene construction of a search application according to the present invention;
FIG. 6 is an entity attribute display diagram of projects and their corresponding features according to the present invention;
FIG. 7 is an entity relationship display diagram of projects and recommended design methods according to the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and specific embodiments. The technical solutions in the embodiments are described clearly and completely; the described embodiments are evidently only some, not all, of the embodiments of the invention. The specific embodiments described here merely illustrate the invention and do not limit it.
As shown in fig. 1, the retrieval system of the present invention includes the following modules:
the input module is used for inputting the search words, and can be in various formats such as voice, characters or pictures;
the search sentence processing module is used for processing the search sentences and converting the search sentences into structured semantic triples;
the search distribution processing module is used for providing search intention judgment and core service selection functions;
the core service module provides the data retrieval capability of the core;
the search result display module is used for displaying the retrieved results in a list, a graph and the like;
and the output module provides various formats of output such as voice, characters or pictures.
As shown in fig. 2, the searching method of the present invention includes the following steps:
S1, the search sentence processing module performs dependency parsing on an unstructured natural language question to generate a syntactic dependency tree, and extracts structured semantic triples according to the structural features of that tree, which supports the search distribution module. The implementation comprises the following steps:
S101, Chinese word segmentation, part-of-speech tagging and named entity recognition are performed with natural language processing models and algorithms, and dependency parsing and semantic dependency analysis are applied to the unstructured natural language question to generate the syntactic dependency tree. For example, the Java-based HanLP toolkit may be chosen for the natural language processing work. HanLP contains a series of natural language processing models and algorithms whose models are already trained and usable in a production environment. Its components include Chinese word segmentation, part-of-speech tagging, named entity recognition and dependency parsing. Semantic dependency parsing (SDP) reveals the semantic dependency structure of a text by analyzing the semantic dependency relations between words. Its purpose is to characterize meaning through the semantic framework of a sentence: it abstracts away from the specific vocabulary and only needs to recognize semantic role relations. It establishes dependency relations between words that are directly semantically associated and labels each relation with its semantic type.
S102, structured semantic triples are extracted according to the structural features of the syntactic dependency tree, which allows the computer to understand the natural language. A syntactic dependency tree is a data structure formed by connecting the words of a sentence according to their grammatical dependencies. The tree improves understanding of the sentence's grammatical structure and addresses the problem of long-distance dependencies in natural language.
S103, to show that two different phrasings express the same meaning, the head words and dependency relations of the corresponding words in the two instances are compared to determine whether the instances carry the same semantics. Syntactic and semantic dependency analysis of the search sentence, illustrated in FIG. 3 (a) and (b), identifies the user's search intent. In addition, semantic relation triples can be extracted from the entity relations found by dependency parsing.
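The triple extraction of step S102 can be illustrated with a minimal Python sketch. It assumes the dependency parse is already available as a list of arcs and does not call HanLP; the `Arc` type, the relation labels `SBV` (subject-verb), `VOB` (verb-object) and `HED` (root), and the sample sentence are illustrative assumptions, not details given in the patent.

```python
from typing import NamedTuple


class Arc(NamedTuple):
    """One token of a dependency parse (hypothetical structure)."""
    idx: int    # 1-based position of the token
    word: str   # surface form
    head: int   # position of the head token, 0 for the root
    rel: str    # dependency label (assumed label set)


def extract_triples(arcs):
    """Pair each subject with each object that shares the same head verb."""
    subjects, objects = {}, {}
    for arc in arcs:
        if arc.rel == "SBV":
            subjects.setdefault(arc.head, []).append(arc.word)
        elif arc.rel == "VOB":
            objects.setdefault(arc.head, []).append(arc.word)
    words = {arc.idx: arc.word for arc in arcs}
    triples = []
    for head, subs in subjects.items():
        for subj in subs:
            for obj in objects.get(head, []):
                triples.append((subj, words[head], obj))
    return triples


# Hand-written parse of the sentence "sluice controls flow".
arcs = [
    Arc(1, "sluice", 2, "SBV"),
    Arc(2, "controls", 0, "HED"),
    Arc(3, "flow", 2, "VOB"),
]
triples = extract_triples(arcs)
```

Only the subject-predicate-object pattern is covered here; a fuller rule set would add further dependency patterns.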
S2, the search distribution module identifies the user's search intent and distributes the request to different search modules of the core service layer accordingly: common-knowledge queries about water conservancy design are distributed to the knowledge-graph-based search service, and knowledge question-and-answer queries are distributed to the question-and-answer-based search service. The implementation comprises the following steps:
S201, the user's search intent is identified from the keywords in the search text. Keywords related to the search intent are extracted with the TextRank algorithm.
TextRank is an adaptation of the PageRank algorithm: building on PageRank, it extracts keywords by computing the similarity between words in adjacent windows and using that similarity as the edge weights. The TextRank model can be represented as a directed weighted graph $G = (V, E)$ consisting of a vertex set $V$ and an edge set $E$, where $E$ is a subset of $V \times V$. The weight of the edge between any two vertices $V_i$ and $V_j$ is $w_{ji}$. For a given vertex $V_i$, $In(V_i)$ is the set of vertices that point to it, and $Out(V_i)$ is the set of vertices that $V_i$ points to. The formula is as follows:
$$WS(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} \, WS(V_j)$$
In the formula, $WS(V_i)$ is the weight of sentence $i$. The summation on the right-hand side is the contribution of each adjacent sentence to that sentence; within a single document all sentences can be treated as roughly adjacent, so, unlike the multi-document case, there is no need to generate and extract multiple windows and a single document window suffices. $w_{ji}$ is the similarity of the two sentences, $WS(V_j)$ is the weight of sentence $j$ from the previous iteration, and $d$ is the damping coefficient, typically 0.85.
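The iteration behind the TextRank weight can be sketched in a few lines of Python. This is a minimal illustration of the standard update rule, assuming the graph is given as a nested dictionary where `weights[j][i]` holds the edge weight from node j to node i; the function name, the fixed iteration count and the toy graph are assumptions made for the example.

```python
def textrank(weights, d=0.85, iterations=50):
    """Iterate WS(i) = (1 - d) + d * sum_j (w_ji / sum_k w_jk) * WS(j)."""
    nodes = set(weights)
    for targets in weights.values():
        nodes.update(targets)
    # Total outgoing weight of every node (the inner sum of the formula).
    out_sum = {j: sum(weights.get(j, {}).values()) for j in nodes}
    scores = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        updated = {}
        for i in nodes:
            incoming = 0.0
            for j in nodes:
                w_ji = weights.get(j, {}).get(i, 0.0)
                if w_ji > 0 and out_sum[j] > 0:
                    incoming += w_ji / out_sum[j] * scores[j]
            updated[i] = (1 - d) + d * incoming
        scores = updated
    return scores


# Toy co-occurrence graph: "dam" is linked to both other words.
graph = {
    "dam": {"spillway": 1.0, "reservoir": 1.0},
    "spillway": {"dam": 1.0},
    "reservoir": {"dam": 1.0},
}
scores = textrank(graph)
keywords = sorted(scores, key=scores.get, reverse=True)
```

A fixed number of iterations keeps the sketch short; a convergence threshold on the change in scores is the more common stopping rule.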
S202, keywords are mined from water conservancy design text data with the TextRank algorithm, and corresponding knowledge templates for the water conservancy design field are defined from these keywords.
S203, the user's query is matched against the knowledge templates; if a matching template is found, the query is passed to the knowledge graph search service; otherwise, it is passed to the knowledge question-and-answer search service.
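Steps S201 to S203 amount to a small dispatcher, sketched below in Python. The template names, their keyword sets and the two service names are hypothetical; the patent does not specify how templates are stored or how a match is scored, so this sketch simply requires every keyword of a template to appear in the query and prefers the most specific template.

```python
# Hypothetical knowledge templates: template name -> keywords that trigger it.
TEMPLATES = {
    "project_attribute": {"project", "feature"},
    "recommended_method": {"project", "design", "method"},
}


def dispatch(query_keywords, templates=TEMPLATES):
    """Route a query to the knowledge graph service if a template matches."""
    keywords = set(query_keywords)
    matches = [name for name, required in templates.items()
               if required <= keywords]
    if matches:
        # Prefer the template that covers the most keywords.
        best = max(matches, key=lambda name: len(templates[name]))
        return "knowledge_graph_search", best
    return "question_answer_search", None


route_kg = dispatch(["sluice", "project", "design", "method"])
route_qa = dispatch(["why", "seepage", "occur"])
```

Here `route_kg` selects the knowledge graph service with the `recommended_method` template, while `route_qa` falls through to the question-and-answer service because no template matches.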
S3, after the query sentence entered by the user has been processed by the search sentence processing module and the search distribution module, the core service module passes it to the corresponding knowledge service sub-module for processing according to the search intent. The implementation comprises the following steps:
S301, the user's unstructured natural language query is converted into a structured knowledge graph query, that is, the entities and relations are extracted from the user's natural language query.
S302, each segmented word is classified as an entity, an attribute or a concept according to information such as its part of speech. The result is then matched against the predefined knowledge templates of the water conservancy design field. A knowledge template can essentially be regarded as a subgraph of the constructed knowledge graph. The matching process is as follows:
A1, candidate knowledge templates are determined from the recognized entities and their types.
A2, for each candidate, it is checked whether the candidate entities and the knowledge template can form a subgraph of the knowledge graph, and the template with the highest degree of matching is selected.
A3, once the knowledge template has been determined, it is translated into the corresponding Cypher query, for example for execution on Neo4j. Taking the template "entity + attribute" as an example, the corresponding query is "MATCH (x:EntityType {attributeName: attributeValue}) RETURN x".
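Step A3 can be sketched as a function that turns the "entity + attribute" template into a Cypher string. The function name and the sample label and property are hypothetical. Because Cypher parameters cannot stand in for node labels or property keys, the sketch validates those two identifiers and passes only the attribute value as a parameter; it builds the query text only and does not connect to a database.

```python
import re

# Conservative pattern for labels and property keys inserted into the query.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def entity_attribute_query(entity_type, attribute_name, attribute_value):
    """Build a parameterized Cypher query for the "entity + attribute" template."""
    for identifier in (entity_type, attribute_name):
        if not _IDENTIFIER.match(identifier):
            raise ValueError(f"unsafe identifier: {identifier!r}")
    query = f"MATCH (x:{entity_type} {{{attribute_name}: $value}}) RETURN x"
    return query, {"value": attribute_value}


# Hypothetical label and property, for illustration only.
query, params = entity_attribute_query("Project", "type", "sluice")
```

The resulting `query` and `params` pair can then be handed to whichever graph database client the deployment uses.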
S303, if no template matching the natural language query is found among the predefined knowledge templates, a structured query can be generated by semantic-based extraction. The syntactic dependency tree of the query sentence has already been obtained in the search sentence processing module. By identifying the relation phrases in that tree, the entities can be recognized through syntactic rules (mainly the subject-predicate structure and the pioneer structure) to generate a set of semantic triples, from which the structured query is generated. Finally, the structured query obtained by either template matching or semantic extraction is submitted to the graph database for execution, and an accurate answer is returned to the user.
S304, the question-and-answer retrieval service uses the question-and-answer dataset index stored in index documents as its underlying data support. Keywords are extracted from the user's natural language query, and a full-text index of the question-and-answer data is built with Lucene so that the question-and-answer data relevant to the user's search can be queried; the process by which Lucene builds a search application is shown in FIG. 5. Lucene offers two strategies for ordering retrieval results: by index order, or by the similarity between the query and each document. The similarity is obtained by summing, over the terms of the segmented query, each term's matching value against the document. When computing the matching value, Lucene takes into account the different weight of each term in the query as well as factors such as score normalization. This allows accurate results to be returned for a query; the comparison of query precision is shown in FIG. 4.
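The idea of scoring a document by summing per-term matching values can be shown with a simplified Python sketch. It uses a plain TF-IDF weighting as a stand-in and is not Lucene's actual scoring formula; the function name, the weighting and the sample question-and-answer entries are assumptions for illustration.

```python
import math
from collections import Counter


def rank(query_terms, documents):
    """Score documents by summed TF-IDF of query terms, highest first.

    documents maps a document id to its list of segmented terms.
    """
    total = len(documents)
    doc_freq = Counter()
    for terms in documents.values():
        doc_freq.update(set(terms))
    scored = []
    for doc_id, terms in documents.items():
        term_freq = Counter(terms)
        score = 0.0
        for term in query_terms:
            if term_freq[term]:
                idf = math.log(1 + total / doc_freq[term])
                score += term_freq[term] / len(terms) * idf
        if score > 0:
            scored.append((doc_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical question-and-answer entries, already segmented.
qa_index = {
    "q1": ["spillway", "design", "flow"],
    "q2": ["pump", "station", "design"],
    "q3": ["levee", "height"],
}
results = rank(["spillway", "design"], qa_index)
ranked_ids = [doc_id for doc_id, _ in results]
```

In this toy index `q1` ranks above `q2` because it matches both query terms, and `q3` is dropped because it matches none.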
S4, the search result display module displays the knowledge-graph-based water conservancy design search data in graphical form. The implementation comprises the following steps:
S401, retrieval and display of the extracted relationship graph. The user searches for relevant project features and obtains the corresponding knowledge graph display; for example, similar projects are found from project features as shown in FIG. 6, and more design method options are offered for a given project as shown in FIG. 7, so that more comprehensive and intuitive knowledge associations are obtained.
S402, retrieval and display of the knowledge question-and-answer list. The index documents are searched with the query sentence entered by the user, the similarity between the question and each index document is computed, and the query results are returned sorted by similarity in descending order. After the request is received, it is processed accordingly, and the corresponding knowledge result information is queried and returned.
S403, the returned retrieval result data is provided as a service through interfaces, message queues, files and the like.
The foregoing describes only some embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the invention, and such modifications and refinements shall also fall within the protection scope of the invention.

Claims (6)

1. A retrieval system for a knowledge graph in the water conservancy industry design field, characterized by comprising:
an input module, which accepts the search terms;
a search sentence processing module, which processes the search sentence and converts it into structured semantic triples;
a search distribution processing module, which judges the search intent and selects the core service;
a core service module, which provides the core data retrieval capability;
a search result display module, which displays the retrieved results as lists and graphs;
an output module, which outputs results as voice, text or pictures;
wherein the search distribution module, based on its judgment of the search intent, selects the sub-module of the core service module that provides the retrieval service.
2. A method for retrieving a knowledge graph in the water conservancy industry design field, implemented with the retrieval system for a knowledge graph in the water conservancy industry design field according to claim 1, comprising the following steps:
S1, the search sentence processing module performs dependency parsing on an unstructured natural language question to generate a syntactic dependency tree, and extracts structured semantic triples according to the structural features of that tree;
S2, the search distribution module identifies the user's search intent and distributes the search request to different search modules of the core service layer accordingly: common-knowledge queries about water conservancy design are distributed to a knowledge-graph-based search service, and knowledge question-and-answer queries are distributed to a question-and-answer-based search service;
S3, after the query sentence entered by the user has been processed by the search sentence processing module and the search distribution module, the core service module passes it to the corresponding knowledge service sub-module for processing according to the search intent;
S4, the search result display module displays the data retrieved by the core service module as graphs or lists.
3. The method for retrieving a knowledge graph in the water conservancy industry design field according to claim 2, wherein in step S1 the search sentence processing module generates a syntactic dependency tree from the unstructured natural language question through dependency parsing and then extracts structured semantic triples according to the structural features of that tree, the detailed implementation comprising the following steps:
S101, Chinese word segmentation, part-of-speech tagging and named entity recognition are performed with natural language processing models and algorithms, and dependency parsing and semantic dependency analysis are applied to the unstructured natural language question to generate the syntactic dependency tree;
S102, structured semantic triples are extracted according to the structural features of the syntactic dependency tree;
S103, syntactic and semantic dependency analysis is performed on the search sentence to identify the user's search intent.
4. The method for retrieving a knowledge graph in the water conservancy industry design field according to claim 2, wherein in step S2 common-knowledge queries about water conservancy design are distributed to a knowledge-graph-based search service and knowledge question-and-answer queries are distributed to a question-and-answer-based search service, the detailed implementation of identifying the user's search intent comprising the following steps:
S201, the user's search intent is identified from the keywords in the search text;
S202, keywords are mined from water conservancy design text data with the TextRank algorithm, and corresponding knowledge templates for the water conservancy design field are defined from these keywords;
S203, the user's query is matched against the knowledge templates; if a matching template is found, the query is passed to the knowledge graph search service; otherwise, it is passed to the knowledge question-and-answer search service.
5. The method for retrieving a knowledge graph in the water conservancy industry design field according to claim 2, wherein in step S3 the detailed implementation by which the core service module passes the query to the corresponding knowledge service sub-module according to the search intent comprises the following steps:
S301, the user's unstructured natural language query is converted into a structured knowledge graph query;
S302, each segmented word is classified as an entity, an attribute or a concept according to its part of speech, and the result is matched against the predefined knowledge templates of the water conservancy design field;
S303, if no template matching the natural language query is found among the predefined knowledge templates, a structured query is generated by semantic-based extraction;
S304, the question-and-answer retrieval service uses the question-and-answer dataset index stored in index documents as its underlying data support; keywords are extracted from the user's natural language query, and a full-text index of the question-and-answer data is built with Lucene, so that the question-and-answer data relevant to the user's search can be queried.
6. The method for retrieving a knowledge graph in the water conservancy industry design field according to claim 2, wherein in step S4 the detailed implementation of displaying the data retrieved by the core service module as graphs or lists comprises the following steps:
S401, retrieval and display of the extracted relationship graph;
S402, retrieval and display of the knowledge question-and-answer list;
S403, the returned retrieval result data is provided as a service through interfaces, message queues and files.
CN202211684902.2A 2022-12-27 2022-12-27 Water conservancy industry design field knowledge graph retrieval method and retrieval system Pending CN115982322A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211684902.2A CN115982322A (en) 2022-12-27 2022-12-27 Water conservancy industry design field knowledge graph retrieval method and retrieval system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211684902.2A CN115982322A (en) 2022-12-27 2022-12-27 Water conservancy industry design field knowledge graph retrieval method and retrieval system

Publications (1)

Publication Number Publication Date
CN115982322A true CN115982322A (en) 2023-04-18

Family

ID=85967654

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211684902.2A Pending CN115982322A (en) 2022-12-27 2022-12-27 Water conservancy industry design field knowledge graph retrieval method and retrieval system

Country Status (1)

Country Link
CN (1) CN115982322A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116775947A (en) * 2023-06-16 2023-09-19 北京枫清科技有限公司 Graph data semantic retrieval method and device, electronic equipment and storage medium
CN116775947B (en) * 2023-06-16 2024-04-19 北京枫清科技有限公司 Graph data semantic retrieval method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination