CN117851577A - Government service question-answering method based on knowledge graph enhanced large language model - Google Patents
Government service question-answering method based on knowledge graph enhanced large language model
- Publication number
- CN117851577A (application CN202410252313.XA; granted as CN117851577B)
- Authority
- CN
- China
- Prior art keywords
- questions
- question
- knowledge
- user
- language model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Business, Economics & Management (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Tourism & Hospitality (AREA)
- Mathematical Physics (AREA)
- Educational Administration (AREA)
- Economics (AREA)
- Human Resources & Organizations (AREA)
- Marketing (AREA)
- Primary Health Care (AREA)
- Strategic Management (AREA)
- General Business, Economics & Management (AREA)
- Human Computer Interaction (AREA)
- Development Economics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a government service question-answering method based on a knowledge graph enhanced large language model. The method first constructs a government service knowledge graph and a common question answering library from historical questions; it then logically divides the original government service documents and inputs the resulting fragments into a large language model for vectorization to obtain fragment vectors. After a user question is received, the method judges whether it matches a historical question in the library; if so, an answer is output based on the answer to that historical question; if not, knowledge linking and vectorization are performed on the user question to obtain knowledge vectors, and the whole user question is also vectorized to obtain a complete question vector. A joint search based on the knowledge vectors and the complete question vector yields a candidate document fragment list, which is input into the large language model to generate the final answer. The method addresses the problem in the prior art that a large language model cannot be combined with a knowledge graph and therefore cannot be applied well to government service scenarios.
Description
Technical Field
The invention relates to the technical field of knowledge graphs, and in particular to a government service question-answering method based on a knowledge graph enhanced large language model.
Background
A knowledge graph, also called a science knowledge map and referred to in library and information science as knowledge domain visualization or a knowledge domain mapping map, is a series of graphs that display the development process and structural relationships of knowledge. It uses visualization technology to describe knowledge resources and their carriers, and to mine, analyze, construct, draw and display knowledge and the interrelations among knowledge items.
A large language model (Large Language Model, LLM) is a deep learning model trained on large amounts of text data that can generate natural language text or understand the meaning of language text. LLMs can handle a variety of natural language tasks, such as text classification, question answering and dialogue, and are an important route toward artificial intelligence.
Existing LLMs cannot be combined with knowledge graphs and therefore cannot be applied well to government service scenarios, so a government service question-answering method based on a knowledge graph enhanced large language model is needed.
Disclosure of Invention
The embodiments of the invention aim to provide a government service question-answering method based on a knowledge graph enhanced large language model, which is used to solve the problem in the prior art that large language models (LLMs) cannot be combined with knowledge graphs and therefore cannot be applied well to government service scenarios.
In order to achieve the above object, an embodiment of the present invention provides a government service question-answering method based on a knowledge graph enhanced large language model, the method specifically comprising:
acquiring historical questions, and constructing a government service knowledge graph based on the historical questions;
extracting answers to the historical questions, and constructing a common question answering library based on the historical questions and their corresponding answers;
obtaining original government service documents, logically dividing them into fragments by different division methods, and inputting the fragments into a large language model for vectorization to obtain fragment vectors;
acquiring a user question, and judging whether the user question matches a historical question in the common question answering library;
if matched, outputting a question answer based on the answer to the historical question;
if not matched, performing knowledge linking between the user question and the government service knowledge graph to obtain several pieces of knowledge, and inputting these pieces of knowledge into the large language model one by one for vectorization to obtain knowledge vectors; meanwhile, inputting the whole user question into the large language model for vectorization to obtain a complete question vector;
performing a joint search based on the knowledge vectors and the complete question vector to obtain a candidate document fragment list;
and inputting the candidate document fragment list into the large language model to generate a question answer.
Based on the technical scheme, the invention can also be improved as follows:
further, the obtaining the historical problem, constructing a government service knowledge graph based on the historical problem, includes:
constructing a business knowledge system, including classifying business domains of government services, refining user objects of government services, refining business scenes of government services and refining business processes of government services.
Further, the obtaining the historical problem, constructing a government service knowledge graph based on the historical problem, includes:
extracting a service label corresponding to the history problem;
and splitting the complex service into a plurality of sub-services, and performing cross-service association on the sub-services.
Further, the extracting answers to the historical questions and constructing a common question answering library based on the historical questions and their corresponding answers includes:
extracting key questions from the historical questions, and constructing a key question answering library based on the key questions;
and extracting the answers corresponding to the key questions to build the key question answering library.
Further, the extracting answers to the historical questions and constructing a common question answering library based on the historical questions and their corresponding answers further includes:
after obtaining the answer to a key question, checking the answer and judging whether it is correct;
when the answer is correct, determining it as the answer corresponding to the historical question;
and when the answer is wrong, re-acquiring the answer corresponding to the key question and checking it again until it is correct.
Further, the acquiring a user question and judging whether the user question matches a historical question in the common question answering library further includes:
judging whether the user question contains only one question;
if the user question contains two or more questions, splitting it into several sub-questions;
inputting the sub-questions into the large language model to obtain the business category corresponding to each sub-question;
if the user question contains only one question, judging whether it matches a historical question in the common question answering library.
Further, the acquiring a user question and judging whether the user question matches a historical question in the common question answering library includes:
performing common question answering library matching according to the user question, where the matching methods include vector-based direct search, knowledge-vector-based search and label matching.
Further, the acquiring a user question and judging whether the user question matches a historical question in the common question answering library includes:
preferentially matching the key question answering library against the user question;
and if the key question answering library cannot be matched with the user question, matching the common question answering library against the user question.
Further, the performing a joint search based on the knowledge vectors and the complete question vector to obtain a candidate document fragment list includes:
when no candidate document fragment list is retrieved, re-acquiring supplementary questions from the user until a candidate document fragment list is retrieved.
Further, the government service question-answering method based on the knowledge graph enhanced large language model further includes:
performing knowledge linking through the government service knowledge graph, and verifying the question answer based on the result of the knowledge linking.
The embodiment of the invention has the following advantages:
according to the government service question-answering method based on the knowledge graph enhanced large language model, a government service knowledge graph is constructed based on historical problems by acquiring the historical problems; extracting answers of the historical questions, and constructing a common question answering library based on the historical questions and answers corresponding to the historical questions; obtaining an original government service document, logically dividing the original government service document by different dividing methods to obtain fragments, inputting the fragments into a large language model for vectorization to obtain fragment vectors; acquiring user questions, and judging whether the user questions are matched with historical questions in a common question answering library; if so, outputting a question answer based on the answer of the history question; if the knowledge is not matched, carrying out knowledge linking on the user problem and the government service knowledge graph to obtain a plurality of knowledge, and inputting the plurality of knowledge into a large language model one by one to carry out vectorization to obtain a knowledge vector; meanwhile, inputting the user problem into a large language model for vectorization to obtain a complete problem vector; performing joint search based on the knowledge vector and the complete problem vector to obtain a candidate document fragment list; the candidate document fragment list is input into the large language model to generate a question answer, so that the problem that the large language model cannot be combined with a knowledge graph in the prior art, and further cannot be well applied to government service scenes is solved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It will be apparent to those skilled in the art from this disclosure that the drawings described below are merely exemplary and that other embodiments may be derived from the drawings provided without undue effort.
FIG. 1 is a flowchart of the government service question-answering method based on a knowledge graph enhanced large language model of the present invention;
FIG. 2 is a block diagram of the government service question-answering system based on a knowledge graph enhanced large language model of the present invention;
FIG. 3 is a schematic diagram of the entity structure of an electronic device according to the present invention.
Wherein the reference numerals are as follows:
knowledge graph construction module 10, common question answering library construction module 20, fragment vector acquisition module 30, question answer generation module 40, electronic device 50, processor 501, memory 502, bus 503.
Detailed Description
Other objects and advantages of the present invention will become apparent to those skilled in the art from the following detailed description, which describes, by way of illustration, certain but not all embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the invention without creative effort fall within the scope of the invention.
FIG. 1 is a flowchart of an embodiment of the government service question-answering method based on a knowledge graph enhanced large language model. As shown in FIG. 1, the method provided by the embodiment of the invention comprises the following steps:
S101, acquiring historical questions, and constructing a government service knowledge graph based on the historical questions;
In the field of government services, the knowledge content of the question-answering robot is relatively concentrated and revolves mainly around government service matters; combing the business logic of government services and constructing the knowledge system are the basis for the robot's man-machine dialogue. Therefore, before the government service knowledge graph is constructed in S101, the business logic of government services is also combed.
From a government business perspective, one matter is often associated with multiple scenarios, and the corresponding information and service resources differ for each scenario. For example, services related to the resident identity card can be subdivided into sub-items such as first application, replacement and loss report, and each sub-item in turn covers a number of specific situations. Therefore, to understand demands more accurately and return more accurate dialogue replies, the specific situation of the user needs to be further refined, which requires combing the business logic of government services and constructing a knowledge base and a knowledge graph on that basis to support the subsequent question-answering steps.
Specifically, in this embodiment, combing the government service logic includes at least three approaches: first, refining by process, where matters with complicated procedures are refined by handling link and the related resources are integrated; second, refining by user object, where matters that involve many user objects with different requirements are expanded separately for each class of object; third, refining by specific situation, where matters whose requirements differ across many subdivided scenarios are expanded by specific situation. In addition, refinement can also be carried out along dimensions such as handling conditions and business domains.
In this embodiment, the business logic is combed by taking household registration (hukou) services as an example:
first, the main business item of the household registration service is: household registration handling;
second, the business sub-items of the household registration service are divided into: birth registration, moving a hukou into Beijing, hukou transfer, hukou cancellation and hukou restoration;
and the specific situations included in each business sub-item are further divided:
the specific situations of birth registration cover combinations of the parents' hukou types, for example: both parents hold Beijing collective hukou; both parents hold Beijing family hukou; one parent is a service member stationed in Beijing and the other holds a Beijing collective or family hukou; one parent holds a Beijing collective hukou and the other holds a hukou of another province or city; one parent holds a Beijing family hukou and the other holds a Beijing collective hukou;
the specific situations of moving a hukou into Beijing are divided into: a spouse joining the other spouse, elderly parents joining their children, and children joining their parents;
the specific situations of hukou transfer are divided into: transfer within the city, transfer out of the city, and transfer into the city from outside;
the specific situation of hukou cancellation is: registering the cancellation of a hukou;
the specific situation of hukou restoration is: restoring a hukou.
The constructing a government service knowledge graph in S101 further includes: constructing a business knowledge system, including classifying the business domains of government services, refining the user objects of government services, refining the business scenarios of government services, and refining the business processes of government services;
The constructing a government service knowledge graph in S101 further includes: extracting business labels corresponding to the historical questions;
Specifically, the business labels corresponding to the historical questions include colloquial labels and the like. From the user's perspective, most users of the government service question-answering robot tend to enter a few colloquial, business-related keywords, such as "apply for ID card", "lost ID card" or "replace ID card", and rarely enter complete sentences, so these inputs seldom match the official names of government service items. Therefore, on the basis of the combed business logic, the scheme extracts labels of these colloquial types from the relevant historical questions, providing a basis for matching user questions with policy and service content in subsequent steps and further improving the accuracy of that matching.
The constructing a government service knowledge graph in S101 further includes: splitting complex businesses into several sub-businesses and performing cross-business association on the sub-businesses.
In this embodiment, a complex business is one that, from the user's perspective, has to be handled across several different domains and types, and must be handled in a certain sequential flow. To achieve the goal of handling a government service matter "in one visit", the scheme combs these complex businesses, sorts out the related sub-businesses and organizes them according to the specific business process. Through this step, when a question posed by the user involves a complex business, the complete business process spanning different domains and types can be obtained, the trouble of repeated questioning in different domains is avoided, and the user experience is improved.
After the business logic of government services has been combed, the business labels corresponding to the historical questions extracted, and complex businesses split into several sub-businesses, the government service knowledge graph can be further constructed, namely: constructing the knowledge graph from the government service business knowledge system, the government service knowledge flow and the cross-business associations. A minimal sketch of representing such knowledge as graph triples is given below.
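As an illustration only (not part of the claimed method), the combed business knowledge can be held as (head, relation, tail) triples. The class, relation and entity names below are assumptions chosen for the example, not terms fixed by the patent.

```python
# Minimal sketch: storing combed government-service knowledge as triples.
from collections import defaultdict

class GovServiceKG:
    def __init__(self):
        self.edges = defaultdict(list)          # head -> list of (relation, tail)

    def add_triple(self, head, relation, tail):
        self.edges[head].append((relation, tail))

    def neighbors(self, entity):
        """Return all knowledge directly linked to an entity."""
        return self.edges.get(entity, [])

kg = GovServiceKG()
# main item -> sub-items (household registration example from the text)
kg.add_triple("household registration", "has_sub_item", "birth registration")
kg.add_triple("household registration", "has_sub_item", "hukou transfer")
# sub-item -> specific situation
kg.add_triple("hukou transfer", "has_situation", "transfer within the city")
# colloquial label -> service item (supports later question matching)
kg.add_triple("replace ID card", "label_of", "resident ID card services")

print(kg.neighbors("household registration"))
```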
S102, extracting answers to the historical questions, and constructing a common question answering library based on the historical questions and their corresponding answers;
Specifically, a common question answering library, i.e., a frequently asked questions (FAQ) library, is a means of helping new users become familiar with a product or service: it collects the questions frequently raised by historical users and provides explicit answers to them, usually in text form. In this embodiment, the common question answering library is the set of question-answer pairs formed by all historical questions about government services and their corresponding answers.
After the historical questions are obtained in S102, this embodiment further includes: extracting key questions from the historical questions and constructing a key question answering library based on the key questions;
Typically, the 20% of questions consulted most frequently in the history can cover 80% of user demand scenarios, so these frequently consulted questions can be defined as key questions; the extraction means include clustering, similar-question discovery and other methods. Whether a historical question counts as frequently consulted is judged against a consultation threshold preset according to the actual application situation of the question-answering robot.
The extracting answers to the historical questions in S102 further includes: extracting the answers corresponding to the key questions and constructing the key question answering library;
For a key question, if its manual (human) service result has been rated a satisfactory answer by users, the manual service result is used as the answer to the key question; if the manual service result has not been rated satisfactory, a preliminary answer to the key question is extracted automatically in a way similar to the online question-answering mode, i.e., the subsequent document fragment segmentation, vectorization, search and answer generation process.
The extracting answers to the historical questions in S102 further includes: after obtaining the answer to a key question, checking the answer and judging whether it is correct;
when the answer is correct, determining it as the answer corresponding to the historical question;
and when the answer is wrong, re-acquiring the answer to the historical question until it is correct.
Through this step, the key questions among the historical questions are sorted out and accurate answers are provided, which improves the matching efficiency when subsequent user questions are matched and raises the quality of service to users at relatively low cost.
S103, obtaining original government service documents, logically dividing them into fragments by different division methods, and inputting the fragments into a large language model for vectorization to obtain fragment vectors;
Specifically, a large language model (Large Language Model, LLM) is a natural language processing model based on deep learning that can learn the grammar and semantics of natural language and generate human-readable text.
In machine learning and natural language processing, vectorization (embedding) refers to the process of mapping high-dimensional data (such as text, pictures or audio) to a low-dimensional space. The result of vectorization is usually a vector of real numbers, which represents the input data as a point in a continuous numerical space. In short, the vectorization result is an N-dimensional real-valued vector that can represent almost anything, such as text, music or video.
Such real-valued vectors can represent the semantics of words mainly because they are learned from the patterns in which words occur in linguistic context. For example, if one word often appears together with another word in certain contexts, the vectors of the two words will occupy similar positions in the vector space, meaning that they have similar meanings and semantics.
The semantics of a word can thus be represented by its distribution in context; that is, the meaning of a word can be inferred from the words around it. Large language models such as BERT, ELMo and GPT can generate context-dependent vectorized representations that better capture the semantics and contextual information of words.
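By way of illustration, such a context-aware vector for a text span can be obtained with a BERT-style encoder through the Hugging Face transformers library; the model name and the mean-pooling choice below are assumptions for the sketch, not requirements of the patent.

```python
# Illustrative sketch: context-dependent embedding of a text span.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")   # illustrative model choice
encoder = AutoModel.from_pretrained("bert-base-chinese")

def embed(text: str) -> torch.Tensor:
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state              # (1, seq_len, dim)
    mask = inputs["attention_mask"].unsqueeze(-1)                 # ignore padding positions
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)           # mean pooling -> (1, dim)

vec = embed("How do I replace a lost resident ID card?")
```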
In S103, the original government service documents are logically divided into fragments by different division methods, which include non-business segmentation and document division based on the business knowledge system; non-business segmentation includes fixed-window division, sliding-window division, page-based division, paragraph-based division, chapter-and-section division and other logical divisions (a minimal sketch of the window-based divisions is given after the list below). In this scheme, "business" always refers to government service business.
Fixed-window division: the coarsest division method, which cuts directly at a fixed length and generally splits logical units at window boundaries;
Sliding-window division: to remedy the drawback of fixed-window division, a fixed-size window slides over the text with a certain amount of overlap allowed between adjacent windows; this approach of course duplicates part of the data between windows;
Page-based division: the policy content is logically divided by page, which obviously also splits logic across page boundaries;
Paragraph-based division: the document is divided by paragraph, which preserves the semantics of the original document at the paragraph level;
Chapter-and-section division: the logical structure of the document is preserved, which in theory retains the original semantic structure, but the disassembly is more difficult;
Other logical divisions: figures, tables, formulas, code and the like in the document are treated as individual semantic fragments.
Document division based on the business knowledge system: the document is segmented according to the business knowledge system; this division best fits the user's business scenarios and can generally achieve fine-grained, question-level segmentation.
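A minimal sketch of the fixed-window, sliding-window and paragraph divisions described above is given here for illustration; the window size and overlap are assumed values, not figures from the patent.

```python
# Hedged sketch of three of the non-business segmentation methods.
def fixed_window_split(text, window=500):
    return [text[i:i + window] for i in range(0, len(text), window)]

def sliding_window_split(text, window=500, overlap=100):
    step = window - overlap
    return [text[i:i + window] for i in range(0, max(len(text) - overlap, 1), step)]

def paragraph_split(text):
    return [p.strip() for p in text.split("\n\n") if p.strip()]

doc = "..."  # text of an original government service document
fragments = sliding_window_split(doc)
```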
In addition, after the original government service documents are logically divided into fragments in S103 and the fragments are vectorized by the large language model to obtain fragment vectors, all fragment vectors are stored in a document vector library together with a vector search engine, for retrieval and indexing of the documents in subsequent steps.
S104, acquiring a user question, and judging whether the user question matches a historical question in the common question answering library.
After the user question is acquired in S104, the method further includes: judging whether the user question contains only one question;
if the user question contains two or more questions, splitting it into several sub-questions; inputting the sub-questions into the large language model to obtain the business category corresponding to each sub-question, thereby determining which business domain/sub-domain the user question corresponds to. In this embodiment, determining the business domain/sub-domain corresponding to the user question effectively narrows the search range and thus improves question-answering efficiency.
If the user question contains only one question, judging whether it matches a historical question in the common question answering library includes: performing matching against the common question answering library and/or the key question answering library according to the user question, where the matching methods include vector-based direct search, knowledge-vector-based search, label-based matching and the like;
after the common question answering library matching is performed, the matching degree between the user question and the questions in the library is calculated; if the matching degree meets a preset matching threshold, the user question is judged to match the historical question, otherwise it is judged not to match. The matching threshold needs to be preset according to the actual application situation of the question-answering robot in the government service field.
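A hedged sketch of the vector-based matching step follows: the user question's vector is compared against the stored historical-question vectors and the best match is accepted only if it exceeds a preset threshold. The threshold value below is an assumption to be tuned for the deployment.

```python
# Sketch of FAQ matching by cosine similarity against a preset threshold.
import numpy as np

def match_faq(question_vec, faq_vectors, faq_answers, threshold=0.85):
    q = question_vec / np.linalg.norm(question_vec)
    best_idx, best_score = -1, -1.0
    for i, v in enumerate(faq_vectors):
        score = float(np.dot(q, v / np.linalg.norm(v)))   # cosine similarity
        if score > best_score:
            best_idx, best_score = i, score
    if best_score >= threshold:
        return faq_answers[best_idx]      # matched: reply with the library answer
    return None                           # not matched: fall through to retrieval
```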
S105, if matched, outputting a question answer based on the answer to the historical question;
Specifically, the answer corresponding to the matched question in the common question answering library is used as the reply.
In addition, in the question matching and answer obtaining process of S104 and S105, matching against the key question answering library is performed preferentially; only if the key question answering library cannot be matched with the user question is matching against the common question answering library performed, which improves the efficiency of user question matching.
S106, if not matched, performing knowledge linking between the user question and the government service knowledge graph to obtain several pieces of knowledge, and inputting these pieces of knowledge into the large language model one by one for vectorization to obtain knowledge vectors; meanwhile, inputting the whole user question into the large language model for vectorization to obtain a complete question vector;
As noted above, in machine learning and natural language processing, vectorization (embedding) maps high-dimensional data (such as text, pictures or audio) to a low-dimensional space; the resulting real-valued vector represents the input data as a point in a continuous numerical space and can represent almost anything, such as text, music or video. Such vectors can represent the semantics of words because they are learned from the patterns in which words occur in linguistic context: if one word often appears together with another word in certain contexts, the embedding vectors of the two words occupy similar positions in the vector space, meaning that they have similar meanings and semantics.
Thus, in the vector space, the semantics of a word can be represented by its distribution in context; that is, the meaning of a word can be inferred from the words around it. Large language models such as BERT, ELMo and GPT can generate such context-dependent vectorized representations. Through the vectorized representations of the keywords obtained by knowledge linking, this step better captures the semantics and contextual information of specific words in the user's question; likewise, by vectorizing the user's question as a whole, the position of the entire question in the vector space is obtained, which captures the overall semantics and contextual information of the question.
S107, performing a joint search based on the knowledge vectors and the complete question vector to obtain a candidate document fragment list;
In this step, the joint search includes: first, performing vector search with the knowledge vectors and with the complete question vector respectively to obtain two candidate document fragment lists; document fragments that appear in both candidate lists are taken directly as candidates for the final search result; the remaining document fragments that appear in only one candidate list keep their similarity scores, and finally a preset number of document fragments are taken in descending order of score as the final search result.
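The merge logic just described is sketched below for illustration; the vector_search function, the use of an infinite score to mark fragments found by both searches, and top_k are assumptions made for the example.

```python
# Sketch of the joint-search merge of knowledge-vector and whole-question results.
def joint_search(knowledge_vecs, question_vec, vector_search, top_k=5):
    knowledge_hits = {}                               # fragment_id -> best score
    for kv in knowledge_vecs:
        for frag_id, score in vector_search(kv):
            knowledge_hits[frag_id] = max(score, knowledge_hits.get(frag_id, 0.0))
    question_hits = dict(vector_search(question_vec))

    merged = {}
    for frag_id in set(knowledge_hits) | set(question_hits):
        if frag_id in knowledge_hits and frag_id in question_hits:
            merged[frag_id] = float("inf")            # in both lists: kept as candidate directly
        else:
            merged[frag_id] = knowledge_hits.get(frag_id, question_hits.get(frag_id, 0.0))
    ranked = sorted(merged, key=merged.get, reverse=True)
    return ranked[:top_k]                             # preset number of fragments, by descending score
```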
The method further includes: when no candidate document fragment list is retrieved through the joint search of S107, acquiring a supplementary question from the user until a candidate document fragment list is retrieved. Specifically, the supplementary question is obtained after prompting the user through feedback, where the feedback prompt includes asking the user to supplement more information related to the question or to re-describe it; if the preset number of attempts to acquire a supplementary question has been reached, the search is interrupted and manual (human) service is switched on.
S108, inputting the candidate document fragment list into the large language model to generate a question answer;
Specifically, it is judged whether the document fragments in the candidate document fragment list meet preset conditions; if so, the candidate document fragment list is input into the large language model and a question answer is generated. The preset conditions are set in advance according to the actual application situation of the question-answering robot in the government service field.
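For illustration, the answer-generation step can be viewed as concatenating the retrieved candidate fragments with the user question into a prompt and passing it to the large language model. The generate() callable and the prompt wording below stand in for whatever LLM interface and prompt template are actually deployed; they are assumptions, not a named API of the patent.

```python
# Hedged sketch of prompt construction for retrieval-augmented answer generation.
def build_prompt(question, fragments):
    context = "\n\n".join(f"[Fragment {i + 1}] {frag}" for i, frag in enumerate(fragments))
    return (
        "Answer the government-service question using only the reference fragments below. "
        "If the fragments do not contain the answer, say so.\n\n"
        f"Reference fragments:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

def answer(question, fragments, generate):
    if not fragments:                     # example preset condition: no usable fragments
        return None
    return generate(build_prompt(question, fragments))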
In this scheme, this can be achieved through the following steps: dividing the document fragments into a training set, a verification set and a test set; training the large language model on the training set; performing performance evaluation of the trained large language model on the verification set to obtain a large language model that meets the performance condition; and evaluating the generation results of that model on the test set to obtain the evaluation indicators corresponding to the large language model.
The performance evaluation gives the large language model a percentage score (the highest score being 100 and the lowest 0), and models whose score exceeds a set value are selected; for example, a large language model scoring more than 90 points is regarded as meeting the performance condition.
Evaluation indicators are then calculated for the large language model that meets the performance condition, and an evaluation value is computed for each indicator, representing the model's capability on that indicator.
In this embodiment, after the candidate document fragment list is input into the large language model and the question answer is generated in S108, the method further includes: performing knowledge linking through the government service knowledge graph and verifying the question answer based on the result of the knowledge linking. In this way a complete government service question-answering flow is realized; because the answer is finally linked back to the knowledge graph, the user can trace the answer to its original text and see its knowledge citations, which helps the user judge the accuracy of the answer and improves the user experience.
The government service question-answering method based on the knowledge graph enhanced large language model acquires historical questions and constructs a government service knowledge graph based on them; extracts answers to the historical questions and constructs a common question answering library from the historical questions and their corresponding answers; logically divides the original government service documents into fragments by different division methods and inputs the fragments into a large language model for vectorization to obtain fragment vectors; acquires a user question and judges whether it matches a historical question in the common question answering library; if matched, outputs a question answer based on the answer to the historical question; if not matched, performs knowledge linking against the government service knowledge graph to obtain several pieces of knowledge and inputs them into the large language model one by one for vectorization to obtain knowledge vectors, while vectorizing the whole user question to obtain a complete question vector; performs a joint search based on the knowledge vectors and the complete question vector to obtain a candidate document fragment list; and inputs the candidate document fragment list into the large language model to generate a question answer. The method solves the problem that LLMs in the prior art cannot be combined with knowledge graphs and therefore cannot be applied well to government service scenarios.
FIG. 2 shows an embodiment of the government service question-answering system based on a knowledge graph enhanced large language model of the present invention; as shown in FIG. 2, the system provided by the embodiment of the invention comprises the following modules:
the knowledge graph construction module 10, configured to acquire historical questions and construct a government service knowledge graph based on the historical questions;
the common question answering library construction module 20, configured to extract answers to the historical questions and construct a common question answering library based on the historical questions and their corresponding answers;
the fragment vector acquisition module 30, configured to obtain original government service documents, logically divide them into fragments by different division methods, and input the fragments into a large language model for vectorization to obtain fragment vectors; the division methods include non-business segmentation and document division based on the business knowledge system, where non-business segmentation includes fixed-window division, sliding-window division, page-based division, paragraph-based division, chapter-and-section division and other logical divisions;
the question answer generation module 40, configured to acquire a user question and judge whether it matches a historical question in the common question answering library;
if matched, output a question answer based on the answer to the historical question;
if not matched, perform knowledge linking between the user question and the government service knowledge graph to obtain several pieces of knowledge, and input these pieces of knowledge into the large language model one by one for vectorization to obtain knowledge vectors; meanwhile, input the whole user question into the large language model for vectorization to obtain a complete question vector;
perform a joint search based on the knowledge vectors and the complete question vector to obtain a candidate document fragment list;
and input the candidate document fragment list into the large language model to generate a question answer.
Fig. 3 is a schematic diagram of the entity structure of an electronic device according to an embodiment of the present invention. As shown in fig. 3, the electronic device 50 includes: a processor 501, a memory 502 and a bus 503;
wherein the processor 501 and the memory 502 communicate with each other through the bus 503;
the processor 501 is configured to invoke the program instructions in the memory 502 to perform the method provided by the above method embodiments, for example including: acquiring historical questions and constructing a government service knowledge graph based on the historical questions; extracting answers to the historical questions and constructing a common question answering library based on the historical questions and their corresponding answers; obtaining original government service documents, logically dividing them into fragments by different division methods, and inputting the fragments into a large language model for vectorization to obtain fragment vectors; acquiring a user question and judging whether it matches a historical question in the common question answering library; if matched, outputting a question answer based on the answer to the historical question; if not matched, performing knowledge linking between the user question and the government service knowledge graph to obtain several pieces of knowledge and inputting them into the large language model one by one for vectorization to obtain knowledge vectors, while inputting the whole user question into the large language model for vectorization to obtain a complete question vector; performing a joint search based on the knowledge vectors and the complete question vector to obtain a candidate document fragment list; and inputting the candidate document fragment list into the large language model to generate a question answer.
The present embodiment also provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the method provided by the above method embodiments, for example including: acquiring historical questions and constructing a government service knowledge graph based on the historical questions; extracting answers to the historical questions and constructing a common question answering library based on the historical questions and their corresponding answers; obtaining original government service documents, logically dividing them into fragments by different division methods, and inputting the fragments into a large language model for vectorization to obtain fragment vectors; acquiring a user question and judging whether it matches a historical question in the common question answering library; if matched, outputting a question answer based on the answer to the historical question; if not matched, performing knowledge linking between the user question and the government service knowledge graph to obtain several pieces of knowledge and inputting them into the large language model one by one for vectorization to obtain knowledge vectors, while inputting the whole user question into the large language model for vectorization to obtain a complete question vector; performing a joint search based on the knowledge vectors and the complete question vector to obtain a candidate document fragment list; and inputting the candidate document fragment list into the large language model to generate a question answer.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be implemented by hardware associated with program instructions; the aforementioned program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments; the aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks or optical disks.
The apparatus embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on such understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the embodiments or the methods of some parts of the embodiments.
While the invention has been described in detail in the foregoing general description and specific examples, it will be apparent to those skilled in the art that modifications and improvements can be made thereto. Accordingly, such modifications or improvements may be made without departing from the spirit of the invention and are intended to be within the scope of the invention as claimed.
Claims (10)
1. A government service question-answering method based on a knowledge graph enhanced large language model, characterized in that the method comprises:
acquiring historical questions, and constructing a government service knowledge graph based on the historical questions;
extracting answers to the historical questions, and constructing a common question answering library based on the historical questions and their corresponding answers;
obtaining original government service documents, logically dividing them into fragments by different division methods, and inputting the fragments into a large language model for vectorization to obtain fragment vectors;
acquiring a user question, and judging whether the user question matches a historical question in the common question answering library;
if matched, outputting a question answer based on the answer to the historical question;
if not matched, performing knowledge linking between the user question and the government service knowledge graph to obtain several pieces of knowledge, and inputting these pieces of knowledge into the large language model one by one for vectorization to obtain knowledge vectors; meanwhile, inputting the whole user question into the large language model for vectorization to obtain a complete question vector;
performing a joint search based on the knowledge vectors and the complete question vector to obtain a candidate document fragment list;
and inputting the candidate document fragment list into the large language model to generate a question answer.
2. The government service question-answering method based on a knowledge graph enhanced large language model according to claim 1, wherein the acquiring historical questions and constructing a government service knowledge graph based on the historical questions comprises:
constructing a business knowledge system, including classifying the business domains of government services, refining the user objects of government services, refining the business scenarios of government services and refining the business processes of government services.
3. The government service question-answering method based on a knowledge graph enhanced large language model according to claim 2, wherein the acquiring historical questions and constructing a government service knowledge graph based on the historical questions comprises:
extracting business labels corresponding to the historical questions;
and splitting complex businesses into several sub-businesses and performing cross-business association on the sub-businesses.
4. The government service question-answering method based on a knowledge graph enhanced large language model according to claim 1, wherein the extracting answers to the historical questions and constructing a common question answering library based on the historical questions and their corresponding answers comprises:
extracting key questions from the historical questions, and constructing a key question answering library based on the key questions;
and extracting the answers corresponding to the key questions to build the key question answering library.
5. The government service question-answering method based on a knowledge graph enhanced large language model according to claim 4, wherein the extracting answers to the historical questions and constructing a common question answering library based on the historical questions and their corresponding answers further comprises:
after obtaining the answer to a key question, checking the answer and judging whether it is correct;
when the answer is correct, determining it as the answer corresponding to the historical question;
and when the answer is wrong, re-acquiring the answer corresponding to the key question and checking it again until it is correct.
6. The government service question-answering method based on a knowledge graph enhanced large language model according to claim 5, wherein the acquiring a user question and judging whether the user question matches a historical question in the common question answering library further comprises:
judging whether the user question contains only one question;
if the user question contains two or more questions, splitting it into several sub-questions;
inputting the sub-questions into the large language model to obtain the business category corresponding to each sub-question;
if the user question contains only one question, judging whether it matches a historical question in the common question answering library.
7. The government service question-answering method based on a knowledge graph enhanced large language model according to claim 4, wherein the acquiring a user question and judging whether the user question matches a historical question in the common question answering library comprises:
performing common question answering library matching according to the user question, where the matching methods include vector-based direct search, knowledge-vector-based search and label matching.
8. The government service question-answering method based on a knowledge graph enhanced large language model according to claim 7, wherein the acquiring a user question and judging whether the user question matches a historical question in the common question answering library comprises:
preferentially matching the key question answering library against the user question;
and if the key question answering library cannot be matched with the user question, matching the common question answering library against the user question.
9. The government service question-answering method based on the knowledge-graph enhanced large language model according to claim 1, wherein the obtaining the candidate document fragment list based on the joint search of the knowledge vector and the complete question vector comprises:
and when the candidate document fragment list is not retrieved, re-acquiring the supplementary questions of the user questions until the candidate document fragment list is retrieved.
10. The government service question-answering method based on a knowledge graph enhanced large language model according to claim 1, further comprising:
performing knowledge linking through the government service knowledge graph, and verifying the answers to the questions based on the result of the knowledge linking.
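A minimal sketch of the knowledge-linking verification in claim 10, assuming the government service knowledge graph is available as a set of (head, relation, tail) triples and that a hypothetical extract_triples step maps statements in the generated answer to such triples.

```python
from typing import Callable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)

def verify_answer_with_graph(
    answer: str,
    graph_triples: Set[Triple],
    extract_triples: Callable[[str], List[Triple]],
) -> Tuple[bool, List[Triple]]:
    """Link answer statements to the knowledge graph and flag unsupported ones (claim 10)."""
    claimed = extract_triples(answer)  # knowledge linking
    unsupported = [t for t in claimed if t not in graph_triples]
    return (len(unsupported) == 0, unsupported)
```

A caller would treat an answer that produces unsupported triples as failing verification and either regenerate it or flag it for review.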
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410252313.XA CN117851577B (en) | 2024-03-06 | 2024-03-06 | Government service question-answering method based on knowledge graph enhanced large language model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117851577A true CN117851577A (en) | 2024-04-09 |
CN117851577B CN117851577B (en) | 2024-05-14 |
Family
ID=90538646
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410252313.XA Active CN117851577B (en) | 2024-03-06 | 2024-03-06 | Government service question-answering method based on knowledge graph enhanced large language model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117851577B (en) |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116737887A (en) * | 2023-01-13 | 2023-09-12 | 浪潮软件股份有限公司 | Intelligent question-answering method based on service item elements |
CN116303935A (en) * | 2023-02-07 | 2023-06-23 | 浪潮软件股份有限公司 | Intelligent question-answering method based on government affair service matters |
CN116628172A (en) * | 2023-07-24 | 2023-08-22 | 北京酷维在线科技有限公司 | Dialogue method for multi-strategy fusion in government service field based on knowledge graph |
CN116775847A (en) * | 2023-08-18 | 2023-09-19 | 中国电子科技集团公司第十五研究所 | Question answering method and system based on knowledge graph and large language model |
CN117056495A (en) * | 2023-10-08 | 2023-11-14 | 吉奥时空信息技术股份有限公司 | Automatic question-answering method and system for government affair consultation |
CN117435710A (en) * | 2023-11-01 | 2024-01-23 | 中电数据产业有限公司 | Government service question answering method, government service question answering device, terminal device and storage medium |
CN117493513A (en) * | 2023-11-08 | 2024-02-02 | 北京远问智能科技有限公司 | Question-answering system and method based on vector and large language model |
CN117573843A (en) * | 2024-01-15 | 2024-02-20 | 图灵人工智能研究院(南京)有限公司 | Knowledge calibration and retrieval enhancement-based medical auxiliary question-answering method and system |
Non-Patent Citations (2)
Title |
---|
MENG Qingguo; WANG Youkui; TIAN Honghong: "Intelligent Search in Government Services: Characteristics, Application Scenarios and Operating Mechanisms" (政务服务中的智能化搜索:特征、应用场景和运行机理), E-Government (电子政务), no. 02, 31 December 2020 (2020-12-31) *
WANG Youkui; ZHANG Nan; ZHAO Xuejiao: "Intelligent Question-Answering Robots in Government Services: Current Status, Mechanisms and Key Supports" (政务服务中的智能问答机器人:现状、机理和关键支撑), E-Government (电子政务), no. 02, 31 December 2020 (2020-12-31) *
Also Published As
Publication number | Publication date |
---|---|
CN117851577B (en) | 2024-05-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN117033608B (en) | Knowledge graph generation type question-answering method and system based on large language model | |
CN111026842B (en) | Natural language processing method, natural language processing device and intelligent question-answering system | |
CN117033571A (en) | Knowledge question-answering system construction method and system | |
CN117743315B (en) | Method for providing high-quality data for multi-mode large model system | |
CN113761868B (en) | Text processing method, text processing device, electronic equipment and readable storage medium | |
Hassani et al. | LVTIA: A new method for keyphrase extraction from scientific video lectures | |
CN111651569B (en) | Knowledge base question-answering method and system in electric power field | |
CN117891930B (en) | Book knowledge question-answering method based on knowledge graph enhanced large language model | |
CN117171325A (en) | Task processing method and server | |
CN112579666A (en) | Intelligent question-answering system and method and related equipment | |
CN112613293A (en) | Abstract generation method and device, electronic equipment and storage medium | |
CN118093839B (en) | Knowledge operation question-answer dialogue processing method and system based on deep learning | |
CN116955591A (en) | Recommendation language generation method, related device and medium for content recommendation | |
CN117520503A (en) | Financial customer service dialogue generation method, device, equipment and medium based on LLM model | |
CN112528653A (en) | Short text entity identification method and system | |
CN118410175A (en) | Intelligent manufacturing capacity diagnosis method and device based on large language model and knowledge graph | |
CN117875292A (en) | Financial knowledge intelligent question-answering method, system, terminal equipment and storage medium | |
CN114491079A (en) | Knowledge graph construction and query method, device, equipment and medium | |
CN111783425B (en) | Intention identification method based on syntactic analysis model and related device | |
CN117349515A (en) | Search processing method, electronic device and storage medium | |
CN117851577B (en) | Government service question-answering method based on knowledge graph enhanced large language model | |
Gomes Jr et al. | Framework for knowledge discovery in educational video repositories | |
CN112328812A (en) | Domain knowledge extraction method and system based on self-adjusting parameters and electronic equipment | |
CN116467414B (en) | Data verification method, device, equipment and computer readable storage medium | |
US11977853B2 (en) | Aggregating and identifying new sign language signs |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
PE01 | Entry into force of the registration of the contract for pledge of patent right | Denomination of invention: A Government Service Q&A Method Based on Knowledge Graph Enhanced Large Language Model; Granted publication date: 2024-05-14; Pledgee: Nanjing Branch of Jiangsu Bank Co.,Ltd.; Pledgor: Haiyizhi information technology (Nanjing) Co.,Ltd.; Registration number: Y2024980022455 |